LAND ECONOMY WORKING PAPER SERIES

Number 37: Probability Misspecification Errors in the UK Non-native Species Risk Assessment Scheme and Possible Economic Implications

Corresponding Author:
Neil McRoberts
Land Economy & Environment Research Group
SAC Research Division
SAC Edinburgh EH9 3JG
Tel: 0131-535-4154
Email: neil.mcroberts@sac.ac.uk

Probability Misspecification Errors in the UK Non-native Species Risk Assessment Scheme and Possible Economic Implications

Neil McRoberts
Land Economy & Environment Research Group, SAC, Edinburgh EH9 3JG, UK

Gareth Hughes
School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JG, UK

Summary

The published papers by Holt [1] and Holt et al. [2] describing the scientific basis for the current UK non-native risk assessment scheme (NNRAS) contain several fatal methodological errors. The papers make a misspecification error in the definition of likelihood ratios: the false negative proportion is wrongly used in place of the false positive proportion. The misspecification error is compounded into a series of further problems:

1. Likelihoods are wrongly equated with odds, and the resulting calculations, in which supposed odds are multiplied together, are mathematically meaningless;

2. The misspecification error appears to result from a lack of understanding of the algebra of conditional probability. In addition to the misspecification problem, this lack of understanding also allows conditionality to be ascribed to the data seemingly deus ex machina rather than by reference to a known source of information, e.g. a gold standard;

3. Without reference to a gold standard and a resulting classification of existing data, a constant prior probability of invasiveness of 0.5 is attributed to every test.
This value is arithmetically necessary to resolve the erroneous calculation of odds proposed in the NNRAS, but it (a) is not supported by the available evidence; (b) is overly uninformative for many test species; and (c) gives the NNRAS an apparent capacity to provide information which it does not possess mathematically.

4. Furthermore, a prior probability of 0.5 is used in the published calculations so that it cancels from the calculation of posterior odds, and the final estimate of the 'odds' of invasiveness depends only on the test data. This being the case, a Bayesian framework is not needed, since there is no process of probability updating and the calculated risk scale is effectively a classical summary statistic, embellished with unnecessary and meaningless probability calculations.

There is no simple means to perform a calculation of expected economic performance for the NNRAS, because the authors did not carry out (or at least do not provide) a proper case/control analysis of their risk calculations. If the misspecification error is included in expected cost calculations, using representative estimates for costs and error rates from the literature, the analysis suggests that there is a very low probability threshold of invasiveness above which the least expected cost action for the NNRAS is simply to assume that all test species are invasive, without carrying out a risk assessment, and deny entry. The practical consequence of this would be that the scheme would be self-defeating (since it would almost always be cheapest simply to assume that a new species was invasive without bothering to test it), overly restrictive to trade, and therefore likely to be challenged under international trade legislation. The current scheme would be difficult to defend against such a challenge since it is not transparent and cannot be defended economically.
1 Introduction

A UK strategy to deal with threats from the importation of unwanted invasive species was launched in May 2008 (http://www.scotland.gov.uk/News/Releases/2008/05/28101303). Part of the strategy is based on a risk assessment scheme developed under a DEFRA research contract and peer-reviewed by the consultants RPS following a competitive tendering process administered by SEERAD (now Scottish Government RERAD). Specific methodological components of the risk assessment scheme, which deal with calculation of the odds of an unknown species being invasive, have been published in peer-reviewed journals [1,2]. The risk assessment scheme has also recently been summarized in peer-reviewed form [3].

We point out that the fact that the scheme has passed peer review does not guarantee it to be free from errors. In fact, given that it certainly does contain errors, all that can be said is that the scheme provides one piece of evidence that the peer review processes of two journals and one consulting agency are not perfect; no-one involved in scientific research should be surprised by this. However, given the need for transparency in the evidence base used for policy formulation, we must question whether it can ever be acceptable for work which is known to be flawed to continue to be used as the basis for enacting policy. There is even less reason for this to continue when the changes required to correct the flaws have been pointed out and the necessary corrections placed in the public domain [4, and this paper].

Hughes [4] gives a detailed technical description of the errors and some suggestions for calculating a more informative overall risk score based on the same data as the flawed UK NNRAS method.
Here, we provide some further comments on the important errors in the current scheme and show: (a) that it leads to difficulties in establishing appropriate error rates and expected error costs; and (b) that the most logical error rates and costs which can be derived from the published scheme are opposite to the correct versions. This has several potentially serious implications since: (i) the costs of clean-up following the wrongful importation of an invasive species are typically orders of magnitude greater than the losses to the economy from wrongful denial of entry to a non-invasive species; and (ii) the costs of clean-up for an invasive species are usually public costs, while the losses from non-importation are mostly private costs. We also discuss an inefficiency in the suggested approach for using the risk assessment scheme, which ignores useful available information about the prior (or background) probability of invasiveness and may bias the scheme towards apparently generating useful information more often than is rationally justified by the data. This issue would be particularly serious for potential users of the scheme who are not well enough.

The remainder of this paper is divided into three sections, each dealing with an identified problem with the current UK NNRAS. The first section considers the underlying technical problem of misspecification errors in the conditional probabilities used in the scheme, and we diagnose the likely sources of confusion that led to those errors. In the second section we examine some potential economic consequences of any attempt to use the current scheme within a formal decision-making framework. Thirdly, we highlight the background theory which supports our assertion that the current scheme is systematically biased towards providing more apparent evidence about invasiveness than will normally be justified by available evidence.

1. Updating probabilities of invasiveness using data from risk assessment scoring

Definitions and notation

Gold standard: An independent assessment of the risk status of each species which is taken to be the true evaluation of its risk status, and against which the performance of the potential risk assessment scheme is assessed.

Conditional probability: The probability of some event or state of affairs ($v$, say) given that another event or state of affairs ($r$, say) is true, written $P(v \mid r)$; $\neg v$ (or equivalently $\bar{v}$) signifies the negation of event $v$. So, for example, if $v$ stands for "is invasive" and $r$ stands for "predicted to be invasive", $\neg v$ would stand for "not invasive" and $P(\neg v \mid \neg r)$ would be read as "the probability of being not invasive given a prediction of non-invasiveness".

Likelihoods: Likelihoods are conditional probabilities (see above). In statistical theory they are most commonly encountered in hypothesis testing and model fitting, where the term likelihood typically refers to the probability of the data given the truth of an hypothesis. In the current context they will occur most commonly as probabilities of species being either invasive or non-invasive given the result of a risk assessment. A likelihood ratio (LR) is simply the result of dividing one likelihood by another.

Odds: The odds of an event or state of affairs is a function of the probability of the event such that $\mathrm{odds}(v) = P(v)/(1 - P(v))$. The inverse relationship, which expresses probabilities as functions of odds, is $P(v) = \mathrm{odds}(v)/(1 + \mathrm{odds}(v))$.

Bayesian updating: The odds of uncertain events change as new evidence accumulates. Bayes' theorem (see footnote 1) provides a formal way to perform the updating process.
The theorem can be written in a number of ways, but a useful form in the current context is shown in equation 1:

$$\mathrm{odds}(v)_{\mathrm{posterior}} = \mathrm{LR} \times \mathrm{odds}(v)_{\mathrm{prior}} \tag{1}$$

Equation 1 states that the odds of $v$ after (posterior to) some new evidence has been accumulated is equal to the product of the likelihood ratio associated with the evidence and the odds of the event prior to the evidence being available. Here, the prior and posterior odds refer to the possibility that a candidate species for import is invasive, while the evidence is the result of performing a risk assessment on the species.

Defining likelihood ratios from an evidence base

Consider Figure 1, which is adapted from Figure 1 in Hughes & Madden [5]. The figure is a case/control plot which shows the distributions of test scores for plant species included in the Australian weed risk assessment database, previously published by Pheloung et al. [6]. The horizontal axis (WRA) shows the values for a weed risk index derived from the individual risk components. Note that WRA here is equivalent to the term $s_i$ [1,2]. Increasing values of WRA indicate an increasing probability that a species presents a risk of being a weedy invasive (such as Japanese Knotweed, for example). There are two frequency distributions in Fig. 1, one for species known to be invasive weeds (the cases) and the other for species known not to be invasives (the controls). These classifications into invasive and non-invasive are made by reference to an independent gold standard (this could be expert opinion, known historical fact or some other reference which is independent of the risk assessment scheme). Note further that the ranges of the scores for invasives and non-invasives show considerable overlap, so that there is no single threshold WRA value such that all invasives have scores equal to or greater than this value and all non-invasives have scores less than this value.
Hughes & Madden [5] discuss the statistical methods used to select a WRA threshold value which is optimised according to various different criteria. Hereafter we will use $T$ to refer to the threshold score generically. In Fig. 1 a threshold value of $T = 4$ is indicated by the dashed vertical line. Note that most, but not all, invasives have scores greater than 4 and most, but not all, non-invasives have scores less than 4.

Footnote 1. Bayes' theorem (credited posthumously to the Rev. Thomas Bayes (1702-1761)) is typically written as

$$\Pr(a \mid b) = \frac{\Pr(b \mid a)\,\Pr(a)}{\Pr(b)}$$

in which $\Pr(b \mid a)$ is the likelihood, $\Pr(a)$ is the prior probability of $a$, and $\Pr(b)$ is the probability of the data.

Figure 1. Distributions of weed risk assessment scores for invasive (squares, dotted line) and non-invasive (circles, solid line) species, based on the data of Pheloung et al. [6] as analysed by Hughes & Madden [5]. A threshold WRA score $T = 4$ is indicated by the dashed vertical line.

Now, having imposed a threshold, $T$, we can use the frequency distributions for invasives and non-invasives to specify four further quantities, as indicated in Fig. 1:

1. The True Positive Proportion (TPP) is the proportion of invasives (cases) with WRA scores greater than or equal to $T$.
2. The False Negative Proportion (FNP) is the proportion of invasives (cases) with WRA scores less than $T$.
3. The False Positive Proportion (FPP) is the proportion of non-invasives (controls) with WRA scores greater than or equal to $T$.
4. The True Negative Proportion (TNP) is the proportion of non-invasives (controls) with WRA scores less than $T$.

Since TPP and FNP together account for all of the invasives, TPP + FNP = 1, and, similarly, TNP + FPP = 1, since these two proportions account for all of the non-invasives. Each of the proportions is an estimate of a likelihood, and therefore (as defined above) a conditional probability.
For example, the TPP is an estimate of the probability that a specimen will have a WRA score greater than or equal to $T$, given that it is invasive; i.e. $\mathrm{TPP} \approx P(\mathrm{WRA} \geq T \mid v)$. Similarly, $\mathrm{FPP} \approx P(\mathrm{WRA} \geq T \mid \neg v)$ (i.e. FPP is an estimate of the probability that a specimen will have a WRA score greater than or equal to $T$ given that it is a non-invasive). Together these two likelihoods define the likelihood ratio for a positive risk assessment (i.e. positive in the sense that it produces a value greater than or equal to the selected threshold), as shown in equation 2:

$$\mathrm{LR}^{+} = \frac{\mathrm{TPP}}{\mathrm{FPP}} = \frac{\mathrm{TPP}}{1 - \mathrm{TNP}} = \frac{P(\mathrm{WRA} \geq T \mid v)}{P(\mathrm{WRA} \geq T \mid \neg v)} \tag{2}$$

Now, Holt [1] and Holt et al. [2] correctly specify this likelihood ratio in terms of conditional probabilities (for example, equation 2 in [1] is correct and can be compared directly with equation 2, above), but they use the wrong quantity for the FPP. We describe the mistake in more detail at the end of this section, but here the important point to note is that equation 2, above, is the correct form, and that what Holt [1,2] actually calculate is shown below as equation 3:

$$\mathrm{LR}^{+}_{\mathrm{Holt}} = \frac{\mathrm{TPP}}{1 - \mathrm{TPP}} = \frac{\mathrm{TPP}}{\mathrm{FNP}} = \frac{P(\mathrm{WRA} \geq T \mid v)}{P(\mathrm{WRA} < T \mid v)} \tag{3}$$

Referring back to Fig. 1, it is clear that this misspecification error in the NNRAS [1,2] will introduce systematic errors into any risk assessment which is based on their treatment of conditional probabilities. To highlight this, consider the effect on the quantities generated by equation 2 (the correct form) and equation 3 (the incorrect form) as we vary the threshold $T$ over the range of possible WRA scores. Reducing the threshold will increase TPP and FPP, but decrease FNP. Thus, the effect of decreasing the threshold on the outcome of equation 2 is unclear and will depend on the specific way in which the distributions of invasives and non-invasives vary over the range of WRA.
However, while the exact effect of decreasing $T$ on the true $\mathrm{LR}^{+}$ is uncertain, it will eventually lead to a value of $\mathrm{LR}^{+} = 1$, since TPP = 1 and FPP = 1 if $T$ is set at its minimum possible value. Referring back to equation 1, we can see that this makes sense. If $T$ is set to its minimum value, all of the cases and all of the controls will have WRA values greater than or equal to $T$, and hence the threshold will provide no discrimination between invasive and non-invasive species. In this case we would expect the odds of a species being invasive not to be changed by using the test, and the value of $\mathrm{LR}^{+} = 1$ in equation 1 would ensure that this is the case.

In contrast to the situation for the correct $\mathrm{LR}^{+}$, it is straightforward to see the implications of the incorrect version, $\mathrm{LR}^{+}_{\mathrm{Holt}}$, given by Holt [2]. This will increase if we lower the threshold, since as $T$ gets smaller, TPP increases and therefore FNP decreases (recall that FNP = 1 - TPP). The converse effects are produced if we set higher thresholds by moving $T$ to the right, increasing the risk threshold. As the risk threshold is increased, TPP reduces and FNP increases and thus $\mathrm{LR}^{+}_{\mathrm{Holt}}$ decreases. This is opposite to what is required. If the risk score is positively correlated with the odds that an unknown species is invasive, other things being equal, we would anticipate that obtaining a risk score greater than or equal to a high threshold would indicate high odds of the species in question being invasive; the data in Fig. 1 display this characteristic. Accordingly, as we increase the value of $T$, we would expect $\mathrm{LR}^{+}$ to increase so that equation 1 will reflect the intuitive result just described. In the correct specification of $\mathrm{LR}^{+}$ this is what happens, because FPP decreases more quickly than TPP as the threshold increases until, with $T$ set at its maximum possible value, both proportions equal 0. At the maximum possible value of the threshold, $\mathrm{LR}^{+}$ collapses to 0 and no specimen is classed as invasive.
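The opposite behaviour of equations 2 and 3 as the threshold moves can be illustrated with a minimal sketch. The scores below are invented for illustration only (they are not the Pheloung et al. data shown in Figure 1), and the function names are hypothetical.

```python
# Hypothetical risk scores, invented purely for illustration; they are NOT
# the Pheloung et al. data plotted in Figure 1.
cases = [2, 4, 5, 6, 7, 8, 9, 10]      # gold standard: invasive
controls = [0, 1, 1, 2, 3, 4, 5, 7]    # gold standard: non-invasive


def proportion_at_or_above(scores, threshold):
    """Proportion of scores greater than or equal to the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def likelihood_ratios(threshold):
    """Return (LR+ as in equation 2, LR+Holt as in equation 3)."""
    tpp = proportion_at_or_above(cases, threshold)
    fpp = proportion_at_or_above(controls, threshold)
    fnp = 1.0 - tpp
    # Guard against division by zero at the extremes of the score range.
    lr_correct = tpp / fpp if fpp > 0 else float("inf")
    lr_holt = tpp / fnp if fnp > 0 else float("inf")
    return lr_correct, lr_holt


for t in (0, 3, 6):
    lr_correct, lr_holt = likelihood_ratios(t)
    print(f"T={t}: LR+ = {lr_correct:.2f}, LR+Holt = {lr_holt:.2f}")
```

For these made-up scores the correctly specified ratio rises as the threshold is raised, while the misspecified ratio falls, which is the pattern described in the text.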
To reiterate, $\mathrm{LR}^{+}_{\mathrm{Holt}}$ will decrease as $T$ increases, so that for increasingly stringent tests of invasiveness equation 1 will make less and less difference to the odds of a species which tests positive being classed as invasive; this is clearly opposite to what is required of a rational risk assessment scheme, in which exceeding a higher risk threshold ought to provide more evidence of a test species being invasive.

It should be apparent that the methodology for updating odds based on evidence is dependent on the availability of the independent gold standard classification of species in the database as either invasive or non-invasive. All of the calculations of conditional odds are made relative to this classification. Indeed, the gold standard classification is the conditional information on which the entire Bayesian methodology for risk evaluation and updating is based. As Hughes [4] pointed out, in the empirical examples discussed in the NNRAS [1,2] this conditional information is either absent or it is not used in constructing the scheme (conditionality being attributed apparently deus ex machina). However, the authors make informal use of the agreement between their risk scores and the available gold standard (expert judgement in this case) as part of the evidence in support of their claim that the proposed NNRAS is superior to risk score averaging (see, for example, Table 3 in [1]).

Probable sources for the misspecification error

Having outlined one of the issues arising from the misspecification error, we now consider the question of how such an error could occur. The source of the misspecification error in the NNRAS apparently stems from a confusion on the part of its authors between conditional and unconditional probabilities. This mistake itself might have arisen from their failure to use the gold standard evidence at their disposal to impose the conditioning needed to use the Bayesian apparatus they employ.
What follows is a diagnosis of the likely source of their mistake. Consider the following statement:

(a) The probability of obtaining score $s_i$ for an unknown species is $p_i$.

If statement (a) is true then statements (b), (c) and (d) are also true:

(b) The probability of not obtaining score $s_i$ for risk component $i$ for an unknown species is $(1 - p_i)$.
(c) $p_i + (1 - p_i) = 1$.
(d) The odds of obtaining score $s_i$ for an unknown species is $p_i/(1 - p_i)$.

Note that the probabilities $p_i$ and $(1 - p_i)$ are not conditional on any other information. As unconditional statements of probability, statements (a) and (b) are true and we can interpret $p_i$ as the unconditional prior probability of obtaining score $s_i$ from the proportions of invasive and non-invasive species in the available data. It seems likely that it is a misunderstanding of how statements (a) and (b) should be adapted to take account of conditionality on other information that led to the misspecification error in the NNRAS. Thus, Holt [1] (p59, in the paragraph between equations 1 and 2) starts with statement (e), below, and then wrongly applies the arithmetic linking statements (a) and (b) to generate statement (f) (note that statement (f) is false), allowing a value (wrongly) to be calculated for the probability in statement (g):

(e) The probability that risk component $i$ has score $s_i$, given that the species concerned poses a risk, is $P(s_i \mid v)$.
(f) $P(s_i \mid v) + P(s_i \mid \neg v) = 1$, or equivalently $P(s_i \mid \neg v) = 1 - P(s_i \mid v)$.
(g) The probability that risk component $i$ has score $s_i$, given that the species concerned does not pose a risk, is $P(s_i \mid \neg v)$.

The correct set of deductions starting with statement (e) is:

(e) The probability that risk component $i$ has score $s_i$, given that the species concerned poses a risk, is $P(s_i \mid v)$.
(h) The probability that risk component $i$ does not have score $s_i$, given that the species in question poses a risk, is $P(\neg s_i \mid v) = 1 - P(s_i \mid v)$.
(i) The probability that risk component $i$ has score $s_i$, given that the species concerned does not pose a risk, is $P(s_i \mid \neg v)$.
(j) The probability that risk component $i$ has score $s_i$, given that the species concerned does not pose a risk, is one minus the probability that risk component $i$ is not given score $s_i$, given that the species concerned does not pose a risk: $P(s_i \mid \neg v) = 1 - P(\neg s_i \mid \neg v)$.

Comparing statements (f) and (g) with statements (h) and (j) shows where the mistake in the NNRAS arises. The arithmetic of the unconditional probabilities applies to the elements on the left hand side of the bar in the conditional probabilities; Holt [1] mistakenly applies it to the elements on the right hand side and in so doing attempts to add pieces of two completely separate probability distributions together and make them add up to 1.

Hughes [4] also notes that Holt [1] and Holt et al. [2] attempt to multiply odds for separate risk components together in a way which is not consistent with the laws of probability. Reference to statement (d), and the apparent misunderstanding of how the arithmetic of probability applies to conditional probabilities, reveals how this second type of error probably arose. First, recall that an odds is a ratio of two probabilities; $p/(1 - p)$, for example. Next, recall that a likelihood ratio is also a ratio of two probabilities. For example, as noted in equation 2 above, the positive likelihood ratio, $\mathrm{LR}^{+}$, is defined as $P(s_i \geq T \mid v)/P(s_i \geq T \mid \neg v)$. As we have just shown, the misspecification error in the NNRAS defines $\mathrm{LR}^{+}_{\mathrm{Holt}}$ as $P(s_i \geq T \mid v)/(1 - P(s_i \geq T \mid v))$; i.e. it leads to misinterpretation of likelihood ratios as odds. Finally, we note that Holt [1] and Holt et al. [2] correctly believe that likelihood ratios can be multiplied together: the overall likelihood ratio which results from applying a series of tests (or assessments) is the product of the individual likelihood ratios of the tests.
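As a minimal sketch of how a product of likelihood ratios enters equation 1, the following uses hypothetical likelihood ratio values and hypothetical function names, and assumes the individual tests can legitimately be combined by multiplication (for example, because they are conditionally independent). It also shows that a prior probability of 0.5 corresponds to prior odds of 1, so that the posterior odds reduce to the likelihood ratio product alone.

```python
from math import prod


def odds_from_probability(p):
    """odds(v) = P(v) / (1 - P(v))."""
    return p / (1.0 - p)


def probability_from_odds(odds):
    """P(v) = odds(v) / (1 + odds(v))."""
    return odds / (1.0 + odds)


def posterior_odds(prior_probability, likelihood_ratios):
    """Equation 1, with the overall LR taken as the product of the test LRs."""
    overall_lr = prod(likelihood_ratios)
    return overall_lr * odds_from_probability(prior_probability)


# Hypothetical likelihood ratios for three risk components (illustrative only).
lrs = [2.0, 1.5, 0.8]

# With a prior of 0.5 the prior odds are 1, so only the LR product remains.
print(round(posterior_odds(0.5, lrs), 3))                          # 2.4

# With a lower prior the same evidence gives a much smaller posterior probability.
print(round(probability_from_odds(posterior_odds(0.1, lrs)), 3))   # 0.211
```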
However, proper likelihood ratios are not odds. While likelihood ratios can be directly multiplied, odds cannot. It appears that because the authors of the NNRAS wrongly believe that likelihood ratios are odds, while rightly believing that likelihood ratios can be multiplied, they wrongly believe odds can be multiplied. To obtain the posterior odds from a series of tests one must use equation 1 and multiply the overall likelihood ratio by the prior odds. Thus, a mistaken identification of likelihood ratios as odds, wrongly combined with the true knowledge that likelihood ratios can be multiplied together, gives rise to equations 3 and 4 in [1], which are, simply, mathematically meaningless. In addition to making the mathematical basis for the NNRAS unsound, the misspecification error is also likely to lead to errors in estimating the economic consequences of using the scheme in practice, as discussed in the next section.

2. Probable economic consequences from using the NNRAS

Referring back to Figure 1 we recall that, with respect to any risk threshold we impose and given the existence of a gold standard, it is possible to define four conditional probabilities: the true positive proportion (TPP), the false negative proportion (FNP), the true negative proportion (TNP) and the false positive proportion (FPP). Two of these, FPP and FNP, can be thought of as the long-run expected error rates for using the risk assessment scheme and will be important elements in any attempt to calculate expected costs for the scheme. False positives are species which are classed as posing an invasive risk by the risk assessment scheme, but which are in fact non-invasive. False negatives are, conversely, species which are classed as not posing a risk but which actually are invasive.
It is apparent that if we reduce the threshold risk value we will decrease the numbers of false negatives at the expense of increasing the numbers of false positives. At the same time the proportion of true positives identified will increase and the proportion of true negatives will decrease. Since perfect discrimination of invasives and non-invasives is impossible in practice, the question arises of how to operate the risk assessment scheme so as to reach some optimum balance in the trade-off between the false positive and false negative errors. This point is discussed at some length by Hughes & Madden [5] in relation to the case/control methodology introduced above. Here we present an alternative (and equivalent) discussion of the issue using a simple expected cost approach. We will demonstrate that the misspecification error in the current NNRAS will make any attempt to find an economic optimum operating point for it problematic, and that any calculation of expected costs based on the current scheme will be prone to invert the false positive and false negative costs.

Definitions and notation

Expected value: The expected value for an event is the value obtained when the event occurs multiplied by the probability that the event occurs. For events with several mutually exclusive possible outcomes the expected value is the sum of the products of the values of the individual outcomes and their probabilities. For example, suppose that a fair coin is to be tossed and you will lose 1.00 if it lands heads and 0.00 if it lands tails. Let $p$ be the probability that the coin lands heads up and $(1 - p)$ the probability that it lands tails up. The expected value (loss) of the coin toss is the value of getting heads multiplied by the probability of getting heads, plus the value of getting tails multiplied by the probability of getting tails. If the coin is fair, $p = (1 - p) = 0.5$, so the expected value of the coin toss is $(0.5 \times 1.00) + (0.5 \times 0.00) = 0.50$.
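The coin-toss calculation can be written as a short sketch. The function name and the (value, probability) representation are illustrative choices, not part of the original analysis.

```python
def expected_value(outcomes):
    """Sum of value * probability over mutually exclusive outcomes.

    `outcomes` is a list of (value, probability) pairs whose
    probabilities are expected to sum to 1.
    """
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)


# Fair coin: lose 1.00 on heads, 0.00 on tails.
coin_toss = [(1.00, 0.5), (0.00, 0.5)]
print(expected_value(coin_toss))  # 0.5
```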
Regret: The regret associated with a decision is the difference between the value obtained from the choice which is made and the value of the best possible choice which could have been made with perfect knowledge. In the preceding example no choice was included; the expected value of the toss was simply dependent on the outcome. Suppose we introduce the possibility of picking in advance either heads or tails. In this more sophisticated experiment we will assume that we lose 1.00 if we pick wrongly and lose 0.00 if we pick correctly. There are now four possibilities, as shown in Table 1.

Table 1. Hypothetical losses associated with guessing a coin toss

| Guess | Coin lands heads | Coin lands tails |
|-------|------------------|------------------|
| Heads | 0.00 | 1.00 |
| Tails | 1.00 | 0.00 |

Because the coin is fair and we have no way of predicting the outcome of the tosses, one rational strategy is to guess heads half the time and tails half the time, selecting which to guess at random. Since the coin toss and our guesses are independent, the probability of each of the four outcomes is 0.5 × 0.5 = 0.25. The expected regret from this experiment is found by adding up the four elements in Table 2.

Table 2. Hypothetical expected regret associated with guessing a coin toss

| Guess | Coin lands heads | Coin lands tails |
|-------|------------------|------------------|
| Heads | (0.5 × 0.5 × 0.00) | (0.5 × 0.5 × 1.00) |
| Tails | (0.5 × 0.5 × 1.00) | (0.5 × 0.5 × 0.00) |

For completeness we note that the expected regret for the second example is 0.5, and that this would also be the case if we followed a strategy of always calling heads or always calling tails.

Expected regret for risk assessment schemes

In order to highlight the potential problems with the current scheme we will describe the calculation of expected regret for well-founded schemes before illustrating how the misspecification errors in the current scheme would be likely to lead to problems. We define the cost of false negatives as $C_{FN}$ and the cost of false positives as $C_{FP}$. The cost of false negatives is typically significantly higher than for false positives (e.g. Smith et al.
[7] give a ratio of 15:1 for weedy species; a ratio which they consider to be conservative). We scale all costs such that $C_{FN} = 1$. Assume that there is a prior probability, $p$, that any unknown species is invasive. Now, consider a tactic in which we perform no risk assessments, assume that no species poses an invasive risk, and allow all species to be imported. The expected regret from this tactic is $p\,C_{FN}$. Similarly, the expected regret for a tactic in which we perform no risk assessments but assume that all species are invasive and deny import to all species is $(1 - p)\,C_{FP}$. Although it is unlikely in practice, for illustration, suppose that $C_{FP} = C_{FN}$. The expected regret associated with assuming no risk will increase as the prior probability, $p$, of a species being invasive increases, and the expected regret associated with assuming that all species pose a risk will decrease. The expected regret of these two options will be equal when $p = (1 - p) = 0.5$. The expected regret lines for these two sets of assumptions are shown in Figure 2a. The risk management strategy with lowest expected regret under these conditions is to follow the "don't test, assume non-invasive" tactic for $p \leq 0.5$ and the "don't test, assume invasive" tactic for $p > 0.5$. At values of $p > 0.5$, the upper portion of the "don't test, assume non-invasive" tactic is excluded by the lower expected regret of the tactic which assumes that all species are invasive. The opposite condition excludes the use of the "don't test, assume invasive" tactic for values of $p \leq 0.5$. This strategy is known as a naive Bayes classifier.

Figure 2. Expected regret graphs for different predictors of invasiveness. (a) A naive Bayes predictor for a cost ratio = 1. (b) The naive predictor from (a) compared with a naive predictor which takes account of the cost ratio between false positives and false negatives suggested by Smith et al.
[7] (cost ratio = 0.067). (c) A predictor with the cost ratio as in (b), with the likelihood ratio correctly defined and taken from the analysis of the probability of weediness in the data set of Pheloung et al. [6] by Hughes & Madden [5] (FPP = 0.488, FNP = 0.063). (d) The same analysis as shown in (c) but using the incorrect specification of likelihoods given by Holt [2,3]. Note that the scaling on the vertical axis is different in (a) than in (b), (c) and (d). Explanation of the probabilities marked by arrows is given in the text.

As already noted, however, the costs of false negatives are typically much greater than those of false positives. Using $C_{FP} = \frac{1}{15} C_{FN}$ [7], the expected regret for a no-test strategy incorporating unequal costs is shown in Figure 2b. Note that, in comparison with the situation in which the costs are equal, the overall expected regret is lower because the cost of false positives is lower than the cost of false negatives. Also, note that the prior probability at which the decision maker should switch tactics (from assuming that no species are invasive to assuming that all species are invasive) is much lower than in the previous case; with the unequal cost ratio of 15:1, the threshold is at $p_{t1} = 0.0625$.

Neither of the first two examples includes an evidence-based risk assessment. The expected regret from using an evidence-based risk assessment scheme (such as the NNRAS) is a function of the prior probability that a species poses an invasive risk, the false positive and false negative proportions of the scheme, and the costs $C_{FP}$ and $C_{FN}$; this mirrors the structure of the second coin tossing example, above. In the example shown in Figure 1 for the weed risk assessment scheme, at T = 4, FPP = 0.488 and FNP = 0.063 [5]. The expected regret for this risk assessment scheme is given in equation 4, with costs scaled so that $C_{FN} = 1$, and is displayed in Figure 2c along with the previous examples.

$$\left((1 - p)\left(\mathrm{FPP}\,\frac{C_{FP}}{C_{FN}}\right)\right) + \left(p\,\mathrm{FNP}\right) \qquad (4)$$

In Figure 2c two threshold prior probabilities, $p_{t1}$ and $p_{t2}$, are indicated. These are the values of $p$ at which the expected regret for two different tactics is equal and, as before, they indicate threshold probabilities at which the decision maker should switch tactics in order to operate the risk assessment scheme with the lowest long-term expected regret. At prior probabilities up to and including $p_{t1}$ the least regret tactic is to assume that there is no risk and allow access to all species without testing. From $p_{t1}$ and up to $p_{t2}$ (inclusive) the decision maker should operate the risk assessment scheme at the threshold score indicated in Figure 1, while above $p_{t2}$ the best option (i.e. the one with the lowest long-run expected regret) is to assume that all species pose a risk and avoid imports without testing. The values of the thresholds in the example are $p_{t1} = 0.034$ and $p_{t2} = 0.353$.

Having described the basics required for an analysis of expected regret, we can now deal with the implications for such an analysis arising from the misspecified likelihoods in the NNRAS. As a preliminary to examining these possible problems, we emphasise that the authors of the NNRAS did not carry out an analysis of expected values, and so what follows is an analysis of the problems implied by their misspecification errors. Recall that the positive likelihood ratio for a test is properly defined as TPP/FPP, and that it is incorrectly defined in the NNRAS [1,2] as TPP/FNP. The first potential issue is now apparent. Basing an analysis of expected regret on the definitions in [1,2] would result in the FNP being used wrongly in place of the FPP. As we have seen from equation 4, above, both FPP and FNP are used in calculating the expected regret for a risk assessment scheme. If we have (mistakenly) used FNP in place of FPP, the question arises as to what value should be used in place of FNP to calculate expected regret.
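As a check on the thresholds quoted above for the correctly specified scheme, the following is a minimal Python sketch of the three expected regret lines and the priors at which they cross. The function names are ours and purely illustrative; the cost ratio of 0.067 is the rounded value given in the caption to Figure 2, and costs are scaled so that the false negative cost is 1.

```python
def regret_allow_all(p, c_fn=1.0):
    """No testing, assume non-invasive: every invasive species is a false negative."""
    return p * c_fn


def regret_deny_all(p, c_fp, c_fn=1.0):
    """No testing, assume invasive: every non-invasive species is a false positive."""
    return (1.0 - p) * c_fp


def regret_test(p, fpp, fnp, c_fp, c_fn=1.0):
    """Expected regret of operating the risk assessment (equation 4 with C_FN = 1)."""
    return (1.0 - p) * fpp * c_fp + p * fnp * c_fn


def thresholds(fpp, fnp, c_fp, c_fn=1.0):
    """Priors at which testing ties with 'allow all' (p_t1) and 'deny all' (p_t2)."""
    # p_t1 solves p*c_fn = (1-p)*fpp*c_fp + p*fnp*c_fn, written as odds p/(1-p).
    odds_t1 = (fpp * c_fp) / ((1.0 - fnp) * c_fn)
    # p_t2 solves (1-p)*c_fp = (1-p)*fpp*c_fp + p*fnp*c_fn.
    odds_t2 = ((1.0 - fpp) * c_fp) / (fnp * c_fn)
    return odds_t1 / (1.0 + odds_t1), odds_t2 / (1.0 + odds_t2)


def no_test_threshold(c_fp, c_fn=1.0):
    """Prior at which the two no-test tactics have equal expected regret."""
    ratio = c_fp / c_fn
    return ratio / (1.0 + ratio)


p_t1, p_t2 = thresholds(fpp=0.488, fnp=0.063, c_fp=0.067)
print(round(p_t1, 3), round(p_t2, 3))
print(no_test_threshold(1.0 / 15.0))
```

Under these assumptions the sketch gives values that round to the thresholds of 0.034 and 0.353 stated in the text, and a no-test switching point of 0.0625 for a 15:1 cost ratio.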
If we assume a similar misspecification error for negative results as was made for positive results, the most likely quantity that would be used in place of FNP would be the FPP. In short, the most likely outcome of the misspecification error in the NNRAS is an inversion of the error rates. Figure 2d shows the consequences of this inversion of the FNP and FPP using the same values as in the previous, correct example (Figure 2c). In this case the threshold for switching from not testing (and assuming that all species are non-invasive) to testing occurs at $p = 8.2 \times 10^{-3}$ and the threshold for switching from testing to not testing (and assuming that all species are invasive) is at $p = 0.114$, as indicated in Figure 2d.

The example just presented gives an illustration of the sort of error which might occur if the misspecification problem in the NNRAS is propagated through an economic analysis of its expected performance. However, rather than focus on specific numeric cases, we want to highlight more general issues which this analysis throws up.

First, in common with more detailed economic analyses [8] of this type of problem, our simple analysis based on expected regret highlights the link between the cost-effectiveness of any risk assessment scheme and the prior probability of the events which it is intended to identify. From a purely financial perspective there are events which are either so improbable, or so probable, that evidence-based risk assessment is not worthwhile. Put simply, it is not always economically rational to make a risk assessment, and both the cost-effectiveness of any inspection scheme which is put in place, and the range of probabilities of invasiveness over which it will be economically useful, will depend on its error rates. None of these issues is addressed in the published descriptions of the NNRAS.
Secondly, because the expected regret is not independent of the prior probability of invasiveness, $p$, this parameter needs to be included realistically in the scheme and not assumed to be a constant, as it is in the methodology proposed by Holt et al. [1,2]. We can expand on this point specifically in relation to the value of $p = 0.5$ used by Holt [1,2]. The authors of the NNRAS picked this prior probability on the grounds that, in the absence of any other information, it is reasonable to use a value which gives an equal probability to unknown species being invasive or non-invasive. This argument seems reasonable on first inspection, but should be rejected for two straightforward empirical reasons, which we will outline here, and one more abstract (though no less important) reason which is described in the third section of this paper.

Holt's use of $p = 0.5$ as the prior probability for the risk assessment scheme is appropriate only if two conditions are met. These are: (1) that the costs of false positive and false negative errors are equal, i.e. $C_{FP} = C_{FN}$ (this assumption is made implicitly by Holt); (2) that there is no information available to select another prior (this is the assumption made explicitly by Holt). In other words, the NNRAS is set up as a naive Bayes classifier with an equal cost ratio (Figure 2a); it would operate as follows. We begin with an assumption of equal probability of invasiveness and non-invasiveness ($p = 0.5$). A risk assessment is carried out, which leads to an updated estimate of the probability of invasiveness (the posterior probability). If this value is greater than 0.5 we should reject the species and if it is less than 0.5 we should accept it. The expected costs (regret) of these decisions are shown in Figure 2a. However, as noted above, the first of these steps is only relevant if we assume that the costs of the two possible types of mistake are equal (i.e. if $C_{FP} = C_{FN}$).
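The updating step described above can be sketched in Python using the standard odds form of Bayes' theorem with the correctly defined positive likelihood ratio, TPP/FPP. This is an illustration of the general procedure, not a reproduction of the NNRAS calculations; the function names are ours, and the alternative prior of 0.1 is a hypothetical value chosen only to show that the decision can change with the prior.

```python
def positive_likelihood_ratio(fpp, fnp):
    """LR+ = TPP / FPP, where the true positive proportion is TPP = 1 - FNP."""
    return (1.0 - fnp) / fpp


def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability via posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Error rates from the weed risk assessment example at T = 4 [5].
lr_pos = positive_likelihood_ratio(fpp=0.488, fnp=0.063)

# With a prior of 0.5 the prior odds are 1, so the posterior depends on LR+ alone.
post_half = posterior_probability(0.5, lr_pos)

# Hypothetical lower prior, for illustration only.
post_low = posterior_probability(0.1, lr_pos)

for prior, post in ((0.5, post_half), (0.1, post_low)):
    decision = "reject" if post > 0.5 else "accept"
    print(f"prior={prior}: posterior={post:.3f} -> {decision}")
```

Starting from 0.5, a positive result gives a posterior of about 0.658 and the species is rejected; starting from the hypothetical prior of 0.1, the same evidence leaves the posterior below 0.5.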
If, as is usually the case, $C_{FP} < C_{FN}$, then the rational starting point will be some value of $p < 0.5$ (as depicted in Figure 2b) which is dependent on the cost ratio $C_{FP}:C_{FN}$. When the false positive and false negative costs are not equal, the appropriate prior probability is $p = \frac{C_{FP}/C_{FN}}{1 + (C_{FP}/C_{FN})}$. However, if one tried to use such a value as the starting point for the proposed risk assessment scheme, one would run into difficulty, since the use of $p = 0.5$ is crucial to the subsequent calculation of the probability risk scale used in the NNRAS. At $p = 0.5$, $\mathrm{odds}_{prior}(v) = 1$, and the prior odds cancel out of the calculations, leaving the final probabilistic risk value dependent only on the data arising from the evaluation. Using any other value would mean that the prior odds did not cancel out of the calculations and one would be forced to include them (footnote 2). Since Holt's scheme does not have any formal means to do this, it would not be possible to calculate the probabilistic risk scores.

Footnote 2. As an aside, we raise the question here of why, if one designs the risk assessment calculations to remove the prior probability, one needs to bother with the Bayesian apparatus for probability updating: basing one's evaluation of risk purely on the experimental evidence is an entirely standard (or frequentist) approach.

The second empirical reason to question the use of $p = 0.5$ as a rational starting point for calculating the odds of invasiveness is that we can determine an empirical prior probability from the data used to construct the scheme, from other sources, or from expert knowledge. In other words, there is no need to make such an uninformative estimate of the prior probability of invasiveness; information to guide the selection of a more informative prior will almost always be available. For example, in the data set used to construct Figure 1, the proportion of known serious or minor weed species (i.e. the cases) was 0.77. So, if we assume that the data set is representative of the population of potential invasive weeds as a whole (footnote 3), $p = 0.77$ is a more rational prior probability of invasiveness than $p = 0.5$. As we have already pointed out, however, with a value other than $p = 0.5$ for the prior, the NNRAS will run into problems since the calculations required to obtain Holt's probabilistic risk scale will not work.

These two reasons to reject Holt's suggested risk calculation are linked to the assumption of equal costs for the two types of error, and the assumption of $p = 0.5$ as a rational prior. As we have demonstrated, neither assumption is justified and the calculation which rests on them is consequently of questionable value. The third reason to question the proposed scheme also relates to the suggested use of $p = 0.5$ as the prior, but concerns the effect this choice has on the apparent power of the method to differentiate invasiveness from non-invasiveness.

3. Information content of predictions from the risk assessment

We will state the issue at hand here first and then describe the underlying, standard results from information theory on which it is based. The use of $p = 0.5$ as the prior probability will exaggerate the apparent information about invasiveness which the scheme supplies; by initiating each evaluation of invasiveness at the point of maximum uncertainty the scheme will be misleading, because it will always lead to a reduction in apparent uncertainty about the invasiveness of an unknown species.

To see this, consider the following hypothetical example. Suppose we tell you we have a device which can predict the probability of invasiveness of unknown species. An independent authority whose opinion you trust completely tells you our device is 100% accurate. Now suppose that the best estimate for the prior probability of invasiveness among a certain group of organisms is 75%.
An unknown plant species is presented for evaluation; you use our device to carry out an evaluation and it predicts "probability of invasiveness = 90%". You already know for this type of species that the probability of invasiveness is 75%, even in the absence of a specific examination of the risk for this species, so you are not very surprised by the prediction. In other words, the device did not supply very much new information beyond what you already had, simply from knowing the prior probability. Conversely, had the prediction been "invasive with probability 10%", given a background in which 75% of similar species are invasive, you would have learned much more (i.e. been supplied with more information, and been more surprised; see footnote 4).

The concept captured in the example has a formal basis in information theory and concerns the quantity expected information, $H$. Expected information is a logarithmic function of probability and is measured in information units (the best known being the bit, which is the unit of measurement when the log function has base 2); for a two-way choice with probability $p$, $H = -p \log_2 p - (1 - p) \log_2 (1 - p)$. The values of $H$ (in bits) for a two-way choice are shown in Figure 3, superimposed on the expected regret graph previously introduced in Figure 2. Looking at Figure 3 it can be seen that the expected information curve reaches its maximum at $p = 0.5$ and decreases as $p$ increases or decreases on either side of 0.5. $H$ can be thought of as a measure of the uncertainty associated with a given probability. It can be seen that if the prior probability is either relatively high or relatively low, then a prediction which moves the probability further towards $p = 1$ in the former case, or $p = 0$ in the latter, tends to confirm what the prior probability already indicates, and reduces uncertainty, but does not supply much additional information. Starting every risk assessment at $p = 0.5$ maximises the amount of apparent information the scheme can provide. The proposed scheme will almost certainly appear to be more useful (i.e.
more informative) than it actually is because every prediction begins from an assumed position of maximum uncertainty, or minimum information, and thus must lead to an apparent reduction in uncertainty by definition.

Footnote 3. This is a necessary assumption to use any decision rule derived from the data to make inferences about the invasiveness of new species.

Footnote 4. As a trivial example, suppose our device predicted rain for the day ahead and you used it to predict the status for "tomorrow" during November on the Mull of Kintyre. A true prediction of rain under such circumstances would be neither surprising nor very impressive, given the underlying probability of being right purely by chance.

Figure 3. Expected regret for a naive Bayes classifier with equal error costs for false positive and false negative decisions, and expected information, H, as functions of probability. Note that H (in bits, right scale) reaches a maximum at p = 0.5.

References

1. Holt, J. 2006. Score averaging for alien species risk assessment: a probabilistic alternative. Journal of Environmental Management 81: 58-62.
2. Holt, J., Black, R., and Abdallah, R. 2006. A rigorous yet simple quantitative risk assessment method for quarantine pests and non-native organisms. Annals of Applied Biology 149: 167-173.
3. Baker, R. H. A., Black, R., Copp, G. H., Haysom, K. A., Hulme, P. E., Thomas, M. B., Brown, N. A., Brown, M., Ray, J. C., Cannon, R. J. C., Ellis, J., Ellis, M., Ferris, R., Glaves, P., Gozlan, R. E., Holt, J., Howe, E., Knight, J. D., MacLeod, A., Moore, N. P., Mumford, J. D., Murphy, S. T., Parrott, T. D., Sansford, C. E., Smith, G. C., St-Hilaire, E. S., Ward, N. L. 2007. The UK risk assessment scheme for all non-native species. Neobiota, Volume 7.
4. Notes on the mathematical basis of the UK Non-Native Organism Risk Assessment Scheme. http://arxiv.org/ftp/arxiv/papers/0804/0804.1443.pdf.
5. Hughes, G., Madden, L. V. 2003. Evaluating predictive models with application in regulatory policy for invasive weeds. Agricultural Systems 76: 755-764.
6. Pheloung, P. C., Williams, P. A., Halloy, S. R. 1999. A weed risk assessment model for use as a biosecurity tool evaluating plant introductions. Journal of Environmental Management 57: 239-251.
7. Smith, C. S., Lonsdale, W. M., Fortune, J. 1999. When to ignore advice: invasion predictions and decision theory. Biological Invasions 1: 89-96.
8. McAusland, C., Costello, C. 2008. Avoiding invasives: trade-related policies for controlling unintentional exotic species introductions. Journal of Environmental Economics and Management 48: 958-977.