LAND ECONOMY WORKING PAPER SERIES
___________________________________________________________________
Number 37: Probability Misspecification Errors in the UK Non-native
Species Risk Assessment Scheme and Possible Economic
Implications
Corresponding Author:
Neil McRoberts
Land Economy & Environment Research Group
SAC Research Division
SAC Edinburgh
EH9 3JG
Tel: 0131-535-4154
Email: neil.mcroberts@sac.ac.uk
Probability Misspecification Errors in the UK Non-native Species Risk
Assessment Scheme and Possible Economic Implications
Neil McRoberts
Land Economy & Environment Research Group, SAC, Edinburgh EH9 3JG, UK
Gareth Hughes
School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3JG, UK
Summary
The published papers by Holt [1] and Holt et al. [2] describing the scientific basis for the current
UK non-native risk assessment scheme (NNRAS) contain several fatal methodological errors:
The papers make a misspecification error in the definition of likelihood ratios: the false
negative proportion is wrongly used in place of the false positive proportion.
The misspecification error is compounded into a series of further problems:
1. Likelihoods are wrongly equated with odds, and the resulting calculations, in which
supposed odds are multiplied together, are mathematically meaningless;
2. The misspecification error appears to result from a lack of understanding of the
algebra of conditional probability. In addition to the misspecification problem, this
lack of understanding also allows conditionality to be ascribed to the data seemingly
deus ex machina rather than by reference to a known source of information,
e.g. a gold standard;
3. Without reference to a gold standard and a resulting classification of existing data,
a constant prior probability of invasiveness of 0.5 is attributed to every test. This
value is arithmetically necessary to resolve the erroneous calculation of odds
proposed in the NNRAS but: (a) is not supported by the available evidence; (b) is
overly uninformative for many test species; and (c) gives the NNRAS an apparent
capacity to provide information which it does not possess mathematically.
4. Furthermore, a prior probability of 0.5 is used in the published calculations so
that it cancels from the calculation of posterior odds and the final estimate of
the 'odds' of invasiveness depends only on the test data. This being the case, a
Bayesian framework is not needed, since there is no process of probability
updating and the calculated risk scale is effectively a classical summary statistic,
embellished with unnecessary and meaningless probability calculations.
There is no simple means to perform a calculation of expected economic performance for
the NNRAS, because the authors did not carry out (or at least do not provide) a proper
case/control analysis of their risk calculations. If the misspecification error is included in
expected cost calculations, using representative estimates for costs and error rates from
the literature, the analysis suggests that there is a very low probability threshold of
invasiveness above which the least expected cost action for the NNRAS is simply to
assume that all test species are invasive without carrying out a risk assessment and deny
entry. The practical consequence of this would be that the scheme would be self-defeating
(since it would almost always be cheapest simply to assume that a new species
was invasive without bothering to test it), overly restrictive to trade and therefore likely to
be challenged under international trade legislation. The current scheme would be difficult
to defend against such a challenge since it is not transparent and cannot be
defended economically.
Introduction
A UK strategy to deal with threats from the importation of unwanted invasive species was
launched in May 2008 (http://www.scotland.gov.uk/News/Releases/2008/05/28101303). Part
of the strategy is based on a risk assessment scheme developed under a DEFRA research
contract and peer-reviewed by the consultants RPS following a competitive tendering process
administered by SEERAD (now Scottish Government RERAD). Specific methodological
components of the risk assessment scheme, which deal with calculation of the odds of an unknown
species being invasive, have been published in peer-reviewed journals [1,2]. The risk assessment
scheme has also recently been summarized in peer-reviewed form [3]. We point out that the fact
that the scheme has passed peer review does not guarantee it to be free from errors. In
fact, given that it certainly does contain errors, all that can be said is that it (i.e. the scheme)
provides one piece of evidence that the peer review processes of two journals and one consulting
agency are not perfect; no-one involved in scientific research should be surprised by this.
However, given the need for transparency in the evidence base used for policy formulation, we
must question whether it can ever be acceptable for work which is known to be flawed to
continue to be used as the basis for enacting policy. There is even less reason for this to continue
when the changes required to correct the flaws have been pointed out and the necessary
corrections placed in the public domain [4, and this paper].
Hughes [4] gives a detailed technical description of the errors and some suggestions for
calculating a more informative overall risk score based on the same data as the flawed UK NNRAS
method. Here, we provide some further comments on the important errors in the current
scheme and show: (a) that it leads to difficulties in establishing appropriate error rates and
expected error costs; and (b) that the most logical error rates and costs which can be derived
from the published scheme are opposite to the correct versions. This has several potentially
serious implications since: (i) the costs of clean-up following the wrongful importation of an
invasive species are typically orders of magnitude greater than the losses to the economy from
wrongful denial of entry to a non-invasive species; and (ii) the costs of clean-up for an invasive
species are usually public costs, while the losses from non-importation are mostly private costs.
We also discuss an inefficiency in the suggested approach for using the risk assessment scheme
which ignores useful available information about the prior (or background) probability of
invasiveness and may bias the scheme towards apparently generating useful information more often
than is rationally justified by the data. This issue would be particularly serious for potential
users of the scheme who are not well enough.
The remainder of this paper is divided into three sections, each dealing with an identified
problem with the current UK NNRAS. The first section considers the underlying technical
problem of misspecification errors in the conditional probabilities used in the scheme, and we
diagnose the likely sources of confusion that led to those errors. In the second section we
examine some potential economic consequences of any attempt to use the current scheme within
a formal decision-making framework. Thirdly, we highlight the background theory which
supports our assertion that the current scheme is systematically biased towards providing more
apparent evidence about invasiveness than will normally be justified by available evidence.
1. Updating probabilities of invasiveness using data from risk assessment scoring
Definitions and notation
Gold standard: An independent assessment of the risk status of each species which is taken to be
the true evaluation of its risk status, and against which the performance of the potential risk
assessment scheme is assessed.
Conditional probability: The probability of some event or state of affairs ($v$, say) given that
another event/state of affairs ($r$, say) is true, written $P(v \mid r)$; $\neg v$ (or equivalently $\bar{v}$) signifies
the negation of event $v$. So, for example, if $v$ stands for "is invasive" and $r$ stands for
"predicted to be invasive", $\neg v$ would stand for "not invasive" and $P(\neg v \mid \neg r)$ would be read as "the
probability of being not invasive given a prediction of non-invasiveness".
Likelihoods: Likelihoods are conditional probabilities (see above). In statistical theory they are
most commonly encountered in hypothesis testing and model fitting, where the term likelihood
typically refers to the probability of the data given the truth of an hypothesis. In the current
context they will occur most commonly as probabilities of species being either invasive or
non-invasive given the result of a risk assessment. A likelihood ratio (LR) is simply the result of
dividing one likelihood by another.
Odds: The odds of an event or state of affairs is a function of the probability of the event such
that $\mathrm{odds}(v) = P(v)/(1 - P(v))$. The inverse relationship, which expresses probabilities as
functions of odds, is $P(v) = \mathrm{odds}(v)/(1 + \mathrm{odds}(v))$.
Bayesian updating: The odds of uncertain events change as new evidence accumulates. Bayes'
theorem (see footnote 1) provides a formal way to perform the updating process. The theorem can be written
in a number of ways, but a useful form in the current context is shown in equation 1:
$$\mathrm{odds}(v)_{\mathrm{posterior}} = \mathrm{LR} \times \mathrm{odds}(v)_{\mathrm{prior}} \tag{1}$$
Equation 1 states that the odds of $v$ after (posterior to) some new evidence has been accumulated
is equal to the product of the likelihood ratio associated with the evidence and the odds of
the event prior to the evidence being available. Here, the prior and posterior odds refer to the
possibility that a candidate species for import is invasive, while the evidence is the result of
performing a risk assessment on the species.
Defining likelihood ratios from an evidence base
Consider Figure 1, which is adapted from Figure 1 in Hughes & Madden [5]. The figure is a
case/control plot which shows the distributions of test scores for plant species included in the
Australian weed risk assessment database, previously published by Pheloung et al. [6]. The
horizontal axis (WRA) shows the values for a weed risk index derived from the individual risk
components. Note that WRA here is equivalent to the term $s_i$ [1,2]. Increasing values of WRA
indicate an increasing probability that a species presents a risk of being a weedy invasive (such
as Japanese Knotweed, for example). There are two frequency distributions in Fig. 1, one for
species known to be invasive weeds (the cases) and the other for species known not to be
invasives (the controls). These classifications into invasive and non-invasive are made by reference to
an independent gold standard (this could be expert opinion, known historical fact or some other
reference which is independent of the risk assessment scheme). Note further that the ranges of
the scores for invasives and non-invasives show considerable overlap, so that there is no single
threshold WRA value such that all invasives have scores equal to or greater than this value and
all non-invasives have scores less than this value. Hughes & Madden [5] discuss the statistical
methods used to select a WRA threshold value which is optimised according to various different
criteria. Hereafter we will use T to refer to the threshold score generically. In Fig. 1 a
threshold value of T = 4 is indicated by the dashed vertical line. Note that most, but not all,
invasives have scores greater than 4 and most, but not all, non-invasives have scores less than 4.
1. Bayes' theorem (credited posthumously to the Rev. Thomas Bayes (1702-1761)) is typically written as
$\Pr(a \mid b) = \Pr(b \mid a)\,\Pr(a)/\Pr(b)$,
in which $\Pr(b \mid a)$ is the likelihood, $\Pr(a)$ is the prior probability of $a$, and $\Pr(b)$ is the
probability of the data.
Figure 1. Distributions of weed risk assessment scores for invasive (squares, dotted line) and non-invasive
(circles, solid line) species, based on the data of Pheloung et al. [6] as analysed by Hughes & Madden [5]. A
threshold WRA score T = 4 is indicated by the dashed vertical line.
Now, having imposed a threshold, T, we can use the frequency distributions for invasives and
non-invasives to specify four further quantities, as indicated in Fig. 1:
1. The True Positive Proportion (TPP) is the proportion of invasives (cases) with WRA
scores greater than or equal to T .
2. The False Negative Proportion (FNP) is the proportion of invasives (cases) with WRA
scores less than T .
3. The False Positive Proportion (FPP) is the proportion of non-invasives (controls) with
WRA scores greater than or equal to T .
4. The True Negative Proportion (TNP) is the proportion of non-invasives (controls) with
WRA scores less than T .
Since TPP and FNP together account for all of the invasives, TPP + FNP = 1, and, similarly,
TNP + FPP = 1, since these two proportions account for all of the non-invasives. Each of the
proportions is an estimate of a likelihood, and therefore (as defined above) a conditional
probability. For example, the TPP is an estimate of the probability that a specimen will have a WRA
score greater than or equal to T, given that it is invasive; i.e. $\mathrm{TPP} \approx P(\mathrm{WRA} \ge T \mid v)$.
Similarly, $\mathrm{FPP} \approx P(\mathrm{WRA} \ge T \mid \neg v)$ (i.e. FPP is an estimate of the probability that a specimen will
have a WRA score greater than or equal to T given that it is a non-invasive). Together these two
likelihoods define the likelihood ratio for a positive risk assessment (i.e. positive in the sense that it
produces a value greater than or equal to the selected threshold), as shown in equation 2:
$$\mathrm{LR}^{+} = \frac{\mathrm{TPP}}{\mathrm{FPP}} = \frac{\mathrm{TPP}}{1 - \mathrm{TNP}} = \frac{P(\mathrm{WRA} \ge T \mid v)}{P(\mathrm{WRA} \ge T \mid \neg v)} \tag{2}$$
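The four proportions and the positive likelihood ratio of equation 2 can be estimated directly from gold-standard-classified scores. The following Python sketch illustrates this under stated assumptions: the scores are invented for illustration (they are not the data of Pheloung et al. [6]) and the function names are hypothetical.

```python
def proportions(case_scores, control_scores, threshold):
    """Estimate TPP, FNP, FPP and TNP for the rule 'score >= threshold is positive'.

    case_scores: scores of species the gold standard classes as invasive.
    control_scores: scores of species the gold standard classes as non-invasive.
    """
    tpp = sum(s >= threshold for s in case_scores) / len(case_scores)
    fpp = sum(s >= threshold for s in control_scores) / len(control_scores)
    return {"TPP": tpp, "FNP": 1.0 - tpp, "FPP": fpp, "TNP": 1.0 - fpp}


def lr_positive(tpp, fpp):
    """Equation 2: LR+ = TPP / FPP (undefined when FPP = 0)."""
    return tpp / fpp


# Hypothetical scores, assumed for illustration only.
cases = [2, 5, 6, 7, 8, 9, 10, 12]       # invasives
controls = [-3, -1, 0, 1, 2, 3, 5, 6]    # non-invasives

props = proportions(cases, controls, threshold=4)
lr_plus = lr_positive(props["TPP"], props["FPP"])
print(props, lr_plus)
```

Note that the conditioning is supplied entirely by the gold standard: the cases and controls are separated before any proportion is calculated.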
Now, Holt [1] and Holt et al. [2] correctly specify this likelihood ratio in terms of conditional
probabilities (for example, equation 2 in [1] is correct and can be compared directly with
equation 2, above), but they use the wrong quantity for the FPP. We describe the mistake in
more detail at the end of this section, but here the important point to note is that equation 2,
above, is the correct form, and that what Holt [1,2] actually calculate is shown below as equation 3:
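$$\mathrm{LR}^{+}_{\mathrm{Holt}} = \frac{P(\mathrm{WRA} \ge T \mid v)}{P(\mathrm{WRA} < T \mid v)} = \frac{\mathrm{TPP}}{1 - \mathrm{TPP}} = \frac{\mathrm{TPP}}{\mathrm{FNP}} \tag{3}$$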
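To give a sense of the size of the discrepancy between equations 2 and 3, the short Python sketch below evaluates both using the error proportions reported later in this paper for the weed risk assessment data at T = 4 (FPP = 0.488, FNP = 0.063 [5]). The variable names are ours and the calculation is illustrative only; it is not taken from the NNRAS papers.

```python
# Error proportions reported for the weed risk assessment data at T = 4 [5].
FPP = 0.488
FNP = 0.063
TPP = 1.0 - FNP            # since TPP + FNP = 1

lr_plus_correct = TPP / FPP   # equation 2: TPP / FPP
lr_plus_holt = TPP / FNP      # equation 3: TPP / FNP (the misspecified form)

print(round(lr_plus_correct, 2), round(lr_plus_holt, 2))
```

On these figures the misspecified ratio is several times larger than the correct one, so a positive assessment would appear to carry far more evidential weight than the data support.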
Referring back to Fig. 1 it is clear that this misspecification error in the NNRAS [1,2] will
introduce systematic errors into any risk assessment which is based on their treatment of conditional
probabilities. To highlight this, consider the effect on the quantities generated by equation 2
(the correct form) and equation 3 (the incorrect form) as we vary the threshold T over the
range of possible WRA scores.
Reducing the threshold will increase TPP and FPP, but decrease FNP. Thus, the effect of
decreasing the threshold on the outcome of equation 2 is unclear and will depend on the specific
way in which the distributions of invasives and non-invasives vary over the range of WRA.
However, while the exact effect of decreasing T on the true LR+ is uncertain, it will eventually
lead to a value of LR+ = 1, since TPP = 1 and FPP = 1 if T is set at its minimum possible
value. Referring back to equation 1, we can see that this makes sense. If T is set to its
minimum value, all of the cases and all of the controls will have WRA values greater than or equal to T and
hence the threshold will provide no discrimination between invasive and non-invasive species. In
this case we would expect the odds of a species being invasive not to be changed by using the
test, and the value of LR+ = 1 in equation 1 would ensure that this is the case.
In contrast to the situation for the correct LR+, it is straightforward to see the implications of
the incorrect version, LR+Holt, given by Holt [1]. This will increase if we lower the
threshold, since as T gets smaller, TPP increases and therefore FNP decreases (recall that
FNP = 1 - TPP). The converse effects are produced if we set higher thresholds by moving T to
the right, increasing the risk threshold. As the risk threshold is increased TPP reduces and
FNP increases and thus LR+Holt decreases. This is opposite to what is required. If the risk
score is positively correlated with the odds that an unknown species is invasive, other things
being equal, we would anticipate that obtaining a risk score greater than or equal to a high
threshold would indicate a high odds of the species in question being invasive; the data in Fig. 1
display this characteristic. Accordingly, as we increase the value of T, we would expect LR+ to
increase so that equation 1 will reflect the intuitive result just described. In the correct
specification of LR+ this is what happens because FPP decreases more quickly than TPP as the
threshold increases until, with T set at its maximum possible value, both proportions equal 0.
At the maximum possible value of the threshold, LR+ collapses to 0 and no specimen
is classed as invasive.
To reiterate, LR+Holt will decrease as T increases so that for increasingly stringent tests of
invasiveness, equation 1 will make less and less difference to the odds of a species which tests
positive being classed as invasive; this is clearly opposite to what is required of a rational risk
assessment scheme in which exceeding a higher risk threshold ought to provide more evidence of
a test species being invasive.
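The opposing behaviour of the two ratios as the threshold is raised can be illustrated with a small Python sketch. The scores below are hypothetical (chosen so that cases tend to score higher than controls, as in Fig. 1) and the function name is an assumption; with other data the correct LR+ need not rise at every step, whereas TPP/(1 - TPP) can never rise as T increases, because TPP can only fall or stay the same.

```python
def tpp_fpp(cases, controls, threshold):
    """Return (TPP, FPP) for the rule 'score >= threshold means positive'."""
    tpp = sum(s >= threshold for s in cases) / len(cases)
    fpp = sum(s >= threshold for s in controls) / len(controls)
    return tpp, fpp


# Hypothetical scores, assumed for illustration only.
cases = [1, 3, 4, 6, 7, 8, 9, 10, 11, 12]       # gold-standard invasives
controls = [-4, -2, -1, 0, 1, 2, 3, 4, 5, 7]    # gold-standard non-invasives

thresholds = [2, 4, 6]
lr_correct = []
lr_holt = []
for t in thresholds:
    tpp, fpp = tpp_fpp(cases, controls, t)
    lr_correct.append(tpp / fpp)          # equation 2: TPP / FPP
    lr_holt.append(tpp / (1.0 - tpp))     # equation 3: TPP / FNP

for t, correct, holt in zip(thresholds, lr_correct, lr_holt):
    print(f"T = {t}: LR+ = {correct:.2f}, LR+Holt = {holt:.2f}")
```

For these assumed scores the correct ratio rises with each increase in T while the misspecified ratio falls, which is the pattern described in the text.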
It should be apparent that the methodology for updating odds based on evidence is dependent
on the availability of the independent gold standard classification of species in the database as
either invasive or non-invasive. All of the calculations of conditional odds are made relative to this
classification. Indeed, the gold standard classification is the conditional information on which
the entire Bayesian methodology for risk evaluation and updating is based. As Hughes [4]
pointed out, in the empirical examples discussed in the NNRAS [1,2] this conditional
information is either absent or not used in constructing the scheme (conditionality being attributed
apparently deus ex machina). However, the authors make informal use of the agreement
between their risk scores and the available gold standard (expert judgement in this case) as part
of the evidence in support of their claim that the proposed NNRAS is superior to risk score
averaging (see, for example, Table 3 in [1]).
Probable sources for the misspecification error
Having outlined one of the issues arising from the misspecification error, we now consider the
question of how such an error could occur. The misspecification error in the
NNRAS apparently stems from a confusion on the part of its authors between conditional and
unconditional probabilities. This mistake itself might have arisen from their failure to use the
gold standard evidence at their disposal to impose the conditioning needed to use the Bayesian
apparatus they employ. What follows is a diagnosis of the likely source of their mistake.
Consider the following statement:
(a) The probability of obtaining score $s_i$ for an unknown species is $p_i$.
If statement (a) is true then statements (b), (c) and (d) are also true:
(b) The probability of not obtaining score $s_i$ for risk component i for an unknown species is $(1 - p_i)$.
(c) $p_i + (1 - p_i) = 1$.
(d) The odds of obtaining score $s_i$ for an unknown species is $p_i/(1 - p_i)$.
Note that the probabilities $p_i$ and $(1 - p_i)$ are not conditional on any other information. As
unconditional statements of probability, statements (a) and (b) are true and we can interpret $p_i$
as the unconditional prior probability of obtaining score $s_i$ from the proportions of invasive and
non-invasive species in the available data. It seems likely that it is a misunderstanding of how
statements (a) and (b) should be adapted to take account of conditionality on other information
that led to the misspecification error in the NNRAS. Thus, Holt [1] (p59, in the paragraph
between equations 1 and 2) starts with statement (e), below, and then wrongly applies the
arithmetic linking statements (a) and (b) to generate statement (f) (note that statement (f) is
false), allowing a value (wrongly) to be calculated for the probability in statement (g):
(e) The probability that risk component i has score $s_i$, given that the species concerned
poses a risk is $P(s_i \mid v)$.
(f) $P(s_i \mid v) + P(s_i \mid \neg v) = 1$, or equivalently $P(s_i \mid \neg v) = 1 - P(s_i \mid v)$.
(g) The probability that risk component i has score $s_i$, given that the species concerned
does not pose a risk is $P(s_i \mid \neg v)$.
The correct set of deductions starting with statement (e) is:
(e) The probability that risk component i has score $s_i$, given that the species concerned
poses a risk is $P(s_i \mid v)$.
(h) The probability that risk component i does not have score $s_i$, given that the species in
question poses a risk is $P(\neg s_i \mid v) = 1 - P(s_i \mid v)$.
(i) The probability that risk component i has score $s_i$, given that the species concerned
does not pose a risk is $P(s_i \mid \neg v)$.
(j) The probability that risk component i has score $s_i$, given that the species concerned
does not pose a risk is one minus the probability that risk component i is not given score
$s_i$, given that the species concerned does not pose a risk:
$P(s_i \mid \neg v) = 1 - P(\neg s_i \mid \neg v)$.
Comparing statements (f) and (g) with statements (h) and (j) shows where the mistake in the
NNRAS arises. The arithmetic of the unconditional probabilities applies to the elements on the
left-hand side of the bar in the conditional probabilities; Holt [1] mistakenly applies it to the
elements on the right-hand side and in so doing attempts to add pieces of two completely separate
probability distributions together and make them add up to 1.
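A small numerical check makes the distinction between statements (f), (h) and (j) concrete. The Python sketch below uses a hypothetical cross-classification of species by score (has score $s_i$ or not) and gold-standard status (invasive or not); the counts and names are assumptions for illustration only.

```python
# Hypothetical counts of species, keyed by (score outcome, gold-standard status).
counts = {
    ("s", "v"): 30, ("not_s", "v"): 10,          # invasive species
    ("s", "not_v"): 12, ("not_s", "not_v"): 48,  # non-invasive species
}


def cond_prob(score, status):
    """P(score | status): condition on the gold-standard status."""
    column_total = counts[("s", status)] + counts[("not_s", status)]
    return counts[(score, status)] / column_total


p_s_v = cond_prob("s", "v")
p_not_s_v = cond_prob("not_s", "v")
p_s_not_v = cond_prob("s", "not_v")
p_not_s_not_v = cond_prob("not_s", "not_v")

sum_h = p_s_v + p_not_s_v           # statement (h): sums to 1
sum_j = p_s_not_v + p_not_s_not_v   # statement (j): sums to 1
sum_f = p_s_v + p_s_not_v           # statement (f): need not sum to 1
implied_by_f = 1.0 - p_s_v          # value statement (f) would assign to P(s | not v)

print(sum_h, sum_j, sum_f, implied_by_f, p_s_not_v)
```

In this example the complement rule holds within each gold-standard class, but the two probabilities in statement (f) sum to 0.95, and the value that (f) implies for $P(s_i \mid \neg v)$ (0.25) differs from the value obtained from the data (0.2).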
Hughes [4] also notes that Holt [1] and Holt et al. [2] attempt to multiply odds for separate risk
components together in a way which is not consistent with the laws of probability. Reference to
statement (d), and the apparent misunderstanding of how the arithmetic of probability applies
to conditional probabilities reveals how this second type of error probably arose.
First, recall that an odds is a ratio of two probabilities; $p/(1 - p)$, for example. Next, recall that
a likelihood ratio is also a ratio of two probabilities. For example, as noted in equation 2 above,
the positive likelihood ratio, LR+, is defined as $P(s_i \ge T \mid v)/P(s_i \ge T \mid \neg v)$. As we have just
shown, the misspecification error in the NNRAS defines LR+Holt as
$P(s_i \ge T \mid v)/(1 - P(s_i \ge T \mid v))$; i.e. it leads to misinterpretation of likelihood ratios as odds. Finally, we note
that Holt [1] and Holt et al. [2] correctly believe that likelihood ratios can be multiplied
together: the overall likelihood ratio which results from applying a series of tests (or
assessments) is the product of the individual likelihood ratios of the tests. However, proper likelihood
ratios are not odds. While likelihood ratios can be directly multiplied, odds cannot. It appears
that because the authors of the NNRAS wrongly believe that likelihood ratios are odds, while
rightly believing that likelihood ratios can be multiplied, they wrongly believe odds can be
multiplied. To obtain the posterior odds from a series of tests one must use equation 1 and multiply
the overall likelihood ratio by the prior odds to obtain the posterior odds.
Thus, a mistaken identification of likelihood ratios as odds, wrongly combined with the true
knowledge that likelihood ratios can be multiplied together, gives rise to equations 3 and 4 in
[1], which are simply mathematically meaningless. In addition to making the mathematical
basis for the NNRAS unsound, the misspecification error is also likely to lead to errors in
estimating the economic consequences of using the scheme in practice, as discussed in the next section.
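The correct way of combining several assessments, as described above, can be sketched in Python as follows. The likelihood ratios are hypothetical values, the function name is an assumption, and the multiplication of likelihood ratios is taken to be valid only when the individual tests are conditionally independent given the true status of the species (an assumption we state explicitly here).

```python
import math


def posterior_odds(prior_prob, likelihood_ratios):
    """Equation 1 applied to a series of tests: product of LRs times prior odds."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    return math.prod(likelihood_ratios) * prior_odds


# Hypothetical likelihood ratios for three risk components.
lrs = [2.0, 1.5, 0.8]
combined_lr = math.prod(lrs)

odds_half = posterior_odds(0.5, lrs)    # prior odds of 1: the prior cancels
odds_tenth = posterior_odds(0.1, lrs)   # an informative prior changes the answer

print(combined_lr, odds_half, odds_tenth)
```

With a prior probability of 0.5 the prior odds equal 1, so the posterior odds are numerically identical to the combined likelihood ratio and no updating of prior information takes place; with any other prior the two quantities differ.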
2. Probable economic consequences from using the NNRAS
Referring back to Figure 1 we recall that, with respect to any risk threshold we impose and
given the existence of a gold standard, it is possible to define four conditional probabilities: the
true positive proportion (TPP), the false negative proportion (FNP), the true negative
proportion (TNP) and the false positive proportion (FPP). Two of these, FPP and FNP, can be
thought of as the long-run expected error rates for using the risk assessment scheme and will be
important elements in any attempt to calculate expected costs for the scheme. False positives
are species which are classed as posing an invasive risk by the risk assessment scheme, but which
are in fact non-invasive. False negatives are, conversely, species which are classed as not
posing a risk but which actually are invasive.
It is apparent that if we reduce the threshold risk value we will decrease the numbers of false
negatives at the expense of increasing the numbers of false positives. At the same time the
proportion of true positives identified will increase and the proportion of true negatives decrease.
Since perfect discrimination of invasives and non-invasives is impossible in practice, the question
arises of how to operate the risk assessment scheme so as to reach some optimum balance in the
trade-off between the false positive and false negative errors. This point is discussed at some
length by Hughes & Madden [5] in relation to the case/control methodology introduced above.
Here we present an alternative (and equivalent) discussion of the issue using a simple expected
cost approach. We will demonstrate that the misspecification error in the current NNRAS will
make any attempt to find an economic optimum operating point for it problematic, and that
any calculation of expected costs based on the current scheme will be prone to invert the false
positive and false negative costs.
Definitions and notation
Expected value: The expected value for an event is the value obtained when the event occurs
multiplied by the probability that the event occurs. For events with several mutually exclusive
possible outcomes the expected value is the sum of the products of the values of the individual
outcomes and their probabilities. For example, suppose that a fair coin is to be tossed and you
will lose 1.00 if it lands heads and 0.00 if it lands tails. Let $p$ be the probability that the coin
lands heads up and $(1 - p)$ that it lands tails up. The expected value (loss) of the coin toss is the
value of getting heads multiplied by the probability of getting heads, plus the value of getting
tails multiplied by the probability of getting tails. If the coin is fair $p = (1 - p) = 0.5$, so the
expected value of the coin toss is (0.5 × 1.00) + (0.5 × 0.00) = 0.50.
Regret: The regret associated with a decision is the difference between the value obtained from
the choice which is made and the value of the best possible choice which could have been made
with perfect knowledge. In the preceding example no choice was included; the expected value of
the toss was simply dependent on the outcome. Suppose we introduce the possibility of picking
in advance either heads or tails. In this more sophisticated experiment we will assume that we
lose 1.00 if we pick wrongly and lose 0.00 if we pick correctly. There are now four possibilities
as shown in Table 1.
Table 1. Hypothetical losses associated with guessing a coin toss
          Coin lands
Guess     Heads    Tails
Heads     0.00     1.00
Tails     1.00     0.00
Because the coin is fair and we have no way of predicting the outcome of the tosses, one rational
strategy is to guess heads half the time and tails half the time, selecting which to guess at
random. Since the coin toss and our guesses are independent the probability of each of the four
outcomes is 0.5 × 0.5 = 0.25. The expected regret from this experiment is found by adding up
the four elements in Table 2.
Table 2. Hypothetical expected regret associated with guessing a coin toss
          Coin lands
Guess     Heads                 Tails
Heads     (0.5 × 0.5 × 0.00)    (0.5 × 0.5 × 1.00)
Tails     (0.5 × 0.5 × 1.00)    (0.5 × 0.5 × 0.00)
For completeness we note that the expected regret for the second example is 0.5, and that this
would also be the case if we followed a strategy of always calling heads or always tails.
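The coin-guessing calculation can be checked with a few lines of Python. This is a minimal sketch: the function name and the representation of the loss table as a dictionary are our own assumptions, and the losses are those of Table 1.

```python
# Losses from Table 1, keyed by (guess, outcome of the toss).
loss = {
    ("H", "H"): 0.0, ("H", "T"): 1.0,
    ("T", "H"): 1.0, ("T", "T"): 0.0,
}


def expected_regret(p_guess_heads, p_coin_heads=0.5):
    """Sum of probability x loss over the four guess/outcome combinations.

    Guess and toss are assumed independent, so joint probabilities multiply.
    """
    guess = {"H": p_guess_heads, "T": 1.0 - p_guess_heads}
    coin = {"H": p_coin_heads, "T": 1.0 - p_coin_heads}
    return sum(guess[g] * coin[c] * loss[(g, c)] for g in guess for c in coin)


random_guessing = expected_regret(0.5)   # guess heads half the time
always_heads = expected_regret(1.0)
always_tails = expected_regret(0.0)
print(random_guessing, always_heads, always_tails)
```

All three strategies give the same expected regret for a fair coin, in line with the remark above.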
Expected regret for risk assessment schemes
In order to highlight the potential problems with the current scheme we will describe the
calculation of expected regret for well-founded schemes before illustrating how the misspecification
errors in the current scheme would be likely to lead to problems.
We define the cost of false negatives as $C_{FN}$ and the cost of false positives as $C_{FP}$. The cost of
false negatives is typically significantly higher than for false positives (e.g. Smith et al. [7] give a
ratio of 15:1 for weedy species; a ratio which they consider to be conservative). We scale all
costs such that $C_{FN} = 1$. Assume that there is a prior probability, $p$, that any unknown
species is invasive. Now, consider a tactic in which we perform no risk assessments, assume that no
species poses an invasive risk, and allow all species to be imported. The expected regret from
this tactic is $p\,C_{FN}$. Similarly, the expected regret for a tactic in which we perform no risk
assessments but assume that all species are invasive and deny import to all species is
$(1 - p)\,C_{FP}$. Although it is unlikely in practice, for illustration, suppose that $C_{FP} = C_{FN}$. The
expected regret associated with assuming no risk will increase as the prior probability, $p$, of a
species being invasive increases, and the expected regret associated with assuming that all
species pose a risk will decrease. The expected regret of these two options will be equal when
$p = (1 - p) = 0.5$. The expected regret lines for these two sets of assumptions are shown in Figure
2a. The risk management strategy with lowest expected regret under these conditions is to
follow the "don't test, assume non-invasive" tactic for $p \le 0.5$ and the "don't test, assume
invasive" tactic for $p > 0.5$. At values of $p > 0.5$, the upper portion of the "don't test, assume
non-invasive" tactic is excluded by the lower expected regret of the tactic which assumes that all
species are invasive. The opposite condition excludes the use of the "don't test, assume invasive"
tactic for values of $p \le 0.5$. This strategy is known as a naive Bayes classifier.
Figure 2. Expected regret graphs for different predictors of invasiveness. (a) A naive Bayes predictor for a cost
ratio = 1. (b) The naive predictor from (a) compared with a naive predictor which takes account of the cost ratio
between false positives and false negatives suggested by Smith et al. [7] (cost ratio = 0.067). (c) A predictor with
the cost ratio as in (b) with the likelihood ratio correctly defined and taken from the analysis of the probability
of weediness in the data set of Pheloung et al. [6] by Hughes & Madden [5] (FPP = 0.488, FNP = 0.063). (d)
The same analysis as shown in (c) but using the incorrect specification of likelihoods given by Holt [1,2]. Note
that the scaling on the vertical axis is different in (a) than in (b), (c) and (d). Explanation of the probabilities
marked by arrows is given in the text.
As already noted, however, the costs of false negatives are typically much greater than for false
positives. Using $C_{FP} = \tfrac{1}{15} C_{FN}$ [7], the expected regret for a no-test strategy
incorporating unequal costs is shown in Figure 2b. Note that in comparison with the situation in which
the costs are equal, the overall expected regret is lower because the cost of false positives is
lower than the cost of false negatives. Also, note that the prior probability at which the
decision maker should switch tactics (from assuming that no species are invasive to assuming
that all species are invasive) is much lower than in the previous case; with the unequal cost ratio
of 15:1, the threshold is at $p_{t1}$ = 0.0625.
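The switching point between the two no-test tactics follows from setting their expected regrets equal, $p\,C_{FN} = (1 - p)\,C_{FP}$, which gives $p = C_{FP}/(C_{FP} + C_{FN})$. The Python sketch below evaluates this for the equal-cost case and for the 15:1 ratio; the function names are hypothetical and the code is an illustration of the arithmetic rather than part of any published analysis.

```python
def switch_threshold(c_fp, c_fn):
    """Prior probability p at which p * c_fn equals (1 - p) * c_fp."""
    return c_fp / (c_fp + c_fn)


def best_no_test_tactic(p, c_fp, c_fn):
    """Return the no-test tactic with the lower expected regret at prior p."""
    regret_allow_all = p * c_fn          # assume non-invasive, allow all imports
    regret_deny_all = (1.0 - p) * c_fp   # assume invasive, deny all imports
    if regret_allow_all <= regret_deny_all:
        return "assume non-invasive"
    return "assume invasive"


equal_costs = switch_threshold(1.0, 1.0)          # C_FP = C_FN
unequal_costs = switch_threshold(1.0 / 15, 1.0)   # C_FP = C_FN / 15

print(equal_costs, unequal_costs)
print(best_no_test_tactic(0.05, 1.0 / 15, 1.0))
print(best_no_test_tactic(0.10, 1.0 / 15, 1.0))
```

With equal costs the switch occurs at 0.5, and with the 15:1 ratio at 1/16 = 0.0625, matching the thresholds quoted in the text.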
Neither of the first two examples includes an evidence-based risk assessment. The expected
regret from using an evidence-based risk assessment scheme (such as the NNRAS) is a function
of the prior probability that a species poses an invasive risk, and the false positive and false
negative probabilities and $C_{FP}$ and $C_{FN}$ of the scheme; this mirrors the structure of the second coin
tossing example, above. In the example shown in Figure 1 for the weed risk assessment scheme,
at T = 4, FPP = 0.488 and FNP = 0.063 [5]. The expected regret for this risk assessment
scheme is given in equation 4, and displayed in Figure 2c along with the previous examples.
$$ \bigl((1 - p)\,(\mathrm{FPP} \times C_{FP})\bigr) + \bigl(p\,(\mathrm{FNP} \times C_{FN})\bigr) \qquad (4) $$
In Figure 2c two threshold prior probabilities, p t1 and p t2 are indicated. These are the values
of p at which the expected regret for two different tactics is equal and, as before, indicate
threshold probabilities at which the decision maker should switch tactics in order to operate the
risk assessment scheme with the lowest long term expected regret. At prior probabilities up to
and including p t1 the least regret tactic is to assume that there is no risk and allow access to all
species without testing. From p t1 and up to p t2 (inclusive) the decision maker should operate
the risk assessment scheme at the threshold score indicated in Figure 1, while above p t2 the best
option (i.e. the one with the lowest long-run expected regret) is to assume that all species pose
a risk and avoid imports without testing. The values of the thresholds in the example are p t1 =
0.034 and p t2 = 0.353.
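The two thresholds can be recovered by equating the expected regret of the scheme with that of each no-test tactic. The sketch below assumes the regret of testing is (1 - p) × FPP × C FP + p × FNP × C FN, with C FN scaled to 1 and C FP = 0.067 (the cost ratio quoted in the caption of Figure 2); the function names are hypothetical.

```python
def scheme_regret(p, fpp, fnp, c_fp, c_fn):
    """Expected regret of operating the risk assessment scheme at prior p."""
    return (1.0 - p) * fpp * c_fp + p * fnp * c_fn


def switching_thresholds(fpp, fnp, cost_ratio):
    """Return (p_t1, p_t2) for a scheme with the given error proportions.

    cost_ratio is C_FP / C_FN.
    p_t1: testing and 'admit all' have equal regret,
          p * C_FN = scheme_regret  ->  odds = cost_ratio * FPP / (1 - FNP)
    p_t2: testing and 'exclude all' have equal regret,
          (1 - p) * C_FP = scheme_regret  ->  odds = cost_ratio * (1 - FPP) / FNP
    """
    odds_t1 = cost_ratio * fpp / (1.0 - fnp)
    odds_t2 = cost_ratio * (1.0 - fpp) / fnp
    return odds_t1 / (1.0 + odds_t1), odds_t2 / (1.0 + odds_t2)


# Error proportions at T = 4 for the weed risk assessment example [5].
p_t1, p_t2 = switching_thresholds(fpp=0.488, fnp=0.063, cost_ratio=0.067)
print(round(p_t1, 3), round(p_t2, 3))
```

Under these assumptions the rounded values agree with the thresholds quoted in the text.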
Having described the basics required for an analysis of expected regret we can now deal with the
implications for such an analysis arising from the misspecified likelihoods in the NNRAS. As a
preliminary to examining these possible problems, we emphasise that the authors of the NNRAS
did not carry out an analysis of expected values, and so what follows is an analysis of the problems implied by their misspecification errors.
Recall that the positive likelihood ratio for a test is properly defined as TPP/FPP, and that it
is incorrectly defined in the NNRAS [1,2] as TPP/FNP. The first potential issue is now
apparent. Basing an analysis of expected regret on the definitions in [1,2] would result in the
FNP being used wrongly in place of the FPP. As we have seen from equation 4, above, both
FPP and FNP are used in calculating the expected regret for a risk assessment scheme. If we
have (mistakenly) FNP in place of FPP, the question arises as to what value should be used
in place of FNP to calculate expected regret. If we assume a similar misspecification error for
negative results as was made for positive results, the most likely quantity to be used in
place of FNP would be the FPP. In short, the most likely outcome of the misspecification error in the NNRAS is an inversion of the error rates. Figure 2d shows the consequences of this inversion of the FNP and FPP using the same values as in the previous, correct example. In this case the threshold for switching from not testing (and assuming that all
species are non-invasive) to testing occurs at p = 8.2 × 10^-3 and the threshold for switching from
testing to not testing (and assuming that all species are invasive) is at p = 0.114.
The example just presented gives an illustration of the sort of error which might
occur if the misspecification problem in the NNRAS is propagated through an economic analysis
of its expected performance. However, rather than focus on specific numeric cases, we want to
highlight more general issues which this analysis throws up.
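The practical effect of inverting the error rates can be illustrated by asking which tactic has the lowest expected regret at a given prior. The sketch below is illustrative only: the function name, the choice of priors (0.02 and 0.2), and the scaling C FN = 1, C FP = 0.067 are assumptions made for the example, not values taken from the NNRAS.

```python
def best_tactic(p, fpp, fnp, c_fp=0.067, c_fn=1.0):
    """Return the tactic with the lowest expected regret at prior probability p."""
    regrets = {
        "admit all without testing": p * c_fn,
        "test": (1.0 - p) * fpp * c_fp + p * fnp * c_fn,
        "exclude all without testing": (1.0 - p) * c_fp,
    }
    return min(regrets, key=regrets.get)


for prior in (0.02, 0.2):
    correct = best_tactic(prior, fpp=0.488, fnp=0.063)   # error rates as estimated in [5]
    inverted = best_tactic(prior, fpp=0.063, fnp=0.488)  # FPP and FNP swapped
    print(prior, "| correct:", correct, "| inverted:", inverted)
```

At a prior of 0.2 the correctly specified analysis favours testing, whereas the inverted one favours excluding all species without testing; at 0.02 the correct analysis favours admitting all species, whereas the inverted one favours testing.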
First, in common with more detailed economic analyses [8] of this type of problem, our simple
analysis based on expected regret highlights the link between the cost-effectiveness of any risk
assessment scheme and the prior probability of the events which it is intended to identify. From
a purely financial perspective there are events which are either so improbable, or so probable,
that evidence-based risk assessment is not worthwhile. Put simply, it is not always economically rational to make a risk assessment, and both the cost-effectiveness of any inspection scheme
which is put in place, and the range of probabilities of invasiveness over which it will be economically useful, will depend on its error rates. None of these issues is addressed in the published
descriptions of the NNRAS.
Secondly, because the expected regret is not independent of the prior probability of invasiveness,
p, this parameter needs to be included realistically in the scheme and not assumed to be a constant, as it is in the methodology proposed by Holt et al. [1,2]. We can expand on this point
specifically in relation to the value of p = 0.5 used by Holt [1,2]. The authors of the NNRAS
picked this prior probability on the grounds that, in the absence of any other information, it is
reasonable to use a value which gives an equal probability to unknown species being invasive or
non-invasive. This argument seems reasonable on first inspection, but should be rejected for
two straightforward empirical reasons, which we will outline here, and one more abstract
(though no less important) reason which is described in the third section of this paper.
Holt's use of p = 0.5 as the prior probability for the risk assessment scheme is appropriate only
if two conditions are met. These are: (1) that the costs of false positive and false negative
errors are equal, i.e. C FP = C FN (this assumption is made implicitly by Holt); (2) that there is no
information available to select another prior (this is the assumption made explicitly by Holt).
In other words, the NNRAS is set up as a naive Bayes classifier with an equal cost ratio (Figure
2a); it would operate as follows. We begin with an assumption of equal probability of invasiveness and non-invasiveness (p = 0.5). A risk assessment is carried out which leads to an updated
estimate of the probability of invasiveness (the posterior probability). If this value is greater
than 0.5 we should reject the species and if it is less than 0.5 we should accept it. The expected
costs (regret) of these decisions are shown in Figure 2a.
However, as noted above, the first of these steps is only relevant if we assume that the costs of
the two possible types of mistake are equal (i.e. if C FP = C FN). If, as is usually the case,
C FP < C FN, then the rational starting point will be some value of p < 0.5 (as depicted in Figure
2b) which is dependent on the cost ratio C FP:C FN. When the false positive and false negative
costs are not equal the appropriate prior probability is

$$ p = \frac{C_{FP}/C_{FN}}{1 + (C_{FP}/C_{FN})} $$

However, if one tried
to use such a value as the starting point for the proposed risk assessment scheme, one would run
into difficulty, since the use of p = 0.5 is crucial to the subsequent calculation of the probability
risk scale used in the NNRAS. At p = 0.5, odds_prior(v) = 1, and the prior odds cancel out of
the calculations, leaving the final probabilistic risk value dependent only on the data arising
from the evaluation. Using any other value would mean that the prior odds did not cancel out
of the calculations and one would be forced to include them (see footnote 2). Since Holt's scheme does not
have any formal means to do this it would not be possible to calculate the probabilistic risk
scores.
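The role of the prior odds can be seen in the odds form of Bayes' theorem, in which the posterior odds are the prior odds multiplied by the likelihood ratio. The following minimal sketch uses the correctly defined positive likelihood ratio TPP/FPP, with the error proportions from the weed data quoted earlier (FPP = 0.488, FNP = 0.063) as illustrative inputs; it does not reproduce the NNRAS calculations themselves, and the function name is hypothetical.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio via the odds form of Bayes' theorem."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Positive likelihood ratio, correctly defined as TPP / FPP with TPP = 1 - FNP.
lr_positive = (1.0 - 0.063) / 0.488

# At p = 0.5 the prior odds equal 1, so the posterior depends on the data alone.
print(posterior_probability(0.5, lr_positive))

# With the cost-based starting point of 0.0625 the prior odds no longer cancel.
print(posterior_probability(0.0625, lr_positive))
```

The same positive result therefore yields a much lower posterior probability when the starting point reflects unequal costs, which is why a scheme with no means of carrying the prior odds through the calculation is tied to p = 0.5.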
The second empirical reason to question the use of p = 0.5 as a rational starting point for calculating the odds of invasiveness is that we can determine an empirical prior probability from
the data used to construct the scheme, from other sources, or from expert knowledge. In other
words, there is no need to make such an uninformative estimate of the prior probability of invasiveness; information to guide the selection of a more informative prior will almost always be
available. For example, in the data set used to construct Figure 1, the proportion of known serious or minor weed species (i.e. the cases) was 0.77. So, if we assume that the data set is representative of the population of potential invasive weeds as a whole (see footnote 3), p = 0.77 is a more rational
prior probability of invasiveness than p = 0.5. As we have already pointed out, however, with a
value other than p = 0.5 for the prior, the NNRAS will run into problems since the calculations
required to obtain Holt's probabilistic risk scale will not work. These two reasons to reject
Holt's suggested risk calculation are linked to the assumption of equal costs for the two types of
error, and the assumption of p = 0.5 as a rational prior. As we have demonstrated, neither
assumption is justified and the calculation on which they are based is consequently of questionable value. The third reason to question the proposed scheme also relates to the suggested use
of p = 0.5 as the prior, but concerns the effect this choice has on the apparent power
of the method to differentiate invasive from non-invasive species.
2. As an aside, we raise the question of why, if one designs the risk assessment calculations to
remove the prior probability, one needs to bother with the Bayesian apparatus for probability updating:
basing one's evaluation of risk purely on the experimental evidence is an entirely standard (or frequentist)
approach.
3. Information content of predictions from the risk assessment
We will state the issue at hand here first and then describe the underlying, standard results
from information theory on which it is based. The use of p = 0.5 as the prior probability will
exaggerate the apparent information about invasiveness which the scheme supplies; by initiating
each evaluation of invasiveness at the point of maximum uncertainty the scheme will be misleading because it will always lead to a reduction in apparent uncertainty about the invasiveness
of an unknown species. To see this, consider the following hypothetical example. Suppose we
tell you we have a device which can predict the probability of invasiveness of unknown species.
An independent authority whose opinion you trust completely tells you our device is 100%
accurate. Now suppose that the best estimate for the prior probability of invasiveness among a
certain group of organisms is 75%. An unknown plant species is presented for evaluation, you
use our device to carry out an evaluation and it predicts "probability of invasiveness = 90%".
You already know for this type of species that the probability of invasiveness is 75%, even in the
absence of a specific examination of the risk for this species, so you are not very surprised by the
prediction. In other words, the device did not supply very much new information beyond what
you already had, simply from knowing the prior probability. Conversely, had the prediction
been "invasive with probability 10%" given a background in which 75% of similar species are
invasive, you would have learned much more (i.e. been supplied with more information, and
been more surprised; see footnote 4).
The concept captured in the example has a formal basis in information theory and concerns the
quantity expected information, H. Expected information is a logarithmic function of probability
and is measured in information units (the best known being the bit, which is the unit of measurement when the log function has base 2). The values of H (in bits) for a two-way choice are
shown in Figure 3 superimposed on the expected regret graph previously introduced in Figure 2.
Looking at Figure 3 it can be seen that the expected information curve reaches its maximum at
p = 0.5 and decreases as p increases or decreases on either side of 0.5. H can be thought of as
a measure of the uncertainty associated with a given probability. It can be seen that if the
prior probability is either relatively high or relatively low, then a prediction which moves the
probability further towards p = 1 in the former case or p = 0 in the latter, tends to confirm
what the prior probability already indicates, and reduces uncertainty, but does not supply much
additional information. Starting every risk assessment at p = 0.5 maximises the amount of
apparent information the scheme can provide. The proposed scheme will almost certainly
appear to be more useful (i.e. more informative) than it actually is because every prediction
begins from an assumed position of maximum uncertainty, or minimum information, and thus
must lead to an apparent reduction in uncertainty by definition.
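The expected information for a two-way choice is the standard binary entropy, H(p) = -[p log2 p + (1 - p) log2 (1 - p)]. As a rough illustration, the sketch below compares the apparent reduction in uncertainty when a prediction of 0.9 is reached from a prior of 0.5 and from a prior of 0.75 (the figures used in the hypothetical device example); the function name is illustrative.

```python
import math


def expected_information(p):
    """Binary entropy H(p) in bits for a two-way choice with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


prediction = 0.9
for prior in (0.5, 0.75):
    reduction = expected_information(prior) - expected_information(prediction)
    print(f"prior {prior}: H = {expected_information(prior):.3f} bits, "
          f"apparent reduction = {reduction:.3f} bits")
```

Because H is largest at p = 0.5, the same prediction appears to remove more uncertainty when the assessment starts from 0.5 than when it starts from an empirically grounded prior.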
3. This is a necessary assumption to use any decision rule derived from the data to make inferences about
the invasiveness of new species.
4. As a trivial example, suppose our device predicted rain for the day ahead and you used it to predict the status
for "tomorrow" during November on the Mull of Kintyre. A true prediction of rain under such circumstances
would be neither surprising nor very impressive, given the underlying probability of being right purely by chance.
Figure 3. Expected regret for a naive Bayes classifier with equal error costs for false positive
and negative decisions, and expected information, H, as functions of probability. Note that H
(in bits, right scale) reaches a maximum at p = 0.5.
References
1. Holt, J. 2006. Score averaging for alien species risk assessment: a probabilistic alternative. Journal of Environmental Management, 81: 58-62.
2. Holt, J., Black, R., and Abdallah, R. 2006. A rigorous yet simple quantitative risk assessment method for quarantine pests and non-native organisms. Annals of Applied Biology,
149: 167-173.
3. Baker, R. H. A., Black, R., Copp, G. H., Haysom, K. A., Hulme, P. E., Thomas, M. B.,
Brown, N, A., Brown, M., Ray, J. C. Cannon, R. J. C., Ellis, J., Ellis, M., Ferris, R.,
Glaves, P., Gozlan, R. E., Holt, J., Howe, E., Knight, J. D., MacLeod, A., Moore, N. P.,
Mumford, J. D., Murphy, S. T., Parrott, T, D., Sansford, C. E., Smith, G. C., ST-Hilaire, E, S., Ward, N. L. 2007. The UK risk assessment scheme for all non-native species.
Neobiota, Volume 7.
4. Notes on the mathematical basis of the UK Non-Native Organism Risk Assessment
Scheme. http://arxiv.org/ftp/arxiv/papers/0804/0804.1443.pdf.
5. Hughes, G., Madden, L.V. 2003. Evaluating predictive models with application in regulatory policy for invasive weeds. Agricultural Systems, 76: 755-764.
6. Pheloung, P.C., Williams, P.A., Halloy, S.R. 1999. A weed risk assessment model for use
as a biosecurity tool evaluating plant introductions. Journal of Environmental Management, 57: 239-251.
7. Smith, C.S., Lonsdale, W.M., Fortune, J. 1999. When to ignore advice: invasion predictions and decision theory. Biological Invasions, 1: 89-96.
8. McAusland, C., Costello, C. 2008. Avoiding invasives: trade-related policies for controlling unintentional exotic species introductions. Journal of Environmental Economics
and Management, 48: 958-977.