IJE Advance Access originally published online on January 9, 2008
International Journal of Epidemiology 2008 37(2):382-385; doi:10.1093/ije/dym291
Brief Report
How far from non-differential does exposure or disease misclassification have to be to bias measures of association away from the null?
1 Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA.
2 Department of Epidemiology and Department of Statistics, University of California, Los Angeles, CA, USA.
3 Division of Environmental Health Sciences, University of Minnesota, Minneapolis, MN, USA.
* Corresponding author. Department of Pediatrics, University of Minnesota, Mayo Mail Code 715, 420 Delaware St. SE, Minneapolis, MN 55455, USA. E-mail: jure0007{at}umn.edu
| Abstract |
|---|
A well-known heuristic in epidemiology is that non-differential exposure or disease misclassification biases the expected values of an estimator toward the null value. This heuristic works correctly only when additional conditions are met, such as independence of classification errors. We present examples to show that, even when the additional conditions are met, if the misclassification is only approximately non-differential, then bias is not guaranteed to be toward the null. In light of such examples, we advise that evaluation of misclassification should not be based on the assumption of exact non-differentiality unless the latter can be deduced logically from the facts of the situation.
Keywords Exposure measurement, misclassification, odds ratio, prevalence, sensitivity analysis
Accepted 17 December 2007
| Introduction |
|---|
A well-known heuristic in epidemiology is that non-differential exposure or disease misclassification biases the expected values of an estimator toward the null value.1–6 A more precise version is that our estimate is probably closer to the null than it would be were there no misclassification.7–12 Even with this probabilistic modification, the rule works only when special conditions besides non-differentiality are met,13–18 e.g. that the misclassification in question is independent of other errors.16,17 Furthermore, many forms of differential error will also produce bias toward the null under the same conditions.19 Thus, non-differentiality is neither necessary nor sufficient for bias toward the null.
Even granting the rule some utility, a careful reading of the epidemiological literature reveals that "non-differential" is not consistently defined. Some epidemiological textbooks state correctly that the term means the error probabilities must be the same for both groups compared20 (p. 107), or identical21 (p. 192) in both groups. Consistent with these definitions, most books illustrate non-differential misclassification with examples in which the misclassification probabilities are exactly equal. Nonetheless, it is our impression that epidemiologists believe approximate non-differentiality is sufficient for the rule to work, a belief reflected in books stating that non-differential misclassification results when the classification errors occur in similar proportions22 (p. 169). The question then is how close to non-differential the classification errors must be to produce bias toward the null, given that the other conditions necessary for the rule are satisfied.23
We present examples to demonstrate that, even if other conditions for bias toward the null are met, the bias is not guaranteed to be toward the null if the misclassification is only approximately non-differential by certain ordinary judgments. Our examples will concern misclassification of an uncommon exposure (under 10% prevalence) using the odds ratio as the measure of association. Because of the parallel algebra, the points also apply to misclassification of an uncommon disease in a cohort or prevalence study using the odds ratio, or the ratio of rates or proportions. With the values of sensitivity and specificity reversed, they would also apply to misclassification using the odds ratio when non-exposure was uncommon.
| Methods |
|---|
Table 1 gives a hypothetical 2 × 2 table of expected cell counts that we use for illustration; the counts are identical to data from a study of the association of private pesticide-applicator exposure with circulatory and respiratory birth anomalies.24 Suppose for the moment that the expected counts are correctly classified on outcome status (case, non-case) but to some degree incorrectly classified on exposure status (exposed, unexposed). In this single-stratum set-up, with a binary exposure and no outcome misclassification, exactly non-differential misclassification produces bias toward the null and possibly beyond it.2–5
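The claim that exactly non-differential misclassification pulls the expected odds ratio toward the null can be checked with a small forward calculation. The sketch below assumes hypothetical true exposure prevalences (not the Table 1 data) and applies the same false-negative and false-positive probabilities to cases and non-cases; the function names are illustrative only.

```python
def classified_prevalence(true_prev: float, fn: float, fp: float) -> float:
    """Expected proportion classified as exposed, given the true exposure prevalence."""
    # Truly exposed and correctly classified, plus truly unexposed but classified as exposed.
    return true_prev * (1 - fn) + (1 - true_prev) * fp


def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio comparing exposure prevalence p1 (cases) with p0 (non-cases)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical true exposure prevalences (not taken from Table 1).
true_p1, true_p0 = 0.06, 0.03
# Exactly non-differential errors: the same Fn and Fp in cases and non-cases.
fn, fp = 0.20, 0.01

true_or = odds_ratio(true_p1, true_p0)
observed_or = odds_ratio(
    classified_prevalence(true_p1, fn, fp),
    classified_prevalence(true_p0, fn, fp),
)
print(f"true OR = {true_or:.2f}, expected OR after misclassification = {observed_or:.2f}")
```

With these inputs the expected odds ratio after misclassification lies between 1 and the true odds ratio, which is the textbook behaviour under the conditions listed above.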
Our examples will be limited to less extreme cases in which the error probabilities are always less than the measured exposure prevalences. That is, within both the case and non-case groups, in our examples the false-negative probability (probability of being classified as unexposed if exposed; equal to 1 – sensitivity) is less than the measured non-exposure prevalence, and the false-positive probability (probability of being classified as exposed if unexposed; equal to 1 – specificity) is less than the measured exposure prevalence. These restrictions avoid negative corrections and allow the expected corrected odds ratio to be computed using the simple formula
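$$\mathrm{OR}_c = \frac{(A_1 - \mathrm{Fp}_1 M_1)\,(B_0 - \mathrm{Fn}_0 M_0)}{(B_1 - \mathrm{Fn}_1 M_1)\,(A_0 - \mathrm{Fp}_0 M_0)} \qquad (1)$$
where, for group i (i = 1 for cases, i = 0 for non-cases), A_i and B_i are the expected numbers classified as exposed and unexposed, M_i = A_i + B_i, and Fn_i and Fp_i are the false-negative and false-positive probabilities.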
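The correction follows from back-correcting each group separately. With A_i and B_i the expected numbers classified as exposed and unexposed in group i, and M_i = A_i + B_i, the corrected number exposed is (A_i − Fp_i·M_i)/(1 − Fn_i − Fp_i) and the corrected number unexposed is (B_i − Fn_i·M_i)/(1 − Fn_i − Fp_i), so the common divisor cancels in each group's odds. A minimal sketch, with hypothetical argument names and hypothetical counts, that also enforces the restrictions stated above:

```python
def corrected_odds_ratio(a1, b1, a0, b0, fn1, fp1, fn0, fp0):
    """Expected corrected odds ratio from misclassified expected counts.

    a1, b1: cases classified as exposed / unexposed
    a0, b0: non-cases classified as exposed / unexposed
    fn*, fp*: false-negative and false-positive probabilities (1 = cases, 0 = non-cases)
    """
    m1, m0 = a1 + b1, a0 + b0
    # Each factor is proportional to a back-corrected cell count; the divisor
    # (1 - Fn - Fp) is shared within a group and cancels in that group's odds.
    exp1 = a1 - fp1 * m1
    unexp1 = b1 - fn1 * m1
    exp0 = a0 - fp0 * m0
    unexp0 = b0 - fn0 * m0
    # Restrictions: Fp below the measured exposure prevalence and Fn below the
    # measured non-exposure prevalence, in both groups.
    if min(exp1, unexp1, exp0, unexp0) <= 0:
        raise ValueError("error probabilities too large: a corrected count would be non-positive")
    return (exp1 * unexp0) / (unexp1 * exp0)


# Hypothetical counts (not Table 1): 10/190 cases and 25/975 non-cases classified exposed/unexposed.
print(corrected_odds_ratio(10, 190, 25, 975, fn1=0.20, fp1=0.01, fn0=0.20, fp0=0.01))
```

Setting all error probabilities to zero returns the uncorrected odds ratio, which is a convenient sanity check.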
| Results |
|---|
As an initial example, suppose that, for both cases and non-cases, the false-negative probability is 0.26 and the false-positive probability is 0.01 (i.e. Fn1 = Fn0 = 0.26 and Fp1 = Fp0 = 0.01). Applying formula (1) to Table 1 gives an expected corrected odds ratio of 2.06, farther from the null than the expected odds ratio with misclassification, 1.62, just as the heuristic predicts.
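Dividing each factor of the correction by its group size expresses it in terms of the measured exposure prevalences p1 (cases) and p0 (non-cases). The sketch below uses that form with hypothetical measured prevalences, p0 = 0.024 and p1 chosen so that the measured odds ratio is 1.62; these are stand-ins, not the Table 1 counts, and the function names are illustrative.

```python
def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio comparing exposure prevalence p1 (cases) with p0 (non-cases)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def corrected_or(p1, p0, fn1, fp1, fn0, fp0):
    """Expected corrected OR from measured exposure prevalences (1 = cases, 0 = non-cases)."""
    factors = (p1 - fp1, 1 - p0 - fn0, 1 - p1 - fn1, p0 - fp0)
    if min(factors) <= 0:
        raise ValueError("error probabilities violate the restrictions")
    return (factors[0] * factors[1]) / (factors[2] * factors[3])


# Hypothetical measured prevalences: p0 fixed, p1 chosen so the measured OR is 1.62.
p0 = 0.024
odds1 = 1.62 * p0 / (1 - p0)
p1 = odds1 / (1 + odds1)

observed = odds_ratio(p1, p0)
# Exactly non-differential errors, as in the initial example.
corrected = corrected_or(p1, p0, fn1=0.26, fp1=0.01, fn0=0.26, fp0=0.01)
print(f"measured OR {observed:.2f}, corrected OR {corrected:.2f}")
```

With these made-up inputs the corrected value lands above the measured 1.62, i.e. the misclassified odds ratio is closer to the null.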
Suppose now that Fn1 = Fn0 = 0.21, but Fp1 = 0.02 and Fp0 = 0.01. Although these two false-positive probabilities differ twofold on a relative scale, their absolute difference is only 0.01, so they might be judged the same for all practical purposes. While the odds ratio from Table 1 (expected counts with exposure misclassification) is 1.62, the expected odds ratio after correction is 1.34. Thus the rather small difference in Fp between cases and non-cases led to a large correction in the direction opposite to that expected from non-differential misclassification: the misclassified odds ratio is biased away from the null.
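The reversal can be reproduced in spirit with the same hypothetical measured prevalences (not Table 1): hold Fn at 0.21 in both groups and compare exactly equal false-positive probabilities with Fp1 = 0.02, Fp0 = 0.01. The helper names are illustrative.

```python
def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio comparing exposure prevalence p1 (cases) with p0 (non-cases)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def corrected_or(p1, p0, fn1, fp1, fn0, fp0):
    """Expected corrected OR from measured exposure prevalences (1 = cases, 0 = non-cases)."""
    factors = (p1 - fp1, 1 - p0 - fn0, 1 - p1 - fn1, p0 - fp0)
    if min(factors) <= 0:
        raise ValueError("error probabilities violate the restrictions")
    return (factors[0] * factors[1]) / (factors[2] * factors[3])


# Hypothetical measured prevalences giving a measured OR of 1.62.
p0 = 0.024
odds1 = 1.62 * p0 / (1 - p0)
p1 = odds1 / (1 + odds1)
observed = odds_ratio(p1, p0)

exact = corrected_or(p1, p0, fn1=0.21, fp1=0.01, fn0=0.21, fp0=0.01)        # Fp1 = Fp0
approximate = corrected_or(p1, p0, fn1=0.21, fp1=0.02, fn0=0.21, fp0=0.01)  # Fp1 - Fp0 = 0.01
print(f"measured {observed:.2f}; corrected, exact {exact:.2f}; corrected, approximate {approximate:.2f}")
```

A 0.01 absolute difference in Fp moves the corrected value from above the measured odds ratio to below it.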
Table 2 shows the expected corrected odds ratio for various combinations of false-negative and false-positive probabilities. Overall, in accord with the low exposure prevalence, the corrected odds ratios are strongly affected by small changes in the false-positive probabilities, whereas changes in the false-negative probabilities have comparatively little impact on the results. In rows 5 and 7 the correction more than quadruples the odds ratio, while in row 6 the correction halves the odds ratio, going beyond the null. In such extreme instances we would not deem the corrected odds ratio reliable, and we would recommend instead approaches to the problem that can easily handle such extremes, such as Bayesian or shrinkage methods.26–28
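A grid like the one summarized in Table 2 can be generated by looping over candidate error probabilities. The grid values and measured prevalences below are hypothetical, chosen only to show the pattern that the false-positive probabilities dominate when exposure is rare; they do not reproduce Table 2.

```python
def corrected_or(p1, p0, fn1, fp1, fn0, fp0):
    """Expected corrected OR from measured exposure prevalences (1 = cases, 0 = non-cases)."""
    factors = (p1 - fp1, 1 - p0 - fn0, 1 - p1 - fn1, p0 - fp0)
    if min(factors) <= 0:
        raise ValueError("error probabilities violate the restrictions")
    return (factors[0] * factors[1]) / (factors[2] * factors[3])


# Hypothetical measured prevalences giving a measured OR of 1.62.
p0 = 0.024
odds1 = 1.62 * p0 / (1 - p0)
p1 = odds1 / (1 + odds1)

fn_values = (0.10, 0.20, 0.30)                         # Fn1 = Fn0 = fn (hypothetical)
fp_pairs = ((0.01, 0.01), (0.01, 0.02), (0.02, 0.01))  # (Fp1, Fp0) (hypothetical)

grid = {}
for fn in fn_values:
    for fp1, fp0 in fp_pairs:
        value = corrected_or(p1, p0, fn, fp1, fn, fp0)
        grid[(fn, fp1, fp0)] = value
        print(f"Fn={fn:.2f}  Fp1={fp1:.3f}  Fp0={fp0:.3f}  corrected OR={value:.2f}")
```

Across the Fn values the corrected odds ratio barely moves, whereas swapping which group has the slightly higher Fp moves it from well below the measured value to several times it.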
| Discussion |
|---|
In our examples, the low prevalence of exposure led to extreme sensitivity of the results to the false-positive probabilities. This sensitivity is a manifestation of the well-known screening problem: low specificity for an uncommon condition can produce large errors in estimated prevalence. In our setting the problem translates into extreme sensitivity of misclassification corrections to violations of non-differentiality. We regard this problem as good reason to avoid relying on the non-differentiality assumption when drawing inferences.
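The screening problem can be made concrete with the usual relation between true prevalence p, sensitivity Se, specificity Sp and the expected measured prevalence, p* = Se·p + (1 − Sp)(1 − p), and its inverse p = (p* − (1 − Sp))/(Se − (1 − Sp)). The numbers and function names below are hypothetical illustrations.

```python
def measured_prevalence(p: float, se: float, sp: float) -> float:
    """Expected proportion classified positive when the true prevalence is p."""
    return se * p + (1 - sp) * (1 - p)


def corrected_prevalence(p_star: float, se: float, sp: float) -> float:
    """Back-correct a measured prevalence under an assumed sensitivity and specificity."""
    fp = 1 - sp
    if p_star <= fp or se <= fp:
        raise ValueError("assumed specificity is incompatible with the measured prevalence")
    return (p_star - fp) / (se - fp)


true_p, se = 0.02, 0.90  # hypothetical uncommon condition and sensitivity
for sp in (0.99, 0.98):
    print(f"Sp={sp}: measured prevalence {measured_prevalence(true_p, se, sp):.4f}")

# Correct the same measurement under slightly different assumed specificities.
p_star = measured_prevalence(true_p, se, 0.98)
for sp in (0.99, 0.98, 0.97):
    print(f"assumed Sp={sp}: corrected prevalence {corrected_prevalence(p_star, se, sp):.4f}")
```

A one-point change in assumed specificity shifts the corrected prevalence by roughly half the true prevalence, the same mechanism that makes the odds-ratio corrections above so sensitive to Fp.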
Even if exact non-differentiality holds, misclassification is guaranteed to produce bias toward the null only under certain conditions, which, if sufficiently violated, can lead to bias away from the null.13–17 Sometimes these conditions are obviously satisfied, e.g. when the exposure variable is a binary state such as employment in an industry. Other conditions may be less obvious, however, especially when the exposure results from categorizing a continuous variable.15 When the status of these conditions is uncertain or the exposure is rare, one should not be too confident that classification errors have produced bias toward the null, even if one is fairly sure that the classification probabilities are very similar in cases and non-cases.
We thus recommend that quantitative evaluation of misclassification, such as sensitivity analysis,25–27,29,30 be used in place of qualitative heuristics, especially when decisions that depend on the magnitude of effects are to be made from the data in question. Even better is to obtain data on replicate or alternative measures of exposure, so that data-based correction methods can be brought to bear on the problem.26,28,31–33 Whether or not such data are available, we advise that evaluation not be based on the assumption of exact non-differentiality unless the latter can be deduced logically from the facts of the situation.34
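A quantitative evaluation can be as simple as a Monte Carlo sensitivity analysis: draw plausible error probabilities for cases and non-cases, apply the correction to each draw, and summarize the spread of corrected odds ratios. The sketch below is a simplified illustration in that spirit, not the specific procedures of the cited references; the uniform ranges, the seed and the measured prevalences are all hypothetical.

```python
import random


def corrected_or(p1, p0, fn1, fp1, fn0, fp0):
    """Expected corrected OR from measured exposure prevalences (1 = cases, 0 = non-cases)."""
    factors = (p1 - fp1, 1 - p0 - fn0, 1 - p1 - fn1, p0 - fp0)
    if min(factors) <= 0:
        return None  # draw violates the restrictions; the caller discards it
    return (factors[0] * factors[1]) / (factors[2] * factors[3])


def simulate(p1, p0, n_draws=10_000, seed=2008):
    """Monte Carlo sensitivity analysis under approximately non-differential errors."""
    rng = random.Random(seed)
    observed = (p1 / (1 - p1)) / (p0 / (1 - p0))
    draws = []
    for _ in range(n_draws):
        # Case and non-case error probabilities are drawn independently from the
        # same hypothetical ranges, so they are similar but rarely identical.
        fn1, fn0 = rng.uniform(0.15, 0.30), rng.uniform(0.15, 0.30)
        fp1, fp0 = rng.uniform(0.005, 0.015), rng.uniform(0.005, 0.015)
        value = corrected_or(p1, p0, fn1, fp1, fn0, fp0)
        if value is not None:
            draws.append(value)
    draws.sort()
    share_below = sum(d < observed for d in draws) / len(draws)
    return observed, draws, share_below


# Hypothetical measured prevalences giving a measured OR of 1.62.
p0 = 0.024
odds1 = 1.62 * p0 / (1 - p0)
p1 = odds1 / (1 + odds1)

observed, draws, share_below = simulate(p1, p0)
n = len(draws)
print(f"measured OR {observed:.2f}")
print(f"median corrected OR {draws[n // 2]:.2f}, "
      f"2.5th to 97.5th percentile {draws[int(0.025 * n)]:.2f} to {draws[int(0.975 * n) - 1]:.2f}")
print(f"share of draws with corrected OR below the measured OR: {share_below:.1%}")
```

A non-zero share of draws below the measured odds ratio indicates that, under these assumptions, the misclassified estimate may be biased away from the null even though the error probabilities are nearly equal in the two groups.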
| Acknowledgements |
|---|
This study was supported in part by the Children's Cancer Research Fund, Minneapolis, MN (to A.J.). We thank a reviewer for helpful comments on an earlier draft.
Conflict of interest: None declared.
| References |
|---|
1 Bross I. Misclassification in 2 x 2 tables. Biometrics (1954) 10:478–86.
2 Newell DJ. Errors in the interpretation of errors in epidemiology. Am J Public Health Nations Health (1962) 52:1925–28.
3 Keys A, Kihlberg JK. Effect of misclassification on estimated relative prevalence of a characteristic. Part I. Two populations infallibly distinguished. Part II. Errors in two variables. Am J Public Health Nations Health (1963) 53:1656–65.
4 Gullen WH, Bearman JE, Johnson EA. Effects of misclassification in epidemiologic studies. Public Health Rep (1968) 83:914–18.
5 Goldberg JD. The effects of misclassification on the bias in the difference between two proportions and the relative odds in the fourfold table. J Am Stat Assoc (1975) 70:561–67.
6 Weinberg CR, Umbach DM, Greenland S. When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol (1994) 140:565–71.
7 Thomas DC. RE: When will nondifferential misclassification of an exposure preserve the direction of a trend? Am J Epidemiol (1995) 142:782–83.
8 Weinberg CR, Umbach DM, Greenland S. Weinberg et al. reply [letter]. Am J Epidemiol (1995) 142:784.
9 Sorahan T, Gilthorpe MS. Non-differential misclassification of exposure always leads to an underestimate: an incorrect conclusion. Occup Environ Med (1994) 51:839–40.
10 Wacholder S, Hartge P, Lubin JH, Dosemeci M. Non-differential misclassification and bias towards the null: a clarification. Occup Environ Med (1995) 52:557–58.
11 Sorahan T, Gilthorpe MS. Sorahan and Gilthorpe reply [letter]. Occup Environ Med (1995) 52:558.
12 Jurek AM, Greenland S, Maldonado G, Church TR. Proper interpretation of misclassification effects: expectations versus observations. Int J Epidemiol (2005) 34:680–87.
13 Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol (1990) 132:746–48.
14 Wacholder S, Dosemeci M, Lubin JH. Blind assignment of exposure does not always prevent differential misclassification. Am J Epidemiol (1991) 134:433–37.
15 Flegal KM, Keyl PM, Nieto FJ. Differential misclassification arising from nondifferential errors in exposure measurement. Am J Epidemiol (1991) 134:1233–44.
16 Kristensen P. Bias from nondifferential but dependent misclassification of exposure and outcome. Epidemiology (1992) 3:210–15.
17 Chavance M, Dellatolas G, Lellouch J. Correlated nondifferential misclassifications of disease and exposure: application to a cross-sectional study of the relation between handedness and immune disorders. Int J Epidemiol (1992) 21:537–46.
18 Greenland S, Gustafson P. Accounting for independent nondifferential misclassification does not increase certainty that an observed association is in the correct direction. Am J Epidemiol (2006) 164:63–68.
19 Drews CD, Greenland S. The impact of differential recall on the results of case-control studies. Int J Epidemiol (1990) 19:1107–12.
20 Checkoway H, Pearce N, Kriebel D. Research Methods in Occupational Epidemiology. (2004) New York: Oxford University Press.
21 Savitz DA. Interpreting Epidemiologic Evidence: Strategies for Study Design and Analysis. (2003) New York: Oxford University Press.
22 Hennekens CH, Buring JE. Epidemiology in Medicine. (1987) Boston: Little, Brown and Company.
23 Maldonado G, Greenland S, Phillips C. Approximately nondifferential exposure misclassification does not ensure bias toward the null [abstract]. Am J Epidemiol (2000) 151:S39.
24 Garry VF, Schreinemachers D, Harkins ME, Griffith J. Pesticide appliers, biocides, and birth defects in rural Minnesota. Environ Health Perspect (1996) 104:394–99.
25 Greenland S, Lash TL. Bias analysis (Ch. 19). In: Rothman KJ, Greenland S, Lash TL, eds. Modern Epidemiology. 3rd edn. (2008) Philadelphia, PA: Lippincott-Raven.
26 Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. (2004) Boca Raton, FL: Chapman & Hall/CRC.
27 Gustafson P, Greenland S. Curious phenomena in Bayesian adjustment for exposure misclassification. Stat Med (2006) 25:87–103.
28 Carroll RJ, Ruppert D, Stefanski LA, Crainceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd edn. (2006) Boca Raton, FL: Chapman & Hall/CRC.
29 Greenland S. Multiple-bias modelling for analysis of observational data (with discussion). J R Stat Soc A (2005) 168:267–308.
30 Fox MP, Lash TL, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol (2005) 34:1370–76.
31 Spiegelman D, Zhao B, Kim J. Correlated errors in biased surrogates: study designs and methods for measurement error correction. Stat Med (2005) 24:1657–82.
32 Cole S, Chu H, Greenland S. Multiple-imputation for measurement-error correction. Int J Epidemiol (2006) 35:1074–81.
33 Greenland S. Maximum-likelihood and closed-form estimators of epidemiologic measures under misclassification. J Stat Plan Inference (2007) 138:528–38.
34 Greenland S, Fox MP, Lash TL. Reply to Roger Marshall [letter]. Int J Epidemiol (2006) 35:1589–90.
