IJE Advance Access originally published online on October 29, 2006
International Journal of Epidemiology 2006 35(6):1590-1592; doi:10.1093/ije/dyl228

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2006; all rights reserved.

Letters to the Editor

Dimensional errors of metaphorical measurements. Can they be resolved?

GIO BATTA GORI

The Health Policy Center, 6704 Barr Road, Bethesda, MD 20816, USA. E-mail: gorigb@msn.com

A recent commentary by James Marshall in this journal condenses a number of reports published in the last few years and is concerned, at long last, that imprecision ‘... may well consign epidemiological inquiry to the scientific sidelines.’1 The central unease is that precision and accuracy of measurement are essential to scientific qualification, coupled with the realization that well-behaved imprecision, of the sort Marshall links with the postulations of Bross2 half a century ago, is virtually unknown in today's epidemiologic practice. More generally, imprecision is seen as depending on the combination of the random and systematic errors of primary data (e.g. exposures) with the errors in the estimates of, and corrections for, biases and confounders.

The problem is that many of the primary data cannot actually be measured and their values have to be simply estimated, especially in observational studies of chronic conditions. This has led to recommendations that study conclusions should be complemented with sensitivity analysis using Monte Carlo and Bayesian techniques. Yet the question remains whether a range of uncertainty is more useful or more confusing for decision makers confronted with choices. For instance, the Fox et al. sensitivity analysis that Marshall quotes presents a range of odds ratios from 1.11 to 10.7, a hypothetical span that could be wider or narrower depending on the arbitrary choice of sensitivity and specificity parameters.1,3 Thus the recommendations Marshall summarizes and endorses may make epidemiology reports more up-front, but may not add much to their factual representations.
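The dependence of such a span on the chosen parameters can be sketched in a few lines. The example below is not the Fox et al. procedure, which draws sensitivity and specificity from probability distributions; it is a simpler deterministic grid that applies the standard back-correction for a misclassified binary exposure, assuming nondifferential misclassification. The 2x2 counts, the function names, and the grids of sensitivity and specificity values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def corrected_exposed(observed_exposed, total, sensitivity, specificity):
    """Back-calculate the number truly exposed from a misclassified count.

    Uses the standard correction A = (a - (1 - Sp) * n) / (Se + Sp - 1).
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * total) / denom


def corrected_odds_ratio(a, n_cases, b, n_controls, se, sp):
    """Odds ratio after applying the same se/sp to cases and controls.

    Returns None when the parameters imply an impossible (non-positive) cell.
    """
    exp_cases = corrected_exposed(a, n_cases, se, sp)
    exp_controls = corrected_exposed(b, n_controls, se, sp)
    unexp_cases = n_cases - exp_cases
    unexp_controls = n_controls - exp_controls
    if min(exp_cases, exp_controls, unexp_cases, unexp_controls) <= 0:
        return None
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# Hypothetical observed counts (assumption, not data from any cited study).
CASES_EXPOSED, CASES_TOTAL = 60, 200
CONTROLS_EXPOSED, CONTROLS_TOTAL = 40, 200


def odds_ratio_span(se_values, sp_values):
    """Corrected odds ratio for every admissible (se, sp) pair on the grid."""
    results = {}
    for se, sp in product(se_values, sp_values):
        value = corrected_odds_ratio(
            CASES_EXPOSED, CASES_TOTAL, CONTROLS_EXPOSED, CONTROLS_TOTAL, se, sp
        )
        if value is not None:
            results[(se, sp)] = value
    return results


if __name__ == "__main__":
    span = odds_ratio_span([0.7, 0.8, 0.9, 1.0], [0.85, 0.9, 0.95, 1.0])
    for (se, sp), value in sorted(span.items()):
        print(f"Se={se:.2f} Sp={sp:.2f} corrected OR={value:.2f}")
    print(f"span: {min(span.values()):.2f} to {max(span.values()):.2f}")
```

Widening or narrowing the two grids changes the reported span, even though the observed counts stay the same, which is the sense in which the width of the range reflects the analyst's choices rather than the data.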

In addition, a more basic problem with the measurement of primary exposure data is characteristically overlooked by epidemiologists. The case where exposures can be measured materially is extremely rare, given that exposures are usually determined cumulatively over many years or over the lifespan of study subjects. Typically, such exposures are ‘measured’ either by fitting subjects in graded categories of exposure or by asking individual subjects to recall their exposures over prior time. In the first case, some rudiment of measurement is obtained, whether or not the exposure categories are parameterized, from measured or assumed levels of exposure for different job descriptions and from the time individuals spent in those jobs. Here, the inevitable assignment of some subjects to wrong categories creates misclassifications that confound the results to an unknowable extent, and that sensitivity analyses such as those reviewed by Marshall can only attempt to warn about.

More severe and frequent is the problem of studies based on personal recall of chronic exposures, where not even qualitatively similar measurements are possible that would allow different individual recalls to be compared. The reason is that measurement requires the use of a standard meter, whereas it is undeniable that each subject uses idiosyncratic memory yardsticks, each recall being generated according to different and erratic metrologies of unknowable dimensions, quite separate from, and in addition to, preferential recall biases. Under such circumstances, individual recalls may be recorded as precise digits, but each digitized recall embodies a uniquely heterogeneous dimension. Therefore, a set of recalls is affected by dimensional errors that cannot be presumed to be randomly distributed, and cannot be standardized to fit some credibly manageable distribution across different individual recalls.

The traditional meaning of sample means here is lost because frequentist elaborations are based on the premise that individual measurements are obtained using the same yardstick. In effect, the means of individual recall responses are affected by unknowable dimensional error and do not become more precise, meaningful, or realistic with increasing sample size, meaning that the confidence intervals and associated P-values derived from recall digits are illusory. In the end, whatever primary data means might be obtained, they would derive from an unstable collection of individual dimensional reveries, rather than from factual measurements of events long gone.
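A toy simulation may help to show the mechanism. It assumes, purely for illustration, that each subject reports a true exposure multiplied by a personal yardstick factor, and that these factors follow a lognormal distribution whose centre is not 1 and is unknown to the analyst. That tidy distribution is itself the kind of assumption the argument above says cannot be justified in practice; it is used here only to make the point computable. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_recall(n, true_mean=10.0, yardstick_median=1.3,
                    yardstick_sigma=0.5, seed=0):
    """Simulate n recalled exposures, each distorted by a personal yardstick.

    Returns the true sample mean, the mean of the recalled values, and the
    conventional 95% confidence interval computed from the recalled values.
    """
    rng = random.Random(seed)
    truths, reports = [], []
    for _ in range(n):
        truth = rng.gauss(true_mean, 2.0)  # actual exposure, common units
        # Idiosyncratic scale factor: the subject's private unit of measure.
        scale = rng.lognormvariate(math.log(yardstick_median), yardstick_sigma)
        truths.append(truth)
        reports.append(truth * scale)
    reported_mean = statistics.fmean(reports)
    sem = statistics.stdev(reports) / math.sqrt(n)
    return {
        "true_mean": statistics.fmean(truths),
        "reported_mean": reported_mean,
        "ci_low": reported_mean - 1.96 * sem,
        "ci_high": reported_mean + 1.96 * sem,
    }


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        r = simulate_recall(n)
        print(
            f"n={n:>6}  true mean={r['true_mean']:.2f}  "
            f"recalled mean={r['reported_mean']:.2f}  "
            f"95% CI=({r['ci_low']:.2f}, {r['ci_high']:.2f})"
        )
```

Under these assumptions the interval narrows as the sample grows, but it narrows around the recalled mean and not around the true one, so a larger study yields a more confident statement about the wrong quantity.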

The severity of the problem varies with the context: it may be less conspicuous in the case of individual daily consumption and quality of cigarettes that are known to remain substantially invariant over long periods of time. Yet, even for cigarette smoking the original 1964 Surgeon General's Report carefully offered a judgment call rather than a conclusion based directly on science.4 At the lower end of credibility are lifetime recalls of exposures to toxic substances, second-hand smoke, and common foods. In regard to the latter, for instance, the general acceptance ‘...that food frequency methods are sufficiently valid for etiologic studies.’5 is based, among other assumptions, on the assuredly wrong assumption that individual dietary recalls derive from a homogeneous metrology and do not suffer from dimensional error. In fact, each respondent will have his or her own elastic reference points, against which to come up with seemingly precise digital answers.

Dimensional issues are of paramount concern in science and engineering, and their neglect has serious consequences not only in epidemiology. In a spectacular engineering mishap of 1999, a $125 million NASA Mars orbiter was lost in space because one assembling team had used the metric system, while the other favoured English units.6 Clearly, a dimensional confusion in epidemiology would be magnified manyfold because of the different dimensions that each of multiple recalls entails.

Devising simulation methods to account for dimensional error in sensitivity analysis might be feasible, but their contribution to factual epidemiologic representations will remain problematic. Perhaps it is time to recognize the epistemic limits of epidemiology and agree with Doll and Peto that epidemiologic observations ‘... can seldom be made according to the strict requirements of experimental science and therefore may be open to a variety of interpretations. [Wherefore]... the need to observe imaginatively what actually happens to various different categories of people will remain.’7

Epidemiologic imagination, of course, is ideally suited for the generation of research hypotheses that more precise and accurate disciplines may explore. Still, sensitivity analysis might be a step in the right direction, especially if it were directed at estimating the level to which hazard ratios must rise in order to be likely to exceed the background noise of the cumulative imprecisions of a study. Arguable fixes to this problem were advanced long ago with the suggestion that only hazard ratios over 2 or 3 ought to be taken seriously,8 although some formally reasoned methods would be more convincing.

In the end, however, it cannot be denied that measurement in the observational epidemiology of chronic conditions will likely remain an intractable problem, inevitably leaving much of epidemiology on the dreaded sidelines of science. This may not matter as long as delusional postmodern interpretations of science hold sway, but how modest epidemiology should be in the face of its limitations remains a question that epidemiologists ought to address, before less kind others might wake up to it. As we chase lower and lower multifactorial risks with ever larger studies, too many skeletons keep piling up in the closet9–13: enough to question the levels of funding against the factual value of results.

References

1 Marshall JR. Commentary: About that measurement problem. Int J Epidemiol 2005;34:1376–7.

2 Bross I. Misclassification in 2x2 tables. Biometrics 1954;10:478–86.

3 Fox M, Lash T, Greenland S. A method to automate probabilistic sensitivity analyses of misclassified binary variables. Int J Epidemiol 2005;34:1370–6.

4 Smoking and Health. Report of the advisory committee to the Surgeon General of the Public Health Service. Public Health Service. Washington, DC: U.S. Department of Health, Education, and Welfare, 1964.

5 Byers T. Food frequency dietary assessment: how bad is good enough? Am J Epidemiol 2001;154:1087–8.

6 CNN. Metric mishap causes loss of NASA orbiter. http://www.cnn.com/TECH/space/9909/30/mars.metric.02/

7 Doll R, Peto R. The causes of cancer. J Nat Cancer Inst 1981;66:1192–312. (see p. 1218).

8 Breslow NE, Day NE. Statistical methods in cancer research. Volume I. The analysis of case-control studies. Publication No. 32 Lyon: International Agency for Research on Cancer, 1980.

9 Mark DM. Deaths attributable to obesity. JAMA 2005;293:1918–19.

10 Parker-Pope T. Trials and errors: In the study of Women's Health, design flaws raise questions. The Wall Street Journal. February 28, 2006. p. A1.

11 Statistical clinical trials ready for the scrap heap. Letters. The Wall Street Journal. March 13, 2006. p. A19.

12 Gori GB. Considerations on guidelines of good epidemiologic practice. Ann Epidemiol 2002;12:73–8.

13 Feinleib M. New directions for community intervention studies. AJPH 1997;86:1696–8.




This article has been cited by other articles:

Gori GB. Metaphorical measurements and theories. Int J Epidemiol 2007;36:931–2.

