IJE Advance Access originally published online on March 3, 2008
International Journal of Epidemiology 2008 37(3):624-626; doi:10.1093/ije/dyn035
Commentary: Why are we biased against bias?
Department of Epidemiology, University of North Carolina, School of Public Health, Chapel Hill, NC 27599-7435, USA. E-mail: jay_kaufman@unc.edu
Accepted 31 January 2008
Greater attention to causal inference has been one of the most important trends in social epidemiology over the last decade. The groundwork was laid 35 years ago by Mervyn Susser's book Causal Thinking in the Health Sciences,1 but growing interest more recently in causal techniques such as potential outcomes models and directed graphs has given the field new capacities for strengthening inference and honing arguments.2 Many techniques that have been standard in econometrics and the social sciences for years have made their way into social epidemiology in the last decade, including multilevel modeling,3 propensity score matching4 and instrumental variables.5 One such technique, exploited cleverly in the article by Gilman and colleagues,6 is the fixed effects regression model.
Epidemiologists have long been enthusiastic users of the same conditional estimator used for fixed effects analyses, but only in the context of the matched case-control study.7 Rather than have a case and a control matched by study design, however, the tradition in social sciences has been to consider exposed and unexposed observations matched by nature. For example, Sastry and Hussey considered racial differences in birthweight, conditioned on neighborhood of residence.8 This analysis holds constant all aspects of the neighborhood environment, and pools neighborhood-specific exposure contrasts to obtain a single summary measure of the racial contrast. In another recent example, Nelson and colleagues sought to estimate the causal effect of breast-feeding history on subsequent development of obesity, but they recognized that mothers' decisions to breast-feed are correlated with many other behaviours and attitudes that could plausibly affect later obesity risk.9 Therefore, in addition to the effect estimate adjusted for measured confounders in a large cohort, the authors also obtained an estimate in the smaller number of siblings who were discordant on breast-feeding history. This latter estimate holds constant all maternal and family environment factors, and therefore eliminates confounding by all of these innumerable and unmeasurable influences. This is the really remarkable promise of the fixed effects model, and one that makes it so attractive for social epidemiology, where exposures are often heavily confounded by myriad contextual, behavioural and attitudinal quantities that would be difficult to assess exhaustively.
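The logic of the sibling contrast can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from any of the cited studies: a linear outcome model rather than the conditional likelihood used in practice, exactly two siblings per family, a single unmeasured family-level confounder `u`, and arbitrary illustrative values for the true effect `beta` and the confounder effect `gamma`. The function names are hypothetical.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n_families=20000, beta=0.3, gamma=1.0, seed=1):
    """Return (naive slope, within-family slope) for simulated sibling pairs.

    beta and gamma are illustrative assumptions, not estimates from any study.
    """
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n_families):
        u = rng.gauss(0, 1)  # unmeasured confounder shared by both siblings
        pair = []
        for _ in range(2):
            x = u + rng.gauss(0, 1)                     # exposure depends on u
            y = beta * x + gamma * u + rng.gauss(0, 1)  # so does the outcome
            pair.append((x, y))
            x_all.append(x)
            y_all.append(y)
        # Differencing within the family cancels u from exposure and outcome.
        dx.append(pair[0][0] - pair[1][0])
        dy.append(pair[0][1] - pair[1][1])
    return slope(x_all, y_all), slope(dx, dy)

naive, within = simulate()
print(f"naive slope: {naive:.3f}, within-family slope: {within:.3f}")
```

Under these assumed values the naive slope should sit near beta + gamma/2 = 0.8, while the within-family slope should sit near the true beta of 0.3, because the shared confounder drops out of the sibling difference.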
Gilman and colleagues take a similar approach, considering the relatively small number of siblings in the data set who are discordant on educational attainment. They use this contrast as a way of estimating the causal effect of education on smoking intensity and quit attempts. As they note, this holds constant all factors at the family environment level, eliminating confounding by parental behaviour, community influences and many other potential common influences on exposure and outcome. The estimates reported by Gilman et al. from this fixed effects analysis are generally closer to the null than the fully adjusted estimates from the complete cohort, leading the authors to conclude that some additional confounding has been removed. This is a clever analytic strategy, and one that I hope will see wider use in social epidemiology in the future. But, as usual, the path is not always so rosy as it might first appear. As we pluck yet another plump fruit from the bountiful tree of methodology, let us be well aware of the worm that may lurk within.
Confounding is a bias, and a biased estimator leads to estimation error in expectation. But bias is not the only source of estimation error. An unbiased estimator has a distribution over repeated trials that is centered on the true value of the parameter, but it still has an expected squared error (ESE) equal to its variance. When an estimator is biased, its ESE is equal to the sum of its variance and its squared bias. It is reasonable to suggest that our goal ought not to be minimum bias (i.e. the smallest possible distance between the true value of a parameter and the expected value of the estimator), but rather, that our goal ought to be minimum error. That is, in the one trial that we get to conduct, we would like our estimate to be as close as possible to the true value. If this is indeed our goal, then the fixed effect estimate provided by Gilman et al. is not always what we are looking for.
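The decomposition described above can be written out explicitly. The notation is an assumption introduced here for illustration: $\theta$ is the true parameter value and $\hat{\theta}$ is its estimator.

```latex
\mathrm{ESE}(\hat{\theta})
  = E\!\left[(\hat{\theta} - \theta)^2\right]
  = \underbrace{E\!\left[(\hat{\theta} - E[\hat{\theta}])^2\right]}_{\text{variance}}
  + \underbrace{\left(E[\hat{\theta}] - \theta\right)^2}_{\text{squared bias}}
```

For an unbiased estimator the second term is zero, so the ESE reduces to the variance alone.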
Let's take as an illustrative example the estimates presented in the abstract. The number of pack-years smoked was higher in those with less than a high school education, with a fully adjusted rate ratio (RR) equal to 1.58 (95% CI: 1.31, 1.91) in the full cohort, and with a RR equal to 1.23 (95% CI: 0.80, 1.93) in the sibling fixed effects analysis. Even in the best case scenario that the latter estimate is completely unconfounded, it is still considerably more imprecise; the Model III standard error of the ln(RR) is 0.225 compared with the Model II standard error of 0.096 (Figure 1). If the null hypothesis is true, then the ESE for the fixed effects estimator may indeed be the smaller of the two. If there really is a causal effect of education on smoking, however, then the confounded estimate in the full cohort can outperform the unbiased estimate by having the sum of its squared bias and variance still be less than the ESE of the fixed effects estimate. If the true causal effect is a RR of around 1.5, for example, then obtaining the value of 1.23 with the unbiased fixed effects estimator is not at all unusual; the difference between the log values is about 0.9 SD, and so we would expect to see a value this far or further from the true value in more than a third of all repetitions of the trial. Moreover, the point estimates are not substantially different, as Figure 1 makes clear, while the fixed effects estimate is dramatically less precise. The entire range of hypothetical RR values that was compatible with the data under Model II is still compatible with the data under Model III.
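The arithmetic in this example can be checked with a few lines of Python. This is a rough sketch, not the authors' calculation: it assumes Wald-type intervals that are symmetric on the log scale, takes the full-cohort upper limit as 1.91 (the value consistent with the reported standard error of 0.096), and treats the true RR of 1.5 as purely hypothetical. The helper name `se_from_ci` is introduced here for illustration.

```python
import math
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Approximate SE of ln(RR) from a Wald-type CI given on the ratio scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)

se_full = se_from_ci(1.31, 1.91)  # Model II, full cohort
se_sib = se_from_ci(0.80, 1.93)   # Model III, sibling fixed effects

# Distance, in SE units, between a hypothetical true RR of 1.5 and the
# observed fixed effects estimate of 1.23.
true_rr = 1.5  # hypothetical value, as in the text
z_dist = (math.log(true_rr) - math.log(1.23)) / se_sib

# Two-sided probability of landing this far or further from the truth.
tail = 2 * (1 - NormalDist().cdf(abs(z_dist)))

# Largest bias on the ln(RR) scale that the full-cohort estimator could carry
# and still have a smaller ESE than an unbiased fixed effects estimator:
# bias^2 + se_full^2 < se_sib^2.
max_bias = math.sqrt(se_sib ** 2 - se_full ** 2)

print(f"SE full cohort: {se_full:.3f}, SE siblings: {se_sib:.3f}")
print(f"distance: {z_dist:.2f} SD, tail probability: {tail:.2f}")
print(f"break-even bias on ln(RR) scale: {max_bias:.3f}")
```

The break-even bias makes the trade-off concrete: under these assumptions, the confounded full-cohort estimator has the smaller ESE whenever its bias on the log scale is below roughly 0.2, which corresponds to a multiplicative bias in the RR of a little over 1.2.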
Fixed effects models are a creative and valuable addition to the social epidemiologist's armamentarium, but sometimes the tool can simply be too expensive. When a very small subset of the cohort is available for the discordant analysis, as in the papers by Nelson et al.9 and by Gilman et al.,6 one must be concerned about two things. One is the highly selected nature of the subset. Who are these siblings, who are so disparate in their education or breast-feeding exposures, and what happened to create that disparity? Causal inference is premised on the idea that one sibling just happened to get more education than the other, as if through some kind of lottery, but this seems suspicious. Siblings share on average 50% of their alleles, but if in that remaining 50% that are not shared, one sibling has an allele that led to an impulsive phenotype, this could predict lower educational attainment and more smoking. Such a trait would not be adjusted for in the fixed effects analysis, a caveat that Gilman and colleagues are very careful to note. But the second concern about the small subset of discordant pairs is about the precision of the estimator. In general, we ought to try to be close to the right answer. The logical basis for random-effects models, for example, is that a biased estimator can outperform an unbiased estimator by having lower variance, and therefore lower ESE.10 This same logic will sometimes prefer a confounded estimate over an unconfounded estimate if the adjustment comes at a huge price in terms of precision. An exception to this notion might be if the result is simply going to be contributed to a later meta-analysis, instead of interpreted on its own. In that case, the bias vs precision trade-off might be weighed differently. But in the case that one wants an interpretation of the study that was conducted, then error is error, and the less of it, the better.
References
1 Susser M. Causal Thinking in the Health Sciences: Concepts and Strategies of Epidemiology (1973) New York: Oxford University Press. 181.
2 Oakes JM, Kaufman JS, eds. Methods in Social Epidemiology. (2006) San Francisco: Jossey Bass.
3 O'Campo P, Xue X, Wang MC, Caughy M. Neighborhood risk factors for low birthweight in Baltimore: a multilevel analysis. Am J Public Health (1997) 87:1113–18.
4 Bingenheimer JB, Brennan RT, Earls FJ. Firearm violence exposure and serious violent behavior. Science (2005) 308:1323–26.
5 de Walque D. Does education affect smoking behaviors? Evidence using the Vietnam draft as an instrument for college education. J Health Econ (2007) 26:877–95.
6 Gilman SE, Martin LT, Abrams DB, et al. Educational attainment and cigarette smoking: a causal association? In: International Journal of Epidemiology (2006) doi:10.1093/ije/dym250.
7 Holford TR, White C, Kelsey JL. Multivariate analysis for matched case-control studies. Am J Epidemiol (1978) 107:245–56.
8 Sastry N, Hussey JM. An investigation of racial and ethnic disparities in birth weight in Chicago neighborhoods. Demography (2003) 40:701–25.
9 Nelson MC, Gordon-Larsen P, Adair LS. Are adolescents who were breast-fed less likely to be overweight? Analyses of sibling pairs to reduce confounding. Epidemiology (2005) 16:247–53.
10 Greenland S. Principles of multilevel modelling. Int J Epidemiol (2000) 29:158–67.