IJE Advance Access originally published online on November 3, 2008
International Journal of Epidemiology 2009 38(1):274-275; doi:10.1093/ije/dyn232
Commentary: How small is small?
Strangeways Research Laboratory, University of Cambridge, Wort's Causeway, Cambridge CB1 8RN, UK.
E-mail: nick.day{at}cwgsy.net
Accepted 1 October 2008
The paper by Prof. Burton and his colleagues [1] provides a more comprehensive approach to power calculations than that usually used in population genomics. By explicitly incorporating factors that are known to diminish power, they provide a tool for generating realistic power profiles. This is clearly a useful advance on previous approaches, which considered such factors as disease misclassification and measurement error more as nuisance parameters, worthy at best of a footnote.
The paper, however, seems to suffer from a certain lack of focus, as if the authors were dazzled by the power of modern genotyping technology. The scene is set early in the introduction, where they claim that a series of recent publications has convincingly identified a range of genetic associations with chronic disease, but that prior to these publications, genetic association studies were strikingly inconsistent. Of the recent publications they cite, 15 out of 20 were published in 2007, the remaining 5 in 2005 and 2006. It is an astonishing dismissal of several decades of productive work on candidate genes prior to 2005, when many gene–disease associations were reliably identified. Even in the 1970s quite a number of disease associations with HLA genes were identified and replicated [2,3]. In subsequent years, many further candidate genes have been clearly established.
What has changed, and changed dramatically, in recent years is the capacity to genotype on a scale unimaginable even 10 years ago. The effect has been two-fold. First, vast numbers of polymorphisms can be studied simultaneously, rather than focusing attention on a small number of genes that a priori might be thought to modify risk. Second, very many more individuals can be genotyped in a single study. The earlier work, concentrating on candidate genes, was designed to look for effects that today would be considered large, i.e. relative risks for carriers of a high-risk allele approaching 2. Sample sizes of a few hundred were adequate [4]. With the revolution in genotyping came the realization that much smaller relative risks could be unambiguously identified.
The problem, however, is that for most complex diseases one would expect an increasing number of associations the smaller the size of effect one looks for. That is, one might expect the number of polymorphisms with relative risks of 1.05 to be considerably larger than the number of polymorphisms with relative risks of 1.25, in turn larger than the number with relative risks of 1.5. Where does one stop? It is clearly a situation of rapidly diminishing returns: the smaller the effect, the greater the resources needed to identify it.
The authors start their discussion with the clarion call "Big bioscience is critically poised ... . It is essential to close the reality gap that currently exists between the sample sizes really required to detect determinants of scientific interest that have plausible bioclinical effects, and the sample sizes that are typically used ...". Clearly one needs sample sizes sufficient to give good power to detect the effects one is interested in, but how small can or should these be? How small does small have to be for it to be not worth studying? If genotyping and sample collection were free, there might be no limit to the smallness of effect one hoped to demonstrate, but in the real world nothing is free and there is competition for funds.
The authors avoid the issue of how small the effects are that one should design studies to detect. Instead, they ask the question "How far should the international investment in biobanking go?", and here the lack of focus is striking. They appear to equate case-series-based studies, such as the WTCCC, with long-term cohort studies, such as the UK and other Biobanks. The former can bring together very large numbers of cases in a short period of time. They are clearly ideal for studying genetic effects, including gene–gene interactions. The main limitation on the smallness of effect they are designed to detect, i.e. the number of individuals included, is the cost of genotyping. It is from studies of this type that one would expect the major new discoveries of purely genetic effects to emerge, and as the cost of genotyping continues to decline, effects of ever smaller size. In addition, there is a good practical justification for wanting to identify a large number of weak-effect polymorphisms [5].
Cohort studies are essentially different. The major limitation on size is the cost of recruiting individuals into the study, many of whom will only appear in specific disease studies many years in the future, if at all. With cases appearing only slowly over time, they are very inefficient for studying purely genetic effects. Case–control studies will have provided answers long before these newly launched cohort studies have accrued sufficient cases. What, then, are these large cohort studies for? Clearly, to study environmental (in the broad sense) determinants of disease, and specifically gene–environment interactions. So how large does a gene–environment interaction have to be in order to be of real interest? Considerably larger, I would suspect, than a purely genetic effect. How is one going to use an interaction effect (i.e. ratio of relative risks) of 1.5 or less? Furthermore, for studying interactions, the reliability of the environmental measures is crucial, as can be seen in the authors' Figure 2. In the design of Biobanks, one has the impression that quality of measurement has been sacrificed for sample size, one suspects because they were mistakenly focused on purely genetic effects.
The basis for deciding future investment in biobanks depends on much more than power curves. It requires a clear definition of both the nature and the size of effects being sought.
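The diminishing returns described above can be made concrete with a rough calculation. The sketch below is not the method of Burton and colleagues, which also allows for misclassification and measurement error. It is the textbook two-proportion sample size formula for an unmatched case–control study with equal numbers of cases and controls, with the odds ratio standing in for the relative risk. The carrier frequency of 0.2 among controls, the two-sided 5% significance level with no correction for multiple testing, the 80% power, and the function names are all assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def carrier_freq_in_cases(p0: float, odds_ratio: float) -> float:
    """Carrier frequency among cases implied by the control frequency p0 and the odds ratio."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def cases_needed(odds_ratio: float, p0: float = 0.2,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Cases required (with an equal number of controls) for a two-sided
    comparison of carrier frequencies. All defaults are illustrative assumptions."""
    p1 = carrier_freq_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Required numbers grow steeply as the effect shrinks.
for odds_ratio in (2.0, 1.5, 1.25, 1.05):
    n = cases_needed(odds_ratio)
    print(f"odds ratio {odds_ratio:4.2f}: about {n:,} cases and as many controls")
```

Under these assumptions an odds ratio of 2 needs fewer than two hundred cases, whereas an odds ratio of 1.05 needs tens of thousands. A genome-wide significance threshold or imperfect measurement would raise every figure further.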
Finally, perhaps the most elegant result from genome-wide association studies that has so far appeared is the one linking statin-associated myopathy to a non-synonymous SNP linked to statin metabolism, where the odds ratio between the two homozygotes was 16.9 [6]. The study was based on 85 cases and 90 controls. In the days of the Hadron collider and big bioscience, one may need reminding that imagination and intelligence have always been prime drivers of scientific advance.
Conflicts of interest: None declared.
References
1 Burton PR, Hansell AL, Fortier I, et al. Size matters: just how big is BIG?: quantifying realistic sample size requirements for human genome epidemiology. Int J Epidemiol (2009) 38:263–73.
2 HLA and Disease 1976. INSERM Publication No. 58.
3 Kissmeyer-Nielsen F, ed. Histocompatibility Testing. (1975) Copenhagen: Munksgaard.
4 Chan SH, Day NE, Kunaratnam N, Chia KB, Simons MJ. HLA and nasopharyngeal cancer in Chinese: a further study. Int J Cancer (1983) 32:171–76.
5 Pharoah PDP, Antoniou A, Bobrow M, Zimmern RL, Easton DF, Ponder BAJ. Polygenic susceptibility to breast cancer and implications for prevention. Nat Genet (2002) 31:33–36.
6 SEARCH Study Collaborative Group. Genome-wide study finds SLCO1B1 variants strongly linked to statin-induced myopathy. N Engl J Med (2008) 359:789–99.