IJE Advance Access first published online on March 2, 2007
This version published online on March 11, 2007
International Journal of Epidemiology, doi:10.1093/ije/dyl291
First steps in molecular epidemiology: Lower et al. 1979
Imperial College London and University of Torino.
E-mail: p.vineis{at}imperial.ac.uk
Accepted 1 December 2006
It is usually believed that the expression 'molecular epidemiology' was first introduced (at least for chronic diseases) in the seminal paper by Perera and Weinstein in 1982.1 It should not be forgotten, however, that the expression had already appeared in the title of the paper by Lower et al. published in 1979.2 Terminology apart, the paper we submit to the attention of the readers as a historical reprint raises a number of interesting issues.
A priori hypothesis
The paper is rather surprising because about half of it is taken up by a long introduction on the biological premises of the work. This would probably not be accepted by the editor of an epidemiological journal today. Most papers nowadays are brief and factual, and convey a simple, short, take-home message. Also, in the era of high-throughput technologies, rather than sound hypotheses being put forward to justify the choice of particular genes or exposures for investigation, we see the burgeoning of genome-wide, proteome-wide and other ome-wide scanning, which apparently does not need the long and tedious work of thinking in terms of models of causation.
The paper by Lower and colleagues is particularly well written because it sets out very elegantly a theory of bladder carcinogenesis, from which a study design, with specific hypotheses to be tested, ensues. Since bladder cancer was known to be mainly due to exposure to aromatic amines (as occupational carcinogens or in tobacco smoke), the authors rightly chose to investigate the role of N-acetyltransferase 2 (NAT2) in modulating the risk of cancer. Very clear graphical representations in the paper show the metabolic pathways that aromatic amines undergo and the role played by NAT2. At that time it was not yet possible to genotype subjects in the context of an epidemiological investigation; cases and controls were therefore phenotyped with a biochemical method that separated the rapid from the slow metabolizers very clearly. The study shows a difference in the distribution of the acetylator phenotype between the cases and the controls (as expected a priori, with more slow acetylators among the former), but only in (urban) Denmark and not in (rural) Sweden. The authors attribute the difference between rural Sweden and urban Denmark to the different levels of exposure to aromatic amines in the two study locations, from occupational or environmental (air pollution) sources. This interpretation is plausible but uncertain, since studies have generally failed to show a clear association between air pollution and bladder cancer, and occupational exposures to aromatic amines are rare. The reasoning is nevertheless interesting because it correctly stresses the crucial role played by the environment when genetic susceptibility is investigated.
The association between bladder cancer and the NAT2 genotype is probably one of the best investigated in the history of the genetics of cancer, and one of the few, concerning low-penetrant genes, that have been replicated several times.3,4 The association with NAT2 has usually been found in exposed groups in particular, such as industrial workers or smokers, suggesting that Lower's idea that the discrepancy between Sweden and Denmark could be due to exposures was essentially correct. In fact, this intuition has a more general implication that I will address below.
A few remarks on the study design and analysis are still in order. The study, which is clearly interesting and generally well conducted, nevertheless reflects the limitations of several similar studies conducted in the early days of molecular epidemiology, namely the lack of a series of controls who were really comparable to the case series; in this particular example, controls were recruited both from the hospital personnel and among the patients, a questionable choice. The statistical analysis is also simplified, being based on the calculation of P-values rather than measures of association with their confidence limits. This is not a purely academic issue because it is well known that the P-value has serious limitations in its interpretation. But these critical comments do not detract from the great merits of the paper.
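To illustrate what a measure of association with its confidence limits looks like in practice, the following is a minimal sketch of an odds ratio with an approximate 95% confidence interval (the log-based Woolf method) for a 2x2 table. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from Lower's paper.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence limits for a 2x2 table.

    a = slow-acetylator cases,    b = rapid-acetylator cases,
    c = slow-acetylator controls, d = rapid-acetylator controls.
    The interval uses the standard error of log(OR) (Woolf method);
    z = 1.96 gives approximately 95% limits.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from any study).
or_, lower, upper = odds_ratio_with_ci(a=40, b=20, c=30, d=30)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Unlike a bare P-value, the interval conveys both the size of the estimated association and the precision with which it has been estimated.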
Early examples of Mendelian randomization
The association between NAT2 and bladder cancer also represents one of the first examples of Mendelian randomization. The latter consists of assessing the relationship between a gene variant that is known to influence the levels of the exposure variable, and the disease. Since a gene variant is randomly assorted from parents to the offspring, the association between the gene variant and the disease is not confounded by covariates that usually confound epidemiological associations.5 Key concepts of Mendelian randomization appeared in Katan's suggestion that genetic variants related to cholesterol level could be used to investigate whether the association between low cholesterol and increased cancer risk was real,6 and in Honkanen's suggestion that lactase persistence could better characterize the difficult-to-measure environmental exposure of calcium intake.7
When early studies suggested that bladder cancer was associated not only with tobacco smoking, but in particular with the smoking of black tobacco, the hypothesis was put forward that this was due to the higher content of aromatic amines in black, air-cured tobacco.8 So, one side of the causal reasoning was the explanation, in terms of aromatic amines, of the association between black tobacco and bladder cancer. However, we could not exclude the possibility that the association was mediated and confounded by other characteristics of black tobacco cigarettes, such as the lack of a filter or the amount of tar. To reinforce the hypothesis that aromatic amines and not other variables (confounders) explained the observation, we did a study on NAT2 and adducts of the aromatic amine 4-aminobiphenyl, a potent carcinogen in tobacco smoke. We found that (i) the biomarker was indeed higher in the blood of black tobacco smokers than in the blood of smokers of other types of tobacco or of non-smokers; and (ii) the levels of the marker were higher in slow acetylators, as expected.9 This was the second arm of the Mendelian randomization reasoning. The third arm was to show that bladder cancer was more likely to occur in slow acetylators, as we, Lower and others showed.
Mendelian randomization can be a powerful way to disentangle the role of genuine risk factors, including compounds in a mixture, from the role of confounders.
Gene-environment interactions without the environment?
Today the tendency is to study a large number of genes with high-throughput technologies. Platforms such as Illumina allow for the investigation of up to 300 000 to 500 000 gene variants. The usual design is based on a first genome-wide scan, which leads to the selection of a number of positive results (depending on the level of α error established a priori), followed by two to four replications in independent data sets. The gene variants investigated are thus narrowed down in each replicate until a small number (say, a dozen) of gene variants are left as the end product. There are several critical aspects in this procedure.
- Lack of data on environmental exposures. It is well known, and well stated in Lower's paper, that the role of low-penetrant gene variants is related to environmental exposures (and in fact Mendelian randomization has been used to confirm the role of the latter on the basis of genetic knowledge).10 Whole genome scans do not use information on exposures and do not take gene-environment interactions into account. What are the implications of this choice? Let us suppose that the effect of the gene variant is zero (OR = 1.0) in the unexposed, while the OR is 4.0 in the exposed, a realistic assumption for many gene-environment interactions. The main effect of the gene will depend on the prevalence of exposure. If the prevalence of exposure is 50%, then the main effect will be represented by an OR of 2.5, but if the prevalence of the relevant exposure is just 10% (which is in fact more realistic for many environmental exposures) then the main effect OR becomes 1.3, hardly detectable with a reasonable sample size.
  The question one can legitimately ask is whether we would have identified NAT2 as relevant to bladder cancer through a whole genome scan. In addition to NAT2, the second validated low-penetrant gene variant for bladder cancer is GSTM1 deletion.4 Unfortunately, deletions are not detected through whole genome scans.
- Study design. The study design for the investigation of gene variants is usually the case-control investigation. In a specific whole genome scan, cases come from a hospital series (rarely they have a population basis) and controls come from disparate sources including blood donors. Is this a sensible strategy? At first glance, one would say yes, because it is unlikely that, within an ethnically homogeneous population (that is, leaving population stratification aside), gene variants are subject to selection bias. However, on closer inspection, selection bias is likely. A blood donor sample is likely to be selected in that it tends to exclude subjects with extreme behaviours and exposures, and this can introduce bias in the comparison with the case series in case-control studies. This would impair not only a reasonable investigation of risk factors (which is not a goal of whole genome scans) but also that of gene-environment interactions and genetic associations. For example, it is likely that the DRD2 gene polymorphisms are associated with both alcohol drinking behaviours and alcohol-related disease. Selected healthy blood donors will have a lower proportion of heavy drinkers and alcohol-related diseases than a random sample of the general population, with a consequently lower representation of the high-risk gene variants. This can also happen in population controls, due to selective response, but will not happen in cases, who do not undergo the same selection process.
  The rather obvious solution is to prefer studies nested within cohorts, since the cases and the controls come from the same source population, i.e. no selection bias is present, and cases and controls are strictly comparable. In addition, exposure ascertainment is usually much more accurate in cohort studies, thus allowing a better understanding of gene-environment interactions. However, a disadvantage of cohort studies is that they usually focus on a relatively limited range of exposures, i.e. they do not cover the wide range covered (retrospectively) in case-control studies.
- A further, unresolved problem is not only the (internal) comparability of cases and controls in each replicate, but also the comparability of replicates. If the first study is a population-based case-control study (PBCC), the second a hospital-based case-control study (HBCC) and the third a cohort (C), how can their results be compared? In the example above, it is possible that in the PBCC there is an underrepresentation of heavy drinkers among controls (but less so in the case series), with the ensuing underrepresentation also of DRD2 variants; in the second study (HBCC) anything can happen, depending on the choice of hospital controls, while in the cohort study at least cases and controls are mutually comparable, although the cohort itself is usually unlikely to be representative of the general population.
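The main-effect odds ratios quoted in the first point above (2.5 at 50% exposure prevalence, 1.3 at 10%) can be reproduced with a simple prevalence-weighted average of the stratum-specific effects. The sketch below is an approximation under stated assumptions, not a general formula: gene variant and exposure are assumed independent in the population, the disease is assumed rare, and the exposure is assumed to have no effect of its own in non-carriers. The function name is hypothetical.

```python
def marginal_gene_or(prevalence, or_unexposed=1.0, or_exposed=4.0):
    """Approximate main-effect OR of a gene variant ignoring exposure.

    Computed as the exposure-prevalence-weighted average of the gene
    effect in the unexposed and in the exposed. Assumes gene-exposure
    independence, a rare disease, and no effect of the exposure in
    non-carriers (all simplifying assumptions of this sketch).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return (1.0 - prevalence) * or_unexposed + prevalence * or_exposed


for p in (0.5, 0.1):
    print(f"exposure prevalence {p:.0%}: main-effect OR = {marginal_gene_or(p):.1f}")
```

With the defaults taken from the text (OR = 1.0 in the unexposed, 4.0 in the exposed), the main effect shrinks towards 1.0 as the exposure becomes rarer, which is why a scan that ignores exposure can miss a variant with a strong effect confined to the exposed.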
In summary, it is not necessarily true that the main problem in the study of main effects of gene variants through whole genome scanning is false positives due to the large number of comparisons (which justifies the replications in different data sets). In fact, false positives from selection bias are also a problem, together with the general attenuation, if not disappearance, of the genetic associations due to the lack of information on exposures. Some of these problems were already alluded to in Lower's paper of 1979.
Acknowledgements
I am grateful to Paul Brennan for thoughtful comments. This work was made possible by an EU grant to the ECNIS Network of Excellence (grant FOOD-CT-2005-513943) (WP8).
References
1 Perera FP, Weinstein IB. (1982) Molecular epidemiology and carcinogen-DNA adduct detection: new approaches to studies of human cancer causation. J Chronic Dis 35:581-600.
2 Lower GM, Nilsson T, Nelson CE, Wolf H, Gamsky TE, Bryan GT. (1979) N-acetyltransferase phenotype and risk in urinary bladder cancer: approaches in molecular epidemiology. Preliminary results in Sweden and Denmark. Environ Health Perspect 29:71-79 (Reprinted Int J Epidemiol doi:10.1093/ije/dyl290).
3 Vineis P, Caporaso N, Cuzick J, Lang M, Malats N, Boffetta P. (1999) Genetic Susceptibility to Cancer: Metabolic Polymorphisms. Scientific Publ. (IARC, Lyon, France).
4 Garcia-Closas M, Malats N, Silverman D, et al. (2005) NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. Lancet 366:649-59.
5 Davey Smith G, Ebrahim S, Lewis S, Hansell AL, Palmer LJ, Burton PR. (2005) Genetic epidemiology and public health: hope, hype and future prospects. Lancet 366:1484-89.
6 Katan MB. (1986) Apolipoprotein E isoforms, serum cholesterol and cancer. Lancet 327:507-8.
7 Honkanen R, Pulkkinen P, Järvinen MR. (1996) Does lactose intolerance predispose to low bone density? A population-based study of perimenopausal Finnish women. Bone 19:23-28.
8 Bartsch H, Malaveille C, Friesen M, Kadlubar FF, Vineis P. (1993) Black (air-cured) and blond (flue-cured) tobacco cancer risk. IV: Molecular dosimetry studies implicate aromatic amines as bladder carcinogens. Eur J Cancer 29A:1199-207.
9 Vineis P, Bartsch H, Caporaso N, et al. (1994) Genetically based N-acetyltransferase metabolic polymorphism and low-level environmental exposure to carcinogens. Nature 369:154-56.
10 Vineis P, Airoldi L, Veglia P, et al. (2005) Environmental tobacco smoke and risk of respiratory cancer and chronic obstructive pulmonary disease in former smokers and never smokers in the EPIC prospective study. BMJ 330:277.