IJE Advance Access originally published online on May 3, 2006
International Journal of Epidemiology 2006 35(3):536-537; doi:10.1093/ije/dyl070
Commentary: Statistical analysis or biological analysis as tools for understanding biological causes
Museum of Comparative Zoology, Harvard University, MA, USA
E-mail: lewontin{at}oeb.harvard.edu
The Analysis of Variance and the Analysis of Causes [1] was a paper of its time. The problem it addressed was how the observed variation in phenotype among human individuals and groups was analysed in an attempt to separate and assign importance to the roles of genetic and environmental variation. It dealt with the conceptual errors involved in the formulation of the problem and in the understanding of the statistical methodology then available to investigate the question. Thirty-five years ago, when that essay and the two papers in the American Journal of Human Genetics that were its instigation were published, the methodology available to human geneticists for understanding the interplay between genetic and environmental variation was severely limited. There were relatively few cases in which alternative alleles at single gene loci could be identified and associated with particular phenotypes as, for example, sickle cell anaemia. In those cases there was no ambiguity about genetic and environmental effects. However, for most characteristics of interest to human geneticists, such as longevity, disease susceptibility, and behaviour, phenotypic variation could not be related to well-characterized genetic variants of known genes. Genetic variation meant, in practice, differences among individuals or groups having different degrees of relationship with each other. Identical twins are genetically identical while full sibs share, on the average, only half of their allelic states. Members of the same geographical race will have a higher probability of having alleles in common than individuals with different geographical ancestry.
Using the average degree of genetic similarity among individuals of known ancestral relationship, calculated from genetic principles, and drawing on rather non-specific estimates of the similarities and differences between their environments, estimates were made of the proportion of all the variation in a phenotypic characteristic that was caused by genetic as opposed to environmental differences among them.
The statistical methodology employed for investigating the sources of variation was the Analysis of Variance (ANOVA), a technique originally developed to analyse the genetic and environmental influences on characters of agricultural importance in domesticated plants and animals. The use of ANOVA in agricultural genetics is somewhat less problematical than in human genetics. First, a virtually unlimited number of genetically identical individuals can be produced by making inbred lines or, for some plants, by clonal reproduction. Second, in a typical agricultural trial, genetically different strains are tested together in the field at defined multiple locations and in multiple years. While year-to-year effects cannot be duplicated, the same localities (experimental plots on designated farms) can be employed over and over again. In human genetics, on the other hand, whole genotypes cannot be replicated in large numbers and there is no real control or replicability of the environments in which individuals develop. Nevertheless, the general structure of inference, using ANOVA, is essentially the same in human and in agricultural genetics. The point of the paper was to explain why the statistical partitioning of observed variation in phenotype into variance associated with variation in genetic relationship as opposed to variance assigned to environmental dissimilarities does not, in fact, separate genetic and environmental causes in development, whether in human genetics or in agricultural applications. The reason why the partitioning of variance does not partition causes is that changing the distribution of genotypes will also change the environmental variance, while changes in the distribution of environments will also change the genetic variance. Moreover, neither the magnitude nor the direction of these changes can be predicted from the analysis.
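The dependence of each variance component on the distribution of the other factor can be illustrated with a small numerical sketch. The phenotype table below is invented purely for illustration (two hypothetical genotypes whose norms of reaction cross over three hypothetical environments), and the function name `partition_variance` is an assumption of this sketch, not something taken from the original paper. It computes the usual main-effect and interaction components for a population in which genotypes and environments are distributed independently.

```python
# Hypothetical norms of reaction: rows are genotypes, columns are environments.
# The numbers are made up for illustration; the two norms cross.
NORMS = [
    [10.0, 12.0, 14.0],  # genotype G1: phenotype rises across environments
    [14.0, 12.0, 10.0],  # genotype G2: phenotype falls across environments
]


def partition_variance(phenotype, geno_freq, env_freq):
    """Split phenotypic variance into genetic, environmental and
    interaction components, assuming genotypes and environments are
    distributed independently with the given frequencies."""
    genos = range(len(geno_freq))
    envs = range(len(env_freq))

    # Population mean phenotype.
    mean = sum(geno_freq[g] * env_freq[e] * phenotype[g][e]
               for g in genos for e in envs)

    # Marginal means: each genotype averaged over environments, and vice versa.
    geno_means = [sum(env_freq[e] * phenotype[g][e] for e in envs) for g in genos]
    env_means = [sum(geno_freq[g] * phenotype[g][e] for g in genos) for e in envs]

    v_genetic = sum(geno_freq[g] * (geno_means[g] - mean) ** 2 for g in genos)
    v_environmental = sum(env_freq[e] * (env_means[e] - mean) ** 2 for e in envs)
    v_total = sum(geno_freq[g] * env_freq[e] * (phenotype[g][e] - mean) ** 2
                  for g in genos for e in envs)

    return {
        "genetic": v_genetic,
        "environmental": v_environmental,
        "interaction": v_total - v_genetic - v_environmental,
        "total": v_total,
    }


if __name__ == "__main__":
    populations = {
        "equal genotypes, uniform environments": ([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]),
        "equal genotypes, skewed environments": ([0.5, 0.5], [0.8, 0.1, 0.1]),
        "skewed genotypes, uniform environments": ([0.9, 0.1], [1 / 3, 1 / 3, 1 / 3]),
    }
    for label, (geno_freq, env_freq) in populations.items():
        parts = partition_variance(NORMS, geno_freq, env_freq)
        print(label)
        for name, value in parts.items():
            print(f"  {name:>13}: {value:.3f}")
```

With the same phenotype table throughout, the genetic component is essentially zero in the first population and clearly non-zero in the second, although only the distribution of environments has changed. Likewise, the environmental component moves away from zero in the third population, where only the genotype frequencies differ. The underlying developmental relations (the table) are identical in all three cases, which is the sense in which the partition of variance is not a partition of causes.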
The methodology of genetics has changed drastically in the last 35 years. Reports of enzyme variants detected by gel electrophoresis being associated with pathologies began to appear in the American Journal of Human Genetics in the late 1960s and made up 20% of research papers in the American Journal of Human Genetics by 1974, when the paper on the analysis of causes was written. Such identification of single gene differences by electrophoresis made it possible to examine the phenotype of large numbers of individuals with the same variant in genotype. However, the simple knowledge that one or more unspecified amino acids differed between the proteins involved was not sufficient to provide a causal analysis of phenotypic difference. It gave only an observed correlation between phenotype and some unspecified change in a specific gene.
The real revolution in causal analysis has been a result of studies at the level of DNA and RNA. First, the detailed maps of the genome at the DNA level have made it possible, by the co-segregation techniques originally used in Quantitative Trait Locus (QTL) studies, to localize a genetic variant to a small region of a chromosome. Next, candidate genes in this region can be sequenced and inferences made about the DNA change that is connected with the phenotypic difference. Finally, it then becomes possible to provide a detailed molecular analysis of the chain of causation between nucleotide substitution and cell development and function.
This radical change in the process of investigation from statistical inference to concrete molecular and cellular investigation has had both a positive and a negative consequence. On the positive side it becomes possible to provide a concrete causal explanation of changes in phenotype, a form of explanation that must be the eventual goal of functional biology. On the negative side, however, it is a powerful reinforcement of the erroneous notion that variation in phenotype is entirely the consequence of genetic variation. Despite its weaknesses and misinterpretations, the statistical analysis of variation into proportions of genetic, environmental and gene-environment interaction components of phenotypic variance reinforced the correct understanding that genes and environment are effective variables in development and function. In its present stage, dizzy with success, genetics has lost sight of half the story of biological causation. If the ANOVA did nothing else, it created a mind-set that was much closer to the truth than the naive current prejudice that DNA has in it all the information necessary to specify the organism.
Reference
1 Lewontin RC. The analysis of variance and the analysis of causes. Am J Hum Genet 1974;26:400–11. (Reprinted Int J Epidemiol 2006;35:520–25.)