
International Journal of Epidemiology 2008 37(3):447-451; doi:10.1093/ije/dyn049

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2008; all rights reserved.

Commentary: On Haldane's ‘defense of beanbag genetics’

Warren J Ewens

Department of Biology, The University of Pennsylvania, Philadelphia PA 19104, USA. E-mail: wewens{at}sas.upenn.edu

Accepted 23 January 2008


    Introduction
 
It is now almost 45 years since Haldane's ‘Defense of beanbag genetics’1 appeared, and the time is ripe for an evaluation of that defence, particularly in the light of developments over the last two decades. This deliberately provocative review is divided into four parts: first, a discussion of why Haldane felt that a defence was necessary; second, a review of what his defence actually was; third, suggestions as to what his defence should have been; and fourth, a discussion of what changes might be made to the defence had it been written now and not in 1964.


    Why did Haldane defend ‘beanbag genetics’?
 
Haldane's ‘defense of beanbag genetics’ arose from two comments made by Mayr in 1959 and 1963, respectively. The 1963 comment did little more than introduce the ‘beanbag’ expression, and was made at the beginning of Chapter 10 in his classic book Animal Species and Evolution.2 Mayr stated that: ‘The Mendelian was apt to compare the genetic contents of a population to a bag full of colored beans’, and that ‘thinking of beanbag genetics is in many ways quite misleading’. The 1959 comment was made at a Cold Spring Harbor symposium,3 and was more challenging to population geneticists: ‘What, precisely, has been the contribution of the [Fisher, Wright and Haldane] mathematical school to evolutionary thinking?’ We start by discussing Mayr's beanbag comment and the reason why it was made.

The beanbag comment, although made at the beginning of Chapter 10 of Mayr's book, was in my opinion motivated by the closing comments in Chapter 9, in which he discussed the then-new concept of the substitutional load, or the ‘cost of natural selection’. This concept, put forward by Haldane4,5 a few years before Mayr's comments on it, in effect claimed that the rate of evolution by natural selection was severely limited because too rapid a rate would require an unbearable offspring requirement on selectively favoured individuals. This argument, together with further calculations made mainly by Kimura,6 became the initial impetus for Kimura's controversial neutral theory of evolution. This is not the place to go into the detailed mathematics of why I for one, and many others, find the mathematical calculations supporting the neutral theory to be misguided. It is sufficient to note for the purposes of evaluating the usefulness of population genetics theory that perhaps the major reason for disliking these calculations is that they were made on an inappropriate reductionist basis. That is, single locus fitnesses were in effect assigned to the various genotypes at any locus, and the fitness of the entire individual is then found, implicitly, by multiplying the fitnesses of the genotypes at each locus that this individual has over all loci in the genome. This leads to absurdly high fitness values, and these caused much unnecessary confusion concerning the substitutional load concept. As stated above, this concept was, arguably, the impetus of the beanbag comments, but Haldane refers only once to this specific issue in his ‘defense’, where he claims that his calculations on this load perhaps define the main factor in determining the speed of evolution. If so, the essential collapse of the load concept in the last few decades provides a negative, not a positive, commentary on the value of some of his mathematical calculations.
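
The scale of the problem with multiplying single-locus fitnesses can be seen from a small numerical sketch. The per-locus advantage and the numbers of loci used below are illustrative assumptions, not values taken from Haldane or Kimura, and the function name is hypothetical.

```python
def multiplicative_fitness(s: float, n_loci: int) -> float:
    """Relative fitness of an individual carrying the favoured genotype at
    every one of n_loci loci, when per-locus fitnesses (1 + s) are multiplied.
    Both s and n_loci are illustrative assumptions."""
    return (1.0 + s) ** n_loci


# Assumed 1% advantage per locus, compounded over increasing numbers of loci.
for n_loci in (10, 100, 1000, 10000):
    print(n_loci, multiplicative_fitness(0.01, n_loci))
```

Under these assumed values the product already exceeds 20 000 at a thousand loci, which is the kind of implied fitness the argument above describes as absurd.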

Haldane's ‘defense’ focuses on his reply to the second of the Mayr quotations given above, so it is more appropriate also to focus on the broader issues that both he and Mayr take up in this second quotation. These issues are in any event more important than the specific substitutional load concept, and they do deserve serious consideration.


    A review of the ‘defense’
 
There is much discursive comment in the defence concerning Latin, Greek and other authors, as well as commentary on various political and religious matters and other extraneous material. These are ignored in the comments that follow, and I note here only that it is sometimes difficult to follow the point of some of Haldane's arguments, since they often drift off into this extraneous material. So what were his main points of defence? Following an introductory page referring to Mayr's comments, the next two pages of the defence do no more than state that the mathematical theory of his (Haldane's) various papers in the 1920's on the rate of gene replacement under natural selection, and the parallel papers of others, do not contain any deep mathematics. While this is no doubt true, and the statement itself suitably modest, it does not advance the defence at all. Once Haldane really begins, on the fourth page of his paper, one finds, in my view, a rather weak defence of the place of mathematical calculations in evolutionary thinking.

The first point that Haldane addressed was the question of whether the fact that mutation rates are low implies that mutation is the ‘pace-maker’ of evolution, as claimed by Hogben.7 Haldane in effect claims that since mutations will arise at any gene locus several times in any generation in a population of size several hundred thousand, the mutation rate might not be, indeed probably is not, the main factor determining the rate at which evolution occurs. He further states that only an algebraic argument, which he and Wright initiated on this point, can be decisive in determining the matter. However, he does not take account of the fact that beneficial mutations form only a small proportion of all mutations, and that the probability of fixation even of a favourable mutation is quite small. Further, any mathematical treatment of this point must rely on some mathematical evolutionary model that can only imperfectly reflect reality. Finally, even among mathematical population geneticists his claim is not agreed to. Wright, in referring to his shifting balance theory of evolution, which he felt would lead to faster evolution than that arising from the Fisherian paradigm, states that evolution ‘would be very slow ... since it can be shown to be limited by the mutation rate’, while under Kimura's neutral theory the mutation rate is, exactly, the evolutionary rate. Where is there here a decisive mathematical argument?

The next point that Haldane takes up concerns his estimation of mutation rates, derived essentially from the formula for gene frequencies under a selection-mutation balance. I find his arguments here most unconvincing. First, the estimates involve estimation of selective values, and it is not shown how this is done. Second, no calculation is made of the certainly relatively large standard errors of the estimates. Third, the formulae used assume a stationary situation, and no note is taken of the effect of the certainly non-stationary behaviour of the human population for which he makes his calculations. Finally, it is not enough merely to estimate rates from theory: this is an empty exercise unless it is shown that the estimates agree with observation. He goes on to claim that later papers of his, written in the 1950's, provide even more accurate estimates, but these papers contain no theory at all and describe only proposed laboratory experiments involving rats, not humans, and provide nothing new.
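
The first of these objections can be made concrete with the textbook selection-mutation balance approximations. This is a sketch only: the formulae below are the standard approximations for a rare deleterious allele (equilibrium frequency roughly mu/s when selection acts on heterozygotes, roughly the square root of mu/s when it acts only on recessive homozygotes), and they are assumed here for illustration rather than taken from Haldane's papers. The function names and the input numbers are hypothetical.

```python
def mutation_rate_dominant(q: float, s: float) -> float:
    """Mutation rate implied by equilibrium allele frequency q when selection
    of intensity s acts against heterozygotes (approximation q = mu / s)."""
    return s * q


def mutation_rate_recessive(q: float, s: float) -> float:
    """Mutation rate implied by equilibrium allele frequency q when selection
    of intensity s acts only against recessive homozygotes
    (approximation q**2 = mu / s)."""
    return s * q * q


# The estimate is directly proportional to the assumed selective value s,
# so any uncertainty in s passes straight through to the estimate of mu.
q_observed = 0.01  # hypothetical allele frequency
for s_assumed in (0.1, 0.5, 1.0):
    print(s_assumed, mutation_rate_recessive(q_observed, s_assumed))
```

Because the estimate scales linearly with the assumed selective value, an unexplained choice of that value leaves the estimated mutation rate uncertain by the same factor.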

Haldane next took up the fact that his classic 1920's differential equations showed that the fitness differentials required to explain the rapid increase of the melanic form of Biston betularia are very high, of the order of 40%, and that these differences were confirmed by field observations. This claim can certainly be accepted as a validation of his equations. But it should be noted that this observation directly contradicts his central substitutional load calculation that a species cannot cope with an excess reproductive requirement of more than 10% for the most fit genotype. It is also claiming far too much, as he does, that ‘it was not till 1957 [when this calculation was made] that biologists took my 1924 calculation seriously’.

He next takes up the point that stable polymorphisms can exist for several reasons, including the classic heterozygote selective advantage case and the case of a selective advantage for rare genotypes (for example through a sterility mechanism). But these points can be understood and indeed arrived at from purely verbal arguments, and do not require serious mathematics. I shall discuss in the following section the points that I feel Haldane should have made to defend mathematical population genetics, and on the narrow point of heterozygous advantage, one can point out that when more than two allelic types exist in a population, the condition that all types are maintained by selection of the heterozygote advantage type, involving as it does the eigenvalues of a fitness matrix, can only be arrived at by mathematical methods.
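
For the two-allele case the stable polymorphism can be illustrated with the standard one-locus viability selection recursion. This is a minimal sketch: the fitness values 1 - s, 1 and 1 - t for the three genotypes, the starting frequencies and the function names are all assumptions chosen for illustration, and the multi-allele eigenvalue condition mentioned above is not attempted here.

```python
def next_freq(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one generation of random mating followed
    by viability selection with the given genotype fitnesses."""
    q = 1.0 - p
    mean_w = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def iterate(p0: float, w_AA: float, w_Aa: float, w_aa: float,
            generations: int) -> float:
    """Apply next_freq repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, w_AA, w_Aa, w_aa)
    return p


# Assumed heterozygote advantage: fitnesses 1 - s, 1, 1 - t.
s, t = 0.2, 0.1
p_star = t / (s + t)  # equilibrium where both alleles have equal marginal fitness
for p0 in (0.01, 0.5, 0.99):
    print(p0, iterate(p0, 1.0 - s, 1.0, 1.0 - t, 500), p_star)
```

Whatever the starting frequency, the recursion settles at t/(s + t), so both alleles are retained. The same recursion with a directional set of fitnesses describes the replacement process of the kind discussed for Biston betularia.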

And so it goes on. My main complaint about the ‘defense’ is that it focuses, in my view, on local tactical matters rather than on broad-ranging strategic matters. One cannot imagine the grand sweep of Newtonian or Einsteinian dynamics without the mathematics involved. Can one imagine the grand sweep of evolutionary theory without the mathematics? To me the answer is clearly ‘yes’: indeed, this is what Darwin produced. But are there key points of the theory that are illuminated and for which mathematics is almost a sine qua non, upon which Haldane should have based his defence?


    What should the defence have been?
 
Taking up first one of the themes of the preceding paragraph, it is obviously impossible to form a mathematical description of the evolutionary process having a precision anything like that achieved by the mathematical analyses available in physics. So far as predicting the course of evolution is concerned, the best that we can hope for is something like the use of mathematics in an area like weather prediction, where only approximate and short-range procedures are possible. The complexities of biological systems and the unavoidable stochastic element make this so. Beyond this, however, there is another area where one might claim an essential role for mathematics, namely by arriving at broad general evolutionary principles that can be reached only by a mathematical analysis. What principles have been reached in this way, and thus what defence should Haldane have made?

Before pursuing this point, it is perhaps useful to draw another parallel between the application of mathematics in physics and in evolutionary genetics. I feel sure that Fisher saw himself in his relation with Darwin as Maxwell was in his relation with Faraday. Both Darwin and Faraday were marvellous observers of nature, experimentalists and theorists. Both admitted to essentially no skills in mathematics. Maxwell completed Faraday's work with his celebrated equations, essential to the further development of many areas of physics. I feel sure that Fisher would have made a better defence of beanbag genetics than did Haldane, so I start with a few observations concerning his well-known book.8 In this book Fisher made it one of his opening claims that the simple mathematics surrounding results such as the Hardy–Weinberg law demonstrates not only the compatibility of the Darwinian paradigm with the Mendelian hereditary system, but indeed the necessity of the Mendelian system for that paradigm. This was pointed out at a time when many biologists had not realized, or had even resisted, this fact. This was a paradigm-changing conclusion, and Haldane could surely have started his defence with a similar comment, which derived from the mathematics surrounding the stability of Hardy–Weinberg genotype frequencies. He could then have further mentioned the importance of the concept of the additive genetic (or, better, genic) variance, developed at length by Fisher. This concept is central to Fisher's Fundamental Theorem of Natural Selection, to plant and animal breeding, to the study of evolution directed by natural selection, and broadly to the relationship between the genome and its constituent genes, as well as leading to the idea of the Analysis of Variance. The results flowing from this concept, and indeed the concept itself, could not have been arrived at other than from a mathematical point of view.
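
The stability of Hardy–Weinberg genotype frequencies referred to above can be checked in a few lines. This is a sketch for a single locus with two alleles under random mating, with no selection, mutation or drift; the starting genotype frequencies and the function name are assumptions chosen for illustration.

```python
def random_mating(genotypes):
    """Genotype frequencies (AA, Aa, aa) in the next generation under random
    mating at one locus with two alleles and no other evolutionary force."""
    freq_AA, freq_Aa, freq_aa = genotypes
    p = freq_AA + freq_Aa / 2.0  # frequency of allele A
    q = 1.0 - p                  # frequency of allele a
    return (p * p, 2.0 * p * q, q * q)


# Hypothetical starting population with no heterozygotes at all.
generation_0 = (0.5, 0.0, 0.5)
generation_1 = random_mating(generation_0)
generation_2 = random_mating(generation_1)
print(generation_1)
print(generation_2)
```

One round of random mating produces the frequencies p², 2pq and q², and a second round leaves them unchanged, so the allele frequency, and with it the variation on which selection can act, is conserved from generation to generation.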

Fisher, Haldane and Wright were all well aware of the fact that natural selection is a mechanism, within the Mendelian framework, for generating outcomes of very high a priori improbability. Unfortunately, their mathematical work on this point was not, and perhaps still has not been, sufficiently widely disseminated. The argument that intricate structures such as the eye, the heart and so on could not have evolved by natural selection, at least in the time known to be available since the formation of the earth, was disposed of by their mathematical work in the early years. As recently as 1986 this old chestnut has been revived (for example by Denton9), and so long as the mathematical work referred to above continues to be unappreciated, we can probably rely on further resurrections of it. Again, mathematics has here been a crucial support of the Darwinian/Mendelian paradigm.

Evolutionary processes contain a significant stochastic element, and the implications of this were investigated mathematically from an early time. The results obtained, in particular by Fisher, were crucial in showing that arguments such as those advanced by Hagedoorn and Hagedoorn,10 to the effect that random factors would rapidly destroy the variation upon which selection acts, were not justified. Again, this observation was a crucial one for the acceptance of the Darwinian/Mendelian paradigm, and again the result could only be obtained by a mathematical treatment.

One can cite other examples where mathematics has been central to an understanding of the properties of, indeed to a broad acceptance of, the Darwinian/Mendelian paradigm, but those discussed above are enough to show, in my opinion, that Haldane aimed too low in his defence of mathematical work in evolutionary theory, and that he missed a golden opportunity of defending his craft well.


    The current situation
 
The ammunition available to Haldane for his defence was of course that available in the early 1960's. What further ammunition might be available today to take the defence even further?

Perhaps the main point made by Mayr, and acknowledged by Haldane, is that it is the entire genome in all its complexity that is central to essentially all important evolutionary questions, and that the predominantly single-locus mathematical theory of the time did not handle interactive effects of genes, entire genome results, and generally the myriad complexities of biological reality. Are we much further ahead today on this issue? Mathematical multi-locus theory has blossomed in the last 50 years, and has revealed some of the complexities of evolution where fitnesses depend on the genes at many loci. The effects of recombination have been extensively studied. It can however be claimed that these studies do not lead to paradigm-changing conclusions—it was always clear that evolution based on the whole genome level would be far more complicated than that based on a single gene locus. Further, no uniform new paradigm appears yet to have emerged from the mathematical analyses conducted. Some curious results have indeed emerged—for example, population mean fitness can steadily decrease under natural selection, essentially because of the effects of recombination, but these decreases appear not to be large or to arise often. The Fundamental Theorem of Natural Selection, when correctly interpreted, is now seen as a whole-genome result, applying even under non-random mating, and does provide insight into the effects of individual genes at individual loci on changes in population mean fitness. But, to repeat, these advances do not yet come close to addressing Mayr's point.

Perhaps the main contemporary argument for the usefulness of mathematics in the study of evolution can be based on the statistical analysis of the currently available large volumes of data, at the molecular level, describing samples of genetic material taken from natural populations. These analyses are retrospective, looking backward in time, and ask about the properties of the evolutionary process that led to the data currently observed. Analyses of this sort cannot be conducted on anything other than a mathematical basis, and it appears quite reasonable to claim that they provide the best opportunity for assessing many properties of the evolutionary process, as it has in fact happened.

We discuss just one example, mainly because it bears on the ‘beanbag’ debate in two different ways. Recall that the essence of the beanbag comment was that investigation of the genes at one single gene locus cannot provide a full picture of the evolutionary process, involving as it does the interactive effects of many genes at many loci. Two developments in the 1960's are relevant to this point. First, as mentioned above, two-locus and eventually multi-locus evolutionary models were developed, aiming at getting away from single-locus analyses and investigating, among other things, the evolutionary effects of recombination between gene loci. One major concept that arose from this work was the concept of linkage disequilibrium. If strong linkage disequilibrium exists between two gene loci, then the evolutionary processes at these loci are not independent, and have to be considered together. Perhaps the most important question arising from multi-locus analyses thus concerns the question of how much linkage disequilibrium actually exists in practice in natural populations. We return to this question, and the linkage disequilibrium question, below.
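
For readers unfamiliar with the concept, linkage disequilibrium between two loci is usually measured by the coefficient D, the difference between the frequency of a two-locus haplotype and the product of the corresponding allele frequencies. The sketch below assumes two loci with two alleles each, random mating, no selection and a recombination fraction r; the haplotype frequencies, the value of r and the function names are illustrative assumptions.

```python
def linkage_disequilibrium(x_AB: float, x_Ab: float, x_aB: float,
                           x_ab: float) -> float:
    """Coefficient D = x_AB - p_A * p_B, computed from the four haplotype
    frequencies (x_ab is listed for completeness; the four sum to one)."""
    p_A = x_AB + x_Ab
    p_B = x_AB + x_aB
    return x_AB - p_A * p_B


def recombine(x_AB: float, x_Ab: float, x_aB: float, x_ab: float, r: float):
    """Haplotype frequencies after one generation of random mating with
    recombination fraction r and no selection."""
    d = linkage_disequilibrium(x_AB, x_Ab, x_aB, x_ab)
    return (x_AB - r * d, x_Ab + r * d, x_aB + r * d, x_ab - r * d)


# Hypothetical population containing only AB and ab chromosomes.
haplotypes = (0.5, 0.0, 0.0, 0.5)
for generation in range(6):
    print(generation, linkage_disequilibrium(*haplotypes))
    haplotypes = recombine(*haplotypes, r=0.1)
```

Under these assumptions D shrinks by the factor (1 - r) each generation, so disequilibrium between loosely linked loci decays quickly, while for tightly linked loci it can persist for many generations.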

The second development in the 1960's was that data from many gene loci were becoming available, and that it was found that much more genetic variation appeared to exist at these loci than had, in many quarters, previously been thought. This led, as noted above, to the neutral theory, which claimed that a very high proportion of the polymorphisms involved were not due to natural selection, but reflected, instead, purely random stochastic variation of selectively equivalent alleles. Naturally this proposal was controversial, and starting in the 1970's, tests of this theory based on allele and later SNP data were put forward. Of these it is appropriate to mention the Watterson11 and the Tajima12 testing procedures. To use a currently popular expression, these procedures aimed at finding ‘signatures of selection’.
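
To give a flavour of what such a test computes, the sketch below evaluates Tajima's statistic in the form in which it is usually presented in the population genetics literature: the difference between the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum, divided by an estimate of its standard deviation. The constants are written from that standard presentation and should be checked against Tajima's paper before any serious use. The function name, the representation of sequences as equal-length strings and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences (strings).
    Requires at least four sequences and at least one segregating site."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(sequences[0])
    # Number of segregating sites: columns showing more than one state.
    seg_sites = sum(1 for j in range(length)
                    if len({seq[j] for seq in sequences}) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    # Mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants depending only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: every variant is carried by a single sequence (rare variants only).
rare_variants = ["1000", "0100", "0010", "0001"]
# Toy data: every variant is carried by half the sample.
common_variants = ["11", "11", "00", "00"]
print(tajimas_d(rare_variants), tajimas_d(common_variants))
```

Negative values are usually read as an excess of rare variants and positive values as an excess of intermediate-frequency variants, relative to what the neutral model predicts. With samples as small as these toy examples the statistic has essentially no power, which is the difficulty taken up in the text that follows.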

These tests focused on the genes at one single locus (or the nucleotides at one single site), although of course attempts were made to glue together several single locus (site) analyses into one single analysis. The results of these tests were often inconclusive, a problem possibly arising from the low power of these tests to detect selection. (More on the power question below.)

The extensive DNA sequence data now available allow a different approach for testing for selection. If one new mutant allelic type is in the process of replacing another type at some gene locus under the action of selection, then because of linkage between this locus and neighbouring loci, this will tend to drag along towards fixation whatever allele at any such linked site happened to be on the same chromosome as the initial favoured new mutant gene. Of course recombination between sites will tend to break down this hitchhiking effect to some extent, but for very closely linked sites recombination will not be a significant factor. One might then expect to see a significant lack of polymorphism surrounding the selected locus at about the time that the fixation of the favoured allele takes place. Without going into the details of the various analyses used, modern tests of selection are often based on this observation. It is indeed an interesting question to ask about the extent to which the currently observed haplotype blocks have arisen for such a selective reason.

The point to be made from the beanbag perspective is that these modern tests are not based on genetic information at one single locus. They rely among other things on the size of the haplotype blocks observed, and these might extend over many loci. It is possible then to claim that modern genome data rather than single-locus beanbag data, together with the appropriate statistical analyses, will truly lead us to the approach that Mayr was clearly seeking.

It is appropriate to conclude with two notes of caution. First, all statistical analyses of contemporary data which aim at finding the forces that led to these data rely on some mathematical model of evolution which can be at best only a rough approximation to reality. Robustness properties of the analyses are much needed. Second, because the genes in any population are not independent, owing to their eventual co-ancestry, the power of these tests can be low, even at a genome level. Similarly the standard errors of estimates of parameters, instead of being of the ‘usual’ form $c/\sqrt{n}$, where $n$ is the sample size and $c$ is a constant, are normally of the far larger form $c/\sqrt{\log n}$. These facts have to be kept in mind if one wishes to mount an ‘extended beanbag’ defence of a mathematically based assessment of the procedures leading to the form of currently observed genomic data.

Conflict of interest: None declared.


    References
 
1 Haldane JBS. A defense of beanbag genetics. Perspect Biol Med (1964) 7:343–60. Reprinted Int J Epidemiol 2008;37:435–42.

2 Mayr E. Animal Species and Evolution (1963) Cambridge MA: Belknap Press.

3 Mayr E. Where are we? Cold Spring Harbor Symp Quant Biol (1959) 24:1–24.

4 Haldane JBS. The cost of natural selection. J Genet (1957) 55:511–24.

5 Haldane JBS. More precise expressions for the cost of natural selection. J Genet (1961) 57:351–60.

6 Kimura M. Evolutionary rate at the molecular level. Nature (1968) 217:624–26.

7 Hogben LT. In: Darwinism and the Study of Society—Manton MP, ed. (1961) London: Tavistock Publications.

8 Fisher RA. The Genetical Theory of Natural Selection (1930) Oxford: Clarendon Press.

9 Denton M. Evolution: A Theory in Crisis (1986) Bethesda, MD: Adler and Adler.

10 Hagedoorn AL, Hagedoorn AC. The Relative Value of the Processes Causing Evolution (1921) The Hague: Martinus Nijhoff.

11 Watterson GA. The homozygosity test of neutrality. Genetics (1978) 88:405–17.

12 Tajima F. Statistical methods for testing the neutral mutations hypothesis by DNA polymorphism. Genetics (1989) 123:585–95.

