IJE Advance Access originally published online on August 5, 2005
International Journal of Epidemiology 2005 34(5):1077-1079; doi:10.1093/ije/dyi156

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2005; all rights reserved.

Commentary

Commentary: Models for longitudinal family data

W James Gauderman* and David V Conti

Department of Preventive Medicine, University of Southern California School of Medicine, 1540 Alcazar Street, CHP-200, Los Angeles, CA 90089-9011, USA

* Corresponding author. E-mail: jimg{at}usc.edu

Cohort studies will become increasingly important in understanding the aetiology of complex human traits.1 While the longstanding approach of analysing cross-sectional data to identify genetic and/or environmental factors for disease or quantitative traits has had some success, it has also produced many inconclusive results and far too few replications. Commonly cited explanations include low power and heterogeneity across study samples. A less often cited reason is that a single cross-sectional examination of data may not capture the essential aetiological mechanisms. For example, a specific variant genotype might cause an increase in a trait value that accumulates as a person ages; that is, a gene may affect the trajectory of the trait over time. Two studies, one of younger and the other of older subjects, would then likely reach different conclusions about that locus, because each examines a different part of the gene–age trajectory. Similarly, different genes may act at different stages of the disease process, such as disease initiation or progression. In such situations, the underlying genes may be identified only through longitudinal studies that accurately capture the dynamic nature of the phenotype.
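
The age-dependence argument can be made concrete with a few lines of Python. The sketch assumes, purely for illustration, a variant that has no effect on the trait at a starting age and thereafter adds a fixed amount per year to the trait's rate of change; the function name `genotype_difference` and all numbers are hypothetical. Under that assumption, the carrier versus non-carrier difference that a cross-sectional study would observe grows linearly with the age of the sample.

```python
def genotype_difference(age, slope_effect_per_year, start_age=0.0):
    """Mean trait difference (carriers minus non-carriers) at a given age.

    Hypothetical model: the variant adds `slope_effect_per_year` to the yearly
    change in the trait from `start_age` onward and has no effect before then.
    """
    return slope_effect_per_year * max(age - start_age, 0.0)


# Two hypothetical cross-sectional studies of the same locus.
young_study = genotype_difference(age=10, slope_effect_per_year=0.5)
older_study = genotype_difference(age=60, slope_effect_per_year=0.5)
print(young_study, older_study)
```

Here the difference seen in the older sample is six times that seen in the younger one, although a single mechanism (an effect on slope) is at work in both.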

The standard cohort study involves longitudinal follow-up of unrelated individuals. Although this is an effective design for studying measured factors (candidate genes and environmental exposures), it does not permit estimation of, or adjustment for, unmeasured genetic or environmental factors that are shared within families. Understanding the distribution of traits with respect to shared genetic and environmental factors is an important first step in the progression from descriptive genetic epidemiology to targeted studies of specific loci or genome-wide searches using linkage or association methods.

The paper by Burton et al.2 in this issue appropriately points out the need to integrate longitudinal and family studies, and discusses several advantages of combining these data types. By basing their modelling framework on generalized linear mixed models (GLMMs) and using a Bayesian estimation procedure, they develop a flexible approach for estimating the fraction of variance in both trait level and trait slope over time that can be attributed to additive genetic and shared (within family) environmental effects. Moreover, the model is applicable to repeated quantitative or binary traits. This, combined with their previous work for survival traits,3 encompasses a class of models and estimation methods that should handle almost any type of outcome one would collect in a longitudinal family study.

In their analysis of systolic blood pressure (SBP) in the Framingham Heart Study data, Burton et al.2 estimated a narrow-sense heritability of only 9% for the slope of SBP over time, but a much larger heritability (44.3%) for the intercept. As the authors point out, the latter is the proportion of total variance attributable to additive genetic effects at baseline. Care must always be taken when interpreting what is meant by ‘baseline’. To put this issue in context, we show their linear model for SBP:
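
$$\mathrm{SBP}_{ijk} = (b_0 + b_{0ij}) + (b_T + b_{Tij})\,(T_{ijk} - T^*) + \mathbf{b}_X^{\top}\mathbf{X}_{ijk} + e_{ijk}$$
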
where $i$, $j$ and $k$ index family, individual and measurement occasion, respectively, $T_{ijk}$ denotes age, and $T^*$ is a fixed reference age. The remaining terms are a vector of measured covariates ($\mathbf{X}$) and an error ($e_{ijk}$) that is assumed to be normally distributed with mean zero. The authors set $T^*$ to 52.7 years, the mean age in the sample. The parameters $b_T$ and $b_{Tij}$ measure, respectively, the overall average slope of SBP on age and the subject-specific deviation in slope from that average. The latter is treated as a random effect, is modelled as a function of genetic and shared environmental components of variance, and is the source of the estimated 9% heritability in slopes. These slope parameters are invariant to the choice of $T^*$ and estimate the change in SBP over the period during which the longitudinal measures were obtained.
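
The invariance of the slope to the choice of $T^*$ can be checked numerically. The Python sketch below uses hypothetical values for the fixed intercept and slope (they are not estimates from Burton et al.) and two hypothetical helpers, `predict` and `recentre`. Moving the reference age from $T^*$ to a new value leaves every fitted value unchanged provided the intercept is replaced by $b_0 + b_T(T^*_{\text{new}} - T^*)$; the slope $b_T$ itself does not change. The same algebra applies, subject by subject, to $b_{0ij}$ and $b_{Tij}$.

```python
import math


def predict(b0, b_t, t_star, ages):
    """Mean SBP at each age under b0 + b_t * (age - t_star); covariates omitted."""
    return [b0 + b_t * (age - t_star) for age in ages]


def recentre(b0, b_t, t_star_old, t_star_new):
    """Re-express the same line with a new reference age.

    The intercept becomes the predicted value at t_star_new; the slope is unchanged.
    """
    return b0 + b_t * (t_star_new - t_star_old), b_t


# Hypothetical parameter values, for illustration only.
b0, b_t, t_star = 130.0, 0.6, 52.7
ages = [40.0, 52.7, 65.0]

b0_new, b_t_new = recentre(b0, b_t, t_star, 40.0)
original = predict(b0, b_t, t_star, ages)
shifted = predict(b0_new, b_t_new, 40.0, ages)

# Same fitted line, different intercept, identical slope.
assert all(math.isclose(a, b) for a, b in zip(original, shifted))
print(round(b0_new, 2), b_t_new)
```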

In contrast, what do the intercept parameters measure? Both the overall average ($b_0$) and the subject-specific deviation ($b_{0ij}$) parameterize mean SBP at baseline, where baseline refers to the covariate profile at which all other terms in the model drop out. In the above model, this occurs when age $= T^* = 52.7$ years and every $X$ has a zero value (e.g. a female of average weight, height, etc.). Therefore, the reported heritability of 44.3% for baseline SBP refers to the distribution of SBP at age 52.7 and reflects cumulative heritability up to that age. It is important to recognize that the intercept-based heritability is not invariant to the choice of $T^*$: different choices of $T^*$ will yield different estimated heritabilities for the intercept (mean SBP). This fact can be turned to advantage to better understand the effect of underlying genes on the longitudinal trajectory. For example, one could repeat the analysis with $T^*$ set to 0, 10, 20, etc. to estimate heritability at birth, age 10, age 20, and so on. The estimated heritability at age 10, for example, represents the cumulative effect of heritability at birth plus genetic effects on the growth slope between birth and age 10.
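
Why the intercept heritability depends on $T^*$ follows from the variance of a random line evaluated at a different reference age. If a variance component has intercept variance $\sigma^2_0$, slope variance $\sigma^2_T$ and intercept–slope covariance $\sigma_{0T}$ at $T^*$, its contribution to the variance at age $a$ is $\sigma^2_0 + 2d\,\sigma_{0T} + d^2\sigma^2_T$, with $d = a - T^*$. The Python sketch below applies this to each component and takes the genetic share of the total. It assumes, for illustration only, that the total variance is the sum of additive genetic, shared environmental and individual-specific random-effect components plus a constant residual variance; the component values and the names `LineCovariance` and `intercept_heritability` are hypothetical and not taken from Burton et al.

```python
from dataclasses import dataclass


@dataclass
class LineCovariance:
    """Covariance of a random (intercept, slope) pair for one variance component,
    with the intercept defined at the reference age T*."""
    var_intercept: float
    var_slope: float
    cov_intercept_slope: float

    def variance_at(self, d):
        """Variance of intercept + slope * d, where d = new reference age - T*."""
        return (self.var_intercept
                + 2.0 * d * self.cov_intercept_slope
                + d * d * self.var_slope)


def intercept_heritability(genetic, shared_env, individual, residual_var, age, t_star):
    """Additive genetic share of total variance when the intercept is placed at `age`.

    Assumes (hypothetically) total variance = genetic + shared environment
    + individual random effects + a constant residual variance.
    """
    d = age - t_star
    g = genetic.variance_at(d)
    total = g + shared_env.variance_at(d) + individual.variance_at(d) + residual_var
    return g / total


# Hypothetical components: only the genetic component has slope variance.
genetic = LineCovariance(var_intercept=4.0, var_slope=0.01, cov_intercept_slope=0.0)
shared_env = LineCovariance(var_intercept=1.0, var_slope=0.0, cov_intercept_slope=0.0)
individual = LineCovariance(var_intercept=2.0, var_slope=0.0, cov_intercept_slope=0.0)

for age in (42.7, 52.7, 62.7):
    h2 = intercept_heritability(genetic, shared_env, individual, 3.0, age, t_star=52.7)
    print(age, round(h2, 3))
```

With these made-up values the intercept heritability is lowest at $T^*$ and rises on either side; with a nonzero intercept–slope covariance the curve would no longer be symmetric about $T^*$.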

We note that the linear model above could be generalized (e.g. using a linear spline model) to allow both the growth slope and the heritability of slopes to vary over time. This would be important for a trait such as lung function, which increases rapidly through childhood and then declines slowly during adulthood. For this trait, one could imagine different genetic influences on the growth and decline processes. Changes in heritability across ages might also point to environmental factors that act over time to enhance or suppress gene expression.
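
One way to write such a linear spline, assuming a single knot at a chosen age $\kappa$, is to replace $(T_{ijk} - T^*)$ by two terms, $\min(T_{ijk}, \kappa)$ and $\max(T_{ijk} - \kappa, 0)$, each with its own fixed slope and its own subject-specific random slope. The coefficient on each term is then the slope within that age segment, so genetic and shared environmental variance could be estimated separately for the growth and decline phases; under this coding the intercept refers to age 0. The Python sketch below builds the basis and evaluates a mean trajectory; the knot at age 20, the slope values and the function names are illustrative assumptions for a lung-function-like trait, not estimates.

```python
def linear_spline_basis(age, knot):
    """Two-segment linear spline coding of age with one knot.

    Column 1 rises with age up to the knot and is flat afterwards;
    column 2 is zero up to the knot and rises afterwards. Their coefficients
    are therefore the slopes before and after the knot.
    """
    return min(age, knot), max(age - knot, 0.0)


def mean_trajectory(age, intercept, slope_growth, slope_decline, knot):
    """Mean trait value at `age` under the two-segment spline (covariates omitted)."""
    before, after = linear_spline_basis(age, knot)
    return intercept + slope_growth * before + slope_decline * after


# Illustrative values only: rapid growth to a hypothetical knot at age 20, slow decline after.
KNOT = 20.0
for age in (10.0, 20.0, 40.0):
    print(age, mean_trajectory(age, intercept=1.0, slope_growth=0.2,
                               slope_decline=-0.03, knot=KNOT))
```

Because the two basis columns meet at the knot, the fitted trajectory is continuous there; only its slope changes.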

In light of currently available technologies, investigators are unlikely to be satisfied with heritability estimates alone. Instead, they are likely to want to study candidate genes and perform association-based genome screens. Certainly, having age-specific estimates of heritability in hand should be viewed as a key step in designing an optimal study to test specific genetic loci. However, one may then be tempted to abandon the family-based design in favour of easier-to-conduct case–control or cohort studies of unrelated individuals. In our view, the family-based cohort study can still play an important role in gene-association studies. First, tests of gene association with trait average and trait slope that are based on within-family comparisons will be free from bias owing to population stratification. Second, one will have the opportunity to conduct joint tests of linkage and association. Joint models can yield a more powerful test than either a linkage or an association test alone, and can be useful for distinguishing a marker from a true underlying trait locus.4 Third, one can monitor estimates of heritability for both intercepts and slopes as measured genes are added (as covariates X) to the model. This would provide one way of judging how fruitful it might be to continue searching for additional genes, and may suggest targeting further searches to specific age groups. Although such extensions to measured genotypes can quickly lead to complex models, the flexibility of the GLMM framework and the ability of Bayesian estimation procedures to handle large integrations make them feasible, albeit computationally demanding (see ref. 5 for an application to measured genotypes using GLMMs and Bayesian estimation).
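
The first point, protection against population stratification, can be illustrated with a small simulation. The Python sketch below generates hypothetical sibling pairs from two subpopulations that differ in both allele frequency and mean trait level, with no true genetic effect. A pooled regression of trait on genotype then shows a spurious association, whereas a within-family (sibship fixed-effects) regression, which compares each sibling's genotype and trait with the sibship average, does not, because the subpopulation-level confounding is shared by siblings and cancels. This is a simplified illustration rather than any specific published test; the trait could stand for each person's average level or estimated slope. All settings (allele frequencies 0.2 and 0.8, a mean shift of 2 units, 5000 families, two siblings each) and all function names are assumptions.

```python
import random


def simulate_families(n_families, seed=1):
    """Hypothetical two-subpopulation sibling data with NO true genetic effect."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        stratum = rng.random() < 0.5
        p = 0.8 if stratum else 0.2        # risk-allele frequency differs by stratum
        offset = 2.0 if stratum else 0.0   # trait mean also differs by stratum (confounder)
        parents = [[int(rng.random() < p) for _ in range(2)] for _ in range(2)]
        sibs = []
        for _ in range(2):  # two offspring per family
            g = sum(rng.choice(parent) for parent in parents)  # one allele from each parent
            y = offset + rng.gauss(0.0, 1.0)                   # trait does not depend on g
            sibs.append((g, y))
        families.append(sibs)
    return families


def ols_slope(pairs):
    """Pooled least-squares slope of trait on genotype, ignoring family structure."""
    n = len(pairs)
    mg = sum(g for g, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((g - mg) * (y - my) for g, y in pairs)
    sxx = sum((g - mg) ** 2 for g, _ in pairs)
    return sxy / sxx


def within_family_slope(families):
    """Slope using only deviations from each sibship's mean genotype and trait."""
    sxy = sxx = 0.0
    for sibs in families:
        mg = sum(g for g, _ in sibs) / len(sibs)
        my = sum(y for _, y in sibs) / len(sibs)
        for g, y in sibs:
            sxy += (g - mg) * (y - my)
            sxx += (g - mg) ** 2
    return sxy / sxx


families = simulate_families(5000)
pooled = [sib for sibs in families for sib in sibs]
naive = ols_slope(pooled)
within = within_family_slope(families)
print(f"pooled slope: {naive:.2f}, within-family slope: {within:.2f}")
```

The price of the within-family comparison is efficiency: only siblings who differ in genotype contribute information.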

In summary, the work by Burton et al.2 and others (see ref. 6 for a summary) highlights the potential importance of longitudinal family-based studies. These studies will be costly and more difficult to perform than either a family-based cross-sectional study or a longitudinal study of unrelated individuals. However, the combination of these two designs is likely to pay large dividends in our attempts to understand genetic and environmental determinants of complex human traits. The longitudinal family-based design should be given careful consideration as we plan new studies.


References
1 Collins FS. The case for a US prospective cohort study of genes and environment. Nature 2004;429:475–77.

2 Burton P, Scurrah K, Tobin M, Palmer LJ. Covariance components models for longitudinal family data. Int J Epidemiol 2005;34:1063–77.

3 Scurrah K, Palmer LJ, Burton P. Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genet Epidemiol 2000;19:127–48.

4 Millstein J, Siegmund K, Conti D, Gauderman W. Testing association and linkage using affected sib-parent study designs. Genet Epidemiol (in press).

5 Conti DV, Gauderman WJ. SNPs, haplotypes, and model selection in a candidate gene region: The SIMPle analysis for multilocus data. Genet Epidemiol 2004;27:429–41.

6 Gauderman W, Macgregor S, Briollais L et al. Longitudinal data analysis in pedigrees. Genet Epidemiol 2003;25(Suppl.1):S18–28.

