IJE Advance Access originally published online on April 14, 2005
International Journal of Epidemiology 2005 34(5):1063-1077; doi:10.1093/ije/dyi069
Methodology
Covariance components models for longitudinal family data
1 Biostatistics and Genetic Epidemiology, Department of Health Sciences and Department of Genetics, Institute of Genetics, University of Leicester, UK
2 Department of Physiology and Centre for Genetic Epidemiology, the University of Melbourne, Australia
3 Western Australian Institute for Medical Research and UWA Centre for Medical Research, University of Western Australia, Australia
* Corresponding author. University of Leicester, Department of Health Sciences, 22–28 Princess Road West, Leicester LE1 6TP, UK. E-mail: paul.genepi{at}ntlworld.com
Abstract
A longitudinal family study is an epidemiological design that involves repeated measurements over time in a sample that includes families. Such studies, which may also include relative pairs and unrelated individuals, allow closer investigation not only of the factors that cause a disease to arise, but also of the genetic and environmental determinants that modulate the subsequent progression of that disease. Knowledge of such determinants may pay high dividends in terms of prognostic assessment and in the development of new treatments that may be tailored to the prognostic profile of individual patients. Unfortunately, longitudinal family studies are difficult to analyse. They conflate the complex within-family correlation structure of a cross-sectional family study with the correlation over time that is intrinsic to longitudinal repeated measures. Here we describe an approach to analysis that is relatively straightforward to implement, yet is flexible in its application. It represents a natural extension of a Gibbs-sampling-based approach to the analysis of cross-sectional family studies that we have described previously. The approach can be applied to pedigrees of arbitrary complexity. It is applicable to continuous traits, repeated binary disease states, and repeated counts or rates with a Poisson distribution. It not only supports the analysis of observed determinants, including measured genotypes, but also allows decomposition of the correlation structure, thereby permitting conclusions to be drawn about the effect of unobserved genes and environment on key features of disease progression, and hence permitting the heritability of these features to be estimated. We demonstrate the efficacy of our methods using a range of simulated data analyses, and illustrate their practical application to longitudinal blood pressure data measured in families from the Framingham Heart Study.
Keywords Longitudinal, family studies, MCMC, Gibbs sampling, Bayesian, genetic epidemiology
Accepted 1 March 2005
A conventional longitudinal study involves the repeated evaluation of one or more measurable characteristics (phenotypes or traits) in a series of unrelated individuals. Such studies are used widely. The repeated measurements provide for reduced error, increased statistical power and, critically, a means to study the pattern and determinants of systematic changes in a phenotype of interest over time.1–3 In contrast, a cross-sectional family study enrols groups of relatives rather than unrelated individuals and typically involves a single phenotypic assessment in each of the study participants in each family. This permits the study of phenotypic similarities and differences amongst close relatives and allows one to disentangle the genetic and environmental contributions to the aetiology of the disease or trait under study.4–7 Both types of studies involve the analysis of complex correlated data: within-subject phenotypic correlations over time in a longitudinal study, and within-family phenotypic correlations in a cross-sectional family study. Crucially, the correlated data structures fundamental to both types of study permit the investigation of latent, or unobserved, aetiological determinants of a trait as well as those reflected in variables that may be measured directly.
By combining the features of longitudinal studies in individuals and cross-sectional studies in families, longitudinal family studies increase the power to resolve the genetic and environmental determinants of traits associated with complex diseases and provide a powerful and flexible method to study the corresponding determinants of change in such traits over time.
The scientific power and the analytic challenge of longitudinal family studies derive from the residual correlation that exists between the phenotypes of individuals in the same family4,5,7,8 and between repeated measurements in the same individual over time.1,3,9 Sometimes this correlation (or covariance) is of prime interest in its own right. In other situations, it is a nuisance that must be taken into appropriate account in order to obtain valid statistical inferences. In either case, the correlation structure of longitudinal family data conflates all the complexity of cross-sectional family data with that of longitudinal data in individuals. This means that the analysis of such data is difficult and there is a relative paucity of methods10 with which it may be approached.
This dearth of applicable methods is a serious problem, and the importance of the methodological challenge that it implies was underlined by the decision to make longitudinal family data from the Framingham Heart Study the focus of Genetic Analysis Workshop 13 (GAW13) in New Orleans, in November 2002.11,12 At that workshop, many different approaches to the analysis of longitudinal family data were investigated.10,12–14 An important conclusion was that, although "well-selected cross-sectional data (e.g., first or last visit) provided good power for detecting some genes, ... summaries of longitudinal data (e.g., means, slopes) were generally most effective for finding genes, particularly those that affected trends in outcome over time".10 This suggests that future investigations based on longitudinal family data hold considerable promise. It also underscores the priority that must be given to the development of appropriate analytic methods and supporting software. At present, however, experience with longitudinal family studies is rather limited, and any contribution that methods for the analysis of longitudinal family data may potentially make to scientific progress lies primarily in the future.
Nevertheless, in order to fully realise this potential, it is already clear that a good analytic model for longitudinal familial data should satisfy a number of desirable criteria. First, it should be able to take appropriate account of the correlation structure. Second, it should be able to deal easily with repeated measurements on a trait with any one of a variety of common functional forms. These include binary disease states, continuous traits with a normal distribution, and phenotypes that are reflected in a rate or count with a Poisson distribution. Third, such a model should be applicable to family data in the broadest sense including unrelated individuals, sibships, nuclear families, and complex multigenerational pedigrees. Fourth, the model should be tractable: the necessary mathematics should readily be implemented in appropriate software, and that software should be usable by research groups other than those that originally developed the model.
Keeping these criteria in mind, a number of different approaches to analysis have been proposed, with varying degrees of success and applicability. One of the simplest is to apply some form of data reduction procedure to resolve the repeated measurements in each individual to generate a single summary measure.3 In a second step, the summary measures may be analysed using a conventional method for cross-sectional family data.10 As an example, the quantitative change in a phenotype between the first and last measurements can be used as a summary measure in each individual. Alternatively, a regression model can be fitted for each individual and its intercept or slope, or sometimes an appropriate regression residual, may be used as an adjusted phenotype.1,3,10 However, such two-step approaches are often less efficient than a single model that makes use of all the data simultaneously. Furthermore, two-stage procedures often fail to take proper account of uncertainty in the value of the summary measures when fitting the second stage model.1,3
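The first stage of such a two-step approach can be sketched as follows. This is a minimal illustration, not the procedure used in any of the cited studies: the record layout, the subject identifiers, and the function name `ols_slope` are all hypothetical, and the sketch simply reduces each individual's repeated measurements to an ordinary least-squares slope.

```python
from statistics import mean


def ols_slope(times, values):
    """Least-squares slope of `values` regressed on `times` for one individual."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


# Hypothetical repeated measurements: {subject id: [(time, phenotype), ...]}
records = {
    "s1": [(0, 120.0), (2, 122.0), (4, 124.0)],
    "s2": [(0, 130.0), (2, 129.0), (4, 128.0)],
}

# Stage 1: one summary measure (the slope) per individual.
slopes = {sid: ols_slope(*zip(*obs)) for sid, obs in records.items()}
```

The resulting slopes would then be passed to a cross-sectional family analysis as if they were observed without error, which is exactly the source of the lost uncertainty noted above.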
Multilevel models and marginal models that assume a simple hierarchical correlation structure are often used to analyse repeated measures data in individuals,9,15–17 but it is difficult to generalize these approaches to fully utilize the additional information contained in family data. For example, multilevel models can be used for nuclear family data,18 but the model specification is hard to generalize to multigenerational pedigrees. Furthermore, when the outcome is non-normal, multilevel marginal models fitted using iterative generalized least squares (IGLS15) can introduce bias.19 On the other hand, marginal models fitted using generalized estimating equations (first order GEEs9,20) are useful for normal, binary, and Poisson repeated measures data when fixed effects are of primary interest, but first order GEEs do not produce efficient estimates of the covariance parameters themselves.
The topic has long been of interest to genetic epidemiologists. More than 20 years ago, Lange and colleagues described very general covariance components models for quantitative phenotypes in pedigrees that could be extended to multivariate traits and thus, in principle, to longitudinal data.5 These could be fitted with user modifiable Fortran 77 code in the FISHER Program.21 Unfortunately, although the modification to the scoring algorithm that was required to extend the models to non-linear problems was straightforward to formulate,5 the software implementation was restricted to problems involving multivariate normal or multivariate t distributions.21 Structural equation models designed for the cross-sectional analysis of quantitative family data and twins can also be extended to longitudinal data.22,23 But using standard software, it is again difficult to generalize the models to non-linear problems involving non-normal phenotypes and also to extended pedigrees.23,24
In this paper, we propose an approach to the analysis of longitudinal family data that is based on a natural extension of the generalized linear mixed models (GLMMs16) that we have already developed for cross-sectional family data. These models are appropriate for continuous and binary traits,8 for Poisson distributed counts or rates25 and for other phenotypes with any one of the many distributions that are available in the flexible Markov chain Monte Carlo (MCMC) environment in which the models are fitted (see below). They may be used both in nuclear families8 and extended pedigrees.25 The models are fitted using an MCMC-based approach, usually Gibbs sampling,26–28 within the general purpose software WinBUGS.29 By sampling from the joint posterior distribution of unknown parameters, this approach avoids high dimensional integration and thereby surmounts many of the practical difficulties encountered when fitting GLMMs for non-multivariate normal data when analytic likelihood estimation methods are used.28
This paper describes the extension of GLMMs for cross-sectional family data8,25,30 to longitudinal family data. In section 1 we motivate the extended model. In section 2 we test out the proposed methods on simulated data and section 3 describes a real data analysis based on the Framingham Heart Study. Section 4 is a general discussion.
1 Methods
The construction of GLMMs for cross-sectional data from randomly ascertained families has been described in detail elsewhere.8,25 A brief description is presented here before the extension to longitudinal data is outlined. We assume throughout that families are non-informatively ascertained, i.e. that they are obtained as a population-based sample, with no deliberate excess or under-sampling on the basis of the trait of interest.
Assume that a phenotypic measurement $y_{ij}$ has been obtained from each member $j$ of family $i$, and that, conditional on covariates and random effects, the random error in each observed phenotype follows a distribution $f(\cdot)$ (e.g. normal, binomial, or Poisson). Then the model may be written:

$$
\begin{aligned}
g\left(E[y_{ij}]\right) &= (\beta_0 + \beta_{0ij}) + \mathbf{x}'_{ij}\boldsymbol{\beta} \\
y_{ij} &\sim f\left(E[y_{ij}], \phi\right) \\
\boldsymbol{\beta}_{0i} &\sim \mathrm{MVN}(\mathbf{0}, \mathbf{V}_0)
\end{aligned}
\tag{1}
$$

where $g(\cdot)$ is the link function; $\beta_0$ is the overall intercept; $\mathbf{x}_{ij}$ is a vector of observed covariates with fixed regression coefficients $\boldsymbol{\beta}$; $\phi$ (if required) is a nuisance parameter such as the error variance of a normal distribution; $\boldsymbol{\beta}_{0i}$ is a vector that contains the random effects of all subjects in the $i$th family: its $j$th element is $\beta_{0ij}$; $\mathbf{V}_0$ is a covariance matrix reflecting the within-family covariance of the elements of $\boldsymbol{\beta}_{0i}$. The right-hand side of the first line of the model, $(\beta_0 + \beta_{0ij}) + \mathbf{x}'_{ij}\boldsymbol{\beta}$, is often called the linear predictor and may be denoted $\eta_{ij}$.
If $y_{ij}$ has a normal distribution, $g(\cdot)$ is usually taken to be the identity function (i.e. the linear predictor directly predicts $y_{ij}$), $\phi$ is the residual error variance ($\sigma^2_E$), and model (1) is an extended form of conventional multiple regression. If $y_{ij}$ is binary, $g(\cdot)$ is usually the logit function, $\phi$ is not required and model (1) is an extended logistic regression model. Given a binary trait, $g(\cdot)$ can also be taken to be the probit function. If $y_{ij}$ is a censored survival time (or a rate or count) and analysis is to be based on an extended Poisson regression model, as described in Scurrah et al.,25 $g(\cdot)$ is the logarithmic function and $\phi$ is again not required.
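The role of the link function can be made concrete with a small sketch. It assumes nothing beyond the three links named above (identity, logit, log); the dictionary `LINKS` and the helper names are illustrative only and are not taken from the WinBUGS files.

```python
import math

# Each entry holds (link g, inverse link g^-1) for one trait type.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),            # normal trait
    "logit": (lambda p: math.log(p / (1.0 - p)),             # binary trait
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (math.log, math.exp),                             # Poisson count or rate
}


def linear_predictor(beta0, random_effect, x, beta):
    """(beta0 + beta0ij) + x'beta for one subject."""
    return (beta0 + random_effect) + sum(xi * bi for xi, bi in zip(x, beta))


def expected_value(link, eta):
    """Map a linear predictor back to the expected phenotype via the inverse link."""
    _, inverse = LINKS[link]
    return inverse(eta)
```

For example, a linear predictor of zero corresponds to a probability of 0.5 under the logit link and to an expected count of 1 under the log link.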
In the typical cross-sectional family models that we have described8,25 we use three variance components to parameterize the within-family covariance (or correlation) structure ($\sigma^2_A$, $\sigma^2_C$, and $\sigma^2_{Cs}$). The variance component $\sigma^2_A$ denotes additive polygenic variance,4–6 that is, the variation that arises from the effect of individual alleles at multiple loci with no genetic interactions. Covariance consequent upon a shared nuclear family environment is denoted by $\sigma^2_C$. The additional covariance between siblings due to a common sibling environment, not shared with parents, is denoted $\sigma^2_{Cs}$. In traditional family designs, $\sigma^2_{Cs}$ is completely confounded with polygenic dominance ($\sigma^2_D$)4 and, here, we choose to model only $\sigma^2_{Cs}$.8 Nevertheless, models incorporating $\sigma^2_D$ (e.g. in twin family studies, when $\sigma^2_D$ and $\sigma^2_{Cs}$ can be disentangled) or $\sigma^2_M$ (maternal line effects) can be fitted if appropriate.
Therefore the total random effect $\beta_{0ij}$ for each individual can be viewed as the sum of three distinct random effects (each associated with one of the three variance components):

$$
\beta_{0ij} = A_{ij} + C_{ij} + Cs_{ij}
$$

where the $A_{ij}$ terms represent additive polygenic effects (variance, $\sigma^2_A$), the $C_{ij}$ terms represent the effect of common nuclear family environment (variance, $\sigma^2_C$) and the $Cs_{ij}$ terms represent common sibling environmental effects (variance, $\sigma^2_{Cs}$). These are shared appropriately between relatives to build the complex within-family correlation structure that is required.
We have described the construction of such models using Gibbs sampling in WinBUGS29 both in nuclear families8 and in extended pedigrees.25 Under the standard nuclear family parameterization,8,25 and as is biologically appropriate,6,31 the conditional covariance between parents (on the scale of the linear predictor) is $\sigma^2_C$, that between a parent and child is $\tfrac{1}{2}\sigma^2_A + \sigma^2_C$ and the covariance between two full siblings is $\tfrac{1}{2}\sigma^2_A + \sigma^2_C + \sigma^2_{Cs}$. In extended pedigrees, the manner in which the random effects are generated also leads naturally to biologically appropriate covariances between more distant relatives;6,31 for example, grandparent-grandchild ($\tfrac{1}{4}\sigma^2_A$), uncle-niece ($\tfrac{1}{4}\sigma^2_A$) and first cousins ($\tfrac{1}{8}\sigma^2_A$). It is also straightforward to ensure appropriate covariances between monozygous twins and half siblings.
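The within-family covariance structure for a single nuclear family can be written out explicitly. The sketch below is an illustration under stated assumptions rather than the construction used in the WinBUGS files: it assumes that spouses share only the nuclear family environment, that a parent and child share half the additive polygenic variance plus the family environment, that full siblings additionally share the sibling environment, and that every member has the same marginal variance (the sum of the three components). The function name and the member ordering are hypothetical.

```python
def nuclear_family_cov(n_children, var_a, var_c, var_cs):
    """Covariance matrix on the linear-predictor scale.

    Members are ordered [father, mother, child 1, ..., child n].
    """
    n = 2 + n_children
    total = var_a + var_c + var_cs  # assumed common marginal variance
    v = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if r == c:
                v[r][c] = total
            elif r < 2 and c < 2:
                v[r][c] = var_c                         # spouses
            elif r < 2 or c < 2:
                v[r][c] = 0.5 * var_a + var_c           # parent and child
            else:
                v[r][c] = 0.5 * var_a + var_c + var_cs  # full siblings
    return v


# Two parents and three children with illustrative variance components.
V0 = nuclear_family_cov(3, var_a=4.0, var_c=1.0, var_cs=0.5)
```

In practice the models are not fitted by building this matrix directly; the random effects are generated so that the implied covariances take this form.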
We have demonstrated that the GLMMs generate approximately consistent estimates both of conventional regression coefficients (fixed effects) and of components of variance. We have also shown that individual random effects (specifically the $A_{ij}$) can successfully be used as adjusted phenotypes in subsequent linkage analyses.8,25,32,33
A series of working WinBUGS files, with a variety of different purposes, may be downloaded from our website: http://www.prw.le.ac.uk/research/HCG/gebugs.html. The three files for the longitudinal family study analyses were all constructed in WinBUGS 1.429 and should run on that platform.
It is now straightforward to extend the GLMMs for cross-sectional family studies to incorporate genetic and environmental contributions to the longitudinal change in a phenotype over time. If $T_{ijk}$ is the time (sometimes age) of the $k$th measurement in the $j$th individual in the $i$th family, model (1) can be extended:

$$
\begin{aligned}
g\left(E[y_{ijk}]\right) &= (\beta_0 + \beta_{0ij}) + (\beta_T + \beta_{Tij})\,T_{ijk} + \mathbf{x}'_{ijk}\boldsymbol{\beta} \\
y_{ijk} &\sim f\left(E[y_{ijk}], \phi\right) \\
\boldsymbol{\beta}_{0i} &\sim \mathrm{MVN}(\mathbf{0}, \mathbf{V}_0) \\
\boldsymbol{\beta}_{Ti} &\sim \mathrm{MVN}(\mathbf{0}, \mathbf{V}_T)
\end{aligned}
\tag{2}
$$

where the covariate vector, $\mathbf{x}_{ijk}$, is now indexed by time as well as family and subject. It can therefore incorporate time-dependent covariates that can change in their observed values with time. The term $\beta_T$ estimates the average increase in response (on the scale of the linear predictor) associated with a one-unit increase in time. That is, $\beta_T$ reflects the overall rate of longitudinal change in the phenotype of interest. The subject level random effect, $\beta_{Tij}$, is analogous to $\beta_{0ij}$ (see above). It is a composite random effect, specific to the $j$th member of the $i$th family. It reflects polygenic and shared environmental contributions to the rate of phenotypic change over time ($\beta_{Tij} = A_{Tij} + C_{Tij} + Cs_{Tij}$). $\boldsymbol{\beta}_{Ti}$ is a family level random effects vector with $j$th element $\beta_{Tij}$; $\mathbf{V}_T$ is a covariance matrix reflecting the within-family covariance of the elements of $\boldsymbol{\beta}_{Ti}$ and is parameterized by variance components $\sigma^2_{AT}$, $\sigma^2_{CT}$, and $\sigma^2_{CsT}$. The generation of the $\boldsymbol{\beta}_{Ti}$ random effects is so organized that $\mathbf{V}_T$ has the same overall structure as $\mathbf{V}_0$ (see the WinBUGS files for longitudinal family data on our website http://www.prw.le.ac.uk/research/HCG/gebugs.html). However, the actual magnitudes of the variance components ($\sigma^2_{AT}$, $\sigma^2_{CT}$, and $\sigma^2_{CsT}$) with which it is parameterized are assumed here to be unrelated to the magnitudes of those that parameterize $\mathbf{V}_0$ (i.e. $\sigma^2_A$, $\sigma^2_C$, and $\sigma^2_{Cs}$). Nevertheless, just as the size of $\sigma^2_A$ indicates the extent to which additive polygenic effects influence the mean value of the phenotype (an intercept effect), the estimated size of $\sigma^2_{AT}$ indicates the influence of additive polygenic effects on the rate of change of the phenotype over time (a slope effect).
Model parameters may be interpreted as described previously.8,25 Interpretation of the fixed regression coefficients depends in the usual way on the chosen link function. Covariances and variances are also interpreted on the scale of the linear predictor. Thus, in a binary model with a logit link, a fixed regression coefficient ($\beta_X$) estimates the increase in the log(odds) of a positive phenotype associated with a one-unit increase in the value of covariate $X$. Similarly, the covariance between the log(odds) of a positive phenotype in two full siblings is ($\tfrac{1}{2}\sigma^2_A + \sigma^2_C + \sigma^2_{Cs}$), while the corresponding covariance of the rate of linear increase in log(odds) between two siblings is $\tfrac{1}{2}\sigma^2_{AT} + \sigma^2_{CT} + \sigma^2_{CsT}$. In normal models that incorporate an unshared residual error term ($\sigma^2_E$) one can also calculate the narrow sense heritability of the phenotype6: $h^2 = \sigma^2_A / (\sigma^2_A + \sigma^2_C + \sigma^2_{Cs} + \sigma^2_E)$. Furthermore, even if the phenotype is not normally distributed, a $\sigma^2_{ET}$ term can usually be added to reflect unshared residual variance (on the scale of the linear predictor) in the gradient from subject to subject. This allows one to calculate, on the scale of the linear predictor, the narrow sense heritability of the rate of linear change in the phenotype over time:

$$
h^2_T = \frac{\sigma^2_{AT}}{\sigma^2_{AT} + \sigma^2_{CT} + \sigma^2_{CsT} + \sigma^2_{ET}}
$$
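Because heritability is a function of several variance components, in an MCMC setting it is natural to evaluate it at every retained iteration and then summarize the resulting draws. The sketch below assumes that narrow sense heritability is the additive polygenic variance divided by the sum of the additive, shared family, shared sibling, and unshared residual variances; the function name and the posterior draws are hypothetical.

```python
def narrow_sense_h2(var_a, var_c, var_cs, var_e):
    """Additive polygenic variance as a fraction of the assumed total variance."""
    return var_a / (var_a + var_c + var_cs + var_e)


# Hypothetical draws of (var_a, var_c, var_cs, var_e), one tuple per iteration.
draws = [(4.0, 1.0, 1.0, 2.0), (3.0, 1.0, 1.0, 1.0), (5.0, 2.0, 1.0, 2.0)]

# Evaluate the ratio draw by draw, then average; this is generally not the
# same as taking the ratio of the posterior means of the components.
h2_draws = [narrow_sense_h2(*d) for d in draws]
h2_posterior_mean = sum(h2_draws) / len(h2_draws)
```

The same function applies to the heritability of the slope if the four slope-related variance components are supplied instead.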
2 Simulated data analysis
The models described in section 1 were applied to a range of datasets that were simulated using S-PLUS 2000.37 The primary purpose of the simulation work described in this section was to investigate whether a correctly specified model of the type motivated in section 1 would generate estimates that were consistent, and whether 95% credible intervals (in this setting, a Bayesian equivalent to 95% confidence intervals) had appropriate coverage. A range of different simulated target values was used, both for the regression coefficients and for the variance components. Four primary sets of simulation scenarios were constructed. These may be categorized by whether they involve a continuous normally distributed trait or a binary trait, and whether the simulated pedigrees were nuclear families or extended pedigrees. Unless otherwise stated, all MCMC analyses were based upon the running of three parallel chains. The first 1000 iterations of each chain were discarded as a burn-in, and formal inferences about the posterior distribution were based on appropriate summaries of iterations 1000–10 000. All iterations in the stated range were analysed, despite the autocorrelation between adjacent iterations, since for the conventional types of summary in which we were interested "there is no real advantage in thinning except to reduce storage requirements".29
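The summary step described above (discard the burn-in, pool the parallel chains without thinning, report a posterior mean and a 95% credible interval) can be sketched as follows. This is an illustration only: WinBUGS produces these summaries itself, and the function names and the linear-interpolation quantile rule used here are assumptions.

```python
def quantile(sorted_values, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def summarize(chains, burn_in=1000, level=0.95):
    """Pool all post-burn-in iterations from every chain and summarize them."""
    pooled = sorted(v for chain in chains for v in chain[burn_in:])
    tail = (1 - level) / 2
    return {
        "mean": sum(pooled) / len(pooled),
        "lower": quantile(pooled, tail),
        "upper": quantile(pooled, 1 - tail),
    }
```

Each element of `chains` is the sequence of sampled values of one parameter from one chain; every retained iteration contributes to the summary, in line with the decision not to thin.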
2.1 A normal trait in nuclear families
Simulation was based upon model (2) with a normally distributed response variable and an identity link:
$$
y_{ijk} \sim \mathrm{N}\left(\eta_{ijk}, \sigma^2_E\right)
$$

where $\eta_{ijk}$ denotes the full linear predictor on the right-hand side of model (2). In addition to the fixed parameters reflecting the overall intercept ($\beta_0$) and slope ($\beta_T$) there are two observed covariate vectors, bvar (binary) and qvar (continuous), with coefficients $\beta_{bvar}$ and $\beta_{qvar}$. The covariance matrices ($\mathbf{V}_0$ and $\mathbf{V}_T$) associated with the random effect vectors for the regression intercept ($\boldsymbol{\beta}_{0i}$) and slope ($\boldsymbol{\beta}_{Ti}$) are parameterized with terms ($\sigma^2_A$, $\sigma^2_C$, $\sigma^2_{Cs}$) and ($\sigma^2_{AT}$, $\sigma^2_{CT}$, $\sigma^2_{CsT}$), respectively. Table 1 summarizes the combined MCMC results from 250 independently simulated datasets each based on a common simulation model of this form. Each dataset was based on the simulation of 50 nuclear families containing five members (two parents and three children) with observations made at five equally spaced time-points. A random 10% of individuals and 10% of observations were deleted, and all individuals with fewer than three observations were also deleted. These random deletions were deliberately invoked in order to break up the otherwise balanced nature of the simulated data. Across 50 typical datasets (2500 families generated) this particular simulation mechanism generated 1468 families with five members and 815, 192, 22, and 3 families, respectively, with four, three, two, and one member. Overall, 2009 families had two parents, 463 had one, and 28 had no parents. Correspondingly, 1814 families had three children, 619 had two children, 62 had one, and 5 had none in the dataset. Across the fifty datasets, there were 11 223 individuals and of these, 6689 had observations at all five time-points, 3751 at four, and 783 at three. No attempt was made to impute missing data.
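A simulation of this general kind can be sketched for one nuclear family. The sketch is not the S-PLUS code used for the paper and rests on several labelled assumptions: covariates are omitted, all parameter values are arbitrary, children receive the mid-parent additive effect plus a segregation term with half the additive variance, parents receive an independent draw in place of the shared sibling effect so that all members have the same marginal variance, and deletion is applied independently to each individual and each observation with the stated probability.

```python
import random


def simulate_family(rng, n_children=3, times=(-2, -1, 0, 1, 2),
                    beta0=120.0, beta_t=1.0,
                    var_a=4.0, var_c=1.0, var_cs=1.0,
                    var_at=0.04, var_ct=0.01, var_cst=0.01, var_e=4.0):
    """Return rows (member index, time, phenotype) for one nuclear family."""
    def effects(va, vc, vcs):
        # Founders draw additive effects; children inherit the mid-parent
        # value plus a segregation term with variance va / 2.
        a_f, a_m = rng.gauss(0, va ** 0.5), rng.gauss(0, va ** 0.5)
        a_kids = [0.5 * (a_f + a_m) + rng.gauss(0, (va / 2) ** 0.5)
                  for _ in range(n_children)]
        c = rng.gauss(0, vc ** 0.5)    # shared by the whole nuclear family
        cs = rng.gauss(0, vcs ** 0.5)  # shared by the siblings only
        # Assumption: each parent gets an independent draw of the same size.
        parents = [a + c + rng.gauss(0, vcs ** 0.5) for a in (a_f, a_m)]
        return parents + [a + c + cs for a in a_kids]

    intercepts = effects(var_a, var_c, var_cs)
    slopes = effects(var_at, var_ct, var_cst)
    rows = []
    for j, (u0, ut) in enumerate(zip(intercepts, slopes)):
        for t in times:
            eta = (beta0 + u0) + (beta_t + ut) * t
            rows.append((j, t, eta + rng.gauss(0, var_e ** 0.5)))
    return rows


def delete_at_random(rows, rng, p_ind=0.10, p_obs=0.10, min_obs=3):
    """Drop individuals and observations at random, then enforce min_obs."""
    members = sorted({j for j, _, _ in rows})
    dropped = {j for j in members if rng.random() < p_ind}
    kept = [r for r in rows if r[0] not in dropped and rng.random() >= p_obs]
    counts = {}
    for j, _, _ in kept:
        counts[j] = counts.get(j, 0) + 1
    return [r for r in kept if counts[r[0]] >= min_obs]


rng = random.Random(2005)
family = delete_at_random(simulate_family(rng), rng)
```

Repeating this for 50 families gives one unbalanced dataset of the form analysed here.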
Subsequently, six further sets of simulations were generated. Each of these six scenarios involved the generation and analysis of 25 independent datasets with a variety of different target values for each parameter. In addition, the 250 independent datasets simulated under the scenario detailed in Table 1 were arbitrarily divided up into 10 sets of 25 simulations. This implies a total of 16 groups (6 + 10) each consisting of 25 independent datasets. Figure 1 details the relationship between target values and mean MCMC-based estimates for each parameter across the 16 simulation groups. Figure 2 details the corresponding coverage of 95% credible intervals for each parameter. In both figures, a number of the plotting points so closely overlie one another that fewer than 16 individual points can be distinguished in each figure. The grouping of the simulations in this manner and the graphical presentation of the results in Figures 1 and 2 not only allow one to examine the extent of bias and coverage at a particular set of target values, but also to determine how that bias and coverage might change as the target values vary.
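For a single parameter within one simulation group, the two quantities plotted in such figures can be computed as sketched below. The input layout (one tuple of posterior mean and credible interval limits per simulated dataset) and the function name are assumptions made for illustration.

```python
def bias_and_coverage(results, target):
    """Mean of posterior means and proportion of intervals containing target.

    `results` holds one (posterior_mean, lower, upper) tuple per dataset.
    """
    mean_estimate = sum(m for m, _, _ in results) / len(results)
    covered = sum(1 for _, lo, hi in results if lo <= target <= hi)
    return mean_estimate, covered / len(results)


# Hypothetical summaries from four simulated datasets with a target of 1.0.
group = [(1.0, 0.5, 1.5), (1.2, 0.9, 1.6), (0.8, 0.2, 0.95), (1.0, 0.7, 1.3)]
mean_estimate, coverage = bias_and_coverage(group, target=1.0)
```

With only 25 datasets per group, the coverage estimate moves in steps of 4%, so small departures from 95% within a single group carry little information.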
Overall (see Table 1 and Figures 1 and 2), the means of the posterior means were close to the simulated target values and the mean coverage was close to nominal (95.3%) with no obvious pattern to the small discrepancies from 95%. However, there was a consistent tendency for the estimates of all of the variance components to be slightly raised. The geometric mean of the posterior mean ÷ target value ratios for all seven variance-covariance parameters was approximately 1.10. A positive bias of up to 20–30% in the estimation of variance terms in Gibbs-sampling-based GLMMs has been noted before.27 The bias tends to be greater in small datasets and if the true variance being estimated is close to zero.8 If it is viewed as being a concern, it can be mitigated by using the posterior mode as a summary measure rather than the posterior mean.27,38 That this bias is evident in this particular example and not in the examples described in sections 2.2 and 2.3 is because this was deliberately constructed as a small problem. In fact, it might reasonably be emphasized that despite the fact that a complex GLMM is being fitted to a sample consisting of only 50 nuclear families, the bias in the parameter estimates is very small and certainly not enough to lead to misleading inferences.
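The bias summary quoted here, a geometric mean of posterior mean ÷ target value ratios, can be computed as in the following sketch. The function name and the example values are hypothetical and are not taken from Table 1.

```python
import math


def geometric_mean_ratio(posterior_means, targets):
    """Geometric mean of the ratios posterior mean / target value."""
    logs = [math.log(m / t) for m, t in zip(posterior_means, targets)]
    return math.exp(sum(logs) / len(logs))


# Hypothetical estimates that each overshoot their target by 10%.
ratio = geometric_mean_ratio([1.1, 2.2, 0.55], [1.0, 2.0, 0.5])
```

Averaging on the log scale treats a ratio of 2 and a ratio of 0.5 as equal and opposite, which is why a geometric rather than an arithmetic mean suits ratios of this kind.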
2.2 A normal trait in extended pedigrees
As a familiar and convenient template for the family structures in the simulations involving extended pedigrees, we used the extended pedigree structures that were used in the simulated out-bred population generated for the Genetic Analysis Workshop 12 (GAW12).39 However, for our purpose, we simulated entirely new exposure and outcome data under the models specified in this paper and made no use of the outcome/exposure data or simulating models that were supplied and analysed as part of GAW12.
Using this template, we generated family structures corresponding to 23 extended pedigrees (n = 1497 individuals). Each pedigree consisted of four or five generations, and the number of identified individuals in each pedigree varied from 37 to 128. Given the structure of these families, we simulated phenotypic and covariate data for the 1000 subjects who were deemed to be alive and available for study.39 The remaining 497 individuals were identified in the pedigrees but were allocated missing value codes for both the phenotypic and covariate data.
Simulation was again based upon model (2) with a normally distributed response variable and an identity link. The only change in model structure from the model used to simulate the normal trait in nuclear families (section 2.1) was that there were three observed covariate vectors ($x_1$, $x_2$, and $x_3$) with coefficients $\beta_1$, $\beta_2$, and $\beta_3$. The data were simulated so as to reflect a realistic dataset with systolic blood pressure (SBP) as the phenotype of interest. The fixed covariates were generated so as to represent: proportion of ideal body weight (continuous); salt intake above or below the population median (binary); and current smoking (binary). Longitudinal measurements were made at five time points at two-yearly intervals; time was zeroed at the time of the central measurement. A random 20% of the measurements were deleted from the dataset.
An annotated S-PLUS script file (normal.extended.pedigrees.example.ssc) that generates a single simulated dataset of the form described in the preceding paragraph may be found at website http://www.prw.le.ac.uk/research/HCG/gebugs.html. The website also includes an annotated copy of a WinBUGS file (normal.extended.pedigrees.example.odc) into which the two datasets written out by the S-PLUS file have already been imported. The WinBUGS file is ready to run in WinBUGS version 1.429 (or version 1.4.1). The distribution of key variables in this particular simulated dataset is summarized in Box 1. By clicking on the arrows indicating the WinBUGS folds29 for "Target Values" and "Stats" under "RESULTS" in the annotated WinBUGS file, it can be seen that all point estimates are close to simulated target values, and that all 95% credible intervals include the corresponding simulated target values.
Box 1 Key variables in simulated dataset
Total number of longitudinal measurements: 4017
Systolic blood pressure: mean = 121.9 mm Hg; SD = 9.6 mm Hg; minimum = 87 mm Hg; maximum = 157 mm Hg
Weight relative to ideal body weight: mean = 1.35; SD = 0.39; minimum = 0.7; maximum = 2.0
Salt intake above population median: yes = 51.4%; no = 48.6%
Current smoker: yes = 30.4%; no = 69.6%
As a more rigorous test of the consistency and coverage of point and interval estimates, Table 2 summarizes the combined MCMC results from the analysis of 20 independently simulated datasets of the same form. There was no obvious bias to the posterior means and the coverage was close to nominal.
2.3 A binary trait in nuclear families
Simulation was based upon model (2) with a binary trait and a logit link:
$$
y_{ijk} \sim \mathrm{Bernoulli}(\pi_{ijk}), \qquad \mathrm{logit}(\pi_{ijk}) = \eta_{ijk}
$$

where $\eta_{ijk}$ denotes the linear predictor of model (2), here including an additional unshared subject-level random effect on the slope (variance, $\sigma^2_{ET}$) reflecting random variation from subject to subject in the gradient reflecting the rate at which the log-odds of exhibiting the trait increased with age. An annotated S-PLUS script file (binary.nuclear.families.example.ssc) that generates a single simulated dataset in this form, and an annotated WinBUGS file (binary.nuclear.families.example.odc) into which this dataset has been incorporated, can be found on our website. A comparison of "Target Values" and "Stats" under "RESULTS" in the WinBUGS file demonstrates that point estimates are close to the simulated targets and that the 95% credible intervals all include their corresponding target values. The single simulated dataset in the example file is based upon 1000 nuclear families, and upon a longitudinal binary phenotype assessed on up to seven equally spaced occasions. A random 20% of individuals and 20% of observations were deleted, and all individuals with fewer than three non-deleted assessments were also excluded.
Table 3 summarizes the combined MCMC results from 20 independently simulated datasets based on the same model, but using a different set of target values for the model parameters. Each dataset was based on the simulation of 500 nuclear families each containing five members (two parents and three children) with observations made at five equally spaced time-points. A random 10% of individuals and 25% of observations were deleted, and all individuals with fewer than three observations were deleted. Across five typical datasets (2500 families generated) this simulation mechanism led to 591 families with five members and 964, 663, 244, and 32 families, respectively, with four, three, two and one members. All members were deleted from six families and these pedigrees did not therefore appear in the simulated dataset at all. Overall, 1420 families had two parents, 906 had one and 174 had no parents in the dataset. Correspondingly, 1021 families had three children, 1066 had two children, 379 had one, and 34 had none in the dataset. In total, across the five datasets, there were 9320 individuals and of these, 5654 had observations at all five time-points, 2986 at four and 680 at three.
Overall (see Table 3), there was no consistent bias to the parameter estimates and the coverage was again close to nominal.
2.4 A binary trait in extended families
Simulation was again based upon model (2) with a binary response variable and a logit link. Family composition was based upon the structure of the same set of extended pedigrees used in section 2.2. Data were simulated at five equally spaced time-points with 20% of observations deleted at random. Twenty independently simulated datasets were constructed. Table 4 summarizes the results of the MCMC analysis applied to these datasets. This example exhibited the same small positive bias in the posterior means for the variance terms that was illustrated in the example considered in section 2.1. In fact the bias was slightly larger in this example: the geometric mean of the posterior mean ÷ target value ratios for all seven variance-covariance parameters was approximately 1.15. The explanation for the bias and the potential solutions are the same as described in section 2.1. That the bias is evident in this particular example emphasizes the fact that binary phenotypes are far less informative than quantitative traits, particularly when one is fitting a complex random effects model.8
2.5 Overall assessment of model fit and statistical inferences
On the basis of informal scrutiny of the developing MCMC chains and of the plots of the Gelman–Rubin diagnostic,36 it was clear that MCMC convergence was typically quick (certainly within 1000 iterations) under all scenarios. This is well illustrated in the diagnostic plots included at the bottom of the exemplar WinBUGS files available on our website. As is inevitable in these models,8,25,28 there is substantial autocorrelation from iteration to iteration within a chain, particularly for the random effect variances. Chains were therefore run long enough to ensure that any effect that this may have had on statistical inference was minimal.
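One way to judge whether a chain has been run long enough in the presence of autocorrelation is to estimate its effective sample size. The following Python sketch is an illustration only and is not part of the WinBUGS analysis described here: the function names, the truncation rule (stop at the first non-positive sample autocorrelation) and the `max_lag` cap are assumptions made for simplicity.

```python
from statistics import mean


def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mu = mean(x)
    denom = sum((v - mu) ** 2 for v in x)
    num = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return num / denom


def effective_sample_size(x, max_lag=200):
    """Approximate ESS = n / (1 + 2 * sum of leading positive autocorrelations)."""
    total = 0.0
    for lag in range(1, min(max_lag, len(x) - 1) + 1):
        r = autocorr(x, lag)
        if r <= 0:  # simple truncation rule (an assumption of this sketch)
            break
        total += r
    return len(x) / (1 + 2 * total)
```

A strongly autocorrelated chain, as is typical for random effect variances, yields an effective sample size far below the nominal number of iterations, which is the practical reason for running such chains longer.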
In relation to the quality of the resultant statistical inferences, across all scenarios, the only issue beyond the small positive bias in the posterior means for the covariance terms (see above) was that there was perhaps a tendency for
, in particular, to be overestimated (by up to 25%). We have no obvious explanation for this particular phenomenon; it is not a prominent feature of our GLMM models in general.8,25 We also note that, as a characteristic, it was not shared by
; indeed, it was
that was most consistently overestimated amongst the variance components associated with the slope (by up to 19%). This obviously warrants further study.
| 3 Real data analysis |
|---|
We analysed the longitudinal Framingham Family Cohort dataset provided for GAW13, which included 4692 individuals in 330 pedigrees.40
The primary response variable of the GLMMs described here was the SBP measured over time. Explanatory covariates for all models included: sex, age, height, weight, number of standard drinks per week, and number of cigarettes per day. Sex (0 = female; 1 = male) was analysed as a binary covariate. Other variables were treated as continuous. All continuous covariates were centred at or close to their mean. Quadratic and cubic terms in age were included in the fixed effects component of the model in order to allow for a non-linear increase in SBP with age.
The covariates drink (number of standard drinks per week) and cpd (number of cigarettes per day) were measured at only 11 and 18 time-points, respectively, in the first cohort, although cohort members were subject to a maximum of 21 examinations each. The majority of individuals exhibited only one or two distinct values for these variables throughout the period of study, and each missing value of drink and cpd was therefore replaced by the interpolated mean of the observed values bracketing the missing observation. This increased the number of usable observations without excluding important covariates from the model. The modelling that we report here was based entirely on those records that had non-missing values for SBP, age, height, and weight, and non-missing or interpolated values for drink and cpd. This represented a total of 24 174 observations in 2860 individuals (on average, 8.45 observations per phenotyped subject). The older subjects in the study had potentially many more observations than the younger subjects but, given that age was included in the model and that our method does not require balanced data, this was not problematic.
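The imputation rule for drink and cpd can be illustrated as follows. This is a minimal Python sketch under stated assumptions: the function name `fill_bracketed` is hypothetical, the "interpolated mean" is taken to be the simple average of the nearest observed values before and after the gap, and values with no observation on one side are left missing because the text does not say how such cases were handled.

```python
def fill_bracketed(values):
    """Replace each None by the mean of the nearest observed values on either side."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed value before and after position i, if any.
        before = next((x for x in reversed(values[:i]) if x is not None), None)
        after = next((x for x in values[i + 1:] if x is not None), None)
        if before is not None and after is not None:
            filled[i] = (before + after) / 2
    return filled


# Hypothetical series of cigarettes per day across six examinations.
print(fill_bracketed([None, 10, None, None, 20, None]))
# [None, 10, 15.0, 15.0, 20, None]
```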
The classes of models fitted in this analysis are equivalent to those described in section 1. Models were run using two parallel MCMC chains for 20 000 iterations after a burn-in of 15 000 iterations. The raw traces of the MCMC chains and the Gelman–Rubin diagnostic plots29,36 for the individual parameters of interest suggested that this length of burn-in and analysis were adequate.
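For readers unfamiliar with the diagnostic, the idea behind it can be sketched in Python. This is the classical variance-based form of the potential scale reduction factor, shown for illustration; it is not necessarily the exact quantity plotted by WinBUGS, and the function name `gelman_rubin` is hypothetical.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)    # W: average within-chain variance
    between = n * variance(chain_means)           # B: between-chain variance
    pooled = (n - 1) / n * within + between / n   # pooled estimate of posterior variance
    return (pooled / within) ** 0.5
```

Values close to 1 for every parameter of interest are consistent with the parallel chains having converged to the same distribution; values well above 1 indicate that the chains still disagree.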
The problem of how to model continuous SBP when some individuals are on blood pressure treatment is difficult.41 Since complex modelling of SBP was not the main aim of this particular analysis, we chose the simplest valid method of adjustment based on known average treatment effects.41–43 This involved adding a constant (10 mm Hg) to each phenotype when an individual was on treatment, to reflect the true SBP that might have been observed if the individual had not been on treatment.
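The adjustment itself is a one-line transformation, sketched here in Python. The function name and the example readings are hypothetical; only the 10 mm Hg constant comes from the text.

```python
def adjust_for_treatment(sbp_mmhg, on_treatment, constant_mmhg=10.0):
    """Add a fixed constant to SBP readings taken while on antihypertensive treatment."""
    return sbp_mmhg + constant_mmhg if on_treatment else sbp_mmhg


# Hypothetical (SBP in mm Hg, on treatment) pairs.
readings = [(128.0, False), (142.0, True)]
adjusted = [adjust_for_treatment(sbp, treated) for sbp, treated in readings]
print(adjusted)  # [128.0, 152.0]
```

Exposing the constant as an argument makes it easy to refit the model under alternative assumed treatment effects.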
Although the model is Bayesian, vague priors were used throughout (see section 1), and in consequence the effect of each covariate on the phenotype could be assessed using a pseudo-likelihood-based approach. For example, an empirical posterior mean divided by its empirical standard deviation could be treated as an approximate standardized normal deviate (Z-score), and 95% credible intervals could be interpreted as 95% confidence intervals.17,44
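The summary described above can be computed directly from the sampled values of a parameter. The Python sketch below is illustrative: the function name `summarize_posterior` and the dictionary keys are hypothetical, the interval is taken from simple empirical quantiles, and the two-sided P-value is an added convenience that follows from treating the ratio as a standard normal deviate.

```python
from statistics import NormalDist, mean, stdev


def summarize_posterior(samples):
    """Posterior mean, SD, approximate Z-score, 95% interval and two-sided P-value."""
    post_mean = mean(samples)
    post_sd = stdev(samples)
    z = post_mean / post_sd
    ordered = sorted(samples)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]
    upper = ordered[min(n - 1, int(0.975 * n))]
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"mean": post_mean, "sd": post_sd, "z": z,
            "interval": (lower, upper), "p_two_sided": p_two_sided}
```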
The results of fitting the variance components GLMM in WinBUGS are shown in Table 5. Both posterior means and 95% credible intervals are presented for each parameter. We deliberately excluded certain intermediate phenotypes (such as cholesterol and BMI) from the model, even though they were clearly related to SBP. This was because they were potentially on causal pathways of interest, and we did not want to model away their contribution to variation in SBP.
The size of
(relative to the other variance components for the slope) implied a narrow sense heritability (
) estimate of only 9.0%. Furthermore, a formal test of the null hypothesis (
) using the criterion described by Self and Liang45 suggested that
was not statistically significant (P = 0.11). These findings suggest that the additive effect of polygenes explains at most a small proportion of the total variability in the rate of increase of SBP with age, and that non-genetic influences arising from common family or common sibling environment (such as shared diet) account for a markedly greater part of the variability. In contrast, the variance components estimates for the intercept suggested that additive genetic effects represent the largest component of variance for SBP at baseline (44.3%).
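For a single variance component tested on the boundary of its parameter space, the reference distribution given by Self and Liang45 is a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The Python sketch below shows the resulting P-value calculation; it assumes that a likelihood-ratio-type statistic is available, and the function name is hypothetical. It is not a reconstruction of how the P = 0.11 reported above was computed.

```python
from math import erfc, sqrt


def boundary_p_value(lrt_stat):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt_stat <= 0:
        return 1.0
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return 0.5 * erfc(sqrt(lrt_stat / 2))
```

In effect, the P-value is half of what a naive chi-square test with one degree of freedom would give, reflecting the fact that a variance cannot be negative.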
A number of covariates exhibit a significant association with SBP. SBP appears to be slightly higher in men, to increase markedly with body weight, to fall with height, and to increase with the number of standard drinks consumed per week. These relationships are all bio-clinically plausible. In addition, individuals from the more recent Framingham cohort appear to have a lower SBP than those included in the earlier cohort. Having taken account of these other covariates, the linear component of the mean rate of change of SBP with each additional year of age was 0.61, suggesting that for an average individual, SBP increases by 6.1 mm Hg every 10 years. A non-linear effect was also apparent, since the credible intervals for both the quadratic and cubic fixed effect terms for age exclude 0. The non-linear relationship between mean SBP and age that is implied by the model is illustrated in Figure 3. Separate profiles are presented for males and females. The plots for each sex are presented for subjects who have a weight and height that correspond to the sex-specific means of these two measures across all the 24 174 observations that were included in the analysis. The age range plotted corresponds to the central 90% of the observed distribution of age across all observations in the analysis. In order to keep the illustrative example as simple as possible, the model presented in Table 5 contains no interactions between age and sex. The sex-specific SBP profiles with age in Figure 3 are therefore parallel. Such interactions could easily be added and it would have been possible to model entirely independent relationships between SBP and age in men and in women.
Results changed little when
was excluded, when
and
were excluded, and when
and
were excluded. This suggests that the analytic findings were robust to the particular choice of correlation/covariance structure that we adopted. In addition, there was little change when the SBP adjustment (for those on blood pressure treatment) was changed to +5 mm Hg or +20 mm Hg (instead of +10 mm Hg).
| 4 Discussion |
|---|
Longitudinal family studies have an important role to play in attempts to dissect the aetiological architecture of complex diseases. They will help us to look more closely not only at the aetiological determinants that are responsible for initial disease causation, but also, beyond these, at the genetic and environmental determinants that modulate the subsequent natural history of a disease. Knowledge of such determinants may pay high dividends not only in terms of prognostic assessment, but also in the development of new therapeutic modalities that can be biologically targeted and may, potentially, be tailored to the prognostic profile of individual patients.
There is increasing international investment in large case series for major complex diseases, such as the British Medical Research Council's National DNA Collections Initiative (http://www.mrc.ac.uk), and in large general population epidemiological studies that are linked to health records, such as UK Biobank46 (http://www.biobank.ac.uk/), Iceland's deCODE project47,48 (http://sunsite.berkeley.edu/biotech.iceland/), the Estonian Gene Bank Project (http://www.geenivaramu.ee/), CONOR (the cohort of Norway),49 and the Western Australian Data Linkage Project50 (http://www.populationhealth.uwa.edu.au/welcome/research/dlu/linkage). Some of these initiatives are able to delineate some family structure by design (e.g. deCODE, CONOR, and the Western Australian Data Linkage Project) while others (e.g. UK Biobank) are actively investigating the scientific and ethical issues that underpin the possibility of record linkage to identify close relatives after recruitment. Studies such as these are a rich potential source of longitudinal family data and will become even more informative over time. The optimal analysis of data from these studies requires that we are able to model changes in a phenotype over time, not only in individuals, but also in relative pairs, nuclear families and in extended families.
The particular approach to the analysis of such data that we describe in this paper generalizes a pre-existing method for cross-sectional family data to longitudinal family data, but we are not the first to consider such a generalization. For example, approaches based on the covariance components models described by Lange and colleagues5,21 or on structural equation modelling22,23 may be viewed as extensions of precisely this type. But, for the reasons we discussed at the outset, these methods are currently restricted in their potential application by the constraints of available software. Similarly, de Andrade et al.24 describe an approach to longitudinal family data that extends an earlier method for linkage analysis designed for cross-sectional family data.51 However, their work has a different focus from ours. First, it is aimed specifically at quantitative traits. Second, they have a primary interest in incorporating the effect of a major gene linked to an observed marker. Third, their model is designed to address scientific questions of the form: does heritability vary over time? As it is structured, it does not directly permit one to turn the question around in order to assess the impact of unobserved genes and environment on the systematic rate of change of a phenotype over time. The latter is one of the fundamental objectives of our own modelling. In their analysis of the Framingham Heart Study data in GAW13,14 Soler and Blangero52 also extend a conventional variance-components-based approach to linkage analysis to produce a flexible one-step approach for longitudinal family data. But their work again focuses on linkage analysis for continuous traits with a multivariate normal distribution.
At the same workshop, Yang et al.53 and MacGregor et al.54 described random regression models which incorporated polynomial coefficients to model the covariance matrix as a smooth function of time.10 These models were again used to estimate age-specific changes in heritability, but not to estimate the heritability of the rate of change of a quantitative phenotype over time.
In contrast to the alternative methods, the approach based on MCMC that we describe extends easily, both to non-normal problems and to complex pedigrees.8,25 Furthermore, our models are so structured that they lead naturally to inferences based on the genetic and environmental covariance components associated with the trait mean and with the rate of change of the trait over time. Among the wide variety of different approaches to analysing longitudinal family data that were explored in GAW13,10,12–14 including those referred to above, there is none that is designed to address these particular scientific questions that is both as flexible and as tractable as the GLMMs we describe. However, our models make a number of key assumptions that warrant further discussion.
First, they assume that, conditional on the fixed and random effects (and the genetically and environmentally determined correlation structure that they generate), the repeated observations in an individual are conditionally independent. Should it be viewed as desirable to introduce an additional autoregressive component to the model to account for a tendency for closely spaced observations in an individual to be more similar than measurements that are made at wider intervals,3,9,52 the flexibility of the WinBUGS modelling environment is such that it would be straightforward to do so.29 However, such models would be susceptible to poor mixing. They would demand the running of a prolonged series of iterations, and this would come with a heavy computational overhead.
Second, they assume that the β0ij are independent of the βTij. But realistic scenarios can be imagined under which some determinants affect both the level of response and the rate of change of the response with time.10 Such determinants will introduce a covariance between the β0ij and βTij. In principle, this is straightforward to incorporate. For example, the additive effect of genes that affect both the intercept and the gradient can be included in model (2) by extending the model for the subject level random effects:
![]() |
, which reflects the additive genetic effect (on the intercept) that is attributable to genes that also affect the slope. The parameter
is an unknown constant: it allows the average magnitude of the effect of the Qij on the intercept to differ from the corresponding effect on the gradient. The model implies a (genetically mediated) covariance of
x
between β0ij and βTij. Posterior distributions for
and
are obtained from the MCMC analysis, as also are posterior distributions for the individual Qij. Equivalent extensions can be implemented to incorporate covariances arising from shared environmental determinants that affect both the level of response and the rate of change of response over time. Although such extensions are straightforward to implement, they impair the mixing of the MCMC process, and models must therefore be run longer. Our website http://www.prw.le.ac.uk/research/HCG/gebugs.html contains a working example of a GLMM (longitudinal.covariance.b0.bT.odc) that models a simulated scenario that is precisely equivalent to that considered above. In this particular case, the response variable is normally distributed and the dataset consists of five stacked simulations, each involving 500 nuclear families. As simulated, all subjects are observed at five separate time-points; but 10% of subjects and 10% of observations are deleted at random. Analysis is based on a burn-in of 5000 iterations and an analysis of 50 000 iterations in each of three chains. Detailed results for all parameters may be obtained from the website. The simulated values for the four key parameters
,
,
, and
were: 25, 5, 20, and 0.4, respectively. The corresponding estimates (95% credible intervals) from the MCMC analysis averaged across the five separate simulations were: 23.1 (13.2–31.5); 4.22 (2.56–5.73); 21.1 (12.7–31.0); and 0.48 (0.30–0.70). Results from each of the five individual simulations were very similar to one another. All the simulated target values fell within the 95% credible intervals of the estimates. It is clear from inspecting the history of the MCMC traces (see website) that the mixing of the traces for the key parameters was very slow and that an analysis run shorter than 50 000 iterations would have been inappropriate. We are considering ways to reparameterize the model to improve the mixing.
Third, the models described in this paper explicitly assume that families are ascertained randomly and although we have developed approaches to deal with non-random ascertainment when fitting MCMC-based GLMMs of this type,55,56 such models require careful interpretation. To date, randomly ascertained longitudinal family data have been unusual, but this is gradually changing. In addition to the large national initiatives referred to above, population-based studies in progress that include a longitudinal family component include the Framingham Heart Study,57 the Busselton Health Study58–60 and studies using The Netherlands Twin Register.61 Several other large longitudinal studies based primarily on the population-based recruitment of individuals are considering extensions to permit an enhanced ability to undertake family-based research. These include ALSPAC (http://www.alspac.bris.ac.uk/) and the proposed US National Children's Study (http://nationalchildrensstudy.gov/). In addition, many cross-sectional family studies are likely to collect longitudinal data in the future. These include the Western Australian Twin Child Health (WATCH) Study.62,63
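Returning briefly to the second assumption, the covariance that a shared genetic effect induces between intercept and slope can be checked by simulation. The Python sketch below is illustrative only: the names (`lam` for the unknown scaling constant, `var_q` for the variance of the shared effect Qij), the other variance components and all numerical values are assumptions chosen for the demonstration and are not the parameters of the worked example above. Individuals are treated as unrelated, so no pedigree structure is modelled.

```python
import random
from statistics import mean


def simulate_cov(lam=0.5, var_q=4.0, var_a=9.0, var_at=1.0, n=100_000, seed=0):
    """Sample covariance between intercept and slope random effects sharing Q."""
    rng = random.Random(seed)
    b0, bt = [], []
    for _ in range(n):
        q = rng.gauss(0, var_q ** 0.5)                   # shared genetic effect
        b0.append(rng.gauss(0, var_a ** 0.5) + lam * q)  # intercept: own effect + lam * Q
        bt.append(rng.gauss(0, var_at ** 0.5) + q)       # slope: own effect + Q
    m0, mt = mean(b0), mean(bt)
    return sum((x - m0) * (y - mt) for x, y in zip(b0, bt)) / (n - 1)


# Theory for these illustrative values: lam * var_q = 0.5 * 4.0 = 2.0
print(round(simulate_cov(), 2))
```

With these illustrative values the sample covariance should lie close to the product of the scaling constant and the variance of the shared effect.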
Fourth, the models in this paper all focus on the determinants of a linear trend in the value of a trait over time. But the method easily extends to more complex time trends. For example, if the linear predictor in model (2) is re-expressed in the form:
![]() |
One of the greatest challenges in fitting extended GLMMs is the limited amount of relevant information that is intrinsic to the data. Binary phenotypes in particular may not always provide enough information to fit as complex a model as may be desired. But this is a fundamental feature of the underlying science rather than a problem with our particular approach itself. Nevertheless, this is why we illustrated the analysis of a binary phenotype in 500 nuclear families, while the equivalent simulations for a normal phenotype could be based on only 50 nuclear families (see comments in section 2.1). Furthermore, this emphasizes the scientific need for large sample sizes when one wishes to fit sophisticated models (particularly those with latent variables) to data relating to a complex disease. Another relevant limitation is the computer time required to fit the GLMMs. As an approximate guide, using a typical high end PC (Pentium 4 CPU 3 GHz; 1 GB memory), 10 000 iterations of the model based on the simulated normal phenotype fitted to the extended pedigrees (1497 subjects in 23 pedigrees with phenotypic information available on 1000 subjects each with, on average, four longitudinal assessments) took 23 mins to compile and initialize and 4 h to run. As computers become faster the processing speed will inevitably become a lesser concern.
In the light of all the alternatives, the particular approach to analysis that we describe in this paper offers a number of distinct benefits.
(i) It makes full use of the information contained in the repeated observations for each individual, without first requiring the information to be summarized in any way. In other words, using the terminology adopted by Gauderman et al.10 it is a joint rather than a two-step method.
(ii) It allows the inclusion of all pedigree members, without requiring decomposition of the pedigrees into smaller groups of relatives.
(iii) Genetic effects can be modelled with complete flexibility. Conventional regression covariates may be used to reflect the direct effect of measured genotypes, as well as interactions between genotypes at different loci and/or between genotypes and measured environmental determinants. Variance components can be used to model and estimate the effect of unmeasured genes. It is then straightforward to add a segregation component involving an unobserved major gene.30 It is theoretically possible to incorporate genetic linkage by modifying the Mendelian segregation probabilities of the simple segregation model to reflect estimates of IBD sharing based on linked markers. As an alternative, linkage analysis can instead be based on the Aij and ATij random effects (see models (1) and (2)) as continuous phenotypes (already adjusted for confounders and non-genetic components of variance) that reflect the effect of additive genetic effects on the aetiology and progression of the phenotype of interest.25,32,64 Sampled values for each of these parameters are generated at every iteration of the MCMC process, thus permitting the construction of a posterior distribution for each Aij and ATij. Linkage analysis can then be based on the posterior means of these parameters in each individual subject.25,33 From the specific perspective of a linkage analysis, this second approach is, of course, a two-step model.10 However, the longitudinal and genetic elements of the full covariance model are fitted jointly, and from all other perspectives our approach is really a one-step model. Furthermore, we are working to incorporate genetic linkage directly into the GLMMs. The theory is straightforward but, in practice, there are problematic restrictions imposed by the updating procedures in WinBUGS that have to be circumvented.
(iv) The results obtained from our simulated data analyses suggest that our approach generates accurate point estimates with approximately nominal coverage of credible intervals. Furthermore, the analysis of the Framingham Heart Study data indicates that our extended variance components models appear to perform well in practice. Not only are the models relatively straightforward to fit, but they also permit inferences to be expressed in a manner that is intuitive from a bio-clinical perspective. In the case of the Framingham data, the results supported the existence of a large additive genetic component to SBP itself but a smaller (non-significant) additive genetic component to the rate of increase in SBP with age. There was strong evidence of important shared environmental effects. All this is consistent with the complex, multifactorial susceptibility to the development and natural history of hypertension that would be anticipated.
(v) Precisely the same methods can also be used in situations involving family-based data and a phenotype that is cross-sectional from an epidemiological perspective, but consists of many repeated measures. For example, 24-hour ambulatory blood pressure assessment65 generates multiple blood pressure measurements over a single day and night.
(vi) Randomly missing data can be accommodated with ease, and there is no need for balanced data; repeated observations may occur at irregular intervals that can vary from subject to subject.
(vii) Formally, the GLMMs can easily be extended to the wide range of other distributions supported within the WinBUGS modelling environment.8,16,29 This means that the same models and methods that can be used to analyse the determinants of change in a normally distributed trait over time can also be used to analyse repeated evaluations of a binary trait or of a Poisson distributed rate or count, with minimal changes to the WinBUGS code.
(viii) Parameter interpretation is straightforward, and the models may be fitted with readily available freeware (WinBUGS29).
(ix) The Bayesian approach provides flexibility and clear, intuitive answers, and incorporates uncertainty about each set of parameters into estimates of the uncertainty about every other set of parameters. While software programs such as ASREML66,67 or Proc Mixed in SAS68 may be used to efficiently analyse specific types of longitudinal data (e.g. normally distributed data), our method is more generally applicable and more flexible.
(x) When fitted using Gibbs sampling in WinBUGS, the models we describe are highly tractable not only in the sense that they can be easily implemented in software that is freely available, but also in the sense that, because that particular software encodes a general purpose modelling environment widely used throughout the world, our methods could be easily used by research groups other than our own.
In conclusion, the GLMMs described in this paper do not solve all problems in the analysis of longitudinal family data and the models themselves require further study. For example, it is important to investigate the consequences of model misspecification. However, we concur with the sentiment: 'Commonly used genetic software programs are not designed for longitudinal data analysis and there is a clear need to develop integrated programs'.10 When scientific interest focuses on gene–phenotype associations and on formally decomposing the genetic and environmental covariance components for the intercept and for the slope of a trait, the models that we have described fulfil the need for an integrated approach, and appear to work very well. Furthermore, they provide a sound platform upon which other key developments can now be based.
| Acknowledgments |
|---|
The methodological research programme in Genetic Epidemiology at the University of Leicester is supported by MRC Cooperative Grant no. G9806740, by Program Grant no. 00\3209 from the National Health and Medical Research Council of Australia, and by Leverhulme Research Interchange Grant no. F/07134/K. The University's research programme in the genetics of hypertension is supported, in part, by a Wellcome Trust Thematic Programme Grant in Functional Genomics (no. 066780/Z/01/Z). M.D.T. has been funded by a Medical Research Council Training Fellowship in Health Services and Health of the Public Research. We thank the Framingham Heart Study investigators who kindly provided longitudinal data to GAW13. The Framingham Heart Study is supported by the NHLBI Framingham Heart Study Contract N01-HC-25195. We are grateful to Dr Jean McCluer and the organizers of GAW12 and GAW13. GAW is funded by NIGMS grant GM31575.
| References |
|---|
1 Feldman HA. Families of lines: random effects in linear regression analysis. J Appl Physiol 1988;64:172132.
2 Zeger SL, Liang K-Y. An overview of methods for the analysis of longitudinal data. Stat Med 1992;11:182539.[Web of Science][Medline]
3 Burton P, Gurrin L, Sly P. Extending the simple linear regression model to account for correlated responsesan introduction to generalized estimating equations and multi-level mixed modelling. Stat Med 1998;17:126191.[CrossRef][Web of Science][Medline]
4 Fisher R. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh 1918;52:399433.
5 Lange K, Boehnke M. Extensions to pedigree analysis IV. Covariance components models for multivariate traits. Am J Med Genet 1983;14:51324.[CrossRef][Web of Science][Medline]
6 Khoury MJ, Beaty TH, Cohen BH (eds). Fundamentals of Genetic Epidemiology. Oxford: Oxford University Press, 1993.
7 Hopper J. Variance components for statistical genetics: applications in medical research to characteristics related to human diseases and health. Stat Methods Med Res 1993;2:199223.[Medline]
8 Burton PR, Tiller KJ, Gurrin LC, Cookson WO, Musk AW, Palmer LJ. Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genet Epidemiol 1999;17:11840.[CrossRef][Web of Science][Medline]
9 Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:1322.
10 Gauderman WJ, Macgregor S, Briollais L et al. Longitudinal data analysis in pedigree studies. Genet Epidemiol 2003;25:S1828.
11 Almasy L, Amos C, Bailey-Wilson J et al. Genetic Analysis Workshop 13: analysis of Longitudinal Family Data for Complex Diseases and Related Risk Factors. BMC Genet 2003;4(Suppl. 1):13.[CrossRef][Medline]
12 Almasy L, Cupples LA, Daw EW et al. Genetic Analysis Workshop 13: introduction to workshop summaries. Genet Epidemiol 2003; 25:S14.[CrossRef]
13 Strauch K, Golla A, Wilcox MA, Baur MP. Genetic analysis of phenotypes derived from longitudinal data: Presentation Group 1 of Genetic Analysis Workshop 13. Genet Epidemiol 2003;25:S517.
14 Bickeboller H, Barrett JH, Jacobs KB, Rosenberger A. Modeling and dissection of longitudinal blood pressure and hypertension phenotypes in genetic epidemiological studies. Genet Epidemiol 2003;25:S7277.
15 Goldstein H. Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika 1986;73:4356.
16 Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc 1993;88:925.[CrossRef][Web of Science]
17 Burton PR, Gurrin LC, Campbell MJ. Clinical significance not statistical significance: a simple Bayesian alternative to p-values. J Epidemiol Community Health 1998;52:31823.[Abstract]
18 Burton PR. Applications in genetic epidemiology: modelling correlations in nuclear families using multilevel modelling. Multilevel Modelling Newsletter 1995;7:58.
19 Browne WJ, Draper D. Implementation and performance issues in the Bayesian and likelihood fitting of multilevel models. Computational Statistics 2000;15:391420.[CrossRef][Web of Science]
20 Zeger SL, Liang K-Y. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986;42:12130.[CrossRef][Web of Science][Medline]
21 Lange K, Weeks D, Boehnke M. Programs for pedigree analysis: MENDEL, FISHER and dGene. Genet Epidemiol 1988;5:47172.[CrossRef][Web of Science][Medline]
22 Eaves LJ, Long J, Heath AC. A theory of developmental change in quantitative phenotypes applied to cognitive development. Behav Genet 1986;16:14361.[CrossRef][Web of Science][Medline]
23 Neale MC, Cardon LR. Methodology for Genetic Studies of Twins and Families. London: Kluwer, 1992.
24 de Andrade M, Gueguen R, Visvikis S, Sass C, Siest G, Amos CI. Extension of variance components approach to incorporate temporal trends and longitudinal pedigree data analysis. Genet Epidemiol 2002;22:22132.[CrossRef][Web of Science][Medline]
25 Scurrah KJ, Palmer LJ, Burton PR. Variance components analysis for pedigree-based censored survival data using generalized linear mixed models (GLMMs) and Gibbs sampling in BUGS. Genet Epidemiol 2000;19:12748.[CrossRef][Web of Science][Medline]
26 Geman S, Geman D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 1984;6:72141.
27 Zeger SL, Karim MR. Generalized linear models with random effects; a Gibbs sampling approach. J Am Stat Assoc 1991;86:7986.[CrossRef][Web of Science]
28 Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. London: Chapman and Hall, 1996.
29 Spiegelhalter DJ, Thomas A, Best N, Lunn D. WinBUGS Version 1.4User Manual. Cambridge: MRC Biostatistics Unit, 2003.
30 Palmer LJ, Cookson WO, James AL, Musk AW, Burton PR. Gibbs-sampling based segregation analysis of asthma-associated quantitative traits in a population-based sample of nuclear families. Genet Epidemiol 2001;20:35672.[CrossRef][Web of Science][Medline]
31 Burton PR, Tobin MD. Epidemiology and genetic epidemiology. In: Balding DJ, Bishop M, Cannings C (eds). Handbook of Statistical Genetics. Chichester: Wiley, 2003.
32 Palmer LJ, Tiller KJ, Burton PR. Genome-wide linkage analysis using genetic variance components of alcohol dependency-associated censored and continuous traits. Genet Epidemiol 1999;17:S28388.
33 Scurrah KJ, Tobin MD, Burton PR. Longitudinal variance components models for systolic blood pressure, fitted using Gibbs sampling. BMC Genet 2003;4(Suppl. 1):S25.
34 Spiegelhalter DJ, Thomas A, Best N, Gilks RW. BUGS: Bayesian inference using Gibbs sampling, Version 0.50. Cambridge: MRC Biostatistics Unit, 1995.
35 Lambert PC, Sutton AJ, Burton P, Abrams K, Jones DJ. How Vague is Vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Stat Med (in press).
36 Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences (with discussion). Stat Sci 1992;17:457-72.
37 S-PLUS 6.1 for Windows. 2002. Seattle, Insightful Corp.
38 Hazelton ML, Gurrin LC. A note on variance components in linear mixed models. Genet Epidemiol 2003;24:297-301.
39 Almasy L, Terwilliger J, Nielsen D, Dyer D, Zaykin D, Blangero J. GAW12: simulated genome scan, sequence and family data for a common disease. Genet Epidemiol 2001;21(Suppl. 1):S332-38.
40 Cupples LA, Yang Q, Demissie S, Copenhafer D, Levy D, Framingham Heart Study investigators. Description of the Framingham Heart Study data for Genetic Analysis Workshop 13. BMC Genet 2003;4(Suppl. 1):S2.
41 Tobin MD, Sheehan NA, Scurrah KJ, Burton PR. Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure. Stat Med (in press).
42 Palmer LJ. Loosening the cuff: important new advances in modeling antihypertensive treatment effects in genetic studies of hypertension. Hypertension 2003;41:197-98.
43 Cui JS, Hopper JL, Harrap SB. Antihypertensive treatments obscure familial contributions to blood pressure variation. Hypertension 2003;41:207-10.
44 Burton PR. Helping doctors to draw appropriate inferences from the analysis of medical studies. Stat Med 1994;13:1699-713.
45 Self SG, Liang K-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 1987;82:605-10.
46 Mitchell P. UK launches ambitious tissue/data bank project. Nat Biotechnol 2002;20:529.
47 Chadwick R. The Icelandic database: do modern times need modern sagas? BMJ 1999;319:441-44.
48 Helgadottir A, Manolescu A, Thorleifsson G et al. The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat Genet 2004;36:233-39.
49 Magnus P, Arnesen E, Holmen J et al. CONOR (Cohort NORway): historie, formål og potensiale. Norsk Epidemiologi 2003;13:79-82.
50 Holman CD, Bass AJ, Rouse IL, Hobbs MS. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust NZ J Public Health 1999;23:453-59.
51 Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet 1994;54:535-43.
52 Soler J, Blangero J. Longitudinal familial analysis of blood pressure involving parametric (co)variance functions. BMC Genet 2003;4(Suppl.):86.
53 Yang Q, Chazaro I, Cui J et al. Genetic analyses of longitudinal phenotype data: a comparison of univariate methods and a multivariate approach. BMC Genet 2003;4(Suppl. 1):S29.
54 MacGregor S, Knott SA, White I, Visscher PM. Longitudinal variance-components analysis of the Framingham Heart Study data. BMC Genet 2003;4(Suppl. 1):S22.
55 Burton PR, Palmer LJ, Jacobs K, Keen KJ, Olson JM, Elston RC. Ascertainment adjustment: where does it take us? Am J Hum Genet 2000;67:1505-14.
56 Burton PR. Correcting for non-random ascertainment in generalized linear mixed models (GLMMs) fitted using Gibbs sampling. Genet Epidemiol 2003;24:24-35.
57 Levy D, DeStefano AL, Larson MG et al. Evidence for a gene influencing blood pressure on chromosome 17, genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension 2000;36:477-83.
58 Chandler PJ, Bock RD. Age-changes in adult stature: trend estimation from mixed longitudinal data. Ann Hum Biol 1991;18:433-40.
59 Knuiman MW, Divitini ML, Bartholomew HC, Welborn TA. Spouse correlations in cardiovascular risk factors and the effect of marriage duration. Am J Epidemiol 1996;143:48-53.
60 Palmer LJ, Knuiman MW, Divitini ML et al. Familial aggregation and heritability of adult lung function: results from the Busselton Health Study. Eur Respir J 2001;17:696-702.
61 Boomsma DI, Vink JM, van Beijsterveldt T et al. Netherlands Twin Register: a focus on longitudinal research. Twin Res 2002;5:401-06.
62 Hansen J, de Klerk NH, Croft M, Alessandri P, Burton P. The Western Australian Twin Child Health (WATCH) Study: work in progress. Austr Epidemiol 2000;7:16-20.
63 Hansen J, Allesandri PT, Croft ML, Burton PR, de Klerk NH. The Western Australian Register of Childhood Multiples: effects of questionnaire design and follow-up protocol on response rates and representativeness. Twin Res 2004;7:149-61.
64 Scurrah KJ, Sheehan NA, Burton PR. Association and linkage for age at onset of a common oligogenic disease using genetic variance component models. Genet Epidemiol 2001;21(Suppl. 1):S680-85.
65 Mancia G, Sega R, Grassi G, Cesana G, Zanchetti A. Defining ambulatory and home blood pressure normality: further considerations based on data from the PAMELA study. J Hypertens 2001;19:995-99.
66 Gilmour AR et al. ASREML Manual. Orange, NSW, Australia: New South Wales Department of Agriculture, 2002.
67 Meyer K. DFREML Version 2.1: Programs to estimate variance components by restricted maximum likelihood using a derivative-free algorithm. Armidale, NSW, Australia: AGBU, University of New England, 1992.
68 SAS. Inc., SAS Institute. 99-2001. Cary, NC, USA, SAS Institute Inc.