IJE Advance Access originally published online on July 28, 2005
International Journal of Epidemiology 2005 34(6):1359-1368; doi:10.1093/ije/dyi148
Theory and Methods
The effect of measurement error in risk factors that change over time in cohort studies: do simple methods overcorrect for regression dilution?
1 Department of Epidemiology and Population Health, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London, UK.
2 MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge, UK.
* Corresponding author. Department of Epidemiology and Population Health, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK. E-mail: chris.frost{at}lshtm.ac.uk
| Abstract |
|---|
Background The attenuation of the relationship between disease and a risk factor subject to error through regression dilution is well recognized, and researchers often make attempts to adjust for its effects. However, the adjustment methods most often adopted in cohort studies make an implicit assumption that the relationship is driven exclusively by current error-free levels of the risk factor and not by past levels. Here we investigate the bias that is introduced if this assumption is invalid.
Methods We model disease risk at a particular time in terms of error-free levels of the risk factor at that time and in past periods, and summarize the life-course risk factor-disease relationship using crude current level, history adjusted current level and lifetime level associations. Using systolic blood pressure data from the Framingham Heart Study we show the impact of measurement error on these associations and investigate the biases that can occur with simple correction methods.
Results A simple ratio of ranges type correction factor overestimates the lifetime level association by 29% in the presence of a relatively modest dependency of current risk on past levels (levels 5 years ago half as predictive of current risk as current levels).
Conclusions Simple methods of correction for regression dilution bias can lead to substantial overcorrection if the risk factor-disease relationship is not short term.
Keywords Regression dilution bias, measurement error, cohort studies, ratio of ranges, Framingham study, blood pressure
Accepted 21 June 2005
Most risk factors vary over time within subjects and are also measured with error. Interest often centres on the relationship between risk of disease and the error-free levels of a risk factor over some period of time. If only single measurements of the risk factor are available and these are used in an epidemiological analysis, it is well known that the resulting effect sizes are biased estimates of the magnitude of the association between the error-free level of the risk factor and disease.1 Random variation that is unrelated to risk is often termed measurement error, and the bias that it introduces regression dilution,2 since in regression models with only a single risk factor, its effect is always to reduce the magnitude of the association. This latter term is, however, somewhat misleading because in a multiple regression analysis, involving more than one risk factor, measurement error may decrease, increase, or even reverse the direction of an association.3
| Simple non-life-course correction methods |
|---|
Replacing single measurements of risk factors with a mean of two or more measurements reduces the impact of measurement error but does not eliminate it. However, repeated measurements of the risk factor can be used to characterize the extent of measurement error and this has led to the development of methods that utilize repeated measurements to correct for the biases introduced through measurement error.14 Such methods have been used both in individual cohort studies and in meta-analyses.17 A number of these applications utilize repeated measures of the risk factor made a number of years apart (often in a subset of individuals or using data from another study) to calculate correction factors that are then used to inflate the regression coefficient relating to a risk factor measured at baseline.
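Why averaging helps but does not remove the bias can be seen from a short worked equation. This is a sketch under the classical measurement error model, which is assumed here for illustration (errors independent of the error-free level and of each other, with constant variance); the symbols $\sigma_b^2$ (between-subject variance of the error-free level), $\sigma_w^2$ (within-subject error variance), $k$ (number of measurements averaged) and $\lambda_k$ are introduced for this example and are not the paper's notation.

```latex
% Attenuation of a single-risk-factor regression slope when the mean of k
% replicate measurements is used in place of the error-free level
\hat{\beta}_{\text{observed}} \approx \lambda_k \, \beta_{\text{error-free}},
\qquad
\lambda_k = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2 / k}

% Example with \sigma_w^2 = \sigma_b^2 :
% \lambda_1 = 1/2, \quad \lambda_2 = 2/3, \quad \lambda_4 = 4/5
```

Under these assumptions $\lambda_k$ approaches 1 as $k$ grows but stays below 1 for any finite $k$, which is why a correction factor is still needed when means of a few measurements are used.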
One of the most commonly adopted techniques is MacMahon's ratio of ranges method.2 This divides the subjects into quintile groups on the basis of the baseline measurement of the risk factor and calculates the difference between the mean levels of the variable in the top and bottom quintile groups for both the baseline and a second measurement. The ratio of ranges method attributes the difference in risk between the top and bottom quintile groups to the difference in mean levels of the risk factor in these groups at the time of the second measurement. Hence the correction factor is the ratio of the difference in mean levels between the top and bottom quintile groups at the time of the first measurement to that at the time of the second measurement. For example Clarke et al.5 use pairs of measurements of blood pressure and cholesterol made 6, 16 and 26 years apart to claim that uncorrected associations of (coronary) disease risk with baseline measurements underestimate the strength of the real associations with usual levels of these risk factors during the first decade of exposure by about one-third, the second decade by about one-half, and the third decade by about two-thirds.
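The quintile calculation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the published implementation: the function names, the simulated data, and the choice of standard deviations are all assumptions, and the simulation deliberately keeps each subject's error-free level fixed between the two measurements.

```python
import random
import statistics


def ratio_of_ranges(baseline, repeat, n_groups=5):
    """Ratio-of-ranges correction factor from paired measurements.

    Subjects are grouped on the BASELINE measurement only; the factor is the
    top-minus-bottom difference in group means at baseline divided by the
    same difference at the repeat measurement.
    """
    pairs = sorted(zip(baseline, repeat), key=lambda p: p[0])
    size = len(pairs) // n_groups
    if size < 1:
        raise ValueError("need at least one subject per group")
    bottom, top = pairs[:size], pairs[-size:]
    range_baseline = (statistics.mean(p[0] for p in top)
                      - statistics.mean(p[0] for p in bottom))
    range_repeat = (statistics.mean(p[1] for p in top)
                    - statistics.mean(p[1] for p in bottom))
    return range_baseline / range_repeat


def simulate_pairs(n, sd_between, sd_within, seed=0):
    """Hypothetical data: a fixed error-free level plus independent error."""
    rng = random.Random(seed)
    true_levels = [rng.gauss(0.0, sd_between) for _ in range(n)]
    baseline = [t + rng.gauss(0.0, sd_within) for t in true_levels]
    repeat = [t + rng.gauss(0.0, sd_within) for t in true_levels]
    return baseline, repeat


# Illustrative values only (not Framingham estimates).
baseline, repeat = simulate_pairs(n=20000, sd_between=15.0, sd_within=10.0)
factor = ratio_of_ranges(baseline, repeat)
print(round(factor, 2))
```

With these assumed standard deviations the factor should be close to the theoretical value (15² + 10²) / 15² ≈ 1.44, up to sampling variation.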
Applications that utilize methods, such as the ratio of ranges, in this way effectively take associations seen with baseline levels of a risk factor and attribute them to levels at a later point in time, often one that is concurrent with the time at risk. Since these methods do not consider how risk factor levels at different times throughout follow-up might contribute differentially to later risk, we term them non-life-course correction methods. In this paper we will demonstrate that such methods are unbiased only if risk factor levels at the time of the baseline measurement have no independent effect on risk, once error-free levels at the time of the second measurement have been allowed for.
The potential dangers are illustrated by considering the following scenario. Suppose individuals in an industrial cohort were divided into quintile groups on the basis of asbestos exposure at the age of 40 years and that exposure was remeasured in survivors at 70 years of age, with mesothelioma mortality rates calculated in each of the groups over a further 10-year period of follow-up to 80 years of age. The difference in mean asbestos exposure between the top and bottom quintile groups at 70 years of age would be expected to be much smaller than that at 40 years of age and we could clearly not attribute the anticipated difference in mesothelioma mortality wholly to the difference in exposure at 70 years of age because of the independent effect of earlier exposure. Hence the ratio of ranges would overcorrect in this situation.
There are alternative ways of using measurements at baseline and a later follow-up time to generate correction factors [see ref. (8) for a detailed comparison of many of the commonly adopted univariate methods] that make the same implicit assumption about the time course of the relationship between the risk factor and the disease. For example, Rosner's regression calibration method9 is a variation of the ratio of ranges method that takes the correction factor to be the reciprocal of the slope of the regression line relating the second measurement to the first. The reciprocal of the product moment correlation coefficient also gives similar results to the ratio of ranges provided both measurements have the same standard deviation.
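The relationship between these two alternatives can be checked numerically. The sketch below is a minimal illustration with made-up data; the function name and the example values are assumptions, and it uses only ordinary least squares and the product moment correlation as described above.

```python
def calibration_factors(first, second):
    """Return (1 / slope, 1 / correlation) for paired measurements.

    The slope is from the simple linear regression of the second
    measurement on the first.
    """
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    sxx = sum((a - mean_1) ** 2 for a in first)
    syy = sum((b - mean_2) ** 2 for b in second)
    sxy = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    slope = sxy / sxx
    correlation = sxy / (sxx * syy) ** 0.5
    return 1.0 / slope, 1.0 / correlation


# Hypothetical paired measurements with identical standard deviations.
first = [1, 2, 3, 4, 5]
second = [2, 1, 4, 3, 5]
same_sd = calibration_factors(first, second)

# Same pairs, but the second measurement is twice as variable.
wider = calibration_factors(first, [2 * b for b in second])

print(same_sd)
print(wider)
```

When the two measurements have the same standard deviation, the two factors coincide (both 1.25 here). When the second measurement is more variable, the slope-based factor falls while the correlation-based factor is unchanged, so the two methods no longer agree.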
| The need for a life-course approach |
|---|
The limitations of epidemiological studies that relate risk of disease to risk factors measured at only one or two time points are increasingly recognized, with life-course studies that relate lifetime risk to exposure over a lifetime being advocated10 and carried out in some settings.11 Such methods can address the way in which relationships between risk factors and disease evolve over time. As an example, the risk of coronary heart disease in a 60-year-old man is likely to depend both on current and past error-free blood pressure, with the dependency on error-free blood pressure at 55 years of age different from that at 45 years or 35 years of age. Such scenarios require multiple regression models and it is known12 that adopting univariate corrections for regression dilution in a multiple regression setting can introduce serious errors.
Before we can investigate the impact of correction methods for measurement error we need to establish a framework for the relationship between error-free levels of a risk factor and disease over a lifetime. A full analysis of the relationship between disease risk and a risk factor would require adjustment for covariates, possibly simultaneously allowing for measurement error in these covariates as well. For the purposes of illustration we will ignore such covariates and confine ourselves to investigating associations between lifetime levels of a single risk factor and disease.
| A life-course framework |
|---|
Suppose that we divide a person's lifetime up into relatively short periods and consider levels of the risk factor and risk of death in each of these periods. The periods of time must be sufficiently short for variation in the levels of the risk factor within each period to be unrelated to risk, with the period-specific mean level carrying all relevant information pertaining to disease. We term this theoretical mean level (which can be interpreted as the observed mean of an infinitely large number of evenly spaced measurements taken over the period) the period-specific error-free level. The risk of death in a particular period is likely to depend jointly on error-free levels both in the concurrent and preceding periods. A common way to describe such a relationship is via a logistic regression model where the log odds of disease is linearly related to the error-free levels of the risk factor as follows:
$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0^* + \beta_1^* x_{i,0}^* + \beta_2^* x_{i,1}^* + \dots + \beta_N^* x_{i,N-1}^* \qquad (1)$$
where $\pi_i$ is the risk in the index period, $x_{i,0}^*$ is the centred error-free risk factor level in the index period, and $x_{i,t}^*$ is the corresponding level t periods previously. We term this a full risk factor history model. Statistical alternatives to the logistic regression model could also be used, for example a Cox proportional hazards model or a generalized linear model for a binary outcome with a log link. In developing our methodology we will describe log odds ratios arising from logistic regression, but subject to assumptions that commonly hold in practice (see Appendix), our results will apply equally to log hazard and log rate ratios.

Conceptually, in order to provide a full description of the association between a risk factor and risk of death, a lifetime should be divided into many short periods, with a separate full risk factor history model for each follow-up period. Each model would relate risk of disease in a particular follow-up period to error-free levels in that period and all earlier periods, conditional on survival to the start of the period. It is possible, but not essential, to assume that the regression coefficients do not differ between follow-up periods. In practice, data limitations mean that such models are only defined for particular age ranges (e.g. 30-79 years in 5-year age bands) or in periods of time relative to the start of a study (first decade post-baseline, second decade post-baseline, etc.). For simplicity, we will consider a single follow-up period, but our arguments apply equally to a set of models for different follow-up periods.
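A small numerical sketch may help fix ideas about how such a model turns a risk factor history into a risk. The code below is illustrative only: the intercept, the three coefficients, and the 10-unit contrasts are hypothetical values chosen for the example, not estimates from any study.

```python
import math


def risk_in_index_period(intercept, coefficients, levels):
    """Risk from a logistic full risk factor history model.

    coefficients[0] and levels[0] refer to the index period; position t
    refers to the period t periods earlier. Levels are centred.
    """
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, levels))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    return p / (1.0 - p)


# Hypothetical parameter values (assumptions for illustration).
intercept = math.log(0.02 / 0.98)       # 2% risk at mean levels
coefficients = [0.020, 0.010, 0.005]    # current, 1 and 2 periods earlier

at_mean = risk_in_index_period(intercept, coefficients, [0.0, 0.0, 0.0])
raised_now = risk_in_index_period(intercept, coefficients, [10.0, 0.0, 0.0])
raised_always = risk_in_index_period(intercept, coefficients, [10.0, 10.0, 10.0])

# Odds ratios for a 10-unit difference now only, and in every period.
print(odds(raised_now) / odds(at_mean))
print(odds(raised_always) / odds(at_mean))
```

The first odds ratio depends only on the coefficient for the index period, whereas the second depends on the sum of all the coefficients, which is the distinction the summary measures later in the paper are built on.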
The full risk factor history model (1) can be rewritten to relate disease risk to a weighted average of the period-specific error-free risk factor levels, where the weight given to each period is proportional to the period-specific regression coefficient.
$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0^* + \beta_L^* \sum_{t=0}^{N-1} w_{t+1}\, x_{i,t}^* \qquad (2)$$
where $\beta_L^* = \sum_{t=1}^{N} \beta_t^*$ and $w_t = \beta_t^* / \beta_L^*$. In the asbestos and mesothelioma example described above, much of the weight would come from measurements made decades earlier because of the long latent period of the disease. For relationships between coronary heart disease and risk factors such as blood pressure and cholesterol, it is plausible that weights from recent periods are larger. However, there is evidence to suggest that weights for periods further back in time will not be negligible. For example, an analysis of 18-year follow-up in the Whitehall study13 provides evidence that the magnitude of the relationship between cholesterol and coronary heart disease increases with increasing interval between measurement and period at risk, suggesting that the relationship is not entirely short term in nature.
| The weight-halving interval |
|---|
In this paper, we will focus primarily on situations where the magnitude of the independent relationship between risk of disease and the period-specific level of the error-free risk factor declines according to the length of time between the period of measurement and the period at risk. We introduce a weight-halving interval to characterize this. For example, if the independent log odds ratio associated with a unit increase in an error-free risk factor now is twice that associated with a unit increase 5 years ago, and four times that associated with such an increase 10 years ago and so on, we have a 5-year weight-halving interval. Of course, not all lifetime relationships can be exactly described in terms of a weight-halving interval, but such a concept yields a family of scenarios in which to investigate the impact of measurement error. This includes the cases where risk depends only on the level in the current period (weight-halving interval = 0) and where risk depends on the lifetime mean level (weight-halving interval = ∞).
| Summary measures of association for a risk factor history relationship |
|---|
Interpretation of a full risk factor history model, such as the logistic regression model presented above, may be complex because of the number of parameters in the model. Hence it can be instructive to consider summary statistics, such as summary odds ratios, that describe key facets of the relationship. Three such summary statistics, each having a different interpretation, are of potential interest (Table 1).
- Crude current association: We use this term as an abbreviation for the crude association with the current error-free level of the risk factor. It can be expressed as the (period-specific) odds ratio associated with a 1 unit increase in the error-free risk factor in the same period. No allowance is made for confounding by risk factor levels in earlier periods, and for this reason, it is the least epidemiologically relevant of the three relationships.
- History-adjusted current association: This is the association with the current error-free level of the risk factor adjusted for all past error-free levels. It can be expressed as the odds ratio associated with a 1 unit increase in the risk factor conditional on levels in earlier periods. It can be considered to describe the short-term potentially modifiable risk in that it measures the difference in risk between two subjects who have different risk factor levels now but had identical levels in the past.
- Lifetime level association: We define this to be the odds ratio associated with a 1 unit increase in the risk factor in the concurrent and each preceding period. It expresses the difference in risk between two subjects whose level has differed by 1 unit throughout their lives. As such it can be described as measuring the long-term potentially modifiable risk, and is perhaps the most epidemiologically relevant of the three associations.
Parameters in the full risk factor history model equate to the history-adjusted current log odds ratio [$\beta_1^*$ in Equation (1)] and lifetime level log odds ratio [$\beta_L^*$ in Equation (2)]. The crude current log odds ratio depends additionally on the variances of the period-specific error-free levels and the correlations between them. Formulae are given in the Appendix.
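The three summary measures can be illustrated with a short calculation. The sketch below is an assumption-laden illustration, not the Appendix formulae: it takes the history-adjusted and lifetime log odds ratios directly from the model coefficients, and approximates the crude current log odds ratio by linear projection of past levels on the current level (reasonable when levels are jointly normal and effects are small). The function name and the numerical inputs are hypothetical.

```python
def summary_log_odds_ratios(coefficients, between_cov):
    """Return (crude_current, history_adjusted_current, lifetime_level).

    coefficients[0] refers to the current period, later entries to
    successively earlier periods. between_cov[j][k] is the covariance
    between error-free levels in periods j and k (same ordering).
    The crude value is a linear-projection approximation (assumption).
    """
    history_adjusted = coefficients[0]
    lifetime = sum(coefficients)
    crude = sum(b * between_cov[j][0]
                for j, b in enumerate(coefficients)) / between_cov[0][0]
    return crude, history_adjusted, lifetime


# Hypothetical two-period example: variance 100 in each period,
# covariance 80 between periods.
crude, adjusted, lifetime = summary_log_odds_ratios(
    [0.02, 0.01],
    [[100.0, 80.0],
     [80.0, 100.0]],
)
print(crude, adjusted, lifetime)
```

In this example the crude current value (about 0.028) lies between the history-adjusted current value (0.02) and the lifetime level value (0.03), because the current level partly stands in for the correlated earlier level.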
| Measurement error and methods to correct for its effects |
|---|
The effects of measurement error, and methods to correct for it, in the multiple logistic regression model have been described by Rosner et al.4 They are dependent on the between- and within-subject variance-covariance matrices for the covariates in the model. In the full risk factor history model these covariates correspond to the period-specific measurements of the risk factor. Hence, if repeated measurements of the risk factor in each period are available then corrections can be made.
In many applications much more limited data are available, prompting the type of non-life-course analysis carried out by Clarke et al.5 They propose that a naïve effect, such as a log odds ratio, estimated by relating baseline measures of a risk factor to risk in a later period, be inflated using a ratio of ranges correction factor derived from pairs of measurements, one in the baseline and one in the follow-up periods. The key question is whether or not such inflated associations estimate a meaningful parameter, such as one of the summary log odds ratios presented above. In the Appendix we give formulae that relate such non-life-course corrected log odds ratios to the parameters of the full risk factor history model, the variances of the period-specific error-free risk factor levels, the correlations between them, and the period-specific within-subject variances. In general, the non-life-course corrected log odds ratios are not equivalent to either the lifetime level, crude or history-adjusted current log odds ratios. In particular, both the ratio of ranges and simple regression calibration overcorrect the lifetime level log odds ratio by a factor given approximately by
$$\frac{\sum_{i=1}^{N} w_i\,(\Sigma_b)_{iN}}{(\Sigma_b)_{1N}} \qquad (3)$$
where $(\Sigma_b)_{iN}$ is the covariance between error-free levels of the risk factor in period i and period N (defining period 1 to be the period at risk and period N the baseline period). If $(\Sigma_b)_{iN}$ is constant for all follow-up periods (i) then there is no overcorrection. However, since in many applications $(\Sigma_b)_{iN}$ is likely to decline as the interval between periods increases (i.e. as i decreases from N to 1) there will be overcorrection unless risk in the current period is exclusively dependent on error-free risk factor levels in the current period ($w_1 = 1$ and $w_2 = \dots = w_N = 0$, i.e. the weight-halving interval is zero). We term this situation exclusively current level prediction of risk.
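The behaviour described in this paragraph can be explored numerically. The sketch below assumes that the overcorrection factor is the weighted average of the covariances with the baseline period divided by the covariance between the period at risk and the baseline period; this form is an assumption chosen to be consistent with the two properties stated above (no overcorrection when the covariances are constant, and none under exclusively current level prediction). The function name and all numerical inputs are hypothetical.

```python
def overcorrection_factor(weights, cov_with_baseline):
    """Approximate overcorrection of the lifetime level log odds ratio.

    weights[0] and cov_with_baseline[0] refer to the period at risk and the
    last entries to the baseline period. cov_with_baseline[i] is the
    covariance between error-free levels in that period and at baseline.
    ASSUMED form: sum(w_i * c_i) / c_1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("life-course weights must sum to 1")
    weighted = sum(w * c for w, c in zip(weights, cov_with_baseline))
    return weighted / cov_with_baseline[0]


# Hypothetical covariances that decline with distance from baseline.
declining = [60.0, 80.0, 100.0]

current_only = overcorrection_factor([1.0, 0.0, 0.0], declining)
constant_cov = overcorrection_factor([4 / 7, 2 / 7, 1 / 7], [80.0, 80.0, 80.0])
halving = overcorrection_factor([4 / 7, 2 / 7, 1 / 7], declining)

print(current_only, constant_cov, halving)
```

With these made-up inputs the first two cases give a factor of 1 (no overcorrection), while weights that halve with each earlier period combined with declining covariances give a factor of about 1.19.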
Since $(\Sigma_b)_{iN}$ is proportional to the slope of the linear model relating observed measurements of the risk factor in the ith period to those in the Nth period [$\beta(x_i \mid x_N)$; see Equation (A7) in Appendix] the extent of the overcorrection can be expressed as
$$\frac{\sum_{i=1}^{N-1} w_i\,\beta(x_i \mid x_N) + w_N\,\beta(x_N' \mid x_N)}{\beta(x_1 \mid x_N)} \qquad (4)$$
where $\beta(x_N' \mid x_N)$ is the slope of the linear model relating a repeated measurement of the risk factor in the baseline period to an initial value in that period. Since the βs in Equation (4) can be estimated by simple regression models (or ratios of ranges), this permits investigation of the sensitivity of results obtained using non-life-course methods to the assumption of exclusively current level prediction of risk.
| Example: relationship between coronary heart disease and systolic blood pressure over 30 years |
|---|
To investigate the extent to which non-life-course methods can introduce bias in cohort studies we consider the relationship between risk of death from coronary heart disease in men in the period 25-30 years following a baseline measurement and systolic blood pressure in that and earlier periods. For convenience of reporting we assume that log odds ratios are used to relate blood pressure to disease risk, noting, however, that the results apply equally to log rate and log hazard ratios estimated from appropriate models. We consider a range of possible weight-halving intervals for the full risk factor history relationship. We use data from the Framingham Heart Study14,15 to estimate the variances of the period-specific error-free levels, the correlations between them and the period-specific within-subject variances. These estimates are then taken as plausible values (ignoring the random error inherent in their estimation) and used in conjunction with the range of weight-halving intervals to derive the relationships between the various odds ratios of interest (using the algebraic results in the Appendix).
The Framingham Heart Study is a US cohort study, established in the late 1940s in a population of 5209 individuals aged between 30 and 60 years at baseline. Participants were invited to attend two-yearly clinic visits, and systolic blood pressure was one of many variables measured. One thousand five hundred and eight male subjects were known to be alive, and hence at risk, at the start of the period 25-30 years after the baseline measurement. We divide follow-up into 5-year periods and assume, for illustrative purposes, that the mortality risk in the sixth period depends only on the error-free systolic blood pressure in that period and each of five preceding periods.
Three different scenarios are illustrated in Figure 1. In the first scenario (exclusively current level prediction) the current risk of death depends only on the current error-free level (shown as a rectangle). The independent effect of the error-free blood pressure level in each earlier 5-year period is zero. The second scenario illustrates a weight-halving interval of 5 years, with the dependency on the level 5 years previously (weight = 25.4%) exactly half that on the current level (50.8%) and twice that on the level 10 years previously (12.7%). The figure also shows an exclusively historical level prediction scenario where the current risk depends only upon the level in the period 25 years previously. Such a scenario is unlikely to be true, but we include it here for comparative purposes.
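The weights quoted for the 5-year weight-halving scenario can be reproduced with a short calculation. This is a minimal sketch assuming six 5-year periods, as in the example, and weights that decay geometrically with time before the period at risk and are normalised to sum to one; the function name and argument names are illustrative and not taken from the paper.

```python
def halving_weights(n_periods, period_length, halving_interval):
    """Life-course weights for a given weight-halving interval.

    Entry 0 is the period at risk; entry t is the period t periods earlier.
    A halving interval of 0 puts all weight on the current period, and
    float("inf") gives equal weights (lifetime mean level).
    """
    if halving_interval == 0:
        return [1.0] + [0.0] * (n_periods - 1)
    raw = [0.5 ** (t * period_length / halving_interval)
           for t in range(n_periods)]
    total = sum(raw)
    return [r / total for r in raw]


weights = halving_weights(n_periods=6, period_length=5, halving_interval=5)
print([round(100 * w, 1) for w in weights])

equal = halving_weights(6, 5, float("inf"))
current_only = halving_weights(6, 5, 0)
```

The first three weights come out at about 50.8%, 25.4% and 12.7%, matching the values given for the second scenario.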
A random effect analysis of variance model, fitted using the statistical package MLwiN (Centre for Multilevel Modelling, Institute of Education, University of London), was used to estimate the mean systolic blood pressure and between- and within- subject variances in each period together with the between-period correlations. In total, 19 734 measurements of systolic blood pressure were available, with the random effect approach giving unbiased estimates provided that data are missing at random.16
| Results |
|---|
Table 2 shows the means and standard deviations of the period-specific error-free blood pressures, the correlations between them, and the period-specific within-subject standard deviations. A number of trends are apparent. Within-period within-subject variability generally increases with time (although no such clear-cut trend is apparent for within-period between-subject variability) and correlations between error-free levels decline with increasing time interval between measurements.
Table 3 shows the relationships between the three summary log odds ratios for different weight-halving intervals. The three summary odds ratios are identical only for exclusively current level prediction of risk. In all the other scenarios the history-adjusted current odds ratio is less than the lifetime level odds ratio. The crude current odds ratio is greater than the history-adjusted current odds ratio through confounding with earlier blood pressure levels. It is less than the lifetime level odds ratio because the current level acts as an imperfect surrogate for earlier levels.
Table 4 relates the naïve log odds ratio to the summary log odds ratios under each of the weight-halving scenarios. It is always less than the lifetime level log odds ratio; however, the extent to which it underestimates this varies with the weight-halving interval. For exclusively current level prediction the historical log odds ratio is only 36% of the lifetime level log odds ratio. At the other end of the spectrum, for exclusively historical level prediction the figure is 63%.
Table 4 also shows the effect of correcting the naïve log odds ratio using the ratio of ranges, simple regression calibration and correlation coefficient methods. Both the ratio of ranges and regression calibration methods give rise to the lifetime level log odds ratio only under exclusively current level prediction. Under all other scenarios these methods lead to overcorrection with the extent of the overcorrection increasing with increasing weight-halving interval. For exclusively historical level prediction they overcorrect by 74%. Even with a relatively short weight-halving interval of 5 years, results are overcorrected by 29%. Under all scenarios other than exclusively current level prediction, the extent of the overestimation is even greater if the aim is to obtain crude current or history-adjusted current odds ratios.
The table also shows that correction using the correlation coefficient consistently leads to overestimation to an even greater extent than with the ratio of ranges or regression calibration. The overestimation occurs even under the assumption of exclusively current level prediction. This is a consequence of the increase in the variability of systolic blood pressure over time in these data. For a weight-halving interval of 5 years the lifetime level, crude current, and history-adjusted current log odds ratios are overestimated by 53, 63, and 201%, respectively.
The effect of age
In common with other published analyses, the above analysis is rather simplistic because it takes no account of the different ages of the subjects at baseline. Table 5 illustrates the effect of separately calculating the variance-covariance matrices in men aged 30-39 years (N = 695), 40-49 years (N = 528), and 50-59 years (N = 266) at baseline. For brevity, results are expressed relative to the lifetime level log odds ratios only. The table shows that the impact of measurement error depends not only on the weight-halving interval, but also on the age of the subjects at baseline. For exclusively current level prediction and short weight-halving intervals the impact of measurement error increases quite dramatically with age. For longer weight-halving intervals the age-specific differences are less marked. As a consequence of this, using the ratio of ranges to make correction leads to the most substantial overcorrection in those aged 50-59 years at baseline.
| Discussion |
|---|
In long-term cohort studies where risk factors change over time, the way in which such factors relate to risk of disease is likely to be complex. In our view, a life-course approach to studies with follow-up of more than a few years is essential in order to obtain a full understanding of the relationship. This can be achieved by dividing follow-up into a number of periods, either by age or by time relative to a baseline measurement, and using a set of appropriate regression models to relate risk in each period to error-free levels of the risk factor in that and earlier periods. If the set of regression models are homogeneous it is helpful to combine them into a general model relating risk in any period to past and present error-free levels. However, we do not address this issue in this paper, instead focussing on a single model for risk in a particular period. If the risk of disease in such a period is dependent both upon current and past error-free levels of a risk factor then a multiple regression full risk factor history model is needed to provide a complete description of the relationship. In this paper we also propose three distinct summary statistics that describe different facets of the relationship between risk in this period and risk factor levels. We suggest that the epidemiologically most relevant of these is the lifetime level log odds ratio (or alternatively rate or hazard ratio) which relates a consistent change in the risk factor over the whole of past history to the current risk of disease. Our view contrasts with other approaches which refer to real associations with usual levels (of risk factors in decades post baseline)5 and associations with usual blood pressure at the start of a decade6 where the aim appears to be to estimate crude associations with error-free risk factor levels at a particular point in time.
Owing to the complexity in the risk factor-exposure relationship, correction for measurement error in a risk factor that evolves over time must ideally be life-course in nature in order to avoid substantial bias. We have shown that commonly adopted non-life-course methods of correction, such as the ratio of ranges method, only provide unbiased correction under an assumption that current risk is exclusively dependent on the current error-free level of the risk factor. If current risk of disease is also dependent on past error-free levels of the risk factor, the non-life-course methods will usually lead to overcorrection of the lifetime level association, with the extent of the overcorrection dependent on the rate at which the importance of past measurements declines. Such overcorrection applies to an even greater extent if the aim is to estimate the crude current association. For systolic blood pressure, still greater overcorrection can occur if a correlation coefficient is used to calculate correction factors. Furthermore, correction methods need to take account of age at risk in order to provide reliable effect estimates.
Our results demonstrate that a number of claimed corrected associations with usual levels of blood pressure will, in fact, be overcorrected if the disease-exposure relationship is long term in nature. For example, Clarke et al.5 claim that uncorrected associations of (coronary) disease risk with baseline measurements of blood pressure underestimate the strength of the real associations with usual levels of these risk factors during the third decade of exposure by about two-thirds, but this claim relies on the assumption that risk in the third decade of exposure is dependent only on error-free blood pressure measurements in that decade. In the Prospective Studies Collaboration of one million patients, non-life-course correction methods are used to relate risk in a particular decade to usual blood pressure at the start of that decade. The associations seen in this study will be overcorrected if risk in each decade is, at least in part, dependent on error-free levels prior to the start of that decade.
Limitations on the amount of data available in many cohort studies may mean that a full life-course model cannot be fitted. However, we have shown [Equation (3)] that the extent to which non-life-course ratio of ranges correction overestimates the lifetime level log odds-ratio is governed by the life-course weights and the covariance structure of the risk factor. Even if the extent of available data does not permit a full risk factor history analysis we believe that a sensitivity analysis, perhaps utilizing a plausible range of weight-halving intervals, should always accompany simple ratio of ranges type corrections.
Our intention in this paper has not been to provide a full account of the relationship between systolic blood pressure and coronary heart disease. The Framingham Heart Study is sufficiently large to allow estimation of parameters that describe the variability in blood pressure between- and within-subjects over 30 years of follow-up. However, it lacks statistical power to fully disentangle the contribution of blood pressure in each period to risk in the concurrent and subsequent periods. We, therefore, estimated the components of systolic blood pressure variability from the Framingham study and combined these with theoretical models for the dependency of risk on current and past blood pressure. A full analysis of the Framingham data would also need to take account of factors such as the impact of past cardiovascular events and the effects of anti-hypertensive drugs on coronary heart disease risk. Nonetheless, this paper demonstrates the dangers of using simple statistical techniques in a complex epidemiological setting. The challenge for epidemiologists and statisticians alike is to ensure that future longitudinal studies are designed, and analytical tools developed, in ways that allow reliable answers to be given to life-course epidemiological questions.
| Appendix |
|---|
(i) The full risk factor history model and summary associations
For ease of interpretation we will develop the risk factor history model within a logistic regression framework. However, the following results will also hold for other generalized linear models and for Cox proportional hazards models [see section (iv)].
For the ith subject denote death in the index period by Di; let
be the error-free risk factor level in the index period and
be the corresponding level t periods previously. The full risk factor history model for the conditional probability of death is then
![]() | (A1) |
![]() | (A2) |
The vector (w1, w2, ..., wN) is a vector of life-course weights whereas and are the lifetime level and history-adjusted current log odds ratios, respectively.
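The displayed equations are not reproduced in this copy of the text. As an aid to reading, one parameterisation consistent with the verbal description above can be sketched as follows. This is an assumption, not necessarily the authors' exact notation: $z_{it}$ denotes the error-free level of subject $i$ in period $t$ (with $t = 1$ the index period and $t = N$ the earliest), $\beta_t$ are period-specific coefficients, and $\beta_L$ is their sum.

```latex
\operatorname{logit}\Pr(D_i = 1 \mid z_{i1},\dots,z_{iN})
  = \alpha + \sum_{t=1}^{N} \beta_t z_{it}
  = \alpha + \beta_L \sum_{t=1}^{N} w_t z_{it},
\qquad
\beta_L = \sum_{t=1}^{N} \beta_t,\quad
w_t = \frac{\beta_t}{\beta_L},\quad
\sum_{t=1}^{N} w_t = 1 .
```

Under this sketch, $\beta_L$ would be the lifetime level log odds ratio (the effect of a one-unit higher level in every period) and $\beta_L w_1$ the history-adjusted current log odds ratio (the effect of a one-unit higher current level with all past levels held fixed).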
Define Σb to be the between-subject variance–covariance matrix of the error-free risk factor levels in the N periods and Σw to be a diagonal matrix whose elements are the period-specific within-subject variances and assume that
. Provided
is small and/or
is small for all j, using a result of Neuhaus and Jewell,17 taking expectations conditional on
gives
![]() | (A3) |
denotes the first element of the vector
.
It follows from Equation (A3) that the crude current log odds ratio is given by
. Note that our results describe the relationships between true model parameters. Observed regression coefficients would be subject additionally to random variability.
(ii) Naïve log odds ratio in the presence of measurement error
In this section we derive the log odds ratio relating a single measurement of the risk factor in the earliest period to mortality risk in the index period.
First consider the situation where there is a single observed measurement of the risk factor in each period and assume that
, where xi is the vector of observed risk factor measurements. A logistic regression model can be used to relate these to mortality risk
![]() | (A4) |
Rosner et al.4 show that the vector of regression coefficients in this model is given approximately by
![]() | (A5) |
Now consider the situation where only a single measurement in the earliest period (N) is available and related to mortality risk using the model logit
. Using the results from Neuhaus and Jewell17 in an analogous way to above, this naïve log odds ratio (βH) is given approximately by
![]() | (A6) |
(iii) Non-life-course corrections
Using either the ratio of ranges or regression calibration to make a non-life-course correction involves dividing an estimate of βH by an estimate of the slope of the relationship relating a measurement in the index period (1) to one in the earliest period (N). The slope of this relationship is given by
![]() | (A7) |
Using a correlation coefficient to make a correction involves dividing by an estimate of
![]() |
It follows from Equation (A6) that measures of association calculated from baseline measurements and corrected using the ratio of ranges,
, regression calibration
and the correlation coefficient
are given by
![]() | (A8) |
![]() | (A9) |
These parameters do not, in general, equate to either the lifetime level, crude current or history-adjusted current log odds ratios. In particular since
, expressing them as proportions of the lifetime level log odds ratio gives
![]() | (A10)* |
![]() | (A11) |
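The reasoning of this section can be illustrated with a short worked sketch for the slope-based (ratio of ranges or regression calibration) correction. It is a hedged reconstruction under a linear approximation, not a reproduction of the authors' equations. The notation is assumed for illustration: $\Sigma_b$ and $\Sigma_w$ are the between- and within-subject covariance matrices, $w$ is the vector of life-course weights summing to 1, $\beta_L$ is the lifetime level log odds ratio, $\beta_{\text{naive}}$ is the log odds ratio on a single baseline measurement, $\lambda$ is the slope relating an index-period measurement to a baseline one, and measurement errors are taken to be independent across periods.

```latex
\beta_{\text{naive}} \approx
  \beta_L \,\frac{[\Sigma_b w]_N}{[\Sigma_b]_{NN} + [\Sigma_w]_{NN}},
\qquad
\lambda = \frac{[\Sigma_b]_{1N}}{[\Sigma_b]_{NN} + [\Sigma_w]_{NN}},
\qquad
\frac{\beta_{\text{naive}} / \lambda}{\beta_L} \approx
  \frac{[\Sigma_b w]_N}{[\Sigma_b]_{1N}}
  = \sum_{t=1}^{N} w_t \,\frac{[\Sigma_b]_{tN}}{[\Sigma_b]_{1N}} .
```

Under these assumptions the ratio equals 1 when all the weight is on the index period ($w_1 = 1$). It exceeds 1 whenever earlier periods carry weight and their error-free levels covary more strongly with the baseline level than the index-period level does, that is, $[\Sigma_b]_{tN} > [\Sigma_b]_{1N}$ for $t > 1$.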
(iv) Alternative risk factor history models for rate and hazard ratios
If a log, rather than a logistic, link function was used in the formulation of the full risk factor history model [Equation (A1)] results analogous to those above can be derived for risk ratios. The results analogous to Equations (A3), (A5), and (A6) hold exactly (rather than approximately, subject to the conditions stated above), if a log link is used.
If a Cox proportional hazards model was used, subject to the same conditions as above and provided additionally that the cumulative event rate was low, it follows from Hughes18 that analogous results to those above will hold approximately for log hazard ratios.
| Acknowledgments |
|---|
We thank the National Heart, Lung, and Blood Institute for permission to use the Framingham Heart Study data. We also thank Professors Simon Thompson and David Leon for their valuable comments on this work and acknowledge Melissa Wright's contribution to the statistical programming while this work was in development. The Framingham Heart Study is conducted and supported by the NHLBI in collaboration with Framingham Heart Study investigators. This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study or NHLBI.
| References |
|---|
1 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error: the case of multiple covariates measured with error. Am J Epidemiol 1990;132:735–45.
2 MacMahon S, Peto R, Cutler J et al. Blood pressure, stroke, and coronary heart disease. Part 1, Prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 1990;335:765–74.
3 Phillips AN, Davey-Smith G. How independent are independent effects?: relative risk estimation when correlated exposures are measured imprecisely. J Clin Epidemiol 1991;44:1223–31.
4 Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. Am J Epidemiol 1992;136:1400–13.
5 Clarke R, Shipley M, Lewington S et al. Underestimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Am J Epidemiol 1999;150:341–53.
6 Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–13.
7 Dyer AR, Elliott P, Shipley M. Urinary electrolyte excretion in 24 hours and blood pressure in the INTERSALT Study: II, estimates of electrolyte-blood pressure associations corrected for regression dilution bias. Am J Epidemiol 1994;139:940–51.
8 Frost C, Thompson S. Correcting for regression dilution bias: comparison of methods for a single predictor variable. J R Stat Society Series A 2000;163:173–89.
9 Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 1989;8:1051–69.
10 Ben-Shlomo Y, Kuh D. A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives. Int J Epidemiol 2002;31:285–93.
11 Lamont D, Parker L, White M et al. Risk of cardiovascular disease measured by carotid intima-media thickness at age 49–51: lifecourse study. BMJ 2000;320:273–78.
12 Knuiman MW, Divitini ML, Buzas JS, Fitzgerald PEB. Adjustment for regression dilution in epidemiological analyses. Ann Epidemiol 1998;8:56–63.
13 Shipley MJ, Pocock SJ, Marmot MG. Does plasma cholesterol concentration predict mortality from coronary heart disease in elderly people? 18 year follow up in Whitehall study. BMJ 1991;303:89–92.
14 Dawber TR, Kannel WB, Lyell LP. An approach to longitudinal studies in a community: the Framingham Study. Ann NY Acad Sci 1963;107:539–56.
15 Anderson KM, Castelli WP, Levy D. Cholesterol and mortality. 30 years of follow-up from the Framingham study. JAMA 1987;257:2176–80.
16 Rubin DB. Inference and missing data. Biometrika 1976;63:581–92.
17 Neuhaus JM, Jewell NP. A geometric approach to assess bias due to omitted covariates in generalized linear models. Biometrika 1993;80:807–15.
18 Hughes MD. Regression dilution in the proportional hazards model. Biometrics 1993;49:1056–66.