IJE Advance Access originally published online on January 6, 2008
International Journal of Epidemiology 2008 37(2):368-378; doi:10.1093/ije/dym242
The evaluation of the diet/disease relation in the EPIC study: considerations for the calibration and the disease models
1 International Agency for Research on Cancer (IARC-WHO), Lyon, France.
2 Strangeways Research Laboratory, University of Cambridge, UK.
3 National Institute of Public Health and the Environment (RIVM), Bilthoven, Netherlands.
4 Cancer Epidemiology Unit, University of Oxford, Headington, Oxford, UK.
5 German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany.
6 INSERM (Institut National de la Santé et de la Recherche Médicale), ERI-20, Institut Gustave Roussy, Villejuif, France.
7 Catalan Institute of Oncology, Barcelona, Spain.
8 Department of Clinical Epidemiology, Aalborg Hospital, Aarhus University Hospital, Aalborg, Denmark.
9 Institute of Community Medicine, University of Tromsø, Norway.
10 Department of Hygiene and Epidemiology, University of Athens Medical School, Athens, Greece.
11 Cancer Registry, Azienda Ospedaliera Civile-M.P. Arezzo, Ragusa, Italy.
12 Department of Community Medicine, Lund University, Malmö University Hospital, Malmö, Sweden.
13 Department of Epidemiology and Public Health, Imperial College London, United Kingdom.
14 German Cancer Research Centre, Clinical Epidemiology, C020, AG Nutritional Epidemiology, Heidelberg, Germany.
*Corresponding author. IARC-WHO, 150 cours Albert-Thomas, 69372 Lyon cedex 08, France. E-mail: Ferrari{at}iarc.fr
| Abstract |
|---|
Background International multicentre studies on diet and cancer are relatively new in epidemiological research. They offer a series of challenging methodological issues for the evaluation of the association between dietary exposure and disease outcomes, which can both be quite heterogeneous across different geographical regions. This requires considerable work to standardize dietary measurements at the food and the nutrient levels.
Methods Within the European Prospective Investigation into Cancer and Nutrition (EPIC), a calibration study was set up to express individual dietary intakes according to the same reference scale. A linear regression calibration model was used to correct the association between diet and disease for measurement errors in dietary exposures. In the present work, we describe an approach for analysing the EPIC data, using as an example the evaluation of the association between fish intake and colorectal cancer incidence.
Results Sex- and country-specific attenuation factors ranged from 0.083 to 0.784, with values overall higher for men compared with women. Hazard ratio estimates of colorectal cancer for a 10 g/day increase in fish intake were 0.97 [95% confidence interval (CI): 0.95–0.99] and 0.93 (0.88–0.98), before and after calibration, respectively.
Conclusions In a multicentre study, the diet/disease association can be evaluated by exploiting the whole variability of intake over the entire study. Calibration may reduce between-centre heterogeneity in the diet–disease relationship caused by differential impact of measurement errors across cohorts.
Keywords Calibration, multicentre study, measurement error, diet, EPIC
Accepted 30 October 2007
| Introduction |
|---|
Over the past 30 years, much epidemiological research has focused on the relationship between diet-related factors and chronic diseases, particularly cancer and cardiovascular diseases. For this purpose, several study designs have been employed using different types of information, such as ecological correlations, migrant and time-trend studies and case-control and cohort studies.
Analytical epidemiology studies aimed at investigating the relationship between diet and chronic diseases face important challenges, very often related to diet ascertainment and confounding bias. Moreover, study populations often show limited variations in nutrient intake. In addition, diet is a complex mixture of foods and nutrients with many correlated elements, very difficult to measure with sufficient accuracy and precision, in order to distinguish the role of different foods and nutrients in epidemiological investigations.
Random and systematic measurement errors in individual dietary assessment may bias estimates of diet–disease associations.1 These effects can be reduced by increasing the overall heterogeneity of the dietary exposures studied, thus limiting the proportion of variability attributable to random error. This was the rationale behind the European Prospective Investigation into Cancer and Nutrition (EPIC),2 a multicentre cohort study on diet and cancer consisting of 23 regional centres in 10 Western European countries with widely varying dietary practices and cancer incidence rates.3
In the EPIC study, information on habitual individual dietary intake was assessed using different validated dietary questionnaires (Q) across participating countries to capture the geographical specificity of diet.3 For proper comparison of dietary measurements between cohorts, a calibration sub-study was established for the first time in a large, multicentre European study.4 This approach involved an additional dietary assessment common across study populations to re-express individual dietary intakes according to the same reference scale.5 For this, a single 24 h dietary recall (24-HDR; R) was collected, as the EPIC reference calibration method, from a stratified random sample of 36 900 subjects from the entire EPIC cohort, using a software program (EPIC-SOFT) specifically designed to standardize the dietary measurements across study populations.4,6 A linear regression calibration model was used to adjust for possible systematic over- or under-estimation in dietary intake measurements and to correct for attenuation bias in relative risk estimates.1,7,8
In this work, some issues related to the statistical analyses of these complex data are presented and discussed. In particular, we address how the between- and within-cohort components of the calibration model were used for measurement error correction at the individual and at the ecological level. Furthermore, we address aspects on the evaluation of the diet/disease relationship in the risk model, to exploit fully the overall variability of exposure in the EPIC study, and to evaluate heterogeneity of associations across cohorts. The evaluation of the relationship between diet and cancer incidence in the EPIC study is illustrated with an example of the assessment of the association between fish intake and colorectal cancer risk. The variability of fish intake within- and between-EPIC cohorts, before and after calibration, is evaluated.
| Material and Methods |
|---|
EPIC included 366 521 women and 153 457 men, mostly aged 35–70 years, recruited in 23 centres located in 10 European countries. After exclusion of 22 432 prevalent cases of any cancer other than non-melanoma skin cancer at the date of enrolment, 10 208 subjects in the top and bottom 1% of the distribution of the ratio of reported total energy intake to energy requirement were excluded from the analysis to reduce the impact of outlier values. In addition, subjects with missing questionnaire data or missing dates of diagnosis or follow-up, representing 2% of the participants, were excluded. A total of 478 040 subjects were included in the analysis. Details on the follow-up of study subjects included in the analyses are described by Riboli et al.3
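The exclusion counts above can be checked with simple arithmetic. The short sketch below uses only the figures quoted in the text; the number of subjects excluded for missing data is not stated explicitly and is derived here as the remainder, which is an assumption about how the totals fit together.

```python
# Cohort sizes and exclusions as reported in the text.
women, men = 366_521, 153_457
prevalent_cases = 22_432
energy_outliers = 10_208
analysed = 478_040

recruited = women + men
after_named_exclusions = recruited - prevalent_cases - energy_outliers

# Exclusions for missing questionnaire data or dates, implied by the totals
# (derived here, not a figure reported in the text).
missing_data = after_named_exclusions - analysed
missing_pct = round(100 * missing_data / recruited, 1)

print(recruited, missing_data, missing_pct)
```

The implied share of subjects excluded for missing data is close to the 2% quoted in the text.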
The calibration model
A calibration model was used to correct the relation between food groups and/or nutrients and cancer incidence for errors in dietary measurements. Given the multicentre nature of EPIC, the calibration model should reflect two levels of evidence, by making dietary exposures comparable across centres (between-group calibration) and by correcting the diet/disease relationship for measurement errors (within-group calibration).
For study subject $i = 1, \ldots, n_j$ in group $j = 1, \ldots, J$, be it centre or country, let $T_{ij} = (T_{ij,1}, \ldots, T_{ij,K_1})$ denote the $1 \times K_1$ vector of true values of the dietary variables; $Q_{ij} = (Q_{ij,1}, \ldots, Q_{ij,K_1})$ be the $1 \times K_1$ vector of the corresponding questionnaire measurements; $R_{ij} = (R_{ij,1}, \ldots, R_{ij,K_1})$ be the $1 \times K_1$ vector of the same error-prone variables measured with the reference instrument; and $Z_{ij} = (Z_{ij,1}, \ldots, Z_{ij,K_2})$ be the $1 \times K_2$ vector of variables assumed to be measured without errors.
For the $k_1$-th error-prone variable, $k_1 = 1, \ldots, K_1$, it is assumed that the reference measurements satisfy $R_{ij,k_1} = T_{ij,k_1} + \varepsilon_{R_{ij,k_1}}$, which requires that the errors $\varepsilon_{R_{ij,k_1}}$ are strictly random and that variation around an individual's unknown true intake is entirely due to within-person random variability or to random measurement errors in reporting the individual's diet, i.e. $\mathrm{Cov}(\varepsilon_{R_{ij,k_1}}, T_{ij,k_1} \mid \text{group} = j) = 0$. On the other hand, the corresponding questionnaire measurements $Q_{ij,k_1}$ are assumed to contain random and systematic within-person measurement errors.8–10
In line with the calibration methodology introduced by Rosner et al.1 the reference measurements were linearly regressed in a multivariate calibration model11 on the dietary questionnaire measurements to estimate the factors that quantify the impact of bias on observed (naïve) risk estimates. A very general form of the EPIC multivariate calibration model can be defined, for each group j, as
$$R_{ij} = \alpha_j + Q_{ij}\Lambda_j + Z_{ij}\Gamma_j + \varepsilon_{ij} \qquad (1)$$
The terms $\varepsilon_{ij}$ capture the residual random variability within each group and are assumed to be multivariate and i.i.d., $\varepsilon_{ij} \sim (0, \Sigma_{\varepsilon \mid \text{group}=j})$. The group-specific $K_2 \times K_1$ coefficient matrices $\Gamma_j$ model the effect of confounder variables ($Z_{ij}$), consistently with covariates included in the disease model. The columns of the group-specific matrices $\Lambda_j = (\lambda_1, \ldots, \lambda_{K_1})_j$ are $K_1 \times 1$ vectors of coefficients that relate the $k_1$-th reference measurements $R_{ij,k_1}$ to the vector of dietary questionnaire variables $Q_{ij}$, given the error-free confounding variables, $Z_{ij}$.11 It is worth noting that in situations where more than one variable is measured with errors ($K_1 > 1$), the bias of the $k_1$-th parameter that quantifies the diet/disease association has a multiplicative component, referred to as the attenuation factor (elements in the main diagonal of the matrix $\Lambda_j$), and an additive component, referred to as contamination factors (CFs)12 (off-diagonal elements, in the $k_1$-th column of $\Lambda_j$), due to residual confounding by the other ($K_1 - 1$) variables measured with error in the vector $Q_{ij}$ included in model (1).11 The way relative risk estimates are affected by attenuation and CFs is outlined in Appendix 1.
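To make the roles of the attenuation and contamination factors concrete, the following minimal sketch computes an observed log-relative risk as a linear combination of true log-relative risks, in the spirit of Appendix 1. All numerical values and the function name `observed_log_rr` are hypothetical and chosen only for illustration; they are not estimates from EPIC.

```python
def observed_log_rr(factors, true_betas):
    """Observed log-RR of the first exposure as a linear combination of true log-RRs.

    factors[0] is the attenuation factor of the exposure of interest;
    factors[1:] are the contamination factors of the other error-prone variables.
    """
    return sum(f * b for f, b in zip(factors, true_betas))

# Hypothetical values: attenuation factor 0.4, contamination factors 0.05 and -0.02.
factors = [0.4, 0.05, -0.02]
# Hypothetical true log-RRs for the exposure of interest and two other variables.
true_betas = [-0.07, 0.01, 0.03]

beta_obs = observed_log_rr(factors, true_betas)
attenuation_part = factors[0] * true_betas[0]       # multiplicative bias
contamination_part = beta_obs - attenuation_part    # additive bias

print(beta_obs, attenuation_part, contamination_part)
```

In this illustration almost all of the bias is multiplicative: the additive contribution of the contamination factors is small because both the CFs and the other true effects are small.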
The regression calibration allows, for each variable measured with errors, the imputation of predicted values, $E(T_{ij} \mid Q, Z)$, for the entire cohort, indicating average true intake levels for given questionnaire measurements, which are used as predictors in a disease model. In model (1), the terms $\alpha_j$ are $1 \times K_1$ group-specific intercept terms. In this way, the between-group component of the calibration model is taken into account, as $E(T_{ij} \mid Q, Z)$ values are centred on group-specific R means, thus correcting for group-specific biases, while the terms in $\Lambda_j$ capture the within-group component in the calibration model.
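The two components of the calibration can be illustrated with a deliberately simplified sketch: one error-prone variable, no confounders, and ordinary least squares fitted separately in each group. This is not the multivariate, weighted model used in the study; the data, group labels and function names below are hypothetical.

```python
from statistics import mean

def fit_calibration(q, r):
    """OLS of reference (R) on questionnaire (Q) values within one group.

    Returns (intercept, slope); the slope plays the role of the attenuation factor.
    """
    q_bar, r_bar = mean(q), mean(r)
    sxy = sum((x - q_bar) * (y - r_bar) for x, y in zip(q, r))
    sxx = sum((x - q_bar) ** 2 for x in q)
    slope = sxy / sxx
    return r_bar - slope * q_bar, slope

def predict(q, intercept, slope):
    """Predicted intake E(T|Q) for questionnaire values q."""
    return [intercept + slope * x for x in q]

# Hypothetical calibration sub-samples (g/day) for two groups.
calib = {
    "A": {"q": [10, 20, 30, 40, 50], "r": [18, 20, 26, 28, 33]},
    "B": {"q": [40, 60, 80, 100, 120], "r": [35, 38, 52, 55, 60]},
}
models = {g: fit_calibration(d["q"], d["r"]) for g, d in calib.items()}

# Apply each group's model to hypothetical questionnaire values of the full cohort.
cohort_q = {"A": [5, 25, 45], "B": [50, 90, 130]}
calibrated = {g: predict(cohort_q[g], *models[g]) for g in cohort_q}

print(models)
print(calibrated)
```

Because each group has its own intercept, predicted values within the calibration sample average to that group's R mean (between-group component), while the slope rescales individual deviations within the group (within-group component).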
Empirical evidence suggested the use of gender-specific calibration models to take into account gender heterogeneity in the accuracy of Q measurements. In the current study, a multivariate calibration model was used for fish, energy from fat sources and energy from non-fat sources. Calibration was not performed for physical activity values because no reference measurements are available in EPIC for this variable. The sampling distributions of days (Monday–Thursday vs Friday–Sunday) and seasons of 24-HDR administration were corrected using a set of weights in (1) to reproduce an even distribution of recalls across weekdays and seasons.
A general form of the calibration model is described in (1). In this study, country-specific calibration models, rather than centre-specific, were implemented. This allowed greater stability for parameter estimates in (1) to be reached, particularly in those centres with relatively small sample size. Country-specific models were adjusted by centre, to take into account the geographical specificity of dietary variables.
Analyses performed at the country-level showed that residuals were rather homogeneous over the range of intake of fish, energy from fat and energy from non-fat sources, respectively. In some countries, however, the distribution of residuals of fish intake in model (1) was skewed, with a notable frequency of zero values in R measurements. An approach to take into account a heteroskedastic structure of data in the calibration model (1) was employed,13 but results were very similar to the ordinary approach.
Variability of exposure
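In order to evaluate the variability of dietary variables, a mixed model was used to estimate the between- and the within-centre variance components of dietary exposures, which reflect variability at the aggregate and at the individual level,8 respectively. Variance components of dietary exposures were estimated using the following mixed model
$$Y_{ij} = \mu + c_j + e_{ij}, \qquad c_j \sim (0, \sigma^2_B), \quad e_{ij} \sim (0, \sigma^2_W) \qquad (2)$$
where $Y_{ij}$ is the dietary variable of subject $i$ in centre $j$, $\mu$ is the overall mean, $c_j$ is a random centre effect with between-centre variance $\sigma^2_B$, and $e_{ij}$ is the residual with within-centre variance $\sigma^2_W$.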
While the two components of energy intake approximated a normal distribution, the distribution of fish intake was rather skewed, with a varying amount of zero-values across centres, particularly for R measurements. Variance components are fairly robust to departures from normality in mixed models.14 However, although analyses performed after log-transformation of dietary variables or using positive Q measurements only showed that variance components and ICC estimates were similar to estimates obtained using untransformed variables, these quantities should be interpreted cautiously.
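The decomposition into between- and within-centre variability, and the resulting intraclass correlation coefficient (ICC, taken here as the share of total variance due to between-centre differences), can be sketched as follows. Note that this uses the simple ANOVA (method-of-moments) estimator for a balanced layout rather than the likelihood-based mixed model fitted in SAS; the data and the function name `variance_components` are hypothetical.

```python
from statistics import mean

def variance_components(groups):
    """ANOVA estimates for a balanced one-way random-effects layout.

    groups: list of equally sized lists of intakes, one list per centre.
    Returns (between-centre variance, within-centre variance, ICC).
    """
    n = len(groups[0])                      # subjects per centre (balanced design assumed)
    n_centres = len(groups)
    centre_means = [mean(g) for g in groups]
    grand_mean = mean(centre_means)
    # Mean square within centres.
    msw = sum((y - m) ** 2 for g, m in zip(groups, centre_means) for y in g)
    msw /= n_centres * (n - 1)
    # Mean square between centres.
    msb = n * sum((m - grand_mean) ** 2 for m in centre_means) / (n_centres - 1)
    var_within = msw
    var_between = max(0.0, (msb - msw) / n)
    icc = var_between / (var_between + var_within)
    return var_between, var_within, icc

# Hypothetical fish intakes (g/day) in three centres.
centres = [[10, 20, 30], [30, 40, 50], [50, 60, 70]]
var_b, var_w, icc = variance_components(centres)
print(var_b, var_w, icc)
```

Shrinking the within-centre spread while leaving centre means unchanged, as calibration does, raises the ICC in this decomposition.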
All analyses were run using the SAS Statistical Software.15 For all analyses, P-values <0.05 were considered statistically significant.
The disease model
The association between fish intake and colorectal cancer risk was evaluated using Cox proportional hazards regression16 with age as the primary time variable.17,18 The analyses were stratified by centre to control for differences in follow-up procedures, questionnaire design and other centre effects. These analyses included 2 279 075 person-years in 4.8 years of follow-up on average since 1992, and 1329 colorectal cancer cases with complete and satisfactory data from a total of 15 centres and 10 countries, as detailed by Norat et al.19
If the overall exposure distribution allows it, the use of pre-defined cut-offs is advisable, in order to better model the expected dose–response relationship over the entire range of exposure in EPIC. In this study, pre-defined cut-points for fish intake (0–20/20–40/40–80/80–120/+120 g/day) were used. In order to perform isocaloric comparisons and to partly control for the error in the estimated intake of fish, energy intake was included to reduce the effect of error correlation between the different dietary components.20,21 Estimated energy intake was divided into energy from fat and energy from non-fat sources, since it is mostly the fat components of diet that contribute to fish intake.19,22 In addition, to improve the performance of these isocaloric comparisons, weight was consistently included in the model.23
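Assigning subjects to the pre-defined, EPIC-wide categories can be sketched as below. The cut-points are those given in the text; the handling of values falling exactly on a boundary is an assumption, since the text does not specify it, and the names `CUT_POINTS`, `LABELS` and `fish_category` are illustrative.

```python
from bisect import bisect_right

CUT_POINTS = [20, 40, 80, 120]   # g/day, from the text
LABELS = ["0-20", "20-40", "40-80", "80-120", "120+"]

def fish_category(intake_g_per_day):
    """Return the intake category for a fish intake in g/day.

    A value exactly on a cut-point is assigned to the upper category
    (an assumption made for this sketch).
    """
    return LABELS[bisect_right(CUT_POINTS, intake_g_per_day)]

for intake in (0, 20, 79.9, 130):
    print(intake, fish_category(intake))
```

Using the same cut-points in every centre means that categories refer to the same absolute intake levels across the whole cohort, even where a centre contributes few subjects to some categories.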
Height, smoking (never, former and current smoker) and job-related physical activity (not employed, sedentary occupation, standing occupation, manual and heavy manual work activities) were also incorporated in the model. Furthermore, to explore potential differences by sex, gender-specific as well as overall disease models were used.
To test hazard ratios (HRs) for overall significance, the difference in log-likelihood between models with and without the set of indicator variables used to model the association between categories of exposure and disease risk was compared with a $\chi^2$-distribution, with degrees of freedom equal to the number of indicators minus one. Similarly, tests of trend were computed using continuous variables, according to a $\chi^2$-distribution with one degree of freedom. Tests of heterogeneity across countries of associations between each dietary factor and colorectal cancer risk were performed using indicator variables to model the effect of centres and interaction terms between dietary factors and country-specific indicator variables. P-values were derived after comparison of the log-likelihood difference between models with and without interaction terms with a $\chi^2$-distribution with seven and nine degrees of freedom in men and women, respectively. Similarly, tests for heterogeneity were performed using predicted values of intake after linear calibration.
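As an illustration of the test of trend, the sketch below computes a likelihood ratio statistic and its P-value for one degree of freedom. The log-likelihood values are hypothetical, and the closed form used for the survival function holds only for one degree of freedom; tests with more degrees of freedom would need a general chi-square routine.

```python
from math import erfc, sqrt

def lr_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and P-value against a chi-square with 1 df."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical partial log-likelihoods without and with a continuous fish term.
stat, p_value = lr_test_1df(-9876.54, -9873.04)
print(stat, p_value)
```

With these illustrative values the statistic is about 7, which is below the 0.05 threshold for statistical significance used in this study.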
After correction for measurement errors, the variability of parameter estimates in the disease model should take into account the uncertainty stemming from the regression calibration model. Rosner et al.1 developed an analytical formula to obtain corrected standard errors (SE) for the parameter estimates using a Taylor series expansion. In a multicentre design, the use of group-specific calibration models produces a set of group-specific matrices $\Lambda_j$. In the EPIC study, instead of using a meta-analytical approach, i.e. pooling centre-specific measurement error corrected log(HR) estimates24 and applying a multivariate Taylor series expansion to correct the SE of pooled estimates,11 a bootstrap sampling procedure was used25,26 to compute corrected SEs for the overall calibrated parameter estimates.
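Sampling in the calibration model is performed to estimate the coefficients in the calibration model and obtain predicted values to estimate the corrected association for dietary exposure, $\hat{\beta}$. This procedure yields a set of parameter estimates, $\hat{\beta}_m$, $m = 1, \ldots, M$, where $M$ is the number of bootstrap samplings. The variance of the corrected parameter can be approximated by
$$\mathrm{var}(\hat{\beta}) \approx \frac{1}{M}\sum_{m=1}^{M} \mathrm{var}(\hat{\beta}_m) + \frac{1}{M-1}\sum_{m=1}^{M} (\hat{\beta}_m - \bar{\beta})^2, \qquad \bar{\beta} = \frac{1}{M}\sum_{m=1}^{M}\hat{\beta}_m \qquad (3)$$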
Although measurement error correction is expected to inflate the variability of the parameter that quantifies the diet/disease relationship, in two cases in this study the statistical significance of the calibrated parameter was higher than the uncorrected HR estimate (e.g. fish intake in men). This might be attributable to the fact that in a multicentre study, if the regression parameters in the calibration model are estimated with great precision, calibration can lead to a reduction of the between-centre heterogeneity in the overall diet–disease relationship caused by differential impact of measurement errors, thus leading to a relatively more precise estimate of the overall relative risk,7,9 as illustrated in Appendix 2.
To assess heterogeneity of de-attenuated associations across countries, the uncertainty from the calibration model was not taken into account, as such uncertainty [reflected in the second part of expression (3)] was minimal.
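A minimal sketch of how bootstrap replicates might be combined is given below. It assumes that the corrected variance is the sum of two parts: the average model-based variance across replicates, and the between-replicate variance of the estimates, the latter reflecting uncertainty from the calibration model. The replicate values and the function name `corrected_variance` are hypothetical.

```python
from statistics import mean, variance

def corrected_variance(boot_estimates, boot_model_variances):
    """Average model-based variance plus between-replicate variance of the estimates."""
    within = mean(boot_model_variances)
    between = variance(boot_estimates)      # sample variance, divisor M - 1
    return within + between, within, between

# Hypothetical calibrated log(HR) estimates from M = 5 bootstrap replicates,
# with the model-based variance obtained in each replicate.
betas = [-0.070, -0.074, -0.072, -0.069, -0.075]
model_vars = [0.00075, 0.00076, 0.00074, 0.00075, 0.00075]

total, within, between = corrected_variance(betas, model_vars)
print(total, within, between, between / total)
```

In this illustration the between-replicate part contributes less than 1% of the total variance, the kind of situation in which ignoring calibration uncertainty changes the SE very little. In practice M would be much larger than five.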
| Results |
|---|
In Table 1, attenuation factors estimated in gender- and country-specific calibration models are presented for fish intake. Attenuation factors were low in women in the UK (0.083) and Norway (0.212), as well as in Sweden (0.236 in men and 0.191 in women). Similar values were observed in men and women, with the exception of the UK (0.518 vs 0.083 for men and women, respectively) and Italy (0.531 vs 0.297). In the same table, CFs, which are the coefficients in model (1) relating reference measurements of energy from fat and energy from non-fat sources to Q measurements of fish intake, were generally negative and varied substantially across countries. To interpret the CFs accurately, it should be noted that their absolute values reflect the difference in scale between energy and fish intakes.
The square root of variance components of the dietary variables, as well as ICC values, are shown in Table 2. For fish intake, the amount of variability explained by between-centre differences is very similar in the two genders, and lower in R compared with Q measurements. This is mostly due to the large within-person day-to-day variability of fish intake in the single R measurement available per subject. The between-centre variability of the predicted intake, E(T|Q, Z), is very close to variations in the R measurements. The within-centre variability of E(T|Q, Z) shrank when compared with Q measurements. As a consequence, ICC values are 2-fold higher for E(T|Q, Z) than for Q, for both men and women. More pronounced increases in ICC values are observed in E(T|Q, Z) compared with Q measurements for energy from non-fat sources compared with energy from fat.
In Table 3, the numbers of colorectal cancer cases and person-years, and the HR estimates for pre-defined categories of fish intake, are reported. Similar associations were observed in men and women.
In Table 4, original and calibrated HR estimates are reported in models, where fish and energy intakes were modelled on a continuous scale. These results suggest that fish intake is associated with a reduction in risk of colorectal cancer, consistently for men and women. Energy from fat and from non-fat sources showed, respectively, negative and positive relationships with colorectal cancer, although these associations were not statistically significant. Linear calibration resulted in a sizeable de-attenuation in the risk for fish intake, showing a 2-fold reduction in women and almost a 3-fold reduction in men for log(HR) estimates.
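The size of the de-attenuation can be checked on the log(HR) scale using the hazard ratios per 10 g/day reported in the abstract (0.97 before and 0.93 after calibration). The implied attenuation factor computed below rests on a univariate approximation, which is an assumption made for this sketch; the study itself used a multivariate calibration model.

```python
from math import log

# HRs per 10 g/day increase in fish intake, as reported in the abstract.
hr_naive, hr_calibrated = 0.97, 0.93

beta_naive = log(hr_naive)
beta_calibrated = log(hr_calibrated)

# Ratio of calibrated to uncalibrated log(HR).
deattenuation = beta_calibrated / beta_naive

# Under a univariate approximation (assumption), corrected = observed / attenuation,
# so the implied overall attenuation factor is the inverse ratio.
implied_attenuation = beta_naive / beta_calibrated

print(round(deattenuation, 2), round(implied_attenuation, 2))
```

The calibrated log(HR) is a little more than twice the uncalibrated one, and the implied attenuation factor lies within the range of sex- and country-specific attenuation factors reported in the abstract (0.083 to 0.784).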
The corrected SEs of calibrated parameters estimated with a bootstrap sampling approach did not show appreciable variation compared with uncorrected SEs. Linear calibration reduced the heterogeneity of associations across countries in men. Overall, after linear calibration, for men and women combined, the relation between fish and colorectal cancer was not statistically heterogeneous across the EPIC countries.
| Discussion |
|---|
Multicentre epidemiological studies on diet and chronic diseases are relatively new and they offer a series of challenging methodological issues on the standardization of dietary measurements at the food and nutrient levels. In the EPIC study, much effort has been invested to make dietary exposures comparable across centres. This issue was addressed with a calibration study and the preparation of a common standardized nutrient database to improve the comparability of dietary exposures across the 10 participating countries. Thus, the same rules were applied to compile the nutrient databases and aggregate diverse food groups across countries in order to use standardized R measurements as reference.27,28
Linear regression calibration is a method aimed at correcting a linear diet/disease relation when exposure is expressed on a continuous scale, in which relative risk estimates express the increase in risk for a quantitative increment of intake. The variance of predicted intake expresses the variance predicted by Q measurements, and does not reflect true variability. In EPIC, with only one replicate of R measurements, it is not possible to estimate the true variation of exposure, as previously reported.8 Although quantile-specific means of predicted (calibrated) exposure are correct estimates of true mean intake, categorical variables based on the distribution of predicted values should be used with caution because the predicted variation does not reflect true variation.
In this study, linear associations between fish intake and risk of colorectal cancer are presented, before and after calibration. In the case of dietary exposure with a sizeable frequency of non-consumers, possible departures from linearity between dietary exposure and disease should be consistently evaluated. For observed Q measurements, this might be achieved with the inclusion of an indicator variable (0 = non-consumer/1 = consumer) in the model, which allows the diet/disease relationship to be evaluated among consumers only. Similarly, the same indicator can be used in the linear regression calibration to evaluate possible departure from linearity; if the indicator is statistically significant, calibration models might be fitted using non-zero Q values only. In the current study, although the percentage of non-consumers of fish according to Q measurements ranged from 0.9% (Denmark) to 24.5% (UK), no evidence of departure from linearity was observed in the calibration or in the disease model.
The regression calibration approach has a theoretical justification based on a linear approximation, provided that the errors in R and Q measurements are uncorrelated.1 Correlation between errors in self-reported dietary measurements is likely to exist as a consequence of study subjects' tendency to consistently under- or over-estimate dietary intake.9,29–31 Attenuation factors are overestimated in the case of correlation between errors in R and Q measurements,22,32 at least when the univariate case is considered, i.e. the case of calibrating the relationship of one dietary exposure and disease risk.31 As a consequence, the linear calibration will result in a conservative correction.
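Why positively correlated errors lead to an overestimated attenuation factor, and hence a conservative correction, can be seen in a simplified additive-error sketch. It assumes Q = T + eQ and R = T + eR with both errors independent of T, which ignores the systematic component of questionnaire error; the variances and the function name `attenuation_factors` are hypothetical.

```python
def attenuation_factors(var_t, var_eq, cov_errors):
    """True and estimated attenuation factors under a simple additive-error model.

    Assumes Q = T + eQ and R = T + eR, with eQ and eR independent of T
    and Cov(eQ, eR) = cov_errors.
    """
    var_q = var_t + var_eq
    true_factor = var_t / var_q                       # slope of T on Q
    estimated_factor = (var_t + cov_errors) / var_q   # slope of R on Q
    return true_factor, estimated_factor

# Hypothetical variances: true intake 100, questionnaire error 300, error covariance 50.
true_f, est_f = attenuation_factors(100.0, 300.0, 50.0)

# Fraction of the true log-RR recovered when the observed (attenuated) log-RR
# is divided by the overestimated factor.
recovered_fraction = true_f / est_f
print(true_f, est_f, recovered_fraction)
```

With these values the estimated factor (0.375) exceeds the true one (0.25), so the corrected estimate recovers only about two-thirds of the true log-RR: the correction goes in the right direction but stops short.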
In the case of multiple exposures measured with errors, high levels of error correlations between questionnaire measurements can lead to bias of regression estimates in multivariable calibration models. Although it has been suggested that energy adjustment, either by energy density or by direct adjustment, can reduce the effect of correlation between random measurement errors,21,33,34 it should be noted that these considerations apply in situations where the correlation between errors in Q and R measurements is not too sizeable, and when truly unbiased reference measurements are available. Both conditions are likely to challenge a correct evaluation of many diet–disease associations in nutritional epidemiology, including the one investigated in this work. In the present analysis, energy adjustment was performed in the multivariate calibration model. Although the observed correlation coefficients between fish and energy from fat and energy from non-fat sources do not reflect the true association between these variables, because of measurement errors in dietary exposure, as well as of correlation between errors, our findings suggest that error correlations between fish and energy intake are not likely to be very high.
In general, energy adjustment seems to be particularly effective in the case of macronutrient intakes,8,20,21 for which higher correlations between errors are expected. Even after energy adjustment, error correlation in Q and R measurements cannot be ruled out, and findings on the aetiological role of dietary exposure factors on disease outcome need an extremely cautious interpretation.
Linear calibration assumes that R measurements provide an unbiased estimate of true exposure. The paper by Ferrari et al.31 showed under-reporting in EPIC R measurements, thus suggesting the presence of some systematic errors. Under the assumption that under-reporting is either randomly distributed across subjects within groups or is of the same magnitude across groups, R measurements could still provide a reference scale without absolute validity but common across subjects within group (at the individual level), and for between-group comparison (at the aggregate level).35 However, results from the OPEN validation study31 showed that 24-HDR measurements contain, at least for total energy and protein intakes, intake-related bias, which occurs when the tendency to under- or over-estimate is different for subjects with a high intake than for subjects with a low intake. Intake-related bias is not randomly distributed and is likely to be related to study subjects' characteristics, thus providing evidence on the limitations of the use of 24-HDRs as reference measurements.
The EPIC calibration model was adjusted for weight and age, and a set of weights was used to account for the day of the week and the season of the year in which the 24-HDR was administered. Although this adjustment was performed consistently in order to partially control the systematic component of measurement errors,36 it cannot be excluded that systematic error is still present and biases individual R measurements.
In the EPIC study, fish intake showed a significant protective effect for colorectal cancer.19 The effect of dietary exposure on cancer incidence was assessed by modelling the effect of these factors as continuous variables, or by using categories of exposure, to capture dose–response effects other than linear. In a multicentre study, the use of EPIC-wide categories of exposure ensures that the comparisons are made over the variability of intake of the entire EPIC cohort, although adjusting for centre would limit the evaluation of the diet/disease relationship within each group. In some cases, the distribution of intakes could vary widely by centre, so the within-centre distribution of cases by category of intake can be very heterogeneous across centres. This approach is therefore valid on condition that each category of consumption is represented by cases from each centre so that, within each centre, comparisons are made across the full range of intakes. In addition, it is assumed that the diet/disease association is the same across the entire exposure range. The power of the comparison will be weaker than if each centre had the same distribution, but the comparisons and reported HRs refer to the full range of variation in intake.
Regression calibration concentrates the values of calibrated exposure within each centre around the R mean for that centre as a consequence of the shrinkage in the distribution of predicted values.8 In this study, the analysis of variance components of predicted intake values (i.e. after calibration) showed strong shrinkage towards centre-specific mean values, where estimates of ICCs of E(T|Q,Z) values were 2-fold higher than for Q measurements.
In the current study, no measurement error correction was performed for physical activity and smoking status, for which no reference measurements are available. Although the observed correlation between these factors and the dietary variables investigated in this study was not sizeable, the evaluation of their confounding role in the disease model is likely to be attenuated by measurement error. Therefore, the presence of bias in corrected HR estimates arising from the inclusion of variables measured with error in the disease model cannot be entirely ruled out.
Measurement errors bias the relative risk estimate in the disease model, and cause power loss. Within single study centres, calibration may correct for bias but does not recuperate power losses due to random errors. In addition, in multivariate analyses with more than one variable measured with error, incorrect tests and confidence intervals (CIs) can be produced in the disease model. In this study, a relatively more precise estimate of the overall HR was observed,37 compared with a naïve approach that does not take into account measurement errors in dietary variables. However, the possibility cannot be excluded that, in the context of measurement error correction with multiple variables measured with errors, the nominal values of statistical tests in the disease model are not fully controlled for.
A bootstrap sampling procedure was set up to take into account the uncertainty of estimating the attenuation factors in the calibration model. In contrast to a meta-analytical approach using centre-specific corrected SEs, it does not rely on any specific assumptions to derive asymptotic estimates of corrected SEs. This could be an advantage, particularly in the case of food groups with a sizeable frequency of zero consumers in the 24-HDR measurements. In this study, bootstrapping did not appreciably inflate the variability of the corrected risk parameters, possibly because of the large sample size of the EPIC calibration study, thus suggesting that the observed SEs provide sufficiently accurate estimates.
Here, a strategy for statistical analyses within the EPIC study has been presented and discussed. This strategy entails group-specific calibration models, to take into account the regional specificity and accuracy of the dietary questionnaires, and an overall disease model stratified (in the Cox model sense) by centre to evaluate the diet/disease association before and after measurement error correction. A valuable alternative would be to fit group-specific disease models and to pool the resulting estimates following a meta-analytical approach, much as is done within the Pooling Project.24 Such a strategy was considered in this study and gave results very similar to those presented here.
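The alternative strategy of pooling group-specific estimates can be sketched with a fixed-effect, inverse-variance combination. The choice of a fixed-effect estimator is an assumption made here for brevity, the function name `pool_fixed_effect` is hypothetical, and the input values are invented for illustration.

```python
import math


def pool_fixed_effect(betas, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical group-specific log-HRs and SEs (not EPIC results).
betas = [0.10, 0.30, 0.20]
ses = [0.10, 0.10, 0.05]

pooled, pooled_se = pool_fixed_effect(betas, ses)
print(f"pooled log-HR={pooled:.3f} SE={pooled_se:.4f} HR={math.exp(pooled):.3f}")
```

Groups with smaller SEs receive larger weights, so the pooled estimate is pulled towards the most precisely estimated group-specific associations.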
| Appendix 1 |
|---|
|
|
|---|
For each group j, the uncorrected log-relative risk estimate of the first variable measured with errors, β1,obs, can be expressed as

$$\beta_{1,\mathrm{obs}} = \sum_{k_1=1}^{K_1} \lambda_{1,k_1}(j)\,\beta_{k_1,\mathrm{true}} = \lambda_{1,1}(j)\,\beta_{1,\mathrm{true}} + \sum_{k_1 \neq 1} \lambda_{1,k_1}(j)\,\beta_{k_1,\mathrm{true}},$$

where λ1,k1(j) indicate the elements of the vector λ1 in matrix Λj, and βk1,true express the true risk estimates for the variables measured with errors. The parameter β1,obs is the result of a linear combination, which reflects a multiplicative bias, captured by the attenuation factor, λ1,1(j), as well as an additive component, associated with the CFs λ1,k1(j) (for k1 ≠ 1), and the true risk estimates for the other K1 – 1 variables βk1,true (k1 ≠ 1).
| Appendix 2 |
|---|
|
|
|---|
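In the case of a multicentre study (j = 1, ... , J centres), it is assumed, for simplicity, that the dietary variable under investigation has the same variability (var(Q)), and that the number of cancer cases (D) is the same, in all centres [i.e. var(Q)j = var(Q) and Dj = D, for each j]. Consider the case of one variable measured with error, where the centre-specific estimates that quantify the diet–disease associations are the parameters βj (= log(RR)), whose variability can be approximated by38

$$\mathrm{var}(\beta_j) \approx \frac{1}{D\,\mathrm{var}(Q)}.$$

Following a meta-analytical approach, the pooled estimate can be derived as

$$\beta_{\mathrm{pooled}} = \frac{\sum_{j=1}^{J} \beta_j / \mathrm{var}(\beta_j)}{\sum_{j=1}^{J} 1 / \mathrm{var}(\beta_j)} = \frac{1}{J} \sum_{j=1}^{J} \beta_j,$$

$$\mathrm{var}(\beta_{\mathrm{pooled}}) = \frac{1}{\sum_{j=1}^{J} 1 / \mathrm{var}(\beta_j)} = \frac{1}{J\,D\,\mathrm{var}(Q)}.$$

Therefore, the ratio of the pooled parameter over its SE is equal to

$$\frac{\beta_{\mathrm{pooled}}}{\mathrm{SE}(\beta_{\mathrm{pooled}})} = \sqrt{\frac{D\,\mathrm{var}(Q)}{J}}\;\sum_{j=1}^{J} \beta_j. \qquad \text{(A1)}$$

The P-value associated with this ratio is computed according to a standard normal distribution.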
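It can be assumed that the centre-specific attenuation factors (λj) are proportional to the magnitude of the observed associations. This is equivalent to assuming that the heterogeneity of the βj estimates is entirely attributable to measurement errors with a high degree of specificity in the different centres, e.g. βj/λj = cj = c.
The de-attenuated pooled estimate can be obtained as

$$\beta^{*}_{\mathrm{pooled}} = \frac{\sum_{j=1}^{J} (\beta_j/\lambda_j) \,/\, \mathrm{var}(\beta_j/\lambda_j)}{\sum_{j=1}^{J} 1 / \mathrm{var}(\beta_j/\lambda_j)} = \frac{\sum_{j=1}^{J} \lambda_j \beta_j}{\sum_{j=1}^{J} \lambda_j^{2}}.$$

This last result stems from the assumption that estimates of λj coefficients are measured with great precision, so that var(βj/λj) ≈ var(βj)/λj². In this way, 1/var(βj/λj) ≈ λj² D var(Q). The variance of the de-attenuated pooled estimate can be obtained as

$$\mathrm{var}(\beta^{*}_{\mathrm{pooled}}) = \frac{1}{\sum_{j=1}^{J} 1 / \mathrm{var}(\beta_j/\lambda_j)} = \frac{1}{D\,\mathrm{var}(Q) \sum_{j=1}^{J} \lambda_j^{2}},$$

and the ratio of the de-attenuated pooled estimate over its SE is equal to

$$\frac{\beta^{*}_{\mathrm{pooled}}}{\mathrm{SE}(\beta^{*}_{\mathrm{pooled}})} = \sqrt{D\,\mathrm{var}(Q)}\;\frac{\sum_{j=1}^{J} \lambda_j \beta_j}{\sqrt{\sum_{j=1}^{J} \lambda_j^{2}}}. \qquad \text{(A2)}$$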
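It is shown that the de-attenuated ratio in (A2), under the hypotheses made, is greater than the ratio observed in (A1), thus indicating a gain in the relative precision of the overall estimate after regression calibration:

$$\sqrt{D\,\mathrm{var}(Q)}\;\frac{\sum_{j=1}^{J} \lambda_j \beta_j}{\sqrt{\sum_{j=1}^{J} \lambda_j^{2}}} > \sqrt{\frac{D\,\mathrm{var}(Q)}{J}}\;\sum_{j=1}^{J} \beta_j.$$

Recalling that βj = cλj, and hence that Σj λjβj = c Σj λj², it follows (for c > 0) that

$$\sqrt{\sum_{j=1}^{J} \lambda_j^{2}} > \frac{1}{\sqrt{J}} \sum_{j=1}^{J} \lambda_j. \qquad \text{(A3)}$$

After definition of the two quantities

$$\bar{\lambda} = \frac{1}{J} \sum_{j=1}^{J} \lambda_j \quad \text{and} \quad S_{\lambda} = \sum_{j=1}^{J} (\lambda_j - \bar{\lambda})^{2},$$

the two sums in (A3) can be written as

$$\sum_{j=1}^{J} \lambda_j^{2} = S_{\lambda} + J\bar{\lambda}^{2}, \qquad \sum_{j=1}^{J} \lambda_j = J\bar{\lambda}.$$

Expression (A3) is equivalent to

$$S_{\lambda} + J\bar{\lambda}^{2} > J\bar{\lambda}^{2}. \qquad \text{(A4)}$$

Expression (A4) always applies when the λj are not all equal, as a sum of squares plus the product of the number of centres times a square exceeds that product alone. For this reason, under the assumptions made, the associated P-value will be lower after measurement error correction, thus indicating that in cases where the between-centres heterogeneity in the diet–disease associations is largely attributable to differential measurement errors across centres, measurement error correction can lead to a relative gain in statistical power.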
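The argument of this appendix can be checked numerically. The sketch below assumes, as in the text, that var(βj) = 1/(D var(Q)) in every centre, that βj = cλj, and that the λj are known without error; the function names and the input values are hypothetical. Under these assumptions the comparison of the two ratios reduces to J Σλj² ≥ (Σλj)², which holds by the Cauchy–Schwarz inequality, with equality only when all λj are identical.

```python
import math


def pooled_z(betas, variances):
    """Inverse-variance pooled estimate divided by its SE."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return pooled * math.sqrt(sum(weights))


def z_before_after(lams, c, d_cases, var_q):
    """Pooled z-ratios before and after de-attenuation.

    Assumptions of this sketch: beta_j = c * lam_j in every centre,
    var(beta_j) = 1 / (D * var(Q)), and lam_j known without error.
    """
    var_beta = 1.0 / (d_cases * var_q)
    betas = [c * lam for lam in lams]
    z_naive = pooled_z(betas, [var_beta] * len(lams))
    z_corr = pooled_z([b / lam for b, lam in zip(betas, lams)],
                      [var_beta / lam ** 2 for lam in lams])
    return z_naive, z_corr


# Hypothetical inputs: five centres, common true log-RR c = 0.2.
z_naive, z_corr = z_before_after([0.2, 0.3, 0.4, 0.5, 0.6], 0.2, 100, 1.0)
z_same, z_same_corr = z_before_after([0.4] * 5, 0.2, 100, 1.0)
print(f"heterogeneous lambda: naive z={z_naive:.3f}, corrected z={z_corr:.3f}")
print(f"identical lambda    : naive z={z_same:.3f}, corrected z={z_same_corr:.3f}")
```

When the attenuation factors differ across centres, the corrected ratio exceeds the naive one; when they are identical, the two ratios coincide, so the gain comes entirely from the between-centre differences in measurement error.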
| Acknowledgements |
|---|
|
|
|---|
The EPIC study was funded by the Europe Against Cancer Programme of the European Commission (SANCO); Ligue contre le Cancer (France); Société 3M (France); Mutuelle Générale de l'Education Nationale; Institut National de la Santé et de la Recherche Médicale (INSERM); German Cancer Aid; German Cancer Research Center; German Federal Ministry of Education and Research; Danish Cancer Society; Health Research Fund (FIS) of the Spanish Ministry of Health; the participating regional governments and institutions of Spain; Cancer Research UK; Medical Research Council, UK; the Stroke Association, UK; British Heart Foundation; Department of Health, UK; Food Standards Agency, UK; the Wellcome Trust, UK; Greek Ministry of Health; Greek Ministry of Education; Italian Association for Research on Cancer; Italian National Research Council; Dutch Ministry of Public Health, Welfare and Sports; World Cancer Research Fund (WCRF); Swedish Cancer Society; Swedish Scientific Council; Regional Government of Skane, Sweden; Norwegian Cancer Society. We are very thankful to the reviewers for their enlightening comments on the developments of this work.
Conflict of interest: None declared.
| References |
|---|
|
|
|---|
1 Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med (1989) 8:1051–69. (Discussion 1071–73).
2 Riboli E. Nutrition and cancer: background and rationale of the European Prospective Investigation into Cancer and Nutrition (EPIC). Ann Oncol (1992) 3:783–91.
3 Riboli E, Hunt KJ, Slimani N, et al. EPIC: study populations and data collection. Public Health Nutr (2002) 5:1113–24.
4 Slimani N, Ferrari P, Ocké M, et al. Standardization of the 24-hour diet recall calibration method used in the European Prospective Investigation into Cancer and Nutrition (EPIC): general concepts and preliminary results. Eur J Clin Nutr (2000) 54:900–17.
5 Slimani N, Kaaks R, Ferrari P, et al. EPIC calibration study: rationale, design and population characteristics. Public Health Nutr (2002) 5:1125–45.
6 Slimani N, Deharveng G, Charrondiere RU, et al. Structure of the standardized computerized 24-hour diet recall interview used as reference method in the 22 centers participating in the EPIC project. Comput Methods Programs Biomed (1999) 58:251–66.
7 Kaaks R, Plummer M, Riboli E, et al. Adjustment for bias due to errors in exposure assessments in multicenter cohort studies on diet and cancer: a calibration approach. Am J Clin Nutr (1994) 59(Suppl 1):245S–50S.
8 Ferrari P, Kaaks R, Fahey MT, et al. Within- and between-cohort variation in measured macronutrient intakes, taking account of measurement errors, in the European Prospective Investigation into Cancer and Nutrition study. Am J Epidemiol (2004) 160:814–22.
9 Kaaks R, Riboli E, Estève J, van Kappel AL, van Staveren WA. Estimating the accuracy of dietary questionnaire assessments: validation in terms of structural equation models. Stat Med (1994) 13:127–42.
10 Kaaks R, Ferrari P, Ciampi A, Plummer M, Riboli E. Uses and limitations of statistical accounting for random error correlations, in the validation of dietary questionnaire assessments. Public Health Nutr (2002) 5:969–76.
11 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol (1990) 132:734–45.
12 Kipnis V, Freedman LS, Brown CC, et al. Effect of measurement error on energy-adjustment models in nutritional epidemiology. Am J Epidemiol (1997) 146:842–55.
13 Blough DK, Ramsey SD. Using generalized linear models to assess medical care costs. Health Serv Outcome Res Methodol (2000) 1:185–202.
14 Verbeke G, Lesaffre E. The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data. Comput Stat Data Anal (1997) 23:541–56.
15 SAS Institute. SAS Online Doc®, Version 8. (1999) Cary, NC: SAS Institute Inc.
16 Cox DR. Regression models and life tables (with discussion). J R Stat Soc Ser B (1972) 34:187–220.
17 Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol (1997) 145:72–80.
18 Thiebaut AC, Benichou J. Choice of time-scale in Cox's model analysis of epidemiologic cohort data: a simulation study. Stat Med (2004) 23:3803–20.
19 Norat T, Bingham S, Ferrari P, et al. Meat and fish consumption, and colorectal cancer risk: the European Prospective Investigation into Cancer and Nutrition (EPIC). J Natl Cancer Inst (2005) 97:906–16.
20 Willett WC, Howe GR, Kushi LH. Adjustment for total energy intake in epidemiologic studies. Am J Clin Nutr (1997) 65(Suppl. 4):1220S–28S.
21 Day NE, Wong MY, Bingham S, et al. Correlated measurement error–implications for nutritional epidemiology. Int J Epidemiol (2004) 33:1373–81.
22 Bingham SA, Luben R, Welch A, Wareham N, Khaw KT, Day N. Are imprecise methods obscuring a relation between fat and breast cancer? Lancet (2003) 362:212–14.
23 Jakes RW, Day NE, Luben R, et al. Adjusting for energy intake–what measure to use in nutritional epidemiological studies? Int J Epidemiol (2004) 33:1382–86.
24 Smith-Warner SA, Spiegelman D, Ritz J, et al. Methods for pooling results of epidemiologic studies: the Pooling Project of prospective studies of diet and cancer. Am J Epidemiol (2006) 163:1053–64.
25 Efron B, Tibshirani RJ. An Introduction to the Bootstrap. (1998) London: Chapman & Hall.
26 Rosner B, Gore R. Measurement error correction in nutritional epidemiology based on individual foods, with application to the relation of diet to breast cancer. Am J Epidemiol (2001) 154:827–35.
27 Deharveng G, Charrondiere UR, Slimani N, Southgate DA, Riboli E. Comparison of nutrients in the food composition tables available in the nine European countries participating in EPIC. European Prospective Investigation into Cancer and Nutrition. Eur J Clin Nutr (1999) 53:60–79.
28 Slimani N, Charrondière UR, van Staveren W, Riboli E. Standardization of food composition databases for the European Prospective Investigation into Cancer and Nutrition (EPIC): general theoretical concept. J Food Comp Anal (2000) 13:567–84.
29 Plummer M, Clayton D. Measurement error in dietary assessment: an investigation using covariance structure models. Part I. Stat Med (1993) 12:925–35.
30 Day NE, Ferrari P. Some methodological issues in nutritional epidemiology. In: Nutrition and Lifestyle: Opportunities for Cancer Prevention.—Riboli E, Lambert R, eds. (2002) Lyon: International Agency for Research on Cancer. 5–10. (IARC Sci Publ 156).
31 Kipnis V, Subar AF, Midthune D, et al. Structure of dietary measurement error: results of the OPEN biomarker study. Am J Epidemiol (2003) 158:14–21.
32 Kipnis V, Midthune D, Freedman L, et al. Bias in dietary-report instruments and its implications for nutritional epidemiology. Public Health Nutr (2002) 5:915–23.
33 Willett W. Commentary: dietary diaries versus food frequency questionnaires—a case of undigestible data. Int J Epidemiol (2001) 30:317–19.
34 Schatzkin A, Kipnis V, Carroll RJ, et al. A comparison of a food frequency questionnaire with a 24-hour recall for use in an epidemiological cohort study: results from the biomarker-based Observing Protein and Energy Nutrition (OPEN) study. Int J Epidemiol (2003) 32:1062–63.
35 Riboli E, Kaaks R. Invited commentary: the challenge of multi-center cohort studies in the search for diet and cancer links. Am J Epidemiol (2000) 151:371–74.
36 Ferrari P, Slimani N, Ciampi A, et al. Evaluation of under- and overreporting of energy intake in the 24-hour diet recalls in the European Prospective Investigation into Cancer and Nutrition (EPIC). Public Health Nutr (2002) 5:1329–45.
37 Kaaks R, Riboli E, van Staveren W. Calibration of dietary intake measurements in prospective cohort studies. Am J Epidemiol (1995) 142:548–56.
38 Truett J, Cornfield J, Kannel W. A multivariate analysis of the risk of coronary heart disease in Framingham. J Chronic Dis (1967) 20:511–24.