International Journal of Epidemiology 2008 37(5):1158-1160; doi:10.1093/ije/dyn204

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2008; all rights reserved.

Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified

Julian P T Higgins

MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 0SR, UK.

E-mail: julian.higgins{at}mrc-bsu.cam.ac.uk

Accepted 28 August 2008

It is generally accepted that meta-analyses should assess heterogeneity, which may be defined as the presence of variation in true effect sizes underlying the different studies. This assessment might be achieved by performing a statistical test for heterogeneity, by quantifying its magnitude, by quantifying its impact or by a combination of these. Patsopoulos, Evangelou and Ioannidis propose methods for examining the effect of excluding studies (or groups of studies) on an assessment of heterogeneity.1 Their methods offer benefits over the sometimes practiced ‘leave one out’ approach to sensitivity analysis, by recognizing that the overall effect (against which heterogeneity is measured) changes each time an influential study is excluded. The authors offer a sequential approach (in which the overall effect and heterogeneity measure are re-estimated after the most influential study is removed at each iteration), and a combinatorial approach (in which groups of studies are removed). I suspect the sequential approach may often be performed informally in practice, whereby an obvious outlier is excluded, but another study then appears to be an outlier compared with the remaining studies and is in turn excluded. Of course, if heterogeneity permeates the entire set of studies, one might be tempted to continue excluding studies to reduce heterogeneity until a single study remains. A predefined stopping rule (a ‘desired heterogeneity threshold’, in the authors’ terminology) may therefore appear to offer a useful way forward.

Sensitivity analyses are important components of meta-analyses and should be widely encouraged. But is it helpful to assess sensitivity of heterogeneity measures to exclusion of studies, and is it sensible in particular to define a ‘desired threshold’ in terms of the I² statistic, as these authors have done?

Heterogeneity is to be expected in a meta-analysis: it would be surprising if multiple studies, performed by different teams in different places with different methods, all ended up estimating the same underlying parameter. From the standpoint that heterogeneity is inevitable in a meta-analysis, we are left with the question of whether there is an ‘acceptable’ degree of heterogeneity. My own view is that any amount of heterogeneity is acceptable, providing both that the predefined eligibility criteria for the meta-analysis are sound and that the data are correct. The challenge is then to decide on the most appropriate way to analyse heterogeneous studies, and this will depend on the aims of the synthesis and, to an extent, the observed directions and magnitudes of effects. It may involve a random-effects meta-analysis, in which the heterogeneity is assumed to take a particular form (often, but not necessarily, a normal distribution), or it might involve incorporating study-level covariates.

The paper does not include much discussion of the primary purpose of evaluating sensitivity of heterogeneity metrics to exclusion of studies. Some possibilities might be:

  1. to learn about robustness of the heterogeneity metric per se;
  2. to remove heterogeneity prior to performing a meta-analysis; and
  3. to identify causes of heterogeneity.

The first option strikes me as being of little interest. Furthermore, the same authors have previously illustrated the considerable uncertainty that typically surrounds heterogeneity indices, arguing rightly that this uncertainty should routinely be presented.2 Such presentation should adequately address the question of robustness. The second option raises important questions about the validity of the subsequent meta-analysis, since removal of studies is tantamount to manipulation of the eligibility criteria. As I have commented, methods are available that allow heterogeneity to be taken into account. The third option perhaps offers more promise. However, the message is not clear in the paper, which argues that the methods ‘offer at least an objective approach that is "agnostic", i.e. it is not influenced initially by consideration of known specific study characteristics’. Yet the post hoc hypotheses that need to be thought up to explain why the excluded studies might be outlying or influential appear to conflict strongly with any claims of ‘agnosticism’.

If the rationale for examining sensitivity of heterogeneity metrics to exclusion of studies is at best questionable, the actual implementation in terms of I² is, unfortunately, flawed. Notably, the paper offers no rationale for seeking to reduce I² below an arbitrary threshold. Indeed, I do not believe any sound rationale can be provided. This is because the paper is based on a misunderstanding of the I² statistic as, to use two specific quotes, ‘measuring the magnitude of the between-study heterogeneity’, or as a ‘point estimate of between-study heterogeneity’. I² is neither of these things. It represents the approximate proportion of total variability in point estimates that can be attributed to heterogeneity.3,4 The total variation depends importantly on the within-study precisions (essentially the sample sizes of the individual studies). Therefore, so must I². Furthermore, I² does not estimate a meaningful parameter, so should be regarded as a descriptive statistic rather than a point estimate. The authors omit to mention that the magnitude of heterogeneity can be quantified, using a point estimate of the among-study variance of true effects, often called τ² (tau-squared). Thus, I² may be viewed as the proportion of variability in the point estimates that is due to τ² rather than within-study error. A more appropriate descriptor for I² would be a measure of inconsistency, since it depends on the extent of overlap in confidence intervals across studies.
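
For reference, the relationship between I² and τ² described above can be written out explicitly. The notation below is introduced here for illustration and does not appear in the commentary: y_i and v_i denote the point estimate and within-study variance of study i, k is the number of studies, and s² is a 'typical' within-study variance. The moment (DerSimonian-Laird) estimator of τ² is assumed as one common choice among several.

```latex
\begin{align*}
w_i &= \frac{1}{v_i}, \qquad
\bar{y} = \frac{\sum_i w_i y_i}{\sum_i w_i}, \qquad
Q = \sum_{i=1}^{k} w_i \,(y_i - \bar{y})^2, \\[4pt]
I^2 &= \max\!\left(0,\; \frac{Q - (k-1)}{Q}\right), \\[4pt]
\hat{\tau}^2 &= \max\!\left(0,\; \frac{Q - (k-1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i}\right), \\[4pt]
s^2 &= \frac{(k-1)\sum_i w_i}{\left(\sum_i w_i\right)^2 - \sum_i w_i^2}, \qquad
I^2 = \frac{\hat{\tau}^2}{\hat{\tau}^2 + s^2} \quad \text{whenever } Q > k-1 .
\end{align*}
```

Because s² is determined entirely by the within-study variances, the same value of τ² yields a larger I² when the studies are large (small s²) and a smaller I² when they are small (large s²).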

To illustrate why I² is not a sensible metric for Patsopoulos, Evangelou and Ioannidis to use, consider the artificial data sets in Figures 1 and 2. In Figure 1, all studies have the same within-study error, a situation in which I² correlates closely with the among-study variance, τ². The effect of the sequential algorithm is illustrated in Table 1, and it is seen to behave reasonably well, in the sense that excluding studies to reduce I² has the effect of reducing τ². In Table 2, however, the algorithm excludes studies (first C, then E) that lie in the middle of the distribution of effect sizes, and the estimate of among-study variance increases when the second of these is removed. This is because removal of large studies increases the average within-study error and thus reduces I², even though removal of such studies may actually increase the estimated amount of heterogeneity. Like I², estimates of τ² often come with considerable uncertainty, illustrated in the tables using confidence intervals.5
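
The behaviour described in this paragraph can be reproduced with a small sketch. The four studies below are hypothetical values chosen for illustration, not the data behind Figure 2 or Table 2, and the function and variable names are likewise invented here. The among-study variance is estimated with the DerSimonian-Laird moment estimator, which is an assumption; the commentary does not say which estimator was used.

```python
def heterogeneity(effects, variances):
    """Return (Q, I2, tau2) using inverse-variance weights.

    tau2 is the DerSimonian-Laird moment estimate (an assumed choice).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = k - 1
    scale = sum_w - sum(w ** 2 for w in weights) / sum_w
    excess = max(0.0, q - df)          # truncate at zero, as is conventional
    i2 = excess / q if q > 0 else 0.0  # proportion of variability, 0 to 1
    tau2 = excess / scale
    return q, i2, tau2


# Hypothetical (effect estimate, within-study variance) pairs:
# two small studies at the extremes, two large studies in the middle.
studies = {
    "small_low": (-0.3, 0.09),
    "large_low": (0.0, 0.0025),
    "large_high": (0.4, 0.0025),
    "small_high": (0.7, 0.09),
}


def summarise(names):
    """Compute the heterogeneity measures for the named subset of studies."""
    effects = [studies[n][0] for n in names]
    variances = [studies[n][1] for n in names]
    return heterogeneity(effects, variances)


all_names = list(studies)
without_large = [n for n in all_names if n != "large_high"]

q_all, i2_all, tau2_all = summarise(all_names)
q_drop, i2_drop, tau2_drop = summarise(without_large)

print(f"all studies:         I2 = {i2_all:.2f}, tau2 = {tau2_all:.3f}")
print(f"large study dropped: I2 = {i2_drop:.2f}, tau2 = {tau2_drop:.3f}")
```

With these made-up numbers, dropping one large study from the middle of the distribution lowers I² from roughly 0.92 to roughly 0.69, while the estimated τ² rises from about 0.08 to about 0.10. An exclusion rule driven by I² would therefore count this as progress even though the estimated heterogeneity has grown.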


Figure 1 Artificial data with equal within-study errors

Figure 2 Artificial data with different within-study errors

Table 1 Results of sequential algorithm for data with equal within-study errors

Table 2 Results of sequential algorithm for data with different within-study errors

If these algorithms are to be implemented, they should focus on reducing the true among-study variation rather than I². However, I doubt that they would be a useful addition to routine meta-analysis procedures. Preferable would be to exclude studies (or groups of studies) to assess robustness of the conclusions of the meta-analysis itself. For example, a prediction interval for the true effect in a new study, which encompasses the full distribution of effects in a random-effects meta-analysis, is a convenient way to present findings of a meta-analysis in a way that acknowledges heterogeneity.6 The sequential or combinatorial exclusion of studies from the meta-analysis may reveal that a prediction interval that is apparently persuasive is in fact sensitive to a few influential studies.
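
As a rough sketch of the prediction interval mentioned above, the code below uses the form commonly given for it: the random-effects mean plus or minus a t quantile on k − 2 degrees of freedom times the square root of τ² plus the squared standard error of the mean. The data, the function names and the small lookup table of t quantiles are all assumptions made for this example, and the DerSimonian-Laird estimator of τ² is again assumed.

```python
import math

# Standard tabulated 97.5% quantiles of Student's t, keyed by degrees of
# freedom; a lookup is used here only to keep the sketch dependency-free.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}


def random_effects_summary(effects, variances):
    """Return (mean, standard error, tau2) for a random-effects meta-analysis."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale)  # DerSimonian-Laird (assumed)
    w_re = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean, se, tau2


def prediction_interval(effects, variances):
    """Approximate 95% prediction interval for the true effect in a new study."""
    k = len(effects)
    mean, se, tau2 = random_effects_summary(effects, variances)
    t_crit = T_975[k - 2]  # this lookup covers 3 to 8 studies only
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mean - half_width, mean + half_width


# Hypothetical data: five equally precise studies.
effects = [0.02, 0.16, 0.30, 0.44, 0.58]
variances = [0.01] * 5

mu, se, tau2 = random_effects_summary(effects, variances)
ci_low, ci_high = mu - 1.96 * se, mu + 1.96 * se
pi_low, pi_high = prediction_interval(effects, variances)

print(f"mean = {mu:.2f}, tau2 = {tau2:.3f}")
print(f"95% confidence interval for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"95% prediction interval:              ({pi_low:.2f}, {pi_high:.2f})")
```

In this made-up example the confidence interval for the mean effect excludes zero, whereas the prediction interval spans it. Re-running `prediction_interval` after dropping one study, or a group of studies, is one way to check how far such a conclusion depends on a few influential studies.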

In summary, the proposed methods should be avoided for both philosophical and technical reasons, but could be adapted to assess the robustness of conclusions concerning the effect sizes of real interest in a meta-analysis.


References

1 Patsopoulos NA, Evangelou E, Ioannidis JPA. Sensitivity of between-study heterogeneity in meta-analysis: proposed metrics and empirical evaluation. Int J Epidemiol (2008) 37:1148–57.

2 Ioannidis JPA, Patsopoulos NA, Evangelou E. Uncertainty in heterogeneity estimates in meta-analyses. Br Med J (2007) 335:914–16.

3 Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med (2002) 21:1539–58.

4 Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analysis. Br Med J (2003) 327:557–60.

5 Viechtbauer W. Confidence intervals for the amount of heterogeneity in meta-analysis. Stat Med (2007) 26:37–52.

6 Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc A (in press).


