IJE Advance Access published online on October 25, 2009
International Journal of Epidemiology, doi:10.1093/ije/dyp309
Modelling relative survival in the presence of incomplete data: a tutorial
1Cancer Research UK Cancer Survival Group, London School of Hygiene and Tropical Medicine, London, UK.
2North West Cancer Intelligence Service, Christie Hospital NHS Foundation Trust, Wilmslow Road, Manchester, UK.
3Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, UK.
* Corresponding author. Cancer Research UK Cancer Survival Group, Non-Communicable Disease Epidemiology Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK. E-mail: ula.nur{at}lshtm.ac.uk
Abstract
Background Missing data frequently create problems in the analysis of population-based data sets, such as those collected by cancer registries. Restriction of analysis to records with complete data may yield inferences that are substantially different from those that would have been obtained had no data been missing. Naive methods for handling missing data, such as restriction of the analysis to complete records or creation of a missing category, have drawbacks that can invalidate the conclusions from the analysis. We offer a tutorial on modern methods for handling missing data in relative survival analysis.
Methods We estimated relative survival for 29 563 colorectal cancer patients who were diagnosed between 1997 and 2004 and registered in the North West Cancer Intelligence Service. The method of multiple imputation (MI) was applied to account for the common example of incomplete stage at diagnosis, under the missing at random (MAR) assumption. Multivariable regression with a generalized linear model and Poisson error structure was then used to estimate the excess hazard of death of the colorectal cancer patients, over and above the background mortality, adjusting for significant predictors of mortality.
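As an illustration of the final step of an MI analysis, the sketch below pools one regression coefficient (for example, a log excess hazard ratio) across imputed data sets using Rubin's rules, the standard combining rules for MI. The abstract does not state how the estimates were combined, so applying Rubin's rules here is an assumption; the function name `pool_rubin` and the input numbers are hypothetical and are not results from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors of those estimates
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two imputations")
    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical log excess hazard ratios from three imputed data sets.
estimate, std_error = pool_rubin([0.50, 0.60, 0.55], [0.01, 0.01, 0.01])
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ between imputed data sets, which is how the uncertainty due to the missing values enters the final inference.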
Results Incomplete information on stage, morphology and grade meant that only 55% of the data could be included in the complete-case analysis. All cases could be included when the indicator method (IM) or MI was used. Compared with complete-case analysis, handling missing data by MI produced significantly lower estimates of the excess mortality for stage, morphology and grade, with the largest reductions occurring for late-stage and high-grade tumours.
Conclusion In complete-case analysis, almost 50% of the information could not be included, and with the IM, all records with missing values for stage were combined into a single missing category. We show that MI methods greatly improved the results by exploiting all the information in the incomplete records. This method also helped to ensure that efficient inferences about survival were made from the multivariable regression analyses.
Keywords Cancer registry, colorectal cancer, missing data, multiple imputation, stage, relative survival
Accepted 2 September 2009