
International Journal of Epidemiology 2000;29:536-541
© International Epidemiological Association 2000

How many data sources are needed to determine diabetes prevalence by capture-recapture?

AA Ismail (a), NJ Beeching (a), GV Gill (b) and MA Bellis (c)

a Division of Tropical Medicine, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK.
b University Department of Medicine, University Hospital Aintree, Liverpool L9 7AL, UK.
c School of Health, Liverpool John Moores University, 79 Tithebarn Street, Liverpool L2 2ER, UK.

Reprint requests to: Dr GV Gill, Department of Medicine, University Hospital Aintree, Lower Lane, Liverpool L9 7AL, UK. E-mail: G.Gill{at}liv.ac.uk

Background Capture-recapture (CR) methods are increasingly used to estimate the size of human populations, including those with diabetes. Few studies have examined which demographic details are needed to match patients across the lists used in these techniques, or what the optimum number of lists is.

Methods Six lists of known diabetic patients attending different medical settings during the study year were obtained. The effect on total enumeration of aggregating these lists was examined using increasing numbers of demographic data items as patient identifiers. Two-list CR estimates of prevalence were obtained for each of the 15 possible pairs of the six lists. Further estimates were obtained by log-linear modelling, which allows for interdependence between lists, using different combinations of three and four lists, and after combining the six available lists into three logical lists.
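The matching step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record fields (`first`, `family`, `dob`), the list names and the toy records are all hypothetical, and matching is assumed to be exact after trimming whitespace and lower-casing.

```python
from itertools import combinations

def match_key(record, fields):
    """Build a normalised matching key from the chosen identifier fields."""
    return tuple(str(record[f]).strip().lower() for f in fields)

def enumerate_patients(lists, fields):
    """Return the set of distinct patient keys found across all lists."""
    return {match_key(rec, fields) for recs in lists.values() for rec in recs}

# Hypothetical toy records; none of these come from the study.
lists = {
    "clinic": [
        {"first": "Ann", "family": "Lee", "dob": "1950-01-02"},
        {"first": "Bob", "family": "Ray", "dob": "1945-06-30"},
    ],
    "gp": [
        {"first": "ann ", "family": "LEE", "dob": "1950-01-02"},  # same person
        {"first": "Ann", "family": "Lee", "dob": "1972-11-09"},   # namesake
    ],
}

# Matching on names alone merges the two different people called Ann Lee.
by_name = enumerate_patients(lists, ["first", "family"])

# Adding date of birth separates them, so the enumerated total rises.
by_name_dob = enumerate_patients(lists, ["first", "family", "dob"])

# Six lists give 15 distinct pairs for two-list estimates.
pairs = list(combinations(["A", "B", "C", "D", "E", "F"], 2))

print(len(by_name), len(by_name_dob), len(pairs))
```

Under these assumptions, a more specific key can only keep the enumerated total the same or raise it, which matches the direction of the effect reported for adding date of birth.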

Results For matching patients, adding date of birth to first name and family name as matching criteria increased the total of identified patients from 2500 to 2585 (3% increase), corresponding to a period prevalence of 1.5% (95% CI: 1.41–1.52%). Addition of further identifiers, such as partial postcode, only increased the estimate by a further 15 patients (0.5%), and more detailed matching with full postcode introduced uncertainty. The use of two-list CR yielded widely varying estimates of the total diabetic population, from 1379 (95% CI: 435–2273) to 9554 (95% CI: 7291–10 983). Log-linear modelling using different combinations of three and four lists produced estimates of 5074 (95% CI: 4417–5947) and 5578 (95% CI: 4918–7081), respectively, after compensating for statistical interdependence between the lists used. The appropriate condensation of the six available lists into three lists for modelling yielded an estimate of 5492 (95% CI: 4870–6285), corresponding to a CR-adjusted period prevalence of 3.1% (95% CI: 3.03–3.19%).
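To illustrate why two-list estimates can vary so widely, the sketch below implements a standard two-list estimator. The abstract does not say which two-list formula was used; Chapman's nearly unbiased form of the Lincoln-Petersen estimator, with a normal-approximation confidence interval, is assumed here, and all counts are hypothetical.

```python
import math

def chapman_estimate(n1, n2, m, z=1.96):
    """Two-list capture-recapture estimate of total population size.

    n1, n2: number of patients on each list; m: number found on both.
    Returns (estimate, lower, upper) using Chapman's estimator and a
    normal-approximation interval (an assumption, not the paper's method).
    """
    if m > min(n1, n2):
        raise ValueError("overlap cannot exceed the size of either list")
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m)) / ((m + 1) ** 2 * (m + 2))
    half_width = z * math.sqrt(var)
    return n_hat, n_hat - half_width, n_hat + half_width

# Hypothetical counts: same list sizes, different overlap.
low_overlap = chapman_estimate(n1=200, n2=150, m=50)
high_overlap = chapman_estimate(n1=200, n2=150, m=100)

print(low_overlap)
print(high_overlap)
```

The estimator assumes the two lists are independent. If patients on one list are more likely to appear on the other, the overlap is inflated and the estimate falls, as the second call shows; negative dependence has the opposite effect. Two lists alone give no way to detect either case.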

Conclusions In a Western population, the only demographic data required for matching patients on lists used for CR methods are first name, family name and date of birth, if unique identifiers such as social security numbers are not available. Two lists alone do not produce reliable data, and at least three lists are needed to allow for modelling for ‘dependence’ between datasets. The use of more than three lists does not substantially alter the absolute value or confidence of enumeration, and multiple lists (if available) should be condensed into three lists for use in CR calculations.
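The recommendation to condense multiple lists into three and model dependence can be sketched as below. Several points are assumptions made for illustration: the six sources are condensed by taking set unions, the grouping of sources is arbitrary, and the model is the three-list log-linear model with all pairwise interactions and no three-way interaction, which has a closed-form estimate for the unobserved cell. The abstract does not state how the lists were grouped or which log-linear model was selected.

```python
from collections import Counter

def capture_histories(list_a, list_b, list_c):
    """Count patients by presence pattern, e.g. '101' = on A and C only."""
    lists = (list_a, list_b, list_c)
    everyone = list_a | list_b | list_c
    return Counter(
        "".join("1" if patient in s else "0" for s in lists)
        for patient in everyone
    )

def three_list_estimate(n):
    """Estimate the unobserved cell and the total population size.

    Uses the closed form for the log-linear model with all pairwise
    interactions and no three-way interaction:
        n000 = n111 * n100 * n010 * n001 / (n110 * n101 * n011)
    """
    denom = n["110"] * n["101"] * n["011"]
    if denom == 0:
        raise ValueError("closed form needs non-zero two-list-only cells")
    missing = n["111"] * n["100"] * n["010"] * n["001"] / denom
    return missing, sum(n.values()) + missing

# Six hypothetical sources of patient IDs, condensed pairwise into three lists.
s1, s2, s3, s4, s5, s6 = {1, 2}, {3, 4}, {1, 5}, {2, 6}, {1, 3}, {5, 7}
list_a, list_b, list_c = s1 | s2, s3 | s4, s5 | s6

counts = capture_histories(list_a, list_b, list_c)
missing, total = three_list_estimate(counts)
print(dict(counts), missing, total)
```

In practice a model would be chosen by comparing the fit of several log-linear models rather than fixed in advance; the closed form is used here only to keep the example dependency-free.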

Accepted 1 December 1999




