
International Journal of Epidemiology 2005 34(4):949-952; doi:10.1093/ije/dyi012

Published by Oxford University Press on behalf of the International Epidemiological Association © The Author 2005; all rights reserved.

Methodology

Assessing case definitions in the absence of a diagnostic gold standard

David Coggon1,*, Christopher Martyn1, Keith T Palmer1 and Bradley Evanoff2

1 MRC Environmental Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK
2 Division of General Medical Sciences, Department of Medicine, Washington University School of Medicine, Campus Box 8005, 660 South Euclid Avenue, St. Louis, MO 63110, USA

* Corresponding author. MRC Environmental Epidemiology Unit, Southampton General Hospital, Southampton SO16 6YD, UK. E-mail: dnc{at}mrc.soton.ac.uk


    Abstract
 Top
 Abstract
 Diagnosis as a descriptor...
 Pathology cannot always be...
 Disorders for which the...
 Statistical techniques
 A utilitarian view of...
 Optimal case definition can...
 How can case definitions...
 Conclusion
 References
 
Optimal case definition is important in epidemiological research, but can be problematic when no satisfactory gold standard is available. In particular, difficulties arise where the pathology underlying a disorder is unknown or cannot be reliably diagnosed. This problem can be overcome if diagnoses are viewed not necessarily as labels for disease processes, but more generally as a useful method for classifying people for the purpose of preventing or managing illness. With this perspective, the value of a case definition lies in its practical utility in distinguishing groups of people whose illnesses share the same causes or determinants of outcome (including response to treatment). A corollary is that the best case definition for a disorder may vary according to the purpose for which it is being applied.


Keywords Diagnosis, classification, validity

Accepted 23 November 2004

A recent review of diagnostic criteria for upper limb musculoskeletal disorders found 27 published classification systems, no two of which were the same.1 The differences related not only to the criteria by which individual disorders were specified and the names by which they were identified, but also to the range of diagnoses distinguished. Such diversity of classification, which is by no means confined to rheumatology, presents a major challenge to the epidemiologist. Optimal case definition is important in the design of studies, but can be problematic when there is no satisfactory gold standard against which to assess potential diagnostic criteria. In this paper, we examine the basis for diagnostic classifications in medicine and suggest a way of addressing the difficulties that arise in the absence of agreed gold standards.


    Diagnosis as a descriptor of disease
When medical students embark on their clinical training, they are taught first how to elicit a history from a patient and how to conduct a relevant physical examination. The aim is to establish a ‘correct’ diagnosis so that prognosis can be predicted and appropriate treatment can be given. Often, a firm diagnosis cannot be made on the basis of only symptoms and signs, and further investigation is required to distinguish between several possible differential diagnoses, for example by radiology or blood tests. Sometimes, an initial diagnosis is revised in the light of new findings at operation, at autopsy, or as the patient's illness evolves over time, because it no longer appears ‘correct’. Implicit throughout is the notion that one or more specific disease processes are responsible for the patient's presenting complaints and are waiting to be discovered, or alternatively that in reality there is no underlying pathology. Diseases are conceived as having an independent existence,2 and clinicians use clinical data to identify the diseases that are most likely to be present.3

This model has obvious limitations. Illness is not simply a manifestation of disease. (Different authors have variously defined terms such as ‘illness’ and ‘disease’.3–8 For definitions of these and other key terms as used in this paper, see Table 1.) It arises from a complex interplay of pathological, physiological, psychosocial, and cultural influences. Nevertheless, epidemiologists have widely embraced the concept of diseases as objective natural phenomena that can be observed, classified, and investigated, and most epidemiological textbooks include sections on assessment of the ‘accuracy’ of diagnostic tests. We recognize that many diseases occur in a continuous spectrum of severity such that the borderline of normality is ill defined and somewhat arbitrary (e.g. osteoporosis, sensory-neural deafness), and that the means by which we classify a person as having a disease are generally imperfect and sometimes subjective. However, these problems do not detract from our belief in the objective nature of the diseases that we study. Thus, in formulating a case definition, our aim is to distinguish individuals in whom a specific disease is present or has occurred at a specified level of severity.


 
Table 1 Definitions of terms

 
Thinking of case definition in this way works well when disorders are associated with clearly defined pathology that can be established as present or absent with reasonable confidence in at least a representative sample of the study population. For example, a fracture of the femoral neck will almost always be demonstrable radiologically, and the presence of e-antigen on serological testing normally provides a reliable index of active infection by the hepatitis B virus. Even if it is not practical or ethical to apply the relevant diagnostic criteria in everyone who is studied, they provide a gold standard against which other case definitions can be tested. Any lack of sensitivity or specificity can then be taken into account when interpreting the results of investigations that use them. For example, thorough histological examination of tissues is generally regarded as providing reliable evidence for or against a diagnosis of cancer (i.e. a gold standard). However, many valuable epidemiological discoveries have been made using less accurate clinical diagnoses of cancer from death certificates (most notably in cohort studies using cancer mortality as an outcome), the effect of any diagnostic misclassification generally being to bias risk estimates towards the null.
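The bias towards the null can be illustrated with a small calculation. The sketch below assumes a binary outcome and a case definition whose sensitivity and specificity are the same in exposed and unexposed people (non-differential misclassification). The function name and all numbers are hypothetical and chosen only for illustration.

```python
def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Risk ratio seen when one imperfect case definition is applied to both groups.

    Assumes the same sensitivity and specificity in exposed and unexposed
    subjects (non-differential misclassification).
    """
    def apparent_risk(true_risk):
        # True cases that are detected, plus non-cases wrongly labelled as cases.
        return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)

    return apparent_risk(risk_exposed) / apparent_risk(risk_unexposed)


# Hypothetical true risks: 2% in the exposed, 1% in the unexposed (true ratio 2.0).
true_rr = 0.02 / 0.01
rr_gold_standard = observed_risk_ratio(0.02, 0.01, sensitivity=1.0, specificity=1.0)
rr_imperfect = observed_risk_ratio(0.02, 0.01, sensitivity=0.80, specificity=0.98)

print(f"true ratio {true_rr:.2f}, gold standard {rr_gold_standard:.2f}, "
      f"imperfect definition {rr_imperfect:.2f}")
```

With these illustrative values the imperfect definition gives a ratio between 1 and the true value, which is the attenuation described above. Most of it comes from imperfect specificity, because false positives among the many non-cases are added to both groups.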

Problems arise when there is no satisfactory diagnostic gold standard for the pathology that is presumed to underlie a disorder, and still more when the underlying pathology is unknown.


    Pathology cannot always be diagnosed unequivocally
Raynaud's disease is characterized by episodes in which the distal parts of one or more fingers become first pale, cold and numb, and then red and painful. It is believed to result from intermittent spasm of blood vessels in the digits, and can be identified objectively if a patient is observed during an attack. However, currently there is no sure method of excluding the pathology when episodes are not witnessed. How, in situations such as this, can we compare potential case definitions and optimize the choice?

One criterion might be the repeatability (consistency) of diagnosis between observers. If there is substantial disagreement between adequately trained independent observers in the classification of cases, then a method of diagnosis cannot be considered entirely satisfactory. However, the fact that a diagnostic test is objective and repeatable does not necessarily imply that it is meaningful, and recognition of this limitation has in some cases led to a paradox. Thus, in exploring the use of cold challenge for the diagnosis of Raynaud's phenomenon, researchers have assessed its sensitivity and specificity against reported symptoms as a standard.9 However, if symptom history provides a reliable gold standard then there should be no need for more elaborate investigations such as cold challenge.
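Agreement between two observers is often summarized with a chance-corrected statistic. The sketch below uses Cohen's kappa, which is our choice for illustration and is not named in the text. The function name and the counts are hypothetical.

```python
def cohen_kappa(both_case, only_a_case, only_b_case, both_noncase):
    """Chance-corrected agreement between two observers on a yes/no diagnosis."""
    n = both_case + only_a_case + only_b_case + both_noncase
    observed = (both_case + both_noncase) / n
    a_case = (both_case + only_a_case) / n   # proportion observer A calls a case
    b_case = (both_case + only_b_case) / n   # proportion observer B calls a case
    # Agreement expected if the two observers classified independently.
    expected = a_case * b_case + (1 - a_case) * (1 - b_case)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is complete")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 subjects each examined by both observers.
kappa = cohen_kappa(both_case=40, only_a_case=10, only_b_case=10, both_noncase=40)
print(round(kappa, 2))
```

A high value shows only that the observers classify consistently. As the paragraph above notes, it says nothing about whether the classification is meaningful.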

Katz et al. have proposed that where there is no satisfactory diagnostic gold standard, the best proxy against which epidemiological case definitions can be assessed may be the opinion of an expert clinician.10 Classification would then be determined by an empirical approach that identified the elements of history, physical examination, and laboratory tests that best assigned subjects to diagnostic categories specified by the clinician. However, expert clinicians do not always agree, and even when there is a consensus of experts, we cannot be certain that they are right, or that their opinion on what constitutes a case will not change over time. The fourth edition of Brain's textbook of neurology, published in 1951, described Alzheimer's disease as a pre-senile dementia, affecting people between the ages of 40 and 60 years.11 Now, of course, it is thought to be the most common cause of cognitive decline in the elderly, and older age would not be a reason for excluding the diagnosis.


    Disorders for which the underlying pathology is unknown
The challenges that confront the epidemiologist are even greater when the pathogenesis of disease is uncertain. Disorders such as schizophrenia are distinguished on the basis of clinical findings, in the belief that the illnesses of those diagnosed arise from a similar, although as yet undefined, pathological mechanism. Historically, such hunches have sometimes turned out to be correct. For example, scurvy was recognized as a distinct clinical entity long before James Lind showed (in 1747) that it could be successfully treated with citrus fruit, and even longer before vitamin C was isolated and synthesized (1932). However, in the absence of a postulated disease mechanism, how can one set of diagnostic criteria be evaluated in comparison with another?


    Statistical techniques
One approach that has been advocated when there is no satisfactory diagnostic gold standard is the use of latent class analysis.12 In this technique a mathematical model is constructed in which a ‘latent’ variable is assumed to represent an individual's ‘true’ disease status, and the model is used to estimate errors for different diagnostic tests that might be applied. However, the findings depend on assumptions about the inter-dependence of the errors from each test, and this cannot be established empirically.
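To make the idea concrete, the following is a minimal sketch of a two-class latent class model for three binary tests, fitted by expectation-maximization. It assumes that the tests err independently of one another given true disease status, which is exactly the kind of untestable assumption referred to above. The function names, starting values and parameter values are hypothetical, and the "data" are exact expected frequencies generated from the model itself rather than real observations.

```python
from itertools import product


def pattern_probs(prev, sens, spec):
    """Joint probability of each result pattern with (case, non-case) status,
    assuming test errors are independent given true status."""
    probs = {}
    for pattern in product((0, 1), repeat=len(sens)):
        p_case, p_noncase = prev, 1 - prev
        for result, se, sp in zip(pattern, sens, spec):
            p_case *= se if result else 1 - se
            p_noncase *= (1 - sp) if result else sp
        probs[pattern] = (p_case, p_noncase)
    return probs


def fit_latent_class(freqs, n_iter=5000):
    """Estimate prevalence, sensitivities and specificities by EM.

    freqs maps each result pattern, e.g. (1, 0, 1), to its frequency.
    """
    n_tests = len(next(iter(freqs)))
    prev, sens, spec = 0.5, [0.7] * n_tests, [0.7] * n_tests  # arbitrary start
    total = sum(freqs.values())
    for _ in range(n_iter):
        # E-step: probability of being a true case given each result pattern.
        joint = pattern_probs(prev, sens, spec)
        post = {pat: pc / (pc + pn) for pat, (pc, pn) in joint.items()}
        # M-step: re-estimate parameters from the weighted frequencies.
        case_w = sum(freqs[pat] * post[pat] for pat in freqs)
        noncase_w = total - case_w
        prev = case_w / total
        sens = [sum(freqs[pat] * post[pat] for pat in freqs if pat[j]) / case_w
                for j in range(n_tests)]
        spec = [sum(freqs[pat] * (1 - post[pat]) for pat in freqs if not pat[j])
                / noncase_w for j in range(n_tests)]
    return prev, sens, spec


# Hypothetical "truth" used to generate exact expected pattern frequencies.
true_prev, true_sens, true_spec = 0.2, [0.9, 0.8, 0.7], [0.95, 0.9, 0.85]
freqs = {pat: pc + pn
         for pat, (pc, pn) in pattern_probs(true_prev, true_sens, true_spec).items()}

est_prev, est_sens, est_spec = fit_latent_class(freqs)
print(round(est_prev, 3), [round(s, 3) for s in est_sens])
```

The estimates can only be trusted here because the data were generated under the same independence assumption that the model makes. If errors from different tests were correlated, the same code would still return estimates, but there would be no way of knowing from the data alone that they were wrong.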


    A utilitarian view of diagnosis
An alternative way out of the difficulty is to think of diagnoses not necessarily as labels for disease processes, but more generally as a useful method of classifying people for the ultimate purpose of preventing or managing illness.2 This approach has been advocated previously in psychiatry,13 but could usefully be applied more widely. The value of a case definition is then determined by its practical utility in distinguishing groups of people whose illnesses share the same causes or determinants of outcome (including response to treatment), and competing case definitions can be compared according to their performance against these criteria.

There is evidence, for example, that numbness and tingling in the hand which is confined to the sensory distribution of the median nerve and affects most of that area, differs in its association with risk factors from the same symptoms occurring in other anatomical patterns (being more strongly associated with activities that physically stress the wrist, and showing no association with psychological risk factors).14 If correct, this may form the basis for a useful diagnostic distinction. Another example is the classification of non-Hodgkin's lymphoma, in which methods based solely on histological appearances have been superseded by schemes that incorporate data on immunological cell surface markers, with improved prognostic accuracy.15

In psychiatry, diagnostic classifications based entirely on history and clinical examination have been developed through successive iterations following a process described by Robins and Guze as ‘continuing self-rectification and increasing refinement leading to more homogeneous diagnostic groupings’.16 These have been shown to distinguish successfully between patients with different prognosis and response to treatment. Similarly, in the classification of musculoskeletal disorders such as low back pain, useful schemes have not required the specification of pathology, but have described the pattern of symptoms and patient experience.17

If diagnostic criteria are viewed in this way, then optimizing their utility will involve a trade-off between ‘lumping’ and ‘splitting’. When diagnostic categories are too inclusive, the effects of risk factors and determinants of outcome may be diluted to the extent that they are missed or ignored. But if case definition is too discriminatory, the statistical power of investigations may be compromised, as well as the scope for exploiting results in all of the circumstances to which they are relevant. It would be unfortunate, for example, if the benefits of a treatment for non-Hodgkin's lymphoma in general were missed because it had been investigated in only one subtype of the tumour.
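The dilution that results from over-inclusive categories can be shown with simple arithmetic. The sketch below assumes a lumped diagnosis made up of mutually exclusive subtypes, each with its own baseline risk and its own risk ratio for a given exposure. The function name and all numbers are hypothetical.

```python
def lumped_risk_ratio(baseline_risks, risk_ratios):
    """Risk ratio for a lumped category built from mutually exclusive subtypes.

    baseline_risks: risk of each subtype in unexposed people.
    risk_ratios: exposed-to-unexposed risk ratio for each subtype.
    """
    risk_unexposed = sum(baseline_risks)
    risk_exposed = sum(r * rr for r, rr in zip(baseline_risks, risk_ratios))
    return risk_exposed / risk_unexposed


# Hypothetical disorder: subtype A (baseline risk 1%) is trebled by the exposure,
# while the more common subtype B (baseline risk 4%) is unaffected by it.
rr_lumped = lumped_risk_ratio([0.01, 0.04], [3.0, 1.0])
print(round(rr_lumped, 2))
```

In this example a threefold effect on one subtype appears as a ratio of about 1.4 for the lumped category. Studying subtype A alone would recover the full effect, but with only a fifth of the cases, which is the loss of statistical power that comes with splitting.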

Thinking of diagnoses as a useful method of classifying people does not preclude their being related to underlying pathology. Diagnoses that group people whose illnesses arise through the same identified pathological process are likely to be useful for preventive or therapeutic purposes. However, knowledge of underlying pathology is not an essential requirement. Another starting point might be an observed syndrome—i.e. an unusual clustering of symptoms, signs or other clinical features in certain individuals. The fact that clinical features cluster indicates that they may arise as part of the same disease process. And even if there is no shared pathology, there may be shared psychosocial or cultural causes that are amenable to manipulation.

Clinical consensus can also provide a useful initial basis for classification in so far as it represents what clinicians perceive to be a useful way of categorizing patients. However, the value of such classifications cannot be assumed without empirical evidence of utility. The Quebec Task Force on Spinal Disorders used a consensus process and systematic literature review to arrive at a classification of low back disorders based primarily on symptoms and signs.17 This has helped our understanding of the differences in prognosis of different presentations of low back pain, and has advanced efforts to manage the disorder better.18


    Optimal case definition can vary
An important implication of adopting a utilitarian approach to diagnosis is that the optimal case definition for a disorder may vary according to the circumstances in which it is applied. Thus, from analysis of clinical findings in patients who had consulted a physician with hip pain, and using clinical diagnosis as the gold standard, the American College of Rheumatology concluded that osteophytosis was the best radiographic discriminant of hip osteoarthritis.19 This suggests that the presence or absence of osteophytosis would be a useful component of case definition for osteoarthritis in studies of hospital patients with hip problems, among whom the prevalence of inflammatory arthritis may be substantial. However, in community-based studies, where the prevalence of inflammatory hip disease is much lower than that of osteoarthritis, the most important diagnostic requirement is to differentiate cases not from people with other arthritides, but from people who have no hip disease at all. For this purpose, the presence of joint space narrowing may be a more useful criterion.20 Another example is the classification of chronic renal disease. In aetiological studies there is probably value in distinguishing patients with glomerulonephritis from those with other underlying pathologies, whereas this distinction may be less important when investigating the clinical management of end-stage renal failure.


    How can case definitions be optimized in practice?
When faced with a need to classify illness for which there is no satisfactory diagnostic gold standard, researchers can explore the merits of potential case definitions empirically. For example, they may investigate the discriminatory potential of different diagnostic groupings in relation to patterns of association with known or suspected risk factors (as in the example of sensory disturbance in the hand described earlier14), or in predicting prognosis and clinical outcome when a sample of subjects is followed longitudinally. In this exploratory phase it will usually be sensible to start with the finest classification that can meaningfully be analysed statistically, and then aggregate categories for which no clear distinction is apparent. When using this approach, however, it is important to be aware that some apparent similarities or differences between potential diagnostic groups may be a product of chance, and that when applied to other samples of subjects, the aggregated categories may therefore not perform as well. This problem can be addressed by developing the diagnostic classification in a random subset of subjects, and then testing its performance in the remainder.
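The split-sample step might be set up along the following lines. This is a minimal sketch: the function name, the equal split and the fixed seed are assumptions made for illustration, not part of any recommended procedure.

```python
import random


def split_sample(subject_ids, development_fraction=0.5, seed=0):
    """Randomly divide subjects into a development set and a validation set."""
    rng = random.Random(seed)       # fixed seed so that the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical study with 1000 subjects identified by the numbers 0 to 999.
development, validation = split_sample(range(1000))
print(len(development), len(validation))
```

Categories would be aggregated using the development set only, and the resulting classification would then be applied unchanged to the validation set to see whether the distinctions persist.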

Another aspect of methodology that may require care is the choice of measures of association for comparison of relationships to risk factors. In cross-sectional studies of common disorders such as shoulder pain and low back pain, associations with risk factors are often summarized by prevalence ratios. However, the maximum value of a prevalence ratio is constrained by the prevalence of the disorder in unexposed subjects, since by definition prevalence cannot be greater than 100%. This makes prevalence ratios less satisfactory when comparing the associations of a given risk factor with disorders that differ markedly in their prevalence, and in these circumstances it may be preferable to use odds ratios.
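The ceiling on the prevalence ratio follows directly from its definition, as the short sketch below shows. The function names and the prevalences are hypothetical.

```python
def prevalence_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed


def max_prevalence_ratio(p_unexposed):
    # Prevalence in the exposed cannot exceed 100%, so the ratio cannot exceed this.
    return 1.0 / p_unexposed


def odds_ratio(p_exposed, p_unexposed):
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical common disorder: 40% prevalence if unexposed, 60% if exposed.
pr_common = prevalence_ratio(0.60, 0.40)
ceiling_common = max_prevalence_ratio(0.40)
or_common = odds_ratio(0.60, 0.40)

# Hypothetical rarer disorder: 2% prevalence in the unexposed.
ceiling_rare = max_prevalence_ratio(0.02)

print(round(pr_common, 2), round(ceiling_common, 2), round(or_common, 2),
      round(ceiling_rare, 1))
```

For the common disorder the prevalence ratio can never exceed 2.5 however strong the risk factor, whereas for the rarer disorder it could in principle reach 50. The odds ratio has no such upper bound, which is why it may be the better measure when disorders of very different prevalence are compared.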


    Conclusion
We recommend that epidemiologists should view diagnoses not necessarily as labels for diseases but more generally as a useful method of classifying people for the purpose of preventing and managing illness. Where there is good reason to believe that a category of illness results from a defined pathological process and the presence or absence of this pathology can be established accurately, then this provides a gold standard against which other case definitions can be assessed. However, where there is no established underlying pathology or no credible gold standard for the presumed underlying pathology, case definitions should be evaluated according to their practical utility in the elucidation of preventable causes and the optimization of clinical care.


KEY MESSAGES

  • Optimal case definition is important in epidemiology, but can be difficult when there is no satisfactory diagnostic gold standard.
  • One approach to this problem is to think of diagnoses not necessarily as labels for disease processes, but more generally as a useful method of classifying people for the ultimate purpose of preventing or managing illness.
  • The value of a case definition is then assessed by its ability to distinguish, usefully, groups of people whose illnesses share the same causes, or prognosis and response to treatment.

 


    References
1 Van Eerd D, Beaton D, Cole D, Lucas J, Hogg-Johnson S, Bombardier C. Classification systems for upper-limb musculoskeletal disorders in workers: a review of the literature. J Clin Epidemiol 2003;56:925–36.

2 Wulff HR. What is understood by a disease entity? J R Coll Physicians Lond 1979;13:219–20.

3 Feinstein AR. Scientific methodology in clinical medicine II. Classification of human disease by clinical behaviour. Ann Intern Med 1964;61:757–81.

4 Feinstein AR. Taxonomy and logic in clinical data. Ann NY Acad Sci 1969;161:450–59.

5 Miettinen OS, Flegel KM. Elementary concepts of medicine: III. Illness: somatic anomaly with .... J Eval Clin Pract 2003;9:315–17.

6 Miettinen OS, Flegel KM. Elementary concepts of medicine: IV. Sickness from illness and in health. J Eval Clin Pract 2003;9:319–20.

7 Miettinen OS, Flegel KM. Elementary concepts of medicine: V. Disease: one of the main subtypes of illness. J Eval Clin Pract 2003;9:321–23.

8 Miettinen OS, Flegel KM. Elementary concepts of medicine: VI. Genesis of illness: pathogenesis, aetiogenesis. J Eval Clin Pract 2003;9:325–27.

9 Bovenzi M. Finger systolic blood pressure indices for the diagnosis of vibration-induced white finger. Int Arch Occup Environ Health 2002;75:20–28.

10 Katz JN, Stock SR, Evanoff BA et al. Classification criteria and severity assessment in work-associated upper extremity disorders: Methods matter. Am J Ind Med 2000;38:369–72.

11 Brain WR. Diseases of the Nervous System. 4th edn. London: Oxford University Press, 1951.

12 Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics 2004;60:427–35.

13 Kendell RE. Clinical validity. Psychol Med 1989;19:45–55.

14 Reading I, Walker-Bone K, Palmer KT, Cooper C, Coggon D. Anatomic distribution of sensory symptoms in the hand and their relation to neck pain, psycho-social variables and occupational activities. Am J Epidemiol 2003;157:524–30.

15 Baird S. The usefulness of cell surface markers in predicting the prognosis of non-Hodgkin's lymphomas. Crit Rev Clin Lab Sci 1993;30:1–28.

16 Robins E, Guze SB. Establishment of diagnostic validity in psychiatric illness: its application to schizophrenia. Am J Psychiat 1970;126:107–11.

17 Spitzer WO, LeBlanc FE, Dupuis M. Scientific approach to the assessment and management of activity-related spinal disorders. A monograph for clinicians. Report of the Quebec Task Force on Spinal Disorders. Spine 1987;12:51–59.

18 Frank J, Sinclair S, Hogg-Johnson S et al. Preventing disability from work-related low-back pain: New evidence gives new hope if we can just get all the players onside. CMAJ 1998;158:1625–31.

19 Altman R, Alarcon G, Appelrouth D et al. The American College of Rheumatology criteria for the classification and reporting of osteoarthritis of the hip. Arthritis Rheum 1991;34:505–14.

20 Croft P, Cooper C, Wickham C, Coggon D. Defining osteoarthritis of the hip for epidemiological studies. Am J Epidemiol 1990;132:514–22.

