Professional News
DSM Developers Employ Strategies to Strengthen Diagnostic Reliability
Psychiatric News
Volume 47 Number 5 page 6-28

The design of the diagnostic-reliability tests in the DSM-5 field trials is intended to reflect real-world clinical situations and marks a departure from the study methods used for previous editions of DSM.

In an interview with Psychiatric News, Helena Kraemer, Ph.D., a consultant to the DSM-5 Task Force, said that much has been learned since DSM-IV was published about how the way agreement between raters is measured can dramatically influence kappa, the statistic used to quantify the reliability of observations. (For a description of how kappa scores are derived and interpreted, see “How Is Reliability Estimated?” in “DSM-5 Emphasizes Diagnostic Reliability.”)
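The article does not spell out the arithmetic behind kappa. As an illustration only, the sketch below computes Cohen’s kappa, one common form of kappa for two raters: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater’s label frequencies. The field trials may have used a different kappa variant. The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same cases (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' label frequencies, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical diagnoses (1 = disorder present) for the same 10 patients.
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohen_kappa(rater_a, rater_b), 3))  # prints 0.583
```

Here the raters agree on 8 of 10 patients (p_o = 0.80), but because each rater labels 4 of 10 patients as positive, chance alone would produce p_e = 0.52, so kappa is 0.28 / 0.48 = 7/12, about 0.58.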

Kraemer noted that a kappa score is essentially a measure of the ratio between a true “signal” (the reliability of diagnostic criteria) and “noise.” The noise that obscures the signal can take two forms. One is inconsistency in how patients express criteria: different patients with the same condition may present differently, and the same patient may present differently on different days. The other is inconsistency in how clinicians apply those criteria: different clinicians will interpret the same information about a patient differently.

To capture clinical decision making in the “real world” as closely as possible, methods for testing reliability should allow for both forms of noise. After all, it is the noise that makes diagnosis difficult and sometimes variable in practice, so researchers who want a true picture of clinical reality must account for it in their study design. Test methods that fail to do so yield inflated reliability scores.

In past DSM field trials, and in most reliability tests throughout medicine, a design known as “interrater” testing has been used, in which two or more raters review the same patient material at the same time. This design accounts for the noise of clinician inconsistency but not for differences in how patients with the same condition present, or for how the same patient may present on a different day. As a result, interrater testing is likely to yield a reliability score that does not reflect true clinical reality.

A crucial innovation in the DSM-5 field trials is the exclusive use of a test-retest design, which requires that the same patients be observed separately by two or more raters within an interval during which the patients’ clinical conditions are unlikely to have changed. As Kraemer noted in the January issue of the American Journal of Psychiatry, “Now the noise related to both patients and to clinicians is included, as it would be in clinical practice.”
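The contrast between the two designs can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the field trials’ method: each patient is given a hypothetical underlying severity score, each evaluation adds patient-side noise (how the patient presents that day) and clinician-side noise (how the rater reads that presentation), and a rater diagnoses the disorder when the result exceeds a threshold. The function names and parameter values are invented for illustration, and kappa is computed with Cohen’s formula for binary ratings.

```python
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two lists of binary (0/1) diagnoses."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)  # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)


def simulate_kappa(design, n_patients=20_000, patient_sd=0.6,
                   clinician_sd=0.6, threshold=1.0, seed=0):
    """Toy model: a rater diagnoses when the observed severity exceeds a threshold.

    All parameters are hypothetical. patient_sd is visit-to-visit variation in
    how a patient presents; clinician_sd is variation in how a rater reads it.
    """
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_patients):
        severity = rng.gauss(0.0, 1.0)  # patient's underlying (unobserved) severity
        if design == "interrater":
            # Both raters review the same material: one shared presentation.
            shown_a = shown_b = severity + rng.gauss(0.0, patient_sd)
        elif design == "test-retest":
            # Separate evaluations: each rater sees an independent presentation.
            shown_a = severity + rng.gauss(0.0, patient_sd)
            shown_b = severity + rng.gauss(0.0, patient_sd)
        else:
            raise ValueError(f"unknown design: {design}")
        # Each rater adds their own interpretive noise before applying the threshold.
        a.append(int(shown_a + rng.gauss(0.0, clinician_sd) > threshold))
        b.append(int(shown_b + rng.gauss(0.0, clinician_sd) > threshold))
    return cohen_kappa(a, b)


for design in ("interrater", "test-retest"):
    print(design, round(simulate_kappa(design), 2))
```

With these made-up settings the interrater design reports a noticeably higher kappa than the test-retest design, even though the diagnostic rule is identical; the only difference is whether patient-side noise is allowed to vary between the two evaluations.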

Additionally, at least two other features of earlier field trials may have skewed reliability results for previous DSM editions. The first was the inclusion of “case subjects” who were unequivocally symptomatic and “control subjects” who were unequivocally asymptomatic, which omitted the ambiguous middle of the population, where diagnostic errors are most common. The second was the use of expert raters.

Field trials for DSM-5 employed several features to correct these deficiencies. Among them:

  • Random selection of patients, with few exclusion criteria.

  • Use of clinicians who are not selected on the basis of any special expertise in the disorders being evaluated.

  • Application of the entire DSM-5 diagnostic system in each diagnostic evaluation, rather than focusing on one diagnosis at a time.

  • Instructions to participating clinicians to make their diagnoses according to their usual methods rather than through standardized diagnostic interviews, which are rarely used in routine clinical practice.
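To illustrate why random selection of patients matters, the toy simulation below compares kappa in a full random sample with kappa in a sample that, like the case-control designs of earlier trials, drops patients whose underlying severity lies near the diagnostic threshold. The parameters are hypothetical, not trial data, and the threshold of 0 (which makes about half the simulated patients positive) is chosen only for simplicity. The function names are likewise invented for this sketch, and kappa is computed with Cohen’s formula.

```python
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two lists of binary (0/1) diagnoses."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)  # chance agreement from the marginals
    return (p_o - p_e) / (1 - p_e)


def sampled_kappa(exclude_margin, n_patients=20_000, noise_sd=0.8,
                  threshold=0.0, seed=1):
    """Kappa when patients within exclude_margin of the threshold are dropped.

    exclude_margin=0 keeps everyone (random sampling); a positive margin mimics
    recruiting only unequivocal cases and controls. All values are hypothetical.
    """
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_patients):
        severity = rng.gauss(0.0, 1.0)
        # Each evaluation lumps patient and clinician noise into one term here.
        score_a = severity + rng.gauss(0.0, noise_sd)
        score_b = severity + rng.gauss(0.0, noise_sd)
        if abs(severity - threshold) < exclude_margin:
            continue  # ambiguous "middle" patient: excluded from the study sample
        a.append(int(score_a > threshold))
        b.append(int(score_b > threshold))
    return cohen_kappa(a, b)


print("random sample:", round(sampled_kappa(0.0), 2))
print("extremes only:", round(sampled_kappa(1.0), 2))
```

In this toy setting, the same raters applying the same rule earn a much higher kappa once the ambiguous middle is removed, which is the kind of inflation the article attributes to case-control sampling.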

All of these innovations are designed to test, with optimal fidelity, diagnostic criteria intended for real-world clinicians treating real-world patients. “From the beginning there has been a determination that the proposed criteria are for the use of clinicians, not for research or epidemiological uses,” Kraemer said.

William Narrow, M.D., APA associate director of research, concurred.

“The current field trials more closely reflect actual diagnostic practice by clinicians and reduce or eliminate other features that would artificially increase reliability,” he told Psychiatric News. “The resulting kappas are therefore more realistic whether lower, higher, or the same as previously found.”

More information about reliability testing in the field trials is posted on APA’s DSM-5 Web site at www.dsm5.org.
