Uneasy lies the head that wears the crown, and a long-reigning monarch of
psychiatric measurement may well see pitchforks and torches approaching the
The Hamilton Depression Rating Scale became the de facto gold standard for
calibrating depression severity in the decades after its development by
British psychiatrist Max Hamilton in 1960. Psychiatrists need a reliable tool
to measure the effects of treatment in both the research lab and the
clinic, but many have questioned the utility of the Hamilton in recent years.
"Everyone knows there's a problem with the Hamilton Depression scale,
but what's better?" asked R. Michael Bagby, Ph.D., of the Center for
Addiction and Mental Health in Toronto.
"The aim of psychiatric treatment is remission, but you don't know if
the patient is in remission unless you have a scale to measure signs and
symptoms and their severity before and after treatment," said A. John
Rush, M.D., of the University of Texas Southwestern Medical Center in
Bagby and four colleagues conducted a systematic review of 70 studies of
the Hamilton scale. They found it somewhat reliable but
"psychometrically and conceptually flawed," they wrote in the
December 2004 American Journal of Psychiatry.
Support for his team's analysis came in part from Eli Lilly and Co., as
well as the Ontario Mental Health Foundation and the Michael Smith Foundation
for Health Research. Besides clinicians and researchers, the pharmaceutical
industry and regulators would like to find a reliable, widely accepted scale,
one that would clarify for all parties how well a drug was working.
"There's so much error in these instruments that medicines are not
getting into the marketplace," said Bagby.
The researchers based their review on item-response theory, which calls for
maximizing the selection of items that are most sensitive to change.
Psychometric theory may sound like dry stuff, but it's very important, said
Bagby, whose background is in statistics and test construction. Applying item
response theory to the Hamilton scale revealed weaknesses at every level, he said.
"The Hamilton depression scale's internal reliability is adequate,
but many scale items are poor contributors to the measurement of depression
severity," wrote the researchers. "Others have poor interrater and
To be useful, each item on a scale should measure just one symptom and rate
higher or lower amounts of that symptom. Listing unrelated elements as part of
the same question only confounds the outcome. For instance, the general
somatic symptoms item encodes "feelings of heaviness,"
"diffuse backache," and "loss of energy," hardly
steps along a continuum of severity, said Bagby.
Inclusion of psychotic symptoms may violate these parameters too.
"A patient with guilt-themed hallucinations may be more severely ill
than a patient who has nonpsychotic guilt feelings, but is he or she feeling
more guilt?" he asked.
A further problem arises when some items can be scored with more possible
points than others. For instance, feeling tired all the time contributes two
points to the general somatic symptoms item, while weeping all the time may
contribute three or four points to the depressed mood item. This gives more
weight to weepiness than sleepiness.
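The weighting problem can be made concrete with a little arithmetic. The sketch below is a two-item illustration only, not the full Hamilton scale: the item maximums (four points for depressed mood, two for general somatic symptoms) come from the examples above, while the function names and the rescaling alternative are hypothetical and are not proposed by Bagby or Rush.

```python
# Maximum points per item, taken from the two examples in the text.
# This is a two-item illustration, NOT the full Hamilton scale.
ITEM_MAX = {"depressed_mood": 4, "general_somatic": 2}


def total_score(ratings):
    """Sum raw item ratings, as a simple scale total does."""
    for item, value in ratings.items():
        if not 0 <= value <= ITEM_MAX[item]:
            raise ValueError(f"{item} must be between 0 and {ITEM_MAX[item]}")
    return sum(ratings.values())


def item_weights(item_max):
    """Share of the maximum possible total that each item can contribute."""
    ceiling = sum(item_max.values())
    return {item: maximum / ceiling for item, maximum in item_max.items()}


def rescaled_score(ratings):
    """Hypothetical alternative: rescale each item to 0-1 before summing,
    so that every item can contribute equally to the total."""
    return sum(value / ITEM_MAX[item] for item, value in ratings.items())


# One patient at the ceiling of each item, and zero on the other.
weepy = {"depressed_mood": 4, "general_somatic": 0}
tired = {"depressed_mood": 0, "general_somatic": 2}

print(total_score(weepy), total_score(tired))        # raw totals differ
print(rescaled_score(weepy), rescaled_score(tired))  # rescaled totals match
print(item_weights(ITEM_MAX))
```

With raw totals, the patient at the top of the depressed mood item scores twice as high as the patient at the top of the somatic item, even though each is at the ceiling of a single symptom. Rescaling is only one possible remedy; scoring every item on the same range, as described for the IDS later in this article, avoids the issue by design.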
Yet another question is whether the Hamilton is showing its age. Can a test
designed 45 years ago accurately reflect current standards defining
depression? Several items on the Hamilton scale (like loss of insight or
hypochondriasis) are not among DSM-IV diagnostic criteria for
depression, while some DSM-IV features (like weight gain or
oversleeping) are not on the Hamilton.
Attempts have been made in recent decades to improve the Hamilton scale.
The GRID-HAMD scale (2002) revised questions on the Hamilton scale but kept
the original 17 items, retaining their discontinuities with DSM-IV
definitions of depression, said Bagby.
Giving the Hamilton in a structured interview has improved reliability,
said Rush, but doesn't fix the problem of differentially weighted items.
"So a scale total is not as valid as it could be."
A more radical solution is needed, Bagby argues.
"It is time to retire the Hamilton depression scale," he said,
rather than merely revise it.
"The Hamilton has seen better days and should be replaced,"
agreed Rush. "Psychiatry would be better served by a rating scale that
measured the signs and symptoms of the syndrome and also their
Alternatives do exist. The Montgomery-Asberg Depression Rating Scale
(MADRS), published in 1979, has achieved some acceptance. Rush and colleagues
published the Inventory of Depressive Symptomatology (IDS) in 1985 and have been
refining and validating it ever since.
The IDS and its short version, the Quick IDS (QIDS), capture all nine
DSM-IV criteria for depression, said Rush. All items are scored with
the same three-point severity scale. It picks up common, associated symptoms
and can be used in clinical practice as well as in research. "Doctors
can use the same scale in practice that they read about in journals," he said.
A self-report version that patients fill out compares well with the IDS and
demands less clinician time, said Rush.
APA is working with the American College of Physicians and the American
Academy of Family Physicians to develop a nine-item Patient Health
Questionnaire (the PHQ-9) with the support of Pfizer Inc. Severity is graded 0
("not at all") to 3 ("nearly every day"). A simple
chart interprets provisional diagnoses and treatment recommendations for
primary care physicians.
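Because every PHQ-9 item uses the same 0-to-3 grading, a total is a plain sum that can range from 0 to 27. The sketch below assumes only what is stated above (nine items, each graded 0 to 3); the function name is hypothetical, and the interpretation chart is deliberately left out because its thresholds are not given here.

```python
NUM_ITEMS = 9                   # nine-item questionnaire, per the text
MIN_GRADE, MAX_GRADE = 0, 3     # 0 = "not at all", 3 = "nearly every day"


def phq9_total(grades):
    """Validate nine item grades and return their sum (0 to 27)."""
    if len(grades) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} grades, got {len(grades)}")
    for grade in grades:
        if not isinstance(grade, int) or not MIN_GRADE <= grade <= MAX_GRADE:
            raise ValueError(f"grade {grade!r} is outside {MIN_GRADE}-{MAX_GRADE}")
    return sum(grades)


# Example: a made-up set of answers, for illustration only.
answers = [1, 2, 0, 3, 1, 0, 2, 1, 0]
print(phq9_total(answers))
```

Since each item has the same ceiling, no single symptom can outweigh another in the total, which is the property the Hamilton scale's critics say it lacks.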
The Depression Inventory Development team, a Toronto-based consortium of 14
pharmaceutical companies and representatives from academia, is working on a
new depression rating scale.
Even regulators appear open to new standards.
"The HAMD has been the most widely used depression instrument for
many years; however, we have always been willing to accept other valid
instruments, and in fact some programs now use the MADRS," said a
spokesperson for the Food and Drug Administration in a statement. "We
have also indicated that we would accept the IDS."
"Dr. Bagby's research has important implications for the future of
DSM and how instruments like this should be considered," said
Darrel Regier, M.D., M.P.H., director of APA's Office of Research and
executive director of the American Psychiatric Institute for Research and
Education. "We must pay attention to how to characterize severity and
how to link measurement to diagnostic criteria. Then we would be better
prepared to study the accuracy of diagnosis and medical treatment."
The article, "The Hamilton Depression Rating Scale: Has the
Gold Standard Become a Lead Weight?," is posted online at <http://ajp.psychiatryonline.org/cgi/content/full/161/12/2163>.
More information on the IDS is posted at <www.edc.gsph.pitt.edu/stard/public/idsqids.html>.
Am J Psychiatry 2004; 161:2163