Using Machine Learning to Identify Suicide Risk

Key Findings

  • Researchers characterized suicide risk in a general hospital population by using natural language processing to extract estimates of National Institute of Mental Health (NIMH) Research Domain Criteria (RDoC) domains from hospital discharge summaries
  • Over more than 2.2 million person-years of follow-up, there were modest but significant associations between each of five RDoC scores and the risk of suicide or accidental death
  • Higher positive valence symptom burden imparted the greatest risk, particularly during the first 1.5 years of follow-up
  • In general, risk peaked in the first 90 days and subsequently diminished, but with negative valence, there was a gradual increase in risk after one year that plateaued after two years
  • Although this approach showed feasibility for suicide risk stratification, like all other such tools, it isn't yet ready for application to clinical practice

Suicide rates continue to rise in the U.S., where suicide was the tenth most common cause of death in 2016. Identifying people at risk is an important public health priority, one that requires better risk prediction tools.

Furthermore, most research on risk factors for suicide has been done in psychiatric care rather than in nonpsychiatric general medical settings. Extending this research to general medical care is an opportunity for improvement, because most people who attempt suicide consulted a primary care physician, rather than a psychiatrist, in the preceding month.

Roy H. Perlis, MD, MSc, director of the Center for Quantitative Health in the Department of Psychiatry at Massachusetts General Hospital, Thomas McCoy, MD, the center's director of research, and colleagues took a new approach. They characterized suicide risk in terms of National Institute of Mental Health (NIMH) Research Domain Criteria (RDoC) symptom burdens estimated in a general medical/surgical hospital population through natural language processing. They describe their findings in Depression & Anxiety.

Background on the RDoC

The Research Domain Criteria (RDoC) is a framework for researching mental disorders. It is constructed around major domains of human functioning, of which the researchers analyzed five:

  • Arousal/regulatory systems activate neural systems and provide appropriate homeostatic regulation of such systems as energy balance and sleep
  • Cognitive systems are responsible for attention, memory, language and other cognitive processes
  • Negative valence systems are primarily responsible for responses to aversive situations: fear, anxiety, loss, sustained threat and inability to obtain positive rewards following repeated or sustained efforts
  • Positive valence systems are primarily responsible for responses to positive motivational situations or contexts, such as reward-seeking, consummatory behavior and reward/habit learning
  • Systems for social processes mediate various types of responses in interpersonal settings, including perception and interpretation of others' actions

Unlike many previously identified risk factors for suicide, such as race and sex, RDoC domain-associated symptom burdens can vary over time and are potentially modifiable. In this paper, cross-sectional RDoC symptom estimates were shown to be associated with risk that itself varied over time. The researchers hope that time-varying risk, together with the potential for time-varying scores, will contribute to the development of more effective suicide risk stratification tools.

Study Design

The team retrospectively examined discharge notes on 444,317 adults, ages 18 to 90, who were admitted to one of two large New England hospitals between 2005 and 2013. They applied a technique recently developed and validated by Dr. Perlis and Dr. McCoy's team at Mass General and published in Biological Psychiatry. The technique uses natural language processing based on prior machine learning (a form of artificial intelligence) to extract estimates of RDoC dimensions from narrative clinical notes.

Recognizing that suicide cannot always be coded reliably, the researchers made suicide or accidental death the primary outcome. Median follow-up was 1,793 days, and total follow-up amounted to 2,262,588 person-years. During that time there were 1,982 suicides or accidental deaths.
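
To put these totals in perspective, the reported event count and person-years can be combined into a crude event rate. The short Python sketch below is illustrative arithmetic only: the rate is not a figure reported by the researchers, the function name is hypothetical, and the calculation assumes nothing beyond the two numbers given above. It covers the combined outcome of suicide or accidental death and makes no adjustment for age, sex or any other factor.

```python
# Crude event rate derived from the two totals reported in the article.
# Assumption: this is back-of-the-envelope arithmetic, not a study result.

events = 1_982            # suicides or accidental deaths during follow-up
person_years = 2_262_588  # total person-years of follow-up


def rate_per_100k(n_events: int, n_person_years: float) -> float:
    """Return the number of events per 100,000 person-years (hypothetical helper)."""
    return n_events / n_person_years * 100_000


crude_rate = rate_per_100k(events, person_years)
print(f"{crude_rate:.1f} per 100,000 person-years")  # prints "87.6 per 100,000 person-years"
```

A crude rate of this kind describes the whole cohort; it says nothing about how risk differed across RDoC domains or changed over time, which is what the study examined.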

Key Results

  • There were modest but significant associations between the risk of suicide or accidental death and RDoC scores for each of the five domains
  • Higher positive valence symptom burden imparted the greatest risk, particularly during the first 1.5 years of follow-up
  • In general, risk peaked in the first 90 days, then diminished, with the rate of decline varying for different domains
  • Negative valence was an exception to this pattern: Risk gradually increased during the first year after hospitalization and plateaued two years after

Implications for Intervention

Because the identified risk changed over time, the research team suspects that patient-specific risk reduction strategies could be developed targeting short, intermediate and longer-term risk as distinct opportunities for intervention.

They caution, however, that although their approach shows feasibility for suicide risk stratification, like other prediction tools it isn't yet ready for application to clinical practice.


