
Machine Learning to Identify Risk Stratification Variables After Acute Coronary Syndrome

Key findings

  • In cardiology and other fields of medicine, there is a great need to accurately identify high-risk subgroups that are missed by traditional risk stratification metrics
  • In this study, a machine learning technique identified 19 clinical features that predict six-month mortality after acute coronary syndrome, 11 of which were not selected by experts when deriving the GRACE score
  • A data imputation method was used to estimate missing clinical variables, thereby enabling risk assessment when only partial information is available about a patient
  • The resulting risk model showed better performance than the GRACE score, most notably for patients who were classified as low risk by the GRACE score

A risk score such as the Global Registry of Acute Coronary Events (GRACE) or Thrombolysis in Myocardial Infarction (TIMI) is usually assigned after acute coronary syndrome to ensure the patient gets appropriate therapy. Unfortunately, many adverse events occur in patients who were not classified as high risk by these conventional metrics.

Collin M. Stultz, MD, PhD, cardiologist in the Cardiology Division at Massachusetts General Hospital, and colleagues suspected the problem lay in how traditional risk scores are developed. For example, the GRACE dataset—an international registry derived from 94 hospitals—contains over 1,400 clinical variables. The GRACE score was created by pruning that list to about 50 variables, based on a review of medical literature and clinical experience.

In Nature: Scientific Reports, Dr. Stultz and his colleagues report a new, more objective technique for discovering clinical features that have prognostic value. It can accommodate a large number of features, does not rely on expert knowledge and is better able to identify high-risk subgroups among patients who would be identified as low risk using traditional scoring systems.

A Machine Learning Method

The researchers applied bootstrap LASSO regression (BLR), a machine learning method for selecting important variables, to the ACS cohort in the GRACE study. Their objective was to find a set of features that identifies patients at high risk of death within six months of presenting with an acute coronary syndrome.
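The general idea behind bootstrap LASSO can be sketched as follows: fit an L1-penalized logistic regression on many resamples of the data and keep the features whose coefficients are nonzero in most of them. This is a minimal illustration on synthetic data, not the study's implementation. The function names, the penalty strength `lam`, the number of resamples and the 0.9 selection threshold are all assumptions chosen for the example.

```python
import math
import random


def fit_l1_logistic(X, y, lam, lr=0.5, iters=60):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted risk minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        for j in range(d):
            u = w[j] - lr * grad_w[j]
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(u) - lr * lam, 0.0), u)
    return w


def bootstrap_selection_frequency(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each coefficient is nonzero."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    counts = [0] * d
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        w = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(d):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Synthetic stand-in data (not GRACE): only feature 0 drives the outcome.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(150)]
y = [1 if row[0] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]

freq = bootstrap_selection_frequency(X, y, lam=0.1)
selected = [j for j, f in enumerate(freq) if f >= 0.9]  # threshold is an assumption
```

Requiring a feature to survive in nearly every resample is what makes the selection more stable than a single LASSO fit, since a feature that is kept only by chance in one sample tends to drop out in others.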

The analysis was restricted to clinical features that are available within the first 24 hours after presentation to the hospital. There were 198 such features in the registry and 15,534 patients in GRACE had values for all 198 features. The researchers used 80% (12,428) of those patients for the BLR analysis and left the other 20% (3,106) as a holdout set. The two sets of patients had the same mortality rate.
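One common way to obtain training and holdout sets with matching event rates is a stratified split. The article does not say how the researchers partitioned the data, so the sketch below is only an illustration of that approach. The function name `stratified_split` and the toy labels are hypothetical.

```python
import random


def stratified_split(labels, holdout_frac=0.2, seed=0):
    """Split indices into training and holdout sets, preserving class balance."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, holdout = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        k = round(len(indices) * holdout_frac)  # holdout size for this class
        holdout.extend(indices[:k])
        train.extend(indices[k:])
    return sorted(train), sorted(holdout)


# Illustrative labels only: 1 = died within six months, 0 = survived.
labels = [1] * 50 + [0] * 950
train_idx, holdout_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
holdout_rate = sum(labels[i] for i in holdout_idx) / len(holdout_idx)
```

Splitting each outcome class separately keeps the mortality rate the same in both sets, up to rounding, which matters when deaths are rare.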

Feature Selection

BLR identified 19 of the 198 features as being most predictive. Of those, eight are also part of the GRACE score: age, systolic blood pressure, pulse, Killip class, cardiac arrest, ST-segment deviation, initial creatinine and initial positive enzymes.

The 11 features not part of the GRACE score were admission weight; history of heart failure, peripheral artery disease or renal insufficiency; chronic warfarin use; and use of the following before hospitalization or within the first 24 hours: statin, diuretic, insulin, intravenous inotropic agent or oral or intravenous beta-blocker.

The researchers then refined the BLR model using a technique known as ridge logistic regression (RLR) and tested it on the holdout set of patients. A prognostic model that was developed using only 19 features performed similarly to a model developed using all 198 features.
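Ridge logistic regression is ordinary logistic regression with an added L2 penalty that shrinks coefficients toward zero without eliminating them. The sketch below fits such a model by plain gradient descent on toy data. The penalty strength `alpha`, the learning rate and the data are assumptions for illustration, and the article does not describe how the researchers fit or tuned their model.

```python
import math


def predict_risk(w, b, x):
    """Predicted probability of the outcome for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_ridge_logistic(X, y, alpha=0.1, lr=0.5, iters=200):
    """Logistic regression with an L2 (ridge) penalty, fit by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w = [alpha * wj for wj in w]  # gradient of (alpha / 2) * ||w||^2
        grad_b = 0.0                       # the intercept is not penalized
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return w, b


# Toy one-feature data (not from GRACE).
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w_weak, b_weak = fit_ridge_logistic(X, y, alpha=0.1)
w_strong, b_strong = fit_ridge_logistic(X, y, alpha=1.0)
```

A larger `alpha` pulls the coefficient closer to zero, which is how the penalty trades a little bias for less sensitivity to noise in the training data.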

Data Imputation

Traditional risk scores can be used only when all of the input variables are known. To sidestep that problem, the researchers used a data imputation technique to estimate values for any of the 19 features that a patient's record was missing. They named their final model "RLR with a variable number of inputs" (RLRVI).
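The article does not specify which imputation technique the researchers used. As a stand-in, the sketch below uses mean imputation, the simplest option: each missing value is replaced by the average of that feature in a reference dataset, so the record can be passed to a model that expects every input. The function names and the numbers are hypothetical.

```python
def column_means(rows):
    """Mean of each feature over the rows where it is observed (not None)."""
    d = len(rows[0])
    means = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def impute(record, means):
    """Fill missing entries (None) with the corresponding reference mean."""
    return [m if v is None else v for v, m in zip(record, means)]


# Hypothetical reference data with three features; None marks a missing value.
reference = [
    [70.0, 120.0, 1.0],
    [50.0, None, 0.0],
    [60.0, 140.0, None],
]
means = column_means(reference)

patient = [65.0, None, None]  # only the first feature is known
completed = impute(patient, means)
```

Here the patient's two missing features are filled with 130.0 and 0.5, the reference means, while the observed value 65.0 is left unchanged.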

RLRVI Performance

The team constructed a development set of the 43,063 patients in GRACE who had values for all 19 features identified by BLR. The RLRVI model correctly classified more patients than the GRACE score, both overall and in subsets of patients with ST-elevation myocardial infarction (STEMI), non-STEMI or unstable angina.

Most notably, for patients who fell within the lowest 14% of risk (GRACE score ≤87), the RLRVI model was significantly better than the GRACE score at predicting six-month mortality.

The RLRVI model performed better than the GRACE score even when data imputation was used for any subset of the non-GRACE features.

Validation of the RLRVI Model

The team also constructed a validation set of 6,363 patients who had data available for all eight GRACE score features but were not part of the development set or the cohort used to develop the original GRACE risk score. For these patients, the RLRVI model showed better prognostic ability than the GRACE score and offered a statistically significant improvement in all hazard ratios except in the subgroup with unstable angina.

Likewise, in the lowest-risk patients, the RLRVI model offered improved hazard ratios in all patients except those with unstable angina. This new approach to developing risk stratification metrics should be important not only in cardiology but in other fields of medicine plagued by the "low risk–high number of deaths" phenomenon.


