Machine Learning Models Identify Patients at High Risk of Prolonged Hospital Stay After Primary THA

Key findings

  • To make bundled payment models for total hip arthroplasty (THA) more cost-effective, there's a need to evaluate how patient-related factors influence the risk of prolonged length of stay, a principal driver of costs for an episode of care
  • In this study, four machine learning models were applied to predict prolonged length of stay after primary THA using a large national data set of 246,265 patients
  • All models showed consistently excellent performance in terms of discrimination, calibration, and clinical utility in both the training and testing phases
  • Age, transfusion, body mass index, operation time, and certain preoperative laboratory results (hematocrit, platelet count, and white blood cell count) were the best predictors of a prolonged stay; many of these factors could be optimized before surgery

The Centers for Medicare & Medicaid Services have asked the American Medical Association to reevaluate the bundled payment model for primary total hip arthroplasty (THA). The current payment model applies a flat rate for THA without adjustment for variability in patient-specific risks.

Length-of-stay (LOS) is a principal driver of the costs for an episode of primary THA care. To make bundled payment models more cost-effective, it's imperative to evaluate how patient-related factors influence the risk of prolonged LOS.

Tony Lin-Wei Chen, MD, PhD, a research fellow in the Bioengineering Laboratory of the Department of Orthopaedic Surgery; Young-Min Kwon, MD, PhD, vice chair of the Department and director of the Lab; and colleagues have developed machine learning algorithms, a form of artificial intelligence, that accurately predict prolonged LOS after primary THA. They detail their results in The Journal of Arthroplasty.


Using data from the American College of Surgeons National Surgical Quality Improvement Program for 2013 to 2020, the team evaluated records of 246,265 patients who underwent primary, non-emergent THA. Of these, 81,889 had prolonged LOS, defined as a stay exceeding the 75th percentile (three days).
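The outcome definition above can be sketched in a few lines of Python. This is only an illustration of labeling stays that exceed the 75th percentile: the function name `label_prolonged`, the choice of the "inclusive" quantile method, and the sample stays are assumptions made for the example, not details from the study.

```python
from statistics import quantiles

def label_prolonged(los_days):
    """Return (threshold, labels); labels[i] is 1 if stay i exceeds the 75th percentile."""
    # quantiles(n=4) returns the three quartile cut points; index 2 is the 75th percentile.
    threshold = quantiles(los_days, n=4, method="inclusive")[2]
    labels = [1 if days > threshold else 0 for days in los_days]
    return threshold, labels

# Made-up lengths of stay in days, for illustration only (not study data).
stays = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 6]
threshold, labels = label_prolonged(stays)
```

Because stays are recorded in whole days, many patients sit exactly at the threshold, so the share labeled as prolonged need not equal exactly one quarter.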

Potential predictors of LOS (age, sex, body mass index, ethnicity, comorbidities, American Society of Anesthesiologists classification, white blood cell count, hematocrit, platelet count, operation time, transfusion, and concurrent surgical procedures) were used to construct four machine learning models: an artificial neural network, a random forest model, histogram-based gradient boosting, and k-nearest neighbor.

Model Performance

All models showed excellent prediction accuracy during testing:

  • Discrimination, determined by the area under the receiver operating characteristic curve (AUC), ranged from 0.72 to 0.74 (values >0.7 are considered good in clinical settings)
  • Calibration slope, a measure of whether the predictor effects in the training and test sets are the same, ranged from 0.83 to 1.04 (a perfect value is 1)
  • Calibration intercept, a measure of whether the model is over- or underestimating the predicted probabilities, ranged from −0.01 to 0.11 (a perfect value is 0)
  • Brier score, a measure of overall model performance, ranged from 0.184 to 0.192 (values closer to zero indicate better performance)
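The four measures above can be computed from a list of observed outcomes and predicted risks. The sketch below uses only the Python standard library and makes one simplifying assumption: calibration intercept and slope are estimated together by regressing the outcome on the log-odds of the predicted risk, which may differ from the exact procedure used in the study. The function names and the toy data are hypothetical.

```python
import math

def brier_score(y_true, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y_true, p)) / len(y_true)

def auc(y_true, p):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
    neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def calibration(y_true, p, iters=25):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton's method; return (intercept a, slope b)."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu            # gradient with respect to the intercept
            g1 += (yi - mu) * xi     # gradient with respect to the slope
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Toy data, calibrated by construction: (predicted risk, positives, negatives).
groups = [(0.2, 2, 8), (0.5, 5, 5), (0.8, 8, 2)]
y_true, p = [], []
for risk, n_pos, n_neg in groups:
    y_true += [1] * n_pos + [0] * n_neg
    p += [risk] * (n_pos + n_neg)

intercept, slope = calibration(y_true, p)
score = brier_score(y_true, p)
discrimination = auc(y_true, p)
```

In the toy data, the observed event rate in each group equals the predicted risk, so the fit returns a slope close to 1 and an intercept close to 0, the "perfect" values described above. A slope below 1 would suggest predictions that are too extreme, and a nonzero intercept would suggest systematic over- or underestimation.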

The artificial neural network performed best with an AUC of 0.73, calibration slope of 0.99, calibration intercept of −0.01, and Brier score of 0.185.

Model Interpretability

The strongest predictors of prolonged LOS were age >65 years, transfusion after surgery, operation time >84 minutes, BMI >30.4 kg/m², preoperative hematocrit <40.2%, preoperative platelet count <242 thousand/mm³, and preoperative white blood cell count >6.8 thousand/mm³.

The excellent prediction performance of machine learning models demonstrated their capacity to identify patients at risk of prolonged LOS. Many factors contributing to prolonged LOS can be optimized to minimize hospital stay for high-risk patients. These results indicate the potential of machine learning to aid discharge planning, and they support a shift to a risk-based reimbursement model.
