Using Machine Learning to Predict Prostate Cancer Progression During Active Surveillance

Journal Published on January 3, 2022

Originally published on Urologic Oncology

Key findings

This study evaluated whether machine learning approaches could improve the prediction of prostate cancer progression compared with a traditional logistic regression model during active surveillance
All machine learning models were significantly better than logistic regression with regard to the F1 score, the primary performance metric
Compared with the machine learning models, logistic regression also had the lowest sensitivity (10.6%) and negative predictive value (72%) but it had the highest specificity (97%) and positive predictive value (62%)
This work demonstrates the value of machine learning models over logistic regression for predicting prostate cancer progression during active surveillance

Protocols for active surveillance (AS) of very-low-risk or low-risk prostate cancer generally involve serial physician visits, prostate-specific antigen (PSA) testing and repeat biopsies. Although AS is proven to reduce overtreatment, it can be time-consuming for both the provider and the patient, and prostate biopsy carries multiple risks. In addition, some patients feel anxious about deferring intervention.

To allow more personalized care, several groups have developed models to predict progression on AS using traditional statistical approaches. Madhur Nayan, MDCM, PhD, a fellow in the Combined Harvard Program in Urologic Oncology at Massachusetts General Hospital and a lecturer in the Institute for Medical Engineering and Science at Massachusetts Institute of Technology; Adam S. Feldman, MD, MPH, director of the Combined Harvard Urologic Oncology Fellowship, director of Urologic Research in the Department of Urology at Mass General and a urologic oncologist at Mass General Cancer Center; and colleagues recently became the first to evaluate machine learning (ML) approaches for this purpose.

In Urologic Oncology, the team reports the superiority of ML models to a traditional logistic regression (T-LR) model for predicting disease progression during AS.

Data Source

The data for the models came from 790 patients who were diagnosed with localized prostate cancer between 1997 and 2016 and managed with AS. All had a PSA <10 ng/mL, clinical stage up to T2a and grade 1 disease on diagnostic biopsy. Over a median follow-up of 6.3 years, 234 patients (about 30%) had grade progression.

Model Development

The researchers randomly split the cohort into a training set (80% of patients) and a test set (20%). They developed the models in the training set using patient and disease characteristics measured at diagnosis (four characteristics for the T-LR model and 14 for the ML models).
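A random 80/20 split of this kind can be sketched as follows. This is only an illustration: the article does not say how the split was implemented, so the function name `split_cohort`, the fixed seed and the use of integer patient IDs are all assumptions, and the sketch does not stratify by outcome.

```python
import random


def split_cohort(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs and split them into training and test sets.

    Hypothetical helper: the seed and the unstratified shuffle are
    assumptions, not details taken from the study.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 790 patients, as in the cohort described above.
train_ids, test_ids = split_cohort(range(790))
print(len(train_ids), len(test_ids))  # 632 158
```

With 790 patients, an 80/20 split leaves 158 patients for testing, so each test-set metric rests on a fairly small sample.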

The ML models were a support vector machine, a random forest, a two-layer artificial neural network and an ML version of logistic regression (ML-LR). Given the clinical priority of correctly predicting patients with grade progression and the imbalanced nature of the dataset (a minority of patients had progression), the models were developed with clinical guidance to optimize the F1 score.
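To see why imbalance matters, consider a hypothetical baseline that predicts "no progression" for every patient. The short calculation below uses the cohort counts from the article (234 progressors among 790 patients); the baseline itself is an assumption introduced only for illustration and was not part of the study.

```python
n_patients = 790      # cohort size reported in the article
n_progressed = 234    # patients with grade progression

prevalence = n_progressed / n_patients

# Hypothetical baseline: predict "no progression" for everyone.
true_positives = 0
false_negatives = n_progressed
true_negatives = n_patients - n_progressed

accuracy = true_negatives / n_patients
sensitivity = true_positives / (true_positives + false_negatives)

print(f"prevalence of progression: {prevalence:.1%}")  # 29.6%
print(f"baseline accuracy: {accuracy:.1%}")            # 70.4%
print(f"baseline sensitivity: {sensitivity:.1%}")      # 0.0%
```

Such a baseline looks acceptable on accuracy yet identifies no patient who progresses, which is why a metric that rewards finding progressors, such as the F1 score, is a more natural optimization target here.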

The F1 score is the harmonic mean of the positive predictive value (also known as precision) and sensitivity. It can range from 0 to 1, with 1 indicating perfect precision and sensitivity.
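As a rough check on that definition, the sketch below computes the harmonic mean directly and applies it to the T-LR figures reported in the article (positive predictive value 62%, sensitivity 10.6%). The function name is illustrative and not taken from the study.

```python
def f1_from_ppv_and_sensitivity(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0  # convention when both components are zero
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Reported values for the traditional logistic regression model.
t_lr_f1 = f1_from_ppv_and_sensitivity(ppv=0.62, sensitivity=0.106)
print(round(t_lr_f1, 2))  # 0.18
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with high precision but very low sensitivity still receives a low F1 score.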

Primary Performance Metric

In the test set, the F1 scores were:

Support vector machine—0.59
ML-LR—0.52
Artificial neural network—0.39
Random forest—0.38
T-LR—0.18

All differences between the T-LR model and the ML models were statistically significant (P <0.001).

Other Metrics

Compared with the ML models, the T-LR model had the lowest sensitivity (11%) and negative predictive value (72%) but the highest specificity (97%) and positive predictive value (62%).
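These four metrics all derive from the same confusion matrix. The sketch below shows the relationships using hypothetical counts for a 158-patient test set. The article does not report the confusion matrix; the counts were chosen only so that the results land close to the T-LR figures quoted above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # progressors correctly flagged
        "specificity": tn / (tn + fp),  # non-progressors correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who truly progress
        "npv": tn / (tn + fn),          # cleared patients who truly do not
    }


# Hypothetical counts (NOT from the article): 47 progressors, 111 non-progressors.
metrics = classification_metrics(tp=5, fp=3, tn=108, fn=42)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this illustration the model flags only 8 of 158 patients, so it rarely raises a false alarm (high specificity) but misses most patients who progress (low sensitivity).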

Despite the low sensitivity of the T-LR model, its c-statistic (0.69) was higher than that of the random forest (0.60) and artificial neural network (0.55). The support vector machine was the overall best model by F1 score (0.59), sensitivity (72%), negative predictive value (85%) and c-statistic (0.70).
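A model can rank patients reasonably well, and so earn a respectable c-statistic, while still flagging few of them at a given decision threshold. The sketch below illustrates this with made-up risk scores. The scores and the 0.5 threshold are assumptions for illustration; the article does not state which threshold the models used.

```python
def c_statistic(scores_progressed, scores_stable):
    """Fraction of (progressor, non-progressor) pairs in which the
    progressor has the higher predicted risk; ties count as half."""
    concordant = 0.0
    for p in scores_progressed:
        for s in scores_stable:
            if p > s:
                concordant += 1.0
            elif p == s:
                concordant += 0.5
    return concordant / (len(scores_progressed) * len(scores_stable))


# Hypothetical predicted risks (not study data).
progressed = [0.45, 0.40, 0.35, 0.20]
stable = [0.30, 0.25, 0.15, 0.10]

auc = c_statistic(progressed, stable)
flagged = sum(score >= 0.5 for score in progressed)  # assumed 0.5 threshold

print(auc)                         # 0.875
print(flagged / len(progressed))   # 0.0 (sensitivity at the assumed threshold)
```

The c-statistic depends only on how patients are ordered, whereas sensitivity and the F1 score depend on where the cutoff is placed, so the two kinds of metric can disagree.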

Future Directions

This work did not achieve a robust model for predicting the progression of prostate cancer during AS. Still, it demonstrates the value of ML methods for that task compared with traditional statistical approaches. A likely next step will be to develop a prediction model that combines clinical data with a convolutional neural network that analyzes MRI data.

0.59: F1 score of a support vector machine (machine learning model) for predicting progression of prostate cancer during active surveillance

0.18: F1 score of a traditional logistic regression model for predicting progression of prostate cancer during active surveillance

72%: Sensitivity of a support vector machine (machine learning model) for predicting progression of prostate cancer during active surveillance

11%: Sensitivity of a traditional logistic regression model for predicting progression of prostate cancer during active surveillance

Urologic Oncology

Journal Article Published: August 29, 2021
