Artificial Intelligence Feasible for Predicting Malignant Transformation of Oral Lesions
Key findings
- Otolaryngologists at Mass Eye and Ear developed machine learning models that are up to 80% accurate in predicting malignant transformation in patients with a variety of oral lesions
- In the main dataset of 2,192 patients with the term "dysplasia" in a pathology report and at least one diagnosis code related to an oral lesion, the best-performing model had predictive accuracy of 80% and an area under the curve of 0.86
- When multiple models were trained on a subset of 1,232 patients with biopsy-proven dysplasia, a random forest model had predictive accuracy of 71% and an area under the curve of 0.75
- Dysplasia grade, history of multiple lesions, year of diagnosis, low hemoglobin, histopathologic description of atypia and history of other cancers were the most influential features for predicting malignant transformation
Oral cavity squamous cell carcinoma is often preceded by a precursor lesion, and the presence and severity of dysplasia in biopsy specimens is a critical risk factor for malignant transformation. However, it can be challenging to determine which patients are at the highest risk for malignant transformation.
Now, researchers in the Department of Otolaryngology–Head and Neck Surgery at Mass Eye and Ear have trained machine learning models to predict which dysplastic oral lesions may progress to malignancy based on information from electronic health records. Michael P. Wu, MD; Matthew G. Crowson, MD, MPH, MASc; Mark A. Varvares, MD, FACS; and a colleague report their findings in The Laryngoscope.
Methods
The team used clinical, demographic, laboratory and pathology data from a centralized registry that compiles information from multiple hospitals within the Mass General Brigham health system. More than 4.6 million patients are represented, with the earliest records dating to 1986.
On September 23, 2021, the researchers queried the registry and identified:
- Overall dataset—2,192 patients with the term "dysplasia" in a pathology report and at least one diagnosis code related to an oral lesion, of whom 34% progressed to malignancy
- Dysplasia subset—1,232 patients with a pathology-confirmed diagnosis of either dysplasia or submucosal fibrosis, of whom 54% progressed to malignancy
- No-dysplasia subset—960 patients with dyskeratosis/hyperkeratosis (43%), atypia (22%), inflammation (49%), lichen planus/lichenoid features (13%) and/or fungal forms (9%), of whom 10% progressed to malignancy
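As a quick arithmetic check on the cohort figures above, the two subsets should add up to the overall dataset, and their transformation rates, weighted by subset size, should land near the overall rate. The sketch below uses only the rounded numbers quoted in this article; the variable names are illustrative and the calculation is not taken from the study itself.

```python
# Cohort sizes and rounded malignant transformation rates as quoted in the article.
subsets = {
    "dysplasia": (1232, 0.54),
    "no_dysplasia": (960, 0.10),
}

# The two subsets should account for every patient in the overall dataset.
total_patients = sum(n for n, _ in subsets.values())

# Size-weighted average of the subset rates.
expected_events = sum(n * rate for n, rate in subsets.values())
pooled_rate = expected_events / total_patients

print(total_patients)
print(round(pooled_rate, 3))
```

The subsets sum to 2,192 patients, and the pooled rate comes out at roughly 35%, which is consistent with the reported 34% once rounding of the subset percentages is taken into account.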
After exploratory data analysis, 35 features were included in machine learning model development. Separate experiments were performed on the overall dataset and the dysplasia subset. In each experiment, the dataset under study was divided randomly into 95% for model development (70% training, 30% validation within the development set) and 5% for testing of model performance.
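The two-stage split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `split_indices`, the fixed random seed and the rounding of subset sizes are all assumptions, and the article does not say whether the split was stratified by outcome.

```python
import random


def split_indices(n_patients, test_frac=0.05, val_frac=0.30, seed=0):
    """Return (train, val, test) index lists from a two-stage random split."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability

    # Stage 1: set aside the holdout test set (5% in the article).
    n_test = round(n_patients * test_frac)
    test, development = indices[:n_test], indices[n_test:]

    # Stage 2: split the remaining development set into
    # validation (30%) and training (70%).
    n_val = round(len(development) * val_frac)
    val, train = development[:n_val], development[n_val:]
    return train, val, test


# Example with the size of the overall dataset.
train, val, test = split_indices(2192)
print(len(train), len(val), len(test))
```

With 2,192 patients, the holdout test set contains only about 110 patients, so performance figures measured on it carry some sampling uncertainty.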
Overall Dataset
A gradient boosting classifier model performed best in the 5% holdout dataset of all oral lesions. It was 80% accurate in predicting malignant transformation and the area under the receiver operating characteristic curve was 0.86. The presence of dysplasia was the most influential feature in predicting progression to malignancy.
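For readers less familiar with the two metrics, the sketch below shows how accuracy and area under the curve are typically computed from a model's predicted risk scores. The labels and scores are made-up toy values, not study data, and the 0.5 decision threshold is an assumption; the article does not state which threshold the researchers used.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a given score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = progressed to malignancy, 0 = did not (hypothetical values).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(accuracy(labels, scores))
print(roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the area under the curve summarizes how well the scores rank progressing lesions above non-progressing ones across all thresholds. That is why the two numbers can differ, as they do in the results reported here.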
Dysplasia Subset
The best-performing models in the dysplasia subset—logistic regression, random forest and linear discriminant analysis models—were each 73% accurate in the training phase.
The researchers used the random forest model to explore the relative influences of the 35 features on risk of malignant transformation. Dysplasia grade was the most important, followed by history of multiple lesions. The next most influential features were year of diagnosis, low hemoglobin, description of atypia on the histopathology report and history of other cancers.
Other influential features included inflammation, tongue subsite and fungal forms. Active smoking and active drinking status had a slight inverse relationship to malignant transformation, and neither sex nor race was strongly predictive.
When the random forest model was tested on the 5% holdout testing set, its accuracy was 71% and the area under the curve was 0.75.
Clinical Applications
In addition to being of interest for development of prediction models, these study results can guide clinical practice:
- The 10% malignant transformation rate in the no-dysplasia subset supports recent reports that some oral lesions previously thought to have limited malignant potential, such as non-reactive oral keratoses, may represent an early form of dysplasia
- The fact that atypia, inflammation and fungi were predictive in the dysplasia subset emphasizes that risk of malignant transformation should not be underestimated in lesions that may appear reactive but harbor dysplasia