2023 ISAKOS Biennial Congress Paper
Limited Clinical Utility Of A Preliminary Machine Learning Revision Prediction Model Based on A National Hip Arthroscopy Registry
R. Kyle Martin, MD, FRCSC, St. Cloud, MN UNITED STATES
Solvejg Wastvedt, BA, Minneapolis, MN UNITED STATES
Jeppe Lange, MD, PhD, Aarhus DENMARK
Ayoosh Pareek, MD, New York, NY UNITED STATES
Julian Wolfson, PhD, Minneapolis, MN UNITED STATES
Bent Lund, MD, Horsens DENMARK
University of Minnesota, Minneapolis, MN, UNITED STATES
FDA Status Not Applicable
Summary
Machine learning analysis of the Danish Hip Arthroscopy Registry produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness.
Abstract
Purpose
Several risk factors associated with revision hip arthroscopy have been identified; however, the ability to translate these preoperative factors into a patient-specific risk score is poor. A clinical tool that estimates a patient’s individual risk of subsequent revision hip arthroscopy would be a valuable adjunct for the surgeon, guiding discussions about surgical decision-making and expectations. Machine learning has the potential to improve predictive capability through analysis of large clinical datasets. These models can identify factors associated with outcome and use them to formulate prospective predictive algorithms. The purpose of this study was to determine whether machine learning analysis of the Danish Hip Arthroscopy Registry (DHAR) can produce a clinically meaningful calculator for predicting the probability that a patient undergoes revision surgery following primary hip arthroscopy.
Methods
Machine learning analysis was performed on the DHAR. The primary outcome for the models was the probability of revision hip arthroscopy within 1, 2, and/or 5 years after primary hip arthroscopy. Data were split randomly into training (75%) and test (25%) sets. Four models intended for this type of data were tested: Cox elastic net, random survival forest, gradient boosted regression (GBM), and super learner. These four models represent a range of approaches to statistical details such as variable selection and model complexity. Model performance was assessed by calculating calibration and area under the curve (AUC). Analysis was first performed using only variables available in the preoperative clinical setting and then repeated using all variables available in the registry to compare model performance.
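As an illustration only, the random 75%/25% split and a concordance-type performance measure can be sketched in plain Python. This is not the study's code: the function names, the toy records, and the choice of Harrell's concordance index as the metric are assumptions made for the example, and none of the four survival models is implemented here.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when patient i had an observed revision
    (event) and a shorter time than patient j. The pair is concordant when
    patient i also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy records: (years to revision or last follow-up, revised?, risk score)
toy = [(1.2, True, 0.8), (4.0, False, 0.2), (2.5, True, 0.6), (5.1, False, 0.3),
       (3.3, False, 0.5), (0.9, True, 0.4), (6.0, False, 0.1), (2.0, False, 0.7)]
train, test = train_test_split(toy)
times, events, scores = zip(*train)
print(len(train), len(test), concordance_index(times, events, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for calling values in the 0.6 range moderate.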
Results
In total, 5,581 patients were included for analysis. Mean follow-up time or time-to-revision was 4.25 (±2.51) years and the overall revision rate was 11%. All four models were generally well calibrated and demonstrated concordance in the moderate range, both when restricted to preoperative variables (0.62-0.67) and when considering all variables available in the registry (0.63-0.66). The 95% confidence intervals for model concordance were wide for both analyses, ranging from a low of 0.53 to a high of 0.75, indicating uncertainty about the true accuracy of the models.
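The abstract does not state how the 95% confidence intervals for concordance were obtained. One common approach, shown here purely as an assumption, is a percentile bootstrap over the test-set patients. The sketch below uses synthetic data and hypothetical function names (`make_synthetic_rows`, `c_index`, `bootstrap_ci`). It only illustrates how such an interval can be computed and does not reproduce the study's figures.

```python
import random


def make_synthetic_rows(n, seed):
    """Synthetic (time, event, risk) rows: an event rate near 11%, a weak risk signal."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.random()
        time = rng.expovariate(0.5 + risk)  # higher risk gives shorter times on average
        event = rng.random() < 0.11
        rows.append((time, event, risk))
    return rows


def c_index(rows):
    """Harrell's C for rows of (time, event, risk); None if no comparable pairs."""
    concordant, comparable = 0.0, 0
    for t_i, e_i, r_i in rows:
        if not e_i:
            continue
        for t_j, _, r_j in rows:
            if t_i < t_j:
                comparable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / comparable if comparable else None


def bootstrap_ci(rows, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the concordance index."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        sample = [rng.choice(rows) for _ in rows]  # resample patients with replacement
        value = c_index(sample)
        if value is not None:
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


rows = make_synthetic_rows(200, seed=1)
print(c_index(rows), bootstrap_ci(rows))
```

Because only patients with an observed revision anchor comparable pairs, a low event rate leaves relatively few informative pairs in a test set, which is one reason such intervals can be wide.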
Conclusion
The association between presurgical factors and outcome following hip arthroscopy is complex. Machine learning analysis of the DHAR produced a model capable of predicting revision surgery risk following primary hip arthroscopy that demonstrated moderate accuracy but likely limited clinical usefulness due to the wide confidence intervals. Prediction accuracy would benefit from enhanced data quality within the registry, and this preliminary study holds promise for future model generation as the DHAR matures. Ongoing collection of high-quality data by the DHAR should enable improved patient-specific outcome prediction that is generalizable across the population. While the results from this preliminary study are not suitable for immediate clinical application, they should serve as a baseline for future outcome prediction studies applying machine learning to large hip arthroscopy datasets. There is also reason for optimism about the future development of patient-specific revision risk estimation if data collection can be improved.
Level of Evidence: Level-III