2025 ISAKOS Biennial Congress Paper
Multicentric Development and Validation of a Fully Automated Artificial Intelligence System for Planning of Medial Open Wedge High Tibial Osteotomy on Weight-Bearing Anterior-Posterior Long Leg Radiographs
Marco-Christopher Rupp, MD, Munich, Bavaria GERMANY
Felix Lindner, cand. med., Munich GERMANY
Matthias Feucht, MD, Munich GERMANY
Lorenz Fritsch, MD, Munich GERMANY
Rüdiger von Eisenhart-Rothe, MD, MBA, Schwandorf, Bayern GERMANY
Matthias Jung, MD, Freiburg GERMANY
Maximilian F. Russe, MD, Freiburg GERMANY
Sebastian Siebenlist, MD, MHBA, Prof., Munich, Bavaria GERMANY
Nikolas Wilhelm, M.Sc. PhD, Munich GERMANY
Department of Sports Orthopaedics, Technical University of Munich, GERMANY
FDA Status Not Applicable
Summary
This multicentric study developed and validated an artificial intelligence model for fully automated high tibial osteotomy planning. The model demonstrated precision and reliability comparable to expert orthopedic surgeons while significantly reducing processing time, highlighting the potential of this technology to assist in labor-intensive tasks.
Abstract
Background
Preoperative planning, currently performed by orthopedic surgeons (OS), is essential for successful osteotomy but is labor-intensive and error-prone. Automated artificial intelligence (AI) algorithms therefore have significant potential to augment the abilities of OS in managing lower extremity pathologies, particularly in high-volume tasks that demand high precision and reliability. The aim was to develop a deep learning (DL) system capable of fully autonomous planning of a high tibial osteotomy (HTO) on weight-bearing anterior-posterior (a.p.) long leg radiographs (LLR), and to validate the clinical accuracy, reliability, and processing time of the DL model against expert OS on multicentric test data sets (2 institutions).
Methods
A total of 594 patients (age 41.1±13.2 years, 182 female) who underwent osteotomy between 01/2010 and 01/2021 were retrospectively enrolled. Manual annotations for lower extremity alignment and high tibial landmarks were performed on a.p. LLR. The data set was divided into training (60%, n=399), validation (10%, n=59), and hold-out testing (30%, n=136), with an additional external data set (n=100) for further testing. A DL system with twelve expert networks was trained to identify relevant landmarks for HTO planning. The DL model was then tested on fully automated planning of a medial open wedge HTO based on 1) the desired postoperative intersection of the mechanical axis with the tibial plateau (%-based) and 2) the desired femorotibial axis (°-based). Accuracy, reliability, and processing time were compared with those of three expert orthopedic surgeons using FDA-approved preoperative planning software on the test data sets.
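The two planning targets described in the Methods can both be derived from a small set of landmarks. The following Python sketch is illustrative only and is not the authors' implementation: the function names, the choice of landmarks (hip center, knee center, ankle center, medial and lateral tibial plateau edges), and the 2D image-coordinate convention are all assumptions made for this example.

```python
import math

def plateau_intersection_percent(hip, ankle, plateau_medial, plateau_lateral):
    """Hypothetical helper: where the hip-ankle line crosses the tibial
    plateau line, as a percentage of plateau width measured from the
    medial edge (0 = medial edge, 100 = lateral edge).
    All points are (x, y) tuples in the same image coordinate system."""
    hx, hy = hip
    mx, my = plateau_medial
    dx, dy = ankle[0] - hx, ankle[1] - hy                      # hip -> ankle
    px, py = plateau_lateral[0] - mx, plateau_lateral[1] - my  # medial -> lateral
    denom = px * dy - py * dx
    if denom == 0:
        raise ValueError("mechanical axis is parallel to the plateau line")
    # Solve medial + t * plateau_dir = hip + s * axis_dir for t.
    t = ((hx - mx) * dy - (hy - my) * dx) / denom
    return 100.0 * t

def femorotibial_deviation_deg(hip, knee, ankle):
    """Hypothetical helper: unsigned angle in degrees between the femoral
    mechanical axis (hip -> knee) and the tibial mechanical axis
    (knee -> ankle). A perfectly straight leg gives 0."""
    fx, fy = knee[0] - hip[0], knee[1] - hip[1]
    tx, ty = ankle[0] - knee[0], ankle[1] - knee[1]
    cross = fx * ty - fy * tx
    dot = fx * tx + fy * ty
    return math.degrees(math.atan2(abs(cross), dot))
```

In a %-based plan, the correction would be chosen so that the first value reaches the desired target after the osteotomy; in a °-based plan, the second value is driven toward the desired angle. The sign convention for varus versus valgus is deliberately left out here, since the abstract does not specify one.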
Results
The Sørensen-Dice coefficient for agreement between the landmarks annotated by the OS and those predicted by the DL model was 0.94±0.02.
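For readers unfamiliar with the metric, the Sørensen-Dice coefficient of two regions A and B is 2|A∩B| / (|A| + |B|). The abstract does not state how landmark annotations were turned into comparable regions, so the minimal Python sketch below simply assumes two binary masks of equal shape given as nested lists; the function name and the input format are assumptions for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice coefficient of two binary masks of equal shape.
    Masks are nested lists of 0/1 values (an assumed input format)."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(1 for a in flat_a if a) + sum(1 for b in flat_b if b)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree fully
    return 2.0 * intersection / total
```

A value of 1.0 means the two masks overlap completely and 0.0 means they do not overlap at all.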
For preoperative alignment analysis, compared to the ground truth, human preoperative accuracy ranged from 0.13±0.10° to 1.06±1.4°, reliability ranged from ICC 0.84-1.0, and clinically acceptable accuracy ranged from 72%-100%. Preoperative accuracy of the DL model ranged from 0.22±0.18° to 1.04±1.28°, reliability ranged from ICC 0.93-1.0, and clinically acceptable accuracy ranged from 91%-100%.
For the planning of the osteotomy gap, compared to the ground truth, human accuracy ranged from 0.46±0.65 mm to 0.5±0.88 mm (%-based) and from 0.43±0.55 mm to 0.52±0.75 mm (°-based), reliability ranged from ICC 0.9-0.96 (%-based) and 0.95-0.97 (°-based), and clinically acceptable accuracy was 93.6%-96.0% (%-based) and 89.8%-93.9% (°-based). For the DL model, the accuracy was 0.53±0.63 mm (%-based) and 0.44±0.53 mm (°-based), reliability was ICC 0.99 (both methods), and clinically acceptable accuracy was 98.1% (%-based) and 96.2% (°-based).
The DL model significantly and substantially outperformed the OS in intrarater reliability (1.0 vs 0.88) and in processing time for a fully automated alignment analysis (25.3±0.8 seconds vs 148.7±10.1 seconds, p < 0.01).
Conclusion
The developed DL model allowed for fully automated planning of a medial open wedge HTO on a.p. LLR with precision, reliability, and robustness comparable to that of an expert OS, and did not fail on a single image during testing. By significantly and substantially outperforming human raters in processing time and in repeated-measurement reliability, DL models such as this one have the potential to assist orthopedic providers by accelerating and enhancing osteotomy planning in clinical practice.