2025 ISAKOS Biennial Congress ePoster
Predicting Ankle Injury Severity in High School Football Players: A Machine Learning Approach
Rohan M Shah, BA, Chicago, IL UNITED STATES
Dillan Prasad, MS, Glencoe, IL UNITED STATES
Ravi Ameet Patel, BA, Scottsdale, AZ UNITED STATES
Nicholas D'ambrose, BA, Chicago, IL UNITED STATES
Christy Collins, PhD, Chicago UNITED STATES
Michael Terry, MD, Chicago, IL UNITED STATES
Vehniah K. Tjong, MD, FRCSC, Chicago, IL UNITED STATES
Northwestern University Feinberg School of Medicine, Chicago, Illinois, UNITED STATES
FDA Status Not Applicable
Summary
Machine learning models were applied to a national high school injury database to predict the severity of ankle injury in high school football players, and feature importance analysis identified the most predictive variables.
Abstract
Purpose
High school American football players are predisposed to lower extremity injuries, including ankle injuries. A severe ankle injury during the developing years can affect not only an athlete's immediate health but also their injury risk far into the future. Preventing these injuries is paramount to improving health outcomes for young athletes. We aimed to identify the factors associated with severe ankle injury through broad data analysis. This study uses machine learning models to predict the severity of ankle injury in high school football players from a national high school sports injury database.
Methods
The High School Reporting Information Online (RIO) database was searched for all ankle injuries sustained in high school football. The nationwide database documents high school athletic exposures and injuries from 2005 to 2020. Sixteen predictor variables were included, covering injury type and assessment, patient demographics, treatment type, and football-specific descriptors. Missing values for any variable were replaced with zero (simple numerical zeroing). The outcome of interest was severe ankle injury, defined as return to sport after more than 22 days or medical disqualification for the season. Four machine learning (ML) algorithms (Logistic Regression [LR], Random Forest [RF], Support Vector Classifier [SVC], and eXtreme Gradient Boosting [XGBoost]) were trained, and model performance was assessed with the Area Under the Curve (AUC) statistic. For the top-performing model, feature importance was analyzed using SHAP scores.
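The outcome definition, the zero-filling of missing values, and the AUC statistic described above can be illustrated with a minimal standard-library sketch. It is not the study's pipeline: the function names, the field names, and the toy scores are hypothetical, and AUC is assumed here to mean the area under the receiver operating characteristic curve, computed as the probability that a randomly chosen severe injury receives a higher score than a randomly chosen non-severe one.

```python
from typing import Dict, List, Optional

SEVERE_DAYS_THRESHOLD = 22  # from the Methods: return to sport > 22 days


def is_severe(days_to_return: Optional[float], disqualified: bool) -> int:
    """Label an injury 1 (severe) or 0, following the abstract's definition."""
    if disqualified:  # medical disqualification for the season
        return 1
    return int(days_to_return is not None and days_to_return > SEVERE_DAYS_THRESHOLD)


def zero_impute(row: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Replace missing (None) predictor values with 0."""
    return {name: (0 if value is None else value) for name, value in row.items()}


def auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC via pairwise comparison of positives and negatives (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the RIO database).
labels = [is_severe(30, False), is_severe(5, False),
          is_severe(None, True), is_severe(22, False)]   # [1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7]                            # hypothetical model outputs
print(zero_impute({"weight": None, "age": 16}))          # {'weight': 0, 'age': 16}
print(auc(labels, scores))                               # 0.75
```

Note that an injury with exactly 22 days to return is labeled non-severe, because the stated threshold is strictly greater than 22 days.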
Results
A total of 4,999 ankle injuries were included. Injured athletes had a mean age of 16.05 years (SD = 2.01), height of 69.8 inches (SD = 3.5), and weight of 184.2 pounds (SD = 39.2). The highest performing ML model was RF (AUC: 0.852), followed closely by XGBoost (AUC: 0.85), then LR (AUC: 0.827) and SVC (AUC: 0.808). Feature importance analysis of the RF model found the most influential variables to be ligament tear severity of 3 (coefficient: 2.813) and assessment method of x-ray (coefficient: 2.109). Other important features included weight of the athlete (national weight of 185.15 [coefficient: 1.669] and national weight of 237.62 [coefficient: 1.44]) and assessment method of surgery (coefficient: 1.64). Among football-specific features, stepping on the team mat (coefficient: 0.9358) and playing in punt coverage (coefficient: 0.8904) had the highest influence in predicting severity of ankle injury.
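SHAP scores are built on Shapley values, which split a model's prediction for one case into per-feature contributions. The sketch below computes exact Shapley values for a tiny hand-written scoring function by enumerating feature subsets. It illustrates the underlying quantity only: it does not use the SHAP library, and the function `toy_model`, its weights, and the feature names are hypothetical and unrelated to the values reported above.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(x)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {g: (x[g] if g in subset else baseline[g]) for g in names}
                with_f = {**without, f: x[f]}
                # Marginal effect of adding feature f to this coalition.
                total += weight * (model(with_f) - model(without))
        phi[f] = total
    return phi


def toy_model(feat: Dict[str, float]) -> float:
    """Hypothetical severity score with one interaction term (arbitrary weights)."""
    return (2.0 * feat["grade3_tear"] + 1.0 * feat["xray"]
            + 0.5 * feat["grade3_tear"] * feat["xray"])


athlete = {"grade3_tear": 1, "xray": 1, "punt_coverage": 1}   # invented case
baseline = {"grade3_tear": 0, "xray": 0, "punt_coverage": 0}  # reference case
phi = shapley_values(toy_model, athlete, baseline)
for name, value in phi.items():
    print(f"{name}: {value:.2f}")
```

The contributions sum to the difference between the score for the case and the score for the baseline, and a feature the model never uses (here `punt_coverage`) receives a contribution of zero. Exact enumeration grows exponentially with the number of features, so it is practical only for small illustrations like this one.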
Conclusions
Machine learning models can successfully predict ankle injury severity in high school boys' football. Specifically, feature importance analysis identified ligament tear severity, x-ray assessment, and athlete weight as the top predictors of severe ankle injury.