Predicting Ankle Injury Severity in High School Football Players: A Machine Learning Approach

Rohan M Shah, BA, UNITED STATES; Dillan Prasad, MS, UNITED STATES; Ravi Ameet Patel, BA, UNITED STATES; Nicholas D'ambrose, BA, UNITED STATES; Christy Collins, PhD, UNITED STATES; Michael Terry, MD, UNITED STATES; Vehniah K. Tjong, MD, FRCSC, UNITED STATES

Northwestern University Feinberg School of Medicine, Chicago, Illinois, UNITED STATES


2025 Congress ePoster Presentation

 

Sports Medicine


Summary: Machine learning models were applied to a national high school injury database to predict the severity of ankle injuries in high school football players, and feature importance analysis identified the most predictive variables.


Purpose

High school American football players are predisposed to lower extremity injuries, including ankle injuries. A severe ankle injury during the developing years can affect these athletes not only in the short term but also through elevated injury risk far into the future. Preventing these injuries is paramount to improving the health outcomes of young athletes. We aim to identify the factors associated with severe ankle injury through broad data analysis. This study uses machine learning models to predict the severity of ankle injuries in high school football players from a national high school sports injury database.

Methods

The High School Reporting Information Online (RIO) database was searched for all ankle injuries sustained in high school football. This nationwide database documents high school athletic exposures and injuries from 2005 to 2020. Sixteen predictor variables were included, covering injury type and assessment, patient demographics, treatment type, and football-specific descriptors. Missing values were imputed with zero. The outcome of interest was a severe ankle injury, defined as return to sport after more than 22 days or medical disqualification for the season. Four ML algorithms were trained: Logistic Regression [LR], Random Forest [RF], Support Vector Classifier [SVC], and eXtreme Gradient Boosting [XGBoost]. Model performance was assessed with the Area Under the Curve (AUC) statistic. For the top-performing model, a feature importance analysis was performed using SHAP scores.
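The outcome definition, zero imputation, and AUC described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names, the field names (`weight`, `age`, `height`), and the record layout are hypothetical, and the study presumably relied on an ML library rather than a hand-written AUC.

```python
from typing import Optional, Sequence

# From the abstract: severe = return to sport after more than 22 days,
# or medical disqualification for the season.
SEVERE_DAYS_THRESHOLD = 22


def is_severe(days_to_return: Optional[float], season_ending: bool) -> int:
    """Return 1 for a severe injury, 0 otherwise (hypothetical helper)."""
    if season_ending:
        return 1
    return int(days_to_return is not None and days_to_return > SEVERE_DAYS_THRESHOLD)


def zero_impute(row: dict, fields: Sequence[str]) -> dict:
    """Replace missing values (None or absent key) with 0 for the listed fields."""
    return {f: (0 if row.get(f) is None else row[f]) for f in fields}


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under this pairwise reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of severe from non-severe injuries. Zero imputation is simple, but it treats a missing value the same as a true zero, which is worth keeping in mind for variables such as weight.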

Results

A total of 4,999 ankle injuries were included. Injured athletes had a mean age of 16.05 years (SD = 2.01), height of 69.8 inches (SD = 3.5), and weight of 184.2 pounds (SD = 39.2). The highest-performing ML model was RF (AUC: 0.852), followed closely by XGBoost (AUC: 0.85), then LR (AUC: 0.827) and SVC (AUC: 0.808). Feature importance analysis of the RF model found the most influential variables to be ligament tear severity of 3 (coefficient: 2.813) and assessment method of x-ray (coefficient: 2.109). Other important features included weight of the athlete (national weight of 185.15 [coefficient: 1.669] and national weight of 237.62 [coefficient: 1.44]) and assessment method of surgery (coefficient: 1.64). Among football-specific features, stepping on the team mat (coefficient: 0.9358) and playing in punt coverage (coefficient: 0.8904) had the greatest influence on predicted ankle injury severity.
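For readers unfamiliar with SHAP-style feature importance, the idea behind it can be sketched by computing exact Shapley values for a toy model. Everything here is illustrative: `toy_score`, its weights, and the two binary features are made up and are not the study's model or results, and the abstract does not state which SHAP implementation was used.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value. The cost
    grows exponentially with the number of features, so this is only
    practical for very small examples.
    """
    names = list(x)
    n = len(names)

    def value(coalition: set) -> float:
        point = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return predict(point)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_score(p: Dict[str, float]) -> float:
    """Hypothetical severity score with an interaction term (made-up weights)."""
    return 2.0 * p["grade3_tear"] + 1.0 * p["xray"] + 0.5 * p["grade3_tear"] * p["xray"]


phi = shapley_values(
    toy_score,
    x={"grade3_tear": 1.0, "xray": 1.0},
    baseline={"grade3_tear": 0.0, "xray": 0.0},
)
```

In this toy case the attributions sum to the difference between the score for the instance and the score for the baseline, and the interaction term is split evenly between the two features. That additivity is the property that lets per-feature SHAP values be read as contributions to a single prediction.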

Conclusions

Machine learning models can be used successfully to predict ankle injury severity in high school boys' football. Specifically, feature importance analysis identified ligament tear severity, x-ray assessment, and athlete weight as the top predictors of severe ankle injury.