Summary
Machine learning was successfully used to predict the severity of shoulder injuries in a cohort of high school football athletes.
Abstract
Introduction
Severe injuries in high school sports can present substantial risks, including physical and emotional consequences, as well as the disruption of young athletic careers. In high school football, injuries are common and can occasionally cause long-term damage. Identifying sport-specific factors that predispose athletes to severe injuries can guide regulatory changes that reduce adverse events, identify players at elevated risk, and set post-injury expectations. The present study aims to apply machine and deep learning techniques to predict shoulder injury severity in a national high school sports injury database.
Methods
The High School Reporting Information Online (RIO) database was queried for all shoulder injuries sustained by athletes playing boys' football. High School RIO is a national, deidentified database that has captured athletic exposures and injuries since the 2005–2006 academic year. Injury severity was dichotomized by time to return to sport (RTS), with prolonged RTS defined as ≥22 days, including medical disqualification for the season. A total of 14 predictors encompassing demographic information, injury setting, and sport-specific factors were included. Three machine learning (ML) algorithms (balanced random forest [RF], elastic-net regression [ENet], and gradient boosted trees [GBM]) and one neural network (NN) were developed. Model performance was measured using the area under the receiver operating characteristic curve (AUC). Additionally, a feature importance analysis using SHapley Additive exPlanations (SHAP) values was conducted for the top-performing model.
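The outcome definition and the evaluation metric can be made concrete with a short sketch. This is not the authors' code, which the abstract does not describe; the function names `is_prolonged_rts` and `roc_auc` and the toy inputs are hypothetical. The sketch assumes each injury record carries a day count to RTS and a flag for season-ending medical disqualification, and it computes AUC in its pairwise (Mann-Whitney) form: the probability that a randomly chosen prolonged-RTS case receives a higher predicted score than a randomly chosen non-prolonged case, with ties counted as one half.

```python
from typing import Optional, Sequence

# Threshold from the Methods: prolonged RTS is >= 22 days (or season-ending disqualification).
PROLONGED_RTS_DAYS = 22


def is_prolonged_rts(days_to_rts: Optional[int], medically_disqualified: bool) -> int:
    """Hypothetical helper: 1 for prolonged RTS, 0 otherwise."""
    if medically_disqualified:
        # Medical disqualification for the season counts as prolonged RTS.
        return 1
    if days_to_rts is None:
        raise ValueError("days_to_rts is required unless the athlete was disqualified")
    return int(days_to_rts >= PROLONGED_RTS_DAYS)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as P(score of a positive > score of a negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage (made-up records, not study data): (days to RTS, medically disqualified)
records = [(5, False), (30, False), (None, True), (21, False)]
labels = [is_prolonged_rts(d, dq) for d, dq in records]
scores = [0.1, 0.7, 0.4, 0.5]  # hypothetical model-predicted probabilities
print(labels)                   # [0, 1, 1, 0]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the model AUCs reported below can be read. The pairwise loop is quadratic, but at this cohort's size (355 positives by roughly 2,050 negatives) it remains small.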
Results
A total of 2,405 athletes were included in this study, with an average age of 16.1 years (SD: 1.2), height of 70.0 inches (SD: 3.9), and weight of 182.2 pounds (SD: 54.4). A total of 355 athletes (14.8%) experienced a prolonged return to sport. The GBM had the best performance (AUC: 0.61 ± 0.01), followed by the ENet (AUC: 0.60 ± 0.01), RF (AUC: 0.60 ± 0.02), and NN (AUC: 0.57 ± 0.02). On feature importance analysis, level of play (varsity, junior varsity, sophomore, or freshman), weight, and recurrent injury status were clearly the top three predictors. Age and height also ranked among the top five predictors.
Conclusions
Machine learning was successfully used to predict prolonged RTS in high school boys' football shoulder injuries. Additionally, a feature importance analysis identified level of play, athlete weight, and recurrent injury status as the top predictors of prolonged RTS.