Data-Centric AI for predicting non-contact injuries in professional soccer players

  • Matheus Melo Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Matheus Maia Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Gabriel Padrão Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Diego Brandão Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Eduardo Bezerra Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Juliano Spineti Fluminense Football Club
  • Lucas Giusti Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)
  • Jorge Soares Centro Federal de Educação Tecnológica Celso Suckow da Fonseca (CEFET/RJ)


A major concern for professional soccer teams is finding preventive measures that reduce the frequency of injury episodes among their athletes, since these episodes weigh heavily on the sports industry, affecting both the team's performance and the association's finances. This work therefore proposes a methodology, based on Data-Centric AI concepts, to predict non-contact injury episodes that may affect athletes within a microcycle. The prediction model is trained on a dataset of professional soccer athletes. The best result reached an AUC-ROC of 79.8%. Among the performance improvement strategies applied, the best undersampling ratio was 70/30, PCA performed best with one or two principal components, and the Decision Tree was the best-performing algorithm.
Keywords: Professional soccer, Injury prediction, Machine learning, Sports injuries, Data Science, Data-Centric AI
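
To make two of the terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the 70/30 undersampling ratio means 70% non-injury (majority) samples to 30% injury (minority) samples, which the abstract does not state, and it computes AUC-ROC as the probability that a randomly chosen injury case receives a higher score than a randomly chosen non-injury case. The function names `undersample` and `auc_roc` and the toy labels are hypothetical; PCA and the Decision Tree are not covered here.

```python
import random


def undersample(labels, majority_share=0.7, seed=0):
    """Return sorted indices that keep every minority sample (label 1) and a
    random subset of majority samples (label 0), sized so the majority class
    makes up about `majority_share` of the result.

    Assumption: 70/30 is read as majority/minority; the source does not say.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Number of majority samples needed to reach the requested class ratio.
    target = round(len(minority) * majority_share / (1 - majority_share))
    kept = rng.sample(majority, min(target, len(majority)))
    return sorted(minority + kept)


def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC-ROC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 30 injury records and 270 non-injury records.
labels = [1] * 30 + [0] * 270
idx = undersample(labels)
balanced = [labels[i] for i in idx]
```

Under this reading, undersampling discards only majority-class records, so every injury record is preserved for training. The pairwise AUC-ROC is quadratic in the number of samples and is meant for illustration, not for large datasets.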


