Exploring the Impact of GitHub Actions on Pull Request Reviews in Machine Learning Projects

João Helis Bernardo; Daniel Alencar da Costa; Sérgio Queiroz de Medeiros; Uirá Kulesza

doi:10.5753/sbes.2025.10766

João Helis Bernardo (UFRN / IFRN)
Daniel Alencar da Costa (University of Otago)
Sérgio Queiroz de Medeiros (UFRN)
Uirá Kulesza (UFRN)

Abstract

Continuous Integration (CI) tools such as GitHub Actions were originally designed to streamline development workflows in traditional software systems by automating tasks such as building and testing, and this automation has proven beneficial for review efficiency. However, machine learning (ML) projects present additional complexities, such as non-determinism, challenging testing processes, and longer build durations, that may limit the effectiveness of CI in supporting efficient reviews. Given these challenges, it is essential to reassess how CI tools affect the review process in ML contexts. This study empirically investigates the impact of GitHub Actions on pull request (PR) review dynamics across 55 GitHub-based ML projects, focusing on metrics such as time to close a PR (i.e., PR latency), PR churn, comments, and PR submission frequency. Using a Regression Discontinuity Design (RDD), we analyze PR data from the 12 months before and after the adoption of GitHub Actions. Our results show that GitHub Actions does not significantly reduce PR review times in ML projects; factors such as PR churn and backlog size play a larger role in review efficiency. Additionally, rejected PRs were characterized by higher churn and more extensive discussions. These findings suggest that, while CI tools automate repetitive tasks and reduce manual workload, they may not fully address the particular demands of ML project reviews. We provide practical recommendations to enhance review efficiency in ML workflows, including strategies for incremental PR submissions and optimized backlog management.

Keywords: continuous integration, machine learning, GitHub Actions, software engineering for ML, review efficiency
