Towards Effective Collaboration between Software Engineers and Data Scientists developing Machine Learning-Enabled Systems

Gabriel Busquim; Allysson Allex Araújo; Maria Julia Lima; Marcos Kalinowski

doi:10.5753/sbes.2024.3027

Gabriel Busquim PUC-Rio http://orcid.org/0009-0004-9048-2097
Allysson Allex Araújo UFCA https://orcid.org/0000-0003-2108-2335
Maria Julia Lima PUC-Rio https://orcid.org/0000-0003-3843-021X
Marcos Kalinowski PUC-Rio https://orcid.org/0000-0003-1445-3425

Abstract

Incorporating Machine Learning (ML) into existing systems is a growing demand across organizations. However, developing ML-enabled systems involves social and technical challenges that must be addressed by actors with different fields of expertise working together. This paper aims to understand how to enhance collaboration between two key actors in building these systems: software engineers and data scientists. We conducted two focus group sessions with experienced data scientists and software engineers working on real-world ML-enabled systems to assess the relevance of different recommendations for specific technical tasks. We found that collaboration between these actors is important for effectively developing ML-enabled systems, especially when defining data access and ML model deployment. Participants provided concrete examples of how recommendations from the literature can benefit collaboration during different tasks. For example, defining clear responsibilities for each team member and creating concise documentation can improve communication and overall performance. Our study contributes to a better understanding of how to foster effective collaboration between software engineers and data scientists creating ML-enabled systems.

Keywords: Machine Learning, ML-enabled System, Data Science, Software Engineering, Collaboration
