A Case Study on Data Science Processes in an Academia-Industry Collaboration

  • Sara Silva Unicamp
  • Breno Bernard Nicolau De França Unicamp

Resumo


Data Science (DS) is emerging in major software development projects and often needs to follow software development practices. Therefore, DS processes will likely continue to attract Software Engineering (SE) practices and vice-versa. This case study aims to map and describe a software development process for Machine Learning(ML)-enabled applications and associated practices used in a real DS project at the Recod.ai laboratory in collaboration with an industrial partner. The focus was to analyze the process and identify the strengths and primary challenges, considering their expertise in robust ML practices and how they can contribute to general software quality. To achieve this, we conducted semi-structured interviews and analyzed them using procedures from the Straussian Grounded Theory. The results showed that the DS development process is iterative, with feedback between activities, which differs from the processes in the literature. Additionally, this process presents a greater involvement of domain experts. Besides, the team prioritizes software quality characteristics (attributes) in these DS projects to ensure some aspects of the final product’s quality, i.e., functional correctness and robustness. To achieve those, they use regular accuracy metrics and include explainability and data leakage as quality metrics during training. Finally, the software engineer’s role and its responsibilities differ from those of a traditional industry software engineer, as s/he is involved in most of the process steps. These characteristics can contribute to high-quality models achieving the partner needs and, consequently, relevant contributions to the intersection between SE and DS.
Palavras-chave: data science, machine learning, case study, software processes
Publicado
07/11/2023
Como Citar

Selecione um Formato
SILVA, Sara; FRANÇA, Breno Bernard Nicolau De. A Case Study on Data Science Processes in an Academia-Industry Collaboration. In: SIMPÓSIO BRASILEIRO DE QUALIDADE DE SOFTWARE (SBQS), 22. , 2023, Brasília/DF. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023 . p. 1–10.