Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development

  • João Lucas Correia UFAL
  • Juliana Alves Pereira PUC-Rio
  • Rafael Mello CEFET-RJ
  • Alessandro Garcia PUC-Rio
  • Baldoino Fonseca UFAL
  • Márcio Ribeiro UFAL
  • Rohit Gheyi UFCG
  • Marcos Kalinowski PUC-Rio
  • Renato Cerqueira IBM Research Brazil
  • Willy Tiengo UFAL

Abstract


Data scientists often develop machine learning models to solve a variety of problems in industry and academia. To build these models, these professionals usually perform activities that are also part of the traditional software development lifecycle, such as eliciting and implementing requirements. One might argue that data scientists could rely on the engineering practices of traditional software development to build machine learning models. However, machine learning development has particular characteristics that may raise challenges and call for new practices. The literature lacks a characterization of this knowledge from the perspective of data scientists. In this paper, we characterize challenges and practices in the engineering of machine learning models that deserve attention from the research community. To this end, we performed a qualitative study with eight data scientists from five different companies, who have different levels of experience in developing machine learning models. Our findings suggest that: (i) data processing and feature engineering are the most challenging stages in the development of machine learning models; (ii) synergy between data scientists and domain experts is essential in most stages; and (iii) the development of machine learning models lacks the support of a well-engineered process.
Keywords: Software Engineering, Machine Learning, Practitioner, Empirical Study
Published
01/12/2020
How to Cite

CORREIA, João Lucas et al. Brazilian Data Scientists: Revealing their Challenges and Practices on Machine Learning Model Development. In: SIMPÓSIO BRASILEIRO DE QUALIDADE DE SOFTWARE (SBQS), 19., 2020, São Luiz do Maranhão. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 91-100.