Inside Commits: An Empirical Study on Commits in Open-Source Software

  • Mívian Ferreira UFMG
  • Diego Golçalves CEFET-MG
  • Kecia Ferreira CEFET-MG
  • Mariza Bigonha UFMG

Abstract


GitHub is currently the most popular open-source software hosting platform, containing about 20 million public repositories. Many studies rely on data mined from GitHub repositories, especially commits. However, not knowing the characteristics of commits may introduce biases and threats in those studies. This work presents an empirical study that characterizes commits in terms of three aspects: the categories of activities performed in commits, the co-occurrence of activities within a commit, and the size of commits by category. We analyzed 1M commits from the 24 most popular and most active Java-based projects hosted on GitHub. The main findings are that reengineering is the most frequent activity; 30% of commits involve more than one type of activity; and the most common co-occurrence of activities in commits is reengineering with forwarding and corrective reengineering, although at a low rate of only 8%. Empirical studies should take these results into account to avoid threats and biases when working with commit data.
Published
29/09/2021
How to Cite

FERREIRA, Mívian; GOLÇALVES, Diego; FERREIRA, Kecia; BIGONHA, Mariza. Inside Commits: An Empirical Study on Commits in Open-Source Software. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 35., 2021, Joinville. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021.