On the Nature of Duplicate Pull Requests: An Empirical Study Using Association Rules

  • Cleyciane Lima UFAC
  • Daricelio Soares UFAC

Abstract


In open source communities, developers submit pull requests to add features, fix bugs, or otherwise modify software artifacts; each pull request is accepted or rejected after review by a member of the project's core team. The large-scale growth of collaborative software development in recent years has created several challenges that have become barriers for developers, among them the occurrence of duplicate pull requests. Prior work has applied techniques to detect duplicate pull requests and has used a mixed-methods study to analyze the extent to which they affect development in open source communities. However, these studies do not describe the relationships between the occurrence of duplicate pull requests and other characteristics of these contributions, such as lifetime and number of commits. In this paper, association rules, a data mining technique, are used to perform a set of studies on 49,762 pull requests from 6 OSS projects hosted on GitHub, revealing new knowledge about the nature of duplicate pull requests. The results indicate that some structural characteristics, developer inexperience, and the requester profile are present, individually or jointly and with different intensities, in duplicate pull requests. Identifying these aspects can help software communities understand the nature of duplicate pull requests.
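As a rough illustration of the kind of measure an association rule analysis reports, the sketch below computes support, confidence, and lift for a single rule of the form "attribute implies duplicate". It is not the authors' pipeline: the records are invented toy data, and the attribute names (`first_time_contributor`, `short_lifetime`, `core_member`) and the helper functions are hypothetical, chosen only to echo the characteristics named in the abstract.

```python
from fractions import Fraction

# Toy, made-up records (NOT taken from the paper's dataset). Each pull request
# is represented as a set of categorical attributes; all names are hypothetical.
pull_requests = [
    {"first_time_contributor", "short_lifetime", "duplicate"},
    {"first_time_contributor", "duplicate"},
    {"first_time_contributor", "short_lifetime"},
    {"core_member", "short_lifetime"},
    {"core_member"},
    {"first_time_contributor", "short_lifetime", "duplicate"},
    {"core_member", "duplicate"},
    {"core_member", "short_lifetime"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Assumes both the antecedent and the consequent occur at least once,
    otherwise the divisions below would fail.
    """
    s_both = support(antecedent | consequent, transactions)
    s_ante = support(antecedent, transactions)
    s_cons = support(consequent, transactions)
    confidence = s_both / s_ante   # estimate of P(consequent | antecedent)
    lift = confidence / s_cons     # > 1 suggests a positive association
    return s_both, confidence, lift


sup, conf, lift = rule_metrics(
    {"first_time_contributor"}, {"duplicate"}, pull_requests
)
print(f"support={sup}, confidence={conf}, lift={lift}")
# prints: support=3/8, confidence=3/4, lift=3/2
```

On this toy data, a lift above 1 means that pull requests with the antecedent attribute are duplicates more often than pull requests overall; it says nothing about the values reported in the paper.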

Keywords: Association Rules, Duplicate Pull Request, GitHub
Published
03/10/2022
How to Cite

LIMA, Cleyciane; SOARES, Daricelio. On the Nature of Duplicate Pull Requests: An Empirical Study Using Association Rules. In: SIMPÓSIO BRASILEIRO DE COMPONENTES, ARQUITETURAS E REUTILIZAÇÃO DE SOFTWARE (SBCARS), 16., 2022, Uberlândia. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 68–75.