On the Nature of Duplicate Pull Requests: An Empirical Study Using Association Rules
Abstract
In open source communities, developers submit pull requests to add features, fix bugs, or otherwise modify software artifacts; each pull request can be accepted or rejected after review by a member of the project's core team. With the large-scale growth of collaborative software development in recent years, several challenges have arisen that act as barriers for developers, among them the occurrence of duplicate pull requests. Previous work has applied techniques to detect duplicate pull requests and has used a mixed-methods study to analyze the extent to which duplicate pull requests affect development in open source communities. However, these studies do not describe the relationships between the occurrence of duplicate pull requests and other characteristics of these contributions, such as their lifetime and the number of commits they contain. In this paper, the data mining technique known as association rules is used to perform a set of studies on 49,762 pull requests from 6 open source software (OSS) projects hosted on GitHub, revealing new knowledge about the nature of duplicate pull requests. The results indicate that some structural characteristics, developer inexperience, and the requester profile are present, individually or jointly, with different intensities in duplicate pull requests. Identifying these aspects can help software communities understand the nature of duplicate pull requests.
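For readers unfamiliar with association rules, the following minimal sketch shows how the standard support, confidence, and lift measures of a rule "antecedent implies consequent" can be computed. It is an illustration only, not the procedure used in the study: the attribute names (`first_time_contributor`, `few_commits`, and so on), the six toy records, and the helper functions `support` and `rule_metrics` are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def support(transactions: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, antecedent | consequent)
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data: each pull request is a set of categorical attributes.
pull_requests = [
    frozenset({"first_time_contributor", "few_commits", "duplicate"}),
    frozenset({"first_time_contributor", "duplicate"}),
    frozenset({"first_time_contributor", "few_commits"}),
    frozenset({"core_member", "many_commits"}),
    frozenset({"core_member", "few_commits"}),
    frozenset({"core_member", "many_commits"}),
]

metrics = rule_metrics(
    pull_requests,
    antecedent=frozenset({"first_time_contributor"}),
    consequent=frozenset({"duplicate"}),
)
print(metrics)
```

A lift above 1 means the consequent appears more often together with the antecedent than it would if the two were independent; in this toy data, duplicates are twice as frequent among first-time contributors as in the data set overall.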