Tracking the decisions to select repositories for Mining Software Repositories experiments


Mining Software Repositories (MSR) analyzes and cross-links the data available in software repositories, which allows researchers to recognize patterns in them, for example, how developers resolve merge conflicts. However, repository selection faces two main problems. First, the traditional approaches used to select repositories have limitations. Second, there is no systematic process for choosing repositories, which compromises the reproducibility of experiments. This paper proposes an approach that addresses these limitations and assists users in selecting software repositories. Initial results show that the approach returns at least 1.8 times more repositories, overcoming, for instance, the restriction of searching repositories only by their main language.
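To make the main-language restriction and the idea of tracking selection decisions concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' tool. The `Repo` records, the per-language byte counts (similar in shape to the language breakdown GitHub reports for a repository), the `min_share` threshold, and every function name are hypothetical. The sketch contrasts a filter that matches only a repository's main language with one that inspects every language the repository contains, and it logs each criterion together with the kept and dropped repositories so the selection can be replayed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical repository record; `languages` maps language -> bytes of code."""
    name: str
    main_language: str
    languages: dict[str, int]


@dataclass
class SelectionLog:
    """Records every selection criterion so the decision trail can be replayed."""
    steps: list[dict] = field(default_factory=list)

    def record(self, criterion: str, kept: list[str], dropped: list[str]) -> None:
        self.steps.append({"criterion": criterion, "kept": kept, "dropped": dropped})

    def to_json(self) -> str:
        return json.dumps(self.steps, indent=2)


def filter_by_main_language(repos: list[Repo], lang: str) -> list[Repo]:
    # Mimics a search that matches only the repository's main language.
    return [r for r in repos if r.main_language == lang]


def filter_by_any_language(repos: list[Repo], lang: str, min_share: float) -> list[Repo]:
    # Keeps a repository if `lang` is among its languages and accounts for
    # at least `min_share` of its code (hypothetical threshold).
    kept = []
    for r in repos:
        total = sum(r.languages.values())
        share = r.languages.get(lang, 0) / total if total else 0.0
        if lang in r.languages and share >= min_share:
            kept.append(r)
    return kept


def select(repos: list[Repo], lang: str, min_share: float, log: SelectionLog) -> list[Repo]:
    # Applies the any-language criterion and records what was kept and dropped.
    kept = filter_by_any_language(repos, lang, min_share)
    kept_names = [r.name for r in kept]
    dropped = [r.name for r in repos if r.name not in kept_names]
    log.record(f"language={lang}, min_share={min_share}", kept_names, dropped)
    return kept


# Usage with made-up repositories (illustrative data only).
repos = [
    Repo("a/web-app", "JavaScript", {"JavaScript": 7000, "Java": 3000}),
    Repo("b/service", "Java", {"Java": 9000, "Shell": 1000}),
    Repo("c/scripts", "Python", {"Python": 5000}),
]
log = SelectionLog()
print(len(filter_by_main_language(repos, "Java")))  # 1
print([r.name for r in select(repos, "Java", 0.1, log)])  # ['a/web-app', 'b/service']
print(log.to_json())
```

With this invented data, the any-language filter keeps two Java repositories where the main-language filter keeps one; the numbers are illustrative only and say nothing about the results reported above. Serializing the log to JSON is one simple way to let another researcher rerun exactly the same selection criteria.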
Keywords: Mining Software Repositories, Repository Selection


COSTA, Hiero Henrique Barcelos; OLIVEIRA, Guilherme Marques de; SALLES, Victor Souza; MENEZES, Gleiph Ghiotto Lima. Tracking the decisions to select repositories for Mining Software Repositories experiments. In: TRILHA DE TEMAS, IDEIAS E RESULTADOS EMERGENTES EM SISTEMAS DE INFORMAÇÃO - SIMPÓSIO BRASILEIRO DE SISTEMAS DE INFORMAÇÃO (SBSI), 20., 2024, Juiz de Fora/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 333-338. DOI: