RAISE: A Self-Hosted Platform for Mining and Managing Data from GitHub and Jira

  • Breno Neves PUC-Rio
  • Daniel Coutinho PUC-Rio
  • Eduardo Sardenberg PUC-Rio
  • Arthur Alesi PUC-Rio
  • Marcelo Machado PUC-Rio
  • Johny Arriel PUC-Rio
  • Ana Luísa Cavalcante PUC-Rio
  • Robbie Carvalho PUC-Rio
  • Juliana Alves Pereira PUC-Rio

Abstract


The growing number of software repositories has opened new opportunities for researchers to investigate how software is developed, how teams collaborate, and how quality evolves over time. However, mining useful information from these repositories often requires custom, ad hoc scripts tailored to specific studies, and such scripts are rarely built to be reusable or shareable. This hinders reproducibility, leads to redundant data extraction efforts, and wastes computational resources, particularly in academic environments where multiple researchers or teams may need similar datasets. To address these limitations, we developed RAISE, a self-hosted platform for mining and managing data from GitHub and Jira. RAISE offers both a REST API and a web-based interface, allowing streamlined data retrieval, exploration, and export. It is built with widely adopted technologies such as Django, Docker, Celery, and React, which make it easy to deploy, support background task execution, and ensure consistent behavior across environments. The platform provides fine-grained filtering, integrates data from both local and remote repositories, and stores results in a structured database for reuse. To assess RAISE's practical value and usability, we performed a user-centered evaluation with six participants, who carried out a range of realistic repository mining tasks of varying complexity.

Keywords: Mining Software Repositories, Empirical Software Engineering, GitHub, Jira

Published
2025-09-22
NEVES, Breno et al. RAISE: A Self-Hosted Platform for Mining and Managing Data from GitHub and Jira. In: SIMPÓSIO BRASILEIRO DE COMPONENTES, ARQUITETURAS E REUTILIZAÇÃO DE SOFTWARE (SBCARS), 19., 2025, Recife/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 112-122. DOI: https://doi.org/10.5753/sbcars.2025.14603.