RAISE: A Self-Hosted Platform for Mining and Managing Data from GitHub and Jira
Abstract
The growing number of software repositories has opened new opportunities for researchers to investigate how software is developed, how teams collaborate, and how quality evolves over time. However, extracting useful information from these repositories often requires custom, ad hoc scripts tailored to specific studies, which are rarely built to be reusable or shareable. This hinders reproducibility, results in redundant data extraction efforts, and wastes computational resources, particularly in academic environments where multiple researchers or teams may need similar datasets. To address these limitations, we developed RAISE, a self-hosted platform for mining and managing data from GitHub and Jira. RAISE offers both a REST API and a web-based interface, allowing streamlined data retrieval, exploration, and export. It is built with widely adopted technologies such as Django, Docker, Celery, and React. It is easy to deploy, supports background task execution, and ensures consistent behavior across environments. The platform provides fine-grained filtering, integrates data from both local and remote repositories, and stores results in a structured database for reuse. To evaluate RAISE’s practical value and usability, we performed a user-centered evaluation with six participants, who engaged in a range of realistic repository mining tasks of varying complexity.
