Issue Labeling Dynamics in Open-Source Projects: A Comprehensive Analysis
Abstract
Open-source repositories play a vital role in modern software development, facilitating collaboration and code sharing among developers worldwide. In this study, we investigate how labels are used in GitHub repositories to understand their impact on the issue resolution process and on project management. We employ data mining techniques to gather a dataset of 10,673,459 issues from 13,280 repositories listed under GitHub's featured topics. Our study design comprises four phases: repository selection, mining of repository issues, pre-processing of issue components, and data processing to address the research questions (RQs). The first RQ examines the frequency and usage of standard and custom labels in repositories. The second and third RQs examine the average time to label issues and how the triage phase can be delineated from labeling practices. We found that 73.14% of repositories employ issue labeling, with most labeling activity occurring within the first 100 days after an issue is opened. This rapid labeling is often followed by a structured pattern of label changes, which may correspond to specific issue phases such as triage, implementation, or change validation. Analyzing the time intervals between label changes, we observed that most issues undergo triage within 1 to 100 days, and that labels are prioritized according to their frequency in the resolution process. Our analysis sheds light on the role of labels in organizing and classifying issues through a systematic triage process in open-source repositories. Labels serve as both social and technical elements, supporting the organization, identification, implementation, and validation of code changes. These findings offer insights into the effective management and maintenance of open-source projects and can help developers and project managers streamline issue resolution.
The results and scripts from our study are available in the supplementary material repository for further exploration and reference by the software engineering community.
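The measures behind the three RQs (standard versus custom labels, time to first label, and intervals between label changes) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' scripts, which are in the supplementary repository. It assumes each issue is a dict with a `created_at` timestamp and an `events` list shaped like GitHub issue events (`event` equal to `labeled` or `unlabeled`, plus `created_at`), treats GitHub's default label set as the "standard" labels, and counts a repository as "using labels" if at least one of its issues received a label. All function names are hypothetical.

```python
from datetime import datetime

# Assumption: "standard" labels = GitHub's default label set; all others are custom.
DEFAULT_LABELS = {
    "bug", "documentation", "duplicate", "enhancement",
    "good first issue", "help wanted", "invalid", "question", "wontfix",
}

def parse_ts(ts: str) -> datetime:
    """Parse a GitHub-style ISO 8601 timestamp such as '2024-01-05T12:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def split_labels(names):
    """Partition label names (case-insensitive) into (standard, custom) sets."""
    lowered = {n.lower() for n in names}
    return lowered & DEFAULT_LABELS, lowered - DEFAULT_LABELS

def days_to_first_label(issue):
    """Days from issue creation to its first 'labeled' event, or None if never labeled."""
    opened = parse_ts(issue["created_at"])
    times = [parse_ts(e["created_at"]) for e in issue["events"] if e["event"] == "labeled"]
    if not times:
        return None
    return (min(times) - opened).total_seconds() / 86400

def label_change_intervals(issue):
    """Days between consecutive labeled/unlabeled events, in chronological order."""
    times = sorted(
        parse_ts(e["created_at"])
        for e in issue["events"]
        if e["event"] in ("labeled", "unlabeled")
    )
    return [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]

def share_of_labeling_repos(repos):
    """Percentage of repositories (name -> list of issues) with at least one labeled issue."""
    if not repos:
        return 0.0
    using = sum(
        1 for issues in repos.values()
        if any(days_to_first_label(i) is not None for i in issues)
    )
    return 100 * using / len(repos)

if __name__ == "__main__":
    # Toy issue (hypothetical data): labeled on day 2, relabeled on day 9, label removed on day 19.
    issue = {
        "created_at": "2024-01-01T00:00:00Z",
        "events": [
            {"event": "labeled", "created_at": "2024-01-03T00:00:00Z", "label": {"name": "bug"}},
            {"event": "labeled", "created_at": "2024-01-10T00:00:00Z", "label": {"name": "status: triaged"}},
            {"event": "unlabeled", "created_at": "2024-01-20T00:00:00Z", "label": {"name": "status: triaged"}},
        ],
    }
    print(days_to_first_label(issue))     # 2.0
    print(label_change_intervals(issue))  # [7.0, 10.0]
    print(split_labels(["bug", "status: triaged"]))
```

Working in fractional days keeps the output directly comparable to thresholds such as the 100-day window reported above; a real pipeline would also need to handle label renames and deleted events, which this sketch ignores.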
Keywords:
Open-source Repositories, Issue, Issue labeling, Defect, Triage, Issue Life Cycle
Published
30/09/2024
How to Cite
JR, Joselito; NASCIMENTO, Lidia P. G.; SANTOS, Alcemir; MACHADO, Ivan. Issue Labeling Dynamics in Open-Source Projects: A Comprehensive Analysis. In: SIMPÓSIO BRASILEIRO DE COMPONENTES, ARQUITETURAS E REUTILIZAÇÃO DE SOFTWARE (SBCARS), 18., 2024, Curitiba/PR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 51-60. DOI: https://doi.org/10.5753/sbcars.2024.3855.