Knowledge Islands: Visualizing Developers Knowledge Concentration

Otávio Cury; Guilherme Avelino

doi:10.5753/sbes.2024.3610

Otávio Cury UFPI
Guilherme Avelino UFPI

Abstract

Software development today is often a collaborative activity, in which various situations can put a project's survival at risk. One common issue, extensively studied in the software engineering literature, is the concentration of a significant portion of the knowledge about the source code in a few developers on a team. In this scenario, the departure of one of these key developers could make it impossible to continue the project. This work presents Knowledge Islands, a tool that visualizes the concentration of knowledge in a software repository using a state-of-the-art knowledge model. Its key features include user authentication; cloning and asynchronous analysis of user repositories; identification of the expertise of the team's developers; calculation of the Truck Factor for all folders and source code files; and identification of the main developers and the main files of the repository. This open-source tool enables practitioners to analyze GitHub projects, determine where knowledge is concentrated within the development team, and take measures to maintain project health. The source code of Knowledge Islands is available in a public repository, and a video presentation of the tool is also available.

Keywords: Software repository mining, knowledge concentration, code authorship
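
The abstract does not describe how the Truck Factor is computed, so the following is only a minimal sketch of one common greedy heuristic from the Truck Factor literature: repeatedly remove the developer who is an expert on the most files until more than half of the files have no remaining expert. The function name `truck_factor`, the `file_experts` mapping, the 0.5 threshold, and the sample data are all hypothetical and are not taken from Knowledge Islands; in practice the file-to-expert mapping would come from a knowledge model such as the one the tool uses.

```python
from collections import Counter


def truck_factor(file_experts, threshold=0.5):
    """Greedy Truck Factor estimate (illustrative assumption, not the tool's code).

    file_experts maps a file path to the set of developers considered
    experts on that file. Developers are removed one at a time, most
    widespread first, until more than `threshold` of the files are orphaned.
    Returns (truck factor, list of removed developers).
    """
    remaining = {path: set(devs) for path, devs in file_experts.items()}
    total = len(remaining)
    removed = []
    if total == 0:
        return 0, removed

    while True:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > threshold:
            break
        counts = Counter(dev for devs in remaining.values() for dev in devs)
        if not counts:
            break
        # Pick the developer covering the most files; break ties by name
        # so the result is deterministic.
        top = min(counts, key=lambda dev: (-counts[dev], dev))
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)

    return len(removed), removed


# Hypothetical sample data: file path -> developers with expertise on it.
file_experts = {
    "src/app.py": {"alice"},
    "src/db.py": {"alice", "bob"},
    "src/ui.py": {"carol"},
    "docs/readme.md": {"alice"},
}

repo_tf, repo_key_devs = truck_factor(file_experts)

# A per-folder value can be obtained by restricting the mapping to one folder.
src_only = {p: d for p, d in file_experts.items() if p.startswith("src/")}
src_tf, src_key_devs = truck_factor(src_only)

print(repo_tf, repo_key_devs)
print(src_tf, src_key_devs)
```

A low value signals that knowledge is concentrated in few developers; the removed developers are the ones whose departure would orphan most of the code under this heuristic.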
