Exploring Code Clone Behavior in Modern Code Review
Abstract
The Modern Code Review (MCR) process is iterative and asynchronous, enabling the early identification of issues during development. One of the main challenges in this context is the presence of code clones: fragments of code copied with small modifications, which hinder maintainability. In this study, we analyzed 80k revisions from the CROP dataset using the Siamese clone detector and identified 27,656 relevant clones across six systems. A manual validation indicated a predominance of Type-I (46.7%) and Type-III (45.3%) clones. We also identified 224 reviews in which clones appeared in a single revision (Single), 1,258 reviews in which clones appeared across multiple revisions (Recurring), and 236 reviews at the intersection of both categories. To deepen the analysis, we introduced two metrics, Duration and Distance, to assess how clones are introduced or removed during the review. This paper is an extended abstract of the study “An Exploratory Study on the Lifecycle of Code Clones During Code Review”, published at SBES 2025.
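The Single/Recurring grouping described above can be illustrated with a minimal Python sketch. It is not the study's implementation. The input layout (a mapping from a clone identifier to the set of revision numbers in which that clone was detected), the function name `classify_review`, and the label names are assumptions made for this illustration. The sketch also assumes that a review falls in the intersection when it contains at least one clone seen in a single revision and at least one clone seen in several revisions.

```python
from collections import Counter


def classify_review(clone_revisions):
    """Label one review from its clones (hypothetical helper, not from the study).

    clone_revisions maps a clone id to the set of revision numbers
    of the review in which that clone was detected.
    """
    # A clone detected in exactly one revision counts as "single".
    has_single = any(len(revs) == 1 for revs in clone_revisions.values())
    # A clone detected in more than one revision counts as "recurring".
    has_recurring = any(len(revs) > 1 for revs in clone_revisions.values())

    if has_single and has_recurring:
        return "Both"
    if has_recurring:
        return "Recurring"
    if has_single:
        return "Single"
    return "None"  # no clone detected in this review


# Toy data, invented purely for illustration.
reviews = {
    "review-a": {"clone-1": {2}},
    "review-b": {"clone-2": {1, 2, 3}},
    "review-c": {"clone-3": {1}, "clone-4": {2, 4}},
    "review-d": {},
}

labels = {rid: classify_review(clones) for rid, clones in reviews.items()}
counts = Counter(labels.values())
```

Here the three non-empty toy reviews land in the Single, Recurring, and intersection groups respectively, and the review with no detected clones is kept apart.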
References
Matheus Paixao, Jens Krinke, Donggyun Han, and Mark Harman. 2018. CROP: Linking Code Reviews to Source Code Changes. In Proceedings of the IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) (Gothenburg, Sweden). IEEE, 46–49.
Chaiyong Ragkhitwetsagul and Jens Krinke. 2019. Siamese: scalable and incremental code clone search via multiple code representations. Empirical Software Engineering 24, 4 (2019), 2236–2284.
Chaiyong Ragkhitwetsagul, Jens Krinke, Matheus Paixao, Giuseppe Bianco, and Rocco Oliveto. 2019. Toxic code snippets on Stack Overflow. IEEE Transactions on Software Engineering 47, 3 (2019), 560–581.
Chanchal K Roy, Minhaz F Zibran, and Rainer Koschke. 2014. The vision of software clone management: Past, present, and future (keynote paper). In 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, 18–33.
Jeffrey Svajlenko, Judith F Islam, Iman Keivanloo, Chanchal K Roy, and Mohammad Mamun Mia. 2014. Towards a big data curated benchmark of inter-project code clones. In 2014 IEEE International Conference on Software Maintenance and Evolution. IEEE, 476–480.
Anderson Uchôa, Caio Barbosa, Daniel Coutinho, Willian Oizumi, Wesley KG Assunção, Silvia Regina Vergilio, Juliana Alves Pereira, Anderson Oliveira, and Alessandro Garcia. 2021. Predicting design impactful changes in modern code review: A large-scale empirical study. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 471–482.
Italo Uchoa, Denis Sousa, Matheus Paixão, Pedro Ulisses, Anderson Uchôa, and Chaiyong Ragkhitwetsagul. 2025. An Exploratory Study on the Lifecycle of Code Clones During Code Review. In Brazilian Symposium on Software Engineering (SBES). Recife, Brazil.
Wei Wang and Michael Godfrey. 2014. Investigating intentional clone refactoring. Electronic Communications of the EASST 63 (2014).
Yue Yu, Huaimin Wang, Gang Yin, and Tao Wang. 2016. Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Information and Software Technology 74 (2016).
