An Exploratory Study on the Lifecycle of Code Clones During Code Review
Abstract
The Modern Code Review (MCR) process is iterative and asynchronous, enabling early identification of several issues during software development. Code review consists of inspecting code before it is merged into the codebase. Code clones are code fragments that are copied and reused within the same codebase or across different codebases, often with minor changes. Developers must be aware of code clones in their projects, as an issue in one cloned fragment may require adjustments in all related clones, which can significantly impact the project's maintainability. Nevertheless, research addressing the presence and behavior of code clones during code review is still lacking. By leveraging the CROP dataset (with over 28k code reviews and 80k revisions) and the Siamese clone detector, we identified 27,656 relevant code clones that underwent code review in 6 different software systems. A manual validation of a representative sample indicated a predominance of Type-I (46.74%) and Type-III (45.3%) clones. Based on the clones' lifecycle within the review, we categorized the reviews as Single or Recurring, according to how the clones are introduced and/or removed during the review process. We identified 224 reviews in which clones appear in a single revision (Single) and 1,258 reviews in which clones appear in multiple revisions (Recurring). Additionally, 236 code reviews lie at the intersection of the Recurring and Single categories. To deepen the analysis, we introduced two metrics, Duration and Distance, to assess how clones are introduced or removed during the review. We observed that, on average, clones are introduced at the beginning of the code review and commonly survive the review process, being merged into the codebase.
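To make the Single/Recurring categorization more concrete, the following is a minimal sketch under stated assumptions; it is not the authors' tooling. A clone detected in exactly one revision of a review is treated as Single, a clone detected in two or more revisions as Recurring, and a review containing both kinds is placed in the intersection group. The abstract does not define Duration and Distance, so `lifecycle_metrics` uses assumed stand-ins only: the number of revisions a clone spans, the number of revisions before it first appears, and whether it is still present in the final revision (assumed to be the merged one). All names (`CloneTrace`, `clone_category`, `review_category`, `lifecycle_metrics`) are hypothetical and do not come from the paper's replication package.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CloneTrace:
    """Hypothetical record of one clone tracked across the revisions of a review."""

    clone_id: str
    revisions: frozenset[int]  # 1-based revision numbers where the clone was detected (non-empty)


def clone_category(trace: CloneTrace) -> str:
    """Single if the clone is detected in exactly one revision, else Recurring."""
    return "Single" if len(trace.revisions) == 1 else "Recurring"


def review_category(traces: list[CloneTrace]) -> str:
    """Label a review by the kinds of clones it contains.

    Assumption: a review holding both kinds falls in the intersection group.
    """
    labels = {clone_category(t) for t in traces}
    if labels == {"Single", "Recurring"}:
        return "Single+Recurring"
    if labels == {"Single"}:
        return "Single"
    if labels == {"Recurring"}:
        return "Recurring"
    return "NoClones"  # no clones were detected in any revision


def lifecycle_metrics(trace: CloneTrace, total_revisions: int) -> dict[str, int | bool]:
    """ASSUMED stand-ins for Duration and Distance; not the paper's definitions."""
    first, last = min(trace.revisions), max(trace.revisions)
    return {
        "duration": last - first + 1,  # revisions spanned from introduction to last sighting
        "distance": first - 1,  # revisions that elapsed before the clone was introduced
        "survived": last == total_revisions,  # still present in the final revision
    }


if __name__ == "__main__":
    traces = [
        CloneTrace("c1", frozenset({1, 2, 3})),
        CloneTrace("c2", frozenset({2})),
    ]
    print(review_category(traces))  # Single+Recurring
    print(lifecycle_metrics(traces[0], total_revisions=3))
    # {'duration': 3, 'distance': 0, 'survived': True}
```

Here clone `c1` is introduced in the first revision and survives to the last one, the pattern the abstract reports as common, while `c2` appears in only one intermediate revision; together they place the review in the intersection group.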
Keywords:
Modern Code Review, Code Clones, Empirical Study
References
Qurat Ul Ain, Wasi Haider Butt, Muhammad Waseem Anwar, Farooque Azam, and Bilal Maqbool. 2019. A systematic review on code clone detection. IEEE Access 7 (2019), 86121–86144.
Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In 2013 35th International Conference on Software Engineering (ICSE). IEEE, 712–721.
Jason Cohen. 2010. Modern code review. Making Software: What Really Works, and Why We Believe It (2010), 329–336.
Dror G Feitelson, Eitan Frachtenberg, and Kent L Beck. 2013. Development and deployment at Facebook. IEEE Internet Computing 17, 4 (2013), 8–17.
Judith F Islam, Manishankar Mondal, and Chanchal K Roy. 2016. Bug replication in code clones: An empirical study. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. IEEE, 68–78.
Jing Jiang, Jiangfeng Lv, Jiateng Zheng, and Li Zhang. 2021. How developers modify pull requests in code review. IEEE Transactions on Reliability 71, 3 (2021), 1325–1339.
Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. 2002. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering 28, 7 (2002), 654–670.
Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E Hassan. 2016. An empirical study of the impact of modern code review practices on software quality. Empirical Software Engineering 21 (2016), 2146–2189.
Manishankar Mondal, Chanchal K Roy, and Kevin A Schneider. 2018. Microclones in evolving software. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 50–60.
Manishankar Mondal, Banani Roy, Chanchal K Roy, and Kevin A Schneider. 2019. Investigating context adaptation bugs in code clones. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 157–168.
Manishankar Mondal, Chanchal K. Roy, and Kevin A. Schneider. 2018. Micro-clones in Evolving Software. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 70–80. DOI: 10.1109/SANER.2018.8330196
Matheus Paixao, Jens Krinke, Donggyun Han, and Mark Harman. [n. d.]. Codebase Repository of CROP. [link]
Matheus Paixao, Jens Krinke, Donggyun Han, and Mark Harman. 2018. CROP: Linking Code Reviews to Source Code Changes. In Proceedings of the IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) (Gothenburg, Sweden). IEEE, 46–49.
Matheus Paixao, Jens Krinke, DongGyun Han, Chaiyong Ragkhitwetsagul, and Mark Harman. 2019. The impact of code review on architectural changes. IEEE Transactions on Software Engineering 47, 5 (2019), 1041–1059.
Matheus Paixao and Paulo Henrique Maia. 2019. Rebasing in code review considered harmful: A large-scale empirical investigation. In 2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 45–55.
Matheus Paixão, Anderson Uchôa, Ana Carla Bibiano, Daniel Oliveira, Alessandro Garcia, Jens Krinke, and Emilio Arvonio. 2020. Behind the intents: An in-depth empirical study on software refactoring in modern code review. In Proceedings of the 17th International Conference on Mining Software Repositories. 125–136.
Luca Pascarella, Davide Spadini, Fabio Palomba, and Alberto Bacchelli. 2019. On the effect of code review on code smells. arXiv preprint arXiv:1912.10098 (2019).
Chaiyong Ragkhitwetsagul and Jens Krinke. 2019. Siamese: scalable and incremental code clone search via multiple code representations. Empirical Software Engineering 24, 4 (2019), 2236–2284.
Chaiyong Ragkhitwetsagul, Jens Krinke, Matheus Paixao, Giuseppe Bianco, and Rocco Oliveto. 2019. Toxic code snippets on stack overflow. IEEE Transactions on Software Engineering 47, 3 (2019), 560–581.
Chanchal K Roy, Minhaz F Zibran, and Rainer Koschke. 2014. The vision of software clone management: Past, present, and future (keynote paper). In 2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, 18–33.
Caitlin Sadowski, Emma Söderberg, Luke Church, Michal Sipko, and Alberto Bacchelli. 2018. Modern code review: a case study at Google. In Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice. 181–190.
G Shobha, Ajay Rana, Vineet Kansal, and Sarvesh Tanwar. 2021. Code clone detection—a systematic review. Emerging Technologies in Data Mining and Information Security: Proceedings of IEMIS 2020, Volume 2 (2021), 645–655.
Denis Sousa, Matheus Paixao, Chaiyong Ragkhitwetsagul, and Italo Uchoa. 2024. Code Clone Configuration as a Multi-Objective Search Problem. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 503–509.
Jeffrey Svajlenko, Judith F Islam, Iman Keivanloo, Chanchal K Roy, and Mohammad Mamun Mia. 2014. Towards a big data curated benchmark of inter-project code clones. In 2014 IEEE International Conference on Software Maintenance and Evolution. IEEE, 476–480.
Christopher Thompson and David Wagner. 2017. A large-scale study of modern code review and security in open source projects. In Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering. 83–92.
Anderson Uchôa, Caio Barbosa, Daniel Coutinho, Willian Oizumi, Wesley KG Assunçao, Silvia Regina Vergilio, Juliana Alves Pereira, Anderson Oliveira, and Alessandro Garcia. 2021. Predicting design impactful changes in modern code review: A large-scale empirical study. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 471–482.
Anderson Uchôa, Caio Barbosa, Willian Oizumi, Publio Blenílio, Rafael Lima, Alessandro Garcia, and Carla Bezerra. 2020. How does modern code review impact software design degradation? An in-depth empirical study. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 511–522.
Italo Uchoa, Denis Sousa, Matheus Paixão, Pedro Maia, Anderson Uchôa, and Chaiyong Ragkhitwetsagul. 2025. Replication Package for the paper: "An Exploratory Study on the Lifecycle of Code Clones during Code Review". [link]
Brent van Bladel and Serge Demeyer. 2021. A comparative study of test code clones and production code clones. Journal of Systems and Software 176 (2021), 110940.
Tiantian Wang, Mark Harman, Yue Jia, and Jens Krinke. 2013. Searching for better configurations: a rigorous approach to clone evaluation. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering. 455–465.
Wei Wang and Michael Godfrey. 2014. Investigating intentional clone refactoring. Electronic Communications of the EASST 63 (2014).
Wenhan Wang, Ge Li, Bo Ma, Xin Xia, and Zhi Jin. 2020. Detecting code clones with graph neural network and flow-augmented abstract syntax tree. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 261–271.
Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in software engineering. Springer Science & Business Media.
Yue Yu, Huaimin Wang, Gang Yin, and Tao Wang. 2016. Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Information and Software Technology 74 (2016), 204–218.
Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, and Masoud Ekhtiarzadeh. 2023. A systematic literature review on source code similarity measurement and clone detection: Techniques, applications, and challenges. Journal of Systems and Software 204 (Oct. 2023), 111796. DOI: 10.1016/j.jss.2023.111796
Fiorella Zampetti, Gabriele Bavota, Gerardo Canfora, and Massimiliano Di Penta. 2019. A study on the interplay between pull request review and continuous integration builds. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 38–48.
Published
2025-09-22
How to Cite
UCHOA, Italo; SOUSA, Denis; PAIXAO, Matheus; MAIA, Pedro; UCHÔA, Anderson; RAGKHITWETSAGUL, Chaiyong. An Exploratory Study on the Lifecycle of Code Clones During Code Review. In: BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 293-303. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.9930.
