Cross-language Clone Detection for Mobile Apps

  • Stephannie Jimenez, Universidad de los Andes
  • Gordana Rakic, University of Novi Sad
  • Silvia Takahashi, Universidad de los Andes
  • Nicolás Cardozo, Universidad de los Andes


Clone detection provides insight into replicated fragments in a code base. With the rise of multi-language code bases, new techniques for cross-language clone detection enable the analysis of polyglot systems. Such techniques have not yet been applied to the mobile app domain, where code bases are naturally polyglot: native mobile app developers must keep their code base synchronized across at least two different programming languages. App synchronization is a difficult and time-consuming maintenance task, as features can rapidly diverge between platforms and feature identification must be performed manually. Our goal is to provide an analysis framework that reduces the impact of app synchronization. A first step in this direction consists of a structural algorithm for cross-language clone detection that exploits the idea behind enriched concrete syntax trees. Such trees serve as a common intermediate representation, built from the programming languages' grammars, for detecting similarities between app code bases. In controlled tests with manually injected Type 1-3 clones, our technique finds code similarities with 79% precision in both single- and cross-language cases for Kotlin and Dart. We evaluate our tool on a corpus of 52 mobile apps, identifying code similarities with a precision of 65% to 84% for the full application logic.
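
To make the idea of a common intermediate representation concrete, the following is a minimal sketch, not the algorithm evaluated in the paper. It assumes that Kotlin and Dart parse trees have already been mapped onto trees whose nodes carry language-independent labels; the labels (FUNCTION_DECL, PARAM, and so on), the Node class, and the Dice-style similarity score are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node with a language-independent label (hypothetical names)."""
    kind: str
    children: list[Node] = field(default_factory=list)


def features(node: Node) -> Counter:
    """Multiset of node labels and (parent, child) label pairs in the tree."""
    bag = Counter([node.kind])
    for child in node.children:
        bag[(node.kind, child.kind)] += 1
        bag += features(child)
    return bag


def similarity(a: Node, b: Node) -> float:
    """Dice coefficient over the two feature multisets, in [0, 1]."""
    fa, fb = features(a), features(b)
    shared = sum((fa & fb).values())
    total = sum(fa.values()) + sum(fb.values())
    return 2 * shared / total if total else 1.0


# Assumed mapping of Kotlin `fun add(a: Int, b: Int) = a + b`
kotlin_add = Node("FUNCTION_DECL", [
    Node("PARAM"), Node("PARAM"),
    Node("RETURN", [Node("BINARY_EXPR")]),
])

# Assumed mapping of Dart `int add(int a, int b) => a + b;`
dart_add = Node("FUNCTION_DECL", [
    Node("PARAM"), Node("PARAM"),
    Node("RETURN", [Node("BINARY_EXPR")]),
])

# Assumed mapping of an unrelated Dart function containing a loop
dart_loop = Node("FUNCTION_DECL", [
    Node("PARAM"),
    Node("LOOP", [Node("ASSIGN")]),
])

print(similarity(kotlin_add, dart_add))   # structurally identical trees
print(similarity(kotlin_add, dart_loop))  # only partly overlapping trees
```

Because both functions reduce to the same labeled tree, the score does not depend on the surface syntax of either language; a practical detector would additionally need grammar-driven tree construction and a threshold for reporting clone candidates.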

Keywords: Program analysis, Clone detection, Cross-language analysis, Mobile apps


JIMENEZ, Stephannie; RAKIC, Gordana; TAKAHASHI, Silvia; CARDOZO, Nicolás. Cross-language Clone Detection for Mobile Apps. In: CONGRESSO IBERO-AMERICANO EM ENGENHARIA DE SOFTWARE (CIBSE), 26., 2023, Montevideo, Uruguay. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 107-121. DOI: