Differential Fuzzing Go Compilers using LLMs: A Methodological Proposal

Luiz Paulo Grafetti Terres; Samuel da Silva Feitosa

doi:10.5753/eres.2025.16857

Luiz Paulo Grafetti Terres (UFFS)
Samuel da Silva Feitosa (UFFS)

Abstract

Compilers are essential tools for software development, and ensuring their reliability is vital for the security of the software ecosystem. Traditional compiler testing exercises a compiler with many generated programs, but conventional program generators are themselves complex to implement and struggle to produce effective test cases. Recent advances in Large Language Models (LLMs) and their proficiency with code present an opportunity to address these challenges. This work proposes a methodology for the differential fuzzing of Go compilers that uses LLMs as test case generators. The proposed method employs a cross-compiler differential testing strategy to test three compilers: GOLLVM, GCCGO, and the official Go compiler.

Keywords: Compilers, LLMs, Differential Fuzzing, Go Compilers, Test Generation
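
To make the cross-compiler comparison step concrete, the following is a minimal sketch of a differential oracle, not the authors' implementation. It assumes each test program has already been built and run under each compiler, and that the observable behaviour is summarized in a hypothetical Result struct; the names Result and Disagreements, and the compiler labels used as map keys, are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is the observable behaviour of one compiler on one test program.
// The type and its fields are hypothetical, chosen only for this sketch.
type Result struct {
	Compiled bool   // whether compilation succeeded
	ExitCode int    // exit code of the compiled binary
	Stdout   string // standard output of the compiled binary
}

// Disagreements groups compilers that produced identical Results.
// It returns nil when all compilers agree, and otherwise the groups
// of compiler names, sorted so the output is deterministic.
func Disagreements(results map[string]Result) [][]string {
	groups := map[Result][]string{}
	for name, r := range results {
		groups[r] = append(groups[r], name)
	}
	if len(groups) <= 1 {
		return nil
	}
	out := make([][]string, 0, len(groups))
	for _, names := range groups {
		sort.Strings(names)
		out = append(out, names)
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	// Made-up example data: two compilers agree, one diverges.
	results := map[string]Result{
		"gc":     {Compiled: true, ExitCode: 0, Stdout: "42\n"},
		"gccgo":  {Compiled: true, ExitCode: 0, Stdout: "42\n"},
		"gollvm": {Compiled: true, ExitCode: 0, Stdout: "41\n"},
	}
	fmt.Println(Disagreements(results))
}
```

A divergence flagged this way is only a candidate for investigation. Programs whose output depends on behaviour the language leaves unspecified, such as map iteration order, can legitimately differ between compilers, so generated tests would need to avoid or normalize such cases before a disagreement is treated as a compiler bug.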
