Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems

Abstract


Compilation is an important process in developing configurable systems, such as Linux. However, identifying compilation errors in configurable systems is not straightforward because traditional compilers are not variability-aware. Previous approaches that detect some of these errors often rely on advanced techniques that require significant effort from programmers. This study evaluates the efficacy of Large Language Models (LLMs), specifically ChatGPT4, Le Chat Mistral, and Gemini Advanced 1.5, in identifying compilation errors in configurable systems. We first evaluate 50 small products written in C++, Java, and C, and then 30 small configurable systems in C, covering 17 different types of compilation errors. ChatGPT4 successfully identified most compilation errors in both the individual products and the configurable systems, while Le Chat Mistral and Gemini Advanced 1.5 detected some of them. Overall, LLMs show potential to assist developers in identifying compilation errors in configurable systems.
Keywords: LLMs, Compilation Errors, Configurable Systems
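To make the problem concrete, the fragment below is a minimal, hypothetical C sketch; the macro names USE_BUFFER and LOG and the code itself are illustrative assumptions, not taken from the study's dataset. A traditional compiler processes one configuration at a time, so of the four possible configurations only the one with LOG defined and USE_BUFFER undefined fails to compile, which single-configuration builds can easily miss.

    #include <stdio.h>

    #ifdef USE_BUFFER
    static int buffer_size = 64;
    #endif

    int main(void) {
    #ifdef LOG
        /* buffer_size is declared only when USE_BUFFER is defined, so the
           configuration with LOG defined and USE_BUFFER undefined fails
           with an "undeclared identifier" error; the other three
           configurations compile cleanly. */
        printf("buffer size: %d\n", buffer_size);
    #endif
        return 0;
    }

For example, compiling with gcc -DLOG file.c exposes the error, whereas gcc file.c, gcc -DUSE_BUFFER file.c, and gcc -DLOG -DUSE_BUFFER file.c all succeed, which is why errors of this kind can remain hidden until a specific configuration is built.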

Published
30/09/2024
ALBUQUERQUE, Lucas; GHEYI, Rohit; RIBEIRO, Márcio. Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems. In: SIMPÓSIO BRASILEIRO DE ENGENHARIA DE SOFTWARE (SBES), 38., 2024, Curitiba/PR. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 574-580. DOI: https://doi.org/10.5753/sbes.2024.3560.