Investigating LLM Capabilities in the Identification of Compilation Errors in Configurable Systems

doi:10.5753/cbsoft_estendido.2024.4055

Lucas Albuquerque UFCG
Rohit Gheyi UFCG

Abstract

Compilation is an important step in developing configurable systems such as Linux. However, identifying compilation errors in configurable systems is not straightforward because traditional compilers are not variability-aware. Previous approaches that detect some of these compilation errors often rely on advanced techniques that require significant effort from programmers. This study evaluates how well Large Language Models (LLMs), specifically ChatGPT4, Gemini Advanced 1.5, Le Chat Mistral, and Llama 3, identify compilation errors in configurable systems. We evaluate them on 30 small configurable systems written in C, covering 17 different types of compilation errors. ChatGPT4 successfully identified 28 out of 30 compilation errors. Le Chat Mistral, Llama 3, and Gemini Advanced 1.5 detected 24, 20, and 16 errors, respectively. These results suggest that LLMs have potential to assist developers in identifying compilation errors in configurable systems.
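To make the problem concrete, the following is a minimal sketch of the kind of configuration-dependent error discussed above. It is an illustration only and is not taken from the 30 systems in the study; the feature macro `ENABLE_LOG`, the variable `log_level`, and the function `current_level` are hypothetical names. The comment describes a faulty variant that compiles in one configuration and fails in the other, and the code shows a version that compiles in both.

```c
/* Hypothetical feature macro: pass -DENABLE_LOG to the compiler to enable it. */
#ifdef ENABLE_LOG
static int log_level = 2; /* declared only when the feature is enabled */
#endif

/*
 * Faulty variant (shown as a comment so this file still compiles):
 *
 *     int current_level(void) { return log_level; }
 *
 * With -DENABLE_LOG it compiles. Without it, log_level is never declared,
 * so the compiler reports an undeclared identifier. A developer who only
 * builds the ENABLE_LOG configuration never sees the error.
 */

/* Corrected variant: every use of log_level sits under the same guard. */
int current_level(void) {
#ifdef ENABLE_LOG
    return log_level;
#else
    return 0; /* assumed fallback when logging is disabled */
#endif
}
```

Because a conventional compiler only sees the code that survives preprocessing for one configuration, checking this file fully means building it at least twice, once with and once without `-DENABLE_LOG`. With n independent feature macros, the number of configurations can grow up to 2^n.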
