Language Models as Architectural Gatekeepers: Automating Conformance Checking from Natural Language

Andrielly Lucena; Everton L. G. Alves; João Brunet

doi:10.5753/sbes.2025.11100

Andrielly Lucena UFCG
Everton L. G. Alves UFCG
João Brunet UFCG

Abstract

Ensuring that implemented code adheres to its intended architecture (architectural conformance) remains a critical challenge in software engineering. Formal verification tools exist, but their adoption is hindered by the overhead of writing explicit formal architectural specifications. Meanwhile, valuable architectural decisions and design constraints are often recorded only in informal channels, such as pull request discussions and issue trackers, where they are expressed in natural language. In this paper, we investigate whether Large Language Models (LLMs) can bridge this gap by transforming informal design rules, drawn from real development discussions on GitHub, into design tests, that is, executable conformance checkers. Rather than relying on formal models, our approach derives design tests from implicit architectural decisions discussed in natural language. We conducted a preliminary empirical study in which we generated 30 design tests representing rules for 6 design patterns. The results are promising: 96.67% of the generated design tests execute successfully, 63.33% correctly assert the expected behaviors, and 76.67% accurately reflect the intended architectural rules. This approach could simplify conformance checking by reducing the need to manually write formal specifications and tests. By leveraging existing development discussions, it makes the process more accessible and less time-consuming and supports early identification of architectural violations during development.

Keywords: Large Language Models, Design Rules, Architectural Conformance, Design Tests
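
To make the notion of a design test concrete, the following is a minimal sketch of what an executable conformance checker might look like. It is not taken from the paper or its reproducibility package. The rule ("code in the domain layer must not import from the infrastructure layer"), the package name `infrastructure`, the file names, and the helper functions `imported_modules`, `violations`, and `check_rule` are all hypothetical and chosen only for illustration. The sketch uses Python's standard `ast` module and assumes Python 3.9 or later.

```python
import ast

# Hypothetical design rule, phrased as a reviewer might in a pull request:
# "Code in the domain layer must not import from the infrastructure layer."
FORBIDDEN_PREFIX = "infrastructure"  # assumed package name, illustration only


def imported_modules(source: str) -> set[str]:
    """Return the names of modules imported (absolutely) by a source string."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module)
    return modules


def violations(source: str, forbidden_prefix: str = FORBIDDEN_PREFIX) -> list[str]:
    """List the imports in `source` that break the rule, sorted for stable output."""
    return sorted(
        name
        for name in imported_modules(source)
        if name == forbidden_prefix or name.startswith(forbidden_prefix + ".")
    )


def check_rule(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each offending file name to the forbidden modules it imports."""
    report = {}
    for filename, source in sources.items():
        found = violations(source)
        if found:
            report[filename] = found
    return report


# Stand-in for the real files of a domain package (hypothetical contents).
domain_sources = {
    "domain/order.py": "import json\nfrom dataclasses import dataclass\n",
    "domain/billing.py": "from infrastructure.db import Session\n",
}

print(check_rule(domain_sources))
# prints {'domain/billing.py': ['infrastructure.db']}
```

In a test suite, such a check would typically be wrapped in an assertion that the report is empty, so that a violating import fails the build in the same way a failing unit test would.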
