Language Models as Architectural Gatekeepers: Automating Conformance Checking from Natural Language
Abstract
Ensuring that implemented code adheres to its intended architecture (architectural conformance) remains a critical challenge in software engineering. While formal verification tools exist, their use is hindered by the overhead of explicitly defining formal architectural specifications. At the same time, valuable architectural decisions and design constraints are often embedded in informal channels, such as pull request discussions and issue trackers, where they are expressed in natural language rather than formal specifications. In this paper, we explore the potential of Large Language Models (LLMs) to bridge this gap by investigating whether they can transform informal design rules, based on real development discussions on GitHub, into design tests — executable conformance checkers. Rather than relying on formal models, our approach focuses on deriving design tests from implicit architectural decisions discussed in natural language. To investigate this, we conducted a preliminary empirical study to generate 30 design tests representing rules for 6 design patterns. Our results show the potential of such a strategy, as 96.67% of the generated design tests execute successfully. Furthermore, 63.33% correctly assert expected behaviors, and 76.67% accurately reflect the intended architectural rules. This approach has the potential to simplify the conformance checking process by reducing the need to manually write formal specifications and tests. By leveraging existing development discussions, it makes the process more accessible and less time-consuming, and supports early identification of architectural violations during development.
Keywords:
Large Language Models, Design Rules, Architectural Conformance, Design Tests
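To illustrate the idea of a design test described in the abstract, the following is a minimal Python sketch of an executable conformance check. It is only an assumption about what such a test might look like: the rule ("the view layer must not import the repository layer"), the layer name, and the helper functions are hypothetical and not taken from the paper, whose generated tests may use a different language and tooling.

```python
import ast

# Hypothetical rule, as it might be phrased in a pull request discussion:
# "code in the view layer must not import the repository layer directly".
FORBIDDEN_PREFIX = "repository"  # illustrative layer name, not from the paper


def imported_modules(source: str) -> set[str]:
    """Return the names of all modules imported by the given Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return modules


def violates_rule(source: str) -> bool:
    """True if the source imports the forbidden layer or one of its submodules."""
    return any(
        name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + ".")
        for name in imported_modules(source)
    )


def test_view_does_not_import_repository():
    # Stand-in for the contents of a real view module.
    view_source = "from service.orders import list_orders\nimport json\n"
    assert not violates_rule(view_source)
```

A test runner such as pytest would collect `test_view_does_not_import_repository` and fail the build if the rule were broken, which is the sense in which a design test acts as a conformance checker.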
Published
2025-09-22
How to Cite
LUCENA, Andrielly; ALVES, Everton L. G.; BRUNET, João. Language Models as Architectural Gatekeepers: Automating Conformance Checking from Natural Language. In: BRAZILIAN SYMPOSIUM ON SOFTWARE ENGINEERING (SBES), 39., 2025, Recife/PE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 713-719. ISSN 2833-0633. DOI: https://doi.org/10.5753/sbes.2025.11100.
