Automated Clustering of Microservices Using Natural Language Processing and Clustering Algorithms

Santiago di Sabato; Guillermo Rodríguez; Claudia A. Marcos; Santiago Vidal; José Renan A. Pereira; Danyllo Albuquerque; Mirko Perkusich

doi:10.5753/sbcars.2025.14592

Santiago di Sabato (ISISTAN / UNICEN)
Guillermo Rodríguez (ISISTAN / UNICEN)
Claudia A. Marcos (ISISTAN / UNICEN)
Santiago Vidal (ISISTAN / UNICEN)
José Renan A. Pereira (UFCG)
Danyllo Albuquerque (UFCG)
Mirko Perkusich (UFCG)

Abstract

Microservices have become a leading architectural style for building scalable systems. Designing these architectures involves breaking down large APIs into well-defined service boundaries. While many existing clustering techniques rely on access to the complete source code, API specifications (such as OpenAPI) are often more readily available in practice. This is particularly true when code is distributed across different teams, tied to legacy systems, or expensive to analyze. However, API specifications are often poorly documented, which hampers the effectiveness of clustering. Large Language Models (LLMs) offer a promising solution by inferring semantic relationships even from sparse or incomplete descriptions. This paper introduces ODAM (OpenAPI Documentation Analysis and Modeling), a Python-based pipeline that (i) fills missing endpoint descriptions with ChatGPT, (ii) embeds all texts, (iii) performs an intermediate topic induction, and (iv) feeds the resulting vectors to either K-Means or hierarchical clustering. We investigated ODAM’s feasibility using a real-world API (Twilio, 45 microservices), comparing six pipeline variants (three intermediate strategies crossed with two description settings) against an expert-defined reference, using four evaluation criteria: success rate, failure rate, Silhouette score, and execution time. The results showed that the LLM-augmented pipeline achieved a 64% success rate, 34% higher than the best non-LLM baseline, and uncovered five coherent business domains at k = 5, where the Silhouette score peaked at 0.55. Statistical pipelines ran in under 100 seconds, while the LLM-enhanced version took 1.6 hours but dropped to under one minute with cached descriptions. Final clustering quality remained stable across K-Means and hierarchical algorithms, with a variation of 0.05 in Silhouette scores.
Overall, injecting LLM-generated semantics into sparsely documented APIs materially improves microservice clustering accuracy and exposes high-level capabilities; practitioners can trade runtime for precision via a simple cache toggle.
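
The Silhouette score used above as an evaluation criterion can be illustrated with a minimal sketch. This is not ODAM's implementation: the function name `silhouette_score`, the use of Euclidean distance, and the toy two-dimensional vectors standing in for endpoint embeddings are all assumptions made for illustration. The sketch only shows how the score rewards assignments whose clusters are compact and well separated.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Hypothetical helper for illustration; not taken from ODAM.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the members of a cluster, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: distance to the point's own cluster
        b = min(               # separation: distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Toy vectors (assumed): two tight groups that are far apart.
vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(vectors, [0, 0, 1, 1]))  # groups match the geometry
print(silhouette_score(vectors, [0, 1, 0, 1]))  # groups cut across the geometry
```

The first assignment scores close to 1 and the second scores below 0. Scanning candidate values of k and keeping the one with the highest score is one common way to pick the number of clusters.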

Keywords: Software Architecture, LLM, Software Modernization, Microservices, Architectural Decomposition
