Semantic Clustering in the Context of Legislative Amendments
Abstract
This study explores the semantic clustering of Brazilian legislative amendments using Large Language Models. A large number of amendments are drafted each year in the Brazilian Federal Chamber of Deputies and the Federal Senate, and the teams responsible for grouping these documents spend extensive hours on the task, so automated techniques for efficient analysis and organization are a beneficial alternative for legislative bodies. The study compared approaches with and without text preprocessing and varied the model’s temperature parameter to assess its impact on result quality. To validate the effectiveness of the resulting clusters, precision, recall, and F1 metrics were applied.
References
Agnoloni, T., Marchetti, C., Battistoni, R., and Briotti, G. (2022). Clustering similar amendments at the Italian Senate. In Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference, pages 39–46, Marseille, France. European Language Resources Association.
Bird, S. and Loper, E. (2004). NLTK: The natural language toolkit. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, pages 214–217, Barcelona, Spain. Association for Computational Linguistics.
Câmara dos Deputados (2019). Câmara lança Ulysses, robô digital para facilitar acesso a informações legislativas. Accessed: July 15, 2024.
Câmara dos Deputados (2024). u4 ordenado completo. Accessed: December 17, 2024.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Moreira, V. and Huyck, C. (2001). A stemming algorithm for the Portuguese language. In Proceedings of the Eighth International Symposium on String Processing and Information Retrieval (SPIRE 2001), Porto, Portugal. IEEE.
OpenAI (2023). GPT-4 technical report. Accessed: March 4, 2024.
Pressato, D., de Andrade, P. L. C., Junior, F. R., Siqueira, F. A., Souza, E. P. R., da Silva, N. F. F., de Souza Dias, M., and de Leon Ferreira de Carvalho, A. C. P. (2024). Natural language processing application in legislative activity: a case study of similar amendments in the Brazilian Senate. In Gamallo, P., Claro, D., Teixeira, A., Real, L., Garcia, M., Oliveira, H. G., and Amaro, R., editors, Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1, pages 614–619, Santiago de Compostela, Galicia/Spain. Association for Computational Linguistics.
Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
Souza, E., Vitório, D., Moriyama, G., Santos, L., Martins, L., Souza, M., Fonseca, M., Félix, N., de Carvalho, A. C. P. d. L. F., Albuquerque, H. O., and Oliveira, A. L. I. (2021). An information retrieval pipeline for legislative documents from the Brazilian Chamber of Deputies. In Legal Knowledge and Information Systems, pages 119–126. IOS Press.
Vayadande, K., Bhat, A., Bachhav, P., Bhoyar, A., Charoliya, Z., and Chavan, A. (2024). AI-powered legal documentation assistant. In Proceedings of the 4th International Conference on Pervasive Computing and Social Networking (ICPCSN). IEEE.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In Proceedings of the 8th International Conference on Learning Representations (ICLR). Cornell University and ASAPP Inc.
Published
2025-09-29
How to Cite
ANDRADE, Pedro L. C. de; SOUZA, Ellen P. R.; SILVA, Nádia F. F. da; CARVALHO, André C. P. L. F. de. Semantic Clustering in the Context of Legislative Amendments. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 569-579. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.13910.
