Semantic Clustering of Civic Proposals: A Case Study on Brazil’s National Participation Platform

  • Ronivaldo Ferreira UFPA
  • Guilherme da Silva UnB
  • Carla Rocha UnB
  • Gustavo Pinto UFPA

Abstract


Promoting participation on digital platforms such as Brasil Participativo has emerged as a top priority for governments worldwide. However, the sheer volume of contributions leaves much of this engagement underutilized, because organizing it poses three challenges: (1) manual classification is infeasible at scale; (2) expert involvement is required; and (3) the resulting categories must align with official taxonomies. In this paper, we introduce an approach that combines BERTopic with seed words and automatic validation by large language models (LLMs). Initial results indicate that the generated topics are coherent and institutionally aligned while requiring minimal human effort. This methodology enables governments to transform large volumes of citizen input into actionable data for public policy.
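To make the idea of seed-guided grouping concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes a hypothetical three-category taxonomy with illustrative seed words, and it swaps in a plain bag-of-words cosine similarity where the actual approach relies on BERTopic and sentence embeddings. The names `SEED_WORDS`, `assign_topic`, and the `threshold` value are assumptions made for this example only.

```python
import math
import re
from collections import Counter

# Hypothetical taxonomy: category -> seed words (illustrative, not from the paper).
SEED_WORDS = {
    "health": ["hospital", "health", "clinic", "vaccine"],
    "education": ["school", "teacher", "university", "student"],
    "environment": ["forest", "river", "climate", "pollution"],
}


def bag_of_words(text):
    """Lowercase the text and count its word tokens (a stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_topic(proposal, seeds=SEED_WORDS, threshold=0.1):
    """Return (category, score) for the closest seed set, or (None, score) if below threshold."""
    vec = bag_of_words(proposal)
    scores = {cat: cosine(vec, Counter(words)) for cat, words in seeds.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]


proposals = [
    "Build a new hospital and expand vaccine coverage",
    "Hire more teacher staff for every public school",
    "Protect the river and the forest from pollution",
    "Improve the bus timetable downtown",
]
labels = [assign_topic(p)[0] for p in proposals]
print(labels)
```

Proposals that match no seed set come back as `None`. In a fuller pipeline, those are the items one might route to a separate validation or review step, such as the LLM-based check the abstract mentions.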

Published
2025-09-29
FERREIRA, Ronivaldo; SILVA, Guilherme da; ROCHA, Carla; PINTO, Gustavo. Semantic Clustering of Civic Proposals: A Case Study on Brazil’s National Participation Platform. In: NATIONAL MEETING ON ARTIFICIAL AND COMPUTATIONAL INTELLIGENCE (ENIAC), 22., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 700-711. ISSN 2763-9061. DOI: https://doi.org/10.5753/eniac.2025.14018.
