Topic Modeling of Committee Discussions in the Brazilian Chamber of Deputies


Ensuring that civil society can monitor and supervise the actions of its representatives is essential to building strong democracies. Despite significant advances in transparency, the committees of the Brazilian National Congress remain difficult to follow and monitor, both because open, structured data about their discussions is lacking and because of the sheer volume of committee activity. This work makes two contributions in this context. First, we create and present an open dataset of structured speeches from the 25 standing committees of the Chamber of Deputies over the last two decades. Second, we use Natural Language Processing techniques, especially Latent Dirichlet Allocation (LDA), to identify the themes addressed in these committees. Based on these latent topics, we explore similarities and differences between the standing committees, their relationships, and how their debates change over time. Our results show that committees accommodate conversations that include both their main topic and opposing agendas, and describe how the topics discussed in the committees echo external events.

Keywords: Chamber of Deputies, Latent Dirichlet Allocation, Natural Language Processing, Politics
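
As a rough illustration of the kind of latent-topic inference the abstract refers to, the sketch below fits LDA by collapsed Gibbs sampling on a handful of invented, pre-tokenized speeches and then compares two speeches by the cosine similarity of their topic mixtures. The speech labels, tokens, hyperparameters, and the `fit_lda` and `cosine` helpers are all assumptions made for illustration; this is not the authors' pipeline, dataset, or inference method.

```python
import math
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab); every row of doc_topic and
    topic_word is a probability distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_words for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Assign a random initial topic to every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + beta)
                    / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(n_dk, docs)
    ]
    topic_word = [
        [(c + beta) / (n_k[k] + n_words * beta) for c in row]
        for k, row in enumerate(n_kw)
    ]
    return doc_topic, topic_word, vocab


def cosine(u, v):
    """Cosine similarity between two topic mixtures."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Invented toy speeches (hypothetical labels and tokens, not real data).
speeches = {
    "health_1": "hospital vaccine doctor hospital nurse vaccine".split(),
    "health_2": "vaccine hospital budget doctor nurse".split(),
    "education_1": "school teacher student school university".split(),
    "education_2": "teacher university budget student school".split(),
}
labels = list(speeches)
docs = [speeches[name] for name in labels]

doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)

# Show the three most probable words of each inferred topic.
for k, row in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)[:3]
    print(f"topic {k}:", [vocab[i] for i in top])

# Compare two speeches through their topic mixtures.
print("similarity:", cosine(doc_topic[0], doc_topic[1]))
```

On real committee transcripts, one would normally preprocess the text first (for example, stop-word removal and stemming) and rely on an established topic-modeling library rather than a hand-written sampler; the sketch only shows what the document-topic and topic-word distributions look like and how they can feed a similarity comparison.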


How to Cite

DOS SANTOS, M. A.; ANDRADE, N.; MORAIS, F. Topic Modeling of Committee Discussions in the Brazilian Chamber of Deputies. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 9., 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 49-56. ISSN 2763-8944. DOI: