LegisPL-BR - Dataset of Brazilian bills

  • Juan M. V. Marciano, Federal Institute of Piauí (IFPI)
  • Vinicius P. Machado, Federal University of Piauí (UFPI)
  • Arlino H. M. Araújo, Federal University of Piauí (UFPI)

Abstract


Legislative informatics has benefited from the use of structured data for political analysis. However, the data on legislative proposals available in Brazil are poorly documented. This work presents the construction of a structured dataset of bills (Projetos de Lei, PL) from the Chamber of Deputies. The data were extracted from the Chamber's official API and enriched with additional information. The result is an open, unified dataset with potential uses in legislative studies and data science applications.

Keywords: Dataset, Bills, Data Science
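The abstract states only that the bills were extracted from the Chamber of Deputies' official API. As an illustration, and not the authors' actual pipeline, the minimal sketch below assumes the public Dados Abertos API v2 endpoint `/proposicoes`, its query parameters (`siglaTipo`, `ano`, `pagina`, `itens`, `ordem`, `ordenarPor`), and a JSON response holding records under `dados` and pagination entries under `links` with `rel` set to `next`. The helper names `build_url`, `parse_page`, and `fetch_year` are hypothetical, and the enrichment step is not shown because the abstract does not describe it.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed base URL of the Chamber of Deputies open-data API (v2); not taken from the paper.
BASE_URL = "https://dadosabertos.camara.leg.br/api/v2/proposicoes"


def build_url(year: int, page: int = 1, per_page: int = 100) -> str:
    """Hypothetical helper: URL for one page of bills (siglaTipo=PL) from a given year."""
    params = {
        "siglaTipo": "PL",   # assumed filter for the Bill (PL) proposal type
        "ano": year,
        "pagina": page,
        "itens": per_page,
        "ordem": "ASC",
        "ordenarPor": "id",
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def parse_page(payload: dict) -> tuple[list[dict], str | None]:
    """Hypothetical helper: return (PL records, URL of the next page or None)."""
    records = [
        {"id": p["id"], "numero": p["numero"], "ano": p["ano"], "ementa": p.get("ementa", "")}
        for p in payload.get("dados", [])
        if p.get("siglaTipo") == "PL"  # keep only bills, in case other types slip through
    ]
    next_url = next(
        (link["href"] for link in payload.get("links", []) if link.get("rel") == "next"),
        None,
    )
    return records, next_url


def fetch_year(year: int) -> list[dict]:
    """Follow 'next' links until the last page (requires network access)."""
    url: str | None = build_url(year)
    rows: list[dict] = []
    while url:
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            records, url = parse_page(json.load(resp))
        rows.extend(records)
    return rows
```

Keeping URL construction and page parsing separate from the network call lets the paging logic be checked offline; calling `fetch_year(2024)` would keep requesting pages until the API no longer returns a `next` link.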

Published: 2025-09-29
MARCIANO, Juan M. V.; MACHADO, Vinicius P.; ARAÚJO, Arlino H. M. LegisPL-BR - Dataset of Brazilian bills. In: DATASET SHOWCASE WORKSHOP (DSW), 7., 2025, Fortaleza/CE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 58-70. DOI: https://doi.org/10.5753/dsw.2025.247728.