CandiDATA: an enhanced dataset for data analysis of elections in Brazil from 1945 to 2020


  • Felipe F. Vasconcelos, Universidade Federal de Alagoas
  • João V. S. Tavares, Universidade Federal de Alagoas
  • Matheus G. S. Oliveira, Universidade Federal de Alagoas
  • Fabio J. Coutinho, Universidade Federal de Alagoas
  • João Paulo Clarindo, Universidade de São Paulo



Keywords: data integration, data cleaning, electoral data


The Brazilian Superior Electoral Court (TSE) keeps data on elections that have taken place in Brazil since 1933. These data constitute an important collection that serves as a reference for work in several research areas. However, the collection is not fully exploited because of problems such as missing and non-standard data, which make analysis and integration with external databases difficult. Because of these problems, previous works built limited datasets and tools that include data only from the 1998 election onward, disregarding the elections held between 1945 and 1996. This work discusses the steps taken to create CandiDATA, a standardized and enhanced dataset built from TSE data, together with a toolkit for web scraping and data visualization. CandiDATA is available in an open format and covers elections from 1945 to 2020.








How to Cite

Vasconcelos, F. F., Tavares, J. V. S., Oliveira, M. G. S., Coutinho, F. J., & Clarindo, J. P. (2022). CandiDATA: an enhanced dataset for data analysis of elections in Brazil from 1945 to 2020. Journal of Information and Data Management, 13(1).



Dataset Showcase Workshop 2021 - Extended Papers