FakeBrAccent: A Portuguese Audio Deepfake Dataset with Different Brazilian Accents

  • Erick M. B. Santos IFES
  • Katarina Veljovic IFES
  • Karin S. Komati IFES

Abstract


The article presents the FakeBrAccent dataset, aimed at detecting audio deepfakes in Brazilian Portuguese. Created from the BrAccent corpus, the dataset includes original samples and synthetic versions generated with the Speechify tool (zero-shot TTS and voice cloning). It covers five Brazilian accents — Southern, Northeastern, Fluminense, Carioca, and Baiano — and is available in two versions: FakeBrAccent-B, balanced (746 audio samples), and FakeBrAccent-D, unbalanced (1,545 audio samples).
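To illustrate the difference between a balanced and an unbalanced version, here is a minimal sketch. It assumes that "balanced" means equal counts of original and synthetic audio within each accent and uses random undersampling of the majority class, one common way to obtain such a subset. The `Sample` record, its fields, the helper functions, and the toy data are all hypothetical. They do not describe the dataset's actual file layout or the authors' procedure.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical metadata record; not the dataset's actual layout."""
    path: str
    accent: str    # e.g. "Southern", "Northeastern", "Fluminense", "Carioca", "Baiano"
    is_fake: bool  # True for synthetic (Speechify) audio, False for BrAccent originals


def balance_by_undersampling(samples, seed=0):
    """Keep equal numbers of real and fake samples within each accent.

    Random undersampling of the majority class; an assumed technique,
    not necessarily the procedure used to build FakeBrAccent-B.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    groups = defaultdict(lambda: {False: [], True: []})
    for s in samples:
        groups[s.accent][s.is_fake].append(s)

    balanced = []
    for accent in sorted(groups):
        real, fake = groups[accent][False], groups[accent][True]
        n = min(len(real), len(fake))  # size of the minority class
        balanced.extend(rng.sample(real, n))
        balanced.extend(rng.sample(fake, n))
    return balanced


def label_counts(samples):
    """Map each accent to [number of real samples, number of fake samples]."""
    counts = defaultdict(lambda: [0, 0])
    for s in samples:
        counts[s.accent][int(s.is_fake)] += 1
    return dict(counts)


# Invented toy data (not real FakeBrAccent counts).
toy = (
    [Sample(f"carioca_real_{i}.wav", "Carioca", False) for i in range(3)]
    + [Sample(f"carioca_fake_{i}.wav", "Carioca", True) for i in range(7)]
    + [Sample(f"baiano_real_{i}.wav", "Baiano", False) for i in range(5)]
    + [Sample(f"baiano_fake_{i}.wav", "Baiano", True) for i in range(2)]
)
print(label_counts(toy))                            # {'Carioca': [3, 7], 'Baiano': [5, 2]}
print(label_counts(balance_by_undersampling(toy)))  # {'Baiano': [2, 2], 'Carioca': [3, 3]}
```

Undersampling discards majority-class audio instead of generating more. Every retained sample stays unmodified, but the resulting set is smaller. This would be consistent with the balanced version being the smaller of the two, though that correspondence is only an inference.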

Published: 2025-10-16
SANTOS, Erick M. B.; VELJOVIC, Katarina; KOMATI, Karin S. FakeBrAccent: A Portuguese Audio Deepfake Dataset with Different Brazilian Accents. In: REGIONAL SCHOOL OF INFORMATICS OF ESPÍRITO SANTO (ERI-ES), 10., 2025, Espírito Santo/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 166-169. DOI: https://doi.org/10.5753/eries.2025.16040.