FLA-Dataset: A Database of Four Location-Based Audio Commands in Brazilian Portuguese

  • Daniel Ribeiro da Silva UFG
  • Gabriel Pettro Oliveira Ruotolo UFG
  • Alexandre Costa Ferro Filho UFG
  • Marcelo Henrique Lopes Ferreira UFG
  • Letícia Lima Mendes UFG
  • José Rafael Rebêlo Teles UFG

Abstract


The FLA-Dataset provides a curated collection of wake word audio commands in Brazilian Portuguese. It supports the development and evaluation of voice-activated systems by including four direction-oriented commands: direita, esquerda, frente, and pare. Recordings are contributed by 30 speakers across diverse demographic and acoustic profiles. The dataset promotes generalization by incorporating varied speech patterns, environments, and genders. All files are standardized in format, sampling rate, and structure to ensure usability. Each sample contains a single spoken command and is organized to allow speaker-specific experiments. Applications span robotics, assistive technologies, and smart devices. The dataset addresses the scarcity of localized resources for Portuguese. It enables speaker-independent training and evaluation in realistic scenarios. FLA-Dataset is publicly available and designed to support inclusive speech recognition research and deployment.
Keywords: Wake word detection, Brazilian Portuguese, Audio commands, Speech recognition, Dataset
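
The speaker-independent evaluation mentioned in the abstract can be illustrated with a minimal sketch that keeps every speaker entirely in either the training or the test partition. The record shape `(speaker_id, command, path)`, the function name `speaker_independent_split`, and the 20% test fraction are assumptions made for illustration; the abstract does not specify the dataset's file layout or any official split.

```python
import random
from collections import defaultdict

# The four commands named in the abstract.
COMMANDS = ("direita", "esquerda", "frente", "pare")


def speaker_independent_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no speaker appears in both train and test.

    `samples` is an iterable of (speaker_id, command, path) tuples. This
    record shape is hypothetical, not the dataset's documented layout.
    """
    by_speaker = defaultdict(list)
    for speaker_id, command, path in samples:
        if command not in COMMANDS:
            raise ValueError(f"unexpected command: {command!r}")
        by_speaker[speaker_id].append((speaker_id, command, path))

    # Sort first so the shuffle is reproducible for a given seed.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Hold out at least one speaker for testing.
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train = [s for spk in speakers[n_test:] for s in by_speaker[spk]]
    test = [s for spk in speakers[:n_test] for s in by_speaker[spk]]
    return train, test, test_speakers
```

Splitting by speaker rather than by file means test accuracy reflects voices the model has not heard, which is the scenario the abstract describes. With 30 speakers and the assumed 20% fraction, six speakers would be held out.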

Published
04/12/2025
SILVA, Daniel Ribeiro da; RUOTOLO, Gabriel Pettro Oliveira; FERRO FILHO, Alexandre Costa; FERREIRA, Marcelo Henrique Lopes; MENDES, Letícia Lima; TELES, José Rafael Rebêlo. FLA-Dataset: A Database of Four Location-Based Audio Commands in Brazilian Portuguese. In: ESCOLA REGIONAL DE INFORMÁTICA DE GOIÁS (ERI-GO), 13., 2025, Luziânia/GO. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 165-174. DOI: https://doi.org/10.5753/erigo.2025.17091.