Processamento e Transcrição de Voz em Língua Portuguesa voltado para Assistente Inteligente (Speech Processing and Transcription in Portuguese for an Intelligent Assistant)

  • Acácio de Andrade USP
  • Shayenne Moura USP
  • Alfredo Goldman USP

Abstract


Set in the context of intelligent assistants and part of the Advanced Distributed Assistant (ADA) project, this work proposes adapting a speech recognition system so that users can interact through voice commands. The system transcribes commands spoken in Portuguese into text and builds on open-source software and available systems. An HMM-based system architecture is adopted. The preliminary system achieved a 44.3% word error rate. In subsequent work, development of the acoustic model will include a hyperparameter optimization stage and experiments with more complex approaches, and an assistant-specific language model will be introduced.
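
For readers unfamiliar with the metric, the word error rate reported above is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the system output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation script, and the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice command: one deleted word ("a") and one substituted
# word ("sala" -> "cozinha") out of five reference words.
wer = word_error_rate("ligar a luz da sala", "ligar luz da cozinha")
print(f"WER = {wer:.1%}")
```

Under this definition, a 44.3% word error rate means roughly 44 word-level errors for every 100 reference words.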

Published
2020-08-19
DE ANDRADE, Acácio; MOURA, Shayenne; GOLDMAN, Alfredo. Processamento e Transcrição de Voz em Língua Portuguesa voltado para Assistente Inteligente. In: REGIONAL SCHOOL OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 1., 2020, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1-4.