Evaluation of Automatic Speech Recognition Systems

  • Matheus Xavier Sampaio Federal University of Ceará (UFC)
  • Regis Pires Magalhães Federal University of Ceará (UFC)
  • Ticiana Linhares Coelho da Silva Federal University of Ceará (UFC)
  • Lívia Almada Cruz Federal University of Ceará (UFC)
  • Davi Romero de Vasconcelos Federal University of Ceará (UFC)
  • José Antônio Fernandes de Macêdo Federal University of Ceará (UFC)
  • Marianna Gonçalves Fontenele Ferreira Federal University of Ceará (UFC)

Abstract


Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results demonstrate that the evaluated solutions slightly differ. However, Microsoft Azure Speech outperformed the other analyzed APIs.

Keywords: Machine learning, experiments, analysis

References

Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, pages 12449–12460.

Banerjee, S. and Lavie, A. (2005). Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72.

Chan, W., Jaitly, N., Le, Q., and Vinyals, O. (2016). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition.

Chiu, C.-C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R. J., Rao, K., Gonina, E., et al. (2018). State-of-the-art speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4774–4778. IEEE.

de Lima, T. A. and Da Costa-Abreu, M. (2020). A survey on automatic speech recognition systems for portuguese language and its variations. Computer Speech & Language, 62:101055.

Filippidou, F. and Moussiades, L. (2020). A benchmarking of ibm, google and wit automatic speech recognition systems. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 73–82. Springer.

Graves, A., Mohamed, A.-R., and Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. IEEE.

Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., and Aluisio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. arXiv preprint arXiv:1708.06025.

Kepuska, V. and Bohouta, G. (2017). Comparing speech recognition systems (microsoft api, google api and cmu sphinx). Int. J. Eng. Res. Appl, 7(03):20–24.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2001). Bleu: a method for automatic evaluation of machine translation. In ACL.

Reddy, D. R. (1976). Speech recognition by machine: A review. Proceedings of the IEEE, 64(4):501–531.

Schneider, S., Baevski, A., Collobert, R., and Auli, M. (2019). wav2vec: Unsupervised pre-training for speech recognition. In Interspeech 2019, pages 3465–3469.

Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., and Zweig, G. (2016). Achieving human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018). The microsoft 2017 conversational speech recognition system. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5934–5938. IEEE.
Published
2021-10-04
SAMPAIO, Matheus Xavier; MAGALHÃES, Regis Pires; SILVA, Ticiana Linhares Coelho da; CRUZ, Lívia Almada; VASCONCELOS, Davi Romero de; MACÊDO, José Antônio Fernandes de; FERREIRA, Marianna Gonçalves Fontenele. Evaluation of Automatic Speech Recognition Systems. In: BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 36. , 2021, Rio de Janeiro. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021 . p. 301-306. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17889.