Avaliação do Desempenho de Ferramentas de Transcrição de Áudio em Português para Análise de Dados da Web (Performance Evaluation of Portuguese Audio Transcription Tools for Web Data Analysis)

Jonatas Santos; Marcelo M. R. Araujo; Josemar Caetano; Yago Santos; Julio C. S. Reis; Ana P. C. Silva; Jussara M. Almeida

doi:10.5753/brasnam.2023.230512

Jonatas Santos (UFMG)
Marcelo M. R. Araujo (UFMG)
Josemar Caetano (UFMG)
Yago Santos (UFV)
Julio C. S. Reis (UFV)
Ana P. C. Silva (UFMG)
Jussara M. Almeida (UFMG)

Abstract

In this work, we evaluate the performance of Portuguese audio transcription tools for Web data analysis. We use a publicly accessible dataset and analyze the tools along two main dimensions: the total number of transcription failures and accuracy. The results may help guide researchers in choosing audio transcription methods for studies focused on the Portuguese language.
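
As an illustration of the two evaluation dimensions named in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how failures or accuracy are measured, so this example assumes that a failure is an empty or missing transcription and that accuracy is the mean character-level similarity ratio from Python's standard `difflib.SequenceMatcher`. The function name `evaluate_transcriptions` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def evaluate_transcriptions(references, hypotheses):
    """Return (failures, mean_similarity) for paired transcriptions.

    Assumptions (not taken from the paper):
      - a failure is a hypothesis that is None or blank;
      - accuracy is the mean SequenceMatcher ratio over non-failed pairs,
        compared case-insensitively.
    """
    failures = 0
    scores = []
    for ref, hyp in zip(references, hypotheses):
        if hyp is None or not hyp.strip():
            failures += 1  # the tool returned nothing usable
            continue
        matcher = SequenceMatcher(None, ref.lower(), hyp.lower())
        scores.append(matcher.ratio())  # similarity in [0, 1]
    mean_similarity = sum(scores) / len(scores) if scores else 0.0
    return failures, mean_similarity


# Hypothetical usage: two reference transcripts, one tool failure.
refs = ["bom dia", "boa noite"]
hyps = ["bom dia", None]
failures, accuracy = evaluate_transcriptions(refs, hyps)
print(failures, accuracy)
```

Reporting the two values separately keeps a tool that often returns nothing from looking accurate merely because its few successful outputs are good. A word-level metric such as word error rate would be a reasonable alternative to the similarity ratio assumed here.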
