Evaluation of Speech Recognition Systems Robust to Noise from Oil Industry Machinery

Vinicius de Souza Nunes; Julio Cesar Duarte

doi:10.5753/bresci.2022.222765

Vinicius de Souza Nunes IME
Julio Cesar Duarte IME

Abstract

In recent years, automatic speech recognition systems have evolved from rigid rules to probabilistic models, thanks to advances in neural networks. However, speakers may be exposed to noise, which degrades the transcription of their speech. This work therefore aims to evaluate the application of neural networks to noise-robust automatic speech recognition, specifically for noise originating from oil industry machinery. To this end, a dataset of representative noises was built. In the experiments, for models developed with the same type of noise, the mean character error rate (CER) was 0.421984 when the training signal-to-noise ratio (SNR) matched the test SNR, versus 0.522851 when the two ratios differed.
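The two quantities the abstract relies on, SNR and CER, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mix_at_snr` and `cer` are hypothetical, the SNR is assumed to be the usual power ratio in decibels, and the CER is assumed to be the character-level Levenshtein distance divided by the reference length.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise to reach a target SNR in dB.

    Assumes SNR = 10 * log10(P_speech / P_noise), with P the mean power.
    Both inputs are equal-length sequences of samples (hypothetical helper).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Gain that makes the scaled noise power equal p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # Row-by-row Levenshtein distance over characters.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)
```

Under these assumptions, a matched condition would mix the training and test audio with the same `snr_db`, and a mismatched one would use different values; `cer` would then be averaged over the test transcriptions.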

Keywords: Automatic speech recognition, deep learning, noisy audio
