Large-scale Translation to Enable Response Selection in Low Resource Languages: A COVID-19 Chatbot Experiment

Lucas Almeida Aguiar; Lívia Almada Cruz; Ticiana L. Coelho da Silva; Rafael Augusto Ferreira do Carmo; Matheus Henrique Esteves Paixao

doi:10.5753/sbbd.2022.224329

Lucas Almeida Aguiar, Universidade Estadual do Ceará (UECE)
Lívia Almada Cruz, Universidade Federal do Ceará (UFC)
Ticiana L. Coelho da Silva, Universidade Federal do Ceará (UFC)
Rafael Augusto Ferreira do Carmo, Universidade Federal do Ceará (UFC)
Matheus Henrique Esteves Paixao, Universidade Estadual do Ceará (UECE)

Abstract

Natural language processing for low-resource languages is challenging because the lack of large-scale datasets hurts the performance of data-hungry algorithms. To overcome this, we employ data augmentation to enlarge the training data for response selection in multi-turn retrieval-based chatbots. We automatically translated a large-scale English dataset into Brazilian Portuguese (PT_BR) and used it to train a deep neural network. For a COVID-19 chatbot system, our results show that training on the translated dataset and then fine-tuning on the context-specific dataset yields the best recall for all studied models. In addition, we make the translated large-scale PT_BR dataset available.

Keywords: natural language processing, automatic translation, multi-turn retrieval-based chatbot, low resource language
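
The abstract reports results in terms of recall but does not spell out the exact variant. Response-selection work commonly uses recall at k over a fixed candidate set: for each conversation context, the model scores every candidate response, and the metric checks whether the correct one lands among the k highest scores. The sketch below assumes that variant. The function names, the scores, and the candidate-set size are hypothetical illustrations, not values or code from the paper.

```python
from typing import Sequence, Tuple


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of correct responses found among the k highest-scored candidates.

    scores: model score for each candidate response (higher means better match).
    labels: 1 for a correct response, 0 for a distractor.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_correct = sum(labels)
    if total_correct == 0:
        raise ValueError("at least one candidate must be labeled correct")
    # Rank candidate indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:k])
    return hits / total_correct


def mean_recall_at_k(
    examples: Sequence[Tuple[Sequence[float], Sequence[int]]], k: int
) -> float:
    """Average recall at k over a collection of (scores, labels) examples."""
    return sum(recall_at_k(s, l, k) for s, l in examples) / len(examples)


# Hypothetical scores for two contexts with four candidates each.
examples = [
    ([0.1, 0.9, 0.3, 0.2], [0, 1, 0, 0]),  # correct response ranked first
    ([0.8, 0.2, 0.6, 0.1], [0, 0, 1, 0]),  # correct response ranked second
]
print(mean_recall_at_k(examples, k=1))
print(mean_recall_at_k(examples, k=2))
```

Under this assumed definition, comparing a model trained only on the context-specific data with one pre-trained on the translated data and then fine-tuned amounts to computing the same averaged score for each model on the same test contexts.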
