New State-of-the-Art for Question Answering on Portuguese SQuAD v1.1
Abstract
In the field of Natural Language Processing (NLP), Machine Reading Comprehension (MRC), which involves teaching computers to read a text and understand its meaning, has been a major research goal over the last few decades. A natural way to evaluate whether a computer fully understands a piece of text, that is, to test a machine's reading comprehension, is to require it to answer questions about that text. In this sense, Question Answering (QA) has received increasing attention among NLP tasks. In this study, we fine-tuned Portuguese BERT language models (BERTimbau Base and BERTimbau Large) on SQuAD-BR, the SQuAD v1.1 dataset translated to Portuguese by the Deep Learning Brazil group, for the extractive QA task, with the goal of outperforming the models previously trained on this dataset. We accomplished this objective, establishing a new state of the art on SQuAD-BR with the fine-tuned BERTimbau Large model.
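The sketch below illustrates the kind of fine-tuning setup described above, using the Hugging Face Transformers Trainer [Huggingface 2021]. It is a minimal illustration, not the exact training script used in this work: the checkpoint name follows the public BERTimbau Base release by Souza et al. [2020], the SQuAD-BR file path is a placeholder, the data is assumed to have been flattened to one record per question with "question", "context" and "answers" columns in the standard SQuAD schema, and the hyperparameters are typical SQuAD v1.1 values rather than tuned ones.

# Minimal fine-tuning sketch (illustrative; paths and hyperparameters are
# assumptions, not the exact configuration used in this work).
from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

model_name = "neuralmind/bert-base-portuguese-cased"  # BERTimbau Base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Placeholder path: SQuAD-BR flattened to one (question, context, answers)
# record per row, with answers = {"text": [...], "answer_start": [...]}.
raw = load_dataset("json", data_files={"train": "squad-br-train.json"})["train"]

max_length, doc_stride = 384, 128  # typical SQuAD v1.1 values

def preprocess(examples):
    # Tokenize question/context pairs; long contexts are split into
    # overlapping windows (stride) so answers are not silently truncated.
    tokenized = tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )
    sample_map = tokenized.pop("overflow_to_sample_mapping")
    offsets_all = tokenized.pop("offset_mapping")
    starts, ends = [], []
    for i, offsets in enumerate(offsets_all):
        answer = examples["answers"][sample_map[i]]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = tokenized.sequence_ids(i)
        ctx_start = seq_ids.index(1)
        ctx_end = len(seq_ids) - 1 - seq_ids[::-1].index(1)
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # Answer not inside this window: point both labels at [CLS].
            starts.append(0)
            ends.append(0)
        else:
            # Map character offsets of the answer span to token indices.
            idx = ctx_start
            while idx <= ctx_end and offsets[idx][0] <= start_char:
                idx += 1
            starts.append(idx - 1)
            idx = ctx_end
            while idx >= ctx_start and offsets[idx][1] >= end_char:
                idx -= 1
            ends.append(idx + 1)
    tokenized["start_positions"] = starts
    tokenized["end_positions"] = ends
    return tokenized

train_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bertimbau-squad-br",
                           learning_rate=3e-5, num_train_epochs=2,
                           per_device_train_batch_size=16, weight_decay=0.01),
    train_dataset=train_ds,
    data_collator=default_data_collator,
    tokenizer=tokenizer,
)
trainer.train()

Fine-tuning BERTimbau Large follows the same recipe with the large checkpoint (neuralmind/bert-large-portuguese-cased) and, typically, a smaller per-device batch size to fit GPU memory.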
References
Cambazoglu, B. B., Sanderson, M., Scholer, F., and Croft, W. B. A review of public datasets in question answering research. SIGIR Forum 54 (2), Aug. 2021.
DeepLearningBrasil. SQuAD v1.1 automatically translated to Portuguese and reviewed, 2021.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp. 4171–4186, 2019.
Gotmare, A., Keskar, N. S., Xiong, C., and Socher, R. A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation. CoRR vol. abs/1810.13243, 2018.
Guillou, P. BERT base cased SQuAD v1.1 Portuguese, 2021a. [link]. Last accessed: 2021-09-13.
Guillou, P. BERT large cased SQuAD v1.1 Portuguese, 2021b. [link]. Last accessed: 2021-09-13.
Howard, J. and Ruder, S. Fine-tuned language models for text classification. CoRR vol. abs/1801.06146, 2018.
Hugging Face. Trainer - Transformers 4.7.0 docs, 2021. [link]. Last accessed: 2021-07-08.
Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., and Levy, O. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics vol. 8, pp. 64–77, 2020.
Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun (Eds.), 2015.
Liu, N. F., Lee, T., Jia, R., and Liang, P. Can small and synthetic benchmarks drive modeling innovation? A retrospective study of question answering modeling approaches. CoRR vol. abs/2102.01065, 2021.
Loshchilov, I. and Hutter, F. Fixing weight decay regularization in Adam. CoRR vol. abs/1711.05101, 2017.
Malte, A. and Ratadiya, P. Evolution of transfer learning in natural language processing. CoRR vol. abs/1910.07370, 2019.
Mayeesha, T. T., Sarwar, A. M., and Rahman, R. M. Deep learning based question answering system in Bengali. Journal of Information and Telecommunication 5 (2): 145–178, 2021.
Miller, J., Krauth, K., Recht, B., and Schmidt, L. The effect of natural distribution shift on question answering models. CoRR vol. abs/2004.14444, 2020.
Patel, D., Raval, P., Parikh, R., and Shastri, Y. Comparative study of machine learning models and BERT on SQuAD. CoRR vol. abs/2005.11313, 2020.
Pranesh, R. R., Shekhar, A., and Pallavi, S. QuesBELM: A BERT based ensemble language model for natural questions. In 2020 5th International Conference on Computing, Communication and Security (ICCCS). pp. 1–5, 2020.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21 (140): 1–67, 2020.
Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. SQuAD: 100,000+ questions for machine comprehension of text. CoRR vol. abs/1606.05250, 2016.
Ravichander, A., Dalmia, S., Ryskina, M., Metze, F., Hovy, E., and Black, A. W. NoiseQA: Challenge set evaluation for user-centric question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, Online, pp. 2976–2992, 2021.
Souza, F., Nogueira, R., and Lotufo, R. BERTimbau: Pretrained BERT models for Brazilian Portuguese. In Intelligent Systems, R. Cerri and R. C. Prati (Eds.). Springer International Publishing, Cham, pp. 403–417, 2020.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. CoRR vol. abs/1706.03762, 2017.
Wadhwa, S., Chandu, K., and Nyberg, E. Comparative analysis of neural QA models on SQuAD. In Proceedings of the Workshop on Machine Reading for Question Answering. Association for Computational Linguistics, Melbourne, Australia, pp. 89–97, 2018.
Wagner Filho, J. A., Wilkens, R., Idiart, M., and Villavicencio, A. The brWaC corpus: A new open resource for Brazilian Portuguese. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan, 2018.
Yamada, I., Asai, A., Shindo, H., Takeda, H., and Matsumoto, Y. LUKE: Deep contextualized entity representations with entity-aware self-attention. CoRR vol. abs/2010.01057, 2020.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Vol. 32. Curran Associates, Inc., 2019.
Zeng, C., Li, S., Li, Q., Hu, J., and Hu, J. A survey on machine reading comprehension—tasks, evaluation metrics and benchmark datasets. Applied Sciences 10 (21), 2020.