New State-of-the-Art for Question Answering on Portuguese SQuAD v1.1

  • E. H. M. da Silva Universidade de Brasília
  • J. Laterza Universidade de Brasília
  • T. P. Faleiros Universidade de Brasília


In Natural Language Processing (NLP), Machine Reading Comprehension (MRC), which involves teaching computers to read a text and understand its meaning, has been a major research goal over the last few decades. A natural way to test a machine's reading comprehension is to require it to answer questions about the text. For this reason, Question Answering (QA) has received increasing attention among NLP tasks. In this study, we fine-tuned Portuguese BERT language models (BERTimbau Base and BERTimbau Large) on SQuAD-BR, the SQuAD v1.1 dataset translated into Portuguese by the Deep Learning Brazil group, for the extractive QA task, with the goal of outperforming existing models trained on this dataset. We achieved this objective: the fine-tuned BERTimbau Large model establishes a new state of the art on SQuAD-BR.

Keywords: BERT, extractive question answering, fine-tuning, language model, SQuAD v1.1 Portuguese, transfer learning


