Abstract
Question answering (QA) systems aim to answer questions posed in natural language. This functionality is useful across a wide range of application domains, including the biomedical and clinical domains. In the clinical context, where a growing volume of information is stored in electronic health records, answering questions about a patient's status can improve decision-making and optimize patient care. In this work, we carried out the first experiments to develop a QA model for clinical texts in Portuguese. To overcome the lack of corpora for this language and context, we used a transfer learning approach supported by pre-trained attention-based models from the Transformers library. We fine-tuned the BioBERTpt model with a translated version of the SQuAD dataset. The model showed promising results when evaluated in different clinical scenarios, even though no clinical QA corpus was used during training. The developed model is publicly available to the scientific community.
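To illustrate how a SQuAD-style extractive QA model typically turns its outputs into an answer, the following is a minimal sketch in plain Python. It assumes the usual BERT-style QA head, which scores every context token as a possible answer start and answer end, and picks the span with the highest combined score. The function names (`best_span`, `extract_answer`), the `max_answer_len` parameter, the example sentence, and the scores are all hypothetical and are not taken from the paper; a real system would obtain the scores from the fine-tuned model and work on subword tokens.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. The score is start_logits[start] + end_logits[end].
    """
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must be non-empty and equal in length")

    best = (0, 0, float("-inf"))
    for start, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        last_end = min(len(end_logits), start + max_answer_len)
        for end in range(start, last_end):
            score = start_score + end_logits[end]
            if score > best[2]:
                best = (start, end, score)
    return best


def extract_answer(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> str:
    """Join the tokens of the best span into an answer string."""
    start, end, _ = best_span(start_logits, end_logits, max_answer_len)
    return " ".join(tokens[start : end + 1])


# Hypothetical example: a short clinical sentence in Portuguese and
# made-up scores standing in for the model output for the question
# "Qual a queixa do paciente?" (What is the patient's complaint?).
tokens = ["Paciente", "refere", "dor", "torácica", "há", "dois", "dias"]
start_logits = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0, 0.0]
end_logits = [0.0, 0.1, 0.4, 2.8, 0.2, 0.1, 0.3]

answer = extract_answer(tokens, start_logits, end_logits)
print(answer)
```

The exhaustive search over start and end positions is quadratic in the context length; it is kept here for clarity, and the length limit keeps the cost manageable for short clinical notes.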
Acknowledgements
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Oliveira, L.E.S.e., Schneider, E.T.R., Gumiel, Y.B., Luz, M.A.P.d., Paraiso, E.C., Moro, C. (2021). Experiments on Portuguese Clinical Question Answering. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_10
DOI: https://doi.org/10.1007/978-3-030-91699-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2
eBook Packages: Computer Science, Computer Science (R0)