Experiments on Portuguese Clinical Question Answering

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13074)

Abstract

Question answering (QA) systems aim to answer questions that people pose in natural language. This functionality is useful in many application domains, including the biomedical and clinical domains. In the clinical context, where a growing volume of information is stored in electronic health records, answering questions about a patient's status can improve decision-making and optimize patient care. In this work, we carried out the first experiments to develop a QA model for clinical texts in Portuguese. To overcome the lack of corpora for this language and context, we used a transfer learning approach supported by pre-trained attention-based models from the Transformers library. We fine-tuned the BioBERTpt model on a translated version of the SQuAD dataset. The model showed promising results when evaluated in different clinical scenarios, even though no clinical QA corpus was used for training. The developed model is publicly available to the scientific community.
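Fine-tuning a BERT-style model on SQuAD trains it to score each context token as a possible start or end of the answer, and the answer is read off as the best-scoring span. The following is a minimal sketch of that decoding step only, not the authors' code. The function name `best_span`, the `max_answer_len` limit, the example tokens, and the logit values are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the span with the highest
    start_logit + end_logit, subject to start <= end and a span of
    at most max_answer_len tokens."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_logits):
        # Only consider end positions at or after the start position.
        last = min(len(end_logits), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Hypothetical tokenized context and made-up logits (not model output).
tokens = ["paciente", "refere", "dor", "torácica", "há", "dois", "dias"]
start_logits = [0.1, 0.0, 2.5, 0.3, 0.2, 0.1, 0.0]
end_logits = [0.0, 0.1, 0.4, 3.0, 0.1, 0.2, 0.3]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)  # dor torácica
```

In a real model the logits come from the fine-tuned network and the tokens are subword pieces, so the selected span also has to be mapped back to character offsets in the original text.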

Notes

  1. https://github.com/huggingface/transformers/

  2. https://physionet.org/content/mimiciii/1.4/

  3. https://www.i2b2.org/NLP/DataSets/Main.php

  4. https://huggingface.co/bert-base-multilingual-cased

  5. https://huggingface.co/pucpr/biobertpt-all

  6. http://www.deeplearningbrasil.com.br/

  7. https://huggingface.co/pucpr/bioBERTpt-squad-v1.1-portuguese
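The last note points to the published model on the Hugging Face Hub. A minimal sketch of how it could be queried with the Transformers `pipeline` API is shown below. The model identifier is taken from that URL; the helper names `load_qa_pipeline` and `ask` are hypothetical, and the `qa` parameter is an assumption added so the function can be exercised without downloading the model.

```python
from typing import Callable, Dict, Optional

# Model identifier taken from the Hugging Face URL listed in the notes.
MODEL_ID = "pucpr/bioBERTpt-squad-v1.1-portuguese"


def load_qa_pipeline(model_id: str = MODEL_ID):
    """Build a question-answering pipeline for the given model.

    transformers is imported lazily so that importing this module does
    not require the library or trigger a model download.
    """
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def ask(
    question: str,
    context: str,
    qa: Optional[Callable[..., Dict]] = None,
) -> str:
    """Return the answer text extracted from context.

    qa is any callable with the pipeline's call convention
    (keyword arguments question and context, returning a dict with an
    "answer" key); when omitted, the published model is loaded.
    """
    if qa is None:
        qa = load_qa_pipeline()
    result = qa(question=question, context=context)
    return result["answer"]
```

Calling `ask("Qual é a queixa do paciente?", "Paciente refere dor torácica há dois dias.")` would download the model on first use and return the extracted span. The question and context here are invented examples, not data from the paper.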

Acknowledgements

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Author information

Corresponding author

Correspondence to Lucas Emanuel Silva e Oliveira.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Oliveira, L.E.S.e., Schneider, E.T.R., Gumiel, Y.B., Luz, M.A.P.d., Paraiso, E.C., Moro, C. (2021). Experiments on Portuguese Clinical Question Answering. In: Britto, A., Valdivia Delgado, K. (eds) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_10

  • DOI: https://doi.org/10.1007/978-3-030-91699-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91698-5

  • Online ISBN: 978-3-030-91699-2

  • eBook Packages: Computer Science, Computer Science (R0)
