Event Extraction in Clinical Notes

  • João Augusto F. Balducci PUC-Campinas
  • Saullo H. G. de Oliveira PUC-Campinas

Abstract


Event Extraction (EE) is the task of identifying and extracting event information from free text. Due to the large number of unstructured text sources, the healthcare sector can benefit from EE to facilitate the interpretation of health records and mitigate medical errors. We therefore propose EEVIN (Extrator de EVentos clÍNicos) an algorithm for the ordered event extraction problem on clinical texts written in Brazilian Portuguese. The solution was compared with state-of-the-art Large Language Models (LLMs) and obtained significant results while presenting reduced computational costs.
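To make the task framing concrete, the sketch below shows one possible representation of an extracted event and a naive lexicon-based extractor that returns events in the order their triggers appear in a note. The trigger lexicon (`TRIGGERS`), the `Event` fields, the `extract_events` helper, and the reading of "ordered" as textual order are all hypothetical choices for illustration. This is not EEVIN's method, which the abstract does not describe.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger lexicon mapping Portuguese trigger words to event types.
# Illustrative only; it is not taken from EEVIN.
TRIGGERS = {
    "internado": "Admission",
    "internada": "Admission",
    "febre": "Symptom",
    "cirurgia": "Procedure",
    "alta": "Discharge",
}


@dataclass
class Event:
    trigger: str     # word in the note that signals the event
    event_type: str  # coarse event category from the lexicon
    start: int       # character offset of the trigger in the original note


def extract_events(note: str) -> list[Event]:
    """Return lexicon-matched events in the order they appear in the note."""
    events = []
    # \w is Unicode-aware for str patterns, so accented Portuguese words match whole.
    for match in re.finditer(r"\w+", note):
        word = match.group().lower()
        if word in TRIGGERS:
            events.append(Event(word, TRIGGERS[word], match.start()))
    # finditer scans left to right, so the list is already in textual order.
    return events


note = "Paciente internado com febre. Realizou cirurgia e recebeu alta."
for event in extract_events(note):
    print(event.event_type, event.trigger, event.start)
```

A pure lexicon lookup cannot handle ambiguity: "alta" means discharge in "recebeu alta" but "high" in "pressão alta", and textual order need not match the order in which events actually occurred. Resolving such cases requires context, which is what makes the task hard on real clinical notes.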

Published: 2025-09-29

BALDUCCI, João Augusto F.; OLIVEIRA, Saullo H. G. de. Event Extraction in Clinical Notes. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 682-687. DOI: https://doi.org/10.5753/stil.2025.37871.