Event Extraction in Clinical Notes

João Augusto F. Balducci; Saullo H. G. de Oliveira

doi:10.5753/stil.2025.37871

João Augusto F. Balducci PUC-Campinas
Saullo H. G. de Oliveira PUC-Campinas

Abstract

Event Extraction (EE) is the task of identifying and extracting information about events from free text. Given the large number of unstructured text sources in healthcare, the field can benefit from EE to make clinical records easier to interpret and to mitigate medical errors. In this work we present EEVIN (Extrator de EVentos clÍNicos, i.e., clinical event extractor), an algorithm for extracting chronologically ordered events from clinical texts written in Brazilian Portuguese. The solution was compared with state-of-the-art Large Language Models (LLMs) and achieved significant results at a lower computational cost.
