A semantic interoperability model based on NLP for non-structured health data
Abstract
The growing volume of unstructured clinical data hinders interoperability, especially in decentralized systems such as Brazil's Unified Health System (SUS). This study proposes a semantic model that integrates NLP, machine learning, and ontologies to extract and standardize information from clinical notes. The model combines Named Entity Recognition (NER), lexical normalization, and alignment with the HL7 FHIR and openEHR standards. It was tested on COVID-19 data from partner hospitals, where it proved effective at structuring unstructured data and enabling scalable interoperability.
References
Alexopoulos, P. (2020). Semantic Modeling for Data. O’Reilly Media, 1st edition.
Benson, T. and Grieve, G. (2021). Why interoperability is hard. In Principles of Health Interoperability, pages 21–40. Springer.
da Silva Jr, J. B., Lima, N. T., Garcia-Saisó, S., Fitzgerald, J., Bascolo, E., Gross Galiano, S., Solis Ortega, A. E., Morales, C., Marti, M., Estela Haddad, A., et al. (2024). Towards 2030: ministerial agreements on information systems and digital transformation for resilient health systems.
El Kah, A. and Zeroual, I. (2021). A review on applied natural language processing to electronic health records. In 2021 1st International Conference on Emerging Smart Technologies and Applications (eSmarTA), pages 1–6. IEEE.
Hasan, S. A. and Farri, O. (2019). Clinical natural language processing with deep learning. In Data science for healthcare, pages 147–171. Springer.
HIMSS (2021). Healthcare Information and Management Systems Society.
ISO18308 (2011). ISO 18308:2011(en), Health informatics — Requirements for an Electronic Health Record Architecture.
Li, I., Pan, J., Goldwasser, J., Verma, N., Wong, W. P., Nuzumlalı, M. Y., Rosand, B., Li, Y., Zhang, M., Chang, D., et al. (2021). Neural natural language processing for unstructured data in electronic health records: a review. arXiv preprint arXiv:2107.02975.
Li, J., Sun, A., Han, J., and Li, C. (2020). A survey on deep learning for named entity recognition. IEEE Transactions on Knowledge and Data Engineering.
Martin-Sanchez, F. and Verspoor, K. (2014). Big data in medicine is driving big changes. Yearbook of medical informatics, 23(01):14–20.
Mello, B., Jos, S., Andr, C., and Carine, L. Use of semantic interoperability standards in Health Records: a systematic review.
Mougin, F., Hollis, K. F., and Soualmia, L. F. (2022). Inclusive digital health. Yearbook of Medical Informatics, 31(01):002–006.
Paim, J. S. (2018). Thirty years of the Unified Health System (SUS). Ciência & Saúde Coletiva, 23:1723–1728.
Podder, V., Lew, V., and Ghassemzadeh, S. (2021). SOAP notes. [Updated 2021 Sep 2]. StatPearls [Internet]. StatPearls Publishing. Available from: [link].
Raza, S. and Schwartz, B. (2023). Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach. BMC Medical Informatics and Decision Making, 23(1):20.
Shen, Y.-C., Hsia, T.-C., and Hsu, C.-H. (2021). Analysis of electronic health records based on deep learning with natural language processing. Arabian Journal for Science and Engineering, pages 1–11.
Sheth, A. P. (1999). Changing focus on interoperability in information systems: from system, syntax, structure to semantics. In Interoperating geographic information systems, pages 5–29. Springer.
Sim, J.-a., Huang, X., Horan, M. R., Stewart, C. M., Robison, L. L., Hudson, M. M., Baker, J. N., and Huang, I.-C. (2023). Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artificial intelligence in medicine, page 102701.
Sivarethinamohan, R., Sujatha, S., and Biswas, P. (2021). Envisioning the potential of natural language processing (NLP) in health care management. In 2021 7th International Engineering Conference “Research & Innovation amid Global Pandemic” (IEC), pages 189–193. IEEE.
Zha, Y., Ke, Y., Hu, X., and Xiong, C. (2024). Ontology attention layer for medical named entity recognition. Applied Sciences, 14(1):421.
Published
09/06/2025
How to Cite
MELLO, Blanda Helena De; RIGO, Sandro José; COSTA, Cristiano André da. A semantic interoperability model based on NLP for non-structured health data. In: PRÊMIO ARTUR ZIVIANI - CONCURSO DE TESES E DISSERTAÇÕES (DOUTORADO) - SIMPÓSIO BRASILEIRO DE COMPUTAÇÃO APLICADA À SAÚDE (SBCAS), 25., 2025, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 211-216. ISSN 2763-8987. DOI: https://doi.org/10.5753/sbcas_estendido.2025.7929.