From ANVISA leaflets to extended interoperability with global health databases: some pitfalls and success stories
Abstract
Data interoperability in Health Information Systems (HIS) has long been recognized as a challenge, sometimes even within a single institution, because of the many different databases, systems, standards and requirements in use. In Brazil, the problem is aggravated by the lack of standardized, widely agreed-upon data sources in Portuguese. This paper describes a hands-on approach to overcoming these hurdles, aimed at researchers and practitioners who face them. Starting from official sites of the Brazilian Health Ministry, we built HealDB, an open Portuguese-language database with hundreds of thousands of instances. HealDB supports interoperability across multiple international data sources, providing a core for the construction of federated HIS in Brazil. The paper makes two main contributions: (1) a discussion of successive approaches to deriving ICD-10 disease codes from the text of drug leaflets, with their pros and cons; and (2) a case study of linkage to RxNorm, a normalized naming system for clinical drugs maintained by the U.S. National Library of Medicine, which illustrates potential extensions.
Keywords:
Health information systems, open science, Portuguese-language database
References
Bhatia, P., Celikkaya, B., Khalilia, M., and Senthivel, S. (2019). Comprehend medical: A named entity recognition and relationship extraction web service. In Proc. 18th IEEE International Conference On Machine Learning And Applications, pages 1844–1851.
Haux, R. (2006). Health information systems–past, present, future. International journal of medical informatics, 75(3–4):268–281.
Maciel, R. S. P., Valle, P. H. D., Santos, K. S., and Nakagawa, E. Y. (2024). Systems Interoperability Types: A Tertiary Study. ACM Computing Surveys, 56(10).
Martins, M. J. A. and Medeiros, C. B. (2023). Linking Heterogeneous Health Data Sources in Brazil Centered on Drug Leaflet Processing. In Proc. XXXVIII Brazilian Database Symposium, pages 366–371. SBC - Brazilian Computer Society.
Martins, M. J. A. and Medeiros, C. B. (2024). Construction of Open Data Sources for Data Interoperability in Brazilian Health Information Systems. In Proc. 28th European Conference on Databases and Information Systems – ADBIS 2024 - DOING workshop (Intelligent data - from data to knowledge), pages 117–129. Springer - CCIS vol 2186.
Martins, M. J. A. and Medeiros, C. B. (2025). HealDB - an open Portuguese language database for health information systems, V1. DOI: 10.25824/redu/24I1FH.
Sallauka, R., Arioz, U., Rojc, M., and Mlakar, I. (2025). Weakly-supervised multilingual medical NER for symptom extraction for low-resource languages. Applied Sciences, 15(10):5585.
Schneider, E., de Souza, J., Knafou, J., Copara, J., e Oliveira, L., Gumiel, Y., de Oliveira, L., Teodoro, D., Paraiso, E., and Moro, C. (2020). BioBERTpt - a Portuguese neural language model for clinical named entity recognition. In Proc. 3rd Clinical Natural Language Processing workshop, pages 65–72.
Shaitarova, A., Zaghir, J., Lavelli, A., Krauthammer, M., and Rinaldi, F. (2023). Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey. Yearbook of medical informatics, 32(1):240–243.
Simões, A. and Gamallo, P. (2021). Leme-pt: a medical package leaflet corpus for Portuguese. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021), pages 10–1. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
Sohn, S. and Liu, H. (2014). Analysis of medication and indication occurrences in clinical notes. AMIA Annu Symp Proc, 2014:1046–1055.
Sousa, H., Mario Jorge, A., Pasquali, A., Santos, C., and Lopes, M. (2023). A biomedical entity extraction pipeline for oncology health records in Portuguese. In Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, pages 950–956.
Published
2025-09-29
How to Cite
MARTINS, Márcia Jacobina Andrade; MEDEIROS, Claudia Bauzer. From ANVISA leaflets to extended interoperability with global health databases: some pitfalls and success stories. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 19., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 9-16. ISSN 2763-8774. DOI: https://doi.org/10.5753/bresci.2025.247982.
