Linking Heterogeneous Health Data Sources in Brazil Centered on Drug Leaflet Processing


Health Information Systems often include a medication recommendation module that helps doctors find medications based on symptoms. Most such modules rely on simple AI engines fed by rules that correlate symptoms, diseases and medications. This, however, raises both research and practical problems: for example, some of the medications may no longer be commercially available, or their components may have been updated. Moreover, the studies conducted to design such modules are based on corpora and databases in English. This hinders adaptation to the Brazilian context, not only because of the language but also because of the lack of authoritative integrated databases. To help address these issues, we have designed a framework that automatically extracts and links information from the drug leaflets of all approved medications in Brazil to feed recommendation systems. We processed heterogeneous official data sources from the Ministry of Health and linked them with sources on symptoms and diseases. The ongoing implementation described here has created an ontology from the extracted data to enable this linkage, and has identified quality problems in the official data.
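The linkage step can be illustrated with a minimal sketch, assuming that each leaflet has already been reduced to a record holding a product name, an active ingredient and a list of indications, and that an official registry maps ingredient names to identifiers. The record layout, the functions `normalize` and `link_records`, and the strategy of matching on normalized ingredient names are hypothetical illustrations, not the authors' implementation. Accents are stripped because Portuguese names may be written with or without diacritics across sources; records that fail to match are collected separately as one simple way of surfacing data-quality problems.

```python
import unicodedata
from dataclasses import dataclass


# Hypothetical record extracted from one drug leaflet.
@dataclass(frozen=True)
class LeafletRecord:
    product: str
    active_ingredient: str
    indications: tuple[str, ...]


def normalize(name: str) -> str:
    """Lowercase, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def link_records(leaflets, registry):
    """Link leaflet records to a (hypothetical) registry of ingredient name -> id.

    Returns (linked, unmatched): linked is a list of (registry_id, record)
    pairs; unmatched holds records with no registry entry, i.e. candidate
    data-quality problems to be curated.
    """
    index = {normalize(name): reg_id for name, reg_id in registry.items()}
    linked, unmatched = [], []
    for record in leaflets:
        reg_id = index.get(normalize(record.active_ingredient))
        if reg_id is None:
            unmatched.append(record)
        else:
            linked.append((reg_id, record))
    return linked, unmatched


if __name__ == "__main__":
    # Illustrative data only.
    leaflets = [
        LeafletRecord("Produto A", "Dipirona  Sódica", ("dor", "febre")),
        LeafletRecord("Produto B", "Substância Desconhecida", ()),
    ]
    registry = {"DIPIRONA SODICA": "R001"}
    linked, unmatched = link_records(leaflets, registry)
    print(linked)
    print(unmatched)
```

Exact-match lookup on normalized names keeps the sketch short; a real pipeline would likely need synonym handling or an ontology-backed mapping for names that differ beyond accents and case.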

Keywords: ontologies, processing drug leaflets in Portuguese, data linkage, data curation


MARTINS, Márcia Jacobina Andrade; MEDEIROS, Claudia Bauzer. Linking Heterogeneous Health Data Sources in Brazil Centered on Drug Leaflet Processing. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 38., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 366-371. ISSN 2763-8979. DOI: