Bete: A Brazilian Portuguese Dataset for Named Entity Recognition and Relation Extraction in the Diabetes Healthcare Domain

Lucas Pavanelli; Yohan Bonescki Gumiel; Thiago Ferreira; Adriana Pagano; Eduardo Laber

Lucas Pavanelli aiXplain Inc. https://orcid.org/0000-0003-2228-7965
Yohan Bonescki Gumiel UFMG https://orcid.org/0000-0001-8239-2930
Thiago Ferreira aiXplain Inc. https://orcid.org/0000-0003-0200-3646
Adriana Pagano UFMG https://orcid.org/0000-0002-3150-3503
Eduardo Laber PUC-RJ https://orcid.org/0000-0002-9025-8333

Abstract

The biomedical NLP community has seen great advances in dataset development, but mostly for English; other languages remain underrepresented, which has hindered progress in the field. This study introduces a Brazilian Portuguese dataset annotated for named entity recognition and relation extraction in the healthcare domain. We compiled and annotated a corpus of health professionals’ responses to frequently asked questions in online healthcare forums on diabetes. We measured inter-annotator agreement and conducted initial experiments using up-to-date methods, such as BERT-based models, to recognize entities and extract relations. Data, models, and results are publicly available at https://github.com/pavalucas/Bete.

Springer (English)

Published

25 September 2023

How to Cite

PAVANELLI, Lucas; GUMIEL, Yohan Bonescki; FERREIRA, Thiago; PAGANO, Adriana; LABER, Eduardo. Bete: A Brazilian Portuguese Dataset for Named Entity Recognition and Relation Extraction in the Diabetes Healthcare Domain. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 256-267. ISSN 2643-6264.