Bete: A Brazilian Portuguese Dataset for Named Entity Recognition and Relation Extraction in the Diabetes Healthcare Domain

Abstract


Dataset development in biomedical NLP has advanced mostly for English, leaving other languages underrepresented and hindering progress in the field. This study introduces a Brazilian Portuguese dataset annotated for named entity recognition and relation extraction in the healthcare domain. We compiled and annotated a corpus of health professionals’ responses to frequently asked questions about diabetes in online healthcare forums. We measured inter-annotator agreement and conducted initial experiments on entity recognition and relation extraction using recent methods, including BERT-based models. Data, models, and results are publicly available at https://github.com/pavalucas/Bete.

Published
25/09/2023
PAVANELLI, Lucas; GUMIEL, Yohan Bonescki; FERREIRA, Thiago; PAGANO, Adriana; LABER, Eduardo. Bete: A Brazilian Portuguese Dataset for Named Entity Recognition and Relation Extraction in the Diabetes Healthcare Domain. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 12., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 256-267. ISSN 2643-6264.