De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier
Abstract
De-identification of clinical notes is essential for reusing electronic clinical data and is commonly framed as a Named Entity Recognition (NER) task. Neural language models substantially improve performance on Natural Language Processing (NLP) tasks such as NER when they are combined with neural network methods. This paper evaluates a current state-of-the-art deep learning method (Bi-LSTM-CRF) for identifying patient names in clinical notes for de-identification purposes. We used two corpora and three language models to determine which combination performs best. In our experiments, the corpus specific to clinical-note de-identification, paired with a contextualized embedding combined with word embeddings, achieved the best result: an F-measure of 0.94.
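For readers unfamiliar with the metric, the sketch below shows one common way an F-measure can be computed for a patient-name label. It is an illustration only: the abstract does not say whether scoring was done per token or per entity, so the token-level scoring, the `NAME` label, the `token_prf` helper, and the toy sequences are all assumptions rather than the authors' evaluation setup.

```python
from typing import List, Tuple


def token_prf(gold: List[str], pred: List[str],
              positive: str = "NAME") -> Tuple[float, float, float]:
    """Token-level precision, recall, and F-measure for a single label.

    Hypothetical helper: `positive` is the label treated as the target
    class (here, an assumed patient-name tag).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy sequences invented for illustration; not data from the paper.
gold = ["O", "NAME", "NAME", "O", "O", "NAME"]
pred = ["O", "NAME", "O", "O", "NAME", "NAME"]
precision, recall, f1 = token_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For de-identification, recall usually matters most, since every missed name is a potential privacy leak. Entity-level scoring, where a name counts as correct only if all of its tokens are tagged, is stricter and would generally give lower values than this token-level sketch.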
Keywords:
De-identification, Clinical notes, Language model, Token classifier
Published
29/11/2021
How to Cite
SANTOS, Joaquim; SANTOS, Henrique D. P. dos; TABALIPA, Fábio; VIEIRA, Renata. De-Identification of Clinical Notes Using Contextualized Language Models and a Token Classifier. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10., 2021, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2643-6264.