Deep Active-Self Learning Applied to Named Entity Recognition

José Reinaldo C. S. A. V. S. Neto (UnB)
Thiago de Paulo Faleiros (UnB)

Abstract

Deep learning models have been the state of the art for a variety of challenging tasks in natural language processing, but they often require large labeled datasets to achieve good results. Deep active learning algorithms were designed to reduce the annotation cost of training such models. Current deep active learning algorithms, however, aim to train a good deep learning model with as little labeled data as possible, and are therefore not useful in scenarios where the full dataset must be labeled. To address this problem, this work investigates deep active-self learning algorithms, which use the trained model to self-label data and thereby reduce the cost of annotating full datasets for named entity recognition tasks. The experiments indicate that the proposed deep active-self learning algorithm reduces the manual annotation cost of labeling a complete named entity recognition dataset, with fewer than 2% of the self-labeled tokens being mislabeled. We also investigate an early stopping technique that does not rely on a validation set, which further reduces the annotation cost of the proposed active-self learning algorithm in real-world scenarios.

Keywords: Deep active learning, Self learning, Named entity recognition, Deep learning
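
The general idea summarized in the abstract, combining manual annotation of uncertain samples with self-labeling of confident ones, can be illustrated with a minimal sketch of a single selection round. This is not the algorithm proposed in this work: the function name `split_pool`, the min-token-probability confidence score, the `manual_budget` parameter, and the fixed `self_label_threshold` are all assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def split_pool(
    token_probs: Dict[str, List[float]],
    manual_budget: int,
    self_label_threshold: float,
) -> Tuple[List[str], List[str], List[str]]:
    """Partition unlabeled sentences for one hypothetical active-self round.

    token_probs maps a sentence id to the model's highest tag probability
    for each token. Returns (manual, self_labeled, remaining) id lists.
    """
    # Assumption: a sentence is only as reliable as its least certain token.
    # Sentences with no tokens are dropped from the round.
    confidence = {sid: min(p) for sid, p in token_probs.items() if p}

    # Least confident first; ties are broken by id to keep the order stable.
    ranked = sorted(confidence, key=lambda sid: (confidence[sid], sid))

    # Active learning step: the most uncertain sentences go to the annotator.
    manual = ranked[:manual_budget]
    rest = ranked[manual_budget:]

    # Self-learning step: sufficiently confident sentences keep model labels.
    self_labeled = [s for s in rest if confidence[s] >= self_label_threshold]
    remaining = [s for s in rest if confidence[s] < self_label_threshold]
    return manual, self_labeled, remaining


# Toy pool with made-up probabilities, purely for demonstration.
pool = {
    "s1": [0.99, 0.98, 0.97],
    "s2": [0.95, 0.40],
    "s3": [0.999, 0.995],
    "s4": [0.90, 0.85],
}
manual, self_labeled, remaining = split_pool(
    pool, manual_budget=1, self_label_threshold=0.95
)
print(manual, self_labeled, remaining)
```

In this sketch, sentences left in `remaining` would stay in the unlabeled pool and be reconsidered after the model is retrained on the newly labeled data.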