When External Knowledge Does Not Aggregate in Named Entity Recognition

Pedro Ivo Monteiro Privatto; Ivan Rizzo Guilherme

Pedro Ivo Monteiro Privatto UNESP
Ivan Rizzo Guilherme UNESP

Resumo

In the different areas of knowledge, textual data are important sources of information. This way, Information Extraction methods have been developed to identify and structure information present in textual documents. In particular there is the Named Entity Recognition (NER) task, which consists of using methods to identify Named Entities, such as Person, Place, among others, in texts, using techniques from Natural Language Processing and Machine Learning. Recent works explored the use of external sources of knowledge to boost the Machine Learning models with sets of domain specific relevant information for the NER task. This work aims to evaluate the aggregation of external knowledge, in the form of Gazetter and Knowledge Graphs, for NER task. Our approach is composed of two steps: i) generation of embeddings, ii) definition and training of the Machine Learning methods. The experiments were conducted on four English datasets, and their results show that the applied strategies for external knowledge integration did not bring great gains to the models, as expressed by F1-Score metric. In the performed experiments, there was an F1-score increase in 17 of the 32 cases where external knowledge was used, but in most cases the gains were lesser than 0.5% in F1-score. In some scenarios the aggregated external knowledge does not capture relevant content, thus not being necessarily beneficial to the methodology.

Palavras-chave: Named entity recognition, Information extraction, Knowledge embeddings