Identifying named entity from researcher curricula
Abstract
Named Entity Recognition (NER) is the task of recognizing mentions of real-world entities, such as people, institutions, and places, scattered throughout a document. In a repository of researcher curricula, NER can help reveal the context associated with a given document, for example which people and institutions appear in a given researcher's curriculum. Such identification is fundamental to finding experts to work on a project and to detecting collaboration among researchers. In this paper, we evaluate the effectiveness of entity extraction methods, both vocabulary-based and model-based, for identifying entities from scientific publications. We analyze existing NER tools and propose a procedure for applying NER to curricula from the Brazilian Lattes curricula platform.
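To make the vocabulary-based family of methods concrete, the following is a minimal sketch of dictionary (gazetteer) matching. It is not the procedure evaluated in the paper: the vocabulary entries, the labels, and the function name `find_entities` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical vocabulary (gazetteer) mapping surface forms to entity types.
# These entries are illustrative assumptions, not data from the paper.
VOCABULARY = {
    "Universidade Federal de Exemplo": "INSTITUTION",
    "Maria Silva": "PERSON",
    "Florianópolis": "PLACE",
}


def find_entities(text, vocabulary):
    """Return sorted (start, end, surface, label) tuples for vocabulary terms in text."""
    matches = []
    # Try longer terms first so a long name wins over a shorter overlapping one.
    for term in sorted(vocabulary, key=len, reverse=True):
        # Require non-word characters (or text boundaries) around the term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = m.start(), m.end()
            # Skip a match that overlaps a span already accepted.
            if any(s < end and start < e for s, e, _, _ in matches):
                continue
            matches.append((start, end, m.group(0), vocabulary[term]))
    return sorted(matches)
```

Calling `find_entities(curriculum_text, VOCABULARY)` on a curriculum entry returns the matched spans in document order. A model-based method would replace the dictionary lookup with a trained sequence labeler, which can recognize names absent from any vocabulary at the cost of needing training data.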
Keywords:
Named entity resolution, tools analysis, researcher's curriculum
References
Angeli, G., Premkumar, M. J., and Manning, C. D. (2015). Leveraging linguistic structure for open domain information extraction. In Assoc. for Comput. Linguistics (ACL).
Jurafsky, D. and Martin, J. H. (2018). Speech and Language Processing (3rd Edition draft). Upper Saddle River, NJ, USA.
Nadeau, D. and Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3-26.
Yadav, V. and Bethard, S. (2018). A survey on recent advances in named entity recognition from deep learning models. In Proc. of the 27th International Conf. on Comput. Linguistics.
Published
19/09/2022
How to Cite
GONÇALVES, Rodrigo; DORNELES, Carina F. Identifying named entity from researcher curricula. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 37., 2022, Búzios. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 427-432. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2022.226233.