Using Named Entities for Recognizing Family Relationships

E. Oliveira; G. Dias; J. Lima; J. P. C. Pirovani

doi:10.5753/kdmile.2021.17457

E. Oliveira UFES
G. Dias UFES
J. Lima UFES
J. P. C. Pirovani UFES

Abstract

Named Entity Recognition (NER) aims to automatically identify and classify entities such as persons, places, and organizations. It is a task within Natural Language Processing and Information Extraction, and it matters because it is a fundamental preprocessing step for several applications, such as relation extraction. It is nevertheless a hard problem, because several categories of named entities are written similarly and appear in similar contexts. Hybrid approaches can be used to address it. In the present study, however, we take a linguistic approach, applying Local Grammars and a Cascade of Transducers. Local Grammars represent the rules of a particular linguistic structure and are often built manually to describe the entities to be recognized. We adapted a Local Grammar to improve Named Entity Recognition; the results show an improvement of up to 7% in F-measure relative to the previous Local Grammar. Building on the improved Local Grammar, we also built another Local Grammar to recognize family relationships. Finally, we present a practical application of the extracted relationships using Prolog.

Keywords: Named-Entity Recognition, Information Extraction, Artificial Intelligence
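
To make the idea in the abstract concrete, the following is a minimal sketch of how a hand-written rule over already-recognized person entities might yield family-relationship facts in Prolog syntax. It is not the authors' method: the study builds Local Grammars and transducer cascades, whereas this sketch substitutes a plain regular expression. The `<PER>` tag format, the cue phrases, and the predicate names (`child_of`, `father_of`, and so on) are all assumptions made for illustration.

```python
import re

# Hypothetical Portuguese kinship cues mapped to illustrative predicate names.
KINSHIP_CUES = {
    "filho de": "child_of",
    "filha de": "child_of",
    "pai de": "father_of",
    "mãe de": "mother_of",
    "irmão de": "sibling_of",
    "irmã de": "sibling_of",
}

# Matches "<PER>A</PER>, <cue> <PER>B</PER>" (the comma is optional).
# The <PER> markup is an assumed output format of a prior NER step.
PATTERN = re.compile(
    r"<PER>(?P<a>[^<]+)</PER>,?\s+(?P<cue>"
    + "|".join(re.escape(cue) for cue in KINSHIP_CUES)
    + r")\s+<PER>(?P<b>[^<]+)</PER>"
)


def extract_relations(tagged_text):
    """Return (predicate, person_a, person_b) triples found in tagged text."""
    return [
        (KINSHIP_CUES[m.group("cue")], m.group("a"), m.group("b"))
        for m in PATTERN.finditer(tagged_text)
    ]


def to_prolog_fact(triple):
    """Render a triple as a Prolog fact with quoted atoms."""
    predicate, a, b = triple

    def quote(name):
        # Escape embedded single quotes so the atom stays well-formed.
        return "'" + name.replace("'", "\\'") + "'"

    return f"{predicate}({quote(a)}, {quote(b)})."


if __name__ == "__main__":
    sample = "<PER>Ana Souza</PER>, filha de <PER>Maria Souza</PER>, viajou."
    for triple in extract_relations(sample):
        print(to_prolog_fact(triple))
```

Facts emitted this way could be loaded into a Prolog knowledge base and combined with rules (for example, deriving grandparents from two parent links). A regular expression like this one only covers a single surface pattern; a Local Grammar can describe many more variations of the same structure.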
