Extracting Named-Entities and their Relationships

Authors

  • Elias Oliveira, Universidade Federal do Espírito Santo
  • Gabriel Dias, Universidade Federal do Espírito Santo
  • Jaimel Lima, Universidade Federal do Espírito Santo
  • Juliana Pirovani, Universidade Federal do Espírito Santo

DOI:

https://doi.org/10.5753/jidm.2022.2559

Keywords:

Named-Entity Recognition, Information Extraction, Artificial Intelligence

Abstract

Extracting named entities from unstructured text is a good way to build knowledge for conversational intelligent systems. Named Entity Recognition (NER) aims to automatically identify and classify entities such as persons, places, and organizations. NER is also a fundamental step for relation extraction. Both problems are hard, however, because several categories of named entities are written similarly and appear in similar contexts. Hybrid approaches that combine machine learning with expert-built linguistic models are commonly used to address them. In this study, we focus on the expert linguistic side by applying Local Grammars and a Cascade of Transducers. Local Grammars represent the rules of a particular linguistic structure and are often built manually to describe the entities and relations to be recognized. We adapted a Local Grammar to improve Named Entity Recognition; the results show an improvement of up to 7% in F-measure over the previous Local Grammar. Building on the improved Local Grammar, we also built a second Local Grammar that recognizes two binary relationships: person and person linked by parenthood, and person and location linked by place of birth. Finally, we present practical applications for a conversational system that uses Prolog to perform inference over the extracted entities and relations.
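
The pipeline the abstract describes (rule-based recognition of binary relations, followed by inference over the extracted facts) can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions are a simplified stand-in for hand-built Local Grammars, the English sentence templates and person names are invented for illustration, and the grandparent rule is written in Python rather than Prolog.

```python
import re

# Assumption: a capitalized-word sequence approximates a named entity.
# A real Local Grammar would be far richer than this pattern.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical rules standing in for the relation-recognition grammar.
CHILD_OF = re.compile(rf"({NAME}) is the (?:son|daughter) of ({NAME})")
BORN_IN = re.compile(rf"({NAME}) was born in ({NAME})")


def extract_relations(text):
    """Return a set of (relation, subject, object) triples found in text."""
    facts = set()
    for child, parent in CHILD_OF.findall(text):
        facts.add(("parent", parent, child))
    for person, place in BORN_IN.findall(text):
        facts.add(("born_in", person, place))
    return facts


def infer_grandparents(facts):
    """Derive grandparent(X, Z) from parent(X, Y) and parent(Y, Z)."""
    parents = [(s, o) for rel, s, o in facts if rel == "parent"]
    return {
        (grandparent, grandchild)
        for grandparent, middle in parents
        for parent, grandchild in parents
        if middle == parent
    }


# Invented example text; none of these names come from the paper.
text = (
    "Ana Souza is the daughter of Maria Souza. "
    "Maria Souza is the daughter of Rosa Lima. "
    "Ana Souza was born in Vitoria."
)
facts = extract_relations(text)
grandparents = infer_grandparents(facts)
```

Here `facts` holds two parenthood triples and one place-of-birth triple, and `grandparents` contains the pair derived by chaining the two parenthood facts. In a logic-programming setting, the same rule would be a single clause over a `parent/2` predicate.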

Published

2023-01-17

How to Cite

Oliveira, E., Dias, G., Lima, J., & Pirovani, J. (2023). Extracting Named-Entities and their Relationships. Journal of Information and Data Management, 13(6). https://doi.org/10.5753/jidm.2022.2559

Section

KDMiLe 2021