E-BELA: Enhanced Embedding-Based Entity Linking Approach
Abstract
Entity linking is the task of connecting mentions of entities in natural language text, such as references to people or places, to the corresponding entities in knowledge graphs such as DBpedia or Wikidata. This task is crucial in natural language processing because it disambiguates entities in unstructured data, which improves text understanding and semantic processing. However, entity linking is challenging because of the complexity and ambiguity of natural language, and because the surface form of a textual mention often differs from the way the entity is represented in the knowledge graph. Since entity mentions are expressed in natural language, and entities in knowledge graphs are also described in natural language by literal object nodes, in this work we propose E-BELA, an effective entity linking approach based on literal embeddings. Our goal is to place the vector representation of each mention close to that of its corresponding entity in a shared vector space, so that mentions can be linked to entities using a similarity or distance metric. The results show that our approach outperforms previous approaches, contributing to the field of natural language processing.
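The linking step described above (embed the mention, embed each candidate entity's literal, pick the closest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: E-BELA learns literal embeddings, whereas the `embed` function here is a dependency-free stand-in (a bag of character trigrams), and the names `embed`, `cosine`, `link` and the `dbr:` identifiers in `entities` are hypothetical examples chosen for this sketch.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: a bag of character trigrams.

    Assumption: E-BELA uses learned literal embeddings; this placeholder
    only exists so the linking step can be run end to end.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(mention: str, entity_literals: dict[str, str]) -> tuple[str, float]:
    """Return the entity whose literal embedding is most similar to the mention."""
    mention_vec = embed(mention)
    scores = {
        uri: cosine(mention_vec, embed(literal))
        for uri, literal in entity_literals.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical candidate entities: identifier -> literal (e.g. a label).
entities = {
    "dbr:Juiz_de_Fora": "Juiz de Fora",
    "dbr:Porto_Alegre": "Porto Alegre",
    "dbr:Rio_de_Janeiro": "Rio de Janeiro",
}

print(link("Rio", entities)[0])  # dbr:Rio_de_Janeiro
```

In practice the toy `embed` would be replaced by a pretrained or fine-tuned text encoder that maps mentions and entity literals into the same space. If a distance metric is preferred, selecting the entity with the smallest Euclidean distance gives the same ranking as cosine similarity when all vectors are normalized to unit length, since for unit vectors the squared distance equals 2 minus twice the cosine.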
Keywords:
Natural Language Processing, Entity Linking, Linked Open Data, Entity Similarity, Embedding, Disambiguation, DBpedia
Published
14/10/2024
How to Cite
PEREIRA, Ítalo M.; FERREIRA, Anderson A. E-BELA: Enhanced Embedding-Based Entity Linking Approach. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 30., 2024, Juiz de Fora/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 115-123. DOI: https://doi.org/10.5753/webmedia.2024.243160.