CoEPinKB: A Framework to Understand the Connectivity of Entity Pairs in Knowledge Bases

Abstract


A knowledge base expressed in the Resource Description Framework (RDF) can be viewed as a graph whose nodes represent entities and whose edges denote relationships. The entity relatedness problem is that of discovering and understanding how two entities are related, directly or indirectly, that is, how they are connected by paths in a knowledge base. Strategies designed to solve this problem typically adopt an entity similarity measure to reduce the path search space and a path ranking measure to order and filter the list of paths returned. This paper presents CoEPinKB, a framework that supports the empirical evaluation of such strategies. The framework allows entity similarity and path ranking measures to be combined to generate different path search strategies. The main goals of this paper are to describe the framework and to present a performance evaluation of nine different path search strategies.
Keywords: Entity Relatedness, Similarity Measure, Relationship Path Ranking, Backward Search, Knowledge Base
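
The idea of pairing a similarity measure with a ranking measure to obtain a path search strategy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not CoEPinKB's implementation: the toy graph, the function names, the closed-neighborhood Jaccard index used for pruning, the two ranking functions, and the threshold values are all hypothetical. The search is a plain depth-limited forward enumeration, not the backward search named in the keywords.

```python
from itertools import product

# Hypothetical toy graph standing in for an RDF knowledge base (undirected).
GRAPH = {
    "A": {"B", "C", "F"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E", "F"},
    "E": {"D"},
    "F": {"A", "D", "X", "Y"},
    "X": {"F"},
    "Y": {"F"},
}

def jaccard(u, v):
    """Jaccard index over closed neighborhoods (neighbors plus the node itself)."""
    nu, nv = GRAPH[u] | {u}, GRAPH[v] | {v}
    return len(nu & nv) / len(nu | nv)

def no_pruning(u, v):
    """Baseline 'similarity' that never prunes anything."""
    return 1.0

def find_paths(source, target, max_len, similarity, threshold):
    """Enumerate simple paths of at most max_len edges. An intermediate
    node is expanded only if it is similar enough to the target."""
    paths = []

    def extend(path):
        if len(path) - 1 >= max_len:
            return
        for nxt in sorted(GRAPH[path[-1]]):
            if nxt in path:
                continue
            if nxt == target:
                paths.append(path + [nxt])
            elif similarity(nxt, target) >= threshold:
                extend(path + [nxt])

    extend([source])
    return paths

def rank_shortest(path):
    """Higher score for shorter paths."""
    return -len(path)

def rank_specific(path):
    """Higher score for paths that avoid high-degree intermediate nodes."""
    return -sum(len(GRAPH[n]) for n in path[1:-1])

def make_strategy(similarity, threshold, ranker, top_k=3):
    """Combine one similarity measure and one ranking measure."""
    def strategy(source, target, max_len=4):
        paths = find_paths(source, target, max_len, similarity, threshold)
        return sorted(paths, key=ranker, reverse=True)[:top_k]
    return strategy

SIMILARITIES = {"jaccard": (jaccard, 0.18), "none": (no_pruning, 0.0)}
RANKERS = {"shortest": rank_shortest, "specific": rank_specific}

# Every (similarity, ranker) pair yields one strategy: 2 x 2 = 4 here.
STRATEGIES = {
    (s, r): make_strategy(*SIMILARITIES[s], RANKERS[r])
    for s, r in product(SIMILARITIES, RANKERS)
}

if __name__ == "__main__":
    for name, strategy in STRATEGIES.items():
        print(name, strategy("A", "E"))
```

In this toy setting the Jaccard threshold prevents the search from expanding the hub node F, so one of the five paths from A to E is never explored. This shows the trade-off that a similarity-based filter introduces: a smaller search space at the cost of possibly missing paths.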

Published
18 July 2021
JIMÉNEZ, Javier Guillot; LEME, Luiz André P. Paes; CASANOVA, Marco A. CoEPinKB: A Framework to Understand the Connectivity of Entity Pairs in Knowledge Bases. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 48., 2021, Evento Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 97-105. ISSN 2595-6205. DOI: https://doi.org/10.5753/semish.2021.15811.