A distributed framework to investigate the entity relatedness problem in large RDF knowledge bases
Abstract
The entity relatedness problem is the task of exploring a knowledge base, represented as an RDF graph, to discover and understand how two entities are connected. It can be addressed by a path search strategy that combines two components: an entity similarity measure together with an expansion limit, which reduces the path search space, and a path ranking measure, which orders the relevant paths found between a given pair of entities in the RDF graph. This paper first introduces DCoEPinKB, an in-memory distributed framework that addresses the entity relatedness problem. It then presents an evaluation of path search strategies using DCoEPinKB over real data collected from DBpedia. The results provide insights into the performance of these path search strategies.
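The path search strategy described above can be sketched in plain Python. This is a minimal single-machine sketch, not DCoEPinKB's distributed implementation, and it rests on several assumptions: the RDF graph is treated as undirected and predicate labels are ignored, the Jaccard index of neighbor sets stands in for the entity similarity measure, and `rank_paths` applies a hypothetical ranking measure (mean similarity of intermediate entities to both endpoints). All function names and the toy triples are illustrative.

```python
from collections import defaultdict
from heapq import nlargest


def build_adjacency(triples):
    """Undirected entity adjacency; predicates and edge direction are ignored."""
    adj = defaultdict(set)
    for s, _p, o in triples:
        adj[s].add(o)
        adj[o].add(s)
    return adj


def jaccard(adj, a, b):
    """Assumed entity similarity: Jaccard index of the two neighbor sets."""
    na, nb = adj.get(a, set()), adj.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def search_paths(adj, source, target, max_len=3, expansion_limit=2):
    """Find simple paths from source to target with at most max_len edges.

    From each path end, only the `expansion_limit` neighbors most similar
    to the target are expanded further, which bounds the search space.
    """
    paths, frontier = [], [[source]]
    for _ in range(max_len):
        next_frontier = []
        for path in frontier:
            # Neighbors of the path end that are not already on the path.
            candidates = [n for n in adj.get(path[-1], ()) if n not in path]
            if target in candidates:
                paths.append(path + [target])  # target reached: record path
                candidates.remove(target)
            # Expansion limit: keep the most target-similar neighbors;
            # ties are broken by entity name to keep results deterministic.
            best = nlargest(expansion_limit, candidates,
                            key=lambda n: (jaccard(adj, n, target), n))
            next_frontier.extend(path + [n] for n in best)
        frontier = next_frontier
    return paths


def rank_paths(adj, paths, source, target):
    """Hypothetical ranking: mean similarity of the intermediate entities
    to both endpoints, highest first; shorter paths win ties."""
    def score(path):
        inner = path[1:-1]
        if not inner:
            return 1.0  # direct edge between the two entities
        sims = [(jaccard(adj, n, source) + jaccard(adj, n, target)) / 2
                for n in inner]
        return sum(sims) / len(sims)
    return sorted(paths, key=lambda p: (-score(p), len(p), p))


triples = [
    ("Alice", "knows", "Bob"), ("Bob", "worksAt", "Acme"),
    ("Carol", "worksAt", "Acme"), ("Alice", "livesIn", "Paris"),
    ("Carol", "livesIn", "Paris"), ("Bob", "livesIn", "Paris"),
]
adj = build_adjacency(triples)
found = search_paths(adj, "Alice", "Carol", max_len=3, expansion_limit=2)
ranked = rank_paths(adj, found, "Alice", "Carol")
pruned = search_paths(adj, "Alice", "Carol", max_len=3, expansion_limit=1)
print(ranked)
print(pruned)  # [['Alice', 'Bob', 'Paris', 'Carol']]
```

Lowering `expansion_limit` from 2 to 1 illustrates the trade-off the strategy controls: fewer partial paths are expanded, but the two-edge path Alice, Paris, Carol is no longer found, because Paris has zero Jaccard similarity to Carol and is pruned at the first step.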