Workload-Aware RDF Partitioning and SPARQL Query Caching for Massive RDF Graphs stored in NoSQL Databases

Abstract

Governments, corporations, startups, open data initiatives, and other organizations are increasingly considering RDF and SPARQL for a broad range of information management scenarios. Reducing SPARQL query times has been the main concern of virtually all recent RDF triplestores, yet SPARQL caching techniques have not been widely explored. In this paper we present Rendezvous, a middleware for workload-adaptive management of large RDF graphs that includes a caching strategy for SPARQL query results. The middleware provides a novel RDF data partitioning approach based on a fragmentation strategy that maps RDF data into multiple NoSQL databases. The paper also focuses on the Rendezvous caching strategy, which can reduce average response time by up to an order of magnitude. Our experimental evaluation shows that the approach is promising, outperforming a recent key/value-based caching baseline.
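
To make the idea of caching SPARQL query results concrete, the following is a minimal sketch of a generic key/value result cache with least-recently-used eviction. It is not the Rendezvous strategy and not the baseline evaluated in the paper; the names `QueryResultCache`, `normalize`, and the `backend` callable are hypothetical and introduced only for illustration.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially different spellings share a key.

    Naive on purpose: it also collapses whitespace inside string literals,
    so a real system would need a proper SPARQL-aware normalizer.
    """
    return " ".join(query.split())


class QueryResultCache:
    """Hypothetical LRU cache mapping normalized SPARQL text to result rows."""

    def __init__(self, backend: Callable[[str], List[Row]], capacity: int = 128):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._backend = backend      # assumed: any function that runs a query
        self._capacity = capacity
        self._entries: "OrderedDict[str, List[Row]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def query(self, sparql: str) -> List[Row]:
        key = normalize(sparql)
        if key in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cache miss: ask the underlying store and remember the answer.
        self.misses += 1
        rows = self._backend(sparql)
        self._entries[key] = rows
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return rows

    def invalidate(self) -> None:
        """Drop all cached results, e.g. after the RDF graph is updated."""
        self._entries.clear()
```

A cache of this shape only helps when the same query text recurs; handling overlapping or structurally similar queries would require a different design.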

Keywords: RDF, SPARQL, Graph, NoSQL

Published
2017-10-02
SANTANA, Luiz Henrique Zambom; MELLO, Ronaldo dos Santos. Workload-Aware RDF Partitioning and SPARQL Query Caching for Massive RDF Graphs stored in NoSQL Databases. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 32., 2017, Uberlândia/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. p. 184-195. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2017.170758.