Workload-Aware RDF Partitioning and SPARQL Query Caching for Massive RDF Graphs Stored in NoSQL Databases
Abstract
Governments, corporations, startups, open data initiatives, and other organizations are increasingly considering RDF and SPARQL in a broad range of information management scenarios. Reducing SPARQL query times has been the main concern of virtually all recent RDF triplestores, yet SPARQL caching techniques have not been broadly explored. In this paper, we present Rendezvous, a middleware for the workload-adaptive management of large RDF graphs that includes a caching strategy for SPARQL query results. Rendezvous provides a novel RDF data partitioning approach based on a fragmentation strategy that maps RDF data into multiple NoSQL databases. This paper also focuses on the Rendezvous cache, which can reduce average response time by up to an order of magnitude. Our experimental evaluation shows that the approach is promising, outperforming a recent key/value-based caching baseline.
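The abstract does not describe how Rendezvous caches SPARQL results, so the following is only a minimal Python sketch of the general idea of query-result caching, not the Rendezvous design. It assumes a least-recently-used (LRU) eviction policy, cache keys built from whitespace-normalized query text, and a hypothetical `execute` callback standing in for query evaluation over the underlying NoSQL stores. The class `SparqlResultCache` and its methods are illustrative names.

```python
import re
from collections import OrderedDict
from typing import Callable, Dict, List

Row = Dict[str, str]  # one solution mapping: variable name -> bound value


class SparqlResultCache:
    """Hypothetical LRU cache for SPARQL query results (not Rendezvous's actual design)."""

    def __init__(self, capacity: int, execute: Callable[[str], List[Row]]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.execute = execute  # assumed backend call that evaluates a query
        self._entries: "OrderedDict[str, List[Row]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(query: str) -> str:
        # Collapse whitespace so trivially different spellings share one entry.
        # Simplification: this also collapses whitespace inside string literals.
        return re.sub(r"\s+", " ", query).strip()

    def query(self, sparql: str) -> List[Row]:
        key = self.normalize(sparql)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        result = self.execute(sparql)
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used entry
        return result

    def invalidate_all(self) -> None:
        # Simplest consistency policy: drop every entry when the data changes.
        self._entries.clear()


# Usage: the second query differs only in whitespace, so it is served from the cache.
calls: List[str] = []

def fake_backend(q: str) -> List[Row]:
    calls.append(q)
    return [{"s": "ex:alice"}]

cache = SparqlResultCache(capacity=2, execute=fake_backend)
cache.query("SELECT ?s WHERE { ?s a ex:Person }")
cache.query("SELECT ?s   WHERE {\n ?s a ex:Person }")
print(cache.hits, cache.misses, len(calls))  # 1 1 1
```

A realistic cache would need a finer-grained invalidation policy than `invalidate_all` and a stronger notion of query equivalence than whitespace normalization.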