Block-based communication in graph exploration on distributed RDF bases
Abstract
Distributed SPARQL query processing involves the exchange of intermediate results among RDF storage servers. This paper analyzes the impact of grouping these results into blocks in order to reduce the number of transmissions. The proposed communication strategy was implemented in a SPARQL query processor based on a graph exploration algorithm. Experimental results show that block-based communication can improve the performance of distributed query processing. Future work includes extending the SPARQL processor to account for the cost of block communication during query planning and optimization.
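The idea of grouping intermediate results into blocks can be illustrated with a minimal Python sketch. It is not the implementation evaluated in the paper: the names `BlockSender`, `count_transmissions`, and `block_size`, as well as the representation of an intermediate result as a dictionary of variable bindings, are assumptions made only for illustration. The sketch buffers results destined for one server and sends them only when a block fills up, so the number of transmissions drops from one per result to roughly one per block.

```python
from typing import Callable, Dict, List

# Hypothetical representation of one intermediate result: variable -> RDF term.
Binding = Dict[str, str]


class BlockSender:
    """Buffers intermediate results for one destination and sends them in blocks."""

    def __init__(self, send: Callable[[List[Binding]], None], block_size: int) -> None:
        if block_size < 1:
            raise ValueError("block_size must be at least 1")
        self._send = send              # stand-in for the network call (assumption)
        self._block_size = block_size
        self._buffer: List[Binding] = []
        self.transmissions = 0         # number of blocks actually sent

    def add(self, binding: Binding) -> None:
        """Queue one result; transmit as soon as the block is full."""
        self._buffer.append(binding)
        if len(self._buffer) >= self._block_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, e.g. at the end of an exploration step."""
        if self._buffer:
            self._send(self._buffer)
            self.transmissions += 1
            self._buffer = []


def count_transmissions(num_results: int, block_size: int) -> int:
    """Simulate sending num_results bindings and return the transmission count."""
    received: List[List[Binding]] = []
    sender = BlockSender(received.append, block_size)
    for i in range(num_results):
        sender.add({"?x": f"ex:node{i}"})
    sender.flush()  # send the final, possibly partial, block
    # Every result must arrive exactly once, regardless of block size.
    assert sum(len(block) for block in received) == num_results
    return sender.transmissions
```

With a block size of 1 the sketch degenerates to one transmission per result, while larger blocks trade fewer transmissions for later delivery of the first results. That trade-off is presumably what a cost model for block communication would need to capture.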
