Cutting-edge relational graph data management with Edge-k: From one to multiple edges in the same row

Authors

  • Lucas C. Scabora University of São Paulo, USP
  • Paulo H. Oliveira University of São Paulo, USP
  • Gabriel Spadon University of São Paulo, USP
  • Daniel S. Kaster University of Londrina, UEL
  • Jose F. Rodrigues-Jr University of São Paulo, USP
  • Agma J. M. Traina University of São Paulo, USP
  • Caetano Traina Junior University of São Paulo, USP

DOI:

https://doi.org/10.5753/jidm.2018.1634

Keywords:

Graph Management, Query Processing, RDBMS

Abstract

Relational Database Management Systems (RDBMSs) are widely employed in several applications, including those that deal with data modeled as graphs. Existing solutions store every edge in a distinct row in the edge table, however, for most cases, such modeling does not provide adequate performance. In this work, we propose Edge-k, a technique to group the vertex neighborhood into a reduced number of rows in a table through additional columns that stores up to k edges per row. The technique provides a better table organization and reduces both table size and query processing time. We evaluate Edge-k table management for insert, update, delete and bulkload operations, and compare the query processing performance both with the conventional edge table — adopted by the existing frameworks — and with the Neo4j graph database. Experiments using Single-Source Shortest Path (SSSP) queries reveal that our new proposal approach always outperforms the conventional edge table as well as it was faster than Neo4j for the first iterations, being slightly slower than Neo4j only for iterations after having loaded the whole graph from disk to memory. It was able to reach a speedup of 66% over a representative real dataset, with an average reduction of up to 58% in our tests. The average speedup over synthetic datasets was up to 54%. Edge-k was also the best one when performing graph degree distribution queries. Moreover, the Edge-k table obtained a processing time reduction of 70% for bulkload operations, despite having an overhead of 50% for individual insert, update and delete operations. Finally, Edge-k advances the state of the art for graph data management within relational database systems.

Downloads

Download data is not yet available.

References

Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on VLDB. pp. 411–422, 2007.

Albahli, S. and Melton, A. Rdf data management: A survey of rdbms-based approaches. In Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics. ACM, New York, USA, pp. 31:1–31:4, 2016.

Barabási, A.-L. and Pósfai, M. Network science. Cambridge University Press, 2016.

Bornea, M. A., Dolby, J., Kementsietsidis, A., Srinivas, K., Dantressangle, P., Udrea, O., and Bhattacharjee, B. Building an efficient RDF store over a relational database. In Proceedings of the 2013 SIGMOD. ACM, New York, USA, pp. 121–132, 2013.

Chen, R. Managing massive graphs in relational DBMS. In 2013 International Conference on Big Data. IEEE, Santa Clara, CA, USA, pp. 1–8, 2013.

Corbellini, A., Mateos, C., Zunino, A., Godoy, D., and Schiaffino, S. Persisting big-data: The NoSQL landscape. Information Systems 63 (Supplement C): 1–23, 2017.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, 3rd Edition. The MIT Press, 2009.

Fan, J., Raj, A. G. S., and Patel, J. M. The case against specialized graph analytics engines. In Proceeding of the 17th Conference on Innovative Data Systems Research CIDR. Online Proceedings, Asilomar, CA, USA, pp. 1–10, 2015.

Gao, J., Jin, R., Zhou, J., Yu, J. X., Jiang, X., and Wang, T. Relational approach for shortest path discovery over large graphs. PVLDB Endowment 5 (4): 358–369, 2011.

Gao, J., Zhou, J., Yu, J. X., and Wang, T. Shortest path computing in relational DBMSs. IEEE Transactions on Knowledge and Data Engineering 26 (4): 997–1011, 2014.

Jindal, A., Madden, S., Castellanos, M., and Hsu, M. Graph analytics using vertica relational database. In International Conference on Big Data. IEEE, pp. 1191–1200, 2015.

Levandoski, J. J. and Mokbel, M. F. RDF data-centric storage. In International Conference on Web Services. IEEE, Los Angeles, USA, pp. 911–918, 2009.

Neumann, T. and Weikum, G. The RDF-3X engine for scalable management of RDF data. The Very Large Data Bases Journal 19 (1): 91–113, 2010.

Scabora, L., Oliveira, P. H., Kaster, D. S., Traina, A. J. M., and Traina-Jr., C. Relational graph data management on the edge: Grouping vertices’ neighborhood with Edge-k. In Proceedings of the 32nd Brazilian Symposium on Databases. SBC, pp. 124–135, 2017.

Silva, D. N. R. D., Wehmuth, K., Osthoff, C., Appel, A. P., and Ziviani, A. Análise de desempenho de plataformas de processamento de grafos. In 31st Brazilian Symposium on Databases, SBBD. SBC, Salvador, BH, Brasil, pp. 16–27, 2016.

Sun, W., Fokoue, A., Srinivas, K., Kementsietsidis, A., Hu, G., and Xie, G. Sqlgraph: An efficient relational-based property graph store. In Proceedings of the SIGMOD. ACM, New York, USA, pp. 1887–1901, 2015.

Vicknair, C., Macias, M., Zhao, Z., Nan, X., Chen, Y., and Wilkins, D. A comparison of a graph database and a relational database: A data provenance perspective. In Proceedings of the 48th Annual Southeast Regional Conference. ACM, New York, USA, pp. 42:1–42:6, 2010.

Vukotic, A., Watt, N., Abedrabbo, T., Fox, D., and Partner, J. Neo4j in Action. Manning Publications Co., Greenwich, CT, USA, 2014.

Downloads

Published

2018-06-20

How to Cite

Scabora, L. C., Oliveira, P. H., Spadon, G., Kaster, D. S., Rodrigues-Jr, J. F., Traina, A. J. M., & Traina Junior, C. (2018). Cutting-edge relational graph data management with Edge-k: From one to multiple edges in the same row. Journal of Information and Data Management, 9(1), 20. https://doi.org/10.5753/jidm.2018.1634

Issue

Section

SBBD 2017