Evaluation of Hash Join Operations Performance Executing on SDN Switches: A Cost Model Approach

Authors

  • Marisa S. Franco, Federal University of Paraná
  • Simone Dominico, Federal University of Paraná
  • Tiago R. Kepe, Federal University of Paraná / Federal Institute of Paraná
  • Luiz C. P. Albini, Federal University of Paraná
  • Eduardo C. de Almeida, Federal University of Paraná
  • Marco A. Z. Alves, Federal University of Paraná

DOI:

https://doi.org/10.5753/jidm.2022.2515

Keywords:

DBMS, Distributed Database, Network Processing, SDN

Abstract

Distributed database systems store and manipulate data on multiple machines. In these systems, the processing cost of query operations is driven mainly by the latency of accessing data across machines over the network. With recent advances in programmable network devices, network switches offer new opportunities to manage the network topology dynamically and to process data on the devices themselves while sustaining network throughput. In this paper, we explore the use of programmable network switches in query processing, using a cost model to evaluate the performance of the hash join operation when executed on them. We assume that the hash table built from the outer relation is stored in the switches and that the join probe is materialized there, using advanced matching techniques similar to packet inspection, enabled by Ternary Content-Addressable Memories (TCAM) or by SRAM via hashing. Our results show that processing the hash join operation on network switches outperformed traditional servers, with average time reductions of 91.82% for Query 10 and 96.52% for Query 11 of the TPC-H benchmark.
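The build-and-probe flow assumed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' cost model or switch implementation: `SwitchMatchTable`, its `capacity` parameter, and the `build`/`probe` helpers are hypothetical names, a Python dict stands in for an SRAM exact-match table reached via hashing, and TCAM ternary matching is not modeled. As in the abstract, the hash table is built from the outer relation, and each tuple of the other relation is probed as if it arrived in a packet.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, List, Tuple

Row = Tuple  # a relation tuple whose first field is the join key


class SwitchMatchTable:
    """Hypothetical model of an exact-match table held in switch SRAM.

    A Python dict imitates hash-based lookup; the fixed capacity imitates the
    small memory of a switch. Real switches are programmed through P4 or
    OpenFlow, which this sketch does not use.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed maximum number of entries
        self.entries: DefaultDict[Hashable, List[Row]] = defaultdict(list)
        self.size = 0

    def insert(self, key: Hashable, row: Row) -> None:
        # Refuse entries beyond the assumed switch memory budget.
        if self.size >= self.capacity:
            raise MemoryError("build relation does not fit in the switch table")
        self.entries[key].append(row)
        self.size += 1

    def lookup(self, key: Hashable) -> List[Row]:
        # dict.get() does not insert missing keys into the defaultdict.
        return self.entries.get(key, [])


def build(table: SwitchMatchTable, outer: Iterable[Row]) -> None:
    """Build phase: store the outer relation's tuples keyed by join key."""
    for row in outer:
        table.insert(row[0], row)


def probe(table: SwitchMatchTable, inner: Iterable[Row]) -> List[Tuple[Row, Row]]:
    """Probe phase: match each tuple (as if carried in a packet) against the table."""
    joined: List[Tuple[Row, Row]] = []
    for row in inner:
        for match in table.lookup(row[0]):
            joined.append((match, row))
    return joined


# Made-up, TPC-H-like toy data: nation(nationkey, name), customer(nationkey, name)
nation = [(0, "ALGERIA"), (1, "ARGENTINA"), (2, "BRAZIL")]
customer = [(1, "Customer#1"), (2, "Customer#2"), (2, "Customer#3"), (7, "Customer#4")]

table = SwitchMatchTable(capacity=1024)
build(table, nation)
result = probe(table, customer)
for pair in result:
    print(pair)
```

The capacity check reflects that switch tables are small compared with server memory; the abstract does not say how an outer relation that exceeds it would be handled.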

Published

2022-09-12

How to Cite

Franco, M. S., Dominico, S., Kepe, T. R., Albini, L. C. P., de Almeida, E. C., & Alves, M. A. Z. (2022). Evaluation of Hash Join Operations Performance Executing on SDN Switches: A Cost Model Approach. Journal of Information and Data Management, 13(2). https://doi.org/10.5753/jidm.2022.2515

Issue

Vol. 13 No. 2 (2022)

Section

SBBD 2021 Full papers - Extended Papers