Distributed Processing of Hash Join Operations in Programmable Switches: A Cost Model Analysis
Abstract
The cost of query processing in distributed database systems is directly tied to the cost of transferring data over the network. The Software-Defined Wide-Area Network (SD-WAN) allows network devices to be (re)programmed via software. The programmability of this architecture opens new possibilities for dynamically managing the network topology and also enables data processing on these devices. This paper evaluates, using a cost model, the distribution of hash join processing across database servers and network devices. Our results show that processing the hash join operation on network switches can achieve results competitive with those of traditional servers under similar data traffic.
