Distributed Processing of Hash Join Operations on Programmable Switches: An Analysis via a Cost Model

Abstract


The cost of processing a query in a distributed database system is directly tied to the cost of transferring data over the network. Software-Defined Wide Area Network (SD-WAN) is a technology that allows network devices to be (re)programmed via software. This programmability opens new possibilities for managing network topologies dynamically, and it also makes it possible to process data on the network devices themselves. This paper uses a cost model to evaluate distributed processing of the hash join operation on network switches. The results show that processing these operations on network switches achieves performance comparable to traditional server-based processing, with similar data traffic.
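The abstract combines two ideas: a hash join whose work is split across several nodes, and a cost that grows with the amount of data shipped over the network. The minimal Python sketch below makes both concrete under stated assumptions. It hash-partitions two relations so that matching keys land on the same node, joins each partition locally with a build/probe hash table, and reports a toy transfer cost. The equi-join on the first tuple field, the choice of 32-bit FNV-1a as the partitioning hash, the fixed 16-byte tuple size, and the helper names (`partition`, `local_hash_join`, `distributed_hash_join`) are all illustrative assumptions. The cost function is a hypothetical stand-in that charges every tuple once. It is not the cost model evaluated in the paper, and it does not model switch hardware.

```python
from collections import defaultdict

# 32-bit FNV-1a constants (offset basis and prime)
FNV32_OFFSET = 2166136261
FNV32_PRIME = 16777619


def fnv1a_32(key: int) -> int:
    """32-bit FNV-1a over the key's 8-byte little-endian encoding."""
    h = FNV32_OFFSET
    for byte in key.to_bytes(8, "little", signed=True):
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def partition(rel, n_nodes):
    """Route each (key, payload) tuple to a node chosen by hashing its key."""
    parts = [[] for _ in range(n_nodes)]
    for tup in rel:
        parts[fnv1a_32(tup[0]) % n_nodes].append(tup)
    return parts


def local_hash_join(build, probe):
    """Classic hash join: build a table on one input, then probe it with the other."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    out = []
    for key, payload in probe:
        # .get() avoids inserting empty entries for keys that do not match
        for b_payload in table.get(key, ()):
            out.append((key, b_payload, payload))
    return out


def distributed_hash_join(r, s, n_nodes, tuple_bytes=16):
    """Hypothetical partitioned join returning (result rows, toy bytes shipped)."""
    r_parts = partition(r, n_nodes)
    s_parts = partition(s, n_nodes)
    result = []
    for i in range(n_nodes):
        # Equal keys hash to the same node, so each partition joins independently
        result.extend(local_hash_join(r_parts[i], s_parts[i]))
    # Toy network cost: assume every tuple of R and S is shipped once
    shipped_bytes = (len(r) + len(s)) * tuple_bytes
    return result, shipped_bytes


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c")]
    S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    rows, cost = distributed_hash_join(R, S, n_nodes=4)
    print(sorted(rows))  # [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
    print(cost)          # 112
```

Because every node receives all tuples for the keys it owns, the union of the local joins equals a single-node join. In this toy model the network cost therefore depends only on how many tuples are moved, not on where the build/probe work runs. That quantity is what the abstract says dominates distributed query cost.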

Keywords: in-switch processing, distributed databases, hash join, software-defined networks, cost model

Published: 4 October 2021
How to Cite

S. FRANCO, Marisa; DOMINICO, Simone; KEPE, Tiago R.; ALBINI, Luiz C. P.; C. DE ALMEIDA, Eduardo; Z. ALVES, Marco A. Processamento Distribuído de Operações Hash Join em Switches Programáveis: Uma Análise via Modelo de Custo. In: SIMPÓSIO BRASILEIRO DE BANCO DE DADOS (SBBD), 36., 2021, Rio de Janeiro. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. p. 253-264. ISSN 2763-8979. DOI: https://doi.org/10.5753/sbbd.2021.17882.