Superando Limites no Multiparticionamento em GPU

Michel B. Cordeiro; Wagner M. Nunan Zola

doi:10.5753/sscad.2025.16780

Michel B. Cordeiro UFPR
Wagner M. Nunan Zola UFPR

DOI: https://doi.org/10.5753/sscad.2025.16780

Resumo

Multiparticionamento é uma primitiva paralela de propósito geral que reorganiza os dados de entrada em bins (ou buckets) contíguos, sendo a função responsável por categorizar cada elemento em um bin definida pelo programador. O Multiparticionamento em GPU, embora considerado uma primitiva paralela em si, recebeu pouca atenção até o momento na literatura. As implementações estado da arte concentram-se em cenários com número reduzido de bins (até 256), impondo a restrição de que essa quantidade seja potência de 2. Este trabalho tem como objetivo apresentar implementações eficientes de Multiparticionamento em GPU, explorando diferentes estratégias otimizadas para diferentes quantidades de bins, permitindo que o algoritmo selecione, em tempo de execução, o nível de granularidade mais adequado. O algoritmo, denominado mrBin, é capaz de processar grandes quantidades de bins (até 12.288), sem impor restrições sobre esse número. Ou seja, mais bins, menos restrições. Nos experimentos realizados, o algoritmo mostrou-se capaz de superar o estado da arte para quantidades de bins maiores que 64, alcançando aceleração de até 2,2 vezes.

Referências

Alcantara, D. A., Sharf, A., Abbasinejad, F., Sengupta, S., Mitzenmacher, M., Owens, J. D., and Amenta, N. (2009). Real-time parallel hashing on the GPU. In ACM SIGGRAPH asia 2009 papers, pages 1–9.

Ashkiani, S., Davidson, A., Meyer, U., and Owens, J. D. (2016). GPU multisplit. SIGPLAN Not., 51(8).

Ashkiani, S., Davidson, A., Meyer, U., and Owens, J. D. (2017). GPU Multisplit: an extended study of a parallel algorithm. ACM Transactions on Parallel Computing.

Choi, B., Komuravelli, R., Lu, V., Sung, H., Bocchino Jr, R. L., Adve, S. V., and Hart, J. C. (2010). Parallel SAH kD-tree construction. In High performance graphics, pages 77–86. Citeseer.

Cordeiro, M. B., Blanco, R. M., and Nunan Zola, W. M. (2025a). Algoritmo paralelo e distribuído para ordenação chave-valor. In Anais da XX Escola Regional de Banco de Dados, pages 133–136, Porto Alegre, RS, Brasil. SBC.

Cordeiro, M. B., Bluhm, G., and Nunan Zola, W. M. (2025b). Multiparticionamento flexível em processadores multicore. In Anais da XXV Escola Regional de Alto Desempenho da Região Sul, pages 169–170, Porto Alegre, RS, Brasil. SBC.

Cordeiro, M. B. and Nunan Zola, W. M. (2023). KNN paralelo em gpu para grandes volumes de dados com agregação de consultas. In Simpósio em Sistemas Computacionais de Alto Desempenho (SSCAD), pages 253–264. SBC.

Cordeiro, M. B. and Nunan Zola, W. M. (2024). Parallel k-means on GPU using warp-centric strategies. In 2024 IEEE 30th International Conference on Parallel and Distributed Systems (ICPADS), pages 720–727.

Cordeiro, M. B. and Nunan Zola, W. M. (2025). Multiparticionamento de dados em gpu. In Anais da XXV Escola Regional de Alto Desempenho da Região Sul, pages 165–166, Foz do Iguaçu, PR, Brasil. SBC.

Ferraz, S., Dias, V., Teixeira, C. H., Parthasarathy, S., Teodoro, G., and Meira, W. (2024). DuMato: An efficient warp-centric subgraph enumeration system for GPU. Journal of Parallel and Distributed Computing, 191:104903.

Gupta, K., Stuart, J. A., and Owens, J. D. (2012). A study of persistent threads style GPU programming for GPGPU workloads. In IEEE 2012 Innovative Parallel Computing (InPar). IEEE.

Jünger, D., Kobus, R., Müller, A., Hundt, C., Xu, K., Liu, W., and Schmidt, B. (2020). WarpCore: A library for fast hash tables on GPUs. In 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC).

Lessley, B. and Childs, H. (2020). Data-parallel hashing techniques for GPU architectures. IEEE Transactions on Parallel and Distributed Systems, 31(1):237–250.

Merrill, D. G. and Grimshaw, A. S. (2010). Revisiting sorting for GPGPU stream architectures. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 545–546.

Meyer, B., Pozo, A., and Nunan Zola, W. M. (2021). Warp-centric k-nearest neighbor graphs construction on GPU. In 50th International Conference on Parallel Processing Workshop, ICPP Workshops ’21, New York, NY, USA. Association for Computing Machinery.

Nunan Zola, W. M. and De Bona, L. C. E. (2012). Parallel speculative encryption of multiple AES contexts on GPUs. In 2012 Innovative Parallel Computing (InPar), pages 1–9. IEEE.

Reinders, J. (2007). Intel threading building blocks: outfitting C++ for multi-core processor parallelism. ”O’Reilly Media, Inc.”.

Stehle, E. and Jacobsen, H.-A. (2017). A memory bandwidth-efficient hybrid radix sort on GPUs. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, New York, NY, USA. Association for Computing Machinery.