Caminhamento Paralelo Barnes-Hut com Vetorização AVX2

  • Wagner Zola Universidade Federal do Paraná
  • Armando Delgado Universidade Federal do Paraná
  • Rodrigo Morante Blanco Universidade Federal do Paraná

Resumo


O algoritmo Barnes-Hut é um método aproximado amplamente usado na simulação gravitacional de N -Corpos. A natureza irregular desse código apresenta desafios para sua computação em sistemas paralelos. Obstáculos adicionais ocorrem nesse padrão de computação quando se deseja a utilização eficaz da capacidade computacional de arquiteturas multicore com instruçoes SIMD. O enfoque deste trabalho é implementar e analisar a eficiência do caminhamento paralelo Barnes-Hut com octrees implı́citas e uso de instruções vetoriais AVX2. Os experimentos demonstram a efetividade do método, que apresenta altas taxas de GFLOP/s e economia de energia nas simulações.

Referências

Arora, N., Shringarpure, A., and Vuduc, R. W. (2009). Direct n-body kernels for multicore platforms. In ICPP 2009, International Conference on Parallel Processing, Vienna, Austria, 22-25 September 2009, pages 379–387.

Asanovic, K., Bodik, R., Catanzaro, B. C., Gebis, J. J., Husbands, P., Keutzer, K., Patterson, D. A., Plishker, W. L., Shalf, J., Williams, S. W., and Yelick, K. A. (2006). The landscape of parallel computing research: A view from berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley.

Barnes, J. and Hut, P. (1986). A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446–449.

Bédorf, J., Gaburov, E., and Portegies Zwart, S. (2012). A sparse octree gravitational n-body code that runs entirely on the GPU processor. J. Comput. Phys., 231(7):2825–2839.

Burtscher, M. and Pingali, K. (2011). Chapter 6 - an efficient CUDA implementation of the tree-based barnes hut n-body algorithm. In Hwu, W.-m. W., editor, GPU Computing Gems Emerald Edition, pages 75 – 92. Morgan Kaufmann, Boston.

Hennessy, J. L. and Patterson, D. A. (2011). Computer Architecture, Fifth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 5th edition.

Lange, B. and Fortin, P. (2014). Parallel dual tree traversal on multi-core and many-core architectures for astrophysical n-body simulations. In Euro-Par 2014: Proceedings of the 20th International European Conference on Parallel and Distributed Computing, pages 716–727.

Long Wang, R. S. et al. (2015). NBODY6++GPU: ready for the gravitational million-body problem. Monthly Notices of the Royal Astronomical Society, 450(4):4070–4080.

Nunan Zola, W. M., Bona, L. C., and Silva, F. (2014). Fast GPU parallel N-Body tree traversal with Simulated Wide-Warp. In Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on, pages 718–725.

Plummer, H. C. (1911). On the problem of distribution in globular star clusters. Monthly Notices of the Royal Astronomical Society, 71:460–470.

Treibig, J., Hager, G., and Wellein, G. (2010). Likwid: A lightweight performance-oriented tool suite for x86 multicore environments. In Proceedings of PSTI2010, the First International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego CA.

Yokota, R. (2012). An FMM based on dual tree traversal for many-core architectures. CoRR, abs/1209.3516.

Zecena, I., Burtscher, M., Jin, T., and Zong, Z. (2013). Evaluating the performance and energy efficiency of n-body codes on multi-core cpus and gpus. In 32nd IEEE International Performance Computing and Communications Conference, IPCCC’13.
Publicado
08/11/2019
ZOLA, Wagner; DELGADO, Armando; BLANCO, Rodrigo Morante. Caminhamento Paralelo Barnes-Hut com Vetorização AVX2. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 20. , 2019, Campo Grande. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019 . p. 454-461. DOI: https://doi.org/10.5753/wscad.2019.8691.