Optimization of Parallel Applications on NEC SX-Aurora Vector Accelerators

  • Félix Michels UFRGS
  • Matheus Serpa UFRGS
  • Danilo Carastan-Santos UFRGS
  • Lucas Schnorr UFRGS
  • Philippe Navaux UFRGS

Abstract


Vector processors are designed to execute a single instruction on multiple data elements, which increases performance in real numerical applications such as fluid-mechanics and wave-propagation simulations. This work presents a performance analysis of the SX-Aurora TSUBASA architecture. For this task, we used the NAS benchmarks and a real wave-propagation application that is essential to the oil prospecting industry. By applying simple optimization techniques such as loop unrolling and inlining, we achieved speedups on the SX-Aurora TSUBASA of up to 7.8× with the NAS benchmarks and up to 1.9× with the real application, relative to the original versions of the applications.
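The abstract names loop unrolling and inlining without showing either. The following is a minimal C sketch, assuming a hypothetical 3-point stencil kernel of the kind found in wave-propagation codes. The names `stencil3`, `stencil_basic` and `stencil_unrolled4` are illustrative and do not come from the authors' code. `static inline` invites the compiler to inline the per-point helper so the loop body contains no call, and the second loop is manually unrolled by 4 with a remainder loop. Whether manual unrolling actually helps on the SX-Aurora depends on what the NEC compiler already does automatically, so treat this as an illustration of the techniques, not as the paper's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One interior point of a hypothetical 3-point stencil:
 * c0*u[i] + c1*(u[i-1] + u[i+1]).
 * 'static inline' lets the compiler replace the call with its body,
 * so the calling loop contains no function call and stays vectorizable. */
static inline double stencil3(const double *u, size_t i, double c0, double c1)
{
    return c0 * u[i] + c1 * (u[i - 1] + u[i + 1]);
}

/* Baseline: one interior point (indices 1..n-2) per iteration. */
void stencil_basic(const double *u, double *out, size_t n, double c0, double c1)
{
    assert(u != NULL && out != NULL);
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = stencil3(u, i, c0, c1);
}

/* Manually unrolled by 4: fewer loop-control instructions per point and
 * four independent updates per iteration. The last unrolled iteration
 * reads u[i + 4], so the main loop runs while i + 4 < n; the remainder
 * loop finishes the points left when the interior size is not a
 * multiple of 4. */
void stencil_unrolled4(const double *u, double *out, size_t n, double c0, double c1)
{
    assert(u != NULL && out != NULL);
    size_t i = 1;
    for (; i + 4 < n; i += 4) {
        out[i]     = stencil3(u, i,     c0, c1);
        out[i + 1] = stencil3(u, i + 1, c0, c1);
        out[i + 2] = stencil3(u, i + 2, c0, c1);
        out[i + 3] = stencil3(u, i + 3, c0, c1);
    }
    for (; i + 1 < n; i++) /* remainder loop */
        out[i] = stencil3(u, i, c0, c1);
}
```

Both versions compute the same values. For example, with `u[k] = k`, `c0 = 2.0` and `c1 = 0.5`, every interior output equals `3k` in either version. The unrolled form only changes how the iterations are grouped, which is why it is a safe, mechanical transformation to try before deeper restructuring.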

Published
2020-10-21
MICHELS, Félix; SERPA, Matheus; CARASTAN-SANTOS, Danilo; SCHNORR, Lucas; NAVAUX, Philippe. Otimização de Aplicações Paralelas em Aceleradores Vetoriais NEC SX-Aurora. In: BRAZILIAN SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 21., 2020, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 311-322. DOI: https://doi.org/10.5753/wscad.2020.14079.