Tile Size Selection for an Acoustic Wave Simulator on GPUs Using Machine Learning
Abstract
The simulation of acoustic wave propagation is the computational kernel of important industrial applications such as Full-Waveform Inversion (FWI) and Reverse-Time Migration (RTM). The kernel solves partial differential equations (PDEs) with the finite-difference method, which can be significantly accelerated on GPUs. One of the main challenges in accelerating these stencil computations on GPUs is reducing the overhead of memory accesses, and tiling is an important optimization for speeding up wave propagation kernels. However, choosing the tile sizes for these computations is not straightforward, because the best choice usually depends on many architectural and application parameters. In the present work, we employ six machine learning methods to recommend the tile sizes to use. Our best strategy achieved improvement coefficients of 1.17 and 1.11 on two GPUs with Turing and Volta architectures.
References
Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. In Proc. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, pages 2623–2631, New York, NY, USA. ACM.
Allen, R. and Kennedy, K. (2001). Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann, San Francisco, CA.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1986). Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM.
Fix, E. and Hodges, J. (1951). Discriminatory analysis. nonparametric discrimination: Consistency properties. International Statistical Review, 20(1):1–30.
Grosser, T., Cohen, A., Sadayappan, P., Holewinski, J., and Verdoolaege, S. (2014). Hybrid hexagonal/classical tiling for GPUs. In Proc. of Annual IEEE/ACM International Symposium on Code Generation and Optimization, Orlando, FL, USA. ACM.
Haggui, O., Tadonki, C., Lacassagne, L., Sayadi, F., and Ouni, B. (2018). Harris corner detection on a NUMA manycore. Future Gener. Comput. Syst., 88:442–452.
Kirk, D. B., Hwu, W.-m. W., and El Hajj, I. (2023). Programming Massively Parallel Processors: A Hands-on Approach. Elsevier Inc., 4th edition.
Korch, M. and Werner, T. (2020). Improving locality of explicit one-step methods on GPUs by tiling across stages and time steps. Future Generation Computer Systems, 102:889–901.
Kruse, M. (2021). Loop transformations using Clang’s abstract syntax tree. In 50th Intl. Conference on Parallel Processing Workshop, pages 1–7.
Liu, S., Cui, Y., Jiang, Q., Wang, Q., and Wu, W. (2018). An efficient tile size selection model based on machine learning. Journal of Computer Science and Technology.
Luporini, F., Louboutin, M., Lange, M., Kukreja, N., Witte, P., Hückelheim, J., and Gorman, G. J. (2020). Architecture and performance of Devito, a system for automated stencil computation. ACM Transactions on Mathematical Software, 46(1):1–28.
Malik, A. M. (2012). Optimal tile size selection problem using machine learning. In 2012 11th International Conference on Machine Learning and Applications, USA. IEEE.
Meng, Q., Ma, Z., Li, H., Leskovec, J., Zhang, X., and et al, K. H. (2017). LightGBM: A highly efficient gradient boosting decision tree. In 31st Conference on Neural Information Processing Systems (NeurIPS).
OpenMP (2020). OpenMP Examples Updated with 5.1 Features. [link]. Online; accessed 13 February 2024.
Rahman, M., Pouchet, L.-N., and Sadayappan, P. (2010). Neural network assisted tile size selection. The Ohio State University.
Souza, J. F. D., Machado, L. S., Gomi, E. S., Tadonki, C., McIntosh-Smith, S., and Senger, H. (2022a). Performance of OpenMP offloading for the acoustic wave stencil on GPUs. In Supercomputing, Dallas, TX, USA.
Souza, J. F. D., Moreira, J. B. D., Roberts, K. J., di Ramos Alves Gaioso, R., Gomi, E. S., Silva, E. C. N., and Senger, H. (2022b). simwave - a finite difference simulator for acoustic waves propagation. arXiv.
Tadonki, C. (2017). Scalable NUMA-aware Wilson-Dirac on supercomputers. In 2017 International Conference on High Performance Computing & Simulation, HPCS 2017, Genoa, Italy, July 17-21, 2017, pages 315–324. IEEE.
Tukey, J. W. et al. (1977). Exploratory data analysis, volume 2. Springer.
Virieux, J. and Operto, S. (2009). An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6):WCC1–WCC26.
Witten, I. H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann series in data management systems. Morgan Kaufmann, 2nd edition.
Xu, C., Kirk, S. R., and Jenkins, S. (2009). Tiling for performance tuning on different models of GPUs. In Second International Symposium on Information Science and Engineering, page 60. IEEE.
Xue, J. (2000). Loop Tiling for Parallelism, volume SECS 575 of Kluwer International Series in Engineering and Computer Science. Springer Science+Business Media New York, New York. Originally published by Kluwer Academic Publishers, New York in 2000.
Published
October 23, 2024
How to Cite
SILVA, Tiago da; GOMI, Edson; SENGER, Hermes. Escolha do Ladrilhamento para um Simulador de Ondas Acústicas em GPUs por meio de Aprendizado de Máquina. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 25., 2024, São Carlos/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2024. p. 216-227. DOI: https://doi.org/10.5753/sscad.2024.244702.