Roofline Analysis and Performance Optimization of the MGB Hydrological Model
Abstract
The Roofline model gives insight into the performance behavior of applications bounded by either memory or processor limits, providing useful guidelines for performance improvements. This work applies the Roofline model to the analysis of the MGB model, which simulates hydrological processes in large-scale watersheds. Real-world input data are used to characterize performance on two multicore architectures, one with CPUs only and one with CPUs and a GPU. MGB performance is improved through optimizations for better memory use and through shared-memory (OpenMP) and GPU (OpenACC) parallelism. CPU performance reaches 42.51% and 50.17% of the respective system's peak, whereas GPU performance is low due to overheads caused by the MGB model structure.
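As an illustration of the bound the Roofline model expresses, the following minimal sketch computes the attainable performance of a kernel as the smaller of the processor peak and the product of memory bandwidth and operational intensity. The function names and the machine figures (500 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical values chosen for the example; they are not measurements from the two systems studied here.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s for a kernel with the given operational
    intensity (FLOP per byte moved to or from memory)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Operational intensity at which the memory and compute roofs meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters (assumptions, not values from this work).
PEAK_GFLOPS = 500.0
BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for intensity in (0.5, 5.0, 10.0):
        bound = roofline_bound(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        regime = "memory-bound" if intensity < ridge else "compute-bound"
        print(f"intensity={intensity:5.1f} FLOP/byte  "
              f"bound={bound:6.1f} GFLOP/s  ({regime})")
```

A kernel whose operational intensity lies below the ridge point is limited by memory traffic, so improving data reuse raises its bound; above the ridge point, only better use of the processor (vectorization, parallelism) helps.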