Predição de Desempenho com Graph Neural Networks
Abstract
Improving an application's performance on modern computers involves choosing from a large search space, since these architectures have many characteristics that significantly affect application performance. Moreover, different applications tend to exercise these characteristics in different ways, making the function that maps an application to its performance on a given hardware quite complex. This, in turn, makes generating code that maximizes this function, whether by compilers or by experts, a difficult task. One alternative is to automatically evaluate many compilation candidates, a technique known as autotuning. However, running the application to measure its performance for each candidate is expensive, so performance predictors are often used to speed up this exploration. In this work, we implement and evaluate the use of graph neural networks, more specifically graph convolutional networks, for the task of predicting an application's performance. We trained a network on 30 thousand different compilation plans applied to 300 different applications and show that graph-based networks can learn from the characteristics of the control-flow graph, performing better than techniques that do not take this graph into account when predicting performance.
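This page does not reproduce the model itself, so the following is only an illustrative sketch of the kind of predictor the abstract describes: a graph convolutional network that reads a control-flow graph whose nodes carry basic-block features and regresses a performance score for a (program, compilation plan) pair. It assumes PyTorch and PyTorch Geometric; the class name, feature width, and training loop are hypothetical, not the authors' implementation.

```python
# Illustrative sketch only (hypothetical, not the paper's code), assuming
# PyTorch and PyTorch Geometric are installed.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class CFGPerformancePredictor(torch.nn.Module):
    """GCN that maps a control-flow graph to a scalar performance estimate."""

    def __init__(self, num_node_features: int, hidden: int = 64):
        super().__init__()
        # Two rounds of neighborhood aggregation over the CFG edges.
        self.conv1 = GCNConv(num_node_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        # Regression head: one scalar per graph (e.g., predicted speedup).
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        # Pool node embeddings into a single graph-level embedding.
        x = global_mean_pool(x, batch)
        return self.head(x).squeeze(-1)


# Hypothetical training loop: `loader` yields batches of CFGs whose `y`
# holds the measured performance of each (program, compilation plan) pair.
def train(model, loader, epochs=10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for data in loader:
            optimizer.zero_grad()
            pred = model(data.x, data.edge_index, data.batch)
            loss = F.mse_loss(pred, data.y)  # regression against measured performance
            loss.backward()
            optimizer.step()
```

Mean pooling is one simple choice of graph-level readout; any permutation-invariant aggregation over the node embeddings would fit the same role.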
Published
October 21, 2020
How to Cite
DO ROSÁRIO, Vanderson; ZANELLA, André Felipe; DA SILVA, Anderson; BORIN, Edson. Predição de Desempenho com Graph Neural Networks. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 21., 2020, Online. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2020. p. 1-12. DOI: https://doi.org/10.5753/wscad.2020.14053.