NeurOMP: Paralelização automática de código utilizando Aprendizagem por Reforço
Resumo
A criação de aplicações que obtenham o máximo de desempenho computacional nas arquiteturas modernas é uma tarefa complexa. Além de utilizar conhecimentos de paralelismo, o programador precisar ter um amplo conhecimento de vários outros aspectos da aplicação. Por este motivo, os compiladores modernos tentam paralelizar algoritmos de maneira automática, utilizando a análise estática do código. Uma maneira de melhorar o processo de tomada de decisão do compilador é utilizando Aprendizagem por Reforço (RL). Este trabalho propõe e avalia uma otimização, chamada NeurOMP, que utiliza RL para paralelizar for loops em códigos C utilizando a biblioteca OpenMP. Os resultados experimentais mostram que o NeurOMP obtém um speedup médio no CAP Bench de 1.6, similar a um especialista humano.Referências
Ahmed, J., Siyal, M. Y., Najam, S., and Najam, Z. (2017). Challenges and issues in mo- dern computer architectures. In Fuzzy Logic Based Power-Efcient Real-Time Multi- Core System, pages 23–29. Springer.
Allen, F. E. (1970). Control ow analysis. ACM Sigplan Notices, 5(7):1–19.
Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., and Engler, D. (2010). A few billion lines of code later: using static analysis to nd bugs in the real world. Communications of the ACM, 53(2):66–75.
Chapman, B., Jost, G., and Van Der Pas, R. (2008). Using OpenMP: portable shared memory parallel programming, volume 10. MIT press.
Coelli, T. J. (1996). A guide to frontier version 4.1: a computer program for stochastic frontier production and cost function estimation. Technical report, CEPA Working papers.
Coons, K. E., Robatmili, B., Taylor, M. E., Maher, B. A., Burger, D., and McKinley, K. S. (2008). Feature selection and policy optimization for distributed instruction placement using reinforcement learning. In PACT, pages 32–42. ACM.
Cousot, P. and Cousot, R. (1977). Abstract interpretation: a unied lattice model for static analysis of programs by construction or approximation of xpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 238–252. ACM.
Dagum, L. and Menon, R. (1998). Openmp: an industry standard api for shared-memory programming. IEEE CSE, 5(1):46–55.
Davis, M. H. (2018). Markov models & optimization. Routledge.
Fursin, G., Miranda, C., Temam, O., Namolaru, M., Yom-Tov, E., Zaks, A., Mendelson, B., Bonilla, E., Thomson, J., Leather, H., et al. (2008). Milepost gcc: machine learning based research compiler. In GCC Summit.
Gubner, T. and Boncz, P. (2017). Exploring query execution strategies for jit, vectoriza- tion and simd. In Eighth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS).
Jensen, S. H., Møller, A., and Thiemann, P. (2009). Type analysis for javascript. In International Static Analysis Symposium, pages 238–255. Springer.
Kam, J. B. and Ullman, J. D. (1977). Monotone data ow analysis frameworks. Acta informatica, 7(3):305–317.
Khatami, Z. (2017). Compiler and Runtime Optimization Techniques for Implementation Scalable Parallel Applications. PhD thesis, LSU.
Klabnik, S. and Nichols, C. (2019). The Rust Programming Language (Covers Rust 2018). No Starch Press.
Kulkarni, S. and Cavazos, J. (2012). Mitigating the compiler optimization phase-ordering problem using machine learning. ACM SIGPLAN Notices, 47(10):147–162.
Marinkovic, V., Popovic, M., and Djukic, M. (2018). An automatic instruction-level parallelization of machine code. Advances in Electrical and Computer Engineering, 18(1):27–36.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Moreira, K. C. A., Mendonça, G. S. D., Guimarães, B., Alves, P., and Pereira, F. M. Q. (2016). Paralelização automática de código com diretivas openacc. SBLP. SBC.
Sarkale, Y., Nozhati, S., Chong, E. K., Ellingwood, B., and Mahmoud, H. (2018). Sol- ving markov decision processes for network-level post-hazard recovery via simulation optimization and rollout. arXiv preprint arXiv:1803.04144.
Schardl, T. B., Moses, W. S., and Leiserson, C. E. (2017). Tapir: Embedding fork-join parallelism into llvm's intermediate representation. In Proceedings of the 22nd ACM PPoPP, pages 249–265. ACM.
Souza, M. A., Penna, P. H., Queiroz, M. M., Pereira, A. D., Góes, L. F. W., Freitas, H. C., Castro, M., Navaux, P. O., and Méhaut, J.-F. (2017). Cap bench: a benchmark suite for performance and energy evaluation of low-power many-core processors. CCPE, 29(4).
Stephenson, M., Amarasinghe, S., Martin, M., and O'Reilly, U.-M. (2003). Meta opti- mization: improving compiler heuristics with machine learning. In ACM SIGPLAN Notices, volume 38, pages 77–90. ACM.
Tong, Y., Zhang, W., Ma, Y.-C., Liu, Y., Liang, Y., Zhang, T., and Luo, H. (2017). Compiler-guided parallelism adaption based on application partition for power-gated ilp processor. VLSI, 25(4):1329–1341.
Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. (2003). Compiler optimization-space exploration. In CGO 2003., pages 204–215. IEEE.
Allen, F. E. (1970). Control ow analysis. ACM Sigplan Notices, 5(7):1–19.
Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., and Engler, D. (2010). A few billion lines of code later: using static analysis to nd bugs in the real world. Communications of the ACM, 53(2):66–75.
Chapman, B., Jost, G., and Van Der Pas, R. (2008). Using OpenMP: portable shared memory parallel programming, volume 10. MIT press.
Coelli, T. J. (1996). A guide to frontier version 4.1: a computer program for stochastic frontier production and cost function estimation. Technical report, CEPA Working papers.
Coons, K. E., Robatmili, B., Taylor, M. E., Maher, B. A., Burger, D., and McKinley, K. S. (2008). Feature selection and policy optimization for distributed instruction placement using reinforcement learning. In PACT, pages 32–42. ACM.
Cousot, P. and Cousot, R. (1977). Abstract interpretation: a unied lattice model for static analysis of programs by construction or approximation of xpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pages 238–252. ACM.
Dagum, L. and Menon, R. (1998). Openmp: an industry standard api for shared-memory programming. IEEE CSE, 5(1):46–55.
Davis, M. H. (2018). Markov models & optimization. Routledge.
Fursin, G., Miranda, C., Temam, O., Namolaru, M., Yom-Tov, E., Zaks, A., Mendelson, B., Bonilla, E., Thomson, J., Leather, H., et al. (2008). Milepost gcc: machine learning based research compiler. In GCC Summit.
Gubner, T. and Boncz, P. (2017). Exploring query execution strategies for jit, vectoriza- tion and simd. In Eighth International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS).
Jensen, S. H., Møller, A., and Thiemann, P. (2009). Type analysis for javascript. In International Static Analysis Symposium, pages 238–255. Springer.
Kam, J. B. and Ullman, J. D. (1977). Monotone data ow analysis frameworks. Acta informatica, 7(3):305–317.
Khatami, Z. (2017). Compiler and Runtime Optimization Techniques for Implementation Scalable Parallel Applications. PhD thesis, LSU.
Klabnik, S. and Nichols, C. (2019). The Rust Programming Language (Covers Rust 2018). No Starch Press.
Kulkarni, S. and Cavazos, J. (2012). Mitigating the compiler optimization phase-ordering problem using machine learning. ACM SIGPLAN Notices, 47(10):147–162.
Marinkovic, V., Popovic, M., and Djukic, M. (2018). An automatic instruction-level parallelization of machine code. Advances in Electrical and Computer Engineering, 18(1):27–36.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Moreira, K. C. A., Mendonça, G. S. D., Guimarães, B., Alves, P., and Pereira, F. M. Q. (2016). Paralelização automática de código com diretivas openacc. SBLP. SBC.
Sarkale, Y., Nozhati, S., Chong, E. K., Ellingwood, B., and Mahmoud, H. (2018). Sol- ving markov decision processes for network-level post-hazard recovery via simulation optimization and rollout. arXiv preprint arXiv:1803.04144.
Schardl, T. B., Moses, W. S., and Leiserson, C. E. (2017). Tapir: Embedding fork-join parallelism into llvm's intermediate representation. In Proceedings of the 22nd ACM PPoPP, pages 249–265. ACM.
Souza, M. A., Penna, P. H., Queiroz, M. M., Pereira, A. D., Góes, L. F. W., Freitas, H. C., Castro, M., Navaux, P. O., and Méhaut, J.-F. (2017). Cap bench: a benchmark suite for performance and energy evaluation of low-power many-core processors. CCPE, 29(4).
Stephenson, M., Amarasinghe, S., Martin, M., and O'Reilly, U.-M. (2003). Meta opti- mization: improving compiler heuristics with machine learning. In ACM SIGPLAN Notices, volume 38, pages 77–90. ACM.
Tong, Y., Zhang, W., Ma, Y.-C., Liu, Y., Liang, Y., Zhang, T., and Luo, H. (2017). Compiler-guided parallelism adaption based on application partition for power-gated ilp processor. VLSI, 25(4):1329–1341.
Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. (2003). Compiler optimization-space exploration. In CGO 2003., pages 204–215. IEEE.
Publicado
21/10/2020
Como Citar
SAFFRAN, João; ROCHA, Rodrigo Caetano; GÓES, Luís Fabrício.
NeurOMP: Paralelização automática de código utilizando Aprendizagem por Reforço. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 21. , 2020, Online.
Anais [...].
Porto Alegre: Sociedade Brasileira de Computação,
2020
.
p. 85-94.
DOI: https://doi.org/10.5753/wscad.2020.14060.