Impact of the Prefetcher on the Accuracy of Parallel Architecture Simulations

  • Valéria Girelli, Universidade Federal do Rio Grande do Sul
  • Francis Moreira, Universidade Federal do Rio Grande do Sul
  • Matheus Serpa, Universidade Federal do Rio Grande do Sul
  • Philippe Olivier Navaux, Universidade Federal do Rio Grande do Sul

Abstract

In computer architecture, simulators are used across virtually all research groups, with a wide variety of approaches and implementations. However, the literature lacks a detailed analysis of parallel architecture simulators that support High Performance Computing (HPC) workloads. This work analyzes the impact of the prefetcher on the accuracy of parallel simulation performed by ZSim, a simulator of parallel architectures. We observe that, due to the lack of prefetcher modeling, the memory hierarchy statistics behave inaccurately, with errors of up to 2,600%.

Published
08/11/2019
GIRELLI, Valéria; MOREIRA, Francis; SERPA, Matheus; NAVAUX, Philippe Olivier. Impacto do Prefetcher na Precisão de Simulações de Arquiteturas Paralelas. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 20., 2019, Campo Grande. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2019. p. 382-393. DOI: https://doi.org/10.5753/wscad.2019.8684.