Intrinsics-HMC: An Automatic Trace Generator for Simulations of Processing-In-Memory Instructions
Resumo
Processor-in-Memory (PIM) architectures, such as the Hybrid Memory Cube (HMC), are emerging nowadays as a solution for processing large amount of data directly inside the memory. In this area, several researchers are proposing and evaluating new instructions and new PIM architectures. For such evaluations, trace-driven simulators, as the Simulator of Non-Uniform Cache Architectures (SiNUCA), are commonly used in order to model these new proposed systems. Such simulators provide fast prototyping of new architectures, while it requires the researcher to write simulation traces manually when evaluating new Instruction Set Architecture (ISA) proposals, which is an time consuming and error prone task. In this work, we propose a methodology for fast generation of simulation traces focused on HMC architecture, which consists on a high-level Intrinsics-HMC library and a modification inside the trace-generator tool from SiNUCA. Our proposal enables the researchers to write high level code in C/C++ languages using our library, which mimics the behavior of HMC instructions. These codes can be compiled and executed in traditional x86 architectures for verification. After ensure the code is correct and working, the user can use our modified version of SiNUCA-Tracer to translate HMC functions into HMC instructions know by the simulator, providing a convenient solution to generate traces and fast simulations of new PIM architectures. Results using the proposed technique applied on database application kernels show the correct translation and simulation of new HMC instructions using SiNUCA.
Referências
Alves, M. A. Z., Diener, M., Santos, P. C., and Carro, L. (2016). Large vector extensions inside the HMC. In Conf. on Design, Automation & Test in Europe.
Azarkhish, E., Rossi, D., Loi, I., and Benini, L. (2016). A case for near memory computation inside the smart memory cube. In Workshop on Emerging Memory Solutions.
Balasubramonian, R., Chang, J., Manning, T., et al. (2014). Near-data processing: insights from a MICRO-46 workshop. IEEE Micro, 34(4).
Elliott, D. G., Stumm, M., Snelgrove, W. M., et al. (1999). Computational RAM: Implementing Processors in Memory. Design and Test of Computers, 16(1).
Hadidi, R., Asgari, B., Mudassar, B. A., Mukhopadhyay, S., Yalamanchili, S., and Kim, H. (2017). Demystifying the characteristics of 3d-stacked memories: a case study for hybrid memory cube. arXiv preprint arXiv:1706.02725.
Henning, J. L. (2006). Spec cpu2006 benchmark descriptions. ACM SIGARCH Computer Architecture News, 34(4).
HMC Consortium (2017). HMC specication 2.1.
Jain, R. (1990). The art of computer systems performance analysis: techniques for experimental design, measurement, simulation, and modeling. John Wiley & Sons.
Jeddeloh, J. and Keeth, B. (2012). Hybrid memory cube new DRAM architecture increases density and performance. In Symp. on VLSI Technology.
Jeon, D.-I. and Chung, K.-S. (2017). Cashmc: A cycle-accurate simulator for hybrid memory cube. IEEE Computer Architecture Letters, 16(1).
Khalifa, K., Fawzy, H., El-Ashry, S., and Salah, K. (2013). Memory controller architectures: A comparative study. In Int. Design and Test Symp.
Kim, J. and Kim, Y. (2014). Hbm: Memory solution for bandwidth-hungry processors. In Hot Chips Symposium.
Oliveira, G. F., Santos, P. C., Alves, M. A. Z., and Carro, L. (2017). A generic processing in memory cycle accurate simulator under hybrid memory cube architecture. In Int. Conf. on Embedded Computer Systems: Architectures, MOdeling and Simulation.
Olmen, J. V., Mercha, A., Katti, G., et al. (2008). 3D stacked IC demonstration using a through silicon via rst approach. In Int. Electron Devices Meeting.
Patterson, D., Anderson, T., Cardwell, N., et al. (1997). A case for intelligent RAM. IEEE Micro, 17(2).
Pawlowski, J. (2011). Hybrid memory cube (hmc). Hot Chips, 23.
Pugsley, S., Jestes, J., Balasubramonian, R., et al. (2014). Comparing Implementations of Near-Data Computing with In-Memory MapReduce Workloads. IEEE Micro, 34(4).
Thanh-Hoang, T., Shambayati, A., Deutschbein, C., Hoffmann, H., and Chien, A. (2014). Performance and energy limits of a processor-integrated fft accelerator. In High Performance Extreme Computing Conf.