RV-Across: An Associative Processing Simulator

Jonathas Silveira; Isaías Felzmann; João Fabrício Filho; Lucas Wanner

doi:10.5753/wscad.2020.14064

Jonathas Silveira UNICAMP
Isaías Felzmann UNICAMP
João Fabrício Filho UNICAMP / UTFPR
Lucas Wanner UNICAMP


Abstract

Associative Processing provides high-performance and energy-efficient parallel computation using a Content-Addressable Memory (CAM). Emerging big data applications can be significantly sped up by Associative Processing, but validation and evaluation are key challenges. We present RV-Across, a RISC-V Associative Processing Simulator for testing, validating, and modeling associative operations. RV-Across eases the design of associative and near-memory processing architectures by offering interfaces both for building new operations and for high-level experimentation. Our simulator records the memory and register states of each associative operation pass, giving the user visibility into and control over the simulation. The user can employ the simulation statistics provided by RV-Across to compute performance and energy metrics. RV-Across implements common associative operations and provides a framework that allows for easy extension. We show how the simulator works by experimenting with different scenarios for associative operations, using three applications that test the functionality of logic and arithmetic computations: matrix multiply, checksum, and bitcount. Our results highlight the direct relation between data length and the potential performance improvement of associative processing compared with regular serial and parallel CPU operation. In the case of matrix multiplication, the speed-up increases linearly with the matrix dimension, reaching 8x for 200x200 byte matrices and outperforming parallel execution on an 8-core CPU.
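To make the notion of an "associative operation pass" concrete, the following is a minimal Python sketch of a toy CAM that performs an in-place, bit-serial addition through compare-and-write passes. It is an illustration only and does not reflect the RV-Across interfaces: the names `ToyCAM`, `ADD_PASSES`, and `associative_add`, the row layout, and the pass ordering are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCAM:
    """Hypothetical toy CAM: each row maps a column name to one bit."""
    rows: list
    passes: int = 0
    trace: list = field(default_factory=list)

    def compare(self, key):
        # Tag every row whose selected columns equal the key.
        # A real CAM does this for all rows at once; here it is a loop.
        return [all(row[c] == v for c, v in key.items()) for row in self.rows]

    def write(self, tags, values):
        # Write the given column values into the tagged rows only.
        for row, tagged in zip(self.rows, tags):
            if tagged:
                row.update(values)

    def apply_pass(self, key, values):
        # One pass = one compare followed by one write; log it for inspection.
        tags = self.compare(key)
        self.write(tags, values)
        self.passes += 1
        self.trace.append((dict(key), dict(values), sum(tags)))


# Full-adder truth table as (carry, b, a) -> (carry, b), keeping only the
# entries that change state. The order is chosen so that a row rewritten by
# one pass never matches a later pass for the same bit position.
ADD_PASSES = [
    ((0, 1, 1), (1, 0)),
    ((0, 0, 1), (0, 1)),
    ((1, 0, 0), (0, 1)),
    ((1, 1, 0), (1, 0)),
]


def associative_add(a_vals, b_vals, width):
    """Compute b <- (a + b) mod 2**width for every row, one bit at a time."""
    rows = []
    for a, b in zip(a_vals, b_vals):
        row = {"c": 0}  # carry column, cleared before the operation
        for i in range(width):
            row[f"a{i}"] = (a >> i) & 1
            row[f"b{i}"] = (b >> i) & 1
        rows.append(row)
    cam = ToyCAM(rows)
    for i in range(width):  # least significant bit first
        for (c, b, a), (c_out, b_out) in ADD_PASSES:
            cam.apply_pass({"c": c, f"b{i}": b, f"a{i}": a},
                           {"c": c_out, f"b{i}": b_out})
    sums = [sum(row[f"b{i}"] << i for i in range(width)) for row in cam.rows]
    return sums, cam


if __name__ == "__main__":
    sums, cam = associative_add([3, 200, 255], [5, 100, 1], 8)
    print(sums, cam.passes)
```

In this sketch the number of passes depends only on the operand width (four passes per bit), not on the number of rows, which is the property that lets an associative processor benefit from longer data sets. The `trace` list plays a role loosely analogous to per-pass state recording; how RV-Across actually records memory and register states is not described here.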
