DyCa: Dynamically Adaptable Cache Bypassing Mechanism

Mariana Carmin; Paulo Cesar Santos; Marco Antonio Zanata Alves

doi:10.5753/wscad.2022.226583

Mariana Carmin (UFPR)
Paulo Cesar Santos (UFRGS)
Marco Antonio Zanata Alves (UFPR)

Abstract

As core counts grow, more cores and threads share the Last-Level Cache (LLC), which accounts for a large portion of the chip’s total power and area. Sophisticated mechanisms are therefore needed to make the best use of this resource by addressing cache conflicts and cache pollution. This work exploits the observation that many applications exhibit poor temporal and spatial locality, so an adaptive cache mechanism can benefit such applications, improving overall system performance and reducing energy consumption. In this paper, we propose DyCa, an online, application-aware predictor that adapts the use of the LLC. DyCa improves performance by up to 22% in single-program workloads and up to 21% in multi-program workloads.
