Cluster Cache Monitor

  • Guohong Li Tsinghua University
  • Olivier Temam INRIA Saclay
  • Zhenyu Liu Tsinghua National Laboratory for Information Science and Technology
  • Dongsheng Wang Tsinghua National Laboratory for Information Science and Technology
  • Sanchuan Guo Tsinghua National Laboratory for Information Science and Technology
  • Chongmin Li Tsinghua National Laboratory for Information Science and Technology

Abstract


As the number of cores and the working sets of parallel workloads increase, shared L2 caches exhibit fewer misses than private L2 caches by making better use of the total available cache capacity. However, they also induce higher overall L1 miss latencies because of the longer average distance between two nodes and potential congestion at certain nodes. One of the main causes of these long L1 miss latencies is accesses to the home nodes of the directory. We have observed, however, that there is a high probability that the target data of an L1 miss resides in the L1 cache of a neighbor node. In such cases, the long-distance accesses to the home nodes can potentially be avoided. To leverage this property, we organize the multi-core into clusters of 2×2 nodes and introduce the Cluster Cache Monitor (CCM), a hardware structure in charge of detecting whether an L1 miss can be served by one of the cluster L1 caches. We also add two cluster-related states to the coherence protocol in order to avoid long-distance accesses to home nodes upon hits in the cluster L1 caches. We evaluate this approach on a 64-node multi-core using SPLASH-2 and PARSEC benchmarks, and we find that the CCM can reduce execution time by 15% and energy by 14%, while saving 28% of the directory storage area, compared to a standard multi-core with a shared L2. We also show that the CCM outperforms recent mechanisms such as ASR, DCC and RNUCA.
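The idea of serving an L1 miss inside a 2×2 cluster instead of at a distant home node can be illustrated with a small software model. This is only a sketch under stated assumptions, not the CCM hardware or its coherence protocol: the 8×8 mesh layout, row-major node numbering, the modulo-based `home_node` mapping, and all function names are hypothetical choices made for illustration, and the two cluster-related coherence states are not modeled.

```python
MESH_DIM = 8      # assumption: the 64 nodes form an 8x8 mesh, numbered row-major
CLUSTER_DIM = 2   # 2x2 clusters, as described in the abstract


def coords(node):
    """Return the (row, col) position of a node in the assumed mesh."""
    return divmod(node, MESH_DIM)


def cluster_peers(node):
    """Return the other three nodes in the same 2x2 cluster."""
    row, col = coords(node)
    base_row = row - row % CLUSTER_DIM
    base_col = col - col % CLUSTER_DIM
    return [
        (base_row + dr) * MESH_DIM + (base_col + dc)
        for dr in range(CLUSTER_DIM)
        for dc in range(CLUSTER_DIM)
        if (base_row + dr, base_col + dc) != (row, col)
    ]


def hops(a, b):
    """Manhattan distance between two nodes (assumes dimension-ordered routing)."""
    (ra, ca), (rb, cb) = coords(a), coords(b)
    return abs(ra - rb) + abs(ca - cb)


def home_node(block_addr):
    """Hypothetical home mapping: blocks interleaved across all nodes."""
    return block_addr % (MESH_DIM * MESH_DIM)


def serve_l1_miss(requester, block_addr, l1_contents):
    """Return (serving_node, hop_count) for an L1 miss.

    l1_contents maps a node id to the set of block addresses in its L1.
    A cluster peer holding the block serves the miss; otherwise the
    request travels to the block's home node.
    """
    for peer in cluster_peers(requester):
        if block_addr in l1_contents.get(peer, set()):
            return peer, hops(requester, peer)
    home = home_node(block_addr)
    return home, hops(requester, home)
```

In this model, a miss at node 0 on block 63 travels 14 hops to home node 63 when no cluster peer holds the block, but only 2 hops when node 9 (in the same cluster) has it in its L1: `serve_l1_miss(0, 63, {9: {63}})` returns `(9, 2)`.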
Keywords: Coherence, Network interfaces, Protocols, Hardware, Cooperative caching, Benchmark testing, Monitoring
Published
23 October 2013
LI, Guohong; TEMAM, Olivier; LIU, Zhenyu; WANG, Dongsheng; GUO, Sanchuan; LI, Chongmin. Cluster Cache Monitor. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 25., 2013, Porto de Galinhas/PE. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2013. p. 1-8.