Scratchpad Memories for Parallel Applications in Multi-core Architectures
Abstract
Scratchpad memories are widely used in embedded processors because they consume less energy and occupy less area than traditional cache memories. In multi-core architectures, they are an attractive option for storing shared data and intensively used data. However, they also pose challenges: their contents must be chosen manually, and a change in scratchpad size requires modifying the application source code. In this article, we propose a way of using a scratchpad memory in a multi-core architecture that alleviates these disadvantages. We added the scratchpad to a 4-core architecture and reduced the size of the L2 cache to free chip area for it. We evaluated the proposed design by running the NAS Parallel Benchmarks (NPB) applications in a simulator. Compared to the base architecture, performance improved by up to 45% and cache invalidations were reduced by up to 85%.