Memory Latency: to Tolerate or to Reduce?
Abstract
It has become a truism that the gap between processor speed and memory access latency continues to widen rapidly. This paper presents some of the architectural strategies used to bridge this gap. They are mostly of two kinds: memory-latency-reducing approaches, such as those employed in caches and HiDISC (Hierarchical Decoupled Architecture), and memory-latency-tolerating schemes, such as SMT (Simultaneous Multithreading) and ISSC (I-Structure Software Cache). A third technique reduces latency by integrating the processor and DRAM on the same chip. Finally, algorithmic techniques that improve cache utilization and reduce average memory access latency on traditional cache architectures are discussed.
