A Simulator for SMT Architectures: Evaluating Instruction Cache Topologies

  • Ronaldo Gonçalves UEM
  • Eduard Ayguadé Universitat Politècnica de Catalunya
  • Mateo Valero Universitat Politècnica de Catalunya
  • Philippe Navaux UFRGS

Abstract


SMT (Simultaneous Multithreading) is becoming one of the major trends in the design of future generations of microarchitectures. Its ability to exploit both intra- and inter-thread parallelism makes it possible to take advantage of the potential ILP (instruction-level parallelism) that future processor designs will offer. SMT architectures can hide long instruction latencies and make better use of hardware resources by simultaneously executing many diverse instructions from different threads. To provide detailed and accurate information about the performance of this approach, an SMT simulator has been developed on top of the SimpleScalar Tool Set. The SMT simulator allows the configuration of a large set of architectural parameters (cache and reservation station topologies, number of slots, and branch prediction accuracy) in addition to the parameters inherited from the base simulator (sizes of the cache memories, tables and queues; instruction scheduling policy; pipeline width). The SMT simulator has been exhaustively tested with workloads composed of a subset of the SPEC95 benchmarks under different instruction cache topologies. The simulator has proved to be an efficient tool for the performance evaluation of this kind of architecture. This paper describes the main features of the simulator and analyses the simulation results.
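The latency-hiding argument above can be illustrated with a deliberately simplified issue model. This sketch is hypothetical and is not the SimpleScalar-based simulator described in the paper: it assumes that every instruction depends on the previous one in its own thread, ignores fetch, caches, branch prediction and reservation stations, and fills up to `issue_width` slots per cycle from the ready threads in round-robin order. The function name `smt_cycles` and its parameters are illustrative assumptions.

```python
def smt_cycles(threads, issue_width):
    """Toy SMT issue model (hypothetical; not the paper's simulator).

    threads: one list of instruction latencies (in cycles) per thread.
    Each instruction is assumed to depend on the previous one in its
    thread, so a thread may issue again only once its last result is ready.
    Each cycle, up to issue_width instructions are issued from ready
    threads in round-robin order. Returns the cycle at which the last
    instruction completes.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    n = len(threads)
    next_idx = [0] * n   # index of the next instruction to issue, per thread
    ready_at = [0] * n   # first cycle at which each thread may issue again
    remaining = sum(len(t) for t in threads)
    cycle = 0
    while remaining > 0:
        slots = issue_width
        for k in range(n):
            if slots == 0:
                break
            tid = (cycle + k) % n  # rotate the starting thread every cycle
            prog = threads[tid]
            if next_idx[tid] < len(prog) and ready_at[tid] <= cycle:
                ready_at[tid] = cycle + prog[next_idx[tid]]
                next_idx[tid] += 1
                slots -= 1
                remaining -= 1
        cycle += 1
    return max(ready_at, default=0)


if __name__ == "__main__":
    a = [10, 1]   # a long-latency instruction (e.g. a cache miss), then a dependent one
    b = [1] * 8   # short instructions with no dependence on thread a
    print(smt_cycles([a], 1) + smt_cycles([b], 1))  # prints 19 (one thread at a time)
    print(smt_cycles([a, b], 1))                    # prints 11 (b runs during a's stall)
```

Running the two threads one after the other takes 19 cycles, while issuing from both simultaneously takes 11, because the short thread's instructions occupy the issue slot that the long-latency instruction would otherwise leave idle.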
Keywords: superscalar, SMT, performance evaluation

Published
24/10/2000
GONÇALVES, Ronaldo; AYGUADÉ, Eduard; VALERO, Mateo; NAVAUX, Philippe. A Simulator for SMT Architectures: Evaluating Instruction Cache Topologies. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 12., 2000, São Pedro/SP. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2000. p. 279-286. DOI: https://doi.org/10.5753/sbac-pad.2000.41226.