Performance Study of a Multithreaded Superscalar Microprocessor

  • Manu Gulati, NexGen Inc.
  • Nader Bagherzadeh, University of California


This paper describes a technique for improving the performance of a superscalar processor through multithreading. The technique exploits the instruction-level parallelism available both within each individual instruction stream and across streams. The former is exploited through out-of-order execution of instructions within a stream, and the latter through simultaneous execution of instructions from different streams. Aspects of multithreaded superscalar design, such as fetch policy, cache performance, instruction scheduling, and functional unit utilization, are studied. We analyze performance based on the simulation of a superscalar architecture and show that it is possible to support multiple streams with minimal extra hardware while achieving a significant performance gain (20-55%) across a range of benchmarks.
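
The intuition behind exploiting parallelism across streams can be illustrated with a toy issue model. The sketch below is not the simulator used in the paper; it assumes unit-latency instructions, an unbounded instruction window, at most one dependence per instruction, and a simple round-robin priority among streams. The names `Stream` and `simulate` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Entry i holds the index of the earlier instruction that instruction i
# depends on, or None if it has no dependence (illustrative encoding).
Stream = List[Optional[int]]


def simulate(streams: List[Stream], issue_width: int) -> int:
    """Return the number of cycles needed to issue every instruction."""
    done = [[False] * len(s) for s in streams]  # issued in an earlier cycle
    remaining = sum(len(s) for s in streams)
    n = len(streams)
    cycles = 0
    while remaining > 0:
        issued = []          # (stream, instruction) pairs issued this cycle
        slots = issue_width  # issue slots shared by all streams
        # Rotate the starting stream each cycle (round-robin priority).
        for k in range(n):
            t = (cycles + k) % n
            for i, dep in enumerate(streams[t]):
                if slots == 0:
                    break
                # Out-of-order within a stream: any ready instruction may issue.
                if not done[t][i] and (dep is None or done[t][dep]):
                    issued.append((t, i))
                    slots -= 1
        # Results become visible to dependents only in the next cycle.
        for t, i in issued:
            done[t][i] = True
        remaining -= len(issued)
        cycles += 1
    return cycles


# A fully serial chain: each instruction depends on the previous one.
chain: Stream = [None, 0, 1, 2]

one_stream = simulate([chain], issue_width=2)          # 4 cycles, one slot idle
two_streams = simulate([chain, chain], issue_width=2)  # 4 cycles, both slots busy
```

In this toy model, running the two chains one after the other on a 2-wide machine would take 8 cycles, because a single serial chain can never fill the second issue slot. Interleaving the two streams completes the same work in 4 cycles. The model says nothing about fetch policy or cache behavior, which the paper studies separately.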


GULATI, Manu; BAGHERZADEH, Nader. Performance Study of a Multithreaded Superscalar Microprocessor. In: TUTORIAIS - INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 9., 1997, Campos do Jordão/SP. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 1997. p. 19-29. DOI: