Process Prefetching for a Simultaneous Multithreaded Architecture

  • Ronaldo A. L. Gonçalves UFRGS
  • Rafael L. Sagula UFRGS
  • Tiarajú A. Diverio UFRGS
  • Philippe O. A. Navaux UFRGS


Traditional superscalar architectures will eventually prove incapable of taking full advantage of the billions of transistors available in future generations of microprocessors if they remain limited by dataflow dependencies. An SMT (Simultaneous Multithreaded) architecture may be a possible solution to this problem, since it can fetch and execute many instruction flows at the same time while hiding both high-latency operations and data dependencies. This capability, however, depends on the existence of multithreaded applications and on an effective instruction fetch mechanism that guarantees the presence of ready threads in the L1 i-cache for use at context switches. SEMPRE (Superscalar Execution of Multiple PRocEsses) is a type of SMT architecture developed to use the various processes found in today's operating systems to supply instructions to its SMT pipeline. This paper proposes and evaluates a mechanism that prefetches instructions from waiting processes in order to guarantee adequate context switching. An analytical model of this mechanism was developed using DSPN (Deterministic and Stochastic Petri Nets), and the results show that its use improves the dispatch width by 25% when realistic parameters are used. The method reduces the problem of cache degradation (present in many SMT architectures) and, in some cases, tolerates L2 delays of up to 9 cycles without loss of performance.

Keywords: SMT, Prefetch, Modeling


GONÇALVES, Ronaldo A. L.; SAGULA, Rafael L.; DIVERIO, Tiarajú A.; NAVAUX, Philippe O. A. Process Prefetching for a Simultaneous Multithreaded Architecture. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 11., 1999, Natal. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 1999. p. 59-66.