DTSVLIW: VLIW Performance with Sequential Code

Alberto Ferreira de Souza; Peter Rounce

doi:10.5753/sbac-pad.2000.41225

Alberto Ferreira de Souza, UFES
Peter Rounce, University College London

Abstract

Due to the temporal execution locality present in programs, even small instruction caches (16-Kbyte) can provide processors with fast access to instructions most of the time. The Dynamically Trace Scheduled VLIW (DTSVLIW) architecture exploits programs’ temporal execution locality by executing code in two distinct modes. In the first execution encounter, fragments of the code are executed in sequential mode (in a simple pipelined processor), scheduled into blocks of VLIW instructions and cached in a VLIW cache by the DTSVLIW’s Scheduler Engine. In subsequent encounters, the DTSVLIW’s VLIW Engine executes these blocks in VLIW mode. In this paper, we present experiments which show that DTSVLIW machines can perform better than Superscalar machines with equivalent hardware and better than VLIWs with the same degree of parallelism, while keeping the fast clock of the latter. We also discuss how the DTSVLIW compares with the Trace Cache and EPIC architectures.
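The two execution modes described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the Scheduler Engine algorithm from the paper: the names `Instr`, `schedule`, and `DTSVLIWModel` are hypothetical, the scheduler is a plain greedy packer that respects register dependencies, every sequential instruction and every long instruction is assumed to take one cycle, and branches, memory accesses, and VLIW cache capacity are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical register-to-register instruction: dest = op(srcs)."""
    dest: str
    srcs: tuple


def schedule(fragment, width):
    """Greedily pack a sequential fragment into long instructions.

    Returns a list of long instructions, each holding at most `width` Instr.
    Within one long instruction, reads are assumed to happen before writes.
    """
    block = []       # long instructions built so far
    last_write = {}  # register -> slot of its most recent write
    last_read = {}   # register -> latest slot that reads it
    for ins in fragment:
        earliest = 0
        for reg in ins.srcs:  # read-after-write: wait for the producer
            earliest = max(earliest, last_write.get(reg, -1) + 1)
        # write-after-write: keep writes to the same register in order
        earliest = max(earliest, last_write.get(ins.dest, -1) + 1)
        # write-after-read: do not overwrite before earlier readers
        earliest = max(earliest, last_read.get(ins.dest, 0))
        slot = earliest
        while slot < len(block) and len(block[slot]) >= width:
            slot += 1  # long instruction is full, try the next one
        while len(block) <= slot:
            block.append([])
        block[slot].append(ins)
        last_write[ins.dest] = slot
        for reg in ins.srcs:
            last_read[reg] = max(last_read.get(reg, -1), slot)
    return block


class DTSVLIWModel:
    """Toy machine: sequential mode on first encounter, VLIW mode afterwards."""

    def __init__(self, width):
        self.width = width
        self.vliw_cache = {}  # fragment start address -> scheduled block

    def run(self, addr, fragment):
        """Return (mode, cycles) for one execution of the fragment at `addr`."""
        block = self.vliw_cache.get(addr)
        if block is None:
            # First encounter: execute sequentially while scheduling and caching.
            self.vliw_cache[addr] = schedule(fragment, self.width)
            return "sequential", len(fragment)
        # Later encounters: execute the cached block, one long instruction per cycle.
        return "vliw", len(block)


fragment = [
    Instr("r1", ("r0",)),
    Instr("r2", ("r0",)),
    Instr("r3", ("r1", "r2")),  # depends on the two instructions above
    Instr("r4", ("r0",)),
]
machine = DTSVLIWModel(width=2)
first = machine.run(0x100, fragment)
second = machine.run(0x100, fragment)
print(first, second)
```

In this model the first call runs the four instructions one per cycle and fills the cache, while the second call executes the cached block, so the cycle count drops to the number of long instructions the scheduler produced.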

Keywords: DTSVLIW, EPIC, Superscalar, VLIW
