Exploiting Reuse with Dynamic Trace Memoization: Evaluating Architectural Issues

Amarildo T. da Costa; Felipe M. G. França; Eliseu M. C. Filho

doi:10.5753/sbac-pad.2000.41217

Amarildo T. da Costa, IME / UFRJ
Felipe M. G. França, UFRJ
Eliseu M. C. Filho, UFRJ

Abstract

Our Dynamic Trace Memoization (DTM) mechanism uses memoization tables to skip the execution of dynamic sequences of redundant instructions, extending the concept of instruction reuse to larger-grain units. This work evaluates three critical architectural issues concerning the feasibility of DTM: (i) the cost-effectiveness of trace-level reuse: the paper shows that balancing the sizes of the single-instruction memoization table and the trace memoization table produces higher speedups and does not require more read ports in the trace memo table; (ii) register file pressure: for the SPECInt95 and SPECFp95 benchmark suites, DTM requires no extra read ports; (iii) floating-point apparatus: in contrast with the speedup of 8.4% obtained for SPECInt95, SPECFp95 achieved a speedup of 7% with a DTM mechanism that ignores floating-point operations.
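The general idea of trace-level reuse described above can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the authors' hardware design: the names `TraceEntry`, `TraceMemoTable`, `record`, and `try_reuse` are hypothetical, registers are modeled as a plain dictionary, and the table is unbounded, whereas a real memoization table would have a fixed size and a replacement policy.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEntry:
    """One memoized trace (hypothetical layout, for illustration only)."""
    inputs: dict   # register name -> value the trace read on entry
    outputs: dict  # register name -> value the trace produced
    next_pc: int   # address where execution resumes after the trace


@dataclass
class TraceMemoTable:
    """Toy trace memoization table indexed by the trace's starting address."""
    entries: dict = field(default_factory=dict)  # start pc -> list of TraceEntry

    def record(self, pc, inputs, outputs, next_pc):
        # Store a trace observed during normal execution.
        entry = TraceEntry(dict(inputs), dict(outputs), next_pc)
        self.entries.setdefault(pc, []).append(entry)

    def try_reuse(self, pc, regs):
        # A trace is reusable only if every input it read has the same
        # value now as when the trace was recorded.
        for entry in self.entries.get(pc, []):
            if all(regs.get(r) == v for r, v in entry.inputs.items()):
                regs.update(entry.outputs)  # apply the results without executing
                return entry.next_pc        # skip past the whole trace
        return None                         # no match: execute normally


table = TraceMemoTable()
table.record(0x100, {"r1": 5, "r2": 7}, {"r3": 12, "r4": 60}, next_pc=0x110)

hit_regs = {"r1": 5, "r2": 7, "r3": 0, "r4": 0}
hit_pc = table.try_reuse(0x100, hit_regs)    # same inputs: trace is skipped

miss_regs = {"r1": 6, "r2": 7, "r3": 0, "r4": 0}
miss_pc = table.try_reuse(0x100, miss_regs)  # r1 differs: no reuse
```

In this sketch a hit updates the output registers and returns the resume address, while a miss leaves the registers untouched so the instructions can be executed as usual.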

Keywords: Trace Reuse, Memoization, Instruction Reuse, Superscalar Processor
