Efficient Local Memory Support for Approximate Computing

Marcelo Brandalero; Guilherme Meneguzzi Malfatti; Geraldo Francisco Oliveira; Leonardo Almeida da Silveira; Larissa Rozales Gonçalves; Bruno Castro da Silva; Luigi Carro; Antônio Carlos Schneider Beck

Affiliation (all authors): UFRGS

Abstract

Given the saturation of single-threaded performance improvements in General-Purpose Processors (GPPs), novel architectural techniques are required to meet emerging demands. In this paper, we propose a generic acceleration framework for approximate algorithms that replaces computation with table look-ups in dedicated memories. At compile time, annotated application kernels are automatically profiled with sample inputs, and the most representative input-output mappings of each kernel are selected using K-Means clustering and saved in the program binary. At runtime, these mappings are loaded into dedicated look-up tables, and kernel execution is replaced by hardware execution of a Nearest-Centroid Classifier, which retrieves from memory the output that best matches the current input of the annotated region. We compare our approach against a similar framework based on neural acceleration and show that, at similar quality levels, the proposed approach achieves on average three times better performance and energy efficiency with significant area savings, thus opening new opportunities for performance harvesting in approximate accelerators.
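The compile-time and runtime stages described above can be sketched in software. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that K-Means clusters the kernel's input vectors, that each look-up table entry stores a centroid together with the mean output of the profiled samples in its cluster, and that matching uses Euclidean distance. The kernel `kernel`, the helper names `kmeans`, `build_table` and `lookup`, the input range and the table size `k=32` are all hypothetical choices for illustration. In the proposed framework, the table is held in dedicated memories and the nearest-centroid search runs in hardware.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Table = Tuple[List[Vector], List[float]]  # (centroids, stored outputs)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Nearest-centroid rule: index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points: List[Vector], k: int, iters: int = 25, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's K-Means over input vectors; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct samples
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim)))
            else:
                new_centroids.append(c)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def build_table(kernel: Callable[[Vector], float], samples: List[Vector], k: int) -> Table:
    """Compile-time step (hypothetical): profile `kernel` on sample inputs,
    cluster the inputs, and keep one (centroid, mean output) pair per cluster."""
    centroids = kmeans(samples, k)
    sums = [0.0] * k
    counts = [0] * k
    for x in samples:
        i = nearest(x, centroids)
        sums[i] += kernel(x)
        counts[i] += 1
    outputs = [sums[i] / counts[i] if counts[i] else kernel(centroids[i]) for i in range(k)]
    return centroids, outputs


def lookup(table: Table, x: Vector) -> float:
    """Runtime step: replace the kernel call with a nearest-centroid look-up."""
    centroids, outputs = table
    return outputs[nearest(x, centroids)]


def kernel(x: Vector) -> float:
    """Hypothetical error-tolerant kernel standing in for an annotated region."""
    return math.sin(x[0]) * math.cos(x[1])


if __name__ == "__main__":
    rng = random.Random(1)
    train = [(rng.uniform(0, math.pi), rng.uniform(0, math.pi)) for _ in range(1000)]
    table = build_table(kernel, train, k=32)
    test = [(rng.uniform(0, math.pi), rng.uniform(0, math.pi)) for _ in range(200)]
    mae = sum(abs(lookup(table, x) - kernel(x)) for x in test) / len(test)
    print(f"mean absolute error with 32 table entries: {mae:.3f}")
```

In this sketch, the table size `k` is the main quality knob. More entries shrink each cluster and reduce the approximation error, but they cost more memory and, in this software version, make the linear nearest-centroid search longer.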

Keywords: Approximate computing, memoization, data clustering
