Instruction-Level Loop Perforation

  • Daniela Catelan UFMS
  • Liana Duenha UFMS
  • Ricardo Santos UFMS
  • Lucas Wanner UNICAMP


Approximate computing (AC) spans techniques from the application level down to the circuit level, trading result accuracy for better performance. Loop perforation (LP), which skips a subset of loop iterations, is a widely used software AC technique. This paper presents an Instruction-Level LP (ILLP) approach that relies on approximate hardware instructions. We extended the ACCEPT compiler and SPIKE simulator workflows to generate and simulate applications with ILLP. We evaluated the technique by comparing precision, instruction count, cycle count, and energy consumption. ILLP achieves a 74.61% reduction in the number of instructions for the PI application, a 51.40% reduction in the number of cycles for FFT, and an energy saving of 74.49% for PI.
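For readers unfamiliar with loop perforation, the following minimal sketch shows the classic software form of the technique: a loop visits only every `stride`-th iteration and accepts a less accurate result in exchange for less work. It does not show the ILLP approach itself or the ACCEPT/SPIKE workflow. The function names, the `stride` parameter, and the sample data are assumptions chosen for the example.

```python
def mean_exact(values):
    """Reference version: every loop iteration is executed."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def mean_perforated(values, stride=2):
    """Perforated version: only every `stride`-th iteration is executed.

    `stride` is a hypothetical tuning knob for this sketch; stride=1
    reproduces the exact loop.
    """
    total = 0.0
    visited = 0
    for i in range(0, len(values), stride):
        total += values[i]
        visited += 1
    # Average over the iterations that actually ran.
    return total / visited


# Illustrative data only, not taken from the paper's benchmarks.
data = [float(i % 10) for i in range(1000)]

exact = mean_exact(data)
approx = mean_perforated(data, stride=4)  # runs 250 of 1000 iterations
rel_error = abs(approx - exact) / exact

print(f"exact={exact}, approx={approx}, relative error={rel_error:.3f}")
```

On this sample input, a stride of 4 skips 75% of the iterations and still lands within roughly 11% of the exact mean. How much error a given perforation rate introduces depends strongly on the application and its data.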


CATELAN, Daniela; DUENHA, Liana; SANTOS, Ricardo; WANNER, Lucas. Instruction-Level Loop Perforation. In: SIMPÓSIO EM SISTEMAS COMPUTACIONAIS DE ALTO DESEMPENHO (SSCAD), 24., 2023, Porto Alegre/RS. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 37-48. DOI: