A Case Study on Optimizing Accurate Half Precision Average

  • Kenny Peou LRI UMR CNRS 8623, Université Paris-Saclay
  • Alan Kelly NUMSCALE
  • Joel Falcou LRI UMR CNRS 8623, Université Paris-Saclay
  • Cecile Germain LRI UMR CNRS 8623, Université Paris-Saclay

Abstract


In this work, we study the numerical performance of various common algorithms used to calculate the average of an array of half precision (FP16) floating point values. While the current generation of CPUs does not support native FP16 arithmetic, it is a planned feature in a number of next-generation CPUs. We therefore emulate FP16 arithmetic with the half software library. Due to the limited range and precision of the FP16 data type, some algorithms prove insufficient for arrays as small as 100 elements. We propose an algorithm that allows numerically stable FP16 computation of the average and compare it to the naive single precision (FP32) algorithm in terms of both numerical precision and runtime performance. We find that our algorithm offers robustness, numerical precision, and SIMD performance comparable to those of the higher precision computation.
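To illustrate why FP16 averaging can fail on arrays of only 100 elements, the following minimal sketch compares a naive sum-then-divide average with an incremental running-mean update. It is not the algorithm proposed in the paper, and it does not use the half library. As an assumption made purely for illustration, FP16 is emulated by rounding every intermediate result to binary16 with Python's `struct` format `"e"`, and the helper names `fp16`, `naive_mean_fp16`, and `running_mean_fp16` are hypothetical.

```python
import struct


def fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value.

    Illustrative emulation only: struct raises OverflowError for finite
    values beyond the FP16 range, which we map to +/- infinity.
    """
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def naive_mean_fp16(values):
    """Sum everything in FP16, then divide by the element count."""
    total = 0.0
    for v in values:
        total = fp16(total + fp16(v))  # accumulator can exceed 65504
    return fp16(total / len(values))


def running_mean_fp16(values):
    """Incremental update mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k.

    The accumulator stays near the magnitude of the data, so it does not
    overflow the way a raw FP16 sum does.
    """
    mean = 0.0
    for k, v in enumerate(values, start=1):
        delta = fp16(fp16(v) - mean)
        mean = fp16(mean + fp16(delta / k))
    return mean


data = [1000.0] * 100            # exact sum is 100000, above the FP16 maximum of 65504
print(naive_mean_fp16(data))     # inf
print(running_mean_fp16(data))   # 1000.0
```

The running-mean form trades one division per element for a bounded accumulator. Whether that trade is acceptable depends on the target hardware, and the sketch makes no claim about SIMD performance.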
Keywords: Random access memory, Registers, Clustering algorithms, Machine learning algorithms, Software algorithms, Program processors, Memory management, Half-Precision, Numerical Precision, SIMD
Published
24/09/2018
PEOU, Kenny; KELLY, Alan; FALCOU, Joel; GERMAIN, Cecile. A Case Study on Optimizing Accurate Half Precision Average. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 30., 2018, Lyon/FR. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2018. p. 356-363.