Improving FALCON’s Key Generation on ARMv8-A Platforms
Resumo
This short paper proposes two implementation techniques that may speed up the key generation routine of FALCON, a lattice-based digital signature scheme recently standardized by NIST. The first is a change in the set of primes used for splitting polynomials in NTT+RNS representation: by mixing 31-bit and 63-bit primes, we postulate that operations may be computed concurrently using both the scalar ALU (for 64-bit) and SIMD ALU (computing four 32-bit operations in a single 128-bit register). The second uses arbitrary-precision floating-point operations to compute the polynomial reduction step, which we prototype and benchmark on both Apple M1 SoC and Cortex-A72, improving on the original implementation for deeper recursion levels.
Referências
Fouque, P.-A., Hoffstein, J., Kirchner, P., Lyubashevsky, V., Pornin, T., Prest, T., Ricosset, T., Seiler, G., Whyte, W., and Zhang, Z. (2020). Falcon: Fast-Fourier lattice-based compact signatures over NTRU. Submission to the NIST Post-Quantum Cryptography Standardization Project. [link].
Hart, W. B. (2010). Fast Library for Number Theory: An Introduction. In Proceedings of the Third International Congress on Mathematical Software, ICMS’10, pages 88–91, Berlin, Heidelberg. Springer-Verlag. [link].
Kim, Y., Song, J., and Seo, S. C. (2022). Accelerating Falcon on ARMv8. IEEE Access, 10:44446–44460.
NIST (2017). Post-Quantum Cryptography. [link].
Pornin, T. and Prest, T. (2019). More efficient algorithms for the NTRU key generation using the field norm. In Lin, D. and Sako, K., editors, Public-Key Cryptography – PKC 2019, pages 504–533, Cham. Springer International Publishing.
Shor, P. W. (1997). Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput., 26(5):1484–1509.