Highly Vectorized SIKE for AVX-512

Hao Cheng; Georgios Fotiadis; Johann Großschädl; Peter Y. A. Ryan

doi:10.46586/tches.v2022.i2.41-68

Authors

Hao Cheng DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
Georgios Fotiadis DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
Johann Großschädl DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
Peter Y. A. Ryan DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg

DOI:

https://doi.org/10.46586/tches.v2022.i2.41-68

Keywords:

Post-Quantum Cryptography, Isogeny-Based Cryptography, Software Optimization, Finite-Field Arithmetic, SIMD-Parallel Processing

Abstract

It is generally accepted that a large-scale quantum computer would be capable of breaking any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of the evaluation of candidates. SIKE (an abbreviation of Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from other candidates due to relatively short key lengths and relatively high computing costs. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice Lake CPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AvxSike, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AvxSike instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the best AVX-512IFMA-based SIKE software reported in the literature to date. When executed on an Intel Core i3-1005G1 CPU, AvxSike outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, while the encapsulation is even 3.2 times faster.
