Highly Vectorized SIKE for AVX-512

Authors

  • Hao Cheng DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Georgios Fotiadis DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Johann Großschädl DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  • Peter Y. A. Ryan DCS and SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg

DOI:

https://doi.org/10.46586/tches.v2022.i2.41-68

Keywords:

Post-Quantum Cryptography, Isogeny-Based Cryptography, Software Optimization, Finite-Field Arithmetic, SIMD-Parallel Processing

Abstract

It is generally accepted that a large-scale quantum computer would be capable of breaking any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of the evaluation of candidates. SIKE (an abbreviation of Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from other candidates by its relatively short key lengths and relatively high computing costs. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice Lake CPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AvxSike, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AvxSike instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the best AVX-512IFMA-based SIKE software reported in the literature to date. When executed on an Intel Core i3-1005G1 CPU, AvxSike outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, while encapsulation is even 3.2 times faster.
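To illustrate the kind of base-field arithmetic that AVX-512IFMA accelerates, the following is a minimal Python sketch that emulates a single 64-bit lane of the two IFMA multiply-add instructions (VPMADD52LUQ and VPMADD52HUQ) and uses them for a schoolbook multiplication of integers split into 52-bit limbs. It is not taken from AvxSike: the function names, the choice of 10 limbs for a 503-bit operand, and the omission of carry propagation and modular reduction are assumptions made for illustration only. A real 512-bit register processes eight such lanes at once.

```python
MASK52 = (1 << 52) - 1   # IFMA multiplies only the low 52 bits of each lane
MASK64 = (1 << 64) - 1   # each lane is a 64-bit accumulator

def madd52lo(acc, a, b):
    """Emulate one lane of VPMADD52LUQ: acc + low 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod & MASK52)) & MASK64

def madd52hi(acc, a, b):
    """Emulate one lane of VPMADD52HUQ: acc + high 52 bits of the 104-bit product."""
    prod = (a & MASK52) * (b & MASK52)
    return (acc + (prod >> 52)) & MASK64

def to_limbs(x, n):
    """Split a non-negative integer into n limbs of 52 bits (hypothetical helper)."""
    return [(x >> (52 * i)) & MASK52 for i in range(n)]

def from_limbs(limbs):
    """Recombine limbs; limbs may exceed 52 bits (carries not yet propagated)."""
    return sum(limb << (52 * i) for i, limb in enumerate(limbs))

def mul_radix52(a_limbs, b_limbs):
    """Schoolbook product; the 12 spare bits per lane absorb the delayed carries."""
    n = len(a_limbs)
    acc = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            acc[i + j] = madd52lo(acc[i + j], a_limbs[i], b_limbs[j])
            acc[i + j + 1] = madd52hi(acc[i + j + 1], a_limbs[i], b_limbs[j])
    return acc

# Usage: multiply two 503-bit operands held in 10 limbs each.
a = (1 << 503) - 12345
b = (1 << 502) + 67890
product = from_limbs(mul_radix52(to_limbs(a, 10), to_limbs(b, 10)))
```

Because every limb occupies only 52 of the 64 bits in its lane, the partial products can be accumulated without intermediate carry handling, which is what makes this representation attractive for SIMD execution.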

Published

2022-02-15

How to Cite

Cheng, H., Fotiadis, G., Großschädl, J., & Ryan, P. Y. A. (2022). Highly Vectorized SIKE for AVX-512. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022(2), 41–68. https://doi.org/10.46586/tches.v2022.i2.41-68

Section

Articles