Will You Cross the Threshold for Me? Generic Side-Channel Assisted Chosen-Ciphertext Attacks on NTRU-based KEMs

In this work, we propose generic and novel side-channel assisted chosen-ciphertext attacks on NTRU-based Key Encapsulation Mechanisms (KEMs) that are secure in the chosen-ciphertext model (IND-CCA security). Our attacks involve the construction of malformed ciphertexts which, when decapsulated by the target device, ensure that a targeted intermediate variable has a very close relation with the secret key. Subsequently, an attacker who can obtain information about the secret-dependent variable through side-channels can recover the full secret key. We propose several novel CCAs which can be carried out by instantiating three different types of oracles, namely a plaintext-checking oracle, a decryption-failure oracle, and a full-decryption oracle, using side-channel leakage from the decapsulation procedure. Our proposed attacks are applicable to two NTRU-based schemes: NTRU and NTRU Prime. The two schemes are candidates in the ongoing NIST standardization process for post-quantum cryptography. We perform experimental validation of our proposed attacks on optimized implementations of NTRU-based schemes taken from the open-source pqm4 library, using the EM-based side-channel on the 32-bit ARM Cortex-M4 microcontroller. All our proposed attacks are capable of recovering the full secret key in only a few thousand chosen-ciphertext queries to the target device on all parameter sets of NTRU and NTRU Prime. Our attacks therefore stress the need for concrete protection strategies for NTRU-based KEMs.


Introduction
field, poses significant challenges in performing CCAs, both in a black box setting as well as in a side-channel setting.
Thus, there exists a significant gap in the theoretical understanding of how to mount CCAs on the newer variants of NTRU-based schemes. These aspects make it very interesting to develop side-channel assisted CCAs on NTRU-based schemes. Another pertinent question is whether, even if such attacks are possible, the cost of side-channel CCAs on NTRU-based schemes differs significantly from that of attacks on LWE/LWR-based schemes.
To address these critical questions, we propose here the first side-channel assisted CCAs on IND-CCA secure NTRU-based schemes. Our attacks are applicable to the IND-CCA secure NTRU and NTRU Prime KEMs, the NTRU-based candidates for PKE/KEMs in the NIST Post-Quantum Cryptography (PQC) standardization process. We attempt to traverse the landscape of side-channel assisted CCAs by demonstrating practical side-channel attacks that instantiate three different types of oracles, namely the PC oracle, the DF oracle, and the FD oracle, on all parameter sets of NTRU and NTRU Prime. Underlying the attacks is the key idea of building suitably chosen ciphertexts that are capable of instantiating the three types of oracles. The type of ciphertexts to be built is inspired by the work of Jaulmes and Joux [JJ00], who proposed the first CCA that works in a black-box setting on the original IND-CPA secure NTRU PKE scheme of Hoffstein et al. [HPS98]. In this work, we propose novel and generic adaptations of their attack to mount successful side-channel assisted CCAs on NTRU and NTRU Prime. Remarkably, all our proposed attacks require only a few thousand chosen-ciphertext queries to the target device for full key recovery, with a 100% success rate and no offline analysis. Our analysis is also backed by successful experimental validation on optimized implementations of the NTRU and NTRU Prime KEMs taken from the open-source pqm4 library [KRSS19], on the 32-bit ARM Cortex-M4 microcontroller using the Electromagnetic Emanation (EM) side-channel.

Contributions:
The main contributions of our work can be summarized as follows.
1. We demonstrate the first practical side-channel assisted chosen-ciphertext attacks on NTRU-based schemes. The attacks target two NTRU-based schemes, NTRU and NTRU Prime, which are final-round candidates in the ongoing NIST PQC standardization process. Until now, such attacks have only been demonstrated on LWE/LWR-based schemes. Our work is the first to investigate them on NTRU-based schemes.
to subvert the challenges posed by optimizations such as use of rounded ciphertexts in NTRU Prime and use of arbitrary-weight secrets in the NTRU-HRSS variant of NTRU, to perform successful key recovery.
5. We also demonstrate simple techniques that utilize side-channel leakage from the decapsulation procedure to realize a practical plaintext-checking oracle and a practical decryption-failure oracle, and combine them with our proposed novel CCAs for efficient key recovery attacks. Since these oracles only provide binary information, the side-channel analysis relies on simple techniques and can be performed with minimal knowledge about the target implementation.
6. We perform experimental validation of our attacks on optimized implementations of NTRU-based schemes taken from the open-source pqm4 library [KRSS19], using the EM-based side-channel on the 32-bit ARM Cortex-M4 microcontroller. All our proposed attacks are capable of recovering the full secret key in only a few thousand chosen ciphertext queries to the target device on all parameter sets of NTRU and NTRU Prime.

Availability of software
All software used for this work is placed into the public domain. It is available at https://github.com/SCACCAONNTRU/SCACCAONNTRU.

Organization of the Paper
This paper is organized as follows. Section 2 provides the necessary background by introducing the required notation and concepts as well as useful known results. Sections 3 and 4 present our proposed PC oracle-based attack. The discussion covers the attack routes on NTRU Prime and on NTRU, respectively. Sections 5 and 6 discuss our DF oracle-based and FD oracle-based SCAs, in that order. Section 7 discusses potential countermeasures against our proposed attacks. Section 8 concludes our paper.

Notation
We denote by Z/qZ or Z q , the ring of integers modulo an integer q, zero-centered in the range [−q/2, q/2 − 1] ∩ Z if q is even, or [−(q − 1)/2, (q − 1)/2] ∩ Z if q is odd. For brevity, we denote the threshold as q/2 throughout the paper, irrespective of q being even or odd. Let Z q [x]/(φ(x)) denote the polynomial ring whose reduction polynomial is φ(x). The ring elements are polynomials whose coefficients come from Z q . We use R q to denote a polynomial ring. Polynomials in R q are written in bold lower case letters. The i th coefficient of a polynomial a ∈ R q is denoted by a[i]. The multiplication of two polynomials a and b is denoted as c = a · b. A polynomial is small if its coefficients are in Z 3 := {−1, 0, 1}. A polynomial is of weight-w if exactly w of its coefficients are nonzero.
An element x ∈ R q which is sampled from a distribution D with standard deviation σ is denoted by x ← D σ (R q ). An array of bytes of an arbitrary length is denoted by B * . Byte arrays of length n are written as B n . The i th bit in an element x ∈ Z q is denoted by x i . The acquisition of a side-channel trace t corresponding to a particular operation X on an input p is denoted by t ⇐= X (p).

NTRU One-Way Function
Hoffstein, Pipher, and Silverman in 1998 [HPS98] proposed the N th order Truncated Polynomial Ring Unit (NTRU) public key encryption scheme. Its security relies on a conjectured circular security assumption, better known as the NTRU assumption or the NTRU One-Way Function, involving the factorization of polynomials in R q [HPS98].
The problem was shown to be reducible to a shortest vector problem (SVP) over a special class of lattices known as the NTRU lattices [CS97]. It is worth noting that the NTRU cryptosystem has survived cryptanalysis for almost 24 years now. This instills a lot of confidence in its security claims, despite the lack of provable security guarantees. Two candidate PKE/KEMs in the NIST PQC standardization process, namely the main finalist NTRU [CDH+19] and the alternate finalist NTRU Prime [BBC+20], are based on the paradigm of the NTRU cryptosystem. For clarity, we refer to the original NTRU PKE proposed in [HPS98] as NTRU-1998, whereas the finalists NTRU and NTRU Prime are referred to by their respective names throughout this paper.

NTRU Prime
NTRU Prime is a suite of two IND-CCA secure KEMs: Streamlined NTRU Prime and NTRU LPRime. The former is based on the NTRU paradigm. The latter is based upon the LPR Encrypt paradigm [BBC+20]. We focus on the Streamlined NTRU Prime variant and, henceforth, refer to it as NTRU Prime. At its core, it contains a perfectly correct and deterministic IND-CPA secure PKE. It is defined by three parameters (n, q, w), where n and q are prime numbers and w is a positive integer, subject to the restrictions 2n ≥ 3w and q ≥ 16w + 1, and such that x^n − x − 1 is irreducible in Z_q[x].
Unlike the NTRU-1998 PKE which operates in a cyclotomic ring (Z/qZ)[x]/(x n − 1) with n = 2 k , NTRU Prime operates in the field R q := Z q [x]/(x n −x−1), which is not cyclotomic. The choice is motivated by the need to protect against potential attacks that could exploit the cyclotomic structure in lattice-based schemes [KEF20].
Algorithm 1 describes the NTRU Prime PKE. The procedure GenSmall() takes in a seed ρ ∈ B * and samples for small polynomials in R 3 , whereas GenShort uses ρ ∈ B * to sample for small weight-w polynomials from the space denoted as R sh . The procedure Round rounds every coefficient of a given polynomial to its nearest multiple of 3.
The key generation procedure NTRU_Prime_PKE.KeyGen produces an NTRU instance h = g/(3f) ∈ R_q with g ∈ R_3 and f ∈ R_sh. The secret key is formed by f and g. The public key is h ∈ R_q. The encryption procedure NTRU_Prime_PKE.Encrypt takes as input the message polynomial r ∈ R_sh and generates a product-form NTRU instance c = Round(r · h) ∈ R_q as the ciphertext, whose coefficients are multiples of 3.
The decryption procedure NTRU_Prime_PKE.Decrypt takes the ciphertext c and first computes a = 3f · c ∈ R_q. The parameters are chosen to ensure that the true, that is the non-reduced, value of every coefficient a[i] for i ∈ [0, n − 1] always lies in the zero-centered range (−q/2, q/2]. A suitable choice for the parameters, leading to the a in Line 3, is key to the correctness of the decryption procedure. The resulting a is then reduced modulo 3 to yield e = g · r ∈ R_3. The latter, upon multiplication with ĝ ∈ R_3, results in b′. Subsequently, the weight of b′ is checked. If Weight(b′) = w, then r′ = b′ is the valid decryption output. Otherwise, the decryption output is fixed to be (1, 1, . . . , 1, 0, 0, . . . , 0) ∈ R_3.
Algorithm 1: Streamlined NTRU Prime PKE Core
The NTRU Prime PKE core is only IND-CPA secure and, hence, is susceptible to CCAs. The well-known Fujisaki-Okamoto (FO) transform [FO99] can convert it into an IND-CCA secure KEM. The transform instantiates NTRU_Prime_PKE.Encrypt, NTRU_Prime_PKE.Decrypt, and several instances of hash functions in the IND-CCA secure encapsulation and decapsulation procedures. Algorithm 2 supplies the details. In theory, the FO transform checks the validity of ciphertexts through a re-encryption procedure after decryption, in Line 5 of NTRU_Prime_KEM.Decaps. Thus, with very high probability, the attacker only sees decapsulation failures for invalid ciphertexts. This provides strong theoretical security guarantees against CCAs.
NTRU performs computations over two polynomial rings S k := Z k [x]/(φ n ) and T k := Z k [x]/(φ 1 φ n ). It offers parameter sets that fall into two broad categories, namely, NTRU-HPS and NTRU-HRSS. While they share several unified design choices, there are notable differences. NTRU-HPS selects coefficients from fixed-weight sample spaces, similar to the We refer the reader to [CDH + 19] for the respective details of both variants. Without loss of generality, we use the NTRU-HPS PKE to describe the procedures of the NTRU PKE core in Algorithm 3. The procedure Sample_fg() takes in a seed ρ ∈ B * and samples the secret polynomials f , g ∈ R p where p = 3. The key generation procedure NTRU_PKE.KeyGen produces an instance h = 3g/(f ) ∈ T q , with (f , g) forming the secret key and h ∈ T q forming the public key. We highlight here the change in position of the multiplier 3 in h compared to its position in NTRU Prime, where h = g/3f ∈ R q .
The encryption procedure NTRU_PKE.Encrypt takes a random r ∈ L_r and a message m ∈ L_m as input to generate the ciphertext c = h · r + Lift(m) ∈ T_q, as shown in Line 3. The decryption procedure NTRU_PKE.Decrypt uses the ciphertext c to compute a = f · c ∈ T_q in Line 7. Just like in NTRU Prime, the true value of every coefficient of a is in Z_q. This is the key to the perfect correctness of the NTRU PKE. Subsequently, a ∈ T_q is reduced modulo S_3 and multiplied with f_p to compute the message polynomial m′, which is then used to recover the random polynomial r′ in Lines 10 and 11. Line 12 says that the decryption procedure returns the polynomial pair (r′, m′) as the decryption output only if (r′, m′) ∈ (L_r × L_m). Otherwise, it returns the fixed value (1, 1). The decryption procedure also generates a single bit, denoted fail, which indicates success or failure of decryption: fail = 0 denotes success and fail = 1 denotes failure.

IND-CCA Secure NTRU KEM
Unlike NTRU Prime KEM and several other LWE/LWR-based KEMs, NTRU KEM achieves IND-CCA security without re-encryption, since the underlying NTRU PKE core achieves the Bernstein-Persichetti rigidity [BP18]. This makes the decapsulation procedure of NTRU among the fastest compared to other lattice-based KEMs. Algorithm 4 gives the encapsulation and decapsulation procedures of NTRU KEM. They instantiate NTRU_PKE.Encrypt and NTRU_PKE.Decrypt, respectively, along with several instances of hash functions.

Side-Channel assisted CCAs on LWE/LWR-based schemes
While IND-CCA secure KEMs are theoretically secure against CCAs, their security properties are only valid as long as an attacker is unable to obtain any information about the intermediate variables in the decapsulation procedure. Side-channel leakage that reveals sensitive information about any of the variables can lead to serious security flaws. The most severe outcome is a complete recovery of the secret key.
KEMs based on the LWE/LWR problem have been subjected to several side-channel assisted CCAs [DTVV19, RRCB20, GJN20]. Their modus operandi starts with the attacker constructing specially structured ciphertexts. When decrypted/decapsulated, the ciphertexts ensure that a certain intermediate variable, referred to as the anchor variable, bears a very close relation with a targeted portion of the secret key or, in the best scenario for the attacker, with the complete secret key. CCAs on IND-CPA secure LWE/LWR-based schemes have revealed the efficacy of specially constructed ciphertexts in turning the decrypted message into an anchor variable. Once the attacker recovers the value of the anchor variable for the chosen ciphertexts using side-channels, the full secret key can be recovered. Based on the type and amount of side-channel information available, we categorize the existing attacks on LWE/LWR-based schemes into the following three categories.

Plaintext-Checking Oracle-Based SCA
The attacker constructs chosen ciphertexts such that the anchor variable only assumes a very small number of possible values known to the attacker. Each possible value exclusively depends on a targeted portion of the secret key. An attacker who can utilize side-channels to retrieve the value of the anchor variable realizes an artificial Plaintext-Checking (PC) oracle. Its responses can then be used to recover the full secret key. For LWE/LWR-based schemes such as Kyber and Saber, the decrypted message for chosen ciphertexts can be restricted to two values. These are m = 0, on the occurrence of the all-zero bit string m 0 , and m = 1, on the occurrence of the string m 1 whose entries are all 0 except at the least significant bit, where the entry is 1. Side-channels such as timing and electromagnetic emanation have been shown to be efficiently exploited to realize a PC oracle in IND-CCA secure schemes whose binary responses m ∈ {0, 1} can recover the full secret key in a few thousand chosen-ciphertext queries to the target decapsulation device [DTVV19, RRCB20].

Decryption-Failure Oracle-Based SCA
The second class of attacks performs key recovery by exploiting side-channels to obtain information about decryption failures for the attacker's chosen ciphertexts. Crafted errors are added to a valid ciphertext to trigger decryption failures. Whether m = m_valid or m = m_invalid depends upon a targeted portion of the secret key. Similar to the PC oracle-based SCA, side-channels can detect decryption failures. This realizes a Decryption-Failure (DF) oracle whose responses can recover the full secret key. Guo, Johansson, and Nilsson [GJN20] exploited timing side-channel information from the non-constant-time ciphertext comparison in Frodo KEM to detect decryption failures. Subsequently, Bhasin et al. [BDH+21] exploited EM side-channel vulnerabilities in several masked ciphertext comparison approaches to realize a DF oracle in Kyber KEM. Both attacks were capable of performing full key recovery with several thousand chosen-ciphertext queries to the target device.
As can be seen, both the PC oracle-based and the DF oracle-based SCA extract only binary information about the anchor variable through side-channels. Thus, these attacks can be carried out with a relatively simple attack setup and do not pose stringent requirements on the Signal-to-Noise Ratio (SNR) for trace acquisition. Moreover, the analysis is also fairly simple and can be performed with very limited knowledge of the target implementation.

Full-Decryption Oracle-Based SCA
While the PC oracle and DF oracle attacks only extract binary information (1-bit) about the anchor variable through side-channel traces, they typically require a few thousand chosen-ciphertext queries to the target device for full key recovery, especially given the size of secrets used in lattice-based KEMs. This raises a natural question about the possibility of more efficient attacks with a more powerful oracle to gather more than just binary information about the decrypted message. In this direction, Xu et al. [XPRO20] showed that an attacker who can obtain a complete knowledge of the decrypted message m for chosen ciphertexts can effectively run the CCA in parallel mode, resulting in full key recovery in only a handful of traces/queries. They showed how to perform full key recovery using only 8 to 16 [NDGJ21]. Table 1 lists side-channel assisted CCAs on IND-CCA secure LWE/LWR-based schemes based on their oracle types.
While the above attacks work on IND-CCA secure LWE/LWR-based KEMs, they do not extend trivially to NTRU-based KEMs. This is because the underlying arithmetic of schemes based on the LWE/LWR paradigm is vastly different from that of schemes in the NTRU paradigm. Mounting similar side-channel attacks in a chosen-ciphertext setting on NTRU-based schemes has been an open problem. Even if a nontrivial extension of such attacks can be carried out, the comparative cost of attacking NTRU-based KEMs in a chosen-ciphertext setting was previously unknown. To address these two questions, we exhibit the first side-channel assisted CCAs on NTRU-based schemes. We demonstrate that our proposed attacks are practical, generic, and capable of exploiting all three different types of oracles.
Figure 1 gives a classification of the various side-channel assisted CCAs on lattice-based KEMs. Ours are highlighted in red.

CCAs on NTRU-based schemes
Given the existence of the NTRU cryptosystem for almost 24 years now, several CCAs have been proposed on different variants of the NTRU PKE cryptosystem. Jaulmes and Joux [JJ00] presented the first CCA on the unpadded version of the NTRU-1998 PKE. Their attack requires knowledge of the full decryption output (i.e., an FD oracle) and can recover the full secret key with a handful of ciphertexts. They also present an adaptation of their attack to the OAEP-like padding scheme, which works with only the knowledge of decryption failures (i.e., a DF oracle) for key recovery. Hoffstein and Silverman also presented CCAs using the DF oracle on the unpadded NTRU-1998 PKE [HS99].
Han et al. [HHHK03] subsequently presented very efficient CCAs based on the FD oracle on optimized variants of the unpadded NTRU-1998 PKE. Their attacks utilize chosen ciphertexts that are completely pre-computed offline, independent of the previous outputs. While the aforementioned attacks utilize invalid or maliciously crafted ciphertexts, another class of CCAs exploits decryption failures for valid ciphertexts [HGNP+03, GN07]. These attacks apply to variants of the NTRU cryptosystem with a non-negligible decryption failure rate. They are not relevant to the more recent variants, in particular the NIST PQC candidates NTRU and NTRU Prime, which are based on perfectly correct PKEs.
More recently, Ding et al. [DDSV19] presented a novel CCA on the NTRU-1998 PKE using the DF oracle. While this attack, with trivial modifications, can be adapted to the NTRU-HPS parameter sets of NTRU assuming a PC oracle, Zhang et al. [ZCQD21] showed that it does not work on the NTRU-HRSS variant of NTRU, due to the use of secrets with arbitrary weight. In [ZCQD21], they adapt the attack of Ding et al. to the NTRU-HRSS scheme, but the improved technique can only recover 93.6% of the keys. Thus, a CCA against the NTRU-HRSS scheme that works with a 100% success rate is not known. Moreover, to the best of our knowledge, no CCA on NTRU Prime is known. As we show later in the paper, mounting CCAs on NTRU Prime is especially challenging, given its use of rounded ciphertexts and conditional checks on the decrypted message.
In this work, we improve upon the CCA proposed by Jaulmes and Joux [JJ00] and propose generic and novel adaptations to the NIST PQC candidates NTRU and NTRU Prime. Remarkably, our proposed attacks can perform key recovery with a 100% success rate on all parameter sets of NTRU and NTRU Prime, assuming the presence of a suitable oracle. To the best of our knowledge, we therefore present the first CCAs on the IND-CPA secure PKEs of NTRU-HRSS and NTRU Prime that work with a 100% success rate. We also extend the same attacks to the side-channel setting, to propose the first side-channel assisted CCAs on the IND-CCA secure NTRU and NTRU Prime KEMs.

Test Vector Leakage Assessment
The Test Vector Leakage Assessment (TVLA) from [GJJR11] is a popular conformance-based methodology in side-channel analysis. It has been widely used in both academia and industry to evaluate cryptographic implementations. TVLA computes the univariate Welch's t-test over two given sets of side-channel measurements to identify their differentiating features. By testing the null hypothesis that the means of the two sets are identical, a PASS/FAIL decision is made. The TVLA formulation over measurement sets T_r and T_f is given by
$$\mathrm{TVLA} = \frac{\mu_r - \mu_f}{\sqrt{\dfrac{\sigma_r^2}{m_r} + \dfrac{\sigma_f^2}{m_f}}}, \tag{1}$$
where µ_r, σ_r, and m_r (resp. µ_f, σ_f, and m_f) are the mean, standard deviation, and cardinality of the trace set T_r (resp. T_f). The null hypothesis is rejected with a confidence of 99.9999% only if the absolute value of the t-test score is greater than 4.5 [GJJR11]. A rejected null hypothesis implies that the two trace sets are different and might leak some side-channel information; hence, it is considered a FAIL. The threshold was later shown to depend on the length of the side-channel trace [DZD+17]. We choose a threshold of 5 based on our experimental settings. While TVLA is mainly used as a metric for side-channel evaluation, it has also been used as a tool for feature selection in multiple cryptanalytic efforts [RJJ+18]. Here we use TVLA as a tool for feature selection from side-channel measurements [GLRP06].
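As a rough illustration of how the t-test can serve as a feature selector, the following minimal sketch computes a per-point Welch t-score for two sets of traces in plain Python. The function names `welch_t` and `tvla_scores`, the synthetic Gaussian traces, and the position of the leaking point are illustrative assumptions and are not taken from the experimental tooling of this work; only the threshold of 5 comes from the text.

```python
import math
import random
import statistics

def welch_t(sample_r, sample_f):
    """Welch's t-score between two samples of a single trace point."""
    mu_r, mu_f = statistics.mean(sample_r), statistics.mean(sample_f)
    var_r, var_f = statistics.variance(sample_r), statistics.variance(sample_f)
    return (mu_r - mu_f) / math.sqrt(var_r / len(sample_r) + var_f / len(sample_f))

def tvla_scores(traces_r, traces_f):
    """Per-point t-scores; each trace set is a list of equal-length traces."""
    columns = zip(zip(*traces_r), zip(*traces_f))
    return [welch_t(col_r, col_f) for col_r, col_f in columns]

# Synthetic data (assumption): 100 traces per set, 20 points per trace,
# Gaussian noise, and a mean shift at point 7 in the second set only.
rng = random.Random(0)
LEAKY_POINT, THRESHOLD = 7, 5.0
traces_r = [[rng.gauss(0.0, 0.2) for _ in range(20)] for _ in range(100)]
traces_f = [[rng.gauss(1.0 if p == LEAKY_POINT else 0.0, 0.2) for p in range(20)]
            for _ in range(100)]

scores = tvla_scores(traces_r, traces_f)
# Points whose |t| exceeds the threshold are candidate features.
features = [p for p, t in enumerate(scores) if abs(t) > THRESHOLD]
```

On real measurements the two sets would be traces captured for the two classes of inputs to be distinguished, and the selected points would then be used for classification.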

Plaintext-Checking Oracle-Based SCA
We primarily use NTRU Prime, instead of NTRU, to describe our PC-oracle attack. The former comes with complications that arise due to the use of rounded ciphertexts. Once we have described the attack on NTRU Prime, we adapt it to NTRU. Our attack works in two phases. We construct malicious ciphertexts and, subsequently, utilize side-channel information from the decryption of these malicious ciphertexts to perform key recovery.
1. Pre-Processing Phase: We search for a ciphertext that, when decrypted, leads to what we refer to as a single collision event. We query the decapsulation device with specially crafted ciphertexts and analyze their side-channel leakage to detect the event. Such a ciphertext is called a base ciphertext, denoted by c base . We use it to infer crucial information about the secret polynomials f and g.

2. Key Recovery Phase: We use the base ciphertext to construct new attack ciphertexts. They are built in such a way that, upon decryption, their corresponding internal variable e, in Line 4 of the NTRU_Prime_PKE.Decrypt procedure in Algorithm 1, can only belong to one of two exclusive classes, namely e = 0, or e ≠ 0 with a single nonzero coefficient. Moreover, the value of e depends on a targeted portion of the secret key. We exploit side-channel leakage from the operations that manipulate e to obtain information about its value and devise a practical PC oracle. The oracle's responses (e = 0 or e ≠ 0), obtained for several attack ciphertexts, are used to recover the full secret key.

Pre-Processing Phase: Retrieving the Base Ciphertext c base
Our construction of chosen ciphertexts is inspired by the attack of Jaulmes and Joux on NTRU-1998 in [JJ00]. We start with an intuition for the approach before proposing a concrete methodology. The notation used is from Algorithm 1 of NTRU Prime.

Intuition
We first analyze the effect of decrypting c = k + k · h, where k ∈ Z^+, by looking at a = 3f · c in Line 3 of the NTRU_Prime_PKE.Decrypt procedure. Since 3f · h = g in R_q, we have
$$\mathbf{a} = 3\mathbf{f} \cdot \mathbf{c} = 3k \cdot \mathbf{f} + k \cdot \mathbf{g}. \tag{2}$$
Each coefficient is a[j] = 3k · f[j] + k · g[j], whose magnitude reaches its maximum of 4k exactly when f[j] = g[j] = ±1. We say that f and g collide at such an index j. We now choose a suitable positive integer k, with 3 | k, based on the conditions
$$4k > q/2 \quad \text{and} \quad s \cdot k < q/2 \ \text{ for } s \in [0, 3]. \tag{3}$$
For the sake of explanation, let f and g collide only at the i th coefficient, with the value +1. Hence, a has the coefficients
$$a[j] = \begin{cases} 4k, & j = i, \\ s_j \cdot k \ \text{ with } s_j = 3f[j] + g[j],\ |s_j| \le 3, & j \ne i. \end{cases}$$
When a is reduced modulo q and zero-centered in (−q/2, q/2], all coefficients except a[i] retain their true value and remain multiples of 3. Only a[i] crosses the q/2 threshold, that is, a[i] > q/2, so the reduction modulo q subtracts the prime q from a[i]. More explicitly,
$$a[i] \bmod q = 4k - q.$$
Subsequently, e = a mod 3 ∈ R_3 is nothing but
$$e[j] = \begin{cases} (4k - q) \bmod 3 \ne 0, & j = i, \\ 0, & j \ne i, \end{cases}$$
where e[i] ≠ 0 because 3 | k while the prime q is not divisible by 3. The approach ensures that a[i] crosses the q/2 threshold only during a collision. When there is no collision, a[i] < q/2. Thus, for a choice of k as in Equation (3), e[i] ≠ 0 signifies a collision at i, while all other coefficients remain zero.
The same scenario applies when the collision value is −1. In that case, a[i] < −(q/2) and, hence, when q is added to a[i] to zero-center it in the range (−q/2, q/2], the corresponding e[i] ≠ 0, implying a collision at i. Henceforth, to avoid repetition, we focus only on collisions with the highest positive value of +1. The same analysis holds for the lowest negative value of −1.
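The intuition can be checked on a toy example. The sketch below evaluates a = 3k · f + k · g coefficient by coefficient, zero-centers it modulo q, and reduces it modulo 3. It is only an illustration under simplifying assumptions: the six-coefficient polynomials f and g, the value k = 600, and the helper names `center` and `e_from_secret` are hypothetical, and the sketch bypasses the actual ring multiplication and the ciphertext rounding of NTRU Prime.

```python
def center(x, mod):
    """Zero-centered representative of x modulo an odd modulus."""
    r = x % mod
    return r - mod if r > mod // 2 else r

def e_from_secret(f, g, k, q):
    """Coefficients of e = (a mod q) mod 3 for a = 3k*f + k*g."""
    a = [center(3 * k * fi + k * gi, q) for fi, gi in zip(f, g)]
    return [center(ai, 3) for ai in a]

q = 4591   # modulus of sntrup761
k = 600    # hypothetical choice: 3 | k, 4k > q/2, 3k < q/2

# Toy secrets (assumption): f and g collide only at index 0, with value +1.
f = [1, 0, -1, 1, 0, -1]
g = [1, 1, 0, -1, 0, 1]

e = e_from_secret(f, g, k, q)
collisions = [j for j, ej in enumerate(e) if ej != 0]
```

Only the colliding index crosses the q/2 threshold, so it is the only position at which e is nonzero; every other coefficient of a stays a multiple of k and therefore of 3.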
In our attack, it would be ideal to have a single collision between f and g, resulting in an e that has a single nonzero coefficient. For illustration, we use one particular parameter set of NTRU Prime, namely sntrup761 with (n, q, w) = (761, 4591, 286). We denote by ρ_single the probability of a single collision between f and g for sntrup761, and by ρ_match the probability of a collision at any given coefficient, that is, of f and g having a matching coefficient of either −1 or 1. For f ∈ R_sh and g ∈ R_3, we get ρ_match := w/(3n) and, hence, ρ_match ≈ 0.125 for sntrup761. The probability of a single collision between f and g is
$$\rho_{\mathrm{single}} = \binom{n}{1} \cdot \rho_{\mathrm{match}} \cdot (1 - \rho_{\mathrm{match}})^{n-1}.$$
This value is impractically low at 8 · 10^−43. We require better choices for the ciphertexts to limit the number of collisions and, thus, the number of nonzero coefficients in e.
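The order of magnitude of this probability can be reproduced with a few lines of Python. The sketch assumes that collisions at different coefficients are independent, so that the number of collisions follows a binomial distribution; the function name `single_collision_probability` is illustrative.

```python
from math import comb

def single_collision_probability(n, rho_match):
    """Probability of exactly one collision among n independent coefficients."""
    return comb(n, 1) * rho_match * (1 - rho_match) ** (n - 1)

n, w = 761, 286                  # sntrup761
rho_match = w / (3 * n)          # f[i] nonzero and g[i] equal to it
rho_single_exact = single_collision_probability(n, rho_match)
rho_single_rounded = single_collision_probability(n, 0.125)
```

Both variants are of the order of 10^-43. Using the rounded value 0.125 for the per-coefficient probability gives roughly the figure quoted in the text, while the unrounded value w/(3n) gives a slightly smaller one.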

Constructing Ciphertexts for Single Collision
We split the value of a in Equation (2) into
$$\mathbf{a} = 3k \cdot \mathbf{t}_1 + k \cdot \mathbf{t}_2,$$
where t_1 = f and t_2 = g. To limit the number of collisions between t_1 and t_2, we make a generic choice for c. This choice is
$$\mathbf{c} = k_1 \cdot \mathbf{d}_1 + k_2 \cdot \mathbf{d}_2 \cdot \mathbf{h}, \tag{8}$$
where d_1 and d_2 are polynomials with, respectively, m and n nonzero coefficients (±1). The corresponding a = 3f · c is given by
$$\mathbf{a} = 3k_1 \cdot \mathbf{t}_1 + k_2 \cdot \mathbf{t}_2, \tag{9}$$
where t_1 = d_1 · f and t_2 = d_2 · g. The product of a small polynomial d with x^i in R_q is a polynomial with all coefficients in {−2, −1, 0, 1, 2}. We denote the resulting product by Rotp_R(d, i) and refer to it informally as the rotation of d by i degrees. Thus,
$$\mathbf{t}_1 = \mathbf{d}_1 \cdot \mathbf{f} = \sum_{u=1}^{m} d_1[i_u] \cdot \mathrm{Rotp}_R(\mathbf{f}, i_u)$$
is the sum of rotations of f by varying degrees, governed by the positions {i_1, i_2, . . . , i_m} of the nonzero coefficients of d_1. Similarly, t_2 is the sum of rotations of g by the degrees in {j_1, j_2, . . . , j_n}. A collision occurs at index i only if all the corresponding coefficients of Rotp_R(f, u), for u ∈ {i_1, i_2, . . . , i_m}, and Rotp_R(g, v), for v ∈ {j_1, j_2, . . . , j_n}, are either +2 or −2. We observe that the probability of collisions quickly decreases as (m, n) increase.
For the choice of c in Equation (8), the maximum possible value for a[i] in Equation (9) is (3k_1 · 2m + k_2 · 2n), which is obtained upon a collision. We therefore choose (k_1, k_2) that satisfy three conditions:
$$3 \mid k_2, \qquad 3k_1 \cdot 2m + k_2 \cdot 2n > q/2, \qquad 3k_1 \cdot r + k_2 \cdot s < q/2 \ \text{ for } (r, s) \ne (2m, 2n), \tag{12}$$
with 0 ≤ r ≤ 2m and 0 ≤ s ≤ 2n. In other words, we choose (k_1, k_2) such that a[i] > q/2 only when there is a collision at i, while a[i] < q/2 otherwise. Thus, e[i] ≠ 0 for a collision at i and e[i] = 0 otherwise.
Summarizing the above discussion, we select values for (m, n) and (k_1, k_2) for our chosen ciphertexts in the form of Equation (8). The choice of (m, n) ensures that a single collision takes place with a high probability. Given (m, n), we then choose (k_1, k_2) satisfying the conditions in Equation (12), such that e[i] ≠ 0 indicates a collision at the i th coefficient. The concrete values for (m, n) and (k_1, k_2) can be fixed for a given parameter set of NTRU Prime.
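The rotation operation can be sketched as repeated multiplication by x in Z[x]/(x^n − x − 1), where x^n wraps around to x + 1. The code below is a minimal illustration over the integers, without reduction modulo q. The function names `rotp` and `sum_of_rotations` and the example polynomials are assumptions made for this sketch and are not identifiers from the reference implementation.

```python
import random

def rotp(d, i):
    """Multiply d (coefficients, lowest degree first) by x**i
    in Z[x]/(x**n - x - 1), where n = len(d) >= 2."""
    coeffs = list(d)
    for _ in range(i):
        top = coeffs[-1]                # coefficient that moves to x**n
        coeffs = [top] + coeffs[:-1]    # x**n contributes top * 1 ...
        coeffs[1] += top                # ... and top * x, since x**n = x + 1
    return coeffs

def sum_of_rotations(f, signed_degrees):
    """Sum of sign * Rotp(f, degree) over (sign, degree) pairs."""
    total = [0] * len(f)
    for sign, degree in signed_degrees:
        for j, c in enumerate(rotp(f, degree)):
            total[j] += sign * c
    return total

# Toy check with n = 5: x * (1 + x^4) = x + x^5 = 1 + 2x.
example = rotp([1, 0, 0, 0, 1], 1)

# Random small polynomial (assumption): every rotation by fewer than n
# degrees keeps all coefficients in {-2, ..., 2}.
rng = random.Random(1)
small = [rng.choice([-1, 0, 1]) for _ in range(31)]
max_abs = max(abs(c) for i in range(31) for c in rotp(small, i))
```

A coefficient of ±2 in a rotation arises only where two wrapped-around terms of equal sign meet, which is why requiring ±2 at the same index in several rotations at once makes collisions rare.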

Additional Challenge: Use of Rounded Ciphertexts
The encryption procedure of NTRU Prime generates ciphertexts whose coefficients are rounded to exact multiples of 3 (Line 3 of the NTRU_Prime_PKE.Encrypt procedure). The scheme therefore sends only the quotient of each coefficient upon division by 3, thereby reducing the ciphertext size, and the decryption procedure multiplies every coefficient of the received ciphertext by 3. However, our chosen ciphertexts (Equation (8)) are not exact multiples of 3 and thus need to be rounded, which introduces a rounding noise polynomial m ∈ R_3. Thus, the actual value of our chosen ciphertext used in decryption is given by
$$\mathbf{c} = k_1 \cdot \mathbf{d}_1 + k_2 \cdot \mathbf{d}_2 \cdot \mathbf{h} + \mathbf{m}.$$
The corresponding a = 3f · c is
$$\mathbf{a} = 3k_1 \cdot \mathbf{d}_1 \cdot \mathbf{f} + k_2 \cdot \mathbf{d}_2 \cdot \mathbf{g} + 3\mathbf{f} \cdot \mathbf{m} = \mathbf{s} + \mathbf{n},$$
where s := (3k_1 · d_1 · f + k_2 · d_2 · g) is the signal component while n := 3f · m is the noise component. Since m ∈ R_3 and f ∈ R_sh are small polynomials, the noise is much smaller than the range q.
For the parameter set sntrup761, Figure 3 shows the distribution of the coefficients n[j], for j ∈ [0, n − 1], of the noise polynomial n. It is Gaussian with mean 0 and σ ≈ 57, which is much less than q = 4591. The noise polynomial n = 3f · m is a multiple of 3 and vanishes when a is reduced modulo 3. However, when n is added to coefficients of a near q/2, the noise is capable of giving rise to a false positive or a false negative collision.
For a given choice of (m, n) and (k_1, k_2), the largest possible value of a coefficient of a is denoted by m_1 := (3k_1 · 2m + k_2 · 2n). The next largest value is denoted by m_2. As stated in Equation (12), we choose values for (k_1, k_2) such that m_1 > q/2 and m_2 < q/2. Let dm_1 (resp. dm_2) denote the distance of m_1 (resp. m_2) from q/2, where dm_1 = (3k_1 · 2m + k_2 · 2n) − q/2 and dm_2 = q/2 − m_2.
Though the rounding noise n cannot be removed, the possibility of a false positive or negative collision can be minimized by placing additional constraints on the choice of the tuple (k_1, k_2). Along with the constraints for (k_1, k_2) in Equation (12), we choose the tuple that maximizes the distance dm_1 (resp. dm_2) for m_1 (resp. m_2). This prevents the noise coefficient n[j] from growing large enough to push a[j] to the other side of q/2, which is when an error occurs in the value of e. In other words, m_1 and m_2 should lie as far as possible on either side of the threshold q/2. This additional constraint in the choice of (k_1, k_2) simply maximizes the distance tuple (dm_1, dm_2).
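To make the origin of the rounding noise concrete, the following small sketch rounds integer coefficients to their nearest multiple of 3 and records the difference. The helper names `round3` and `rounding_noise` and the sample coefficients are illustrative assumptions and do not mirror the Round procedure of any particular implementation.

```python
def round3(x):
    """Round an integer to its nearest multiple of 3."""
    return 3 * ((x + 1) // 3)

def rounding_noise(coeffs):
    """Coefficient-wise noise Round(c) - c, always in {-1, 0, 1}."""
    return [round3(c) - c for c in coeffs]

# Hypothetical coefficients of an unrounded chosen ciphertext.
noise = rounding_noise([4, 5, 6, -1, -2])
```

Since every integer lies within distance 1 of a multiple of 3, the noise polynomial is small, which is what keeps the term 3f · m well below q.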
For sntrup761 of NTRU Prime, we empirically choose (m, n) = (0, 4) and (k_1, k_2) = (0, 306). We stress that other values can also be chosen to construct ciphertexts corresponding to a single collision. Table 2 lists the concrete values of (m, n), (k_1, k_2) and the corresponding distance tuple (dm_1, dm_2) for different parameter sets of NTRU Prime. These values can be chosen beforehand for any given parameter set.
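The threshold arithmetic for the sntrup761 tuple quoted above can be checked with a short Python sketch. It evaluates m_1, m_2 and the distance tuple in the noise-free case, and then shows why crossing q/2 changes e: the centered reduction subtracts q, and q is not a multiple of 3. The helper names `centered_mod` and `coefficient` are illustrative and not taken from any implementation; the model of e as the centered residue of a reduced modulo 3 is an assumption about the decryption procedure.

```python
# Threshold-crossing arithmetic for sntrup761 with the tuple quoted in the text.
Q = 4591            # modulus q of sntrup761
M, N = 0, 4         # number of nonzero coefficients in d_1 and d_2
K1, K2 = 0, 306     # scaling factors (both multiples of 3)


def centered_mod(value: int, modulus: int) -> int:
    """Reduce value modulo modulus into the range centered around zero."""
    r = value % modulus
    return r - modulus if r > modulus // 2 else r


def coefficient(r: int, s: int) -> int:
    """Noise-free coefficient of a when d_1*f contributes r and d_2*g contributes s."""
    return 3 * K1 * r + K2 * s


# Largest value (full collision) and the largest value without a full collision.
m1 = coefficient(2 * M, 2 * N)
m2 = max(
    coefficient(r, s)
    for r in range(2 * M + 1)
    for s in range(2 * N + 1)
    if (r, s) != (2 * M, 2 * N)
)
dm1 = m1 - Q / 2
dm2 = Q / 2 - m2

# Assumed model: e is the centered residue of a, reduced modulo 3.
e_collision = centered_mod(m1, Q) % 3
e_no_collision = centered_mod(m2, Q) % 3

print(m1, m2, dm1, dm2, e_collision, e_no_collision)
```

Under these assumptions both distances are roughly 2.7σ for the σ ≈ 57 noise reported in Figure 3, which illustrates why a noise coefficient only rarely pushes a[j] across the threshold.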

Detecting Collision through Side-Channels
Given (m, n) and (k_1, k_2), we randomly select polynomials d_1 and d_2 in Equation (8) until we arrive at a ciphertext c' whose e has a single nonzero coefficient. Since e is an internal variable, it is not possible to obtain information about its value classically. Hence, we use side-channel leakage to identify e ≠ 0. This leads to a classification problem with two classes, namely e = 0 and e ≠ 0. For e = 0, Line 5 of the NTRU_Prime_PKE.Decrypt procedure implies b' = e · ĝ = 0 and, hence, Weight(b') = w_b = 0. For e ≠ 0 with a single nonzero coefficient, however, b' ≠ 0 with uniformly random coefficients in {−1, 0, 1} and, hence, w_b ≠ 0. Although the exact value depends on the secret polynomial g, the average value of w_b is 500 for sntrup761. Such a large weight difference between the two classes should be easily distinguishable through the EM side-channel. The same applies to other parameter sets of NTRU Prime.
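The weight gap between the two classes can be reproduced with a small simulation. The sketch below multiplies a stand-in for ĝ by a monomial x^i in R_3 = Z_3[x]/(x^p − x − 1) and counts nonzero coefficients. The polynomial `g_hat` is drawn uniformly at random, which is an assumption made purely for illustration (the real ĝ is derived from the secret key), and the index 123 is arbitrary.

```python
import random

P = 761  # ring degree of sntrup761; R_3 = Z_3[x] / (x^P - x - 1)


def mul_by_x(poly):
    """Multiply by x in Z_3[x]/(x^P - x - 1), using x^P = x + 1."""
    top = poly[-1]
    shifted = [0] + poly[:-1]
    shifted[0] = (shifted[0] + top) % 3
    shifted[1] = (shifted[1] + top) % 3
    return shifted


def times_monomial(poly, i, sign=1):
    """Return sign * x^i * poly in R_3 (coefficients stored as 0, 1, 2)."""
    out = [(sign * c) % 3 for c in poly]
    for _ in range(i):
        out = mul_by_x(out)
    return out


def weight(poly):
    """Number of nonzero coefficients."""
    return sum(1 for c in poly if c % 3 != 0)


rng = random.Random(1)
# Stand-in for the secret-dependent polynomial g-hat: uniform over Z_3 (an assumption).
g_hat = [rng.randrange(3) for _ in range(P)]

w_class_o = weight([0] * P)                      # e = 0      gives b' = 0
w_class_x = weight(times_monomial(g_hat, 123))   # e = x^123  gives b' = x^123 * g_hat
print(w_class_o, w_class_x)
```

For a uniformly random stand-in, the weight in the second class concentrates around 2p/3 ≈ 507, in line with the average of about 500 quoted above, while the first class always has weight 0.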
In our experiments, we ran the optimized implementation of sntrup761 from the open-source pqm4 library [KRSS] on the STM32F4DISCOVERY board (DUT), which houses the STM32F407 ARM Cortex-M4 microcontroller. The implementation, compiled with the options -O3 -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16, was clocked at the maximum clock frequency of 168 MHz. EM measurements were observed from the DUT using a near-field probe and processed using a Lecroy HD6104 oscilloscope at a sampling rate of 500 MSam/s. Figure 4 shows our experimental setup for EM trace acquisition. We adopt Welch's t-test to detect a collision for a chosen ciphertext.

Welch's t-test for Collision Detection:
Due to the large difference in weights, we focus on capturing EM signals from the weight calculation operation in Line 6 of the decryption procedure NTRU_Prime_PKE.Decrypt. We first obtain T replicated measurements from the decryption of c = 0, which corresponds to e = 0. The trace set is denoted by T_O.
To test whether a given ciphertext c' results in a collision, we similarly obtain T replicated measurements from the decryption of c', which form the trace set T_X. We now perform Welch's t-test between T_O and T_X.
• We normalize each trace t_i ∈ T_O ∪ T_X by removing its mean and dividing by its standard deviation to obtain t'_i.
• We compute Welch's t-test between the normalized traces in T_O and T_X based on Equation (1). If there are several peaks well above the t-test threshold of ±5, then e ≠ 0 for c'. Otherwise, e = 0. Figure 5(a) depicts the t-test plot for a ciphertext c' with e = 0 on T = 10 replicated measurements. As can be clearly seen, there are no significant peaks above the t-test threshold of ±5. A few points may border the threshold or marginally exceed it. We examined the internal registers and the control flow to identify any change in behaviour that could result in t-test values bordering the threshold, but were unable to identify any discernible change in the state of the device. Thus, those points with bordering t-test values can be safely ignored. Figure 5(b) corresponds to e ≠ 0. We can identify several peaks well above the threshold, which clearly indicates e ≠ 0.
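The detection step can be sketched in Python as follows. The sketch assumes that Equation (1) is the standard point-wise Welch statistic and that the traces have already been normalized as in the first step; the synthetic Gaussian traces, the leaking sample positions, and the `MIN_PEAKS` reading of "several peaks" are all illustrative assumptions rather than values from the experiments.

```python
import math
import random
from statistics import mean, variance

T_THRESHOLD = 5.0   # pass/fail threshold used in the text
MIN_PEAKS = 3       # hypothetical reading of "several peaks"


def welch_t(set_o, set_x):
    """Point-wise Welch t-statistic between two lists of equal-length traces."""
    t_values = []
    for col_o, col_x in zip(zip(*set_o), zip(*set_x)):
        se = math.sqrt(variance(col_o) / len(col_o) + variance(col_x) / len(col_x))
        t_values.append((mean(col_o) - mean(col_x)) / se)
    return t_values


def collision_detected(set_o, set_x):
    """Report e != 0 when enough points exceed the threshold in absolute value."""
    peaks = sum(1 for t in welch_t(set_o, set_x) if abs(t) > T_THRESHOLD)
    return peaks >= MIN_PEAKS


def synthetic_traces(rng, count, length, leak_points=(), shift=0.0):
    """Gaussian noise traces with an optional offset at the leaking points."""
    return [
        [rng.gauss(0.0, 1.0) + (shift if j in leak_points else 0.0) for j in range(length)]
        for _ in range(count)
    ]


rng = random.Random(7)
reference = synthetic_traces(rng, 10, 64)        # stands in for c = 0, so e = 0
no_collision = synthetic_traces(rng, 10, 64)     # candidate ciphertext with e = 0
collision = synthetic_traces(rng, 10, 64, leak_points=range(20, 28), shift=6.0)

print(collision_detected(reference, no_collision))
print(collision_detected(reference, collision))
```

Requiring several peaks instead of a single one mirrors the observation above that isolated points bordering the threshold can be ignored.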
Leakage detection for the identification of e ≠ 0 does not assume any knowledge about the implementation of the decapsulation procedure. In the worst case, the attacker only needs to know the location of the targeted operations within the decryption procedure, but as shown in previous works [ACLZ20,NDGJ21], it is also possible to identify operations within the decapsulation procedure through visual inspection.
We repeat this test for different choices of (d_1, d_2) until we obtain one for which e ≠ 0, indicating a possible collision. There is a chance that this collision, instead of being a valid one, is a false positive. Moreover, our technique only realizes a binary oracle that can distinguish between e = 0 and e ≠ 0. Thus, we do not know the number of nonzero coefficients in e. If we identify a tuple (d_1, d_2) that corresponds to e ≠ 0, we simply proceed to the key recovery phase of the attack. For a faulty base ciphertext with a single false collision or multiple collisions, key recovery cannot be performed correctly. Thus, we simply need to repeat the attack until the correct key is recovered.
We denote the ciphertext corresponding to e ≠ 0 as c_base. For analytical purposes, we assume that e for c_base has a single nonzero coefficient at index i. We use (d1_att, d2_att) to denote the tuple (d_1, d_2), with m and n nonzero coefficients, respectively, that corresponds to c_base. The ciphertext c_base is, therefore,

$$c_{base} = k_1 \cdot d1_{att} + k_2 \cdot d2_{att} \cdot h.$$

Upon retrieval of c_base, we proceed to the second phase of the attack, which is the key recovery phase.

Key Recovery Phase
Attack Overview: The key recovery phase works by constructing new attack ciphertexts using (d1_att, d2_att), which when decrypted result in only two possible values for e: (1) e = 0 and (2) e ≠ 0 with e[i] ≠ 0, where i is the index of the single collision. The value of e depends on the value of a targeted coefficient of f. This binary information, obtained using side-channels over several chosen ciphertexts, leads to a complete recovery of f one coefficient at a time.

Attack Methodology
We build, using (d1_att, d2_att), the ciphertext

$$c_{att} = \ell_1 \cdot d1_{att} + \ell_2 \cdot d2_{att} \cdot h + \ell_3 \cdot x^u = c'_{base} + \ell_3 \cdot x^u, \tag{17}$$

where ℓ_1, ℓ_2, ℓ_3 ∈ Z^+, u ∈ [0, n − 1], and c'_base = ℓ_1 · d1_att + ℓ_2 · d2_att · h. Let the error introduced due to rounding be m ∈ R_3. Thus, a = 3f · c_att is given by

$$a = 3\ell_1 \cdot d1_{att} \cdot f + \ell_2 \cdot d2_{att} \cdot g + 3\ell_3 \cdot x^u \cdot f + n,$$

where n is the noise term 3f · m. Please note that this noise term n is different from the noise term of the base ciphertext c_base. For the sake of explanation, we assume that d1_att and d2_att collide at i with a value of +2. Thus, the coefficient of a at the colliding index i can be expressed as

$$a[i] = 3\ell_1 \cdot 2m + \ell_2 \cdot 2n + 3\ell_3 \cdot \mathrm{Rotp}_R(f, u)[i] + n[i].$$

In particular, given a constant δ := 3ℓ_1 · 2m + ℓ_2 · 2n + n[i], we can represent the coefficient of a at the colliding index i as

$$a[i] = \delta + 3\ell_3 \cdot \beta_u, \quad \beta_u := \mathrm{Rotp}_R(f, u)[i].$$

Thus, a[i] is linearly dependent on β_u.
Based on the rotational property of polynomial multiplication mod (x^n − x − 1) in Equation (10), β_u is determined by the coefficients of f selected by the rotation index u. By simply changing u, we can make a[i], the coefficient at the colliding index i, depend on different coefficients of the secret polynomial f. For a given u, the five values in {−2, −1, 0, 1, 2} are possible candidates for β_u. Our task is, therefore, to select values for (ℓ_1, ℓ_2, ℓ_3) such that the occurrence of a[i] > q/2, and therefore e[i] ≠ 0, acts as a binary distinguisher capable of identifying every candidate for β_u. To distinguish β_u = +2, for example, we choose integers ℓ_1, ℓ_2, ℓ_3, all multiples of 3, that satisfy the condition

$$3\ell_1 \cdot r + \ell_2 \cdot s + 3\ell_3 \cdot \beta_u \begin{cases} > q/2, & \text{if } r = 2m,\ s = 2n, \text{ and } \beta_u = 2, \\ < q/2, & \text{otherwise.} \end{cases} \tag{21}$$

Effect of Rounding Error: Some rounding error n is present on a. Adopting a similar strategy to the one in Section 3.1.3, we select (ℓ_1, ℓ_2, ℓ_3) that minimize the possibility of a false positive or a false negative in the collision. For distinguishing β_u = 2, the tuple must satisfy Equation (21). At the colliding index when β_u = 2, the largest possible coefficient of a is m_1 := 3ℓ_1 · 2m + ℓ_2 · 2n + 3ℓ_3 · 2 > q/2. Let the second largest value be m_2 < q/2 and the distance between m_1 (resp. m_2) and q/2 be dm_1 (resp. dm_2). The values for (ℓ_1, ℓ_2, ℓ_3) should be chosen so as to maximize the distances dm_1 and dm_2, where

$$dm_1 = (3\ell_1 \cdot 2m + \ell_2 \cdot 2n + 3\ell_3 \cdot 2) - q/2 \quad \text{and} \quad dm_2 = q/2 - \max_{(r,s,t) \neq (2m,2n,2)} (3\ell_1 \cdot r + \ell_2 \cdot s + 3\ell_3 \cdot t).$$

In other words, we should give enough leeway to ensure that the possible error n[i] does not push a[i] to the other side of q/2. The same must be done for all choices of (ℓ_1, ℓ_2, ℓ_3) that are used to distinguish every candidate for β_u. Similar to the tuples (m, n) and (k_1, k_2) in Subsection 3.2, the tuple (ℓ_1, ℓ_2, ℓ_3) can be chosen ahead of time and fixed for a given parameter set of NTRU Prime.

[Table 3: Decision table for sntrup761. Rows: candidate secret coefficients β_u; columns: the tuples (ℓ_1, ℓ_2, ℓ_3) = (0, 279, 42), (0, 237, 84), (0, 279, −42), (0, 237, −84); each entry is either O (e = 0) or X (e ≠ 0).]

Table 3 is the decision table for the sntrup761 parameter set. It shows unique distinguishability for every candidate β_u ∈ {−2, −1, 0, 1, 2}, based on O or X for chosen ciphertexts constructed using concrete values for (ℓ_1, ℓ_2, ℓ_3), assuming a collision with a value of +2. The responses for β_u = +1 (resp. +2) can be swapped with β_u = −1 (resp. −2) if the collision value is −2. Every candidate for β_u = Rotp_R(f, u)[i] can be uniquely identified based on the information about O or X from at most four chosen-ciphertext queries. We note that certain candidates such as +1 and +2 only require 2 queries to be identified (going from left to right), 0 can be uniquely identified in 3 queries, while −1 and −2 require all 4 queries. Thus, we can adopt such a greedy approach to identify the value of β_u in a more optimized manner. Table 2 supplies the concrete values of the tuple (ℓ_1, ℓ_2, ℓ_3) and the corresponding distance tuple (dm_1, dm_2), chosen for our attack on different parameter sets of NTRU Prime. We use the notation (ℓ_x1, ℓ_x2, ℓ_x3) to denote the tuple used to distinguish x ∈ {1, 2}. While these are the specific parameters we used for our attack, we would like to emphasize that several other values for (ℓ_1, ℓ_2, ℓ_3) can be chosen to construct attack ciphertexts for key recovery.
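The greedy decoding described above can be illustrated with a short Python sketch. It recomputes the O/X response for each of the four sntrup761 tuples from the noise-free value of a[i] at the colliding index (collision value +2, with (m, n) = (0, 4)) and then narrows down the candidates query by query. The rows are derived from that formula under these assumptions and are not copied from Table 3; the function names are illustrative.

```python
Q = 4591                     # modulus of sntrup761
M, N = 0, 4                  # nonzero coefficients in d1_att and d2_att
TUPLES = [(0, 279, 42), (0, 237, 84), (0, 279, -42), (0, 237, -84)]


def response(l1, l2, l3, beta):
    """'X' if the noise-free a[i] at the colliding index exceeds q/2, else 'O'."""
    a_i = 3 * l1 * 2 * M + l2 * 2 * N + 3 * l3 * beta
    return "X" if a_i > Q / 2 else "O"


# One row of oracle responses per candidate beta_u, queried left to right.
TABLE = {
    beta: "".join(response(*t, beta) for t in TUPLES)
    for beta in (2, 1, 0, -1, -2)
}


def recover_beta(oracle):
    """Greedy decoding: stop as soon as the responses single out one candidate."""
    candidates = set(TABLE)
    for k, tup in enumerate(TUPLES):
        answer = oracle(tup)
        candidates = {b for b in candidates if TABLE[b][k] == answer}
        if len(candidates) == 1:
            return candidates.pop(), k + 1
    raise ValueError("responses do not match any row of the decision table")


for beta, row in TABLE.items():
    print(beta, row)
```

In this noise-free model, +1 and +2 are resolved after two queries, 0 after three, and −1 and −2 after four, which matches the query counts stated above. A response pattern that matches no row raises an error, which is one way to detect a faulty base ciphertext early.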
Since e is an internal variable, we use side-channel information to distinguish between the classes O and X. As seen in Subsection 3.1.4, we used Welch's t-test to identify whether e ≠ 0 in order to retrieve the base ciphertext c_base. The peaks in the t-test plot above the pass/fail threshold of ±5 in Figure 5(b) are precisely the features that identify e ≠ 0. In the following discussion, we demonstrate techniques to leverage the identified features in the t-test plot to build templates for the two classes O and X. The templates will then be used to classify a given single trace into either of the two classes.

Classification using Reduced Templates
We select features of the t-test plot between T_O (e = 0) and T_X (e ≠ 0) whose absolute t-test value is greater than a certain chosen threshold Th_sel as our set P of Points of Interest (PoI). A reduced trace set is constructed from T_O or T_X by keeping only the points in P. We choose a greater threshold than ±5 for better distinguishability. For the t-test results in Figure 5, we set ±7 as the larger threshold. This threshold is a parameter of the attack setup. We subsequently calculate the respective means m_{O,P} and m_{X,P} of the reduced trace sets to use as the reduced templates for each class.
A single trace t for classification is normalized as t' = t − t̄ (mean removal) and restricted to the points in P to obtain a reduced trace t'_P. The sum-of-squared difference Γ_* of the trace is computed with each reduced template:

$$\Gamma_* = (t'_P - m_{*,P})^T \cdot (t'_P - m_{*,P}), \quad * \in \{O, X\}.$$

The trace t falls into the class that corresponds to the least sum-of-squared difference. A single power/EM trace of the targeted operation is sufficient to distinguish between X and O. Thus, single side-channel traces from the decryption of chosen ciphertexts constructed according to Equation (17) can recover β_u = Rotp_R(f, u)[i]. Figure 6 visualizes the matching of a section of the reduced trace tr with the reduced templates of the respective classes O and X. There is a clear distinguishability between the reduced templates of the two classes, leading to a classification with 100% success rate.
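A minimal Python sketch of the reduced-template classification follows. It assumes that normalization means removing the per-trace mean and that the templates are point-wise means over the selected points of interest; the toy traces and the t-values are made up for illustration and do not come from the measurements.

```python
from statistics import mean


def select_poi(t_values, th_sel=7.0):
    """Indices whose absolute t-value exceeds the selection threshold."""
    return [j for j, t in enumerate(t_values) if abs(t) > th_sel]


def center(trace):
    """Remove the mean of a single trace."""
    mu = mean(trace)
    return [x - mu for x in trace]


def reduced_template(traces, poi):
    """Mean of the centered traces, restricted to the points of interest."""
    centered = [center(t) for t in traces]
    return [mean([t[j] for t in centered]) for j in poi]


def classify(trace, poi, template_o, template_x):
    """Return 'O' or 'X', whichever template has the smaller squared distance."""
    centered = center(trace)
    reduced = [centered[j] for j in poi]
    gamma_o = sum((a - b) ** 2 for a, b in zip(reduced, template_o))
    gamma_x = sum((a - b) ** 2 for a, b in zip(reduced, template_x))
    return "O" if gamma_o <= gamma_x else "X"


# Toy data: 6-sample traces that differ only at samples 2 and 3.
set_o = [[0.0, 0.1, 0.0, 0.1, 0.0, 0.1], [0.1, 0.0, 0.1, 0.0, 0.1, 0.0]]
set_x = [[0.0, 0.1, 2.0, 2.1, 0.0, 0.1], [0.1, 0.0, 2.1, 2.0, 0.1, 0.0]]
t_values = [0.2, -0.4, -31.0, -29.5, 0.1, 0.3]   # hypothetical t-test output

poi = select_poi(t_values)
tmpl_o = reduced_template(set_o, poi)
tmpl_x = reduced_template(set_x, poi)
print(poi, classify([0.05, 0.05, 1.9, 2.2, 0.0, 0.1], poi, tmpl_o, tmpl_x))
```

Restricting the comparison to the points of interest keeps the distance computation cheap and discards samples that carry no class information.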

Recovering the Full Secret Key
We have thus demonstrated recovery of a single coefficient β_u = Rotp_R(f, u)[i]. By simply changing the rotation index u, we can recover Rotp_R(f, u)[i] for all u ∈ [0, n − 1]. However, recovering the exact value of the secret polynomial f requires knowledge of (1) the colliding index i and (2) the collision value (either +2 or −2), neither of which can be inferred through side-channels using our technique. Thus, we need to try all n possible colliding indices i ∈ [0, n − 1] as well as the two possible collision values ±2. This amounts to 2n choices for f. For sntrup761, 2n = 1522. For each choice, we compute the secret key f, check whether f ∈ R_sh, and also attempt to decrypt known ciphertexts. We empirically verified for all parameter sets that the search space is reduced drastically to only a handful of possibilities (≈ 10), up to a certain rotation of f. It is possible that none of the guessed f turns out to be correct. This could be due to two reasons. Firstly, the rounding noise n within the attack ciphertexts c_att could be large enough to induce errors in e, which in turn results in erroneous oracle responses. Secondly, the chosen base ciphertext c_base may have multiple collisions, which again results in erroneous oracle responses. In these cases, we simply reject the current (d1_att, d2_att) and search for a new pair before repeating the attack until the correct f is recovered.
We observe that failed iterations of the attack significantly impact the attack's cost, with respect to the number of traces for key recovery. In this respect, we can adopt a few optimization approaches to reduce the impact of failed iterations, particularly in the key recovery phase. If the side-channel oracle's responses do not match the expected responses in the decision table, then the key recovery phase can be immediately aborted, to restart the pre-processing phase for a fresh base ciphertext. Similarly, if the recovered values of β u for u ∈ [0, n − 1] appear to be very skewed and do not follow the expected distribution, here again the key recovery phase can be immediately aborted, to restart the attack. We summarize the attack flow of our PC oracle-based SCA on NTRU Prime in Figure 7.

Experimental Results
We implemented our proposed PC oracle-based SCA on the optimized implementation of sntrup761 from the pqm4 library [KRSS]. The pre-processing phase to identify c_base took, on average, 39 attempts. The number of attempts, denoted as A, also includes failed attack iterations. Each attempt requires the capture of N = 10 traces to carry out Welch's t-test for leakage detection. Thus, it takes A · N ≈ 390 traces to identify c_base, which is denoted as t_base. The subsequent attack phase requires up to 4 chosen-ciphertext queries, i.e., up to 4 traces, to recover one coefficient. The secret polynomial f contains n = 761 coefficients and we denote the traces required in the attack phase as t_attack. Thus, we require t_total = t_base + t_attack ≈ 3269 traces for complete recovery of f. Our attack works with a success rate of about 100% with no remaining brute force or offline analysis.
We also successfully verified our attack methodology using a simulated PC oracle on other parameter sets of NTRU Prime. Table 4 gives the estimated trace complexity of our attack for different parameter sets of NTRU Prime, where the numbers are estimated with N = 10 for the pre-processing phase. We can see that 4700 traces are enough for full key recovery across all parameter sets of NTRU Prime.

PC Oracle-based SCA on NTRU
In this section, we adapt our PC oracle-based SCA on NTRU Prime KEM to NTRU KEM. The notation used is from the IND-CPA secure NTRU PKE described in Algorithm 3. Since our attack applies in the same manner to both NTRU-HPS and NTRU-HRSS, we primarily use NTRU-HPS for description of the attack, but also provide details on those aspects that differ for NTRU-HRSS, wherever necessary.

Pre-Processing Phase
Let k_1, k_2 ∈ Z^+. We construct the chosen ciphertext c as

$$c = k_1 \cdot d_1 + k_2 \cdot d_2 \cdot h, \tag{23}$$

where 3 | k_i, for i ∈ {1, 2}, and d_1 and d_2 are polynomials with, respectively, m and n nonzero coefficients taking the value of +1. The corresponding a = f · c ∈ T_q in Line 7 of the NTRU_PKE.Decrypt procedure is given by

$$a = k_1 \cdot d_1 \cdot f + 3k_2 \cdot d_2 \cdot g = k_1 \cdot t_1 + 3k_2 \cdot t_2, \tag{24}$$

where t_1 = d_1 · f and t_2 = d_2 · g ∈ T_q. The polynomial t_1 (resp. t_2) in the cyclotomic ring T = Z[x]/(x^n − 1) is the sum of exact rotations of the secret polynomial f (resp. g) by varying degrees, that is,

$$t_1 = \sum_{u} x^u \cdot f \quad \text{and} \quad t_2 = \sum_{v} x^v \cdot g$$

for u ∈ {i_1, i_2, ..., i_m} (resp. v ∈ {j_1, j_2, ..., j_n}). Thus, a collision at index i occurs when all the corresponding coefficients of the rotations of f and g have a value of +1 or −1. We choose (m, n) to maximize the probability of a single collision and then proceed to choose (k_1, k_2) such that a collision at index i results in a[i] > q/2 while keeping a[i] < q/2 when there is no collision. From Equation (24), we observe that the absolute maximum value of a coefficient of a upon collision is a[i] = k_1 · m + 3k_2 · n. Thus, we choose (k_1, k_2) such that, for 0 ≤ r ≤ m and 0 ≤ s ≤ n,

$$k_1 \cdot r + 3k_2 \cdot s \begin{cases} > q/2, & \text{if } (r, s) = (m, n), \\ < q/2, & \text{otherwise.} \end{cases} \tag{25}$$

While the above analysis applies to NTRU-HPS, NTRU-HRSS uses g = g' · φ_1 with the coefficients of g' coming from {−1, 0, 1}. Hence, the coefficients of the secret polynomial g are elements in {−2, −1, 0, 1, 2}. Thus, the absolute maximum value possible for a coefficient of a is a[i] = k_1 · m + 3k_2 · 2n, and Equation (25) can be adapted accordingly.

Additional Challenge: Ciphertext Compression
Similar to the use of rounded ciphertexts in NTRU Prime to reduce ciphertext size, NTRU also adopts compression by exploiting an inherent property of valid ciphertexts. The decryption procedure of NTRU expects valid ciphertexts to be a multiple of φ_1 modulo q.
In other words, the sum of coefficients of a valid ciphertext is expected to be 0 modulo q, which can also be seen from the conditional check in line 3 of the NTRU_PKE.Decrypt procedure. Thus, the scheme proposes to send only the first n − 1 coefficients of c, while the last coefficient c[n − 1] is computed within the decryption procedure as

$$c[n-1] = -\sum_{i=0}^{n-2} c[i] \bmod q.$$

However, c constructed according to Equation (23) is inherently not a multiple of φ_1 modulo q, but it can be adapted to satisfy this requirement. We slightly modify c as

$$c = k_1 \cdot d_1 + k_2 \cdot (x^{j_1} + x^{j_2} + x^{j_3} + \ldots + x^{j_n}) \cdot h = k_1 \cdot d_1 + k_2 \cdot d_2 \cdot h, \tag{27}$$

where 2 | m, i.e., m is even, and the polynomial d_1 has an equal number of positive and negative nonzero coefficients, i.e., d_1 has m/2 coefficients with a value of +1 and m/2 coefficients with a value of −1. This ensures that the sum of coefficients of c is 0. This is not required for d_2 since h is already a multiple of φ_1, and thus the product d_2 · h in c is a multiple of φ_1. Thus, a ciphertext c according to Equation (27) is processed without any errors in the decryption procedure. Unlike the chosen ciphertexts for NTRU Prime, which inherently contain rounding error, chosen ciphertexts for NTRU do not contain any error, which significantly simplifies our attack on NTRU. This applies to both the NTRU-HPS and NTRU-HRSS variants of NTRU. Table 5 lists the concrete values of the tuples (m, n) and (k_1, k_2) used to construct chosen ciphertexts for a single collision, for different parameter sets of NTRU.
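The following Python sketch illustrates the φ_1 constraint on a toy instance. It builds a stand-in public key as an arbitrary multiple of x − 1 in Z_q[x]/(x^n − 1), forms a chosen ciphertext with a balanced d_1, and checks that the coefficient sum vanishes modulo q, so that the last coefficient can be recomputed from the first n − 1. The ring degree, the scaling factors k1 and k2, and the index sets are hypothetical and are not taken from Table 5.

```python
import random

N_RING = 11   # toy ring degree (hypothetical; real parameter sets use e.g. n = 677)
Q = 2048      # modulus of ntruhps2048677


def cyclic_mul(a, b):
    """Multiply two polynomials in Z_q[x]/(x^n - 1)."""
    out = [0] * N_RING
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % N_RING] = (out[(i + j) % N_RING] + ai * bj) % Q
    return out


def sparse(indices_plus, indices_minus=()):
    """Polynomial with +1 at indices_plus and -1 at indices_minus."""
    poly = [0] * N_RING
    for i in indices_plus:
        poly[i] = 1
    for i in indices_minus:
        poly[i] = -1
    return poly


rng = random.Random(3)
# Stand-in public key: any multiple of phi_1 = x - 1 (not a real NTRU key).
x_minus_1 = [-1, 1] + [0] * (N_RING - 2)
h = cyclic_mul(x_minus_1, [rng.randrange(Q) for _ in range(N_RING)])

k1, k2 = 3, 6                       # hypothetical multiples of 3
d1 = sparse([0, 4], [2, 7])         # m = 4: two coefficients +1, two coefficients -1
d2 = sparse([1, 5, 8])              # n = 3: all coefficients +1

d2h = cyclic_mul(d2, h)
c = [(k1 * a + k2 * b) % Q for a, b in zip(d1, d2h)]

# Only the first n - 1 coefficients are sent; the last one is recomputed on receipt.
c_last = (-sum(c[:-1])) % Q
print(sum(c) % Q, c_last == c[-1])
```

If d_1 had only +1 coefficients, the coefficient sum would equal k1 · m modulo q instead of 0, and the recomputed last coefficient would no longer match the intended one.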

Detecting Collision through Side-Channels
Given chosen tuples (m, n) and (k_1, k_2), we construct several chosen ciphertexts c based on Equation (27) until we identify a c_base whose e ≠ 0. This is identified through side-channel leakage, in a similar manner to our attack on NTRU Prime. If e = 0, then m = e · f_p = 0 (Line 9 of the NTRU_PKE.Decrypt procedure). However, if e = ±x^i, m contains uniformly random coefficients in {−1, 0, 1}. This large difference in the value of m can be easily identified through side-channels, thereby distinguishing between the two classes: (1) e = 0 (Class O) and (2) e ≠ 0 with e[i] ≠ 0 (Class X).
We performed practical experiments on the ntruhps2048677 parameter set of NTRU. Side-channel measurements were acquired from the same target platform and experimental setup described in Section 3.1.4, and the Welch's t-test based approach was used to identify leakage corresponding to e ≠ 0. Figure 8(a) depicts the t-test plots for several ciphertexts c' whose e = 0. As can be seen, there are no significant peaks above the threshold, which indicates e = 0. However, Figure 8(b) corresponds to e ≠ 0 for c', where we can clearly identify several peaks well above the threshold, thereby indicating e ≠ 0.
The identified ciphertext is denoted as c_base and its corresponding polynomial tuple (d_1, d_2) according to Equation (27) is denoted as (d1_att, d2_att), which is subsequently used to create the attack ciphertexts for key recovery. Since we can only differentiate between e = 0 and e ≠ 0, it is possible that e contains multiple nonzero coefficients. In such a case, key recovery cannot be performed correctly and thus the search for c_base has to be repeated until the correct key is recovered.

Key Recovery Phase
We build, using (d1_att, d2_att), the attack ciphertext

$$c_{att} = \ell_1 \cdot d1_{att} + \ell_2 \cdot d2_{att} \cdot h + \ell_3 \cdot (x - 1) \cdot x^u,$$

where ℓ_1, ℓ_2, ℓ_3 ∈ Z^+ and u ∈ [0, n − 1]. The term ℓ_3 · (x − 1) · x^u is used to ensure that c_att is a multiple of φ_1 modulo q. Let Rotp_T(f, j) denote the product of f with x^j in the ring T. If we assume that d1_att and d2_att collide at i with a value of +1, then

$$a[i] = (\ell_1 \cdot d1_{att} \cdot f + 3\ell_2 \cdot d2_{att} \cdot g)[i] + \ell_3 \cdot (\mathrm{Rotp}_T(f, u+1)[i] - \mathrm{Rotp}_T(f, u)[i]). \tag{29}$$

If we denote 3ℓ_1 · 2m + 3ℓ_2 · 2n as a constant δ, then we can represent the coefficient at the colliding index a[i] as

$$a[i] = \delta + \ell_3 \cdot \beta_u, \quad \beta_u := \mathrm{Rotp}_T(f, u+1)[i] - \mathrm{Rotp}_T(f, u)[i].$$

The tuple (ℓ_1, ℓ_2, ℓ_3) is chosen in the same manner as in Section 3.2.1, barring the additional constraints placed to deal with the rounding error, as chosen ciphertexts of NTRU are devoid of rounding error. Table 6 is the decision table for the ntruhps2048677 parameter set, which demonstrates unique distinguishability for every candidate β_u ∈ {−2, −1, 0, 1, 2}, based on O or X.
Every candidate for β_u can be uniquely identified in no more than four chosen-ciphertext queries. Table 5 gives the concrete (ℓ_1, ℓ_2, ℓ_3) values for different parameter sets of NTRU. We write (ℓ_x1, ℓ_x2, ℓ_x3) to denote the tuple used to distinguish x ∈ {1, 2}.

Classification using Reduced Templates
As shown in Section 3.2.2, side-channel leakage from the decryption of the attack ciphertext c att can be used to classify a given ciphertext as either O/X, thereby realizing a PC oracle. We use the distinguishing features of the t-test plot in Figure 8 to construct reduced templates for both classes O and X. We then use the templates for classification using a simple LSQ test. Figure 9 visualizes the matching of a section of the attack trace with the reduced templates of the respective classes O and X. Here again, we are able to observe clear distinguishability between the two classes and we experimentally obtained 100% success rate in classification, thereby demonstrating high accuracy of the realized PC oracle.

Recovering the Full Secret Key
Thus, we can use the realized PC oracle to uniquely recover the value of β_u in up to four traces, thereby obtaining information about two coefficients of f. The same can be repeated for all indices u ∈ [0, n − 1] to build a well-defined linear system, which can be trivially solved to recover all n coefficients of f. We can only recover the secret up to a rotation of i indices, i.e., f' = f · x^i. The attacker does not know the collision index i; however, the multiplication of f by x^i in the ring T_q only rotates the coefficients of f and does not change their values. Moreover, since decryption involves both multiplication and division by f, the rotated secret f' can also be used to decrypt any message encrypted with the secret polynomial f. As stated earlier, the secret key might not be recovered correctly if there are multiple colliding indices in the base ciphertext c_base. Thus, we can simply repeat the attack until the complete key is recovered correctly.
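One possible way to solve the resulting system is sketched below in Python on a toy ring. It assumes a perfect oracle that returns β_u as the difference of two neighbouring coefficients of the rotated secret, chains these differences starting from each of the three ternary values, and keeps the chains that stay ternary. Because the n differences sum to zero around the cycle, the ternary constraint is what pins down the starting value in this sketch. The ring degree, the secret, and the function names are illustrative only.

```python
import random

N_RING = 17   # toy ring degree (hypothetical)


def rotate(f, u):
    """Coefficients of x^u * f in Z[x]/(x^n - 1)."""
    n = len(f)
    return [f[(j - u) % n] for j in range(n)]


def oracle_betas(f, i):
    """beta_u = (x^(u+1) * f)[i] - (x^u * f)[i], as a perfect PC oracle would reveal."""
    n = len(f)
    return [rotate(f, u + 1)[i] - rotate(f, u)[i] for u in range(n)]


def solve_up_to_rotation(betas):
    """Chain the differences from each ternary start value; keep ternary solutions.

    Each returned list g satisfies g[u] = f[(i - u) mod n] for the unknown index i.
    """
    solutions = []
    for start in (-1, 0, 1):
        g = [start]
        for beta in betas[:-1]:
            g.append(g[-1] + beta)
        if all(c in (-1, 0, 1) for c in g):
            solutions.append(g)
    return solutions


rng = random.Random(5)
# Toy ternary secret that is forced to contain both +1 and -1.
f = [rng.choice((-1, 0, 1)) for _ in range(N_RING - 2)] + [1, -1]
i = 6                                    # collision index, unknown to the attacker
betas = oracle_betas(f, i)
recovered = solve_up_to_rotation(betas)
print(len(recovered))
```

The recovered sequence is a rotated (and index-reversed) copy of f, which reflects the statement above that the secret is obtained only up to a rotation by the unknown collision index.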

Experimental Results
We implemented our attack on the optimized implementation of ntruhps2048677 from the pqm4 library [KRSS]. The pre-processing phase to retrieve c_base required, on average, only 10 attempts. For N = 10 replicated measurements, the trace complexity t_base of the pre-processing phase is ≈ 100 traces. The subsequent attack phase requires up to 4 chosen-ciphertext queries (4 traces) to recover one coefficient. There are n = 677 coefficients. Altogether, including failed attack iterations, the complete secret polynomial f can be recovered in t_total ≈ 2364 traces. Our attack works with a success rate of about 100%, with no remaining brute force or offline analysis.
We also successfully verified our attack using a simulated PC oracle on the remaining parameter sets of NTRU. Table 7 presents the estimated trace complexity of our attack for different parameter sets of NTRU, where the numbers are estimated with N = 10 for the pre-processing phase. We can see that 2900 traces are enough for full key recovery across all parameter sets of NTRU, thereby demonstrating the effectiveness of our attack.
As can be seen, the attack complexity for NTRU is lower than that for NTRU Prime by a factor of ≈ 1.5 for similar dimensions of the secret polynomial f. This can be mainly attributed to the absence of rounding noise in NTRU, which simplifies the attack analysis and allows for more relaxed choices of the attack parameters. This reduces both the number of attempts needed to identify the base ciphertext in the pre-processing phase and the number of failed iterations of the attack.

Comparison with PC Oracle-based SCA on LWE/LWR-based schemes
We identify a few subtle but critical differences between our proposed PC oracle-based SCA on NTRU-based schemes and similar attacks on LWE/LWR-based schemes. The main difference lies in the anchor variable, whose value is carefully controlled through the chosen ciphertexts for key recovery. As seen from our attacks on NTRU and NTRU Prime, the internal variable e within the decryption procedure serves as the anchor variable. However, the underlying arithmetic of LWE/LWR-based schemes is such that it is possible to exercise direct control over the output of the decryption procedure, i.e., the decrypted message m, through chosen ciphertexts; this message serves as the anchor variable for key recovery.
Another differing aspect is the ability to control the value of the anchor variable. While our proposed attacks can restrict e to two classes, e = 0 (Class O) and e = ±1 · x^i (Class X), the value of e in class X cannot be controlled. Thus, the pre-processing phase of our attack involves a search for a base ciphertext whose e = ±1 · x^i. The attacker can neither control nor know the colliding index i, since it depends upon the secret key.
However, for LWE/LWR-based schemes, the two classes are fixed (i.e.) m = 0 (Class O) and m = 1 (Class X), irrespective of the secret key. It is possible to build attack ciphertexts to exactly restrict m to either 0 or 1. Since the decrypted message m is the anchor variable, an attacker can also easily build ciphertexts for m = 0 and m = 1 to build side-channel templates. Thus, the search for a base ciphertext is not necessary, which heavily simplifies the PC oracle-based SCA on LWE/LWR-based schemes.
Though the attack seems to be more involved for NTRU-based schemes, we do not observe a significant difference in the attacker's cost (trace complexity) to perform full key recovery. For comparison, we use experimental results from the work of Ravi et al. [RRCB20], who demonstrated PC oracle-based SCA on LWE/LWR-based schemes using the same target platform and attack setup. Their attack on the Kyber512 parameter set of Kyber required about 7700 traces for full key recovery (dimension n = 512 with coefficients in {−2, −1, 0, 1, 2}). However, this count corresponds to three attack iterations, used to improve the success rate through majority voting. A single attack iteration takes about 2560 traces, and thus the trace complexity of our proposed attack is comparable to that of the attack on LWE/LWR-based schemes.

Limitations of the PC Oracle-Based SCA
Our proposed PC oracle-based SCA can perform full key recovery on all parameter sets of NTRU KEM and NTRU Prime KEM. However, we observe that side-channel leakage from only a few operations within the decryption procedure can be used to obtain information about the anchor variable e for key recovery. Thus, the attacker has a narrow scope to obtain side-channel leakage to instantiate a PC oracle for key recovery. This is particularly true for NTRU Prime KEM, and we refer to the decryption procedure of NTRU Prime, i.e., the NTRU_Prime_PKE.Decrypt procedure in Algorithm 1. The attack ciphertexts result in either e = 0 or e = ±1 · x^i. If e = 0, then b' = 0 by line 5. If e = ±1 · x^i, then b' has uniformly random coefficients in {−1, 0, 1} and its exact value depends upon the secret polynomial g. However, in both cases, the weight of b' is not equal to w, which is a requirement to be satisfied by the decrypted message. Thus, by line 10, the decryption procedure only returns a fixed value of (1, 1, . . . , 1, 0, 0, . . . , 0) for all the attack ciphertexts.
The effect of the anchor variable e for the attack ciphertexts, does not propagate beyond the decryption procedure. Thus, the PC oracle attack can only be carried out using side-channel information from operations that manipulate e and other dependent variables within the decryption procedure. This restricts an attacker from utilizing side-channel information from operations performed after decryption. These operations take place within the re-encryption procedure from line 5 of KEM.Decaps in Algorithm 2.
In the following section, we improve upon the PC oracle-based SCA by proposing a novel DF oracle-based SCA. The improved attack widens the scope of the attacker, to obtain side-channel leakage from several other operations within the decapsulation procedure, which aids in key recovery.

Decryption-Failure Oracle-Based SCA
We start by providing some intuition for the decryption-failure (DF) oracle attack. We demonstrate our attack on NTRU Prime only since the same approach extends trivially to NTRU. The main idea is to carefully perturb valid ciphertexts, followed by observing the effect of perturbation on the decrypted message. The perturbations are similar to those used for the PC oracle-based attack. Let c valid be a valid ciphertext whose anchor variable e is denoted as e valid . Let c be an element in a set of specially crafted ciphertexts. These are similar to those used for the PC oracle-based attack. Upon decryption of c , the corresponding e can only have two possible values, namely, e = 0 and e = ±1 · x i . We simply add the perturbation ciphertext c to the valid ciphertext c valid , to obtain a perturbed ciphertext c pert . Perturbing c valid in this manner, in turn, perturbs e valid so that the corresponding e pert for c pert admits two possible values, namely, e pert = e valid (the class O) or e valid with a single coefficient error at i (i.e.) e pert = e valid ± 1 · x i , denoted as e invalid (the class X).
Decryption never fails for the class of valid ciphertexts. The decryption procedure returns r valid . For the second class of ciphertexts, however, there is a single coefficient error in the anchor variable e, with e invalid = e valid ± 1 · x i . This triggers a decryption failure and, hence, r invalid := (1, 1, . . . , 1, 0, 0, . . . , 0) is returned as the decrypted message. Here, the perturbed ciphertext c pert restricts the decrypted message r to two possibilities, namely, r valid and r invalid . There, the decrypted message always takes the form of r invalid . The success or failure of decryption for the perturbed ciphertexts depends upon a targeted portion of the secret key. Thus, an attacker who can obtain information about the decryption outcomes through a Decryption-Failure (DF) oracle can fully recover the secret key. In effect, we have ensured that the effect of the anchor variable e propagates to the decrypted message r , while this was not the case with the PC oracle-based attack where the decrypted message always takes the form of r invalid . Figure 10 illustrates the attack targeting leakage from the decryption procedure. But, we can also rely on leakage from the re-encryption procedure to instantiate the DF oracle.
A decryption failure can be identified through side-channel leakage from two sets of operations. The first consists of operations that manipulate the anchor variable $e$. The second includes operations that manipulate the decrypted message $r$ in the re-encryption procedure. Thus, an attacker enjoys a wider scope for obtaining side-channel information toward key recovery, since several operations in the decapsulation procedure, including the re-encryption operation, can be targeted.
Remark 1. We observe that the DF oracle-based attack works with information about the decrypted message r . This can be used to perform key recovery over the IND-CPA secure NTRU Prime PKE, even without the requirement of side-channels. Thus, our proposed DF oracle-based attack on NTRU Prime is also the first theoretical chosen-ciphertext attack against the IND-CPA secure NTRU Prime PKE.
Similar to the PC oracle-based SCA, our DF oracle-based attack also works in two phases, namely the pre-processing phase and the key recovery phase.

Pre-Processing Phase
As in Line 3 of the NTRU_PRIME_PKE.Encrypt procedure in Algorithm 1, we construct a valid ciphertext $c_{valid} = \mathrm{Round}(h \cdot r)$. Its corresponding $a_{valid} = 3f \cdot c_{valid}$ is

$$a_{valid} = 3f \cdot (h \cdot r + m) = g \cdot r + 3f \cdot m,$$

where $m$ is the rounding error. We then construct perturbations using the methodology that was used to build the ciphertexts to obtain a single collision for NTRU Prime in Section 3.1.2. Such a perturbation $c'$ is given by

$$c' = k_1 \cdot d_1 + k_2 \cdot d_2 \cdot h \tag{32}$$

with $3 \mid k_1$, $3 \mid k_2$, and $d_1$ and $d_2$ having, respectively, $m$ and $n$ nonzero coefficients +1. The corresponding $a' = 3f \cdot c'$ is

$$a' = 3k_1 \cdot d_1 \cdot f + k_2 \cdot d_2 \cdot g.$$

We now use $c'$ to perturb $c_{valid}$ as

$$c_{pert} = \mathrm{Round}(h \cdot r + c') = \mathrm{Round}(h \cdot r + k_1 \cdot d_1 + k_2 \cdot d_2 \cdot h) = h \cdot r + k_1 \cdot d_1 + k_2 \cdot d_2 \cdot h + m, \tag{34}$$

where $m$ is again the rounding error, now that of $c_{pert}$. Upon decrypting $c_{pert}$, we express $a_{pert} = 3f \cdot c_{pert}$ as

$$a_{pert} = g \cdot r + 3k_1 \cdot d_1 \cdot f + k_2 \cdot d_2 \cdot g + 3f \cdot m.$$

Thus, $a_{pert} \approx a_{valid} + a'$. Let $(k_1 \cdot d_1 \cdot 3f + k_2 \cdot d_2 \cdot g)$ be the signal component $s$ of $a_{pert}$. The noise $n$ comprises the rounding noise $3f \cdot m$ together with $g \cdot r$, written as $gr$, from $a_{valid}$. For simplicity, we denote the variables corresponding to the perturbed ciphertext $c_{pert}$, that is, $a_{pert}$ and $e_{pert}$, as $a$ and $e$ respectively. To induce a decryption failure, that is, to perturb a single coefficient of $e = a \bmod 3$, we need a single coefficient of $a$, say $a[i]$, to be greater than $q/2$. This is achieved by choosing $(m, n)$ for the polynomials $(d_1, d_2)$ in Equation (32) to maximize the probability of a single collision. If there is a collision at $i$, then $s[i]$ should be large enough to push $a[i]$ beyond the $q/2$ threshold. The coefficient $s[i]$ at the colliding index is $m_1 := (3k_1 \cdot 2m + k_2 \cdot 2n)$. Let $m_2$ denote the next largest possible value. We thus choose $(k_1, k_2)$ such that $m_1 > q/2$ and $m_2 < q/2$.
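The effect of crossing the q/2 threshold can be checked numerically. The sketch below assumes the usual centered representation of coefficients modulo q and uses q = 4591, the modulus of sntrup761; the helper names are illustrative. Since q ≡ 1 (mod 3), a coefficient that wraps around changes its residue modulo 3 by one, which is exactly a single-coefficient error in e.

```python
Q = 4591  # modulus of sntrup761

def centered(x, modulus):
    """Representative of x mod modulus in [-(modulus-1)/2, (modulus-1)/2] (odd modulus)."""
    x %= modulus
    return x - modulus if x > modulus // 2 else x

def anchor_coeff(a_int):
    """Coefficient of e seen by decryption: (a_int mod q, centered), then mod 3."""
    return centered(a_int, Q) % 3

below = Q // 2        # largest integer value that does not wrap around
above = Q // 2 + 1    # smallest integer value that wraps to a negative residue

# The intended value is a_int mod 3; compare it with what decryption computes.
error_below = (anchor_coeff(below) - below % 3) % 3
error_above = (anchor_coeff(above) - above % 3) % 3
```

Here `error_below` is zero, while `error_above` is nonzero: the wrap-around subtracts q from the coefficient and therefore shifts its value modulo 3.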
The noise component n = g · r + 3f · m in a, however, contributes to crossovers near the q/2 threshold for coefficients of a, resulting in false positives and false negatives in decryption failures. For sntrup761, the distribution of n is Gaussian with mean 0 and a slightly larger standard deviation of σ ≈ 53 than σ = 50 for n in the PC oracle-based attack. Though the increase is insignificant, we will soon see in Section 5.3 that the noise term gr in n is also present as a constant bias in the attack ciphertexts used for key recovery, along with the inherent rounding error. This additional bias poses challenges, and thus we require a slightly different approach to construct chosen ciphertexts.
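Under the Gaussian model stated above, the probability that the noise alone moves a coefficient across a given margin follows from the Gaussian tail. The sketch below evaluates this tail with the complementary error function; the margins are illustrative values chosen as multiples of σ ≈ 53, not values taken from the attack.

```python
import math

def tail_prob(margin, sigma=53.0):
    """P(N > margin) for N drawn from a zero-mean Gaussian with std. dev. sigma."""
    return 0.5 * math.erfc(margin / (sigma * math.sqrt(2.0)))

# Illustrative margins of 1, 2 and 3 standard deviations.
for margin in (53, 106, 159):
    print(margin, tail_prob(margin))
```

The tail falls off quickly with the margin, which is why keeping the non-colliding coefficients well below q/2 suppresses false positives.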

Additional Challenge: Dealing with the Bias gr
To negate the effect of $gr$, we slightly modify the constraints for choosing $(k_1, k_2)$ so as to obtain a single collision at an index $i$ where the corresponding coefficient $gr[i]$ has a large absolute value (positive or negative). We choose $(k_1, k_2)$ such that

$$\frac{q}{2} - \delta_1 < m_1 < \frac{q}{2} - \delta_2 \tag{36}$$

with $\delta_1, \delta_2 > 0$. This way, even if there is a collision at $i$, we keep $s[i] = m_1 < q/2$, in the range $[(q/2 - \delta_1), (q/2 - \delta_2)]$. Such a constraint for $(k_1, k_2)$ gives us several advantages. The main advantage is that $a[i] > q/2$ only when two conditions, namely a collision at $i$ and $n[i] > \delta_2$, hold simultaneously. By requiring the noise coefficient to be large, we increase the chance that $gr[i]$ is also large at the colliding index. Instead of simply identifying a collision at any index, we thus increase the chance of a collision at an index where $gr[i]$ has a large value. Hence, even if $a'[i] = m_1$ is close to $q/2$, it gets pushed further away from $q/2$ by $gr[i]$. This has a positive influence on key recovery, as it decreases the chance of a false negative for decryption failure in the key recovery phase.
With $m_1$ chosen to be smaller than $q/2$ by a gap $d_{m_1}$ satisfying $\delta_2 < d_{m_1} < \delta_1$, there is leeway to increase the corresponding gap $d_{m_2}$ for $m_2$. This reduces the chances of false positives at the other indices $j \neq i$ where no collisions occur. Thus, our modified constraints for choosing $(k_1, k_2)$ according to Equation (36) offer several advantages in reducing both the false positives and the false negatives for decryption failures, which aids key recovery.
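The selection of the scaling factors can be sketched as a small search. The code below enumerates pairs of multiples of 3 and keeps those whose colliding coefficient m1 = 3·k1·2m + k2·2n lies between two positive margins below q/2. The parameter names `delta_hi` and `delta_lo`, the search bound `k_max`, and all numeric values are hypothetical; the additional condition on the next largest value m2 is not modeled here.

```python
def candidate_scalings(q, m, n, delta_hi, delta_lo, k_max):
    """Pairs (k1, k2), both multiples of 3, with
    q/2 - delta_hi < m1 < q/2 - delta_lo, where m1 = 3*k1*2*m + k2*2*n."""
    found = []
    for k1 in range(3, k_max + 1, 3):
        for k2 in range(3, k_max + 1, 3):
            m1 = 3 * k1 * 2 * m + k2 * 2 * n
            if q / 2 - delta_hi < m1 < q / 2 - delta_lo:
                found.append((k1, k2, m1))
    return found

# Hypothetical inputs: q of sntrup761, toy weights and margins.
pairs = candidate_scalings(q=4591, m=2, n=2, delta_hi=400, delta_lo=100, k_max=300)
```

Any pair returned by this search keeps the colliding coefficient just below q/2, so that only a sufficiently large noise coefficient at the colliding index pushes it over the threshold.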
We choose $(m, n)$ and $(k_1, k_2)$ based on the aforementioned constraints to identify ciphertexts with a single collision. Table 8 presents the concrete values chosen for our attack on all specified parameter sets of NTRU Prime. Given $(m, n)$ and $(k_1, k_2)$, we randomly select $d_1$ and $d_2$ to construct perturbations $c'$ based on Equation (32). The aim is to identify a perturbation which, when added to a valid ciphertext $c_{valid}$, induces a single coefficient error in the corresponding variable $e$, that is, $e_{invalid} = e_{valid} \pm x^i$. This results in a decryption failure, yielding $r_{invalid}$.

Detecting Decryption Failure through Side-Channels
Decryption failures can be identified either by obtaining information about the anchor variable e within the decryption procedure or the decrypted message r used in the re-encryption procedure, through side-channels. We can therefore utilize side-channel leakage from two sources. The first source consists of operations that manipulate e within the decryption procedure in Lines 5 to 6 of NTRU_PRIME_PKE.Decrypt in Algorithm 1. Operations within the re-encryption procedure in Line 5 of NTRU_Prime_KEM.Decaps in Algorithm 2 form the second source.
We performed practical experiments on sntrup761. Measurements were acquired from the same target platform and experimental setup used to perform the PC oracle-based attack. In particular, we obtained side-channel leakage from the encoding of the decrypted message $r$ just after the decryption procedure. Other operations within the re-encryption can also be used to infer information about $r$. We used Welch's t-test, described in Section 3.1.4, to identify leakage that differentiates $r_{invalid}$ (decryption failure) from $r_{valid}$ (decryption success). Figure 11(a) depicts the t-test plot for several perturbed ciphertexts $c_{pert}$ whose decryption does not fail, that is, no error is induced. One sees no significant peaks above the threshold, which indicates $r = r_{valid}$. Figure 11(b), in contrast, shows the t-test plot when the decryption fails for the perturbed ciphertext. One can clearly identify several peaks well above the threshold, indicating $r = r_{invalid}$.
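As an illustration of the test used here, the following sketch computes the pointwise Welch's t-statistic between two small sets of traces. The trace values are synthetic, and the threshold of 4.5 is the value commonly used in leakage assessment; it is assumed here rather than taken from the experiments.

```python
import math
from statistics import mean, variance

def welch_t(traces_a, traces_b):
    """Pointwise Welch's t-statistic between two sets of equal-length traces."""
    t_values = []
    for samples_a, samples_b in zip(zip(*traces_a), zip(*traces_b)):
        var_term = (variance(samples_a) / len(samples_a)
                    + variance(samples_b) / len(samples_b))
        t_values.append((mean(samples_a) - mean(samples_b)) / math.sqrt(var_term))
    return t_values

# Synthetic two-sample traces: only the second sample point differs between sets.
success = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.9]]
failure = [[1.1, 9.0], [1.0, 9.2], [1.05, 8.9]]

THRESHOLD = 4.5  # assumed leakage-detection threshold
t_stats = welch_t(success, failure)
leaky_points = [idx for idx, t in enumerate(t_stats) if abs(t) > THRESHOLD]
```

Points whose statistic exceeds the threshold in absolute value are the ones that distinguish a decryption failure from a success, and they are natural candidates for the points of interest used later for classification.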
The ciphertext which successfully induces a decryption failure is denoted as c base and its corresponding polynomials d 1 and d 2 are denoted by d1 att and d2 att , with m and n terms, respectively.

Key Recovery Phase
We now use the polynomials d1 att and d2 att of c base to build new perturbed attack ciphertexts. Side-channel leakage from their decapsulation is used to identify decryption failures, which subsequently leads to full recovery of the secret polynomial f .

Attack Methodology
Our approach to building new perturbation ciphertexts for key recovery closely resembles the one used for the PC oracle-based attack on NTRU Prime in Section 3.2. We first build the perturbation ciphertext $c'$ using $(d1_{att}, d2_{att})$ of $c_{base}$ as

$$c' = \ell_1 \cdot d1_{att} + \ell_2 \cdot d2_{att} \cdot h + \ell_3 \cdot x^u$$

with $\ell_1, \ell_2, \ell_3 \in \mathbb{Z}^+$ and $u \in [0, n-1]$. We add $c'$ to the term $h \cdot r$ of $c_{valid}$ and generate the perturbed/invalid ciphertext $c_{pert}$ in the same way as in Equation (34). The corresponding $a = 3f \cdot c_{pert}$ is given by

$$\begin{aligned}
a = 3f \cdot c_{pert} &= 3\ell_1 \cdot d1_{att} \cdot f + \ell_2 \cdot d2_{att} \cdot h \cdot 3f + \ell_3 \cdot 3f \cdot x^u + gr + 3f \cdot m \\
&= 3\ell_1 \cdot d1_{att} \cdot f + \ell_2 \cdot d2_{att} \cdot g + 3\ell_3 \cdot \mathrm{Rotp}_R(f, u) + gr + 3f \cdot m \\
&= 3\ell_1 \cdot d1_{att} \cdot f + \ell_2 \cdot d2_{att} \cdot g + 3\ell_3 \cdot \mathrm{Rotp}_R(f, u) + n,
\end{aligned}$$

where $n = 3f \cdot m + gr$. If $d1_{att}$ and $d2_{att}$ collide at $i$, then the coefficient $a[i]$ at the colliding index additionally depends on the secret coefficient $\beta_u = \mathrm{Rotp}_R(f, u)[i]$ through the term $3\ell_3 \cdot \beta_u$. We select $(\ell_1, \ell_2, \ell_3)$ such that the decryption outcome, $e = e_{valid}$ (Class O) or $e = e_{invalid}$ (Class X), can act as a binary distinguisher for every candidate of $\beta_u \in [-2, 2]$. The constraints used to select $(\ell_1, \ell_2, \ell_3)$ are the same as those used for the PC oracle-based SCA on NTRU Prime. We therefore arrive at the same values that were used for the PC oracle-based SCA, as can be seen in Table 8, and the decision table for unique distinguishability also stays the same (cf. Table 3 in Section 3.2.1).

Classification using Reduced Templates
We utilize the differentiating features in the t-test plot shown in Figure 11(b) to build reduced templates for both the classes O and X. Subsequently, they can be used to classify any given trace corresponding to the decapsulation of an attack ciphertext into either of the classes. This was treated earlier in Section 3.2.2. Figure 12 visualizes the matching of a small section of an attack trace tr with the reduced templates of the respective classes O and X. There is a clear distinguishability between the reduced templates of the two classes. This enables us to correctly classify each given single trace with a 100% success rate.
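The classification step can be sketched as follows, under the assumption that a reduced template is the mean trace of a class restricted to the points of interest and that matching uses the sum of squared differences. The function names, the points of interest, and the trace values are all illustrative.

```python
def reduced_template(traces, points):
    """Mean of the traces, restricted to the selected points of interest."""
    return [sum(tr[p] for tr in traces) / len(traces) for p in points]

def classify(trace, points, template_o, template_x):
    """Return 'O' or 'X', whichever reduced template is closer in squared distance."""
    reduced = [trace[p] for p in points]
    dist_o = sum((a - b) ** 2 for a, b in zip(reduced, template_o))
    dist_x = sum((a - b) ** 2 for a, b in zip(reduced, template_x))
    return "O" if dist_o <= dist_x else "X"

# Synthetic traces with four samples; samples 1 and 3 are the points of interest.
points = [1, 3]
traces_o = [[0.0, 1.0, 0.2, 1.1], [0.1, 1.2, 0.1, 0.9]]
traces_x = [[0.0, 3.0, 0.2, 2.9], [0.1, 3.2, 0.1, 3.1]]
template_o = reduced_template(traces_o, points)
template_x = reduced_template(traces_x, points)

label = classify([0.05, 2.8, 0.15, 3.3], points, template_o, template_x)
```

When the two templates are well separated at the points of interest, as in this toy data, a single trace suffices for an unambiguous decision.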

Recovering the Full Secret Key
So far we have demonstrated the recovery of a single coefficient $\beta_u$ of the rotated secret polynomial $\mathrm{Rotp}_R(f, u)$. Similarly, by changing the rotation index $u$, we can recover $\mathrm{Rotp}_R(f, u)[i]$ for all $u \in [0, n-1]$. As with the PC oracle attack, recovering the exact secret polynomial $f$ requires knowing the colliding index $i$ and the value ($+2$ or $-2$) of the collision. We simply try out all possible choices for $i \in [0, n-1]$ and the colliding values $+2$ and $-2$; for each choice, we check whether $f \in R_{sh}$ and attempt to decrypt known ciphertexts. We empirically verified that the search space is drastically reduced to $\approx 10$ candidates, up to a certain rotation of $f$. It is also possible that the secret is not recovered correctly, due to a bad choice of the base ciphertext $c_{base}$. In this case, we simply retry the attack until the correct key is recovered.
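The rotation used above is multiplication by a power of x in the NTRU Prime ring, where x^p = x + 1. A minimal sketch over the integers, with a toy degree and illustrative function names, shows why a coefficient of the rotated ternary polynomial can reach ±2:

```python
def mul_by_x(f):
    """x * f in Z[x]/(x^p - x - 1); f is a coefficient list, lowest degree first."""
    top = f[-1]
    rotated = [top] + f[:-1]   # shift up by one degree, top coefficient wraps to x^0
    rotated[1] += top          # x^p = x + 1 also feeds the top coefficient into x^1
    return rotated

def rotp(f, u):
    """x^u * f, obtained by u successive multiplications by x."""
    for _ in range(u):
        f = mul_by_x(f)
    return f

f_toy = [1, 0, 0, 0, 1]        # toy ternary polynomial 1 + x^4 with p = 5
rotated_once = rotp(f_toy, 1)  # x + x^5 = x + (x + 1) = 1 + 2x
```

Each choice of u exposes a different combination of secret coefficients at the colliding index, which is what allows the attack to walk through the whole polynomial.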

Experimental Results
We ran our attack on the optimized implementation of sntrup761 taken from the pqm4 library [KRSS] on the ARM Cortex-M4 microcontroller. The pre-processing phase to retrieve $c_{base}$ requires an average of ≈ 165 attempts. For N = 10, the pre-processing phase therefore requires ≈ 1650 traces ($t_{base}$). The number of attempts is almost 2.85× higher than for the PC oracle-based attack, which only requires ≈ 58 attempts. The increase is partly due to the additional constraints needed to deal with the constant bias $gr$ (Section 5.1.1). The subsequent attack phase can recover a single coefficient in up to 4 ciphertexts and, thus, the complete attack requires $t_{total} = t_{base} + t_{attack}$ ≈ 4566 traces for complete recovery of $f$. Our attack works with a success rate of about 100%, with no additional brute force or offline analysis to perform. We also successfully verified our attack methodology using a simulated DF oracle on all the parameter sets of NTRU Prime. Table 9 gives the attack's estimated trace complexity for different schemes. The numbers are estimated with N = 10 for the pre-processing phase. We can see that between roughly 4100 and 5300 traces are enough for full key recovery across all listed parameter sets with a 100% success rate. The numbers are roughly 1.2 to 1.4 times the numbers for the PC oracle-based attack. This increase can be attributed mainly to the longer pre-processing phase for the DF oracle-based attack.
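The reported figures for sntrup761 can be cross-checked with a few lines of arithmetic. The sketch only uses numbers stated above, together with the fact that f has p = 761 coefficients; the variable names are illustrative.

```python
P = 761                     # number of coefficients of f in sntrup761
attempts_for_base = 165     # average attempts to find c_base (DF oracle attack)
attempts_for_base_pc = 58   # average attempts for the PC oracle-based attack
traces_per_attempt = 10     # N
t_total = 4566              # reported total number of traces

t_base = attempts_for_base * traces_per_attempt
t_attack = t_total - t_base
queries_per_coeff = t_attack / P
preprocessing_ratio = attempts_for_base / attempts_for_base_pc
```

The resulting average number of queries per coefficient is slightly below the stated upper bound of 4 ciphertexts per coefficient, and the ratio of attempts matches the stated factor of about 2.85.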

Comparison with DF Oracle-based SCA on LWE/LWR-based schemes
Known DF oracle-based SCAs on LWE/LWR-based schemes [GJN20, BDH + 21] modify single coefficients of the ciphertext to perturb the corresponding bits in the decrypted message m, which serves as the anchor variable. Whether or not the perturbations result in a decryption failure depends linearly on the secret key. This information, when obtained through a DF oracle, leads to full key recovery. For LWE/LWR-based schemes, it is notable that the location of the perturbed bit in the decrypted message can be precisely controlled.
Although the underlying arithmetic is too different to allow a direct comparison, we identify a few subtle differences with respect to our proposed DF oracle-based SCA on the NTRU-based schemes. Our approach does not allow us to control the location of the perturbed bit of the decrypted message. The more important but subtle difference lies in the type of error used for perturbation. In our attack, we use carefully constructed perturbations which are, in fact, the chosen ciphertexts used to carry out the PC oracle-based attacks. In contrast, the attacks on LWE/LWR-based schemes use simpler errors which perturb targeted single coefficients of the ciphertext polynomial.
For a quantitative comparison of the attacker's effort, we utilize experimental results from the work of Bhasin et al. [BDH + 21]. It demonstrated a practical side-channel attack on a side-channel resistant implementation of Kyber KEM. Their attack exploited side-channel vulnerabilities in the ciphertext comparison operation to instantiate a DF oracle. Their attack on Kyber512 took about $2^{17}$ decapsulation queries and an additional offline analysis, with a computational complexity of $2^{65}$, for full key recovery. Similarly, Guo et al. [GJN20] proposed a timing side-channel attack targeting the ciphertext comparison operation to instantiate a DF oracle-based attack on Frodo KEM. Their attack required about $2^{30}$ decapsulation queries for full key recovery on the FrodoKEM-1344-AES parameter set. While the number of measurements includes replicated queries for a better signal-to-noise ratio, the number of decapsulation queries without replications is still very high at ≈ 118000.
Our proposed attack on NTRU-based schemes requires far fewer traces, in the range of 4500 to 7000, for full key recovery with a 100% success rate across all parameter sets of NTRU Prime.

Full-Decryption Oracle-Based SCA
We have thus shown that PC and DF oracles can be realized using side-channels through a careful choice of the chosen ciphertexts for NTRU Prime and NTRU KEM to perform full key recovery. However, the PC and DF oracles only provide binary information (1 bit) about the secret-dependent anchor variable, and therefore require a few thousand chosen-ciphertext queries to the target device. A more powerful side-channel adversary, who can extract more than a single bit of information about the anchor variable, can potentially perform more efficient attacks.
In this respect, Sim et al. [SKL + 20] demonstrated single-trace message recovery attacks on several IND-CCA secure NIST PQC KEMs. In particular, their attack targeted routines that manipulate sensitive variables, such as the decrypted message, one coefficient or one bit at a time. Targeting NTRU, they showed that the polynomial lift operation computed on the decrypted message m in the decryption procedure (Line 10 of the NTRU_PKE.Decrypt procedure) is susceptible to side-channel attacks. They demonstrated successful single-trace message recovery with a success rate close to 100%.
Though they do not demonstrate message recovery for NTRU Prime, we speculate that the weight check operation on the variable b (Line 6 of NTRU_Prime_PKE.Decrypt procedure) could also be susceptible to similar single-trace attacks, especially because it involves manipulation of single coefficients of b . The feasibility of performing single trace recovery of b remains out of scope of this work.
However, the aforementioned side-channel vulnerabilities can potentially be exploited to recover the complete decrypted message m in case of NTRU, or the variable b in case of NTRU Prime in a single trace. We show that such vulnerabilities can also be used to instantiate a Full-Decryption (FD) oracle in a CCA setting to mount very efficient key recovery attacks. We first describe our attack on NTRU Prime KEM, and subsequently on NTRU KEM.

Attack Methodology: NTRU Prime KEM
The attack methodology directly follows from our PC oracle-based SCA on NTRU Prime KEM (cf. Section 3). We conduct the pre-processing phase to retrieve the base ciphertext $c_{base}$ whose $e = \pm x^i$. Using the side-channel based FD oracle, we assume complete recovery of $b$ for $c_{base}$ in a single trace. From Line 5 of the NTRU_Prime_PKE.Decrypt procedure, we know that

$$b = e \cdot \hat{g} \in R_3 \tag{40}$$

where $\hat{g}$ is the inverse of the secret polynomial $g$ in $R_3$. Since $e = \pm x^i$, $g$ can be directly recovered as $g = e \cdot \hat{b} \in R_3$, where $\hat{b}$ is the inverse of $b \in R_3$. The attacker does not know $i$, but can simply try out all possible choices for $i \in [0, n-1]$ and recover the secret polynomials $f$ and $g$ up to a rotation.
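The recovery of g from a leaked b can be sketched on a toy instance. The code below works in Z_3[x]/(x^p − x − 1) with p = 3 and, instead of inverting b explicitly, solves the equivalent linear system b · g = e over GF(3) for every candidate e = ±x^i. The toy parameters, the helper names, and the linear-algebra route are choices made for this sketch, not the method of any particular implementation.

```python
def ring_mul(a, b, p, mod=3):
    """a * b in Z_mod[x]/(x^p - x - 1); coefficient lists, lowest degree first."""
    prod = [0] * (2 * p - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    for k in range(2 * p - 2, p - 1, -1):   # x^k = x^(k-p+1) + x^(k-p)
        prod[k - p + 1] += prod[k]
        prod[k - p] += prod[k]
        prod[k] = 0
    return [c % mod for c in prod[:p]]

def solve_for_g(b, e, p):
    """Solve b * g = e in the toy ring by Gaussian elimination over GF(3)."""
    cols = [ring_mul(b, [1 if t == j else 0 for t in range(p)], p) for j in range(p)]
    rows = [[cols[j][i] for j in range(p)] + [e[i] % 3] for i in range(p)]
    for col in range(p):
        pivot = next((r for r in range(col, p) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("b is not invertible")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = rows[col][col]   # in GF(3), every nonzero element is its own inverse
        rows[col] = [(v * inv) % 3 for v in rows[col]]
        for r in range(p):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % 3 for x, y in zip(rows[r], rows[col])]
    return [rows[i][p] for i in range(p)]

def candidate_keys(b, p):
    """All solutions g of b * g = ±x^i for i in [0, p-1]; 2 stands for -1 in GF(3)."""
    return [solve_for_g(b, [sign if t == i else 0 for t in range(p)], p)
            for i in range(p) for sign in (1, 2)]

p_toy = 3
g_secret = [1, 2, 0]                                # toy secret g = 1 + 2x
g_hat = solve_for_g(g_secret, [1, 0, 0], p_toy)     # inverse of g
b_leaked = ring_mul([0, 2, 0], g_hat, p_toy)        # b = e * g_hat with e = -x
candidates = candidate_keys(b_leaked, p_toy)
```

The true g appears among the 2p candidates; in the attack, the correct one would be identified by checking the candidates against known ciphertexts.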

Attack Methodology: NTRU KEM
The attack follows the pre-processing phase of the PC oracle-based SCA on NTRU KEM from Subsection 4.1 to retrieve the base ciphertext $c_{base}$ whose $e = \pm x^i$. Using the side-channel based FD oracle, we assume complete recovery of $m$ for $c_{base}$ in a single trace. From Line 9 of the NTRU_PKE.Decrypt procedure, we know that $m = e \cdot f_p \in S_3$, where $f_p$ is the inverse of $f$ in $S_3$. Since $e = \pm x^i$, $f_p$ can be simply computed as $f_p = m \cdot \hat{e} \in S_3$, where $\hat{e}$ is the inverse of $e \in R_3$. An attacker can try out all possible values of $i$ to fully retrieve $f_p$ and thereby calculate the secret key polynomials $f$ and $g$. Unlike in the PC oracle-based or DF oracle-based attacks, the attacker can perform full key recovery using only the base ciphertext $c_{base}$ for both NTRU and NTRU Prime KEM, which completely eliminates the need for a key recovery phase. Thus, the trace requirement of the FD oracle-based SCA comes primarily from the pre-processing phase of the attack. Please refer to the column corresponding to $t_{base}$ in Tables 4 and 7 for the estimated trace complexity of the FD oracle-based SCA on different parameter sets of NTRU Prime KEM and NTRU KEM respectively.

Countermeasures
Our proposed side-channel assisted CCAs rely on fixing targeted intermediate variables to known values and, subsequently, utilizing side-channel leakage to identify these values to perform key recovery. Thus, a complete randomization of the internal computation through masking can serve as a concrete countermeasure against the attacks. Let us briefly address the countermeasures for the NTRU Prime KEM and the NTRU KEM separately.
In the case of NTRU Prime, the PC oracle-based attack only exploits leakage from the decryption procedure. Thus, masking only the decryption procedure in decapsulation protects against the PC oracle-based SCA. The same applies for the FD oracle-based attack since it primarily relies upon leakage from the decryption procedure. The DF oracle-based attack, however, is capable of exploiting leakage from the re-encryption procedure for key recovery. Thus, the entire decapsulation procedure needs to be masked for a concrete protection to thwart key recovery.
In the case of NTRU, the decapsulation procedure does not perform any re-encryption of the decrypted message. Thus, the decryption procedure remains the only source of side-channel leakage to instantiate the oracles for key recovery. All three attacks target NTRU by exploiting leakage from the decryption procedure. We therefore believe that masking the decryption procedure within decapsulation is sufficient to thwart our attacks. However, the other unmasked operations within the decapsulation procedure, could also offer an opportunity for the attacker to instantiate oracles for key recovery. We leave a concrete analysis of this possible attack route for some future work.
Masking countermeasures, in general, are known to be costly in terms of performance. There are several works, see, e.g., [LSCH10, WZW13, HCY20, SMS19], on protecting NTRU-based primitives against side-channel attacks. Thus far, existing attacks as well as countermeasures only target the polynomial multiplier involving the secret key in the decryption procedure in Lines 3 and 5 in NTRU_Prime_PKE.Decrypt procedure of NTRU Prime in Algorithm 1 and in Lines 7 and 9 of NTRU_PKE.Decrypt procedure of NTRU in Algorithm 3.
Our attacks have shown that other operations within the decryption and decapsulation procedure can also be targeted for key recovery. Moreover, schemes such as Streamlined NTRU Prime include nonlinear operations which are nontrivial to mask. An example is the weight check in Line 6 in the NTRU_Prime_PKE.Decrypt procedure of NTRU Prime.
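To illustrate why the polynomial multiplier is the easy part to protect, the following sketch applies first-order arithmetic masking to the linear step f · c on a toy instance: the secret is split into two random shares modulo q, and each share is multiplied independently. This is a generic textbook construction given under toy parameters, not a complete countermeasure; in particular, it does not cover nonlinear steps such as the reduction modulo 3 or the weight check. All names are illustrative.

```python
import random

def ring_mul(a, b, p, q):
    """a * b in Z_q[x]/(x^p - x - 1); coefficient lists, lowest degree first."""
    prod = [0] * (2 * p - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    for k in range(2 * p - 2, p - 1, -1):   # x^k = x^(k-p+1) + x^(k-p)
        prod[k - p + 1] += prod[k]
        prod[k - p] += prod[k]
        prod[k] = 0
    return [c % q for c in prod[:p]]

def share(secret, q):
    """Split secret into two additive shares modulo q."""
    mask = [random.randrange(q) for _ in secret]
    return mask, [(s - r) % q for s, r in zip(secret, mask)]

p_toy, q_toy = 5, 4591
f_toy = [1, 0, -1, 1, 0]                             # toy ternary secret
c_toy = [random.randrange(q_toy) for _ in range(p_toy)]

f1, f2 = share(f_toy, q_toy)
a1 = ring_mul(f1, c_toy, p_toy, q_toy)               # computed on share 1 only
a2 = ring_mul(f2, c_toy, p_toy, q_toy)               # computed on share 2 only
recombined = [(x + y) % q_toy for x, y in zip(a1, a2)]
```

Because multiplication by a public ciphertext is linear, the shares can be processed separately and recombined at the end; no such share-wise shortcut exists for the nonlinear operations mentioned above.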
To the best of our knowledge, a concrete and complete masking scheme for NTRU-based PKE/KEMs is yet to be devised. Developing efficient and concrete masking strategies for NTRU-based PKE/KEMs, therefore, warrants urgent attention from our community.

Conclusion
We have thus demonstrated the first practical side-channel assisted CCAs on NTRU and NTRU Prime, which are final-round candidates in the ongoing NIST PQC standardization process. Our attacks involve careful construction of malformed ciphertexts which, when decrypted, can instantiate three different types of oracles through side-channel leakage from the decapsulation procedure. The resulting responses can then be used to perform full key recovery. The oracles are the plaintext-checking oracle, the decryption-failure oracle, and the full-decryption oracle. We perform experimental validation of our proposed attacks on optimized implementations of NTRU-based schemes, using the EM-based side-channel on the 32-bit ARM Cortex-M4 microcontroller. All of our proposed attacks are capable of recovering the full secret key in only a few thousand chosen-ciphertext queries to the target device on all parameter sets of NTRU and NTRU Prime. Our attacks stress the need for concrete masking strategies for NTRU-based KEMs to protect against side-channel assisted CCAs.