Curse of Re-encryption: A Generic Power/EM Analysis on Post-Quantum KEMs

This paper presents a side-channel analysis (SCA) on key encapsulation mechanisms (KEMs) based on the Fujisaki–Okamoto (FO) transformation and its variants. The FO transformation is widely used to construct actively secure KEMs from passively secure public key encryption (PKE), and it is employed in most NIST post-quantum cryptography (PQC) KEM candidates. The proposed attack exploits side-channel leakage during the execution of a pseudorandom function (PRF) or pseudorandom number generator (PRG) in the re-encryption of KEM decapsulation as a plaintext-checking oracle, which tells whether the PKE decryption result is equivalent to the reference plaintext. The generality and practicality of the plaintext-checking oracle allow the proposed attack to attain a full-key recovery of various KEMs when an active attack on the underlying PKE is known. This paper demonstrates that the proposed attack can be applied to most NIST PQC third-round KEM candidates, namely, Kyber, Saber, FrodoKEM, NTRU, NTRU Prime, HQC, BIKE, and SIKE (for BIKE, the proposed attack achieves a partial key recovery). The applicability to Classic McEliece is unclear because there is no known active attack on this cryptosystem. This paper also presents a side-channel distinguisher design based on deep learning (DL) for mounting the proposed attack on practical implementations without the use of a profiling device. The feasibility of the proposed attack is evaluated through experimental attacks on various PRF implementations (a SHAKE software, an AES software, an AES hardware, a bit-sliced masked AES software, and a masked AES hardware based on threshold implementation (TI)). Although it is difficult to implement the oracle using the leakage from the TI-based masked hardware, the success of the proposed attack against the other implementations, which include masked software, confirms its practicality.


Background
Public key encryption (PKE) is a cryptographic primitive essential for secure information systems. As it is usually difficult to construct a chosen ciphertext attack (CCA)-secure PKE, the Fujisaki-Okamoto (FO) transformation [FO99] and its variants (e.g., [HHK17, SXY18, BHH+19]) have been commonly used to produce CCA-secure key encapsulation mechanisms (KEMs) from a chosen plaintext attack (CPA)-secure PKE via re-encryption and equality checking; most KEM schemes in the NIST post-quantum cryptography (PQC) competition [NIS20] follow this CPA-to-CCA transformation approach. Although the theoretical security of such KEM schemes has been extensively analyzed, side-channel analysis (SCA), a type of attack on cryptographic implementations using side-channel leakage (e.g., execution time, power consumption, and electromagnetic (EM) emanation), can potentially break these schemes when they are implemented in the real world [Koc96, KJJ99]. It is therefore important to investigate the SCA vulnerability of KEM schemes for applications in which SCA can be a practical threat, such as the Internet of Things (IoT).
Many previous SCAs on KEMs have mainly focused on the decryption of the underlying PKE with the goal of recovering the secret key (e.g., [PPM17, ZYD+20]). In contrast, some recent studies have shown that the FO transformation can leak the secret key, even if the underlying PKE is securely implemented [GTN20, RRCB20, PP21]. These attacks exploit side-channel leakage or fault injection to obtain information about the PKE decryption result, and then mount a chosen-ciphertext attack on the underlying PKE. In fact, such side-channel-assisted chosen-ciphertext attacks have been studied on public key primitives since the disclosure of Bleichenbacher's padding oracle attack on RSA PKCS [Ble98]. Focusing on post-quantum KEMs based on the FO transformation, Guo et al. [GTN20] present a timing attack that is potentially applicable to lattice- and code-based KEMs if the FO transformation is implemented in a non-constant-time manner, which reveals the importance of a constant-time implementation of the FO transformation in addition to the PKE. By contrast, for the power/EM side-channel and fault injection, only attacks applicable to lattice-based schemes are known [RRCB20, PP21]. In particular, there is no known power/EM SCA on the FO transformation in code- or isogeny-based KEMs (e.g., HQC, BIKE, and SIKE). Thus, a detailed evaluation of the applicability and limitations of SCAs on the FO transformation is essential for developing adequate countermeasures for secure KEM implementation.

Our contributions
In this paper, we show that the side-channel leakage of re-encryption, which plays an essential role in realizing CCA security for most KEM schemes, can be generally exploited to break the CCA security. We also present a concrete and practical method to exploit the leakage with experimental evaluation. The contributions of this paper are as follows:
• We present a generic power/EM SCA methodology for KEMs based on the FO transformation and its variants. The key idea underlying the proposed attack is the creation of a plaintext-checking oracle through a side-channel trace to mount a chosen-ciphertext attack on the underlying CPA-secure PKE. This oracle tells whether or not the PKE decryption result in decapsulation is equivalent to the reference plaintext, that is, the PKE decryption result corresponding to a valid reference ciphertext. To realize the oracle, the proposed attack exploits side-channel leakage during the execution of a pseudorandom function (PRF) or pseudorandom number generator (PRG) in the re-encryption of the KEM decapsulation. This allows it to distinguish whether or not the PKE decryption result is the fixed reference plaintext. The proposed SCA focusing on the PRF leakage can also be used to create a decryption failure oracle, which is another major oracle in attacking lattice-based KEMs, as in the attack on Streamlined NTRU Prime shown in this paper (following the attack in [REB+21]). As such, the proposed attack can be performed even if the underlying PKE implementation has no secrecy leakage. As many PKEs are vulnerable to adaptive chosen-ciphertext attacks, the proposed attack can be widely applied to many KEMs including lattice-, code-, and isogeny-based KEMs.
1, in which the generality of the proposed attack, one of its major advantages over conventional attacks, is confirmed. We stress here that this paper is the first report on power/EM analysis on the FO transformation of code- and isogeny-based KEMs, although some SCAs on the FO transformation of lattice-based KEMs, which are not fully generalized, have already been discussed in previous works (e.g., [RRCB20] on Kyber, Saber, and FrodoKEM).
• We present a deep-learning (DL)-based distinguisher for implementing the plaintext-checking oracle, which is designed as a two-class classification neural network (NN) and allows for attacks without specific assumptions or knowledge about the target implementation. In addition, we describe how to realize the distinguisher with a convincing accuracy using an NN model whose accuracy is insufficient (in many cases, NN model accuracy for SCA can be low owing to the presence of noise and/or SCA countermeasures [ISUH21]). Thus, as demonstrated in this paper, the proposed NN-based distinguisher can be used to attack practical implementations in a black-box manner even when an SCA countermeasure such as masking is implemented. Note that the proposed attack requires no profiling device for acquiring a training dataset, since the dataset is acquired from the target implementation under our scenario, as in several previous SCAs on lattice-based KEMs such as [XPRO20, RBRC20, SKL+20, NDGJ21].
• Using the distinguisher, we validate the proposed attack through experimental attacks on various PRF implementations. In the experiments, we target five implementations: a non-protected SHAKE software and AES software obtained from the open-source cryptographic software library pqm4 [KRSS19, pqm21], an open-source AES hardware developed for the side-channel standard attack evaluation board (SASEBO) [Toh], an open-source masked bit-sliced AES software for ARM Cortex-M4 corresponding to Schwabe and Stoffelen's paper [SS16, git21], and a masked AES hardware based on threshold implementation (TI) in [UHA17], as TI is one of the most promising masking schemes. Our results confirm that the NN model can achieve a sufficiently high test accuracy to perform the key recovery attack even for masked software implementations, whereas it is difficult to break the masked TI-based hardware in our environment. Finally, we rigorously and comprehensively evaluate the number of side-channel traces required for a successful key recovery to demonstrate the practicality of the proposed SCA on the post-quantum KEMs.
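The remark above that a low-accuracy model can still yield a convincing distinguisher can be illustrated with a small calculation. The sketch below assumes a simple majority vote over independent traces, which is only one possible way to combine noisy decisions and is not claimed to be the combination rule used in this paper; the function name and the example accuracy of 0.6 are illustrative.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that the majority of n independent decisions is correct
    when each single decision is correct with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of traces to avoid ties")
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

# A per-trace accuracy of only 0.6 becomes reliable once votes are combined.
for n in (1, 11, 101, 301):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```

Under the independence assumption, the combined accuracy grows with the number of traces spent per oracle call, so the per-trace accuracy mainly affects the trace count rather than the feasibility of the oracle.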

Paper organization
The remainder of this paper is organized as follows. Section 2 reviews KEMs based on the FO transformation and the previous timing and power/EM SCAs on KEMs focusing on the FO transformation. Section 3 describes the proposed SCA methodology on the basis of a plaintext-checking oracle realized via side-channel leakage. Section 4 demonstrates the application of the proposed attack to NIST PQC third-round KEM candidates. Section 5 presents the side-channel distinguisher design for mounting the proposed attack on practical implementations, and Section 6 conducts experimental validation using various PRF implementations. Finally, Section 7 concludes this paper.

IND-CCA-secure KEM based on the FO transformation
KEM is a public key cryptographic primitive that encapsulates a secret key. KEM is defined as a triple of polynomial-time algorithms: a key generation KeyGen, an encapsulation Encaps, and a decapsulation Decaps. Many CCA-secure KEMs are obtained using a CPA-secure PKE with the FO transformation or its variant (e.g., [HHK17, SXY18, BHH+19]); most NIST PQC KEM candidates follow this CPA-to-CCA transformation approach. Algorithm 1 illustrates KEM = (KeyGen, Encaps, Decaps) based on a (standard) FO transformation, where PKE is a CPA-secure probabilistic PKE comprising a key generation algorithm Gen, an encryption algorithm Enc, and a decryption algorithm Dec. Here, as a major example, we consider a KEM that returns a pseudorandom number instead of a rejection symbol $\perp$ in the case of an invalid ciphertext, which is called implicit rejection. Given a security parameter $1^\lambda$, KEM.KeyGen first generates a key pair $(sk, pk)$ using the PKE key generation PKE.Gen. Then, $s$ is generated as a random plaintext of the PKE from the message space $\mathcal{M}$ at Line 3. Finally, the algorithm returns the triplet $(sk, pk, s)$.
KEM.Encaps first randomly generates a message $m$ from a message space $\mathcal{M}$ and then evaluates a random oracle $G$ on $m$ or on a pair of $m$ and $pk$ (e.g., in the cases of BIKE and SIKE, respectively), which is usually realized using a PRF/PRG. In Line 4, the PKE encryption PKE.Enc is performed using a public key $pk$, message $m$, and randomness $r$. Then, a random oracle $H$ is evaluated on $m$ and $c$ to derive the shared secret $k$. The input format and output length of those random oracles are determined in accordance with each KEM specification, as summarized in Table 2. Finally, the algorithm returns the ciphertext $c$ corresponding to $k$. Note that the ciphertext $c$ may be a tuple.
KEM.Decaps first performs the PKE decryption for $c$ using the secret key $sk$ to obtain the plaintext $m'$. Then, analogously to KEM.Encaps, KEM.Decaps generates $r'$ as $G(m')$ or $G(m', pk)$, and evaluates $\mathrm{PKE.Enc}(pk, m'; r')$. This procedure is called re-encryption. At Line 5, the KEM.Decaps algorithm performs equality checking, namely, examines whether the re-encryption result $c'$ is equal to the ciphertext $c$. If $c = c'$, the algorithm returns the shared secret $k = H(m', c)$ as the ciphertext is valid; otherwise, the algorithm returns a pseudorandom number $H_{\mathrm{prf}}(s, c)$ (instead of $\perp$) as the ciphertext is invalid, where $H_{\mathrm{prf}}$ is another random oracle or equivalent to $H$. Thus, the KEM scheme gives an active attacker no information about the PKE decryption result for an invalid ciphertext.
In many modern KEM schemes, $G$, $H$, and $H_{\mathrm{prf}}$ are instantiated using SHAKE or SHA3. There are some variants of the FO transformation for different types of CPA-secure PKE (e.g., deterministic PKE), different security models, tighter security bounds, and/or improved efficiency (e.g., [HHK17, SXY18, BHH+19]); however, the basic principle is almost the same (that is, it is related to PRF/PRG, re-encryption, or equality/validity check). Although some variants avoid the complete re-encryption for computational efficiency (e.g., [DOV21] and NTRU submitted to NIST PQC), the proposed SCA would be applicable to these CPA-to-CCA transformations as long as they involve a PRF/PRG and/or a procedure corresponding to the equality/validity check. Finally, Table 2 summarizes how KEMs implement $G$, $H$, $H_{\mathrm{prf}}$, and $F$, where $F$ is used to compute an additional hash added into a ciphertext.
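The structure of KeyGen, Encaps, and Decaps with implicit rejection described above can be sketched as follows. This is a minimal illustration, not an implementation of any NIST candidate: the underlying PKE is a toy ElGamal scheme with insecure parameters chosen only so that the example runs, and all function names, domain-separation prefixes, and byte encodings are assumptions.

```python
import hashlib
import hmac
import secrets

# Toy ElGamal over Z_p^* as a stand-in probabilistic PKE (NOT secure, illustrative only).
P = 2**127 - 1      # a Mersenne prime (toy parameter)
G_BASE = 3          # toy base element

def _enc_ints(*vals: int) -> bytes:
    return b"".join(v.to_bytes(16, "big") for v in vals)

def pke_gen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G_BASE, sk, P)

def pke_enc(pk: int, m: int, r: int):
    return pow(G_BASE, r, P), (m * pow(pk, r, P)) % P

def pke_dec(sk: int, c):
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - sk, P)) % P   # c1^(-sk) via Fermat's little theorem

# Random oracles G, H, H_prf instantiated with SHAKE256 (prefixes are hypothetical).
def G(m: int) -> int:
    return int.from_bytes(hashlib.shake_256(b"G" + _enc_ints(m)).digest(32), "big") % (P - 1)

def H(m: int, c) -> bytes:
    return hashlib.shake_256(b"H" + _enc_ints(m, *c)).digest(32)

def H_prf(s: int, c) -> bytes:
    return hashlib.shake_256(b"R" + _enc_ints(s, *c)).digest(32)

def keygen():
    sk, pk = pke_gen()
    s = secrets.randbelow(P - 1) + 1           # secret seed for implicit rejection
    return sk, pk, s

def encaps(pk: int):
    m = secrets.randbelow(P - 1) + 1
    c = pke_enc(pk, m, G(m))                   # randomness derived from the message
    return c, H(m, c)

def decaps(sk: int, pk: int, s: int, c):
    m2 = pke_dec(sk, c)                        # m'
    r2 = G(m2)                                 # PRF/PRG call on m': the leakage point
    c2 = pke_enc(pk, m2, r2)                   # re-encryption, c'
    valid = hmac.compare_digest(_enc_ints(*c), _enc_ints(*c2))
    # Real implementations select between the two keys in constant time.
    return H(m2, c) if valid else H_prf(s, c)
```

Note that `G(m2)` is evaluated on the PKE decryption result before the validity of the ciphertext is known, which is why its side-channel leakage depends on the decrypted plaintext even for invalid ciphertexts.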

Timing analysis
Guo et al. [GTN20] present an SCA focusing on the FO transformation. The attack utilizes a timing side-channel to realize a plaintext-checking oracle for lattice- and code-based KEMs. As the timing attack exploits the equality check between the ciphertext and the re-encryption result (i.e., Line 5 in KEM.Decaps of Algorithm 1) rather than PKE.Dec, the attack can be applied even to a constant-time PKE implementation, unless the overall decapsulation is implemented in a constant-time manner.
More precisely, the timing attack is a chosen ciphertext attack on KEMs and utilizes a plaintext-checking oracle to mount an adaptive attack on the underlying lattice- or code-based PKE. Let $c$ be a valid ciphertext, named the reference ciphertext, corresponding to a plaintext $m$. For an invalid ciphertext $c'$, the plaintext-checking oracle tells whether or not the PKE decryption result of $c'$ is equivalent to $m$. The timing attack implements the plaintext-checking oracle via a timing side-channel. The attacker generates an invalid ciphertext $c' = c + \delta$, where $\delta$ is determined according to the adaptive attack. For lattice- and code-based PKEs, if $\delta$ is sufficiently small for the underlying scheme, the PKE decryption result for $c'$ will be equivalent to $m$, which indicates that the re-encryption result should be $c$ in this case. Otherwise, the PKE decryption result will be a random plaintext $\hat{m}$, which will be re-encrypted to a random ciphertext $\hat{c}$ that differs significantly from $c$. Then, KEM.Decaps performs the equality check, namely, compares the ciphertext $c + \delta$ with the re-encryption result. Here, ciphertexts of lattice- and code-based PKEs are treated as long vectors in common processors. Therefore, if the two ciphertexts are very similar to each other (i.e., when comparing $c + \delta$ and $c$), a standard comparison method (e.g., memcmp) takes a relatively long time; otherwise (i.e., when comparing $c + \delta$ and $\hat{c}$), the comparison terminates immediately after examining the first block. This results in a timing difference depending on whether or not the PKE decryption result is equivalent to $m$; thus, the timing side-channel acts as a plaintext-checking oracle. The full-key recovery of the KEM is achieved by repeated accesses to the plaintext-checking oracle for different $\delta$'s. In [GTN20], Guo et al. demonstrate the application of this attack to FrodoKEM. Although the signal-to-noise ratio (SNR) of the side-channel measurement (i.e., the accuracy of the oracle) would be problematic, the result indicates that the full-key recovery is sufficiently feasible. Since the disclosure of this attack, many PQC implementations have employed a fully constant-time conditional move (e.g., cmov) for the secure comparison of $c$ and $c'$ and the move operation in KEM.Decaps. Thus, the timing attack is prevented at this time.
Note that the timing attack cannot be applied to SIKE (the isogeny-based KEM in NIST PQC), because the known adaptive attack on SIKE.PKE uses invalid ciphertexts that differ significantly from the reference ciphertext, so that the comparison operation between $c$ and $c'$ terminates immediately regardless of whether the PKE decryption result is $m$ or not. In addition, Guo et al. note that their timing analysis may be carried out using a power/EM side-channel, because each pair of similar ciphertexts will have similar Hamming weights, resulting in similar power consumption/EM emanation. However, it is unknown how to exploit it with sufficient accuracy in a practical setting/implementation; the feasibility of such an attack is not validated.
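The timing difference exploited above can be made concrete with a small model. The sketch below counts byte comparisons in a memcmp-style early-exit loop rather than measuring wall-clock time; the byte strings are arbitrary stand-ins for $c$, $c + \delta$, and $\hat{c}$, and the helper name is hypothetical.

```python
import hmac

def early_exit_equal(a: bytes, b: bytes):
    """memcmp-style comparison; returns (equal, number of byte comparisons made)."""
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps        # stops at the first mismatch
    return len(a) == len(b), steps

c = bytes(range(256)) * 4                   # stand-in for the re-encryption result c
c_delta = c[:-1] + bytes([c[-1] ^ 1])       # c + delta: differs from c only in the last byte
c_hat = bytes(b ^ 0xFF for b in c)          # stand-in for an unrelated ciphertext

_, steps_similar = early_exit_equal(c_delta, c)     # decryption result equals m
_, steps_random = early_exit_equal(c_delta, c_hat)  # decryption result is random
print(steps_similar, steps_random)                  # 1024 vs 1 comparisons

# A constant-time comparison does not expose the mismatch position through timing.
print(hmac.compare_digest(c_delta, c), hmac.compare_digest(c_delta, c_hat))
```

The step count plays the role of the execution time: it separates the two cases perfectly in this model, whereas `hmac.compare_digest` is designed not to reveal where the inputs differ.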

Power/EM analysis
Ravi et al. [RRCB20] show an SCA on lattice-based KEMs. The attack is a side-channel-assisted CCA in which the ciphertexts are generated such that the decrypted (or decoded) plaintext is either 0 or 1 depending on a partial key. Here, the attacker cannot directly observe the plaintext due to the FO transformation; however, the side-channel leakage during re-encryption can be exploited to distinguish whether the plaintext is 0 or 1, which allows the attacker to estimate the partial key. The attacker can recover the full key of some lattice-based KEMs by repeatedly querying invalid ciphertexts to obtain different partial keys. In [RRCB20], Ravi et al. show that this methodology is applicable to six lattice-based KEMs, namely, Kyber, Saber, FrodoKEM, Round5, NewHope, and LAC. Ravi et al. also present a side-channel distinguisher based on a combined application of Welch's t-test and reduced templates that yields a sufficiently feasible full-key recovery. Their distinguisher does not require detailed knowledge of the target implementation.
Recently, Bhasin et al. [BDH+21] report SCA vulnerabilities of masked polynomial comparison schemes [OSPG18, BPO+20] for the ciphertext equality check in lattice-based KEMs, and demonstrate their application to Kyber. One of their attacks is based on the timing attack by Guo et al. [GTN20], and focuses on the leakage of the masked polynomial comparison of $c = c'$ to realize a plaintext-checking oracle using a t-test-based distinguisher similar to the test vector leakage assessment (TVLA) [SM15]. Note that, although that attack utilizes a plaintext-checking oracle as our SCA does, the work primarily studies the (in)security of masked polynomial comparison for lattice-based KEMs, and discusses only some lattice-based KEMs (i.e., Kyber, Saber, and FrodoKEM). In this sense, the contributions and goal of this paper are different from those of [BDH+21], as this paper primarily studies the generality and practicality of adaptive attacks using a plaintext-checking oracle in the scenario of SCA on KEMs and presents a DL-based side-channel distinguisher that is generally applicable to various PRF implementations.
In addition, extended CCA SCA approaches to lattice-based KEMs have been presented in [XPRO20, RBRC20, SKL+20, REB+21], and Ngo et al. [NDGJ21] present an extended attack on the masked Saber implementation in [vBDK+21] using a DL technique. These attacks are very efficient in terms of the number of oracle accesses (i.e., side-channel traces) because they employ chosen ciphertexts that result in plaintexts with more side-channel leakage, exploiting features and the implementation of the underlying PKE. Although these attacks are CCAs, they focus on specific parts of the underlying PKE (e.g., message encoding/decoding and number theoretic transform (NTT)-based multiplication) rather than the FO transformation. In other words, these attacks achieve a higher efficiency by focusing on a scheme/implementation-specific aspect; therefore, they are less general with respect to KEMs based on the FO transformation.
In addition to the approaches outlined above, there are some SCAs on code- and isogeny-based KEMs (e.g., [SKC+19, LNPS20] and [KAJ17, ZYD+20], respectively). However, these attacks focus on the underlying PKE rather than the FO transformation. For code- and isogeny-based KEMs, no SCA focusing on the FO transformation is known.
As another attack direction, Kannwischer et al. [KPP20] present a single-trace SCA on SHA3 that recovers the secret input to SHA3 via a belief-propagation-based method, which is called soft-analytical SCA (SASCA). Although their attack is powerful, its feasibility heavily depends on the word length of the processor, the key length (i.e., the number of input bits to be recovered), and the SNR of the side-channel measurement. In fact, it is difficult to apply the attack to some practical settings for post-quantum KEMs (e.g., a 32-bit processor and a secret longer than 256 bits). In addition, SASCA requires the details of the implementation and can be prevented/mitigated using a common SCA countermeasure (e.g., masking). Note that Kannwischer et al. show that their attack can be used for recovering the shared secret of KEMs, but do not show a secret key recovery.

Plaintext-checking oracle
We first introduce a plaintext-checking oracle, which plays an essential role in the proposed attack. A plaintext-checking oracle is one of the major oracles employed in adaptive attacks on a wide range of PKEs including lattice-, code-, and isogeny-based ones (e.g., [GPST16, GTN20]). A key recovery attack that uses a plaintext-checking oracle is called a key-recovery plaintext-checking attack (KR-PCA). In this paper, "adaptive attack" refers to an adaptive chosen-ciphertext attack.
For a given KEM, let $c$ be a valid ciphertext named the reference ciphertext, and let $m$ be the corresponding plaintext named the reference plaintext. Note here that $m$ denotes the PKE decryption result, rather than the output of KEM.Decaps. The attacker can obtain a reference ciphertext corresponding to any reference plaintext by performing an encapsulation, which does not require the secret key. An adaptive attacker generates an invalid ciphertext $c'$, which is a modification of $c$ for an adaptive attack, and then queries it to the decryption oracle. Let $m'$ be the plaintext corresponding to $c'$. An adaptive attack exploits the fact that there are two cases depending on the secret key: $m'$ is equal to either the reference plaintext $m$ or another plaintext $\hat{m}$. Formally, the plaintext-checking oracle $O(c', m)$ returns 1 if $m = m'$; otherwise, it returns 0. For a KEM implementation based on the FO transformation, such an oracle should be unavailable to any attacker, because the plaintext-checking oracle obviously leaks information on the PKE decryption result, which violates the IND-CCA security guaranteed by the FO transformation.

Proposed SCA
The proposed attack realizes a plaintext-checking oracle (or another abstracted decryption oracle such as a decryption failure oracle) through side-channel leakage to mount a chosen-ciphertext attack on the underlying CPA-secure PKE. In the proposed SCA, the attacker first obtains the side-channel leakage during the PRF execution of KEM.Decaps for the reference ciphertext $c$. The attacker then queries a modified ciphertext $c'$ for an adaptive attack using the plaintext-checking oracle, and observes the side-channel leakage during the PRF execution in the re-encryption of decapsulation. If $c'$ is decrypted to the reference plaintext $m$, the side-channel leakage for $c'$ should be very similar to that for $c$ because the PRF input is identical. By contrast, if $c'$ is decrypted to another plaintext $\hat{m}$, the two side-channel leakages should be meaningfully different. Thus, the attacker can distinguish whether or not the PKE decryption result is the reference plaintext from the side-channel leakage of the PRF. As the proposed attack focuses on the PRF leakage, it can perform a key recovery independently of the PKE.Dec implementation, even if that implementation has no secrecy leakage.
The proposed SCA comprises a profiling phase and an attack phase. In the profiling phase, the attacker trains a classification model that uses side-channel trace(s) to distinguish whether the PRF/PRG input is the reference plaintext or another random plaintext, which realizes the plaintext-checking oracle as mentioned above. In this paper, this trained model is called a side-channel distinguisher. In the attack phase, the attacker performs an adaptive attack on the underlying PKE using the trained distinguisher as the plaintext-checking oracle. Note that, although the attack employs a profiling phase, it does not require any profiling device because the profiling is performed using the target device without knowing the secret key, as in previous SCAs on lattice-based KEMs [RRCB20, XPRO20, RBRC20, SKL+20, NDGJ21]. The proposed SCA also does not require details of the target implementation, as DL enables us to train a model without a leakage assumption or specific knowledge about the target implementation. Such a DL-based side-channel distinguisher is well suited to the two-class classification of traces for fixed vs. random inputs, as Moos et al. show with an efficient DL-based leakage assessment [MWM21].
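The profiling and attack phases can be illustrated with a simulation. The sketch below replaces the DL-based distinguisher with a much simpler template-distance classifier, and it assumes a hypothetical leakage model (Hamming weight of each SHAKE output byte plus Gaussian noise); the sample count, noise level, and all names are illustrative and are not taken from the experiments in this paper.

```python
import hashlib
import random

N_SAMPLES = 128   # leakage points per trace (hypothetical)
SIGMA = 1.0       # standard deviation of the measurement noise (hypothetical)

def trace(prf_input: bytes, rng: random.Random):
    """Simulated trace: Hamming weight of each PRF output byte plus Gaussian noise."""
    out = hashlib.shake_256(prf_input).digest(N_SAMPLES)
    return [bin(b).count("1") + rng.gauss(0.0, SIGMA) for b in out]

def random_plaintext(rng: random.Random) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(32))

def sq_dist(t, template) -> float:
    return sum((x - y) ** 2 for x, y in zip(t, template))

def profile(ref_plaintext: bytes, rng: random.Random, n: int = 50):
    """Profiling phase on the target itself: mean reference trace and a threshold."""
    ref = [trace(ref_plaintext, rng) for _ in range(n)]
    template = [sum(col) / n for col in zip(*ref)]
    other = [trace(random_plaintext(rng), rng) for _ in range(n)]
    d_ref = sum(sq_dist(t, template) for t in ref) / n
    d_other = sum(sq_dist(t, template) for t in other) / n
    return template, (d_ref + d_other) / 2

def oracle(t, template, threshold) -> int:
    """Returns 1 if the trace looks like the PRF ran on the reference plaintext."""
    return 1 if sq_dist(t, template) < threshold else 0

rng = random.Random(1)
m_ref = b"reference plaintext"
template, threshold = profile(m_ref, rng)
labels = [i % 2 for i in range(200)]       # 1: reference plaintext, 0: random plaintext
hits = sum(
    oracle(trace(m_ref if y else random_plaintext(rng), rng), template, threshold) == y
    for y in labels
)
accuracy = hits / len(labels)
print(accuracy)
```

In a real attack the PRF input is not chosen directly; it is induced by querying the reference ciphertext or ciphertexts that decrypt to unrelated plaintexts. A fixed template also presumes an unmasked leakage model, which is one reason a learned classifier is preferable in practice.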

Attack concept
To describe the underlying idea of the KR-PCA on several prominent lattice-based PKEs, we consider a lattice-based PKE with a simplified notation. Suppose that, in the PKE decryption, the plaintext before decoding is given in the form of $\mathrm{Encode}(m) + ke + e'$, where $k$ is the secret key, $e$ and $e'$ are errors, and Encode is an encoding algorithm with a corresponding decoding algorithm Decode that removes the noise $ke + e'$. Let $c$ be a valid ciphertext corresponding to $\mathrm{Encode}(m) + ke + e'$, which can be computed by the encapsulation. For a lattice-based PKE, the ciphertext is correctly decrypted and decoded to $m$ if the noise $ke + e'$ is less than a threshold value $\gamma$; otherwise, $c$ is decrypted and decoded to another plaintext $\hat{m}$. Lattice-based PKEs are usually designed so that the decryption failure probability is negligibly small for valid ciphertexts.
In a KR-PCA, the attacker queries a modified ciphertext $c' = c + \delta$ to the decryption oracle, where $\delta$ is an error added to the ciphertext. The modified ciphertext is decrypted to $\mathrm{Encode}(m) + ke + e' + \delta$ before decoding, where $ke + e' + \delta$ is the noise to be removed. Let $m'$ be the decoded plaintext. If $ke + e' + \delta < \gamma$, $c'$ is decrypted and correctly decoded to $m$ (i.e., $m' = m$); otherwise (i.e., $ke + e' + \delta \geq \gamma$), $c'$ is decrypted and wrongly decoded to another plaintext $\hat{m}$ (i.e., $m' \neq m$). In other words, $O(c', m) = 1$ if $ke + e' + \delta < \gamma$, and $O(c', m) = 0$ otherwise. Therefore, through adaptive queries to the plaintext-checking oracle, the attacker can find a value of $\delta$ such that $ke + e' + \delta = \gamma$. The attacker then solves this linear equation to recover the secret key $k$, because $e$, $e'$, $\delta$, and $\gamma$ are available to the attacker. Furthermore, the number of oracle accesses needed for a full-key recovery can be reduced by querying a dedicated ciphertext, as mentioned in [BDL+19] and described in the following sections.
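The threshold search in the simplified scheme above can be sketched as follows. The scalar model, the toy values of $\gamma$, $e$, and $e'$, and the restriction to a non-negative secret $k$ are assumptions made only so that the example is self-contained; the oracle is simulated directly from the inequality instead of from side-channel traces.

```python
GAMMA = 1000           # decoding threshold gamma (toy value)
E, E_PRIME = 7, 3      # error terms e and e' known to the attacker (toy values)

def make_oracle(k: int):
    """Simulated O(c', m): returns 1 iff k*e + e' + delta < gamma."""
    counter = {"queries": 0}
    def oracle(delta: int) -> int:
        counter["queries"] += 1
        return 1 if k * E + E_PRIME + delta < GAMMA else 0
    return oracle, counter

def recover_key(oracle) -> int:
    """Binary search for the smallest delta with O = 0, then solve for k."""
    lo, hi = 0, GAMMA      # invariant: oracle(lo) == 1 and oracle(hi) == 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if oracle(mid):
            lo = mid
        else:
            hi = mid
    # At this point k*e + e' + hi == gamma, a linear equation in k.
    return (GAMMA - hi - E_PRIME) // E

oracle, counter = make_oracle(k=42)
print(recover_key(oracle), counter["queries"])
```

Each iteration costs one oracle access, so about $\log_2 \gamma$ queries are needed per secret value, compared with up to $\gamma$ queries for a linear scan over $\delta$.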

FrodoKEM
We herein describe the KR-PCA on FrodoKEM in [GTN20] as a representative and simple case. For simplicity, we omit the detailed descriptions of the attacks on Kyber and Saber because they are broken in a manner similar to FrodoKEM, as described in Section 4.1.3. Some instances of NTRU and NTRU Prime are also broken in a similar manner (we omit the detailed description for them as well), and we describe the implications in attacking them in Section 4.1.4 and Section 4.1.5, respectively.
Let $S$ be the matrix for the secret key. Let $S'$, $E$, $E'$, and $E''$ be the error matrices. When the ciphertext $(c_0, c_1)$ corresponding to a pair of ciphertext matrices $B'$ and $C$ is input to the decryption oracle, the oracle computes the plaintext matrix $M$ as

$$M = C - B'S = \mathrm{Encode}(m) + ES' - E'S + E''.$$

In a KR-PCA, the attacker generates a modified ciphertext consisting of $c_0$ and $c_1'$ corresponding to $C + \Delta$, where $\Delta$ is an error matrix added by the attacker (which corresponds to $\delta$ in Section 4.1.1). When querying $(c_0, c_1')$, the decryption oracle computes

$$M' = C + \Delta - B'S = \mathrm{Encode}(m) + Q, \qquad Q = ES' - E'S + E'' + \Delta.$$

Here, if all elements of $Q$ are less than a threshold $\gamma$, $M'$ is correctly decoded to $m$; otherwise, $M'$ is wrongly decoded to another plaintext $\hat{m}$. Therefore, the attacker can find a $\Delta$ such that $\Gamma = Q$ by adaptively querying $(c_0, c_1')$ to the plaintext-checking oracle, where $\Gamma$ is a matrix all elements of which are the constant $\gamma$. Because all elements in $Q$ except for the secret key $S$ (i.e., $S'$, $E$, $E'$, $E''$, and $\Delta$) are now available, the attacker can recover $S$ by solving the linear equation $\Gamma = Q$ once the attacker obtains $\Delta$.

Algorithm 2 Key-recovery plaintext-checking attack on FrodoKEM
Input: Reference ciphertext $(c_0, c_1)$, reference plaintext $m$, and noise matrices $S'$, $E$, $E'$, and $E''$
Output: Secret key $sk$ (i.e., secret matrix $S$)
1: Function AttackOnFrodoKEM($(c_0, c_1)$, $m$, $S'$, $E$, $E'$, $E''$)
2:   $\Delta \leftarrow \mathrm{ZeroMatrix}(n, \bar{n})$;
3:   for $i = 0$ to $n - 1$ do
4:     for $j = 0$ to $\bar{n} - 1$ do
5:       for each candidate value $\delta$ do
6:         if $O((c_0, c_1^{(i,j,\delta)}), m) = 0$ and $O((c_0, c_1^{(i,j,\delta-1)}), m) = 1$ then
7:           $\Delta_{i,j} \leftarrow \delta$;
8:   Solve linear equation $\Gamma = ES' - E'S + E'' + \Delta$ about $S$;
9:   return $S$;
Algorithm 2 describes the KR-PCA on FrodoKEM. The attacker determines a reference plaintext and the corresponding valid reference ciphertext by performing an encapsulation in advance. At Line 2, we initialize an $n \times \bar{n}$ matrix $\Delta$ as a zero matrix, where $n$ and $\bar{n}$ denote the matrix size in FrodoKEM. We iteratively determine the $(i, j)$-th element of $\Delta$ (denoted by $\Delta_{i,j}$) over the loop of Lines 3-7. At Line 6, we query modified ciphertexts $(c_0, c_1^{(i,j,\delta)})$, where $c_1^{(i,j,\delta)}$ is a ciphertext corresponding to the matrix $C$ with $\delta$ added to its $(i, j)$-th element. If the $(i, j)$-th element of $Q$ is less than $\gamma$, the corresponding plaintext matrix $M'$ is correctly decoded to $m$ (i.e., $O((c_0, c_1^{(i,j,\delta)}), m) = 1$); otherwise, $M'$ is wrongly decoded to another plaintext (i.e., $O((c_0, c_1^{(i,j,\delta)}), m) = 0$). In particular, the $(i, j)$-th element of $Q$ is equal to $\gamma$ if and only if $O((c_0, c_1^{(i,j,\delta)}), m) = 0$ and $O((c_0, c_1^{(i,j,\delta-1)}), m) = 1$; in this manner, the attacker obtains information on $\Delta_{i,j}$ through the plaintext-checking oracle. Once the attacker obtains $\Delta_{i,j}$ for all $i$ and $j$, the attacker recovers the secret matrix $S$ by solving the linear equation $\Gamma = Q$ at Line 8.
In Lines 5-7, a naive search requires at most $\gamma$ oracle accesses to determine $\Delta_{i,j}$ for each pair of $i$ and $j$. However, as Guo et al. mention in [GTN20], it is possible to reduce the number of oracle accesses to $\log \gamma$ by means of a binary search. Thus, Algorithm 2 achieves a full-key recovery with $n\bar{n} \log \gamma$ oracle accesses. In addition, as mentioned in [BDL+19], the number of oracle accesses can be further reduced using a sparse ciphertext matrix. Let us consider $D^{(i)} = [\mathbf{0}, \ldots, \mathbf{0}, \mathbf{1}, \mathbf{0}, \ldots, \mathbf{0}]$, whose $i$-th column is all 1's. Suppose that we query a pair of ciphertext matrices $(D^{(i)}, C)$. In the decryption, we obtain $M = C - D^{(i)}S = C - Z$, where the first row of $Z$ is the $i$-th row of $S$ and the remaining elements are 0. We modify $C$ and check whether the decoded message is 0 or not as the plaintext-checking oracle. For example, let us consider the query $D^{(1)}$ with $C$ whose first row is filled with $q/2^{B+1}$ (where $q$ is the modulus of the ring and $B$ is the bit length of Frodo.Encode) and whose remaining rows are filled with 0's. We have $M$ whose first row is $q/2^{B+1} - S_{0,i}$ for $i = 0, 1, \ldots, n - 1$, which is decoded into 0 if and only if $S_{0,i} > 0$. Thus, the attacker can directly recover the coefficients of the secret matrix $S$ with fewer oracle accesses than the above straightforward attack.
In FrodoKEM.Decaps, the plaintext $m'$ is first computed by PKE.Dec, and then SHAKE is computed for a concatenation of $m'$ and a hash value associated with the public key (denoted by pkh in the FrodoKEM document [A+20]). Since the SHAKE input depends only on $m'$ and the public key, the SHAKE execution in FrodoKEM.Decaps after the PKE decryption is exploitable via the proposed SCA.
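The decoding behaviour behind the sparse-ciphertext query can be checked numerically for a single matrix entry. The sketch below assumes the FrodoKEM-640 parameters ($q = 2^{15}$, $B = 2$, secret coefficients in $[-12, 12]$) and the rounding rule of Frodo.Decode as given in the FrodoKEM specification; the additional shift `t` used to binary-search a coefficient is an illustrative extension and is not taken from the attack described here.

```python
Q = 2**15   # modulus q (FrodoKEM-640, assumed)
B = 2       # bits encoded per matrix entry (FrodoKEM-640, assumed)

def frodo_decode(c: int) -> int:
    """Frodo.Decode for one entry: round(c * 2^B / q) mod 2^B."""
    return (((c % Q) << B) + Q // 2) // Q % (1 << B)

def decodes_to_zero(s: int, t: int = 0) -> bool:
    """One entry of M = C - Z with C entry q/2^(B+1) + t and secret coefficient s."""
    return frodo_decode(Q // 2 ** (B + 1) + t - s) == 0

# With t = 0, the entry decodes to 0 if and only if the secret coefficient is positive.
print([s for s in range(-12, 13) if decodes_to_zero(s)])

def recover_coefficient(s_secret: int, bound: int = 12) -> int:
    """Hypothetical extension: binary search over the shift t (decodes to 0 iff s > t)."""
    lo, hi = -bound - 1, bound          # invariant: s > lo holds, s > hi does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if decodes_to_zero(s_secret, mid):
            lo = mid
        else:
            hi = mid
    return hi
```

In this per-entry model, the answer of the plaintext-checking oracle is a direct comparison between the secret coefficient and an attacker-chosen constant, which is why sparse queries need fewer oracle accesses than solving for $\Delta$ first.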

Kyber and Saber
The proposed SCA can be mounted on Kyber and Saber, as KR-PCAs using sparse ciphertexts and a plaintext-checking oracle, similar to that against FrodoKEM, are known for them. Against Kyber, the proposed SCA can recover the secret key on the basis of the key-recovery attack against Kyber-512 in Round 2 following the approach of Huguenin-Dumittan and Vaudenay [HV20] (precisely, we use the extended version in Xagawa et al. [XIU+21]). Against Saber, the proposed SCA recovers the secret key on the basis of the adaptive attack in Huguenin-Dumittan and Vaudenay [HV20] for LightSaber and the attack by Osumi et al. [OUKT21] for Saber and FireSaber. In all cases, the decrypted plaintext will be 0 or a unit vector, that is, an all-zero string except for a single 1 at the $i$-th position. The plaintext-checking oracle can be implemented using the PRF leakage in the re-encryption, as in the case of FrodoKEM.

NTRU
NTRU has two slightly differing KEM schemes: NTRU-HPS and NTRU-HRSS. In the PKE of both KEMs, the public key is h, the plaintext is a pair of "short" polynomials (r, m), and the ciphertext is $c = h \cdot r + \mathrm{lift}(m)$. We can modify the KR-PCAs by Hoffstein and Silverman [HS99] and Jaulmes and Joux [JJ00] against the original NTRU, in which m is the plaintext and r is the randomness. These key-recovery attacks modify $c = h \cdot r + \mathrm{lift}(m)$ into $c' = c + \delta$ and check whether the half $m'$ of the decrypted plaintext $(r', m')$ is equal to the expected half plaintext $m_{\mathrm{guess}}$, say, 0, or not. We note that the remaining half $r_{\mathrm{guess}}$ of the expected plaintext can be computed from the ciphertext and the expected half plaintext by $r_{\mathrm{guess}} = (c' - \mathrm{lift}(m_{\mathrm{guess}})) \cdot h^{-1}$. The primary hurdle to adapting this attack to NTRU is that NTRU's ciphertext space differs from that of the original NTRU. Thus, δ should also satisfy $\delta \equiv 0 \pmod{(q, x - 1)}$. This constraint complicates the analysis, and therefore, we do not adopt these attacks.
We can also use the KR-PCAs against NTRU-HPS and NTRU-HRSS by [DDS+19] and [ZCQD21], respectively. These key-recovery attacks fix $m_{\mathrm{guess}} = 0$, modify r and $r_{\mathrm{guess}}$, compute $c = h \cdot r$, and check whether the decrypted plaintext $(r', m')$ is equal to the guess $(r_{\mathrm{guess}}, m_{\mathrm{guess}})$ or not. In both attacks, c satisfies $c \equiv 0 \pmod{(q, x - 1)}$ because $h \equiv 0 \pmod{(q, x - 1)}$ by design.
We note that NTRU in Rounds 2 and 3 uses SXY [SXY18] as a variant of the FO transformation; this approach does not involve the computation of $r \leftarrow G(m')$ because the underlying PKE.Enc is deterministic. Furthermore, NTRU does not perform the re-encryption test explicitly. However, NTRU still involves the validity check in the decapsulation, which can be exploited via the framework of the proposed SCA. In addition, (un)fortunately, NTRU's decapsulation program in pqm4 computes both candidate keys $H(r, m)$ and $H_{\mathrm{prf}}(s, c)$ and outputs one of them according to the result of the implicit re-encryption test. In our experiments, we are able to detect whether $m_{\mathrm{guess}} = 0$ or not with high accuracy from the leakage of these computations of H and $H_{\mathrm{prf}}$ or of the procedure corresponding to the validity check (see Section 6 for the details).

NTRU Prime
NTRU Prime has two KEM schemes: sntrup (Streamlined NTRU Prime) and ntrulpr (NTRU LPRime). As NTRU LPRime has a structure similar to Kyber, Saber, and FrodoKEM, it is possible to mount a KR-PCA on it following the approach in [XIU+21], in which the decrypted plaintext is 1 or a vector of the form $1^{i-1}\,0\,1 \cdots 1$ for some i.
Streamlined NTRU Prime has a structure similar to that of NTRU. The plaintext is r and a ciphertext is $c = \mathrm{Round}(h \cdot r)$, where Round(x) rounds each coefficient of x to the nearest element in $3\mathbb{Z}$.¹ However, there are some technical hurdles to adapting conventional KR-PCAs against NTRU (or KR-PCAs similar to the above). For example, Streamlined NTRU Prime's PKE.Dec internally checks the Hamming weight of each decrypted plaintext r and overwrites the decrypted plaintext with the fixed plaintext $r_{\mathrm{fixed}}$ if the test fails. Very recently, Ravi et al. [REB+21] proposed two key-recovery side-channel attacks against Streamlined NTRU Prime, which are inspired by the chosen-ciphertext attacks against NTRU proposed by Jaulmes and Joux [JJ00].
The first attack is based on the "plaintext-checking" oracle, which tests whether the internal variable is 0 or not. The internal decrypted plaintext will be either 0 or some polynomial; in both cases, the Hamming weight will be invalid and the output of the underlying PKE.Dec is $r_{\mathrm{fixed}}$. Thus, in order to implement this "plaintext-checking" oracle, we need to analyze side-channel information of the computation in PKE.Dec(sk, c) rather than that of the PRF, which is outside the scope of this paper.
The second attack is based on the decryption-failure oracle, which tests whether the decrypted plaintext is the intended r or not. If so, the Hamming weight of r will be valid. In contrast, if a decryption failure occurs, the Hamming weight of the decrypted plaintext becomes invalid and it is overwritten by $r_{\mathrm{fixed}}$. Ravi et al. implement the decryption-failure oracle by analyzing side-channel information from the re-encryption test. We can adopt and modify their attack for the proposed SCA, which indicates that the proposed general framework can be instantiated with a decryption-failure oracle for the application to Streamlined NTRU Prime as follows:

Ravi et al.'s decryption failure-based attack and our modification: This attack proceeds in two phases:
1. In the first phase, the attack seeks a δ corresponding to a "1-collision" of the secret key by checking the decryption of $c' = c + \delta$, where $c = \mathrm{Round}(h \cdot r_{\mathrm{valid}})$ for a correct plaintext $r_{\mathrm{valid}}$. If a decryption failure is detected, then we employ δ as $c_{\mathrm{base}}$. They design the structure of δ carefully. We slightly change the structure of δ to boost the probability of obtaining a 1-collision.² We then follow their strategy to design δ and estimate the probability of obtaining an appropriate δ as approximately 1% and 1.5% for sntrup653 and sntrup1277, respectively.³ See [REB+21, Sections 4.1 and 4.2] for the details.
2. In the second phase, the attack queries four ciphertexts obtained by modifying c and $c_{\mathrm{base}}$, and checks whether the decrypted results are $r_{\mathrm{valid}}$ or $r_{\mathrm{fixed}}$ to determine a coefficient of the secret key.
We note that NTRU Prime uses a variant of the FO transformation that does not involve the computation of randomness because the underlying PKE.Enc is deterministic, as in NTRU. (Un)fortunately, NTRU Prime uses the explicit re-encryption test and also appends an additional hash HashConfirm(r, pk) to the ciphertext of the underlying PKE, where $\mathrm{HashConfirm}(r, pk) = \mathrm{Hash}(\texttt{0x02} \,\|\, \mathrm{Hash}(\texttt{0x03} \,\|\, r) \,\|\, \mathrm{Hash}(\texttt{0x04} \,\|\, pk))$ and Hash(z) is the first 32 bytes of SHA-512(z). Thus, the decapsulation algorithm computes $\mathrm{HashConfirm}(r', pk)$ in the re-encryption test, which leaks side-channel information of $r'$ as desired.
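To make the data flow of the confirmation hash concrete, the following minimal sketch computes HashConfirm as described above. It assumes that the three inputs of the outer hash are concatenated byte strings and that r and pk are already available in some byte encoding; the names `r_enc` and `pk_enc` are hypothetical, and the encoding used by the actual NTRU Prime implementation is not reproduced here.

```python
import hashlib


def np_hash(data: bytes) -> bytes:
    """Hash(z): the first 32 bytes of SHA-512(z)."""
    return hashlib.sha512(data).digest()[:32]


def hash_confirm(r_enc: bytes, pk_enc: bytes) -> bytes:
    """HashConfirm(r, pk) with domain-separation prefixes 0x02, 0x03, 0x04.

    r_enc and pk_enc are assumed byte encodings of r and pk (hypothetical).
    """
    inner_r = np_hash(b"\x03" + r_enc)
    inner_pk = np_hash(b"\x04" + pk_enc)
    return np_hash(b"\x02" + inner_r + inner_pk)
```

Because the public key is known to the attacker, the only secret-dependent input of this computation during decapsulation is the decrypted plaintext, which is why its leakage can serve as the oracle.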

HQC
Roughly speaking, HQC has a structure similar to those of the lattice-based KEM schemes Kyber, Saber, FrodoKEM, and NTRU LPRime, even though HQC is based on a coding problem. Hence, we can perform KR-PCAs on HQC with a similar strategy. Indeed, Huguenin-Dumittan and Vaudenay [HV20] give a KR-PCA against HQC in Round 2 by mimicking the attack by Băetu et al. [BDL+19] against another code-based PKE, Lepton [YZ17]. Although HQC changed the parameters and decoder from Round 2 to Round 3, adjusting the parameter setting enables us to perform the KR-PCA; see Xagawa et al. [XIU+21] for details. In their attack, the decrypted plaintext is 0 or a vector of the form $0^{i-1}\,1\,0 \cdots 0$ for some i. As HQC applies SHAKE to the decrypted plaintext in the re-encryption, the plaintext-checking oracle can be enacted using the SHAKE leakage via the proposed SCA.

BIKE
BIKE in Round 3 has a single KEM scheme based on the Niederreiter PKE with quasi-cyclic moderate-density parity-check (QC-MDPC) codes. In [GJS16], Guo et al. give a key-recovery reaction attack (GJS attack) against QC-MDPC [MTSB13], which is a variant of the McEliece PKE with QC-MDPC codes. Roughly speaking, the decryption oracle can be used to recover the distance profile $\mu(h_0)$ of one-half of a secret key, $h_0 \in \mathrm{GF}(2)^n$. The distance profile contains $(d, \mu_d)$ for d = 1, 2, ..., n/2, implying that there are $\mu_d$ pairs of 1's with distance d in $h_0$. Guo et al. report that it is possible in practice to recover $h_0$ from its distance profile $\mu(h_0)$ in the parameter set for 80-bit security. Xagawa et al. [XIU+21] report that the GJS attack [GJS16] against QC-MDPC can be partially applied to BIKE in Round 3 in the presence of the plaintext-checking oracle. Their approach recovers approximately one-quarter of the distance profile in the parameter set for 128-bit security. The decapsulation of BIKE employs the PRF in the re-encryption (i.e., AES and SHA384), which is exploited to implement the plaintext-checking oracle via the proposed framework.
Note that the GJS attack queries multiple ciphertexts (e.g., 2,000 ciphertexts for each d) from crafted invalid plaintexts at random in order to compute (d, µ d ) and checks whether they are decrypted correctly or not; as a result, it is impossible to fix the template plaintext.

Classic McEliece
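The proposed attack is not applicable to Classic McEliece because there is no known adaptive attack on the PKE of Classic McEliece. However, regarding the decapsulation of Classic McEliece, we can realize a plaintext-checking oracle for the PKE because Classic McEliece computes an additional hash $\mathrm{Hash}(2, m')$ in the re-encryption test, as Streamlined NTRU Prime does, which indicates that the proposed attack can be mounted on Classic McEliece if a KR-PCA is discovered.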

Isogeny-based KEM
Hereafter, we propose a new SCA on SIKE focusing on the FO transformation. The proposed SCA is based on an adaptive attack, proposed by Galbraith et al. [GPST16], on Jao and De Feo's supersingular isogeny cryptosystem [JDF11] (namely, supersingular isogeny Diffie-Hellman (SIDH)). We describe a modification of their attack for mounting the proposed SCA on SIKE.Decaps in the proposed framework.
Let $P_A$, $Q_A$, $P_B$, and $Q_B$ be the public generator points on $E_0$, where $E_0$ is the starting Montgomery curve over $\mathbb{F}_{p^2}$ with $p = 2^{e_A} 3^{e_B} \pm 1$ (in SIKE in Round 3, $E_0$ is defined as $y^2 = x^3 + 6x^2 + x$). Let $sk_2$ and $sk_3$ be Alice's and Bob's secret keys, respectively. Let $R_A$ and $R_B$ be the secret points generating the finite cyclic groups that form the kernels of Alice's and Bob's isogenies $\phi_A$ and $\phi_B$, respectively. Likewise, let $pk_2$ and $pk_3$ be Alice's and Bob's public keys, respectively. Let $E_A = E_0 / \langle R_A \rangle$ be Alice's public curve isogenous to $E_0$ with regard to Alice's isogeny $\phi_A$ with kernel $\langle R_A \rangle$, and let $\mathit{PA} = \phi_A(P_B)$ and $\mathit{QA} = \phi_A(Q_B)$ denote Bob's public points on $E_A$ (calculated by Alice). As $sk_3$ acts as the secret key in SIKE, the goal of an adaptive attacker is to recover $sk_3$ by adaptively querying ciphertexts to the decryption oracle. Note that, in SIKE, Alice corresponds to the sender (and the attacker in the proposed SCA) and Bob corresponds to the receiver who performs the key generation (and the victim).
At SIKE.Encaps, Alice computes her secret point $R_A$ and public curve $E_A$ (and $\mathit{PA}$ and $\mathit{QA}$) to generate a reference ciphertext $(c_0, c_1)$, where the reference j-invariant for the ciphertext corresponds to a curve $E_0 / \langle R_A, R_B \rangle$. Here, $E_0 / \langle R_A, R_B \rangle$ denotes the shared curve isogenous to $E_0$ with regard to an isogeny whose kernel is the finite group $\langle R_A, R_B \rangle$, and m is a random number drawn from $U(\{0, 1\}^n)$ with n ∈ {128, 192, 256}.
In the adaptive attack, we consider a secret key with a ternary digit representation $sk_3 = \sum_{i=0}^{e_B - 1} \beta_i 3^i$, where $\beta_i \in \{0, 1, 2\}$. The adaptive attack performs the key recovery from the least significant ternary digit up to the most significant ternary digit in an iterative manner. Let us consider the recovery of the i-th ternary digit (i.e., $\beta_i$), supposing that the attacker has already recovered up to the (i − 1)-th digit, that is, $\beta_0, \beta_1, \ldots, \beta_{i-1}$ (the attack description includes the initial case, namely i = 0). Let $K_i = \sum_{k=0}^{i-1} \beta_k 3^k$ be the recovered part of the secret key. The attacker generates the modified ciphertexts $(c_0^{(\tau,i)}, c_1)$ for τ ∈ {0, 1, 2}, where $\mathit{PA}$ and $\mathit{QA}$ in $c_0$ are replaced with $P^{(\tau,i)}$ and $Q^{(\tau,i)}$, respectively. Then, let $R_{AB} = \mathit{PA} + [sk_3]\,\mathit{QA}$ be the cyclic group generator of the isogeny kernel at SIKE.Decaps corresponding to the reference ciphertext $(c_0, c_1)$. In other words, the decryption oracle correctly calculates the cyclic group generator as $R_{AB}$ for a valid ciphertext. On the other hand, when querying $(c_0^{(\tau,i)}, c_1)$ to the decryption oracle, the generator of the cyclic group is calculated as $R_{AB}^{(\tau,i)} = P^{(\tau,i)} + [sk_3]\,Q^{(\tau,i)}$, and the j-invariant is then computed for the curve with this kernel.

Algorithm 3 Key-recovery plaintext-checking attack on SIKE
Input: Reference ciphertext $(c_0, c_1)$ and reference plaintext m
Output: Secret key $sk_3$
1: Function AttackOnSIKE($(c_0, c_1)$, m)
2:   $K_0 \leftarrow 0$;
3:   for i = 0 to $e_B - 1$ do
4:     for each τ ∈ {0, 1, 2} do
5:       $P^{(\tau,i)} \leftarrow$ …
…
8:       if $O((c_0^{(\tau,i)}, c_1), m) = 1$ then
9:         …

The relation between $R_{AB}^{(\tau,i)}$ and $R_{AB}$ follows from the fact that the order of $\mathit{QA}$ is $3^{e_B}$. Thus, if $R_{AB}^{(\tau,i)} = R_{AB}$ (i.e., $\tau = \beta_i$), the PKE decryption result is equal to the reference plaintext; otherwise, the PKE decryption result is different from the reference plaintext. Therefore, the attacker can obtain the i-th ternary digit of the secret key (i.e., $\beta_i$) via a plaintext-checking oracle. Thus, the attack recovers $\beta_i$ in an iterative manner using a decryption oracle that tells whether the j-invariant of $(c_0^{(\tau,i)}, c_1)$ is equal to that of the reference ciphertext $(c_0, c_1)$, and the full-key recovery is completed with a number of oracle accesses linear in $e_B$.
Algorithm 3 illustrates the KR-PCA on SIKE. In SIKE.Decaps, the j-invariant value depends only on $c_0$ (but not $c_1$), and the PKE decryption result is always identical for a fixed j-invariant and $c_0$. Therefore, we can realize the plaintext-checking oracle for SIKE through the side-channel leakage from G (i.e., SHAKE) to distinguish whether the input to G is the reference plaintext m or not. Algorithm 3 uses the plaintext-checking oracle O (i.e., the side-channel distinguisher herein) at Line 8. Thus, the number of distinguisher calls needed to carry out a full-key recovery is at most $3e_B$. Note that it can be reduced to $2e_B$ because the attacker knows $\beta_i = 2$ without querying $(c_0^{(2,i)}, c_1)$ if $O((c_0^{(0,i)}, c_1), m) = O((c_0^{(1,i)}, c_1), m) = 0$. Instead of G at the decapsulation, we can also use the SHAKE leakage inside the PKE decryption (i.e., $\mathrm{SHAKE}(j(E_0 / \langle R_A, R_B \rangle))$, which is denoted by F in the SIKE documentation [J+20]). Note also that this attack can be readily extended to general SIKE over $\mathbb{F}_{p^2}$ with $p = A^{e_A} B^{e_B} f \pm 1$ by replacing the base of coefficients (i.e., "3" in the above equations and Algorithm 3) with B and examining τ from {0, 1, ..., B − 1}.
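The digit-by-digit structure of the attack, including the reduction from $3e_B$ to $2e_B$ oracle accesses, can be sketched as follows. This is only an illustration of the control flow: the construction of the modified ciphertexts is not modeled, and `oracle(tau, i, known)` is a hypothetical stand-in for one call to the plaintext-checking oracle on the ciphertext crafted for the guess τ at digit i. The simulated oracle `make_toy_oracle` is likewise an assumption used only to exercise the loop.

```python
def recover_ternary_key(oracle, e_b):
    """Recover ternary digits beta_0 .. beta_{e_b-1}, least significant first.

    oracle(tau, i, known) is a hypothetical plaintext-checking oracle that
    returns 1 iff the query built for guess tau at position i (given the
    already recovered part `known`) decrypts to the reference plaintext.
    Returns (recovered_key, number_of_oracle_calls).
    """
    known, queries = 0, 0
    for i in range(e_b):
        digit = 2  # inferred without a query when tau = 0 and tau = 1 both fail
        for tau in (0, 1):
            queries += 1
            if oracle(tau, i, known) == 1:
                digit = tau
                break
        known += digit * 3**i
    return known, queries


def make_toy_oracle(secret):
    """Simulated oracle that answers from a known secret (illustration only)."""
    def oracle(tau, i, known):
        if known != secret % 3**i:   # earlier digits must already be correct
            return 0
        return 1 if (secret // 3**i) % 3 == tau else 0
    return oracle
```

For example, `recover_ternary_key(make_toy_oracle(65), 4)` recovers the key 65 (digits 2, 0, 1, 2 from least to most significant) with 7 oracle calls, within the bound of 2 × 4 = 8.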

Complexity analysis
Table 3 lists the number of oracle accesses required for the proposed SCA to recover the full key of the NIST PQC third-round candidates for KEM. For simplicity, Table 3 lists only the results for instances with security levels equivalent to AES128 and AES256 (i.e., NIST security levels 1 and 5, respectively).
From Table 3, we confirm that the key recovery can be achieved with a feasible number of oracle accesses. Although BIKE level 1, as the hardest case, requires 3M oracle accesses for a partial-key recovery, most KEMs can be broken within 60,000 oracle accesses. Here, Kyber, Saber, NTRU, NTRU Prime, and SIKE are all less complex than the code-based KEMs, possibly because the number of secret coefficients to be recovered via the plaintext-checking oracle is greater for the code-based KEMs. Nevertheless, the proposed SCA would still be feasible even on the code-based KEMs, as some modern SCAs are evaluated with (on the order of) more than 10M or 100M traces (e.g., [SM15, SM19, SBM19]).
Relative to previous CCA SCAs on lattice-based KEMs (e.g., [XPRO20, RBRC20, SKL+20, NDGJ21]), the proposed attack may require more oracle accesses. This would be because some previous SCAs exploit scheme/implementation-specific aspects for improved efficiency in terms of the number of traces, whereas the proposed SCA is applicable to (relatively) black-box implementations. Furthermore, the attack by Ravi et al. [RRCB20] can also perform full-key recoveries of Kyber, Saber, and FrodoKEM with a complexity comparable to the proposed SCA. We stress that the primary advantage of the proposed attack is its generality, as the proposed SCA is applicable to many lattice-, code-, and isogeny-based KEMs. Owing to the high applicability of adaptive attacks using the plaintext-checking oracle, the proposed SCA offers a higher degree of generality for KEMs based on the FO transformation and its variants.

Table 3 (fragment): 478 (= 2 × 239). † Denotes 1,500 distance profiles out of 6,162 full-distance profiles for Level 1. It is difficult to reliably recover more key bits than this using the proposed SCA.

Side-Channel Distinguisher Design
This section describes the design of a DL-based side-channel distinguisher. In recent years, several studies have evaluated and demonstrated the significant advantage of DL in carrying out profiling SCAs (e.g., [BPS+18, KPH+19, PCP20, ZBHV20, WAGP20]). In a DL-based profiling SCA, a trained NN is used to estimate an intermediate value (or its Hamming weight/distance) from side-channel leakage, and the secret key is estimated using the likelihood from the NN output (i.e., the occurrence probability of the intermediate value or its Hamming weight/distance). Therefore, the conventional DL-based SCA on AES usually utilizes an NN with nine outputs at the output layer for Hamming weight/distance classification or 256 outputs for intermediate value classification.
Previous studies have developed many NN models to efficiently perform the key recovery with fewer traces (e.g., [ZBHV20, WAGP20]). In the following experiment, we employ a convolutional NN (CNN) and a multilayer perceptron (MLP), as the practicality and effectiveness of CNNs and MLPs have been shown in several previous studies on DL-based SCA. The CNN/MLP for our experiment is designed to have sufficient model capacity for application to various PRF implementations, including software and hardware with and without a masking countermeasure. The proposed SCA should perform a binary classification of whether the input to the PRF is the reference plaintext or not; therefore, we construct a CNN/MLP model whose output layer has a single output with a sigmoid activation function. Note that it may be possible to exploit a conventional NN presented in previous studies on DL-based SCA for the proposed attack by means of fine tuning or transfer learning. However, in this paper, we use a standard CNN/MLP model for generality, as some conventional NNs for DL-based SCA are specific to target implementations (e.g., [WAGP20]).
For a successful key recovery, we require a very accurate model to realize a perfect oracle, because an error in the oracle would render the recovered key critically incorrect. However, the accuracy of an NN model used for SCA is occasionally non-negligibly low owing to the presence of noise and SCA countermeasures [ISUH21]. To improve the oracle accuracy realized by the model, a simple method is to use multiple traces for one plaintext-checking oracle. More concretely, the attacker acquires t traces for a modified ciphertext $c'$, performs an inference for each trace, and then estimates the PRF input as the majority vote of the inference results. Let a be the accuracy of the model. If using t traces (t odd) for an oracle, the expected accuracy of the oracle realized by such a majority vote of multiple NN outputs, denoted by $\alpha_t$, is given by

$$\alpha_t = \sum_{k=\lceil t/2 \rceil}^{t} \binom{t}{k} a^k (1 - a)^{t-k}. \quad (1)$$

We can determine t such that the success rate of the overall attack is larger than a threshold σ; that is, $\sigma \le \alpha_t^u$, where u is the number of required oracle accesses shown in Table 3. However, such a majority vote considers the NN output to be a binary value, although the NN output is given as a probability of the PRF input being the reference plaintext; this suggests that the majority vote does not fully exploit the advantages of the NN-based distinguisher. As an efficient alternative, we can determine the plaintext-checking oracle output by means of the likelihood comparison, in which the label is determined according to the negative log likelihood (NLL) for a hypothetical oracle output of b ∈ {0, 1} as

$$\mathrm{NLL}(b) = -\sum_{s=1}^{t} \log q(b \mid x_s; \theta), \quad (2)$$

where q denotes a probability distribution parameterized by θ (i.e., the trained NN), and $x_s$ is the s-th side-channel trace. Because such a method exploits the NN output more effectively than the above majority vote, it can enact the plaintext-checking oracle more accurately. Actually, if $q(b \mid x_s; \theta)$ has been sufficiently trained and approximates the true distribution, such a likelihood ratio test (herein, NLL comparison) becomes the most powerful test according to the Neyman-Pearson lemma [NP33]. One major drawback of this method is that it is quite difficult to evaluate the resulting accuracy in an analytical manner; therefore, we experimentally evaluate its effectiveness and practicality.
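The two decision rules discussed above can be compared with a short sketch. It assumes that the t inferences are independent with a common accuracy a, so that the majority-vote accuracy is a binomial tail sum, and that each NN output is the estimated probability that the PRF input is the reference plaintext. The function names and the clipping constant are illustrative choices, not part of the evaluated implementation.

```python
import math


def majority_vote_accuracy(a, t):
    """Probability that more than half of t independent votes are correct."""
    if t % 2 == 0:
        raise ValueError("t must be odd to avoid ties")
    return sum(math.comb(t, k) * a**k * (1 - a)**(t - k)
               for k in range(t // 2 + 1, t + 1))


def min_traces_for_success(a, u, sigma, t_max=999):
    """Smallest odd t with majority_vote_accuracy(a, t)**u >= sigma, or None."""
    for t in range(1, t_max + 1, 2):
        if majority_vote_accuracy(a, t) ** u >= sigma:
            return t
    return None


def nll_decision(probs, eps=1e-12):
    """Return the label b in {0, 1} with the smaller negative log likelihood.

    probs[s] is the NN output for trace s, read as q(b = 1 | x_s).
    """
    nll1 = -sum(math.log(max(p, eps)) for p in probs)
    nll0 = -sum(math.log(max(1.0 - p, eps)) for p in probs)
    return 1 if nll1 < nll0 else 0
```

As a usage example, the outputs `[0.45, 0.45, 0.99]` yield 0 under a majority vote (two of three outputs fall below 0.5), whereas `nll_decision` returns 1 because the single confident output outweighs the two near-neutral ones. This is the sense in which the likelihood comparison uses more of the information in the NN output.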

Experimental setup
In the following experiments, we employed CUDA 11.0, cuDNN 8.0.5, Tensorflow-gpu 2.4.1, and Keras 2.4.0 on an Intel Xeon W-2145 3.70 GHz and NVIDIA GeForce GTX 2080 to carry out the NN training. The learning rate was 0.001, the batch size was 32, and the number of epochs was 100. Table 4 summarizes the hyperparameters of the CNN for traces with 1,000 sample points, where the top and bottom columns denote the input and output layers, respectively, and the remaining hidden layers are connected in ascending order from the input to the output. In the "Input" row, $S_1 \times S_2$ denotes the input shape, $S_1$ is the trace size, and $S_2$ is the input dimension. In the "Operator" row, conv1d(F) denotes the operation at each layer and F is the filter size. (a) For non-protected software, non-protected hardware, and masked hardware implementations, we employ a CNN comprising six convolutional layers Conv1, Conv2, ..., and Conv6 followed by three fully connected layers FC1, FC2, and FC3, as the effectiveness of CNNs in DL-based SCA has been shown in many previous studies (e.g., [BPS+18, KPH+19, PCP20, ZBHV20, WAGP20]). (b) For the masked software implementation, we employ an MLP that consists of five fully connected layers FC1, FC2, ..., and FC5, as we expect that a cascade of fully connected layers may exploit multivariate leakages of masked software more efficiently than a CNN. The output layer (i.e., (a) FC3 and (b) FC5) has one output for binary classification. Given a side-channel trace, the CNN/MLP outputs the probability that the plaintext (i.e., PRF input) is equal to the reference plaintext (or, conversely, another plaintext).
Table 5 lists the experimental implementations and the numbers of traces used for the experiments.⁵ We employ the following five PRF implementations: non-protected AES and SHAKE software, non-protected AES hardware, masked AES software, and masked AES hardware. For masked implementations, we target a bit-sliced software implementation and a TI-based hardware implementation, which are among the most promising first-order masking schemes for software and hardware, respectively. We use major publicly available open-source implementations of non-protected software/hardware and masked software for reproducibility. In contrast, the masked hardware was implemented by ourselves according to the paper [UHA17], because there is no publicly available TI-based masked hardware. The source code of the masked hardware is included in our repository. Although SHAKE is more commonly used than AES as a PRF/PRG in KEMs, we target masked AES software/hardware since many SCA countermeasures for symmetric key primitives have been developed with consideration of and application to AES rather than SHAKE (as far as we know, there is no publicly available masked SHAKE implementation). However, there would be little difference in the attack results produced by AES and SHAKE if they are protected using the same masking scheme.
To evaluate the performance of the distinguish attack (i.e., the model accuracy for enacting the plaintext-checking oracle), the side-channel traces are given in a manner similar to the fixed-vs.-random TVLA as follows: For AES, the plaintext is given by a fixed or random value to be distinguished, and the key is fixed for both fixed and random plaintexts.⁶ For SHAKE, the input is given in the same manner as the plaintext for the AES case.
For masked implementations, we explicitly use traces during only masked operations, and we exclude timings corresponding to the initial masking. As there are some end-to-end masked implementations that mask all decapsulation procedures including PKE.Dec, re-encryption, and equality check [OSPG18, BPO+20], we exploit the leakage from only masked operations in attacking such an implementation. More precisely, for the masked software implementation, we set a trigger just before the start of the first round after the initial masking, and use traces during masked AES operations. For the masked hardware implementation, we acquire the traces during the masked first-round computation.

Accuracy evaluation
Table 6 reports the accuracy of the trained NN on the test sets. From Table 6, we can confirm that the trained NN achieved a meaningfully high accuracy to carry out the proposed SCA except for masked hardware. For the non-protected software and hardware implementations, the NN model achieves a 99.8% and 99.9% test accuracy, respectively. In addition, the NN model achieves a 96.0% accuracy even for the masked software. The masking countermeasure reduces the performance of the distinguish attack. However, for software implementation, the mitigation would not be sufficient to prevent the attack using multiple traces. In contrast, it is difficult to distinguish the input of masked hardware due to the advantage of TI.

⁶ In using AES as an XOF for modern KEMs (e.g., Kyber, NTRU LPRime, and BIKE), AES frequently works in the CTR mode, in which the plaintext is frequently fixed and the key is the payload. However, we set the plaintext as the payload and fixed the key in the experiment. This is because we intended to validate the proposed attack in a manner more severe to the attacker, such that the evaluation becomes general for various modes of operation and masking implementations. For example, some masked AES implementations (e.g., the masked AES software in [SS16] and hardware in [UHA17], which are used in our experiment) do not protect the key scheduling part because it causes no DPA leakage, although it causes an exploitable leakage for the distinguish attack. Therefore, for a general and severe evaluation, our experiment aims at exploiting the leakage only from the round function part, which always causes an exploitable leakage independently of the mode of operation and masking implementation. (More precisely, if the plaintext is the payload and the key is fixed, only the round function part is a leakage source, and the key scheduling part is not, even for the distinguish attack, because the key scheduling part always processes an identical value for a fixed key. By contrast, if the key is a seed and the plaintext is fixed, both the round datapath and the key scheduling datapath are leakage sources for the distinguish attack. Thus, the experiment validates the proposed attack in a more general and severe manner, as the experiment is harder for the attacker than in practice.)
We then evaluate the oracle accuracy when using the majority vote and the likelihood comparison. The majority vote is evaluated analytically using Eq. (1) for odd numbers of traces, whereas the likelihood comparison is evaluated experimentally by repeatedly using shuffled test data. More precisely, to evaluate the accuracy of the likelihood comparison using t traces, we repeat the following procedure 10,000 times: we randomly determine $b_{\mathrm{true}}$ = 0 or 1; obtain t traces from the test set for the reference plaintext if $b_{\mathrm{true}}$ = 1, or for random plaintexts otherwise (i.e., if $b_{\mathrm{true}}$ = 0); calculate the NLL in Eq. (2) for each hypothetical oracle output b = 0 or 1; take as the oracle output the value of b with the smaller NLL; and examine whether it equals $b_{\mathrm{true}}$. The distinguisher with a likelihood comparison can achieve a 100.0% test accuracy with at least 2, 2, and 5 traces for non-protected software, non-protected hardware, and masked software, respectively. In contrast, the majority vote requires 5, 5, and 11 traces for a 99.999% accuracy⁷ for non-protected software, non-protected hardware, and masked software, respectively. Thus, we can confirm that the likelihood comparison is more accurate and effective than the majority vote, as expected from the Neyman-Pearson lemma, and the trained NNs can achieve a sufficient accuracy for the key recovery except for masked hardware. In contrast, for the masked hardware, we find that we cannot achieve a high accuracy (more than 99.999%) with fewer than 5,000 traces in our environment, which indicates that the proposed attack cannot break masked TI-based hardware using our NN model.
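The repeated evaluation procedure for the likelihood comparison can be sketched as a small Monte Carlo loop. The sketch below does not use real traces or a trained NN: `toy_output` is a hypothetical stand-in that draws a noisy NN output for a given true label, with parameters chosen only for illustration, and the resulting accuracies are therefore not comparable to the measured values reported above.

```python
import math
import random


def evaluate_nll_oracle(sample_output, t, trials=10_000, seed=0):
    """Estimate the accuracy of the NLL-based oracle when using t traces.

    sample_output(b_true, rng) returns one NN output, read as the estimated
    probability that the PRF input is the reference plaintext.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        b_true = rng.randint(0, 1)
        probs = [sample_output(b_true, rng) for _ in range(t)]
        nll = {
            b: -sum(math.log(p if b == 1 else 1.0 - p) for p in probs)
            for b in (0, 1)
        }
        b_hat = min(nll, key=nll.get)   # label with the smaller NLL
        correct += int(b_hat == b_true)
    return correct / trials


def toy_output(b_true, rng):
    """Hypothetical noisy NN output centred at 0.7 (b = 1) or 0.3 (b = 0)."""
    p = rng.gauss(0.7 if b_true == 1 else 0.3, 0.2)
    return min(max(p, 0.01), 0.99)   # keep the logarithms finite
```

Calling `evaluate_nll_oracle(toy_output, 1)` and `evaluate_nll_oracle(toy_output, 5)` shows the expected trend that combining several traces raises the oracle accuracy above the single-trace accuracy.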

Evaluation of number of traces for successful key recovery
Table 7 lists the number of side-channel traces required for a successful key recovery when using the side-channel distinguisher evaluated in the previous subsection (except for masked hardware). For non-protected implementations and masked software, we adopt the distinguisher based on the likelihood comparison with 2 and 5 traces, respectively, according to the evaluation in Section 6.2. The results listed in Table 7 assume that the oracle enacted by the side-channel distinguisher is completely accurate if it achieves a 100.0% test accuracy.
If a larger number of traces is needed to enact an accurate oracle, more traces are obviously required for key recovery. However, our experimental results reveal that the attack is still feasible on KEMs even if the PRF implementation is masked in software, given that modern SCA evaluations are conducted with more than 10M or 100M traces (e.g., [SM15, SM19, SBM19]). At least, first-order masking countermeasures in software are not an essential solution to counter the proposed SCA. Evaluation of the DL-based distinguish attack on higher-order masked implementations will be the subject of important future work.
In contrast, Table 7 does not include the results on TI-based masked hardware, as we could not achieve a sufficient accuracy for mounting the KR-PCA.Thus, TI would be an effective countermeasure against the proposed attack.However, even TI may be broken owing to the development of DL-based SCAs on masked hardware.In fact, most existing studies on DL-based SCA focus on masked software implementation, whereas very few

Conclusion
This paper presented a generic power/EM attack methodology targeting KEMs based on the FO transformation and its variants using a plaintext-checking oracle. The proposed SCA exploits the side-channel leakage during PRF execution in re-encryption to realize a plaintext-checking oracle, namely, to distinguish whether the PRF input is equal to the reference plaintext or not. We demonstrated that all KEMs in the NIST PQC third-round candidates except for Classic McEliece are vulnerable to the proposed attack. We also presented a DL-based side-channel distinguisher design, which was demonstrated through experimental attacks on various PRF implementations, including implementations protected by a masking countermeasure. Our results confirm that the proposed SCA can perform key recoveries on many KEM implementations, even if the PRF implementation is protected in software. Meanwhile, we also confirm that the proposed attack was not successful on masked TI-based hardware in our environment, which would be an effective countermeasure against the proposed attack as well as existing key-recovery SCAs.
The proposed attack has two significant advantages: the generality and applicability.First, the proposed attack realizes a plaintext-checking oracle through a side-channel leakage of PRF in the re-encryption.Since many PKEs are known to be vulnerable to an adaptive attack using the plaintext-checking oracle, the proposed attack can be generally applied to these KEM schemes.In addition, the PRF and equality/validity check play an essential role in KEMs with a CPA-to-CCA-secure transformation.Although some CPA-to-CCA-secure transforms do not perform a complete re-encryption, the proposed SCA would be applicable even to such variants of FO transformation as long as they employ PRF and/or procedure corresponding to the validity check.Second, the proposed SCA does not require the detailed knowledge of target implementation and therefore can be applied to (relatively) black-box implementations.Thus, in implementing a KEM, we should be aware of the proposed attack if an adaptive attack on the underlying PKE is known and the application can be threatened by power/EM SCA.
In the scenario of SCA on KEM.Decaps, the attacker can perform profiling using the target device itself without the secret key, suggesting that KEM implementations should be resistant to profiling attacks, including DL-based ones, to counter the proposed SCA. Evaluating higher-order masking against the proposed attack and developing an effective countermeasure are important future work for realizing a secure KEM implementation. Likewise, evaluating the capability of DL-based SCA on masked hardware is an important subject for validating the security of masked TI-based hardware against the proposed attack. We are also planning to investigate the applicability of the proposed SCA to KEMs other than the NIST PQC third-round candidates.

where Frodo.Encode(m) denotes the encoded plaintext (or initial seed). The corresponding Frodo.Decode(M) obtains m by removing the noise ES − E S + E (which corresponds to ke + e in Section 4.1.1).
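The encode/decode step described above can be sketched as follows. This is a simplified illustration operating on a flat list of coefficients rather than on matrices; the function names `encode` and `decode` are hypothetical, and the parameters q = 2^15 and B = 2 are taken to match FrodoKEM-640 as an assumption. Each B-bit value is placed in the most significant bits of a coefficient modulo q, and decoding rounds to the nearest multiple of q/2^B, which removes any additive noise e with -q/2^(B+1) <= e < q/2^(B+1).

```python
Q = 2**15  # modulus q (assumed to match FrodoKEM-640)
B = 2      # number of plaintext bits carried by each coefficient

def encode(values):
    """Map each B-bit value v to v * q / 2^B (mod q)."""
    return [(v * (Q >> B)) % Q for v in values]

def decode(coeffs):
    """Round each coefficient to the nearest multiple of q / 2^B
    and return the corresponding B-bit value."""
    return [((c * (1 << B) + Q // 2) // Q) % (1 << B) for c in coeffs]

def add_noise(coeffs, e):
    """Add the same noise value e to every coefficient (mod q)."""
    return [(c + e) % Q for c in coeffs]
```

With these parameters the decoding threshold is q/2^(B+1) = 4096: `decode(add_noise(encode([3]), 4095))` still returns the encoded value, whereas a noise of 4096 pushes the coefficient across the rounding boundary and changes the decoded result. Noise terms crossing such a boundary are what make the decryption outcome depend on the secret key.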
be the recovered part of the secret key. The attacker generates the modified ciphertexts (c (τ,i) 0

Table 1: Applicability of implementation attacks focusing on FO transformation to NIST PQC third-round KEM candidates and their potential countermeasure/mitigation

• We investigate the applicability of the proposed attack to NIST PQC third-round KEM candidates (four finalists and five alternatives), and demonstrate that Kyber, Saber, FrodoKEM, NTRU, NTRU Prime, HQC, BIKE, and SIKE are vulnerable to the proposed SCA. The proposed attack achieves a partial key recovery of BIKE. Its applicability to Classic McEliece remains unclear as no adaptive chosen-ciphertext attack on Classic McEliece is known. The applicability of the proposed and conventional attacks is summarized in Table

Algorithm 1: CCA-secure KEM based on FO transformation (KeyGen, Encaps, Decaps)

Table 2: Summary of variants of FOs in NIST PQC Round 3 KEM Candidates (finalists and alternates). Before version 4.2, BIKE's G uses SHA384 and AES256-CTR. For a function Hash, Hash (x) will output the first bits of Hash(x). SHA3-512 r and SHA3-512 l

proposed attack is not applicable to Classic McEliece because there is no known adaptive attack on the PKE of Classic McEliece. However, regarding the decapsulation of Classic McEliece, we can realize a plaintext-checking oracle for the PKE because Classic McEliece computes an additional hash Hash(2, m ) in the re-encryption test, as Streamlined NTRU Prime does, which indicates that the proposed attack can be mounted on Classic McEliece if a KR-PCA is discovered.

Table 3: Number of oracle accesses required by the proposed attack for key recovery (except for Classic McEliece)

Table 4: NN hyperparameters. (a) For non-protected software, non-protected hardware, and masked hardware implementations

Table 5: Experimental implementations and numbers of used traces

Table 6: Accuracy of NN to distinguish PRF input

Table 7: Number of side-channel traces required for a successful proposed attack (partial-key recovery for BIKE, and except for Classic McEliece)

literature demonstrates attack on masked hardware. The investigation of DL-based SCAs on masked hardware is also important future work.