Attacking and Defending Masked Polynomial Comparison for Lattice-Based Cryptography

In this work, we are concerned with the hardening of post-quantum key encapsulation mechanisms (KEM) against side-channel attacks, with a focus on the comparison operation required for the Fujisaki-Okamoto (FO) transform. We identify critical vulnerabilities in two proposals for masked comparison and successfully attack the masked comparison algorithms from TCHES 2018 and TCHES 2020. To do so, we use first-order side-channel attacks and show that the advertised security properties do not hold. Additionally, we break the higher-order secured masked comparison from TCHES 2020 using a collision attack, which does not require side-channel information. To enable implementers to spot such flaws in the implementation or underlying algorithms, we propose a framework that is designed to test the re-encryption step of the FO transform for information leakage. Our framework relies on a specifically parametrized t-test and would have identified the previously mentioned flaws in the masked comparison. Our framework can be used to test both the comparison itself and the full decapsulation implementation.


Introduction
Conventional public-key cryptography, like RSA and ECC, suffers from an ever-increasing threat as large-scale quantum computers advance closer to reality. As a consequence, a public effort to standardize post-quantum cryptography (PQC) was initiated by NIST in 2017 [NIS16]. With the NIST process recently reaching the third round, implementation security and protection against side-channel or fault attacks is emerging as an important criterion for standardization [AASA+20].
So far, several works have shed light on the side-channel vulnerabilities of lattice-based PQC schemes. Authors exploit vulnerabilities at different levels, including but not limited to, building blocks like the number theoretic transform (NTT) [PPM17], message encoding [ACLZ20], or the Fujisaki-Okamoto (FO) transform [DTVV19, GJN20, RRCB20]. A commonly applied countermeasure to protect implementations against side-channel attacks is to mask every sensitive step of the algorithm. An important observation is that lattice-based key encapsulation mechanisms (KEM) achieve semantic security with respect to chosen-ciphertext attacks (CCA) by application of the FO transform [FO99, HHK17]. Even though the additional operations in this transform do not directly process secret information, they still have to be protected against leakage, as the processed data depends on the secret key. The importance of this aspect has recently been demonstrated by side-channel attacks [DTVV19, RRCB20] on lattice-based schemes and more recently a timing attack [GJN20] on FrodoKEM [NAB+20].
Besides reports of first-order masked implementations of a scheme similar to NewHope in [OSPG18] and Saber in [BDK+20], the need to protect the FO transform has also led to works that specifically look into building blocks and also their higher-order security. While arithmetic masking to protect arithmetic operations in the polynomial ring R_q = Z_q[X]/(X^n + 1) can be scaled to higher orders easily due to the linearity of the operation, it is not trivial to achieve this property for polynomial sampling [SPOG19] or the final comparison step [BPO+20] of the FO transform.
In short, securing the FO transform and the required hashing, re-encryption, and comparison is non-trivial. And while works like [OSPG18, BPO+20] provide a security proof, it is still important to check the correctness of their assumptions and arguments. This aspect is reinforced by the fact that small leakages or inconsistencies in the security reasoning can already be devastating. In many scenarios (e.g. [DTVV19, RRCB20, GJN20]) an attacker can retrieve the full secret key from any information on whether data processed by the FO transform differs between different input ciphertexts.

Contributions
In this work, we focus on flaws in previously proposed masked comparisons that were used as part of the FO transformation. We show that the approach for a first-order masked comparison from [OSPG18] leads to a vulnerable implementation. Moreover, we describe two attacks on the higher-order masked comparison from [BPO+20]. One is a collision attack that does not require any side-channel information. The second attack breaks the scheme by a first-order side-channel attack. We show that all these attacks allow an adversary to efficiently retrieve the secret key by applying them to a Kyber implementation that uses the aforementioned vulnerable masked comparisons.
As a result, we conclude that the masked comparison should be an atomic operation applied to all inputs, such that no comparison result on a subset of coefficients is leaked. We then fix [OSPG18] using the strategy of [BDK+20], and we propose a correction of the flaw in the [BPO+20] scheme.
We also highlight an issue in the leakage testing done by the original authors [OSPG18, BPO+20], namely the existence of non-malevolent output leakage. This non-malevolent leakage can obfuscate real leakage, making it harder to capture real vulnerabilities such as the ones reported in this paper. In response, we introduce a framework that is explicitly designed for leakage testing in security-critical parts of the FO transform. To validate our approach, we show that our framework makes it easier to catch the problems present in [OSPG18] and [BPO+20].

Preliminaries
In this section, we introduce our notation and provide details on the Kyber KEM. In addition, we recall previous work on masked implementations of lattice-based KEMs and the protection of the comparison operation in the FO re-encryption against side-channel attacks.

Notation
For x ∈ R, we write ⌈x⌋ to mean the closest integer to x (where ⌈y + 1/2⌋ := y + 1 for y ∈ Z). For a, b ∈ Z, we write a mod(+) b for the unique integer â ≡ a mod b such that 0 ≤ â < b, and similarly a mod(±) b denotes the integer â ≡ a mod b such that −b/2 ≤ â < b/2. We extend these definitions to tuples, vectors, matrices, and polynomials a over Z component-wise. When the exact representation is not important, we simply write a mod b. Let Z_q denote the quotient ring Z/qZ for an integer q ≥ 1. Let R = Z[X]/(f), where usually f = X^n + 1 for n a power of 2, and R_q = R/(q) = Z_q[X]/(f) for some positive integer q. We let R^k (resp. R_q^k) be a ring module of dimension k over R (resp. R_q). We identify equivalence classes in R_q with their representative polynomial with coefficients mod(+) q. By {0,1}^z we denote the set of z-bit strings and by {0,1}^* we denote the set of bit strings of arbitrary length. We use the notation a_i or a[i] for i = 0, ..., n−1 to access the i-th coefficient of a. Matrices of elements in R_q are denoted as upper-case letters. For a given set S and a probability distribution D over S, we write s ←_r D to mean s ∈ S sampled according to D using coins r. In addition, we write s ←$ S to mean s ∈ S sampled uniformly at random from S. Hereby, U(q) denotes the uniform distribution on R_q, whereas χ denotes an error distribution to be defined for the specific algorithm.
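As a quick illustration of these reduction and rounding conventions, a small Python sketch (the helper names are ours, not from the paper):

```python
import math

def mod_plus(a, b):
    # a mod(+) b: the unique representative a-hat with 0 <= a-hat < b
    return a % b

def mod_pm(a, b):
    # a mod(±) b: the unique representative a-hat with -b/2 <= a-hat < b/2
    r = a % b
    return r - b if r >= (b + 1) // 2 else r

def round_nearest(x):
    # closest integer to x, where y + 1/2 rounds to y + 1
    return math.floor(x + 0.5)
```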
If a variable v gets overwritten as part of a loop (e.g. v ← v/2), we may refer to the variable v after step i of the loop as v^[i] (e.g. v^[i] ← v^[i−1]/2). A plain variable A that is split into S integer shares is denoted in bold notation as A. The plain value of an arithmetically shared A mod q (in Z_q, R_q, or R_q^k) is reconstructed as A = Σ_{j=1}^{S} A^(j) mod q. To make clear that we refer to shares, we use round brackets in the A^(j) notation to refer to the j-th share for 1 ≤ j ≤ S. We also use the notation A ∈ Z_q^(S) to denote that A is split into S shares and each share is in Z_q. Individual shares, e.g. A^(1), are denoted in bold to make it easier to identify them in algorithms. By || we denote the concatenation operation. The input to a comparison algorithm compare is usually denoted as A and Ã and may be in Z_q, R_q, or {0,1}^* depending on the context. The result of compare(A, Ã) is true if A = Ã and false otherwise.
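A minimal sketch of arithmetic sharing and the reconstruction defined above (function names are ours):

```python
import random

def share(A, S, q):
    # Split the plain value A into S arithmetic shares A^(1), ..., A^(S) mod q:
    # S - 1 shares are uniformly random, the last one makes the sum correct.
    shares = [random.randrange(q) for _ in range(S - 1)]
    shares.append((A - sum(shares)) % q)
    return shares

def unshare(shares, q):
    # Reconstruct A = sum_{j=1}^{S} A^(j) mod q.
    return sum(shares) % q
```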

The Kyber Key Encapsulation Mechanism
In this work, we use the Kyber KEM to showcase our attacks and countermeasures. Kyber is an IND-CCA2-secured KEM and a finalist in round three of the NIST PQC standardization process [SAB+20]. Kyber was first described in [BDK+18] and uses the Fujisaki-Okamoto (FO) transformation [FO99, HHK17]. The FO transform is applied to an intermediate chosen-plaintext attack (CPA) secured public-key encryption (PKE) scheme to achieve IND-CCA2 security. The three parameter sets Kyber512, Kyber768, and Kyber1024 are claimed to have a security level equivalent to the security of AES-128, AES-192, and AES-256, respectively. All Kyber variants share the parameters n = 256, q = 3329, and η_2 = 2, and the security level is defined by appropriately setting k, η_1, d_t, d_u, and d_v. Computations in Kyber are performed in the ring R_q = Z_q[X]/(X^n + 1) and by M = {0,1}^n we denote the plaintext space, where each message m ∈ M can be seen as a polynomial in R_q with coefficients in {0,1}. Kyber uses ciphertext compression to reduce the size of the ciphertext, following standard techniques also applied by other lattice-based schemes. The Kyber compression is defined as

Compress_q(x, d) = ⌈(2^d/q) · x⌋ mod(+) 2^d,    Decompress_q(x, d) = ⌈(q/2^d) · x⌋.    (1)

For later reference, we provide a simplified version of the public-key encryption scheme Kyber.CPA = (Kyber.CPA.Gen, Kyber.CPA.Enc, Kyber.CPA.Dec) as in Algorithms 1, 2, and 3. We set χ_η as the centered binomial distribution with support {−η, ..., η}.

[Algorithm 1: Kyber.CPA.Gen]
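Equation 1 can be sketched in exact integer arithmetic as follows (a sketch, assuming the round-half-up convention from the notation section; since q is odd, ties cannot occur in compression):

```python
Q = 3329

def compress(x, d, q=Q):
    # Compress_q(x, d) = round((2^d / q) * x) mod(+) 2^d,
    # written as exact integer round-half-up
    return ((x * 2**d + q // 2) // q) % 2**d

def decompress(y, d, q=Q):
    # Decompress_q(y, d) = round((q / 2^d) * y)
    return (y * q + 2**(d - 1)) // 2**d
```

Decompressing a compressed value recovers x only up to an error of roughly q/2^(d+1); this rounding error is one source of the error term w discussed later.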

The Fujisaki-Okamoto Transform and Physical Attacks
The FO transformation [FO99, HHK17] of Kyber requires, besides access to Algorithms 1, 2, and 3, two different hash functions H and G as well as a key derivation function (KDF). For future reference, we now provide a simplified description of the IND-CCA2 Kyber KEM in Algorithms 4, 5, 6. The main idea of the conversion is to check the validity of a ciphertext in Kyber.CCAKEM.Decaps after decryption by performing a so-called re-encryption. In Algorithm 6, a candidate ciphertext c′ := Kyber.CPA.Enc(pk, m′, r′) (see Line 4) is obtained and then compared with the input ciphertext c. The goal is to detect maliciously crafted ciphertexts that could be used to reveal the secret key. In Figure 1, an overview of the application of the FO transformation on Kyber KEM is given, where operations that depend on the secret key are indicated in grey.
The security properties of the FO transform only hold if an adversary is not able to learn information on intermediate values processed during the re-encryption operation. This is because the input to the re-encryption depends on the decryption, which uses the secret key. The only information that an attacker should get is an accept or a failure. This information can either be provided explicitly as a reject/fail flag or implicitly by outputting either the correct key or, in case of a failure, a random/constant key. Thus, side-channel leakage in any form from the FO transform could lead to serious security flaws. D'Anvers et al. [DTVV19] first presented an implementation attack, using timing information present in the error correction of LAC and Ramstake. Timing leakage was also exploited in [GJN20], this time in the FO validation check of FrodoKEM. Even constant-time implementations can still be vulnerable to other types of side-channel leakage. Electromagnetic radiation from a hashing step in the FO transform has similarly been exploited in [RRCB20], highlighting the need for side-channel countermeasures on top of a constant-time implementation.
In this paper, we specifically look at the protection of the comparison operation. We summarize the two masked comparison operations analyzed in this paper below.

Masked Comparison from [OSPG18]
The goal of [OSPG18] is to compare a public polynomial Ã with a sensitive and first-order masked polynomial A, which is split into two shares A^(1), A^(2) so that A = A^(1) + A^(2). The main idea of the OSPG method is to introduce an additional hashing step before the comparison.
Variables A, A^(1), A^(2), Ã can be defined as polynomials in R_q or just as coefficient vectors in Z_q^n, as the ring structure of R_q is not relevant and the only required operation is coefficient-wise addition mod q. The main idea is that the hashing step prevents leakage of the sensitive polynomial A. Thus, the comparison is rewritten as

A = Ã  ⟺  A^(1) + A^(2) = Ã  ⟺  A^(1) = Ã − A^(2)  ⟺  H(A^(1)) = H(Ã − A^(2)),

where the last step relies on the collision resistance of a cryptographic hash function H : R_q → {0,1}^d with d output bits (e.g., d = 256). The attacker does not learn additional information on A in case of a failed comparison, as the unmasking is done on a hashed value. This approach is illustrated in Algorithm 7.
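The rewritten comparison can be sketched as follows. Note that the hash is evaluated on each side separately, so the sensitive value A is never reconstructed; this is a simplified rendering with SHA3-256 standing in for H and a coefficient encoding of our choosing:

```python
import hashlib

def hash_poly(coeffs, q=3329):
    # H : R_q -> {0,1}^d, instantiated here with SHA3-256 over the
    # little-endian 2-byte encoding of each coefficient
    data = b"".join((c % q).to_bytes(2, "little") for c in coeffs)
    return hashlib.sha3_256(data).digest()

def masked_compare(A1, A2, A_tilde, q=3329):
    # A = A1 + A2 equals A_tilde  <=>  A1 = A_tilde - A2 (coefficient-wise
    # mod q); each side is hashed separately, so the unmasked equality test
    # only ever touches hashed values.
    rhs = [(t - c) % q for t, c in zip(A_tilde, A2)]
    return hash_poly(A1, q) == hash_poly(rhs, q)
```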
[Algorithm 7: Masked comparison of sensitive A split into shares A^(1), A^(2) with public Ã [OSPG18, Algorithm 5]]

In [OSPG18], there are three input ciphertext components, c_1, c_2, and c_4, that must be compared to their re-encrypted counterparts c′_1, c′_2, and c′_4. The authors mention that the adversary can adaptively change c_2 or c_4 and use the output leakage of compare(c_2, c′_2) (resp. compare(c_4, c′_4)) to distinguish the decrypted m_cpa. They note that the same does not apply to the input component c_1, and propose to first execute compare(c_1, c′_1) individually. Only if this first comparison is valid, the other two comparisons may be conducted [OSPG18, Section 3.4]. Note that we show in Section 3.2 that this assumption is incorrect and that even the output of compare(c_1, c′_1) leaks sensitive information.

Application to Saber
For their implementation of Saber [BDK+20], the authors focus on first-order security and therefore re-use the masked comparison of [OSPG18]. In Saber, it is required to check two sensitive and shared re-encrypted ciphertext components c′_1 ∈ R_p^(2) and c′_2 ∈ R_t^(2) for equality with the user-provided ciphertext components c_1 ∈ R_p and c_2 ∈ R_t. Different from the original method, the authors implement only a single check. In order to do so, they instantiate the hash function as H : R_p × R_t → {0,1}^d with a simple concatenation of the inputs. In other words, it is checked whether

H(c′_1^(1) || c′_2^(1)) = H((c_1 − c′_1^(2)) || (c_2 − c′_2^(2))).    (6)

Masked Comparison from [BPO + 20]
A method to achieve higher-order security for the masked comparison was presented by Bache, Paglialonga, Oder, Schneider, and Güneysu in [BPO+20]. They took a significantly different approach, as it is unclear how the hashing-based approach from [OSPG18] can be used with a higher-order masking scheme. Moreover, a straightforward solution based on a higher-order protected A2B conversion would seem to be too inefficient.
For consistency with related algorithms and our introduction of Kyber, we change the notation used in [BPO+20] when recalling the BPO approach. By n′ we denote the number of input coefficients to the comparison algorithm. As the algorithm does not exploit any ring structure, the number of input coefficients n′ (originally k) may be n′ = n for a polynomial in R_q, but could also be the concatenated vector representation of several polynomials in R_q (e.g., n′ = kn to account for the modular structure in Kyber). Inputs can be in Z_q (originally F_q) for a prime q. A prime modulus is required by the security proof, and thus the approach would not be directly applicable to Kyber, due to compression, or Saber, due to the power-of-two modulus. The comparison algorithm requires the sensitive input A to be shared in S (originally n) shares. The shares of A are compared with the unshared public value Ã.

[Algorithm 8: MaskedSum of the m-th set according to [BPO+20, Algorithm 5]]
The idea of [BPO+20] is to perform this comparison on summed-up and randomized subsets of A and Ã. The value A is assumed to be a vector of n′ coefficients in Z_q^(S), and Ã is assumed to be a vector of n′ coefficients in Z_q. The actual value of n′ depends on the scheme in which the masked comparison is instantiated, e.g., n′ = (k+1)n for Kyber. To run the comparison, both are split into x sets of cardinality l = n′/x (originally, k is used instead of n′), assuming that x divides n′. Then Algorithm 8 is run on each m-th subset, and for each result compare(B_m, B̃_m) for 0 ≤ m < x is evaluated. If all comparisons return true, the final result is true. Note that we follow the notation of [BPO+20] and, to avoid a notation clash, denote the m-th subsets A_m and Ã_m just as A and Ã in Algorithm 8, so that the subscript can be used for accessing individual coefficients.
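The set-wise check can be illustrated with the following sketch of the underlying idea: a randomly weighted sum of each subset is computed share-wise, and only the summed values are compared. This is our simplified rendering, not the exact MaskedSum algorithm, and the uniform weights (including zero) are our modeling assumption:

```python
import random

def subset_check(A_shares, A_tilde, q=3329):
    # A_shares: list of S share vectors of one subset; A_tilde: public subset.
    # B = sum_i r_i * A[i] is accumulated per share and compared with
    # B_tilde = sum_i r_i * A_tilde[i]. Under this model, a single differing
    # coefficient escapes detection iff its weight r_i is 0, i.e. w.p. ~1/q.
    r = [random.randrange(q) for _ in A_tilde]
    B = sum(sum(ri * s[i] for i, ri in enumerate(r)) for s in A_shares) % q
    B_tilde = sum(ri * A_tilde[i] for i, ri in enumerate(r)) % q
    return B == B_tilde
```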

Attack
In this section, we present three attacks against masked comparison algorithms, one against the comparison proposed in [OSPG18] and two against [BPO+20]. Our attacks make use of the fact that it is possible to submit slightly modified ciphertexts and observe whether a decryption failure occurs. These attacks invalidate the implicit assumption of [OSPG18] and [BPO+20] that the output of the masked comparison on a subset of coefficients is not sensitive information.
We will first detail how plaintext checking oracles, i.e., oracles that tell whether a ciphertext is decrypted correctly so that m = m′, can assist in full key recovery. As in [GJN20], we target the specific binary information from a decryption failure oracle in the FO validation check. In contrast to that work, which targets FrodoKEM, we also consider schemes that use ciphertext compression, such as Kyber. This compression prohibits gaining exact equations in the secret in our case, which diminishes the information we retrieve from the oracle. To counter this, we show that we can still retrieve approximate equations in the secret key which, combined with the leaky-LWE framework of Dachman-Soled et al. [DDGR20], allow us to effectively retrieve the secret key.
Finally, we describe how to obtain efficient plaintext checking oracles for implementations that use the masked FO validation checks of [OSPG18] and [BPO+20]. For the latter, on top of an oracle assisted by side-channel traces, we construct an oracle that takes advantage of collisions in the validation check and that does not need any side-channel information. As opposed to the attacks in the literature, due to the side-channel protections in place, we have to limit the attack to modifying only one coefficient of a valid ciphertext. The reason for this restriction will become clear later.

Generic key recovery from decryption failure oracles
An inherent feature of many lattice-based encryption schemes is the possibility of decryption failures and the resilience to noisy ciphertexts. Algorithm 3 shows the CPA-secure decryption of Kyber. Momentarily disregarding the compression and NTT operations, the decryption is essentially a two-step approach. First, the decompressed ciphertext components (u, v) are combined with the secret key s to find the intermediate polynomial x′:

x′ = v − s^T u = ⌈q/2⌋ · m + w.    (7)

In this equation, w is a secret error term that has components both due to compression, as well as the sampled noise elements of MLWE. We refer to [SAB+20] for the full details. After this first step, x′ is compressed as in Equation 1 to find m′:

m′ = Compress_q(x′, 1).

In Compress_q(x′, 1), each coefficient of the polynomial x′ gets decoded into exactly one message bit. If −q/4 ≤ x′[i] < q/4, the resulting message bit m′[i] is zero, and otherwise m′[i] is equal to one. This means that a decryption failure indicates

w[i] ∉ [−q/4, q/4)

for at least one coefficient i. Parameters are typically chosen such that a decryption failure for valid ciphertexts happens with only negligible probability. The polynomials u, v, and m are part of the input and are typically known to the adversary. Fluhrer [Flu16] showed that knowledge of whether m′ equals m, i.e. a decryption failure, is enough to recover the secret key s by successively adapting u, v and observing whether a decryption failure occurs.
Due to practical constraints, we will limit our attack to adapting only one coefficient of a valid ciphertext. A possible attack would proceed as follows. Consider an adversary that submits noisy input ciphertexts (u, v + e·X^i) for a given i ∈ [0, n−1], where u and v are correctly generated, but with e·X^i ∈ R_q an additional noise term with only one non-zero coefficient e. In Equation 7, this noise term will appear as an additive term to the secret error term w:

x′ = ⌈q/2⌋ · m + w + e·X^i,

which leads to the decryption failure condition

w[i] + e ∉ [−q/4, q/4).

Note that we assume a failure will in this scenario only happen at the adapted coefficient i, since the failure probability of valid ciphertexts is negligible. An adversary that can submit multiple such ciphertexts can adaptively tweak e to look for decryption failures. If he finds an error e that does trigger a decryption failure, while e−1 does not trigger a decryption failure, he learns the following equation in the secret:

w[i] + e = ⌈q/4⌉.

Similarly, an error e that does trigger a decryption failure, while e+1 does not, indicates

w[i] + e = −⌈q/4⌉.

We will call a polynomial e·X^i that triggers one of these two conditions a border-failure error. Such a border-failure error is illustrated in Figure 2 for m[i] = 0, where the error term with coefficient e results in a decryption failure (m′[i] = 1), but the error term with e−1 still results in a correct decryption (m′[i] = 0). By performing a binary search as discussed in [GJN20], one can find a border-failure error, and thus an exact equation, in at most log_2(q) iterations (e.g. 12 iterations for Kyber768), by carefully selecting the value of e in each iteration to divide the search space in two. Further, by obtaining kn independent equations, it is possible to fully retrieve the secret error term w and to construct exact equations in the coefficients of s.
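The binary search for a border-failure error can be sketched as follows, where fails(e) abstracts the decryption failure oracle (a sketch; the oracle in the usage example is a stand-in with an assumed value of w[i]):

```python
def find_border_failure(fails, lo=0, hi=3329):
    # Returns the smallest e in (lo, hi] with fails(e) == True, assuming
    # fails is monotone over the search range; e - 1 then does not fail,
    # so the returned e is a border-failure error.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, with q = 3329 and a hypothetical w[i] = −55, the oracle fails(e) = (w[i] + e ≥ 833) yields the border e = 888 after at most 12 iterations, matching the iteration count stated above.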
In theory, the Fujisaki-Okamoto transformation [FO99] of IND-CCA2-secure lattice-based encryption schemes prevents such attacks by checking if a ciphertext is valid and rejecting adapted ciphertexts. However, an adversary that is able to construct a plaintext checking oracle would be able to circumvent the security of the FO transform and execute this attack.
Compression An additional challenge occurs when applying this technique for key recovery of schemes with compression (e.g. Kyber), which is not covered by [GJN20]. For schemes that do not perform compression on the ciphertext term v, it is possible to obtain exact equations in the secret key as explained above. Compressed ciphertexts, on the other hand, are of the form

c_2 = Compress_q(v, d_v).

This prevents an adversary from injecting fine-grained errors e into v, as these small errors would be removed by the compression.

However, it is possible to inject errors E into the term c_2, which will result in coarse-grained errors e ≈ (q/2^{d_v}) · E in the ciphertext after decompression.
This means that the space of possible noise values for e is effectively reduced to rounded multiples of q/2^{d_v}. Ciphertext compression thus prevents us from freely choosing the noisy input ciphertext v + e·X^i, which in turn prevents us from getting exact equalities. However, similar to the procedure above, we can search for values of E where E does trigger a decryption failure while E−1 does not trigger a failure. From such an observation we learn approximate equalities in the secret of the form

q/4 − e_E ≤ w[i] < q/4 − e_{E−1},  with e_E = ⌈(q/2^{d_v}) · E⌋,

which can be written as an approximate equation

w[i] ≈ q/4 − e_E,  up to an error of at most q/2^{d_v}.

Figure 3 visualizes the injection of coarse-grained errors when compression is in place. Given a border-failure error E as explained above, w[i] can take any value in the light gray marked interval.
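Concretely, the interval for w[i] implied by a border-failure error E can be computed as in this sketch (our helper, using the decompression rounding of Equation 1 and assuming Kyber768's d_v = 4):

```python
Q = 3329

def w_interval(E, dv=4, q=Q):
    # E triggers a failure, E - 1 does not:
    #   w[i] + e_E >= q/4   and   w[i] + e_{E-1} < q/4,
    # with e_E the decompressed error for E. Returns (lower, upper) bounds.
    def decomp(y):
        # round((q / 2^dv) * y) in exact integer arithmetic
        return (y * q + 2**(dv - 1)) // 2**dv
    return (q / 4 - decomp(E), q / 4 - decomp(E - 1))
```

For example, E = 5 yields roughly the interval (−208, 1) for w[i], of width about q/2^{d_v} ≈ 208.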
Solving Mod-LWE using (approximate) equations Dachman-Soled et al. [DDGR20] introduced a tool to estimate the security of an LWE sample in the presence of linear hints about the secret. We can use this framework to include our exact or approximate linear equations and estimate the remaining cost of retrieving the secret. In the case of approximate equations, we consider an approximation error variance equal to the variance of a uniform distribution with width q/2^{d_v}. Figure 4 gives the security of Kyber512 and Kyber768 after inclusion of approximate or exact hints, which correspond to compressed and non-compressed ciphertexts, respectively.

Side-channel attacks
We will now describe how to construct a plaintext checking oracle that can be used for the attack described before. During the FO transformation, a validation check c = c′ is used to detect malformed ciphertexts. As mentioned in Section 2.3, the final accept/fail result bit of this check is not sensitive information. This property is used in both [OSPG18] and [BPO+20], where this bit is unmasked in the implementation. Yet, while the final result bit of the check is not sensitive, information on which part of the message fails certainly is sensitive. Consider, as before, input ciphertexts of the form c = (c_1, c_2 + E·X^i), with E·X^i again a polynomial with only one non-zero coefficient E. When E is sufficiently small so that m′ = m, decryption succeeds and c′ = (c′_1, c′_2) = (c_1, c_2). On the other hand, when E causes a decryption failure, m′ ≠ m will essentially randomize the re-encryption through the hashing step in the FO transform (Figure 1). Similar to [GJN20], our decryption failure oracle exploits the visible side-channel difference between compare((c_1, c_2 + E·X^i), (c_1, c_2)) and compare((c_1, c_2 + E·X^i), (rand_1, rand_2)). We will show that this leakage is present in both [OSPG18] and [BPO+20] because they unmask partial checks.
In [OSPG18], it is noted that the different ciphertext components c_1 and c_2 have different sensitivity with respect to the decrypted message m′. The authors argue that m′ is only sensitive for an invalid c_1. Consequently, they propose to first check the validity compare(c_1, c′_1), with compare as in Algorithm 7, and only conditionally execute compare(c_2, c′_2) in a second step. However, in the plaintext checking oracle outlined above, c_1 is a valid ciphertext component, since the error is injected into c_2 + E·X^i. Moreover, the output of compare(c_1, c′_1) is sensitive, since m′ = m results in compare(c_1, c_1), whereas in the case m′ ≠ m this routine will be compare(c_1, rand_1). Even though the distinguishable input c′_1 = c_1 or c′_1 = rand_1 is initially masked, the final output of this check is unmasked and reveals the occurrence of a decryption failure.
The masked comparison of [BPO+20] leaves open a very similar decryption failure oracle to that of [OSPG18]. In Algorithm 8, the comparison proceeds in sets of l coefficients, where the pass/fail bit is unmasked for every set. Again, the occurrence of a decryption failure will be readily visible in the side-channel information, since compare((c_1, c_2 + E·X^i), (c_1, c_2)) will fail only in the set that contains coefficient i, whereas compare((c_1, c_2 + E·X^i), (rand_1, rand_2)) will likely fail in all of the sets. Note that for our implementation of Kyber768, since [BPO+20] requires a prime modulus, compare is not directly instantiated with MaskedSum, but must first decompress the ciphertext elements into R_q.
Practical attack To experimentally verify the decryption failure oracles outlined above, we integrated the masked comparisons of [OSPG18] and [BPO+20] into the Kyber768 ARM Cortex-M4 implementation available in PQM4 [KRSS]. The masked comparison of [BPO+20] requires fresh randomness within the algorithm. When this randomness is sampled using rejection sampling, collected traces are misaligned due to the variable timing of this routine. Therefore, to simplify evaluation, we sample all fresh randomness in advance and pass it to the masked comparison routine as an input parameter.
We use an STM32F407VGT6 chip, mounted on a custom PCB to facilitate power measurements. This custom PCB results in a very stable behavior of the STM32F407 chip. It contains a dedicated shunt resistor to monitor the instantaneous power consumption, but is otherwise stripped of unnecessary components that would introduce additional noise into the measurements. The PCB is driven by an external power supply at 3.6 V and clocked by an external clock at 8 MHz, to get maximum stability.
To collect power traces, we use a Tektronix DPO 70404C digital oscilloscope and set it to sample power traces at 65 MSamples/s. A PA 303 SMA pre-amplifier performs analog preprocessing of the collected traces before they enter the oscilloscope. A central PC is used to communicate input/output data to the board through a serial USART connection, as well as to collect and analyze power measurements.
We use Welch's t-test to detect differences in the power consumption between two classes of measurements. This test computes the so-called t-statistic for every sample in the measurements as

t = (X̄_1 − X̄_2) / √(σ_1²/N_1 + σ_2²/N_2),

where X̄_1 and X̄_2 denote the means of each class, σ_1² and σ_2² their respective variances, and N_1, N_2 the number of samples.
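The t-statistic can be computed directly per sample point, for example:

```python
import statistics

def welch_t(x1, x2):
    # Welch's t-statistic: (mean1 - mean2) / sqrt(var1/N1 + var2/N2)
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # sample variances
    return (m1 - m2) / (v1 / n1 + v2 / n2) ** 0.5
```

In leakage assessment, |t| > 4.5 is the commonly used threshold to flag a significant difference between the two classes.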
We construct the t-test classes as the input ciphertexts (c_1, c_2 + E·X^i) and (c_1, c_2 + (E−1)·X^i). We specifically target the masked comparison step where, as explained above, we expect t-test leakage when E triggers a decryption failure (c′ = (rand_1, rand_2)) but E−1 does not (c′ = (c_1, c_2)). This leakage would allow us to construct approximate equations in the coefficients of the secret key, reducing the Mod-LWE security.
We generated a pseudorandom Kyber ciphertext from an input seed and conducted the attack described above, with results illustrated in Figure 5. From this figure, it can be deduced that E = 5 is the border-failure error. When E = 4, both E and E−1 = 3 are too small to trigger decryption failures, and they do not leak in the output of the partial comparisons. Conversely, when E = 6, both E and E−1 = 5 trigger a failure, also preventing partial comparison output leakage. In the middle case where E = 5, E−1 decrypts correctly but E triggers a decryption failure, and large leakage spikes can be observed. As expected, for [OSPG18] there is a single peak, which we pinpointed to the unmasking of the partial comparison compare(c_1, c′_1). On the other hand, for [BPO+20] there are 16 peaks corresponding to the unmasking of the x = 16 partial sets. Our attacks require a very limited number of traces to detect the border-failure error. When looking at the maximum absolute t-statistic value over the measurements on the right side of Figure 5, the border-case failure is detectable after a few hundred traces for [OSPG18] and already after the first block of 50 traces for [BPO+20].
To further illustrate why E = 5 is the border-failure error, we inspected internal variables. For our pseudorandom Kyber ciphertext it holds that

x′[0] = (v − s^T u)[0] + e = 986 for the error e corresponding to E = 5.

Since 986 ≥ q/4 = 832.25, the ciphertext decrypts to m′[0] = 1, such that indeed E = 5 is the border-case failure error term. From the side-channel information, an adversary that does not know (v − s^T u)[0] = −55 can now compute it approximately, and finds that

(v − s^T u)[0] ∈ [q/4 − e_5, q/4 − e_4) ≈ (−208, 1),

an interval of width ≈ q/2^{d_v} that indeed contains −55.

Collision attack
In this section, we show that it is possible to extract information on the Kyber secret key using a reaction attack without side-channel information when the method from [BPO+20] is used for comparison in a masked implementation, thus breaking the CCA security of the scheme. First, we recall the four scenarios an attacker may experience when querying a decryption oracle. Assume c_a is a valid ciphertext, c′ is the re-encrypted ciphertext, and c̃_a is the ciphertext that is sent to the oracle by the attacker.
1. c̃_a = c_a: A valid ciphertext has been provided such that m = m′, and thus the re-encryption uses the same random coins r as the original encryption. The result of the final comparison c′ = c̃_a is true.
2. c̃_a = c_a + e, where e is a polynomial with a limited number of small coefficients: A valid ciphertext is slightly modified by a small error, but in a way that the decryption still results in m = m′. The result of the final comparison c′ = c̃_a is false, but the difference between the two ciphertexts is small, as e = c̃_a − c′. In the rest of this subsection we will consider e a polynomial with only one non-zero coefficient, which means that the difference between c̃_a and c′ is limited to only one coefficient.
3. c̃_a = c_a + e, where e is a polynomial with one or more large coefficients: A valid ciphertext is modified in such a way that m ≠ m′ after decryption (e.g., in one bit). Thus the re-encryption operates on completely different random coins r′ than the original encryption. Internal variables of the re-encryption differ from the original encryption. The result of the final comparison c′ = c̃_a is false, and the difference between c̃_a and c′ is very large, as c′ can be considered to be randomly generated due to the different coins.
4. c_a is uniform: A completely unrelated and invalid ciphertext is sent to the decryption oracle. The result is the same as for a large error e.
The masked comparison of [BPO+20] has a certain probability of false positives, where invalid ciphertexts are still accepted. These false positives are denoted collisions, and the authors showed that they occur with probability 1/q^x, with x the number of comparison sets used. In [BPO+20] the authors recommend a value of x = 16 for Kyber. The probability of a collision in a single set, P_single-coll, was shown to be 1/q.
However, their analysis covered scenarios 1, 3, and 4, but did not take into account scenario 2, where only one coefficient, and thus only one set, fails. In this scenario, the false-positive probability equals the single-set collision probability 1/q. This relatively high collision probability is what we exploit in our attack.
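The single-set collision probability can be illustrated with a small Monte-Carlo sketch. This is a simplified stand-in for the masked comparison, not the algorithm of [BPO+20] itself: one "set" is modeled as a random Z_q-linear combination of the coefficient-wise differences, which vanishes with probability exactly 1/q when the inputs differ.

```python
import random

def single_set_collides(diff, q, rng):
    # One comparison "set", modeled as a random Z_q-linear combination of
    # the coefficient-wise differences. For a non-zero diff over a prime
    # field it vanishes (a false positive) with probability exactly 1/q.
    weights = [rng.randrange(q) for _ in diff]
    return sum(w * d for w, d in zip(weights, diff)) % q == 0

q = 251  # small prime standing in for Kyber's q = 3329
rng = random.Random(1)
trials = 200_000

# Scenario 2: the submitted ciphertext differs in exactly one coefficient.
diff = [0] * 16
diff[3] = 5
collisions = sum(single_set_collides(diff, q, rng) for _ in range(trials))
print(collisions / trials)  # empirically close to 1/q ~ 0.004
```

Because q is prime and the difference is non-zero, the combination is zero exactly when the corresponding weight is zero, which pins the collision rate at 1/q regardless of the error's magnitude.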
In practice, our attack queries the decryption failure oracle many times with a slightly modified ciphertext (c_1, c_2 + E·X^i) as in Subsection 3.1. As before, E·X^i is a polynomial that is zero except for its i-th coefficient, which equals E. If the modification does not cause a decryption failure, we are in scenario 2 and the scheme will return a correct decryption of the original ciphertext with probability P_single-coll. If it does cause a decryption failure, we are in scenario 3 and the ciphertext (c_1, c_2 + E·X^i) will never be accepted. This means that when the ciphertext (c_1, c_2 + E·X^i) is accepted, we know that m' = m, and this acceptance happens with probability 1/q.
However, when m' ≠ m we do not get any definite proof, as the ciphertext will always be rejected. In this case, we can additionally construct a shifted error term Ē ≈ E − q/2, which results in an error ē ≈ e − q/2 (mod q). The effect of the q/2 term is to flip the decrypted message bit. This way, (c_1, c_2 + Ē·X^i) behaves in the opposite fashion: m' = m will always lead to a rejected ciphertext, while m' ≠ m will succeed with probability 1/q. On average, the decryption oracle has to be queried 2q times, q times with error E and q times with error Ē, to obtain one correct decryption of the original ciphertext, which indicates whether m' = m or m' ≠ m. As such, we have successfully constructed a decryption failure oracle that can be used in an attack as outlined in Subsection 3.1. Analogously to before, we aim to find the border-failure error E such that E − 1 does not trigger a decryption failure but E + 1 does.
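The behavior of the constructed oracle can be sketched as a toy simulation. This is a hypothetical model, not Kyber itself: the only property it keeps from the real scheme is that an invalid ciphertext is accepted with probability 1/q, and only when the injected error does not flip the message.

```python
import random

def distinguish(E_flips_message, q, rng):
    """Toy model of the constructed oracle: query alternately with E and
    with the shifted error E-bar (E with an extra q/2 term that flips the
    message bit). Only the variant that leaves m' = m can be accepted,
    with probability 1/q per query (the single-set collision). Returns
    which variant was accepted and the number of queries spent."""
    queries = 0
    while True:
        for use_Ebar in (False, True):
            queries += 1
            flips = E_flips_message ^ use_Ebar
            if not flips and rng.randrange(q) == 0:  # false-positive accept
                return ("E-bar" if use_Ebar else "E"), queries

q = 257  # small prime stand-in for Kyber's q
rng = random.Random(0)
verdicts_and_costs = [distinguish(False, q, rng) for _ in range(2000)]
# Every run identifies the non-flipping variant, i.e. learns m' = m here...
print({v for v, _ in verdicts_and_costs})  # {'E'}
# ...at an average cost of roughly 2q oracle queries.
print(sum(c for _, c in verdicts_and_costs) / 2000)
```

Since only every second query can be accepted, each with probability 1/q, the expected query count is 2q − 1, matching the 2q estimate in the text.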
We verified the attack in practice using the PQClean implementation of Kyber768 on an Intel Core i5-8350U CPU at 1.7 GHz. Without the compression of Kyber, recovery of a border-failure error took around 3 minutes on average. This includes around 2^16.2 calls to the decryption oracle, which combines the binary-search cost log_2(q) with the cost 2q of finding one border-failure. As can be seen in Figure 4, obtaining 2^11 equations leads to a full recovery of the secret key in this case. In conclusion, for Kyber768 without compression, it is possible to retrieve the full secret key in 2^27.2 calls to the decapsulation. This attack would take roughly 100 hours on our device.
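These query counts follow from a quick back-of-the-envelope computation (q = 3329 for Kyber):

```python
import math

q = 3329
# ~2q oracle calls per binary-search step, log2(q) steps per border-failure error:
per_equation = 2 * q * math.log2(q)
print(round(math.log2(per_equation), 2))  # 16.25, i.e. ~2^16.2 calls per equation
# 2^11 equations suffice for full key recovery:
total = per_equation * 2**11
print(round(math.log2(total), 2))         # 27.25, i.e. ~2^27.2 decapsulation calls
```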
For Kyber512 or Kyber768 with compression, we expect to find an approximate equation with 2q·log_2(2^{d_v}) ≈ 2^13.7 queries, where 2^{d_v} is the number of values E can take and 2q is again the cost of finding a collision. The remaining security can be calculated with the tools of [DDGR20] and is graphically presented in Figure 4. All in all, the number of queries and the remaining computational effort for Kyber768 are still high. However, given a fast decryption oracle, an attack on Kyber512 requires 2^13.7 · 2^17 = 2^30.7 queries to obtain 2^17 equations, leaving a remaining computational complexity of 2^65, which seems within practical reach.

Correcting [BPO+20]
The first-order masked comparison of [OSPG18] can be fixed easily by combining both c_1 and c_2 in a single hash for comparison, as done in [BDK+20]. The higher-order masked comparison of [BPO+20] is not straightforward to fix. One option is to fall back to a generic comparison, where all coefficients of the ciphertext are first converted into a Boolean sharing, after which the comparison is performed using an appropriate masked Boolean circuit. This would require kn A2B conversions for c_1 and n for c_2, a total of l_A = (k+1)n A2B conversions, where n is the number of coefficients of the polynomials in the ring R_q and k is the number of polynomials in the secret vectors. Additionally, one needs an appropriate Boolean circuit to perform the comparison on the masked Boolean shares.
It is, however, possible to use the idea of [BPO+20] to reduce the number of A2B conversions needed. The idea is to combine all coefficients into one masked sum, and to perform a masked comparison on only this sum instead of on all coefficients separately. This leads to a masked comparison that needs only one A2B conversion, but with a false-positive probability (i.e., the probability that the comparison returns true while the inputs are not equal) of 1/q. To reduce the false-positive probability, one can repeat this procedure l_B times, leading to a false-positive probability of 1/q^{l_B}. In such a scenario, it is important that the final l_B comparisons are performed in a secure way, so that only the combined final output of all comparisons is revealed and no intermediate comparison results are leaked.
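In unmasked form, the compression idea can be sketched as follows. This is a simplified model with a hypothetical helper name; the actual Algorithm 9 performs the same folding on S arithmetic shares, so that no unmasked sum ever exists.

```python
import random

q = 3329  # Kyber's prime modulus

def compress_comparison(a, c, l_b, rng):
    # Fold the l_A coefficient-wise differences into l_b random linear
    # combinations over Z_q. All l_b sums are zero if the inputs are equal;
    # for unequal inputs they are all zero only with probability 1/q^l_b.
    sums = []
    for _ in range(l_b):
        weights = [rng.randrange(q) for _ in a]  # fresh randomness per repetition
        sums.append(sum(w * (x - y) for w, x, y in zip(weights, a, c)) % q)
    return sums

rng = random.Random(0)
a = [rng.randrange(q) for _ in range(1024)]     # l_A = (k+1)n = 1024 for Kyber768
print(compress_comparison(a, list(a), 3, rng))  # [0, 0, 0]
tweaked = list(a)
tweaked[5] = (tweaked[5] + 1) % q               # one modified coefficient
print(any(compress_comparison(a, tweaked, 3, rng)))  # True except w.p. 1/q^3
```

Because every repetition draws fresh randomness, a single modified coefficient is caught by each repetition independently with probability (q−1)/q.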
To provide security and avoid the problems in [BPO+20], we introduce two significant changes to the original proposal: first, all coefficients are combined into every masked sum B, to avoid the attacks described in the previous section; secondly, we generate the masked sum B in S shares instead of the single share of the original proposal, which is a requirement to obtain (S−1)-th order security.
Algorithm 9 compresses the l_A = (k+1)n coefficients that need to be compared to l_B coefficients, in a side-channel-secure way. We first prove that the l_B remaining coefficients give the same result after comparison as the initial l_A coefficients, except for a false-positive probability of 1/q^{l_B}. We then prove the security of our algorithm. An experimental validation of our algorithm can be found in Appendix B. If the inputs are not equal, the difference B_{i_B} − Σ_{j=1..S} B^{(j)}_{i_B} is non-zero for a given i_B with probability (q−1)/q. Moreover, note that the values of B_{i_B} − Σ_{j=1..S} B^{(j)}_{i_B} are independent for different i_B, as the randomness R[i_B, 1] varies in each iteration.

Algorithm 9: ReduceComparisons
Due to this independence, the probability that the difference is zero for all i_B simultaneously is 1/q^{l_B}. Taking into account that probabilities sum to one, the probability that the difference is non-zero for at least one i_B is 1 − 1/q^{l_B}.

Theorem 2. Algorithm 9 is t-NI at any order t < n.
Proof. The essence of the proof is that, in Algorithm 9, different shares of the same input variable are never combined: every intermediate and output value depends on at most one share of each input variable. The freshly generated intermediate values can be simulated uniformly at random from Z_q. Secondly, any intermediate value that is constructed without involvement of the sensitive input values is perfectly simulatable without knowing any input share, by following the exact procedure of Algorithm 9. Finally, any remaining intermediate value depends only on the j-th shares of the inputs A_{i_A}, and can thus be simulated without knowledge of any share A^{(t)}_{i_A} with t ≠ j.
After the application of Algorithm 9, the number of coefficients that need to be compared is reduced from l_A = (k+1)n to l_B, where l_B should be chosen so that the false-positive probability 1/q^{l_B} is small enough to prevent an adversary from practically finding such false positives. For Kyber768, when a false-positive probability of at most 2^{−128} is required, this implies a reduction from l_A = 1024 coefficients to l_B = 11 coefficients that need to be compared.
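The value l_B = 11 follows directly from the parameters, since each comparison set contributes log_2(q) ≈ 11.7 bits of false-positive resistance:

```python
import math

q = 3329  # Kyber's modulus
# Number of sets needed for a false-positive probability of at most 2^-128:
l_b = math.ceil(128 / math.log2(q))
print(l_b)  # 11
```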
However, the constraints of [BPO+20] still apply to our proposed correction, most importantly the fact that q needs to be prime. Schemes with power-of-two moduli, or schemes that compress ciphertexts to power-of-two moduli, as is frequently the case in lattice-based encryption schemes, need an additional masked operation to convert the power-of-two shares to prime moduli. This might complicate the usage of our method, or might in extreme cases make it more expensive than the generic masked comparison. The applicability of Algorithm 9 is therefore scheme-specific, and as it is not the main focus of this paper, we refrain from an extensive treatment and leave this for future work.

Framework for side-channel testing
The masking schemes attacked in Section 3 were tested by their original authors for side-channel information leakage using Test Vector Leakage Assessment (TVLA) [GJJR11]. A natural question is why this side-channel leakage did not show up in the test results, and how to develop a test that captures these leakages and can replace the traditional test.
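For reference, the first-order statistic underlying TVLA-style leakage assessment is Welch's t-test; a minimal sketch, using the customary |t| > 4.5 detection threshold, is:

```python
import math

def welch_t(x, y):
    """Welch t-statistic between two classes of (single-point) traces.
    In TVLA-style leakage assessment, |t| > 4.5 at some sample point is
    the customary threshold for flagging leakage."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((a - mx) ** 2 for a in x) / (nx - 1)
    vy = sum((b - my) ** 2 for b in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

# Identically distributed classes: nothing is flagged.
print(abs(welch_t([0, 1] * 100, [1, 0] * 100)) > 4.5)  # False
# A mean shift between the classes: leakage is flagged.
print(abs(welch_t([0, 1] * 100, [5, 6] * 100)) > 4.5)  # True
```

In practice the statistic is evaluated per sample point over full traces; this sketch collapses each trace to a single value for brevity.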
A standard scenario for testing is a fixed-vs-random (FvR) test, in which the fixed category consists of valid input ciphertexts and the random test set comprises uniformly random inputs (u, v) ← U(R_q^{k×1}) × U(R_q). Remember that the result of the comparison is allowed to leak, but no other information besides this one bit should be leaked. As the comparison output differs between the two testing sets, there is potential for benign output leakage showing up in the t-test. While this output leakage can be ignored, it is hard to assess whether no real leakage is ignored in the same breath. For example, in [BDK+20, Figure 8], this output leakage is shown, and the authors argue that it is non-sensitive.

In this section, we develop a new test method that has the same leakage detection capacity as the standard FvR test, but that eliminates the output leakage. We formalize our new method both for testing the masked comparison and for testing the whole decapsulation. Our framework scales to arbitrary orders, similar to the traditional t-test.

'Fixed + noise' vs random
Our method uses a 'Fixed + Noise' vs Random (FNvR) test.The key idea is to introduce small random noise to the fixed valid ciphertext so that the comparison output is false in both test scenarios.By keeping the noise small and varying its location at random, the test is still able to find the same vulnerabilities as the traditional FvR method after combining all traces.
Our FNvR test can be understood as a specific instance of a Semi-Fixed vs Random (SFvR) t-test. SFvR tests are sometimes used to eliminate false positives related to input or output leakage in an FvR test. In such a test, rather than fixing the input to a constant value, part of the sensitive intermediate variables is fixed. Consequently, the input ciphertext is not a single fixed representative, but is drawn uniformly from the set of ciphertexts that result in the desired intermediate variable values. In this setting, any apparent leakage results from the sensitive variable, rather than from fixed inputs or outputs.
Our choice to submit noisy input ciphertexts fixes the comparison result (c_a ?= c') to false, thus eliminating the output leakage that results from the fixed input being a valid ciphertext. To maintain leakage detection similar to a standard FvR test, we keep the noise small, which additionally fixes the internal variable m' = m with high probability. Specifically, the 'fixed + noise' set is generated as a valid ciphertext to which the smallest possible noise value is added to one coefficient chosen at random, as shown in Figure 6. A C code listing that implements our FNvR test is provided in Appendix A; it also takes into account that ciphertexts are encoded into bitstrings.
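A minimal Python sketch of the two input classes follows. It operates directly on coefficient lists with function names of our own choosing; the C listing in Appendix A additionally handles the bitstring encoding of ciphertexts.

```python
import random

q = 3329

def fixed_plus_noise(valid_ct, rng):
    # One 'fixed + noise' input: the fixed valid ciphertext with the
    # smallest possible noise (+/-1) added to a single coefficient chosen
    # uniformly at random, so the final comparison fails while m' = m
    # still holds with high probability.
    noisy = list(valid_ct)
    i = rng.randrange(len(noisy))
    noisy[i] = (noisy[i] + rng.choice((-1, 1))) % q
    return noisy

def random_input(length, rng):
    # One input for the 'random' class: uniformly random coefficients.
    return [rng.randrange(q) for _ in range(length)]

rng = random.Random(0)
fixed_ct = [rng.randrange(q) for _ in range(1024)]  # stand-in for a valid ciphertext
noisy = fixed_plus_noise(fixed_ct, rng)
print(sum(a != b for a, b in zip(fixed_ct, noisy)))  # 1: exactly one coefficient changed
```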
We note that the proposed FNvR test does not suffer from the common pitfalls of SFvR tests [UvWBS20]. The complexity of SFvR grows with the size of the set of semi-fixed ciphertexts, as all ciphertexts in the set must be exhaustively tested. Moreover, the choice of intermediate variable may introduce bias, making the test dependent on a leakage model. FNvR, in contrast, introduces small noise into one coefficient chosen at random, limiting the complexity increase to the total number of coefficients. Moreover, like FvR, FNvR is performed with valid (plus small noise) and random ciphertexts, thus avoiding bias towards any leakage model.

FNvR leakage detection capabilities
Our FNvR method is a validation test for detecting vulnerabilities in a masked implementation, which eliminates the output leakage present in the traditional FvR test. Here, we make an informal argument for why the FNvR test has the same leakage detection capabilities as the FvR test. Consequently, our FNvR test can be used as a direct replacement for the more traditional FvR method. Note, however, that vulnerabilities that are not captured by the FvR test will also not be captured by our FNvR method.
When applying the FNvR test on the decapsulation step, the output leakage is suppressed by introducing small noise to one of the coefficients of the input ciphertext leading to a comparison that always fails.To reason about the leakage detection capability of the FNvR test compared to the traditional test, one can split the decapsulation into its three phases: first decryption, then re-encryption, and finally the comparison.
1. Decryption. For decryption, our analysis uses the fact that this is a coefficient-wise operation with respect to the polynomial v, or equivalently, the ciphertext part c_2. When an error is injected into c_2[i], all computations on coefficients j ≠ i are identical to those in a 'fixed' class of measurements. Since we inject errors uniformly at random into (c_1, c_2), the probability that the error is injected into c_2[i] with i ≠ j equals (n − 1)/((k + 1)n). Each individual coefficient j therefore takes its 'fixed' value with probability (n − 1)/((k + 1)n), and accordingly, this 'fixed' value will leak under the same assumptions as in a standard FvR test, albeit possibly requiring more collected traces.
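For Kyber768 (n = 256, k = 3), this probability evaluates to roughly one quarter:

```python
n, k = 256, 3  # Kyber768: polynomial degree and module rank
# Probability that the random +/-1 error lands in c2 but away from index j,
# so that coefficient j of the decryption is computed as in the 'fixed' class:
p_fixed = (n - 1) / ((k + 1) * n)
print(round(p_fixed, 3))  # 0.249
```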

Re-encryption
The re-encryption step depends only on the decrypted message. As such, it is identical for ciphertexts from the 'fixed + noise' and the 'fixed' classes, provided that no decryption failure occurs due to the extra noise element. To minimize this decryption failure probability, one should choose noise coefficients as small as possible, i.e., ±1. In practical schemes, this leads to a very small decryption failure probability; for example, in the case of Kyber768, this probability under 'fixed + noise' inputs is 2^{−87}. If a decryption failure does occur, the trace will be identical to a trace in the random class of measurements. Given the very small probability of inserting such random traces into the 'fixed + noise' class, sufficient traces without decryption failures will be present in the 'fixed + noise' class.

Comparison
In the comparison, the output leakage is suppressed because the pass/fail bit results in a fail for both classes of measurements. On the other hand, we have to show that FNvR does capture the leakage that would be captured by the FvR test. We can follow a similar reasoning as for the decryption: the 'fixed + noise' class changes only one coefficient of the input ciphertext at random, while all other coefficients remain the same as in the fixed category. This means that both methods will essentially find the same leakage, except when this leakage occurs at the changed coefficient of the input ciphertext. By varying the location of the changed coefficient, one makes sure that leakage from all coefficients is tested.
One can argue that our FNvR test is more realistic than the FvR test, as an adversary is allowed to distinguish between a valid and a random ciphertext, but should not be able to obtain any other information. The 'fixed + noise' category is the closest category to the fixed category that yields invalid ciphertexts.

Validation of the methodology
We validated our FNvR framework on the masked comparisons of [OSPG18, BDK+20, BPO+20]. We treat ReduceComparisons (Algorithm 9) separately in the Appendix, as it is not a full comparison and does not compute the pass/fail bit. There, we also give a cautionary note regarding input leakage.
Figure 7 shows our results. On the left, we conducted a regular FvR t-test, where the fixed input is a valid ciphertext. All of the masked comparisons show output leakage, due to the output pass/fail bit differing between the fixed and random classes of measurements.
On the right, we conduct our novel FNvR t-test. The output pass/fail bit is no longer leaked, since it is identical between the two classes of measurements. As expected, the output leakage indeed vanishes for [BDK+20], but remains present in [OSPG18, BPO+20]. Similar to our attack on these two implementations, a noise term added in just one coefficient causes one partial comparison to fail in the 'fixed + noise' class of measurements, while the remaining partial comparisons succeed. This stands in contrast to the random class of measurements, where all partial checks fail with high probability. In [OSPG18], we can now observe a second leakage peak, since an error added to c_1 will still let the comparison of c_2 succeed. The noise term is varied over the different coefficients, such that all partial comparisons will eventually show leakage in our framework.

Conclusion and Future Work
In this work, we attacked the masked implementations of polynomial comparison from [OSPG18] and [BPO+20] using side-channel attacks. Additionally, we demonstrated a collision attack on [BPO+20] that does not require side-channel information. Both attacks were verified in practical experiments. When instantiated with either of these masked polynomial comparison methods, the security of Kyber512 can be reduced drastically using a sufficient number of side-channel measurements (or decapsulation queries for [BPO+20]).
In addition, we showed that the polynomial comparison variant described in [BDK+20] can be used to fix the insecure comparison of [OSPG18], and we proposed a modification to the [BPO+20] method that reduces the number of coefficient comparisons to be performed. Our FNvR framework for side-channel testing suppresses output leakage and thus makes it easier to distinguish real leakage from output leakage, while still detecting all vulnerabilities a normal FvR t-test would detect.
All in all, our work shows that a careful validation and implementation of all steps of the FO transform is required when implementing secured lattice-based cryptography.Small leakages, implementation mistakes, or misunderstandings related to the attack model in the masked comparison might lead to powerful plaintext checking oracles.As a consequence, it is important to investigate new countermeasures, prove their security in the correct model, and develop approaches for theoretical and practical validation.With our FNvR framework, we provide the first step in this direction.
As a recent attack on a first-order masked Saber implementation [NDGJ21] has shown, first-order masking is not enough to provide sufficient protection against side-channel attacks. Therefore, the efficient design and implementation of side-channel protection, including shuffling techniques and higher-order masking, remains an interesting open topic. This concerns the masked comparison, but also other building blocks used in the FO transform. Future work in this direction could focus on higher-order masked polynomial comparisons, their efficient implementation, or on algorithms that support both prime and power-of-two moduli. In particular, it would be interesting to implement our correction of [BPO+20] to investigate its final performance and compare the results with other approaches. Additionally, it would be interesting to investigate improvements or generalizations of the framework provided in [DDGR20] to reduce the number of required approximate linear equations. A good understanding of the cost of retrieving the full secret could help to bound the amount of side-channel leakage an attacker would be allowed to obtain in a realistic scenario.

Acknowledgments
Part of this work was carried out at the workshop "Post-Quantum Cryptography for Embedded Systems", Oct. 2020, and we are thankful to the Lorentz Center, its staff, the organizers, and the participants of the workshop. This work was supported in part by CyberSecurity Research Flanders with reference number VR20192203, the Research Council KU Leuven (C16/15/058), the Horizon 2020 ERC Advanced Grant (695305 Cathedral), SRC grant 2909.001, and the German Federal Ministry of Education and Research (BMBF) under the project "Aquorypt" (16KIS1017). Michiel Van Beirendonck is funded by an FWO PhD fellowship for strategic basic research. Presented project results were partly supported by a project that has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 830927. The authors would like to thank the Chair for Communication Systems and Network Security as well as the research institute CODE at the Bundeswehr University in Munich, headed by Prof. Dreo, for their comments and improvements.

ReduceComparisons (Algorithm 9) can exhibit input leakage in FvR methods. This is because in line 9 there is a computation on the non-sensitive variable Ã, which, in our case, is the submitted input ciphertext. In our implementation, we moved this computation to the end of the algorithm in a separate loop, and use a yellow trigger to mark it in the measurements. As can be seen in Figure 8, the t-statistic takes high values during this interval, but otherwise shows no leakage during the processing of the S = 2 shares. These measurements confirm our theoretical analysis of the security of Algorithm 9. We leave the development of techniques to reduce the input leakage for future work.

Figure 1: Decapsulation of Kyber. All operations in grey are influenced by the long-term secret sk.

Figure 2: Visualization of a border-failure error e in Kyber.CPA.Dec without compression.

Figure 3: Visualization of triggering a decryption error in Kyber.CPA.Dec with error E_2, with compression.

Figure 4: Security of Kyber512 and Kyber768 as a function of the number of (approximate) equations retrieved.


Figure 6: Generation of input ciphertexts for the classes used in the FNvR test.

Figure 7: t-statistic for the different masked comparison algorithms after 10,000 collected traces, for an FvR test (left) and an FNvR test (right).