From MLWE to RLWE: A Differential Fault Attack on Randomized & Deterministic Dilithium

The post-quantum digital signature scheme CRYSTALS-Dilithium has recently been selected by NIST for standardization. Implementing CRYSTALS-Dilithium, and other post-quantum cryptography schemes, on embedded devices raises a new set of challenges, including ones related to performance in terms of speed and memory requirements, but also related to security against side-channel and fault injection attacks. In this work, we investigate the latter and describe a differential fault attack on the randomized and deterministic versions of CRYSTALS-Dilithium. Notably, the attack requires a few instruction skips and reduces the MLWE problem that Dilithium is based on to a smaller RLWE problem, which can be solved in practice with lattice reduction techniques. Accordingly, we demonstrate key recoveries using hints on the secret keys extracted from the same faulted signatures, relying on the LWE with side-information framework introduced by Dachman-Soled et al. at CRYPTO'20. As a final contribution, we propose algorithmic countermeasures against this attack and show in particular that the second one can be parameterized to induce only a negligible overhead on signature generation.


Introduction
Current digital security infrastructures heavily rely on secure and efficient cryptographic primitives, including digital signatures, which are based on asymmetric/public-key cryptography. However, schemes based on classic public-key cryptography, like RSA and ECC, are at risk of being broken once a relevant quantum computer is realized. This threat has accelerated the research into Post-Quantum Cryptography (PQC) schemes: cryptographic algorithms which are still secure even against an adversary with access to a quantum computer. A few years after the call for proposals for new public-key cryptography standards by the National Institute of Standards and Technology (NIST) [Nat], on July 5th, 2022, NIST selected two primary algorithms to standardize: CRYSTALS-Kyber for key establishment and CRYSTALS-Dilithium for digital signatures. In addition, the signature schemes FALCON and SPHINCS+ will also be standardized. Notably, CRYSTALS-Dilithium [DKL+21] (which we refer to as Dilithium for conciseness in the rest of this paper) is recommended for embedded use cases due to its relative efficiency compared to other PQC schemes.

Polynomial ring notations
All ring arithmetic operations in the paper are over the polynomial ring $R = \mathbb{Z}_q[X]/(X^n + 1)$. We denote a polynomial with regular lower-case letters, e.g., $p \in R$, a vector of polynomials with bold lower-case letters, e.g., $\mathbf{a} \in R^k$, and a matrix of polynomials with bold upper-case letters, e.g., $\mathbf{A} \in R^{k \times k}$. In addition, we denote the $i$-th coefficient of $p$ as $p_{[i]}$, the $i$-th coefficient of the $j$-th polynomial in $\mathbf{a}$ as $\mathbf{a}_{[j,i]}$, and the $i$-th coefficient of the $(k, j)$-th polynomial in $\mathbf{A}$ as $\mathbf{A}_{[k,j,i]}$. In cases where the lowest index is omitted, we are referring to the complete $j$-th and $(k, j)$-th polynomial of $\mathbf{a}$ and $\mathbf{A}$, respectively.
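For $z, \alpha \in \mathbb{Z}$ we write $z \bmod^{\pm} \alpha$ to mean the unique integer $z'$ in $]-\frac{\alpha}{2}, \frac{\alpha}{2}]$ (resp., $[-\frac{\alpha-1}{2}, \frac{\alpha-1}{2}]$) with $z \equiv z' \bmod \alpha$ if $\alpha$ is even (resp., odd). The notation $\mathbf{z} \bmod^{\pm} \alpha$ implies that all the coefficients in $\mathbf{z}$ are given $\bmod^{\pm} \alpha$. With this, the following norms on $\mathbb{Z}_q$, $R$ and $R^k$ are defined:
$$\|z\|_\infty = |z \bmod^{\pm} q|, \qquad \|p\|_\infty = \max_i \|p_{[i]}\|_\infty, \qquad \|\mathbf{a}\|_\infty = \max_j \|\mathbf{a}_{[j]}\|_\infty,$$
with $z \in \mathbb{Z}_q$, $p \in R$ and $\mathbf{a} \in R^k$. We use the notation $x \leftarrow \chi$ whenever we assign a uniformly random element of a set $\chi$ to a variable $x$. The symbol $\|$ is used for the concatenation of two bit strings or two vectors/matrices.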
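The centered reduction and the infinity norm can be illustrated with a minimal Python sketch. The function names `mod_pm` and `inf_norm` are hypothetical helpers introduced here for illustration, and polynomials are assumed to be plain lists of integer coefficients.

```python
Q = 8380417  # Dilithium modulus q

def mod_pm(z: int, alpha: int) -> int:
    """Centered representative of z modulo alpha.

    Even alpha: result lies in (-alpha/2, alpha/2].
    Odd alpha:  result lies in [-(alpha-1)/2, (alpha-1)/2].
    """
    r = z % alpha              # Python returns a value in [0, alpha)
    if r > alpha // 2:
        r -= alpha
    return r

def inf_norm(poly, q: int = Q) -> int:
    """Infinity norm of a polynomial given as a list of coefficients."""
    return max(abs(mod_pm(coeff, q)) for coeff in poly)
```

For a vector of polynomials, the norm is the maximum of `inf_norm` over its entries.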

Learning with errors
Learning with Errors (LWE) was first introduced by Regev [Reg05] and later extended to polynomial rings by Lyubashevsky, Peikert and Regev [LPR10] as Ring Learning with Errors (RLWE). A Module Learning with Errors (MLWE) problem is obtained by replacing the single ring element of an RLWE instance with a module of rank greater than one over the same ring, thereby relying on multiple polynomial ring elements in the same instance. RLWE instances can be expressed in LWE form by representing the polynomial ring multiplication as a matrix-vector product, as recalled in [LZS+21, Section 2]. In the following, we focus on the Search (as opposed to Decision) variants of these problems.
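To make the RLWE-to-LWE rewriting concrete, the following sketch builds the negacyclic matrix of a polynomial $a$ in $\mathbb{Z}_q[X]/(X^n + 1)$ and checks that multiplying it by the coefficient vector of $s$ gives the coefficients of $a \cdot s$. The helper names are hypothetical and the parameters in the usage note are toy values, not Dilithium parameters.

```python
def negacyclic_matrix(a, q):
    """n x n matrix M such that M @ coeffs(s) = coeffs(a * s mod X^n + 1).

    Column j holds the coefficients of a * X^j; wrapping past X^n flips the sign.
    """
    n = len(a)
    return [[a[i - j] % q if i >= j else (-a[n + i - j]) % q
             for j in range(n)]
            for i in range(n)]

def polymul_negacyclic(a, s, q):
    """Schoolbook multiplication in Z_q[X]/(X^n + 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] += a[i] * s[j]
            else:
                out[i + j - n] -= a[i] * s[j]   # X^n = -1
    return [x % q for x in out]

def matvec(M, v, q):
    """Matrix-vector product modulo q."""
    return [sum(m * x for m, x in zip(row, v)) % q for row in M]
```

With `a = [1, 2, 3, 4]`, `s = [1, 0, 0, 1]` and `q = 17`, both `matvec(negacyclic_matrix(a, q), s, q)` and `polymul_negacyclic(a, s, q)` agree, so one RLWE sample yields $n$ LWE rows.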

Search LWE problem.
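Let $\mathbf{A} \in \mathbb{Z}_q^{m \times n}$ and let $\chi_e$ be a fixed distribution over $\mathbb{Z}$. The problem of recovering a secret $\mathbf{s} \in \mathbb{Z}_q^n$ given samples of the form $(\mathbf{A}, \mathbf{t} = \mathbf{A} \cdot \mathbf{s} + \mathbf{e})$ with $\mathbf{e} \leftarrow \chi_e^m$ is known as the Search-LWE problem.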

Methods of solving Search-LWE.
There exist two main methods to solve a Search-LWE instance by lattice reduction: the primal-uSVP attack and the dual attack [AGVW17].
The primal attack uses either Kannan's embedding [Kan87] or the Bai-Galbraith embedding [BG14] to construct an integer embedding lattice in which the unique Shortest Vector Problem (uSVP) is solved. Using Kannan's embedding, recovering $\mathbf{s}$ and $\mathbf{e}$ given $\mathbf{t} = \mathbf{A} \cdot \mathbf{s} + \mathbf{e}$, where $\mathbf{t}, \mathbf{e} \in \mathbb{Z}_q^m$ and $\mathbf{s} \in \mathbb{Z}_q^n$, is as difficult as recovering the unique shortest non-zero vector $\mathbf{v} \in \mathbb{Z}^{m+n+1}$ from the embedding lattice $\Lambda$ in Equation 1 with embedding parameter $c$ and $\|\mathbf{v}\| \approx \sigma\sqrt{n + m}$ [ADPS16].
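For reference, one common way to write such an embedding lattice, following the presentation in [ADPS16] and fixing the embedding parameter to $c = 1$ (an assumption made here for concreteness, which may differ from the exact form of Equation 1), is:

$$\Lambda = \left\{ \mathbf{x} \in \mathbb{Z}^{m+n+1} : (\mathbf{A} \mid \mathbf{I}_m \mid -\mathbf{t}) \cdot \mathbf{x} \equiv \mathbf{0} \bmod q \right\}, \qquad \mathbf{v} = (\mathbf{s}, \mathbf{e}, 1).$$

Here $\mathbf{v} \in \Lambda$ because $(\mathbf{A} \mid \mathbf{I}_m \mid -\mathbf{t}) \cdot \mathbf{v} = \mathbf{A} \cdot \mathbf{s} + \mathbf{e} - \mathbf{t} = \mathbf{0}$, and $\mathbf{v}$ is short because $\mathbf{s}$ and $\mathbf{e}$ have small coefficients.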
On the other hand, the dual attack solves Decision-LWE via reduction to the Short Integer Solution (SIS) problem, which in turn is reduced to finding short vectors in a lattice embedding [Ajt99].
There exist two main sets of algorithms for finding short vectors in lattices: enumeration and sieving. Enumeration algorithms perform an (exhaustive) search for an integer linear combination of the basis vectors, with well-known examples being LLL (Lenstra-Lenstra-Lovász) [LLL82] and BKZ (Block Korkine-Zolotarev) [Kor77, Sch87]. LLL can only find an approximation of the shortest vector in polynomial time and, as such, is most commonly used as a pre-processing step in other lattice reduction algorithms. The BKZ-β algorithm repeatedly calls an enumeration SVP oracle for finding shortest vectors in dimension, or block size, β. The dimension β of the underlying SVP oracle is the most widely used measure for the security of lattice-based cryptography, as the time complexity of the BKZ algorithm is exponential in β. First introduced in [AKS01], lattice sieving algorithms find the shortest vector in a lattice by repeatedly computing linear combinations of vectors with the aim of producing shorter vectors. Whereas other algorithms require polynomial memory, lattice sieving algorithms have non-polynomial space complexity and typically require large amounts of memory.
The remainder of this work centers on the BKZ algorithm enhanced with the extreme pruning techniques outlined in [CN11], commonly referred to as BKZ 2.0. Multiple open-source implementations of the BKZ algorithm exist, most notably FPLLL [dt21] (which is used in [DDGR20]) and NTL [Sho21].

CRYSTALS-Dilithium
Signature generation. We describe the signature generation in Algorithm 1. For more details, e.g., regarding key generation or signature verification, we refer to [DKL+21]. First, the message $M$ is hashed into a bit string $\mu$. For deterministic signing, $\mu$ is hashed together with $K$ to produce a seed $\rho'$. For the randomized version, $\rho'$ is generated randomly. This seed and a rejection counter $\kappa$ (initially set to 0) are inputs to ExpandMask to sample the secret mask vector $\mathbf{y}$. Then, $\mathbf{w} = \mathbf{A}\mathbf{y}$ is decomposed into $\mathbf{w}_1$ and $\mathbf{w}_0$. The challenge seed $\tilde{c}$ is the hash of $\mu \| \mathbf{w}_1$. Next, $\tilde{c}$ is converted into a polynomial $c$ that contains exactly $\tau$ coefficients set to $\pm 1$ and the others set to zero. The vectors $\mathbf{z}$ and $\mathbf{r}$ are then computed from $c$, $\mathbf{y}$ and $\mathbf{w}_0$. For both security and correctness, two checks are performed: $\|\mathbf{z}\|_\infty < \gamma_1 - \beta$ and $\|\mathbf{r}\|_\infty < \gamma_2 - \beta$, where the Dilithium parameter $\beta = \tau \cdot \eta$ bounds the coefficients of $c\mathbf{s}_1$ and $c\mathbf{s}_2$ (it is unrelated to the BKZ block size).
If either of the two checks fails, $\kappa$ is incremented and the process starts over (from sampling a new $\mathbf{y}$). Otherwise, a hint $\mathbf{h}$ is computed, which is needed in the verification to account for the public key compression. Two more checks are performed on $c\mathbf{t}_0$ and $\mathbf{h}$. Again, if these checks do not pass, the signature is rejected, $\kappa$ is incremented and the process starts over. Otherwise, the signature $\sigma = (c, \mathbf{z}, \mathbf{h})$ is returned.

Fault injection attacks on Dilithium
Fault attacks against the signature generation of Dilithium have been the subject of several works in the literature, with the majority exploiting the computation of the signature $\mathbf{z} = \mathbf{y} + c\mathbf{s}_1$. The ability to potentially establish a linear relation between the long-term secret key component $\mathbf{s}_1$ and the public values $\mathbf{z}$ and $c$ has made this step a promising attack vector for a variety of adversaries [BBK16, BP18, RJH+19, IMS+22]. In addition, it has been shown that recovering $\mathbf{s}_1$ is sufficient to achieve existential forgery for Dilithium [BBK16, RJH+19].
In the following, we provide a short survey of fault attacks on Dilithium's signature generation algorithm and distinguish between the ones that only apply to the deterministic variant and the ones that apply to both deterministic and randomized Dilithium.

Deterministic and randomized Dilithium.
Bettale et al. [BMR21] investigate the application of safe error attacks to PQC schemes. In the Dilithium case, the main targets are the secret vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, since they have small coefficient values ($\in \{-\eta, \ldots, \eta\}$). The main objective of the attack is to fault each of the coefficients of $\mathbf{s}_1$ or $\mathbf{s}_2$ to zero and find faulted signatures which do not differ from unfaulted signatures, i.e., cases where the secret key coefficient is indeed zero. However, knowing the zero coefficients in the secrets does not allow for straightforward key recovery but rather reduces the complexity of the underlying MLWE problem.
Loop abort faults were first proposed by Page and Vercauteren [PV06]. Their use against lattice-based cryptosystems was first documented in [EFGT16], in an attack that targets the sampling of $\mathbf{y}$ (ExpandMask) to generate masks of low degree by skipping the sampling of some coefficients in $\mathbf{y}$. The non-sampled coefficients of $\mathbf{y}$ are then assumed to be zero and, with enough (faulty) signatures, a solvable system of equations between $\mathbf{z}$ and $c\mathbf{s}_1$ can be established.
The so-called nonce reuse attacks target sampling in a variety of lattice-based schemes and have been demonstrated in [RRB+19]. This previous work applies to the Dilithium key generation, and the main idea is to output faulty keys by not incrementing the nonce used to generate the secret key vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, thus outputting weak keys.
Islam et al. [IMS+22] propose a fault injection attack against both variants of Dilithium. The main idea behind this attack is that a single bit flip in the secret polynomials will result in (a limited number of) specific fault patterns in the faulty signature. These patterns can be recovered by exhaustively correcting the faulty signature and testing with the public key whether it verifies. The number of recovered secret key bits depends on the number of injected bit flips and the distribution of the key coefficients.
Deterministic Dilithium only. The deterministic variant is naturally more vulnerable to differential fault attacks (DFA) than the randomized one. The first DFA against the signing procedure of deterministic Dilithium was demonstrated by Bruinderink and Pessl [BP18], who showed that 65% of the execution of a deterministic Dilithium signature is vulnerable. Their work requires that an attacker can sign the same message $M$ multiple times using the deterministic variant of Dilithium: first, to obtain a proper signature $\sigma = (c, \mathbf{z}, \mathbf{h})$ without any fault injected, and then to produce a faulted signature $\sigma'$ to obtain a faulty $\mathbf{z}'$ and $c'$. Using $\mathbf{z}$, $c$, $\mathbf{z}'$, $c'$, the attacker can recover $\mathbf{s}_1$ by computing $\mathbf{s}_1 = \Delta c^{-1} \Delta\mathbf{z}$, where $\Delta c = c - c'$ and $\Delta\mathbf{z} = \mathbf{z} - \mathbf{z}'$. However, the faulted $\mathbf{z}'$ and $c'$ must come from the same signing attempt (the same value of the counter $\kappa$) as the proper $\sigma$ for the masks $\mathbf{y}$ to be equal. Naturally, this attack, or more generally standard DFA, does not apply to randomized Dilithium, since an adversary cannot observe or fault two signatures for the same message with the same value of $\mathbf{y}$.

Attack description & practical consideration
At a high level, the attack proposed in this work leverages the possibility for an adversary to force (part of) the polynomials in the vector of polynomials $\mathbf{y}$ to be equal during signature generation, i.e., $\mathbf{y}_{[i]} = \mathbf{y}_{[j]}$ for $i \neq j$. From the observation of the resulting faulty signature $\mathbf{z}$, the adversary can compute the difference between signature polynomials $\Delta z = \mathbf{z}_{[i]} - \mathbf{z}_{[j]} = c(\mathbf{s}_{1[i]} - \mathbf{s}_{1[j]})$ for $i \neq j$. With $\ell - 1$ such independent differences, the underlying lattice problem of Dilithium is weakened, leading to possible key recovery. Namely, we demonstrate how the resulting faults can be used to perform a lattice attack. We first perform concrete attacks against the version of Dilithium considered in proofs (with $\mathbf{t}$ being public). In such a case, we are able to recover keys for all Dilithium parameter sets. Then, we discuss the applicability to concrete instantiations of Dilithium where only $\mathbf{t}_1$ (the upper bits of $\mathbf{t}$) is public. In such a case, the inserted faults lead to a significant drop in estimated security for all the parameter sets, but we report successful key recovery only against the level V parameter set due to limitations in available computing resources.
In the rest of this section, we first describe how an instruction skip can be leveraged to insert such faults in Dilithium implementations. We also provide the number of faults needed when considering realistic adversarial capabilities, namely one instruction skip per signature. In particular, we discuss the applicability to randomized and deterministic versions of Dilithium. Second, we put forward how the attack turns Dilithium's MLWE problem into an RLWE one. In Section 4, we discuss the resulting estimated security and concrete attack results (when $\mathbf{t}$ is public) when the plain RLWE problem needs to be solved with Dilithium MLWE parameters. In Section 5, we put forward how partial secret key enumeration can be performed prior to the lattice reduction attack and then plugged into a lattice reduction framework, further reducing the attack runtime.
Figure 1 (b): Disassembled source code compiled with arm-none-eabi-gcc v10.3.1 with the following compiler flags: -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O3 -fomit-frame-pointer. Red corresponds to the targeted counter increment.

Vulnerability description
Weakness identification. As mentioned above, the goal of the adversary is to force (part of) the polynomials in the vector $\mathbf{y}$ to be equal and build equations of the form $\Delta z = \mathbf{z}_{[i]} - \mathbf{z}_{[j]}$ for $i \neq j$. Hence, a natural sweet spot for such a fault is the generation of these polynomials. Indeed, each polynomial $\mathbf{y}_{[i]}$ in $\mathbf{y}$ is derived from the ExpandMask function such that $\mathbf{y}_{[i]} \leftarrow \text{ExpandMask}(\rho', \kappa \cdot \ell + i)$, where $\kappa$ is the rejection counter. The ExpandMask function, referred to as polyvecl_uniform_gamma1 in the reference implementation, is reported in Figure 1. This function generates all the $\ell$ polynomials of the vector $\mathbf{y}$ by repeatedly calling the function poly_uniform_gamma1 with the same seed $\rho'$ but with the nonce $\kappa \cdot \ell + i$ incremented at each call. Our new attack is based on skipping the instruction highlighted in red in the two listings. That is, by skipping the increment of the nonce in line 12 in Figure 1b, an attacker can force the reuse of the same nonce for the generation of two consecutive polynomials in $\mathbf{y}$. For instance, skipping the increment instruction during the first iteration of the loop forces $\mathbf{y}_{[0]} = \mathbf{y}_{[1]}$. Finally, we note that repeating this fault over multiple independent signatures forces two polynomials in $\mathbf{y}$ to be equal in each of them, which can be exploited as detailed in Subsection 3.2. We do not report experiments for injecting such a fault; however, recent literature has shown that the single instruction skip fault model can be realized in practice with high precision and reproducibility [DRPR19, MDP+20, BFP19].
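The effect of the skipped increment can be modeled with a short simulation. This is a minimal sketch under stated assumptions: `toy_expand_poly` is a hypothetical stand-in for poly_uniform_gamma1 that derives coefficients from SHAKE-256 without reproducing the reference bit-packing, the nonce is modeled as a running counter, and the `skip_increment_at` parameter (also hypothetical) models the instruction skip.

```python
import hashlib

N = 256            # polynomial degree
GAMMA1 = 1 << 17   # mask range for level II

def toy_expand_poly(seed: bytes, nonce: int) -> list:
    """Toy sampler: N coefficients in (-GAMMA1, GAMMA1] from SHAKE-256(seed || nonce)."""
    stream = hashlib.shake_256(seed + nonce.to_bytes(2, "little")).digest(3 * N)
    coeffs = []
    for i in range(N):
        v = int.from_bytes(stream[3 * i:3 * i + 3], "little") & (2 * GAMMA1 - 1)
        coeffs.append(GAMMA1 - v)
    return coeffs

def toy_expand_mask(seed: bytes, kappa: int, ell: int, skip_increment_at=None) -> list:
    """Sample the ell polynomials of y; optionally skip one nonce increment."""
    nonce = kappa * ell
    y = []
    for i in range(ell):
        y.append(toy_expand_poly(seed, nonce))
        if i != skip_increment_at:   # the fault removes exactly this increment
            nonce += 1
    return y
```

Calling `toy_expand_mask(seed, 0, 4, skip_increment_at=0)` returns a vector whose first two polynomials are identical, whereas the unfaulted call returns distinct polynomials.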
Next, we analyze aspects of the attack and strategies that affect the number of fault injections needed. Based on these aspects, we then provide the total number of instruction skips required.
Identifying successful fault injection. The attack relies on collecting $\ell - 1$ correct and independent equations of the form $\mathbf{z}_{[i]} - \mathbf{z}_{[j]} = c(\mathbf{s}_{1[i]} - \mathbf{s}_{1[j]})$ for $i \neq j$ from the returned signatures. We notice that an attacker can easily detect whether they have successfully injected a fault in the last signing attempt by observing the difference between the affected signature polynomials. For instance, for a fault in the first loop iteration, $\mathbf{z}_{[0]} - \mathbf{z}_{[1]} = c(\mathbf{s}_{1[0]} - \mathbf{s}_{1[1]})$, and we know from the distribution of the challenge polynomial and the secret vector that $\|c(\mathbf{s}_{1[0]} - \mathbf{s}_{1[1]})\|_\infty \leq 2\tau\eta$. Without a fault, it is overwhelmingly unlikely that all coefficients of $\mathbf{z}_{[0]} - \mathbf{z}_{[1]}$ are in that small interval. Fault attacks usually identify correctly inserted faults by comparing valid with (possibly) faulty signatures, an approach that the randomized version of Dilithium does not allow. However, we put forward that the previously described property enables an adversary to identify successful fault injection for both deterministic and randomized versions of Dilithium for arbitrary messages.
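This detection criterion can be sketched as a simple norm check. The bound $2\tau\eta$ follows from $c$ having $\tau$ coefficients equal to $\pm 1$ and each coefficient of a difference of two secret polynomials lying in $[-2\eta, 2\eta]$; the function name `looks_faulted` is a hypothetical helper, and polynomials are assumed to be lists of integer coefficients.

```python
Q = 8380417  # Dilithium modulus q

def centered(x: int, q: int = Q) -> int:
    """Centered representative of x modulo q."""
    r = x % q
    return r - q if r > q // 2 else r

def looks_faulted(z_a, z_b, tau: int, eta: int) -> bool:
    """True if every coefficient of z_a - z_b fits the bound 2 * tau * eta.

    With equal masks, z_a - z_b = c * (s_a - s_b), whose coefficients are
    bounded by 2 * tau * eta in absolute value.
    """
    bound = 2 * tau * eta
    return all(abs(centered(a - b)) <= bound for a, b in zip(z_a, z_b))
```

For level II ($\tau = 39$, $\eta = 2$) the bound is 156, which is tiny compared to the mask range $\gamma_1 = 2^{17}$ that unfaulted differences would span.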
The attack requires faulty outputs; however, the Dilithium signature generation usually involves a few attempts. Every time a signature is rejected, the counter $\kappa$ is incremented and a new signature is generated. Accordingly, an attacker does not know in advance which signing iteration will be the final one and therefore cannot reliably inject a fault in its execution. The same holds even for faulty deterministic signatures, since the fault affects $\mathbf{y}$ and hence $\mathbf{z}$, $\mathbf{w}$, $c$ and $\mathbf{r}$, which then affect the rejection likelihood. Next, we discuss attack strategies for randomized and deterministic signing to deal with Dilithium's abort property and estimate the number of faults needed for the attack.

Attack strategy for randomized signatures.
Recall that an attacker can identify successful fault injections by analyzing the returned signatures, as described previously. If it is not possible for an attacker to target the last iteration because of randomized signing or the fact that the fault changes the rejection behavior, they can target the first one. Based on Dilithium's abort probability, the first iteration is the most likely to return a signature and succeeds with probabilities $p \approx 0.23$, $p \approx 0.20$ and $p \approx 0.26$ for levels II, III and V. Interestingly, the fault described earlier decreases the probability that a signature is rejected. By sampling a large number of signatures, we estimated the probability of accepting a signature at a faulted first iteration and found that this probability increases to $p \approx 0.26$, $p \approx 0.21$ and $p \approx 0.27$ for levels II, III and V, respectively. By targeting only the first iteration and taking into account the impact of the fault on the rejection probability, an attacker requires on average $1/p$ fault injections to acquire one faulty signature. This has been confirmed by sampling a large number of signatures and corresponds to $\approx 3.8$, $4.7$ and $3.7$ fault injections for levels II, III and V. Finally, to acquire the $\ell - 1$ faulty signatures needed to carry out the rest of the attack, on average 11.3, 18.8 and 22.2 fault injections are needed for levels II, III and V, respectively.
Improved attack strategy for deterministic signatures. In the deterministic case, an attacker can determine the expected final signing iteration, since signing the same message multiple times always results in the same signature and the same number of iterations. For the randomized case, we proposed to always target the first iteration since it maximizes the probability of accepting a signature. For the deterministic case, this probability is maximized by faulting the final iteration. However, this probability is not 1, since faulting the generation of $\mathbf{y}$ also affects $\mathbf{w}$, $c$ and $\mathbf{r}$. Hence, the fault could lead to a signature being rejected despite being previously accepted when no fault was injected. We estimated by sampling that the probability of accepting a signature after the fault at the previously determined accepted iteration is $p \approx 0.4$, $p \approx 0.31$ and $p \approx 0.38$ for levels II, III and V, respectively. These probabilities are naturally higher than the ones determined previously for the randomized case when always targeting the first signing iteration. Accordingly, in the deterministic case an attacker requires on average 2.4, 3.2 and 2.6 fault injections for levels II, III and V to acquire one faulty signature. To acquire the $\ell - 1$ faulty signatures, we estimated that on average 7.3, 13 and 15.8 fault injections are needed for levels II, III and V, respectively. In addition, as opposed to the randomized case, valid signatures are needed to determine the expected final signing attempt.
Overview of number of fault injections required. In the following, we provide an overview of the number of fault injections required for the attack when skipping one instruction per signature. The attack can also be carried out with $\ell - 1$ faults in a single signature; however, this can be quite challenging in practice. Recall that the attack requires $\ell - 1$ independent equations of the form $\Delta z = c(\mathbf{s}_{1[i]} - \mathbf{s}_{1[j]})$. Collecting these equations requires skipping one instruction in the last signing iteration for $\ell - 1$ signatures. We have estimated for both deterministic and randomized Dilithium, following the previously described strategies, the number of fault injections needed for an attacker to acquire such $\ell - 1$ faulty signatures. The results are shown in Figure 2, where the x-axis corresponds to the total number of fault injections and the y-axis to the probability of acquiring $n_\sigma$ faulty signatures such that $n_\sigma \geq \ell - 1$. To illustrate how to interpret this figure, we provide a few examples. For instance, for randomized Dilithium II, with 10 fault injections, an attacker acquires the $\ell - 1$ required faulty signatures with probability 0.5. Finally, for all security levels, with approximately 40 fault injections for randomized Dilithium and approximately 25 fault injections for deterministic Dilithium, the attacker will most likely succeed in acquiring the $\ell - 1$ faulty signatures.
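These figures can be approximated with a simple binomial model. The sketch below assumes that fault injections succeed independently with the per-injection acceptance probability $p$ reported above; this independence assumption and the function names are introduced here for illustration and are not taken from the experiments.

```python
from math import comb

def expected_fault_injections(p: float, needed: int) -> float:
    """Average number of injections to collect `needed` faulty signatures."""
    return needed / p

def prob_enough_faulty_signatures(n_faults: int, p: float, needed: int) -> float:
    """P[at least `needed` successes among n_faults independent injections]."""
    return sum(comb(n_faults, k) * p**k * (1 - p)**(n_faults - k)
               for k in range(needed, n_faults + 1))

# Randomized Dilithium II: l - 1 = 3 faulty signatures, p ~ 0.26 per injection.
print(expected_fault_injections(0.26, 3))
print(prob_enough_faulty_signatures(10, 0.26, 3))
```

Under this model, 10 injections against randomized level II give a success probability close to one half, in line with the example read from Figure 2.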

From MLWE to RLWE
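For simplicity, in the rest of this work, we assume that an adversary was able to inject $(\ell - 1)$ instruction skips during the final signing iteration such that all the polynomials of $\mathbf{y}$ are equal, i.e., $\forall i \in \{0, 1, \ldots, \ell - 1\}$, $\mathbf{y}_{[i]} = y$. Accordingly, such a faulty signature is of the form:
$$\mathbf{z} = \begin{pmatrix} \mathbf{z}_{[0]} \\ \vdots \\ \mathbf{z}_{[\ell-1]} \end{pmatrix} = \begin{pmatrix} y + c\,\mathbf{s}_{1[0]} \\ \vdots \\ y + c\,\mathbf{s}_{1[\ell-1]} \end{pmatrix} \tag{2}$$
From this, we express the $\ell - 1$ differences between the first polynomial in $\mathbf{s}_1$ and the following ones as:
$$\mathbf{s}_{1[0]} - \mathbf{s}_{1[i]} = c^{-1}\left(\mathbf{z}_{[0]} - \mathbf{z}_{[i]}\right) = \boldsymbol{\lambda}_{[i]}, \qquad i \in \{1, \ldots, \ell - 1\}, \tag{3}$$
where the vector of polynomials $\boldsymbol{\lambda}$ (with $\boldsymbol{\lambda}_{[0]} = 0$) is constructed from the faulty signatures. Indeed, the challenge polynomial $c$ is most likely invertible thanks to the Dilithium polynomial ring and is part of the signature. Concretely, Equation 3 illustrates that recovering one polynomial of $\mathbf{s}_1$ enables the recovery of the full secret vector $\mathbf{s}_1$. Indeed, this linear system has $\ell - 1$ independent equations with $\ell$ unknowns. Similar equations can be derived from multiple faulty signatures by simply taking into account the different $\mathbf{z}$ and $c$ values. Which pairwise differences of the polynomials of $\mathbf{s}_1$ an attacker gets does not matter, as long as they obtain $\ell - 1$ independent ones, which is the case when skipping the increment in the ExpandMask function as detailed in the previous section. Next, we put forward how one secret polynomial can be recovered by the adversary. For simplicity, we only describe the methodology for $\mathbf{s}_{1[0]}$. The same strategy can be used to recover any $\mathbf{s}_{1[i]}$ and, from Equation 3, all the remaining $(\ell - 1)$ polynomials of $\mathbf{s}_1$. Concretely, we observe that the MLWE instance of Dilithium $\mathbf{t} = \mathbf{A} \cdot \mathbf{s}_1 + \mathbf{s}_2$ can be expressed by exploiting the linear system of equations in Equation 3 as:
$$\mathbf{t} = \mathbf{A} \cdot \begin{pmatrix} \mathbf{s}_{1[0]} \\ \mathbf{s}_{1[0]} - \boldsymbol{\lambda}_{[1]} \\ \vdots \\ \mathbf{s}_{1[0]} - \boldsymbol{\lambda}_{[\ell-1]} \end{pmatrix} + \mathbf{s}_2 \tag{4}$$
Expanding the matrix multiplication, the previous equality is equivalent to:
$$\mathbf{t}_{[i]} + \sum_{j=1}^{\ell-1} \mathbf{A}_{[i,j]}\,\boldsymbol{\lambda}_{[j]} = \left(\sum_{j=0}^{\ell-1} \mathbf{A}_{[i,j]}\right) \mathbf{s}_{1[0]} + \mathbf{s}_{2[i]}, \qquad i \in \{0, \ldots, k - 1\}, \tag{5}$$
where each of the rows is an RLWE problem. For instance, the first row is:
$$\mathbf{t}_{[0]} + \sum_{j=1}^{\ell-1} \mathbf{A}_{[0,j]}\,\boldsymbol{\lambda}_{[j]} = \left(\sum_{j=0}^{\ell-1} \mathbf{A}_{[0,j]}\right) \mathbf{s}_{1[0]} + \mathbf{s}_{2[0]}, \tag{6}$$
where the left part of the equation is known to the adversary thanks to the standard Dilithium MLWE instance and public key (giving $\mathbf{t}$ and $\mathbf{A}$) and the $\boldsymbol{\lambda}$ from the faulty signatures. The right part of the equation depends on two single secret polynomials, $\mathbf{s}_{1[0]}$ and $\mathbf{s}_{2[0]}$. Therefore, our attack reduces the Dilithium MLWE instance of dimension $(k, \ell)$ to a $(1, 1)$ instance (hence an RLWE instance) using the same polynomial ring with dimension $n = 256$. Hence, the resulting RLWE has a reduced security compared to the original MLWE problem. In Section 4.1, we explore the feasibility of recovering the full key by solving this reduced complexity RLWE problem through lattice reduction.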
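The reduction can be sanity-checked numerically on a toy instance. The sketch below assumes the sign convention $\boldsymbol{\lambda}_{[j]} = \mathbf{s}_{1[0]} - \mathbf{s}_{1[j]}$ and uses toy parameters (ring degree 8, a 2 x 3 matrix); it computes $\boldsymbol{\lambda}$ directly from the secret instead of inverting $c$, which a real attacker would have to do, and all helper names are hypothetical.

```python
import random

Q = 8380417       # Dilithium modulus
N = 8             # toy ring degree (Dilithium uses n = 256)
K, L = 2, 3       # toy module dimensions (k, l)
ETA, TAU = 2, 3   # toy secret range and challenge weight

def pmul(a, b):
    """Multiplication in Z_Q[X]/(X^N + 1)."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - N] -= a[i] * b[j]
    return [x % Q for x in out]

def padd(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def psub(a, b):
    return [(x - y) % Q for x, y in zip(a, b)]

def check_reduction(seed: int = 0) -> bool:
    rng = random.Random(seed)
    uniform = lambda: [rng.randrange(Q) for _ in range(N)]
    small = lambda: [rng.randint(-ETA, ETA) % Q for _ in range(N)]
    A = [[uniform() for _ in range(L)] for _ in range(K)]
    s1 = [small() for _ in range(L)]
    s2 = [small() for _ in range(K)]
    # Public key: t = A * s1 + s2
    t = []
    for i in range(K):
        acc = s2[i]
        for j in range(L):
            acc = padd(acc, pmul(A[i][j], s1[j]))
        t.append(acc)
    # Faulted signing: the same mask polynomial y is used for every entry of z
    y = uniform()
    c = [0] * N
    for pos in rng.sample(range(N), TAU):
        c[pos] = rng.choice([1, Q - 1])   # coefficients in {+1, -1}
    z = [padd(y, pmul(c, s1[j])) for j in range(L)]
    # lambda_j = s1[0] - s1[j]; the attacker gets it as c^{-1} * (z[0] - z[j])
    lam = [psub(s1[0], s1[j]) for j in range(L)]
    ok = all(psub(z[0], z[j]) == pmul(c, lam[j]) for j in range(L))
    # Each row: t_i + sum_j A_ij * lambda_j == (sum_j A_ij) * s1[0] + s2_i
    for i in range(K):
        lhs, a_sum = t[i], [0] * N
        for j in range(L):
            lhs = padd(lhs, pmul(A[i][j], lam[j]))
            a_sum = padd(a_sum, A[i][j])
        ok = ok and lhs == padd(pmul(a_sum, s1[0]), s2[i])
    return ok
```

Each row checked at the end is a single-polynomial (RLWE) relation in the unknowns $\mathbf{s}_{1[0]}$ and $\mathbf{s}_{2[i]}$.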
A note on key compression. Above, we described how the attack decreases the security of Dilithium's underlying MLWE problem. This attack assumes that Dilithium's public key contains $\mathbf{t} = \mathbf{A} \cdot \mathbf{s}_1 + \mathbf{s}_2$, as considered in the security proofs. However, to reduce the public key size, the complete $\mathbf{t}$ is decomposed into a high part $\mathbf{t}_1$, which is part of the public key, and a low part $\mathbf{t}_0$, which is part of the secret key, with the relation $\mathbf{t} = \mathbf{t}_1 \cdot 2^d + \mathbf{t}_0$. In previous attacks, since the knowledge of $\mathbf{t}_0$ is typically required to recover the secret vector $\mathbf{s}_2$, it has been shown that recovering $\mathbf{s}_1$ only is sufficient to achieve existential forgery for Dilithium [BBK16, RJH+19]. In our attack, the knowledge of $\mathbf{t}_0$ is not required for the MLWE to RLWE reduction, as $\mathbf{t}_0$ can be embedded into the additive noise vector (moving $\mathbf{t}_0$ to the right part of Equation 6). However, it has an impact on the resulting security of the RLWE problem, making it harder to solve. As it is unclear whether $\mathbf{t}_0$ must be considered secret or public (it has been hinted that $\mathbf{t}_0$ can be recovered from enough signatures in [Lyu22, RJH+18, RRB+19]), we take a worst-case approach for the rest of this work. If not specified in the following sections, the full $\mathbf{t}$ is assumed to be public. In Subsection 5.3, we derive the impact of a fully secret $\mathbf{t}_0$ on the complexity of the (reduced) RLWE instance. We hope that this approach gives the reader a complete view of the applicability of the attack to Dilithium.
Finally, once the missing polynomial $\mathbf{s}_{1[0]}$ is obtained thanks to a lattice reduction attack, the complete $\mathbf{s}_1$ can be recovered thanks to the linear system of equations. The secret vector $\mathbf{s}_2$ can be obtained thanks to Equation 5 when $\mathbf{t}$ is known. In such a case, the relations put forward in Appendix A can also be used. If only $\mathbf{t}_1$ is known, the adversary can leverage the results from [BBK16, RJH+19] to forge valid signatures.

Impact on estimated security & key recovery
In the previous section, we showed that faulty signatures make it possible to reduce the MLWE instance of Dilithium to an RLWE one. In the following, we evaluate the complexity of solving the RLWE problem in Equation 5 using lattice reduction. Concretely, we first estimate the security parameter β with various tools from the literature, both for MLWE and RLWE. This allows us to quantify the security reduction. Second, we solve concrete RLWE instances and report the runtime. The results presented in this section are summarized in Table 2. Later, in Section 5, we will highlight how side-information can be used to decrease the security of the RLWE instance even further [DDGR20].
In the following, we denote by pre-attack the unfaulted case, i.e., the standard hardness of Dilithium's MLWE problem. Accordingly, post-attack stands for the case where the adversary inserted the required faults, resulting in an easier RLWE problem. In both cases, $\mathbf{t}$ is considered public.

Post-attack hardness estimates
In order to estimate the hardness of MLWE and RLWE instances, we use the estimator in [DDGR20]. This estimator has been selected as it makes it easy to integrate side-information (see Section 5), perform attacks and estimate the BKZ block size β. In addition, it was observed by the authors to be accurate for small dimension lattices, which is the case in our scenario (we compare various estimators in Appendix B and observe the same behavior).
The results are provided in Table 2 for the three Dilithium NIST security levels. The top half of the table corresponds to the standard Dilithium security prior to the attack and the bottom half of the table to the reduced security after the attack, obtained by exploiting the reduction from MLWE to RLWE resulting from the induced faults. The dimensions for the respective problems are also provided. From Table 2, it is clear that after the fault attack the baseline security is significantly decreased. For illustration, the estimated BKZ block size β is reduced from 434 to 62 for Dilithium level II, from 641 to 68 for level III and from 890 to 62 for level V. Interestingly, we note that for levels II and V, the resulting RLWE instances are equivalent, since both the resulting dimensions and the noise range η are the same. The only difference is the number of faults that need to be injected to reduce the MLWE down to RLWE. For level III, the noise range is slightly larger, hence the estimated β is slightly higher.

Solving RLWE instances
For all versions, the significant hardness reduction leads to much smaller LWE dimensions and BKZ block sizes, and hence to practical lattice reduction attacks. For this purpose, we use the toolbox provided by Dachman-Soled et al. [DDGR20]. The Search-LWE problem is solved via primal-uSVP and Kannan's embedding (as mentioned in Subsection 2.2). In terms of setup, we use 16 cores of an Intel Xeon processor running at 2.75 GHz.
The results of the lattice reduction attacks are shown in Table 2. In particular, we provide the minimum, average and maximum BKZ block size β, along with the minimum, average and maximum runtime observed over 10 lattice reductions for random keys for Dilithium level II. The results for levels II and V are identical since the LWE instances are equivalent, as discussed above. Over our 10 experiments with level II and V parameters, we were always able to recover the polynomial $\mathbf{s}_{1[0]}$ in Equation 6, with an average runtime of 40 hours. The rest of the secret key is then trivially obtained from Equation 3. Interestingly, we observe that the actual β is close to the estimated one. The actual β is on average equal to 59 and was estimated to be 62, confirming that the estimations of [DDGR20] are accurate in this case. We did not perform the attack for level III since the attack time would be prohibitive with our current setup (the estimated BKZ block size β for level III is 68). However, since Table 2 shows that the block size for level III is only marginally higher than the one for levels II and V, the lattice reduction should still be practical, potentially with lattice sieving.

Improving attacks with side-information
In the previous sections, we showed that it is possible to recover the secret signing key of Dilithium with very few signatures using a combined fault injection and lattice reduction attack. The lattice reduction attack succeeds without using any additional information, relying only on the faulty signatures.
In the following, we first show in Subsection 5.1 how additional knowledge on the secret key, referred to as side-information (or hints) in [DDGR20], can be recovered from the faulty signatures. This additional knowledge can be plugged into the solver to further reduce the complexity of the previous lattice reductions. Subsequently, in Subsection 5.2, we provide security estimates and the runtime of the improved attack. These results are summarized in Table 3. In Subsection 5.3, we explore the possibility of recovering the secret without the full knowledge of $\mathbf{t}$, using instead only the high part $\mathbf{t}_1$ that is shared as part of the public key after key compression. Finally, in Subsection 5.4, we compare these results with other published attacks.

Side-information from partial enumeration
In addition to reducing the number of dimensions of the MLWE problem, the faults can also be used to gain additional information on each of the coefficients $\mathbf{s}_{1[0,j]}$ in $\mathbf{s}_{1[0]}$ independently. Indeed, for each of these coefficients, some values can never be compatible with the linear system of equations in Equation 3. Concretely, we first observe that these values are uniformly distributed on a small range $[-\eta, \eta]$ with $\eta \in \{2, 4\}$. Then, the linear system in Equation 3 can be expressed for each of the $n$ coefficients independently. As a result, it is possible to narrow down the possible values for each $\mathbf{s}_{1[0,j]}$ independently. This can be efficiently achieved by enumerating over all possibilities for the $\ell$ coefficients $\mathbf{s}_{1[:,j]}$ at a particular index $j$. The plausible values of $\mathbf{s}_{1[0,j]}$ are then obtained by listing all the $\mathbf{s}_{1[:,j]}$ leading to a valid solution of Equation 3. Overall, this process involves $n$ enumerations over $(2\eta + 1)^\ell$ possibilities.
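The per-coefficient enumeration can be sketched as follows. The sketch assumes that the attacker has already extracted, for a fixed coefficient index $j$, the integer differences $d_i = \mathbf{s}_{1[0,j]} - \mathbf{s}_{1[i,j]}$ for $i = 1, \ldots, \ell - 1$ from the linear system (this sign convention and the function name are assumptions made for illustration).

```python
from itertools import product

def plausible_first_coeffs(diffs, eta):
    """Values of s1[0, j] compatible with diffs[i-1] = s1[0, j] - s1[i, j].

    Enumerates all (2 * eta + 1) ** l candidate tuples (s1[0, j], ..., s1[l-1, j])
    in [-eta, eta] and keeps the first entry of each tuple that matches every
    observed difference.
    """
    values = range(-eta, eta + 1)
    plausible = set()
    for cand in product(values, repeat=len(diffs) + 1):
        if all(cand[0] - cand[i] == d for i, d in enumerate(diffs, start=1)):
            plausible.add(cand[0])
    return sorted(plausible)

print(plausible_first_coeffs([4, 0, 1], eta=2))   # a difference of 4 pins the coefficient
print(plausible_first_coeffs([0, 0, 0], eta=2))   # all-zero differences reveal nothing
```

An extreme difference such as $\pm 2\eta$ fixes the coefficient completely, whereas differences close to zero leave several candidates. More equations (larger $\ell$) add constraints and shrink the candidate set.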
The results of such an enumeration are plotted in Figure 3 for Dilithium with NIST security levels II, III and V, from left to right. The enumeration is performed by simulating our attack for 10,000 random Dilithium keys for each level. On all the plots, the x-axes correspond to the number of possibilities remaining after enumeration for a single coefficient s_1[0,j], i.e., the size of the solution space for a single coefficient after solving the system of equations. This value ranges from 1 (the coefficient is known) to 2η + 1 (no side-information is recovered). The y-axes correspond to the number of coefficients in the polynomial s_1[0] that are known up to the number of possibilities given on the x-axis. Since these results are provided for a set of 10,000 random keys for each plot, the red crosses indicate the average number of coefficients with x remaining possibilities. Solid lines indicate key space reduction within ±1.5 times the interquartile range from the median, while the hollow circles indicate outliers.
First, we observe from Figure 3 that the number of coefficients fully recovered is relatively high. Specifically, on average we fully recover ≈ 80 coefficients (31%), ≈ 45 (17%) and ≈ 156 (61%) out of the total 256 coefficients for all secret key polynomials, for level II, III and V, respectively. Many other coefficients are also reduced to 2 possibilities (for instance ≈ 90 and ≈ 60 coefficients for level II and III, respectively) and in general to fewer than 2η + 1 possibilities. No information is recovered on a very small number of coefficients, for which the initial 2η + 1 possible values remain. Accordingly, in addition to reducing the MLWE problem to a significantly easier RLWE problem, the attack is also able to recover a significant portion of the secret key coefficients and partial information on the remaining ones.
Interestingly, the key recovery proportion depends on the security level, precisely on the values of the parameters η and ℓ. First and naturally, the key recovery potential is lower for level III, since the solution space for the system of pairwise differences is larger for η = 4 than for levels II and V, for which η = 2. Second and surprisingly, the increase in ℓ leads to more coefficients being recovered, as illustrated by the enumeration results for level V. As opposed to standard MLWE security, where increasing ℓ results in harder instances, in this case the increase in ℓ negatively impacts security (while still increasing the number of faults needed). This is due to having more equations in the system we use for enumeration, and hence more constraints which reduce the key space.
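The dependence on η and ℓ can be checked with a back-of-the-envelope computation. The sketch below assumes, as an interpretation rather than a statement from the paper, that the constraints at index j are the differences between s_1[i,j] and s_1[0,j]. Under that assumption the number of remaining candidates is 2η + 1 minus the spread (max − min) of the ℓ values s_1[:,j], so a coefficient is fully recovered exactly when both −η and η occur among them. Inclusion-exclusion then gives the probability 1 − 2(2η/(2η + 1))^ℓ + ((2η − 1)/(2η + 1))^ℓ. The (ℓ, η) pairs are those of the standard Dilithium parameter sets; the function name is hypothetical.

```python
def full_recovery_probability(ell, eta):
    """Probability that both -eta and eta appear among ell uniform draws
    from [-eta, eta], i.e. that s_1[0,j] is uniquely determined (assumed model)."""
    size = 2 * eta + 1
    miss_one_extreme = ((size - 1) / size) ** ell   # a given extreme never drawn
    miss_both_extremes = ((size - 2) / size) ** ell  # neither extreme drawn
    return 1.0 - 2.0 * miss_one_extreme + miss_both_extremes


# (ell, eta) for NIST security levels II, III and V.
PARAMETER_SETS = {"II": (4, 2), "III": (5, 4), "V": (7, 2)}

if __name__ == "__main__":
    for level, (ell, eta) in PARAMETER_SETS.items():
        p = full_recovery_probability(ell, eta)
        print(f"level {level}: {100 * p:.0f}% fully recovered, about {256 * p:.0f} of 256")
```

Under these assumptions the formula gives roughly 31%, 17% and 61% for levels II, III and V, in line with the simulated averages, and it makes explicit why a larger ℓ helps the attacker while a larger η does not.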

Lattice reduction attack with hints
In the following, we use the toolkit from Dachman-Soled et al. [DDGR20] to evaluate the impact of the side-information obtained in the previous section on the hardness of the lattice problem. This toolkit makes it possible to insert leaked information about secrets and/or errors into lattice reduction attacks and to estimate their complexity. The resulting estimates are available in Table 3.
Table 3: Summary of estimated hardness reduction from the fault attack with side-information and runtime of the resulting RLWE solving. Both are obtained using the tools provided in [DDGR20]⁷. We provide minimum, average and maximum values observed over 10 random faulted signatures with random keys (in the form min/avg/max). The total runtime is the sum of the BKZ runtime and the hint integration.
Integration of perfect hints. As mentioned above, some secret key coefficients can be determined uniquely thanks to enumeration alone. This knowledge can be integrated as Perfect Hints [DDGR20, Definition 23], which provide the strongest form of side-information in the framework. In addition, the framework can be used to implicitly integrate Short Vector Hints [DDGR20, Definition 30]. Concretely, all the perfect hints are first integrated, and the short vector hint is then integrated (its computation is built into the toolkit).
In addition, the framework of Dachman-Soled et al. considers so-called modular and approximate hints. In the context of our attack, these correspond to key coefficients that are not uniquely determined by enumeration, and their integration could lead to improved attacks. We leave the study of these kinds of hints for future work since, in the context of our attack, perfect and short vector hints are sufficient to significantly reduce the complexity of the RLWE problem and lead to practical key recovery. In the following, we refer to both perfect and short vector hints simply as hints.
Estimated & concrete security. The results of the lattice reduction attacks using hints are provided in Table 3. In this table, both estimates and concrete key recovery results are averaged over 10 experiments, as the number of hints may vary for different secret keys, as illustrated in Figure 3. We again provide the minimum, average and maximum BKZ block-size β, along with the minimum, average and maximum runtime. Concretely, our experiments illustrate that the integration of hints significantly reduces the hardness of the lattice problem. First, comparing with Table 2, the lattice dimension decreases from 513 for all NIST security levels without hints down to 438, 465 and 340 on average for level II, III and V, respectively. Similarly, the integration of hints reduces the block-size β. As an example, for level II the actual (resp. estimated) β is decreased on average from 59 (resp. 62) down to 29 (resp. 15)⁸. Interestingly, the improvement offered by the integration of hints is larger for level V, as the number of available perfect hints is higher compared to level II and III (see Figure 3).
Next, the runtime of the attack is also significantly reduced. With our resources, it decreases from ≈ 40 hours on average to ≈ 15 hours for level II and ≈ 23 hours for level V. The attack is also practical for level III, with an average runtime of ≈ 17 hours. In particular, we note that with the large number of hints we are able to extract, the total runtime of the attack is actually dominated by the hint integration. Despite the long hint integration runtime, our results show that integrating short vector hints significantly decreases the dimension of the instance and significantly improves the lattice reduction time. As previously mentioned, this suggests that for practical attacks it might be possible to reach a trade-off between the number of hints integrated and the remaining BKZ complexity in order to minimize the total runtime. This question is left for future work.

Lattice reduction attack with hints without knowledge of t 0
A key feature of Dilithium is public key compression. That is, the key generation algorithm only publicly outputs t_1 such that t_1·2^d + t_0 = As_1 + s_2. In the previous sections, we assumed that the attacker has knowledge of t_0, the lower 13 bits of the public key. While t_0 is not assumed to be secret and can potentially be recovered from enough signatures as hinted in [Lyu22, RJH+18, RRB+19], we still explore the possibility of key recovery without knowledge of t_0. For this purpose, we essentially include t_0 as part of the error, which was only s_2 in the previous sections. This increases the support of the error distribution from [−η, η] to a much wider range, since each coefficient of t_0 spans about 2^d values. For key recovery, we apply the same previously described lattice reduction methodology to recover s_1. Unlike in the previous sections, where lattice reduction also recovers s_2, this method only yields s_2 − t_0 and does not result in a straightforward recovery of the private key component s_2. However, s_1 is sufficient to achieve existential forgery, as shown in [RJH+18].
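To make the role of t_0 concrete, the following sketch splits a single coefficient of t into its high and low parts in the manner of the Power2Round routine of the Dilithium specification, with q = 8380417 and d = 13. The function and constant names are chosen for this example only. Since t_1·2^d − As_1 = s_2 − t_0, an attacker who only knows t_1 faces an error term whose coefficients include the full range of t_0.

```python
Q = 8380417  # Dilithium modulus q
D = 13       # number of low-order bits dropped from t


def power2round(t):
    """Split t (mod Q) as t = t1 * 2**D + t0 with t0 in (-2**(D-1), 2**(D-1)]."""
    t %= Q
    t0 = t % (1 << D)
    if t0 > (1 << (D - 1)):
        t0 -= 1 << D  # centered remainder
    t1 = (t - t0) >> D
    return t1, t0


if __name__ == "__main__":
    t1, t0 = power2round(12345)
    print(t1, t0)  # high part (public) and low part (not published)
```

Each coefficient of t_0 can therefore be as large as 2^(d−1) = 4096 in absolute value, compared with at most η for s_2, which is why the instances without t_0 are noticeably harder.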
The lattice reduction estimates using hints but without knowledge of t_0 are given in Table 4. As expected, since the variance of the error distribution is larger, the attacks are more difficult than with the knowledge of t_0. However, we can still observe a significant reduction of the security level. The block-size is reduced on average from β = 434 to β = 94 for level II, and from β = 641 to β = 136 for level III. Notably for level V, since more perfect hints can be recovered on the secret, security is reduced from β = 890 to β = 49. Such a block-size is practical on our setup, as illustrated in Table 3. Indeed, successful attacks are reported for β = 48 and higher dimensions.
Table 4: Summary of estimated hardness reduction (assuming t_0 is secret) from the fault attack with side-information and runtime of the resulting RLWE solving. Both are obtained using the tools provided in [DDGR20]⁷. We provide minimum, average and maximum values observed over 10 random faulted signatures with random keys (in the form min/avg/max).

Comparison to related attacks
In this section we compare our attack to state-of-the-art fault injection attacks against Dilithium. For consistency and to simplify comparison in future works, we extend the table provided by Ravi et al. [RCDB22] to include our attack. The columns Attack_Vector and Countermeasure have been removed for conciseness, since instruction skipping faults can be achieved by different means and countermeasures against our attack are discussed in Section 6. For the full table we refer the reader to [RCDB22, Table 3]. Notably, the attack characteristic column in our case essentially captures the difference between deterministic and randomized signing. This characteristic is defined in [RCDB22]: with the ability to communicate with the target device (Communicate_DUT_IO), an attacker can request signatures for specific messages. In the deterministic case this is useful since it allows an attacker to sign the same message twice or multiple times and hence identify the expected final signing iteration. In the randomized case, this is not possible and it usually suffices to observe the returned signature (Observe_DUT_IO).

Re-computation and norm-based countermeasures
In the following, we present and analyze two countermeasures to protect Dilithium against the attack described earlier in this paper, when an attacker can induce one fault per signature. This is motivated by the fact that verification after signing does not prevent the attack since, despite the fault, the returned signature is still valid. Indeed, the whole signature stems from the pseudorandomly generated y and the message, so generating a faulty y at the beginning of a signing attempt is virtually the same as signing with a different random y. Interestingly however, on memory-constrained devices y is typically generated twice since it is used at two different stages of the signing algorithm [GKS21, BRS22]. Naturally, faulting only one of the two generations of y leads to a faulty signature, and hence the attack would be detected by verification after signing. In the following, we focus on implementations which do not regenerate y.
We first discuss a simple trick related to re-computation based detection for Fiat-Shamir with aborts signatures. We then present a countermeasure that is more efficient than re-computation and protects specifically against the attack presented in this work. Note that ensuring the control flow integrity of the signature generation can prevent the attack by making sure that the increment of the nonce cannot be skipped; nevertheless, in this section we suggest and discuss efficient algorithmic countermeasures.

Re-computation for Fiat-Shamir with aborts
This section is based on the two following observations. First, almost all known fault attacks on Dilithium (safe error and ineffective fault attacks excluded) require access to the returned signature. Second, one particular property of Fiat-Shamir with aborts signature schemes such as Dilithium is that signatures are checked before being released. In the case of Dilithium, if the norm checks on z and r do not pass, the signature generation is aborted and repeated starting from the generation of y with an incremented counter κ. On average, Dilithium requires 3 to 5 attempts depending on the parameter set, as recalled in Table 1, but even 10 or 20 attempts could be observed with non-negligible probability before accepting a signature⁹.
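How likely long runs of attempts are can be estimated with a simple model. The sketch below assumes that signing attempts are accepted independently with a fixed probability 1/mean, so that the number of attempts is geometric; this model and the function name are assumptions of the example, and the mean of 5 attempts is only an illustrative value within the 3 to 5 range quoted above.

```python
def prob_at_least(attempts, mean_attempts):
    """P(number of signing attempts >= attempts) under a geometric model
    whose expected number of attempts is mean_attempts (assumed model)."""
    accept = 1.0 / mean_attempts
    # At least `attempts` tries are needed iff the first attempts-1 all abort.
    return (1.0 - accept) ** (attempts - 1)


if __name__ == "__main__":
    for k in (10, 20):
        print(f"P(attempts >= {k}) = {prob_at_least(k, mean_attempts=5):.4f}")
```

Under this model, needing at least 10 attempts happens for more than one signature in ten, and at least 20 attempts for more than one in a hundred, which matches the remark that such runs are not negligible.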
A full re-computation of a signature generation leads to a 100% overhead. Based on the previous observations, we propose to significantly reduce this overhead by only re-computing the last signing attempt, hence ensuring that the released final signature is not faulty, or detecting any fault in the final signature and not releasing it. An extension of this proposal is to use lightweight and more efficient countermeasures for all the signing attempts, and stronger but potentially more expensive countermeasures during the re-computation of the last attempt. In the following, we simply focus on re-computing the last signing attempt to estimate the benefit of this strategy. These estimations are given in Table 6. For instance, when 5 signing attempts are required to generate a signature (which is close to the average number of signing attempts for Dilithium III), re-computing only the last attempt induces only a 20% overhead instead of 100% when re-computing all signing attempts.
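The arithmetic behind these figures is straightforward and can be sketched as follows, under the simplifying assumption that every signing attempt costs the same (in practice, aborted attempts may terminate early). The function name is hypothetical.

```python
def recompute_overhead(attempts, recompute_all=False):
    """Relative overhead of re-computation for a signature that needed
    `attempts` signing attempts, assuming equal cost per attempt."""
    recomputed = attempts if recompute_all else 1
    return recomputed / attempts


if __name__ == "__main__":
    for k in range(1, 11):
        print(f"{k:2d} attempts: last only {100 * recompute_overhead(k):5.1f}%, "
              f"all {100 * recompute_overhead(k, recompute_all=True):5.1f}%")
```

The benefit grows with the number of attempts: a signature accepted at the first attempt still pays 100%, whereas one needing 5 attempts pays 20%.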

Norm-based fault detection countermeasure
In this section, we propose an efficient countermeasure meant to protect against the attack presented in this paper. We first explain the idea behind the countermeasure, then provide an algorithm detailing it, and finally analyze it, in particular with respect to false positives. This countermeasure uses the same trick described previously, i.e., it is only performed for the final valid signature; however, it is also more efficient since it avoids re-computing a whole signing attempt.

Notations and reminders.
For simplicity, we will use ∆z, ∆y, ∆s_1 and c∆s_1 to denote differences of the form ∆z = z[i,j] − z[i′,j], i.e., differences between the coefficients at the same index j of two distinct polynomials i ≠ i′ (and analogously for y, s_1 and cs_1).
Idea behind the fault detection. Our countermeasure is based on the observation that a fault can be detected by analyzing the differences between the coefficients of the polynomials of z, which we already hinted at in Subsection 3.1. Indeed, if y[i,j] = y[i′,j] then ∆z = c∆s_1, and we know from the distribution of the challenge polynomial and the secret vector that the coefficients of c∆s_1 lie in [−2τη, 2τη]. The countermeasure consists of analyzing the values of the ∆z's and counting how many of them are small (i.e., inside the range [−2τη, 2τη]) and how many are large (i.e., outside this range). On the one hand, if a ∆z is outside this range, we can say with confidence that the ∆y term is present and no fault was injected (at least for the coefficients in question). On the other hand, since the range of ∆y is much larger than that of c∆s_1, the probability of an unfaulted ∆z lying inside [−2τη, 2τη] by chance is quite small. That is the key observation we use as a criterion for fault detection. One of the main advantages of this method is that, since the check is performed on z as opposed to y, it can be done only once after the final signing attempt.
Norm-based fault detection algorithm. The inputs to the fault detection countermeasure are the vector z and what we refer to as the strictness parameter N. The strictness parameter N corresponds to the maximum number of small ∆z's that we permit for a given signature. Concretely, if there are more than N small ∆z values, then we assume that a fault was injected, discard the signature and compute a new one. Interestingly, the parameter N additionally dictates what kind of attacks are prevented. By setting N = 256 we can prevent attacks which force two full polynomials of y to be equal. A smaller value of N allows detecting other versions of the attack, e.g., when only a few coefficients of two polynomials of y are forced to be equal. The smaller N is, the more kinds of attacks are detected, but a low value of N also leads to a high False Positive Rate (FPR), since random coefficients can be close to each other and therefore lead to a small ∆z value simply by chance. The fault detection is presented in Algorithm 2. In Line 1 we initialize a counter that will count the number of small ∆z. The for-loops in Lines 2 and 3 iterate over all possible combinations of polynomials in z, and the loop in Line 4 iterates over the coefficients. Lines 5 and 6 check whether a ∆z is small and, in that case, increment the counter. We then check whether the maximum number of permitted small ∆z is surpassed (Line 7). The output is 1 if we suspect a fault, and 0 if none was detected.
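The description above translates directly into a few lines of code. The following is a minimal sketch rather than the authors' embedded implementation: it assumes that z is given as a list of ℓ lists of centered integer coefficients, and the function and parameter names are chosen for this example.

```python
def fault_detected(z, strictness, tau, eta):
    """Return 1 if more than `strictness` coefficient differences between
    distinct polynomials of z fall inside [-2*tau*eta, 2*tau*eta], else 0."""
    bound = 2 * tau * eta
    counter = 0
    ell = len(z)
    for i in range(ell - 1):                 # first polynomial of the pair
        for k in range(i + 1, ell):          # second polynomial of the pair
            for a, b in zip(z[i], z[k]):     # same coefficient index j in both
                if abs(a - b) <= bound:      # small delta z
                    counter += 1
    return 1 if counter > strictness else 0


if __name__ == "__main__":
    # Toy vector with two polynomials of three coefficients (not real sizes).
    z = [[1000, -5000, 20000], [1050, 9000, -30000]]
    print(fault_detected(z, strictness=0, tau=39, eta=2))
    print(fault_detected(z, strictness=1, tau=39, eta=2))
```

In the toy vector only the first pair of coefficients differs by less than 2τη = 156, so the signature is flagged for N = 0 and accepted for N = 1.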
Analysis of the countermeasure.
Algorithm 2 Fault Detection Countermeasure (z, N)
1: counter = 0
2: for i = 0, ..., ℓ − 2 do
3:     for i′ = i + 1, ..., ℓ − 1 do
4:         for j = 0, ..., n − 1 do
5:             if z[i,j] − z[i′,j] ∈ [−2τη, 2τη] then
6:                 counter = counter + 1
7: if counter > N then
8:     return 1
9: return 0
In the following we analyze Algorithm 2 and the probabilities involved, and in particular how the FPR depends on the strictness N. For this we need the probability distributions of ∆y and c∆s_1. Based on the Dilithium specifications, we derived the probability mass function of the former (Equation 7), while the latter can be approximated by a normal distribution N(0, (2/3)τη(η + 1)) following the Central Limit Theorem. For an unfaulted ∆z, the probability of lying inside the range [−2τη, 2τη] can then be calculated as a sum over the possible values x of ∆y. We will call this probability p. This sum is finite, because P(∆y = x) is non-zero only for a finite number of x (see Equation 7). For a given z we can derive ℓ(ℓ − 1)/2 differences of polynomials and, because each polynomial has n coefficients, we get n_z := n·ℓ(ℓ − 1)/2 differences of coefficients ∆z. We approximate the FPR using a binomial distribution B(n_z, p)¹⁰. The FPR is the probability that more than N ∆z's are small.
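The analysis can be reproduced numerically with a short sketch. It assumes that the coefficients of y are uniform on {−γ_1 + 1, ..., γ_1}, so that ∆y follows the triangular law P(∆y = x) = (2γ_1 − |x|)/(2γ_1)^2 for |x| < 2γ_1, and that ∆y and c∆s_1 are independent, which gives p = Σ_x P(∆y = x) · P(c∆s_1 ∈ [−2τη − x, 2τη − x]). These expressions are reconstructed from the Dilithium specification and may differ in form from Equation 7. The parameter values are those of the level II parameter set (γ_1 = 2^17, τ = 39, η = 2, ℓ = 4, n = 256); the half-integer continuity correction, the truncation of the sum and all function names are choices of this sketch.

```python
import math


def normal_cdf(x, sigma):
    """CDF of a centered normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def pmf_delta_y(x, gamma1):
    """P(delta_y = x) for two independent uniforms on {-gamma1+1, ..., gamma1}."""
    m = 2 * gamma1
    return max(m - abs(x), 0) / (m * m)


def prob_small_delta_z(gamma1, tau, eta):
    """Probability p that an unfaulted delta_z lies in [-2*tau*eta, 2*tau*eta]."""
    bound = 2 * tau * eta
    sigma = math.sqrt(2.0 / 3.0 * tau * eta * (eta + 1))
    cutoff = bound + int(12 * sigma) + 1  # terms beyond this are negligible
    p = 0.0
    for x in range(-cutoff, cutoff + 1):
        inside = (normal_cdf(bound - x + 0.5, sigma)
                  - normal_cdf(-bound - x - 0.5, sigma))
        p += pmf_delta_y(x, gamma1) * inside
    return p


def false_positive_rate(p, n_z, strictness):
    """P(more than `strictness` small differences) under Binomial(n_z, p)."""
    at_most = sum(math.comb(n_z, k) * p ** k * (1.0 - p) ** (n_z - k)
                  for k in range(strictness + 1))
    return 1.0 - at_most


if __name__ == "__main__":
    p = prob_small_delta_z(gamma1=2 ** 17, tau=39, eta=2)
    n_z = 256 * 4 * (4 - 1) // 2
    for strictness in (0, 5, 15):
        print(strictness, false_positive_rate(p, n_z, strictness))
```

With these values p is close to (4τη + 1)/(2γ_1), and the resulting FPR is above 80% for N = 0 but becomes negligible for N = 15. Because the ∆z are not independent, the binomial model remains an approximation.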
In an implementation of Dilithium it is important to know how many unfaulted signatures, i.e., ones that passed all Dilithium rejection checks, will have to be computed until one is accepted by the fault detection in Algorithm 2. We want this number to be as close to 1 as possible to avoid rejecting valid signatures due to false positives in the detection. Since an unfaulted signature is rejected by the detection with probability FPR, the probability that m − 1 signatures are rejected and then the m-th one is accepted is FPR^(m−1) · (1 − FPR). The expected number of signatures until we accept is then:
\[ E_{sig} = \sum_{m=1}^{\infty} m \cdot \mathrm{FPR}^{m-1} \cdot (1 - \mathrm{FPR}) = \frac{1}{1 - \mathrm{FPR}}. \]
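The expected number of signatures can be checked numerically. In the sketch below, rejections by the countermeasure are modeled as independent events of probability FPR, so that the number of signatures until acceptance is geometric with mean 1/(1 − FPR); the function name and the optional truncated series are assumptions of this example.

```python
def expected_signatures(fpr, terms=None):
    """Expected number of valid signatures until one passes the detection.

    With terms=None the closed form 1 / (1 - fpr) is returned; otherwise the
    series sum of m * fpr**(m-1) * (1 - fpr) is truncated after `terms` terms.
    """
    if terms is None:
        return 1.0 / (1.0 - fpr)
    return sum(m * fpr ** (m - 1) * (1.0 - fpr) for m in range(1, terms + 1))


if __name__ == "__main__":
    for fpr in (0.84, 0.5, 0.01, 0.0):
        print(fpr, expected_signatures(fpr), expected_signatures(fpr, terms=500))
```

A false positive rate of 84%, for example, already means more than six valid signatures on average per released signature, whereas a rate below 1% leaves the expected count within about 1% of one.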
The above number is not the number of overall signatures created, but rather the number of valid signatures that already passed all rejection requirements of Dilithium itself, and are then passed on to Algorithm 2. In Table 7 we summarize all relevant values for the different parameter sets of Dilithium and provide the FPR and E_sig for exemplary values of N. For the strictest fault detection (N = 0), the FPR as well as the number of generated signatures E_sig are quite high for all three NIST security levels. However, if we increase N only slightly, which is equivalent to allowing some minimal equality in the polynomials of y which could occur simply by chance, both the FPR and E_sig drop relatively fast until E_sig reaches ≈ 1 for N = 15. This value of N still provides sufficient security for all Dilithium parameter sets, since it would detect when full polynomials or a significant portion of the polynomials of y are equal, but with a close-to-zero FPR and negligible impact on the rejection rate.
Performance evaluation. Figure 4 shows the overhead introduced by the countermeasure as a function of the strictness parameter N. The numbers are obtained from the PQM4 reference implementation with the added countermeasure, running on the NUCLEO-L4R5ZI board. The left side provides the overhead in number of clock cycles, and the right side the overhead relative to the unprotected signature generation. The dashed lines on the left side correspond to the clock cycles spent on the unprotected signature generation. These results confirm the estimations provided in Table 7¹¹. For a very small N (e.g., N = 0), the overhead is quite significant, since many signing iterations are needed to generate a valid signature that also fulfills the condition set by the countermeasure, but as soon as N is large enough, for instance N ≥ 5, no additional signing iterations are needed most of the time. As a result, the proposed countermeasure can be parameterized to achieve a negligible overhead.
¹⁰ Since the ∆z are not independent random variables, we have verified that the approximations given in Table 7 match values derived experimentally for random signatures.

Conclusion
In this paper, we introduced a new fault attack which applies to both randomized and deterministic versions of Dilithium. In particular, the attack requires a few instruction skips to reduce the MLWE problem Dilithium is based on to a much simpler RLWE problem. The latter can be solved with lattice reduction attacks, which we demonstrate and enhance with the integration of side-information or hints extracted from the faulty signatures. As a final contribution, we also suggested countermeasures to protect against the presented attack with minimal overhead.
As for perspectives, it is clear that PQC schemes have been less investigated than standard asymmetric cryptography based on RSA or ECC, which has been practically deployed for decades. This also holds true with respect to implementation attacks, including side-channel and fault attacks. Our work confirms the interest of the first steps taken in [DDGR20] to bridge the gap between lattice reduction attacks and fault/side-channel attacks. We believe that using partial (i.e., probabilistic) side-information, as usually obtained from noisy measurements, would be an important step forward in that direction.

A Impact of faults on s 2
It was noted in [ABC+22] that although r = w_0 − cs_2 = w − αw_1 − cs_2 is not released as part of the signature, it can be computed from the signature and the public key as Az − ct − αw_1. In the context of our attack, we first examine how the faulty y affects w = A·y. This is shown in Equation 8. Eventually, what Equation 9 highlights is that, similarly to s_1, after the attack it is possible to recover all polynomials of s_2 from a single one.

B Comparing LWE estimates for RLWE instance
/* Sample the L polynomials of y; each one uses the distinct nonce L * kappa + i. */
void polyvecl_uniform_gamma1(polyvecl *y, const uint8_t rhoprime[CRHBYTES], uint16_t kappa) {
    unsigned int i;
    for (i = 0; i < L; ++i)
        poly_uniform_gamma1(&y->vec[i], rhoprime, L * kappa + i);
}
... r4, r5, r6, r7, pc}
(b) Disassembled binary

Figure 2: Probability that the number of released faulted signatures n_σ is at least ℓ − 1 after inserting a single instruction skip per independent execution of Sign(sk, M), hence enabling full secret key recovery. Continuous lines stand for randomized Dilithium. Dashed lines stand for deterministic Dilithium.

Figure 3: Frequency of the number of remaining possibilities after enumeration.

Figure 4: Comparison of the average runtime of Dilithium signatures with and without the countermeasure as a function of the strictness parameter N.

Table 1: Dilithium parameters.
Dilithium is a digital signature scheme based on the MLWE and MSIS (Module Short Integer Solution) problems [DKL+21]. Table 1 provides the Dilithium parameters for different NIST security levels¹.

Table 2: Summary of estimated hardness reduction from the fault attack and runtime of the resulting RLWE solving. Both are obtained using the tools provided in [DDGR20]⁷. For concrete solving, we provide minimum, average and maximum values observed over 10 random faulted signatures with random keys (in the form min/avg/max).

Table 5: Extension of [RCDB22, Table 3] with our attack. The number of executions is given on average for all Dilithium security levels (in the form level II/level III/level V). † Additional valid signatures are required in the deterministic case to determine the expected final signing iteration; they are not accounted for in the average number of faulted signatures.

Table 6: Comparison between re-computing every signing attempt and re-computing only the last valid signing attempt. Values represent the overhead on top of the regular signing, which is constant at 100% when re-computation is applied to all attempts.

Table 7: Parameters and FPR of the norm-based fault detection approach for different NIST security levels and strictness parameter N.