Exploiting Small-Norm Polynomial Multiplication with Physical Attacks: Application to CRYSTALS-Dilithium

Abstract. We present a set of physical profiled attacks against CRYSTALS-Dilithium that accumulate noisy knowledge on secret keys over multiple signatures, finally leading to a full key recovery attack. The methodology is composed of two steps. The first step consists of observing or inserting a bias in the posterior distribution of sensitive variables. The second step is an information processing phase which is based on belief propagation and effectively exploits that bias. The proposed concrete attacks rely on side-channel information, induced faults or possibly a combination of the two. Interestingly, the adversary benefits most from this previous knowledge when targeting the released signatures; however, the latter are not strictly necessary. We show that the combination of a physical attack with the binary knowledge of acceptance or rejection of a signature also leads to exploitable information on the secret key. Finally, we demonstrate that this approach is also effective against shuffled implementations of CRYSTALS-Dilithium.


Introduction
Over the last years, quantum computing has witnessed significant advances. In turn, this has accelerated the research, adoption and standardization of Post-Quantum Cryptography (PQC) schemes: cryptographic schemes that are believed to be secure even when facing an adversary with access to a quantum computer. The selection and standardization of PQC schemes is driven by the National Institute of Standards and Technology (NIST), and as of July 2022 the lattice-based schemes CRYSTALS-Kyber and CRYSTALS-Dilithium have been selected as the primary PQC standards for key establishment and digital signatures, respectively.
In particular, PQC schemes have garnered the interest of the cryptographic community with respect to their efficient and secure embedded implementations. The main security threats investigated, together with their countermeasures, are Side-Channel Attacks (SCA) and Fault Attacks (FA). Notably for CRYSTALS-Kyber, additional vulnerabilities are introduced by the use of the Fujisaki-Okamoto (FO) transform [FO99] in the decapsulation process [RRCB20, XPR+22, SCZ+23]. Other works also describe how to target the Number Theoretic Transform (NTT), which is the main arithmetic building block in implementations of lattice-based schemes such as CRYSTALS-Kyber and CRYSTALS-Dilithium [HHP+21]. An interesting line of research, initiated by Hermelink et al. [HPP21] and later extended in [Del22, KCS+23], entails using the Belief Propagation (BP) algorithm in combination with faults to break implementations of CRYSTALS-Kyber. The protection of
• Generic Attack Framework. In Section 3, we describe a new generic methodology to accumulate information on the secret key polynomial from noisy knowledge of polynomials. This approach is based on the BP algorithm, which is leveraged for Soft Analytical Side-Channel Attacks (SASCA) introduced in [VGS14]. As demonstrated in this work, it enables a large variety of attack scenarios that we systematically study through simulated experiments. For each of these, we evaluate the number of signatures needed to recover a single key polynomial.
• Physical Attacks with Accepted Signatures. In Section 4, we apply the framework to the case where the adversary gets access to released signatures. For side-channel attacks against Dilithium Level-2, we show that for low noise (SNR = 100) only ≈ 4 traces are needed to recover a secret key polynomial. For higher noise levels (e.g., SNR = 0.01), a total of ≈ 700 traces are needed. For fault attacks, between 23 and 2000 faults are needed depending on the fault precision.
• Physical Attacks without Accepted Signatures. In Section 5, we study the case of attacks without released signatures. When the signature is not accepted, but the index of the rejected coefficient is leaked (e.g., through an early-abort strategy), $\approx 6 \cdot 10^5$ traces are needed in the low noise settings. Eventually, we target a weakened parameter set and show that the knowledge of whether a full polynomial is rejected (e.g., when no early-abort strategy is implemented), while having no knowledge of which coefficient led to this rejection, can still lead to key recovery. Fault attacks are similarly exploitable, but we do not include these experiments in the paper due to space limitations.
• Physical Attacks with Shuffled Computation. In Section 6, we demonstrate that shuffled implementations of CRYSTALS-Dilithium, as suggested in [ABC+22], are also vulnerable to physical attacks using our framework. More precisely, for side-channel attacks against Dilithium Level-2, with SNR = 100, a total of $\approx 5 \cdot 10^3$ traces are needed to recover the secret key polynomial. For SNR = 0.5, $\approx 1.2 \cdot 10^5$ measurements are needed. Similar results are also observed for fault attacks.
Put together, our attack is a step forward in the evaluation of hardened implementations of CRYSTALS-Dilithium against physical attacks. Concretely, the impact of our contribution on the attack surface and countermeasures for variables within CRYSTALS-Dilithium is summarized in Table 1. While attacks on some variables were known (e.g., fault attacks on $\mathbf{c} \cdot \mathbf{s}_1$), we extend and generalize these attacks and provide a framework to exploit the leakage that scales efficiently with respect to noise. In Table 1a, we summarize the types of attack that are known in the literature, and what our framework is able to perform. In Table 1b, we summarize the types of countermeasures that should be applied to these variables. In particular, we conclude that shuffling increases the complexity of the attack linearly with the noise level but does not completely prevent it; hence, we recommend using masking for all the sensitive variables, as hinted in [ABC+23].

Background
In this section, we describe the necessary background to understand the contributions of this paper. We start by introducing the general notations used in the paper. We then recall the main steps of the signature generation process in CRYSTALS-Dilithium and some of its specific details and properties relevant to the remainder of this work. Afterwards, we describe side-channel attacks and the SASCA methodology.

Notations
We denote by $\mathbb{Z}_q$ the integer ring modulo the prime $q$, and by $\mathbb{Z}_q[X]/(X^n + 1)$ the polynomial ring in $X$ modulo $X^n + 1$. Polynomials in $\mathbb{Z}_q[X]/(X^n + 1)$ are written in bold, e.g., $\mathbf{p}$, with $n$ being the degree of the polynomial. For CRYSTALS-Dilithium, $q = 2^{23} - 2^{13} + 1$ and $n = 256$, and we will fix them for the remainder of the paper. The $i$-th coefficient of a polynomial is denoted $p_i$. The multiplication between two polynomials $\mathbf{a}$ and $\mathbf{b}$ is written as $\mathbf{c} = \mathbf{a} \cdot \mathbf{b}$ and addition as $\mathbf{c} = \mathbf{a} + \mathbf{b}$. The infinity norm of a polynomial is expressed as $\|\mathbf{p}\|_\infty$ and is the maximum absolute value of its coefficients. Constants are denoted with Greek letters. A half-open interval containing all elements in the range $\{\alpha, \dots, \beta - 1\}$ is denoted as $[\alpha, \beta)$. A closed interval is denoted as $[\alpha, \beta]$, where $\beta$ is included. The variables $x \bmod q$ are in the range $[0, q)$, and variables $x \bmod^{\pm} q$ are in the interval $[-(q-1)/2, (q-1)/2]$. The $j$-th bit of the $i$-th coefficient in a polynomial $\mathbf{p}$ is denoted as $p_i[j]$. We denote random variables with upper case letters, e.g., $X$, and their realizations with a lower case, e.g., $x$.

Signature Generation and Rejection in CRYSTALS-Dilithium
In this section, we detail the CRYSTALS-Dilithium operations required for the understanding of the remaining sections. More specifically, we describe the generation of the signature polynomials $\mathbf{z}$ and $\mathbf{r}_0$. As both are computed and bound checked similarly, we use generic parameters that can apply to both cases. We refer to the CRYSTALS-Dilithium specifications for additional details [DLL+17].
In a valid signature, the signature polynomial $\mathbf{z}$ must satisfy
$$\|\mathbf{z}\|_\infty < \gamma - \beta. \tag{1}$$
During the signing operation, first the polynomial $\mathbf{z} = \mathbf{y} + \mathbf{s} \cdot \mathbf{c}$ is computed and bound checked, i.e., the norm of all its coefficients has to be strictly smaller than $\gamma - \beta$. If the bound check does not pass, the signature process is repeated until the aforementioned property is fulfilled (see Section 2.2.2). The same process is applied to the other signature polynomial $\mathbf{r}_0$ with slightly different parameters. Note that there are further rejection checks during the CRYSTALS-Dilithium signature generation, e.g., the number of ones in the hint. However, these are not relevant for our attacks and are omitted for simplicity.
In this work, we will discuss the recovery of a single secret key polynomial. Note that CRYSTALS-Dilithium is based on MLWE; hence, the secret key is composed of a small vector of polynomials. As a result, the experiments on a single polynomial can be used to derive the number of measurements and/or faults needed to mount the attack on the full CRYSTALS-Dilithium secret key.

Relevant Polynomials Properties
Next, we describe in detail the properties of the polynomials involved in the computation of $\mathbf{z}$ as well as the polynomial operations used. We refer to Table 2 for the Dilithium parameters that are relevant to our attack.
Secret key s. The long term secret key polynomial is denoted as $\mathbf{s}$. This polynomial has a small norm such that $\|\mathbf{s}\|_\infty \le \eta$. All the coefficients in this polynomial are independently drawn from a uniform distribution thanks to ExpandS during KeyGen such that $s_i \xleftarrow{\$} \{-\eta, \dots, \eta\}$ for all indexes $i$. The parameter $\eta$ depends on the parameter set with $\eta \in \{2, 4\}$. The distribution of the coefficients has a mean $\mu_s = 0$ and a standard deviation $\sigma_s = \sqrt{\eta(\eta + 1)/3}$.
Challenge c. During Sign, a fresh challenge $\mathbf{c}$ is generated deterministically from a random bitstring thanks to SampleInBall. The obtained challenge has a special structure. It has exactly $\tau$ coefficients that are different from zero and are equal to either $1$ or $-1$. Again, $\tau \in \{39, 49, 60\}$ depending on the parameter set. The polynomial $\mathbf{c}$ is part of the signature; hence, $\mathbf{c}$ is given for valid signatures (when the bound check is passed). This polynomial is considered as non-sensitive in protected implementations [MGTF19, ABC+23].
Mask polynomial y. During Sign, a fresh mask polynomial $\mathbf{y}$ is derived for every new signature generation. The coefficients of these polynomials are uniformly distributed on the interval $[-\gamma, \gamma)$. This polynomial is not part of the valid signature and must remain secret.
Product x = s · c. The signature generation involves computing the polynomial multiplication between $\mathbf{s}$ and $\mathbf{c}$. Concretely, each coefficient in the resulting polynomial $\mathbf{x}$ is defined as
$$x_i = \sum_{j=0}^{i} s_j \cdot c_{i-j} - \sum_{j=i+1}^{n-1} s_j \cdot c_{n+i-j}. \tag{2}$$
As a result, each $x_i$ is a weighted sum of secret key coefficients. Since $\mathbf{c}$ contains exactly $\tau$ non-zero coefficients, each $x_i$ is the weighted sum of a subset of $\tau$ secret key coefficients. Hence, $\mathbf{x}$ satisfies the property that $\|\mathbf{x}\|_\infty \le \beta$ where $\beta = \tau \eta$. By the central limit theorem, the distribution of $x_i$ can be estimated by a normal distribution $\approx \mathcal{N}(0, \sqrt{\tau \cdot \eta(\eta + 1)/3})$, where the second parameter is the standard deviation. However, its distribution can also be explicitly computed by using convolutions of probability tables as detailed in Section 3.
Signature z = y + s · c. Thanks to the norm-check performed during signature generation, one ensures that for all valid signatures we have $\|\mathbf{z}\|_\infty < \gamma - \beta$. Putting it all together, the coefficients in $\mathbf{z}$ are given by
$$z_i = y_i + x_i = y_i + \sum_{j=0}^{i} s_j \cdot c_{i-j} - \sum_{j=i+1}^{n-1} s_j \cdot c_{n+i-j}. \tag{3}$$
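The properties of the product $\mathbf{x} = \mathbf{s} \cdot \mathbf{c}$ can be checked numerically. The following is a minimal sketch, not the authors' implementation: `sample_challenge` is a hypothetical stand-in for SampleInBall (it only reproduces the structure of exactly $\tau$ coefficients in $\{-1, 1\}$), and the values $\tau = 39$, $\eta = 2$ are assumed to be the Dilithium Level-2 parameters. It computes the schoolbook negacyclic product without modular reduction and derives the exact distribution of a coefficient $x_i$ by repeated convolution.

```python
import random

def negacyclic_mul(s, c):
    """Schoolbook product in Z[X]/(X^n + 1), without any reduction mod q."""
    n = len(s)
    x = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                x[k] += s[i] * c[j]
            else:
                x[k - n] -= s[i] * c[j]  # wrap-around uses X^n = -1
    return x

def sample_challenge(n, tau, rng):
    """Hypothetical stand-in for SampleInBall: tau coefficients in {-1, +1}."""
    c = [0] * n
    for pos in rng.sample(range(n), tau):
        c[pos] = rng.choice((-1, 1))
    return c

def exact_distribution(tau, eta):
    """Distribution of a sum of tau independent uniform values on {-eta..eta}."""
    dist = {0: 1.0}
    p = 1.0 / (2 * eta + 1)
    for _ in range(tau):
        new = {}
        for v, pv in dist.items():
            for s in range(-eta, eta + 1):
                new[v + s] = new.get(v + s, 0.0) + pv * p
        dist = new
    return dist

rng = random.Random(0)
n, tau, eta = 256, 39, 2          # assumed Level-2 parameters
beta = tau * eta
s = [rng.randint(-eta, eta) for _ in range(n)]
c = sample_challenge(n, tau, rng)
x = negacyclic_mul(s, c)
max_abs_x = max(abs(v) for v in x)           # never exceeds beta
dist = exact_distribution(tau, eta)
variance = sum(v * v * p for v, p in dist.items())  # tau * eta * (eta + 1) / 3
```

Because the signs $\pm 1$ do not change a symmetric uniform distribution, the exact table is simply the $\tau$-fold convolution of the coefficient distribution, and its variance matches the normal approximation.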

Rejection Probability in a Blackbox Setting
The rejection probability of polynomials can be derived analytically (see [ABD+19, Section 3.4] for details). In the following, $R_i$ stands for the random variable denoting the rejection event. That is, $r_i = 1$ (resp. $r_i = 0$) denotes that the $i$-th coefficient has been rejected (resp. accepted). Concretely, this probability is given by
$$\Pr[R_i = 1] = 1 - \frac{2(\gamma - \beta) - 1}{2\gamma}, \tag{4}$$
since for a given value of $x_i$, exactly $2(\gamma - \beta) - 1$ of the $2\gamma$ possible values of $y_i$ lead to a $z_i$ contained within the bounds. Similarly, we denote the rejection of a full polynomial with the random variable $R$. Concretely, the rejection probability is given by
$$\Pr[R = 1] = 1 - \left(\frac{2(\gamma - \beta) - 1}{2\gamma}\right)^{n}, \tag{5}$$
since acceptance requires that all the coefficients are within the bounds. Putting all together, we observe that the rejection probability of polynomials or coefficients is independent of the secret key $\mathbf{s}$ when the adversary only has access to $\mathbf{c}$ and $\mathbf{z}$.
Early-Abort. The performance impact of rejecting signatures is relatively significant for CRYSTALS-Dilithium, since it requires restarting the signature generation process from quite an early step. Hence, to speed up the Sign algorithm, implementations may use an Early-Abort strategy. That is, as soon as a coefficient is out of bound, the signature generation is aborted and a fresh $\mathbf{y}$ and $\mathbf{c}$ are sampled. As a result, the execution time of Sign leaks which coefficient is rejected. This does not affect the security of CRYSTALS-Dilithium in a blackbox setting [DLL+17].
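As a sanity check of the counting argument above, the acceptance probability of the bound check on $\mathbf{z}$ can be evaluated directly. This is a sketch under stated assumptions: the values $\gamma = 2^{17}$, $\tau = 39$ and $\eta = 2$ are assumed to be the Dilithium Level-2 parameters, the mask range is taken as $\{-\gamma, \dots, \gamma - 1\}$, and the function names are illustrative. Only the check on $\mathbf{z}$ is modeled, not the other rejection conditions of the scheme.

```python
def coeff_accept_prob(gamma, beta):
    """Probability that one coefficient z_i passes |z_i| < gamma - beta."""
    return (2 * (gamma - beta) - 1) / (2 * gamma)

def poly_accept_prob(gamma, beta, n=256):
    """Probability that all n coefficients pass (independent masks y_i)."""
    return coeff_accept_prob(gamma, beta) ** n

def count_accepting_y(x, gamma, beta):
    """Brute-force count of masks y in {-gamma..gamma-1} with |y + x| < gamma - beta."""
    return sum(1 for y in range(-gamma, gamma) if abs(y + x) < gamma - beta)

# Small brute-force check: the count does not depend on x in [-beta, beta].
small_gamma, small_beta = 20, 3
counts = {count_accepting_y(x, small_gamma, small_beta)
          for x in range(-small_beta, small_beta + 1)}

# Assumed Level-2 parameters.
gamma, tau, eta = 2 ** 17, 39, 2
beta = tau * eta
p_coeff = coeff_accept_prob(gamma, beta)
p_poly = poly_accept_prob(gamma, beta)
```

The brute-force count is identical for every admissible $x_i$, which is exactly why the rejection event alone carries no information on the secret key in the blackbox setting.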

Side-Channel Attacks
A common type of physical attack is the so-called side-channel attack, which exploits information obtained through physical leakages such as power consumption or timing to recover information on secret keys [KJJ99, CRR02]. Concretely, the adversary observes the leakage $L_j$ (where $j$ indexes the physical observations, also referred to as traces, of the cryptographic function's different executions), which is a random variable corresponding to the power consumption, electromagnetic radiation or timing of the execution of a cryptographic function (e.g., an AES encryption). From this observation, she can obtain partial information on ephemeral intermediate variables $x_j$ (e.g., an S-box output) given by the conditional probability $\Pr[x_j \mid L_j]$. Given some other public information $p_j$ (e.g., a plaintext), she derives information on a long-term secret $k$ (e.g., a secret key) summarized by the conditional probability $\Pr[k \mid L_j, p_j]$. This estimated probability can be obtained thanks to various tools such as Gaussian templates [CRR02]. However, a single leakage observation may not be enough to completely recover the secret. Hence, the adversary can observe multiple ($N$) leakage traces and use a maximum likelihood approach to accumulate information on the secret. Precisely, she computes the likelihood of a secret $k$ given the $N$ leakage observations according to
$$\Pr[k \mid L_{0:N-1}, p_{0:N-1}] \propto \prod_{j=0}^{N-1} \Pr[k \mid L_j, p_j], \tag{6}$$
and selects the most probable key according to
$$k^* = \arg\max_{k} \Pr[k \mid L_{0:N-1}, p_{0:N-1}]. \tag{7}$$
Overall, as the number of traces $N$ increases, the probability that the guessed key $k^*$ is the correct one increases. Hence, the more traces are available, the more successful the attack will be.
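The maximum likelihood accumulation over traces can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a hypothetical 4-bit target whose intermediate value is `key ^ plaintext`, a Hamming weight leakage with Gaussian noise of known standard deviation, and a uniform prior on the key, so that the per-trace posterior is proportional to the Gaussian likelihood. Products of probabilities are accumulated as sums of log-likelihoods.

```python
import random

def hw(v):
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")

def log_likelihood(leak, value, sigma):
    """Gaussian log-likelihood (up to a constant) of a leakage sample."""
    return -((leak - hw(value)) ** 2) / (2 * sigma ** 2)

def ml_key_recovery(leakages, plaintexts, sigma, key_bits=4):
    """Accumulate log-likelihoods over all traces and return the best key."""
    scores = [0.0] * (1 << key_bits)
    for leak, p in zip(leakages, plaintexts):
        for k in range(1 << key_bits):
            scores[k] += log_likelihood(leak, k ^ p, sigma)
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores

# Simulated traces for a toy secret (all values are illustrative).
rng = random.Random(1)
true_key, sigma, n_traces = 11, 0.5, 300
plaintexts = [rng.randrange(16) for _ in range(n_traces)]
leakages = [hw(true_key ^ p) + rng.gauss(0.0, sigma) for p in plaintexts]
guess, scores = ml_key_recovery(leakages, plaintexts, sigma)
```

Working in the log domain avoids numerical underflow when many traces are combined, which is the same concern addressed later for the belief propagation messages.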

Soft Analytical Side-Channel Attacks (SASCA)
The previously described generic side-channel attack leverages observations on a single ephemeral variable $x$ to recover information on the key. Yet, many variables in cryptographic implementations are dependent on the secret key. In the following, we describe SASCA [VGS14], which is a strategy to simultaneously exploit leakages on various variables. First, we recall that the relationships between these sensitive variables can be represented with a factor graph. A factor graph is composed of nodes and edges. It contains two types of nodes. The first type is the variable node that represents a variable (or any value) within the implementation that can have a given distribution. The second type of node is the function node, which represents an operation between variables. These can typically be XOR gates, AND gates, modular additions or modular multiplications. In the factor graph, edges connect variable nodes with function nodes, hence forming a bipartite graph. A side-channel adversary exploiting such a factor graph performs a so-called SASCA that follows the next steps:
1. The adversary defines a factor graph that represents the implementation under attack. It typically contains sub-parts of the secret key (e.g., secret key bytes), public variables (e.g., plaintexts), and ephemeral secret variables (e.g., an S-box output). If the operations are repeated (exploiting multiple traces), parts of the graph can be duplicated.
2. Thanks to access to the leakage $L$, the adversary can gain partial knowledge on intermediate variables within the factor graph. That is, for a variable $x$, the adversary derives $\Pr[x \mid L]$ from the leakage $L$ similarly as in standard template attacks (see Section 2.3). She then initializes the variable node $x$ within the factor graph to $\Pr[x \mid L]$. We denote this distribution as the initial distribution of the variable node and use the notation $\Pr_{\mathrm{ini}}[x]$.
3. Once all the information obtained from the leakage is encoded into the factor graph, the adversary can run a Belief Propagation (BP) algorithm. Informally, BP iteratively updates the distributions of intermediate variables thanks to messages received from neighbor nodes, where messages are also distributions. In turn, the messages sent from function to variable nodes and from variable to function nodes are updated. Namely, the message passing rule from a variable node $v$ to a function node $f$ is given by
$$m_{v \to f}(x) = \Pr_{\mathrm{ini}}[x] \cdot \prod_{f' \in \delta_v \setminus f} m_{f' \to v}(x), \tag{8}$$
where $\delta_v$ denotes all the neighbors of $v$ and $\delta_v \setminus f$ denotes the set of all its neighbors excluding $f$. The message passed from a variable to a function is the product of all the other received messages and the initial distribution. The message passing rule from a function to a variable node is given by
$$m_{f \to v}(x) = \sum_{I \,:\, I_v = x} \zeta(f, I) \prod_{v' \in \delta_f \setminus v} m_{v' \to f}(I_{v'}), \tag{9}$$
where $I$ denotes one combination of input/output values and $\zeta(f, I)$ is the compatibility function of the function node (which is equal to 1 if the combination of input/output values is possible and 0 otherwise).
This adversary is known to be optimal when the factor graph does not contain cycles (i.e., is a tree). When the factor graph contains cycles, this method becomes heuristic but has been demonstrated to be effective in the context of side-channel attacks [VGS14, HHP+21, BS21].

Generic Attack Framework
In this section, we describe the methodology for the attack. We first start with a high-level introduction, followed by the description of the factor graph exploited by the SASCA adversary. Eventually, we discuss optimizations used to compute the propagated messages.

High Level Description of the Attacks
All the attacks proposed in this work are based on SASCA. These attacks enable the recovery of each secret key polynomial in CRYSTALS-Dilithium independently, in a divide-and-conquer fashion; hence, we focus on the problem of recovering a single secret key polynomial $\mathbf{s}$. The attack can simply be repeated on each polynomial to recover the full secret key. As illustrated in Figure 1, all the proposed attacks are performed in two phases.
Information Extraction. The first step consists of recovering partial information on each of the coefficients in the polynomial $\mathbf{x}$ with $\mathbf{x} = \mathbf{s} \cdot \mathbf{c}$. In each of the considered attacks, the probability of some coefficient $x_i$ posterior to the physical attack is estimated through profiling, e.g., using templates. These probabilities $\Pr_{\mathrm{ini}}[x_i]$ are the actual ephemeral secret information that the adversary can extract from the CRYSTALS-Dilithium signature generation. This can be obtained from side-channel leakages on $\mathbf{y}$ and $\mathbf{x}$, a fault attack biasing the distribution of $\mathbf{y}$, or their combination. This can be used to recover the secret key either together with a valid signature $\mathbf{z}$ or from a rejection event $R$ when the signature is rejected. The information extraction is specific to the adversarial capabilities, and we describe several scenarios in the next sections.
Information Processing. The second step consists of leveraging all the obtained $\Pr_{\mathrm{ini}}[x_i]$ (possibly from several traces) to map them to information on the secret key $\mathbf{s}$ via SASCA. This step is identical for every attack considered in this paper. In the remainder of this section, we describe how the information in $\Pr_{\mathrm{ini}}[\mathbf{x}]$ can be efficiently accumulated into information on the secret key, $\Pr[\mathbf{s}]$, by virtue of a dedicated factor graph.

Factor Graph Description
The factor graph is illustrated for a small example of a polynomial of degree four in Figure 1. The top variable nodes $s_i$ denote the coefficients of a secret key polynomial (not to be confused with the polynomials $\mathbf{s}_1$ and $\mathbf{s}_2$ in the description of Dilithium), and the bottom variable nodes are coefficients of the polynomial $\mathbf{x} = \mathbf{s} \cdot \mathbf{c}$. These are linked through the weighted sum function nodes $\Sigma_{(\mathbf{c},i)}$. This function node $\Sigma_{(\mathbf{c},i)}$ is graphically described in Figure 2 and is the straightforward mapping of Equation 2. Concretely, it implements the expression $x_i = \sum_j c'_j s_j$ where the polynomial $\mathbf{c}' = \mathrm{rot}(\mathbf{c}, i)$ is the rotation of $\mathbf{c}$ defined as $\mathrm{rot}(\mathbf{c}, i) = \mathbf{c} \cdot X^i \bmod X^n + 1$. As depicted by the color in Figure 1, each coefficient in $\mathbf{x}$ is part of an independent sub-graph involving one single $\Sigma_{(\mathbf{c},i)}$ node.
When the challenge polynomial $\mathbf{c}$ is known, exactly $\tau$ of the edges of each independent sub-graph in Figure 2 have to be kept, as $\mathbf{c}$ contains only $\tau$ coefficients different from zero. For the non-zero coefficients, the function nodes $\cdot c_i$ denote the multiplication by either $1$ or $-1$ depending on the value of $c_i$. This function node can be simply implemented as a re-ordering of the propagated messages. For efficiency, we merge the computation on this function node with the subsequent addition as detailed in Section 3.3.
An important remark is that all the polynomials involved in the signature generation have a small norm. As a result, all the intermediate variables are smaller than $q$, meaning that no modular reduction is needed. Therefore, we next discuss the propagation rules and omit the (unnecessary) modular reductions.
Figure 1 (panels: Physical attack; Info. processing): Example factor graph for parameters N = 1 trace and polynomial of degree

Efficient Propagation Rule Computation
The previously described factor graph could be computed with a generic SASCA tool such as SCALib [CB23]. However, it comes with drawbacks such as execution time and memory consumption. Instead, we describe the propagation rules optimized for the previous factor graph, that is, how both Equation 8 and Equation 9 are efficiently implemented in such a context. To this end, we take a bottom-up approach and first describe the propagation rule for a single $\Sigma_{(\mathbf{c},i)}$ described in Figure 2. Then we discuss how several of these can be combined as in Figure 1.

Propagation Rule for $\Sigma_{(\mathbf{c},i)}$.
Starting with a single $\Sigma_{(\mathbf{c},i)}$, we detail the notations in the corresponding visual representation (see Figure 2). There, every variable node $g_i$ stands for the sum $g_i = \sum_{j=0}^{i} c'_j \cdot s_j$ where the weights $c'_i$ are known. We start with the propagation rules around $+_i$, and continue with the propagation on the full factor graph.
The propagation rule that computes the message from the function node $+_i$ to the following variable node $g_i$ is next denoted as convadd and is described in Algorithm 1. In order to compute $m_{+_i \to g_i}$, the other incoming messages to the function node $+_i$ are needed. Hence, convadd takes as input the incoming messages $m_{g_{i-1} \to +_i}$ and $m_{s_i \to +_i}$ together with the known value $c'_i$ that multiplies $s_i$. From Equation 9, the outgoing message $m_{+_i \to g_i}$ is computed by summing over the product of all the other combinations of incoming messages. That is, the algorithm has two nested loops to cover all the input combinations of $g$ and $s$. The corresponding output value $o$ is computed as $o = g + c' \cdot s$ and the value of the outgoing message is updated for the value $o$. Eventually, we note that convadd can be used to compute the messages $m_{+_i \to g_{i-1}}$ by negating $c'_i$ and adapting the ranges for input values.
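The convolution-style update performed by convadd can be sketched in a few lines. This is an illustrative reading of the description above, not the paper's optimized Rust code: messages are represented as dictionaries mapping a value to its probability, the weight is assumed to lie in $\{-1, 0, 1\}$, and the early exit for a zero weight assumes that the message on $s_i$ is normalized.

```python
def convadd(m_g, m_s, c):
    """Message towards o = g + c * s from the messages on g and on s."""
    if c == 0:
        # The output equals the input g, so the message is passed through.
        return dict(m_g)
    out = {}
    for g, pg in m_g.items():
        for s, ps in m_s.items():
            o = g + c * s
            out[o] = out.get(o, 0.0) + pg * ps
    return out

# One step: g is known to be 0, s has a small distribution, weight is -1.
m_g = {0: 1.0}
m_s = {-1: 0.2, 0: 0.5, 1: 0.3}
step = convadd(m_g, m_s, -1)

# Chaining steps yields the distribution of a weighted sum (left-to-right pass).
weights = [1, -1, 0, 1]
acc = {0: 1.0}
for w in weights:
    acc = convadd(acc, m_s, w)
```

Since one of the two inputs always has support of size $2\eta + 1$, the double loop stays cheap even when the support of $g$ grows along the chain.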
Algorithm 1 convadd($m_{g_{i-1} \to +_i}$, $m_{s_i \to +_i}$, $c'_i$)
  if $c'_i = 0$ then return $m_{g_{i-1} \to +_i}$    ▷ Quit early as output message will be the input message
  $m_{+_i \to g_i} \leftarrow 0$    ▷ Init message with zeros
  for each value $g$ of $g_{i-1}$ do
    for $s \in [-\eta, \eta]$ do
      $o \leftarrow g + c'_i \cdot s$
      $m_{+_i \to g_i}[o] \leftarrow m_{+_i \to g_i}[o] + m_{g_{i-1} \to +_i}[g] \cdot m_{s_i \to +_i}[s]$
  return $m_{+_i \to g_i}$

Next, we describe in Algorithm 2 the propagation rule on the full function node $\Sigma$. First, the challenge polynomial $\mathbf{c}$ is rotated with rot by the appropriate index in order to obtain $\mathbf{c}'$. Then, all the messages $m_{+_i \to g_i}$ going from left to right in Figure 2 are computed. To do so, $m_{g_0 \to +_1}$ is initialized with the incoming message $m_{s_0 \to \Sigma}$ re-ordered according to $c'_0$. Then, we iterate in ascending order over all the $+_i$. There, we note that $m_{g_j \to +_{j+1}} \leftarrow m_{+_j \to g_j}$ as per Equation 8, as there are no other incoming messages to the $g_j$'s. A similar iteration is applied in descending order (right to left in Figure 2) in order to compute the messages $m_{+_j \to g_{j-1}}$. The last step is to compute the messages $m_{\Sigma \to s_j}$ by applying the propagation rule around $+_j$ with the two already available messages $m_{g_j \to +_j}$ and $m_{g_{j-1} \to +_j}$. This is done thanks to convaddrev, which is a slightly adapted convadd that includes the effect of $c'_j$ on the output variable.

Algorithm 2 Propagation rules for $\Sigma$ associated to $x_i$.
Input: All input messages $m_{s_j \to \Sigma}$ for $j \in [0, n)$, challenge $\mathbf{c}$, degree $i$ of the exploited coefficient and its associated $\Pr_{\mathrm{ini}}[x_i]$.
Output: All the messages $m_{\Sigma \to s_j}$ for $j \in [0, n)$.
  $\mathbf{c}' \leftarrow \mathrm{rot}(\mathbf{c}, i)$
  $m_{g_0 \to +_1} \leftarrow m_{s_0 \to \Sigma}$ re-ordered according to $c'_0$
  for $j$ in ascending order do compute $m_{+_j \to g_j}$ with convadd and set $m_{g_j \to +_{j+1}} \leftarrow m_{+_j \to g_j}$
  for $j$ in descending order do compute $m_{+_j \to g_{j-1}}$ with convadd
  for all $j$ do $m_{\Sigma \to s_j} \leftarrow$ convaddrev($m_{g_j \to +_j}$, $m_{g_{j-1} \to +_j}$, $c'_j$)

Propagation Rule for Multiple $\Sigma_{(\mathbf{c},i)}$. In the above, we described the SASCA propagation rule for a single function node $\Sigma_{(\mathbf{c},i)}$. Several of these factor nodes can be connected to the variable nodes of the secret key coefficients $s_i$. This is the case for the factor graph of a single polynomial multiplication as described in Figure 1. Concretely, the variable node $s_i$ receives multiple messages $m_{\Sigma_{(\mathbf{c},j)} \to s_i}$ from all the $\Sigma_{(\mathbf{c},j)}$ nodes it is connected to ($c'_i \neq 0$). The messages $m_{s_i \to \Sigma_{(\mathbf{c},j)}}$ are then computed according to Equation 8 as the product of all other incoming messages to $s_i$, hence as $m_{s_i \to \Sigma_{(\mathbf{c},j)}} = \prod_{n \neq j} m_{\Sigma_{(\mathbf{c},n)} \to s_i}$. This product of a large number of small values can lead to computational errors. Therefore, to compute these messages, we first compute and store the sum of log-probabilities of all messages. Then the outgoing messages are computed as $\log(m_{s_i \to \Sigma_{(\mathbf{c},j)}}) = \sum_n \log(m_{\Sigma_n \to s_i}) - \log(m_{\Sigma_j \to s_i})$. Eventually, the guessed value $s_i^*$ for a secret key coefficient $s_i$ is the value maximizing the product of incoming messages, similarly to Equation 7. It can be derived from this sum of log-probabilities as
$$s_i^* = \arg\max_{s} \sum_n \log(m_{\Sigma_n \to s_i}(s)). \tag{10}$$
In the above, we only describe the case where a single signature corresponding to a single product $\mathbf{s} \cdot \mathbf{c}$ is observed, where one $\Sigma_{(\mathbf{c},i)}$ is added for each of the $n$ coefficients in the output polynomial $\mathbf{x}$. Yet, additional $\Sigma_{(\mathbf{c},i)}$ nodes can also be added to the factor graph by observing $N$ signatures corresponding to different products $\mathbf{s} \cdot \mathbf{c}$ for different challenges $\mathbf{c}$ and a constant secret key polynomial $\mathbf{s}$. In such a case, the factor graph contains at most $N \times n$ different nodes $\Sigma_{(\mathbf{c},i)}$. Not all of these $\Sigma_{(\mathbf{c},i)}$ necessarily have to be included in the factor graph. A node can be left out if the associated $\Pr_{\mathrm{ini}}[x_i]$ obtained through the physical attack is known to be secret key independent (e.g., no leakage). In all cases, the above propagation rules remain the same.
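The log-domain combination of messages at a secret coefficient node $s_i$ can be sketched as follows. This is a minimal illustration under stated assumptions: each incoming message is a list of probabilities indexed by $s + \eta$, the function name `combine_messages` is hypothetical, and a small floor is applied before taking logarithms to avoid $\log(0)$, which is a choice of this sketch rather than something stated in the text.

```python
import math

def combine_messages(incoming, floor=1e-300):
    """Return the summed log-messages and the leave-one-out log-messages."""
    logs = [[math.log(max(p, floor)) for p in msg] for msg in incoming]
    total = [sum(col) for col in zip(*logs)]
    # Outgoing message to node j: the total minus the message received from j.
    outgoing = [[t - l for t, l in zip(total, lm)] for lm in logs]
    return total, outgoing

eta = 2
incoming = [
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.05, 0.15, 0.50, 0.20, 0.10],
    [0.10, 0.10, 0.30, 0.40, 0.10],
]
total, outgoing = combine_messages(incoming)
# Guess for the coefficient: index of the largest summed log-probability.
guess = max(range(len(total)), key=lambda v: total[v]) - eta
```

Storing the total once and subtracting one term per outgoing message costs a single pass over the incoming messages, instead of recomputing a product for every neighbor.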

Discussion on the Factor Graph Selection
Knowledge of c. In the above, we assume that the adversary knows $\mathbf{c}$ exactly. In the case of released signatures, this knowledge is trivial as it is embedded into the signature. This case is studied in Section 4 and Section 6. When the adversary does not have access to a released signature (see Section 5), $\mathbf{c}$ is not known but can potentially be recovered by other means such as side-channel leakage. The state-of-the-art hardened implementations of CRYSTALS-Dilithium often do not protect the polynomial $\mathbf{c}$ against side-channel attacks [MGTF19, ABC+23] and this polynomial is manipulated as single bits per register. As a result, we consider that such a polynomial could be recovered by side-channel adversaries. As an example, Karabulut, Alkim and Aysu [KAA21] have shown how to recover some amount of information on $\mathbf{c}$. The proposed attack in this paper also extends to the setting where only noisy leakage on $\mathbf{c}$ is obtained. In such a case, the multiplication with a weight $c_i$ is simply replaced by a multiplication function node between $s_i$ and $c_i$, making the propagation rule slightly more complex. We leave such a detailed investigation to future work.

Impact of Fast Polynomial Multiplication with NTT.
We notice that the factor graph used for this attack implements a school-book polynomial multiplication. However, efficient implementations of CRYSTALS-Dilithium usually leverage NTTs to perform polynomial multiplications [AHKS22]. Yet, we stress that the attack methodology is independent of the polynomial multiplication methodology as it is based on the definition of polynomial multiplication itself. The only slight advantage of using NTT-based multiplication is that direct leakage on $\mathbf{x}$ can be avoided as it is not explicitly computed in the standard domain and only in NTT representation (this is not the case in [AHKS22]). Even in this case, the attack remains applicable, as direct leakage on $\mathbf{y}$, which cannot be avoided, allows initializing the factor graph as discussed in the following sections. Interestingly, while prior works using BP against lattice-based cryptography mainly exploit the structure of the NTT used for polynomial multiplication [PPM17, HHP+21], our work makes use of BP independently of the multiplication strategy.
Performance Considerations. A performance consideration is that only $\tau$ coefficients are different from zero in $\mathbf{c}$ (see Table 2). When the challenge polynomial $\mathbf{c}$ is known, exactly $\tau$ of the edges must be kept for each $\Sigma_i$. Overall, for each $\Sigma_i$ node, only the outgoing messages $m_{\Sigma_i \to s_i}$ and the associated $\Pr_{\mathrm{ini}}[x_i]$ must be stored in memory. This leads to a total of $\tau(2\eta + 1) + 2\beta + 1$ 64-bit floats that need to be stored. For example, if all the $\Sigma_i$ are included and 1000 signatures are used for the attack, 0.72 gigabytes are needed for Dilithium-2, 1.7 gigabytes for Dilithium-3 and 1.1 gigabytes for Dilithium-5 to store the full factor graph. As an example of the run time, our Rust implementation requires 59.2 seconds to process 256,000 different $\Sigma_i$ nodes for Dilithium-3 on a single core. This corresponds to the processing of 1000 side-channel traces where each of the 256 coefficients is exploited. In practice, we multi-thread this computation on 32 cores (64 threads) and reduce the run time down to 2 seconds.
Eventually, we note that the propagation rule for $+$ (convadd) can also be implemented by leveraging FFT-based convolutions as proposed in [PPM17]. However, the benefits are not obvious as one of the inputs to the addition is always small ($[-\eta, \eta]$). The study of such an approach and the practical benefits it may bring is also left for future work.
Application to $\mathbf{s}_2$. While this work focuses on describing the recovery of $\mathbf{s}_1$ from the relation $\mathbf{z} = \mathbf{y} + \mathbf{s}_1 \cdot \mathbf{c}$, a similar attack applies to recover $\mathbf{s}_2$. The main relation exploited instead is $\mathbf{r} = \mathbf{w}_0 - \mathbf{s}_2 \cdot \mathbf{c}$. The polynomials $\mathbf{r}$ are also subject to a bound check and, for accepted signatures, $\mathbf{r}$ can be computed from the signature and the public key as explained in [ABC+23, Section 3.2]. This approach requires knowledge of $\mathbf{t}_0$, a part of the public key which is not explicitly revealed by the signer but which is not necessarily sensitive, as it has been hinted in [Lyu22, RJH+18, RRB+19] that $\mathbf{t}_0$ can be recovered from enough signatures.
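The memory figures quoted above follow directly from the per-node count of $\tau(2\eta + 1) + 2\beta + 1$ floats with $\beta = \tau\eta$. The short sketch below reproduces them; the $(\tau, \eta)$ pairs per security level are assumed from the Dilithium parameter sets, and the helper names are illustrative.

```python
# Assumed (tau, eta) per parameter set.
PARAMS = {
    "Dilithium-2": (39, 2),
    "Dilithium-3": (49, 4),
    "Dilithium-5": (60, 2),
}

def floats_per_node(tau, eta):
    """Outgoing messages (tau tables of 2*eta+1 entries) plus Pr_ini[x_i]."""
    beta = tau * eta
    return tau * (2 * eta + 1) + 2 * beta + 1

def graph_bytes(tau, eta, signatures, n=256, float_bytes=8):
    """Storage for n nodes per signature, each holding 64-bit floats."""
    return floats_per_node(tau, eta) * float_bytes * n * signatures

sizes_gb = {name: graph_bytes(tau, eta, 1000) / 1e9
            for name, (tau, eta) in PARAMS.items()}
```

With 1000 signatures this gives roughly 0.72, 1.71 and 1.11 gigabytes, in line with the values stated in the text.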

Physical Attacks with Valid Signatures
In this section, we detail the case where the adversary obtains the signature, i.e., both polynomials $\mathbf{z}$ and $\mathbf{c}$. We first describe the leakage and fault models we consider, and then continue with the methodology used to initialize the factor graph described in Section 3 with the $\Pr_{\mathrm{ini}}[x_i]$'s. Finally, we describe the results of simulated attacks for both side-channel and fault attacks.

Leakage and Fault Models
We start by describing the leakage and fault models used for the simulated attacks.We stress that the results presented in this paper are not restricted to these models and also apply to others.In both cases, we assume that the adversary knows exactly the leakage and faults models.We leave the study of unprofiled scenarios to future investigations.
Leakage Model. In this work, we consider leakage on the polynomials $x = c \cdot s$ and $y$ described in Section 2.2 in a similar way. For simplicity, we only describe the leakage for $x$. The leakages under consideration are the sum of a deterministic data-dependent function and Gaussian noise. We assume that the coefficients leak independently. The deterministic component of the leakage function is denoted by

$$L^{B,\pm}_{x_i} = \sum_{b \in B} \mathrm{bit}_b\left(x_i \bmod^{\pm} q\right), \tag{11}$$

which is the sum of bits of $x_i \bmod^{\pm} q$. The bits involved in the leakage are defined by the list $B$, where each value in $B$ is a bit index. As an example, the leakage $L^{31,\pm}_{x_i}$ corresponds to the sign bit of the coefficient $x_i$ in 32-bit two's complement representation. If the data is represented in the interval $[-(q-1)/2, (q-1)/2]$, the notation $L^{B,\pm}_{x_i}$ is used. If the coefficients are represented in the interval $[0, q)$, the notation $L^{B,+}_{x_i}$ is used. From this, the leakage on a polynomial coefficient is a random variable

$$l_{x_i} = L^{*,*}_{x_i} + \mathcal{N}\left(0, \sigma^2_{SNR}\right), \tag{12}$$

where $\sigma^2_{SNR}$ is the noise variance ensuring the SNR for the given deterministic leakage function $L^{*,*}_{x_i}$. Concretely, to mount the attack, the adversary first computes the probability of observing a leakage sample $l_{x_i}$ with standard Gaussian template attacks, assuming a given value for $x_i$. That is,

$$\Pr[l_{x_i} \mid x_i, L^{*,*}_{x_i}, \sigma_{SNR}] = \frac{1}{\sqrt{2\pi}\,\sigma_{SNR}} \exp\left(-\frac{\left(l_{x_i} - L^{*,*}_{x_i}\right)^2}{2\sigma^2_{SNR}}\right). \tag{13}$$

Second, she computes the value of $\Pr[x_i \mid l_{x_i}, L^{*,*}_{x_i}, \sigma_{SNR}]$ thanks to Bayes's theorem (normalization) over all the possible values $x_i^*$ such that

$$\Pr[x_i \mid l_{x_i}, L^{*,*}_{x_i}, \sigma_{SNR}] = \frac{\Pr[l_{x_i} \mid x_i, L^{*,*}_{x_i}, \sigma_{SNR}]}{\sum_{x_i^*} \Pr[l_{x_i} \mid x_i^*, L^{*,*}_{x_i}, \sigma_{SNR}]}. \tag{14}$$

Generally, we will assume that the polynomials have a signed representation (e.g., see [AHKS22]) and that the device leaks the Hamming weight of intermediates. The latter deterministic leakage component is then denoted as $L^{0:31,\pm}_{x_i}$. In the following, the adversary is able to exploit leakage on all the coefficients in the polynomials $y$ and/or $x = s \cdot c$ depending on the context.

Fault Model. Similarly to side-channel leakage models, the fault attacks detailed in this work apply to various models. In this work, we assume that the adversary is able to insert a fault that induces a known bias on the bits of a polynomial coefficient. Concretely, the fault adversary can set a bit $b$ to zero with probability $\alpha$ such that

$$\Pr[b = 0] = \alpha, \qquad \Pr[b = 1] = 1 - \alpha. \tag{15}$$

As a result, the probability of a given faulted coefficient is proportional to

$$\Pr[y_i \mid F^{B,*}_{y_i,\alpha}] \propto \prod_{b \in B} \Pr\left[\mathrm{bit}_b(y_i)\right], \tag{16}$$

which is the product of the probabilities of each of its (non-uniform) faulted bits. As for Equation 14, the actual $\Pr[y_i]$ is obtained by normalization, and the list $B$ is the same as in the leakage model definition. As an example, the fault model $F^{0:31,\pm}_{y_i,1}$ sets all the bits of the coefficient $y_i$ to zero with probability 1. In the following, we will assume that a single coefficient of the polynomial $y$ is faulty per execution. Yet, multiple coefficients in $y$ can also be faulted in order to reduce the number of faulted signatures required. We stress that faults can also be inserted with the signed representation of coefficients. In the following, we only consider faults on $y$, as faults on $x$ do not seem to be directly exploitable with our framework. We leave the investigation of faults on multiple coefficients in $y$, on signed representations and on $x$ as future work.
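The Gaussian template step and the biased-bit fault model described above can be illustrated with a minimal Python sketch. It assumes Hamming weight leakage on a 32-bit two's complement word, a tiny candidate set, and normalization over the candidates only. The helper names (`hamming_weight_32`, `template_posterior`, `fault_distribution`) are hypothetical and chosen for illustration; they are not taken from the authors' code.

```python
import math


def hamming_weight_32(value: int) -> int:
    """Hamming weight of `value` seen as a 32-bit two's complement word."""
    return bin(value & 0xFFFFFFFF).count("1")


def normalize(scores: dict) -> dict:
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all candidates have zero probability")
    return {value: score / total for value, score in scores.items()}


def template_posterior(leak: float, candidates, sigma: float) -> dict:
    """Posterior on a coefficient from one noisy Hamming weight sample."""
    # Gaussian log-likelihood of the sample for every candidate value.
    log_lik = {
        v: -((leak - hamming_weight_32(v)) ** 2) / (2.0 * sigma ** 2)
        for v in candidates
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_lik.values())
    return normalize({v: math.exp(l - top) for v, l in log_lik.items()})


def fault_distribution(candidates, alpha: float, bits=range(32)) -> dict:
    """Distribution of a coefficient whose listed bits are zero with probability alpha."""
    scores = {}
    for v in candidates:
        word = v & 0xFFFFFFFF
        score = 1.0
        for b in bits:
            # A bit equal to one has probability 1 - alpha, a zero bit alpha.
            score *= (1.0 - alpha) if (word >> b) & 1 else alpha
        scores[v] = score
    return normalize(scores)


# Toy usage on a small signed candidate range (illustrative values only).
candidates = range(-4, 5)
posterior = template_posterior(leak=2.0, candidates=candidates, sigma=0.1)
faulted = fault_distribution(candidates, alpha=0.6)
```

In this toy range, only the value 3 has Hamming weight 2, so a nearly noise-free sample of 2.0 concentrates the posterior on it, while the fault distribution with a bias of 0.6 favors values with few set bits.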

Initialization of the Factor Graph
The previous equations describe how to derive the probabilities on coefficients in $x$ and $y$ from either side-channel leakages or induced faults. In the following, we describe how these probabilities are used to derive the initial probabilities $\Pr_{ini}[x_i]$ for the SASCA described in Section 3. Concretely, we observe that knowledge of the released signature polynomial $z$ allows information on $y$ to be translated directly into information on $x$ because of the additive relation $z = x + y$. That is, in the case of side-channel leakage on $y$, the equation

$$\Pr[x_i \mid l_{y_i}, L^{*,*}_{y_i}, \sigma_{SNR}] = \Pr[y_i = z_i - x_i \mid l_{y_i}, L^{*,*}_{y_i}, \sigma_{SNR}] \tag{17}$$

is used to derive the probability on the corresponding $x_i$. Similarly, when a fault is introduced on a coefficient $y_i$, the resulting posterior probability on $x_i$ is derived thanks to

$$\Pr[x_i \mid F^{*,*}_{y_i,*}] = \Pr[y_i = z_i - x_i \mid F^{*,*}_{y_i,*}]. \tag{18}$$

All these probabilities on $x_i$ can be combined into the initial probability used by SASCA according to

$$\Pr_{ini}[x_i] \propto \Pr[x_i \mid l_{y_i}, L^{*,*}_{y_i}, \sigma_{SNR}] \cdot \Pr[x_i \mid l_{x_i}, L^{*,*}_{x_i}, \sigma_{SNR}] \cdot \Pr[x_i \mid F^{*,*}_{y_i,*}], \tag{19}$$

where the first two terms represent the side-channel leakage on $y_i$ and $x_i$, respectively. The last term stands for the information on $x_i$ obtained from the fault injection on $y_i$. This equation shows that information from both side-channel and fault attacks can be summarized in the same probability, which makes the extension to combined attacks straightforward.
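As a rough illustration of this initialization, the sketch below translates a distribution on $y_i$ into one on $x_i$ through $y_i = z_i - x_i$ and then multiplies several sources of information pointwise before normalizing. The function names and the toy numbers are assumptions made for the example; a source that is not available is simply left out of the product.

```python
import math


def normalize(scores: dict) -> dict:
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("all candidates have zero probability")
    return {value: score / total for value, score in scores.items()}


def translate_y_to_x(prob_y: dict, z_i: int, x_candidates) -> dict:
    """Distribution on x_i obtained from one on y_i using y_i = z_i - x_i."""
    return normalize({x: prob_y.get(z_i - x, 0.0) for x in x_candidates})


def combine(*distributions: dict) -> dict:
    """Pointwise product of distributions on x_i, followed by normalization."""
    support = set(distributions[0])
    for dist in distributions[1:]:
        support &= set(dist)
    return normalize({x: math.prod(d[x] for d in distributions) for x in support})


# Toy values: information on y_i (from leakage or a fault) and on x_i itself.
prob_y = {8: 0.1, 9: 0.2, 10: 0.4, 11: 0.2, 12: 0.1}
prob_x_leak = {-2: 0.2, -1: 0.2, 0: 0.2, 1: 0.3, 2: 0.1}

from_y = translate_y_to_x(prob_y, z_i=10, x_candidates=range(-2, 3))
initial = combine(from_y, prob_x_leak)
```

Because the combination is a plain product, adding a fault-based term to a side-channel-based one only requires passing one more dictionary to `combine`.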

Experimental Results
In the following, we describe the results of both side-channel and fault attacks to recover a single secret key polynomial $s$, assuming the model described in the previous section. In both cases, the polynomials $z$ and $c$ are assumed to be known (e.g., through a valid signature). The efficiency of the attack is estimated through the median number of correctly recovered coefficients among the 256 coefficients of the secret key polynomial $s$. We use the median because it gives the complexity of recovering a given number of coefficients with 50% probability. The reported results do not change significantly when the mean is used instead of the median. The median is estimated from 100 independent experiments after 20 iterations of the BP algorithm described in Section 3. We did not observe significant improvements when increasing the number of iterations.
Recent works such as [DDGR20] have shown how to integrate information extracted from a side channel into lattice reduction attacks against LWE-based schemes. However, to the best of our knowledge, it is not yet fully understood how to accurately quantify the amount of information required on the secret key coefficients to break CRYSTALS-Dilithium. This is all the more challenging when dealing with soft (probabilistic) information, as is the case for common side-channel and fault attacks. This line of research is orthogonal to this work, and we leave it as an open question. In the following, we will consider that an attack is successful once the full secret key polynomial is recovered from the physical attack alone.

Side-Channel Attacks. The results for side-channel attacks exploiting Hamming weight leakage on all the coefficients of both polynomials $x$ and $y$ are reported in Figure 3. On these plots, each curve represents a different SNR. For level-2 parameters, only 4 traces are needed to obtain all the secret key polynomials when the noise is very low (SNR = 100). For SNR = 0.1, around 70 traces are needed, and around 700 are needed with SNR = 0.01. As expected, the number of traces required to mount the side-channel attack grows as the noise increases, roughly in inverse proportion to the SNR in the noisy regime (decreasing the SNR tenfold from 0.1 to 0.01 requires ten times more traces). From these plots, we observe that level-2 and level-5 have similar results, while level-3 requires slightly more traces. We expect that this difference is due to the larger secret key coefficient size $\eta$ (see Table 2).
Fault Attacks. Similarly, the results of fault attacks with a single faulted coefficient $y_0$ in a valid signature are reported in Figure 4. As a single coefficient leads to key-dependent data, a single function node $\Sigma_0$ is added to the factor graph in Figure 1 per challenge $c$. As for side-channel attacks, we observe that the less precise the fault is, the less efficient the attack. As an example, when $\alpha = 1$, meaning that $y_0$ is always set to zero, around 23 faulted and released signatures are required for the level-2 parameter set. When $\alpha = 0.6$, meaning that the bias on $y_0$ towards zero is weaker, around 2000 faults are needed.
Note that we report the number of faulted and released signatures. However, as CRYSTALS-Dilithium is a Fiat-Shamir with aborts scheme, multiple signature trials are performed. This means that the faulted signature might not be among the released ones. Concretely, if a single fault is inserted during every signature attempt, the number of faults to insert is derived by multiplying the numbers from Figure 4 by the average number of repetitions from Table 2. We refer to [EAB+23] for an analysis of CRYSTALS-Dilithium signature aborts in combination with faults, and for strategies to deal with them.
We also stress that if multiple faults are injected in a single signature attempt, then the same number of additional $\Sigma_i$ nodes can be inserted in the factor graph. This proportionally decreases the required number $N$ of faulted and released signatures. Indeed, the information extraction methodology is sensitive to the number of $\Sigma$ nodes, independently of whether they are added through additional challenges $c$ or through additional faults. Finally, we note that exactly the same methodology can be used to mount combined attacks, as illustrated with Equation 19.

Physical Attacks with Rejected Signatures
In the previous section, we demonstrated that leakage on $y$ and/or $x$ can be used in combination with a valid signature $(z, c)$ in order to retrieve the secret key polynomials. In this section, we demonstrate that the combination of leakage on $y$, the knowledge of the challenge $c$ (e.g., through side-channel analysis) and the fact that a polynomial is rejected or not leads to exploitable information on the secret key. This applies to implementations with and without an early-abort strategy.

Initialization of the Factor Graph with Early-Abort
We denote the event of a coefficient $z_i$ of the signature polynomial being rejected as $R_i = r_i$. Namely, $r_i = 1$ (resp. $r_i = 0$) if the coefficient is rejected (resp. accepted). If early-abort is implemented, the coefficients of $z$ are checked individually and sequentially. The process is aborted as soon as one coefficient is rejected, leaking $r_i$ through timing. For consistency, we next denote the information induced by the physical attack, such as side-channel leakage or a fault, as $P$. From this, we will estimate $\Pr[x_i \mid R_i, P]$, which is the probability distribution of the coefficient $x_i$ knowing whether $z_i$ has been rejected and the information extracted from the physical attack.
Concretely, the probability that the coefficient $z_i$ has been rejected given a value of $x_i$ is expressed as

$$\Pr[R_i = 1 \mid x_i, P] = \sum_{y_i} \Pr[R_i = 1 \mid x_i, y_i, P] \cdot \Pr[y_i \mid P], \tag{20}$$

which sums over all the combinations $(y_i, x_i)$. In the previous expression, we note that the term $\Pr[R_i = 1 \mid x_i, y_i, P]$ is a compatibility function that is equal to one when $|x_i + y_i| \geq \gamma - \beta$ and equal to zero otherwise. Similarly, the probability that the same coefficient is accepted is given as

$$\Pr[R_i = 0 \mid x_i, P] = 1 - \Pr[R_i = 1 \mid x_i, P]. \tag{21}$$

Finally, the probability on $x_i$ can be obtained thanks to Bayes's theorem according to

$$\Pr[x_i \mid R_i = r_i, P] = \frac{\Pr[R_i = r_i \mid x_i, P]}{\sum_{x_i^*} \Pr[R_i = r_i \mid x_i^*, P]}, \tag{22}$$

similarly to Equation 14. In this equation, values for $\Pr[R_i = r_i \mid x_i, P]$ are derived with Equation 20 if $r_i = 1$ and Equation 21 otherwise. This probability can then directly be used to initialize the factor graph described in Section 3.
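The per-coefficient rejection update can be sketched in a few lines of Python. The sketch assumes that the physical attack has already produced a distribution on $y_i$ (here a toy dictionary), that `bound` plays the role of $\gamma - \beta$, and that the result is normalized over the candidate values of $x_i$ only. All names and numbers are illustrative.

```python
def reject_likelihood(x: int, prob_y: dict, bound: int) -> float:
    """Probability that z_i = x + y_i is rejected, given the distribution on y_i."""
    # The compatibility function is one exactly when |x + y| >= bound.
    return sum(p for y, p in prob_y.items() if abs(x + y) >= bound)


def posterior_x_given_outcome(prob_y: dict, x_candidates, bound: int,
                              rejected: bool) -> dict:
    """Distribution on x_i knowing whether the coefficient z_i was rejected."""
    scores = {}
    for x in x_candidates:
        p_reject = reject_likelihood(x, prob_y, bound)
        scores[x] = p_reject if rejected else 1.0 - p_reject
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("observed outcome is impossible for all candidates")
    return {x: s / total for x, s in scores.items()}


# Toy example: y_i is known to be 8 or 9 with equal probability, bound is 10.
prob_y = {8: 0.5, 9: 0.5}
x_candidates = range(-2, 3)
post_rejected = posterior_x_given_outcome(prob_y, x_candidates, 10, rejected=True)
post_accepted = posterior_x_given_outcome(prob_y, x_candidates, 10, rejected=False)
```

In this toy setting a rejection is only possible for $x_i \in \{1, 2\}$, which illustrates why a rejected coefficient can be far more informative than an accepted one.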

Initialization of the Factor Graph without Early-Abort
We also study the case where early-abort is not implemented. Hence, the adversary only knows whether a full polynomial $z$ has been rejected (resp. accepted), denoted with $R = 1$ (resp. $R = 0$), but does not know which particular coefficient was rejected. Concretely, in the following, we will derive the expression of $\Pr[x_i \mid R, P]$. To do so, we first describe the expression for the probability that a coefficient is individually accepted, which is

$$\Pr[R_i = 0 \mid P] = \sum_{x_i} \Pr[R_i = 0 \mid x_i, P] \cdot \Pr[x_i], \tag{23}$$

where $\Pr[R_i = 0 \mid x_i, P]$ is obtained from Equation 21. The distribution $\Pr[x_i]$ can be analytically computed because $x_i$ is defined as the sum of $\tau$ variables uniform on $[-\eta, \eta]$. Then, the probability that the polynomial is accepted given a single coefficient value $x_i$ is given by

$$\Pr[R = 0 \mid x_i, P] = \Pr[R_i = 0 \mid x_i, P] \cdot \prod_{j \neq i} \Pr[R_j = 0 \mid P], \tag{24}$$

because the polynomial is accepted only if all the coefficients are individually accepted.
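Similarly to Equation 21, the probability that the polynomial is rejected is given by

$$\Pr[R = 1 \mid x_i, P] = 1 - \Pr[R = 0 \mid x_i, P]. \tag{25}$$

Finally, the posterior distribution of the coefficient $x_i$ given $R = r$ can be expressed as

$$\Pr[x_i \mid R = r, P] = \frac{\Pr[R = r \mid x_i, P]}{\sum_{x_i^*} \Pr[R = r \mid x_i^*, P]}. \tag{26}$$

Again, this expression can be directly used to initialize the factor graph.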
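The following Python sketch walks through this derivation on a toy instance. It assumes that the per-coefficient acceptance probabilities $\Pr[R_j = 0 \mid x_j, P]$ are already available as dictionaries, computes the prior on $x_i$ by repeated convolution of uniform variables, and normalizes the final scores over the candidate values only. The function names and the three-coefficient example are hypothetical.

```python
import math


def prior_x(tau: int, eta: int) -> dict:
    """Distribution of a sum of tau independent uniform variables on [-eta, eta]."""
    dist = {0: 1.0}
    step = 1.0 / (2 * eta + 1)
    for _ in range(tau):
        new_dist = {}
        for value, prob in dist.items():
            for s in range(-eta, eta + 1):
                new_dist[value + s] = new_dist.get(value + s, 0.0) + prob * step
        dist = new_dist
    return dist


def posterior_x_given_poly_outcome(accept_given_x: list, prior: dict,
                                   i: int, rejected: bool) -> dict:
    """Distribution on x_i knowing only whether the whole polynomial was rejected."""
    # Probability that each coefficient is accepted, averaged over its prior.
    marginal = [sum(acc[x] * prior[x] for x in prior) for acc in accept_given_x]
    # Probability that all the other coefficients are accepted.
    others = math.prod(m for j, m in enumerate(marginal) if j != i)
    scores = {}
    for x in prior:
        p_accept = accept_given_x[i][x] * others
        scores[x] = 1.0 - p_accept if rejected else p_accept
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("observed outcome is impossible for all candidates")
    return {x: s / total for x, s in scores.items()}


# Toy instance with three coefficients, tau = 1 and eta = 1.
prior = prior_x(tau=1, eta=1)
accept_given_x = [
    {-1: 1.0, 0: 1.0, 1: 0.0},
    {-1: 1.0, 0: 1.0, 1: 1.0},
    {-1: 1.0, 0: 0.5, 1: 0.5},
]
post_rejected = posterior_x_given_poly_outcome(accept_given_x, prior, 0, rejected=True)
post_accepted = posterior_x_given_poly_outcome(accept_given_x, prior, 0, rejected=False)
```

Note that for an accepted polynomial the product over the other coefficients is a constant that cancels in the normalization, whereas for a rejected polynomial it dilutes the information on $x_i$, which is consistent with the larger number of signatures needed in this setting.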

Experimental Results
Side-Channel Attack with Early-Abort. In Figure 5, the results of side-channel attacks against rejected signatures when early-abort is implemented are reported for all parameter sets. Concretely, the target is assumed to leak $L^{0:31,\pm}_{y}$ as well as the index of the rejected coefficient $z_j$ such that $j = \min\{i : R_i = 1\}$. As a result, the adversary also learns that $R_i = 0$ for all $i < j$. From this information, we leverage Equation 22 in order to initialize the factor graph described in Section 3. Although both accepted and rejected coefficients could be exploited to mount the attack, we noticed that the coefficient for which $R_j = 1$ provides the most information on $x_j$. As a result, we only exploit the rejected coefficient in order to reduce the size of the factor graph (and its memory requirements). From these figures, we observe that with SNR = 100, around $7 \cdot 10^5$ (resp. $2.1 \cdot 10^6$) rejected polynomials are needed to recover the secret polynomial for Level-2 and Level-5 (resp. Level-3). As expected, decreasing the SNR, hence increasing the noise, leads to an increased number of required signatures. Concretely, around $10^7$ are required to recover the secret key polynomial when SNR = 1.
Side-Channel Attack without Early-Abort. Next, we discuss the previously described attack exploiting rejected polynomials when early-abort is not implemented. That is, the indices of the rejected coefficients are unknown to the adversary. Accordingly, the factor graph is initialized with Equation 26. As a result, the graph includes a node for every $x_i$, as the adversary has no prior knowledge of which node is the most informative, as opposed to when an early-abort strategy is used. This results in a large factor graph which is challenging to process. We were not able to mount such an attack against standard CRYSTALS-Dilithium parameter sets because of insufficient memory in our setup. Therefore, we introduce a weakened parameter set called Level-0, which is equivalent to Level-2 except that $\tau = 14$. This weakened parameter set is used to show that secret information can be recovered even if only the knowledge that $z$ has been rejected is provided to the adversary in addition to leakage on $y$. This applies to all the parameter sets, despite the fact that we were not able to mount a full attack with our evaluation setup. Results of this attack on Level-0 are given in Figure 6. There, we show that for (almost) noise-free Hamming weight leakage on $y$, around $5 \cdot 10^5$ rejected polynomials are needed.

Physical Attacks against Shuffled y
In this section, we apply the methodology described in Section 3 in order to target an implementation where the polynomial $y$ is protected with shuffling, as proposed in [ABC+22]. Note that the later version of this work [ABC+23] no longer relies on shuffling to protect the considered CRYSTALS-Dilithium implementation. In the following, we demonstrate that shuffling $y$ is indeed not sufficient to reliably protect against side-channel attacks. It only increases the complexity of the attacks, which remain practical. We follow the same approach as in the previous section. Namely, we first describe how to initialize the factor graph in that setting, and then provide the results.

Initialization of the Factor Graph
In a shuffled implementation, the adversary does not know the order in which the coefficients in $y$ are manipulated. That is, only the leakage of $y^j$ is available, which is the leakage of the $j$-th manipulated coefficient. The adversary therefore cannot relate this leakage to a single specific coefficient $y_i$. As a result, we leverage the approach from [VMKS12], later refined in [ABG+22, Section 3], in order to attack shuffled implementations. The first step is to compute the probability of $y_i$ given the leakage on all the coefficients of $y$. Assuming that no leakage is available on the shuffle permutation itself, this expression is given by

$$\Pr[y_i \mid l_{y}, L^{*,*}_{y}, \sigma_{SNR}] = \frac{1}{256} \sum_{j=0}^{255} \Pr[y^j \mid l_{y^j}, L^{*,*}_{y^j}, \sigma_{SNR}], \tag{27}$$

where each $\Pr[y^j \mid l_{y^j}, L^{*,*}_{y^j}, \sigma_{SNR}]$ is the probability for the coefficient manipulated at index $j$. Its expression can be obtained as for standard template attacks (see Equation 14). From this estimated probability on $y_i$, the probability on each coefficient $x_i$ can be derived with

$$\Pr[x_i \mid l_{y}, L^{*,*}_{y}, \sigma_{SNR}] = \Pr[y_i = z_i - x_i \mid l_{y}, L^{*,*}_{y}, \sigma_{SNR}], \tag{28}$$

which has exactly the same form as Equation 17. These probabilities are then directly used to initialize the factor graph. We note that Equation 27 is identical for every $y_i$. As a result, it only needs to be computed once. If permutation leakage is available (which is not assumed here), it can be incorporated into that equation, leading to a different expression for every $y_i$ [ABG+22].
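A minimal Python sketch of this averaging step is given below. It assumes a uniformly random permutation with no permutation leakage, takes one template posterior per manipulated position as input, and returns the single mixture distribution shared by every $y_i$. The function name and the two-position toy input are illustrative only.

```python
def shuffled_posterior(per_position_posteriors: list) -> dict:
    """Average the per-position posteriors into one distribution shared by all y_i."""
    n = len(per_position_posteriors)
    mixture = {}
    for posterior in per_position_posteriors:
        for value, prob in posterior.items():
            # Each manipulated position is equally likely to hold coefficient y_i.
            mixture[value] = mixture.get(value, 0.0) + prob / n
    return mixture


# Toy example with two manipulated positions instead of 256.
per_position = [
    {1: 1.0},
    {2: 0.5, 3: 0.5},
]
mixture = shuffled_posterior(per_position)
```

The mixture is much flatter than any individual posterior, which is one way to see why shuffling multiplies the number of traces needed without removing the information altogether.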

Experimental Results
The results of the attacks against shuffled $y$ are reported in Figure 7. There, we observe that for Dilithium Level-2, around $5 \cdot 10^2$ traces are needed when SNR = 100. When SNR = 0.5, around $1.2 \cdot 10^5$ are required. Interestingly, we observe that when the SNR is halved, the attack complexity is doubled. This means that the noise only linearly increases the complexity of the attack, as expected with shuffling [VMKS12]. Additionally, when compared to an implementation without shuffling, the attack against a shuffled implementation requires $2 \cdot 256$ times more traces independently of the noise level (see Figure 3 in comparison to Figure 7). This is explained by the size of the permutation (256) and the fact that we consider a single leaky polynomial [VMKS12]. These results show that shuffling increases the attack complexity by a significant factor, but does not prevent the attack.

Conclusion
In this work, we put forward several new side-channel and fault attack vectors against CRYSTALS-Dilithium's signature generation leading to key recovery. We have shown that even a slight bias on the distribution of $y$ posterior to a physical attack leaks sensitive information on the secret key. This is an improvement over previous work that requires high accuracy from the leakage traces or high fault precision. To the best of our knowledge, this work is also the first to demonstrate physical attacks exploiting rejected signatures, showing that all the iterations of CRYSTALS-Dilithium, due to the Fiat-Shamir with aborts framework, are sensitive. Finally, similar attacks should be applicable to various schemes such as Raccoon [dPPRS23] or HAETAE [CCD+23].
For consistency and to simplify comparison in future works, we extend the tables provided by Ravi et al. [RCB22] in their survey of physical attacks on Kyber and Dilithium to include our attack. The columns Attack_Vector and Countermeasure have been removed for conciseness, since observing side-channel leakage or inducing faults can be achieved by different means. In addition, our attack can be significantly hindered by masking with sufficient noise. The attack characteristic column has also been removed since none of our attacks rely on specific inputs; they only require observing some information from the device, e.g., side-channel leakage, accepted signatures or the fact that a bound check did not pass, which corresponds to the characteristic Observe_DUT_IO from [RCB22]. For the full tables, we refer the reader to Table 3 and Table 4 in [RCB22].
It seems that many previous attacks on CRYSTALS-Dilithium, such as the recent loop-abort fault attack by Ulitzsch et al. [UMB+23], rely on Integer Linear Programming (ILP), whereas we rely on BP. As opposed to ILP, BP and the generic framework we propose allow efficient exploitation of the information contained in the leakage distributions, whether they stem from side-channel or fault attacks. As future work, we would like to investigate and quantify the improvements provided by BP over ILP. In this work, we aimed to recover the full secret using BP without additional post-processing such as enumeration or lattice reduction, as performed by Hermelink et al. [HMS+23]. Further investigating how to exploit diverse and probabilistic information coming from both side-channel and fault attacks in a lattice reduction attack is an interesting future perspective.

Figure 2: Internal description of the $\Sigma_{(c,i)}$ factor node used in Figure 1 and c = rot(c, i)

Figure 3: Side-channel attack with Hamming weight leakages $L^{0:31,\pm}_{x}$ on the 256 polynomial coefficients

Figure 4: Fault attack on the single coefficient $y_0$ with model $F^{0:23,+}_{y_i,\alpha}$. N is the number of faulted and released signatures. Curves correspond to the median proportion of correctly recovered coefficients for various values of the induced bias $\alpha$.

Figure 5: Side-channel attack with Hamming weight leakages on the 256 coefficients of polynomial $y$ with $L^{0:31,\pm}_{y}$ and rejected signatures. Early-abort is implemented and the index of the rejected coefficient is known. N is the number of rejected signatures. Curves correspond to the median proportion of correctly recovered secret key coefficients for various SNR values.

Figure 6: CRYSTALS-Dilithium Level-0. Side-channel attack with Hamming weight leakages on the 256 coefficients of polynomial $y$ with $L^{0:31,\pm}_{y}$ and rejected signatures. Early-abort is not implemented. N is the number of rejected polynomials. The curve corresponds to a very high SNR.

Figure 7: Side-channel attack with Hamming weight leakages on the 256 coefficients of polynomial $y$ with $L^{0:31,\pm}_{y}$ and shuffled polynomial. N is the number of released signatures. Curves correspond to the median proportion of correctly recovered coefficients for various SNR values.

Table 1: Summary of sensitivity and countermeasure effectiveness in CRYSTALS-Dilithium for valid and rejected signatures.

Table 2: Relevant CRYSTALS-Dilithium parameters for this paper.

Table 3: Extension of [RCB22, Table 4] with the side-channel-based version of our attack. The number of traces is given for all Dilithium security levels (in the form level II/level III/level V).

Table 4: Extension of [RCB22, Table 3] with the fault-based version of our attack. The number of executions is given for all Dilithium security levels (in the form level II/level III/level V). The fault sets 1 coefficient to zero.