Single trace HQC shared key recovery with SASCA

This paper presents practicable single trace attacks against the Hamming Quasi-Cyclic (HQC) Key Encapsulation Mechanism. These attacks are the first Soft Analytical Side-Channel Attacks (SASCA) against code-based cryptography. We mount SASCA based on Belief Propagation (BP) on several steps of HQC's decapsulation process. Firstly, we target the Reed-Solomon (RS) decoder involved in the publicly known HQC code. We perform simulated attacks under a Hamming weight leakage model and reach excellent accuracies (above 0.9) up to a high noise level (σ = 3), thanks to a re-decoding strategy. In a real-case attack scenario, on an STM32F407, this attack leads to a perfect success rate. Secondly, we conduct an analogous attack against the RS encoder used during the re-encryption step required by the Fujisaki-Okamoto-like transform. Both in simulation and in practical instances, results are satisfactory and this attack represents a threat to the security of HQC. Finally, we analyze the strength of countermeasures based on masking and shuffling strategies. In line with previous SASCA literature targeting Kyber, we show that masking HQC is a limited countermeasure against BP attacks, as are shuffling countermeasures adapted from Kyber. We evaluate the "full shuffling" strategy, which thwarts our attack by introducing sufficient combinatorial complexity. Lastly, we highlight the difficulty of protecting the current RS encoder with a shuffling strategy. A possible countermeasure would be to consider another encoding algorithm for the scheme to support full shuffling. Since the encoding subroutine is only a small part of the implementation, it would come at a small cost.


Introduction
During this contest, the security of the involved cryptosystems has been extensively studied by the community. HQC has been the target of several Side-Channel Attacks (SCA) since 2019. The former version of HQC, based on BCH codes, was attacked by two similar timing attacks [PT19, WTBB+20] in 2019 and by a chosen ciphertext attack [SRSWZ20] by Schamberger et al. in 2020. The latter attack is based on a decoding oracle that can distinguish whenever the BCH decoder corrects an error. Thanks to a chosen ciphertext strategy along with a resolution based on linear algebra, they successfully recover the whole secret key. In 2022, the authors adapted their approach to build an attack [SHR+22] against the new version of HQC based on concatenated Reed-Muller (RM) and Reed-Solomon (RS) codes, allowing successful recovery of the secret key with 50000 power traces. Meanwhile, another key recovery side-channel attack with a chosen ciphertext strategy [GLG22a] was exhibited against HQC-RMRS. The authors targeted the Fast Hadamard Transform (FHT), involved in the RM decoder, to perform an attack with fewer than 20000 electromagnetic measurements.

Corresponding authors: {guillaume.goy,julien.maillard}@cea.fr

Finally, Goy et al. presented the first single trace attack targeting the HQC shared key [GLG22b]. They used the structure of the concatenated RMRS decoder and the Decryption Failure Rate (DFR) [AMAB+17] analysis to observe that, in practice, the RS decoder manipulates mostly error-free codewords. The idea behind the attack is interesting, but the authors were unable to recover the shared key from the noisy side-channel information without computing at least $2^{96}$ algebraic operations. That paper thus shows a vulnerability in the implementation of HQC-RMRS, but does not propose a practical attack.
For completeness, HQC can also be the target of generic attacks [RRCB20, UXT+22] targeting the Fujisaki-Okamoto (FO) transform construction, cache attacks [HSC+23] and timing attacks exploiting the randomness generator, namely the rejection sampling [GHJ+22]. These attacks will not be detailed in this paper.
Soft Analytical Side-Channel Attacks (SASCA) are powerful methods to perform SCA. SASCA algorithms are mostly based on Belief Propagation (BP) theory, whose details can be found in [Mac03], chapter 26. BP was first used as an SCA against cryptography by Veyrat-Charvillon et al. [VCGS14] in 2014, targeting the AES Furious implementation. The authors described a practical attack and emphasized the efficiency of SASCA compared with the best state-of-the-art attacks at the time. SASCA was also used against the standardized cryptographic hash function Keccak: in 2020, Kannwischer et al. [KPP20] described a single trace attack on SHA-3. The authors mentioned a Boolean masking countermeasure to thwart the attack; however, as specified in [GS18], masking countermeasures could enable new attacks.
Finally, SASCA was also applied to PQC. The standardized lattice-based KEM Kyber [BDK+18], renamed Module-Lattice-Based KEM (ML-KEM) in FIPS 203 [oSU23], was the target of four attacks [PPM17, PP19, HHP+21, HSST23] between 2017 and 2023. Primas et al. [PPM17] introduced the first BP-based attack against Kyber. They showed that SASCA could be mounted against lattice-based cryptography, targeting the Number Theoretic Transform (NTT), an optimization strategy for lattice-based cryptography. Furthermore, they targeted a masked implementation of the NTT and always recovered the secret key in a real-case attack scenario. The authors performed the attack in simulations under a Hamming weight leakage model and obtained a satisfactory success rate (above 0.9) up to a noise level of σ = 0.4. Their evaluation on a real device required building around one million templates. Later, Pessl and Primas [PP19] improved the attack by crafting only 213 Hamming weight templates. They also used node-merging (to limit the number of cycles), damping and graph scheduling techniques. Simulations showed a good success rate up to σ = 1.5. In 2021, Hamburg et al. [HHP+21] combined SASCA with a Chosen Ciphertext Attack (CCA) strategy, recovering the long-term secret key up to σ = 2 with a success rate above 0.9. Ravi et al. [RPBC20] introduced fine and coarse shuffling countermeasures to thwart BP attacks. In 2023, Hermelink et al. [HSST23] analyzed the strength of these shuffling countermeasures and assessed their resistance against the attack of Hamburg et al. So far, these shuffling countermeasures have not been threatened by any attack, but the authors emphasize that this situation could lead to a "false security perception" and encourage caution.

Our contributions
In this work, we introduce the first practical single trace belief propagation attack against a PQC code-based cryptosystem, HQC, that can be executed within a few minutes. Specifically, we recover the shared key manipulated by the Reed-Solomon code involved in the HQC-RMRS scheme. All presented attacks exploit either one or two templates targeting the Galois field multiplication. We show that the reference implementation of HQC can be targeted by single trace attacks, and remains threatened when protected with some countermeasures. Our attacks are performed both in simulations and in a real attack scenario on an STM32F407.
• We first exploit the point of vulnerability identified by Goy et al. [GLG22b] and transform it into a practical single trace side-channel attack aiming at shared key recovery. While this attack is based on the establishment of prior templates, the requirement for BP strategies is highly dependent on implementation choices. We describe how to build the factor graph for the RS decoder algorithm, which manipulates the error-free codeword containing information about the shared key.
• We show that codeword masking [MSS13], a masking strategy applicable to HQC, does not provide satisfactory security against our attack. Even if simple masking countermeasures of Kyber's NTT have been shown vulnerable to SASCA, this consideration cannot be applied as-is to HQC. Indeed, codeword masking of the RS decoder is performed with an RS encoder for performance purposes. Hence, we provide a study against the RS encoder and show that no reasonable masking countermeasure can thwart our attack.
• We also study the strength of known shuffling strategies (fine and coarse) [RPBC20], along with an HQC-specific strategy (window shuffling), against our attack. We show that none of these strategies constitutes a sustainable countermeasure against our attack in a real-case attack scenario. From an idea of [ATT+18], we derive the "full shuffling" strategy. This adds a high combinatorial complexity, making the attack impractical.
• Finally, we observe that the re-encryption from the FO-like transform implies an additional encoder call during the decapsulation process. We combine the encoder and decoder leakages to perform a decapsulation attack. This new attack strategy requires protecting both the decoder and the encoder to thwart the threat. We show that changing the RS encoder strategy allows full shuffling to be used and protects against our attack.
Outline. Section 1 recalls the HQC construction and presents the targeted algorithms as well as the SASCA approach. Section 2 introduces the attacker model. Section 3 presents the single template attack, exploiting only the template leakages. Section 4 introduces the graph construction and the SASCA attack against the RS decoder and presents our simulation attack results. Section 5 targets the codeword masking countermeasure for the RS decoder, where we repeat the work of the previous section against the encoder. Section 6 presents practical attacks against the weak shuffling countermeasures (fine, coarse and window) and evaluates full shuffling. Section 7 introduces the decapsulation attack, which combines leakages from decryption and re-encryption by taking advantage of the FO-like structure. Finally, we present practical results for our attack and draw conclusions and perspectives in Section 8.

HQC
Hamming Quasi-Cyclic (HQC) is a code-based cryptosystem whose security relies on the hardness of solving the established syndrome decoding problem. The HQC Key Encapsulation Mechanism (KEM) is created from the HQC Public Key Encryption (PKE) scheme using a Fujisaki-Okamoto-like transform called the Hofheinz-Hövelmanns-Kiltz (HHK) transform [HHK17]. To create the decapsulation, this transform adds two main operations to ensure the IND-CCA2 security of the KEM version: (i) the decrypted message is re-encrypted to ensure that it comes from a valid ciphertext and, after this check, (ii) hash functions are used to derive a shared key from the decrypted message. In this paper, we only describe the PKE version of HQC: since the shared key derivation is a deterministic function of the decrypted message, recovering the latter is enough to recover the shared key of the KEM.
HQC PKE. The security of an HQC ciphertext relies on masking a codeword with a random error, so that no one can decode it without the knowledge of the secret key. Thus, the selected error correction code does not need to be hidden, and any code can be selected. For HQC-RMRS, the authors proposed to use concatenated Reed-Muller (RM) and Reed-Solomon (RS) codes. In the following algorithms (see Figure 1), elements live in an ambient space $\mathcal{R} = \mathbb{F}_2[X]/(X^n - 1)$, sometimes with a constraint on the Hamming weight: $\mathcal{R}_\omega = \{z \in \mathcal{R} \mid HW(z) = \omega\}$. The parameters are given in Table 1.

Figure 1: HQC PKE algorithms, including Algorithm 1 (Keygen; input: param, output: (pk, sk)) and Algorithm 2 (Encrypt).

At the end of the HQC-KEM protocol, we expect that m = m′ in order to derive the shared key. In HQC KEM, the shared key derivation is a deterministic operation, hence securing the secret value m is as important as securing the secret key. The main difficulty for HQC is to prove that the Decryption Failure Rate (DFR), i.e. the decoding failure rate, is smaller than $2^{-\lambda}$, where λ is the security level. This work has been done for the current RMRS version of HQC [AGZ20]. This low DFR on the decoder of HQC implies a property about the DFR of the internal Reed-Muller code. Indeed, in most cases, the intermediate codeword between the RM decoder and the RS decoder is already error-free; this property is stated in Equation 2. The decoding error probability has been well studied in [AMAB+17] (page 30, Table 4), and is summarized in Table 1.

Reed-Solomon Codes
Reed-Solomon (RS) codes are a sub-class of cyclic codes. These [n, k, t] codes over $\mathbb{F}_q$ are generated using a generator polynomial $g(x) \in \mathbb{F}_q[X]$ of degree n − k. The generator polynomial is given as a parameter of the HQC scheme. Any message $m \in \mathbb{F}_q^k$ can be seen as a polynomial whose coefficients are the message symbols. In the reference implementation of HQC, the RS encoding is performed in systematic form, following the strategy in [LCM84].
Encoding RS. Let g(x) be the generator polynomial of an RS code and u(x) the polynomial associated with a message m. In systematic form, the codeword consists of the message symbols together with the parity symbols given by the remainder of $x^{n-k} u(x)$ modulo g(x). In the HQC reference implementation [AMAB+], this encoding is performed by Algorithm 4.

Algorithm 4 HQC Reed-Solomon Encoder from [AMAB+]
Require: parameters: k, n
Require: generator polynomial $g \in \mathbb{F}_q^{n-k}$
Require: a message $m \in \mathbb{F}_q^k$
Ensure: c := RS.Enc(m) $\in \mathbb{F}_q^n$
1: Initialize c to $0^n$
2: for i from 1 to k do
4:     for j from 1 to n − k do
5:         ▷ gf_mul is the Galois field multiplication

Decoding RS. The RS decoder used in HQC follows the theory from [JH04]. The strategy is based on the existence of a unique interpolating polynomial for the received codeword. This polynomial allows decoding up to half the minimum distance of errors. The first operation is the syndrome computation (see Algorithm 5), done with the knowledge of the parity check matrix $H = (h_{i,j})_{1 \le i \le n-k,\ 1 \le j \le n}$. As a reminder, the decoder of HQC, and therefore the parity check matrix, are publicly known.

Algorithm 5 Reed-Solomon syndrome computation
3: for j from 2 to n do
4:     ▷ gf_mul: Galois field multiplication
5: return s

Null syndrome. From the low DFR (see Equation 2), we know that the input of the RS decoder in HQC is almost always an error-free codeword, whose syndrome is zero.
For the rest of the paper, we will consider that this codeword is always error-free. As a consequence, after the syndrome computation, the RS decoder will manipulate only zeros; we will not describe the following operations.
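The null-syndrome property can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the field polynomial 0x11D, the primitive element 2, generator roots α^1, ..., α^(n−k), the toy sizes (n = 15, k = 9) and all function names are choices made here, not values taken from the HQC reference implementation. It encodes a message systematically, then computes syndromes with the codeword byte as first gf_mul operand and without multiplying the first byte.

```python
GF_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
ALPHA = 2        # assumed primitive element of GF(2^8)


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8) (not the reference algorithm)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
    return result


def gf_pow(base: int, exponent: int) -> int:
    """Naive exponentiation by repeated multiplication."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result


def generator_poly(n_parity: int) -> list:
    """g(x) = (x + alpha)(x + alpha^2)...(x + alpha^n_parity), lowest degree first."""
    g = [1]
    for i in range(1, n_parity + 1):
        root = gf_pow(ALPHA, i)
        nxt = [0] * (len(g) + 1)
        for degree, coef in enumerate(g):
            nxt[degree] ^= gf_mul(coef, root)  # coef * root * x^degree
            nxt[degree + 1] ^= coef            # coef * x^(degree + 1)
        g = nxt
    return g


def rs_encode(message: list, g: list) -> list:
    """Systematic encoding: parity symbols followed by the message symbols."""
    n_parity = len(g) - 1
    rem = [0] * n_parity + list(message)       # x^(n-k) * u(x)
    for degree in range(len(rem) - 1, n_parity - 1, -1):
        lead = rem[degree]
        for j, gj in enumerate(g):             # subtract lead * g(x) * x^shift
            rem[degree - n_parity + j] ^= gf_mul(lead, gj)
    return rem[:n_parity] + list(message)


def syndromes(codeword: list, n_parity: int) -> list:
    """s_i = sum_j c_j * alpha^(i*j); the first byte needs no multiplication."""
    result = []
    for i in range(1, n_parity + 1):
        s = codeword[0]
        for j in range(1, len(codeword)):
            s ^= gf_mul(codeword[j], gf_pow(ALPHA, i * j))
        result.append(s)
    return result


N_PARITY = 6
message = [0x48, 0x51, 0x43, 1, 2, 3, 4, 5, 6]
codeword = rs_encode(message, generator_poly(N_PARITY))
clean_syndromes = syndromes(codeword, N_PARITY)

corrupted = list(codeword)
corrupted[4] ^= 0x01                            # inject a single byte error
noisy_syndromes = syndromes(corrupted, N_PARITY)
```

In this toy setting, `clean_syndromes` contains only zeros while a single corrupted byte makes every syndrome non-zero, which is why the rest of the decoder carries no sensitive information when the input is error-free.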

Galois field multiplication
The main operation during the encoder and the syndrome computation is gf_mul, the Galois field multiplication. This function uses a fast multiplication algorithm from [BGTZ08] based on a Fast Fourier Transform (FFT) model. Algorithm 6 describes this operation as used in [AMAB+] since the April 2023 reference implementation of HQC. The gf_mul implementation remains the same independently of the selected HQC security level.

Algorithm 6 Galois field multiplication from [AMAB+]
Algorithm 6 performs a Galois field multiplication a × b. A key point to note is that the two inputs a and b are not handled symmetrically by the algorithm. Indeed, t[1] extracts the two least significant bits of a in lines 8 and 16. Lines 25 and 26 shift the bits of a by different values in a for loop. This asymmetry results in a more intensive manipulation of one of the two operands, and we will see that this has consequences for the subsequent side-channel leakage.
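For readers who want to experiment with the arithmetic, a minimal functional model of a GF(2^8) multiplication is sketched below. It is a plain shift-and-add loop, not the FFT-based algorithm of [BGTZ08] used in the reference implementation, and the reduction polynomial 0x11D is an assumption of this sketch. It produces the same products as any correct multiplication in that field, but its instruction flow (and therefore its leakage) differs from Algorithm 6.

```python
GF_POLY = 0x11D  # assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) with a bitwise shift-and-add loop."""
    result = 0
    for _ in range(8):
        if b & 1:            # add the current multiple of a if the bit of b is set
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # reduce modulo the field polynomial
            a ^= GF_POLY
    return result


product = gf_mul(0x80, 0x02)  # x^7 * x = x^8, reduced modulo GF_POLY
```

Even in this simple model the operands play different roles: one is repeatedly shifted and reduced while the other is only scanned bit by bit.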

SASCA with Belief Propagation
Belief Propagation (BP) is a widely used approach in the field of probabilistic graphical models, particularly in Bayesian networks and Markov random fields. It is based on a message-passing algorithm designed to compute marginal probabilities or make inferences about random variables within these models. In the context of SASCA, the graph can be fed with leakage information (i.e., probability distributions) on some intermediate values during the computation of a target algorithm.
Belief propagation is applied on a bipartite graph, called a factor graph, which is composed of two types of nodes: variable nodes, which store the probability distributions of the algorithm's intermediate variables, and factor nodes, which represent the arithmetical links between them. The process starts with an initialization step where each variable node in the graph receives a prior "belief" marginal. This initial belief can come from side-channel leakage, often obtained with a template model. Variable nodes with no prior knowledge are initialized with a uniform distribution. Then, the message passing algorithm operates. The message $\mu_{x \to f}$ sent from variable node x to factor node f is defined as follows [KFL01]:

$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x),$$

where n(x) returns the neighbors of x within the factor graph. Additionally, the message sent by a factor f to a variable x is computed with the sum-product formula:

$$\mu_{f \to x}(x) = \sum_{\sim\{x\}} \Big( f(X) \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \Big),$$

where X represents the set of variable nodes connected to f and ∼{x} expresses the summary notation as defined in [KFL01].
Messages are passed iteratively between nodes in the graph. The algorithm stops when the maximum number of iterations is reached or when convergence is reached. The latter gives more flexibility regarding the choice of the maximum number of iterations, at the cost of finding a strategy for detecting convergence. In this paper, we consider that a threshold on the maximal statistical change of all variables' distributions is a satisfactory method to detect convergence. In other words, the algorithm stops if the distributions of all nodes remain almost constant between two (or more) updates. Finally, the marginal distributions of all variables are extracted as follows:

$$P(x) = \frac{1}{Z} \prod_{f \in n(x)} \mu_{f \to x}(x),$$

with Z being a normalization factor.
The belief propagation algorithm has been proved to be exact on tree-like graphs. In practice, cryptography-related graphs often contain cycles, but BP (or loopy BP in these cases) provides good empirical results. Several techniques, such as message damping and scheduling, can be applied when the graph contains cycles; these techniques are not used in this work.
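As a small illustration of the sum-product rule of [KFL01] described above, the sketch below runs BP on a tree made of a single XOR factor c = a ⊕ b over 2-bit variables. The priors on a and c stand in for template outputs, b has no prior, and all names and numbers are illustrative assumptions. Since a and c are leaves, their messages to the factor are simply their priors, and one message from the factor to b already yields the exact marginal.

```python
def normalize(dist):
    """Scale a list of non-negative weights so that it sums to 1."""
    total = sum(dist)
    return [p / total for p in dist]


def xor_factor_message(msg_a, msg_c):
    """Message from the factor c = a XOR b towards b.

    Implements mu_{f->b}(b) = sum over (a, c) of [c == a ^ b] * mu_a(a) * mu_c(c),
    which reduces to a sum over a because c is determined by a and b.
    """
    size = len(msg_a)
    out = [0.0] * size
    for a in range(size):
        for b in range(size):
            out[b] += msg_a[a] * msg_c[a ^ b]
    return out


# Hypothetical "leakage" priors: a is probably 0, c is probably 3.
prior_a = normalize([0.7, 0.1, 0.1, 0.1])
prior_c = normalize([0.1, 0.1, 0.1, 0.7])
prior_b = [0.25] * 4  # no side-channel information on b

message_to_b = xor_factor_message(prior_a, prior_c)
marginal_b = normalize([p * m for p, m in zip(prior_b, message_to_b)])
best_b = max(range(4), key=lambda v: marginal_b[v])
```

The most likely value of b is the one consistent with the most likely values of a and c; on larger graphs the same two message types are simply iterated until the marginals stabilize.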

Attacker Model
In this paper, we consider an attacker able to perform profiled attacks on HQC decapsulation for shared key recovery. This implies that the adversary has access to a fully controlled clone of the real target device for the profiling phase. For simplification purposes, we run both profiling and attack procedures on the same physical device: the complexity of template portability does not fall within the scope of this paper. Throughout this work, we suppose the attacker to be able to craft templates from the gf_mul operation only. Hence, we suppose that the attacker has the ability to isolate an ordered sequence of gf_mul computations within a wider routine, such as the RS decoder or encoder. We believe that this task can be conducted with pattern matching techniques, and it is therefore left out of our study. Finally, as all attacks presented in this paper target the HQC shared key, the attacker does not have the ability to increase the Signal-to-Noise Ratio (SNR) with techniques requiring side-channel measurements of several HQC decapsulation instances (such as trace averaging).

Single Template Attack
In this section, we implement an attack aiming at recovering all the bytes of the error-free codeword by using only one template. In a second phase, this codeword can be decoded to deduce the shared key computed at the end of the key exchange. Finally, we describe how the decoder structure allows coping with possible template mispredictions and obtaining high attack success rates.

Experimental Setup
We acquired traces with a "Langer Near Field" electromagnetic probe using a Rohde & Schwarz RTO2024 oscilloscope with a sample rate of 1 GHz. The Galois field multiplication gf_mul has been extracted from the April 2023 reference implementation of HQC [AMAB+] following Algorithm 6. We selected the STM32F407 as our target board. We compiled the code with -O3 optimization, surrounded by a GPIO-based trigger. This set-up leads to a computation time of 1.3 µs and traces of 1300 points; see Figure 2 for the average acquired trace. These small-sized traces allowed us to perform our attack on the full length of the traces, without selecting points or areas of interest. In total, we acquired 500000 traces for randomly sampled inputs.

Templates on Galois Field Multiplication
Prior to the templating phase, we conduct a leakage assessment on the three 8-bit variables involved in gf_mul: the two multiplication operands as well as the output. We make the assumption of a linear leakage model and rely on a Linear Regression Analysis (LRA). Namely, for a side-channel measurement $x_i$ and an 8-bit variable $y_i$ with bits $y_i[1], \dots, y_i[8]$, we express the leakage as:

$$x_i = \beta_0 + \sum_{j=1}^{8} \beta_j \, y_i[j] + \epsilon.$$

Given a set of n training samples $(x_i)_{1 \le i \le n}$ (i.e., n traces) and under Gaussian noise ϵ, there exists a unique solution to this system $\tilde{\beta} = (\tilde{\beta}_0, \dots, \tilde{\beta}_8)$, i.e., an estimation of the parameters $\beta = (\beta_0, \dots, \beta_8)$, which minimizes the residual sum of squares defined as:

$$RSS = \sum_{i=1}^{n} \Big( x_i - \tilde{\beta}_0 - \sum_{j=1}^{8} \tilde{\beta}_j \, y_i[j] \Big)^2.$$

The accuracy of the model can be measured through the coefficient of determination, denoted $R^2$, which is computed as follows:

$$R^2 = 1 - \frac{RSS}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

where $\bar{x}$ is the mean of the measurements. Note that this metric needs to be computed in a univariate way (i.e., for each time sample).
Coefficients of determination corresponding to the three targeted variables are displayed in Figure 3. From the LRA output in Figure 3, we observe that (i) the leakage of the first operand is both important and spread along the computation of gf_mul, which can be explained by the several logical operations performed on this operand, (ii) the leakage corresponding to the second operand is less important and (iii) the output of the gf_mul computation leaks at the end of the function, probably when it is stored in main memory.
Then, we mount six different template attacks: (i) three templates target the Hamming weight of the inputs and output of the Galois field multiplication, and (ii) the remaining three templates aim at recovering the exact value of the inputs and output involved in the Galois field multiplication. To build the templates, we use Fisher's Linear Discriminant Analysis (LDA) as our classifier. The validation accuracy of each model is evaluated on datasets of different sizes segmented into 90% training and 10% validation traces. We analyzed the accuracy depending on the selected number of training traces and concluded that the best compromise was reached for 300000 training traces. Template accuracies are summarized in Table 2. Several observations can be made: (i) the value of the first operand can be predicted with a 93.89% accuracy, (ii) the value template attacks on the second operand and on the output do not provide predictions significantly better than random guesses and (iii) the Hamming weights of the output and second operand give satisfactory results, more informative than a random guess.
We can conclude that the high number of logical operations that act on the first gf_mul operand (see Algorithm 6) is beneficial from a template attacker's perspective. Indeed, the various shifts make it possible to isolate the leakage of different partitions of the bit-level decomposition of the first operand. This increases the separability between the different value classes. Consequently, it is easier for the LDA to discriminate values than Hamming weight classes for this particular operand. As a reminder, during the computation of the RS syndromes (see Algorithm 5), the message, which is the sensitive data of this computation, is used as the first operand of the multiplication, which is the one that leaks the most. Moreover, the storing of gf_mul's output in main memory allows an attacker to reach exploitable template accuracies.
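The bit-level regression and its coefficient of determination can be sketched on simulated data as follows. This is a standard-library illustration for a single time sample: the simulated coefficients, the noise level, the number of traces and the helper names are assumptions made here, and the least-squares fit is obtained by solving the normal equations with a small Gauss-Jordan routine rather than with a numerical library.

```python
import random


def bits(value, width=8):
    """Bit decomposition of an integer, least significant bit first."""
    return [(value >> j) & 1 for j in range(width)]


def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]


def linear_regression_analysis(samples, labels):
    """Fit x = b0 + sum_j bj * bit_j(y) and return (coefficients, R^2)."""
    rows = [[1.0] + [float(b) for b in bits(y)] for y in labels]
    dim = len(rows[0])
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(dim)] for a in range(dim)]
    moment = [sum(r[a] * x for r, x in zip(rows, samples)) for a in range(dim)]
    beta = solve_linear_system(gram, moment)
    predictions = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    mean = sum(samples) / len(samples)
    rss = sum((x - p) ** 2 for x, p in zip(samples, predictions))
    tss = sum((x - mean) ** 2 for x in samples)
    return beta, 1.0 - rss / tss


# Simulated single time sample: Hamming-weight-like leakage plus Gaussian noise.
rng = random.Random(0)
true_beta = [0.5] + [1.0] * 8
labels = [rng.randrange(256) for _ in range(2000)]
samples = [
    true_beta[0]
    + sum(c * b for c, b in zip(true_beta[1:], bits(y)))
    + rng.gauss(0.0, 0.5)
    for y in labels
]
estimated_beta, r_squared = linear_regression_analysis(samples, labels)
```

On real traces the same fit would be repeated independently for every time sample, giving one R² value per point of the trace.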

Building Prediction Matrices
In this subsection, we describe a data structure called "prediction matrix", which aims at providing repeatable, real-case-like simulations by storing multiple template predictions. Designing a simulation that matches the predictions of a template attack on a real target is a hard task. Indeed, the outputs of the templates depend on several factors. For instance, the hardware components involved in the attack, such as the target board and the measurement chain, and the experimental setup conditions (e.g., EM probe positioning, temperature, etc.) have an impact on the template accuracy. The choice of a classifier, as well as its exploitation of multivariate leakage, also has a considerable impact on the template's properties.
For these reasons, our real-case scenario attacks are performed using prediction matrices. The latter contain a set of 100000 independent probability distributions predicted by the models presented in Subsection 3.2, along with the corresponding true labels. The advantages of such prediction matrices are twofold. Firstly, randomly sampled elements from a prediction matrix can be seen as real template predictions: this can be used in a simulation context to test the robustness of the attacks, which can easily be run a large number of times. Secondly, as all attacks presented in this paper exploit the leakages of the gf_mul operation, the use of prediction matrices allows deriving attacks on several functions and countermeasure scenarios.
In our particular case, we stress that the claim that prediction matrices are comparable to a real-case attack scenario highly depends on the ability of the attacker to detect the gf_mul routines in wider side-channel traces. We believe that this assumption is reasonable within the attacker model we consider in this paper.
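One possible way to organize such a prediction matrix is sketched below. The class name, its methods and the toy classifier output are hypothetical and only illustrate the idea: distributions are stored next to their true labels, and a simulation draws one stored distribution for each value it needs to "measure". In practice the stored distributions would be the outputs of the LDA templates on held-out traces, not synthetic scores.

```python
import random
from collections import defaultdict


class PredictionMatrix:
    """Stores template output distributions grouped by their true label."""

    def __init__(self, labels, distributions):
        self._by_label = defaultdict(list)
        for label, dist in zip(labels, distributions):
            self._by_label[label].append(list(dist))

    def sample(self, true_value, rng):
        """Return one stored distribution that was recorded for `true_value`."""
        return rng.choice(self._by_label[true_value])


def toy_template_output(true_value, n_classes, rng):
    """Placeholder for a real classifier output (synthetic, for illustration)."""
    scores = [rng.random() for _ in range(n_classes)]
    scores[true_value] += 2.0  # make the true class the most likely one
    total = sum(scores)
    return [s / total for s in scores]


rng = random.Random(0)
labels = [rng.randrange(4) for _ in range(200)]
matrix = PredictionMatrix(labels, [toy_template_output(v, 4, rng) for v in labels])

# Simulating one gf_mul "measurement" whose true operand value is 2.
drawn = matrix.sample(2, rng)
```

Because every draw is a distribution produced by the classifier on a real trace, repeated simulations inherit the error patterns of the templates instead of relying on an idealized noise model.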

Combine and Conquer
From Algorithm 5, we notice that each codeword byte c[j] is independently manipulated n − k times within the for loop, in lines 2 and 4. In the current reference implementation of HQC [AMAB+], gf_mul always manipulates the codeword bytes as the first operand. This choice allows an attack that leverages the high accuracy p of the first operand template. The attacker may combine template outputs with a strategy such as majority voting, which provides a lower bound for the success rate of the attack.
The probability for the correct hypothesis to be ranked first by the classifier can be seen as the result of a Bernoulli distribution with parameter p. Given that trials are independent, they can be combined into a binomial distribution with parameters n − k and p. Let us denote by X the random variable following this distribution for a codeword byte. One can observe that C 1, the first codeword byte, is not manipulated with gf_mul, and hence cannot be recovered with our template attack. For any other codeword byte, the majority voting is a success if and only if $X > \lfloor \frac{n}{2} \rfloor$. Furthermore, all codeword bytes are independent, which results in the success probability given by Equation 10.
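The combination step and the binomial model can be illustrated with the short sketch below. The function names are illustrative and the vote threshold is deliberately left as a parameter; the snippet does not claim to reproduce the paper's success probability formula, it only shows how a majority vote is taken and how a binomial tail P(X > threshold) is evaluated for X following a binomial distribution.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the value predicted most often across independent trials."""
    return Counter(predictions).most_common(1)[0][0]


def binomial_tail(trials, p, threshold):
    """P(X > threshold) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(threshold + 1, trials + 1)
    )


def all_bytes_success(byte_success, n_bytes):
    """Probability that every byte is recovered, assuming independent bytes."""
    return byte_success**n_bytes


# Five hypothetical top-1 template predictions for the same codeword byte.
voted = majority_vote([0x3A, 0x3A, 0x7F, 0x3A, 0x10])
coin_majority = binomial_tail(3, 0.5, 1)  # at least 2 successes out of 3
```

When the correct value is predicted in a strict majority of the trials, the vote necessarily returns it, which is why the binomial tail gives a lower bound on the per-byte success rate.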

Re-Decoding Strategy
In this paper, we apply the strategy from [GLG22b], which consists in re-decoding the recovered codeword and provides several advantages: (i) re-decoding allows correcting template mistakes or inaccuracies, (ii) it allows recovering the value of C 1, which cannot be found from template results, and (iii) the attacker gains additional flexibility regarding the accuracy of the template. The literature describes two more efficient strategies to decode RS codes, namely list decoders. These decoders are not used in HQC for performance purposes; however, we can take advantage of their increased error correction capability to improve the attack.
Decoding RS list decoders. RS list decoders work by modifying the interpolating polynomial by adding some constraints [JH04]. This strategy allows decoding more errors than the classical decoder, but outputs a list of possible decoded messages instead of a single one. It was discovered by Sudan (S) [Sud00] in 1997 and improved in 1999 by Guruswami and Sudan (GS) [VG99]. While the code can only correct up to t errors (see Table 1), with list decoding, if the number of errors is below a given threshold τ, which depends on the code parameters and the size of the list, the true message belongs to the list. With HQC parameters, the GS RS list decoder is able to correct up to τ = 19, 19 and 36 errors, instead of t = 15, 16 and 29, for HQC128, HQC192 and HQC256 respectively. Note that one error slot is already taken by C 1. Indeed, C 1 is not manipulated with a gf_mul operation, so our attacker model does not allow a template attack on this variable. The probability of success of the attack then becomes the one given by Equation 11.
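The effect of an error budget can be sketched numerically as follows. The model is an assumption of this sketch and may differ from the paper's own formula: bytes fail independently with the same probability, and re-decoding succeeds whenever the number of wrong templated bytes does not exceed the budget. The example budget uses the figures quoted above for HQC128 (τ = 19 with one slot taken by the first byte, and n − 1 = 45 templated bytes); the per-byte success rate of 0.8 is an arbitrary illustrative value.

```python
from math import comb


def success_with_error_budget(n_bytes, byte_success, max_errors):
    """P(at most `max_errors` of `n_bytes` independent bytes are wrong)."""
    byte_failure = 1.0 - byte_success
    return sum(
        comb(n_bytes, e) * byte_failure**e * byte_success ** (n_bytes - e)
        for e in range(max_errors + 1)
    )


# Illustrative HQC128-like setting: 45 templated bytes, 18 tolerated errors.
with_redecoding = success_with_error_budget(45, 0.8, 18)
without_redecoding = success_with_error_budget(45, 0.8, 0)
```

Comparing the two values shows how strongly the error tolerance of the decoder relaxes the accuracy required from each individual byte recovery.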

Practical Attack
Targeting all security levels. This template attack can also be conducted for the higher security levels of HQC. In fact, the gf_mul function is exactly the same, independently of the selected security level, allowing us to re-use templates (see Subsection 3.2). Parameters from Table 1 show that n, the number of codeword bytes to be recovered, increases with the security level. But, at the same time, n − k, the number of independent trials, also increases, giving more independent information about each codeword byte.

Results
We observe an accuracy of p = 0.9389 on the first operand with 300000 training traces and a single attack trace (see Table 2). Using this probability in Equations 10 and 11, we obtain success rates greater than 0.9999, with or without the re-decoding strategy, for all HQC security levels.
Discussion. From Equation 11, we compute the minimum value of p such that the success rate of the attack stays above 0.9. It follows that a template accuracy of p min = 0.7262 is enough to succeed in the attack for HQC128. HQC192 and HQC256 require minimal template accuracies of p min = 0.7250 and p min = 0.6834, respectively. Since the minimal required accuracy is lower for each security level than what we obtained in practice, we could consider attacking targets with a higher noise level. This first attack is based on an unfortunate choice of operand order for the multiplication in the reference implementation of HQC [AMAB+]. We can reasonably assume that, from the results presented herein, an informed developer will choose to swap the first and second operands. This allows manipulating sensitive data as the operand that leaks the least. Given that this multiplication operation is commutative, swapping operands does not imply any computational overhead.

SASCA on Reed-Solomon Decoder
After the swap of operands, we are not able to perform the attack from Section 3 against the sensitive data, which is now "hidden" behind the second operand. Moreover, the first operand, whose value can be templated with high accuracy, now holds the content of the parity check matrix, which is already publicly known. Nevertheless, results from Table 2 show that the Hamming weight of gf_mul's second input and output can be templated with high accuracy. This information is gathered into a factor graph.

Reed-Solomon Decoder Graph
We construct a factor graph to represent the RS syndrome computation depicted in Algorithm 5 (see Figure 4).Each of the n − 1 windows corresponds to an iteration of the second for loop (line 3).Within each window, m = n − k Galois feld multiplications (line 4) are performed, between a codeword byte and an element from the parity check matrix H, resulting in an intermediate syndrome value (line 2).The computation of each syndrome byte involves the XOR operation of each intermediate syndrome at the corresponding position in every window.Finally, we depict the initialization step (line 1) through a XOR operation with C 1 on each syndrome byte.
The factor graph presented in Figure 4 models the relations between each intermediate value used during the computation. In normal use of a decoder, the output syndrome gives information about the random error added to the codeword. Here, however, we consider the RS syndrome to be zero (see Subsection 1.1), which allows removing the lower part of the graph. This construction ends up with n − 1 windows, each representing an independent tree-like graph, and thus benefits from the BP convergence proof for such graph topologies. We recall that the re-decoding strategy is available to the attacker, allowing them to recover $C_1$, the first codeword byte, which is outside all windows.

Building gf_mul sub-graph. The main sub-operation performed during the RS syndrome computation is gf_mul, the multiplication in the Galois field $\mathbb{F}_{2^8}$. This operation can be performed using a fast multiplication based on the Fast Fourier Transform (see Algorithm 6) [BGTZ08], which has been the choice of the HQC authors in the reference implementation since April 2023. However, this calculation can be done differently, using the logarithm representation of each element:

$v := a \times b = \alpha^{\log(a)} \times \alpha^{\log(b)} = \alpha^{(\log(a)+\log(b)) \bmod n}$,

where α is a primitive element of the Galois field. After this transformation, if the log and exp transformations (stored in precomputed tables in practice) are known, the multiplication can be computed by a simple addition and modular reduction. This approach allows us to optimize the computation of factor messages (see Figure 5). Namely, lookup tables are used to compute the logarithm, exponentiation and modular reduction factor operations. The addition factor is implemented with a convolution, which can benefit from an FFT depending on the size of the variables' domain.
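The logarithm-based multiplication described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the field polynomial 0x11D and the primitive element α = 2 are assumptions made for the example, and the exponent is reduced modulo 255, the order of the multiplicative group of $\mathbb{F}_{2^8}$. The names `gf_mul_log` and `gf_mul_ref` are hypothetical.

```python
# Sketch: GF(2^8) multiplication through log/antilog lookup tables.
# ASSUMPTION: field polynomial 0x11D and primitive element alpha = 2.
GF_POLY = 0x11D
ORDER = 255  # order of the multiplicative group of GF(2^8)

EXP = [0] * ORDER   # EXP[i] = alpha^i
LOG = [0] * 256     # LOG[alpha^i] = i (LOG[0] is unused)
x = 1
for i in range(ORDER):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                # multiply by alpha
    if x & 0x100:
        x ^= GF_POLY       # reduce modulo the field polynomial


def gf_mul_log(a, b):
    """Multiply via v = alpha^((log a + log b) mod 255)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]


def gf_mul_ref(a, b):
    """Reference shift-and-XOR multiplication, used only as a cross-check."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return result
```

In a factor graph, this decomposition turns one multiplication factor into two table lookups, one modular addition and one exponentiation lookup, which is what makes the message computations cheap.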

Simulating Hamming Weight Leakages
As a first step, we perform simulations on the decoder graph. We initialize the marginal probability of the second operand with the high-accuracy value template results from the prediction matrices (see Subsection 3.3). This operand gives information about the parity check matrix elements. We also initialize the marginal probabilities of the outputs of all gf_mul computations from a Hamming weight leakage model with Gaussian noise. Hence, a side-channel trace x resulting from the manipulation of a gf_mul output v is modeled as the Hamming weight of v plus Gaussian noise of standard deviation σ. With this leakage model, we can simulate the output of a perfect template classifier.

Simulation results. Figure 6 shows the success rate of the attack when increasing the standard deviation σ step by step, performing the attack 400 times for each value. Up to a standard deviation of 2, we have a success rate of 1 for security levels 128 and 192. For the highest security level of HQC, we can almost reach a noise level of σ = 3 without loss of accuracy.
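The Hamming weight leakage model and the simulated perfect template classifier can be sketched as follows. This is an illustrative reading under stated assumptions: a uniform prior over byte values and a Gaussian likelihood centered on the Hamming weight. The function names are hypothetical.

```python
import math
import random


def hamming_weight(v):
    """Number of set bits in the byte v."""
    return bin(v).count("1")


def simulate_leak(v, sigma, rng):
    """One simulated leakage sample: HW(v) plus Gaussian noise of std sigma."""
    return hamming_weight(v) + rng.gauss(0.0, sigma)


def value_posterior(x, sigma):
    """Distribution over the 256 byte values given a leakage sample x.

    ASSUMPTION: uniform prior and Gaussian likelihood. Values sharing the
    same Hamming weight receive the same probability. For very small sigma
    and a sample far from every weight, the likelihoods may underflow.
    """
    likelihood = [
        math.exp(-((x - hamming_weight(v)) ** 2) / (2.0 * sigma ** 2))
        for v in range(256)
    ]
    total = sum(likelihood)
    return [value / total for value in likelihood]


# Example: marginal used to initialize one gf_mul output node.
rng = random.Random(0)
leak = simulate_leak(0xA7, sigma=1.0, rng=rng)
marginal = value_posterior(leak, sigma=1.0)
```

Such a marginal would be attached to each gf_mul output variable node before running belief propagation.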

Discussion
We observe that the success rate for HQC192 is lower than the one for HQC128. Indeed, the number of bytes to recover is larger (56 instead of 46), the number of independent trials remains almost the same (32 instead of 30) and the error correction capability is the same (i.e., 19). This makes the attack more difficult to conduct considering Equation 11.
We expected the attack on HQC256 to have a greater success rate than the attack on HQC128. Indeed, the number of codeword bytes to recover is much larger for this security level (n = 90 instead of 46 or 56), but the number of independent trials is also larger (n − k = 58 instead of 30 or 32), and the error correction capability increases (36 instead of 19). The attack needs to find twice as many codeword bytes, but has twice as many independent leaks on each of them, and is ultimately helped by a stronger correction capability.

Codeword Masking Countermeasure
A state-of-the-art masking countermeasure strategy is codeword masking [MSS13]. This masking strategy allows creating a mask for the decoder using an encoder. Instead of decoding c + e directly, the decoder is applied to c + c′ + e, where c′ is a codeword mask obtained by encoding a randomly sampled message mask m′. Given that the countermeasure is not the repetition of the same operation, codeword masking requires a further study of the encoding algorithm. Consequently, an attack targeting a masked implementation of HQC can be performed in two steps: (i) attacking the decoder to recover c + c′, the masked shared key, and (ii) attacking the encoder to recover the mask c′. In this scenario, the success rate is the product of the success rates of these two steps. The first step is addressed in Section 4. In this section, we describe a SASCA approach against the RS encoder, addressing the second step.
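The linearity argument behind codeword masking can be illustrated with a toy code. The sketch below deliberately swaps the Reed-Solomon code for a binary 3-repetition code, which is an assumption made to keep the example short; only the masking flow mirrors the description above, and all function names are hypothetical.

```python
import secrets

REP = 3  # toy 3-repetition code, a stand-in for the RS code (ASSUMPTION)


def encode(bits):
    """Linear encoder: repeat each message bit REP times."""
    return [b for b in bits for _ in range(REP)]


def decode(word):
    """Majority-vote decoder: corrects one flipped bit per block."""
    return [int(sum(word[i:i + REP]) > REP // 2)
            for i in range(0, len(word), REP)]


def xor(u, v):
    """Bitwise addition over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]


def masked_decode(received):
    """Decode c + e without ever feeding the unmasked word to the decoder."""
    mask_msg = [secrets.randbits(1) for _ in range(len(received) // REP)]
    mask_cw = encode(mask_msg)                  # c' = Encode(m')
    masked = decode(xor(received, mask_cw))     # decoder sees c + c' + e
    return xor(masked, mask_msg)                # (m + m') + m' = m


message = [1, 0, 1, 1]
error = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # at most one flip per block
received = xor(encode(message), error)
```

The example also makes the two-step attack surface visible: the decoder handles c + c′ + e while the encoder handles m′, so both leak information that must be combined to recover c.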

Reed-Solomon Encoder Graph
Figure 7 gives a graphical representation of Algorithm 4. In order to depict all intermediate values of the algorithm, the for loop (line 2) is unfolded. Each line of the graph corresponds to one iteration of this loop. The gate values Γ (line 3) are represented on the left side of the graph. From the second line of the graph, they depend on the rightmost element of the array, the addition of which is indicated by a numbered arrow on the graph. Each element in the blue rectangle represents the output of the Galois field multiplication of the gate value on the same line and the generator polynomial element in the same column (line 5). Finally, these elements are diagonally added (XOR) to produce the redundancy bytes (lines 6-7) at the bottom of the graph. These bytes are then concatenated with the initial message to form the output codeword (line 9).
As for the decoder, the main operation performed by the encoder is gf_mul, the Galois field multiplication. Moreover, this encoder algorithm requires the knowledge of g, a generator polynomial, which is publicly known as a parameter of HQC. This prior knowledge can be embedded into the factor graph. Finally, the attack also aims at recovering the RS codeword bytes, which allows applying the re-decoding strategy (see Subsection 3.5).
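The encoder structure described above (gate value, multiplication by the generator polynomial coefficients, shifted XOR accumulation) can be sketched as a systematic encoder based on polynomial division. This is a hedged illustration rather than Algorithm 4 itself: the field polynomial 0x11D, the generator roots α^1, ..., α^(n−k) and the coefficient ordering are assumptions, and the helper names are hypothetical.

```python
# ASSUMPTION: GF(2^8) with polynomial 0x11D and primitive element alpha = 2.
GF_POLY = 0x11D
EXP, LOG = [0] * 255, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= GF_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 255]


def generator_poly(r):
    """g(x) = (x + alpha^1)...(x + alpha^r), coefficients from low to high."""
    g = [1]
    for i in range(1, r + 1):
        root, nxt = EXP[i], [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= gf_mul(coef, root)   # root * coef * x^j
            nxt[j + 1] ^= coef             # coef * x^(j+1)
        g = nxt
    return g


def rs_encode(msg, g):
    """Systematic encoding: returns parity bytes followed by the message."""
    r = len(g) - 1
    parity = [0] * r
    for byte in reversed(msg):             # highest-degree coefficient first
        gate = byte ^ parity[r - 1]        # gate value of this iteration
        for j in range(r - 1, 0, -1):
            parity[j] = parity[j - 1] ^ gf_mul(gate, g[j])
        parity[0] = gf_mul(gate, g[0])
    return parity + list(msg)


def poly_eval(coeffs, point):
    """Horner evaluation of a polynomial given low-to-high coefficients."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c
    return acc


g = generator_poly(4)
codeword = rs_encode([0x12, 0x34, 0x56], g)
```

Each gate value depends on the parity byte produced by earlier iterations, which is the data dependency that creates cycles in the encoder graph and later complicates shuffling.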
We re-used the same template results from Subsection 3.3 to perform practical attacks. Simulations follow the Hamming weight leakage model from Subsection 4.2, and their results are given in Figure 8. The results in Figure 8 show that the attack on the encoder is more sensitive to noise than the attack on the decoder (see Figure 6). We claim that, in practice, the masked decoder is not secure since we are still able to recover the mask with a probability of 0.7625, 0.6575 and 0.8075 for HQC128, HQC192 and HQC256 respectively.

Discussion
The sensitivity of this attack to noise can potentially be explained by the sparse relations between intermediate values in the encoder graph, as well as by cycles within the latter. Future work can focus on optimization techniques such as damping or message scheduling. Still, higher success rates for HQC256 are reported, both in simulations and in the real case scenario.
High-order masking strategy. By generating N random masks and adding them together, one can build a high-order masking. Each one of the N masks must be independently recovered for the attack to succeed, which occurs with probability $p_N = p^N$, with p the probability of recovering a single mask. It follows that reducing the probability of success below 0.01 requires computing 17, 11 or 22 independent masks for HQC128, HQC192 and HQC256 respectively. This approach does not seem to be effective due to the additional overhead it incurs.
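The number of masks quoted above follows directly from $p_N = p^N$ and the single-mask recovery probabilities reported for the encoder attack. A minimal check, where the function name `masks_needed` is hypothetical:

```python
def masks_needed(p_single, target=0.01):
    """Smallest N such that p_single ** N is below the target probability."""
    n_masks, p_all = 0, 1.0
    while p_all >= target:
        n_masks += 1
        p_all *= p_single   # all masks must be recovered independently
    return n_masks


# Single-mask recovery probabilities reported for the encoder attack.
for level, p in (("HQC128", 0.7625), ("HQC192", 0.6575), ("HQC256", 0.8075)):
    print(level, masks_needed(p))
# prints: HQC128 17, HQC192 11, HQC256 22 (one per line)
```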
Alternative masking strategy. In this section, we only considered codeword masking, which is a very specific form of masking at a high level. As further work, it would be interesting to consider the effect of masking at a lower level, for example by directly masking the Galois field multiplication itself. Similarly to what has been done for Dilithium [ABC + 22], this approach could be an effective way to protect HQC against our attack.

Shuffling Countermeasures
The NTT from Kyber [BDK + 18] was already targeted by SASCA-like strategies in [PPM17, PP19, HHP + 21]. Two shuffling countermeasures, coarse-full-shuffling and fine shuffling [RPBC20], were identified to protect the NTT against SASCA. Fine shuffling aims at shuffling the order of NTT inputs and outputs, randomly selecting one of the 4 combinations for each call. This strategy prevents an attacker from labeling the observed leakages. Coarse shuffling consists in shuffling the elements of the inner loop, independently within each layer. These shuffling strategies can be adapted to protect HQC. Indeed, the layers of the NTT behave like the windows of the RS decoder. Coarse shuffling can be used to shuffle the order of elements within a window (see Figure 4). Fine shuffling can be used to shuffle the inputs of gf_mul; since the output is unique, the number of combinations is just 2.
We can also derive novel shuffling methods for HQC. One possibility is to compute the windows in a random order. This strategy is useless for the NTT since all layers perform the exact same operation. However, for the RS decoder, the windows are perfectly independent and can therefore be performed in a random order. We call this countermeasure window shuffling. Finally, since all gf_mul operations are independent during the computation, they can be performed in a fully random order, following ideas from [ATT + 18].
In this section, we describe and analyze the security of these shuffling countermeasures. For the study of shuffling countermeasures from a side-channel perspective, we emphasize the importance for the attacker of possessing a fully controlled device for the profiling phase. Indeed, either the knowledge of the shuffling or the possibility to isolate a single known gf_mul operation is mandatory to craft templates.

Fine Shuffling
Under a fine shuffling strategy, the sensitive data (i.e., the codeword byte) is manipulated as the first operand one out of two times on average. We re-use the majority voting strategy from Subsection 3.4 to exploit the high first operand leakage. We consider that the output of the classifier is a random value when the sensitive data is hidden behind the second operand. This hypothesis is more pessimistic than what we observe in practice (see Table 2). Indeed, the leakage on the values from the parity check matrix could help for a more refined analysis. However, if the probability that the correct hypothesis is ranked first by the classifier is high enough, the majority voting will succeed.

Discussion
Using the fine shuffling strategy goes against the desire to hide the sensitive data under the operand that leaks the least, as discussed in Section 3.

Coarse Shuffling
The coarse shuffling strategy aims at shuffling the order of the operations performed in each window. The selected shuffling can be changed for each window, ensuring a better security level. The sequence of operations does not impact the graph construction or the path to convergence for the target codeword. This assertion is true since the n − 1 windows are independent sub-graphs (see Section 4). We recall that the value of $C_1$ is recovered with the final re-decoding strategy (see Subsection 3.5). Consequently, this case matches our prior setup of belief propagation, giving the same results as the previous decoder attack (see Section 4).

Window Shuffling
Shuffling windows allows interchanging the order of the codeword byte computations. In such a case, even if we are able to converge with a BP attack, the recovered codeword bytes are shuffled and the attack does not succeed unless the permutation is reversed. Here, we apply the same attack strategy, independently of the considered swap order. Indeed, the first step is to run the belief propagation as presented in Section 4. This step produces marginal probabilities on each intermediate value. Previous results show that the values of the codeword bytes are successfully recovered, independently of the presence of a shuffling. Consequently, the difficulty of attacking this shuffle lies in inverting the permutation.

Inverting codeword bytes permutation
The parity check matrix $H = (h_{i,j})_{1 \le i \le k,\ 1 \le j \le n-k}$ can be transformed into a Dirac probability distribution, stored in a matrix $T$ of size $k \times n \times 256$ in which each element of H is encoded as a distribution putting all of its mass on the actual value of this element. We know that the lines of the parity check matrix have been shuffled by the window shuffling but that, in each line, elements kept their original arrangement. After the first BP phase, we obtain a shuffled estimation of $T$, denoted $\tilde{T}$, that holds the marginals of each variable representing H. More formally, if L represents a side-channel measurement, each entry of $\tilde{T}$ is the probability, given L, that the corresponding shuffled element takes a given value. The idea is to reassign the lines of $\tilde{T}$ in order to minimize a distance with $T$. To do so, we compute the matrix D whose entries are the distances d between one line of $T$ and one line of $\tilde{T}$, where d is an arbitrary distance function. Inverting the window shuffling is equivalent to selecting k elements from the matrix D, exactly one element per row and one element per column, such that the sum of these elements is minimal. The location of these selected elements gives the assignment between the lines of $\tilde{T}$ and $T$. This problem is an instance of the assignment problem, for which an optimal solver is known.

Assignment problem
The assignment problem is a classic optimization problem in the field of operations research and linear programming. It involves finding the optimal assignment of a set of tasks to a set of agents (or workers) in such a way that the total cost or time required to complete the tasks is minimized or, conversely, the total profit or utility is maximized. Each task must be assigned to exactly one agent, and each agent can only be assigned to one task.

Hungarian algorithm
The Hungarian algorithm is an efficient method for solving the assignment problem, especially when the problem involves equal numbers of tasks and agents. It was developed by Harold Kuhn [Kuh55] in the 1950s and later refined by James Munkres [Mun57]. We applied it to the $\tilde{T}$ resulting from simulated leakage, in order to study the behavior of the algorithm under noise. Several distance metrics have been evaluated for Equation 16: the $L_1$ distance presented the best results. After the Hungarian method, we know the value of the second operand with precision. Now (i) either the marginals on the codeword are already satisfactory to foresee a successful re-decoding or (ii) the attacker can inject the newly learned information into the graph to converge towards more accurate results.
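The permutation recovery can be sketched on a toy instance. The code below is an illustration under stated assumptions: a 4-symbol alphabet instead of 256 values, marginals blurred by a fixed amount instead of real BP outputs, and a compact potentials-based variant of the Hungarian method. The names `hungarian`, `blurred`, `l1` and `perm` are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment on a square matrix; returns column per row."""
    n, INF = len(cost), float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)    # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)      # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                              # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


Q, EPS = 4, 0.3                                # toy alphabet size, blur level
H = [[0, 1, 2], [3, 3, 1], [2, 0, 0], [1, 2, 3]]
perm = [2, 0, 3, 1]                            # shuffled line s = original line perm[s]


def one_hot(val):
    return [1.0 if x == val else 0.0 for x in range(Q)]


def blurred(val):
    return [(1 - EPS) * h + EPS / Q for h in one_hot(val)]


def l1(line_a, line_b):
    return sum(abs(a - b) for da, db in zip(line_a, line_b)
               for a, b in zip(da, db))


T = [[one_hot(h) for h in row] for row in H]                  # Dirac table
T_shuffled = [[blurred(h) for h in H[perm[s]]] for s in range(len(H))]
D = [[l1(T_shuffled[s], T[i]) for i in range(len(T))]
     for s in range(len(T_shuffled))]
recovered = hungarian(D)                       # recovered[s] = original line
```

With real measurements, the rows of the shuffled table would be the marginals produced by belief propagation, and the quality of the recovered permutation would depend on the noise level.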

Full Shuffling
A stronger shuffling is introduced by combining ideas from window shuffling and coarse shuffling. These two shuffling methods can be applied independently as in [GLG22b], but this may lead to de-shuffling attacks. Instead, they can be cross-used by totally randomizing the order of the gf_mul computations. This strategy follows an idea from [ATT + 18] and aims at increasing the combinatorial complexity for the attacker. This strategy, which we call "full shuffling", has an overhead which is the cost of shuffling a list of size n × (n − k).
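The idea of full shuffling can be sketched as follows: every (syndrome index, codeword index) pair is placed in one list, the list is shuffled, and the multiplications are executed in that order. This is a toy illustration under assumptions: the field polynomial 0x11D, a Vandermonde-style matrix with entries α^((i+1)j), small dimensions, and the `secrets` module as the randomness source. It is not a hardened implementation.

```python
import secrets

# ASSUMPTION: GF(2^8) with polynomial 0x11D and primitive element alpha = 2.
GF_POLY = 0x11D
EXP, LOG = [0] * 255, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= GF_POLY


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 255]


def syndromes(codeword, H, order):
    """Accumulate s[i] ^= c[j] * H[i][j] following the given operation order."""
    s = [0] * len(H)
    for i, j in order:
        s[i] ^= gf_mul(codeword[j], H[i][j])
    return s


n, r = 7, 4                                    # toy length and redundancy
H = [[EXP[((i + 1) * j) % 255] for j in range(n)] for i in range(r)]

natural_order = [(i, j) for j in range(n) for i in range(r)]
full_shuffle = natural_order[:]                # list of size n * (n - k)
secrets.SystemRandom().shuffle(full_shuffle)   # fresh permutation per call

word = [0x10, 0x22, 0x35, 0x47, 0x59, 0x6B, 0x7D]
```

Because XOR accumulation is commutative, the result does not depend on the execution order, which is why the gf_mul operations of the decoder can be fully shuffled.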

Complexity of full shuffling inversion
Let us suppose that we are able to recover, with a BP attack or otherwise, the exact value of the second operand, coming from the parity check matrix. Given this information, we want to invert the shuffling of these elements. However, the size of the matrix is much larger than the size of the Galois field, leading to a large redundancy. Consequently, it is impossible to un-shuffle without testing all possibilities for the redundant elements. Given the parity check matrix, one is able to compute this number of permutations for all the security levels of HQC. Note that this number increases with the size of the matrix, and therefore with the security level, since the Galois field remains the same. This number of permutations is respectively $2^{504}$, $2^{614}$ and $2^{1030}$ for the three security levels of HQC. This number being larger than the security level, we conclude that inverting the shuffling is not achievable with the strategy presented in Section 6. This leads us to believe that full shuffling is an effective countermeasure against our attack.
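One plausible way to count the orderings left to test is sketched below. It assumes that operations sharing the same second operand value are indistinguishable, so that a value appearing m times contributes m! candidate orderings. This counting rule is an assumption made for illustration, not a formula stated in the text, and the toy matrices are not HQC parameters.

```python
import math
from collections import Counter


def log2_indistinguishable_orderings(H):
    """log2 of the product of m! over the multiplicities m of each value in H."""
    counts = Counter(value for row in H for value in row)
    log_orderings = sum(math.lgamma(m + 1) for m in counts.values())
    return log_orderings / math.log(2)


# Toy matrices (hypothetical values).
no_repeat = [[1, 2], [3, 4]]      # every value unique: a single ordering
one_repeat = [[1, 1], [2, 3]]     # one value seen twice: 2! orderings
all_equal = [[5, 5], [5, 5]]      # one value seen four times: 4! orderings
```

The count grows quickly once the matrix holds many more entries than the 256 field values, which matches the qualitative argument made above.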

Decapsulation attack
The HHK transform used for HQC, and more generally the Fujisaki-Okamoto (FO) transform, involves a re-encryption step during the decapsulation. Thus, a decoded shared key is also re-encoded during the re-encryption. This additional step allows exploiting side-channel leakages from both a decoder and an encoder during the same decapsulation process.

Combining RS decoder and encoder graphs
We are able to build a double graph, creating a connection between the encoder and decoder graphs. Indeed, these two graphs share the same codeword byte variable nodes, which can hence be merged. We follow the simulation strategy from Subsection 4.2 and display the results in Figure 9. We show that we are able to reach higher noise levels than any previous attack in this paper, and this for all HQC security levels.
Countermeasure. This combined attack, exploiting leakage redundancy from the re-encryption, is a threat to the security of HQC. Hence, finding a countermeasure for both the encoder and the decoder is required. The current RS encoder algorithm (see Algorithm 4) is implemented with a polynomial division. Protecting this encoder with a shuffling strategy is a hard task, since the carry propagation implies that several gf_mul operations depend on the result of previous ones. Considering the current encoder implementation, the full shuffling strategy cannot be applied straightforwardly. Our idea to protect the encoder is to replace its algorithm with a classical matrix-vector multiplication encoding. The full shuffling strategy can then be applied, which provides sufficient combinatorial complexity to prevent our attack from succeeding. Changing the encoder algorithm allows protecting both the encoder and the decoder with the same shuffling countermeasure.

Conclusion and Further Work
In this paper, we present new shared key recovery attacks on the code-based PQC NIST contest candidate HQC. Depending on HQC implementation choices, our attacks can either be a classical template attack or rely on a Soft Analytical Side-Channel Attack (SASCA) based on Belief Propagation (BP) theory.
For all our practical attacks, we used the setup presented in Subsection 3.1. These attacks are performed within a few minutes on an STM32F407 target running the reference implementation of HQC [AMAB + ]. For each security level, each attack has been repeated 400 times. We reach a perfect accuracy for each attack, except the encoder attack, which presents success rates greater than 65%. We stress that our attacks are a threat to HQC and that efficient countermeasures must be applied. This work takes advantage of the inner structure and properties of code-based cryptography to mount practical shared key recovery attacks.
• We demonstrate practical attacks against the Reed-Solomon (RS) decoder of HQC.
Precisely, we exploit physical leakages during Galois field multiplication, a cornerstone operation of the RS logic, and model intermediate variables' dependencies within a factor graph. We simulated this attack with a Hamming weight leakage model and showed that the success rate stays high (superior to 0.9) up to σ = 2, and even σ = 3 for the highest HQC security level (see Figure 6). In practice, this attack has a success rate of 100%.
• We perform the same analysis against a version of HQC protected with codeword masking. Specifically, the robustness of the RS encoder against SASCA is studied. It emerges that the encoder attack is more sensitive to noise, which can potentially be explained by the sparse relations between intermediate values, as well as by cycles in the encoder graph. Simulation results are depicted in Figure 8, with good accuracies up to σ = 1. In practice, our attack reaches success rates of 76.25%, 65.75% or 80.75% depending on the selected security level. We emphasize that these success rates are enough to threaten the security of the scheme; codeword masking is not an efficient countermeasure to protect HQC against SASCA on an STM32F407.
• We analyze the security of several RS decoder shuffling countermeasures against our attacks. We demonstrate the insufficient protection brought by shuffling countermeasures adapted from the Kyber-related literature. Namely, we reach perfect accuracy in a real case attack scenario. We present the full shuffling strategy, which adds satisfactory combinatorial complexity to the attacks proposed in this paper. We believe that the RS decoder full shuffling strategy is an interesting countermeasure that could possibly thwart other attacks.
• Finally, by exploiting the Fujisaki-Okamoto (FO) transform, an attacker can combine encoder and decoder leakages by merging both factor graphs, for successful shared key recovery on devices with higher noise levels (see Figure 9). Once again, in a practical scenario, our attack has a success rate of 100%. The combined attack exploiting the leakage redundancy from the re-encryption is a potential threat for any FO-like scheme. We show that changing the HQC encoding strategy allows protecting both the encoder and the decoder with full shuffling.
The analysis of HQC's internal Reed-Solomon code through the lens of a side-channel attacker leads to several intuitions about further work. Firstly, as all our attacks exploit the Galois field multiplication, we believe that protecting the latter operation is a promising path towards efficient countermeasures. An option could be to implement a gadget [BBE + 18] for gf_mul, ensuring security for RS operations under a given attacker model. Secondly, the full shuffling algorithm must be carefully selected, especially the random generator, to prevent permutation recovery attacks. Finally, the resilience of other PQC schemes built with the FO transform needs to be evaluated against SASCA approaches analogous to the decapsulation attack presented in this paper. Attacks combining the redundancy of leakages created by the re-encryption could be a threat for FO schemes.
Hamming Quasi-Cyclic (HQC) [AMAB + 17] is a code-based Key Encapsulation Mechanism (KEM) involved in the American National Institute of Standards and Technology (NIST) process for Post-Quantum Cryptography (PQC) standardization [CCJ + 16]. After three preliminary rounds and the standardization of lattice-based cryptography, HQC, along with BIKE [ABB + 17] and ClassicMcEliece [BCL + ], is now a candidate of the fourth and last round [AAC + 22].

Algorithm 5
Compute Syndromes from HQC RS Decoder from [AMAB + ]
Require: parameters: k, n the dimension and length of the code
Require: parity check matrix $H \in \mathbb{F}_q^{(n-k,\,n)}$
Require: codeword $c \in \mathbb{F}_q^n$
Ensure: $s := H^T \cdot c$ the syndrome of c
1: Initialize s to $c[1]^{n-k}$
2: for i from 1 to n − k do
3:

Figure 3: Coefficients of determination computed for both inputs and the output of the Galois field multiplication.

Figure 5: Galois field multiplication sub-graph. Factors are denoted with a square and variables with a circle.

Figure 6: Simulated success rate of SASCA on the decoder, with re-decoding strategy, depending on the selected security level of HQC.

Instead of decoding c + e into m, we start by randomly sampling a message mask m′. This message mask is encoded into c′, the codeword mask. Then the decoder algorithm is applied on c + c′ + e, masking the sensitive data c and returning m + m′ due to the linearity of the involved code. The true result m is recovered by subtracting the message mask m′. Since the encoder is a fast operation compared to the decoder, codeword masking allows reducing the overhead of the countermeasure.

Figure 9: Success rate of SASCA on the decapsulation (decoder + encoder combined), with re-decoding strategy, depending on the selected security level of HQC.

Table 2: Hamming weight and value template accuracies on gf_mul and success rates of attacks on STM32F407. Each attack has been performed 400 times. Templates were trained with 300000 training traces with a 10%/90% validation/training segmentation.