Quasi-linear masking against SCA and FIA, with cost amortization

The implementation of cryptographic algorithms must be protected against physical attacks. Side-channel and fault injection analyses are two prominent such implementation-level attacks. Protections against either do exist. Against side-channel attacks, they are characterized by SNI security orders: the higher the order, the more difficult the attack. In this paper, we leverage the fast discrete Fourier transform to reduce the complexity of high-order masking. The security paradigm is that of code-based masking. Coding theory is amenable both to masking material at a prescribed order, by mixing the information, and to detecting and/or correcting errors purposely injected by an attacker. For the first time, we show that quasi-linear masking (pioneered by Goudarzi, Joux and Rivain at ASIACRYPT 2018) can be achieved alongside cost amortization. This technique consists in masking several symbols/bytes with the same masking material, thereby improving the efficiency of the masking. We provide a security proof, leveraging both coding and probing security arguments. Regarding fault detection, our masking is capable of detecting up to d faults, where 2d + 1 is the length of the code, at any place of the algorithm, including within gadgets. In addition to the theory, which makes use of the Frobenius Additive Fast Fourier Transform, we show performance results, in a C language implementation, which confirm in practice that the complexity is quasi-linear in the code length.


Introduction
In this article we are interested in the security of block ciphers, such as the AES. Such algorithms encrypt and decrypt data using a key, which must remain secret. Nonetheless, the implementation of cryptographic algorithms is subject to several attacks, amongst which side-channel and fault injection attacks are especially powerful. Side-channel attacks consist in correlating guessed (key-dependent) variables with some information leakage, whereas fault injection attacks consist in correlating sensitive variables with the fault outcomes. Both attacks try exhaustively all values of a subkey, and carry out a sufficient number of attacks so as to rebuild the complete key with a divide-and-conquer approach.
It is therefore paramount to protect implementations against those attacks. The protection against side-channel analysis is often based on "masking": it consists in computing with randomized intermediate variables in order to provably deter attempts by an attacker to correlate with the randomized leakage. The protection against fault injection can typically rely on provable mathematical techniques, such as error detection codes.
Recently, the "code-based masking" (CBM) paradigm has been introduced: it leverages codes to achieve protection against the two threats at the same time.A pair of linear complementary codes allows to linearly combine sensitive information with digital random numbers in such a way the randomness has maximal decorrelation power whilst ensuring the demasking remains possible at all times.The ability to handle faults is based on redundancy kept by codes, ensuring their length is large enough to enable a detection or correction capability meeting the requirements in terms of fault injection attacks coverage.

Background on masking
Masking, from a historical perspective. A consensual protection against side-channel analyses consists in randomizing the data representation and the computations. This method is commonly referred to as masking. Several masking schemes have already been proposed.
Let us recap briefly the different milestones this technique has passed over the years. First of all, a proof-of-concept leveraging data randomization was introduced by the seminal work of Kocher et al. [KJJ99]. Some early implementations were proposed, and it soon became clear that high-order attacks could defeat low-order masking schemes. Hence the research for provable protections against higher-order attacks. Formal definitions have been put forward by Blömer et al. in [BGK04]. A constructive scheme has been proposed by Ishai et al. [ISW03] on bits. This scheme has subsequently been extended to words (e.g., bytes) by Rivain and Prouff [RP10]. Some tools to perform automatic proofs for such schemes have been developed, for instance by Barthe et al.
Minimizing the number of multiplications. The bottleneck in terms of performance is the number of nonlinear multiplications (that is, multiplications of x by an element different from a linear combination of powers of x whose exponents are of the form 2^j), since additions and linear multiplications pose no problem. All S-boxes over finite fields being polynomial, the global complexity of masking directly depends on the number of nonlinear multiplications in the unprotected algorithm.
Then, a great deal of research has been devoted to reducing the number of multiplications in cryptographic operations, as for instance [CPRR15]. Before 2020, it seemed difficult to mask one element of the field F_q in a way ensuring d-th-order probing security with a better complexity than O(d^2) multiplications over F_q. Recently, leveraging Karatsuba multiplication, Maxime Plançon [Pla22] introduced the RTIK masking scheme. This masking style manages to reduce the complexity down to O(d^{log_2(3)}), i.e., O(d^{1.59}), but for limited values of d only (namely d being an extension degree of the field where computations take place, when this field happens to be an extension).
Cost amortization and fault detection capability. In order to get the most from masking schemes, from a performance standpoint, some attempts have been made. One direction has been the simultaneous masking of several bytes, referred to as "cost amortization", as demonstrated constructively by Wang et al. [WMCS20]. Earlier, the same idea had been applied in the field of multiparty computation, under the name of "packed secret sharing" [DIK10]. It requires distinguishing between the number of shares (n) and the masking order (d). Moreover, our masking is compatible with a built-in fault detection capability, tightly intertwined with the CBM design.
Quasi-linear masking complexity. Another direction for reducing the cost due to multiplications is to reduce the cost of each multiplication by leveraging spectral representations, such as the Number Theoretic Transform (NTT), as put forward first by Goudarzi, Joux and Rivain (GJR [GJR18]). Quasi-linear masking enables significant performance improvements on masking schemes, which considerably eases their adoption by the industry. Unfortunately, the NTT works only for prime fields of odd characteristic and large order, which is not convenient in practice. Recently, the authors of [GPRV21] extended the GJR scheme of [GJR18] to even characteristic by replacing the NTT with a Discrete Fourier Transform (abridged "DFT" in the sequel), namely the additive fast Fourier transform of Gao and Mateer [GM10]. (Notice that this DFT is "general" in that it operates on finite fields.) The novel masking scheme is dubbed "GJR+".
The initial proposal of [GJR18] (GJR) and the modification of [GPRV21] (GJR+) considerably improved upon the state of the art, since they reduced the complexity of multiplications from quadratic (O(d^2)) to quasi-linear (O(d log d)). This improvement is significant because the multiplication is the bottleneck in terms of computational complexity.
But the "DFT" in general (and NTT in particular) have a drawback: the linear operations (in the field) are no longer transparent.Instead of having a complexity O(n) (linear in the number n of shares), because each share is applied the linear transformation on itself, individually, an operation of quasi-linear complexity shall be applied.Still, the overall complexity remains quasi-linear.
Code-Based Masking (CBM). Besides, CBM has been introduced as a new paradigm to capture the security properties of masking. It describes the masking scheme as the (vector space) sum of encoded information taken from a code C and an encoded mask taken from a code D that is "disjoint" from C. The main advantage of CBM is that the security order is simple to determine: namely, the masking order is equal to the dual distance of the masking code minus one [PGS+17]. Computing in CBM, including multiplications, has been put forward in [WMCS20]. Advantageously, CBM has been proven in the same article relevant to describe the capability to detect faults on top of a masking scheme: indeed, when the two vector spaces C and D are in direct sum but such that dim(C) + dim(D) < n, where n is the length of C (or D), the information can be encoded in a redundant manner, enabling detection or even correction. Notice that the CBM class includes Boolean masking and inner product masking as special cases.

Analysis of the state of the art
We begin in this subsection with a comparison with the state of the art of protections against combined side-channel and fault injection attacks. The efficiency of the side-channel protection is captured by the masking complexity and the ability to mask several symbols at the same time (denoted "cost am." for "cost amortization"). Only our proposal enjoys this cost amortization capability. The efficiency of the protection against fault injection is qualified according to: 1. whether the detection is end-to-end throughout the algorithm; 2. whether the detection needs to be performed at pre-defined checkpoints set at design time, or whether no detection is required (e.g., when faults are infective, thereby preventing an attacker from exploiting them). Notice that checkpoints may be placed at strategic waypoints during the execution of the algorithm, or only at the end, prior to disclosing the demasked result.
Most known masking countermeasures apply either to binary fields or to prime fields, whereas our masking can handle both binary and prime fields (and actually any finite field in general).
The comparison is given in Tab. 1. Regarding the applicable field, the different fields are denoted by F_2 vs F_q, where q stands for any prime power. Masking schemes compatible with F_q are thus more versatile.
We analyze now the drawbacks of existing quasi-linear masking, in particular [GPRV21].
No cost amortization nor fault detection capability. Despite the performance advantages of quasi-linear masking ([GJR18] and [GPRV21] as well), the technique described in these papers does not unleash the full potential in terms of masking efficiency and fault attack protection. Regarding efficiency, none of these papers addresses how to encode multiple bytes of information in one go. Besides, these papers do not show how to correct errors (it would require encoding redundant information, as for instance put forward in [CCG+20]).
Non-practical masking order. It is hinted in [GPRV21] that their quasi-linear masking "improves the efficiency of the masked cipher for a masking order n ≥ 64 for the MiMC block cipher and n ≥ 512 for the AES". These masking orders are non-practical. Indeed, in real life, the masking order is rather low, such as 1, 2 or at most 3.
Complex implementation. The technique of [GPRV21] involves a randomized Fourier transform. Namely, the primitive root of unity which defines the Fourier transform must be chosen at random (see page 602). This is an obvious limitation in terms of efficiency: the DFT operations must be pre-computed prior to any masked cryptographic operation (whereas our scheme does not require any pre-computation).
Abstract specification. In [GPRV21], the DFT is not instantiated, which limits the ability to compare with other schemes, apples to apples, in terms of actual performance (actually, [GPRV21] only provides data complexities). As a side-effect, this negatively impacts the clarity of the security proof (which requires cumbersome hypotheses, such as leaving the DFT out of the scope of the security analysis).

Our contributions
In this paper, we introduce a practical masking scheme, with quasi-linear complexity, and fault detection/correction.
Proofs of security against SCA and FIA based on code properties. Our masking algorithm is described as a CBM. Therefore, not only is the side-channel security order related to a dual distance, but the capability to detect and correct faults is also related to the minimum distance of the codes. Namely, we show that our scheme features a side-channel security order of d + 1 − t, detects d faults and corrects (d − 1)/2 faults, where 2d + 1 is the encoding length and t is the information size (t ≥ 1, and t > 1 when cost amortization is enforced).
Cost amortization. Our masking algorithm makes it possible to mask several bytes jointly, based on a proof leveraging coding theory (within the CBM paradigm). Former works involving quasi-linear masking are only concerned with masking individual bytes. Notice that cost amortization also has an advantage in terms of the efficiency of the fault detection capability.
Practical and efficient DFT. We thoroughly studied several DFT algorithms, and deploy an efficient one. It offers improved efficiency owing to optimization from a numeric standpoint. Namely, it relies on a sparse representation with small and simple coefficients (e.g., most often, "1"s). This DFT can be leveraged at the same time for the computation of the masking and for error detection.
Implementation and performance validation. We show that our quasi-linear masking is easily implementable. Namely, we provide a performance characterization in C language.
In particular, it supports the effectiveness of cost amortization. Our benchmarks are on the block cipher AES, but our masking can apply as well to lattice-based post-quantum cryptographic algorithms (such as CRYSTALS-Kyber and CRYSTALS-Dilithium, as explained in Sec. 8). We compare our performance results to others, but few papers on masking actually report results with enough precision to allow a comparison.

Outline
Preliminary notions are given in Sec. 2. They focus on DFT computation, as it is the most complex operation in our masking. We propose in Sec. 3 to consider an original DFT method proposed by Wang and Zhu in [WZ88], which is particularly adapted to both software and hardware implementations. Indeed, the Gao and Cantor methods that we mentioned could give similar theoretical complexity, but would require a huge implementation effort in practice. We show in Sec. 4 how to extend this masking to the case of simultaneous protection of several symbols. We propose in a second phase, in Sec. 5, to detect or correct errors and erasures of any codeword present anywhere in the ciphering algorithm, including within gadgets. The security rationale is detailed in Sec. 6, where we provide formal proofs in the CBM and SNI models. The implementation in C language is given in Sec. 7, along with performance results. Some discussions are available in Sec. 8. Conclusions and perspectives are in Sec. 9.
Examples of quasi-linear DFT constructions adapted to handling bytes are given in App. A. We show the efficiency of this method on all platforms; our method definitively complies with hardware and software implementations and has a very low complexity. Namely, in App. A.1 (resp. App. A.2), we investigate the case of d = 2 (resp. d = 7). Those two values represent regular and substantial/high security levels.

Finite fields
In this article, we are interested in data represented as elements of finite fields. We denote by F_q the field with q elements. We recall that when q is a power of two, F_q is said to be of characteristic two; in this case, subtraction and addition are the same operation, simply denoted by "+". A finite field of characteristic two can be seen as a polynomial extension of degree ℓ of F_2, where q = 2^ℓ. In this case, the addition boils down to the ℓ-bit parallel XOR operation. In this article, we illustrate our results on F_256 (i.e., ℓ = 8), which is the natural field within AES. Let ν be a primitive element of F_q, that is, a generator of the multiplicative group F_q^*. Let n be a positive integer. We assume that n divides q − 1; then the field element ω = ν^{(q−1)/n} is a primitive n-th root of unity (i.e., ω^n = 1). By construction, n is odd when q is a power of two. We denote n = 2d + 1.

Reed-Solomon codes
We denote by F_q^n the vector space of n field elements. A vector subspace of F_q^n is also called a linear code of length n. The Reed-Solomon code of length n, dimension k and minimal distance n − k + 1 is an evaluation code for which a generator matrix can be defined as the evaluation of the polynomial basis 1, X, X^2, ..., X^{k−1} over the set 1, ω, ω^2, ..., ω^{n−1}. We denote this code by RS[n, k, n − k + 1], or RS[n, k] for short. The dual C^⊥ of a linear code C is the linear code equal to the kernel of the generator matrix of C. It is well-known that the dual code of RS[n, k] is the RS[n, n − k] code. As a consequence, we know that the matrix (ω^{ij})_{0≤i≤n−1, 0≤j≤n−1}, known as the Vandermonde matrix defined over 1, ω, ω^2, ..., ω^{n−1}, is a generator matrix of the RS[n, n, 1] code. We have also that the inverse of the Vandermonde matrix corresponds to the generator matrix of the RS[n, n, 1] code defined over 1, ω^{n−1}, ω^{n−2}, ..., ω^1.

Multiplication of polynomials and DFTs in finite fields
We are interested in the multiplication of two polynomials P and Q over F_q of degree less than or equal to d. The result P Q is a polynomial of degree less than or equal to 2d.
The naive computation has complexity O(d^2). However, a less complex method can be implemented.
Every polynomial is evaluated over {1, ω, ..., ω^{n−1}}. The evaluation of P Q is the pairwise product of the evaluations of P and Q. Thus, P Q is given by the interpolation of this evaluation table. Now, it is well-known that the evaluation of a polynomial is precisely its Discrete Fourier Transform (DFT). Reciprocally, the interpolation of a polynomial is given by the inverse DFT (IDFT) [Knu11, Vol 2]. Notice that the definition of the DFT (and of the IDFT) is relative to the value of ω. Whenever there can be ambiguity, we shall write DFT_ω (resp. IDFT_ω) instead of DFT (resp. IDFT).
Besides, the evaluation of a polynomial P on its support is equivalent to multiplying the row (p_0, p_1, ..., p_d) made up of the coefficients of P = Σ_{i=0}^{d} p_i X^i by the Vandermonde matrix. Reciprocally, the interpolation of a polynomial P is given by the multiplication of the row (P(1), P(ω), ..., P(ω^{n−1})) by the inverse of the Vandermonde matrix.
According to [Gao03], these operations (DFT and IDFT) can be computed using O(n log(n) log log(n)) operations in F_q. The details of these algorithms can be found in Chapters 8-11 of [vzGG13].
Multiplicative DFT (see [Gao03]). The usual DFT requires that its support (n points, named a_i) form a multiplicative group of order n; concretely, the polynomial X^n + 1 has n distinct roots in the underlying field. In this case we say that the field supports the DFT, and we call such a DFT multiplicative. A multiplicative DFT has time complexity O(n log(n)) and can be implemented in parallel time O(log(n)), where the implicit constants are small. For such abovementioned fields, we can take n + 1 to be a power of 2 with n | (q − 1), and a_1, ..., a_n to be all the roots of X^n + 1. Then a DFT and its inverse at these points can be computed using O(n log(n)) operations in F_q. By using DFTs, polynomial multiplication and division can also be computed using O(n log(n)) operations. The implicit constants in all these running times are very small, so these algorithms are practical for n ≥ 256.
Additive DFT (see [GM10]). Unfortunately, multiplicative DFTs are not supported by many finite fields, especially fields of characteristic two, which are preferred in practical implementations. Cantor [Can89] found a way to use the additive structure of the underlying field to perform a DFT over a finite field of order p^ℓ where ℓ is a power of p. This method has been generalized by von zur Gathen and Gerhard [vzGG96] to arbitrary ℓ. Their additive DFT (for p = 2) uses O(n log^2 n) additions and O(n log^2 n) multiplications in F_q. For fields of characteristic two and for n = 2^ℓ, Gao and Mateer [GM10] recently improved on Cantor's method. When ℓ is a power of 2, the above time complexity can be improved to O(n log(n) log log(n)). For arbitrary ℓ, there is an additive DFT using O(n log^2(n)) additions and O(n log(n)) multiplications in F_q. These DFTs are highly parallel and can be implemented in parallel time O(log^2(n)).

Quasi-linear DFT in practice
All DFT methods presented and discussed in the previous Section 2.3 can be implemented in a pragmatic manner. Namely, first, a polynomial decomposition binary tree is computed off-line, once and for all. Second, for each invocation of the DFT or IDFT, a butterfly algorithm is executed on the pre-computed tree.
Preparation of a polynomial decomposition tree. We leverage the method put forward by Wang and Zhu in [WZ88]. Their idea consists in remarking that P(ν^i) = P(X) mod (X + ν^i); then it is shown that the polynomial X^{n+1} + X can be decomposed, as discussed below.
Let us design a binary tree of polynomials q_{i,j}, where i is the depth and j is an index for the breadth. Let n be the size of the DFT; then 0 ≤ i ≤ ⌈log_2(n)⌉, and 0 ≤ j < 2^{⌈log_2(n)⌉−i}. The tree is defined recursively as follows:
• The root is denoted by q_{⌈log_2(n)⌉,0} = X^{n+1} + X;
• intermediate nodes are denoted by q_{i,j} and defined as q_{i,j} = ∏_{k=0}^{1} q_{i−1,2j+k}, with degree(q_{i,j}) = 2^i;
• eventually, the leaves are q_{0,j} = X − β_j, where the β_j are elements of F_q.
By convention, the first leaf is q_{0,0} = X. In fact, the intermediate divisors are completely determined once the ordering of the leaves q_{0,j} is fixed.
Example 1. We illustrate in this example such a binary tree, obtained from the Frobenius Additive Fast Fourier Transform (FAFFT) put forward in [LCK+18]. We recall that X^4 + X = X(X + 1)(X^2 + X + 1). The polynomial X^2 + X + 1 is the minimal polynomial whose zero is ω (recall that ω is defined throughout the article as a primitive root of unity). Then we have the following binary tree. With the construction of [WZ88], it is possible to show that all q_{i,j} are either linearized or affine polynomials [MS77] (that is: q_{i,j}(X_1 + X_2) + q_{i,j}(0) = q_{i,j}(X_1) + q_{i,j}(X_2)). Consequently, the polynomials q_{i,j} are sparse, with at most i + 1 coefficients.
Computation of an efficient DFT.Based on such a pre-computed binary tree, we can now introduce an algorithm to efficiently compute the DFT.It is given in Alg. 1.
Algorithm 1: Quasi-linear (i.e., fast) Discrete Fourier Transform. Data: the pre-computed binary tree q_{i,j}. Input: a = (a_0, a_1, ..., a_{n−1}). Output: DFT_ω(a). The last step in Alg. 1 (for i = 0) consists in a reduction modulo the polynomials q_{0,j}, which have degree 1. Thus, the modulo operations yield values in F_q.
Quasi-Linear Masking without Cost Amortization

In this section, we introduce our high-order CBM algorithm, without cost amortization. That is, we consider only the masking of t = 1 element (byte). The purpose of this particular case is to explain simply the DFT-based masking with fault detection capability.

Masking construction
We define now the Reed-Solomon code RS_q[n, n, 1] whose generator matrix is given by the Vandermonde matrix M ∈ F_q^{n×n}, where M_{i,j} = ω^{ij}. Let x ∈ F_q be a sensitive variable. To mask it, we pick randomly r_0, ..., r_{d−1} in F_q and encode the vector a = (x, r_0, ..., r_{d−1}, 0, ..., 0) ∈ F_q^n with the Vandermonde matrix. We define mask(x) = a × M = DFT_ω(a). Unmasking corresponds to the computation of the inverse DFT. Namely, let us denote z = mask(x) (i.e., z_j = Σ_{i=0}^{d} a_i ω^{ij}). We have a = IDFT(z), and the sensitive data is recovered as x = a_0.

Masking addition and scaling
Let us denote z = mask(x) and z′ = mask(x′). The following properties are satisfied: z + z′ = mask(x + x′) (with masking randomness r + r′), and, for any constant c ∈ F_q, c · z = mask(c · x) (with masking randomness c · r).

Masking the multiplication
The multiplication is not a linear operation, so the question is how to compute mask(x x′) without unmasking x or x′. We denote y = z * z′ := (z_j z′_j)_{j∈{0,...,2d}}, where "*" is the term-to-term product between two vectors. Let Z(X) and Z′(X) be the degree-d polynomials whose coefficient vectors are a and a′, so that z_j = Z(ω^j) and z′_j = Z′(ω^j) for j ∈ {0, ..., 2d}. The multiplication of Z(X) by Z′(X), both of degree d, gives a polynomial Y(X) = x x′ + Σ_{i=1}^{2d} r_i X^i of degree 2d, whose evaluations are precisely the y_j. Thus, to get mask(x x′), we need to eliminate the coefficients r_i for i ∈ {d + 1, ..., 2d}.

Algorithm for the masked multiplication
This computation is summarized in Alg. 3.
A tedious calculation of the complexity of this algorithm, in terms of the number of multiplications in F_q, is given in Tab. 2.

Algorithm 3: oneElementMultiplication
Its last step (calling the routine of Alg. 2) returns y + DFT(0, ..., 0, r′). In conclusion, the complexity of the addition is linear, and that of the multiplication is quasi-linear. Besides, masking and demasking each cost n log(n) multiplications [TL20] over F_q, hence are quasi-linear as well. As a conclusion, all operations can be computed in quasi-linear complexity.

Quasi-linear Masking with Cost Amortization
Let us now extend our quasi-linear masking to several information elements (e.g., bytes) processed simultaneously. This allows exploring a tradeoff between the side-channel order (namely d + 1 − t) and the amount of information processed simultaneously (namely t).

Encoding procedure
First, we pick randomly r = (r_{t+1}, r_{t+2}, ..., r_{d+1}) in F_q^{d+1−t}. By Lagrange interpolation, there exists a vector a = (a_0, a_1, ..., a_d) and the associated polynomial P(X) = Σ_{i=0}^{d} a_i X^i which evaluates to the information symbols x on the points u_0, ..., u_{t−1} and to the random symbols r on the remaining points u_t, ..., u_d. Let us define the matrix A ∈ F_q^{(d+1)×(d+1)}, where A_{i,j} = u_j^i for any i, j ∈ {0, ..., d}. This matrix is a Vandermonde matrix, which is invertible since u_i ≠ u_j for i ≠ j. Then we have a × A = (x | r), that is, a = (x | r) × A^{−1}. The second step of the encoding consists in computing DFT_ω(a_0, ..., a_d, 0, ..., 0). Thus: mask(x) = DFT_ω((x | r) × (A^{−1} | 0)). In this equation, (x | r) is the row obtained by the concatenation of the row vectors x and r, and A^{−1} | 0 is the concatenation of the matrix A^{−1} with a zero block (so that the product has length n). This is an O((d + 1)^2) complexity encoding procedure, but we can do better with the following one. We can construct P(X) = P′(X) + P″(X) by first picking at random the polynomial P″(X), of degree d, whose t lowest-degree coefficients are zero, and then constructing P′(X), of degree less than t, which solves the linear system P′(u_i) = x_i + P″(u_i) for 0 ≤ i < t (recall that + and − coincide in characteristic two), where:
• A′ ∈ F_q^{t×t}, with A′_{i,j} = A_{i,j} for any 0 ≤ i, j < t;
• A″ ∈ F_q^{(d+1−t)×t}, with A″_{i,j} = A_{i+t,j} for any 0 ≤ i < d + 1 − t and 0 ≤ j < t;
• a″ ∈ F_q^{d+1−t} is a random vector (the nonzero coefficients of P″).
Thus, the calculation of a = (a′ | a″) = ((x + a″ × A″) × A′^{−1} | a″) costs t(d + 1) multiplications over F_q (we note that A″ and A′^{−1} may advantageously be pre-computed). Again, the second step of the encoding consists in computing DFT_ω(a, 0, ..., 0). The overall masking procedure is given in Alg. 4. The decoding procedure follows the same lines: we use the inverse discrete Fourier transform to get a; then we have x = a′ × A′ + a″ × A″, which has the same complexity as the masking operation.
The polynomial C(X) obtained by performing DFT_ω^{−1}(DFT_ω(a) * DFT_ω(a′)) has degree 2d. Now we have to propose a method that associates to C(X) a polynomial D(X) of degree d. This polynomial must satisfy the same properties, namely D(u_i) = C(u_i) = x_i x′_i for all information positions i. The authors of [GJR18] proposed such a construction for t = 1; obviously, in this case, D(u_0) = C(u_0) = x_1 x′_1. We propose to generalize this construction.

Matrix product masking
It is necessary to also define the matrix product operation, as this type of operation is essential to calculate MixColumns or ShiftRows, for example, with t ∈ {4, 8, 16}. Let us denote by L ∈ K^{t×t} a public matrix; we need to construct an algorithm MatrixProduct such that MatrixProduct(mask(x), L) = mask(x × L). Let us recall that the masking operation is a combination of two FFTs, and can thus itself be represented as a matrix product.

Exponentiation algorithm
Let e be a power of 2; we denote x^e = (x_1^e, ..., x_{t+1}^e) ∈ F_q^{t+1}. In order to calculate the SubBytes transformation efficiently, we need to calculate mask(x^e) (see for instance [RP10, Alg. 3]). We have mask(x^e) = mask(x)^e • ((N^e)^{−1} × N). In this case, the order of the operations is very important. As a matter of fact, computing mask(x)^e • (N^e)^{−1} first, as the formula might suggest, can divulge the sensitive data. This is why it is mandatory to pre-compute ((N^e)^{−1} × N) first (once and for all), and only then calculate mask(x)^e • ((N^e)^{−1} × N).

Error correcting code interpretation
We note that, by construction, there exists an invertible matrix R such that the first encoding step is a multiplication by R. The subsequent DFT computation corresponds to the encoding in the Reed-Solomon code defined by the evaluation of 1, X, ..., X^d over 1, ω, ω^2, ..., ω^{2d}, represented by a matrix V. Hence, we get that mask(y) = y R V. We deduce that our masking algorithm corresponds to an encoding procedure with a generalized Reed-Solomon code of minimal distance d + 1, dimension d + 1 and length 2d + 1.

Error detection method
We have seen previously that our masking technique corresponds to an encoding in a Reed-Solomon code of parameters [n = 2d + 1, k = d + 1, d + 1]_q. We propose in this section to describe a known method based on syndrome decoding [Pet60, Mas69, Jr.65, BHP98] that does not leak sensitive information.
Our information of t words is embedded inside d + 1 words, which are then encoded in the Reed-Solomon code of length 2d + 1. Next, we assume that a reasonable number of faults is injected on this codeword c, which is in correspondence with a polynomial of degree less than k. This corresponds to the classic problem of error correction over a noisy channel. The error can be interpreted as a vector e = (e_0, e_1, ..., e_{n−1}) = DFT_ω^{−1}(e(X)), where e(X) is a polynomial of degree at most n − 1 over F_q. We denote by ℓ the number of non-zero coefficients (positions) of the error. Hence, we study the vector y = c + e = (y_j)_{j∈⟦0,n−1⟧}.
To detect or correct the errors, we calculate a syndrome from y, which only depends on the error word e and not on the codeword c. We recall that the dual code of RS[n, k] is the RS[n, n − k] code. A basis of this code is given by the monomials 1, X, ..., X^{n−k−1}, evaluated over the set 1, ω, ..., ω^{n−1}.
To correct these faults, we need to construct the error locator polynomial. We introduce the vector λ = (λ_j)_{j∈⟦0,n−1⟧} such that λ_j = 0 whenever the corresponding coefficient e_j of e is non-zero, and λ_j ≠ 0 whenever e_j = 0. In this way, we have λ_j · e_j = 0 for all j ∈ {0, ..., n − 1}. If we denote Λ(X) = DFT_ω(λ) and E(X) = DFT_ω(e), then, due to the well-known convolution theorem of the DFT, the pointwise products λ_j e_j = 0 translate into Λ(X) · E(X) ≡ 0 mod (X^n + 1) (5). The roots ω^{−j_1}, ..., ω^{−j_ℓ} of the polynomial Λ(X) correspond to the locations j_1, ..., j_ℓ of the erroneous positions in y. Therefore Λ(X) = Λ_0 + Λ_1 X + ··· + Λ_ℓ X^ℓ is called the "error locator polynomial". Without loss of generality, Λ(X) can be normalized by setting Λ_0 = 1. Equation (5) gives rise to a linear system of n equations. Among these, n − k − ℓ equations only depend on the n − k coefficients of E(X) which coincide with the elements S_0, ..., S_{n−k−1} of the syndrome, and on the unknown coefficients of the error locator polynomial Λ(X). Hence, we extract a linear system of n − k − ℓ equations and unknowns (6). Obviously, a unique solution exists as long as ℓ ≤ (n − k)/2, which means that we can correct no more than (d − 1)/2 faults. To avoid a large complexity when solving the system of equations (6), we can exploit its specific form by using the well-known Berlekamp-Massey algorithm, which solves this system with a linear complexity.
At this point we have located the errors by constructing Λ(X). Reconstructing the error values can be done with the Forney algorithm. It consists in calculating the error evaluator polynomial Ω(X) = Sp(X)Λ(X) mod X^{2ℓ}, where Sp(X) is the partial syndrome polynomial. Finally, the error value at position i_j is given by evaluating the quantity Ω(X)/Λ′(X) at the root of Λ(X) associated with position i_j, where Λ′ is the formal derivative of Λ. These quantities can again be evaluated by using the DFT, hence correcting fault injections can be done with a linear complexity. Exemplary tradeoffs are given in Tab. 3.

Positive effect of cost amortization on fault detection capability
Let us fix a field F_q and a minimum distance d. Then, it is more efficient, from the code length point of view, to mask two (resp. 2k) symbols together than each one (resp. each k) independently. Formally, let BLLC be the BestLengthLinearCode function in Magma [Uni], which yields the minimal length of a code over F_q of a given dimension and minimum distance. We have that BLLC(F_q, 2k, d) < 2 · BLLC(F_q, k, d) as soon as d > 1. For instance, on F_256, RS codes are maximum distance separable (MDS) and thus attain the Singleton bound.
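Since MDS codes attain the Singleton bound, the minimal length of an MDS code of dimension k and minimum distance d is k + d − 1, and a short calculation (in our notation, under this MDS assumption) makes the saving explicit:

```latex
\mathrm{BLLC}(\mathbb{F}_{256}, 2k, d) \;=\; 2k + d - 1
  \;<\; 2k + 2(d - 1) \;=\; 2\,\mathrm{BLLC}(\mathbb{F}_{256}, k, d)
  \qquad (d > 1).
```

For example, with k = 2 and d = 3, one amortized code of length 6 replaces two separate codes of total length 8.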

Security proof
The security of our scheme depends on our encoding procedure, on our multiplication gadget, and on our capacity to detect fault injections during the computation steps of the encryption algorithm.

The encoding procedure
We recall that our encoding procedure for a vector x = (x_0, …, x_{t−1}) has been defined in Subsection 4.1. It consists in picking r = (r_{t+1}, r_{t+2}, …, r_{d+1}) uniformly at random in F_q^{d+1−t} and performing the operation: We also recall that the matrix A = (u_j^i)_{i,j ∈ ⟦0..d⟧}. A first approach consists in showing that our masking method corresponds to a special case of a DSM scheme; we then translate this operation into a generic encoder as defined in [WMCS20] (page 137, Definition 13). Applying DFT_ω corresponds to a multiplication by a Vandermonde matrix. This matrix happens to be the generator matrix of the Reed-Solomon code RS[n, n, 1] defined over F_q; a generator matrix of this code is given by the evaluation of the monomials (X^i)_{i ∈ {0,…,n−1}} over 1, ω, …, ω^{n−1}. The multiplication by A^{−1}|0 cancels the last rows of the generator matrix of this RS[n, n, 1] code, which becomes a Reed-Solomon code RS[n, d+1, n−d]. We denote by R a generator matrix of this code. Hence, we propose later in this section a method to detect errors without revealing sensitive information.
Let C_H be the code generated by H; then the following holds. Proof. We denote by K the matrix corresponding to the last d + 1 − t rows of A^{−1}. It is well known that the parity check matrix of R, which we denote T, generates a Reed-Solomon code RS[n, d, n−d+1], and we have R × ᵗT = 0. Hence, H × ᵗT = K × R × ᵗT = 0, and the subspace generated by the rows of T is included in the kernel of H.

Study of K:
We recall that K = (0, Id_{d+1−t})A^{−1}. First of all, A^{−1}, like any invertible square matrix, is equivalent (up to an invertible matrix) to a generator matrix of a Reed-Solomon code; hence K is a generator matrix of a subcode of an RS[d+1, d+1] code. We would now like to determine the dual code of K, and we observe the equation A^{−1} × A = Id_{d+1}. From it, we deduce that K_{(d+1−t)×(d+1)} × B_{(d+1)×t} = 0_{(d+1−t)×t}. By construction, ᵗ(B_{(d+1)×t}) = ᵗB is a generator matrix of the code generated by the polynomials 1, X, X², …, X^{t−1} evaluated over the set u_0, …, u_d: this is a Reed-Solomon code RS[d+1, t, d+2−t] of minimal distance d + 2 − t. We deduce that the encoder (x, r) → (x, r)A^{−1} is a generic encoder of probing order d + 1 − t.
We now want to describe the kernel of K × R, and we can repeat the same construction for R. If we denote by V_ω the Vandermonde matrix associated to DFT_ω and by R_i the corresponding (2d+1)×d block, then we can build a vector space included in the kernel of H = K × R from T, which is the generator matrix of an RS[2d+1, d] code, and D = ᵗB × ᵗR_i. We note that ᵗR_i = (ω^{(n−i)j})_{i ∈ ⟦0..d⟧, j ∈ ⟦0..2d⟧} is a generator matrix of a code generated by d + 1 polynomials of degree at least d + 1, and that ᵗB = (u_i^j)_{i ∈ ⟦0..t−1⟧, j ∈ ⟦0..d⟧}. Hence the code generated by D is an evaluation code generated by t independent polynomials of degree at least d + 1, whereas T is a generator matrix of a code generated by d polynomials of degree strictly less than d; these two codes are therefore linearly independent, and we deduce that we have built the kernel of H. It now remains to evaluate the minimal distance of this code (T ∪ D).
For i = 0 (i.e., t = 1), the vector D_0 corresponds to the evaluation of a fraction over {1, ω, …, ω^{2d}}, and we look for a degree-d polynomial P(X) that cancels the maximum number of positions of D_0, i.e., such that Q(X) = (X + u_0)P(X) + X^{d+1} + u_0^{d+1} admits the maximum number of zeros. We remark that degree(Q) ≤ d + 1, so the number of zeros is at most d + 1, which is equivalent to a minimal distance of at least 2d + 1 − (d + 1) = d. At the same time, the Singleton bound states that d_min(T ∪ D_0) ≤ 2d + 1 − (d + 1) + 1 = d + 1. We deduce that for D = D_0 the minimal distance lies between d and d + 1. We now want to evaluate the minimal distance of a codeword built from a linear combination of D_{0,j}, D_{1,j} and T. It means that, for a fixed element θ ∈ F_q, we look for a degree-d polynomial P(X) that cancels a maximum of inputs; this is equivalent to studying the number of zeros of a function T(X). The degree of T(X) is at most d + 2, hence T(X) has at most d + 2 roots, which is equivalent to a minimal distance of at least 2d + 1 − (d + 2) = d − 1, and we deduce the corresponding bound. By induction, for any t, the minimal distance lies between d + 1 − t and d + 2 − t, and the probing security order is between d − t and d + 1 − t; thus we have demonstrated Theorem 1.
In this Theorem 1, we prove the security of the multiplicative gadget, including the transformation of the shares into the spectral domain (back and forth). This was left out of the scope of the former work GJR+ [GPRV21]; we thus offer a comprehensive, end-to-end security proof of the whole computation. Notice that in the section entitled "Discussion on Hypothesis 1", page 620 of [GPRV21], the announced security orders (obtained by exhaustive search, for some exemplary small orders) are lower than our bound d + 2 − t. The reason is that the examples in [GPRV21] do not satisfy condition (1).
Proof. Without loss of generality, we can assume that i = 0. For t = 1 this is exactly the same proof as the previous one. For t > 1, we must evaluate the number of zeros of the function: As it is a polynomial of degree less than d + t − 1, the minimal distance is at least 2d + 1 − (d + t − 1) = d + 2 − t, and the Singleton bound states that it is at most d + 2 − t; thus it is equal.
This corollary shows that our masking scheme does reach the same masking order as [BEF + 23].
Example 2. If u 0 = 0, we get D 0,j = 1 and the property is satisfied.
In summary, we have proven in this section that the encoder given by a Vandermonde matrix is at least (d − t)-probing secure.

NI and SNI criteria
Definition 1 ([MZ22]). A function f is t-NI if, when given a total of s output and internal probes, s ≤ t implies a dependency on at most s input shares. A function f is t-SNI if s ≤ t implies a dependency on at most i input shares, where i is the number of internal probes.
Proof. We recall that, for complexity reasons, we have replaced the classical Vandermonde matrix by the DFT algorithm. Our chosen DFT has a particular structure: it is an iterative DFT and each step corresponds to a matrix multiplication, so that, overall, our DFT corresponds to a classical encoding by a (sparse) matrix. Therefore, Theorems 2 and 3 of [WMCS20] apply verbatim.
Remark 2. The refresh gadget of [GPRV21] is obviously compliant with our procedure and it is (d − t)-SNI.
To claim that the complete encoder with its associated gadgets is (d − t)-probing secure, we must prove the property for the multiplication gadget.

The multiplication gadget
The security of the masking representation is immediate owing to the number of shares. However, to be comprehensive, we now have to show that operations are also secure, namely that the masked multiplication procedure offers the same level of protection. Regarding the security of this gadget, we recall that the authors of [GPRV21] made a strong hypothesis, which we convert into a theorem: Theorem 2 (Hypothesis (FFT Probing Security)). The circuits processing DFT_ω and DFT_ω^{−1} are at least (d − t)-probing secure. Proof. In fact the application DFT_ω((x̄, r̄)A^{−1} ‖ 0) corresponds exactly to our masking operation mask(x̄) = (x̄, r̄) × A^{−1} × R, except that A is more general than simply a matrix of the form (α^{ij})_{i,j}. We deduce that t_DFT^n ≥ d − t in this case, since it corresponds to Theorem 1. Regarding DFT_ω^{−1}: ū → ⋯: in fact, ū = refresh(mask(x̄) ∗ mask(ȳ)), where ∗ represents here the term-by-term multiplication and not the mask multiplication. In our masking, by definition, we have ū = mask(0̄) + mask(x̄) ∗ mask(ȳ). mask(0̄) = r̄H, where r̄ is a random vector of dimension d + 1 − t; hence rebuilding r̄ requires at least d + 1 − t positions from the vector r̄H. By construction, DFT_ω^{−1}(mask(0̄)) = (a_0(r̄), a_1(r̄), …, a_d(r̄), 0, …, 0) = (0̄, r̄)A^{−1}. Then DFT_ω^{−1}(mask(x̄) ∗ mask(ȳ)) = (c_0, …, c_{2d}). We deduce that DFT_ω^{−1}(ū) = (c_0 + a_0(r̄), c_1 + a_1(r̄), …, c_d + a_d(r̄), c_{d+1}, …, c_{2d}). We prove below this proof that no sensitive information can be constructed from (c_{d+1}, …, c_{2d}). The coefficients a_i of the vector (c_0 + a_0(r̄), c_1 + a_1(r̄), …, c_d + a_d(r̄)) depend linearly on r̄. We have already proven that the encoder (x, r)A^{−1} is (d + 1 − t)-probing secure, thus getting information from (c_0 + a_0(r̄), …, c_d + a_d(r̄)) requires capturing at least d + 1 − t positions. We deduce the final result: the hypothesis is correct, with t_FFT^n = d − t. Then, due to the previously demonstrated hypothesis, we deduce the following lemma: Lemma 1 ([GPRV21]). The circuit processing (mask(x), mask(y)) → u = mask(x) ∗ mask(y) is (d − t)-probing secure.
We provide hereafter a proof by reduction of our Lemma 1 to the result formulated in [GPRV21, Lemma 1].
Proof. The authors of [GPRV21] have proven (page 619, Lemma 1) that the corresponding circuit processing (x, y) is probing secure (in fact, we have proven that the encoder x → (x, r) is probing secure in our context), and we have proven that mask(x) = DFT((x, r)A^{−1} ‖ 0) is at least (d − t)-probing secure. We can now apply the same proof, with t_FFT^n = d − t: either a probe gives some information about mask(x), or about mask(y). Finally, each position depends symmetrically on mask(x)[i] and mask(y)[i], which are independent and uniformly distributed; thus d − t probes or fewer cannot give information about x and y.
Remark 3 (Typographic mistake correction).The proof of lemma 1 in [GPRV21] contains twice the argument "w is added to W 1 ", whereas the second occurrence should read "w is added to W 2 ".
We recall that the inner product mask(x) ∗ mask(y) here is not the gadget multiplication mask(x) × mask(y). Unfortunately, we cannot claim that (mask(x), mask(y)) → mask(x) ∗ mask(y) is (d − t)-NI or SNI secure, because the function (x, y) → x · y does not satisfy the t-NI property, so we cannot use the composition theorem.
The mask multiplication (gadget) is obtained from the following computation, where only the variables c_{d+1}, …, c_{2d−t+1} are related to the sensitive information. Hence, the weakest side is obtained with the vector (c_{d+1}, …, c_{2d}). The question is then: can we get information from d − t positions of the vector (c_{d+1}, …, c_{2d})? Our claim is that our gadget is at least (d − t)-probing secure, so we must assume that, in the attack model, at most d − t values can be guessed from some measurements. From d − t pieces of knowledge about the vector (c_{d+1}, …, c_{2d}), x = unmask(z) and x′ = unmask(z′) cannot be reconstructed: if an attacker has access to the corresponding system of equations, we can evaluate the number of potential solutions for the (a_i). By induction, we get the same property at any step k ≤ d. Thus, in total, this system admits at least 2^{m(d−1)}(2^m − 1) solutions for d variables a_i. This result is obviously worse with fewer equations; thus this system of equations does not reveal information about d + 1 − t values of the (a_i).
We conclude that the gadget multiplication is (d − t)-probing secure.
Remark 4. Our encoding method seems to have similar properties to the one defined in [GPRV21]; it would thus be interesting to investigate whether the region probing security still holds here.

Fault detection/correction
Fault attacks are very efficient in general [JT12]. Some fault attacks, such as Statistical Ineffective Fault Attacks (SIFA [DEG+18], inheriting from the seminal work of [YJ00]), can be applied even when masking against side-channel analysis and fault detection mechanisms are in place.
First of all, we cannot claim that our method is fully resilient against fault attacks, because we did not study the impact of injecting a fault on the checker itself (the syndrome calculation); however, we show in this paper that we considerably harden the resilience against fault injection.
We considered two representative fault models, namely one where the attacker has no control over the fault (random model), and one where the attacker can inject targeted low-weight faults. We recall that, against uniformly random faults, the detection capability is characterized only by the minimal distance.
Furthermore, we assume that the attacker has the ability to inject a number of simultaneous faults which is less than the correction capacity of the considered code, especially the Reed-Solomon code involved in the gadget multiplication. We also consider that all codewords present in the implementation are corrected/checked. If not, we face an open problem, namely the impact of error propagation in the cipher algorithm design, which is out of the scope of this paper.
We recall that, by construction, each masked element belongs to the code RS[n, d+1, n−d]. Intentional or accidental errors can disturb the symmetric cipher implementation. If an error appears during the first rounds of the considered cipher, then its propagation will dramatically affect the rest of the calculation, making the final result wrong and non-correctable due to the excessive number of errors. It can then give information that may compromise the key. Such scenarios appear for example in case of radiation or of intentional fault attacks. We are also aware that such channel perturbations can lead to the presence of erasures, which means that information simply disappears. As we consider the problem of decoding Reed-Solomon codes, erasures can simply be treated as errors; hence, a decoding algorithm that works for Reed-Solomon codes can also correct erasures. Of course, it is essential that our countermeasure against FIA does not weaken the countermeasure against SCA; hence we show in the next subsections that our error detection based on syndrome decoding is secure and efficient.
We recall that we have (c_d, …, c_{2d}) = ExtractLastCoefficients(mask(x̄) ∗ mask(x̄′)), where the U_k and G_i are precomputed. Obviously, introducing errors in the gadget multiplication may be a problem for the following reason: mask(x̄) ∗ mask(x̄′) equals DFT_ω(C(X)) where C is a degree-2d polynomial, thus faults on the vector DFT_ω(C(X)) cannot be detected in the RS[2d+1, 2d+1] code. However, we remark that the d highest-degree coefficients of the polynomials involved in the DFT are null by construction. We deduce that injecting a fault inside these vectors can be detected simply by a syndrome calculation (IDFT). An error may be injected in the coefficients c_{d+1}, …, c_{2d}, but in this case the resulting vector mask(x̄ ∗ x̄′) does not belong to the RS[2d+1, d+1] code and the error will be detected. An attacker may inject errors simultaneously in both vectors, but in this case we are no longer in the random injection model and we face an open problem out of the scope of this paper.
Finally, this leads us to propose some improvements below.

Detecting faults in the gadget
We propose in this case to slightly modify the parameters of our encoder x̄ → (x̄, r̄)A^{−1}R, with x̄ ∈ F_q^t and r̄ ∈ F_q^{d+1−t}. We propose to consider some r̄ ∈ F_q^{d+1−t−h} with h < d + 1 − t. Hence the resulting polynomial has degree d − h instead of d. This modification implies that the vector mask(x̄) ∗ mask(x̄′) = DFT_ω(C(X)) can be checked: C(X) has degree at most 2d − 2h in this case and, consequently, the vector DFT_ω(C(X)) belongs to the RS[2d+1, 2d−2h+1, 1+2h] code of minimal distance 1 + 2h, thus 2h errors can be detected. We recall that error detection on a codeword can be done by computing its syndrome, which with our parameters corresponds to performing the IDFT algorithm: the computation of IDFT(mask(x̄) ∗ mask(x̄′)) states whether it corresponds to a polynomial of degree at most 2(d − h) or not.
Regarding the consequences for the SCA security, the probing order is clearly modified, because the dimension of r̄ is smaller than in the original encoder. By carefully analysing the proof of the probing order, we observe that this modification does not alter the proof; only the security order changes, passing from d − t to d − t − h. We can now summarize the detection steps inside the gadget multiplication in the following algorithm: Algorithm 6: severalByteProduct with detection.

About syndrome computation leakage
It is essential that our countermeasure against FIA does not weaken the countermeasure against SCA; we thus show in this section that syndrome decoding cannot leak information.
Namely, we consider the possibility of either detecting or even correcting errors and erasures anywhere in the calculation process where codewords are available. In general, decoding errors leads to unmasking the sensitive information, which is of course not desired between the first and last rounds of the algorithm we must protect. For example, the Sudan [GS99] and Berlekamp-Welch [RR86] algorithms return the sensitive information directly, while syndrome decoding does not.
Decoding generalized Reed-Solomon codes is classic, but we are particularly interested in syndrome decoding, which does not reveal any sensitive information. The algorithm [Sha07,McE77,KB10] that uses the Euclidean algorithm is a syndrome decoding algorithm. It consists in building the polynomials that correspond to the error evaluator and the error locator, as explained in Theorem 4.3 of [Sha07] and at the beginning of the current Section 5.2. Hence, this algorithm returns the vector corresponding to the error, which allows returning the corrected codeword belonging to the Reed-Solomon code. The sensitive information is never exposed during the decoding process, because the first step consists in cancelling the codeword coming from the encoded information in order to construct the error, as we show later in this section.
In the previous subsection regarding the encoding procedure, we have seen the operation that masking a vector x̄ consists in performing. Hence z = mask(x̄) is simply a codeword belonging to the RS[n, d+1, d+1] code. If we denote by V the parity check matrix of R, we have by construction R × V = 0 and, in particular, mask(x̄) × V = 0. Thus, by a simple syndrome calculation, if we suppose z was modified by a fault injection attack or a radiation, then we get z′ = z + e, and we have z′ × V = (z + e) × V = e × V, which depends only on the error. Obviously, the syndrome calculation does not bring any information: by definition, a codeword corresponds to information that has been masked, and we have assumed that the potential attacker has no more than d probes, so no linear transformation can provide any information on the sensitive data.
We note, however, that determining the efficiency of this method when faults take place in the decoding algorithm itself remains an open problem. But the method is efficient when the fault injections target the masked design of the cipher algorithm. Then, each variable being encoded by our generalized Reed-Solomon code, we may potentially check all variables (at a non-negligible cost, of course). The attacker may inject faults on the matrices G and H to disturb the multiplication; then either the number of constructed errors is too large and the algorithm cannot correct them, but simply detects them and raises an alert (to enable key zeroization, for instance), or the number of errors is reasonable and the error-correction algorithm can correct the disturbed multiplication.
Ultimately, it is up to the security policy to choose the best strategy between detecting (and launching a countermeasure) and correcting.

Comparison with [BEF + 23]
Recently, the authors of [BEF + 23] proposed a similar solution based on polynomial encoding. Their solution provides strong resilience against SCA and simultaneously protects against a large number of fault injections. We propose to compare the two solutions here. We note that our solution works for a fixed length n (number of shares), which is imposed by the possibility of implementing a DFT instead of multiplying by a Vandermonde matrix, whereas their solution has a free length (number of shares) depending on the number of detected errors e: either n = 2d + e + 1 in a first version (SotA) or n = d + e + 1 for the improved version (laOla). To ease the comparison, we first describe our performance with a Vandermonde matrix instead of a DFT, and finally with a trick used for laOla [BEF + 23].
Table 4 compiles performance figures and complexities of [BEF + 23] and of our work. This table shows that our scheme is faster, owing to the quasi-linear complexity of our multiplicative gadget. The difference in complexity also holds for the error detection (and correction) capability, namely quasi-linear in our case versus quadratic for [BEF + 23]. Moreover, our scheme supports cost amortization, which allows for further speed-up and huge memory savings: we can process t sensitive elements altogether, whereas [BEF + 23] requires repeating the computation t times.
The only advantage we see for [BEF + 23] scheme stems from its flexibility.The fault detection capability can be fine-tuned leveraging the parameter e.
Nonetheless, we attempted to compare our work with [BEF + 23] in the context of parametric fault detection capability. In this respect, we had to intentionally degrade our scheme by turning the (quasi-linear) DFT into a (quadratic) multiplication by a Vandermonde matrix. Indeed, the DFT is rigid (of fixed size) whereas matrix multiplication is naturally scalable. Despite this handicap, one can notice that our performances are similar (of the same quadratic complexity) to those of SotA. The error detection (or correction) capability is also the same in these conditions. Remarkably, our scheme with an "inefficient" Fourier transform still enjoys the advantage of allowing cost amortization.
We then observe that d errors can be detected on the vectors C_0 = DFT_ω(P_0(X)P_0(X)), C_1 = DFT_ω(X^{d/2}T_0(X)), C_2 = DFT_ω(X^d T_1(X)), and C_3 = DFT_ω(X^d P_1(X)P_1(X)), just by remarking that at least d identified coefficients must be zero for each corresponding polynomial, which enables error detection by syndrome computation. Finally, we underline that our cost amortization capability can be applied to each vector l_i, i ∈ {0, 1, 2, 3}, in order to get four degree-d polynomials D_0, D_1, D_2 and D_3 that satisfy D = D_0 + D_1 + D_2 + D_3. Hence we avoid the degree-2d polynomial C(X) and, consequently, d errors can be detected.
Interestingly, this trick is compliant with our scheme. Thus, our work is also empowered to detect d faults, anywhere in any gadget, where 2d + 1 is the length of the codes. This is reflected in the last-but-one column of Table 4. Our value of the security order benefits from Corollary 1 (i.e., it attains its maximum value d + 1 − t), thereby equating the probing security order of the [BEF + 23] schemes (SotA and laOla).

Software implementation
The implementation of a masked AES-128 allowed us to accurately measure the gains in time and memory space that can be obtained with parallel masking (that is, t > 1). Indeed, as we can see in Fig. 1, the computation time decreases linearly with the size t of the sensitive data, consistently across values of d (the masking order). We can also witness the quasi-linearity of the computation time (this quasi-affine function depends on the value of d), and the non-linearity (namely, the "quadricity") of RP masking [RP10]:
• the RP masking computation time curve (in log-log scale) grows by two decades when d grows by one decade,
• whereas for our scheme, the slope is less than two (and the values are also smaller).
The need for randomness is represented in Fig. 2, and same observations can be done.All values of d are represented for which there exists a DFT (namely d ∈ {1, 2, 7, 8, 25, 42}), under the condition d > t.
We had to represent speed and randomness for large values of d not because practical applications require very high masking orders, but to show the asymptotic complexity.
We used the C code from Jean-Sébastien Coron's GitHub project [Cor] to implement RP, but we replaced the optimized log-table-based multiplication by a constant-time one. Namely, the hardcoded tables sq, taffine, tsmult in the file "aes_rp.c" have been replaced by their algorithmic counterparts. The rationale is that masking is pointless if applied to a non-constant-time implementation, because timing leakage is exploitable at first order [BGV21]. Obviously, we have adopted the same constant-time implementation in our scheme, hence the comparison is fair: the same implementation of field multiplication is used in both schemes (RP and ours).
These statistics concern 50 computations of an AES-128 encryption, implemented in C, compiled with gcc, with a refresh after each multiplication (SMult) or exponentiation, and executed on an Intel(R) Core(TM) i7-8550U CPU at 1.80 GHz with 16 GB of RAM, for different configurations of our scheme compared to the Rivain-Prouff (RP) scheme [RP10].
Masking with cost amortization also reduces memory usage. Indeed, with t = 16, the total cost to mask a block of 16 bytes is n field elements instead of 16n. In general, the size of a masked word for AES is 16n/t.
(a_i)_{i ∈ ⟦d..2d⟧}: by assuming that c_{2d} ≠ 0, the equation c_{2d} = a_d a′_d admits 2^m − 1 solutions. If c_{2d} = 0, then a_d a′_d admits 2^m solutions. By setting a_d ≠ 0 and a′_d ≠ 0, we get the equation c_{2d−1} = a_{d−1} a′_d + a_d a′_{d−1}.

Table 2 :
Complexity of operations involved in the masked multiplication

Table 3 :
Side-channel security order versus fault detection / correction, in F_256

Proposition 2. The masking operation mask(x̄) is a generic encoder.
Proof. We have seen that mask(x̄) = x̄G ⊕ r̄H. By construction, rank(G) = t and rank(H) = d + 1 − t. If we denote by C_G, C_H and C_H⊥ the codes generated respectively by the generator matrices G, H and the kernel of H, then C_G ∩ C_H = {0}. If we denote by B the matrix obtained by stacking G on top of H, then we have mask(x̄) = (x̄, r̄) × B, and B satisfies the definition of a generic encoder, denoted enc_B. If we denote by d′ the minimal distance of C_H⊥, d′ = d_min(C_H⊥), then, as explained in [WMCS20], a direct consequence is that the encoding procedure enc_B is d′-private. Our task now consists in evaluating d′, and we propose to demonstrate the following theorem:
Theorem 1. Let t be an integer, 1 ≤ t ≤ d, and A a Vandermonde matrix of the form (u_i^j)_{i,j ∈ ⟦0,d⟧} with u_i ≠ u_j for i ≠ j. Let R be the generator matrix of the Reed-Solomon code RS[n, d+1, n−d].