Information Leakages in Code-based Masking: A Unified Quantification Approach

This paper presents a unified approach to quantifying the information leakages in the most general code-based masking schemes. Specifically, by utilizing a uniform representation, we first highlight that the side-channel resistance of all code-based masking schemes can be quantified by an all-in-one framework consisting of two easy-to-compute parameters (the dual distance and the number of conditioned codewords) from a coding-theoretic perspective. In particular, we use signal-to-noise ratio (SNR) and mutual information (MI) as two complementary metrics, where a closed-form expression of SNR and an approximation of MI are proposed by connecting both metrics to the two coding-theoretic parameters. Secondly, considering the connection between Reed-Solomon codes and the SSS (Shamir's Secret Sharing) scheme, the SSS-based masking is viewed as a particular case of generalized code-based masking. Hence, as a straightforward application, we evaluate the impact of public points on the side-channel security of SSS-based masking schemes, namely the polynomial masking, and enhance the SSS-based masking by choosing optimal public points for it. Interestingly, we show that given a specific security order, more shares in SSS-based masking leak more information on secrets in an information-theoretic sense. Finally, our approach provides a systematic method for optimizing the side-channel resistance of every code-based masking. More precisely, this approach enables us to select optimal linear codes (parameters) for the generalized code-based masking by choosing appropriate codes according to the two coding-theoretic parameters. Summing up, we provide a best-practice guideline for the application of code-based masking to protect cryptographic implementations.


Introduction
Masking is one of the most well-studied countermeasures to protect cryptographic implementations against side-channel attacks due to the favorable provable security it provides. The core idea underlying any masking scheme is to split the sensitive (key-dependent) variables into several shares and perform independent computations on masked variables only. Indeed, the rationale is that, given a sufficient amount of noise, the attack complexity increases exponentially with the number of shares [CJRR99,PR13], while the implementation cost increases only quadratically (or only cubically in higher-order glitch-free implementations [GSF13]).
Two key ingredients of a masking scheme are the encoding, which randomizes the sensitive variables, and the masked operations, which manipulate the random shares. Regarding the latter, secure masked operations can be constructed effectively [ISW03,RP10] for both bit- and word-oriented variables. Furthermore, thanks to the well-established notions of Non-Interference (NI) and Strong Non-Interference (SNI) introduced by Barthe et al. [BBD+16], the basic gadgets carrying out the elementary operations (e.g., addition and multiplication) can be composed to construct the whole implementation without losing the claimed security properties. Regarding the former, the encoding is a more fundamental ingredient in masking that provides the achievable upper bounds of side-channel security order with tunable public parameters. Indeed, firstly, the side-channel security order of the full implementation cannot exceed the security order of the corresponding encoding, and secondly, when implemented ideally, the security order of an implementation can be guaranteed by its encoding. However, evaluating the concrete side-channel resistance of the encoding in general cases remains an open problem, since the many different encodings used in various masking schemes behave differently when fed with diverse parameters. Therefore, a unified quantification approach would make it possible to formalize and compare the security of different encodings and to find optimal parameters for a specific masking scheme.

Unifying Masking Schemes by Generalization
Generalization is a promising approach to unify different masking schemes. In this trend, the code-based masking generalizes many existing schemes, including Boolean masking, Inner Product masking (IPM) [BFG15, BFG+17], Leakage Squeezing (LS) [CDG+14, CG18] and Direct Sum masking (DSM) [BCC+14, PGS+17]. To the best of our knowledge, the generalized code-based masking (GCM) [WMCS20] is the most generic scheme in this respect. In particular, polynomial masking [GM11,PR11], which is built upon Shamir's secret sharing (SSS) scheme [Sha79], is also a special case of GCM.
Let $X \in \mathbb{F}_{2^\ell}^k$ and $Y \in \mathbb{F}_{2^\ell}^t$ be respectively the sensitive variable and the $t$ random masks. Then the encoded variable in GCM is written $Z = XG + YH \in \mathbb{F}_{2^\ell}^n$, given that $k + t \le n$, where $G$ and $H$ are generator matrices of two codes $C$ and $D$, respectively. For the sake of simplicity, we take $k = 1$, although the GCM can use packed secret sharing techniques [GSF13,WMCS20] to improve performance through parallelism. The side-channel security evaluation of the encoding is similar for any $k$, since each of the $k$ sensitive variables is encoded similarly. The connections between these masking schemes are shown in Fig. 1, where the four intersecting areas are:
• Intersection I: as pointed out in [CS21], Boolean masking can be considered as a special case of polynomial masking for small enough parameters ($n \le 6$ or equivalently $t \le 5$).
• Intersection II: in [BFG15], the authors claimed that the polynomial masking is a special case of IPM. However, this generalization does not indicate the exact connections between the SSS scheme and RS codes. Indeed, if we take the polynomial evaluations in the encoding into consideration, the generalization from SSS-based masking to IPM is valid only when $n = 2$ and $t = 1$.
• Intersections III and IV: in SSS-based masking, if $n = t + 1$, the codes $C$ and $D$ are complementary, therefore they can be viewed as a DSM (or LS) scheme. Otherwise, if $n > t + 1$, the corresponding masking schemes are out of DSM's scope. Conversely, the linear codes for DSM may not be convertible into SSS-based schemes, since the codes in SSS are endowed with a specific algebraic structure.
The most significant benefit of utilizing code-based masking is a higher security order than simple Boolean masking given the same number of shares. Taking 2-share IPM over $\mathbb{F}_{2^8}$ [BFG+17, CGC+21] as an instance, when appropriate public parameters are chosen, the side-channel security order can be maximized to 3 under the bit-probing model [PGS+17], which is higher than the order 1 of Boolean masking. Moreover, the security orders increase to 7 vs. 2 (IPM vs. Boolean masking) in 3-share scenarios [CGC+21, Tab. 2].
Currently, the side-channel security order of GCM has been connected to the dual distance of $D$ [PGS+17, CG18], which is denoted as $d_D^\perp$. As a special case, the security order $t$ in IPM and DSM is equal to $d_D^\perp - 1$ since the two codes $C$ and $D$ are complementary. However, as pointed out in [CGC+21], the dual distance of $D$ is not sufficient to characterize the concrete side-channel resistance of IPM, hence a new framework with a new parameter (more precisely $B_{d_D^\perp}$, which counts the number of codewords of Hamming weight equal to $d_D^\perp$ in $D^\perp$) is proposed to model IPM's concrete security level more accurately. Nevertheless, this framework is not applicable to GCM since $C$ and $D$ may not be complementary anymore.

Public Points in SSS and Polynomial Masking
To construct a $t$-th order secure polynomial masking, a polynomial of degree $t$ is first selected: $f_X(\mathsf{X}) = X + \sum_{i=1}^{t} u_i \mathsf{X}^i$, where $\mathsf{X}$ is the indeterminate and the secret $X$ is the constant term of $f_X$. Secondly, $f_X$ is evaluated at $n$ distinct points $\alpha_i$ for $1 \le i \le n$, which are called "public points" in the scheme. As a result, the secret $X$ is encoded by using the private parameters $u_i$ (which play the role of random masks in the context of masking).
As observed in [CMP18], the public points in SSS play a significant role in the side-channel resistance of SSS-based masking schemes. In fact, this problem of public points is inherent in the SSS scheme and dates back to Massey [Mas93], who claimed that the SSS scheme "can be attacked with the well-developed tools of algebraic coding theory". The SSS-based masking provides a practical example where, by changing the public points in polynomial masking, the concrete security level can be significantly different.
However, to the best of our knowledge, there are neither qualitative principles for selecting good or even optimal public points in SSS-based masking nor a quantitative approach to evaluate the role of public points played in the side-channel resistance of SSS-based masking. In this paper, we propose solutions to the two problems by utilizing a coding-theoretic quantitative approach.

Independence Assumption behind Masking Schemes
The independence assumption is an indispensable condition behind the security proofs when extending from the probing model to the bounded moment model or noisy leakage models [BDF+17, DDF14]. For instance, if this independence condition is violated due to physical defects (e.g., couplings through the ground or parasitic capacitances, glitches, etc.), the side-channel security order will decrease accordingly [DFS15]. However, this independence condition is essentially related to inter-share leakages from different shares in masking and treats each share as a whole.
Moreover, the independence issue also arises in intra-share cases, where different bits of the same share leak jointly. This kind of leakage is often called non-linear leakage and comes, e.g., from registers or memory units of real devices. In fact, both intra-share and inter-share independence issues can happen simultaneously. Take AES implemented on an ARM Cortex-M4 as an example: the registers are 32-bit and each share is in $\mathbb{F}_{2^8}$, so four shares can be manipulated at the same time. Consequently, the register will leak jointly, including intra-share and inter-share leakages. To the best of our knowledge, the intra-share independence issue has not yet been studied thoroughly in the sense of security order reduction. We will show that, essentially, intra-share independence is the condition for higher security orders under the bounded moment model [BDF+17].

Our Contributions
In view of the above state-of-the-art, our contributions are threefold as follows.
A Unified Leakage Quantification Approach for GCM. We derive a closed-form expression for SNR to quantify the information leakages in GCM for any leakage functions. In particular, we present a simplified expression for the Hamming weight leakage model. In fact, this new result generalizes the framework proposed in [CGC + 21] for IPM. Furthermore, we use mutual information (MI) to quantify the information leakages of GCM in an information-theoretic sense. Both SNR and MI are connected to two properties (namely the dual distance and the number of conditioned codewords) of the linear codes used in GCM. Relying on a theoretical analysis of SNR and MI, we propose a unified approach to quantify information leakage in GCM. Then we show how to select optimal codes for GCM by optimizing the two properties. The experimental results confirm that the MI can be minimized by utilizing optimal codes, which indicates the improved concrete security level of the corresponding masking scheme.
Optimal Public Points for SSS-based Polynomial Masking. As an application of our unified approach, we characterize the side-channel resistance of polynomial masking from a coding-theoretic point of view. The first outcome is a more accurate characterization of information leakage and the second outcome is a straightforward method to choose optimal linear codes (parameters) for SSS-based masking. For the first time, we quantify the impact of combining different public points in SSS-based masking in the context of side-channel analysis and show that more shares leak more information (given a specific t). In particular, our coding-theoretic approach can exactly depict the observations made in [CMP18]. Using MI, we present the quantitative results of information leakages in SSS-based masking, which again validate our unified approach. For the first time, we exhibit several optimal tuples of public points (the linear codes in a coding-theoretic perspective) for SSS-based masking in the sense of side-channel resistance.
Revisited Independence Condition in Masking Schemes. The independence condition requires that the information leakages from different variables are statistically independent. In the context of masking, it exists in two cases: inter-share and intra-share. Specifically, the former means that the leakages of different shares are independent, which is well studied in the literature [BDF+17]. The latter deals with the leakages from one share, in which different bits of this share may leak independently or not. To capture both of them, we introduce the leakage function $P$, whose numerical degree indicates the independence conditions of both cases. For instance, the commonly assumed Hamming weight leakage model has a numerical degree equal to 1, which is a perfectly independent case. Moreover, we show how the degree of $P$ affects the side-channel security order of a masking scheme.
We underline that all mathematical derivations presented in this paper have been verified formally with the Magma computational algebra system [Uni]. The source code accompanying this paper is available on GitHub [CG20].

Difference between this work and [CGC + 21]
In this work, we study GCM by using a coding-theoretic approach similar to that of [CGC+21]. However, two key differences make this work significantly different from [CGC+21].
Firstly, GCM generalizes IPM by allowing $C$ and $D$ to be non-complementary, which allows deriving security metrics in a more general manner. In [CGC+21], the authors prove that the side-channel security of IPM only depends on the code $D$, whereas in this work we show, for the first time, that the side-channel security depends on both $C$ and $D$.
In particular, the quantitative findings enable us to put forward optimal GCM encodings which are new with respect to [CS21]: given the same parameters $n$ and $t$ (the number of shares and the security order), we decrease the information leakage in GCM to the least possible extent.
Secondly, GCM allows for protections in much more general contexts. Namely, GCM can be used to withstand glitches [PR11] and to detect errors against fault injection attacks on top of preventing side-channel attacks. Therefore, our work has broader implications for the protection of realistic platforms. In a nutshell, GCM opens a new path to derive unified countermeasures against both fault injection and side-channel attacks.

Encoding in Code-based Masking
Let $n$, $k$ and $\ell$ be positive integers and let $K = \mathbb{F}_{2^\ell}$ be a finite field. Let $C$ be an $[n, k]_q$ linear code with generator matrix $G$ defined over $\mathbb{F}_q$ (here we use $q = 2^\ell$). For $\ell = 8$, we use the irreducible polynomial $g(\alpha) = \alpha^8 + \alpha^4 + \alpha^3 + \alpha^2 + 1$ to generate the field $K = \mathbb{F}_{2^8}$. Recall that for an $(n, t)$-SSS scheme, the secret $X$ is split into $n$ shares and the sharing is $t$-private: any $t + 1$ shares suffice to recover the secret, whereas $t$ or fewer shares do not. Note that the $(n, t)$-SSS scheme is also connected to the Reed-Solomon (RS) code with parameters $[n, t + 1]$.
Let $X \in K^k$, $Y \in K^t$ and $Z \in K^n$ be the sensitive variable, the random masks, and the shared variable, respectively. We use Eqn. 1 as the uniform representation of the encoding in GCM throughout the paper: $$Z = XG + YH, \tag{1}$$ where $k + t \le n$, and $G$ and $H$ are generator matrices of the two codes $C$ and $D$ with $C \cap D = \{0\}$.
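As an illustration of the uniform representation $Z = XG + YH$, the following minimal Python sketch encodes a secret over $\mathbb{F}_{2^8}$ built from the irreducible polynomial given above. The helper names (`gf_mul`, `vec_mat`, `gcm_encode`) and the choice of 2-share Boolean masking as the example are assumptions made here for illustration only; they are not taken from the authors' implementation.

```python
import secrets

POLY = 0x11D  # alpha^8 + alpha^4 + alpha^3 + alpha^2 + 1, as stated in the text


def gf_mul(a, b):
    """Multiply a and b in F_{2^8} = F_2[alpha] / (POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 8:          # degree 8 reached: reduce modulo POLY
            a ^= POLY
    return r


def vec_mat(v, M):
    """Row vector v (length r) times an r-by-n matrix M over F_{2^8}."""
    out = [0] * len(M[0])
    for vi, row in zip(v, M):
        for j, mij in enumerate(row):
            out[j] ^= gf_mul(vi, mij)
    return out


def gcm_encode(X, Y, G, H):
    """Z = XG + YH; addition in characteristic 2 is a bitwise XOR."""
    return [a ^ b for a, b in zip(vec_mat(X, G), vec_mat(Y, H))]


# Illustrative special case (k = 1, t = 1, n = 2): Boolean masking Z = (X + Y, Y).
G = [[1, 0]]
H = [[1, 1]]
x = 0xA7
y = secrets.randbelow(256)
z = gcm_encode([x], [y], G, H)
```

In this special case the secret is recovered by XOR-ing the two shares, which gives a quick sanity check of the encoding.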
In this paper, we focus on GCM, which is the most general case of code-based masking. By using the uniform representation of Eqn. 1, we revisit the encodings of code-based masking schemes in Tab. 1.
Table 1 (encodings of code-based masking schemes). Security parameters: $n$, $k$, $t$. Entries: $k = 1$, $n = t + 1$; $G$, $H$ can be any matrices; $n \ge k + t$ and $f_X(X)$; in the glitch-free case, $G$, $H$ can be any matrices.

Linear Codes
We recall several known definitions and properties of linear codes, which hold when the base field is $K = \mathbb{F}_2$ or $K = \mathbb{F}_{2^\ell}$, respectively. Given a linear code $C$ with parameters $[n, k, d_C]$, where $d_C$ is the minimum distance, its weight enumerator is defined as follows.
Definition 1 (Weight Enumerator [MS77, §5.2]). The weight enumerator of a linear code $C$ specifies the number of codewords of each possible Hamming weight in $C$. Specifically, we have $$W_C(x, y) = \sum_{i=0}^{n} B_i\, x^{n-i} y^{i},$$ where $B_i = |\{c \in C \mid w_H(c) = i\}|$ and $w_H(\cdot)$ denotes the Hamming weight function.
For instance, any linear code $C$ has $B_0 = 1$, since the all-zero word is its only codeword of weight 0. Note that two linear codes are said to be equivalent if one can be obtained from the other by a series of operations of the following two types: 1) an arbitrary permutation of the coordinate positions and 2) in any coordinate position, multiplication by any nonzero scalar. Straightforwardly, equivalent linear codes have the same weight enumerator.
Definition 3 (Dual Distance [MS77]). The dual distance $d_C^\perp$ of a linear code $C$ is the minimum Hamming weight $w_H(u)$ of nonzero $u \in K^n$ such that $\sum_{c \in C} (-1)^{c \cdot u} \neq 0$.
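To make the definition concrete, the following Python sketch evaluates the character sum $\sum_{c \in C} (-1)^{c \cdot u}$ by brute force for small binary codes ($K = \mathbb{F}_2$). The function names and the convention of returning $n + 1$ when no nonzero $u$ qualifies are assumptions of this sketch; it is only practical for toy parameters.

```python
from itertools import product


def codewords(gen_rows):
    """All codewords of the binary code spanned by gen_rows (tuples of 0/1)."""
    n = len(gen_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(gen_rows)):
        w = [0] * n
        for c, row in zip(coeffs, gen_rows):
            if c:
                w = [a ^ b for a, b in zip(w, row)]
        words.add(tuple(w))
    return words


def dual_distance(gen_rows):
    """Minimum weight of a nonzero u whose character sum over the code is nonzero."""
    n = len(gen_rows[0])
    code = codewords(gen_rows)
    best = n + 1  # convention of this sketch when no nonzero u qualifies
    for u in product((0, 1), repeat=n):
        wt = sum(u)
        if wt == 0 or wt >= best:
            continue
        char_sum = sum((-1) ** (sum(a & b for a, b in zip(c, u)) % 2) for c in code)
        if char_sum != 0:
            best = wt
    return best


repetition = [(1, 1, 1)]
print(dual_distance(repetition))
```

For a linear code the character sum is nonzero exactly when $u$ lies in the dual code, so the value returned coincides with the minimum distance of the dual.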
According to [MP13, Theorem 5.1.18], there exists a self-dual basis of $\mathbb{F}_{q^\ell}$ over $\mathbb{F}_q$ if and only if either $q$ is even or both $q$ and $\ell$ are odd. We call this a sub-field representation.
Definition 5 (Code Expansion [MS77, §7.7]). By using the sub-field representation, the elements in $\mathbb{F}_{2^\ell}$ are decomposed over $\mathbb{F}_2$. Consider a generating matrix of a linear code of size $k \times n$ in $\mathbb{F}_{2^\ell}$. It becomes a generating matrix of size $k\ell \times n\ell$ in $\mathbb{F}_2$. Any linear code of parameters $[n, k]_{2^\ell}$ contains $(2^\ell)^k = 2^{k\ell}$ codewords, hence is turned into an $[n\ell, k\ell]_2$ linear code in $\mathbb{F}_2$. The latter code is called the expansion code of the former.
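The expansion can be sketched in Python as follows. Two choices here are assumptions made for illustration and differ from the text: the field is $\mathbb{F}_{2^4}$ built from $a^4 + a + 1$, and elements are decomposed in the polynomial basis $(1, a, a^2, a^3)$ rather than in a self-dual basis. Each row $g$ of the word-level generator matrix yields $\ell$ binary rows, namely the bit expansions of $a^j \cdot g$.

```python
POLY, ELL = 0x13, 4  # F_16 = F_2[a] / (a^4 + a + 1); illustrative choice


def gf_mul(a, b):
    """Multiply a and b in F_{2^ELL}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> ELL:
            a ^= POLY
    return r


def to_bits(x):
    """Coordinates of x in the polynomial basis (1, a, ..., a^(ELL-1))."""
    return [(x >> i) & 1 for i in range(ELL)]


def expand(G):
    """Turn a k-by-n matrix over F_{2^ELL} into a (k*ELL)-by-(n*ELL) binary matrix."""
    rows = []
    for g in G:
        for j in range(ELL):
            scaled = [gf_mul(1 << j, gi) for gi in g]   # a^j * g
            rows.append([bit for s in scaled for bit in to_bits(s)])
    return rows


G = [[1, 2, 3]]      # a [3, 1] code over F_16 (entries 1, a, a + 1)
B = expand(G)        # a 4-by-12 binary generator matrix
```

The binary span of the rows of `B` then contains $2^{k\ell} = 16$ codewords, one per word-level codeword.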
Summing up, the two definitions build a direct link between word- and bit-level representations. This allows us to connect the word (or register)-level probing and the bit-level probing security models, depending on the granularity of the attacker's probing tool.

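Properties of Complementary Vector Spaces
In this subsection, we derive the properties of complementary vector spaces that will be needed to derive our results. Let $E$ be a vector subspace of $K^n$. The indicator of $E$ is the map $\mathbb{1}_E : K^n \to \{0, 1\}$ such that $\mathbb{1}_E(x) = 1$ if $x \in E$ and $\mathbb{1}_E(x) = 0$ otherwise.
Lemma 1. Let $C$ and $D$ be two vector subspaces of $K^n$ built from independent bases, meaning $C \cap D = \{0\}$. Then $(C \oplus D)^\perp = C^\perp \cap D^\perp$.
Proof. First of all, we notice that $(C \oplus D)^\perp \subseteq C^\perp$. Indeed, a vector orthogonal to all vectors of $C \oplus D$ is in particular orthogonal to all vectors of $C + 0 = C$. In a symmetric way, we have that $(C \oplus D)^\perp \subseteq D^\perp$, hence $(C \oplus D)^\perp \subseteq C^\perp \cap D^\perp$. Let us now prove the converse inclusion. Let $x \in C^\perp \cap D^\perp$. For any vector $y$ in $C \oplus D$, there exists a unique pair $(c, d) \in C \times D$ (owing to the complementarity of $C$ and $D$) such that $y = c + d$. Then $x \cdot y = x \cdot c + x \cdot d = 0$, so $x \in (C \oplus D)^\perp$.
Lemma 2. Let $C$ and $D$ be two complementary vector subspaces of $K^n$, that is, $C \cap D = \{0\}$ and $C \oplus D = K^n$. Then $C^\perp \cap D^\perp = \{0\}$.
Proof. By application of Lemma 1, we have that $C^\perp \cap D^\perp = (C \oplus D)^\perp = (K^n)^\perp$. Now, as $K^n$ is the universe code, we have $(K^n)^\perp = \{0\}$.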
In the remainder of this paper, we consider two cases:
• In GCM as a general case: $C \cap D = \{0\}$ and $C \oplus D \subseteq K^n$. The redundant case $n > t + 1$ corresponds to the strict condition $C \oplus D \subsetneq K^n$, and then $\{0\} \subsetneq C^\perp \cap D^\perp$.
• In IPM or DSM as special cases: $C \cap D = \{0\}$ and $C \oplus D = K^n$. This is the case of [CGC+21], where we have $C^\perp \cap D^\perp = \{0\}$ as shown in Lemma 2.
Definition 7 (Fourier Transform). The Fourier transform of a pseudo-Boolean function $P : K^n \to \mathbb{R}$ is denoted by $\widehat{P} : K^n \to \mathbb{R}$, and is defined as $\widehat{P}(z) = \sum_{y \in K^n} P(y)\, (-1)^{y \cdot z}$.
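The following Python sketch computes this transform by brute force over $\mathbb{F}_2^n$, assuming the standard form $\widehat{P}(z) = \sum_{y} P(y)\,(-1)^{y \cdot z}$. It also checks, on the example $P(y) = w_H(y)^d$, the property used later in the paper that $\widehat{P}(z)$ vanishes whenever $w_H(z) > \deg(P)$. The function name `fourier` and the parameters $n = 4$, $d = 2$ are illustrative assumptions.

```python
from itertools import product


def fourier(P, n):
    """Return {z: sum_y P(y) * (-1)^(y.z)} for all z in F_2^n (brute force)."""
    pts = list(product((0, 1), repeat=n))
    table = {}
    for z in pts:
        total = 0
        for y in pts:
            dot = sum(a & b for a, b in zip(y, z)) % 2
            total += P(y) * (-1) ** dot
        table[z] = total
    return table


n, d = 4, 2


def P(y):
    """Hamming weight raised to the power d, a function of numerical degree d."""
    return sum(y) ** d


P_hat = fourier(P, n)
# Fourier coefficients at points of weight larger than deg(P) = d.
high_weight = [P_hat[z] for z in P_hat if sum(z) > d]
print(all(v == 0 for v in high_weight))
```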

Connecting SSS Scheme to the RS code
We recall the $(n, t)$-SSS scheme by mainly referring to [CMP18,CRZ13]. Let $X \in K$ again be the secret, which is split into $n$ shares such that no tuple of at most $t$ shares depends on $X$. The SSS scheme consists in selecting a random polynomial $f_X(\mathsf{X}) = X + \sum_{i=1}^{t} u_i \mathsf{X}^i$ of degree $t$ and in building the shares $Z_i = f_X(\alpha_i)$ from $n$ distinct public points $\alpha_i$. The recovery of $X$ from its sharing consists of two steps: first, $f_X$ is recovered by Lagrange interpolation, and second, $f_X$ is evaluated at 0. Since in an $(n, t)$-SSS any tuple of shares with cardinality greater than $t$ can be used to recover $X$, we denote by $U$ the set of selected shares ($|U| \ge t + 1$), which is called the interpolation set.
Next, we recall the Reed-Solomon codes.
Given that the degree of $f$ is $t$, $t + 1$ evaluations of it suffice to recover $f$ itself and hence the codewords. In terms of RS codes, the sharing of $X$ with the SSS scheme is an encoding with an RS code $\mathrm{RS}(\{\alpha_1, \ldots, \alpha_n\}, t + 1)$: $$Z = (X, u_1, \ldots, u_t) \begin{pmatrix} G \\ H \end{pmatrix} = XG + YH,$$ where $Y = (u_1, \ldots, u_t)$ and $\begin{pmatrix} G \\ H \end{pmatrix}$ is the generator matrix $(\alpha_i^j)_{i \in [1; n],\, j \in [0; t]}$. More precisely, $G$ is a 1-by-$n$ matrix equal to $(1, 1, \ldots, 1)$ and $H$ is a Vandermonde matrix. Denoting by $G_i$ and $H_i$ the $i$-th columns of $G$ and $H$ respectively, we have $Z_i = X G_i + Y H_i = X + \sum_{j=1}^{t} u_j \alpha_i^j = f_X(\alpha_i)$. Accordingly, the reconstruction of $X$ from $Z = (Z_1, Z_2, \ldots, Z_n)$ is done by selecting shares $Z_i$ to obtain an interpolation set $U$ such that $|U| \ge t + 1$. We also call this scheme redundant sharing when $n > t + 1$, since any $t + 1$ shares suffice to recover $X$. We will show in Sec. 5 that more redundancy in the sharing of SSS-based masking leaks more information on $X$.
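The sharing and the two-step recovery described above can be sketched in Python over $\mathbb{F}_{2^8}$ as follows. The helper names, the use of nonzero public points and the concrete points `[1, 2, 3, 4]` are assumptions made for this illustration; the recovery evaluates the Lagrange interpolant directly at 0 instead of first rebuilding the polynomial.

```python
import secrets

POLY = 0x11D  # alpha^8 + alpha^4 + alpha^3 + alpha^2 + 1


def gf_mul(a, b):
    """Multiply a and b in F_{2^8}."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 8:
            a ^= POLY
    return r


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^8 - 2)."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def share(x, alphas, t):
    """Shares Z_i = f_X(alpha_i), with f_X of degree t and constant term x."""
    coeffs = [x] + [secrets.randbelow(256) for _ in range(t)]
    shares = []
    for a in alphas:
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation
            acc = gf_mul(acc, a) ^ c
        shares.append(acc)
    return shares


def reconstruct(points):
    """Lagrange interpolation at 0 from pairs (alpha_i, Z_i)."""
    x = 0
    for i, (ai, zi) in enumerate(points):
        num = den = 1
        for j, (aj, _) in enumerate(points):
            if j != i:
                num = gf_mul(num, aj)        # (0 - alpha_j) = alpha_j in characteristic 2
                den = gf_mul(den, ai ^ aj)   # (alpha_i - alpha_j)
        x ^= gf_mul(zi, gf_mul(num, gf_inv(den)))
    return x


alphas = [1, 2, 3, 4]          # n = 4 distinct nonzero public points (illustrative)
secret = 0x5B
z = share(secret, alphas, t=2)
pairs = list(zip(alphas, z))
```

Any interpolation set of at least $t + 1 = 3$ pairs, for example `pairs[:3]` or `pairs[1:]`, can be passed to `reconstruct`.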

Quantifying Information Leakages in GCM
In this section, we use SNR as a leakage metric to evaluate the information leakages in GCM. In particular, SNR quantifies the key-dependent leakage at certain degrees. SNR is thus attractive in that if SNR at a given degree d is null, then one can conclude that the scheme is secure at order d.

Uniform Representation of Leakage Function
As the first step, we formalize the information leakages from a device. In this respect, we rely on the clarification on serial and parallel implementations proposed in [BDF+17].
Before formalization, we give an example to provide some intuition for the uniform leakage function $P$. Let $Z = (Z_1, Z_2, \ldots, Z_n)$ denote the encoded intermediate with $n$ shares and $X$ be the secret. Ignoring the noise, we assume the leakage of each share is $L_i = Z_i$ under the identity leakage model and $L = \sum_i L_i$ is the total leakage. To launch a successful attack, an adversary needs to find the lowest-order key-dependent statistic. Equivalently, an adversary needs the smallest $d$ such that $\mathbb{V}[\mathbb{E}[L^d \mid X]] \neq 0$, which measures the informative part in $L$.
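The intuition can be checked exhaustively on a toy case. The sketch below is an illustration under stated assumptions: it uses 2-share Boolean masking $Z = (X \oplus Y, Y)$ on 4-bit words (a hypothetical parameter choice, not the paper's setting) and exact rational arithmetic, and it searches for the smallest $d$ such that the variance over $X$ of $\mathbb{E}[L^d \mid X]$ is nonzero.

```python
from fractions import Fraction

ELL = 4  # illustrative word size in bits


def identity(z):
    """Identity leakage model: the share value itself."""
    return z


def hamming_weight(z):
    """Hamming weight leakage model."""
    return bin(z).count("1")


def moment_variance(leak, d):
    """Variance over X of E[(leak(X ^ Y) + leak(Y))^d | X], with Y uniform."""
    size = 1 << ELL
    cond_means = []
    for x in range(size):
        total = sum((leak(x ^ y) + leak(y)) ** d for y in range(size))
        cond_means.append(Fraction(total, size))
    mean = sum(cond_means) / size
    return sum((m - mean) ** 2 for m in cond_means) / size


def smallest_leaking_degree(leak, max_d=4):
    """Smallest d whose conditional moment depends on X, or None up to max_d."""
    for d in range(1, max_d + 1):
        if moment_variance(leak, d) != 0:
            return d
    return None


print(smallest_leaking_degree(identity), smallest_leaking_degree(hamming_weight))
```

Using `Fraction` avoids declaring a moment key-dependent because of floating-point rounding.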
Formally, let $P = \phi_P \circ \varphi_P$ denote the leakage function, where $\varphi_P$ is the leakage model for each share and $\phi_P$ is the combination function that assembles the leakages from selected shares. In this paper, we call $\varphi_P$ and $\phi_P$ the intra-share and inter-share leakage model, respectively. For instance, in serial implementations, the leakage of each share is $L_i = \varphi_P(Z_i) + N_i$, and the exploitable leakages can then be combined by $\phi_P$. Taking the Hamming weight model and the centered product as leakage model and combination function, respectively, we have $\varphi_P(Z_i) = w_H(Z_i)$ and $\phi_P(L_1, \ldots, L_d) = \prod_{i=1}^{d} (L_i - \mathbb{E}[L_i])$, where the latter combines the leakages of $d$ shares by the normalized product. Consequently, the highest order of key-dependent leakages is captured by $P$ with numerical degree $d$.
Therefore, we use the following representation of $P$ as a pseudo-Boolean function: $$P(Z) = \sum_{I \in \{0,1\}^{n\ell}} \beta_I Z^I, \tag{4}$$ where $Z^I = \prod_{i \in \{1, \ldots, n\ell\} \text{ s.t. } I_i = 1} Z_i$, $\beta_I \in \mathbb{R}$ and $\deg(P) = \max\{w_H(I) \mid \beta_I \neq 0\}$.
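A small Python sketch can make the coefficients $\beta_I$ and the numerical degree tangible. It assumes the representation $P(Z) = \sum_I \beta_I Z^I$ over bits and recovers the coefficients by Moebius inversion, $\beta_I = \sum_{J \subseteq I} (-1)^{w_H(I) - w_H(J)} P(J)$; the function names are illustrative, and the approach is exponential in the number of bits, so it only suits toy sizes.

```python
from itertools import product


def numerical_normal_form(P, n):
    """Coefficients beta_I with P(z) = sum_I beta_I * prod_{i: I_i = 1} z_i on {0,1}^n."""
    beta = {}
    for I in product((0, 1), repeat=n):
        support = [i for i in range(n) if I[i]]
        coeff = 0
        # Moebius inversion: signed sum of P over all J contained in I.
        for mask in product((0, 1), repeat=len(support)):
            J = [0] * n
            for keep, i in zip(mask, support):
                J[i] = keep
            coeff += (-1) ** (len(support) - sum(mask)) * P(tuple(J))
        beta[I] = coeff
    return beta


def numerical_degree(beta):
    """Largest Hamming weight of an index I with a nonzero coefficient."""
    return max((sum(I) for I, b in beta.items() if b != 0), default=0)


def hw(z):
    return sum(z)


beta_hw = numerical_normal_form(hw, 3)                    # Hamming weight on 3 bits
beta_hw2 = numerical_normal_form(lambda z: hw(z) ** 2, 3)
beta_hw3 = numerical_normal_form(lambda z: hw(z) ** 3, 3)
print(numerical_degree(beta_hw), numerical_degree(beta_hw2), numerical_degree(beta_hw3))
```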
Two Probing Models. For the purpose of a finer-grained analysis, we clarify the two kinds of probing models (see also [DGH+18, §2.2]) and the corresponding security orders as follows:
• Bit-probing model: each probe only gets one bit at a time, where each bit leaks independently or jointly. Correspondingly, $\varphi_P$ is defined at bit level and $\phi_P$ at certain degrees is used to combine the bit-level leakages. The security order under the bit-probing model is denoted by $t_b$.
• Word-probing model: each probe gets an $\ell$-bit word at a time, where an $\ell$-bit variable leaks as a whole. As a result, the degree of $\varphi_P$ indicates how many bits leak jointly, in which the intra-share independence condition plays a role in security order reduction, as shown above. Similarly, the security order is then denoted by $t_w$.
When connected to coding-theoretic properties, the security orders $t_b$ and $t_w$ are related to the dual distance of the code $D$ used in GCM over $\mathbb{F}_2$ and $\mathbb{F}_{2^\ell}$, respectively [PGS+17]. In the sequel, we call $t$ the security order for the sake of simplicity; whether $t_b$ or $t_w$ is meant should be unambiguous from the context (e.g., variables in $\mathbb{F}_2$ or $\mathbb{F}_{2^\ell}$).

SNR-based Information Leakage Quantification
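Let $P(Z)$ be a leakage function as in Eqn. 4 and let $N$ denote the independent noise with zero mean and variance $\sigma^2$. Then, the leakage is $L = P(Z) + N$, where $Z = XG + YH$ is the encoding in GCM (Eqn. 1). The SNR of the leakage is defined as the ratio between the variance of its key-dependent part, $\mathbb{V}[\mathbb{E}[L \mid X]]$, and the variance of the noise. Therefore, we propose the following theorem to quantify the leakages in the GCM scheme by SNR.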
where $\sigma^2_{\mathrm{total}} \propto \sigma^{2d}$ is the total noise and $\widehat{P}(\cdot)$ is the Fourier transform of $P(\cdot)$.
The proof of Theorem 1 involves computing $\mathbb{V}[\mathbb{E}[P(Z) \mid X]]$, which is given by the following Lemma 3. For readability, its proof is deferred to Appendix A.1, which also proves Theorem 1.

Lemma 3. Let a pseudo-Boolean function P (Z) denote the leakage function, and taking the same notations as above, we have
Remark 1. Note that Lemma 3 encompasses the core result in [CGC+21]. Indeed, as a special case, if $n = t + 1$ in SSS-based masking, the two codes $C$ and $D$ are complementary, as are $C^\perp$ and $D^\perp$. By Lemma 2, we then have $C^\perp \cap D^\perp = \{0\}$ and the only possible solution in Eqn. 7 is $x = y = 0$. Therefore, $\mathbb{V}[\mathbb{E}[P(Z) \mid X]]$ can be simplified, and the simplified form is exactly the same result as in [CGC+21].
In a nutshell, the information leakages from GCM can be quantified by Theorem 1 under the generic leakage model characterized by $P$, which evaluates the SNR of the leakages. As a direct result, we have the following proposition, which connects the code property $d_D^\perp$ and the security order in GCM.
Proof. Given a pseudo-Boolean function $P$, one has $\widehat{P}(z) = 0$ for all $z \in K^n$ such that $w_H(z) > \deg(P)$ [CG99]. As a result, the SNR is zero whenever $\deg(P) < d_D^\perp$, since all codewords of $D^\perp \setminus C^\perp$ in Eqn. 6 have Hamming weight no less than $d_D^\perp$. Consequently, the attacks on GCM fail if $\deg(P) < d_D^\perp$. Conversely, for an attack to succeed, one must have $\deg(P) \ge d_D^\perp$. This is, however, only a necessary condition, not a sufficient one. Indeed, it is possible that attacks in the setting $\deg(P) \ge d_D^\perp$ fail. This is illustrated in the next remark.
Remark 2. The security order can be even higher than $d_D^\perp - 1$ when there is no pair $x, y \in D^\perp \setminus C^\perp$ of weight $d_D^\perp$ such that $x + y \in C^\perp$. Indeed, in Eqn. 6, the sum is then empty if the degree of $P$ is equal to $d_D^\perp$. Thus the SNR is equal to zero, and the security order increases accordingly. A specific example can be found in [WMCS20, Example 1] (shown in Appendix B.5), in which $d_D^\perp$ equals 2 and the security order equals 2 as well.

Quantifying Hamming Weight Leakages
One realistic leakage model is the so-called "Hamming weight" leakage: each bit is leaking in a similar amount, though independently from others. It has been demonstrated to be practical in many works, such as [BCO04]. In this case, the attacker can measure a quantity

Simplifications
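We use $P(z) = w_H(z)^d$ as the informative part in a leakage model, which captures the higher-order leakages where the numerical degree $\deg(P)$ equals $d$. Moreover, by the multinomial theorem, we have $$w_H(z)^d = \Big(\sum_{i=1}^{n\ell} z_i\Big)^d = \sum_{J \in \mathbb{N}^{n\ell},\ \sum_i J_i = d} \binom{d}{J} z^J, \tag{9}$$ where $z^J = \prod_i z_i^{J_i}$ and $\binom{d}{J} = d! / (J_1! \cdots J_{n\ell}!)$ is the multinomial coefficient. This coefficient equals $d!$ as long as for all $i$ ($1 \le i \le n\ell$), $J_i = 0$ or $1$. Now, the terms in $P(z)$ are categorized into two cases. The terms $z^J$ with some $J_i \ge 2$ have numerical degree $\deg(z^J) < d$ (since $z_i^{J_i} = z_i$ for $z_i \in \{0, 1\}$), hence can be discarded in the analysis (they contribute nothing to the SNR). The remaining terms of numerical degree $d$ are $\sum_{I \in \{0,1\}^{n\ell},\ w_H(I) = d} z^I$.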
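As a small worked instance of this decomposition (an illustration added here for $d = 2$, not a formula from the original derivation), the square of the Hamming weight splits into a part of numerical degree 1 and a part of numerical degree 2 whose coefficient is $2! = 2$:

```latex
\begin{align*}
w_H(z)^2 &= \Big(\sum_{i} z_i\Big)^2
          = \sum_{i} z_i^2 + 2 \sum_{i<j} z_i z_j \\
         &= \underbrace{\sum_{i} z_i}_{\text{numerical degree } 1}
          + \underbrace{2! \sum_{i<j} z_i z_j}_{\text{numerical degree } 2},
\qquad \text{since } z_i^2 = z_i \text{ for } z_i \in \{0,1\}.
\end{align*}
```

Only the second sum matters for a second-order analysis, in line with the discussion of discarded lower-degree terms above.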
Relying on the decomposition in Eqn. 9, we can simplify Lemma 3 as follows.
Lemma 4. Let a pseudo-Boolean function P (Z) = w H (Z) d denote the leakage function, and taking the same notations as above, we have where B d denotes the adjusted coefficient in weight enumerator which is defined in Def. 9.
Before diving into the proof of Lemma 4, we define the parameter $B_{d_D^\perp}$, which counts the number of codewords under certain conditions in $C^\perp$ and $D^\perp$.
Definition 9 (Adjusted coefficient in weight enumerator). Let $C$ and $D$ denote two linear codes. The adjusted coefficient $B_d$ is defined as $B_d = |\{(x, y) \in (D^\perp \setminus C^\perp)^2 \mid x + y \in C^\perp,\ w_H(x) = w_H(y) = d\}|$. To be more precise, we use the subscript 2 (if necessary) to indicate the subfield representation of a linear code. For instance, $D_2$ denotes the subfield representation of $D$ over $\mathbb{F}_2$. Therefore, we have the following lemma for $B_d$.
If $B_{d^\perp_{D_2}}$ is the coefficient in the weight enumerator of $D_2^\perp$ defined in Def. 1, then we have the following inequality in SSS-based masking: the adjusted coefficient of Def. 9 at $d = d^\perp_{D_2}$ is greater than or equal to $B_{d^\perp_{D_2}}$. Proof. The adjusted coefficient is the number of pairs of codewords $(x, y)$ in $D^\perp \setminus C^\perp$ which satisfy two conditions: their sum is in $C^\perp$ and their weights are equal to $d^\perp_{D_2}$. Clearly, this number is greater than or equal to the number of such pairs where, in addition, $x$ and $y$ are chosen to be identical. In the latter case, the number of codewords is equal to $$|\{x \in D^\perp \setminus C^\perp \mid w_H(x) = d^\perp_{D_2}\}|, \tag{12}$$ because $x + y = 0$ always belongs to $C^\perp$ and $x$ and $y$ have the same Hamming weight since they are equal. Now, Eqn. 12 is the minimum nonzero coefficient in the weight enumerator of $D^\perp \setminus C^\perp$, which is equal to $B_d$ in SSS-based masking.
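The counting argument can be illustrated on a toy binary example. The sketch below follows the verbal description above: it counts ordered pairs $(x, y)$ in $D^\perp \setminus C^\perp$ of weight $d_D^\perp$ whose sum lies in $C^\perp$, and compares this with the number of single codewords of that weight. The codes used, $C = \langle 1100 \rangle$ and $D = \langle 1010, 0101 \rangle$ in $\mathbb{F}_2^4$, and all function names are assumptions chosen for illustration; they are not codes studied in the paper.

```python
def dual(gens, n):
    """Dual of the binary code spanned by gens (bitmask rows), by brute force."""
    return {u for u in range(1 << n)
            if all(bin(u & g).count("1") % 2 == 0 for g in gens)}


def coefficients(c_gens, d_gens, n):
    """Return (dual distance of D, plain count, adjusted pair count)."""
    c_dual, d_dual = dual(c_gens, n), dual(d_gens, n)
    d_perp = min(bin(u).count("1") for u in d_dual if u)
    # Codewords of D-dual outside C-dual with weight equal to the dual distance.
    S = [x for x in sorted(d_dual - c_dual) if bin(x).count("1") == d_perp]
    # Ordered pairs (x, y) from S whose sum lies in C-dual.
    adjusted = sum(1 for x in S for y in S if (x ^ y) in c_dual)
    return d_perp, len(S), adjusted


# Toy encoding Z = (X + Y1, X + Y2, Y1, Y2) over F_2, so that C and D meet only in 0.
d_perp, plain, adjusted = coefficients([0b1100], [0b1010, 0b0101], 4)
print(d_perp, plain, adjusted)
```

In this toy case the pair count exceeds the plain count because two distinct codewords of minimum weight sum to an element of $C^\perp$, which is the situation that cannot occur when $C$ and $D$ are complementary.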
Proof of Lemma 4. Let ϕ I (z) = z I where I ∈ {0, 1} n . Thus Since all monomials with numerical degree smaller than d have SNR = 0, we only focus on monomials with numerical degree equal to d.
is linear combination of monomials with numerical degree smaller than d in ϕ I (z), then the Fourier transform of ϕ I (z) is: We have φ I (y) = 0 for y with w H (y) ≥ d ⊥ D = t + 1 > d, since given a pseudo-Boolean function P , one has P (z) = 0 for all z ∈ K n with w H (z) > deg(P ) [BCC + 14, Lemma 1]. As a result, by combining Eqn. 14 with Eqn. 24, we have the following equation: where B d is the adjusted coefficient in weight enumerator defined in Def. 9.

Connecting SNR with Code Properties
Taking Lemma 4 as an input to Theorem 1, we have the following theorem for Hamming weight leakages in GCM.
Theorem 2. Let a device be protected by the GCM scheme as Z = XG + Y H. Assume the device is leaking in Hamming weight model in the form: L = P (Z) + N . Then the SNR of the exploitable leakages is: where σ 2 total is the total noise such that σ 2 total ∝ σ 2d . Proof. Obviously

MI-based Information-Theoretic Leakage Quantification
We extend the leakage quantification approach by using another metric, namely MI, in an information-theoretic sense. Let the secret X be encoded as in Eqn. 1, and let the leakages be L = P (Z) + N , then the MI between L and X is defined as I where k d is the d-th order cumulant [Car03]. Assuming the device is leaking in the Hamming weight model, we have the following theorem for quantifying the information leakages in GCM.
where σ is the standard deviation of noise in the leakage of each share.
Assume that the device leaks in Hamming weight model, then P (Z) d ⊥ D has a degree equal to d ⊥ D . Hence the MI is equal to: when σ → +∞. Finally, Eqn. 18 can be further developed at the first order in 1/σ 2d ⊥ D as follows after involving Eqn. 15: , when σ → +∞, which proves Theorem 3. Fig. 2. More precisely, the estimated MIs are converging to numerical one when log 10 σ 2 ≈ 1.5, which verifies Theorem 3 numerically. in an information-theoretic sense. In the general case of leakage function P , the MI can be estimated similarly by applying different forms of P into Eqn. 18 to derive connections to coding properties correspondingly.

Optimal Codes for GCM
Thanks to Theorems 1, 2 and 3, we can compare the information leakages of GCM in a quantitative manner. More importantly, relying on the analytic characterization of information leakages, the three theorems enable us to choose optimal linear codes for GCM. Specifically, the codes with maximized $d_D^\perp$ and minimized $B_{d_D^\perp}$ are the best candidates for GCM. Considering the SSS-based masking as a special case, the optimal public points can be determined straightforwardly by applying these theorems.
To thoroughly validate the optimal codes, we consider multivariate leakages. In particular, it is shown in [SVO+10] that, compared with the sum, the absolute difference and the normalized product, the joint distribution is the most efficient way to combine multivariate leakages in side-channel analysis. In this paper, we consider both the sum and the joint distribution to exploit the multivariate leakages. A comparison of the two combination functions in an information-theoretic sense is presented in Appendix B.2.
We take (3, 1)-SSS based masking as an example of GCM and specify it as follows. Let X be encoded into Z = XG + YH with n = 3 shares, where the two generator matrices are:
G = (1 1 1), H = (α_1 α_2 α_3). (19)
Considering the common "Hamming weight + Gaussian noise" model, the side-channel leakages are simulated as follows:
L_i = w_H(Z_i) + N_i, 1 ≤ i ≤ 3,
where N_i ∼ N(0, σ^2) is the Gaussian noise. To combine the 3-D leakages, either the sum or the joint distribution is applied, wherein ϕ_P(L) = Σ_{i=1}^{3} L_i is called 1-D leakages and ϕ_P(L) = (L_1, L_2, L_3) is called 3-D leakages, respectively.
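The simulation setting above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' simulation code: the shares are taken as Z_i = X + u_1·α_i over F_{2^8} (the usual degree-1 Shamir sharing), the field is built with the polynomial x^8 + x^4 + x^3 + x^2 + 1 (the paper's choice may differ), and all function names are hypothetical.

```python
import random

POLY = 0x11D  # ASSUMPTION: x^8 + x^4 + x^3 + x^2 + 1 defines F_{2^8}


def gf_mul(a, b):
    """Multiply two elements of F_{2^8} modulo POLY (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def encode_sss(x, u1, alphas):
    """(n, 1)-SSS encoding, ASSUMED form Z_i = X + u1 * alpha_i."""
    return [x ^ gf_mul(u1, a) for a in alphas]


def hamming_weight(z):
    return bin(z).count("1")


def leak(shares, sigma, rng):
    """Per-share leakage L_i = w_H(Z_i) + N_i with N_i ~ N(0, sigma^2)."""
    return [hamming_weight(z) + rng.gauss(0.0, sigma) for z in shares]


def combine_sum(leakages):
    """1-D leakage: sum of the per-share leakages."""
    return sum(leakages)


def combine_joint(leakages):
    """n-D leakage: the tuple itself, to be used with a joint distribution."""
    return tuple(leakages)


# One simulated trace for hypothetical public points (alpha^0, alpha^1, alpha^2).
rng = random.Random(0)
shares = encode_sss(0xA5, rng.randrange(256), [1, 2, 4])
trace = leak(shares, sigma=1.0, rng=rng)
```

Estimating the MI from such traces would additionally require a density estimate over many draws of the mask and the noise, which is left out of this sketch.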
The results in Fig. 3(a) and 3(b) are the 1-D MI and the 3-D MI, respectively (more results over F_{2^4} are in Appendix B.1). The first observation is that the 3-D MI utilizing the joint distribution exploits more of the key-dependent information present in the leakages; therefore, the attack is more efficient when using the joint distribution of leakages [BGHR14]. Secondly, the numerical results in Fig. 3 are in accordance with Theorems 2 and 3, where the two parameters d_D^⊥ and B_{d_D^⊥} of the codes play a significant role in determining the side-channel resistance of GCM.
Thirdly, the strategy to choose the optimal codes for GCM is to maximize the dual distance d_D^⊥ and/or to minimize the conditioned number of codewords B_{d_D^⊥}. Moreover, the concrete side-channel security level of GCM will be improved by optimizing either of the two parameters. Interestingly, when the noise level lies in certain intervals, codes with smaller d_D^⊥ (and also smaller B_{d_D^⊥}) may be better than those with larger d_D^⊥. For instance, for the curves in purple (the fourth one) and in sky-blue (the fifth one) of Fig. 3, the corresponding d_D^⊥ are 2 and 3, respectively. When σ^2 < 10, the purple curve shows a better side-channel resistance than the sky-blue one.

Enhancing the SSS-based Polynomial Masking
In the context of masking, the random masks in SSS-based masking are u_i for 1 ≤ i ≤ t, and α_1, α_2, ..., α_n are the n public points. Two main observations made in [CMP18] are:
• the choice of public points α_i can have an impact on the side-channel resistance of the corresponding masking scheme; therefore, when combining different (t + 1)-tuples of Z_i, the efficiencies of the corresponding template attacks differ,
• combining more than t + 1 shares Z_i may improve the attack efficiency in the sense of the number of traces needed to recover the secret key.
Recall the generator matrices G and H of SSS-based masking (e.g., the RS code) from Tab. 1; they are the same as the generator matrices in DSM when n = t + 1. In the context of masking, we only care about G and H, since the former is used to encode the secret X and the latter to encode the random masks (e.g., u_1, ..., u_t in the case of SSS-based masking).
Note that H is a Vandermonde matrix, so the code D is a maximum distance separable (MDS) code and is optimal at word-level. However, with different parameters α_i for 1 ≤ i ≤ n, the codes have different impacts on side-channel resistance when they are adopted in masking schemes.

Further Clarifications
We further clarify the properties of the code D and its dual as follows. Let D be an RS code of parameters [n, t, n − t + 1] which is generated by H in Eqn. 3. Then its dual code D^⊥ is also an RS code, of parameters [n, n − t, t + 1] [MS77]. Recalling the connection between the RS code and the SSS scheme, D can be used to construct an (n, t)-SSS scheme.
Given that n ≥ t + 1, we assume that t + 1 ≤ n' ≤ n, and the code D' is constructed by selecting n' columns from the generator matrix H of D (or equivalently, by removing n − n' columns from H). Subsequently, the code D' has parameters [n', t, n' − t + 1]. It is also an RS code and its dual code D'^⊥ has parameters [n', n' − t, t + 1]. Therefore, the dual distance of D' is equal to that of D, namely d_{D'}^⊥ = d_D^⊥ = t + 1. In summary, removing some coordinates (while keeping n' ≥ t + 1) in an RS code does not decrease its dual distance (at word-level).
Remark 3. Note that for two arbitrary linear codes D and D', where the latter is generated from the former as above (by selecting some coordinates), we have the following lemma for their dual distances.
Lemma 6. d_{D'}^⊥ ≥ d_D^⊥.
Proof. Assume u ∈ D'^⊥. By appending n − n' zeros to u, the new word (u, 0_{n−n'}) is also a codeword of D^⊥. Therefore we have d_{D'}^⊥ ≥ d_D^⊥.
Interestingly, Lemma 6 implies that, given a fixed t, adding more shares in an (n, t)-SSS based masking cannot increase the security order of the corresponding masking scheme and may well lower it, especially under the bit-probing model.
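The statement of the proof above (a word of the dual of the truncated code, padded with zeros, lies in the dual of the full code) can be checked by brute force on a small binary code. The sketch below is only an illustration: the [4, 2] code is a hypothetical toy example, not one of the paper's codes, and a trivial dual is given distance infinity by convention.

```python
import math
from itertools import combinations, product


def dual_distance(gen_rows, n):
    """Minimum nonzero weight of the dual of the binary code spanned by
    gen_rows (brute force); math.inf if the dual contains only zero."""
    best = math.inf
    for v in product((0, 1), repeat=n):
        w = sum(v)
        if 0 < w < best and all(
            sum(a * b for a, b in zip(v, row)) % 2 == 0 for row in gen_rows
        ):
            best = w
    return best


def select_columns(gen_rows, keep):
    """Generator matrix of the truncated code D' (keep only some coordinates)."""
    return [tuple(row[i] for i in keep) for row in gen_rows]


# Hypothetical toy code D of length 4 and dimension 2.
D = [(1, 1, 0, 1), (0, 1, 1, 1)]
d_full = dual_distance(D, 4)

# Dual distance of every truncation to 3 coordinates.
d_truncated = {
    keep: dual_distance(select_columns(D, keep), 3)
    for keep in combinations(range(4), 3)
}
```

On this example the dual distance strictly increases when the last coordinate is dropped, which is the word-level analogue of the claim that adding shares cannot help.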

Representing Linear Codes in Subfield F_2
We take F_2 as the subfield; then any code over F_{2^ℓ} can be expanded over this subfield by code expansion (Def. 5). We further investigate the properties of the codes D and D'.
Let D_2 and D'_2 denote the expanded codes of D and D' over F_2, respectively. Since they are not MDS codes at the bit level, there is no straightforward method to compare the dual distances of D_2 and D'_2. However, by Lemma 6, it is obvious that d_{D'_2}^⊥ ≥ d_{D_2}^⊥. This connection helps in SSS-based masking since, by increasing n, the dual distance at word-level stays the same, but the dual distance at bit-level cannot be larger than in the case with n = t + 1. Moreover, from the adversary's viewpoint, combining more than t + 1 shares may be more efficient when attacking a specific SSS-based implementation.
From the quantitative results in Sec. 3, the two parameters that have an impact on the side-channel resistance of GCM are the dual distance d_{D_2}^⊥ and the coefficient B_{d_{D_2}^⊥}. Hereafter, we use the information-theoretic metric to show how more redundant shares affect the concrete security level in SSS-based masking.
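The bit-level comparison above can be reproduced numerically for t = 1. The sketch below expands D = {u·(α_1, ..., α_n) : u ∈ F_{2^8}} over F_2 and obtains the dual weight distribution through the MacWilliams identity rather than by enumerating the dual. It rests on assumptions: the field polynomial x^8 + x^4 + x^3 + x^2 + 1 and the polynomial basis are chosen here for illustration, the public points must be nonzero, and the count returned is the plain number of minimum-weight dual codewords, not the conditioned count of Def. 9.

```python
from itertools import combinations
from math import comb

POLY = 0x11D  # ASSUMPTION: x^8 + x^4 + x^3 + x^2 + 1; the paper's field may differ


def gf_mul(a, b):
    """Multiply two elements of F_{2^8} modulo POLY (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r


def krawtchouk(d, w, n):
    """Binary Krawtchouk polynomial K_d(w) for code length n."""
    return sum((-1) ** j * comb(w, j) * comb(n - w, d - j) for j in range(d + 1))


def bit_level_dual(alphas):
    """Return (d, B_d) for the binary expansion of D = {u * alphas}:
    d is the bit-level dual distance and B_d the number of weight-d
    codewords in the binary dual, via the MacWilliams identity."""
    n = 8 * len(alphas)
    A = [0] * (n + 1)  # weight distribution of the expanded code
    for u in range(256):
        A[sum(bin(gf_mul(u, a)).count("1") for a in alphas)] += 1
    for d in range(1, n + 1):
        B_d = sum(A[w] * krawtchouk(d, w, n) for w in range(n + 1)) // 256
        if B_d:
            return d, B_d
    return None


# Hypothetical public points (alpha^0, alpha^1, alpha^2) and their 2-share subsets.
points = [1, 2, 4]
d_three_shares = bit_level_dual(points)[0]
d_two_shares = [bit_level_dual(list(p))[0] for p in combinations(points, 2)]
```

Comparing `d_three_shares` with `min(d_two_shares)` for a given choice of points shows whether the extra share lowers the bit-level dual distance or leaves it unchanged.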

More Redundancy in Sharing Leaks More
We present an information-theoretic evaluation of (3, 1)-SSS based polynomial masking. Taking n = 3 and t = 1, the three public points (α_1, α_2, α_3) can be derived by setting α_1 = α^i, α_2 = α^j and α_3 = α^k, where i, j, k must be distinct integers. Due to the equivalence of the linear codes (Sec. 2.2), we can choose i = 0 and 1 ≤ j < k ≤ 254, and obtain \binom{254}{2} = 32131 candidates rather than \binom{255}{3} = 2731135 in total. Recall that the generator matrices G and H are as in Eqn. 19. Therefore, taking a random mask u_1, X is encoded into:
Z = XG + u_1 H = (X + u_1 α_1, X + u_1 α_2, X + u_1 α_3). (20)
For all possible values of α_1, α_2, α_3 ∈ F_{2^8}, we study the dual distance d_D^⊥ and the coefficient B_{d_D^⊥} at both word-level and bit-level. As expected, all codes have the same weight enumerator at word-level (they are all MDS codes and optimal at word-level). The specific properties of the codes are listed in Tab. 2 and the MI between the leakages L and X is depicted in Fig. 3. The complete details of all linear codes for the (3, 1)-SSS based masking are available in [CG20]. For the sake of brevity, we put more codes for (3, 1)-SSS and (5, 2)-SSS based masking in Appendix B.4.
Table 2: Exhibiting different codes in (3, 1)-SSS scheme generated by Eqn. 20. Note that we take α_1 = α^i = 1, α_2 = α^j and α_3 = α^k.
As shown in Tab. 2, for the first time, we exhibit an approach to find the optimal codes for SSS-based masking and present optimal codes for (3, 1)-SSS based masking. Specifically, the code with α_1 = 1, α_2 = α^{72} and α_3 = α^{80} (in the last column of Tab. 2) is one of the best candidates for (3, 1)-SSS based masking. In addition, the generator matrices of all three optimal (nonequivalent) codes are shown in Appendix B.3. It is worth noting that the codes obtained by permuting the order of α_i for 1 ≤ i ≤ 3 are equivalent, resulting in only three optimal codes for (3, 1)-SSS based masking over F_{2^8}.
Using the same settings of (3, 1)-SSS based masking as in Sec. 4.4, the results of MI on the information leakages of 3-share and corresponding 2-share combinations are shown in Fig. 4. In each of the four cases, the main takeaway is that, given a specific t in (n, t)-SSS based masking, more shares leak more key-dependent information. Specifically, we first highlight that the smallest security order among all \binom{n}{t+1} combinations determines the side-channel security of SSS-based masking. In the context of coding theory, the dual distance of n-share SSS-based masking is determined by the minimum value of the dual distances of the truncated codes D'. Two instances are in Fig. 4(b) and 4(c), where the minimum dual distances are 2 and 3, respectively.
Secondly, when the codes in SSS and its truncated variants have the same dual distance, the parameter B_{d_D^⊥} plays a role in side-channel resistance. More precisely, a smaller B_{d_D^⊥} brings improved concrete security for GCM. Two instances are shown in Fig. 4(a) and 4(d), where the dual distances of D are 2 and 4, respectively. Interestingly, a recent work [CS21] provides empirical comparisons on some instances of (2, 1)-SSS and (3, 1)-SSS based masking, which confirms our information-theoretic evaluation.
In summary, the information-theoretic evaluations in Fig. 4 confirm that more redundancy in the sharing of GCM leaks more information. Besides, one way to find optimal codes for GCM is to build up from (sub-)optimal choices of codes with fewer shares.
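The candidate counts quoted earlier in this section for (3, 1)-SSS over F_{2^8} can be checked with a short computation. The sketch below assumes that the total of 2731135 is the number of unordered triples of distinct exponents in {0, ..., 254}, and that fixing i = 0 leaves the pairs 1 ≤ j < k ≤ 254; the variable names are illustrative only.

```python
from itertools import combinations
from math import comb

# Unordered triples of distinct exponents (i, j, k) with 0 <= i, j, k <= 254.
all_triples = comb(255, 3)

# After fixing i = 0 (alpha_1 = 1): pairs with 1 <= j < k <= 254.
reduced_candidates = [(0, j, k) for j, k in combinations(range(1, 255), 2)]

# Number of (t + 1)-share subsets an adversary may combine when n = 3, t = 1.
share_subsets = comb(3, 1 + 1)
```

The reduction from `all_triples` to `len(reduced_candidates)` is what makes the exhaustive search over public points inexpensive for n = 3.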

Revisiting the Independence Condition
Failing to ensure the independence of the shares can ruin a masking scheme by revealing key-dependent leakages of a lower order than the designed security order. For instance, the unintentional physical coupling [BDF+17] in the hardware device can combine leakages from different shares, and hence degrade the concrete security level of a masked implementation. In this section, we investigate the intra-share independence issue and show the theoretical condition of higher-order security of code-based masking, especially in GCM as it is the most general case.
Another reason why the independence condition might be broken is the existence of glitches. Let us reason on a canonical example, namely that of the exclusive-or (XOR) gate. Let Z_1 and Z_2 be two single-bit shares, which enter an XOR gate. Recall that the leakage function is P = ϕ_P ∘ φ_P as introduced in Sec. 3. Taking φ_P = 1, the leakage function is the pseudo-Boolean function ϕ_P, which maps F_2 × F_2 → R. It is equal to:
ϕ_P(Z_1, Z_2) = Z_1 ⊕ Z_2 = Z_1 + Z_2 − 2 Z_1 × Z_2.
This function can glitch because of the term Z_1 × Z_2. Indeed, if Z_1 changes, then the leading term still depends on Z_2 (derivative). Therefore, glitches are dreadful since they consist in combinations from within the chip, even before the measurement noise arrives.
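The degree-2 term of the XOR gate can be recovered mechanically from its truth table. The following sketch computes the real-valued monomial expansion (numerical normal form) of a pseudo-Boolean function by Möbius inversion and reads off its numerical degree; the function names are hypothetical, and the routine is a generic illustration rather than part of the paper's framework.

```python
from itertools import product


def numerical_normal_form(f, n):
    """Coefficients a_S such that f(z) = sum_S a_S * prod_{i in S} z_i over
    the reals, for z in {0,1}^n. S is encoded as a 0/1 indicator tuple.
    Uses a_S = sum_{T subset of S} (-1)^(|S|-|T|) f(T)."""
    coeffs = {}
    for s in product((0, 1), repeat=n):
        total = 0
        for t in product((0, 1), repeat=n):
            if all(ti <= si for ti, si in zip(t, s)):
                total += (-1) ** (sum(s) - sum(t)) * f(t)
        if total != 0:
            coeffs[s] = total
    return coeffs


def numerical_degree(f, n):
    """Largest monomial size with a nonzero coefficient (0 for constants)."""
    return max((sum(s) for s in numerical_normal_form(f, n)), default=0)


def xor_gate(z):
    return z[0] ^ z[1]


xor_nnf = numerical_normal_form(xor_gate, 2)
```

The nonzero coefficient on the monomial Z_1 × Z_2 is the cross-share term that breaks the independence of the two shares.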
An Information-Theoretic Evaluation of Intra-Share Independence. We consider the Hamming weight as the leakage model in the perfectly independent case and take the weighted square of the Hamming weight as second-order (non-linear) leakages as follows: where Z_i is an ℓ-bit share and w is the weight of the second-order leakages. As a consequence, P(Z) = φ_P(Z) is the same as the Hamming weight model with deg(P) = 1 if w = 0. Otherwise, there exists a different amount of second-order leakages, indicated by w, where the degree of P equals 2. The MI results on four candidates of w are shown in Fig. 5 for 4-bit and 8-bit variables, respectively.
It is worthwhile to note that in 2-share settings with n = 2 and t = 1, the SSS-based masking can be transformed into IPM by changing the way the public parameters α_i for 1 ≤ i ≤ n are involved. Essentially, the two schemes are different because of the structure of G and H as in Tab. 1, but they are comparable from a side-channel perspective.
The first observation from Fig. 5 is that the MI increases along with the amount of second-order leakages. More importantly, in the presence of second-order leakages, the security order under the bit-probing model [PGS+17] (indicated by the slope of the MI curves when the noise level is high) decreases by one since the degree of φ_P is 2. Similarly, the security order is reduced by two when the degree of φ_P equals 3, as in the red curves of Fig. 5(b). However, the lowest security order under the bit-probing model is bounded by that of the Boolean masking under the word-probing model. More precisely, increasing the degree of φ_P only affects the intra-share independence and therefore decreases the security order under the bit-probing model, while the degree of ϕ_P (e.g., induced by couplings) affects the security order under the word-probing model.
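The degree claim for the weighted-square leakage can be made explicit with a short expansion. The derivation below assumes the intra-share leakage has the form φ_P(Z_i) = w_H(Z_i) + w · w_H(Z_i)^2, which is one natural reading of "weighted square of Hamming weight" and may differ from the exact normalization used for Fig. 5; z_1, ..., z_ℓ denote the bits of Z_i.

```latex
% ASSUMED leakage form: \varphi_P(Z_i) = w_H(Z_i) + w \, w_H(Z_i)^2, with Z_i = (z_1, \dots, z_\ell)
\begin{align*}
w_H(Z_i)^2 &= \Big(\sum_{j=1}^{\ell} z_j\Big)^2
            = \sum_{j=1}^{\ell} z_j^2 + 2 \sum_{1 \le j < k \le \ell} z_j z_k
            = \sum_{j=1}^{\ell} z_j + 2 \sum_{1 \le j < k \le \ell} z_j z_k
            \qquad (z_j^2 = z_j \text{ for } z_j \in \{0,1\}), \\
\varphi_P(Z_i) &= (1 + w) \sum_{j=1}^{\ell} z_j + 2w \sum_{1 \le j < k \le \ell} z_j z_k .
\end{align*}
```

Under this assumed form, the numerical degree is 1 when w = 0 and 2 as soon as w ≠ 0 and ℓ ≥ 2, since the products z_j z_k of distinct bits of the same share then appear with a nonzero coefficient.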

Figure 5: The intra-share independence issue: the existence of higher-order leakages decreases the security of the corresponding masking scheme (the two public parameters are α_1 = α^i, α_2 = α^j as in Tab. 1). Note that the blue curves are for the Boolean masking.

Differences with [CGC+21] in Detail
As summarized in Sec. 1.3, this work tackles GCM, which is a more general masking scheme than the one studied in [CGC+21]. In fact, we utilize the same notion of the numerical degree and a similar coding-theoretic approach as in [CGC+21], and also the same leakage assessment metrics, namely SNR and MI. However, generalizing [CGC+21] to this work is not trivial at all; we show hereafter the technical differences from [CGC+21]. We first highlight the different constructions of the generator matrices G and H in Tab. 1 for the codes C and D, respectively. Indeed, C and D are not complementary in GCM, while they are complementary in IPM. In this respect, we show that Eqn. 7 simplifies to Eqn. 8 when C and D are complementary, thus we recover the main results in [CGC+21] (see Remark 1). As a special case, the framework proposed in [CGC+21] is applicable when C and D are complementary, e.g., when n = t + 1 in SSS-based masking.
Moreover, we prove that GCM requires introducing a more general parameter B_d (see Def. 9), which is a novel parameter for linear codes. In particular, the parameter B_d in [CGC+21] depends only on D, whereas in this work B_d depends on both C and D, which indicates the importance of selecting appropriate candidates for both of them in practice. We also provide efficient Magma scripts to evaluate this quantity [CG20].
Finally, we insist that the generalization in this work is a significant improvement that works for all GCMs. Firstly, we show in Remark 2 that the security order can be greater than the dual distance minus one in GCM, which cannot be explained by the framework in [CGC+21], but is explained by this work in a quantitative manner. Secondly, the redundancies in GCM allow detecting faults (e.g., for glitch-free designs [PR11]), which is currently an active research topic. We leave open the question of constructing coding-theoretic countermeasures against both side-channel and fault injection attacks for future investigation.

Connections with [CS21]
The SSS-based masking is also the topic of a recent work [CS21], in which Costes et al. showed that the Boolean masking is a special case of SSS-based masking when n ≤ 6. More interestingly, their simulation-based multivariate attacks [BGHR14] confirm our mathematical derivations, in particular, the information-theoretic evaluation in Fig. 4.
More generally, this work provides a unified framework for quantifying the information leakage of all GCM instances. As a straightforward application, Theorems 2 and 3 in this paper enable us to explain the empirical observations in practical attacks. For instance, the three codes for (3, 1)-SSS in Fig. 3 of [CS21] correspond to different d_D^⊥ and/or B_{d_D^⊥}. However, we stress that the three codes for (2, 1)-SSS in the same figure are not equivalent to each other but have the same d_D^⊥ equal to 4 and closely distributed B_{d_D^⊥} ∈ {11, 8, 8}. Moreover, this work presents a systematic way to select optimal codes for SSS-based masking and GCM, which is out of the scope of [CS21].

Efficient Implementations of GCM
In this paper, we optimize security without affecting the performance of GCM (there is no tradeoff between security and performance). Our coding-theoretic approach shows that both the SNR and MI security metrics concur that the dual distance and the adjusted coefficient in the weight enumerator are the two drivers for security improvements. Essentially, we stick to the definition of GCM (recall the rightmost column in Tab. 1), and propose an effective way to tune the underlying codes.
In terms of performance, the tuned codes cost the same (with respect to memory and speed) as the generic GCM. A more detailed study could attempt to represent the generator matrices G and H as compactly as possible (with as many zeros and ones in the coefficients as possible, or with a specific structure, say "cyclic" for instance). Besides, Wang et al. [WMCS20] showed a complementary way to improve the overall performance of GCM implementations by an amortization technique. Both approaches would ease an efficient implementation of GCM; we leave this as an open problem for future study.

Conclusions and Perspectives
This paper presented a unified approach to quantifying the information leakages of code-based masking in the most general case, namely GCM, which already encompasses many state-of-the-art masking schemes. Firstly, by a uniform representation of encodings in GCM, we proposed a quantitative approach to evaluate the concrete security level of GCM. The signal-to-noise ratio and mutual information are used as two complementary metrics to quantify the lowest degree of key-dependent leakages. By this unified approach, we were able to quantify the impact of different codes in GCM and optimize it by choosing optimal codes for it. Next, we evaluated the impact of public points in Shamir's Secret Sharing in the context of masking. Thanks to the unified analytic approach, we showed the impact of public points on the side-channel security orders of the corresponding masking. More importantly, we provided a roadmap to optimal linear codes for designers to optimize the SSS-based masking (and also GCM) soundly. Lastly, we revisited the independence condition behind the masking scheme and showed that the intra-share dependence could ruin higher-order security under the bounded moment model. In particular, we showed precisely how the higher-order intra-share leakages affect the side-channel security orders.
However, the construction of optimal codes for a large number of shares is still an open problem. We launched an exhaustive study on (3, 1)-SSS based masking and presented some results on (5, 2)-SSS in [CG20]. But the exhaustive enumeration would be computationally infeasible when n gets larger (e.g., n > 8) in SSS-based masking or, more generally, in GCM. A heuristic solution is to construct new (sub-)optimal codes by concatenating two optimal or sub-optimal codes, following a gradient descent idea. Alternatively, constructing the (sub-)optimal codes by an algebraic approach under certain constraints is a promising solution. We will explore both solutions for GCM in the future.

A.1 Proof of Lemma 3
In order to demonstrate Lemma 3, we clarify the computations in V[E[P(Z)|X]] as follows. Let us consider Eqn. 1 in the base field F_2, and thus let X = F_2, Y = F_2^t and Z = F_2^n. Moreover, C and D are expanded into F_2 by using code expansion (Def. 5):
• E[P(Z)|X = x] for a given x ∈ X is:
• For any variable X, we have that: Σ_{x, y ∈ C^⊥ ∩ D^⊥} P(x) P(y).
Due to Lemma 1, we have C^⊥ ∩ D^⊥ = (C ⊕ D)^⊥ in SSS-based polynomial masking, where ⊕ denotes the direct sum operation. Notice that This means that in Eqn. 23, the subtracted terms are already included in the first sum. Indeed, if x ∈ D^⊥ also satisfies x ∈ C^⊥, then x + y ∈ C^⊥ in the first sum implies y ∈ C^⊥. Therefore, Eqn. 23 can be rewritten as Eqn. 24.

A.2 Proof of Lemma 9
Proof. Note that

A.3 Proof of Lemma 10
Proof. By definition, we have: since, according to the inverse Fourier transform (by using Lemma 8), we have: Hence we obtain where C and D need not be complementary codes and |C||D| = 2^t ≤ 2^n. Indeed, since C is linear, Σ_{c∈C} (−1)^{(x+y)·c} is null when x + y does not belong to C^⊥ and equals the size of C if it does, and the same holds for D. Note that x, y ∈ D^⊥ and x + y ∈ C^⊥, which implies x + y ∈ C^⊥ ∩ D^⊥. In summary, we have the following result for E[E[P(Z)|X]^2].
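The orthogonality relation invoked in the proof above (the character sum over a linear code equals the code size on its dual and vanishes elsewhere) is easy to verify exhaustively on a small binary code. The sketch below is an illustration only; the [3, 2] code and the function names are hypothetical.

```python
from itertools import product


def span(gen_rows, n):
    """All codewords of the binary linear code spanned by gen_rows."""
    return {
        tuple(sum(c * row[i] for c, row in zip(coeffs, gen_rows)) % 2 for i in range(n))
        for coeffs in product((0, 1), repeat=len(gen_rows))
    }


def dot(x, c):
    """Scalar product over F_2."""
    return sum(a * b for a, b in zip(x, c)) % 2


def character_sum(code, x):
    """Sum over c in the code of (-1)^(x . c)."""
    return sum((-1) ** dot(x, c) for c in code)


# Hypothetical toy code C of length 3 and dimension 2; its dual is {000, 111}.
C = span([(1, 1, 0), (0, 1, 1)], 3)
dual_C = {x for x in product((0, 1), repeat=3) if all(dot(x, c) == 0 for c in C)}
sums = {x: character_sum(C, x) for x in product((0, 1), repeat=3)}
```

Every vector outside `dual_C` pairs the codewords of C into cancelling halves, which is why only x + y ∈ C^⊥ survives in the double sum.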

B.1 (3, 1)-SSS based Masking on 4-bit Variables
The information-theoretic evaluations of (3, 1)-SSS based masking over F_{2^4} are shown in Fig. 6; they are similar to the results over F_{2^8} in Fig. 3.

B.2 Comparison of MI on 1-D and n-D Leakages
We add more results on MI to compare the efficiency of different combination functions ϕ_P in exploiting information leakages. In Fig. 3, we show the advantage of using the joint distribution for trivariate leakages. In addition, we compare the two combination functions in 2-share cases by plotting the MI curves together. As shown in Fig. 7, the combination using the joint distribution is more efficient than the one using the sum in bivariate leakage scenarios. Moreover, this is true for n-variate leakages.
More importantly, the superiority of GCM can be fully unleashed by choosing appropriate codes. In this respect, our leakage quantification approach is a simple, generic and effective way to choose the optimal codes for GCM.

B.5 A Special Example from [WMCS20]
As shown in Remark 2, there are some cases of GCM in which the side-channel security order can be greater than the dual distance of D minus one. In particular, Wang et al. [WMCS20] presented an example where the generator matrices of C and D are as follows, respectively, where C^⊥ is a code with parameters [8, 6, 1] and D^⊥ has parameters [8, 4, 2]. We have d_D^⊥ = d_{D^⊥} = 2 and B_2 = 1 for D^⊥. Therefore, there is only one codeword u = [1, 1, 0, 0, 0, 0, 0, 0] ∈ D^⊥ such that w_H(u) = 2. Since u is also in C^⊥, the conditioned coefficient B_2 equals 0. As a consequence, applying Theorem 2 gives that the SNR equals 0 for deg(P) = d_D^⊥ = 2 under Hamming weight leakages (e.g., P(Z) = w_H(Z)), and then the security order is at least equal to d_D^⊥, rather than d_D^⊥ − 1. More generally, Theorem 1 gives the same conclusion for any leakage function P with deg(P) = 2.
In particular, we checked that the first nonzero B_{d_D^⊥} for nonzero codewords is B_3 = 3. Therefore, the security order is exactly 2 in the above example.