Cryptanalysis of Efficient Masked Ciphers: Applications to Low Latency

This work introduces second-order masked implementations of LED, Midori, Skinny, and Prince that do not require fresh masks to be updated at every clock cycle. The main idea lies in combining the constructions given by Shahmirzadi and Moradi at CHES 2021 with the theory presented by Beyne et al. at Asiacrypt 2020. The presented masked designs use only the minimal number of shares, i.e., three to achieve second-order security, and we use a trick that pairs S-boxes to reduce their latency. The theoretical security analyses of our constructions are based on the linear-cryptanalytic properties of the underlying masked primitive, as well as on SILVER, the leakage verification tool presented at Asiacrypt 2020. To improve this cryptanalytic analysis, we use the noisy probing model, which allows noise to be included in the framework of Beyne et al. We further provide an FPGA-based experimental security analysis confirming the second-order protection of our masked implementations.


Introduction
Ever since Kocher et al. [KJJ99] introduced differential power analysis in 1999, the cryptographic hardware community has been looking for countermeasures to protect embedded devices. The benefits, as well as the difficulties, of masking as a countermeasure against side-channel analysis attacks have been demonstrated in several scientific articles and experimental investigations. Masked implementations can be made efficient with respect to a cost function such as area or latency, and their security can be proven using abstractions such as the probing model.
Nevertheless, masking typically relies on fresh randomness to provide security in the probing model. Generating this randomness is often costly, and its security requirements are currently not well understood. The cost of this generation is also rarely reported in the academic literature, which leads to a biased view of the efficiency of certain countermeasures.
For first-order security, one can use threshold implementations as proposed in 2006 by Nikova et al. [NRR06]. Thanks to the property that these maskings maintain uniformity, it is possible to build secure masked circuits which do not require any fresh randomness.
[…] are listed in Table 1 on page 696. We should highlight that we verified the second-order security of our masked S-boxes using SILVER [KSM20] under the glitch-extended probing model, and that of our implementations through FPGA-based practical experiments. The different case studies show that our techniques are applicable to a wide range of symmetric-key primitives. Our hardware implementations (HDL code) are provided in full on GitHub.

Preliminaries
This section recalls a number of standard concepts related to Boolean masking (Section 2.1), as well as a number of useful tools in probability theory (Section 2.2), and information on d + 1 sharings (Section 2.3).

Boolean Masking and Threshold Implementations
Boolean masking is a technique based on splitting each secret variable $x \in \mathbb{F}_2$ in the circuit into shares $\bar{x} = (x_1, x_2, \ldots, x_{s_x})$ such that $x = \sum_{i=1}^{s_x} x_i$ over $\mathbb{F}_2$. A random Boolean masking of a fixed secret is uniform if all sharings of that secret are equally likely.
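As a minimal sketch (Python, assuming the standard-library `secrets` module as the randomness source; `share`, `unshare`, and `sharing_distribution` are hypothetical helper names, not from this work), a single bit can be split into $s_x$ shares whose sum over $\mathbb{F}_2$ is the secret. Enumerating all choices of the random shares confirms that every valid sharing of a fixed secret occurs equally often, i.e. the masking is uniform.

```python
import itertools
import secrets
from collections import Counter


def share(x: int, s: int) -> tuple[int, ...]:
    """Split a secret bit x into s shares whose XOR (sum over F_2) equals x."""
    # The first s - 1 shares are independent uniform random bits.
    rand = tuple(secrets.randbits(1) for _ in range(s - 1))
    # The last share is chosen so that all shares sum to the secret.
    last = x
    for r in rand:
        last ^= r
    return rand + (last,)


def unshare(shares: tuple[int, ...]) -> int:
    """Recombine a sharing by summing its shares over F_2 (XOR)."""
    x = 0
    for sh in shares:
        x ^= sh
    return x


def sharing_distribution(x: int, s: int) -> Counter:
    """Count the sharings of x produced by every choice of the s - 1 random bits."""
    counts: Counter = Counter()
    for rand in itertools.product((0, 1), repeat=s - 1):
        last = x
        for r in rand:
            last ^= r
        counts[rand + (last,)] += 1
    return counts


for secret in (0, 1):
    dist = sharing_distribution(secret, 3)
    # Each of the 2^(s-1) = 4 valid sharings occurs exactly once: the masking is uniform.
    print(secret, len(dist), sorted(set(dist.values())))
```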
In this work, we make use of threshold implementations as proposed by Nikova et al. [NRR06]. This approach has been extended by Bilgin et al. [BGN+14] to capture higher-order univariate attacks. In the following, the main properties of threshold implementations are reviewed.
Let $\bar{F}$ be a layer in the threshold implementation corresponding to a part of the circuit $F: \mathbb{F}_2^n \to \mathbb{F}_2^m$. The function $\bar{F}: \mathbb{F}_2^{n s_x} \to \mathbb{F}_2^{m s_y}$, where we assume $s_x$ shares per input bit and $s_y$ shares per output bit, will be called a sharing of $F$. The $i$-th share of the function $\bar{F}$ is denoted by $F_i: \mathbb{F}_2^{n s_x} \to \mathbb{F}_2^m$, for $i \in \{1, \ldots, s_y\}$. Sharings can have a number of properties that are relevant in the security argument for a threshold implementation; these properties are summarized in Definition 1.
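Definition 1. Let $\bar{F}$ be a sharing of $F: \mathbb{F}_2^n \to \mathbb{F}_2^m$. The sharing $\bar{F}$ is
1. correct if $\sum_{i=1}^{s_y} F_i(x_1, \ldots, x_{s_x}) = F\big(\sum_{i=1}^{s_x} x_i\big)$ for all $x_1, \ldots, x_{s_x} \in \mathbb{F}_2^n$,
2. $d$-th-order non-complete if any function of $d$ or fewer output shares $F_i$ depends on at most $s_x - 1$ input shares,
3. uniform if $\bar{F}$ maps a uniform random sharing of any $x \in \mathbb{F}_2^n$ to a uniform random sharing of $F(x) \in \mathbb{F}_2^m$.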
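The first two properties of Definition 1 can be checked by brute force for small sharings. The Python sketch below uses, purely as an illustration, the classic three-share sharing of a two-input AND from the threshold-implementation literature (it is not one of the maskings of this work). It checks correctness and first-order ($d = 1$) non-completeness; uniformity is not checked here. All function names are hypothetical.

```python
import itertools

# Input layout: v = (a1, a2, a3, b1, b2, b3); a_i and b_i together form input share i.
def f1(v):
    _, a2, a3, _, b2, b3 = v
    return (a2 & b2) ^ (a2 & b3) ^ (a3 & b2)


def f2(v):
    a1, _, a3, b1, _, b3 = v
    return (a3 & b3) ^ (a1 & b3) ^ (a3 & b1)


def f3(v):
    a1, a2, _, b1, b2, _ = v
    return (a1 & b1) ^ (a1 & b2) ^ (a2 & b1)


OUTPUT_SHARES = (f1, f2, f3)
ALL_INPUTS = list(itertools.product((0, 1), repeat=6))


def is_correct() -> bool:
    """Property 1: the output shares always sum to F(a, b) = a*b."""
    for v in ALL_INPUTS:
        a = v[0] ^ v[1] ^ v[2]
        b = v[3] ^ v[4] ^ v[5]
        if f1(v) ^ f2(v) ^ f3(v) != (a & b):
            return False
    return True


def depends_on(f, pos: int) -> bool:
    """True if flipping input bit `pos` changes f for at least one input."""
    for v in ALL_INPUTS:
        w = list(v)
        w[pos] ^= 1
        if f(v) != f(tuple(w)):
            return True
    return False


def input_shares_used(f) -> set[int]:
    """Indices (0-based) of the input shares (a_i, b_i) that f depends on."""
    return {i for i in range(3) if depends_on(f, i) or depends_on(f, i + 3)}


def first_order_non_complete() -> bool:
    """Property 2 with d = 1: each F_i depends on at most s_x - 1 = 2 input shares."""
    return all(len(input_shares_used(f)) <= 2 for f in OUTPUT_SHARES)


print(is_correct(), first_order_non_complete())
```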

Probability Theory and Fourier Analysis
Throughout the paper, random variables are denoted in boldface. The probability space will always be clear from the context. The average of a random variable $\mathbf{x}$ is denoted by $\mathbb{E}\mathbf{x}$. The probability mass or density function of $\mathbf{x}$ will be denoted by $p_{\mathbf{x}}$. In the proof of Theorem 1, we will use the Kullback-Leibler divergence from a random variable $\mathbf{x}$ to a random variable $\mathbf{y}$ on the same probability space. This measure of dissimilarity is defined as the average logarithmic likelihood ratio:
$$D_{\mathrm{KL}}(p_{\mathbf{y}} \,\|\, p_{\mathbf{x}}) = \mathbb{E}_{\mathbf{y}} \log\big(p_{\mathbf{y}}(\mathbf{y}) / p_{\mathbf{x}}(\mathbf{y})\big),$$
where $\log$ is the natural logarithm and the average is with respect to $\mathbf{y}$.
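For discrete distributions, the Kullback-Leibler divergence defined above is a short sum. The following Python sketch (the name `kl_divergence` and the dictionary representation are our own choices) evaluates it with the natural logarithm and shows that it is not symmetric in its two arguments.

```python
import math


def kl_divergence(p_y: dict, p_x: dict) -> float:
    """D_KL(p_y || p_x) = E_y[log(p_y(y) / p_x(y))] for discrete distributions.

    Distributions are dicts mapping outcomes to probabilities; log is natural.
    """
    total = 0.0
    for y, py in p_y.items():
        if py == 0:
            continue  # outcomes that y never takes do not contribute
        px = p_x.get(y, 0.0)
        if px == 0:
            return math.inf  # y takes a value that x never takes
        total += py * math.log(py / px)
    return total


biased = {0: 0.75, 1: 0.25}
uniform = {0: 0.5, 1: 0.5}
print(kl_divergence(biased, uniform), kl_divergence(uniform, biased))
```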
Most of the probability distributions in this paper are discrete distributions on $\mathbb{F}_2^n$. In the analysis of such distributions, it is often convenient to work with the Fourier transformations of probability mass functions. As will be discussed in Section 3.4, this is closely related to the well-known technique of linear cryptanalysis [TG91, Mat93]. In general, the Fourier transformation of a function $\mathbb{F}_2^n \to \mathbb{C}$ can be defined as in Definition 2 below.
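Definition 2. Let $f: \mathbb{F}_2^n \to \mathbb{C}$ be a complex-valued function on $\mathbb{F}_2^n$. The Fourier transformation of $f$ is the function $\hat{f}: \mathbb{F}_2^n \to \mathbb{C}$ defined by
$$\hat{f}(u) = \sum_{x \in \mathbb{F}_2^n} (-1)^{u^\top x} f(x).$$
Equivalently, up to a factor $2^{-n}$, $\hat{f}$ is the representation of $f$ in the basis of functions $x \mapsto (-1)^{u^\top x}$ for $u \in \mathbb{F}_2^n$. The Euclidean norm of a function $f: \mathbb{F}_2^n \to \mathbb{C}$ will be denoted by $\|f\|_2$. Since the Fourier transformation is orthogonal up to a factor $2^{n/2}$, it holds that $\|\hat{f}\|_2 = 2^{n/2} \|f\|_2$. This result is known as Parseval's theorem.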
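A direct Python sketch of the Fourier transformation, assuming the unnormalised convention $\hat f(u) = \sum_x (-1)^{u^\top x} f(x)$ (the convention that matches the factor $2^{n/2}$ in Parseval's theorem as stated). Vectors in $\mathbb{F}_2^n$ are encoded as integers, and the example probability mass function is hypothetical. The script checks Parseval's theorem numerically.

```python
import math


def fourier(f: list[float], n: int) -> list[float]:
    """hat f(u) = sum_x (-1)^(u.x) f(x) for f: F_2^n -> R.

    Vectors in F_2^n are encoded as integers 0 .. 2^n - 1; u.x is the parity
    of the bitwise AND. This direct evaluation costs O(4^n) operations.
    """
    size = 1 << n
    assert len(f) == size
    return [
        sum(f[x] * (-1) ** (bin(u & x).count("1") & 1) for x in range(size))
        for u in range(size)
    ]


def norm2(f: list[float]) -> float:
    """Euclidean norm ||f||_2."""
    return math.sqrt(sum(abs(v) ** 2 for v in f))


# A probability mass function on F_2^3 (hypothetical values).
p = [0.1, 0.2, 0.3, 0.4, 0.0, 0.0, 0.0, 0.0]
p_hat = fourier(p, 3)
# Parseval: ||hat p||_2 = 2^(n/2) ||p||_2.
print(norm2(p_hat), 2 ** 1.5 * norm2(p))
```

For larger $n$, a fast Walsh-Hadamard transform computes the same values in $O(n 2^n)$ operations. Note also that $\hat p(0) = 1$ for any probability mass function $p$.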

Masking with d + 1 Shares
It has been shown that the implementation cost of threshold implementations is high, particularly at higher orders: they use a large number of input shares, which leads to a higher area overhead, and they require a significant amount of fresh randomness when functions are composed [MPL+11, CBR+15]. More precisely, the number of input shares depends on the algebraic degree of the target Boolean function, so the implementation cost can grow with the degree. Two independent works have tried to make the number of input shares independent of the algebraic degree [RBN+15, GMK16]. They proposed methodologies that use d + 1 input shares for d-th-order security while maintaining glitch resistance on hardware platforms, with the same level of security that threshold implementations offer. Because they use fewer input shares, these constructions have a smaller area overhead and latency. These techniques generally demand fresh randomness to achieve non-completeness, in contrast to threshold implementations, where fresh masks might be needed to fulfill uniformity. A masked Boolean function following the d + 1 sharing method is divided into two separate parts by a register layer to avoid the propagation of glitches.
Based on the technique presented by Groß et al. [GMK16], a two-share variant of a 2-input AND gate $f(a, b) = x$ can be realized as
$$f_0(a_0, b_0) = a_0 b_0, \quad f_1(a_0, b_1, r) = a_0 b_1 + r, \quad f_2(a_1, b_0, r) = a_1 b_0 + r, \quad f_3(a_1, b_1) = a_1 b_1,$$
$$x_0 = f_0 + f_1, \qquad x_1 = f_2 + f_3,$$
where $a_0, a_1, b_0, b_1$ are the input shares, $r$ is a single bit of fresh randomness, and $x_0, x_1$ are the output shares. The functions $f_l$ are known as coordinate functions, and their results should be stored in registers. The part that generates the output shares by XORing the registers' outputs is known as the compression layer. To achieve first-order d + 1 hardware implementations without fresh randomness, Shahmirzadi and Moradi introduced a methodology [SM21a], which they also extended to second-order designs [SM21b].
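The two-share AND can be simulated bit by bit. This Python sketch assumes the domain-oriented construction of Groß et al. with the coordinate functions $f_0, \ldots, f_3$ written in the docstring (the indexing is our own choice). It exhaustively checks that the compressed output shares always recombine to $ab$, whatever the value of the fresh bit $r$.

```python
import itertools


def dom_and(a0: int, a1: int, b0: int, b1: int, r: int) -> tuple[int, int]:
    """Two-share AND in the style of Gross et al. [GMK16] (sketch).

    Coordinate functions (each would be stored in a register in hardware):
        f0 = a0 b0,  f1 = a0 b1 + r,  f2 = a1 b0 + r,  f3 = a1 b1
    Compression layer (XOR of register outputs):
        x0 = f0 + f1,  x1 = f2 + f3
    """
    f0 = a0 & b0
    f1 = (a0 & b1) ^ r  # cross-domain term, blinded by the fresh bit r
    f2 = (a1 & b0) ^ r  # the same r cancels when the output shares are combined
    f3 = a1 & b1
    return f0 ^ f1, f2 ^ f3


# Exhaustive correctness check over all shares and all values of r.
for a0, a1, b0, b1, r in itertools.product((0, 1), repeat=5):
    x0, x1 = dom_and(a0, a1, b0, b1, r)
    assert x0 ^ x1 == (a0 ^ a1) & (b0 ^ b1)
```

In hardware, the register stage between the coordinate functions and the compression layer is what stops glitches from combining $a_0 b_1$ and $a_1 b_0$ before $r$ is added; a software simulation like this one only captures the Boolean behaviour.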

A Noisy Probing Model
In this section, an extension of the bounded-query probing model from [BDZ20] will be introduced. In the modified model, the adversary can probe the circuit but it obtains noisy rather than exact results. This is a more realistic model and thus allows for a tighter security analysis.
The noisy probing model resembles the noisy leakage model first introduced by Chari et al. [CJRR99] and extended by Prouff and Rivain [PR13]. The main difference between the two models lies in the information given to the adversary. In the noisy leakage model, the adversary is given a noisy function of all wire values in the circuit. In our model, as in the (glitch-extended) probing model, an adversary can only probe the circuit locally. However, unlike in the probing model, the adversary's probes reveal only a noisy leakage function of the wire values. This makes the model similar to the noisy probing model of Dziembowski et al. [DFH+16]. However, the two models differ in how noisy leakage functions are defined. In addition, unlike the model of Dziembowski et al., our model is purely information-theoretic and non-asymptotic, and it limits the number of queries the adversary can make. Moreover, we apply the noisy probing model to masking rather than to re-keying applications. Nevertheless, we believe that reusing the term is justified.

Security Model
We first introduce the bounded-query probing model from Beyne et al. [BDZ20]. In this model, the security of a circuit $C$ with input $k$ against a $t$-threshold-probing adversary is quantified by means of a left-or-right security game as follows. The challenger picks a random bit $b$ and provides an oracle $O_b$, to which the adversary $A$ is given query access. The adversary queries the oracle by choosing up to $t$ wires to probe, denoted by the set $P$, and sending $P$ to the oracle along with the inputs $k_0$ and $k_1$. Note that we consider the input of the circuit to consist of both the plaintext and the key. The oracle responds with the probed wire values of $C(k_b)$. After a total of $q$ queries, the adversary responds to the challenger with a guess for $b$. For $b \in \{0, 1\}$, denote the result of the adversary after interacting with the oracle $O_b$ using $q$ queries by $A^{O_b}$. For left-or-right security, the advantage of the adversary $A$ is then defined as
$$\mathrm{Adv}^{t\text{-thr}}(A) = \big|\Pr[A^{O_0} = 1] - \Pr[A^{O_1} = 1]\big|.$$
Since we are working on hardware, we extend the above model to include the effect of glitches. These effects can result in significant leakage that is not accounted for by the standard probing model; see, for example, the attacks of Mangard et al. on several masked AES implementations [MPO05]. Whereas one of the adversary's probes normally results in the value of a single wire, a glitch-extended probe yields the values of all wires in a bundle. As glitches occur in the logic between two memory elements, they are stopped by registers; in other words, glitches do not propagate through memory elements. As a result, a glitch-extended probe returns all values leading to the probed wire until registers are reached. This extension was originally proposed in the work of Reparaz […].
In this work, we adapt the above model by changing the oracle. More specifically, we extend the notion of a probe to a noisy probe.
Instead of giving back the exact values on the wire or bundle, the noisy probe returns a noisy leakage function of the values. The formal definition of noisy leakage functions is given in Section 3.2. The new security model is depicted in Figure 1. The advantage of a noisy $t$-threshold-probing adversary $A$ will be denoted by $\mathrm{Adv}^{\text{noisy } t\text{-thr}}(A)$.
In practice, the above model corresponds to an attacker performing a $t$-th-order attack on traces. The attacker only has a limited number of traces, which corresponds to a limited number of queries. In the above security model, the adversary can pick two secret values for the masked circuit, which resembles a fixed-vs-fixed t-test. Because the adversary can pick the secret values, we model them as public values throughout this work; the security of the countermeasure instead rests on the randomness used for masking.
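To make the left-or-right game concrete, consider a noiseless, glitch-free simplification with a single query, where the only wires are the shares of one masked bit. For one query, the best possible advantage $|\Pr[A^{O_0} = 1] - \Pr[A^{O_1} = 1]|$ equals the total variation distance between the distributions of the probed values under $k_0$ and $k_1$. The Python sketch below (all names hypothetical) computes this distance exactly with rational arithmetic.

```python
import itertools
from collections import Counter
from fractions import Fraction


def probe_distribution(secret: int, probes: tuple[int, ...], n_shares: int) -> dict:
    """Exact distribution of the probed shares of a uniform Boolean masking of one bit."""
    counts: Counter = Counter()
    choices = list(itertools.product((0, 1), repeat=n_shares - 1))
    for rand in choices:
        last = secret
        for r in rand:
            last ^= r
        shares = rand + (last,)
        counts[tuple(shares[i] for i in probes)] += 1
    return {obs: Fraction(c, len(choices)) for obs, c in counts.items()}


def tv_distance(p: dict, q: dict) -> Fraction:
    """Total variation distance between two finite distributions."""
    support = set(p) | set(q)
    return sum((abs(p.get(o, 0) - q.get(o, 0)) for o in support), Fraction(0)) / 2


def best_single_query_advantage(k0: int, k1: int, probes: tuple[int, ...],
                                n_shares: int) -> Fraction:
    """max over A of |Pr[A^{O_0} = 1] - Pr[A^{O_1} = 1]| for one noiseless query."""
    return tv_distance(probe_distribution(k0, probes, n_shares),
                       probe_distribution(k1, probes, n_shares))


print(best_single_query_advantage(0, 1, (0,), 2))    # one probe on a 2-share masking
print(best_single_query_advantage(0, 1, (0, 1), 2))  # probing both shares
print(best_single_query_advantage(0, 1, (0, 1), 3))  # two probes on a 3-share masking
```

Probing fewer shares than the masking uses gives advantage zero, while probing all shares reveals the secret completely. The noisy and multi-query versions of the game are what Theorem 1 bounds.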
In Section 5, we provide several second-order maskings of symmetric primitives. For these case studies, we provide upper bounds on the advantage of probing adversaries. Using the fact that the side-channel measurements are noisy, we can relax that bound and provide much more efficient randomness-free sharings.
Figure 1: The privacy model for glitch-extended $t$-threshold-probing security, consisting of a challenger $C$, an adversary $A$, a left-right oracle $O_b$, two inputs $k_0, k_1$, a set of probes $P$, and a noisy leakage function.

Noisy Leakage Functions
In order to introduce the noisy probing model, it is necessary to introduce the notion of noisy leakage functions. Let $d$ and $m_1, \ldots, m_d$ be positive integers. In Definition 3, the Hamming weight of a vector $x = (x_1, \ldots, x_d) \in \mathbb{F}_2^{m_1} \times \cdots \times \mathbb{F}_2^{m_d}$ is the number of nonzero components, $\mathrm{wt}(x) = |\{i : x_i \neq 0\}|$. Furthermore, the set of vectors of weight $i$ will be denoted by $B_d(i)$, i.e. the Hamming circle of radius $i$: $B_d(i) = \{x : \mathrm{wt}(x) = i\}$. Definition 3 provides a quantitative description of noisy leakage functions that will be useful to obtain our main theoretical result regarding the noisy probing model, i.e. Theorem 1. In this definition, $f$ is a random function over the set of functions from $\mathbb{F}_2^{m_1} \times \cdots \times \mathbb{F}_2^{m_d}$ to $\Omega$. For example, in the Hamming weight leakage model with additive Gaussian noise, $\Omega = \mathbb{R}$. The function $q_f$ can be interpreted as a measure of similarity between the probability density functions $p_{f(x_1)}$ and $p_{f(x_2)}$ of the noisy leakage under secrets $x_1$ and $x_2$. The reciprocals of the noise parameters $\lambda_1, \ldots, \lambda_d$ then upper bound the Euclidean norm of the restriction of the Fourier transform of $q_f$ to Hamming circles of successively increasing weights.
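Definition 3 (Noisy leakage function). Let $\Omega \subseteq \mathbb{R}^n$ be a measurable set. A $d$-th-order $(\lambda_1, \ldots, \lambda_d)$-noisy leakage function $f$ is a random function from $\mathbb{F}_2^{m_1} \times \cdots \times \mathbb{F}_2^{m_d}$ to $\Omega$ such that […], with $p_{f(x)}$ the probability density function of $f(x)$.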
The noise parameters $\lambda_1, \ldots, \lambda_d$ characterize the level of the noise, with $\lambda_i$ in particular representing the 'noise level' when values from $i$ probes are combined. In principle, the noise parameters could be computed empirically from estimates of the probability distributions of the leakage (i.e. trace points) under all possible secrets. A practical evaluation of these parameters is left as future work, and we shall rely on plausible estimates instead.

Relation to Other Leakage Functions
In previous work, other definitions of noisy functions have been proposed, in particular in the context of the noisy leakage model by Duc et al. [DDF14]. There, a statistical distance is used to measure the noise of the random function. Some examples of statistical distances that can be used in this context are found in the work of Prest et al. [PGMP19]. In our work, we deviate from these statistical distances because our definition fits Theorem 1 more naturally.
The above definition of noisy leakage functions can be computed explicitly for concrete leakage models, as we will illustrate for the "Hamming weight plus Gaussian noise" model. Figure 2 shows the value of $\lambda_1$ for the Hamming-weight leakage model defined by $f(x) = \mathrm{wt}(x) + e$ with $e \sim \mathcal{N}(0, \sigma^2)$ and $\mathrm{wt}(x)$ the bitwise Hamming weight of $x$. Unsurprisingly, a larger standard deviation $\sigma$ results in a larger noise parameter $\lambda_1$.
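The same qualitative trend can be seen with a simpler quantity. This Python sketch does not compute $\lambda_1$; it samples the Hamming weight plus Gaussian noise model and computes, as a proxy, the total variation distance between the leakage distributions of two secrets, which has a closed form because both are normal with variance $\sigma^2$. The function names and the seed are hypothetical choices.

```python
import math
import random


def hw(x: int) -> int:
    """Bitwise Hamming weight wt(x)."""
    return bin(x).count("1")


def leak(x: int, sigma: float, rng: random.Random) -> float:
    """One sample of the leakage f(x) = wt(x) + e with e ~ N(0, sigma^2)."""
    return hw(x) + rng.gauss(0.0, sigma)


def leakage_tv(x1: int, x2: int, sigma: float) -> float:
    """Total variation distance between the leakage distributions of x1 and x2.

    Both are normal with variance sigma^2, so the distance has the closed form
    erf(|wt(x1) - wt(x2)| / (2 * sqrt(2) * sigma)).
    """
    return math.erf(abs(hw(x1) - hw(x2)) / (2 * math.sqrt(2) * sigma))


rng = random.Random(1)
samples = [leak(0xF, 1.0, rng) for _ in range(5)]
for sigma in (0.5, 1.0, 2.0, 4.0):
    # Larger noise makes the two secrets harder to tell apart.
    print(sigma, leakage_tv(0x0, 0xF, sigma))
```

Secrets with equal Hamming weight are indistinguishable in this model regardless of $\sigma$.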

Noise Amplification
The real-world value of the probing model relies in large part on the premise that higher-order side-channel attacks require a (geometrically) increasing number of traces to perform. Hence, it must be the case that combining information from different probes increases the noise. The following lemma shows that such an amplification of noise indeed occurs, provided that the leakage functions of the probes are independent. The latter assumption is commonly referred to as the independent-leakage assumption and was first stated by Dziembowski and Pietrzak [DP08].
Lemma 1 (Noise amplification). Let $f_1, \ldots, f_d$ be mutually independent $\lambda$-noisy first-order leakage functions, with $f_i$ from $\mathbb{F}_2^{m_i}$ to $\Omega$ for $i = 1, \ldots, d$. The random function $g$ defined by $g(x_1, \ldots, x_d) = (f_1(x_1), \ldots, f_d(x_d))$ is […].
A noisy threshold-probing adversary $A$ will be said to have independent $\lambda$-noisy probes if its probes jointly yield a leakage function $g$ of the form described in Lemma 1. The functions $f_1, \ldots, f_d$ can then be interpreted as the leakage functions of the individual probes.

Bound on the Advantage
In order to use the security model from Section 3.1 in practice, it is necessary to bound the advantage of adversaries in terms of properties of the masking that can be computed or estimated. For the noiseless bounded-query probing model, [BDZ20, Theorem 1] provides such a bound in terms of the linear-cryptanalytic properties of the masking. However, that theorem is not applicable to the new noisy probing model from Section 3.1. Hence, Theorem 1 below generalizes [BDZ20, Theorem 1], which corresponds to the case $\lambda_1 = \cdots = \lambda_t = 1$.
Similar to [BDZ20, Theorem 1], Theorem 1 below assumes that any probed wire value can be labeled as 'good' or 'bad'. The values labeled 'good' jointly reveal nothing about the secret. The 'bad' values may reveal secret information, but the leakage can be bounded in terms of $\lambda_1, \ldots, \lambda_t$ and $\varepsilon_1, \ldots, \varepsilon_t$. The parameters $\lambda_1, \ldots, \lambda_t$ are determined by physical aspects such as the leakage model and noise level, whereas the parameters $\varepsilon_1, \ldots, \varepsilon_t$ are determined by the mathematical properties of the masking. Specifically, Section 3.4 shows how these parameters can be determined using linear cryptanalysis.
Theorem 1. Let $A$ be a noisy $t$-threshold-probing adversary for a circuit $C$. Let $\lambda_1, \ldots, \lambda_t \geq 1$ and $\varepsilon_1, \ldots, \varepsilon_t \in [0, 1]$ be real numbers. Assume that for every query made by $A$ on the oracle $O_b$ with result $\mathbf{z}$, there exists a partitioning (depending only on the probe positions) of the probed wire values into two random variables $\mathbf{x}$ ('good') and $\mathbf{y}$ ('bad') such that
1. The noisy leakage function $f$ such that $\mathbf{z} = f(\mathbf{x}, \mathbf{y})$ is $(\lambda_1, \ldots, \lambda_t)$-noisy.
2. The conditional probability distribution $p_{\mathbf{y}|\mathbf{x}}$ satisfies $\mathbb{E}_{\mathbf{x}} \|\widehat{p}_{\mathbf{y}|\mathbf{x}} 1_{B_t(d)}\|_2 \leq \varepsilon_d$ for all $d = 1, \ldots, t$.
3. Any $t$-threshold-probing adversary for the same circuit $C$ that makes the same oracle queries as $A$, but receives only the 'good' wire values (i.e. those corresponding to $\mathbf{x}$) for each query, has advantage zero.
The advantage of $A$ can be upper bounded as […], where $q$ is the number of queries to the oracle $O_b$.
The proof of Theorem 1 relies on the following technical lemma. Informally, for a $(\lambda_1, \ldots, \lambda_d)$-noisy leakage function $f$, Lemma 2 upper bounds the dissimilarity between $f(\mathbf{x})$ and $f'(\mathbf{x}')$, with $\mathbf{x}$ and $\mathbf{x}'$ independent random variables and $\mathbf{x}'$ uniform random. The dissimilarity is measured using the Kullback-Leibler divergence, and the bound is expressed in terms of the noise parameters $\lambda_1, \ldots, \lambda_d$ of $f$ and the Fourier transformation of the probability mass function of $\mathbf{x}$.
Lemma 2. Let $\mathbf{x}$ be a random variable on $V = \bigoplus_{i=1}^{d} \mathbb{F}_2^{m_i}$ with probability mass function $p_{\mathbf{x}}$, and let $f$ be a $(\lambda_1, \ldots, \lambda_d)$-noisy leakage function from $V$ to $\Omega$. Let $\mathbf{x}'$ be a uniform random variable on $V$ and $f'$ a noisy leakage function independent from and identically distributed as $f$. It holds that […]
Proof. Let $\mathbf{y} = f(\mathbf{x})$ and $\mathbf{y}' = f'(\mathbf{x}')$. The goal is to upper bound the Kullback-Leibler divergence $D_{\mathrm{KL}}(p_{\mathbf{y}} \,\|\, p_{\mathbf{y}'})$ (see Section 2.2). By the law of total probability, it holds that […] The values $p_{\mathbf{y}}^2(y)$ will now be computed. By the law of total probability, it holds that […] where the second equality follows from p x (0) = |V | and x∈V p f (x) (y) = E|f −1 (y)|/|V |. Hence, […] Next, we consider the integral of $|V|\, p_{\mathbf{y}}^2(y) / \mathbb{E}|f^{-1}(y)|$ with respect to $y$. By linearity of integration, it suffices to consider each of the above three terms separately. For the first term, it holds that […] Indeed, conditioned on $f$, every $x \in V$ is mapped to exactly one $y \in \Omega$ under $f$. The second term is zero since […] Indeed, Ω y dy ≡ 1 and $\hat{1}(u) = 0$ for all $u \neq 0$. Summing the three terms, it follows that […] where the inequality $\log(1 + x) \leq x$ was used. By expanding the square, it follows that […] where $q_f$ is the function defined in Definition 3. Grouping by weight and applying the Cauchy-Schwarz inequality yields […] by definition of the noise parameters $\lambda_1, \ldots, \lambda_d$ (Definition 3).
The proof of Theorem 1 can now be stated. The first part of the proof is similar to that of the original theorem for noiseless t-threshold-probing adversaries from [BDZ20] and consists of a standard game-hopping argument. The second part of the argument is based on Lemma 2, which replaces the simpler [BDZ20, Lemma 1].
Proof of Theorem 1. Consider the following two additional games:
1. Game 't-thr-good' is a modification of the $t$-threshold-probing game in which the oracle $O_b$ replaces the 'bad' values in each query by uniform random values. In this game, $A$ essentially only receives information about the 'good' wire values.
2. In the game '∆-bad', the adversary chooses a secret input k and is given access to an oracle with the same noisy t-threshold-probing interface as O b . This oracle is either a noisy t-threshold-probing oracle for the real circuit with input k, or a modification thereof in which the 'bad' values in each query are replaced by uniform random bits. The goal is to distinguish between these two cases.
We construct an adversary $B$ for the game '∆-bad' by running the noisy $t$-threshold-probing adversary $A$. Specifically, $B$ picks a uniform random bit $b$ and forwards the corresponding secret $k_b$ chosen by $A$ to its challenger. Adversary $B$ reports the oracle as real if and only if $A$ correctly recovers $b$. Hence, by the triangle inequality, […]
The factor two in front of $\mathrm{Adv}^{\Delta\text{-bad}}(B)$ is due to our definition of 'advantage', i.e. the absolute difference between the winning and failure probabilities of $B$. It is given that $\mathrm{Adv}^{t\text{-thr-good}}(A) = 0$, so it suffices to upper bound $\mathrm{Adv}^{\Delta\text{-bad}}(B)$. Denote the result of the $i$-th query of $B$ to its oracle by $\mathbf{z}_i$ when $B$ interacts with the real noisy threshold-probing oracle, and by $\mathbf{z}'_i$ when $B$ interacts with the (partially) randomized oracle. Let $\delta_{\mathrm{TV}}(\cdot, \cdot)$ denote the total variation distance and $\otimes$ the tensor product. The distinguishing advantage of the adversary $B$ is then upper bounded by […]
where the second inequality is due to Pinsker. Since $B$ makes exactly the same queries to its oracle as $A$, the wire values probed in the $i$-th query of $B$ can also be partitioned into 'good' and 'bad' wire values. Denote these values by $\mathbf{x}_i$ and $\mathbf{y}_i$, respectively, when $B$ interacts with the real threshold-probing oracle, and by $\mathbf{x}'_i$ and $\mathbf{y}'_i$ when $B$ interacts with the (partially) randomized oracle.
There exists a $(\lambda_1, \ldots, \lambda_t)$-noisy leakage function $f_i$ such that the results of the $i$-th oracle query satisfy $\mathbf{z}_i = f_i(\mathbf{x}_i, \mathbf{y}_i)$ and $\mathbf{z}'_i = f'_i(\mathbf{x}'_i, \mathbf{y}'_i)$, with $f_i$ and $f'_i$ independent and identically distributed. By definition of '∆-bad', the random variables $\mathbf{x}_i$ and $\mathbf{x}'_i$ have the same probability distribution. Consequently, […] Up to an inconsequential reordering of bits, the values $(\mathbf{x}_i, \mathbf{y}_i)$ can be considered to be elements of […]. It then follows from (2) that […] Hence, we conclude that […]
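The last step of the proof combines three facts: the $q$ query results are independent, so their joint distribution is a tensor product; the Kullback-Leibler divergence is additive over tensor products; and Pinsker's inequality $\delta_{\mathrm{TV}}(p, q) \leq \sqrt{D_{\mathrm{KL}}(p \,\|\, q)/2}$ turns the divergence back into a distinguishing advantage. The Python sketch below checks these facts numerically for two hypothetical single-query distributions; the function names are our own.

```python
import math
from functools import reduce


def tv(p: list[float], q: list[float]) -> float:
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def kl(p: list[float], q: list[float]) -> float:
    """D_KL(p || q) with natural log; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tensor(p: list[float], q: list[float]) -> list[float]:
    """Tensor product p (x) q: the joint distribution of independent samples."""
    return [pi * qj for pi in p for qj in q]


def tensor_power(p: list[float], k: int) -> list[float]:
    """k-fold tensor product, e.g. the distribution of k independent query results."""
    return reduce(tensor, [p] * k)


p = [0.6, 0.4]  # hypothetical single-query leakage distributions
q = [0.5, 0.5]
for k in (1, 4, 16):
    # Pinsker plus additivity of KL: delta_TV(p^k, q^k) <= sqrt(k * D_KL(p || q) / 2).
    print(k, tv(tensor_power(p, k), tensor_power(q, k)), math.sqrt(k * kl(p, q) / 2))
```

The resulting bound grows like $\sqrt{q}$ in the number of queries, which is why bounds of this kind limit the number of queries the adversary can make.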

Cryptanalysis of Higher-Order Threshold Implementations
The security bound obtained in Theorem 1 depends on the parameters $\varepsilon_1, \ldots, \varepsilon_t$. These values will be determined by performing linear cryptanalysis of the masked cipher. This section provides a brief summary of the main concepts from [BDZ20] that are necessary for this analysis. Since the maskings in this paper target second-order security, we assume $t = 2$.

Linear Masking Schemes
For any linear masking scheme, there exists a vector space $V$ over $\mathbb{F}_2$ of valid sharings of zero. More specifically, an $\mathbb{F}_2$-linear secret sharing scheme is an algorithm that maps a secret $x \in \mathbb{F}_2^n$ to a random element of a corresponding coset of the vector space $V$. Let $\rho$ be a map that sends each secret $x \in \mathbb{F}_2^n$ to its corresponding coset representative. For convenience, we denote $V_a = a + V$.
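For ordinary Boolean masking with $S$ shares, $V$ consists of the share vectors that sum to zero, and one possible coset representative of a secret $x$ is $(x, 0, \ldots, 0)$. The Python sketch below assumes $n = 4$ and $S = 3$; these values, the representative, and all function names are hypothetical choices for illustration. A random sharing of $x$ is obtained as $\rho(x) + v$ with $v$ uniform in $V$.

```python
import secrets

N_BITS = 4  # hypothetical secret size n
S = 3       # number of shares


def xor_all(vec: tuple[int, ...]) -> int:
    """Sum of all shares over F_2^n (bitwise XOR)."""
    acc = 0
    for v in vec:
        acc ^= v
    return acc


def rho(x: int) -> tuple[int, ...]:
    """One possible coset representative of the secret x: (x, 0, ..., 0)."""
    return (x,) + (0,) * (S - 1)


def random_element_of_V() -> tuple[int, ...]:
    """Uniform random sharing of zero: S shares of N_BITS bits that XOR to 0."""
    head = tuple(secrets.randbits(N_BITS) for _ in range(S - 1))
    return head + (xor_all(head),)


def add(u: tuple[int, ...], v: tuple[int, ...]) -> tuple[int, ...]:
    """Share-wise addition over F_2^n (bitwise XOR)."""
    return tuple(ui ^ vi for ui, vi in zip(u, v))


def in_V(v: tuple[int, ...]) -> bool:
    """Membership test for V, the space of valid sharings of zero."""
    return xor_all(v) == 0


def mask(x: int) -> tuple[int, ...]:
    """Random element of the coset V_a = a + V with a = rho(x)."""
    return add(rho(x), random_element_of_V())
```

Any two sharings of the same secret differ by an element of $V$, which is exactly the statement that they lie in the same coset.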
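Let $\bar{G}$ be a correct sharing of a function $G: \mathbb{F}_2^n \to \mathbb{F}_2^n$ in the sense of Definition 1. Fix any $x \in \mathbb{F}_2^n$ and let $a = \rho(x)$ and $b = \rho(G(x))$. The correctness property implies that $\bar{G}(V_a) \subseteq V_b$, i.e. $\bar{G}$ maps every sharing of $x$ to a sharing of $G(x)$.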

Linear Cryptanalysis of Masked Ciphers
Linear cryptanalysis is closely related to the propagation of the Fourier transformation of a probability distribution under a function $F: V_a \to V_b$. This leads to the notion of correlation matrices due to Daemen et al. [DGV94]. The action of $F$ on probability distributions can be described by a linear operator. The coordinate representation of this operator with respect to the standard basis $\{\delta_x\}_{x \in V}$ may be called the transition matrix of $F$. Following [Bey18], the correlation matrix of $F$ is then the same operator expressed with respect to the Fourier basis. The (absolute) correlation matrix of a sharing can be defined as follows. Note that it only depends on the spaces $V_a$ and $V_b$, not on the specific choice of the representatives $a$ and $b$.

Definition 4 (Correlation matrix). For a subspace […]
In Definition 4, […] the expression $u^\top x$ is well-defined. Consequently, Definition 4 is well-defined.
The relation between Definition 4 and linear cryptanalysis is as follows: the coordinate $|C^{F}_{v,u}|$ is equal to the absolute correlation of a linear approximation over $F$ with input mask $u$ and output mask $v$. That is, $|C^{F}_{v,u}| = |2\Pr[u^\top \mathbf{x} = v^\top F(\mathbf{x})] - 1|$ for $\mathbf{x}$ uniform random on $V_a$. An important difference from ordinary linear cryptanalysis is that, for shared functions, the masks $u$ and $v$ correspond to equivalence classes. This formalizes the intuitive observation that masks which differ by a vector orthogonal to the space $V$ lead to identical absolute correlations.
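The correlation of a linear approximation over a shared function can be computed by brute force over a coset. This Python sketch again uses the illustrative three-share AND sharing (not one of the maskings of this work) and hypothetical helper names. It shows the equivalence-class effect: adding $(1,1,1,0,0,0)$, which is orthogonal to $V$, to the input mask changes $u^\top x$ by the constant secret $a$, so only the sign of the correlation can change.

```python
import itertools


def ti_and(x: tuple[int, ...]) -> tuple[int, int, int]:
    """Illustrative three-share sharing of AND; x = (a1, a2, a3, b1, b2, b3)."""
    a1, a2, a3, b1, b2, b3 = x
    return (
        (a2 & b2) ^ (a2 & b3) ^ (a3 & b2),
        (a3 & b3) ^ (a1 & b3) ^ (a3 & b1),
        (a1 & b1) ^ (a1 & b2) ^ (a2 & b1),
    )


def dot(u: tuple[int, ...], x: tuple[int, ...]) -> int:
    """u^T x over F_2."""
    return sum(ui & xi for ui, xi in zip(u, x)) & 1


def xor_vec(u: tuple[int, ...], w: tuple[int, ...]) -> tuple[int, ...]:
    """Bitwise sum of two masks."""
    return tuple(ui ^ wi for ui, wi in zip(u, w))


def coset(a: int, b: int) -> list[tuple[int, ...]]:
    """All sharings of the secret pair (a, b), i.e. the coset of V they form."""
    return [x for x in itertools.product((0, 1), repeat=6)
            if x[0] ^ x[1] ^ x[2] == a and x[3] ^ x[4] ^ x[5] == b]


def correlation(F, points, u, v) -> float:
    """Correlation of u^T x ~ v^T F(x) for x uniform on `points`."""
    total = sum((-1) ** (dot(u, x) ^ dot(v, F(x))) for x in points)
    return total / len(points)


pts = coset(1, 0)
u, v = (1, 0, 0, 0, 1, 0), (1, 0, 0)
# The two values printed below can differ in sign only.
print(correlation(ti_and, pts, u, v),
      correlation(ti_and, pts, xor_vec(u, (1, 1, 1, 0, 0, 0)), v))
```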
The link between $\varepsilon_1$, $\varepsilon_2$ and linear cryptanalysis is completed by Theorem 2 below. It shows that the coordinates of $\widehat{p}_{\mathbf{z}}$ are entries of the correlation matrix of the state transformation between the specified probe locations. In Theorem 2, the restriction of $x \in V_a$ to an index set […] This definition depends on the specific choice of the representative $a$, but the result of Theorem 2 does not.
Theorem 2 relates the linear approximations of $F$ to $\widehat{p}_{\mathbf{z}}(u, v)$ and hence provides a method to upper bound $\|\widehat{p}_{\mathbf{z}} 1_{B_2(2)}\|_2$, and therefore $\varepsilon_2$, based on linear cryptanalysis. Upper bounding the absolute correlations $|C^{F}_{v,\tilde{u}}|$ is nontrivial in general. However, the piling-up principle [Mat93, TG91] can be used to obtain heuristic estimates.
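As an illustration of the piling-up heuristic, the Python sketch below computes the maximum absolute correlation of the unshared Present S-box (the S-box used by LED) and multiplies per-S-box correlations along a trail. This only shows the mechanics; the masked S-boxes analyzed in Section 5 have their own correlation values, and the names used here are hypothetical.

```python
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity(x: int) -> int:
    """Parity of the bits of x, i.e. the sum of its bits over F_2."""
    return bin(x).count("1") & 1


def correlation(sbox: list[int], u: int, v: int) -> float:
    """Correlation of the linear approximation u^T x ~ v^T S(x), x uniform."""
    total = sum((-1) ** (parity(u & x) ^ parity(v & sbox[x])) for x in range(len(sbox)))
    return total / len(sbox)


def max_abs_correlation(sbox: list[int]) -> float:
    """Largest |correlation| over all nonzero output masks v."""
    return max(abs(correlation(sbox, u, v)) for u in range(16) for v in range(1, 16))


def piling_up(correlations: list[float]) -> float:
    """Heuristic trail correlation: the product of the per-step correlations."""
    result = 1.0
    for c in correlations:
        result *= c
    return result


c_max = max_abs_correlation(PRESENT_SBOX)
for active in (1, 5, 10):
    # Heuristic estimate for a trail through `active` S-boxes.
    print(active, piling_up([c_max] * active))
```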
Finally, note that the maskings discussed in Section 5 do not necessarily satisfy the uniformity property (see Definition 1) in each layer. If necessary, however, we extend the adversary's probes to guarantee the uniformity of the probed values. This implies that $\|\widehat{p}_{\mathbf{z}} 1_{B_2(1)}\|_2 = 0$ and consequently $\varepsilon_1 = 0$.

Masking Techniques
Shahmirzadi and Moradi [SM21b] presented second-order sharings of Skinny, Midori, Present, and Prince that require only a few bits of randomness per cycle. We use the theory from Beyne et al. [BDZ20] (summarized in Section 3.4) to create second-order sharings which require no fresh randomness. To that end, we propose two techniques to mask S-box layers. The first technique splits each quadratic function into two stages and uses non-completeness and uniformity properties. The second method builds further on the first by pairing masked S-boxes to reduce their latency.

Technique 1: Non-Completeness over Two Stages
In order to apply the theory of Beyne et al., the masking of the S-box must have a sufficiently small maximum absolute correlation in the sense of Definition 4. Although the work of Shahmirzadi and Moradi gives efficient uniform sharings, their sharings do not have this property. For example, consider their uniform sharing of the AND gate using two bits of randomness $(r_0, r_1)$ [SM21b, Sect. 3.2.1]: […] Consider the first output share after two cycles: […] Given that $b$ is considered a constant, the above sharing only linearly combines its input shares. Instead, we want nonlinear terms, such as $a_0 b_0$, to occur in the output. This property leads to Lemmas 3 to 6 in Section 5 and will be used to guarantee the multivariate security of our maskings. We thus search for uniform sharings of quadratic functions that have this nonlinearity property. In this work, we use the first-order non-complete and uniform maskings by Bilgin et al. [BNN+12] as a starting point to create low-randomness second-order maskings.
The original work of Beyne et al. requires each stage in the masked design to be second-order non-complete and uniform. This requirement comes at a high price in terms of the number of shares and thus in area. Instead, we relax this requirement by constructing sharings that are uniform only after every two stages, similar to the sharings by Shahmirzadi and Moradi. Since we will reuse randomness between S-boxes, we additionally require that our sharings still achieve first-order non-completeness after two stages.
We start from the non-complete and uniform sharings of Bilgin et al. and divide them into two stages such that each stage is second-order non-complete, using the ring re-masking technique of Reparaz et al. [RBN+15] but refreshing only the cross terms. We illustrate our approach by giving a uniform masking of an AND-XOR gate, which maps $(a, b, c)$ to $ab + c$. The masking uses random bits $r_0, \ldots, r_5$: […]
Each arrow denotes a register stage. Note that the first output share equals $a_0 b_0 + a_0 b_1 + a_1 b_0 + c_0 + r_1 + r_5$, which is still non-complete but now also contains nonlinear terms.
To ensure the second-order non-completeness of the second stage of the sharing, we use fresh randomness in a ring-refreshing configuration. Using the theory from Section 3, it will be shown in Section 5 that this randomness can be reused in every S-box. Intuitively, this can be seen as follows: due to the properties of the masking, probing two S-boxes with jointly uniformly shared inputs as in Figure 3 only gives a non-complete set of input shares, even when the randomness $r$ is removed from the construction.

Technique 2: Paired Masked S-boxes
The technique from Section 4.1 allows us to reduce the number of shares and still create randomness-free second-order sharings. Indeed, in Section 5 we will apply the above technique to design low-area, low-randomness sharings of LED, Midori-64, Skinny, and Prince. However, each quadratic function requires two register stages and this increases the latency of the masking.
Removing the second register layer breaks the non-completeness property over two stages outlined above, so second-order probing security is no longer achieved. Adding randomness to the design is not an option, as we intend to reuse all randomness in each S-box. Instead, we pair two S-boxes that use each other's inputs to help achieve second-order probing security. We also use different randomness for each S-box in the pair, but this randomness can be reused for each pair. The overall configuration is depicted in Figure 4.
Essentially, the idea is that if two probes are placed in the same pair of masked S-boxes, the randomness $(r_0, r_1)$ ensures that the probes do not observe secret information. When two different pairs using the same randomness are probed, the inputs from the paired S-boxes act as fresh randomness.
The above trick should remind the reader of the changing-of-the-guards technique due to Daemen [Dae17], as we use the inputs of one masked S-box in another one. However, here we do not use it to achieve uniformity of the sharing; instead, we use the inputs to ensure the non-completeness property of the sharing.
However, we should be careful with the above trick, as the second property required to apply the theory of Beyne et al. is that the masking has good diffusion. It will be shown in Section 5 that this helps to increase the number of active masked S-boxes. If the output of a masked S-box depended on its paired S-box, the diffusion properties of the masked cipher would be altered. Instead, we add the inputs of the second S-box such that the dependency disappears after two stages. Below, we provide an example of the masking technique for two paired AND-XOR gates. The coordinate functions of the second gate, which maps $(k, l, m)$ to $kl + m$, are
$$\begin{aligned}
f_0(k_0, l_0, m_0) &= k_0 l_0 + m_0 + a_0 + b_0 \to y_0, \\
f_1(k_0, l_1) &= k_0 l_1 + r_{11} + r_6 + a_0 \to y_1, \\
f_2(k_1, l_0) &= k_1 l_0 + r_6 + r_7 + b_0 \to y_2, \\
f_3(k_1, l_1, m_1) &= k_1 l_1 + m_1 + a_0 + b_0 \to y_3, \\
f_4(k_1, l_2) &= k_1 l_2 + r_7 + r_8 + a_0 \to y_4, \\
f_5(k_2, l_1) &= k_2 l_1 + r_8 + r_9 + b_0 \to y_5, \\
f_6(k_2, l_2, m_2) &= k_2 l_2 + m_2 + a_0 + b_0 \to y_6, \\
f_7(k_0, l_2) &= k_0 l_2 + r_9 + r_{10} + a_0 \to y_7, \\
f_8(k_2, l_0) &= k_2 l_0 + r_{10} + r_{11} + b_0 \to y_8,
\end{aligned}$$
and its three output shares are obtained by the compressions $y_0 + y_1 + y_2$, $y_3 + y_4 + y_5$, and $y_6 + y_7 + y_8$. The randomness $r_6, \ldots, r_{11}$ added to this gate can be reused for each pair of AND-XOR sharings, and $a_0, b_0$ are inputs of the paired gate. The diffusion of the masking remains unaffected; for example, the output $x_0 = a_0 b_0 + a_0 b_1 + a_1 b_0 + c_0 + r_1 + r_5$ does not depend on $(k_0, l_0)$.
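The two key properties of the paired construction can be checked exhaustively for the second gate. This Python sketch implements its coordinate functions $f_1, \ldots, f_8$ as given for the gate computing $kl + m$. It assumes $f_0 = k_0 l_0 + m_0 + a_0 + b_0$, which is the choice forced by requiring all nine coordinates to sum to $kl + m$. It checks (i) correctness and (ii) that after compression the output shares no longer depend on the borrowed inputs $a_0, b_0$, i.e. the dependency disappears after two stages. The function name and argument layout are hypothetical.

```python
import itertools


def paired_second_gate(k, l, m, r, a0, b0):
    """Second AND-XOR gate (k l + m) of a pair.

    k, l, m : 3-tuples holding the input shares of the second gate.
    r       : dict with the reusable random bits r[6] .. r[11].
    a0, b0  : input shares borrowed from the paired first gate.
    Assumption: f0 = k0 l0 + m0 + a0 + b0, inferred from the correctness requirement.
    """
    k0, k1, k2 = k
    l0, l1, l2 = l
    m0, m1, m2 = m
    y = [
        (k0 & l0) ^ m0 ^ a0 ^ b0,        # f0 (inferred)
        (k0 & l1) ^ r[11] ^ r[6] ^ a0,   # f1
        (k1 & l0) ^ r[6] ^ r[7] ^ b0,    # f2
        (k1 & l1) ^ m1 ^ a0 ^ b0,        # f3
        (k1 & l2) ^ r[7] ^ r[8] ^ a0,    # f4
        (k2 & l1) ^ r[8] ^ r[9] ^ b0,    # f5
        (k2 & l2) ^ m2 ^ a0 ^ b0,        # f6
        (k0 & l2) ^ r[9] ^ r[10] ^ a0,   # f7
        (k2 & l0) ^ r[10] ^ r[11] ^ b0,  # f8
    ]
    # Compression layer: each output share is the XOR of three registered coordinates.
    return (y[0] ^ y[1] ^ y[2], y[3] ^ y[4] ^ y[5], y[6] ^ y[7] ^ y[8])


r_bits = dict(zip(range(6, 12), (1, 0, 1, 1, 0, 0)))
shares = paired_second_gate((1, 0, 1), (0, 1, 1), (1, 1, 0), r_bits, 1, 0)
print(shares, shares[0] ^ shares[1] ^ shares[2])
```

Within each output share, $a_0$ and $b_0$ each appear an even number of times, and the ring randomness $r_6, \ldots, r_{11}$ cancels only when all three output shares are summed.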
In Section 5, we apply the technique requiring two register stages per quadratic function and the paired S-box technique to several case studies. More specifically, we investigate LED in Section 5.1, Midori in Section 5.2, Skinny in Section 5.3, and Prince in Section 5.4.

Case Studies
In this section, we apply the two masking techniques from Sections 4.1 and 4.2 to the ciphers LED, Midori, Skinny, and Prince. We use SILVER [KSM20] to study the security of the masked S-boxes and the theory from Section 3 for the security analysis of the entire masked primitive. Based on this analysis, we conclude that our maskings remain secure even against a noisy-probing adversary that can make up to $2^{27}$ queries (on the order of 100 million). The practical analysis in Section 6 supports this conclusion.
In the rest of the paper, we use the same notation as Bilgin et al. [BNN+12], who classified all 4-bit invertible S-boxes and analyzed their maskings. According to this study, 4-bit quadratic bijections fall into six classes up to affine equivalence, namely $Q^4_4$, $Q^4_{12}$, $Q^4_{293}$, $Q^4_{294}$, $Q^4_{299}$, and $Q^4_{300}$. This classification was also provided for cubic bijections, denoted by $C^4_i$.

LED
LED is a 64-bit block cipher designed by Guo et al. [GPPR11]. The cipher's state is divided into 16 four-bit cells. The variant considered here has a 128-bit master key, from which subkeys are derived using a nibble-wise permutation. The cipher consists of 12 steps, each comprising four rounds. These rounds consist of the parallel application of the Present S-box [BKL+07], a ShiftRows step, and a column-by-column multiplication with an MDS matrix.
Masking. The S-box S is given by the hexadecimal lookup table … : (a, b, c, d) → (a, d, b, …

In our first design, we construct a second-order probing-secure masked version of the S-box in such a way that it remains first-order secure without fresh masks. This enables us to reuse the fresh masks in all S-boxes and in all rounds, as discussed in more detail in the security analysis paragraph below. To guarantee the security of the design, the outputs of $A_1$, $A_2$, and the second application of $Q^4_{12}$ must be stored in registers; otherwise, first-order non-completeness would be violated. Additionally, the masking of $Q^4_{12}$ needs a register layer before compression, leading to a 5-stage design of the masked LED S-box. The full description of the coordinate functions and how they are compressed is given in Appendix A. To improve the latency, we can apply our second masking technique (introduced in Section 4.2) and remove two register layers to obtain a 3-stage design. To this end, we integrate the middle and output affine functions, $A_2$ and $A_3$, into the outputs of the quadratic functions. Namely, … c + d, bd + c, a, bd + cd + a + b) … The second-order probing-secure maskings of F and G are given in detail in Appendix C.
Note that the composition of $A_1$ and $Q^4_{12}$ has a coordinate function containing all quadratic monomials in three input variables. Hence, it does not have a 3-share non-complete and uniform sharing [BNN+12]. For this reason, we implement the input affine map $A_1$ separately and store its output in a register to ensure non-completeness. We then pair two S-boxes, following the second technique described in Section 4.2, and add randomness to the paired design. These random masks can be reused for each pair of S-boxes in the entire encryption.
Architecture. The design architecture of our fully-pipelined round-based second-order LED is depicted in Figure 5. In this design, each round spans 5 clock cycles, no S-boxes are paired, and each S-box is implemented independently. Note that we do not place any further registers to implement the cipher, since one of the register stages can be seen as the state register. As stated before, the fresh masks for one S-box can be reused for all S-boxes in all rounds. Hence, we generate 24 random bits at the start and store them for the entire encryption, avoiding the need to update the random bits every clock cycle. The structure of the second design is similar to Figure 5, except that two S-boxes are paired together and each round requires two fewer clock cycles, i.e., $A_2$ and $A_3$ are integrated into the quadratic bijections and the register layers after them are removed. Since we cannot share random bits between the paired S-boxes, we need 72 random bits at the start of each encryption. As in the first design, these bits remain the same throughout the execution of the cipher. The synthesis results for our designs are shown in Table 1. Using polynomial masking as the underlying masking scheme, the authors of [CBRN14] presented a second-order secure PRESENT S-box whose randomness complexity and latency are extremely high. More recently, Beyne et al. [BDZ20] proposed a design with 7 input shares and a high area overhead. Notably, our designs need fewer fresh masks and achieve a lower area and a higher throughput. Note that the number of fresh masks reported in Table 1 includes the initial sharing and the key schedule.

Security Analysis. We first assess the security of one round of the maskings. For this, we use the SILVER verification tool [KSM20]. Since the paired S-box is too large for the tool to handle, we split the verification into two parts.
For the first part, we removed the fresh masks from our S-box construction and checked its first-order security in the glitch-extended probing model. Second, we verified the second-order security of one S-box with fresh masks. In both cases, SILVER confirmed the security. The first part covers the case where the two probes are placed in two different pairs. The second part of the verification, together with the use of fresh randomness within one pair, covers the case where both probes are placed in the same pair. We conclude that all probe positions in a single round of the primitive can be labeled as 'good' when applying Theorem 1. Since the key schedule is an affine function, a probe on the key-addition operation depends on at most one share of the key. As a result, we can consider the shares of the key as 'good' values when applying Theorem 1. The same reasoning applies to the additional randomness that is reused across S-boxes.
Before Theorem 1 can be applied to obtain a bound on the security against noisy 2-threshold probing adversaries, the analysis of the 'bad' wire values must be completed. This is the most difficult part of the security analysis and requires determining $\varepsilon_1$ and $\varepsilon_2$. A single probe on the LED state results in a value that depends either on one column of the state (when placed at the MixColumns step) or on the inputs or outputs of two (paired) S-boxes in the same column. Hence, it suffices to consider masks (in the sense of linear cryptanalysis) that activate at most one column of the input and output state of the trail. As a result, we search for the activity pattern between two columns of the LED state that activates the least number of S-boxes. This analysis was already carried out for the LED masking of Beyne et al. [BDZ20], where it was shown that all such two- and three-round trails have correlation zero. For more than three rounds, the best linear trails span four rounds and activate 24 S-boxes. This follows from the design of LED, which follows the wide-trail strategy [DR01]. An example of the activity pattern of such an optimal trail between two columns is shown in Figure 6.
Thus, for probes placed in rounds $i$ and $i + r$ with $r \ge 3$, the relevant linear trails all have at least 24 active S-boxes. Hence, using Lemma 3, the correlations of these trails are bounded by $2^{-48}$. By the piling-up principle, we expect a similar bound for the correlation of linear approximations. It then follows that the 2-norm of the nontrivial Fourier coefficients of the observed bits $z$ can be upper bounded by … , where we have used the inequality $|\operatorname{supp} p_z| \le 2^{64}$, which follows from the fact that the observed value $z$ consists of at most 64 bits in the glitch-extended probing model: if a coordinate in $\tilde{S}$ is read, at most 12 shares are learned; if an output of the shared linear layer is probed, at most 32 shares are observed.
The above analysis motivates the following security claim, which relies only on the accuracy of the piling-up principle (which yields the estimate $\varepsilon_2 \le 2^{-32}$).
Due to the above security bound, the advantage of any noisy-probing adversary making at most $2^{27}$ queries is at most $2^{-8}$ as long as $\lambda \ge 2^6$. Based on Figure 12d from the practical evaluation on FPGA, we observe third-order univariate leakage of the masked Prince with 100 million traces. Assuming similar leakage for the masked LED, we can assume that there is a third-order distinguisher with advantage one. As a result, $1 \le 2q/\lambda^3$ for $q = 2^{27}$, and thus $\lambda \le 2^{28/3}$, i.e., roughly $\lambda \le 2^9$. Note that we expect much higher noise parameters in ASIC implementations.
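The arithmetic behind these two bounds can be reproduced in a few lines. This is a sketch under an assumption: the LED security claim is taken to have the form $\mathrm{Adv} \le \sqrt{2q\varepsilon_2}/\lambda$, which is inferred from the numbers in this section (e.g., $\lambda \ge 2^6$ for $\varepsilon_2 \le 2^{-32}$) rather than quoted from Theorem 1. All quantities are handled as base-2 logarithms, and the function names are illustrative.

```python
def log2_adv_bound(log2_eps2, log2_q, log2_lambda):
    """log2 of sqrt(2 * q * eps2) / lambda.

    ASSUMPTION: the security claims have the form Adv <= sqrt(2 q eps2) / lambda;
    this form is inferred from the stated numbers, not quoted from Theorem 1.
    """
    return 0.5 * (1 + log2_q + log2_eps2) - log2_lambda


def min_log2_lambda(log2_eps2, log2_q, log2_adv):
    """Smallest log2(lambda) for which the assumed bound is at most 2**log2_adv."""
    return 0.5 * (1 + log2_q + log2_eps2) - log2_adv


def max_log2_lambda(order, log2_q):
    """Largest log2(lambda) consistent with an order-`order` distinguisher of
    advantage one, i.e. 1 <= 2 q / lambda**order."""
    return (1 + log2_q) / order


LOG2_Q = 27  # 2^27 queries
print("LED: need log2(lambda) >=", min_log2_lambda(-32, LOG2_Q, -8))
print("LED: measured leakage gives log2(lambda) <=", max_log2_lambda(3, LOG2_Q))
```

With the same helpers, `min_log2_lambda(-21.68, 27, -8)` evaluates to 11.16 (up to floating-point rounding), matching the requirement stated for Prince in Section 5.4, and `log2_adv_bound(-44, 27, 0)` gives $-8$, the Midori figure without noise, if $\lambda = 1$ is read as the noiseless case.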

Midori
In this section, we consider the Midori-64 block cipher by Banik et al. [BBI+15]. It has a 64-bit state that is divided into 4-bit cells and a 128-bit key. Round keys are derived by alternately using the left or right half of the master key. Midori-64 uses a 4-bit cubic S-box which can be decomposed into two quadratic functions. The diffusion layer consists of a permutation of the 4-bit cells and the application of an involutive binary quasi-MDS matrix. We note that Midori-64 has been broken by Todo et al. [TLS16]. We choose this cipher to provide more variety in the applications of our masking techniques, but we do not recommend the use of the cipher.
Masking. Midori's 4-bit S-box S is given by the lookup table cad3ebf789150246 and is affine equivalent to the class $C^4_{266}$. We use the same S-box decomposition as presented by Moradi and Schneider [MS16]. Namely, we decompose the S-box as $S = A_3 \circ Q^4_{12} \circ A_2 \circ Q^4_{12} \circ A_1$ with $A_1 : (a, b, c, d) \mapsto (b, a, d, a + c)$ (lookup table 0a1b82934e5fc6d7), … The general structure of our 4-stage masked S-box can be seen in Figure 7. Notably, the outputs of $A_1$ and $A_2$ must be stored in registers. The masking of $Q^4_{12}$ also needs a register layer before compression; its sharing is given in Appendix A. This leads to a design with a latency of 4 clock cycles.
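The lookup tables quoted above can be checked mechanically. The sketch below, with hypothetical helper names, confirms that the Midori S-box is a bijection of algebraic degree 3 (computed via the binary Möbius transform) and that the formula for $A_1$ reproduces its lookup table. It assumes that $a$ is the least significant bit of a nibble, a convention chosen here because it makes the formula and the table agree; the rest of the decomposition ($A_2$, $A_3$, $Q^4_{12}$) is not reproduced.

```python
def lut(hex_string):
    """Parse a 16-entry hexadecimal lookup table such as 'cad3ebf789150246'."""
    return [int(ch, 16) for ch in hex_string]


def anf(truth_table):
    """ANF coefficients of a Boolean function via the binary Moebius transform."""
    c = list(truth_table)
    step = 1
    while step < len(c):
        for x in range(len(c)):
            if x & step:
                c[x] ^= c[x ^ step]
        step <<= 1
    return c


def degree(table, n_bits=4):
    """Algebraic degree: the largest monomial degree over all coordinate functions."""
    deg = 0
    for bit in range(n_bits):
        coeffs = anf([(y >> bit) & 1 for y in table])
        for monomial, coeff in enumerate(coeffs):
            if coeff:
                deg = max(deg, bin(monomial).count("1"))
    return deg


def a1(x):
    """A_1 : (a, b, c, d) -> (b, a, d, a + c), ASSUMING a is the least significant bit."""
    a, b, c, d = ((x >> i) & 1 for i in range(4))
    out = (b, a, d, a ^ c)
    return sum(bit << i for i, bit in enumerate(out))


MIDORI_S = lut("cad3ebf789150246")
A1 = lut("0a1b82934e5fc6d7")

print("bijective:", sorted(MIDORI_S) == list(range(16)))
print("degree of S:", degree(MIDORI_S))
print("A_1 formula matches table:", [a1(x) for x in range(16)] == A1)
```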
To reduce the latency of our design by removing the register layer after $A_2$, we follow the technique described in Section 4.2 and pair two masked S-boxes. More precisely, similar to LED, we integrate $A_2$ (respectively $A_3$) with $Q^4_{12}$ and define … bd + cd + b, a, bd + a + c + 1) … If we integrated the affine map $A_1$ into the first quadratic bijection $Q^4_{12}$, we would face the same difficulty as observed for LED: all quadratic monomials of three input variables would appear in one coordinate function. Hence, we keep the affine map $A_1$ in the decomposition and store its output in a register. Since we removed the register after the compression layer of F, we pair two S-boxes to ensure their second-order probing security. As discussed in Section 4.2, this requires adding independent fresh masks to the two S-boxes in the pair. These fresh masks can be reused for all S-box pairs in the cipher. The maskings of F and G are given in Appendix E.
Architecture. The design architecture of our round-based second-order Midori-64 implementation, which supports both encryption and decryption, is illustrated in Figure 7. Note that in this design, the affine functions are implemented separately and the S-boxes are not paired. In this design, 24 bits of fresh masks are generated at the start of encryption and remain unchanged during the execution. These fresh masks are used in all S-boxes. The same holds for the low-latency design; however, its number of fresh masks is higher, i.e., the design must receive 72 bits of fresh randomness along with the shared input and key. The synthesis results for this design can also be found in Table 1. For comparison with the state of the art, Shahmirzadi and Moradi [SM21b] presented a second-order secure Midori-64. It requires more randomness than our designs, and our designs have roughly the same delay and area overhead as theirs.

Security Analysis. We first investigate the probing security of our S-box designs and of one round of the masked primitive. This verification is performed with the SILVER tool [KSM20] in the same way as for LED. From the verification, we can conclude the second-order probing security of one round of the masked Midori. Hence, all probe positions in a single round of the primitive can be labeled as 'good' following Theorem 1. Similarly, since the key schedule is an affine function, the same labeling applies to it.
As for LED, the perfect first-order security of the masking implies $\varepsilon_1 = 0$. To estimate $\varepsilon_2$, an argument similar to the case of LED is used. However, there is one important novelty in the analysis of Midori below: rather than assuming that the entire key is constant, which was sufficient for the analysis of LED, we rely on the fact that the adversary can observe only a few key bits. This significantly reduces the correlation of the best trail.
We first compute the maximum absolute correlation of the masked Midori S-box. The following lemma provides an upper bound, which can be verified using the software in the supplementary material.
Lemma 4. Let $\tilde{S} : V_a \to V_b$ be any restriction of the sharing of the Midori S-box S defined above. Denote its absolute correlation matrix by $|C^{\tilde{S}}|$. For any $u, v \in \mathbb{F}_2/V^{\perp}$ such that $u \neq 0$, it holds that $|C^{\tilde{S}}_{u,v}| \le 2^{-2}$.

Figure 8: … with only one active column in the input and output masks. If the randomness of the keys $k_1$ and $k_2$ is taken into account, then the trail has correlation zero.
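Lemma 4 concerns restrictions of the shared S-box and is verified with the authors' supplementary software, which is not reproduced here. As a related but simpler computation, the sketch below builds the correlation matrix of the unshared Midori S-box from its lookup table, using the convention $C_{v,u} = 2^{-4} \sum_x (-1)^{u \cdot x + v \cdot S(x)}$ with input mask $u$ and output mask $v$. The function names are hypothetical, and the result does not bound the shared restrictions $\tilde{S}$.

```python
def lut(hex_string):
    """Parse a 16-entry hexadecimal lookup table."""
    return [int(ch, 16) for ch in hex_string]


def parity(x):
    """Parity of the bits of x, i.e. the GF(2) sum of its bits."""
    return bin(x).count("1") & 1


def correlation_matrix(table, n_bits=4):
    """C[v][u] = 2^-n * sum_x (-1)^(u.x + v.S(x)) for input mask u, output mask v."""
    size = 1 << n_bits
    return [
        [sum(-1 if parity(u & x) ^ parity(v & table[x]) else 1 for x in range(size)) / size
         for u in range(size)]
        for v in range(size)
    ]


def max_abs_correlation(table, n_bits=4):
    """Largest |C[v][u]| over nonzero input masks u (unshared S-box only)."""
    c = correlation_matrix(table, n_bits)
    size = 1 << n_bits
    return max(abs(c[v][u]) for v in range(size) for u in range(1, size))


MIDORI_S = lut("cad3ebf789150246")
print("max |correlation|, unshared S-box:", max_abs_correlation(MIDORI_S))
```

Each row of the matrix satisfies Parseval's relation $\sum_u C_{v,u}^2 = 1$, which makes a convenient self-check when adapting the code to other S-boxes.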
As in the analysis of LED, a probe can only activate one column of the Midori state. For fixed keys, the best trails between two columns essentially follow the ones given by Banik et al. [BBI+15, Fig. 9] and span four rounds, activating 15 S-boxes. Figure 8 depicts such a trail. Although the correlation of this trail is quite small ($\le 2^{-30}$), this is not sufficient to obtain a good bound on $\varepsilon_2$ due to the potentially large size of $\operatorname{supp} p_z$.
However, taking into account the randomness of the key, the correlation of all four- and five-round trails is necessarily zero. Indeed, each of the adversary's probes reveals bits from at most one cell of the shared round keys $k_1$ and $k_2$ (note that these bits are labeled 'good'). To ensure a zero mask on the key $k_1$, the masks $u$ and $v$ indicated in Figure 8 must satisfy $u_i = v_i$ for all cells $i \in \{1, \dots, 16\} \setminus \{j\}$, where cell $j$ is the probed cell. Since only column $j$ of $u$ can be nonzero, the same must be true for $v$. However, this is impossible by the two-round diffusion of Midori, which ensures that at least three columns of $v$ are nonzero. This implies that the correlation of any trail over four rounds is zero. The same reasoning applies to five rounds; the only difference is that two cells of $k_1$ are known and two columns of $u$ and $v$ can be nonzero.
Banik et al. [BBI+15, Tbl. 7] showed that a linear trail over 5 rounds of Midori must activate at least 23 S-boxes. Hence, for probes placed in rounds $i$ and $i + r$ with $r \ge 6$, the relevant linear trails all have at least 23 active S-boxes. Note that this is an underestimate, since it does not take into account the influence of the key addition and the fact that only one column of the input and output state can be active. Based on this, it can be concluded that the absolute correlation of the best trail is at most $2^{-46}$. It follows that the 2-norm of the nontrivial Fourier coefficients of the observed bits $z$ can be upper bounded by … , where we have used the inequality $|\operatorname{supp} p_z| \le 2^{48}$, which follows from the fact that the observed value $z$ consists of at most 48 bits in the glitch-extended probing model: if a coordinate in $\tilde{S}$ is read, at most 12 shares are learned; if an output of the shared linear layer is probed, at most 24 shares are observed.
The above analysis motivates the following security claim, which relies only on the accuracy of the piling-up principle.

Security Claim 2. Let A be a noisy 2-threshold probing adversary for the masking of Midori described in this section. If A makes at most q queries and the probes of A are independent and λ-noisy, then the advantage of A is bounded by (assuming piling-up)

$$
\mathrm{Adv}^{\text{noisy 2-thr}}(A) \le \sqrt{\frac{q}{\lambda^2 \, 2^{43}}}.
$$
Given that the adversary can make up to $2^{27}$ queries, even without noise the above bound yields a maximum advantage of $2^{-8}$. Noise can only decrease the advantage of the adversary.

SKINNY
In this section, we consider the tweakable block cipher Skinny from Beierle et al. [BJK+16] with a 64-bit state and a 64-bit tweakey. The state is divided into 4-bit cells which are processed using the cubic Piccolo S-box [SIH+11], a ShiftRows operation, and a lightweight matrix multiplication with the columns of the state. This matrix only has branch number two. The tweakey is processed using affine operations.
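Branch numbers such as the one quoted above can be computed by brute force. The sketch below does so for the Skinny MixColumns matrix and, for comparison, for the involutive quasi-MDS matrix of Midori from Section 5.2. The two binary matrices are written as we recall them from the cipher specifications and should be checked against those documents. Since a binary matrix acts independently on each bit slice of the 4-bit cells, enumerating vectors over $\mathbb{F}_2^4$ suffices.

```python
from itertools import product

# Binary matrices as recalled from the cipher specifications (verify against the
# official documents): Midori's involutive quasi-MDS matrix and Skinny's MixColumns.
MIDORI_M = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
SKINNY_M = [[1, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]


def mat_vec(m, x):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, x)) & 1 for row in m]


def branch_number(m):
    """min over nonzero x of wt(x) + wt(Mx).

    A binary matrix acts on each bit slice of the 4-bit cells independently, so
    single-slice vectors x in GF(2)^n already attain the minimum over cell vectors.
    """
    n = len(m)
    return min(sum(x) + sum(mat_vec(m, x))
               for x in product((0, 1), repeat=n) if any(x))


def is_involution(m):
    """True if M * M is the identity over GF(2)."""
    n = len(m)
    for j in range(n):
        e = [int(i == j) for i in range(n)]
        if mat_vec(m, mat_vec(m, e)) != e:
            return False
    return True


print("Midori:", branch_number(MIDORI_M), "involutive:", is_involution(MIDORI_M))
print("Skinny:", branch_number(SKINNY_M))
```

The low branch number of the Skinny matrix is what motivates the additional refreshing randomness discussed below.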
Masking. The S-box S of Skinny is given by the lookup table c6901a2b385d4e7f and belongs to the cubic class $C^4_{223}$. We use the same decomposition as Shahmirzadi and Moradi [SM21b], namely $S = A_3 \circ Q^4_{294} \circ A_2 \circ Q^4_{294} \circ A_1$. All affine functions are bit-permutations up to addition of a constant. The affine functions are defined as … The full description of the coordinate functions and how we realized a second-order secure version of $Q^4_{294}$ is provided in Appendix B. The general structure of the 4-stage masked S-box is similar to Midori's S-box. More precisely, we placed register layers at the input and output of the first application of $Q^4_{294}$ as well as before the compression layers. In addition, we add 16 bits of fresh masks to refresh the output of every pair of S-boxes in a column, to compensate for the weak diffusion layer of Skinny. Note that it is also possible to avoid this extra randomness, as the randomness of the masked key can serve as a substitute, as in the analysis of Midori in Section 5.2. However, recall that the tweak part of the tweakey can be public and does not necessarily need to be masked. As a result, the complete construction uses a total of 40 bits of fresh masks.
We then used our second masking technique to improve the latency by pairing two S-boxes, similar to Midori-64 and LED. Namely, the first application of $Q^4_{294}$ in the first S-box receives one share from the first application of $Q^4_{294}$ in the second S-box as input, and vice versa. Since all affine layers are bit-permutations, they do not need to be integrated with the quadratic functions in the S-box. The sharing of this construction is provided in Appendix D. In this way, we can omit the register layer after the first $Q^4_{294}$. In summary, we provide a three-share second-order probing-secure realization of the Skinny S-box with 3 register stages that uses 48 bits of fresh masks. Again, an additional 16 bits of fresh randomness are used to refresh the output of each pair of S-boxes to avoid multivariate leakage. The total of 64 fresh random bits can be reused in every pair of S-boxes.
Architecture. The design architecture of our fully-pipelined round-based second-order Skinny-64 is depicted in Figure 9. Each round is performed in 4 clock cycles. Note that no further register layer is necessary, as one of the register layers in the S-box construction can be seen as the state register. As stated before, 40 bits of fresh masks must be given to the design at the start of the encryption. The low-latency design needs one fewer clock cycle per round at the cost of additional fresh randomness, i.e., 64 bits of fresh randomness must be fed to the design. Table 1 shows the corresponding performance figures. Recently, a second-order secure design of Skinny-64-64 was presented by Shahmirzadi and Moradi [SM21b]. It requires more fresh masks per encryption than our design. Furthermore, our low-latency design is 25% faster in terms of clock cycles. Note that the Skinny key schedule is a linear function, and other variants with larger key sizes can easily be implemented, even though we only constructed Skinny-64-64, i.e., with a 64-bit key.

Figure 9: Design architecture of our round-based second-order Skinny-64 encryption function.
Security Analysis. We start with the security analysis of one round of the primitive. This was analyzed with SILVER [KSM20], following the approach outlined in the security analysis of LED. However, in this case we were also able to evaluate the second-order probing security of the paired S-boxes with SILVER, because the Skinny S-box is rather simple. Moreover, the number of fresh masks is lower than in our low-latency Midori and LED (Present S-box) designs. We can thus label all probe positions placed in a single round of the masked Skinny as 'good' when applying Theorem 1. Again, since the tweakey schedule is an affine function, we can also consider the shared tweakey variables as 'good'. We then switch to the multiple-round probing security analysis and consider the 'bad' probe positions for Theorem 1. Since the masking is perfect first-order secure, we have $\varepsilon_1 = 0$. We now investigate $\varepsilon_2$. We first bound the maximum absolute correlation of the masked Skinny S-box in Lemma 5. This result can be verified using the software in the supplementary material of the submission.
Lemma 5. Let $\tilde{S} : V_a \to V_b$ be any restriction of the sharing of S defined above. Denote its absolute correlation matrix by $|C^{\tilde{S}}|$. For any $u, v \in \mathbb{F}_2/V^{\perp}$ such that $u \neq 0$, it holds that $|C^{\tilde{S}}_{u,v}| \le 2^{-2}$.
As mentioned above, we add some additional randomness to avoid trails with nonzero correlation over two or three rounds of Skinny resulting from two probes. In particular, we add these 16 extra random bits after each S-box pair and use a development version of ArxPy², equipped with the SMT solver Boolector [NPB15], to search for the best trail resulting from two probes, taking into account that only one column of the state can be active in the input and output masks. Remarkably, the best trail found by the software covers 6 rounds and activates at least 34 masked S-boxes. Hence, the correlations of these trails are bounded by $2^{-68}$. It then follows that $\varepsilon_2$ can be upper bounded by … , where we have used the inequality $|\operatorname{supp} p_z| \le 2^{24}$, which follows from the fact that the observed value $z$ consists of at most 24 bits in the glitch-extended probing model: if a coordinate in $\tilde{S}$ is read, at most 12 shares are learned; if an output of the shared linear layer is probed, at most 12 shares are observed.

The above analysis motivates the following security claim, which relies only on the accuracy of the piling-up principle.

Security Claim 3. Let A be a noisy 2-threshold probing adversary for the masking of Skinny described in this section. If A makes at most q queries and the probes of A are independent and λ-noisy, then the advantage of A is bounded by (assuming piling-up)
$$
\mathrm{Adv}^{\text{noisy 2-thr}}(A) \le \sqrt{\frac{q}{\lambda^2 \, 2^{111}}}.
$$
It is clear that, even without noise, the above bound achieves the desired security level for up to $2^{27}$ queries.
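The bounds on $\varepsilon_2$ for LED, Midori, and Skinny follow the same pattern: a per-S-box correlation bound of $2^{-2}$ (Lemmas 3 to 5), a minimum number of active S-boxes, and a number of observed bits. The sketch below reproduces the stated numbers under an assumption that the displayed bounds do not spell out in this version: $\varepsilon_2 \le |\operatorname{supp} p_z| \cdot c^2$, where $c$ is the trail correlation bound, and the constant in the security claims equals $2\varepsilon_2$. Both readings are inferred from the stated values ($\varepsilon_2 \le 2^{-32}$ for LED and the exponents 43 and 111 in Security Claims 2 and 3).

```python
def log2_eps2(active_sboxes, observed_bits, log2_sbox_corr=-2.0):
    """log2 of |supp p_z| * c**2, with log2(c) = active_sboxes * log2_sbox_corr.

    ASSUMPTION: eps2 <= |supp p_z| * c**2 and |supp p_z| <= 2**observed_bits;
    this reading reproduces the numbers stated in Sections 5.1 to 5.3.
    """
    log2_trail_corr = active_sboxes * log2_sbox_corr
    return observed_bits + 2 * log2_trail_corr


# (minimum active S-boxes, bits observed by two probes), as stated in the text.
CASES = {"LED": (24, 64), "Midori-64": (23, 48), "Skinny-64": (34, 24)}

for name, (active, bits) in CASES.items():
    e = log2_eps2(active, bits)
    # Assumed claim form: Adv <= sqrt(2 q eps2) / lambda = sqrt(q / (lambda^2 * 2^k)).
    print(f"{name}: log2(eps2) <= {e:g}, k = {-(e + 1):g}")
```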

PRINCE
Prince is a low-latency and energy-efficient cipher introduced by Borghoff et al. [BCG+12]. The cipher's state is divided into 16 four-bit cells and the cipher uses a 128-bit key which is split into two 64-bit subkeys. The nonlinear layer of Prince uses a cubic S-box which can be split into three quadratic functions and whose inverse is affine equivalent to itself. The diffusion layer consists of a ShiftRows step combined with an involutive quasi-MDS matrix.
Masking. Prince uses both its S-box S, given by the lookup table bf32ac916780e5d4, and its inverse $S^{-1}$, given by b732fd89a6405ec1, during encryption and decryption. The S-box and its inverse are both affine equivalent to the cubic class $C^4_{223}$ [MS16]. Based on the study published by Moradi and Schneider [MS16], we can decompose the inverse S-box as … We can use the same sharing of $Q^4_{294}$ as in our masking of Skinny to make a second-order masking of the inverse S-box using 38 bits of fresh masks (see Appendix B). However, we make a slight change to the middle quadratic function: we moved some linear terms of the direct sharing in order to lower the maximum absolute correlation of the S-box (Lemma 6). The general architecture of our design is shown in Figure 10. Namely, each masking of $Q^4_{294}$ requires a register layer before compression. Further, a register layer must be placed at the output of $A_1$, $A_2$, and $A_3$. This results in a design with 6 register stages. To avoid leakage at the reflection layer of Prince, we add 32 bits of extra fresh randomness after the MixColumns operation. More precisely, we fully refresh a column consisting of 4 nibbles and reuse the same masks to refresh the other columns as well. Interestingly, Princev2 [BEK+20] does not require this extra randomness due to its key addition in the reflection layer. Additionally, the alternating use of two round keys eases the security analysis of its masking.
Finally, note that the S-box and its inverse are affine equivalent, as $S = A \circ S^{-1} \circ A$, which allows us to also implement the S-box itself. The affine layer A is given by $A : (a, b, c, d) \mapsto (a + b + d + 1, a + 1, d, c + 1)$, with lookup table b8a93021edfc6574. To improve the latency of our design, we integrated the affine functions into the quadratic bijection $Q^4_{294}$ and write the S-box inverse as … Following the technique described in Section 4.2, we pair four S-boxes in each column. Namely, the F and G functions of the $i$-th S-box, with $0 \le i \le 3$, are paired with the F function of the $(i + 1 \bmod 4)$-th S-box and the G function of the $(i + 2 \bmod 4)$-th S-box. The full expressions of the coordinate functions and how we paired the functions are given in detail in Appendix F. Based on this approach, the latency of the masked S-box is reduced to 3 clock cycles.
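The relations between the Prince lookup tables stated above can be verified directly. This sketch, with hypothetical names, checks that the table given for $S^{-1}$ inverts $S$, that the formula for $A$ reproduces its lookup table, and that $S = A \circ S^{-1} \circ A$ holds pointwise. It assumes that $a$ is the least significant bit of a nibble, the convention under which the formula and the table for $A$ agree.

```python
def lut(hex_string):
    """Parse a 16-entry hexadecimal lookup table."""
    return [int(ch, 16) for ch in hex_string]


S = lut("bf32ac916780e5d4")
S_INV = lut("b732fd89a6405ec1")
A = lut("b8a93021edfc6574")


def a_formula(x):
    """A : (a, b, c, d) -> (a + b + d + 1, a + 1, d, c + 1), ASSUMING a is the LSB."""
    a, b, c, d = ((x >> i) & 1 for i in range(4))
    out = (a ^ b ^ d ^ 1, a ^ 1, d, c ^ 1)
    return sum(bit << i for i, bit in enumerate(out))


checks = {
    "S_INV is the inverse of S": all(S_INV[S[x]] == x for x in range(16)),
    "A matches its lookup table": [a_formula(x) for x in range(16)] == A,
    "S = A o S^-1 o A": all(S[x] == A[S_INV[A[x]]] for x in range(16)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```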
Architecture. Figure 10 depicts the design architecture of our fully-pipelined round-based second-order Prince supporting both encryption and decryption. In this design, each round is calculated over 6 clock cycles using a total of 70 bits of fresh masks, namely 38 bits for the S-box and 32 bits for refreshing after MixColumns. For the low-latency design, i.e., the 3-stage construction, we need to place a register layer at the input of the S-box to function as the state register. This design requires more fresh masks, i.e., 168 bits for four paired S-boxes and 32 bits for refreshing after MixColumns. Table 1 shows the corresponding performance figures. Compared to the state of the art, our designs require fewer fresh masks per encryption while maintaining similar area and throughput.
Security Analysis. We start by analyzing the security of a single round of the masked Prince. This verification was done using SILVER [KSM20]. With SILVER, we examined the first- and second-order security of our designs as outlined in Section 5.1. The verification was successful, meaning that we can label all probe positions in one round of the masking as 'good'. Since the key schedule is an affine function, it is possible to label all shares of the key as 'good' values when applying Theorem 1. Similarly, all of the random bits used in the S-boxes may be labeled 'good'. We then move to the 'bad' probe positions in Theorem 1. Since the masking is perfect first-order secure, we have $\varepsilon_1 = 0$. We thus focus on $\varepsilon_2$. We first upper bound the maximum absolute correlation of the masked Prince S-box. This can be verified using software included in the submission.
Lemma 6. Let $\tilde{S} : V_a \to V_b$ be any restriction of the sharing of S defined above. Denote its absolute correlation matrix by $|C^{\tilde{S}}|$. For any $u, v \in \mathbb{F}_2/V^{\perp}$ such that $u \neq 0$, it holds that $|C^{\tilde{S}}_{u,v}| \le 2^{-1.678}$.
A probe in the state of Prince results either in an active column when probing the diffusion layer or in up to three active cells in a column when probing the masked S-box (due to the pairing technique). Similar to Midori, we include the key schedule in our analysis. Using an argument similar to the one used for Midori, it can be shown that the randomness of the masked key ensures that a trail with nonzero correlation must cover at least five rounds. In Table 2, we show how many S-boxes are activated given the number of active cells in the first column of the diffusion layer over 2 rounds. Activating only one cell results in the activity pattern with 16 active S-boxes shown in Figure 11. This trail is not possible due to the key: it would require that at least two columns of the mask on the state after the third and fourth linear layers are equal. Furthermore, any other activity pattern starting with one activation in M must activate at least five additional S-boxes, making it inferior to the two-cell pattern from Table 2.
As a result, the best trail spans at least five rounds and gives $\varepsilon_2 = 2^{-21.68}$ (the square of the best result of Table 2).
The above analysis motivates the following security claim, which relies only on the accuracy of the piling-up principle.

Security Claim 4. Let A be a noisy 2-threshold probing adversary for the masking of Prince described in this section. If A makes at most q queries and the probes of A are independent and λ-noisy, then the advantage of A is bounded by (assuming piling-up)

$$
\mathrm{Adv}^{\text{noisy 2-thr}}(A) \le \sqrt{\frac{q}{\lambda^2 \, 2^{20.68}}}.
$$
To achieve an advantage of at most $2^{-8}$ for up to $2^{27}$ noisy-probing queries, we require $\lambda \ge 2^{11.16}$. An argument similar to the one given in Section 5.1 does not work here, since from the practical evaluation we can conclude that $\lambda < 2^9$. However, we reiterate that we expect a much higher noise parameter on ASIC. Moreover, a more detailed security analysis may provide a better bound.

Experimental Analysis
In addition to the theoretical analyses, and for the sake of completeness, we have conducted experimental analyses. To this end, we have taken our full cipher implementation of Prince introduced in Section 5.4, which needs 6 clock cycles per cipher round and in total 70 bits of fresh randomness. We implemented this design on a Xilinx Spartan-6 FPGA of a SAKURA-G evaluation board [GIS14] and supplied the device with a stable 6 MHz clock. For each required fresh mask bit, we instantiated a Linear Feedback Shift Register (LFSR) with the feedback polynomial $x^{31} + x^{28} + 1$, which, following the instructions given by De Meyer et al. [MMW18], can be efficiently realized in Xilinx FPGAs by means of only three 6-to-1 Look-Up Tables (LUTs). For each plaintext to be encrypted, the LFSRs are clocked only once to be updated. In other words, all 70 fresh mask bits stay unchanged until the encryption has finished.
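A software model of the mask generation described above can help when simulating the design. The sketch below models one Fibonacci LFSR with feedback taps at stages 31 and 28 (a common convention for $x^{31} + x^{28} + 1$) and clocks each register once per encryption, so that the 70 mask bits stay fixed for the whole encryption. It does not model the LUT-based FPGA mapping of De Meyer et al., and the class name and seeds are hypothetical.

```python
class Lfsr31:
    """Fibonacci LFSR for the feedback polynomial x^31 + x^28 + 1.

    Bit i of `state` holds stage i + 1; the feedback is the XOR of stages 31 and 28.
    A software model only: it says nothing about the LUT-based FPGA mapping.
    """

    MASK = (1 << 31) - 1

    def __init__(self, seed):
        if seed & self.MASK == 0:
            raise ValueError("the all-zero state is a fixed point; use a non-zero seed")
        self.state = seed & self.MASK

    def step(self):
        """Clock the register once and return the bit shifted out of stage 31."""
        out = (self.state >> 30) & 1
        feedback = ((self.state >> 30) ^ (self.state >> 27)) & 1
        self.state = ((self.state << 1) | feedback) & self.MASK
        return out


def fresh_masks(lfsrs):
    """Clock every LFSR once per encryption; the bits stay fixed until it finishes."""
    return [lfsr.step() for lfsr in lfsrs]


# Hypothetical seeds: one independent LFSR per fresh mask bit (70 for Prince).
lfsrs = [Lfsr31(seed=0x1234567 + 7919 * i) for i in range(70)]
masks = fresh_masks(lfsrs)
print(len(masks), masks[:8])
```

Using one register per mask bit, as in the text, keeps the bits of a single encryption independent of each other as far as the LFSR model is concerned; the model makes no claim about the statistical quality required for masking.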
Using a digital oscilloscope at a sampling rate of 500 MS/s, we collected power consumption traces by monitoring the voltage drop over a 1 Ω shunt resistor placed on the VDD path of the target FPGA. We followed the measurement strategy explained by Schneider and Moradi [SM15] to conduct a fixed-versus-random t-test (also known as TVLA [CDG+13]), where the encryption engine receives either a fixed or a random plaintext while the key is constant during the measurements. Such an analysis is intended to detect SCA leakage without performing any key-recovery attack. Note that all inputs to the device, as well as its output, are provided in masked form using 3 shares.
We performed four different analyses, the first one being an ordinary t-test on each sample point individually, i.e., first-order univariate. For second-order univariate, we first made the traces mean-free (for the fixed and random groups individually), and then squared each mean-free sample point prior to running the same t-test. The same process was performed for third-order univariate, cubing each mean-free sample point instead of squaring it. For a bivariate second-order t-test, an individual t-test must be performed for every combination of two sample points (by multiplying the corresponding mean-free sample points). Since performing such a high number of individual tests is not feasible (particularly in our case, where every power consumption trace has 7 000 sample points), we followed the trick used in [CRB+16, SM21b] and down-sampled the traces by taking one sample point per clock cycle (carefully selected in the middle of the cycle). This allows us to perform individual t-tests for every possible combination of two clock cycles.
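The preprocessing steps described above can be summarized in a small sketch. It uses Welch's t-statistic, the usual choice for fixed-versus-random tests, and implements the univariate orders 1 to 3 and the bivariate second-order combination as described: each group is made mean-free separately before powers or products are taken. The down-sampling helper keeps one sample in the middle of each clock cycle. Function names and the list-of-lists trace format are assumptions, and an evaluation over 100 million traces would use incremental (one-pass) statistics instead of storing all traces.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, variance


def welch_t(x, y):
    """Welch's t-statistic; raises ZeroDivisionError if both groups are constant."""
    return (fmean(x) - fmean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def mean_free(group):
    """Subtract the per-sample-point mean of this group (fixed or random) only."""
    means = [fmean(col) for col in zip(*group)]
    return [[v - mu for v, mu in zip(trace, means)] for trace in group]


def univariate(group, order):
    """Order 1: raw samples; order d >= 2: mean-free samples raised to the power d."""
    if order == 1:
        return [list(trace) for trace in group]
    return [[v ** order for v in trace] for trace in mean_free(group)]


def bivariate(group):
    """Second-order bivariate: products of mean-free samples for every pair of points."""
    return [[t[i] * t[j] for i, j in combinations(range(len(t)), 2)]
            for t in mean_free(group)]


def t_per_point(fixed, rand):
    """One t-statistic per (combined) sample point."""
    return [welch_t(f, r) for f, r in zip(zip(*fixed), zip(*rand))]


def downsample_mid_cycle(trace, f_sample, f_clock):
    """Keep one sample per clock cycle, taken in the middle of the cycle."""
    per_cycle = f_sample / f_clock
    n_cycles = int(len(trace) * f_clock / f_sample)
    return [trace[int((k + 0.5) * per_cycle)] for k in range(n_cycles)]
```

For example, `t_per_point(bivariate(fixed), bivariate(rand))` yields one statistic per pair of (down-sampled) clock cycles. In TVLA-style evaluations, a conventional threshold for flagging leakage is $|t| > 4.5$.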
The corresponding results are shown in Figure 12, confirming our theoretical analyses. In short, we do not detect any first-order or second-order leakage, either univariate or bivariate. As expected, the design exhibits third-order leakage, as depicted in Figure 12d. Since the third-order univariate t-test already showed detectable leakage, we omitted the extension of our multivariate analysis to the third-order case.

Conclusions
In this work, we used an extension of the bounded-query probing model from Beyne et al. [BDZ20], called the noisy probing model. This model can be seen as a hybrid between the threshold probing model and the noisy leakage model. We have shown that the concrete security of maskings can be analyzed within this model using linear cryptanalysis, extending the results of Beyne et al. [BDZ20]. The inclusion of noise makes the model more realistic and allows for relaxed design constraints.
We proposed two techniques to create second-order low-randomness masked designs. The first relaxes the joint need for second-order non-completeness and uniformity. Thanks to this relaxation, we can make low-randomness masked designs with a minimal number of shares. The second technique extends the first one by pairing two masked S-boxes. This technique allows reducing the number of register stages, thus improving the latency of the masked designs.