Quantile: Quantifying Information Leakage

Abstract. The masking countermeasure is very effective against side-channel attacks such as differential power analysis. However, the design of masked circuits is a challenging problem, since one has to ensure security while minimizing performance overheads. The security of masking is often studied in the t-probing model, and multiple formal verification tools can verify this notion. However, these tools generally cannot verify large masked computations due to computational complexity. We introduce a new verification tool named Quantile, which performs randomized simulations of the masked circuit in order to bound the mutual information between the leakage and the secret variables. Our approach ensures good scalability with the circuit size and results in proven statistical security bounds. Further, our bounds are quantitative and, therefore, more nuanced than t-probing security claims: by bounding the amount of information contained in the lower-order leakage, Quantile can evaluate the security provided by masking even for circuits that are not 1-probing secure, i.e., circuits that are classically considered insecure. As an example, we apply Quantile to masked circuits of Prince and AES in which randomness is aggressively reused.


Introduction
Since the rise of the Internet of Things (IoT), embedded devices have been integrated into a wide range of everyday services, making the protection of cryptographic keys on these devices an essential but challenging task. Physical side-channel attacks, such as power or electromagnetic analysis, may allow attackers to extract cryptographic keys by observing a device's power consumption or electromagnetic emission during cryptographic operations [KJJ99, QS01, CRR02].
One of the most prominent algorithmic countermeasures against physical side-channel attacks is masking. In a nutshell, masking is a secret-sharing technique that splits inputs, outputs, and intermediate values of cryptographic computations into t + 1 random shares such that the observation of up to t shares does not reveal any information about their corresponding unmasked value [ISW03, GMK16, BBD + 16, CS19].
Probing model. The security of masking is most often studied in the threshold probing model [ISW03]. In this model, computations are represented as arithmetic circuits, and an adversary may observe the value of up to t wires in the circuit. The implementation is then considered secure if any such observation is independent of unmasked values. While this model may seem simplistic and abstract, it has been shown that any t-probing secure circuit is also secure in the much more realistic noisy leakage model [PR13]. Furthermore, security in the t-probing model implies practical t-order security under some assumption of leakage independence for each value [BDF + 17]. Together, these observations lead to the conclusion that, both in (not very tight) theoretical reductions and in practice, t-probing security implies security for concrete physical side-channel leakage.
Despite the simplicity of the t-probing model, it is not always easy to prove the security of masked implementations in it, and multiple approaches have been proposed in the literature. The first approach, which works well for small, well-structured circuits that implement a simple functionality such as a logic or arithmetic gate (named gadgets), is to write proofs by hand. In order to extend these proofs to larger implementations, composable security definitions have been proposed that enable simple security proofs for the composition of multiple gadgets. Examples of such definitions include Strong Non-Interference (SNI) [BBD + 16] and Probe-Isolating Non-Interference (PINI) [CS19]. The main appeals of this approach are its scalability to complex computations and the ease of security verification (which can be automated [CGLS21, CS21]), which in turn enables automatic masked circuit generation [BDM + 20, KMMS22]. On the other hand, composition-based approaches often lead to less efficient circuits than non-composable constructions, since they impose additional requirements on the gadgets and hinder cross-gadget optimizations, such as randomness reuse.
Another approach is to automate security proofs for masked circuits independently of a concrete masking scheme. Multiple formal verification tools [ANR18, BBC + 19, KSM20, GHP + 21] have been introduced for this purpose, and they all have the same high-level functionality: given a masked circuit description, verify that it is t-probing secure. Despite numerous optimizations, the application of such tools is typically limited to the verification of no more than a few rounds of a masked circuit and to low masking orders. Some of these tools achieve their efficiency at the expense of having false-positive verification results (i.e., secure circuits for which leakage is reported). The PROLEAD verification tool [MM22] recently introduced an alternative approach to verification. Rather than formally proving independence, it is based on Monte Carlo sampling and statistical tests of independence. The statistical nature of the tool significantly improves its scalability towards more complex circuits with high logic depth, but admits false positives caused only by the statistical test (the probability of false positives is therefore controllable with the parameters of the test). However, this technique introduces the risk of false negatives (i.e., insecure circuits wrongly reported as secure), whose probability of occurring is harder to control due to the use of asymptotic statistical tests.
Practical security. Actual side-channel adversaries do not have access to probing leakage from some of the variables in a circuit but rather to noisy leakage on each of the variables. Further, such adversaries succeed if they recover a fixed amount of information (e.g., a key) by measuring multiple traces (i.e., executions of the circuit), the security level being the number of traces needed. This contrasts with the t-probing adversary, which has to recover any (no matter how tiny) amount of information in one trace.
For example, let (x 0 , x 1 ) be a masking of the secret x such that x 0 ⊕ x 1 = x and Pr [x 0 = 0|x = 0] = Pr [x 0 = 0|x = 1] = 0.5. Furthermore, let (l 0 , l 1 ) = (x 0 + n, x 1 + n′) be the corresponding physical leakage, where n and n′ are independent Gaussian noise variables. In this case, the circuit is 1-probing secure, and therefore, an adversary observing only l 0 or l 1 does not learn anything about x. However, the circuit is not 2-probing secure: probing both x 0 and x 1 reveals x. With the noisy leakage (l 0 , l 1 ), a second-order adversary may estimate the covariance of l 0 and l 1 using multiple leakage traces, yielding a good distinguisher for the value of x. In contrast, assume now that, for some small ε > 0, Pr [x 0 = 0|x = 0] = 0.5 + ε and Pr [x 0 = 0|x = 1] = 0.5 − ε. The circuit is no longer 1-probing secure: observing x 0 leaks information about the value of x. Practically, this bias can be exploited by a first-order adversary observing only l 0 , but an exploit requires many traces (on the order of 1/ε 2 ). However, for the second-order attack using (l 0 , l 1 ), a small bias does not have a significant impact. As a result, the practical first-order attack may require more traces than the second-order attack to be successful.
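To make the example concrete, the following sketch (with our own helper names) computes the exact first-order mutual information between the biased share and the secret, and compares it against the small-ε approximation 2ε²/ln 2, which is why the number of traces for the first-order attack grows like 1/ε².

```python
import math

def h_bin(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def first_order_mi(eps):
    """I(X0; X) for a share with Pr[x0=0|x=0] = 0.5 + eps and
    Pr[x0=0|x=1] = 0.5 - eps, X uniform: the marginal of X0 is uniform,
    so I = H(X0) - H(X0|X) = 1 - h_bin(0.5 + eps)."""
    return 1.0 - h_bin(0.5 + eps)

for eps in (0.1, 0.01, 0.001):
    # For small eps, I is close to 2 * eps^2 / ln 2
    print(eps, first_order_mi(eps), 2.0 * eps ** 2 / math.log(2.0))
```

The quadratic decay of the mutual information in ε is what makes such "benign" imperfections exploitable only with very many traces.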
The previous example shows that the probing security order is not always a good predictor of the security level. Therefore, it might be better to adopt a metric that quantifies the leaked information more accurately.
Contributions. This paper presents a methodology to quantitatively assess the security of masked circuits. The methodology relies on the noisy leakage model [PR13], i.e., we assume that the leakage is made of independent noisy leakages of each of the variables of the circuit. Then, we bound the mutual information between these variables and the secret using a statistical sampling-based technique implemented in an open-source tool named Quantile (Quantifier of Information Leakage) and available at https://github.com/vedadux/quantile .
Compared to the state-of-the-art t-probing security verification tools, our method scales well to large circuits and provides proven statistical bounds. Furthermore, beyond t-probing security, our methodology is also able to evaluate the security level in the noisy leakage model, taking into account the presence of noise in practical leakages.
Our contributions can be summarized as follows:
• We present a scalable sampling-based verification technique for masked circuits that bounds the mutual information between sensitive values and t-probing model leakage.
• We show how to turn these bounds into a lower bound on the number of attack traces needed for a worst-case adversary, allowing us to compare attacks at various orders.To the best of our knowledge, our technique is the first one to use a circuit's description to formally quantify "benign" masking imperfections, thanks to the notion of effective security order.
• We provide an optimized software implementation of our verification tool that exploits vector instructions and multi-core processors.
• We show the effectiveness of our verification approach by applying it to masked implementations of AES and Prince using different amounts of randomness reduction/reuse techniques to meet certain performance or efficiency goals in lightweight applications.
The rest of the paper has the following structure. Section 2 covers the necessary preliminaries for our masking verification technique. In Section 3, we develop a method for bounding the number of attack traces needed to carry out a side-channel attack. This method is based on bounding the mutual information between a secret and the physical leakage. Section 4 explains how approximations of the mutual information can be computed efficiently and what the overall workflow of Quantile looks like. In Section 5, we apply Quantile to masked implementations of AES and Prince using different randomness reduction/reuse techniques. Section 6 discusses related work, and finally, we conclude the paper in Section 7. Important proofs for our methodology are given in Appendix A and Appendix B.

Preliminaries
In the following, we briefly introduce the side-channel setting modeled through information channels and give an overview of entropy, mutual information, and basic estimators.
Notation. Throughout this work, we denote random variables with uppercase letters (X), their values with lowercase letters (x), and sets with calligraphic letters (X ). We use bold uppercase letters (X) for vectors of random variables. We write X ∼ U X when the random variable X follows a uniform distribution over the set X . Similarly, we write X ∼ N (µ, σ 2 ) for a random variable X that follows the normal distribution with mean µ and variance σ 2 .

Side-Channel Leakage
We first formalize the side-channel leakage of a cryptographic computation from an information-theoretic point of view. A cryptographic computation takes as input a secret S (e.g., a cryptographic key) and known data D (e.g., the plaintext) from the respective domains S and D. In general, we assume that the computation is also masked, meaning it takes additional uniformly random inputs M ∼ U M (e.g., masks), ultimately randomizing the computation even when the same secret and data are provided. We write X to denote a tuple of intermediate values of the computation.
A computation is said to be t-probing secure if all tuples X of size t are independent of S. Here, we assume that S is provided to the computation in an already masked representation and cannot be probed directly [ISW03]. Furthermore, we say that a t-probing secure computation has security order t.
However, in reality, a side-channel attacker cannot observe X directly. Instead, they observe physical leakage, which we model as the tuple L, where each element is the result of applying an independent noisy function [PR13] to the corresponding element of X.
In a side-channel attack, the secret is chosen uniformly at random, i.e., S ∼ U S . The adversary then performs n Adv computations with the same secret S but changing data D i . In the rest of this work, we additionally assume that each D i is chosen independently and uniformly at random, i.e., D i ∼ U D . Each of the computations produces noisy leakage L i , thus giving the adversary access to L = (L i ) for i = 1, . . . , n Adv . Finally, the adversary outputs a guess Ŝ for the secret S. The success rate r Adv of the attack is defined as r Adv = Pr [Ŝ = S], while the attack order is the number of elements in L. The following Markov chain summarizes the attack process: S → X → L → Ŝ. In this work, we will use the Hamming weight of the probed variable with additive Gaussian noise as an example for the noisy leakage function in L. For such a function, we can define the signal-to-noise ratio (SNR) as the variance of the deterministic part (the Hamming weight) divided by the variance of the noise [Man04].
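As a concrete illustration of this leakage model, the sketch below (helper names are ours) computes the SNR of Hamming-weight leakage for a uniformly distributed value; for b uniform bits, the signal variance is b/4.

```python
import statistics

def hamming_weight(x):
    """Number of bits set to 1."""
    return bin(x).count("1")

def hw_snr(bits, sigma2):
    """SNR of Hamming-weight leakage with additive Gaussian noise of
    variance sigma2, for a uniformly distributed `bits`-bit value:
    Var(HW) / sigma2, where Var(HW) = bits / 4."""
    hws = [hamming_weight(x) for x in range(2 ** bits)]
    return statistics.pvariance(hws) / sigma2

print(hw_snr(8, 10.0))  # Var(HW) = 2 for 8 bits, so SNR = 0.2
```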

Entropy Estimation
For a random variable X, its (Shannon) entropy represents the uncertainty in its outcome, measured in bits. For a discrete variable X, e.g., occurring in a digital computation, the entropy H (X) is defined as

H (X) = − Σ_{x ∈ X} Pr [X = x] · log2 Pr [X = x].    (2)

The above definition extends to the entropy H (X|Y = y) of X|Y = y, and to the conditional entropy

H (X|Y) = Σ_{y ∈ Y} Pr [Y = y] · H (X|Y = y).    (3)

Because of its frequent occurrence, we define the function H bin : (0, 1) → (0, 1] as

H bin (p) = −p · log2 (p) − (1 − p) · log2 (1 − p).

H bin is often referred to as the binary entropy function because it represents the entropy of a variable X with domain X = {x 0 , x 1 } and probability p = Pr [X = x 0 ]. The mutual information between random variables X and Y is defined as I (X; Y ) = H (X) − H (X|Y ), whereas the mutual information conditioned on Z is defined similarly as I (X; Y |Z) = H (X|Z) − H (X|Y, Z). The bounds for the mutual information presented in this paper rely on a very simple estimator for the entropy of a distribution. The so-called plug-in estimator first estimates the distribution of a random variable X through samples and computes (2) using this new distribution.
Definition 1 (Plug-in entropy estimator). Given a vector X^n = (X i ), i = 1, . . . , n, of independent and identically distributed random variables X i , let X̃ n be a new random variable distributed according to

Pr [X̃ n = x] = (1/n) · Σ_{i=1}^{n} 1 {X i } (x),

where 1 A (x) is an indicator function with value 1 if and only if x ∈ A, and 0 otherwise.

The plug-in entropy estimator Ĥ n (X) is defined as the entropy H (X̃ n ) computed as in (2). This estimator is negatively biased everywhere, as shown by Paninski [Pan03].
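A minimal sketch of the plug-in estimator, together with an empirical check of its negative bias (all names are ours):

```python
import math, random

def plugin_entropy(samples):
    """Plug-in estimator: empirical distribution, then Shannon entropy in bits."""
    n = len(samples)
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Empirical check of the negative bias: the entropy of a uniform variable
# over 8 values is exactly 3 bits, but with only n = 20 samples the
# estimator is noticeably below that on average.
random.seed(0)
true_h, n, trials = 3.0, 20, 2000
mean_est = sum(plugin_entropy([random.randrange(8) for _ in range(n)])
               for _ in range(trials)) / trials
print(mean_est)  # below 3.0: the estimator is negatively biased
```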
Proposition 1 (Bias of the plug-in entropy estimator [Pan03, Prop. 1]). For a discrete random variable X with support X , the bias of the entropy estimator Ĥ n (X) satisfies

− log2 (1 + (|X | − 1)/n) ≤ E[Ĥ n (X)] − H (X) ≤ 0.

Bounding the Mutual Information
In this section, we develop a method for bounding the number of attack traces needed for a side-channel attack. This method is based on bounding the mutual information I (L; S|D). As a first step, we bound the noiseless mutual information I (X; S|D) in Section 3.1. Afterward, in Section 3.2, we show how such a bound can be integrated with knowledge of the noisy leakage function (e.g., knowledge of the SNR) to get a bound on I (L; S|D). Finally, we show how the latter can be mapped to an attack's success rate.

Information Leakage from an Intermediate Variable
In general, estimating the mutual information between two random variables is a difficult problem for continuous variables or discrete variables with a large domain size. Moreover, deriving good bounds is also difficult, since estimators for the mutual information can be biased either positively or negatively, depending on the distribution. In our case, even though S has a large domain, we know that it is uniform (and similarly for D), and X is a discrete variable with a relatively small domain. We exploit this knowledge to derive practically relevant bounds on I (X; S|D). More precisely, we design a method to derive the bounds such that each inequality holds with probability at least 1 − δ, where δ is the probability that the bound is incorrect (i.e., 1 − δ is the confidence level). Informally, the core idea of the method is to use the equality I (X; S|D) = H (X|D) − H (X|S, D), and to approximate both H (X|D) and H (X|S, D) independently. Here, we perform the approximations by summing only over n D many values for D, respectively (S, D), in equation (3), and estimating both H (X|D = d) and H (X|S = s, D = d) using the plug-in entropy estimator with n X samples of X|D = d, respectively X|S = s, D = d. This procedure gives us an estimator Î (X; S|D) for the conditional mutual information I (X; S|D), and we can then analyze its bias and its variance to get the bounds I LB (X; S|D) and I UB (X; S|D). The bias of this estimator comes from the bias of the plug-in estimators for the entropy (which are bounded in [Pan03]). Regarding the deviation, the asymptotic trend follows the central limit theorem. We derive non-asymptotic bounds in Appendix A, giving the following result as a corollary.
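The core estimator can be sketched as follows; the toy "circuit" and all names are ours, chosen so that a freshly masked intermediate should show a (near-)zero estimate while an unmasked one shows roughly one bit:

```python
import math, random

def plugin_entropy(samples):
    """Plug-in estimator: empirical distribution, then Shannon entropy in bits."""
    n = len(samples)
    counts = {}
    for x in samples:
        counts[x] = counts.get(x, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def estimate_cmi(sample_x, n_d, n_x, rng):
    """Monte Carlo estimate of I(X;S|D) = H(X|D) - H(X|S,D): sample n_d
    outer values for D (resp. (S, D)) and use n_x inner samples of X for
    each plug-in entropy estimate; sample_x(s, d, rng) draws one X."""
    h_x_d, h_x_sd = 0.0, 0.0
    for _ in range(n_d):
        d = rng.randrange(2)
        # H(X|D=d): marginalize over S by drawing a fresh secret per sample
        h_x_d += plugin_entropy(
            [sample_x(rng.randrange(2), d, rng) for _ in range(n_x)])
        s = rng.randrange(2)
        h_x_sd += plugin_entropy(
            [sample_x(s, d, rng) for _ in range(n_x)])
    return (h_x_d - h_x_sd) / n_d

# Toy intermediates: a freshly masked bit (expected MI ~ 0) versus an
# unmasked bit in bijection with the secret (expected MI ~ 1 bit).
masked = lambda s, d, rng: s ^ d ^ rng.randrange(2)
unmasked = lambda s, d, rng: s ^ d
rng = random.Random(1)
mi_masked = estimate_cmi(masked, 50, 2000, rng)
mi_unmasked = estimate_cmi(unmasked, 50, 2000, rng)
print(mi_masked, mi_unmasked)
```

In the real tool, the estimate is additionally corrected for the plug-in bias and widened by a deviation term to obtain the bounds of Corollary 1.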
Corollary 1. Let X, D and S be discrete random variables with domains X , D and S, and let D and S be distributed uniformly. Let n D and n X be positive integers and δ > 0 be a real number. Let (D i ) and (S i ), for i = 1, . . . , n D , be vectors of independent random variables distributed identically to D and S respectively. Furthermore, let (X i,j ), for j = 1, . . . , n X , be vectors of independent random variables distributed identically to X|D = D i and X|S = S i , D = D i respectively. Finally, let Ĥ (X|D) and Ĥ (X|S, D) be the resulting estimates for H (X|D) and H (X|D, S), with Î (X; S|D) = Ĥ (X|D) − Ĥ (X|S, D) consequently being an estimate for I (X; S|D). Then I LB (X; S|D) = Î (X; S|D) − ε and I UB (X; S|D) = Î (X; S|D) + ε bound I (X; S|D) from below and above, each with probability at least 1 − δ, where the error term ε is derived in Appendix A.

Here, Corollary 1 gives bounds for I (X; S|D) of a single intermediate variable X with confidence 1 − δ. While it is tempting to then find the point X max with maximal Î (X max ; S|D) throughout the whole computation and claim that for all other intermediate computations X the mutual information is less than I UB (X max ; S|D), this ignores the fact that, given long enough computations, there is a good chance that some intermediate values violate their upper bounds. To account for this, we can apply a union bound over all intermediate values, essentially dividing δ by the length of the computation. Both the derivation and proof are given in Appendix A.

From Probing to Noisy Leakage
Let us now assume that instead of directly observing the leaking variable X, we observe the leakage L = f (X) for some noisy function f : X → L [DDF19]. We are therefore interested in bounding I (L; S), which then gives us a bound on the number of traces n Adv an adversary needs to measure in order to recover the value of S with some probability.
For 0 ≤ p ≤ 1, we denote by id p : X → X ∪ {⊥} the randomized function that on input x ∈ X outputs x with probability p and ⊥ otherwise. If there exists a (randomized) function f ⊥ : X ∪ {⊥} → L such that, for all x ∈ X , f (x) has the same distribution as f ⊥ (id p (x)), then, by the data processing inequality,

I (L; S|D) = I (f (X); S|D) ≤ I (id p (X); S|D) = p · I (X; S|D).    (4)

The reduction of noisy leakage functions f to random probing functions f ⊥ (id p (•)) has been extensively studied in the literature [DDF19, PGMP19], and its discussion is out of the scope of this work.
Example 1. As an illustration, let us consider the case of a single-bit leak X with X = F 2 , to which Gaussian noise is added to obtain L, i.e., L = f (X) = 1 {0} (X) − 1 {1} (X) + Z with Z ∼ N (0, σ 2 ). The portion of the distribution of L where we cannot distinguish X = 0 from X = 1 corresponds to the portion of id p that maps to ⊥, i.e., 1 − p. The two parts of the probability distribution of L "meet" at Pr [L = 0|X = 0] = Pr [L = 0|X = 1]. Using cdf P (•) as the cumulative distribution function of distribution P and erf (•) for the error function, we have

1 − p = 2 · cdf N (0,σ 2 ) (−1) = 1 − erf (1/(σ√2)), i.e., p = erf (1/(σ√2)).    (5)

Therefore, using the first term of the Taylor series for erf (•), we have

p ≈ √(2/(π σ 2 )).

The simple leakage function L = f (X) is later used in Section 5.2 to contextualize the experimental results in terms of noisy leakage.
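Numerically, the exact erf expression for p and its first Taylor term agree closely once the noise is large. The sketch below uses our own function names and assumes the reading p = erf(1/(σ√2)) from the overlap argument of Example 1:

```python
import math

def probing_rate(sigma):
    """Random-probing parameter for the +/-1 signal under N(0, sigma^2)
    noise: p = erf(1 / (sigma * sqrt(2)))."""
    return math.erf(1.0 / (sigma * math.sqrt(2.0)))

def probing_rate_approx(sigma):
    """First Taylor term of erf: p ~ sqrt(2 / (pi * sigma^2))."""
    return math.sqrt(2.0 / (math.pi * sigma ** 2))

for sigma in (2.0, 5.0, 20.0):
    print(sigma, probing_rate(sigma), probing_rate_approx(sigma))
```

Since erf(z) ≤ 2z/√π for z ≥ 0, the Taylor approximation is always a (slight) overestimate of p, which keeps the resulting leakage bound conservative.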

Number of Attack Traces
The number of traces needed to mount an attack is the most common side-channel security metric.In this section, we show how to bound the number of traces n Adv required to mount a secret-recovery attack with success rate r Adv , using the mutual information I (L; S|D) between the observed leakage L and the secret S.
Lemma 1. Let the secret S be distributed uniformly over S, and let an adversary observe the leakage L of n Adv traces. If the adversary recovers S with success rate r Adv , then

n Adv ≥ d (r Adv ∥ 1/|S|) / I (L; S|D),

where d (p ∥ q) = p · log2 (p/q) + (1 − p) · log2 ((1 − p)/(1 − q)) denotes the binary Kullback-Leibler divergence.

This result is a version of [dCGRP19, Theorem 1], and the proof given in Appendix B is very similar to the original proof. The major difference is that we are giving a statement in the context of I (L; S|D), while they use I (X; L|D).

Example 2. We illustrate Lemma 1 for a side-channel attack on a cipher with a 128-bit key, e.g., AES [DR98] or Prince [BCG + 12]. Assuming that the key is chosen uniformly at random, an adversary that wants a success rate of at least r Adv = 50% would need

n Adv ≥ d (0.5 ∥ 2 −128 ) / I (L; S|D) ≈ 63 / I (L; S|D)

traces. This shows that the number of traces is inversely proportional to the mutual information, with a proportionality factor that is relatively small. This factor depends on the size and distribution of the key and on the success rate, but it is in any case bounded by the entropy of the key. In Section 5.2, we use this lemma and a bound on the mutual information in order to get a lower bound on the number of attack traces n Adv , i.e., on the security level.
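The computation of Example 2 can be sketched numerically. We assume a bound of the form n ≥ d(r ∥ 1/|S|)/I, our reading of the [dCGRP19]-style result; the exact constant in Lemma 1 may differ slightly:

```python
import math

def kl_bernoulli(p, q):
    """Binary Kullback-Leibler divergence d(p || q) in bits."""
    return p * math.log2(p / q) + (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))

def min_traces(mi_bound, key_bits=128, success_rate=0.5):
    """Lower bound on the number of attack traces:
    n >= d(r || 2^-k) / I(L; S|D)."""
    return kl_bernoulli(success_rate, 2.0 ** -key_bits) / mi_bound

# For a 128-bit key and 50% success rate, d(0.5 || 2^-128) = 63 bits, so a
# mutual information bound of 1.68e-5 yields about 3.75 million traces.
print(min_traces(1.68e-5))
```

Note how the factor 63 is close to, but well below, the 128-bit key entropy, as stated in the text.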

Computing the Approximation and Bounds
Approximating and bounding the mutual information I (X; S|D), as outlined in Section 3.1, requires a significant number of samples of the involved random variables. In this section, we first present the critical insights enabling highly efficient sampling and then introduce our simulation framework and its workflow.

Efficient Sampling
In order to get samples of the intermediate values X, it is necessary to execute the design of the cryptographic primitive. There are many state-of-the-art hardware simulators, and it might be tempting to just pick one of them and use it to simulate the designs and obtain the samples. However, these tools are intended for testing, debugging, and accurate timing estimation and are not meant for running a large number of simulations and aggregating them into histograms. Quantile's extremely efficient aggregating simulator is based on two key insights.
Code generation from symbolic simulation. The first critical insight is that many of the values in a hardware design's execution do not change across simulations. This includes control signals and everything else that is independent of the data the hardware processes, i.e., public data, secrets, and masks. In a sense, such intermediate values are constants and can be optimized away by unrolling the hardware circuit across the clock cycles of its execution. Here, the hardware circuit starts out with its data inputs being the only unknown values that must be treated symbolically. Furthermore, any time such symbolic signals are fed as inputs to a hardware cell, the cell output is computed symbolically as well. For example, an AND cell with input symbols a and b would produce the output symbol a ∧ b. However, if the second input is the constant 0 (respectively 1), it would produce the output constant a ∧ 0 = 0 (respectively the symbol a ∧ 1 = a). For clock cycle transitions, the symbolic register output signals in the current clock cycle are defined to be equal to the register input signals from the previous clock cycle. As a result, the number of clock cycles to simulate need not be known beforehand, since the end of the execution is triggered by a non-symbolic finish signal. After symbolically simulating a hardware circuit, we have effectively generated a straight-line symbolic trace of its execution, where all constants have been eliminated, and only the important computations are left. This trace can be turned into a very efficient statically compiled simulator.
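The constant-folding rules described above can be sketched as follows (a toy symbolic AND cell of our own, not Quantile's actual implementation):

```python
class Sym:
    """Minimal symbolic value: either a constant 0/1 or a named expression."""
    def __init__(self, name=None, const=None):
        self.name, self.const = name, const
    def __repr__(self):
        return str(self.const) if self.const is not None else self.name

def sym_and(a, b):
    """AND cell with constant folding, as in the code-generation step."""
    if a.const == 0 or b.const == 0:
        return Sym(const=0)                 # a & 0 = 0: folded to a constant
    if a.const == 1:
        return b                            # 1 & b = b: folded to a symbol
    if b.const == 1:
        return a                            # a & 1 = a: folded to a symbol
    return Sym(name=f"({a} & {b})")         # both symbolic: emit computation

a, b = Sym(name="a"), Sym(name="b")
print(sym_and(a, Sym(const=0)))  # constant, no code generated
print(sym_and(a, Sym(const=1)))  # reuses the existing symbol a
print(sym_and(a, b))             # stays symbolic: (a & b)
```

Only the last case survives into the generated straight-line program; the first two disappear at unrolling time.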
Parallel simulations through bitslicing. The second critical insight is that the netlist, and therefore the execution trace, exclusively manipulates single-bit variables. We can therefore use the bitslicing technique to store values belonging to separate simulations inside a single architectural register. Moreover, many modern x86_64 machines support the SSE2, AVX2, and AVX512F extensions that provide 128-, 256-, and 512-bit wide registers and vectorized instructions for all common bitwise logic operations. The only missing operation commonly used in netlists is a multiplexer, which can be simulated as MUX(s, i 0 , i 1 ) = (¬s ∧ i 0 ) ⊕ (s ∧ i 1 ). Furthermore, the bit-level parallelism of bitsliced executions additionally enables the quick computation of 1-bit histograms by counting the number of bits set to 1 inside a register with the popcntq instruction. This can be exploited because, currently, Quantile is tailored to univariate leakage analysis. For support of multivariate leakage, the histogram computation needs to be adapted, either to do bit manipulations and still use popcntq, or to unslice the parallel executions and count the element frequencies in a more traditional manner.
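A bitsliced evaluation with the multiplexer emulation and the popcount-based 1-bit histogram can be sketched with one integer bit per parallel simulation (Python integers stand in for the SIMD registers; names are ours):

```python
import random

def mux(s, i0, i1, width=64):
    """Bitsliced multiplexer over `width` parallel simulations:
    MUX(s, i0, i1) = (~s & i0) ^ (s & i1), lane-wise."""
    mask = (1 << width) - 1
    return ((~s & i0) ^ (s & i1)) & mask

def histogram1(bits, width=64):
    """1-bit histogram of `width` parallel simulations via popcount."""
    ones = bin(bits).count("1")  # stands in for the popcntq instruction
    return {0: width - ones, 1: ones}

random.seed(42)
s = random.getrandbits(64)
i0 = random.getrandbits(64)
i1 = random.getrandbits(64)
out = mux(s, i0, i1)   # 64 multiplexers evaluated in one expression
print(histogram1(out))
```

Each bitwise operation evaluates all 64 simulations at once, which is exactly why the generated straight-line code is so much faster than a conventional simulator.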

Framework Overview
We now briefly present the workflow of Quantile shown in Figure 1, with the main steps described below.
Step 1 (Synthesis). The user provides a hardware design written in either Verilog, VHDL, or a mix of both, which is compiled into a JSON netlist using Yosys (with the GHDL plugin). The generated netlist is implemented using Yosys' generic gate library, for which we have written a custom symbolic simulation library.
Step 2 (Simulation). The user writes a testbench in C++ using the provided simulation library, setting the values of input signals and registers to constants or declaring them as secret, data, or masks. The simulation library handles the symbolic representation and simplification of intermediate values throughout the testbench execution.
Step 3 (Code Generation). The testbench symbolically unrolls the execution of the netlist using the provided constants, secrets, data and masks, generating a straight-line C++ program representing an execution of the netlist under the control of the testbench. The generated code is able to perform a bitsliced execution of the design and chooses the optimal width for bitslicing, depending on whether SSE2, AVX2 or AVX512F is available on the given machine.
Step 4 (Estimation). Finally, the generated code is used to sample S, D and X. After choosing the target confidence δ, the user either chooses the sampling quantities n D and n X directly, or lets the framework pick its best guess for some target error ε. The framework instantiates multiple simulation workers and pools their results to continuously update its mutual information estimate and report it to the user.

Analysis of Masked Ciphers
In this section, we apply our method to derive bounds for the information leakage of several cryptographic implementations. We show that Quantile can compute reasonable bounds for complete executions of cryptographic primitives, look at the consequences of randomness reuse, and investigate the security of low-randomness masking techniques. Then, we illustrate the bounds on the noisy leakage and the number of attack traces. We discuss these results, showing why quantitative security bounds are more useful than simply evaluating the security order.

Securely Masked Ciphers
In the following, we give a brief overview of well-known masked hardware designs we have evaluated using our framework.Afterward, we discuss the results of our analysis shown in Table 1.
AES DOM [GMK16] implements a protected AES [DR98] using the domain-oriented masking (DOM) scheme with two shares, where each share is assigned a domain and only cross-domain operations are re-shared using on-the-fly randomness. Since the DOM

Table 1: Summary of Quantile results for δ = 10 −3 when designs use fresh masks ( ), reuse masks after n rounds ( n) or do not use fresh masks ( ), with indicating I LB (X max ; S|D) > 0.
Prince TI [BKN22] implements a protected Prince [BCG + 12] block cipher using the CMS construction [RBN + 15] with two shares, where re-sharing is only necessary at the end of a non-linear operation, before the results are compressed back into two shares. This allows the implementation to realize an S-Box with only two stages. Since the S-Box is rather small, the design uses 16 S-Box instances applied to all 4-bit nibbles in parallel. Overall, one round of Prince takes only two clock cycles.
Results. For AES DOM and Prince TI with fresh masks, the approximated mutual information was very low (≤ 10 −5 ), and significantly smaller than the error bounds of Corollary 1 (3 × 10 −4 ). This leads to the conclusion that the implementations' first-order leakage is small or nonexistent. As a sanity check, we also ran the analysis on AES DOM and Prince TI without masks, i.e., setting all masks to the value 0. Unsurprisingly, our analysis immediately determines a mutual information of about 0.99 at many points in the computation, proving them insecure with a moderate error bound of 0.1.

Reusing Masks
Protecting an implementation with masking inevitably increases the size of a circuit, increases the latency due to synchronisation, and (usually) requires a lot of fresh randomness. Generating enough randomness at each clock cycle generally requires bulky RNG modules that increase the overall design size. An alluring but dangerous idea for reducing the randomness requirements is to simply reuse masks. Done naively, this has the potential of undermining the probing security of the design, rendering the masking useless. However, there might be ways to cleverly reuse masks and retain practical security, if not even perfect probing security.
Results. As a simple preliminary experiment, we analyzed what happens to the security of AES DOM and Prince TI when randomness is reused across different rounds of the cipher. AES DOM did not show any signs of increased information leakage when the same randomness is used for all rounds of the cipher, yielding the same approximated mutual information. We suspect that this is due to the diffusion properties of AES, as well as the large number of masks needed for the computation of each cipher round. In contrast, Prince TI becomes less secure when the same randomness is used in each round of the cipher, with an approximated mutual information of 3.70 × 10 −4 . According to Corollary 1, it is very likely (99.9%) that I (X; S|D) > 7.0 × 10 −5 and therefore that Prince TI is not probing secure when masks are reused in each round. We ran another experiment where randomness gets reused every two rounds, and got the same leakage estimate as for Prince TI with fresh masks.

Table 2: Bounds for effective security order transitions: above SNR t , the effective security order is 2. The corresponding mutual information (which is a bound for the first-order attack and the exact value for the second-order attack) and number of attack traces are given.

Design                        SNR t          Mutual information    Attack traces
Prince Nullfresh [SM21]       4.85 × 10 −3   1.68 × 10 −5          3.74 × 10 6
AES 2-bit masking [GMKM18]    4.82 × 10 −3   1.66 × 10 −5          3.79 × 10 6

Low-randomness Designs
In this section, we analyze recent low-randomness masking schemes that rely exclusively on the randomness coming from the initial input sharings. Nullfresh [SM21] removes randomness from first-order masked computations. The Nullfresh method achieves this by noticing that the three-input computation (a ∧ b) ⊕ c does not require fresh randomness for a secure first-order sharing, assuming that c is uniformly distributed. Because the computation of an AND gate can be represented as a ∧ b = (¬a ∧ b) ⊕ b, it is also possible to create a secure first-order sharing of a ∧ b analogously. Most quadratic and cubic 3- and 4-input S-boxes can be shared in a similar manner, eliminating the need for fresh randomness in ciphers that use them. 2-bit masking [GMKM18] is a slightly older technique for achieving low-randomness implementations. The authors notice that it is possible to construct a first-order masked AND gate where the bulk of the computation happens in the first share of the output, and the second share is inherited from one of the inputs. This leads them to a technique where, given careful sharing choices, every input and intermediate value a in the original computation is shared as (a ⊕ m, m), where m ∈ {m 0 , m 1 , m 0 ⊕ m 1 } and m 0 and m 1 are the only uniformly random values necessary for the security of the computation.
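The idea that (a ∧ b) ⊕ c needs no fresh randomness can be checked exhaustively for a small gadget. The concrete two-share sharing below is our own illustration of the principle, not necessarily the exact Nullfresh construction; the check enumerates all mask values and verifies that every listed probe's distribution is independent of the unshared inputs:

```python
from itertools import product
from collections import Counter

def intermediates(a, b, c, a0, b0, c0):
    """Two-share evaluation of (a & b) ^ c without fresh randomness, where
    c's input sharing acts as the refresh (illustrative sharing)."""
    a1, b1, c1 = a ^ a0, b ^ b0, c ^ c0
    t0 = (a0 & b0) ^ c0          # mask with c0 before mixing share domains
    z0 = t0 ^ (a0 & b1)          # z0 = (a0 & b) ^ c0
    t1 = (a1 & b1) ^ c1
    z1 = t1 ^ (a1 & b0)          # z1 = (a1 & b) ^ c1, so z0 ^ z1 = (a & b) ^ c
    return {"t0": t0, "z0": z0, "t1": t1, "z1": z1,
            "a0b1": a0 & b1,
            # wrong order: combining the two partial products first
            "bad_order": (a0 & b0) ^ (a0 & b1)}  # = a0 & b, depends on b

def leaks(name):
    """True if the probe's distribution over uniform masks (a0, b0, c0)
    depends on the unshared inputs (a, b, c), violating 1-probing security."""
    dists = set()
    for a, b, c in product((0, 1), repeat=3):
        cnt = Counter(intermediates(a, b, c, a0, b0, c0)[name]
                      for a0, b0, c0 in product((0, 1), repeat=3))
        dists.add(tuple(sorted(cnt.items())))
    return len(dists) > 1

secure_probes = ("t0", "z0", "t1", "z1", "a0b1")
print(all(not leaks(p) for p in secure_probes))  # gadget order is safe
print(leaks("bad_order"))                        # wrong order leaks b
```

The `bad_order` probe shows why the order of XORs matters: summing the two partial products before adding c's share produces a0 ∧ b, whose distribution depends on the secret b.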

2-bit Masking
Results. We have analyzed a Nullfresh implementation of Prince and a 2-bit masking implementation of AES, and present the results in Table 1. Both of these implementations achieve strikingly low estimated conditional mutual information (≤ 10^−5), with the 2-bit masked AES achieving the lowest estimate of 3 × 10^−9 in our experiments. We suspect that this happens because only two masks are used overall, compared to Nullfresh Prince, which uses 192 masks for the initial sharing; with only four different possible values for the masks, the variance of the estimator H_{n_X}(X|S = s, D = d) is extremely low.

Noisy Leakage and Number of Traces
We now move on to the more realistic noisy leakage model, where we assume that the value of every intermediate bit in the computation leaks with additive Gaussian noise.
Considering the presence of noise allows us to quantitatively compare the leakage at different orders in the bit-leakage with Gaussian noise model (i.e., each intermediate bit leaks independently with additive Gaussian noise).
For first-order leakage, we combine the bound I_UB(X_max; S|D) from Quantile (Table 1) with (4) and Lemma 1. For second-order leakage, we assume the observation of noisy leakage of a pair of uniform shares representing a secret bit (or a bit in bijection with a key bit, given the plaintext). The mutual information I(L; S|D) is then easily computed with numerical integration, and the number of traces is derived with Lemma 1.
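For intuition, this second-order computation can be sketched with a simplified model: both shares (m, s ⊕ m) of a secret bit s leak with additive Gaussian noise of standard deviation sigma, and I(L; S) is integrated numerically on a grid (illustrative parameters, not Quantile's implementation):

```python
import numpy as np

def second_order_mi(sigma, half_width=8.0, step=0.05):
    """I(L; S) in bits for L = (m + N1, (s ^ m) + N2) with s, m uniform bits."""
    xs = np.arange(-half_width, 1.0 + half_width, step)
    l1, l2 = np.meshgrid(xs, xs, indexing="ij")

    def phi(l, mu):  # Gaussian density centered on the leaked bit value
        return np.exp(-((l - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

    # p(l | s) = 1/2 * sum over the mask bit m of phi(l1; m) * phi(l2; s ^ m)
    p_given_s = [0.5 * sum(phi(l1, m) * phi(l2, s ^ m) for m in (0, 1)) for s in (0, 1)]
    p_l = 0.5 * (p_given_s[0] + p_given_s[1])
    # I(L; S) = sum_s 1/2 * integral of p(l|s) * log2(p(l|s) / p(l)) over l
    integrand = sum(0.5 * p * np.log2(p / p_l) for p in p_given_s)
    return float(integrand.sum()) * step * step

for sigma in (0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: I(L; S) = {second_order_mi(sigma):.4f} bits")
```

As expected for second-order leakage, the mutual information stays between 0 and H(S) = 1 bit and decays quickly as the noise grows.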
In Figure 2, we show these results for the Prince TI circuit as a function of the SNR. Prince TI with randomness reused in every round shows that removing fresh randomness strongly degrades the security, making a first-order attack with few traces possible. For Prince TI with fresh masks, the second-order leakage is larger than the first-order leakage when SNR > 8.27 × 10^−3. That is, for relatively high SNR values, a second-order attack requires fewer traces than a first-order attack. Consequently, the security of implementations with such SNR is dictated by the second-order attack, and reused randomness does not lower the security level in such cases.
This discussion relates to the notion of "effective security order" [DDF19, Sta20], which is the order of the optimal attack (i.e., the one that requires the lowest number of traces). This order can be larger than the "true" security order in the probing model, where observations are noiseless. By contrast, the effective security order refers to concrete SNR values. For SNR > 8.27 × 10^−3, Prince TI with fresh masks has an effective security order of 2, while for SNR ≤ 8.27 × 10^−3, its effective security order is 1.
Finally, Table 2 shows the coordinates of the points where the first- and second-order attacks intersect, i.e., the SNR above which the effective security order is 2. Below that point, the effective security order might be 1 (as in Prince TI with randomness reuse at every round), but might also be 2 (due to the overapproximation of the mutual information, cf. Table 1). This table shows that for adversaries with fewer than 1 million attack traces, the second-order attack performs better, and therefore the first-order leakage is not an issue. Let us also note that the transition values SNR_t are fairly low. Therefore, the first-order leakage of these designs is only an issue in implementations with already high noise levels or very low signal levels, e.g., due to dual-rail logic [LMW14].

Evaluation of Quantile's Simulator
We have evaluated the performance of Quantile's efficient sampling method against both Verilator and PROLEAD on AES DOM and Prince TI. The comparison only considers how quickly the simulators are able to generate full simulation traces of the given designs; no additional statistics are gathered. The Verilator-based sampler uses a custom-written C++ testbench for the designs that iteratively runs simulations with random secrets, masks, and data. As for PROLEAD, we removed all of its analysis capabilities, probe gathering, and statistics computation, and only ran the circuit simulation component. The evaluation was done on a machine equipped with an eight-core Intel Core i7-8550U CPU running at 1.8 GHz, 16 GiB of memory, and a 64-bit Linux system. The results are shown in Table 3 and indicate that our simulation technique is about two orders of magnitude faster than both Verilator and PROLEAD.
Good simulation performance is important due to the large number of executions needed for tight approximation bounds. For example, the Prince TI experiments in Table 1 need 5.24 × 10^12 simulations of the full cipher. Each of these experiments ran for about 54.31 h on a server machine with a 44-core Intel Xeon E5-2699 CPU, clocked at 2.20 GHz, that supports AVX2 instructions. As the workload is highly parallelizable, parts of the experiments can also be run on multiple machines and then merged for analysis.
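Merging works because the plug-in estimators only depend on occurrence counts, which are sufficient statistics and simply add across machines. A minimal sketch with made-up counts (not Quantile's actual data format):

```python
from collections import Counter
import math

def plugin_entropy_bits(counts):
    # H_n(X) = -sum_x (c_x / n) * log2(c_x / n), the plug-in estimator
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)

# histograms of a simulated wire's values, gathered on two machines
machine_a = Counter({0: 120, 1: 130, 2: 125})
machine_b = Counter({0: 130, 1: 120, 3: 125})
merged = machine_a + machine_b   # counts add, so a cheap reduce step suffices

print(f"H_n on merged data: {plugin_entropy_bits(merged):.3f} bits")
```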

Formal Verification Tools
So-called formal verification tools automate the proof of t-probing security (and related notions) independently of a concrete masking scheme. In essence, these tools enumerate all possible sets of t probes in the circuit and, for each of these sets, check whether the distribution of its values depends on the secret inputs. A common limitation of this family of tools is the circuit size and security order they can handle: the number of probe sets increases quickly with t and with the circuit size, and checking a single set can itself be a difficult problem when the circuit has a high logical depth.
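The combinatorial blow-up is easy to make concrete: a circuit with w observable wires has C(w, t) probe sets of size t, i.e., growth of degree t in the circuit size. A rough sketch:

```python
from math import comb

# number of t-probe sets an enumeration-based tool must examine,
# for a circuit with w observable wires
for w in (100, 1_000, 10_000):
    print(w, [comb(w, t) for t in (1, 2, 3)])
```

Already at w = 1000 wires and t = 3, over 1.6 × 10^8 sets must be checked, which is why these tools struggle with full cipher rounds.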
Exact tools. The simplest but least efficient of the formal tools is VerMI [ANR18], which verifies independence by enumerating all the randomness values to compute the exact distribution of the probes. The SILVER tool [KSM20] performs this independence verification using binary decision diagrams, which results in a more efficient algorithm overall. These two tools are exact, meaning they always correctly report whether a circuit is t-probing secure or not.
Coco [GHP + 21, HB21] improves upon the performance of exact tools by introducing two optimizations. First, instead of computing the distributions of wires, it computes correlation sets. Using an implicit representation of these sets encoded in a SAT solver, this technique typically avoids the exponential scaling of the complete distribution computation but over-approximates the leakage (leading to false positives). Second, the list of probe sets to explore can be encoded in the SAT solver, leading to optimization opportunities within the solver compared to performing explicit checks for every set of probes.
maskVerif [BBC + 19] is another over-approximating formal tool that uses the following property: if e is an expression in which a random bit r does not appear, then replacing every instance of r in a computation by r ⊕ e does not change the distribution of the computed values. This property enables it to simplify the algebraic expressions of the probed wires, similarly to Gaussian elimination, until secrets no longer appear in the expressions; otherwise, the verification fails. This restricted simplification technique and its heuristics lead to a highly efficient algorithm at the cost of false positives. Further enhancing the performance, maskVerif opportunistically verifies the security of sets of more than t probes, which allows it to reduce the total number of sets to verify.
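The substitution rule is easy to validate by brute force on a toy probe: since r ↦ r ⊕ e is a bijection for every fixing of the other inputs, the probed distribution is preserved, and a good choice of e can remove the secret entirely. A small self-contained check of this idea (our own illustration, not maskVerif's implementation):

```python
from itertools import product
from collections import Counter

# probed wire: p = (s ^ r) & b, with r a uniform random bit and s a secret
probe = lambda s, b, r: (s ^ r) & b

for s, b in product((0, 1), repeat=2):
    original = Counter(probe(s, b, r) for r in (0, 1))
    # substitute r -> r ^ e with e = s (note: s does not contain r)
    rewritten = Counter(probe(s, b, r ^ s) for r in (0, 1))
    assert original == rewritten          # distribution is unchanged
    # after substitution the expression simplifies to r & b, which is
    # secret-free and hence proves the probe independent of s
    assert all(probe(s, b, r ^ s) == (r & b) for r in (0, 1))
print("distribution preserved; simplified expression is secret-free")
```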
Comparison to Quantile. In general, VerMI and SILVER can only be applied to components of a single cipher round, such as S-boxes. The over-approximating tools Coco and maskVerif can verify a few rounds of a masked cipher, with maskVerif being considered more efficient overall, especially at higher masking orders. We have tried replicating our results from Table 1 with maskVerif by translating the code Quantile generates for the runner (cf. Figure 1) into maskVerif's input language. On the complete executions from Table 1, maskVerif runs out of memory on a machine with 120 GiB of RAM. When given round-reduced versions, it verified at most one round of AES DOM and at most five forward rounds of Prince TI within 8.97 h. In any case, these tools are focused on t-probing security and cannot verify imperfect masking.

PROLEAD
PROLEAD [MM22], similarly to Quantile, is based on simulating the target circuit and collecting statistics about the values of all the wires. Its main difference from Quantile is that it uses a statistical test (a G-test) whose null hypothesis is the independence between the observed values X and the input secrets S. Furthermore, for the value of S, it performs a fixed-vs-random test: the simulations are grouped into two classes, one where S is fixed to all-zeros and one where S is uniformly and freshly sampled. From the test statistic, PROLEAD can derive a p-value.
Compared to formal verification techniques, PROLEAD shares its main advantage with Quantile: linear scalability with the circuit size.Its false positive rate is also controlled with a p-value, which can be reduced by increasing the number of samples in the test.In contrast, the false positives produced by Coco and maskVerif are deterministic and cannot be worked around with increased computational resources.
Whereas the false positives of PROLEAD are easily controlled with its p-value (with precautions considering the number of different tests performed, as explained in Section 3.1), the false negatives are more problematic.Such false negatives have two root causes: a G-test's intrinsic false negative rate and the use of a fixed-vs-random test.
First, controlling the probability of false negatives of the G-test is more challenging than controlling false positives, since their occurrence depends not only on the parameters of the statistical test (such as the number of samples and the significance threshold) but also on the effect size, i.e., on how much the distribution of the set of probes changes when the native values vary. PROLEAD computes a false negative rate for its result by assuming a "small" effect size of φ = 0.1, following [Coh88] (we refer to [Coh88, MM22] for the definition of the effect size φ). If the actual effect size is smaller than φ = 0.1, the false negative rate will be higher than the one computed by PROLEAD. It is crucial to define the effect size threshold meaningfully, such that an effect below the threshold can be neglected in the application domain, as explained in [Coh88] (where PROLEAD's "small" φ = 0.1 threshold is taken from): "The terms "small", "medium", and "large" are relative, not only to each other, but to the area of behavioral science [...]". Since side-channel analysis techniques are designed to detect tiny key-dependent variations in the leakage, it is not unreasonable to assume that a negligible effect size for side-channel analysis is much smaller than one for behavioral science. A better way to select the effect size threshold would be to ensure that leakage from a wire whose key dependence matches that effect size is benign, i.e., that the number of traces needed to mount a successful attack exploiting that leakage is above a targeted security level.
Second, by running its fixed-vs-random test, PROLEAD checks whether the wire distributions are the same when (i) the native secret inputs are zero and (ii) the native secret inputs are uniformly distributed. These distributions may be equal (or very close, as discussed in the previous paragraph) while a dependence still exists: other (i.e., some nonzero) fixed native secret values may violate the equality. In such a case, PROLEAD will wrongly report the absence of leakage even when a very low false negative probability is requested. This problem can be solved by iterating the test multiple times while changing the fixed input values, or by performing a G-test with more rows in the contingency table. Both of these solutions lead to worse performance (due to requiring more samples) and make the definition of the effect size more complicated.
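This failure mode can be reproduced with a toy distribution (our own construction, not taken from PROLEAD): a 2-bit secret S influences a leaked bit X, yet the distribution of X under the fixed class S = 0 coincides exactly with its distribution under uniform S, so a fixed-vs-random test with an all-zero fixed class cannot detect the dependence, while the mutual information is strictly positive.

```python
from fractions import Fraction as F
import math

# Pr[X = 1 | S = s] for a 2-bit secret s, chosen so that the uniform
# mixture equals the S = 0 conditional
p1 = {0: F(1, 2), 1: F(1, 4), 2: F(3, 4), 3: F(1, 2)}

uniform_mix = sum(p1.values()) / 4
assert uniform_mix == p1[0]            # fixed (S = 0) class == random class
assert len(set(p1.values())) > 1       # ... yet X clearly depends on S

def h(p):  # binary entropy in bits
    p = float(p)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# a mutual information bound sees the dependence: I(X; S) > 0
mi = h(uniform_mix) - sum(h(p) for p in p1.values()) / 4
print(f"I(X; S) = {mi:.4f} bits")      # strictly positive
assert mi > 0
```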

Mutual information estimation and bounds
The problem of estimating mutual information has attracted attention for many years. However, perhaps surprisingly, the particular problem we are interested in (a confidence interval for the mutual information between discrete variables with a large domain and an unknown distribution) has not received attention.
A discussion of the properties of simple estimators can be found in [Pan03], and some more detailed discussion also appears in [MCHS23]. A few recent works have designed improved estimators (we focus on the discrete case) [APP13, SSK15, HS19], but they do not provide a theoretical convergence analysis for their estimators.
A different line of work considers bounds on the mutual information, assuming that some properties of the joint distribution are known (e.g., minimum and maximum values of some marginals or conditionals) [DG97, PDSS16]. Besides the issue of computing such values for our problem, these bounds introduce a fixed gap (due to exploiting only a few characteristics of the distributions) that does not shrink with the number of samples.

Conclusion and Future Work
In this paper, we introduced Quantile. Using nonasymptotic statistical theory, Quantile computes statistical bounds on the mutual information between the secrets and the value of a wire in a circuit, or between the secrets and the noisy leakage of this wire. Quantile enabled us to concretely evaluate the effective security order of a masked implementation. The verification method implemented in Quantile scales efficiently to large circuits.
We now discuss future work opportunities. First, although the theory of Section 3 works with any leakage domain, the implementation of Quantile is currently limited to univariate bit leakage. Although efficiently extending it to multivariate leakage is a challenging engineering problem, it would enable the analysis of higher-order masked circuits, as well as analysis within the robust probing model [FGP + 18] (i.e., taking glitches and transitions into account). Such extensions would also enable Quantile to give security bounds against soft analytical side-channel attacks (SASCA) [VGS14], which exploit information from multiple sharings in a single attack.
Next, we observed that our bounds are not very tight, mainly due to worst-case assumptions on the statistical distributions when computing the confidence intervals.Replacing such assumptions with the observed distributions in the samples (while preserving the provable bounds) would enable tighter bounds, i.e., improved security bounds at a lower sampling cost.
Finally, the randomness reuse case studies we performed are fairly simple and nonoptimized.We believe that Quantile can be used to design efficient masked circuits with smaller imperfections than our examples, e.g., by shuffling the random bits between the rounds.

A.2 Deviation of the Plug-in Entropy Estimator
Lemma 2 (The plug-in entropy estimator is sub-Gaussian). For a discrete random variable X, the plug-in entropy estimator H_n(X) from Definition 1 is sub-Gaussian with mean E[H_n(X)] and variance proxy (n/4) H_bin(n^−1)².

Proof. The estimator H_n(X) is a function of the n samples X_1, . . ., X_n, which are all independent and distributed identically to X. Therefore, if we are able to show that changes in the outcome of a single X_i have a limited influence on H_n(X), we can apply McDiarmid's inequality from Proposition 3 to prove the sub-Gaussianity of H_n(X).
In the following, we analyze what happens to H_n when one of the random variables, without loss of generality the i-th one, has a different outcome. Let the function h : p ↦ −p log2(p) represent the entropy summands. Furthermore, let (x_j)_{j=1}^n be the outcomes of sampling X^n, and let x'_i be a different outcome of the i-th random variable; write (x'_j)_{j=1}^n for the sample vector that equals (x_j)_{j=1}^n except in position i. The greatest change in H_n is given by

sup | H_n − H'_n | = sup | Σ_{x ∈ X} [ h( (Σ_{j=1}^n 1_{x}(x_j)) / n ) − h( (Σ_{j=1}^n 1_{x}(x'_j)) / n ) ] |.

For all values x ∈ X \ {x_i, x'_i}, the term h((Σ_{j=1}^n 1_{x}(x_j))/n) appears both positively and negatively inside the absolute value, canceling out. As for x_i (and x'_i), the argument of the negative term decreases (increases) by 1/n. Let c_i = Σ_{j=1}^n 1_{x_i}(x_j) and c'_i = Σ_{j=1}^n 1_{x'_i}(x_j). The above supremum simplifies to

sup_{c_i, c'_i} | h(c_i/n) − h((c_i − 1)/n) + h(c'_i/n) − h((c'_i + 1)/n) |.   (10)

We now analyze the parts of the absolute value depending on c_i and c'_i separately. We see that h(c_i/n) − h((c_i − 1)/n) is monotonically decreasing in c_i because it has the strictly negative derivative (1/n) log2((c_i − 1)/c_i), and therefore has its supremum and infimum at the interval ends c_i = 1 and c_i = n respectively, i.e.,

−h((n − 1)/n) ≤ h(c_i/n) − h((c_i − 1)/n) ≤ h(1/n).

Doing a similar analysis for the parts of the absolute value depending on c'_i shows that

−h(1/n) ≤ h(c'_i/n) − h((c'_i + 1)/n) ≤ h((n − 1)/n).

The absolute value in (10) reaches its largest value when either both the c_i- and c'_i-dependent parts reach their supremum or both reach their infimum. Therefore, finally,

| H_n − H'_n | ≤ h(1/n) + h((n − 1)/n) = H_bin(n^−1).

Since the bound is independent of the X_i whose value changes, we can apply McDiarmid's inequality from Proposition 3 to get that H_n(X) is sub-Gaussian with mean E[H_n(X)] and variance proxy (n/4) H_bin(n^−1)².

Proof (of Corollary 1). We prove (13) by looking at the complementary probability: we remove the supremum of I(X_k; S|D) by representing the inequality as a union of events, relax said events, and apply a union bound to the probabilities. Inverting the probabilities again gives (13), concluding the proof.
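The bounded-difference constant H_bin(n^−1) can be verified by brute force for small parameters: enumerating every length-n sample vector over a small alphabet and every single-coordinate change confirms that the plug-in entropy never moves by more (and that the bound is attained). A sketch for n = 4 and a three-symbol alphabet:

```python
from itertools import product
from collections import Counter
import math

def h_n(samples):
    # plug-in entropy estimator over a concrete sample vector
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def h_bin(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, alphabet = 4, (0, 1, 2)
bound = h_bin(1 / n)   # the bounded-difference constant from Lemma 2
worst = 0.0
for xs in product(alphabet, repeat=n):
    for i in range(n):
        for x_new in alphabet:           # change the i-th outcome
            ys = list(xs)
            ys[i] = x_new
            worst = max(worst, abs(h_n(xs) - h_n(tuple(ys))))

assert worst <= bound + 1e-12
print(f"max |H_n - H_n'| = {worst:.4f} <= H_bin(1/n) = {bound:.4f}")
```

The maximum change is reached, e.g., when a constant sample vector gains one differing outcome, so the constant cannot be improved.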

B Proof of Lemma 1
We first recall an instrumental result which bounds the conditional entropy of processed random variables. Finally, since f models a memoryless channel, we can apply Proposition 5 to (S, . . ., S) and L (conditioning everything on D), finding that I(L; S|D) ≤ n_Adv I(L; S|D). Combining this with (17) gives (6), which concludes the proof.

Lemma 1. Let n_Adv ≥ 0 be an integer, S be a random variable with domain S, and D = (D_i)_{i=1}^{n_Adv} be a vector of random variables independent of S with the same domain D. Furthermore, let f : S × D → L be a randomized function modelling a memoryless channel, and let L = (L_i)_{i=1}^{n_Adv} be a vector of random variables with L_i = f(S, D_i). Let Adv : D^{n_Adv} × L^{n_Adv} → S be a (potentially randomized) function attempting to recover the value of S as S' = Adv(D, L), and let r_Adv = Pr[S = S']. Then, assuming I(L; S|D) ≠ 0,

n_Adv ≥ (H(S) − (1 − r_Adv) log2(|S| − 1) − H_bin(r_Adv)) / I(L; S|D).   (6)
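For intuition, the bound can be instantiated numerically. The sketch below assumes a 128-bit key (so H(S) = 128 and |S| = 2^128) and r_Adv = 50%, the setting used for Figure 2; plugging in the Table 2 mutual information for Nullfresh Prince reproduces its number-of-traces entry up to rounding:

```python
import math

def traces_lower_bound(h_s, key_bits, r_adv, mi):
    """n_Adv >= (H(S) - (1 - r_Adv) log2(|S| - 1) - H_bin(r_Adv)) / I(L; S|D)."""
    h_bin = -r_adv * math.log2(r_adv) - (1 - r_adv) * math.log2(1 - r_adv)
    log2_s_minus_1 = math.log2(2**key_bits - 1)  # math.log2 accepts big ints
    return (h_s - (1 - r_adv) * log2_s_minus_1 - h_bin) / mi

# Nullfresh Prince with I(L; S|D) = 1.68e-5 (Table 2) and r_Adv = 50%
n_adv = traces_lower_bound(h_s=128, key_bits=128, r_adv=0.5, mi=1.68e-5)
print(f"n_Adv >= {n_adv:.3g} traces")   # roughly 3.7 million traces
```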

Figure 1: Workflow of the information leakage quantification framework.

Figure 2: Noisy leakage mutual information I(L; S|D) of the most leaking signal in the Prince TI circuit for first- and second-order leakage, where each bit leaks independently with additive Gaussian noise according to the SNR. The right-hand side axis shows the number of traces n_Adv needed for an attack with r_Adv = 50%, based on (6). The first-order leakage bounds are based on (4), with p taken from (5), and the I(X; S|D) bounds taken from Table 1 (these upper and lower bounds give the interval shown in the plot).

Theorem 1. Let X, D and S be discrete random variables with domains X, D and S, and let D and S be distributed uniformly. Let n_D, n_X|D, n_S,D, and n_X|S,D be positive integers and δ > 0 be a real number. Let D = (D_i)_{i=1}^{n_D}, D' = (D'_i)_{i=1}^{n_S,D}, and S = (S_i)_{i=1}^{n_S,D} be vectors of independent random variables distributed identically to D and S, respectively. Furthermore, let X_i = (X_{i,j})_{j=1}^{n_X|D} and X'_i = (X'_{i,j})_{j=1}^{n_X|S,D} be vectors of independent random variables distributed identically to X|D = D_i and X|S = S_i, D = D'_i, respectively. Finally, let

H_{n_D, n_X|D}(X|D) = (1/n_D) Σ_{i=1}^{n_D} H_{n_X|D}(X|D = D_i),

H_{n_S,D, n_X|S,D}(X|S, D) = (1/n_S,D) Σ_{i=1}^{n_S,D} H_{n_X|S,D}(X|S = S_i, D = D'_i), and

Î(X; S|D) = H_{n_D, n_X|D}(X|D) − H_{n_S,D, n_X|S,D}(X|S, D)

be estimates for H(X|D), H(X|S, D), and I(X; S|D), respectively. Then

I_LB(X; S|D) = Î(X; S|D) − log2(1 + (|X| − 1)/n_X|S,D) − σ √(2 log(δ^−1)) and

I_UB(X; S|D) = Î(X; S|D) + log2(1 + (|X| − 1)/n_X|D) + σ √(2 log(δ^−1)),

with σ the sub-Gaussian deviation parameter obtained from H_bin(n_X|S,D^−1)² via Lemma 2, satisfy

Pr[I_LB(X; S|D) < I(X; S|D)] > 1 − δ and Pr[I_UB(X; S|D) > I(X; S|D)] > 1 − δ.   (11)

Proof. We break down the difference between the estimate and the real mutual information as

Î(X; S|D) − I(X; S|D) = P + Q + R − U − V − W.   (12)

Corollary 1. Let (X_k)_{k=1}^{n_comp} be (possibly dependent) random variables whose mutual information I(X_k; S|D) is bounded as in Theorem 1 with confidence 1 − δ/n_comp > 0. Then

Pr[ sup_k I(X_k; S|D) < sup_k I_UB(X_k; S|D) ] > 1 − δ.   (13)

Proof.

Pr[ sup_k I(X_k; S|D) ≥ sup_k I_UB(X_k; S|D) ]
≤ Pr[ ∪_{l=1}^{n_comp} { I(X_l; S|D) ≥ I_UB(X_l; S|D) } ]
≤ Σ_{l=1}^{n_comp} Pr[ I(X_l; S|D) ≥ I_UB(X_l; S|D) ]
≤ n_comp · δ/n_comp = δ.

Proposition 4 (Fano's inequality). Let X, Y and X' be random variables with domains X, Y and X that obey the Markov chain X → Y → X', with p = Pr[X ≠ X']. Then

H(X|Y) ≤ H(X|X') ≤ H_bin(p) + p log2(|X| − 1).

Proof. See [CT06, Theorem 2.10.1] for the derivation.

Proposition 5. Let (X_i)_{i=1}^n and (Y_i)_{i=1}^n be two vectors of random variables with each pair (X_i, Y_i) distributed identically to (X, Y). Then I((X_i)_{i=1}^n; (Y_i)_{i=1}^n) ≤ n I(X; Y).

Proof. See [dCGRP19, Lemma 3] for the derivation.

Lemma 1. Let n_Adv ≥ 0 be an integer, S be a random variable with domain S, and D = (D_i)_{i=1}^{n_Adv} be a vector of random variables independent of S with the same domain D. Furthermore, let f : S × D → L be a randomized function modelling a memoryless channel, and let L = (L_i)_{i=1}^{n_Adv} be a vector of random variables with L_i = f(S, D_i). Let Adv : D^{n_Adv} × L^{n_Adv} → S be a (potentially randomized) function attempting to recover the value of S as S' = Adv(D, L), and let r_Adv = Pr[S = S']. Then, assuming I(L; S|D) ≠ 0,

n_Adv ≥ (H(S) − (1 − r_Adv) log2(|S| − 1) − H_bin(r_Adv)) / I(L; S|D).   (6)

Proof. Since S and D are independent, we have

I((S, D); (L, D)) = H(S, D) − H(S, D | L, D) = H(S) + H(D) − H(S | L, D).   (14)

The random variables S, (L, D) and S' describe a Markov chain: (L, D) conditionally depends on S because L conditionally depends on S, whereas S' conditionally depends on (L, D) but not directly on S. Therefore, Fano's inequality from Proposition 4 applies, giving

H(S | L, D) ≤ H(S|S') ≤ H_bin(r_Adv) + (1 − r_Adv) log2(|S| − 1).   (15)

Alternatively, the mutual information I((S, D); (L, D)) can be broken down as

I((S, D); (L, D)) = H(L, D) − H(L, D | S, D) = H(L, D) − H(L | S, D)
= H(D) + H(L|D) − H(L | S, D) = H(D) + I(L; S|D).   (16)

Combining equations (14) and (16), and subsequently applying inequality (15), gives

I(L; S|D) = H(S) − H(S | L, D) ≥ H(S) − H_bin(r_Adv) − (1 − r_Adv) log2(|S| − 1).   (17)

Table 3: Single-core simulation performance of Verilator, PROLEAD, and Quantile, applied to the first-order protected designs AES DOM and Prince TI.