Masking in Fine-Grained Leakage Models: Construction, Implementation and Verification

Abstract. We propose a new approach for building efficient, provably secure, and practically hardened implementations of masked algorithms. Our approach is based on a Domain Specific Language in which users can write efficient assembly implementations and fine-grained leakage models. The latter are then used as a basis for formal verification, allowing for the first time formal guarantees for a broad range of device-specific leakage effects not addressed by prior work. The practical benefits of our approach are demonstrated through a case study of the PRESENT S-Box: we develop a highly optimized and provably secure masked implementation, and show through practical evaluation based on TVLA that our implementation is practically resilient. Our approach significantly narrows the gap between formal verification of masking and practical security.


Introduction
Physical measurements reveal information beyond the inputs and outputs of programs, as execution on physical devices emits information on intermediate computation steps. This information, encoded in noise, timing, power consumption, or electromagnetic radiation, is known as side-channel leakage and can be used to mount effective side-channel attacks.
The masking countermeasure splits secret data a into d shares (a_0, ..., a_{d-1}) such that it is easy to compute a from all shares but impossible from fewer than d shares [CJRR99,ISW03]. This forces attacks to recover d shares instead of a single secret value. An active line of research considers the construction of masked algorithms, denoted "gadgets", which compute some functionality on masked inputs while enforcing that secrets cannot be recovered from fewer than d intermediate values. The construction of gadgets is particularly difficult when considering side-channel leakage which allows an attacker to observe more than just the intermediate computation steps [GMPO19]. Extended leakage models have been devised to consider additional side-channel information in a systematic manner [FGP+18,PR13,DDF14,BGI+18].
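To make the share-splitting concrete, here is a minimal Python sketch (our own illustration, not part of the paper's IL toolchain) of Boolean masking with d shares: the XOR of all shares yields the secret, while any strict subset is uniformly random.

```python
import secrets

def share(secret: int, d: int, bits: int = 8) -> list[int]:
    """Split `secret` into d Boolean shares whose XOR equals the secret."""
    shares = [secrets.randbelow(1 << bits) for _ in range(d - 1)]
    last = secret
    for s in shares:
        last ^= s
    return shares + [last]

def unshare(shares: list[int]) -> int:
    """Recombine all d shares; any strict subset is uniformly distributed."""
    out = 0
    for s in shares:
        out ^= s
    return out

masked = share(0x2A, d=3)        # three shares, i.e. second-order masking
assert unshare(masked) == 0x2A
```

Sampling d-1 shares uniformly and computing the last one as the running XOR is the standard encoding; an attacker holding fewer than d shares learns nothing about the secret.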
Naturally, the question arises whether the masking countermeasure has been applied correctly to a gadget and whether it actually improves security. There exist two main, and fairly distinct, approaches to evaluate the effectiveness of the applied countermeasures: (I) physical validation, performing specific attacks or statistical tests on physical measurements [DSM17,DSV14,SM15,PV17,MOW17], and (II) provable resilience, based on attacker and leakage models [CJRR99,ISW03,FGP+18,PR13,DDF19] and automated verification [BBD+15,BBD+16,Cor18,EWS14]. We review the strengths and weaknesses of both approaches.
The main benefit of reproducing attacks is the close correspondence to security: a successful attack implies a real threat, while an unsuccessful attack rules out a vulnerability from exactly this attack under the specific evaluation parameters. The drawback is the attacker scope, inherently limited to only those attacks which have been performed, and the fact that exhaustive evaluation of all attacks remains intractable in most cases. Statistical evaluation makes it possible to bound the retrievable side-channel information or the success rate of retrieval, or to detect side-channel information leakage without considering actual attacks [SM15,DSM17,DSV14]. Nonetheless, the evaluation remains specific to the input data and measurement environment used during assessment. In both cases it is difficult to decide at which point to stop the evaluation and declare an implementation secure. In addition, these methods have large computational requirements, implying long wait times for evaluation results. This prevents fast iterative development cycles with repeated proposal and evaluation of implementations. Conversely, the implementer has to carefully produce good implementations to avoid too frequent evaluation, limiting creative freedom.
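The statistical detection mentioned above is typically instantiated with Welch's t-test on two trace populations (fixed vs. random inputs); the conventional TVLA criterion flags leakage when |t| exceeds 4.5 at any sample point. A minimal Python sketch with made-up sample values (not measured data):

```python
import math

def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t-statistic between two populations of trace samples."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variance
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Conventional TVLA criterion: |t| > 4.5 flags detectable leakage.
fixed  = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]   # hypothetical sample values
random = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
leaks = abs(welch_t(fixed, random)) > 4.5
```

In a real assessment this statistic is computed per sample point over many thousands of traces; the sketch only shows the decision rule.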
Provable resilience provides a rigorous approach for proving the resilience of masked algorithms. The main benefit of this approach is that guarantees hold in all environments which comply with the assumptions of the proof and that assessment ends when such a proof is found. Inherent to all formal security notions for side-channel is (I) a formal leakage model which defines the side-channel characteristics considered in the proof and (II) an attacker model. The leakage model defines which side-channel information leakages (observations) are accessible to the attacker during execution of a masked program whereas the formal attacker model defines the capabilities of the attacker exploiting this information, e.g. how many side-channel measurements an attacker can perform.
Threshold probing security is arguably the most established approach to provable resilience. In this approach, execution leaks the value of intermediate computations, and the attacker can observe at most t side-channel leakages during an execution of a program masked with d > t shares. The notion of threshold probing security proves perfect resilience against adversaries observing at most t leakages but cannot provide assurance for attackers which potentially observe more. Programs enjoy security against practical attackers w.r.t. the chosen notion if the side-channel model accurately captures the device's leakage characteristics. The main benefit of probing security is that it can be used to rule out classes of attacks entirely, in contrast to physical evaluation such as Test Vector Leakage Assessment (TVLA) [SM16]. Variations of threshold probing security exist, such as the t-Non-Interference (t-NI) and t-Strong-Non-Interference (t-SNI) refinements, which are easier to evaluate (check) or guarantee additional properties [BBD+16].
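The probing argument can be illustrated by brute force on a toy first-order gadget (our own illustration): enumerating the mask shows that every single intermediate value is distributed identically for both values of the secret bit, so any one probe is useless to the attacker.

```python
from collections import Counter

def intermediates(secret: int, r: int) -> list[int]:
    """Intermediate values of a toy 2-share gadget computing NOT on a
    masked bit: shares a0 = r, a1 = secret ^ r; the public constant 1
    is folded into one share only."""
    a0, a1 = r, secret ^ r
    b0 = a0 ^ 1
    return [a0, a1, b0]

def probe_dist(secret: int, probe: int) -> Counter:
    """Distribution of one probed intermediate over the uniform mask r."""
    return Counter(intermediates(secret, r)[probe] for r in (0, 1))

# 1-threshold-probing security: every single probe is distributed
# identically whether the secret bit is 0 or 1.
secure = all(probe_dist(0, p) == probe_dist(1, p) for p in range(3))
# By contrast, a probe observing a0 and a1 jointly would reveal the secret.
```

Verification tools perform essentially this check symbolically, which is what makes higher orders and larger gadgets tractable.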
A further benefit of provable resilience, and in particular of threshold probing security, is that it is amenable to automated verification, which delegates the formal analysis to a computer program and manages the combinatorial explosion that arises when analyzing complex gadgets at high orders.
The main critique of formal security notions for side-channel security concerns the large gap between formal models and behavior in practice, resulting in security assurances that are sometimes hard to interpret, as recently shown by Gao et al. [GMPO19]. In particular, implementations of verified threshold probing secure algorithms frequently enjoy much less practical side-channel resilience, as precisely analyzed by Balasch et al. [BGG+14] and [GMPO19]. Physical evaluation retains a distinct advantage here, in that the increasing diversity of discovered side-channel leakage effects is not entirely considered by existing verification frameworks. One reason is that the considered leakage effects are inherently integrated into the tool, preventing flexible and fine-grained modeling. In the current setting, considering new leakage with distinct behavior requires modifying the tool's implementation. But the diversity of power side-channel leakage encountered in practice can be expected to grow as long as new execution platforms are developed [PV17,BGG+14,CGD18,MOW17,SSB+19,Ves14].

Our Work
In this paper, we illustrate that automated verification can deliver provably resilient and practically hardened masked implementations with low overhead.

Fine-Grained Modeling of Leakage
We define a Domain Specific Language (DSL), denoted IL, for modeling assembly implementations and specifying fine-grained leakage models. The dual nature of IL has significant benefits. First, it empowers implementers to capture real leakage behavior in the form of device-specific leakage models, which ultimately ensure that the purported formal resilience guarantees are in close correspondence with practical behavior. Second, it supports efficient assembly-level implementations of masked algorithms, and bypasses thorny issues with secure compilation. Third, it forms the basis of a generic automated verification framework in which assembly implementations can be analyzed generically, without the need to commit to a fixed or pre-existing leakage model. Specifically, we present a tool that takes as input an implementation and checks whether the implementation is secure w.r.t. the security notion associated with the leakage models given with the implementation. This stands in sharp contrast with prior work on automated verification, which commits to one or a fixed set of leakage models.

Optimized Hardening of Masking
The combination of fine-grained leakage models and reliable verification enables the construction of masked implementations which exhibit no detectable leakage in physical assessment, known as "hardened masking" or "hardening" of masked implementations. We demonstrate several improvements in constructing hardened gadgets and a hardened PRESENT S-Box at 1st and 2nd order which exhibit no detectable leakage beyond one million measurements in TVLA. We provide generic optimization strategies which reduce the overhead from hardening by executing the code of a secure composition of gadgets in an altered order instead of introducing overhead by inserting additional instructions as countermeasure. The resulting overhead reduction of almost 57% for the first-order implementation and of 64% for the second order shows the need to consider composition strategies in addition to established secure composition results. Our contributions outperform the "lazy strategy" [BGG+14] of doubling the number of shares instead of performing hardening: the security order can be increased without detrimental impact on performance, as our optimized 2nd-order hardened PRESENT S-Box is as fast as a non-optimized 1st-order hardened PRESENT S-Box, effectively increasing the security order "for free".

Related Work
For the sake of clarity, we organize related work by area. A first line of work extends leakage models with specific physical effects such as glitches [PR13,DDF14,BGI+18]. Leakage effects were first summarized in a general model by the Robust Probing model [FGP+18]. Later, De Meyer et al. [DBR19] introduced the concept of glitch immunity and unified security notions such as (Strong) Non-Interference in an information-theoretic manner. In comparison to these works, our DSL offers much higher flexibility in terms of leakages, since it can take into account a broader class of leakages, and consequently more realistic scenarios. Another relevant area tackles the problem of composing secure gadgets; a prominent development is the introduction of strong non-interference, which achieves desirable composition properties that cannot be obtained under the standard notion of threshold probing security [BBD+16]. Belaid et al. present an elegant alternative approach to the composition problem; however, their approach rests on the assumption that only ISW gadgets are used [BGR18]. The formal analysis of composability in extended leakage models started to receive more attention with the analysis of Faust et al. [FGP+18], which formalized the physical leakages of glitches, transitions and couplings with the concept of extended probes and proved the ISW multiplication scheme probing secure against glitches in two cycles. Later, Cassiers et al. [CGLS20] proposed the concept of Hardware Private Circuits, which formalizes compositional probing security against glitches, and presented gadgets securely composable at arbitrary orders against glitches. Our work augments the t-NI and t-SNI notions to capture resilience and composition in any fine-grained model expressible in our DSL and in the presence of stateful execution, as required for provably secure compilers such as MaskComp and Tornado [BBD+16,BDM+20].
To the best of our knowledge, the optimization of hardened masking has received little attention in the literature.

Automated Verification
Proving resilience of masked implementations at high orders incurs a significant combinatorial cost, making the task error-prone, even for relatively simple gadgets. Moss et al. [MOPT12] were the first to show how this issue can be managed using program analysis. Although their work is focused on first-order implementations, it has triggered a spate of works, many of which accommodate high orders [BRNI13,EWS14,BBD + 15,Cor18,ZGSW18,BGI + 18]. MaskVerif [BBD + 15, BBC + 19], which we use in our work, is arguably one of the most advanced tools, and is able to verify different notions of security, including t-NI and t-SNI at higher orders, for different models, including ISW, ISW with transitions, and ISW with glitches. Notably, SILVER can verify the recent PINI notion [KSM20]. Gigerl et al. verify the security of software implementations executed on a processor's netlist w.r.t. a fixed hardware leakage model, bridging both worlds [GHP + 20]. Furthermore, the latest version of MaskVerif captures multiple side-channel effects for hardware platforms, which are configurable by the user. However, the input language of MaskVerif lacks the expressiveness of IL, making it difficult to capture the rich class of potential leakage in software implementations.

Modeling Side-Channel Behavior
Side-channel behavior is also expressed for analysis purposes other than provable resilience. Papagiannopoulos and Veshchikov construct models of platform-specific side-channel effects they discovered in practice [PV17]. Their tool ASCOLD prevents combinations of shares in the considered leakage effects, which are hard-coded into the tool. Most importantly, they show that implementations enjoy improved practical security when no shares are combined in their leakage model, which is reminiscent of first-order probing security in extended leakage models. Our contributions allow users to provide fine-grained leakage specifications in IL to verify widely established formal security notions at higher orders.
ELMO [MOW17], MAPS [CGD18] and SILK [Ves14] aim to simulate physical measurements based on detailed models. The tools assume fixed leakage effects but allow customization by the user in the form of valuation functions. This degree of detail is relevant for simulating realistic physical measurements but is not necessary for our information-theoretic notions of security. The authors of MAPS identify effects beyond what is captured in ELMO's fixed set of combinations, showing the need to remain unbiased towards leakage specifications when developing tools for side-channel resilience evaluation. Most notably, ELMO can accurately simulate measurements from models inferred in an almost automated manner and is now used in works attempting to automate the construction of hardened implementations [SSB+19].

Expressing Side-Channel Leakage
Verification of side-channel resilience requires a suitable representation of the implementation under assessment. This representation must express a program's functional semantics and the information observable through side channels. It is well known that the leakage behavior of execution platforms differs, and this diversity must be expressible to gain meaningful security assurance from verification.

A Domain Specific Language with Explicit Leakage
Already at CHES 2013, Bayrak et al. [BRNI13] pointed out the difficulty of expressing arbitrary side-channel leakage behavior while providing a "good interface" to users willing to specify device-specific side-channel characteristics. The reason can be traced to the fundamental approach of implicitly augmenting the underlying language's operators with side-channel leakage. In such a setting, the addition of two variables c ← a + b; implicitly models information observable by an adversary, but what is leaked (e.g. a, b, or a + b) must be encoded in the language semantics (i.e., the meaning of ← and +), which prevents flexible adaptation of leakage characteristics.
The concept of "explicit leakage" is an alternative, as it requires explicitly stating what side-channel information is emitted. We present a Domain Specific Language (DSL) adopting this concept; the language's constructs do not capture side-channel behavior (i.e., their execution provides no observable side-channel information), except for a dedicated statement "leak" which can be understood as handing specific information to an adversary. The example above can now be stated as c ← a + b; leak {a + b};, which has two important benefits: First, verification and representation of programs are decoupled into two independent tasks. Second, the specification of side-channel behavior becomes more flexible in that a diverse set of complex side-channels can be expressed and altered without effort.
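The decoupling can be sketched with a toy interpreter in Python (a stand-in for IL, not its actual semantics): assignments change state silently, and only leak statements feed the adversary's observation trace.

```python
def run(program, state):
    """Execute a toy explicit-leakage program: only 'leak' statements
    contribute to the adversary's observation trace."""
    observations = []
    for stmt in program:
        if stmt[0] == "assign":        # ("assign", var, fn): no leakage
            _, var, fn = stmt
            state[var] = fn(state)
        elif stmt[0] == "leak":        # ("leak", [fn, ...]): one probe,
            _, fns = stmt              # possibly several values at once
            observations.append(tuple(fn(state) for fn in fns))
    return state, observations

# c <- a + b; leak {a + b};
prog = [
    ("assign", "c", lambda s: s["a"] + s["b"]),
    ("leak",   [lambda s: s["a"] + s["b"]]),
]
state, obs = run(prog, {"a": 2, "b": 3})
```

Swapping the leak statement, e.g. to ("leak", [lambda s: s["a"], lambda s: s["b"]]), changes the leakage model without touching the interpreter, which mirrors the separation of concerns argued for above.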
Our DSL, named "IL" for "intermediate language", has specific features to support the representation of low-level software. A Backus-Naur Form representation is given in Figure 1. Its building blocks are state elements χ, expressions e, commands c consisting of multiple statements i, and global declarations g of variables and macros with local variables x_1, ..., x_k. A state element χ is either a variable x, an array x with an indexing expression e, or a location in memory addressed by an expression e. Memory is distinguished to allow specifications of disjoint memory regions, which eases formal verification. Expressions are built from state elements χ, constant integers n, unique labels l, and operators o applied to expressions. Infix abbreviations for logical "and" ⊗, "exclusive-or" ⊕, addition + and right shift are used in the following. Allowed statements i are assignments χ ← e, explicit leaks leak {e_1, ..., e_j} of one or more expressions, and calls m(e_1, ..., e_j) to a previously defined macro with name m. Statements for if conditionals and while loops are supported as well. Labels l are needed to represent the execution of microcontrollers (MCUs), which is based on the address of an instruction. They are defined by a dedicated statement, enabling execution to proceed at the instruction subsequent to the label. Static jumps to unique labels and indirect jumps based on expressions over labels are supported to represent control flow.
In a nutshell, executable implementations consist of an unstructured list of hardware instructions where each instruction is located at a specific address and execution steps over addresses. We represent implementations as a list of label definitions and macro calls: every instruction is represented by an IL label corresponding to the address of the instruction and a macro call representing the hardware instruction and its operands. A line of assembly code "0x16E: ADDS R0 R1" becomes almost identical IL code: label 0x16E; ADDS(R0, R1);, where ADDS is a call to the model of the "ADDS" instruction.
The DSL makes it possible to express fine-grained leakage models specifying the semantics and side-channel behavior of assembly instructions. In this light, verifying side-channel resilience of implementations involves three steps: (I) modeling the behavior of instructions, (II) representing an implementation using such a model, and (III) analyzing or verifying the representation (Section 3).
We stress the significant benefit: verification and representation become separate concerns, i.e., automated verification is now defined over the semantics of our DSL, and the separate leakage model of step (I) can be freely modified or exchanged without altering the work-flow in stages (II) and (III). In particular, our tool, named "scVerif", allows the user to provide such a leakage specification in conjunction with an implementation for verification of side-channel resilience.

Modeling Instruction Semantics
The DSL allows the construction of models which are specific to the device executing an implementation, by attaching device-specific side-channel behavior. This is especially important for the Arm and RISC-V Instruction Set Architectures (ISAs), since these are implemented in various MCUs which execute instructions differently, potentially giving rise to distinct side-channel information. The instruction semantics must be modeled since some leakage effects depend not only on intermediate state but also on the order of execution (e.g. control flow). In the following, we show the construction of models for Arm Cortex M0+ (CM0+) instructions, which are augmented with leakage in Section 2.3. The DSL enables the construction of leakage models for other architectures or programming languages as well.
IL can express architecture flags, carry bits, unsigned/signed operations, casts between data types, bit operations, control flow, etc. in close correspondence to ISA specifications. The instructions of the CM0+ ISA operate on a set of globally accessible registers and flags, denoted the architecture state. They can be modeled as global variables in IL: var R0; var R1; ... var PC; var apsrc; (carry flag) var apsrv; (overflow flag) var apsrz; (zero flag) var apsrn; (negative flag).
Addition is used in the adds instruction and in instructions operating on pointers such as ldr (load) and str (store). Expressing the semantics of addition with carry requires casting 32-bit values to unsigned, respectively signed, values and comparing the results of the additions to assign the carry and overflow flags correctly. The IL model of adds is expressed in Algorithm 1, closely following the Arm ISA specification [ARM18].

Algorithm 1: Low-level model of addition with carry and the adds instruction (the zero flag, for example, is assigned by apsrz ← rd = 0).

The adds instruction is modeled by calling the macro and expressing the side-effects on the global flags. A special case of addition to pc requires issuing a branch to the resulting address (represented as a label). The operator n is used to check whether the parameter rd is equal to the register with name pc and to conditionally issue a branch.
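The flag computation described above can be sketched in Python (a hedged stand-in for the IL macro of Algorithm 1; the names add_with_carry and adds are ours, loosely following the Arm AddWithCarry pseudocode):

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(x: int, y: int, carry_in: int = 0):
    """Sketch of Arm-style addition with carry: 32-bit result plus
    carry (unsigned comparison) and overflow (signed comparison) flags."""
    unsigned_sum = (x & MASK32) + (y & MASK32) + carry_in
    result = unsigned_sum & MASK32
    carry = int(result != unsigned_sum)

    def signed(v: int) -> int:          # reinterpret 32-bit value as int32
        return v - (1 << 32) if v & 0x80000000 else v

    signed_sum = signed(x & MASK32) + signed(y & MASK32) + carry_in
    overflow = int(signed(result) != signed_sum)
    return result, carry, overflow

def adds(state: dict, rd: str, rn: str):
    """adds rd, rn: update rd and the APSR flags as side-effects."""
    res, c, v = add_with_carry(state[rd], state[rn])
    state[rd] = res
    state["apsrc"], state["apsrv"] = c, v
    state["apsrz"] = int(res == 0)
    state["apsrn"] = int(res >> 31)

st = {"R0": 0xFFFFFFFF, "R1": 1, "apsrc": 0, "apsrv": 0, "apsrz": 0, "apsrn": 0}
adds(st, "R0", "R1")   # wraps to 0: carry and zero flags set, no overflow
```

The unsigned/signed double comparison mirrors the casting described in the text: carry comes from the unsigned sum overflowing 32 bits, overflow from the signed sum disagreeing with the signed reinterpretation of the result.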
Sampling randomness, e.g. in the form of queries to random number generators, can be expressed by reading from a tape of pre-sampled randomness in global state.

Modeling Leakage
We augment the instruction models with a representation of power side-channels specific to threshold probing security. For this security notion it suffices to model the dependencies of leakages, which is much simpler and more portable than modeling the function defining the actual value observable by an adversary. Specifying multiple expressions within a single leak {e_1, e_2, ...} statement allows the threshold probing attacker to observe multiple values (expressions) at the cost of a single probe. On hardware this is known from the "glitch" leakage effect, which allows multiple values to be observed at once [FGP+18]. The leak statement allows generic specification of such multi-variate leakage, both for side-channel leakage effects and as worst-case specifications of observations. In particular, a program which is resilient w.r.t. leak {e_1, e_2} is necessarily resilient w.r.t. any function f in leak {f(e_1, e_2)}, but not vice versa.
The adds instruction is augmented with leakage, which is representative for ands (logical conjunction) and eors (exclusive disjunction) as they behave similarly in our model. Observable leakage arises from computing the sum and can be modeled by the statement leak {rd + rn};. Transition leakage as in the robust probing model of [FGP+18] is modeled in a worst-case manner: instead of the Hamming distance, two values are leaked at the cost of a single probe: leak {rd, rd + rn};, covering any exotic combination as e.g. observed in [GMPO19,MOW17]. The order of execution matters, thus this leakage must be added at the top of the macro, before assigning rd. For better clarity we expose these two leakage effects as macros. The specification of adds is given in Algorithm 2.
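The worst-case character of leak {rd, rd + rn} can be illustrated in Python (function names are our own): any function of the two values, such as the classic Hamming-distance transition model, is computable from the pair, so an adversary given the pair is at least as strong as one given any single combination f of it.

```python
def hamming_weight(v: int) -> int:
    return bin(v & 0xFFFFFFFF).count("1")

def hamming_distance(old: int, new: int) -> int:
    """Classic transition-leakage model for a register update."""
    return hamming_weight(old ^ new)

def worst_case_probe(old: int, new: int) -> tuple:
    """Worst-case model leak {rd, rd + rn}: the probe reveals both the
    previous and the new register value outright."""
    return (old, new)

# Any f of the two values, e.g. the Hamming distance, is derivable from
# the worst-case probe, so security w.r.t. leak {rd, rd + rn} implies
# security w.r.t. leak {f(rd, rd + rn)}, but not vice versa.
old, new = 0b1010, 0b0110
probe = worst_case_probe(old, new)
hd = hamming_distance(*probe)
```

This is exactly the dominance argument stated at the end of Section 2.3's leak discussion: simulating the tuple simulates every function of it.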
Definition 1 (Computation Leakage Effect). The computation leakage effect produces an observation on the value resulting from the evaluation of an expression e.

Algorithm 2: Specification of adds augmented with leakage; EmitComputationLeak(rd + rn) is emitted before the call ADDS(rd, rn).

Power side-channels encountered in practice sometimes depend on previously executed instructions. Le Corre et al. describe a leakage effect, named "operand leakage", which leaks a combination of the current and previous operands of two instructions (e.g. parameters to adds) [CGD18]. A similar effect on memory accesses was observed by Papagiannopoulos and Veshchikov, denoted "memory remnant" in [PV17]. The explicit leak statement enables modeling of such cross-instruction leakage effects by introducing additional state elements χ, denoted "leakage state". In general, leakage effects which depend on one value p from a past execution step and one value c from the current instruction can be modeled by placing p in a global state element opA during the first instruction and emitting a leak of the cached state and the current value, leak {opA, c}, in the latter instruction. The operand and memory remnant leakage effects always emit leakage and update leakage state jointly. We put forward a systematization under the name "revenant leakage", owing its name to the (unexpected) comeback of sensitive data from past execution steps and, in the figurative sense, haunting the living cryptographer during the construction of secure masking. The leakage effect is modeled in Definition 3 and applied to the adds instruction in Algorithm 2. The definition can easily be modified such that the state change is conditional on a user-defined predicate, or the leakage is extended to a history of more than one instruction.
Definition 3 (Revenant Leakage Effect). The "revenant" leakage effect releases a transition leakage prior to updating some leakage state x ← p.
The leakage effects are applied in instruction models by calling EmitRevenantLeak with the distinct leakage state element used for caching the value (e.g. opA) and the value leaking in combination, e.g. the first operand of an addition. The overall leakage model for a simplified ISA is depicted in Algorithm 3; it corresponds to the model used for CM0+ assembly. In our model the leakage state elements are denoted opA, opB, opR, opW, modeling four distinct revenant effects for the 1st and 2nd operand of computations as well as for loads and stores separately. Some effects have been refined to match the behavior encountered in practice, which diverges in the mapping of operands and shows an unexpected propagation of the destination register in load instructions.
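The revenant mechanism, leak first and only then update the cached operand, can be sketched in Python (our illustration; opA mirrors the leakage state element of the same name):

```python
class LeakState:
    """Cross-instruction ('revenant') leakage: the operand cached in opA
    from a past instruction leaks jointly with the current operand."""
    def __init__(self):
        self.opA = 0          # leakage state element
        self.trace = []       # adversary-observable probes

    def emit_revenant_leak(self, current: int):
        self.trace.append((self.opA, current))  # leak {opA, current} ...
        self.opA = current                      # ... then update the state

regs = {"R0": 5, "R1": 7}
ls = LeakState()
ls.emit_revenant_leak(regs["R0"])   # e.g. first operand of one instruction
ls.emit_revenant_leak(regs["R1"])   # next instruction: (5, 7) in ONE probe
```

The ordering matters: leaking before the update is what makes a stale sensitive value from a past instruction observable together with the current one, exactly the haunting behavior the systematization is named after.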
The empirical construction of this model was conducted in cooperation and is described in [ABB+21]. In short, the model was initialized by testing whether the leakage effects discovered in prior work are detectable on our platform, by constructing small first-order test cases as described in [PV17]. In an iterative process, multiple gadgets with provable security in the model were constructed and assessed by physical leakage detection. Every detected leakage was analyzed and added to the model, again guided by tests according to [PV17], until no leakage was detectable anymore.
In [PV17] the "neighboring" leakage is reported, but we did not observe it on CM0+ MCUs during our case study. The effect represents a coupling between registers, probably related to the particular architecture of the ATMega163, highlighting the need for device-specific leakage models. Neighboring leakage can be modeled by using the n operator as shown in Definition 4.

Definition 4 (Neighboring Leakage Effect). The neighboring leakage effect causes a leak of an unrelated register RN when register RM is accessed.

The DSL, in combination with the concept of explicit leakage, makes it possible to model all leakage effects known to us, such that verification of threshold probing security becomes aware of these additional leakages. Our effect definitions can serve as building blocks to construct models such as the one in Algorithm 3, but can be freely modified to model behavior not yet publicly known. In particular, the expressiveness of modeling appears unlimited, except that further computation operators o might need to be added to our small DSL.
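Definition 4 can be sketched in Python as follows (the register pairing in NEIGHBOR is hypothetical; real couplings are device-specific, as the ATMega163 observation shows):

```python
NEIGHBOR = {"R2": "R3"}   # hypothetical coupling between two registers

def access(state: dict, trace: list, reg: str) -> int:
    """Toy model of the neighboring effect: accessing register RM
    additionally leaks the value of the coupled register RN."""
    if reg in NEIGHBOR:
        trace.append(state[NEIGHBOR[reg]])   # leak {RN}
    trace.append(state[reg])                 # ordinary value leakage of RM
    return state[reg]

state = {"R2": 0xAA, "R3": 0x55}
trace = []
v = access(state, trace, "R2")   # touching R2 also exposes R3
```

A verifier aware of this model would reject any gadget that places two shares of the same secret in a coupled register pair, even though neither register is computed on.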

Stateful (S)NI and Automated Verification
In this section, we lay the foundations for proving security of IL implementations. We first define security notions for IL gadgets: following a recent trend [BBD + 16], we consider two notions: non-interference (NI) and strong non-interference (SNI), which achieve different composability properties. Then, we present an effective method for verifying whether an IL gadget satisfies one of these notions.

Security Definitions
We start with a brief explanation of the need for a new security definition. At a high level, security of stateful computations requires dealing with residual effects on state. Indeed, when a gadget is executed on the processor, it does not only return the computed output but additionally leaves "residue" in registers, memory, or leakage state. Subsequently executed code might produce leakages combining these residues with output shares, breaking secure composability. As an example, consider the composition of a stateful refreshing gadget with a stateful multiplication scheme: Refr(Mult(x, y)). In the case of non-stateful gadgets, if Mult is t-NI and Refr is t-SNI, such a composition is t-SNI. However, if the gadgets are stateful, this is no longer necessarily the case.
We give a concrete example. Consider a modified ISW multiplication which is t-SNI even with the leakages defined in the previous section. The output state s_out of the multiplication, in combination with the revenant leakage effect in the load of Algorithm 3, can be used to retrieve information about the secret as follows: after the multiplication, one register could contain the last output share of the multiplication gadget, and the gadget is still secure. If the refreshing gadget first loads the first output share of the multiplication into the same register, the revenant effect emits an observation containing both values (the first and last output share of the multiplication) in a single probe. The remaining probes can then be used to obtain the remaining output shares of the multiplication, and the composition is clearly vulnerable.
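The attack narrative above can be replayed concretely in Python (a toy reconstruction with three shares; share3 and the register residue are our stand-ins, not the actual ISW gadget):

```python
import secrets

def share3(secret: int) -> list[int]:
    """Three Boolean shares of an 8-bit value (2nd-order masking)."""
    r0, r1 = secrets.randbelow(256), secrets.randbelow(256)
    return [r0, r1, secret ^ r0 ^ r1]

secret = 0xA5
y = share3(secret)          # output shares of the multiplication gadget

# Residue: the last output share y[2] stays behind in a register.
r2_residue = y[2]
# The refresh gadget loads y[0] into the same register; the revenant
# effect emits BOTH values in a single probe.
probe1 = (r2_residue, y[0])
# One further ordinary probe on the remaining internal share:
probe2 = y[1]

# Two probes suffice, although the masking order should resist two:
recovered = probe1[0] ^ probe1[1] ^ probe2
assert recovered == secret
```

The point is that neither gadget is broken in isolation; only the residue carried across the gadget boundary lets two probes collect all three shares, which is exactly what the stateful notions below are designed to rule out.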
We first introduce the notion of gadget, on which our security definitions are based. Informally, gadgets are IL macros with security annotations.
Definition 5 (Gadget). A gadget is an IL macro with security annotations:
• a security environment, mapping inputs and outputs to a security level: secret (H) or public (L),
• a memory typing, mapping memory locations to a security level: secret (H), public (L), random (R),
• share declarations, consisting of tuples of inputs and outputs.
We adopt the convention that all tuples are of the same size and disjoint, and that all inputs and outputs must belong to a share declaration.
We now state two main notions of security. The first notion is an elaboration of the usual notion of non-interference and is stated relative to a public input state s_in and a public output state s_out. The definition is split into two parts: the first captures that the gadget does not leak, and the second that the gadget respects the security annotations.
Definition 6 (Stateful t-NI). A gadget with input state s_in and output state s_out is stateful t-Non-Interfering (t-NI) if every set of t observations can be simulated by using at most t shares of each input and any number of values from the input state s_in. Moreover, any number of observations on the output state s_out can be simulated without using any input share, but using any number of values from the input state s_in.
The second notion is an elaboration of strong non-interference. Following standard practice, we distinguish between internal observations (i.e., observations that differ from outputs) and output observations.

Definition 7 (Stateful t-SNI). A gadget with input state s_in and output state s_out is stateful t-Strong-Non-Interfering (t-SNI) if every set of t_1 observations on internal values and t_2 observations on the output values with t_1 + t_2 ≤ t, combined with any number of observations on the output state s_out, can be simulated by using at most t_1 shares of each input and any number of values from the input state s_in.
Both definitions require that gadgets have a fixed number of shares. This assumption is made here for simplicity of presentation but is not required by our tool.
Finally, we note that there exist other notions of security. One such notion is called probing security. We do not define this notion formally here, but note that for stateful gadgets t-SNI implies probing security, provided the masked inputs are mutually independent families of shares and the input state is probabilistically independent of the masked inputs and the internal randomness.
We validate our notions of security through a proof that they are composable; Section 4 introduces new and optimized composition theorems. The general composition results hold for stateful t-NI, respectively stateful t-SNI, because the notions ensure properties similar to their non-stateful counterparts.
Proposition 1. Let G_1(·, ·) and G_2(·) be two stateful gadgets as in Figure 2. Assuming G_2 is stateful t-SNI and G_1 is stateful t-NI, the composition G_2(G_1(·), ·) is stateful t-SNI.
Proof. Let s^1_in and s^1_out be respectively the state input and state output of G_1, and s^2_in and s^2_out respectively the state input and state output of G_2. We prove in the following that the composition G_2(G_1(·), ·) is stateful t-SNI.
Let Ω = (I, O) be the set of observations on the whole composition, where I_i are the observations on the internal computation of G_i, I = I_1 ∪ I_2 with |I| = |I_1 ∪ I_2| ≤ t_1 and |I| + |O| ≤ t.
Since G_2 is stateful t-SNI and |I_2 ∪ O| ≤ t, there exist observation sets S^2_1 and S^2_2 such that |S^2_1| ≤ |I_2|, |S^2_2| ≤ |I_2|, and all the observations on internal and output values, combined with any number of observations on the output state s^2_out, can be simulated by using any number of values from the input state s^2_in and the shares of each input with index respectively in S^2_1 and S^2_2.
Since G_1 is stateful t-NI, |I_1 ∪ S^2_1| ≤ |I_1 ∪ I_2| ≤ t, and s^1_out = s^2_in, there exists an observation set S^1 such that |S^1| ≤ |I_1| + |S^2_1| and all the observations on internal and output values, combined with any number of observations on the output state s^2_out, can be simulated by using any number of values from the input state s^1_in and the shares of the input with index in S^1. Now, composing the simulators that we have for the two gadgets G_1 and G_2, all the observations on internal and output values of the circuit, combined with any number of observations on the output state, can be simulated from |S^1| ≤ |I_1| + |S^2_1| ≤ |I_1| + |I_2| ≤ t_1 shares of the first input, |S^2_2| ≤ |I_2| shares of the second input, and any number of values from the input state s^1_in. Therefore, we conclude that the circuit is stateful t-SNI.

Automated Verification
In this section, we consider the problem of formally verifying that an IL program is secure at order t, for t ≥ 1. The obvious angle for attacking this problem is to extend existing formal verification approaches to IL. However, there are two important caveats. First, some verification approaches make specific assumptions on the programs; e.g., [BGR18] assumes that gadgets are built from ISW core gadgets. Such assumptions are reasonable for more theoretical models but are difficult to transpose to a more practical model; besides, they defeat the purpose of our approach, which is to provide programmers with a flexible environment to build verified implementations. Second, reimplementing t-SNI and t-NI checkers for IL is a very significant engineering endeavour. Therefore, we follow an alternative method: we define a transformation T that maps IL programs into a fragment that coincides with the core language of MaskVerif and reuse the verification algorithm of MaskVerif for checking the transformed program. The transformation is explained below and satisfies correctness and precision. Specifically, the transformation T is correct: if T(P) is secure at order t, then P is secure at order t (where security is either t-NI or t-SNI at order t). The transformation T is also precise: if P is secure at order t and T(P) is defined, then T(P) is secure at order t. Thus, the sole concern with the approach is the partial nature of the transformation T. While our approach rejects some legitimate programs, it works well on a broad range of examples. The main differences between IL and MaskVerif are that the latter does not have memory accesses, macros, or control-flow instructions, and limits array accesses to constant indices. Our program transformation proceeds in two steps: first, all macros are inlined; then the expanded program is partially evaluated.
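At order 1, the kind of property such a checker establishes can be demonstrated by brute force: for a first-order masked XOR gadget, every single intermediate value must have a distribution (over the masking randomness) that is independent of the secrets. The following Python sketch performs this exhaustive check on a toy Boolean-masked XOR; it is an illustration of the verified property, not the MaskVerif algorithm itself.

```python
from itertools import product

def masked_xor_trace(a, b, r_a, r_b):
    """First-order masked XOR over GF(2) with shares (a^r_a, r_a), (b^r_b, r_b).
    Returns every intermediate value a single probe could observe."""
    a0, a1 = a ^ r_a, r_a
    b0, b1 = b ^ r_b, r_b
    c0 = a0 ^ b0
    c1 = a1 ^ b1
    return [a0, a1, b0, b1, c0, c1]

def probe_is_independent(probe_index):
    # For each secret pair (a, b), tabulate the distribution of the probed
    # value over uniform randomness (r_a, r_b); order-1 security requires
    # all these distributions to coincide.
    dists = set()
    for a, b in product((0, 1), repeat=2):
        counts = [0, 0]
        for r_a, r_b in product((0, 1), repeat=2):
            counts[masked_xor_trace(a, b, r_a, r_b)[probe_index]] += 1
        dists.add(tuple(counts))
    return len(dists) == 1

assert all(probe_is_independent(i) for i in range(6))
```

Exhaustive enumeration is only feasible for tiny examples; tools such as MaskVerif instead reason symbolically over the expressions defining each observation.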

Partial evaluation
The partial evaluator takes as input an IL program and a public initial state and returns another IL program. The output program is equivalent to the original program w.r.t. functionality and leakage, under some mild assumptions about the initial memory layout, explained below. Our partial evaluator manipulates abstract values, tuples of abstract values, and abstract memories. An abstract value ϑ can be either a base value corresponding to a concrete base value, such as a Boolean b or an integer n; a label l, representing an abstract code pointer, used for indirect jumps; or an abstract pointer ⟨x, n⟩. The latter is an abstract representation of a real pointer. Formally, the syntax of values is defined by: ϑ ::= b | n | l | ⟨x, n⟩. Initially, the abstract memory is split into different (disjoint) regions, modeled by fresh arrays, with maximal offsets, that do not exist in the original program. These regions are what we call the memory layout. An abstract pointer ⟨x, n⟩ represents a pointer to the memory region x with offset n (an integer). This encoding is helpful to deal with pointer arithmetic.
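A minimal Python sketch of this abstract value domain, under our own naming, shows how the ⟨x, n⟩ encoding makes pointer arithmetic tractable: adding a constant to an abstract pointer stays within the same region, while any operation the evaluator cannot resolve statically falls to bottom.

```python
from dataclasses import dataclass
from typing import Union

# Sketch of the abstract value domain: base values (bool/int), code labels,
# and abstract pointers <region, offset>. The class names are ours.
@dataclass(frozen=True)
class Label:
    name: str

@dataclass(frozen=True)
class Pointer:
    region: str
    offset: int

AbsValue = Union[bool, int, Label, Pointer]

def add(v1: AbsValue, v2: AbsValue):
    """Pointer arithmetic on abstract values; returns None (bottom) when
    the result cannot be computed statically."""
    if isinstance(v1, int) and isinstance(v2, int):
        return v1 + v2
    if isinstance(v1, Pointer) and isinstance(v2, int):
        return Pointer(v1.region, v1.offset + v2)  # <x, n> + k = <x, n+k>
    if isinstance(v1, int) and isinstance(v2, Pointer):
        return Pointer(v2.region, v2.offset + v1)
    return None

assert add(Pointer("rnd", 0), 4) == Pointer("rnd", 4)
assert add(2, 3) == 5
assert add(Pointer("a", 0), Pointer("b", 0)) is None  # no static meaning
```

Because regions are disjoint, an access through ⟨x, n⟩ can be rewritten to the constant array access x[n], which is exactly what the MaskVerif fragment requires.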
The following code gives an example of a region declaration:

    region mem w32 a

In our running example, the initial memory is split into 5 distinct regions a, b, c, rnd, and stack, where a is an array of size 2 with indices 0 and 1. Remark that this initial assumption is not checked (and cannot be checked by the tool). Another part of the memory layout provides initialisations for registers (IL variables):

    init r0 <rnd, 0>
    init r1 <c, 0>
    init r2 <a, 0>
    init r3 <b, 0>
    init sp <stack, 0>

In particular, this specifies that the register r0 initially points to the region rnd. Some extra information is also provided to indicate which regions initially contain random values, or correspond to input/output shares. The partial evaluator is parameterized by a state ⟨p, c, µ, ρ, ec⟩, where p is the original IL program, c the current command, µ a mapping from p's variables to their abstract values, ρ a mapping from memory-region variables to their abstract values, and ec the sequence of commands that have been partially executed. The partial evaluator iteratively propagates values, removes branching instructions, and replaces memory accesses by variable accesses (or constant array accesses). Figure 3 provides some selected rules for the partial evaluator.
For expressions, the partial evaluator computes the value ϑ of e in µ and ρ (which can be ⊥) and an expression e′ in which memory/array accesses are replaced by variable/constant array accesses, i.e., [[e]]^ρ_µ = (ϑ, e′). If the expression is of the form o(e_1, …, e_n), all the arguments e_i are partially evaluated to (ϑ_i, e_i′); the resulting expression is the operator applied to the resulting expressions e_i′, and the resulting value is the partial evaluation õ(ϑ_1, …, ϑ_n), where õ computes the concrete result if all the ϑ_i are concrete values and returns ⊥ otherwise. Sometimes the partial evaluator uses more powerful simplification rules, like 0 + ϑ ⇝ ϑ. If the expression is a variable, the partial evaluator simply returns the value stored in µ together with the variable itself. The case is similar for array accesses: first the index expression is partially evaluated; if it reduces to a constant offset ofs, the resulting expression is x[ofs] and the value is ρ(x)[ofs].
For assignments, the partial evaluator evaluates the left side χ as an expression, leading to a refined left-side expression χ′; the right side e is also partially evaluated, leading to (ϑ, e′); the partially evaluated assignment is χ′ ← e′, and the mappings µ and ρ are updated accordingly with the value ϑ. For leak instructions, the partial evaluator simply propagates known information into the command. For control-flow instructions, the partial evaluator tries to resolve the control flow and eliminates the instruction. For goto statements, it tries to resolve the next instruction to be executed and eliminates the instruction.
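The expression rules above can be sketched as a small recursive evaluator; this is an illustrative simplification in Python (expressions are nested tuples, ⊥ is `None`), not the rules of Figure 3.

```python
def peval(e, mu):
    """Partially evaluate expression e under variable map mu.
    Returns (value_or_None, residual_expression); None plays the role of bottom."""
    if isinstance(e, int):                      # constant
        return e, e
    if isinstance(e, str):                      # variable: look up in mu
        return mu.get(e), e
    op, *args = e                               # ('add', e1, e2), ('xor', e1, e2)
    vals, exprs = zip(*(peval(a, mu) for a in args))
    if all(v is not None for v in vals):        # fold: all arguments concrete
        if op == "add":
            return vals[0] + vals[1], vals[0] + vals[1]
        if op == "xor":
            return vals[0] ^ vals[1], vals[0] ^ vals[1]
    if op == "add" and vals[0] == 0:            # simplification rule 0 + v ~> v
        return vals[1], exprs[1]
    return None, (op, *exprs)

mu = {"x": 0, "y": None, "z": 5}
assert peval(("add", "x", "y"), mu) == (None, "y")   # 0 + y simplifies to y
assert peval(("add", "z", 2), mu) == (7, 7)          # fully folded
assert peval(("xor", "y", 1), mu)[0] is None         # cannot be resolved
```

The residual expression is what survives into the transformed program; a fully folded expression becomes a constant, exactly as the evaluator replaces resolvable computations.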
The transformation is sound.

Proposition 2 (Informal). Let P and P′ be an IL gadget and the corresponding MaskVerif gadget output by the partial evaluator. For every initial state s satisfying the memory layout assumptions, the global leakage of P w.r.t. s and a set of inputs is equal to the global leakage of P′ w.r.t. the same inputs.
We briefly comment on proving Proposition 2. In order to provide a formal proof, a formal semantics of gadgets is needed. Our treatment so far has intentionally been left informal. However, the behavior of gadgets can be made precise using programming language semantics. We briefly explain how. Specifically, the execution of gadgets can be modelled by a small-step semantics that captures one-step execution between states. This semantics is mostly standard, except for the leak statements which generate observations. Using the small-step semantics, one can model global leakage as a function that takes as input initial values for the inputs and an initial state and produces a sequence of observations, a list of outputs and a final state. Last, we transform global leakage into a probabilistic function by sampling all inputs tagged with the security type R independently and uniformly from their underlying set. This yields a function that takes as input initial values for the inputs and an initial partial state (restricted to the non-random values), a list of observations selected by the adversary and returns a joint distribution over tuples of values, where each tuple corresponds to an observation selected by the adversary.
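The small-step semantics with leak statements can be illustrated by a tiny interpreter: instructions update a state, and leak instructions append observations derived from the current state. This is our own toy model, not the formal IL semantics.

```python
def run(program, state, rng):
    """Tiny interpreter: instructions are ('asgn', x, f), ('rand', x), or
    ('leak', f); leak instructions record an observation over the state."""
    observations = []
    for instr in program:
        if instr[0] == "asgn":
            _, x, f = instr
            state[x] = f(state)
        elif instr[0] == "rand":
            state[instr[1]] = rng()   # sample a value tagged R
        elif instr[0] == "leak":
            observations.append(instr[1](state))
    return observations, state

# Example: a register write leaks its new content, and a subsequent write
# leaks the transition (old XOR new) between consecutive register values.
prog = [
    ("asgn", "r4", lambda s: s["a0"]),
    ("leak", lambda s: s["r4"]),               # computation leak
    ("asgn", "r4", lambda s: s["a1"]),
    ("leak", lambda s: s["a0"] ^ s["a1"]),     # transition leak on r4
]
obs, _ = run(prog, {"a0": 1, "a1": 0, "r4": 0}, rng=lambda: 0)
assert obs == [1, 1]
```

Sampling the `rand` instructions uniformly and collecting the observations selected by the adversary yields exactly the joint distribution described above.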

Implementation
We have implemented the partial evaluator as a front-end to MaskVerif, named "scVerif". Since the input language of MaskVerif did not include a leak construct (only the internal representation used it), we have modified the MaskVerif input language to provide direct access to the leak statement as well as to leak-free assignments. Users can now write MaskVerif gadgets with custom leakage by using explicit leak and leak-free statements.
Moreover, we have extended the input language of MaskVerif so that inputs and outputs can be declared as public. This makes it possible to express the presented notions of stateful t-SNI and t-NI. The extended checker verifies that public outputs depend only on constants and public inputs, i.e., that they depend neither on the private or shared input variables nor on the random variables used by the gadget. No further modification was necessary to check the formal notions w.r.t. custom leakage models. The automated representation of programs in custom leakage models is done by the scVerif front-end, which generates an equivalent MaskVerif program (see Proposition 2).
Users can write leakage models, annotations, and programs in IL, or provide programs in assembly code. If the output program lies in the MaskVerif fragment, verification starts with user-specified parameters such as the security order or the property to verify; otherwise, the program is rejected. The tool also applies to bit- and n-sliced implementations and provides additional automation for the temporal accumulation of probes to represent capacitance in physical measurements. Share-slicing is not yet supported, as the scheme is questioned in [GMPO19] and fundamentally insecure in our CM0+ models; however, a few additional transformations would allow extending our work to such implementations.

Representative Proofs of Efficient Masking
We describe the construction and optimization of gadgets that do not exhibit vulnerable leakage at any order t ≤ d − 1, where d is the number of shares. That is, we harden masked implementations to be secure at the optimal order t = d − 1 in fine-grained leakage models, as opposed to the "lazy" strategy of masking in a basic model at higher orders with the intention of achieving practical security at lower orders t < d − 1 [BGG + 14].
Creating a secure gadget is an iterative process involving three tasks: (a) understanding and modeling the actual leakage behavior, (b) constructing an (efficient) implementation which is secure in the fine-grained model, and (c) optionally performing physical evaluation of side-channel resilience to assess the quality of the model for the specific target platform. Protecting an implementation against side-channel effects mandates the insertion of instructions that circumvent vulnerable combinations of masked secrets.

Hardened Masking
In this section, we discuss the development of gadgets which enjoy security in any fine-grained leakage model. We design gadgets first in the simplified IL model depicted in Algorithm 3. Designing in IL is more flexible than assembly, since shortcuts such as leakage-free operations and abstract countermeasures are available. Once hardened, the gadget is implemented in assembly and verified again; this step is to a large degree trivial, but requires substituting abstract countermeasures by concrete instructions.
Each gadget takes as input one or two values a and b, respectively encoded in (a_0, …, a_{d−1}) and (b_0, …, b_{d−1}), and gives as output the shares (c_0, …, c_{d−1}), encoding a value c. By convention, inputs and outputs are stored in memory to allow the construction of implementations at higher orders. Our gadgets, provided in the Supplementary material, use the registers R0, R1, R2, and R3 as memory addresses pointing to inputs, outputs, and random values stored in memory. The registers R4, R5, R6, and R7 are used to perform the elementary operations. Registers beyond R7 are used rarely.
A gadget which is correctly masked in the basic leakage model, i.e., secure against computation leakage (Definition 1), can be secured by purging the architecture and leakage state at selected locations within the code. The reason is simple: every leak must be defined over elements of the state, and removing sensitive data from these elements prior to the instruction causing such a leak removes the ability to observe the sensitive data.
We distinguish "scrubbing" countermeasures, which purge architecture state, from "clearing" countermeasures, which remove values residing in leakage state. Two macros serve as abstract countermeasures: scrub(R0) and clear(opA) assign some value independent of secrets to R0, respectively opA. At assembly level these need to be substituted by available instructions. Clearing opA or opB is mostly done by ANDS R0, R0, since R0 is a public memory address. Purging opR (respectively opW) requires executing a LOAD (respectively STORE) instruction reading (writing) a public value from (to) memory, but the side effects of both instructions require additional care. Sometimes multiple countermeasures can be combined in assembly.
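Why clearing matters can be seen in a toy Python model of transition leakage, where each load leaks the Hamming distance between the previously held value and the new one; the model and names are ours, chosen only to illustrate the effect.

```python
import random

def hd(x, y):
    """Hamming distance: number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")

def load_shares(clear_between, a0, a1, public=0):
    """Toy transition-leakage model: each load leaks the Hamming distance
    to the value previously on the bus; clearing inserts a public value."""
    leaks = []
    bus = public
    for v in ([a0, public, a1] if clear_between else [a0, a1]):
        leaks.append(hd(bus, v))
        bus = v
    return leaks

random.seed(0)
for _ in range(100):
    a = random.randrange(256)          # secret
    r = random.randrange(256)          # mask
    a0, a1 = a ^ r, r                  # first-order Boolean shares
    unsafe = load_shares(False, a0, a1)
    # Without clearing, one sample combines both shares and leaks hw(a):
    assert unsafe[1] == bin(a).count("1")
    safe = load_shares(True, a0, a1)
    # With a clear in between, every sample involves at most one share.
    assert len(safe) == 3
```

In the unprotected sequence a single probe on the transition recovers the Hamming weight of the secret, despite each share being individually uniform; the interposed clear reduces every sample to a function of one share.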
Moreover, we approach the problem of securing a composition against the leakage effects introduced in Section 2.1 by ensuring that all registers involved in the computation of a gadget are completely cleaned before composition with the next gadget. This easily guarantees the requirements of stateful t-SNI in Definition 7. We use fclear as an abstract placeholder for the macro run after each gadget to clear the state s_out. Additional clearings are needed between intermediate computations within the gadgets; these macros are denoted clear_i, where the index distinguishes between the different macros in the gadget, since each variety of leakage needs a different countermeasure.
Finally, randomness is employed to randomize part of the computation, especially in the case of non-linear gadgets, where otherwise a single probe could give the attacker knowledge of several shares of the inputs. We denote by rnd a value picked uniformly at random from F_2^32 prior to execution. To give an intuition of our strategy, we depict in Algorithm 4 and Algorithm 5 respectively an addition and a multiplication scheme at first order of security. Further examples of stateful t-SNI addition, multiplication, and refreshing schemes for different orders can be found in Section A of the Supplementary material. They have all been verified to be stateful t-SNI with our new tool. Some algorithms are clearly inspired by schemes already existing in the literature, such as the ISW multiplication [ISW03] and the schemes in [BBP + 16]. We analyze the S-Box of PRESENT and provide stateful t-NI secure algorithms for first and second order in Appendix C. Stateful t-SNI security can be achieved by refreshing the output with a secure stateful t-SNI refresh gadget. For simplicity, we divided the S-Box into three functions and designed stateful t-NI secure gadgets accordingly. Considering that all three gadgets have only one fan-in and fan-out, the composition is also stateful t-NI secure.
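For reference, the classic first-order ISW multiplication that inspires such schemes can be sketched in Python on 32-bit words; note this is the textbook scheme [ISW03] without the clearings that our hardened Algorithm 5 additionally performs.

```python
import random

def isw_mult_order1(a_shares, b_shares, rnd):
    """First-order ISW multiplication over GF(2), bitwise on 32-bit words:
    c0 = a0&b0 ^ r,  c1 = a1&b1 ^ ((r ^ a0&b1) ^ a1&b0).
    The parenthesization matters: r masks a0&b1 before a1&b0 is added."""
    a0, a1 = a_shares
    b0, b1 = b_shares
    r = rnd
    c0 = (a0 & b0) ^ r
    c1 = (a1 & b1) ^ ((r ^ (a0 & b1)) ^ (a1 & b0))
    return c0, c1

random.seed(1)
for _ in range(1000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    ra, rb, r = (random.getrandbits(32) for _ in range(3))
    c0, c1 = isw_mult_order1((a ^ ra, ra), (b ^ rb, rb), r)
    assert c0 ^ c1 == a & b   # correctness: output shares recombine to a AND b
```

On a real CM0+ target, the intermediate register and bus transitions of this computation are exactly what the clear_i macros must interpose on.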
The methodology just described, despite being easy to apply, can be expensive, as it requires an extensive use of clearings, especially for guaranteeing secure composition. However, a couple of strategies can be adopted in order to overcome this drawback and optimize the use of clearings. We describe such optimization strategies in the following.
More formally, let a, b, c be d-shared encodings (a_i)_{i∈[d]}, (b_i)_{i∈[d]}, (c_i)_{i∈[d]}, let F be a share-wise gadget as in Figure 4 (left), and let clear denote the leakage countermeasures between each share-wise computation, as explained in Section 4.1. In the following we consider a composition F(F(a, b), c) and present a technique to optimize the efficiency of both gadgets. Instead of performing first the inner function F(a, b) =: m and then the outer function F(m, c) =: o, we perform both computations share by share: for each index i, we compute m_i and then immediately o_i, before moving on to the next share. This method allows us to save on the number of clear, load, and store operations. In a normal execution, the output m of the first gadget needs to be stored in memory, only to be loaded again during the execution of the second gadget. With the optimized execution, we do not need these loads and stores, since the two gadgets are performed at the same time. Additionally, by considering the composition as a single gadget, we can save the clearings that would otherwise be needed after the first gadget to ensure stateful t-SNI. We provide a security proof for F̃(a, b, c) in Proposition 3 and a concrete application of Proposition 3 to Algorithm 4 in the Supplementary material.
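The interleaving can be sketched in Python; the function names are ours, and loads/stores are implicit in whether the intermediate encoding m ever exists as a whole.

```python
def sharewise(op, x, y):
    """Apply a share-wise operation index by index."""
    return [op(xi, yi) for xi, yi in zip(x, y)]

def composed(op, a, b, c):
    """Naive composition F(F(a, b), c): the full intermediate encoding m is
    materialized (d stores), then consumed by the second gadget (d loads)."""
    m = sharewise(op, a, b)
    return sharewise(op, m, c)

def interleaved(op, a, b, c):
    """Optimized F~(a, b, c): for each share index i, compute m_i and o_i
    back to back, so m_i never leaves the registers."""
    out = []
    for ai, bi, ci in zip(a, b, c):
        mi = op(ai, bi)
        out.append(op(mi, ci))
    return out

xor = lambda x, y: x ^ y
a, b, c = [3, 5], [6, 10], [12, 9]
assert composed(xor, a, b, c) == interleaved(xor, a, b, c)
```

Functionally the two orders of evaluation coincide; the gain lies in the removed loads, stores, and inter-gadget clearings, while the share-by-share schedule keeps every iteration dependent on a single share index, which is the crux of Proposition 3.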

Proposition 3. The optimized gadget F̃(a, b, c), as described above, is stateful t-NI.
Proof. We show that all observations in the gadget depend on at most one share of each input. Since the attacker can perform at most d − 1 observations, this implies that any combination of its observations is independent of at least one share of each input. More precisely, the computation of the i-th output of F̃(a, b, c) only depends on the i-th shares of a, b, and c. Hence the observations in each iteration only leak information about the i-th shares, since we clear the state after the computation of each output share. Therefore, any combination of t ≤ d − 1 observations depends on at most t shares of each input, and any set of t observations is simulatable with at most t shares of each input bundle.

Optimized Composition of Gadgets with Independent Inputs
The second scenario that we take into account is the one described in Figure 4 (right), where two non-linear gadgets, e.g. two multiplication algorithms, sharing one of the inputs are performed. We refer to this situation in the following as non-linear composition. In this case, it is possible to reduce the number of loads and clearings by re-using the common shares once they are loaded into the registers, and by replacing the intermediate clearings of one gadget with independent computations of the other gadget.
The optimization technique for saving clearings also holds for two gadgets with independent inputs. The intermediate clearings in a gadget ensure that two computations on two different shares of the same secret do not leak together. Since a clearing is just a computation independent of the secret, it can be replaced by a useful computation of another gadget.
With our tool, we have proven that the merge of stateful t-SNI multiplications, given in Appendix A of the Supplementary material, is also stateful t-SNI. Since we only need the more efficient special optimization for the PRESENT S-Box, we focus on two multiplications with a shared input. In total, we save 64% of the cycles at second order.

Case study: Masking the PRESENT S-Box
The impact of our methodology is estimated by masking a large circuit, the PRESENT block cipher, at first and second order, with the basic rules for composability (Section 3) and the introduced optimizations (Sections 4.2 and 4.3). The structure of the S-Box of PRESENT allows the adoption of the optimization techniques, both in the linear and in the non-linear composition. Based on [CFE16], the S-Box consists of two share-wise functions and one non-linear function. The non-linear part is depicted in Figure 5. A complete description of the S-Box is provided in the Supplementary material. Our masked implementation of the PRESENT S-Box, using the trivial solution for composability, is provided in the Supplementary material. Algorithm 12 in Appendix C depicts the masked S-Box, where the subroutines calcA (Algorithm 14), calcB (Algorithm 15), and calcG (Algorithm 16) are first-order NI gadgets. The optimized version instead employs our optimization techniques, given in the subroutines calcA_opt, calcB_opt, and calcG_opt (Algorithms 17, 18, and 19, respectively). Our focus is the optimization of the computational overhead arising from hardening masked implementations. The optimizations reduce the use of randomness in the case of probabilistic clearings and scrubs. Furthermore, the tool can verify that manual choices of randomness reuse in large implementations are secure [FPS17].
Our first- and second-order PRESENT S-Boxes require 7 and 26 words of entropy, respectively; the implementation is given in Algorithm 13. With the help of the tool, the requirements can be reduced to 3 and 18 words of entropy, respectively.
As a metric to measure the improvements of our optimization techniques, we take the number of basic operations used in the implementations, as shown in Table 1. For reference, the metric for an unprotected bitsliced PRESENT S-Box is shown as well. This comparison shows that both implementations use almost the same number of core operations (xor and and), since the two versions implement the same algorithm. More precisely, the non-optimized version requires two fewer xor operations, owing to the parallel calculation of all output values in calcG_opt, where b · d needs to be added to both a and d. On the other hand, since the non-optimized version needs to store and load more intermediate values inside the functions, while the optimized version only needs to store intermediate values between the functions, the optimized version employs fewer stores and loads, producing an improvement in terms of operation count. Additionally, the number of loads is reduced further in the optimized version by loading every input share once per output share. This holds with the exception, due to the limited number of registers, that a_1 and d_1 must be loaded twice for the second output share, while b_0 and b_1 need to be loaded only once in the whole gadget.
Table 2 depicts the efficiency of our approach as the ratio between the operations needed for the calculation and the overhead caused by clearings, in both the normally composed and the optimized versions of the PRESENT S-Box. The comparison shows that the optimizations strongly reduce the overhead from hardening.
In this regard, we underline that the aforementioned optimization is made possible by our new tool. It allows us to first prove the security of combinations of stateful gadgets, i.e., the optimized compositions discussed above, and then to verify their security in the larger context of the S-Box, which would otherwise be too laborious to prove with pen and paper.

Resilience in Practice
The question of whether proofs in fine-grained leakage models connect to resilience in practice has been left open so far. The connection between threshold probing security and resilience in practice is straightforward: the formal property that no combination of t (modeled) observations provides benefit to an attacker transposes directly to the physical setting, where no combination of t measurement samples should provide valuable information on secrets. A threshold probing proof is thus representative whenever the specified leakage model contains all information derivable from measurement samples, i.e., the model is sufficiently complete. Our work enables verification in leakage models with the mandated precision.
The assurance of representative proofs is important in that it provides a lower bound on the attack complexity, since at least t pieces of information have to be recovered, and this difficulty is exponential in t when sufficient noise is present [PR13]. Our systematic approach gets the most out of masking by achieving the optimal resilience, at security order t = d − 1, in practice, which is important for efficient implementations.
The question of evaluating the quality of a model is still unanswered for this new kind of specification, which expresses data dependency only. Leakage certification is an established approach to systematically validate the quality of leakage models, but it requires more detail than needed for probing security, since the constituent function of each measurement sample must be modeled [DSM17,DSV14]. Leakage detection is a good candidate due to its direct connection to probing security and the way models are shared across implementations.
Representative proofs of threshold probing security correspond to the hypothesis that the distribution of every combination of t leakage observations is independent of secrets. Leakage detection methods such as TVLA assess exactly this hypothesis by comparing the distribution of measurement samples taken during execution on a fixed secret with execution on random secrets [SM15]. Informally, TVLA evaluates whether the observable leakage of computation on secrets can be distinguished from leakage generated by random inputs, which should be indistinguishable for secure implementations. Other detection techniques such as Hotelling's T²-test (e.g. [BSS19]) can be used to increase accuracy, as long as the inherent probing threshold is obeyed. We prefer TVLA since positive results indicate the originating measurement sample(s), easing model adoption.
The quality of our model is evaluated by constructing multiple implementations in this shared model and applying physical leakage detection independently to each implementation. We stress that in this assessment strategy the model becomes stronger the more verified implementations are evaluated using leakage detection, which is a significant benefit of systematic hardening in general. Our model is (empirically) qualitative, since all implementations are leakage-free at their optimal order in physical leakage detection assessments with a minimum of one million traces.
The power consumption of two CM0+ MCUs ("FRDM-KL82Z", "STM32L073RZ") is measured with an oscilloscope sampling the current consumption via an inductive current probe at 2.5 GS/s, with a bandwidth of 500 MHz and 8-bit quantization. The MCUs are clocked at 4 MHz and every 125 samples are averaged, resulting in 5 samples per cycle. Each execution is averaged over four repeated executions to further reduce the noise, resulting in an assessment with very little noise. Sets of one million measurements each are compared in a random-vs-fixed Welch t-test with a confidence level of 0.0001. Significant leakage is detected when the t-statistics exceed the non-adapted threshold of 4.5.
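The per-sample statistic underlying this fixed-vs-random test is Welch's t; a minimal Python sketch on synthetic data (our own toy traces, not our measurements) illustrates the detection criterion.

```python
import math, random

def welch_t(x, y):
    """Welch's t-statistic between two sample sets (fixed vs. random)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # unbiased variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

THRESHOLD = 4.5  # per-sample detection threshold used in the evaluation

random.seed(42)
noise = lambda: random.gauss(0.0, 1.0)
# Leakage-free sample point: both populations draw from the same distribution.
fixed = [noise() for _ in range(10000)]
rand_ = [noise() for _ in range(10000)]
assert abs(welch_t(fixed, rand_)) < THRESHOLD
# Leaky sample point: the fixed-secret population shows a mean shift.
leaky = [0.2 + noise() for _ in range(10000)]
assert abs(welch_t(leaky, rand_)) > THRESHOLD
```

In a full TVLA campaign this statistic is computed independently at every sample point of the trace, and any excursion beyond ±4.5 flags the originating sample.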
Our first-order PRESENT S-Box is free of significant leakage on both MCUs, as seen in Figure 6. The need for device-specific models is justified by the fact that our practically resilient and formally secure code emits detectable leakage when executed on the distinct Arm Cortex-M4F (CM4F) architecture, namely the "STM32F407", as seen in Figure 7. A distinct fine-grained leakage model is needed for this processor, as there are clear signs of leakage at a low number of traces. The three-stage pipeline and three-address arithmetic logic unit of the CM4F are likely causing the distinct leakage behavior, amenable to future fine-grained leakage models. To show the applicability of our model at higher order, we evaluate our second-order PRESENT S-Box in second-order multivariate TVLA on the KL82Z by processing the measurements such that every pair of sample points is combined and evaluated; the results are shown in Appendix D, Figure 8. The combinatorial blow-up requires hundreds of CPU hours to evaluate the S-Box, compared to a few seconds when using scVerif.
The model sufficiently represents power side-channel leakage for univariate (first-order) and multivariate (higher-order) attacks up to one million traces, and as such, threshold probing security proofs in this particular model appear representative. In general, the combination of probing security and TVLA evaluation is beneficial, as strict verification of many implementations depends on a single shared specification of leakage behavior, while physical evaluation strengthens the shared specification by assessing it in different contexts. Re-using models in the form of shared libraries also reduces the risk of specification errors; we therefore provide our model as open source [BGG + 20]. Moreover, our approach allows verifying concrete implementations at higher orders of security with predictable resilience in practice, scaling beyond the computational bound of multivariate TVLA.

Conclusion
In this paper, we show how automated verification can deliver provably resilient and practically hardened masked implementations with low overhead.
Our DSL makes it possible to construct fine-grained models of side-channel behavior which can be adapted flexibly to specific contexts. For the first time, this approach allows verifying formal notions of side-channel resilience in user-provided models at higher orders. The combination of representative leakage models and formal verification makes it possible to rule out entire classes of practical side-channel attacks, backed by provable security statements.
New generic optimization strategies are introduced to reduce the overhead mandated by the additional countermeasures needed for security in fine-grained leakage models. The optimizations are applied to a masked PRESENT S-Box and validated to be leakage-free up to a high number of traces in physical leakage assessment, despite the high efficiency of the constructions. Moreover, the optimized and hardened constructions show that practical resilience and efficiency can go hand in hand, motivating further research.
Our tool scVerif serves as a front-end to MaskVerif, but the presented concept of modeling side-channel behavior explicitly is likely adaptable to the verification of other security notions, such as noisy or random probing security, given that sufficient information, such as signal-to-noise ratios or occurrence probabilities, is encoded in the model. This could allow bounding the success rate of attacks at order t > d, in combination with the powerful but bounded assurance from probing security for t ≤ d.

Supplementary material
A Basic algorithms

A.1 Addition gadgets

Algorithm 6 SECXOR: Addition scheme at 2nd order of security
Input: a = (a_0, a_1, a_2)

B Optimization with Proposition 3
In Algorithm 11, we give the concrete construction showing how Proposition 3 is applied to the standard xor given in Algorithm 7. We point out that we analyzed the worst-case scenario in Proposition 3; in Algorithm 11, a complete clear is not needed between the computation of each output share. Table 3 illustrates that no observation ever depends on two different shares of the same input, and t-NI security holds with the same arguments as in the proof.

C PRESENT Sbox
The PRESENT S-box S of the first-order implementation, based on [CFE16], is expressed in the following way: S(x) = A(G(G(B(x)))) with the affine functions A and B: A(x) =