Exploring Crypto-Physical Dark Matter and Learning with Physical Rounding

State-of-the-art re-keying schemes can be viewed as a tradeoff between efficient but heuristic solutions based on binary field multiplications, which are only secure if implemented with a sufficient amount of noise, and formal but more expensive solutions based on weak pseudorandom functions, which remain secure if the adversary accesses their output in full. Recent results on “crypto dark matter” (TCC 2018) suggest that low-complexity pseudorandom functions can be obtained by mixing linear functions over different small moduli. In this paper, we conjecture that by mixing some matrix multiplications in a prime field with a physical mapping similar to the leakage functions exploited in side-channel analysis, we can build efficient re-keying schemes based on “crypto-physical dark matter” that remain secure against an adversary who can access noise-free measurements. We provide first analyses of the security and implementation properties that such schemes provide. Precisely, we first show that they are more secure than the initial (heuristic) proposal by Medwed et al. (AFRICACRYPT 2010). For example, they can resist attacks put forward by Belaid et al. (ASIACRYPT 2014), satisfy some relevant cryptographic properties and can be connected to a “Learning with Physical Rounding” problem that shares some similarities with standard learning problems. We next show that they are significantly more efficient than the weak pseudorandom function proposed by Dziembowski et al. (CRYPTO 2016), by exhibiting hardware implementation results.


Introduction
State-of-the-art. Protecting block cipher implementations against side-channel attacks is a difficult problem. Countermeasures like masking [CJRR99, ISW03] are expensive in software [GR17] and hardware [GMK17]. They are also error-prone due to physical defects such as glitches [MPG05, NRS08] or transitions [CGP+12, BGG+14], and due to composability issues [CPRR13, BBD+16]. Informally, this situation is caused by the complex (nonlinear) nature of block ciphers: while the linear parts of an implementation can be trivially secret-shared with limited complexity overheads, the secure execution of their nonlinear parts typically implies overheads that are quadratic in the number of shares and requires refreshing algorithms that increase their randomness cost.
As a result of these limitations, the concept of fresh re-keying (illustrated in Figure 1) was introduced by Medwed et al. [MSGR10]. Its main underlying idea is to leverage a separation of duties between a "re-keying function" RK, which is easy to protect against side-channel attacks (e.g., easy to mask) and is only used to produce a fresh key $k^*$, and a cryptographically strong function (e.g., a block cipher or a tweakable block cipher) to 2 (possibly Toeplitz for efficiency) and a public vector $r \in \mathbb{F}_2^n$. Next, it computes the product $K \cdot r$ and it interprets the output of this product as a vector of 0/1 values over $\mathbb{F}_3$. Finally, the output of the wPRF is the sum of these values modulo 3. In other words, their wPRF can be defined as:

$$F_K(r) = \mathsf{map}(K \cdot r),$$

with $\mathsf{map} : \{0, 1\}^m \to \mathbb{F}_3$ that maps $y \in \{0, 1\}^m$ to $\sum_{i=1}^{m} y_i \bmod 3$. The similarity between this function and the one of Dziembowski et al. is striking: the matrix multiplication can be seen as multiple inner products and the mapping function plays the role of the rounding. Its limitations for masked implementations are therefore similar: the mapping function is nonlinear over $\mathbb{F}_2^n$, which requires special care and implies overheads.

Crypto-physical dark matter and learning with physical rounding. The main research question we tackle in this paper is whether we can build a secure re-keying scheme in the adversarial model of Figure 2(b), by leveraging some "crypto-physical dark matter". By this, we mean building a wPRF combining a matrix multiplication with a physical mapping that would not have to be computed explicitly (i.e., digitally) and would rather be performed in an analog manner by an implementation's leakages. Taking the example of the Hamming weight function, which is a frequently observed leakage model [MOP07] and will be our running example, we know from the results of Belaid et al. that multiplications in $\mathbb{F}_{2^\kappa}$ make this proposal insecure (without noise).
In the following, we put forward that multiplications in a prime field can lead to secure (and efficient) candidates.
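For comparison with what follows, the mod-2/mod-3 weak PRF recalled above can be sketched in a few lines. This is only an illustrative toy: the function name `dark_matter_wprf` and the small matrix `K` are assumptions made for the example, not parameters from the cited works.

```python
from typing import List

def dark_matter_wprf(K: List[List[int]], r: List[int]) -> int:
    """Toy mod-2/mod-3 weak PRF: map(K . r), with map(y) = sum(y_i) mod 3."""
    # Matrix-vector product over F_2: each coordinate is an inner product mod 2.
    y = [sum(k_ij * r_j for k_ij, r_j in zip(row, r)) % 2 for row in K]
    # Reinterpret the resulting bits as 0/1 integers and sum them modulo 3.
    return sum(y) % 3

# Hypothetical 3 x 3 key and public input, chosen only for illustration.
K = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
r = [1, 1, 0]
out = dark_matter_wprf(K, r)
```

The final reduction modulo 3 is the step that is nonlinear over the binary field, which is why it is the costly part to mask.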
Our contributions in this respect are threefold. First, we show that crypto-physical dark matter cannot be secure if its underlying multiplications take place in a small field or if it is based on too small vectors. Next, we propose instances based on medium-size prime fields $\mathbb{F}_p$ (e.g., with $p \approx 2^{32}$) that are well suited for software and hardware implementations, and we analyze some relevant security properties of the functions combining prime field multiplications and the Hamming weight mapping. Finally, we highlight the excellent implementation properties that a re-keying scheme based on such crypto-physical dark matter enables. Informally, these properties are due to the fact that, contrary to re-keying schemes in the model of Figure 2(c) where the mapping/rounding has to be computed securely (e.g., thanks to masking), the (physical) mapping/rounding we introduce never has to be computed securely in the model of Figure 2(b), since it is performed by a leakage function. As a result, masking with a small key and complexity overheads that are linear in the number of shares can theoretically be obtained with practically relevant leakage functions, which also leads to a set of interesting open problems discussed in the conclusions.
We note that, as usual when introducing a new cryptographic primitive, our focus in this work is to exhibit relevant security and implementation properties which may open new research directions. In this respect, our claim is that the proposed re-keying scheme is at the same time more secure than the one of Medwed et al. [MSGR10] under reasonable (e.g., Hamming weight) leakage models and more efficient than the one of Dziembowski et al. [DFH+16] thanks to a significantly shorter key. We hope these results can be used as a seed to trigger more cryptanalytic investigations and physical security analyses.
We additionally note that we will use the term crypto-physical dark matter for the re-keying operations and the term Learning With Physical Rounding (LWPR) for the problem of recovering the long-term key of the resulting re-keying scheme.¹

Related works. In addition to the previously listed schemes that leverage the masking countermeasure, it was also proposed to use a leakage-resilient PRF for re-keying, as investigated in [MSJ12, BSH+14, MSNF16, USS+20]. Such a solution has been recently integrated in the ISAP Authenticated Encryption scheme [DEM+17]. It does not rely on key-homomorphism and rather aims at limiting the manipulation of the long-term key in order to limit the attack vectors to Simple Power Analysis (SPA) attacks.

Background & definitions

2.1 Notations
We denote vectors with bold letters $\mathbf{v}$ and matrices with bold capital letters $\mathbf{M}$. We use the log notation for the logarithm in base 2. For $n \in \mathbb{N}$, we denote by $[n]$ the set of integers from 1 to $n$ and by $[0, n]$ the set of integers from 0 to $n$.

Physical model & LWPR
In order to define the LWPR problem, we need to define the physical model we are working with. In this respect, the main issue is that we must specify crypto-physical dark matter computations that mix mathematical operations and physical ones. For this purpose, we first observe that the elements of the vector $y = K \cdot r$ are in $\mathbb{F}_p$. We then formalize as "physical rounding" the function modeling the side-channel information an adversary gets from noise-free leakages on this vector. The physical model we will consider for the rounding is a composition of two (more or less specialized) assumptions.
On the one hand, we assume that the leaking device computes on binary-represented data: each value in $\mathbb{F}_p$ is therefore represented with (at least) $\lceil \log p \rceil$ bits. We denote as $g : \mathbb{F}_p \to \{0, 1\}^{\lceil \log p \rceil}$ the function associating to each element of $\mathbb{F}_p$ the binary representation of its representative in $[0, p - 1]$. We also define $g^m : \mathbb{F}_p^m \to \{0, 1\}^{m \lceil \log p \rceil}$ as its component-wise extension, $g^m(y) = (g(y_1), \ldots, g(y_m))$. We argue that this assumption is quite generic and captures the reality of most embedded computing devices deployed in current applications.
On the other hand, we need a more specialized assumption defining how the physical (noise-free) leakages depend on the $m \lceil \log p \rceil$ bits provided by $g^m$. This role will be played by the leakage function. We denote the leakage function computed on the binary representation of the manipulated data as $L_g(\cdot)$ and use it as a parameter of our investigations.
We can then define a generic LWPR problem as follows.
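Definition 1 (Learning with physical rounding). Let $p, n, m \in \mathbb{N}^*$, $p$ prime. For an (unknown) $K \in \mathbb{F}_p^{m \times (n+1)}$, the $\mathrm{LWPR}^{n,m}_{L_g,p}$ sample distribution is given by:

$$D_{\mathrm{LWPR}^{n,m}_{L_g,p}} := \big(r, L_g(K_r)\big) \quad \text{for } r \in \mathbb{F}_p^n \text{ uniformly random},$$

where $K_r = K \cdot (r, 1)$ and $L_g : \mathbb{F}_p^m \to \mathbb{R}^d$ is the physical rounding function. Given query access to $D_{\mathrm{LWPR}^{n,m}_{L_g,p}}$ for a uniformly random $K$, the $\mathrm{LWPR}^{n,m}_{L_g,p}$ problem is $(q, \tau, \mu, \epsilon)$-hard to solve if, after the observation of $q$ LWPR samples, no adversary can recover the key $K$ with time complexity $\tau$, memory complexity $\mu$ and probability higher than $\epsilon$.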
Note that $K$ is multiplied with $(r, 1)$ rather than $r$. The additional $m \lceil \log p \rceil$-bit key addition is needed to obtain strong differential properties, as discussed in Section 4.2.1.
As already mentioned, as a starting point and as an interesting feasibility result, we will next consider the security that can be obtained with the Hamming weight function, which is the most frequently observed leakage model for standard CMOS devices [MOP07]. For this purpose, we first denote the Hamming weight function $\mathrm{HW}(v)$, defined on any vector $v$ of length $t \in \mathbb{N}^*$ with coefficients in $\{0, 1\}$ as $\mathrm{HW}(v) = \sum_{i=1}^{t} v_i$, where the sum is performed in $\mathbb{Z}$. We then consider two possible implementations of it:
• Parallel: $L_g^p(y) : y \mapsto \mathrm{HW}(g^m(y)) = \mathrm{HW}(g(y_1)) + \mathrm{HW}(g(y_2)) + \ldots + \mathrm{HW}(g(y_m))$.
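To make the physical model concrete, the following minimal sketch simulates noise-free LWPR samples under the parallel Hamming weight leakage. It assumes that $g$ uses exactly $\lceil \log p \rceil$ bits; the helper names (`g`, `hw`, `leak_parallel`, `lwpr_sample`) and the toy key are illustrative assumptions, and a real device would of course produce the leakage physically rather than compute it.

```python
import random
from typing import List, Tuple

def g(y: int, p: int) -> List[int]:
    """Bits (least significant first) of the representative of y in [0, p-1]."""
    nbits = max(1, (p - 1).bit_length())   # equals ceil(log2(p)) for a prime p
    return [((y % p) >> i) & 1 for i in range(nbits)]

def hw(bits: List[int]) -> int:
    """Hamming weight, with the sum performed over the integers."""
    return sum(bits)

def leak_parallel(y: List[int], p: int) -> int:
    """Parallel leakage: HW(g(y_1)) + ... + HW(g(y_m))."""
    return sum(hw(g(y_i, p)) for y_i in y)

def lwpr_sample(K: List[List[int]], p: int, rng: random.Random) -> Tuple[List[int], int]:
    """One sample (r, L(K . (r, 1))) for a key K with m rows and n + 1 columns."""
    n = len(K[0]) - 1
    r = [rng.randrange(p) for _ in range(n)]
    y = [sum(k * x for k, x in zip(row, r + [1])) % p for row in K]
    return r, leak_parallel(y, p)

# Toy parameters (assumptions): p = 7, m = 2 rows, n = 2, last column = key addition.
p = 7
K = [[3, 1, 4], [2, 6, 5]]
r, u = lwpr_sample(K, p, random.Random(1))
```

The adversary of Definition 1 only sees pairs such as `(r, u)`; the intermediate vector `y` never leaves the function.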

A general adversarial model
While trying to break the LWPR assumption is one natural path to attack our re-keying scheme, it is not the only option. Side-channel security also relies on the fact that the re-keying function itself is well protected. Since crypto-physical dark matter computations are key-homomorphic, masking is a natural candidate for this purpose. It leads to the general adversarial model of Figure 3, where the first (red) attack path targets the leakage of the re-combined ephemeral key $L(k^*)$ (i.e., the LWPR assumption) while the second (blue) attack path targets the noisy leakages of the shared computations $K_1 r, K_2 r, \ldots, K_d r$ together with the leakage generated by the shares' recombination.
We next formalize this adversarial model, starting with a definition of our re-keying scheme and following with the key recovery experiment it aims to keep hard.
Definition 2 (Re-keying scheme). Let $n \in \mathbb{N}$ be a security parameter, and $d \in \mathbb{N}$ a number of shares. A re-keying scheme RK is made of the polytime algorithms:
• Gen($1^n$, $d$). Generates the long-term key $K$ and the initial sharing $K_1, K_2, \ldots, K_d$.
Definition 3 (Side-channel key recovery experiment $\mathsf{Exp}^{\mathsf{skr}}_{\mathcal{A},\mathsf{RK}}(n, d)$). The experiment proceeds in three (setup, challenge and final) phases specified as:
• Setup phase. A long-term key $K$ is generated as a function of the security parameter $n$ and it is split into shares $K_1, K_2, \ldots, K_d$ using the Gen($1^n$, $d$) algorithm.
• Challenge phase. The adversary $\mathcal{A}$ performs $q$ re-keying queries. For each query, the vector $r$ is chosen at random, the SharedMult algorithm is computed on the shares of the long-term key and $r$, and the shares of the long-term key are refreshed. The vector $r$ is then given to $\mathcal{A}$ with the following leakages:
- $L(K_i r) + N_i$ for each share $i \in [d]$ and some noise $N_i$ - the shares' leakages.
- $L(\mathrm{Rec}) + N_{rec}$, for some noise $N_{rec}$ - the leakage from the shares' recombination.
- $L(k^*)$, the noise-free leakage from the ephemeral key $k^*$.
• Final phase. The adversary $\mathcal{A}$ outputs a candidate $K'$ for the long-term key. The output of the experiment is defined to be 1 if $K' = K$ and 0 otherwise.
We say that $\mathcal{A}$ succeeds, or breaks the re-keying scheme RK, with probability $\epsilon$, time complexity $\tau$, memory complexity $\mu$ and $q$ queries if after $q$ queries in a challenge phase bounded by these time and memory complexities, $\mathsf{Exp}^{\mathsf{skr}}_{\mathcal{A},\mathsf{RK}}(n, d) = 1$ with probability $\epsilon$.
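The key-homomorphism that makes masking natural here can be illustrated with a small sketch of the algorithms named in the experiment. It assumes an additive sharing of $K$ over $\mathbb{F}_p$ and a refresh that adds a fresh sharing of zero; the signatures below (including the extra parameters `m` and `p` of `gen`) are assumptions made for the example rather than the scheme's specification, and no leakage or noise is modeled.

```python
import random

def mat_vec(M, v, p):
    """Matrix-vector product over F_p."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def gen(n, m, d, p, rng):
    """Long-term key K in F_p^{m x (n+1)} and an additive d-sharing of it."""
    K = [[rng.randrange(p) for _ in range(n + 1)] for _ in range(m)]
    shares = [[[rng.randrange(p) for _ in range(n + 1)] for _ in range(m)]
              for _ in range(d - 1)]
    last = [[(K[i][j] - sum(s[i][j] for s in shares)) % p for j in range(n + 1)]
            for i in range(m)]
    return K, shares + [last]

def shared_mult(shares, r, p):
    """Each share is multiplied by (r, 1) independently (linear in the shares)."""
    r1 = list(r) + [1]
    return [mat_vec(S, r1, p) for S in shares]

def recombine(out_shares, p):
    """Sum the output shares coordinate-wise to obtain the ephemeral key k*."""
    return [sum(col) % p for col in zip(*out_shares)]

def refresh(shares, p, rng):
    """Re-randomize the sharing by adding a fresh additive sharing of zero."""
    d = len(shares)
    new = [[row[:] for row in S] for S in shares]
    for i in range(len(shares[0])):
        for j in range(len(shares[0][0])):
            masks = [rng.randrange(p) for _ in range(d - 1)]
            for s, mask in enumerate(masks):
                new[s][i][j] = (new[s][i][j] + mask) % p
            new[d - 1][i][j] = (new[d - 1][i][j] - sum(masks)) % p
    return new

rng = random.Random(0)
p = 251                                   # toy prime, assumption for the example
K, shares = gen(n=4, m=2, d=3, p=p, rng=rng)
r = [rng.randrange(p) for _ in range(4)]
k_star = recombine(shared_mult(shares, r, p), p)
```

Because every step before recombination is linear in the shares, the cost grows linearly with the number of shares `d`.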
Remark 2. While the second attack path of Figure 3 (targeting the masked computations) is well investigated in the literature, the first attack path (targeting the LWPR samples) is new. We therefore start by studying this first attack path in Sections 3 and 4. For completeness, we also discuss the second attack path and how to choose the number of shares to reach a given security level in Section 5.3. Whether both attack paths can be combined in advanced attacks is an interesting scope for further research.
Remark 3. While the LWPR problem is stated for noise-free leakages (which is important since the ephemeral key is unshared), the secure implementation of masking generally requires a certain level of noise. Yet, contrary to the masking of nonlinear operations for which this necessary level of noise may increase with the number of shares [BCPZ16], the masking of a key-homomorphic primitive only requires a constant noise rate. Other advantages of the re-keying approach for masking are recalled in Section 5.3.
Remark 4. The adversarial model of Figure 3 does not explicitly show that the ephemeral key $k^*$ is used in a (tweakable) block cipher in Figure 1. Concretely, the leakage $L(k^*)$ therefore has to be understood as all the leakage that can be obtained on $k^*$. The starting assumption we study in this paper is the one of an adversary who does not obtain significantly more information than the Hamming weight of $k^*$. Analyzing more general classes of leakages is an important open problem, as will be discussed in Section 6.

Functions over $\mathbb{F}_p$ and vectorial Boolean functions

2.4.1 Cryptographic criteria & p-ary functions
We adapt tools from the analysis of cryptographic criteria for Boolean functions (from $\mathbb{F}_2^n$ to $\mathbb{F}_2$ [CCH10a]) to study the cryptographic properties of functions from $\mathbb{F}_p^n$ to $\mathbb{F}_p$.
Definition 4 (p-ary function). For $p$ a prime, a p-ary function $f$ in $n$ variables (an $n$-variable p-ary function) is a function from $\mathbb{F}_p^n$ to $\mathbb{F}_p$. The set of all p-ary functions in $n$ variables is denoted by $\mathcal{F}_{p,n}$, and $|\mathcal{F}_{p,n}| = p^{p^n}$. 2-ary functions are called Boolean.
Definition 5 (Algebraic normal form and algebraic degree (e.g., [Hou18])). We call Algebraic Normal Form (ANF) of a p-ary function $f$ its $n$-variable polynomial representation over $\mathbb{F}_p$ (i.e., belonging to $\mathbb{F}_p[x_1, \ldots, x_n]/(x_1^p - x_1, \ldots, x_n^p - x_n)$):

$$f(x) = \sum_{S \in [0, p-1]^n} a_S \prod_{i=1}^{n} x_i^{S_i},$$

where $a_S \in \mathbb{F}_p$. The ANF of $f$ is unique, and the algebraic degree of $f$ equals the global degree of its ANF:

$$\deg(f) = \max_{S \,:\, a_S \neq 0} \sum_{i=1}^{n} S_i.$$

Definition 6 (Nonlinearity). For $d \in \mathbb{N}^*$, the order-$d$ nonlinearity $nl_d(f)$ of a p-ary function $f \in \mathcal{F}_{p,n}$ is the minimum Hamming distance between $f$ and all the functions in $\mathcal{F}_{p,n}$ of degree at most $d$:

$$nl_d(f) = \min_{f^* \in \mathcal{F}_{p,n},\ \deg(f^*) \le d} d_H(f, f^*),$$

where $d_H(f, f^*)$ is the Hamming distance $|\{x \in \mathbb{F}_p^n \mid f(x) \neq f^*(x)\}|$ between $f$ and $f^*$.
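As a small illustration of Definition 6, the order-$d$ nonlinearity of a one-variable p-ary function can be computed by exhaustive search for tiny $p$. The sketch below is restricted to $n = 1$ and enumerates all $p^{d+1}$ polynomials of degree at most $d$, so it is only meant for toy sizes; the function names and the choice of the Hamming weight table as test function are assumptions for the example.

```python
from itertools import product
from typing import List, Sequence

def evaluate(coeffs: Sequence[int], x: int, p: int) -> int:
    """Horner evaluation of sum_j coeffs[j] * x^j modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def nonlinearity(table: List[int], d: int, p: int) -> int:
    """Minimum Hamming distance between table and all polynomials of degree <= d."""
    best = p
    for coeffs in product(range(p), repeat=d + 1):
        dist = sum(1 for x in range(p) if evaluate(coeffs, x, p) != table[x])
        best = min(best, dist)
    return best

# Test function: y -> HW(binary representation of y) mod p, for the toy prime p = 5.
p = 5
hw_table = [bin(y).count("1") % p for y in range(p)]
nl_1 = nonlinearity(hw_table, 1, p)
nl_3 = nonlinearity(hw_table, 3, p)
```

Since any one-variable function has degree at most $p - 1$, the search returns 0 as soon as $d$ reaches the degree of the function.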

Cryptographic criteria & vectorial Boolean functions
We give definitions on vectorial Boolean functions that will be used in the paper [CCH10b], with criteria borrowed from the analysis of block ciphers that we will evaluate.
Definition 7 (Vectorial Boolean function). A function from $\mathbb{F}_2^s$ to $\mathbb{F}_2^t$ is called a vectorial Boolean function. For $F$ an $(s, t)$-vectorial Boolean function, the $t$ Boolean functions

Definition 9 (MELP (e.g., [Vau99])). Let a family of vectorial Boolean functions $(F_K)_K$ from $\mathbb{F}_2^s$ to $\mathbb{F}_2^t$ be parameterised by a key $K$. Its Maximum Expected Linear Probability (MELP) is defined as:

Definition 10 (MEDP (e.g., [Vau99])). Let a family of vectorial Boolean functions $(F_K)_K$ from $\mathbb{F}_2^s$ to $\mathbb{F}_2^t$ be parameterised by a key $K$. Its Maximum Expected Differential Probability (MEDP) is defined as: b}| is the number of solutions of the differential equation defined by the mask $(a, b)$, with $\oplus$ the bitwise XOR.

Definition 11 (ε-AXU [CW79]). A family of keyed functions
In particular, for $(F_K)_K$ an ε-AXU family of keyed functions from $\mathbb{F}_2^s$ into $\mathbb{F}_2^t$, we have:

Negative results: Small p or small n are not enough
The operations involved while computing an LWPR sample can be separated in two parts: on the one hand a matrix-vector multiplication over $\mathbb{F}_p$, on the other hand the physical rounding function. The first part is a linear operation over $\mathbb{F}_p$. Therefore, if the second part has a cryptographic weakness in characteristic $p$, it could be used to break the LWPR problem. As a warm-up, we show the LWPR problem cannot be hard with small $p$ or $n$ values. For this purpose, we first show that the $\mathrm{LWPR}^{n,m}_{L_g^p,3}$ problem is no harder than solving a quadratic system of equations in characteristic 3. We then give a minimum (necessary) condition on the vector size $n$ for the LWPR problem to be hard.

F 3 is not secure enough
We first focus on the function $L_g^p(y)$ when $y$ is a single element of $\mathbb{F}_3$. In this case, the adversary observes the real value $L_g^p(y)$, which belongs to $\mathbb{Z}$ and can be embedded in $\mathbb{F}_3$. That is, she observes the outputs of the 3-ary (ternary) function $f(y) = \mathrm{HW}(g(y)) \bmod 3$ associating to each $y \in \mathbb{F}_3$ the Hamming weight of its binary representation.
Next, coming back to the general case, we focus on the function $L_g^p(y)$ where $y \in \mathbb{F}_3^m$. When the adversary embeds $L_g^p(y)$ in $\mathbb{F}_3$, it corresponds to the ternary function $f'(y) = \mathrm{HW}(g^m(y)) \bmod 3$. In other words, $f'$ is simply the direct sum of $m$ copies of the previous function $f$, and its algebraic degree is equal to that of $f$. In particular, since $f(0) = 0$ and $f(1) = f(2) = 1$, we have $f(y) = y^2$ over $\mathbb{F}_3$, so that:

$$f'(y) = \sum_{i=1}^{m} y_i^2,$$

and $f'$ has algebraic degree 2 for $p = 3$. Finally, since each $y_i$ is the result of a product between a row of $K$ and $(r, 1)$ over $\mathbb{F}_3$, the adversary can extract from each $\mathrm{LWPR}^{n,m}_{L_g^p,3}$ sample a quadratic relation over $\mathbb{F}_3$ in the elements $k_{i,j}$ of $K$, namely:

$$L_g^p(K_r) \bmod 3 = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} k_{i,j} r_j + k_{i,n+1} \Big)^2.$$

By collecting LWPR samples the adversary can linearize this quadratic system in at most $m((n + 1)(n + 2))/2$ unknowns, and determine a valid key for these samples by solving it. Hence, considering that solving a linear system through standard Gaussian elimination has at most a cubic complexity, we conclude that $\mathrm{LWPR}^{n,m}_{L_g^p,3}$ is not $(O(mn^2), O(m^3 n^6), O(m^3 n^6), 1)$-hard, making such an instance hardly useful for practical applications.² This attack can be extended to all primes $p$. But since its complexity increases exponentially with $p$, it will not be a security issue for $p$ big enough:

Proposition 1. Let $n, m, p \in \mathbb{N}$, $p$ a prime. Solving the $\mathrm{LWPR}^{n,m}_{L_g^p,p}$ problem can be reduced to solving an algebraic system of degree $p - 1$ in characteristic $p$.
Proof. We begin by considering a function $f \in \mathcal{F}_{p,1}$, the p-ary function defined for $y \in \mathbb{F}_p$ as $f(y) = \mathrm{HW}(g(y)) \bmod p$. Note that independently of the exact expression of the function $g$, the HW function gives an element in $\mathbb{Z}$, and considering its remainder modulo $p$ always gives a function from $\mathbb{F}_p$ to $\mathbb{F}_p$. Since all functions from $\mathbb{F}_p$ to $\mathbb{F}_p$ are the functions of $\mathcal{F}_{p,1}$, the degree of $f$ is at most $p - 1$ (see Definition 5). Then, we focus on the p-ary function $f'(y) = \mathrm{HW}(g^m(y)) \bmod p$ for $y \in \mathbb{F}_p^m$. Note that:

$$f'(y) = \sum_{i=1}^{m} \mathrm{HW}(g(y_i)) \bmod p = \sum_{i=1}^{m} f(y_i).$$

Hence $\deg(f') = \deg(f) \le p - 1$. Since each $y_i$ is the result of the linear combination over $\mathbb{F}_p$ of the $i$-th row of $K$ and the public vector $(r, 1)$, each $\mathrm{LWPR}^{n,m}_{L_g^p,p}$ sample leads to an equation of degree at most $p - 1$ in the key elements, in characteristic $p$. More precisely:

$$L_g^p(K_r) \bmod p = \sum_{i=1}^{m} f\Big( \sum_{j=1}^{n} k_{i,j} r_j + k_{i,n+1} \Big).$$

Therefore, an adversary solving the algebraic system given by the different $r$ values recovers $K$ and breaks the $\mathrm{LWPR}^{n,m}_{L_g^p,p}$ problem.

² The system may give various solutions and not only the correct $K$. We ignore this issue since our goal in this section is only to show that small $p$'s cannot guarantee higher security.
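The degree of the one-variable function $f(y) = \mathrm{HW}(g(y)) \bmod p$ used in the proof can be checked numerically for small primes. The sketch below recovers the unique polynomial of degree at most $p - 1$ representing a function on $\mathbb{F}_p$ through the identity $f(x) = \sum_a f(a)\,(1 - (x - a)^{p-1})$; the helper names are assumptions for the example, and the approach is only practical for toy values of $p$.

```python
from typing import List

def hw_table(p: int) -> List[int]:
    """Table of f(y) = HW(binary representation of y) mod p, for y in [0, p-1]."""
    return [bin(y).count("1") % p for y in range(p)]

def anf_coeffs(table: List[int], p: int) -> List[int]:
    """Coefficients c_0..c_{p-1} of the unique degree <= p-1 polynomial matching table.

    Expanding f(x) = sum_a f(a) * (1 - (x - a)^(p-1)) over F_p gives
    c_0 = f(0) and c_j = -sum_a f(a) * a^(p-1-j) for 1 <= j <= p-1 (with 0^0 = 1).
    """
    coeffs = [table[0] % p]
    for j in range(1, p):
        s = sum(table[a] * pow(a, p - 1 - j, p) for a in range(p))
        coeffs.append((-s) % p)
    return coeffs

def degree(coeffs: List[int]) -> int:
    """Largest index with a nonzero coefficient (0 for a constant function)."""
    return max((j for j, c in enumerate(coeffs) if c), default=0)

# For p = 3 the function is y^2, as used in the quadratic attack above.
coeffs_3 = anf_coeffs(hw_table(3), 3)
degrees = {q: degree(anf_coeffs(hw_table(q), q)) for q in (3, 5, 7, 11, 13)}
```

The dictionary `degrees` gives a quick feel for how the degree of the algebraic system grows with $p$, always staying at or below $p - 1$.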

Small vectors are not secure enough
Contrary to the previous result, which holds for any $m$ (and shows that crypto-physical dark matter cannot be secure with small $p$'s even if implemented in parallel), we now consider an attack against $\mathrm{LWPR}^{n,1}_{L_g,p}$ taking advantage of a small value of $n$, which only imposes a condition for serial implementations. Roughly, when $m = 1$, each sample gives the Hamming weight of a known linear combination of the key elements. We show next how this information over different samples can lead to an attack, provided $n$ is small.
First, note that each sample of $\mathrm{LWPR}^{n,1}_{L_g,p}$ has the form $(r, u)$ with $u = \mathrm{HW}(g(\sum_{i=1}^{n} k_i r_i + k_{n+1}))$ since the key is reduced to a vector when $m = 1$. The value $u$ belongs to $\mathbb{Z}$, and considering the binary decomposition $g$ over $\ell$ bits, $0 \le u \le \ell$, the function $\mathrm{HW}(g(\cdot))$ is surjective over $[0, \ell]$ and $|\mathrm{HW}(g(\cdot))^{-1}(u)|$ takes different values for $u \in [0, \ell]$.
Let us denote as $A_u$ the set of preimages of $u$ through $\mathrm{HW}(g(\cdot))$. Then, each sample gives the information $\sum_{i=1}^{n} k_i r_i + k_{n+1} \in A_u$, that is, $|A_u|$ candidate linear equations in the key, only one of which is the correct one. After collecting $n + 1$ samples, and the corresponding $u_1, \ldots, u_{n+1}$ values, one of the $\prod_{i=1}^{n+1} |A_{u_i}|$ linear systems in $n + 1$ unknowns over $\mathbb{F}_p$ characterizes the key. We next show that a few extra samples are sufficient in order to verify if a candidate key is the right key. Consequently the $\mathrm{LWPR}^{n,1}_{L_g,p}$ problem can be solved by solving a certain amount of linear systems. We additionally highlight how this amount evolves with $n$, and how to reduce it.
The inner product between $(r, 1)$ and $k$ is uniformly distributed in $\mathbb{F}_p$ (multiplication by any nonzero element modulo $p$ is a bijection of $\mathbb{F}_p$). Therefore the probability for a wrong key to give the same $u$ is $|A_u|/p$. Let us denote $M = \max_{u \in [0, \ell]} |A_u|$. Then, the probability of having a wrong key consistent with $t$ samples is lower than or equal to $(M/p)^t$. Considering $M/p \le 1/2$ with $\lambda$ samples (where $\lambda$ is the bit-security parameter) already ensures that a wrong key is consistent with the samples only with a negligible probability. More precisely, for $2^{\ell-1} < p < 2^{\ell}$ where $\ell \in \mathbb{N}^*$ and $g$ corresponding to the usual binary representation over $\ell$ bits, we get $M/p \le \binom{\ell}{\lfloor \ell/2 \rfloor} / 2^{\ell-1}$, which allows us to determine the number of extra samples to consider for rejecting the wrong keys.
The amount of linear systems over $\mathbb{F}_p$ to solve can be as high as $M^{n+1}$, which is at most $\binom{\ell}{\lfloor \ell/2 \rfloor}^{n+1}$ for the definition of $g$ we consider. For each linear system, the attack consists in solving the system given by the first $n + 1$ samples, which can be done in time $O(n^3)$ with standard Gaussian elimination, and in testing if the obtained key is consistent with $t \le \lambda$ extra samples, which costs $t$ inner products and evaluations of $g$ and HW. Assuming the cost of the $t$ evaluations is smaller than $O(n^3)$ (the cost of solving the linear system), the attack cost in time is $O(M^{n+1} n^3)$ and it requires at most $n + 1 + \lambda$ samples. Taking $p$ as an upper bound on $M$ gives attacks when $(n + 1) \log p + 3 \log n < \lambda$. For mid-size $p$ values that are interesting for implementation purposes (e.g., $p \approx 2^8, 2^{16}, 2^{32}$), this inequality therefore sets a condition for the minimum vector size $n$. If not respected, we can conclude that the resulting $\mathrm{LWPR}^{n,1}_{L_g,p}$ instance cannot reach $\lambda$-bit security. This attack exists for all values of $n$, but its complexity increases exponentially with $n$. We further show in Appendix A that even when generalizing the attack, the complexity still increases exponentially with $n$, and also with $m$ for $\mathrm{LWPR}^{n,m}_{L_g^p,p}$.
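The attack described in this subsection can be sketched for toy parameters as follows. The sketch assumes $m = 1$, the usual binary representation for $g$, and that the first $n + 1$ samples yield an invertible system; the helper names, the toy prime $p = 13$ and the hypothetical key are assumptions for the example, and the verification samples are simply taken over all $r \in \mathbb{F}_p^2$ to keep the run deterministic.

```python
from itertools import product

def hw_g(y: int) -> int:
    """Hamming weight of the usual binary representation of y."""
    return bin(y).count("1")

def solve_mod_p(A, b, p):
    """Solve A x = b over F_p by Gauss-Jordan elimination; None if A is singular."""
    size = len(A)
    M = [[v % p for v in row] + [rhs % p] for row, rhs in zip(A, b)]
    for col in range(size):
        pivot = next((i for i in range(col, size) if M[i][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)          # modular inverse (Python 3.8+)
        M[col] = [(v * inv) % p for v in M[col]]
        for i in range(size):
            if i != col and M[i][col]:
                factor = M[i][col]
                M[i] = [(vi - factor * vc) % p for vi, vc in zip(M[i], M[col])]
    return [M[i][size] for i in range(size)]

def attack(samples, n, p):
    """Enumerate the preimage choices for the first n+1 samples, filter with the rest."""
    preimages = {}
    for y in range(p):
        preimages.setdefault(hw_g(y), []).append(y)   # the sets A_u
    head, tail = samples[:n + 1], samples[n + 1:]
    A = [list(r) + [1] for r, _ in head]
    candidates = []
    for rhs in product(*(preimages[u] for _, u in head)):
        k = solve_mod_p(A, list(rhs), p)
        if k is None:                                  # same matrix for every system
            return []
        if all(hw_g(sum(ki * ri for ki, ri in zip(k, list(r) + [1])) % p) == u
               for r, u in tail):
            candidates.append(k)
    return candidates

# Toy instance (assumptions): p = 13, n = 2, hypothetical key (k_1, k_2, k_3).
p, n = 13, 2
key = [5, 11, 7]
inputs = [(0, 0), (1, 0), (0, 1)] + [(a, b) for a in range(p) for b in range(p)]
samples = [(r, hw_g(sum(k * x for k, x in zip(key, list(r) + [1])) % p)) for r in inputs]
candidates = attack(samples, n, p)
```

The number of linear systems solved is the product of the preimage set sizes for the first $n + 1$ samples, which is what makes the attack exponential in $n$.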

First analysis and proposed instance
We now move to the analysis of crypto-physical dark matter instances that can lead to hard LWPR problems. As a first step in this direction, we put forward desirable security properties that can be used in order to rule out a number of standard attacks. As already mentioned in the introduction, this analysis is admittedly not exhaustive: it is only proposed to support our claim that re-keying in the noise-free model of Figure 2(b) can provide stronger security guarantees than the initial proposal of Medwed et al. [MSGR10], which is only secure in the noisy model of Figure 2(a). We proceed in two steps for this purpose. First, we extend the negative results of the previous section, which analyze our construction in $\mathbb{F}_p$. We show that with sufficiently large $p$ and $n$ values, we can lower bound the algebraic degree and the nonlinearity of the function generating LWPR samples. Next, we complement this analysis with an evaluation in $\mathbb{F}_2$. In this case, we consider the cryptographic properties of $K_r$ when interpreted as a function over binary fields, and focus on its differential/linear properties and its algebraic degree.

Analysis in characteristic p
As mentioned in Section 3, the product $K_r$ is linear over $\mathbb{F}_p$. Hence, our focus is on the (linear invariant) cryptographic criteria of the remaining function $L_g$ considered over $\mathbb{F}_p$, denoted as $f'$.³ In Section 3.1, we saw an upper bound on the degree of such a function. We now prove a lower bound on this degree. It allows us to thwart attacks based on solving a linearized algebraic system. Thanks to this bound on the degree, we also derive a bound on the nonlinearity of small order of the function $f'$. It enables us to prevent attacks based on solving low-degree noisy algebraic systems, relying on a good approximation of $f'$ by a low degree function. We first introduce the iterated Hamming weight function, similarly to the iterated logarithm, that will be used to prove the lower bound on the degree.
Definition 12 (Iterated Hamming weight function). Let $n \in \mathbb{N}$, we define the iterated Hamming weight $\mathrm{it}_H$ as:

This definition allows us to prove the following two results:

Proposition 2 (Degree lower bound). Let $m, p \in \mathbb{N}^*$, $p$ odd prime, and $f'$ the p-ary function defined as $L_g^p(y) \bmod p$, then $\deg(f') \ge (p - 1)^{1/\mathrm{it}_H(p-1)}$.
Proof. Let us consider $f$ the p-ary function in one variable which associates to each element $y \in \mathbb{F}_p$ the element in $\mathbb{F}_p$ corresponding to $\mathrm{HW}(g(y))$. We show that applying $f$ iteratively $\mathrm{it}_H(p - 1)$ times gives the function $y^{p-1}$: the one associating 0 to 0 and 1 to any nonzero element. Note that $f(0) = \mathrm{HW}(g(0)) = 0$, $f(1) = \mathrm{HW}(g(1)) = 1$ and (considering the order in $\mathbb{Z}$) for $y \ge 2$, we have $1 \le f(y) < y$. Then, for each $y \neq 0$ there is a number of iterations $s \in \mathbb{N}^*$ such that for all $t \in \mathbb{N}^*$, $t \ge s$, $f^{\circ t}(y) = f(f(\cdots(f(y))\cdots)) = 1$, where we denote by $f^{\circ t}$ the function consisting in iterating $f$ $t$ times. By construction the function $\mathrm{it}_H(\cdot)$ is non-decreasing. Hence, for all $x \in [p - 1]$ we have $\mathrm{it}_H(x) \le \mathrm{it}_H(p - 1)$ and for all $y \in \mathbb{F}_p^*$ we obtain $f^{\circ \mathrm{it}_H(p-1)}(y) = 1$. Since $f^{\circ \mathrm{it}_H(p-1)}(0) = 0$ we can conclude that $f^{\circ \mathrm{it}_H(p-1)}$ and $y^{p-1}$ take the same values over the whole $\mathbb{F}_p$, hence they are the same p-ary function. By the uniqueness of the ANF, $\deg(f^{\circ \mathrm{it}_H(p-1)}) = p - 1$. Since the degree of a composition is at most the product of the degrees, $\deg(f)^{\mathrm{it}_H(p-1)} \ge p - 1$, and therefore $\deg(f) \ge (p - 1)^{1/\mathrm{it}_H(p-1)}$. Finally, the function $f'$ is the direct sum of $m$ copies of the function $f$. Therefore $\deg(f') = \deg(f)$, which allows us to conclude.

³ Criteria over $\mathbb{F}_p^i$ could be considered too, but since the Hamming weight function only exceeds $p$ when $m \ge \frac{p}{\log p}$, such a generalization will not lead to relevant observations for our intended instances.

(d, m). For all $d \in \mathbb{N}$ such that $d < p$, the minimal Hamming distance of such code is $(p - d)p^{m-1}$ (e.g., [PW04], page 3).
Since − 1, m), and therefore for all h ∈ RM p (p − 1, m) , m), therefore f is at Hamming distance at least p m−1 from all functions of degree at most d, which allows to conclude.
Finally, we can also prove a better bound for the first-order nonlinearity: Proposition 4 (First-order nonlinearity lower bound). Let m, p ∈ N * , p odd prime, = log(p − 1) , and f the p-ary function defined as L p Proof. The first-order nonlinearity gives the minimum Hamming distance to constant functions and degree one functions. We study these two cases separately for this proof.
All degree one $m$-variable functions $l_a(y)$ have an ANF of the following shape: $a_0 + \sum_{i=1}^{m} a_i y_i$. The closest constant function to $f'$ in Hamming distance is the one equal to the value of $\mathbb{F}_p$ taken the most by $f'$. Since $m\ell < p - 1$, the value of $f'$ seen as an integer in $[0, p - 1]$ is equal to $\mathrm{HW}(g(y_1), \ldots, g(y_m))$ considered over $\mathbb{Z}$. The binary vector $(g(y_1), \ldots, g(y_m))$ has length $m\ell$, and the maximal number of length-$m\ell$ binary vectors having the same Hamming weight is given by the central binomial coefficient $\binom{m\ell}{\lfloor m\ell/2 \rfloor}$.

We finally conclude from the two parts that the following holds:

Discussion. Various techniques can be used to solve a noisy algebraic system of fixed degree. We use the higher-order correlation approach of [Cou02] to encompass different attacks and derive the corresponding complexities. Higher-order correlation attacks consist in approximating $f'$ by a degree $d$ function $h$, and in solving systems of equations until one is such that $f'$ and $h$ coincide on all these equations. In the system of equations given by the output of $f'$, only the key elements are unknown. Therefore, solving the correct algebraic system allows retrieving the key. Consequently, the time complexity of solving a noisy degree $d$ system of equations over $\mathbb{F}_p$ can be written as $C(1 - \varepsilon)^{-D}$ with:
• $C$ the time complexity to solve a degree $d$ system of equations in $V$ variables over $\mathbb{F}_p$,
• $(1 - \varepsilon)$ the probability of the approximation to be correct for one equation,
• $V$ the number of variables (i.e., at least the number of key variables $k$, but it can be more if techniques introducing new variables, such as linearization, are used),
• $D$ the quantity of data necessary (at least the number of variables $V$).

For illustration, we consider the complexity of linearizing the degree $d$ system and solving the linear system obtained. It allows deriving concrete estimations of the complexity for three different attacks. For this purpose, we first observe that the number of variables after linearization is where $\omega$ is the exponent in the complexity of Gaussian elimination.
In the first case, the adversary aims to solve an exact algebraic system. It corresponds to $\varepsilon = 0$ and $d = \deg(f')$. The time complexity is then $O((V_{\deg(f')})^{\omega})$ and the bound on the degree from Proposition 2 allows us to conclude. The second attack targets a noisy linear system corresponding to $\varepsilon = nl_1(f')/p^m$ and $d = 1$. In this case, $V_1 = k$ and the time complexity is at least $O\big(k^{\omega} (p^m/(p^m - nl_1(f')))^{k}\big)$. Hence, the bound of Proposition 4 provides the required estimation. Finally, if the adversary targets a noisy system of higher algebraic degree $d > 1$, we have $\varepsilon = nl_d(f')/p^m$. The time complexity is then at least

$$O\Big( V_d^{\omega} \Big( \frac{p^m}{p^m - nl_d(f')} \Big)^{V_d} \Big),$$

and the bound of Proposition 3 allows us to conclude.
Those results illustrate that the proposed crypto-physical dark matter operations lead to LWPR samples that resist some standard cryptanalysis techniques. We note that the attacks outlined are not claimed to be optimal. For example, using Gröbner basis algorithms such as F4 [Fau99] should improve over the aforementioned linearization techniques, which we leave as an interesting scope for further investigations.
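To get a feel for the orders of magnitude involved, the generic estimate $C(1 - \varepsilon)^{-D}$ can be evaluated on a logarithmic scale. The sketch below assumes the linearization setting discussed above with $C = V^{\omega}$ and the minimum amount of data $D = V$; the function name, the default $\omega = 3$ (standard Gaussian elimination) and the example values are assumptions, not figures from our analysis.

```python
import math

def log2_noisy_linearization_cost(V: int, eps: float, omega: float = 3.0) -> float:
    """log2 of V^omega * (1 - eps)^(-V): linear algebra cost times expected retries."""
    if V < 1 or not 0.0 <= eps < 1.0:
        raise ValueError("need V >= 1 and 0 <= eps < 1")
    return omega * math.log2(V) + V * (-math.log2(1.0 - eps))

# Hypothetical example: 64 variables, approximation wrong half of the time.
cost_bits = log2_noisy_linearization_cost(64, 0.5)
exact_bits = log2_noisy_linearization_cost(64, 0.0)
```

With $\varepsilon = 0$ only the linear algebra term remains, while any constant error rate makes the cost exponential in the number of variables.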

Analysis in characteristic 2
We now consider a complementary analysis of the crypto-physical dark matter function $f = L_g(K_r)$ interpreted over binary fields. In contrast with the previous results in $\mathbb{F}_p$ where the security of the related LWPR problem mostly depends on the Hamming weight function, such a leakage function is actually weak over binary fields (e.g., $L_g(\cdot) \bmod 2$ is a linear relation). Therefore, we rather rely on $h_K = K_r$ in $\mathbb{F}_p$ to provide security in the binary case. Our main result in this direction is to show that the MEDP (Maximum Expected Differential Probability) of $h_K = K_r$ can be bounded by leveraging existing results on universal hash functions. We complement this result by heuristic investigations on small instances from which we conclude that its MELP (Maximum Expected Linear Probability) follows a similar trend, and that its algebraic degree is close to maximal.
For this purpose, we first define the vectorial Boolean functions we will study. Denote by $\ell$ the smallest integer such that $2^{\ell} > p$, that is, the size in bits of the representation of an element of $\mathbb{F}_p$. Denote by $x_{[2]} \in \mathbb{F}_2^{i\ell}$ a representation of the vector $x_{[p]} \in \mathbb{F}_p^i$ over a binary field, obtained by representing each element of $x$ in $\mathbb{F}_2^{\ell}$. We consider the family of functions $r \mapsto (K_{1 \ldots n} \cdot r + K_{n+1})_{[2]}$ indexed by $K$, where $K_{1 \ldots n}$ is the matrix made of the $n$ first columns of $K$.

Differential analysis
We argue about the security of our construction by exhibiting its MEDP. We note that this security property is commonly studied for block ciphers and message authentication codes, but it is usually hard to obtain a good estimate of it without constraining assumptions. Here, we get the exact MEDP of the construction. The very first universal family of hash functions, called $H_1$, introduced by Wegman and Carter in [CW79], allows us to derive this result for our construction. It works in two steps: (1) we show that $h$ with $m = 1$ is $\frac{\alpha}{p}$-almost XOR universal with $\alpha$ close to 1; (2) we use tweaks of classical results on concatenation to show that $h$ is $\frac{\alpha}{p^m}$-almost XOR universal. Concretely, our construction $h$ with $m = 1$ is a modification of $H_1$ where the multiplication over $\mathbb{F}_p$ is replaced with a scalar product over $\mathbb{F}_p$. Denote by $x_{[2]}$ the natural decomposition of $x \in \mathbb{F}_p$ over $\mathbb{F}_2^{\ell}$, with $2^{\ell} > p$ (i.e., seeing $x$ in $\mathbb{N}$ and using its decomposition in base 2). It yields:
Let $K$ be a random variable over the universe of keys $\Omega = (\mathbb{F}_p^n \setminus \{0\}) \times \mathbb{F}_p$, which has cardinality $(p^n - 1)p$. Let $r, r' \in \mathbb{F}_p^n$, $r \neq r'$. Denote $k = (k_1, \ldots, k_{n+1}) \in \Omega$. Then for any $d \in \mathbb{F}_2^{\ell}$:
For any couple $(a, b) \in \mathbb{F}_p^2$ and for any $r, r' \in \mathbb{F}_p^n$, $r \neq r'$, without loss of generality assume $r \neq 0$, there are at most $p^{n-1}$ values of $k$ such that $(h_k(r), h_k(r')) = (a, b)$. Details on this are given in Appendix B; in particular, this justifies the supplementary key addition.
Indeed, it is a system of 2 linear inequivalent equations in n + 1 independent variables, hence it has n − 1 degrees of freedom. Thus:

The second line uses that for events A and B, Pr(A∩B) = Pr(A | B) Pr(B) ≤ Pr(A) Pr(B).
With $2^{\ell} > p$ and $(\cdot)_{[2]}$ the binary decomposition, we have $\beta = 1$, which yields
Note that this proposition works for any $\ell$ and $x \mapsto x_{[2]}$, including for $2^{\ell} < p$ and $x \mapsto x_{[2]}$ the reduction modulo $2^{\ell}$ that has $\beta = p/2^{\ell}$, which generalizes the result from Carter and Wegman. Note also that $\alpha \approx 1$ for $p$ and $n$ not too small and $2^{\ell}$ close to $p$. From Proposition 5, we have that $(h_k)_k$ is $\frac{\alpha}{p}$-almost XOR universal. Using a refinement of the result on the concatenation of universal functions [Sti91], we get the corollary:
Corollary 1 ($(K \cdot r)_{[2]}$ is AXU). Let $K$ be a random matrix of $((\mathbb{F}_p^n \setminus \{0\}) \times \mathbb{F}_p)^m$. Define $h_K(r) = K_{1 \ldots n} \cdot r + K_{n+1}$, for $r \in \mathbb{F}_p^n$. The family $(h'_K)_K$ defined for all $K$ as $h'_K(r) = (((h_K(r))_1)_{[2]}, \ldots, ((h_K(r))_m)_{[2]})$ is $\frac{\alpha}{p^m}$-almost XOR universal.
Proof. Denote by $e_K : \mathbb{F}_p^n \to \mathbb{F}_p$ the parallel computation in $h'$, i.e., $e_K(r) = ($ $h'$ is made of $m$ parallel versions of $e_{K^{(i)}}$ with independent keys $K^{(i)}$, but identical input value $r$. Thus for any $r \neq r'$, $e_{K^{(i)}}(r) = e_{K^{(i)}}(r')$ for all $1 \leq i \leq m$. Thus for any $r \neq r'$, as we can apply Proposition 5 on the $m$ independent $e_{K^{(i)}}$ which all have inputs $r \neq r'$.
From this result, we conclude that $(h'_K)_K$ is $\frac{\alpha}{p^m}$-almost XOR universal. Therefore, $\mathrm{MEDP} \leq \frac{\alpha}{p^m}$. For $p$ not too small (e.g., the instance of Section 4.3), the factor $\alpha$ can be neglected, as discussed for Proposition 5, and we get $\mathrm{MEDP} \approx \frac{1}{p^m}$.
Discussion. Security against differential attacks is a standard requirement for cryptographic primitives [BS90]. The MEDP of our construction is close to optimal, which can be interpreted as follows: set a differential (a, b) through the target function, then a small MEDP implies that for a random choice of key, this differential will have a small probability. This does not guarantee that for any given choice of key, no differential has a high probability. It rather guarantees that the differential (a, b) that maximizes the MEDP will only maximize the differential probability of a few keys, and that for a given key, few differentials have a high probability. (We checked experimentally that these guarantees were verified for small instances of our construction.) Our analysis therefore suggests that for any choice of key, finding a high probability differential is hard.
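The counting argument behind the $m = 1$ case can be checked exhaustively on toy parameters. The sketch below is an illustration under stated assumptions, not the experiments reported in this paper: it enumerates all keys in $(\mathbb{F}_p^n \setminus \{0\}) \times \mathbb{F}_p$, and for every pair $r \neq r'$ and every XOR difference $d$ it counts the keys such that $h_k(r)_{[2]} \oplus h_k(r')_{[2]} = d$. The reference value `counting_bound` is the bound $p^n / ((p^n - 1)p)$ that follows from having at most $p$ admissible pairs $(a, b)$ per difference and at most $p^{n-1}$ keys per pair; we do not claim that it coincides with the constant $\alpha$ of Proposition 5. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def max_xor_collision_probability(p, n):
    """Max over r != r' and d of Pr_k[h_k(r) XOR h_k(r') = d], by enumeration."""
    keys = [k for k in product(range(p), repeat=n + 1) if any(k[:n])]
    inputs = list(product(range(p), repeat=n))
    counts = {}
    for k in keys:
        # Table of h_k(r) = k_{1..n} . r + k_{n+1} mod p for every input r.
        out = {r: (sum(ki * ri for ki, ri in zip(k, r)) + k[n]) % p
               for r in inputs}
        for r1 in inputs:
            for r2 in inputs:
                if r1 != r2:
                    # XOR of Python integers acts on the binary decomposition.
                    event = (r1, r2, out[r1] ^ out[r2])
                    counts[event] = counts.get(event, 0) + 1
    return Fraction(max(counts.values()), len(keys))


def counting_bound(p, n):
    """At most p pairs (a, b) per difference, at most p**(n-1) keys per pair."""
    return Fraction(p ** n, (p ** n - 1) * p)


for p, n in [(5, 1), (7, 1), (5, 2)]:
    print(p, n, max_xor_collision_probability(p, n), counting_bound(p, n))
```

Exact rationals are used so that the comparison with the bound does not depend on floating-point rounding. The enumeration grows as $p^{3n+1}$ and is only practical for toy sizes.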
We note that the supplementary key addition of our construction is required for our proof, which leverages the one of Carter and Wegman (see Appendix B). Without it this result does not hold, and we tested experimentally that the MEDP can be significantly worse in that case, depending on choices of p and n (and can even be equal to one for small n values). It is an interesting open question to find out whether a slightly worse (yet, still sufficient) MEDP could be proven without this supplementary key addition for large enough p and n values. We note also that since our re-keying function works with random inputs, finding a good difference anyway requires birthday complexity (so overall, we do not expect differential attacks to be significant threats against our construction).
We finally mention that truncated differential attacks [Knu94] are of particular interest for our re-keying scheme, as it outputs $m$ words of $\mathbb{F}_p$ by independent parallel computations. It is therefore natural to consider the probability of obtaining a certain difference on one output word, regardless of the rest of the output. From Proposition 5, we have that the MEDP of such a truncated differential is less than $\frac{1}{p}$, which is excellent for an output in $\mathbb{F}_p$, thus discarding the possibility of predicting the output difference.

Linear, algebraic and other cryptanalyses
Security against linear cryptanalysis is another standard requirement for cryptographic primitives [Mat93]. It consists in finding affine relations between the input and output bits of a primitive that hold with high probability. In turn, it can lead to distinguishers, key-recovery attacks, and also to the ability to predict output bits without the key.
Resistance against linear attacks is classically measured with the MELP. However, we are not aware of results bounding the MELP of a universal hash function that would lead to theoretical results similar to the MEDP ones. Hence, we only argue about resistance against linear attacks experimentally. We analyzed reduced instances with $m = 1$ and small primes (i.e., $p$ up to 251) and values of $n$ (i.e., $n$ up to 4) and observed that the MELP follows a similar trend as the MEDP, and is always less than $\frac{1}{p}$ for each output word. We conjecture that for any key, finding a linear relation with high probability is hard.
We also evaluated the algebraic degree of the same small instances as used to heuristically assess the MEDP of our re-keying function and observed that it was always maximum (equal to $\ell$), which guarantees that no algebraic attack can be easily performed. In particular, this is different from the case of the scalar product (with or without the supplementary key addition) over $\mathbb{F}_2$. In this binary field case, the MEDP is still optimal (in particular Proposition 5 holds) but the scalar product is linear. Therefore, higher-order differential attacks [Lai94,Knu94] can be performed exploiting the low degree of the function (which is 1). Having a high degree, our construction over $\mathbb{F}_p$ avoids such issues, and should resist other attacks exploiting a low-degree function, such as cube attacks [DS09].
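For readers who want to reproduce this kind of degree measurement on toy sizes, the following sketch computes the algebraic degree of a Boolean function from its truth table with the binary Möbius transform, which is a standard technique and not necessarily the tool used for the experiments above. The toy application at the end takes $n = 1$ and a hypothetical key, and it assumes that input bit patterns encoding values greater than or equal to $p$ are reduced modulo $p$; this convention is ours and may differ from the one used in the paper.

```python
def algebraic_degree(truth_table):
    """Degree of the algebraic normal form of a Boolean function.

    truth_table[x] is the output bit for the input whose bits are those of
    the integer x; its length must be a power of two. Constant functions
    (including the zero function) are reported with degree 0.
    """
    size = len(truth_table)
    n_vars = size.bit_length() - 1
    if size != 1 << n_vars:
        raise ValueError("truth table length must be a power of two")
    anf = list(truth_table)
    step = 1
    while step < size:
        # Binary Moebius transform, one input variable per pass.
        for i in range(size):
            if i & step:
                anf[i] ^= anf[i ^ step]
        step <<= 1
    return max((bin(i).count("1") for i in range(size) if anf[i]), default=0)


def output_bit_tables(p, k1, k2):
    """Truth tables of the output bits of x -> (k1 * x + k2 mod p)_[2], n = 1.

    Assumption: inputs >= p are first reduced modulo p.
    """
    ell = p.bit_length()
    return [[(((k1 * (x % p) + k2) % p) >> j) & 1 for x in range(1 << ell)]
            for j in range(ell)]


# Toy instance with a hypothetical key (k1, k2) = (3, 5) over F_7.
print([algebraic_degree(t) for t in output_bit_tables(7, 3, 5)])
```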
Finally, some other techniques seem suitable for analysing our construction. For instance, Divide-and-Conquer techniques seem relevant on the $m$ independent outputs of $h_K$, and Guess-and-Determine techniques [HR00] could be interesting as the master key $K$ is never modified. Investigating those and more advanced techniques is left for future work.

Exemplary instance (and variants)
Relying on the previous analyses, we propose to consider an instance of crypto-physical dark matter based on a Mersenne prime $p = 2^{31} - 1$. We aim for the generation of a 128-bit key in the parallel case and therefore consider m = 4 (the generated key will have entropy ≈ 4 × 31, which is sufficient for the intended applications). As for the size of the secret matrix n, we set it based on the minimum condition (n + 1) log p + 3 log n ≥ λ, with λ ≈ 128, and consider n = 4. More conservative solutions could be considered (especially for serial implementations, see the discussion in Section 6), but we assume these values are a good starting point in order to stimulate external cryptanalysis. We expect that this instance provides a concrete security such that the first attack path of Figure 3 is more challenging than targeting directly the long-term key shares via a side-channel attack like in Section 5.3, up to a significantly higher number of shares than the proposal of Medwed et al. [MSGR10]. Variants that would be worth investigating in the future include:
• Using a Toeplitz matrix for K. This would reduce the key size at the cost of introducing more redundancy in the computations, which could be exploited via mathematical cryptanalysis (extending the results in this paper) or side-channel cryptanalysis (e.g., enabling so-called horizontal attacks [BCPZ16]).
• Removing the additional key addition (i.e., considering an nm-word key rather than an (n + 1)m-word key). As mentioned in Section 4.2.1, a first step in this direction would be to evaluate how much the MEDP can be preserved in this case.
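The parameter choices of this instance can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check and assumes that the logarithms in the minimum condition are taken in base 2; the variable names are ours.

```python
import math

P = 2**31 - 1      # Mersenne prime of the proposed instance
M = 4              # number of output words
N = 4              # input length in words
SECURITY = 128     # target lambda


def min_condition_lhs(p, n):
    """Left-hand side (n + 1) log p + 3 log n, with base-2 logs (assumption)."""
    return (n + 1) * math.log2(p) + 3 * math.log2(n)


lhs = min_condition_lhs(P, N)
key_words = M * (N + 1)             # (n + 1) words per output word
key_bits = key_words * 32           # stored as 32-bit words
output_entropy = M * math.log2(P)   # close to 4 x 31 bits

print(f"condition: {lhs:.1f} >= {SECURITY}: {lhs >= SECURITY}")
print(f"key: {key_words} words ({key_bits} bits), output entropy {output_entropy:.1f} bits")
```

The 20-word key obtained here matches the (20 × 32)-bit key storage discussed in the implementation results.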

Implementation results
The previous section showed that a re-keying based on crypto-physical dark matter is significantly more secure than the initial proposal of Medwed et al. [MSGR10], under realistic leakage assumptions. We now discuss the other part of our contribution. Namely, we show that the proposed instance is also significantly more efficient than the wPRF proposal of Dziembowski et al. [DFH+16]. In order to make the two solutions somewhat comparable (given that they are providing security in quite different models), we consider a hashed version of the LWR-based wPRF (that gives a PRF in the random oracle model [BR17]), implemented on a modern FPGA in [BSS20], which we compare with the full TBC-based re-keying scheme of Figure 1(b), with an AES-based TBC following [LRW11]. We then propose a masked implementation of our new solution relying on a similar hardware architecture as [BSS20] and exhibit the improved performances it achieves. We finally analyze the side-channel security of this masked crypto-physical dark matter and discuss how to select the number of shares in order to reach a given security target.

Hardware architecture
Our architecture to perform crypto-physical dark matter computations is illustrated in Figure 4. Like the architecture in [BSS20], it is designed to leverage the key-homomorphism of the re-keying function by processing the shares serially. It is divided into two main blocks. The first one is composed of the different memories that hold the shares of K, the value of r and the randomness required to refresh the shares. The latter is embedded into the memory blocks and is added to each word of K after it has been read from memory. The second block is the computation core. It includes the logic to perform the different dot product operations (organized as a pipeline for efficiency) and an accumulator that recombines the intermediate results. For security and performance (in particular latency) reasons, we perform 5 multiplications in parallel, leading to an internal bus of 160 bits.
The detailed architecture of the memory block is shown in Figure 5. The r memory consists of a 128-bit long register that is fed with the appropriate value at the beginning of an execution. Both the key and the randomness memories are composed of d (i.e., the number of shares used) independent memories of m × (n + 1) × 32 = 640 bits, i.e., d × 640 bits in total. In practice, these are mapped to BRAM blocks, which are dedicated memory resources embedded in the FPGA. While processing a specific share, all the memories are read and the appropriate output is selected. The selection mechanism is straightforwardly done using a multiplexer for the randomness memory. An additional register barrier is added in front of the selection multiplexer in order to avoid physical defaults such as glitches that may lead to reductions of the security order [MPG05]. Additionally, the register barriers corresponding to memories that are not expected to be read are reset. The values that are read in memory are then directly forwarded to the computation pipeline.
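The share-serial principle that this architecture relies on can be illustrated in software. The sketch below is a functional model under our own assumptions (additive sharing of the key over $\mathbb{F}_p$, hypothetical function names, operating-system randomness instead of LFSRs); it says nothing about the timing, pipelining or leakage of the hardware. Since $K \mapsto K_{1 \ldots n} \cdot r + K_{n+1}$ is linear in $K$, each share can be processed on its own and the partial results accumulated.

```python
import secrets

P = 2**31 - 1  # prime of the proposed instance


def eval_rekey(key, r):
    """m output words: row_{1..n} . r + row_{n+1} mod P for each key row."""
    return [(sum(k * x for k, x in zip(row[:-1], r)) + row[-1]) % P
            for row in key]


def share_key(key, d):
    """Split the key into d additive shares over F_P (assumed sharing)."""
    shares = [[[secrets.randbelow(P) for _ in row] for row in key]
              for _ in range(d - 1)]
    last = [[(k - sum(s[i][j] for s in shares)) % P
             for j, k in enumerate(row)]
            for i, row in enumerate(key)]
    return shares + [last]


def eval_shared(shares, r):
    """Process the shares one after the other and accumulate the results."""
    acc = [0] * len(shares[0])
    for share in shares:
        partial = eval_rekey(share, r)  # never touches another share
        acc = [(a + b) % P for a, b in zip(acc, partial)]
    return acc


m, n, d = 4, 4, 3
K = [[secrets.randbelow(P) for _ in range(n + 1)] for _ in range(m)]
r = [secrets.randbelow(P) for _ in range(n)]
print(eval_shared(share_key(K, d), r) == eval_rekey(K, r))
```

Adding a share only adds one more iteration of the same loop, which mirrors the fact that the hardware re-uses the same computation core for every share.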
A dedicated mechanism is used in order to refresh the key values that are forwarded to the computation core. Each share is directly re-randomized after being used, with a (linear) refresh mechanism that is similar to the one in [BSS20]. Namely, we first generate uniform 31-bit values thanks to four 128-bit LFSRs and output a uniform value over $\mathbb{F}_p$ thanks to a simple rejection sampling (i.e., we output a single 31-bit value that is different from $2^{31} - 1$, and use four LFSRs so that they jointly fail with low probability). We then refresh the key words and write them back in the appropriate memory.
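The rejection sampling step can be sketched as follows. This is a sequential software model that uses Python's `secrets` module as the bit source, whereas the design above draws from four LFSRs in parallel; the function name and the injectable `rand31` parameter are assumptions for illustration.

```python
import secrets

P = 2**31 - 1


def sample_fp(rand31=lambda: secrets.randbits(31)):
    """Uniform element of F_P from uniform 31-bit candidates.

    The only 31-bit value outside {0, ..., P - 1} is P itself (all ones),
    so a candidate is rejected with probability 2**-31.
    """
    while True:
        candidate = rand31()
        if candidate != P:
            return candidate


print(sample_fp())
```

Drawing several candidates at once, as the hardware does, removes the data-dependent loop: a draw only fails if every candidate equals $2^{31} - 1$.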
The data output by the memory block directly feeds the computation core by entering the modular multiplication layer. As depicted in Figure 6, the latter is composed of 5 independent modular multiplications. We rely on the DSP resources embedded in the FPGA to do so, and more precisely on the multiplier units that they contain. We use instances of the solution presented in [KSHS17] in order to perform the modular multiplications. This solution mixes an efficient utilization of the DSP resources combined with an adder tree specifically designed to limit the logic depth. Additionally, it allows easily incorporating the modular reduction with limited overheads.
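To illustrate why a Mersenne prime keeps the modular reduction cheap, the sketch below uses the textbook folding reduction based on $2^{31} \equiv 1 \pmod{2^{31} - 1}$. It is not the DSP-based design of [KSHS17], only a functional model with hypothetical function names.

```python
P = (1 << 31) - 1


def reduce_mersenne31(x):
    """Reduce 0 <= x < 2**62 modulo P with shifts, masks and additions."""
    x = (x & P) + (x >> 31)   # fold the high part: 2**31 = 1 (mod P)
    x = (x & P) + (x >> 31)   # second fold absorbs the possible carry
    return x - P if x >= P else x


def mul_mod(a, b):
    """Product of two elements of F_P."""
    return reduce_mersenne31(a * b)


a, b = 123456789, 2000000000
print(mul_mod(a, b) == (a * b) % P)
```

No division or trial subtraction loop is needed: two folds and one conditional subtraction suffice for any product of two reduced operands.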
Following the modular multiplication, the modular adder tree is split in a two-level pipeline to improve the maximal clock frequency that can be reached. It turns out the register between the second and the third addition layers has a significant impact on the clock frequency, improving its maximal value from 50 MHz to 80 MHz at a low logic cost (see Table 1). The difference with the solution in [BSS20], which reaches 90 MHz without pipeline, is due to the additional logic required to perform the modular reductions.
Eventually, an accumulator ends the pipeline and is composed of 5 independent adders with feedback. As when a refreshed key share is written back in the memory, the data coming from the dot product pipeline is fed to all the adders and only a specific one is activated depending on the line processed. Following this configuration, the final session key value is obtained once all the lines of all the shares have been processed.
We start with the size of the key storage, which is among the easiest to quantify. It is represented in Figure 7 and simply illustrates the difference between the (128 × 32)-bit key that the LWR-based solution requires and our (20 × 32)-bit key. We follow with the randomness requirements of the two proposals, which are illustrated in Figure 8. They are proportional to the key size, but the gap between the LWR-based solution and ours is amplified due to the fact that the output of each inner product in [DFH+16] has to be rounded to 10 bits for security reasons, and $\log_2(d)$ bits are additionally lost due to the error correcting code that is needed in order to deal with the carry propagations that make the LWR-based wPRF almost key-homomorphic only.
The implementation cost of the two proposals on a Xilinx Kintex 7 is summarized in Table 1. As previously mentioned, our architecture is a parallel one performing P = 5 modular multiplications concurrently. The one of [BSS20] uses levels of parallelism P = 1, 2, 4 and 8. We report the P = 4 and P = 8 cases, which are the most comparable (since the P parameter also sets the number of DSP blocks used by the FPGA, which is 4 × P). The figures exclude the key storage that is implemented in the FPGA BRAM blocks. The LWR-based solution additionally includes an implementation of Keccak to hash the input. The LWPR-based solution additionally includes the implementation of an AES-based TBC.
We conclude from these figures that both solutions lead to reasonable costs and their area requirements should not be a problem for practical deployment. We additionally note that in the LWPR case, the cost of the AES-based TBC implementation (which leverages a 32-bit architecture) accounts for 650 registers and 629 LUTs. Eventually, the most relevant metric to highlight the performance gains of the crypto-physical dark matter is the latency given in Figure 9 (excluding the generation of the masking randomness but including the input hash function for the LWR-based solution, and the AES-based TBC for the LWPR-based one). It underlines that even when the LWR-based solution is implemented with a large level of parallelism (e.g., P = 8), the LWPR-based solution is orders of magnitude faster. Furthermore, the lower part of the figure shows that for 20 shares, a full re-keyed TBC can be performed in approximately 160 cycles (to which one must add the cost to generate 12,500 random bits, as per Figure 8, which means a quite reasonable 80 bits per cycle if they are generated on-the-fly).

Physical security
The previous performance evaluations show that crypto-physical dark matter can be a very competitive option for higher-order masked implementations. For completeness, we next provide the results of a practical security evaluation for such masked implementations, using the same worst-case approach as in [BSS20]. We first recall some generic advantages of key-homomorphic primitives for masking, then describe how to evaluate/bound their concrete (e.g., power consumption) leakage, and finally show how to choose the number of shares needed to reach a given security target thanks to this leakage bound.
Advantages of key-homomorphic primitives for masking. In short, and as carefully discussed in [BSS20], key-homomorphic primitives enable a significant simplification of both the design of secure masked implementations and their evaluations. A first interesting feature for this purpose is that they are trivial to analyze from a probing security viewpoint, since they enable the independent manipulation of the shares. Key-homomorphic primitives do not suffer from composability issues and the only refreshing they need is for the shares of their long-term key, which can be performed using cheap schemes (with linear overheads), as discussed in [BDF+17], Section 8.2.
A second interesting feature is that they mitigate the risks of physical defaults (like glitches or transitions) leading to shares' re-combinations. This benefit again comes from the possibility to manipulate shares independently. In case of shares-serial implementations like the one we chose, they can additionally reduce the risks of couplings [CBG + 17].
A third advantage is that each key share is manipulated minimally (i.e., a constant number of times, independent of the security order). This ensures an inherently good resistance against horizontal attacks, formalized by a constant noise rate.
Eventually, a consequence of the previous advantages is that the evaluation of such masked implementations is scalable. For example, increasing the number of shares in a block cipher implementation usually benefits from order-specific optimizations and implies some re-design of the internal components that consequently requires repeating the (time-consuming) side-channel security evaluations for each security order. By contrast, increasing the number of shares of a key-homomorphic architecture boils down to re-using exactly the same component multiple times, avoiding the need to repeat evaluations.
Evaluation of the shares' leakages. The side-channel security of a masked implementation depends on two main assumptions: the shares' leakages must be sufficiently noisy and independent [DFS19]. As discussed and evaluated in [BSS20], the independence is essentially guaranteed by design in a shares-serial key-homomorphic architecture. Since we use the same architecture, we do not detail the detection tests needed to confirm this assumption and rather focus on the level of noise in a prototype implementation.
For this purpose, we first synthesized our design for the Xilinx Spartan 6 FPGA available on the SAKURA-G board. Its clock frequency was set to 6 MHz. The synthesis was performed with the "keep hierarchy" flag, preventing the tool from trimming out useful registers. The leakage signal was captured with a Tektronix CT-1 probe. This signal was sampled with a Picoscope 5244d at a rate of 500 MSamples/s, with 12-bit resolution.
An exemplary power trace is given at the top of Figure 10. It corresponds to a 3-share implementation that generates a fresh key in ≈ 12 cycles (i.e., 4 × 3 cycles, where the 4 factor is the number of 31-bit key words generated to obtain a 124-bit key and the 3 factor is the number of shares). As a standard first step in our evaluations, we estimated the 8-bit side-channel Signal-to-Noise Ratio (SNR) [Man04], which is represented at the bottom of the same figure. We observe that the level of SNR varies between bytes and selected the most leaking byte for our following (worst-case) investigations. As a standard second step in our evaluations, we used the SNR plots to select Points-of-Interest (POIs) in the traces, represented by the crosses in Figure 10. We then estimated the information leakage that can be extracted from the corresponding multivariate distribution under a Gaussian assumption, using the information theoretic metrics and bounds introduced in [BHM+19]. Namely, we estimated the Gaussian Perceived Information (gPI) and the Gaussian hypothetical information (gHI) using the sampling based estimation described in this reference. The convergence of these two metrics for the 10 POIs we selected is illustrated at the top of Figure 11. The gHI gives a bound on the amount of information that can be extracted with such a Gaussian model. We assume that it provides a reasonable approximation of the worst-case security level of our implementation (with the usual cautionary remark that better measurement setups and models can always improve the gPI and its corresponding gHI bound). Since we consider an 8-bit adversary while attacks based on 32-bit guesses can be performed by determined adversaries, we multiplied our leakage estimation by a factor 4 to make our approximations more conservative.
Selecting a number of shares. Eventually, based on the previous estimations of the information leakages and assuming that the independence condition is fulfilled for our implementations, we use the results in [DFS19] in order to select the number of shares needed to reach a target security level. In short, they bound the data complexity N of any side-channel attack against a masked implementation by the inverse of the mutual information obtained on its shares (which we approximate with the aforementioned gHI) raised to the number of shares d. It leads to the following inequality:

$$N \geq \frac{c}{\mathrm{MI}^{d}},$$

where MI is the mutual information per share and c is a small constant depending on the size of the key hypothesis and the target success rate [dCGRP19] (e.g., c = 10 is a standard value for 8-bit guesses). The resulting data complexities are given at the bottom of Figure 11 as a function of the number of shares. They confirm that high security can be obtained against the second attack path of Figure 3.
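The selection of the number of shares can be sketched numerically. The code below assumes that the bound described above takes the form $N \geq c / \mathrm{MI}^d$, with MI the per-share mutual information. The value 0.01 bit used in the example is purely hypothetical and is not a measurement from this work; the function names are ours.

```python
import math


def data_complexity_bound(mi_per_share, d, c=10.0):
    """Lower bound c / MI**d on the number of traces (assumed form)."""
    return c / (mi_per_share ** d)


def min_shares(mi_per_share, target_log2, c=10.0):
    """Smallest d such that log2(c / MI**d) >= target_log2."""
    if not 0.0 < mi_per_share < 1.0:
        raise ValueError("the bound is only useful for 0 < MI < 1 bit")
    d = 1
    # Work in the log domain to avoid underflow of MI**d for large d.
    while math.log2(c) - d * math.log2(mi_per_share) < target_log2:
        d += 1
    return d


hypothetical_mi = 0.01   # bits per share, illustrative value only
for target in (32, 64, 128):
    print(target, min_shares(hypothetical_mi, target))
```

Because the bound is exponential in d, halving the per-share information has a much larger effect at high orders than at low ones, which is why the noise level of the prototype matters for the final share count.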

Conclusions & open problems
We introduce crypto-physical dark matter as a provocative solution to improve the security and performances of state-of-the-art re-keying schemes. Its main idea is to combine simple computations in a medium size prime field with a physical leakage function that we assume operating in a sufficiently different field. As feasibility results, we show that such a combination ensures a number of relevant cryptographic properties for the well known Hamming weight leakage function, and that it leads to excellent performances in hardware, leading to a number of stimulating research challenges that we detail next.
From a (both mathematical and implementation) security viewpoint, the preliminary analyses of this work could be extended towards more advanced attacks, in order to refine the understanding of the complexity to solve the LWPR problem. Another important open question is to generalize our conclusions to broader classes of leakage functions. For example, obtaining results for any "linear" leakage function, as modeled in [SLP05], appears as a natural first step. Eventually, and although our current security analyses suggest that some degree of parallelism can make the LWPR problem harder, it would be interesting to study whether crypto-physical dark matter could lead to secure implementations in software (e.g., 32-bit) devices, which are in general difficult to secure against side-channel analysis and will likely require stronger instances and a larger number of shares.
Besides, from an application viewpoint, the integration of the proposed re-keying scheme in leakage-resilient modes of operation and/or the efficient protection of decryption algorithms with them would be worth being investigated as well. A natural starting point is the work of Mennink on fresh re-keying applied to authenticated encryption [Men20].