Fallen Sanctuary: A Higher-Order and Leakage-Resilient Rekeying Scheme

This paper presents a provably secure, higher-order, and leakage-resilient (LR) rekeying scheme named LR Rekeying with Random oracle Repetition (LR4), along with a quantitative security evaluation methodology. Many existing LR primitives are based on the concept of leveled implementation, which still essentially requires a leak-free sanctuary (i.e., differential power analysis (DPA)-resistant component(s)) for some parts. In addition, although several LR pseudorandom functions (PRFs) based on only bounded DPA-resistant components have been developed, their validity and effectiveness for rekeying usage still need to be determined. In contrast, LR4 is formally proven under a leakage model that captures the practical goal of side-channel attack (SCA) protection (e.g., masking with a practical order) and assumes no unbounded DPA-resistant sanctuary. This proof suggests that LR4 resists exponential invocations (up to the birthday bound of key size) without using any unbounded leak-free component, which is the first of its kind. Moreover, we present a quantitative SCA success rate evaluation methodology for LR4 that combines the bounded leakage models for LR cryptography and a state-of-the-art information-theoretical SCA evaluation method. We validate its soundness and effectiveness as a DPA countermeasure through a numerical evaluation; that is, the number of secure calls of a symmetric primitive increases exponentially by increasing a security parameter under practical conditions.


Background
Side-channel attacks (SCAs) are physical attacks on cryptographic implementations [KJJ99]. SCA countermeasures are roughly categorized into three types: masking, hiding, and leakage-resilient (LR) cryptography. Both masking and hiding are basically designed to suppress/eliminate the leakage for a given algorithm/device. However, it has been shown experimentally [BS21] and theoretically [DFS15, IUH22a, MRS22] that it might be difficult for masking to achieve secure implementation on some low-end devices with trivial noise. Hiding (e.g., a secure logic style like WDDL [TV04]) is also sometimes unsuitable because its effectiveness depends heavily on the given device/technology. In contrast, LR cryptography features cryptographic algorithms capable of secure computation up to a specific, predetermined level of leakage. For developing practically secure cryptographic modules, it is essential to investigate the possibility and limitations of LR cryptography as well as masking and hiding.
[...] against higher-order attackers, while the number of traces for key recovery increases exponentially by the masking order [DFS15, IUH22a, MRS22].
On the other hand, the security of LR-PRFs has been discussed using a bounded data or trace complexity model, which means that the number of plaintexts/traces available in attacking the PRF with a secret key is bounded. Here, secure LR-PRF implementation requires a component resistant to DPA with $m$ plaintexts or traces (i.e., $m$-bounded DPA-resistance). This is a more relaxed and practical condition than unbounded DPA resistance. Nonetheless, the validity of the bounded complexity model needs to be clarified in practice. At least, the model is valid only if it works with a dedicated protocol, but there has been little discussion about its practical use. In addition, it is not trivial to determine $m$ for a given device with respect to the SCA success rate on the LR-PRF. See Appendix A for the details of existing LR-PRFs.
In summary, the existing LR cryptography schemes have some non-trivial limitations in their practical use, and the relation and combination of the aforementioned LR cryptography schemes have not been comprehensively discussed. It remains an open problem to determine how far away LR cryptography is from a leak-free sanctuary.

Our contributions
We present a cryptographic scheme and its security evaluation to address the abovementioned challenges.
New LR rekeying scheme. We propose a provably secure, higher-order, and LR rekeying scheme, named LR Rekeying with Random oracle Repetition (LR4). For this purpose, we introduce a new leakage model for rekeying and provide a formal security proof of LR4. We then show the validity of the leakage model and how to utilize LR4 in practice. We also discuss its practical aspects and analyze its implementation cost, efficiency, and low latency. We compare LR4 to state-of-the-art LR encryption schemes in Section 3.3, and confirm the advantage and effectiveness of LR4.
From a technical viewpoint, our major contributions include the new definition of a leakage function, rather than the development of a security proof technique or security notion. The definition of the leakage function for security notions/proofs has been extensively studied, but its link to bounded trace complexity is largely unexplored. Currently, Accumulated Interference (AI) in [DMP22] (see Section 2.2) is the state of the art for it. In this paper, we define another leakage function regarding the trace complexity bound, which captures practical features of side-channel leakage and overcomes some drawbacks of existing leakage functions. We then prove the LR4 security using a promising security notion. Thus, LR4 achieves preferable features for practical rekeying (see Section 3), compared to existing LR schemes.
Evaluation methodology for practical usage. We propose an information-theoretical methodology for evaluating the attack cost and success rate on LR4 given a device/condition by utilizing and extending state-of-the-art SCA evaluation methods [dCGRP19, IUH22a, MRS22]. So far, the success rate has been commonly used for evaluating the SCA capability/resistance [SMY09], and many studies have been devoted to its practical and feasible estimation on symmetric primitives [MOS11, DFS15, dCGRP19, IUH22a, MRS22, BCG+23], while there are few studies on the success rate evaluation on modes of operation (rather than primitives). Thus, we formally define the attack success rate on LR4, and then formally analyze the relationship between the attacks on LR4 and the underlying symmetric primitive, which enables a quantitative evaluation of the attack cost in terms of the success rate and the number of attack traces. Our methodology is able to determine the rekeying order $d$ and the trace bound $m$ as the rekeying interval from a quantity of the target device (i.e., mutual information or signal-to-noise ratio (SNR)).
Validation. Using the proposed methodology, we show a numerical evaluation of the attack cost on LR4 instantiated with AES. The results confirm an exponential increase of the number of secure calls of a symmetric primitive by increasing the rekeying order $d$ under practical conditions. We also discuss properties required for a secure rekeying.
Technical challenges. The main technical challenge in our work is the simultaneous pursuit of high practicality and strong provable security on leakage resilience in the rekeying mechanism. The former excludes several conventional techniques in LR cryptographic schemes, most notably a leak-free component and the generation and transmission of true random values (where generation must also be secure against leakage). Our LR4 could be viewed as an alternative interpretation of the classical GGM PRF with a counter; however, we need to show its leakage resilience both in theory and in practice. This requires us to develop a dedicated security model capturing practical protection methods applicable to each module in LR4, such as higher-order masking of a practical order. Meanwhile, we should consider how to determine the parameter in the security proof (i.e., trace bound $m$), given a device, for the practical usage of LR4 with a quantitative security guarantee. For evaluating LR4's practical security, we develop a formal definition of the SCA success rate on LR4 and extend a state-of-the-art information-theoretical evaluation method [dCGRP19] to its evaluation. Consequently, we are able to show that LR4 has exponential security with respect to the parameters (number of modules and leakage resistance of each module) from both practical and theoretical viewpoints.

Conventional studies on rekeying
Rekeying is one of the primordial countermeasures against DPA suggested by Kocher et al. [KJJ99]. The basic form of rekeying is illustrated in Figure 1. Rekeying schemes exploit the fact that most SCAs on symmetric primitives require a number of traces (i.e., calls of the target primitive) for the key recovery. The basic idea behind rekeying is to use a temporal key (i.e., session key) $k_{\mathrm{tmp}}$ generated from a master key $k_{\mathrm{mst}}$, and then update the session key using a deterministic rekeying function ($g$ in Figure 1) and a (random) IV $r$ at a frequency that does not allow any attacker to succeed in the temporal key recovery. Here, the target primitive is assumed to have a minimal resistance against SCAs with a certain number of traces ($m$-bounded DPA-resistance, or SPA-resistance if the number of traces is one) because the temporal key is discarded after that number of calls (or one). In contrast, the rekeying function should be DPA-resistant or leak-free because it is called many times with the master or internal temporal key.
The above idea was formalized by Medwed et al. in 2010 (MSGR) [MSGR10] as Fresh Rekeying. Fresh rekeying and its variants have been extensively studied from various viewpoints such as efficient instantiations, formal or practical security, and both with and without leakage [GFM13, MPR+11, BDSH+14, DEMM14, MS14, PM16, DFH+16, DMMS21]. In particular, the realization of the key derivation function ($g$ in Figure 1) is one of the central research topics in Fresh rekeying. MSGR suggested using a non-cryptographic operation (ring/field multiplication) for the key derivation based on the observation that the key derivation needs no black-box security, as the derived temporal key is never given in clear. Dobraunig et al. [DEMM14] pointed out a problem of this instantiation by showing a chosen-plaintext (master) key recovery attack. Their attack is a simple time-memory tradeoff using a set of precomputed ciphertexts with guessed temporal keys and a fixed plaintext. Dziembowski et al. [DFH+16] proposed rekeying components based on lattice cryptography backed by a certain theoretical guarantee. This direction has been further explored by Duval et al. [DMMS21]. Despite reports on some attacks [DEMM14, PM16], in reality, the root of security (an unbounded DPA-resistant/leak-free module) is barely implemented with sophisticated SCA countermeasures (e.g., masking), to the best of the authors' knowledge.
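The basic rekeying flow described above can be sketched as follows. This is only an illustration under stated assumptions: the function `g` is instantiated with truncated SHA3-256 as a stand-in (a plain hash call provides no DPA resistance by itself, which is exactly the requirement discussed above), and the class name `SessionKey` and its interface are hypothetical.

```python
import hashlib


def g(k_mst: bytes, r: bytes) -> bytes:
    """Stand-in rekeying function g (assumption: SHA3-256 truncated to 128 bits)."""
    return hashlib.sha3_256(k_mst + r).digest()[:16]


class SessionKey:
    """Temporal key k_tmp = g(k_mst, r) that may be used at most m times."""

    def __init__(self, k_mst: bytes, r: bytes, m: int):
        self._key = g(k_mst, r)
        self._remaining = m

    def use(self) -> bytes:
        # Refuse further use once the rekeying interval m is exhausted.
        if self._remaining == 0:
            raise RuntimeError("rekeying interval exhausted; derive a fresh key")
        self._remaining -= 1
        return self._key
```

The point of the sketch is the division of labor: the primitive keyed with `k_tmp` only has to withstand `m` traces, whereas `g` sees the master key on every rekeying.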

Paper organization
Section 2 introduces notations and existing attack/leakage models for LR cryptography. Section 3 proposes our higher-order and LR rekeying scheme, named LR4. Section 4 formally proves the security of LR4, with a formalization of the leakage model for rekeying. Section 5 presents a quantitative and information-theoretical evaluation methodology based on a formal analysis of the attack cost and success rate on LR4, and shows how to use LR4 in practice. This is followed by Section 6, which demonstrates the validity of LR4 through numerical evaluations and discussion. Section 7 discusses the relation, comparison, and compatibility of LR4 with existing LR cryptographic schemes. Finally, Section 8 concludes this paper.

Basic notations
Let $[i]$ denote $\{1, 2, \ldots, i\}$ for any positive integer $i$. We define $\{0,1\}^i$ and $\{0,1\}^*$ as the set of $i$-bit strings and the set of all arbitrary-length strings, respectively. Let $\log$ denote the binary logarithm.
A tweakable block cipher (TBC) is a keyed function $E : \mathcal{K} \times \mathcal{T}_w \times \mathcal{M} \to \mathcal{M}$, where $\mathcal{K}$ is the key space, $\mathcal{T}_w$ is the tweak space, and $\mathcal{M} = \{0,1\}^n$ is the message space, such that for any $(K, T_w) \in \mathcal{K} \times \mathcal{T}_w$, $E(K, T_w, \cdot)$ is a permutation over $\mathcal{M}$. We interchangeably write $E(K, T_w, M)$, $E_K(T_w, M)$, or $E_K^{T_w}(M)$. The decryption routine is written as $(E_K^{-1})^{T_w}(\cdot)$. When $\mathcal{T}_w$ is a singleton, it is essentially a block cipher and is written as $E : \mathcal{K} \times \mathcal{M} \to \mathcal{M}$. For sets $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{T}_w$, $\mathrm{Func}(\mathcal{X}, \mathcal{Y})$ denotes the set of all functions from $\mathcal{X}$ to $\mathcal{Y}$, $\mathrm{Perm}(\mathcal{X})$ denotes the set of all permutations over $\mathcal{X}$, and $\mathrm{TPerm}(\mathcal{T}_w, \mathcal{X})$ denotes the set of all functions $f : \mathcal{T}_w \times \mathcal{X} \to \mathcal{X}$ such that for any $T_w \in \mathcal{T}_w$, $f(T_w, \cdot)$ is a permutation over $\mathcal{X}$. A tweakable uniform random permutation (TURP) with a tweak space $\mathcal{T}_w$ and a message space $\mathcal{X}$, $P : \mathcal{T}_w \times \mathcal{X} \to \mathcal{X}$, is a random tweakable permutation with uniform distribution over $\mathrm{TPerm}(\mathcal{T}_w, \mathcal{X})$. The decryption is written as $(P^{-1})^{T_w}(\cdot)$ for a TURP given tweak $T_w$. A random oracle (RO) is a random function that is uniformly distributed over $\mathrm{Func}(\{0,1\}^*, \{0,1\}^n)$ for some fixed $n$ (which can be implemented with lazy sampling). An ideal cipher (IC) is a random block cipher that is uniformly distributed over $\mathrm{TPerm}(\mathcal{K}, \mathcal{X})$ for some fixed finite sets $\mathcal{K}$ and $\mathcal{X}$ (that is, the set of all the block ciphers with key space $\mathcal{K}$ and message space $\mathcal{X}$). These are assumed to be publicly accessible when involved in the game. An IC accepts both encryption and decryption queries with a chosen key.
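The remark that an RO can be implemented with lazy sampling can be illustrated by the following minimal sketch. The class name `RandomOracle` and the restriction to byte-aligned output lengths are assumptions made for illustration.

```python
import secrets


class RandomOracle:
    """Lazily sampled random function from byte strings to n-bit outputs."""

    def __init__(self, n_bits: int):
        if n_bits % 8 != 0:
            raise ValueError("this sketch assumes n is a multiple of 8")
        self._n_bytes = n_bits // 8
        self._table = {}  # maps each queried input to its sampled output

    def query(self, x: bytes) -> bytes:
        # Sample a fresh uniform output the first time x is queried,
        # and answer consistently on every later query.
        if x not in self._table:
            self._table[x] = secrets.token_bytes(self._n_bytes)
        return self._table[x]
```

Only the inputs that are actually queried are ever sampled, so the table stays finite even though the domain is the set of all strings.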

Attack/leakage models for LR cryptography
Data and trace bounded attacker. In an $m$-bounded data complexity model, the attacker can call the underlying symmetric primitive with $m$ different plaintexts using an identical secret key [MR04, MSJ12]. If the data complexity bound $m$ is sufficiently small, we expect that the attacker cannot obtain sufficient information about the secret intermediate variable (e.g., Sbox output) and secret key. Thus, a key-recovery SCA is (believed to be) difficult if the number of available plaintexts is sufficiently bounded. However, Medwed et al. reported in [MSJ12] that, even if $m = 2$, the attacker can recover the secret key from low-noise devices (e.g., a low-end microcontroller) with a non-trivial success probability. Accordingly, they suggested considering the number of available traces for an attack success. In this paper, we suppose that an $m$-trace bounded attacker can utilize not more than $m$ traces. In the rekeying context, $m$ corresponds to the rekeying interval. Note that a malicious attacker may call the primitive with a fixed input many times, especially if attacking the decryption module of a nonce/IV-based (authenticated) encryption scheme. This observation means that we should consider how to implement a symmetric primitive under $m$-bounded trace complexity in practice (which may require considering a high-level protocol). LR4 handles this consideration, as discussed in Section 3.2.

Bounded leakage function.
In LR cryptography, secret information (e.g., temporal key and internal value/state) leaks through each call of the underlying symmetric primitive. Let $s^{(i)}$ be the $i$-th state (which may include the $i$-th temporal key). At the $i$-th call, information about $s^{(i)}$ is leaked as $L_i(s^{(i)})$ for each $i$, where $L_i$ is the $i$-th leakage function. From practical constraints, we consider a non-adaptive leakage model: a fixed $L$ exists such that $L = L_i$ for any $i$ [SPY+10]. The leakage function is given by, for example, some bits of $s^{(i)}$ and its Hamming weight with noise; or, it is defined using the number of leaked bits so that a state leaks at most $\lambda$ bits. Note that the leakage function is usually bounded somehow. The attacker trivially wins if he/she gets all the information about the secret key or the intermediate value/state through leakage.
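As a concrete instance of the "Hamming weight with noise" example, a fixed (non-adaptive) leakage function can be sketched as below. The Gaussian noise model and the parameter name `sigma` are assumptions for illustration; the text above does not fix a noise distribution.

```python
import random


def hamming_weight(x: int) -> int:
    """Number of set bits in a non-negative integer state."""
    return bin(x).count("1")


def leak(state: int, sigma: float, rng: random.Random) -> float:
    """Fixed leakage function L: Hamming weight of the state plus Gaussian noise."""
    return hamming_weight(state) + rng.gauss(0.0, sigma)


# The same L is applied to every state s^(i), matching the non-adaptive model.
rng = random.Random(2024)
traces = [leak(s, sigma=1.0, rng=rng) for s in (0x00, 0x0F, 0xFF)]
```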

Accumulated interference (AI).
Related to the bounded leakage function, Dobraunig et al. presented the concept of Accumulated Interference (AI) [DMP22] in 2022, which models leakages of permutation-based LR-AEs through both SCA and fault attacks. This model allows evaluating the attacker's advantage by the accumulated gain (AG). AG is defined using an input dataset for a given SCA, and AI is related to the trace complexity bound. However, AG has been evaluated only experimentally and empirically for specific SCAs; therefore, the evaluated advantage value cannot be an upper bound regarding all possible (even theoretically optimal) SCAs, which is essential for leakage resilience. Although an asymptotic approximation of the attacker's advantage for a given SCA is important to evaluate the SCA resistance of a device in some applications, a theoretical upper-bound evaluation is essential for both the theory and practice of LR cryptography. Dobraunig et al. then presented an LR encryption scheme, asakey, and its variant, strengthened asakey, based on AI/AG. In particular, the variant utilizes a caching strategy similar to LR4 for improved leakage resilience. In Section 3.3, we compare LR4 and asakey to demonstrate the significance of LR4.

Adaptive vs. non-adaptive leakages.
Major existing LR cryptography schemes adopted a non-adaptive leakage model (e.g., [DP10, YSPY10, FPS12, DMP22]). Actually, the practical validity of the non-adaptive leakage model/assumptions was discussed in [SPY+10, YSPY10, FPS12]; power/EM attackers must fix the leakage function before obtaining any leakage/output because of physical constraints of power/EM measurement (e.g., on-board pin/connector and EM probe). Although extreme attackers might move ...

Proposed scheme

Basic concept
As discussed in Section 1, while (fresh) rekeying is a promising approach for LR cryptography, its full practicality is questionable due to the need for a leak-free function. We present a solution to this problem, dubbed LR4, and detail its construction below. Let a positive integer $d$ be the rekeying order. For each $i \in [d]$, let $G_i : \{0,1\}^{n_k} \times \{0,1\}^{n_{\mathrm{ctr}}} \to \{0,1\}^{n_k}$ be a function called a rekeying component, which takes an $n_k$-bit key and an $n_{\mathrm{ctr}}$-bit counter and outputs an $n_k$-bit key. We require that each $G_i$ behave like an independent RO; that is, for any input, the output looks random, and the outputs from the same input to $G_i$ and $G_j$ for $i \neq j$ are uncorrelated. Let $E, D : \{0,1\}^{n_k} \times \{0,1\}^{n_{\mathrm{bc}}} \to \{0,1\}^{n_{\mathrm{bc}}}$ denote the encryption and decryption routines of an $n_{\mathrm{bc}}$-bit block cipher with an $n_k$-bit key. The LR4 rekeying scheme with order $d$ consists of an encryption function LR4.E and a decryption function LR4.D such that LR4.E, LR4.D $: \{0,1\}^{n_k} \times (\{0,1\}^{n_{\mathrm{ctr}}})^d \times \{0,1\}^{n_{\mathrm{bc}}} \to \{0,1\}^{n_{\mathrm{bc}}}$. LR4.E (resp. LR4.D) takes an $n_k$-bit master key $k_{\mathrm{mst}}$, a $d$-tuple of $n_{\mathrm{ctr}}$-bit counters $\mathrm{ctr}^d = (\mathrm{ctr}_1, \mathrm{ctr}_2, \ldots, \mathrm{ctr}_d)$, and an $n_{\mathrm{bc}}$-bit plaintext (resp. ciphertext) as inputs, and outputs an $n_{\mathrm{bc}}$-bit ciphertext (resp. plaintext). Figure 3 shows LR4.E and LR4.D using a temporal key derivation function $R : \{0,1\}^{n_k} \times (\{0,1\}^{n_{\mathrm{ctr}}})^d \to \{0,1\}^{n_k}$. $R$ takes an $n_k$-bit master key $k_{\mathrm{mst}}$ and a $d$-tuple of $n_{\mathrm{ctr}}$-bit counters $\mathrm{ctr}^d$ as inputs, and generates a temporal (or session) key $k_{\mathrm{tmp}}$ (as defined in Figure 3). See also Figure 2 for illustration.
For LR security, we assume that each $G_i$ does not leak anything up to $m \le 2^{n_{\mathrm{ctr}}}$ encryption calls with the same key under SCAs. Similarly, we assume that $E$ does not leak up to $m'$ encryption and decryption calls with the same temporal key. See Section 4 for the formal treatment and Section 5 for how to determine $m$ (and $m'$).
A sound SCA countermeasure should increase the number of traces for an attack success by increasing the security parameter(s). LR4 can generate $m^d$ temporal keys securely under the $m$-bounded trace complexity model. Thus, the number of secure $E$ calls increases exponentially by $d$, from $m$ to $m^d$, under the bounded trace complexity model (although $m$ should be determined depending on $d$, as discussed in Section 5). Formal and rigorous security with a leakage model is defined and proven in Section 4. Note that the proposed scheme does not contribute to the decrease of an SCA success rate, although we confirm the exponential increase of the number of secure $E$ calls under some practical conditions as demonstrated in Section 6.1. Thus, the proposed scheme can be another direction for provably secure SCA countermeasures apart from masking.
LR4 has a structural similarity to the classical GGM PRF, as the temporal key generation is represented by an $m$-ary tree with nodes of counter value, as shown in Figure 4. As mentioned in Section 1, GGM PRF has also been adopted by several LR-PRFs [FPS12, MSJ12, MS14], although our objective (an LR rekeying scheme without an (unbounded) leak-free component) makes LR4 different from these LR-PRFs in terms of the interface and the security notion under leakage. See Appendix A for GGM and existing LR-PRFs.
On implementation efficiency. Compared to the existing related LR cryptography (e.g., GGM, the LR-PRFs described in Appendix A, and asakey), LR4 achieves an advantage of high-rate construction with provable security. For example, asakey, which is a state-of-the-art nonce-based and sponge-based LR encryption scheme in 2022 [DMP22], has a nonce-processing part with bit-by-bit absorption (represented by a binary tree like GGM) for its leakage resilience. Due to the leakage function definition, asakey is forced to use a nonce-processing part with a very low rate (i.e., 1-bit absorption per permutation) for provable security with leakage resilience. In other words, it is difficult to achieve a high-rate construction with provable security under its leakage function, as is also the case for the existing LR-PRFs. In contrast, LR4 is the first GGM-like LR scheme that achieves both provable security and a high rate. For example, if we instantiate ROs using SHA-3 as $G_i(k_i, \mathrm{ctr}_i) = \text{SHA-3}(k_i \,\|\, \mathrm{ctr}_i \,\|\, i)$, LR4 readily absorbs more than 128 bits with the temporal key using only one SHA-3 computation, whereas asakey requires $n$ permutations to absorb an $n$-bit nonce. See Section 3.3 for quantitative comparisons.
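The temporal key derivation $R$ can be sketched as a chain of rekeying components, assuming that the chain starts from the master key and applies $G_1, \ldots, G_d$ in order (Figure 3 gives the authoritative definition). The concrete choices here are assumptions for illustration: SHA3-256 truncated to 128 bits for each $G_i$, a 3-byte big-endian counter encoding, and a 1-byte encoding of the index $i$ for domain separation.

```python
import hashlib

N_K_BYTES = 16    # n_k = 128 bits (assumed parameter)
N_CTR_BYTES = 3   # enough to hold a 20-bit counter (assumed encoding)


def G(i: int, key: bytes, ctr: int) -> bytes:
    """Rekeying component G_i as a hash of key || ctr || i, truncated to n_k bits."""
    data = key + ctr.to_bytes(N_CTR_BYTES, "big") + i.to_bytes(1, "big")
    return hashlib.sha3_256(data).digest()[:N_K_BYTES]


def R(k_mst: bytes, ctrs: tuple) -> bytes:
    """Derive k_tmp by chaining G_1, ..., G_d over the counters (ctr_1, ..., ctr_d)."""
    key = k_mst
    for i, ctr in enumerate(ctrs, start=1):
        key = G(i, key, ctr)
    return key


k_tmp = R(bytes(N_K_BYTES), (0, 0, 1))  # example with d = 3
```

Appending the index `i` keeps the outputs of different components unrelated even when they receive the same key and counter, which mirrors the requirement that the $G_i$ behave like independent ROs.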

Caching intermediate keys for improved SCA security and computational efficiency
We propose that the computation of LR4 utilizes caching of all intermediate keys as long as they can be used later in order to improve the computational efficiency and SCA security.
For example, if we increment a counter from 0 to 1 in Figure 2, $k_{2,1}$ ($k_{i,j}$ denotes the $j$-th intermediate/temporal key of $k_i$), which has been computed by processing counter 0, is cached to derive $k_{3,2}$ for counter 1, and the computing device releases it when the counter becomes 3. This is essential because re-computation of an intermediate key would lead to unexpected side-channel leakage violating the trace bound (see also Remark 1).
Figure 5 shows the cache-based temporal key derivation $R_C$. Compared with Figure 2 and Figure 3, we require an additional counter $\mathrm{ctr}_{d+1}$ to guarantee that $E$ is called not more than $m'$ times with an identical $k_{\mathrm{tmp}}$ (note that the value of $\mathrm{ctr}_{d+1}$ has no influence on the output, except for $\bot$). Here, we consider a total counter value as $\sum_{i=1}^{d+1} 2^{n_{\mathrm{ctr}}(i-1)} \, \mathrm{ctr}_{d+2-i}$. This indicates that the counter is incremented from $\mathrm{ctr}_{d+1}$. In $R_C$, we first check the validity of input counters to detect counter replay attacks at Line 1. If the counter value is smaller than the cached one (i.e., replayed value) or out of range (i.e., $\mathrm{ctr}_i \ge m$ for $1 \le i \le d$ or $\mathrm{ctr}_{d+1} \ge m'$), we abort the encryption/decryption. Otherwise, we compute the temporal key from the counters and master key with minimal computations. If $\mathrm{ctr}_i = \mathrm{ch}_i$ and $\mathrm{flag} = 0$, the computation is omitted and the cached key is used to avoid the leakage (where $\mathrm{flag}$ represents the necessity of computation). Note that $\mathrm{ch}^{d+1}$ and $\mathrm{ctr}^{d+1}$ are known to the attacker, which indicates that the side-channels (e.g., power/EM and timing) of the if branches in $R_C$ leak no secret information.
Latency and computational efficiency. The caching strategy also improves the latency and computational efficiency, as stated in Proposition 1. This is a substantial improvement over the straightforward computation requiring $d$ calls of $G$.
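The validity check and the cache reuse described above can be sketched as follows. This is not the algorithm of Figure 5 itself but a simplified illustration under stated assumptions: the class name `CachedLR4`, the hash-based `G`, the eager initialization of the cache, and the `served` flag (used so that an already-served counter tuple is rejected as a replay) are all hypothetical choices.

```python
import hashlib


def G(i: int, key: bytes, ctr: int) -> bytes:
    """Stand-in rekeying component (assumption: truncated SHA3-256)."""
    data = key + ctr.to_bytes(4, "big") + i.to_bytes(1, "big")
    return hashlib.sha3_256(data).digest()[:16]


class CachedLR4:
    def __init__(self, k_mst: bytes, d: int, m: int, m_prime: int):
        self.d, self.m, self.m_prime = d, m, m_prime
        self.ch = [0] * (d + 1)            # cached counters ch_1, ..., ch_{d+1}
        self.kh = [k_mst]                  # kh_1 = k_mst
        for i in range(1, d + 1):          # kh_{i+1} = G_i(kh_i, 0)
            self.kh.append(G(i, self.kh[-1], 0))
        self.served = False                # has the cached counter tuple been used?
        self.g_calls = 0                   # G calls made after initialization

    def derive(self, ctrs):
        """Return k_tmp for (ctr_1, ..., ctr_{d+1}), or None to signal abort."""
        ctrs = list(ctrs)
        if len(ctrs) != self.d + 1:
            return None
        if any(not 0 <= c < self.m for c in ctrs[:-1]) or not 0 <= ctrs[-1] < self.m_prime:
            return None                    # out of range
        # Lexicographic order equals the total counter order (ctr_1 most significant).
        if ctrs < self.ch or (ctrs == self.ch and self.served):
            return None                    # replayed counter
        recompute = False
        for i in range(1, self.d + 1):
            # Recompute only from the first counter that differs from the cache.
            if recompute or ctrs[i - 1] != self.ch[i - 1]:
                recompute = True
                self.kh[i] = G(i, self.kh[i - 1], ctrs[i - 1])
                self.g_calls += 1
        self.ch, self.served = ctrs, True
        return self.kh[self.d]
```

Because all branch conditions depend only on public counters, the control flow in this sketch reveals nothing about the keys, in line with the note above.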
Figure 5: Cache-based temporal key derivation $R_C(\mathrm{ctr}^{d+1}; \mathrm{kh}^{d+1}, \mathrm{ch}^{d+1})$ (algorithm listing; Line 12 returns $k_{\mathrm{tmp}}$). LR4 with cache invokes $R_C$ instead of $R$. The caches $\mathrm{ch}^{d+1}$ and $\mathrm{kh}^{d+1}$ are initially given as $\mathrm{ch}^{d+1} = (0, 0, \ldots, 0)$, $\mathrm{kh}_{i+1} = G_i(\mathrm{kh}_i, 0)$ for each $1 \le i \le d$, and $\mathrm{kh}_1 = k_{\mathrm{mst}}$. Here, $\mathrm{kh}_i$ should be called at Line 9 only when it is actually used.
Proposition 1. For a rekeying interval $m$, we need to run $G$ only $1 + 1/(m - 1) < 2$ times on average.
Proof. Let $d$ and $m$ denote the rekeying order and interval, respectively. By definition, LR4 takes at most $m^d$ distinct counter values. The average number of $G$ calls for processing these $m^d$ values is given by Equation (1). Let $S_d := \sum_{i=1}^{d} i\,m^{-i}$. Taking the sum of the geometric progression, we obtain a closed form for $S_d$. Hence, the right-hand side of Equation (1) is bounded by $1 + 1/(m - 1)$. This completes the proof.
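The bound in Proposition 1 can be sanity-checked by a direct count. The following is a sketch under the assumption that, with caching, $G_i$ is invoked exactly once per distinct counter prefix $(\mathrm{ctr}_1, \ldots, \mathrm{ctr}_i)$ when all $m^d$ counter values are processed in increasing order; it is not necessarily the derivation used in the proof, which goes through $S_d$.

```latex
% G_i is called once per prefix (ctr_1, ..., ctr_i), i.e., m^i times in total.
\frac{1}{m^{d}} \sum_{i=1}^{d} m^{i}
  \;=\; \sum_{j=0}^{d-1} m^{-j}
  \;=\; \frac{1 - m^{-d}}{1 - m^{-1}}
  \;<\; \frac{m}{m-1}
  \;=\; 1 + \frac{1}{m-1}.
```

Since the left-hand side is strictly smaller than $m/(m-1) \le 2$ for every $m \ge 2$, the average number of $G$ calls per temporal key stays below 2 regardless of the rekeying order $d$.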
Memory overhead. The cache-based LR4 requires a non-volatile memory (NVM) to cache the intermediate keys. The memory overhead is given by $(n_k + n_{\mathrm{ctr}})(d + 1)$ bits. For a practical parameter of $n_k = 128$ and $n_{\mathrm{ctr}} = 20$ (see Section 6.1), the memory overhead is $148(d + 1)$ bits (e.g., 888 bits (= 111 bytes) when $d = 5$), which is efficient and sufficiently practical, compared to existing ones (see Section 3.3). For example, even low-end microcontrollers such as the Atmel AVR Xmega128D series, which may be a major target of SCA countermeasures, have a 16K-128K byte flash memory and 1K-2K byte EEPROM. The memory overhead of LR4 is less than 10% of the very low-end ones. Thus, we confirm the practicality of cache-based LR4.
Explicit synchronization and replay detection. As the nonce (i.e., counters) forms a total order and LR4 caches its internal counters, LR4 detects replayed queries by comparing the orders of query and internal counters at the validity check at Line 1 in $R_C(\mathrm{ctr}^{d+1}; \mathrm{kh}^{d+1}, \mathrm{ch}^{d+1})$, before performing encryption/decryption. So, the attacker cannot perform any replay (for trace averaging and trace complexity bound violation). In contrast, if receiving a forwarded nonce (maliciously or accidentally), the LR4 module updates the internal counter to the nonce at Line 11 (as with a valid counter) and performs encryption/decryption using the temporal key corresponding to the values. This is because such counter-forwarding never yields a violation of the trace bound. Thus, LR4 offers secure explicit synchronization for the cases of malicious/accidental synchronization failures. Note that preventing denial-of-service attacks that exhaust the key lifetime by counter forwarding remains an open problem, while LR4 can prevent key-recovery SCAs.
Figure 6: The rekeying function of asakey. $K$ is a master key, and $p$ is an underlying permutation. $N_1, \ldots, N_k$ are one-bit split nonces where an input nonce $N$ is written as $N = N_1 \,\|\, \cdots \,\|\, N_k$.
Remark 1 (SCA security and potential threat). This paper focuses on a countermeasure against non-invasive power/EM analysis, in which any cached key does not leak unless it is called [MOP07]. We consider the trace complexity bound as how many times the cached key is called; thus, the caching strategy essentially improves the SCA security of LR4.
In contrast, a (semi-)invasive attacker may utilize a static leakage directly from memory components [MOP07, SSAQ02]. Such a (semi-)invasive attacker attempts to directly read secret/cached keys to bypass SCA countermeasures. For such cases, the number of calls and trace complexity bound are no longer meaningful. Such (semi-)invasive attacks are outside the scope, as in many existing studies on LR cryptography and SCA countermeasures. (Semi-)invasive attacks should be prevented by, for example, tamper-resistant memory and memory encryption, rather than SCA countermeasures.

Comparison with state-of-the-art
The high rate and leakage function of LR4 yield an efficient implementation with provable security, in comparison with existing LR cryptography such as asakey [DMP22] and ISAP [DEM+20]. asakey and ISAP are sponge-based encryption and AE, respectively, and thus they are functionally different from LR4. However, LR4's temporal key derivation function $R$ and asakey/ISAP's rekeying function are functionally compatible; therefore, we compare them. As depicted in Figure 6, asakey uses a nonce-processing part as a rekeying function, which consists of a bit-by-bit absorption of the nonce, followed by a sponge-based encryption. The intermediate state derivation in the nonce absorption of asakey is representable as a binary GGM. The strengthened asakey, which is a cache-based variant with an up-counter nonce, caches the intermediate states during the nonce processing part.
To validate the effectiveness of LR4, we show its comparison to the strengthened asakey.
We instantiate the strengthened asakey using Keccak-p[1600, 12] with a 128-bit key according to [DMP22], while we instantiate the RO in LR4 using Keccak-p[1600, 24] (i.e., SHA-3). We mainly discuss (strengthened) asakey in the following paragraphs because ISAP has a rekeying function similar to that of asakey. Note that ISAP has more parameters than asakey, such as the length of nonce absorption per permutation call and the number of rounds of the underlying permutations. However, all instances of ISAP set the length of nonce absorption to one, which is the same as asakey. The number of rounds of the underlying permutations varies in each instance (mainly 1 round or 12 rounds), but we leave out the schemes using 1-round permutations from our comparison since we would like to focus on the schemes having provable security (see also the last paragraph in the comparison of computational cost/latency).

Memory overhead.
The strengthened asakey requires an NVM to cache all its internal states, whose bit length is the product of the permutation length (1,600 here) and key/nonce length. In the above instantiation, the memory size of asakey is $1600 \times 128 = 204800$ bits. Although it can be reduced by limiting the number of calls by $m$, the required NVM is given by $1600 \log m$ bits, which is still a very high cost for practical $m$. In contrast, cache-based LR4 requires an NVM of only $(n_k + n_{\mathrm{ctr}})(d + 1)$ bits, which is far smaller for realistic $d$ (e.g., 888 bits for $d = 5$ with a practical parameter of $n_k = 128$ and $n_{\mathrm{ctr}} = 20$). Thus, LR4 can be adopted even on low-end microcontrollers with a 16K-128K byte flash memory and 1K-2K byte EEPROM, as mentioned before.
Computational cost/latency. Moreover, to improve computational cost and latency, asakey and ISAP specify instantiations that use reduced-round permutations. For example, two instances of ISAP employ a 1-round Keccak/Ascon permutation in the nonce processing part except for the first and the last permutations. This significantly reduces computation cost and latency. To the best of our knowledge, no practical weaknesses have been reported in these instances. However, in the asakey/ISAP security proofs, the underlying primitive should be a public random permutation, which is interpreted as the non-existence of a structural distinguisher in practice. The use of a 1-round permutation implies a deviation from this assumption.
Provable security. The LR notions of asakey and LR4 share a common principle, namely a distinguishing game involving a leaky oracle, a non-leaky oracle, and an idealized primitive oracle, while asakey's leakage model and ours are different and incomparable. Our model captures features of practical SCA countermeasures (e.g., higher-order masking) and is dedicated to tree-based rekeying schemes. As Section 4.3 shows, LR4 can be used as a replacement for leak-free/fresh-rekeying components in some of the existing LR-AEs (e.g., [Men20]), which helps make these schemes real.
We hereafter compare LR4 to ISAP's rekeying function (which is similar to that of asakey). ISAP uses lightweight primitives to overcome its low rate. In fact, two schemes of the ISAP family use the 1-round permutation of Ascon-p/Keccak-p[400] as a primitive in their rekeying functions. However, in the asakey/ISAP security proofs, the underlying primitive should be a random permutation, which is far from a 1-round permutation. Thus, its rekeying function cannot avoid a huge gap between provable security and practical construction. In contrast, LR4 has both provable security and efficiency.
Leakage evaluation. In [DMP22], asakey's leakage resilience was evaluated for specific SCAs (e.g., CPA) to derive the AG value. In other words, asakey's leakage resilience can be evaluated for SCAs feasible by the evaluator/designer. However, an advanced attacker may mount stronger SCAs, which makes asakey's practical security unclear. In contrast, LR4's security proof and leakage evaluation method (in Section 5.2) consider theoretically optimal SCAs (i.e., the most advanced SCA attacker) and capture the practical aspects of (higher-order) masking. Thus, LR4's leakage resilience includes practical security and covers all possible SCA attackers, including the one evaluated in [DMP22].
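The memory figures quoted above follow from simple arithmetic, reproduced in the sketch below. The helper names are hypothetical; the formulas and parameter values are the ones stated in the text.

```python
def lr4_cache_bits(n_k: int, n_ctr: int, d: int) -> int:
    """Cache size of LR4: (n_k + n_ctr) * (d + 1) bits."""
    return (n_k + n_ctr) * (d + 1)


def asakey_cache_bits(perm_bits: int, nonce_bits: int) -> int:
    """Cache size of strengthened asakey: one permutation state per nonce bit."""
    return perm_bits * nonce_bits


lr4_bits = lr4_cache_bits(n_k=128, n_ctr=20, d=5)                 # 888 bits
asakey_bits = asakey_cache_bits(perm_bits=1600, nonce_bits=128)   # 204800 bits
print(lr4_bits, lr4_bits // 8, asakey_bits)
```

With these parameters, the asakey cache is more than two orders of magnitude larger than the LR4 cache.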

Security definition
We introduce a formal security notion under leakage for rekeying schemes, including LR4. The core idea of our security model is the same as that of [BMOS17]. We define the security of an LR rekeying scheme as the probability that an adversary querying some leakage oracles successfully distinguishes two worlds: real and ideal. Following Mennink [Men20], we model a rekeying scheme as a TBC, where the tweak corresponds to the IV ($r$ in Figure 1) of the rekeying scheme. Our model allows the adversary to choose tweaks arbitrarily in the game and hence is more general than assuming that the tweak is a random value or a counter (although a practical choice would be a counter). In the real world, the adversary accesses LR4.E and LR4.D, whereas in the ideal world it accesses a TURP $P$ with tweak space $(\{0,1\}^{n_{ctr}})^{d}$ and message space $\{0,1\}^{n_{bc}}$. Regarding leakage oracles, we define LR4-L.E and LR4-L.D as those of LR4.E and LR4.D, respectively; we detail them later. We also assume that $G_1, G_2, \ldots, G_d$ are independent ROs, and that $E$ and $D$ form an IC. We define the leakage-resilience of Fresh Rekeying (LFR) advantage for the security of LR4 as

$$\mathrm{Adv}^{\mathrm{LFR}}_{\mathrm{LR4}}(A) := \left| \Pr\left[ A^{\mathrm{LR4}^{\pm},\, \mathrm{LR4\text{-}L}^{\pm},\, G,\, E^{\pm}} = 1 \right] - \Pr\left[ A^{P^{\pm},\, \mathrm{LR4\text{-}L}^{\pm},\, G,\, E^{\pm}} = 1 \right] \right|,$$

where LR4$^{\pm}$ denotes the pair of oracles LR4.E and LR4.D, and LR4-L$^{\pm}$ denotes LR4-L.E and LR4-L.D. Also, $P^{\pm}$ (resp. $E^{\pm}$) denotes $P$ (resp. $E$) and its inverse $P^{-1}$ (resp. $D$), and $G := \{G_1, G_2, \ldots, G_d\}$. We call LR4$^{\pm}$ and $P^{\pm}$ construction oracles and call queries to them construction queries. Similarly, we call LR4-L$^{\pm}$ the leakage oracle and call queries to it leakage queries. The use of idealized primitives, such as ROs, is common in the theoretical analysis of LR schemes, particularly for obtaining efficient schemes [YSPY10, BGP+19, DJS19, DM19, GSWY20, FPS12]. See also [BBC+20] for an overview and discussion.
Leakage oracle. We here define LR4-L$^{\pm}$ to capture the $m$-bounded trace complexity model introduced in Section 3.1, including the key caching shown in Section 3.2. We assume that LR4-L.E and LR4-L.D have the same input/output as LR4.E and LR4.D, except that they additionally output leakage $\mathsf{Leak} \in (\{0,1\}^{n_k} \cup \{\bot\})^{d+1}$. For the definition of $\mathsf{Leak}$, recall that we assume each rekeying component and the block cipher can securely perform $m$ and $m'$ calls with the same key, respectively. We also assume $m = 2^{n_{ctr}}$ to simplify the proof. To capture this leakage assumption, we define LR4-L.E and LR4-L.D so that they leak any overused key, as detailed below. We first assume that the leakage oracle records all intermediate and temporal key values of the underlying $G$ and $E$ that appeared in the queries to the leakage oracle. For a query to the leakage oracle, when the first $i \in [d]$ counters $(ctr_1, \ldots, ctr_i)$ are the same as in some previous query, the leakage oracle merely refers to the memorized value of the $(i+1)$-th intermediate key. When $G_i$ is invoked with the same key (e.g., $k_i \in \{0,1\}^{n_k}$) more than $m$ times, we define $k_i$ as an overused key and set the $i$-th element of $\mathsf{Leak}$ to its value $k_i$; otherwise, we set it to $\bot$. Similarly, when $E$ is invoked with the same temporal key $k_{d+1}$ $m'$ times, we define $k_{d+1}$ as an overused key and set the $(d+1)$-th element of $\mathsf{Leak}$ to $k_{d+1}$; otherwise, we set it to $\bot$. We also write $\mathsf{Leak} = \bot$ to mean that no keys are overused, i.e., $\mathsf{Leak} = \bot = (\bot, \bot, \ldots, \bot)$. Here, our model accounts for the side-channel leakage with $m$ traces during the computation of all intermediate values and outputs. Thus, our leakage model and proof consider the best possible SCAs (including the optimal SCA in Section 5.1).
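The bookkeeping described above can be illustrated with a small Python sketch. It is a toy model under several assumptions that are not taken from the paper: the random oracles $G_i$ are replaced by truncated SHA-256, the payload cipher $E$ is not modeled at all (only its call counter is), the class and method names are hypothetical, and the temporal key is reported as overused once it has been used more than $m'$ times.

```python
import hashlib
from collections import defaultdict


class ToyLeakageOracle:
    """Toy model of the key-overuse bookkeeping of a leakage oracle (illustrative only)."""

    def __init__(self, master_key: bytes, d: int, n_ctr: int, m_prime: int):
        self.master_key = master_key
        self.d = d
        self.m = 2 ** n_ctr            # m = 2^{n_ctr}, as assumed in the text
        self.m_prime = m_prime
        self.cache = {}                # counter prefix (ctr_1..ctr_i) -> memorized next key
        self.g_uses = [defaultdict(int) for _ in range(d)]  # G_i invocations per key
        self.e_uses = defaultdict(int)                       # E invocations per temporal key

    @staticmethod
    def _g(i: int, key: bytes, ctr: int) -> bytes:
        # Stand-in for the random oracle G_i (assumption: truncated SHA-256).
        return hashlib.sha256(bytes([i]) + key + ctr.to_bytes(4, "big")).digest()[:16]

    def query(self, ctrs):
        """Return (temporal key, Leak) for one leakage query with d counters."""
        assert len(ctrs) == self.d
        key, path = self.master_key, []
        for i, ctr in enumerate(ctrs):
            path.append(key)                   # input key of the (i+1)-th component
            prefix = tuple(ctrs[: i + 1])
            if prefix not in self.cache:       # a cached prefix does not re-invoke G
                self.g_uses[i][key] += 1
                self.cache[prefix] = self._g(i + 1, key, ctr)
            key = self.cache[prefix]
        self.e_uses[key] += 1                  # one payload call with the temporal key
        # None plays the role of the symbol "bottom" (no overused key at that depth).
        leak = [k if self.g_uses[i][k] > self.m else None for i, k in enumerate(path)]
        leak.append(key if self.e_uses[key] > self.m_prime else None)
        return key, leak


oracle = ToyLeakageOracle(b"\x00" * 16, d=2, n_ctr=1, m_prime=2)
for _ in range(3):
    temporal_key, leak = oracle.query((0, 0))
print(leak[:2], leak[2] == temporal_key)
```

In this toy model, caching means that $G_i$ is invoked at most once per distinct counter value under a given key, so an intermediate key can only become overused if two different paths of the tree produce the same key.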
Query rules. We assume the adversary can query the same counters up to $m'$ times in the leakage queries to prevent a trivial leak of a temporal key. Note that there is no restriction on repeating the same counters in the construction queries. Also, we suppose the adversary does not perform repeating/replaying and forwarding queries; that is, it does not repeat a query across different oracles or to the same oracle. In construction queries, we assume that the adversary does not query $(ctr, C)$ to LR4.D after querying $(ctr, M)$ to LR4.E and obtaining $C$, and vice versa. The same assumption applies to the leakage queries. Note that the adversary can query any oracle in any order and can query counters in any order in construction and leakage queries.
Relation to existing leakage models. Our security notion under leakage is defined with a distinguishing game consisting of real and ideal worlds involving leaky and non-leaky (classical) oracles, where the former is present in both worlds. The latter performs real encryption in the real world, whereas in the ideal world it is idealized and always returns random values. We also allow the adversary to query primitive oracles ($G$ and $E$) in both worlds. This framework itself is identical to those in the literature [BMOS17, KS20, DEM+20, DMP22]. The main difference is the definition of the leakage function (the response of the leaky oracle). A leakage function in existing models is stateless, namely, it does not reflect the query/response history, while ours depends on the previous queries because we care about how many times the same key has been used in each rekeying module. In addition, leakage functions in the existing LR-AEs, such as [BMOS17, KS20], are a direct composition of those defined for internal components (e.g., key/tag derivation functions, encryption function, and message hashing function). The leakage functions for leak-free components (typically key/tag derivation functions) are vacuous, and those for leaky components leak everything about their computation. In the case of sponge-based LR-AEs, the internal primitive is typically a single cryptographic permutation, and the leakage function determines the input and output leakage for every permutation call that occurs in an encryption/decryption query to the leaky oracle. LR4 consists of two components, $G$ and $E$, and defines different leakage functions that depend on the query histories. Thus, our model shares some basic principles with existing works; however, the leakage function is dedicated to capturing what practical protection methods, e.g., high-order masking, aim to achieve.

Security bound for LR4
Theorem 1. Let $A$ be an adversary playing the LFR game. Let $q$ be the total number of construction queries, and $q_L$ be the total number of leakage queries. For $i \in [d]$, we assume $A$ makes $p_i$ queries to $G_i$ and $p_I$ queries to $E$. Then, we have where $p = \sum_{i=1}^{d} p_i$ and $q \le 2^{n_{bc}-1}$. We also assume $q_L \le m' 2^{n_{ctr} d}$.
This theorem shows that LR4 has birthday-bound security regarding the internal key length and almost optimal security regarding the block cipher length since $m'$ is small. Also, the term $(q + q_L)(p + p_I)/2^{n_k}$ indicates the relationship between the upper bounds of the online and offline complexities that the adversary requires to attack LR4.
Proof. We use the H-Coefficient technique [Pat08, CS14] for the proof. See Section B.1 for the technical background. We first define transcripts, i.e., the sets of input/output values of the oracles that the adversary obtains in the LFR game.
and $Q_L$ be the transcripts consisting of the inputs/outputs of the construction oracle, $G_1, \ldots, G_d$, $E$ and $D$, and the leakage oracle, respectively. In detail, we define where $IK_{i,\bullet}$ is the input key; $IV_{i,\bullet}$ is the input counter; and $OK_{i,\bullet}$ is the output key. We also define To simplify the security proof, we assume that, after $A$ finishes its interactions with all oracles, the leakage oracle (LR4-L.E and LR4-L.D) reveals all the keys involved in computing its outputs (the master key, intermediate keys, and temporal keys). Then the construction oracle also reveals the keys involved in its output computations. Note that the construction oracle in the ideal world uses $P$ and $P^{-1}$ and hence has no keys to reveal; instead, it outputs dummy values sampled uniformly at random from $\{0,1\}^{n_k}$. To prevent a trivial win of $A$, the construction oracle (of both worlds) does not reveal the keys already revealed by the leakage oracle. For example, in Figure 4, assume that the adversary queries three counters $(Lctr^d_1, Lctr^d_2, Lctr^d_3) = ((0,0), (1,0), (1,1))$ to the leakage oracle and queries three counters $(ctr^d_1, ctr^d_2, ctr^d_3) = ((0,1), (1,1), (2,0))$ to the construction oracle. In this case, as shown in Figure 7, the leakage oracle reveals $(k_{1,1}, k_{2,1}, k_{2,2}, k_{3,1}, k_{3,4}, k_{3,5})$, and then the construction oracle reveals only $(k_{3,2}, k_{2,3}, k_{3,7})$ since $k_{1,1}$, $k_{2,1}$, $k_{2,2}$, and $k_{3,5}$ are already revealed by the leakage oracle. Let $Q_K$ be the transcript of keys $k^{(a)}_{i,j}$ revealed by the leakage and the construction oracles, where $a$ indicates which oracle reveals the key: $a = 0$ means that the leakage oracle reveals the key, and $a = 1$ means that the construction oracle does. The index $i$ indicates the depth of the key, and $j$ indicates the index of the key among the keys in the $i$-th depth, as shown in Figure 4 and Figure 7. In the case of the above example (i.e., Figure 7), the adversary obtains $Q_K = \{k^{(0)}_{1,1}, k^{(0)}_{2,1}, k^{(0)}_{2,2}, k^{(0)}_{3,1}, k^{(0)}_{3,4}, k^{(0)}_{3,5}, k^{(1)}_{3,2}, k^{(1)}_{2,3}, k^{(1)}_{3,7}\}$.
We introduce four bad events. Roughly, if the transcripts defined above fulfill any bad event, the adversary successfully distinguishes the two worlds with high probability.
Bad1: A collision between the elements of $Q_K$ in the same depth. That is, the event that there exists i ∈ i,j2 . This event also includes the event that $A$ obtains $\mathsf{Leak}$ other than $\bot$.
Bad2: A collision between the revealed key in the $i$-th depth and the input key of $G_i$, where $i \in \{1, \ldots, d\}$. That is, the event that there exists $i \in \{1, \ldots, d\}$, $j \in [2^{n_{ctr}(i-1)}]$, $a \in \{0,1\}$,
Bad3: A collision between the revealed temporal key and the input key of $E$. That is, the event that there exists j ∈
Bad4: A ciphertext (resp. plaintext) collision between construction queries and leakage queries when counters are the same and plaintexts (resp. ciphertexts) are distinct. That is, the event that there exists $i \in [q]$, $j \in [q_L]$, and
An upper bound of $\mathrm{Adv}^{\mathrm{LFR}}_{\mathrm{LR4}}(A)$ corresponds to an upper bound of $p_{bad} := \Pr[\mathsf{Bad1} \cup \mathsf{Bad2} \cup \mathsf{Bad3} \cup \mathsf{Bad4}]$ in the ideal world. This argument holds because the second part of the H-Coefficient technique, the so-called good transcript probability ratio, is lower bounded by 1 (see, e.g., [CS14] for details). We defer the derivation of this part to Appendix B because it is typical for birthday-secure constructions and is tedious but rather straightforward. For readers unfamiliar with the H-Coefficient technique, we refer to [CLS15, Theorem 1] to get an idea of how typical H-coefficient proofs are conducted in the case that the good transcript probability ratio is larger than one.
[Figure 7 caption, continued: Second, the keys circled in blue are revealed (following the tree in the real world and randomly sampled in the ideal world), except those already revealed (i.e., circled in red). Thus, the latter step only reveals $k_{3,2}$, $k_{2,3}$, and $k_{3,7}$.]
We evaluate $p_{bad}$ in the ideal world. We start by evaluating $\Pr[\mathsf{Bad1}]$. For each $i \in [d+1]$, let $nk_i$ be the number of revealed keys in the $i$-th depth. Now, we have $nk_1 = 1$ assuming $q + q_L \neq 0$, and $nk_1 \le nk_2 \le \cdots \le nk_{d+1} \le q + q_L$. In the ideal world, the elements $k^{(1)}_{\bullet,\bullet}$ are chosen uniformly at random and independently, and $k^{(0)}_{\bullet,\bullet}$ are derived from ROs. Thus, we obtain Pr For Bad4, we define $Cnc$ as the number of distinct counters in construction queries, and $ctr^d_1, \ldots, ctr^d_{Cnc}$ as the distinct counters. Let $q_1, \ldots, q_{Cnc}$ be the number of construction queries whose counter is $ctr^d_1, \ldots, ctr^d_{Cnc}$, respectively; thus, $\sum_{i=1}^{Cnc} q_i = q$. Recall that the adversary queries LR4-L with the counter $ctr^d_i$ at most $m'$ times. Assuming that none of Bad1, Bad2, and Bad3 happens, for each $i \in [\tau]$, the probability of a plaintext/ciphertext collision is at most n bc holds. We obtain the bound of Theorem 1 by summing up the probabilities of the four bad events.
Tightness of Theorem 1. The bound in Theorem 1 is tight. We here present two matching distinguishing attacks, which try to invoke the events corresponding to Bad1 and Bad4 defined in the proof.
The first attack shows the tightness of the term $d q_L^2 / 2^{n_E + 1}$, and it corresponds to Bad1. The attacker first repeats queries to LR4-L.E with the same plaintext and distinct counters. With a sufficient number of queries, the attacker can find a key collision as defined in Bad1 with high probability, either by obtaining $\mathsf{Leak}$ other than $\bot$ or by finding collisions among ciphertexts. Once the attacker finds the key collision, it can distinguish the two construction oracles LR4 and $P$ by querying a non-leakage oracle twice with the same plaintext and the counters for which the key collision occurs. If a ciphertext collision occurs, the attacker concludes that it is querying LR4 with high probability; otherwise, it concludes that it is querying $P$. This attack requires $q = O(1)$ and a sufficiently large $q_L \approx O(2^{n_E/2})$ since the probability of the key collision is at most $d q_L^2 / 2^{n_E + 1}$. Note that we can show the tightness of the term $d q^2 / 2^{n_E + 1}$ in the same manner. The attacker repeats construction queries with the same plaintext and distinct counters. If a ciphertext collision occurs for some counters, the attacker can distinguish the two worlds by checking whether the ciphertexts of other plaintexts also collide for these counters.
The second attack shows the tightness of the term $4 m' q / 2^{n_{bc}}$, and it corresponds to Bad4. The attacker repeats construction queries and leakage queries with the same counters and distinct plaintexts. If the attacker is querying the real world, the output ciphertexts cannot collide. However, if it is querying the ideal world, the probability of a ciphertext collision is at most $4 m' q / 2^{n_{bc}}$, as mentioned above.

Applications of LR4
As a primary application of fresh rekeying, Medwed et al. considered a challenge-response (CR) protocol with low-cost devices (e.g., RFID) [MSGR10]. LR4 can be used for CR protocols, utilizing counters instead of random challenge values. A more practically useful application is LR-AE (e.g., [DJS19, KS20, Men20, BGP+19]). However, this line of research has presented various LR-AEs based on different leakage models for different security goals, utilizing different (LR and non-LR) primitives. Pinpointing how known LR-AEs can benefit from our proposal is not easy due to this wide variety of problem settings. In a very general sense, if an LR-AE scheme uses a nonce-based rekeying component that is assumed to be leak-free (e.g., ISAP [DEM+20]), it could be replaced with LR4 (but, again, this depends on the details of the scheme and needs an ad-hoc security analysis).
Some examples. With the aforementioned limitations in mind, we describe example applications of LR4 to existing LR-AEs in more detail. The first is the proposal of Mennink at Asiacrypt 2020 [Men20]. Mennink proposed a class of LR-AEs based on ΘCB (a TBC-based idealized version of OCB) [KR11]. He proposes to instantiate ΘCB by replacing the internal TBC with a fresh rekeying scheme, where the input value $r$ of Figure 1 is used as a tweak. Mennink presented the black-box security of the proposal but did not clearly show what leakage-resilience security would be possible. Still, the crucial point of his argument is that any TBC-based AE can use a fresh rekeying scheme as long as each encryption takes distinct tweaks determined by the nonce and the length of input variables (rather than the value of an input variable itself). If this holds and the nonce is a counter, each tweak is unique and determined incrementally, and we can use LR4 as the underlying TBC of ΘCB efficiently. As a result, the encryption of ΘCB does not leak anything from the TBC up to the bound of Theorem 3. We should point out that Mennink's proposal does not strictly follow the above rule of tweak update in the processing of AD, as a TBC encrypts each AD block without taking a nonce (see, e.g., [Men20, Fig. 5]). However, this can be easily fixed by involving the nonce in each AD block encryption. This fix does not harm the security under the standard model, namely, without leakage. Unlike many existing LR-AEs, the resulting scheme is parallelizable and its rate is one; namely, it needs just one TBC (realized by LR4) call to process one input block, and thus it is quite efficient.
ΘCB has a relatively large state size due to its parallel structure. If we want to reduce the implementation size, serial counterparts such as PFB_plus and PFB$\omega$ by Naito et al. [NSS20] could be used instead. These modes are specifically designed with (higher-order) masking implementations in mind. Romulus [IKMP20], a finalist of the NIST Lightweight Cryptography project [Nat23], could also be used, with a modification to the AD processing similar to the one mentioned above for Mennink's scheme. Meanwhile, some TBC-based LR-AEs, such as HOMA [NSS22] and TEDT [BGP+19], do not follow the above condition on the tweak values used in encryption. LR4 is not suitable for these because the tweak cannot be updated in an incremental manner. Designing efficient LR rekeying schemes (or TBCs) that suit these LR-AEs remains an interesting open problem.
The second example is FGHF [DJS19] or its improvement [KS20]. These LR-AEs are encrypt-then-MAC compositions, where the encryption and MAC functions each use a single-block leak-free PRF. The first PRF is directly replaceable with LR4 as it takes the nonce $N$ as an input (which will be a counter of LR4; the input block of $E$ could be a fixed constant). The second leak-free PRF takes the (key-less) hash value $V$ of the tuple (ciphertext, nonce, associated data) and does not take a nonce $N$. Using LR4, this PRF can be modified so that it takes $V$ as $E$'s input and the next nonce $(N + 1)$ as the counter of LR4. We do not go into the details here; however, a security proof with leakage could be obtained in a manner similar to that of [KS20] (given certain restrictions on the decryption leakage imposed by $m'$).
5 Quantitative success rate evaluation methodology for rekeying schemes

SCA backgrounds and success rate
Notations for SCA. We introduce notations for the discussion of SCA. An uppercase letter (e.g., $X$) denotes a random variable/vector on a set denoted by the corresponding calligraphic character (e.g., $\mathcal{X}$), and a lowercase character (e.g., $x$) denotes an element of the set (i.e., $x \in \mathcal{X}$), unless otherwise defined. Let $\Pr$ be the probability measure and $p$ be the density or mass function. A side-channel trace is defined as $x \in \mathcal{X} \subset \mathbb{R}^{\ell}$, where $\ell$ is the number of sample points. Let $m$ be the number of traces available for an attack. Let $X$ and $T$ be the random variables for the side-channel trace and the $n_b$-bit partial plaintext/ciphertext, respectively. We suppose that the $m$ side-channel traces $X^m = (X_1, X_2, \ldots, X_m)$ and plaintexts/ciphertexts $T^m = (T_1, T_2, \ldots, T_m)$ are independent and identically distributed (i.i.d.). A secret variable utilized in SCA is denoted by $Z$. If we need to specify a secret key $k$, we denote it by $Z^{(k)}$. For example, $Z^{(k)} = \mathrm{Sbox}(T \oplus k)$ and $n_b = 8$ for major AES software implementations.
Optimal SCA. In SCA on symmetric ciphers/primitives, we usually rank the key candidates by a score computed from side-channel traces and partial plaintexts/ciphertexts, and estimate the correct key according to the score. Let $S : \mathcal{K} \times \mathcal{X}^m \times \mathcal{T}^m \to \mathbb{R}$ be a score function and let $\delta_S : \mathcal{X}^m \times \mathcal{T}^m \to \mathcal{K}$ be an SCA distinguisher using $S$. For example, the correlation power analysis (CPA) utilizes the absolute value of Pearson's correlation coefficient as a score function, combined with a leakage function [BCO04]. In contrast, a score function $L$ given by the negative log-likelihood (NLL) of a key candidate under $p_{Z|X}$ is proven to provide an optimal attack, where $p_{Z|X}$ denotes the true conditional probability distribution of $Z$ given $X$. In other words, $\delta_L(X^m, T^m) = \arg\min_k L(k; X^m, T^m)$ is an optimal distinguisher. For a given device, considering such an optimal attack is sufficient to evaluate the SCA resistance (and leakage resilience) against all possible SCAs.
Success rate. The success rate (SR) has been commonly used for evaluating SCA performance and the validity of SCA countermeasures for various cryptographic implementations [SMY09]. The SR obtained from $m$ traces, denoted by $\mathrm{SR}_m$, is defined as the probability that the rank of the correct key $k^*$ becomes one, as

$$\mathrm{SR}_m = \Pr\left[\mathrm{rank}(k^*, m) = 1\right], \qquad (2)$$

where $\mathrm{rank}(k^*, m)$ denotes the correct key rank in an optimal SCA, defined as Here, $\mathbb{1}$ denotes the indicator function. Note that, to simplify the notation, we here omit the inputs $X^m$ and $T^m$ of rank and $L$; therefore, the rank and the NLLs are random variables in this context. A sound SCA countermeasure should exponentially increase the number of traces required to achieve a given SR as the security parameter(s) increase. For example, masking schemes are proven to satisfy this property: under a condition on mutual information, the SR of SCA on masked implementations decreases exponentially with the masking order, which corresponds to an exponential increase in the number of traces required to achieve a given SR [DFS15, IUH22a, MRS22].
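To make the notions of NLL score, key rank, and SR concrete, the following Python sketch estimates the SR empirically on simulated traces. Everything about the setup is an assumption made for illustration and is not taken from the paper: a 4-bit PRESENT S-box (so that there are only 16 key candidates), Hamming-weight leakage with additive Gaussian noise, a template that knows the true noise level, and a rank convention in which ties count against the attacker. Since $T$ is uniform, ranking by the likelihood of the trace given $Z$ yields the same ordering as ranking by $p_{Z|X}$.

```python
import random

# 4-bit PRESENT S-box, used only to keep the key space small (16 candidates).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(v: int) -> int:
    """Hamming weight of an integer."""
    return bin(v).count("1")


def nll(k, traces, texts, noise_std):
    """NLL of key candidate k under a Gaussian template
    (additive constants that do not depend on k are dropped)."""
    return sum((x - hw(SBOX[t ^ k])) ** 2 for x, t in zip(traces, texts)) / (2 * noise_std ** 2)


def rank_of_correct_key(k_star, traces, texts, noise_std):
    ref = nll(k_star, traces, texts, noise_std)
    # Assumed convention: a wrong key with an equal or smaller NLL outranks k_star.
    return 1 + sum(
        1 for k in range(16) if k != k_star and nll(k, traces, texts, noise_std) <= ref
    )


def empirical_sr(m, noise_std, trials=200, seed=0):
    """Fraction of simulated attacks with m traces in which the correct key has rank 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        k_star = rng.randrange(16)
        texts = [rng.randrange(16) for _ in range(m)]
        traces = [hw(SBOX[t ^ k_star]) + rng.gauss(0.0, noise_std) for t in texts]
        hits += rank_of_correct_key(k_star, traces, texts, noise_std) == 1
    return hits / trials


for m in (1, 5, 20, 80):
    print(m, empirical_sr(m, noise_std=2.0))
```

Running the loop with a larger noise level should shift the curve to the right, which is the behavior a countermeasure is expected to amplify.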

SR upper-bound evaluation.
As the true probability distribution is usually unknown and unavailable, it is quite difficult to evaluate the SR of an optimal SCA in practice. Currently, one of the most popular methods for evaluating an optimal SR is to use DL-SCA: the evaluator profiles the device under test (i.e., trains an NN to imitate the true probability distribution $p_{Z|X}$) and repeats an attack with $m$ traces using the trained NN to evaluate $\mathrm{SR}_m$ empirically [ZBHV19, PHJ+19, dCGRP19]. However, such an empirical approach incurs a non-negligible computational cost, and the soundness and validity of the evaluation result are sometimes uncertain due to the NN approximation error, NN hyperparameter variations, and the stochastic aspects of learning. Alternatively, an inequality-based evaluation is sometimes useful for estimating the theoretically achievable SR from a quantity (e.g., mutual information or SNR) for a given device/implementation, as stated in Theorem 2.
Theorem 2 (SR upper-bound [dCGRP19, IUH22a]). Let $I(Z; X)$ be the mutual information between the secret intermediate value $Z$ and the side-channel trace $X$. Let $\mathrm{SR}_m$ be the success rate of SCA with $m$ traces. $\mathrm{SR}_m$ is upper-bounded as

$$\xi(\mathrm{SR}_m) \le m\, I(Z; X), \qquad (3)$$

where $\xi(\mathrm{SR}_m)$ is a function $\xi : [0, 1] \to \mathbb{R}^{+}_{0}$, defined as

$$\xi(\mathrm{SR}_m) = H(K) - (1 - \mathrm{SR}_m) \log\left(|\mathcal{K}| - 1\right) - H_2(\mathrm{SR}_m). \qquad (4)$$

In Equation (4), $H(K)$ denotes the entropy of $K$, $\log$ is the binary logarithm, and $H_2$ is the binary entropy function; namely, $H_2(x) = -x \log x - (1 - x) \log(1 - x)$. Usually, it holds that $H(K) = n_b$ and $|\mathcal{K}| = 2^{n_b}$, where $n_b$ denotes the bit length of a partial secret key targeted by the SCA. Inequality (3) evaluates the SR of an optimal attack, and every SCA (on $Z$) must satisfy Inequality (3). Note that an SR upper-bound conversely represents a lower-bound of the number of traces required to achieve a given SR.
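As a numerical illustration of how such a bound turns into a minimum number of traces, the following Python sketch evaluates $\xi$ and the resulting bound $m \ge \xi(\mathrm{SR})/I(Z;X)$. It assumes the Fano-type form $\xi(\mathrm{SR}) = H(K) - (1 - \mathrm{SR})\log(|\mathcal{K}| - 1) - H_2(\mathrm{SR})$ with $H(K) = n_b$ and $|\mathcal{K}| = 2^{n_b}$, which is consistent with $\xi(1/2^{n_b}) = 0$ and $\xi(1) = n_b$; the function names and the example value of $I(Z;X)$ are hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def xi(sr: float, n_b: int) -> float:
    """Number of bits needed to reach success rate sr on an n_b-bit partial key
    (assumed form, with H(K) = n_b and |K| = 2^{n_b})."""
    return n_b - (1.0 - sr) * math.log2(2 ** n_b - 1) - binary_entropy(sr)


def min_traces(sr: float, n_b: int, mi: float) -> int:
    """Smallest integer m with xi(sr) <= m * mi, where mi = I(Z;X) in bits per trace."""
    return math.ceil(xi(sr, n_b) / mi)


n_b = 8
print(xi(1.0 / 2 ** n_b, n_b))      # approximately 0: a random guess needs no leakage
print(xi(1.0, n_b))                 # n_b bits are needed for certain recovery
print(min_traces(0.5, n_b, 0.125))  # trace lower bound for SR = 1/2 at 0.125 bit/trace
```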

Formalization
In the following, we consider the cache-based LR4. This section presents a methodology to quantitatively evaluate the overall success rate of attacking LR4 (AR for short). Our methodology combines the trace complexity bound and the bounded leakage from the underlying primitive(s) through an information-theoretical SR evaluation to quantify the attack cost and AR, whereas the existing studies utilize only the trace complexity bound as a threshold value for attack success/failure. For this purpose, we consider the bounded leakage as the mutual information $I(Z; X)$, which is a common leakage representation used in many previous studies, and extend Inequality (3) for the numerical evaluation of the attack cost/AR. This unification is essential for the practical usage of LR4 with a guarantee of quantitative security. Hereafter, we refer to the success rates of the overall attack on LR4 and of a partial key recovery as AR and SR, respectively.
In this paper, as a common case, we suppose that the rekeying components of LR4 have a construction similar to an AES-like block cipher; that is, the round function consists of $n_s$ parallel evaluations of an $n_b$-bit Sbox applied to the XOR of key and plaintext (corresponding to SubBytes following AddRoundKey in AES). We also suppose that LR4 instantiates all rekeying components using an identical primitive. Recall that the $d$-th order LR4 consists of $d$ rekeying components with $m$-bounded traces (i.e., a rekeying interval of $m$). Here, the computations of the rekeying components under the $m$-bounded traces model are performed for $\sum_{i=1}^{d} m^{i-1}$ different keys, as the $i$-th rekeying component generates $m^{i}$ different temporal intermediate keys for the $(i+1)$-th rekeying component. This indicates that the attacker has $m^{i-1}$ chances/trials for a key-recovery SCA with $m$ traces on the $i$-th rekeying component. It is sufficient for the attacker to recover at least one key among all the SCA trials. Here, we consider AR as the probability that an attacker achieves the full recovery of at least one intermediate/temporal key of a rekeying component over all possible SCA trials with $m$ traces. Using the rank metric as for SR, AR is formally defined as follows.
Definition 1 (Success rate of SCA on LR4). Let $\mathrm{AR}_{d,m}$ be the probability that an attacker succeeds in at least one full key recovery by attacking the $d$-th order $m$-bounded LR4 instantiated using rekeying component(s) with $n_s$ parallel Sboxes. Using the rank metric, $\mathrm{AR}_{d,m}$ is defined by

$$\mathrm{AR}_{d,m} = \Pr\left[ \bigcup_{i=1}^{d} \bigcup_{j=1}^{m^{i-1}} \bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{i,j,h}, m) = 1 \right], \qquad (5)$$

where $k^*_{i,j,h}$ denotes the $h$-th $n_b$-bit correct partial key of the $j$-th temporal key at the $i$-th rekeying component of LR4.
In Definition 1, the right-most intersection $\bigcap_{h=1}^{n_s}$ means that the attacker should recover a full key from SCA on $n_s$ parallel Sboxes; the center union $\bigcup_{j=1}^{m^{i-1}}$ means that the attacker can mount an $m$-bounded-trace SCA (i.e., a trial) on the $i$-th rekeying component $m^{i-1}$ times with different keys; and the left-most union $\bigcup_{i=1}^{d}$ means that the attacker can mount these trials on each of the $d$ different rekeying components. It is sufficient for the attacker to succeed in at least one full-key recovery among all trials. Hence, we take the union of the events $\bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{i,j,h}, m) = 1$ over $i$ and $j$, whereas the full-key recovery of a trial is represented as the intersection over $h$.
Remark 2 (On payload encryption). LR4 generates $m^{d}$ temporal keys, and we call the payload encryption $m' \times m^{d}$ times in total. Definition 1 considers SCAs on rekeying components, excluding the payload encryption. If we use the same primitive for the payload encryption as for the rekeying components (implying that $m = m'$), it is sufficient to consider $\mathrm{AR}_{d,m}$ for the evaluation including the payload encryption calls, as in the numerical evaluation in Section 6.1. Even if we use a distinct primitive for the payload component, we can readily evaluate its AR in a similar manner by a union of full-key recovery events for the primitive with trace bound $m'$, which is extendable to Theorem 3 below.
Remark 3 (Relation to multi-user security). Our definition is similar to the security notion in the multi-user encryption setting [DLMS14, BT16, LMP17, HTT18, DGGP21, NSSY22], which has been used to determine the rekeying frequency in real-world cryptographic protocols such as (D)TLS and QUIC [Res18, RTM18, TT21]. The security analysis of LR4 is related to cryptanalysis in the multi-user setting. This is because we perform multiple rekeying component evaluations using different keys, which corresponds to the case that multiple users evaluate a block cipher with distinct keys. Note that, in attacking LR4, rekeying component outputs are available only through leakage, unlike in common cryptanalyses. The numbers of queries and users correspond to the trace bound $m$ and the key lifetime (or the number of SCA trials, described below as $\sigma_{d,m}$), respectively. In other words, the above definition and the following theorem(s) are used to evaluate the success rate (or advantage) and the rekeying frequency in a setting similar to multi-user encryption with SCA leakage.

Information-theoretical evaluation
We next formally provide the relation between AR and SR to derive a concrete and quantitative AR evaluation method, under some standard and realistic assumptions.
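Lemma 1 (Relation between AR and SR). Let $\mathrm{AR}_{d,m}$ be the overall success rate of SCA on the $d$-th order $m$-bounded LR4. Suppose that all temporal keys are mutually independent. Suppose that, for rekeying components, the SR of SCA on $n_b$-bit partial key recovery is identical for all Sboxes/partial keys. Let $\mathrm{SR}_m$ be the partial key recovery success rate of an SCA with $m$ traces. It holds that

$$\mathrm{AR}_{d,m} = 1 - \left(1 - \mathrm{SR}_m^{\,n_s}\right)^{\sigma_{d,m}}, \qquad (6)$$

$$\mathrm{SR}_m = \left(1 - \left(1 - \mathrm{AR}_{d,m}\right)^{1/\sigma_{d,m}}\right)^{1/n_s}, \qquad (7)$$

where $\sigma_{d,m} = \sum_{i=1}^{d} m^{i-1}$ denotes the number of SCA trials available for the attacker.
Proof. We first show that Equation (6) holds. Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space. Let $[A]^c$ denote the complement of a set $A \in \mathcal{F}$ (i.e., $[A]^c = \Omega \setminus A$). Let $A^m_{i,j,h}$ denote the event $\mathrm{rank}(k^*_{i,j,h}, m) = 1$. According to De Morgan's law, Equation (5) is transformed into

$$\mathrm{AR}_{d,m} = 1 - \Pr\left[ \bigcap_{i=1}^{d} \bigcap_{j=1}^{m^{i-1}} \left[ \bigcap_{h=1}^{n_s} A^m_{i,j,h} \right]^c \right]. \qquad (8)$$

Here, the events $\bigcap_{h=1}^{n_s} A^m_{i,j,h}$ (i.e., success of full-bit temporal key recovery in a trial) for all $i$ and $j$ are mutually independent, as temporal keys are supposed to be mutually independent owing to the RO. In addition, the rekeying components are executed on an identical device, which indicates that the events $\bigcap_{h=1}^{n_s} A^m_{i,j,h}$ are i.i.d. in terms of $i$ and $j$. Therefore, Equation (8) is followed by

$$\mathrm{AR}_{d,m} = 1 - \prod_{i=1}^{d} \prod_{j=1}^{m^{i-1}} \left( 1 - \Pr\left[ \bigcap_{h=1}^{n_s} A^m_{i,j,h} \right] \right) = 1 - \left( 1 - \Pr\left[ \bigcap_{h=1}^{n_s} A^m_{i,j,h} \right] \right)^{\sigma_{d,m}}.$$

Due to the assumption on the SR of SCA on $n_b$-bit partial key recovery, the events $A^m_{i,j,h}$ for all $h$ are also mutually independent, which indicates that $\Pr\left[ \bigcap_{h=1}^{n_s} A^m_{i,j,h} \right] = \prod_{h=1}^{n_s} \Pr\left[ A^m_{i,j,h} \right]$ holds for any $i$ and $j$. In addition, owing to the assumptions, we consider an identical SR for all $i$, $j$, and $h$ because $\mathrm{SR}_m = \Pr[A^m_{i,j,h}]$ as in Equation (2). Thus, we conclude $\mathrm{AR}_{d,m} = 1 - (1 - \mathrm{SR}_m^{\,n_s})^{\sigma_{d,m}}$, as required. Equation (7) is derived from Equation (6) as $(1 - \mathrm{AR}_{d,m})^{1/\sigma_{d,m}} = 1 - \mathrm{SR}_m^{\,n_s}$, and finally we conclude $\mathrm{SR}_m = \left(1 - (1 - \mathrm{AR}_{d,m})^{1/\sigma_{d,m}}\right)^{1/n_s}$.
Lemma 1 states how large an SR is required to achieve a given AR, and Equation (7) is used for deriving the SR corresponding to a given AR. In designing a cryptographic module with SCA countermeasure(s), an acceptable AR is determined in advance. For LR4, the parameters (rekeying order $d$ and interval $m$) should be determined for a required AR and a given device with mutual information $I(Z; X)$ as a leakage amplitude (or an SNR, which upper-bounds $I(Z; X)$ via the Shannon-Hartley theorem). For the evaluation, we introduce an upper-bound of AR as Theorem 3.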
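The conversion between AR and SR can be sketched in a few lines of Python. The sketch assumes the closed forms that follow from the independence argument of the proof, namely $\mathrm{AR}_{d,m} = 1 - (1 - \mathrm{SR}_m^{n_s})^{\sigma_{d,m}}$ and its inverse; the function names are hypothetical, and `log1p`/`expm1` are used only to keep very small probabilities numerically stable.

```python
import math


def sigma(d: int, m: int) -> int:
    """Number of SCA trials: sum of m^(i-1) for i = 1..d."""
    return sum(m ** (i - 1) for i in range(1, d + 1))


def ar_from_sr(sr: float, d: int, m: int, n_s: int) -> float:
    """Overall success rate AR for a given per-Sbox success rate sr."""
    full_key = sr ** n_s                 # all n_s partial keys of one trial recovered
    if full_key >= 1.0:
        return 1.0
    # 1 - (1 - full_key)^sigma, computed stably for tiny full_key
    return -math.expm1(sigma(d, m) * math.log1p(-full_key))


def sr_from_ar(ar: float, d: int, m: int, n_s: int) -> float:
    """Per-Sbox success rate that corresponds to a given overall success rate ar."""
    if ar >= 1.0:
        return 1.0
    full_key = -math.expm1(math.log1p(-ar) / sigma(d, m))  # 1 - (1 - ar)^(1/sigma)
    return full_key ** (1.0 / n_s)


d, m, n_s = 3, 16, 16
print(sigma(d, m))                      # 1 + 16 + 256 trials
print(ar_from_sr(0.5, d, m, n_s))       # AR when each Sbox is recovered with SR = 1/2
print(sr_from_ar(2.0 ** -32, d, m, n_s))  # SR that would be needed for AR = 2^-32
```

The example shows the asymmetry the lemma captures: the number of trials $\sigma_{d,m}$ multiplies the attacker's chances, while the exponent $n_s$ sharply reduces the per-trial success probability.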
Theorem 3 (AR upper-bound). Let $\mathrm{AR}_{d,m}$ be the success rate of an $m$-bounded-trace attack on the $d$-th order LR4, as in Definition 1. Let $I(Z; X)$ be the mutual information between the secret intermediate value $Z$ and the side-channel trace $X$. Under the same assumptions as Lemma 1, $\mathrm{AR}_{d,m}$ is upper-bounded as

$$\xi\left( \left(1 - \left(1 - \mathrm{AR}_{d,m}\right)^{1/\sigma_{d,m}}\right)^{1/n_s} \right) \le m\, I(Z; X), \qquad (9)$$

where $\sigma_{d,m} = \sum_{i=1}^{d} m^{i-1}$ and $\xi$ is the function defined as Equation (4) in Theorem 2.
Proof. It follows directly from Inequality (3) in Theorem 2 and Equation (7) in Lemma 1.
Theorem 3 states the relation between the AR with $m$-bounded traces and the mutual information $I(Z; X)$ considered as the leakage from each rekeying component; thus, it is a unified security metric of the bounded trace model and the underlying primitive leakage. Using Theorem 3, we can determine the security parameters $d$ and $m$ for a required AR and a given device with $I(Z; X)$ or SNR.
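One way to read the theorem numerically is sketched below in Python: for given $d$, $m$, $n_s$, $n_b$, and $I(Z;X)$, the code finds the largest per-Sbox SR whose required number of bits $\xi(\mathrm{SR})$ does not exceed the budget $m\,I(Z;X)$, and converts it into an AR. The sketch assumes the Fano-type form of $\xi$ with $H(K) = n_b$ and $|\mathcal{K}| = 2^{n_b}$ and the relation $\mathrm{AR} = 1 - (1 - \mathrm{SR}^{n_s})^{\sigma_{d,m}}$; the function names, the bisection, and the example parameters are illustrative choices, not the authors' implementation.

```python
import math


def binary_entropy(x: float) -> float:
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def xi(sr: float, n_b: int) -> float:
    # Assumed form of xi with H(K) = n_b and |K| = 2^{n_b}.
    return n_b - (1.0 - sr) * math.log2(2 ** n_b - 1) - binary_entropy(sr)


def max_sr(m: int, mi: float, n_b: int, iters: int = 100) -> float:
    """Largest SR in [2^-n_b, 1] with xi(SR) <= m * mi.
    Bisection is valid because xi is increasing on this interval."""
    budget = m * mi
    if budget >= n_b:
        return 1.0
    lo, hi = 2.0 ** -n_b, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if xi(mid, n_b) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def ar_upper_bound(d: int, m: int, mi: float, n_s: int, n_b: int) -> float:
    """Upper bound on AR for trace bound m and leakage mi = I(Z;X) bits per trace."""
    full_key = max_sr(m, mi, n_b) ** n_s
    if full_key >= 1.0:
        return 1.0
    trials = sum(m ** (i - 1) for i in range(1, d + 1))  # sigma_{d,m}
    return -math.expm1(trials * math.log1p(-full_key))


for m in (16, 64, 256):
    print(m, ar_upper_bound(d=2, m=m, mi=0.01, n_s=16, n_b=8))
```

The bound grows with $m$ for two reasons at once: a larger budget $m\,I(Z;X)$ permits a larger SR per trial, and $\sigma_{d,m}$ gives the attacker more trials.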

Meanings of Theorem 3.
According to [dCGRP19,IUH22b,IUH22a], function ξ represents the number of bits required to achieve an SR.For example, SR = 1/2 n b implies that the attacker has no advantage in the attack, as represented by by ξ(1/2 n b ) = 0. Conversely, SR = 1 implies that the attacker obtains the full-bit information of a secret key, as represented by ξ(1) = n b .In contrast, mI(Z; X) represents the amount of information that the attacker receives through m traces.Note that n s I(Z; X) = λ, where λ is the bounded leakage and n s denotes the number of parallel S-boxes, and m means the traces bound.Thus, Theorem 3 reveals the relation between the bounded leakage function and trace complexity bound in an analytical and quantitative manner.In practice, Inequality (9) evaluates an upper-bound of SR m to achieve a given AR d,m (i.e., attack cost), to determine the appropriate trace bound m (i.e., the rekeying interval).
Remark 4 (On SR range).In Inequality (9), SR m is defined in the range of [0, 1].However, the minimum value of SR m should be 1/2 n b , which means that the attacker has no advantage in guessing the secret key.In other words, it makes no sense to consider the case that SR m ∈ [0, 1/2 n b ), because any attacker trivially achieves SR m = 1/2 n b by a random guess.Therefore, we should determine d and m such that they satisfy SR m ≥ 1/2 n b in addition to Inequality (9).Conversely, if SR m ≥ 1/2 n b is not achievable for a given AR and I(Z; X), such AR cannot be reached by the device.
Remark 5 (Relation between Theorem 1 and Theorem 3). Theorem 1 proves the security bound against an SCA attacker under a theoretical leakage model (which nevertheless reflects the idea of practical protection methods, e.g., high-order masking) with idealized primitives. In contrast, Theorem 3 states a bound on the overall success rate of an optimal and real SCA on actual symmetric primitive(s). Each theorem captures a different and essential aspect of LR4.
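As a rough numerical illustration of the reading above, the following sketch assumes that ξ takes the Fano-type form $\xi(SR) = n_b - H_2(SR) - (1 - SR)\log_2(2^{n_b} - 1)$, with $H_2$ the binary entropy, and that Inequality (3) reads $\xi(SR_m) \leq m I(Z; X)$. Both are assumptions made for illustration, since Equation (4) and Inequality (3) are not reproduced in this section; the assumed form does satisfy $\xi(1/2^{n_b}) = 0$ and $\xi(1) = n_b$ as stated above. The helper names `xi` and `sr_upper_bound` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, with the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def xi(sr: float, n_b: int = 8) -> float:
    """ASSUMED Fano-type form of xi: bits needed to reach success rate sr."""
    return n_b - binary_entropy(sr) - (1.0 - sr) * math.log2(2.0 ** n_b - 1.0)


def sr_upper_bound(m: int, mi: float, n_b: int = 8) -> float:
    """Largest SR_m with xi(SR_m) <= m * mi (assumed reading of Inequality (3)).

    The assumed xi is increasing on [2^-n_b, 1], so bisection is sufficient.
    """
    budget = m * mi  # information (in bits) available from m traces
    lo, hi = 2.0 ** -n_b, 1.0
    if xi(hi, n_b) <= budget:
        return 1.0  # enough information for a certain key recovery
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if xi(mid, n_b) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

For instance, `sr_upper_bound(10, 0.01)` would give the bound on the per-S-box SR for ten traces at I(Z; X) = 0.01 bits under these assumptions; with a zero budget the bound falls back to the random-guess value $1/2^{n_b}$ discussed in Remark 4.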

Practical usage of LR4
We can utilize LR4 as an SCA countermeasure with a guarantee of quantitative security evaluated by our methodology in Section 5.2. The proposed design flow is as follows.
Step 1: Device profiling. We first need to know the value of the mutual information I(Z; X) or the achievable SNR of side-channel measurement by profiling the target device. For example, a deep-learning based profiling method in [IUH22a, Section 6] is useful to evaluate a tight upper-bound of I(Z; X). In addition, according to the Shannon-Hartley theorem, I(Z; X) is upper-bounded by the SNR as $I(Z; X) \leq \frac{1}{2}\log(1 + \mathrm{SNR})$ (assuming that the noise is additive Gaussian). This indicates that it would be sufficient to evaluate the SNR, which may be easier than evaluating I(Z; X).
Step 2: Determination of master key lifetime and acceptable key recovery success rate. In this paper, we define the master key lifetime as the number of temporal keys generated under a given I(Z; X) or SNR. The master key lifetime should be chosen in view of the number of calls to the target cipher or LR-AE required by the application. At the same time, we determine an acceptable full-key recovery success rate as a threshold value of AR ∈ (0, 1].
Step 3: Determination of security parameters. We then determine the security parameters, including the rekeying order d and the rekeying interval m (i.e., the appropriate trace complexity bound for the situation), using Theorem 3 such that, for a given AR, the key lifetime exceeds the desired value determined in Step 2. Namely, for a given AR, we should determine d and the corresponding maximum value of m satisfying Inequality (9) and $SR_m \geq 1/2^{n_b}$ such that the master key lifetime requirement is met. If the requirement cannot be met with practical d and m (see also Remark 4), we need to mitigate/reduce the leakage by other SCA countermeasures such as masking and hiding.
Here, if we adopt a masking scheme, we do not have to profile masking gadgets because we can precisely estimate the resulting leakage from the masked implementation using the aforementioned profiling method in [IUH22a]. Alternatively, under some conditions (see [IUH22a]), we can use another inequality instead of Inequality (9), given in the following corollary.
Corollary 1. Let e be the masking order. Let I(S; L) be the mutual information between a masking share S and its corresponding leakage L. Then Inequality (10) holds, where log and ln are the binary and natural logarithms, respectively.
Proof. It is proven by combining Theorem 3 and a lemma in [IUH22a, MRS22].
Here, I(S; L) is equal to the I(Z; X) of a non-masked implementation in some settings; therefore, Inequality (10) can be used to evaluate the AR of a masked implementation from the profiling result on a non-masked implementation, without actual evaluation on masking gadgets [IUH22a].
Remark 6 (Conditions for Inequality (10)). Inequality (10) is meaningful for a non-trivially low I(S; L) (i.e., worse SNR) and/or a large masking order e, as mentioned in [IUH22a, Remark 5.1]. At least, $I(S; L) < 1/(2\ln 2) \approx 0.72$ should hold to use Inequality (10). If I(S; L) is relatively high and e is relatively small, we need to actually profile the adopted masking gadgets or to use the aforementioned method in [IUH22a]. It should be noted that Béguinot et al. recently proved another bound in [BCG+23], for which they claim a more precise evaluation than [IUH22a, MRS22]. It would be useful for a practical and more precise evaluation, although we used Corollary 1 based on [IUH22a, MRS22] for the proof-of-concept evaluation in this paper. In other words, for a masked implementation, we can achieve a more precise evaluation if we use a more precise inequality for masked implementations.
Step 4: Actual design/implementation. After determining the parameters that satisfy the master key lifetime requirement, we conduct an actual design and implementation for LR4. Here, if we adopt no SCA countermeasure other than LR4, it is sufficient to use a common non-protected implementation, such as a naïve or reference implementation, as it is, which may have been used in Step 1. Otherwise, a sound masked implementation should be utilized. The masking scheme used here should be provably secure under a practical leakage model (e.g., [NRS11, RBN+15, GMK16, GM17, BBD+16]), and the implementation should be done by carefully considering the physical defaults that cause security order degradation (e.g., coupling, cross-share interaction, and glitches), which have been shown and analyzed in many studies [RSVC+11, BGG+14, dCBG+17, DCEM18, FGP+18, GMPO20, SCS+21, SSB+21, MKSM22]. The use of leakage detection/verification tools, design automation tools, and/or open-source implementations is promising for achieving such a provably secure masked implementation (e.g., [Rep16, UHMA17, UHMA21, BBC+19, KSM20, SCS+21, SSB+21, KMMS22, BMRT22]).
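The SNR-based shortcut of Step 1 can be sketched as follows. This is a minimal illustration, assuming additive Gaussian noise as stated in Step 1 and a binary logarithm (so that the bound is in bits); the function name `mi_upper_bound_from_snr` is hypothetical.

```python
import math


def mi_upper_bound_from_snr(snr: float) -> float:
    """Upper bound on I(Z; X) in bits: (1/2) * log2(1 + SNR).

    Assumes additive Gaussian noise; snr is a linear ratio (not dB).
    """
    if snr < 0.0:
        raise ValueError("SNR must be non-negative")
    return 0.5 * math.log2(1.0 + snr)


def snr_from_db(snr_db: float) -> float:
    """Convert an SNR given in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)
```

A measured SNR (converted with `snr_from_db` if it is reported in decibels) then yields a conservative value of I(Z; X) that can be plugged into Step 3 in place of a directly profiled mutual information.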

Numerical evaluation
We show the validity of LR4 through a numerical evaluation of the key lifetime for a given d and mutual information.
First, we virtually determine the mutual information in Step 1. We set the acceptable full-key recovery success rate to 1% as an example and then evaluate the master key lifetime for various rekeying orders d (and masking orders e) using Inequality (9) (or Inequality (10)) with the achievable trace bound m for $AR_{d,m} = 0.01$. Here, we assume that each rekeying component in LR4 is implemented using one AES encryption call (namely, where E is AES for any i) for a proof-of-concept evaluation, although such a plain AES encryption is not an RO. See Section 6.2 for a discussion about the instantiation of an RO using AES or other symmetric primitives. Note that the evaluation result under the assumption of one AES encryption call would be consistent with actual RO instantiations. In addition, we suppose that AES is utilized for encrypting payload data using a temporal key generated by LR4. The AES encryption for payloads should be trace-bounded by m, so the key lifetime is given by $m \times m^d = m^{d+1}$, as $m^d$ corresponds to the number of generatable temporal keys.
Table 1 and Table 2 list the evaluation results of key lifetime lower-bounds of LR4 with non-masked and masked implementations, respectively, where I(Z; X) is the virtually determined mutual information value; SR denotes the SR required to satisfy $AR_{d,m} = 0.01$ with a maximum value of m for a given d (evaluated using Equation (7)); m denotes the maximum value of m under the conditions evaluated using Inequalities (9) and (10) for non-masked and masked implementations, respectively; and $m^{d+1}$ denotes a lower-bound of the maximum number of secure encryption calls with generated temporal keys. "N/A" means that we cannot derive the value due to the computational difficulty (that is, the evaluation requires extremely high-precision floating-point arithmetic). Note that rekeying and masking are not applied if d and e are zero, respectively (if d = 0, the master key is used for the payload encryption as it is).
The results demonstrate the validity of LR4 as an SCA countermeasure: the key lifetime increases exponentially (i.e., digit-wise) with the rekeying order d in most parts of the tables. In addition, from Table 2, we confirm that a combination with masking is more effective for improving key lifetimes if I(Z; X) is sufficiently small (i.e., the leakage is sufficiently noisy). It should be noted that, for I(Z; X) = 0.01 in Table 2, the key lifetime of the first-order masked implementation is smaller than that of the non-masked one. This is because Inequality (10) evaluates a lower-bound of key lifetime for a given $AR_{d,m}$, but does not necessarily represent the actual value precisely/tightly under some conditions. As mentioned in Remark 6, Inequality (10) cannot provide a meaningful evaluation of the bound if I(S; L) is too high and e is too small. I(S; L) = 0.01 and e = 1 may be such a condition, but the actual key lifetime would be longer than that of the non-masked implementation.
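The counting behind the tables can be sketched as follows. This is only an illustration of the quantities named in the text, assuming $\sigma_{d,m} = \sum_{i=1}^{d} m^{i-1}$ (the number of keys fed to the rekeying components, as we read Theorem 3), $m^d$ temporal keys, and a payload trace bound equal to m; the function names are hypothetical.

```python
def sigma_dm(d: int, m: int) -> int:
    """Number of SCA trials on rekeying components: sum_{i=1}^{d} m^(i-1)."""
    return sum(m ** (i - 1) for i in range(1, d + 1))


def num_temporal_keys(d: int, m: int) -> int:
    """Number of temporal keys generated by d-th order rekeying: m^d."""
    return m ** d


def key_lifetime(d: int, m: int) -> int:
    """Secure payload encryption calls: m calls per temporal key, m^(d+1) total."""
    return m * num_temporal_keys(d, m)


# Small example matching the key tree of Figure 4 (d = 2, m = 3).
if __name__ == "__main__":
    d, m = 2, 3
    print(sigma_dm(d, m), num_temporal_keys(d, m), key_lifetime(d, m))
```

For d = 0 the sketch degenerates to a lifetime of m, which corresponds to using the master key directly for at most m payload encryptions.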

Practical instantiations
Standard instantiation using SHA-3. The provable security analysis of LR4 assumes that $G_i$ is an RO and E is an IC. A straightforward instantiation would be, e.g., SHA-3 for G and AES for E. If one wants to avoid multiple distinct primitives for implementation efficiency, E can also be permutation-based, say using Keccak-p with an adequate domain separation and output truncation (but the resulting function is non-invertible, which limits the applications). In principle, instead of E we can use more complex functions, such as nonce-based encryption or AE, possibly using a permutation. The security/efficiency benefit to be expected depends on the scheme we use, and exploring such combinations would be an interesting future direction.
Instantiation using AES. Our SR evaluation in Section 6.1 assumes a naïve use of AES for G for ease of evaluation, but this instantiation has a gap from the proof. It is important to consider secure instantiations using AES owing to its ubiquity and maturity as a symmetric primitive. We briefly discuss secure instantiations of E and G based solely on an ideal cipher $E_{base}$. For simplicity, let $E_{base}$ have a key length one bit longer than E so that we can generate two independent ideal ciphers, E′ and E′′, from $E_{base}$ by using this extra key bit for domain separation. The problem is how to instantiate G from E′. What we need for G is indifferentiability [MRH04] from the fixed-length RO. Note that classical block cipher-based compression functions (e.g., Davies-Meyer) are not indifferentiable [KM07]. We present two secure examples here. First, if E′ is a block cipher with an n-bit block and an n-bit key, we can instantiate G by Mennink's $F_3$ construction [Men17], which achieves n/2-bit indifferentiability from the (fixed-length) RO and needs three calls of E′. Second, if E′ is a block cipher with an n-bit block and a 2n-bit key, G can be a 2n-bit indifferentiable hash function using Hirose's double-block-length compression function [Hir06] with a proper domain extension. For example, we can use MDPH [Nai19], which has (n − log n)-bit indifferentiability [GIM22]. The two examples utilize different primitives and offer different security levels, so the proper choice will depend on the security goal, the application, and the available primitives.
We should point out that the average computation cost of LR4 is at most two G calls plus one E call (Section 3), which means the impact of G's cost on the total computation is limited.In addition, security evaluation for the aforementioned (secure) instantiations could be done in the same manner as Section 6.1 as long as we use AES as E base .

Conditions for exponential increase of key lifetime.
Interestingly, in Table 1 and Table 2, SR and the trace bound m become more restrictive for larger d. For larger d, the attacker can perform a larger number of SCA trials (i.e., $\sigma_{d,m}$ increases), which enables at least one full-key recovery with a smaller value of SR. This is also represented by Equation (7), which shows a monotonic decrease in terms of $\sigma_{d,m}$ for a fixed $AR_{d,m}$. In other words, the key lifetime gets longer by increasing d only if the gain of $m^{d+1}$ is greater than the decrease of SR and m. In fact, if d is very large, the key lifetime no longer increases (exponentially) and AR no longer decreases by increasing d, which implies that LR4 is valid as an SCA countermeasure for sufficiently small d.
In contrast, the security of masking is guaranteed as the SR decreases exponentially by increasing the masking order e. In particular, the security of masking is asymptotically proven; that is, it holds that $SR_m \to 1/2^{n_b}$ as $e \to \infty$, although the exponential increase may not be guaranteed for a small e [IUH22a]. Thus, rekeying and masking have opposite features. A higher-order SCA countermeasure usually incurs a large performance overhead. If LR4 is available, its adoption can be one of the best choices to counter SCAs, as it is very efficient for small d and its overhead is practically small (as shown in Section 3).
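The tightening of SR with d can be illustrated numerically. The sketch below assumes that Equation (7) has the independent-trial form $AR_{d,m} = 1 - (1 - (SR_m)^{n_s})^{\sigma_{d,m}}$ with $\sigma_{d,m} = \sum_{i=1}^{d} m^{i-1}$; this form is our reading (it matches the argument used for Proposition 2), and the function names and the default $n_s = 16$ are illustrative assumptions.

```python
import math


def sigma_dm(d: int, m: int) -> int:
    """Assumed number of SCA trials: sum_{i=1}^{d} m^(i-1)."""
    return sum(m ** (i - 1) for i in range(1, d + 1))


def required_sr(ar: float, sigma: int, n_s: int = 16) -> float:
    """Per-S-box SR at which 1 - (1 - SR^n_s)^sigma equals the target ar."""
    if not 0.0 < ar < 1.0 or sigma < 1:
        raise ValueError("need 0 < ar < 1 and sigma >= 1")
    # Per-trial full-key success probability p with 1 - (1 - p)^sigma = ar,
    # computed with expm1/log1p to stay accurate when p is tiny.
    p_trial = -math.expm1(math.log1p(-ar) / sigma)
    return p_trial ** (1.0 / n_s)


def above_random_guess(sr: float, n_b: int = 8) -> bool:
    """Feasibility check of Remark 4: SR must not fall below 1 / 2^n_b."""
    return sr >= 2.0 ** -n_b


if __name__ == "__main__":
    m, ar = 16, 0.01  # hypothetical trace bound and acceptable AR
    for d in range(1, 6):
        sr = required_sr(ar, sigma_dm(d, m))
        print(d, sr, above_random_guess(sr))
```

Under these assumptions the admissible SR shrinks as d grows for a fixed AR, which is the effect that eventually offsets the gain of $m^{d+1}$.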

On upper-bound of master key lifetime for general rekeying schemes
In the above, we mentioned that the master key lifetime does not increase exponentially by increasing d if d is (very) large, although it is effective for practical values of d. In general, the rate of increase gets slower for larger d, and the increase of the master key lifetime eventually stops at a certain value of d, depending on I(Z; X). Intuitively, this is because the minimum value of $SR_m$ should be $1/2^{n_b}$ (as mentioned in Remark 4), although $SR_m$ is monotonically decreasing in terms of d and $SR_m \in [0, 1]$ by definition. In fact, any rekeying scheme including LR4 has an upper-bound of master key lifetime according to Proposition 2. Proposition 2 implies that, for a given TR value τ, there always exists a number of SCA trials σ such that $TR_{\sigma} \geq \tau$, and the implementation cannot achieve a master key lifetime of more than σ with regard to the overall success rate of τ. In other words, Proposition 2 implies that, for any given AR, we should make $SR_m$ approach zero when $d \to \infty$, although $SR_m$ should be greater than $1/2^{n_b}$. Here, the master key lifetime $\sigma_{d,m}$ is maximized by the d and m that tightly satisfy Inequality (9) with $SR_m = 1/2^{n}$. Thus, for a given I(Z; X) (or I(S; L) and masking order) and AR, there exists an upper-bound of the master key lifetime, distinct from that imposed by brute-force search/cryptanalysis on the master key. Theorem 3 is an upper-bound of the master key lifetime with regard to SCA, and it is a case study of LR4. Our discussion emphasizes a (trivial) fact: an ultimate goal of SCA countermeasures including rekeying is to achieve a (master) key lifetime as long as the lifetime against pure cryptanalysis, such that SCA leakage is no longer useful for the attacker.
Related to Proposition 2, the convergence rate of TR → 1 is very important and represents the security achievable by rekeying schemes. It depends on the value of I(Z; X). Fortunately, for LR4, we experimentally confirmed that the convergence is slow for practical d and a wide range of I(Z; X), which indicates the validity of LR4 security for many practical conditions. In contrast, for very high I(Z; X) (e.g., I(Z; X) = 1), the leakage is not sufficiently bounded at all, and it is impossible for any SCA countermeasure to protect such a leaky device (see also [BS21]). A discussion of the convergence rate for a wider range of I(Z; X) would be useful for understanding the achievable security of rekeying schemes. The purpose/goal of a rekeying scheme is to improve the master key lifetime in general; hence, investigating (the existence of) tighter upper-bounds and the convergence rate for rekeying schemes is important future work for making rekeying security more concrete.
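To make the role of the convergence rate concrete, the following worked bound assumes mutually independent trials with per-trial full-key success probability $(SR_m)^{n_s}$, as in the proof of Proposition 2. The closed form is an illustration derived under that assumption, not a statement taken from the analysis above.

```latex
\begin{align*}
TR_{\sigma,m} &= 1 - \bigl(1 - (SR_m)^{n_s}\bigr)^{\sigma}, \\
TR_{\sigma,m} \ge \tau
  &\iff \bigl(1 - (SR_m)^{n_s}\bigr)^{\sigma} \le 1 - \tau \\
  &\iff \sigma \ge \frac{\ln(1 - \tau)}{\ln\bigl(1 - (SR_m)^{n_s}\bigr)}
  \qquad (0 < SR_m < 1,\ 0 < \tau < 1).
\end{align*}
```

The direction of the last inequality flips because $\ln(1 - (SR_m)^{n_s}) < 0$. For small $(SR_m)^{n_s}$, the threshold is approximately $-\ln(1 - \tau)/(SR_m)^{n_s}$, so the number of trials needed to reach a given τ grows inversely with the per-trial success probability; a lower I(Z; X) (hence a lower achievable $SR_m$) therefore slows the convergence of TR to one.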

Resilience against fault attacks
We here briefly discuss the resilience of LR4 against fault attacks [BDL97]. A major fault attack is differential fault analysis (DFA) [BS97]. DFA recovers the secret key from pair(s) of correct and faulty ciphertexts for an identical input, where a faulty ciphertext means that an error (e.g., a bit flip) is induced in an intermediate value of the encryption/decryption. For example, in the case of AES encryption, a one-bit fault in the eighth-round input is sufficient for the full-key recovery if the corresponding correct ciphertext is available [PQ03]. However, the DFA attacker should observe the output of the symmetric primitive to obtain the pair(s) of correct and faulty ciphertexts. As the output of the ROs of LR4 is not (directly) available to the attacker, DFA is inapplicable to an LR4 implementation (except for the payload encryption). In addition, DFA requires querying an identical plaintext twice to obtain a pair of correct and faulty ciphertexts. If LR4 is correctly implemented in such a way as to detect replayed queries as mentioned in Section 3.2 (and the payload encryption is nonce-based), the ROs (and the payload encryption) never evaluate an identical input more than once, which also indicates the inapplicability of DFA. Some other fault attacks have also been developed, such as fault sensitivity analysis (FSA) [LSG+10, MMP+11], differential fault intensity analysis (DFIA) [GYTS14], and persistent fault analysis (PFA) [ZLZ+18, ZHF+23]. These fault attacks utilize statistical means for the key recovery, like DPAs. Hence, LR4 can offer leakage resilience by determining appropriate trace bounds m and m′ similarly to Section 5 (which may be trivial for some attacks). Moreover, we want to stress that FSA requires querying an identical plaintext many times to observe the leakage of fault sensitivity; DFIA utilizes the output ciphertext; and PFA is a chosen-plaintext fault attack. Neither such chosen-plaintext strategies nor RO outputs are available in attacking the ROs in LR4; thus, it would be difficult to apply these fault attacks to the ROs in LR4 (in a naïve manner). Meanwhile, the payload encryption part can be protected using an appropriate trace bound m′, a fault-attack-resilient mode of operation, and/or a countermeasure against fault attacks (e.g., fault detection schemes).
Note that fault attacks on hash functions (e.g., SHA-3), which may be a natural choice for RO instantiation, would be frequently more difficult than block ciphers, although we basically discuss fault attacks on AES in this section.

Comparison to LR-PRG/stream cipher
An LR-PRG or stream cipher may be used for generating a key stream as a temporal key. For example, Pietrzak's LR stream cipher [Pie09] is based on a weak PRF F (whose outputs are pseudorandom as long as the inputs are random), and its initial state is one uniformly random input of F, in addition to two keys of F which form the master key; the random input is sent in the clear. F is called in an alternating manner, using an internal state consisting of two keys and one input to F. The leakage model is different from ours; namely, it is assumed that F leaks a certain number of bits for each invocation via a leakage function restricted in its output size. The security is proved in terms of the pseudorandomness of a single (possibly long) output sequence with leakage (hence the game does not consider multiple initializations).
One of the advantages of LR4 over LR-PRGs/stream ciphers is that LR4 offers an explicit synchronization. LR4 can immediately generate arbitrary temporal keys, which indicates that LR4 can recover the communication whenever the synchronization fails (maliciously or accidentally) and can start a new session without a reset. In contrast, an LR-PRG or stream cipher requires a reset with a new initialization vector in cases of synchronization failure or at the beginning of a new session. As mentioned in Section 1, an attacker may mount an SCA on an LR-PRG/stream cipher during the first few state updates if the attacker can trigger resets repeatedly. Thus, an explicit synchronization is essential for an LR rekeying, which makes LR4 more suitable.
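The random-access property mentioned above can be sketched as a walk down the key tree, where each level applies $G_i$ to the parent key and the i-th counter digit, starting from the master key. The sketch is illustrative only: it uses SHA3-256 as a stand-in for the RO $G_i$, and the byte encoding, the domain separation by the index i, and the function names are assumptions rather than the specification of LR4.

```python
import hashlib
from typing import Sequence


def g(i: int, key: bytes, ctr: int) -> bytes:
    """Stand-in for the RO G_i: SHA3-256 over (i, key, ctr).

    The encoding and the domain separation by i are illustrative assumptions.
    """
    data = i.to_bytes(2, "big") + key + ctr.to_bytes(8, "big")
    return hashlib.sha3_256(data).digest()


def derive_temporal_key(k_mst: bytes, counters: Sequence[int], m: int) -> bytes:
    """Derive the temporal key addressed by (ctr_1, ..., ctr_d) from k_mst."""
    key = k_mst
    for i, ctr in enumerate(counters, start=1):
        if not 0 <= ctr < m:
            raise ValueError("each counter must lie in [0, m)")
        key = g(i, key, ctr)  # one G_i call per level of the key tree
    return key
```

Because the temporal key depends only on the master key and the counter tuple, both parties can jump directly to any counter value, e.g. `derive_temporal_key(k_mst, (0, 2), m=3)` for d = 2, without replaying earlier state updates.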

Relation and comparison to LR-PRFs
As mentioned, known LR-PRFs [MSJ12,FPS12] have structural similarities to our approach in terms of the use of GGM, but LR4 and [MSJ12,FPS12] are basically incomparable due to the different leakage models and the assumptions on the primitives.In [FPS12], the authors assume non-adaptive bounded leakage and a weak PRF as an underlying primitive.They show how to construct leakage resilient non-adaptive PRF, in which the adversary non-adaptively chooses the inputs of PRF.In [MSJ12], a formal security proof is not given, but the authors show that parallel implementation improves the security and efficiency of GGM-like LR-PRF.In terms of the constructions, two LR-PRFs [MSJ12, FPS12] use independent public randomness for each node on the path.These public random values (IVs) are crucial for their security proofs and significantly increase the bandwidth.Moreover, the generation of these random values must be secure even under leakage, which can be quite costly in practice.For completeness, we briefly describe GGM and [MSJ12,FPS12] in Appendix A.

Applicability of LR4 to LR-AEs
As discussed in Section 4.3, many LR-AE proposals are designed with leveled implementation [BBC + 20] in mind [PSV15, BGP + 19, DJS19, KS20, DEM + 20, BBB + 20, SPS + 22].Although their security assumptions and leakage models vary (as discussed in [BBC + 20]), they share the core idea of combining a leak-free/DPA-resistant component for, for example, the derivation of a temporal key, and SPA-resistant component(s) using the derived temporal key for the rest of the encryption routine.As we have discussed in Section 4.3, LR4 could be used as a component of existing LR-AEs if they meet certain conditions.However, these conditions are not always met, particularly when it comes to the tag-generation function (TGF) (see [BBC + 20]).As discussed in Section 4.3, extending LR4 to handle such case would be an interesting future direction.
Ultimately, the goal of LR-AEs is to improve the temporal key lifetime and change the key lifetime unit from the number of (tweakable) block cipher calls to AE calls.When applicable, an LR rekeying scheme contributes to this goal.

Summary
This paper studied rekeying as a power/EM SCA countermeasure and presented a new higher-order and LR rekeying scheme named LR4. We developed a leakage model for rekeying to formally prove the security of LR4, and analyzed its performance overhead and practical use cases. In addition, we defined the success rate of an attack on rekeying schemes and developed a methodology for evaluating the success rate quantitatively through a unification of bounded trace complexity and bounded leakage. This is useful for determining the rekeying frequency for a bounded leakage (defined here as a mutual information value for a given device) with regard to a success rate, which is mandatory for the practical usage of LR4. Through a numerical evaluation, we confirmed the validity and effectiveness of LR4 as an SCA countermeasure (as well as masking), as the number of secure encryption/decryption calls increases exponentially with the rekeying order under practical conditions.

Future works
Relaxing security assumption.Our current security proof relies on the idealized primitives.A standard model-based proof would provide additional confidence (see e.g., [BGPS21]).It might be possible to remove a (pseudo)random property for G as observed by MSGR.Moreover, it should be noted that our SR definition is related to the multi-user analysis [DLMS14, BT16, LMP17, HTT18, DGGP21, NSSY22] (as in Remark 3).Clarifying the relationship would be an interesting future direction.
Investigation of other possible rekeying construction.In this paper, we discussed the evaluation of LR4 in Section 5.2, but Definition 1 and our evaluation methodology are readily and naturally generalizable and extendable to other rekeying schemes (as in Proposition 2 in Section 6.2.3).It is an important future work to investigate efficient rekeying constructions that make the master key lifetime longer.
Extension to SCAs other than power/EM attack.The focus of this paper was power/EM SCAs, for which we developed the models, security proofs, and an evaluation methodology.It would be valuable to extend our theory and methodology to utilize rekeying in a provable secure manner against other SCAs such as timing and cache attacks.
wPRF. Let $f_k(r; 0)$ and $f_k(r; 1)$ be the lower and upper $n_k$ bits of $f_k(r)$, respectively. Using $n_F$ $n_E$-bit public random values $r_1, r_2, \ldots, r_{n_F}$, the FPS scheme evaluates $F_k(t)$ as

Let $t^{(n_b)}[i]$ ($1 \leq i \leq n_s$) denote the i-th digit of t in the $2^{n_b}$-ary number representation. For example, $t^{(n_b)}[i]$ is typically given by two hexadecimal digits for AES as $2^{n_b} = 16^2$. Using $2^{n_b}$ distinct public values $r_0, r_1, \ldots, r_{2^{n_b}-1}$, the MSJ scheme evaluates $F_k(t)$ using $n_s$ iterative block cipher calls, as

The MSJ scheme performs one $F_k(t)$ evaluation with $n_s$ encryption calls, which is a significant reduction from the $n_E = n_b n_s$ calls of the GGM scheme. For the MSJ scheme, Medwed et al. suggested determining the i-th public value $r_i$ as $r_i = i \| i \| \cdots \| i$, which increases the trace complexity bound (i.e., decreases the SR or increases the number of traces). Some improvements and practical evaluations of MSJ have been presented in [MSNF16, USS+20, BMPS21].

B Proof of Theorem 1

B.1 H-coefficient technique
Assume that a computationally-unbounded adversary A queries the two worlds, real and ideal, denoted by $O_{re}$ and $O_{id}$, and tries to distinguish them. The H-coefficient technique [Pat08, CS14] is a general technique to evaluate the distinguishing probability of A. We define a transcript as a set of input/output values that A obtains during the interaction with the world. Let $T_{re}$ (resp. $T_{id}$) denote the probability distribution of the transcript induced by the real world (resp. the ideal world). By extension, we also use the same notation to refer to a random variable distributed according to each distribution. We say that a transcript τ is attainable if $\Pr[T_{id} = \tau] > 0$ holds with respect to A. Let Θ denote the set of attainable transcripts. The following is the fundamental lemma of the H-coefficient technique; see, e.g., [CS14] for the proof.
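Lemma 2. Let $\Theta = \Theta_{good} \sqcup \Theta_{bad}$ be a partition of the set of attainable transcripts. Assume that there exists $\varepsilon_1 \geq 0$ such that for any $\tau \in \Theta_{good}$, one has $\Pr[T_{re} = \tau] / \Pr[T_{id} = \tau] \geq 1 - \varepsilon_1$, and that there exists $\varepsilon_2 \geq 0$ such that $\Pr[T_{id} \in \Theta_{bad}] \leq \varepsilon_2$. Then, the distinguishing advantage of A between the real and ideal worlds is at most $\varepsilon_1 + \varepsilon_2$.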

B.2 Evaluation of good transcript probability ratio
In Section 4, we defined bad events to show how to partition the set of attainable transcripts and then showed the evaluation of $\varepsilon_2$, i.e., $\Pr[T_{id} \in \Theta_{bad}] \leq d(q + q_L)^2/2^{n_k+1} + (q + q_L)(p + p_I)/2^{n_k} + 4m'q/2^{n_{bc}}$. All that remains is evaluating a good transcript probability ratio, i.e., $\varepsilon_1$ in Lemma 2.
Lemma 3.For any τ ∈ Θ good , we obtain the following evaluation: T E , T L , T K denote the random variables of each transcript.Let * ∈ {re, id}.In both real and ideal worlds, we obtain the following evaluation: To prove Lemma 3, we evaluate the lower bound of (P1 re • P2 re • P3 re )/(P1 id • P2 id • P3 id ).
Evaluation of P1 re and P1 id .
We first obtain P1 re = P1 id because the probability distribution of transcripts defined by the interactions with the oracles G 1 , . .., G d , and E ± are identical in both worlds.
Evaluation of $P2_{re}$ and $P2_{id}$. In the ideal world, the keys revealed by the construction oracle (i.e., $k^{(1)}_{\cdot,\cdot}$) are chosen at random and independently from $\{0, 1\}^{n_k}$. Regarding keys revealed by the leakage oracle (i.e., $k^{(0)}_{\cdot,\cdot}$), recall that the transcript τ is good; thus, there is no collision between the revealed keys in the same depth in $Q_K$ (i.e., Bad1), and there is no collision between the revealed keys of depth i, $k^{(0)}_{i,\cdot}$, and the input keys of the RO $G_i$ where $i \in [d]$ (i.e., Bad2). Therefore, we obtain $P2_{id} = (1/2^{n_k})^{\sum_{i=1}^{d+1} nk_i}$, where $nk_i$ is the number of elements $k^{(\cdot)}_{i,\cdot}$ in $Q_K$ (i.e., the number of revealed keys in the i-th depth). In the real world, keys revealed from the construction oracle are all real, unlike in the ideal world. However, due to Bad1 and Bad2, we obtain $P2_{re} = (1/2^{n_k})^{\sum_{i=1}^{d+1} nk_i}$ in the same manner as the above discussion of keys revealed by the leakage oracle in the ideal world. Therefore, we obtain $P2_{re} = P2_{id}$.
Evaluation of P3 re and P3 id .In the ideal world, the construction oracle is TURP P ± ; thus, T C and T L are independent.Then we obtain the following equations.
Recall that Cnc is the number of distinct counters in construction queries, and ctr d 1 , . .., ctr d Cnc are the distinct counters.Also recall that q 1 , . .., q Cnc are the number of construction queries whose counter is ctr d 1 , . .., ctr d Cnc , respectively.Since the adversary queries to P ± which is independent from other oracles, we obtain .
Similarly, we define Lnc as the number of distinct counters in leakage queries, and Lctr d 1 , . .., Lctr d Lnc as the distinct counters.Also, let q L,1 , . .., q L,Lnc be the number of leakage queries whose counter is Lctr d 1 , . .., Lctr d Lnc , respectively; thus, q L,i ≤ m ′ for i ∈ [Lnc] and Lnc i=1 q L,i = q L .Here, temporal key values inputted into E in LR4-L, which are derived from Lctr d 1 , . .., Lctr d Lnc , are all distinct due to Bad1.Also, there is no collision between the temporal keys in LR4-L and input keys of the IC due to Bad3.Thus, we obtain In the real world, unlike the case of P3 id , we cannot divide the evaluation of P3 re into two evaluations about T C and T L since they are not independent in the real world.However, we can discuss the evaluation of P3 re in almost the same manner as P5 id .We define CLnc as the number of distinct counters throughout the construction and leakage queries (i.e., CLnc ≤ Cnc + Lnc), and CLctr d 1 , . .., CLctr d CLnc as the distinct counters throughout Q C and Q L .Also, let q CL,1 , . .., q CL,CLnc be the summation number of construction and leakage queries whose counter is CLctr d 1 , . .., CLctr d CLnc , respectively; thus, CLnc i=1 q CL,i = q + q L .As in the case of P5 id , all the temporal keys derived from CLctr d 1 , . .., CLctr d CLnc are distinct and have no collision with the input key of the IC due to Bad1 and Bad3.Thus, we obtain

Figure 4: Key diagram of LR4 when d = 2 and m = 3. Solid, dashed, and dash-dotted arrows mean that a key is derived from the key at its parent node when $ctr_i = 0$, $ctr_i = 1$, and $ctr_i = 2$, respectively, for each $i \in [d] = [2]$. Temporal keys are generated and used from left to right.

Figure 5: The cache-based version of the temporal key derivation function R. LR4 with cache invokes R C instead of R. The caches ch d+1 and kh d+1 are initially given as ch d+1 = (0, 0, . . ., 0), kh i+1 = G i (kh i , 0) for each 1 ≤ i ≤ d, and kh 1 = k mst . Here, kh i should be called at Line 9 only when it is actually used.
Figure 6: The rekeying function of asakey. K is a master key, and p is an underlying permutation. $N_1, \ldots, N_k$ are one-bit split nonces, where an input nonce N is written as $N_1 \| \cdots \| N_k$. $K^*$ is a derived temporal key input to the sponge-based encryption part. ISAP has a rekeying function similar to this construction.

Figure 7: Example of the key reveal procedure in the proof. The leakage queries are for the leaves (3, 1), (3, 4), and (3, 5), and the relevant (intermediate) keys are circled in red. The construction oracle queries are for the leaves (3, 2), (3, 5), and (3, 7), and the relevant (intermediate) keys are circled in blue. First, the keys circled in red will be revealed. Second, the keys circled in blue will be revealed (following the tree in the real world and randomly sampled in the ideal world), except those already revealed (i.e., circled in red). Thus, the latter step only reveals $k_{3,2}$, $k_{2,3}$, and $k_{3,7}$.

Proposition 2 (Attacker with infinite trials almost surely succeeds in at least one full-key recovery). Let $TR_{\sigma,m}$ be the probability of at least one success during σ SCA trials with m-bounded traces, defined as
$$TR_{\sigma,m} = \Pr\left[\bigcup_{v=1}^{\sigma} \bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1\right],$$
where $\mathrm{rank}(k^*_{v,h}, m)$ denotes the correct key rank of the h-th partial key at the v-th trial with m traces. Under the same assumption as Lemma 1, it holds that $TR_{\sigma,m} \to 1$ as $\sigma \to \infty$; namely,
$$\lim_{\sigma \to \infty} \Pr\left[\bigcup_{v=1}^{\sigma} \bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1\right] = 1,$$
if $SR_m \neq 0$.
Proof. If $SR_m \neq 0$, then it holds that $\Pr\left[\bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1\right] = (SR_m)^{n_s} > 0$ for any v, according to the assumption on the SR. Therefore, as the SCA trials are mutually independent, it holds that
$$\sum_{v=1}^{\infty} \Pr\left[\bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1\right] = \infty. \qquad (11)$$
According to the Borel-Cantelli lemma [Fel91, pp. 201-202], Equation (11) implies
$$\Pr\left[\limsup_{v \to \infty} \bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1\right] = 1,$$
which means that the events $\bigcap_{h=1}^{n_s} \mathrm{rank}(k^*_{v,h}, m) = 1$ (i.e., successful full-key recoveries) occur infinitely often with probability one. This implies Proposition 2.
Corollary 2. Let $AR_{d,m}$ be the success rate of an attack on the d-th order m-bounded-trace LR4 defined in Definition 1. Under the same assumption as Lemma 1, it holds that $AR_{d,m} \to 1$ as $d \to \infty$ or $m \to \infty$.
Proof. It is proven by Proposition 2 as the case that $TR_{\sigma,m} = AR_{d,m}$ with $\sigma = \sigma_{d,m}$, where it holds that $\sigma_{d,m} \to \infty$ as $d \to \infty$ or $m \to \infty$. Note that $SR_m \to 1$ as $m \to \infty$ usually holds.

Faust et al. proved that the above PRF is leakage resilient if both the leakage function and the inputs are non-adaptive.

Medwed-Standaert-Joux (MSJ) scheme [MSJ12]. The MSJ scheme uses a multi-ary tree to reduce the computational cost, whereas GGM and FPS employ a binary tree. Let $n_b$ denote a positive integer that divides $n_E$, and let $n_s = n_E / n_b$. Typically, $n_b$ is defined as the bit length of the S-box (e.g., $n_b = 8$ and $n_s = 16$ for AES). Let t (n b )[i] bc − j) .

We next show that P3 re ≥ P4 id · P5 id holds. For all i ∈ [CLnc], the query of CLctr d i in $Q_C$ and $Q_L$ can be classified into one of the following three cases: (Case 1) CLctr d i is queried only in $Q_C$, (Case 2) CLctr d i is queried only in $Q_L$, (Case 3) CLctr d i is queried in both $Q_C$ and $Q_L$.

(s), its impracticality and meaninglessness were discussed in [SPY+10, FPS12]. Meanwhile, some impossibility results (the difficulty of LR cryptography with practical constructions under adaptive leakage) were shown in [SPY+10, FPS12]. Known practical remote power SCAs (e.g., [ZS18, LKO+21]) also utilize non-adaptive leakage. Thus, non-adaptive leakage is more common, practical, and significant than adaptive leakage, and we focus on non-adaptive leakage in this paper.
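Returning to the tree-based constructions mentioned above, the effect of moving from a binary tree to an $n_b$-ary chunking can be sketched as follows. This is not the MSJ scheme itself: the one-step derivation `step` is a hypothetical stand-in built from SHA3-256, and `tree_prf` is an illustrative name. The sketch only shows that consuming the input in $n_b$-bit chunks takes $n_s = n_E / n_b$ derivation steps instead of $n_E$.

```python
import hashlib


def step(key: bytes, chunk: int) -> bytes:
    # Hypothetical one-step key derivation (assumption, not from the paper):
    # SHA3-256 over key || chunk, truncated to 16 bytes.
    return hashlib.sha3_256(key + chunk.to_bytes(2, "big")).digest()[:16]


def tree_prf(key: bytes, x: int, n_e: int = 128, n_b: int = 1):
    """Walk a 2**n_b-ary tree along the n_e-bit input x.

    Returns the derived leaf key and the number of derivation steps.
    n_b = 1 corresponds to a binary (GGM-style) walk.
    """
    assert 1 <= n_b <= 16 and n_e % n_b == 0
    n_s = n_e // n_b
    mask = (1 << n_b) - 1
    for i in range(n_s):
        shift = n_e - (i + 1) * n_b      # most significant chunk first
        key = step(key, (x >> shift) & mask)
    return key, n_s


if __name__ == "__main__":
    k = bytes(16)  # all-zero demo key, for illustration only
    x = 0x0123456789ABCDEF0123456789ABCDEF
    print(tree_prf(k, x, 128, 1)[1])  # steps for a binary tree
    print(tree_prf(k, x, 128, 8)[1])  # steps for 8-bit chunks
```

With $n_E = 128$, the binary walk needs 128 steps, whereas $n_b = 8$ needs 16, matching the $n_s = 16$ example for AES given above.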
Figure 2: Encryption of LR4, where $k_1 = k_{\mathrm{mst}}$.

The cache-based version of the temporal key derivation function $R$ is denoted by $R_C$. It caches the intermediate keys and counters for given $m$ and $m'$, where $kh_i$ and $ch_i$ denote the caches for the $i$-th intermediate key and counter value, with $kh^{d+1} = (kh_1, kh_2, \ldots, kh_d, kh_{d+1})$ and $ch^{d+1} = (ch_1, ch_2, \ldots, ch_d, ch_{d+1})$, respectively. Unlike Figure

The strengthened asakey must call at least one Keccak-p[1600, 12] for each encryption call, while LR4 calls at least one Keccak-p[1600, 24] (i.e., SHA-3) per $m'$ $E_K$ calls, where $m'$ is the trace bound for $E_K$. Note that $E_K$ can be instantiated with an (LR-)AE, for example, a sponge-based encryption scheme such as ISAP or Ascon [DEM+20, DEMS21], which can encrypt a message of (practically) arbitrary bit length with only one $E_K$ call (see also Section 6.2.1). On average per $E_K$ call, LR4 requires fewer than $2/m'$ Keccak-p[1600, 24] calls (as proven in Proposition 1), while asakey always requires at least one Keccak-p[1600, 12] call for nonce processing. As $m'$ is usually greater than 10 (see Section 6.1), cache-based LR4 has a far lower average latency than strengthened asakey. Also, in the non-cached version, LR4 requires only $d$ Keccak-p[1600, 24] calls, while strengthened asakey requires 128 Keccak-p[1600, 12] calls.
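The call counts above can be turned into a rough back-of-the-envelope comparison. The sketch below uses the number of Keccak-p rounds as a latency proxy, which is an assumption of this example rather than a metric used in the text; the function names and the sample values $m' = 10$ and $d = 4$ are likewise illustrative.

```python
def lr4_cached_rounds_bound(m_prime: int, rounds: int = 24) -> float:
    # Upper bound on average rounds per E_K call:
    # fewer than 2/m' calls of Keccak-p[1600, 24] (Proposition 1).
    return 2 * rounds / m_prime


def asakey_rounds_bound(rounds: int = 12) -> int:
    # Lower bound per encryption call: at least one Keccak-p[1600, 12] call.
    return 1 * rounds


def lr4_uncached_rounds(d: int, rounds: int = 24) -> int:
    # Non-cached LR4: d calls of Keccak-p[1600, 24].
    return d * rounds


def asakey_uncached_rounds(calls: int = 128, rounds: int = 12) -> int:
    # Strengthened asakey: 128 calls of Keccak-p[1600, 12].
    return calls * rounds


if __name__ == "__main__":
    m_prime, d = 10, 4  # illustrative parameters, not taken from the paper
    print("cached LR4 (upper bound):", lr4_cached_rounds_bound(m_prime))
    print("asakey (lower bound):    ", asakey_rounds_bound())
    print("non-cached LR4:          ", lr4_uncached_rounds(d))
    print("non-cached asakey:       ", asakey_uncached_rounds())
```

Because the LR4 figure is an upper bound and the asakey figure a lower bound, the gap in this proxy can only widen as $m'$ grows.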

Table 1: Key lifetime lower-bound evaluation results of non-masked LR4 for $\mathrm{AR}_{d,m} = 0.01$

Table 2: Key lifetime lower-bound evaluation results of masked LR4 for $\mathrm{AR}_{d,m} = 0.01$