Automatic Search of Meet-in-the-Middle Differential Fault Analysis on AES-like Ciphers

Fault analysis is a powerful technique to retrieve secret keys by exploiting side-channel information. Differential fault analysis (DFA) is one of the most powerful threats: it exploits the differential information between correct and faulty ciphertexts and can recover keys of symmetric-key cryptosystems efficiently. Since DFA usually targets the first or last few rounds of a block cipher, some countermeasures against DFA protect only the first and last few rounds for efficiency. Determining how many rounds DFA can reach is therefore essential for deciding how many rounds to protect in practice. At CHES 2011, Derbez et al. proposed an improved DFA on AES based on the MitM approach, which covers one more round than previous DFAs. To perform a good (or optimal) MitM DFA on a block cipher, a good (or optimal) attack configuration must be identified, including the location where the faults are injected, the matching point with a differential relationship, and the two independent computation paths involving two independent subsets of the key. In this paper, we formulate the essential ideas of the construction of the attack, and translate the problem of searching for the best MitM DFA into optimization problems under constraints in Mixed-Integer-Linear-Programming (MILP) models. With the models, we achieve more powerful and practical DFA attacks on SKINNY, CRAFT, QARMA, PRINCE, PRINCEv2, and MIDORI with faults injected 1 to 9 rounds earlier than in the best previous DFAs.


Introduction
Fault analysis (FA), first introduced by Boneh et al. [BDL97] against RSA-CRT implementations, is one of the most powerful attacks on implementations of cryptographic primitives. It allows the attacker to obtain additional side-channel information and to achieve key recovery in practical time. At CRYPTO 1997, Biham and Shamir [BS97] proposed differential fault analysis (DFA) on the DES block cipher. Since then, various fault attacks have been introduced, such as (statistical) ineffective fault analysis [Cla07, DEK+18, DEG+18], collision fault analysis [BK06, Hem04], statistical fault attacks [FJLT13, DEK+16], fault intensity analysis [LSG+10], persistent fault attack [ZLZ+18], fault template attack [SBR+20], fault correlation analysis [SMC21], and other fault attacks [JSC+14, DPdC+15, BHJ+18, TLG+15, SHS16, HS13, GKPM18, MGV08, PAM19, RLK11]. For fault analysis on AES, the fault can be induced in the round counter [CT05, AK97] to reduce the number of rounds, or in the internal state during a round [PQ03, DLV03, BS03, Gir04] to exploit the confusion and diffusion characteristics of the fault, or in the key schedule [CY03, Gir04, TFY07, DV12]. There are several fault models in [RSG21]. A less common model is the random bit fault model [BS03], which allows the adversary to flip one particular bit. A more common model is the random byte fault model [PQ03]. DFA [BS97] exploits the difference between correct and faulty ciphertexts by injecting a fault in the internal state of the last several rounds. With the correct and faulty ciphertext pairs, the adversary can retrieve the secret key by cryptographic analysis. The key point of DFA is that it allows the adversary to analyse a small number of rounds of a block cipher. DFA has been widely applied to attacks on DES [Hem04, Riv09], AES [PQ03, DLV03, BS03, Gir04, MSS06, DFL11, YSW18, SHS16], PRESENT [SGSS14, GYS15, PBMB17] and others [FT09, TBM14]. The countermeasures against DFA include cipher-level or mode-level designs (e.g., FRIET [SBD+20], CRAFT [BLMR19], DEFAULT [BBB+21], and others [MSGR10, MPR+11, DKM+15]) and implementation-level methods [LRT12]. A widely used implementation-level countermeasure against DFA is to perform the computation twice and check whether the same result is obtained [MSY06, ML08, JMR07, BBK+10].

Differential Fault Attack
Since DFA against block ciphers usually targets the last few rounds, one does not need to protect the whole cipher, thus saving computation time [MSY06, CFGR10, BBK+10]. However, the number of rounds to protect must be chosen carefully in order to prevent security flaws. To determine how many rounds to protect, one has to know how many rounds DFA can work on. At CHES 2009, Rivain [Riv09] studied DFA against DES by introducing faults at the end of round R − 7, R − 6, R − 5 or R − 4. For AES, most DFA techniques [PQ03, Gir04, Muk09] work by inducing faults at the end of round R − 3, R − 2 or R − 1. Therefore, protecting the last and the first three rounds of AES against DFA is usually suggested [CFGR10]. At CHES 2011, Derbez, Fouque, and Leresteux [DFL11] introduced Meet-in-the-Middle (MitM) and impossible differential fault analysis on AES, inducing faults at the end of round R − 4 for the first time, which broke implementations protecting only the last three rounds.
The Meet-in-the-Middle (MitM) approach is a time-memory trade-off cryptanalysis technique, which can be traced back to Diffie and Hellman's attack on DES [DH77]. The basic idea is to divide the key space into two independent subsets (also known as neutral sets), and then find matches from the two subsets. Let $E_K(\cdot)$ be a block cipher with an $n$-bit block size, such that $C = E_K(P) = F_{K_2}(F_{K_1}(P))$, where $K = K_1 \| K_2$ has $n$ bits, and $K_1$ and $K_2$ are neutral key materials of $n/2$ bits each. For a given plaintext-ciphertext pair $(P, C)$, a naive exhaustive search attack needs a time complexity of $2^n$ to find the key. However, the [...] for solving problems like Mixed Integer Linear Programming (MILP), Satisfiability (SAT), Satisfiability Modulo Theories (SMT), or Constraint Programming (CP), better cryptanalytic results have been achieved in topics including differential/linear attacks [MWGP11, SHW+14, KLT15], impossible differential attacks [ST17, CCF+21, SGL+17], and cube or integral attacks based on division properties [XZBL16, TIHM17].
MILP is a method frequently used in business and economics to solve optimization problems. It deals with optimizing a linear objective function $f(x_1, x_2, \ldots, x_n)$ subject to linear inequalities involving the variables $x_i$, $1 \le i \le n$. The first attempt to apply the MILP model to cryptanalysis was the determination of the minimum number of differentially active S-boxes for AES by Mouha et al. [MWGP11]. They assigned a Boolean variable $x_i$ to each S-box, where $x_i = 1$ means the $i$-th S-box is active. The variables are restricted by linear inequalities, which are derived from the differential propagation properties of each operation of AES. For example, given the XOR operation $x_i \oplus x_j = x_k$, if $x_k = 1$ (active), at least one of the $i$-th and $j$-th S-boxes is active, i.e., $x_i + x_j \ge 1$. At EUROCRYPT 2021, Bao et [...] [BDF11] proposed an ad-hoc automatic tool to search DFAs on round-reduced AES. Their tool mainly used C++ programs to exhaust all possible DFA attacks on AES by exploiting its details and properties. Applying their tool to other block ciphers does not seem to be trivial, and many recent DFAs on other block ciphers are still done manually [CZS16, KAKS22, VSBM20]. In this paper, we propose an MILP modelling method for MitM DFAs, which is general for some popular block ciphers.
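The XOR rule above can be checked by brute force. The sketch below assumes the usual encoding attributed to [MWGP11], with one dummy binary variable $d$ and the inequalities $x_i + x_j + x_k \ge 2d$, $d \ge x_i$, $d \ge x_j$, $d \ge x_k$; the function names are illustrative and not taken from the paper's code. It enumerates all activity patterns and confirms that the feasible ones are exactly those in which the number of active cells is not one.

```python
from itertools import product

def xor_constraints_hold(xi, xj, xk):
    """True if some dummy d in {0, 1} satisfies the assumed XOR inequalities."""
    return any(
        xi + xj + xk >= 2 * d and d >= xi and d >= xj and d >= xk
        for d in (0, 1)
    )

def truncated_xor_possible(xi, xj, xk):
    """A cell-level XOR pattern is impossible only if exactly one cell is active."""
    return xi + xj + xk != 1

# Activity patterns (x_i, x_j, x_k) accepted by the linear constraints.
feasible = [p for p in product((0, 1), repeat=3) if xor_constraints_hold(*p)]

if __name__ == "__main__":
    for pattern in feasible:
        print(pattern)
```

In particular, the pattern $(0, 0, 1)$ is rejected, which is the statement that $x_k = 1$ forces $x_i + x_j \ge 1$.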
Our contributions. In this paper, we introduce an automatic search model based on MILP for good or optimal MitM differential fault analysis, which is then successfully applied to several popular lightweight block ciphers. Under the fundamental assumptions of DFA, we can inject a random byte (nibble) fault into the internal state of the block cipher, and the location of the fault in the state can be controlled. We can obtain both correct and faulty ciphertexts. Under these assumptions, we generalize the MitM differential fault analysis, and translate the problems into MILP models by modelling the location and propagation of the differential faults, the matching point with a differential relationship, the propagation of the two neutral sets, and the objective functions.
Since many lightweight block ciphers adopt a non-MDS (non-Maximum-Distance-Separable) matrix layer (e.g., SKINNY [BJK+16]), we develop the matching rules for non-MDS operations, which include the modellings for the differential equations and the propagation of the two neutral sets. We develop the propagation rules of the two neutral sets, which dominate the overall time complexity and should be balanced and minimized. When optimizing the computational complexity of the DFA attacks, the number of faults we need to inject is also one of the parameters to evaluate the strength of the attacks, and it is one of the factors we consider in our objective function. To keep our DFA practical, we perform multiple MitM DFAs to recover the full key, while each MitM DFA only recovers a fraction of the key bits. Therefore, we develop the objective function to balance and minimize the time complexities of different MitM DFAs. Our tool is generic: it consists of linear inequalities for the matching part, the propagation of differential faults, the propagation of neutral sets, and the objective function. Implementing the model for a different block cipher thus amounts to replacing the linear inequalities for each part.
As applications, we show better DFAs for SKINNY, CRAFT, QARMA, PRINCE, PRINCEv2, and MIDORI found by our tools in Sect. 4, where the positions of fault injections can be 1 to 9 rounds earlier than in the best previous attacks. We also use our model to search for DFAs on AES-128/-192/-256, but only get improvements to the secondary steps of Derbez et al.'s attacks [DFL11] on AES-192 and AES-256. In [DFL11], Derbez et al. extended the DFA attacks to AES-192 and AES-256. In the first step, they use the MitM approach to recover the subkey in the last round, and in the second step they recover the remaining key bytes following the idea of Piret and Quisquater [PQ03]. The time complexity of the second step is about $2^{40}$. In Section 4.6, we give an improvement on the second step: instead of employing Piret and Quisquater's method, we use another step of MitM attack. The time complexity of our improved step is $2^{10.6}$. However, the overall time complexity of the DFA is dominated by the first step, so the time complexity of our attack is the same as Derbez et al.'s. We summarise our improved differential fault attacks in Table 1. The attacked "Round" means the number of non-linear layers between the position of the fault injection and the ciphertext. More rounds attacked means that more rounds need to be protected when implementing the cipher on devices. The columns "Round (Previous)" and "Round (Ours)" list the best results of previous DFAs and our improved ones. The "Time" and "Memory" columns show the time and memory complexities of our attacks, and the "# of faults" column lists the number of byte (nibble) faults that need to be injected in our attacks. Our model and source codes are publicly available at https://github.com/yqy-yu/MITM-DFA.

Comparisons with previous MILP-based MitM attacks.
In mathematical cryptanalysis, MILP-based MitM automatic models [BDG+21, DHS+21, SS22] have been proposed. These models do not need to model the differential fault propagation. Therefore, their matching point is simpler and only related to the propagation of the two neutral sets, while our modelling of the matching point has to consider the propagation of both the neutral sets and the differential fault. Since our MitM DFA targets practical attacks, the sizes of the two neutral sets are kept small, i.e., one MitM attack only recovers a small fraction of the full key. To recover the full key, we perform multiple MitM DFAs. Therefore, the overall time complexity is dominated by the MitM attack with the highest complexity, whereas previous models [BDG+21, DHS+21, SS22] only consider one MitM attack. Therefore, the objective function is different.
1 The ability in [KAKS22] is stronger, where they need to inject one-bit faults and know which bit is injected.

Differential Fault Analysis using MitM
At CHES 2011, Derbez et al. [DFL11] proposed a differential fault analysis on AES using the MitM method, shown in Figure 1. Note that $R = 10$ for AES-128. They induced a random byte fault at the end of round $R - 4$. For AES-128, according to the differential propagation rules, they obtain several differential equations on the state at the beginning of Round 9 (i.e., $X_9$) between the correct and faulty states, where $X_9[i]$ denotes the $i$-th byte in the correct state $X_9$, $X'_9[i]$ denotes the $i$-th byte in the faulty state $X'_9$, and $I \in \mathbb{F}_2^8$.

[Figure 1 legend: Active Cell; Inactive Cell; Values for matching; Subkey cells need not be guessed. Panel: Other matching patterns for column 1 of $X_{R-1}$.]

These differential equations provide a filter on subkey bits by constructing relationships, e.g., Equ. (2). Using $C$ to denote the ciphertext state, the state $X_9$ can be represented as in Equ. (3). Denote $K_0$ as the whitening key, $K_i$ ($i = 1, 2, \ldots, 10$) as the subkey of Round $i$ and [...] When computing $X_9[0]$ with Equ. (3), 5 key bytes, i.e., $K_{10}[0, 7, 10, 13]$ and $U_9[0]$, need to be guessed. Similarly, to compute $X_9[1]$, another 5 key bytes also need to be guessed, i.e., $K_{10}[3, 6, 9, 12]$ and $U_9[13]$. Since $X_9[0]$ and $X_9[1]$ are computed independently with two independent key subsets, we call the two key subsets neutral key sets. For a given pair of correct and faulty ciphertexts, the relationship in Equ. (2) has to be satisfied, which acts as a filter of $2^{-8}$ for the subkey space of 10 bytes involved in Equ.

Description of SKINNY
For AddRoundTweakey (ART), the first and second rows of the subtweakey $STK$ are XORed to the internal state. Initially, [...] The $STK$ for each round is obtained by the tweakey update function, which consists of two parts: first, a permutation $P_T = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]$ is applied to the tweakey arrays; then, different LFSRs are used to update the first and second rows of $TK_2$ and $TK_3$ if $t = 2n$ or $t = 3n$.

3 Programming the MitM-DFA with MILP

Formulate the MitM Differential Fault Analysis
In this section, we show the specific modelling schemes for SKINNY. To facilitate the visualization of our analysis, each cell can take one of five colors (White, Gray, Red, Blue, and Orange) according to certain rules, and a valid coloring scheme in our model corresponds to a configuration of the MitM differential fault analysis. The semantics of the colors are listed as follows:
• White (W): Inactive in forward differential propagation.
• Blue (B): Cells known by guessing the key cells in $Set_1$.
• Red (R): Cells known by guessing the key cells in $Set_2$.
For SKINNY with $R$ rounds, we set the round of fault injection as $R_s$, and use $R_m$ to denote the round of the matching point. For each internal state $A$ in the rounds $R_s$ to $R_m - 1$, we introduce the binary variable $a^A_i$ to denote whether the $i$-th cell of state $A$ is active, i.e., $a^A_i = 1$ means an active cell (Gray) and $a^A_i = 0$ means an inactive cell (White). For the states in the rounds $R_m$ to $R$, we introduce two additional binary variables $b^A_i$ and $c^A_i$ [...] For rounds from $R_s$ to $R$, the propagation of Gray and White cells depends on the differential propagation. For rounds from $R$ to $R_m$, the propagation of Red, Blue and Orange cells relies on the backward computation rules.

Programming the MILP Model
In this section, we show how to build constraints for each component and how to solve the model for specific block ciphers. Using SKINNY as an example, we show the details below.

Constraints for the states from Round R s to Round R m − 1
According to the differential property of SKINNY, we model the constraints on active cells before and after the SC, SR and MC operations. Use $X_r$, $Y_r$ and $Z_r$ to denote the state before the SC, SR and MC operations in round $r$, respectively. Let $STK_r$ denote the tweakey state used in round $r$. Since the ARK and AC operations do not affect the propagation of the differential, we only list the rules for the SC, SR and MC operations below.
• SC: Because we only consider cell-level truncated differential propagation, for each S-box, the output difference is nonzero if and only if the input difference is nonzero. The constraint for SC is $a^{Y_r}_i - a^{X_r}_i = 0$, $\forall\, 0 \le i \le 15$. We add the constraint $\sum_i a^{X_{R_s}}_i = 1$ to make sure that a random fault is injected into exactly one cell at the starting point.
• MC: For each column $j$ of MC, we build constraints according to the following rules, where $0 \le j \le 3$: [...] Following the method of [SHW+14], we convert the above equations to inequalities.
For the matching point $X_{R_m} = \mathrm{MC}(Z_{R_m - 1})$, the possible matching rules are given as follows (also in Figure 4):
• Rule-1: With input truncated differential form 1*00, the 0-th, 1-st and 3-rd cells of the output difference are possible for matching.
For example, for an input column of this form, $\Delta B[0] = \Delta B[1] = \Delta B[3]$ holds according to the property of MC of SKINNY. Therefore, if $\Delta B[0]$ can be computed by the key subset marked Blue and $\Delta B[3]$ can be computed by the other key subset marked Red, then a MitM attack can be performed with these two neutral key subsets and the relation $\Delta B[0] = \Delta B[3]$, i.e., a matching point exists. We require that only the three cells $B[0, 1, 3]$ can be marked Blue or Red. In other words, if this column has both Blue and Red cells, there exists one matching equation.
• Rule-2: With input truncated differential form 1*10, the 0-th and 3-rd cells of the output difference are possible for matching.
• Rule-3: With input truncated differential form 1*01, the 1-st and 3-rd cells of the output difference are possible for matching.
• Rule-4: With input truncated differential form 0010, the 0-th, 2-nd and 3-rd cells of the output difference are possible for matching.
• Rule-5: With input truncated differential form 0110, the 0-th and 3-rd cells of the output difference are possible for matching.
• Rule-6: With input truncated differential form 0011, the 2-nd and 3-rd cells of the output difference are possible for matching.
We introduce binary variables $match_i$ ($0 \le i \le 3$), one for each column. If there exist at least one Blue cell and one Red cell in the $i$-th column, then $match_i = 1$; otherwise $match_i = 0$ or the model is infeasible. This can be done with the constraints [...], for $0 \le j \le 3$. We call $DoM = \sum_i match_i$ the degree of match, and add the constraint $DoM = 1$ to make sure that one differential equation is utilized as a filter in each injection phase.
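The six matching rules can be reproduced by exhaustive enumeration. The sketch below assumes the binary MixColumns matrix of SKINNY with rows (1,0,1,1), (1,0,0,0), (0,1,1,0), (1,0,1,0) and 4-bit cells; the matrix, the helper names and the pattern notation ("1" nonzero, "0" zero, "*" arbitrary) are assumptions of this illustration and are not taken from the authors' code. For each input pattern it collects the output cells that are always pairwise equal and not identically zero.

```python
from itertools import product

# Assumed binary MixColumns matrix of SKINNY (rows give the output cells).
MC_MATRIX = [(1, 0, 1, 1), (1, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 0)]

def mix_column(column):
    """Apply the binary matrix to one column of cell differences (XOR sums)."""
    output = []
    for row in MC_MATRIX:
        acc = 0
        for coeff, cell in zip(row, column):
            if coeff:
                acc ^= cell
        output.append(acc)
    return output

def columns_of_form(pattern):
    """Enumerate all 4-bit columns matching a truncated form such as '1*00'."""
    choices = {"0": [0], "1": list(range(1, 16)), "*": list(range(16))}
    return product(*(choices[symbol] for symbol in pattern))

def matching_cells(pattern):
    """Output cells that are pairwise equal for every input and not always zero."""
    outputs = [mix_column(col) for col in columns_of_form(pattern)]
    cells = set()
    for i in range(4):
        for j in range(i + 1, 4):
            always_equal = all(out[i] == out[j] for out in outputs)
            nontrivial = any(out[i] != 0 for out in outputs)
            if always_equal and nontrivial:
                cells.update((i, j))
    return sorted(cells)

if __name__ == "__main__":
    for form in ("1*00", "1*10", "1*01", "0010", "0110", "0011"):
        print(form, matching_cells(form))
```

Under these assumptions, the cells returned for the six forms coincide with Rule-1 to Rule-6.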

Constraints for the states from Round R to Round R m
In this section, we show the constraints for the values known by guessing the two neutral key subsets in DFA on SKINNY. Use $b^A_i$ and $c^A_i$ to denote whether the value of the $i$-th cell in the state $A$ is known by guessing subkey cells in $Set_1$ and $Set_2$, respectively. The propagation rules of $b^A_i$ and $c^A_i$ are determined by the operations in the decryption direction. We list the rules for the different operations for $b^A_i$; the constraints for $c^A_i$ are the same.
• SC: If the output of SC is known, the input is known, i.e., $b^{X_r}_i - b^{Y_r}_i = 0$, $\forall\, 0 \le i \le 15$.
• ARK: For SKINNY, the first and second rows of the subtweakey array are XORed to the internal state. If a cell is XORed with the subtweakey state, we can deduce the value of the cell by guessing the corresponding cell in the subtweakey state; otherwise we need not guess any subtweakey cell. Thus we have the following constraints for ARK: [...]
• SR: For the SR operation, a cell permutation $P$ is applied to the internal state, i.e., [...]
• MC: For each column $j$, the known cells in the input column deduced from the output column follow the rules below, where $0 \le j \le 3$: [...] Following the method of [SHW+14], we convert the equations to inequalities.
According to these principles, we build constraints for internal states from round R to R m .

Objective Function and Solving Process
To properly evaluate the performance of our attacks, we take into account two factors: the number of faults required to inject and the time complexity of recovering the key cells for each injection.Since multiple fault injections may be necessary to fully recover the key, the solving phase is a multi-step process.
For SKINNY-n-n, the key schedule is linear, where a cell-wise permutation $P_T$ is applied in each round to update the tweakey matrix. Denote the master tweakey state as $tk$, where $tk_i$ represents the $i$-th cell of the state. To avoid guessing the same tweakey cell repeatedly across multiple rounds, we introduce $KB_i$ and $KR_i$ ($0 \le i \le 15$) to indicate whether each tweakey cell belongs to $Set_1$ or $Set_2$. We constrain the variables with [...]
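To illustrate why the same master tweakey cell reappears in several rounds, the following sketch tracks which cell $tk_i$ sits at each position after repeated applications of $P_T$. It assumes the convention that the updated array at position $i$ takes the old cell at position $P_T[i]$, that the first round uses the unpermuted array, and that only rows 0 and 1 (positions 0 to 7) enter the state; the LFSRs used for $t = 2n$ and $t = 3n$ are ignored. The function names are hypothetical.

```python
PT = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]

def tweakey_layouts(rounds):
    """layouts[r][i] is the master-cell index found at position i in round r (0-based)."""
    layout = list(range(16))
    layouts = []
    for _ in range(rounds):
        layouts.append(layout)
        # Assumed convention: new position i receives the old cell at position PT[i].
        layout = [layout[PT[i]] for i in range(16)]
    return layouts

def cells_added_to_state(layout):
    """Master cells XORed to the state: the first two rows of the tweakey array."""
    return layout[:8]

if __name__ == "__main__":
    for r, layout in enumerate(tweakey_layouts(4)):
        print("round offset", r, "uses tk cells", cells_added_to_state(layout))
```

Each master cell is thus added to the state every second round, which is why a single indicator per master cell (rather than per subtweakey cell) avoids counting a guess twice.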
We introduce the auxiliary variables $KG_i$ ($0 \le i \le 15$) to represent whether $tk_i$ has been guessed or not, which are initialized as $KG_i = 0$, $\forall\, 0 \le i \le 15$. After each fault injection and key recovery phase, we set $KG_i = 1$ if $tk_i$ has been obtained in the key recovery phase, i.e., we update the model according to the following equation after each solving phase: [...] Use $\overline{V}$ to denote the "NOT" operation for any variable $V$. In each solving phase, [...] denote the number of cells that we need to guess in each tweakey subset ($Set_1$ or $Set_2$), and $n_3 = \sum_{i=0}^{15} (\overline{KG_i} \wedge \overline{KB_i} \wedge \overline{KR_i})$ denotes the number of cells in the tweakey state that we have not guessed. For each recovery phase, the time complexity to recover the full tweakey state is about [...] to search for the attack with the optimal time complexity. To ensure our attack runs in practical time, we add constraints on $KB_i$ and $KR_i$. For example, we add constraints to make sure the time complexity of each filter phase is bounded by $2^{40}$.
To perform the key recovery attack with fewer fault injections, we aim to exhaust the matching degree that can be obtained from each injection. Thus, we fix the cell with a fault injection at the starting point to always have a difference, and repeatedly solve the model until there are no differential equations left that can be used for matching. That is, if we get $a^{X_{R_s}}_i = 1$ after the first solving phase, we add the constraint $a^{X_{R_s}}_i = 1$ to the model and run the optimizing phase again, until the model is infeasible. Then we remove the constraint $a^{X_{R_s}}_i = 1$. The results of each optimizing phase are output.

Applications
In this section, we show some applications of our automatic search model. We applied our algorithm to the SKINNY, CRAFT, QARMA, PRINCE, PRINCEv2, and MIDORI block ciphers. All the results are shown in Table 1.

DFA on SKINNY-n-n
By utilizing our automatic search model, we are able to develop better DFAs on SKINNY-n-n, with fault injections at the beginning of round $R - 8$. Denote the $i$-th cell in the master tweakey state as $tk_i$. In this section, we show the attack on SKINNY-128-128 step by step as an example.
• Injection 1: As shown in Figure 5, we first inject random byte faults at $X_{R-8}[13]$, and query for the correct and faulty ciphertext pairs. According to the matching rules we build for SKINNY, we get two differential equations that correspond to the filter phases Filter 1 and Filter 2, i.e., Filter 1 uses the differential equation $\Delta X_{R-2}[1] = \Delta X_{R-2}[13]$ to filter the wrong key bytes and Filter 2 uses $\Delta X_{R-2}[4] = \Delta X_{R-2}[12]$ to do that.

[Figure legend: Active Cell; Inactive Cell; Determined by Set1 (Set2); Determined by Set1∩Set2; Subkey cells need not be guessed.]
[...] for all 6 pairs of ciphertexts with all possible hypotheses of $\{tk_3, tk_{10}, tk_{15}\}$. If there is an intersection with the index, the 6-byte key is potentially correct. We expect to retrieve one correct value of the 6 tweakey bytes $\{tk_2, tk_3, tk_9, tk_{10}, tk_{13}, tk_{15}\}$ due to the 6-pair match of $\Delta X_{R-2}[1] = \Delta X_{R-2}[13]$. The time complexity of this step is about $2^{24}$ and the memory cost is also about $2^{24}$.
Filter [...]
• Exhaustive Search: For the remaining 3 bytes of the tweakey that we have not guessed, we test all possible values and check against the plaintext and ciphertext. This step costs a time complexity of about $2^{24}$.
The overall time and memory complexities of our attack are about $2^{24}$, and we need to inject 9 random byte faults to recover the full key.
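The table-and-lookup structure of one filter phase can be sketched on a toy example. The code below is not SKINNY: it assumes a hypothetical two-branch toy cipher in which a fault injects the same nonzero difference into two nibbles, each nibble is then encrypted under its own 8-bit subkey, and the filter is the equality of the two back-computed differences over all pairs. The S-box is just an example 4-bit permutation, and all names are illustrative.

```python
import random

# Example 4-bit S-box (any permutation of 0..15 works for this toy).
SBOX = [0xC, 0x6, 0x9, 0x0, 0x1, 0xA, 0x2, 0xB,
        0x3, 0x8, 0x5, 0xD, 0x4, 0xE, 0x7, 0xF]
INV_SBOX = [SBOX.index(v) for v in range(16)]
ALL_SUBKEYS = [(b, c) for b in range(16) for c in range(16)]

def branch_encrypt(x, subkey):
    b, c = subkey
    return SBOX[SBOX[x] ^ b] ^ c

def branch_decrypt(y, subkey):
    b, c = subkey
    return INV_SBOX[INV_SBOX[y ^ c] ^ b]

def collect_pairs(key1, key2, count, rng):
    """Correct/faulty ciphertext pairs; the fault adds the same delta to both nibbles."""
    pairs = []
    for _ in range(count):
        u, v = rng.randrange(16), rng.randrange(16)
        delta = rng.randrange(1, 16)
        correct = (branch_encrypt(u, key1), branch_encrypt(v, key2))
        faulty = (branch_encrypt(u ^ delta, key1), branch_encrypt(v ^ delta, key2))
        pairs.append((correct, faulty))
    return pairs

def signature(pairs, side, subkey):
    """Back-computed differences on one branch for every pair, under one subkey guess."""
    return tuple(branch_decrypt(c[side], subkey) ^ branch_decrypt(f[side], subkey)
                 for c, f in pairs)

def mitm_filter(pairs):
    """Store the Set1 side in a table, then look up each Set2 guess."""
    table = {}
    for k1 in ALL_SUBKEYS:
        table.setdefault(signature(pairs, 0, k1), []).append(k1)
    survivors = []
    for k2 in ALL_SUBKEYS:
        for k1 in table.get(signature(pairs, 1, k2), []):
            survivors.append((k1, k2))
    return survivors

rng = random.Random(2023)
true_key1, true_key2 = (0x3, 0xA), (0x7, 0x1)
pairs = collect_pairs(true_key1, true_key2, 8, rng)
survivors = mitm_filter(pairs)

if __name__ == "__main__":
    print(len(survivors), "candidate key pairs survive the filter")
```

The two loops cost $2 \cdot 2^8$ branch evaluations per pair instead of the $2^{16}$ of a joint guess, which is the same time-memory trade-off exploited by the filter phases above.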
For SKINNY-64-64, the process is similar; only the positions of the guessed tweakey cells change because of the different number of encryption rounds. Moreover, we can inject faults at the beginning of round $R - 9$ to get a more powerful differential fault analysis.
Similar to the attack on SKINNY-128-128, we inject random faults at $X_{R-9}[13]$ and get the differential equation $\Delta X_{R-3}[1] = \Delta X_{R-3}[13]$. Then we guess the tweakey nibbles in $Set_1$ and $Set_2$ to get the values of the left and the right side of the equation respectively, where $Set_1 = \{tk_2, tk_5, tk_9, tk_{11}, tk_{12}, tk_{15}\}$ and $Set_2 = \{tk_6, tk_7, tk_{10}, tk_{13}, tk_{15}\}$. We can utilize 10 pairs of correct and faulty ciphertexts with differences in $X_{R-9}[13]$ to reduce the number of possible values of the 10 tweakey nibbles. We choose 10 random plaintexts and inject random faults at $X_{R-9}[13]$. This phase costs about $2^{24}$ time and $2^{20}$ memory. For the remaining 6 nibbles $\{tk_0, tk_1, tk_3, tk_4, tk_8, tk_{14}\}$, we exhaustively search and test with the correct plaintext and ciphertext, which costs $2^{24}$ encryptions.
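The bookkeeping behind these figures can be checked with a few lines. The sketch assumes 4-bit cells, that the guessing time is governed by the larger set and the table size by the smaller one (an interpretation, not a statement from the text), and uses hypothetical variable names.

```python
NIBBLE_BITS = 4  # SKINNY-64-64 works on 4-bit cells

SET1 = {2, 5, 9, 11, 12, 15}
SET2 = {6, 7, 10, 13, 15}

guessed = SET1 | SET2                         # tweakey nibbles touched by the filter
remaining = sorted(set(range(16)) - guessed)  # nibbles left for exhaustive search

# log2 of the number of guesses for each side and for the final search.
log2_guess_set1 = NIBBLE_BITS * len(SET1)
log2_guess_set2 = NIBBLE_BITS * len(SET2)
log2_exhaustive = NIBBLE_BITS * len(remaining)

if __name__ == "__main__":
    print("guessed nibbles:", sorted(guessed))
    print("remaining nibbles:", remaining)
    print("log2 costs:", log2_guess_set1, log2_guess_set2, log2_exhaustive)
```

The shared nibble $tk_{15}$ explains why the two sets of sizes 6 and 5 cover only 10 distinct nibbles.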

Simulation results
We perform the simulation experiments according to our methods in Sect. 4.1.1. We randomly choose keys and repeat the attack process 1000 times. In each experiment, we vary the number of injected faults in each filter step and calculate the average number of key candidates after filtering. Our results are summarized in Figure 6, where the x-axis represents the number of fault injections in each filter step, and the y-axis represents the average number of remaining key candidates across 1000 simulations. For example, the red triangle point with the x-axis coordinate of 3 represents the number of remaining keys after filtering by 3 pairs of ciphertexts in Filter 2, which is $2^{58.2}$. In our experiments, with all 9 fault injections, the number of remaining key candidates after Filter 1-4 is always $2^{24}$, which validates that our attack is effective. All simulations are performed on a PC using C code, and each key recovery operation takes a few minutes. Our codes are available at https://github.com/yqy-yu/MITM-DFA.

DFA on SKINNY-n-2n
The attack in Sect. 4.1.1 targets SKINNY-n-n, where only $TK_1$ is used. SKINNY also supports versions with $t = 2n$ and $t = 3n$. In the TWEAKEY framework, users can choose what part of the tweakey serves as input key material or tweak material, and $TK_1$ is recommended for processing the public tweak material. In this section, we consider the case of SKINNY-n-2n, where $TK_1$ is used for the tweak and $TK_2$ is used for key material.
According to our injection model in Figure 5, we can extract L R−3 5,7] in Filter 1, where L 1 and L 2 denote the linear operations on T K 1 and T K 2 .L 1 only represents the permutation P T whereas L 2 denotes the combination of P T and LFSR on T K 2 .Since the tweak T K 1 can be seen as public, we can compute 5,7] straightforwardly from Filter 1. Then we can determine the corresponding bytes in T K 2 by inverting the LFSR and the permutation P T round by round.The same for other filter phases.
In practice, the security does not rely on the secrecy of the tweak material, and keeping the tweak secret is also unrealistic. So we can retrieve the key material of SKINNY-n-2n in the same way as for SKINNY-n-n. This indicates that the tweak material does not provide any extra protection against fault attacks if it is kept fixed.

Description of CRAFT
CRAFT [BLMR19] is a lightweight tweakable block cipher with a 64-bit block size, a 128-bit key K, and a 64-bit tweak T .The cipher's internal state can be represented as a 4 × 4 square array of nibbles or as a 16-nibble vector by concatenating the rows of the square array.Use the index 4 × i + j to denote the nibble at row i and column j of the 4 × 4 array, where 0 ≤ i ≤ 3, 0 ≤ j ≤ 3.
For CRAFT, each round function applies five involutory round operations: SubSbox (SB), MixColumn (MC), PermuteNibbles (PN), AddConstant (ARC) and AddTweakey (ATK). The [...]
[Figure legend: Determined by Set1 (Set2); Determined by Set1∩Set2; Subkey cells need not be guessed.]
In the key schedule, four 64-bit tweakeys $TK_0, TK_1, TK_2, TK_3$ are derived from $K = (K_0 \| K_1)$ and $T$ as $TK_0 = K_0 \oplus T$, $TK_1 = K_1 \oplus T$, $TK_2 = K_0 \oplus Q(T)$, $TK_3 = K_1 \oplus Q(T)$, where $Q$ is a permutation and $TK_{(i-1) \bmod 4}$ is applied in the $i$-th round.
When guessing the key cells, we can choose the nibbles in equivalent subkeys, i.e. we can adjust the operations in round function as

DFA on CRAFT
CRAFT [BLMR19] is designed to be efficiently protected in its implementations against differential fault analysis. However, we assume that the target implementation of CRAFT does not use such countermeasures, as studied in [RVB22]. Denote the state at the beginning of round $r$ as $X_r$, the state after MC as $Y_r$, and the internal state after PN as $Z_r$. Further, $A[i]$ denotes the $i$-th nibble of the state $A$, where $0 \le i \le 15$.
• Injection 1: As shown in Figure 7, [...] there is about 1 value of the key nibbles remaining. We need to inject 11 random nibble faults. The time complexity is about $2^{36}$ and the memory cost is about $2^{12}$.
• Injection 3: Then we inject random nibble faults at $Z_{R-10}[6]$ and filter the subkey nibbles with 4 pairs of correct and faulty ciphertexts. The time cost is $2^{16}$.
• Injection 4: Finally, we inject random nibble faults at $Z_{R-10}[4]$ and filter the subkey nibbles with 4 pairs of correct and faulty ciphertexts. We then exhaustively search the remaining four subkey nibbles; the time cost is $2^{16}$.
The overall time complexity of the attack is about $2^{36}$, and the memory cost is about $2^{12}$. The number of faults we need to inject in all phases is 11 + 9 + 4 + 4 = 28.

Description of QARMA
QARMA [Ava17] is a family of lightweight tweakable block ciphers. It supports two block sizes, $n = 64$ and $n = 128$, denoted by QARMA-64 and QARMA-128. Each version uses a master key of $2n$ bits. QARMA uses an Even-Mansour scheme with a keyed pseudo-reflector. The first $r$ rounds of the cipher use the forward round function $R(IS, tk)$, which is composed of four operations in order: AddRoundTweakey, ShuffleCells ($\tau$), MixColumns ($M$) and SubCells ($S$). The $(r + 1)$-th round function omits the ShuffleCells and MixColumns operations. The last $r$ rounds use the backward round function $\overline{R}(IS, tk)$, which is the inverse of $R(IS, tk)$. The round function used in the first and last rounds omits ShuffleCells and MixColumns. The matrix $M$ used in MixColumns is defined as [...] where $M_4$ is used for QARMA-64 and $M_8$ is used for QARMA-128. $\rho$ can be seen as a simple circular left rotation of the bits. The $2n$-bit key $K$ is partitioned as $\omega_0 \| k_0$. In encryption, $k_1 = k_0$ is used for each round and the whitening keys $\omega_0$ and $\omega_1 = o(\omega_0)$ are added at the beginning and the end, respectively. In the key-recovery phase, we guess the equivalent key $M(\tau(k_0))$.

DFA on QARMA-64
Because the matrix $M$ used in MixColumns is different in each version, QARMA-64 and QARMA-128 have different differential matching rules. In this section, we show how to achieve the key recovery attack on QARMA-64. Part of the attack is shown in Figure 8.
• Injection 1: [...] with 8 pairs of correct and faulty ciphertexts.
• Injection 2: Inject random nibble faults at [...] with 8 pairs of correct and faulty ciphertexts.
• Injection 3: Inject random nibble faults at [...] with 4 pairs of correct and faulty ciphertexts.
• Injection 4: Inject random nibble faults at [...] with 4 pairs of correct and faulty ciphertexts.
• Injection 5 and Injection 6 respectively inject random faults at [...] and Injection 6 filters [...]. Each filter phase needs 2 pairs of correct and faulty ciphertexts.
After the six injection phases, we exhaustively search for the remaining 4 subkey nibbles $M(\tau(k_0))[2, 7, 8, 13]$ and retrieve the subkey $k_0$ by the linear transformation $\tau \circ M$. Then we compute $\omega_1$ from $k_0$ and $k_0 \oplus \omega_1$. The overall time complexity of our attack is about $2^{16}$ and the memory complexity is about $2^{16}$. It requires the injection of 28 nibble faults.

DFA on QARMA-128
In this section, we show the process of our fault attack on QARMA-128 with $2^{32}$ time complexity and $2^{32}$ memory complexity.
• Injection 1: We inject random faults at $X_{R-3}[15]$ and get the following equation system: [...] Each equation provides a one-byte filter for the involved subkey bytes. For example, we choose 8 plaintexts and inject random faults to get the correct and faulty ciphertexts. We can then guess the subkey bytes [...] [14] for the 8 pairs. By filtering with $\rho^3 \cdot \Delta Z_{R-2}[5] = \Delta Z_{R-2}[9]$, we can retrieve the value of the 8 subkey bytes.
• Injection 2: Then we inject random faults at $X_{R-3}[6]$ and get 2 differential equations for a further filter of subkey bytes, [...] At most one unknown subkey byte is involved in each side of each equation. So we choose 2 plaintexts and inject faults to get the correct and faulty ciphertexts. We expect to get the right value of $M(\tau(k_0))[6, 7, 9]$.
Then we exhaustively search for the remaining 4 subkey bytes $M(\tau(k_0))[3, 4, 5, 12]$ to retrieve the full key. The overall time complexity of this attack is about $2^{32}$ and the memory cost is $2^{32}$. The number of faults we need to inject in the attack is 10.

Description of PRINCE
PRINCE [BCG+12] is a lightweight block cipher that follows the FX-construction with a 64-bit block size and a 128-bit key size. The 128-bit key $K = K_0 \| K_1$ is split into two 64-bit parts. $K_0$ is used as a whitening key and $K_1$ is used as the round key for the core of the structure, named PRINCEcore. The round function $R$ of PRINCEcore applies an S-box layer SB, followed by a linear layer consisting of a MixColumns operation MC and a ShiftRows operation SR. SR is the same as in AES, and the matrices in the MC-layer are built from four matrices. In the MC-layer, the bit-wise matrix $M_0$ is multiplied with the first and the last columns of the internal state, and $M_1$ is multiplied with the second and the third columns.
Figure: Other matching patterns of $Z_{R-1}$.
The last five rounds apply $R^{-1}$, the inverse of the forward round function, and the middle layer applies $R'$. The three functions are denoted $R$, $R'$ and $R^{-1}$.
PRINCEv2 [BEK+20] modifies the middle layer and the key schedule of PRINCE. The 128-bit key is split into $K = (K_0 \| K_1)$, and the two halves are used alternately in the round functions. The subkeys used in the last two rounds are $K_0 \oplus \alpha$ and $K_1 \oplus \beta$, where $\alpha$ and $\beta$ are constants. The process for retrieving the full key is similar to that for PRINCE.

DFA on PRINCE
We can recover the full key of PRINCE with random faults injected at $Z_{R-3}[0]$. As shown in Figure 10, we can build a system of differential equations according to the differential characteristics. We guess $\{K_1[11], (K_0 \oplus K_1)[1, 5, 9, 13]\}$ to compute $\Delta Z_{R-1}[11]$ and filter the values of the subkey nibbles involved on the other side of the matching equation. This equation provides a 2-bit filter for each correct and faulty ciphertext pair. With 10 pairs of correct and faulty ciphertexts, we get 20-bit information on these 10 subkey nibbles.
Similarly, we can get 20-bit information on other groups of subkey nibbles, and we can obtain further filters from the other equations in Equ. (7). Then we can build equation systems for the other columns of $\Delta Z_{R-1}$ to filter the subkey nibbles. We expect to get the correct value of $K_0 \oplus K_1$ and at least 48-bit information on $K_1$. We exhaustively search for the remaining possible values of $K_1$, which costs about $2^{16}$ time, and compute $K_0$ to retrieve the full key. The overall time and memory complexities are both about $2^{20}$, with 10 fault injections.
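The filter arithmetic used in these attacks can be checked with a short Python sketch. It only restates the counting argument: guessing $k$ key bits and applying a filter of $f$ bits per pair over $n$ pairs leaves roughly $2^{k - f \cdot n}$ candidates. The function name and the assumption that filters from different pairs are independent are ours, not part of the attack description.

```python
def log2_expected_candidates(key_bits, filter_bits_per_pair, pairs):
    """Base-2 logarithm of the expected number of surviving key candidates.

    Assumption: every pair filters independently, so the filter strengths add up.
    A result of 0 means that about one candidate (the correct one) remains.
    """
    return key_bits - filter_bits_per_pair * pairs


# PRINCE: 10 guessed subkey nibbles (40 bits), 2-bit filter per pair, 10 pairs.
prince_remaining = log2_expected_candidates(40, 2, 10)

# SKINNY example with 6 tweakey bytes (48 bits), a 2^-8 filter per pair, 6 pairs.
skinny_remaining = log2_expected_candidates(48, 8, 6)
```

With these inputs, `prince_remaining` is 20, in line with the 20-bit information stated above, and `skinny_remaining` is 0, which corresponds to about one remaining candidate.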

Description of MIDORI
MIDORI [BBI+15] is a family of low-energy block ciphers, which includes two versions with block sizes $n = 64$ and $n = 128$, i.e., MIDORI-64 and MIDORI-128. Both versions accept a 128-bit key $K$. For MIDORI-64, $K$ is denoted as two 64-bit parts $K = K_0 \| K_1$. The whitening key is $WK = K_0 \oplus K_1$ and the round key is $RK_r = K_{(r-1) \bmod 2} \oplus \alpha_r$ for $1 \le r \le 15$.
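The MIDORI-64 key schedule described above can be sketched in Python as follows. The function name is illustrative, and the round constants passed in the usage lines are placeholders, not the real MIDORI constants; only the formulas $WK = K_0 \oplus K_1$ and $RK_r = K_{(r-1) \bmod 2} \oplus \alpha_r$ are taken from the text.

```python
def midori64_round_keys(k0, k1, alphas):
    """Derive the whitening key and the 15 round keys of MIDORI-64.

    k0, k1 : the two 64-bit key halves as integers.
    alphas : list of 15 round constants alpha_1..alpha_15 (placeholders here).
    """
    if len(alphas) != 15:
        raise ValueError("expected 15 round constants")
    whitening_key = k0 ^ k1
    halves = (k0, k1)
    # Round r (1-based) uses K_{(r-1) mod 2} XOR alpha_r.
    round_keys = [halves[(r - 1) % 2] ^ alphas[r - 1] for r in range(1, 16)]
    return whitening_key, round_keys


# Usage with hypothetical all-zero constants, so the alternation is visible.
wk, rks = midori64_round_keys(0x0123456789ABCDEF, 0xFEDCBA9876543210, [0] * 15)
```

Because the two halves alternate, recovering the subkeys of two consecutive rounds is enough to determine both $K_0$ and $K_1$ once the constants are removed.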
Finally, we exhaustively search for the values of $MC^{-1}(K_0)[4, 15]$ and compute $K_0$ and $K_1$ to recover the full key. The overall time complexity is about $2^{16}$, the memory complexity is $2^{16}$, and the number of faults we need to inject is 15.
DFA on MIDORI-128. For MIDORI-128, the key used in the last two rounds is the same except for the constant addition. We can recover $K[0, 1, 3, 4, 5, 7, 8, 9, 11, 12, 13, 15]$ according to the filter of Equ. (9) with faults injected at $X_{R-4}[2]$. To compute the value of each side in Equ. (9), we need to guess 3 bytes of $K$, for example, $K[0, 1, 12]$ for $\Delta X_{R-1}[5]$. This phase can be done with 6 pairs of correct and faulty ciphertexts. Then, we inject faults at $X_{R-4}[9]$ and filter the subkey bytes according to Equ. (10). We expect to get the right value of the full key with a filter of 3 pairs. Thus, the overall time complexity of the attack on MIDORI-128 is about $2^{24}$, and the memory cost is also $2^{24}$, with 9 fault injections.

Extended Attack on AES-192
Figure: Other matching patterns of $X_{R-1}$.
The two correct-faulty ciphertext pairs needed in our new Step II can reuse the pairs from Step I, so no additional fault injections are needed here. However, since Step I dominates the overall time complexity of the DFA on AES-192, the overall time complexity of the full attack is the same as that of Derbez et al. [DFL11]. This process can also be applied to AES-256 to improve Step II.

Discussion
To achieve lightweight designs, many lightweight ciphers adopt sparser diffusion layers. For example, MIDORI, PRINCE and QARMA adopt the so-called $4 \times 4$ almost MDS layer. Note that the branch numbers (the smallest nonzero sum of active inputs and outputs of the matrix) of MDS and almost MDS matrices are 5 and 4, respectively. Therefore, ciphers with almost MDS layers need more rounds to achieve full diffusion (any input bit nonlinearly affects all the state bits). For example, MIDORI needs 3 rounds to achieve full diffusion while AES needs 2 rounds. Ciphers with more lightweight diffusion layers, like SKINNY and CRAFT, need 6 and 7 rounds, respectively, to achieve full diffusion.
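The branch number of a small binary diffusion matrix can be verified by brute force. The Python sketch below does this for the binary circulant matrix circ(0, 1, 1, 1) acting on 4-bit cells, which we assume here as a representative almost MDS matrix (it is the form usually associated with MIDORI). The function names are illustrative.

```python
from itertools import product

# Assumed almost MDS matrix: binary circulant circ(0, 1, 1, 1).
ALMOST_MDS = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]


def apply_binary_matrix(matrix, cells):
    """Multiply a 0/1 matrix with a column of cells; addition is XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, cell in zip(row, cells):
            if coeff:
                acc ^= cell
        out.append(acc)
    return out


def weight(cells):
    """Number of active (nonzero) cells."""
    return sum(1 for c in cells if c != 0)


def branch_number(matrix, cell_bits=4):
    """Minimum of active inputs plus active outputs over all nonzero inputs."""
    best = None
    for cells in product(range(1 << cell_bits), repeat=len(matrix)):
        if weight(cells) == 0:
            continue
        total = weight(cells) + weight(apply_binary_matrix(matrix, cells))
        best = total if best is None else min(best, total)
    return best
```

Under these assumptions, `branch_number(ALMOST_MDS)` evaluates to 4: an input with two equal active cells produces only two active output cells, which is the kind of sparse propagation that MitM DFA exploits.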
The main idea behind MitM DFA is to exploit the potentially low diffusion of the target ciphers. Weaker diffusion layers increase the number of rounds that need to be protected against DFAs. With the MitM approach, a differential matching rule (e.g., Eq. (2)) exists with higher probability after more rounds of diffusion for one-byte faults, and this rule acts as the matching condition that filters wrong key guesses. Besides, with weaker diffusion layers, the number of key bits involved in the two neutral key sets (e.g., $Set_1$ and $Set_2$ in Figure 2) is smaller. This number dominates the time complexity of the MitM DFAs, because the keys in $Set_1$ (and $Set_2$) must be enumerated to compute and filter with the differential equation (e.g., Eq. (2)). These are the reasons behind the attacks summarized in Table 1: for SKINNY and CRAFT, the MitM DFA can reach 9 and 10 rounds, but for MIDORI, PRINCE and QARMA, the attack only works on 4 or 5 rounds.

Countermeasures
The double-check mechanism is a common countermeasure against fault injection attacks [MSY06, ML08, JMR07, BBK+10]. The crucial operations of encryption devices, which are vulnerable to fault analysis, should run twice. If the results of the two executions match each other, the results are credible. However, this mechanism is always accompanied by a loss of efficiency. Our improved DFAs in Sect. 4 provide new insight into how many rounds of encryption devices need to be protected. For example, we suggest protecting at least the last 9 rounds of SKINNY-128-128. With the double-check mechanism, only these rounds are executed twice and the two results are compared before the ciphertext is released. SKINNY-128-128 has 40 rounds, so our fault detection scheme executes 49 rounds in total and requires about 49/40 of the computational resources of unprotected SKINNY-128-128.
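The round-count argument can be illustrated with a minimal Python sketch of the double-check mechanism. The round function `toy_round` is a hypothetical 16-bit permutation standing in for a real cipher round (it is not SKINNY), the `fault` parameter only simulates a bit flip for demonstration, and the random delay before the comparison is omitted.

```python
def toy_round(state, r):
    """Hypothetical 16-bit round permutation; a stand-in for a real round."""
    state = (state * 5 + r + 1) & 0xFFFF  # odd multiplier: bijective mod 2^16
    return state ^ (state >> 7)           # xorshift: also bijective


def protected_encrypt(plaintext, total_rounds=40, protected=9, fault=None):
    """Run the unprotected rounds once and the last `protected` rounds twice.

    fault: optional (run, round) pair; flips one bit after that round in that run.
    Returns (ciphertext or None, number of executed rounds).
    """
    state = plaintext
    executed = 0
    for r in range(total_rounds - protected):
        state = toy_round(state, r)
        executed += 1
    results = []
    for run in range(2):
        s = state
        for r in range(total_rounds - protected, total_rounds):
            s = toy_round(s, r)
            executed += 1
            if fault == (run, r):
                s ^= 1  # simulated single-bit fault
        results.append(s)
    # Release the ciphertext only if both executions agree.
    ciphertext = results[0] if results[0] == results[1] else None
    return ciphertext, executed


ct, rounds_run = protected_encrypt(0x1234)
faulty_ct, _ = protected_encrypt(0x1234, fault=(0, 35))
```

In this sketch `rounds_run` is 49 (31 unprotected rounds plus two executions of 9 rounds), and `faulty_ct` is `None` because a fault in only one of the two executions makes the results differ.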

Conclusion
In this paper, we present MILP-based automatic tools for MitM DFAs and apply them to SKINNY, CRAFT, QARMA, PRINCE, MIDORI and AES-192/256. We make full use of the differential matching rules and obtain better key-recovery attacks on these block ciphers in practical time. We achieve fault attacks with faults injected in earlier rounds, which implies that more rounds of the encryption devices of these ciphers should be protected against DFA.

Figure 4: Rule for matching state of SKINNY.
In [DFL11], Derbez et al. extended the DFA attack on AES-128 to AES-192. The DFA attack on AES-128 is stated in Sect. 2.1. The extended DFA attack on AES-192 is briefly given as follows; Figure 1 represents the last four rounds of AES-192.

3. After a random delay, check whether $Res_1 = Res_2$. If yes, output $Res_1$ as the ciphertext; otherwise, discard the result.

Table 1: Summary of DFA Results

In [BDF11], Bouillaguet, Derbez and Fouque introduced an ad-hoc and dedicated automatic tool to search for MitM DFAs on AES. In this paper, we try to build generic automatic models based on MILP for MitM DFAs to realize the optimal fault injection and key recovery phases.

Generalizations and Notations of MitM DFA. We use Round 1 to Round $R$ to denote the iterated round functions of a block cipher, and assume that we can inject random cell (byte or nibble) faults at a specified internal state to obtain correct and faulty ciphertexts. We denote the round in which we inject faults as Round $R_s$, and denote Round $R_m$ as the matching point if there exist differential equations that can act as a filter of subkeys, e.g., $\Delta S[i] = \Delta S[j]$, where $\Delta S[i]$ is determined only by key cells in the neutral set $Set_1$ and $\Delta S[j]$ is determined only by key cells in the neutral set $Set_2$. Thereafter, a MitM DFA is performed to recover the subkey.

Filter 1: The differential state $\Delta X_{R-2}[1]$ can be calculated with the correct and faulty ciphertext pairs by guessing the tweakey cells $\{tk_2, tk_9, tk_{13}\}$, and the differential state $\Delta X_{R-2}[13]$ can be calculated by guessing $\{tk_3, tk_{10}, tk_{15}\}$. Equ. (4) provides a filter of $2^{-8}$. Therefore, we require 6 pairs of correct and faulty ciphertexts with differences at $X_{R-8}[13]$ to recover the tweakey bytes $\{tk_2, tk_3, tk_9, tk_{10}, tk_{13}, tk_{15}\}$. We choose 6 plaintexts, inject random byte faults at $X_{R-8}[13]$ to get the corresponding ciphertexts, and guess the tweakey bytes to compute the differential states $\Delta X_{R-2}[1]$ and $\Delta X_{R-2}[13]$. After the filter of Equ. (4), about 1 possible value of $\{tk_2, tk_3, tk_9, tk_{10}, tk_{13}, tk_{15}\}$ remains. We use hash tables to reduce the complexity of this process. Specifically, we calculate the differential state $\Delta X_{R-2}[1]$ for the 6 pairs of correct and faulty ciphertexts, and store the corresponding tweakey bytes $\{tk_2, tk_9, tk_{13}\}$ indexed by the values $\Delta X^j_{R-2}[1]$ ($1 \le j \le 6$) of the 6 pairs. Then we calculate $\Delta X_{R-2}[13]$ for the 6 pairs and look up the hash table to find the matching candidates.

Filter 2: We use the differential equation $\Delta X_{R-2}[4] = \Delta X_{R-2}[12]$ to filter the tweakey bytes, similar to the process in Filter 1. With $\{tk_2, tk_9, tk_{13}\}$ already recovered in the previous phase, we guess $tk_{12}$ to compute the value of the differential state $\Delta X_{R-2}[4]$, and guess $\{tk_4, tk_{11}, tk_{12}\}$ to compute the value of the differential state $\Delta X_{R-2}[12]$. We build a hash table indexed by $\Delta X_{R-2}[4]$ for the 6 ciphertext pairs obtained in Filter 1 to filter the key bytes, after which about 1 candidate remains. The time complexity of this step is about $2^{24}$ and the memory cost is about $2^8$.

In the next stage, we inject random byte faults at $X_{R-8}[14]$ and filter the key bytes. This is a two-step process involving Filter 3 and Filter 4.
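The hash-table matching used in these filter steps can be sketched on a toy example in Python. Everything below is a simplified assumption made for illustration: a random 8-bit S-box replaces the real partial decryption, each neutral set is a single key byte instead of three tweakey bytes, and the fault is modelled as the same nonzero difference appearing in two internal cells, so that the two computed differences must be equal for the correct key.

```python
import random
from collections import defaultdict

_rng = random.Random(2024)
SBOX = list(range(256))
_rng.shuffle(SBOX)
INV = [0] * 256
for _x, _y in enumerate(SBOX):
    INV[_y] = _x


def toy_pairs(k1, k2, n):
    """Generate n (correct, faulty) toy ciphertext pairs under keys k1, k2."""
    pairs = []
    for _ in range(n):
        u, v = _rng.randrange(256), _rng.randrange(256)
        d = _rng.randrange(1, 256)  # same fault difference in both cells
        correct = (SBOX[u] ^ k1, SBOX[v] ^ k2)
        faulty = (SBOX[u ^ d] ^ k1, SBOX[v ^ d] ^ k2)
        pairs.append((correct, faulty))
    return pairs


def delta(c, cf, guess):
    """Difference before the S-box under a key guess (toy partial decryption)."""
    return INV[c ^ guess] ^ INV[cf ^ guess]


def mitm_match(pairs):
    """Return all (g1, g2) whose two differences agree on every pair."""
    table = defaultdict(list)
    # Forward side: index the guesses of the first neutral set by their differences.
    for g1 in range(256):
        index = tuple(delta(c[0], f[0], g1) for c, f in pairs)
        table[index].append(g1)
    # Other side: compute the differences for the second set and look them up.
    survivors = []
    for g2 in range(256):
        index = tuple(delta(c[1], f[1], g2) for c, f in pairs)
        for g1 in table.get(index, []):
            survivors.append((g1, g2))
    return survivors


candidates = mitm_match(toy_pairs(0x3A, 0xC5, 6))
```

The two key sets are enumerated separately (about $2 \cdot 2^8$ partial computations here instead of $2^{16}$ joint guesses), and the correct pair `(0x3A, 0xC5)` is always among `candidates` because both of its differences equal the injected fault difference.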