Practical Multiple Persistent Faults Analysis

We focus on multiple persistent faults analysis in this paper to fill existing gaps in its application in a variety of scenarios. Our major contributions are twofold. First, we propose a novel technique for applying persistent fault analysis in the multiple persistent faults setting that decreases both the number of survived keys and the required data. We demonstrate that by utilizing 1509 and 1448 ciphertexts, the number of survived keys after performing persistent fault analysis on AES in the presence of eight and sixteen faults can be reduced to only 2^9 candidates, whereas the best known attacks need 2008 and 1643 ciphertexts, respectively, with a time complexity of 2^50. Second, we develop generalized frameworks for retrieving the key in the ciphertext-only model. Our methods for both performing persistent fault attacks and key-recovery processes are highly flexible and provide a general trade-off between the number of required ciphertexts and the time complexity. To break AES with 16 persistent faults in the Sbox, our experiments show that the number of required ciphertexts can be decreased to 477 while the attack remains practical with respect to the time complexity. To confirm the accuracy of our methods, we performed several simulations as well as experimental validations on the ARM Cortex-M4 microcontroller, using electromagnetic fault injection on AES and LED, two well-known block ciphers, to validate the types of faults and the distribution of the number of faults in practice.


Introduction
Fault attacks are a class of physical attacks that consist of two phases: (1) fault injection and (2) fault analysis. In the first phase, the adversary tries to disturb the operation of the target device by using the available tools for injecting the desired fault. Faults can be induced by common methods such as clock glitches, voltage starving, voltage spikes, electromagnetic pulses, laser pulses, light pulses, hardware Trojans, and so on. In the second phase, the adversary analyzes the response of the target device to the fault with the aim of retrieving some sensitive information, such as the secret key.
Boneh et al. were the first to introduce fault attacks, with their application to RSA [BDL97]. Shortly after this seminal work, Biham and Shamir proposed Differential Fault Analysis (DFA) on the block cipher DES [BS97]. DFA is the most common fault analysis technique and has been successfully applied to various block ciphers. Besides, novel techniques have been proposed in follow-up works, such as Fault Sensitivity Analysis (FSA) [LSG + 10], Differential Fault Intensity Analysis (DFIA) [GYTS14], Safe-Error Analysis (SEA) [YJ00], Ineffective Fault Analysis (IFA) [Cla07], Statistical Fault Analysis (SFA) [FJLT13], and so on. The generated faults can generally be classified into three categories based on the fault duration. Most proposed fault attacks rely on transient faults, which have a temporary effect on the system. Permanent faults are another type of fault, with irreversible effects during the device's lifetime. The third type is the persistent fault, which falls between transient and permanent faults: the faulty value persists from one encryption to the next, while it disappears when the target device is reset.
The notion of persistent fault was first coined by Schmidt et al., who presented an attack on AES by erasing the non-volatile memory with ultraviolet light [SHP09]. This notion was later extended by Zhang et al. into a more dedicated framework [ZLZ + 18]. Persistent fault attack or analysis (PFA) has several advantages over previous fault attacks. The main advantageous features are as follows: 1) The attack only requires a set of ciphertexts, which is a less restrictive model than known- or chosen-plaintext models. 2) The fault injection does not require precise timing or location based on synchronization with the encryption process. 3) PFA takes advantage of its inherent characteristics to bypass some redundancy-based countermeasures.

Related Works
The original PFA proposed in [ZLZ + 18] has a number of limitations. The major limitation of the original attack concerns the assumption that the fault location and the value of the fault are known to the attacker. In a follow-up work at CHES 2020 [ZZJ + 20], it is shown that the assumption of having exact knowledge of the fault value can be relaxed. Besides, this work introduced a new framework based on maximum likelihood estimation (MLE) that led to a 28% reduction in the required ciphertexts compared to the original attack [ZLZ + 18]. However, this framework is only applicable if a single fault occurs, and it cannot be extended to the multiple-faults setting unless the attacker again knows the exact location and value of the fault. This issue was addressed in [ESP20]: Engels et al. presented a new attack called Statistical Persistent Fault Analysis (SPFA) by making use of statistical fault analysis (SFA) [FJLT13] in the persistent fault setting. While SPFA makes it possible to perform persistent fault analysis in the multiple-faults setting without requiring the location and value of the fault, the average residual key entropy is still high. For instance, the correct key of AES can only be found through an exhaustive search with a complexity of 2^50. Hence, the overall complexity of SPFA is dominated by the process of finding the correct key among possible candidates after performing the attack.
Aside from the aforementioned works on circumventing PFA constraints, several publications focused on other aspects, mainly demonstrating the application of the original attack to various ciphers and implementations. Pan et al. showed in [PZRB19] that PFA can break any higher-order masking countermeasure with only one persistent fault injection. Caforio and Banik studied the application of PFA for reverse-engineering purposes in the chosen-key model [CB19]. Gruber et al. applied PFA to authenticated encryption schemes such as Deoxys-II, OCB, and COLM [GPT19]. In [CGR20], the authors improved PFA by using several steps such as estimation theory, rank, and key-combination algorithms. Very recently, Xu et al. proposed Enhanced PFA (EPFA) [XZY + 21], a key-recovery approach in the single-fault model. In their work, they also exploit the information leaked in other rounds to reduce the number of required ciphertexts. However, an extension of this method to the multiple-faults setting is challenging, as we will discuss later. Besides, the improvement comes at the cost of a less feasible assumption in which the attacker knows the location of the faults.

Our Contributions
Persistent fault attacks published so far can be classified in different ways: 1) whether the value of the fault is known or unknown; 2) whether the attack can be efficiently extended to the multiple-faults setting; 3) whether the average residual key entropy (and consequently the overall complexity) is low or high; 4) whether finding the correct key among the remaining candidates (after performing the PFA) is considered in the ciphertext-only scenario or the known-plaintext scenario.
By considering the proposed attacks in light of the aforementioned classifications, a number of important open problems can be identified: 1. While multiple faults are a realistic scenario, there is no framework for applying PFA in the multiple-faults setting that is as efficient as in the single-fault setting and does not require knowledge of the value of the fault. It is important to note that multiple persistent faults may occur during practical fault injection, especially as the technology node shrinks faster than the fault injection capability.
2. Because PFA cannot uniquely retrieve the secret key, especially when a limited amount of data is available, it is critical to have an efficient method for obtaining the correct key in the ciphertext-only scenario rather than utilizing a pair of plaintext and ciphertext in the known-plaintext scenario. To the best of our knowledge, there is no efficient and generalized framework, applicable to all SPN ciphers, to uniquely retrieve the correct key from a given set of key candidates in the ciphertext-only scenario under the assumption of multiple persistent faults, with the exception of [ESP20], which requires a rather high runtime.
Our contribution is twofold. In the first part, we propose a novel technique that enables us to significantly reduce the number of remaining key candidates, down to 2^n for an n × n Sbox. The main features of the proposed methods are as follows: a) They do not rely on knowledge of the value or the location of the fault. b) They can be effectively extended to cases where only an extremely limited number of ciphertexts is available, e.g., fewer than 500 ciphertexts, while keeping the number of remaining candidates within a very reasonable bound. The main results of this part are compared with notable previous works in Table 1. In contrast to previous works, our methods are parametric; hence, Table 1 includes only selected results of this paper. For more instances, we refer to Section 3.
In the second part, we introduce two generalized and efficient methods for determining the correct key among a set of key candidates in the ciphertext-only scenario. These methods are described in Section 4 and target both ciphers that use large Sboxes and lightweight ciphers with small Sboxes. Our experimental results demonstrate that the key-recovery processes retrieve the correct key with success probability one in a very short time.
Besides, we performed practical electromagnetic fault injection experiments, which are reported in Section 5. The experiments were performed on a 32-bit ARM Cortex-M4 microcontroller running both AES and LED. They validate that faulting multiple Sbox elements is more likely than faulting a single element. Moreover, the key recovery is validated on the resulting ciphertexts.
The proposed method is validated on block ciphers with different block sizes and in different scenarios based on the number of faults. In particular, we tested AES-128 and LED-64 as representatives of 8 × 8 and 4 × 4 Sboxes, respectively. The source code of our simulations, written in pure Python3, is publicly available at: https://github.com/hadipourh/faultyaes.
Let x_r[j] and y_r[j] denote the j-th word of the input and output of the substitution layer in the r-th round, respectively. Hence, the j-th word of the ciphertext can be calculated as C[j] = y_R[j] ⊕ sk_R[j], where sk_R[j] is the j-th word of the last round key. Let us assume that the injected fault alters the correct value v to the faulty value v* = v ⊕ ∆, where S* denotes the faulty look-up table of the Sbox. Hence, the value v no longer appears in y_R[j], while the value v* is expected to appear twice as often in y_R[j]. As a result, the probability distribution of y_R[j], and consequently that of C[j], is not uniform. The adversary therefore collects N ciphertexts and computes the statistical distribution of each word of the ciphertext by counting the appearances of each possible value in C[j] for all 0 ≤ j ≤ L − 1. If a large enough number N of ciphertexts is available, the adversary can uniquely identify the minimal and maximal counts. We denote the value that is never observed in C[j] by C_min[j] = v ⊕ sk_R[j] and the most frequent value by C_max[j] = v* ⊕ sk_R[j]; either relation can be solved for sk_R[j]. In the aforementioned methods, the adversary needs to know the exact position or the value of the fault. Besides, a large number of faulty ciphertexts is required to find C_min[j] and C_max[j] uniquely. In response to these challenges, Zhang et al. adopted the main principles of the original PFA attack but made use of Maximum Likelihood Estimation to estimate C_min[j] without knowing the fault position or the value of the fault [ZZJ + 20].
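As a concrete illustration, the counting step can be simulated for a single fault. The sketch below is our own toy model, assuming a uniformly distributed Sbox output and hypothetical key/fault values: the never-observed value reveals v ⊕ sk_R[j] and the over-represented value reveals v* ⊕ sk_R[j].

```python
import random

def pfa_byte_statistics(sk_byte, v, v_star, n_ct=20000, seed=1):
    """Toy model of the PFA statistic for one ciphertext byte under a
    single persistent fault: the Sbox output v is replaced by v_star,
    so C = y ^ sk_byte never equals v ^ sk_byte and takes the value
    v_star ^ sk_byte twice as often as a uniform value."""
    rng = random.Random(seed)
    counts = [0] * 256
    for _ in range(n_ct):
        y = rng.randrange(256)      # uniform Sbox output before the fault
        if y == v:                  # persistent fault in the look-up table
            y = v_star
        counts[y ^ sk_byte] += 1
    c_min = counts.index(min(counts))   # never observed: v ^ sk_byte
    c_max = counts.index(max(counts))   # doubled frequency: v_star ^ sk_byte
    return c_min, c_max

c_min, c_max = pfa_byte_statistics(sk_byte=0x3C, v=0x1B, v_star=0x7E)
```

Given enough ciphertexts, c_min equals 0x1B ⊕ 0x3C and c_max equals 0x7E ⊕ 0x3C, so either statistic can be solved for the key byte once v or v* is known.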

Limitations of PFA in Multiple-Faults Setting
An extension of the proposed technique in [ZZJ + 20] to the case with multiple faults is possible, but this comes with the cost of assuming the adversary knows the value of the fault. Recently, Engels et al. [ESP20] proposed Statistical Persistent Fault Analysis (SPFA) that is applicable to a block cipher in the multiple-faults setting while the attacker does not need to know the value of the fault. The proposed method can be applied to any number of faults, but as we discuss later, its runtime is quite high.
The insurmountable bottleneck of existing PFA techniques in the multiple-faults scenario is the high entropy of residual keys, where an exhaustive search with almost infeasible time complexity is required to find the correct key. The number of key candidates cannot be reduced even by increasing the number of faulty ciphertexts. After performing PFA based on the techniques proposed in [ZLZ + 18, ZZJ + 20], the number of key candidates cannot become less than λ^L, where λ is the number of faults. For instance, independent of the available ciphertexts, at least 2^48 and 2^64 candidates remain after applying PFA on AES in the cases of λ = 8 and λ = 16 faults, respectively. Likewise, SPFA suffers from a similar limitation, since the correct key of AES can only be retrieved with a time complexity of 2^50 [ESP20], which is a considerable runtime.

Target Ciphers
Our methods are applicable to SPN block ciphers. However, we apply the proposed methods to two well-known ciphers, i.e., AES [DR99] and LED [GPPR11], to demonstrate the flexibility of our methods.

Our Framework for PFA with Multiple Faults
This section starts with a short introduction to the fault model which is considered in this paper. Then we give an overview of a specialized technique that makes performing PFA feasible in the multiple faults setting. Subsequently, we introduce our method for different scenarios based on the number of available ciphertexts.

Fault Model and Notation
The conventional PFA technique on table-based implementations of SPN ciphers, and the associated notation presented in Section 2.1, can be generalized to multiple faults. Suppose the correct values V = {v_0, · · · , v_{λ−1}} are altered to the faulty values V* = {v*_0, · · · , v*_{λ−1}}, respectively, such that V ∩ V* = ∅. Similar to PFA with a single fault, an analogous argument shows that exactly λ values will never be observed in each word of the ciphertexts C[j], where 0 ≤ j ≤ L − 1. To detect these impossible values, the adversary counts the appearances of each possible value in C[j] for all 0 ≤ j ≤ L − 1, given N faulty ciphertexts. Assume that the adversary observes λ_j minimum values for each word of the ciphertext C[j]; we denote this set of observable minimum values of the byte C[j] by D_j. For each v ∈ V, the value sk_R[j] ⊕ v equals one of the minimum values of the byte C[j]. If V is known and N is large enough, this relation can be used to retrieve sk_R[j]. There exist at least λ impossible values due to the existence of λ persistent faults. If N is insufficiently large, the number of observable minimum values λ_j can be larger than λ. In the next part, we discuss the relation between the number of available faulty ciphertexts and the number of minimum values.

The Effect of Available Data
The exact value of λ_j depends on the number of available faulty ciphertexts N. If N is large enough, it is expected that for each word of the ciphertext C[j] exactly λ_j = λ minimum values will be observed. If a limited number of ciphertexts is available, each set D_j for 0 ≤ j ≤ L − 1 is expected to have more than λ elements. The estimation of λ_j can be seen as an instance of the coupon collector's problem, a well-known problem in probability theory. In what follows, we aim to find a closed formula for the answer to the following question: How many ciphertexts N are required so that m′ = 2^n − λ′ values are expected to be observed, given m = 2^n − λ possible values for C[j]? Let N denote the number of ciphertexts needed to observe m′ values, and let t_i denote the number of ciphertexts required to observe the i-th value after the (i − 1)-th value has been observed. Then N = t_1 + t_2 + · · · + t_{m′}, where t_i is geometrically distributed with success probability (m − i + 1)/m, so E[t_i] = m/(m − i + 1). Due to the linearity of expectation, the expectation of N can be computed as shown in Equation 1:

E[N] = Σ_{i=1}^{m′} m/(m − i + 1) = m · (H_m − H_{m−m′}),   (1)
where H_n is the n-th harmonic number. Based on Equation 1, we can calculate the number of observable minimum values λ′ given the number of ciphertexts N and the number of faults λ.
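The expectation in Equation 1 can be evaluated exactly with rational arithmetic. The sketch below is our own helper, using the m/m′ naming above:

```python
from fractions import Fraction

def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k (with H_0 = 0), computed exactly."""
    return sum(Fraction(1, i) for i in range(1, k + 1))

def expected_ciphertexts(n, lam, lam_prime):
    """Coupon-collector estimate of Equation 1: the expected number of
    ciphertexts N until only lam_prime values per byte remain
    unobserved, for an n-bit Sbox with lam persistent faults.
    m = 2^n - lam possible values; we wait for m' = 2^n - lam_prime of
    them, so E[N] = m * (H_m - H_{m - m'})."""
    m = 2 ** n - lam
    m_prime = 2 ** n - lam_prime
    return float(m * (harmonic(m) - harmonic(m - m_prime)))
```

For full collection (λ′ = λ) this reduces to E[N] = m · H_m; for AES-like parameters (n = 8, λ = 8) the estimate is on the order of 1500 ciphertexts, in line with the data figures quoted in the abstract.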
On the other hand, the number of observable minimum values λ′ can be estimated by an exponential function of N, given in Lemma 1.

Lemma 1. Given the number of ciphertexts N and the number of faults λ, the number of observable minimum values λ′ can be estimated by the exponential function

λ′ ≈ λ + (2^n − λ) · e^{−N/(2^n − λ)}.
Proof. The harmonic number H_m can be approximated as ln(m) + γ + 1/(2m), where γ is the Euler–Mascheroni constant. Hence, from Equation 1, we have N ≈ m · (ln(m) − ln(m − m′)), which gives m − m′ ≈ m · e^{−N/m}. Substituting m = 2^n − λ and λ′ = λ + (m − m′) yields the claimed estimate. In the next parts, we demonstrate how the PFA can be performed efficiently in different scenarios to retrieve the secret key in a multiple-faults setting.
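This estimate can be checked numerically. The function below is our sketch of the approximation λ′ ≈ λ + (2^n − λ) · e^{−N/(2^n − λ)} obtained from the harmonic-number expansion:

```python
import math

def lam_prime_estimate(n, lam, N):
    """Estimated number of still-unobserved values per ciphertext byte
    after N ciphertexts, lam of which are truly impossible (our reading
    of Lemma 1): lam' ~ lam + (2^n - lam) * exp(-N / (2^n - lam))."""
    m = 2 ** n - lam
    return lam + m * math.exp(-N / m)
```

With no data (N = 0) the estimate is the full 2^n, and it decays toward λ as N grows, matching the exponential fit a · e^{−b·N} + c reported in the simulations later.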

Core Idea
If N is large enough, for each word of the ciphertext C[j] it is expected to observe exactly λ minimum values, which means λ′ = λ holds. Hence, each value C_min_i[j] is equal to the exclusive-or difference between one of the corrupted values v ∈ V and sk_R[j]. Consequently, the set of minimum values of the byte C[j] is equal to

D_j = {v ⊕ sk_R[j] : v ∈ V}.   (2)

We let δ_j denote the exclusive-or difference between sk_R[0] and sk_R[j]. The crucial observation that can be deduced from Equation (2) is that there exists a relation between each set D_j and the set D_0, as expressed in Equation (3):

D_j = D_0 ⊕ δ_j.   (3)
In other words, given any C_min_i[j] there exists an i′ such that the relation C_min_i[j] ⊕ C_min_{i′}[0] = δ_j holds. We exemplify this important observation by considering a case with three persistent faults (λ = 3). Given enough ciphertexts, exactly three minimum values can be observed for each byte. Considering Equation (2) and without loss of generality, we assume D_0 = {v_0 ⊕ sk_R[0], v_1 ⊕ sk_R[0], v_2 ⊕ sk_R[0]} and D_j = {v_0 ⊕ sk_R[j], v_1 ⊕ sk_R[j], v_2 ⊕ sk_R[j]}. It is easy to observe that (v_i ⊕ sk_R[j]) ⊕ (v_i ⊕ sk_R[0]) = sk_R[j] ⊕ sk_R[0] = δ_j holds for i ∈ {0, 1, 2}. Equation (3) implies that a careful comparison between the set of minimum values of the byte C[0] and the set of minimum values of the byte C[j] might determine the value of δ_j. One should note that any information about the value of δ_j, where 0 ≤ j ≤ L − 1, directly leads to a decrease in the number of candidates for the last round key. In the case that the adversary could retrieve the values δ_j for all 0 ≤ j ≤ L − 1, it is sufficient to guess only the value of sk_R[0], since the other words of the last round key can be determined simply as sk_R[j] = sk_R[0] ⊕ δ_j. The method described here may bear some resemblance to collision attacks that rely on internal collisions detected via side-channel leakage [SWP03, SLFP04]. Collisions in side-channel attacks imply that the associated intermediate values are equal, whereas we create a set of relations between the intermediate values using the ciphertext's impossible values. Additionally, collision attacks rely on certain key-dependent intermediate values and are highly dependent on the cipher's structure and key schedule, whereas our approach makes no use of the cipher's internal specification (such as the linear layer).
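The observation behind Equations (2) and (3) can be verified on toy values; in the snippet below, the fault values and key bytes are arbitrary choices of ours:

```python
# Toy check of Equation (3) for lam = 3 faults on an 8-bit Sbox: the
# impossible-value sets of two ciphertext bytes differ by the constant
# delta_j = sk_R[0] ^ sk_R[j].
V = {0x11, 0x5A, 0xC3}             # corrupted Sbox values (hypothetical)
sk0, skj = 0x3C, 0x95              # two last-round key bytes (hypothetical)
D0 = {v ^ sk0 for v in V}          # impossible values of C[0]
Dj = {v ^ skj for v in V}          # impossible values of C[j]
delta = sk0 ^ skj
assert Dj == {d ^ delta for d in D0}   # Equation (3): D_j = D_0 + delta_j
```

The assertion holds for any choice of V and key bytes, since XOR by a key byte is a bijection that commutes with the set difference δ_j.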
In the next parts, we will precisely describe how Equation (3) can be exploited to retrieve the values δ j in different scenarios. In this work, we discern between two scenarios based on the number of available ciphertexts.

Multiple Persistent Faults with a Large Number of Ciphertexts
Assuming that a sufficient number of ciphertexts is available, each set D_j for 0 ≤ j ≤ L − 1 has exactly λ elements. This scenario corresponds to the model assumed in Section 3.3. Equation (3) implies that there exists exactly one value ℓ, where 0 ≤ ℓ ≤ λ − 1, such that the relation C_min_0[0] ⊕ C_min_ℓ[j] = δ_j holds. We use this fact to retrieve the value of δ_j, as illustrated in Algorithm 1. More precisely, we compute the values α_ℓ = C_min_0[0] ⊕ C_min_ℓ[j] for all 0 ≤ ℓ ≤ λ − 1 and verify whether the relation D_j = D_0 ⊕ α_ℓ holds. If this relation is satisfied for α_ℓ, we conclude that δ_j = α_ℓ.
We refer to the table below as an example of performing Algorithm 1 for a byte-oriented block cipher (n = 8). The first and second rows represent the elements of D_0 and D_1, respectively.

D_0: 0x49 0x52 0x74 0x8C 0x94 0xA5 0xD2 0xE5
D_1: 0x53 0x6E 0x75 0x82 0xAB 0xB3 0xC2 0xF5

Starting with α_0 = C_min_0[0] ⊕ C_min_0[1] = 0x49 ⊕ 0x53 = 0x1A, we can easily verify that the relation D_1 = D_0 ⊕ α_0 does not hold (for instance, 0x52 ⊕ 0x1A = 0x48 ∉ D_1). Then we continue the process with α_1 = 0x49 ⊕ 0x6E = 0x27. In this case, for any C_min_i[0] there always exists an element in D_1 such that the difference between the two elements equals 0x27. It is easy to check that the relation D_1 = D_0 ⊕ α_ℓ holds only for ℓ = 1. Hence, we can deduce that δ_1 = α_1 = 0x27.
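A minimal Python sketch of this matching step (our own reimplementation of the idea in Algorithm 1, not the paper's listing) applied to the example sets above:

```python
def algorithm1(D0, D1):
    """Return every alpha = min(D0) ^ c (c in D1) with D1 == D0 ^ alpha;
    with lam >= 3 faults and enough data, the single survivor is delta_1."""
    D0, D1 = sorted(D0), sorted(D1)
    candidates = []
    for c in D1:
        alpha = D0[0] ^ c               # alpha_l = C_min_0[0] ^ C_min_l[1]
        if {d ^ alpha for d in D0} == set(D1):
            candidates.append(alpha)
    return candidates

D0 = [0x49, 0x52, 0x74, 0x8C, 0x94, 0xA5, 0xD2, 0xE5]
D1 = [0x53, 0x6E, 0x75, 0x82, 0xAB, 0xB3, 0xC2, 0xF5]
delta_1 = algorithm1(D0, D1)   # the only surviving candidate is 0x27
```

On these sets, only α_1 = 0x27 passes the set-equality check, matching the worked example.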

Complexity Analysis
We assumed that the number of minimal values observed for each of the bytes C[j] equals λ; in other words, we assumed λ′ = λ. The expected value of N can be computed using the formula given in Section 3.2. By replacing m′ = m = 2^n − λ in that formula, the expected value of N can be calculated as given in Equation (4):

E[N] = m · H_m,  where m = 2^n − λ.   (4)
We observe that in the case of two faults (i.e., λ = 2), Algorithm 1 returns two candidates for each value of δ_j. This observation comes from the fact that if the relation D_j = D_0 ⊕ δ_j holds, then D_j = D_0 ⊕ (δ_j ⊕ C_min_0[0] ⊕ C_min_1[0]) holds as well, since D_0 contains only two elements. In this case, it is not possible to determine δ_j uniquely. The attacker needs to guess the value of sk_R[0] and calculate sk_R[j] = sk_R[0] ⊕ δ_j. Since there are two candidates for each value of δ_j, the number of candidates for the last round key equals

2^n · 2^{L−1}.   (5)

In case the number of faults is larger than two (i.e., λ ≥ 3), the condition in line 11 of Algorithm 1 does not necessarily hold if α_ℓ ≠ δ_j. The probability that a random n-bit value α_ℓ satisfies the condition in line 11 of Algorithm 1 can be estimated by

(λ/2^n)^{λ−1}.   (6)

This probability decreases with the number of faults. Hence, the success probability of our method always increases with the number of faults. We recall that the value of δ_j is retrieved by Algorithm 1 by considering all α_ℓ, where 0 ≤ ℓ ≤ λ − 1. Hence, the expected number of candidates for δ_j (i.e., |∆_j|) can be estimated by Equation (7), where 1 ≤ j ≤ L − 1:

|∆_j| ≈ 1 + (λ − 1) · (λ/2^n)^{λ−1}.   (7)
It is worth noting that for typical values of λ and n, the term 1 + (λ − 1) · (λ/2^n)^{λ−1} is almost equal to 1. The adversary then only needs to guess the n-bit word sk_R[0], as the remaining words of the last round key follow from sk_R[j] = sk_R[0] ⊕ δ_j. To sum up, the number of remaining candidates for the last round key equals

2^n · ∏_{j=1}^{L−1} |∆_j|.   (8)

Table 2 includes the number of required ciphertexts derived from Equation (4) and the number of remaining key candidates derived from Equation (5) in the case λ = 2 and Equation (8) in the case λ ≥ 3, for an SPN cipher and some typical values of λ and λ′, when n = 8 and L = 16. It can be observed that in all cases the number of key candidates is significantly smaller than the number of possible candidates for the last subkey, i.e., 2^{n·L}. Besides, the number of required ciphertexts decreases slightly as the number of faults increases.
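The resulting candidate counts can be tabulated directly. The sketch below encodes our reading of Equations (5) and (8); the per-candidate survival probability (λ/2^n)^{λ−1} is an approximation, not an exact count:

```python
def surviving_key_candidates(n, L, lam):
    """Rough number of last-round-key candidates after Algorithm 1:
    for lam = 2 every delta_j has two valid candidates, giving
    2^n * 2^(L-1); for lam >= 3 a wrong alpha survives with probability
    about (lam / 2^n)^(lam - 1), so roughly
    2^n * (1 + (lam - 1) * (lam / 2^n)^(lam - 1))^(L - 1) keys remain."""
    if lam == 2:
        return 2 ** n * 2 ** (L - 1)
    per_byte = 1 + (lam - 1) * (lam / 2 ** n) ** (lam - 1)
    return 2 ** n * per_byte ** (L - 1)
```

For AES-like parameters (n = 8, L = 16), λ = 2 leaves 2^23 candidates, while λ ≥ 3 leaves essentially 2^8.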

Simulation Results
We evaluated the accuracy of Equation (4) by selecting AES-128 as the target cipher. Assuming that λ faults are applied, we altered λ values in the AES Sbox at random such that there is no overlap between original and faulty values. Similar to previous works, the persistent faults were applied after deriving the sub-keys in our simulations. After encrypting N random plaintexts, we counted the number of distinct observed values at each output byte of the ciphertext. We repeated this experiment for 100 random secret keys and computed the average number of observed values for an arbitrary byte of the ciphertext. Figure 1(a) and Figure 1(b) illustrate the output of our simulations for different numbers of faults 1 ≤ λ ≤ 16. It can be seen that the number of non-observed values converges exponentially to the number of faults as the number of ciphertexts grows, such that for more than m · H_m available ciphertexts, the number of non-observed values is almost equal to the number of faults. Moreover, the data derived from our simulation fits the exponential curve a · e^{−b·N} + c, with a, b, and c very close to the theoretical values derived from Lemma 1, as shown in Figure 1(c) and Figure 1(d), which confirms the high accuracy of Lemma 1 in practice.
Considering AES-128 as the target cipher, we also implemented Algorithm 1 and experimentally evaluated the average number of candidates for δ_j returned by this algorithm, where 1 ≤ j ≤ 15. To do so, for a given number of faults λ and a given number of available ciphertexts N, we applied λ random faults to generate N (faulty) ciphertexts as before. Next, feeding Algorithm 1 with the non-observed values in our experiment, we generated candidates for δ_j, where 1 ≤ j ≤ 15. We repeated this experiment for 100 random secret keys to compute the average number of candidates returned by Algorithm 1 for an arbitrary output byte. When the number of ciphertexts was larger than m · H_m, where m = 2^8 − λ, Algorithm 1 returned only one candidate and two candidates for each δ_j in the cases λ ≥ 3 and λ = 2, respectively. These results confirm the estimation given in Equation (7) and the accuracy of Algorithm 1.

Multiple Persistent Faults with a Limited Number of Ciphertexts
If a limited number of ciphertexts is available, each set D_j for 0 ≤ j ≤ L − 1 is expected to have more than λ elements. If we denote the size of the set D_j by λ_j, then λ_j ≥ λ. In this case, the deterministic relation described in Equation (3) no longer holds. This comes from the fact that, due to the limited number of available ciphertexts, D_j includes λ_j − λ elements that are not actually impossible. We discern two challenges in applying our initial method (described in Section 3.4). First, it is unclear whether the value C_min_0[0] is an impossible value or not. Second, the relation C_min_i[0] ⊕ δ_j ∈ D_j does not hold for all values of i. In this part, we introduce a technique to circumvent these challenges in the case λ_j ≥ λ. The correct value of δ_j can be determined by utilizing the process illustrated in Algorithm 2, which is similar to the technique proposed in Section 3.4 with two main changes. First, we utilize a trial-and-error approach to find the first impossible value C_min_k[0] ∈ D_0, where k is unknown and 0 ≤ k ≤ λ_0 − λ; we make use of C_min_k[0] as the base for determining the value of δ_j. Second, we extend the preliminary technique to a probabilistic setting: instead of checking the relation D_j = D_0 ⊕ α_ℓ, we check whether the relation C_min_i[0] ⊕ α_ℓ ∈ D_j holds at least λ times.
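The threshold-based variant can be sketched as follows. This is our own reimplementation of the idea behind Algorithm 2, not the paper's listing; for simplicity it tries every element of D_0 as a base rather than only the first λ_0 − λ + 1:

```python
def algorithm2(D0, Dj, lam):
    """Limited-data variant: D0 and Dj may contain spurious non-observed
    values, so keep every alpha = base ^ c whose shift maps at least lam
    elements of D0 into Dj, instead of requiring an exact set match."""
    Dj_set = set(Dj)
    candidates = set()
    for base in D0:                     # trial and error over possible bases
        for c in Dj:
            alpha = base ^ c
            cnt = sum(1 for d in D0 if d ^ alpha in Dj_set)
            if cnt >= lam:              # threshold test (line 13 of Algorithm 2)
                candidates.add(alpha)
    return candidates
```

For example, with λ = 3 true faults and one spurious value in each set, the correct δ_j still reaches the threshold and survives, possibly together with a few false positives.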

Complexity Analysis
Let us assume that C_min_k[0] is an impossible value. A similar argument shows that there exists an unknown ℓ, where 0 ≤ ℓ ≤ λ_j − 1, such that δ_j = α_ℓ with α_ℓ = C_min_k[0] ⊕ C_min_ℓ[j]. In the case λ = 2, we expect the relation C_min_i[0] ⊕ α_ℓ ∈ D_j to hold at least twice for α_ℓ = δ_j; in the case λ ≥ 3, we expect it to hold at least λ times. For a random α_ℓ ≠ δ_j, the relation holds on average x times with the probability given in Equation (9):

Pr[cnt = x | α_ℓ ≠ δ_j] = binom(λ_0, x) · (λ_j/2^n)^x · (1 − λ_j/2^n)^{λ_0−x}.   (9)
Hence, the probability that the relation C_min_i[0] ⊕ α_ℓ ∈ D_j holds at least λ times can be computed as in Equation (10), where Pr[cnt = x | α_ℓ ≠ δ_j] is calculated based on Equation (9):

P = Σ_{x=λ}^{λ_0} Pr[cnt = x | α_ℓ ≠ δ_j].   (10)
In the case that C_min_k[0] is not an impossible value, the condition of line 13, i.e., cnt ≥ λ, holds with probability P for each calculated α_ℓ, where 0 ≤ ℓ ≤ λ_j − 1. In the worst case, the first impossible value in the set D_0 is C_min_{λ_0−λ}[0]. In the case that C_min_k[0] is an impossible value, the condition of line 13 holds with probability P for each α_ℓ ≠ δ_j (i.e., (λ_j − 2) times and (λ_j − 1) times for λ = 2 and λ ≥ 3, respectively). To sum up, an upper bound on the expected number of candidates for δ_j can be estimated by Equation (11), where P is given in Equation (10):

|∆_j| ≤ 1 + ((λ_0 − λ) · λ_j + λ_j − 1) · P.   (11)

Equation (11) estimates the number of candidates for δ_j under the worst-case scenario of no repetition; as a result, the number of candidates may be smaller in practice, particularly when P is large. The adversary needs to guess the n-bit value of sk_R[0] and then finds the other words of the last round key based on the relation sk_R[j] = sk_R[0] ⊕ δ_j. As a consequence, the number of remaining candidates for the last round key equals

2^n · ∏_{j=1}^{L−1} |∆_j|.   (12)

Table 3 includes the number of required ciphertexts derived from Section 3.2 and the number of remaining key candidates derived from Equation (12) for an SPN cipher and some typical values of λ and λ′, when n = 8 and L = 16 (like AES). It also shows that λ′ = 2 · λ is cost-efficient.
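Under the binomial model above, the false-positive probability P can be computed directly. The sketch below reflects our reading of Equations (9) and (10):

```python
from math import comb

def false_positive_prob(n, lam, lam0, lamj):
    """For a wrong alpha, each of the lam0 shifted values of D0 lands in
    Dj with probability about p = lamj / 2^n, so cnt is roughly
    Binomial(lam0, p); the threshold cnt >= lam then fires with
    probability P = sum_{x=lam}^{lam0} C(lam0, x) p^x (1-p)^(lam0-x)."""
    p = lamj / 2 ** n
    return sum(comb(lam0, x) * p ** x * (1 - p) ** (lam0 - x)
               for x in range(lam, lam0 + 1))
```

For AES-like parameters (n = 8, λ = 8, λ_0 = λ_j = 16), P is negligible, which is why very few wrong candidates for δ_j survive the threshold test.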

Simulation Results
To experimentally validate the correctness as well as the quality of Algorithm 2, we carried out several random experiments to see how many candidates this algorithm returns on average when a limited number of ciphertexts is available. More precisely, choosing AES-128 as the target cipher as before, for a given number of faults λ and a given number of non-observed values λ′, we first apply λ random faults and generate N (faulty) ciphertexts such that N = m · (H_m − H_{m−m′}), according to Section 3.2. Next, detecting the non-observed values at each output byte, i.e., D_j, we feed Algorithm 2 with (D_0, D_j) to derive candidates for δ_j, where 1 ≤ j ≤ 15 and |D_0| = |D_j| = λ′. Iterating this experiment for 100 randomly chosen secret keys, we computed the average number of candidates returned by Algorithm 2 for different values of λ and λ′. As shown in Table 3, Equation (11) and Equation (12) provide an accurate estimate of the output size of Algorithm 2 and, consequently, of the number of key candidates when the probability P is small. However, when P is large (as indicated by the blue color in Table 3), simulation results reveal that the attack performs significantly better than Equation (11) predicts. This discrepancy arises because increasing P results in an increase in the number of repetitions.

Side Information about Faults
In this part, we demonstrate that the proposed framework can also leak side information about the injected faults.

Determining the Number of Faults (λ)
The assumption of knowing the number of faults usually makes sense as it can be estimated via a profiling phase by the attacker. However, the profiling phase is not necessarily possible in all applications. An interesting property of the proposed methods described in Section 3.4 and Section 3.5 is that the number of faults can be determined by the adversary as a piece of side information.
If the attacker is able to query enough ciphertexts, as assumed in Section 3.4, then he can gradually increase the number of ciphertexts and monitor the size of D_j for 0 ≤ j ≤ L − 1. The size of the set of minimum values of the byte C[j] (i.e., |D_j|) decreases as the number of ciphertexts increases. After a while, adding more ciphertexts no longer affects the size of D_j. At this point, the adversary can determine the number of faults as λ = min{|D_j|}, where 0 ≤ j ≤ L − 1.
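This saturation argument is easy to simulate. The sketch below is our own toy model, assuming a uniform Sbox output; the fault set, key byte, and sample counts are arbitrary choices:

```python
import random

def estimate_fault_count(n, V, key_byte, max_ct=20000, seed=7):
    """Profiling-free estimate of lambda: keep adding ciphertext bytes
    and watch the set of never-observed values; once more data stops
    shrinking it, its size equals the number of faults."""
    rng = random.Random(seed)
    allowed = [y for y in range(2 ** n) if y not in V]  # faulted values never appear
    unseen = set(range(2 ** n))
    history = []                        # |D_j| after every 1000 ciphertexts
    for i in range(1, max_ct + 1):
        unseen.discard(rng.choice(allowed) ^ key_byte)
        if i % 1000 == 0:
            history.append(len(unseen))
    return len(unseen), history

est, hist = estimate_fault_count(8, {0x11, 0x5A, 0xC3}, 0x3C)
```

With these parameters, the history shrinks and then stabilizes at |V| = 3, the injected fault count.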
If the attacker has access to only a limited number of ciphertexts, then he/she needs to perform Algorithm 2. As mentioned in Section 3.5, the corresponding value of cnt in the case α_ℓ = δ_j becomes equal to or larger than λ. In other words, if we denote the maximum value of cnt during the execution of Algorithm 2 for retrieving δ_j by cnt_max_j, then the relation λ ≤ cnt_max_j holds. Hence, a process similar to Algorithm 2 can not only find δ_j based on cnt_max_j but also determine an upper bound for the number of faults, namely λ ≤ min{cnt_max_j}.
During our experiments to verify the proposed method in Section 3.5, we noticed that an α_ℓ with cnt(α_ℓ) > λ almost never appears in Algorithm 2. In other words, most of the time we observed cnt(α_ℓ) ≤ λ for each α_ℓ, which leads us to the correct value of λ.

Determining Values of Faults
In this part, as another interesting property of the methods proposed in Section 3.4 and Section 3.5, we demonstrate that some information about V and V* can be obtained by the adversary. Let us assume that by performing Algorithm 1 or Algorithm 2, τ key candidates K_i are obtained, where 1 ≤ i ≤ τ. Besides, we are given D_j for 0 ≤ j ≤ L − 1, and we assume D_u has the minimum size among all D_j. For each key candidate K_i, the corresponding V_i can be computed as V_i = D_u ⊕ sk_{R,i}[u]. In this way, the τ returned candidates for the secret key are converted to τ candidate tuples (K_i, V_i). On the other hand, the values in V* ⊕ sk_R[j] appear at the output byte C[j] with roughly twice the probability of the other values. We make use of this non-uniform distribution to determine the most frequent values C_max[j], i.e., D*_j for 0 ≤ j ≤ L − 1. It is clear that |D*_j| ≤ λ, and given enough ciphertexts we know the exact value of λ. However, we overestimate each D*_j so as not to miss any member of V* ⊕ sk_R[j]. Hence, we include λ' values of C_max[j] in D*_j, where λ' can be fine-tuned depending on the number of available ciphertexts. Next, we can use Algorithm 3 to narrow down the members and determine the correct set D*_0. Given the (K_i, V_i) candidates for 1 ≤ i ≤ τ and D*_0, we define V*_i = D*_0 ⊕ sk_{R,i}[0]. In this way, τ candidates (K_i, V_i, V*_i) are determined and can be used later in the key-recovery process, i.e., Section 4.3.
Implementing Algorithm 3 and considering AES-128 as the target, we also performed several random experiments to confirm the correctness as well as the quality of Algorithm 3. In every single experiment, after applying λ random faults, we produce a sufficiently large number of random faulty ciphertexts to collect the non-observed values at each output byte, i.e., D_j for 0 ≤ j ≤ 15. By sufficiently large, we mean larger than m·H_m, where m = 2^n − λ. Next, feeding Algorithm 1 with the derived (D_0, D_j), we find candidates for each δ_j, where 1 ≤ j ≤ 15. Recall that Algorithm 1 returns a unique value on average when a sufficiently large number of ciphertexts is available. Lastly, we call Algorithm 3 to retrieve D*_0. Throughout our experiments, capturing cnt[x] for all 0 ≤ x ≤ 255, we observed that all members of the correct D*_0 can be easily distinguished from the other values, since cnt[x] for x ∈ D*_0 was always much higher than for x ∉ D*_0 in all of our random experiments. The filter used in line 8 of Algorithm 3 accounts for possible overlaps in V*. Consequently, Algorithm 3 returns the correct D*_0 in practice when a sufficiently large number of ciphertexts is available.
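The doubled-probability effect that this frequency counting exploits can be illustrated with a toy simulation of ours (identity Sbox and parameter choices are illustrative, not the paper's implementation): values in V* gain a second preimage, so the λ most frequent ciphertext-byte values reveal D*_0 = V* ⊕ sk_R[0]:

```python
import random
from collections import Counter

random.seed(2)
n, lam = 256, 4
sbox = list(range(n))                  # toy identity Sbox (illustrative only)
vstar = random.sample(range(1, n), lam)                        # fault values V*
faulted = random.sample([x for x in range(n) if x not in vstar], lam)
for x, w in zip(faulted, vstar):
    sbox[x] = w          # w gains a second preimage; x's old output is lost (V)

sk = random.randrange(n)               # last-round key byte sk_R[0]
N = 100_000
cnt = Counter(sbox[random.randrange(n)] ^ sk for _ in range(N))
dstar0 = {v for v, _ in cnt.most_common(lam)}   # the lam most frequent values
print(dstar0 == {w ^ sk for w in vstar})        # D*_0 = V* xor sk_R[0]
```

Each value in V* ⊕ sk_R[0] is expected about 2N/2^n times versus N/2^n for the rest, so the two groups separate cleanly for large N.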

Key-recovery Process for Remaining Key Candidates
In this section, we propose generalized techniques for the key-recovery process in the ciphertext-only model over the remaining key candidates after the PFA attack.

Attack Scenarios
As we demonstrated in Section 3.6.2, the attacker can obtain the corresponding V for each possible key candidate as a piece of side information by performing Algorithm 1 or Algorithm 2. Because retrieving V* is not always possible, we distinguish between two situations based on whether the attacker knows the exact set of V* for a particular key candidate or not. Determining the set of V* depends on the number of available ciphertexts and the size of the utilized Sbox. If the adversary has access to only a small number of ciphertexts, he/she can use only V; a larger number of available ciphertexts may mean that the adversary knows both V and V*. Compared to a cipher with a bigger Sbox, such as AES, a cipher with a smaller Sbox, such as LED, requires fewer ciphertexts to produce V*; with few ciphertexts, it is difficult to identify the exact set of V*, and hence D*_j, for a cipher with a bigger Sbox.
Standard cipher designers often pick a larger Sbox but a smaller number of rounds, whereas lightweight block ciphers often have a tiny Sbox but repeat the round function over a larger number of rounds. Reflecting this distinction, we suggest two distinct techniques for the key-recovery process in Section 4.2 and Section 4.3, based on whether the adversary knows V* or not. This gives the attacker the option of selecting one of the approaches depending on the application. We should point out that in both approaches, the attacker has no knowledge of the locations or values of the faults.

Key-recovery Attack Based on V
Let us assume the adversary is given N faulty ciphertexts C_1, ..., C_N that are produced using an identical faulty Sbox S*. The input and the output of the Sbox layer of the r-th round are denoted by x_r and y_r, each consisting of L words of the same size n = b/L. Similar to previous research, we assume that the cipher's key schedule is invertible and that |sk_R| = k = b, i.e., given sk_R it is possible to determine the master key K uniquely. Furthermore, we assume that the round keys are precomputed and are unaffected by faults.
By performing PFA as described in Section 3, the adversary obtains τ candidates (K_i, V_i). The correct key and the corresponding V are unknown to the adversary. However, it is apparent that the correct values V = {v_0, ..., v_{λ−1}} were never produced at the output of the Sboxes throughout the encryption process. If the candidate (K_i, V_i) is a correct pair and the state is unaffected by faults, the elements of V_i should not appear after the Sboxes during the decryption of ciphertexts under K_i. This distinguisher appears to be useful for locating the correct key among the key candidates. However, this fact cannot be used straightforwardly, since decryption is challenging in the PFA model. Although the Sbox used in an SPN block cipher is always invertible, this is not true for S*(·). More precisely, if β ∉ {V ∪ V*} then we can uniquely determine S*^{−1}(β); if β ∈ V* then there is more than one possible value for S*^{−1}(β); and if β ∈ V then S*^{−1}(β) = ⊥.
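The three inversion cases can be illustrated with a toy 4-bit table (the identity Sbox and the injected fault below are ours, for illustration only):

```python
def faulty_inverse(sstar, beta):
    """All preimages of beta under the faulty Sbox S*: exactly one element if
    beta is unaffected, several if beta is in V*, empty (i.e. ⊥) if beta is in V."""
    return [x for x, y in enumerate(sstar) if y == beta]

sstar = list(range(16))   # toy 4-bit Sbox (identity, illustrative only)
sstar[3] = 7              # one persistent fault: V = {3}, V* = {7}
print(faulty_inverse(sstar, 5))   # unique preimage: [5]
print(faulty_inverse(sstar, 7))   # beta in V*: two preimages [3, 7]
print(faulty_inverse(sstar, 3))   # beta in V: no preimage, []
```

During backward computation, the middle case forces the attacker to branch over all preimages, while the last case immediately invalidates the decryption path.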
We make use of the aforementioned properties to propose a key-recovery framework in the probabilistic setting. The process is described in Algorithm 4. For each candidate (K_i, V_i) we allocate a counter cnt(K_i, V_i) and set it to zero. We decrypt each C_j under the key candidate K_i round by round. In the r-th round, for 1 ≤ r ≤ R − 1, we calculate y_r and increase the counter cnt(K_i, V_i) if no element of V_i appears in y_r, i.e., if {y_r[0], ..., y_r[L − 1]} ∩ V_i = ∅. Finally, a pair (K_i, V_i) with the highest counter value is returned as our guess for the correct pair.

Attack Analysis
In what follows, we estimate the expectation of the counter cnt(K_i, V_i) for a wrong key and for the correct key, to show that it can be used as a strong distinguisher for determining the correct key among the key candidates.
Given a wrong pair (K_i, V_i), we expect the calculated value y_r to be random for 1 ≤ r ≤ R − 1. Hence, the probability of obtaining a y_r such that {y_r[0], ..., y_r[L − 1]} ∩ V_i = ∅ is (1 − λ/2^n)^L (Figure 2a). As a consequence, the expected value of cnt(K_i, V_i) for a wrong pair (K_i, V_i) can be estimated as given in Equation (13).
Similarly, we can calculate the expected value of cnt(K_i, V_i) for the correct pair (K_i, V_i). Given the correct pair, cnt(K_i, V_i) can be increased under two circumstances, as depicted in Figure 2b. First, we consider the situation in which {y_h[0], ..., y_h[L − 1]} have not been altered by the introduced faults for 1 ≤ h ≤ r. In this case, cnt(K_i, V_i) is increased with probability (1 − λ/2^n)^{L·r} in the r-th round. Hence, the expected increase in cnt(K_i, V_i) under this circumstance (left branch in Figure 2b) can be estimated as given in Equation (14).
Second, we consider the situation in which {y_h[0], ..., y_h[L − 1]} have been altered by the introduced faults for some 1 ≤ h ≤ r. As can be seen from Figure 2a, this observation can be extended to other rounds as well. In general, the counter cnt(K_i, V_i) is increased again with probability p_{r'} = (p^{r'−1}·(1 − p) + p_{r'−1})·p = r'·p^{r'}·(1 − p) over the r-th round, where r' = L − r. Consequently, the expected increase in cnt(K_i, V_i) under this circumstance (right branch in Figure 2b) can be estimated as given in Equation (15).
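The closed form can be checked against the recurrence numerically; the sketch below (the parameter choice is ours) verifies that p_{r'} = r'·p^{r'}·(1 − p) satisfies p_{r'} = (p^{r'−1}·(1 − p) + p_{r'−1})·p with p_0 = 0:

```python
def p_closed(r, p):
    """Closed form p_r = r * p**r * (1 - p)."""
    return r * p**r * (1 - p)

def p_recurrence(r_max, p):
    """p_r = (p**(r-1) * (1 - p) + p_{r-1}) * p, with p_0 = 0."""
    vals = [0.0]
    for r in range(1, r_max + 1):
        vals.append((p**(r - 1) * (1 - p) + vals[-1]) * p)
    return vals

p = 1 - 8 / 256          # e.g. lam = 8 faults on an 8-bit Sbox
rec = p_recurrence(10, p)
ok = all(abs(rec[r] - p_closed(r, p)) < 1e-12 for r in range(11))
print(ok)
```

The base case p_1 = p·(1 − p) and a short induction give the closed form; the numeric check above confirms it for the first ten terms.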
After that, we detect the non-observed values at each output byte, i.e., D_j for 0 ≤ j ≤ 15, and perform Algorithm 2 to derive the candidates for δ_j, where 1 ≤ j ≤ 15. We then obtain candidates for the last round key as well as the corresponding V = D_0 ⊕ sk_R[0]. These steps provide the input of Algorithm 4, by calling which we can retrieve the correct key. We calculated the average of cnt(K_i, V_i) for the correct as well as the wrong keys. Not only did we observe that Algorithm 4 can uniquely retrieve the correct key, but the experimental results also precisely match the theoretical expectations concerning cnt(K_i, V_i) for wrong keys (as can be seen in Table 4). We also observed that cnt(K_i, V_i) for the correct key is even higher than the value expected according to Equation (16). There are some dependencies in the case of the correct key; since we considered the worst-case scenario by disregarding any such dependencies, the results are always better for the correct key in practice. As discussed in Section 4.2.1, this observation is consistent with the fact that the calculated values of the state y_{r−1} for the correct key are not completely random.
The higher actual value of cnt(K_i, V_i) for the correct key is helpful for the key-recovery attack. Our results confirm the high accuracy of Algorithm 4 in retrieving the correct key uniquely in practice.
It is worth noting that, with a non-optimized implementation of Algorithm 4 in Python3 running on a single core of an Intel Core i7-9750H at 2.60 GHz, the correct key can be recovered in less than a minute when λ > 2. For λ = 2, if we utilize basic data-parallel programming, we can obtain the right key in a few hours on the same machine, although the number of key candidates may be larger than in the previous cases. To accomplish this, we simply divide the set of key candidates into several equal-sized subsets and run Algorithm 4 on each subset in parallel, ultimately picking the key with the highest counter as the correct key.
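The data-parallel strategy is straightforward to sketch (the candidate names and counter values below are synthetic stand-ins for cnt(K_i, V_i), not the paper's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def best_in_chunk(chunk):
    """Run the counter comparison on one subset and keep its local winner."""
    return max(chunk, key=lambda kv: kv[1])

# hypothetical (key candidate, counter) pairs standing in for cnt(K_i, V_i)
candidates = [(f"K{i}", (37 * i) % 101) for i in range(1000)]
chunks = [candidates[i::8] for i in range(8)]   # 8 equal-sized subsets

with ThreadPoolExecutor(max_workers=8) as ex:
    local_best = list(ex.map(best_in_chunk, chunks))
best_key, best_cnt = max(local_best, key=lambda kv: kv[1])
print(best_cnt)
```

Since the per-candidate counters are independent, the global maximum equals the maximum over the per-chunk winners, so the split is lossless.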

Key-recovery Attack Based on V and V *
In this part, we demonstrate how the knowledge of V* can be utilized to mount a more efficient key-recovery attack that retrieves the correct key uniquely given the candidates (K_i, V_i, V*_i). Let us assume that the attacker obtains a list of candidates (K_i, V_i, V*_i) by performing Algorithm 2, as described in Section 3.6.2. Considering the permutation layer of an SPN cipher, the output of an Sbox in the (r − 1)-th round is affected by t Sbox(es) in the r-th round in the backward direction, where t depends on the structure of the target block cipher and the target word y_r[j]. We denote the corresponding t-value for the j-th word by t_j. For block ciphers like AES and LED, which utilize strong permutations, t_j = 4 for all words 0 ≤ j ≤ L − 1. If {y_R[j_1], ..., y_R[j_{t_j}]} ∩ V*_i = ∅, the probability that the word y_{R−1}[j] is not in V_i depends on whether the candidate (K_i, V_i, V*_i) is correct or wrong, as given in Equation (17).
Equation (17) emphasizes that y_{R−1}[j] ∈ V_i is an impossible event for the correct candidate. In other words, if y_{R−1}[j] ∈ V_i happens during the decryption process for a candidate (K_i, V_i, V*_i), then the candidate is certainly wrong. This strong distinguisher can be utilized to filter the wrong candidates (K_i, V_i, V*_i), as described in Algorithm 5. To decide whether a candidate (K_i, V_i, V*_i) is correct or wrong, we follow this procedure: for each available ciphertext C_l, we obtain as many words of the corresponding y_{R−1} as possible. If there is a y_{R−1}[j] ∈ V_i, then we remove (K_i, V_i, V*_i), as it is a wrong candidate.

Attack Analysis
Given a ciphertext C_l, we can determine y_{R−1}[j] only if {y_R[j_1], y_R[j_2], ..., y_R[j_{t_j}]} ∩ V*_i = ∅, which occurs with probability (1 − 2·λ/2^n)^{t_j}. Hence, we expect to be able to determine Ψ = Σ_{j=0}^{L−1} (1 − 2·λ/2^n)^{t_j} words of y_{R−1} for a ciphertext. The probability that a wrong candidate (K_i, V_i, V*_i) passes the filter (i.e., y_{R−1}[j] ∈ V_i never happening) is (1 − λ/2^n)^Ψ per ciphertext. Given N ciphertexts C_1, ..., C_N, the probability that a wrong candidate passes the filter in all N experiments is (1 − λ/2^n)^{N·Ψ}, while the correct candidate (K_i, V_i, V*_i) always passes the filter over all experiments. When (1 − λ/2^n)^{N·Ψ} is small enough, the correct key can be obtained uniquely.
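These estimates are easy to evaluate numerically; the sketch below (parameter values are illustrative choices of ours, with t_j = 4 for all words) computes Ψ and the expected number of surviving wrong candidates:

```python
def psi(L, n_bits, lam, t=4):
    """Psi = sum over j of (1 - 2*lam/2^n)^{t_j}; here t_j = t for all words."""
    return L * (1 - 2 * lam / 2**n_bits) ** t

def surviving_wrong(tau, L, n_bits, lam, N, t=4):
    """Expected wrong candidates passing the filter: tau * (1 - lam/2^n)^(N*Psi)."""
    return tau * (1 - lam / 2**n_bits) ** (N * psi(L, n_bits, lam, t))

# AES-128-like parameters: L = 16 words, 8-bit Sbox, lam = 8 faults
survivors = surviving_wrong(tau=2**20, L=16, n_bits=8, lam=8, N=100)
print(survivors)
```

Even a τ of about a million candidates is filtered down to essentially zero expected survivors after a hundred ciphertexts with these parameters, which matches the qualitative claim above.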
For LED, t_j = 4 for all words. Hence, given a ciphertext C_l, we can determine four words of its corresponding y_31 if the corresponding four words of y_32 are not in V*_i, which happens with probability (1 − 2·λ/2^n)^4. Given N ciphertexts, we expect to find Ω = N·4·4·(1 − 2·λ/2^n)^4 words of y_31, since each ciphertext contains four such sets. Consequently, the expected number of wrong keys that pass the filtering is |WK| = τ·(1 − λ/2^4)^Ω. Table 5 presents the numerical results for 2 ≤ λ ≤ 7. Following the experimental results provided in Section 5, λ ≤ 6 is more probable for 4-bit block ciphers, e.g., LED, and the proposed attack filters the wrong keys perfectly for these values of λ.
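A rough numeric instantiation for LED-64 (τ and N below are illustrative choices of ours) shows how quickly the filter degrades as λ grows on a 4-bit Sbox:

```python
def expected_wrong_keys_led(tau, lam, N):
    """LED-64 instantiation (n = 4): Omega = N*4*4*(1 - 2*lam/16)^4 determinable
    words of y_31, and |WK| = tau * (1 - lam/16)^Omega surviving wrong keys."""
    omega = N * 16 * (1 - 2 * lam / 16) ** 4
    return tau * (1 - lam / 16) ** omega

wk = [expected_wrong_keys_led(tau=2**16, lam=lam, N=50) for lam in range(2, 7)]
print(wk)
```

As λ grows, fewer words of y_31 can be determined per ciphertext, so the expected number of surviving wrong keys increases sharply; the small Sbox (2^4 values) is what keeps the filter effective only for moderate λ.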
The same argument can be used for AES: to determine four words of y_9, we need the four corresponding words of y_10. Table 5 presents the numerical results for 2 ≤ λ ≤ 32 faults on AES. We also implemented Algorithm 5 to simulate the key-recovery attack on AES-128. We observed that if a sufficiently large number of ciphertexts is provided, it always delivers the correct key uniquely.

Experimental Results
In this section, we report the results of our practical fault injection experiments performed on the AES and LED block ciphers. The main aim of our experiments is to identify the types of persistent faults on the Sbox that are achievable in practice.

Target Platform
Our DUT is the STM32F407VG microcontroller based on the 32-bit ARM Cortex-M4 processor, housed on the STM32F4DISCOVERY evaluation board. The core and peripherals of the DUT are clocked at the maximum possible clock frequency of 168 MHz. We compiled our implementations using the arm-none-eabi-gcc compiler with the highest compiler optimization level, -O3. We utilize the ST-LINK/v2.1 add-on board for UART communication with our DUT, and the OpenOCD framework for flash configuration and on-chip hardware debugging with the aid of the GNU debugger for ARM (arm-none-eabi-gdb).
For AES-128, we used a simple round-based implementation written in C, with a focus on the Sbox transfer from flash to RAM on boot-up. For LED, the experiments were done on a publicly available implementation.

Experimental Setup for Fault Injection
We utilize Electromagnetic Fault Injection (EMFI) to inject persistent faults into our DUT. Our choice of EMFI is motivated by several reasons. First, the fault injection can be done in a completely non-invasive manner. Moreover, it can be used to inject faults from the front side of the chip, and thus requires minimal and, in some cases, no preparation of the DUT for fault injection. Our EMFI setup consists of a pulse generator that can generate high-voltage pulses (up to 200 V) with very low rise times (<4 ns). Controller software on the laptop synchronizes the operation of the EM pulse generator and the DUT through serial communication. The pulse generator is directly triggered by an external trigger signal from the DUT, which synchronizes the voltage pulse with the DUT's operation. The EM pulse injector is a customized handmade EM probe designed as a simple loop antenna. The location of the EM pulse injector on the chip is controlled by a motorized XYZ table. Our setup also contains an additional relay switch to perform an automated power-on reset of the device, used during validation of the persistent faults. Refer to Figure 3 for the EM probe used in our experiments.

Persistent Faults on Sbox through EMFI
We consider the scenario of cryptographic software running on embedded microcontrollers wherein the Sbox is typically present as part of the code stored in flash memory. Upon boot-up (or an encryption call), the Sbox is retrieved from flash and is stored in a designated location in the main memory (RAM). Subsequently, the encryption procedure utilizes the Sbox stored in RAM to compute the ciphertext. This approach is desirable to decrease Sbox access times, especially in devices such as constrained microcontrollers with no cache memory. Thus, if an attacker is able to fault the movement of the Sbox from flash to RAM, it leads to a persistent fault in the Sbox. A similar fault model has been reported by Menu et al. [MBD + 19]. In our experiments, we consider two types of Sboxes: (1) the 4-bit Sbox (LED-64) and (2) the 8-bit Sbox (AES-128). The Sbox values are loaded from the flash memory into the registers, in an iterative manner, using the 32-bit LDR.W load instruction and subsequently stored from the registers into RAM using the 32-bit STR.W store instruction. For our practical experiments, we fix the width of the pulse to 7 ns and the voltage to about 190 V, as we are able to observe reliable faults with these fault injection parameters. We then perform a thorough fault injection campaign, over the entire chip area and a full sweep of the injection delay, to identify the different types of faults that can be observed on the Sbox values.

Fault Injection Results
While our aim is to study the different types of faults achievable on the Sbox values, our main focus is on the number of Sbox values that can be corrupted, not the values that the faulted entries are corrupted to, as the latter is not relevant for our analysis. In the case of the 4-bit Sbox of LED-64, the attacker is much more likely to fault 3-5 elements of the Sbox than a single entry (Figure 4(a)). In the case of the 8-bit Sbox of AES-128, the chance of faulting 4 or 6 elements of the Sbox is the highest (Figure 4(b)). Refer to Figure 4(c) for the bar plots showing the frequency of faults affecting single and multiple Sbox elements. We found that, in total, 18,865 faults successfully targeted the AES Sbox, of which over 87% affected multiple elements, while single-element faults accounted for less than 10%. A few faults affect more than half of the elements of the Sbox, which violates our fault model; these are considered out of scope (see Section 3.1) and accounted for about 3% and 0.1% of the faults on the AES and LED Sboxes, respectively. While single faults are achievable with high repeatability, they require detailed profiling of the chip surface to identify the exact coordinates and EM pulse parameters. An attacker who does not spend much effort on the profiling phase is more likely to obtain multiple faults. We also verified the proposed key-recovery attacks by incorporating faulty Sboxes with 2, 4, and 6 injected faults derived from the experiments. In all cases, both Algorithm 4 and Algorithm 5 returned the correct key efficiently. We also made two interesting observations: (1) |V*| < |V|, which shows a bias in the injected faults, and (2) for the derived Sbox with 8 faults, we observed an ineffective fault value; hence, the exact λ was 7, not 8. Interestingly, Algorithm 3 returned the correct λ, which was 7.
The details of the Sboxes and key recoveries are available at: https://github.com/hadipourh/faultyaes. Further, the presented attacks can be extended to other implementation choices. A common performance-oriented implementation choice for AES is the use of T-tables instead of the Sbox. T-tables merge SubBytes, ShiftRows, and MixColumns into four 8×32 T-tables. As the last round of AES does not execute MixColumns, the Sbox is often used in the last round, leaving our analysis technique unchanged. Even if a modified T-table is used for the last round, the proposed analysis can be trivially adapted, as already shown in [ZLZ + 18]. Similarly, masked implementations based on look-up tables are also vulnerable, as previously demonstrated in [PZRB19].

Conclusion
While the feasibility of persistent fault analysis with a single injected fault has been demonstrated in the literature, there are challenges in extending the known techniques to the multiple-faults setting. In this paper, we provided new insight into PFA by proposing novel methods that extend its application to the multiple-faults setting and can be performed in practice. We provided parametric frameworks that can be easily adjusted to different scenarios. This paper can be considered a significant step toward performing PFA under multiple faults in more realistic scenarios.