FaultMeter : Quantitative Fault Attack Assessment of Block Cipher Software

Fault attacks are a potent class of physical attacks that exploit a fault injected during device operation to steal secret keys from a cryptographic device. The success of a fault attack depends intricately on (a) the cryptographic properties of the cipher, (b) the program structure, and (c) the underlying hardware architecture. While there are several tools that automate the process of fault attack evaluation, none of them consider all three influencing aspects. This paper proposes a framework called FaultMeter that builds on the state of the art by not just identifying fault-vulnerable locations in block cipher software, but also providing a quantification for each vulnerable location. The quantification provides the probability that an injected fault can be successfully exploited. It takes into consideration the cryptographic properties of the cipher, the structure of the implementation, and the susceptibility of the underlying Instruction Set Architecture (ISA) to faults. We demonstrate an application of FaultMeter to automatically insert optimal amounts of countermeasures in a program to meet the user's security requirements while minimizing overheads. We demonstrate the versatility of the FaultMeter framework by evaluating five cipher implementations on multiple hardware platforms, namely, ARM (32-bit and 64-bit), RISC-V (32-bit and 64-bit), TI MSP-430 (16-bit), and Intel x86 (64-bit).


Introduction
Cipher implementations are highly vulnerable to a potent class of physical attacks known as fault attacks. These attacks exploit faults injected during the cipher's execution, causing an error that propagates to the output. The flawed output, called faulty ciphertext, is then used to extract the secret key using differential, impossible differential, or algebraic properties of the cipher. Several block ciphers including the AES [TMA11], PRESENT [BEG13], Simon [TBM14b], Speck [HZFW15], and CLEFIA [AM13] are vulnerable to fault attacks. A single precisely injected fault in any of these ciphers is sufficient to substantially reduce the entropy of its key.
For software implementations of block ciphers, faults are typically injected in memory components such as registers, flash memory [CN10], SRAM [ZZJ+20], and DRAM [KGGY20]. Alternatively, faults are injected in the processor pipeline, for instance, causing instructions to be skipped [KSV13]. Most faults are injected using glitches in the voltage or clock source of the device or by using optical or electromagnetic radiation. Other faults are injected by exploiting physical properties and the structure of device components. For example, Rowhammer [KDK+14], RAM-Jam [ATG+19], SPOILER [IMB+19], RAM-Bleed [KGGY20], TRRespass [FVH+20], and Blacksmith [JvdVF+22] utilize the physical properties of memory to inject faults.
load r0, #key
load r1, #p1
xor r0, r1
and r0, 0x0f
and r0, r3
store r0, c0
While there are a large number of locations in a program where faults can be injected during its execution, only a small portion of these faults are exploitable. There are three requirements that a fault should satisfy to be successfully exploited.
• Fault should impact vulnerable operations. The fault should target the small subset of vulnerable operations in the cipher. For instance, prior works such as [KRR+20] show that only 4.98% of instructions in an AES implementation are vulnerable. Faults injected elsewhere in the program cannot be exploited.
• Corrupt instruction output. Most fault attacks require that the fault modifies an instruction output and not halt execution. For example, a fault that alters an instruction's opcode can lead to an illegal instruction exception causing the program to terminate. Such a fault is not exploitable because it does not provide the attacker with the faulty ciphertext, which is essential for the attack.
• Propagate to the output. The fault at the target instruction should propagate to the ciphertext. This may not always be the case. For instance, if register r2 is affected by a fault in the instruction mul r2, r1, the fault will not propagate if r1 = 0x0. Such a fault may not be exploitable.
The success of a fault attack is intricately dependent on the cipher algorithm, its implementation, and the underlying hardware. While the vulnerable operations depend on the cipher algorithm, corrupting an instruction output depends considerably on the Instruction Set Architecture (ISA) of the microprocessor, and propagating the fault to the ciphertext depends on the program structure. Understanding the extent to which these factors influence a fault attack would help develop metrics that can be used to compare and evaluate implementations for fault attack resistance. It would help in designing efficient countermeasures and tools that could automatically patch software for fault attack vulnerabilities.
The current practice for evaluating fault attack resistance is to empirically subject the device to faults. Unfortunately, this is largely a manual process, requiring expensive instruments and considerable time. Recently, researchers have introduced tools to automate the process of finding vulnerable instructions in cipher implementations. While tools such as [ABMP13, KRR+20] work on C implementations, [BHL18, HBZL19] operate on assembly code. Many of these tools fail to assess the extent to which an injected fault is exploitable. The output of these tools is binary: either an instruction is exploitable or it is not. A few works, like TADA [HBZL19], additionally identify attacks from vulnerable fault instructions. Most of the tools fail to consider cryptographic properties of the cipher [TMA11, CN10, DFL11] that can significantly impact the attack success. Further, most tools do not consider the impact of the underlying hardware on the fault attack.
Our Contributions. In this paper, we introduce an automated framework, called FaultMeter, that not only identifies vulnerable instructions in ciphers but also quantitatively evaluates the success with which an injected fault can be transformed into an attack. Figure 1 depicts the flow of FaultMeter. Given a block cipher implementation, FaultMeter (C1) first uses existing tools such as [KRR+20] to identify the vulnerable instructions, taking into consideration the cipher's cryptographic properties. Only faults injected in these vulnerable instructions are exploitable and can be used to retrieve information about the cipher's key. (C2) Then, for each vulnerable instruction, it quantifies the probability that an injected fault can corrupt the instruction's output. To perform this quantification, it captures the sensitivity of the underlying microprocessor's instruction opcodes and data to faults and quantifies the probability of successful instruction skips. (C3) It then performs static analysis to capture the probability with which an injected fault can propagate to the program output, resulting in a faulty ciphertext. Steps (C2) and (C3) are performed by the Fault Exploitability Quantification module in FaultMeter. The output of this module is a success score for every vulnerable instruction. The success score quantifies the fault attack vulnerability of an instruction. A fault injected in an instruction with a high success score is more likely to yield a successful fault attack than a fault in an instruction with a low success score. This quantification differs from contemporary fault attack tools [BHL18, HBZL19, KRR+20], which only provide the list of vulnerable instructions in a cipher implementation. We demonstrate the application of FaultMeter in a compiler that generates executables with directed countermeasures automatically inserted to meet user-specified security margins.
In addition to the input program, the compiler accepts a user input that specifies the desired security level. The compiler uses the success score from FaultMeter to quantify the fault attack threat in the program at an instruction granularity, then applies appropriate countermeasures to minimize performance overheads while adhering to the desired security level. Our contributions can be summarized as follows.
• We present FaultMeter, the first automated framework that can quantify the fault attack vulnerability of instructions in block cipher implementations. The vulnerability depends not just on the cipher algorithm and its cryptographic properties, but also on the implementation and the underlying hardware.
• We study how the processor's Instruction Set Architecture (ISA) influences a fault attack. For the study, we consider six microprocessors, namely, Intel x86 (64-bit), RISC-V (32-bit and 64-bit), ARM (32-bit and 64-bit), and TI's MSP-430 (16-bit). This results in interesting observations, such as TI's MSP-430 and Intel x86 having the highest success scores compared to the other processors, and RISC-V (32-bit) having the lowest success score.
• To demonstrate that the fault attack vulnerability depends on the implementation, we consider three AES-128 implementations that include a lightweight implementation, a T-table implementation, and a bitsliced implementation [RSD06]. We also evaluate two other cipher implementations, CLEFIA-128 and CAMELLIA-128, to demonstrate the scalability of FaultMeter across ciphers.
• We present an application of FaultMeter by using it in a compiler that can automatically trade off security against performance to meet the user's security requirements.
Structure of the Paper. The paper is organized as follows: Section 2 provides the necessary background. Section 3 covers recent work on automated fault vulnerability detection tools. Section 4 discusses the requirements for a successful fault attack and the FaultMeter framework, expanding on steps C2 and C3. Section 5 describes the implementation and evaluates the FaultMeter framework on different block cipher implementations and processors. Section 6 provides an application of the FaultMeter framework, where it is used to automatically insert countermeasures based on the user's security requirement. Section 7 describes the limitations of FaultMeter. Section 8 includes the discussion and future work. Section 9 concludes the paper.

Fault Attacks
A fault attack has two phases. In the first phase, the attacker injects a fault during the cipher execution that corrupts the output of an operation, causing an error that propagates to the output, resulting in a faulty ciphertext. In the second phase, the attacker uses the faulty ciphertext to reduce the entropy of the secret key. The cipher algorithm critically determines the success of a fault attack. For example, AES is far more vulnerable to fault attacks than ciphers like CLEFIA and PRESENT. It takes a single fault during an AES execution to completely reveal its secret key [TMA11], while 8 [AM13] and 18 [BEG13] faults are needed for CLEFIA and PRESENT, respectively. Within a cipher, too, not all operations are equally vulnerable. For example, a fault in the 8-th round of AES reveals the entire secret key, while a fault in the 9-th round only reveals 32 bits of the key. Faults injected before the 7-th round are not exploitable. Implementations of the cipher also influence the fault attack surface. Keerthi et al. [KRR+20], for instance, showed that the percentage of vulnerable instructions in seven different implementations of AES-128 varies from 4.2% to 11.4%. A fault in any of these vulnerable instructions can potentially be exploited. In this paper, we provide a quantification for the success of a fault attack. We show how the exploitability of a fault injected in a vulnerable instruction can depend not just on the cipher algorithm and the implementation but also on the underlying Instruction Set Architecture of the microprocessor.

Countermeasures for Fault Attacks
Several countermeasures [BG13, GST12, LRT12, GK13, TBM14a, ML08] have been introduced to protect cipher implementations from fault attacks. Most countermeasures detect fault injection using techniques like redundancy, parity, or error correction codes [BBK+03, GK12, KWMK02, KKG03, WKKG04]. If a fault is detected, the countermeasure either aborts the encryption operation or masks the output of the operation to make the fault unexploitable. Other countermeasures make use of infection techniques that diffuse faults, making them unexploitable [LRT12, GST12, BG13, TBM14a]. Naïvely inserting either kind of countermeasure has considerable overheads, often degrading performance by over two times. In this paper, we show how FaultMeter can be used to automatically insert targeted countermeasures during compilation. The countermeasures are tuned to meet the application's security and performance requirements.
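As a rough illustration of detection by redundancy, the following Python sketch runs an operation twice and suppresses the output when the two results disagree. The names `protected`, `FaultDetected`, and `toy_round` are hypothetical and chosen only for this example; the cited countermeasures are considerably more elaborate, and this is not how FaultMeter inserts them.

```python
class FaultDetected(Exception):
    """Raised when the two redundant computations disagree."""


def protected(operation, *args):
    # Time redundancy: run the same operation twice and compare the results.
    first = operation(*args)
    second = operation(*args)
    if first != second:
        # A transient fault hit one of the two runs, so no output is released.
        raise FaultDetected("redundant results differ")
    return first


def toy_round(state, key):
    # Stand-in for one cipher operation (hypothetical, not a real cipher).
    return (state ^ key) & 0xFF


print(hex(protected(toy_round, 0x3C, 0xA5)))
```

The doubled execution is also where the overhead of roughly two times comes from when such a check is applied to every operation rather than only to the vulnerable ones.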

Intermediate Representation (IR)
The LLVM compiler converts the high-level representation to machine code using different compiler passes. The transformation pass converts the high-level representation to Intermediate Representation (IR) instructions. FaultMeter uses LLVM's generated IR instructions for the analysis. These instructions are represented in the Static Single Assignment form as defined below.

Definition 1. [Static Single Assignment]
Static Single Assignment (SSA) is a form of program representation in which every variable is assigned exactly once [RWZ88].
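A minimal sketch of the renaming idea behind SSA is given below for straight-line code only. The `(dest, operands)` statement format and the function `to_ssa` are assumptions made for this illustration; real SSA construction in LLVM also handles branches through phi nodes, which are not modeled here.

```python
def to_ssa(statements):
    """Rename straight-line assignments so each name is assigned once.

    `statements` is a list of (dest, operands) pairs, a hypothetical
    mini-format used only for this illustration.
    """
    version = {}   # latest version number of each assigned variable
    renamed = []
    for dest, operands in statements:
        # Uses refer to the most recent definition; program inputs keep their name.
        uses = [f"{v}{version[v]}" if v in version else v for v in operands]
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}{version[dest]}", uses))
    return renamed


# t = s0 op s1; t = t op k; c = t
program = [("t", ["s0", "s1"]), ("t", ["t", "k"]), ("c", ["t"])]
ssa = to_ssa(program)
for dest, uses in ssa:
    print(dest, "<-", uses)
```

After renaming, the two assignments to `t` become distinct names, so every use points at exactly one definition, which is what makes dataflow analysis on the IR straightforward.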

Automated Fault Attack Vulnerability Detection
Evaluating the security of cipher implementations against fault attacks is a tedious and manual task. Recently a few tools were introduced to automate the fault attack assessment process. Tools like [KRH17,SKMD17,RRHB21] work at the algorithm level to determine vulnerable operations in a cipher and compute the attack complexity. These tools work directly on the algorithm and do not consider implementation aspects, which can significantly influence the attack success.
Another class of tools [SSR+20, AWMN20, GJL20, RSS+21, WLR+21] works on hardware implementations of ciphers, typically taking the RTL or netlist of the design as input to detect fault-vulnerable gates. Any fault injected in these vulnerable gates can result in a successful fault attack. FIVER [RSS+21], for instance, determines effective and ineffective faults on a gate-level netlist, while [GJL20] bridges the gap between hardware and software faults.
Other tools, such as [RPL+14, RBLC15, LFB+21], quantify the vulnerability and can determine the attack success based on specific fault models. They, however, do not consider cryptographic properties of the cipher, such as its differential [TMA11], impossible differential [DFL11], and algebraic properties [CN10]. A cipher's cryptographic properties significantly abet fault attacks. FaultMeter, on the other hand, builds on existing tools and can evaluate cipher implementations considering complex cipher properties.
The underlying Instruction Set Architecture of the platform greatly affects fault induction. Except for [HSP21], [BHE+19], and [HKR+15], none of the other software tools take into consideration the impact of the fault on the underlying processor. While [HSP21, BHE+19, HKR+15] evaluate faults in the ARM processor, FaultMeter considers a range of processors from 16-bit to 64-bit, covering RISC and CISC architectures. This analysis brings out interesting results, such as some ISAs being more vulnerable to fault attacks than others. Further, FaultMeter computes the probability that a disturbed instruction output can propagate to the ciphertext. Such quantification helps to customize countermeasures as per the user's requirement.

Automated Fault Attack Countermeasure Insertion
Automatic countermeasure insertion was first proposed in SAFARI [RRHB20], which synthesized hardware and software programs based on a high-level specification of the cipher algorithm and a user-defined security margin. While the generated programs had fault-attack countermeasures inserted automatically, the programs were generic and could not be optimized to suit specific platforms and requirements. For example, SAFARI would synthesize the same code for an IoT edge device as well as a server.
Rather than synthesizing countermeasures like SAFARI, FEDS [KRR+20] can insert countermeasures in any cipher implementation, thus supporting optimized codes in handwritten assembly. However, FEDS cannot tune countermeasures to meet the user's security requirements. For example, a user developing a highly sensitive application such as an electronic voting machine would require high security guarantees and would not mind the additional performance overheads. On the other hand, less security-critical applications, such as a smart clock, would value performance and energy consumption over security. FEDS would be ignorant of the difference in requirements and would provide the same countermeasures for both applications.
Similar to FEDS, FaultMeter can produce highly optimized implementations of block ciphers. However, unlike FEDS, it supports countermeasures that are added automatically based on the user's security requirements. Thus FaultMeter would likely provide a stronger countermeasure for the electronic voting machine and weaker countermeasures for the smart clock. The weaker countermeasures would result in lower performance overheads and energy requirements. A critical aspect of FaultMeter that enables such application-specific operation is the ability to quantify the success of converting an injected fault into an attack.

Quantifying the Success of Injected Fault
The probability that an injected fault can be exploited to create a successful attack depends on the (1) cipher algorithm, (2) its implementation, and (3) the underlying hardware.
FaultMeter uses FEDS to detect instructions in a program that are vulnerable to fault attacks. In this section, we provide a quantification of the vulnerability that can be used to distinguish between less vulnerable and more vulnerable fault injections, along with the basis for this quantification. The quantification depends considerably on the underlying hardware architecture and program structure.
Fault Model. Fault injection can either modify the data flow or control flow of the program. We consider a single transient fault injected in the device during the cipher's execution. The fault either corrupts an instruction or the associated data during the program execution. Alternatively, the fault can be inserted in the program counter altering the sequence of instructions executed, i.e. the control flow. After the fault is injected, it propagates towards the output. The fault model considered is a fault injected in code, data, or program counter to randomly alter it.

Requirements for a fault attack exploit.
Exploiting a fault requires three conditions to be satisfied. We discuss these requirements using a toy cipher shown in Figure 2.

C1. [Fault in vulnerable instructions] Only faults injected in certain locations can yield a successful attack. For example, only faults inserted in the shaded nodes in the Control Flow Graph (CFG) in Figure 2 can be used to recover key K0. These are the vulnerable instructions with respect to K0. Faults injected anywhere else in the program do not yield any information about K0. FaultMeter identifies these vulnerable instructions with the help of existing tools in the Vulnerable Instruction Identification module (refer Figure 1). ...

[Figure 2 legend: Non-vulnerable Nodes; Vulnerable Nodes]

C2. [Corrupting the output of vulnerable instructions]
When a fault is injected in the instruction, it causes bits in the opcode to toggle. Similarly, faults injected in data can change the values stored in memory or registers, and faults injected in the program counter can alter the sequence of instructions executed. However, not all faults would result in a wrong output. For example, the fault may result in an undefined instruction or get interpreted as another instruction. In the former case, the undefined instruction would result in an exception, causing the program to terminate. Such faults cannot be exploited because the faulty ciphertext is not available. In the latter case, there is a chance that the output of the instruction is not affected by the fault. For example, if a fault in the swap(S0, S1) instruction transforms it to swap(S1, S0), the output of the instruction is unaffected. The opcode encoding significantly impacts the probability that an instruction is corrupted by a fault. Figure 2 shows the probability that a randomly injected fault corrupts the output of the instruction a = a 1 in the six different platforms namely, ARM (32-bit and 64-bit), RISC-V (32-bit and 64-bit), TI's MSP-430 (16-bit) and Intel x86 (64-bit) microcontroller. FaultMeter learns these probabilities offline for each microprocessor. Section 4.2 provides further details about how these probabilities are computed.

Identifying Vulnerable Instructions in an implementation (C1)
Only faults injected in vulnerable instructions can yield a successful attack. Existing tools take as input a block cipher implementation, either in assembly or in a high-level language like C, and output the list of instructions that are vulnerable to fault attacks. Typically, each tool handles a subset of fault attacks. For example, [KRR+20] can detect regions of an implementation that are vulnerable to Differential Fault Analysis [TMA11] and Impossible Differential Fault Analysis [DFL11]. Similarly, the DATAC [BHL18] tool identifies locations that are vulnerable to instruction skip fault injections. The first stage of FaultMeter (Figure 1) uses one of these tools to determine the vulnerable instructions in a program. Only a few instructions in a cipher implementation are exploitable by a fault attack. In this paper, we make use of the open-source tool FEDS [KRR+20], which uses the LLVM Intermediate Representation (IR) of the program to identify vulnerable program instructions. FEDS takes the source code of a block cipher as input and outputs the list of exploitable instructions in the implementation by mapping the known vulnerable instructions, as shown in Figure 3. A fault in any of these 'vulnerable instructions' is exploitable.
The input to FEDS is a compiler-generated Intermediate Representation (IR) obtained from the LLVM compiler. FEDS converts the IR to a Control Flow Graph (CFG), where the instructions form the vertices of the graph and edges are added based on the program flow. FEDS then performs backward dataflow analysis on the CFG to identify the vulnerable nodes in the graph. Figure 3 depicts the list of vulnerable instructions ($I_1$ to $I_{18}$) that can induce a fault in the output of the operation t ← S0 ⊕ S1.
The result from FEDS is binary. Either an instruction is vulnerable, or it is not. For each vulnerable instruction identified by FEDS, FaultMeter provides a score between 0 and 1. A score close to 1 indicates that a fault injected in that instruction is more likely to result in a successful attack. For non-vulnerable instructions (not identified by FEDS), FaultMeter returns a score of 0.
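The backward dataflow step can be pictured with the following Python sketch, which collects every instruction that a chosen value depends on. It is only an illustration under simplifying assumptions: the `(dest, sources)` instruction format, the function `backward_slice`, and the six-instruction toy program are hypothetical, the code is straight-line SSA, and the cryptographic reasoning that FEDS applies on top of the dataflow is not modeled.

```python
def backward_slice(instructions, target):
    """Return indices of instructions that the value `target` depends on.

    `instructions` is a list of (dest, sources) tuples in program order,
    assumed to be in SSA form so every dest is defined exactly once.
    """
    needed = {target}
    in_slice = []
    # Walk the program backwards, growing the set of values we depend on.
    for index in range(len(instructions) - 1, -1, -1):
        dest, sources = instructions[index]
        if dest in needed:
            in_slice.append(index)
            needed.update(sources)
    return sorted(in_slice)


toy = [
    ("a", ["p0", "k0"]),    # 0: a  = p0 xor k0
    ("b", ["p1", "k1"]),    # 1: b  = p1 xor k1
    ("s0", ["a"]),          # 2: s0 = sbox(a)
    ("u", ["b"]),           # 3: u  = b        (does not feed t)
    ("s1", ["p2"]),         # 4: s1 = sbox(p2)
    ("t", ["s0", "s1"]),    # 5: t  = s0 xor s1
]
print(backward_slice(toy, "t"))
```

In this toy program, a fault in instructions 1 or 3 cannot disturb `t`, so only the instructions in the returned slice are candidates for being marked vulnerable with respect to `t`.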

Quantifying the probability of fault-induced instruction corruption (C2)
When a single fault is transiently injected during an instruction execution, it can manifest by either altering or leaving unaltered the instruction output, or terminating the program, as shown in Figure 4. A fault due to the altered instruction output may propagate, resulting in a faulty ciphertext. We classify the fault manifestations into four classes:
• $F_1$: the program completes and the instruction output is altered, i.e. the fault is activated.
• $F_2$: the program completes and the instruction output is unaltered.
• $F_3$: the opcode remains valid, but the program terminates.
• $F_4$: the opcode becomes invalid, and the program terminates.
The faults in sets $F_2$, $F_3$, and $F_4$ cannot induce a successful fault attack, as the outcomes do not provide the faulty ciphertext that is necessary to carry out the attacks. In this section, we quantify the probability that an injected fault leads to a faulty output. We consider three types of fault injections: first, faults injected in an instruction, affecting the opcode; second, faults injected in operands (for example, registers); and third, faults injected in the program counter, affecting the flow of the program. Each subsection considers one of these fault injections.

Fault Injection in instructions
When an injected fault changes bits in an opcode, it can result in a valid or invalid instruction (see Figure 4). An invalid instruction opcode results in program termination ($F_4$), while a valid instruction can have any of the remaining three outcomes (i.e. $F_1$, $F_2$, or $F_3$). The probability of these outcomes depends not just on the type of instruction but also on its encoding. They are thus unique to each Instruction Set Architecture. To understand how fault injection depends on the underlying architecture, we consider six microprocessors, namely, Intel x86 (64-bit), TI's MSP-430 (16-bit), ARM (32-bit and 64-bit), and RISC-V (32-bit and 64-bit). For each of these microprocessors, we generate random programs, cross-compile them, and execute the binary multiple times in a simulator. In each execution, a fault is injected in an instruction using simulation tools, for example by modifying the instruction memory, and the instruction output is then observed. The result of the fault falls in one of the four classes, i.e. $F_1$, $F_2$, $F_3$, or $F_4$. Figure 5 shows the results from the simulation. These probabilities were computed based on 50 randomly generated programs, with over 25,000 instructions and about a million injected faults on each platform. Figure 5a shows the probability of each fault class on the six microprocessors. Of the four classes, the probability that the fault is activated, i.e. $F_1$, is the one of interest for evaluating fault attacks. Figure 5b shows the impact of 1-bit, 2-bit, 3-bit, and 4-bit fault injections on an instruction. We observe that the probability of
Among the 32-bit processors considered, RISC-V has a lower instruction density compared to ARM. This is because RISC-V has a considerably larger number of unused opcodes than ARM, hence a lower density and a higher chance that $F_4$ occurs. RISC-V 64-bit has a higher instruction density compared to the 32-bit variant. This is because of the additional instructions supported in the 64-bit variant and not in the 32-bit one. This marginally increases the instruction density, lowering the chance that $F_4$ occurs.
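The link between opcode encoding and fault outcome can be sketched with a toy model. The Python snippet below flips each bit of an instruction word and sorts the result into wrong output (F1), unchanged output (F2), or invalid opcode (F4). The 8-bit encoding, the four defined opcodes, and the register values are all hypothetical and do not correspond to any of the six evaluated ISAs; termination with a valid opcode (F3) is not modeled because the toy has no branch or memory instructions.

```python
import operator

# Hypothetical 8-bit encoding: [opcode:4][dst:2][src:2]; not a real ISA.
OPCODES = {0x1: operator.add, 0x2: operator.xor,
           0x3: operator.and_, 0x4: operator.or_}


def execute(word, regs):
    """Run one instruction; return the new register tuple, or None if undefined."""
    op = OPCODES.get(word >> 4)
    if op is None:
        return None                      # undefined opcode raises an exception
    dst, src = (word >> 2) & 3, word & 3
    out = list(regs)
    out[dst] = op(regs[dst], regs[src]) & 0xFF
    return tuple(out)


def classify_bit_flips(word, regs):
    """Count the outcomes of all single-bit flips in the instruction word."""
    golden = execute(word, regs)
    counts = {"F1": 0, "F2": 0, "F4": 0}
    for bit in range(8):
        result = execute(word ^ (1 << bit), regs)
        if result is None:
            counts["F4"] += 1            # invalid opcode, program terminates
        elif result == golden:
            counts["F2"] += 1            # fault masked, output unchanged
        else:
            counts["F1"] += 1            # fault activated, wrong output
    return counts


# 0x21 encodes "xor r0, r1"; r1 and r3 happen to hold the same value.
counts = classify_bit_flips(0x21, (0x0F, 0x33, 0x00, 0x33))
print(counts)
```

Only 4 of the 16 opcode values are defined in this toy, so flips in the upper opcode bits mostly land on undefined encodings; a denser opcode map would shift those cases from F4 towards F1 or F2, which is the density effect described above.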
Faults resulting in a valid opcode but the program terminates ($F_3$). In most cases, these appear due to faults in the operands of branch and memory instructions. For example, a fault changes the branch offset stored as part of a branch instruction, leading to an illegal branch target. Similarly, faults may modify the address of load/store instructions, leading to an invalid memory operation. Arithmetic and logic instructions can also experience these events. For example, the fault changes the destination register to either the stack pointer or the program counter, potentially setting illegal values in these registers. In a few cases, faults in the opcode of instructions can also trigger the event $F_3$, for example, a fault changing an arithmetic/logic opcode to a branch instruction.
Faults that result in program completion (events $F_1$ and $F_2$). A fault that results in program completion can either produce a correct output or a faulty output. For such faults, the correct output is produced in 25% of the cases on average across all processors. Some examples where the output of the program does not change in spite of fault injection are provided here:
• ARM supports conditional execution of instructions, where an instruction is executed only if certain conditional flags are set. We found that in many cases, a fault injected in the condition bit present in the instruction did not alter the output.
• Often, multiple registers may hold the same data. A few bits in the instruction specify the source and destination registers. A fault that changes the source register to another register holding the same data would not affect the output.
• Certain faults were observed to change the arithmetic and logic opcodes in a way that does not alter the outputs. For example, a fault that changes the opcode for add to the signed equivalent adds in ARM may not always alter the output.
• Compare operations have outputs of True or False; hence with high probability, the output remains the same even after fault injection.
• Faults that alter the memory address of load instructions in a way that the new address holds the same data as the original do not alter the program output.

Fault in data memory and registers
Faults injected in data memory or registers can influence the output of an instruction. With respect to Figure 4, a disturbance in data memory or registers can cause an instruction to provide a wrong output ($F_1$) or cause program termination ($F_3$). In some cases, the output is unaffected by the fault ($F_2$). However, such faults in data or registers cannot result in an invalid opcode (i.e. $F_4$). The probability of the events $F_1$, $F_2$, and $F_3$ depends not just on the type of instruction but also on the width of the registers.
To understand the probabilities of these events, we consider faults injected in 8-bit, 16-bit, 32-bit, and 64-bit registers. For each register size, we generate random C programs, compile them, simulate random fault injections in registers, and observe the outputs of each instruction. The event $F_3$ is observed when the fault-modified registers are used to hold addresses for branch, load, or store instructions. The modified registers result in invalid instructions causing program termination.
In arithmetic and logic instructions, these faults either result in a wrong output (i.e. $F_1$) or, in some cases, do not affect the output (i.e. $F_2$). For example, in a conditional branch such as 'if (a < b)', a fault in either a or b does not alter the output in half of the executions. Arithmetic instructions like the multiplication mul a,b do not alter the output if the operand not hit by the fault is zero. Similarly, on 32-bit platforms, the output of or a,b is not altered when the operand not hit by the fault is 0xFFFFFFFF.
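The masking effects described above can be checked with a small Python sketch that flips every bit of one operand and measures how often the result changes. The helper `activation_rate` and the operand values are assumptions made for this illustration (unsigned 32-bit operands, single-bit faults); they are not the random-program experiment reported in the paper.

```python
def activation_rate(op, faulted, other, width=32):
    """Fraction of single-bit flips in `faulted` that change op's result."""
    mask = (1 << width) - 1
    golden = op(faulted, other) & mask
    changed = sum(
        (op(faulted ^ (1 << bit), other) & mask) != golden
        for bit in range(width)
    )
    return changed / width


def mul(a, b):
    return a * b


def orr(a, b):
    return a | b


def less_than(a, b):
    return int(a < b)


print(activation_rate(mul, 0x1234, 0))            # other operand is zero
print(activation_rate(orr, 0x1234, 0xFFFFFFFF))   # other operand is all ones
print(activation_rate(orr, 0x1234, 0))            # nothing masks the fault
print(activation_rate(less_than, 5, 1000))        # comparison hides low-bit flips
```

A rate of 0 corresponds to faults that always end in the unaltered-output class, while a rate of 1 means every single-bit fault in that operand is activated; the comparison falls in between because only flips large enough to cross the threshold change its one-bit result.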

Faults in the Program Counter
Unlike faults in instructions and data, the effect of a fault in the Program Counter (PC) is influenced by the control flow graph of the program. If the fault modifies the PC in such a way that the new address falls outside the control flow graph, i.e. an address outside the program, then the program is likely to terminate ($F_3$) (refer Figure 4). On the other hand, if the fault modifies the PC such that the new address lies in the control flow graph, then either event $F_1$ or $F_2$ is likely. These faults either skip instructions or repeat the execution of instructions. The former generally occurs when the fault causes the PC to be incremented, while the latter generally occurs when the fault decrements the PC. Not all $F_1$ faults are exploitable. The exploitable $F_1$ faults are restricted to those where one or more vulnerable nodes in the program are skipped or executed more than the expected number of times.
To understand the probabilities of these events for a given cipher implementation, we generate the control flow graph of the program with vulnerable nodes marked. These nodes are identified by the Vulnerable Instruction Identification module (Section 4.1). Fault injections are simulated in the PC for each node of the CFG, and the flow of the program is observed after the fault injection. The events $F_1$, $F_2$, and $F_3$ are counted to compute the probabilities of occurrence. The CFG for the pseudo-code in Figure 3 is depicted in Figure 6. The cipher is implemented on the 16-bit TI MSP-430, and the address of each IR instruction is also shown in Figure 6. If a single-bit fault is injected in the program counter, for instance at node $I_6$ with the address <311C>, the PC can take 16 possible values due to the fault injection. Eleven of these values result in a PC outside the program, causing the program to terminate. The valid PCs after fault injection are <3114>, <3118>, <311E>, <313C>, and <319C>, as these addresses fall within the CFG. Of these addresses, <311E>, <313C>, and <319C> result in a forward jump, skipping vulnerable nodes (shaded in Figure 6). These faults result in exploitable $F_1$ events. The addresses <3114> and <3118> result in a backward jump, causing the re-execution of vulnerable nodes. These too result in exploitable $F_1$ events. There are no $F_2$ events in this example. Hence the probabilities of the $F_1$, $F_2$, and $F_3$ events for the fault in the PC corresponding to node $I_6$ are 0.31 (5/16), 0.0, and 0.69 (11/16), respectively. In a similar way, faults injected in the PC corresponding to $I_{53}$ result in probabilities 0.06, 0.25, and 0.69, respectively. Among these faults, only the one that modifies the PC from <31BC> to <313C> results in the re-execution of a vulnerable instruction, $I_{18}$, and is hence marked exploitable.
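The counting for the node at address <311C> can be reproduced with a short Python sketch that enumerates all single-bit flips of a 16-bit PC. The function `pc_fault_outcomes` is a hypothetical helper, and the set of valid addresses contains only the addresses named in the text, not the full CFG of Figure 6. Deciding whether an in-program landing is exploitable additionally requires knowing which vulnerable nodes are skipped or re-executed, which this sketch does not model.

```python
def pc_fault_outcomes(pc, valid_addresses, width=16):
    """Split single-bit flips of `pc` into in-program and out-of-program."""
    inside, outside = [], []
    for bit in range(width):
        faulty_pc = pc ^ (1 << bit)
        if faulty_pc in valid_addresses:
            inside.append(faulty_pc)     # lands on an instruction of the program
        else:
            outside.append(faulty_pc)    # leaves the program, likely termination
    return sorted(inside), sorted(outside)


# Only the addresses named in the text; the real CFG contains more nodes.
known = {0x3114, 0x3118, 0x311C, 0x311E, 0x313C, 0x319C}
inside, outside = pc_fault_outcomes(0x311C, known)

p_in_program = len(inside) / 16      # 5/16, about 0.31
p_terminate = len(outside) / 16      # 11/16, about 0.69
print([hex(a) for a in inside], p_in_program, p_terminate)
```

Addresses below <311C> in the in-program list correspond to backward jumps (re-execution), and those above it to forward jumps (skipped instructions).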

Computing the probability that an injected fault causes event $F_1$ for an instruction
Of the four events, $F_1$, $F_2$, $F_3$, and $F_4$, only event $F_1$ is useful in a fault attack, because only in this case does the fault induce an error in the program without terminating it. We denote the probability that the output of the $j$-th instruction can be faulted by

$$P(C^{*}_{2,j}) = P(F_1), \qquad (1)$$

where $P(F_1)$ is the probability of event $F_1$ occurring when a fault is injected and '$*$' denotes any of the fault injections, i.e. in the instruction, memory, or program counter. Figure 7 depicts these probabilities for each IR instruction ($I_1$ to $I_{53}$) of the pseudo-code shown in Figure 3, for the three fault injections: in the instruction, in the data/register, or in the program counter. These probabilities are represented as $P(C^{i}_{2,j})$, $P(C^{d}_{2,j})$, and $P(C^{p}_{2,j})$ respectively. To generate these probabilities, the source code written in C is first compiled using the LLVM compiler to generate a binary and also the Intermediate Representation (IR).
Using the probabilities obtained in Sections 4.2.1 and 4.2.2, each instruction and its operands in the generated executable are analyzed to determine the corresponding $P(C^{*}_{2,j})$ (where $*$ is either $i$ or $d$). These probabilities are computed at the machine-instruction level, and this is the only hardware-dependent step. We map these probabilities onto the corresponding machine-independent IR instructions generated by the LLVM compiler. Unlike opcodes and operands, faults in the PC are directly evaluated using the control flow graphs (CFGs) generated from the IR instructions, as discussed in Section 4.2.3.
The graph in Figure 7 shows that faults injected in registers or memory have a higher probability of corrupting the instruction output than faults injected in the opcode or the program counter. This is because faults injected in opcodes and the program counter are more likely to terminate the program, due to an invalid opcode ($F_4$) or an invalid program counter ($F_3$).

Fault propagation from the instruction to the ciphertext (C3)
The output of an instruction in the program can be corrupted either by a fault injected in that instruction (discussed in Section 4.2.1) or by a fault injected in a previous instruction that propagates to the given instruction. The latter depends on the program structure. For example, consider the instruction $L_6$ (refer Figure 2), which results in a non-zero value of $b$. If a fault induced in this instruction changes the value of the byte $b$ to another non-zero value, then the fault will not propagate to the ciphertext due to the conditional statement in $L_8$. Thus, only a fault that changes $b$ to zero would propagate to the output. We denote the probability that the fault propagates through a sequence of instructions $I_i, I_{i+1}, \ldots, I_{i+n}$ as $P(C_{3,(i,i+1,\cdots,i+n)})$.
Fault propagation can occur in two ways. The first is through registers, where the output of one instruction is used as an input to another. Alternatively, faults can propagate through memory operations, for instance by a store of faulted data to memory followed by a subsequent load from the same address. To compute $P(C_{3,(*)})$, FaultMeter classifies instructions into three different classes based on the model proposed by Guanpeng et al. in [LPH+18]. The first class considers fault propagation through registers. The second class considers fault propagation from a corrupted store to a subsequent load, while the third class considers control flow instructions.
Fault propagation through registers. We consider a sequence of instructions where the output of one instruction is the input to another, and the data is transferred through registers. This data-dependent sequence of instructions has two properties: (a) it ends with a store or a control flow instruction, and (b) the output of every instruction in the sequence flows to the input of a subsequent instruction in the sequence. As an example, the first four IR instructions ($I_1$, $I_2$, $I_3$, and $I_4$) from Figure 3 have two data-dependent sequences, namely $(I_1, I_3, I_4)$ and $(I_2, I_3, I_4)$, as shown in Figure 8a. Note that $I_4$ is a store instruction that marks the end of the data sequence. For each control flow or store instruction, FaultMeter identifies all possible data-dependent sequences that terminate with that instruction and computes the maximum probability that the output of the instruction is corrupted. The choice of the maximum gives the highest success with which the output of an instruction can be corrupted. For instance, the maximum probability that $I_4$ (Figure 8a) is corrupted is given by

$$P(C_{3,(4)}) = \max\{P(C^{i}_{2,4}),\; P(C_{3,(3,4)}),\; P(C_{3,(1,3,4)}),\; P(C_{3,(2,3,4)})\}. \qquad (2)$$

Here $P(C^{i}_{2,4})$ is the probability that a fault injected in instruction $I_4$ corrupts its output. All other probabilities correspond to a fault injected in a predecessor instruction (either $I_1$, $I_2$, or $I_3$) that propagates, corrupting the output of $I_4$. To compute these probabilities, we take the example of the sequence of instructions $I_1$, $I_3$, and $I_4$. If a fault is injected in the instruction $I_1$ then, assuming independence between instructions, the fault propagation probability at the end of the sequence is computed as

$$P(C_{3,(1,3,4)}) = P(C^{i}_{2,1}) \times P(C^{d}_{2,3}) \times P(C^{d}_{2,4}), \qquad (3)$$

where $P(C^{i}_{2,1})$ is the probability of a fault injected in instruction $I_1$ (refer Section 4.2.1). This fault propagates through $I_3$ and $I_4$ due to the data-dependent path. We quantify this by considering the fault in the data in the corresponding instructions, i.e. $P(C^{d}_{2,3})$ and $P(C^{d}_{2,4})$ (refer Section 4.2.2). In a similar manner, we quantify the fault propagation probability of the other instruction sequences, $P(C_{3,(3,4)})$ and $P(C_{3,(2,3,4)})$.
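The maximum over data-dependent sequences can be illustrated with a short sketch. The probability values below are hypothetical placeholders, not measurements from any processor, and the function names are chosen here for illustration only. Each sequence is scored as the instruction-fault probability of its first instruction multiplied by the data-fault probabilities of the remaining instructions, under the same independence assumption as in the text.

```python
from math import prod


def sequence_probability(seq, p_instr, p_data):
    """Fault injected at seq[0], then carried as corrupted data to seq[-1]."""
    head, *tail = seq
    return p_instr[head] * prod(p_data[j] for j in tail)


def max_corruption_probability(sequences, p_instr, p_data):
    """Highest probability over all sequences ending at the same instruction."""
    return max(sequence_probability(s, p_instr, p_data) for s in sequences)


# Hypothetical activation probabilities for I1..I4 (illustrative values only).
p_instr = {1: 0.30, 2: 0.20, 3: 0.25, 4: 0.40}  # fault in the instruction
p_data = {3: 0.90, 4: 0.80}                     # fault arriving through data

# The four candidates for the store I4: a direct fault in I4, plus the
# sequences (I3, I4), (I1, I3, I4) and (I2, I3, I4).
sequences = [(4,), (3, 4), (1, 3, 4), (2, 3, 4)]
best = max_corruption_probability(sequences, p_instr, p_data)
print(best)
```

With these placeholder values the direct fault in $I_4$ dominates; with other values a longer sequence may give the maximum.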

Fault propagation in memory dependent instructions.
To compute the fault propagation through memory operations, we keep track of load and store dependencies. For example, in Figure 8b, the store in $I_{29}$ and the subsequent load in $I_{47}$ are to the same address. Thus a corrupted store in $I_{29}$ corrupts the load in $I_{47}$. From the corrupted load, the fault propagates through registers in data-dependent instruction sequences. For example, the next data-dependent instruction sequence is from $I_{47}$ to $I_{50}$, which produces an output. Note that the current version of the work assumes that the addresses can be resolved statically; support for load and store operations that can only be resolved at runtime is left as future work.
Fault does not change the control flow path. For example, the fault in $I_{21}$ can influence the store in $I_{29}$ only if the branch is taken. Thus, the probability of fault propagation from $I_{21}$ to $I_{29}$ also depends on the probability that the branch is taken. To compute these probabilities, we determine all the store instructions along the control flow path and then evaluate fault propagation to the stored memory location using the data-dependent sequence (discussed earlier in this section). Thus, the probability that a fault injected in $I_{21}$ can propagate to the memory location used in the store in $I_{29}$ is computed as follows:

$$P(C_{3,(21,27,28,29)}) = T_{26} \times P(C^{i}_{2,21}) \times P(C^{d}_{2,27}) \times P(C^{d}_{2,28}) \times P(C^{d}_{2,29}), \qquad (4)$$

where $T_{26}$ is the probability that the branch in $I_{26}$ is taken. To generalize this approach when there are multiple stores in both the taken and the not-taken branches, we independently compute the probability of fault propagation to each store using the data-dependent sequence analysis discussed earlier, and then consider the maximum probability.
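The branch-weighted propagation and the maximum over several candidate stores can be sketched as follows. All numbers are hypothetical, and the helper names are not part of FaultMeter; the sketch only assumes that each store is reached with some branch probability and that the fault travels to it along a data-dependent path.

```python
from math import prod


def propagate_to_store(p_inject, p_data_path, p_reach=1.0):
    """Probability that a fault reaches a store's memory location.

    p_inject: probability that the injected fault corrupts the first instruction.
    p_data_path: data-fault probabilities of the instructions after the first.
    p_reach: probability that control flow actually reaches the store
        (for example, the probability that the guarding branch is taken).
    """
    return p_reach * p_inject * prod(p_data_path)


def max_over_stores(candidates):
    """With several stores on different branch outcomes, keep the maximum."""
    return max(propagate_to_store(*c) for c in candidates)


# Hypothetical values: a store on the taken path (reached with probability 0.75)
# and another store on the not-taken path (reached with probability 0.25).
taken_store = (0.3, [0.9, 0.8, 0.7], 0.75)
not_taken_store = (0.3, [0.9, 0.6], 0.25)
print(max_over_stores([taken_store, not_taken_store]))
```

Taking the maximum mirrors the worst-case choice made for register propagation: the score reflects the most favourable route available to the attacker.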
Fault changes the control flow path of the program. For example, a fault in instruction $I_{26}$ can cause the taken branch to be not-taken (or vice-versa), so that the store instruction at $I_{29}$ is illegally executed (or not executed). In either case, the corresponding memory operation in the store instruction is corrupted. To compute the fault propagation probability from the control flow instruction to the corrupted store, we need to consider the conditional probability that the branch is illegally taken or not-taken due to the fault in the branch instruction. The probability that the branch is illegally not-taken, due to a fault in the control flow instruction that alters its output, is given by

$$P[NT_{26} \mid F_1] = P(T_{26}) \times P(C^{i}_{2,26}). \qquad (5)$$

This fault corresponds to the event $F_1$ defined in Section 4.2, where the fault is activated. In Equation 5, $P(T_{26})$ is the probability that the branch is taken, and $P(C^{i}_{2,26})$ is the probability that the output of the 26-th instruction (the branch) is corrupted by the fault. The probability that the branch is taken or not taken is determined empirically. For example, a corrupted output of a branch instruction could make a not-taken branch taken, or vice-versa. Similarly, we can compute the probability that the branch is illegally taken due to the fault, i.e. $P[T_{26} \mid F_1]$.
Considering Figure 9, a fault in $I_{26}$ can corrupt the memory location used in the store instruction $I_{29}$ by causing either the execution or the non-execution of the store. Assuming that the branch at $I_{26}$ is taken, the fault can force the branch to be not-taken, resulting in the execution of the instruction $I_{29}$ and corrupting the corresponding stored memory. On the other hand, assuming that the branch at $I_{26}$ is not-taken, the fault can force the branch to be taken, skipping the store instruction at $I_{29}$. This too corrupts the corresponding memory. Thus, the probability that the memory used in the store is corrupted is given by

$$\begin{aligned}
P(C_{3,(26,27,28,29)}) &= P_{T_{26}}(C^{i}_{2,26}) + P_{NT_{26}}(C^{i}_{2,26}) \\
&= P(NT_{26}) \times P(C^{i}_{2,26}) + P(T_{26}) \times P(C^{i}_{2,26}) \\
&= (1 - P(T_{26})) \times P(C^{i}_{2,26}) + P(T_{26}) \times P(C^{i}_{2,26}) \\
&= P(C^{i}_{2,26}). \qquad (6)
\end{aligned}$$
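The cancellation in Equation 6 can be checked numerically with a few lines. This is only a sanity check under the stated model, with a hypothetical corruption probability of 0.4: whatever the branch-taken probability, the two illegal outcomes sum to the probability that the branch output is corrupted.

```python
def store_corruption_probability(p_taken, p_branch_corrupted):
    """Sum of the two ways a faulted branch corrupts the guarded store."""
    illegally_not_taken = p_taken * p_branch_corrupted      # as in Equation 5
    illegally_taken = (1.0 - p_taken) * p_branch_corrupted  # symmetric case
    return illegally_taken + illegally_not_taken


# The result does not depend on how often the branch is taken.
for p_taken in (0.0, 0.25, 0.5, 0.9, 1.0):
    result = store_corruption_probability(p_taken, 0.4)
    print(p_taken, result)
```

A practical consequence is that, for this case, the empirically measured branch probability is not needed: only the activation probability of the branch instruction matters.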

The FaultMeter Algorithm
The FaultMeter algorithm (Algorithm 1) takes three parameters. The first parameter, CFG, is the control flow graph of the implementation under test (IUT). The second is V_List, which is a list of vulnerable instructions in the IUT. These vulnerable instructions are obtained from the Vulnerable Instruction Identification module and are the only locations in the IUT where an injected fault can be exploited. The third parameter is a processor-specific lookup table, TPF, comprising instructions and their fault activation probabilities, which were obtained empirically as discussed in Section 4.2. The fault can be injected in the instruction, memory/register, or program counter. Without loss of generality, Algorithm 1 considers faults injected only in instructions. The algorithm assigns a probability to each node in V_List. This probability, SuccessScore, denotes the success with which injected faults propagate to the output. The algorithm starts executing from Main (Lines 1-7). For each vulnerable location (i.e. each element $I_l$ in V_List), it creates a data-dependent graph (DDG). These graphs are acyclic and show the propagation of a fault from $I_l$ to the output. For example, in Figure 3, 18 out of the 54 instructions present are vulnerable, hence the algorithm would have 18 different DDGs. For each of these graphs DDG[$I_l$], the function ComputeP is invoked. The second parameter passed to ComputeP holds the fault activation probability for the instruction $I_l$. The function returns the probability with which a fault in instruction $I_l$ propagates to the output.
The ComputeP function extracts the path from $I_l$ to $I_n$ such that $I_n$ is a store, a branch instruction, or an output instruction. If there exists a branch between $I_l$ and $I_n$ in the CFG, then the probability of $I_n$ being executed is determined (Lines 11-14). This depends on the probability that the branch is taken or not-taken. For example, in Figure 9, the data-dependent path is $I_{21}$, $I_{27}$, $I_{28}$, and $I_{29}$. The probability that $I_{29}$ executes is the probability that the branch at $I_{26}$ is taken. Line 15 computes the fault propagation probability from $I_l$ to $I_n$.
Lines 16-28 identify the next sequence of instructions to evaluate. If $I_n$ is a branch, it identifies the probability that store instructions in the taken and not-taken paths are corrupted. ComputeP terminates if $I_n$ is an output instruction; $P(C_{3,(I_l,\cdots,I_n)})$ is then assigned the probability $p$ (Line 25). If the instruction $I_n$ is a store, then a store-to-load dependency ($I_n$ to $I_l$) is determined to propagate the fault further (described in Section 4.3). ComputeP is recursively invoked (Line 28) with the load instruction $I_l$ as the fault activation location of DDG[$I_l$] and $p$, the probability of the fault corrupting the output of $I_l$ (refer example in Figure 8b). The function returns the fault propagation probability for $I_l$ when it finds an output instruction.
Figure 10 shows the fault propagation for two locations, $I_1$ and $I_{15}$, on two processors, TI MSP-430 (16-bit) and RISC-V (32-bit), for the toy cipher given in Figure 3. The SuccessScore for these locations is computed using Algorithm 1. The x-axis shows the IR instructions $I_1$ to $I_{54}$, and the y-axis shows the fault propagation probability. The graph shows that the fault propagation probability varies based on the processor and the location of fault activation.
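The recursion described above can be sketched in a simplified form. This is not Algorithm 1 itself: the dictionary layout (`paths`, `kind`, `store_to_load`, `branch_stores`, `exec_prob`, `p_data`), the toy graph, and all probability values are assumptions made for illustration, and the handling of branches is reduced to following every store they guard and keeping the maximum.

```python
from math import prod


def compute_p(start, p, g):
    """Probability that a fault activated at `start` (with probability p)
    propagates to an output instruction in the hypothetical graph `g`."""
    path = g["paths"][start]  # start ... first store, branch, or output
    end = path[-1]
    # Carry the fault as corrupted data along the path, weighted by the
    # probability that the terminal instruction executes at all.
    p *= g["exec_prob"].get(end, 1.0) * prod(g["p_data"][j] for j in path[1:])
    kind = g["kind"][end]
    if kind == "output":
        return p
    if kind == "store":
        stores = [end]
    else:  # branch: look at the stores on its taken and not-taken paths
        stores = g["branch_stores"].get(end, [])
    # Follow each store-to-load dependency and recurse from the load.
    loads = [g["store_to_load"][s] for s in stores if s in g["store_to_load"]]
    return max((compute_p(load, p, g) for load in loads), default=0.0)


def fault_meter(v_list, tpf, g):
    """Assign a SuccessScore to every vulnerable instruction in v_list."""
    return {v: compute_p(v, tpf[v], g) for v in v_list}


# Toy graph: I1 -> I3 -> I4 (store), reloaded by I10 -> I11 -> I12 (output);
# I20 -> I21 -> I22 (branch) guards the store I4.
g = {
    "paths": {1: [1, 3, 4], 10: [10, 11, 12], 20: [20, 21, 22]},
    "kind": {4: "store", 12: "output", 22: "branch"},
    "store_to_load": {4: 10},
    "branch_stores": {22: [4]},
    "exec_prob": {},
    "p_data": {3: 0.9, 4: 0.8, 11: 0.5, 12: 1.0, 21: 0.5, 22: 0.5},
}
tpf = {1: 0.3, 20: 0.2}  # hypothetical fault activation probabilities
scores = fault_meter([1, 20], tpf, g)
print(scores)
```

The recursion terminates because the data-dependent graphs are acyclic; a store with no matching load simply contributes a score of zero in this sketch.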

Implementation and Evaluation

Vulnerable Instruction Identification
The Vulnerable Instruction Identification module (refer Figure 3) takes the block cipher source code as input and marks the fault-vulnerable instructions in the implementation.
FaultMeter uses the FEDS [KRR+20] framework for this stage. FEDS determines all the exploitable instructions, i.e. the IR instructions that are susceptible to a fault attack. The Vulnerable Instruction Identification module works by converting the IR instructions to a control flow graph and finding the dependencies between the instructions using a reverse data flow analysis on the control flow graph (refer Section 4.1).

Figure 11: SuccessScore of vulnerable instructions from an AES (LookUp Table) based implementation on six different processors. Each cell represents an instruction and the color code represents the SuccessScore.

Table 3 shows the output of the Vulnerable Instruction Identification module. The percentage of exploitable instructions, out of the total instructions in the control flow graph, varies from 6.56% (AES-128 (LookUp Table)) to 23.2% (CAMELLIA-128). As an example, in the AES (T-Table) implementation, 4.2% of the 4299 IR instructions are exploitable by a fault attack. A fault induced in any of these vulnerable instructions can result in a successful fault attack. These results depend only on the cipher implementation and are agnostic of the underlying hardware.

Fault Exploitability Quantification
The Fault Exploitability Quantification module takes as input the control flow graph with marked vulnerable nodes and the processor-dependent fault activation probabilities, and quantifies the vulnerability using Algorithm 1. The module considers faults injected in instruction opcodes, memory, registers, and the program counter. In each case, the probability of the injected fault propagating to the output is computed. This probability depends on the underlying hardware architecture and the program structure.
FaultMeter evaluation on different architectures. To demonstrate that the fault propagation varies based on program structure, we have considered five cipher implementations (given in Table 3) and five RISC microprocessors: ARM (32 and 64 bit), RISC-V (32 and 64 bit), and TI MSP-430 (16 bit). We also considered the Intel x86 (64-bit) CISC architecture. Figure 11 shows the memory layout of exploitable instructions for the AES-128 LookUp Table based implementation (Table 3). Each colored cell is an exploitable instruction, with the color indicating the probability that a fault in that instruction can corrupt the ciphertext and result in a successful attack. Notice that the vulnerable instructions are the same on each processor architecture. However, the difference in color across the architectures indicates that the exploitability of an instruction differs from one architecture to another.
TI's MSP430 (16-bit) RISC processor and the Intel x86 (64-bit) processor have instructions with high fault susceptibility. This means that a fault injected in these processors has a higher chance of disturbing the execution than a fault in the same instruction in the other processors. This is because, comparatively, the TI MSP430 has a densely packed instruction set (refer Section 4.2); therefore, there is a smaller probability of obtaining an invalid opcode. This results in a high $P(C_2)$. Similarly, the large number of instructions in Intel's x86 platforms, owing to the CISC architecture, has a similar impact.
For AES-128 (LookUp Table) (Figure 12a), 0.5% of the instructions (≈ 72) have a maximum SuccessScore of 0.5 across the different architectures, whereas for AES-128 (T-Table) (Figure 12b) and AES-128 (BitSliced) (Figure 12c), the percentage of vulnerable instructions with a similar SuccessScore is 0.4% (≈ 18) and 1.5% (≈ 124) respectively.
Of the three AES implementations, the BitSliced implementation is the most prone to fault attacks because it has the highest percentage of vulnerable instructions (Table 3) and a higher fault propagation probability across all the architectures (Figure 12). By the same criteria, the T-Table implementation is the most secure of the AES-128 implementations considered. Figure 12f compares the fault propagation probability of the cipher implementations (given in Table 3).

Figure 13: Heat map for AES-128 (Look-Up Table) after inserting countermeasures on three processors with $U_{in}$ = 0.4 and 0.8. Each cell represents an instruction.

infrastructure would require much more secure implementations compared to an application in a consumer device. Thus, for such applications, designers would typically want to prioritize security over performance. Such trade-offs would be less acceptable for the consumer device, especially a resource-constrained device, where each byte and each clock cycle is valuable. In this section, we demonstrate the use of FaultMeter to cater to the diverse security requirements of applications. We use FaultMeter to automatically insert countermeasures based on the user's input, which is a number between 0 and 1 defining the extent to which security is important in the application. A value close to 1 implies that the user prioritizes security over performance, while a value close to 0 implies that performance is critical. The SuccessScore produced by FaultMeter is used to tune between security and performance. For instance, if the user input is $U_{in}$, then all the instructions where SuccessScore $\geq (1 - U_{in})$ are protected by automatically inserting countermeasures at these locations only. All other locations, with a lower SuccessScore, are not protected.
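The selection rule can be written down in a few lines. The SuccessScore values and the function name below are hypothetical; the sketch only applies the threshold SuccessScore $\geq (1 - U_{in})$ described above.

```python
def select_for_protection(success_scores, u_in):
    """Return the instructions whose SuccessScore is at least 1 - u_in."""
    if not 0.0 <= u_in <= 1.0:
        raise ValueError("u_in must lie between 0 and 1")
    threshold = 1.0 - u_in
    return sorted(i for i, s in success_scores.items() if s >= threshold)


# Hypothetical SuccessScores for three vulnerable instructions.
success_scores = {"I1": 0.5, "I15": 0.25, "I18": 0.75}

print(select_for_protection(success_scores, 0.25))  # security matters little
print(select_for_protection(success_scores, 0.5))
print(select_for_protection(success_scores, 0.75))  # security matters most
```

Raising $U_{in}$ lowers the threshold, so the protected set can only grow; this is what lets a single parameter trade overhead against coverage.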
To demonstrate automatic countermeasure insertion, we use the spatial redundancy countermeasure as a case study. The countermeasure addition module is shown in Figure 1. Table 4 shows the percentage increase in code size and clock cycles for different ciphers realized on the TI MSP-430 (16-bit) and ARM (64-bit) processors, for a naïvely protected executable and an executable generated with FaultMeter based countermeasures. We observe that (a) the percentage increase in code size and execution clock cycles is far less with FaultMeter based protection: unlike the naïve approach, where all instructions in the executable are protected, only instructions with a SuccessScore $\geq (1 - U_{in})$ are protected; and (b) the increase in code size and execution time varies directly with the value of $U_{in}$ and inversely with fault coverage.
The TI MSP-430 (16-bit) is the most vulnerable of the processors considered, hence its percentage increase in code size is higher than that of the ARM (64-bit) processor. From the table it is evident that the performance overhead after the countermeasure addition depends on the implementation as well as the underlying architecture. Figure 13 shows the heat map of the vulnerability of instructions for two different user input ($U_{in}$) values for the AES-128 (Look-Up Table) implementation (Table 3) on three platforms after the countermeasure is inserted. From the figure, it is evident that a higher $U_{in}$ results in more protected implementations. The countermeasures inserted also differ from one platform to another. The SuccessScore after countermeasure insertion is computed independently of the previous experiments. Figure 14 shows the percentage of instructions protected for different user inputs. For example, for AES-128 (T-Table) on a TI MSP-430, when $U_{in}$ = 0.7, 4% of vulnerable instructions are protected, while on a 64-bit ARM, only 1% of vulnerable instructions are protected.

Limitations
In its current form, FaultMeter has two limitations.
• The Vulnerable Instruction Identification module used in FaultMeter identifies the vulnerable instructions, while FaultMeter quantifies the vulnerability. If an instruction is incorrectly identified as not vulnerable by the Vulnerable Instruction Identification module, FaultMeter will not be able to quantify it. Similarly, if an instruction is incorrectly marked as vulnerable, the output of FaultMeter is also incorrect.
• FaultMeter currently works with unprotected implementations of block ciphers. It needs to be extended to support implementations where the protection is already incorporated. In order to do this, FaultMeter would need to distinguish instructions that are present due to the countermeasure. Distinguishing these countermeasure related instructions from other instructions is challenging at the compiler's intermediate representation level and therefore left as future work.

Discussion
Non-Cryptographic Applications: Besides cryptography, FaultMeter can be used for other security applications, such as information flow analysis [SM03] and safety-critical applications. The major challenge is to find the sensitive locations in the application software.

Conclusion
FaultMeter is an automated framework that can quantify the success with which an injected fault can be exploited. We show that this success probability depends on the cipher algorithm, its implementation, and the Instruction Set Architecture (ISA) of the processor. Our evaluation of five cipher implementations on six hardware platforms brings out interesting observations. For instance, the TI MSP-430 (16-bit) and Intel x86 (64-bit) are the most vulnerable to fault attacks. Comparing the 32-bit RISC processors, ARM is more vulnerable to fault injection than RISC-V. On the other hand, the 64-bit variant of RISC-V is more vulnerable than the equivalent ARM variant. Further, the smaller TI MSP-430 processor is the most vulnerable amongst all processors considered. Comparing different implementations of AES, the T-Table implementation is the most secure against fault attacks. The quantification that FaultMeter provides can be used to strategically choose the right countermeasure in block cipher implementations to meet the application's security requirements, as we demonstrated in the paper.