A Finer-Grain Analysis of the Leakage (Non) Resilience of OCB

OCB3 is one of the winners of the CAESAR competition and is among the most popular authenticated encryption schemes. In this paper, we put forward a fine-grain study of its security against side-channel attacks. We start from trivial key recoveries in settings where the mode can be attacked with standard Differential Power Analysis (DPA) against some block cipher calls in its execution (namely, initialization, processing of associated data or last incomplete block and decryption). These attacks imply that at least these parts must be strongly protected thanks to countermeasures like masking. We next show that if these block cipher calls of the mode are protected, practical attacks on the remaining block cipher calls remain possible. A first option is to mount a DPA with unknown inputs. A more efficient option is to mount a DPA that exploits horizontal relations between consecutive input whitening values. It allows trading a significantly reduced data complexity for a higher key guessing complexity and turns out to be the best attack vector in practical experiments performed against an implementation of OCB3 in an ARM Cortex-M0. Eventually, we consider an implementation where all the block cipher calls are protected. We first show that exploiting the leakage of the whitening values requires mounting a Simple Power Analysis (SPA) against linear operations. We then show that despite being more challenging than when applied to non-linear operations, such an SPA remains feasible against 8-bit implementations, leaving its generalization to larger implementations as an interesting open problem. We last describe how recovering the whitening values can lead to strong attacks against the confidentiality and integrity of OCB3. Thanks to this comprehensive analysis, we draw concrete requirements for side-channel resistant implementations of OCB3.


Introduction
The side-channel cryptanalysis of block ciphers has been a topic of intensive research over the last two decades. One important outcome of these research advances is that if no special care is taken, the implementation of any block cipher can be targeted by a side-channel attack, quite independently of its internal components [MOP08]. For example, to a large extent the methods developed to analyze leaking implementations of the AES apply to most (e.g., lightweight) block ciphers developed afterwards [HPGM16]. Yet, some works indicate that while the structure of a block cipher has limited impact on the side-channel attack vectors that can be turned against it (in part because most modern ciphers follow quite similar design principles), the way the block cipher is used in a mode of operation can have a more significant impact [BBC + 20].
One of the first examples of such an impact is the first-order Differential Power Analysis (DPA) against the AES in counter mode with unknown initial counter put forward by Jaffe [Jaf07]. It was followed by investigations on HMAC which turned out to be attackable given some additional tweaks as well [MTMM07,BBD + 13]. A similar issue of secret unknown constants was observed when trying to attack the MILENAGE algorithm used in 3G/4G communications [LYS + 15]. Eventually, attacks against the XTS-AES mode of operation have been discussed in [LFDC19] and Jaffe's attack was recently improved in the context of the NIST CTR_DRBG mode [Mey20].
In parallel, the design of modes of operation specifically tailored to mitigate side-channel attacks has evolved under the umbrella of leakage-resilient cryptography. Many constructions have been proposed for this purpose, leveraging various design ideas. We mention [DP08] for one of the first modes exploiting ephemeral key evolution and [BKP + 18] for the introduction of strengthened key and tag generation mechanisms, and we mention TEDT [BGP + 20], ISAP [DEM + 20] and Spook [BBB + 20] as three authenticated encryption schemes based on such ideas.
These research advances have been recently discussed in a comprehensive manner at Crypto 2020 [BBC + 20]. The main outcome of this discussion is that for some modes of operation, it is possible to reach high security against side-channel attacks without uniformly protecting all the components with expensive countermeasures, leading to so-called leveled implementations.
In this paper, we are interested in the OCB3 mode of operation which, for simplicity, we denote as OCB [RBBK01]. It is among the most popular solutions for instantiating an authenticated encryption scheme efficiently and it is one of the winners of the CAESAR competition. In the aforementioned discussion on modes of operation [BBC + 20], OCB is referred to as a "Grade-0" design, where all individual components need protection against DPA. While such an observation is essentially correct, since OCB uses the same long-term key in all the executions of its underlying block cipher, our research question is whether some finer-grain modeling would allow reaching a more balanced view. In other words, do all the computations involved in an execution of OCB require the same level of security against side-channel analysis, or can we identify different types of DPAs against different computations within OCB, or even parts of the computations that only require Simple Power Analysis (SPA) resistance, leading to some possibility of finer-grain leveling in its secure implementation? We contribute to this question by exhibiting several attack vectors against various implementations of OCB, where its most sensitive computations are gradually protected against DPA, and by discussing the complexity of these different attacks.
For this purpose, we first highlight that trivial DPAs can be mounted against the initialization, the processing of associated data, the processing of a final incomplete message block and the decryption of OCB. Preventing these attacks consequently requires a strong and uniform protection of all the block cipher calls within OCB. Note that throughout this paper, the term DPA (resp., SPA) denotes an attack where the number of plaintexts for which the leakage can be observed for a fixed key is under adversarial control (resp., is bounded by design).
We then consider an OCB encryption where the initialization phase is well protected against DPA and discuss attacks that are able to circumvent the secret whitening that this secure initialization implies. A straightforward option is to target the secret whitening as a type of masking scheme and to perform an attack mixing the leakage of two target intermediate values [HTM09,CR17]. Yet, we put forward a more efficient solution that allows attacking such an implementation with a horizontal attack exploiting the relations between consecutive whitening values. It can be viewed as a DPA with a more expensive key guessing strategy, which is well suited to implementations with reasonable noise levels. We show experimentally that it is the best attack vector in the case of an implementation of OCB in an ARM Cortex-M0 and analyze it theoretically.
We finally consider the case where all the block cipher calls of OCB are strongly protected against DPA and the whitening becomes the only target for a side-channel attack. Recovering these whitening values requires performing an SPA against linear operations. We first show experimentally that despite being more challenging than similar attacks targeting non-linear operations like [KPP20, BBC + 20], such SPAs are feasible against 8-bit implementations. We next argue that they become hard when targeting larger intermediate values (again, even more than when targeting non-linear operations). We use this observation to confirm the existence of a risk, put forward that (as in general for single-trace SPA) this risk is hard to quantify since it is implementation- and setup-dependent, and leave the generalization of the exhibited SPA to 16-bit or 32-bit implementations as an interesting open problem. We finally show how to exploit the recovery of the whitening values with simple attacks that break the integrity and the confidentiality of OCB.
Based on this finer-grain analysis, we conclude that a secure implementation of the OCB encryption requires the strongest DPA protections for its initialization, that the block cipher calls used for the message processing may benefit from slightly weaker countermeasures if only encryption leaks without associated data (since the adversary has to deal with an additional secret whitening to target them), and that the whitening itself becomes a more challenging target in case all the block cipher calls are sufficiently protected. In this last case, we additionally mention that preventing the attack against the whitening layer may be easier as the target attack is an SPA and the operations to protect are linear, making them easier to mask and enabling a mild leveling.

Notations
In this paper, we use capital letters for random variables and small caps for their realizations. We use sans serif font for functions (e.g., F) and calligraphic fonts for sets (e.g., A). We use small bold caps for vectors (e.g., v). We denote the conditional probability of a random variable A given B with P[A|B]. We denote by $m^j_i$ the $i$th block of the plaintext number $j$, and by $m^j_i(\kappa)$ the $\kappa$th byte of the $i$th block of the plaintext number $j$. We use similar notation for ciphertexts, $c^j_i$ and $c^j_i(\kappa)$. When explicit by the context, we will sometimes omit the superscript $j$ if only one plaintext with several blocks is considered. When there is no ambiguity, we will also sometimes omit the byte-indexing for readability reasons. Given a block cipher BC, a master key $k$ and a plaintext block $m_i$, we denote as $BC_k(m_i)$ the encryption of $m_i$ under $k$. We will further denote by $k_i$ the $i$th round key. We finally denote the concatenation of two values $a$ and $b$ as $a \| b$.

Template attacks
The main experiments in this study will be done using Template Attacks (TAs) [CRR02]. Such profiled attacks correspond to a strong (ideally the strongest) adversary, which is best suited to our main motivation, which is to understand the (ideally worst-case) security of the different parts of OCB. Let t denote a leakage measured on a cryptographic device that manipulates a target intermediate value v = F(m, k) associated to a known plaintext byte m and a secret key byte k. In a TA, the adversary first uses a vector of profiling traces $\mathbf{t}_p$ in order to estimate a leakage model, next denoted as $\hat{P}_{model}[t \mid m, k]$. The profiling traces are typically obtained by measuring a device that is similar to the target under control of the adversary. Next, during the online phase, the adversary uses a vector of new attack traces $\mathbf{t}_a$ from the plaintext vector $\mathbf{m}_a$ obtained by measuring the target device to compute the probability of each key byte guess $k^*$ as follows:
$$\Pr[k^* \mid \mathbf{t}_a, \mathbf{m}_a] = \frac{\prod_i \hat{P}_{model}[t_a^i \mid m_a^i, k^*]}{\sum_{k'} \prod_i \hat{P}_{model}[t_a^i \mid m_a^i, k']} \qquad (1)$$
The result of the attack is a vector p containing the probability of each key byte guess. In the following, we will use Gaussian estimations for the leakage probability density function (PDF). The PDF will thus be estimated by computing mean vectors µ and covariance matrices Σ. In addition, we systematically leveraged a dimensionality reduction to combine several informative leakage samples, namely the Linear Discriminant Analysis (LDA) described in [SA08].
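The profiling and online phases described above can be illustrated with a minimal simulated sketch. It is not the setup used in this paper: it assumes a univariate leakage equal to the Hamming weight of the first-round S-box output plus Gaussian noise, a pooled variance instead of per-class covariance matrices, and no LDA. All function names and parameters (`profile`, `attack`, the noise level, the number of traces) are hypothetical choices made for the illustration.

```python
import math
import random

def aes_sbox():
    """Build the AES S-box (field inversion followed by the affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r
    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    box = []
    for x in inv:
        y = x
        for s in range(1, 5):
            y ^= ((x << s) | (x >> (8 - s))) & 0xFF
        box.append(y ^ 0x63)
    return box

SBOX = aes_sbox()
HW = [bin(v).count("1") for v in range(256)]

def leak(v, sigma, rng):
    """Simulated leakage (assumption): Hamming weight of v plus Gaussian noise."""
    return HW[v] + rng.gauss(0.0, sigma)

def profile(sigma, n_per_value, rng):
    """Profiling phase: one mean per intermediate value and a pooled variance."""
    means, sq_err, dof = [], 0.0, 0
    for v in range(256):
        samples = [leak(v, sigma, rng) for _ in range(n_per_value)]
        mu = sum(samples) / n_per_value
        means.append(mu)
        sq_err += sum((s - mu) ** 2 for s in samples)
        dof += n_per_value - 1
    return means, sq_err / dof

def attack(traces, plaintexts, means, var):
    """Online phase: normalized probability of each key byte guess."""
    log_l = [0.0] * 256
    for t, m in zip(traces, plaintexts):
        for k in range(256):
            # The Gaussian normalization constant is identical for all guesses.
            log_l[k] -= (t - means[SBOX[m ^ k]]) ** 2 / (2.0 * var)
    top = max(log_l)
    weights = [math.exp(l - top) for l in log_l]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(2024)
means, var = profile(sigma=0.5, n_per_value=30, rng=rng)
true_key = 0x3C
plaintexts = [rng.randrange(256) for _ in range(200)]
traces = [leak(SBOX[m ^ true_key], 0.5, rng) for m in plaintexts]
p = attack(traces, plaintexts, means, var)
best_guess = max(range(256), key=p.__getitem__)
```

Working with log-likelihoods and normalizing only at the end avoids the numerical underflow that a direct product of densities would cause for a few hundred traces.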

Information theoretic and security metrics
We exploit two information theoretic metrics in our discussions. The Signal-to-Noise Ratio (SNR) is used to identify points-of-interest in the leakage traces. The mutual information is used to discuss the link between a DPA with unknown plaintexts and a DPA against a masked implementation.
Signal-to-noise ratio [Man04]. For a leakage sample L and an a-bit variable X, the SNR is a measure of the (first-order) information leaked. For all values $i \in [0, 2^a - 1]$ that X can take, a set of samples of L, indexed by $j \in [0, b]$, is acquired and stored in a vector $\mathbf{s}_i$. The SNR is then computed as in Equation 2, where E and Var denote the sample mean and variance:
$$\mathrm{SNR} = \frac{\mathrm{Var}_i\left(\mathrm{E}_j\left(s_i^j\right)\right)}{\mathrm{E}_i\left(\mathrm{Var}_j\left(s_i^j\right)\right)} \qquad (2)$$
Mutual information [SMY09]. For a leakage sample L and an a-bit variable X, the Mutual Information (MI) computes how many bits of information can be learned about X on average when observing a realization of L. It is computed as per Equation 3:
$$\mathrm{MI}(X; L) = \mathrm{H}[X] + \sum_{x} \Pr[x] \int_{l} \Pr[l \mid x] \cdot \log_2 \Pr[x \mid l] \, dl \qquad (3)$$
We will also use the two security metrics put forward in [SMY09] to evaluate our attacks. First, the Success Rate (SR) is the probability that an attack returns a vector p such that the correct key byte is ranked first. We will use it to evaluate the DPAs in Section 5 which allow perfect key recoveries. Second, the guessing entropy is the average rank of the correct key. We will use it (applied to full keys after rank estimation [PSG16]) to evaluate the more challenging SPA of Section 6.
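As an illustration of the SNR metric, the following sketch estimates it on simulated data. It assumes a single leakage sample equal to the Hamming weight of an 8-bit value plus Gaussian noise of standard deviation `sigma`; under this assumption the signal variance is 2, so the estimate should be close to 2/sigma^2. The function name `snr` and all parameters are hypothetical and only serve the example.

```python
import random
import statistics

def snr(samples_per_value):
    """Variance of the per-value means divided by the mean of the per-value variances."""
    means = [statistics.fmean(s) for s in samples_per_value]
    variances = [statistics.pvariance(s) for s in samples_per_value]
    return statistics.pvariance(means) / statistics.fmean(variances)

rng = random.Random(7)
sigma = 1.0  # assumed noise standard deviation
# One list of simulated leakage samples per value x of the 8-bit variable X.
classes = [
    [bin(x).count("1") + rng.gauss(0.0, sigma) for _ in range(200)]
    for x in range(256)
]
estimated_snr = snr(classes)
```

On measured traces, the same computation is repeated for every time sample, and the samples with the highest SNR are kept as points-of-interest.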

OCB mode of operation
The core OCB encryption is depicted in Figure 1 (ignoring the associated data).
A message m consisting of several blocks m i is encrypted using a block cipher BC to produce the ciphertext blocks c i . In this paper, we focus on the case where OCB is instantiated with AES-128, which we denote OCB-AES. First, a value δ 0 is initialized with a user input nonce. Using δ 0 , several values δ i are then computed using a function Inc such that δ i = Inc (δ i−1 ). The δ 0 initialization and Inc function are detailed next. Each δ i is added to the corresponding plaintext block m i before it is passed to the block cipher. Finally, the output of the block cipher is again added to δ i in order to produce the ciphertext block c i . To obtain the last ciphertext block $c_\ell$, if $m_\ell$ is full (i.e., $|m_\ell| = 128$), OCB proceeds as for a normal message block, and if $m_\ell$ is not full, $m_\ell$ is XORed with the first $|m_\ell|$ bits of $BC_k(\delta_{\ell-1} \oplus l_*)$, where the $l_*$ value is defined below.
For authentication, a tag τ is computed from the checksum of the message, $checksum := m_\ell 10^* \oplus \bigoplus_{i=1}^{\ell-1} m_i$. The mode then evaluates $\tau = BC_k(checksum \oplus \delta_\ell \oplus l_\$) \oplus sum_{AD}$ (with the $l_\$$ value defined below). Finally, the first |τ| bits of the output are used as a tag.
The associated data is processed in a similar way as the plaintext in Figure 1, replacing the plaintext blocks by the associated data ones. However, in the plaintext case, the initial value δ 0 depends on the encryption of the nonce. In the associated data case, δ 0 is always initialized to 0. The subsequent δ i values are updated using the same function Inc as for the plaintext case. The output of all the encryption blocks are XORed together, resulting in the value sum AD .
The description of the initialization function Init to compute δ 0 is given in Algorithm 1. As we can see, this processing mainly depends on two components. The first one is the user's input nonce being processed in a known manner to compute the intermediate value top. Second, the value top is used as an input to the block cipher with the master key to produce the value ktop. We use the notation $\xleftarrow{L}$ to highlight the operations for which we will exploit leakages.
Figure 1: OCB encryption mode (when the last message block is not full).

Experimental setup
In order to validate the results presented in this study, we performed practical experiments for which we now detail the setup. We first discuss the underlying implementation and the associated available leakages. As depicted in Figure 1, the input of the ith block cipher execution is m i ⊕ δ i . Moreover, as shown in Algorithm 2, we have δ i = δ i−1 ⊕ l ntz(i) . As a result, one can rewrite the ith input of each AES execution such that it always depends on δ 0 and its corresponding sums of l i . The resulting equations are shown in Algorithm 3 (rewriting of the OCB block cipher inputs depending on δ 0 ). In addition, at the beginning of the AES execution, the first round key k 0 is XORed to the state and we thus have the initial state equal to m i ⊕ δ i ⊕ k 0 . Putting things together, the initial state of each AES execution can be divided into three parts: a known message m i , a fixed unknown constant k 0 ⊕ δ 0 , and a sum of l i values that varies depending on the block index. Concretely, we considered an implementation where all the l i values are precomputed. It starts by computing δ 0 which is stored in some variable. This variable is then updated by adding l ntz(i) when processing the ith plaintext block. The result is finally passed to the AES state input along with the plaintext block m i . Overall, and as depicted in Figure 1 and Algorithms 1 and 2, the adversary is provided with leakages on the following operations:
1. Leakages corresponding to the initialization block cipher call $ktop \xleftarrow{L} BC_k(top)$.
2. Leakages on the computations and updates of the whitening values, $\delta_i \xleftarrow{L} \delta_{i-1} \oplus l_{ntz(i)}$.
3. Leakages corresponding to the block cipher calls processing the plaintext, $BC_k(\delta_i \oplus m_i)$.
We implemented AES-OCB in an ARM Cortex-M0 microcontroller. The C code used for this purpose was designed to take advantage of 32-bit operations whenever possible, relies on table lookups for the S-box executions and does not include side-channel countermeasures. Our target implementation ignores the processing of the associated data which, as will be shown in Section 4, leads to a trivial attack. We collected measurements using a Picoscope 5244D oscilloscope sampling at 500 MSamples/s. We used a shunt resistor of 6 Ohms to measure the current consumption of the target, an STM32F0308 Discovery board. Two small modifications were performed on the board. First, a crystal oscillator was added to the board to provide a stable clock source for our measurements. Second, decoupling capacitors were desoldered. The target device was set to run at its maximum clock frequency of 48 MHz. We set up a trigger signal to synchronize our traces so that the ith trace contains the leakages on the update of δ i and the block cipher call BC k (δ i ⊕ m i ). We measured 800k traces with random inputs for profiling and 1.6M traces for attacks. Our profiling set contains the encryption of 800k plaintexts of one block and our set of attack traces corresponds to 20 plaintexts, each of 80,000 blocks of 128 bits.
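The rewriting of the whitening values as δ 0 plus a sum of l values, used at the start of this section, can be checked with a short sketch. It compares the iterative update δ i = δ i−1 ⊕ l ntz(i) with a direct expression in which l j contributes if and only if bit j of the Gray code of i is set. This Gray code formulation is a standard identity for this kind of update and is not taken from Algorithm 3. Blocks are modeled as Python integers, and the values of `delta0` and `l` are arbitrary placeholders rather than outputs of a real OCB initialization.

```python
import random

def ntz(i: int) -> int:
    """Number of trailing zero bits of i (for i >= 1)."""
    return (i & -i).bit_length() - 1

def offsets_iterative(delta0, l, n_blocks):
    """delta_i = delta_{i-1} XOR l[ntz(i)] for i = 1..n_blocks."""
    deltas, delta = [], delta0
    for i in range(1, n_blocks + 1):
        delta ^= l[ntz(i)]
        deltas.append(delta)
    return deltas

def offset_direct(delta0, l, i):
    """delta_i as delta_0 XOR the l[j] selected by the Gray code i ^ (i >> 1)."""
    gray, delta, j = i ^ (i >> 1), delta0, 0
    while gray:
        if gray & 1:
            delta ^= l[j]
        gray >>= 1
        j += 1
    return delta

rng = random.Random(1)
delta0 = rng.getrandbits(128)                 # placeholder for Init(nonce)
l = [rng.getrandbits(128) for _ in range(8)]  # placeholders for l_0..l_7
deltas = offsets_iterative(delta0, l, 255)
```

For instance, the third block uses δ 3 = δ 0 ⊕ l 1 , since l 0 is added twice and cancels out.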

Trivial side-channel attacks
While the OCB mode of operation does not claim any side-channel resistance, attacking a mode of operation might differ from the known/chosen plaintext/ciphertext DPAs against its underlying block cipher. As the latter is usually the one considered in the side-channel attack literature, we first exhibit trivial attacks which can be reduced to that block cipher DPA case.
DPA against the OCB initialization: The leakages associated to the initialization $ktop \xleftarrow{L} BC_k(top)$ lead to a trivial first-order DPA attack against the AES block cipher. Indeed, we can see that the initialization procedure (Algorithm 1) performs the encryption of the value top using the master key. As top only depends on the nonce, it can thus be known or chosen by the adversary. As a result, the side-channel security of a fully unprotected implementation of OCB-AES is as low as the security of an unprotected AES in a known or chosen plaintext scenario.
DPA against the OCB decryption: In the decryption scenario, the adversary is able to query OCB-AES with the same nonce several times. In that case, δ 0 and thus δ 1 can be fixed for several ciphertext queries. That scenario again leads to a first-order known or chosen ciphertext DPA against the AES block cipher. For this purpose, the adversary targets the first decryption block c j 0 in two steps. First, a DPA on the external round can be applied, where the key material considered is equal to δ 1 ⊕ k. Once recovered, the adversary can predict the value of the state at the next round, which can subsequently be attacked with another DPA, leading to a master key recovery. Note that this second step can be performed using the same set of traces as for the first step.
DPA against the processing of the associated data: As explained in Section 2.4, the whitening value δ 1 is fixed for the associated data processing. As a result, a two-step DPA following exactly the same methodology as for the previous decryption attack can be mounted.
DPA against incomplete messages: As also detailed in Section 2.4, the encryption of the last message block is processed differently if incomplete (i.e., it has to be padded). Let us assume that the index of the last block is i. Ignoring the padding, the corresponding ciphertext block c i is computed as $c_i = m_i \oplus BC_k(\delta_i)$. That is, there is no addition with δ i before outputting c i . As a result, the output of the block cipher call BC k (δ i ) can be computed as c i ⊕ m i (ignoring the padding). This enables a trivial DPA in a known ciphertext scenario.
These trivial attacks lead to the following conclusions. First, in the case of decryption leakages or in the case where associated data must be authenticated, all the block cipher calls in OCB must be strongly protected against DPA. This is unavoidable given the specifications of OCB and therefore we next consider the more interesting case where only encryption leakages are available and no associated data is processed. Second, the trivial attacks also show that even in this case, the initialization block cipher call and the last block encryption (if incomplete) must be strongly protected against DPA as well. As a result, and in order to deepen our analysis of the leakage properties of OCB, we next consider the case where this initialization (and the last message block encryption) are well protected. This will allow us to determine whether the other operations of OCB require the same (strong) DPA protections. For this purpose, we will investigate the best attack vectors against implementations with gradual protection levels.

Protection level 1: secure initialization
In this section, we focus on the case where OCB-AES is used for encryption (without associated data) and its initialization (and the last block encryption) are well protected, leaving the adversary with the block cipher calls used for processing the message blocks as main target. We aim to determine whether weaker protections can be sufficient to protect this part of OCB.
In this respect, it has already been observed in [BBC + 20] that even in the case where only encryption leaks, all the block cipher calls of OCB use the same long-term key, and thus would require some protection against DPA. Yet, attacking the OCB encryption differs from a classical attack against its underlying block cipher, due to the presence of whitening values. Our goal is therefore to provide a finer-grain analysis, evaluating the complexity of the best attacks and comparing them with a standard DPA against the AES (as used by the trivial attacks). More precisely, and as shown in Figure 1, each plaintext block m i is now XORed with δ i before being passed to the block cipher. OCB does not allow the same nonce to be repeated for two different plaintexts (since it makes no misuse-resistance claims). Hence, δ 0 and all δ i will be different and unknown for each encryption. In order to deal with this more challenging context, we first describe a (so-called) baseline second-order attack against OCB in Section 5.1: it directly deals with the unknown whitening values by exploiting their leakages in combination with the S-box output leakages. Next, Section 5.2 introduces a more efficient first-order attack. It essentially trades a better data complexity (i.e., number of traces to recover the key) for a higher guessing (time) complexity. We show experimentally that it is indeed the best attack vector against our target ARM Cortex-M0 implementation. We conclude the section by discussing the reasons for this improved data complexity and the extent to which it remains the best attack vector for any noise level.

Baseline DPA attack
As just mentioned, the main challenge when attacking the OCB encryption of a message is to deal with the unknown δ i values. It implies that the state of the AES after the initial key addition is δ i ⊕ m i ⊕ k, where δ i is unknown and different for each encryption block.
A first straightforward approach to deal with this problem is to apply a divide-and-conquer second-order DPA. For this purpose, the adversary will use leakages on one byte of both δ i ⊕ m i and δ i ⊕ m i ⊕ k. That is, the adversary will query OCB encryptions for several plaintexts and use the leakages corresponding to the processing of plaintext blocks for each of them. More specifically, and reusing the notations of Section 3, we exploit the following two leakages t 0 and t 1 (since the attack is repeated in the same manner for each byte, we omit the byte notation):
1. The first leakage t 0 targets the input of BC k to retrieve information on δ i ⊕ m i .
2. The second leakage t 1 uses the block encryption BC k (δ i ⊕ m i ) and targets the first-round S-box output in order to retrieve information on δ i ⊕ m i ⊕ k.
Using the bivariate leakage t = (t 0 , t 1 ), we then apply the template attack described in Section 2.2, adapted to deal with the unknown δ i . That is, Equation 4 shows how to compute $\hat{P}_{model}[t \mid k]$ in that case. For completeness, Algorithm 4 shows the pseudocode for the baseline attack.
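One possible way to adapt the template attack to the unknown δ i is to sum the bivariate likelihood over all values of the whitened input x = δ i ⊕ m i . The simulated sketch below illustrates this marginalization. It is an assumption-laden illustration and not a reproduction of Equation 4 or Algorithm 4: both leakages are modeled as Hamming weights with Gaussian noise of known standard deviation `SIGMA`, the templates are assumed perfect (no profiling step), and x is taken as uniform. All names are hypothetical.

```python
import math
import random

def aes_sbox():
    """Build the AES S-box (field inversion followed by the affine map)."""
    def gmul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B
            b >>= 1
        return r
    inv = [0] * 256
    for a in range(1, 256):
        inv[a] = next(b for b in range(1, 256) if gmul(a, b) == 1)
    box = []
    for x in inv:
        y = x
        for s in range(1, 5):
            y ^= ((x << s) | (x >> (8 - s))) & 0xFF
        box.append(y ^ 0x63)
    return box

SBOX = aes_sbox()
HW = [bin(v).count("1") for v in range(256)]
SIGMA = 0.2  # assumed noise standard deviation of both leakages

def joint_table(k):
    """Sparse counts of (HW(x), HW(Sbox(x ^ k))) over the 256 values of x."""
    counts = {}
    for x in range(256):
        cell = 9 * HW[x] + HW[SBOX[x ^ k]]
        counts[cell] = counts.get(cell, 0) + 1
    return list(counts.items())

TABLES = [joint_table(k) for k in range(256)]

def weights(t):
    """Unnormalized Gaussian likelihood of leakage t for each Hamming weight."""
    return [math.exp(-0.5 * ((t - h) / SIGMA) ** 2) for h in range(9)]

def log_likelihoods(traces):
    """Sum over traces of log sum_x P[t0 | x] * P[t1 | Sbox(x ^ k)], per key guess."""
    scores = [0.0] * 256
    for t0, t1 in traces:
        w0, w1 = weights(t0), weights(t1)
        joint = [a * b for a in w0 for b in w1]  # index 9 * h0 + h1
        for k in range(256):
            lik = sum(c * joint[cell] for cell, c in TABLES[k])
            scores[k] += math.log(max(lik, 1e-300))
    return scores

rng = random.Random(11)
true_key = 0xA7
traces = []
for _ in range(800):
    x = rng.randrange(256)  # x = delta_i XOR m_i, uniform and unknown to the adversary
    traces.append((HW[x] + rng.gauss(0.0, SIGMA),
                   HW[SBOX[x ^ true_key]] + rng.gauss(0.0, SIGMA)))
scores = log_likelihoods(traces)
best_guess = max(range(256), key=scores.__getitem__)
```

Because x is unknown and uniform in this model, the plaintext byte no longer appears in the likelihood, which reflects why the whitening behaves like a mask for this attack.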

Improved DPA attack
The main drawback of the baseline attack is that it deals with the unknown whitening values as with a kind of light masking scheme. As will be quantified in Section 5.4, this implies a higher data complexity. We now present an improved DPA attack that circumvents this drawback. At a high level, it exploits the possibility to observe the leakage of a single message consisting of many blocks. By exploiting the relations between different l i values, we can mount a first-order DPA that reduces the baseline attack's data complexity at the cost of a more expensive guessing strategy.
More precisely, the main idea of the improved DPA is to extend the guessing space from the sole key byte of k 0 by including the corresponding bytes of l i . Using the rewriting of the block cipher inputs shown in Algorithm 3, we can see that if the l i are known (or guessed), the only remaining unknown is δ 0 . As δ 0 is constant for each block cipher call, we can consider it as a "part of the key" so that a first-order attack can be performed. That is, the search space now becomes one byte of k 0 ⊕ δ 0 = h along with the used bytes of l i . From a guess on these two values, the output of the first round S-box can be guessed and used for a first-order attack.
The attack only requires the leakage t 1 corresponding to the output of the first S-box. However, adding l i to the search space comes at a cost. By rewriting the equations in Algorithm 3, we can see that the first message block manipulates l 0 , which in turn adds one byte to our search space, increasing the time complexity from $2^8$ to $2^{16}$. The next two blocks further manipulate l 1 , which again increases the search space from $2^{16}$ to $2^{17}$. Indeed, since l 1 = double(l 0 ), a 9-bit hypothesis on l 0 is sufficient to represent both corresponding bytes of l 0 and l 1 . We observe that we can gradually increase the number of blocks we can consider by increasing the number of l i values in our search space. In general, adding all l i values from l 0 to l n , n ∈ [0, 120], increases the search space from $2^8$ (as for the baseline attack) to $2^{16+n}$. Finally, since the ith message block processing uses l ntz(i) , adding all l i values from l 0 to l n allows us to consider $2^{n+1} - 1$ message blocks for our attack. Upon success, this first step of the attack will return both the byte of k 0 ⊕ δ 0 and the corresponding bytes of l i . By repeating it for each byte, the attacker is provided with the full values of k 0 ⊕ δ 0 and l 0 . From the knowledge of k 0 ⊕ δ 0 and l 0 , the attacker can fully predict the state of each block cipher call after the key addition. Therefore, she can mount a standard first-order attack against the second AES round, for which the state is known before the addition of the second round key k 1 . Algorithm 5 provides the pseudocode for the whole attack.
Algorithm 5 Improved attack.
update results($h^* \| l_0^*$) using Equation (1)
5: end for
6: end for
7: Rank hypotheses $h^*$, $l_0^*$ independently according to their probabilities
8: Recover both bytes of k 0 ⊕ δ 0 and the corresponding l 0 value
9: Repeat lines 1-8 with adjustment for each byte of k 0 ⊕ δ 0 and l 0
10: Carry out a standard DPA on the second AES round to recover the second round key
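The 9-bit hypothesis on l 0 and the resulting complexities can be made concrete with the sketch below. It assumes the doubling operation of the OCB specification (a one-bit left shift of the big-endian 128-bit block, followed by an XOR of the last byte with 0x87 when the most significant bit was set); how bytes are laid out in a given implementation is an assumption of this example. The helper names are hypothetical.

```python
def double(block: bytes) -> bytes:
    """Doubling in GF(2^128) on a big-endian 16-byte block."""
    v = int.from_bytes(block, "big") << 1
    if v >> 128:
        v = (v & ((1 << 128) - 1)) ^ 0x87
    return v.to_bytes(16, "big")

def extra_bit_for(l: bytes, kappa: int) -> int:
    """The single additional bit of l needed to predict byte kappa of double(l)."""
    return l[(kappa + 1) % 16] >> 7

def byte_of_double(l_byte: int, extra_bit: int, last: bool) -> int:
    """Byte of double(l) from the same byte of l plus one extra bit."""
    shifted = (l_byte << 1) & 0xFF
    if last:
        # Only the last byte is affected by the reduction constant 0x87.
        return shifted ^ (0x87 * extra_bit)
    return shifted | extra_bit

def search_space_bits(n: int) -> int:
    """log2 of the hypothesis space when l_0..l_n are included."""
    return 16 + n

def usable_blocks(n: int) -> int:
    """Number of message blocks whose offsets only involve l_0..l_n."""
    return 2 ** (n + 1) - 1

l0 = bytes(range(0x80, 0x90))  # arbitrary example value with its top bit set
l1 = double(l0)
predicted = bytes(
    byte_of_double(l0[kappa], extra_bit_for(l0, kappa), kappa == 15)
    for kappa in range(16)
)
```

Each further l i adds one more bit to the hypothesis in the same way, which is consistent with the growth of the search space by one bit per additional l i value.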

Experimental results
We implemented the baseline and improved attack using the measurements described in Section 3. For both attacks we used the same profiled DPA using LDA-based dimensionality reduction described in Section 2.2. More precisely, we first identified points-of-interest in the leakage traces by computing the SNR of the target intermediate values over time. We then applied LDA on these points-of-interest to concentrate the leakage into a few dimensions. Our experiments used up to 15 dimensions. We depict in Figure 2 the success rates over the full key of our attacks. They illustrate the gains of the improved attack in terms of data complexity and the weak side-channel security of an implementation of OCB-AES that does not protect its plaintext processing block cipher calls. We note that despite targeting a well-controlled prototype implementation in a permissive (profiled) setting for the adversary, we believe these attacks are sufficient to confirm the need for strong DPA protections for the message processing blocks of OCB. In particular, relaxing the profiled setting is not a significant issue for our improved attack, since adversaries can for example take advantage of an "on-the-fly" regression-based profiling to extract first-order information [DPRS11]. In a similar context as ours, [CR17] also showed how to get rid of the profiling assumption used in [HTM09]. We add an SNR curve illustrating our selection of points-of-interest in Appendix A. It confirms that our implementation leaks as expected for an unprotected implementation (i.e., that our target code did not lead to more leakage than expected for a standard AES implementation).

Discussion
The previous experimental results show that the improved attack indeed brings concrete benefits over the baseline one. In this section, we complement these practical results with a more theoretical discussion. We start by detailing the reasons for the worse data complexity of the baseline attack by discussing its links with attacks against masked implementations. We then provide a model for predicting the complexity of the improved attack which allows us to conclude that the benefits observed in Section 5.3 should remain stable for a wide range of realistic noise levels.

Baseline attack and masked implementations
There is a tight link between our baseline attack and attacks against masked implementations [ISW03,RP10]. In case of (Boolean) 2-share masking, any sensitive variable v is decomposed into two values v 1 and v 2 such that v = v 1 ⊕ v 2 . This decomposition ensures that the distribution of any share is independent of the secret. From a security viewpoint, the number of traces required to recover the secret therefore increases exponentially with the number of shares given sufficiently noisy and independent leakages [PR13,DFS19]. A similar noise amplification is observed in our baseline attack (by simply viewing the whitening values as a secret mask).
In the case of an AES implementation protected with masking, the output of the S-box computation is of the form Sbox (m ⊕ k) ⊕ mask. The attacker is provided with a leakage on that value along with a leakage on the mask. From these two leakages, she can perform a second-order attack (as in our baseline attack). The main difference between our baseline attack and an attack against a Boolean masked implementation lies in the position of the unknown random mask with respect to the S-box computation. That is, in the baseline attack, the unknown δ i is added to the state inside the S-box computation, and not outside: Sbox (m ⊕ k ⊕ δ i ). We analyzed the mutual information between the leakages and the key byte for these two different attacks in a simple simulated setting where we assumed the leakage of all the intermediate values in our implementation to be the Hamming weight of these values with additive Gaussian noise. The corresponding results are shown in Figure 3. The X-axis corresponds to the SNR and measures the level of noise. The Y-axis corresponds to the mutual information MI (L = (T 1 , T 2 ) , K), where L denotes the random variable of the leakages and K represents the key random variable. It indicates the worst-case security level. The blue curve corresponds to an unprotected AES, the red curve to a masked implementation with two shares and the yellow curve corresponds to our baseline attack against OCB-AES.
We first observe that both the attack against the 2-share implementation and the baseline attack against OCB-AES have similar slopes in high-noise regimes. This slope indicates the statistical security order of the implementation, which is two for these two implementations (vs. one for the unprotected implementation). So the whitening values can indeed be viewed as impacting security like a 2-share masking scheme. However, we also observe that the curve of the baseline attack against OCB is shifted vertically compared to the one of the 2-share masked AES. This difference is typical of a masking scheme that is not Boolean, for example Inner Product Masking (IPM). In the context of IPM (see [BFG+17], Figure 3), the algebraic complexity of the encoding makes it slightly harder to exploit leakage even in low-noise regimes. In the OCB-AES case, the unknown δ_i being inside the S-box computation implies a similarly higher algebraic complexity when trying to recombine the leakages in a second-order attack. These results directly explain the higher data complexity of the baseline attack when compared to the improved one. They further show that the (data complexity) gap between both attacks will increase with the noise level, as does the gap between the MI of the unprotected implementation and the one of the second-order attacks in Figure 3. As a result, the last question to study is whether the improved attack can sometimes be concretely limited by its higher time complexity, which we discuss next.

Modeling the improved attack
As stated above, in order to observe up to 2^{n+1} − 1 message blocks, the adversary requires a hypothesis space of size 2^{16+n}. Since lowering the SNR will increase the number of traces required to attack, a natural question is whether this 2^{16+n} time complexity can become a bottleneck when large n values are needed to recover the key. To answer this question, we derived a model for the attack's data complexity, which allows us to determine the time complexity of the improved attack for lower SNRs than observed in our experiments. More precisely, we computed a lower bound on the number of traces needed to reach different success rates given different SNRs, for a simple case with a single point-of-interest in each trace. We note that in this setting, we have 256 univariate Gaussian templates corresponding to the leakage of the S-box output.
A series of works have been published for estimating the success rate of DPA attacks (e.g., [Man04, SPRQ06, Riv08, FLD12, TPR13, LPR+14]). They mostly differ in the distinguishers they consider and the exact assumptions they require. We adopt the notion of the additive distinguisher from [LPR+14] as well as the technique to approximate the score vector (a vector consisting of the distinguisher for each key hypothesis) with a multivariate Gaussian distribution. However, for our analysis, as the number of hypotheses is given by 2^{16+n}, it is difficult to compute the corresponding covariance matrix for the distribution. Instead, we apply a formula from [MOP08] to calculate the lower bound for the number of traces assuming the distinguishers are mutually independent. We defer the detailed derivation of this bound to Appendix B.
The results of this bound computed for different success rate values and SNRs can be found in Figure 4. The dashed black line shows the number of traces (i.e., encryption blocks) that can be observed for each value of n. The theoretical bounds are consistent with our experimental results. Their main conclusion is that for realistic ranges of SNR, the improved attack will indeed be the method of choice, thanks to its better data complexity and practically reachable time complexity. For example, even the (noisy) unprotected AES hardware implementation from the DPAcontest v2 [rg10] has SNR values between 0.0069 and 0.0096 [PHJL17]. According to Figure 4, it would require between 1,000 and 4,000 traces, which can be achieved using n = 11, leading to a time complexity of 2^{27}, which is reasonable even for a standard desktop PC.
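The arithmetic behind this example can be checked with a short sketch. It only restates the two quantities used in the text (2^{n+1} − 1 observable blocks and a hypothesis space of 2^{16+n}); the function names are hypothetical.

```python
def min_n_for_traces(traces):
    """Smallest n such that 2^(n+1) - 1 observable blocks cover `traces`."""
    n = 0
    while 2 ** (n + 1) - 1 < traces:
        n += 1
    return n

def log2_time_complexity(n):
    """Base-2 logarithm of the hypothesis space size 2^(16+n)."""
    return 16 + n

for traces in (1000, 4000):
    n = min_n_for_traces(traces)
    print(traces, n, log2_time_complexity(n))
```

For 4,000 traces this gives n = 11 (since 2^{12} − 1 = 4095) and a hypothesis space of 2^{27}, matching the figures quoted above.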

Protection level 2: secure block cipher calls
Given the negative results of the previous section, we now consider an implementation of OCB-AES where all the block cipher calls are well protected against side-channel attacks. In this case, the adversary is only left with the leakage of the whitening values. We therefore question whether side-channel attacks remain possible. Clearly, key recovery is not an option since OCB's master key is only manipulated by block ciphers. So the question boils down to whether it is possible to recover the whitening values, and to exploit them in concrete attacks. We next show that recovering the whitening values requires mounting an SPA against the (linear) operations used to process these values. For this purpose, we describe a worst-case SPA aiming at recovering l_0. We then show experimentally that this attack is applicable to small implementations. As a proof-of-concept, we illustrate it against a weakened implementation where the whitening computations are performed on 8 bits (while our ARM Cortex-M0 implementation naturally allows 32-bit operations). It allows us to illustrate the existence of a risk and to discuss the challenges raised by such attacks and their interpretation in terms of concrete impact. We finally show that if l_0 can be recovered, then strong attacks against the confidentiality and integrity of OCB-AES are possible.

Worst-case SPA against l_0
In order to recover l_0, we are not able to perform a classical DPA attack. The only available leakages are those of δ_i ← δ_{i−1} ⊕ l_{ntz(i)} and BC input ← δ_i ⊕ m_i. We therefore have to rely on the leakage provided by the loading of l_{ntz(i)} into the registers of the Cortex-M0. Given the function ntz, the value l_0 is used in the whitening of every other plaintext block. Hence, for a plaintext of a given number of blocks i, targeting l_0 in an SPA manner will provide the highest number of attack traces (i.e., i/2). The algorithm for the SPA attack against l_0 is described next.
[Algorithm: SPA against l_0 ... end for; end for; return argmax(results_κ) as the key byte guess on l_0.]
We additionally investigated possible improvements of this attack. A first natural option is to exploit the relation between the different l_i's as described in Algorithm 2. This boils down to performing an SPA targeting l_0, l_1, ..., and recombining their leakages to obtain a more precise guess on l_0. Another option is to leverage belief propagation to exploit additional leakages on the loading of δ_{i−1}, the computation and storing of δ_i, the plaintext and the computation of the input of BC_k [VGS14]. We observed that these options only lead to marginal gains and therefore do not report them. We posit that the first improvement is of limited effectiveness because the number of traces available to estimate l_{i+1} is half the number of traces available to estimate l_i, while the second one is of limited interest because the belief propagation algorithm cannot extract significantly more information than the plain SPA, due to the linear nature of the operations targeted.
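The halving of the number of available traces from one l_i to the next follows directly from the ntz function, as the following minimal sketch illustrates. The helper `l_usage` is a hypothetical name; it simply counts how often each l_{ntz(i)} is loaded when the block index i runs from 1 to a given number of blocks.

```python
from collections import Counter

def ntz(i):
    """Number of trailing zero bits of the block index i (i >= 1)."""
    return (i & -i).bit_length() - 1

def l_usage(num_blocks):
    """Count how many offset updates load each l_j for blocks 1..num_blocks."""
    return Counter(ntz(i) for i in range(1, num_blocks + 1))

usage = l_usage(16)
for j in sorted(usage):
    print(f"l_{j} is loaded {usage[j]} time(s)")
```

With 16 blocks, l_0 is loaded 8 times, l_1 4 times and l_2 twice, so each additional target contributes roughly half as many observations as the previous one.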

Experimental results
We implemented the previous attack against l_0 using the measurements described in Section 3 and the profiled templates with LDA-based dimensionality reduction described in Section 2.2. As in Section 5.3, we preselected points-of-interest based on the SNR and then compressed the informative samples thanks to LDA, keeping up to 15 dimensions. Yet, contrary to the DPA attacks of Section 5.3, we could not recover l_0 in full. The quantiles of the key rank estimated for the full l_0 value thanks to the rank estimation algorithm in [PSG16] are represented in Figure 5. These results lead to several interesting observations that we detail next.
First, it appears that the SPA against the whitening values is significantly more difficult than a similar (single-trace) attack against a block cipher (or hash function) implementation [KPP20, BBC+20]. In particular, and although our attacks decrease the key rank significantly, we could not reach a key rank of one for the performed attacks. It implies that exploiting this leakage concretely will require launching the attacks of Sections 6.3 and 6.4 multiple times (e.g., 2^{10} times for our easiest target l_0). We posit that this increased difficulty is due to the combined effect of a limited number of leaking operations and the fact that they are linear [Pro05].
Second, and as usual for an SPA, the complexity of the attack depends on the value of l_0. That is, contrary to DPA attacks that are equally difficult for all keys, the efficiency of SPA attacks depends on the target intermediate value [MOS11]. This is reflected in Figure 5 by the quantiles of the key ranks (which are here computed over 20 independent experiments).
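A simple idealization helps to see why the target value matters. Assume (this is not the template attack used in our experiments) that a noise-free SPA reveals exactly the Hamming weight of each byte of l_0 and nothing more. The number of candidates left for a byte of weight w is then the binomial coefficient C(8, w), and the number of candidates for the full value is the product over its 16 bytes. The function names below are hypothetical.

```python
import math

def candidates_per_weight(bits=8):
    """Number of byte values sharing each Hamming weight."""
    return {w: math.comb(bits, w) for w in range(bits + 1)}

def remaining_candidates(byte_weights):
    """Candidates left for a multi-byte value once every byte weight is known."""
    total = 1
    for w in byte_weights:
        total *= math.comb(8, w)
    return total

print(candidates_per_weight())
print(math.log2(remaining_candidates([4] * 16)))  # worst case, about 98 bits
print(remaining_candidates([0] * 16))             # best case, a single value
```

Under this assumption, a value whose bytes all have weight 0 or 8 is identified uniquely, while a value whose bytes all have weight 4 leaves 70^16 candidates, which is consistent with a wide spread of key ranks across targets.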
Third, and as usual for SPA attacks as well, their worst-case evaluation is more sensitive than the one of a DPA attack. The main reason is that the efficiency of an SPA mostly depends on the (noise-free) deterministic part of the leakage function (while the efficiency of a DPA depends on the SNR or MI). In other words, it depends only on the side-channel signal, which may vary significantly with the implementations and measurement setups. So as discussed in [BMPS21], it is in general a good idea to parameterize implementations conservatively against such attacks.
Overall, the results in this section are therefore conceptually different from the DPA ones in Section 5.3. That is, while these DPA results confirmed the possibility of quite realistic attacks directly justifying a need for strong countermeasures, the SPA results in this section are more of a proof-of-concept showing the existence of a risk. It is an interesting open problem to further investigate how to improve them against larger implementations. For example, applying exactly the strategy of this section to a 16-bit implementation (i.e., limiting the XORs of our ARM Cortex-M0 implementation to 16 bits rather than 8 for Figure 5) already turned out to be much more challenging (i.e., we could not reduce the rank below 2^{80}). Independently of this information extraction question, the next two sections show that recovering l_0 directly leads to concrete attacks against the confidentiality and integrity of OCB. So we first focus on the description of these attacks, before discussing the comparative relevance and impact of these findings in conclusion.

Attack against the confidentiality of OCB
The attacks in this (and the next) section assume that the adversary recovers l_0 via side-channels but, as just mentioned, can be launched for several l_0 candidates in case it is not recovered in full. After the recovery of l_0, the adversary does not need side-channel capabilities anymore: these attacks succeed without any online leakage during the encryption of the challenge plaintext.
In order to simplify the treatment of the attacks, we introduce some additional notations. First, we introduce the set I(j): it is the set of the indexes of the l_i values used in the computation of δ_j. Similarly, we define the set I(j, j') as the set of the indexes of the l_i values used in the computation of either δ_j or δ_{j'} (but not both). That is:
I(j, j') := (I(j) ∪ I(j')) \ (I(j) ∩ I(j')).
Finally, we define l^{(j)} := ⊕_{i∈I(j)} l_i and l^{(j,j')} := ⊕_{i∈I(j,j')} l_i. Thus, we have:
δ_j = δ_0 ⊕ l^{(j)} and δ_j ⊕ δ_{j'} = l^{(j,j')}.
Our attack against the confidentiality of OCB shows that it is possible to distinguish the encryption of a message m from the encryption of another message provided that m has a certain structure. Roughly speaking, the adversary has to distinguish the output of her oracle implemented with OCB_k from a random one. For this output, called the challenge one, she does not receive any leakage. On the other hand, she can query an oracle implemented with OCB_k, receiving the outputs and their leakages. So formally, we want to show that the following advantage is not negligible:
| Pr[A^{OCBL_k, OCB_k} ⇒ 1] − Pr[A^{OCBL_k, $} ⇒ 1] |,
where OCBL_k denotes the fact that the adversary has access to the leakage of OCB_k.
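The index sets can be computed mechanically from the offset update δ_i = δ_{i−1} ⊕ l_{ntz(i)}. The sketch below reads "used in the computation of δ_j" as "appearing an odd number of times", so that values XORed twice cancel; this reading, as well as the function names, are assumptions made for illustration.

```python
def ntz(i):
    """Number of trailing zero bits of i (i >= 1)."""
    return (i & -i).bit_length() - 1

def index_set(j):
    """I(j): indexes of the l_i values that remain in delta_j XOR delta_0."""
    indexes = set()
    for i in range(1, j + 1):
        indexes ^= {ntz(i)}  # toggling removes values that are XORed twice
    return indexes

def index_set_pair(j, jp):
    """I(j, j'): indexes appearing in exactly one of I(j) and I(j')."""
    return index_set(j) ^ index_set(jp)

print(index_set(1), index_set(2), index_set(3), index_set_pair(1, 2))
```

With this reading, I(1) = {0}, I(2) = {0, 1} and I(1, 2) = {1}, so that l^{(1,2)} = l_1. More generally, I(j) corresponds to the bits set in the Gray code j ⊕ (j >> 1).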
The high-level idea of the attack is depicted in Figure 6 (a). It aims at forcing the input of BC_k to be the same in the computation of two different ciphertext blocks. This can be done, for example, by choosing m_2 := m_1 ⊕ l^{(1)} ⊕ l^{(2)} = m_1 ⊕ l^{(1,2)}. Indeed, c_1 = BC_k(m_1 ⊕ l^{(1)} ⊕ δ_0) ⊕ l^{(1)} ⊕ δ_0 while c_2 = BC_k(m_2 ⊕ l^{(2)} ⊕ δ_0) ⊕ l^{(2)} ⊕ δ_0. Consequently, if m_2 = m_1 ⊕ l^{(1,2)}, then c_2 = c_1 ⊕ l^{(1,2)}. The attack is specified in Algorithm 7. It can be generalized to any ciphertext block position. Note that this attack only applies if the message is composed of more than two message blocks (since the last ciphertext block is computed in a different way). Note also that there exist several definitions of confidentiality in the presence of leakage, for example the one of Barwell et al. [BMOS17] and the one of Guo et al. [GPPS19]. Our attack breaks both definitions.
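The relation c_2 = c_1 ⊕ l^{(1,2)} can be checked on a toy model. The sketch below is not an OCB implementation: `toy_bc` is a hash-based stand-in for BC_k (forward direction only), δ_0 and l_0 are passed directly as hypothetical parameters, and the l_i values are derived by doubling in GF(2^128) as in the OCB3 specification. All names are introduced for illustration.

```python
import hashlib

BLOCK = 16  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def double(block):
    """Multiply by x in GF(2^128), reducing by x^128 + x^7 + x^2 + x + 1."""
    v = int.from_bytes(block, "big") << 1
    if v >> 128:
        v = (v & ((1 << 128) - 1)) ^ 0x87
    return v.to_bytes(BLOCK, "big")

def ntz(i):
    return (i & -i).bit_length() - 1

def toy_bc(key, block):
    """Stand-in for BC_k; NOT a real block cipher (assumption for the demo)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def encrypt_blocks(key, delta0, l0, blocks):
    """c_i = BC_k(m_i XOR delta_i) XOR delta_i with delta_i = delta_{i-1} XOR l_ntz(i)."""
    ls, delta, out = [l0], delta0, []
    for i, m in enumerate(blocks, start=1):
        while len(ls) <= ntz(i):
            ls.append(double(ls[-1]))
        delta = xor(delta, ls[ntz(i)])
        out.append(xor(toy_bc(key, xor(m, delta)), delta))
    return out

def looks_like_ocb(ciphertext, l0):
    """Distinguisher: with l_0 known, test whether c_2 = c_1 XOR l^(1,2) = c_1 XOR l_1."""
    return ciphertext[1] == xor(ciphertext[0], double(l0))

key = b"k" * BLOCK
delta0 = bytes(range(BLOCK))
l0 = bytes.fromhex("80" + "00" * 15)
m1, m3 = b"A" * BLOCK, b"C" * BLOCK
m2 = xor(m1, double(l0))  # m_2 := m_1 XOR l^(1,2)
ciphertext = encrypt_blocks(key, delta0, l0, [m1, m2, m3])
print(looks_like_ocb(ciphertext, l0))
```

The distinguisher never uses the key: it only needs l_0 (from which l_1 follows by doubling) and the challenge ciphertext, in line with the absence of leakage on the challenge query.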

Attack against the integrity of OCB
The goal of this second attack, illustrated in Figure 6 (b), is to provide a forgery (nonce*, c*) from a ciphertext c = (C, τ), obtained as the answer of an encryption query on input (nonce*, m), with the knowledge of the l_i values (i.e., formally, to break the ciphertext integrity with leakage definition of [GPPS19]). Its high-level idea is to find a way to force the correct tag τ* to be the same as τ. We start by observing that the tag depends only on the checksum of a message and on its length. Thus, if |m| = |m*| and their checksums are the same, their tags are the same if the nonce is the same. To force the checksums to be the same, we use the same idea as for the previous attack against the confidentiality of OCB. Indeed, if m = (m_1, m_2, m_3) and (c_1, c_2, c_3, τ) is its encryption, we know that if we replace c_2 with c*_2 = c_1 ⊕ l^{(1,2)}, we have encrypted m*_2 = m_1 ⊕ l^{(1,2)} instead of m_2. Similarly, if we replace c_1 with c*_1 = c_2 ⊕ l^{(2,1)}, we have encrypted m*_1 = m_2 ⊕ l^{(2,1)}. Finally, we observe that the checksum of the plaintext obtained in the decryption of C = (c_1, c_2, c_3) is the same as the checksum of the plaintext obtained in the decryption of C* = (c*_1, c*_2, c_3), since l^{(2,1)} = l^{(1,2)} and therefore m*_1 ⊕ m*_2 = m_2 ⊕ l^{(2,1)} ⊕ m_1 ⊕ l^{(1,2)} = m_1 ⊕ m_2. This proves that (nonce*, c*) with c* = (C*, τ) is a valid forgery. The attack is specified in Algorithm 8. It can be easily generalized to longer ciphertexts.
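The forgery can be replayed on a toy model as well. The sketch below makes several simplifying assumptions that are not part of OCB: BC_k is replaced by a 4-round Feistel permutation built from SHA-256, the tag is computed as BC_k(checksum ⊕ δ_3) without associated data, and δ_0 and l_0 are given directly. All function names are hypothetical.

```python
import hashlib

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def double(block):
    """Multiply by x in GF(2^128), reducing by x^128 + x^7 + x^2 + x + 1."""
    v = int.from_bytes(block, "big") << 1
    if v >> 128:
        v = (v & ((1 << 128) - 1)) ^ 0x87
    return v.to_bytes(BLOCK, "big")

def ntz(i):
    return (i & -i).bit_length() - 1

def _f(key, rnd, half):
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def bc_enc(key, block):
    """Toy invertible stand-in for BC_k (4-round Feistel, illustration only)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def bc_dec(key, block):
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def offsets(delta0, l0, count):
    ls, delta, out = [l0], delta0, []
    for i in range(1, count + 1):
        while len(ls) <= ntz(i):
            ls.append(double(ls[-1]))
        delta = xor(delta, ls[ntz(i)])
        out.append(delta)
    return out

def tag(key, delta_last, blocks):
    checksum = bytes(BLOCK)
    for m in blocks:
        checksum = xor(checksum, m)
    return bc_enc(key, xor(checksum, delta_last))  # simplified tag (assumption)

def encrypt(key, delta0, l0, blocks):
    ds = offsets(delta0, l0, len(blocks))
    cs = [xor(bc_enc(key, xor(m, d)), d) for m, d in zip(blocks, ds)]
    return cs, tag(key, ds[-1], blocks)

def decrypt(key, delta0, l0, cs, t):
    ds = offsets(delta0, l0, len(cs))
    ms = [xor(bc_dec(key, xor(c, d)), d) for c, d in zip(cs, ds)]
    return ms if tag(key, ds[-1], ms) == t else None  # None means rejected

def forge(cs, l0):
    """Swap the first two blocks, each shifted by l^(1,2) = l_1; reuse the tag."""
    l12 = double(l0)
    return [xor(cs[1], l12), xor(cs[0], l12), cs[2]]

key, delta0 = b"k" * BLOCK, bytes(range(BLOCK))
l0 = bytes.fromhex("80" + "00" * 15)
message = [b"A" * BLOCK, b"B" * BLOCK, b"C" * BLOCK]
cs, t = encrypt(key, delta0, l0, message)
forged_plaintext = decrypt(key, delta0, l0, forge(cs, l0), t)
print(forged_plaintext is not None and forged_plaintext != message)
```

The forged ciphertext is accepted with the original tag although it decrypts to a different message, since the checksum is unchanged.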

Conclusions
In this paper, we studied the resistance of the OCB mode of operation against side-channel attacks. While it was already observed that OCB does not inherently provide side-channel resistance as it uses the same long-term key in all its block cipher calls, we aimed at providing a finer-grain analysis to balance that statement. That is, do all operations in OCB need to be protected and if so, do they require the same level of protection? We answered that question by exhibiting several attacks against OCB for different levels of protection. First, we showed that trivial schoolbook DPA attacks on the underlying block cipher can be mounted against the initialization, the processing of a final incomplete message block, the processing of associated data and the decryption of OCB. Next, assuming a first level of protection preventing the above trivial attacks, we investigated the case where only leakages from the plaintext encryptions are available. In that context, we first showed that one can apply a second-order attack against the secret whitening, as one would against a 2-share masked implementation. We also presented an improved (first-order) DPA, decreasing the data complexity of the second-order attack at the cost of a higher (time) key guessing complexity. Experiments and theoretical analysis then confirmed the interest of the improved attack for realistic noise levels. Finally, we considered the case where all block cipher calls are strongly protected against side-channel leakage. We showed that when the whitening values can be recovered by side-channel analysis, it is possible to mount simple attacks that break the integrity and the confidentiality of OCB. Yet, the recovery of these whitening values requires mounting an SPA, which is more challenging than the DPAs against the block cipher calls, especially for large computing architectures.
From these experiments, we conclude that OCB requires the strongest protections for its initialization, the processing of the associated data and the last message block if incomplete. The processing of the message blocks might benefit from a slightly lighter protection (e.g., a masked implementation with one less share). As for the processing of the whitening values, it leads to different intuitions. On the one hand, it is significantly more difficult to target and therefore may require only lighter protections. On the other hand, the worst-case SPAs that can target them are also more difficult to evaluate since they are quite implementation- and setup-dependent. Given that protecting this (linear) part of OCB thanks to masking is also easier, and since a leakage on them can lead to strong attacks against both the confidentiality and the integrity of the mode, we conclude that it may be a good practice to protect them conservatively as well.

Appendix B
In this section we present the detailed calculations for the bound presented in Figure 4. For a fixed n, let ℓ = 2^{n+1} − 1 denote the number of blocks an attacker can observe, which is the maximum number of traces available for a given n. Let H denote the set of all possible keys. We consider one key as the concatenation of the last byte of δ_0 ⊕ k_0 and the last 8 + n bits of l_0. The attack for the other bits follows a similar analysis. Let h_j denote a single key, with h_c being the correct key. Recall that |H| = 2^{16+n}. Let T_{x_i,h_j} denote the random variable corresponding to the leakage at the encryption of block i with plaintext input m_i and key h_j. For simplicity, we use x_i to denote the last byte of m_i, i.e., m_i(16). An observed leakage, denoted by t_i, is a realization of the random variable T_{x_i,h_c}. Let t_a := [t_1, t_2, ..., t_ℓ] and m_a := [x_1, x_2, ..., x_ℓ] denote the vector of leakages and the vector of corresponding plaintext bytes, respectively.
To carry out the template attack, we calculate the probability P[h_j | t_a, m_a] for each key hypothesis h_j. The guessed key value is then given by the hypothesis achieving the maximum probability. Following equation (1):
P[h_j | t_a, m_a] = P[t_a | h_j, m_a] · P[h_j] / Σ_{h∈H} P[t_a | h, m_a] · P[h].
Note that the denominator is a constant for any hypothesis h_j. Hence, finding h_j that gives the maximum value to P[h_j | t_a, m_a] is equivalent to finding h_j that achieves the maximum of