A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps

. We study the success probabilities of two variants of Correlation Power Analysis (CPA) to retrieve multiple secret bits. The target is a permutation-based symmetric cryptographic construction with a quadratic map as an S-box. More precisely, we focus on the nonlinear mapping χ used in the Xoodoo and Keccak-p permutations, which is aﬃne equivalent to the nonlinear mapping of Ascon . We thus consider three-bit and ﬁve-bit S-boxes. Our leakage model is the diﬀerence in power consumption of register cells before and after one round. It reﬂects that, in hardware, the aforesaid cryptographic algorithms are usually implemented by deploying a round-based architecture. The power consumption diﬀerence depends on whether the targeted bits in the register ﬂip. In particular, we describe two attacks based on the CPA methodology. First, we start with a standard CPA approach, for which, to the best of our knowledge, we are the ﬁrst to point out the diﬀerences between attacking a three-bit and a ﬁve-bit S-box. For CPA, the highest correlation coeﬃcient is the most likely secret hypothesis. We improve on this with our novel combined Correlation Power Analysis (combined CPA), or Snake attack, which uses quadratic map cryptanalysis (e.g., of the function χ ) to achieve a better attack in terms of the number of traces required and computational complexity. For the Snake attack, sums of absolute or squared values of correlation coeﬃcients are used to determine the most likely guess. As a result, we eﬀectively show that our proposed Snake attack can recover the secret in n 2 2 (or n 2 2 + n ) intermediate results, compared to 2 2 n for the CPA, where n is the length of the targeted S-Box. We collected power measurements from a hardware setup to demonstrate practical attack success probabilities according to the rank of the correct secret hypothesis both for Xoodoo and Keccak-p . In addition, we explain our success probabilities thanks to the Henery model developed for horse races. In short, after performing 16 , 896 attacks, the Snake attack or combined CPA on Xoodoo consistently recovers the correct secret bits with each attack using 43 , 860 traces on average and with only 12 correlations, compared to 61 , 380 traces for the standard CPA attack with 64 correlations. The Snake attack requires about one-ﬁfth as many correlation values as the standard CPA. For Keccak-p , the diﬀerence is more drastic: the Snake attack recovers the secret bits invariably after 771 , 600 traces with just 20 correlations. In contrast, the standard CPA attack operates on 1 , 024 correlations (about ﬁfty times more than the Snake attack) and requires 1 , 223 , 400 traces.


Introduction
Side-channel analysis enables the exploitation of weaknesses in the physical implementation of cryptographic primitives that may leak secret data.Indeed, the physical behavior of an electronic device, e.g., its power consumption or electromagnetic radiation, carries information about the processed data and internal operations.An adversary that can measure such a physical leakage presents a significant threat to devices that manipulate sensitive information.More precisely, by analyzing the power consumption, they 1 can retrieve parts of the secret with methods like Differential Power Analysis (DPA) [KJJ99] or Correlation Power Analysis (CPA) [BCO04].If countermeasures are insufficient, an adversary could practically decrypt messages or forge an authentication code, although the primitive is secure.The cryptographic community, therefore, aims to find sound countermeasures [BDPVA10, BNN + 15, ABP + 18, PAM19, ZSS + 21].
In this work, we investigate, with CPA, the security of a device that implements a permutation-based cryptographic algorithm with an S-box based on a quadratic map, i.e., has algebraic degree two.An S-box is the nonlinear layer of a round function in a symmetric cryptographic algorithm.We chose to study quadratic maps partly because research work aimed at classifying n-bit S-boxes in affine equivalence classes to find optimal S-boxes found an interest in quadratic maps (n ∈ N, n ∈ [1, 6]) [BBS17, DC07, LP07, ZBRL15, MB19].Moreover, they are reasonably simple to manipulate for our proposed CPA attacks, which exploit the algebraic expression of the S-box.
Our electronic device has a register that stores intermediate data that is updated after each of the aforementioned rounds.That is because permutation-based algorithms are implemented with a round-based architecture.We aim to quantify the gain in attack success probability between exploiting the leakage of multiple storage cells of a register state at once and the combination of the individual leakage of those cells for different S-box sizes.The first attack corresponds to the standard CPA, and the second is our newly proposed combined CPA strategy.The aim is to explore the difference in success probability and computational complexity between those two attacks, for small S-boxes, with simple assumptions.
We focus on the function χ, as it was shown in [Dae95] to be the quadratic function with the most interesting properties to provide resistance against differential and linear cryptanalysis.The function χ is used in the Keccak-p [BDPV11] and Xoodoo [DHVV18] cryptographic permutations, which are at the core of cryptographic algorithms such as Keccak/SHA-3 [Dwo15], Keyak [BDP + 16b], Ketje [BDP + 16a], Xoofff [DHVV18] or Xoodyak [DHP + 20] NIST Lightweight Cryptography competition finalist.The function operates on a group of five bits in Keccak-p and on one of three bits in Xoodoo.We apply thus our attacks on five-bit and three-bit S-boxes, which are instances of the mapping χ.
Contribution.Earlier work by Bertoni et al. [BDD + 12] considered the security of the function χ for a single register cell against DPA using simulations.They studied the power consumed to retrieve two bits of the secret after one round of the Keccak-p permutation.Samwel and Daemen [SD17] extended this work by targeting the five secret bits of a Keyak (Lake Keyak) S-box with DPA.
• We extend and formalize the previous analyses by proposing a methodology to target n register bits, which are iteratively updated by a cryptographic round function.For that purpose, we improve the leakage model from [BDD + 12, SD17] in Section 3 to adapt it to the Gaussian additive noise model [MOS11] for our round-based hardware architecture (Section 2).Additionally, Samwel and Daemen [SD17] proposed to partition the power consumption values according to the values of the messages in order to exploit the leakages of several register bits: we formalize and generalize this idea to any S-box size that uses a quadratic map.
• In Section 4, the standard CPA attack is detailed for the first time for the quadratic map χ, to the very best of our knowledge.We further present the combined CPA strategy, which aggregates individual leakage information from multiple register bits by exploiting the cryptanalysis of the function χ.We nickname it the 'Snake attack' because of the visual representation of the combination of individual leakage bit results to retrieve several secret bits.
• In Section 5, we present a method for determining the rank of the solution secret hypothesis for an attack strategy, and derive from it the probability of success.
• We provide practical success probabilities in Section 6 for our hardware setup.We argue that our results follow the probability Henery model [Hen81], which was originally proposed to model horse races.
In conclusion, for our three-bit and five-bit S-box, the combined CPA, or Snake, attack has a higher probability of success than CPA for the same number of power traces, i.e., power consumption measurements, while, most importantly, having fewer correlation results to manipulate.Indeed, after performing 16,896 attacks, combined CPA on a three-bit S-box unfailingly recovers the secret solution bits starting from 43,860 power traces with simply 12 correlation values, compared with 64 correlation results and 61,380 power traces for CPA.The difference is even more noticeable for a five-bit S-box, since combined CPA consistently determines the secret bits from 20 correlation results after 771,600 power traces.In contrast, the standard CPA attack only does it after 1,223,400 power traces and requires 1,024 correlation results.In a nutshell, the Snake attack or combined CPA has the advantage of manipulating fewer intermediate steps at once than the classic CPA to ease the effort of side-channel analysts.

Hardware Architecture
We consider a hardware implementation of a permutation using a quadratic map as an S-Box for our CPA attacks.In the remainder of this paper, we focus on the wellknown function χ, which is investigated for its interesting differential and correlation properties [Dae95, MPLJ19, DMM21], or more generally for its state diagram [SD23], to find potential distinguishers for cryptanalysts.Multiple cryptographic schemes are based on permutations like Keccak-p [BDPV11,Dwo15] or Xoodoo [DHVV18], which use the function χ to have a simple nonlinear layer.The previous permutations are well-studied in the cryptographic community [LSL21, LIMY20, LQT19, GPT21, EME22].

Permutation
We study the function χ in a permutation-based cryptographic scheme, namely the sponge construction [BDPV08].This construction is an iterative construction that applies a permutation f several times.The permutation consists in itself of a round function that is applied iteratively to an array of bits or state.The round function contains a linear layer λ and a nonlinear one.The latter comprises mappings χ n in parallel on arrays of n bits.
Definition 1 (Function χ n ).Let n be an odd integer.Let i ∈ Z/nZ be an index to designate the i-th bit in a binary sequence of length n.The function χ n : restricted to a n-bit array and applied on the bit x i is The length n of a sequence to which χ n is applied, depends on the permutation used in the cryptographic scheme.We use the generic term sequence to refer to the said n bits to be permutation-agnostic.
Xoodoo.The permutation Xoodoo [DHVV18] applies a round function to a 384bit state consisting of of 128 columns of three bits.The round function of Xoodoo is decomposed in two linear mappings θ and ρ west , followed by the addition of a round constant ι, then the application of the nonlinear mapping χ, and another linear mapping ρ east .The nonlinear mapping χ is applied to the individual bits x i of a column.Therefore, in our notation, it corresponds to applying the function χ 3 on a binary sequence of three bits.
Keccak-p.The permutation Keccak-p, as specified by the NIST SHA-3 standard [Dwo15], is defined over a state of 5 • 5 • 2 l bits with l ∈ N, l ∈ [0, 6], with a number of rounds being equal to 12 + 2l.The state is organized in an array of five rows times five columns times 2 l lanes.The round function of Keccak-p is composed of five mappings: three linear mappings θ, ρ, and π are first applied, followed by the nonlinear mapping χ and the round constant addition ι.The nonlinear mapping χ is applied on the bits x i of a row, which corresponds to the application of χ 5 on a binary sequence of five bits in our notation.
Henceforth, we consider five-bit and three-bit sequences to focus on χ 3 and χ 5 .As an additional remark, we focus on the function χ in this paper, not only for its simplicity and its interesting properties for symmetric cryptography but also because we know that all invertible quadratic functions operating on three bits are affine equivalent2 to χ 3 if all their three functions restricted to one bit are quadratic.Therefore, this nonlinear function is the best candidate to test and present our CPA attacks on small S-boxes.On a side note, according to De Cannière

Round-Based Hardware Architecture
We consider an electronic device with a round-based cryptographic algorithm implemented without protection.Its hardware architecture comprises a register composed of storage cells.The round logic performs the round function on the register state and updates it with the output of the round function, as depicted in Figure 1.This circuit computes an r-round permutation in r cycles.We keep the architecture as simple as possible to better understand how to optimize standard power attacks in such an elementary setup.Before performing the first round, it is assumed that the register contains a secret K that results from earlier processing, e.g., the absorption of a key.Then, a message M is added to the register state once before applying the first round function.The message M has the same length as the secret K.The sum of the secret and the message M ⊕ K is the input of the round logic.The linear layer λ, which we do not detail for the sake of generality, is applied to this value to generate the input to the nonlinear layer.We then define two intermediate variables µ and κ in order to set-up our attack.
Definition 2 (µ and κ).Let µ and κ be, respectively, the n bits of λ(M ) and λ(K) that are input to the targeted sequence χ n .
The input of the considered sequence χ n is µ ⊕ κ with κ constant to be more specific.The register stores the output of the round logic that equals χ n (µ ⊕ κ) for one sequence χ n after the first round.The presented architecture is an implementation without round constants for the sake of simplicity.

Correlation Power Analysis
Cryptographic operations are performed in electronic devices where data is stored in registers that are a collection of flip-flops.A flip-flop can be thought of as a cell that stores a binary digit.When the state of the flip-flop is changed, i.e., the stored bit is switched, a certain amount of power is applied to perform the change.In contrast, switching power is not required if the state is maintained.
The power consumption of an electronic device is a function of the binary data processed by the circuit, but in a complex manner.It has noisy behavior, partly due to the difficulty of accurately modeling the power consumption and inherent non-deterministic effects.However, the correlation between the power consumption and its switching activity can be measured and exploited to reveal some cryptographic secrets using a statistical attack, such as DPA [KJJ99] or CPA [BCO04].For those attacks to work, the measured power consumption must depend on known data and constant secret bits.We consider as an example the circuit presented in Section 2.2 and an attacker that targets one flip-flop that stores a single bit of the state.This bit is updated after every round.The adversary performs a CPA attack to exploit the leakage due to the switching of the said flip-flop after one round.The stored bit is denoted by b r , with r the round index.For the cipher, we can express the value of b r after one round as a simple algebraic expression that depends on a known message and a constant secret at the input.Then, the adversary measures the power consumption for many cryptographic computations for the same secret but with different, known messages.A sequence of instantaneous power consumption measurements for a single message is commonly called a power trace.The adversary then computes b 1 ⊕ b 0 for each power trace with the known input message and with a hypothesis on the secret bits.Each of these hypotheses now has an array of bits representing whether the target bit changed or not.If the secret hypothesis is correct, the value b 1 ⊕ b 0 is also correct, and the traces where the bit flips (i.e., the sum results in 1) consume on average more power than the ones for which the bit does not flip (i.e., the sum is 0).Otherwise, for the incorrect hypotheses, the value b 1 ⊕ b 0 usually does not correlate well with the power consumption, depending on how much nonlinearity is involved in the round function.In other words, the most likely secret hypothesis is the one that maximizes the correlation between the array of hypothetical bit (non-)flips and the power consumption.
The presented attack can also be generalized to target multiple bits at once.Intuitively, the more bits flip at once, the more power an electronic device consumes.Let b i r be the i-th bit in the r-th round.Then, for targeting bits at once, the adversary can use j=1 b j 1 ⊕ b j 0 (for each hypothesis of the secret) to correlate with the power consumption.Again, the hypothesis that maximizes the correlation is deemed the correct one.We aim to understand the CPA attack strategy that uses fewer power traces and computational complexity while targeting a quadratic map operating on a few bits to retrieve multiple secret bits and making simple hypotheses.More precisely, what is the gain in success probability between extracting information from the power consumption of multiple register cells and combining those said cells individually for quadratic maps operating on different sequence sizes?To this end, we describe a leakage model based on [BDD + 12].

Activity Function and Leakage Model
We assume the adversary knows the messages M j that correspond to j specific traces and can compute µ for each message, which is the bits of λ(M ) that are input to a χ n sequence (Definition 2).The attacker exploits the power consumption of m storage cells, with m ≤ n, between the initial register state at time t 0 and the state after the first round at time t 1 where n is the length in bits of an S-box, i.e., binary input sequence.In this study, m is a parameter to determine whether we target the signal power consumption of individual bits (m = 1), or all the bits of the first sequence χ n at a time (m = n with n ∈ {3, 5} in this paper).
Definition 3 (K ).Let K be the m targeted bits of the register at time t 0 that stores the secret K.We denote by K i the i-th bit of K .
Ultimately, the adversary aims to recover K , but depending on the attack strategy, they might only be able to recover κ at first.We now consider that the adversary aims to recover the secret (K , κ).The hardware architecture and attack parameters are depicted in Figure 2, where the targeted secret bits correspond to the first S-box in this example.Note that this is a widespread architecture for implementations of Xoodoo, and Keccak-p in Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs).We use a simple leakage model for the m register cells [BDD + 12], sometimes denoted as the Hamming distance power consumption model [PSDQ05,BCO04].It counts the number of bit flips between time t 0 and t 1 .The power consumed to flip a bit value, i.e., to change a 0 into a 1 and vice versa, is assumed to be higher than keeping the same value over time.It also supposes that all cells contribute equally to the power consumption and that the transitions 0 → 1 and 1 → 0 lead to the same value.Under these assumptions, we define the activity function d i (K i , κ, µ) ∈ F 2 for one bit in Definition 4. If the bit value flips, it equals 1, otherwise, it equals 0. To compute the Hamming distance between the first m bits of the state at time t 0 and t 1 , we sum the outputs of the activity function of those bits in Definition 5. We thus obtain the power consumption of the first m bits; it is our signal.It should be noted that the round constants, which are not described here for the sake of simplicity, shall be included in the activity function in practice.
Definition 4 (Activity function).For i ∈ Z/nZ, let K i be the i-th bit of the secret register state.Let d i ∈ F 2 be the expression of the activity of the i-th bit after the first round: Definition 5 (Signal power consumption).The signal power consumption S ∈ R depends on the m bits at time t 0 and the intermediate variables µ and κ, In Definition 5, if a bit x i flips, the corresponding element of the sum is −1, and otherwise +1.By doing so, the model is balanced.Summing the elements (−1) di(K i ,κ,µ) m times encodes the number of bit flips of the targeted m bits stored in the register stage.This definition of signal power consumption makes it possible to recover our secret in our measurements from the hardware setup, as it heavily depends on the secret values.If the adversary wants to target one bit of the secret, they choose m = 1 to capture the signal value of one bit; otherwise they take m = n to recover all the secret bits K , which are inputs to the targeted S-box.
As we focus on one sequence χ n at a time, we consider that the power consumption of the remaining sequences of χ n contributes to the overall noise in our model.We define R j as the noise power consumption of a trace j from the non-targeted sequences χ n and other parts of the electronic device, in Definition 6. Definition 6 (Noise power consumption).We assume the noise power consumption is a random variable drawn from a zero-mean normal distribution with variance σ 2 for a trace j.It is independent of the intermediate variables µ and κ: R j ∼ N (0, σ 2 ) .
The power consumption P j of a single trace is described with the Gaussian additive noise model [MOS11] in Definition 7. It is characterized by a random variable drawn from a normal distribution with variance σ 2 and mean S = 0. Definition 7 (Power consumption).We define the power consumption for a secret (K , κ) to be the sum of the signal part and the noise part for a trace j and a message M j , such as The above definition makes it possible to model real experiments in a simple manner accurately.At last, we take advantage of the simplicity of the presented leakage model to propose a novel attack strategy to recover some secret bits by manipulating the activity function underlying the modeled power consumption.By doing so, we take advantage of the cryptanalysis of the S-box.

Pearson Correlation Coefficient and Signal Reference Values
In a standard CPA attack, the adversary first feeds an oracle (i.e., our activity function) with some knowledge (µ) and hypotheses about the secret.The oracle then estimates of the power consumption [BCO04], as explained earlier, and creates a set of power traces P containing power traces P j .Following this, the adversary computes the Pearson correlation coefficient between the a set of power traces P and the oracle output for each hypothesis.The more power traces the adversary has, the higher their chances are to recover the secret solution, thanks to the Gaussian additive noise model and Proposition 1.More precisely, the average power trace of a set P has a mean S and a variance that decreases proportionally with the number of measurements.As a result, the amount of power traces collected by the adversary is positively correlated with the probability of determining whether a bit has flipped for fixed (K , κ, µ).
Proposition 1 (Average of Gaussian random variables, see [LL02, Section 2.4]).The average of N independent Gaussian random variables with mean S and variance σ 2 is also a Gaussian random variable with mean S and variance The knowledge of the oracle, or activity function, is stored in tuples of signal reference values S ref , which are determined solely by the activity of (K , κ, µ).More specifically, the signal reference values result from the computation of the signal power consumption (Definition 5) for all possible values of (K , κ, µ) given the m targeted register bits and the length n of the S-box.They are first sorted in tuples depending on the value K .Then, each tuple contains sub-tuples indexed by the corresponding value of κ, which themselves list signal reference values indexed by the corresponding value of the message µ.Suppose the adversary targets all the m bits of the register (m = n).In that case, there are 3 • 2 n evaluations of the signal power consumption from Definition 5 for all the possible values of K , κ, and µ.Otherwise, if the adversary targets one bit of the register (m = 1), then the signal power consumption is evaluated for the two possible values of K and 2 3 values of κ and µ due to the algebraic expression of the activity function (Definition 4), which depends on one bit of K , and three bits of κ and µ.
Definition 8 (Signal reference values S ref ).The signal power consumption values are stored in tuples of tuples S ref (K ), one for each of the 2 m possibilities of K , which contains tuples S ref K (κ), one for each value of κ , which consists of all signal power consumption evaluations for the message µ with fixed (K , κ).More formally, we have: which, for a fixed value K , consists of: Then, for a fixed secret value (K , κ), we have: where S(K , κ, µ) is the signal power consumption.
The adversary then computes the Pearson correlation coefficient between those reference values S ref K (κ) and the power consumption to find the most likely secret value of K and κ, as presented in the following Definition.

Definition 9 (Pearson correlation coefficient). The Pearson correlation coefficient ρ of two random variables P and S
with E the expectation and σ the standard deviation [LL02, Section 2.3].
Pearson correlation coefficients range between −1 and +1, where +1 (resp.−1) indicates that the two random variables are perfectly positively (resp.negatively) linearly correlated.In our case, it means that the power consumption is positively or negatively correlated with a secret value (K , κ).The adversary deduces which secret value is the most likely solution for the measured power traces by selecting the highest absolute correlation value.
The adversary's chance of finding a solution secret depends on the amount of gathered power traces.We further explore this fact to compute probabilities of success based on the number of collected traces for each of our attacks.

Attack Strategies
We detail two CPA attack strategies to target one sequence χ n at a time.The first strategy is CPA [BCO04], which simultaneously targets n bits of the register.To the best of our knowledge, we are the first to provide the details of this attack on a quadratic map (χ 3 and χ 5 in this paper).In comparison, the second strategy combines n single-bit CPA attacks.We call it combined CPA or Snake attack.Both the presented strategies are device-agnostic and are finely tuned for quadratic maps.

Correlation Power Analysis
In our first strategy, the attacker exploits the joint power consumption of the n bits of the sequence χ n at a time.They can thus have correlation results for 2 2n secret hypotheses at once.For example, for n = 3, the attacker can directly recover the six secret bits (K 0 , K 1 , K 2 , κ 0 , κ 1 , κ 2 ).In order to exploit the leakage of the n register cells, the adversary employs the well-known CPA (see Section 3.2).We describe how to adapt it to our leakage model to compare it with combined CPA.
The adversary represents the signal power consumption values with 2 n tuples S ref (K ), each containing 2 n tuples S ref K (κ), consisting of 2 n values for the possible values of µ (Definition 8).That is because the adversary computes signal power consumption reference values for m = n (Definition 5).To be more comprehensive, we write below the signal reference tuples S ref (K ) for three bits.The explicit values of the binary sequences are written in monospace font, e.g., 000.We denote by S ref (K = x) the tuple of signal reference values for a fixed K , whose value equals the sequence x.In a similar fashion, we write the sub-tuple of signal reference values for a fixed K equal to x and κ to y by S refx (κ = y): The tuples of signal references values for χ 5 can be trivially extended from the above tuples.An explicit example of a tuple of signal power references for a secret K being 100 and n = 3 is provided in Table 1.In the table, each bit of µ is written as µ 0 µ 1 µ 2 , and the same goes for κ.The tuple S ref (K = 100) is represented in a matrix form, where the columns are the said tuples S ref100 (κ) and the lines indicates the corresponding values of µ.
According to Definition 5, if all the n targeted bits of the register stage flip between time t 0 and t 1 , the corresponding signal power consumption equals −n.If only one bit flips, it equals +1 for n = 3; if not, it equals −1.The signal power consumption is +n only if the n bits have not flipped.Hence, the sign of the Pearson correlation coefficient is regarded to determine whether the bits have switched.The adversary thus has enough information to recover the 2n secret bits at once, given sufficient power traces.Ultimately, Table 1: Signal power consumption of reference S ref (K = 100) with 2 3 tuples for κ described by 2 3 elements representing the µ, for the CPA strategy.
the attacker aims to recover the secret K only, but they need the additional information provided by the bits of κ as an intermediate step to reach that goal.Degenerated cases.For n = 3, six out of eight tuples S ref K (κ) are linearly uncorrelated for a fixed K .The latter makes it possible to distinguish the eight hypotheses for κ with the Pearson correlation coefficient for a given K .The only exceptions are when K = 000 and K = 111.For those values, only four reference tuples S ref K (κ) are linearly uncorrelated out of the eight ones.It means that the best adversarial guess is an equally likely secret hypothesis and its bit inverse.We call those cases degenerated.Compared to Table 1, the tuple of signal power reference values for a degenerated case has the highest absolute values positioned on the diagonal lines.For example, let (K = 000, κ = 010) be the values to guess.In the best case, the adversary obtains the solutions (K = 000, κ = 010) and (K = 111, κ = 101) as equally likely.When n = 5, the degenerated cases do not appear.Indeed, the tuples S ref K (κ) contain more elements (2 5 ) because there are more possible values µ since n = 5.That makes it possible to distinguish all the different secret possibilities.

The Snake Attack: Combined Correlation Power Analysis
In the second strategy, the adversary combines single-bit CPA attacks to retrieve n bits of κ and then the corresponding n bits of K .The leakage of each bit x i is exploited with the Pearson correlation coefficients (Section 3.2).We want to reduce the number of intermediate steps for this attack to recover the 2 2n possible secret bits compared to the previous CPA strategy.
Our oracle is the activity function (Definition 4), which is evaluated for (K i , κ, µ), i.e., one bit of the targeted sequence χ n .Thus, m = 1 for this strategy.To reduce the number of intermediate steps, we take a closer look at the expression of the activity function: we see that κ i simply adds to K i .That means that the sum i = κ i ⊕ K i affects the sign of the signal power consumption values.The signal reference values where i = 0 have the opposite sign of the ones where i = 1.Hence, the Pearson correlation coefficient ρ S ref K i (κ),P has the same absolute value regardless of i , considering that all the other parameters stay unchanged.As a result, the adversary can not determine the individual bits of the sum i with only the signed value of the Pearson correlation coefficient.However, the adversary obtains information about the most likely hypothesis on (κ i+1 , κ i+2 ).They then combine those bits of knowledge when targeting the following i-th bit of the sequence χ n .The adversary thus successfully exploits the redundant information underlying the Pearson correlation coefficients about the secret bits in a divide-and-conquer manner.
As an example, for n = 3, acquiring the correlation coefficient values for the first, second, and third bits makes it possible to retrieve respectively (κ 1 , κ 2 ), (κ 2 , κ 0 ) and (κ 0 , κ 1 ), as highlighted in Figure 3a.Furthermore, Figure 3b presents the retrieved bits of i = 0: κ for a sequence χ 5 .The adversary thus exploits the redundant information provided by the quadratic terms of the activity function to determine the most likely κ.We henceforth denote by * κ i+1 κ i+2 (i ∈ Z/nZ) the fact that the sign of K i ⊕ κ i is discarded.
As a result, we refine the signal reference values from Definition 8 to compute the Pearson correlation coefficient.To that end, the adversary generates n reduced tuples of signal power consumption references S ref for each bit to combine the correlation results.
To not take into account the sign effect, the adversary fixes the value of K i ⊕ κ i to either 0 or 1 while generating the reduced signal reference values.We define below in a formal manner the reduced signal reference values S ref for our optimized combined CPA.
Definition 10 (Reduced signal reference values S ref ).The reduced signal power consumption values are stored in n tuples of tuples S ref (K i ), one for each targeted bit x i of the output of S-box, which contains 2 2 tuples S refi ( * κ i+1 κ i+2 ), one for each value of (κ i+1 , κ i+2 ), which consists of 2 3 signal power consumption evaluations for the values of the message µ.More formally, we have: which, for a bit i, consists of: Then, for a fixed secret value (κ i+1 , κ i+2 ), we have: where S(K i , κ, µ) is the signal power consumption for m = 1.
Like the signal reference values S ref from Definition 8 with m = 1, the adversary generates 2 3 values in the tuple S refi .That is because the algebraic expression of the activity function (Definition 4) depends on three bits of µ for a bit i.
For an explicit example, Table 2 provides the tuple S ref (K 0 ) for the first bit of the register (m = 1) with n = 3, as specified in Definition 8 for the standard CPA strategy.
We now replace it by Table 3 by discarding the tuples where K 0 ⊕ κ 0 = 1 to eliminate the sign effect of i following our combined CPA strategy.We have thus reduced signal reference values.
000 001 010 011 100 101 110 111 Table 3: Reduced signal power consumption of reference S ref (K 0 ).In practice and in this example, the adversary chooses to retain the tuples where K ⊕ κ 0 = 0 from Table 2 according to Definition 10.
In addition, we provide the reduced tuples of signal reference values for n = 3 and n = 5.For specific values of κ i+1 and κ i+2 , we write the binary sequence in monospace font with the correct amount of * to discard the unknown bits of κ (e.g., **01* for n = 5, i = 1, κ 2 = 0 and κ 3 = 1).We start with the reduced tuple S ref for a three-bit S-box: As a convenient result, with the reduced reference values that discard K i ⊕ κ i , the attacker obtains signal reference tuples S refi ( * κ i+1 κ i+2 ) for each K i that have a correlation coefficient equal to zero.In other words, there is only one best adversarial hypothesis on κ i+1 and κ i+2 in this strategy.Furthermore, which half of the tuples S refi ( * κ i+1 κ i+2 ) the adversary uses is irrelevant to retrieve the bit value of κ.
The adversary then computes the n Pearson correlation coefficients ρ S ref i ( * κi+1κi+2),P , which we henceforth write ρ * κi+1κi+2 for readability purposes.They obtain positive and negative values to guess the secret.That is because they arbitrarily choose which four reference values to discard for each tuple due to the sign effect of i .To eliminate the latter, the adversary squares the correlation coefficients or takes their absolute values.
The adversary then combines the absolute or squared correlation coefficients to determine the most likely hypothesis for the n bits of κ.However, selecting the highest result for each bit may provide an inconsistent value of κ due to some overlapping of the retrieved bits.To tackle this problem, the adversary chooses one correlation coefficient per bit and adds them to have coherent 2 n values of κ.For example, κ = 011 is obtained by adding the correlation results corresponding to κ = *11, κ = 0*1, and κ = 01*.There is no other combination.Figure 4a depicts how to obtain two possibilities for κ, namely κ = 000 and κ = 101.For our five-bit S-box, Figure 4b presents the combination of the correlation coefficients for κ = 00000 and κ = 10101.The labels in the figures symbolize the Pearson correlation coefficients ρ * κi+1κi+2 for a particular value of (κ i+1 , κ i+2 ) with the remaining unknown bits of κ, whose numbers depend on the length n of the considered S-box.Visually, linking the correlation results looks like performing a 'Snake walk'.Hence, the nickname of the combined CPA strategy: the Snake attack.
We provide in the following a definition of the Snake walk, or the combination of the Pearson correlation coefficients, to recover the bit information κ while individually targeting the n bits of the register state.We have n reduced tuples of signal reference values S ref (K i ), which contains each four reduced sub-tuples S refi ( * κ i+1 κ i+2 ), one for each evaluation of (κ i+1 , κ i+2 ) from Definition 10.Definition 11 (Snake walk).The Snake walk represents the addition of the n squared or absolute Pearson correlation coefficients ρ * κi+1κi+2 , such that the known values (κ i+1 , κ i+2 ) are equal for all the targeted bits of the output of the S-box in the register stage.In other words, for a fixed bit indexed by i ∈ Z/nZ, the value of κ i+2 in the index of the Pearson correlation coefficient ρ * κi+1κi+2 , which targets the i-th bit of the output of the S-box, is equal to the value of κ i+2 in ρ * κi+2κi+3 , which targets the (i + 1)-th bit of the said output.
Recovering K or K .Once a value κ is labeled as the most probable solution,the adversary can either recover K that is encoded in the entire initial register state or K from Definition 3. To find the value K, the adversary targets all the output bits of the S-boxes stored in the register state after one round to recover all the corresponding variables κ.Afterward, they can determine K by applying the inverse of the linear layer λ −1 on the concatenation of the variables κ.Alternatively, the attacker infers the n bits of K by retrieving the sign of the correlation values for the i-th studied bits.Indeed, the adversary can recover the initial secret by computing i ⊕ κ i = K i for each bit x i .That is because, from the expression of the signal power consumption, if the sign of the correlation coefficient is negative, the sum i equals 1. Contrarily, if the sign is positive, i equals 0.
As a result, the adversary performs at most n2 2 + n intermediate steps by exploiting the sign of the correlation values to recover 2 2n bits of information (K , κ) when targeting one S-box by taking advantage of the algebraic expression of the activity function (Definition 4).The latter heavily relies on the expression of the quadratic map of the permutation.
Otherwise to recover directly the value K, like in our practical evaluations, the adversary can focus on recovering all the corresponding κ, which results in 4n correlation values to manipulate times the number of parallel S-boxes for a round function and then apply the inverse of the linear layer once on all the retrieved values κ.Concretely, for one S-box, this reduces the number of correlations that need to be computed for n = 3 from 64 to 12 and for n = 5 from 1,024 to 20, resulting in a vastly more efficient attack.

Practical Evaluation
We provide a method to determine the rank of the secret to guess.We then describe how to compute the probabilities of success for the secret bits to retrieve to be ranked at a given position.Finally, we present our hardware setup to evaluate our attack strategies.

Rank of a Secret Hypothesis
We not only intend to quantify the gain in terms of attack success probability between one attack strategy and another for a three-bit and a five-bit quadratic S-box but also to compare those probabilities for the rank of the secret solution.There are as many ranks as the number of correlations that need to be computed: it equals the number of tuples S ref (K ) times their number of sub-tuples S ref K (κ) for the CPA strategy, or 4n for the Snake attack strategy.The secret hypotheses are ranked by their scores, which are (signed) correlation coefficients for CPA and absolute or squared coefficients for Snake.
Definition 12 (Rank of a secret hypothesis).A secret hypothesis is ranked first (resp.second, etc.) if its corresponding score is the highest (resp.second highest, etc.) positive value compared to the other secret hypotheses.
The above definition is more restrictive than the order of the success rate [SMY09].Indeed, we compute the probability to be ranked r-th, whereas the order of success rate considers whether the solution is ranked among the r-th first hypotheses.

Attack Success Probabilities for the Ranks
We aim to plot the probabilities to have the solution secret be ranked first, second, etc., for an adversary depending on their amount of power traces, also known as signal-to-noise ratio (SNR).A low SNR means manipulating a few power traces, hence much noise in the measurements.A high SNR highlights the signal part because the adversary gathered a sufficient number of power traces (Proposition 1).
We then compare the two attack strategies (Snake attack and CPA) w.r.t. the probability that an adversary can recover the solution value of the secret bits.After that, we infer which strategy performs better in terms of the required number of power traces and intermediate steps.

Hardware Setup
For our practical evaluation, we target round-based implementations of Xoodoo (n = 3) and Keccak-p (n = 5) for a width b = 1600, on a Spartan 6 FPGA running at 4 MHz.We measure the current via the voltage drop over a shunt in the supply path.Then, to acquire suitable power traces, we used a ZFL-2000GH+ Low Noise Amplifier (LNA) configured with a 22 dB gain and a Spectrum M4 oscilloscope (8 −bit resolution) with a sample rate of 2.5 GS/s.
For Xoodoo, we randomize the whole input state, adding a random key that is fixed for all measurements, before performing the first round.It should be noted that to attack Xoodoo, we also have to account for the addition of the round constant with the mapping ι and the ρ east linear mapping ρ east that are performed after χ 3 .
In contrast to this, for Keccak, we cannot randomize the whole 1600-bit input state since the target FPGA does not have enough input pins.We tackle this by randomizing only 128 bits and measuring 13 rounds.Following this, we target the transition from round 12 to 13.The reason for this is that after 12 rounds, we are sure that the whole state is randomized and view the state at this point as being a randomized message M added to a fixed key K. Thus, we choose the random key K after the measurement and compute the fictional M from the randomized state.Then we attack as known and we also compute µ = λ(M ) and account for the addition of the round constant with the step ι.
We have implemented both the standard CPA attack and the Snake attack in C, making use of multi-threading.Furthermore, we will make our implementations publicly available upon acceptance of the paper.

Attack Success Probabilities Results
First, we present our practical results for Keccak-p and Xoodoo with the two attack strategies.Secondly, we argue about the probability distribution obtained with the Henery model.

Practical Results
We present discrete attack success probabilities computed from measurements from the hardware testbed in Figures 5a and 5b for Xoodoo and Figures 6b and 6a for Keccak-p.The probability of being ranked first is depicted in black.The gray curves represent the probability of being ranked second, third, etc.For those curves, the higher the rank (rank two being the highest), the slower they collapse.In other words, the higher the rank for the gray curves, the higher their relative position is in the figures.In addition, the success probability to be ranked first equals one for a high SNR (i.e., a high amount of power traces), whereas the other ranks have a zero probability.Note that to make the gray curves more visible, we limit the x-axis to 25,000 traces for Xoodoo and 400,000 traces for    For Xoodoo, the results were generated for sets of power traces that contain 75,000 traces each.The attacks were performed 16,896 times each for CPA and combined CPA/Snake attack.The initial success probabilities (i.e., full noise or low SNR) are close to be equally distributed for each strategy: ideally, they are equal to 1/2 2n for CPA and 1/2 n for the Snake attack.However, in practice, the first probability already includes a low number of processed traces, which leads to a slight deviation from the ideal value.
Due to the much larger state of Keccak-p compared to Xoodoo, there is a lot more noise when targeting the nonlinear layer of Keccak-p.It is reflected in Figures 6b and 6a   increased number of traces needed to recover the correct secret.Indeed, we have 1.5 million traces per set.To match the Xoodoo experiment, we conducted 16,896 experiments.
As a result, the Snake attack on Xoodoo recovers the correct κ after 43,860 power traces for all experiments.The standard CPA attack takes 61,380 traces until all experiments label the solution (K , κ) as the first rank.We also recall that for each Snake attack experiment, only 12 correlations must be computed, compared to 64 for standard CPA.
For Keccak, there are even higher differences concerning the number of power traces.Indeed, the Snake attack consistently finds the value of the solution κ after 771,600 traces for all experiments.In contrast, the standard CPA attack requires 1,223,400 traces.Computationally, the Keccak Snake attack requires computing 20 correlations: it outperforms the standard CPA attack, which operates on 1,024 correlations.
Figures 7a and 7b directly compare the probabilities that the solution hypothesis is not ranked first.We can see that the Snake attack constantly has a lower probability than the standard CPA approach, demonstrating that the Snake attack outperforms the standard CPA approach.In summary, the quadratic function χ n is attacked with fewer efforts thanks to taking advantage of its algebraic expression with combined CPA, or Snake attack, compared to the standard CPA.

Probability Distribution: Henery Model
If we consider our approach from a probabilistic modeling perspective, we are comparing and ranking 2 2n , or 2 n , possible outcomes in the form of Pearson correlation coefficients between observed and reference power consumption values.The probability of success for each outcome (i.e., secret hypothesis) is given by how close it is to the original solution.We note that the probabilities of success in Figures 6a and 5a add up to one for a given number of traces.They are dependent of each other and follow a particular order (the higher the probability of success, the higher the rank).It is akin to ranking the outcomes in a horse race according to the horses' probabilities of winning.This type of problem was studied and formalized by Henery in 1981 [Hen81].
In the latter, the author defines a probability model over each outcome permutation (i.e., ranking in this work) given a particular distribution family for the outcome variables.
Our ranked success probabilities follow a Henery model, but it is unfortunately (generally) impossible to derive an analytically tractable expression for them, given an arbitrarily distributed outcome.Thus, we rely instead on real-world measurements.

Discussion
In this section, we refer to existing attacks in the literature that employ similar side-channel methods to the ones presented in this paper.We discuss the similarities and differences with our work in this paper.
Samwel and Daemen [SD17] extended the work of Bertoni et al. [BDD + 12] by targeting the full sequence χ 5 as implemented in Lake Keyak, with DPA on a FPGA instead of targeting only one bit of the said S-box.It is similar to our standard CPA attack scenario from Section 4.1.They concluded that attacking a full binary sequence is less efficient than attacking a single bit of the non-linear layer.We continue this work by providing more detailed insight into the attack on the full sequence χ 5 with CPA and by formalizing the attack in order to generalize it to any χ n .Indeed, we provide the probabilities of success according to the ranks and detail the tuples of signal reference values.We also study the case for χ 3 and exhibit the potential degenerated cases that an adversary might encounter.We improve on the single-bit attack with Snake, or combined CPA, by effectively reducing the number of intermediate results to compute.
Furthermore, Messerges et al. presented the multi-bit DPA approach [MDS02] where, instead of considering the leakage of one bit with the selection function, the attacker considers d bits.From there, three partitions are created depending on the values of the output of the selection function for d bits: one where the output of the selection function is 0s, a second one for all 1s, and a third one for mixed values.The meaning of partitions in this work corresponds to the signal reference values in our case.However, our partitions-tuples of signal reference values-reflect the precise number of bit flips.
The latter work inspired the authors from [LCC + 06], which proposed the Partitioning Power Analysis (PPA) as a more general method to exploit the power consumption signals of d bits instead of only one.Their presented attack is similar to CPA but without normalizing the correlation coefficient by the standard deviation of the power consumption values.They classify power traces into (d + 1) classes according to the Hamming distance of the considered d bits between a previous and current state.This approach is included in our work for the classic CPA approach, where the signal reference values can have values ranging from −n to +n depending on the Hamming distance of the state before and after one round.We further extend their work by combining the Pearson correlation coefficients to have coherent secret values, as in the Snake attack.
Other works in power analysis have similar notions to our tuple of signal reference values.Another example is the power analysis attack presented by Standaert et al. for an FPGA implementation of AES [SÖP04].They used a selected prediction matrix similar to our tuples of signal reference values.Nevertheless, they do not link the reference values among themselves to reduce the complexity of their side-channel attack unlike our work.
On the other hand, combining leakage signals is used for higher-order masking schemes [CPR07], where masking refers to splitting a secret variable into several shares.A power analysis attack is applied to the individual shares of the secret.It uses a combining function (e.g., a product, a difference of absolute values) on them to find a high correlation with the original un-shared value.However, the term combination is used differently than in our paper for combined CPA, for which we exploit the information redundancy to reduce the number of intermediate steps to compute.Some attacks were proposed to reduce the complexity of the power analysis, like the Butterfly-Attack [ZKS12] presented by Zohner et al. for the modular addition.It comes in handy to reduce the complexity of a DPA attack on this operand by lowering the number of hypothesis computations in a divide-and-conquer manner and then exploiting the information on the symmetric counterpart of the solution key.They exploit the latter because, by splitting the modular addition results into blocks, the divide-and-conquer results no longer highlight the solution key with the highest correlation value.Nonetheless, it is applied to a different kind of cipher (that uses modular addition) and in a different context than the permutations for symmetric cryptography targeted in this paper.
Applicability to Ascon.The NIST Lightweight Cryptography competition winner, Ascon, has an S-box that is affine equivalent to χ 5 [DEMS21].It operates on each bit-slice of five registers that contain 64 bit-word and are denoted by x i,0≤i≤4 .For the first round of the first permutation, the register x 0 is initialized by a constant C (or Initialization Vector), x 1 by a first key word K 1 , x 2 by a second key word K 2 , x 3 by a first nonce word N 1 and x 4 by a second nonce word N 2 .For the same bit position for the 64 bit-word, we can write the Algebraic Normal Form (ANF) of the S-box after its first execution (operations carried in F 2 ): The Snake attack exploits the redundancy of quadratic terms, where some key bits appear.It also assumes that there are as many key bits as the size of the S-box.We observe, based on the above equations, that the first call of the Ascon S-box does not fulfill those requirements: there are only two key bits for a five-bit S-box and only the registers x 0 and x 1 contain quadratic terms with two key bits.In that case, the Snake attack does not provide any advantage, and we refer to Samwel and Daemen [SD17] for a detailed DPA attack on the Ascon S-box with the same leakage model as ours.

Future Work
In this paper, we presented in details CPA attacks on χ 3 (Xoodoo) and χ 5 (Keccak-p) to understand in depth how to optimize standard correlation power analysis.They can be extended to larger quadratic S-boxes employed in other ciphers, for which the difference in the number of intermediate results will be even more drastic between the Snake attack and the standard CPA.For example, one could target Subterranean 2.0 [DMMR20] NIST Lightweight Cryptography competition candidate that uses as an S-box the function χ 257 , or Rasta [DEG + 18] and Dasta [HL20] that use χ n with n being the block length and an odd number.Furthermore, to apply our Snake attack to different quadratic functions (other than χ), one would need to change the activity function accordingly, then compute the corresponding signal reference vectors and sum the correlation results according to our Snake walk.
Additionally, our Snake attack heuristically combines multiple one-bit CPA attacks, which opens the door for an approach that relies on DPA instead since only two classes are to be distinguished.An ad-hoc porting of the Snake attack could include the computation of Welch's t-test in place of CPA, and then the t-test results would be heuristically combined with the 'Snake walk' to obtain the secret κ.We performed preliminary experiments on the same trace sets of Xoodoo used for the results shown in Figure 5b.However, we observed that the sum of squared t values results in a weaker attack that requires more traces than the standard CPA approach.Thus, a method to rely on DPA remains for future work.
Moreover, it would be interesting to assess whether our method can be applied to other types of leakage, e.g., electromagnetic information with Differential Electromagnetic Analysis (DEMA) introduced by Quisquater and Samyde [QS01].
To extend the scope of this study, one could try to recover the initial secret after several rounds or a masked implementation with the help of the presented model and theory.As the Snake attack exploits the information redundancy of quadratic terms to find the secret values, it seems a natural fit for masked hardware implementations of cryptographic algorithms (like Threshold Implementations [NRR06]), which are designed to split the secret bits into several variables each.

Conclusion
We considered the difference in power consumption of register cells before and after one round of a hypothetical implementation of a permutation-based algorithm for quadratic maps, which are used as S-boxes.This work details the case for χ n with n ∈ {3, 5}.We model the flipping activity of the register state with simple assumptions, which takes advantage of the cryptanalysis of quadratic maps.We correlate the measured consumed power for the targeted bits with reference values by the mean of the Pearson correlation coefficient.Afterward, we study the success probabilities of CPA and combined CPA, or Snake attack, to retrieve up to 2n secret bits.To the best of our knowledge, we are the first to describe both the standard CPA attack and the novel combined CPA for quadratic maps.
We demonstrate that an adversary performs better with our novel combined CPA strategy (Snake attack) than the standard CPA approach for small S-boxes.In order words, the attack uses n2 2 + n intermediate steps for the combined attack, instead of 2 2n for the classic one, to determine 2 2n secret bits.Not only we successfully reduce the computational complexity of the standard CPA attack with the Snake attack, but we also require fewer power traces to find the secret solution.In addition, for the CPA strategy for Xoodoo, the adversary can recover all secret bits but encounters two degenerated cases that give two equally likely secret hypotheses.We also propose a probability distribution model for ranked attack success probabilities.Our study highlights that a tailored CPA strategy that uses the cryptanalysis of a quadratic map performs better than traditional CPA.
[DC07], there are three affine equivalence classes for this size of S-box.They are made explicit in the work of Bilgin et al. [BNN + 15].A well-known example used in symmetric cryptography for a slightly larger S-box is the five-bit S-box of Ascon [DEMS21], winner of the NIST Lightweight Cryptography competition, which is affine equivalent to the S-box of Keccak.

Figure 2 :
Figure 2: First round of the first permutation.The unknown variables and states to the attacker are in red, what the adversary knows is in green, and the scope of this study is in blue.The gray area contributes to the noise in our model.

Figure 3 :
Figure 3: The information about the secret bits from the different Pearson correlation coefficients is highlighted according to the targeted i-th bit of the sequence χ n in different dotted lines.

Figure 4 :
Figure 4: Combination of correlation coefficients to obtain κ values, namely the 'Snake walk'.Each label (e.g., 0***0) represents an absolute or a squared Pearson correlation result.The values indicated in the labels are the binary sequences κ with the known bits (0 or 1) for κ i+1 and κ i+2 , and the unknown ones (*).
Traces Probability of Success Per Rank (b) Combined CPA (Snake attack) recovering κ.The correlation coefficient are squared to their sign.
Traces Probability of Success Per Rank (b) Combined CPA (Snake attack) recovering κ.The correlation coefficients are squared to discard their sign.
Number of Power TracesProbability to be Rank less than 1(b) n = 5

Figure 7 :
Figure 7: Comparison between CPA (dotted line) and Snake attack (solid line) for the probability of the correct hypothesis to not be rank one.