Novel Key Recovery Attack on Secure ECDSA Implementation by Exploiting Collisions between Unknown Entries

Abstract. In this paper, we propose a novel key recovery attack against secure ECDSA signature generation employing regular table-based scalar multiplication. Our attack exploits a novel form of leakage, denoted by collision information, which can be constructed by iteratively determining whether two entries loaded from the table are the same or not through side-channel collision analysis. Without knowing the actual values of the table entries, an adversary can recover the private key of ECDSA by finding the condition under which several nonces are linearly dependent, exploiting only the collision information. We show that this condition can be satisfied in practice with a reasonable number of digital signatures and corresponding traces. Furthermore, we show that all entries in the pre-computation table can be recovered using the recovered private key and a sufficient number of digital signatures based on the collision information. As case studies, we find that fixed-base comb and T_SM scalar multiplication are vulnerable to our attack. Finally, we verify that our attack is a real threat by conducting an experiment with power consumption traces acquired during T_SM scalar multiplication operations on an ARM Cortex-M based microcontroller. We also provide the details of the validation process.


Introduction
A digital signature plays an important role as an authentication mechanism in modern security. The Elliptic Curve Digital Signature Algorithm (ECDSA) [KR13,RHAL92,JMV01] is an elliptic curve cryptography-based digital signature scheme used in a wide variety of security services. Compared with RSA, a representative public-key cryptosystem, ECDSA provides an equivalent level of security with a shorter key length, which gives it advantages in speed and memory usage. Thus, it is preferred in constrained environments such as smart cards.
On the other hand, side-channel analysis (SCA) takes advantage of various forms of leakage (e.g., execution time, power consumption, electromagnetic emission, and acoustic emanation) occurring when cryptosystems are executed in devices to retrieve secret information because the instructions and data processed by the device are correlated with these leakages [Koc96,KJJ99,GMO01,GST14,MOP08]. Because it is well-known that SCA can be practically employed to break cryptosystems, many works on SCA against ECDSA have been reported to investigate its practical security.
In this paper, we propose a novel key recovery attack against secure ECDSA signature generation employing regular table-based scalar multiplication by exploiting side-channel collisions between unknown entries. Because ECDSA uses a fixed base point, table-based scalar multiplication is commonly employed in ECDSA owing to its efficiency and security. Although table-based scalar multiplication can easily be implemented to be practically secure against known SCAs, our attack can still succeed against it. Our attack exploits a form of leakage that determines whether two entries loaded from the table are the same or not, i.e., whether there is a collision between the two corresponding traces. Without knowing the actual values of the table entries, an adversary can recover the ECDSA private key by exploiting only the collision information to find the condition under which several nonces are linearly dependent. We show that this condition can be satisfied in practice with a reasonable number of digital signatures and corresponding traces: the required number is at most one more than the total number of entries in the table. Furthermore, we show that all entries of the pre-computation table can be recovered using the recovered private key and a sufficient number of digital signatures based on the collision information. We first explain the details of our attack against ECDSA signature generation employing regular table-based scalar multiplication. We then show, as case studies, that fixed-base comb and T_SM scalar multiplication are vulnerable to our attack. Finally, we demonstrate that our attack is a real threat by conducting an experiment with power consumption traces acquired during T_SM scalar multiplication operations on an ARM Cortex-M based microcontroller. We also provide details on how to conduct the validation process.
This paper is organized as follows. In Section 2, we briefly explain ECDSA signature generation and provide an overview of side-channel attacks on ECDSA. We also generalize regular table-based scalar multiplication for the description of our attack. In Section 3, we describe our novel key recovery attack, which recovers the private key for ECDSA by identifying the situation in which several nonces are linearly dependent using collision information from the pre-computation table. Section 4 presents case studies on fixed-base comb and T_SM scalar multiplication. In Section 5, we validate the feasibility of the proposed attack with an experiment using real traces of T_SM scalar multiplication acquired from an ARM Cortex-M based microcontroller. Finally, we conclude this paper in Section 6.

Notations
Let $t \in \mathbb{R}^{D \times 1}$ denote a side-channel trace of length $D$. Let $x[j]$ denote the $j$-th entry of vector $x$. Let $\mathbb{Z}_a := \{x \bmod a \mid x \in \mathbb{Z}\}$ and $\mathbb{Z}_a^b := \{(z_1, z_2, \ldots, z_b) \mid z_1, z_2, \ldots, z_b \in \mathbb{Z}_a\}$. Let $A \| B$ be the concatenation of vectors (or matrices) $A$ and $B$. Let $(\cdot)^T$ denote the transpose of a vector (or matrix), and let $Mat_{X \times Y}(\mathbb{Z})$ be the set of all $X$-by-$Y$ matrices with all entries in $\mathbb{Z}$. Let $I_N$ be the $N \times N$ identity matrix.

Overview on Side-Channel Attacks against ECDSA Signature Generation
The ECDSA signature generation algorithm, described in Algorithm 1, generates a signature (r, s) for message m using the private key d obtained from the key generation process. It consists of a scalar multiplication stage over an elliptic curve using a randomly selected secret nonce k (also known as the ephemeral key or ephemeral scalar), followed by a stage in which r and s are computed from the result of the scalar multiplication. A naive implementation of Algorithm 1 can easily be broken by side-channel attacks. Side-channel attacks against ECDSA signature generation can be categorized as multiple-trace attacks (MTA) and single-trace attacks (STA) depending on the required number of traces.
Representative MTAs against ECDSA signature generation [Cor99,AFV07,HMHW09] are differential power analysis (DPA) [KJJ99] and correlation power analysis (CPA) [BCO04]. An adversary can conduct a DPA-like attack against the intermediate data of the multiplication $d \cdot r$, performed in line 5 of Algorithm 1, by partially guessing the private key d, because r is public as part of the signature. The adversary can then test whether the guess is correct by evaluating, with a statistical tool, the correlation between multiple traces and hypothetical values of the intermediate data, and consequently recover the secret d. Hence, to counter this attack, the computation of $k^{-1} \cdot (h + d \cdot r)$ in line 5 of Algorithm 1 must be rearranged so that the inverse of the random secret k is involved in every prior operation, i.e., $k^{-1} \cdot h + (k^{-1} \cdot d) \cdot r$, which prevents the intermediate data from being guessed.
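The following minimal Python sketch checks that the reordered computation yields the same s as the naive one, while only the naive order computes the key-dependent product d · r on its own. The function names are hypothetical, and the modulus 2^61 − 1 is an assumed toy prime standing in for the group order ord.

```python
def s_naive(k, h, d, r, n):
    # k^{-1} * (h + d*r) mod n: the product d*r is predictable from a guess
    # of d and the public r, which is what a DPA/CPA adversary targets.
    return pow(k, -1, n) * (h + d * r) % n


def s_blinded(k, h, d, r, n):
    # k^{-1}*h + (k^{-1}*d)*r mod n: every intermediate involves k^{-1}.
    k_inv = pow(k, -1, n)
    return (k_inv * h + (k_inv * d % n) * r) % n


if __name__ == "__main__":
    n = 2**61 - 1  # toy prime modulus standing in for ord
    k, h, d, r = 123456789, 111, 987654321, 424242
    print(s_naive(k, h, d, r, n) == s_blinded(k, h, d, r, n))  # True
```

Both functions return the same value for any nonzero k; the difference lies only in which intermediate values appear during the computation.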
Another form of MTA is the lattice attack using partial nonces. Partial information about the nonces, together with the signatures, can be formulated as a system of inequalities in the private key, which is known as the hidden number problem (HNP) [HGS01,NS02]. This system can be cast as a closest vector problem (CVP) and solved efficiently using the LLL lattice basis reduction algorithm and Babai's nearest plane algorithm. For the lattice attack, partial nonces can be retrieved in the following ways. First, if Algorithm 1 generates biased nonces, then partial nonces can easily be guessed [AFG+14,BH19]. Second, if some part of Algorithm 1 runs in non-constant time that varies with the message or the nonce, it can leak partial information [BH09,BT11,YB14,DHMP14,BvSY14,ABF+15,vSY15,FWC16,GPP+16,BFMT16,ASS17,GB17,Rya18,DPP20,RSBD20,ANT+20,MH20,CPB20,JSSS20,GuHT+20,MSEH20,WSBS20]. Third, partial nonces can be obtained through template-based attacks [MO09,BCP+14,BCP+19] or STA, which are described later; because their results may contain errors, these attacks are used to find only partial information.
Here, we provide an overview of STA. As with MTA, a naive implementation of scalar multiplication is vulnerable to simple power analysis (SPA) [KJJ99], the basic form of STA, because the patterns of doubling and addition in a single trace of scalar multiplication are different. This distinguishability is generally caused by two factors. First, doubling and addition are implemented differently, which results in different execution times and computation patterns. Second, an irregular scalar multiplication algorithm is employed, e.g., double-and-add scalar multiplication, where a point addition is computed depending on the bits of the scalar. Accordingly, there are two approaches to counteracting SPA.
The first approach is to introduce regular scalar multiplication algorithms such as double-and-add-always [Cor99] and the Montgomery ladder [IT02], which always execute an identical sequence of operations consisting of the same numbers of doublings and additions. The second approach is to remove the differences between the unit instructions of doubling and addition, as in side-channel atomicity [CCJ04] and unified point addition [BJ02]. Although these approaches are effective in counteracting SPA because they remove the operational leakages of scalar multiplication algorithms, leakages from registers and data are still exploitable, and STAs exploiting these leakages have been proposed [Wal01,HMH+12,CFG+12,BJPW14,HIM+14,DGH+16,HKT15,SHKS15,SH17].
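To illustrate why the operation sequence leaks, here is a toy Python sketch that uses integers in the additive group Z_n as an assumed stand-in for curve points (the function names are hypothetical). It records one symbol per doubling ("D") or addition ("A"): double-and-add produces a bit-dependent pattern, whereas double-and-add-always produces the same pattern for every scalar of a given length. The Montgomery ladder is not shown.

```python
# Toy model: "points" are integers in the additive group Z_n, so k*P is just
# k*P mod n. Only the recorded operation sequence matters for SPA.

def double_and_add(k, P, n):
    """Left-to-right double-and-add; returns (k*P mod n, operation string)."""
    Q, ops = 0, []
    for bit in bin(k)[2:]:            # scan scalar bits from the most significant
        Q = 2 * Q % n
        ops.append("D")
        if bit == "1":                # addition only for 1 bits: irregular pattern
            Q = (Q + P) % n
            ops.append("A")
    return Q, "".join(ops)


def double_and_add_always(k, P, n, bits):
    """Double-and-add-always over a fixed bit length; regular pattern."""
    Q, ops = 0, []
    for i in reversed(range(bits)):
        Q = 2 * Q % n
        ops.append("D")
        R = (Q + P) % n               # the addition is always computed
        ops.append("A")
        Q = R if (k >> i) & 1 else Q  # result kept only for 1 bits
    return Q, "".join(ops)


if __name__ == "__main__":
    print(double_and_add(0b1011, 5, 97))            # (55, 'DADDADA')
    print(double_and_add_always(0b1011, 5, 97, 4))  # (55, 'DADADADA')
```

The final selection in double_and_add_always is itself secret-dependent; a real implementation would have to perform it in constant time.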
Because register- or data-dependent leakages, i.e., collision characteristics, can be highly correlated or not depending on the bit values of the scalar, all attacks using collision characteristics can be categorized as collision attacks (CA). Walter [Wal01] first proposed the idea of a collision attack with a single trace, referred to as the Big Mac attack, which determines the bits of the secret exponent by exploiting, via the difference of means, the collision characteristics between sliding window exponentiation and its pre-computation process. ROSETTA (Recovery of Secret Exponent by Triangular Trace Analysis) [CFG+12] distinguishes squarings from multiplications by exploiting the collision characteristics caused by single-precision multiplications with the same inputs within a squaring operation. Unlike ROSETTA, which detects an inner collision within one operation, i.e., squaring, HCCA (Horizontal Collision Correlation Attack) [BJPW14] exploits collision characteristics between two (or more) operations, in particular when the same input operands are manipulated in the target operations. Hanley et al. [HKT15] extended the idea of HCCA by exploiting the collision characteristics between the input and output operands of the target operations.
Whereas the collision attacks listed above focus on data-dependent leakages in target operations, e.g., field multiplications in the doubling or addition of scalar multiplication, several studies have exploited collision characteristics caused by register behavior that depends on the value of the scalar, by measuring location-dependent electromagnetic emissions [HMH+12,SHKS15]. In addition, clustering algorithms have been employed for better discrimination of register-dependent leakages [HIM+14,JB17].
An adversary against ECDSA can obtain nonces through STA against scalar multiplication and then recover the private key as follows:
$$d = (s \cdot k - h) \cdot r^{-1} \bmod ord. \qquad (1)$$
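Recovering the key from a single known nonce amounts to solving the signing equation s = k^{-1}(h + d · r) mod ord for d. A minimal Python sketch, with a hypothetical function name and an assumed toy prime modulus in place of ord:

```python
def recover_key_from_nonce(r, s, h, k, n):
    """Return d from a signature (r, s) on hash h whose nonce k is known.

    s = k^{-1} (h + d*r) mod n  implies  d = (s*k - h) * r^{-1} mod n.
    """
    return (s * k - h) * pow(r, -1, n) % n


if __name__ == "__main__":
    n = 2**61 - 1                        # toy prime standing in for ord
    d, k, h, r = 1234567891011, 98765432123, 55555, 424242
    s = pow(k, -1, n) * (h + d * r) % n  # honest signing equation
    print(recover_key_from_nonce(r, s, h, k, n) == d)  # True
```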

Table-based Scalar Multiplication for ECDSA Signature Generation
In this section, we generalize regular table-based scalar multiplication. [...] Here, $ks_j$ is the row index for referencing the $j$-th column of the table, and the result of Algorithm 3 with the sequence $ks(k)$ and the function $ws$ equals $k \cdot P$. Both $ks(k)$ and $ws$ are determined by the scalar multiplication algorithm.
It is worth noting that our attack targets Algorithm 3 with a function $ws$ whose output depends only on $j$, not on $ks_j$. In other words, methods such as the sliding window method are excluded.
Algorithm 2 Preparation of pre-computed tables for regular table-based scalar multiplication
[...]
Choose $\alpha_{i,j} \in \mathbb{Z}_{ord}$ that is appropriate for the current table-based scalar multiplication
[...]
Return T
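The generalized scheme can be sketched as follows. This is a hypothetical reconstruction, not the authors' listing: we assume, based on the description of lines 4-5 of Algorithm 3 (a point doubling based on j and a point addition based on ks_j), that line 4 multiplies the accumulator by ws(j) and line 5 adds the entry in row ks_j of column j, and we use the additive group Z_n as a toy stand-in for the curve group. The sketch also shows why Assumption 1 matters: because ws depends only on j, each column's table scalar enters the nonce with a public weight.

```python
import random


def precompute_table(alpha, P, n):
    # Algorithm 2 (sketch): T[i][j] = alpha_{i,j} * P in the toy group Z_n.
    return [[a * P % n for a in row] for row in alpha]


def table_scalar_mult(T, ks, ws, n):
    # Assumed shape of Algorithm 3: line 4 scales Q by ws(j),
    # line 5 adds the entry in row ks_j of column j.
    Q = 0  # stand-in for the point at infinity
    for j in reversed(range(len(ks))):
        Q = ws(j) * Q % n
        Q = (Q + T[ks[j]][j]) % n
    return Q


def implied_nonce(alpha, ks, ws, n):
    # Column j's scalar is weighted by ws(0) * ... * ws(j-1): a public
    # constant, because ws depends on j only (Assumption 1).
    k, weight = 0, 1
    for j in range(len(ks)):
        k = (k + weight * alpha[ks[j]][j]) % n
        weight = weight * ws(j) % n
    return k


if __name__ == "__main__":
    rng = random.Random(1)
    n, P, r, c = 2**61 - 1, 7, 4, 8
    alpha = [[rng.randrange(n) for _ in range(c)] for _ in range(r)]
    T = precompute_table(alpha, P, n)
    ks = [rng.randrange(r) for _ in range(c)]
    ws = lambda j: 2  # comb-like: one doubling per column
    print(table_scalar_mult(T, ks, ws, n) == implied_nonce(alpha, ks, ws, n) * P % n)
```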

Proposed attack
This section proposes a novel key recovery attack on ECDSA signature generation employing regular table-based scalar multiplication. We set two assumptions for the threat model of our attack:
Assumption 1. We assume that ws is a function of the loop index j in Algorithm 3, i.e., we consider only functions ws whose output depends on j alone, not on $ks_j$.
Assumption 2. We assume that an adversary can have multiple ECDSA signatures generated with a fixed private key and can acquire the corresponding signatures and traces of the table-based scalar multiplication with a fixed pre-computation table T.
We emphasize that our attack can be performed even when the adversary has no knowledge of the pre-computed table entries. Because our attack only uses collisions between entries, the pre-computation phase is independent of our attack. We illustrate the details of the proposed attack in two steps: the preparation of the collision information and the recovery of the private key for the ECDSA signature. Figure 1 presents the overall flow of the proposed attack.

Preparation of Collision Information
First, we prepare the collision information through side-channel analysis. Regular table-based scalar multiplication repeats the operations in lines 4-5 of Algorithm 3 c times, where c is defined according to the security parameter of the scalar multiplication. In each loop, a point doubling and a point addition are performed based on $j$ and $ks_j$, respectively, i.e., the iteration index and the row index of the table. Note that, although the security parameter c might be confidential, it can easily be obtained by visually inspecting the power consumption of the overall scalar multiplication. In addition, the row size r of the table can be inferred from c once the scalar multiplication algorithm has been guessed.
We now consider an attack scenario in which an adversary obtains N signatures $(r_i, s_i)$, generated with the fixed private key on different messages, and the corresponding traces $T_i$, where $i \in \{1, \ldots, N\}$, of the table-based scalar multiplications with the fixed pre-computation table that are executed during signature generation¹. Because lines 4-5 of Algorithm 3 are iterated regularly, the adversary can locate and extract the samples corresponding to these operations in each iteration, denoted by a subtrace $t_{i,j}$, and reconstruct the trace as $T_i = t_{i,0} \| t_{i,1} \| \cdots \| t_{i,c-1}$. Let us consider a specific iteration j, where $0 \le j \le c-1$, and the subtraces of all traces corresponding to this iteration, i.e., $t_{1,j}, \ldots, t_{N,j}$. In this situation, we can perform collision attacks that determine whether the row index of the table corresponding to one subtrace is the same as the row indexes corresponding to the other subtraces. These attacks are possible because the data-dependent leakages of two entries from T are highly correlated if the row indexes of the selected entries are the same. Because there are at most r different row entries in T for each iteration, repeating this process classifies all subtraces of the iteration into r groups (the process is presented in Figure 2, and a more detailed version is provided in Algorithm 9 in Section 5). Here, we assume that this grouping is possible without any error for all iterations². The subtraces of the remaining iterations can be grouped using the same technique. After the grouping is completed, we can label each $t_{i,j}$ with $g_{i,j} \in \{0, 1, \ldots, r-1\}$. Consequently, we can define the collision information vector corresponding to $(r_i, s_i)$ as $G_i = (g_{i,0}, g_{i,1}, \ldots, g_{i,c-1})$. In the next section, we describe how the private key can be recovered by utilizing the collision information vectors and the acquired signatures.
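The grouping step can be illustrated on simulated leakage. The sketch below is an assumption-laden stand-in, not Algorithm 9: leakage is modeled as the Hamming weight of eight 32-bit words plus small Gaussian noise, and subtraces are grouped greedily by squared Euclidean distance to a group representative. All function names and the threshold value are hypothetical.

```python
import random


def hw_leakage(words, noise_sd, rng):
    # Hypothetical leakage model: Hamming weight of each loaded 32-bit word
    # plus Gaussian noise (one sample per word).
    return [bin(w).count("1") + rng.gauss(0.0, noise_sd) for w in words]


def group_subtraces(subtraces, threshold):
    """Greedy collision grouping for one iteration j.

    A subtrace joins the first group whose representative is within the
    squared-distance threshold; otherwise it opens a new group. Labels are
    arbitrary integers 0, 1, ... in order of first appearance.
    """
    labels, reps = [], []
    for t in subtraces:
        for g, rep in enumerate(reps):
            if sum((a - b) ** 2 for a, b in zip(t, rep)) < threshold:
                labels.append(g)
                break
        else:
            reps.append(t)
            labels.append(len(reps) - 1)
    return labels


def collision_vectors(labels_per_iteration):
    # labels_per_iteration[j][i] = g_{i,j}; returns G_i = (g_{i,0}, ..., g_{i,c-1}).
    return [list(G_i) for G_i in zip(*labels_per_iteration)]
```

With real traces, the noise is larger and the threshold must be tuned to the SNR, which is exactly where grouping errors can arise.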

Key Recovery by Identifying Linearly Dependent Nonces
In the second step, we find the condition under which nonces are linearly dependent by utilizing the collision information, and then recover the private key on the basis of this linear dependency. The values of the components $g_{i,j}$ of $G_i$ are arbitrary labels from the set $\{0, 1, \ldots, r-1\}$, assigned only to classify the entries selected from a column of the pre-computation table T into r groups. If we consider two different columns, the same label does not mean that the same row index is selected in the actual scalar multiplication. In other words, the labels of the collision information are independent both vertically and horizontally. Hence, to represent the labels as independent information, each component $g_{i,j}$ of $G_i$ is converted to a one-hot representation $E_{oh}$ as follows:
$$E_{oh}: \{0, 1, \ldots, r-1\} \to \mathbb{Z}_{ord}^{r}, \quad g \mapsto \mathbf{e}_{g+1},$$
where $\mathbf{e}_i$ is the unit vector whose $i$-th component is 1 and all other components are 0. The collision information vector $G_i$ is then converted to
$$\mathbf{v}_i = E_{oh}(g_{i,0}) \| E_{oh}(g_{i,1}) \| \cdots \| E_{oh}(g_{i,c-1}) \in \mathbb{Z}_{ord}^{r \cdot c}.$$
Hence, each component of $\mathbf{v}_i$ corresponds bijectively to an entry of the pre-computation table T, although we still do not know this mapping. We can now regard $\mathbf{v}_i$ as the coefficient vector of a linear combination that constructs the nonce $k_i$ from unknown variables, i.e., the table entries. Note that we set the codomain of $E_{oh}$ to $\mathbb{Z}_{ord}^{r}$, not $\mathbb{Z}_{2}^{r}$, because all values in the later computations are defined in $\mathbb{Z}_{ord}$. Because the vectors $\mathbf{v}_i$ lie in a finite-dimensional vector space, once there are enough of them, some vector can be expressed as a linear combination of the others, i.e., the vectors are linearly dependent, by Corollary 1.
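A minimal sketch of the one-hot conversion follows. It assumes 0-based positions in code, so label g sets position g of its length-r block (this is e_{g+1} in 1-based notation); the function names are hypothetical.

```python
def one_hot(g, r):
    # E_oh(g): length-r unit vector with a 1 at (0-based) position g.
    e = [0] * r
    e[g] = 1
    return e


def to_coefficient_vector(G_i, r):
    # v_i = E_oh(g_{i,0}) || E_oh(g_{i,1}) || ... || E_oh(g_{i,c-1}), length r*c.
    v = []
    for g in G_i:
        v.extend(one_hot(g, r))
    return v


if __name__ == "__main__":
    print(to_coefficient_vector([2, 0, 1], 3))  # [0, 0, 1, 1, 0, 0, 0, 1, 0]
```

Each v_i has exactly c ones, one per column block, which is the sparsity the text refers to.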
Corollary 1. Let V be a finite-dimensional vector space and let the dimension of V be n = dim V . Then, any subset of V that contains more than n vectors is linearly dependent.
As a result, if there are N (> r · c) vectors, at least one of them can be expressed as a linear combination of the remaining vectors, because the $\mathbf{v}_i$ lie in an (r · c)-dimensional vector space. However, such linear dependencies can also occur among N ≤ r · c vectors because the $\mathbf{v}_i$ are sparse.
To obtain a linear dependency, we construct the matrix $A \in Mat_{N \times rc}(\mathbb{Z}_{ord})$ whose rows are the N vectors $\mathbf{v}_i$, and then construct $M = A \| I_N$. By performing Gaussian elimination on M as shown in (7), the row echelon matrix, denoted by $A'$, is calculated from A:
$$M = A \| I_N \;\longrightarrow\; A' \| B. \qquad (7)$$
Then, $A' = B \cdot A$ holds for A, B, and $A'$ in (7). Note that, because the components of $\mathbf{v}_i$ are the coefficients of a linear combination of the entries of the pre-computation table, Gaussian elimination is performed modulo ord. If the vectors are linearly dependent, the row echelon matrix $A'$ has at least one row whose components are all zero. Let us denote the index of such a row by $g \in \{2, 3, \ldots, N\}$. Then, (8) is satisfied, where $\mathbf{b}_g$ is the g-th row vector of B:
$$\mathbf{b}_g \cdot A = \mathbf{0} \quad \text{and} \quad \mathbf{b}_g \neq \mathbf{0}, \qquad (8)$$
where $\mathbf{0}$ is the vector with all components zero. This means that the components of $\mathbf{b}_g = (b_{g,1}, b_{g,2}, \ldots, b_{g,N})$ are the coefficients of a linear combination of the nonces $k_i$ that equals 0, i.e., $\sum_{i=1}^{N} b_{g,i} k_i \equiv 0 \pmod{ord}$. In other words, the nonces corresponding to the non-zero components of $\mathbf{b}_g$ are linearly dependent. Each nonce also satisfies the signing equation
$$k_i = s_i^{-1} (h_i + d \cdot r_i) \bmod ord. \qquad (9)$$
Finally, from (9) and the linearly dependent nonces, the private key d can be recovered as shown in (10):
$$d = -\Big(\sum_{i=1}^{N} b_{g,i} \, s_i^{-1} h_i\Big) \cdot \Big(\sum_{i=1}^{N} b_{g,i} \, s_i^{-1} r_i\Big)^{-1} \bmod ord. \qquad (10)$$
Note that the Gaussian elimination need not be completed: because the attacker needs only one all-zero row, the elimination can be stopped as soon as such a row appears.
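The whole recovery can be exercised end to end on simulated data. The sketch below assumes a toy prime modulus in place of ord, a T_SM-like signer (ws(j) = 1, so each nonce is the sum of one secret table scalar per column), random values standing in for r_i = x(k_i · P) mod ord (the recovery uses only the signing equation), and a random per-column relabelling to mimic arbitrary collision labels. It performs full forward elimination on A ∥ I_N rather than stopping early; all function names are hypothetical.

```python
import random


def find_dependency(A, n):
    """Row-reduce M = (A || I_N) modulo the prime n.

    Returns the right-hand part b of a row whose left-hand part became zero,
    so that b * A = 0 (mod n) and b != 0; returns None if the rows of A are
    linearly independent.
    """
    N, width = len(A), len(A[0])
    M = [list(A[i]) + [int(i == j) for j in range(N)] for i in range(N)]
    rank = 0
    for col in range(width):
        pivot = next((i for i in range(rank, N) if M[i][col] % n), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, n)
        for i in range(rank + 1, N):
            f = M[i][col] * inv % n
            if f:
                M[i] = [(x - f * y) % n for x, y in zip(M[i], M[rank])]
        rank += 1
    # Rows rank..N-1 now have an all-zero left part.
    return M[rank][width:] if rank < N else None


def recover_private_key(b, sigs, n):
    # k_i = s_i^{-1}(h_i + d r_i) and sum_i b_i k_i = 0 give
    # d = -(sum_i b_i s_i^{-1} h_i) / (sum_i b_i s_i^{-1} r_i) mod n.
    num = den = 0
    for b_i, (r_i, s_i, h_i) in zip(b, sigs):
        s_inv = pow(s_i, -1, n)
        num = (num + b_i * s_inv * h_i) % n
        den = (den + b_i * s_inv * r_i) % n
    return -num * pow(den, -1, n) % n


def simulate(r, c, n, seed):
    """Toy T_SM-like signer: k_i is the sum of one secret table scalar per column."""
    rng = random.Random(seed)
    d = rng.randrange(1, n)
    alpha = [[rng.randrange(n) for _ in range(c)] for _ in range(r)]
    perms = [rng.sample(range(r), r) for _ in range(c)]  # arbitrary labels
    A, sigs = [], []
    for _ in range(r * c + 1):
        rows = [rng.randrange(r) for _ in range(c)]
        k = sum(alpha[rows[j]][j] for j in range(c)) % n
        h = rng.randrange(n)
        rr = rng.randrange(1, n)  # stand-in for x(k*P) mod n
        s = pow(k, -1, n) * (h + d * rr) % n
        v = []
        for j in range(c):
            e = [0] * r
            e[perms[j][rows[j]]] = 1
            v.extend(e)
        A.append(v)
        sigs.append((rr, s, h))
    return d, A, sigs


if __name__ == "__main__":
    n = 2**61 - 1
    d, A, sigs = simulate(r=4, c=6, n=n, seed=3)
    b = find_dependency(A, n)
    print(recover_private_key(b, sigs, n) == d)
```

In a real attack, A comes from the collision information vectors and sigs from the collected (r_i, s_i, h_i); since only one all-zero row is needed, the elimination loop could return as soon as such a row appears.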

Discussion of Issues regarding the Proposed Attack
There are some issues with our attack. First, the collision information must be computed accurately, i.e., without errors. However, errors can occur in the grouping stage because trace noise can cause the collision attacks to fail. Currently, only a trial-and-error solution is available: because our attack assumes that the grouping stage is error-free, the attacker repeatedly chooses N traces at random from a sufficiently large pool of $N_p$ (> N) traces and runs the attack until the private key is recovered. Other solutions are left for future work. Second, for the same reason, our attack may be limited for relatively large row sizes r, because the number of groups per column grows with r. However, because clustering errors depend on the signal-to-noise ratio (SNR) of the traces, the limit on the row size varies with the SNR. Thus, we do not formulate an exact limit for r.
We investigate the probability of incorrect clustering between two different entries in an ideal environment, i.e., assuming only the Hamming weight leakage model and no noise. As described in Section 5, we can target the points of interest at which two multi-precision integers are loaded 32 bits at a time from the pre-computation table, i.e., the leakage from loading the coordinates X and Y of k · P, where P is the base point. These coordinates X and Y of k · P are determined by the corresponding 256-bit scalar. Thus, to simplify the calculation, we assume here that we target 8 points of interest corresponding to loading the 256-bit scalar of each entry (eight 32-bit words) instead of the coordinates. We also assume that all entries of the pre-computation table are different. This assumption is reasonable because the probability that two random 256-bit values are equal is negligible if the entries are drawn uniformly at random. When two different random 32-bit words are chosen, the probability that their Hamming weights are equal is
$$p = \frac{\sum_{w=0}^{32} \binom{32}{w}^2 - 2^{32}}{2^{64} - 2^{32}} = \frac{\binom{64}{32} - 2^{32}}{2^{64} - 2^{32}} \approx 0.0993.$$
Thus, treating the 8 words independently, the probability that the leakages of two different entries are equal is
$$P = p^8 \approx 9.5 \times 10^{-9}.$$
We claim that the trial-and-error method can be successful because P is very low. We also verify this empirically through 100,000 simulations, with no incorrect clustering for each $r \in \{2^2, 2^4, 2^8\}$³.
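These probabilities can be checked numerically. This sketch assumes, as the text does, distinct uniformly random 32-bit words and treats the eight words of two entries as independent; the function names are hypothetical.

```python
from math import comb


def p_equal_hw(bits=32):
    """P[HW(a) == HW(b)] for two distinct, uniformly random words a != b.

    Ordered pairs with equal Hamming weight: sum_w C(bits, w)^2 = C(2*bits, bits)
    (Vandermonde's identity). Removing the 2^bits pairs with a == b and dividing
    by the number of ordered distinct pairs gives the conditional probability.
    """
    total = 2 ** bits
    return (comb(2 * bits, bits) - total) / (total * total - total)


def p_incorrect_clustering(bits=32, words=8):
    # Assumes the loaded words of two different entries behave as
    # independent pairs of distinct random words.
    return p_equal_hw(bits) ** words


if __name__ == "__main__":
    print(p_equal_hw(), p_incorrect_clustering())
```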

Applications: Case studies
In this section, we explore, as targets of our attack, two representative types of regular table-based scalar multiplication for ECDSA signature generation that are considered practically or perfectly resistant to side-channel analysis.

Case Study: Fixed-base Comb Scalar Multiplication
[..., Libb, Liba]. Of these general table-based scalar multiplication approaches, we take the well-known fixed-base comb method as an example to demonstrate that our attack is easily applicable to general table-based scalar multiplication deployed in ECDSA signature generation. For simplicity, we select the original version of the fixed-base comb method, described in Algorithm 4, although many variants exist that improve efficiency and counteract side-channel analysis. Algorithm 4 combines Definition 1, Algorithm 2, and Algorithm 3 for fixed-base comb scalar multiplication. For the fixed-base comb method, ws(x) = 2 for any x, r = 2^w, and c = d hold. Note that, although the pre-computed table has only one column, we assume for simplicity an r × c pre-computed table, i.e., the original table repeated c times. With this setting, our attack can easily be applied to fixed-base comb scalar multiplication. Table 1 shows how many signatures and traces are required for our attack depending on the security parameter of the fixed-base comb method.

Case Study: T_SM Scalar Multiplication
T_SM scalar multiplication was proposed to be resistant against STA [SCM+18]. To achieve this, it is designed differently from other types of scalar multiplication. It can be employed only in specific settings such as ECDSA signature generation, because it outputs both a random nonce k and the corresponding point k · P using randomly pre-computed tables, whereas conventional scalar multiplication algorithms compute k · P from a given input k, as shown in Algorithm 5 and Algorithm 6. T_SM scalar multiplication, as described in Algorithm 6, selects a random row for each column j and accumulates the corresponding scalar and point from tables T_k and T_P, respectively. This accumulation is iterated n times; hence, n long integer additions for k (line 4 in Algorithm 6) and n elliptic curve point additions for k · P (line 5 in Algorithm 6) are performed. The accumulated scalar and point are returned as the nonce k and the point k · P. With this property, T_SM scalar multiplication can be employed for ECDSA signature generation by replacing steps 1 and 2 in Algorithm 1, because it outputs k and the corresponding k · P.

Algorithm 4 Fixed-base comb scalar multiplication [LL94,HMV04]
Require: base point P ∈ E(F_p) of order ord, scalar k = (k_{t−1}, ..., k_1, k_0)_2 ∈ Z_ord, security parameter w, d = t/w
Ensure: Q = k · P
Pre-computation
1: for j = (j_{w−1}, ..., j_1, j_0)_2 from 0 up to 2^w − 1 do
[...]

[...]
2: for j from 0 up to n − 1 do
[...]
6: end for
7: Return T_k, T_P

Algorithm 6 T_SM scalar multiplication [SCM+18]
Require: security parameter λ = m · n ∈ Z+, the order ord of base point P ∈ E(F_q), pre-computation tables T_k and T_P
Ensure: k, Q = k · P
1: k ← 0, Q ← ∞
2: for j from 0 up to n − 1 do
3:   row ← a random element of {0, ..., 2^m − 1}
4:   k ← k + T_k[row, j]
5:   Q ← Q + T_P[row, j]
6: end for
7: k ← k mod ord
8: Return k, Q
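For concreteness, Algorithm 6 can be mirrored in a few lines of Python. The sketch assumes the additive group Z_order as a toy stand-in for the elliptic curve group (so T_P[row][j] = T_k[row][j] · P is just a modular product) and assumes the pre-computation fills T_k with uniformly random scalars; the names are hypothetical.

```python
import random


def tsm_precompute(m, n_cols, order, P, rng):
    # Random table scalars T_k[row][j] and "points" T_P[row][j] = T_k[row][j] * P,
    # with the toy group Z_order standing in for the elliptic curve group.
    T_k = [[rng.randrange(1, order) for _ in range(n_cols)] for _ in range(2 ** m)]
    T_P = [[a * P % order for a in row] for row in T_k]
    return T_k, T_P


def tsm(T_k, T_P, order, rng):
    # One random row per column j; accumulate the scalar and the point.
    k, Q, rows = 0, 0, []
    for j in range(len(T_k[0])):
        row = rng.randrange(len(T_k))    # line 3: random row
        k += T_k[row][j]                 # line 4: long integer addition
        Q = (Q + T_P[row][j]) % order    # line 5: point addition (toy group)
        rows.append(row)                 # secret; the attacker sees only collisions
    return k % order, Q, rows


if __name__ == "__main__":
    rng = random.Random(5)
    order, P = 2**61 - 1, 3
    T_k, T_P = tsm_precompute(2, 128, order, P, rng)
    k, Q, rows = tsm(T_k, T_P, order, rng)
    print(Q == k * P % order)  # True
```

The returned rows list is exactly the hidden per-column choice whose equality pattern the collision information captures.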
Note that it is not known whether the output scalar k of T_SM scalar multiplication is uniformly random in [1, ord − 1], as required in step 1 of Algorithm 1, because k is an accumulated sum of random entries. However, we can evaluate the security of T_SM scalar multiplication by counting the number of ways of selecting random scalars from table T_k to produce a result k, which is $(2^m)^n = 2^{mn} = 2^{\lambda}$, the same as the complexity of a λ-bit scalar in conventional scalar multiplication. Whether the T_SM method outputs biased nonces has not yet been investigated; this question is beyond the scope of this work and remains future work.
It is worth noting that every entry in the pre-computed tables of T_SM scalar multiplication is independent, which makes it an extreme case of table-based scalar multiplication. For this reason, we explore the vulnerability of T_SM to our attack, although the threat model of the T_SM method does not consider MTA. Note also that it is difficult to reveal the table entries through STA on the pre-computation phase alone, because that phase is executed only once and takes a long time, while a digital oscilloscope has a limited sampling memory [DLO+19].
Although the notation in Section 2 is not completely consistent with T_SM scalar multiplication, T_SM scalar multiplication also repeatedly uses two pre-computed tables and uses one entry of a column per iteration. Thus, T_SM scalar multiplication can be represented as a case of Algorithm 3 with (14) for security parameter λ = m · n, ws(x) = 1 for any x, r = 2^m, and c = n.
In other words, step 4 in Algorithm 3 is skipped. With this setting, our attack can also be applied to T_SM scalar multiplication. Table 2 shows how many signatures and traces are required for our attack depending on the security parameter of the T_SM method.
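The upper bound on the number of signatures follows directly from Corollary 1: N = r · c + 1 suffices. A small sketch that evaluates it for both case studies; the comb mapping assumes c = d = ⌈t/w⌉, and the function names are hypothetical.

```python
from math import ceil


def required_signatures(r, c):
    # Corollary 1: more than r*c vectors in an (r*c)-dimensional space
    # are linearly dependent, so r*c + 1 signatures always suffice.
    return r * c + 1


def tsm_dims(m, n):
    return 2 ** m, n            # r = 2^m rows, c = n columns


def comb_dims(t, w):
    return 2 ** w, ceil(t / w)  # r = 2^w, c = d (assumed d = ceil(t / w))


if __name__ == "__main__":
    print(required_signatures(*tsm_dims(2, 128)))   # 513
    print(required_signatures(*comb_dims(256, 4)))  # 1025
```

For T_SM with m = 2 and n = 128, this bound gives 513, the number of traces used in the experiment of Section 5. Because the v_i are sparse, a dependency may already appear with fewer signatures.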

Experimental Results
In this section, we validate our attack by conducting a practical experiment on 256-bit T_SM scalar multiplication with security parameter λ = m · n = 2 × 128 as a proof of concept.

Experimental Setup and Trace Acquisition
We implement T_SM scalar multiplication with λ = 256, m = 2, and n = 128 on an ARM Cortex-M4-based STM32F405 microcontroller [Devb] mounted on the ChipWhisperer [OC14] CW308T-STM32F [Deva] target board. T_SM scalar multiplication (Algorithm 6) is implemented in C and Thumb-2 assembly language. In detail, line 4 for the scalar addition is implemented as a 256-bit long integer addition, and line 5 for the point addition is implemented as the "madd-2004-hmv" point addition [BL], described in Algorithm 7, where the entries randomly selected from T_P are passed as the operand P_2. For practical reasons, the first iteration of Algorithm 6, where j = 0, is implemented as simple loading, i.e., copying the corresponding data from T_k and T_P into the variables k and Q according to the first row value.
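To make the "eight multiplications and three squarings" count concrete, the following sketch transcribes the mixed Jacobian-affine addition as we recall it from the Explicit-Formulas Database entry "madd-2004-hmv" (check it against [BL] before relying on it), with counters for field multiplications and squarings. It is plain Python over a small prime field, not the Thumb-2 implementation used in the experiment; the third and fourth operations are the two multiplications that touch the table coordinates X2 and Y2. The helper names are hypothetical.

```python
def madd_2004_hmv(X1, Y1, Z1, X2, Y2, p, ops):
    """Mixed addition (X1:Y1:Z1) + (X2, Y2), Jacobian + affine, for P1 != +-P2.

    `ops` counts field multiplications ("M") and squarings ("S").
    """
    def mul(a, b):
        ops["M"] += 1
        return a * b % p

    def sqr(a):
        ops["S"] += 1
        return a * a % p

    T1 = sqr(Z1)
    T2 = mul(T1, Z1)
    T1 = mul(T1, X2)          # multiplication with table coordinate X2
    T2 = mul(T2, Y2)          # multiplication with table coordinate Y2
    T1 = (T1 - X1) % p        # H
    T2 = (T2 - Y1) % p        # R
    Z3 = mul(Z1, T1)
    T3 = sqr(T1)
    T4 = mul(T3, T1)          # H^3
    T3 = mul(T3, X1)          # X1 * H^2
    T1 = 2 * T3 % p
    X3 = (sqr(T2) - T1 - T4) % p
    T3 = mul((T3 - X3) % p, T2)
    Y3 = (T3 - mul(T4, Y1)) % p
    return X3, Y3, Z3


def to_affine(X, Y, Z, p):
    zi = pow(Z, -1, p)
    return X * zi * zi % p, Y * zi * zi * zi % p


def affine_add(P1, P2, p):
    # Chord rule for P1 != +-P2, used only as a reference result.
    (x1, y1), (x2, y2) = P1, P2
    lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p
```

The formula does not involve the curve coefficient a, so any short Weierstrass curve can be used to test it.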

Identification and Extraction of Target Operation Traces
In the next step, we perform visual inspection to identify and extract the target operation traces that exhibit collision characteristics, and then construct the collision information by grouping. To this end, we apply a low-pass filter with a cutoff frequency lower than the operating frequency, e.g., 1 MHz, so that the operations can be identified easily, as presented in Figure 4.
First, we determine which parts of the power consumption traces correspond to the long integer additions and the point additions. As shown in Figure 4 (a), each point addition can be identified by eleven peaks because it consists of eight multiplications and three squarings of long integers. The long integer additions can also be identified because each is located before a point addition. We then identify the loading operation, which is located before the first long integer addition and point addition. Our target operations for the attack are thus the loading operation for j = 0 and, for $j \in \{1, \ldots, 127\}$, one long integer addition and the two long integer multiplications in lines 3 and 4 of Algorithm 7, which manipulate $X_2$ and $Y_2$ selected from $T_P$ according to row (these operations are highlighted with gray boxes in Figure 4 (a) and (c)). Note that, in a real attack scenario, an adversary may not know when the target long integer multiplications are executed and therefore has to guess their location. However, because this location is necessarily in the early stages of the iteration, guessing it should not be costly and thus does not seriously degrade the feasibility of the attack.
Using the power consumption samples of the target operations as references, we determine the locations of the target operations in the other 512 power consumption traces of T_SM scalar multiplication by cross-correlation. We then reconstruct the target operation traces $C_{i,j}$, where $i \in \{1, \ldots, 513\}$ is the trace acquisition number. Thus, $C_{i,0}$ consists of the samples of the loading operation, and $C_{i,j}$ with $j \in \{1, \ldots, 127\}$ consists of the samples of one long integer addition and two long integer multiplications, concatenated sequentially.
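Trace alignment by cross-correlation can be sketched as a sliding Pearson correlation between a reference pattern and each window of a trace. This is a plain-Python illustration with hypothetical names, not the tooling used in the experiment; a practical implementation would typically use FFT-based correlation for speed.

```python
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / (va * vb) ** 0.5


def locate(trace, reference):
    """Offset at which `reference` best matches `trace` (max Pearson correlation)."""
    L = len(reference)
    return max(range(len(trace) - L + 1),
               key=lambda o: pearson(trace[o:o + L], reference))


def cut(trace, reference):
    # Extract the subtrace aligned with the reference pattern.
    o = locate(trace, reference)
    return trace[o:o + len(reference)]
```

Because Pearson correlation is invariant to gain and offset, the reference taken from one trace can match the same operation in another trace even if the amplitude drifts.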
Figure 4: [...] In each iteration, gray boxes indicate the intervals of the three target operations: one long integer addition, which manipulates an entry from table T_k, and two long integer multiplications, which manipulate an entry from table T_P consisting of the two coordinate values of an elliptic curve point.
Note that the 1 MHz low-pass-filtered traces are used only for identifying the target operations. In the rest of the attack, the original 5 MHz low-pass-filtered traces are used for trace cutting by cross-correlation and for the reconstruction of $C_{i,j}$. After extracting all $C_{i,j}$, we apply the integration compression technique [MOP08] to each $C_{i,j}$ for noise reduction, integrating every 50 samples into a single sample because we sampled 50 samples per clock cycle. We keep the notation $C_{i,j}$ for these compressed traces because only they are used in the rest of the attack.
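Integration compression here sums each group of 50 consecutive samples (one clock cycle) into one sample. A minimal sketch; the function name is hypothetical, and trailing samples that do not fill a whole cycle are assumed to be dropped.

```python
def integrate(trace, samples_per_clock=50):
    """Sum each block of `samples_per_clock` consecutive samples into one value.

    Trailing samples that do not fill a whole block are dropped.
    """
    usable = len(trace) - len(trace) % samples_per_clock
    return [sum(trace[i:i + samples_per_clock])
            for i in range(0, usable, samples_per_clock)]


if __name__ == "__main__":
    print(integrate([1] * 120))  # [50, 50]
```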

Making Difference Traces and Finding Points of Interest
In this step, we construct the difference traces D_{i,j}, as described in Algorithm 8, by computing the average trace C̄_j and subtracting it from each C_{i,j}, independently for each j ∈ {0, ..., 127}, in order to expose collision characteristics; we then find the points of interest (PoIs). Figure 5 shows examples of difference traces for j = 0, and Figures 6, 7, and 8 show examples for j = 127, as these represent the two types of target operations. In these figures, all D_{i,j} are plotted in different colors according to the actual row value used during trace acquisition, to visualize the exploitable collision characteristics. Of course, in a real attack scenario, an adversary cannot produce such color-coded figures at this stage, because the D_{i,j} have not yet been grouped. Hence, to find PoIs, the adversary must instead use a variance trace, which represents the variance of the samples at each time index.
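The construction of the difference traces and the variance trace can be sketched with NumPy as follows. This is an illustrative sketch of the idea behind Algorithm 8, not its listing; it assumes the compressed traces C_{i,j} for one fixed j are stacked into an array of shape (N, T), and the function names are hypothetical.

```python
import numpy as np


def difference_traces(C: np.ndarray) -> np.ndarray:
    """D_{i,j} = C_{i,j} - mean over i of C_{i,j}.

    C has shape (N, T) for one fixed j. Because the mean is taken along axis 0,
    a stack of shape (N, J, T) is also handled, with every j treated independently.
    """
    return C - C.mean(axis=0, keepdims=True)


def variance_trace(D: np.ndarray) -> np.ndarray:
    """Variance of the difference traces at each time index (D has shape (N, T))."""
    return D.var(axis=0)
```

Samples that do not depend on the loaded entry have near-zero variance after the subtraction, so the peaks of the variance trace point to the data-dependent moments.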
For j = 0, because the loading operation only loads data from the tables T_k and T_P and stores the same data in some variables, every peak in the variance trace can be selected as a PoI. In contrast, for j ∈ {1, ..., 127}, we heuristically choose only the moments at which data are loaded from the tables, because a load operation always exhibits collision characteristics regardless of how the long integer operation is computed. These PoIs must be selected carefully, because they are difficult to identify, as shown in Figures 6, 7, and 8. [...] two long integer multiplications (Figure 7 and Figure 8) can be determined by relatively low peaks located before large peaks in the variance trace. Note that, in this case, the variance trace is generated from all subtraces corresponding to the same operation, i.e., every D_{i,j} with j ∈ {1, ..., 126}. Finally, using the selected PoIs, which are given as sample indices, we reduce each D_{i,j} to only the samples at those PoIs.
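PoI selection can be sketched as below. The sketch assumes that a "peak" is a local maximum of the variance trace above a threshold chosen by the analyst; the threshold, the helper names, and the automatic peak picking are all assumptions, since for j ∈ {1, ..., 127} the PoIs are chosen heuristically from the timing of the table loads rather than by a fixed rule. `pooled_variance` mirrors the note that, for these iterations, the variance trace is computed over all subtraces of the same operation.

```python
import numpy as np


def pooled_variance(D_stack: np.ndarray) -> np.ndarray:
    """Variance per time index over all difference traces of the same operation.

    D_stack has shape (N, J, T): N traces, J iterations j, T samples each.
    """
    N, J, T = D_stack.shape
    return D_stack.reshape(N * J, T).var(axis=0)


def select_pois(var_trace, threshold: float) -> np.ndarray:
    """Indices of local maxima of the variance trace that exceed `threshold`."""
    v = np.asarray(var_trace)
    return np.array(
        [t for t in range(1, len(v) - 1)
         if v[t] > threshold and v[t] >= v[t - 1] and v[t] > v[t + 1]],
        dtype=int,
    )


def restrict_to_pois(D: np.ndarray, pois: np.ndarray) -> np.ndarray:
    """Keep only the samples at the PoIs (last axis of D is time)."""
    return D[..., pois]
```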

Grouping Traces with Correlation Coefficients
Now, we determine the group label of each D_{i,j} by applying Algorithm 9 to the N = 513 traces, where i ∈ {1, ..., N} and j is fixed, e.g., j = 0. First, we prepare two N × 2^m matrices: ρ for the Pearson correlation coefficients and λ for determining the group labels. Then, we calculate the correlation coefficients ρ_{i,1} between trace D_{1,0} and each of the N traces D_{i,0}. Comparing each ρ_{i,1} with the mean of all ρ_{i,1} (i ∈ {1, ..., N}), we set the i-th entry of the first column of λ to one if ρ_{i,1} is larger than the mean, and to zero otherwise. Next, we find the trace D_{i,0} with the lowest ρ_{i,1}, i.e., the trace least similar to the first group, and calculate ρ_{i,2} for all i ∈ {1, ..., N} with respect to it. We then set the entries of the second column of λ by comparing each ρ_{i,2} with the mean of all ρ_{i,2}, as described above. The remaining columns of λ are determined in the same way: each time, we find the trace least similar to the groups found so far and use it to set the next column of λ. This least similar trace is found among the row indices whose entries of λ are all zero, as the one minimizing the sum of its correlation coefficients with the previously found groups.
After all entries of ρ and λ are calculated, we determine the vector (l_1, ..., l_N) of the j-th components of the collision information vectors for the fixed j, i.e., g_{i,j} = l_i. First, we assign D_{1,0} to the first group, i.e., l_1 = 0. For each remaining D_{i,0}, if the i-th row of λ contains exactly one nonzero entry, the group label is determined by the column index of that entry. If the row contains more than one nonzero entry, the group label is determined by the column index of the largest correlation coefficient in the i-th row of ρ. By repeating Algorithm 9 for the remaining D_{i,j} with j ∈ {1, ..., 127}, we successfully determine the entire collision information vectors G_i for all i ∈ {1, ..., N} without error.
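The grouping procedure described in the two paragraphs above can be sketched as follows. This is our reading of the textual description, not the authors' listing of Algorithm 9. Columns are 0-indexed (column 0 corresponds to ρ_{i,1}), the group label equals the column index, and for rows of λ with no nonzero entry (a case the text does not cover) we assume the same argmax-of-ρ rule used for rows with several nonzero entries. The function names are hypothetical.

```python
import numpy as np


def pearson_rows(ref: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Pearson correlation between vector `ref` and every row of X."""
    Xc = X - X.mean(axis=1, keepdims=True)
    rc = ref - ref.mean()
    return (Xc @ rc) / (np.linalg.norm(Xc, axis=1) * np.linalg.norm(rc))


def group_labels(D: np.ndarray, m: int):
    """Assign one of 2**m labels to each row of D (N traces x PoI samples)."""
    N, K = D.shape[0], 2 ** m
    rho = np.zeros((N, K))
    lam = np.zeros((N, K), dtype=int)
    ref = 0  # the first trace defines group 0
    for k in range(K):
        rho[:, k] = pearson_rows(D[ref], D)
        lam[:, k] = rho[:, k] > rho[:, k].mean()  # 1 if above the column mean
        if k == K - 1:
            break
        # Next reference: among rows of lam that are all zero so far, the trace
        # whose summed correlation with the previous references is smallest.
        unmarked = np.flatnonzero(lam[:, : k + 1].sum(axis=1) == 0)
        pool = unmarked if unmarked.size else np.arange(N)
        ref = pool[np.argmin(rho[pool, : k + 1].sum(axis=1))]
    labels = np.empty(N, dtype=int)
    for i in range(N):
        hits = np.flatnonzero(lam[i])
        # Exactly one marked column: use it; otherwise take the argmax of rho.
        labels[i] = hits[0] if hits.size == 1 else int(np.argmax(rho[i]))
    labels[0] = 0  # l_1 = 0 by definition
    return labels, rho, lam
```

The labels are only defined up to renaming of the groups, which is sufficient here because the key recovery step uses only whether two loaded entries are equal, not their values.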

Recovery of the Private Key
The final step is to recover the private key by exploiting the collision information acquired in the previous step. To find linearly dependent nonces, we transform the collision information vectors into their one-hot representations v_i and construct the matrix M as in (7). Next, we compute M′ by applying Gaussian elimination to M. Because M consists of 513 transformed vectors, which is more than the total number of entries in the pre-computation table (i.e., more than the number of components of each one-hot vector), there must exist at least one row whose components in part A are all zero. The row vector of B corresponding to such a zero row of A then gives the coefficients of a linear combination of the nonces that equals zero. Finally, we recover the ECDSA private key using (10).
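The last step can be sketched as follows, under stated assumptions. We take M = [A | B] with A holding the one-hot vectors and B the N × N identity, perform the elimination modulo the prime group order n, and solve for the private key d from the standard ECDSA relation s_i = k_i^{-1}(h_i + r_i·d) mod n. The resulting closed form, d = -(Σ c_i s_i^{-1} h_i)·(Σ c_i s_i^{-1} r_i)^{-1} mod n, is our own derivation and is only assumed to correspond to (10); the helper names are hypothetical. Each nonce is assumed to be a linear function, with unknown coefficients, of its one-hot vector, which is why a zero combination of one-hot vectors yields a zero combination of nonces. For example, with 128 columns and 2^m = 4 labels per column (an assumption based on j ∈ {0, ..., 127} and m = 2), each one-hot vector has 512 components, so 513 vectors are necessarily linearly dependent.

```python
import numpy as np


def one_hot(G: np.ndarray, K: int) -> np.ndarray:
    """Map collision vectors G (N x J, labels 0..K-1) to N x (J*K) 0/1 rows."""
    N, J = G.shape
    V = np.zeros((N, J * K), dtype=np.int64)
    for j in range(J):
        V[np.arange(N), j * K + G[:, j]] = 1
    return V


def find_dependency(V: np.ndarray, n: int):
    """Row-reduce [A | B] = [V | I] over GF(n) (n prime).

    Returns the B part of a row whose A part became zero, i.e. coefficients c
    with sum_i c_i * V[i] = 0 (mod n), or None if no such row exists.
    """
    N, L = V.shape
    M = [[int(x) % n for x in V[i]] + [1 if r == i else 0 for r in range(N)]
         for i in range(N)]
    row = 0
    for col in range(L):
        piv = next((r for r in range(row, N) if M[r][col] != 0), None)
        if piv is None:
            continue
        M[row], M[piv] = M[piv], M[row]
        inv = pow(M[row][col], -1, n)  # modular inverse (Python 3.8+)
        M[row] = [x * inv % n for x in M[row]]
        for r in range(N):
            if r != row and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(a - f * b) % n for a, b in zip(M[r], M[row])]
        row += 1
    for r in range(row, N):
        if all(x == 0 for x in M[r][:L]):
            return M[r][L:]
    return None


def recover_key(c, h, r, s, n: int) -> int:
    """Solve sum_i c_i * s_i^{-1} * (h_i + r_i * d) = 0 (mod n) for d."""
    w = [ci * pow(si, -1, n) % n for ci, si in zip(c, s)]
    num = sum(wi * hi for wi, hi in zip(w, h)) % n
    den = sum(wi * ri for wi, ri in zip(w, r)) % n
    return (-num * pow(den, -1, n)) % n
```

`recover_key` raises `ValueError` (from `pow`) if the denominator is zero modulo n; in that case a different dependency, for example from another subset of signatures, would be needed.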

Discussion of Clustering Algorithms
Note that we provided a correlation-based clustering algorithm (Algorithm 9) and confirmed through a real experiment that it works well for m = 2, which shows that our attack is valid. However, this does not guarantee that the clustering algorithm works for a larger number of classes. Because the correlation of fewer PoIs is susceptible to noise,

Conclusion
We proposed a novel key recovery attack against ECDSA signature generation employing regular table-based scalar multiplication by exploiting side-channel collisions between unknown entries. Without knowing the actual values of the table entries, an attacker can extract, through vertical collision attacks, collision information indicating the group to which each entry belongs for every individual column of a pre-computation table. The private key can then be recovered by finding the condition under which several nonces are linearly dependent, exploiting only the collision information. Additionally, we explained that all of the unknown entries in the pre-computation table can be recovered using the recovered private key and a sufficient number of additional digital signatures and traces. We then presented case studies of our attack against ECDSA employing fixed-base comb and T_SM scalar multiplication to illustrate that our attack is readily applicable to other forms of table-based scalar multiplication. Finally, we demonstrated that our attack is a real threat by conducting a practical experiment with power consumption traces acquired during T_SM scalar multiplication operations on an ARM Cortex-M based microcontroller. We also detailed the validation process.