Some New Methods to Generate Short Addition Chains

Modular exponentiation and scalar multiplication are important operations in most public-key cryptosystems, and computing them quickly is essential to system efficiency. The shortest addition chain is one of the most important mathematical concepts for realizing this optimization. However, finding a shortest addition chain of length k is an NP-hard problem, whose time complexity is comparable to O($k!$). This paper proposes some novel methods to generate short addition chains. We first present a Simplified Power-tree method by deeply deleting the power-tree, whose time complexity is reduced to O($k^2$) at the cost of some increase in the addition chain length. Moreover, a Cross Window method and its variant are introduced by improving the Window method. More precisely, the Cross Window method uses the cross correlation to deal with the windows, and its pre-computation is optimized by the Addition Sequence algorithm. A theoretical analysis is conducted to show the correctness and effectiveness. Meanwhile, the experiments show that the new methods can obtain shorter addition chains than the existing methods. The Cross Window method with the Addition Sequence algorithm can attain a 9.5% reduction of the addition chain length, in the best case, compared to the Window method.


Introduction
Public-key cryptosystems are widely used in practice, but they are much slower than symmetric cryptosystems. In the process of encryption and decryption, modular exponentiation and scalar multiplication are the key factors impacting the efficiency. Common public-key cryptosystems include DH, RSA, ElGamal, ECC, etc. Modular exponentiation is used in DH, RSA and ElGamal, which is $y = x^e \bmod c$. (1)
In ECC, scalar multiplication by a positive integer e is used. In these two operations, the representation of a positive integer e as a sequence of doublings and additions is involved. In fact, the operations can be abstractly approached using 2's complements for finding addition-subtraction chains of 160-bit integers. In 2016, Brian et al. [KAJK16] used the Window method to find addition chains of smooth isogeny primes.
In this paper, we take into account some interesting observations about existing methods and put forward our novel ideas. To obtain a shortest addition chain, the exhaustive search of the Power-tree method is impractical. We study the Binary method carefully and find a "backward-adding" way to construct the addition chain with our simplified power-tree, which deletes a substantial number of nodes from the power-tree. For the Window method, adjacent windows are usually adopted, which limits the window combinations. We exploit cross windows, which diversify the window combinations, and present our Cross Window method for better results. For the pre-computation, a new Addition Sequence Algorithm is presented to build a shorter pre-computation of the used windows, which leads to our Cross Window method with the Addition Sequence Algorithm.
To sum up, the contributions of this paper are as follows:
1. Firstly, we present a Simplified Power-tree method by deeply deleting the power-tree, whose time complexity is reduced from O($r!$) to O($r^2$).
2. Secondly, a Cross Window method is introduced by improving the Window method, which achieves better results since more window combinations are handled by using cross windows.
3. Thirdly, a Cross Window method with the Addition Sequence Algorithm is given, which optimizes the pre-computation of the Cross Window method using our new Addition Sequence Algorithm and attains a 9.51% reduction of the addition chain length, in the best case, compared to the Window method.
The remainder of the paper is structured as follows. Section 2 briefly reviews some general existing methods. Section 3 gives a detailed description of our novel methods, including the Simplified Power-tree method, the Cross Window method and the Cross Window method with the Addition Sequence algorithm. We report our experiments in Section 4, which show that our methods can obtain shorter addition chains than the existing methods. Section 5 concludes the paper.

Binary Method
The Binary method (BM) uses the binary representation of an integer, and an optional addition is performed depending on whether a bit is 1 or 0. The general implementation of BM is shown in Alg. 1 (from [NMMZ17]).
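As a minimal sketch of the left-to-right binary method described above (not a transcription of Alg. 1; the function name `binary_method_chain` is a hypothetical choice), the chain can be built by doubling for every bit after the leading one and adding 1 whenever that bit is set:

```python
def binary_method_chain(e):
    """Return an addition chain for e >= 1 built by the binary method."""
    if e < 1:
        raise ValueError("e must be a positive integer")
    chain = [1]
    # Skip the leading 1 bit; it corresponds to the starting element 1.
    for bit in bin(e)[3:]:
        chain.append(chain[-1] * 2)      # doubling step
        if bit == "1":
            chain.append(chain[-1] + 1)  # optional addition of 1
    return chain


print(binary_method_chain(23))  # [1, 2, 4, 5, 10, 11, 22, 23]
```

The number of steps is `len(chain) - 1`. For 23 this is 7, which agrees with n(e) + h(e) − 2 = 5 + 4 − 2 when n(e) is the bit-length and h(e) is taken to be the Hamming weight of e.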

Power-tree Method
The Power-tree method (PTM) means that all nodes are represented in the form of a tree, and the nodes on the path are used as the addition chain of an integer. A complete power-tree without duplicate nodes on any path is a tree that contains all possible results, as shown in Fig. 1. The shortest addition chain of an integer can be determined by exhaustive search within all paths, which takes a long time. Let the depth of node 1 be 0. The number of subnodes of a node in depth r is not less than r + 1, and the total number of nodes in depth r is greater than 1 · 2 · 3 · ... · r = r!.
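To give a feel for why the exhaustive search is expensive, the following is a small sketch of such a search. It is only an illustration under stated assumptions: it explores star steps only (each new node is the last node on the path plus some node on the path), it uses iterative deepening instead of materializing the tree, and the name `shortest_star_chain` and the bound `max_steps` are hypothetical.

```python
def shortest_star_chain(e, max_steps=30):
    """Exhaustively search for a shortest star chain for e (small e only)."""

    def dfs(chain, limit):
        last = chain[-1]
        if last == e:
            return list(chain)
        remaining = limit - (len(chain) - 1)
        # Even doubling at every remaining step cannot reach e: prune.
        if (last << remaining) < e:
            return None
        for addend in reversed(chain):  # one child per node on the path
            nxt = last + addend
            if nxt <= e:
                chain.append(nxt)
                found = dfs(chain, limit)
                chain.pop()
                if found is not None:
                    return found
        return None

    # Iterative deepening: the first depth with a hit is the shortest.
    for limit in range(e.bit_length() - 1, max_steps + 1):
        found = dfs([1], limit)
        if found is not None:
            return found
    return None


print(len(shortest_star_chain(15)) - 1)  # 5
```

A path with r + 1 nodes has r + 1 candidate children in this sketch, which is where the factorial growth in the text comes from.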

Window Method
The idea of the Window method (WM) is to split the binary form of an integer into windows, which are then processed to obtain the addition chain in two parts: pre-computation and construction. Let the window length be k. Pre-computation selects 2 and all odd integers from 1 to $2^k - 1$, that is, $\{1, 2, 3, 5, 7, \ldots, 2^k - 1\}$, with length $2^{k-1} + 1$.
In WM, for the binary form of an integer e, we read a window w whose bit-length, denoted $n(w)$, satisfies $n(w) \le k$ and for which $MSB(w) = LSB(w) = 1$, where $MSB(w)$ and $LSB(w)$ indicate the most and the least significant bit of w respectively. As a result, w is in the pre-computation. In construction, $n(w)$ doubling steps and one star step are performed. For consecutive 0s, doubling steps are conducted directly. The implementation of WM is shown in Alg. 2.
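The pre-computation and construction described above can be sketched as follows. This is an illustrative reading of the text, not a transcription of Alg. 2: the function name `window_method_chain` is hypothetical, the sketch assumes $e \ge 2^k$ so that every pre-computed value stays below e, and it returns the chain as a sorted list of distinct values.

```python
def window_method_chain(e, k):
    """Return an addition chain for e using windows of at most k bits."""
    if k < 1 or e < (1 << k):
        raise ValueError("this sketch assumes k >= 1 and e >= 2**k")
    # Pre-computation: 1, 2 and every odd integer up to 2^k - 1.
    chain = {1, 2} | set(range(3, 1 << k, 2))
    bits = bin(e)[2:]
    current = 0  # value built so far; 0 means no window has been read yet
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            current *= 2              # doubling step for a 0 bit
            chain.add(current)
            i += 1
            continue
        # Read the longest window of at most k bits that ends in a 1.
        j = min(i + k, len(bits))
        while bits[j - 1] == "0":
            j -= 1
        w = int(bits[i:j], 2)
        if current == 0:
            current = w               # first window is already pre-computed
        else:
            for _ in range(j - i):    # n(w) doubling steps
                current *= 2
                chain.add(current)
            current += w              # one star step adding the window
            chain.add(current)
        i = j
    return sorted(chain)


print(window_method_chain(0b1011111, 3))
# [1, 2, 3, 5, 7, 10, 20, 40, 47, 94, 95]
```

Every element after the first is the sum of two smaller or equal elements of the chain, so sorting the collected values still yields a valid addition chain.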

New methods
In this section, we propose the Simplified Power-tree Method (SPTM), the Cross Window method (CWM) and its variant, the Cross Window method with the Addition Sequence algorithm (CWM-ASA). The first improves PTM by deleting nodes from the power-tree. The latter two improve WM by using cross windows and optimizing the pre-computation.

Simplified Power-tree Method
We first give another view of BM. Instead of mixing an optional addition into the doubling steps, we perform the optional additions after all the doubling steps are done, with the added number 1 adjusted to the corresponding numbers. This implementation of BM is Alg. 3. This view of BM shows a feasible way to construct the addition chain, which leads to the key point, "backward-adding", of our Simplified Power-tree method (SPTM).
SPTM is proposed by subtly deleting tree nodes, which results in relatively small time and space complexity. A simplified power-tree consists of a root chain, a main chain and branch chains. The structure of the root chain is BM(m), where m is a base integer used to build the branch chains, and the main chain is $\{2^i \mid n(m) \le i \le t,\ 2^t \le e < 2^{t+1}\}$. For each node $2^i$ in the main chain, a branch chain follows. The structure of the simplified power-tree is shown in Fig. 2.
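The alternative view of BM, with all doubling steps first and the additions afterwards, can be sketched as below. This is an illustration rather than a transcription of Alg. 3: the function name is hypothetical, and adding the powers of two from the highest set bit downwards is an assumed ordering.

```python
def bm_chain_doublings_first(e):
    """Binary method with all doublings first, then the optional additions."""
    if e < 1:
        raise ValueError("e must be a positive integer")
    t = e.bit_length() - 1
    chain = [1 << i for i in range(t + 1)]   # 1, 2, 4, ..., 2^t
    current = 1 << t
    for i in range(t - 1, -1, -1):
        if (e >> i) & 1:
            current += 1 << i                # add the power of two for this bit
            chain.append(current)
    return chain


print(bm_chain_doublings_first(23))  # [1, 2, 4, 8, 16, 20, 22, 23]
```

The chain has the same number of steps as the usual binary method (7 for e = 23), but every addend is taken from the doubling chain that was built first.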
Based on the simplified power-tree, the steps of constructing the addition chain of e are as follows:
(1) Obtain an addition chain of e by BM(e) and record it as the result.
(2) Search the branch chains and update the recorded addition chain whenever a shorter addition chain is found.
(3) Output the recorded addition chain.
More specifically, in step (2), for the branch chain following $c_i$, the corresponding addition chain of e is obtained directly if e is in the branch chain. Otherwise we form an initial chain and do the "backward-adding": whenever the sum of the newest integer in the current chain and a node of the initial chain, taken in backward order, is less than e, perform the addition and append the result to the current chain. Each branch chain can yield an addition chain of e, as proved in Theorem 1.
The main chain and the root chain contain $\{1, 2, 4, \ldots, c_i\}$, which can construct any integer from 1 to $2c_i - 1$ by BM, including C by BM(C). As a result, using the "backward-adding", each branch chain can generate an addition chain of e.
Theorem 2: Let $l_{\text{SPTM}}$ be the length of the addition chain obtained by SPTM. The range of $l_{\text{SPTM}}$ is
Proof: In the worst case, none of the branch chains yields a chain shorter than BM(e). The method then degenerates to BM, and the length is $n(e) + h(e) - 2$.
In the best case, all 1s in the binary form of e are divided into identical forms, with $h(w) - 1$ star steps and $n(e) - n(w)$ doubling steps. Thus the addition chain length is

Cross Window Method
The Window method (WM) only considers the adjacent correlation, which means that the windows are divided sequentially. In practice, there are windows with cross correlation, which means there is a cross relationship between the windows. Using cross windows can achieve a better result in some cases. For example, for the integer $(1011111)_2$, only 2 star steps are needed using cross windows, which is fewer than with adjacent windows, as shown in Fig. 3 and Fig. 4.
The Cross Window method (CWM) deals with the cross windows. CWM has two parameters: the valid window length k and the interval expansion length s. Like WM, CWM has two parts: pre-computation and construction. In pre-computation, the valid length k is divided into two parts: the length of the right part R = k/2, and the length of the left part L. When $s \ge 1$, the pre-computation of CWM is obtained by inserting interval zeros between the left and right parts of the pre-computation of WM. The general binary structure is $(a)_2 \| 0^s \| (b)_2$, $a \in \{1, 2, 3, \ldots, 2^L - 1\}$, $b \in \{1, 3, 5, 7, \ldots, 2^R - 1\}$, which is performed specifically as follows:
(1) Get 2 and all odd numbers from 1 to $2^R - 1$.
(3) Combine all numbers from 1 to $2^L - 1$ with the interval expansion and the numbers in step (1).
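The general binary structure $(a)_2 \| 0^s \| (b)_2$ can be enumerated with a few lines of code. The sketch below rests on assumptions that are not stated in the text: R is taken as `k // 2`, L as `k - R`, and b is padded to exactly R bits. The function name `cross_window_values` is hypothetical.

```python
def cross_window_values(k, s):
    """Enumerate integers of the form (a)_2 || 0^s || (b)_2 for CWM."""
    R = k // 2                       # assumed rounding for the right part
    L = k - R                        # assumed: the left part takes the rest
    odd_right = range(1, 1 << R, 2)  # b in {1, 3, ..., 2^R - 1}
    values = []
    for a in range(1, 1 << L):       # a in {1, ..., 2^L - 1}
        for b in odd_right:
            # b occupies the low R bits, then s zero bits, then a on top.
            values.append((a << (s + R)) | b)
    return sorted(values)


vals = cross_window_values(6, 3)
print(len(vals))                                # 28
print(bin(vals[0]), bin(vals[-1]))              # 0b1000001 0b111000111
```

With k = 6 and s = 3 there are 7 choices for a and 4 for b, hence 28 combined values; the full pre-computation additionally contains the small numbers from step (1).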
In the construction of CWM, for the binary form of e, we read a window w (where $n(w) \le k + s$ and $MSB(w) = LSB(w) = 1$). If $n(w) > s + R$, the interval expansion positions of the window are set to 0s. If $R < n(w) < s + R$, reset w as its first R bits with the tail 0s removed. If $n(w) < R$, do nothing. As a result, w is in the pre-computation. Then an aligned subtraction is performed: the highest non-zero bit of w is aligned with the highest non-zero bit of e, and w is subtracted from e. A concrete example is given in binary form in Fig. 6. For $e = (11111011100011001101001)_2$, the first window is processed as $w_0 = (111000111)_2$, and the aligned subtraction of $w_0$ leaves $e = (00011000000011001101001)_2$. This is repeated for $w_1 = (11)_2$, $w_2 = (11000101)_2$ and $w_3 = (1000001)_2$. Finally, e is zero.
From the above example, it is easy to see that $w_1$ is embedded in $w_0$, as in $(111110111)_2$. As a result, the addition chain cannot be constructed as in WM. To solve this problem, we record all the windows at their corresponding locations and construct the addition chain from the recorded windows. That is, do doubling steps bit-by-bit from the first recorded window and add the window at each recorded position. The implementation of CWM is shown in Alg. 5.
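The window-reading loop of the construction can be sketched as follows. It covers only the decomposition of e into recorded windows, not the chain construction of Alg. 5. Several details are assumptions: R is taken as `k // 2`, the boundary cases $n(w) = s + R$ and $n(w) = R$ are handled by the second and third rule respectively, and the name `cross_window_decompose` is hypothetical.

```python
def cross_window_decompose(e, k, s):
    """Split e into (window, shift) pairs with sum(w << shift) == e."""
    R = k // 2                                   # assumed rounding
    windows = []
    while e:
        n = e.bit_length()
        take = min(k + s, n)                     # read at most k + s bits
        shift = n - take
        w = e >> shift
        while w & 1 == 0:                        # drop tail 0s so LSB(w) = 1
            w >>= 1
            shift += 1
        nw = w.bit_length()
        if nw > s + R:
            # Zero the s interval bits that sit just above the low R bits.
            w &= ~(((1 << s) - 1) << R)
        elif nw > R:
            drop = nw - R                        # keep only the first R bits
            w >>= drop
            shift += drop
            while w & 1 == 0:
                w >>= 1
                shift += 1
        windows.append((w, shift))
        e -= w << shift                          # aligned subtraction
    return windows


e = 0b11111011100011001101001
for w, shift in cross_window_decompose(e, 6, 3):
    print(bin(w), shift)
# 0b111000111 14
# 0b11 18
# 0b11000101 3
# 0b1000001 0
```

With k = 6 and s = 3 the sketch reproduces the four windows of the example above, and the shift of the second window (18) being larger than that of the first (14) shows how $w_1$ ends up embedded in $w_0$.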

Cross Window Method with Addition Sequence Algorithm
The pre-computation of CWM can be optimized, since some integers in the pre-computation may not be used as a window. In this paper, a new Addition Sequence Algorithm (ASA) is presented to construct a short pre-computation of the used windows. Addition Sequence (AS) refers to the shortest addition chain containing several given integers, and finding it is an NP-complete problem. However, AS is solvable in CWM, since only the pre-computation is involved, which contains small integers. When we obtain a shorter pre-computation, we can also use a larger valid window length and interval expansion length, and may thus obtain a shorter addition chain. Now we give a pragmatic ASA, which can quickly find a short addition chain containing all the used windows. For an increasing sequence $A = \{e_0, e_1, \ldots, e_{d-1}\}$, let the last two numbers be x and y ($y > x$), and let $y = tx + C$ ($0 \le C < x$). For $tx$, we get $BM_x(tx) = \{a_0 x, a_1 x, a_2 x, \ldots, tx\}$ and put it in A in increasing order. We put C in A in increasing order if it is non-zero and not already in A. Thus the addition chain from x to y is formed. Repeat the above steps for the following two numbers in A, in reverse order, until all integers in A are solved. Finally, an addition chain containing $e_0, e_1, \ldots, e_{d-1}$ is obtained. The implementation of ASA is shown in Alg. 6.

Algorithm 6 Addition Sequence Algorithm
add $BM_x(tx)$ into A in increasing order
if $C \neq 0$ and $C \notin A$ then
add C into A in increasing order
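A runnable sketch of the ASA idea is given below. It follows the prose description rather than the full Alg. 6, so the details are assumptions: the working set always contains 1, "the last two numbers" are taken to be the current number y and its predecessor x in the set, and the names `bm_chain` and `addition_sequence` are hypothetical.

```python
def bm_chain(t):
    """Binary-method addition chain for t >= 1."""
    chain = [1]
    for bit in bin(t)[3:]:
        chain.append(chain[-1] * 2)
        if bit == "1":
            chain.append(chain[-1] + 1)
    return chain


def addition_sequence(targets):
    """Return a sorted addition chain containing every integer in targets."""
    A = set(targets) | {1}
    y = max(A)
    while y > 1:
        x = max(a for a in A if a < y)    # predecessor of y in A
        t, C = divmod(y, x)               # y = t*x + C with 0 <= C < x
        # BM_x(tx): the binary-method chain of t with every element scaled by x.
        A.update(a * x for a in bm_chain(t))
        if C != 0:
            A.add(C)                      # a set ignores C if already present
        y = x                             # continue with the next pair downwards
    return sorted(A)


print(addition_sequence([5, 7]))     # [1, 2, 4, 5, 7]
print(addition_sequence([47, 117]))
# [1, 2, 4, 5, 10, 11, 22, 23, 46, 47, 94, 117]
```

Because all inserted multiples of x lie between x and y, and C lies below x, every element is resolved by smaller elements and the sorted set is a valid addition chain. The result is short in practice but is not claimed to be the shortest possible.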

Proof:
The difference between CWM-ASA and CWM is the pre-computation. The lower bound of the pre-computation length in CWM-ASA is exactly the pre-computation length in CWM, thus the range of $l_{\text{CWM-ASA}}$ is the same as that of $l_{\text{CWM}}$.
Theorem 6: In general, let the number of recorded windows be v, the length of the pre-computation be u, and the first window be $w_0$. The addition chain length obtained by CWM-ASA is $l_{\text{CWM-ASA}} = u + n(e) - n(w_0) + v - 1$.
Proof: In CWM-ASA, we first construct the pre-computation with length u, and then perform $n(e) - n(w_0)$ doubling steps and $v - 1$ star steps for the recorded windows other than the first window. Thus the addition chain length obtained by CWM-ASA is $l_{\text{CWM-ASA}} = u + n(e) - n(w_0) + v - 1$.

Numerical Results
In this section, we implement BM, WM, SPTM, CWM and CWM-ASA and compare their performance. We first show the performance of these methods on small integers with l ≤ 22. Then a general case is examined with integers generated randomly with different Hamming weights. Moreover, integers of the types for which SPTM is effective are presented to indicate the advantages of SPTM in some cases. The parameters are selected as: WM: 1 ≤ k ≤ 20; SPTM: 1 ≤ m ≤ 63 and m is odd; CWM: 1 ≤ k ≤ 10, 0 ≤ s ≤ 10; CWM-ASA: 1 ≤ k ≤ 20, 0 ≤ s ≤ 20. The final result of a method for an integer is the shortest addition chain length within the parameter range.

The Integers with l ≤ 22
For the 365634 positive integers with l ≤ 22 [Cli], the results of BM, WM, SPTM, CWM and CWM-ASA are shown in Table 1. In this range, the first row shows that the proportions of optimal results for BM, WM, SPTM, CWM and CWM-ASA are 2.16%, 12.63%, 18.76%, 23.68% and 62.58% respectively, and the last row shows that the average gaps from the shortest chain are 3.5433, 1.4605, 1.1369, 1.0475, and 0.3890 respectively. The results of SPTM, CWM and CWM-ASA are better than those of BM and WM, and are more concentrated in the part with smaller gaps. CWM-ASA has the best results: its optimal and suboptimal results (where the gap from the shortest chain is 1) account for 98.53%.

The Integers Generated Randomly with Different Hamming Weight
Let $p = \frac{\text{Hamming weight}}{\text{bit-length}}$, which means that the bit 1 occurs with probability p. Select the bit-length as 160, 384, 512, 1024, 2048, 4096 and p as 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9. Generate 50 integers for each combination. The average addition chain lengths are shown in Table 2 by bit-length. In the random case, for each bit-length, the chain lengths obtained by SPTM are greater than those of WM, which shows that SPTM is not effective in this case. The results obtained by CWM and CWM-ASA are better than those of WM, and the chain length obtained by CWM-ASA is relatively short. Fig. 7 shows the chain length optimization degree of CWM-ASA compared with WM. When p ≤ 0.5, the optimization degree generally declines as p increases, and the overall optimization degree is relatively low; when p = 0.5, the optimization degree is approximately the lowest; when p ≥ 0.5, the optimization degree generally increases with p, and the overall optimization degree is relatively high.
As for the bit-length, the optimization degree of CWM-ASA declines as the bit-length increases. This is because extra doubling steps are unavoidably introduced as the bit-length increases, so that the overall cardinality becomes larger.
In addition, numbers with a larger Hamming weight (p = 0.95) are tested, as shown in Table 3. For p = 0.95, the average optimization degree of CWM-ASA is 43.11% compared with BM. When the bit-length is 4096, the average addition chain length obtained by BM is 7988.70, while that of CWM-ASA is 4414.82, and the optimization degree reaches 44.74%. The average optimization degree of CWM-ASA is 7.89% compared with WM. When the bit-length is 160, the average addition chain length obtained by WM is 202.06, while that of CWM-ASA is 182.84, and the optimization degree reaches 9.51%.
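The quoted optimization degrees are consistent with reading the optimization degree as the relative reduction of the average chain length with respect to the baseline. The short check below makes that reading explicit; the function name `optimization_degree` is a hypothetical helper, and the formula is inferred from the reported numbers rather than quoted from a definition.

```python
def optimization_degree(baseline_len, new_len):
    """Relative reduction of the chain length with respect to the baseline."""
    return (baseline_len - new_len) / baseline_len


# WM vs. CWM-ASA at 160 bits, p = 0.95
print(f"{optimization_degree(202.06, 182.84):.2%}")    # 9.51%
# BM vs. CWM-ASA at 4096 bits, p = 0.95
print(f"{optimization_degree(7988.70, 4414.82):.2%}")  # 44.74%
```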

The Integers of Effective Types for SPTM
SPTM is effective for integers that have windows whose highest bit is followed by a long series of 0s and then the rest of the window. In this case, the window is so long that the pre-computations of WM and CWM become overwhelming. Without using pre-computation, the result of SPTM is better.
(2) An integer of k bits is generated randomly, then one bit 1 and s 0s are placed ahead of it to form a window, and the window is replicated into k + 1 copies.
(3) The positions of these windows are randomly generated, with the distance between windows not less than k. Then an integer is obtained from these windows after removing the tail 0s.
The average test results are shown in Table 4. In this case, the results obtained by SPTM are the best, and the results obtained by CWM and CWM-ASA are also better than those obtained by WM. This shows that although SPTM is not suitable for random integers, it can achieve the best results among the methods when the windows have their highest bit followed by a considerable number of 0s.

Computational Cost and Memory Usage
In SPTM, for any given positive integer e, because the main chain and branch chains mainly contain doubling steps, their lengths are approximately equal to the bit-length of e (i.e., $O(\log e)$). The number of branch chains is also $O(\log e)$, and a total of $O((\log e)^2)$ additions are performed. Thus, the time complexity of SPTM is $O((\log e)^2)$. The branch chains are searched one by one and the recorded addition chain is constantly updated, so that the space complexity is $O(\log e)$.
In CWM and CWM-ASA, the computational cost and memory usage mainly come from the generation of the obtained addition chain. The computational cost and memory usage of the pre-computation, the window locations and the chain obtained by ASA are negligible, since they handle small integers as windows. The length of the obtained addition chain is $O(\log e)$, and each element is generally added by a doubling step or a star step. Thus, for CWM and CWM-ASA, the time complexity and the space complexity are both $O(\log e)$.
We give an estimate of the memory usage of the proposed methods in Table 5. For an addition chain of e, each element in the addition chain needs at most $n(e)$ bits of memory, and the total memory usage is approximately $(n(e))^2$ bits. SPTM needs twice the memory because it stores both the current chain and the recorded chain. The proposed methods can be performed in a short time. SPTM can complete the computation in 1 second when the bit-length of the target integer is less than 2048, and in several seconds for integers of 4096 bits. CWM and CWM-ASA only need several milliseconds for computation, even for integers of 4096 bits.
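As a quick worked example of the $(n(e))^2$ estimate, the following sketch converts it to bytes for a few bit-lengths. It is only an arithmetic illustration of the approximation stated above, not a reproduction of Table 5, and the helper name `chain_memory_bytes` is hypothetical.

```python
def chain_memory_bytes(bit_length, copies=1):
    """Approximate bytes needed to store `copies` chains of about n(e) elements."""
    return copies * bit_length * bit_length // 8


for n in (160, 1024, 4096):
    single = chain_memory_bytes(n)            # CWM / CWM-ASA: one chain
    double = chain_memory_bytes(n, copies=2)  # SPTM: current + recorded chain
    print(n, single, double)
# 160 3200 6400
# 1024 131072 262144
# 4096 2097152 4194304
```

Under this approximation a 4096-bit target needs about 2 MiB for one chain and about 4 MiB for SPTM.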

Conclusion
In this paper, we proposed a Simplified Power-tree method and a Cross Window method with a new Addition Sequence algorithm. The Simplified Power-tree method constructs a power-tree with deep deletion, which is more suitable when the windows have the highest bit followed by a considerable number of 0s. The Cross Window method considers the windows with cross relationship. The cross windows are processed by recording the window positions for recovery. Furthermore, the pre-computation is optimized with the Addition Sequence Algorithm. The Cross Window method is slightly better than the Window method, and the Cross Window method with the Addition Sequence algorithm has a better optimization, especially in the case of large Hamming weight. Roughly speaking, the average optimization degree is 7-8%, and the best case is 9-10%.