Don’t Forget Pairing-Friendly Curves with Odd Prime Embedding Degrees

. Pairing-friendly curves with odd prime embedding degrees at the 128-bit security level, such as BW13-310 and BW19-286, sparked interest in the ﬁeld of public-key cryptography as small sizes of the prime ﬁelds. However, compared to mainstream pairing-friendly curves at the same security level, i.e., BN446 and BLS12-446, the performance of pairing computations on BW13-310 and BW19-286 is usually considered ineﬃcient. In this paper we investigate high performance software implementations of pairing computation on BW13-310 and corresponding building blocks used in pairing-based protocols, including hashing, group exponentiations and membership testings. Firstly, we propose eﬃcient explicit formulas for pairing computation on this curve. Moreover, we also exploit the state-of-art techniques to implement hashing in G 1 and G 2 , group exponentiations and membership testings. In particular, for exponentiations in G 2 and G T , we present new optimizations to speed up computational eﬃciency. Our implementation results on a 64-bit processor show that the gap in the performance of pairing computation between BW13-310 and BN446 (resp. BLS12-446) is only up to 4 . 9% (resp. 26%). More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109 . 1% − 227 . 3%, 100% − 192 . 6%, 24 . 5% − 108 . 5% and 68 . 2% − 145 . 5% faster in terms of hashing to G 1 , exponentiations in G 1 and G T , and membership testing for G T , respectively. These results reveal that BW13-310 would be an interesting candidate in pairing-based cryptographic protocols.


Introduction
A pairing on an elliptic curve E defined over a prime field F p is a non-degenerate bilinear map of the form e : G 1 × G 2 → G T , where G 1 , G 2 and G T are three groups with the same order r.The two input groups G 1 and G 2 lie in E(F p k ) and the output group G T is a subgroup of F * p k , where k is the smallest positive integer such that r | p k − 1. Taking advantage of the powerful bilinearity property of pairings, a range of cryptographic protocols are designed, such as authenticated key agreements [CK03,Sco13], direct anonymous attestation (DAA) [BCC04, YCZ + 21] and Succinct Non-interactive ARguments of Knowledge (SNARKs) [EHG22,EHG20,AEHG22]. Very recently, pairings were also used to speed up group membership testing on several non-pairing-friendly curves [Kos22].
The fundamental security of pairing-based protocols depends on the difficulty of solving discrete logarithm problem (DLP) on the three pairing groups.The best-known discrete logarithm algorithm on elliptic curves is the Pollard's rho algorithm [Pol75], which requires around √ r group operations.It means that the size of r is at least 256-bit to reach the 128-bit security level.On finite fields side, the best attack algorithm is derived from the Number Field Sieve (NFS) family (eg.[Sch93]).Before 2015, a 3072-bit full extension field F p k was widely believed to be 128-bit secure under the attack of NFS.With a series of new variants of NFS proposed [BGK15, KB16,KJ17], the asymptotic complexity of NFS has decreased significantly.In particular, the special extension tower number field sieve (SexTNFS) [KB16] is tailored to pairing-friendly fields, i.e., the characteristic p can be represented as a tiny-coefficients polynomial of moderate degree.For example, according to the estimate in [BD19,GMT20], the real security level of BN254 is around 100 ∼ 103-bit under the attack of SexTNFS.As a result, parameters of many pairing-friendly curves have to be re-evaluated for achieving the desired security level.In 2021, Guillevic [Gui20] recommended a list of curves at the updated 128-bit security level.In the new estimation, the author found that BN446 and BLS12-446 are best choices for achieving the 128-bit security level in the BN and BLS families, respectively.

Pairing-friendly curves with fast exponentiation in G 1
The development of NFS also affects the selection of pairing-friendly curves in many pairing-based cryptographic protocols.A common scenario is that a protocol is designed to minimize the workload of one party equipped with resource-constrained devices.For example, in the pairing-based DAA scheme the Trusted Platform Module (TPM) is a small discrete chip that is required to perform a few exponentiations in G 1 .One of challenges in the design of the DAA scheme is to minimize the computational cost of the TPM [YCZ + 21].In this situation, pairing-friendly curves with fast exponentiation in G 1 are very attractive.Or equivalently, curves equipped with small size of prime field are well-suited for the DAA scheme.To this aim, Clarisse et al. [CDS20] recommended two 128-bit secure curves: BW13-310 with embedding degree 13 over a 310-bit field, and BW19-286 with embedding degree 19 over a 286-bit field.Besides these, there actually exist other candidates: BLS24-315 and BLS48-286.For these curves, their characteristic p can be represented by five computer words on a 64-bit processor.Among these curves, BW13-310 is competitive because of the small size of the full extension field F p k .It is worth noting that even compared to BN446 and BLS12-446, BW13-310 also has advantage in terms of the efficiency of full extension field arithmetic.Moreover, the odd prime embedding degree k on BW13-310 also leads to a large value of ϕ(k), which induces a small number of iterations for both Miller loop of the optimal pairings [Ver09] and the group exponentiations in G 2 and G T [GS08].

Contributions
In this paper, we give a detailed study of BW13-310.We show that this curve is a powerful candidate in pairing-based protocols at the updated 128-bit security level.Our contributions are summarized as follows: • In Section 3, we propose a new formula for computing the optimal pairing on BW13-310.We show that the computational cost of the Miller loop comes mostly from two evaluations at the same Miller function of bit length around log r/(2ϕ(k)).On this basis, we propose a shared Miller loop such that the two function evaluations can be accomplished simultaneously.In addition, we also give a slight optimization for the final exponentiation.By using these techniques, we also discuss how to compute the products of pairings on this curve in Section 4.
• In Section 5, we focus on optimizing group exponentiations in G 2 and G T on BW13-310.In the case of G 2 , we show that GLV [GLV01] and GLS [GLS09] methods can be combined to build a 2ϕ(k)-dimensional decomposition, which means that the number of point doublings is only around log r/(2ϕ(k)) (≈ 12).In the case of G T , we develop an all-positive decomposition such that group inversion operation can be avoided.
• In Section 6, we provide high performance software implementations of pairing computation, hashing (to G 1 and G 2 ), group exponentiations and membership testings over BW13-310 on a 64-bit processor.By means of the RELIC cryptographic toolkit [AG], detailed performance comparisons of all building blocks on BW13-310, BN446 and BLS12-446 are presented.
-It is surprising to observe that the single pairing computation on BW13-310 is only up to 4.9% and 26% slower than that on BN446 and BLS12-446, respectively.
In particular, the computation of the and exponentiation in G 2 .
• In Section 7, we estimate the performance of the Boneh-Lynn-Shacham (BLS) signature scheme [BLS04] and the unbalanced Chen-Kudla (UCK) key agreement protocol [Sco13] built on different pairing-friendly curves, including BN446, BLS12-446 and BW13-310.The results show that -the UCK protocol built on BW13-310 is about 125.6% and 40.6% faster than that on BN446 and BLS12-446 for the resource-constrained party (Client), respectively; -the BLS signature scheme built on BW13-310 is about both 150% faster than that on BN446 and BLS12-446 for the signer, respectively.

Preliminaries
Let p be a large prime, and E an ordinary elliptic curve defined by an equation of the form y 2 = x 3 + ax + b where a, b ∈ F p are selected such that 4a 3 + 27b 2 = 0.The group E(F p ) consists of points (x, y) satisfying the above equation with x, y ∈ F p , together with a point at infinity O. Denote by #E(F p ) the order of E(F p ). Then #E(F p ) is precisely p + 1 − t, where t is the trace of the p-power Frobenius endomorphism π : (x, y) → (x p , y p ).
Let r be a large prime such that r | #E(F p ).The embedding degree k with respect to r is the smallest positive integer such that r | p k − 1.We use G T to denote the subgroup of order r in Define G 1 and G 2 are eigenspaces of π acting on E[r] with eigenvalues 1 and p, respectively.Or equivalently,

Optimal pairing
For any point R ∈ E and n ∈ Z + , we denote f n,R as a normalized rational function with divisor For any i, j ∈ Z + , the functions f i,R , f j,R and f i+j,R satisfy the following relations: where (2), Miller [Mil04] proposed a polynomial time algorithm for computing f n,Q (P ) for any n ∈ Z + , P ∈ G 1 and Q ∈ G 2 , which is described in Alg. 1.Throughout this paper, we call f n,R as Miller function and one execution of the main loop in Alg. 1 as a basic Miller iteration.
Let m be a multiple of r with m r 2 , and write m as ω j=0 c i p i .An optimal pairing [Ver09] on E is defined by where s i = ω j=i c j p j .Eq. (3) allows pairing evaluation to be accomplished by using around log r/(ϕ(k)) basic Miller iterations and one exponentiation by (p k − 1)/r.

Algorithm 1 Miller's Algorithm
end if 10: end for 11: return f For curves with even embedding degrees, pairing computation benefits from the denominator elimination optimization so that the vertical line in Alg. 1 can be ignored.Unfortunately, this technique does not apply to curves with odd prime embedding degrees.The penalty is slightly made up for by a modified Miller function g m,Q with divisor Comparing Eqs.(1) and (4), it is easy to deduce the following relations: Exploiting Eqs. ( 5)-(8), Dai et al. [DZZZ22] found that the optimal strategy for performing Miller loop is as follows: (1) combining two consecutive doubling steps into one quadrupling step; (2) combining one doubling and one addition steps into one doubling-addition step.

A family of curves with embedding degrees k ≡ 1 mod 6
Freeman, Scott and Teske [FST10, Construction 6.6] constructed a family of cyclotomic pairing-friendly curves with embedding degrees k ≡ 1 mod 6, k ≤ 1000 and 18 k.
In particular, the characteristic p, the prime order r and the trace of Frobenius t are parameterized by where Φ l (•) represents l-th cyclotomic polynomial.All members in this family have jinvariant 0 and are defined by an equation of the form y 2 = x 3 + b for some b ∈ F * p .Following [CDS20], this family is named as the BW family.By the form of r(z), we have r(z) | (z 2k − z k + 1), which implies that Thus, one of short vectors (c 0 , c 1 , • • • , c ω ) for the optimal pairing in this family is given by (z 2 , −z, 1, 0, • • • , 0).Plugging this vector into Eq.(3), the corresponding formula of the optimal pairing is It is known from [EMJ16, Lemma 3.5] that Therefore, Eq. ( 9) can be rewritten as BW13-310 and BW19-286: In the BW family, BW13-310 and BW19-286 are the two curves defined by setting k = 13 and 19, z = −2224 and −145, b = −17 and 31, respectively.In particular, the selected prime p on BW13-310 satisfies that p ≡ 1 mod 13, so that the full extension field F p 13 can be represented as F p [v]/(v 13 − α) for some α ∈ F * p .Using Magma [BCP97], it is easy to check that the polynomial v 13 − 2 is irreducible over F p , which means that we can select the value of α as 2. According to the estimation in [Gui20, CDS20], both curves are 128-bit secure even under the attack of the SexTNFS.

Single Pairing Computation on BW13-310
Notations.We denote by a, m, m u , s, s u , i and r the computational costs of addition, multiplication, multiplication without reduction, squaring, squaring without reduction, inversion and modular reduction in F p , respectively.It is obvious that m = m u + r and s = s u +r.Likewise, we use ã, m, mu s, su , ĩ, f, ẽ and r to represent the computational costs of addition, multiplication, multiplication without reduction, squaring, squaring without reduction, inversion, Frobenius, exponentiation by |z| and modular reduction in F p 13 , respectively.The cyclotomic group G Φ13(p) is defined by G Φ13(p) = {β ∈ F p 13 | β Φ13(p) = 1}.We denote by ĩc the cost of the inversion in G Φ13(p) .The notation × is used to denote field multiplication without reduction.For any point Q, we use (x Q , y Q ) to represent the point in affine coordinates, and In this section, we propose a new formula for pairing computation in the BW family, and discuss how to apply it to BW13-310 in detail.Since p ≡ 1 mod 3, there exists an endomorphism φ : (x, y) where ω is a cube primitive root of unity in F * p and λ is a root of the quadratic congruence equation Based on this observation, we prove the following theorem.
Theorem 1.Let notation as above.Then the formula of the optimal pairing in the BW family can be expressed as where φ = φ 2 .
In Theorem 1, we propose a new formula for computing the optimal pairing in the BW family, which is suitable for BW13-310 and BW19-286.Using the new formula, the number of basic Miller iterations is reduced to log|z| ≈ log r/(2ϕ(k)).In detail, when performing Miller's algorithm using Eq. ( 12), the computational cost largely comes from two evaluations at the same Miller function of length log|z|, i.e., f −z,Q (P ) and f −z,Q ( φ(P )).Recently, Fouotsa et al. [FGA23] proposed the x-superoptimal pairing on BW13-310 and BW19-286, which is expressed as Clearly, Eqs. ( 12) and ( 18) require the same number of iterations when performing Miller's algorithm.But the latter requires three evaluations at the same Miller function of length log|z|, i.e., f −z,Q (P ), f −z,Q (φ(P )) and f −z,Q ( φ(P )).Therefore, compared to the x-superoptimal pairing, our proposed formula would be more efficient.In the following, we investigate how to perform the pairing computation on BW13-310 in detail.

Shared Miller loop
Computing a single pairing by multiple Miller function evaluations were studied in [Sco05, ZXZ + 11, AFCK + 13].This inspires us to consider how to speed up pairing computation on BW13-310 by using Eq. ( 12), i.e., computing f −z,Q (P ) and f −z,Q ( φ(P )).Since the points P and φ(P ) have the same y-coordinates, the two Miller function evaluations share a large amount of intermediate values during Miller iteration.Hence, it would be efficient to calculate f −z,Q (P ) and f −z,Q ( φ(P )) simultaneously.In other words, we can accomplish two Miller function evaluations at a shared Miller loop.Recall from Section 2 that the seed z = −2224 on BW13-310, and the Miller function f 2224,Q can be obtained from f 1,Q via the following sequence: For any i ∈ Z, we denote by N i,Q and D i,Q the numerator and denominator of g i,Q , respectively.According to Eq. ( 5), we have where xP represents the x-coordinate of φ(P ).In the following we will discuss how to update the terms

Shared addition step(SADD)
In this subsection we show how to obtain N m+1,Q (P ), D m+1,Q (P ), N m+1,Q ( φ(P )) and , respectively.To this end, the point T + Q is first calculated.Since T and Q are represented in Jacobian and affine coordinates respectively, we adopt the mixed addition formula presented in [AFG + 17, Section 4.3.2] to compute T + Q, which is given by It can be done by using the following sequence of operations: The above calculation comes at a cost of 6 m + 2 mu + 3s + r + 8ã, assuming that computing U 0 − U 1 requires 2ã.For any point (x, y), it can be deduced from Eq. ( 6) that where L T,a,1 (x, y) and L T,a,2 (x, y) are given by T and Z 3 T have been obtained at the point addition step, we perform the following sequence of operations to compute L T,a,1 (P ), L T,a,2 (P ), L T,a,1 ( φ(P )) and L T,a,2 ( φ(P )) which requires 2 m + 3 mu + 39m + 2r + 7ã as On this basis, we can obtain N m+1,Q (P ), D m+1,Q (P ), N m+1,Q ( φ(P )) and D m+1,Q ( φ(P )) from Eq. (21) at a cost of 4 m.In summary, the computational cost at the SADD step is 6 m + 2 mu + 3s + r + 8ã = 12 m + 5 mu + 3s + 39m + 3r + 15ã.
In addition, it can be seen from Eq. ( 20) that when m = 1, both D m,Q (P ) and D m,Q ( φ(P )) are equal to 1, which indicates that two full extension field squarings can be saved.

Function transformation
According to the relation between f −z,Q and g −z,Q , we immediately have Moreover, the points π 2 (Q) and π 2 (φ(Q)) have the same x-coordinates, that is, Putting Eqs.( 24) and (25) together, Eq. ( 12) can be rewritten as e(P, Q) = ( L1 L2 ) (p 13 −1)/r , where Once the values of N −z,Q (P ), D −z,Q (P ), N −z,Q ( φ(P )) and D −z,Q ( φ(P )) are given, the computation of L 1 and L 2 can be done at a cost of ẽ+ ĩ+6 m+13m+5 f+3ã.We will delay the inversion of L 2 into the easy part of the final exponentiation such that one inversion can be saved.

The final exponentiation
An optimized final exponentiation routine is critical for fast pairing computation on BW13-310.The exponent (p 13−1 )/r can be split as (p 13−1 )/r = (p − 1) hard part

•
Raising L 1 /L 2 to the power of p − 1 is easy, which can be done at a cost of ĩ + 3 m + 2 f as follows: The bottleneck of the final exponentiation is to raise f to the power of the hard part.
Firstly, the values of f λ0,0 , f λ1,0 , f λ2,1 and f λ3,1 can be computed using the following operations: Since only one additional field multiplication is required for obtaining f 3 at the process of computing f x , the cost of computing ( 26) is 4ẽ + 4 m + 4s.Denote g by f (x 12 +x 13 +x 14 ) .On the basis of (26), we then compute g with 10ẽ + m: To compute f λ0,1 , f λ1,1 , f λ2,0 and f λ3,0 , the following operations are performed: The cost of computing ( 27) is 4ẽ + 7 m.Using the trick of Montgomery's simultaneous inversion [Mon87], we then can compute the terms f 0 = f λ0 and f 1 = f λ1+λ2•p+λ3•p 2 as follows: which requires ĩc + 9 m + 4 f.Finally, raising f to the power of h can be expressed as Since the value of f 3 can be obtained from (26), the computation of (28) requires 9ẽ + 5 m + 4 f.Applying the technique of lazy reduction [AKL + 11] and Karatsuba algorithm, a detailed description of the finite field arithmetic in F p 13 was given in [DZZZ22].In Table 1, we summarize the associated operation counts.We now provide detailed operation counts of the pairing computation on BW13-310 by using our algorithms.From (19), the computation of N −z,Q (P ), D −z,Q (P ),N −z,Q ( φ(P )), D −z,Q ( φ(P )) requires executing 5 SQPL, 2 SADD and 1 SDBLADD.Thus, the total number of operations in the Miller loop is

Operation counts
Eq.( 20) It can be seen from Section 3.2 that the total number of operations in the final exponentiation is In Table 2, we compare the operation counts of ML and FE on BW13-310 to the previous works available in the literature.It should be noted that the estimation in [Gui20] assumes that m ≈ 59m, while in [FGA23] assumes that m ≈ 66m.Clearly, our algorithms require fewer computational cost as compared to the previous works.

Pairings Products Computation on BW13-310
The evaluation of the products of pairings is often required in many pairing-based protocols.Efficient algorithms for computing such products were proposed in [GS06, Sco11, ZL12] by sharing an amount of full extension field squarings and the costly final exponentiation step.In the case of BW13-310, the products of n-pairings can be expressed as By Eqs. ( 24) and ( 25), the values of L n,1 and L n,2 are given by where xPi represents the x-coordinate of φ(P i ).Clearly, the most costly operations for computing L n,1 and L n,2 take place in the evaluations of To start, we need to obtain the following four initial values: Given an integer m, we denote T i by [m]Q i for each point Q i .Then, the following relations are easily derived from Section 3.1: Based on the analysis of Section 3.1, we can deduce that the costs of the nSADD, nSDBLADD and nSQPL steps are • nSADD: n(12 m + 5 mu + 3s + 39m + 3r + 15ã).
Analogous to the single pairing computation, two full extension field squarings can be saved at the first nSQPL step, and we can obtain the terms and n i=1 D −z,Qi ( φ(P i )) by executing 5 nSQPL, 2 nSADD and 1 nSDBLADD.On this basis, we continue to calculate L n,1 and L n,2 .Since the point P i for each i is defined over F p , we have n i=1 The above computation requires 3(n − 1) m + 13nm + 3 f + 3nã.By the form of Eq. ( 29), it is straightforward to see that the cost of computing L n,1 and L n,2 is Based on the above analysis, the cost of the Miller loop for computing n-pairings products on BW13-310 is Eq. ( 30)

Exponentiation in Pairing Groups
Exponentiation in three pairing groups G 1 , G 2 and G T also plays a vital role in the implementation of pairing-based protocols.In this section, we discuss how to efficiently perform the operation on BW13-310.The best-known method for computing [n]P with P ∈ G 1 and n ∈ Z r was introduced by Gallant, Lambert, Vanstone [GLV01], which is called the GLV method.The basic idea of this method is as follows.If the target curve has an efficiently computable endomorphism φ such that φ(P ) = [λ]P for some integer λ, then the scalar n can be decomposed as n 0 , n 1 such that n ≡ n 0 + n 1 • λ mod r, where |n 0 |, |n 0 | ≈ √ r.Thus, the computation of [n]P can be replaced by the multi-exponentiation [n 0 ]P + [n 1 ]φ(P ).In summary, this method can halve the number of point doublings.In addition, recoding n 0 and n 1 with the w-width non-adjacent form (w-NAF) can reduce the number of point additions.In the RELIC cryptographic toolkit [AG], this method can be implemented automatically once the associated curve parameters are given.Therefore, we only investigate how to perform exponentiations in G 2 and G T on this curve.

Exponentiation in G 2
For exponentiation in G 2 on curves admitting a twist, such as BN and BLS12 families, GLS method [GLS09] In general, GLS method largely reduces the number of iterations required for computing It is possible to build a higher-dimensional decomposition by combining GLV and GLS methods on some certain curves.This idea was initially proposed by Longa and Sica [LS12] to obtain a four dimensional decomposition on certain curves, and subsequently applied into different scenarios [CL15,FHLS14].On BW13-310, the orders of φ and π are respectively 3 and k in End so a random exponent n can be decomposed into 2ϕ(k) mini-exponents, where the bit size of the maximum of It should be noted that the 2ϕ(k)-dimensional decomposition takes advantage of the fact that ψ(Q) = [−z]Q.More specifically, since log r ≈ 24 log|z|, the exponent n can be written in the basis of |z| as where log|n i | < log|z| ≈ 1 24 log r.Thus, the computation of By the form of the endomorphism ψ, we have , and thus the cost of computing ψ i (Q) is negligible.The procedure of performing exponentiation in G 2 on BW13-310 is summarized in Alg. 2.
Remark 1.We conclude that if the orders of φ and π are coprime in End F p k (E), then there exists a 2ϕ(k)-dimensional decomposition in G 2 .However, the orders of φ and π are respectively d and k in End F p k (E) with d | k (d = 3 or 4) on many mainstream pairing-friendly curves, such as BN, BLS12, KSS16 and KSS18 families.It means that ψ = φ • π satisfies Φ ϕ(k) (ψ) = 0. Thus, the new technique is not suitable for the above mentioned curves.

Exponentiation in G T
Since there is not an efficient computable endomorphism in F p k corresponding to the GLV endomorphism in E(F p k ), the exponentiation in G T is slightly different to that in G 2 .In other words, in the case of exponentiation in G T , a given exponent n can be end for 18: end for 19: return R only decomposed into ϕ(k) multi-exponents by using the Frobenius endomorphism, rather than 2ϕ(k).We now follow the same recipe described by Galbraith and Scott [GS08] to decompose n the as n k) .To this aim, we first define a modular lattice L as Clearly, a basis Inputting the basis B * into the LLL algorithm [LLL82], we obtain a LLL-reduced basis Since (r, 0, 0, • • • , 0) ∈ L, there exists a unique solution (y 0 , y 1 , Multiplying n/r on the both side of (31), it produces that where Since α i ∈ Z for each i, it is clear that n ∈ n + L.Moreover, combining Eqs.( 32) and (33), we have Since the selected seed z is negative and For the mainstream pairing-friendly curves, such as BN, BLS12, KSS16 and KSS18 families, the embedding degrees k are always even.This allows inversion in G T to be computed for almost free by using a simple conjugation over F p k/2 .In other words, nonpositive exponent decomposition will not bring extra overhead for exponentiation in G T on these curves.However, the picture is different on BW13-310 as it has an odd prime embedding degree.It pays a penalty for expensive cost inversion in G T .In order to avoid this operation when performing exponentiation in G T , we expect multi-exponents are all-positive.
where c i is denoted as the i-th tuple of c. Combining Eq.s (34) and (35) together, the proof is immediate.
The all-positive decomposition described in Proposition 1 also leads to around 1 bit increase for the bound of the size of the mini-exponents.Considering the expensive cost of inversion in G T , this trade is absolutely worthwhile.In addition, one should be noted that when performing the small exponentiations by n i for i = 0, 1, • • • , 11, the NAF expression is not applicable.The procedure of exponentiation in G T on BW13-310 is presented in Alg. 3.

Implementation Results
In this section, we present our implementation results of the pairing computation and common building blocks on BW13-310 within the RELIC cryptographic toolkit.For the pairing computation and the group exponentiations in G 2 and G T we use the algorithms proposed in this paper.For hashing to G 1 and G 2 , the group membership testings and the group exponentiation in G 1 , we exploit state-of-the-art techniques.In detail, • The function H 2 : {0, 1} * → G 2 is implemented by using the method proposed in [DZZ22].Likewise, it is split into two phases: hashing a binary string to a random point R 2 ∈ E(F p 13 ), followed by mapping R 2 to G 2 .The computational cost of hashing to G 2 largely comes from the second phase, which requires approximately 26 scalar multiplications by z, 13 point doublings and 41 point additions in E(F p 13 ).
• The best-known algorithms for the G 1 , G 2 and G T membership testings on BW13-310 are proposed in [DLZZ23], which require approximately 12 scalar multiplications by z in E(F p ), 1 scalar multiplications by z in E(F p 13 ) and 2 exponentiations by z in F p 13 , respectively.
In order to evaluate strengths and weaknesses of BW13-310 in real-world pairing based cryptographic protocols, we present a performance comparison between BW13-310 and other 128-bit secure pairing-friendly curves, including BN446, BLS12-446 and BLS24-315.Specially, BN446 and BLS12-446 are well known for fast pairing computations, while BLS24-315 is another interesting curve with fast exponentiation in G 1 .All of these curves are defined by an equation of the form y 2 = x 3 + b for some b ∈ F * p , and the related parameters are summarized in Table 3. RELIC provides high speed implementations of pairing computations and the required auxiliary building blocks on BN446, BLS12-446 and BLS24-315.We have integrated our codes in this library to allow a direct performance comparison across different curves.The source code is available at https://github.com/eccdaiy39/BW13-P310.Our benchmark results are presented in Figs.1-4.Timings are measured on an Intel Core i9-12900K processor running at @3.2GHz with TurboBoost and hyper-threading features disabled, averaged over 10 4 executions.For group exponentiations G 1 , G 2 and G T on each curve, window widths w are set as 4, 1 and 1, respectively.
• Compared to BN446 and BLS12-446, BW13-310 is about 109.1% − 227.3%, 100% − 192.6%, 24.5% − 108.5% and 68.2% − 145.5% faster in terms of hashing to G 1 , exponentiations in G 1 and G T , and membership testing for G T , respectively.In essence, as to operations related to G 1 , BW13-310 benefits from fast prime field arithmetic.As to operations related to G T , even though BW13-310 fails to provide fast cyclotomic squaring [GS10, Kar12] and decode exponents in the NAF form, it is much more favoured for a small size of full extension field and a large value of ϕ(k), which result in fast full extension field multiplication and high dimensional GLS decomposition.
• More surprisingly, the gap in the performance of single pairing computation between BW13-310 and BN446 (resp.BLS12-446) is only up to 4.9% (resp.26%).In particular, the computation of the Miller loop on BW13-310 is even up to 48.2% faster than that on BN446.In fact, our results reveal that a few percent efficiency disadvantage of pairing computation on BW13-310 mainly arises from the final exponentiation part.For the computation of the n-pairings products, BW13-310 outperforms BN446, while still slower than BLS12-446.For example, for the 8pairings products, BW13-310 is about 14.2% faster than that on BN446, while 25.5% slower than that on BLS12-446.
• However, BW13-310 also introduces a significant penalty for hashing to G 2 and exponentiation in G 2 .Indeed, the pairing group G 2 on BW13-310 is defined over the full extension field as the lack of twists, while that on BLS12-446 and BN446 lies in a subfield F p 2 .
• For BW13-310 and BLS24-315, they provide nearly equal performance in terms of exponentiation in G 1 , membership testing for G 1 .Moreover, the former has a significant advantage in terms of hashing to G 1 , exponentiation in G T , membership testing for G T and pairing computation, while the latter outperforms for hashing to G 2 and exponentiation in G 2 .
In summary, our implementation results show that BW13-310 is competitive in scenarios that hashing to G 2 and exponentiation in G 2 are not necessary, or performed by a powerful computational entity.

Applications
In this section, we estimate the performance of two pairing-based cryptosystems built on the above mentioned pairing-friendly curves, aiming to explain that BW13-310 is an interesting candidate.

Unbalanced Chen-Kudla protocol
In [CK03], Chen and Kudla designed a two-party identity based authenticated key agreement protocol from pairings.In this protocol, each entity is required to perform one exponentiation in G 1 and one pairing computation.In real-world protocols, it is often the case that one entity (Client) is equipped with a resource-constrained device, while the other (Server) is more powerful.To reduce the workload of the client, it is reasonable to shift the time-consuming pairing computation to the server.To this aim, Scott [Sco13] proposed an unbalanced Chen-Kudla protocol (UCK).In this scenario, a system wide public parameter Q ∈ G 2 is introduced.A trusted authority (TA) posseses a master secret key s ∈ Z r .The client and server have secret keys S A and S B issued by TA as , where ID A is the client's identity.Then the protocol runs as follows: 1 ) and sends the two points R 1 and R 2 to the server; 2. The server chooses r 3 ← Z r at random, calculates R 3 = [r 3 ]R 1 and g = e(R 3 , Q), and send the pairing value g to the client; 3. The client obtains the session key K A by computing The server obtains the session key K B by computing 4. If the both entities follow this protocol, they would share the same session key Since the client can precompute the point H 1 (ID A ) and the pairing value e(S A , Q), the entity only costs two exponentiations in G 1 and two exponentiations in G T .Meanwhile, the server costs one hashing to G 1 , two exponentiations in G 1 and two pairing computations.
Considering the protocol is designed to minimize the workload of the client, we can see that BW13-310 is well-suited as it provides fast implementation for group exponentiations in both G 1 and G T .Based on implementation results presented in Figs.1-4, Table 4 shows our cost estimates for each party of the UCK protocol built on different curves.One can see that the UCK protocol built on BW13-310 is about 125.6% and 40.6% faster than that on BN446 and BLS12-446 for the client, respectively.

BLS signature scheme
The Boneh-Lynn-Shacham (BLS) signature is a famous short signature scheme from pairings [BLS04].In the scheme, the point g 2 ∈ G 2 is a public parameter and the signer posseses a pair of key (s, pk = [s]g 2 ), where s is private and pk is public.Then, the scheme works as follows: 1. To sign a message msg, the signer computes M = H 1 (msg), sig = [s]M , and sends the pair (msg, sig) to the verifier.
2. To verify the signed message, the verifier computes M = H 1 (msg) and assert that the signature is valid if and only if sig ∈ G 1 and e(sig, pk) = e(M, g 2 ).
It should be noted that an attacker can use the point sig = sig + R to forge a valid signature as e(sig , pk) = e(sig, pk), where R is an random point in the subgroup r • E(F p ).Thus, it can not be ignored for the verifier to check sig ∈ G 1 .In this setting, the signer costs one hashing to G 1 and one exponentiation in G 1 , while the verifier costs one hashing to G 1 , one subgroup membership testing for G 1 and one product of 2-pairings.From Figs.
1-4, we estimate that the BLS signature scheme built on BW13-310 is about both 1.5× faster than that on BN446 and BLS12-446 for the signer, respectively.Considering that the performance penalty for the verifier is not expensive, this tradoff becomes favorable in the case that the scheme is designed to reduce the workload of the signer.

Conclusion and Future Work
In this work, we presented a detailed study of a 128-bit secure pairing-friendly curve: BW13-310.We first proposed a new formula for computing the optimal pairing on this curve.Specially, we showed that it requires two evaluations at the same Miller function of bit length approximately log r/(2ϕ(k)).On this basis, we proposed a shared Miller loop such that the two function evaluations can share intermediate values as much as possible.In addition, we also described several optimizations for group exponentiations in G 2 and G T on this curve.In the case of G 2 , we showed that GLV and GLS method can be combined to build a 2ϕ(k) dimensional decomposition.In the case of G T , our optimization eliminates expensive field inversion.Finally, we presented high speed implementations of pairing computation, hashing (to G 1 and G 2 ), group exponentiations and membership testings on a 64-bit processor over BW13-310.The technique of lazy reduction was fully utilized to minimize the number of modular reductions.Our results showed that compared to BN446 and BLS12-446, BW13-310 wins out in the performance of hashing to G 1 , group exponentiations in G 1 and G T , and membership testing for G T .Furthermore, it was very surprising to find that the gap in the performance of single pairing computations between BW13-310 and BN446 (resp.BLS12-446) is only up to 4.9% (resp.26%).In particular, compared to BN446, BW13-310 even has certain advantages for the computation of pairings products.Our results also reported that the target curve would pay a penalty for hashing to G 2 and the group exponentiation in G 2 .
Very recently, Longa [Lon23] further optimized the technique of lazy reduction such that the penalty of "double-precision" operations can be avoided.We note that the new algorithm gets a greater performance boost on prime fields with smaller sizes, potentially helping BW13-310 become more attractive.In addition, a faster SVW map (SwiftEC) was proposed in [CSRHT23](ASIACRYPT 2022).The performance comparison across different pairing-friendly curves using these optimized algorithms are left as future work.
[i]R,[j]R represents the straight line through [i]R and [j]R, and ν [i+j]R is the vertical line through [i+j]R.Based on Eq.

Proposition 1 .
Let the lattice L and the LLL-reduced basis B = (b 0 , b 1 , • • • , b 11 ) of L be constructed as above.For any integer n ∈ Z, there exists a vector n = (n 0 , n 1 , • • • , n 11 ) ∈ n + L such that the tuples of n are all-positive and n ∞ ≤ 3(z 2 − 2z + 2)/2.Proof.We first decompose the scalar n into the vector n as in (33).Then, we define c = b 0 + b 1 + • • • + b 9 − b 10 − b 11 and n = c + n.It is obviously that n ∈ n + L.Moreover, by the definition of b

Algorithm 3 •
Exponentiation in G T on BW13-310 Input: a random positive integer n ∈ Z r , a random element f ∈ G T Output:f n 1: f 0 ← f 2: for i = 1 to 11 do Compute f j i for j ∈ {1, 3, 5, • • • , 2 w − 1} 6:Decompose n into all-positive mini-exponents (n 0 , • • • , n 11 ) by using Proposition 17: Recode n i = t−1 j=0 n i,j 2 j such that n i,j ∈ {1, 3, 5, • • • , 2 w − 1} 8: g ← 1 9: for j = t − 1 down to 0 do 10: g ← g 2 11:for i = 0 to 11 do 12:if n i,j > 0 13: g ← g • f The function H 1 : {0, 1} * → G 1is implemented by using the Shallue-van de Woestijne (SVW) map[SvdW06], followed by a cofactor multiplication.The SVW map aims to efficiently hash a binary string to a random point R 1 ∈ E(F p ) in constant time, and the cofactor multiplication forces R 1 into the target group G 1 .RELIC provides dedicated implementation of this map in the ep_map_from_field() function in the file /src/ep/relic_ep_map.c.The procedure of clearing cofactor can be done at a cost of one multiplication by z 2 − z + 1[EHGP22].
Miller loop on BW13-310 is even up to 48.2% faster than that on BN446.As a result, for the computation of the pairings products, BW13-310 gains an advantage over BN446, while is still slower than BLS12-446.-More importantly, compared to BN446 and BLS12-446, BW13-310 is about 109.1% − 227.3%, 100% − 192.6%, 24.5% − 108.5% and 68.2% − 145.5% faster in terms of hashing to G 1 , exponentiations in G 1 and G T , and membership testing for G T , respectively.-On the negative side, BW13-310 also pays a penalty in terms of hashing to G 2

Table 1 :
Operation counts for the full extension field arithmetic

Table 2 :
Comparison of operation counts for pairing computation on BW13-310 with the previous works available in the literature u + 19866s u + 5448r + 192605a.