Faster Bootstrapping via Modulus Raising and Composite NTT

FHEW-like schemes utilize exact gadget decomposition to reduce error growth and ensure that the bootstrapping incurs only polynomial error growth. However, the exact gadget decomposition method requires higher computation complexity and larger memory storage. In this paper, we improve the efficiency of the FHEW-like schemes by utilizing the composite NTT that performs the Number Theoretic Transform (NTT) with a composite modulus. Specifically, based on the composite NTT, we integrate modulus raising and gadget decomposition in the external product, which reduces the number of NTTs required in the blind rotation from $2(d_g + 1)n$ to $2(\lceil d_g/2 \rceil + 1)n$. Furthermore, we develop a modulus packing technique that uses the Chinese Remainder Theorem (CRT) and composite NTT to bootstrap multiple LWE ciphertexts through one blind rotation process. We implement the bootstrapping algorithms and evaluate the performance on various benchmark computations using both binary and ternary secret keys. Our results of the single bootstrapping process indicate that the proposed approach achieves speedups of up to 1.7×, and reduces the size of the blind rotation key by 50% under specific parameters. Finally, we instantiate two ciphertexts in the packing procedure, and the experimental results show that our technique is around 1.5× faster than the two bootstrapping processes under the


Introduction
Homomorphic encryption (HE) is a promising cryptographic primitive that performs arbitrary computation on ciphertexts without access to the secret key. Due to its confidentiality, HE schemes have emerged as a core technology for applications such as privacy-preserving cloud computations [MSM+22]. The first fully homomorphic encryption scheme was proposed by Gentry [Gen09] in 2009, and since then, the field has seen significant progress. The common FHE schemes are typically divided into three classes based on the data types: BGV and BFV schemes for modular arithmetic over finite fields, which are usually used for small integer computations [BGV14, Bra12, FV12]; the CKKS scheme for approximate computations over real and complex numbers [CKKS17, CHK+18]; and FHEW and TFHE schemes for evaluating boolean circuits, which are well-suited for comparisons and decision diagram computations [DM15, CGGI16].
Nowadays, these homomorphic encryption schemes are based on the (Ring) Learning With Errors assumptions, where a small amount of error (noise) is introduced to the encrypted message in the encryption process. However, the error can accumulate during circuit evaluations and even corrupt the plaintext if it exceeds a certain threshold. As a result, managing noise effectively has become a central concern in the design of FHE schemes. At present, there exist two popular methods for reducing error in FHE schemes. The first method is gadget decomposition, including digit decomposition and RNS decomposition, in which an element is broken down into smaller digits. Although this approach can reduce noise, it suffers from efficiency bottlenecks due to its quadratic growth. Modulus raising, introduced by Gentry et al. [GHS12], offers a more efficient alternative compared to gadget decomposition. However, a larger ciphertext modulus is required in the HE cryptosystem, which may reduce the security level of the scheme. Thus, some works [KPZ21a, CCH+22] explore a hybrid method that combines the advantages of both gadget decomposition and modulus switching to achieve a balance between noise control and security level.
Furthermore, choosing an appropriate ciphertext modulus is also crucial in FHE schemes since it determines the upper bound on the noise level that the scheme can tolerate during the computation. Either a sufficiently large ciphertext modulus must be chosen in advance to account for noise growth during calculation, or a bootstrapping procedure must be used to reset the noise and keep the modulus within a reasonable range. FHEW [DM15, MP21] and TFHE [CGGI16, CGGI20] schemes typically use the latter strategy and are known for efficient bootstrapping. Their efficiency is mainly due to the small ciphertext modulus, which allows for the use of CPU native types to represent ciphertexts in both the coefficient representation and the Discrete Fourier Transform (DFT) representation. Specifically, the two schemes follow the same bootstrapping framework, which involves the homomorphic decryption of an LWE ciphertext through a blind rotation procedure. However, there are some differences in terms of their underlying algebraic structures and implementation details.
The TFHE scheme typically relies on the real torus T, which is the set of real numbers modulo 1, represented as the interval [0, 1). In practice, torus elements are not represented with an infinite number of digits but instead approximated to a finite precision. At the implementation level, the TFHE scheme usually uses 32-bit or 64-bit data types to represent the ciphertext modulus in TFHE-lib [CGGI20] and TFHE-rs [BSJJ22], which offers the advantage of performing modulo operations for free based on the data type. However, this setting can only utilize the Fast Fourier Transform (FFT) to accelerate polynomial multiplication. On the other hand, the choice of ciphertext modulus in the FHEW-like schemes is more flexible. Typically, the modulus can be set to a prime number, which enables more homomorphic operations than a power-of-two modulus. For instance, the trace operation [CDKS21] requires the polynomial dimension and the modulus to be coprime. In scenarios like this, the NTT outperforms the FFT in terms of efficiency.
The original FHEW and TFHE schemes focus on bootstrapping a single LWE ciphertext. Micciancio et al. [MS18] proposed a novel refreshing procedure that can simultaneously refresh multiple LWE ciphertexts, which makes it more suitable for practical applications. Building upon this, subsequent works, such as [LW23a] and [LW23b], have significantly improved the asymptotic cost per gate bootstrap to homomorphic multiplications. It is worth noting that these schemes heavily rely on certain algebraic structures, and therefore, these works are all designed based on FHEW-like schemes.

Contributions and Techniques
In this paper, we focus on the FHEW-like bootstrapping and use composite NTT to optimize and improve the blind rotation procedure.
Composite NTT. The methods for performing NTT using composite moduli can be categorized into different strategies. We analyze the mathematical principles of these methods and investigate the computational environments and HE tasks to which they are best suited. We provide a proof for the approach of [HP22] that constructs the root for the composite NTT, and extend this method to multiple moduli, enhancing its versatility and applicability.
Reduce the number of NTTs in blind rotation. Based on the composite NTT, we present two more flexible and improved variants of the external product. In the first method, the RGSW ciphertext is represented as $\mathrm{RLWE}_{PQ}(P \cdot s \cdot m)$ and $\mathrm{RLWE}_{PQ}(P \cdot m)$ ciphertexts, where s is the secret key, m is the message, and P is a temporary modulus. The new external product between RLWE and RGSW ciphertexts is evaluated modulo PQ and then divided by P, where the division is performed to reduce error growth. We also integrate the modulus raising, digit decomposition, and RNS decomposition in the hybrid method, which can further reduce noise by the decomposition.
Composite NTT-based packing bootstrapping. We introduce a novel packing bootstrapping algorithm for FHEW-like schemes. In particular, we can use CRT to pack some independent accumulators into one large composite modulus. By performing the composite NTT, these accumulators only need to perform one external product operation with a large modulus in each CMux gate. We remark that this technique gains from the application of composite NTT and does not apply to the FFT-based TFHE scheme. Our approach offers adaptable deployment capabilities for platforms that implement various machine word lengths, especially hardware-accelerated architectures (e.g., the state-of-the-art ASIC accelerator SHARP [KKC+23]).
Finally, we implement the above methods and provide some comparisons and analyses.
• We comprehensively analyze the proposed bootstrapping algorithm under different parameters, including the variance of noise growth, computational complexity, and decryption failure rate. We summarize the number of operations required in the blind rotation process for different methods in Table 1.
• Compared to exact gadget decomposition in FHEW-like schemes, our bootstrapping algorithm reduces the blind rotation key size by 50% and achieves a speedup of up to 1.7×.
• We implement the packing bootstrapping algorithm that bootstraps two LWE ciphertexts within the 64-bit CPU machine word length. The result shows that the proposed method is 1.5× faster than the two bootstrapping processes.
Heinz et al. [HP22] also propose a method to construct the root for the composite NTT. However, their work lacks conclusive proof and does not consider the scenario of multiple moduli. Despite the advantages of NTT and FFT, they are incompatible with the noise control methodology since modulus switching is involved in gadget decomposition and modulus raising. Therefore, frequent switching of data between the coefficient representation and the DFT representation is required during the homomorphic encryption algorithm, which also consumes a large amount of computational resources [JLK+21]. Recently, Kim et al. [KLSS23] accelerated the key-switching in the Full-RNS setting for the CKKS scheme by introducing a second gadget decomposition to reduce NTT computations. This work focuses on the multi-precision modulus, and the benefits do not appear to be evident in the RNS-based FHEW-like schemes.

Related Work
The blind rotation procedure in FHEW and TFHE bootstrapping homomorphically computes an RLWE ciphertext of $X^{\sum_{i=0}^{n-1} a_i s_i}$, which needs a large number of NTTs or FFTs and Hadamard multiplications. Currently, there are three strategies for performing blind rotation: the AP method [ASP14, DM15], which uses $a_i$ as a selector to pick the evaluation keys that encrypt $E(a \cdot s_i)$ and accumulates them by the external product; the GINX method [CGGI16], which homomorphically performs CMux gates and is more effective for binary and ternary secret key distributions; and the LMK method [LMK+23], which uses ring automorphisms and an RLWE-based key switching technique to support arbitrary secret key distributions and is subsequently used by [DMKMS23] to support packing bootstrapping.

Paper Organization
The rest of the paper is organized as follows. We provide the necessary background knowledge and some general tools in FHE schemes in Section 2. In Section 3, we present some methods and comparisons for performing polynomial multiplication with composite modulus. In Section 4, we show the new bootstrapping algorithm with the composite NTT technique. In Section 5, we present an analysis of noise growth, details of the algorithm execution, and experimental results. Finally, we conclude the paper in Section 6.

Notation
We denote by $\mathbb{Z}$ the set of integers and by $\mathbb{R}$ the set of reals. We use lower-case bold letters for vectors and upper-case bold letters for matrices. $\langle \mathbf{a}, \mathbf{b} \rangle$ is the inner product between two vectors. We denote by $\mathbb{Z}_Q$ the ring $\mathbb{Z}/Q\mathbb{Z}$, i.e., the residue ring modulo Q, and the centered remainder of x modulo Q as $[x]_Q$. For a real number r, we write the floor, ceiling, and round functions as $\lfloor r \rfloor$, $\lceil r \rceil$, and $\lfloor r \rceil$, respectively. For a set of k co-prime moduli $Q_1, ..., Q_k$, we denote $Q = \prod_{i=1}^{k} Q_i$. Furthermore, we denote the 2N-th cyclotomic ring by $R = \mathbb{Z}[X]/(X^N + 1)$ and the quotient ring by $R_Q = R/QR$. For a polynomial $m \in R_Q$, its coefficients are represented by the vector $\mathrm{Coefs}(m) = (m_0, m_1, ..., m_{N-1})$. In our notation, sometimes a polynomial is denoted by a(X), and sometimes it is denoted by a. The multiplication operations are indicated by the $\cdot$ and $\odot$ symbols, where the former is used for number and polynomial multiplication, while the latter is used for Hadamard multiplication. We use x ← D to denote the sampling of x according to distribution D. We denote Var(err(ct)) as the variance of error for the ciphertext ct. Finally, we denote by $\|\mathbf{a}\|_p = (\sum_i |a_i|^p)^{1/p}$ the $\ell_p$-norm of a vector $\mathbf{a} \in \mathbb{Z}^n$, and compute the $\ell_p$-norm of a polynomial by taking its coefficient vector.

Gadget Decomposition
Gadget decomposition includes digit decomposition and RNS decomposition. The digit decomposition breaks a number down into individual digits using a radix base. Let $a \in R_Q$ and $d_g = \lfloor \log_{B_g} Q \rfloor + 1$. The decomposition function $g_d^{-1}$ and the expansion function $\mathbf{g}$ with the radix base $B_g$ are:
$$g_d^{-1}(a) = (a_0, a_1, \ldots, a_{d_g-1}) \ \text{with} \ a = \sum_{i=0}^{d_g-1} a_i B_g^i, \qquad \mathbf{g} = (1, B_g, \ldots, B_g^{d_g-1}) \in \mathbb{Z}_Q^{d_g}.$$
Thus, we can get $\langle g_d^{-1}(a), \mathbf{g} \rangle \equiv a \bmod Q$. Furthermore, Residue Number System (RNS) decomposition is another gadget decomposition technique, and the details will be described in the following section.
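As a minimal sketch of the digit decomposition, the following Python fragment decomposes a single integer coefficient and checks that the inner product with the expansion vector recovers it modulo Q. The modulus, the radix base, and the helper names `digit_decompose` and `gadget_vector` are illustrative assumptions and are not taken from the paper; unsigned digits are used for simplicity.

```python
def digit_decompose(a, base, num_digits):
    """Split a non-negative integer into base-`base` digits, least significant first."""
    digits = []
    for _ in range(num_digits):
        digits.append(a % base)
        a //= base
    return digits


def gadget_vector(base, num_digits):
    """Expansion vector g = (1, B, B^2, ..., B^(d-1))."""
    return [base ** i for i in range(num_digits)]


Q = 2 ** 27 - 39   # illustrative modulus (assumption)
B_g = 2 ** 7       # illustrative radix base (assumption)

# d_g = floor(log_{B_g} Q) + 1, computed with integers to avoid float rounding
d_g, power = 0, 1
while power <= Q:
    power *= B_g
    d_g += 1

a = 123456789 % Q
digits = digit_decompose(a, B_g, d_g)
g = gadget_vector(B_g, d_g)

# <g^{-1}(a), g> = a (mod Q)
recomposed = sum(d * gi for d, gi in zip(digits, g)) % Q
```

For a polynomial, the same decomposition would be applied to each coefficient independently, which yields d_g polynomials with small coefficients.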

Gaussian Distribution
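The discrete Gaussian distribution is defined as a distribution over $\mathbb{Z}$ in which each element is sampled with probability proportional to the value of a Gaussian function over $\mathbb{R}$ at that point. The Gaussian function is $\rho_{\sigma,c}(x) = \exp(-(x-c)^2/(2\sigma^2))$, where $\sigma, c \in \mathbb{R}_{\geq 0}$, and then $\rho_{\sigma,c}(\mathbb{Z}) = \sum_{x \in \mathbb{Z}} \rho_{\sigma,c}(x)$. The discrete Gaussian distribution with standard deviation σ and mean c is a distribution on $\mathbb{Z}$ with the probability of $x \in \mathbb{Z}$ given by $D_{\delta,c}(x) = \rho_{\sigma,c}(x)/\rho_{\sigma,c}(\mathbb{Z})$. If c = 0, we denote this distribution by $\chi_\delta$.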

Learning With Errors
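We recall the learning with errors (LWE) assumption [Reg09] and the ring learning with errors (RLWE) assumption [LPR13] as follows.
• LWE Sample. A valid LWE sample is a vector $(\mathbf{a}, b) \in \mathbb{Z}_q^{n+1}$ that satisfies $b = \langle \mathbf{a}, \mathbf{s} \rangle + e \bmod q$, where $\mathbf{s}$ is the secret key for the LWE sample, $\mathbf{a} \leftarrow \mathbb{Z}_q^n$ is a uniformly random vector, and the error $e \leftarrow \chi_\delta$ is chosen from an error distribution. Then, $(\mathbf{a}, b)$ is a fresh ciphertext of 0.
• RLWE Sample. A valid RLWE sample is a pair $(a, b) \in R_Q^2$ that satisfies $b = a \cdot s + e \bmod Q$, where s is the secret key for the RLWE sample, a is uniformly random in $R_Q$, and the error $e \leftarrow \chi_\delta^N$ is chosen from the error distribution. Then, (a, b) is a fresh ciphertext of 0.
Thus, we can define the LWE and RLWE ciphertexts as
$$\mathrm{LWE}^{n}_{\mathbf{s},q}(m) = (\mathbf{a}, b + \tfrac{q}{t} \cdot m), \qquad \mathrm{RLWE}^{N}_{s,Q}(m) = (a, b + \tfrac{Q}{t} \cdot m).$$
Sometimes, the dimension N and secret key s may be omitted for the sake of simplicity. The message m can be recovered if the error satisfies $|e| < q/(2t)$ by the (R)LWE decryption process
$$m = \lfloor \tfrac{t}{q} (b - \langle \mathbf{a}, \mathbf{s} \rangle) \rceil \bmod t.$$
Arithmetic. The structure of ciphertexts in LWE and RLWE allows for homomorphic addition and scalar multiplication operations. For instance, given the LWE samples $ct_1 = (\mathbf{a}_1, b_1)$ and $ct_2 = (\mathbf{a}_2, b_2)$, their terms can be added together to obtain $ct_1 + ct_2 = (\mathbf{a}_1 + \mathbf{a}_2, b_1 + b_2)$. Moreover, the multiplication between a ciphertext $ct_1 = (\mathbf{a}_1, b_1)$ and a scalar cleartext z can be obtained directly from the addition operation: $z \cdot ct_1 = (z \cdot \mathbf{a}_1, z \cdot b_1)$.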
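To make the LWE encryption, decryption, and homomorphic arithmetic concrete, the following Python sketch works with toy parameters. It is an illustration under assumptions: the message is encoded as (q/t)·m, the error is a rounded continuous Gaussian rather than a true discrete Gaussian, and the dimension, modulus, and function names are hypothetical and far too small to be secure.

```python
import random


def keygen(n, rng):
    """Ternary secret key (assumption: uniform over {-1, 0, 1})."""
    return [rng.choice([-1, 0, 1]) for _ in range(n)]


def encrypt(m, s, q, t, rng, sigma=3.2):
    """LWE encryption with b = <a, s> + e + (q/t) * m mod q."""
    a = [rng.randrange(q) for _ in s]
    e = round(rng.gauss(0, sigma))  # rounded Gaussian as a stand-in for chi
    b = (sum(ai * si for ai, si in zip(a, s)) + e + (q // t) * m) % q
    return a, b


def decrypt(ct, s, q, t):
    """Recover m = round(t/q * (b - <a, s>)) mod t using integer arithmetic."""
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return ((phase * t + q // 2) // q) % t


def add(ct1, ct2, q):
    """Homomorphic addition: component-wise sum of the two samples."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q


def scalar_mul(z, ct, q):
    """Multiplication by a cleartext scalar z."""
    a, b = ct
    return [(z * x) % q for x in a], (z * b) % q


rng = random.Random(0)
n, q, t = 32, 1024, 4          # toy parameters (assumption)
s = keygen(n, rng)
ct1 = encrypt(1, s, q, t, rng)
ct2 = encrypt(2, s, q, t, rng)

sum_plain = decrypt(add(ct1, ct2, q), s, q, t)          # decrypts 1 + 2 mod t
double_plain = decrypt(scalar_mul(2, ct1, q), s, q, t)  # decrypts 2 * 1 mod t
```

Both operations also add or scale the error terms, which is why decryption only succeeds while the accumulated error stays below q/(2t).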

Original RGSW Ciphertext and External Product
The original RGSW cryptosystem [GSW13,DM15] involves some RLWE samples and gadget matrix G, which can be denoted by External Product We define the original external product as D that involve the digit decomposition.Given the RLWE and RGSW ciphertexts, the external product outputs a new RLWE ciphertext as Error analysis.The noise in ct is given by e , where e 0 , ..., e 2dg−1 are the noise terms of the RGSW ciphertext and e is the noise term of the RLWE ciphertext.In bootstrapping, m ∈ ±X k is used as messages in the RGSW ciphertext.Thus, we can get the variance of e is Note that we define two additional forms of RGSW ciphertext in Section 4, and distinguish them based on their modulus.

NTT-based Multiplication
The Number Theoretic Transform (NTT) is a variation of the Discrete Fourier Transform (DFT) over the finite field.The NTT algorithm can convert a polynomial from its coefficient representation to the NTT representation, enabling Hadamard multiplication and significantly reducing the computation complexity from O(N 2 ) to O(N log N ).
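To achieve NTT-based multiplication, the polynomial ring is decomposed by the CRT as $\mathbb{Z}_Q[X]/(f(X) g(X)) \cong \mathbb{Z}_Q[X]/(f(X)) \times \mathbb{Z}_Q[X]/(g(X))$, where f(X) and g(X) are coprime. If the prime modulus satisfies the condition $Q \equiv 1 \pmod{2N}$, then there exists a 2N-th primitive root of unity $\zeta \in \mathbb{Z}_Q$ with $\zeta^N \equiv -1 \pmod Q$. Thus, we can get:
$$X^N + 1 = X^N - \zeta^N = (X^{N/2} - \zeta^{N/2})(X^{N/2} + \zeta^{N/2}).$$
Due to the symmetric property of ζ, the polynomial can be further decomposed into N polynomials of degree 1, i.e.,
$$X^N + 1 = \prod_{i=0}^{N-1} (X - \zeta^{2i+1}).$$
Thus, for a polynomial $a(X) \in R_Q$, we can obtain the length-N vector using the CRT
$$a(X) \mapsto (a(X) \bmod (X - \zeta), \ a(X) \bmod (X - \zeta^3), \ \ldots, \ a(X) \bmod (X - \zeta^{2N-1})).$$
Depending on this decomposition, we can define the NTT representation as $\mathrm{NTT}(a) = (A_0, \ldots, A_{N-1})$, where
$$A_j = a(\zeta^{2j+1}) = \sum_{i=0}^{N-1} a_i \zeta^{(2j+1)i} \bmod Q.$$
The iNTT process is symmetric and omitted. Then, for the multiplication of two polynomials $c(X) = a(X) \cdot b(X) \in R_Q$, we can compute the process as follows
$$c = \mathrm{iNTT}(\mathrm{NTT}(a) \odot \mathrm{NTT}(b)).$$
The detailed NTT and iNTT algorithms are described in Appendix A.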
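The following Python sketch illustrates the NTT-based negacyclic multiplication on a toy ring. It is not the butterfly algorithm of Appendix A: for readability, the transform is evaluated directly from its definition in O(N^2) time, and the parameters N = 8 and Q = 17 as well as all function names are illustrative assumptions.

```python
def find_root(N, Q):
    """Smallest zeta with zeta^N = -1 mod Q; for N a power of two it has order 2N."""
    for z in range(2, Q):
        if pow(z, N, Q) == Q - 1:
            return z
    raise ValueError("no 2N-th primitive root of unity modulo Q")


def ntt(a, zeta, Q):
    """A_j = a(zeta^(2j+1)) mod Q, evaluated directly from the definition."""
    N = len(a)
    return [sum(a[i] * pow(zeta, (2 * j + 1) * i, Q) for i in range(N)) % Q
            for j in range(N)]


def intt(A, zeta, Q):
    """a_i = N^-1 * sum_j A_j * zeta^(-(2j+1)i) mod Q."""
    N = len(A)
    n_inv = pow(N, -1, Q)
    zeta_inv = pow(zeta, -1, Q)
    return [n_inv * sum(A[j] * pow(zeta_inv, (2 * j + 1) * i, Q) for j in range(N)) % Q
            for i in range(N)]


def schoolbook_negacyclic(a, b, Q):
    """Reference product in Z_Q[X]/(X^N + 1), using X^N = -1."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] = (c[i + j] + a[i] * b[j]) % Q
            else:
                c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % Q
    return c


N, Q = 8, 17                       # toy parameters with Q = 1 mod 2N (assumption)
zeta = find_root(N, Q)
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]

hadamard = [x * y % Q for x, y in zip(ntt(a, zeta, Q), ntt(b, zeta, Q))]
c_ntt = intt(hadamard, zeta, Q)
c_ref = schoolbook_negacyclic(a, b, Q)
```

The point of the sketch is only that the Hadamard product in the NTT domain agrees with the negacyclic product; a practical implementation would replace the direct evaluation by the O(N log N) butterflies.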

Sample Extraction
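We show that the sample extraction technique [CGGI16] can extract the LWE ciphertext for the constant term of the polynomial. Given an RLWE ciphertext $ct = (a, b) \in \mathrm{RLWE}^N_{s,Q}(m)$, it returns an LWE sample as
$$(\mathbf{a}', b') = ((a_0, -a_{N-1}, \ldots, -a_1), \ b_0) \in \mathrm{LWE}^N_{\mathrm{Coefs}(s),Q}(m_0).$$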
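A short Python sketch can confirm why the extracted vector has this shape: the constant coefficient of a·s in Z_Q[X]/(X^N + 1) equals a_0 s_0 minus the sum of a_{N-i} s_i for i ≥ 1. The example below is noiseless and unscaled so that the check is exact; the parameters and function names are illustrative assumptions, and the convention b = a·s + m is assumed.

```python
import random


def negacyclic_mul(a, b, Q):
    """Product in Z_Q[X]/(X^N + 1)."""
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] = (c[i + j] + a[i] * b[j]) % Q
            else:
                c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % Q
    return c


def sample_extract(ct, Q):
    """Extract an LWE sample of the constant coefficient from an RLWE pair (a, b)."""
    a, b = ct
    N = len(a)
    a_lwe = [a[0]] + [(-a[N - i]) % Q for i in range(1, N)]
    return a_lwe, b[0]


rng = random.Random(1)
N, Q = 8, 257                                   # toy parameters (assumption)
s = [rng.choice([0, 1]) for _ in range(N)]      # coefficient vector of the RLWE key
a = [rng.randrange(Q) for _ in range(N)]
m = [rng.randrange(Q) for _ in range(N)]        # noiseless, unscaled message polynomial

b = [(x + y) % Q for x, y in zip(negacyclic_mul(a, s, Q), m)]
a_lwe, b_lwe = sample_extract((a, b), Q)

# LWE phase b' - <a', Coefs(s)> should equal the constant coefficient m_0
phase = (b_lwe - sum(x * y for x, y in zip(a_lwe, s))) % Q
```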

Key Switching
Key switching procedure [DM15] is an important technique in FHE schemes, which can change the LWE dimension without changing the message.The procedure is described as follows.
• The key switching key generation algorithm takes secret keys z ∈ Z N , s ∈ Z n and a base B k as input, outputs • Given the key switching key ksk i,j and a ciphertext ct = (a, b) ∈ LWE N z,Q k (m), the key switching procedure computes the base B k expansion of each coefficient a i = j a i,j B j k , and outputs ct = KeySwitch(ct) ) for some a i,j,v ∈ Z n q and e i,j,v ∈ χ δ .We can obtain that a = − i,j a i,j,ai,j and b = b − a • z + a • s − i,j e i,j,ai,j .which is a new LWE ciphertext under the secret key s.And the variance of the noise satisfies Var(err(ct )) ≤ N d k • Var(err(ksk)) + Var(err(ct)).

Modulus Switching
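The modulus switching technique can change the modulus of the ciphertext [BGV14, DM15]. Taking as input a ciphertext $ct = (\mathbf{a}, b) \in \mathrm{LWE}^n_{\mathbf{s},Q}(m)$, the modulus switching algorithm outputs a ciphertext as
$$ct' = (\lfloor \tfrac{q}{Q} \cdot \mathbf{a} \rceil, \ \lfloor \tfrac{q}{Q} \cdot b \rceil) \in \mathrm{LWE}^n_{\mathbf{s},q}(m).$$
According to [DM15], the variance of the noise after switching consists of the input variance scaled by $(q/Q)^2$ plus a rounding term that depends on $\|\mathbf{s}\|_2^2$. The correctness of the algorithm is given in Appendix B.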
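The effect of modulus switching can be sketched in Python as follows. The example assumes deterministic rounding, a binary secret, a bounded uniform error, and the encoding (Q/t)·m; all parameters and function names are hypothetical choices for illustration.

```python
import random


def round_div(x, num, den):
    """Nearest integer to x * num / den, using integer arithmetic (ties round up)."""
    return (x * num + den // 2) // den


def mod_switch(ct, Q, q):
    """Scale every component of an LWE sample from modulus Q to modulus q."""
    a, b = ct
    return [round_div(ai, q, Q) % q for ai in a], round_div(b, q, Q) % q


def encrypt(m, s, Q, t, rng):
    a = [rng.randrange(Q) for _ in s]
    e = rng.randint(-8, 8)                      # small bounded error (assumption)
    b = (sum(ai * si for ai, si in zip(a, s)) + e + (Q // t) * m) % Q
    return a, b


def decrypt(ct, s, q, t):
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return ((phase * t + q // 2) // q) % t


rng = random.Random(2)
n, Q, q, t = 16, 2 ** 20, 2 ** 10, 4            # toy parameters (assumption)
s = [rng.choice([0, 1]) for _ in range(n)]
m = 3

ct_large = encrypt(m, s, Q, t, rng)
ct_small = mod_switch(ct_large, Q, q)
recovered = decrypt(ct_small, s, q, t)
```

With these parameters the rounding contributes at most (n + 1)/2 to the error, which stays far below the decryption threshold q/(2t) = 128.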

GINX Blind Rotation with Gadget Decomposition
Algorithm 1 GINX Blind Rotation with Gadget Decomposition.
We first present the GINX blind rotation algorithm with gadget decomposition. Given an LWE ciphertext $(\mathbf{a}, b)$ and n RGSW ciphertexts encrypting $(s_0, ..., s_{n-1})$, the blind rotation outputs a ciphertext $ct \in \mathrm{RLWE}(X^{-b + \sum_{i=0}^{n-1} a_i s_i})$ as shown in Algorithm 1. In detail, the loop in lines 3-6 performs a CMux gate. It is easy to see that if $s_i = 0$, the first term of the sum is disregarded since it encrypts 0. On the other hand, if $s_i = 1$, then $(acc \cdot X^{a_i}) \boxdot \mathrm{RGSW}(1)$ equals the current accumulator value. Thus, the accumulator is replaced with the ciphertext of $X^{a_i s_i} \cdot acc$. Furthermore, the CMux gate can be updated to the ternary CMux gate as shown in Section 4.
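The selection logic of the CMux gate can be checked on cleartext polynomials. The Python sketch below is a plaintext simulation only: no encryption, external product, or noise is involved, the secret is binary, and the function names and parameters are illustrative assumptions. It shows that iterating acc ← acc + s_i · (X^{a_i} · acc − acc) rotates the test polynomial by the exponent −b + Σ a_i s_i.

```python
def mul_monomial(poly, k):
    """Multiply poly by X^k in Z[X]/(X^N + 1) for any integer k."""
    N = len(poly)
    k %= 2 * N                     # X^(2N) = 1 in this ring
    out = [0] * N
    for i, c in enumerate(poly):
        j = (i + k) % (2 * N)
        if j >= N:                 # X^N = -1: wrap around with a sign flip
            out[j - N] = -c
        else:
            out[j] = c
    return out


def cmux_plain(acc, a_i, s_i):
    """Cleartext CMux: returns X^(a_i) * acc if s_i = 1, and acc if s_i = 0."""
    rotated = mul_monomial(acc, a_i)
    return [x + s_i * (r - x) for x, r in zip(acc, rotated)]


def blind_rotate_plain(tv, a, b, s):
    """Cleartext analogue of the GINX loop, starting from tv * X^(-b)."""
    acc = mul_monomial(tv, -b)
    for a_i, s_i in zip(a, s):
        acc = cmux_plain(acc, a_i, s_i)
    return acc


N = 8
tv = [1, 2, 3, 4, 5, 6, 7, 8]      # illustrative test polynomial
a, b = [3, 5, 7, 2], 6             # LWE sample with entries in Z_{2N}
s = [1, 0, 1, 1]                   # binary secret (assumption)

acc = blind_rotate_plain(tv, a, b, s)
exponent = -b + sum(a_i * s_i for a_i, s_i in zip(a, s))
expected = mul_monomial(tv, exponent)
```

In the actual algorithm, the multiplication by s_i is replaced by an external product with an RGSW encryption of s_i, which is where the NTT cost discussed in this paper arises.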

NTT Multiplication with Composite Modulus
In this section, we present an overview of the existing approaches employed in performing Number Theoretic Transform (NTT) with composite numbers, which include our construction as well.Furthermore, we undertake an extensive investigation into the theoretical principles behind these techniques and their suitability in diverse Homomorphic Encryption (HE) scenarios.

RNS Decomposition
A well-known technique is the Residue Number System (RNS), which uses the Chinese Remainder Theorem (CRT) to decompose multi-precision integers into vectors of NTT-friendly integers. It enables efficient operations using native (64-bit) integer types and reduces both the theoretical and practical computational overhead. More formally, for some distinct NTT-friendly moduli $Q_1, ..., Q_k$ with $Q = \prod_{i=1}^{k} Q_i$, the CRT yields an isomorphism
$$\mathrm{CRT}: \mathbb{Z}_Q \to \mathbb{Z}_{Q_1} \times \cdots \times \mathbb{Z}_{Q_k}, \qquad a \mapsto ([a]_{Q_1}, \ldots, [a]_{Q_k}).$$
Then one can perform the NTTs on the $[a]_{Q_i}$ coefficient-wise over the cyclotomic rings. Similar to the digit decomposition, the corresponding RNS vector is
$$\mathbf{g}_r = (\hat{Q}_1 [\hat{Q}_1^{-1}]_{Q_1}, \ldots, \hat{Q}_k [\hat{Q}_k^{-1}]_{Q_k}), \qquad \hat{Q}_i = Q/Q_i,$$
so that $\langle \mathrm{CRT}(a), \mathbf{g}_r \rangle \equiv a \bmod Q$. Furthermore, we denote the iCRT as
$$\mathrm{iCRT}(a_1, \ldots, a_k) = \sum_{i=1}^{k} a_i \cdot \hat{Q}_i [\hat{Q}_i^{-1}]_{Q_i} \bmod Q.$$
Note that some homomorphic operations, such as homomorphic multiplication in the CKKS scheme, need to switch this RNS basis to another RNS basis $P = P_1 \times \cdots \times P_l$ in the so-called fast basis extension technique that involves the iCRT process. Please refer to [CHK+19, KPZ21b] for more details.
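The CRT and iCRT maps can be sketched in a few lines of Python. The three moduli below are small primes congruent to 1 modulo 32, chosen only for illustration, and the function names are assumptions. The example checks that a product modulo Q can be computed component-wise on the residues.

```python
from math import prod


def crt(a, moduli):
    """Map a in Z_Q to its residue vector (a mod Q_1, ..., a mod Q_k)."""
    return [a % qi for qi in moduli]


def icrt(residues, moduli):
    """Inverse CRT: sum_i a_i * Qhat_i * [Qhat_i^-1]_{Q_i} mod Q."""
    Q = prod(moduli)
    total = 0
    for r, qi in zip(residues, moduli):
        q_hat = Q // qi
        total += r * q_hat * pow(q_hat, -1, qi)
    return total % Q


moduli = [97, 193, 257]            # illustrative NTT-friendly primes (assumption)
Q = prod(moduli)
x, y = 1234567, 7654321 % Q

# Multiply residue by residue, then recombine
residue_product = [(rx * ry) % qi
                   for rx, ry, qi in zip(crt(x, moduli), crt(y, moduli), moduli)]
recombined = icrt(residue_product, moduli)
direct = (x * y) % Q
```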

NTT-unfriendly Rings
NTT-unfriendly rings are rings whose parameters do not meet the requirements of Section 2.6. For example, the NIST PQC finalist Saber [DKRV19] utilizes a power-of-two modulus, which is inherently incompatible with the NTT algorithm. Chung et al. [CHK+21] present a technique to implement NTT on these rings, yielding better performance than the original schemes.
Specifically, the main idea entails elevating the polynomial ring to a larger one, where the modulus can cover the intermediate results of polynomial multiplication. Then, the NTT algorithm can be performed correctly on this ring directly. Before lifting, one should bound the maximum value of the product. When considering a modulus Q with centered coefficients, the magnitude of the coefficients resulting from the multiplication within the ring $R_Q$ does not surpass $NQ^2/4$. Thus, one can choose an NTT-friendly prime modulus $Q' > NQ^2/2$ or multiple coprime NTT-friendly prime moduli $p_i$ whose product satisfies $\prod_i p_i > NQ^2/2$. When using these moduli, the coefficients of the product will not be reduced during the polynomial multiplication, which guarantees the correctness of NTT-based multiplication in the larger ring. The subsequent processing can be summarized in the following three steps:
1. Lift polynomial coefficients on NTT-unfriendly Ring to one or multiple NTT-friendly Rings.
2. Perform NTT-based polynomial multiplications on the new NTT-friendly Rings.
3. Map the results back to the original NTT-unfriendly Ring by using either a modulo operation or inverse Chinese Remainder Theorem (iCRT).
Moreover, the methods of mixed-radix decomposition and Good's permutation are proposed by [CHK + 21] to deal with the case that polynomial dimension N does not satisfy the parameter requirements for NTT evaluation.Since we focus on the modulus, these details have been omitted.
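The three lifting steps can be sketched in Python for a single lifted prime. This is an illustration under assumptions: the ring dimension N = 8 is a toy value, the modulus Q = 2^13 mirrors a power-of-two setting, the NTT is evaluated directly in O(N^2) time, and every function name is hypothetical rather than taken from [CHK+21].

```python
def is_prime(p):
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True


def ntt_friendly_prime_above(bound, N):
    """Smallest prime p > bound with p = 1 mod 2N."""
    p = bound - bound % (2 * N) + 1
    while p <= bound or not is_prime(p):
        p += 2 * N
    return p


def find_root(N, p):
    """A 2N-th primitive root of unity modulo the prime p (N a power of two)."""
    for z in range(2, p):
        cand = pow(z, (p - 1) // (2 * N), p)
        if pow(cand, N, p) == p - 1:
            return cand
    raise ValueError("no root found")


def ntt(a, zeta, p):
    N = len(a)
    return [sum(a[i] * pow(zeta, (2 * j + 1) * i, p) for i in range(N)) % p
            for j in range(N)]


def intt(A, zeta, p):
    N = len(A)
    n_inv, z_inv = pow(N, -1, p), pow(zeta, -1, p)
    return [n_inv * sum(A[j] * pow(z_inv, (2 * j + 1) * i, p) for j in range(N)) % p
            for i in range(N)]


def center(x, mod):
    """Representative of x mod `mod` in (-mod/2, mod/2]."""
    x %= mod
    return x - mod if x > mod // 2 else x


def mul_via_lift(a, b, Q):
    N = len(a)
    p = ntt_friendly_prime_above(N * Q * Q // 2, N)       # step 0: pick the lifted modulus
    zeta = find_root(N, p)
    a_l = [center(x, Q) % p for x in a]                   # step 1: lift centered coefficients
    b_l = [center(x, Q) % p for x in b]
    prod_ntt = [x * y % p for x, y in zip(ntt(a_l, zeta, p), ntt(b_l, zeta, p))]
    c_p = intt(prod_ntt, zeta, p)                         # step 2: multiply in the NTT domain
    return [center(x, p) % Q for x in c_p]                # step 3: map back modulo Q


def schoolbook_negacyclic(a, b, Q):
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k, sign = (i + j, 1) if i + j < N else (i + j - N, -1)
            c[k] = (c[k] + sign * a[i] * b[j]) % Q
    return c


N, Q = 8, 2 ** 13                                         # toy NTT-unfriendly ring (assumption)
a = [(i * 1234 + 5678) % Q for i in range(N)]
b = [(i * i * 4321 + 99) % Q for i in range(N)]
c_lift = mul_via_lift(a, b, Q)
c_ref = schoolbook_negacyclic(a, b, Q)
```

Because the lifted prime exceeds twice the largest possible coefficient magnitude, the centered result modulo p is the exact integer product, and reducing it modulo Q gives the product in the original ring.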

NTT with Composite Modulus
In contrast to the aforementioned approaches, we delve into the mathematical essence of the NTT algorithm and explore the construction of a 2N-th primitive root of unity for a composite number $Q = \prod_i Q_i$. For a prime modulus Q, the NTT requirement that $Q \equiv 1 \pmod{2N}$ ensures the presence of the 2N-th primitive root of unity in $\mathbb{Z}_Q$. Consequently, the set $\{r^1, r^2, ..., r^{Q-1}\}$ forms a cyclic group denoted as $\mathbb{Z}_Q^*$, where r serves as the generator of this cyclic group with modulus Q. We can then set
$$\zeta = r^{(Q-1)/(2N)} \bmod Q, \qquad \text{so that} \quad \zeta^{2N} \equiv 1 \ \text{and} \ \zeta^{N} \equiv -1 \pmod Q,$$
where ζ is the 2N-th primitive root of unity in $R_Q$. So, this property makes it suitable for decomposing the modulo polynomial $X^N + 1$, as explained in Section 2.6.
However, if the modulus Q is a composite number, the generator r exists only if Q takes the form of 4, $p^k$, or $2p^k$, where p is an odd prime and k is a positive integer. It should be noted that using the primitive root directly to generate a 2N-th root that satisfies the NTT requirements is not possible. To solve this problem, Heinz et al. [HP22] propose a method to construct the root for the composite NTT, while applying it to the attack and defense of the measurement channel. Given two NTT-friendly numbers $Q_1, Q_2$ with $Q = Q_1 Q_2$, let $\zeta_{Q_1}$ and $\zeta_{Q_2}$ be the 2N-th primitive roots of unity for the polynomial rings $R_{Q_1}$ and $R_{Q_2}$, respectively. The method in [HP22] constructs the 2N-th primitive root of unity $\zeta_Q$ as
$$\zeta_Q = \mathrm{iCRT}(\zeta_{Q_1}, \zeta_{Q_2}) = \zeta_{Q_1} \cdot Q_2 \cdot [Q_2^{-1}]_{Q_1} + \zeta_{Q_2} \cdot Q_1 \cdot [Q_1^{-1}]_{Q_2} \bmod Q. \qquad (2)$$
In practice, Equation 2 is often replaced with a slightly more efficient formulation, and it is easy to verify that these two methods of constructing roots are equivalent. Then, we provide a complete proof of their approach.
Proof. Equation 2 uses the inverse Chinese Remainder Theorem to integrate the two roots $\zeta_{Q_1}$ and $\zeta_{Q_2}$, so we have
$$\zeta_Q \equiv \zeta_{Q_1} \pmod{Q_1}, \qquad \zeta_Q \equiv \zeta_{Q_2} \pmod{Q_2}.$$
Since $\zeta_{Q_1}$ and $\zeta_{Q_2}$ are the primitive roots of unity of the rings $R_{Q_1}$ and $R_{Q_2}$, respectively, the following equation holds
$$\zeta_{Q_i}^{N} \equiv -1 \pmod{Q_i}, \qquad \zeta_{Q_i}^{2N} \equiv 1 \pmod{Q_i}, \qquad i \in \{1, 2\}.$$
Therefore, by the CRT, we can derive that $\zeta_Q^{N} \equiv -1 \pmod Q$ and $\zeta_Q^{2N} \equiv 1 \pmod Q$, i.e., $\zeta_Q$ satisfies the property of being periodic and symmetric.
We extend this method to composite moduli consisting of multiple distinct NTT-friendly moduli, as illustrated in Algorithm 2. In this approach, we employ a binary tree technique to minimize the number of multiplications, instead of directly utilizing k − 1 iCRT operations. This optimization allows for more efficient computations and improved performance.

Algorithm 2 Construction of the 2N-th Root for Modulus Q
Run the CrootGen(1, k) function to obtain the 2N-th primitive root of unity ζ for modulus Q.
1: function CrootGen(w, v)
It is worth noting that the NTT and iNTT algorithms with composite modulus differ from traditional algorithms solely in their input. This implies that we can input the root ζ and its inverse into Algorithms 6 and 7, respectively, to perform the composite NTT and iNTT operations, collectively referred to as Com-NTT. This approach allows for the seamless integration of composite modulus into the NTT and iNTT computations, enhancing the flexibility and compatibility of the NTT algorithm.
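The root construction can be sketched in Python as follows. The recursive function `croot_gen` is a hypothetical binary-tree combination written in the spirit of Algorithm 2, not a transcription of it; the moduli, the dimension N = 16, and the direct O(N^2) transforms are illustrative assumptions. The sketch checks that the combined root satisfies ζ^N ≡ −1 modulo the composite Q and that a negacyclic product computed with it matches the schoolbook result.

```python
def find_root(N, q):
    """Smallest zeta with zeta^N = -1 modulo the prime q."""
    for z in range(2, q):
        if pow(z, N, q) == q - 1:
            return z
    raise ValueError("no root found")


def combine(z1, q1, z2, q2):
    """iCRT of two roots: the result equals z1 mod q1 and z2 mod q2."""
    z = (z1 * q2 * pow(q2, -1, q1) + z2 * q1 * pow(q1, -1, q2)) % (q1 * q2)
    return z, q1 * q2


def croot_gen(roots, moduli, lo, hi):
    """Combine roots for moduli[lo:hi] along a balanced binary tree (hypothetical helper)."""
    if hi - lo == 1:
        return roots[lo], moduli[lo]
    mid = (lo + hi) // 2
    z_left, q_left = croot_gen(roots, moduli, lo, mid)
    z_right, q_right = croot_gen(roots, moduli, mid, hi)
    return combine(z_left, q_left, z_right, q_right)


def ntt(a, zeta, Q):
    N = len(a)
    return [sum(a[i] * pow(zeta, (2 * j + 1) * i, Q) for i in range(N)) % Q
            for j in range(N)]


def intt(A, zeta, Q):
    N = len(A)
    n_inv, z_inv = pow(N, -1, Q), pow(zeta, -1, Q)
    return [n_inv * sum(A[j] * pow(z_inv, (2 * j + 1) * i, Q) for j in range(N)) % Q
            for i in range(N)]


def schoolbook_negacyclic(a, b, Q):
    N = len(a)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k, sign = (i + j, 1) if i + j < N else (i + j - N, -1)
            c[k] = (c[k] + sign * a[i] * b[j]) % Q
    return c


N = 16
moduli = [97, 193, 257]                        # primes with q = 1 mod 2N (assumption)
roots = [find_root(N, q) for q in moduli]
zeta, Q = croot_gen(roots, moduli, 0, len(moduli))

a = [(7 * i + 3) % Q for i in range(N)]
b = [(i * i + 11) % Q for i in range(N)]
prod_ntt = [x * y % Q for x, y in zip(ntt(a, zeta, Q), ntt(b, zeta, Q))]
c_ntt = intt(prod_ntt, zeta, Q)
c_ref = schoolbook_negacyclic(a, b, Q)
```

Note that the transform code is identical to the prime-modulus case; only the root and the modulus change, which is consistent with the remark that the composite NTT differs from the traditional one solely in its input.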

Applications
These methods for polynomial multiplication with composite modulus are aimed at different application scenarios for HE schemes.The RNS technique is commonly used in BGV, BFV, and CKKS schemes.By incorporating the RNS technique, these schemes can effectively deal with larger moduli while improving computational efficiency.RNS variants have emerged as the preferred choice in practical implementations, featured in software libraries like SEAL [SEA22] and OpenFHE [BBB + 22].
Compared to RNS decomposition, the composite NTT approach provides a more flexible strategy.On the one hand, it is compatible with the RNS decomposition.In addition, it can directly perform the Number Theoretic Transform (NTT) on the composite modulus.In general, the latter option is more suitable for HE operations within a 64-bit ciphertext modulus.For instance, the external product is improved in FHEW-like schemes, as demonstrated in Section 4.
The method described in [CHK + 21] applies to predetermined ring parameters, including the dimension and modulus of the polynomial.For instance, the original CKKS scheme [CKKS17] uses the ciphertext modulus q = p l to reduce the error resulting from the rescaling operation.The implementation in the HEAAN library adopts the strategy outlined in [CHK + 21], which lifts the ciphertext modulus to utilize the RNS-based NTT implementation.In summary, depending on various scenarios and application requirements in the homomorphic encryption scheme, it is possible to identify suitable parameter settings and NTT strategies that effectively accelerate the underlying polynomial operations.

Faster FHEW-like Bootstrapping with Modulus Raising
The external product and blind rotation in the FHEW-like scheme use exact gadget decomposition to reduce noise, which means that only digit decomposition is utilized to reduce the error growth.We introduce the modulus raising technique into the FHEW-like schemes and then propose a hybrid method for external products by integrating digit decomposition, RNS decomposition, and modulus raising.These methods improve the efficiency of blind rotation and bootstrapping.Finally, we show the new technique to bootstrap two LWE ciphertexts in one blind rotation.

External product with modulus raising
Instead of using a single ciphertext modulus in RGSW ciphertext, our method involves the composite number consisting of multiple NTT-friendly moduli.More precisely, the ciphertext modulus of RGSW is set to the composite number P Q, where P Q ≈ Q g .Sample two RLWE ciphertexts {ct 0 , ct 1 } ∈ RLWE s,P Q (0) and the RGSW ciphertext takes the following form: Definition 1. (Variant External Product).We define variant external product as M , which is performed between ct = (a, b) The error generated in M is e = a•e0+b•e1 P + e + e round , where e 0 and e 1 are the error terms of the RGSW ciphertext, e is the error term of the RLWE ciphertext, and e round is the error caused by the rounding operation.And the variance of e is (3) where the factor 1 12 corresponds to the variance of discrete uniform distribution in the range [−1/2, 1/2] and the ternary secret key distribution ensures that ||s|| 2 ≤ N/2.
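The role of the division by P can be illustrated with a scalar toy analogue in Python. This is not the external product of Definition 1: it replaces polynomials and ciphertexts by single integers and only shows that multiplying a value x by a noisy encoding of P·m modulo PQ and then dividing by P with rounding yields x·m modulo Q with an error of roughly x·e_0/P instead of x·e_0. The moduli, the noise value, and the function names are illustrative assumptions.

```python
def center(x, mod):
    """Representative of x mod `mod` in (-mod/2, mod/2]."""
    x %= mod
    return x - mod if x > mod // 2 else x


def round_div(x, d):
    """Nearest integer to x / d for an integer x and positive d (ties round up)."""
    return (x + d // 2) // d


Q, P = 7681, 12289          # illustrative moduli; P acts as the temporary modulus
m, e0 = 5, 3                # message and a small noise term (assumption)
x = 3000                    # value to be multiplied, already centered modulo Q

# Noisy encoding of P * m under the raised modulus PQ
key = (P * m + e0) % (P * Q)

# Multiply under PQ, then divide by P with rounding and reduce modulo Q
raised = (center(x, Q) * key) % (P * Q)
result = round_div(raised, P) % Q

# Error relative to the exact product x * m mod Q
error = center(result - x * m, Q)
```

Without the scaling by P, the noise term would be multiplied by x directly; after the division it is reduced by a factor of P, at the cost of a small rounding error.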

Bootstrapping Procedure with Modulus Raising and Composite NTT
Then, we show the FHEW-like bootstrapping procedure based on the external product with modulus raising.For a ternary LWE secret key s ∈ {−1, 0, 1} n , our bootstrapping key generation process [BIP + 22] is described as follows.
(4) Note that the bootstrapping keys are precomputed and stored in the NTT representation, and can be reused in the bootstrapping process.
A key switching key ksk s (s i ) as shown in Section 2.7.A test polynomial tv embedding a look-up table f .
Algorithm 3 presents the improved bootstrapping algorithm for the GINX method by using modulus raising.The procedure begins with an LWE ciphertext as usual.In line 4 of the algorithm, let is a ternary CMux gate, and it is easy to verify that acc i M CT M ux,i yields the ciphertext acc i = RLWE(X ai•si ), where 1 is a noiseless RGSW ciphertext.In more detail, the accumulator acc is viewed as two polynomials with modulus P Q, and performs the composite NTTs.To reduce the number of NTT transformations and rounding operations, we generate beforehand a table containing all NTT representations of X i − 1 with modulus P Q, where 0 ≤ i ≤ 2N − 1. Subsequently, we utilize the ciphertext a i to retrieve the corresponding NTT representation for X ai − 1 and X −ai − 1, enabling direct Hadamard multiplication with bootstrapping keys.However, the division operation needs to be performed in the coefficient representation, which involves two iNTT operations.By utilizing n external products, we can get the ciphertext as Note that the test vector tv serves two purposes.It not only refreshes the noise but also embeds a lookup table f : Z t → Z t by defining the coefficients of the polynomial as follows: This method, known as functional bootstrapping, is described in [CJP21,KS21] schemes.Then, we can get an LWE ciphertext LWE N Coefs(s),Q (f (m)) through the sample extraction operation.Finally, the key switching and modulus switching operations are performed to obtain the ciphertext ct = LWE n s,q (f (m)) ∈ Z n+1 q , which completes the entire bootstrapping procedure.
Remark 4.1. Note that the RNS decomposition method can also be utilized for polynomial multiplication in the external product. However, this approach requires additional NTT operations, as explained in Appendix C. The computational complexity of blind rotation for Algorithms 1, 3, and 8 is outlined in Table 2.

Error analysis
In this subsection, we analyze the variance of error for Algorithm 3. Firstly, the error growth in blind rotation is caused by a sequence of n external products i.e., acc n−1 i=0 CT M ux = (...((acc 0 CT M ux,0 ) CT M ux,1 )... CT M ux,n−1 ).Thus, we can obtain variance by using n times Equation 3as Due to the fact that Var(err • Var(err(bsk)) ≤ 4 • Var(err(bsk)) and the initial RLWE is noise-free, i.e., Var(err(acc 0 )) = 0. We can obtain the variance from the blind rotation as Then, we can get the variance of the error from the key switching operation as Finally, after modulus switching, we can conclude that the variance of the error generated by the bootstrapping process is Compared to the gadget decomposition, our method introduces an additional error that is derived from the rounding operation.In Section 5, we demonstrate that the error is negligible for the decryption failure rate.

Hybrid External Product and Blind Rotation
In Definition 1, we only use the modulus switch to reduce noise growth.Following that, we present a hybrid external product operation in Definition 2. Firstly, we show a hybrid approach based on digit decomposition and modulus raising.Given a modulus Q and the base B g , we can denote the gadget matrix G as , where .
Definition 2. (Hybrid External Product).We define the hybrid external product as H , (5) The error generated by H is e = , and its variance is Afterward, Algorithm 4 shows the second improved GINX blind rotation algorithm using the hybrid external product.The gadget decomposition and modulus switching are utilized to reduce error growth in lines 3 and 7, respectively.The correctness of the algorithm can be directly derived from the new external product.Compared to the case of d g > 2 for Algorithm 1 with the ternary secret key, the new blind rotation algorithm involves a lesser number of NTT operations.
Algorithm 4 GINX Blind Rotation with Hybrid External Product.

Input: An LWE sample $ct = (a, b) \in \mathrm{LWE}^{n}_{s,q}(m)$, where $q \mid 2N$. A blind rotation key bsk for the secret $s$ as shown in Equation 4 using hybrid RGSW.
Output:

Remark 4.2. We remark that the digit decomposition in Equation 5 can be replaced by RNS decomposition with the same number of NTTs and Hadamard multiplications. To be specific, we can select $d_g$ NTT-friendly numbers $Q_i$ and build from them the gadget matrix $G$ associated with the RNS decomposition. Thus, the external product can perform the RNS decomposition on the accumulator acc and then use composite NTTs to compute the subsequent Hadamard multiplications. Table 3 shows the computational complexity of blind rotation for Algorithms 1 and 4.
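As a rough illustration of the saving, the following Python sketch evaluates the two NTT counts quoted in the abstract, $2(d_g + 1)n$ for plain gadget decomposition and $2(\lceil d_g/2 \rceil + 1)n$ for the proposed approach. The function names and the sample dimension `n = 512` are assumptions made for this example only; the exact per-algorithm counts are those given in the tables.

```python
def ntts_gadget(d_g: int, n: int) -> int:
    """NTT count for blind rotation with gadget decomposition: 2(d_g + 1)n."""
    return 2 * (d_g + 1) * n


def ntts_hybrid(d_g: int, n: int) -> int:
    """NTT count with the halved decomposition length: 2(ceil(d_g / 2) + 1)n."""
    half_len = (d_g + 1) // 2  # integer form of ceil(d_g / 2)
    return 2 * (half_len + 1) * n


if __name__ == "__main__":
    n = 512  # hypothetical LWE dimension, chosen only for illustration
    for d_g in (2, 3, 4):
        print(d_g, ntts_gadget(d_g, n), ntts_hybrid(d_g, n))
```

For $d_g = 4$ the count drops from $10n$ to $6n$, which is where the gain for larger decomposition lengths comes from.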

Packing Bootstrapping with Composite NTT
We show that the proposed composite NTT technique can support packing bootstrapping procedures. To simplify our algorithm, we use binary keys in this section. Details of this subroutine are given in Algorithm 5. Specifically, when packing $l$ bootstrapping procedures, we first choose $l$ sets of parameters $P_k Q_k$. Given $l$ blind rotation keys $\mathrm{brk}_{P_k Q_k}$ generated as in Section 4.2, we can precompute the new blind rotation key using iCRT (Equation 6).

Algorithm 5 Packing Bootstrapping using CRT and composite NTT.
The blind rotation key $\mathrm{brk}_{PQ}$ as shown in Equation 6.
Output:

To reduce the cost of the expensive external product in the CMux gates of the $l$ blind rotation processes, we use the CRT to merge the $l$ accumulators $\mathrm{acc}_k$ into the polynomial ring $R_{PQ}$. Subsequently, we use the composite NTT to perform a single external product operation. Here, we need $2d_g + 2$ NTTs and $4d_g$ Hadamard multiplications in the polynomial ring $R_{PQ}$ in lines 5-7 of the algorithm. To achieve this, we incur additional operations, namely the CRT and iCRT processes. Furthermore, we use gadget decomposition for each LWE ciphertext to reduce noise and advance to the next iteration. The detailed analyses of correctness and noise growth are omitted, as they follow directly from the properties of the CRT.
It is worth noting that the number of bootstrapping procedures that the proposed algorithm can pack is directly related to the machine word length of the experimental platform. Typically, on a CPU with 64-bit words, we achieve the maximum gain by packing two bootstrapping procedures together, which improves overall performance.
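The CRT idea behind the packing can be illustrated with plain integers. The Python sketch below merges two residues into one value modulo the product of two coprime moduli, performs a single multiplication there, and recovers both individual products. The moduli `m1` and `m2` are toy stand-ins for $P_1Q_1$ and $P_2Q_2$, and the scalars stand in for single NTT-domain coefficients; none of these values come from the paper's parameter sets.

```python
def crt_pack(x1: int, x2: int, m1: int, m2: int) -> int:
    """Return the unique x in [0, m1*m2) with x = x1 (mod m1) and x = x2 (mod m2).

    Requires gcd(m1, m2) == 1. pow(m1, -1, m2) is the inverse of m1 modulo m2
    (available in Python 3.8 and later).
    """
    x1 %= m1
    k = ((x2 - x1) * pow(m1, -1, m2)) % m2
    return x1 + m1 * k


def crt_unpack(x: int, m1: int, m2: int) -> tuple:
    """Split a residue modulo m1*m2 back into its two components."""
    return x % m1, x % m2


m1, m2 = 257, 769            # toy coprime moduli (assumption for this sketch)
a = crt_pack(100, 200, m1, m2)   # packs the first operands of both instances
b = crt_pack(7, 9, m1, m2)       # packs the second operands of both instances
product = (a * b) % (m1 * m2)    # one multiplication in the composite modulus
print(crt_unpack(product, m1, m2))  # both products, each reduced by its own modulus
```

One multiplication modulo `m1 * m2` replaces two separate multiplications, at the price of the packing and unpacking steps, which mirrors the trade-off described for Algorithm 5.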

AP Blind Rotation. The AP blind rotation [DM15, ASP14] supports arbitrary secret key distributions. The idea is to decompose the LWE ciphertext and extract the associated blind rotation keys, which are then accumulated through external products. Our technique can improve the AP blind rotation by using the external products $\boxdot_M$ and $\boxdot_H$. Given an LWE secret key $s \in \mathbb{Z}_q^n$, the AP blind rotation key $\mathrm{bsk}_{i,j,v}$ is generated for $i \in [0, n-1]$, $j \in [0, \log_{B_r} q - 1]$, and $v \in \mathbb{Z}_{B_r}$. The decomposition base $B_r \ge 2$ offers a tradeoff between space and computational complexity. For an LWE ciphertext, we decompose each term $a_i$ as $a_i = \sum_{j=0}^{\log_{B_r} q - 1} a_{i,j} B_r^{j}$, and the accumulator acc is updated for all $a_{i,j}$ as $\mathrm{acc} = \mathrm{acc} \boxdot_M \mathrm{bsk}_{i,j,a_{i,j}}$.
The detailed algorithm is described in Appendix D.
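The digit decomposition that drives the AP accumulation can be sketched in Python as follows. The sketch works on plaintext exponents only: it assumes, as in the original FHEW construction, that the key indexed by $(i, j, v)$ encodes the exponent $v \cdot B_r^{j} \cdot s_i \bmod q$, and it models each external product as an addition of exponents. The function names and toy parameters are hypothetical.

```python
def digits(a: int, base: int, q: int) -> list:
    """Base-`base` digits of a in [0, q), least significant first.

    The number of digits is the smallest d with base**d >= q, so that
    a == sum(digit_j * base**j).
    """
    out = []
    bound = 1
    while bound < q:
        out.append(a % base)
        a //= base
        bound *= base
    return out


def ap_accumulated_exponent(a: list, s: list, base: int, q: int) -> int:
    """Plaintext model of the AP accumulation: returns sum_i a_i * s_i mod q."""
    total = 0
    for a_i, s_i in zip(a, s):
        for j, v in enumerate(digits(a_i % q, base, q)):
            # Stand-in for selecting bsk[i][j][v]; here only its exponent is used.
            key_exponent = (v * base**j * s_i) % q
            total = (total + key_exponent) % q
    return total


q, base = 512, 8                      # toy modulus and decomposition base
a = [300, 17, 511]                    # toy LWE mask
s = [3, 510, 1]                       # toy secret in Z_q
print(ap_accumulated_exponent(a, s, base, q))
```

A larger base means fewer digits per $a_i$ (fewer external products) but more keys per position, which is the space and time tradeoff mentioned above.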
LMK Blind Rotation. The LMK blind rotation [LMK+23] improves the AP method for efficiently supporting arbitrary secret key distributions by using ring automorphisms and RLWE-based key switching in the FHEW cryptosystem. A high-level overview of their solution is as follows.
Given a current accumulator $\mathrm{RLWE}(g_{i-1}(X))$, one can use an automorphism to obtain an encryption of $g_{i-1}(X^{a_i^{-1}})$. Then, with the blind rotation key $\mathrm{RGSW}(X^{s_i})$, the external product yields a ciphertext of $g_{i-1}(X^{a_i^{-1}}) \cdot X^{s_i}$. Finally, the ring automorphism $\psi_{a_i}: X \to X^{a_i}$ is applied to get the accumulator that encrypts $g_{i-1}(X) \cdot X^{a_i s_i}$. Repeating this process $n$ times yields the final accumulator. During this process, the automorphisms $\psi_a$ exist only for odd values of $a$ due to the power-of-two cyclotomic setting. Their work introduces solutions to this restriction and optimizes the algorithm by reordering the secret key.
In bootstrapping, key switching is necessary to transform the ciphertext back into an encryption under the original key after the automorphism operation. The proposed external product technique can be used to accelerate both the external product and the key-switching operations. We omit the algorithm and refer the reader to [LMK+23] for the detailed process.
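The automorphism step can be checked on plaintext polynomials. The Python sketch below works in $\mathbb{Z}_Q[X]/(X^N + 1)$ with toy values of $N$ and $Q$ (assumptions for this example) and verifies that applying $\psi_{a^{-1}}$, multiplying by $X^{s}$, and applying $\psi_{a}$ has the same effect as multiplying by $X^{a s}$. No encryption is involved; the functions only model what happens to the underlying message.

```python
def automorphism(poly: list, k: int, modulus: int) -> list:
    """Apply X -> X^k (k odd) in Z_modulus[X]/(X^n + 1), n = len(poly)."""
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        e = (i * k) % (2 * n)          # X has multiplicative order 2n
        if e < n:
            out[e] = (out[e] + c) % modulus
        else:                          # X^n = -1 flips the sign
            out[e - n] = (out[e - n] - c) % modulus
    return out


def mul_monomial(poly: list, e: int, modulus: int) -> list:
    """Multiply by X^e in Z_modulus[X]/(X^n + 1)."""
    n = len(poly)
    out = [0] * n
    for i, c in enumerate(poly):
        t = (i + e) % (2 * n)
        if t < n:
            out[t] = c % modulus
        else:
            out[t - n] = (-c) % modulus
    return out


N, Q = 8, 97                           # toy ring parameters
g = [1, 2, 3, 4, 5, 6, 7, 8]           # toy accumulator message g(X)
a, s = 5, 1                            # odd mask coefficient and secret bit
a_inv = pow(a, -1, 2 * N)              # inverse of a modulo 2N

step1 = automorphism(g, a_inv, Q)      # g(X^{a^{-1}})
step2 = mul_monomial(step1, s, Q)      # g(X^{a^{-1}}) * X^{s}
lhs = automorphism(step2, a, Q)        # apply psi_a
rhs = mul_monomial(g, a * s, Q)        # g(X) * X^{a*s}
print(lhs == rhs)
```

The inverse `a_inv` exists only when `a` is odd, which reflects the restriction on $\psi_a$ in the power-of-two cyclotomic setting.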

NTRU-based External product
Our technique can also be applied to the NTRU-based external product and bootstrapping procedure. In particular, Bonte et al. [BIP+22] construct an NTRU-based GSW-like ciphertext (NGS), where the NGS ciphertext is represented as a vector of polynomials and the external product is performed by gadget decomposition. We can improve the NTRU-based external product by using techniques similar to those in Section 4.2, which further improves the efficiency of NTRU-based bootstrapping. However, in practice, parameters must be selected carefully because of the dense sublattice attack, which restricts the modulus size of the NTRU problem to below $n^{2.484+o(1)}$ [DvW21].

Parameters and Performance
This section presents an analysis of the proposed scheme, including parameter setting, error growth, and decryption failure rates. Additionally, we compare the bootstrapping experimental results of the proposed method with those of the gadget decomposition. This analysis shows the respective strengths and weaknesses of these methods and indicates which method is more suitable for different parameters and scenarios.

Parameters and Noise Growth
The proposed algorithm in Section 4 works with the following parameters:
- $P$, modulus used in the RGSW sample and external product operation.

Table 4 presents the specific parameters used in the implementation with the modulus raising and hybrid methods. The secret keys are selected from binary and ternary distributions, which are employed in the TFHE and FHEW schemes. We choose the numbers $PQ$ that satisfy the requirements for the composite NTT as shown in Section 3.3.
Table 5 displays the specific parameters for gadget decomposition and key-switching. The two columns labeled $B_g$ and $d_g$ are used in the hybrid external product and correspond to the parameter sets B_1024_27 and T_1024_27 in Table 4. The remaining parameters are used in the external product based on gadget decomposition and in key-switching, as provided by the [MP21] scheme. Note that the bootstrapping procedure with gadget decomposition entails an additional step of modulus switching to a smaller modulus $Q$ before performing the key switching operation.
- $Q_g$, RLWE ciphertext modulus for gadget decomposition;
- $B_g$, gadget base for RGSW encryption, which breaks the integer $Q_g$ into $d_g$ digits;
- $B_k$, gadget base for key switching, which breaks the integer $Q_g$ into $d_k$ digits.

Table 5: Bootstrapping parameters for the gadget decomposition and key-switching

The security level of HE schemes is determined by several factors, including the secret key distribution, the dimensions and modulus of the (R)LWE sample, and the standard deviations of the error according to the HE standard [ACC+18]. We use the LWE estimator [APS15] to estimate the security level, which calculates the complexity of primal attacks via the shortest vector problem, decoding, and dual-lattice attacks. Table 6 provides the cost of specific attacks using the BKZ.sieve cost model.
Afterward, we analyze the error growth and decryption failure probability of these methods under different parameters. Specifically, the bootstrapping procedure results in a ciphertext whose error follows a Gaussian distribution, where $\sigma^2_{ACC}$ plays a prominent role in determining the overall error magnitude. We compare the error growth generated by blind rotation for the three methods with the ternary secret key distribution. We note that the modulus raising and hybrid methods introduce additional noise terms due to the rounding operation. However, they use a smaller decomposition length $d'_g = \lceil d_g/2 \rceil$, where $d'_g = 1$ is used in the modulus raising method. The comparison is carried out for the ternary-key parameter sets T_1024_36, T_1024_27, and T_2048_50 using the values in Tables 4 and 5. These comparison results give the intuition that the ratio of noise is so small as to be negligible, which can also be verified in the decryption failure rate. Specifically, in the experiments, we evaluate one homomorphic addition for the NAND gate in bootstrapping, where $t = 4$, and the probability of decryption failure is calculated with a formula based on erf, the Gauss error function. Table 7 presents the decryption failure rate for parameter sets T_1024_36, T_1024_27, and T_2048_50 with the ternary secret key distribution.
From the table, we can see that the error gap has almost no effect on the decryption failure rate. Finally, we also performed a large number of experiments on LWE samples with various key distributions, and the experimental results were consistent with the theoretical analysis.
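To make the role of erf concrete, the following Python sketch computes the probability that a zero-mean Gaussian error with standard deviation `sigma` exceeds a decoding margin `bound` in absolute value, which equals $1 - \mathrm{erf}(\mathrm{bound} / (\sqrt{2}\,\sigma))$. Both `sigma` and `bound` are left as inputs and are assumptions of this sketch; the concrete margin and standard deviations used for Table 7 are those of the scheme's own formula and parameter sets.

```python
from math import erfc, sqrt


def failure_probability(sigma: float, bound: float) -> float:
    """P(|e| > bound) for e ~ N(0, sigma^2).

    Equals 1 - erf(bound / (sqrt(2) * sigma)); erfc is used because it stays
    accurate when the probability is far below machine epsilon.
    """
    return erfc(bound / (sqrt(2.0) * sigma))


if __name__ == "__main__":
    # Hypothetical values: a margin of 8 standard deviations versus 8.1.
    print(failure_probability(1.0, 8.0))
    print(failure_probability(1.0, 8.1))
```

Because the tail probability falls off very quickly in `bound / sigma`, a small relative increase in the error variance changes the failure rate only slightly when the margin is already many standard deviations wide.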

Key Sizes
In this subsection, we analyze the key sizes for the aforementioned methods. Table 8 lists the sizes of the blind rotation keys for both gadget decomposition and modulus raising. The sizes of the key switching keys are omitted from the table since they are the same for all the methods. Across all parameter sets, the modulus raising and composite NTT techniques reduce the key size by 50% compared to the gadget decomposition. This outcome is particularly promising for hardware acceleration, as the binary key distribution requires only 9 MB of key storage.

Implementation and Experiment performance
First, we compare the number of NTTs and Hadamard multiplications in bootstrapping for different decomposition lengths $d_g$ in Figure 1. In addition, we implement Algorithms 3 and 4. The evaluation environment was a commodity desktop computer with an Intel(R) Core(TM) i5-12500 CPU @ 3.00GHz and 64 GB of RAM, running Ubuntu 22.04.2 LTS. The compiler was g++ 11.3.0. The results of the identity bootstrapping evaluation are presented in Table 9, and each result is an average of 5000 executions.
The proposed method demonstrates a significant improvement in runtime compared to the gadget decomposition technique. In particular, we achieve speedups of around 1.5× and 1.7× under the specific sets of parameters, respectively, which is consistent with the expected results described in Section 4 and Figure 5.3. After that, we show the gain obtained when using our method in AP-based and LMK-based blind rotation at the 127-bit security level, as shown in Table 10. Table 11 presents the experimental results for the packing method shown in Algorithm 5. Under the B_1024_27 parameter setting, the packing approach achieves efficiency gains of approximately 2.6× and 1.5× when compared to Algorithms 1 and 4 for two bootstrappings, respectively. This substantial improvement is significant for the practical application of FHEW-like schemes. Finally, we note that the proposed algorithms cannot be applied to the FFT-based TFHE scheme. In terms of time performance, the TFHE scheme incorporates optimizations and accelerations based on AVX instructions, as seen in libraries such as TFHE-lib [CGGI20] and TFHE-rs [BSJJ22]. Generally, AVX-512 instructions can potentially deliver a speedup of 3 to 4 times compared to the baseline performance in C++. One direction for future work is to use AVX-512 instructions to accelerate the proposed algorithms and further optimize the performance of our scheme.

Conclusion
In this paper, we propose a faster bootstrapping procedure for FHEW-like schemes. By introducing a composite NTT technique, we integrate gadget decomposition and modulus raising into the external product operation, which reduces the number of NTTs required in the blind rotation process. Furthermore, we introduce a packing method that can bootstrap two LWE ciphertexts using a single blind rotation process based on the composite NTT. The implementation results of the proposed algorithms show gains in both efficiency and key size. Our methods have the potential to improve FHEW-like schemes and can be applied in practical scenarios.

B Algorithm for modulus switching
We show the modulus switching algorithm as follows.

Proof. Let the integers satisfy $Q > q > t$. The output ciphertext is obtained by scaling $ct$ by $q/Q$ and rounding. By checking the decryption function, the phase of the output ciphertext equals the scaled message plus $\frac{q}{Q} \cdot e$ plus rounding terms in $r$ and $\langle \mathbf{r}, s \rangle$, where $r \in \mathbb{R}$ and the entries of $\mathbf{r} \in \mathbb{R}^n$ lie in $[-1/2, 1/2]$. According to the central limit heuristic, the error is close to a Gaussian distribution, and its variance $\mathrm{Var}(\mathrm{err}(ct'))$ is bounded by $(\frac{q}{Q})^2 \cdot \mathrm{Var}(\mathrm{err}(ct))$ plus a rounding term, where the factor $\frac{1}{12}$ in that term is the variance of a uniform distribution in $[-1/2, 1/2]$.

Algorithm 8 shows the blind rotation using modulus raising and RNS techniques. To conduct polynomial multiplication between $R_Q$ and $R_{PQ}$, the conventional approach involves decomposing the bootstrapping key polynomials with modulus $PQ$ into $R_Q$ and $R_P$ using the CRT, as demonstrated in Section 3.1. This decomposition process doubles the number of NTTs and Hadamard multiplications required. Moreover, similar to gadget decomposition, the division operation must be performed on the coefficient representation, which also requires iNTT operations.

D AP Bootstrapping with Modulus Raising
We show the AP bootstrapping with modulus raising in Algorithm 9. Moreover, similar to Section 4.2, the hybrid external product method can also be used in this algorithm, where $B_r$ is the decomposition base for the LWE ciphertext.
    end for
10: end for
11: return acc.

Lemma 2. Given an LWE ciphertext $ct = (a, b) \in \mathrm{LWE}^{n}_{s,Q}(m)$ with error variance $\mathrm{Var}(\mathrm{err}(ct))$, the modulus switching operation outputs a new LWE ciphertext $ct'$ with error variance $\mathrm{Var}(\mathrm{err}(ct'))$.
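The modulus switching operation of the lemma can be sketched in Python on a toy LWE instance. The sketch scales every ciphertext component by $q/Q$ and rounds to the nearest integer, then checks that decryption under the smaller modulus still recovers the message. The dimension, moduli, error bound, and helper names are assumptions chosen so that the rounding error stays well inside the decoding margin; they are not the paper's parameters.

```python
import random


def round_div(num: int, den: int) -> int:
    """Nearest integer to num / den for den > 0 (ties round up), integers only."""
    return (2 * num + den) // (2 * den)


def mod_switch(ct: tuple, Q: int, q: int) -> tuple:
    """Switch an LWE ciphertext (a, b) from modulus Q to modulus q."""
    a, b = ct
    return [round_div(q * x, Q) % q for x in a], round_div(q * b, Q) % q


def encrypt(m: int, s: list, Q: int, t: int, rng: random.Random) -> tuple:
    """Toy LWE encryption: b = <a, s> + (Q / t) * m + e mod Q."""
    a = [rng.randrange(Q) for _ in s]
    e = rng.randint(-8, 8)  # small bounded error, for illustration only
    b = (sum(x * y for x, y in zip(a, s)) + (Q // t) * m + e) % Q
    return a, b


def decrypt(ct: tuple, s: list, q: int, t: int) -> int:
    """Recover m by rounding t * (b - <a, s>) / q modulo t."""
    a, b = ct
    phase = (b - sum(x * y for x, y in zip(a, s))) % q
    return round_div(t * phase, q) % t


rng = random.Random(0)
Q, q, t, n = 1 << 20, 1 << 10, 4, 16     # toy parameters
s = [rng.randrange(2) for _ in range(n)]  # binary secret key
results = [decrypt(mod_switch(encrypt(m, s, Q, t, rng), Q, q), s, q, t)
           for m in range(t)]
print(results)
```

With a binary secret of dimension 16, the rounding terms contribute at most $(1 + 16)/2$ to the error, far below the margin $q/(2t) = 128$, so decryption succeeds for every message in this toy setting.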

Table 1 :
Comparison of GINX blind rotation with different external products, where GD is the gadget decomposition, MR is the modulus raising, and HY is the hybrid method. In the operation counts, NTT is the Number Theoretic Transform, HM is the Hadamard multiplication, and GDP and DRP are the gadget decomposition and the division and rounding for polynomials, respectively, where $d_g$ and $d'_g$ ($d_g > d'_g$) are the lengths of the gadget decomposition.
Number Theoretic Transform (NTT), Fast Fourier Transform (FFT), and Toom-Cook multiplication can be used to efficiently perform polynomial multiplication. The NTT algorithm requires $Q \equiv 1 \pmod{2N}$, which guarantees the existence of a primitive $2N$-th root of unity. Regarding other NTT-unfriendly rings, Chung et al. [CHK+21] propose lifting the polynomial ring to a larger NTT-friendly ring that covers all results without modular reduction. Compared to Toom-Cook multiplication, this method can improve the efficiency of unprotected Saber implementations on the Cortex-M4. After that, Abdulrahman et al. [ACC+21] note that the implementation of [CHK+21] has a large memory footprint. The method of [ACC+21] uses multi-moduli NTTs to enable a very stack-efficient implementation that is competitive in memory usage. In terms of composite NTT, Heinz et al. [HP22

Table 2 :
Comparison of blind rotation for different algorithms under the ternary secret key, where the length is set to $d_g = 2$ for gadget decomposition in Algorithm 3.

Table 3 :
Comparison of blind rotation for Algorithms 1 and 4 with the ternary secret key, where $d_g$ and $d'_g$ are the decomposition lengths for the gadget decomposition and the hybrid method, respectively, with $d_g > 2$ and $d'_g = \lceil d_g/2 \rceil$.
Ciphertext modulus for the RLWE sample and key-switching;

Table 4 :
Bootstrapping parameters for the modulus raising

Table 6 :
Security estimations for the parameter sets.

Table 7 :
Decryption failure rates for bootstrapping.

Figure 1: Comparison of blind rotations with different lengths of decomposition

Table 8 :
Sizes of blind rotation keys.

Table 9 :
Single-threaded timing results for bootstrapping

Table 10 :
Comparison of AP-based and LMK-based blind rotations at the 127-bit security level with the ternary secret key, where GD is the gadget decomposition, the hybrid method is used in our technique, and $d_g = 3$ and $d_g = 4$ are the decomposition lengths in the AP and LMK methods, respectively.

Table 11 :
Bootstrapping comparisons for two LWE ciphertexts with the parameter B_1024_27, where the corresponding times for Algorithms 1 and 4 are simply doubled.