Fast and Accurate: Efficient Full-Domain Functional Bootstrap and Digit Decomposition for Homomorphic Computation

Abstract. The functional bootstrap in FHEW/TFHE allows for fast table lookups on ciphertexts and is a powerful tool for privacy-preserving computations. However, the functional bootstrap suffers from two limitations: the negacyclic constraint of the lookup table (LUT) and the limited ability to evaluate large-precision LUTs. To overcome the first limitation, several full-domain functional bootstraps (FDFB) have been developed, enabling the evaluation of arbitrary LUTs. Meanwhile, algorithms based on homomorphic digit decomposition have been proposed to address the second limitation. Although these algorithms provide effective solutions, they are yet to be optimized. This paper presents four new FDFB algorithms and two new homomorphic decomposition algorithms that improve the state-of-the-art. Our FDFB algorithms reduce the output noise, thus allowing for more efficient and compact parameter selection. Across all parameter settings, our algorithms reduce the runtime by up to 39.2%. Our homomorphic decomposition algorithms also run at 2.0x and 1.5x the speed of prior algorithms. We have implemented and benchmarked all previous FDFB and homomorphic decomposition algorithms and our methods in OpenFHE.


Introduction
Fully Homomorphic Encryption (FHE) is a powerful cryptographic tool that enables computation on encrypted data without requiring access to the decryption key. It has great potential for use in computing fields where data privacy is important, such as secure cloud computing [KSK+18, PKS+19, LATV12] and privacy-preserving machine learning [LKL+22, BMMP18, CJP21, LHH+21], as well as in the construction of cryptographic protocols such as private set intersection [CLR17, CHLR18, CMdG+21]. Since Gentry's first construction of an FHE scheme utilizing the bootstrap technique [Gen09], various FHE schemes have been developed [FV12, BGV14, CKKS17, GSW13, DM15, CGGI20] and significant improvements have been made [LW23a, LW23b, BIP+22, Klu22]. Among these FHE schemes, BGV/FV, CKKS and FHEW/TFHE have gained prominence recently because of their efficiency. BGV/FV and CKKS have effective packing capabilities that allow for computations over vector data using Single Instruction Multiple Data (SIMD) instructions, making them ideal for simultaneously processing large arrays of numbers. However, these schemes are less efficient for evaluating deep circuits and inconvenient for evaluating non-polynomial functions. On the other hand, FHEW/TFHE utilize an efficient functional bootstrap (or programmable bootstrap) process that enables the evaluation of a lookup table (LUT) without additional cost, making these schemes ideal for evaluating boolean circuits and non-polynomial functions. Moreover, due to the switching method introduced in CHIMERA [BGGJ20] and later improved in PEGASUS [LHH+21], a CKKS ciphertext can be converted into multiple FHEW/TFHE ciphertexts to compute non-polynomial functions and then converted back to a CKKS ciphertext for SIMD polynomial evaluation. This makes functional bootstrap a versatile tool for all FHE evaluation purposes.
Despite its strength, functional bootstrap still suffers from two limitations: (1) the evaluated LUT $f: \mathbb{Z}_p \to \mathbb{Z}_p$ must be negacyclic, i.e., $f(x + \frac{p}{2}) = -f(x)$ for all $x \in \mathbb{Z}_p$, preventing some LUTs from being evaluated directly; (2) the input plaintext modulus $p$ is typically small due to efficiency constraints, limiting its ability to evaluate large-precision LUTs. Numerous efforts have been made to address these two limitations. To circumvent the negacyclicity constraint, Full Domain Functional Bootstrap (FDFB) algorithms supporting arbitrary LUTs have been proposed. These FDFB algorithms can be categorized into Type-SelectMSB, Type-HalfRange and Type-Split. Type-SelectMSB selects between two negacyclic LUTs based on the most significant bit (MSB) of the encrypted message and is used in algorithms proposed by [CLOT21, KS22]. Type-HalfRange transforms the encrypted message to prevent it from exceeding $\frac{p}{2}$, thereby bypassing the negacyclic limitation. This method is adopted in algorithms proposed by [LMP22, YXS+21, GBA22]. Finally, Type-Split expresses an arbitrary LUT as the sum of a 'pseudo-odd' LUT and a 'pseudo-even' LUT, each of which can be evaluated using two functional bootstraps. This method is employed in the algorithm proposed by [CZB+22]. In addition to focusing on the construction of FDFB, a method for using FDFB to aid in evaluating CKKS ciphertexts is presented in [LY23].
To handle the evaluation of large-precision LUTs, Guimarães et al. [GBA21] propose tree-based and chaining methods to combine multiple functional bootstraps in TFHE. These two methods in [GBA21] assume that each ciphertext encrypts a digit of the original message. Therefore, when an input ciphertext has a large modulus, it must first be preprocessed with homomorphic decomposition before the methods can be applied. On the other hand, Liu et al. [LMP22] develop homomorphic digit decomposition algorithms and demonstrate how they can be used to evaluate large-precision sign functions. As a result, homomorphic decomposition is a crucial component in current techniques for evaluating large-precision LUTs.
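The negacyclicity constraint in limitation (1) can be made concrete with a small plaintext check. The sketch below is only an illustration: the helper name `is_negacyclic` and the example tables are assumptions introduced here, and nothing in it involves ciphertexts.

```python
def is_negacyclic(lut, p):
    """Return True if lut[(x + p/2) mod p] == -lut[x] (mod p) for every x in Z_p."""
    assert p % 2 == 0 and len(lut) == p
    half = p // 2
    return all((lut[(x + half) % p] + lut[x]) % p == 0 for x in range(p))


p = 8
# A table whose second half is the negation (mod p) of its first half.
negacyclic_table = [1, 2, 3, 0, 7, 6, 5, 0]
# Two ordinary tables that violate the constraint.
identity_table = list(range(p))
square_table = [(x * x) % p for x in range(p)]

print(is_negacyclic(negacyclic_table, p))
print(is_negacyclic(identity_table, p))
print(is_negacyclic(square_table, p))
```

Even the identity table fails the check, which is why full-domain techniques are needed for general LUTs.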
In practice, functional bootstrap plays a critical role in many FHE applications, and thus its optimization is paramount for achieving high performance. Nevertheless, the efficiency of the FDFB and digit decomposition algorithms still requires further evaluation and optimization.

Our Contributions
(1) We present four new FDFB algorithms, which reduce the runtime by up to 39.2% compared to the state-of-the-art results across various parameter settings.
(2) We present two new homomorphic decomposition algorithms, HomDecomp-Reduce and HomDecomp-FDFB, whose running speed is 2x and 1.5x that of HomFloor and HomFloorAlt from [LMP22], respectively. Unlike HomFloor, our algorithms do not require the input ciphertext to have small noise. The speedup of our algorithms directly results in faster large-precision evaluations of functions such as sign, ReLU, max, ABS, etc.
(3) We provide a comprehensive theoretical noise analysis for our FDFB and homomorphic decomposition algorithms, as well as those developed by previous works. We have implemented and benchmarked all the algorithms in the OpenFHE library [BBB+22] to validate our results. Our implementation of all FDFB algorithms in a single library is a first-of-its-kind initiative, which provides standardized access to these algorithms.

FDFB Algorithms
The current FDFB algorithms are summarized as follows.
WoP-PBS$_1$ [CLOT21] (Type-SelectMSB) introduces an extra MSB to the encrypted message by doubling the ciphertext modulus. The algorithm evaluates the LUT to obtain a ciphertext that possibly differs by a sign from the desired result. Then, it extracts the MSB using functional bootstrap and offsets the sign by invoking BFV multiplication. However, the rapid noise growth of BFV multiplication requires the algorithm to use inefficient parameters, thus degrading performance.
WoP-PBS$_2$ [CLOT21] (Type-SelectMSB) builds two sub-LUTs according to the MSB of the encrypted message. The algorithm evaluates both sub-LUTs to obtain two ciphertexts and extracts the MSB using functional bootstrap. Then BFV multiplication is invoked to select the correct ciphertext. Again, BFV multiplication requires large parameters and degrades performance.
FDFB-KS [KS22] (Type-SelectMSB) builds two sub-LUTs similarly to WoP-PBS$_2$. The algorithm selects between the two sub-LUTs to obtain an encrypted LUT and then uses functional bootstrap to evaluate it. However, selecting the sub-LUTs requires multiple functional bootstraps and causes significant computational overhead.
EvalFunc [LMP22] (Type-HalfRange) introduces an extra MSB in a similar way to WoP-PBS$_1$. The algorithm extracts the MSB using functional bootstrap and cancels it to ensure that the message belongs to half of $\mathbb{Z}_p$. Then it can evaluate the LUT without being constrained by negacyclicity. We note that the FullyFBS of [YXS+21] and the FDFB-C of [GBA22] are essentially the same as EvalFunc.
Comp [CZB+22] (Type-Split) expresses an arbitrary LUT as the sum of a 'pseudo-odd' LUT and a 'pseudo-even' LUT. Then the algorithm evaluates each LUT using two functional bootstraps.
In [CIM19], Carpov et al. develop a multi-value bootstrap technique that allows several LUTs to be evaluated on the same input using a single functional bootstrap call. This technique can reduce the functional bootstraps required for WoP-PBS$_1$, WoP-PBS$_2$ and Comp when the parameters support multi-value bootstrap.

Homomorphic Decomposition Algorithms
The current homomorphic decomposition algorithms are summarized as follows.
HomFloor [LMP22] uses two bootstraps to clear the lower bits of a large-precision message before modulus switching, which prevents the modulus switching noise from corrupting the higher digits. By iteratively applying these operations, a large-precision message can be decomposed into a vector of 4-bit digits. However, this algorithm does not apply to extracted CKKS ciphertexts because it requires a small noise in the input ciphertext.
HomFloorAlt [LMP22] uses three bootstraps to extract the digits of a large-precision message, allowing it to support the decomposition into 5-bit digits and decompose extracted CKKS ciphertexts.

Table 1: Summary of our algorithms.
FDFB-Compress: Compress the coded message using a functional bootstrap and reduce the noise.
FDFB-CancelSign, FDFB-Select: Replace BFV multiplication with LWE-to-RLWE packing and bootstrap.
FDFB-BFVMult (WoPPBS$_1$-Refine): Use a refined noise analysis for BFV multiplication; use fewer BFV multiplications.
HomDecomp-Reduce, HomDecomp-FDFB: Reduce the range of the lower bits instead of clearing them and use fewer bootstraps.

Overview of Our Algorithms
We present the intuition behind our algorithm design and explain how it leads to better performance (see Table 1 for a summary). The key advantage of our algorithms is their reduced noise growth, which enables us to choose more compact LWE and RLWE parameters (such as decomposition bases in blind rotation and RLWE dimension) for a given plaintext modulus, resulting in shorter running time.
FDFB-Compress is a Type-HalfRange FDFB algorithm. Our key observation is that the LWE message must be in a coded (and thus redundant) form $\frac{q}{p} m' + e \in \mathbb{Z}_q$ to prevent decryption failures due to errors, where $q$ is the ciphertext modulus. This enables us to design a compression function that can compress the coded LWE message into $[-\frac{q}{4}, \frac{q}{4} - 1]$ using one functional bootstrap. Then, we can perform another functional bootstrap on the compressed message to get the desired result. As a result, FDFB-Compress uses the same number of bootstraps as EvalFunc but reduces the error variance of the compressed message by half, resulting in a more compact parameter choice and better performance.
FDFB-CancelSign, FDFB-Select and FDFB-BFVMult (WoPPBS$_1$-Refine) are all Type-SelectMSB FDFB algorithms. The primary objective of FDFB-CancelSign and FDFB-Select is to replace the BFV multiplication in WoP-PBS$_1$ and WoP-PBS$_2$ with LWE-to-RLWE packing and an additional functional bootstrap. This approach prevents the multiplicative noise growth in BFV multiplication and instead achieves additive noise growth. As a result, although FDFB-CancelSign and FDFB-Select require an extra functional bootstrap compared to WoP-PBS, their slower noise growth allows for more compact parameter choices and better efficiency in most cases, according to our experiments. On the other hand, WoPPBS$_1$-Refine and FDFB-BFVMult are enhanced versions of WoP-PBS$_1$ and WoP-PBS$_2$, respectively. They reduce the error growth in WoP-PBS$_1$ and WoP-PBS$_2$ by a factor of roughly $N$, where $N$ is the RLWE dimension. This is achieved through a refined noise analysis of the BFV multiplication. Such an in-depth analysis allows for the choice of smaller bootstrapping parameters, resulting in enhanced efficiency. Moreover, FDFB-BFVMult removes one BFV multiplication in WoP-PBS$_2$ by combining two BFV multiplications with the sign bit into one multiplication, further reducing the noise growth by half.

Figure 1: Comparison of our homomorphic digit decomposition approach and that of [LMP22]. The blue parts stand for higher bits, while the green and red parts stand for lower bits before and after modulus switching. (Labels in the figure: ModSwitch, Ours.)
The current homomorphic digit decomposition algorithms presented in [LMP22] extract digits by repeatedly clearing the lower bits $m_{low}$ of the encrypted message (leaving a small bootstrap error) and then modulus-switching it to a smaller modulus $\frac{q_0}{B}$. We observe that this goal can also be achieved by reducing the range of the lower bits instead of clearing them. In contrast to clearing the lower bits, reducing their range consumes fewer functional bootstraps. Still, it reserves enough room to hold the modulus switching noise, thus preventing the higher digits from being destroyed by overflowed noise. Figure 1 illustrates a comparison of these two approaches. Following this idea, we design HomDecomp-Reduce and HomDecomp-FDFB, which run at 2x and 1.5x the speed of HomFloor and HomFloorAlt, respectively, in our experiments.

Notations
The ring of integers modulo $q$ is denoted as $\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z}$. Its elements are represented as integers in either $[0, q-1]$ (positive form) or $[-\lfloor\frac{q}{2}\rfloor, \lfloor\frac{q-1}{2}\rfloor]$ (signed form). For an integer $a$, its positive form and signed form in $\mathbb{Z}_q$ are denoted as $[a]^+_q$ and $[a]_q$, respectively. For a power-of-2 $N$, the $2N$-th cyclotomic ring is denoted as $R = \mathbb{Z}[X]/(X^N + 1)$, and its quotient ring is denoted as $R_q = R/qR$. Polynomials are represented using bold letters, e.g., $\mathbf{a}$. For a vector $\vec{a}$ or a polynomial $\mathbf{b}$, we use $a_i$ and $b_i$ respectively to denote $\vec{a}$'s $i$-th entry and $\mathbf{b}$'s coefficient of the $X^i$ term. The coefficient vector of $\mathbf{b}$ is denoted as $\vec{b}$. For a positive integer $n$, the set $\{0, 1, \ldots, n-1\}$ is denoted as $\llbracket n \rrbracket$. We use $a \leftarrow \chi$ to represent a random variable $a$ sampled from the distribution $\chi$, and $a \leftarrow S$ to indicate that $a$ is uniformly sampled from the finite set $S$. We use $D(\mathbb{Z}, \sigma)$ to denote the discrete Gaussian distribution of parameter $\sigma$ over $\mathbb{Z}$. The infinity norm and 2-norm of a vector $\vec{a}$ are denoted as $\|\vec{a}\|_\infty$ and $\|\vec{a}\|_2$ respectively. All logarithms are taken with a base of 2 unless otherwise stated.

LWE and RLWE Ciphertexts
Throughout this paper, we use lowercase q and n to denote the modulus and dimension of LWE instances, while uppercase Q and N are used for the RLWE modulus and dimension.
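The LWE ciphertext encrypting an encoded message $m \in \mathbb{Z}_q$ is defined to be

$$\mathrm{LWE}_{\vec{s},n,q}(m + e) = (-\langle \vec{a}, \vec{s} \rangle + m + e,\ \vec{a}),$$

where $\vec{a} \leftarrow \mathbb{Z}_q^n$, $e \leftarrow D(\mathbb{Z}, \sigma)$, and the secret vector $\vec{s} \leftarrow \{0, \pm 1\}^n$. The RLWE ciphertext encrypting an encoded message $\mathbf{m} \in R_Q$ is defined to be

$$\mathrm{RLWE}_{\mathbf{s},N,Q}(\mathbf{m} + \mathbf{e}) = (-\mathbf{a} \cdot \mathbf{s} + \mathbf{m} + \mathbf{e},\ \mathbf{a}),$$

where $\mathbf{a} \leftarrow R_Q$, $e_i \leftarrow D(\mathbb{Z}, \sigma)$, and the secret polynomial satisfies $s_i \leftarrow \{\pm 1, 0\}$.
For simplicity, we may sometimes use the abbreviated notation $\mathrm{LWE}_{\vec{s}}(m)$ and $\mathrm{RLWE}_{\mathbf{s}}(\mathbf{m})$ (or $\mathrm{LWE}(m)$ and $\mathrm{RLWE}(\mathbf{m})$) to denote the LWE and RLWE ciphertexts respectively.
Messages in LWE and RLWE ciphertexts are typically encoded to prevent decryption failures caused by errors. For instance, in an RLWE ciphertext, $\mathbf{m}$ is often an up-scaled version of the actual message $\mathbf{m}' \in R_p$, as given by $\mathbf{m} = \lfloor \frac{Q}{p} \mathbf{m}' \rceil = \frac{Q}{p} \mathbf{m}' + \mathbf{e}_{rnd}$, where $p < Q$ is the plaintext modulus and $\mathbf{e}_{rnd}$ accounts for the rounding errors.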
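As an illustration of message encoding, the following toy sketch encrypts and decrypts a scaled message under the convention that a ciphertext $(b, \vec{a})$ satisfies $b + \langle \vec{a}, \vec{s} \rangle = m + e \pmod q$. It is a minimal sketch under stated assumptions: the function names are hypothetical, the parameters are far too small to be secure, and a bounded uniform error replaces the discrete Gaussian for simplicity.

```python
import random


def encode(msg, p, q):
    """Scale a message in Z_p up to Z_q (q is assumed to be a multiple of p)."""
    return (q // p) * msg % q


def decode(noisy, p, q):
    """Round (p/q) * noisy to the nearest integer and reduce modulo p."""
    return ((noisy * p + q // 2) // q) % p


def keygen(n, rng):
    """Sample a ternary secret vector in {0, +1, -1}^n."""
    return [rng.choice((-1, 0, 1)) for _ in range(n)]


def encrypt(msg, s, p, q, rng, bound=2):
    """Return (b, a) with b = -<a, s> + encode(msg) + e mod q (toy bounded error e)."""
    a = [rng.randrange(q) for _ in s]
    e = rng.randint(-bound, bound)
    b = (-sum(ai * si for ai, si in zip(a, s)) + encode(msg, p, q) + e) % q
    return b, a


def decrypt(ct, s, p, q):
    """Recover the message as long as |e| < q / (2p)."""
    b, a = ct
    noisy = (b + sum(ai * si for ai, si in zip(a, s))) % q
    return decode(noisy, p, q)


rng = random.Random(0)
p, q, n = 16, 1024, 16
s = keygen(n, rng)
print([decrypt(encrypt(m, s, p, q, rng), s, p, q) for m in range(p)])
```

Decryption succeeds here because the error bound 2 is well below $\frac{q}{2p} = 32$; the redundancy of the encoding absorbs the noise.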

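RLWE' and RGSW Ciphertexts
An RLWE' ciphertext is a vector of RLWE ciphertexts encrypting the same message at different scales, i.e.,

$$\mathrm{RLWE}'_{\mathbf{s}}(\mathbf{m}) = (\mathrm{RLWE}_{\mathbf{s}}(\mathbf{m}), \mathrm{RLWE}_{\mathbf{s}}(B\mathbf{m}), \ldots, \mathrm{RLWE}_{\mathbf{s}}(B^{l-1}\mathbf{m})),$$

where $B \in \mathbb{Z}$ is the decomposition base and $l = \lceil \log_B Q \rceil$. Then the product $\odot: R_Q \times \mathrm{RLWE}' \to \mathrm{RLWE}$ can be defined as

$$\mathbf{u} \odot \mathrm{RLWE}'_{\mathbf{s}}(\mathbf{m}) = \sum_{i=0}^{l-1} \mathbf{u}_i \cdot \mathrm{RLWE}_{\mathbf{s}}(B^i \mathbf{m}) = \mathrm{RLWE}_{\mathbf{s}}(\mathbf{u} \cdot \mathbf{m}), \quad \text{where } \mathbf{u} = \sum_{i=0}^{l-1} B^i \mathbf{u}_i.$$

The obtained RLWE ciphertext contains a noise much smaller than the regular $R_Q \times \mathrm{RLWE}$ multiplication due to the small coefficients of the $\mathbf{u}_i$'s. Besides, the LWE' ciphertext can be defined similarly, but we omit the details here.
An RGSW ciphertext is defined as $\mathrm{RGSW}_{\mathbf{s}}(\mathbf{m}) = (\mathrm{RLWE}'_{\mathbf{s}}(\mathbf{s} \cdot \mathbf{m}), \mathrm{RLWE}'_{\mathbf{s}}(\mathbf{m}))$. Then the external product $\circledast: \mathrm{RLWE} \times \mathrm{RGSW} \to \mathrm{RLWE}$ between $(\mathbf{b}, \mathbf{a}) = \mathrm{RLWE}_{\mathbf{s}}(\mathbf{u} + \mathbf{e})$ and $\mathrm{RGSW}_{\mathbf{s}}(\mathbf{m})$ is defined as

$$(\mathbf{b}, \mathbf{a}) \circledast \mathrm{RGSW}_{\mathbf{s}}(\mathbf{m}) = \mathbf{a} \odot \mathrm{RLWE}'_{\mathbf{s}}(\mathbf{s} \cdot \mathbf{m}) + \mathbf{b} \odot \mathrm{RLWE}'_{\mathbf{s}}(\mathbf{m}) = \mathrm{RLWE}_{\mathbf{s}}((\mathbf{u} + \mathbf{e}) \cdot \mathbf{m}).$$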

Homomorphic Operators
We introduce some basic homomorphic operations that will be used in our constructions.

Mod Down/Up and Modulus Switching
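Let $c = (b, \vec{a}) = \mathrm{LWE}_{\vec{s},n,q}(m + e)$ be an LWE ciphertext, and let $q'$ be a positive modulus. For $q' \mid q$, the 'mod down' is defined as

$$\mathrm{ModDown}(c, q') = ([b]_{q'}, [\vec{a}]_{q'}) = \mathrm{LWE}_{\vec{s},n,q'}([m + e]_{q'}).$$

For $q \mid q'$, the 'mod up' is defined as

$$\mathrm{ModUp}(c, q') = (b, \vec{a}) = \mathrm{LWE}_{\vec{s},n,q'}(m + e + vq),$$

where $v \in \mathbb{Z}_{q'/q}$. For any modulus $q'$, the 'modulus switching' is defined as

$$\mathrm{ModSwitch}(c, q') = \left(\left\lfloor \tfrac{q'}{q} b \right\rceil, \left\lfloor \tfrac{q'}{q} \vec{a} \right\rceil\right) = \mathrm{LWE}_{\vec{s},n,q'}\left(\tfrac{q'}{q}(m + e) + e_{ms}\right),$$

where $e_{ms}$ is the noise modulus switching introduces. The three homomorphic operators described above can also be defined for RLWE ciphertexts similarly but are omitted for brevity.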
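The effect of modulus switching on the noise can be checked numerically. The sketch below rounds every ciphertext component to the new modulus and measures how far the decryption phase $b + \langle \vec{a}, \vec{s} \rangle$ moves; the helper names and toy parameters are assumptions made for illustration, and the bound $(1 + \|\vec{s}\|_1)/2$ on the rounding noise is the standard worst-case estimate rather than a statement from this paper.

```python
import random


def round_div(x, q, q_new):
    """Nearest integer to q_new * x / q, computed with integer arithmetic."""
    return (2 * q_new * x + q) // (2 * q)


def mod_switch(ct, q, q_new):
    """Scale and round both components of an LWE ciphertext (b, a) to modulus q_new."""
    b, a = ct
    return round_div(b, q, q_new) % q_new, [round_div(ai, q, q_new) % q_new for ai in a]


def phase(ct, s, q):
    """Decryption phase b + <a, s> mod q, which equals message plus noise."""
    b, a = ct
    return (b + sum(ai * si for ai, si in zip(a, s))) % q


def centered(x, q):
    """Representative of x mod q in the signed form [-q/2, q/2)."""
    x %= q
    return x - q if x >= q // 2 else x


rng = random.Random(1)
q, q_new, n = 4096, 1024, 32
s = [rng.choice((-1, 0, 1)) for _ in range(n)]
ct = (rng.randrange(q), [rng.randrange(q) for _ in range(n)])
switched = mod_switch(ct, q, q_new)
# Rounding noise e_ms, measured at the old scale q (i.e. multiplied by q / q_new).
print(centered((q // q_new) * phase(switched, s, q_new) - phase(ct, s, q), q))
```

The printed value is the rounding noise expressed at the original scale; it stays small relative to $q$, which is what allows the switched ciphertext to decrypt to the scaled message.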

Sample Extract
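Given an RLWE ciphertext $\mathrm{RLWE}_{\mathbf{s}}(\mathbf{m})$ and an index $i$, sample extraction outputs an LWE ciphertext $\mathrm{LWE}_{\vec{s}}(m_i)$, which extracts the coefficient of the $X^i$ term into an LWE ciphertext.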

Key Switching
Given an LWE ciphertext $c = (b, \vec{a}) = \mathrm{LWE}_{\vec{s},n,q_{ks}}(m + e)$, a decomposition base $B_{ks}$ and key switching keys $\{\mathsf{ksk}_{i,j,k}\}$ from the secret $\vec{s}$ to a secret $\vec{s}'$ of dimension $n'$, key switching outputs

$$\mathrm{KeySwitch}(c, \{\mathsf{ksk}_{i,j,k}\}) = \mathrm{LWE}_{\vec{s}',n',q_{ks}}(m + e + e_{ks}),$$

where $e_{ks}$ is the error key switching introduces.
Besides LWE-to-LWE key switching, it is possible to pack LWE ciphertexts into an RLWE ciphertext with similar techniques [GBA21, CZ22], which can be viewed as a specific instance of the public functional key switching method proposed in [CGGI20]. This homomorphic operator, denoted as $\mathrm{PackingKS}(\mathrm{LWE}(m), \{\mathsf{ksk}_{i,j,k}\})$, is parameterized by a positive integer $d$ and outputs $\mathrm{RLWE}(m + mX + \ldots + mX^{d-1})$. Its full definition is detailed in the full version of the paper.

Blind Rotation and Functional Bootstrap
Blind rotation is the key step in the bootstrap of FHEW/TFHE. Given an LWE ciphertext $c = \mathrm{LWE}_{\vec{s}}(m + e)$ with modulus $q \mid 2N$, a polynomial $\mathbf{TV} \in R_Q$ (often called the test vector) and blind rotation keys $\{\mathsf{brk}^{\pm}_i\}$, blind rotation outputs

$$\mathrm{BlindRotate}(c, \mathbf{TV}, \{\mathsf{brk}^{\pm}_i\}) = \mathrm{RLWE}\left(\mathbf{TV} \cdot X^{-\frac{2N}{q}(m + e)} + \mathbf{e}_{acc}\right),$$

where $\mathbf{e}_{acc}$ is the noise that blind rotation introduces. In other words, $\mathbf{TV}$ is rotated left by $\frac{2N}{q}(m + e)$. The keys $\{\mathsf{brk}^{\pm}_i\}$ are parameterized by the blind-rotation base $B_g$. A smaller $B_g$ means longer running time and smaller $\mathbf{e}_{acc}$. Since the inner structure of blind rotation is irrelevant to the focus of this paper, we omit the details about the use of $\{\mathsf{brk}^{\pm}_i\}$. Interested readers can refer to [MP21] for more details. In this paper, we assume $q = 2N$ and omit the $\{\mathsf{brk}^{\pm}_i\}$ in notations. Note that the constant term of the rotated $\mathbf{TV}$ equals $TV_{m+e}$ for $m + e \in [0, N-1]$, and equals $-TV_{m+e-N}$ for $m + e \in [N, 2N-1]$.
To evaluate a negacyclic LUT $F: \mathbb{Z}_p \to \mathbb{Z}_p$ using blind rotation, the coefficients of $\mathbf{TV}$ are arranged in a redundant way to eliminate the error in the input ciphertext. Specifically, by setting $TV_i = \lfloor \frac{Q}{p} F(\lfloor \frac{p}{q} i \rceil) \rceil$, the constant term of $\mathrm{BlindRotate}(\mathrm{LWE}_{\vec{s}}(\frac{q}{p} m' + e), \mathbf{TV})$ is an encryption of $\lfloor \frac{Q}{p} F(m') \rceil$. The entire process of the functional bootstrap is illustrated in Figure 2. The noise introduced by the bootstrap process is denoted as $e_{boot}$. We use $\mathrm{Boot}[f](c)$ to represent the result of performing functional bootstrap using function $f$ on an LWE ciphertext $c$ and use $\mathrm{BootRaw}[f](c)$ to represent the freshly extracted LWE ciphertext after blind rotation (i.e., without any modulus switching or key switching). Notably, each $\mathbf{TV}$ uniquely corresponds to a negacyclic function $f$, so either $\mathbf{TV}$ or $f$ can be used to parameterize the functional bootstrap. If the plaintext polynomial $\mathbf{TV}$ is replaced with an RLWE ciphertext $c_{tv}$, we denote the resulting output as $\mathrm{Boot}[c_{tv}](c)$ or $\mathrm{BootRaw}[c_{tv}](c)$.

Figure 2: [...] (3) modulus switching to $q_{ks}$; (4) key switching to the original secret key; (5) modulus switching to $q$. $F$ is an LUT from $\mathbb{Z}_p$ to $\mathbb{Z}_p$.
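The rotation that blind rotation performs on the test vector can be reproduced in the clear. The sketch below multiplies a polynomial by $X^{-k}$ in $\mathbb{Z}[X]/(X^N + 1)$ and reads off the constant term; the function name `rotate_left` and the example coefficients are assumptions for illustration, and no encryption is involved.

```python
def rotate_left(tv, k):
    """Coefficients of tv(X) * X^(-k) in Z[X]/(X^N + 1), for 0 <= k < 2N."""
    n = len(tv)
    out = [0] * n
    for i, c in enumerate(tv):
        j = (i - k) % (2 * n)   # exponent after the shift, using X^(2N) = 1
        if j < n:
            out[j] = c
        else:
            out[j - n] = -c     # X^N = -1 flips the sign
    return out


tv = [10, 11, 12, 13]           # toy test vector with N = 4
for k in range(2 * len(tv)):
    print(k, rotate_left(tv, k)[0])
```

For $k < N$ the constant term is `tv[k]`, and for $k \geq N$ it is `-tv[k - N]`, which is exactly why only negacyclic functions can be read off directly.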

Multi-Value Bootstrap
Multi-value bootstrap enables the evaluation of multiple LUTs on the same input LWE ciphertext with the cost of a single bootstrap [CIM19]. In this approach, the unscaled test vector is denoted as $\mathbf{TV} \in R_p$, and the goal is to compute $\frac{Q}{p} \mathbf{TV} \cdot X^{-(m+e)}$, where $p$ is the plaintext modulus. To enable the computation of multiple LUTs, multi-value bootstrap decomposes $\mathbf{TV}$ into a product $\mathbf{TV}_0 \cdot \mathbf{TV}_1$. $\mathbf{TV}_0$ is first multiplied by $X^{-(m+e)}$ using blind rotation, and the resulting RLWE ciphertext is multiplied by $\mathbf{TV}_1$, which also multiplies the output error variance by $\|\vec{TV_1}\|_2^2$.

BFV Multiplication
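Let $p$ be the plaintext modulus. For two RLWE ciphertexts $c_1 = \mathrm{RLWE}_{\mathbf{s}}(\frac{Q}{p} \mathbf{m}_1 + \mathbf{e}_1)$ and $c_2 = \mathrm{RLWE}_{\mathbf{s}}(\frac{Q}{p} \mathbf{m}_2 + \mathbf{e}_2)$, BFV multiplication outputs $\mathrm{RLWE}_{\mathbf{s}}(\frac{Q}{p} \mathbf{m}_1 \mathbf{m}_2 + \mathbf{e}_{mult})$, where $\mathbf{e}_{mult}$ is the noise of BFV multiplication. We note that re-linearization keys are required for BFV multiplication. See [KPZ21] for the detailed process.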

Noise Introduced by the Operators
The variances of $e_{ms}$, $e_{ks}$, $e_{acc}$, $e_{boot}$ are denoted by $\sigma^2_{ms}$, $\sigma^2_{ks}$, $\sigma^2_{acc}$, $\sigma^2_{boot}$ respectively. Besides, recall that $q_{ks}$ is the key switching modulus in blind rotation. $B_{ks}$ and $B_g$ are the decomposition bases for key switching and blind rotation, respectively. The values of these variances are listed in the following lemma, and the proof can be found in [MP21].
Lemma 1. Let $\sigma^2$ be the variance of the encryption noise. PackingKS introduces the same amount of noise as KeySwitch. Besides, we denote $\sigma^2_{com} = (\frac{q}{q_{ks}})^2 (\sigma^2_{ms} + \sigma^2_{ks}) + \sigma^2_{ms}$ as the variance of noise introduced by the last three steps in the functional bootstrap (Figure 2).
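The expression for $\sigma^2_{com}$ can be read as a step-by-step accumulation over the last three steps of the functional bootstrap. The derivation below is a sketch that assumes the individual noise terms are independent, so that their variances add.

```latex
\begin{align*}
\text{after (3) modulus switching to } q_{ks} \text{ and (4) key switching:}&\quad
  \sigma^2_{ms} + \sigma^2_{ks} \\
\text{after (5) modulus switching from } q_{ks} \text{ to } q:&\quad
  \left(\tfrac{q}{q_{ks}}\right)^2 \left(\sigma^2_{ms} + \sigma^2_{ks}\right) + \sigma^2_{ms}
  = \sigma^2_{com}
\end{align*}
```

The factor $(\frac{q}{q_{ks}})^2$ comes from rescaling the accumulated noise to the new modulus, and the final $\sigma^2_{ms}$ is the fresh rounding noise of the last switch.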

Improved FDFB Algorithms
This section introduces four new FDFB algorithms. We assume that the plaintext modulus $p$ is a power of 2 for better presentation. Notably, changing $p$ to any even number will not affect the correctness or efficiency of the algorithms presented because, as we will see later, the advantage of our algorithms comes from their slow noise growth, which is independent of the choice of $p$. We assume the ciphertext modulus $q = 2N$ is a power of 2 and view the message as an integer modulo $q$ in the positive form. For an LWE ciphertext $c$ encrypting $m = \frac{q}{p} m' + e$, we add $\frac{q}{2p}$ to $c$ before performing any operations to ensure that $e + \frac{q}{2p} \in [0, \frac{q}{p} - 1]$. This will simplify the understanding of homomorphic digit decomposition algorithms in Section 4 and is consistent with [LMP22]. To keep the description of the FDFB algorithms concise, we focus on input arguments like the LUT $F$ and the input LWE ciphertext, omitting other arguments like the bootstrap key. In our noise analysis, we assume that the input ciphertext of the FDFB algorithms has an error variance of $\sigma^2_{boot}$ as in [LMP22]. The proof of correctness and noise analysis of the FDFB algorithms is provided in the full version of the paper.

FDFB-Compress
This algorithm employs the Type-HalfRange strategy. Specifically, it first compresses the coded message $\frac{q}{p} m' + e \in \mathbb{Z}_q$ into the range $[-\frac{q}{4}, \frac{q}{4} - 1]$ by evaluating the negacyclic function $f_C: \mathbb{Z}_q \to \mathbb{Z}_q$ in (1) via a functional bootstrap. The design of $f_C$ serves two purposes. Firstly, it maps messages encoding the same $m'$ to the same value. Secondly, it ensures that the outputs of $f_C$ for different values of $m'$ are at least $\frac{q}{2p}$ apart. The gap $\frac{q}{2p}$ must be greater than $2\beta$ to prevent the bootstrapping noise from interfering with the compressed message. In other words, the plaintext modulus $p$ is upper bounded by $p < \frac{q}{4\beta}$. After compression, it is possible to bypass the negacyclicity constraint and evaluate an arbitrary LUT $F: \mathbb{Z}_p \to \mathbb{Z}_p$ on the compressed message by using one functional bootstrap to compute $f_{eval}: \mathbb{Z}_q \to \mathbb{Z}_q$, which is defined in (2). The algorithm for FDFB-Compress is fully described in Algorithm 1, with its parameter requirements and noise analysis provided in Theorem 1.
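The exact definition of $f_C$ is not reproduced in this section, so the sketch below builds a hypothetical compression function that is constructed only to satisfy the properties stated above: it is negacyclic, constant on each message's code interval, takes values in $[-\frac{q}{4}, \frac{q}{4} - 1]$, and separates distinct messages by at least $\frac{q}{2p}$. The name `make_compress` and the concrete formula are assumptions and should not be read as the paper's equation (1).

```python
def make_compress(p, q):
    """Build a hypothetical compression function f_c on Z_q (illustration only)."""
    assert q % (4 * p) == 0
    step = q // (2 * p)            # spacing between compressed values
    half = q // 2

    def f_c(x):
        x %= q
        if x < half:
            # Code interval index floor(p * x / q) selects one of p/2 positive levels.
            return step // 2 + step * (p * x // q)
        return -f_c(x - half)       # negacyclic extension to the upper half
    return f_c


p, q = 8, 1024
f_c = make_compress(p, q)
levels = sorted({f_c(x) for x in range(q)})
print(levels)
```

With these toy parameters the function takes exactly $p$ distinct levels, symmetric around zero and spaced $\frac{q}{2p}$ apart, so a second bootstrap can tell the messages apart as long as the bootstrapping noise stays below half the spacing.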

FDFB-CancelSign
This algorithm employs the Type-SelectMSB strategy. Given $\mathrm{LWE}_{\vec{s},n,\frac{q}{2}}(\frac{q}{2p} m' + e)$, FDFB-CancelSign first executes ModUp to obtain a ciphertext $\mathrm{LWE}_{\vec{s},n,q}(\frac{q}{2} \mathrm{MSB} + \frac{q}{2p} m' + e)$ and then performs a raw functional bootstrap to evaluate $F$ and obtain a ciphertext encrypting $(-1)^{\mathrm{MSB}} \lfloor \frac{Q}{p} F(m') \rceil$. Finally, an LWE-to-RLWE packing key switching and another functional bootstrap cancel the extra $(-1)^{\mathrm{MSB}}$ factor. The algorithm for FDFB-CancelSign is fully described in Algorithm 2, and its parameter requirements and noise analysis are given in Theorem 2.

FDFB-Select
This algorithm employs the Type-SelectMSB strategy but does not perform the ModUp operation as in FDFB-CancelSign. In particular, let $F: \mathbb{Z}_p \to \mathbb{Z}_p$ be an arbitrary LUT, let $ct = \mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} m' + e)$ be a ciphertext encrypting $m'$, and let MSB be the most significant bit of $m'$. FDFB-Select first constructs two sub-LUTs from $\mathbb{Z}_{p/2}$ to $\mathbb{Z}_p$, which correspond to the LUT $F$ with $\mathrm{MSB} = 0$ or $\mathrm{MSB} = 1$ respectively. These two sub-LUTs can be extended to $F_0, F_1: \mathbb{Z}_p \to \mathbb{Z}_p$ to fulfill the negacyclic constraint, i.e., $F_0(x) = F(x)$ and $F_1(x) = -F(x + p/2)$ for $x \in [0, p/2)$, and $F_0(x) = -F(x - p/2)$ and $F_1(x) = F(x)$ for $x \in [p/2, p)$. $F_0$ and $F_1$ correspond to the functions in (4) and (5).
By evaluating these two functions on $ct + \frac{q}{2p}$ using a single functional bootstrap each, we can obtain two ciphertexts that encrypt $F_0(m')$ and $F_1(m')$, respectively. Additionally, we can obtain a ciphertext encrypting MSB by evaluating function (6) on $ct + \frac{q}{2p}$ using a single functional bootstrap.
Finally, we use the encryption of MSB to select $F_{\mathrm{MSB}}(m')$ from $F_i(m')$ by a single functional bootstrap. The algorithm for FDFB-Select is fully described in Algorithm 3, and its parameter requirements and noise analysis are given in Theorem 3.
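The construction of the two negacyclic extensions and the final selection can be checked on plaintext tables. The sketch below follows the piecewise definition of $F_0$ and $F_1$ given above; the function names and the example LUT are assumptions for illustration, and the selection is done in the clear rather than by a functional bootstrap.

```python
def split_lut(F, p):
    """Extend the two halves of F to negacyclic tables F0 and F1 over Z_p."""
    half = p // 2
    F0 = [F[x] if x < half else (-F[x - half]) % p for x in range(p)]
    F1 = [(-F[x + half]) % p if x < half else F[x] for x in range(p)]
    return F0, F1


def is_negacyclic(lut, p):
    """Check lut[(x + p/2) mod p] == -lut[x] (mod p) for all x."""
    return all((lut[x] + lut[(x + p // 2) % p]) % p == 0 for x in range(p))


def select(F0, F1, m):
    """Pick F_MSB(m), where MSB is the most significant bit of m."""
    msb = 1 if m >= len(F0) // 2 else 0
    return (F1 if msb else F0)[m]


p = 8
F = [3, 1, 4, 1, 5, 2, 6, 7]     # an arbitrary, non-negacyclic LUT
F0, F1 = split_lut(F, p)
print(is_negacyclic(F0, p), is_negacyclic(F1, p))
print([select(F0, F1, m) for m in range(p)] == F)
```

Both extensions pass the negacyclicity check even though $F$ itself does not, and selecting by the MSB recovers $F$ on the whole domain.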
The first three functional bootstraps have the same input ciphertext $ct$ and thus can be accomplished via a single multi-value bootstrap at the cost of increased noise growth. Therefore, when the parameter settings enable multi-value bootstrap, FDFB-Select needs only two functional bootstraps; otherwise it requires four functional bootstraps. In case multi-value bootstrap is unavailable, we develop a variant of FDFB-Select, called FDFB-SelectAlt, described in Algorithm 4, which uses only three bootstraps. The parameter requirements and noise analysis of FDFB-SelectAlt are given in Theorem 4.
Remark. We actually use an improved version of the base-aware LWE-to-RLWE packing proposed by [GBA21] to pack $ct_{pos}$ and $-ct_{neg}$ into $ct_{pk}$. To pack $M$ LWE ciphertexts, [GBA21] generates $M$ key switching keys, with each key corresponding to an index $i \in \llbracket M \rrbracket$. However, we observe that generating the key switching key for $i = 0$ is sufficient since the keys for $i \neq 0$ can be obtained by multiplying the key for $i = 0$ by $X^{\frac{N}{M} i}$. The storage cost of this optimized version of PackingKS is only $\frac{1}{M}$ that of [GBA21].
Algorithm 4: FDFB-SelectAlt
input : Plaintext modulus $p$ and an LUT $F: \mathbb{Z}_p \to \mathbb{Z}_p$
input : Base $B_{pk}$ and modulus $q_{pk}$ for PackingKS
input : $\{\mathsf{ksk}_{i,j,k}\}$, packing keys for PackingKS with $d = N$
input : An LWE ciphertext $(b, \vec{a}) = \mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} m' + e)$
output : An LWE ciphertext $\mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} F(m') + e')$

The process of WoPPBS$_1$-Refine is identical to that of WoP-PBS$_1$, but it employs a much tighter noise analysis, as we will demonstrate later. It first obtains a ciphertext that encrypts $(-1)^{\mathrm{MSB}} \lfloor \frac{Q}{p} F(m') \rceil$ in the same way as FDFB-CancelSign. Then it evaluates the function (7) via a functional bootstrap to acquire the encryption of $\frac{Q}{p} (-1)^{\mathrm{MSB}}$. Finally, it computes the product of the two LWE ciphertexts using LWE-to-RLWE packing and BFV multiplication. The algorithm is fully described in Algorithm 5, and its parameter requirements and noise analysis are given in Theorem 5.
Algorithm 5: WoPPBS$_1$-Refine
input : Plaintext modulus $p$ and an LUT $F: \mathbb{Z}_p \to \mathbb{Z}_p$
input : Base $B_{ks}$ and modulus $q_{ks}$ for key switching
input : Base $B_{pk}$ and modulus $q_{pk}$ for PackingKS
input : $\{\mathsf{ksk}_{i,j,k}\}$, key switching keys
input : $\{\mathsf{ksk}_{i,j,k}\}$, packing keys for PackingKS with $d = 1$
input : [...]

FDFB-BFVMult is an improved version of WoP-PBS$_2$. Unlike WoP-PBS$_2$, which requires the sign bit to be multiplied with both $f_{neg}(ct)$ and $f_{pos}(ct)$, FDFB-BFVMult only needs one BFV multiplication because the sign bit is multiplied with the fresh ciphertext $(f_{neg} - f_{pos})(ct)$. Consequently, FDFB-BFVMult further halves the noise growth. Specifically, FDFB-BFVMult first constructs two LUTs $F_0$ and $F_1$ in the same way as FDFB-Select. Next, by using two functional bootstraps to evaluate $f_{pos}$ and $f_{neg} - f_{pos}$ (defined in (4) and (5)), it obtains encryptions of $m_{pos} = \frac{Q}{p} F_0(m')$ and of the corresponding difference for $f_{neg} - f_{pos}$. Then it evaluates the function (8) via a functional bootstrap to acquire the encryption of the sign bit, and multiplies the sign bit with the difference using LWE-to-RLWE packing and BFV multiplication. The algorithm is fully described in Algorithm 6, and its parameter requirements and noise analysis are given in Theorem 6.
Since the two bootstraps in WoPPBS$_1$-Refine (and the three bootstraps in FDFB-BFVMult) share the same input, they can be accelerated by employing a single multi-value bootstrap at the cost of increased noise growth.
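The saving of one BFV multiplication in FDFB-BFVMult can be seen from a simple rewriting of the selection formula. The identity below is a sketch that assumes the sign bit is encoded as $\mathrm{MSB} \in \{0, 1\}$; the two-product form is written out only for comparison with the single-product form.

```latex
\begin{align*}
F_{\mathrm{MSB}}(m')
  &= (1 - \mathrm{MSB})\, F_0(m') + \mathrm{MSB}\, F_1(m')
  && \text{(two products with the sign bit)} \\
  &= F_0(m') + \mathrm{MSB}\, \bigl(F_1(m') - F_0(m')\bigr)
  && \text{(one product with the sign bit)}
\end{align*}
```

In the second form only the difference $F_1(m') - F_0(m')$ is multiplied by the sign bit, and the term $F_0(m')$ is added afterwards without any multiplication.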
Algorithm 6: FDFB-BFVMult
input : Plaintext modulus $p$ and an LUT $F: \mathbb{Z}_p \to \mathbb{Z}_p$
input : Base $B_{ks}$ and modulus $q_{ks}$ for key switching
input : Base $B_{pk}$ and modulus $q_{pk}$ for PackingKS
input : $\{\mathsf{ksk}_{i,j,k}\}$, key switching keys
input : $\{\mathsf{ksk}_{i,j,k}\}$, packing keys for PackingKS with $d = 1$
input : An LWE ciphertext $(b, \vec{a}) = \mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} m' + e)$
output : An LWE ciphertext $\mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} F(m') + e')$
[...]
6 $ct_{res} \leftarrow ct_{prod} + ct_{pos}$
7 $ct_{res} \leftarrow \mathrm{KeySwitch}(\mathrm{ModSwitch}(ct_{res}, q_{ks}), \{\mathsf{ksk}_{i,j,k}\})$
8 return $\mathrm{ModSwitch}(ct_{res}, q)$

Refined BFV Noise Analysis. Next, we provide a refined noise analysis for the BFV multiplication involved in FDFB-BFVMult (WoPPBS$_1$-Refine). Our core observation is that in LWE-to-RLWE packing, only the constant term of the output polynomial message is assigned the value of the input LWE message, while the coefficients of non-constant terms are close to 0.
Lemma 2 provides a noise analysis of this kind of BFV multiplication. We note that only the dominating term of the error variance is displayed in Lemma 2 (as well as in Theorem 5 and Theorem 6) due to the complexity of the full formula. Refer to the full version of the paper for the full formula and its proof.
In FDFB-BFVMult (WoPPBS$_1$-Refine), each of the multiplicands for BFV multiplication is obtained by packing an LWE message with an error variance of $\sigma^2_{acc}$ into the constant term of an RLWE ciphertext. This means that the constant term of the encrypted polynomial has an error variance of $\sigma^2_{acc} + \sigma^2_{ks}$, while the error variance of non-constant terms is $\sigma^2_{ks}$. Note that $\sigma^2_{acc}$ and $\sigma^2_{ks}$ play the roles of the two variance parameters in Lemma 2. In practice, $\sigma^2_{acc}$ is much larger than $N \sigma^2_{ks}$ and one of the packed LWE messages is a sign bit (i.e., in $\{0, \pm 1\}$). It then follows from Lemma 2 that the output error variance is about $2 p^2 \sigma^2_{ms} \sigma^2_{acc}$. On the other hand, for ordinary BFV multiplication where all terms have an error variance of $\sigma^2_{acc} + \sigma^2_{ks}$, the output error variance is about $2 N p^2 \sigma^2_{ms} \sigma^2_{acc}$. This is because the dominating noise term becomes a polynomial-polynomial multiplication and introduces an extra factor $N$ compared to scalar-polynomial multiplication (refer to the remark in the full version of the paper for details). This means the noise growth is reduced by a factor of roughly $N$ compared to conventional BFV multiplication.
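The size of this reduction follows directly from the two dominating terms quoted above. In the worked ratio below, the value $N = 1024$ is an assumption chosen only to give a feel for the magnitude.

```latex
\[
\frac{2 N p^2 \sigma^2_{ms} \sigma^2_{acc}}{2 p^2 \sigma^2_{ms} \sigma^2_{acc}} = N,
\qquad
\text{e.g. } N = 1024 \;\Rightarrow\; \log_2 N = 10 \text{ bits of variance, i.e. about } 5 \text{ bits of standard deviation.}
\]
```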
The output error $e'$ has a variance of approximately [...]. Additionally, when multi-value bootstrap is employed, the variance becomes [...].

Theorem 6. Suppose $|e| < \frac{q}{2p}$ and $|e'| < \frac{q}{2p}$, then $\text{FDFB-BFVMult}(F, \mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} m' + e)) = \mathrm{LWE}_{\vec{s},n,q}(\frac{q}{p} F(m') + e')$. The output error $e'$ has a variance of approximately [...]. Additionally, when multi-value bootstrap is employed, the variance becomes [...].

Improved Homomorphic Digit Decomposition
This section presents two algorithms, HomDecomp-Reduce and HomDecomp-FDFB, to decompose an LWE ciphertext with a large modulus $q_0$ into multiple LWE ciphertexts with a smaller modulus $q$, each encrypting a digit of the original message. HomDecomp-Reduce creates buffer space for modulus switching noise by reducing the range of lower bits by half. It can handle digits of up to 4 bits and requires one bootstrap operation per decomposed digit. In contrast, HomDecomp-FDFB clears the lower bits approximately and can handle digits of up to 5 bits, but it requires two bootstrap operations per digit. We still assume $q = 2N$ as in the previous section. In our noise analysis, we assume that the input ciphertext of the decomposition algorithms has an error variance of $\sigma^2_{boot}$ as in [LMP22]. Proofs of the theorems are left to the full version of the paper due to the space limit.
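In the clear, the target of both algorithms is simply the base-$B$ digit vector of the message. The sketch below shows this plaintext counterpart for 4-bit digits; the helper names are hypothetical and the code does not model ciphertexts, noise, or modulus switching.

```python
def digits(m, base, count):
    """Return the `count` least significant base-`base` digits of m, lowest first."""
    out = []
    for _ in range(count):
        out.append(m % base)
        m //= base
    return out


def recompose(ds, base):
    """Inverse of digits(): rebuild the integer from its digits, lowest first."""
    total = 0
    for d in reversed(ds):
        total = total * base + d
    return total


B = 16                          # 4-bit digits
ds = digits(0xABC, B, 3)
print(ds, hex(recompose(ds, B)))
```

A homomorphic decomposition produces one ciphertext per entry of this digit list, so that each digit fits within the small plaintext space a single functional bootstrap can handle.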

HomDecomp-Reduce
In HomDecomp-Reduce, the range of lower bits is first reduced by half using one bootstrap operation to accommodate the subsequent modulus switching noise. The reduction function $f_{red}: \mathbb{Z}_q \to \mathbb{Z}_{q_0}$ is defined in (9), with different input and output ranges.
The complete algorithm is described in Algorithm 7. Its parameter requirements and noise analysis are given in Theorem 7.

Algorithm 7: HomDecomp-Reduce
input : A base $B$ for homomorphic decomposition
input : An LWE ciphertext $ct = \mathrm{LWE}_{s,n,q_0}(\frac{q_0}{p}m + e)$
output : LWE ciphertexts $\{ct_i\}$ encrypting the digits of $m$
1 $i \leftarrow 0$
2 while $q_0 > q$ do

HomDecomp-FDFB
In HomDecomp-FDFB, we use FDFB-Compress to evaluate the continuous identity function $f_{\mathrm{id}}(x) = x : \mathbb{Z}_q \to \mathbb{Z}_{q_0}$ (using zero extension), and the obtained result is used to approximately clear the lower bits in the input ciphertext. See Algorithm 8 for a full description of HomDecomp-FDFB and Theorem 9 for its parameter requirements and noise analysis. Before beginning, we show how to evaluate a continuous function $F$ with FDFB-Compress, where the input and output scaling factors are $\Delta_{\mathrm{in}}$ and $\Delta_{\mathrm{out}}$ respectively. First, the compression function $f_C$ in (1) is replaced by a modified compression function, which is defined in (10) and illustrated in Figure 3.
The strategy adopted to construct the modified $f_C$ is called 'β-padding', which creates a $2\beta$ distance between $f_C(0)$ and $f_C(\frac{q}{2})$ to separate the cases where the input is $0$ and $\frac{q}{2}$. Otherwise, the bootstrapping error may intermix the two cases, making it impossible for $f_{\mathrm{eval}}$ to distinguish between them. In that situation, when the input is positive and near 0, FDFB-Compress may yield an incorrect result $F(-\frac{q}{2\Delta_{\mathrm{in}}})$ instead of $F(0)$. Also, $f_C(\frac{q}{2} - 1)$ and $f_C(q - 1)$ must be $\beta$ away from $\frac{q}{4}$ and $\frac{3q}{4}$ respectively to ensure that the output message of $f_C$ always lies within half of $\mathbb{Z}_q$.
The modified version of $f_{\mathrm{eval}}$ in (2) is rather complicated. Intuitively, it aims to recover the original input to the modified $f_C$, evaluate $F$ on the recovered input, and subsequently scale the result by $\Delta_{\mathrm{out}}$. As the evaluation of $f_C$ introduces a bootstrapping error, the input recovered by $f_{\mathrm{eval}}$ also contains a bootstrapping error (multiplied by some constant), which means that the output error of FDFB-Compress depends on the Lipschitz constant of $F$. The output error variance is given in Theorem 8, and the proof can be found in the full version of the paper.
Theorem 8. When evaluating a continuous function $F$ with Lipschitz constant $L$, the output error variance of FDFB-Compress is
HomDecomp-FDFB sets $\Delta_{\mathrm{in}} = \Delta_{\mathrm{out}} = 1$ and $F = f_{\mathrm{id}}$, which gives the following theorem.
Theorem 9. Let $e_f$ be the output error of FDFB-Compress, then its variance is , HomDecomp-FDFB outputs the decomposed digits correctly.
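The dependence on the Lipschitz constant noted before Theorem 8 can be illustrated in the clear. The sketch below does not reproduce the variance formula of Theorem 8; it only checks numerically that an error $e$ on the recovered input changes the output of an $L$-Lipschitz function by at most $L|e|$, so an input error variance is amplified by at most $L^2$. The example function `relu_scaled` (with $L = 2$) and the helper `worst_amplification` are our own illustrative choices.

```python
def relu_scaled(x, slope=2.0):
    """A continuous function with Lipschitz constant `slope`."""
    return slope * max(x, 0.0)


def worst_amplification(func, inputs, errors):
    """Largest observed |func(x + e) - func(x)| / |e| over the given grid."""
    worst = 0.0
    for x in inputs:
        for e in errors:
            if e != 0:
                worst = max(worst, abs(func(x + e) - func(x)) / abs(e))
    return worst
```

On a grid such as `inputs = [i / 4 for i in range(-40, 41)]` with errors of magnitude 0.25 and 0.5, the observed amplification for `relu_scaled` never exceeds its slope.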

Analysis and Comparison
This section analyzes the FDFB and the homomorphic decomposition algorithms, both previous ones and ours, concerning their noise growth and the number of required bootstraps.

Analysis of FDFB Algorithms
Table 3 presents the error variance ratio between our and previous FDFB algorithms and the number of bootstraps required. For Type-HalfRange FDFB algorithms (FDFB-Compress and EvalFunc), the coded message must first be compressed into half of $\mathbb{Z}_q$. Thus the error of the compressed message (e.g., the error in ct of line 1 of Algorithm 1) plays a major role in the selection of parameters. For Type-SelectMSB FDFB algorithms (the other algorithms in Table 3), the output error plays a major role in the selection of parameters. The dominant term of the output error variance is the $\sigma_{\mathrm{acc}}^2$-term for most algorithms (refer to the full version of the paper for the formula of the output error variance of all algorithms). Thus, in the table, the first row of the ratio column represents the ratio of the error variances of the compressed message. The remaining rows of the ratio column represent the ratios of the $\sigma_{\mathrm{acc}}^2$-terms of the output error variance. For FDFB-CancelSign, FDFB-Select and FDFB-SelectAlt, the ratios of the output error variance can be a small multiple of the displayed ones. For other algorithms, the output error variance ratios are very close to the displayed ones since the $\sigma_{\mathrm{acc}}^2$-term is dominant.
As stated earlier, the efficiency of an FDFB algorithm is not solely determined by the number of bootstraps it requires. The error variances also impact the compactness of parameters and thus affect the final efficiency. As shown in Table 3, the main advantage of our FDFB algorithms is their reduced noise growth. This allows for the selection of larger decomposition bases during blind rotation, resulting in a reduction in the decomposition dimension (denoted by $l$ as described in Section 2.2.2). Since the number of NTTs required for a blind rotation is proportional to $(l + 1)$, our algorithms achieve better performance. To be more specific:
• FDFB-Compress reduces the error variance of the compressed message by half, resulting in a more relaxed parameter choice than EvalFunc.
• FDFB-CancelSign, FDFB-Select, FDFB-SelectAlt and their multi-value bootstrap variants use LWE-to-RLWE packing and blind rotation instead of BFV multiplication. This reduces the noise to $O(1/(N^2 p^2))$ of that of WoP-PBS. Although our algorithms require an additional bootstrap to replace the BFV multiplication, we demonstrate in Section 6 that they are still faster than WoP-PBS in most cases due to their slower noise growth.
• WoPPBS$_1$-Refine and FDFB-BFVMult use a significantly tighter noise analysis for BFV multiplication than WoP-PBS$_1$ and WoP-PBS$_2$, reducing the noise growth to $1/N$ of the original value.
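The link between the decomposition base and the NTT count can be made concrete with a short calculation. The sketch assumes the usual gadget decomposition, where $l = \lceil \log_{B_g} Q \rceil$ for an RLWE modulus $Q$; Section 2.2.2 is not reproduced here, and the 54-bit modulus in the usage note is a hypothetical value, not one of the benchmarked parameter sets.

```python
def decomposition_dimension(log2_modulus, log2_base):
    """l = ceil(log_{B_g} Q) for Q = 2^log2_modulus and B_g = 2^log2_base (assumed form)."""
    return -(-log2_modulus // log2_base)  # integer ceiling division


def relative_ntt_cost(log2_modulus, log2_base):
    """Quantity proportional to the NTT count of one blind rotation, i.e. l + 1."""
    return decomposition_dimension(log2_modulus, log2_base) + 1
```

For a hypothetical 54-bit $Q$, moving from $B_g = 2^{18}$ to $B_g = 2^{27}$ lowers $l$ from 3 to 2, so the relative NTT cost drops from 4 to 3. This is the mechanism by which slower noise growth translates into running time.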
The Optimality of FDFB-Compress. We observe that FDFB-Compress achieves optimality among Type-HalfRange algorithms. Recall that Type-HalfRange first uses functional bootstraps to transform the coded message $\frac{q}{p}m + e \in \mathbb{Z}_q$ into $\phi(m) + \tilde{e} \in U \subseteq \mathbb{Z}_q$ and then evaluates the LUT with another functional bootstrap, where $\phi$ is an arbitrary map, $U$ satisfies $U \cap (U + \frac{q}{2}) = \emptyset$ to bypass the negacyclic constraint, and $\tilde{e}$ has a variance $\sigma_{\tilde{e}}^2 \geq \sigma_{\mathrm{boot}}^2$. Additionally, to ensure the correctness of evaluation, $m$ must be reconstructible from $\phi(m) + \tilde{e}$, i.e., there is a map $\lambda$ from $U$ to $\mathbb{Z}_p$ such that $\lambda(\phi(m) + \tilde{e}) = m$. Thus, on the one hand, FDFB-Compress achieves the minimum number of bootstraps required for Type-HalfRange (i.e., 2). On the other hand, since $\phi(m) + \tilde{e} \in \lambda^{-1}(m)$ and $U$ contains at most $\frac{q}{2}$ elements, by the pigeonhole principle there exists an $m$ whose preimage $\lambda^{-1}(m)$ contains at most $\frac{q}{2p}$ elements. This requires $\beta < \frac{q}{4p}$, which is also the only requirement for FDFB-Compress. This means that FDFB-Compress achieves the most compact parameter choice among Type-HalfRange algorithms, thus achieving optimality.
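The bound $\beta < \frac{q}{4p}$ can be checked on a toy encoding. The maps `phi` and `lam` below are hypothetical choices made for illustration (evenly spaced centers in $U = [0, \frac{q}{2})$ and decoding by integer division); they are not the compression function $f_C$ of FDFB-Compress. The check confirms that every message survives all integer errors of magnitude below $\frac{q}{4p}$ and that the guarantee fails once larger errors are allowed.

```python
def phi(m, q, p):
    """Hypothetical encoding: center of the m-th slot of width q/(2p) inside [0, q/2)."""
    width = q // (2 * p)
    return m * width + width // 2


def lam(x, q, p):
    """Hypothetical decoding: index of the slot containing x."""
    width = q // (2 * p)
    return x // width


def decodes_for_all(q, p, bound):
    """True if every message is recovered for every integer error with |e| < bound."""
    return all(
        lam(phi(m, q, p) + e, q, p) == m
        for m in range(p)
        for e in range(-bound + 1, bound)
    )
```

With $q = 2048$ and $p = 16$, the threshold is $\frac{q}{4p} = 32$: `decodes_for_all(2048, 16, 32)` holds, while allowing errors of magnitude 32 already produces a decoding failure.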

Analysis of Homomorphic Decomposition
Table 4 compares the number of bootstraps needed for previous homomorphic digit decomposition algorithms and ours. Algorithms in the same row of the table share the same digit decomposition base $B$ (i.e., their decomposed digits have the same plaintext modulus). According to the table, our algorithms need one fewer bootstrap than the previous algorithms in [LMP22]. HomFloor requires that the input ciphertext encodes a discrete plaintext with small noise, which ensures a gap between two adjacent encoded messages to accommodate the noise introduced by subsequent bootstraps. Since an extracted CKKS ciphertext encodes messages continuously without any gaps, HomFloor cannot be applied to decompose it. Also, HomFloorAlt has an extra constraint on the ciphertext modulus. In contrast, our methods are free of these constraints, making them more flexible than previous methods. The full version of the paper provides a theoretical analysis of the noise growth and parameter choice.

Implementation
We implement all the FDFB algorithms and homomorphic decomposition algorithms, including both previous ones and ours, in OpenFHE [BBB+22] (commit id 745a492). We disable multi-threading, except during key generation. We build OpenFHE using g++ 12.2.1 with the flag WITH_NATIVEOPT=ON (as the authors did in [LMP22]). The performance of the algorithms is tested on a machine with an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz and 125G of RAM, running Fedora Release 36.
Parameter Setting. We use two parameter sets in our LWE schemes, i.e., PARAM$_{\mathrm{decomp}}$ and PARAM$_{\mathrm{fast}}$, which have been verified to meet 128-bit security using lattice-estimator [APS15] (commit id 48fa49b). Table 5 presents the details of these parameter sets, and we briefly explain the selection criteria of $q_{\mathrm{ks}}$ below since $n$ can be determined from $q_{\mathrm{ks}}$. For PARAM$_{\mathrm{decomp}}$, the maximum ciphertext modulus is set to $2^{35}$ such that the ciphertext to be digit-decomposed has a large modulus. This choice for $q_{\mathrm{ks}}$ is also consistent with [LMP22]. For PARAM$_{\mathrm{fast}}$, we focus on FDFB algorithms for discrete LUTs. Thus $q_{\mathrm{ks}}$ can be set to a smaller value to accelerate FDFB. However, if $q_{\mathrm{ks}}$ is too small, it may lead to large key switching noise, corrupting the correctness of FDFB. Therefore, we set $q_{\mathrm{ks}} = 2^{20}$ in PARAM$_{\mathrm{fast}}$. The performance of discrete LUT evaluation with FDFB variants is tested with the plaintext modulus set to $2^4$ and $2^5$. To ensure fair comparisons, for FDFB variants with multiple parameter choices (e.g., multi-value or not) we record only the best performance among those choices. In our experiments, the multi-value versions usually run faster than the non-multi-value ones. Thus, the multi-value versions of most algorithms are recorded.
Please refer to the full version of the paper for a complete list of the parameters used in the benchmarks.
Performance of FDFB Algorithms. Table 6 shows the running time of previous and our FDFB algorithms under four scenarios (two parameter sets × two choices of $p$). We can draw the following conclusions from the benchmark data.
First, the experiment data validate our algorithms' advantage over their predecessors, as suggested theoretically in Section 5. To be more specific:
• FDFB-Compress can support $p = 2^4$ in scenario A while EvalFunc cannot because the former benefits from a reduced error variance of the compressed message. In fact, EvalFunc would need to double the RLWE dimension $N$ to support $p = 2^4$, which leads to worse efficiency.
• FDFB-CancelSign shows a speedup of 8.6%∼10.4% compared to WoP-PBS$_1^*$, even though it requires one additional bootstrap and does not use multi-value bootstrap for acceleration. This is due to the slower noise growth of FDFB-CancelSign, which allows for the choice of a larger decomposition base $B_g$ in blind rotation, resulting in improved performance. On the other hand, FDFB-Select$^*$ and FDFB-SelectAlt tolerance for homomorphic noise and forces prior methods to use smaller $B_g$, degrading their performance.
Second, when comparing the fastest algorithms from previous works and our algorithms, we observe a 23.4%∼39.2% reduction in running time across all four scenarios (see Table 7). Among our algorithms, FDFB-BFVMult$^*$ is the fastest or very close to the fastest in all the scenarios. However, it does not render our other algorithms obsolete because (1) they support the addition of more bootstrapped ciphertexts since they have a smaller output error than FDFB-BFVMult (WoPPBS$_1$-Refine); (2) they are useful for smaller RLWE dimensions, where BFV-based FDFB methods might be unavailable.
Performance of Homomorphic Digit Decomposition. Figure 4 illustrates the performance of different homomorphic decomposition algorithms (the raw data can be found in the full version of the paper). Data for $B = 2^4$ are drawn in solid lines, while data for $B = 2^5$ are drawn in dashed lines. For all choices of $\log_2(q_0)$, HomDecomp-Reduce runs roughly twice as fast as HomFloor, and HomDecomp-FDFB runs at roughly 1.5 times the speed of HomFloorAlt. Such speedup in homomorphic decomposition directly leads to speedup in the large-precision sign/ReLU/max/ABS evaluation, as they all require extracting the MSB of the input message.
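The role of the MSB in these functions can be seen on cleartext values. The sketch below assumes a two's-complement style encoding of signed integers in $\mathbb{Z}_{2^k}$, which is our own illustrative convention; it shows only the plaintext logic and none of the homomorphic machinery, and `maximum` assumes the difference of its inputs does not overflow the signed range.

```python
def encode(x, bits):
    """Map a signed integer to Z_{2^bits} (assumed two's-complement style encoding)."""
    return x % (1 << bits)


def msb(m, bits):
    """Most significant bit of an encoded value: 1 for negative, 0 otherwise."""
    return (m >> (bits - 1)) & 1


def sign(m, bits):
    if m == 0:
        return 0
    return -1 if msb(m, bits) else 1


def relu(m, bits):
    """Keep non-negative values, map negative ones to 0."""
    return 0 if msb(m, bits) else m


def abs_value(m, bits):
    """Negate the value when the MSB marks it as negative."""
    return ((1 << bits) - m) if msb(m, bits) else m


def maximum(a, b, bits):
    """Select via the sign of a - b (assumes a - b fits in the signed range)."""
    diff = (a - b) % (1 << bits)
    return b if msb(diff, bits) else a
```

Each function branches on a single bit, which is why faster MSB extraction through digit decomposition carries over to all four evaluations.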

Conclusion
This paper develops four FDFB algorithms and two homomorphic decomposition algorithms. Our FDFB algorithms achieve a running time shorter than the best known results by up to 39.2%. Our homomorphic decomposition algorithms run 1.5x to 2x as fast as those presented in [LMP22], leading to speedup in large-precision ReLU, sign, max and ABS evaluation. We give a thorough theoretical noise analysis for FDFB and homomorphic decomposition algorithms, both in prior works and ours. We also implement all the algorithms in OpenFHE for a fair comparison between them.

Figure 2: The five steps of FHEW/TFHE bootstrapping: (1) blind rotation of TV by the input ciphertext; (2) extracting the constant term of the rotated TV; (3) modulus switching to $q_{\mathrm{ks}}$; (4) key switching to the original secret key; (5) modulus switching to $q$. $F$ is an LUT from $\mathbb{Z}_p$ to $\mathbb{Z}_p$.

Figure 3: Compression function $f_C$ for continuous function evaluation.

Figure 4: The "running time"-"input precision" graph for previous (blue) and our (red) homomorphic digit decomposition algorithms under PARAM$_{\mathrm{decomp}}$.

Table 1: A summary of the intuition behind our algorithms and the improvement over previous methods.

Table 2: Symbols used in our noise analysis.

Table 3: Comparison of previous and our FDFB algorithms regarding their noise growth and the number of bootstraps required.

Table 5: Parameter sets for LWE scheme and their use cases.

Table 6: Running time of previous and our FDFB algorithms under four scenarios (A to D). Each running time is obtained by averaging over 100 tests and is measured in milliseconds (ms). For each scenario, the best algorithms from previous works and this paper are marked in blue and red, respectively. A '/' indicates that the algorithm is unavailable in that scenario because the plaintext modulus $p$ exceeds its parameter requirements.

Table 7: Performance improvement of our FDFB algorithms.