Pasta: A Case for Hybrid Homomorphic Encryption

The idea of hybrid homomorphic encryption (HHE) is to drastically reduce bandwidth requirements when using homomorphic encryption (HE) at the cost of more expensive computations in the encrypted domain. To this end, various dedicated schemes for symmetric encryption have already been proposed. However, it is still unclear whether those ideas are practically useful, because (1) no cost-benefit analysis has been done for use cases and (2) very few implementations are publicly available. We address this situation in several ways. We build an open-source benchmarking framework involving several use cases and covering three popular libraries. Using this framework, we explore properties of the respective HHE proposals. It turns out that even medium-sized use cases are infeasible, especially when they involve integer arithmetic. Next, we propose Pasta, a cipher thoroughly optimized for integer HHE use cases. Pasta is designed to minimize the multiplicative depth, while also leveraging the structure of two state-of-the-art integer HE schemes (BFV and BGV) to minimize the homomorphic evaluation latency. Using our new benchmarking environment, we extensively evaluate Pasta in SEAL and HElib and compare its properties to 8 existing ciphers in two use cases. Our evaluations show that Pasta outperforms its competitors for HHE in terms of both homomorphic evaluation time and noise consumption, demonstrating its efficiency for real-world HE use cases. Concretely, Pasta outperforms Agrasta by a factor of up to 82, Masta by a factor of up to 6, and Hera by a factor of up to 11 when applied to the two use cases.


Introduction
In recent years, people have become increasingly concerned about the privacy of their data, and new regulations like the General Data Protection Regulation (GDPR)¹ restrict the sharing and processing of sensitive data. However, many applications, such as machine learning and statistics, require vast amounts of data to be as accurate as possible. With GDPR and similar regulations, it is therefore difficult or even impossible to gather enough data to create useful and accurate models. One solution to this problem is to employ privacy-preserving cryptographic protocols and primitives, such as secure multi-party computation (MPC) or homomorphic encryption (HE). Homomorphic encryption schemes allow performing computations on encrypted data without access to the secret decryption key. Many privacy-preserving applications that employ homomorphic encryption follow the same design principle: First, the data holder encrypts their dataset using a homomorphic encryption scheme and sends the ciphertexts to a server. The server then performs the computations on the ciphertexts and produces an encrypted result. Only the data holder knows the secret decryption key, so the server has to send the encrypted result back to the data holder, who can then decrypt it to obtain the final result of the computation. While this approach protects both the privacy of the input data and the secrecy of the applied computations, it comes with several drawbacks. First, applying homomorphic encryption results in a drastic performance penalty. Second, HE schemes suffer from ciphertext expansion: ciphertexts in HE schemes are several orders of magnitude larger than the corresponding plaintexts. This expansion increases the amount of data that has to be transferred from the data holder to the server. Especially when the client is an embedded device with limited bandwidth, memory, and computing power, this expansion can have a considerable impact on the overall performance of the application.
The academic literature proposes two orthogonal solutions to this ciphertext expansion: using symmetric ciphers in hybrid homomorphic encryption, or using LWE encryption and efficient conversion algorithms [CDKS21]. In this paper, we focus on hybrid homomorphic encryption, its effect on integer HE use cases, and the consequences of the chosen symmetric cipher.

Hybrid Homomorphic Encryption (HHE)
Hybrid homomorphic encryption was first mentioned in [NLV11]. The main idea behind HHE is the following: instead of encrypting the data with an HE scheme, the data holder encrypts it with a symmetric cipher (expansion factor of 1) and sends the symmetric ciphertexts to the server. The server first homomorphically evaluates the symmetric decryption circuit to transform the symmetric ciphertexts into homomorphic ciphertexts and then proceeds with the actual computations. This procedure trades reduced bandwidth for more expensive computation on the server, and it requires the data holder to first send the symmetric key encrypted under homomorphic encryption.
HE Schemes and HE-Friendly Symmetric Ciphers. Today, many HE schemes exist, such as BFV [Bra12, FV12] and BGV [BGV12], which allow integer plaintexts in Z_q with q ≥ 2; CKKS [CKKS17], which allows HE on real numbers; the original TFHE scheme [CGGI20], which allows only boolean plaintexts; and the optimized TFHE version working over low-precision integers [CJP21]. These schemes come with vastly different advantages and disadvantages and have diverging optimization criteria, such as minimizing the multiplicative depth in BFV/BGV/CKKS and minimizing the total number of gates in the gate-bootstrapping mode of TFHE.
At first, researchers tried to evaluate existing ciphers, such as AES [DR00, DR02], under homomorphic encryption [GHS12, CCK+13, CLT14]. However, despite their efficiency in plain, existing ciphers were not well-suited for HHE. In particular, their large multiplicative depth made them incompatible with modern HE schemes. As a consequence, researchers came up with symmetric cipher designs whose optimization criteria differ from those of, e.g., AES, mainly minimizing the multiplicative depth so that the cipher can be evaluated efficiently under HE. Many proposed HE-friendly symmetric ciphers, such as LowMC [ARS+15], Rasta [DEG+18], Agrasta [DEG+18], Dasta [HL20], Kreyvium [CCF+16], and FiLIP [MCJS19], are defined over Z_2, i.e., plaintexts are binary values. Consequently, they can be used to combat ciphertext expansion in the original TFHE scheme, as well as in BFV/BGV when instantiated with q = 2. Follow-up work then also introduced efficient ciphers for the requirements of the updated TFHE scheme (e.g., Elisabeth [CHMS22]), as well as ciphers tailored to CKKS, such as Rubato [HKL+22].
Open Problem. Despite the vast number of symmetric ciphers proposed in the literature, the real ramifications of applying HHE to a use case are not yet understood. This is a direct consequence of the lack of benchmark comparisons of different symmetric ciphers in different HE libraries applied to different use cases. As a result, the inefficiency of existing symmetric ciphers when applied to BFV/BGV with q > 2 (which is required for many use cases involving statistics or integer arithmetic in general, e.g., [JVC18, CMdG+21, BBH+22]) has not been recognized so far: once q is chosen for BGV/BFV, it cannot be changed without knowledge of the secret decryption key or without bootstrapping, which is still not supported by many major HE libraries. Thus, to use one of the many ciphers over Z_2, one needs to instantiate BGV/BFV with q = 2 to be able to evaluate the boolean decryption circuit of these ciphers. This, however, also forces the use case to be evaluated over Z_2, which requires building binary circuits with significantly larger multiplicative depth to realize integer arithmetic. For this reason, using HHE in use cases over integers already implies a heavy performance loss compared to just implementing the use case with homomorphic encryption.

Contribution
In this paper, we tackle these problems and close the gap by implementing a benchmarking framework comparing multiple symmetric ciphers in three HE libraries and two use cases. We then introduce a novel family of stream ciphers (dubbed Pasta) defined over F_p. More specifically, our contributions are the following:
Extensive HHE Benchmarking Framework. To the best of our knowledge, we are the first to provide an extensive comparison of different symmetric ciphers in the context of hybrid homomorphic encryption spanning several libraries. Notably, this increases the number of publicly implemented HHE schemes from only one to a total of 17, aiding public verifiability.² We conclude that most existing designs are not well-suited for large classes of use cases.

Designing an Efficient Cipher for HHE.
Based on the conclusions drawn from our benchmarking framework, we explore the design space for efficient HHE ciphers over F_p. Starting from the cost metrics in BFV/BGV and the Rasta design strategy, we compare several different proposals for efficient S-box implementations and show how to instantiate the slowest part of the cipher, the linear layer, efficiently by splitting the design into two parallel branches.
Pasta. Based on the analysis just described, we propose a new symmetric cipher, dubbed Pasta, optimized for integer HHE use cases. Pasta operates on plaintexts in F_p^t, greatly increasing performance compared to most previously proposed symmetric ciphers, which are defined over Z_2. Further, Pasta is designed to exploit the structure of two state-of-the-art integer HE cryptosystems (BFV and BGV) to minimize HHE decompression latency while maintaining a small number of rounds and a small multiplicative depth. Extensive benchmarks in our newly created framework confirm the advantage of Pasta over all other symmetric ciphers for HHE. Concretely, Pasta outperforms Agrasta [DEG+18], the currently fastest Z_2 cipher for HHE, by a factor of 82 when applied to a small use case in HElib, and it outperforms Masta [HKC+20] and Hera [CHK+21], the two F_p^t contenders, by factors of up to 6 and 11, respectively, when applied to a larger use case in SEAL.
Follow-Up Works. Since we first made our paper publicly available, our implementation framework has been used as a baseline for benchmarks of the follow-up designs proposed in [CIR22] and [CHMS22]. Furthermore, the Rubato cipher [HKL+22] directly uses the Feistel S-box (proposed in Section 6.4) as its non-linear layer.

Outline
The remainder of the paper is structured as follows. We start with a brief introduction to homomorphic encryption in Section 2, before discussing related work on combating ciphertext expansion in different HE libraries in Section 3. We then show the effect of HHE on the server and client side when applied to a specific use case in Section 4. This section concludes that the choice of symmetric cipher mostly affects the server side, which is why we proceed by investigating the server side when using Z_2 ciphers in Section 5. Since these ciphers are not suited for integer HE use cases, we design a new cipher in Section 6 and give the complete specification of the result, dubbed Pasta, in Section 7. We continue by analyzing the security of Pasta in Section 8 and finally benchmark it against its competitors in Section 9.
About Benchmarks. Throughout the paper, we run all benchmarks on a Linux server with an Intel Xeon E5-2699 v4 CPU (2.2 GHz, Turbo Boost up to 3.6 GHz) and 512 GB RAM. Each individual benchmark has access to only one thread.

Notation
Let t ≥ 1. For each vector $x \in \mathbb{F}_p^{2t}$ we write $x := x_L \| x_R$, where $x_L, x_R \in \mathbb{F}_p^t$ are the left and the right t words, respectively. Further, we write $\mathrm{rot}_i(y)$ to denote a rotation of the vector $y \in \mathbb{F}_p^t$ by i steps to the left. With $y \odot m$ we denote the element-wise (Hadamard) product of two vectors $y, m \in \mathbb{F}_p^t$.
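To make the notation concrete, the following minimal sketch models vectors over F_p as plain Python lists. The helper names `split_halves`, `rot`, and `hadamard` are hypothetical and only illustrate the three operations defined above; they are not taken from the Pasta implementation.

```python
def split_halves(x):
    """Split x in F_p^{2t} into (x_L, x_R), each in F_p^t."""
    if len(x) % 2:
        raise ValueError("x must have even length 2t")
    t = len(x) // 2
    return x[:t], x[t:]


def rot(y, i):
    """rot_i(y): rotate the vector y to the left by i steps."""
    i %= len(y)
    return y[i:] + y[:i]


def hadamard(y, m, p):
    """Element-wise (Hadamard) product of y and m over F_p."""
    if len(y) != len(m):
        raise ValueError("vectors must have the same length")
    return [(a * b) % p for a, b in zip(y, m)]


if __name__ == "__main__":
    p = 17
    x_l, x_r = split_halves([1, 2, 3, 4, 5, 6])
    print(x_l, x_r)               # [1, 2, 3] [4, 5, 6]
    print(rot(x_l, 1))            # [2, 3, 1]
    print(hadamard(x_l, x_r, p))  # [4, 10, 1]
```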

Homomorphic Encryption
Homomorphic encryption has often been labeled the holy grail of cryptography, since it allows performing any computation on encrypted data without knowledge of the secret decryption key. The concept of HE was introduced by Rivest et al. [RAD78], but the first schemes were only capable of performing one specific operation on encrypted data (e.g., multiplication with RSA [RSA78], addition with Paillier [Pai99]). The breakthrough came with Gentry's work from 2009 [Gen09], which presented the first fully homomorphic encryption (FHE) scheme, capable in theory of performing any computation on encrypted data. Although deemed impractical, this work paved the way for many improvements and follow-up publications [Bra12, FV12, BGV12, CGGI20, CKKS17]. Today's HE schemes base their security on the learning with errors (LWE) hardness assumption [Reg05] and its optimization over polynomial rings (Ring-LWE, or R-LWE) [LPR10]. In these schemes, random Gaussian noise is added during encryption. Each homomorphic operation then increases this noise: negligibly for homomorphic addition, but significantly for homomorphic multiplication. Once the noise exceeds a specific threshold, decryption fails. The resulting schemes therefore allow the evaluation of arbitrary circuits over encrypted data up to a specific multiplicative depth, which depends on the encryption parameters. Such a scheme is called a somewhat homomorphic encryption (SHE) scheme. In general, increasing the parameters to support a larger circuit depth comes with a great performance penalty. In [Gen09], Gentry introduced the bootstrapping technique, a method to reset the noise in a homomorphic ciphertext. Bootstrapping allows evaluating circuits of arbitrary depth on encrypted data and turns a (bootstrappable) SHE scheme into an FHE scheme. However, bootstrapping comes with a significant performance overhead, which is why it is often faster to choose an SHE scheme with sufficiently large parameters.

Packing
Many modern HE schemes allow encoding a vector of n plaintexts into a single polynomial, and therefore encrypting a vector into a single ciphertext [SV14]. The size of the ciphertext does not depend on how many of the slots (≤ n) are filled during encryption. Homomorphic operations on the ciphertexts then correspond to element-wise operations on the encrypted vector. This packing is similar to single-instruction-multiple-data (SIMD) instructions on modern CPUs and can be used to massively increase throughput and decrease the ciphertext expansion of HE applications. Operations supported by this packing include addition, subtraction, multiplication, and slot rotation. However, once encrypted, individual slots of the encrypted vector cannot be accessed directly. The number of available slots n depends on the parameters of the HE scheme and can reach several thousand. Slot rotation is implemented by evaluating Galois automorphisms $\tau_i: a(X) \mapsto a(X^i)$ on the encoded polynomials.
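The Galois automorphism itself is a simple coefficient permutation with sign flips. The sketch below assumes the usual ring $\mathbb{Z}_q[X]/(X^N + 1)$ with N a power of two and odd i. It only shows the action on plaintext coefficients; which exponent i realizes which slot rotation depends on the scheme's slot encoding and is not modeled here. The function name is hypothetical.

```python
def automorphism(a, i, q):
    """Apply tau_i: a(X) -> a(X^i) in Z_q[X]/(X^N + 1).

    `a` lists the N coefficients (constant term first). Since X^N = -1,
    exponents are reduced modulo 2N and a wrap past N flips the sign.
    """
    n = len(a)
    if i % 2 == 0:
        raise ValueError("i must be odd for tau_i to be an automorphism")
    out = [0] * n
    for j, c in enumerate(a):
        k = (i * j) % (2 * n)
        if k < n:
            out[k] = (out[k] + c) % q
        else:  # X^k = -X^(k - N)
            out[k - n] = (out[k - n] - c) % q
    return out


if __name__ == "__main__":
    # 1 + 2X + 3X^2 + 4X^3 with X -> X^3 in Z_17[X]/(X^4 + 1)
    print(automorphism([1, 2, 3, 4], 3, 17))  # [1, 4, 14, 2]
```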

HE Schemes and Libraries
In this paper, we consider three HE schemes and their implementation in three libraries.We discuss the BFV [Bra12,FV12] scheme (and its implementation in SEAL [SEA20]) in this section and for the sake of conciseness refer to Appendix A for a discussion of BGV [BGV12] in HElib [HS20] and TFHE [CGGI20] in the TFHE library [CGGI16].Furthermore, benchmarks in HElib and TFHE are later discussed in the appendix as well.

BFV [Bra12, FV12] in SEAL [SEA20].
In BFV as implemented in SEAL, plaintexts are elements of Z_q. However, to support the packing described in the previous section, q has to be a prime p; packing is not supported for q = 2, i.e., one cannot pack boolean plaintexts. We use SEAL version 3.6.2 in this paper. Homomorphic additions add negligible runtime and noise, which is why additions are considered free in the BFV cryptosystem. Therefore, the most relevant performance metric is the multiplicative depth of the evaluated circuit.

Related Work
In this paper, we focus on HHE for the BFV and BGV HE schemes, and also discuss the application to the gate-bootstrapping mode of the original TFHE library. Hence, we include the boolean ciphers LowMC [ARS+15], Rasta [DEG+18], Agrasta [DEG+18] (the "aggressive" version of Rasta, recently broken in [LSMI21]), Dasta [HL20], Kreyvium [CCF+16], and FiLIP [MCJS19], alongside the F_p competitors Masta [HKC+20] and Hera [CHK+21], in our comparison. Other proposals exist for different HE schemes, such as CKKS, and for the Concrete library; we briefly discuss them in Section 3.1 and Section 3.2. Finally, in Section 3.3, we discuss an alternative approach to reducing bandwidth requirements for HE applications that does not involve symmetric encryption schemes.
CKKS [CKKS17] allows HE on real numbers; however, the scheme includes approximation errors, which makes it incompatible with directly evaluating symmetric ciphers under CKKS encryption. In [CHK+21], the authors mitigate this problem by proposing a framework (alongside the stream cipher Hera) in which the symmetric cipher is first evaluated under BFV encryption before the result is translated to a CKKS ciphertext. The currently fastest symmetric cipher proposed for this framework is Rubato [HKL+22]. Similar to CKKS, this cipher includes approximation errors, which allows it to greatly reduce the number of rounds. Consequently, it is very fast when used with CKKS, but incompatible with BFV and BGV. To highlight the impact of Pasta, we note that the S-box used in Rubato is taken directly from Pasta, as proposed in Section 6.4.

HHE for Concrete
Recently, a new HE library, dubbed Concrete [CJL+20], has emerged, which implements a newer variant of TFHE as proposed in [CJP21]. This library differs vastly from SEAL/HElib: it allows performing HE on plaintexts in the ring Z_{2^q} for small q, and supports bootstrapping and evaluating lookup tables during bootstrapping. Packing, however, is not supported. In [CHMS22], the authors introduce Elisabeth-4, a Z_{2^q} variant of FiLIP optimized for HHE using Concrete, and evaluate its performance when classifying the FMNIST dataset with a deep neural network using HHE. Using Elisabeth-4 (and consequently Concrete) leads to different trade-offs compared to Pasta: on the one hand, HE use cases are not bounded by depth due to bootstrapping; on the other hand, only small-precision integers (q = 4) are supported, potentially limiting its applicability to high-precision use cases. Directly comparing Elisabeth and Pasta is difficult due to their different design criteria and optimizations for vastly different HE libraries. Nonetheless, in [CHMS22], the authors compare Elisabeth to Pasta using our implementation framework, showing that a single-threaded evaluation of Pasta-4 in HElib achieves 1.26 times higher throughput than Elisabeth-4 in Concrete, even though the latter is evaluated with 48 threads.

LWE-Native Encryption
In [CDKS21], the authors describe efficient algorithms to convert many LWE ciphertexts into a single packed (see Section 2.1) R-LWE ciphertext. These algorithms can also be used to reduce the ciphertext expansion of homomorphic encryption. Their approach works as follows: First, each plaintext $m_i \in \mathbb{F}_p$ is encrypted under a secret key $s \in \mathbb{Z}^N$ using basic LWE encryption, by sampling a random vector $a_i \xleftarrow{\$} \mathbb{Z}_q^N$ and computing $b_i = -\langle a_i, s \rangle + \mu_i$, where $\mu_i \in \mathbb{Z}_q$ is a randomized encoding of $m_i$ (with Gaussian noise). The LWE ciphertext is then $(b_i, a_i) \in \mathbb{Z}_q^{N+1}$. To further reduce the size of the ciphertexts, one can use a random seed se and generate $a_i$ using a pseudo-random number generator (PRNG) f. The seed can be reused to generate the random part of each ciphertext as $a_i = f(se; i)$. The resulting ciphertexts are semantically secure in the random oracle model. The client then transmits all $b_i$ alongside the seed se to the server, which transforms all LWE ciphertexts into a packed HE ciphertext using the algorithms described in [CDKS21]. The total communication cost of this approach is one $\mathbb{Z}_q$ element for each plaintext $m_i \in \mathbb{F}_p$, plus one seed se to generate the random part of the ciphertexts.
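The client side of this seeded LWE encryption can be sketched in a few lines. This is a toy illustration under stated assumptions, not the construction or code of [CDKS21]: SHAKE-128 stands in for the PRNG f, the encoding is assumed to be $\mu_i = \Delta m_i + e_i$ with $\Delta = \lfloor q/p \rfloor$, the noise sampler is a rounded `random.gauss`, and the toy sizes are not secure. The point is that only the $b_i$ need to be sent, since the server can re-derive every $a_i$ from the seed.

```python
import hashlib
import random


def expand_a(seed, i, n, q):
    """Stand-in for the PRNG f(se; i): derive a_i in Z_q^n from SHAKE-128."""
    width = (q.bit_length() + 7) // 8 + 8  # extra bytes keep the modulo bias small
    stream = hashlib.shake_128(seed + i.to_bytes(8, "little")).digest(n * width)
    return [int.from_bytes(stream[j * width:(j + 1) * width], "little") % q
            for j in range(n)]


def encrypt(msgs, s, seed, q, p, sigma=3.2, rng=random):
    """Return only the b_i; each a_i is re-derivable from the public seed."""
    delta = q // p
    bs = []
    for i, m in enumerate(msgs):
        a = expand_a(seed, i, len(s), q)
        e = round(rng.gauss(0, sigma))   # rounded Gaussian noise (toy sampler)
        mu = (delta * m + e) % q         # assumed encoding mu_i = Delta*m_i + e_i
        bs.append((-sum(x * y for x, y in zip(a, s)) + mu) % q)
    return bs


def decrypt(bs, s, seed, q, p):
    """Recompute a_i from the seed, strip <a_i, s>, and round mu_i / Delta."""
    delta = q // p
    out = []
    for i, b in enumerate(bs):
        a = expand_a(seed, i, len(s), q)
        mu = (b + sum(x * y for x, y in zip(a, s))) % q
        out.append(((mu + delta // 2) // delta) % p)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    q, p, n = 2**40, 257, 64                         # toy sizes, NOT secure
    s = [rng.choice([-1, 0, 1]) for _ in range(n)]   # small secret key in Z^N
    seed = b"public-seed"
    msgs = [0, 1, 256, 100]
    bs = encrypt(msgs, s, seed, q, p, rng=rng)
    print(decrypt(bs, s, seed, q, p))                # [0, 1, 256, 100]
```

The transmitted data is one $\mathbb{Z}_q$ element per message plus the seed, which is exactly why the expansion factor is governed by the ratio of log q to log p.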
According to the benchmarks in [CDKS21], the LWE encryption approach has a smaller multiplicative depth, and thus less noise consumption, than HHE.³ Depending on the evaluated use case, this smaller noise consumption can allow smaller HE parameters with less noise budget, and thus yield a runtime advantage. However, these algorithms do not achieve a ciphertext expansion factor of 1, but a factor of $\frac{\log q}{\log p}$ (plus |se| bits for the seed). For many HE applications, the plaintext space defined by p is in the range of 16 to 60 bits, while q can easily exceed 800 bits, resulting in large expansion factors.
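The expansion factor is easy to evaluate. The sketch below assumes each $b_i$ is sent with log q bits and that a single seed is sent once for all ciphertexts; the example plugs in the 881-bit q and 60-bit p of the first-look benchmark in Section 4. The function name is hypothetical.

```python
def lwe_expansion(log_q, log_p, num_elems=1, seed_bits=0):
    """Transmitted bits per plaintext bit for seeded LWE.

    Assumes one log_q-bit b_i per log_p-bit message m_i and one seed
    of seed_bits bits shared by all num_elems ciphertexts.
    """
    return (num_elems * log_q + seed_bits) / (num_elems * log_p)


if __name__ == "__main__":
    print(round(lwe_expansion(881, 60), 2))  # 14.68
    # The seed cost becomes negligible when amortized over many ciphertexts.
    print(lwe_expansion(881, 60, num_elems=200_000, seed_bits=256))
```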

A first look at HHE
The performance, advantages, and disadvantages of HHE are not yet well understood. Therefore, we start with a high-level investigation of the effects on both the client and the server when applying HHE to a real use case, before investigating the choice of symmetric cipher in the next sections.
Benchmarking a Generic Use Case. Matrix multiplications over integers are a basic building block in many applications involving statistics or machine learning. Hence, for our first look, we apply HE and HHE to a use case involving three affine transformations of a secret vector $x_0 \in \mathbb{F}_p^{200}$. In other words, the layers have the form $x_i = M_i \cdot x_{i-1} + b_i$, where $M_i \in \mathbb{F}_p^{200 \times 200}$, $b_i \in \mathbb{F}_p^{200}$, and p is a 60-bit prime. To make the use case more generic, we square the output vector element-wise after each of the first two affine transformations. The final use case has a multiplicative depth of 3 plaintext-ciphertext and 2 ciphertext-ciphertext multiplications and can be seen as, e.g., a small 3-layer neural network with squaring activation functions. We benchmark this use case after the initial setup phase, i.e., the server already knows an HE encryption of the symmetric key and all HE evaluation keys. Further, we repeat this 1000 times, and the server aggregates the final results before sending them back to the client. In a real-world scenario, this would correspond to, e.g., a sensor device sending measurements to a server at fixed intervals.
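For reference, the computation itself (without any encryption) can be sketched as follows. This is a plaintext model only, with hypothetical function names; under HE, each `affine` call corresponds to a plaintext-ciphertext multiplication level and each squaring to a ciphertext-ciphertext multiplication level, which gives the depth stated above.

```python
def affine(M, x, b, p):
    """One layer x_i = M_i * x_{i-1} + b_i over F_p."""
    return [(sum(m * xj for m, xj in zip(row, x)) + bi) % p
            for row, bi in zip(M, b)]


def use_case(x0, layers, p):
    """Three affine layers; element-wise squaring after all but the last."""
    x = list(x0)
    for k, (M, b) in enumerate(layers):
        x = affine(M, x, b, p)           # plaintext-ciphertext products under HE
        if k < len(layers) - 1:
            x = [(xi * xi) % p for xi in x]  # ciphertext-ciphertext products under HE
    return x


if __name__ == "__main__":
    p = 97  # small prime for illustration; the benchmark uses a 60-bit prime
    identity = ([[1, 0], [0, 1]], [0, 0])
    print(use_case([2, 3], [identity] * 3, p))  # [16, 81]
```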
In Table 1, we give results for evaluating this use case in the SEAL library: first using only HE, then applying HHE with 3 different ciphers, and finally applying the alternative approach using LWE-native encryption [CDKS21] (i.e., the transmitted ciphertexts are essentially seeded LWE ciphertexts). To better show the effects of HHE, we instantiate the HHE benchmark once with a generic symmetric cipher (AES), once with a fast boolean HHE-optimized cipher (Agrasta), and once with an HHE-optimized cipher defined over F_p (Pasta-3 as defined in Section 7; since we aim to investigate the general effects of applying HHE to a use case in this section, the details of Pasta-3 are not important at this point).

HHE Results.
As Table 1 shows, using HHE reduces the total client-to-server communication from 7.4 GB to 1.5 MB, which is exactly the size of sending the input vector of 200 60-bit field elements 1000 times. Furthermore, data encryption is faster and requires less RAM, with the traditional cipher AES being the fastest option. However, to support the homomorphic evaluation of the HHE decompression circuit (i.e., homomorphically computing the symmetric decryption), the server requires larger HE parameters with a higher noise budget, increasing the server-side runtime and RAM requirements. For HHE using the F_p cipher (i.e., Pasta-3), the server-side runtime increases by a factor of 10. However, using HHE with Z_2 ciphers (e.g., Agrasta or AES) requires implementing binary circuits for the use case, resulting in a significant multiplicative depth that requires huge HE parameters, and thus in infeasibly long server runtimes.
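The 1.5 MB figure follows directly from the expansion factor of 1: the client sends exactly the plaintext bits. A quick check, using the rounded 7.4 GB from Table 1 for plain HE (the helper name is hypothetical):

```python
def upload_bytes(num_elems, bits_per_elem, repetitions):
    """Bytes needed to send num_elems field elements of bits_per_elem bits, repeated."""
    return num_elems * bits_per_elem * repetitions // 8


hhe_bytes = upload_bytes(200, 60, 1000)  # symmetric ciphertexts: expansion factor 1
he_bytes = 7.4e9                         # plain HE upload from Table 1 (rounded)
ratio = he_bytes / hhe_bytes

if __name__ == "__main__":
    print(hhe_bytes)     # 1500000, i.e., 1.5 MB
    print(round(ratio))  # 4933
```

The resulting ratio of roughly 4.9 × 10³ matches the communication advantage reported for this use case; the small difference stems from the rounding of the 7.4 GB value.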
Remark 1. As discussed in Section 2, one can use bootstrapping to reset the noise in a homomorphic ciphertext, allowing circuits of arbitrary multiplicative depth to be evaluated. However, SEAL does not support bootstrapping, and bootstrapping in HElib is still very inefficient and does not result in faster runtimes for the Z_2 ciphers compared to Pasta in HHE. Thus, we omit explicit bootstrapping benchmarks in this paper.

LWE Results.
As discussed in Section 3.3, LWE-native encryption [CDKS21] has larger ciphertext expansion than HHE (concretely, an expansion factor of 881/60 ≈ 14.68 for the parameter set used). However, its smaller multiplicative depth allows it to use the same HE parameters as plain homomorphic encryption, resulting in a smaller runtime overhead. Both plain HE and LWE-native encryption require sampling Gaussian noise during encryption. Constrained devices, however, often do not have access to a reliable source of randomness. Therefore, Table 1 also lists the number of random Gaussian words the client requires to perform the encryption. HHE does not require sampling random values during encryption, which is why HHE is the preferable choice on constrained devices without a reliable source of randomness. Consequently, these first benchmarks show that HHE is preferable on the client side: it requires no Gaussian randomness, has faster plain performance, and requires less communication. The LWE-native encryption approach, however, leads to faster server-side evaluation due to its smaller multiplicative depth.
Client Side Performance. Table 1 clearly shows that using only homomorphic encryption results in unnecessarily large client-to-server communication. To further demonstrate the performance loss, Figure 1 shows the combined client timings (encryption plus client-to-server communication) for different network speeds. We depict timings for using only HE, HHE using Pasta-3, and the LWE-native approach. We omit HHE using Z_2 ciphers, since they result in infeasible server runtimes. Figure 1 shows that using HHE always results in the fastest client-side latencies, especially for network speeds below 1 Gbps (the average LTE upload speed in the USA is 5 Mbps⁴), where the HE runtime is fully dominated by data transmission.
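The client latency model behind Figure 1 can be approximated as encryption time plus upload time. The sketch below is a simplification under stated assumptions: it ignores protocol overhead and latency, uses the rounded upload sizes from Table 1, and sets the encryption time to zero because the measured encryption times are not reproduced here, so it only yields a transmission-only lower bound.

```python
def transfer_seconds(num_bytes, mbps):
    """Time to upload num_bytes over a link of mbps megabits per second."""
    return num_bytes * 8 / (mbps * 1e6)


def client_latency(enc_seconds, num_bytes, mbps):
    """Combined client time: local encryption followed by the upload."""
    return enc_seconds + transfer_seconds(num_bytes, mbps)


if __name__ == "__main__":
    uploads = {"HHE": 1.5e6, "HE": 7.4e9}  # bytes, from Table 1 (rounded)
    for mbps in (5, 100, 1000):
        for name, size in uploads.items():
            # encryption time set to 0: transmission-only lower bound
            t = client_latency(0.0, size, mbps)
            print(f"{name} @ {mbps} Mbps: {t:.1f} s")
```

At the 5 Mbps LTE upload speed, this gives about 2.4 s for the HHE upload versus roughly 3.3 hours for plain HE, which illustrates why transmission dominates the HE latency on slow links.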

Conclusion.
To summarize, if the encryption time on a client is the bottleneck, then using HHE with an F_p cipher (in this case Pasta-3) is the preferred choice. Only HHE using traditional Z_2 ciphers (e.g., AES) is faster, but using them results in infeasibly long server-side computations. Furthermore, if the client bandwidth is the bottleneck, then HHE has a considerable advantage. The concrete communication advantage depends on the HE parameters. For our example use case, HE requires 4936× more communication than HHE, and the LWE-native approach 14.68× more. Since HHE has the largest server-side runtime overhead, it is most beneficial for constrained clients or in slow network settings. The choice of symmetric cipher used in HHE has similar effects on the client side (all ciphers have a ciphertext expansion of 1), but severely affects the server-side runtime. Consequently, we investigate the server-side computation using different symmetric ciphers in the remainder of the paper, starting with the inefficiency of Z_2 ciphers.
We further note that, for the sake of simplicity, we assume plaintexts to have the exact size of the used prime p (i.e., 60 bits) in this first example of HHE. In practice, the plaintext space actually used might be smaller to prevent overflows in F_p during homomorphic computations. Thus, while the symmetric cipher and HE scheme are still instantiated with a 60-bit plaintext prime p, the actual plaintexts might be significantly smaller. However, the values in Table 1 do not change for HE, LWE, and HHE with the F_p cipher Pasta-3: the size of HE and LWE ciphertexts depends not on p but on a ciphertext modulus q, the size of the used plaintext is undetectable once encrypted, and F_p ciphers must be instantiated with the same prime as the HE scheme to allow decryption under HE. Only the Z_2 ciphers would benefit from smaller plaintexts through smaller client-to-server communication. However, since their server-side computation, with its excessive multiplicative depth, is infeasibly long due to the need for binary circuits, this small client-side advantage plays no role in practice.

Inefficiency of Z 2 Ciphers
In this section, we evaluate the usability of proposed symmetric ciphers for HHE.We focus on boolean ciphers with plaintexts in Z 2 since these are the majority of ciphers proposed for HHE.The main design criterion of all these ciphers is to reduce the AND depth of the decryption circuit.
Hybrid homomorphic encryption aims to reduce the communication overhead of outsourcing computations to a cloud. Therefore, we investigate not only the performance of each cipher's decryption circuit under homomorphic encryption, but also the performance of the cipher in a complete HHE use case. The use case we benchmark in this section is very small: a server computes $r = M \cdot x + b$ with $M \in \mathbb{Z}_{2^{16}}^{5 \times 5}$ and $x, b \in \mathbb{Z}_{2^{16}}^{5}$, i.e., a 5 × 5 matrix-vector multiplication of 16-bit integers. The matrix M and the vector b are private and owned by the server, whereas x is a private vector owned by the client. The client uses HHE to send x in encrypted form to the server and receives r in encrypted form as a result. As described above, choosing a cipher over Z_2 also requires computing the integer matrix multiplication over Z_2. This requires implementing binary circuits for addition⁵ and multiplication, which have a much higher AND depth than the same operations over F_p. Despite this being only a very small matrix multiplication (5 × 5 with 16-bit integers), our benchmarks (given later in this section) show that the evaluation is already very slow, making Z_2 ciphers infeasible for real-world statistics or machine learning use cases with multiple chained multiplications of larger integers and matrices with hundreds of entries.
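To see why integer arithmetic over Z_2 is so deep, the sketch below tracks the AND depth of a textbook ripple-carry adder, where XOR (addition mod 2) is free in depth and each AND (multiplication mod 2) adds one level. This is an illustration of the general effect, not the adder used in our implementations: for 16-bit inputs the most significant sum bit already has AND depth 15, whereas the same addition over F_p has multiplicative depth 0. Depth-optimized adders (e.g., carry-lookahead designs) reduce the depth to logarithmic in the bit width at the cost of more gates, and multiplication circuits built from adders are deeper still.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bit:
    value: int  # 0 or 1
    depth: int  # AND (multiplicative) depth needed to compute this bit


def xor(a, b):
    # XOR is addition mod 2: it does not increase the multiplicative depth.
    return Bit(a.value ^ b.value, max(a.depth, b.depth))


def and_(a, b):
    # AND is multiplication mod 2: it adds one level of depth.
    return Bit(a.value & b.value, max(a.depth, b.depth) + 1)


def ripple_carry_add(xs, ys):
    """Add two little-endian bit vectors modulo 2^n (carry-out discarded)."""
    out = []
    carry = Bit(0, 0)
    for x, y in zip(xs, ys):
        s = xor(x, y)
        out.append(xor(s, carry))
        carry = xor(and_(x, y), and_(s, carry))
    return out


def to_bits(v, n):
    return [Bit((v >> i) & 1, 0) for i in range(n)]


def from_bits(bits):
    return sum(b.value << i for i, b in enumerate(bits))


if __name__ == "__main__":
    r = ripple_carry_add(to_bits(40000, 16), to_bits(30000, 16))
    print(from_bits(r))              # 4464, i.e., 70000 mod 2^16
    print(max(b.depth for b in r))   # 15
```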

A Zoo of Z 2 Ciphers
In this paper, we benchmark 128-bit security instances of the ciphers LowMC [ARS+15], Rasta [DEG+18], Agrasta [DEG+18], Dasta [HL20], Kreyvium [CCF+16] (as a stream cipher and in depth-bounded CTR mode), and FiLIP [MCJS19]. In Table 2, we summarize the parameters of the ciphers in their respective modes of operation.⁶ We start this section by introducing Rasta, which is the baseline for many other proposals, before discussing some follow-up ciphers not included in our benchmark comparisons.
Rasta. Rasta is a family of stream ciphers in which a permutation is applied to the secret key to produce the keystream. The permutation consists of several rounds of affine layers and an S-box instantiated with the χ-transformation [Dae95]. The main design criterion of Rasta is that each affine layer is generated pseudorandomly from an extendable-output function (XOF) [NIS15] seeded with a nonce N and the block counter i. This essentially prevents all attacks that require multiple plaintext/ciphertext pairs and allows building a cipher with a low number of rounds. We depict the Rasta permutation in Figure 2.
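The two ingredients named above can be sketched as follows. The χ function uses the Keccak-style formula $y_i = x_i \oplus (\bar{x}_{i+1} \wedge x_{i+2})$ with cyclic indices, which needs only one AND level and is a permutation for odd block sizes. The affine-layer derivation is a hypothetical toy: it draws a matrix and round constant from SHAKE-128 seeded with the nonce, the block counter, and a round index, but the byte layout is our own assumption and, unlike Rasta, it does not reject non-invertible matrices.

```python
import hashlib


def chi(x):
    """Keccak-style chi on a cyclic bit vector: y_i = x_i ^ (~x_{i+1} & x_{i+2})."""
    n = len(x)
    return [x[i] ^ ((x[(i + 1) % n] ^ 1) & x[(i + 2) % n]) for i in range(n)]


def xof_affine_layer(nonce, counter, round_idx, n):
    """Toy: derive an n x n binary matrix and an n-bit constant from SHAKE-128.

    Hypothetical layout; Rasta additionally ensures the matrix is invertible.
    """
    seed = nonce + counter.to_bytes(8, "little") + round_idx.to_bytes(4, "little")
    bits_needed = n * n + n
    stream = hashlib.shake_128(seed).digest((bits_needed + 7) // 8)
    bits = [(stream[k // 8] >> (k % 8)) & 1 for k in range(bits_needed)]
    matrix = [bits[r * n:(r + 1) * n] for r in range(n)]
    constant = bits[n * n:]
    return matrix, constant


def apply_affine(matrix, constant, x):
    """Compute matrix * x + constant over Z_2."""
    return [(sum(m & xi for m, xi in zip(row, x)) + c) % 2
            for row, c in zip(matrix, constant)]


if __name__ == "__main__":
    state = [1, 0, 1, 1, 0]
    M, c = xof_affine_layer(b"nonce", counter=0, round_idx=0, n=len(state))
    print(chi(apply_affine(M, c, state)))  # one toy round: affine layer, then chi
```

Because a fresh affine layer is derived for every nonce and block counter, an attacker never observes two blocks computed with the same linear layers, which is what allows Rasta to use so few rounds.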
Fasta. Shortly after we first released our paper to the public, the cipher Fasta [CIR22] was published. Fasta is an optimization of Rasta in which the linear layer is adapted for faster packed evaluation with specific HElib parameters. However, since not every HE library (e.g., SEAL) allows packing for Z_2 ciphers, and Fasta's optimization directly benefits from very specific HElib parameters that do not translate to every use case or library, we do not include it in our comparisons. For benchmarks comparing Rasta to Fasta using our implementation framework, we refer to [CIR22].
Chaghri. Very recently, another boolean cipher, namely Chaghri [AMT22], was proposed in the literature. Following the Marvelous [AAB+20] design strategy, each round of Chaghri has an AND depth of 2. Together with its comparably high number of rounds, this gives Chaghri a total depth of 16, making it significantly deeper than any other symmetric cipher over Z_2 discussed in our work. Furthermore, this design is heavily optimized for a special type of packing in which each slot encodes a polynomial in $\mathbb{F}_{2^{63}}$. While this allows the use of Frobenius automorphisms to evaluate $x^{2^k}$ for free, it also has the disadvantage that no technique (to the best of our knowledge) is known to homomorphically extract bits from these polynomials. Consequently, one either has to pack only one bit into each of these polynomials, severely limiting throughput, or Chaghri can only be applied to very specific use cases using this packing. Furthermore, this type of packing is not available in some libraries, such as SEAL. Finally, each Chaghri round consists of two multiplications with 3 × 3 MDS matrices, which have to be implemented over polynomials with 63 elements, which is very expensive without this packing. Besides, Chaghri was broken shortly after publication, which is also confirmed by the authors [AMT22]. The attack [LAW+22] works in practical time and increases the required number of rounds from 8 to at least 14. Based on the benchmarks given in [AMT22], this 75% increase would result in a performance close to AES (the only other cipher considered in their paper), which is severely outperformed by every other Z_2 cipher proposed for HHE. However, the authors of [LAW+22] propose a modification of Chaghri that keeps 8 rounds while maintaining roughly the same efficiency, which was later adopted by the authors of Chaghri [AMT22].
For all these reasons, Chaghri does not provide better performance than the other ciphers considered in this paper, and we do not include it in our performance evaluation.

SEAL Benchmarks
In this section, we discuss the benchmarks for the Z_2 ciphers in SEAL; for benchmarks in HElib and TFHE, we refer to Appendix B.2.1 and Appendix B.1, respectively. In SEAL, the available noise budget (i.e., how much further noise can be introduced before decryption fails) depends on the ciphertext modulus q. However, large moduli q require a large degree N of the cyclotomic reduction polynomial for security. N, which is always a power of two, has a severe impact on the performance of the HE scheme: while a larger N allows for a larger q and hence a larger noise budget, it significantly increases the runtime of homomorphic operations.
In Table 3, we present the benchmarks for the SEAL library, both for homomorphically decrypting a single block and for the small HHE use case, i.e., the 16-bit 5 × 5 affine transformation. For both benchmarks, we give timings for homomorphically encrypting the symmetric key and for homomorphically decrypting the symmetric ciphertexts (i.e., decompressing the HHE ciphertext), using the smallest N that provides enough noise budget for correct evaluation. We parameterize q such that the HE scheme provides 128 bits of security. For the HHE use case, we additionally give the runtime of the affine transformation. Since SEAL does not support packing with plaintexts in Z_2, all implementations are bitsliced (i.e., one HE ciphertext per bit).
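The parameter choice described above (the smallest power-of-two N whose 128-bit-secure modulus is large enough) can be sketched as a table lookup. The bit bounds below are the 128-bit values that SEAL exposes through `CoeffModulus::MaxBitCount` (taken from the HE security standard); the modulus size a given cipher needs must be estimated separately, so `required_logq` and the helper name are hypothetical.

```python
# 128-bit-security upper bounds on log2(q) per ring degree N, as used by SEAL's
# CoeffModulus::MaxBitCount (HE security standard, ternary secret distribution).
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def smallest_degree(required_logq: int) -> int:
    """Return the smallest power-of-two N whose 128-bit-secure modulus has >= required_logq bits."""
    for n in sorted(MAX_LOGQ_128):
        if required_logq <= MAX_LOGQ_128[n]:
            return n
    raise ValueError(f"no supported N offers {required_logq} modulus bits at 128-bit security")

# Hypothetical requirement: a cipher whose evaluation needs a 200-bit modulus.
print(smallest_degree(200))  # 8192
```

Each step up in N doubles the ring dimension, which is why a cipher that needs only slightly more noise budget can end up noticeably slower.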

Discussion
Our benchmarks show that the runtime of the whole HHE use case (including cipher evaluation) with the Z_2 ciphers is high, even though the tested use case is small. This emphasizes the need for F_p ciphers for HHE with integer use cases. In SEAL and HElib, the fastest ciphers are those based on the Rasta design strategy (Rasta, Dasta, Agrasta), with Agrasta being the fastest due to its small multiplicative depth.
Only FiLIP has better noise propagation. However, due to its large symmetric key and long evaluation time, it is not competitive in the libraries we benchmarked. For figures comparing the runtime of HHE in SEAL and HElib, and for a comparison to F_p ciphers, we refer to Section 9.1.

Designing an Efficient Cipher for HHE over F p
Following the results from the previous section, we now want to design an efficient cipher for HHE for integer use cases.We will first have a look at existing related work (Section 6.1), before we identify the cost metric of the HE schemes in more detail (Section 6.2) and design a cipher accordingly.

Related Work
Masta. In independent and concurrent work, another symmetric cipher over F_p^t designed for HHE use cases was introduced, namely Masta [HKC+20]. Its authors propose the F_p cipher Masta to increase throughput compared to boolean ciphers when evaluated under HE, and compare its homomorphic decryption runtime to that of Rasta in the HElib library.⁷ Masta can be seen as a direct translation of Rasta (Figure 2) to F_p^t, except for a different strategy for sampling random invertible matrices. Their approach samples a random polynomial m ∈ Z_p[X]/(X^t − α) and translates m into a matrix M. This matrix is invertible by design, and they only have to sample s field elements in F_p. Even though the S-box used in Rasta is in general not a permutation over F_p^t, and therefore limits the possible outputs of the S-box layer in Masta,⁸ the designers did not consider any additional changes to the baseline design and do not leverage any advantages of HE over fields F_p. In this paper, we consider the two 128-bit security instances of Masta with the lowest depth and use Shake128 to pseudorandomly generate all affine layers.
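The polynomial-to-matrix translation used by Masta can be illustrated as follows: multiplication by a fixed m in Z_p[X]/(X^t − α) is linear in the coefficient vector, and its matrix has an α-twisted circulant structure because X^t ≡ α. This Python sketch (helper names are hypothetical) builds that matrix and checks it against direct polynomial multiplication. Note that the matrix is invertible exactly when m is a unit in the ring, which holds for every non-zero m when X^t − α is irreducible over F_p.

```python
def poly_mul_mod(a, b, alpha, p):
    """Coefficients of a(X) * b(X) mod (X^t - alpha) over F_p, using X^t = alpha."""
    t = len(a)
    out = [0] * t
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < t:
                out[k] = (out[k] + ai * bj) % p
            else:
                out[k - t] = (out[k - t] + alpha * ai * bj) % p
    return out

def mult_matrix(m, alpha, p):
    """Matrix of a -> m * a mod (X^t - alpha): an alpha-twisted circulant."""
    t = len(m)
    return [[m[k - j] % p if k >= j else (alpha * m[k - j + t]) % p for j in range(t)]
            for k in range(t)]

def mat_vec(M, v, p):
    return [sum(r * x for r, x in zip(row, v)) % p for row in M]

p, alpha = 17, 3
m, a = [1, 2, 3, 4], [5, 6, 7, 8]
print(mat_vec(mult_matrix(m, alpha, p), a, p) == poly_mul_mod(m, a, alpha, p))  # True
```

Only the t coefficients of m are random, so the resulting matrix is far more structured than a uniformly sampled invertible matrix.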
Since Masta does not consider any additional changes to Rasta based on the properties of BGV/BFV, and the S-box is not a permutation in F p , we aim to design a more optimized cipher in the next sections.
Hera. Another F_p cipher, namely Hera [CHK+21], was proposed in the literature alongside a framework for applying HHE to CKKS. Unlike Rubato, Hera can also be applied to BFV and BGV, which is why we include it in our comparisons.
The main design rationale behind Hera is to apply the Rasta design strategy in a different way: it also prevents statistical attacks by randomizing the cipher, but at a lower preprocessing cost. This is achieved by fixing the affine layers and randomizing the key schedule, i.e., multiplying the key elements with pseudorandomly sampled F_p elements. Hera also fixes a small state size of just 16 words and 5 rounds for 128-bit security, and instantiates its linear layers with efficient AES-like matrices. As nonlinear layer, it uses the well-known cubing layer (see Section 6.4).
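Hera's randomized key schedule can be sketched as an elementwise product of the key with fresh non-zero F_p elements per round key. In this Python sketch, the SHAKE128-based rejection sampler, the seed encoding, and the number of round keys are assumptions for illustration, not Hera's specified sampling procedure.

```python
import hashlib

def sample_nonzero_fp(seed: bytes, count: int, p: int) -> list:
    """Hypothetical sampler: 8-byte words from SHAKE128, masked to p's bit length,
    rejecting 0 and values >= p (valid for p < 2^64)."""
    mask = (1 << p.bit_length()) - 1
    out, block = [], 0
    while len(out) < count:
        chunk = hashlib.shake_128(seed + block.to_bytes(4, "big")).digest(8 * count)
        for k in range(0, len(chunk), 8):
            v = int.from_bytes(chunk[k:k + 8], "little") & mask
            if 0 < v < p and len(out) < count:
                out.append(v)
        block += 1
    return out

def hera_like_round_keys(key, nonce, n_round_keys, p):
    """Round key j = key * rc_j elementwise, with rc_j pseudorandom and non-zero per nonce."""
    keys = []
    for j in range(n_round_keys):
        rc = sample_nonzero_fp(nonce + j.to_bytes(2, "big"), len(key), p)
        keys.append([(k * r) % p for k, r in zip(key, rc)])
    return keys

KEY = list(range(1, 17))  # toy 16-word key
round_keys = hera_like_round_keys(KEY, b"nonce", n_round_keys=6, p=65537)
```

Since the round constants depend on the nonce, the cipher differs per block even though its affine layers are fixed.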

Cost Metrics
The goal is to design an efficient cipher for HHE over F_p^t with 2^16 < p < 2^60.⁹ Since in both BGV and BFV (and their respective implementations in SEAL and HElib) the most significant performance metric is the multiplicative depth, due to the absence of an efficient bootstrapping operation, our main goal is to reduce this metric. Since every round contributes to the multiplicative depth, and therefore to the overall noise consumption during a homomorphic evaluation of the cipher, we aim to design a secure cipher with a minimal number of rounds. Furthermore, high-degree polynomials have a large multiplicative depth, and hence we consider low-degree S-boxes. Meeting both of these requirements usually requires a large state size for security. However, large state sizes lead to a high runtime of the cipher evaluation, especially in the linear layers. Therefore, our design has to balance noise consumption and runtime to be efficiently usable in HHE. Furthermore, most HE applications leverage packing (Section 2.1) to increase performance, which is why we also aim for a packing-friendly cipher that produces packed homomorphically encrypted ciphertexts. For a comparison of a word-sliced implementation of our final design to a packed implementation, we refer to Appendix C, where we also compare a word-sliced implementation of Hera to Pasta.

Cost of HE Operations.
In Table 4 we summarize the cost of each HE operation in SEAL and HElib. Note that the key switching operation is free in terms of noise in SEAL, whereas it adds noise to the ciphertext in HElib. Key switching is required after a ciphertext-ciphertext multiplication and after a homomorphic Galois automorphism (required for rotations), which is why these operations consume more noise in HElib. In both libraries, the noise consumption depends on the size of the prime p, with larger p implying higher noise consumption, especially in plaintext-ciphertext (pt-ct) and ciphertext-ciphertext (ct-ct) multiplications. Therefore, plaintext-ciphertext multiplications cannot be considered negligible when working over F_p, and we also have to consider the plaintext-ciphertext multiplicative depth when designing an efficient cipher over F_p.
Remark 2. In the future, more efficient bootstrapping implementations might become available, e.g., due to efficient HE hardware accelerators that implement this feature. Depending on the concrete efficiency of bootstrapping, the optimization target in HE might shift from minimizing the multiplicative depth to minimizing the most expensive HE operations, such as multiplications. In this case, symmetric ciphers optimized for HHE could have more rounds with higher-degree S-boxes, and would more closely resemble ciphers optimized for, e.g., MPC, where the total number of multiplications is the main bottleneck.

Design Basis
Since our Z 2 benchmarks indicate that designs based on Rasta are the preferred choice, we first consider an F t p version of Rasta with equal text/key size, and then modify it for security and efficiency.In the following, we analyze several candidates for each of the operations defining the cipher, and we also determine their implementation efficiency.Based on these results, we then design Pasta in Section 7.

S-Box
The original Rasta design uses the χ-transformation [Dae95] over Z_2^t as its single nonlinear layer. However, the χ-function is in general not a permutation over F_p^t, which is why we consider alternative building blocks. Since the affine layers in a Rasta-based permutation are pseudorandomly generated for each new block, many attacks (mainly statistical attacks) are already prevented. Hence, the main goal of the S-box in this setting is to provide a sufficiently high degree to prevent algebraic attacks; the concrete structure of the S-box plays a comparably minor role. Consequently, we propose invertible low-degree S-boxes, describe how they can be efficiently implemented in a packed homomorphic evaluation, and compare their efficiency. Although it is not a permutation, Masta still uses the χ-function naturally defined over F_p^t, which is why we include it in our comparison.

χ-S-box.
The χ-S-box is defined as
$$ \chi(\vec{x})_i = x_i + (x_{i+1} + 1) \cdot x_{i+2}, \qquad 0 \le i < t. $$
The indices in the χ-S-box are taken modulo t, which is why χ can be efficiently evaluated using rotations, i.e.,
$$ \chi(\vec{x}) = \vec{x} + \left(\mathrm{rot}_1(\vec{x}) + \vec{1}\right) \odot \mathrm{rot}_2(\vec{x}). $$
This works if the rotation is cyclic over a vector of size t. However, once encrypted, homomorphic rotations are cyclic over a larger vector of size n. Hence, we need to simulate a cyclic rotation by preprocessing the state first. The resulting vector, however, has more than t elements, which can influence further homomorphic operations. Thus, one has to apply a masking multiplication afterwards with the mask m = 1 ∈ F_p^t (i.e., ones on the t state slots and zeros on all other slots).
Cube S-box. Given a prime p with gcd(p − 1, 3) = 1, the cube map x ↦ x^3 is a permutation over F_p. We recall that the cube S-box is the invertible power map with the smallest degree, and it can be efficiently evaluated by applying two homomorphic multiplications that affect the state elementwise, i.e., S(x) = x ⊙ x ⊙ x.
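The rotation-based χ evaluation with preprocessing and masking can be simulated in the clear. This Python sketch assumes that the t state words occupy the first t of n ≥ 2t slots (all others zero), that rot_j is a cyclic left rotation over all n slots, and that preprocessing duplicates the state into slots t to 2t − 1; the actual preprocessing in the implementation may differ.

```python
def rot(v, j):
    """Cyclic left rotation over all n slots: rot(v, j)[i] = v[(i + j) % n]."""
    n = len(v)
    return [v[(i + j) % n] for i in range(n)]

def chi_direct(x, p):
    """chi over F_p^t: y_i = x_i + (x_{i+1} + 1) * x_{i+2}, indices mod t."""
    t = len(x)
    return [(x[i] + (x[(i + 1) % t] + 1) * x[(i + 2) % t]) % p for i in range(t)]

def chi_packed(slots, t, p):
    """Evaluate chi on the first t of n >= 2t slots using only slot-wise ops and rotations."""
    n = len(slots)
    assert n >= 2 * t and all(s == 0 for s in slots[t:])
    # Preprocessing (one rotation, one addition): copy the state into slots t..2t-1,
    # so that rotations by 1 and 2 see the wrapped-around words for every i < t.
    dup = [(a + b) % p for a, b in zip(slots, rot(slots, -t))]
    r1, r2 = rot(dup, 1), rot(dup, 2)
    # dup + (r1 + 1) * r2: a plaintext addition, a ciphertext multiplication, an addition.
    y = [(d + (a + 1) * b) % p for d, a, b in zip(dup, r1, r2)]
    # Masking multiplication with m = (1,...,1, 0,...,0) clears all slots beyond t.
    mask = [1] * t + [0] * (n - t)
    return [(yi * mi) % p for yi, mi in zip(y, mask)]

state = [3, 5, 7, 11, 13]
print(chi_packed(state + [0] * 11, t=5, p=17)[:5] == chi_direct(state, 17))  # True
```

The final masking multiplication is a plaintext-ciphertext multiplication, so in F_p it adds to the plaintext-ciphertext depth discussed in the cost metrics.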
Feistel-Like S-Box (via a Quadratic Function).

S-Box Cost Comparison
All S-box designs can efficiently be implemented on packed HE ciphertexts and require only a constant number of homomorphic operations independent of the state size.A summary of required homomorphic operations as well as the multiplicative depths of the different S-boxes is given in Table 5.
Based on Table 5, we choose the Feistel S-box S' as the main S-box for our nonlinear layers, and use the cube S-box S to increase the degree of our cipher, which counters linearization attacks and allows reducing the state size. We further explore the choice of the two different S-boxes in Section 8.4.

Linear Layer
In Rasta, the homomorphic runtime is dominated by the linear layer.In this section we discuss how to efficiently implement matrix-vector multiplications on packed homomorphic ciphertexts and introduce optimizations to reduce the homomorphic evaluation time.

Choice of Random Matrices
In the original Rasta design, each random t × t matrix is directly sampled and checked for invertibility. However, the invertibility check is computationally expensive over F_p. Therefore, in Pasta we take a different approach and generate each matrix as a sequential matrix [GPP11, GPPR11] (Section 7). These matrices are invertible by design and only require sampling t field elements and performing t · (t − 1) field multiplications and (t − 1) · (t − 1) field additions. Compared to sampling polynomials m_i ∈ Z_p[X]/(X^t − α) and translating them into matrices M_i (as in Masta), sequential matrices require sampling equally many field elements, but need more field additions and multiplications. Sampling sequential matrices is thus slower than the method used in Masta, but it comes with the cryptographic advantage of having less structure (see Section 8). Unlike Hera, we do not fix the matrices and randomize the key schedule, because in a packed implementation one cannot leverage the advantages of specially chosen matrices (such as an implementation using only additions), and the plain performance is insignificant compared to the HE evaluation runtime.

Babystep-Giantstep Matrix-Vector Multiplication
The most efficient way of evaluating the product of a plaintext matrix and an encrypted packed vector in HE is the babystep-giantstep optimized diagonal method [HS14, HS15, HS18]:
$$ M \cdot \vec{x} = \sum_{k=0}^{t_2-1} \mathrm{rot}_{k t_1}\left( \sum_{j=0}^{t_1-1} \mathrm{diag}'_{k t_1 + j}(M) \odot \mathrm{rot}_j(\vec{x}) \right), $$
where t = t_1 · t_2, diag'_{k t_1 + j}(M) = rot_{−k t_1}(diag_{k t_1 + j}(M)), and diag_i(M) expresses the i-th diagonal of a matrix M as a vector of size t, with i = 0 being the main diagonal. Note that rot_j(x) only has to be computed once for each j < t_1. Therefore, a matrix multiplication requires t_1 + t_2 − 2 rotations, t plaintext-ciphertext multiplications, and t − 1 additions, and the total depth is one plaintext-ciphertext multiplication. Since this rotation count relies on the factorization t = t_1 · t_2, we add words to the final state size of our design for efficiency if t does not split nicely into t = t_1 · t_2. Compared to the number of homomorphic operations required to evaluate the S-boxes (Table 5), it is clear that the runtime of the homomorphic evaluation of our cipher is dominated by the linear layer.
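The babystep-giantstep diagonal method can be checked in the clear with the Python sketch below. For simplicity, it rotates cyclically over exactly t slots (in HE, this requires the replication trick described for the χ-S-box) and counts only ciphertext rotations; the pre-rotated diagonals are plaintexts and can be computed in advance. The function names are hypothetical.

```python
def rot(v, j):
    """Cyclic left rotation over t slots: rot(v, j)[i] = v[(i + j) % t]."""
    t = len(v)
    return [v[(i + j) % t] for i in range(t)]

def diag(M, i):
    """i-th generalized diagonal: diag(M, i)[j] = M[j][(j + i) % t]; i = 0 is the main diagonal."""
    t = len(M)
    return [M[j][(j + i) % t] for j in range(t)]

def bsgs_matvec(M, x, t1, t2, p):
    """Babystep-giantstep diagonal method; returns (M x mod p, number of ciphertext rotations)."""
    t = len(x)
    assert t == t1 * t2
    rotations = 0
    baby = [x]  # baby[j] = rot_j(x), computed once for each j < t1
    for j in range(1, t1):
        baby.append(rot(x, j))
        rotations += 1
    result = [0] * t
    for k in range(t2):
        inner = [0] * t
        for j in range(t1):
            # Plaintext diagonal pre-rotated by -k*t1 (no ciphertext rotation needed).
            d = rot(diag(M, k * t1 + j), -k * t1)
            inner = [(acc + dv * bv) % p for acc, dv, bv in zip(inner, d, baby[j])]
        if k > 0:
            inner = rot(inner, k * t1)  # giantstep rotation
            rotations += 1
        result = [(a + b) % p for a, b in zip(result, inner)]
    return result, rotations

p = 97
M = [[(3 * i + 5 * j + 1) % p for j in range(6)] for i in range(6)]
x = [1, 2, 3, 4, 5, 6]
y, n_rot = bsgs_matvec(M, x, t1=2, t2=3, p=p)
print(n_rot)  # 3 = t1 + t2 - 2
```

With t_1 ≈ t_2 ≈ √t the rotation count is about 2√t instead of t − 1, which is why a state size that factors nicely pays off.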

Splitting the State
The babystep-giantstep algorithm dominates the runtime of the homomorphic Pasta evaluation and scales with the state size. Therefore, we propose to evaluate two individual instances of our cipher with state size t in parallel, with an efficient mixing step after each affine layer, allowing for an overall smaller state size. The final output of the design is then the output of the first half, and the second half is discarded. The result is a cipher with the following properties:
(1) The state size s = 2 · t is an even number, and we truncate t words at the end.
(2) Instead of evaluating one large s × s matrix multiplication we perform two smaller t × t matrix multiplications.
(3) The S-box is applied on both branches individually.
(4) The key is now twice the size of the keystream. This has no effect on the HHE use case, since a packed homomorphic design still requires only one homomorphic ciphertext, whose size is independent of the number of encoded words. Furthermore, although two instances are evaluated, we can use the inner structure of homomorphic ciphertexts to parallelize both cipher evaluations, cutting the runtime down to that of evaluating one cipher instance of state size t.
Inner Structure of HE Ciphertexts. In R-LWE-based homomorphic encryption schemes (like BFV and BGV), the plaintexts are polynomials in R_p = F_p[X]/Φ_m(X), with Φ_m(X) being the m-th cyclotomic polynomial. Using packing (Section 2.1), one can encode a vector of integers into one polynomial; homomorphic additions and multiplications then affect these vectors element-wise. Further, one can use Galois automorphisms to permute the encoded vector. Thus, the encoded vector can be seen as a hypercube [HS14], and an automorphism rotates the data along one dimension. The precise structure of this hypercube depends on the choice of Φ_m(X). In general, it is possible to use these automorphisms to create linear rotations over the encrypted vector, but this requires masking multiplications [HS14], which consume noise budget when evaluated homomorphically. In terms of implementation efficiency, Φ_{2n}(X) = X^n + 1, with n a power of two, is a good choice: multiplication modulo this polynomial is a negacyclic convolution, which can be computed efficiently via a negacyclic number theoretic transform (NTT). For this reason, the homomorphic encryption standardization project¹⁰ recommends these power-of-two cyclotomic rings. Consequently, SEAL only implements HE over these rings, and Masta is defined over these rings as well [HKC+20]. The hypercube generated by such rings also has a convenient structure: it corresponds to a matrix of two rows, each of size #slots/2. Galois automorphisms τ_i: a(X) ↦ a(X^i) can then directly be used either to rotate both rows at once or to rotate all columns simultaneously (i.e., to swap the two rows).
Parallelizing Two Cipher Evaluations.In two state-of-the-art integer HE cryptosystems (BFV and BGV) we can use this inner structure of power-of-two homomorphic ciphertexts to parallelize both branches of our cipher.When encrypting the secret key and encoding vectors in the affine layer, one has to encode the vectors affecting the first branch of the cipher into the first row of the homomorphic ciphertext, and vectors affecting the second branch into the second row.As a result, all homomorphic operations are applied in parallel to both branches.
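Efficient Linear Layer. For security, we have to mix both branches of our cipher after each affine transformation. An efficiently implementable linear layer, which is also invertible, is the following matrix multiplication:
$$ \begin{pmatrix} \vec{y}_L \\ \vec{y}_R \end{pmatrix} = \begin{pmatrix} 2 \cdot I & I \\ I & 2 \cdot I \end{pmatrix} \cdot \begin{pmatrix} \vec{x}_L \\ \vec{x}_R \end{pmatrix}, $$
where I is the t × t identity matrix. This can be implemented with two homomorphic additions and one homomorphic rotation.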
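The mixing layer on a two-row packed ciphertext can be modeled in the clear as follows. This Python sketch represents a ciphertext as its two hypercube rows (x_L in the first row, x_R in the second), models the column rotation as a row swap, and assumes p ≠ 3 so that the matrix with blocks 2I, I, I, 2I is invertible (its determinant is 3). The helper names are hypothetical.

```python
def swap_rows(ct):
    """Models rotating the columns of the 2 x (n/2) hypercube, i.e., exchanging the two rows."""
    return [ct[1], ct[0]]

def add(a, b, p):
    return [[(u + v) % p for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def mul(a, b, p):
    """Slot-wise product: acts on both branches (rows) at the same time."""
    return [[(u * v) % p for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def mix(ct, p):
    """[[2I, I], [I, 2I]] applied to (x_L, x_R): one rotation and two additions."""
    s = add(ct, swap_rows(ct), p)  # (x_L + x_R, x_R + x_L)
    return add(s, ct, p)           # (2 x_L + x_R, x_L + 2 x_R)

def unmix(ct, p):
    """Inverse mixing via [[2I, I], [I, 2I]]^-1 = 3^-1 * [[2I, -I], [-I, 2I]] (needs p != 3)."""
    inv3 = pow(3, -1, p)
    y_l, y_r = ct
    return [[inv3 * (2 * a - b) % p for a, b in zip(y_l, y_r)],
            [inv3 * (2 * b - a) % p for a, b in zip(y_l, y_r)]]

ct = [[1, 2, 3], [4, 5, 6]]  # first row: x_L, second row: x_R
print(mix(ct, 65537))        # [[6, 9, 12], [9, 12, 15]]
```

Every slot-wise operation (such as `mul`) processes both branches at once, so the two parallel instances cost no more than one.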
In Table 6 we compare the cost of the new linear layer (two parallel instances of state size t) to the cost of one larger linear layer of size s = 2 · t. The new linear layer effectively requires half the homomorphic additions and multiplications, and, if t is chosen such that it splits nicely into t = t_1 · t_2, the number of rotations is also halved.

Total Homomorphic Operations and Multiplicative Depth
In Table 7 we summarize the number of homomorphic operations and the multiplicative depth of each individual part of our resulting new cipher, dubbed Pasta, as well as the total count for Pasta-3 (3 rounds) and Pasta-4 (4 rounds).The table also highlights that the multiplicative depth of Pasta, and therefore its noise consumption, only depends on the number of rounds.Further, the runtime of homomorphically evaluating Pasta is dominated by the affine layer and scales with the state size and the number of rounds.

Pasta Specification
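Here we provide the full Pasta specification. Pasta is a family of stream ciphers that applies the Pasta-π permutation under a nonce N and a block counter i to the secret key, followed by a truncation, to produce the final keystream. Keystream generation is shown in Figure 3. The permutation Pasta-π(x, N, i) on a vector x ∈ F_p^{2t} is defined as
$$ \text{Pasta-}\pi(\vec{x}, N, i) = A_{r,N,i} \circ S_{\mathrm{cube}} \circ A_{r-1,N,i} \circ S_{\mathrm{feistel}} \circ \cdots \circ A_{1,N,i} \circ S_{\mathrm{feistel}} \circ A_{0,N,i}(\vec{x}), $$
where r ≥ 1 is the number of rounds and where
• S_feistel is an S-box layer defined as S_feistel(x) = S'(x_L) ‖ S'(x_R), where S' over F_p^t is the Feistel structure defined in Section 6.4;
• S_cube is an S-box layer defined as S_cube(x) = S(x_L) ‖ S(x_R), where S over F_p^t is the cube S-box S(x) = x ⊙ x ⊙ x;
• A_{j,N,i} is an affine layer defined as
$$ A_{j,N,i}(\vec{x}) = \begin{pmatrix} 2 \cdot I & I \\ I & 2 \cdot I \end{pmatrix} \cdot \begin{pmatrix} M_{j,L,N,i} \cdot \vec{x}_L + \vec{c}_{j,L,N,i} \\ M_{j,R,N,i} \cdot \vec{x}_R + \vec{c}_{j,R,N,i} \end{pmatrix}, $$
where I ∈ F_p^{t×t} is the identity matrix and where M_{j,L,N,i}, M_{j,R,N,i} ∈ F_p^{t×t} and c_{j,L,N,i}, c_{j,R,N,i} ∈ F_p^t are generated for each round from an XOF seeded with a nonce N and a counter i.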
To efficiently sample each invertible matrix M_{j,k,N,i} ∈ F_p^{t×t}, we sample sequential matrices following [GPP11, GPPR11]. For each k ∈ {L, R}, we define M_{j,k,N,i} := (M̄_{j,k,N,i})^t, where M̄_{j,k,N,i} ∈ F_p^{t×t} is the sequential matrix defined by α_1, …, α_t ∈ F_p \ {0}. M_{j,k,N,i} is an invertible matrix that can be built by sampling t random elements and performing t · (t − 1) multiplications and (t − 1) · (t − 1) additions.
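Because the extracted text omits the explicit form of M̄_{j,k,N,i}, the following Python sketch assumes the companion-type sequential matrix of [GPP11]: ones on the superdiagonal and (α_1, …, α_t) as the last row. Under this assumption, row k of M̄^t equals (α_1, …, α_t) · M̄^k, so the rows can be generated one after another, and the operation count matches the t · (t − 1) multiplications and (t − 1) · (t − 1) additions stated above. A naive matrix power is included only for cross-checking; all names are hypothetical.

```python
def companion(alphas, p):
    """Assumed sequential-matrix form: ones on the superdiagonal, last row (alpha_1..alpha_t)."""
    t = len(alphas)
    M = [[1 if j == i + 1 else 0 for j in range(t)] for i in range(t - 1)]
    M.append([a % p for a in alphas])
    return M

def sequential_matrix(alphas, p):
    """companion(alphas)^t row by row: row_0 = alpha, row_{k+1} = row_k * Mbar.
    Each new row costs t multiplications and t - 1 additions."""
    t = len(alphas)
    rows = [[a % p for a in alphas]]
    for _ in range(t - 1):
        r = rows[-1]
        last = r[-1]
        # (r * Mbar)_0 = r_{t-1} * alpha_1 and (r * Mbar)_j = r_{j-1} + r_{t-1} * alpha_{j+1}
        nxt = [(last * alphas[0]) % p] + [(r[j - 1] + last * alphas[j]) % p for j in range(1, t)]
        rows.append(nxt)
    return rows

def mat_mul(A, B, p):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def mat_pow(M, e, p):
    """Naive reference: M^e by repeated multiplication."""
    t = len(M)
    R = [[int(i == j) for j in range(t)] for i in range(t)]
    for _ in range(e):
        R = mat_mul(R, M, p)
    return R

alphas = [3, 1, 4, 1, 5]
print(sequential_matrix(alphas, 65537) == mat_pow(companion(alphas, 65537), 5, 65537))  # True
```

Since the determinant of the companion matrix is ±α_1, requiring non-zero α values is enough to make every sampled matrix invertible.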

Concrete Instances
We propose a 3-round instance Pasta-3 and a 4-round instance Pasta-4, both using Shake128 [NIS15] as XOF. These instances provide at least 128 bits of security for prime fields F_p with log_2(p) > 16 and gcd(p − 1, 3) = 1. Table 8 shows the block and key sizes and compares them to Masta and Hera.
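The two conditions on p can be checked directly; gcd(p − 1, 3) = 1 is what makes the cube S-box invertible, with inverse exponent 3^{−1} mod (p − 1). This Python sketch assumes that p is already known to be prime (primality is not tested) and needs Python 3.8+ for `pow` with a negative exponent; the function names are hypothetical.

```python
import math

def check_pasta_prime(p: int) -> bool:
    """Conditions stated for the Pasta instances: log2(p) > 16 and gcd(p - 1, 3) = 1.
    Assumes p is prime; primality itself is not tested."""
    return math.log2(p) > 16 and math.gcd(p - 1, 3) == 1

def cube_inverse_exponent(p: int) -> int:
    """d = 3^-1 mod (p - 1), so that (x^3)^d = x for all x in F_p."""
    return pow(3, -1, p - 1)

p = 65537
d = cube_inverse_exponent(p)
print(check_pasta_prime(p), d)  # True 43691
```

The inverse exponent is large, so the inverse cube map has a high degree even though the forward S-box has degree 3.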
Security Margin.In all cases, we add a security margin to our construction.Concretely, we take the largest number of words s needed for security, we multiply this number by 1.2 for a 20% security margin, and we then take the smallest even integer larger than or equal to that.
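The margin rule can be written with exact integer arithmetic, which avoids floating-point rounding in 1.2 · s. In this Python sketch (the function name is hypothetical), applying it to the linearization bounds s ≥ 207 and s ≥ 51 yields 250 and 62 words, respectively, before any further adjustment of t for an efficient babystep-giantstep split.

```python
def with_margin(s_min: int) -> int:
    """Smallest even integer >= 1.2 * s_min, computed exactly as ceil(6 * s_min / 5)."""
    s = -(-6 * s_min // 5)  # integer ceiling division
    return s + (s % 2)      # round up to the next even number

print(with_margin(207), with_margin(51))  # 250 62
```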

Comparison to Previous Designs
In this section we summarize Pasta by comparing it to previous designs.Furthermore, in Section 9.3 we discuss F p primitives for different use cases and compare them to Pasta.

S-box.
Rasta and Dasta use the χ-transformation as their single nonlinear layer. Masta uses a translation of χ to F_p^t as nonlinear layer, even though it is not a permutation, and Hera uses the cubing layer. In Pasta, we introduce and use two different bijective S-boxes. This is motivated by the goal of reducing the number of rounds while maintaining a reasonable state size. Having r − 1 Feistel S-box layers and a final cube S-box layer with higher degree and depth allows us to build Pasta instances with a number of plaintext/ciphertext words comparable to Masta, but with one round less. This implies both a faster homomorphic evaluation and less noise consumption compared to Masta. We further explore the choice of two different S-boxes in Section 8.4.
Linear Layer. Pasta, Rasta, Dasta, and Masta use randomly generated linear layers to mitigate statistical attacks, and Hera uses a randomized key schedule for the same reason. While Rasta simply samples random invertible matrices, Dasta uses random permutations of the same fixed matrix. Masta, on the other hand, samples random polynomials and translates them into matrices (which have a lot of structure). These methods, however, only differ in how the matrices are generated and do not affect the homomorphic evaluation time. In contrast, Pasta's linear layer is thoroughly optimized for efficient evaluation in HE. Instead of generating a 2t × 2t random invertible matrix directly, we pick 2t random elements and construct two sequential matrices M_i ∈ F_p^{t×t} as described in [GPP11, GPPR11]. These two matrices are then combined into one 2t × 2t matrix via a cheap mixing operation, effectively cutting the HE runtime in half.
Truncation vs. Feed-Forward. Pasta replaces the feed-forward addition of the secret key (as done in Rasta, Dasta, and Masta) with a truncation. This prevents MITM attacks more efficiently, at the cost of a larger state. In the packed HE evaluation, however, the truncated words do not influence the runtime, since they are evaluated simultaneously with the non-truncated part of the state.

Pasta Security Analysis
Given a certain number of rounds (fixed in advance), our goal is to find the minimum number of key words s = 2t for which we can guarantee security of at least κ bits. If not specified otherwise, κ ≈ log_2(p^s). This differs slightly from the usual approach in traditional symmetric cryptanalysis: in general, given a state in F_p^s and a security level κ, one looks for the minimum number of rounds that provide a security level of at least κ bits. We modify this approach because one of our main goals is to keep the depth as low as possible, focusing on 3 and 4 rounds.
Remark 3. The design approach of Pasta is analogous to the one originally proposed for Rasta. For this reason, in many cases we limit ourselves to adapting the security argument proposed for Rasta to Pasta.

Truncation versus Feed-Forward
Consider a permutation F: F_p^s → F_p^s, and assume it can be split as F = F_2 ∘ F_1. The advantage of a truncation over a feed-forward operation is that it prevents attacks using the backward direction without requiring a high degree of the inverse round function. Indeed, in the feed-forward case, given y = F(x) + x, one can set up a system of equations of the form F_1(x) = F_2^{−1}(y − x). To prevent solving it with algebraic techniques (e.g., Gröbner bases), both F_1 and F_2^{−1} need to have a high degree. In the case of truncation, given y = left_t(F(x)), the system of equations becomes F_1(x) = F_2^{−1}(y ‖ y') for a certain unknown y' ∈ F_p^t. If t is large enough, the cost of solving it exceeds the security level. However, the overall state must be larger than in the feed-forward case, since part of the state is discarded.

Security against Statistical Attacks: Properties of the Linear Layer
As in Rasta, security against statistical attacks such as differential [BS90] and linear [Mat93] attacks (including all their variants, such as truncated differential [Knu94], zero-correlation linear [BW12], and impossible differential [BBS99] attacks) is achieved by changing the linear layers in every encryption. In a statistical attack, the attacker performs a statistical analysis of the ciphertexts generated from a set of chosen or known plaintexts in order to break the scheme. This strategy works under the assumption that the ciphertexts are generated by the same encryption function. By construction, this is not the case for Rasta-like designs such as Pasta, which implies that statistical attacks are not a threat to our design.
Having said that, it is important that the linear layers instantiating Pasta do not have any weakness that could be exploited in an attack, and that full diffusion is achieved over the entire scheme. For this reason, we study the linear branch number of the random matrices that instantiate Pasta, and we show that it is sufficiently high in general. We recall that the branch number of a matrix is defined as the minimum total number of non-zero entries of two non-zero t-element mask vectors α and β that satisfy α = M^T × β; we refer to [DGGK21] for an overview of correlation analysis in F_p. Since Pasta's linear layer is defined as
$$ \begin{pmatrix} \vec{x}_L \\ \vec{x}_R \end{pmatrix} \mapsto \begin{pmatrix} 2 \cdot I & I \\ I & 2 \cdot I \end{pmatrix} \cdot \begin{pmatrix} M_{j,L,N,i} \cdot \vec{x}_L \\ M_{j,R,N,i} \cdot \vec{x}_R \end{pmatrix}, $$
we have the following scenario:
• the fixed matrix circ(2, 1) ∈ F_p^{2×2} is MDS, which implies that full diffusion between the l-th element of x_L and the l-th element of x_R is achieved for each l ∈ {0, 1, …, t − 1};
• the invertible matrices M_{j,L,N,i} and M_{j,R,N,i} are randomly generated for each new encryption; hence, we cannot guarantee a certain branch number a priori.
For this reason, we estimate a lower bound of the probability that a randomly picked matrix M ∈ F t×t p allows for transitions on the t-element mask vectors α to β, α = M T × β, where α and β have many zeros (which corresponds to the best scenario for an attacker).
Proposition 1. Let M ∈ F_p^{t×t} be a random invertible matrix. Its branch number satisfies the following for p > t ≥ 6:
Proof. By definition,
$$ \Pr[\text{branch number of } M < z] = \frac{\left|\bigcup_{(\alpha,\beta) \neq (0,0),\ \#(\alpha) + \#(\beta) < z} X_{\alpha,\beta}\right|}{|I|}, $$
where
• #(γ) denotes the number of non-zero entries of the vector γ;
• X_{α,β} denotes the set of invertible matrices that satisfy α = M^T β;
• I denotes the set of invertible matrices.
First of all, we are interested in the number |I| of all bijective matrices M. A matrix M is bijective if its row vectors are linearly independent (in particular, none of them is the all-zero vector). For the first row, we have p^t − 1 possible row vectors. For the second row, we have p^t possible coefficient vectors minus the p vectors that are multiples of the first row. For the third row, we have p^t − p^2 choices, and so on. We finally end up with
$$ |I| = \prod_{i=0}^{t-1} \left(p^t - p^i\right). $$
Next, we consider the number of matrices M that allow a transition α = M^T β for fixed non-zero α and β. For our goal, we are interested in an upper bound on this number. Hence, we limit ourselves to a weaker condition, namely that (i) β maps to the first coordinate of α and that (ii) the matrix is invertible. It is simple to observe that the first condition is satisfied by at most p^{t−1} choices of the coefficients of the first row (note that if α_0 = 0, then we exclude the zero vector as the first row of M). Combining this with the requirement that M is bijective yields an upper bound on the number of matrices M that map α to β. Finally, we count how many different masks α and β exist that together have i non-zero entries; this number is simply (p − 1)^i · binom(2t, i). We now have all the ingredients to bound the probability that a randomly selected matrix M has a branch number smaller than z. Setting z = t/2 and assuming p > t ≥ 6, the resulting bound can be simplified using (x^t − 1) ≥ 3/4 · x^t for x ≥ 3 and t ≥ 2. Thus, Pr[branch number ≥ t/2] ≈ 1 for p > t ≥ 6. However, in our case, the total number of sequential matrices that we can generate is limited by the t elements α_i we can choose. Hence, in total we can generate κ = (p − 1)^t invertible matrices. Considering this special case, we get that
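The counting argument for |I| can be sanity-checked by brute force for tiny parameters. This Python sketch (function names are hypothetical) enumerates all t × t matrices over F_p for small p and t, tests invertibility by Gaussian elimination modulo p, and compares the count with the product formula; it is only feasible while p^{t²} stays small, and it needs Python 3.8+ for modular inverses via `pow`.

```python
from itertools import product

def rank_mod_p(rows, p):
    """Rank of a matrix over F_p (p prime) via Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def count_invertible_bruteforce(p, t):
    """Number of invertible t x t matrices over F_p, by exhaustive enumeration."""
    count = 0
    for entries in product(range(p), repeat=t * t):
        rows = [entries[i * t:(i + 1) * t] for i in range(t)]
        if rank_mod_p(rows, p) == t:
            count += 1
    return count

def count_invertible_formula(p, t):
    """|I| = prod_{i=0}^{t-1} (p^t - p^i), as derived row by row in the proof."""
    result = 1
    for i in range(t):
        result *= p ** t - p ** i
    return result

print(count_invertible_bruteforce(3, 2), count_invertible_formula(3, 2))  # 48 48
```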

Security against Algebraic Attacks
To describe our analysis, we focus on Pasta-3. Our input x consists of s = 2t unknown key elements, and the output y consists of t elements (after truncation). Hence, for a known nonce N and block counter i, we have
$$ \vec{y} = \mathrm{left}_t\left(\text{Pasta-}\pi(\vec{x}, N, i)\right). $$

Linearization
In a linearization approach, the attacker replaces all monomials of degree greater than 1 by new variables and then tries to solve the resulting system of linear equations. Assuming n_v variables and a maximum degree of d, the number of possible monomials is
$$ n_m = \binom{n_v + d}{d}. \tag{3} $$
For Pasta-3 we have d = 2 · 2 · 3 = 12 (two quadratic Feistel layers followed by one cubic layer), and hence s input words with degree 12 after one function call. Further, we obtain t equations with each call. To get as many equations n_e as variables n_v for our equation system, we can simply request more data, which results in n_e = n_v after s/t = 2 blocks (this has no effect on the efficiency of the linearization). Since solving a linear equation system in n_m variables costs at least about n_m^2 operations, we target log_2(n_m) > 64. Hence, s ≥ 207 input words are needed for a security of 128 bits. Following the same analysis, we need s ≥ 51 for Pasta-4 (d = 24) and s ≥ 101 for a Masta-like 4-round instance using only degree-2 Feistel-like S-boxes (d = 16).
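The thresholds above follow from Eq. (3) with n_v = s and the target log_2(n_m) > 64. This Python sketch (function names are hypothetical; `math.comb` needs Python 3.8+) searches for the smallest s; with d = 12, 24, and 16 it reproduces s = 207, 51, and 101, respectively.

```python
import math

def log2_monomials(n_v: int, d: int) -> float:
    """log2 of Eq. (3): binom(n_v + d, d) monomials of degree <= d in n_v variables."""
    return math.log2(math.comb(n_v + d, d))

def min_words(d: int, target_bits: float = 64.0) -> int:
    """Smallest s with log2(binom(s + d, d)) > target_bits (n_v = s key words)."""
    s = 1
    while log2_monomials(s, d) <= target_bits:
        s += 1
    return s

for name, d in [("Pasta-3", 12), ("Pasta-4", 24), ("Masta-like, 4 rounds", 16)]:
    print(name, min_words(d))  # expected: 207, 51, 101
```

The gap between s = 206 and s = 207 for d = 12 is less than a tenth of a bit, which illustrates why the security margin of the next section is added on top of these bounds.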
In this analysis, we assume that almost all monomials appear in the final representations, since our design provides strong diffusion within each half of the state by using dense invertible matrices, and full diffusion after two full linear layers. To gain more confidence in our design, we also performed practical tests and show the results in Figure 4. To avoid the effect of cancellations, we used primes larger than 2^16. We observe that, for the state sizes we tested, the number of monomials in the output word with the fewest monomials is always very close to the upper bound given in Eq. (3).
Figure 4: Comparison of the upper bound on the number of monomials from Eq. (3) and the lowest number of monomials found in a practical evaluation.

Gröbner Basis Attacks
Here we determine how large our key s has to be to provide security against Gröbner basis attacks up to a complexity of 2^128 function calls. As above, we can simply generate sufficiently many equations by requesting at least s/t = 2 blocks. Hence, n_v = n_e, and we can estimate the complexity of solving such a system of equations using theoretical bounds. However, these bounds assume a regular system of equations, and in practical tests we quickly observed that this is not the case for Pasta. Indeed, when building more full-round equations and hence an overdetermined system, we can force the degree of regularity down to a minimum of 12. Reusing the estimate for the complexity of computing a Gröbner basis [BFSY05], we need s ≥ 207. Similar results are obtained by assuming d = 24 for Pasta-4.
There is also a different way to argue the number of words to use. From the linearization analysis, we know that there will be roughly 2^64 different monomials in each of the resulting equations. Due to the internals of Gröbner basis algorithms, this results in around 2^{64ω} operations being necessary to compute a basis.¹² We pessimistically (from a designer's point of view) set ω = 2 and thus obtain 2^{64ω} = 2^128.
Additional Strategies. The strategy presented above is only one way to attack the system using Gröbner bases. It is common to also consider approaches that introduce new variables in each state. The main idea of this technique is to reduce the degrees of the equations at the expense of more variables, which is particularly useful for representing high-degree equations more efficiently. In more detail, we may introduce a new variable after each nonlinear operation. Considering a total state size of s = 2t words, we need to introduce 2s(r − 1) new variables for an r-round construction (note that no new variables are needed after the final round, since the stream output added to a plaintext is a degree-3 combination of the previous variables). Using this many variables and equations of degree at least 2 results in a high solving complexity when assuming nontrivial (i.e., dense) equations (we refer to [JV17, NNY18], where degree-2 equation systems over Z_2 are considered). We therefore conjecture that introducing intermediate variables only increases the complexity of solving the final system compared to using full-round equations.

Other Algebraic Attacks
Many other known attacks (e.g., higher-order differential attacks [Lai94,Knu94] and interpolation attacks [JK97]) are prevented by our random linear layers, which are different in each Pasta-π evaluation. This is the same strategy as used by, e.g., Rasta and Masta. We briefly discuss these attacks in this section. Furthermore, the recent attack on Agrasta [LSMI21] does not apply to Pasta, since it directly exploits the χ-layer, which is not present in Pasta, and it works differently over large prime fields.
Higher-Order Differential Attacks. Higher-order differential attacks [Lai94,Knu94] are essentially prevented by the fact that, due to the different linear layers, the attacker can evaluate each instance only once. Moreover, the only subspaces of the prime field F_p are {0} and F_p itself, which makes higher-order differential attacks even harder (however, variations of this attack vector that also work over F_p exist [BCD+20]). This also covers higher-order differential distinguishers and attacks based on higher-order differential properties (e.g., cube attacks [Vie07,DS09]).
Interpolation Attacks. In an interpolation attack [JK97], the attacker tries to build an interpolation polynomial mapping an input to the corresponding output. This polynomial can then be used to recover the secret key. However, interpolation attacks need multiple evaluations of a fixed permutation, which is not possible for Pasta with its varying linear layers.
Guessing Attacks. Guessing (or guess-and-determine) attacks combine guessing one or more variables with other attack strategies, potentially decreasing their complexity by fixing parts of the secret. However, due to the large number of state words and a minimum size of 17 bits for each of them, it is unlikely that guessing one (or even several) state words leads to an advantage. Indeed, according to our analysis, guessing between 1 and ⌊128/17⌋ = 7 words does not lead to any improvement, but even makes the attacks worse. In more detail, guessing w input words costs a factor of 2^{17w}, so the attack itself would have to improve by at least this factor, which is not possible with our analysis for any configuration we tested (1, ..., ⌊128/17⌋ guesses). For example, in the 17-bit case with s = 51 and the linearization approach, guessing a single variable reduces the complexity by less than one bit. When assuming s = 44 (guessing the maximum feasible number of variables), the complexity of the attack is still around 2^120, which is much more than the allowed 2^{128 − 7·17} = 2^9. We remark that this is the "weakest" instance from the attacker's perspective, and for all larger primes the actual attack would need an even larger improvement. Further, given the density of the algebraic representation, we do not expect the equation systems to become significantly easier to solve by guessing any small number of variables.
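The budget behind this argument is simple arithmetic: guessing w words of b bits costs 2^{bw}, so the attack on the reduced system must cost less than 2^{128 − bw} to help. A minimal sketch (the helper name is ours) tabulates this for b = 17; with s = 51 words, the maximum of 7 guesses leaves s = 44, as in the text.

```python
from typing import List, Tuple


def guessing_budget(word_bits: int, security_bits: int = 128) -> List[Tuple[int, int]]:
    """For w guessed words of word_bits bits each, return (w, remaining_bits):
    the attack on the reduced system must cost less than 2**remaining_bits,
    because the guesses alone already cost 2**(word_bits * w)."""
    max_guesses = security_bits // word_bits  # guesses that fit into the budget at all
    return [(w, security_bits - word_bits * w) for w in range(1, max_guesses + 1)]


for w, bits in guessing_budget(17):
    print(f"guess {w} word(s): remaining attack must cost < 2^{bits}")
# The last line reads "guess 7 word(s): remaining attack must cost < 2^9".
```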

On Using Two Different S-Boxes
To optimize Pasta for HHE, we designed it to have a small number of rounds (implying less noise consumption) and a small state size (implying fast homomorphic evaluation). Therefore, we use a Feistel S-box of degree 2 and a cube S-box of degree 3. Using only Feistel S-boxes would result in a design with worse performance: a 3-round design using only Feistel S-boxes would require t ≈ 500 plain/cipher words (based on the security analysis in Section 8), which results in significantly longer homomorphic evaluation times. A 4-round design would have the same multiplicative depth as Masta-4, leading to the same HE parameters and noise consumption as Masta-4. Such a design would be faster than Masta due to its smaller size t (t = 55, as shown in Section 8) in one evaluation branch, but it would not have a noise advantage. Pasta-3, on the other hand, has both a runtime and a noise advantage, since it requires fewer rounds while having the same size t as Masta-4.
The diffusion of a 2-round cipher based only on cube S-boxes would rely largely on the single affine layer between the two S-box layers. The resulting diffusion is thus likely poor, potentially allowing an attacker to separate the cipher [CDK+18]. Therefore, we instantiate Pasta-3 with two Feistel S-boxes and one cube S-box, which is the smallest depth that allows a 3-round cipher with approximately the same number of plain/cipher words t as Masta-4.
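To make the trade-off between the two S-boxes concrete, here is a minimal sketch over F_p with p = 65537. It assumes the Feistel S-box has the form y_0 = x_0, y_i = x_i + x_{i−1}^2, which is how we read the Pasta specification; function names are hypothetical. The Feistel S-box costs one ciphertext-ciphertext multiplication per word (depth 1), the cube S-box two (x^2, then x^2·x, depth 2), and the cube map is only a permutation because gcd(p − 1, 3) = 1.

```python
P = 65537  # 17-bit prime; gcd(P - 1, 3) == 1, so x -> x^3 is a permutation of F_P


def feistel_sbox(x, p=P):
    """Degree-2 Feistel S-box: y_0 = x_0 and y_i = x_i + x_{i-1}^2 (mod p)."""
    return [x[0] % p] + [(x[i] + x[i - 1] * x[i - 1]) % p for i in range(1, len(x))]


def feistel_sbox_inverse(y, p=P):
    """Recover x word by word: x_i = y_i - x_{i-1}^2, using the already recovered x_{i-1}."""
    x = [y[0] % p]
    for i in range(1, len(y)):
        x.append((y[i] - x[i - 1] * x[i - 1]) % p)
    return x


def cube_sbox(x, p=P):
    """Degree-3 S-box applied to every word independently."""
    return [pow(v, 3, p) for v in x]


def cube_sbox_inverse(y, p=P):
    """Invert x -> x^3 with exponent d = 3^{-1} mod (p - 1); pow(3, -1, m) needs Python 3.8+."""
    d = pow(3, -1, p - 1)
    return [pow(v, d, p) for v in y]


state = [5, 17, 65536, 0, 1234]
assert feistel_sbox_inverse(feistel_sbox(state)) == state
assert cube_sbox_inverse(cube_sbox(state)) == state
```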

Pasta Benchmarks
In this section, we benchmark a packed implementation of Pasta in both SEAL and HElib. We also reimplemented packed versions of Masta and Hera, using the same algorithms for generating random field elements and for homomorphic matrix multiplication as in Pasta, to compare these ciphers in a fair setting. As in Section 5, we also benchmark the ciphers in a real HHE use case.

Comparing Pasta to Z_2 Ciphers
We first compare Pasta, Masta, and Hera to the Z_2 benchmarks from Section 5. To this end, we instantiate these ciphers with a 17-bit prime and benchmark their performance for the small use case from Section 5.¹³ The resulting benchmarks are shown in Table 9, where we give both the runtime and the remaining noise budget after each step of the HHE use case in SEAL. For benchmarks in HElib, we refer to Appendix B.2.2.

Discussion
In the following, we compare the runtime and noise consumption of all Z_2 and F_p (with p = 65537) ciphers: Figure 5 shows homomorphic decryption of one block in SEAL (F_p values from Section 9.2), and Figure 6 shows the HHE use case (including HHE decompression) in SEAL. For HElib benchmarks, we refer to Appendix B.2.2.
Our figures indicate that Pasta is always the fastest cipher, mostly Pasta-4 due to its small number of encrypted words. However, Pasta-3 is faster when evaluating the whole HHE use case in SEAL, because its small multiplicative depth requires smaller HE parameters for security. Comparing Pasta to the Z_2 ciphers, one can observe that homomorphically decrypting one block consumes less noise budget for the Z_2 ciphers.
However, Pasta has (besides the runtime advantage) a noise advantage over the Z_2 ciphers in the HHE use case, due to the significantly larger multiplicative depth of the binary circuits for integer arithmetic. Concretely, decompression and use case evaluation is 33× faster in SEAL using Pasta-3 and 82× faster in HElib using Pasta-4 compared to Agrasta. Using TFHE in gate-bootstrapping mode for the Z_2 ciphers instead of, e.g., SEAL does not help them either, since Pasta-3 in SEAL is 47× faster than Kreyvium in TFHE for the small HHE use case. Increasing the bit size of the encrypted integers or chaining multiple matrix multiplications would further demonstrate the advantage of Pasta over Z_2 ciphers, since the drastic increase in the multiplicative depth of the use case would make the Z_2 ciphers infeasible.

Pasta versus Masta and Hera
Since all F_p ciphers outperform the Z_2 ciphers for HHE, we continue by comparing the F_p ciphers among themselves. As for the Z_2 benchmarks, we compare Pasta, Masta, and Hera in a real HHE use case. However, to further demonstrate the advantage of the F_p ciphers in HHE, we benchmark a more extensive use case with a significantly higher multiplicative depth. We reuse the use case from Section 4, i.e., three affine layers interleaved with squarings on a vector x ∈ F_p^200. We benchmark this use case for three different primes p.
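To show where the extra multiplicative depth of this use case comes from, here is a plaintext sketch. It reads "interleaved" as one element-wise squaring between consecutive affine layers; this is our assumption, and Section 4 gives the exact definition. Function names are hypothetical, and the affine layers are assumed to use plaintext model parameters, so only the squarings are ciphertext-ciphertext multiplications.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[Sequence[Sequence[int]], Sequence[int]]  # (matrix M, bias b)


def affine(M, b, x, p):
    """x -> M x + b over F_p (plaintext-ciphertext operations in the HE setting)."""
    return [(sum(m * v for m, v in zip(row, x)) + bi) % p for row, bi in zip(M, b)]


def use_case(layers: List[Layer], x: List[int], p: int) -> List[int]:
    """Affine layers with an element-wise squaring between consecutive layers.
    Each squaring is one ciphertext-ciphertext multiplication, so k affine
    layers add k - 1 to the multiplicative depth on top of HHE decompression."""
    for k, (M, b) in enumerate(layers):
        x = affine(M, b, x, p)
        if k < len(layers) - 1:
            x = [v * v % p for v in x]
    return x


p, n = 65537, 4
identity = [[int(i == j) for j in range(n)] for i in range(n)]
zero = [0] * n
print(use_case([(identity, zero)] * 3, [2, 3, 5, 7], p))  # [16, 81, 625, 2401]
```

With identity layers, the two squarings turn each word x into x^4, which makes the depth contribution easy to check by hand.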

SEAL Benchmarks
In this section, we discuss the benchmarks of the F_p ciphers in SEAL; for benchmarks in HElib, we refer to Appendix B.2.3. Furthermore, we provide CPU cycle counts for plain encryption with Pasta, Masta, and Hera in Appendix B.3. In Table 10, we present the benchmarks for the packed implementations of Pasta, Masta, and Hera in SEAL. We give timings for homomorphically decrypting one block and for the bigger HHE use case. We parameterize SEAL to provide 128 bits of security and use the smallest N that provides enough noise budget for correct evaluation.
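Choosing "the smallest N with enough noise budget" amounts to a table lookup. A minimal sketch, assuming the required total coefficient-modulus size in bits is already known (e.g., from a trial run) and leaving out the mapping from noise budget to modulus bits. The table holds SEAL's default 128-bit bounds (CoeffModulus::MaxBitCount); the function name is ours.

```python
from typing import Optional

# Largest total coefficient-modulus size (bits) for 128-bit classical security per
# polynomial degree N, as in SEAL's default CoeffModulus::MaxBitCount table.
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}


def smallest_degree(required_logq: int) -> Optional[int]:
    """Smallest N whose 128-bit bound admits a modulus of required_logq bits,
    or None if even the largest default N is too small (bootstrapping needed)."""
    for n in sorted(MAX_LOGQ_128):
        if MAX_LOGQ_128[n] >= required_logq:
            return n
    return None


print(smallest_degree(200))  # 8192
print(smallest_degree(900))  # None
```

A None result corresponds to the Table 10 case where bigger parameters are not available in SEAL. For power-of-two cyclotomics X^{m/2} + 1 in HElib, the same logic would select m = 2N, although HElib estimates security with its own formula rather than this table.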

Discussion
In the following figures, we compare the runtime and noise consumption of Pasta, Masta, and Hera for three different prime fields F_p: Figure 7 for homomorphically decrypting one block in SEAL, and Figure 8 for the HHE use case (including HHE decompression) in SEAL. For HElib benchmarks, we refer to Appendix B.2.3. The figures show the advantage of Pasta over its competitors. In all figures, Pasta-3 has a smaller runtime and noise consumption than Masta, especially when its smaller multiplicative depth allows for smaller HE parameters (compare, e.g., the 33-bit prime field in Figure 8, where Pasta-3 is 6× faster than Masta-4). Pasta-3 is only outperformed by Pasta-4 and Hera for a small number of encrypted words (e.g., when encrypting only one block, as for the 33-bit prime in SEAL, where Hera is slightly faster than

Pasta in Different Use Cases
In recent years, many symmetric primitives defined over F_p^t, such as GMiMC [AGP+19] and Hydra [GØSW22], have been proposed in the literature. However, contrary to Pasta, these primitives were designed not for HHE but for MPC and zk-SNARK/STARK use cases, and were therefore optimized for different metrics. While a low multiplicative depth is the most important design criterion for use cases involving homomorphic encryption, the other use cases usually only require a small total number of multiplications. Consequently, the aforementioned primitives have a significantly larger number of rounds and hence a large multiplicative depth, which makes them infeasible for HE use cases. Pasta, on the other hand, has a very small depth, but its significantly larger state size results in a large total number of multiplications. In HE use cases, many of these multiplications are performed in parallel using packing, but their large number makes Pasta worse for MPC and zk-SNARK/STARK applications. In some MPC scenarios (e.g., with a high-latency, low-bandwidth WAN connection between the parties), the low multiplicative depth of Pasta may, however, give it an advantage over the other constructions.

Conclusion
In this paper, we investigated hybrid homomorphic encryption, a technique to combat ciphertext expansion in homomorphic encryption applications at the cost of more expensive computations in the encrypted domain. Since HHE was first mentioned in [NLV11], many symmetric ciphers for HHE have been proposed in the literature. However, the effects of applying HHE to a concrete use case have not been well understood so far. In our work, we tackled this issue in several ways: First, we investigated, for the first time, the high-level impact on the server and client when applying HHE to a practical use case (Section 4). Second, we implemented a framework which, for the first time, compares many different symmetric ciphers when used with HHE in three popular HE libraries. Finally, to improve the performance of HHE, we proposed a new symmetric cipher, dubbed Pasta, which outperforms the state of the art for integer use cases over F_p.
The main takeaways of this paper are the following: Our investigations show that HHE achieves the best results when the clients are embedded devices with limited computational power and bandwidth. Furthermore, many state-of-the-art ciphers are not well suited for many HHE applications because they are defined over Z_2. Finally, while HHE is very beneficial for clients, the actual computation in the encrypted domain suffers. This is because the symmetric ciphertexts first have to be decrypted under homomorphic encryption before the actual use case can be computed. This extra work contributes not only to the computation runtime, but also to the multiplicative depth of the whole HE computation. Since an efficient bootstrapping operation is still missing from most state-of-the-art HE libraries (such as the ones considered in this paper), this additional multiplicative depth contributes significantly to the overall computation runtime. As a consequence, we show that evaluating a cipher under HE alone is not enough to estimate its performance in HHE; one has to consider the whole HHE use case instead.

The BGV scheme, and its implementation in HElib [HS20], allows plaintexts in Z_{p^r} and offers more flexibility for choosing HE parameters than SEAL. It allows arbitrary cyclotomic reduction polynomials, and it is possible to find parameters which allow packing for Z_2 plaintexts. However, this flexibility comes with the drawback that parameterizing HElib is more difficult than finding parameters in SEAL, and the limited parameter sets in SEAL allow for more optimized implementations. In this paper, we use HElib version 2.1.0. As for BFV in SEAL, additions are considered free in BGV, and the multiplicative depth of the circuit is the most relevant performance metric.

TFHE [CGGI20] in TFHE [CGGI16].
The TFHE library, more concretely the gate-bootstrapping version of the original TFHE library which we use in this paper, is vastly different from SEAL and HElib. It only allows the encryption of Boolean values (i.e., plaintexts are in Z_2), but it is optimized for fast gate bootstrapping. This means that the noise in the ciphertext is reset after the evaluation of each homomorphic gate. As a consequence, contrary to most other modern homomorphic encryption schemes, the multiplicative depth of a circuit is not a relevant metric. However, each homomorphically evaluated gate requires the same computational effort, so additions are not free as in the BFV or BGV cryptosystems. The most relevant metric for TFHE in gate-bootstrapping mode is therefore the total number of gates. Furthermore, SIMD-style packing is not supported in TFHE. Since TFHE only allows encrypting Boolean values, we do not implement or consider F_p ciphers in this library.

B Additional Benchmarks
In this section, we give the benchmarks of the Z_2 ciphers in the original TFHE library (Appendix B.1) and of the ciphers in HElib (Appendix B.2). Finally, we compare the plain performance of Pasta, Masta, and Hera in Appendix B.3.

B.1 TFHE Benchmarks of Z_2 Ciphers
Since the noise in the ciphertexts is reset after every homomorphic operation due to gate bootstrapping in TFHE, we do not have to choose any parameters for the benchmarks (except the security level, which we set to 128 bits). In Table 11, we present the TFHE benchmarks: timings for homomorphically encrypting the symmetric key, for homomorphically decrypting one block, and for the small HHE use case from Section 5. Since TFHE does not support packing, all implementations are bitsliced (i.e., one HE ciphertext per bit).
Discussion. In Figure 9, we compare the runtime of homomorphically decrypting one block and of the whole HHE use case (including homomorphic decryption) for the Z_2 ciphers in TFHE. In the gate-bootstrapping version of TFHE, the main performance metric is the total gate count, which is why Kreyvium is the fastest choice. Since the TFHE library only allows plaintexts in Z_2, we do not implement or compare F_p ciphers in TFHE.

B.2 HElib Benchmarks
In this section, we give all benchmarks in HElib. First, we benchmark the Z_2 ciphers, before comparing them to the F_p ciphers. Finally, we benchmark Pasta, Masta, and Hera in a more extensive use case. In HElib, the security and the available noise budget mainly depend on the choice of the cyclotomic reduction polynomial and on the size of the ciphertext modulus. A bigger modulus provides a bigger noise budget at the cost of less security. A bigger cyclotomic polynomial provides more security but degrades performance. In our benchmarks, we use the tool provided by HElib to find suitable parameters, given a target security level of 128 bits and a target noise budget obtained from our experiments. The resulting parameter sets provide λ ≈ 128 bits of security, with the majority of sets providing slightly less.
In Table 12, we present the HElib benchmarks for homomorphically decrypting one block and for the small HHE use case from Section 5. For both benchmarks, we give timings alongside the m-th cyclotomic reduction polynomial (chosen by HElib) and the security level λ (estimated by HElib). For the HHE use case, we additionally give the runtime of the affine transformation use case. To compare the benchmarks to SEAL and TFHE, all implementations are bitsliced (i.e., one HE ciphertext per bit).
Remark 4. HElib supports packing for Z_2 plaintexts. Even though a packed implementation would improve the overall performance of the symmetric ciphers, it complicates the evaluation of an integer matrix-vector multiplication based on binary circuits. Packed implementations therefore do not fix the main issue of Z_2 ciphers for HHE, which is supporting integer arithmetic over F_p. For this reason, we do not provide explicit packed benchmarks for these ciphers.

B.2.2 Comparing Pasta to Z_2 Ciphers in HElib
The benchmarks for HElib are shown in Table 13, where we give both the runtime and the remaining noise budget after each step of the HHE use case from Section 5. In the following, we compare the runtime and noise consumption of all Z_2 and F_p (with p = 65537) ciphers: Figure 10 shows homomorphic decryption of one block in HElib (F_p values from Appendix B.2.3), and Figure 11 shows the HHE use case (including HHE decompression) in HElib. Since Masta and Pasta require the m-th cyclotomic reduction polynomial X^{m/2} + 1, where m is a power of two, we choose parameters differently from Appendix B.2.1: we set q to provide enough noise budget to evaluate the benchmark and choose m as the smallest power of two such that the parameters provide ≥ 128 bits of security. For a fixed m, a smaller q provides both higher security and faster performance. Consequently, benchmarks with a larger λ in Table 13 also run faster than the same benchmark instantiated with exactly 128 bits of security.

B.2.3 HElib Benchmarks of F_p Ciphers
In Table 14, we present the benchmarks for the packed implementations of Pasta, Masta, and Hera in HElib. We give timings for homomorphically decrypting one block and for the bigger HHE use case (Section 9.2). We choose parameters in the same fashion as in Appendix B.2.2, i.e., we choose q to provide enough noise budget to evaluate the benchmark and choose the m-th cyclotomic reduction polynomial, with m a power of two, such that the HE scheme provides ≥ 128 bits of security.
Remark 5. In Table 14, some benchmarks were run with λ < 128 bits of security, because m = 262144 led to infeasible runtimes. Consequently, m = 131072 seems to be an upper limit for feasible runtimes in HElib, and use cases requiring more noise budget than m = 131072 with λ ≥ 128 can provide would inevitably require an efficient bootstrapping operation.
In the following figures, we compare the runtime and noise consumption of the ciphers for three different prime fields F_p: Figure 12 for homomorphically decrypting one block in HElib, and Figure 13 for the HHE use case (including HHE decompression) in HElib.

B.3 Plain Benchmarks of Pasta, Masta and Hera
In Table 15, we compare the number of CPU cycles of the encryption circuit of Pasta to those of Masta and Hera. Since these ciphers generate random matrices and/or round constants independently of the secret key, which can be precomputed before encryption, we additionally give CPU cycles for generating these affine layers and key schedules, and for the encryption circuit with precomputed randomness. Table 15 shows that Hera, with its small block size and fixed matrices that can be evaluated purely with additions, is the fastest cipher in plain. However, this advantage comes at the cost of a higher number of rounds, which worsens homomorphic performance. Comparing Pasta to Masta, one can observe that Pasta-4, due to its small state size, requires the smallest number of cycles to encrypt one block. Pasta-3, on the other hand, is the slowest cipher to encrypt one block in plain, since it samples sequential matrices instead of polynomials m ∈ Z_p[X]/(X^t − α) (as in Masta) and requires twice as many matrices per round. However, the difference to Masta-4 is only a factor of 3, which in practice corresponds to latencies on the order of milliseconds.
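The Masta alternative mentioned above multiplies the state by a random polynomial in Z_p[X]/(X^t − α). Such a multiplication is the same linear map as a structured matrix (an α-twisted circulant) determined by only t coefficients. A sketch with names of our own choosing, not Masta's reference code, checks that the two views agree.

```python
def polymul_mod(a, b, alpha, p):
    """a(X) * b(X) in Z_p[X]/(X^t - alpha); inputs are length-t coefficient lists."""
    t = len(a)
    res = [0] * t
    for i in range(t):
        for j in range(t):
            k = i + j
            if k < t:
                res[k] = (res[k] + a[i] * b[j]) % p
            else:  # X^k = X^(k - t) * X^t = alpha * X^(k - t)
                res[k - t] = (res[k - t] + alpha * a[i] * b[j]) % p
    return res


def multiplication_matrix(a, alpha, p):
    """t x t matrix of x -> a(X) * x(X): column j holds X^j * a(X) reduced mod X^t - alpha."""
    t = len(a)
    M = [[0] * t for _ in range(t)]
    for j in range(t):
        for i in range(t):
            k = i + j
            M[k % t][j] = a[i] * (alpha if k >= t else 1) % p
    return M


def matvec(M, x, p):
    return [sum(m * v for m, v in zip(row, x)) % p for row in M]


p, alpha = 65537, 3
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert polymul_mod(a, b, alpha, p) == matvec(multiplication_matrix(a, alpha, p), b, p)
```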

C Packed vs. Word-Sliced Implementation of Pasta
In Section 6, we describe efficient SIMD algorithms to evaluate Pasta on a packed HE ciphertext. In this section, we compare them to a word-sliced implementation, in which each HE ciphertext encrypts only a single element of F_p. A word-sliced implementation has several disadvantages. First, the homomorphic evaluation of Pasta would be much slower: in a packed implementation, the S-boxes can be evaluated with O(1) homomorphic operations, compared to O(t) HE operations in a word-sliced implementation, and the word-sliced affine layer requires O(t^2) HE operations compared to O(t) operations when using packing. Second, the initial setup in the HHE use case requires transmitting the HE-encrypted symmetric key. In a packed implementation, this is always a single HE ciphertext, whereas in a word-sliced implementation one has to transmit 2t HE ciphertexts, drastically increasing the communication cost of this setup phase. Finally, if the HHE use case leverages packing, the server has to reconstruct a packed ciphertext from the word-sliced state using many rotations.
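The O(t) versus O(t^2) gap for the affine layer can be illustrated with the textbook diagonal method of Halevi and Shoup. We do not reproduce the algorithms of Section 6 here. This sketch simulates the diagonal method on plain Python lists and assumes slot rotations act cyclically on exactly t slots (real BFV/BGV slot layouts need extra care); function names are ours.

```python
def rotate(v, k):
    """Cyclic left rotation by k slots (stand-in for an HE slot rotation)."""
    k %= len(v)
    return v[k:] + v[:k]


def diagonal_matvec(M, x, p):
    """Compute M x as sum_k diag_k(M) * rotate(x, k), where diag_k(M)[i] = M[i][(i + k) % n].
    Packed cost: n - 1 nontrivial rotations, n plaintext-ciphertext multiplications
    and n additions, i.e. O(t) HE operations for t = n."""
    n = len(x)
    acc = [0] * n
    for k in range(n):
        diag = [M[i][(i + k) % n] for i in range(n)]
        rx = rotate(x, k)
        acc = [(s + d * r) % p for s, d, r in zip(acc, diag, rx)]
    return acc


def naive_matvec(M, x, p):
    """Word-sliced view: one multiplication per matrix entry, O(t^2) HE operations."""
    return [sum(m * v for m, v in zip(row, x)) % p for row in M]


p = 65537
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [10, 20, 30]
print(diagonal_matvec(M, x, p))  # [140, 320, 500]
assert diagonal_matvec(M, x, p) == naive_matvec(M, x, p)
```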

Figure 1: Encryption + upload time of HE, HHE with Pasta, and LWE-native encryption [CDKS21] depending on network speed.

Figure 2: The r-round Rasta construction to generate the keystream K_{N,i} for block i under nonce N with affine layers A_{j,N,i}. The picture is taken from [DEG+18].

For a prime p such that gcd(p − 1, 3) = 1,¹¹ a Pasta encryption is defined as follows:
• KGen(): sk ←$ F_p^{2t}.
• Enc_sk(m, N): To encrypt the message m ∈ F_p^l under the secret key sk and nonce N, parse m = m_0 || m_1 || ... || m_j with m_i ∈ F_p^t and return c = c_0 || c_1 || ... || c_j, where c_i = m_i + left_t(Pasta-π(sk, N, i)) and left_t(·) returns the first t words.
• Dec_sk(c, N): To decrypt the ciphertext c ∈ F_p^l using the secret key sk and nonce N, parse c = c_0 || c_1 || ... || c_j with c_i ∈ F_p^t and return m = m_0 || m_1 || ... || m_j, where m_i = c_i − left_t(Pasta-π(sk, N, i)) and left_t(·) returns the first t words.
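The encryption mode is an additive stream cipher over F_p: each block adds (or subtracts) the first t words of a 2t-word keystream. The sketch below shows only this mode. toy_keystream is a HYPOTHETICAL stand-in for Pasta-π built from SHA-256, not the Pasta permutation. Handling a shorter final block by truncating the keystream is our assumption, since the definition parses m into blocks of t words.

```python
import hashlib
from typing import List

P = 65537  # example prime with gcd(P - 1, 3) == 1


def toy_keystream(sk: List[int], nonce: int, block: int, words: int, p: int = P) -> List[int]:
    """HYPOTHETICAL stand-in for Pasta-pi(sk, N, i): hashes (sk, N, i, counter) with
    SHA-256 and maps 4-byte chunks to F_p. NOT the Pasta permutation; the modular
    reduction is slightly biased, which is acceptable for a toy."""
    key_bytes = b"".join(v.to_bytes(4, "big") for v in sk)
    out: List[int] = []
    ctr = 0
    while len(out) < words:
        h = hashlib.sha256(key_bytes + nonce.to_bytes(16, "big")
                           + block.to_bytes(8, "big") + ctr.to_bytes(4, "big")).digest()
        out.extend(int.from_bytes(h[j:j + 4], "big") % p for j in range(0, 32, 4))
        ctr += 1
    return out[:words]


def _crypt(sk, nonce, data, t, sign, p=P):
    """Word-wise add (sign=+1) or subtract (sign=-1) left_t of the block keystream."""
    out = []
    for i in range(0, len(data), t):
        ks = toy_keystream(sk, nonce, i // t, 2 * t, p)[:t]  # left_t keeps the first t words
        out.extend((d + sign * k) % p for d, k in zip(data[i:i + t], ks))
    return out


def encrypt(sk, nonce, m, t, p=P):
    return _crypt(sk, nonce, m, t, +1, p)


def decrypt(sk, nonce, c, t, p=P):
    return _crypt(sk, nonce, c, t, -1, p)


t = 4
sk = [3, 1, 4, 1, 5, 9, 2, 6]       # sk in F_p^{2t}
m = [10, 20, 30, 40, 50, 60]        # two blocks, the last one partial
c = encrypt(sk, 7, m, t)
assert len(c) == len(m)             # no ciphertext expansion
assert decrypt(sk, 7, c, t) == m
```

The equal lengths of m and c are the point of HHE: the client uploads symmetric ciphertexts without expansion, and the server later evaluates the decryption circuit homomorphically.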

Figure 3: The truncated r-round Pasta-π permutation to generate the keystream K_{N,i} for block i under nonce N with affine layers A_{j,k,N,i}.

Figure 4: Comparison of the estimated number of monomials in each of the output words according to Eq. (3) and the lowest number of monomials found in a practical evaluation.

Figure 8: Runtime and noise comparison for the bigger HHE use case in SEAL (security level λ = 128 bits). Ciphers marked with * did not have enough noise budget.

Figure 9: Runtime comparison of homomorphically decrypting one block and the small HHE use case (including HHE decompression) of Z_2 ciphers in TFHE (security level λ = 128 bits).

Figure 11: Runtime and noise comparison for the small HHE use case in HElib (HE security level λ).

Figure 13: Runtime and noise comparison for the bigger HHE use case in HElib (HE security level λ). Ciphers marked with * were evaluated with less than 128 bits of HE security.

Table 1: Comparison of a use case with HHE to only using HE in SEAL.

Table 2: Parameters of the benchmarked Z_2 ciphers in their respective modes of operation (in bits).

Table 3: Benchmarks of the Z_2 ciphers in the SEAL library (security level λ = 128 bits).

Table 4: Cost of HE operations in SEAL and HElib.

Table 5: HE operations and depth of different S-boxes.
Columns: S-box | pt-ct Add | ct-ct Add | pt-ct Mul | ct-ct Mul | Rot | pt-ct Depth | ct-ct Depth

Table 6: Homomorphic operations and multiplicative depth of the linear layers, with t = t_1 · t_2 and 2t = s_1 · s_2.

Table 7: Homomorphic operations and multiplicative depth of Pasta, with t = t_1 · t_2.

Table 9: Runtime and noise budget of the small HHE use case in the SEAL library (security level λ = 128 bits).

Table 10: F_p benchmarks for the SEAL library (security level λ = 128 bits).
ᵃ Noise budget did not suffice and bigger parameters are not available in SEAL; thus, bootstrapping is required.

Table 12: Benchmarks of the Z_2 ciphers in the HElib library.

Table 13: Runtime and noise budget of the small HHE use case in the HElib library.

Table 14: F_p benchmarks for the HElib library.
ᵃ Further increasing m for security resulted in infeasibly long runtimes.

Table 15: Cycles for encrypting one block in plain, averaged over 1000 executions.