Cold Boot Attacks on Ring and Module LWE Keys Under the NTT

In this work, we consider the ring- and module-variants of the LWE problem and investigate cold boot attacks on cryptographic schemes based on these problems, wherein an attacker is faced with the problem of recovering a scheme’s secret key from a noisy version of that key. The leakage resilience of cryptography based on the learning with errors (LWE) problem has been studied before, but there are only limited results considering the parameters observed in cold boot attack scenarios. There are two main encodings for storing ring- and module-LWE keys, and, as we show, the performance of cold boot attacks can be highly sensitive to the exact encoding used. The first encoding stores polynomial coefficients directly in memory. The second encoding performs a number theoretic transform (NTT) before storing the key, a commonly used method leading to more efficient implementations. We first give estimates for the complexity of a cold boot attack on the first encoding method based on standard algorithms; this analysis confirms that this encoding method is vulnerable to cold boot attacks only at very low bit-flip rates. We then show that, for the second encoding method, the structure introduced by using an NTT is exploitable in the cold boot setting: we develop a bespoke attack strategy that is much cheaper than our estimates for the first encoding when considering module-LWE keys. For example, at a 1% bit-flip rate (which corresponds roughly to what can be achieved in practice for cold boot attacks when applying cooling), a cold boot attack on Kyber KEM parameters has a cost of $2^{43}$ operations when the second, NTT-based encoding is used for key storage, compared to $2^{70}$ operations with the first encoding. On the other hand, in the case of the ring-LWE-based KEM, New Hope, the cold boot attack complexities are similar for both encoding methods.


Introduction
One of the attractive features of the Learning with Errors problem (LWE) [Reg05] is its "leakage resilience" [DGK+10, BG10, BL14] which roughly states that the difficulty of the problem deteriorates only gradually as information about the secret is leaked. Indeed, the LWE problem has been shown to remain hard even when an attacker knows many bits of the secret [AGV09] or when random inner products with the secret vector are known [Pie12]. With efficiency in mind, many systems proposed for practical use are based on the related ring-LWE problem (RLWE) [LPR13a] and module-LWE problem (MLWE) [LS15]. For this setting, fewer results are known. For example, the ring-LWE-based public key encryption scheme from [LPR13b] was recently shown to remain IND-CPA secure when certain information on the private key is leaked [DSGKS17].
A running motivating example in the leakage resilience literature is "cold boot attacks", cf. [AGV09, NS09, DSGKS17]. Cold boot attacks were introduced and studied in the seminal work of Halderman et al. [HSH+09]. Briefly, cold boot attacks rely on the fact that bits in RAM retain their value for some time after power is cut. In order to preserve the value for longer, memory can be cooled to extreme temperatures (−50 °C) in order to retain a $\rho_0 = 1\%$ bit-flip rate even after a time period of ten minutes. Halderman et al. also noted that bit-flip rates as low as $\rho_0 = 0.17\%$ are possible when liquid nitrogen is used for cooling. Another key observation was that memory has a ground state that the bits will decay to over time, i.e. the noise introduced is very biased. However, it was also noticed that there is a very small but non-zero probability of retrograde bit-flips away from the ground state. It was estimated that these retrograde bit-flips occur at a rate of $\rho_1 \in [0.05\%, 0.1\%]$. In a cold boot attack, then, the attacker is assumed to have physical access to a machine shortly after a power down cycle. The attacker proceeds by extracting from memory a noisy version of a scheme's secret key, where a small number of bits have been flipped. The attacker then recovers the key by applying bespoke error correction algorithms.
To date, cold boot attacks have received a significant amount of attention across a range of cryptographic primitives including a variety of symmetric ciphers [HSH+09, Tso09, KY10, AC11], RSA [HS09, HMM10, PPS12], discrete log systems [PS15] and, most recently, NTRU [PV17]. However, the literature so far contains no dedicated analysis of cold boot attacks against LWE-based cryptographic primitives.
Establishing the resilience to side-channel attacks of LWE-based schemes is gaining significance in light of these schemes being on the brink of widespread adoption. Firstly, LWE/RLWE/MLWE assumptions are popular candidates for post-quantum cryptography. For example, many proposals submitted to NIST's post-quantum standardisation process are based on these assumptions [SSZ17, Ham17, DKRV17, GMZB+17, BAA+17, PAA+17, PHAM17, SAL+17, LLJ+17, ZJGS17, Saa17, NAB+17, SPL+17, DTGW17, SAB+17, LDK+17, LLKN17]. We note that NIST considers resistance to side-channel attacks as a worthwhile, albeit secondary security feature: "schemes that can be made resistant to side-channel attack at minimal cost are more desirable than those whose performance is severely hampered by any attempt to resist side-channel attacks. We further note that optimised implementations that address side-channel attacks (e.g., constant-time implementations) are more meaningful than those which do not" [Nat16]. Secondly, efforts to standardise homomorphic encryption schemes have gained traction, with the first white papers being issued [CCD+17, ACC+17, BDH+17]. The homomorphic encryption schemes being considered for standardisation are all based on the RLWE assumption.
Contributions and road map. We consider the resistance of RLWE- and MLWE-based schemes to cold boot attacks. In light of the leakage resilience of LWE mentioned above, we investigate how cold boot leakage of secrets stored as polynomial coefficients affects the hardness of the LWE problem. We show that for moderate cold boot error rates the resulting problem is considerably easier to solve than the side-channel-free RLWE/MLWE instances from which it is derived; for this analysis, we simply apply standard security estimates. However, we note that this analysis does not apply to many schemes as specified and implemented in practice. In particular, many schemes, e.g. [PAA+17, SAL+17, LLJ+17, ZJGS17, Saa17, SPL+17, DTGW17, SAB+17, LDK+17, CLP17], make use of a power-of-two cyclotomic ring $\mathbb{Z}[x]/(x^n + 1)$. This ring is amenable to performing multiplications using a (negacyclic) number theoretic transform (NTT) with complexity $O(n \log_2 n)$. Adopting the language of the Fourier transform from which the NTT is derived, the expensive step of an NTT computation is to transform the inputs from the time domain into the frequency domain and back; the actual multiplication takes only $O(n)$ elementary operations. Thus, it is beneficial to keep intermediate values in the frequency domain. For example, the Kyber specification [SAB+17] directly specifies the secret key in the frequency domain. This implementation detail dramatically alters the landscape for cold boot attacks on RLWE/MLWE-based schemes that specify the use of an NTT: now, a cold boot attacker is confronted with the problem of "decoding a noisy NTT", i.e. recovering the input to an NTT given a noisy output. This problem is well-defined in our setting since the sought after input is small compared to the modulus $q$ for which the NTT is specified. While our attack in principle applies to all RLWE/MLWE schemes using the NTT and storing secret keys in the frequency domain, we use a running example of the default Kyber parameters [SAB+17] for concreteness.
To start off, in Section 3, we establish the decoding cost for cold boot attacks when the NTT is not used for secret key storage, and obtain a solving cost of $2^{70}$ operations for $\rho_0 = 1\%$, $\rho_1 = 0.1\%$ bit-flip rates. We then introduce the "cold boot NTT problem" in Section 4. This accurately captures the information available to the adversary in a cold boot attack on Kyber and related schemes that store the private key using an NTT. We then develop a practical attack with a cost of roughly $2^{43}$ operations for the aforementioned bit-flip rates by exploiting properties of the NTT. We summarise our findings in Table 1. In addition to the running example of Kyber, we also analyse New Hope KEM [PAA+17] to give an idea of the attack performance on a RLWE-based scheme. The results for New Hope are slightly different to the Kyber results and are summarised in Table 2. In particular, for the bit-flip rates considered, the attack complexities on New Hope when using the NTT for key storage are comparable to the case where the NTT is not used.
Table 1: Cold boot attacks on Kyber KEM keys stored in the NTT domain with $\rho_0$, $\rho_1$ the cold boot bit-flip rates. The column "cost" gives the cost of recovering 256 components of the secret in terms of the number of lattice points visited during enumeration (≈ 100 CPU cycles each). The attack can be repeated to recover all 768 components. The column "rate" shows the overall success rate $1 - (1 - p_0)^2$ for recovering 256 components of the secret, cf. Section 7. We also give the costs of a cold boot attack when the secret key is stored in the time domain in the column "non-NTT", cf. Section 4. In that case, the success rate is always expected to be close to 100%.
Our attack proceeds as follows. In Section 5 we show how to reduce the dimension of our NTT cold boot problem by using a divide and conquer approach that is inspired by the standard recursive formula for the NTT. We then show that the resulting low dimensional instance is efficiently solvable using a careful application of lattice reduction. What prevents our attack from having trivial complexity is that we encounter LWE-like instances where the secret distribution has a peculiar form. Specifically, each component of the secret can be written as the sum of a small number of positive/negative powers of two and the secret itself is guaranteed to be sparse.¹ Therefore, in Section 6, we introduce a special attack on LWE with this type of secret. This attack combines guessing of higher-order bits with running an enumeration for a closest vector [LP11, LN13]. Similar combinations of combinatorial and lattice-reduction techniques have been previously considered, e.g. in [HG07]. In particular, our approach is similar to that used in [BCGN17, dBDJdW18] for solving the Mersenne Low Hamming Ratio Search Problem [AJPS17]. However, in contrast to [BCGN17, dBDJdW18], our attack is aided by the fact that we are considering a lattice derived from an NTT matrix. These lattices are highly structured and display a behaviour very far from that observed for random lattices. Thus, on the one hand, we cannot apply standard estimates for various quantities involved in lattice reduction. On the other hand, performing lattice reduction on the lattice bases that we encounter turns out to be easier than expected. Thus, the cost of our lattice reduction is not derived from (standard) estimates but based on experimental evidence obtained using [FPL17, FPY18]. We describe the relevant properties of these lattices and the lattice reduction/enumeration step of our attack in Section 7.
Finally, we report on the overall cost of our attack in Section 8. For completeness, we include an overview of other possible cold boot attack techniques (meet-in-the-middle and Gröbner bases) in the appendix. We also consider there an alternative approach to solving the cold boot problem based on Blahut's Theorem and the Berlekamp-Massey algorithm [Mas69]. This approach succeeds when the bit-flip rate is low and where the secret key is guaranteed to have low Hamming weight when compared to the ring dimension. In particular, if the secret has Hamming weight $w$ and an attacker has access to $2w$ consecutive clean components of the secret, then the full secret can be derived at a trivial cost.
Discussion. While our attacks are a far cry from the impressive bit-flip rates that can be handled for other primitives such as RSA and AES, they highlight that cold boot attacks apply to RLWE/MLWE-based schemes. Our results show that use of the NTT makes cold boot attacks easier for the MLWE-based Kyber KEM. However, for the RLWE-based scheme New Hope KEM, the complexity of cold boot attacks in the non-NTT and NTT cases is roughly the same for the bit-flip rates we considered. One reason for this is that our NTT-based attack allows us to consider each ring element of an MLWE secret individually which reduces the dimension of the cold boot problem. This is not possible in the case of RLWE where the secret key is a single ring element with a large dimension.² This fact also explains why the bit-flip rates that our attack effectively handles are lower for New Hope.
For Kyber KEM, our results suggest that vulnerability to cold boot attacks can be mitigated by storing the secret in the time domain instead of the frequency domain. This countermeasure would increase decryption time in a typical IND-CCA setting by a factor of at most two as such a conversion from the time to frequency domain must take place already due to the re-encryption step.³ However, such a countermeasure would not completely rule out cold boot attacks: for bit-flip rates of $\rho_0 = 0.2\%$ the resulting MLWE instance is still relatively easy to solve using the methods of Section 3. This countermeasure does not appear to be relevant in the case of New Hope according to Table 2 where the complexity of attacking a New Hope key remains comparable whether the NTT is used for key storage or not. However, future work may propose better algorithms for solving the cold boot NTT decoding problem.

Preliminaries
For positive real $y$, we write $\lfloor y \rfloor$ to denote the integer part of $y$, $\lceil y \rceil$ to denote the smallest positive integer larger than $y$ and $\lfloor y \rceil$ to denote the rounding of $y$ to the nearest integer (where we round down in the case of a tie). We denote the integers modulo $q$ as $\mathbb{Z}_q$. We use subscripts to reference individual entries of vectors, e.g. $a_i$. We start counting at zero. In the case where we have a polynomial $a(x) = \sum_{i=0}^{n-1} a_i x^i$, we often identify it with its vector of coefficients $(a_0, \dots, a_{n-1})$. Our treatment of $a$ as either a polynomial or a vector should be clear from the context. We use the notation $s \leftarrow D$ to mean that $s$ is an element sampled from the distribution $D$. If $s$ is a $k$-dimensional vector, then $s \leftarrow (D)^k$ denotes that each entry in $s$ is drawn independently from the distribution $D$. If $S$ is a finite set, we use the notation $s \leftarrow S$ to denote that $s$ is an element sampled from the uniform distribution over $S$.
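Let $q$ be a prime such that a primitive $2n$-th root of unity $\gamma$ exists modulo $q$, and set $\omega = \gamma^2$. The negacyclic number theoretic transform (NTT) in dimension $n$ will be defined as the linear function $\mathrm{NTT}_n : \mathbb{Z}_q^n \to \mathbb{Z}_q^n$ given by
$$\mathrm{NTT}_n(a)_i := \sum_{j=0}^{n-1} \gamma^{j} \omega^{ij} a_j \bmod q, \qquad i = 0, \dots, n-1.$$
This transform allows for fast polynomial multiplication in rings of the form $\mathbb{Z}_q[x]/(x^n + 1)$ where $n$ is a power of two [SS71, Win96]. In general, $\hat{a}$ will be used as shorthand for the NTT of $a$ and we often drop the subscript $n$ when its value is clear from the context. The inverse negacyclic NTT is given by
$$\mathrm{NTT}_n^{-1}(\hat{a})_i := n^{-1} \gamma^{-i} \sum_{j=0}^{n-1} \omega^{-ij} \hat{a}_j \bmod q, \qquad i = 0, \dots, n-1.$$
We need the following result, whose proof is an easy exercise: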

LWE definitions
We will be using the definition of ring-LWE (RLWE) discussed in [LPR13b] since it best represents practical use. The reason for this is that the original RLWE definition [LPR10] (and the definition of module-LWE (MLWE) from [LS15]) uses a continuous error distribution which is inconvenient in practice. We restrict to rings of the form $R = \mathbb{Z}[x]/(x^n + 1)$ where $n$ is a power of two. We also define $R_q := R/(qR)$.
Definition 1 (Ring-LWE distribution). For a "secret" $s \in R_q$ and an error distribution $\chi$ over $R$, a sample from the ring-LWE distribution $A_{s,\chi}$ over $R_q \times R_q$ is generated by choosing $a \leftarrow R_q$ uniformly, $e \leftarrow \chi$ and outputting $(a, a \cdot s + e \bmod qR)$.
Definition 2 (Search ring-LWE problem). The search ring-LWE problem with secret distribution $D$ over $R$ entails recovering $s$ from arbitrarily many samples of $A_{s,\chi}$ where $s \leftarrow D$.
We note that in practice, we usually have a restriction on the number of samples. In the module-LWE definitions below, $k$ will be a positive integer representing the module "rank". It is understood that if $a := (a^{(0)}, \dots, a^{(k-1)}) \in (R_q)^k$ and $s := (s^{(0)}, \dots, s^{(k-1)}) \in (R_q)^k$, then $a \cdot s := \sum_{i=0}^{k-1} a^{(i)} \cdot s^{(i)} \in R_q$.
Definition 3 (Module-LWE distribution). For a "secret" $s \in (R_q)^k$ and error distribution $\chi$ over $R$, a sample from the module-LWE distribution $A_{k,s,\chi}$ over $(R_q)^k \times R_q$ is generated by choosing $a \leftarrow (R_q)^k$ uniformly, $e \leftarrow \chi$ and outputting $(a, a \cdot s + e \bmod qR)$.
Definition 4 (Search module-LWE problem). The search module-LWE problem with secret distribution $D$ over $R$ entails recovering $s$ from arbitrarily many samples of $A_{k,s,\chi}$ where $s \leftarrow (D)^k$.
The decision variant of RLWE challenges an adversary to distinguish between samples from $A_{s,\chi}$ and the uniform distribution over $R_q \times R_q$ given that $s \leftarrow D$. Similarly, decision MLWE is the problem of distinguishing between $A_{k,s,\chi}$ and the uniform distribution over $(R_q)^k \times R_q$ given $s \leftarrow (D)^k$.
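To make Definition 1 concrete, the following minimal sketch generates a single ring-LWE sample. It is illustrative only: the parameters `Q`, `N` and `ETA`, the choice of a centred binomial distribution for both secret and error, and all function names are assumptions made for this example, and the schoolbook multiplication is used purely for readability.

```python
import random

# Toy parameters assumed for illustration (not mandated by the definitions).
Q, N, ETA = 7681, 256, 4


def centred_binomial(eta, rng):
    """Sample from a centred binomial distribution with values in [-eta, eta]."""
    return sum(rng.getrandbits(1) - rng.getrandbits(1) for _ in range(eta))


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b in Z_q[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                # x^n = -1, so wrapped-around terms change sign
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


def rlwe_sample(s, rng, q=Q, eta=ETA):
    """Return one sample (a, a*s + e mod q) for the secret s."""
    n = len(s)
    a = [rng.randrange(q) for _ in range(n)]
    e = [centred_binomial(eta, rng) for _ in range(n)]
    b = [(x + y) % q for x, y in zip(negacyclic_mul(a, s, q), e)]
    return a, b


def centre(x, q=Q):
    """Map x to its representative in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


rng = random.Random(0)
s = [centred_binomial(ETA, rng) for _ in range(N)]
a, b = rlwe_sample(s, rng)
# With knowledge of s, the error term can be read off directly.
noise = [centre(bi - ci) for bi, ci in zip(b, negacyclic_mul(a, s))]
```

Knowing `s`, the difference `b - a*s` reduces to the small error `e`; without `s`, the pair `(a, b)` is what the search and decision problems above are posed on.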

Minimal binary signed digit representation
We will often consider integers in binary signed digit representation (BSDR). This representation is reminiscent of binary representation for positive integers, apart from the fact that each individual bit in BSDR has its own sign. For example, $(1, 0, -1)$ is a BSDR of $-3$ because $-3 = 1 \cdot 2^0 + 0 \cdot 2^1 - 1 \cdot 2^2$. We also have that $-3$ can be written as $(-1, -1)$ in BSDR. It is clear that integers can have many BSDRs. In order to reduce the number of possibilities, we often consider the minimal BSDRs, i.e. the BSDRs with the minimum possible Hamming weight. For example, the minimal BSDR of 31 is $(-1, 0, 0, 0, 0, 1)$. Note that this has a lower Hamming weight than the binary representation of 31, i.e. $(1, 1, 1, 1, 1)$. Even when considering minimal BSDRs, the issue of non-uniqueness can arise. The integer $-3$ is a simple example of this. One can also consider integers in $q$-ary signed digit representation ($q$-SDR). For example, if $q = 3$, a possible $q$-SDR of the integer 8 would be $(-1, 0, 1)$. Once again these representations are not unique. We extend these definitions to vectors in the obvious way, i.e. by considering vectors component-wise.
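One way to obtain a minimal BSDR is the standard non-adjacent form (NAF), which is known to have minimum Hamming weight among all signed binary representations. The sketch below is only an illustration; the function names are chosen for this example, and the NAF returns a single minimal BSDR rather than the full list of minimal representations.

```python
def minimal_bsdr(x):
    """Return the non-adjacent form of x, least significant digit first."""
    digits = []
    while x != 0:
        if x % 2:
            # Choose the digit so that the remaining value is divisible by 4:
            # +1 if x = 1 mod 4, -1 if x = 3 mod 4.
            d = 2 - (x % 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x //= 2
    return tuple(digits)


def bsdr_value(digits):
    """Evaluate a signed digit vector (least significant digit first)."""
    return sum(d * 2**i for i, d in enumerate(digits))


def hamming_weight(digits):
    """Number of non-zero signed digits."""
    return sum(1 for d in digits if d != 0)


print(minimal_bsdr(31))   # (-1, 0, 0, 0, 0, 1)
print(minimal_bsdr(-3))   # (1, 0, -1)
```

Both outputs agree with the examples in the text; the alternative representation $(-1, -1)$ of $-3$ has the same Hamming weight, which illustrates the non-uniqueness mentioned above.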

Lattices
We here only briefly recall the definitions relevant to this work. For an introduction to lattices and lattice-based cryptography see [MR09, Pei15]. An $n$-dimensional lattice is a discrete subgroup of $\mathbb{R}^n$. A rank $m$ lattice $\Lambda$ can be written in terms of a basis $\{b_0, \dots, b_{m-1}\}$ as
$$\Lambda = \Big\{ \sum_{i=0}^{m-1} x_i b_i : x_i \in \mathbb{Z} \Big\}.$$
We only consider full-rank lattices in this work where $m = n$. We can represent the basis as a matrix $B \in \mathbb{R}^{m \times n}$ where each row is considered to be a basis vector. The main computational lattice problem that will arise in this work is the bounded distance decoding problem (BDD), cf. [LN13]. To define BDD in its simplest form, we will denote the norm of a shortest nonzero vector in a lattice $\Lambda$ as $\lambda_1(\Lambda)$. The BDD problem then asks to find the closest lattice vector to some target point $t$ under the guarantee that there exists a lattice vector within distance $\lambda_1(\Lambda)$ from $t$.
A common strategy for solving BDD is to first obtain a "high quality" basis for the lattice and then to run Babai's nearest plane algorithm to obtain a solution. At a high level, the most desirable bases for running Babai are short and orthogonal. Obviously, because of the geometry of most lattices, these bases simply do not exist. Due to this fact, definitions of "reduced" bases aim to mimic the notion of a short and orthogonal basis. The well-known BKZ algorithm [Sch87, CN11] outputs a so-called BKZ-reduced basis. It is parametrised by a block size $\beta$, indicating at which dimension calls are made to an exact SVP oracle as a subroutine in the algorithm. After performing BKZ-$\beta$ reduction, the first vector in the transformed lattice basis will have norm $\delta_0^m \cdot \det(\Lambda)^{1/m}$ where $\det(\Lambda)$ is the determinant of the lattice under consideration and the root-Hermite factor $\delta_0$ is a constant based on the block size parameter $\beta$. More generally, the quality of a reduced basis $B$ can be expressed by the slope of the logs of the lengths of the vectors $b_i^*$ in the Gram-Schmidt orthogonalisation of $B$. For random bases, the Geometric Series Assumption (GSA) is commonly assumed to hold:
Definition 5 (Geometric Series Assumption [Sch03]). The norms of the Gram-Schmidt vectors after lattice reduction satisfy $\|b_i^*\| = \alpha^{i} \cdot \|b_0\|$ for some $\alpha \in (0, 1)$.
Combining the GSA with the root-Hermite factor and the fact that $\det(\Lambda) = \prod_{i=0}^{m-1} \|b_i^*\|$, we get $\alpha = \delta_0^{-2m/(m-1)}$. Increasing the block-size parameter $\beta$ of BKZ-$\beta$ leads to a smaller $\delta_0$ but also leads to an increase in run-time. In this work, we consider the "enumeration regime" where lattice point enumeration is used to realise the exact SVP oracle in dimension $\beta$. In this case the running time grows as $\beta^{\Theta(\beta)}$ [Kan83, MW15].
Babai's nearest plane algorithm has been generalised to consider multiple planes [LP11]. This, in turn, can be considered as a form of pruned BDD enumeration [LN13]. In this work, we follow the BDD enumeration approach to solving BDD, i.e. we first compute a high quality basis and then run pruned enumeration to recover the (hopefully) closest vector to our target vector. As is usual, we run enumeration in some sub-dimension and then extend the solution in the projected sub-lattice to a full solution by running Babai's nearest plane algorithm. This is equivalent to picking very small pruning coefficients for the smallest indices. We make use of BKZ and enumeration as implemented in [FPL17, FPY18]. This implementation also features a Pruning module, which computes parameters for pruned enumeration.
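For readers unfamiliar with Babai's nearest plane algorithm, the following is a minimal sketch in exact rational arithmetic on a toy two-dimensional lattice. It is not the implementation from [FPL17, FPY18] and performs no lattice reduction or pruning; the basis, the target and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(basis):
    """Return the (unnormalised) Gram-Schmidt vectors of the rows of basis."""
    ortho = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for u in ortho:
            mu = Fraction(dot(b, u)) / dot(u, u)
            v = [vi - mu * ui for vi, ui in zip(v, u)]
        ortho.append(v)
    return ortho


def babai_nearest_plane(basis, target):
    """Return a lattice vector close to target (rows of basis span the lattice)."""
    ortho = gram_schmidt(basis)
    residual = [Fraction(x) for x in target]
    # Process the basis vectors from last to first, each time rounding to
    # the nearest translate of the hyperplane spanned by the earlier vectors.
    for b, b_star in zip(reversed(basis), reversed(ortho)):
        c = round(dot(residual, b_star) / dot(b_star, b_star))
        residual = [r - c * bi for r, bi in zip(residual, b)]
    return [t - r for t, r in zip(target, residual)]


basis = [[7, 0], [3, 8]]
lattice_point = [2 * 7 - 1 * 3, 2 * 0 - 1 * 8]           # 2*b_0 - 1*b_1 = (11, -8)
target = [lattice_point[0] + 1, lattice_point[1] - 1]    # small error (1, -1)
recovered = babai_nearest_plane(basis, target)
```

On this toy instance `recovered` equals `lattice_point`. In general the algorithm only succeeds when the error lies inside the parallelepiped spanned by the Gram-Schmidt vectors, which is why the quality of the basis matters.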

Leakage resilience for Kyber's parameters
As mentioned above, we use the default parameter set of the Kyber KEM [SAB+17], henceforth referred to simply as "Kyber", as the running example. However, we stress that our analysis applies generally to RLWE/MLWE keys as we will see later when the New Hope KEM [PAA+17] is considered. Kyber relies on the MLWE problem in dimension $k = 3$ over the ring $R_q = \mathbb{Z}_{7681}[x]/(x^{256} + 1)$. It uses a centred binomial error distribution $B_\eta$ with parameter $\eta = 4$. This distribution has standard deviation $\sqrt{\eta/2} = \sqrt{2}$. In Kyber, the components of the secret also follow $B_\eta$. Now, consider the Kyber public key $(a, b := a \cdot s + e)$ with $s_i, e_i \leftarrow B_\eta$ and assume that, due to some leakage, we are given a noisy version of $s$, denoted by $\tilde{s} := s + \Delta$. Here, the addition is over $R_q$ and $\Delta$ is an element of $R_q$ representing bit-flips. This means that each component of $\Delta$ should have low Hamming weight when written in minimal BSDR. For illustrative purposes, we will focus on cold boot bit-flip rates of $\rho_0 = 1.0\%$ towards the ground state and a retrograde bit-flip rate of $\rho_1 = 0.1\%$, cf. [HSH+09]. More values are given in Table 1 and estimates for the New Hope KEM are given in Table 2. We consider
$$(a,\; a \cdot \tilde{s} - b) = (a,\; a \cdot \Delta - e), \qquad (1)$$
which is an MLWE instance for the secret $\Delta$. We note that the conversion works both ways, i.e. an attacker who can find $\Delta$ can then solve the above MLWE instance, and thus the two problems are equivalent.
By definition of $B_\eta$ we have $-\eta \le s_i \le \eta$. Thus, $s_i$ fits into four bits (including one sign bit) and we may assume that the secret $\Delta$ is both relatively sparse (at least when considered in minimal BSDR) and has components that are bounded by $\eta = 4$ in absolute value. This means that we only need to consider $768 \cdot 4$ bits altogether. We assume that half of these bits are in the ground state and the other half are not. That is, for $\rho_0 = 1.0\%$, $\rho_1 = 0.1\%$, we obtain a $\Delta$ with an expected number of $17 \approx (1.0 + 0.1)/100 \cdot 768 \cdot 4/2$ non-zero components, each bounded by four in absolute value. According to the LWE estimator from [APS15] the MLWE instance (1) for this parameter set takes $\approx 2^{70.3}$ operations to solve assuming enumeration is used to realise the SVP oracle [CN11].⁴ This attack might be improved somewhat by taking into account the a priori distribution of the $s_i$.
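The expected number of corrupted bits quoted above follows from a short calculation, sketched below under the text's assumption that half of the stored bits sit in the ground state. The function name and argument names are chosen for this illustration only.

```python
def expected_flips(num_bits, rho0, rho1):
    """Expected number of flipped bits when half of the bits are in the ground state.

    Bits away from the ground state decay with probability rho0; bits already
    in the ground state flip (retrograde) with probability rho1.
    """
    half = num_bits / 2
    return half * rho0 + half * rho1


# Time-domain Kyber key: 768 coefficients stored in 4 bits each.
time_domain_flips = expected_flips(768 * 4, rho0=0.01, rho1=0.001)
print(time_domain_flips)   # roughly 16.9, i.e. about 17 flipped bits
```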

Cold boot NTT decoding problem
The discussion in the previous section assumes that $s$ is stored in RAM as a vector with small components, allowing a cold boot attacker to obtain a noisy image of $s$. Yet, as discussed in the introduction, Kyber stores $\hat{s} := \mathrm{NTT}_n(s)$ instead of $s$. Thus, a cold boot attacker does not encounter a noisy version of $s$ but a noisy version of $\hat{s}$. In other words, the costs derived in Section 3 are immaterial for a real-world attack on Kyber. In particular, the decoding problem encountered during a cold boot attack on M/RLWE-based schemes utilising an NTT is as follows:
Definition 6 (Cold boot NTT decoding problem). Let NTT be a (negacyclic) NTT of dimension $n$ modulo $q$, let $\xi$ be some known constant mod $q$, let $s$ be a vector with some known distribution $\chi$ and let $\Delta$ be some vector with known distribution $\psi$. Then the Cold Boot NTT Decoding Problem is to recover $s$ given $\tilde{s} := \xi \cdot \mathrm{NTT}(s) + \Delta$.
In the definition above, we slightly generalise the cold boot problem encountered by permitting a scaling factor $\xi$, cf. Section 5. As before, in our setting $\Delta$ corresponds to bit-flips which means that each component of $\Delta$ should have low Hamming weight when written in minimal BSDR. However, contrary to the discussion in Section 3, the norm of the "noise term" $\Delta$ is not necessarily small. By analogy with LWE, it will be convenient to consider the problem with the roles of $s$ and $\Delta$ reversed, i.e. to consider the inverse NTT of the above instance. In particular, we will be considering the problem of recovering $s$ or $\Delta$ given
$$\tilde{s} = W \cdot \Delta + s \bmod q, \qquad (2)$$
where $W$ is the inverse of a (possibly scaled by some constant) negacyclic NTT matrix for dimension $n$, $\tilde{s}$ is known (it is obtained by applying $W$ to the noisy key read from memory), $s$ is small and $\Delta$ is sparse in minimal BSDR. We sometimes write $W_n$ to explicitly indicate the dimension of the NTT.
In a standard LWE setting, the matrix $A$ is uniformly randomly sampled mod $q$. Indeed, to prevent precomputation attacks, Kyber specifies that a fresh $A$ is computed for $s$. In contrast, in our decoding problem each instance has the same $W$ which is the matrix representation of an inverse negacyclic NTT. Thus, precomputation attacks become feasible. More importantly, though, this matrix is highly structured and, indeed, the $q$-ary lattices derived from this matrix do not behave like random lattices. We consider this in Sections 5 and 7.
We note that while we are only given $n$ samples in our decoding problem, the problem is still well defined, despite $\Delta$ not being small. This is because $\Delta$ is sparse when its components are written in BSDR form. On the other hand, the distribution of $\Delta$ implies that standard techniques for solving LWE-like problems need to be adapted. We consider this in Section 6.
We parametrise the cold boot NTT decoding problem by a parameter $\kappa$ representing the number of expected bit-flips; explicitly:
$$\kappa := \Big\lceil \frac{\rho_0 + \rho_1}{2} \cdot n \cdot \lceil \log_2 q \rceil \Big\rceil.$$
Finally, we note that, for Kyber, the dimension of the problem is immediately reduced from $n \cdot k = 768$ to $n = 256$ since a single Kyber key gives rise to $k$ independent cold boot problems. It should be noted that this reduction in dimension does not occur when considering RLWE keys since RLWE is effectively MLWE with $k = 1$. For bit-flip rates of 0.17% and 1% in the ground state direction (and 0.1% in the retrograde direction), we expect a total of less than $\lceil (0.17 + 0.1) \cdot 256 \cdot 13/200 \rceil = 5$ and $\lceil (1 + 0.1) \cdot 256 \cdot 13/200 \rceil = 19$ bits to be flipped respectively. Therefore, under these cold boot assumptions, we expect either 5 or 19 unknown bit-flips. Note that in both cases, the number of retrograde bit-flips is approximately 2. The case $\rho_0 = 0.17\%$ can therefore be solved by exhaustive search in $\binom{13 \cdot 256/2}{3} \cdot \binom{13 \cdot 256/2}{2} \approx 2^{50}$ operations. For the case $\rho_0 = 1.0\%$, the naive strategy of simply guessing the positions of bit-flips implies an attack of complexity roughly $\binom{13 \cdot 256/2}{17}$. This is the case that we will use as our running example.
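The figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, that half of the $13 \cdot 256$ stored bits are in the ground state; the helper names are chosen for this illustration.

```python
from math import ceil, comb, log2

N, LOG_Q = 256, 13   # one Kyber ring element, 13 bits per NTT coefficient


def kappa(rho0, rho1, n=N, bits=LOG_Q):
    """Upper bound on the expected number of bit-flips in one ring element."""
    return ceil((rho0 + rho1) * n * bits / 2)


half = N * LOG_Q // 2   # number of bits assumed to be in each state

# rho_0 = 0.17%: guess 3 flips towards and 2 flips away from the ground state.
cost_low = comb(half, 3) * comb(half, 2)
# rho_0 = 1.0%: naive guessing of the positions of 17 flips.
cost_high = comb(half, 17)

print(kappa(0.0017, 0.001), kappa(0.01, 0.001))
print(round(log2(cost_low), 1), round(log2(cost_high), 1))
```

The second line of output makes the gap explicit: exhaustive search is borderline feasible at the lower rate but far out of reach at $\rho_0 = 1.0\%$, which motivates the structured attack developed in the following sections.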

Divide and conquer
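It is well known that a $2^n$-dimensional Fourier transform can be written in terms of two $2^{n-1}$-dimensional Fourier transforms. The same holds for a negacyclic NTT. To aid the presentation of the appropriate formulae, define $g^{(e)} := (g_0, g_2, \dots, g_{n-2})$ and $g^{(o)} := (g_1, g_3, \dots, g_{n-1})$ for any $g \in \mathbb{Z}_q^n$. The negacyclic NTT can be shown to satisfy the following relations for $0 \le i \le n/2 - 1$, where the $n/2$-dimensional transform is taken with respect to $\gamma^2$ in place of $\gamma$:
$$\mathrm{NTT}_n(g)_i + \mathrm{NTT}_n(g)_{i+n/2} = 2 \cdot \mathrm{NTT}_{n/2}(g^{(e)})_i, \qquad (3)$$
$$\mathrm{NTT}_n(g)_i - \mathrm{NTT}_n(g)_{i+n/2} = 2 \gamma \omega^{i} \cdot \mathrm{NTT}_{n/2}(g^{(o)})_i. \qquad (4)$$
Example 1. Consider $n = 8$. Given a primitive $2n$-th root of unity $\gamma$, we can write the forward negacyclic NTT in matrix form as the $8 \times 8$ matrix whose entry in row $i$ and column $j$ is $\gamma^{j(2i+1)}$. Adding the rows $i$ and $i + 4$ for $i \in \{0, 1, 2, 3\}$ and using $\gamma^{8} = -1$, we obtain the $4 \times 8$ matrix $W_n^{(+)}$ whose entry in row $i$ and column $j$ is $2\gamma^{j(2i+1)}$ for even $j$ and $0$ for odd $j$. Restricted to its even-indexed columns, this corresponds to the NTT matrix for $n = 4$ scaled by $\xi = 2$.
Using this halving property, we can split our cold boot NTT decoding problem into two smaller cold boot NTT decoding problems. Recall that our cold boot instance is described by the equation $\tilde{s} = \mathrm{NTT}_n^{-1}(\Delta) + s$ (see Equation (2)). To show how we utilise Equations (3) and (4), we perform the following steps:
1. Take a forward NTT to obtain the instance $\mathrm{NTT}_n(\tilde{s}) = \mathrm{NTT}_n(s) + \Delta$.
2. Perform the two folding steps, for $0 \le i \le n/2 - 1$:
(a) (Positive Fold) Compute the vector described by $\mathrm{NTT}_n(\tilde{s})_i + \mathrm{NTT}_n(\tilde{s})_{i+n/2} = 2 \cdot \mathrm{NTT}_{n/2}(s^{(e)})_i + \Delta_i + \Delta_{i+n/2}$.
(b) (Negative Fold) Compute the vector described by $\mathrm{NTT}_n(\tilde{s})_i - \mathrm{NTT}_n(\tilde{s})_{i+n/2} = 2 \gamma \omega^{i} \cdot \mathrm{NTT}_{n/2}(s^{(o)})_i + \Delta_i - \Delta_{i+n/2}$.
3. Define $\Delta^{(l)} := (\Delta_0, \dots, \Delta_{n/2-1})$, $\Delta^{(r)} := (\Delta_{n/2}, \dots, \Delta_{n-1})$ and do the following:
(a) (Positive Fold) Multiply by $2^{-1} \bmod q$ and take an inverse NTT. The resulting instance is $\tilde{s}^{(+)} = 2^{-1} \cdot \mathrm{NTT}_{n/2}^{-1}(\Delta^{(l)} + \Delta^{(r)}) + s^{(e)}$.
(b) (Negative Fold) Define the matrix $\Omega$ such that $\Omega_{i,j} = (\gamma \omega^{i})^{-1} \delta_{i,j}$ where $\delta_{i,j}$ is the Kronecker delta function. Multiply by $2^{-1} \Omega$ and take an inverse NTT to obtain the instance $\tilde{s}^{(-)} = 2^{-1} \cdot \mathrm{NTT}_{n/2}^{-1}(\Omega \cdot (\Delta^{(l)} - \Delta^{(r)})) + s^{(o)}$.
To summarise, in matrix notation, we can halve the dimension of the instance $\tilde{s} = W_n \Delta + s$ by performing the folding step and deriving the following two instances of half the dimension:
$$\tilde{s}^{(+)} = 2^{-1} \cdot W_{n/2} \cdot (\Delta^{(l)} + \Delta^{(r)}) + s^{(e)}, \qquad (5)$$
$$\tilde{s}^{(-)} = 2^{-1} \cdot W_{n/2} \cdot \Omega \cdot (\Delta^{(l)} - \Delta^{(r)}) + s^{(o)}. \qquad (6)$$
Looking at the form of the sub-instance given by the "positive fold" (Equation (5)), it is clear that we can run a further divide and conquer step to reduce the dimension further.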
In fact, we can repeatedly divide and conquer the positive fold to reach any dimension we wish as illustrated in Figure 1. Considering the "negative fold", the additional scaling factor $\Omega$ prevents us from folding down further without rescaling the rows (and thus $s$). However, we note that on the lowest level, the attacker may still solve the negative branch.
Remark 1. Note that we can also attempt to divide and conquer on the inverse NTT directly in the hope of obtaining sub-instances with error terms of the form $s^{(l)} \pm s^{(r)}$ and secrets $\Delta^{(e)}$ or $\Delta^{(o)}$. Yet, when attempting to do this for the negacyclic NTT, we actually obtain sub-instances with errors of the form $s^{(l)} + \omega^{\pm n/4} s^{(r)}$ which are not guaranteed to be small. However, these instances are still susceptible to lattice attacks for limited folding levels.
A drawback of reducing to an extremely small dimension is that the secret becomes less sparse at each level, eventually to the point that its distribution approaches the uniform distribution. Nonetheless, performing only a limited number of folding steps can preserve sparsity. This is because if $\Delta := (\Delta^{(l)}, \Delta^{(r)})$ is very sparse, then $\Delta^{(l)} \pm \Delta^{(r)}$ is still expected to be sparse (albeit not as sparse as $\Delta$) and of the same Hamming weight as $\Delta$ when written in minimal BSDR. We will see later that a sparse minimal BSDR is the key to our lattice-based attack, so reducing to trivial dimension would be detrimental to our cold boot attack.
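The folding step can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the negacyclic NTT to be $\mathrm{NTT}_n(a)_i = \sum_j \gamma^j \omega^{ij} a_j$ with $\omega = \gamma^2$, uses the $(n/2)$-dimensional transform with $\gamma^2$ in place of $\gamma$, and evaluates the transforms in quadratic time for clarity. The toy vectors and all function names are chosen for this example.

```python
Q, N = 7681, 8


def find_gamma(n, q):
    """Return an element of order exactly 2n modulo the prime q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:   # g^n = -1 forces order 2n
            return g
    raise ValueError("no suitable root of unity found")


def ntt(a, gamma, q):
    """Quadratic-time negacyclic NTT with respect to gamma."""
    n, omega = len(a), gamma * gamma % q
    return [sum(pow(gamma, j, q) * pow(omega, i * j, q) * a[j] for j in range(n)) % q
            for i in range(n)]


def intt(a_hat, gamma, q):
    """Inverse of ntt(., gamma, q)."""
    n = len(a_hat)
    omega_inv, gamma_inv, n_inv = pow(gamma * gamma, -1, q), pow(gamma, -1, q), pow(n, -1, q)
    return [n_inv * pow(gamma_inv, i, q)
            * sum(pow(omega_inv, i * j, q) * a_hat[j] for j in range(n)) % q
            for i in range(n)]


gamma = find_gamma(N, Q)
omega = gamma * gamma % Q
half = N // 2

s = [3, -1, 0, 4, -2, 1, 0, -3]         # small secret (toy values)
delta = [0, 0, 64, 0, 0, -2, 0, 0]      # sparse bit-flip vector (toy values)
noisy = [(x + d) % Q for x, d in zip(ntt(s, gamma, Q), delta)]

# Halving relations on a noise-free transform.
s_hat = ntt(s, gamma, Q)
pos = [(s_hat[i] + s_hat[i + half]) % Q for i in range(half)]
neg = [(s_hat[i] - s_hat[i + half]) % Q for i in range(half)]
pos_expected = [2 * x % Q for x in ntt(s[0::2], omega, Q)]
neg_expected = [2 * gamma * pow(omega, i, Q) * x % Q
                for i, x in enumerate(ntt(s[1::2], omega, Q))]

# Positive fold of the noisy instance: halve, then invert the half-size NTT.
inv2 = pow(2, -1, Q)
fold = [(noisy[i] + noisy[i + half]) % Q for i in range(half)]
lhs = [inv2 * x % Q for x in intt(fold, omega, Q)]
delta_sum = [(delta[i] + delta[i + half]) % Q for i in range(half)]
rhs = [(inv2 * w + se) % Q for w, se in zip(intt(delta_sum, omega, Q), s[0::2])]
```

Here `pos == pos_expected` and `neg == neg_expected` check the two halving relations, while `lhs == rhs` checks that the positive fold is again an instance of the same form, with secret $\Delta^{(l)} + \Delta^{(r)}$ and error $s^{(e)}$ in dimension $n/2$.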

Extending a solution
We now show how to derive a solution to an $n$-dimensional instance given an oracle that solves just one of the child instances in dimension $n/2$. We instantiate such an oracle in Sections 6 and 7.
For this we note that given the solution to one of the sub-instances, we can derive a solution to the other. First assume that the minimal BSDR (or possibly elements of a minimal BSDR list) of $\Delta^{(l)} + \Delta^{(r)}$ has Hamming weight equal to that of $\Delta$. In other words, there was no decrease in minimal BSDR Hamming weight when performing the positive fold. Then each bit set in the minimal BSDR of $\Delta^{(l)} + \Delta^{(r)}$ (or the single correct element of the BSDR list) originates from either $\Delta^{(l)}$ or $\Delta^{(r)}$. Therefore, in order to guess $\Delta^{(l)} - \Delta^{(r)}$, we simply flip some bits in the minimal BSDR of $\Delta^{(l)} + \Delta^{(r)}$. We then check the correctness of the guess by substituting the value of $\Delta^{(l)} - \Delta^{(r)}$ back into the instance. Note that the list of minimal BSDRs is expected to be relatively short. For example, of the integers $\{1, \dots, 7680\}$, less than 4.92% have a BSDR list length of 4 or more when considering 13-bit representations. The maximum BSDR list length observed for these integers is 21 and occurs just 4 times. Since we will typically be encountering integers with low Hamming weight minimal BSDR, the length of the minimal BSDR lists ought to be shorter than suggested by these figures over $\{1, \dots, 7680\}$.
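The sign-flipping strategy for the case without any loss in Hamming weight can be sketched as follows. This is a simplified illustration: it works with integer representatives rather than residues modulo $q$, uses the non-adjacent form as the single minimal BSDR of each component (ignoring the list of alternatives), and omits the final check of each candidate against the instance. All names and the toy vectors are assumptions made for this example.

```python
from itertools import product


def naf(x):
    """Non-adjacent form of x (a minimal BSDR), least significant digit first."""
    digits = []
    while x != 0:
        d = 2 - (x % 4) if x % 2 else 0
        x = (x - d) // 2
        digits.append(d)
    return digits


def sign_flip_candidates(folded_sum):
    """Yield all vectors obtained by flipping signs of set bits of folded_sum.

    Each set bit of Delta^(l) + Delta^(r) is assumed to come from exactly one
    of the two halves; bits from the right half change sign in the difference.
    """
    digits = [naf(x) for x in folded_sum]
    positions = [(i, j) for i, ds in enumerate(digits) for j, d in enumerate(ds) if d]
    for signs in product((1, -1), repeat=len(positions)):
        cand = [0] * len(folded_sum)
        for (i, j), sign in zip(positions, signs):
            cand[i] += sign * digits[i][j] * 2**j
        yield tuple(cand)


delta_l = [4, 0, 0, -1]
delta_r = [1, 0, 2, 0]
folded = [l + r for l, r in zip(delta_l, delta_r)]             # known to the attacker
true_difference = tuple(l - r for l, r in zip(delta_l, delta_r))

candidates = set(sign_flip_candidates(folded))
```

In this toy example the folded vector has four set bits, so `candidates` contains $2^4 = 16$ vectors, one of which is `true_difference`. An attacker would identify it by substituting each candidate into the neighbouring instance.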
If the Hamming weight of the minimal BSDR of $\Delta^{(l)} + \Delta^{(r)}$ is different to that of $\Delta$, it has decreased with very high probability. For example, assume that each component of $\Delta^{(l)}$ is the result of at most a single bit-flip. Assume the same for $\Delta^{(r)}$. Performing a fold in such a case would mean that each component of $\Delta^{(l)} + \Delta^{(r)}$ is the result of at most two bit-flips. Therefore the minimal BSDR of each component should have Hamming weight at most 2. In what follows, a sum a + b is intended to represent folding where a is from $\Delta^{(l)}$ and b is from $\Delta^{(r)}$. Under this assumption, there are two cases where the Hamming weight decreases by 1:
(a) Two bits with the same sign and position collide after folding, e.g., (1 + 1) = 2, (−1 − 1) = −2.
(b) Two bits with opposite signs appear in consecutive positions after folding, e.g., (4 − 2) = 2, (−2 + 4) = 2.
The Hamming weight can also decrease by 2 if two bits with opposite signs collide, e.g., (1 − 1) = 0, (−1 + 1) = 0.
In light of these observations, we can still use a combinatorial approach to derive $\Delta^{(l)} - \Delta^{(r)}$ from $\Delta^{(l)} + \Delta^{(r)}$ even when folding caused the Hamming weight to decrease. For now, assume that the Hamming weight $\kappa$ of $\Delta$ is known, let $\kappa'$ denote the Hamming weight of $\Delta^{(r)} + \Delta^{(l)}$, and ignore the small factors arising from the non-uniqueness of the minimal BSDR. We perform one of the following three guessing strategies depending on $\kappa - \kappa'$:
0: Flip signs of $\Delta^{(l)} + \Delta^{(r)}$ to guess $\Delta^{(l)} - \Delta^{(r)}$; $2^{\kappa}$ guesses required.
1: Assume the Hamming weight decreased by 1 due to a single occurrence of case (a) or (b) above; the number of guesses includes the factor $3\kappa$ discussed below.
2: Assume the Hamming weight decreased by 2 due to a single collision in bits with opposing signs; at most $(n/2 \cdot \log(q) - \kappa) \cdot 2 \cdot 2^{\kappa}$ guesses required.
Note that the $3\kappa$ factor arises because we must choose one out of the $\kappa$ bits that directly resulted from the Hamming weight decrease, and then there are at most three ways that this spurious bit occurred. For example, suppose the spurious bit represented the integer 2. Then it could be that this value arose from (1 + 1), (4 − 2) or (−2 + 4). The $(n/2 \cdot \log(q) - \kappa)$ factor arises in the third case because we must choose a 0 bit that arose from a collision and there are at most $(n/2 \cdot \log(q) - \kappa)$ bits that are set to 0.
There is a chance that this guessing approach fails. In order to increase the probability of success, we would have to perform additional guessing phases where we try to correct multiple spurious bits assuming various configurations. However, our experimental results below show that performing the three guessing phases above already yields a good probability of success. We also note that in a cold boot attack the exact value of $\kappa$ is not known. In this case, the attacker starts by assuming $\kappa = \kappa'$, followed by $\kappa = \kappa' + 1$ and $\kappa = \kappa' + 2$. We note that this is sufficient to achieve a high rate of success.
Furthermore, an attacker may also directly solve the problem of the neighbour branch $\Delta^{(l)} - \Delta^{(r)}$. Indeed, given $\Delta^{(l)} + \Delta^{(r)}$, we can eliminate either $\Delta^{(l)}$ or $\Delta^{(r)}$ from the neighbour instance to obtain a problem in either $\Delta^{(l)}$ or $\Delta^{(r)}$. This new problem will have associated Hamming weight roughly $\kappa/2$. Furthermore, since $\kappa < n$ there is a very high probability that a known value $(\Delta^{(l)})_i + (\Delta^{(r)})_i = 0$ is indeed the result of adding $(\Delta^{(l)})_i = 0$ and $(\Delta^{(r)})_i = 0$. Thus, the dimension of the neighbour instance can be further reduced by eliminating those components, producing a rather easy instance.
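The trivial guessing phase (no decrease in Hamming weight) can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation used for the experiments: the helper names `naf` and `sign_flip_guesses` are hypothetical, the non-adjacent form is used as one particular minimal-weight BSDR (the text works with lists of minimal BSDRs), and the check against the known answer stands in for substituting a guess back into the instance.

```python
from itertools import product

def naf(x):
    """Return the non-adjacent form of x as (bit position, sign) pairs.

    The NAF is one minimal-Hamming-weight BSDR of x.
    """
    digits, pos = [], 0
    while x != 0:
        if x % 2:
            d = 2 - (x % 4)          # +1 if x = 1 mod 4, -1 if x = 3 mod 4
            digits.append((pos, d))
            x -= d
        x //= 2
        pos += 1
    return digits

def sign_flip_guesses(plus):
    """Yield all candidates for l - r obtained from plus = l + r by
    flipping the signs of set BSDR digits (2^weight candidates)."""
    digits = [(i, pos, sgn) for i, x in enumerate(plus) for pos, sgn in naf(x)]
    for flips in product((1, -1), repeat=len(digits)):
        cand = [0] * len(plus)
        for (i, pos, sgn), f in zip(digits, flips):
            cand[i] += f * sgn * (1 << pos)
        yield cand

# Toy halves of a sparse error vector (hypothetical values).
left, right = [4, 0, 0, -1], [0, 2, 0, 0]
plus = [a + b for a, b in zip(left, right)]    # positive fold
minus = [a - b for a, b in zip(left, right)]   # negative fold (unknown to the attacker)
guesses = list(sign_flip_guesses(plus))
found = minus in guesses                       # stands in for the verification step
```

A digit that came from the left half keeps its sign in the negative fold, while a digit from the right half changes sign, which is why enumerating sign patterns suffices when no collision occurred.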
Combining the solutions from the two neighbour instances yields a solution for the parent instance. Thus, a solution in dimension n implies a solution in dimension 2n, which can then be extended to solutions in 4n, 8n, . . . using the simple guessing approach above. The overall divide and conquer strategy can be summarised as follows:
1. Repeatedly divide and conquer the positive fold until a desired target dimension n has been reached.
3. (a) Given a solution to the positive fold, guess the solution to the negative fold and work the solution upwards.This costs in the order of operations multiplied by the number of folds.
(b) If guessing fails, solve the negative instance directly, using partial information about ∆ (l) or ∆ (r) .
4. Repeat the previous step until the full solution is recovered.
Table 3 uses Kyber parameters with κ = 19 bits flipped to give an overview of how the Hamming weight of $\Delta$ evolves as we fold multiple times. Assuming two folds, this shows a rough success rate of 74% when only considering the trivial phase of guessing to work a 64-dimensional solution upwards. However, when all three phases of guessing are used, we empirically estimate that the success probability is around 97% when working a solution up from dimension 64. The corresponding success probability with κ = 25 is 94%. These values were obtained by sampling 1,000 random vectors $\Delta$ with minimal BSDR of Hamming weight κ = 19 and 25 and then analysing the cause of a decrease in Hamming weight whenever this occurred. A breakdown of the statistics of 1,000 trials at the 128 to 64-dimensional fold is shown in Tables 4 and 5. In particular, we include how many times the Hamming weight decreases by 0, 1 and 2 as well as how many of these are solvable in the three simple guessing phases described above. We also report success rates of 98% and 96% for solving this particular fold for κ = 19 and κ = 25 respectively. We reiterate that even when the simple guess-and-verify algorithm presented here fails, we expect to be able to solve the neighbour branch by making use of partial information about $\Delta^{(l)}$ or $\Delta^{(r)}$. Thus, from now on, we will assume that the aspect of our attack introduced in this section always succeeds.

Lattice formulation
Our algorithm for solving the bottom level instance after applying repeated folding is inspired by the normal form of the primal attack on LWE. At a high level, the aim of this attack is to construct a lattice Λ which contains a vector v closest to (0, s), such that the offset between Λ and (0, s) is (∆, s). Then, finding this unique closest vector v to (0, s) allows us to recover (∆, s). The success of this attack depends on v being the unique closest vector. Heuristically, we can expect the attack to work if (∆, s) is shorter than the shortest vector in Λ. Looking at our instance in Equation (2), our "secret term" (interpreting the instance as LWE) is the vector ∆, which is not guaranteed to have small norm, but is guaranteed to be sparse. Note that we abuse notation slightly here and let Equation (2) refer to the bottom level instance after folding, i.e. ξ > 1 and ∆ is a vector obtained by repeated folding. As mentioned in the introduction, this setting is similar to that considered in [BCGN17, dBDJdW18]. Now, since we know that the component-wise minimal BSDR of ∆ will be small in norm, the idea is to construct a lattice resembling the primal attack lattice with an offset vector containing the minimal BSDR of ∆ in its components.
In fact, we will generalise this idea to construct a lattice with the $2^\ell$-ary signed digit representation of $\Delta$ as an offset. Let $b = \lceil \log_{2^\ell} q \rceil$ and $\Delta^{(\ell)} \in \mathbb{Z}^{nb}$ be the vector where all components of $\Delta$ are expanded in the $2^\ell$-ary signed digit representation of minimal norm, i.e. we consider $2^\ell$-SDR. Concretely, for Kyber the reader may assume $\ell = 7$ and thus $b = 2$. Now, let $W^{(\ell)} = W \otimes (1, 2^\ell, \ldots, 2^{\ell(b-1)}) \in \mathbb{Z}^{n \times nb}$ and $\theta \in \mathbb{Q}$ be some rational scaling factor. We take as our lattice
$$\Lambda := \{ x \in \mathbb{Z}^{nb} \times \mathbb{Q}^{n} : (W^{(\ell)} \mid (1/\theta) I_n) \cdot x \equiv 0 \bmod q \}.$$
Concretely, a basis for this (nb + n)-dimensional lattice can be constructed from the rows of a matrix $B$ involving $(W^{(\ell)})^T$, where $(\cdot)^T$ denotes a transpose. Our aim is that $v := (0, \theta s) - (\Delta^{(\ell)}, \theta s) \in \Lambda$ is the closest lattice vector to $(0, \theta s)$. To estimate whether this is the case, we need to estimate the norm of the offset vector $(\Delta^{(\ell)}, \theta s)$ and the length of the shortest vector in Λ, denoted by $\lambda_1(\Lambda)$.
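To make the construction concrete, the following sketch builds one basis that is consistent with the congruence $(W^{(\ell)} \mid (1/\theta) I_n) \cdot x \equiv 0 \bmod q$ and checks membership of its rows. Everything here is an assumption for illustration: the helper names are hypothetical, the block layout $(I_{nb} \mid -\theta (W^{(\ell)})^T)$ stacked on $(0 \mid \theta q I_n)$ is merely one valid choice and its sign convention may differ from the one used in the paper, and a random matrix stands in for the NTT-derived $W$.

```python
from fractions import Fraction
import random

def expand(W, ell, b):
    """W tensor (1, 2^ell, ..., 2^(ell*(b-1))): each entry becomes b scaled copies."""
    return [[w * (1 << (ell * k)) for w in row for k in range(b)] for row in W]

def basis(W, q, ell, b, theta):
    """Rows of (I_nb | -theta * Wl^T) stacked on (0 | theta * q * I_n)."""
    Wl = expand(W, ell, b)
    n, nb = len(W), len(W) * b
    rows = []
    for j in range(nb):
        unit = [Fraction(int(i == j)) for i in range(nb)]
        rows.append(unit + [-theta * Wl[i][j] for i in range(n)])
    for i in range(n):
        rows.append([Fraction(0)] * nb + [theta * q * int(k == i) for k in range(n)])
    return rows

def in_lattice(x, W, q, ell, b, theta):
    """Check (Wl | (1/theta) I_n) * x = 0 mod q for a candidate vector x."""
    Wl = expand(W, ell, b)
    nb = len(W) * b
    for i in range(len(W)):
        val = sum(Wl[i][j] * x[j] for j in range(nb)) + Fraction(x[nb + i]) / theta
        if val.denominator != 1 or val % q != 0:
            return False
    return True

random.seed(1)
q, ell, b, n, theta = 7681, 7, 2, 4, Fraction(3)
W = [[random.randrange(q) for _ in range(n)] for _ in range(n)]  # stand-in for the real W
B = basis(W, q, ell, b, theta)
rows_ok = all(in_lattice(row, W, q, ell, b, theta) for row in B)
# A vector with (2^ell, -1) inside one chunk of b entries cancels the tensor product.
undo = [1 << ell, -1] + [0] * (n * b - 2) + [0] * n
undo_ok = in_lattice(undo, W, q, ell, b, theta)
```

Exact rational arithmetic via `Fraction` keeps the check independent of floating-point error when θ is not an integer.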
For LWE, $\lambda_1(\Lambda)$ is estimated using the Gaussian heuristic. This is well justified for the LWE case where A is a uniformly random matrix mod q. However, the tensor product in $W^{(\ell)}$ means that there are two classes of unusually short vectors in Λ. The first class contains vectors of the form $(0, \ldots, 0, 2^\ell, -1, 0, \ldots, 0)$ where the last n components are 0 and the $2^\ell$ and $-1$ belong to the same chunk of b entries. This vector essentially "undoes" the tensor product, producing zero in the part corresponding to $W^{(\ell)}$. This vector has norm $\approx 2^\ell$, e.g. 128 in our Kyber-based running example.
In addition to these short vectors, we must consider the expected length of the shortest vector in Λ ignoring such unusually short vectors.We will denote this length as λ 1 (Λ).As mentioned above, if W were uniformly random, we could follow the usual strategy and consider the Gaussian heuristic to estimate this norm as: However, as we will discuss in Section 7 the Gaussian Heuristic does not hold in our case.Thus, we will establish λ 1 (Λ) empirically using strong lattice reduction.Now, we expect that the unique vector v ∈ Λ closest to (0, θ s) satisfies v + (∆ ( ) , θs) = (0, θ s) when the following three conditions are all met: We note that the above conditions imply that we expect that a unique closest vector to our target exists.It does not, by itself, imply that it is efficient to recover it.Furthermore, we need to estimate the expected length of the vector (∆ ( ) , θs).Assuming κ n bit-flips and (ρ 0 +ρ 1 )•log 2 q 1 (so that each non-zero component of ∆ is with high probability the result of a single bit-flip), we have that ∆ ( ) 2 ≈ κ 4 −1 3 , cf.Proposition 1.We then expect that where σ is the standard deviation of the secret distribution.
Example 2. To carry out the analysis for Kyber, we pick $\ell = 7$ which means $\|q^{(\ell)}\|^2 = 3601$ and $\lceil \log_{2^\ell}(q) \rceil = 2$. Thus, we heuristically require our offset vector to have squared norm < min(16385, 3601). Even picking a very small θ, i.e. ignoring the third condition above, this implies that we can only satisfy our constraints for κ ≤ 15.
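The two bounds in Example 2 can be checked with a few lines of arithmetic. The sketch below is only a sanity check; the helper `base_digits` is a hypothetical name, and it assumes that $q^{(\ell)}$ denotes the digits of q written in base $2^\ell$.

```python
def base_digits(x, ell):
    """Digits of a non-negative integer x in base 2^ell, least significant first."""
    digits = []
    while x:
        digits.append(x % (1 << ell))
        x >>= ell
    return digits

q, ell = 7681, 7
q_digits = base_digits(q, ell)               # 7681 = 1 + 60 * 128
norm_q_sq = sum(d * d for d in q_digits)     # squared norm of the digit vector of q
norm_undo_sq = (1 << ell) ** 2 + 1           # squared norm of (2^ell, -1)
bound = min(norm_undo_sq, norm_q_sq)
```

Here the digit vector of q is the binding constraint, which is what motivates removing it from the lattice later on.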

A guessing strategy
To shorten the distance between the lattice and our target vector we simply guess the bits of $\Delta$ that contribute most significantly to the norm of $\Delta^{(\ell)}$. To formalise this approach, we define a "band size" β that describes which bits we consider as contributing significantly to $\Delta^{(\ell)}$. For example, suppose we choose some $\ell \geq 2$ and a band size of $\beta < \ell$.
Then we consider the top β bits of each entry in $\Delta^{(\ell)}$ (written in minimum Hamming weight BSDR) as being significant. We write $\Delta^{(\ell,\uparrow)}$ for the part of $\Delta^{(\ell)}$ in the significant band and $\Delta^{(\ell,\downarrow)}$ for the remaining part.
Our "guessing approach" is simply to guess $\Delta^{(\ell,\uparrow)}$ and use the basic primal attack to find the short vector $\Delta^{(\ell,\downarrow)}$. Note that assuming sparsity, the norm of $\Delta^{(\ell,\downarrow)}$ is smaller than that of $\Delta^{(\ell)}$ so it is more likely that the primal attack will succeed. More concretely, once we have guessed $\Delta^{(\ell,\uparrow)}$, we define $s^{(\downarrow)} := s - W^{(\ell)} \Delta^{(\ell,\uparrow)}$ and target offset vector $(\Delta^{(\ell,\downarrow)}, \theta s)$. Now to investigate when $(\Delta^{(\ell,\downarrow)}, \theta s)$ is likely to be the offset to the unique closest vector in Λ, we begin by assuming some fixed $\ell$ and $\beta < \ell$ and calculating the expected length of $\Delta^{(\ell,\downarrow)}$. For every individual entry of $\Delta^{(\ell)}$, there are $\ell - \beta$ bits in the non-significant band and β bits in the significant band. Therefore, assuming κ bit-flips in total, we would expect roughly $\frac{\ell - \beta}{\ell} \kappa$ bit-flips in $\Delta^{(\ell,\downarrow)}$. Assuming $\kappa \ll n$ (i.e. sparsity of bit-flips), we expect that At this point, we can reuse the three success conditions detailed above, as the characteristic properties of Λ remain unchanged.
We refer to the process of removing the top-most bits of a vector as "shaving". This process is parametrised by a band size β and a maximum number of bits to correct, α. Setting α to be less than the expected number of bits set in the top band has the advantage of yielding a smaller number of potential guesses, but there is also the disadvantage that there may still be a few bits set in the top band. If there are some bits still set in the top band, then the candidate vector $\Delta^{(\ell,\downarrow)}$ may still be too long. The number of possible guesses for the top band with at most α bits flipped is where the factor $2^i$ takes care of the fact that each set bit-flip takes values in {−1, 1} when multiple folding steps have been performed. If we have not folded, the factor of $2^i$ may be omitted since the signs of the bit-flips are known.
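As a rough illustration of how the number of guesses grows with α, the sketch below counts the ways to place at most α flipped bits among a given number of candidate positions, with an optional factor $2^i$ for unknown signs. The closed form $\sum_{i=0}^{\alpha} \binom{N}{i} 2^i$ is an assumption inferred from the description of the $2^i$ factor, and both the function name `num_guesses` and the example numbers are hypothetical.

```python
from math import comb, log2

def num_guesses(positions, alpha, signed=True):
    """Count guesses with at most `alpha` flipped bits among `positions` candidates.

    With signed=True each of the i chosen bits may be +1 or -1 (folded case);
    with signed=False the signs are known (no folding).
    """
    return sum(comb(positions, i) * ((1 << i) if signed else 1)
               for i in range(alpha + 1))

# Hypothetical example: 32 components, 3 unknown top-band bits each, up to 4 corrections.
positions = 32 * 3
cost_signed = log2(num_guesses(positions, 4, signed=True))
cost_unsigned = log2(num_guesses(positions, 4, signed=False))
```

Lowering α shrinks this count quickly, at the price of sometimes leaving a significant bit uncorrected, which is the trade-off described above.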
Example 3. Returning to our running example, we analyse the case $\ell = 7$ again. Firstly, there are $256 \cdot 2\beta$ bits in the significant band. Note the factor of 2 due to the fact that each element of $\mathbb{Z}_{7681}$ requires two integers when written in base $2^7$. However, since $7681 < 2^{13}$, the top most bit of each element of $\mathbb{Z}_{7681}$ must be 0. This leaves $256 \cdot (2\beta - 1)$ unknown bit positions where we must correct bit-flips. There is an average of $\frac{2\beta - 1}{13} \cdot \kappa$ bit-flips in the unknown part of the significant band. The maximum κ such that (9) < 3601 with θ arbitrarily small, i.e. we are ignoring the second summand in (9), is given in Table 6. We use Equation (10) with α set to the expected number of bit-flips to estimate the number of guesses required for κ = 19 bit-flips in total.

Even strategy. As illustrated in Example 3, the existence of vectors $q^{(\ell)}$ is a main limiting factor for ensuring that our offset vector is unusually short. To remove this class of vectors from our lattice, we focus on resolving bit-flips in the least significant bits of the components of $\Delta$. Assume for the moment that this has been achieved, and $\Delta_i \bmod 2 \equiv 0$ for all $0 \leq i < n$. Then, instead of considering $W \otimes (1, 2^\ell, \ldots, 2^{\ell(b-1)}) \in \mathbb{Z}^{n \times nb}$ we may consider
$$W \otimes (2, 2^\ell, \ldots, 2^{\ell(b-1)}) \in \mathbb{Z}^{n \times nb},$$
i.e. scale the rows $0, b, 2b, \ldots, (n-1)b$ of B by a factor of two. Since $q \bmod 2 \equiv 1$ we cannot write q as a linear combination of $2, 2^\ell, \ldots, 2^{\ell(b-1)}$. This removes the annoying vectors $q^{(\ell)}$ from our lattice. To ensure $\Delta_i \bmod 2 \equiv 0$, as assumed here, we may apply a similar guessing strategy as discussed above. However, we note that this comes with some additional cost for guessing and correcting the least significant bits of the components of $\Delta$.
Finally, we stress that our analysis so far uses expected values throughout. In Figure 2, we plot an example histogram of $\|(\Delta^{(\ell)}, \theta s)\|^2$ against our expectation for κ = 19 and θ = 3. As illustrated in Figure 2, the actually observed distribution has a large variance. Thus, to estimate the cost of our attack, we will derive parameters from empirical evidence.
Note on Figure 2: because $\log_2 q < 14$, half of our entries are bounded by $2^6$ instead of $2^7$. This is taken into account when we compute the expectation in this figure.

BDD on NTT lattices
So far, we have only analysed the existence of a unique closest vector to our target. The last ingredient of our attack is to find this vector, i.e. a vector in $\Lambda := \{x \in \mathbb{Z}^{n \lceil \log_{2^\ell} q \rceil} \times \mathbb{Q}^n : (W^{(\ell)} \mid (1/\theta) I_n) \cdot x \equiv 0 \bmod q\}$ that is close to (0, θs). Concretely, for Kyber we set $\ell = 7$ and n = 32, where n ≥ 16 is chosen to preserve sparsity for $\rho_0 = 1.0\%$, $\rho_1 = 0.1\%$, where we expect κ = 19 bit-flips. To consider the geometry of the lattice spanned by our instances, consider the smaller case n = 4, θ = 1 (since it fits on this page). We obtain the q-ary lattice basis where all of the omitted entries are zero. Note that the lattice spanned by B contains the unusually short vector (1, 0, 1, 0, 1, 0, 1, 0, 4, 0, 0, 0). This vector is not an artefact of the tensor product but an artefact of B being derived from an NTT matrix: it corresponds to folding all the way down to dimension n = 1. More generally, the geometry of the q-ary lattices Λ considered in this work is far from what we would expect from a random q-ary lattice. In Figure 3, we plot the lengths of the Gram-Schmidt vectors of a BKZ-90 reduced basis for a lattice Λ corresponding to folding our 256-dimensional instance down to dimension n = 32. This lattice has dimension $96 = \lceil \log_{2^7} q \rceil \cdot n + n$. For comparison, we also plot the expected lengths of the Gram-Schmidt vectors according to the Geometric Series Assumption which approximates the behaviour of random q-ary lattices reasonably well.
Due to this unusual geometry, we cannot readily apply standard estimates for lattice reduction.As a case in point, computing a BKZ-90 reduced basis of the 96-dimensional lattice in Figure 3 took less than an hour with FPLLL [FPL17], i.e. reducing this basis is considerably faster than expected for random q-ary lattices.
Thus, to find the vector v ∈ Λ closest to (0, θs), we proceed as follows. First, we remove the unusually short vector that corresponds to folding all the way down to n = 1. This is accomplished by guessing the value of $\Delta_0$ and considering the sublattice spanned by the rows of Λ except for the first $\log_2 q$ rows. Pessimistically, we expect that this increases our guessing cost by a factor of $\log_2 q$. We refer to this smaller basis as $B'$ and call d the dimension of the lattice spanned by $B'$. Then, we compute a high-quality basis for the lattice spanned by $B'$. In particular, for n = 32 we compute a BKZ-90 reduced basis. Then, for each guess as in Section 6.1, we perform one pruned BDD enumeration in dimension bs = min(60, d), i.e. the bs-dimensional sub-lattice orthogonal to the first d − bs vectors in $B'$. We heuristically expect that BDD enumeration in block size bs will find the closest vector iff the projection condition (11) holds. The right-hand argument in (11) takes care of the fact that there is little point in enumerating beyond the length of the shortest vector in the projected sub-lattice if we are targeting a unique closest vector. We illustrate the expected behaviour in Figure 4, where we plot the projected norms for 256 samples of $(\Delta^{(\ell)}, \theta s)$ against the norms of the Gram-Schmidt vectors for our reduced basis $B'$ for θ = 3. Note that in contrast to Figure 3, the basis in Figure 4 is $B'$ and not $B$. We expect enumeration to succeed for every grey line that stays below the Gram-Schmidt vectors for all indices < d − bs. Figure 4 illustrates that we can improve our probability of success by increasing the enumeration dimension at the cost of increasing the running time. Note that the algorithm may still succeed when the heuristic success condition discussed above is not satisfied due to the orientation of the vectors involved. Therefore, we use the empirical evidence (cf. Tables 7-11) to establish the success rate.
The experiments we performed are as follows. We sampled random sparse binary vectors ∆ in dimension n for various κ and constructed a corresponding cold boot NTT decoding problem. We then folded this instance down to dimension n = 32 and ran the guessing part of the algorithm for some parameters α, β. Since the cost of the guessing part of the attack is easy to predict, we simulated it by always picking the best "shaving" under the constraints imposed by α, β. This is implemented as the shave function in an appendix of the full version. We then ran lattice point enumeration to recover the offset vector; this is implemented in the function offset_vector. We report success when the returned vector matches the norm of our target exactly and failure otherwise. In summary, we implemented the full attack on the n = 32 sub-problem except for the guessing part.
We note that we also implemented and verified extending the solution upwards as described in Section 5.1. We summarise the observed behaviour of our algorithm for solving the bottom-level n = 32 instance in Tables 7-11. These tables illustrate the trade-off between the two pruned exhaustive search steps in our algorithm, the first searching for set higher-order bits, the second searching for lattice points. Increasing one reduces the other. Furthermore, according to our empirical evidence, the "even" strategy may provide a small gain in some cases, but it is not overall more efficient than the "not even", i.e. "odd" strategy. All numbers in these tables were obtained using the proof of concept implementation in Sage [S + 17] available in the full version. To establish the cost of the enumeration, we use the number of nodes in the pruned enumeration tree as reported by the Pruner class from FPLLL/FPYLLL [FPL17, FPY18]. Processing each node is generally assumed to take about 100 CPU cycles [FPL17].

Putting it all together

Kyber KEM
We now draw together Sections 5-7 to give a concise account of our attack and its performance on the Kyber KEM.Recall that we have 3 instances of the form s = W n ∆ + s for a single private key in Kyber, with n = 256.We first establish some notation.Below, "label, (n, m)" indicates that the instance with "label" in Figure 1 has n variables s i and that each error term is the sum of m original error terms ∆ j .Note that in Figure 1, the label of a node is given by the subscript in ∆.

Table 7: Experimental results for Kyber parameters and number of bit-flips κ = 5 ($\rho_0 = 0.2\%$, $\rho_1 = 0.1\%$); θ is the scaling factor of our lattice, α the number of bits we guess in a band of size β. In the "even" case we target the least significant bits of the components of ∆ first. The column "guess" holds the number of guesses before lattice enumeration which includes the cost of guessing $\Delta_0$, the column "enum" holds the number of nodes in the pruned lattice-point enumeration tree. The column "total" is the product of the two. All costs are given as $\log_2(\cdot)$. The column "rate" is the success rate over 200 experiments. Only parameters with success rate ≥ 60% are shown. The minimal total cost is highlighted in bold and used in Table 1.

For each of our three sub-problems we perform the following steps:
1. Divide and conquer 3 times to obtain two bottom level instances +++ and ++- as in Section 5.
2. Solve at least one bottom level instance using combinatorial and lattice-reduction techniques as in Sections 6 and 7. The cost and expected success rate for solving one such instance are given in Section 7. If solving one instance succeeds with probability $p_0$, we assume that this step succeeds with probability $1 - (1 - p_0)^2$, i.e. we assume the two bottom level instances are sufficiently different.
3. Substitute the solution obtained into the instance ++. This reduces it from (64, 4) to (32, 4). Solve this instance as in Sections 6 and 7. Note that solving this instance is much easier than in the previous step since the Hamming weight of the noise is reduced to ≈ κ/2. We assume this step always succeeds.
4. Work the solution of ++ upward to + by solving +- using the information from ++ as in Section 5.1. This step succeeds with probability $p_1$ and we assume that it is cheaper than the previous steps.
5. Work the solution of + upward to "root" by solving - using the information from + as in Section 5.1. We assume this step always succeeds and we assume that it is cheaper than the previous steps.
Thus, the overall complexity of recovering 256 components of the Kyber secret is to run the lattice attack from Section 7 three times (steps 2 and 3), and the attack succeeds with probability $p_1 \cdot (1 - (1 - p_0)^2)$. In particular, for our choice of parameters we have $p_1 \approx 1$ and $p_0 > 0.6$ and thus expect success with probability > 0.84. For example, with κ = 19, Table 9 shows that we can solve the hardest BDD problem with a cost of $2^{43.3}$ and success probability $p_0 = 0.705$. Since this is by far the most expensive stage of the attack, we report an attack cost of enumerating $\approx 2^{43.3}$ nodes in an enumeration tree where each node requires about 100 CPU cycles to process and a $p_1 \cdot (1 - (1 - p_0)^2) \approx 0.91$ success probability. We can attack each of the k = 3 module elements separately and combine the final solution. We note that the attacker can detect with high probability when a sub-solution is incorrect and thus invest more computational resources to increase the chance of success. We summarise our results in Table 1.
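The success probabilities quoted above follow directly from the expression $p_1 \cdot (1 - (1 - p_0)^2)$. The short sketch below (the function name `attack_success` is a hypothetical label) evaluates it for the two values of $p_0$ mentioned in the text, taking $p_1 = 1$ as an approximation of $p_1 \approx 1$.

```python
def attack_success(p0, p1=1.0):
    """Probability that at least one of two bottom-level instances is solved
    (each independently with probability p0) and that working the solution
    upwards succeeds (probability p1)."""
    return p1 * (1.0 - (1.0 - p0) ** 2)

lower_bound = attack_success(0.6)    # p0 > 0.6 is the cut-off used for the Kyber tables
kappa_19 = attack_success(0.705)     # p0 reported for kappa = 19
```

The squaring reflects the assumption that the two bottom level instances fail independently.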
The attack needs to be run k = 3 times to recover a full Kyber secret. If a solution cannot be obtained for one of the three secret ring elements, then the solutions of the other two sub-problems can be substituted back into the original MLWE problem for Kyber's public key. This reduces the effective dimension of the public key to n = 256. An attacker could then target this smaller RLWE instance. Solving such an instance costs roughly $2^{77}$ according to the LWE estimator from [APS15], again assuming that enumeration is used to realise the SVP oracle inside BKZ. As suggested above, an attacker could alternatively attempt to re-run our cold boot attack on the remaining unknown secret element with different parameter choices from Tables 7-11. This would boost the probability of success at the expense of a greater computational cost.

New Hope KEM
We now move away from our MLWE-based example of Kyber KEM and give a concise account of the performance of our attack on the RLWE-based New Hope KEM [PAA + 17]. The parameters used are n = 1024, q = 12289 and the secret polynomials have coefficients lying in the set {0, ±1, . . ., ±8}. Similarly to Kyber KEM, New Hope uses an NTT to store its secret keys, meaning that we can launch the same cold boot attack. An important distinction between the Kyber and New Hope cases is that, for Kyber, we obtain multiple independent cold boot instances, each one corresponding to an individual polynomial in the secret key; this leads to multiple instances of relatively low dimension for Kyber. However, in the case of New Hope, we have just one cold boot instance in a large dimension. This distinction between MLWE- and RLWE-based schemes holds true in general for our cold boot attack in the NTT domain.
We focus our attention on the lattice aspect of the attack, assuming that we have folded the New Hope 1024-dimensional cold boot instance repeatedly to reach a 32-dimensional instance using the methods in Section 5. We can then experimentally estimate the success rate of solving this bottom level instance for various choices of θ, α, β using the methods in Section 7 with b = 2 and $\ell = 7$. The results for κ = 10, 19, 25, 30 are given in Tables 12-15. Note that the value κ = 19 roughly corresponds to the limiting cold boot case of $\rho_0 = 0.17\%$, $\rho_1 = 0.1\%$ where liquid nitrogen is used to cool the RAM chip. We now reuse the analysis and notation from Section 8.1 to estimate the running time and success probability of the full attack on New Hope. The success probability of the attack is $\approx p_1 \cdot (1 - (1 - p_0)^2)$ where $p_1$ is the success probability of working a bottom level solution up and $p_0$ is the probability of successfully solving a bottom level instance. Once again, we assume that this aspect of the attack can be performed successfully with probability $p_1 \approx 1$ without dominating the complexity of the overall attack. To determine $p_0$, we use the results from Tables 12-15. A summary of our results for κ = 19, 25, 30 is given in Table 2.

Table 12: Experimental results for New Hope parameters and number of bit-flips κ = 10; θ is the scaling factor of our lattice, α the number of bits we guess in a band of size β. In the "even" case we target the least significant bits of the components of ∆ first. The column "guess" holds the number of guesses before lattice enumeration which includes the cost of guessing $\Delta_0$, the column "enum" holds the number of nodes in the pruned lattice-point enumeration tree. The column "total" is the product of the two. All costs are given as $\log_2(\cdot)$. The column "rate" is the success rate over 100 experiments. Only parameters with success rate ≥ 50% are shown. The minimal total cost is highlighted in bold and used in Table 2.

A.1 Linear complexity
In this section, we will be considering sequences of elements in a field $\mathbb{Z}_q$ where q is prime. Linear feedback shift registers (LFSR) for binary sequences are well known as a concept.
We will be considering LFSRs over a field $\mathbb{Z}_q$, i.e. shift registers where the input (or feedback function) is a linear combination (over $\mathbb{Z}_q$) of the current register values.
Definition 7 (Linear Complexity).The linear complexity of a sequence is the length of the shortest LFSR generating the sequence.
Definition 8 (Connection Polynomial). Suppose an LFSR produces a sequence via the relation Then the connection polynomial of this LFSR is defined to be C(D)
Remark 2. The linear complexity need not be equal to the degree of the minimal connection polynomial for finite sequences (see the example below). However, these two quantities are equal when considering infinite periodic sequences with a finite period.
The linear complexity and minimal connection polynomial of any finite (or infinite periodic) sequence can be calculated in polynomial time using the Berlekamp-Massey algorithm [Mas69].This algorithm is extremely generic as it accounts for sequences over any field and does not restrict to periodic sequences.
We now briefly overview the structure of the Berlekamp-Massey algorithm. Suppose we wish to find the linear complexity and connection polynomial of the finite sequence $(a_0, \ldots, a_{n-1})$. Then the Berlekamp-Massey algorithm iteratively calculates the linear complexity and connection polynomial of each subsequence $a_0, \ldots, a_i$ for $i = 0, \ldots, n - 1$. Suppose we have just completed the $(k-1)$-th loop and have arrived at a linear complexity of $l_{k-1}$ and connection polynomial $C_{k-1}(D)$ (recall that some of its coefficients may be 0) for the subsequence $(a_0, \ldots, a_{k-1})$. To start the $k$-th iteration, we calculate the discrepancy $d$. This tells us how far $C_{k-1}(D)$ is from being the connection polynomial of the subsequence $(a_0, \ldots, a_k)$. There are three cases to consider when updating the linear complexity and connection polynomial. The Berlekamp-Massey algorithm gives explicit formulae for updating linear complexities and connection polynomials depending on which of the three cases is relevant. For a rigorous proof of correctness, see [Mas69]. The pseudocode for the Berlekamp-Massey algorithm is given as Algorithm 1.
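For readers who want to experiment, here is a compact sketch of the Berlekamp-Massey iteration over $\mathbb{Z}_q$ in its textbook form. It is not a transcription of Algorithm 1: the function name is hypothetical, and it assumes the common convention that a connection polynomial $C(D) = 1 + c_1 D + \ldots + c_L D^L$ corresponds to the recurrence $a_N + c_1 a_{N-1} + \ldots + c_L a_{N-L} \equiv 0 \bmod q$. The three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
def berlekamp_massey(seq, q):
    """Return (L, C): linear complexity and connection polynomial coefficients
    [1, c_1, ..., c_L] of seq over Z_q, q prime."""
    C, B = [1], [1]      # current polynomial, and the one before the last length change
    L, m, b = 0, 1, 1    # complexity, steps since last length change, old discrepancy
    for N in range(len(seq)):
        # Discrepancy: how far C is from predicting seq[N].
        d = sum(C[i] * seq[N - i] for i in range(L + 1)) % q
        if d == 0:
            m += 1                       # case 1: nothing changes
            continue
        coef = d * pow(b, -1, q) % q
        T = C[:]
        if len(C) < len(B) + m:
            C = C + [0] * (len(B) + m - len(C))
        for i, bi in enumerate(B):       # C <- C - (d/b) * D^m * B
            C[i + m] = (C[i + m] - coef * bi) % q
        if 2 * L > N:
            m += 1                       # case 2: polynomial changes, L does not
        else:
            L, B, b, m = N + 1 - L, T, d, 1   # case 3: L jumps to N + 1 - L
        C = C + [0] * (L + 1 - len(C))   # pad so that C always has L + 1 coefficients
    return L, C

# Fibonacci numbers mod 7 satisfy a_N - a_{N-1} - a_{N-2} = 0.
fib_mod7 = [0, 1, 1, 2, 3, 5, 1, 6, 0, 6, 6, 5]
L_fib, C_fib = berlekamp_massey(fib_mod7, 7)
```

In this convention the Fibonacci example yields complexity 2 with $c_1 = c_2 = -1 \equiv 6 \bmod 7$.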
The second component of our attack is the following theorem:
Theorem 1 (Blahut). Let q be a prime such that there exists an n-th primitive root of unity in $\mathbb{Z}_q$ and let NTT(•) denote a traditional NTT of dimension n over $\mathbb{Z}_q$. For any $s \in \mathbb{Z}_q^n$, define (ŝ) := (NTT(s), NTT(s), . . . ) to be the sequence comprising infinitely many copies of NTT(s). Then LC((ŝ)) = HW(s).
Blahut's Theorem has been proven for the traditional NTT. However, in this work we are considering the negacyclic NTT. It turns out that the correctness of Blahut's Theorem for the negacyclic NTT follows straightforwardly from the traditional case:
Lemma 1 (Negacyclic Blahut). Let q be a prime such that there exists a 2n-th primitive root of unity in $\mathbb{Z}_q$ and let NTT(•) denote a negacyclic NTT of dimension n over $\mathbb{Z}_q$. For $s \in \mathbb{Z}_q^n$, define (ŝ) := (NTT(s), NTT(s), . . . ) to be the sequence comprising infinitely many copies of NTT(s). Then, from Blahut's theorem for the traditional NTT, LC((ŝ)) = HW(s).
Proof. In this proof, we denote whether an NTT is negacyclic or traditional using neg or trad in the subscript. Let $\omega \in \mathbb{Z}_q$ be a primitive n-th root of unity and $\gamma \in \mathbb{Z}_q$ be a square root of ω. Also let $g = (1, \gamma, \gamma^2, \ldots, \gamma^{n-1})$ and let $\odot$ denote the component-wise multiplication of vectors. We then have
sequence is reached. Examples of linear complexity profiles in both these cases are given in Figure 6.
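The relationship between the two transforms that the proof relies on can be checked numerically. The sketch below assumes the usual definitions, which are not spelled out in this section: the traditional NTT evaluates at powers of ω, the negacyclic NTT evaluates at odd powers of γ, and twisting the input by $g = (1, \gamma, \ldots, \gamma^{n-1})$ maps one onto the other. The toy parameters q = 17, n = 8, γ = 3 and all function names are hypothetical.

```python
import random

q, n = 17, 8
gamma = 3                   # 3 has order 16 mod 17, i.e. a primitive 2n-th root of unity
omega = gamma * gamma % q   # primitive n-th root of unity

def ntt_trad(x):
    """Traditional (cyclic) NTT: evaluate at omega^i."""
    return [sum(x[j] * pow(omega, i * j, q) for j in range(n)) % q for i in range(n)]

def ntt_neg(x):
    """Negacyclic NTT: evaluate at the odd powers gamma^(2i+1)."""
    return [sum(x[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]

def twist(x):
    """Component-wise product with g = (1, gamma, ..., gamma^(n-1))."""
    return [x[j] * pow(gamma, j, q) % q for j in range(n)]

def hamming_weight(x):
    return sum(1 for v in x if v % q != 0)

random.seed(0)
samples = [[random.randrange(q) if random.random() < 0.3 else 0 for _ in range(n)]
           for _ in range(20)]
transforms_agree = all(ntt_neg(s) == ntt_trad(twist(s)) for s in samples)
weights_agree = all(hamming_weight(twist(s)) == hamming_weight(s) for s in samples)
```

Since every entry of g is non-zero, the twist preserves the Hamming weight, so a statement about Hamming weights for the traditional NTT carries over to the negacyclic one.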

A.2 Attack description
Suppose we are given a noisy version of a secret key with low Hamming weight w in the NTT domain. We will show that the Berlekamp-Massey algorithm implicitly yields a strategy for finding such a key given 2w consecutive error-less symbols. The logic behind the attack is that the connection polynomial must be recovered fully once 2w symbols have been considered. A consequence of this is that the attack in Algorithm 2 works if there are 2w clean symbols in the noisy key. Note that if we were to disregard the NTT, leaking 2w symbols of the secret key does not lead to an immediate key recovery attack.
Lemma 2. For a prime q, integer n and vector $s \in \mathbb{Z}_q^n$ with Hamming weight w, the minimal connection polynomial of ŝ := NTT(s) can be recovered given 2w consecutive symbols of ŝ.
Proof. Suppose that our linear complexity has reached w (which is its maximum value for the error-less NTT sequence) after the consideration of the first 2w symbols. We analyse the loop in the Berlekamp-Massey algorithm that considers 2w + 1 symbols. Since we know that the linear complexity cannot increase, we must either be in case 1 or 2 from Algorithm 1. However, to be in case 2, we must have 2L > N which translates to 2w > 2w + 1 for the loop in consideration. This is clearly impossible, so we must be in the case where the connection polynomial does not change. The same argument holds for the remaining iterations.
To complete the argument, we need to show that the linear complexity after 2w iterations is in fact w. Suppose not, i.e. that we have a linear complexity of $w' < w$. Then at some point in the remaining iterations, we must increase the linear complexity to w. Suppose the first increase occurs for N = 2w + k for some k ≥ 0. Then we must be in case 3 from Algorithm 1, so we update the linear complexity to $2w + k + 1 - w' > w$, which is a contradiction. Therefore we must reach the linear complexity of w after 2w symbols have been considered.
Note that we can change the starting point of the sequence $\hat{s}$ without changing the proof of the result above. Therefore in the attack, we do not require that the $2w$ error-less symbols occur in the first components of $\hat{s}$. If we do not know where the error-less symbols are, we can simply re-run the Berlekamp-Massey algorithm on all of the cyclic shifts of $\hat{s}$ in an attempt to get the error-less symbols at the beginning of the sequence.
A general framework for this attack is given as Algorithm 2. Note that this algorithm outputs a list of candidates given a noisy NTT secret. A simple way to find the solution within this list would be to check whether $b - a \cdot s$ is small (in the non-NTT domain) for each candidate.

Algorithm 2 (excerpt): $(t_0, \ldots, t_{2w-1}) \leftarrow (s_i, \ldots, s_{2w+i})$; end for; L.Add($(r_0, \ldots, r_{n-1})$); 12: end for; 13: return L
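The windowed search described above can be sketched as follows. This is an illustrative reading of the attack rather than a transcription of Algorithm 2: it uses a textbook formulation of the Berlekamp-Massey algorithm over $\mathbb{Z}_q$, the function names are hypothetical, and the toy parameters ($q = 17$, $n = 8$, $\gamma = 3$, $w = 2$) are assumptions chosen so that the example runs instantly.

```python
def berlekamp_massey(seq, q):
    """Textbook Berlekamp-Massey over Z_q (q prime).

    Returns (C, L) with C[0] = 1 such that
    sum_{k=0..L} C[k] * seq[i-k] == 0 (mod q) for all L <= i < len(seq).
    """
    C, B = [1], [1]        # current and previous connection polynomials
    L, m, b = 0, 1, 1      # complexity, shift since last length change, last discrepancy
    for i in range(len(seq)):
        d = sum(C[k] * seq[i - k] for k in range(min(len(C), i + 1))) % q
        if d == 0:         # current polynomial still predicts seq[i]
            m += 1
            continue
        T = C[:]
        coef = d * pow(b, -1, q) % q
        C = C + [0] * (len(B) + m - len(C))
        for k, bk in enumerate(B):
            C[k + m] = (C[k + m] - coef * bk) % q
        if 2 * L <= i:     # the linear complexity has to grow
            L, B, b, m = i + 1 - L, T, d, 1
        else:
            m += 1
    return C + [0] * (L + 1 - len(C)), L


def extend(window, C, L, length, q):
    """Run the LFSR given by (C, L) forward from `window` up to `length` symbols."""
    out = list(window)
    while len(out) < length:
        out.append(-sum(C[k] * out[-k] for k in range(1, L + 1)) % q)
    return out[:length]


def candidates_from_noisy_ntt(noisy, w, q):
    """Try every cyclic shift; each window of 2w symbols yields one candidate."""
    n = len(noisy)
    found = []
    for i in range(n):
        window = [noisy[(i + j) % n] for j in range(2 * w)]
        C, L = berlekamp_massey(window, q)
        full = extend(window, C, L, n, q)             # one period starting at position i
        cand = [full[(j - i) % n] for j in range(n)]  # undo the cyclic shift
        if cand not in found:
            found.append(cand)
    return found


# Toy demonstration (assumed parameters): negacyclic NTT with q = 17, n = 8, gamma = 3.
Q, N, GAMMA = 17, 8, 3
s = [0, 5, 0, 0, 16, 0, 0, 0]                         # Hamming weight w = 2
s_hat = [sum(s[j] * pow(GAMMA, (2 * i + 1) * j, Q) for j in range(N)) % Q
         for i in range(N)]
noisy = s_hat[:]
noisy[1] = (noisy[1] + 1) % Q                         # corrupt two symbols
noisy[2] = (noisy[2] + 3) % Q
cands = candidates_from_noisy_ntt(noisy, w=2, q=Q)
assert s_hat in cands                                 # a clean window recovers the key
```

Windows that overlap a corrupted symbol still produce a candidate, which is why a final check such as testing whether $b - a \cdot s$ is small is needed to pick out the correct one.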

A.3 Cold boot scenario
We now consider the Blahut-Berlekamp-Massey attack within an NTT cold boot scenario. We will work with RLWE parameters $n$, $q$, $w := \mathrm{HW}(s)$. Recall that we need $2w$ consecutive clean symbols for the attack to go through, which is equivalent to requiring $2w \log_2 q$ consecutive bits of the secret key. When considering these bits in a noisy version of the secret key, about half of the bits will be out of the ground state, indicating that they have not flipped. Therefore, assuming a bit-flip rate of $\rho_0$ towards the ground state and a bit-flip rate of $\rho_1$ away from the ground state, we expect $(\rho_0 + \rho_1) w \log_2 q$ bit-flips within the entire block of $2w \log_2 q$ bits. The idea is to exhaustively search for the bits that were flipped and run the Berlekamp-Massey algorithm to check each guess. Ignoring the trivial cost of running Berlekamp-Massey, we have a rough average complexity of
$$\binom{w \log_2 q}{\rho_0 w \log_2 q} \cdot \binom{w \log_2 q}{\rho_1 w \log_2 q}. \qquad (14)$$
For example parameters $w = 64$, $q = 12289$, $n = 1024$, we have an attack with complexity roughly $2^{80}$ for bit-flip rates $\rho_0 = 1\%$, $\rho_1 = 0.1\%$, remembering that $\rho_1$ is the retrograde flip rate (if $\rho_0 = 0.17\%$, the attack complexity is roughly $2^{28}$). In certain scenarios, this complexity could be much lower. For example, suppose there is a block of $2w \log_2 q$ consecutive bits where the majority of flips could only have occurred away from the ground state. Then we expect only a small number of bit-flips in this block (since $\rho_1 < \rho_0$), which reduces the number of guesses required before the attack is successful. Therefore, in a cold boot attack, we would be able to identify the optimal consecutive block of $2w$ symbols to launch our attack on very easily.
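The estimate in Equation (14) is easy to evaluate numerically. The sketch below is one possible reading of it: it assumes each symbol occupies $\lceil \log_2 q \rceil$ bits and that the expected numbers of flips are rounded up to integers, neither of which is stated explicitly in the text; the function name is hypothetical.

```python
from math import ceil, comb, log2


def bbm_attack_log2_cost(w, q, rho0, rho1):
    """log2 of the guessing cost in Equation (14) under the rounding assumptions above."""
    bits_per_symbol = ceil(log2(q))    # assumption: ceil(log2 q) bits per stored symbol
    half = w * bits_per_symbol         # about half of the 2w*log2(q) bits per state
    flips_to_ground = ceil(rho0 * half)    # flips towards the ground state
    flips_retrograde = ceil(rho1 * half)   # flips away from the ground state
    return log2(comb(half, flips_to_ground) * comb(half, flips_retrograde))


print(bbm_attack_log2_cost(64, 12289, 0.01, 0.001))
print(bbm_attack_log2_cost(64, 12289, 0.0017, 0.001))
```

Under these rounding choices the two calls evaluate to values close to 80 and 28 respectively, in line with the figures quoted above; other rounding conventions shift the result by a few bits.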

A.4 Future directions for linear complexity attacks
The $k$-error linear complexity of a sequence is the lowest linear complexity attainable when changing at most $k$ symbols. This notion corresponds closely to the case where we have a noisy version of the key that contains at most $k$ erroneous symbols. If we had an algorithm that computed the $k$-error linear complexity along with the symbol changes required to minimise the linear complexity, then we would be able to recover secret keys in many non-trivial cases. However, efficient algorithms for calculating $k$-error linear complexities only exist for specific classes of sequences [SM93, KUI99]. There is currently no efficient algorithm that handles sequences with power-of-two period $n$ over a field $\mathrm{GF}(q)$ satisfying $2n \mid (q - 1)$. It is an interesting open problem to discover such an algorithm.

B Alternative algorithms

B.1 Meet in the middle attack analysis
Meet in the middle attacks offer improvements in terms of time over exhaustive search at the expense of increased memory requirements. It seems feasible that combinatorial meet in the middle attacks are compatible with our cold boot scenarios from Section 4 since we are recovering a low entropy secret $\Delta$ that is not small in Euclidean norm (see Equation 2). The idea is to split $\Delta$ into a left half $\Delta^{(l)} \in \mathbb{Z}_q^{n/2}$ and a right half $\Delta^{(r)} \in \mathbb{Z}_q^{n/2}$. In addition to this, define $W_i$ to be the $i$-th row of the inverse NTT matrix and $W_{i,(l)}$ ($W_{i,(r)}$) to be the left (resp. right) half of this $i$-th row. The meet in the middle attack hinges on the relations
$$s_i - W_{i,(l)} \cdot \Delta^{(l)} \approx W_{i,(r)} \cdot \Delta^{(r)} \bmod q, \quad i = 0, \ldots, n - 1, \qquad (15)$$
where the $i$-th approximate equality is up to an error given by the $i$-th coefficient of the true secret $s$, which is assumed to be small for practical constructions. We assume that the bit errors are uniformly spread across our noisy key, i.e. that there are $\kappa/2$ bit errors in both the left and right halves of our noisy key. We then pick a particular candidate $\Delta^{*(l)}$ and calculate the components of the vector arising from evaluating the LHS of Equation (15). We denote the resulting vector by $t_{\Delta^{*(l)}}$ and store the pair $(t_{\Delta^{*(l)}}, \Delta^{*(l)})$ in a table $T$. This process is repeated for all valid choices of $\Delta^{*(l)}$. Next we consider the value of the RHS for each candidate $\Delta^{(r)}$ and check the approximate equality given in Equation (15) with each entry of the table $T$. In particular, we are looking for pairs $\Delta^{(l)}$ and $\Delta^{(r)}$ such that the error in Equation (15) is a valid sample from the distribution that $s$ was drawn from. Enumerating over all $\Delta^{(r)}$ produces a list of candidates for the full vector $\Delta$.

B.1.1 Locality sensitive hashing
The description above should serve as an intuition rather than a guide to implementing such an attack in practice. Techniques such as locality sensitive hashing (LSH) can significantly decrease the computational cost of meet in the middle attacks in this setting [dBDJdW18]. LSH essentially offers a method of efficiently finding the most likely table entries for a given candidate $\Delta^{*(r)}$ by organising the table entries into hash buckets according to some measure of closeness. More concretely, for some similarity measure $D$ giving rise to the definition of a "ball" $B(p, r) := \{p' : D(p, p') \leq r\}$ around point $p$ with radius $r$, we can define a locality sensitive hash family as follows: a family $H$ is $(r_1, r_2, p_1, p_2)$-sensitive if, for any two points $p, p'$,
$$p' \in B(p, r_1) \Rightarrow \Pr_H[h(p) = h(p')] \geq p_1, \qquad p' \notin B(p, r_2) \Rightarrow \Pr_H[h(p) = h(p')] \leq p_2,$$
where $\Pr_H$ denotes a probability when $h$ is uniformly sampled from $H$.
Once this family is chosen, the standard LSH strategy is to construct $\phi$ different hash functions, each consisting of $\mu$ uniform random functions from $H$, i.e. pick
$$g_j(\cdot) = (h_{j,1}(\cdot), \ldots, h_{j,\mu}(\cdot)), \quad j = 1, \ldots, \phi,$$
where each of the $h_{j,k}$ is chosen uniformly at random from $H$. We then create one hash table per function $g_j$ and store our data points in the appropriate hash buckets. Now suppose we want to query the database on point $p$ to see if there is a similar point in the database. Then we simply calculate the values $g_1(p), \ldots, g_\phi(p)$ and compare $p$ with all of the data points in these $\phi$ buckets.
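The table construction and query procedure just described can be sketched in a few lines. The base hash used here (a random projection followed by a shift and quantisation) is only a simple stand-in for whichever family $H$ is chosen; the function names and the `width` parameter are hypothetical.

```python
import random


def make_hash(dim, width, rng):
    """One stand-in base hash h: project onto a random direction, shift, quantise."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda p: int((sum(x * y for x, y in zip(a, p)) + b) // width)


def build_tables(points, dim, mu, phi, width, seed=0):
    """Create phi functions g_j = (h_{j,1}, ..., h_{j,mu}) and one table per g_j."""
    rng = random.Random(seed)
    gs = [[make_hash(dim, width, rng) for _ in range(mu)] for _ in range(phi)]
    tables = []
    for g in gs:
        table = {}
        for p in points:
            table.setdefault(tuple(h(p) for h in g), []).append(p)
        tables.append(table)
    return gs, tables


def query(p, gs, tables):
    """Collect every stored point sharing a bucket with p in at least one table."""
    out = []
    for g, table in zip(gs, tables):
        for cand in table.get(tuple(h(p) for h in g), []):
            if cand not in out:
                out.append(cand)
    return out


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
gs, tables = build_tables(points, dim=2, mu=2, phi=3, width=1.0)
near = query((0.0, 0.0), gs, tables)   # candidates to compare against using D
```

Increasing $\mu$ makes each bucket more selective, while increasing $\phi$ raises the chance that a genuinely close pair collides in at least one table; the lemma cited in the text quantifies this trade-off.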
A concise summary of the runtime and space requirements of LSH is given in [Laa14] in the form of the following lemma. Note that the lemma only considers finding a single vector in the ball of radius $r_2$. Our search problem asks to find all vectors in this ball. Therefore, we would have to search through all candidates output by LSH. Nonetheless, we will assume that our LSH family is good enough to ensure that our list of candidates ends up being very short.
Next, we must choose a similarity measure along with a family $H$. We choose to work with the Euclidean norm using the hash family from [AI06]. A single hash is computed by a projection onto a random 24-dimensional plane followed by a translation and then a very cheap (< 519 real operations) Leech lattice decoding procedure. Letting $\nu$ denote the initial dimension of the data being hashed, we approximate $\tau_h \approx 24\nu$, $m_h \approx 24$ and $\tau_D = \nu$ in Lemma 3.
We now return to our running example of the Kyber scheme ($n = 256$, $q = 7681$, $\sigma = \sqrt{2}$) with $\kappa$ bit-flips per secret ring element. As before, we consider attacking each of the secret ring elements individually. We have that the dimension of each vector $\Delta^{(l)}$ is $\nu = 128$ and make the simplifying assumption that there are no retrograde bit-flips. This means that there are $N = \binom{13 \cdot 128/2}{\kappa/2}$ possibilities for this vector, taking into account that the ground state of memory makes it clear when a bit-flip has not occurred and that approximately half of the secret bits will be out of the ground state. Thus the number of unknown bits in $\Delta^{(l)}$ is $13 \cdot 128/2$ (rather than $13 \cdot 128$). Note that we have rounded $\kappa/2$ down to the nearest integer and are implicitly assuming $\binom{13 \cdot 128/2}{\kappa/2}$ choices for $\Delta^{(r)}$. Since the secret in Kyber [...] the gb_cost function from the estimator of [APS15], we obtain a cost of $2^{117.3}$ operations, which is not competitive with our lattice attack.

Figure 4: Projected lengths for 256 samples of (∆ ( ) , θs) and the norms of the Gram-Schmidt vectors for our reduced basis B for θ = 3 with κ = 19, folded down three times to n = 32 and shaved with parameters α, β = 4, 2. The dotted line indicates d − bs, i.e. where we start enumerating.

Figure 5: A minimal length LFSR generating the finite sequence (3, 2, 3, 1, 3, 2, 4) over $\mathbb{Z}_7$. Note that the coefficients of the connection polynomial are the negation of the multiplicands in this diagram. This LFSR has length 4, yet the minimal polynomial is of degree 3.

Figure 6: Linear complexity profiles for a random sequence and for the NTT of a low Hamming weight vector.

Lemma 3. Suppose there exists a $(r_1, r_2, p_1, p_2)$-sensitive family $H$. For a list $L$ of size $N$, let
$$\rho = \frac{\log 1/p_1}{\log 1/p_2}, \qquad \mu = \frac{\log N}{\log 1/p_2}, \qquad \phi = cN^{\rho}$$
for some constant $c$. Then for any $v$ in the appropriate space, we can either (a) find an element $w^* \in L$ such that $D(v, w^*) \leq r_2$, or (b) conclude with high probability ($\geq 1 - e^{-c}$) that for all $w \in L$, $D(v, w) \geq r_1$. Let $\tau_h$ be the time taken to compute a hash, $m_h$ be the storage size of a hash, and $\tau_D$ be the time to calculate $D(\cdot, \cdot)$. The algorithm requires:
Preprocessing time: $N\mu\phi \cdot \tau_h$
Hash table storage: $N\mu\phi \cdot m_h$
Single query time:
  Hash evaluation: $\mu\phi \cdot \tau_h$
  Expectation of comparison time: $cN^{\rho} \cdot \tau_D$
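To get a feel for the quantities in Lemma 3, the formulas can be evaluated directly. The sketch below simply transcribes them; the collision probabilities `p1` and `p2` in the usage example are hypothetical placeholders (the text does not give values for them), $\mu$ and $\phi$ are left as real numbers rather than rounded, and the function name is an illustrative choice.

```python
from math import comb, log, log2


def lsh_costs(N, p1, p2, tau_h, m_h, tau_D, c=1.0):
    """Evaluate the parameter and cost formulas of Lemma 3 (no rounding applied)."""
    rho = log(1 / p1) / log(1 / p2)
    mu = log(N) / log(1 / p2)
    phi = c * N ** rho
    return {
        "rho": rho,
        "mu": mu,
        "phi": phi,
        "preprocessing_time": N * mu * phi * tau_h,
        "table_storage": N * mu * phi * m_h,
        "query_hash_time": mu * phi * tau_h,
        "query_compare_time": c * N ** rho * tau_D,
    }


# Usage with the approximations tau_h ~ 24*nu, m_h ~ 24, tau_D = nu for nu = 128,
# and a table size N = C(13*128/2, 9) as in the kappa = 19 running example.
# p1 and p2 are placeholders, NOT values from the paper.
nu = 128
N = comb(13 * 128 // 2, 9)
costs = lsh_costs(N, p1=0.5, p2=0.25, tau_h=24 * nu, m_h=24, tau_D=nu)
print({name: round(log2(value), 1) for name, value in costs.items() if name != "rho"})
```

The exponent $\rho$ controls everything: the closer $p_1$ is to 1 relative to $p_2$, the smaller $\rho$ becomes and the cheaper both preprocessing and queries are.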

Table 2: Cold boot attacks on New Hope KEM. The column "cost" gives the cost of recovering all 1024 components of the secret in terms of the number of lattice points visited during enumeration (≈ 100 CPU cycles each). The column "rate" shows the overall success rate $1 - (1 - p_0)^2$ for recovering 1024 components of the secret, cf. Section 7. For the columns labelled "non-NTT", see caption of Table 1.

Table 3: The preservation rate of the Hamming weight of ∆ on folding multiple times for κ = 19 cold boot flips on Kyber parameters.

Table 4: A breakdown of the statistics on the 128 to 64 dimensional fold on 1000 Kyber cold boot instances (κ = 19) when carrying out the three guessing phases. The "Solvable" row indicates how many of the instances in each category are solvable by the three guessing phases.

Table 5: The analogous statistics to those in Table 4 for κ = 25. For details on the table entries, see the caption for Table 4.

Table 6: The maximum possible κ handled by each guessing band size β for Kyber parameters and the cost of guessing the significant band.

Table 13: Experimental results for New Hope parameters and number of bit-flips κ = 19; for details see Table 12.

Table 14: Experimental results for New Hope parameters and κ = 25. For details see Table 12.

Table 15: Experimental results for New Hope parameters and κ = 30. For details see Table 12.