On Recovering Affine Encodings in White-Box Implementations

Ever since the first candidate white-box implementations by Chow et al. in 2002, producing a secure white-box implementation of AES has remained an enduring challenge. Following in the footsteps of the original proposal by Chow et al., other constructions were later built around the same framework. In this framework, the round function of the cipher is "encoded" by composing it with non-linear and affine layers known as encodings. However, all such attempts were broken by a series of increasingly efficient attacks that are able to peel off these encodings, eventually uncovering the underlying round function, and with it the secret key. These attacks, however, were generally ad-hoc and did not enjoy a wide applicability. As our main contribution, we propose a generic and efficient algorithm to recover affine encodings, for any Substitution-Permutation-Network (SPN) cipher, such as AES, and any form of affine encoding. For AES parameters, namely 128-bit blocks split into 16 parallel 8-bit S-boxes, affine encodings are recovered with a time complexity estimated at $2^{32}$ basic operations, independently of how the encodings are built. This algorithm is directly applicable to a large class of schemes. We illustrate this on a recent proposal due to Baek, Cheon and Hong, which was not previously analyzed. While Baek et al. evaluate the security of their scheme at 110 bits, a direct application of our generic algorithm is able to break the scheme with an estimated time complexity of only $2^{35}$ basic operations. As a second contribution, we show a different approach to cryptanalyzing the Baek et al. scheme, which reduces the analysis to a standalone combinatorial problem, ultimately achieving key recovery in time complexity $2^{31}$. We also provide an implementation of the attack, which is able to recover the secret key in about 12 seconds on a standard desktop computer.


Introduction
Historically, cryptanalysis is performed within the black-box model: the cryptographic algorithm under attack is executed in a trusted environment, and the view of the attacker is limited to the input-output behavior of the algorithm. Depending on the type of attack under consideration, the attacker may be able to observe the inputs and outputs of encryption or decryption queries, and perhaps choose the corresponding inputs, but nothing more. Such attack models are particularly relevant in scenarios where the attacker does not have direct access to an implementation of the scheme, whether because it is executed remotely, or within a protected hardware environment such as a secure enclave.
Since the advent of side-channel attacks, however, new attack models have come to light, wherein the attacker has access to some auxiliary information leaked by the implementation. These models are sometimes called gray-box models, in contrast with the black-box model outlined in the previous paragraph. Attacks in the gray-box model may exploit physical leakage such as computation time, power consumption, or electromagnetic leakage, among many others. Such attacks can result in practical breaks against schemes that would otherwise appear secure in the standard black-box model.
White-box cryptography. Going one step further, in 2002, Chow et al. introduced the white-box model [CEJVO02a,CEJVO02b]. In this model, the attacker has full access to an implementation of the target cryptographic algorithm, including the ability to control its execution environment. Therefore he can observe memory content, set breakpoints in the execution flow, change arbitrary values in the code or the memory, etc. In this setting, the security assumptions of the black-box model clearly no longer hold. However, it may still be desirable that the adversary should be unable to extract the secret key of the cryptographic algorithm under attack.
This model is relevant in the context of software distribution, whenever a piece of software containing sensitive cryptographic information (such as an encryption algorithm) is to be widely distributed, and hence can be downloaded and analyzed by adverse parties. The most prominent application occurs in Digital Rights Management, where attackers may wish to recover a decryption key used to protect copyrighted content (digital music, TV broadcasts, video games, etc.). A successful attacker is then able to distribute the secret key to unauthorized users, providing them with illegitimate access to the protected content. In effect, the goal is to protect sensitive functions within the deployed software, such as cryptographic algorithms, in much the same way that a trusted environment would protect security-critical functions in a hardware context.
In order to achieve this goal, white-box cryptography techniques attempt to obfuscate the implementation of the target cryptographic algorithm. Ideally, an attacker in possession of the obfuscated cipher should be unable to interact with it in any meaningful way, besides simply executing it on chosen inputs. While Barak et al. have shown that general program obfuscation is impossible [BGI+01], the context of white-box cryptography presents two key differences. The first is that white-box cryptography merely attempts to obfuscate particular function families (such as block ciphers), which Barak et al.'s result has no bearing on. Another key difference is that white-box models do not generally require guarantees as strong as those offered by black-box obfuscation: in the case of a white-box implementation of AES for instance, it may be enough that the adversary is unable to recover the secret key (for a detailed discussion of white-box models, see e.g. [DLPR13,FKKM16]).
The CEJO framework. In their original 2002 articles, Chow et al. proposed such a white-box scheme for DES and AES [CEJVO02a,CEJVO02b]. While their proposals were quickly broken [JBF02,BGEC04], their work opened the path to white-box encryption. Follow-up works often reused the same general framework, which we will call the "CEJO framework".
In the CEJO framework, each round function is obfuscated by being composed with carefully crafted input and output encodings. That is, the round function $E^{(r)}$ at round $r$ is replaced in the white-box implementation by $f^{(r+1)^{-1}} \circ E^{(r)} \circ f^{(r)}$, where $f^{(r)}$ and $f^{(r+1)^{-1}}$ are bijections called the input and output encodings, respectively. By design, the output encoding of each round is canceled out by the input encoding of the next round. For each round, the white-box implementation gives access to the encoded version of the round function $F_r = f^{(r+1)^{-1}} \circ E^{(r)} \circ f^{(r)}$, but not directly to the underlying round function $E^{(r)}$.
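The cancellation of consecutive encodings can be checked on a toy example. The sketch below is only an illustration under simplifying assumptions: the "round functions" and encodings are random 8-bit permutation tables rather than AES rounds or structured encodings, and all names (`encode_rounds`, `run`, `white_box`) are hypothetical.

```python
import random

def random_bijection(rng, size=256):
    """Random permutation table (toy stand-in for a round or an encoding)."""
    table = list(range(size))
    rng.shuffle(table)
    return table

def invert(table):
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return inverse

def encode_rounds(rounds, encodings):
    """Tables F_r = inv(f_{r+1}) o E_r o f_r for r = 0..R-1."""
    tables = []
    for r, E in enumerate(rounds):
        f_in, f_out_inv = encodings[r], invert(encodings[r + 1])
        tables.append([f_out_inv[E[f_in[x]]] for x in range(len(E))])
    return tables

def run(tables, x):
    """Apply the tables one after the other."""
    for t in tables:
        x = t[x]
    return x

rng = random.Random(2018)
R = 4
rounds = [random_bijection(rng) for _ in range(R)]         # toy E_0..E_{R-1}
encodings = [random_bijection(rng) for _ in range(R + 1)]  # toy f_0..f_R
white_box = encode_rounds(rounds, encodings)
```

Chaining the tables in `white_box` leaves only the outermost encodings `f_0` and `inv(f_R)` around the composition of the toy rounds, while each individual table still hides its round behind two unknown bijections.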
Chow et al. proposed to define the encodings $f^{(r)}$ as the composition of a non-linear mapping and an affine mapping. The idea is to follow a classic concept in symmetric cryptography: the non-linear mapping will add some confusion on the intermediate values of the state, while the affine mapping will add some diffusion (see Sec. 3.3 and 3.4 in [CEJVO02b]). In addition, in a typical SPN block cipher, round keys are XORed into the inner state of the cipher. In that case, whenever the constant of the affine encoding is uniformly random, a single obfuscated round completely hides the value of the round key, which implies that a successful key-recovery attack must target multiple rounds simultaneously. Thus the CEJO framework is a natural approach to attempt to obfuscate a block cipher, especially in the case of SPN ciphers such as AES.
In addition to the above, some external input/output encodings $M_{\text{in}}$ and $M_{\text{out}}$ can be added before and after the cipher. In that case, the implementation provides a map from encoded plaintexts to encoded ciphertexts. These encodings are merged into the tables used for the initial and final encoded round function. The implementation is then equivalent to an encoded version of the cipher, namely the cipher composed with $M_{\text{in}}$ on its input side and $M_{\text{out}}$ on its output side. External encodings can be used to increase security, as the attacker is denied direct access to raw plaintexts/ciphertexts. On the other hand, external encodings assume that the implementation surrounding the white-box cipher takes these encodings into account. As such, a white-box implementation with external encodings is not, properly speaking, an implementation of the cipher it contains. For this reason, in this work, we shall explicitly signal the presence of external encodings, and use the term white-box implementation with external encodings when appropriate.
It is crucial that, given the encoded round function $F_r$, the adversary should be unable to compute and peel off the encodings $f^{(r+1)^{-1}}$ and $f^{(r)}$. Indeed, for typical ciphers such as AES, granting direct access to a single round $E$ would allow the adversary to easily recover the corresponding round key, and from there the secret key of the cipher. However, attacks on white-box implementations typically achieve precisely this, by taking advantage of the specific structure of the encodings $A$ and $B$. In white-box implementations following the CEJO framework, encodings are composed of a very simple non-linear layer, together with a more complex affine layer. Attacks generally peel off the non-linear component, then proceed to recover the affine layer. This is typically achieved in an ad-hoc way, by exploiting specific properties of the scheme under attack.

Our Contribution.
As our main contribution, we propose a generic algorithm to recover affine encodings for any white-box implementation of a cipher following the CEJO framework, independent of the way the encodings are built. More generally, our algorithm solves the affine equivalence problem (given two maps $F$ and $S$ with the promise that they are affine equivalent, compute affine maps $A$, $B$ such that $F = B \circ S \circ A$) whenever one of the two maps is composed of the parallel application of distinct S-boxes.
Our main algorithm is very similar to one of the steps of the structural cryptanalysis of SASAS by Biryukov and Shamir [BS01], combined with a generic affine equivalence algorithm; for this purpose, we use the recent algorithm by Dinur [Din18], but the same attack would also work with the classic affine equivalence algorithm by Biryukov, De Cannière, Braeken and Preneel [BCBP03]. Thus the components we use are not essentially new. However, to the best of our knowledge, the fact that they enable breaking all white-box schemes following the design of Chow et al. in a generic way has not yet been explicitly pointed out in the literature, or analyzed in detail, despite the fact that the SASAS algorithm predates both these schemes and their attacks. As a result, in our experience, this fact is also largely ignored by practitioners in the industry.
By design, our attack applies to a large class of white-box schemes following the CEJO framework, including [CEJVO02a,CEJVO02b,XL09,Kar10]. Beyond the previously cited schemes, which were already broken by ad-hoc attacks, we illustrate our attack on a new white-box design by Baek, Cheon and Hong [BCH16]. One distinctive feature of this design that makes it particularly attractive to illustrate our attack (besides not having been previously cryptanalyzed) is that it increases the state size by obfuscating two parallel rounds of AES, precisely to prevent generic attacks from being able to recover the affine encodings of the scheme. Indeed, Baek et al. estimate the security level of their proposal at 110 bits based on their own specialized version of an affine equivalence algorithm. However, our generic attack on this scheme requires only about $2^{35}$ basic operations.
As a second contribution, we analyze the scheme by Baek et al. more closely, and introduce another technique able to break this scheme. This new technique extracts and solves a standalone problem from the scheme by Baek et al. Ultimately, it is able to recover the secret key of the scheme in time complexity $2^{31}$. This is verified with an implementation. This dedicated attack on Baek et al.'s scheme is also more powerful, as it allows us to fully recover the key, while the generic attack only creates a decryption function without recovering the key.
In more detail, our two contributions are as follows.
(1) In an SPN cipher, a round function is composed of an affine layer (in which we include key addition), and a non-linear S-box layer. The S-box layer $S$ consists of the application of $k$ parallel $m$-bit S-boxes, where $n = km$ is the block size. As a result, when encoding a round function using affine encodings, the encoded round function may be written as $F = B \circ S \circ A$, folding the affine layer into one of the encodings. A natural problem in this setting is the affine equivalence problem: namely, to recover affine encodings $A$ and $B$, given $F = B \circ S \circ A$, and knowing $S$. More precisely, since $A$ and $B$ may not be uniquely defined, the problem can be stated as: given $S$ and $F$ as before, find affine maps $A'$, $B'$ such that $F = B' \circ S \circ A'$. The general affine equivalence algorithm by Dinur solves precisely this problem, without assuming any special structure on $S$ [Din18] (this is also the case of the classic algorithm by Biryukov et al. [BCBP03]). However, its complexity is $O(n^3 2^n)$, which makes it unsuitable for recovering encodings on a typical block size of 128 bits. In contrast, we focus on the case where $S$ is made up of $k$ parallel $m$-bit S-boxes. In this setting, we propose an algorithm that solves the affine equivalence problem with a (typically much lower) time complexity. For the AES parameters $n = 128$, $m = 8$, $k = 16$, this yields a time complexity of $2^{32}$ basic operations (to be compared with $2^{149}$ basic operations if the generic algorithm by Dinur were applied naively).
As noted earlier, due to its genericity, our attack applies to essentially all white-box schemes following the CEJO framework: this includes the original designs by Chow et al. [CEJVO02a,CEJVO02b], and later proposals [XL09,Kar10]. In the case of Karroumi's scheme [Kar10], while it does not seem to follow the CEJO framework at first glance, it was later shown that this scheme is equivalent to the CEJO framework [LRDM+13,DMRP13], and hence our technique applies directly.
The main limitation of our attack is that it only targets affine encodings, whereas most white-box schemes following the CEJO framework also use non-linear encodings in addition to affine encodings ([CEJVO02a,CEJVO02b,Kar10,BCH16] do, while [XL09] only uses linear encodings). When non-linear encodings are used, our attack does not break the scheme by itself. However, even in the presence of non-linear encodings, the first step of attacks typically consists of peeling off the non-linear encoding layer [BGEC04,BCH16]; since the non-linear encodings do not apply to the state as a whole, this is feasible, and it leaves the attacker with an instance of the previous problem. In this context, our algorithm provides a powerful tool, which is able to recover affine encodings in a very general setting.
(2) As a second contribution, we take a closer look at the scheme by Baek et al. We identify another angle from which the scheme can be attacked. At the core of this second approach lies the following problem. Let $F$, $h_1$, $h_2$ be three non-linear mappings from $m$ bits to $m$ bits, and let $A_1$, $A_2$ be two linear mappings on $m$ bits. Given oracle access to $G(x, y) = F(A_1(x) \oplus A_2(y)) \oplus h_1(x) \oplus h_2(y)$, recover $A_1$ and $A_2$ (up to equivalence). We solve this problem and deduce an attack against the white-box scheme by Baek et al. with time complexity $\sim 2^{31}$ operations. We implemented the full attack, and were able to recover the secret key (and external encodings) in about 12 seconds on a standard desktop computer. Our implementation is available at http://yaawai.tk/.
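To make the oracle in this problem concrete, the following sketch instantiates $G$ with random toy components for $m = 8$. It also checks one elementary identity: in the mixed difference $G(x,y) \oplus G(x',y) \oplus G(x,y') \oplus G(x',y')$, the masks $h_1$ and $h_2$ cancel. This identity is offered only as an illustration of the structure of $G$; it is not claimed to be the route taken by the attack, and all names in the code are hypothetical.

```python
import random

M = 8
rng = random.Random(7)

def random_table(rng, bits):
    """Arbitrary (not necessarily bijective) map on `bits`-bit values."""
    return [rng.getrandbits(bits) for _ in range(1 << bits)]

def parity(x):
    return bin(x).count("1") & 1

def mat_vec(rows, x):
    """GF(2) matrix (list of row bitmasks) times the bit-vector x."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

# Toy secret components: F, h1, h2 non-linear tables; A1, A2 linear maps.
F, h1, h2 = (random_table(rng, M) for _ in range(3))
A1 = [rng.getrandbits(M) for _ in range(M)]
A2 = [rng.getrandbits(M) for _ in range(M)]

def G(x, y):
    """The oracle G(x, y) = F(A1(x) ^ A2(y)) ^ h1(x) ^ h2(y)."""
    return F[mat_vec(A1, x) ^ mat_vec(A2, y)] ^ h1[x] ^ h2[y]

def mixed_difference(x, x2, y, y2):
    """Each h1(.) and h2(.) value appears twice and cancels out."""
    return G(x, y) ^ G(x2, y) ^ G(x, y2) ^ G(x2, y2)
```

The value returned by `mixed_difference` depends only on $F$, $A_1$ and $A_2$, which is what makes second-order differences a natural first thing to look at when studying such an oracle.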

Related Work.
Literature on white-box cryptography, especially designs and attacks following the framework of Chow et al., is quite extensive. The first white-box candidate constructions by Chow et al. [CEJVO02a,CEJVO02b] were quickly broken in practical time [JBF02,BGEC04].
In 2009, Xiao and Lai proposed to rely on larger affine encodings covering two S-boxes at once [XL09]. However, their proposal was broken in about $2^{32}$ operations by De Mulder et al. [DMRP12]. To thwart this attack, Karroumi proposed to use a dual representation of the AES round function in order to change the structure of each AES round [Kar10]. But this was also broken in about $2^{22}$ operations by Lepoint et al. [LRDM+13,DMRP13]. Note that all aforementioned attacks exploit the specific structure of the encodings used in the scheme under attack. As a result, they are more efficient than our generic algorithm, which works regardless of the structure of the encodings. Our algorithm also applies to these schemes and succeeds in practical time; but the point is that it is much more general: it does not require any structure in the affine encodings, and applies to all previous schemes at once, and more generally to all schemes in the CEJO framework. This includes Karroumi's scheme, as it has been shown to be equivalent to the CEJO framework [DMRP13,LRDM+13].
A useful tool in the context of white-box cryptanalysis is the linear and affine equivalence algorithm by Biryukov et al. [BCBP03]. Their algorithm solves the following problem: given two bijections $S_1$, $S_2$ on $n$ bits, find affine (or linear, depending on the variant of the problem) mappings $A$, $B$ such that $S_2 = B \circ S_1 \circ A$, if they exist. Biryukov et al.'s algorithm is able both to ascertain whether such mappings exist, and to enumerate all solutions. The time complexity of their solution is $O(n^3 2^n)$ when $A$, $B$ are linear, and $O(n^3 2^{2n})$ when they are affine. In both cases, these complexities are practical when considering standard S-box sizes, such as $n = 8$. This algorithm has been further improved in the affine case by Dinur [Din18], bringing the complexity down to $O(n^3 2^n)$. Note however that this improved algorithm was designed for random permutations. Indeed, the AES S-box is self-affine equivalent, a property that is fairly rare for random permutations and that causes the algorithm to fail. This was mentioned by the author, who also proposed a workaround. However, our own implementation of the algorithm shows that it still fails on the AES S-box even when using the workaround. Hence, in the case of the AES S-box, we use the algorithm from [BCBP03], which has a higher complexity, but works on the AES S-box.
The main algorithm we propose in this article is essentially the same as the algorithm appearing in Section 2.3 of the structural cryptanalysis of SASAS by Biryukov and Shamir [BS01]. However, it is worth noting that this algorithm, from 2001, predates the first white-box constructions, due to Chow et al. in 2002; and a fortiori later constructions in the CEJO framework. Yet, to the best of our knowledge, it has not yet been clearly pointed out in the literature that this older algorithm actually solves the critical step in attacks on white-box schemes in the CEJO framework, as we show in this article. And indeed this algorithm is not referred to in any of the attacks mentioned above. Thus, we regard it as a worthwhile contribution for practitioners in the field to point out that all known constructions in the CEJO framework can be uniformly broken (as far as recovering affine layers, which is the critical step in most cases) by combining this algorithm with a generic affine equivalence algorithm.
Our attack is also related to the attack by Minaud et al. [MDFK15] on the ASASA construction [BBK14], as well as the follow-up work by Biryukov and Khovratovich [BK15]. However, the ASASA attack would only recover the output spaces of S-boxes, not their input spaces, which we also need. In the setting where the ASASA (and SASAS) attack was developed, this was inconsequential, because the attacker had access to both the ASASA function and its inverse, so the problem was symmetric between input and output. However, for us this is not the case: a key feature of our setting is that we only have access to an ASA mapping, but not its inverse. This difference is significant, as recovering the input spaces of the S-boxes from their output spaces seems as hard as breaking the scheme in the first place. And indeed, in the designs by Chow et al. to realize white-box AES and DES [CEJVO02a,CEJVO02b], we are not aware of any way to invert the encoded round function without also breaking the scheme. In addition to qualitative differences in the setting considered, the algorithm by Minaud et al. is also more expensive for typical parameters (e.g. $n = 128$ or $256$), as it costs about $2^m n^2 + n^6$ operations, where the last term is due to having to solve a quadratic system in $n$ variables. Running the ASASA algorithm on the scheme by Baek et al., recovering only the output spaces of S-boxes, would require $2^{48}$ operations instead of $2^{35}$ with our attack. Thus the SASAS algorithm [BS01], which we use, is the better approach in our setting.
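As a sanity check on the figure quoted above, one can evaluate the cost estimate, read here as $2^m n^2 + n^6$, at the parameters we assume for the scheme by Baek et al., namely $n = 256$ (two parallel 128-bit AES states) and $m = 8$:

```latex
\[
2^{m} n^{2} + n^{6}
  \;=\; 2^{8}\cdot\bigl(2^{8}\bigr)^{2} + \bigl(2^{8}\bigr)^{6}
  \;=\; 2^{24} + 2^{48}
  \;\approx\; 2^{48}.
\]
```

The $n^6$ term dominates, which matches the $2^{48}$ estimate stated in the text.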
At SAC 2008, Michiels, Gorissen and Hollmann also proposed a generic algorithm to break white-box implementations following the framework by Chow et al. [MGH08]. Their work considers non-linear encodings, but requires two extra hypotheses: (1) the input space of each individual S-box through the input encoding should be known; and (2) the diffusion matrix of the scheme should satisfy a property called disjoint spanning block sets. In particular, that work does not solve the general problem of recovering arbitrary affine encodings surrounding a known S-box layer. Moreover, no overall complexity bound is provided, as some steps of the algorithm are not accompanied by a time complexity bound. There is also no implementation, which further prevents assessing performance.
The idea of considering a specialized variant of Biryukov et al.'s generic affine equivalence algorithm in the context we have described thus far (i.e. where the inner non-linear layer is composed of distinct S-boxes) was also proposed by Baek, Cheon and Hong in [BCH16], who proposed the specialized affine equivalence algorithm (SAEA) for solving this problem. However, SAEA is very inefficient for larger $n$ in our setting, with a time complexity of $O(\min(n^{m+4} 2^{2m}/m,\ n^{\log(n)} 2^{n/2}))$. Baek et al. used SAEA to assess the security of their own white-box implementation with external encodings of AES, predicting a security level of $2^{110}$ operations. Our own generic algorithm, however, merely requires an estimated $2^{35}$ basic operations, breaking the scheme with practical complexity.
Incidentally, both the previously cited works by Michiels et al. and by Baek et al., while introducing interesting new techniques, also illustrate the lack of awareness around the fact that the SASAS technique by Biryukov and Shamir [BS01], combined with a generic affine equivalence algorithm, solves the ASA problem generically. In this respect our work may be regarded as filling a gap in the literature.
Finally, an interesting and recent line of work has exhibited side-channel attacks on white-box implementations [BHMT16,BBIJ17]. These approaches are quite powerful in that they require only "gray-box" access to the implementation, but are not generic attacks in the sense of our work. For example, they are not applicable to the scheme by Baek et al. (not only because the scheme obfuscates two parallel executions of AES simultaneously, but also because it uses external encodings on both ends of the cipher). By nature this approach also relies on experimentation, rather than providing analytical bounds as we do.
Recent work in this direction has shed more light on the success of the gray-box approach outlined above, and studied more closely the effect of affine and non-linear encodings on the resistance of a white-box implementation against side-channel attacks [SMG16,BBMT18]. These works show that 4-bit non-linear encodings, which were recommended in the original scheme by Chow et al. for size reasons, are insecure in that context. Both works focus their analysis mainly on non-linear encodings, and on the (practically highly relevant) case of a white-box implementation of AES following [CEJVO02b]. By contrast, our work considers only affine encodings and requires full white-box access, but does so within a more general CEJO framework with an arbitrary SPN cipher and arbitrary (affine) encodings.

Structure of the Article.
In Section 2, we describe our generic algorithm to recover affine encodings in SPN ciphers in detail, together with its complexity analysis. In Section 3, we describe the white-box scheme by Baek et al. In Section 4, we first point out that our algorithm from Section 2 breaks this scheme in a generic manner, then develop a second dedicated attack underpinned by a different technique, and discuss its implementation.

A Generic Algorithm to Recover Affine Encodings in SPN Ciphers
In this section, we present our algorithm for solving the affine equivalence problem in the case where the inner non-linear layer is composed of parallel S-boxes. As discussed in the introduction, solving this problem amounts to recovering affine encodings from a white-box implementation of any SPN cipher based on Chow et al.'s approach, regardless of the way the encodings are built. More precisely, our algorithm solves the following problem.
Problem 1. Let $F$ be an $n$-bit to $n$-bit permutation such that $F = B \circ S \circ A$, where: 1. $A$ and $B$ are $n$-bit affine layers; 2. $S = (S_1, \ldots, S_k)$ consists of the parallel application of $k$ permutations $S_i$ on $m$ bits each (called S-boxes). Note that $n = km$.
Knowing $S$, and given oracle access to $F$ (but not $F^{-1}$), find affine $A'$, $B'$ such that $F = B' \circ S \circ A'$.
Before we move on to the algorithm itself, a few remarks are in order.
Remark 1. First, our statement of the problem allows the algorithm to query $F$, but not $F^{-1}$. This is tailored to match the real situation of recovering an affine white-box encoding. Indeed, white-box schemes following the CEJO framework allow access to $F$, but not to $F^{-1}$, as the output of $F$ is computed as a sum of some hard-coded table outputs, and inverting $F$ would require knowing how to split a given output of $F$ into the appropriate sum. To the best of our knowledge, the most straightforward way to achieve this is actually to break the scheme.
Of course, in other contexts, a variant of Problem 1 where the algorithm is granted access to both $F$ and $F^{-1}$ may also be worth considering. If $n$ is small, it should be noted that $F^{-1}$ can be computed exhaustively in $2^n$ operations, so if we are willing to pay $2^n$ calls to $F$, both variants of the problem become equivalent. In fact, our own algorithm will first isolate the input and output space of each S-box, then exhaust that space in $2^m$ operations for each S-box, which will allow us to access the inverse mapping of each S-box. Thus, essentially, our own algorithm will allow us to revert back to the case where the direct and inverse mappings are both available. In particular, it is not obvious how our algorithm could be improved even if $F^{-1}$ were accessible. In this regard, we note that Baek et al. explicitly provide an algorithm to solve Problem 1 when $F$ and $F^{-1}$ are both available, in $O(n^4 2^{3m}/m)$ operations [BCH16]. However, this is slower than our algorithm for all reasonable parameter ranges, even though our algorithm does not require access to $F^{-1}$ (as noted in the introduction, Baek et al. also propose an algorithm when only $F$ is accessible, but it is much slower).
Remark 2. As stated, Problem 1 asks to recover some affine encodings $A'$, $B'$ such that $F = B' \circ S \circ A'$, but not necessarily $A$ and $B$. This is because $A$ and $B$ may not be uniquely defined. In fact, if all S-boxes are identical (as is common in SPN ciphers), and as soon as there is more than one S-box, $A$ and $B$ cannot be uniquely defined: indeed, any solution $(A, B)$ can be replaced by $(P \circ A, B \circ P^{-1})$, where $P$ is any permutation swapping S-box inputs. Problem 1 merely asks to recover a solution. However, because our algorithm eventually reduces the problem to the affine equivalence problem for each S-box, which is solved using the algorithm by Dinur, and that algorithm is able to enumerate all solutions if desired, it is straightforward to adapt our algorithm so that it outputs every solution.
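The non-uniqueness described in Remark 2 is easy to observe experimentally. The sketch below uses toy parameters of our own choosing ($m = 4$, $k = 3$, a randomly shuffled 4-bit S-box used in every position) and checks that replacing $(A, B)$ by $(P \circ A, B \circ P^{-1})$, with $P$ a permutation of the $m$-bit blocks, yields the same map $F$. All names are hypothetical.

```python
import random

M_BITS, K = 4, 3
N = M_BITS * K
MASK = (1 << M_BITS) - 1
rng = random.Random(3)
SBOX = list(range(1 << M_BITS))
rng.shuffle(SBOX)  # one arbitrary m-bit S-box, identical in every position

def parity(x):
    return bin(x).count("1") & 1

def mat_vec(rows, x):
    """GF(2) matrix (list of row bitmasks) times the bit-vector x."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def random_affine(rng, n):
    """Random affine map x -> Mx + c (invertibility is not needed for this check)."""
    return [rng.getrandbits(n) for _ in range(n)], rng.getrandbits(n)

def apply_affine(aff, x):
    rows, c = aff
    return mat_vec(rows, x) ^ c

def sbox_layer(x):
    return sum(SBOX[(x >> (M_BITS * i)) & MASK] << (M_BITS * i) for i in range(K))

def permute_blocks(x, perm):
    """Move the m-bit block at position i to position perm[i]."""
    return sum(((x >> (M_BITS * i)) & MASK) << (M_BITS * perm[i]) for i in range(K))

A, B = random_affine(rng, N), random_affine(rng, N)
perm = [1, 2, 0]
perm_inv = [perm.index(i) for i in range(K)]

def F(x):
    return apply_affine(B, sbox_layer(apply_affine(A, x)))

def F_alt(x):
    """Same map, written with (P o A, B o P^{-1}) in place of (A, B)."""
    a = permute_blocks(apply_affine(A, x), perm)
    return apply_affine(B, permute_blocks(sbox_layer(a), perm_inv))
```

Since the S-box layer applies the same S-box to every block, it commutes with the block permutation, so `F` and `F_alt` agree on every input even though their affine layers differ.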
Remark 3. The special case of Problem 1 where encodings are linear instead of affine may also be worth considering. As mentioned in the previous remark, however, our algorithm eventually reduces Problem 1 to the affine equivalence problem for each S-box separately. As such, our algorithm can be trivially adapted to the linear variant of the problem by using a linear equivalence algorithm on each S-box, instead of an affine one.
Remark 4. In the special case where $k = 1$, i.e. $S$ is composed of a single S-box, Problem 1 is precisely the affine equivalence problem tackled by Biryukov et al. [BCBP03] and Dinur [Din18], with the caveat that $F^{-1}$ is not accessible. However, as mentioned in the introduction, the $O(n^3 2^n)$ time complexity of the faster algorithm by Dinur precludes its use on full 128-bit blocks. From this perspective, the point of our algorithm is to achieve better time complexity, and in particular, practical complexity for $n$ upwards of 128 bits, by using the fact that $S$ is split into relatively small $m$-bit S-boxes.
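To see concretely why the generic algorithm cannot be run on a full block yet is cheap on a single S-box, one can evaluate the bound $n^3 2^n$ (ignoring the constant hidden in the $O(\cdot)$ notation) at $n = 128$ and at the size of one AES S-box, $n = m = 8$:

```latex
\[
n = 128:\quad n^{3}\,2^{n} = \bigl(2^{7}\bigr)^{3}\cdot 2^{128} = 2^{21+128} = 2^{149},
\qquad
n = 8:\quad n^{3}\,2^{n} = \bigl(2^{3}\bigr)^{3}\cdot 2^{8} = 2^{17}.
\]
```

Reducing the problem to $k$ independent $m$-bit instances is therefore what brings the affine equivalence step within practical reach.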

Overview of the Algorithm
In a nutshell, the idea of the algorithm is to first isolate the input and output subspaces of each S-box, then apply the generic affine equivalence algorithm by Dinur to each S-box separately.
Thus, the first step of the algorithm is to find the input subspace of each S-box. More precisely, we want to build a subspace of dimension $m$ of the input space, such that this subspace spans all $2^m$ possible values at the input of a single fixed S-box, and yields a constant value at the input of all other S-boxes. To achieve this, we use a differential cryptanalysis approach. Namely, we pick uniformly at random an input difference $\Delta$. With probability $2^{-m}$, $\Delta$ yields a zero difference at the input of a particular S-box. We can easily ascertain whether this is the case by checking that the set of output differences generated by input difference $\Delta$ spans a subspace of dimension $n - m$. If that is the case, then $\Delta$ yields a zero difference at the input of one S-box, and non-zero differences at the output of all other $k - 1$ S-boxes.
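The dimension test on output differences can be sketched as follows, on a toy instance with parameters of our own choosing ($m = 4$, $k = 3$, random affine layers, a randomly shuffled S-box). The function `inactive_blocks` uses the secret layer $A$ and serves only as ground truth for the illustration; the attacker would only call `output_difference_rank`. All names are hypothetical.

```python
import random

M_BITS, K = 4, 3
N = M_BITS * K
MASK = (1 << M_BITS) - 1
rng = random.Random(11)
SBOX = list(range(1 << M_BITS))
rng.shuffle(SBOX)  # arbitrary bijective toy S-box

def parity(x):
    return bin(x).count("1") & 1

def mat_vec(rows, x):
    """GF(2) matrix (list of row bitmasks) times the bit-vector x."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def add_to_basis(basis, v):
    """Insert v into an XOR basis keyed by leading bit; True if v was independent."""
    while v:
        h = v.bit_length() - 1
        if h not in basis:
            basis[h] = v
            return True
        v ^= basis[h]
    return False

def rank(vectors):
    basis = {}
    return sum(add_to_basis(basis, v) for v in vectors)

def random_invertible(rng, n):
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        if rank(rows) == n:
            return rows

LA, LB = random_invertible(rng, N), random_invertible(rng, N)  # secret linear parts
CA, CB = rng.getrandbits(N), rng.getrandbits(N)                # secret constants

def sbox_layer(x):
    return sum(SBOX[(x >> (M_BITS * i)) & MASK] << (M_BITS * i) for i in range(K))

def F(x):
    """Encoded round F = B o S o A, the only map the attacker may query."""
    return mat_vec(LB, sbox_layer(mat_vec(LA, x) ^ CA)) ^ CB

def output_difference_rank(delta, samples=N + 8):
    """Rank of the span of F(x) ^ F(x ^ delta) over random inputs x."""
    xs = [rng.getrandbits(N) for _ in range(samples)]
    return rank(F(x) ^ F(x ^ delta) for x in xs)

def inactive_blocks(delta):
    """Ground truth (uses the secret A): S-boxes receiving a zero input difference."""
    d = mat_vec(LA, delta)
    return [i for i in range(K) if ((d >> (M_BITS * i)) & MASK) == 0]
```

Whenever `inactive_blocks(delta)` is non-empty, the rank returned by `output_difference_rank(delta)` cannot exceed $n - m$. The converse direction, on which the test relies, additionally assumes that active S-boxes produce output differences spanning their full $m$-bit space, which the toy S-box here is not guaranteed to satisfy.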
By repeating this process a few times, we can eventually find $n - m$ linearly independent input differences that yield a zero difference at the input of the same S-box. By going through this process for each S-box, we recover $k$ spaces of dimension $n - m$, each yielding a zero difference at the input of a distinct S-box. Now if we pick any $k - 1$ of these spaces and compute their intersection, we obtain a space of dimension $m$ that yields a zero difference at the input of $k - 1$ S-boxes, and spans all values at the input of the remaining S-box. This is precisely the space we wanted to build.
Indeed, if we query the overall permutation $F$ on all $2^m$ values forming such a subspace, we obtain a mapping that is affine equivalent to the corresponding S-box. It remains to apply the affine equivalence algorithm by Dinur to recover affine mappings witnessing the affine equivalence for that S-box. We repeat this process for all S-boxes. Finally, we merge together the affine mappings thus recovered for each S-box to obtain the overall solution.
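One standard way to intersect subspaces of $\mathbb{F}_2^n$ given by bases is to compute a parity-check description of each space, stack the checks, and take the kernel. The sketch below does this on toy parameters ($n = 12$, $m = 4$, $k = 3$); the spaces `V[i]` are derived from a secret linear map `LA` purely as stand-ins for the spaces the attack would find by differential testing, and all names are hypothetical.

```python
import random

M_BITS, K = 4, 3
N = M_BITS * K
rng = random.Random(58)

def parity(x):
    return bin(x).count("1") & 1

def mat_vec(rows, x):
    """GF(2) matrix (list of row bitmasks) times the bit-vector x."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def add_to_basis(basis, v):
    """Insert v into an XOR basis keyed by leading bit; True if v was independent."""
    while v:
        h = v.bit_length() - 1
        if h not in basis:
            basis[h] = v
            return True
        v ^= basis[h]
    return False

def random_invertible(rng, n):
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        basis = {}
        if all(add_to_basis(basis, r) for r in rows):
            return rows

def kernel(rows, n):
    """Basis of {x in F_2^n : <r, x> = 0 for every r in rows}."""
    pivots = {}  # pivot bit -> fully reduced row
    for r in rows:
        for p, row in pivots.items():
            if (r >> p) & 1:
                r ^= row
        if r:
            p = r.bit_length() - 1
            for q in pivots:
                if (pivots[q] >> p) & 1:
                    pivots[q] ^= r
            pivots[p] = r
    basis = []
    for f in (b for b in range(n) if b not in pivots):
        v = 1 << f  # one free coordinate set, pivot coordinates solved for
        for p, row in pivots.items():
            if (row >> f) & 1:
                v |= 1 << p
        basis.append(v)
    return basis

def intersect(spaces, n):
    """Intersection of subspaces given by bases: stack parity checks, take the kernel."""
    checks = [h for basis in spaces for h in kernel(basis, n)]
    return kernel(checks, n)

LA = random_invertible(rng, N)  # secret linear part of A (ground truth only)
# Stand-ins for the V_i: differences whose image under A is zero on block i.
V = [kernel(LA[M_BITS * i: M_BITS * (i + 1)], N) for i in range(K)]
I0 = intersect([V[1], V[2]], N)  # differences that activate S-box 0 only
```

Here each `V[i]` has dimension $n - m$ and `I0` has dimension $m$, matching the dimension count in the text.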

Description of the Algorithm
We will first detail our algorithm in the case that all S-boxes are the same, and then explain how to adapt it to the case of different S-boxes. The main idea to solve this problem is to find all input difference spaces $I_i$ which activate only one of the S-boxes. That is, for a difference $\Delta \in I_i$ and any message $x \in \mathbb{F}_2^n$, the difference after the application of $A$, i.e. $\Delta' = A(x) \oplus A(x \oplus \Delta)$, is zero except on $m$ consecutive bits corresponding to the input of the $i$-th S-box. Indeed, for such an input difference space $I_i \subset \mathbb{F}_2^n$, since the S-boxes are bijective, the output difference space $O_i = \{F(x) \oplus F(x \oplus \Delta) : \Delta \in I_i\} \subset \mathbb{F}_2^n$ is of dimension $m$, for any $x \in \mathbb{F}_2^n$. Note that this output space $O_i$ does not depend on the choice of $x$. Therefore we can compute affine mappings $P_i$ (from $\mathbb{F}_2^m$ to $I_i$) and $Q_i$ (from $O_i$ to $\mathbb{F}_2^m$) such that $Q_i \circ F \circ P_i$ is a mapping on $\mathbb{F}_2^m$ which is affine equivalent to the S-box $S$. We can then use the affine equivalence algorithm by Dinur to recover two affine mappings $A_i$, $B_i$ such that $Q_i \circ F \circ P_i = B_i \circ S \circ A_i$. By doing this for each S-box, we will be able to build two affine layers $A'$ and $B'$ such that $F = B' \circ (S, \ldots, S) \circ A'$.
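The claim that restricting $F$ to a coset of $I_i$ gives $2^m$ outputs lying in an $m$-dimensional affine space can be checked on a toy instance. In the sketch below ($m = 4$, $k = 3$, parameters of our own choosing), `input_space` computes $I_i$ from the secret layer $A$ as ground truth; in the attack this space would come from the differential search instead. All names are hypothetical.

```python
import random

M_BITS, K = 4, 3
N = M_BITS * K
MASK = (1 << M_BITS) - 1
rng = random.Random(62)
SBOX = list(range(1 << M_BITS))
rng.shuffle(SBOX)  # arbitrary bijective toy S-box

def parity(x):
    return bin(x).count("1") & 1

def mat_vec(rows, x):
    return sum(parity(r & x) << i for i, r in enumerate(rows))

def add_to_basis(basis, v):
    """Insert v into an XOR basis keyed by leading bit; True if v was independent."""
    while v:
        h = v.bit_length() - 1
        if h not in basis:
            basis[h] = v
            return True
        v ^= basis[h]
    return False

def rank(vectors):
    basis = {}
    return sum(add_to_basis(basis, v) for v in vectors)

def random_invertible(rng, n):
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        if rank(rows) == n:
            return rows

def kernel(rows, n):
    """Basis of {x in F_2^n : <r, x> = 0 for every r in rows}."""
    pivots = {}
    for r in rows:
        for p, row in pivots.items():
            if (r >> p) & 1:
                r ^= row
        if r:
            p = r.bit_length() - 1
            for q in pivots:
                if (pivots[q] >> p) & 1:
                    pivots[q] ^= r
            pivots[p] = r
    basis = []
    for f in (b for b in range(n) if b not in pivots):
        v = 1 << f
        for p, row in pivots.items():
            if (row >> f) & 1:
                v |= 1 << p
        basis.append(v)
    return basis

def span(basis):
    """All 2^dim elements of the space generated by `basis` (0 comes first)."""
    elems = [0]
    for b in basis:
        elems += [e ^ b for e in elems]
    return elems

LA, LB = random_invertible(rng, N), random_invertible(rng, N)
CA, CB = rng.getrandbits(N), rng.getrandbits(N)

def F(x):
    s = mat_vec(LA, x) ^ CA
    s = sum(SBOX[(s >> (M_BITS * i)) & MASK] << (M_BITS * i) for i in range(K))
    return mat_vec(LB, s) ^ CB

def input_space(i):
    """Ground-truth I_i (uses the secret A): differences touching only S-box i."""
    return kernel([LA[j] for j in range(N) if j // M_BITS != i], N)

def restricted_outputs(i, x):
    """Values of F on the coset x + I_i, obtained with 2^m queries."""
    return [F(x ^ d) for d in span(input_space(i))]
```

The $2^m$ values returned by `restricted_outputs` are pairwise distinct and their differences span a space of dimension exactly $m$, so after choosing coordinates on both sides one obtains an $m$-bit permutation to feed to an affine equivalence algorithm.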
Computing the $I_i$'s. To compute the input spaces that we are looking for, we will begin by computing all input spaces $V_i$ which activate at most $k - 1$ S-boxes. More precisely, for $i$ from 1 to $k$ the space $V_i$ is such that, for any $\Delta \in V_i$ and $x \in \mathbb{F}_2^n$, we have that $A(x) \oplus A(x \oplus \Delta)$ is zero on the $m$ bits corresponding to the input of the $i$-th S-box. There are $k$ such spaces and once we have them, we can recover all the input spaces $I_j$ by computing the intersection of $k - 1$ spaces $V_i$.
Computing the $V_i$'s. We first remark that if we have a difference $\Delta \in V_i$, then the output vector space of differences $O_i$ will be of dimension $n - m$ instead of $n$ since one S-box will be inactive. This is the test we will use to construct the $V_i$'s. The idea is to pick a difference $\Delta$ at random as well as $n - m + l$ messages and then check whether the dimension of the output is lower than or equal to $n - m$. For a large enough value $l$, a difference $\Delta$ will satisfy the condition if and only if it belongs to one of the $V_i$'s. Repeating this procedure enough times would allow us to fully recover the spaces $V_i$. However this would lead to a lot of rank computations. Instead we observe that, once we have found an element of $V_i$, we can build the full output difference space $O_i$. Hence we compute a parity-check matrix of $O_i$, i.e. a matrix $H_i$ such that for any $x \in \mathbb{F}_2^n$, $H_i \cdot x = 0$ if and only if $x \in O_i$. This parity-check matrix can be used to quickly verify whether a vector belongs to $O_i$, and, as a result, whether a difference $\Delta$ belongs to $V_i$.
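The rank test can be illustrated on a toy instance. The sketch below is not the authors' implementation: it assumes a hypothetical 8-bit map $F = B \circ (S, S) \circ A$ with $m = 4$, uses the PRESENT S-box as a stand-in for $S$, draws $A$ and $B$ as random unitriangular (hence invertible) matrices, and, to stay deterministic, uses all $2^n$ messages instead of $n - m + l$ random ones. All function names are illustrative.

```python
import random

# PRESENT S-box, used here only as a convenient 4-bit stand-in for S.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N, M = 8, 4  # toy sizes: n = 8 bits, m = 4 bits, hence k = 2 S-boxes


def apply(cols, v):
    """Multiply a GF(2) matrix (list of columns stored as ints) by the bit vector v."""
    out, j = 0, 0
    while v:
        if v & 1:
            out ^= cols[j]
        v >>= 1
        j += 1
    return out


def random_invertible(rng):
    """Random lower unitriangular matrix over GF(2): invertible by construction."""
    full = (1 << N) - 1
    return [(1 << j) | (rng.getrandbits(N) & full & ~((2 << j) - 1)) for j in range(N)]


def make_toy(seed):
    """Return the secret input layer A and the public map F = B o (S, S) o A."""
    rng = random.Random(seed)
    A, B = random_invertible(rng), random_invertible(rng)

    def F(x):
        u = apply(A, x)
        return apply(B, SBOX[u & 0xF] | (SBOX[u >> 4] << 4))

    return A, F


def gf2_rank(vectors):
    """Rank over GF(2) of a collection of bit vectors stored as ints."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def inactive_sbox_differences(F):
    """Nonzero differences whose output differences span at most n - m dimensions."""
    found = set()
    for delta in range(1, 1 << N):
        diffs = (F(x) ^ F(x ^ delta) for x in range(1 << N))
        if gf2_rank(diffs) <= N - M:
            found.add(delta)
    return found


def ground_truth(A):
    """Differences for which the low or the high nibble of A(delta) is zero."""
    return {d for d in range(1, 1 << N)
            if apply(A, d) & 0xF == 0 or apply(A, d) >> 4 == 0}
```

On this toy instance the differences flagged by the rank test are exactly the nonzero elements of $V_1 \cup V_2$, which can be checked against `ground_truth` because the sketch keeps the secret layer `A` around.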
Recovering affine layers. The two previous steps allow us to build the spaces $I_i$ and $O_i$ that we were looking for. As described above, we thus get some affine mappings $A_i, B_i, P_i, Q_i$ for $i = 1, \dots, k$. Note that we do not know which S-box is activated by the space $I_i$, and thus one could think that we need to try all possible arrangements of those affine mappings. However this is not necessary, since we could always write $F$ as $F = B \circ P^{-1} \circ (S, \dots, S) \circ P \circ A$ where $P$ is a permutation over the consecutive blocks of $m$ bits. Therefore, we build a block diagonal affine mapping $D_A$ (resp. $D_B$) where the blocks are the mappings $A_1, \dots, A_k$ (resp. $B_1, \dots, B_k$), as well as the two affine mappings $P$ and $Q$ built from the $P_i$ and the $Q_i$ respectively.

Complexity of the algorithm. The first step is to compute all vector spaces $V_i$. We can split this step into two parts. First, the computation of the output space $O_i$. Note that our test only checks whether $\Delta \in \bigcup_{j=1}^{k} V_j$, and this happens with probability $k 2^{-m}$. Hence we need to try $2^m$ values for $\Delta$ on average to determine all the $k$ output spaces. Taking $n - m + l$ elements in $X$ leads to a probability of a false positive, i.e. $\mathrm{rank}(O_i) = n - m$ while $\Delta$ activates all S-boxes, of $2^{-ml}$ for one value of $\Delta$. The effective value of $l$ will depend on the overall probability of failure that we wish to achieve for the whole algorithm and will be detailed below. Then computing the rank of $O_i$ can be done in $(n - m + l)^2 n = O(n^3)$ operations. All in all, the computation of the output spaces costs $O(2^m n^3)$ operations.

The second part is to compute a basis of the input space $V_i$ which is of dimension $n - m$. To get each of those $n - m$ vectors (minus $\Delta_0$ which we already know), we first remark that as above, the probability that a difference $\Delta$ is valid is $2^{-m}$, hence $2^m$ tries for $\Delta$. Each value of $\Delta$ will be tested using $l$ values of $x$, leading to a probability of false positive of $2^{-ml}$ for one specific $\Delta$. The parity-check matrix of $O_i$ can be computed at the same time as the rank computation, and thus adds no cost here. This matrix is of size $m \times n$, therefore checking if one output difference belongs to $O_i$ costs about $O(mn)$ operations. Therefore, using that $n = km$, the complexity of computing the basis of size $n - m$ for each of the $k$ spaces is $O(2^m l n^3)$.

Computing all intersections of $k - 1$ vector spaces $V_i$ can be done in $O(kn^3)$ operations using the algorithm in Appendix A. Then, we need to make $k$ calls to the affine equivalence algorithm, which leads to a complexity of $O(k m^3 2^m)$. All in all, the total complexity of our algorithm is $O(2^m n^3 + 2^m l n^3 + k n^3 + k m^3 2^m)$.

As mentioned previously, the algorithm from [Din18] was designed for random permutations. This algorithm has a certain probability to fail, which is higher when the size of the S-box is low, or when the affine equivalence problem has multiple solutions, which is the case for the AES S-box since it is self-affine equivalent. This was mentioned by the author, along with a trick which could make the algorithm work on the AES S-box. However, we did implement this trick, along with further tweaking, and the algorithm would still fail for this specific choice of S-box. Hence, if the algorithm from Dinur fails, one would need to use the algorithm from Biryukov et al. [BCBP03], which raises the complexity to $O(2^m n^3 + 2^m l n^3 + k n^3 + k m^3 2^{2m})$.

Algorithm 1 Computing $A'$ and $B'$ (partial listing)
    …
    Go back to line 2
    else    ▷ with probability $2^{-m}$
    …    ▷ using a parity-check matrix of $O_i$
    13: end while
    …
    16: end if
    17: end for
    18: for each intersection $I_j$ do
    19:     compute an $m$-bit to $n$-bit projection $P_j$ from $\mathbb{F}_2^m$ to $I_j$
    20:     compute an $n$-bit to $m$-bit projection $Q_j$    ▷ $\tilde{S} = Q_j \circ F \circ P_j$ is a bijection over $\mathbb{F}_2^m$ which is affine equivalent to $S$
    23:     use the affine equivalence algorithm from Dinur to recover two affine mappings $A_j, B_j$ of size $m$ such that $\tilde{S} = B_j \circ S \circ A_j$
    24: end for
    25: $D_A \leftarrow \mathrm{diag}(A_1, \dots, A_k)$    ▷ block diagonal affine mapping with block size $m$
    26: $D_B \leftarrow \mathrm{diag}(B_1, \dots, B_k)$    ▷ block diagonal affine mapping with block size $m$
    27: $P \leftarrow$ …
    ▷ That way, we have $F = B' \circ (S, \dots, S) \circ A'$

Distinct S-boxes. In the analysis so far, we have assumed that all $k$ S-boxes are identical. In Appendix B, we discuss how the previous algorithm can be adapted to handle the case of distinct S-boxes. The complexity then becomes $O(2^m l n^3 + n^4/m + 2^{2m} m n^2)$ when using the algorithm from Biryukov et al., and $O(2^m l n^3 + n^4/m + 2^m m n^2)$ when using the improved affine equivalence algorithm from Dinur (cf. Appendix B).

Probability of failure
In Appendix C, we provide an analysis of the failure probability of Algorithm 1. Recall that the number of messages we use within the algorithm is parametrized by the value $l$. Intuitively, the probability of failure decreases with $l$.
In fact, as shown in Appendix C, the probability of failure can be approximated by $(k(n - m) + 1)\, 2^{m(1 - l)}$.
As an example, for the Baek et al. proposal, the parameters are $n = 256$, $m = 8$ and $k = 32$. Hence, using only $l = 5$ messages, the failure probability is at most $2^{-16}$. In practice, failures are not a concern: in our experiments we set $l = 5$, and never encountered a failure.
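As a quick numerical check, the approximation $(k(n - m) + 1)\, 2^{m(1 - l)}$ can be evaluated directly. The helper below is only an illustrative sketch (the function name is not from the paper); for the parameters above it evaluates below the $2^{-16}$ figure quoted in the text.

```python
from math import log2


def log2_failure_bound(n, m, l):
    """log2 of (k(n - m) + 1) * 2^(m(1 - l)), where k = n / m is the number of S-boxes."""
    k = n // m
    return log2(k * (n - m) + 1) + m * (1 - l)


# Parameters of the Baek et al. proposal with l = 5 extra messages.
print(round(log2_failure_bound(256, 8, 5), 2))
```

Increasing `l` by one lowers the bound by a factor $2^m$, which is why small values of $l$ suffice in practice.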

Description of the White-Box Scheme by Baek et al.
Baek et al. provide a toolbox to break any white-box scheme in the CEJO framework [BCH16]. Their results suggest that the main weakness in the previous proposals for white-box AES is the size of the internal state. Thus, they proposed to concatenate two AES instances, and encode them together in order to increase the size of the internal state (Fig. 2). We note that their proposal is a white-box scheme with external encodings. Baek et al. also showed that the cost of removing the non-linear encodings is lower than recovering the affine encodings, so they focused only on designing affine encodings. Let us recall the round function of AES, denoted as $\mathrm{AES}^{(r)}$, built from the four sub-steps AddRoundKey (ARK), SubBytes (SB), ShiftRows (SR) and MixColumns (MC):
\[ \mathrm{AES}^{(r)} = \mathrm{MC} \circ \mathrm{SR} \circ \mathrm{SB} \circ \mathrm{ARK}_{K^r}. \]
Thus, the encoded round function is the 256-bit to 256-bit mapping
\[ F^{(r)} = (A^{(r+1)})^{-1} \circ (\mathrm{AES}^{(r)}, \mathrm{AES}^{(r)}) \circ A^{(r)}, \]
where the $A^{(r)}$ are affine mappings on 256 bits. However, using a random affine mapping would result in some impractical tables since these mappings are of input size 256. Therefore, they proposed to build 32 tables from 16 bits to 256 bits for each round, using some structured affine mappings as follows. Let $A^r$ be an invertible linear map of dimension 256 over $\mathbb{F}_2$, and denote the $(i, j)$-th $8 \times 8$ block of $A^r$ by $A^r_{i,j}$, $i, j = 0, \dots, 31$. Then $A^r$ is built such that $A^r_{i,j}$ is the zero matrix for all $(i, j)$ other than $(i, i)$, $(i, i + 1)$ and $(31, 0)$. Finally, let $a^r = (a^r_0, \dots, a^r_{31})$ be a random 256-bit vector, where each $a^r_i$ is an 8-bit block. Then we define the input encoding of the $r$-th round $A^{(r)}$ with:
\[ A^{(r)}(x) = A^r \cdot x \oplus a^r. \]
To generate the tables, we will merge $(A^{(r+1)})^{-1}$ with the linear part of AES, that is, we define
\[ M^{(r)} = (A^{(r+1)})^{-1} \circ (\mathrm{MC} \circ \mathrm{SR}, \mathrm{MC} \circ \mathrm{SR}), \]
which is an affine mapping of size 256. Then, as depicted in Fig. 2, our encoded round function becomes
\[ F^{(r)} = M^{(r)} \circ (S, \dots, S) \circ \mathrm{ARK}_{K^r} \circ A^{(r)}, \]
where $K^r$ is the $r$-th round key. The last round ($r = 10$) is slightly different and will be treated in a later part.

Table construction.
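We split the linear part of $M^{(r)}$ into 32 linear blocks $M^r_i$ of size $256 \times 8$ such that $M^{(r)}(x) = (M^r_0, \dots, M^r_{31}) \cdot x \oplus m^r$ where $m^r$ is a 256-bit vector representing the affine part of $M^{(r)}$. Also take 31 random 256-bit vectors $m^r_i$, $i = 0, \dots, 30$ and $m^r_{31} = m^r \oplus m^r_0 \oplus \dots \oplus m^r_{30}$. Then for $i = 0, \dots, 31$, we have the 16-bit to 256-bit tables $F^{(r)}_i$ defined as:
\[ F^{(r)}_i(x, y) = M^r_i \cdot S\big(\mathrm{AC}_{a^r_i \oplus K^r_i}(A^r_{i,i} \cdot x \oplus A^r_{i,i+1} \cdot y)\big) \oplus m^r_i, \]
where $\mathrm{AC}_a$ is defined as $\mathrm{AC}_a(x) = x \oplus a$, $K^r_i$ is the $i$-th byte of the round key material, and the indices are taken modulo 32 when necessary. Thus, one can evaluate the encoded round function $F^{(r)}$ as the sum of the $F^{(r)}_i$:
\[ F^{(r)}(x_0, \dots, x_{31}) = \bigoplus_{i=0}^{31} F^{(r)}_i(x_i, x_{i+1}). \]
Therefore to implement our encoded round function $F^{(r)}$, instead of having an unreasonable 256-bit to 256-bit table, we just need to store 32 tables from 16 bits to 256 bits. However, the partial application $x \mapsto F^{(r)}_i(x, y)$ for a fixed $y$ is an 8-bit to 256-bit mapping which can be reduced to an 8-bit bijection by applying a projection. Then it is affine equivalent to $S$, and one can efficiently recover the affine mappings with the affine equivalence algorithm described in [BCBP03] in about $2^{25}$ operations. To prevent this weakness, Baek et al. proposed to replace $F^{(r)}_i$ by
\[ T^{(r)}_i(x, y) = F^{(r)}_i(x, y) \oplus h^{(r)}_i(x) \oplus h^{(r)}_{i+1}(y), \]
where $h^{(r)}_i$ is a random 8-bit to 256-bit function, and we get
\[ \bigoplus_{i=0}^{31} T^{(r)}_i(x_i, x_{i+1}) = \bigoplus_{i=0}^{31} F^{(r)}_i(x_i, x_{i+1}) = F^{(r)}(x_0, \dots, x_{31}), \]
using the fact that the indices are taken modulo 32. We will later see that this choice was not enough to hide the structure of $F^{(r)}_i$.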
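The cancellation of the random masks $h_i$ when the 32 tables are summed can be checked with a few lines of code. This is a minimal sketch under stated assumptions: `toy_F` is an arbitrary placeholder for the real $F_i$ (any fixed function works for this check), and the helper names are illustrative rather than taken from the scheme.

```python
import random

K, WIDTH = 32, 256  # 32 tables per round, 256-bit table outputs


def toy_F(i, x, y):
    """Placeholder for F_i(x, y); the check below holds for any fixed function."""
    mixed = (i + 1) * 0x9E3779B97F4A7C15 + x * 0xC2B2AE3D27D4EB4F + y * 0x165667B19E3779F9
    return (mixed * mixed) % (1 << WIDTH)


def make_masked_tables(rng):
    """Return T with T(i, x, y) = F_i(x, y) ^ h_i(x) ^ h_{i+1}(y), indices modulo K."""
    h = [[rng.getrandbits(WIDTH) for _ in range(256)] for _ in range(K)]

    def T(i, x, y):
        return toy_F(i, x, y) ^ h[i][x] ^ h[(i + 1) % K][y]

    return T


def evaluate(table, state):
    """XOR of table(i, x_i, x_{i+1}) over all i, for a state made of K bytes."""
    out = 0
    for i in range(K):
        out ^= table(i, state[i], state[(i + 1) % K])
    return out
```

Each value $h_j(x_j)$ is added once by table $j$ and once by table $j - 1$, so `evaluate(T, state)` and `evaluate(toy_F, state)` agree on every state even though individual table entries differ.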
External encodings. Consider two random 256-bit affine functions $M_{in}$ and $M_{out}$. The external input encoding function is then defined by $F^{(0)} = (A^{(1)})^{-1} \circ M_{in}$, which is implemented with a $256 \times 256$ matrix and a 256-bit vector. The external output encoding $M_{out}$ allows us to define the last encoded round function as $F^{(10)} = M_{out} \circ (\mathrm{AES}^{(10)}, \mathrm{AES}^{(10)}) \circ A^{(10)}$, where $\mathrm{AES}^{(10)}$ denotes the last round of AES. This function is then split into 32 tables $T^{(10)}_i$ using the same technique as above. That way, we have
\[ F^{(10)} \circ \dots \circ F^{(1)} \circ F^{(0)} = M_{out} \circ (\mathrm{AES}, \mathrm{AES}) \circ M_{in}. \]
Since one encoded round function is implemented with 32 tables from 16 bits to 256 bits, the memory required for each encoded round function is $32 \times 2^{16} \times 256$ bits $= 64$ MB, leading to 640 MB for the full scheme with external encodings. In their paper, Baek et al. evaluate the security of this construction to $2^{110}$ using their toolbox. However, as we will show in the next section, we are able to decrypt any message in $\sim 10 \times 2^{30}$ operations, and fully break this construction by recovering the key in $\sim 2^{31}$ operations.

Cryptanalysis of the Scheme by Baek et al.
Baek et al. assessed the security level of their proposal at 110 bits. Recall that each encoded round function is of the form $F = M \circ (S, \dots, S) \circ A$ where $M$ and $A$ are affine mappings. Therefore, our generic algorithm from Section 2 can be used to compute an equivalent round function $F = M' \circ (S, \dots, S) \circ A'$ where $A'$ and $M'$ are known affine mappings, in about $2^{34.6}$ operations. However, one can exploit the specific structure of the encodings to mount a more efficient dedicated attack on their scheme. We will first give a method of complexity $\sim 2^{30}$ to recover an equivalent representation of one encoded round function that is computationally easy to invert. Next, we will show that instead of using this method 10 times (once for each round function), we are able to fully break this scheme in $\sim 2^{31}$ operations, that is, to recover the secret key used in the underlying AES as well as the external encodings $M_{in}$ and $M_{out}$.

Building an Equivalent Representation of the Scheme
Let us consider one encoded round function and drop the exponent notation for the round as it is not relevant here, and also merge the key addition with the input affine encoding. Given an encoded round function $F$ of the form $M \circ (S, \dots, S) \circ A$ where $M$ and $A$ are secret affine mappings, and $A$ has the structure depicted in (1), our goal is to provide a representation of $F$ that is computationally easy to invert, that is, to find two equivalent affine mappings $M'$ and $A'$ such that $F = M' \circ (S, \dots, S) \circ A'$. In that case, inverting one round would only cost two inversions of 256-bit affine mappings. Remember that the encoded round function is hidden in the tables $T_i(x, y) = F_i(x, y) \oplus h_i(x) \oplus h_{i+1}(y)$ where the $h_i$ are random functions.

Reducing the Problem to Block Diagonal Input Encodings
Finding the input encoding can easily be done if this encoding is a block diagonal affine mapping where each block is of size 8. By applying an appropriate projection, one can obtain some 8-bit bijections that are affine equivalent to the AES S-box. In that case, recovering the affine mappings used can be done in about $2^{25}$ operations with the affine equivalence algorithm from [BCBP03]. Because of the random mappings $h_i$, one cannot use this algorithm directly on the tables in the Baek et al. proposal. However, we will show that we can decompose the secret input encoding $A$ as $A = B \circ \tilde{A}$ where:
• $B$ is a secret block diagonal affine mapping, built from blocks $B_i$ of size $8 \times 8$,
• $\tilde{A}$ is a known linear mapping which has the same structure as $A$ (1).
Let us denote the 16-bit to 8-bit linear mapping $L_i = (A_{i,i} \mid A_{i,i+1})$, which is unknown to the attacker. By construction, since we want the affine encodings to be invertible, we know that $L_i$ is of rank 8. If one is able to recover $\mathrm{Ker}\, L_i$, which is then a linear subspace of $\mathbb{F}_2^{16}$ of dimension $16 - 8 = 8$, then there exists an $8 \times 8$ invertible matrix $B_i$ such that $L_i = B_i \cdot (0_8 \mid \mathrm{Id}_8) \cdot V_i^{-1}$, where the linear mapping $V_i$ is built as $(v_1 \dots v_{16})$ with $\{v_1, \dots, v_8\}$ a basis of $\mathrm{Ker}\, L_i$ and $\{v_9, \dots, v_{16}\}$ a completion of this basis. In that case, while the matrices $B_i$ are still unknown to the attacker and will form the block diagonal matrix $B$, one can build the matrix $\tilde{A}$ from the $8 \times 16$ blocks $(0_8 \mid \mathrm{Id}_8) \cdot V_i^{-1}$. So now, we only need a way to compute $\mathrm{Ker}\, L_i$ from the tables $T_i$, which can be done using the following lemma.
A proof of Lemma 1 is provided in Appendix D. Note that the third point is a strict implication. Indeed, if one takes $x \in \mathrm{Ker}\, A_{i,i}$, one can easily see that for any $y \in \mathbb{F}_2^8$, the third equation holds while $(x, y)$ is not necessarily in $\mathrm{Ker}\, L_i$. So to compute $\mathrm{Ker}\, L_i$, we first need to recover $\mathrm{Ker}\, A_{i,i}$ and $\mathrm{Ker}\, A_{i,i+1}$.
We can safely assume that if $x \notin \mathrm{Ker}\, A_{i,i}$, the function $f_x$ behaves like a random function and is then constant only with negligible probability. Therefore, by choosing any $(a, b) \in \mathbb{F}_2^8 \times \mathbb{F}_2^8$, one can check if $x \in \mathrm{Ker}\, A_{i,i}$ by computing $f_x$ and checking whether or not $f_x$ is constant. Obviously, the same method can be applied to recover $\mathrm{Ker}\, A_{i,i+1}$.
Once $\mathrm{Ker}\, A_{i,i}$ and $\mathrm{Ker}\, A_{i,i+1}$ are recovered, one can recover the remaining elements $(x, y) \in \mathrm{Ker}\, L_i$ with $x \notin \mathrm{Ker}\, A_{i,i}$ and $y \notin \mathrm{Ker}\, A_{i,i+1}$ by using the third implication: if $(x, y) \notin \mathrm{Ker}\, L_i$, we can assume that the resulting value of the equation behaves like a random variable over $\mathbb{F}_2^8$ and is then equal to 0 with probability $2^{-8}$. Therefore, one can check if $(x, y) \in \mathrm{Ker}\, L_i$ by choosing a few values for $(a, b)$ and checking if the equation holds for all these $(a, b)$. Pseudo-code for this step is provided in Appendix F.
In that way, we can recover $\mathrm{Ker}\, L_i$ in roughly $2^{18}$ table lookups using the method described above and which is summarized in Algorithm 1. Since we need to repeat this operation 32 times, we end up with a complexity of $\sim 2^{23}$ table lookups to decompose $A$ into $A = B \circ \tilde{A}$.
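The kernel recovery can be sketched on a toy table. Everything below is an illustration under stated assumptions rather than the authors' code: the table has the form $T(x, y) = M(S(A_1 x \oplus A_2 y \oplus c)) \oplus h(x) \oplus h'(y)$ with 16-bit outputs, $S$ is a random 8-bit permutation standing in for the AES S-box, and the two tests used (constancy of $b \mapsto T(a \oplus x, b) \oplus T(a, b)$, and the four-term relation $T(a, b) \oplus T(a \oplus x, b) \oplus T(a, b \oplus y) \oplus T(a \oplus x, b \oplus y) = 0$) are relations that follow from this assumed table form.

```python
import random


def apply(cols, v):
    """Multiply a GF(2) matrix (list of columns stored as ints) by the bit vector v."""
    out, j = 0, 0
    while v:
        if v & 1:
            out ^= cols[j]
        v >>= 1
        j += 1
    return out


def build_toy_table(rng):
    """Masked toy table T[x][y], together with its secret blocks A1 and A2."""
    S = list(range(256))
    rng.shuffle(S)  # random permutation, a stand-in for the AES S-box
    G = [(1 << j) | (rng.getrandbits(8) & 0xFF & ~((2 << j) - 1)) for j in range(8)]
    A1 = lambda x: apply(G, x & 0x3F)      # singular block, kernel {0x00, 0x40, 0x80, 0xC0}
    A2 = lambda y: apply(G, y & 0xFC)      # singular block, kernel {0, 1, 2, 3}
    R = [rng.getrandbits(8) for _ in range(8)]
    M = lambda v: v | (apply(R, v) << 8)   # injective linear map from 8 to 16 bits
    c = rng.getrandbits(8)
    h0 = [rng.getrandbits(16) for _ in range(256)]
    h1 = [rng.getrandbits(16) for _ in range(256)]
    T = [[M(S[A1(x) ^ A2(y) ^ c]) ^ h0[x] ^ h1[y] for y in range(256)]
         for x in range(256)]
    return T, A1, A2


def recover_kernel(T, rng, samples=8):
    """Recover Ker L for L = (A1 | A2) using table lookups only."""
    # x is in Ker A1 when b -> T(a ^ x, b) ^ T(a, b) is constant (a = 0 here).
    ker1 = {x for x in range(256)
            if len({T[x][b] ^ T[0][b] for b in range(256)}) == 1}
    ker2 = {y for y in range(256)
            if len({T[a][y] ^ T[a][0] for a in range(256)}) == 1}
    kernel = {(x, y) for x in ker1 for y in ker2}
    points = [(rng.randrange(256), rng.randrange(256)) for _ in range(samples)]
    # Remaining kernel elements: the four-term relation must hold at every sample.
    for x in set(range(256)) - ker1:
        for y in set(range(256)) - ker2:
            if all(T[a][b] ^ T[a ^ x][b] ^ T[a][b ^ y] ^ T[a ^ x][b ^ y] == 0
                   for a, b in points):
                kernel.add((x, y))
    return kernel
```

In this toy setting the recovered set can be compared with $\{(x, y) : A_1 x = A_2 y\}$, which has $2^8$ elements as expected for a rank-8 map on 16 bits. The sketch scans all pairs exhaustively and makes no attempt to reach the lookup count quoted above.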

Building an Equivalent Representation of the Round Function
At this point, our encoded round function is $F = M \circ (S, \dots, S) \circ B \circ \tilde{A}$ where $\tilde{A}$ is known and $B$ is block diagonal, built with $8 \times 8$ affine mappings $B_0, \dots, B_{31}$, but is still secret. Our goal is to find an equivalent representation of the round function, that is, to find affine mappings $M'$ and $B'$ which behave like $M$ and $B$ in the sense that $M \circ (S, \dots, S) \circ B = M' \circ (S, \dots, S) \circ B'$. The idea is to find 32 affine mappings $B'_i$ of size 8 to build $B'$. Note that here, these $B'_i$ will not necessarily be equal to $B_i$, but we will see that we can then build $M'$ in a way that solves this problem.
Recall that we can evaluate the encoded round function $F$ by summing over the tables $T_j$. For $x_i \in \mathbb{F}_2^8$, let us consider the function $x_i \mapsto F(\tilde{A}^{-1}(0, \dots, 0, x_i, 0, \dots, 0))$. Since $B$ is block diagonal with blocks of size 8, only one S-box will be active, and so this function is an 8-bit to 256-bit mapping of the form $H \circ S \circ B_i$ where $H$ is some affine function of size $8 \times 256$. Note that $H$, $S$ and $B_i$ are all injective (at least) by construction. So we can compute this function and deduce an affine projection $P$ such that $P \circ H \circ S \circ B_i$ is a bijection over $\mathbb{F}_2^8$. This bijection is then affine equivalent to the AES S-box, and we can use the affine equivalence algorithm from [BCBP03] to recover $B_i$ in $\sim 2^{25}$ operations.
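One way to obtain such a projection $P$ is to pick output coordinates on which the image of $H$ stays injective. The sketch below is one possible construction, not necessarily the one used by the authors: it builds an XOR basis of the output differences and keeps the leading-bit positions of that basis. The toy map uses 32-bit outputs and a random permutation in place of the AES S-box; all names are illustrative.

```python
import random


def apply(cols, v):
    """Multiply a GF(2) matrix (list of columns stored as ints) by the bit vector v."""
    out, j = 0, 0
    while v:
        if v & 1:
            out ^= cols[j]
        v >>= 1
        j += 1
    return out


def pivot_positions(vectors):
    """Leading-bit positions of an XOR basis of the given vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return [b.bit_length() - 1 for b in basis]


def project(z, positions):
    """Keep only the bits of z at the given positions (a linear projection)."""
    out = 0
    for j, p in enumerate(positions):
        out |= ((z >> p) & 1) << j
    return out


def reduce_to_bijection(outputs):
    """Given outputs[x] = H(S(B_i(x))) for all bytes x, return the table of P o H o S o B_i."""
    positions = pivot_positions(z ^ outputs[0] for z in outputs)
    return [project(z, positions) for z in outputs]


def toy_outputs(seed):
    """Hypothetical 8-bit to 32-bit map H o S o B_i with injective affine H."""
    rng = random.Random(seed)
    S = list(range(256))
    rng.shuffle(S)
    G = [(1 << j) | (rng.getrandbits(8) & 0xFF & ~((2 << j) - 1)) for j in range(8)]
    b0 = rng.getrandbits(8)
    H = [(rng.getrandbits(24) << 8) | (1 << j) for j in range(8)]  # identity on low byte
    c = rng.getrandbits(32)
    return [apply(H, S[apply(G, x) ^ b0]) ^ c for x in range(256)]
```

Because every nonzero element of the span has a 1 in at least one of the selected positions, the projected table is a permutation of the 256 byte values and can be handed to an affine equivalence algorithm.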
However, there are some self-equivalence relations on the AES S-box, which means there exist some affine mappings $A_1, A_2$ of size $8 \times 8$ such that $A_2 \circ S \circ A_1 = S$. Therefore, the affine equivalence algorithm will not exactly recover $B_i$, but one $B'_i = A_1 \circ B_i$ without knowing which $A_1$ is used. In our present case where we only want to provide an equivalent representation of the round function, this does not really matter. We can choose any candidate for each $B'_i$, and we will show how to build an affine mapping $M'$ to compensate for the action of $A_1$.
So we are looking at our equivalent round function $M' \circ (S, \dots, S) \circ B' \circ \tilde{A}$, where $B'$ and $\tilde{A}$ are known, but we still need to find $M'$. The overall strategy for that is depicted in Fig. 3 and detailed below. As in the description of the scheme, let us split the linear part of $M'$ into $(M'_0 \dots M'_{31})$ where $M'_i$ is of size $256 \times 8$. Algorithm 2 gives the procedure to compute $M'_i$. The idea is just to compute the image of each vector of the canonical basis through $M'$, which can be done using the fact that we fixed one candidate for each $B'_i$.

Algorithm 2 (partial listing)
    …    ▷ since we know $B'_i$
    for each $e_j = (0, \dots, 1, \dots, 0) \in \mathbb{F}_2^8$ do    ▷ with a 1 at the $j$-th position
        …
        $x^j_i \leftarrow B'^{-1}_i(S^{-1}(y^j_i))$
        $x^j \leftarrow (0, \dots, x^j_i, \dots, 0)$
        …
        $j$-th column of $M'_i \leftarrow \Delta z^j$    ▷ since $\Delta y^j = (0, \dots, e_j, \dots, 0)$
    end for

We can apply this method for all 32 blocks $B'_i$ to recover the linear part of $M'$. After that, to recover the affine translation $m'$ of $M'$, we only need to compute $z' = (M'_0 \dots M'_{31}) \cdot (S, \dots, S)(B' \circ \tilde{A}(x))$ and $z = \bigoplus_i T_i(x)$ for one $x \in \mathbb{F}_2^{256}$; then we can easily recover $m'$ since in that case $z = z' \oplus m'$.
So we are able to provide an equivalent representation of the encoded round function that is computationally easy to invert, namely $M' \circ (S, \dots, S) \circ B' \circ \tilde{A}$. The complexity of building $M'$ and $B'$ is dominated by the 32 calls to the affine equivalence algorithm to get each $B'_i$, which leads to a complexity of about $32 \times 2^{25} = 2^{30}$, which is therefore the complexity of this whole 1-round attack.

Building an Equivalent Representation of the Scheme
Therefore, we can already provide an attack on the full 10-round scheme: indeed, we just need to apply the above method to each encoded round function $F^{(r)}$. Note that the external encodings do not pose any problem here. For the external input encoding $M_{in}$, recall that we know the affine mapping $F^{(0)} = (A^{(1)})^{-1} \circ M_{in}$. Using the previous technique, we are able to recover an equivalent representation $F'^{(1)}$ of $F^{(1)}$, such that $F'^{(1)} = F^{(1)}$ while $F'^{(1)}$ is easy to invert. So since we then have $F'^{(1)} \circ F^{(0)} = F^{(1)} \circ F^{(0)}$, we do not need to do anything about $M_{in}$ to provide an equivalent representation of the scheme.
All in all, we have built 10 easy to invert equivalent round functions $F'^{(r)}$ such that $F'^{(10)} \circ \dots \circ F'^{(1)} \circ F^{(0)} = F^{(10)} \circ \dots \circ F^{(1)} \circ F^{(0)}$, which is the original scheme. The cost for doing this is to repeat the 1-round attack 10 times, which gives us a complexity of $10 \times 2^{30}$. While this is already practical, we only have an equivalent representation of the scheme: we have recovered neither the key nor the encodings.

Recovering the Key
While we could just use the previous method 10 times on each encoded round function to provide an easy to invert representation of the full scheme, we can do better and fully break the scheme by recovering the key in a more efficient way by exploiting two consecutive rounds.
So let us start at the point where we decomposed one round into $F = M \circ (S, \dots, S) \circ B \circ \tilde{A}$, with $\tilde{A}$ known and $B$ an affine block diagonal mapping. Recall that using the affine equivalence algorithm from [BCBP03] for each block does not give us exactly $B_i$, but roughly $2^{11}$ candidates $B'_i$. If we want to recover exactly the key and the encodings, we need to identify which candidate is exactly $B_i$. Note that since we have $2^{11}$ candidates for each of the 32 blocks $B_i$, we cannot exhaust all combinations.
To be able to quickly identify the correct candidate, one can first apply the previous method to two consecutive rounds. By doing so, we decompose these two rounds into
\[ F^{(r+1)} = M^{(r+1)} \circ (S, \dots, S) \circ B \circ \tilde{A}, \qquad F^{(r)} = M^{(r)} \circ (S, \dots, S) \circ C \circ \tilde{A}', \]
where $\tilde{A}$, $\tilde{A}'$ are known and $B$, $C$ are affine block diagonal mappings, which are still secret, but for which we know $2^{11}$ candidates for each block $B_i$ and $C_i$.
In that case, we can write $F^{(r)}$ with $(B \circ \tilde{A})^{-1}$ on the output side and $C \circ \tilde{A}'$ on the input side, up to the key addition discussed below. Since $B$ is block diagonal, so is its inverse, and we know $\tilde{A}$ and $\tilde{A}'$. Our problem is then reduced to block diagonal input and output encodings for one encoded round function, with $2^{11}$ candidates for each block $B_i$ and $C_i$. We now need to recover which are the correct $B_i$ and $C_i$. To do so, we will use a Meet-in-the-Middle approach depicted in Fig. 4 and detailed below.
The MixColumns operation of AES works on words of four bytes, so we restrict our work to four $B_i$ and the corresponding four $C_i$ that will be used as input of the same MixColumns operation. For example, as depicted in Fig. 4, we can first consider $B_0, B_1, B_2, B_3$ and $C_0, C_5, C_{10}, C_{15}$ which will be on the same MixColumns operation after the application of ShiftRows. For an easier understanding, we will describe our MITM method using these blocks, as it will be exactly the same for the other $B_i$ and $C_i$. The detailed procedure is given in Algorithm 3.

Algorithm 3 Identifying correct blocks (partial listing)
    $x^0, \dots, x^m \leftarrow$ messages with byte $x^j_0$ taking different values and $x^j_i = 0$ if $i \neq 0$
    for each candidate for $C_0$ do
        …
        $(\Delta z^j_0, \Delta z^j_1, \Delta z^j_2, \Delta z^j_3) \leftarrow \mathrm{MC} \cdot (\Delta w^j_0, 0, 0, 0)$
        Store $C_0$ in a hash table $T_z$ indexed by $\Delta z^1_0, \dots, \Delta z^m_0$
    end for
    …    ▷ $F^{(r)}$ can be evaluated using the tables $T_j$
    $\Delta y^j_0 \leftarrow y^0_0 \oplus y^j_0$
    for each candidate for $B_0$ do
        …    ▷ we have the correct $B_i$
        end if
    end for
    Once we have all the correct $B_0, B_1, B_2, B_3$, we can use the same kind of computation to identify the correct remaining $C_i$ using messages with $x^j_i$ taking different values and $x^j_l$ constant for $l \neq i$

We want to use the MITM to identify the correct $(B_0, C_0)$, for which we have $2^{22}$ candidates in total. As we search for a match on $\Delta z^1_0, \dots, \Delta z^m_0$ where each $\Delta z^j_0$ is an 8-bit value, taking $m = 4$ leads to a 32-bit filter, which is enough to leave only the right candidates. Building the hash table costs $\sim 2^{11}$, and so does the matching step. Once $B_0$ and $C_0$ are recovered, we only need to go through all the candidates for the remaining $B_i$ and $C_i$, which is done separately. Since we have $2^{11}$ candidates for each of them, the total cost of this step is roughly $8 \times 2^{11} = 2^{14}$. Finally, we need to apply this method to each of the 8 groups of 4 $B_i$ and 4 $C_i$, leading to a complexity of $\sim 2^{17}$ to recover $B$ and $C$.
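The filter size and the cost figures can be checked with a short count. Assuming that a wrong pair of candidates matches each 8-bit difference independently with probability $2^{-8}$ (a heuristic assumption, not a statement from the paper), the expected number of wrong pairs surviving the filter with $m = 4$ messages is

\[ 2^{11} \cdot 2^{11} \cdot 2^{-8m} = 2^{22 - 32} = 2^{-10}, \]

so a unique surviving pair is expected. For the cost, one group of four $B_i$ and four $C_i$ requires going through the $2^{11}$ candidates of each of its 8 blocks, and there are 8 such groups:

\[ 8 \cdot 2^{11} = 2^{14} \ \text{per group}, \qquad 8 \cdot 2^{14} = 2^{17} \ \text{in total}. \]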

Extracting the Key
Note that the reason why we used the differences $\Delta z_i$ instead of the values is that when we decomposed $F^{(r+1)}$ into $M^{(r+1)} \circ (S, \dots, S) \circ B \circ \tilde{A}$, $B$ contains the key in its affine translation: that is, we have $B(x) = B \cdot x \oplus (b \oplus K^{(r+1)})$. The same phenomenon happens with $C$, which we recover as $C(x) = C \cdot x \oplus (c \oplus K^{(r)})$. So when we need to use $B_i$ for the MITM, the affine translation will not be the right one, while the linear part is. However, once we have recovered the correct $B$ and $C$, we can use this to recover the round key $K^{(r+1)}$. Since the key schedule of AES is invertible, one can apply this procedure to the first two rounds, given through the tables $T^{(1)}$ and $T^{(2)}$. That way we can compute $K^{(1)}$, which is the master key used in AES, from $K^{(2)}$. This only leaves the external encodings to be recovered, which is an easy task now. We can recover exactly which affine translation was used for $C$, for which we knew $C(x) = C \cdot x \oplus (c \oplus K^{(1)})$. Then we can recover the first input encoding $A^{(1)}$ as $A^{(1)}(x) = C \cdot \tilde{A}' \cdot x \oplus c$. Now recall that the external input encoding we knew was $F^{(0)} = (A^{(1)})^{-1} \circ M_{in}$. We recovered $A^{(1)}$ so it is easy to compute $M_{in}$.
Recovering $M_{out}$ is not hard either, see Fig. 5. We know the key, meaning we can easily compute the two parallel AES, and we also know $M_{in}$. So for any $y \in \mathbb{F}_2^{256}$, one can compute $x$ such that $y = (\mathrm{AES}, \mathrm{AES}) \circ M_{in}(x)$, then obtain $z = M_{out}(y)$ from $x$ by using the tables. Therefore, we only need to do this for 257 values of $y$: the zero vector to get the affine translation of $M_{out}$ and then each of the 256 canonical basis vectors.
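The last step is the standard way of reading off an affine map from $n + 1$ chosen evaluations. The sketch below illustrates it on a hypothetical 16-bit oracle; in the attack the oracle would be the composition described above and $n = 256$, giving the 257 queries. Names are illustrative.

```python
import random


def apply(cols, v):
    """Multiply a GF(2) matrix (list of columns stored as ints) by the bit vector v."""
    out, j = 0, 0
    while v:
        if v & 1:
            out ^= cols[j]
        v >>= 1
        j += 1
    return out


def recover_affine(oracle, n):
    """Recover (columns, translation) of an n-bit affine map using n + 1 queries."""
    c = oracle(0)                                   # image of the zero vector
    cols = [oracle(1 << j) ^ c for j in range(n)]   # images of the canonical basis
    return cols, c


def evaluate_affine(cols, c, x):
    """Evaluate the recovered map x -> cols * x + c."""
    return apply(cols, x) ^ c
```

The recovered pair reproduces the oracle on every input, since an affine map is determined by its value at zero and on a basis.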
All in all, we recovered the key of the AES as well as both external encodings. The cost of doing this is dominated by the cost of the 64 calls to the affine equivalence algorithm to get some candidates for $B_i$ and $C_i$, which leads to a complexity of $\sim 2^{31}$.
In Appendix E, we consider a natural extension of the scheme by Baek et al., where more than two AES instances are encoded together. We show that our attack remains efficient even in that case.
Implementation. We implemented the attack in C++, relying on NTL [Sho01] for linear algebra. The total time to recover both the key and the external encodings is about 12 seconds, with roughly 10 seconds spent on the 64 affine equivalences, and using a negligible amount of memory. This was run on an Intel Core i7-6600U CPU @ 2.60GHz on a single core. Our implementation is available at http://yaawai.tk/.

Conclusion
In this article, we propose a generic algorithm to recover affine encodings for SPN ciphers, in the context of white-box schemes following the framework of Chow et al. More generally, our algorithm solves the affine equivalence problem in the special case where one of the two maps is composed of the parallel application of distinct S-boxes. We illustrate the efficiency of our attack on a white-box implementation of AES with external encodings proposed by Baek, Cheon and Hong, which was precisely designed to make a generic ASA approach out of computational reach. Nevertheless our generic attack breaks the scheme in $2^{35}$ basic operations, compared to the assessment by its authors that $2^{110}$ would be required. We then took a closer look at the Baek et al. scheme, and identified another attack vector, which reduces the attack to a simple standalone problem. This second approach recovers the secret key in time complexity $2^{31}$. A full implementation of the attack confirms the complexity estimate.

A Computing all Intersections of k − 1 Subspaces among k Subspaces
In our algorithm from Section 2, we have $k$ vector spaces $V_i \subset \mathbb{F}_2^n$ of dimension $n - m$, and we need to compute each intersection of $k - 1$ spaces $V_i$. Let $B_i$ be the $n \times (n - m)$ matrix whose columns are the vectors of a basis of $V_i$. In order to save computations, we begin by echelonizing each matrix $(B_i \mid I_n)$ where $I_n$ denotes the identity matrix of size $n$. This leads to matrices with the following structure:
\[ \begin{pmatrix} I_{n-m} & C_i \\ 0 & D_i \end{pmatrix}, \]
where $C_i$ is a matrix of size $(n - m) \times n$ and $D_i$ is of size $m \times n$. We note that a vector $x$ belongs to $V_i$ if and only if it belongs to $\mathrm{Ker}\, D_i$. Therefore, with $D$ the matrix built by stacking $D_1, \dots, D_k$, we have $\mathrm{Ker}\, D = \bigcap_{i=1}^{k} V_i$. In our case, we do not need the intersection of all $V_1, \dots, V_k$, but all the intersections of $k - 1$ spaces $V_i$. To do so, instead of building $D$ from all the matrices $D_i$, one can build $D$ from only $k - 1$ matrices $D_i$, leading to the intersection $\bigcap_{i \neq j} V_i$ for each $j = 1, \dots, k$.
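The naive version of this computation, stacking $k - 1$ of the matrices $D_i$ and computing a kernel for each choice of the omitted index, can be sketched as follows. This is an illustrative sketch with hypothetical helper names; it does not implement the shared echelonization discussed next, and matrix rows are stored as $n$-bit integers.

```python
def gf2_nullspace(rows, n):
    """Basis of {x in F_2^n : <r, x> = 0 for every r in rows}; vectors are n-bit ints."""
    pivots = {}  # pivot position -> fully reduced row having a 1 at that position
    for r in rows:
        for p, row in pivots.items():
            if (r >> p) & 1:
                r ^= row
        if r:
            p = r.bit_length() - 1
            for q in pivots:           # keep the stored rows fully reduced
                if (pivots[q] >> p) & 1:
                    pivots[q] ^= r
            pivots[p] = r
    basis = []
    for f in range(n):                 # one kernel vector per free position
        if f in pivots:
            continue
        x = 1 << f
        for p, row in pivots.items():
            if (row >> f) & 1:
                x |= 1 << p
        basis.append(x)
    return basis


def intersections_of_all_but_one(D, n):
    """For each j, a basis of the intersection of Ker D_i over all i != j."""
    return [gf2_nullspace([r for i, Di in enumerate(D) if i != j for r in Di], n)
            for j in range(len(D))]


# Toy example with n = 6, m = 2, k = 3: D_i selects the two bits of block i,
# so Ker D_i is the set of vectors whose block i is zero.
D = [[1 << (2 * i), 1 << (2 * i + 1)] for i in range(3)]
print(intersections_of_all_but_one(D, 6))
```

In the toy example each intersection is the set of vectors supported on a single block, which has dimension $m$ as expected.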
The complexity of this whole computation is as follows. We first need to echelonize each $(B_i \mid I_n)$ on their first $n - m$ columns. Note that this computation can be done at the same time as line 10 in the previous algorithm: since we need to draw $\Delta$ linearly independent from the previously computed vectors of $V_i$, we can echelonize the basis of $V_i$ as we build it. Since $B_i$ is of size $n \times (n - m)$, the cost of doing this for each $i$ is thus $k n^2 (2n - m) = O(kn^3)$. Then we need to compute the kernel of the matrices $D$ built from $k - 1$ matrices $D_i$. Note that, those matrices being of size $(k - 1)m \times n$, computing each of the $k$ kernels needs about $((k - 1)m)^2 n = O(n^3)$ operations. However, by doing this in a clever way, one can avoid repeating the same computations and thus improve the constant hidden in the $O()$ notation. First, denote by $K_i$ the kernel computed from the matrices $D_j$ with $j \neq i$, and by $D$ the matrix built from all the $D_j$. Remark that computing $K_i$ with $i \neq 1$ (i.e. all kernels containing $D_1$) is the same as removing one block $D_j$, $j \neq 1$ from $D$ and echelonizing the resulting matrix. Thus by doing this naively, one would echelonize several times from the $m$ rows of $D_1$. So we want to echelonize $D$ on the rows of $D_1$ once and for all, leading to a matrix $D'$. This matrix $D'$ can be used to compute all kernels containing $D_1$ by removing one of the blocks $D_j$. Again, doing this naively would result in a lot of redundant echelonization on the rows of $D_2$, therefore we repeat the previous procedure by echelonizing $D'$ on the rows of $D_2$ once and for all, leading to a matrix $D''$ which we will use to compute all the kernels containing $D_1$ and $D_2$. A summary of this procedure is depicted in Figure 6, along with the complexity of each step in the tree. To be more precise about this complexity, let us look at the operations we need to do on the $i$-th level of the tree. We need to compute the kernel $K_i$ from a $(k - 1)m \times n$ matrix which has already been echelonized on $im$ rows, thus leading to a complexity of $(k - 1 - i)^2 m^2 n$ operations. We also need to echelonize on $m$ rows of a matrix of total size $km \times n$, which is also already echelonized on $im$ rows, which needs $(k - i) m^2 n$ operations. Summing these two costs over the levels of the tree gives the total complexity of this way of computing the kernels.

C Probability of Failure for Algorithm 1
In this section, we study the probability of failure of our main algorithm, Algorithm 1.
In Algorithm 1, the number of messages we use is parametrized by the value l, and the probability of failure decreases with l.Failures in our algorithm stem e.g. from generating n − m + l output differences activating all S-boxes, and these output differences spanning a subspace of dimension n − m despite all S-boxes being active.Intuitively, it seems clear that the probability of such an event decreases exponentially with l.However the exact probability of a failure depends on the S-boxes under consideration, and more specifically, it depends on their differential distribution table.As a result, an exact analysis of the failure probability is quite complex.
In what follows, to keep the analysis in check, when a random input difference activates all S-boxes, we approximate output differences by uniformly random vectors.We submit that for cryptographic S-boxes, this is a reasonable approximation of reality as far as the dimension of the output space is concerned, which is what matters for our algorithm.Moreover, we have successfully run experiments (using the AES S-box, as well as random ones) to validate that failure probability behaves as expected.
During the computation of the O i 's.When we search the output space O i , we draw n − m + l random elements to test whether the output space is of dimension lower or equal to n − m.Here, a false positive would be a difference ∆ such that rank(O i ) = n − m while ∆ activates all S-boxes.Therefore the probability of a false positive at this step is upper-bounded by 2 −ml for one value of ∆.Since we do this step for about 2 m values of ∆, the probability that a false positive occurs in this step over all the algorithm is upper bounded by 2 m 2 −ml = 2 m(1−l) .
During the computation of the V i 's.For each value of ∆, we want to test whether F (x) ⊕ F (x ⊕ ∆) ∈ span(O i ) for l values of x.In that case, a false positive is a value of ∆ such that this test is verified while ∆ activates all S-boxes.Again, since dim(O i ) = n − m, the probability of a false positive of a specific value of ∆ is 2 −ml .We try about 2 m values of ∆ on average, and need to do this to find all the n − m basis vectors for each of the k spaces V i .So the probability of a false positive at this step is upper bounded by k(n − m)2 m(1−l) .
Overall failure probability. The probability of failure of our algorithm is upper-bounded by the sum of the two previous probabilities, which is to say:

$$2^{m(1-l)} + k(n-m)\,2^{m(1-l)} = \bigl(1 + k(n-m)\bigr)\,2^{m(1-l)}.$$
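To get a feel for how quickly this bound shrinks with $l$, the following minimal sketch evaluates the sum of the two bounds stated above, $2^{m(1-l)} + k(n-m)\,2^{m(1-l)}$, in log scale. The function names and the example target of $2^{-20}$ are illustrative assumptions, not part of the original analysis; the bound itself relies on the uniform-randomness approximation made earlier.

```python
from math import log2

def failure_bound_log2(n: int, m: int, l: int) -> float:
    """log2 of the upper bound (1 + k(n-m)) * 2^{m(1-l)}, with k = n/m S-boxes."""
    k = n // m
    return log2(1 + k * (n - m)) + m * (1 - l)

def smallest_l(n: int, m: int, target_log2: float) -> int:
    """Smallest l for which the bound drops to 2^target_log2 or below."""
    l = 1
    while failure_bound_log2(n, m, l) > target_log2:
        l += 1
    return l

if __name__ == "__main__":
    # AES-like parameters: 128-bit block, 8-bit S-boxes (k = 16).
    for l in range(2, 9):
        print(f"l = {l}: bound = 2^{failure_bound_log2(128, 8, l):.2f}")
    print("smallest l for a 2^-20 bound:", smallest_l(128, 8, -20))
```

For small $l$ the bound exceeds 1 and is vacuous; each increment of $l$ then lowers it by a factor $2^m$.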

E Using More AES Instances in Parallel
A natural question is whether the white-box scheme by Baek et al. could be made secure by increasing the number $n$ of AES instances encoded in parallel. In this section, we show that this is not the case, as the storage required for the actual white-box implementation quickly becomes limiting. More precisely, Table 7 shows the complexity of each step of our dedicated attack as $n$ increases, together with the size of the corresponding white-box implementation. Recall that in this section, $n$ denotes the number of parallel AES instances (rather than the total block size).
So the dominating cost comes from either the computation of the inverse of one encoding or the calls to the affine equivalence algorithm. For $n \le 22$, the affine equivalence is dominating and leads to a complexity of $\sim 2^{35}$ for an implementation of size $\sim 64$ GB when $n = 22$. Otherwise the inversion is dominating, and obtaining even 60-bit security would need $n = 2^{13}$ parallel AES instances, which leads to an implementation of size $\sim 2^{13}$ TB, which is definitely not realistic.
$D_A = A \circ P$ and $D_B = Q \circ B$, and thus by taking $A = D_A \circ P^{-1}$ and $B = Q^{-1} \circ D_B$, we have our equivalent function $F = B \circ (S, \ldots, S) \circ A$. The whole algorithm is summarized as pseudo-code in Algorithm 1.

Figure 6: Efficient computation of the kernels.

[...] $(i-1)^2 m^2 n + (k-i)\,m^2 n = m^2 n\,\frac{k^3 + 2k - 1}{3}$. To compare, the naive approach, i.e. computing each kernel independently, would lead to [...]. All in all, this adds a factor $k$ in the overall complexity, which becomes $O\!\left(2^m n^3 + 2^m l n^3 + \frac{n^4}{m} + k\,2^{2m} m^2 n\right) = O\!\left(2^m l n^3 + \frac{n^4}{m} + 2^{2m} m n^2\right)$ when using the algorithm from Biryukov et al., and $O\!\left(2^m l n^3 + \frac{n^4}{m} + 2^m m n^2\right)$ when using the improved affine equivalence algorithm from Dinur.
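As a rough sanity check, the two asymptotic expressions can be evaluated numerically. The sketch below reads them as $2^m l n^3 + n^4/m + 2^{2m} m n^2$ (Biryukov et al.) and $2^m l n^3 + n^4/m + 2^m m n^2$ (Dinur), ignores all constants hidden by the $O(\cdot)$ notation, and uses $l = 8$ purely as an illustrative assumption; the function name is hypothetical.

```python
from math import log2

def attack_cost_log2(n: int, m: int, l: int, dinur: bool = True) -> float:
    """log2 of the sum of the three terms, with hidden constants ignored.

    n: block size in bits, m: S-box size in bits, l: number of extra messages.
    dinur=True uses the 2^m affine-equivalence term, otherwise 2^(2m).
    """
    sbox_exponent = m if dinur else 2 * m
    terms = [
        2**m * l * n**3,             # 2^m * l * n^3
        n**4 // m,                   # n^4 / m
        2**sbox_exponent * m * n**2  # affine equivalence calls
    ]
    return log2(sum(terms))

if __name__ == "__main__":
    # AES-like parameters: n = 128, m = 8; l = 8 is an assumed value.
    print(f"Dinur:           2^{attack_cost_log2(128, 8, 8):.2f}")
    print(f"Biryukov et al.: 2^{attack_cost_log2(128, 8, 8, dinur=False):.2f}")
```

Under these assumptions the first term dominates with the improved affine equivalence algorithm, while the affine equivalence term dominates with the algorithm from Biryukov et al.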

Figure 7: Complexity of our attack and implementation size for n parallel AES instances.
al. [LRDM+13]. The previous attack also applies to the original scheme by Chow et al., and another work by De Mulder et al. also improves on the original BGE attack.

We have the correct $B_0$ and $C_0 = T_z[\Delta z_0^1, \ldots, \Delta z_0^m]$. Once we have the correct $C_0$, we know the correct values of $\Delta z_i^j$, so we do not need any hash table.
17: for each candidate for $B_i$, $i = 1, 2, 3$ do