Rapidly Veriﬁable XMSS Signatures

This work presents new speed records for XMSS (RFC 8391) signature verification on embedded devices. For this we make use of a probabilistic method recently proposed by Perin, Zambonin, Martins, Custódio, and Martina (PZMCM) at ISCC 2018, which changes the XMSS signing algorithm to search for fast-verifiable signatures. We improve the method, ensuring that the added signing cost for the search is independent of the message length. We provide a statistical analysis of the resulting verification speed and support it by experiments. We present a record-setting, RFC-compliant implementation of XMSS verification on the ARM Cortex-M4. At a signing time of about one minute on a general-purpose CPU, we create signatures that are verified about 1.44 times faster than traditionally generated signatures. Adding further implementation optimizations to the verification algorithm, we reduce verification time by over a factor of two, from 13.85 million to 6.56 million cycles. In contrast to previous works, we provide a detailed security analysis of the resulting signature scheme under classical and quantum attacks that justifies our selection of parameters. On the way, we fill a gap in the security analysis of XMSS as described in RFC 8391, proving that the modified message hashing in the RFC does indeed mitigate multi-target attacks. This was not shown before and might be of independent interest.


Introduction
Digital signatures are the necessary means to establish message authentication in settings where establishing a shared key is not a viable option. In particular, a digital signature can be verified by an arbitrary number of people. This makes them the predominant choice for securing software distribution and updates, as well as for applications like secure boot and certification of public keys. With the rise of the Internet of Things (IoT), digital signatures also have to be available on resource-constrained devices. In order to make digital signatures accessible to such small devices, it is important to minimize the resources required and to optimize their speed. While the speed of all involved routines is relevant, in many applications verification speed is more crucial than signing time. As many signatures are generated once but verified thousands of times, verification is potentially done much more often than signing. This generally holds for updating and secure boot as sketched above, and is especially relevant for IoT applications. In this work we focus on XMSS, but we expect the results to translate to other schemes, especially LMS, with little to no changes, since they are independent of how nodes in the OTS or the tree are computed.
While some of the previous proposals for hash-based signatures differed in the OTS they use, all modern proposals settled on a form of the Winternitz OTS (WOTS) [27]. For example, XMSS in RFC 8391 uses a scheme commonly referred to as WOTS+ (which we follow, although it actually is WOTS-T [22]). We describe WOTS+ in Section 2.
The PZMCM technique. We are not the first to set out to answer the question of how to maximize the verification speed of XMSS signatures. Our work largely builds on a technique by Perin, Zambonin, Martins, Custódio, and Martina (PZMCM) [28]. Instead of speeding up the verification algorithm, PZMCM propose to exploit the fact that while WOTS signing and verification times differ from message (digest) to message (digest), their sum is constant. More precisely, the numbers of hash function calls for generating a signature and afterwards verifying it always sum to the same value. To exploit this, they suggest adding a counter to the input of the message hash. They then try T different counter values and pick the one that leads to the fastest-to-verify signature among the T candidates. This trade-off allows computing signatures that require significantly fewer hash computations for signature verification than traditionally generated signatures, at the price of increased signing time.
The PZMCM technique perfectly fits our needs. However, we identify several shortcomings in the implementation and analysis of the technique.
1. The time required to search for the signature depends on the length of the message to be signed. Especially for (large) software packages this can pose a problem.
2. PZMCM only analyze the security of the modified signature scheme under the assumption that the message hash is collision resistant, while XMSS explicitly avoids this assumption, aiming for collision resilience as this allows the use of shorter message digests. When choosing parameters according to the security analysis by PZMCM and preserving security, not only verification speed but also signing and key generation speed would actually get worse, and signature size would increase compared to regular XMSS.
3. PZMCM do not provide a detailed analysis of the expected improvement in verification time for a given T. Their analysis is limited to experimental validation for small values of T and does not allow estimating the impact of choosing larger values of T.
4. While motivated by use cases in automotive, PZMCM do not provide an experimental evaluation of the impact of their method on actual embedded devices. Hence, the impact of their improvement might be significantly smaller than expected. For example, verification time could be dominated by storage access times.
Contributions. We present a collection of modifications that, for example, achieve a factor-two improvement of verification speed on an ARM Cortex-M4 at the cost of about one minute of additional signing time on a general-purpose CPU. At the same time, all our changes provably preserve security and RFC compliance. We achieve this by filling in the above gaps.
1. We modify the PZMCM technique for signature generation to make the added time independent of the message length. For this we exploit the iterative nature of most cryptographic hash functions. By precomputing and storing the internal state of the hash function after absorbing the message, the message only has to be processed once, instead of T times. For a 100 KB message and T = 2^25, this reduces the signing time from over 3 hours to 14 seconds on a general-purpose CPU.
2. We give a detailed security analysis of the impact that the PZMCM technique has on the security of XMSS. We formally prove that as long as the used hash function behaves like a random function, security does not significantly degrade. More precisely, the XMSS parameters listed in the RFC still achieve the same level of security with PZMCM. As an intermediate step, we complete the security proof of XMSS as described in the RFC: we give a tight bound for the complexity of generic attacks against the message hashing construction used in the RFC, showing that the modification indeed prevents multi-target attacks.
3. We present a statistical analysis of the speed-up provided by the PZMCM technique. For this purpose we analyze the statistical distribution of the base-w encoding of a random message digest and determine its expectation. This allows predicting the expected speed-up also for larger values of T and thereby avoids the need for costly experiments when choosing the best trade-off for a given use case. Our analysis makes some idealizing assumptions. We therefore support it with an experimental validation for relevant parameter sizes.
4. We provide an implementation of XMSS verification on the ARM Cortex-M4 and present new speed records for XMSS signature verification. On the one hand, the speed-up is caused by the use of the PZMCM technique for signature generation. On the other hand, we implement a further well-known optimization that reuses an intermediate state of the hash computation shared among all the hash computations in WOTS+.

Related work.
Since the introduction of XMSS in 2011 [6], there have been a number of works which also studied implementations of XMSS variants on embedded platforms.
In [18] a variant titled XMSS+ is presented together with an implementation for 16-bit smart cards. The authors of [21] look into implementation aspects of the stateless hash-based signature scheme SPHINCS on an embedded microprocessor. In order to provide a meaningful comparison, they present implementation results for XMSS^MT (a multi-tree version [20] of XMSS that can be used to sign a virtually unlimited number of messages) on an ARM Cortex-M3. These variants of XMSS differ from XMSS as described in RFC 8391 in that they do not implement the multi-target mitigation technique from [22], because they predate it. An implementation study of XMSS^MT for the Java Card platform is provided in [31]. The work gives a good motivation why Java Card might not be the preferable choice when implementing hash-based signatures and aiming for good performance. Finally, a recent concurrent work [9] presents the first XMSS implementation on the ARM Cortex-M4 platform. However, the aim of [9] differs from our work as it targets a comparison of XMSS and LMS on embedded devices. In this context the authors also analyze the impact of applying changes to the hashing constructions recently proposed in [3] in the context of SPHINCS+. For the work at hand, we decided that all changes that are not RFC compliant are out of scope, as they would hinder fast adoption.
Organization. The remainder of this paper is organized as follows. Tweakable hash functions and WOTS+ are introduced in Section 2. In Section 3 we introduce the modification to XMSS signature generation that enables the signature generation / verification trade-off of [28], as well as our optimization. Our security analysis of the resulting signature scheme under classical and quantum attacks is given in Section 4. We provide the statistical analysis of the algorithm in Section 5, for which experimental support is given in Section 6.
Lastly, in Section 7, we present the record-setting, RFC-compliant implementation of XMSS verification on the ARM Cortex-M4.

WOTS+ and tweakable hash functions
Before introducing WOTS+, we briefly recall the notion of tweakable hash functions.

Tweakable hash functions
XMSS has matured since its original publication, and the scheme described in RFC 8391 actually is a variant introduced as XMSS-T in [22], with a slightly changed message hash. Hash-based signatures describe a graph structure in which nodes are computed using hash functions. The main difference between different XMSS variants, and between XMSS and schemes like LMS [26] or GMSS [7], is how hash functions are used to compute nodes, while the structure is essentially identical. To unify the description of schemes, [3] introduced the abstraction of tweakable hash functions, which we use in our description. For a security parameter n, a tweakable hash function Th_k : {0,1}^n × {0,1}^256 × {0,1}^kn → {0,1}^n takes as input a kn-bit message and, in addition, an n-bit public parameter and a 256-bit tweak.
For XMSS the tweak is a 256-bit string representing an address which uniquely identifies the node in the graph structure of XMSS. The public parameter is a random value that is part of the public key. These additional inputs are used for domain separation of different hash function calls to mitigate multi-target attacks [22]. For consistency with previous works, we follow [3] and use F in place of Th_1. We always assume that the additional inputs are used even when not explicitly stated. For further details and constructions see [3].
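As an illustration, a tweakable hash can be built by simply hashing the concatenation of public parameter, tweak, and message. The following Python sketch makes this concrete for SHA-256; it is an illustrative assumption, not the (keyed) construction used in RFC 8391:

```python
import hashlib

N = 32  # security parameter n in bytes (n = 256 bits)

def th(k: int, pub_param: bytes, tweak: bytes, msg: bytes) -> bytes:
    """Sketch of Th_k : {0,1}^n x {0,1}^256 x {0,1}^kn -> {0,1}^n as
    SHA-256(P || T || M). Illustrative only; not the RFC 8391 construction."""
    assert len(pub_param) == N and len(tweak) == 32 and len(msg) == k * N
    return hashlib.sha256(pub_param + tweak + msg).digest()

def f(pub_param: bytes, tweak: bytes, block: bytes) -> bytes:
    """F = Th_1: compresses a single n-byte block."""
    return th(1, pub_param, tweak, block)

digest = f(b"\x00" * N, b"\x01" * 32, b"\x02" * N)
assert len(digest) == N  # output length is again n bits
```

Because public parameter and tweak differ for every node, two calls of F on the same message block still yield independent-looking outputs, which is the point of the domain separation.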

WOTS+
XMSS uses WOTS+ [17] as OTS, which we now describe in the context of XMSS. We roughly follow the description from [3].
Parameters. The security parameter n determines the message digest length m and influences the sizes of the private key, public key, and signature. The Winternitz parameter w can be used to control a trade-off between speed and signature size. A greater value of w implies a smaller signature but slower speeds. Typically w is chosen as a power of 2 within {4, 16, 256}, as this allows for easy transformation of bit strings into base-w encoded strings. We further define ℓ_1 = ⌈m / log_2 w⌉, ℓ_2 = ⌊log_2(ℓ_1(w − 1)) / log_2 w⌋ + 1, and ℓ = ℓ_1 + ℓ_2. An uncompressed WOTS+ private key, public key, and signature each consist of ℓ blocks of n bits.
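For concreteness, the chain counts can be computed from m and w as in the following sketch (ℓ_1 message chains, ℓ_2 checksum chains, ℓ chains in total):

```python
import math

def wots_lengths(m_bits: int, w: int) -> tuple[int, int, int]:
    """Compute (len_1, len_2, len): the number of message chains,
    checksum chains, and total chains for WOTS+ with an m-bit digest."""
    len_1 = math.ceil(m_bits / math.log2(w))
    len_2 = math.floor(math.log2(len_1 * (w - 1)) / math.log2(w)) + 1
    return len_1, len_2, len_1 + len_2

print(wots_lengths(256, 16))  # RFC 8391 parameters -> (64, 3, 67)
```

For the RFC parameters n = m = 256 and w = 16 this yields ℓ_1 = 64, ℓ_2 = 3, ℓ = 67, so a signature consists of 67 blocks of 32 bytes.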
WOTS+ key pair. The secret key of a WOTS+ key pair is derived from a secret seed SK.seed that is part of the XMSS private key, combined with the address of the WOTS+ key pair within the XMSS structure, using a pseudo-random function PRF. For each n-bit private key node, the corresponding public key node is derived by applying a tweakable hash function F iteratively (w − 1) times. The output of the last iteration is then taken as the public key node. This defines ℓ hash chains of length w each. The public key nodes are compressed into a single n-bit node using a non-complete binary tree called an L-tree. We refer to this single node as the WOTS+ public key.
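A minimal sketch of the chain structure follows. It uses a simplified stand-in for F (plain SHA-256 over public parameter, an address-derived tweak, and the node; the RFC's addressing scheme is more detailed) and shows that a verifier who holds an intermediate chain node can complete the chain to the public key node:

```python
import hashlib

def f(pub_param: bytes, tweak: bytes, node: bytes) -> bytes:
    # simplified stand-in for the tweakable hash F; not the RFC construction
    return hashlib.sha256(pub_param + tweak + node).digest()

def chain(node: bytes, start: int, steps: int, pub: bytes, addr: bytes) -> bytes:
    """Apply F iteratively `steps` times, starting at chain position `start`.
    The position is folded into the tweak so each call is domain-separated."""
    for pos in range(start, start + steps):
        node = f(pub, addr + pos.to_bytes(4, "big"), node)
    return node

w, pub, addr = 16, b"\x01" * 32, b"\x02" * 28
sk_node = b"\x00" * 32                          # one private key node
pk_node = chain(sk_node, 0, w - 1, pub, addr)   # chain end = public key node

# a signature node at chain length h; the verifier completes the remaining
# w - 1 - h steps and must reach pk_node
h = 5
sig_node = chain(sk_node, 0, h, pub, addr)
assert chain(sig_node, h, w - 1 - h, pub, addr) == pk_node
```

The final assertion illustrates the signing/verification trade-off exploited later: the h steps done by the signer and the w − 1 − h steps done by the verifier always sum to w − 1 per chain.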
WOTS+ checksum, signature generation and verification. An m-bit message digest H_M of a message M can be rewritten in its base-w representation. The result is a length-ℓ_1 vector of base-w values H_M = (h_1, . . ., h_{ℓ_1}), with h_i ∈ [0, w − 1]. Each of these integers defines a chain length in the message (hash) chains. The checksum of H_M is defined as C = Σ_{i=1}^{ℓ_1} (w − 1 − h_i), which can be represented as a length-ℓ_2 vector of base-w values C_M = (c_1, . . ., c_{ℓ_2}), with c_i ∈ [0, w − 1]. We call the corresponding hash chains the checksum (hash) chains. This checksum is necessary to prevent message forgery: an increase in at least one h_i leads to a decrease in at least one c_i, and vice versa. Using these integers as chain lengths, the function F is applied iteratively to the private key elements. This leads to ℓ n-bit values that make up the signature. For a received message and signature, the verifier can recompute the checksum, derive the chain lengths, apply F iteratively to complete each chain to its full length w, and compute a candidate WOTS+ public key. This can then be compared to the n-bit public key.
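The base-w split and the checksum computation can be sketched as follows (simplified: the RFC additionally left-shifts the checksum for byte alignment before its base-w split):

```python
def base_w(data: bytes, w: int, out_len: int) -> list[int]:
    """Split `data` into out_len base-w digits (w a power of two)."""
    logw = w.bit_length() - 1
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    return [(bits >> (total - logw * (i + 1))) & (w - 1) for i in range(out_len)]

def checksum_digits(h_vals: list[int], w: int, len_2: int) -> list[int]:
    """C = sum(w - 1 - h_i), written as len_2 base-w digits
    (the RFC's alignment shift is omitted)."""
    c = sum(w - 1 - h for h in h_vals)
    logw = w.bit_length() - 1
    return [(c >> (logw * (len_2 - 1 - i))) & (w - 1) for i in range(len_2)]

digest = bytes(range(32))        # stand-in for a 256-bit message digest
h = base_w(digest, 16, 64)       # message-chain lengths h_1..h_64
c = checksum_digits(h, 16, 3)    # checksum-chain lengths c_1..c_3
# raising any h_i lowers the checksum: the two sides always balance
assert sum(15 - x for x in h) == sum(v * 16 ** (3 - 1 - i) for i, v in enumerate(c))
```

The assertion spells out why the checksum prevents forgeries: the verifier-side work Σ(w − 1 − h_i) on the message chains is exactly the value encoded in the checksum chains.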

Algorithm for rapidly verifiable signatures
We now describe how to apply the techniques of PZMCM and the storage of the internal state of the applied hash function to achieve rapidly verifiable signatures. The resulting speed-up will be given in Section 7.

PZMCM Winternitz tuning
The cost of verifying a WOTS+ signature is largely determined by the values of the ℓ_1 integers h_1, . . ., h_{ℓ_1}, as the number of hash computations necessary to complete the ℓ_1 message chains of a signature is Σ_{i=1}^{ℓ_1} (w − 1 − h_i). Therefore, signature verification cost decreases as the h_i increase. The number of hashes required for verifying the remaining ℓ_2 checksum chains may increase as the h_i grow. However, for common parameters there are about a factor of 10 fewer checksum chains than message chains.
Using a good hash function to hash the message, these values behave like uniformly distributed random values. In [28], PZMCM propose a trade-off technique to obtain signatures with greater h_i values and thereby lower signature verification time. The idea is to search for a counter ctr ∈ [0, T] such that the cumulative chain length corresponding to H^ctr_M ← H_msg(ctr, M) is maximized, and consequently the signature verification time is reduced. This allows one to trade the additional effort of computing T iterations of H_msg during signature generation, as opposed to a single one, for more efficient verification. While below we focus on analyzing powers of two, i.e., T = 2^t, this is not necessary. For example, PZMCM give results for T ∈ {25, 200, 3500} (see [28, Table 2]), showing an improvement for T = 3500 of up to 25%, 33% and 42% for w = 16, w = 256 and w = 2^16, respectively. As a side effect, the bias towards larger h_i results in a bias of the hash value H^ctr_M. Such behavior could potentially be exploited by an adversary. We analyze the impact on security in Section 4.
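The search can be sketched as follows; H_msg is simplified here to SHA-256 of the message with an appended counter (the randomization and index inputs of the real construction are omitted):

```python
import hashlib

def chain_lengths(digest: bytes, w: int = 16) -> list[int]:
    """Base-w digits of the digest, i.e. the message-chain lengths h_i."""
    logw = w.bit_length() - 1
    bits = int.from_bytes(digest, "big")
    n_vals = len(digest) * 8 // logw
    return [(bits >> (logw * (n_vals - 1 - i))) & (w - 1) for i in range(n_vals)]

def pzmcm_search(message: bytes, T: int) -> tuple[int, int]:
    """Try T counters and return (best_ctr, best_sum), maximizing the
    cumulative chain length sum(h_i) and thus minimizing the verifier's
    remaining sum(w - 1 - h_i) hash calls."""
    best_ctr, best_sum = 0, -1
    for ctr in range(T):
        digest = hashlib.sha256(message + ctr.to_bytes(8, "big")).digest()
        s = sum(chain_lengths(digest))
        if s > best_sum:
            best_ctr, best_sum = ctr, s
    return best_ctr, best_sum

ctr, s = pzmcm_search(b"example message", 1 << 10)
base = sum(chain_lengths(
    hashlib.sha256(b"example message" + (0).to_bytes(8, "big")).digest()))
assert s >= base  # the search can only improve on the default counter
```

Note that as written, each of the T candidates rehashes the full message; the next section removes exactly this dependence on the message length.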

Tuning XMSS signatures
We now present how we incorporate this approach into XMSS. For the presentation we focus on the SHA-256 hash function, since this is the only hash function required for XMSS [19]. To be consistent with the RFC, in this section we also use n for the message digest length in bytes. The results carry over to any other hash function. In practice, one has to iterate the message-hashing step of the signing algorithm in the RFC with an appended counter (see [19, Algorithm 12]). The counter could be included at different places in the algorithm, or even at different positions within that step. We choose to append a 64-bit counter ctr to the message, i.e., to compute H_msg(r || getRoot(SK) || toByte(idx_sig, n), M || toByte(ctr, 8)). This has multiple advantages over inserting the counter at other locations in the digest computation. Firstly, this change is fully compatible with the RFC [19] and hence also compliant with the upcoming NIST special publication [10]. (However, it is not transparent to a verifier, as the counter has to be removed from the message after verification.)
Secondly, appending the counter to the end of the input has an important performance benefit, as it allows one to compute and store the internal state of the hash function after processing all but the last block of the input, and to recompute only the final block for the 2^t counter values. The size of the input in the original hash function call H_msg(r || getRoot(SK) || toByte(idx_sig, n), M) is 4·n + M_len bytes, where M_len is the length of the message M in bytes and n is the length in bytes of the message digest (e.g., n = 32 bytes for SHA-256). When adding the eight-byte counter, the input size increases to 4·n + M_len + 8 bytes. The internal block size of SHA-256 is 64 bytes: hence, the first 2 + ⌊(M_len + 8)/64⌋ blocks can be precomputed and only the final block with (part of) the message and the counter needs to be recomputed. This approach is outlined in Algorithm 1, where line (1) of [19, Algorithm 12] is replaced by the lines in blue. Especially for larger messages, this improvement becomes very significant; see Table 7.1 for experimental results. Note that the original XMSS signing algorithm can be recovered by setting t = 0 and discarding the ctr (which will always be 0) appended in line 20 of Algorithm 1.
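In Python, this state caching can be mimicked with hashlib's incremental interface: the context object after absorbing the fixed prefix and the message is copied for each candidate counter, so per candidate only the counter (and final padding) is hashed. This is a sketch of the idea, not the RFC's H_msg, and a toy cost function stands in for the chain-length computation:

```python
import hashlib

def search_with_cached_state(prefix: bytes, message: bytes, T: int):
    """Absorb prefix || message once, then try T counters by resuming from
    a copy of the stored SHA-256 state; per-candidate work is constant."""
    state = hashlib.sha256(prefix + message)   # processed only once
    best_ctr, best_digest = 0, None
    for ctr in range(T):
        h = state.copy()                       # resume from the cached state
        h.update(ctr.to_bytes(8, "big"))
        d = h.digest()
        if best_digest is None or sum(d) > sum(best_digest):  # toy cost
            best_ctr, best_digest = ctr, d
    return best_ctr, best_digest

ctr, d = search_with_cached_state(b"\x00" * 128, b"M" * 100_000, 1 << 12)
# the result matches a from-scratch hash with the chosen counter
expect = hashlib.sha256(b"\x00" * 128 + b"M" * 100_000
                        + ctr.to_bytes(8, "big")).digest()
assert d == expect
```

The 128-byte prefix plays the role of the 4·n-byte key material; the 100 KB message is absorbed exactly once regardless of T, which is the point of the optimization.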
The adapted algorithm makes use of a number of external functions. Two calls are made to the standard SHA-256 API functions:
• sha256_inc_blocks(s, in, b): processes b 512-bit blocks from the input "in", using and updating the context state s;
• sha256_inc_finalize(s, in, b): works like sha256_inc_blocks, but also finalizes the hash computation.
Moreover, wots_getlengths(h) computes the sum of the lengths of the hash chains from the hash digest h, and last_block(in) extracts the most significant (M_len + 8) mod 512 bits of the input "in".
Finally, we remark that on top of the iteration technique applied here, [28] also introduced a padding technique to reduce the verifier hashes even further. The idea is to pad the unused bits in the checksum chains with ones (instead of the default zeros), which results in a reduction of roughly w verifier hashes. For example, for w = 2^16 this would mean a reduction of 10%. However, we want our algorithm to be RFC compliant, which checksum padding is not. Therefore we do not incorporate it in the implementation of Section 7.

Security
The authors of [28] already give a rough analysis of their proposal under the assumption that the used cryptographic hash function is collision resistant. In this section, we give a new, precise analysis of the security of their proposal and analyze the cost of classical and quantum attacks against the scheme. This new analysis shows that at the same level of security one may use cryptographic hash functions with about half the output length compared to the analysis in [28]. This translates to about a factor-two speed-up and size improvement: for the same Winternitz parameter w, the number of hash chains per key pair drops by about a factor of two (only the checksum part shrinks by less).
In contrast to other schemes like RSA-PSS or ECDSA, one can prove security of XMSS and its variants (incl. XMSS-T [22] and RFC 8391) with fixed-length messages and without an initial message hash. Hence, the security of the message hashing and of the fixed-length signature scheme can be analyzed independently for XMSS and its variants. We show that in all cases we obtain the bound on the security of the variable input-length scheme as the sum of the bounds for message hashing and the fixed-length scheme. We then analyze the security of the different message hashing constructions for XMSS-type signatures. For this, we first formulate the security assumption on the hash function as a standard-model property.
Then we analyze the complexity of generic attacks, providing a bound for black-box attacks against random functions.
We start by rephrasing the security proof of XMSS-T in this way. Then, building on this proof, we analyze the security of XMSS with hashing as described in Algorithm 1 above. As the latter builds on message hashing as in RFC 8391, we obtain a (tight) security bound for that as a special case. This message hashing differs from that of XMSS-T [22] and was never formally analyzed. We prove that this modified message hashing indeed provides (almost) optimal security.

Index-bound EUF-CMA. Hash-based signature schemes like XMSS-T are so-called key-evolving signature schemes, as introduced by Bellare and Miner in [1] and formalized, e.g., in [6], with the additional property that a secret key update occurs after every signature: we call these simple KES (SKES). The number of updated keys that can be created for one SKES public key is an additional parameter p for key generation (e.g., for XMSS we have p = 2^h, where h is the height of the XMSS tree). After p updates, the key becomes ⊥. Given ⊥ as key, the signature algorithm fails. For a formal definition see Appendix A. What is relevant in the context of this work is that in an SKES a signature σ is accompanied by an index i, and we require an extended security definition where a signature is only valid under the index with which it was produced. We define index-bound existential unforgeability under adaptive chosen message attacks (iEUF-CMA) using the experiment Exp^iEUF-CMA_SKES(A) below for an adversary A that makes q_s queries to its signing oracle Sign. While hidden for readability, the signing oracle Sign is assumed to replace the secret key with the updated secret key after every signature. The difference to the conventional EUF-CMA game is that there are two kinds of valid forgeries: either a forgery is for a fresh message, never sent to Sign (the conventional EUF-CMA case), or it is for a previously queried message but for an index different from the one used in the signature query.
Let {(M_j, (j, σ_j))}_{j=1}^{q_s} be the query-answer pairs of Sign(sk, ·). We denote the success probability of an adversary A that makes q_s signature queries against the iEUF-CMA security of a key-evolving signature scheme KES as Succ^iEUF-CMA_KES(A, q_s) := Pr[Exp^iEUF-CMA_KES(A) = 1].

Hashing with M-eTCR-Hash
XMSS-T as proposed in [22] makes use of a multi-target extended-target-collision-resistant (M-eTCR) hash function to compress the message. Given a hash function H : {0,1}^k × {0,1}^x → {0,1}^m and a fixed input-length SKES S with message space {0,1}^m, we build a variable input-length SKES S' = T_eTCR[S, H] as follows. Below we relate the security of S' to the security of S and the M-eTCR security of H. The definition of the success probability of an adversary A against the M-eTCR security of H makes use of a challenge oracle Box(·), which on input of the j-th message M_j outputs a uniformly random function key R_j. Now consider the following two algorithms that use a forger A against the iEUF-CMA security of S' as a black box to break the iEUF-CMA security of S and the M-eTCR security of H, respectively.

Forger F^A: Given a public key pk for S and access to the corresponding S-signing oracle Sign, run A on input pk. Implement the S'-signing oracle Sign' for A using Sign: sample random R and return Sign(H(R, M)). When A outputs an S'-forgery (M, (i, R, σ)), output (H(R, M), (i, σ)).
M-eTCR adversary M^A: Given access to a challenge oracle Box, generate an S-keypair (pk, sk_0) ← S.gen(1^n, p). Run A on input pk. Simulate A's signing oracle using Box: given the j-th query M_j, run R_j ← Box(M_j) and compute (j, σ) ← S.sign(sk_j, H(R_j, M_j)). When A outputs a forgery (M, (i, R, σ)), output (R, M, i).
Note that the runtimes of F^A and M^A are the same as that of Exp^iEUF-CMA_S'(A), assuming that their challengers run in the same time as honest challengers. Moreover, both make as many queries to their oracles as A makes to its oracle.
Theorem 1 (M-eTCR + SKES). For any adversary A against the iEUF-CMA security of S', we can instantiate the above algorithms F^A and M^A such that

Succ^iEUF-CMA_S'(A, q_s) ≤ Succ^iEUF-CMA_S(F^A, q_s) + Succ^M-eTCR_H(M^A, q_s).

Proof. The event that A succeeds can be split into two mutually exclusive events:

E_1: A succeeds and H(R, M) = H(R_i, M_i) for some signature query i,
E_2: A succeeds and H(R, M) ≠ H(R_i, M_i) for all signature queries i,

where M_i is the message of the i-th signature query and R_i is the randomness used to hash that message.
Now, whenever E_1 occurs, M^A succeeds, as A generated a collision for one of M^A's challenges. Consequently, we obtain Pr[E_1] ≤ Succ^M-eTCR_H(M^A, q_s). Whenever E_2 occurs, F^A succeeds, as A's forgery against S' also leads to a valid forgery against S. So we have Pr[E_2] ≤ Succ^iEUF-CMA_S(F^A, q_s). A union bound gives the theorem statement.
In [22], it was shown that for a random function the M-eTCR success probability of any adversary A that makes q queries to its F-oracle is dominated by a q^2·p·2^{−m} term in the quantum case (q·p·2^{−m} in the classical case). This bound is shown to be tight for m ≤ k, demonstrated by a matching attack in [22]. For k < m we are not aware of such a matching attack. For a specific instantiation of H, these results imply that no attacks exist that treat H as a black box and do better than the above bounds. For parameter selection, this bound says that to achieve b bits of security against quantum attackers, a message digest size of m = 2b + log p is necessary (m = b + log p for classical attackers). This is already significantly better than when using a collision-resistant hash function as considered in [28], which requires m = 3b against quantum and m = 2b against classical attackers.
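To illustrate the parameter consequences, the required digest lengths under the two analyses can be tabulated directly from the bounds quoted above (b is the target security level in bits, p the number of signatures per key):

```python
import math

def required_digest_bits(b: int, p: int, quantum: bool,
                         collision_resistant: bool) -> int:
    """Digest length m needed for b-bit security with p signatures per key:
    M-eTCR needs m = 2b + log p (quantum) / b + log p (classical), while a
    collision-resistance-based analysis needs 3b / 2b, respectively."""
    log_p = math.ceil(math.log2(p))
    if collision_resistant:
        return 3 * b if quantum else 2 * b
    return (2 * b + log_p) if quantum else (b + log_p)

# b = 128, p = 2^20: 276 bits under M-eTCR vs. 384 bits under collision
# resistance, for quantum attackers
print(required_digest_bits(128, 2**20, True, False),
      required_digest_bits(128, 2**20, True, True))
```

The gap between the two columns is the factor-two saving in hash output length mentioned at the start of this section.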

Hashing with index and counter
In our analysis we next look at XMSS-T with the hashing as done in RFC 8391. The message hashing was changed from XMSS-T to RFC 8391 [19, Section 4.1.9] to prevent multi-target attacks, i.e., to avoid the factor p in the bounds given above. To this end, the construction uses the signature index and the root value in the user public key as additional inputs. The index works as a domain separator between signatures under the same public key, the root value as a domain separator between signatures under different public keys. Our analysis of this scheme can be found in Appendix B. However, the result can also be derived as a special case of the analysis below.
We now analyze the security of the message hashing as described in Section 3. For our security analysis we integrate the counter selection into a security property of the hash function and show that an adversary does not gain any advantage from this change in generic attacks. To this end we assume that we are given a hash function H : {0,1}^k × {0,1}^n × {0,1}^{log p} × {0,1}^x × {0,1}^t → {0,1}^m and a fixed input-length SKES S with message space {0,1}^m which allows for the computation of a unique n-bit identifier id_pk per public key.¹ We integrate the counter selection by defining two functions cost and select_cost. The function cost assigns a positive integer value to an output of H. The function select_cost takes inputs R, id_pk, i, M, computes cost(H(R, id_pk, i, M, j)) for all 0 ≤ j < 2^t, and returns the ctr for which cost(H(R, id_pk, i, M, ctr)) is minimal. From this we build a variable input-length SKES S' = T_ctr[S, H]. Again, we relate the security of S' to the security of S and the security of H. The security required from H is what we call M-eTCR with nonce and counter (cnM-eTCR). Besides adding two domain separators (index and public key identifier), the definition of cnM-eTCR adds the selection of a counter with respect to a cost function. Therefore it makes use of a slightly different challenge oracle Box_cost(·) that, on input of the j-th message M_j, outputs a uniformly random function key R_j together with ctr_j = select_cost(R_j, id, j, M_j). Now consider the following two algorithms that use a forger A against the iEUF-CMA security of S' as a black box to break the iEUF-CMA security of S and the cnM-eTCR security of H, respectively.

Forger F^A: Given a public key pk for S and access to the corresponding S-signing oracle Sign, run A on input pk. Compute id_pk from pk. Implement the S'-signing oracle Sign' for A using Sign: to answer the i-th query M, sample random R, compute ctr ← select_cost(R, id_pk, i, M), and return (i, R, ctr, Sign(H(R, id_pk, i, M, ctr))). When A outputs an S'-forgery (M, (i, R, ctr, σ)), output (H(R, id_pk, i, M, ctr), (i, σ)).
cnM-eTCR adversary M^A: When initialized, generate a keypair (pk, sk_0) ← S.gen(1^n, p) for S, compute and output id_pk. When called with id_pk and given access to a challenge oracle Box, run A on input pk. Simulate A's signing oracle using Box: given the j-th query M_j, run (R_j, ctr_j) ← Box(M_j), compute (j, σ) ← S.sign(sk_j, H(R_j, id_pk, j, M_j, ctr_j)), and return (j, R_j, ctr_j, σ). When A outputs a forgery (M, (i, R, ctr, σ)), output (R, M, ctr, i).
Note that the runtimes of F^A and M^A are the same as that of Exp^iEUF-CMA_S'(A), assuming that their challengers run in the same time as honest challengers. Also, both make as many queries to their oracles as A makes to its oracle.
Theorem 2 (cnM-eTCR + SKES). For any adversary A against the iEUF-CMA security of S', we can instantiate algorithms F^A and M^A such that

Succ^iEUF-CMA_S'(A, q_s) ≤ Succ^iEUF-CMA_S(F^A, q_s) + Succ^cnM-eTCR_H(M^A, q_s).

The proof is analogous to that of Theorem 1 above. The actual (small) difference is hidden in the new algorithms F^A and M^A.
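Under the simplifying assumptions that H is SHA-256 over the concatenated inputs and that cost counts the verifier's message-chain hashes, select_cost can be sketched as follows (illustrative only; the real H_msg and cost function follow Section 3):

```python
import hashlib

W, LEN_1 = 16, 64  # Winternitz parameter and number of message chains

def cost(digest: bytes) -> int:
    """Verifier hashes for the message chains: sum(w - 1 - h_i) over the
    base-w digits of the digest."""
    bits = int.from_bytes(digest, "big")
    return sum(W - 1 - ((bits >> (4 * i)) & (W - 1)) for i in range(LEN_1))

def select_cost(R: bytes, id_pk: bytes, i: int, M: bytes, t: int) -> int:
    """Return ctr in [0, 2^t) minimizing cost(H(R, id_pk, i, M, ctr));
    H is sketched as SHA-256 of the concatenation."""
    def H(ctr: int) -> bytes:
        return hashlib.sha256(R + id_pk + i.to_bytes(4, "big")
                              + M + ctr.to_bytes(8, "big")).digest()
    return min(range(1 << t), key=lambda c: cost(H(c)))

ctr = select_cost(b"\x00" * 32, b"\x01" * 32, 0, b"message", t=8)
assert 0 <= ctr < 256
```

Minimizing cost here is exactly maximizing the cumulative chain length from the PZMCM search; the challenge oracle Box_cost of the cnM-eTCR game performs the same selection.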
The more interesting part of the analysis is the hash function property cnM-eTCR with regard to the complexity of generic attacks. This tells us how large the impact of the hash function modification is on security. For a random function F we prove the following.

Theorem 3. Let F : {0,1}^k × {0,1}^n × {0,1}^{log p} × {0,1}^x × {0,1}^t → {0,1}^m be random over the set of all functions with that domain and range, and let A be an adversary that makes q queries to its F-oracle.
Setting t = 0, we obtain the case of RFC 8391. Moreover, it is worth noting that, compared to the M-eTCR security of a random function, we no longer have a q^2·p·2^{−m} term in the quantum case (p·q·2^{−m} in the classical case). This is the result of the added domain separation. For choosing post-quantum parameters this means that, as long as p is far smaller than the number of queries needed for a successful attack, a digest size of m = 2b suffices for security level b (and m = b classically). This justifies choosing the message digest length m = n equal to the output length of the internal hash function, as done in the XMSS RFC. This was not justified under the security analysis in [28], which requires m = 2n against classical and m = 1.5n against quantum attackers.
The proof of Theorem 3 uses the HRS framework introduced in [22]. On a high level, the idea is to use an attacker against the hash function to solve an average-case search problem (Lemma 3) for which known bounds exist (Lemma 1). The search problem is modeled as finding an input that maps to '1' for a boolean function f. For this, our reduction B generates a hash function H with the same domain as f that has a solution to cnM-eTCR exactly where the '1' entries in f are.
As the cnM-eTCR game is interactive, i.e., the adversary A selects the target messages, B has to adaptively reprogram H while A already has access to H. We use a second reduction C and a hybrid argument to demonstrate that this reprogramming cannot change A's success probability by much (Lemma 3). This is done using a reduction from reprogramming a function in several positions at once, for which a bound (Lemma 2) was implicitly proven in [22]. The final bound is then obtained by plugging the known bounds into Lemma 3.
The HRS framework uses an average-case search problem, defined in terms of the following distribution D_λ over boolean functions. Using this distribution, the average-case search problem Avg-Search_λ is the problem of finding an x such that f(x) = 1, given oracle access to f ← D_λ. For any q-query algorithm A, its success probability is Succ^Avg-Search_λ(A) := Pr[f(x) = 1 : f ← D_λ, x ← A^f].

Definition 1 ([22]). Let
For this average case search problem HRS prove a quantum query bound.The result for classical algorithms is folklore.
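To make the search problem concrete, the following small Python simulation (our own illustration, not part of the paper's proofs) samples f ← D_λ over a finite domain and runs a classical random-guessing search; the empirical success rate stays below the folklore classical bound λ(q + 1):

```python
import random

def sample_f(lam, n, rng):
    # f <- D_lambda over the domain {0, ..., n-1}: each f(x) = 1
    # independently with probability lam
    return [1 if rng.random() < lam else 0 for _ in range(n)]

def avg_search(f, q, rng):
    # classical q-query search: guess random points, return a hit or None
    for _ in range(q):
        x = rng.randrange(len(f))
        if f[x] == 1:
            return x
    return None

def success_rate(lam, n, q, trials, seed=42):
    rng = random.Random(seed)
    wins = sum(avg_search(sample_f(lam, n, rng), q, rng) is not None
               for _ in range(trials))
    return wins / trials

lam, q = 0.01, 10
rate = success_rate(lam, 1000, q, trials=5000)
assert rate <= lam * (q + 1)   # folklore bound: lam * (q + 1) = 0.11
```

The quantum bound of Lemma 1 has no such elementary simulation; it captures the quadratic advantage of Grover-type search.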

Lemma 1 ([22]). For any q-query algorithm A it holds that

Succ^{Avg-Search_λ}(A) ≤ λ(q + 1), if A is a classical algorithm, and
Succ^{Avg-Search_λ}(A) ≤ 8λ(q + 1)², if A is a quantum algorithm.

Another tool that we need is adaptive reprogramming. Consider the following two games. We are interested in bounding the maximum difference in A's behaviour between playing in one or the other game.
Game G_{0,i}: After A selects id, it gets access to F. In phase 1, after making at most q₁ queries to F, A outputs a message M ∈ {0,1}^x. Then a random R ←_R {0,1}^k is sampled, ctr ← select_cost(R, id, i, M) is computed, and (R, ctr, F(R, id, i, M, ctr)) is handed to A. A continues to the second phase and makes at most q₂ queries. At the end, A outputs b ∈ {0,1}.
Game G_{1,i}: After A selects id, it gets access to F. After making at most q₁ queries to F, A outputs a message M ∈ {0,1}^x. Then a random R ←_R {0,1}^k is sampled, as well as 2^t random range elements y_j ←_R {0,1}^m. Program F(R, id, i, M, j) = y_j for all j and call the new oracle F′. Compute ctr ← select_cost(R, id, i, M) with respect to F′. A receives (R, ctr, y = F′(R, id, i, M, ctr)) and proceeds to the second phase. After making at most q₂ queries, A outputs b ∈ {0,1}.

We want to bound the advantage

Adv^{G_{0,i},G_{1,i}}(A) := |Pr[A → 1 in G_{0,i}] − Pr[A → 1 in G_{1,i}]|

of an adversary A distinguishing between these two games. In [22, Lemma 5] the quantum case is proven for a function H : {0,1}^k × {0,1}^x → {0,1}^m. Considering id, i, and ctr as part of the message, the lemma applies to F. Moreover, while the lemma in [22] only covers reprogramming the function in one position, its proof also covers reprogramming in 2^t positions, which allows us to prove the following lemma.

Lemma 2. For any q-query algorithm A and any p ∈ N, i ∈ [0, p], it holds that

The proof of [22] still applies for the following reason. It uses three intermediate games to get from G_{0,i} to G_{1,i}: In the first game, R is sampled at the very beginning. In the second game, F_R (the function resulting from F by fixing the first input to R) is replaced during the first phase by the constant zero function. In the third game, F_R is programmed in the second phase at position (id, i, M). The step to G_{1,i} is then to make F_R during the first phase again a random function. Now, in our setting we change the third game to reprogram F_R in 2^t positions. However, the distinguishing advantage of any adversary between the second and the original third game is 0, and it remains 0 for the modified third game. The reason is that in both games, F_R in the second phase is a fresh random function; the only difference is who samples the points of the function, and that is transparent to the adversary. The remaining analysis stays the same.
For non-quantum A, it is folklore that this advantage is simply the probability that A correctly guesses R in one of its q queries. As the final ingredient for the proof of Theorem 3, we need the following lemma.

Lemma 3. Let H as defined above be a family of random functions. Any (quantum) adversary A that solves cnM-eTCR making q (quantum) queries to H and p queries to Box can be used to construct a quantum adversary B against Avg-Search_{1/2^m} making no more than 2q + p2^t queries to its oracle, and an adversary C distinguishing the games G_{0,i}, G_{1,i} above making no more than 2q + 1 queries to its oracles, such that

Succ^{cnM-eTCR}_{H,p}(A) ≤ Succ^{Avg-Search_{1/2^m}}(B) + p · Adv^{G_{0,i},G_{1,i}}(C).

For non-quantum adversaries A, the numbers of queries are q + p2^t and q + 1, respectively.
Note that the reductions B and C described in the proof below only have to be quantum if A is quantum. Consequently, for classical A our reductions B and C are also classical.
Proof. The reduction B is shown in Figure 4.1. B makes use of several random functions (e and g). In [32], Zhandry showed that against a q-query quantum adversary, random functions can be simulated using 2q-wise independent hash functions. In addition, we require that e : K × {0,1}^t → {0,1}^m is collision free for every fixed key K ∈ K. Such a function can be simulated using a quantum-secure pseudorandom permutation (qPRP) over {0,1}^m with key space K; such qPRPs exist if one-way functions exist [33]. Moreover, B makes use of a function select_cost′ : K → {0,1}^t that simulates the behavior of select_cost.
1. Sample the random functions e and g as described above.
2. Let select_cost′ : K → {0,1}^t be the function that, given K ∈ K, returns the ctr such that cost(e(K, ctr)) is minimal within {cost(e(K, j)) : j ∈ {0,1}^t}.
3. Construct the initial H from f, e, and g such that H has a solution to cnM-eTCR exactly where the '1' entries of f are.
4. Run A(1^n); when it outputs id, store it.
5. Run A(id), simulating Box as detailed in Figure 4.1.

We now analyze the success probability of B. Per construction, whenever (M′, R′, i′, c′) is a valid cnM-eTCR solution, H(R′‖id‖i′‖M′‖c′) = e(id‖i′, select_cost′(id‖i′)), which for M′ ≠ M_{i′} is only the case if f(R′‖id‖i′‖M′‖c′) = 1. So, whenever A succeeds, B also succeeds. It remains to argue about A's success probability when run by B. To this end, we observe that H follows the uniform distribution over all functions with the same domain and co-domain as H: per key K, every domain element maps to e(K, select_cost′(K)) with probability λ = 2^{−m}. Every other value is taken with probability ((2^m − 1)/2^m)·(1/(2^m − 1)) = 2^{−m}, which corresponds to the probability of f not being 1 times that of sampling the value out of a set of 2^m − 1 values. This also holds for all intermediate versions generated by the reprogramming in Step 5 when treated as independent functions (we handle the dependency below), as reprogramming means that we re-sample a random position. The R_i are sampled uniformly at random and hence also follow the distribution used in the cnM-eTCR game. Due to the use of select_cost′ we ensure that the returned counter values also follow the right distribution. While we do exclude the possibility of collisions in the output of e for fixed K, this does not disturb the distribution, as we implicitly consider the cases where collisions occur (by checking whether f = 1 for any of the programmed values) but immediately abort with a success event in that case.
We further have to show that the reprogramming in Step 5 does not change A's success probability by much. This can be shown by a sequence of game hops. Consider the games G_j for 0 ≤ j ≤ p, which are similar to B but only reprogram H for the first j queries to Box and leave it untouched for the remaining queries.
Given the above analysis, G_0 perfectly simulates the cnM-eTCR game for A. Consequently, the probability that A succeeds when run by G_0 is Succ^{cnM-eTCR}_{H,p}(A) for random H. On the other end, G_p = B, so by the above analysis the success probability of A in G_p is upper bounded by Succ^{Avg-Search_{1/2^m}}(B). Now, the difference in success probability of A between any two consecutive games G_{j−1}, G_j is upper bounded by Adv^{G_{0,j},G_{1,j}}(C) for the following algorithm C, which simulates G_{j−1} to A when run in G_{0,j} and G_j when run in G_{1,j}. Given access to the first function F, C simulates G_{j−1} using F in place of the initial H constructed in Step 3. This means C forwards all regular function queries to its F oracle, except those for values it reprogrammed during the first j − 1 calls to Box. Now, when C runs in G_{0,j}, the outer game does not change F, and consequently this perfectly simulates G_{j−1}. If in turn C runs in G_{1,j}, the outer game reprograms F in one more position, and consequently this perfectly simulates G_j. Now C simply outputs 1 whenever A succeeds and 0 otherwise. The final bound is obtained by observing that there are p game hops.
Theorem 3 now follows from plugging the bounds of Lemmas 1 and 2 into the bound of Lemma 3.

Analysis
PZMCM [28] give an experimental argument for the normal distribution of the hash chain values of Winternitz signatures when using Winternitz tuning. The parameters of this distribution need to be determined separately for each value of w and T to obtain estimates on the expected number of hashes for the verification of the resulting signature. In this section we formalize this analysis by analyzing the distribution of the hash chain values under the assumption that the hash function H used to create the message digest behaves like a random function F. This results in a closed formula that can be used to estimate the expected value for any value of w and T, enabling an implementer to choose signature parameters without running many experiments. In Section 6 we provide experimental support justifying this heuristic analysis. Below we denote by H_M the m-bit message digest of an arbitrary-length message M obtained by applying H_msg as defined in Section 3, where we only make the inputs M and ctr explicit and assume the remaining inputs to be fixed.

Message chain length analysis
For an m-bit base-w message digest H_M = (h_1, . . ., h_{ℓ1}) we have the following lemma.

Lemma 4. Fix positive integers w and m, which define ℓ1 = ⌈m/log₂(w)⌉, and let X = (Σ_{i=1}^{ℓ1} h_i)/ℓ1 be a random variable, i.e., the mean of the integer base-w representation values of H_M = (h_1, . . ., h_{ℓ1}) ∼ U({0,1}^m), where 0 ≤ h_i < w for 1 ≤ i ≤ ℓ1 and U is the uniform distribution. Then the mean of X, denoted by µ(X), is equal to (w − 1)/2 and the variance is equal to (w² − 1)/(12ℓ1).
Since the total number of hashes in the message chains equals (w − 1)ℓ1, subtracting the above quantity yields the theorem.
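As a quick numerical sanity check of Lemma 4 (our own sketch, not from the paper), one can sample uniform m-bit digests, split them into base-w digits, and compare the empirical mean and variance of X against the closed formulas, e.g. for w = 16 and m = 256:

```python
import math
import random
import statistics

def base_w(digest, w, l1):
    # split an l1*log2(w)-bit integer digest into l1 base-w values
    lg = int(math.log2(w))
    return [(digest >> (lg * i)) & (w - 1) for i in range(l1)]

w, m = 16, 256
l1 = math.ceil(m / math.log2(w))        # 64 message chains
rng = random.Random(0)

# sample the chain-value mean X over many uniform m-bit digests
xs = [statistics.mean(base_w(rng.getrandbits(m), w, l1))
      for _ in range(20000)]

# Lemma 4: E[X] = (w - 1)/2 = 7.5, Var[X] = (w^2 - 1)/(12*l1) ~ 0.332
assert abs(statistics.mean(xs) - (w - 1) / 2) < 0.05
assert abs(statistics.variance(xs) - (w * w - 1) / (12 * l1)) < 0.02
```

The small variance of X for ℓ1 = 64 is exactly what makes the best-of-2^t search pay off slowly for large t.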

Chain lengths checksum
So far we have only looked at the lengths of the ℓ1 message chains. For the lengths of the ℓ2 checksum chains, the challenge is that they depend on the values H_M = (h_1, . . ., h_{ℓ1}). On average, when the values of the coefficients of H_M are high, the checksum coefficients C_M = (c_1, . . ., c_{ℓ2}) (written in base w) will be low. However, this is not always the case.
As in [4] we assume the computations of expectations are independent.
For the analysis of total averages, Assumption 2 implies that the number of hashes as stated in Theorem 4 should be augmented by the average values of the checksum chains. Hence, for the checksum coefficients C_M = (c_1, . . ., c_{ℓ2}) we obtain, analogously to the entries of H_M, Lemma 5.

Proof. This follows from the properties of the discrete uniform distribution.
A difference between this work and [4] is that we pick the best signature with regard to the hash effort of the verifier out of T hashes, whereas [4] analyzed fully independent signatures. As we will also see in Section 6, for large values of T the independence assumption (Assumption 2) no longer holds for analysis purposes.
As an alternative, one could assume that the maximum value is reached in all the checksum chains. This means assuming C_M equals the all-zero vector for the verifying effort, so that the number of hashes to be computed is ℓ2(w − 1). This option should not impact the analysis too much since ℓ2 ≪ ℓ1. We discuss both options in more detail in Section 6.
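For reference, the base-w checksum digits C_M can be computed as follows. This is a simplified sketch of the WOTS+ checksum: RFC 8391 additionally left-shifts the checksum for byte alignment before splitting it into digits, which we omit here.

```python
import math

def wots_checksum(h, w):
    # h: the l1 base-w message digits; returns the l2 base-w checksum digits
    l1 = len(h)
    csum = sum(w - 1 - hi for hi in h)              # in [0, l1*(w-1)]
    l2 = math.floor(math.log(l1 * (w - 1), w)) + 1  # digits needed
    return [(csum // w ** i) % w for i in reversed(range(l2))]

# w = 16, l1 = 64: the checksum lies in [0, 960] and needs l2 = 3 digits
assert wots_checksum([15] * 64, 16) == [0, 0, 0]    # all-max digest
assert wots_checksum([0] * 64, 16) == [3, 12, 0]    # 960 = 3*256 + 12*16
```

The two boundary cases show the anti-correlation exploited above: maximal message digits give an all-zero checksum (zero verifier work on the checksum chains), and vice versa.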

Experimental verification
We now verify the estimates given in Section 5. We analyze the expected number of hashes for a verifier according to Section 5 and compare it to the experimentally determined minimal and maximal performance gain for verification obtained by appending one of 2^t counters in the message hash. All experiments are run with w ∈ {4, 16, 256}, conform to the approach outlined in [19], and are run on a single core of an AMD Ryzen Threadripper 1950X running at 3.4 GHz.

Validation
We first validate Theorem 4 in practice. We compute 10³ signatures for every combination of w ∈ {4, 16, 256} and t ∈ {0, 1, . . ., 30}. Hence, each signature generation computes T = 2^t different counter values in the hash computation and records the best achieved result in terms of the number of hash computations required to verify the signature, considering only the length of the message chains. The average best results over these 10³ trials are plotted in Figure 6.1. We observe that the values indeed coincide with the estimate of Theorem 4, so Assumption 1 seems to hold. However, for larger values of (w, t) the estimate becomes slightly optimistic. As explained in Section 5, we conjecture this is because the chain lengths are not exactly normally distributed: each chain has a bounded maximum, so the approximate chain-length distribution takes on larger extreme values than are possible in reality, and therefore the estimate is optimistic for large values of t. We conjecture that this effect is stronger for larger values of w, because the effect of one extremal chain value is larger.
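The experiment can be reproduced in miniature with the sketch below. It is our own simplification: the counter is hashed together with the message via plain SHA-256 rather than the full H_msg construction, and only the message chains are counted.

```python
import hashlib
import math

def chain_sum(digest, w, l1):
    # sum of the base-w digits h_i of the digest (message chains only)
    lg = int(math.log2(w))
    val = int.from_bytes(digest, 'big')
    return sum((val >> (lg * i)) & (w - 1) for i in range(l1))

def verifier_hashes_best_of(message, t, w=16, l1=64):
    # try 2^t counters and keep the digest maximizing sum(h_i),
    # i.e. minimizing the verifier's l1*(w-1) - sum(h_i) chain hashes
    return min(
        l1 * (w - 1) - chain_sum(
            hashlib.sha256(message + c.to_bytes(4, 'big')).digest(), w, l1)
        for c in range(2 ** t))

cost_plain = verifier_hashes_best_of(b"firmware v1.2", 0)  # t = 0: no search
cost_t8 = verifier_hashes_best_of(b"firmware v1.2", 8)     # best of 256
assert cost_t8 <= cost_plain   # the search can only help the verifier
```

Averaging `cost_t8 - cost_plain` over many messages reproduces the downward trend of Figure 6.1 qualitatively.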
We perform a similar experiment where we include the checksum hash chains. The results are depicted in Figure 6.2.
Two estimates are presented in each graph: one where we assume the length of the checksum chains behaves according to Assumption 2 and take its average as determined in Lemma 5, and one where we take the upper bound of w − 1 hash function calls for these chains. For small t values, the average estimate of Lemma 5 fits quite well. For these values, the conservative estimate of w − 1 hashes for each checksum chain is pessimistic about the effort reduction in signature verification; the average experimental number of hashes is strictly lower than the maximum. However, for larger values of t one observes that, especially for w ∈ {16, 256}, the upper bound for the checksum chains lies closer to reality. We have found this is due to the violation of the independence assumption. This effect is directly caused by our algorithm adaptations: by choosing signatures with high-valued hash chains, and by construction of the checksum C_M (see Section 2.2), a high average value for the hash chains h_i, 0 < i ≤ ℓ1, on average means a low average value for the checksum hash chains c_i, 0 < i ≤ ℓ2. However, this is not straightforward to analyze, as we see for w = 4, where adding the average estimate of Lemma 5 for the ℓ2 checksum chains seems closer to reality. Experiments show that the probability of C_M ∉ [256, 512] is very small; in fact, it did not occur once in 10⁷ random trials. This means that even though the checksum (which determines the checksum chains) has 10 bits: 1) the first bit is always set to 0, because the checksum fits in 9 bits; 2) the second bit is almost always set to 0, because the probability that C_M > 512 is very low; and 3) the fourth bit is almost always set to 1, because the probability that C_M < 256 is very low. Hence, for w = 4, which has ℓ2 = 5, with high probability c_1 = 0 and c_2 ∈ {2, 3}. This inflexibility in the checksum means that the conservative estimate can never be reached, and therefore for w = 4 Lemma 5 serves as the better estimate.

Expectation of Hashes in Signature Verification
We continue by analyzing the expected minimum and maximum number of hash computations in signature verification given a fixed value of t. Although the expected value is a good indicator of the improvement trend, for practical implementations it is good to know what could be achieved in the best case, but more importantly, also what could be the worst possible result of applying this signer/verifier trade-off. To this end, the boxplots of our experimental results are depicted in Figure 6.3. Not surprisingly, one observes outliers in practice. Note, however, that even for the worst case in 10³ trials, the trend is downwards.
We now use this data to derive heuristic upper and lower bounds for the number of hash computations of the verifier as a function of the number 2^t of signature computations of the signer. The results are shown in Figure 6.4. We extrapolate the minimum and maximum values found in the experiments for 5 ≤ t ≤ 30 by fitting an exponential function f(x) = a·e^{−bx} + c to the values. We omit the first values to avoid precision errors caused by the initial steep decline. The resulting fit can be seen in the legend of the dash-dotted line. To gain extra confidence in our estimate, we run one trial for larger values of t (instead of the 10³ trials used for the remaining graph). The resulting data points in Figure 6.4 for t = 33, 37, 40 fall within our estimates. However, for tighter and more confident bounds, more data would need to be gathered.
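The extrapolation step can be sketched as follows. The data below is synthetic, generated from hypothetical parameters a, b, c rather than the paper's measurements, and the fitting routine (a grid search over the asymptote c combined with a log-linear regression) is a simple stand-in for a proper nonlinear least-squares fit:

```python
import math

def fit_exp(xs, ys, grid=1000):
    # Fit y = a*exp(-b*x) + c: for each candidate asymptote c, regress
    # ln(y - c) linearly on x, and keep the (a, b, c) with least SSE.
    best = None
    hi = min(ys) - 1e-9                 # c must stay below all samples
    for k in range(grid):
        c = hi * k / (grid - 1)
        zs = [math.log(y - c) for y in ys]
        n = len(xs)
        xbar, zbar = sum(xs) / n, sum(zs) / n
        b = -sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs)) / \
            sum((x - xbar) ** 2 for x in xs)
        a = math.exp(zbar + b * xbar)
        sse = sum((a * math.exp(-b * x) + c - y) ** 2
                  for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

# synthetic "verifier hashes vs. t" curve from hypothetical parameters
a0, b0, c0 = 120.0, 0.15, 430.0
xs = list(range(5, 31))
ys = [a0 * math.exp(-b0 * x) + c0 for x in xs]
a, b, c = fit_exp(xs, ys)
assert abs(b - b0) < 0.02 and abs(c - c0) < 0.5
```

The recovered asymptote c is the quantity of practical interest: it bounds how far the verifier's effort can be pushed down no matter how much signing time is invested.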

Benchmark Results
The implementation used for both the signature generation on the high-end platform and the signature verification on the embedded device is the reference implementation released together with the RFC [19]. We replaced only the SHA-256 implementation with the C implementation also used in the embedded crypto benchmark platform pqm4 [24]. For all benchmarks we used the XMSS parameter set known as "XMSS-SHA2_10_256" (where w = 16). We put the modified reference code of XMSS with all optimizations discussed in this paper into the public domain. It is available at https://huelsing.net/code/RapidXMSS_code.zip and comes with no guarantee or warranty.

Signature Generation
The estimates from Section 5 have been shown to hold experimentally in Section 6. In this section the goal is to quantify more precisely the trade-off that moves computational time from signature verification to signature generation. As in Section 6, signature generation is run on a single core (out of the 16 cores) of an AMD Ryzen Threadripper 1950X running at 3.4 GHz while the system is active with other tasks, in order to simulate a typical system environment. Since these are typically long runs, the timings are reported in seconds instead of clock cycles: the goal is not to be overly precise but to give a ball-park figure of how long one can expect signature generation to take. First, let us investigate the practical impact of the optimization described in Section 3. By precomputing the first 2 + (M_len + 8)/64 blocks of SHA-256, one only has to compute the final block, where the counter is included, over and over again. When this optimization is used, the time for a fixed large value of t becomes independent of the message size. For example, when t = 25 (i.e., doing 2^25 SHA-256 computations per signature), computing an XMSS signature requires around 14 seconds irrespective of the message size. When this optimization is not applied, the situation is quite different, as summarized in Table 7.1.
Hence, for large messages of 100 KB, precomputing the initial blocks results in almost three orders of magnitude speed-up. It is of interest to estimate how long the more efficient implementation will run for larger values of t. On our target platform, a good estimate for t = 22 + δ for positive integer values of δ (values where signature generation takes longer than one second) is 1.8 · 2^δ seconds. Combining these estimates with those for the average verifier hashes of Theorem 4 (applying respectively Lemma 5 and the maximum value w − 1 for the ℓ2 chains) and the extrapolated minimum and maximum of Figure 6.4, we offer an implementer some support for choosing their algorithm parameters. Practically, this means that for a given amount of time invested in generating a (firmware) signature, we can expect the signature verification speed-ups in Table 7.2.
The comparative percentages here are with respect to the average number of verifier hashes in a one-shot signature verification, i.e., XMSS without the PZMCM technique. We see that the largest jump in improvement is already reached by spending a few seconds on the computation of a firmware signature. Note that these computations are embarrassingly parallel and can be distributed over multiple cores. Moreover, it might be an interesting project to see to what extent existing Bitcoin mining ASICs can be reused for these repeated hash computations.
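The block-precomputation trick maps directly onto hashlib's incremental API: hash the long fixed prefix once, then clone the internal state for each counter. The sketch below uses plain SHA-256 over message‖counter and picks the lexicographically smallest digest as a stand-in "cost", not the exact H_msg inputs or select_cost of the paper:

```python
import hashlib

def search_naive(message, t):
    # rehash the full message for every counter
    return min(range(2 ** t),
               key=lambda c: hashlib.sha256(
                   message + c.to_bytes(4, 'big')).digest())

def search_precomp(message, t):
    # absorb the long fixed prefix once; per counter, only the
    # final block(s) containing the counter are hashed
    pre = hashlib.sha256(message)
    best_c, best_d = None, None
    for c in range(2 ** t):
        h = pre.copy()                  # cheap copy of the hash midstate
        h.update(c.to_bytes(4, 'big'))
        d = h.digest()
        if best_d is None or d < best_d:
            best_c, best_d = c, d
    return best_c

msg = b"\x00" * 100_000                 # e.g. a 100 KB firmware image
assert search_naive(msg, 6) == search_precomp(msg, 6)
```

For a 100 KB message, the naive variant rehashes ~1563 compression-function blocks per counter, while the precomputed variant hashes only the trailing counter block, which is where the orders-of-magnitude gap in Table 7.1 comes from.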

Signature Verification
As the benchmark platform and representative embedded target platform we used the Freedom-K64F (FRDM-K64F), an ultra-low-cost development platform for Kinetis microcontrollers by NXP. More specifically, these low-power microcontrollers are based on an Arm Cortex-M4 core, have 256 kB RAM and 1 MB flash memory, and run at 120 MHz. On most Cortex-M3, M4, and M7 devices, including e.g. the NXP Kinetis or LPC devices, ARM provides the Data Watchpoint and Trace (DWT) unit. The DWT is an optional debug unit that provides watchpoints, data tracing, and system profiling for the processor. It contains counters for, among others, clock cycles (CYCCNT). This makes it extremely simple to gather accurate cycle counts for portions of the code, and we have used the DWT unit to collect the cycle counts reported in this section. The reference implementation is compiled with the flags -O3 -mthumb -mcpu=cortex-m4 -mfloat-abi=hard using the arm-none-eabi-gcc cross-compiler version 8.3.1 in the MCUXpresso IDE.

One of the optimizations discussed in [9] is concerned with precomputing some of the hash computations. Assume one uses XMSS with SHA-256; then, for a fixed key pair, the first 512-bit input to the pseudo-random function is the same for all calls. Since the internal block size of SHA-256 is also 512 bits, this input can be precomputed and reused, halving the total number of calls to the SHA-256 compression function. This optimization is fully compatible with the RFC [19] but is not applied in the accompanying reference implementation. We have implemented this approach and denote it by "precomp".
In order to benchmark the impact of the iterated hash technique, we measure the average signature verification time using both the pre-hash and the techniques from Section 3. From Theorem 4, using w = 16, t = 10, α = π/8, ℓ1 = 64, the expected number of required hashes is 360.4. For analysis purposes we can assume the ℓ2 checksum chains to be independent and uniformly distributed. Therefore, from Lemma 5 we obtain, using ℓ2 = 3, that the mean value for the checksum is 45/2; hence, the expected number of hashes is 382.9. This is 1.31 times faster than the (64 + 3) · 7.5 = 502.5 hashes one expects when not using this technique (t = 0). When looking at the experimental data from Section 6 this ratio remains the same: 391.8 and 508.4 hashes for t = 10 and t = 0, respectively. The following performance numbers, in millions of cycles, are the average over one hundred signature generation/verification runs. Precomputing the hash inputs as remarked by [9] leads to a factor 1.47 speed-up. Using t = 10, which means a signature generation time of around 40 ms, results in a speed-up factor of 1.21. Combining both leads to a reduction of the signature verification time by a factor 1.76 compared to the reference implementation. Obviously, one can increase the value of t to improve these figures. Assume the signer can afford to spend one minute on a single core on signature generation (t = 27): then the verifier can expect more than a factor 1.44 and 2.11 speed-up without and with precomputing hash inputs, respectively, and a signature verification time of well below 7 million cycles.
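The arithmetic behind these figures can be checked back-of-the-envelope with the constants from above:

```python
w, l1, l2 = 16, 64, 3                  # XMSS-SHA2_10_256 message encoding

# t = 0: every chain costs (w - 1)/2 = 7.5 verifier hashes on average
plain = (l1 + l2) * (w - 1) / 2        # 502.5

# t = 10: 360.4 expected message-chain hashes (Theorem 4), plus the
# checksum mean l2*(w - 1)/2 = 45/2 = 22.5 from Lemma 5
boosted = 360.4 + l2 * (w - 1) / 2     # 382.9

assert plain == 502.5
assert abs(boosted - 382.9) < 1e-9
assert round(plain / boosted, 2) == 1.31
```
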
Remark 1. The second term of the bound in Theorem 6 originates from the application of adaptive reprogramming. While the bound is tight in the classical setting, we conjecture that it is extremely loose in the quantum setting. Indeed, our reduction is tight, but we conjecture that the bound in Lemma 6 is not. We would expect a tight quantum bound for reprogramming to be close to the classical bound, as the problem seems related to random guessing, for which quantum computation provides no advantage.
The bound in Theorem 6 is nevertheless interesting because it justifies the use of a message digest length m = 2b for a targeted security level b in the post-quantum setting, independent of the number of targets p. This is optimal for the given problem, as an attack using Grover's algorithm can reach this bound. For XMSS-style signatures this is highly relevant, as the length m of the message digest largely influences the number of hash values in a Winternitz signature. The bound is still not optimal with regard to the required length k of the randomizer, which has to be chosen as k ≥ 2b + log p in the post-quantum setting. However, the impact of this non-tightness is less severe, as it only increases the size of one value in the signature.
Game G_{0,i}: After A selects id, it gets access to F. In phase 1, after making at most q₁ queries to F, A outputs a message M ∈ {0,1}^x. Then a random R ←_R {0,1}^k is sampled and (R, F(R, id, i, M)) is handed to A. A continues to the second phase and makes at most q₂ queries. At the end, A outputs b ∈ {0,1}.

Game G_{1,i}: After A selects id, it gets access to F. After making at most q₁ queries to F, A outputs a message M ∈ {0,1}^x. Then a random R ←_R {0,1}^k is sampled, as well as a random range element y ←_R {0,1}^m. Program F(R, id, i, M) = y and call the new oracle F′. A receives (R, y = F′(R, id, i, M)) and proceeds to the second phase. After making at most q₂ queries, A outputs b ∈ {0,1}.

Figure 1.1: The authentication path to authenticate the fifth leaf is shown in gray.

Figure 4.1: Reducing Avg-Search to cnM-eTCR.

Lemma 5. Let Y = Σ_{i=1}^{ℓ2} c_i be a random variable, i.e., the sum of the checksum values of H_M. Then the mean µ(Y) is equal to ℓ2(w − 1)/2 and the variance is equal to ℓ2(w² − 1)/12.

Figure 6.1: Average number of hash computations for signature verification of the first ℓ1 message chains as a function of t. Solid blue line: average over 10³ experiments. Dashed red line: estimate of Theorem 4.

Figure 6.2: Average number of hash computations for signature verification of all message chains as a function of t. Solid blue line: average over 10³ experiments. Dashed red line: estimate of Theorem 4 for the ℓ1 chains plus the mean of Lemma 5 for each of the ℓ2 checksum chains. Dash-dotted magenta line: estimate of Theorem 4 for the ℓ1 chains plus the maximum value w − 1 for each of the ℓ2 checksum chains.

Figure 6.3: Number of hashes for the verifier in the chains, after taking the highest cumulative hash chain value out of 2^t appended counters. Boxplots are over 10³ trials for each value of t. The box represents the 50% confidence interval (i.e., data points between the first and third quartile), with the yellow line marking the median. The whiskers represent the 95% confidence interval. The dots represent outliers.

Figure 6.4: Number of hashes for the verifier in the chains, after taking the highest cumulative hash chain value out of 2^t appended counters. Blue resp. red line: minimum resp. maximum cumulative hash chain value over 10³ experiments for each value of t. Blue resp. red dash-dotted line: extrapolated fit of a·exp(−bx) + c to the experimental data for the minimum resp. maximum. Green dots represent three single experiments for t ∈ {33, 37, 40}.

4. Run A(id), simulating Box. When A sends its ith query M_i ∈ {0,1}^x:
(a) Sample R_i ←_R {0,1}^k.
(b) If f(R_i‖id‖i‖M_i‖j) = 1 for some j ∈ {0,1}^t, output R_i‖id‖i‖M_i‖j and stop.
(c) Program H(R_i‖id‖i‖M_i‖j) = e(id‖i, j) for all j ∈ {0,1}^t.
(d) Return (R_i, select_cost′(id‖i)).
5. When A outputs (M′, R′, i′, c′), output R′‖id‖i′‖M′‖c′.

Table 7.1: Signature generation time for a message hash with 2^25 different counter values, with and without precomputing the first 2 + (M_len + 8)/64 SHA-256 blocks.

Table 7.2: Trade-off between signature generation time and verification for message hashing with 2^t different counter values; improvements compared to standard XMSS are displayed in italics.