ES-TRNG: A High-throughput, Low-area True Random Number Generator based on Edge Sampling

In this paper we present a novel true random number generator based on high-precision edge sampling. We use two novel techniques to increase the throughput and reduce the area of the proposed randomness source: variable-precision phase encoding and repetitive sampling. The first technique consists of encoding the oscillator phase with high precision in the regions around the signal edges and with low precision everywhere else. This technique results in a compact implementation at the expense of reduced entropy in some samples. The second technique consists of repeating the sampling at high frequency until the phase region encoded with high precision is captured. This technique ensures that only the high-entropy bits are sent to the output. The combination of the two proposed techniques results in a secure TRNG, which suits both ASIC and FPGA implementations. The core part of the proposed generator is implemented with 10 look-up tables (LUTs) and 5 flip-flops (FFs) of a Xilinx Spartan-6 FPGA, and achieves a throughput of 1.15 Mbps with 0.997 bits of Shannon entropy. On Intel Cyclone V FPGAs, this implementation uses 10 LUTs and 6 FFs, and achieves a throughput of 1.07 Mbps. This TRNG design is supported by a stochastic model and a formal security evaluation.


Introduction
True Random Number Generators (TRNGs) are essential building blocks of modern embedded security systems. They enable various cryptographic algorithms, protocols and secured implementations by providing secret keys, initialization vectors, random challenges and masks. The security of these applications relies on the uniformity and unpredictability of the utilized random numbers. Failures in today's security systems are often traced to a design flaw in, or an active attack on, the TRNG [Mar03, BCC+13] rather than to a broken cryptographic algorithm or an unprotected implementation.
True randomness cannot be obtained via computational methods. Instead, physical phenomena such as noise in electronic devices should be the source of the unpredictable nature of TRNGs. Due to their importance for security, TRNGs are subjected to strict evaluations in the process of industrial certification. SP 800-90B [TBK+18], a special publication of the National Institute of Standards and Technology (NIST), contains requirements for the design and evaluation of TRNGs. According to this document, a theoretical rationale for the unpredictable behavior of the entropy source is required. A min-entropy estimation of the generated output and effective online tests are also mandatory. The German BSI standard AIS-31 [KS11] puts forward a stricter requirement for the design and the evaluation of TRNGs. This standard requires a formal security analysis of the TRNG design based on a stochastic model of the entropy source. The generated random bits need to provide a Shannon entropy of at least 0.997 per bit.
Examples of TRNGs in literature include Intel's embedded RNG core called µRNG [MJS+16], ASIC designs [PSR06] and FPGA implementations (e.g. [CFFA13, MKD11]). Several types of physical phenomena are explored to design an entropy source for TRNGs, including thermal noise [BPPT06], chaos [Gol06], timing jitter [SMS07, SPV06, WT08, RYDV15] and metastability [VHKK08, DGH09, GP09, VD10]. Many of these TRNGs are designed without accounting for AIS-31 compliance, so they are not supported by stochastic models and formal security analysis. In addition, some designs use unrealistic assumptions about the platform parameters (e.g. overestimated strength of the jitter), thereby resulting in an overly optimistic entropy assessment.
Our goal is to develop a lightweight, AIS-31 compliant TRNG with conservative entropy estimation. The contributions of this paper are the following:

TRNG design procedure
Various design criteria need to be taken into account when designing a TRNG. Conventional design goals include resource consumption, throughput, latency, feasibility in the target platform and design effort. A trend that started more than a decade ago is to also follow security criteria during the TRNG design procedure. Some essential requirements are resistance against attacks, a stochastic model to prove unpredictability, and inner testability of the entropy source. The indispensable design criterion is a security assessment based on a realistic and applicable stochastic model.
Figure 1 shows both the old and the modern TRNG design procedure. The old and obsolete design approach relies on statistical evaluations of the output of the proposed TRNG. Commonly used statistical test suites include FIPS 140-1 [FIP94] and DIEHARD [Mar98]. However, these test suites only check statistical parameters of the generated bits. A completely deterministic pseudo-random sequence could pass those test suites, despite having no randomness.
In a modern design approach, a stochastic model is used to evaluate the unpredictability of the proposed TRNG design; such a model is also a compulsory requirement of BSI AIS-31 [KS11]. Statistical test suites only function as a sanity check or a prototype evaluation. The non-determinism of a TRNG should be rooted in an unpredictable physical process that evolves over time. A notable example is the timing phase jitter in a free-running ring oscillator. A stochastic model is a simplified abstraction of this process based on clearly stated and verifiable assumptions. Platform and design parameters serve as inputs to a stochastic model to enable the entropy estimation of a TRNG. Platform parameters, such as the delays of logic gates and the jitter strength, should be evaluated for the target platform. Design parameters, such as the sampling frequency, should be determined according to the design requirements.
Figure 2 shows a generic architecture of a TRNG. The entropy source is the only component in the architecture with non-deterministic behavior. All randomness is generated by the entropy source, in some cases in the form of analog signals. A digitization module is needed to convert these analog signals into a digital form. Implementations of the entropy source and the digitization module comprise the digital noise source.
The raw random numbers are the output of the digital noise source and should be available for inner testability. Raw random numbers are usually subject to statistical defects, such as a bias away from the ideal probability of ones and auto-correlation between output bits.
The post-processing module is utilized to enhance the statistical and security characteristics of the TRNG. There are two types of post-processing: algorithmic post-processing and cryptographic post-processing. Algorithmic post-processing is utilized to extract entropy from the raw random numbers to increase the entropy per bit. Cryptographic post-processing is used to provide additional security properties, such as backtracking resistance. We note that, according to AIS-31 [KS11], there is a minimal requirement on the entropy level of the input data of cryptographic post-processing. The output data of the post-processing function are referred to as the internal random numbers. The post-processing is optional and not required if the entropy of the raw random numbers is sufficient.
The online tests, also called embedded tests or continuous tests, are used to detect

Notation and definitions
• $\Pr(a)$ - The probability of the event $a$.
• $E(X)$ - The expected value of the random variable $X$.
• $\sigma^2(X)$ - The variance of the random variable $X$.
• $\sigma(X)$ - The standard deviation of the random variable $X$.
• $\rho_{X|a}(x)$ - The conditional probability density of the random variable $X$ defined over the domain $\mathbb{R}$, given that the event $a$ occurred.
• $H_1(X)$ - The Shannon entropy of a discrete random variable $X$ with outcomes 0 and 1. If $p$ denotes the probability $\Pr(X = 1)$, then the Shannon entropy is computed as follows:
$$H_1(X) = -p \log_2 p - (1 - p) \log_2 (1 - p).$$
• $H_\infty(X)$ - The min-entropy of a discrete random variable $X$ with outcomes 0 and 1. If $p$ denotes the probability $\max_{i \in \{0,1\}} \Pr(X = i)$, then the min-entropy is computed as follows:
$$H_\infty(X) = -\log_2 p.$$
• Given a set $S = \bigcup_k [a_k, b_k)$ that is a union of mutually disjoint real intervals, an integral of function $f(x)$ over $S$ is the sum of integrals of $f(x)$ over all intervals:
$$\int_S f(x)\,dx = \sum_k \int_{a_k}^{b_k} f(x)\,dx.$$
• Given a set of discrete distributions $F(\Theta)$ defined over the same domain and parameterized by $\Theta$, which is distributed according to the distribution $G$ (referred to as the mixing distribution), the compound distribution $J$ of $F(\Theta)$ and $G$ is given by integrating the parameter $\Theta$ against the probability density $\rho_G$ of the mixing distribution:
$$\Pr(J = i) = \int \Pr(F(\theta) = i)\,\rho_G(\theta)\,d\theta,$$
where the integration is done across the whole domain of $G$ and $i$ denotes an arbitrary element from the domain of $F(\Theta)$.
• The probability distribution of the sum of two independent random variables is equal to the convolution of their distributions. If $f(x)$ and $g(x)$ are the probability distributions of two independent random variables, their convolution is:
$$(f * g)(x) = \int_{-\infty}^{\infty} f(\tau)\,g(x - \tau)\,d\tau.$$
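The two entropy measures defined above are used throughout the rest of the paper, so a small numerical sketch may help. The function names `shannon_entropy` and `min_entropy` are ours and are not part of the original text; both take $p = \Pr(X = 1)$ and return a value in bits.

```python
import math


def shannon_entropy(p: float) -> float:
    """Shannon entropy H_1 (in bits) of a binary variable with Pr(X = 1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a constant output carries no entropy
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def min_entropy(p: float) -> float:
    """Min-entropy H_inf (in bits): -log2 of the probability of the likelier outcome."""
    return -math.log2(max(p, 1.0 - p))


# A bias of 3.2% (p = 0.532) sits right at the 0.997 Shannon-entropy threshold.
h1_at_threshold = shannon_entropy(0.532)
```

The min-entropy is never larger than the Shannon entropy for the same $p$, which is why min-entropy claims are the more conservative of the two.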

ES-TRNG Architecture
In this section, we present the architecture of ES-TRNG and the design rationale with respect to conventional design criteria. These design criteria include a compact implementation, a reasonably high throughput, feasibility on various implementation platforms and a low engineering effort. Our ES-TRNG architecture is based on high-precision edge sampling. The randomness source of the ES-TRNG is the timing phase jitter from a free-running ring oscillator. Two novel techniques are used to improve the throughput and reduce the resource consumption. The first technique is called variable-precision phase encoding. By using a selective high-precision sampling process, this technique enables a compact implementation and a short jitter accumulation time. The second technique is repetitive sampling, which allows multiple samples to be taken within a single system clock cycle. Due to the repetitive sampling, ES-TRNG can obtain a higher throughput. By using a fully digital architecture and not relying on any technology-specific components, we obtain a design that is feasible on a wide range of implementation platforms.
The architecture of the digital noise source is shown in the left part of Figure 3. The entropy source, denoted by RO1, is implemented as a free-running ring oscillator with an enable signal. The average period of RO1 is denoted by $T_{01}$. The output signal of RO1 propagates through the digitization module.
The digitization module consists of three main components, namely a tapped delay chain, a sampling free-running ring oscillator RO2 and a bit extractor. The average period of RO2 is denoted by $T_{02}$. The tapped delay chain consists of four cascaded delay elements. The first and last delay elements are used as isolation buffers to provide similar input and output loads for the two delay elements in the middle. These middle elements form a two-stage delay chain. The 0 → 1 delay and the 1 → 0 delay of the first stage in the tapped delay chain are denoted by $t_{r,1}$ and $t_{f,1}$ respectively. Similarly, the notations $t_{r,2}$ and $t_{f,2}$ are used for the rising delay and the falling delay of the second stage. The output of this two-stage tapped delay chain is sampled using three flip-flops (FFs) at the rising edge of the output signal of RO2.
As depicted in the right part of Figure 3, the sampled three-bit signal Stage[2:0] is processed by the bit extractor. According to the truth table in Figure 3, the bit extractor encodes the three-bit input Stage[2:0] into a single data bit called Raw bit and a strobe signal Valid. In the notation used in this truth table, N stands for a non-valid value, which is not used to generate output bits.
The lower part of Figure 3 provides the architecture of the bit extractor. This module has a compact implementation using a single LUT and two FFs on modern FPGAs. The simplicity of the bit extractor also results in a low latency, which enables correct operation at high clock frequencies.

Platform and design parameters
Table 1 summarizes the relevant parameters. Platform parameters are those parameters that are specific to an implementation platform and that cannot be changed by the designer. Their values have to be obtained experimentally. Parameter $\sigma_m^2/t_m$ reflects the strength of the white noise in the timing jitter. Parameters $t_{r,1}$, $t_{r,2}$, $t_{f,1}$ and $t_{f,2}$ are properties of hardware primitives on the FPGA fabric. Design parameters of the proposed TRNG are the periods $T_{01}$ and $T_{02}$ of RO1 and RO2, the system clock frequency, and the jitter accumulation time.

Two novel techniques
In order to achieve a compact implementation with high throughput, the proposed design utilizes two novel techniques, namely variable-precision phase encoding and repetitive sampling. The central idea is to repeat the sampling and ignore all samples from the low-precision region (the region where values 000 and 111 are sampled in the tapped delay chain).

Table 1: Platform and design parameters.

| Type | Parameter | Description |
| --- | --- | --- |
| Platform | $t_{r,1}$ | The 0 → 1 delay of the 1st stage in the tapped delay chain |
| Platform | $t_{r,2}$ | The 0 → 1 delay of the 2nd stage in the tapped delay chain |
| Platform | $t_{f,1}$ | The 1 → 0 delay of the 1st stage in the tapped delay chain |
| Platform | $t_{f,2}$ | The 1 → 0 delay of the 2nd stage in the tapped delay chain |
| Platform | $D$ | The duty cycle of the free-running RO1 |
| Platform | $\sigma_m^2$ | The variance of the white-noise jitter accumulated during the measurement time $t_m$ |
| Design | $T_{01}$ | The average free-running RO1 period |
| Design | $T_{02}$ | The period of the sampling clock RO2 |
| Design | $T_{CLK}$ | The period of the system clock |
| Design | $t_A$ | Jitter accumulation time |

Variable-precision phase encoding
The variable-precision phase encoding technique is shown in Figure 4. This technique is enabled by using both the tapped delay chain and the bit extractor. The bottom part of the figure shows a single period of RO1. Symbol $\xi(t)$ denotes the time normalized to $T_{01}$. The phase of the oscillator RO1, given by $\xi(t) - \lfloor \xi(t) \rfloor$, changes periodically, increasing from 0 to 1 within a single period. The position of the captured edge is encoded into a raw bit. Since the delays $t_{r/f,1}$ and $t_{r/f,2}$ of the tapped delay chain are much smaller than the oscillation period $T_{01}$, the digitization module captures the oscillator phase with high precision around signal edges, when the phase value is around 0 or $D$. This region is called the high-precision region. The samples from this region are encoded by either 0 or 1 depending on the phase. In the remainder of the cycle, the edge is captured with low precision, i.e. only the correct half-period can be determined from the captured data. This region is called the low-precision region. The samples from this region are not used (encoded as N).
The sampling of the delay chain is triggered by the rising edge of the signal RO2_out. Due to the accumulated timing jitter, the relative sampling position follows a Gaussian distribution. An increased accumulation time $t_A$ leads to a wider jitter distribution. The expected value of the sampling time is determined by $t_A$, the number of ignored low-precision samples and the initial phase offset between RO1_out and RO2_out. This expected value (the center of the Gaussian distribution) can appear at any position, as indicated at the top of the figure. We note that using a delay line with more than two elements would extend the region that is sampled with high precision, at the cost of increased area and energy consumption. A design using precise sampling for all phase values is presented in [RYDV15].

Repetitive sampling
The second technique used in the proposed design is called repetitive sampling. The proposed digitization module has a short critical path, which enables the digital noise source to operate at a higher frequency than other components. Repetitive sampling is synchronized to the high-frequency signal RO2_out, aiming to reduce the time needed to hit the high-precision region, thereby improving the throughput. Once the high-precision region is hit, a Valid signal is generated.
The timing diagram of the digital noise source is depicted in Figure 5 to illustrate the repetitive sampling. The entropy source RO1 is reset before generating a bit and then enabled for a period of time $t_A$. The parameter $t_A$ is chosen to allow enough jitter to accumulate at the entropy source. After time $t_A$ has passed, the sampling oscillator RO2 is enabled. The RO1_out signal edge propagating through the tapped delay chain is sampled using RO2_out. This sampling can be repeated multiple times within a single system clock cycle, because the frequency of a free-running RO2 can be higher than the system clock frequency available on FPGAs. Once the tapped delay chain stages Stage[2:0] reach the high-precision region, the Valid signal is set, the raw bit is encoded, and RO1 and RO2 are reset.

Security Analysis
The formal security analysis provides the theoretical guarantees for the quality of the generated bits. These guarantees are achieved by developing an entropy estimator based on the stochastic model of the TRNG. The conservative entropy estimator provides a lower bound on the entropy based on the design parameters and the platform parameters. This estimator is used early in the design stage to guide the choice of the design parameters.

Assumptions
The assumptions about the physical processes in the entropy source are the starting point of the security analysis.
In this design the entropy is extracted from the timing jitter of a free-running ring oscillator. Our first assumption is that some amount of white (Gaussian) noise is present in the entropy source. The property of the white noise is that its observations are independent of each other and independent of any other noise source in the system. According to the central limit theorem, the variance of this noise increases linearly with the jitter accumulation time. The platform parameter $\sigma_m^2/t_m$ is obtained experimentally. Our second assumption is that the value of this parameter is not overestimated. This means that a conservative estimation of this parameter's value must be made during the measurement procedure.
We further assume that other noise sources are present in the entropy source. These include flicker noise (low-frequency noise), telegraph noise (popcorn noise) and the global noise from the power supply. None of these sources are exploited in this design because either their observations are correlated, the noise parameters are not measured, or the noise source can be manipulated by the attacker. In order to provide a conservative estimation of entropy, the impact of these noise sources will be treated as deterministic and known to the attacker.
In addition, we assume that the sampled raw bits are independent because the circuit is reset before generating each bit [BL05]. We note that this does not imply that the raw bits are identically distributed.
Finally, we assume that the frequencies of the two free-running ring oscillators and the delays of the elements in the tapped delay line are measured with sufficient precision, such that the measurement error does not significantly affect the entropy estimation.

Entropy Source.
We will analyze the entropy source using the phase model of the ring oscillator with duty cycle D (in practice D ≈ 0.5).The phase ξ ∈ R of the RO increases over time; integer phase values correspond to the rising edges of the output signal and the values i + D, i = 0, 1, ... correspond to the falling edges.The entropy source is reset before generating a new bit, so we assume that the oscillations always start from ξ(0) = 0. Phase ξ is affected by both deterministic and white noise sources.
Since the average value of the white noise is zero, the expected value of the phase after time $t$ is
$$E(\xi(t)) = \frac{t}{T_{01}} + E(\xi_{Deterministic}(t)).$$
The contribution of the deterministic noise sources $E(\xi_{Deterministic}(t))$ cannot be estimated with reasonable precision. Therefore, for entropy estimation we will always consider the worst-case value of this term.
The standard deviation of the phase is at least as large as the standard deviation of the white noise:
$$\sigma(\xi(t)) \ge \frac{1}{T_{01}} \sqrt{\frac{\sigma_m^2}{t_m}\, t}. \quad (11)$$
This bound on the standard deviation can be computed for any value of $t$, given the physical and design parameters.
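As a rough numerical illustration of this bound, the sketch below evaluates the normalized phase standard deviation contributed by white noise alone. It relies only on the linear growth of the white-noise variance stated in the assumptions, so the accumulated variance after time $t$ is $(\sigma_m^2/t_m) \cdot t$. The function name `white_noise_phase_sigma` is ours, and the numbers in the usage lines are the Spartan-6 values reported in the experimental sections of the paper.

```python
import math


def white_noise_phase_sigma(jitter_strength_ps: float, t_ps: float, t01_ps: float) -> float:
    """Lower bound on the std of the normalized phase after t_ps picoseconds.

    jitter_strength_ps is sigma_m^2 / t_m (ps^2 per ps), t01_ps is the RO1 period.
    """
    return math.sqrt(jitter_strength_ps * t_ps) / t01_ps


# Spartan-6 values quoted in the paper (all in ps).
sigma_ta = white_noise_phase_sigma(0.0029, 250_000.0, 2171.8)  # after t_A = 250 ns
sigma_t02 = white_noise_phase_sigma(0.0029, 2739.8, 2171.8)    # after one RO2 period
```

Under these values the phase uncertainty after 250 ns is only about 1.2% of an RO1 period, which illustrates why sampling the whole period at low precision would yield very little entropy.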

Digitization.
The ring oscillator is sampled repeatedly at moments $t_A$, $t_A + T_{02}$, $t_A + 2T_{02}$, ... until a valid sample is detected. We denote the phase at these moments with $X_0, X_1, \ldots$:
$$X_i = \xi(t_A + i \cdot T_{02}). \quad (12)$$
Here we introduce an approximation that will be used for estimating probabilities. Approximation 1:
$$E(X_i) = E(X_{i-1}) + \Delta\xi,$$
where $\Delta\xi = T_{02}/T_{01}$. This assumption is justified by the fact that, for sufficiently high frequencies, the impact of the correlated noise is negligible compared to other randomness-generating processes. This assumption, while not always explicitly stated, is used in state-of-the-art stochastic models of TRNG designs [FL14, HFBN15].
For the security analysis of the digitization, we define the normalized platform parameters: $d_{r,1}$, $d_{f,1}$, $d_{r,2}$, $d_{f,2}$ denote the normalized delays of the tapped delay chain ($d_{r,l} = t_{r,l}/T_{01}$ and $d_{f,l} = t_{f,l}/T_{01}$ for $l \in \{1, 2\}$); $D$ denotes the duty cycle of the entropy source.
In order to simplify the following analysis, we denote the standard deviation of the phase at time $t_A$ as $\sigma_{t_A} = \sigma(X_0)$. We use $\sigma_{T_{02}} = \sigma(X_i - X_{i-1})$ to denote the standard deviation of the superimposed phase, which is the result of the white noise accumulated in $T_{02}$. We note that the white-noise contributions accumulated between consecutive samples $X_{i-1}$ and $X_i$ are independent of each other for different $i$, and also independent of the white noise accumulated during the accumulation time $t_A$.
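The sampling circuit maps the phase value into the output bit. We introduce the mapping $s: \mathbb{R} \to \{0, 1, N\}$, where the symbol $N$ denotes that a non-valid value is detected. Function $s(x)$ can be formally defined as:
$$s(x) = \begin{cases} 1, & x \in S_1 \\ 0, & x \in S_0 \\ N, & x \in S_N \end{cases}$$
where $S_1$, $S_0$ and $S_N$ are unions of mutually disjoint real intervals, defined in terms of the normalized delays and the duty cycle. We stress that the sets $S_1$, $S_0$ and $S_N$ are mutually disjoint and that:
$$S_1 \cup S_0 \cup S_N = \mathbb{R}.$$
For clarity reasons, we introduce the function $g: \mathbb{R} \to \{0, 1\}$, the indicator function of the non-valid region:
$$g(x) = \begin{cases} 1, & x \in S_N \\ 0, & \text{otherwise.} \end{cases}$$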
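We note that $g(x)$ is a periodic function with the following property:
$$g(x + i) = g(x), \quad i \in \mathbb{Z}.$$
For any function $f: \mathbb{R} \to \mathbb{R}$ and any $A \subset \mathbb{R}$:
$$\int_A f(x)\,g(x)\,dx = \int_{A \cap S_N} f(x)\,dx.$$
A special case of this property is:
$$\int_{-\infty}^{\infty} f(x)\,g(x)\,dx = \int_{S_N} f(x)\,dx. \quad (21)$$
In line with these definitions and properties, we start the analysis of the digitization by examining the first sample taken at time $t_A$.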
The phase at the first sampling moment is a normally distributed random variable $X_0$. The probability density function of this variable is:
$$\rho_{X_0}(x) = \frac{1}{\sigma_{t_A}\sqrt{2\pi}} \exp\left(-\frac{(x - E(X_0))^2}{2\sigma_{t_A}^2}\right).$$
Let $Y_i$ denote the sampled values, $Y_i = s(X_i)$. For convenience, we use the notation $\mu_i = E(X_i) - \lfloor E(X_i) \rfloor$. We can compute the probabilities of $Y_0$ as:
$$\Pr(Y_0 = 1) = \int_{S_1} \rho_{X_0}(x)\,dx, \quad (23)$$
$$\Pr(Y_0 = 0) = \int_{S_0} \rho_{X_0}(x)\,dx, \quad (24)$$
$$\Pr(Y_0 = N) = \int_{S_N} \rho_{X_0}(x)\,dx. \quad (25)$$
To simplify the notation, we let $EV_i$ denote the event $Y_0 = N, \cdots, Y_i = N$. For example, $EV_0$ denotes the event $Y_0 = N$. The second sample is only taken when $EV_0$ occurs. The $i$-th sample is taken when $EV_{i-1}$ occurs. From Equations (21) and (25), it follows that:
$$\Pr(EV_0) = \int_{-\infty}^{\infty} \rho_{X_0}(x)\,g(x)\,dx. \quad (26)$$
For events $EV_i$, the following property holds:
$$\Pr(EV_{i-1}) = \Pr(EV_{i-1}, Y_i = 0) + \Pr(EV_{i-1}, Y_i = 1) + \Pr(EV_i). \quad (27)$$
An intuitive example of the repetitive sampling is shown in Figure 6. The probability density function $\rho_{X_0}(x)$ of the random variable $X_0 = \xi(t_A)$ is located in the left part of the figure. This distribution covers three types of regions under the curve $\rho_{X_0}(x)$: the white regions, the black regions and the shaded regions, which correspond to segments of $\rho_{X_0}(x)$ over the sets $S_0$, $S_1$ and $S_N$ respectively. The area of the white regions is equal to $\Pr(Y_0 = 0)$, while the areas of the black regions and the shaded regions are equal to $\Pr(Y_0 = 1)$ and $\Pr(Y_0 = N)$ respectively. These areas are described by Equations (23), (24) and (25). If the realization of $X_0$ is located in the shaded regions, a second sample is needed.
The middle distribution corresponds to the second sample. The middle distribution also covers three types of regions. The areas of the white regions, the black regions and the shaded regions are equal to $\Pr(EV_0, Y_1 = 0)$, $\Pr(EV_0, Y_1 = 1)$ and $\Pr(EV_1)$ respectively.
If the event $EV_{i-1}$ occurs, the distribution of the $i$-th sample, shown in the right part of the figure, can be derived from the distribution of the $(i-1)$-th sample.
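The first-sample probabilities are integrals of a Gaussian density over a periodic union of intervals, which can be evaluated as a sum of normal CDF differences. The sketch below shows this for the probability that the first sample is valid. It assumes, as an illustration only, that the high-precision region consists of a window of width $d_{r,1} + d_{r,2}$ starting at each rising edge and a window of width $d_{f,1} + d_{f,2}$ starting at each falling edge; the split of these windows into $S_0$ and $S_1$ depends on the truth table in Figure 3 and is not modelled here. The names `normal_cdf`, `prob_in_periodic_set` and `valid_windows` are hypothetical.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_in_periodic_set(mu, sigma, intervals, span=8):
    """Pr(X in S) for X ~ N(mu, sigma^2), where S repeats `intervals` with period 1.

    `intervals` lists (lo, hi) pairs inside one period; `span` is the number of
    neighbouring periods summed on each side of mu (ample when sigma << 1).
    """
    base = math.floor(mu)
    total = 0.0
    for k in range(base - span, base + span + 1):
        for lo, hi in intervals:
            total += normal_cdf(hi + k, mu, sigma) - normal_cdf(lo + k, mu, sigma)
    return total


# Assumed high-precision windows, built from the Spartan-6 delays quoted in the paper.
T01 = 2171.8   # ps
D = 0.43       # duty cycle
valid_windows = [(0.0, (22.25 + 24.12) / T01), (D, D + (35.93 + 40.90) / T01)]

p_valid_first = prob_in_periodic_set(0.0, 0.0124, valid_windows)  # mu_0 = 0
p_nonvalid_first = 1.0 - p_valid_first                            # Pr(Y_0 = N)
```

With the expected phase sitting exactly on a rising edge, a bit under half of the first samples land in the assumed high-precision window; for other values of $\mu_0$ the first sample is almost always non-valid, which is what makes repeated sampling necessary.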
Under Approximation 1, Equation (12) can be rewritten as:
$$X_i = X_{i-1} + \Delta\xi + Z_i,$$
where $Z_i$ denotes a series of independent, zero-mean Gaussian random variables defined as:
$$Z_i \sim \mathcal{N}(0, \sigma_{T_{02}}^2).$$
Informally speaking, $Z_i$ is the superimposed component of $\xi(t_A + i \cdot T_{02})$ caused by the white noise accumulated during the time interval $[t_A + (i-1) \cdot T_{02},\ t_A + i \cdot T_{02})$. The goal of the entropy estimation is to compute the binary probability of the raw bits. The computation of this binary probability requires $\Pr(EV_{i-1}, Y_i = 1)$. Starting from $\Pr(Y_0 = 1)$ given by Equation (23), we first compute $\rho_{X_1|EV_0}(x)$ as:
$$\rho_{X_1|EV_0}(x) = \frac{1}{\Pr(EV_0)} \int_{-\infty}^{\infty} \rho_{X_0}(\tau)\,g(\tau)\,\rho_{Z_1}(x - \tau - \Delta\xi)\,d\tau. \quad (30)$$
The formal derivation of Equation (30) can be found in Appendix A. An informal but intuitive interpretation of Equation (30) is shown in Figure 7. The left part of the figure is the probability density function $\rho_{X_0}(x)$. The black, white and shaded areas under the curve correspond to $\Pr(Y_0 = 1)$, $\Pr(Y_0 = 0)$ and $\Pr(EV_0)$ respectively. The second sample is taken only when the event $EV_0$ occurs. The shaded region is shifted to the right by $T_{02}/T_{01}$, which corresponds to the dashed curve. The probability density function $\rho_{Z_1}(x)$ is shifted to positions with probabilities according to the dashed curve. The weighted and shifted results are added together. The result is shown at the bottom right of Figure 7. Now the conditional probability density function $\rho_{X_i|EV_{i-1}}(x)$, $i > 1$, can be derived iteratively as:
$$\rho_{X_i|EV_{i-1}}(x) = \frac{\Pr(EV_{i-2})}{\Pr(EV_{i-1})} \int_{-\infty}^{\infty} \rho_{X_{i-1}|EV_{i-2}}(\tau)\,g(\tau)\,\rho_{Z_i}(x - \tau - \Delta\xi)\,d\tau. \quad (31)$$
The formal derivation of Equation (31) can be found in Appendix A.
We can compute probabilities $\Pr(EV_i)$ as:
$$\Pr(EV_i) = \Pr(EV_{i-1}) \int_{-\infty}^{\infty} \rho_{X_i|EV_{i-1}}(x)\,g(x)\,dx. \quad (32)$$
Starting with Equations (26) and (30), we can apply Equations (31) and (32) iteratively to compute probabilities $\Pr(EV_i)$.
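The iterative computation of $\Pr(EV_i)$ can be cross-checked by simulation. The sketch below is a Monte Carlo stand-in, not the analytical procedure of the paper: it draws the first phase from a Gaussian with mean $\mu_0$ and standard deviation $\sigma_{t_A}$, then advances it by $\Delta\xi$ plus an independent Gaussian step of standard deviation $\sigma_{T_{02}}$ until the fractional phase lands in a valid window. The windows are passed in by the caller, and the function names are hypothetical.

```python
import math
import random


def first_valid_index(mu0, sigma_ta, sigma_t02, delta_xi, windows, rng, max_samples=10_000):
    """Index i of the first sample whose fractional phase lies in a valid window.

    Returns None if no valid sample is seen within max_samples.
    """
    phase = rng.gauss(mu0, sigma_ta)                   # X_0
    for i in range(max_samples):
        frac = phase - math.floor(phase)
        if any(lo <= frac < hi for lo, hi in windows):
            return i
        phase += delta_xi + rng.gauss(0.0, sigma_t02)  # next sample: previous + step + noise
    return None


def estimate_survival(mu0, sigma_ta, sigma_t02, delta_xi, windows, depth, trials=20_000, seed=1):
    """Monte Carlo estimates of [Pr(EV_0), ..., Pr(EV_{depth-1})]."""
    rng = random.Random(seed)
    still_invalid = [0] * depth
    for _ in range(trials):
        j = first_valid_index(mu0, sigma_ta, sigma_t02, delta_xi, windows, rng)
        last = depth if j is None else min(j, depth)
        for i in range(last):                          # EV_i holds for every i < j
            still_invalid[i] += 1
    return [count / trials for count in still_invalid]
```

The estimated sequence is non-increasing in $i$ by construction, matching the fact that $EV_i$ implies $EV_{i-1}$.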

Binary probabilities
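Now we can compute the following probabilities:
$$\Pr(EV_{i-1}, Y_i = 1) = \Pr(EV_{i-1}) \int_{S_1} \rho_{X_i|EV_{i-1}}(x)\,dx. \quad (33)$$
The probability $\Pr(EV_{i-1}, Y_i = 0)$ can be computed using Equations (27), (33) and (34). The value of the raw bit $b$ is determined by a sequence $(Y_0 Y_1 \cdots Y_j)$, where $j$ is the smallest integer such that $Y_j \neq N$. Therefore, the binary probability of a raw bit can be computed as:
$$\Pr(b = 1) = \Pr(Y_0 = 1) + \sum_{i=1}^{\infty} \Pr(EV_{i-1}, Y_i = 1). \quad (35)$$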

Entropy claim
Binary probabilities are computed starting from the platform parameters using Equations (11), (19), (22), (26), (30), (31), (33) and (35). The only unknown parameter is $\mu_0$, which is equal to the phase of the oscillator at the moment $t_A$. The value of $\mu_0$ depends on the global noise, low-frequency noises and the operating conditions. For this reason, it cannot be predicted with reasonable precision at design time. The conservative entropy claim is made by examining the effects of $\mu_0$ on binary probabilities and using the value that results in the lowest entropy. This procedure is shown in the example in Section 6.
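The worst-case search over $\mu_0$ can be sketched as a simple grid scan. The code below assumes that a function mapping $\mu_0$ to the binary probability $\Pr(b = 1)$ is already available from the stochastic model; here it is replaced by a toy placeholder (`toy_prob_one`) that has nothing to do with the actual ES-TRNG model. All names are hypothetical.

```python
import math
from typing import Callable, Tuple


def binary_entropies(p_one: float) -> Tuple[float, float]:
    """Return (Shannon entropy, min-entropy) in bits for Pr(b = 1) = p_one."""
    p_max = max(p_one, 1.0 - p_one)
    if p_max >= 1.0:
        return 0.0, 0.0
    h_shannon = -p_max * math.log2(p_max) - (1.0 - p_max) * math.log2(1.0 - p_max)
    return h_shannon, -math.log2(p_max)


def conservative_entropy(prob_one: Callable[[float], float], steps: int = 1000):
    """Scan mu_0 over [0, 1) and return (worst mu_0, Shannon entropy, min-entropy).

    Both entropies decrease as |p - 0.5| grows, so the worst case is the
    grid point with the largest bias.
    """
    worst_mu0 = max((k / steps for k in range(steps)),
                    key=lambda mu0: abs(prob_one(mu0) - 0.5))
    h_shannon, h_min = binary_entropies(prob_one(worst_mu0))
    return worst_mu0, h_shannon, h_min


def toy_prob_one(mu0: float) -> float:
    """Placeholder for the model output; NOT derived from the ES-TRNG equations."""
    return 0.6 + 0.1 * math.cos(2.0 * math.pi * mu0)


worst_mu0, worst_h1, worst_hmin = conservative_entropy(toy_prob_one)
```

A finite grid can miss a narrow worst-case peak, so in practice the step count would need to be chosen fine enough relative to how quickly the binary probability varies with $\mu_0$.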

Experimental Validation of the Stochastic Model
We use an experimental approach to validate the proposed stochastic model, i.e. to check whether the behavior of the physical TRNG matches the behavior predicted by the model. For this purpose, we implement the proposed design on a Xilinx Spartan-6 FPGA. In this experiment we monitor the number $N_T$ of toggles of RO2 during the sampling phase before the raw bit is generated.
Before applying the stochastic model, we have to specify the design parameters and measure the platform parameters. Design parameters $T_{01}$ and $T_{02}$ are chosen indirectly by selecting the number of delay elements in RO1 and RO2. We implement RO1 and RO2 using a single look-up table and three look-up tables respectively. The periods of RO1 and RO2 are then measured using two individual ripple counters; the measurement results are $T_{01} = 2171.8$ ps and $T_{02} = 2739.8$ ps. In regard to other design parameters, the period of the system clock is chosen to be $T_{CLK} = 10$ ns and the jitter accumulates for 9 system clock cycles.
We follow the methodologies proposed in [YRG+17] and [RYDV15] to measure the platform parameters. The obtained propagation delays are $t_{r,1} = 22.25$ ps, $t_{r,2} = 24.12$ ps, $t_{f,1} = 35.93$ ps and $t_{f,2} = 40.90$ ps. The measured duty cycle $D$ is 0.43 and the white-noise jitter strength is $\sigma_m^2/t_m = 0.0029$ ps. The distribution of the number of toggles for a specific value of $\mu_0$, given by Equation (36), is easily computed by applying Equations (33) and (34).
We note that, due to a very low jitter strength, the standard deviation of the jitter accumulated during the sampling phase is much smaller than the period of RO1, i.e. $\sigma_{T_{02}} \ll 1$. A consequence of this low jitter is that the probability $\Pr(Y_i = N)$ is very low in cases when $\mu_i$ is close to the high-precision region ($\mu_i \approx 0$, $\mu_i \approx D$ and $\mu_i \approx 1$). Conversely, this probability is very high when $\mu_i$ is far away from the high-precision phase region (i.e. $\mu_i \approx D/2$ and $\mu_i \approx (1 + D)/2$). We check how this affects the distribution of $N_T$ for the selected oscillator periods. Let us first observe that $T_{02}/T_{01} \approx 5/4$, so the expected phase advances by about a quarter of a period between consecutive samples and by about half a period every two samples. Since the high-precision regions are close to the beginning of the cycle and the middle of the cycle, samples taken two steps apart lie at similar distances from a high-precision region. The result is that the $N_T$ distribution for odd values is very different from the distribution for the even $N_T$ values. This effect is observed in the model for any value of $\mu_0$ as well as in the experimental data shown in Figure 9a. For clarity, this distribution is presented using two graphs, one for the odd $N_T$ values and one for the even values.
In order to model the distribution of $N_T$ we have to make assumptions about $\mu_0$. In the used experimental setting, it is reasonable to assume that $\mu_0$ follows a Gaussian distribution. We note that we do not make this assumption for computing the entropy, but rather use the worst-case value. Therefore, the $N_T$ distribution is computed as the compound distribution of $\mathcal{N}(\mu, \sigma^2)$ and the distribution given by Equation (36). For $\mathcal{N}(0.384, 0.09^2)$, the model reasonably approximates the experimental data, as shown in Figure 9b. We note that $\mathcal{N}(0.384, 0.09^2)$ is unrelated to the platform parameters and is not used for entropy estimation. It is only used to validate the stochastic model.

Xilinx FPGA implementation
In this section, we present a Xilinx Spartan-6 FPGA implementation of ES-TRNG. In addition to look-up tables (LUTs) and sequential elements (flip-flops and latches), this FPGA has high-speed carry chain primitives called CARRY4. These primitives can be configured to work as a tapped delay line.
Figure 10 shows the implementation of the core of the proposed digital noise source. RO1 is implemented as a high-speed oscillator using a single LUT, while RO2 is implemented using 3 LUTs. A single CARRY4 element is used to implement the tapped delay chain.
The entire core of the TRNG shown in Figure 10 is implemented using only one CARRY4 element, 10 LUTs and 5 flip-flops. In addition to this core, the design contains a parity filter for post-processing and a control circuit for setting the enable and reset signals. A system clock of 100 MHz is used in the design.

Application of the model
In this section, we apply the stochastic model to derive optimal design parameters. The platform parameters measurement is described in Section 5. The stochastic model enables us to calculate the Shannon entropy and min-entropy for any value of $\mu_0$ for a specific accumulation time. The conservative entropy estimation is made by using the global minimum. This procedure is illustrated in Figure 11. To confirm that our entropy estimation is indeed conservative, we applied the entropy assessment procedure from [TBK+18] to the collected raw random numbers. The results are summarized in Table 2. The results reported by the NIST entropy assessment Python package are always higher than the min-entropy estimation derived from the proposed stochastic model. This result is expected because the stochastic model estimation is based on the worst-case scenario.
According to the standard [KS11], a minimal Shannon entropy level of 0.997 per bit is expected in the internal random numbers. This goal is achieved when the bias $\varepsilon_{internal}$ of the internal random numbers is upper-bounded by 3.2%. This Shannon entropy level cannot be achieved by accumulating the jitter for less than 1 µs, which leads to a throughput smaller than 1 Mbps. However, a higher throughput can be obtained by using a shorter accumulation time followed by a simple algorithmic post-processing. A parity filter of order $n_f$, which combines $n_f$ consecutive input bits into one output bit using an XOR function, reduces the bias $\varepsilon_{internal}$ to:
$$\varepsilon_{internal} = 2^{n_f - 1}\,\varepsilon_{raw}^{n_f},$$
where $\varepsilon_{raw}$ denotes the bias of raw random numbers.
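The parity filter and its effect on the bias can be sketched in a few lines. The code below assumes independent raw bits with a common bias $\varepsilon_{raw} = |\Pr(b = 1) - 0.5|$, which is the setting in which the standard bias formula for XOR-ed bits holds; the function names and the default target of 0.032 (the 3.2% bound quoted above) are our choices for illustration.

```python
from functools import reduce
from operator import xor


def parity_filter(bits, n_f):
    """XOR each block of n_f consecutive bits into one output bit (incomplete tail dropped)."""
    return [reduce(xor, bits[i:i + n_f]) for i in range(0, len(bits) - n_f + 1, n_f)]


def filtered_bias(eps_raw: float, n_f: int) -> float:
    """Bias after an order-n_f parity filter, assuming independent input bits."""
    return 2 ** (n_f - 1) * eps_raw ** n_f


def min_filter_order(eps_raw: float, eps_target: float = 0.032, max_order: int = 64):
    """Smallest n_f whose output bias does not exceed eps_target, or None if none is found."""
    for n_f in range(1, max_order + 1):
        if filtered_bias(abs(eps_raw), n_f) <= eps_target:
            return n_f
    return None


# Worst-case raw bias implied by a min-entropy of 0.515 bits: p_max = 2 ** -0.515.
order_for_reported_point = min_filter_order(2 ** -0.515 - 0.5)
```

The last line uses the raw min-entropy of 0.515 reported for the Spartan-6 operating point; under the assumptions above it yields a filter order of 3, in line with the $n_f = 3$ stated in the paper. Each increment of $n_f$ divides the throughput by the same factor, which is the trade-off explored in the throughput optimization.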
To optimize the throughput of ES-TRNG, we computed the minimal $n_f$ required to achieve the Shannon entropy level of 0.997. Figure 12 shows the lower-bound estimates of $H_1$ and $H_\infty$ for $t_A$ ranging from 20 ns to 1 µs. For every $t_A$ within this range, we calculate the expected throughput of the internal random numbers after the required $n_f$-stage parity filter. The figure shows that the throughput increases monotonically for $t_A \le 110$ ns. Within this region, the required $n_f$ decreases rapidly because the estimated Shannon entropy increases. For $t_A > 110$ ns, the throughput plot consists of several discontinuous segments, each corresponding to a different required $n_f$. When $t_A > 1000$ ns, the throughput is always lower than 1 Mbps. The global maximum throughput of 1.15 Mbps is obtained at $t_A = 250$ ns, where $n_f = 3$. At this point, the $H_\infty$ of the raw random numbers estimated by the stochastic model is 0.515, which is more conservative than the value of 0.86 obtained with the assessment procedure of [TBK + 18]. The post-processed data achieves a throughput higher than 1 Mbps with a Shannon entropy level higher than 0.997. For verification, a 10 MB sequence of internal random numbers was generated and tested using the T0-T5 tests proposed in AIS-31. The sequence passes all the tests.
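As a rough consistency check of this operating point, the min-entropy estimate can be converted into a worst-case bias and passed through the parity filter. The calculation below is a sketch under two assumptions that are not spelled out in the text. The raw bits are treated as independent, so the XOR of $n_f$ bits has bias $2^{n_f-1}\varepsilon_{\text{raw}}^{n_f}$, and the bias is taken as $\varepsilon_{\text{raw}} = p_{\max} - 1/2$. The rounding is ours.

$$
\begin{aligned}
p_{\max} &= 2^{-H_\infty} = 2^{-0.515} \approx 0.700, \qquad \varepsilon_{\text{raw}} = p_{\max} - \tfrac{1}{2} \approx 0.200,\\
n_f = 2 &: \quad 2\,\varepsilon_{\text{raw}}^{2} \approx 0.080 > 0.032,\\
n_f = 3 &: \quad 4\,\varepsilon_{\text{raw}}^{3} \approx 0.0319 \le 0.032.
\end{aligned}
$$

Under these assumptions, $n_f = 3$ is the smallest order that meets the 3.2% bias bound, in line with the reported design point.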

Intel FPGA implementations
To show the portability of our design to different FPGA platforms, we have also implemented ES-TRNG on an Intel Cyclone V FPGA. The basic building blocks of this FPGA are adaptive logic modules (ALMs), each containing an 8-input flexible LUT, 2 fast adders and 2 flip-flops. The tapped delay line is implemented using the fast carry chains of the dedicated adders. Since 1 ALM already contains two carry stages, only 2 ALMs would be needed to implement a structure equivalent to the CARRY4 in Xilinx FPGAs. However, due to considerable differences between the propagation delays of the carry stages, we opted to use the output of every second carry stage. In this way, we obtained more balanced delays of the resulting delay chain stages at the price of increased resource utilization (4 ALMs instead of 2). To be able to accurately determine the RO1 frequency, we implemented RO1 with 2 LUTs, while for RO2 we used 3 LUTs. The oscillation periods of RO1 and RO2 were determined with ripple counters, and their values are $T_{01} = 1745.68$ ps and $T_{02} = 3020.068$ ps, respectively. The bit extractor module is implemented using a single ALM, thus achieving a compact implementation.
The measurement of the Intel Cyclone V platform parameters is performed in the same way as for Xilinx Spartan-6 in Section 5. The propagation delays are $t_{r,1} = 67.316$ ps, $t_{r,2} = 68.316$ ps, $t_{f,1} = 52.044$ ps and $t_{f,2} = 50.544$ ps. The white-noise jitter strength is $\sigma_m^2/t_m = 0.020$ ps, while the duty cycle $D$ of RO1 is 0.58.
After measuring the physical parameters, we determined the design parameters as previously described. The jitter accumulates for $t_A = 230$ ns, and the parity filter has to be of order $n_f = 3$ to obtain a Shannon entropy level of 0.997. The obtained throughput of ES-TRNG on the Intel Cyclone V FPGA is 1.067 Mbps. We observe that although the white-noise jitter strength on Intel Cyclone V is higher than on Xilinx Spartan-6, the propagation delays on Xilinx Spartan-6 are substantially lower, leading to similar throughputs on both FPGAs.
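To put the measured jitter strength in perspective, the following back-of-the-envelope estimate assumes that the white-noise jitter variance grows linearly with the accumulation time. The symbol $\sigma_{\text{acc}}$ is introduced here only for illustration and does not appear in the original text.

$$
\sigma_{\text{acc}}^2 = \frac{\sigma_m^2}{t_m}\, t_A = 0.020\ \text{ps} \times 230\,000\ \text{ps} = 4600\ \text{ps}^2,
\qquad
\sigma_{\text{acc}} = \sqrt{4600\ \text{ps}^2} \approx 67.8\ \text{ps}.
$$

Under this assumption, the jitter accumulated over $t_A = 230$ ns is comparable to the delay of a single stage of the tapped delay line (for example, $t_{r,1} = 67.316$ ps).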

Results and Comparison
Table 3 compares ES-TRNG with state-of-the-art TRNG designs for both Xilinx and Intel FPGAs. The second column of the table indicates the availability of a stochastic model for each TRNG design. The utilization of hardware resources is reported in the third column. Only the basic units of the FPGA are reported, such as LUTs, flip-flops and slices. Special dedicated primitives, like the PLL building blocks of the PLL-TRNG, are not included here. Among the Xilinx TRNG designs listed in the table, ES-TRNG achieves the smallest hardware footprint. Our ES-TRNG generates more than 1 Mbps of internal random numbers with an estimated minimal Shannon entropy level of 0.997. We consider two types of design effort for FPGA implementation: manual placement (MP) and manual routing (MR). MP is critical for some TRNG designs because the quality of those TRNGs is sensitive to the relative spatial location of their building blocks. For TRNG designs based on identical delays or balanced routing, MR cannot be avoided.
As can be seen in Table 3, the ES-TRNG has one of the smallest footprints among the TRNGs implemented on Cyclone V FPGAs. The only TRNG designs with comparable implementation footprints are the COSO-TRNG (the coherent sampling ring oscillator based TRNG) and the PLL-TRNG (the coherent sampling based TRNG using PLLs) [PMB + 16]. It is worth noting that although the COSO-TRNG has a higher throughput than ES-TRNG, it requires laborious manual placement and routing, which has to be performed for every target device. On the other hand, the PLL-TRNG does not require any manual placement or routing on Cyclone V FPGAs, but it occupies dedicated PLL modules and provides a 40% lower throughput than ES-TRNG.
Here we would like to point out a problem with the comparison of results in this research domain that has become relevant in recent years. Many TRNG designs were developed before the publication of the evaluation standards [TBK + 18] and [KS11]. Therefore, these old designs and some of the recent ones are not provided with the stochastic models that are required today. Without a proper security analysis, or by using debunked or simplified assumptions, it is possible to achieve a better hardware performance than if the AIS-31 criteria are strictly followed. For example, a designer may base the entropy estimation solely on the results of statistical tests, which tends to overestimate the entropy. If this overestimation is used to guide the choice of design parameters, more ambitious targets for throughput, area and energy are more easily achieved. This has the unfortunate effect that designs with a more rigorous security analysis appear worse in terms of hardware performance when compared to their predecessors.
In particular, in ring oscillator based designs, the jitter strength is often the critical parameter that significantly affects the performance of the final hardware implementation. Some recent works on jitter measurement on FPGAs [FL14, LB15, YRG + 17] showed that the strength of the Gaussian noise is lower by an order of magnitude than the values that were used in older TRNG designs. For comparison, we carried out the ES-TRNG design procedure assuming that the jitter strength is five times higher than measured (this is similar to the jitter strength reported in [SPV06]). The obtained design parameters were $t_A = 30$ ns and $n_f = 2$, which results in a throughput of 6.25 Mbps.
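The reported operating points can be compared on a per-raw-bit basis. The sketch below assumes only that a parity filter of order $n_f$ consumes $n_f$ raw bits per output bit, so the average raw-bit period is $1/(n_f \cdot \text{throughput})$. The function name and the dictionary labels are hypothetical. Reading the time beyond $t_A$ as the cost of repetitive sampling and control is our interpretation and is not stated in the text.

```python
def raw_bit_period_ns(throughput_mbps, parity_order):
    """Average time per raw bit (ns) implied by the post-processed throughput."""
    # 1 Mbps corresponds to 1000 ns per output bit; each output bit
    # consumes `parity_order` raw bits.
    return 1e3 / (throughput_mbps * parity_order)

# (throughput in Mbps, parity filter order n_f, accumulation time t_A in ns),
# values taken from the text above.
operating_points = {
    "Spartan-6, measured jitter": (1.15, 3, 250),
    "Cyclone V, measured jitter": (1.067, 3, 230),
    "5x jitter assumption": (6.25, 2, 30),
}

if __name__ == "__main__":
    for name, (mbps, n_f, t_a) in operating_points.items():
        period = raw_bit_period_ns(mbps, n_f)
        print(f"{name}: {period:.1f} ns per raw bit, "
              f"{period - t_a:.1f} ns beyond t_A")
```

The comparison illustrates how strongly the assumed jitter strength drives the result: the optimistic assumption shortens both the accumulation time and the filter order.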

Conclusion
In this work, we presented a novel true random number generator called ES-TRNG. Two techniques, namely variable-precision phase encoding and repetitive sampling, are used to increase the throughput and the entropy of the proposed generator and to reduce the hardware footprint. The digital noise source is implemented using only 10 look-up tables (LUTs) and 5 flip-flops (FFs) of a Xilinx Spartan-6 FPGA, and achieves a throughput of 1.15 Mbps with 0.997 bits of Shannon entropy. On Intel Cyclone V FPGAs, this implementation uses 10 LUTs and 6 FFs, and achieves a throughput of 1.07 Mbps. The proposed generator is backed by a security analysis based on the stochastic model of the entropy source.

Figure 3: The architecture and operation principle of the digital noise source.

Figure 6: The intuitive interpretation of the repetitive sampling.

Figure 7: The intuitive interpretation of the first sample and the second sample.
Probability distribution of toggles computed from the collected experimental data. Probability distribution of toggles derived from the stochastic model.

Figure 9: Toggles probability distributions predicted by the model and obtained by experiments. Each distribution is shown using two graphs, one for the even values (left) and one for the odd values (right).
Figure 11 shows the entropy estimations for three different values of accumulation time ($t_A = 5 \cdot T_{CLK}$, $10 \cdot T_{CLK}$ and $20 \cdot T_{CLK}$). The global minima are indicated by red dots.

Figure 10: FPGA implementation of the digital noise source.
commonly used statistical test suites are NIST SP 800-22 [RSN + 10], FIPS [KS11]

Figure 2: The generic architecture of a TRNG.

failures of the entropy source. They operate on the raw random numbers rather than the internal random numbers for faster response and more reliable attack detection. Online tests can be derived and implemented based on the stochastic model [KS11] or empirically [YRM + 16]. Total failure tests are implemented to detect the total breakdown of the entropy source. They are intended to check the working status of the entropy source and trigger an alarm immediately after the breakdown.

Table 2
Figure 11: The estimated $H_1$ and $H_\infty$ for different accumulation times.
Figure 12: The lower-bound estimates of $H_1$ and $H_\infty$, and the expected throughput after post-processing, for different accumulation times.

Table 3: The comparison with existing TRNG implementations.