Evaluation and monitoring of free running oscillators serving as source of randomness

Abstract. In this paper, we evaluate clock signals generated in ring oscillators and self-timed rings and the way their jitter can be transformed into random numbers. We show that counting the periods of the jittery clock signal produces random numbers of significantly better quality than the methods in which the jittery signal is simply sampled (the case in almost all current methods). Moreover, we use the counter values to characterize and continuously monitor the source of randomness. However, instead of using the widely used statistical variance, we propose to use Allan variance to do so. There are two main advantages: Allan variance is insensitive to low frequency noises such as flicker noise that are known to be autocorrelated, and significantly less circuitry is required for its computation than for the commonly used variance. We also show that it is essential to use a differential principle of randomness extraction from the jitter, based on the use of two identical oscillators, to avoid autocorrelations originating from external and internal global jitter sources, and that this is valid for both kinds of rings. Last but not least, we propose a method of statistical testing based on a high order Markov model to show the reduced dependencies when the proposed randomness extraction is applied.


Introduction
In modern cryptographic systems, security is based on the statistical quality and on the unpredictability of confidential keys. These keys are generated in random number generators (RNGs) using random physical phenomena that occur in the hardware devices in which the system is implemented. A widespread source of randomness in digital devices is the jitter of the clock signal generated inside the device using free running oscillators such as ring oscillators [1,2,3], or self-timed rings [4].
The statistical quality and unpredictability of the generated numbers depend on the size and quality (e.g. the spectrum) of the clock jitter. It is therefore good practice to continuously monitor this jitter using an embedded jitter measurement method. As required in the document AIS-20/31 published by the German Federal Office for Information Security (German acronym BSI) [5], the measured jitter parameters should then be used as input parameters in the stochastic model used to estimate entropy, which characterizes the unpredictability of generated numbers.
Generally, many sources of randomness contribute to the overall entropy rate at the output of the RNG based on free running oscillators [6]:
1. Secure sources: random sources such as thermal noise, which are considered to be the best sources of randomness because they have a large and almost uniform signal spectrum similar to white noise, are mutually independent, and are unavoidable (i.e. they cannot be manipulated by the attacker);
2. Security critical sources: random sources such as low frequency noises that feature some autocorrelation, which reduces the entropy rate at the generator output, while making entropy estimation very complex because of long term dependencies;
3. Dangerous sources: environmental, data dependent and correlated sources, which can be random or deterministic. Their contribution to random number generation must be avoided by the design, since they can be manipulated. If the manipulation cannot be avoided, it must at least be detectable through dedicated embedded tests.
In practice, most (and sometimes all) of these sources of randomness coexist. This would not be a big security issue if: 1) only the contribution of secure sources were taken into account when estimating the entropy rate; 2) the generated numbers were impossible to manipulate.
In [1], Sunar et al. use an urn stochastic model to estimate the entropy rate at the output of the generator using a huge number of ring oscillators, which the authors claimed were independent. However, the model does not account for possible dependencies between the outputs of the ring oscillators, which can even cause the rings to lock [7].
In [2], Baudet et al. propose a comprehensive stochastic model for an elementary oscillator based random number generator sampling the jittery clock signal. In their model, the entropy rate at the generator output is estimated from the variance of the random jitter component that originates from the thermal noise.
The output numbers generated by both generators may be biased depending on the duty cycle of the sampled signal(s). Although both generators use the clock signal generated in the rings as a source of randomness, only the model proposed by Baudet et al. estimates the entropy rate from the jitter component originating from the thermal noise and consequently avoids overestimating entropy.
Evaluating the contribution of thermal and low frequency noises to the generated randomness is no simple task. In [8], Haddad et al. computed the variance of the jitter for different accumulation times and then computed the jitter component originating from the thermal noise by curve fitting. This method has two disadvantages: 1) its precision depends to a great extent on the precision of the curve fitting algorithm; 2) it is not suitable for monitoring the jitter inside the device.
In [9], Fischer and Lubicz proposed a method of evaluation of the variance of the random jitter originating from the thermal noise that can be embedded in logic devices and hence used for online evaluation of the entropy rate at the output of the generator. However, depending on the initial phase of the two clock signals and the jitter accumulation time, the method can produce incorrect results. The error can be corrected by using different accumulation times, but it is not easy to make this correction automatic.
In [10], Killmann and Schindler used a pair of noisy diodes as a source of randomness and an operational amplifier, a Schmitt trigger and a counter of edges as a time-to-digital converter transforming the noise into the raw binary signal. Surprisingly, the time-to-digital conversion based on counters had not been previously studied in the context of the use of free running oscillators.
Our contributions: 1) We show that counting the periods of the jittery clock signal, representing a time-to-digital conversion, gives random numbers of significantly better quality than the methods based on sampling the jittery signals. What is more, the counter values can be used to characterize and to continuously monitor the source of randomness. 2) We propose to use Allan variance of counter values instead of the commonly used statistical variance to evaluate the jitter, since it is not sensitive to low frequency components of the jitter originating from low frequency noises, such as flicker noise, which are known to be autocorrelated. The proportion of thermal noise in the total jitter can thus be more easily measured inside the device with no error or overestimation. 3) We demonstrate that by using two identical rings instead of one ring and one quartz oscillator, the impact of not only external, but also of internal global jitter sources can be significantly reduced and render the generator much more robust. 4) We propose to use a statistical method based on a high order Markov model and show how efficient it is in detecting dependencies and correlations in low quality generators.
The paper is organized as follows: in Section 1, we provide the theoretical background and analyze state-of-the-art methods related to our approach. In Section 2, we describe the experimental setup and analyze the impact of the type of the oscillator on the commonly used statistical variance and on Allan variance. In Section 3, we present the results of implementation of variance computation circuitries in hardware and discuss the impact of the measurement circuitry and of the additional logic represented by an AES cipher on the source of randomness in Section 3.2. In Section 4, we discuss the main results. We present our conclusions in Section 5.

Theoretical background
In purely digital devices, which are currently used to implement cryptographic systems, analog noise signals such as thermal noise cannot be directly exploited. Instead, the designer can use the fact that electrical noises are transformed in free running oscillators into uncertainties in timings of generated digital clock signals, which can be observed as a jitter in the time domain and as a phase noise in the frequency domain [11].
In logic devices, the most frequently used free running oscillators are ring oscillators (ROs) and self-timed rings (STRs), because both are easy to implement using standard logic gates. ROs are usually composed of an odd number of inverters as shown in the top panel of Fig. 1 (a) or a NAND gate and a sufficient number of non-inverting buffers, as shown in the bottom panel of Fig. 1 (a). In ROs, which are also called single-event ring oscillators [12], only one event (the rising or falling edge of the clock signal) propagates at any given time in the ring. Its propagation time is impacted by noises that modify the slope of the rising and falling edges and the reference voltage of inverters (or buffers).
In STRs, also called multi-event oscillators without signal collision, several events can propagate over the ring at the same time. The STR is composed of L stages, each consisting of a Müller gate and an inverter (see Fig. 1 (b)) [12]. $F_i$ is the forward input of the i-th stage, $R_i$ is the reverse input of the same stage, and $C_i$ is the output of the stage. If the forward and reverse input values differ, the forward input value is written to the stage output. Otherwise, the previous output value is maintained.
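The stage rule described above can be written down directly. The following is a minimal behavioural sketch, not a timing-accurate model: the function name `str_stage` and the 0/1 integer encoding are assumptions made for illustration.

```python
def str_stage(forward: int, reverse: int, previous: int) -> int:
    """Return the next output C_i of one STR stage (behavioural model only)."""
    # Forward and reverse inputs differ: the forward value is copied to the output.
    if forward != reverse:
        return forward
    # Inputs are equal: the stage keeps its previous output value.
    return previous


# Print the complete truth table of a single stage.
for f in (0, 1):
    for r in (0, 1):
        for c in (0, 1):
            print(f"F={f} R={r} C_prev={c} -> C={str_stage(f, r, c)}")
```

The hold branch is what lets several events coexist in the ring: a stage only fires when its two neighbours disagree.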
Ring oscillators are simpler and hence less expensive than STRs, so many rings can be used to increase entropy [1]. STRs are more complex, but multiple outputs of the same ring can be used to increase entropy [13].
The randomness originating from electrical noises, which is transformed in the free running oscillators into a clock jitter, can be further transformed into random numbers obtained as a chain of 1-bit or n-bit random values by: 1) sampling the jittery clock signal(s) after a sufficiently long time interval required for entropy accumulation as shown in Fig. 2 (a) [1], [2]; 2) by counting the periods of the jittery clock signal during the time interval as shown in Fig. 2 (b) [14].
Figure 1: Generation of the jittery clock signal s in free running oscillators: ring oscillators (a) and self-timed rings (b)
Figure 2: Generation of random numbers from the jittery clock signals $s_1$ and $s_2$ using a sampler (a) and a counter (b). Signals $s_1$ and $s_2$ are generated in two free running oscillators (FRO) of the same type and topology.
While the first method based on sampling may be preferred because of its simplicity, it is very sensitive to dependencies between the clock signals and also to the duty cycle of the sampled clock signal, which can cause a significant bias in the generated numbers [2].
Although the second method based on counting the periods of the jittery clock signal adds some complexity to the RNG design, we will show that it effectively removes dependencies between the clock signals by transforming random events from the time domain to the frequency domain, and even removes the dependence of generated numbers on the duty cycle of the jittery clock signal. We will also show that the counter can be used as a basis for dedicated embedded tests.
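The duty cycle argument can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions that are not taken from the paper: the measurement window is modelled as a Gaussian random variable whose standard deviation spans several periods of the measured clock, and all parameter values (`duty`, `tau0`, `sigma`) are hypothetical.

```python
import random


def extract_bits(n, duty, t1=1.0, tau0=1000.3, sigma=3.0, seed=1):
    """Toy model: return (sampled bits, counter LSBs) for n measurement windows."""
    rng = random.Random(seed)
    sampled, counted = [], []
    for _ in range(n):
        tau = rng.gauss(tau0, sigma)            # jittery window length
        phase = (tau % t1) / t1                 # position inside the current period
        sampled.append(1 if phase < duty else 0)  # sampler: level of the clock
        counted.append(int(tau // t1) & 1)        # counter: LSB of elapsed periods
    return sampled, counted


sampled, counted = extract_bits(20000, duty=0.6)
mean_sampled = sum(sampled) / len(sampled)
mean_counted = sum(counted) / len(counted)
print(f"mean of sampled bits : {mean_sampled:.3f}")
print(f"mean of counter LSBs : {mean_counted:.3f}")
```

In this model the sampled bits inherit the duty cycle as a bias, whereas the counter LSB depends only on how many full periods elapsed.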
In the following sections, we will demonstrate and justify the relationship between the measured variance of counter values and that of the jitter present in clock signals s 1 and s 2 .

Characterization of the source of randomness by a statistical variance -a pitfall
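Statistical variance characterizes the deviation of a random variable from its mean value. More precisely, if X is a square-integrable random variable, then its statistical variance can be computed as [15,16,17]:
$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2 \tag{1}$$
where E denotes the statistical average. The estimate of this variance on a set of M samples $\{x_i\}_{1 \le i \le M}$ is given as [18]:
$$\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M} x_i^2 - \left(\frac{1}{M}\sum_{i=1}^{M} x_i\right)^2 \tag{2}$$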

Limitations of statistical variance in the presence of low frequency noises
We denote the output frequency of the oscillator under study by ν(t). The fractional frequency of the output is defined as:
$$y(t) = \frac{\nu(t) - \nu_0}{\nu_0} \tag{3}$$
where $\nu_0$ is the nominal frequency of the oscillator. In oscillators, random fluctuations are often characterized by the power law spectrum [19]:
$$S_y(f) = h_\alpha f^\alpha \tag{4}$$
where f is the Fourier frequency, $h_\alpha$ the intensity of the particular noise process and α a constant that characterizes this process. The typical values of α, with corresponding noise types, that often appear in the literature are +2 (white noise phase modulation), +1 (flicker noise phase modulation), 0 (white noise frequency modulation), −1 (flicker noise frequency modulation) and −2 (random walk frequency modulation). Knowing that random fluctuations are due to the above mentioned types of noises, the power spectral density of y can be expressed as [20]:
$$S_y(f) = \sum_{\alpha=-2}^{2} h_\alpha f^\alpha \tag{5}$$
Under the assumption that y is a zero-mean stationary random process, its statistics do not change over time. This implies that y is an infinite signal that can only be observed through a time window defined by the function $h_\tau$. The observed signal $y_\tau$ can then be considered as the response of a filter, with the impulse response $h_\tau$, to the random input y. The power spectral densities of y and $y_\tau$ are therefore related by [16]:
$$S_{y_\tau}(f) = |H_\tau(f)|^2 \, S_y(f) \tag{6}$$
where $H_\tau$ is the Fourier transform of $h_\tau$. Based on the Wiener-Khinchin theorem, the autocorrelation function of $y_\tau$ can then be computed as [16]:
$$R_{y_\tau}(t) = \int_0^{\infty} S_{y_\tau}(f) \cos(2\pi f t)\, df \tag{7}$$
Because the process has zero mean, the variance is the autocorrelation function evaluated at 0, hence [18]:
$$\sigma^2_{y_\tau} = R_{y_\tau}(0) = \int_0^{\infty} |H_\tau(f)|^2 \, S_y(f)\, df \tag{8}$$
The choice of $h_\tau$ reveals how samples of the signal y are used in the variance computation. Since in the case of statistical variance, we are interested in consecutive samples, the corresponding time window has the form depicted in Fig. 3.
The magnitude squared transfer function of the statistical variance is thus [20]:
$$|H_\tau(f)|^2 = \left(\frac{\sin(\pi \tau f)}{\pi \tau f}\right)^2 \tag{9}$$
The statistical variance of the signal can then be expressed as:
$$\sigma^2_{y_\tau} = \int_0^{f_h} \sum_{\alpha=-2}^{2} h_\alpha f^\alpha \left(\frac{\sin(\pi \tau f)}{\pi \tau f}\right)^2 df \tag{10}$$
where $f_h$ is the cutoff frequency of the oscillator. In Eq. (10), each term of the integrand behaves like $h_\alpha f^\alpha$ as f → 0. The Riemann criterion therefore shows that the integral does not converge for α = −1 and α = −2, which correspond to low frequency noises. Consequently, it is not possible to compute the variance when the data is affected by low frequency noises. In other words, statistical variance should not be used when low frequency noises are not negligible.
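The Riemann criterion used in this argument can be made explicit with a short worked integral. The sketch below keeps a single power-law term $f^\beta$ near the origin and drops all constant factors, which is a simplification introduced here for illustration; the lower bound a is a hypothetical cutoff that is then sent to zero.

```latex
\int_{a}^{f_h} f^{\beta}\, df =
\begin{cases}
\dfrac{f_h^{\beta+1} - a^{\beta+1}}{\beta+1}, & \beta \neq -1,\\[2mm]
\ln \dfrac{f_h}{a}, & \beta = -1,
\end{cases}
\qquad\text{so that}\qquad
\lim_{a \to 0^+} \int_{a}^{f_h} f^{\beta}\, df < \infty
\iff \beta > -1 .
```

For the statistical variance the relevant exponent is β = α, so the terms with α = −1 (logarithmic divergence) and α = −2 (power divergence) both fail the criterion.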

Riemann criterion for improper integrals
For this reason, it is recommended to use other types of variance that converge in the presence of low frequency noises [18,22]. One example of this type of variance is the Allan variance, which is widely used to study the frequency stability of clocks and oscillators [23]. Next, we will show that the Allan variance should also be preferred in entropy rate estimation.

Allan variance
We recall that y denotes the fractional frequency of the oscillator. Thus, the average fractional frequency is defined as:
$$\bar{y}(t) = \frac{1}{\tau}\int_{t}^{t+\tau} y(t')\, dt' \tag{11}$$
It corresponds to the average frequency deviation over a time interval of length τ. If the frequency data are acquired periodically with a sampling period of τ, the obtained fractional frequency series is denoted $(\bar{y}_i)$, where $\bar{y}_i$ is the i-th acquired sample. The Allan variance of the frequency deviation of y is then defined as [19]:
$$\sigma_y^2(\tau) = \frac{1}{2}\, E\left[(\bar{y}_{i+1} - \bar{y}_i)^2\right] \tag{12}$$
We denote Avar(y) the Allan variance of y as in [18]. An estimate of this variance in a data set comprised of M average fractional frequency samples is given as [18]:
$$\mathrm{Avar}(y) \approx \frac{1}{2(M-1)} \sum_{i=1}^{M-1} (\bar{y}_{i+1} - \bar{y}_i)^2 \tag{13}$$

Convergence in the presence of low frequency noises
Unlike statistical variance, the Allan variance computes the difference of consecutive samples. This yields the time window presented in Fig. 4, with a magnitude squared transfer function given by:
$$|H_A(f)|^2 = 2\,\frac{\sin^4(\pi \tau f)}{(\pi \tau f)^2} \tag{14}$$
This makes it possible to write the signal variance as:
$$\mathrm{Avar}(y) = \int_0^{f_h} \sum_{\alpha=-2}^{2} h_\alpha f^\alpha \cdot 2\,\frac{\sin^4(\pi \tau f)}{(\pi \tau f)^2}\, df \tag{15}$$
In this new case, each term of the integrand behaves like $2\pi^2\tau^2 h_\alpha f^{\alpha+2}$ as f → 0. The Riemann criterion for f → 0 ensures that this integral converges when α > −3, and thus guarantees the accuracy of the Allan variance, even when the data are affected by low frequency noises (α = −1 and α = −2).
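The different behaviour of the two variances can be observed on synthetic data. The sketch below is an illustration under assumptions that are not taken from the paper: the series is white noise plus a slow random walk (a stand-in for low frequency noise), the noise levels and the seed are hypothetical, and the Allan variance is taken to be the usual estimator, half the mean squared difference of consecutive samples.

```python
import random


def sample_variance(x):
    """Unbiased sample variance of the list x."""
    m = len(x)
    mean = sum(x) / m
    return sum((v - mean) ** 2 for v in x) / (m - 1)


def allan_variance(x):
    """Half the mean squared difference of consecutive samples."""
    m = len(x)
    return sum((x[i + 1] - x[i]) ** 2 for i in range(m - 1)) / (2 * (m - 1))


rng = random.Random(2018)
white_sigma, walk_sigma = 1.0, 0.05   # hypothetical noise levels
level, series = 0.0, []
for _ in range(65536):
    level += rng.gauss(0.0, walk_sigma)                   # slow random walk
    series.append(level + rng.gauss(0.0, white_sigma))    # plus white noise

results = {}
for m in (1024, 4096, 16384, 65536):
    results[m] = (sample_variance(series[:m]), allan_variance(series[:m]))
    print(f"M={m:6d}  variance={results[m][0]:8.3f}  Allan variance={results[m][1]:6.3f}")
```

In this model the Allan variance stays close to the white noise variance for every M, while the sample variance keeps absorbing the random walk as the observation window grows.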
Next, we present the general properties of the Allan variance. Readers interested in the proofs of these properties should refer to Appendix A.

Theorem 1 (General properties of the Allan variance).
1. The Allan variance coincides with the statistical variance of any stationary and uncorrelated random process.
2. If λ is a real number and x is a stationary random process, then λx is also a stationary random process and:
$$\mathrm{Avar}(\lambda x) = \lambda^2\, \mathrm{Avar}(x) \tag{16}$$
3. If x and y are two independent stationary random processes, the following equation is valid:
$$\mathrm{Avar}(x + y) = \mathrm{Avar}(x) + \mathrm{Avar}(y) \tag{17}$$
Since the measurement principle of the jitter is based on counter values, the properties of the Allan variance presented here will be used to establish the link between the variance of counter values and the variance of the jitter.

Link between the variance of counter values and of the jitter
We assume that both s 1 and s 2 contain jitter that causes variations in counter values. Before using the variance of a population of counter values as a measure of quality of the source of randomness, we need to determine and justify the relationship between this variance and the variance of the jitter on both signals s 1 and s 2 .
As mentioned above, the counter values are obtained by counting the number of periods of the measured clock signal $s_1$ during a time interval τ defined by the reference clock signal $s_2$. Of course, in practice, both signals feature a jitter. However, to simplify the computation, we include the jitter of signal $s_1$ in that of signal $s_2$, as done in [2]. Consequently, the period $T_2$ of signal $s_2$ can be considered as a random variable whose standard deviation is given in [2, Appendix C], and the period $T_1$ of signal $s_1$ as a constant. The measurement time $\tau = \sum_{r=1}^{k} T_{2,r}$ is thus a random variable. This time defines only the length of the time period, not the position of the initial phase $\varphi_0$ of the signal $s_1$ when the measurement (the counting) starts (see Fig. 5). However, to measure the jitter more accurately, the initial phase $\varphi_0$ has to be taken into account. This initial phase is independent of τ, since its value does not depend on the length τ. Because $T_1$ is constant, the counter value N is a random variable defined as:
$$N = \max\{n \in \mathbb{N} : \varphi_0 + n T_1 \le \tau\} \tag{19}$$
The value N thus satisfies the inequality:
$$\varphi_0 + N T_1 \le \tau < \varphi_0 + (N+1)\, T_1 \tag{20}$$
which is equivalent to:
$$\frac{\tau - \varphi_0}{T_1} - 1 < N \le \frac{\tau - \varphi_0}{T_1} \tag{21}$$
It then follows that N can be written as:
$$N = \left\lfloor \frac{\tau - \varphi_0}{T_1} \right\rfloor \tag{22}$$
There thus exists $0 \le \varepsilon < 1$ such that:
$$N = \frac{\tau - \varphi_0}{T_1} - \varepsilon \tag{23}$$
According to Sheppard's correction [24], ε is a random variable that is uniformly distributed over [0, 1). Since it is independent of $(\tau - \varphi_0)/T_1$, using Eq. (16) and (17) from Theorem 1, the following equation holds:
$$\mathrm{Avar}(N) = \frac{\mathrm{Avar}(\tau)}{T_1^2} + \frac{\mathrm{Avar}(\varphi_0)}{T_1^2} + \mathrm{Avar}(\varepsilon) \tag{24}$$
It is important to note that the Allan variance of counter values always overestimates the Allan variance of the jitter per unit of time (e.g. the signal period). The correction must be applied by subtracting $\mathrm{Avar}(\varepsilon) = \frac{1}{12}$ and $\mathrm{Avar}(\varphi_0)/T_1^2$. As $\mathrm{Avar}(\varphi_0)/T_1^2 \le \frac{1}{12}$ and as we do not want to overestimate the jitter, a conservative approach is to take the minimum value for Avar(τ), that is:
$$\mathrm{Avar}(\tau) = T_1^2 \left(\mathrm{Avar}(N) - \frac{1}{6}\right) \tag{25}$$
Using Eq. (25), the variance of the accumulated jitter can be computed from the variance of counter values. This justifies using counter values to estimate the jitter.
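The correction described above can be checked on simulated counter values. The sketch below rests on several assumptions that are ours rather than the paper's: the window τ is Gaussian, the initial phase is uniform over one period, the counter value is the integer part of (τ − ϕ0)/T1, and the correction is read as subtracting 1/12 for the rounding term plus 1/12 for the phase term, i.e. 1/6 in total. All numerical values are hypothetical.

```python
import math
import random


def allan_variance(x):
    """Half the mean squared difference of consecutive samples."""
    m = len(x)
    return sum((x[i + 1] - x[i]) ** 2 for i in range(m - 1)) / (2 * (m - 1))


def simulate_counters(m, t1=1.0, tau0=30000.0, sigma_tau=2.0, seed=7):
    """Simulate m counter values for a jittery window and a random start phase."""
    rng = random.Random(seed)
    counts = []
    for _ in range(m):
        tau = rng.gauss(tau0, sigma_tau)   # jittery measurement window
        phi0 = rng.uniform(0.0, t1)        # initial phase of the counted clock
        counts.append(math.floor((tau - phi0) / t1))
    return counts


T1 = 1.0
counts = simulate_counters(20000, t1=T1)
avar_n = allan_variance(counts)
# Assumed correction: remove 1/12 (rounding) and 1/12 (uniform phase).
avar_tau = T1 ** 2 * (avar_n - 1.0 / 6.0)
print(f"Avar(N) = {avar_n:.3f}, corrected Avar(tau) = {avar_tau:.3f} (simulated 4.0)")
```

With these settings the corrected value should land near the simulated window variance of 4 (in units of T1 squared), while the raw counter variance sits about 1/6 above it.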

Study and setup of the variance measurement
To study the difference between statistical variance and the Allan variance in different conditions, we first implemented the circuit presented in Fig. 2 (b) in hardware. Four different hardware configurations were tested in an Intel Cyclone V FPGA:
• Configuration 1: Signal $s_1$ of 127 MHz was generated in an RO and signal $s_2$ came from a low jitter quartz oscillator generating a stable 125 MHz clock.
• Configuration 2: Both signals ($s_1$ and $s_2$) were generated in two ROs with the same number of elements, oscillating at frequencies of 125 and 127 MHz, respectively.
• Configuration 3: Signal $s_1$ of 128 MHz was generated in an STR and signal $s_2$ came from a low jitter quartz oscillator generating a stable 125 MHz clock.
• Configuration 4: Both signals ($s_1$ and $s_2$) were generated in two STRs with the same number of elements, oscillating at frequencies of 130 and 128 MHz, respectively.
The counter values were sent to a PC via a simple serial interface and evaluated in the software. The jitter accumulation time τ was set up from the PC using the serial link.
To obtain meaningful and reliable embedded measurements, we first needed to establish the right operating parameters. These parameters are k, the number of periods of signal $s_2$, which determines the accumulation time τ, and M, the number of samples from which the variance will be computed.
We performed a series of variance measurements for different values of M in order to find an acceptable compromise between the measurement time and precision. We used k = 30 000 for this study. Measurement results are shown in Fig. 6. Figure 6 clearly shows the advantage of the Allan variance: it changes very slightly and only for low values of M , while the statistical variance increases with M and its values fluctuate. This fluctuation occurs because low frequency noises affect the signal periods. We selected M = 4096 as a compromise between the number of statistical data (which impacts the precision of the measurement) and the measurement time. To obtain coherent results, the same values of M were used when measuring variance and Allan variance.
We next studied the impact of the accumulation time $\tau = \sum_{r=1}^{k} T_{2,r}$ on the measured variance. We observed the variances and Allan variances of counter values from two ring oscillators as well as two self-timed rings with k ranging from 300 to several million. The results are presented in Fig. 7 and Fig. 8. We observed that ROs and STRs behave similarly in terms of variance dependence on k. This means that the jitter accumulates in both structures in a very similar way.
We also observed that for low values of k (k < 1 000), the computed variances varied probably because of the quantization noise, rather than random noises. Indeed, for these low values of k, the counter values varied only very slightly.
Last but not least, we observed that for sufficiently high values of k (k > 10 000), the Allan variance was always lower than the statistical variance. This proves that statistical variance overestimates the proportion of uncorrelated noise in the total accumulated jitter.

Accuracy of Allan variance estimation
The sample estimate given in Eq. (13) (time-average) approximates the true value from Eq. (12) (average over process randomness) well, provided that the series is wide-sense stationary and has rapidly decreasing correlations [25]. One can think of small correlations as a short memory of the process: substantial new information is gained with every sample, so that the estimate becomes increasingly accurate.
To check the quality of our estimate, we examined the process of counter values (used to compute the variance) and their first differences (used to compute the Allan variance) in more detail. We collected the data from four different hardware configurations, i.e. Configurations 1 to 4 described earlier in this section. All projects used k = 30 000 periods of $s_2$ to set the counting time. The behavior across all experiments is summarized at a high level in Table 1. The autocorrelations were considerably reduced in first differences, as shown in Fig. 9. This confirms that differencing subsequent counter values is a good way of eliminating low frequency components and reducing correlations. Frequency noise in the ring oscillators is modeled by a process with stationary first differences in the theoretical literature [26]; this is consistent with our experiments. More experiments are presented in Appendix B.
Based on our empirical evidence, we assume that the correlation is zero for sufficiently large lags. Under the mild assumption that the difference process is correlated Gaussian, we can bound estimation errors in the Allan variance computations (quantifying the convergence rate). More precisely, we make the following assumption.
Model assumption: The process of counter differences is stationary normal, with zero correlations for lags larger than p.
The technical result and the corollary regarding Allan variance are given below.

Evaluation of randomness in counter values
We propose to use the least significant bit of the counter values (or of their first difference) as random values. To evaluate the quality of the generated sequence, we model dependencies between subsequent bits by higher-order Markov chains. First, we recall some basics on Markov chains in order to introduce the theorem used to compute the min-entropy rate, which is more conservative than the Shannon entropy rate. We then empirically (based on data generated under different hardware configurations) compare our evaluation technique with the entropy estimation methods in AIS31 and NIST 800-90B.

Theoretical Background
Model. The Markov chain model of order d assumes a sequence of random variables $\{U_i\}_i$ over the common state space S (S = {0, 1} in our case), such that:
• the next state distribution depends only on the previous d states (short memory);
• the next state distribution is a function of the state values only (time homogeneity).
Reduction of a chain of order d to a first order chain. Grouping d consecutive values into a single state, $W_i = (U_i, \ldots, U_{i+d-1})$, transforms the chain of order d into a first order chain over $S^d$ with transition matrix P, where $P_{v,w}$ is the transition probability from v to w. In our case, $U_i$ are bits and the assumed order is 8, thus the transformed chain $W_i$ has states in $\{0, 1\}^8$ and the transition matrix has the size $2^8 \times 2^8$.
Min-entropy rate. Entropy rates (understood as the entropy per bit in long sequences) can generally be computed from the transition matrix. However, computation of the min-entropy rate is more complicated than that of the Shannon entropy and does not have a closed-form formula. We refer the reader to [27] for a detailed discussion of how different definitions of entropy (Shannon entropy, Rényi entropy, min-entropy) can be computed using Markov chains; below we state the result for min-entropy.
A sequence of states $s_1, \ldots, s_{\ell+1}$ is called a loop if $s_1 \neq s_2 \neq \ldots \neq s_\ell$ and $s_1 = s_{\ell+1}$, where ℓ is the length of the loop. The min-entropy rate is then determined as follows.

Theorem 2 (Min-entropy rate of Markov chains [27]). Let P be the transition matrix of an irreducible and aperiodic Markov chain with the state space S. Then
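$$H_\infty(P) = \min_{1 \le \ell \le |S|} \; \min_{(s_1, \ldots, s_{\ell+1}) \in C_\ell} \; \frac{1}{\ell} \sum_{k=1}^{\ell} \log \frac{1}{P_{s_k, s_{k+1}}} \tag{26}$$
where $C_\ell$ denotes the set of all loops of length ℓ and $P_{s_k, s_{k+1}}$ the probability of the transition from state $s_k$ to state $s_{k+1}$.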
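The minimization in Eq. (26) is a minimum mean cycle problem on the transition graph, where the edge from state u to state v carries the weight log(1/P_{u,v}). One standard way to solve such a problem is Karp's algorithm. The sketch below uses it purely as an illustration and is not claimed to be the procedure implemented by the authors; the function name, the base-2 logarithm and the toy matrix are assumptions.

```python
import math


def min_entropy_rate(P):
    """Minimum mean cycle weight of the graph with weights log2(1/P[u][v])."""
    n = len(P)
    # Zero-probability transitions are treated as missing edges (infinite weight).
    w = [[-math.log2(p) if p > 0 else math.inf for p in row] for row in P]
    # D[k][v]: minimum weight of a walk with exactly k edges from state 0 to v.
    D = [[math.inf] * n for _ in range(n + 1)]
    D[0][0] = 0.0
    for k in range(1, n + 1):
        for u in range(n):
            if D[k - 1][u] == math.inf:
                continue
            for v in range(n):
                cand = D[k - 1][u] + w[u][v]
                if cand < D[k][v]:
                    D[k][v] = cand
    # Karp's formula: min over v of max over k of (D[n][v] - D[k][v]) / (n - k).
    best = math.inf
    for v in range(n):
        if D[n][v] == math.inf:
            continue
        worst = max((D[n][v] - D[k][v]) / (n - k)
                    for k in range(n) if D[k][v] < math.inf)
        best = min(best, worst)
    return best


# Toy two-state chain: the cheapest loop is the self-loop on state 0.
P_toy = [[0.9, 0.1],
         [0.5, 0.5]]
rate = min_entropy_rate(P_toy)
print(f"min-entropy rate: {rate:.4f} bits per step")
```

The run time is cubic in the number of states for a dense matrix, which remains manageable for the 256 states obtained with d = 8. The sketch relies on the chain being irreducible, so that every state is reachable from state 0.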

Implementation
Language. We implemented the procedure to estimate the min-entropy rate of a Markov chain in Python; to increase the speed, parts of the code were compiled to C using Cython. For computation, we used the NumPy library with double precision (64 bits).
Algorithmic issues. Storing the values $r(s, s', \ell)$ for $s \in S$, $s' \in S$ and $\ell \le |S|$ requires a memory size of about $|S|^3$ multiplied by the size of the float placeholders.
Parameters. Based on the dependencies indicated by the results of the autocorrelations, we decided to use d = 8. We therefore study transitions between blocks of consecutive d = 8 bits, and the size of the transition matrix is $2^8 \times 2^8$.
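A transition matrix of this size can be estimated from a bit stream by counting block-to-block transitions. The sketch below assumes overlapping blocks (a sliding window shifted by one bit), which is our reading of the reduction to a first order chain and may differ from the authors' implementation; the function name and the toy input are hypothetical.

```python
def transition_matrix(bits, d):
    """Estimate the 2^d x 2^d transition matrix from overlapping d-bit blocks."""
    n_states = 1 << d
    mask = n_states - 1
    counts = [[0] * n_states for _ in range(n_states)]
    # Build the first state from the first d bits.
    state = 0
    for b in bits[:d]:
        state = (state << 1) | b
    # Slide the window by one bit and count each observed transition.
    for b in bits[d:]:
        nxt = ((state << 1) | b) & mask
        counts[state][nxt] += 1
        state = nxt
    # Normalize each row; rows of states never visited are left at zero.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy input: a strictly alternating stream only visits states 01 and 10.
P_alt = transition_matrix([0, 1] * 50, d=2)
print(P_alt[0b01][0b10], P_alt[0b10][0b01])
```

With overlapping blocks each row has at most two nonzero entries, since the next block shares d − 1 bits with the current one.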

Experiments on the generation of random bit streams from free running oscillators
In our experiments on the generation of random numbers using free running oscillators, we wanted to compare randomness extraction using the two methods presented in Fig. 2.
We analyzed the output values of the sampler from Fig. 2 (a) and the least significant bit of the counter values from Fig. 2 (b) (and their first differences). We thus analyzed the outputs of four projects.
• The first two projects used the method of entropy extraction based on sampling the jittery clock signal (according to Fig. 2 (a)) with two kinds of oscillators used as a source of randomness: in the first project, signals $s_1$ and $s_2$ were generated by two ROs oscillating at 125 and 127 MHz; in the second project, signals $s_1$ and $s_2$ were generated by two STRs oscillating at 130 and 128 MHz.
• The other two projects used the counter method of entropy extraction (according to Fig. 2(b)), while using the same oscillators as the first pair of projects.
For the method of extraction based on sampling of the jittery clock signal, we generated random bit streams for k ranging from 10 000 to 100 000. For the counter method of extraction, we generated sequences for k ranging from 2 000 to 100 000. Two kinds of files were generated in this case: one containing the least significant bits of the counter values and the other containing the least significant bits of the first differences of counter values. We tested all the generated sequences using the AIS31 Procedure B (tests T6 to T8) and the NIST 800-90B test suite, from which we also obtained Shannon entropy and min-entropy estimates, respectively. The min-entropy was computed for every sequence according to Eq. (26), i.e. the computation was based on high order Markov chains while taking correlations between output bits into account. The results are presented in Appendix D, Tables 5 to 10.
Three very important results stand out in the tables presented in Appendix D. First, the method of randomness extraction based on sampling of the jittery clock signal always gives lower entropy rates than those obtained by the method based on counting the jittery clock signal periods. Second, the method of min-entropy estimation based on high order Markov chains gives very consistent results even in the interval of values of k, for which Procedure B of AIS31 revealed no differences in Shannon entropy estimates. Third, the entropy rates are practically the same when the least significant bit of the counter values or that of their differences is used. This is valid independently of the type of free running oscillator (RO or STR).
To compare the parameters of the proposed randomness monitoring process, we implemented tests based on counter differences and two other state-of-the-art tests (proposed in [8] and [9]) in the same device, an Intel Cyclone V FPGA.
The circuitry corresponding to implementation of the Allan variance according to Eq. (13) in hardware is shown in Fig. 10. All the computations are in fixed point arithmetic. This method only requires one multiplier to square data. One subtractor is used to compute the difference of the consecutive samples and one adder with associated register is used as accumulator.
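The datapath described above (one subtractor, one squarer, one accumulator) maps onto a simple streaming computation. The following is a software sketch of that structure, not a description of the actual hardware: the class name is hypothetical, bit widths and fixed point scaling are ignored, and the final normalization by 2(M − 1) assumes the usual Allan variance estimator.

```python
class AllanAccumulator:
    """Streaming Allan variance: subtract, square, accumulate."""

    def __init__(self):
        self.previous = None   # register holding the previous counter value
        self.acc = 0           # accumulator of squared differences
        self.n_diffs = 0       # number of accumulated differences

    def push(self, value: int) -> None:
        if self.previous is not None:
            diff = value - self.previous   # subtractor
            self.acc += diff * diff        # multiplier feeding the accumulator
            self.n_diffs += 1
        self.previous = value

    def result(self) -> float:
        # Normalization by 2(M - 1), where M - 1 is the number of differences.
        return self.acc / (2 * self.n_diffs)


acc = AllanAccumulator()
for counter_value in (1, 3, 2, 5):
    acc.push(counter_value)
print(acc.result())
```

Only the previous sample has to be stored, so no running mean is needed. When the number of differences is a power of two, the final division reduces to a bit shift.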
The circuitry corresponding to the hardware implementation of the variance computation used by Haddad et al. in [8], which is also the one corresponding to Eq. (2), is depicted in Fig. 11. Again, all the computations are in fixed point arithmetic. Numbers before and after the radix point indicate the number of bits of the integer and fractional part of the given value, respectively. Two multipliers (one of 12 bits and the other of 24 bits) are used to square data. Two adders and associated registers (one of 24 bits and the other of 12 bits) are used to implement accumulators. One subtractor is used before the output of the block. Four additional data registers are used to store intermediate data.
Figure 11: Implementation of the counter variance measurement circuitry for the method proposed by Haddad et al. in [8] and for that corresponding to Eq. (2)
The third test we implemented in the hardware has the same architecture as that presented in [9], Fig. 6. In the following section, we compare the three implementations.

Implementation results
First, we evaluate design parameters like area, speed and power consumption of the three methods of variance measurement described above. Area and speed values were obtained from Quartus software. Power consumption was measured using a dedicated hardware evaluation platform [28]. The results are presented in Table 3. We observe that the Allan variance measurement circuitry based on Eq. (13) is smaller, faster and consumes slightly less power than the circuitry required by the other two methods. This is because the implementation of the Allan variance measurement is simple (only one subtractor and one adder needed, only one DSP block used instead of two or four, respectively).

Study of the impact of the measurement circuitry on the source of randomness
Next, we propose a rigorous approach to assess the impact of the embedded jitter measurement on the measured jitter itself. The impact is evaluated in the following steps:
• Project 1 - Only two free running oscillators, used as sources of randomness, are implemented in the selected logic device. The generated clock signals are output using low voltage differential signaling (LVDS) outputs and measured externally using a high-end oscilloscope and differential probes (see Fig. 12).
• Project 2 - A complete TRNG, embedded variance measurement and an AES cipher are implemented in the FPGA to mimic the behavior of a real crypto SoC, as shown in Fig. 13. Signal s_2 is generated using an external quartz oscillator. The variance is measured both internally and externally.
• Project 3 - A complete TRNG, embedded variance measurement and an AES cipher are implemented in the FPGA to mimic the behavior of a real crypto SoC, as shown in Fig. 13. Signal s_2 is generated using a free running oscillator. The variance is measured both internally and externally.
Figure 12: External jitter measurement method using an oscilloscope and differential probes

Figure 13: External jitter measurement method using an oscilloscope and differential probes combined with an internal jitter measurement method while the TRNG and the AES cipher are running (only one generator of signal s_2 is present in each of the two projects: quartz oscillator in Project 2 and Osc2 in Project 3)
To ensure the measurement results are consistent, it is important to guarantee the same placement and routing of Osc1 and Osc2 in all projects. We generated the Exported Partition file (.qxp), which is the Quartus II software option used to export post-fitting netlists. The exported netlist was then used in all the projects.
We decided to implement only ROs as Osc1 and Osc2 because, as shown in Section 2, they are simpler to implement than STRs and the jitter behavior is very similar in both STRs and ROs. Oscillators Osc1 and Osc2 had the same number of elements and the same topology. They oscillated at respective frequencies of 124.5 ± 0.3 MHz and 126.3 ± 0.2 MHz. The difference in frequency in the three projects was thus less than 1 %, which was important to ensure the results were comparable.
We measured the jitter of both oscillators as well as the normalized counter value externally using a LeCroy WavePro 735i oscilloscope (4 GHz bandwidth, 40 GS/s) and two D420 WaveLink 4 GHz differential probes. Counter values cannot be obtained directly from an oscilloscope, since the value of k cannot be set as in hardware but can only be deduced from the oscilloscope time base, which, in our case, was set to 5 µs per division. We measured the number of periods of both clocks in this time interval. Finally, to make the comparison of values obtained using the external and embedded measurements more consistent, we measured the number of cycles of both clocks in the same time interval and normalized the resulting data according to the following equation: where n_1 represents the number of clock periods of s_1 and n_2 the number of clock periods of s_2 that appear during the same time interval determined by the oscilloscope's time base. In our case, we used k = 30 000 to normalize oscilloscope measurements.
Table 4 shows the results of external and internal measurements. The jitters of Osc1 and Osc2 were both measured by the oscilloscope. To compare these values with those obtained using the Allan variance according to Eq. (25), we saved the counter values in a file for processing.
Table 4: Results of external and internal measurements of oscillator jitters in three selected projects: Columns 2 and 3 list the jitters σ_1, σ_2 measured using the oscilloscope. Column 4 lists the equivalent jitter σ_eq computed from Eq. (18). Column 5 lists the normalized variance of counter values computed from the oscilloscope using Eq. (27). Column 6 lists the Allan variance estimate computed inside the device using Eq. (13).
We can see that putting the whole cryptosystem including the AES cipher in an FPGA more than doubles the jitter of both oscillators, but the variance of counter values remains almost the same if only internal oscillators are used.
In Project 2, in which the signal s_2 is generated by an external quartz oscillator, there was a significant increase in the variance of counter values, which confirms that using identically implemented oscillators and implementing them both inside the FPGA (differential principle of randomness extraction) helps prevent negative effects of the surrounding logic on the measured jitter.
To further confirm this claim, we acquired a large sequence of counter values from Project 2 and transferred them to a PC in order to visualize them over time. The acquisition was done with the accumulation period set by k = 30 000. The whole acquisition took approximately 30 minutes. Figure 14 shows the counter values when the signal s_2 was generated by an external quartz oscillator. A strong low frequency signal can be seen to affect the counter values. The frequency of the signal is approximately 1.5 mHz. Figure 15 shows the counter values when s_2 was generated by an internal RO. Even though the low frequency pattern is still slightly visible, its amplitude is significantly reduced. We discovered that the observed low frequency signal originated from the power line even though the evaluation board was using only low noise linear power supplies. These findings confirm that unwanted global noises are almost always present and are unavoidable. Since this kind of noise can be manipulated, it can be extremely dangerous for the TRNG design. Moreover, a low frequency signal such as the one visible in Fig. 14 is usually hard to detect.

Discussion
We have very clearly demonstrated several advantages of the Allan variance over statistical variance: it gives stable values independent of low frequency noises even for short data sets. It is thus suitable for the estimation of entropy originating from non-manipulable independent noises such as thermal noise. It can serve as a basis for embedded tests, for which it is particularly suited because of its small area and low latency.
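The insensitivity to low frequency noise can be illustrated with a small simulation on synthetic data. In this sketch, white Gaussian noise stands in for the local jitter contribution and a slow sinusoid for a global low frequency disturbance; the amplitudes, the record length and the seed are arbitrary assumptions and not measured values, and the Allan variance is taken in its usual form of half the mean squared difference of consecutive samples.

```python
import math
import random


def variance(xs):
    """Statistical variance (population form)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def allan_variance(xs):
    """Half the mean of squared differences of consecutive samples."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (2 * (len(xs) - 1))


def simulate(n, drift_amplitude, seed=1):
    """White noise of unit variance plus one slow sine cycle over the record."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.0, 1.0) + drift_amplitude * math.sin(2 * math.pi * i / n)
        for i in range(n)
    ]


clean = simulate(20_000, drift_amplitude=0.0)
drifting = simulate(20_000, drift_amplitude=5.0)   # same noise, plus slow drift

print("variance       :", variance(clean), variance(drifting))
print("Allan variance :", allan_variance(clean), allan_variance(drifting))
```

In this toy setting the statistical variance grows by roughly half the squared drift amplitude, whereas the Allan variance is almost unchanged, because a slow disturbance contributes very little to the difference of two consecutive samples.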
RO and STR behave similarly in terms of variance dependence on jitter accumulation time. Jitter accumulates in both structures in a very similar way. This is a new observation.
Using two identical oscillators reduces autocorrelations in RNG output values. Using the first differences of counter values instead of counter values themselves further reduces autocorrelations.
The method of randomness extraction based on sampling of the jittery clock signal always gives lower quality results than the method based on counting the jittery clock signal periods. The jitter accumulation times can be reduced more than ten times (more than 400 000 periods of the reference clock were needed in [9] and fewer than 30 000 if the jittery clock periods are counted). This means significantly higher bit rates at generator output with no loss of entropy.
The method of min-entropy estimation based on high order Markov chains gives very consistent results even in the interval of values of k, for which Procedure B of AIS 31 revealed no differences in Shannon entropy estimates (see Tables 7 to 10).
The studies described here confirm that using external oscillators jeopardizes the implementation of security critical applications. They also show that implementing identical oscillators inside the FPGA and using their relative jitter, transformed into counter values or, even better, into their differences, can efficiently mitigate the negative effects of global noise sources, both external to the FPGA and generated internally by the surrounding logic, represented in our case by the AES cipher.

Conclusions
We evaluated the jitter of clock signals generated in ring oscillators and self-timed rings and the way the jitter is transformed into random numbers. We showed that counting the periods of the jittery clock signal gives random numbers of significantly better quality than the usual methods of sampling jittery clock signals. We used counter values to characterize and to continuously monitor the source of randomness. We showed that using the Allan variance to characterize the clock jitter has at least two advantages: first, it is not sensitive to low frequency noises such as flicker noise, and second, significantly less circuitry is required for its computation than that used in other methods. We also showed that a differential principle of randomness extraction from the jitter, based on the use of two identical oscillators, is essential to avoid autocorrelations originating from both the external and internal sources of global jitter, independently of the type of ring used. Last but not least, we proposed a new method of statistical testing based on a high order Markov model to demonstrate the reduction of dependencies when the proposed randomness extraction is applied. While providing an estimation of min-entropy, the method is very efficient in detecting dependencies between generated numbers.

A.1 Allan variance generalizes the statistical variance
If x is a stationary and uncorrelated random process, we know that its statistical variance exists [22]. If we call µ the expected value of x, then: Since x is stationary, one has: and: Moreover, the uncorrelatedness of x implies: It then follows: Hence:
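The steps of this argument can be sketched as follows. The sketch assumes the usual definition of the Allan variance as half the expected squared difference of consecutive samples, $\mathrm{Avar}(x) = \tfrac12 E[(x_{i+1}-x_i)^2]$; the notation is ours and may differ from that used in the main text.

```latex
\begin{align*}
\mathrm{Var}(x) &= E\big[(x_i-\mu)^2\big] = E[x_i^2]-\mu^2, \\
E[x_{i+1}] &= E[x_i] = \mu, \qquad E[x_{i+1}^2] = E[x_i^2]
  && \text{(stationarity)}, \\
E[x_{i+1}x_i] &= E[x_{i+1}]\,E[x_i] = \mu^2
  && \text{(uncorrelatedness)}, \\
E\big[(x_{i+1}-x_i)^2\big] &= E[x_{i+1}^2] - 2E[x_{i+1}x_i] + E[x_i^2]
  = 2\big(E[x_i^2]-\mu^2\big), \\
\mathrm{Avar}(x) &= \tfrac12\,E\big[(x_{i+1}-x_i)^2\big] = \mathrm{Var}(x).
\end{align*}
```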

A.2 Multiplication by a scalar
Given a real number λ and a stationary random process x, then λx is also a stationary random process. Its Allan variance is then:
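Under the same assumed definition, $\mathrm{Avar}(x) = \tfrac12 E[(x_{i+1}-x_i)^2]$, the scaling property follows in one line; this is a sketch in our notation rather than a reproduction of the original equation.

```latex
\begin{align*}
\mathrm{Avar}(\lambda x)
  &= \tfrac12\,E\big[(\lambda x_{i+1}-\lambda x_i)^2\big]
   = \lambda^2\,\tfrac12\,E\big[(x_{i+1}-x_i)^2\big]
   = \lambda^2\,\mathrm{Avar}(x).
\end{align*}
```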

A.3 Sum of independent random processes
If x and y are two independent stationary random processes, one has: Because the processes x and y are independent, one has: for any i, j ∈ N. Since they are stationary: for any i, j ∈ N. Hence: It then follows that: Avar(x + y) = Avar(x) + Avar(y).
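The three properties of this appendix can also be checked numerically. The following sketch uses independent white Gaussian sequences as a stand-in for stationary, uncorrelated processes and assumes the usual Allan variance estimate (half the mean squared difference of consecutive samples); sample size, standard deviations, seed and tolerances are arbitrary choices made for illustration.

```python
import math
import random


def variance(xs):
    """Statistical variance (population form)."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / len(xs)


def allan_variance(xs):
    """Half the mean of squared differences of consecutive samples."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (2 * (len(xs) - 1))


rng = random.Random(0)
n = 50_000
x = [rng.gauss(0.0, 2.0) for _ in range(n)]   # variance close to 4
y = [rng.gauss(0.0, 1.0) for _ in range(n)]   # variance close to 1, independent of x
lam = 3.0

# A.1: for uncorrelated data the Allan variance matches the variance.
a1_ok = math.isclose(allan_variance(x), variance(x), rel_tol=0.05)

# A.2: scaling by lam scales the Allan variance by lam squared.
a2_ok = math.isclose(
    allan_variance([lam * v for v in x]), lam ** 2 * allan_variance(x), rel_tol=1e-9
)

# A.3: Allan variances of independent processes add up.
a3_ok = math.isclose(
    allan_variance([p + q for p, q in zip(x, y)]),
    allan_variance(x) + allan_variance(y),
    rel_tol=0.05,
)

print(a1_ok, a2_ok, a3_ok)
```

Properties A.1 and A.3 hold only up to sampling error on finite data, which is why a loose relative tolerance is used for them, while A.2 is an algebraic identity and holds up to floating point rounding.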

B.1 Background
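Sample autocorrelation. Given a sequence of observations $z_1, \ldots, z_N$ originating from a random process $\{Z_i\}_{i=1}^{N}$, the sample autocorrelation is the function of the time lag $\tau$ defined by
$$\hat{\rho}_u(\tau) = \frac{1}{(N-\tau)\,\hat{\sigma}^2} \sum_{i=1}^{N-\tau} (z_i - \hat{\mu})(z_{i+\tau} - \hat{\mu}),$$
where $\hat{\mu}$ and $\hat{\sigma}^2$ are the sample mean and variance estimates
$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (z_i - \hat{\mu})^2.$$
At longer lags $\tau \approx N$ there are fewer samples to estimate from, so that $\hat{\rho}_u$ becomes unstable; for this reason one often applies the following modification
$$\hat{\rho}_b(\tau) = \frac{1}{N\,\hat{\sigma}^2} \sum_{i=1}^{N-\tau} (z_i - \hat{\mu})(z_{i+\tau} - \hat{\mu}),$$
which increases the bias but has lower variance (and smaller mean squared error, as suggested in some empirical studies); however, in our case $N$ is big enough to obtain accurate results of $\hat{\rho}_u(\tau)$ for a wide range of lags $\tau$ between 0 and $N$.
Process autocorrelation. If the sample $z_1, \ldots, z_N$ comes from a WSS ergodic process $\{Z_i\}_i$, then $\hat{\rho}_u$ and $\hat{\rho}_b$ estimate the process autocorrelation function
$$\rho(\tau) = \frac{E\big[(Z_i - \mu)(Z_{i+\tau} - \mu)\big]}{\sigma^2},$$
which under the WSS assumption depends only on $\tau$ (as the mean $E(Z_i) = \mu$ and variance $\sigma^2 = \mathrm{Var}(Z_i)$ do not depend on $i$).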
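The two sample estimators can be sketched in a few lines of code. The sketch assumes the textbook forms: both divide the lagged sum of products by the sample variance, the unbiased variant normalizes by N − τ and the biased variant by N. The function name and the direct summation are illustrative choices; for long sequences an FFT based computation is preferable.

```python
def sample_autocorrelation(z, max_lag, biased=False):
    """Sample autocorrelation for lags 0..max_lag (assumed textbook form).

    biased=False normalizes the lagged sum by (N - tau),
    biased=True  normalizes it by N.
    """
    n = len(z)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(z)")
    mu = sum(z) / n                               # sample mean
    var = sum((v - mu) ** 2 for v in z) / n       # sample variance (1/N form)
    rho = []
    for tau in range(max_lag + 1):
        lagged = sum((z[i] - mu) * (z[i + tau] - mu) for i in range(n - tau))
        norm = n if biased else n - tau
        rho.append(lagged / (norm * var))
    return rho


data = [1, 2, 3, 4, 5]
rho_u = sample_autocorrelation(data, max_lag=1)               # unbiased variant
rho_b = sample_autocorrelation(data, max_lag=1, biased=True)  # biased variant
```

Both variants agree at lag 0 and differ by the factor (N − τ)/N at lag τ, so the difference only matters when τ is not small compared to N.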

Sample vs. process autocorrelation
If the sample $z_1, \ldots, z_N$ comes from a WSS ergodic process $\{Z_i\}_i$, then $\hat{\rho}_u$ and $\hat{\rho}_b$ estimate the process autocorrelation. This estimate converges provided the autocorrelations decay fast enough (in theoretical literature this is captured by the notion of covariance ergodicity [29]). Confidence intervals for these estimates, when necessary, can be obtained using Bartlett's formula [30].

B.2 Examples
Raw counter values We first estimate autocorrelations of counter values. As expected they are very high, particularly for setups with a quartz reference clock. We use both estimators (40) and (39). We compute the sample autocorrelation function by fast Fourier transform.

C Proof of Lemma 1
Since the process is Gaussian and the variance and mean of $Z_i$ do not change over time, higher moments do not change over time either. We therefore have where the constant does not depend on time $i$. Now let us consider mixed moments which, for the joint Gaussian distribution $(Z_i, Z_j)$, can be simplified as where $\rho(Z_i, Z_j)$ is the correlation. Again $\mathrm{Var}(Z_i)$ does not change over time; moreover $\rho(Z_i, Z_j)$ depends only on the lag $j - i$ and equals zero when $|i - j| > p$ according to our assumptions. Thus where the constant does not depend on $i, j$. By combining Equations (41) and (42) we obtain

D Results of entropy estimation using a Markov chain model and statistical tests required by standards
Entropy was evaluated in six different configurations of the TRNG, including two methods of randomness extraction and two types of oscillators, as explained in Sect. 1:
• Both clock signals generated by ROs, randomness extraction by sampling the clock.
• Both clock signals generated by STRs, randomness extraction by sampling the clock.
• Both clock signals generated by ROs, randomness extraction by counting the clock edges (the least significant bit of the counter represented the random bit).
• Both clock signals generated by ROs, randomness extraction by counting the edges.
• Both clock signals generated by STRs, randomness extraction by counting the edges.
• Both clock signals generated by STRs, randomness extraction by counting the edges.
We used two standardized batteries of statistical tests alongside the method proposed in this article to evaluate the output of the TRNGs:
• Markov chain min-entropy estimate. This method is explained in detail in Section 2.
• German AIS 20/31 test suite from Procedure B, which is intended to test the output of the TRNG core. The entropy estimated by test T8 is the Shannon entropy per random bit.
• American NIST 800-90B test suites for independent and identically distributed data (IID) and non-IID data. If data is detected to be IID, the min-entropy estimate of the IID test track is given. Otherwise, the non-IID entropy estimate is used.