Side Channel Attack On Stream Ciphers: A Three-Step Approach To State/Key Recovery

Side Channel Attack (SCA) exploits the physical information leakage (such as electromagnetic emanation) from a device that performs some cryptographic operation and poses a serious threat in the present IoT era. In the last couple of decades, a large body of research has been dedicated to streamlining/improving the attacks or suggesting novel countermeasures to thwart those attacks. However, a closer inspection reveals that the vast majority of published works in the context of symmetric key cryptography are dedicated to block ciphers (or similar designs). This leaves the problem for stream ciphers wide open. There are a few works here and there, but a generic and systematic framework appears to be missing from the literature. Motivated by this observation, we explore the problem of SCA on stream ciphers in extensive detail. Loosely speaking, our work picks up from the recent TCHES'21 paper by Sim, Bhasin and Jap. We present a framework that improves the efficiency of their analysis and brings it into more practical terms. In a nutshell, we develop an automated framework that works as a generic tool to perform SCA on any stream cipher or a similar structure. It combines multiple automated tools (such as machine learning, mixed integer linear programming and satisfiability modulo theory) under one umbrella, and acts as an end-to-end solution (taking side channel traces and returning the secret key). Our framework efficiently handles noisy data and works even after the cipher reaches its pseudo-random state. We demonstrate its efficacy by taking electromagnetic traces from a 32-bit software platform and performing SCA on a high-profile stream cipher, TRIVIUM, which is also an ISO standard. We show pragmatic key recovery on TRIVIUM during its initialization and also after the cipher reaches its pseudo-random state (i


Introduction
We wish to thank François-Xavier Standaert and the anonymous reviewers for their detailed comments and helpful suggestions.
Symmetric key cryptography is among the cornerstones in ensuring security in present-day electronic communication. Symmetric key ciphers are typically highly efficient compared to their asymmetric counterparts. Consequently, symmetric key ciphers are used whenever the communication protocol allows it.
Therefore, understanding the potential threats that challenge the security of symmetric key ciphers is of prime importance. The security of such a system is usually challenged on two fronts. The first, termed classical attack, rigorously analyses the algorithmic structure. In contrast, the second relies only on the physical characteristics of a device that is running the cipher. Thus, it bypasses the ingenious construction that governs the security against classical attacks. In this work, we concentrate on one such class of device-dependent attack, known as the Side Channel Attack (SCA, for short) [KJJ99, Koc96, MOP07, Pee13]. In this case, the attacker observes physical characteristics such as timing, power consumption, electromagnetic emanation and so on. When the secret component of the cipher (typically termed the key) takes part in the process, it influences the external characteristics of the device. Equipped with this knowledge, the attacker is commonly able to deduce some non-trivial information regarding the key. There is another type of device-dependent attack, the so-called Fault Attack (FA) [BS97].
Being a major concern for security, SCAs have garnered serious attention from the cryptographic community. Still, it appears that block ciphers (and similar constructions) get the limelight practically all the time. As a result, the other main branch of symmetric key cryptography, which deals with stream ciphers (and similar constructions), is apparently under-analysed (see Section 2 for a quick review of the existing literature). In this work, we look for potential solutions to the problem of finding a suitable model for stream ciphers (and similar constructions), using which an efficient side channel analysis would be possible. The framework we present here offers multiple features, e.g., it is an automated framework which can deal with real (noisy) traces for the Hamming weight (software) and Hamming distance (hardware) leakage models. Additionally, it works even after the cipher reaches its pseudo-random phase (we also use the term key-stream phase interchangeably).
To show the usefulness and non-triviality of our work, we point to the following quotes:
"Knowledge about the Hamming weight of intermediate values do probably give enough information to exclude some possible keys and therefore to reduce the key space. However, Hamming weight information by itself is often not sufficient to derive the secret key." - [RO04, Section 1]
"As there is no obvious way to recover the internal state of the cipher after the initialization phase when the key is spread amongst the 288 bits of internal state [of TRIVIUM], applying any kind of side channel attack at a later stage is not promising." - [GBC+08, Section 4.6]

Our Contribution
The challenge we take on here is to design a generic framework that can retrieve the key of a stream cipher or a similar construction given the side channel leakage, with minimal human intervention. Moreover, the framework should be able to retrieve non-trivial information from the leakage during the pseudo-random generation algorithm, and account for noise present in the leakage. The source code is available as open source at the Bitbucket repository¹. While more information about our modelling is given in Section 3, the work-flow can be summarised as follows (see Figure 1 for a pictorial description):
1. Offline Stage:
(a) Get the side channel traces from the target device. Using the proper leakage model, for example Hamming weight for software or Hamming distance for hardware, those traces can be formulated as a multi-class classification problem. For instance, the traces can be formulated as a classification problem with 33 classes when the leakage is from a 32-bit microcontroller, since 33 distinct Hamming weights are possible.
(b) Now an appropriate Machine Learning (ML) model that supports classification can be trained on this problem.
(c) Test the SMT solver for its tolerance limit, t_l, by using simulated noisy information.

2. Online Stage:
(a) With the trained ML model, we estimate the class to which each target trace belongs. Thus, we get some non-trivial information in the form of Hamming weight or Hamming distance from the traces.
(b) This enables us to fit a Satisfiability Modulo Theory (SMT) instance. If all the predictions from the ML model are correct, then the SMT instance returns a solution for the unknown state/key in a reasonable time.
(c) Since the probability of a correct HW prediction by the ML model is not 1, we change the definition of correct prediction. Let c be the original class; then, for a given ε > 0, a prediction ĉ of the class is correct if c − ε ≤ ĉ ≤ c + ε holds, otherwise the prediction is wrong.
(d) Since the prediction from the ML model (Step 2(a)) is not 100% accurate for any given ε < 7, once in a while a wrong prediction will make its way into the SMT instance. This is problematic, since just 1 wrong prediction makes the entire system inconsistent (even though all the remaining predictions are correct). To filter that, we use another model, which is based on Mixed Integer Linear Programming (MILP). This MILP instance takes the sequence of predictions from the ML model, and uses cipher-specific information, such as the fact that the Hamming weight of one block at a certain clock cannot be too dissimilar to that of the same block at the next clock. With a high probability, this MILP instance returns a sequence with no anomaly.
(e) The output from MILP is finally fed into the SMT instance (Step 2(b)), which is then solved with error tolerance t_l (obtained in Step 1(c)).
In actuality, the modellings (ML, MILP and SMT) are more complicated; this comes from the innate nature of the side channel traces. With the naïve ML modelling, we observe that the class-prediction accuracy is quite low. This leads to too many traces being discarded (as SMT needs all traces to be correctly classified); consequently, the same experiment would have to be repeated an impractical number of times. To mitigate the problem, we introduce the concept of error tolerance (denoted by ε), which allows for a middle ground. More information regarding the concept of tolerance can be found in Section 3.2.
A practical evaluation of our framework is described with the TRIVIUM [DCP08] stream cipher (the description of TRIVIUM is omitted here due to space constraints) running on an ARM Cortex-M3 board (32-bit microcontroller) in Section 4 (some additional results on 8- and 16-bit microcontrollers are also shown). In particular, Section 4.2 describes the set-up used, Section 4.3.1 discusses the machine learning related results, the MILP model that corrects the output sequence given by ML is discussed next in Section 4.3.2, following which the SMT model is described in Section 4.3.3. In Section 4.3.4, we summarise the SMT module. Thereafter, our experimental results are given in Section 5. Results on the Hamming weight model (software) are given in Section 5.1 (Section 5.1.1 for the pseudo-random phase and Section 5.1.2 for the initialisation phase), and on the Hamming distance (hardware) model in Section 5.2. To test our framework for different SNRs, we report the success probability of our framework with respect to different SNRs using simulated traces in Section 6. Lastly, we conclude in Section 7.

Related Works
Coming to the side channel attacks on stream ciphers and related constructions, Rechberger et al. [RO04] present a detailed survey of the attacks on stream ciphers and countermeasures in 2004. Fischer et al. [FGKV07] in 2007 propose a DPA on GRAIN-v1 [HJM07] and TRIVIUM [CP08]. Their attack is carried out using multiple chosen IVs. Both attacks are done in the initialisation phase. To remove the noise, they suggest averaging power traces with identical input parameters. Strobel et al. [Str09] carry out a side channel attack based on an algebraic method on GRAIN and TRIVIUM using multiple IVs in the initialisation phase. In 2008, a report on the susceptibility of eSTREAM candidates is submitted by Gierlichs et al. [GBC+08]. In a recent work by Sim, Bhasin and Jap in TCHES'21 [SJB20] (which we refer to as DAPA for simplicity), the authors show that for TRIVIUM a partial key recovery (66 out of 80 bits) is possible from the initialisation phase, with the assumption that the obtained side channel leakage is noise-free.
Indeed, we observe that most of the relevant research works only target the initialisation phase. To the best of our knowledge, the only existing work to address the attack on the pseudo-random generation phase is [KAA+17]. This attack, however, is specific to the CRYPTO-I cipher, and probably does not scale up to mainstream ciphers like TRIVIUM². The difficulty of recovering state bits in the pseudo-random phase is that one can no longer use IV information, as it is mixed up completely in the pseudo-random phase. A comparative summary of the literature survey with our work is shown in Table 1.
In some sense, our work can be associated with the Algebraic Side Channel Attack (ASCA). Some major results related to ASCA on block ciphers can be found in [Jaf07, RS09, CFGR12, RSV09, ORSW12]. Their method to model HW as a SAT equation (described in [CFGR12]) will have a large complexity in the case of stream ciphers, with more than billions of clauses because of the larger state size. ASCA uses a heuristic approach, solving the system of equations using the side channel information. However, the method assumes that the side channel information is decoded and presented perfectly. In a real setting, the side channel information is noise prone, and the side channel decoding might not be error-resilient. Several improvements for ASCA have been proposed to mitigate the error issue, such as extending the equations to accept a set of possible solutions, as shown in Set-ASCA [RSV09, ZWG+11] and Improved ASCA (IASCA) [MBZ+13], or using an optimizer to solve imprecise deductions in the equations, as demonstrated by Tolerant ASCA (TASCA) [OKPW10, ORSW12]. The latest approach is to exploit the soft information provided by profiled side channel attacks, commonly referred to as the Soft-Analysis SCA (SASCA) [VGS14, GGSB20]. Though powerful, SASCA [BCS21] can be quite demanding in terms of the profiling required. However, to the best of our knowledge, all these attacks have only been demonstrated for block ciphers.

Brief Overview of the Framework
As noted already in Section 1, our paper presents an automated framework which is able to recover the unknown state or key bits of a stream cipher (or related constructions) given the side channel information from a device. Our framework is generic, as it can target any type of device (software or hardware), can work with a variety of ciphers that follow the basic Non-linear Feedback Shift Register (NLFSR) based construction (most notably, stream ciphers), considers the real leakage from a device (i.e., does not use simulated traces which are noiseless), needs only one execution of the cipher (thereby respecting the key-IV uniqueness), and can attack initialisation and pseudo-random phases alike. It involves three solvers (ML, MILP and SMT) that work in conjunction and return the output in a reasonable time.
From a top level view, the overall framework can be conceptually split into two steps:
(1) Predict Hamming weight/distance from traces. We need to get a good estimate of the Hamming weight (HW) on software or Hamming distance (HD) on hardware, given the leakage information. For simplicity and conciseness, we focus on the HW model initially. Later on, we take a step forward to show that our SMT model works for the Hamming distance model also.
(2) Fit an automated tool to retrieve key. Next, we need to fit the side channel information obtained to an automated tool which will eventually compute the secret key. If we are targeting the pseudo-random generation stage of the cipher, the key-stream is available, and that can be used in the solution process too.
We intuitively choose ML (as it has native support for classification) for Step (1) and SMT for Step (2). However, the overall method turns out to be considerably more complex than just that, as described in Section 3.2.
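The two leakage models named in Step (1) can be sketched in a few lines. This is only an illustration, assuming a b-bit word is held in a Python integer; the function names are ours and are not taken from the paper's repository.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits in x (software leakage model)."""
    return bin(x).count("1")

def hamming_distance(old: int, new: int) -> int:
    """Number of bit positions that flip between two register values
    (hardware leakage model)."""
    return hamming_weight(old ^ new)

def num_classes(word_bits: int) -> int:
    """A b-bit word has HW in {0, ..., b}, hence b + 1 classes."""
    return word_bits + 1

print(num_classes(32))                            # 33
print(hamming_weight(0xDEADBEEF))                 # 24
print(hamming_distance(0x0000FFFF, 0xFFFF0000))   # 32
```

The class count is what turns trace labelling into a 33-way classification problem on a 32-bit platform (9 and 17 classes for 8-bit and 16-bit words, respectively).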

Description of the Framework
The basic overview of our framework given in Section 3.1 suffers from two major problems. The relevant discussions, along with the respective remedies, are given next.

Low Accuracy by Machine Learning Model
The first major problem comes from the difficulty of classifying Hamming weight/distance with the ML instance. This happens due to the relative proximity of one class to the others. With the naïve implementation, the ML accuracy turns out to be a little less than 40% (for more details of our ML model, see Section 4.3.1), despite dedicated efforts to increase it.
Having such a low accuracy is a problem, as the later part, which takes these predictions and tries to solve for the unknown components, needs to be fed with 100% accurate predictions. Only one wrong prediction would potentially make the entire SMT system inconsistent. An initial estimate suggests that we would need a few hundred predictions for HW/HD for a typical cipher like TRIVIUM [DCP08]. This means that one has to run an impractical number of independent experiments in order to get all correct predictions from the ML model.
To find a way to increase the accuracy of the ML model, we make use of the following observation. We notice that, given that the correct class is c, the model is more likely to return some class from {c − ε, ..., c, ..., c + ε} than exactly c, for ε ≥ 1. In other words, it is highly likely that the ML model predicts a class in the proximity of the actual class, even when it misses the actual class itself. This gives us the idea of tolerance, ε, which can be interpreted as follows (consider the correct class to be c): (i) When ε = 0, the accuracy of the ML prediction is counted as-is (i.e., we count the case as accurate if the predicted class is c). (ii) When ε > 0, we count the case where the predicted class belongs to {c − ε, ..., c, ..., c + ε} as accurate. Intuitively, as ε increases, the corresponding (custom definition of) accuracy increases. At the same time, the SMT instances can be tweaked to work with more flexible predictions (that contain the tolerance ε). More specifically, any SMT constraint of the form x = c can be reformulated with ε as c − ε ≤ x ≤ c + ε (x can be, for example, the HW of the state and c the predicted class). Thus, the concept of error tolerance, ε, allows us to increase the accuracy of the ML prediction, and is compatible with SMT modelling.
However, a trade-off for the tolerance has to be made. Making ε larger increases the ML prediction accuracy, but makes the SMT instance less effective. Ultimately, beyond a certain threshold for a given platform, the SMT solver takes an impractical amount of time, rendering it useless. In our experiment with the ARM Cortex-M3, we observe that the ML model achieves an accuracy of 100% starting from tolerance ε = 7 (Table 3). For this level of tolerance, the formulated SMT instances appear to take too much time for our target cipher TRIVIUM (more results can be found in Section 4). This implies that we need to reduce ε to get a solution from the SMT problem in a reasonable time, at the expense of a reduced accuracy of the ML model (unless high performance computing resources are used). Reducing ε further to 4, we observe that the corresponding SMT instance is solvable in our set-up, but the solution time is around 13 to 21 hours. Finally, we settle on ε = 3, where the ML accuracy is 99.784%, which is quite high, and the SMT instances can be solved in about 10 hours on average. This choice of ε appears to hit the sweet spot for the conflicting requirements. An instance of the scenario can be seen in Table 2, where the number in parentheses represents the number of rounds³ used for SMT modelling with TRIVIUM during its pseudo-random phase (key-stream information is used in the solution process).
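The tolerance-based notion of accuracy and the matching relaxation of an SMT equality can be illustrated as follows. This is a minimal sketch on made-up numbers (not measured values); the function names are ours, and the clipping of the interval to the valid HW range is an assumption added for the illustration.

```python
def tolerant_accuracy(true_hw, predicted_hw, eps):
    """Fraction of predictions that land within eps of the true class."""
    hits = sum(abs(p - c) <= eps for c, p in zip(true_hw, predicted_hw))
    return hits / len(true_hw)

def relaxed_interval(predicted, eps, word_bits=32):
    """Interval (lo, hi) that replaces the exact constraint x = predicted,
    clipped to the valid HW range of a word_bits-wide word (our assumption)."""
    return max(0, predicted - eps), min(word_bits, predicted + eps)

# Toy data, invented for illustration only.
true_hw = [16, 14, 20, 9, 17, 31]
pred_hw = [15, 14, 23, 13, 17, 32]

for eps in (0, 1, 3, 4):
    print(eps, tolerant_accuracy(true_hw, pred_hw, eps))

print(relaxed_interval(31, 3))   # (28, 32): the upper end is clipped at 32
```

On this toy data the counted accuracy rises monotonically with the tolerance, while each interval handed to the solver widens from one value to up to 2·eps + 1 values, which is the trade-off discussed above.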

No Resilience by SMT Model
If we could use a higher tolerance (e.g., ε ≥ 7 in Section 3.2.1) so that the ML prediction works with 100% accuracy, then we would be certain that all the information passed to the SMT problem is error free. However, as we have seen, increasing the tolerance too much would also mean that the SMT problem gets less information, which leads to a longer solution time for the solver. Although the choice of ε (which is 3 in our case) allows for a very high accuracy of the ML model (99.784%), it is not 100%. This means that about 2 in 1000 predictions on average will be wrong (wrong in this context means that those predictions are outside the desired tolerance). As mentioned in Section 3.2.1 already, this is a problem since one wrong prediction would render the whole SMT instance inconsistent. There is no way for the attacker to know if/when a system is inconsistent until the SMT instance returns inconsistent, and even after that the attacker does not know which prediction(s) is wrong. This, in a way, leads us back to the same problem of Section 3.2.1. However, the difference now is that, with the introduction of (non-zero) tolerance, the prediction accuracy of the ML model is quite high. This suggests that there could be redundant information, so that an error correction would be feasible.
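To see why even 99.784% accuracy is not enough by itself, the following sketch computes the chance that every prediction in a batch is acceptable. It assumes that predictions err independently, and the batch size N = 300 is a hypothetical stand-in for "a few hundred predictions"; neither is a figure reported in the paper.

```python
def prob_all_within_tolerance(accuracy: float, n_predictions: int) -> float:
    """Probability that every one of n independent predictions is acceptable."""
    return accuracy ** n_predictions

def expected_wrong(accuracy: float, n_predictions: int) -> float:
    """Expected number of predictions that fall outside the tolerance."""
    return (1.0 - accuracy) * n_predictions

N = 300  # hypothetical number of HW predictions handed to the solver

for label, acc in (("eps = 0", 0.40), ("eps = 3", 0.99784)):
    print(label, prob_all_within_tolerance(acc, N), expected_wrong(acc, N))
```

Under these assumptions the exact-class setting essentially never yields a fully correct batch, whereas with tolerance 3 roughly every second batch still contains at least one out-of-tolerance prediction. That residual error is what the MILP correction step is meant to absorb.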

MILP Model for Correction of Wrong Predictions
As mentioned earlier in Section 3.2.1, a generic MILP model attempts to correct the ML predicted sequence so that all predictions are within the desired tolerance limit (so that the final model based on SMT does not return inconsistent⁴).
Indeed, keeping in mind that the ML model does not take into account the cipher-specific information (it predicts the classes independently), an additional modelling can be put in place. Here we make use of the following types of information for a b-bit software platform (b consecutive bits of the state are updated at one clock; these are referred to as a block):
(A) Change of HW for the entire state from one clock to the next. Observe that the maximum increase in the HW of the internal state over any two consecutive rounds is equal to the number of incoming bits to the internal state during the state update. Similarly, the maximum decrease in the HW of the internal state is equal to the number of outgoing bits. Also, since the microcontroller splits the state and runs the blocks consecutively, the change in the Hamming weight of each such block from round t to t + 1 depends upon the number of incoming and outgoing bits of that block.
(B) Interaction among the blocks.
• If we encounter any block of the type shown in Figure 6(a), then after b rounds (b being the word size of the microcontroller used), the HW of one block will be transferred to the next/previous block.
• One bit from the previous block is shifted to the next block on the right side. This forms a conditional constraint, i.e., the change in the HW of the next block depends upon whether the incoming bit is 1 or 0.
To capture the observations in (A) and (B), we devise a third model. It takes the sequence of predictions by the ML model, and formulates an MILP problem (described in Section 4.3.2) which returns a (possibly) slightly modified sequence so that all the predictions are within the desired tolerance. Finally, the output from this MILP instance is given to the SMT solver, which solves for the unknown state/key of the cipher (see Section 3.2.4 for a discussion on this).
This MILP model works as follows. It takes a sequence of predicted classes from ML as an input and returns a sequence of classes which is not far from the predicted classes, and which respects the internal cipher structure (noted in (A) and (B)). The catch, however, is that it may not work all the time, i.e., the returned sequence may fall outside the desired tolerance limit. Empirically, out of 1000 independent trials (here b = 32), we observe that the returned sequence of classes lies within the desired tolerance limit in 976 trials⁵ (see Section 4.3.2 for more details). That means that, though the MILP succeeds most of the time, there is a probability of about 0.024 that the constraints to the SMT problem will be inconsistent.

Overall Construction of Our Framework
With all the information presented thus far, we are now able to describe the complete work-flow of our framework; a visualisation of it is given in Figure 1. During the offline (pre-processing) phase, we train our ML model with the side channel traces. After this, during the online stage (when the actual attack is underway), this ML model is first queried with the observed sequence of traces, and predictions from this model are noted with a certain predefined tolerance, ε. These predictions are made independently for each trace, thus they fail to account for the internal structure of the cipher. By capturing those specifics with an MILP model, we are able to ensure (with high probability) that the sequence of predictions falls within the predefined ε. This sequence is finally fed to an SMT solver that gives the solution for the state/key of the cipher.
Recovery Procedure for Inconsistent SMT Instance
In the unlikely case that the SMT instance is inconsistent, we recommend repeating the attack in a combination of the following ways (the recommendation in (β) is probably the most practical):
(α) Retrain the ML model and run the prediction with the same traces as input. As the initial parameters of the ML model are randomised, it may return a different sequence which could resolve the issue.
(β) While collecting the traces, collect them for an additional number of rounds. As shown in Section 4, we realistically need a little over 100 rounds of traces. If, say, the trace is collected for about 500 rounds, some other part can potentially be targeted if one part seems not fixable by the MILP model. Since the success probability of MILP is quite high, we will most likely succeed within 1 or 2 trials.
(γ) Collect the traces from the device again⁶.
However, in MILP, increasing the number of classes decreases the success probability, and for SMT the solution time increases after a threshold. Therefore, a lower number of classes is beneficial for our framework, provided the SMT instances are solvable. A similar idea was also explored in the context of block ciphers in [RSV+11].
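The claim in option (β) that one or two trials usually suffice follows from the empirical MILP success rate of 976 out of 1000. A minimal sketch, assuming that attempts on different trace segments succeed independently (an assumption of ours, not a measured property):

```python
def success_within(trials: int, p_success: float = 976 / 1000) -> float:
    """Probability that at least one of `trials` independent attempts yields
    an MILP-corrected sequence inside the tolerance."""
    return 1.0 - (1.0 - p_success) ** trials

for k in (1, 2, 3):
    print(k, success_within(k))
```

With a single-attempt failure probability of about 0.024, two attempts already fail together with probability below 0.001 under this independence assumption.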

Note on State/Key Recovery
In our attack on the pseudo-random generation phase (where the cipher is ready to produce key-stream or tag), we are able to recover the (unknown) state. Depending on the internal structure, (full) key recovery may or may not be possible. For our exemplary case with TRIVIUM, the state update is invertible, thus the state can be reverted back to the key/IV loading phase, finally enabling us to recover the key. In certain cases, however, the state update is not invertible; in that case, (full) key recovery may not be possible, for example with LIZARD [HKM17]. In DAPA, the authors recover 66 (out of 80) bits of the secret key; the remaining 14 bits are to be found by exhaustive search. As a direct comparison, we are able to retrieve the full (80-bit) key of TRIVIUM.
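The invertibility of the TRIVIUM state update can be checked directly. The sketch below follows the public TRIVIUM specification (the paper itself omits the cipher description); the function names and the bit-list representation are ours, and key-stream output is left out because it is not needed for inversion.

```python
import random

def trivium_forward(s):
    """One TRIVIUM state update; s is a list of 288 bits, s[0] being s_1."""
    t1 = s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]
    t2 = s[161] ^ s[176] ^ (s[174] & s[175]) ^ s[263]
    t3 = s[242] ^ s[287] ^ (s[285] & s[286]) ^ s[68]
    return [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]

def trivium_backward(n):
    """Undo one state update: recover the previous 288-bit state from n."""
    s = [0] * 288
    s[0:92] = n[1:93]         # s_1..s_92 were shifted into positions 2..93
    s[93:176] = n[94:177]     # s_94..s_176 were shifted into positions 95..177
    s[177:287] = n[178:288]   # s_178..s_287 were shifted into positions 179..288
    # The three bits that were shifted out follow from the three feedback bits.
    s[92] = n[93] ^ s[65] ^ (s[90] & s[91]) ^ s[170]
    s[176] = n[177] ^ s[161] ^ (s[174] & s[175]) ^ s[263]
    s[287] = n[0] ^ s[242] ^ (s[285] & s[286]) ^ s[68]
    return s

random.seed(1)
start = [random.randint(0, 1) for _ in range(288)]
later = start
for _ in range(1152):
    later = trivium_forward(later)
back = later
for _ in range(1152):
    back = trivium_backward(back)
print(back == start)  # True
```

Once a state from the key-stream phase is known, clocking backwards in this way reaches the loaded state, from which the key bits can be read off.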

Target Device:
We take electromagnetic traces from an ARM Cortex-M3, a 32-bit microcontroller (refer to Section 4.2 for more information on the setup). The corresponding attacks for a 16-bit or 8-bit microcontroller are easier, as we show in Sections 5.1.1 (for the key-stream phase) and 5.1.2 (for the initialisation phase, which is typically easier to solve). We also show some results related to hardware leakage in Section 5.2.
3. Tools: For ML (Section 4.3.1), we use a Multi-layer Perceptron (MLP) due to its simplicity, with PyTorch⁹. The Gurobi¹⁰ solver is used to solve MILP instances (Section 4.3.2). Finally, as for the SMT solver (Section 4.3.3), we choose Z3¹¹.

Experimental Setup
For the experiments, we implemented the targeted operation in assembly on an Arduino DUE (ARM Cortex-M3) with 512KB flash, 96KB SRAM and 84 MHz operating frequency. We use a Riscure high precision EM probe to capture the leakage, on a Lecroy WaveRunner 610zi oscilloscope. A preliminary test with known fixed data is conducted to identify the best measurement spot on the board. We first measure the leakage by running a grid search across the target device, and calculate the SNR [MOP07, BDGN13] for each position. Then, we choose the spot with the best SNR, shown in Figure 2 (i.e., SNR ≈ 3.3).
For profiling, we use stratified sampling to ensure all the HW classes are represented in the experiment. Around 33000 traces are collected, with one stratum for each HW from 0 to 32 (each HW corresponds to 1000 traces). Overall, 2362000 ≈ 2^21.17 traces are used.
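The per-position SNR can be estimated from traces recorded with known data. The paper cites [MOP07, BDGN13] without restating the formula, so the sketch below assumes the commonly used definition (variance of the class means divided by the mean of the class variances) at a single time sample; the data are invented.

```python
from statistics import mean, pvariance

def snr(samples_by_class):
    """SNR at one time sample: variance of the per-class means (signal)
    divided by the mean of the per-class variances (noise)."""
    class_means = [mean(v) for v in samples_by_class.values()]
    class_vars = [pvariance(v) for v in samples_by_class.values()]
    return pvariance(class_means) / mean(class_vars)

# Toy measurements at one probe position, keyed by the HW of the known data.
toy = {
    0: [0.0, 0.2, -0.2],
    1: [1.0, 1.2, 0.8],
    2: [2.0, 2.2, 1.8],
}
print(snr(toy))
```

In a grid search, this quantity would be evaluated for every probe position (and time sample), and the position with the largest value kept.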

Part I: ML to Predict Classes From Traces
With the experimental set-up described in Section 4.2, here we describe the results related to machine learning. This is a profiled SCA (supervised learning, in the context of ML) based on a classification problem with 33 classes (as the HW for the target microcontroller lies in [0, 32]). This kind of usage of ML in SCA is not uncommon; one may refer to [PHJ+17] for example. We choose MLP for the classification problem owing to its simplicity, though other ML models could be used. We collect 1000 traces for each of the HW classes for profiling, so in total 33000 traces are collected. After that, we collect more samples for the ML training. The total data-set size (training + validation + testing) for the ML training, including the traces for profiling, is 2362000 ≈ 2^21.17. We note that all classes are represented in the 2^21.17 samples, but the data-set is imbalanced, i.e., the class distribution is unequal.
MLP models with input and output layers of size 500 and 33, respectively, are trained with a 62.5/12.5/25 training/validation/testing split and a batch size of 64. All the experiments are done with Python 3.8.5, PyTorch 1.8.1+cu102 and Numpy 1.20.3 on a server with an Intel Xeon Platinum 8260 CPU, an NVIDIA Quadro GV100 32GB GPU, 256GB RAM and Ubuntu 20.04.1 LTS as the operating system.
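The stated split and batch size translate into the following sizes. This is plain arithmetic on the figures above, assuming the percentages apply to the full data-set of 2362000 traces.

```python
import math

TOTAL_TRACES = 2_362_000
SPLIT = {"training": 0.625, "validation": 0.125, "testing": 0.25}
BATCH_SIZE = 64

sizes = {name: round(TOTAL_TRACES * frac) for name, frac in SPLIT.items()}
batches_per_epoch = math.ceil(sizes["training"] / BATCH_SIZE)

print(sizes)
print(batches_per_epoch)
print(math.log2(TOTAL_TRACES))  # about 21.17
```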
ReLU is used as the activation function after the hidden layers. AdamW [LH17] with learning rate 0.0001 is used as the optimizer, with weighted cross entropy loss as the loss function to account for class imbalance. If overfitting is detected during training, the training process is terminated prematurely and the models are discarded. The results from fully trained models are indicated in Table 3.
We experiment with other activation functions, as well as deeper networks. However, we observe that the performance of all the other models is comparable to, if not worse than, that of the model with 2 hidden layers. Moreover, the model with 2 hidden layers works efficiently due to its low depth.
Optuna
We also experiment with the automatic hyper-parameter optimisation framework Optuna¹² to determine the optimal number of hidden layers and hidden layer size (following [KEI+21, Section 4]). A data-set of size 2^19.345 with an 80/20 training/validation split is used. The number of trials is set to 500 and default settings are kept. At the end of the optimisation process, Optuna generates a model with 3 hidden layers, each of size 416, as the optimal model with 569121 parameters. This model, when retrained on the full data-set, produces results similar to our 2-layer model with 84897 parameters, as indicated in Table 3. During the optimisation process, we observe that Optuna would pick the same set of hyper-parameters multiple times, each time with a slightly different validation accuracy, without pruning the trial. The reported accuracy in Table 3, for the different data splits, is the average accuracy of all batches in an epoch, rounded to 5 decimal places. All in all, we conclude that our model with 2 hidden layers (with 128 neurons per layer) is the better choice compared to the models returned by Optuna, as it achieves similar results with far fewer parameters. Also, with this MLP, the accuracy that we observe (when the tolerance ε = 0) is just shy of 40%.
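One way to see where class imbalance comes from: for uniformly random 32-bit words, the HW follows a binomial distribution, so extreme classes are extremely rare. The sketch below computes that distribution and one common weighting choice (inverse class frequency). Both the uniform-data assumption and the weighting scheme are ours for illustration; the paper does not state how its loss weights were chosen, and its data-set also contains the stratified profiling traces.

```python
from math import comb

WORD_BITS = 32

def hw_class_probabilities(bits=WORD_BITS):
    """P(HW = k) for a uniformly random `bits`-wide word (binomial law)."""
    return [comb(bits, k) / 2**bits for k in range(bits + 1)]

def inverse_frequency_weights(probs):
    """Weight of class k proportional to 1 / P(class k), rescaled so that
    the most frequent class has weight 1."""
    smallest = min(1.0 / p for p in probs)
    return [(1.0 / p) / smallest for p in probs]

probs = hw_class_probabilities()
weights = inverse_frequency_weights(probs)
print(probs[16])                 # most likely class, about 0.14
print(probs[0])                  # HW = 0 has probability 2**-32
print(weights[16], weights[0])
```

A list such as `weights` is the kind of per-class vector that a weighted cross entropy loss expects.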
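The two parameter counts quoted above can be reproduced by counting weights and biases of fully connected layers, given the input size 500, the output size 33 and the stated hidden layer widths. The helper name is ours.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given
    layer widths (input, hidden..., output)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(mlp_parameter_count([500, 128, 128, 33]))        # 84897
print(mlp_parameter_count([500, 416, 416, 416, 33]))   # 569121
```

The model found by Optuna is thus roughly 6.7 times larger than the model with 2 hidden layers.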

Part II: Correction of ML Predictions by MILP
As discussed in Section 3.2.3, we need a model that can correct the sequence of ML predicted classes. MILP appears to be more efficient than SMT for this purpose.
Observe from Table 3 that increasing the tolerance improves accuracy, but on the other hand it also increases the SMT solution time (check Table 2). In TRIVIUM, the designed SMT model is able to recover the state bits only if all of the input classes are within the error tolerance 4 in the key-stream generation phase, i.e., the predicted HW class should lie in the range [HW_org − 4, HW_org + 4] (see Tables 5a, 5b and Section 5.1.1), where HW_org refers to the original HW class. Here, we focus on error tolerance 3, since the SMT solution time is lower; however, 0.216% of the HW classes are then incorrectly predicted. In order to correct those anomalies, we have designed an MILP model.
In order to show that our attack is generic, we first describe the MILP model on a general FSR-based stream cipher under the HW model. Without loss of generality, we can assume that the FSRs involved in the cipher are left oriented. As shown in Figure 3(a), assume the cipher has n registers (R_i, for i ∈ {0, 1, ..., n − 1}) and their elements are shifted to the left during an update. The blue arrows represent the incoming and outgoing bits of each register and the black arrow shows that bits are shifted in the left direction during an update. Let us divide the internal state into a sequence of consecutive blocks (B_i, for i ∈ {0, 1, ..., m − 1}), each of the same length as the word size of the microcontroller (8/16/32) used, see Figure 3(b). A dashed black arrow represents that the last bit of a block is either shifted to its immediate left block or discarded.
[Figure 3: General structure of a stream cipher based on FSRs]
The main idea of the MILP modelling is to exploit the dependency relations of the blocks shown in Figure 3(b) and to keep track of incoming and outgoing bits. Using these relations, we search for a sequence of HWs which follows these relations and is near to the predicted sequence of HWs. Therefore, the optimisation goal is to minimise the distance of the HW sequence variables from the predicted HW sequence. It is to be noted that these relations have not been exploited by either ML or SMT. Now, the algorithm to generate the constraints and objective function for the MILP instance from the block dependency relations is as follows.
Objective Function: Minimise the distance between the sequence of HW variables and the sequence of predicted HWs.
Constraint I: Since there are $n$ registers involved (see Figure 3(a)), the number of incoming bits for the internal state is equal to the number of outgoing bits (blue arrows), which is equal to the number of registers, i.e., $n$. This implies that the HW of the internal state changes by at most $n$ between consecutive rounds.
Constraint II: For block $B_i$ ($0 \le i \le m-1$), assume that the number of incoming bits is $in_i$ and the number of outgoing bits is $out_i$. Then the HW of block $B_i$ after an update either increases by at most $in_i$ or decreases by at most $out_i$. See Figure 4, where the blue lines represent the bits coming from or going to the register update function, and the black lines represent the bits coming from or going to the immediate left or right block.
Constraint III: This constraint applies to accepted pairs of consecutive blocks (see Figure 5(a)) and has two cases.
Case 1: Pair of the type shown in Figure 6(a), where the dashed arrow on block $B_i$ shows that the last bit is either fed to the block $B_{i-1}$ (black) or discarded (blue). Similarly, for block $B_{i+1}$ the incoming bit at the last position comes either from the register update function or from the block $B_{i+2}$. In this case, we can observe that the HW of the block $B_{i+1}$ at round $t$ becomes the HW of the block $B_i$ at round $t + mc_{len}$, where $mc_{len}$ is the word size of the microcontroller, i.e., the block length.
Case 2: Pair of the type shown in Figure 6(b). Let $in_i$ ($in_{i+1}$) represent the number of incoming bits to the block $B_i$ ($B_{i+1}$), and let $out_i$ ($out_{i+1}$) represent the number of outgoing bits from the block $B_i$ ($B_{i+1}$). Observe that if the HW of the block $B_{i+1}$ increases by $in_{i+1}$ (i.e., the maximum increase) in the next round, then all of its outgoing bits are 0, which in turn implies that one of the incoming bits to the block $B_i$ is 0 (see the black arrow between $B_{i+1}$ and $B_i$). Hence the HW of $B_i$ can increase by at most $in_i - 1$ in that round. Similarly, if the HW of the block $B_{i+1}$ decreases by $out_{i+1}$ (i.e., the maximum decrease) in the next round, then all outgoing bits from $B_{i+1}$ are 1, which implies that one of the incoming bits to the block $B_i$ is 1. Thus the HW of the block $B_i$ can decrease by at most $out_i - 1$ in the next round.
By construction, the sequence of original HW classes satisfies Constraints I, II and III. Through the MILP modelling we search for a sequence of classes which satisfies the above dependency relations and is close to the sequence of predicted classes. So if there are a few wrongly predicted classes, the model returns a sequence which is closer to the original classes (because of Constraints I, II and III). We can therefore formulate this as a minimisation problem. The MILP instance is given next.
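The idea of searching for the nearest sequence that respects the dependency relations can be illustrated on a single block. The sketch below is not the MILP model: it swaps in a small dynamic program, assumes the L1 distance as the objective, and uses only the per-block bound of Constraint II with one incoming and one outgoing bit. The function name and the toy data are hypothetical.

```python
def correct_hw_sequence(predicted, max_hw, max_up=1, max_down=1):
    """Closest (L1) sequence to `predicted` whose steps lie in [-max_down, max_up]."""
    values = range(max_hw + 1)
    cost = [abs(h - predicted[0]) for h in values]  # best cost ending in each HW
    back = []                                       # back[t][h] = best predecessor
    for p in predicted[1:]:
        new_cost, pointers = [], []
        for h in values:
            # A predecessor g is allowed if -max_down <= h - g <= max_up.
            lo, hi = max(0, h - max_up), min(max_hw, h + max_down)
            g = min(range(lo, hi + 1), key=cost.__getitem__)
            new_cost.append(cost[g] + abs(h - p))
            pointers.append(g)
        cost = new_cost
        back.append(pointers)
    # Walk the stored predecessors backwards from the cheapest final value.
    h = min(values, key=cost.__getitem__)
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    return path[::-1]


# Toy data (hypothetical): the value 25 cannot follow 16 in a single round.
predicted = [16, 17, 16, 25, 16, 15]
print(correct_hw_sequence(predicted, max_hw=32))
```

For this input the outlier 25 is pulled back to 17, the largest value reachable from its neighbours, while the remaining predictions stay untouched. The MILP model plays the same role over all blocks and all three constraints at once.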
The following notation is used subsequently. Let $N$ be the number of rounds; then for $0 \le i \le N-1$ and $0 \le j \le m-1$:
1. $n$ is the number of registers in the stream cipher,
2. $mc_{len}$ is the word size of the microcontroller used,
3. $HW_{var}[i, j]$ = MILP variable for the HW of block $j$ at round $i$,
4. $HW_p[i, j]$ = predicted HW of block $j$ at round $i$,
5. $HW_{org}[i, j]$ = original HW of block $j$ at round $i$,
6. $in_j$ represents the number of incoming bits to block $B_j$,
7. $out_j$ represents the number of outgoing bits from block $B_j$.
Let $D_1 = \{0, \ldots, N-1\}$ and $D_2 = \{0, \ldots, m-1\}$. The mathematical representation of the aforesaid MILP model is then as follows.

Type of Constraints:
For $i \in D_1 \setminus \{N-1\}$ and $j \in D_2$, do the following.

Constraint I: $\sum_{j \in D_2} (HW_{var}[i, j] - HW_{var}[i+1, j]) \le n$

Constraint II: $-out_j \le HW_{var}[i+1, j] - HW_{var}[i, j] \le in_j$

Constraint III: For $j \in D_2 \setminus \{0\}$ such that the pair of blocks $(B_{j-1}, B_j)$ is in the accepted pair category (see Figure 5(a)):
• Case 1: If the pair $(B_{j-1}, B_j)$ is of the type shown in Figure 6(a), then $HW_{var}[i + \ldots$

In this system of MILP constraints, Constraint III comprises constraints of the form Condition 1 $\implies$ Condition 2. We cannot model this implication directly in Gurobi. However, we can model it through indicator constraints¹³, which are of the form $(z = b) \implies$ Condition, for $b \in \{0, 1\}$. Therefore, the above conditional statement can be converted into a pair of indicator constraints as follows: $(z = 1) \implies$ Condition 2; $(z = 0) \implies \neg$(Condition 1), where $\neg$ denotes negation and $z$ is a dummy binary variable.
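That the pair of indicator constraints is equivalent to the original implication can be checked with a truth table. The sketch below is plain Python and does not call Gurobi; both function names are illustrative. It tests, for every truth assignment of the two conditions, whether some binary $z$ satisfies both indicator constraints.

```python
from itertools import product


def implication(cond1, cond2):
    """Truth value of (Condition 1 => Condition 2)."""
    return (not cond1) or cond2


def indicator_pair_feasible(cond1, cond2):
    """True if some binary z satisfies (z = 1 => cond2) and (z = 0 => not cond1)."""
    for z in (0, 1):
        first = (z != 1) or cond2
        second = (z != 0) or (not cond1)
        if first and second:
            return True
    return False


for cond1, cond2 in product([False, True], repeat=2):
    print(cond1, cond2, implication(cond1, cond2), indicator_pair_feasible(cond1, cond2))
```

The last two columns agree in all four rows: the only infeasible case is the one where Condition 1 holds and Condition 2 does not, which is exactly the case the implication forbids.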
Coming back to the TRIVIUM MILP modelling, first note that TRIVIUM consists of right shift registers, whereas the above model is described for left shift registers. Constraints I and II remain the same for both left and right FSR-based ciphers, whereas in Constraint III, $j - 1$ is replaced by $j + 1$. Since TRIVIUM comprises 3 registers, we can replace $n$ by 3 in Constraint I. For Constraint II, observe from Figure 7 that for any block the number of incoming bits is the same as the number of outgoing bits, i.e., $in_j = out_j$ for $0 \le j \le 8$. In particular, $in_j = 1 = out_j$ for all $j \in \{0, 1, 3, 4, 6, 7, 8\}$, whereas $in_j = 2 = out_j$ for all $j \in \{2, 5\}$. Thus we can form Constraint II. In TRIVIUM, all consecutive pairs of blocks fall under the accepted pair category (see Figure 5(a)). Therefore, Case 2 of Constraint III holds for all consecutive pairs of blocks. For Case 1 of Constraint III, the block pairs $(B_0, B_1)$, $(B_3, B_4)$, $(B_6, B_7)$ and $(B_7, B_8)$ are the candidates.
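As a sanity check of these TRIVIUM parameters, the following sketch simulates the state update on a random 288-bit state and verifies that the resulting block HWs satisfy Constraints I, II and III. The update function follows the public TRIVIUM specification; the checker, its names and the way Case 1 and Case 2 are written are our reading of the prose above for right shift registers, not the MILP code used in the experiments.

```python
import random

BLOCK = 32                                  # mc_len for the 32-bit platform
N_BLOCKS = 9                                # 288 / 32
IN_OUT = [1, 1, 2, 1, 1, 2, 1, 1, 1]        # in_j = out_j, as stated in the text
CASE1_PAIRS = [(0, 1), (3, 4), (6, 7), (7, 8)]


def trivium_round(s):
    """One state update on a list of 288 bits (s[0] is s_1 in the specification)."""
    t1 = s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]
    t2 = s[161] ^ s[176] ^ (s[174] & s[175]) ^ s[263]
    t3 = s[242] ^ s[287] ^ (s[285] & s[286]) ^ s[68]
    # Right shift of the three registers; t3, t1, t2 enter at positions 0, 93, 177.
    return [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]


def block_hws(s):
    return [sum(s[BLOCK * j:BLOCK * (j + 1)]) for j in range(N_BLOCKS)]


def hw_sequence(seed, rounds):
    """Block HWs of a random 288-bit state over the given number of rounds."""
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(288)]
    seq = []
    for _ in range(rounds):
        seq.append(block_hws(s))
        s = trivium_round(s)
    return seq


def satisfies_constraints(hw):
    for i in range(len(hw) - 1):
        delta = [hw[i + 1][j] - hw[i][j] for j in range(N_BLOCKS)]
        if abs(sum(delta)) > 3:                                   # Constraint I
            return False
        if any(abs(d) > IN_OUT[j] for j, d in enumerate(delta)):  # Constraint II
            return False
        for j in range(N_BLOCKS - 1):                             # Constraint III, Case 2
            if delta[j] == IN_OUT[j] and delta[j + 1] > IN_OUT[j + 1] - 1:
                return False
            if delta[j] == -IN_OUT[j] and delta[j + 1] < -(IN_OUT[j + 1] - 1):
                return False
    for i in range(len(hw) - BLOCK):                              # Constraint III, Case 1
        if any(hw[i + BLOCK][b] != hw[i][a] for a, b in CASE1_PAIRS):
            return False
    return True


hw = hw_sequence(seed=2022, rounds=200)
print(satisfies_constraints(hw))
```

A genuine HW sequence passes all checks, whereas a sequence with a single grossly wrong class (for example, one entry raised by 5) fails Constraint II. This is the redundancy that the MILP model exploits to correct wrong predictions.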
In Table 4, we note down the results as follows. We conducted experiments on a simulated cipher, where noise is injected following the same distribution as that of the ML output (MLP-II in Table 3). We collect 1000 such random samples, where each sample is a sequence of noisy HW classes whose noise follows the MLP-II distribution, and feed each sample to the MILP solver. To measure the success probability, we count the experiments in which all of the predicted classes are corrected to within tolerance 3. The percentage of such experiments among all experiments is given in the third column.
As can be seen from Table 4, a solution for Constraints I + II + III can be obtained in a matter of seconds. The table also shows that the success rate varies with the number of rounds. The best success rate, 97.6%, is attained at 110 rounds. This means that, out of 1000 independent experiments (where the input to each experiment is a sequence of 110 × 9 = 990 classes), the model successfully corrects the input sequence in approximately 976 experiments. If we repeat the same experiment independently $n$ times (here $n$ denotes the number of trials, not the number of registers), then the probability that we succeed at least once is $1 - (1 - 0.976)^n \approx 0.999424$ (for $n = 2$), with trace information for 110 rounds. Thus we can correct the sequence of predicted HW classes in 1 or 2 trial(s) with very high probability when trace information for 110 rounds is available.
All of the computations reported here are done with the MILP solver Gurobi-9.1.2 and Python-3.8.12, on an Intel Xeon E5-2670 v3 @ 2.30GHz CPU running a 64-bit Ubuntu-20.04 operating system. Note that the solution time taken by Gurobi increases with the number of rounds. So if we need to correct classes for a higher number of rounds and the solution time becomes high, we can drop one or two constraints to reduce the solution time. The success probability may differ depending on the types of constraints used. Gurobi's performance also varies with the system configuration and the number of threads. In our experiments, increasing the number of threads increased both the solution time and the memory usage, so we carried out our experiments using one thread.

Part III: SMT to Solve for Unknown State/Key
Up to this point, we presume that the attacker has side channel traces in possession and obtains the ML-predicted sequence of classes (see Section 4.3.1). The attacker then uses the MILP model to correct the sequence of predicted classes (refer to Section 4.3.2). Since the leakage data obtained is erroneous, it is not clear how to model arithmetic addition (for the HW/HD equations), logical operations (for the key-stream and state update functions) and erroneous data simultaneously in a single system of equations in SAT. We therefore convert the whole system into a system of modular equations, as modular operations are well supported through the data structure BitVec¹⁴ in the SMT solver Z3. Instead of directly forming equations of large degree, which are difficult to form and simplify, we form a larger number of equations of smaller degree with the help of dummy variables, which work as follows: equate the dummy variable with the register update function, and update the internal state with that dummy variable as the incoming bit (the concept of dummy variables has been used before, see [Bar09] for an explanation or [BMS15] for another usage). Algorithm 1 is a generic way to form the SMT instances.

For the mathematical formulation of SMT, assume that the MILP solver returns an HW sequence, $HW_p$, in which every class lies within some fixed tolerance $\epsilon > 0$ of the corresponding original class over all $N$ rounds, where $N$ is the number of rounds and $HW_{org}$ refers to the original HW class. Now $HW_p$, the key-stream $Z$, the microcontroller word size, the tolerance $\epsilon$ and the structure of the cipher are the inputs to Algorithm 1. In Step 1, $N_{block}$ represents the number of consecutive blocks of the internal state, each of size $mc_{len}$. Since we are using modular operations, we have to define a modulus large enough to accommodate the maximum HW of any block, which is equal to the word size of the microcontroller, $mc_{len}$. So, we perform the HW addition modulo $2^{\log_2(mc_{len})+1}$ ¹⁵ instead of using arithmetic addition, whereas for the key-stream and update functions we use logical operations. Thus, define $bitlen = \log_2(mc_{len}) + 1$ (Step 2). Initialise the solver $m$. Define state variables $S_{var}$ of size $state_{len}$, each of the structure BitVec(•, bitlen) (Step 4). Similarly, define dummy variables of the same data structure for each register and for each round. For each variable defined in this way, add the constraint that its value is either 0 or 1, see Step 8. Now, for each round, form the key-stream equation (Step 10) and the HW class equations (Step 13) and feed them to the solver. During the state update operation, update the state normally (Step 15), extract the update equations (from the updated positions), equate them to the dummy variables (Step 18) and put the dummy variables at the updated positions (Step 19). Feed all the constraints to the SMT solver. If all of the HW class predictions are within the tolerance and the solver returns a solution in a feasible time, then verify it (Step 26). If the solution is not verified, run SMT again with an increased number of rounds and a longer predicted sequence. If the solver returns inconsistent, at least one predicted HW class falls outside the tolerance $\epsilon$. In this case, follow the recovery procedure given in Section 3.2.4.
In the case of TRIVIUM, we can proceed with the generic Algorithm 1 by using the parameters $state_{len} = 288$, $N_{block} = 9$, $r = 3$, together with the size of each register and their update positions in the internal state. We carried out the SMT experiments using a simulated cipher, where we inject noise randomly (i.e., following a uniform distribution) into the original HW/HD information within the given tolerance class. The results of the experiments are shown in Section 5.
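The shape of the constraint system can be illustrated on a toy register. The sketch below is not Algorithm 1 and does not use an SMT solver: it replaces the solver with an exhaustive search over a 12-bit state, and the feedback function, the sizes and all names are hypothetical. It keeps every initial state whose block HWs stay within the tolerance of all noisy observations, which is the condition the HW equations express.

```python
from itertools import product
import random

STATE_LEN, BLOCK = 12, 4          # toy sizes (TRIVIUM uses 288 and 8/16/32)


def update(s):
    """Toy nonlinear feedback shift register (hypothetical, not a real cipher)."""
    new_bit = s[11] ^ s[7] ^ (s[2] & s[5])
    return (new_bit,) + s[:-1]


def block_hws(s):
    return [sum(s[k:k + BLOCK]) for k in range(0, STATE_LEN, BLOCK)]


def observe(state, rounds):
    """Exact block HWs of the state over the given number of rounds."""
    hws = []
    for _ in range(rounds):
        hws.append(block_hws(state))
        state = update(state)
    return hws


def consistent_states(noisy_hws, tol):
    """All initial states whose block HWs stay within tol of every observation."""
    found = []
    for cand in product((0, 1), repeat=STATE_LEN):
        exact = observe(cand, len(noisy_hws))
        if all(
            abs(e - o) <= tol
            for exact_row, obs_row in zip(exact, noisy_hws)
            for e, o in zip(exact_row, obs_row)
        ):
            found.append(cand)
    return found


rng = random.Random(1)
secret = tuple(rng.randint(0, 1) for _ in range(STATE_LEN))
tol = 1
# Noise is drawn uniformly within the tolerance, as in the simulated experiments.
noisy = [[h + rng.randint(-tol, tol) for h in row] for row in observe(secret, 16)]
candidates = consistent_states(noisy, tol)
print(len(candidates), secret in candidates)
```

As long as every observation is within the tolerance, the true state is always among the candidates. If too few rounds are used, several candidates may remain, which corresponds to the multiple solutions reported for low round numbers.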

Part IV: Putting It All Together
For the attack during the pseudo-random generation phase of TRIVIUM (see Section 5.1.1 for the relevant results), note that our framework is able to work in a single key/IV setting. Also, attacking this phase is considerably more complex than attacking the initialisation phase, the relevant results on which can be found in Section 5.1.2.

Algorithm 1 (fragment): 10: Insert key-stream(S) = Z[i] into the solver m (optional) 11: for j ← {0, 1, . . ., $N_{block} - 1$} do 12: 13:
The same key/IV setting is used for the attack during the initialisation phase (results are given in Section 5.1.2). In this case, since only 80 bits are unknown (down from 288 in the case of the PRGA attack), the SMT solution time improves drastically, even for higher tolerance, thereby removing the need for the correction by the MILP model (Section 3.2.3).
Note that Algorithm 1 employs constraints found in the forward direction only. We can further utilise the backward equations, as the state update of TRIVIUM is invertible. By doing so, we obtain more equations with lower-degree monomials of dummy variables, which makes it easier to recover those variables. This idea is used only in the attack during the key-stream generation phase, whereas for the initialisation phase only forward equations are used, as we start from round 0 to make use of the IV. We optimise Algorithm 1 further by removing the condition in Step 8 and defining all variables with the structure BitVec(•, 1), i.e., as 1-bit variables. For the key-stream and update equations we can proceed as given in the algorithm, whereas for the HW equations we extend the variable size by prepending zeroes using the function ZeroExt(bitlen-1, •)¹⁶. This is to perform the HW constraints modulo $2^{bitlen}$.
The SMT instances are solved with Z3/Python 3.8.5. The CPU used for the computation is an Intel Xeon E5-2670 v3 @ 2.30GHz running a 64-bit Ubuntu-20.04.
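The reason for extending the 1-bit variables before adding them, and for choosing $bitlen = \log_2(mc_{len}) + 1$, can be seen by emulating fixed-width addition. The sketch below uses plain Python integers rather than Z3 bit-vectors; the helper name is an assumption made for illustration.

```python
import math


def add_fixed_width(bits, width):
    """Sum of bits with wrap-around, as a width-bit bit-vector addition would do."""
    mask = (1 << width) - 1
    total = 0
    for b in bits:
        total = (total + b) & mask
    return total


mc_len = 32
bitlen = int(math.log2(mc_len)) + 1          # 6 bits for a 32-bit block
block = [1] * mc_len                         # worst case: HW = 32

print(add_fixed_width(block, 1))             # 1-bit arithmetic keeps only the parity: 0
print(add_fixed_width(block, bitlen - 1))    # 5 bits: 32 wraps around to 0
print(add_fixed_width(block, bitlen))        # 6 bits: the true HW, 32
```

Adding 1-bit values without extension would only yield the parity of the block, and a width of $\log_2(mc_{len})$ bits would still wrap for the all-ones block. With $bitlen$ bits the sum equals the HW for every possible block value.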

Pseudo-random Phase
Table 5 shows the time taken by the SMT solver to produce a unique solution under the HW/8, HW/16 and HW/32 models with varying error tolerance (i.e., its input is a sequence of predicted HW classes within error tolerance $\epsilon$). In Table 5(a), the key-stream equations are not included in the modelling (only the update and HW equations are used), whereas Table 5(b) shows the results when the key-stream equations are included. The number of rounds is initially selected to be near half of the number of original variables (i.e., 110). If the solution time is within a few seconds and no multiple solutions are observed, then we increase the error tolerance by 1; otherwise, we increase/decrease the number of rounds by 20-30 to check whether the solution time decreases and a unique solution exists. If the number of rounds is too low, then we generally have multiple solutions¹⁷.
In Table 5(a), under the HW/8 model (recall the description of HW/8, HW/16 and HW/32 from Section 4.1), we can recover the state bits up to error tolerance 3, which is almost half of the corresponding word size of the microcontroller. Beyond error tolerance 3, Z3 returns multiple solutions even for a higher number of rounds, and also takes longer to solve. For the HW/16 model, we could find a solution for up to tolerance 4 in about 1-2 hours. For the 32-bit microcontroller, the results are as described next.

As per Tables 5(a) and 5(b), we have SMT solution times up to error tolerance $\epsilon = 4$. Table 4 shows the success rate of the MILP modelling only for tolerance 3, as in this case we have more results and a lower SMT time. For a lower tolerance, however, more predicted HWs are wrong (i.e., lie outside the expected tolerance limit), which in turn makes the SMT system of equations inconsistent (as the positions of the wrong classes are unknown). The aforementioned MILP model can correct these wrong HWs and return a sequence of HW classes within tolerance 3 with a very high success rate (see Table 4). We now describe the total solution time with respect to tolerance 3 (as we have more solutions for this tolerance bound than for 4). For 150 rounds, the success probability is 0.968 (see Table 4), Gurobi takes 6.38 seconds to solve and the SMT solution time is 49975.49 seconds (see Table 5(a)).

Therefore, to conclude, we can recover the state bits of TRIVIUM in 28763.22 seconds for 170 rounds (with key-stream equations) with probability 0.946. For 180 rounds (without key-stream equations), the total solution time is 36297.41 seconds with a success rate of 0.943. We ignore the overhead time taken to form the SMT and MILP instances, as it is within a few seconds. Note that the success probability is less than 1. If the SMT instance returns inconsistent in the first trial, then for the next trial we can target the state at a different (or the same) round during the key-stream phase under the same key/IV. Therefore, we do not need to change the IV or rerun the cipher for another trace.
For the computation of the success probability in $n$ trials, note the following. Let $p$ be the success probability in the first trial. Then the probability that we succeed at least once in $n$ trials is $1 - (1 - p)^n$. The best single-trial probability among the results above is 0.968 (150 rounds). Therefore, with 2 trials we will succeed at least once with a probability of $1 - 0.032^2 \approx 0.9990$.
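The formula can be evaluated directly. The sketch below assumes, as the formula does, that the trials are independent; the function name is illustrative.

```python
def at_least_one_success(p, trials):
    """Probability of at least one success in `trials` independent attempts."""
    return 1 - (1 - p) ** trials


# Single-trial success rates quoted in the text for 110 and 150 rounds.
for p in (0.976, 0.968):
    print(p, round(at_least_one_success(p, 2), 6))
```

For two trials this gives 0.999424 for the MILP correction at 110 rounds and 0.998976 for the combined experiment at 150 rounds.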

Initialisation Phase
In the initialisation phase of TRIVIUM, the number of unknown variables reduces from 288 to 80, as only the secret key bits (80 bits) are unknown. This enables us to solve the SMT instance for higher tolerance, as we describe next. While the results in our paper are all given with a single IV, we note that our SMT model can readily deal with multiple IVs. Further, since the system gets some new information for each IV, the solution time will likely be lower and we can solve SMT instances for higher error tolerance. Table 6 shows the solution time for HW/8, HW/16 and HW/32 in the initialisation phase. For a given tolerance, we initially select the number of rounds as 140 (as for 130-140 rounds and tolerance 3-4 in the key-stream generation phase, the SMT solution time is feasible, see Tables 5(a) and 5(b)). Thereafter, if multiple solutions are observed, we increase the number of rounds by 10 and solve again. Notice that under the HW/8 and HW/16 scenarios, we reach the optimal point at half of the word size of the microcontroller¹⁸.
For the HW/32 model during the initialisation phase, we can solve up to error tolerance 15 (close to the optimal point). However, as per Table 3, the HW class can be predicted with 100% accuracy for error tolerance 7. Thus, we can create the SMT instance with any tolerance beyond 6, and a solution can be found with probability 1. Therefore, we do not use the MILP-based correction here. In this case, we can achieve the best solution time of 79.49 seconds (170 rounds) with probability 1.
On the other hand, as given in Table 3, the accuracy for tolerance 4 is 0.99965. Therefore, with HW information for 140 rounds (i.e., HW predictions for 140 × 9 = 1260 blocks), on average 140 × 9 × 0.999655 ≈ 1260 HW values are within error tolerance 4. Thus, we also solve the SMT instance for tolerance 4 and obtain a solution within 8.58 seconds. If the solver returns inconsistent, we increase the tolerance and solve the equations again (refer to Section 3.2.4 for more details on the recovery procedure).

Hamming Distance (Hardware) Model
On top of the MILP model for HW (as detailed in Section 4.3.2), the following observations are used in the MILP modelling for the HD leakage model. Note that the HD of a register changes by a value in {−1, 0, 1} during a state update. Therefore, the total HD of an $n$-register cipher changes by a value in {−n, −n + 1, . . ., n − 1, n}. For the SMT modelling, note that Algorithm 1 also holds for the HD model when the HW constraints are replaced with HD constraints under modulus $2^{\log_2(state_{len})+1}$ (as the maximum possible HD is equal to the state size, $state_{len}$).
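The bound on the change of the HD can be checked on a single shift register. In the sketch below the incoming bits are drawn at random rather than computed by a feedback function, which is an assumption made for brevity; the bound does not depend on how the incoming bit is produced. The register length 93 and the function names are illustrative.

```python
import random


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def hd_trace(length, rounds, rng):
    """HD between consecutive states of one right shift register fed with random bits."""
    state = [rng.randint(0, 1) for _ in range(length)]
    trace = []
    for _ in range(rounds):
        nxt = [rng.randint(0, 1)] + state[:-1]   # shift right, new bit enters at position 0
        trace.append(hamming_distance(state, nxt))
        state = nxt
    return trace


rng = random.Random(0)
trace = hd_trace(length=93, rounds=500, rng=rng)
steps = {later - earlier for earlier, later in zip(trace, trace[1:])}
print(sorted(steps))
```

Every observed step lies in {−1, 0, 1}: between two consecutive updates, only the comparison involving the newly inserted bit is added and only the comparison involving the discarded bit is removed.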

Initialisation Phase
In Table 7, we present the results for TRIVIUM under the HD model in the initialisation phase. Let the internal state at the targeted round $t$ be denoted as $S_t = [s_0, s_1, \ldots, s_{287}]$, i.e., a tuple of unknown variables. The last column indicates the locations of the bits which are guessed. In the same column, $s_i \cdots s_j$ for $0 \le i \le j \le 287$ denotes that all variables in the range $s_i, s_{i+1}, s_{i+2}, \ldots, s_j$ are guessed. We tried up to tolerance 3 only, but the approach can also be extended to higher tolerance. The guessed bits are chosen consecutively near the variables which are involved in the state update function. They are selected consecutively because, during an update, the variables inside a register are shifted to the right, i.e., the variable on the immediate left side of a tap¹⁹ position moves to the tap position in the next round. From Table 7, we can see that up to tolerance 1 we can recover the state bits without any guess. However, for tolerance 2 we need at least 10 guessed bits, and for tolerance 3 we need at least 20 guessed bits. All of that can be done in practical time. For guessing, we have to go through all the possibilities and run the SMT algorithm for each guess. However, the SMT time is lower for a wrong guess, i.e., the solver returns inconsistent for a wrong guess faster than it returns a consistent result for the right guess.

¹⁸ 'Optimal' here means that beyond that tolerance we get multiple solutions even at higher rounds.
¹⁹ By tap position, we refer to a position in the state which is used in the key-stream/update function.

Pseudo-random Phase
Regarding the pseudo-random phase, the system of constraints here has 288 unknown variables, which is quite high compared to the 80 variables in the initialisation phase. The solver did not return any result within 2 days. With a guess of 140 bits we do get a result, but doing so exceeds the complexity of an exhaustive search on the key (which is only 80 bits long). A probable reason behind this is that we have a large number of unknown variables and the key-stream in TRIVIUM involves only 6 bits, which does not carry enough information. The higher effort required to attack the pseudo-random phase as compared to the initialisation phase can also further motivate the use of levelled implementations [BBC+20], where resource-friendly countermeasures may be considered for the pseudo-random phase, while the initialisation must be well protected.

Success Probability with respect to varying SNR
We test our framework (in the pseudo-random phase) with different SNRs. We add Gaussian noise with mean 0 and different standard deviations to the previously used traces to simulate low SNR settings. The parameters used are as follows: a data-set of size $2^{20}$, 50 epochs, an MLP with two hidden layers of 128 neurons each, a 62.5/12.5/25 training/validation/testing split and a batch size of 64. We used the same MLP architecture that was used for the original experiment. The resulting ML accuracy is then used in the MILP experiments, which run on a simulated cipher with noise injected as per the ML accuracy. We ran the MILP experiments for 130 and 150 rounds, as the SMT solver needs information for either 130 or 150 rounds for state recovery under error tolerance 3 (see Table 5(a)). The MILP experiments are carried out using 1000 random trials. The success probability of MILP (i.e., the success probability of the whole experiment) with respect to varying standard deviation is shown in Table 8. From the table, we can see that for SNRs below 1.12124 the success probability drops to 0. Thus, an SNR of 1.12124 is the threshold for our framework with the current experimental setting. Alternatively, one can also optimise the ML architecture to boost accuracy at lower SNR (where possible), in order to perform key recovery at lower SNR.
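The noise-injection step can be sketched as follows. The leakage model (sample value equal to the HW class plus noise), the SNR estimator (variance of the class means divided by the mean of the class variances, a common definition in the side channel literature) and all names are assumptions made for illustration; the measured traces and the exact SNR computation of the experiments are not reproduced here.

```python
import random
import statistics


def add_gaussian_noise(trace, sigma, rng):
    """Return a copy of the trace with zero-mean Gaussian noise of std sigma added."""
    return [x + rng.gauss(0.0, sigma) for x in trace]


def snr(samples_by_class):
    """Variance of the class means over the mean of the class variances (one sample point)."""
    means = [statistics.fmean(v) for v in samples_by_class.values()]
    variances = [statistics.pvariance(v) for v in samples_by_class.values()]
    return statistics.pvariance(means) / statistics.fmean(variances)


rng = random.Random(42)
# Toy leakage (hypothetical): 500 samples per HW class 0..8 with a small base noise.
base = {hw: [hw + rng.gauss(0.0, 0.5) for _ in range(500)] for hw in range(9)}

results = {}
for sigma in (0.0, 1.0, 2.0):
    noisy = {hw: add_gaussian_noise(v, sigma, rng) for hw, v in base.items()}
    results[sigma] = snr(noisy)
    print(sigma, round(results[sigma], 3))
```

Increasing the standard deviation of the added noise lowers the estimated SNR, which is how the low-SNR settings of Table 8 are simulated from one set of measured traces.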

Conclusion
In this paper, we show a pragmatic framework that recovers the state/key of stream ciphers and related constructions from side channel information (power or EM). Our framework is able to attack the initialisation phase (i.e., before the cipher reaches its pseudo-random phase) and, more importantly, it still works after the cipher reaches its pseudo-random phase and produces key-stream. The efficacy of our framework is shown through the EM leakage from a 32-bit software platform running the high-profile stream cipher TRIVIUM in Section 5.1. As the cherry on top, we also show how our model can work with hardware leakage in Section 5.2. Our framework is based on profiling of the side channel leakage, i.e., it works in offline and online stages. In the offline stage, the attacker collects information about the side channel leakage to build a profile, which is then used during the online stage to find out the secret information. The analysis of the offline stage starts with an ML model (Section 4.3.1), where we use an MLP with only 2 hidden layers. Since our target software platform has 33 HW classes, our first intuition is to look at the accuracy of the model for each of the individual 33 classes. In the process, though, we discover that it is hard to go beyond 40% accuracy, even with deeper networks. Thus, we introduce the concept of tolerance (denoted by $\epsilon$), where we broaden the scope for a correct prediction from the ML model. For a given $\epsilon$ and the correct class $c$, we count the prediction as correct with tolerance $\epsilon$ if the predicted class lies within the $2\epsilon + 1$ neighbourhood of $c$, i.e., $\{c - \epsilon, c - \epsilon + 1, \ldots, c, \ldots, c + \epsilon - 1, c + \epsilon\}$. This allows for a higher accuracy from the ML model as $\epsilon$ increases. The offline stage ends with SMT solving (Section 4.3.3), which returns the state/key of the cipher. However, increasing $\epsilon$ comes with the catch that the solution time taken by the SMT solver increases. Thus, it becomes important to find the proper balance where the ML accuracy is considerably high and the solution time taken by the SMT solver is reasonably low. With our experimental data from the 32-bit ARM Cortex-M3, we settle at $\epsilon = 3$, where the ML accuracy is about 99.7% and the SMT solution time with our target cipher TRIVIUM is a few hours. Even though the ML accuracy is quite high, certain wrong predictions (i.e., predictions outside the desired limit) will still pass through. A single such prediction makes the system inconsistent. To mitigate that, we introduce an intermediary between ML and SMT, which is based on MILP (Section 4.3.2). This MILP model takes the sequence of predictions from the ML model and transforms it (with a high probability) into another sequence where all the predictions lie within the desired level.
Additionally, to test the limits of our approach for varying SNRs, we added Gaussian noise to the previously measured traces.We observed that at SNR of ≈ 2 and ≈ 1.1, the ML accuracy falls to 0.95744 and 0.88698 (for error tolerance 3), respectively.This decrease in ML accuracy degrades the success probability of key recovery to roughly 0.36 (for SNR=2) and 0.016 (for SNR=1.1).We note that one can explore different pre-processing techniques and can optimise the ML architecture to mitigate the effect of low SNR on the success probability.
To conclude, we list the following problems that may be of interest as follow-up works:
• Analytical approach for less than perfect accuracy: Our approach requires the strict condition that the accuracy of the HW/HD prediction has to be 100%, otherwise the SMT instance becomes inconsistent. However, it may be possible to come up with an analytical approach, such as a Hidden Markov Model (HMM), that can work (with a high probability) when the accuracy is lower than 100%. This can take some load off the ML module (Section 4.3.1) and can potentially remove the MILP module (Section 4.3.2).
• Improvement of ML: First, while we attempt to find an efficient ML model, this is not the focus of this research, and it can likely be improved. For instance, one may experiment with other types of ML models, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), or Long Short-Term Memory (LSTM) networks. The electromagnetic traces can be treated as time series data, and exploring the feasibility of RNN/LSTM would be an interesting direction of study. Second, data pre-processing techniques such as normalisation, scaling, and feature selection can be used for a potential performance gain and an improvement of the training stability. For instance, instead of feeding the complete trace to the ML model, we can identify points of interest and use only those to reduce the complexity. Third, the ML model does not take into account the specifics of the cipher (those specifics are utilised by the MILP module in Section 4.3.2). This can be considered in a more general setting, ultimately cutting off the MILP module.
• Form of leakage function: An extension of our framework to a polynomial leakage function [GSP13] and a weighted HW/HD leakage function [DPRS11], instead of HW (software) or HD (hardware), can be considered in the future. These leakage functions will probably have an impact on our framework, as in our case information on a linear equation gives better results than information on a general polynomial equation. For a weighted leakage function, since the current SMT model works on modular operations, the SMT model needs to be improved to handle negative weights. Additionally, the outcome also depends upon the SNR of the side-channel measurements.
et al. [GBC+08]. They show the vulnerabilities of the eSTREAM ciphers against various SCAs. They suggest an attack on TRIVIUM and GRAIN-v1 during the initialisation only. Later, further SCAs on stream ciphers are carried out, such as [HYY+10] on K2; [QGGL13] on CRYPTO-I (an LFSR-based cipher); and [KAA+17] on CRYPTO-I, TRIVIUM, GRAIN and BIVIUM-B. All of these assume noiseless traces and need multiple IVs for their attack. In 2015, Chakraborty et al. [CMM15] propose a side channel attack on the GRAIN family of stream ciphers using multiple IVs in the initialisation phase. Tena-Sánchez et al. present two papers [TA15, TSA15] in 2015 on TRIVIUM. Both papers target the initialisation phase and need at least 1200 different IVs, but can deal with noisy traces.

Figure 1: Schematic of work-flow of our side channel attack framework

Figure 2: SNR for the measurement setup

Figure 3: Internal state and its block representation

Figure 4: Representation of a single block

Figure 7: State registers of TRIVIUM (split into blocks of 32 bits)

Table 1: A comparative study of our work with other stream cipher SCAs

Table 2: Trade-off between ML tolerance and SMT solution time (sec.) for TRIVIUM

high-profile stream cipher, TRIVIUM [DCP08], which is chosen as a final round candidate in the eSTREAM project⁷, and is also standardized in ISO/IEC 29192-3⁸. Consequently, this cipher has received a large amount of analysis, see [HLM+20, SMB17] for examples. It is chosen by the authors of DAPA [SJB20] as well.

Table 4: Success rate of MILP model for TRIVIUM

Algorithm 1 SMT modelling for SCA (from HW data)
i. Number of rounds, $N$
ii. Internal state size, $state_{len}$
iii. State registers $R_1, R_2, \ldots, R_r$, with $|R_i| = r_i \; \forall i$
iv. Position to update in the state, $pos$
v. Microcontroller word size, $mc_{len}$
vi. Tolerance, $\epsilon \ge 0$
vii. An array of size $N$, $Z$, that stores the key-stream
viii. Predicted classes $HW_p$ of size $N \times N_{block}$
Create Model
1: $N_{block} \leftarrow state_{len} \div mc_{len}$
4: Define $state_{len}$ BitVec variables for the internal state $S_{var}$, each bitlen bits long
5: $S \leftarrow S_{var}$
6: Define dummy variables $R^{dum}_i$ ($1 \le i \le r$), each an array of size $N - 1$ of bitlen-bit BitVec variables
Generate constraints

For 150 rounds (see Table 5(a)), the total required time is thus 49981.87 seconds with a success probability of 0.968. The best result is obtained for 170 rounds (see Table 5(b)), with a solution time of 28755.36 + 7.86 = 28763.22 seconds and probability 0.946 (only 1 sample is available). Similarly, for 180 rounds, the solution time is 36288.8 + 8.61 = 36297.41 seconds with a success probability of 0.943.

Table 5: Results on TRIVIUM in pseudo-random phase

Table 6: Results on TRIVIUM in the initialisation phase

Table 7: Results for TRIVIUM on HD model

Table 8: Success Probability with respect to SNR