Introduction#

Pseudorandom generators are used ubiquitously in cryptography in order to overcome the deterministic limitations of computers and generate good enough [[Randomness#pseudorandomness|pseudorandomness]] from true [[Randomness#admonition-randomness|randomness]].

An algorithm which fulfils the task of generating more bits from a smaller number of bits is called a generator.

Definition: Generator

A generator is an [[Algorithm Efficiency|efficient algorithm]] \(\operatorname{Gen}: \{0,1\}^X \to \{0,1\}^R\) where \(R\gt X\) which takes a binary string as an input and produces a longer binary string as an output.

![[res/Generator.svg]]

A generator which takes a short string of random bits, called a seed, and expands them into a larger string of pseudorandom bits is called a pseudorandom generator (PRG).

Definition: (Secure) Pseudorandom Generator (PRG)

A (secure) pseudorandom generator \(\textit{PRG}(seed: \textbf{str}[S]) \to \textbf{str}[R]\) is a generator such that for every input, called a seed, \(s \in \{0,1\}^S\) and every efficient statistical test \(\textit{ST}: \{0,1\}^R \to \{0,1\}\), the output \(\textit{PRG}(s)\) is pseudorandom, i.e. it holds that

\[ \left|\Pr[\textit{ST}(\textit{PRG}(s)) = 1] - \Pr_{r \leftarrow_R \mathcal{R}}[\textit{ST}(r) = 1]\right| \le \epsilon(R) \]

for some negligible \(\epsilon(R)\).

The set \(\mathcal{S} \coloneqq \{0,1\}^S\) is called the seed space and the set \(\mathcal{R} \coloneqq \{0,1\}^R\) is called the output space.

Intuition

This definition tells us that an algorithm \(\textit{PRG}\) which takes a uniformly chosen binary string of length \(S\) (i.e. "truly random" string), called a seed, and outputs a longer binary string of length \(R\), is a pseudorandom generator if there is no efficient statistical test which can distinguish between \(\textit{PRG}\)'s output and a string chosen uniformly at random from the output space \(\mathcal{R}\) with non-negligible probability.

Essentially, the definition says that the probability that any statistical test thinks a string generated by \(\textit{PRG}\) is random is approximately equal to the probability that the same statistical test thinks a string uniformly chosen from \(\mathcal{R}\) is random, i.e.

\[ \Pr_{s\leftarrow_R \mathcal{S}}[\textit{ST}(\textit{PRG}(s)) = 1] \approx \frac{1}{|\mathcal{R}|} \]

It does not matter if you understand the nitty-gritty details of this definition for the security of a pseudorandom generator because it is one of the most useless pieces of information you will encounter in your lifetime. The reason for this is that there is no known PRG which has been proven to satisfy this definition because being able to prove it means that one is able to prove that \(P \ne NP\).

Nevertheless, it gives us an idealised model for what a secure PRG should be.

Determining the Security of a PRG#

We can derive some properties from the definition of a PRG which can hint that a candidate PRG is secure and can be trusted.

Theorem: Unpredictability \(\iff\) Security

A secure PRG is unpredictable in the sense that there is no algorithm which given the first \(i\) bits of the output of \(\textit{PRG}\) can guess what the \(i+1\) bit would be with probability that is non-negligibly greater than \(\frac{1}{2}\). Similarly, an unpredictable PRG is secure.

Proof

TODO

Unfortunately, these two properties only provide a potential way to rule out an PRG as insecure. Proving that a PRG is unpredictable equally as difficult as proving that a PRG is secure, since it is essentially an equivalent definition for the security of a PRG.

Leap of Faith#

At the end of the day we just assume that secure generators exists. In fact, we have many PRGs that we believe to be secure but are just unable to prove it. Similarly, we have many PRGs that have been shown to be insecure and should not be used. So really, we consider a PRG to be secure until someone comes along and shows a way to break it. Since we have no better alternative, i.e. we do not know how to prove that a PRG is secure, we are forced to take the leap of faith and make-do with what we have.

Nevertheless, in order to be as safe as possible, one needs to make as few assumptions as possible and indeed that is what cryptography does. The only assumption regarding the existence of secure PRG which cryptography makes is the following.

Axiom: Existence of a Secure PRG

There exists a secure \(\textit{PRG}(seed: str[S]) \to str[S+1]\) which takes a seed of length \(S\) and produces a pseudorandom string with length \(S + 1\).

This assumption has neither been proven nor refuted, however there is a lot of evidence supporting it (and it better be true because cryptography falls apart otherwise). Okay, but this assumption in itself does not seem particularly helpful, for it only allows us to produce a pseudorandom string which is one bit longer than its random seed - we have really only gained 1 bit of randomness. Fortunately, it turns out that if we assume this to be true, this PRG can actually be used to construct a new PRG which takes a seed of length \(S\) and produces an output of any length \(T \gt S\) we might want.

Let's see how we can do this. We are given a pseudorandom generator \(\textit{Gen}(seed: str[S]) \to str[S+1]\) and want to use it to create a new generator \(\textit{GenT}(seed: str[S]) \to str[T]\)) which can use the same seed to produce a pseudorandom string whose length \(T\) is arbitrary. This is actually pretty simple. First, one feeds the seed \(s_0\) to the generator \(\textit{Gen}\) which will output a string \(s_1\) of length \(S + 1\). We can take the last bit of \(s_1\) and use it as the first bit \(y_0\) of the output of \(\textit{GenT}\). Taking 1 bit from \(s_1\) reduced its length to \(S\), so we can use it as input to \(\textit{Gen}\) once again. We repeat the process \(T\) times until the bits \(y_0, y_1, ..., y_{t-1}\) output by \(\textit{Gen}\) at each step form a string \(y\) of length \(T\).

![[res/Arbitrary PRG.svg]]

And here is a \(\textit{GenT}\) implementation in pseudo-code:

fn GenT(seed: str[S]) -> str[T] {
    let y: str[T]; // Initialise the output y
    let current_seed = seed;

    let i = 0;
    while i < T {
        let pseudorandom_str = Gen(current_seed); // Get the output of Gen from the current seed
        y[i] = pseudorandom_str[S]; // Use the last bit of Gen's output for the current bit of the output y; the last bit is at index (S + 1) - 1 = S
        current_seed = pseudorandom_str[0..S] // The new seed is the output of Gen without the last bit
    }

    return y;
}

This algorithm provides us with a generator that can produce a string of any length \(T \gt S\) given a seed of length \(S\). Well, there is actually one restriction - \(T\) must be equal to \(p(S)\) for some polynomial \(S\). Otherwise the above algorithm would take non-polynomial time to execute - it would not be efficient.

Proof: Security of \(\operatorname{GenT}\)

We are given the algorithm \(\textit{GenT}(seed: str[S]) \to str[T]\) with seed space \(\mathcal{S} = \{0,1\}^S\) and we need to prove that it is secure.

Let's introduce some notation. We denote by \(U_i\textit{Gen}_{T-i}\) a string whose first \(i\) bits were chosen according to the uniform distribution over \(\{0,1\}^{i+1}\) and whose remaining \(T-i\) bits were generated by the same algorithm that \(\textit{GenT}\) uses with some seed \(s_i \leftarrow_R \mathcal{S}\). We denote by \(H_i\) the distribution of strings generated in this way. Therefore, \(H_0\) is the distribution obtained by sampling the seed space and outputting only \(\textit{GenT}\), for there are no bits chosen from a uniform distribution in this case. Conversely, \(H_T\) denotes the uniform distribution over \(\{0,1\}^T\) because no bits are generated by the same algorithm that \(\textit{GenT}\) uses.

We need to show that \(H_0 \approx H_T\) which can be done by using the [[Randomness#admonition-theorem-triangle-inequality-for-computational-indistinguishability|Randomness]] and just showing that \(H_i \approx H_{i+1}\) for all \(i \in \{0,1,...,T-1\}\).

Suppose, towards contradiction, that there is a statistical test \(D(y: str[T]) \to \textit{bit}\) and some \(i\) such that

\[ \left|\underset{y \sim H_i}{\mathbb{E}}[D(y) = 1] - \underset{y \sim H_{i+1}}{\mathbb{E}}[D(y) = 1]\right| \ge \epsilon \]

for some non-negligible \(\epsilon\).

We now construct an algorithm \(D'(y: str[S+1]) \to bit\) which will interpret the \(y\) as an output of \(\textit{Gen}(s_i)\) at some stage which used a seed which we will call \(s_i\). This output is comprised of a seed for the next stage, i.e. \(s_{i+1} \coloneqq y[0]y[1]\cdots y[S-1]\), and one output bit, \(y[S]\). Subsequently, \(D'\) generates a string \(z\) of length \(T\). The bits \(z[0],z[1],...,z[S-1]\) are chosen according to a uniform distribution, the algorithm then copies the bit \(y[S]\) into \(z[S]\) and finally it generates the bits \(z[S+1],z[S+2],...,z[T-1]\) by using the same process as \(\textit{GenT}\) does, utilising \(s_{i+1}\) as the initial seed. At the end, \(D'\) will simply return \(D(z)\).

TODO