by Neal R. Wagner
Copyright © 2002 by Neal R. Wagner. All rights reserved.
NOTE: This site is obsolete. See book draft (in PDF):
Random numbers are very widely used in simulations, in statistical experiments, in the Monte Carlo methods of numerical analysis, in other randomized algorithms, and especially in cryptography. The connection with cryptography is very close, since any pseudo-random bit stream along with exclusive-or provides a cryptosystem (though not necessarily a strong system), and any good ciphertext should look like a pseudo-random bit stream (perhaps occurring in blocks). This section focuses on random number generators used in simulation and numerical analysis, but for use in cryptography, the recommended random number generators are derived from cryptosystems, both conventional and public key.
From the beginning (where ``beginning'' is the 1940s, the start of the computer age) there was interest in so-called ``true'' random numbers, that is, numbers generated by a random process in the world. Physical events such as the radioactive decay of particles are unpredictable except for their behavior averaged over time, and so could be used as a source of random numbers, but these events have been difficult to utilize and have been disappointing in practice. More promising recently are possibilities from quantum theory, but such matters are completely outside the scope of this discussion.
By far the most common source of random numbers is some deterministic process, such as a software algorithm. These provide ``random-looking'' numbers, but the numbers are not really random -- there is always an exact algorithm for specifying them. This is the reason that researchers now describe such numbers using the word ``pseudo'', which means ``false''. These are not true random numbers, but for most applications they can be just as useful. Sometimes they can be more useful, as for example when one wants to repeat a simulation with exactly the same random or pseudo-random numbers.
At first one might think that the best way to get random-looking numbers is to use a ``random'' algorithm -- one that does crazy operations, everything imaginable, in every possible order. Donald Knuth tried out such an algorithm as an example, and showed that its performance was no good at all. In its first run, Knuth's ``random'' algorithm almost immediately converged to a fixed point. Knuth was arguing that one should use science and great care in generating pseudo-random numbers.
An early suggested source of pseudo-random numbers was an equation which was much later to become a part of modern ``chaos'' theory. A later chapter describes a generator derived from this equation.
Another early idea for a source of random numbers was to use the bits or digits in the exapnsion of a transcendental number such as pi, the ratio of the circumference of a circle to its diameter.
3.14159 26535 89793 23846 26433 83279 50288 41972 ... (in decimal) 3.11037 55242 10264 30215 14230 63050 56006 70163 21122 ... (in octal)It has long been conjectured that this is a very good source of pseudo-random numbers, a conjecture that has still not been proved. In 1852 an English mathematician named William Shanks published 527 digits of pi, and then in 1873 another 180 digits for a total of 707. These numbers were studied statistically, and an interesting excess of the number 7 was observed in the last 180 digits. In 1945 von Neumann wanted to study statistical properties of the sequence of digits and used one of the early computers to calculate several thousand. Fortunately for Shanks his triumph was not spoiled during his lifetime, but his last 180 digits were in error and his last 20 years of effort were wasted. Also there was no ``excess of 7s''. The number pi has now been calculated to many billions of places, although the calculation of its digits or bits is too hard to provide a good source of random numbers. The later digits are harder to calculate than earlier ones, although a recent clever algorithm allows calculation of the nth binary (or hexadecimal) digit without calculating the previous ones.
Later work focused on a particularly simple approach using a congruence equation.
This approach uses a linear congruence equation of the form:
xn+1 = (k*xn + a) mod mwhere all terms are integers, k is the multiplier, a (often taken to be 0) is the increment, and m is the modulus. An initial seed is s = x0. Each successive term is transformed into the next. The pseudo-random terms are in the range from 1 to m-1. To get (more-or-less) uniformly distributed floating point numbers between 0 and 1, just do a floating point division by m. Assuming that a = 0, the quality of the numbers produced depends heavily on k and m.
This type of generator can produce at most m-1 different numbers before it starts to repeat. To get this behavior, one can start with a prime number for m and use a generator for k so that all m-1 numbers will be produced in a repeating cycle, starting with whatever the seed s is.
The old generator provided by the C standard library uses 16-bit integers, and so has a maximum possible cycle length of 216 = 65237 -- a ridiculously small cycle, making the generator usless except for toy applications. By far the most common generator of the past was implemented on hardware with 32-bit integers and used the fact that 231-1 is a prime. The multiplier commonly used (starting in 1969) was 75 = 16807 and the constant a was taken to be zero. This generator is quite efficient, and has a cycle length of 231 - 2. The multiplier was chosen so that various statistical properties of the sequence would be similar to the results for a true random sequence. In the 1970s when I first started using this sequence the cycle length seemed quite long -- now it seems quite short since I frequently run experiments with hundreds of billions of trials.
Knuth in his chapter on conventional random number generators approves the values m = 231 - 1 and k = 16807 above as ``adequate'', but he has suggestions for better values, such as:
| m | k |
|---|---|
| 231-249 | 40692 |
| 231-1 | 48271 |
| 231-1 | 62089911 |
| 232 | 69069 |
| 248 | 31167285 |
| 264-1 | 6364136223846793005 |
Knuth suggests various generators, including one that combines the first two table entries above:
xn+1 = 48271*xn mod (231 - 1), yn+1 = 40692*yn mod (231 - 249), zn = (xn - yn) mod (231 - 1),
where independent seeds are needed for x0 and y0, and the sequence of the zn make up the output random numbers. The period is nearly the square of the component generators. Knuth also recommends:
xn = (271828183*xn-1 - 314159269*xn-2) mod (231 - 1),
which has very good performance and whose period is the square of m. Of course two independent seeds x0 and x1 are needed to start the sequence off with x2.
Knuth has other suggestions for efficient random number generators of high quality, where ``quality'' is measured by a variety of statistical tests that compare the output of the pseudo-random generator with true random numbers. If for a given test the comparison says the two sets of numbers look the same, then one says the generator ``passes'' this particular test. A generator that passes all the popular tests that people can devise is of high quality.
However, even generators of high quality are mostly not usable in cryptography. For example, given several successive numbers of a linear congruence generator, it is possible to compute the modulus and the multiplier with reasonable efficiency. One could make the generator more complex in order to resist this attack, but there would still be no proof or assurance of the difficulty of ``reverse engineering'' the generator. Instead, if generators are needed in cryptographic applications, one is usually created using a conventional cipher such as the Advanced Encryption Standard or using a public key cipher such as RSA or one of its variants. The AES-based generator will be efficient and will satisfy most practical requirements, but the RSA-based systems, while extremely slow compared to the others, have a very strong property of being cryptographically secure, a term that means the generator will pass all possible efficient statistical tests. These matters will be defined and discussed in the next chapter.