5.5 Testing Pseudorandom Number Generators
There are various tests that may be applied to pseudorandom number generators. Although some tests are not easily categorized, it is convenient to describe tests as being either
- empirical, or
- theoretical.
We have already discussed empirical tests in Section 5.3. These are statistical tests that are applied to subsequences of the numbers produced by a generator. For a given subsequence, a particular sample statistic is calculated, and the result is assessed for consistency with a U(0,1) sample. In Section 5.3, we mentioned empirical tests for sample mean, sample covariance matrix, and sample autocorrelations. More sophisticated empirical tests assess independence, equidistribution, or recurring patterns either in the numbers or in n-tuples of the numbers. Empirical tests often have beguiling names such as the poker test, birthday test, or coupon collector’s test.
Theoretical tests consider the formula that defines a generator, often employing advanced mathematical techniques to infer its properties. In this sense, theoretical tests characterize the entire sequence of numbers produced by a generator. Theoretical tests assess such things as period, lattice structures, uniformity, or correlations—often the same notions assessed by empirical tests, but for a generator’s entire period. Theoretical tests are limited since the full-period performance of a generator may be different from the behavior of a subsequence of numbers from that generator used in a particular application. For example, the lattice structure of an LCG is a property of that generator’s entire period. It exhibits a uniformity that is not apparent in reasonably sized subsequences from the same LCG. Despite such limitations, certain theoretical tests have proven to be consistent predictors of a generator’s performance on empirical tests. We call these tests figures of merit. Two important figures of merit are the spectral test and discrepancy.
5.5.1 The spectral test
For pseudorandom number generators with lattice structures, an important theoretical test is the spectral test of Coveyou and MacPherson (1967). This can be interpreted in different ways. The most intuitive interpretation is that it measures, in a given dimension, the spacing between the parallel planes of the lattice. For LCGs, MRGs and other generators with lattice structures, the spectral test is the quintessential figure of merit. Speaking of such generators, Knuth (1997) notes:
Not only do all good generators pass this test, all generators now known to be bad actually fail it. Thus it is by far the most powerful test known, and it deserves particular attention.
It is not sufficient to perform the spectral test in just a few dimensions. As we saw with the infamous RANDU generator, it is possible for a generator to have good lattice structures in certain dimensions, but very poor lattice structures in others. In particular, if vectors are to be formed from consecutive n-tuples of pseudorandom numbers, it is desirable to apply the spectral test to the generator in n dimensions to confirm that the vectors will not fall on just a few widely spaced planes in that dimensionality.
A drawback of the spectral test is its computational complexity. The number of calculations needed to perform the spectral test increases exponentially17 with the dimension being considered. For this reason, it is impossible to directly assess high-dimensional lattice structures.
5.5.2 Discrepancy
A standard empirical test is the equidistribution test, or its generalization to multiple dimensions, which we call the serial test. In one dimension, we divide the unit interval [0,1] into a equal subintervals. We then test the uniformity of a finite sequence of m numbers by determining how many of them fall into each subinterval. A perfectly equidistributed set of numbers might have an equal number of values occurring in each bucket, but we would not expect such perfect equidistribution from a U[0,1) sample. We would not consider 1000 numbers to be particularly random if we divided the unit interval into 200 equal buckets and found that exactly 5 of them fell into each of the 200 buckets. Instead, we would expect some random variability in the number of occurrences in each bucket. The equidistribution test looks for such variability and assesses whether it is consistent with that of a U[0,1) sample.
A limitation of the equidistribution test is the fact that results depend upon the particular number a of subintervals employed. It is convenient to have a measure of equidistribution that does not depend upon this number of subintervals. Discrepancy is such a measure. Because its definition is technical, we shall not present it. Interested readers may consult Niederreiter (1992).
A drawback of discrepancy is its computational complexity. The number of calculations required to calculate discrepancy for m n-dimensional points increases with either m or n in proportion with mn. For this reason, it is impractical to apply discrepancy as an empirical test. Instead, researchers employ it as a theoretical test, using mathematical arguments to determine upper and lower bounds for the discrepancies of generators.
5.5.3 Period
Period is not usually considered a figure of merit because its predictive ability is limited. Certainly, there are generators with high periods that fail empirical tests miserably. However, it is reasonable to discuss period along with standard figures of merit because, among generators that do exhibit good behavior on empirical tests, performance does correlate with period. This is especially true of linear generators such as LCGs and MRGs. The best LCGs with periods of around 232 have noticeably inferior properties compared to the best LCGs with periods of around 248. These, in turn, are noticeably inferior to the best LCGs with periods of around 264. Accordingly, the best linear generators tend to have very high periods.
Another reason to consider period is the need to employ a generator whose period exceeds the number of pseudorandom numbers required by an intended application. Every application is unique, but a rule of thumb is that a generator should have a period well in excess of the square of the number of pseudorandom numbers required for the application. Today, many operating systems and software packages employ default generators that are LCGs with periods less than 232. Accordingly, such default generators are inappropriate for use in applications that require more than, say, 215 = 32,768 pseudorandom numbers. Manyvalue-at-risk applications require more pseudorandom numbers than this.
For inversive generators, extremely long periods are not as necessary. Empirical work indicates it is sufficient for an inversive generator’s period to exceed the number of pseudorandom numbers required by a few orders of magnitude.