14.4 Backtesting With Distribution Tests
As part of the process of calculating a portfolio’s value-at-risk, value-at-risk measures, explicitly or implicitly, characterize a distribution for ${}^{1}P$ or ${}^{1}L$. That characterization takes various forms. A linear value-at-risk measure might specify the distribution for ${}^{1}P$ with a mean, standard deviation and an assumption that the distribution is normal. A Monte Carlo value-at-risk measure simulates a large number of values for ${}^{1}P$. Any histogram of those values can be treated as a discrete approximation to the distribution of ${}^{1}P$.
Distribution tests are goodness-of-fit tests that go beyond the specific quantile-of-loss a value-at-risk measure purports to calculate and more fully assess the quality of the ${}^{1}P$ or ${}^{1}L$ distributions the value-at-risk measure characterizes.
For example, a crude distribution test can be implemented by performing multiple coverage tests for different quantiles of ${}^{1}L$. Suppose a one-day 95% value-at-risk measure is to be backtested. Our basic coverage test is applied to assess how well the value-at-risk measure estimates the 0.95 quantile of ${}^{1}L$, but we don’t stop there. We apply the same coverage test to also assess how well the value-at-risk measure estimates the 0.99, 0.975, 0.90, 0.80, 0.70, 0.50 and 0.25 quantiles of ${}^{1}L$. Collectively, these analyses provide a rudimentary goodness-of-fit test for how well the value-at-risk measure characterized the overall distribution of ${}^{1}L$.
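The multi-quantile idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each day's forecast loss distribution is taken to be normal with a known mean and standard deviation, and a plain two-sided binomial test stands in for the basic coverage test, which may differ in detail. The names `coverage_by_level` and `two_sided_binomial_p` are hypothetical.

```python
from math import comb
from statistics import NormalDist

# Quantile levels from the example in the text.
LEVELS = (0.99, 0.975, 0.95, 0.90, 0.80, 0.70, 0.50, 0.25)

def two_sided_binomial_p(k, n, p):
    """Two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k.
    (Assumed stand-in for the coverage test; fine for n up to ~1000.)"""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return min(1.0, sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9)))

def coverage_by_level(losses, forecasts, levels=LEVELS):
    """For each quantile level q, count the days on which the realized loss
    exceeded the forecast q-quantile, and attach a binomial p-value.

    losses    -- realized losses, one per day
    forecasts -- (mean, std) of each day's forecast loss distribution
                 (normality is an assumption of this sketch)
    """
    n = len(losses)
    results = {}
    for q in levels:
        exceedances = sum(
            loss > NormalDist(m, s).inv_cdf(q)
            for loss, (m, s) in zip(losses, forecasts)
        )
        results[q] = (exceedances, two_sided_binomial_p(exceedances, n, 1 - q))
    return results
```

Each entry of the returned dictionary holds the exceedance count and p-value for one quantile level. Because the same loss data feed every level, the individual tests are not independent, so the results are best read together as a rough diagnostic rather than as separate formal tests.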
Various distribution tests have been proposed in the literature. Most employ the framework we describe below.
14.4.1 Framework for Distribution Tests
While coverage tests assess a value-at-risk measure’s exceedances data ${}^{-\alpha}i, {}^{-\alpha+1}i, \ldots, {}^{0}i$, which is a series of 0’s and 1’s, most distribution tests consider loss data ${}^{-\alpha}l, {}^{-\alpha+1}l, \ldots, {}^{0}l$. Although it is convenient to assume exceedance random variables ${}^{t}I$ are IID, that assumption is unreasonable for losses ${}^{t}L$.
A value-at-risk measure characterizes a CDF $\hat{F}_t$ for each ${}^{t}L$. Treating probabilities as objective for pedagogical purposes, $\hat{F}_t$ is a forecast distribution we use to model the “true” CDF for each ${}^{t}L$, which we denote $F_t$. Our null hypothesis is then $\hat{F}_t = F_t$ for all $t$.
Testing this hypothesis poses a problem: We are not dealing with a single forecast distribution modeling some single “true” distribution. The distribution changes from one day to the next, so each data point ${}^{t}l$ is drawn from a different probability distribution. This makes direct statistical analysis of the raw loss data impractical. We circumvent this problem by introducing a random variable ${}^{t}U$ for the quantile at which ${}^{t}L$ occurs:
${}^{t}U = \hat{F}_t({}^{t}L)$ [14.9]
where $\hat{F}_t$ is the forecast CDF the value-at-risk measure characterizes for ${}^{t}L$.
Under our null hypothesis, the ${}^{t}U$ are all uniformly distributed, ${}^{t}U \sim U(0,1)$. We assume the ${}^{t}U$ are independent. Applying [14.9], we transform our loss data ${}^{-\alpha}l, {}^{-\alpha+1}l, \ldots, {}^{0}l$ into loss quantile data ${}^{-\alpha}u, {}^{-\alpha+1}u, \ldots, {}^{0}u$, which we treat as a realization $u^{[0]}, \ldots, u^{[\alpha-1]}, u^{[\alpha]}$ of a sample. This we can test for consistency with a U(0,1) distribution. Crnkovic and Drachman’s (1996) distribution test applied Kuiper’s statistic for this purpose.
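As an illustration of testing loss quantile data against U(0,1), the sketch below computes the textbook form of Kuiper's statistic, the sum of the largest deviations of the empirical CDF above and below the uniform CDF. It is not claimed to reproduce the details of the published Crnkovic and Drachman test, and it stops at the statistic itself: critical values are not computed. The function name `kuiper_statistic` is hypothetical.

```python
def kuiper_statistic(us):
    """Kuiper's V = D+ + D- for a sample compared with the U(0,1) CDF.

    us -- loss quantile data, each value in [0, 1]
    """
    xs = sorted(us)
    n = len(xs)
    # Largest amount by which the empirical CDF sits above the uniform CDF.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # Largest amount by which the empirical CDF sits below the uniform CDF.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return d_plus + d_minus
```

Larger values of the statistic indicate a poorer fit to the uniform distribution. Deciding whether a given value is large enough to reject would require critical values, for example from simulation.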
Some distribution tests (see Berkowitz, 2001) further transform the data ${}^{-\alpha}u, {}^{-\alpha+1}u, \ldots, {}^{0}u$ by applying the inverse standard normal CDF $\Phi^{-1}$:
${}^{t}N = \Phi^{-1}({}^{t}U)$
Under our null hypothesis, the ${}^{t}N$ are identically standard normal, ${}^{t}N \sim N(0,1)$, so transformed data ${}^{-\alpha}n, {}^{-\alpha+1}n, \ldots, {}^{0}n$ can be tested for consistency with a standard normal distribution.
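The two transformations of this framework, from losses to loss quantiles and from loss quantiles to standard normal values, might be sketched as follows. The sketch assumes each day's forecast loss distribution is normal with a given mean and standard deviation; a measure that characterizes its distribution differently (for example, through simulated values) would need its own CDF evaluation. The function names and the clamping parameter `eps` are hypothetical.

```python
from statistics import NormalDist

def loss_quantiles(losses, forecasts):
    """Evaluate each day's forecast CDF at that day's realized loss.

    forecasts -- (mean, std) of each day's forecast loss distribution
                 (normality is an assumption of this sketch)
    """
    return [NormalDist(m, s).cdf(loss) for loss, (m, s) in zip(losses, forecasts)]

def to_standard_normal(us, eps=1e-12):
    """Apply the inverse standard normal CDF to loss quantile data.

    Values are clamped to [eps, 1 - eps] because the inverse CDF is
    undefined at exactly 0 and 1.
    """
    std = NormalDist()
    return [std.inv_cdf(min(max(u, eps), 1.0 - eps)) for u in us]
```

For example, a loss of 3.0 against a forecast with mean 1.0 and standard deviation 2.0 lies one standard deviation above the forecast mean, so it maps to a loss quantile of about 0.841 and then to a transformed value of 1.0.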
Below, we introduce a simple graphical test of normality that can be applied. This will motivate a recommended standard test based on Filliben’s (1975) correlation test for normality. That is one of the most powerful tests for normality available.
14.4.2 Graphical Distribution Test
Construct the ${}^{t}n$ as described above, and arrange them in ascending order. We adjust our notation, denoting $n_1$ the lowest and $n_{\alpha+1}$ the highest, so $n_1 \le n_2 \le \ldots \le n_{\alpha+1}$. Next, define
$\hat{n}_j = \Phi^{-1}\left(\frac{j - 0.5}{\alpha + 1}\right)$
for $j = 1, 2, \ldots, \alpha + 1$, where $\Phi$ is the standard normal CDF. The $\hat{n}_j$ are quantiles of the standard normal distribution, with a fixed $1/(\alpha + 1)$ probability between consecutive quantiles. If our null hypothesis holds, and the $n_j$ are drawn from a standard normal distribution, each $n_j$ should fall near the corresponding $\hat{n}_j$. We can test this by plotting all points $(n_j, \hat{n}_j)$ in a Cartesian plane. If the points tend to fall near a line with slope one, passing through the origin, this provides visual evidence for our null hypothesis.
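The points for this graphical test could be computed as in the sketch below. It assumes the theoretical quantiles sit at probabilities (j - 0.5)/(α + 1), a choice consistent with the stated 1/(α + 1) spacing between consecutive quantiles; other plotting positions are in use elsewhere. The function name `qq_points` is hypothetical.

```python
from statistics import NormalDist

def qq_points(ns):
    """Pair each ordered transformed value with a standard normal quantile.

    ns -- transformed loss data (one value per day)
    Returns a list of (ordered value, theoretical quantile) pairs, with the
    theoretical quantiles taken at probabilities (j - 0.5) / len(ns)
    (an assumed plotting position).
    """
    ordered = sorted(ns)
    m = len(ordered)  # m plays the role of alpha + 1
    std = NormalDist()
    theoretical = [std.inv_cdf((j - 0.5) / m) for j in range(1, m + 1)]
    return list(zip(ordered, theoretical))
```

The resulting pairs can be passed to any plotting tool as a scatter plot, with the 45-degree line through the origin drawn for reference.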
14.4.3 A Recommended Standard Distribution Test
We now introduce a recommended standard distribution test based on Filliben’s correlation test for normality. Construct pairs $(n_j, \hat{n}_j)$ of ordered transformed losses and corresponding standard normal quantiles as described above, and take the sample correlation of the $n_j$ and $\hat{n}_j$. Sample correlation values close to one tend to support the null hypothesis.
Using the Monte Carlo method, we can determine non-rejection values for the sample correlation at various levels of significance. If the sample correlation falls below a non-rejection value, we reject the null hypothesis at the indicated level of significance. Non-rejection values for the .05 and .01 significance levels are indicated in Exhibit 14.6.
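A minimal sketch of how such non-rejection values might be estimated by Monte Carlo follows. It assumes theoretical quantiles at probabilities (j - 0.5)/(α + 1), and it is not claimed to be the procedure used to produce Exhibit 14.6, so its output should not be expected to match the exhibit exactly. The function names, trial count and seed are hypothetical.

```python
import random
from statistics import NormalDist

def theoretical_quantiles(m):
    """Standard normal quantiles at probabilities (j - 0.5)/m (assumed positions)."""
    std = NormalDist()
    return [std.inv_cdf((j - 0.5) / m) for j in range(1, m + 1)]

def qq_correlation(ns, theoretical=None):
    """Sample correlation between ordered data and theoretical normal quantiles."""
    ordered = sorted(ns)
    m = len(ordered)
    theo = theoretical if theoretical is not None else theoretical_quantiles(m)
    mean_n = sum(ordered) / m
    mean_t = sum(theo) / m
    cov = sum((a - mean_n) * (b - mean_t) for a, b in zip(ordered, theo))
    var_n = sum((a - mean_n) ** 2 for a in ordered)
    var_t = sum((b - mean_t) ** 2 for b in theo)
    return cov / (var_n * var_t) ** 0.5

def non_rejection_value(m, significance, trials=2000, seed=0):
    """Estimate the correlation below which the null hypothesis is rejected.

    Simulates `trials` standard normal samples of size m and returns the
    empirical `significance` quantile of their Q-Q correlations.
    """
    rng = random.Random(seed)
    theo = theoretical_quantiles(m)
    sims = sorted(
        qq_correlation([rng.gauss(0.0, 1.0) for _ in range(m)], theo)
        for _ in range(trials)
    )
    return sims[int(significance * trials)]
```

To apply the test, compute `qq_correlation` on the transformed loss data and reject the null hypothesis at a given significance level when the result falls below `non_rejection_value` for the same sample size. More trials give more stable estimates at the cost of run time.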
Suppose we are backtesting a one-day 99% value-at-risk measure based on $\alpha + 1 = 250$ days of data. We calculate the $n_j$ and $\hat{n}_j$ and find their sample correlation to be 0.993. Based on the values in Exhibit 14.6, we reject the value-at-risk measure at the .05 significance level but do not reject it at the .01 significance level.
Why is it unreasonable to assume losses ${}^{-\alpha}L, {}^{-\alpha+1}L, \ldots, {}^{-1}L$ are IID?
In applying our recommended standard distribution test with 750 days of data, the sample correlation of the $n_j$ and $\hat{n}_j$ is found to be 0.995. Do we reject the value-at-risk measure at the .05 significance level?