# 14.2 Backtesting

JP Morgan’s *RiskMetrics Technical Document* was released in four editions between 1994 and 1996. The first had limited circulation, being distributed at the firm’s 1994 annual research conference, which was in Budapest.¹ It was the second edition, released in November of that year, that accompanied the public rollout of RiskMetrics. Six months later, a dramatically expanded third edition was released, reflecting extensive comments JP Morgan received on their methodology. While the second edition described a simple linear value-at-risk measure similar to JP Morgan’s internal system, the third edition reflected a diversity of practices employed at other firms. That edition described linear, Monte Carlo and historical transformation procedures. It also, perhaps for the first time in print, illustrated a crude method of backtesting.

Exhibit 14.1 is similar to a graph that appeared in that third edition. It depicts daily profits and losses (P&L’s) against (negative) value-at-risk for an actual trading portfolio. Not only does the chart summarize the portfolio’s daily performance and the evolution of its market risk, it also provides a simple graphical analysis of how well the firm’s value-at-risk measure performed.

With a one-day 95% value-at-risk metric, we expect daily losses to exceed value-at-risk approximately 5% of the time. A six-month period contains roughly 125 trading days, so that is about six times in six months. We define an **exceedance** as an instance of a portfolio’s single-period loss exceeding its value-at-risk for that single period. In Exhibit 14.1, we can count ten exceedances over the six months shown.
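Counting exceedances is mechanical once P&L’s and value-at-risk figures are aligned by period. The following is a minimal sketch, not a prescribed implementation. It assumes P&L’s are signed (losses negative) and value-at-risk is stated as a positive number. The function names and the five data points are made up for illustration.

```python
def count_exceedances(pnl, var):
    """Count periods in which the loss exceeded value-at-risk.

    pnl: per-period P&L's (assumed convention: losses are negative)
    var: per-period value-at-risk figures (assumed convention: positive numbers)
    """
    if len(pnl) != len(var):
        raise ValueError("pnl and var must have the same length")
    return sum(1 for p, v in zip(pnl, var) if -p > v)


def expected_exceedances(n_periods, quantile):
    """Expected exceedance count for a value-at-risk quantile such as 0.95."""
    return n_periods * (1.0 - quantile)


# Made-up illustration: five days of P&L against a constant value-at-risk of 100.
pnl = [35.0, -120.0, 10.0, -99.0, -250.0]
var = [100.0] * 5
print(count_exceedances(pnl, var))       # 2 (the losses of 120 and 250)
print(expected_exceedances(125, 0.95))   # approximately 6.25
```

A loss exactly equal to value-at-risk is not counted here, since the definition requires the loss to exceed value-at-risk.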

Is this result reasonable? If it is, what would we consider unreasonable? If we experienced two exceedances—or fourteen—would we question our value-at-risk measure? Would we continue to use it? Would we want to replace it or modify it somehow to improve performance?
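One way to put numbers on these questions is to ask how likely a given exceedance count would be if the value-at-risk measure were performing as intended. The sketch below is only an illustration under stated assumptions: exceedances are independent from day to day, each occurs with probability 0.05, and six months contain 125 trading days. Under those assumptions the exceedance count is binomial. The function names are hypothetical, and this is not one of the formal tests covered later in the chapter.

```python
from math import comb


def binomial_tail_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def binomial_tail_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(0, k + 1))


n_days = 125   # assumed number of trading days in six months
p = 0.05       # exceedance probability implied by a 95% value-at-risk metric

print(binomial_tail_at_least(10, n_days, p))   # chance of ten or more exceedances
print(binomial_tail_at_least(14, n_days, p))   # chance of fourteen or more
print(binomial_tail_at_most(2, n_days, p))     # chance of two or fewer
```

A small probability suggests the observed count would be surprising for a sound value-at-risk measure, though how small is small enough remains a judgment call.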

Questions such as these have spawned a literature on techniques for statistically testing value-at-risk measures ex post. Research to date has focused on value-at-risk measures used by banks. Published backtesting methodologies mostly fall into three categories:

- **Coverage tests** assess whether the frequency of exceedances is consistent with the quantile of loss a value-at-risk measure is intended to reflect.
- **Distribution tests** are goodness-of-fit tests applied to the overall loss distributions forecast by complete value-at-risk measures.
- **Independence tests** assess whether results appear to be independent from one period to the next.

Later in this chapter, we cover several backtesting procedures that are prominent in the literature. Because all have shortcomings, we also introduce three basic tests—a coverage test, a distribution test and an independence test—that we recommend as minimum standards for backtesting in practice.

The question arises as to which P&L’s to use in backtesting a value-at-risk measure. We distinguish between **dirty P&L’s** and **clean P&L’s**. Dirty P&L’s are the actual P&L’s reported for a portfolio by the accounting system. They can be impacted by trades that take place during the value-at-risk horizon—trades the value-at-risk measure cannot anticipate. Dirty P&L’s also reflect fee income earned during the value-at-risk horizon, which value-at-risk measures also don’t anticipate. Clean P&L’s are hypothetical P&L’s that would have been realized if no trading took place and no fee income were earned during the value-at-risk horizon.
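The distinction can be made concrete with a small sketch. It assumes, purely for illustration, that every position is valued as quantity times price (as with stocks), so clean P&L is the start-of-period holdings revalued at end-of-period prices. In practice, dirty P&L comes from the accounting system rather than from a formula. The decomposition in `dirty_pnl` only shows which components a clean P&L excludes. All names and figures are hypothetical.

```python
def clean_pnl(start_holdings, start_prices, end_prices):
    """Hypothetical P&L from holding the start-of-period portfolio unchanged.

    Assumes linear valuation: position value = quantity * price.
    """
    return sum(
        qty * (end_prices[asset] - start_prices[asset])
        for asset, qty in start_holdings.items()
    )


def dirty_pnl(clean, intraperiod_trading_pnl, fee_income):
    """Illustrative decomposition: clean P&L plus trading effects and fees."""
    return clean + intraperiod_trading_pnl + fee_income


# Made-up one-day example with a long and a short position.
holdings = {"AAA": 100, "BBB": -50}
start = {"AAA": 10.0, "BBB": 20.0}
end = {"AAA": 10.5, "BBB": 20.5}

clean = clean_pnl(holdings, start, end)
print(clean)                           # 25.0
print(dirty_pnl(clean, -40.0, 5.0))    # -10.0
```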

The Basel Committee (1996) recommends that banks backtest their value-at-risk measures against both clean and dirty P&L’s. The former is essential for addressing Type A and Type B model risk. The latter can be used to assess Type C model risk.

Suppose a firm calculates its portfolio value-at-risk at the end of each trading day. In a backtest against clean P&L’s, the value-at-risk measure performs well. Against dirty P&L’s, it does not. This might indicate that the value-at-risk measure is sound but that end-of-day value-at-risk does not reasonably indicate the firm’s market risk. Perhaps the firm engages in an active day trading program, reducing exposures at end-of-day.

Financial institutions don’t calculate clean P&L’s in the regular course of business, so provisions must be made for calculating and storing them for later use in backtesting. Other data to maintain are:

- the value-at-risk measurements;
- the quantiles of the loss distribution at which clean and dirty P&L’s occurred (the loss quantiles), as determined by the value-at-risk measure. This information is important if distribution tests are to be employed in backtesting;
- inputs, including the portfolio composition, historical values for key factors and current values for key factors;
- intermediate results calculated by the value-at-risk measure, such as a covariance matrix or a quadratic remapping; and
- a history of modifications to the system.
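The items listed above might be stored as one record per period. The layout below is a hypothetical sketch, not a required schema. The loss quantile calculation assumes, purely for illustration, that the value-at-risk measure forecasts a normal loss distribution. A real measure would supply its own distribution. The loss quantile is then the forecast cumulative distribution function evaluated at the realized loss.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import NormalDist
from typing import Any, Dict, Optional


@dataclass
class BacktestRecord:
    """One period's stored data for later backtesting (hypothetical layout)."""
    period_end: date
    value_at_risk: float
    clean_pnl: float
    dirty_pnl: float
    clean_loss_quantile: Optional[float] = None
    dirty_loss_quantile: Optional[float] = None
    inputs: Dict[str, Any] = field(default_factory=dict)
    intermediate_results: Dict[str, Any] = field(default_factory=dict)
    model_version: str = ""   # ties the record to the history of modifications


def normal_loss_quantile(pnl, loss_mean, loss_std):
    """Quantile of the forecast loss distribution at which a P&L occurred.

    Assumption for illustration only: the forecast loss distribution is normal.
    """
    loss = -pnl
    return NormalDist(mu=loss_mean, sigma=loss_std).cdf(loss)


forecast = NormalDist(mu=0.0, sigma=60.0)   # hypothetical forecast loss distribution
record = BacktestRecord(
    period_end=date(2024, 1, 2),
    value_at_risk=forecast.inv_cdf(0.95),   # 95% quantile of forecast loss
    clean_pnl=-120.0,
    dirty_pnl=-95.0,
    model_version="v1",
)
record.clean_loss_quantile = normal_loss_quantile(record.clean_pnl, 0.0, 60.0)
record.dirty_loss_quantile = normal_loss_quantile(record.dirty_pnl, 0.0, 60.0)

# A loss quantile above 0.95 corresponds to an exceedance of 95% value-at-risk.
print(record.clean_loss_quantile > 0.95)   # True
print(record.dirty_loss_quantile > 0.95)   # False
```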

The last three items will not be used in backtesting, but they could be useful if backtesting raises concerns about the value-at-risk measure that warrant investigation. The third and fourth items could be regenerated at the time of such an investigation, but doing so for a large number of trading days might be a significant undertaking. Data storage is so inexpensive that there is every reason to err on the side of storing too much information.

A history of modifications to the value-at-risk measure is important because the system’s performance is likely to change with any substantive modification. Essentially, we are dealing with a new model each time we make a modification. We are primarily interested in backtesting the measure since its last substantive modification.
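With a stored modification history, restricting a backtest to the current version of the measure is a simple filter. The sketch below assumes records are `(date, pnl, var)` tuples and that the first value-at-risk figure produced by the modified measure is dated on the modification date. Both are illustrative assumptions, as is the function name.

```python
from datetime import date


def since_last_modification(records, modification_dates):
    """Keep records dated on or after the latest substantive modification.

    records: iterable of (date, pnl, var) tuples (hypothetical layout)
    modification_dates: dates of substantive modifications, in any order
    """
    if not modification_dates:
        return list(records)
    cutoff = max(modification_dates)
    return [r for r in records if r[0] >= cutoff]


# Made-up data: the measure was modified on 1 January and again on 1 March.
records = [
    (date(2024, 2, 28), -50.0, 100.0),
    (date(2024, 3, 1), 20.0, 110.0),
    (date(2024, 3, 4), -130.0, 115.0),
]
modifications = [date(2024, 1, 1), date(2024, 3, 1)]

current = since_last_modification(records, modifications)
print(len(current))   # 2
```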