Abouarghoub, Wessam (2013). Implementing the new science of risk management to tanker freight markets, doctoral thesis, University of the West of England.

Alexander, Carol O. (2001). Market Models, Chichester: John Wiley & Sons.

Alexander, Carol O. and A. M. Chibumba (1997). Multivariate orthogonal factor GARCH, working paper.

Allen, M. (1994). Building a role model, Risk, 7 (9), 73-80.

Bai, Jushan, and Shuzhong Shi (2011). Estimating high dimensional covariance matrices and its applications, Annals of Economics & Finance, 12 (2), 199-215.

Baxter, Martin and Andrew Rennie (1996). Financial Calculus: An Introduction to Derivative Pricing, Cambridge: Cambridge University Press.

Basel Committee on Banking Supervision (1995). An Internal Model-Based Approach to Market Risk Capital Requirements.

Basel Committee on Banking Supervision (1996a). Amendment to the Capital Accord to Incorporate Market Risks.

Basel Committee on Banking Supervision (1996b). Supervisory Framework for the Use of “Backtesting” in Conjunction With the Internal Models Approach to Market Risk Capital Requirements.

Beck, Kent (2002). Test Driven Development: By Example, Reading: Addison-Wesley.

Beck, Kent and Cynthia Andres (2004). Extreme Programming Explained: Embrace Change, 2nd Edition, Reading: Addison-Wesley.

Berkowitz, Jeremy (2001). Testing density forecasts, with applications to risk management, Journal of Business & Economic Statistics, 19 (4), 465-474.

Berkowitz, Jeremy, Peter Christoffersen and Denis Pelletier (2011). Evaluating value-at-risk models with desk-level data, Management Science, 57 (12), 2213–2227.

Berkowitz, Jeremy and James O’Brien (2002). How accurate are value‐at‐risk models at commercial banks? Journal of Finance, 57 (3), 1093-1111.

Bernstein, Peter L. (1992). Capital Ideas: The Improbable Origins of Modern Wall Street, New York: Free Press.

Black, Fischer and Myron S. Scholes (1973). The pricing of options and corporate liabilities, Journal of Political Economy, 81 (3), 637-654.

Black, Fischer (1976). The Pricing of Commodity Contracts, Journal of Financial Economics, 3, 167-179.

Bollerslev, Tim (1986). Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, 31, 307-328.

Bollerslev, Tim (1990). Modelling the coherence in short-run nominal exchange rates: A multivariate generalized ARCH model, Review of Economics and Statistics, 72, 498-505.

Box, George E. P. and Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, New York: Wiley.

Britten-Jones, Mark and Stephen M. Schaefer (1999). Nonlinear value-at-risk, European Finance Review, 2 (2), 161-187.

Brockwell, Peter J. and Richard A. Davis (2010). Introduction to Time Series and Forecasting, 2nd ed., New York: Springer.

Burden, Richard L. and J. Douglas Faires (2010). Numerical Analysis, 9th ed., Boston: PWS Publishing Company.

Burghardt, Galen and Bill Hoskins (1995). A question of bias, Risk, 8 (3), 63-70.

Campbell, Rachel, Kees Koedijk and Paul Kofman (2002). Increased correlation in bear markets, Financial Analysts Journal, 58 (1), 87-94.

Campbell, Sean D. (2005). Finance and Economics Discussion Series, Washington: Federal Reserve Board.

Cárdenas, Juan, Emmanuel Fruchard, Etienne Koehler, Christophe Michel, and Isabelle Thomazeau (1997). Value-at-risk: One step beyond, Risk, 10 (10), 72-75.

Cárdenas, Juan, Emmanuel Fruchard, Jean-François Picron, Cecilia Reyes, Kristen Walters, and Weiming Yang (1999). Monte Carlo within a day, Risk, 12 (2), 55-59.

Chew, Lillian (1993). Made to measure, Risk, 6 (9), 78-79.

Christoffersen, Peter (1998). Evaluating interval forecasts. International Economic Review, 39 (4), 841-862.

Christoffersen, Peter and Denis Pelletier (2004). Backtesting value-at-risk: a duration-based approach, Journal of Financial Econometrics, 2 (1), 84-108.

Cockburn, Alistair (2000). Writing Effective Use Cases, Reading: Addison-Wesley.

Cornell, Bradford (1981). A note on taxes and the pricing of Treasury bill futures contracts, Journal of Finance, 36 (12), 1169-1176.

Cornell, Bradford and Marc R. Reinganum (1981). Forward and futures prices: Evidence from the foreign exchange markets, Journal of Finance, 36 (12), 1035-1045.

Cornish, E. A. and Ronald A. Fisher (1937). Moments and cumulants in the specification of distributions, Review of the International Statistical Institute, 5, 307-320.

Corrigan, Gerald (1992). Remarks before the 64th annual mid-Winter meeting of the New York State Bankers Association, January 30, Waldorf-Astoria, New York City: Federal Reserve Bank of New York.

Cotter, John and François Longin (2007). Implied correlations from value-at-risk, working paper, University College Dublin.

Coveyou, R. R. and R. D. MacPherson (1967). Fourier analysis of uniform random number generators, Journal of the Association for Computing Machinery, 14, 100-119.

Cox, John C., Jonathan E. Ingersoll, Jr., and Stephen A. Ross (1981). The relation between forward prices and futures prices, Journal of Financial Economics, 9, 321-346.

Crnkovic, Cedomir and Jordan Drachman (1996). Quality control, Risk, 9 (9), 138-143.

Culp, Christopher (2001). The Risk Management Process: Business Strategy and Tactics, New York: John Wiley & Sons.

da Silva, Alan Cosme Rodrigues, Claudio Henrique da Silveira Barbedo, Gustavo Silva Araújo and Myrian Beatriz Eiras das Neves (2006). Internal model validation in Brazil: analysis of value-at-risk backtesting methodologies, Revista Brasileira de Finanças, 4 (1), 363-384.

Dahlgren, Robert, Chen-Ching Liu and Jacques Lawarrée (2003). Risk assessment in energy trading. IEEE Transactions on Power Systems, 18 (2), 503-511.

Dale, Richard (1996). Risk and Regulation in Global Securities Markets, Chichester: John Wiley & Sons.

Davidson, Clive (1996). The data game, Firmwide Risk Management, special supplement to Risk, 9 (7), 39-44.

Davies, Robert B. (1973). Numerical inversion of a characteristic function, Biometrika, 60, 415-417.

Dennis, J. E. and Robert B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs: Prentice-Hall.

Ding, Z. (1994). Time series analysis of speculative returns, PhD thesis, San Diego: University of California.

Doherty, Neil A. (2000). Integrated Risk Management: Techniques and Strategies for Managing Corporate Risk, New York: McGraw-Hill.

Dowd, Kevin (2005). Measuring Market Risk, 2nd ed., Chichester: John Wiley & Sons.

Dusak, Katherine (1973). Futures trading and investor returns: an investigation of commodity market risk premiums, Journal of Political Economy, 81, 1387-1406.

Eckhardt, Roger (1987). Stan Ulam, John von Neumann, and the Monte Carlo method, Los Alamos Science, Special Issue (15), 131-137.

Eichenauer, J. and J. Lehn (1986). A non-linear congruential pseudo random number generator, Statistical Papers, 27, 315-326.

Eichenauer-Herrmann, J. (1993). Statistical independence of a new class of inversive congruential pseudorandom numbers, Mathematics of Computation, 60, 375-384.

Engle, Robert F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation, Econometrica, 50, 987-1008.

Engle, Robert F. (2000). Dynamic conditional correlation—A simple class of multivariate GARCH models, working paper.

Engle, Robert F. and K. F. Kroner (1995). Multivariate simultaneous generalized ARCH, Econometric Theory, 11, 122-150.

Engle, Robert F. and Simone Manganelli (2004). CAViaR: conditional autoregressive value-at-risk by regression quantiles, Journal of Business and Economic Statistics, 22 (4), 367–381.

Engle, Robert F. and Kevin Sheppard (2001). Theoretical and empirical properties of dynamic conditional correlation multivariate GARCH, working paper.

Evans, Michael and Tim Swartz (2000). Approximating Integrals Via Monte Carlo and Deterministic Methods, Oxford: Oxford University Press.

Fallon, William (1996). Calculating value-at-risk, working paper.

Filliben, James J. (1975). Probability plot correlation coefficient test for normality, Technometrics, 17 (1), 111–117.

Fincke, U. and M. Pohst (1985). Improved methods for calculating vectors of short length in a lattice, including a complexity analysis, Mathematics of Computation, 44 (170), 463-471.

Finger, Christopher (1997). A methodology for stress correlation, Risk Metrics Monitor (Fourth Quarter), 3-11.

Finger, Christopher (2006). How historical simulation made me lazy, RiskMetrics Group Research Monthly.

Fishman, George S. (1996). Monte Carlo: Concepts, Algorithms, and Applications, New York: Springer-Verlag.

Forbes, Catherine, Merran Evans, Nicholas Hastings, and Brian Peacock (2010). Statistical Distributions, 4th ed., New York: John Wiley & Sons.

Francis, Stephen C. (1985). Correspondence appearing in: United States House of Representatives (1985). Capital Adequacy Guidelines for Government Securities Dealers Proposed by the Federal Reserve Bank of New York: Hearings Before the Subcommittee on Domestic Monetary Policy of the Committee on Banking, Finance and Urban Affairs, Washington: US Government Printing Office, 251-252.

Franses, Philip Hans (1998). Time Series Models for Business and Economic Forecasting, Cambridge: Cambridge University Press.

Franses, Philip Hans and Dick van Dijk (2000). Non-Linear Time Series Models in Empirical Finance, Cambridge: Cambridge University Press.

French, Kenneth R. (1980). Stock returns and the weekend effect, Journal of Financial Economics, 8, 55-69.

French, Kenneth R. (1983). A comparison of futures and forward prices, Journal of Financial Economics, 12, 311-342.

French, Kenneth R. and Richard Roll (1986). Stock return variance: the arrival of information and the reaction of traders, Journal of Financial Economics, 17, 5-26.

Fuglsbjerg, Brian (2000). Variance reduction techniques for Monte Carlo estimates of value-at-risk, working paper.

Garbade, Kenneth D. (1986). Assessing risk and capital adequacy for Treasury securities, Topics in Money and Securities Markets, 22, New York: Bankers Trust.

Garman, Mark B. and Steven W. Kohlhagen (1983). Foreign currency option values, Journal of International Money and Finance, 2, 231-237.

Gärtner, von Bernd (1999). Ein reinfall mit computer-zufallszahlen, Mitteilungen der Deutschen Mathematiker-Vereinigung, 2, 55-60.

Geiss, Charles G. (1995). Distortion-free futures price series, Journal of Futures Markets, 15 (7), 805-831.

Gentle, James E. (1998). Numerical Linear Algebra for Applications in Statistics, New York: Springer-Verlag.

Gibbons, Michael R. and Patrick Hess (1981). Day of the week effect and asset returns, Journal of Business, 54, 579-596.

Glasserman, Paul (2003). Monte Carlo Methods in Financial Engineering, New York: Springer.

Glasserman, Paul, Philip Heidelberger, and Perwez Shahabuddin (2000). Variance reduction techniques for estimating value-at-risk, Management Science, 46 (10), 1349 – 136.

Goldfeld, Stephen M. and Richard E. Quandt (1973). A Markov model for switching regressions, Journal of Econometrics, 1, 3-16.

Goldman Sachs and SBC Warburg Dillon Read (1998). The Practice of Risk Management, London: Euromoney Books.

Golub, Gene H. and Charles F. Van Loan (1996). Matrix Computations, 3rd ed., Baltimore: Johns Hopkins University Press.

Gridgeman, N. T. (1960). Geometric probability and the number π, Scripta Mathematica, 25, 183-195.

Group of 30 (1993). Derivatives: Practices and Principles, Washington: Group of 30.

Group of 30 (1994). Derivatives: Practices and Principles, Appendix III: Survey of Industry Practice, Washington: Group of 30.

Guldimann, Till M. (1995). Risk measurement framework, RiskMetrics—Technical Document, 3rd ed., New York: Morgan Guaranty, 6-45.

Guldimann, Till M. (2000). The story of RiskMetrics, Risk, 13 (1), 56-58.

Gupta, Anurag and Marti G. Subrahmanyam (2000). An empirical examination of the convexity bias in the pricing of interest rate swaps, Journal of Financial Economics, 55 (2), 239-279.

Haas, Marcus (2001). New methods in backtesting. Mimeo, Financial Engineering Research Center Caesar, Friedensplatz, Bonn.

Haas, Marcus (2005). Improved duration-based backtesting of value-at-risk, Journal of Risk, 8 (2), 17–38.

Han, Chulwoo, Frank C. Park, and Jangkoo Kang (2007). Efficient value-at-risk estimation for mortgage-backed securities, Journal of Risk, 9 (3), 37-61.

Hall, Asaph (1873). On an experimental determination of π, Messenger of Mathematics, 2, 113-114.

Hamilton, James D. (1993). Estimation, inference and forecasting of time-series subject to changes in regime, in G. S. Maddala, C. R. Rao and H. D. Vinod (editors), Handbook of Statistics, vol. 11: Econometrics, New York: North-Holland.

Hamilton, James D. (1994). Time Series Analysis, Princeton: Princeton University Press.

Hammersley, J. M. and D. C. Handscomb (1964). Monte Carlo Methods, New York: John Wiley & Sons.

Hartman, Joel, and Jan Sedlak (2013). Forecasting conditional correlation for exchange rates using multivariate GARCH models with historical value-at-risk application, working paper.

Haug, Espen G. (1997). The Complete Guide to Option Pricing Formulas, 2nd ed., New York: McGraw-Hill.

Hellekalek, P. (1998). Good random number generators are (not so) easy to find, Mathematics and Computers in Simulation, 46, 485-505.

Hendricks, Darryll (1996). Evaluation of value-at-risk models using historical data, Federal Reserve Bank of New York Economic Policy Review, April.

Higham, Nicholas J. (2002). Computing the nearest correlation matrix—a problem from finance, IMA Journal of Numerical Analysis, 22 (3), 329-343.

Heron, Dan and Richard Irving (1996). Banks grasp value-at-risk nettle, A Risk Special Supplement, Risk, June, 16–21.

Holton, Glyn A. (1998). Simulating value-at-risk, Risk, 11 (5), 60-63.

Holton, Glyn A. (2004). Defining risk, Financial Analysts Journal, 60 (6), 19–25.

Hughston, Lane (1996). Vasicek And Beyond: Approaches to Building and Applying Interest Rate Models. London: Risk Publications.

Hughston, Lane (1999). Options: Classic Approaches to Pricing and Modelling. London: Risk Books.

Hull, John C. (2011). Options, Futures, and Other Derivatives, 8th ed., Englewood Cliffs: Prentice Hall.

Hull, John and Alan White (1998). Incorporating volatility updating into the historical simulation method for value-at-risk, Journal of Risk, 1 (1), 5-19.

Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables, Biometrika, 48, 419-426.

James, Jessica and Nick Webber (2000). Interest Rate Modelling, Chichester: John Wiley & Sons.

Jamshidian, Farshid and Yu Zhu (1997). Scenario simulation: Theory and methodology, Finance and Stochastics, 1 (1), 43-67.

Jarrow, Robert A. (editor) (1998). Volatility: New Estimation Techniques for Pricing Derivatives, London: Risk Books.

Jarrow, Robert A. and George S. Oldfield (1981). Forward contracts and futures contracts, Journal of Financial Economics, 9, 373-382.

Jaschke, Stefan R. (2001). The Cornish-Fisher-expansion in the context of delta-gamma-normal approximations, Journal of Risk, 4(4), 33-52.

Jaschke, Stefan R. and Peter Mathé (2004). Stratified sampling for risk management, unpublished manuscript.

Johnson, Dallas E. (1998). Applied Multivariate Methods for Data Analysts, Pacific Grove: Duxbury Press.

Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation, Biometrika, 36, 149-176.

Judge, George G., R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-Chao Lee (1988). The Theory and Practice of Econometrics, 2nd ed., New York: John Wiley & Sons.

Kercheval, Alec N. (2008). Optimal covariances in risk model aggregation, Proceedings of the Third IASTED International Conference on Financial Engineering and Applications, ACTA Press, Calgary, 30-35.

Klaassen, Franc (2000). Have exchange rates become more closely tied? Evidence from a new multivariate GARCH model, working paper.

Knuth, Donald E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd ed., Reading: Addison-Wesley.

Kolb, Robert W. (2006). Understanding Futures Markets, 6th ed., Malden: Blackwell.

Korn, Ralf and Mykhailo Pupashenko (2015). A new variance reduction technique for estimating value-at-risk, Applied Mathematical Finance, 22(1), 83-98.

Kupiec, Paul H. (1995). Techniques for verifying the accuracy of risk measurement models, Journal of Derivatives, 3 (2), 73–84.

Larman, Craig (2003). Agile and Iterative Development: A Manager’s Guide, Reading: Addison-Wesley.

Lad, Frank (1996). Operational Subjective Statistical Methods: A Mathematical, Philosophical, and Historical Introduction, New York: John Wiley & Sons.

Laplace, Pierre Simon Marquis de (1878-1912). Oeuvres Complètes de Laplace, Paris: Gauthier-Villars.

Leavens, Dickson H. (1945). Diversification of investments, Trusts and Estates, 80 (5), 469-473.

L’Ecuyer, Pierre (1998). Random number generation, in Jerry Banks (editor), Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice, New York: John Wiley & Sons.

L’Ecuyer, Pierre (1999). Good parameter sets for combined multiple recursive random number generators, Operations Research, 47 (1), 159-164.

L’Ecuyer, P., F. Blouin and R. Couture (1993). A search for good multiple recursive random number generators, ACM Transactions on Modeling and Computer Simulation, 3 (2), 87-98.

Leffingwell, Dean and Don Widrig (1999). Managing Software Requirements: A Unified Approach, Reading: Addison-Wesley.

Lehmann, E. L. and Joseph P. Romano (2005). Testing Statistical Hypotheses, 3rd ed., New York: Springer.

Lehmer, D. H. (1951). Mathematical methods in large-scale computing units, Proceedings of a Second Symposium on Large-Scale Digital Calculating Machinery. Cambridge: Harvard University Press, 141-146.

Leipnik, R. B. (1991). Lognormal random variables, Journal of the Australian Mathematical Society, Series B, 32, 327-347.

Leong, Kenneth S. (1996). The right approach, Value-at-Risk, A Risk Special Supplement, Risk Magazine, June, 9–14.

Levine, Robert S. (2007). Implementing Systems Solutions for Financial Risk Management, London: Risk Books.

Lewis, P. A. W., A. S. Goodman and J. M. Miller (1969). A pseudo-random number generator for the System/360, IBM Systems Journal, 8 (2), 136-145.

Li, Qingna, Donghui Li and Houduo Qi (2010). Newton’s method for computing the nearest correlation matrix with a simple upper bound, Journal of Optimization Theory and Applications, 147 (3), 546-568.

Lietaer, Bernard A. (1971). Financial Management of Foreign Exchange: An Operational Technique to Reduce Risk, Cambridge: MIT Press.

Linsmeier, Thomas J. and Neil D. Pearson (1996). Risk Measurement: An Introduction to Value at Risk, unpublished manuscript.

Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, Review of Economics and Statistics, 47, 13-37.

Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models, Biometrika, 65 (2), 297-303.

Longerstaey, Jacques (1995). Mapping to describe positions, RiskMetrics—Technical Document, 3rd ed., New York: Morgan Guaranty, 107-156.

Lopez, Jose A. (1999). Methods for evaluating value-at-risk models, Federal Reserve Bank of San Francisco Economic Review, 2, 3-17.

Lyons, Richard K. (1995). Tests of microstructure hypotheses in the foreign exchange market, Journal of Financial Economics, 39, 321-351.

Macaulay, Frederick R. (1938). The Movements of Interest Rates, Bond Yields and Stock Prices in the United States since 1856, New York: National Bureau of Economic Research.

Ma, Christopher K., Jeffrey M. Mercer and Matthew A. Walker (1992). Rolling over futures contracts: A note, Journal of Futures Markets, 12 (2), 203-217.

Malz, Alan M. (2011). Financial Risk Management: Models, History, and Institutions, Chichester: John Wiley & Sons.

Mark, Robert (1991). Units of management. Balance Sheet (distributed in Risk, 4 (6)), 3-7.

Markowitz, Harry M. (1952). Portfolio Selection, Journal of Finance, 7 (1), 77-91.

Markowitz, Harry M. (1959). Portfolio Selection: Efficient Diversification of Investments, New York: John Wiley & Sons.

Markowitz, Harry M. (1999). The early history of portfolio theory: 1600-1960, Financial Analysts Journal, 55 (4), 5-16.

Marsaglia, G. (1968). Random numbers fall mainly in the planes, Proceedings of the National Academy of Sciences USA, 61, 25-28.

Marshall, Chris and Michael Siegel (1997). Value at risk: implementing a risk measurement standard, Journal of Derivatives, 4 (3), 91-110.

Mathai, A. M. and Serge B. Provost (1992). Quadratic Forms in Random Variables, New York: Marcel Dekker.

McLeod, A. I. and W. K. Li (1983). Diagnostic checking ARMA time series models using squared residual autocorrelations, Journal of Time Series Analysis, 4 (4), 269–273.

Metropolis, Nicholas (1987). The beginning of the Monte Carlo method, Los Alamos Science, Special Issue (15), 125-130.

Metropolis, Nicholas and Stanislaw Ulam (1949). The Monte Carlo method, Journal of the American Statistical Association, 44 (247), 335-341.

Mills, Terence C. (1999). The Econometric Modelling of Financial Time Series, 2nd ed., Cambridge: Cambridge University Press.

Mina, Jorge and Andrew Ulmer (1999). Delta-Gamma four ways. Technical report, RiskMetrics Group.

Mittnik, Stefan (2014). Value-at-risk-implied tail-correlation matrices, Economics Letters, 122 (1), 69-73.

Molinari, Steven L. and Nelson S. Kibler (1983). Broker-dealers’ financial responsibility under the Uniform Net Capital Rule—a case for liquidity, Georgetown Law Journal, 72 (1), 1-37.

Morgan, Byron J. T. (1984). Elements of Simulation. London: Chapman & Hall.

Morgan Guaranty (1996). RiskMetrics—Technical Document, 4th ed., New York: Morgan Guaranty.

Mossin, Jan (1966). Equilibrium in a capital asset market, Econometrica, 34, 768-783.

Niederreiter, Harald (1992). Random Number Generation and Quasi-Monte Carlo Methods. Philadelphia: Society for Industrial and Applied Mathematics.

Office of the Comptroller of the Currency (2000). OCC Bulletin 2000–16: Model Validation, Washington: Office of the Comptroller of the Currency.

O’Neil, Catherine (2010). Measuring CDS value-at-risk. Risk Metrics Working Papers.

Opschoor, Anne, Dick van Dijk and Michel van der Wel (2013). Predicting covariance matrices with financial conditions indexes (No. TI 13-113/III, pp. 1-43). Tinbergen Institute Discussion Paper Series.

Pan, Jun and Darrell Duffie (1997). An Overview of value-at-risk, Journal of Derivatives, 4 (3), 7-49.

Park, Hun Y. and Andrew H. Chen (1985). Differences between futures and forward prices: A further investigation of the mark-to-market effects, Journal of Futures Markets, 5 (1), 77-88.

Park, Stephen K. and Keith W. Miller (1988). Random number generators: good ones are hard to find, Communications of the ACM, 31 (10), 1192-1201.

Patel, Jagdish K. (1996). Handbook of the Normal Distribution, 2nd ed., New York: Marcel Dekker.

Pérignon, Christophe, Zi Yin Deng and Zhi Jun Wang (2008). Do banks overstate their value-at-risk? Journal of Banking & Finance, 32 (5), 783-794.

Pérignon, Christophe and Daniel R. Smith (2010). The level and quality of Value-at-Risk disclosure by commercial banks, Journal of Banking & Finance, 34 (2), 362-377.

Pichler, Stefan and Karl Selitsch (2000). A comparison of analytic value-at-risk methodologies for portfolios that include options, Model Risk, Concepts, Calibration and Pricing, Rajna Gibson (editor), London: Risk Books.

Press, W., S. Teukolsky, W. Vetterling and B. Flannery (1995). Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge: Cambridge University Press.

Pritsker, Matthew (2006). The hidden dangers of historical simulation, Journal of Banking & Finance, 30 (2), 561-582.

Pupashenko, Mykhailo (2014). Variance reduction technique for estimating value-at-risk based on cross-entropy, Journal of Mathematics and System Science, 4(1), 37-48.

Qi, Houduo and Defeng Sun (2010). Correlation stress testing for value-at-risk: an unconstrained convex optimization approach, Computational Optimization and Applications, 45 (2), 427-462.

Questa, Giorgio S. (1999). Fixed Income Analysis for the Global Financial Market: Money Market, Foreign Exchange, Securities, and Derivatives, New York: John Wiley & Sons.

Rebonato, Riccardo and Peter Jäckel (1999). The most general methodology to create a valid correlation matrix for risk management and option pricing purposes, Journal of Risk, 2(2), 17-27.

Reuters (2000). An Introduction to The Commodities, Energy & Transport Markets, Singapore: John Wiley & Sons.

Rota, Gian-Carlo (1987). The lost cafe, Los Alamos Science, Special Issue (15), 23-32.

Rouvinez, Christophe (1997). Going Greek with value-at-risk, Risk, 10 (2), 57-65.

Roy, Arthur D. (1952). Safety first and the holding of assets, Econometrica, 20 (3), 431-449.

Røynstrand, Torgeir, Nils Petter Nordbø and Vidar Kristoffer Strat (2012). Evaluating power of value-at-risk backtests, master’s thesis, Norwegian University of Science and Technology.

Rubinstein, Reuven Y. (2007). Simulation and the Monte Carlo Method, 2nd ed., New York: John Wiley & Sons.

Saff, E. B. and A. B. J. Kuijlaars (1997). Distributing many points on a sphere, Mathematical Intelligencer, 19 (1), 5-11.

Schneider, Geri and Jason P. Winters (2001). Applying Use Cases: A Practical Guide, 2nd ed., Reading: Addison-Wesley.

Schrock, Nichols W. (1971). The theory of asset choice: simultaneous holding of short and long positions in the futures market, Journal of Political Economy, 79, 270-293.

Schwaber, Ken and Mike Beedle (2001). Agile Software Development with Scrum, Upper Saddle River: Prentice Hall.

Sharpe, William F. (1963). A simplified model for portfolio analysis, Management Science, 9, 277-293.

Sharpe, William F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk, Journal of Finance, 19 (3), 425-442.

Shirreff, David (1992). Swap and think, Risk, 5 (3), 29-35.

Singh, Manoj K. (1997). Value-at-risk using principal components analysis, Journal of Portfolio Management, 24 (1), 101-112.

Sironi, Andrea and Andrea Resti (2007). Risk Management and Shareholders’ Value in Banking, Chichester: John Wiley & Sons.

Solomon, H. and M.A. Stephens (1977). Distribution of a sum of weighted chi-square variables, Journal of the American Statistical Association, 72, 881-885.

Spanos, Aris (1999). Probability Theory and Statistical Inference: Econometric Modeling with Observational Data, Cambridge: Cambridge University Press.

Steinberg, Richard M. (2011). Governance, Risk Management, and Compliance, Chichester: John Wiley & Sons.

Stoyanov, Jordan (1997). Counterexamples in Probability, 2nd ed., Chichester: John Wiley & Sons.

Strang, Gilbert (2005). Linear Algebra and Its Applications, 4th ed., Brooks Cole.

Stuart, Alan and J. Keith Ord (1994). Kendall’s Advanced Theory of Statistics, Volume 1: Distribution Theory, London: Arnold.

Student (1908a). The probable error of a mean. Biometrika, 6, 1-25.

Student (1908b). Probable error of a correlation coefficient. Biometrika, 6, 302-310.

Tobin, James (1958). Liquidity preference as behavior towards risk, The Review of Economic Studies, 25, 65-86.

Todhunter, Isaac (1865). History of the Mathematical Theory of Probability from the Time of Pascal to that of Laplace, Cambridge: Cambridge University Press. Reprinted (1949), New York: Chelsea.

Treynor, Jack (1961). Towards a theory of market value of risky assets, unpublished manuscript.

Tsay, Ruey S. (2013). Multivariate Time Series Analysis: With R and Financial Applications, John Wiley & Sons.

Vasey, G. M. and Andrew Bruce (2010). Trends in Energy Trading, Transaction and Risk Management Software 2009 – 2010, CreateSpace Independent Publishing Platform.

Viswanath, P. V. (1989). Taxes and the futures-forward price difference in the 91-day T-bill market, Journal of Money, Credit and Banking, 21 (2), 190-205.

Walmsley, Julian (2000). The Foreign Exchange and Money Markets Guide, 2nd ed., New York: John Wiley & Sons.

Wilmott, Paul, Jeff Dewynne, and Sam Howison (1993). Option Pricing: Mathematical Models and Computation, Oxford: Oxford Financial Press.

Wilson, Thomas (1992). Raroc remodeled, Risk, 5 (8), 112-119.

Wilson, Thomas (1993). Infinite wisdom, Risk, 6 (6), 37-45.

Wilson, Thomas (1994a). Debunking the myths, Risk, 7 (4), 67-73.

Wilson, Thomas (1994b). Common methods of calculating capital at risk, Risk, 7 (10), 78-80.

Wilson, William W., William E. Nganje and Cullen R. Hawes (2007). Value-at-risk in bakery procurement, Review of Agricultural Economics, 29 (3), 581-595.

Zangari, Peter (1994). Estimating volatilities and correlations, RiskMetrics—Technical Document, 2nd ed., New York: Morgan Guaranty, 43-66.

Zangari, Peter (1996a). Data and Related Statistical Issues, RiskMetrics—Technical Document, 4th ed., New York: Morgan Guaranty, 163-196.

Zangari, Peter (1996c). Market risk methodology, RiskMetrics—Technical Document, 4th ed., New York: Morgan Guaranty, 105-148.

Ziggel, Daniel, Tobias Berens, Gregor Weiss and Dominik Wied (2014). A new set of improved value-at-risk backtests, Journal of Banking and Finance, 48, 29-41.



Chapter 1

  1. Dowd (2005) discusses ETL metrics.
  2. Recall that standard deviation is the square root of variance.
  3. Gradient approximations are discussed in Section 2.3.
  4. As obtained with a Monte Carlo transformation.
  5. Some value-at-risk measures make simplifying assumptions that render the value of ⁰p unnecessary—it drops out of the calculations. Others either accept ⁰p as an input or calculate it based on current values of key factors.
  6. See Dale (1996) pp. 60 – 61 and Molinari and Kibler (1983) footnote 41.
  7. See Dale (1996), p. 78.
  8. The Basel Committee on Banking Supervision is a standing committee comprising representatives from central banks and regulatory authorities. Over time, the focus of the committee has evolved, embracing initiatives designed to define roles of regulators in cross-jurisdictional situations; ensure that international banks or bank holding companies do not escape comprehensive supervision by some “home” regulatory authority; and promote uniform capital requirements so banks from different countries may compete with one another on a “level playing field.” Although the Basel Committee’s recommendations do not themselves have force of law, G-10 countries have often implemented those recommendations as statutes or regulations.
  9. Personal correspondence with the author.
  10. These value-at-risk measures are described by Chew (1993).
  11. Founded in 1978, the Group of 30 is a nonprofit organization of senior executives, regulators, and academics. Through meetings and publications, it seeks to deepen understanding of international economic and financial issues. Results of the Price Waterhouse study are reported in Group of 30 (1994).
  12. This incident is documented in Shirreff (1992). See Corrigan (1992) for a full text of the speech.
  13. The name “dollars-at-risk” appears as early as Mark (1991) and “capital-at-risk” as early as Wilson (1992).
  14. The above discussion of RiskMetrics is based upon Guldimann (2000), the author’s own recollections, and private correspondence with Till Guldimann.

Chapter 2

  1. It should be apparent from context when parentheses ( ) are being used to indicate an interval as opposed to an ordered pair.
  2. It should be clear from context whether a prime indicates differentiation of a function as opposed to transposition of a vector or matrix.
  3. Our name reflects the method’s similarity to the method of ordinary least squares. See Exercise 2.16.
  4. In [2.61] if b = 0, set b = |b| = 0.
  5. Consider the equation z² + 4z + 5 = 0. Factoring the left side, we obtain (z + 2 + i)(z + 2 − i) = 0, indicating the two solutions z = −2 − i and z = −2 + i. Now consider the equation z² − 4z + 10 = 6. Subtracting 6 from both sides and factoring, we obtain (z − 2)(z − 2) = 0. This has two solutions, but they coincide. We say that the equation has the repeated solution z = 2.
  6. If the vertical-bar notation of [2.131] is unfamiliar to you, it is read as “evaluated at”, so the left side of the approximation indicates a derivative evaluated at a specific point x[0].
  7. See Dennis and Schnabel (1983) for a more sophisticated solution.

Chapter 3

  1. For technical reasons, we should qualify [3.2] and say that it may fail to hold on a set of values for X of probability 0.
  2. Technically, f must be measurable for f (X) to be a random variable.
  3. The use of subscripts in the notation η1 and η2 for skewness and kurtosis is unfortunate because it can lead to confusion if subscripts are also employed to distinguish between different random variables. We use the notation because it is well established.
  4. We could force uniqueness by defining the q-quantile as the supremum of all values satisfying the definition provided in the text.
  5. An alternative would be to derive a mean vector based upon interest rate parity.
  6. A set of vectors is orthonormal if they are orthogonal and normalized (i.e. of length 1).
  7. Treatment of the noncentrality parameter is not standardized in the literature. Some authors define the parameter as in [3.114] but denote it simply δ. Others define the parameter differently, for example, taking a square root in [3.114] or dividing the sum by 2.
  8. The gamma function is defined for any y > 0. It is related to the factorial function by Γ(y) = (y – 1)! for positive integers y.
  9. See Stoyanov (1997) for more counterexamples relating to the joint-normal distribution.
  10. We discuss random variate generation in Chapter 5.
  11. We lose no generality by assuming Σ is positive-definite. If Σ were singular, we could perform dimensional reduction as described in Section 3.6.1 to obtain a positive-definite joint-normal random vector.
  12. This and the analysis of Exhibit 3.28 were performed with the Monte Carlo method, which we describe in Chapter 5.
  13. See Spanos (1999) for a detailed discussion including historical notes.
  14. I am indebted to Arcady Novosyolov for this simplification.

Chapter 4

  1. See Holton (2004) for an in-depth discussion of subjective vs. objective probabilities in the context of risk.
  2. Usage of the term “sample variance” is inconsistent. Some authors define estimator [4.27] as the sample variance.
  3. We consider only discrete processes. With continuous processes, t takes on real values.
  4. Usage of the term “white noise” is not uniform. Some authors use the term to mean Gaussian white noise.
  5. Such a realization can be constructed using techniques of random variate generation described in Chapter 5.

Chapter 5

  1. Ulam and Teller were fierce rivals during their tenure at Los Alamos. Rota (1987) indicates that Ulam’s significant contribution to the design of the hydrogen bomb resulted coincidentally from his efforts to prove Teller’s design infeasible.
  2. This incident is described in Eckhardt (1987).
  3. W. S. Gossett, who published under the pen name “Student,” randomly sampled from height and middle-finger measurements of 3000 criminals to simulate two correlated normal distributions. He discusses this methodology in both Student (1908a) and Student (1908b).
  4. Laplace had previously described the potential for statistical sampling to approximate solutions to nonrandom problems, including the valuation of definite integrals. See Chapter V of his Théorie Analytique des Probabilités and a 1781 memoir, both available in his collected works published between 1878 and 1912.
  5. See Eckhardt (1987) and Metropolis (1987) for historical accounts of this early work.
  6. Metropolis and Ulam (1949).
  7. Buffon communicated this problem to the Academy in 1733. See Todhunter (1865) for an historical account of Buffon’s work.
  8. See Chapter V of his Théorie Analytique des Probabilités, available in his collected works published between 1878 and 1912.
  9. Fox’s experiment is reported by Hall (1873).
  10. The approximation may be too good. Gridgeman (1960) documents a number of historical implementations of the needle dropping experiment. He includes a statistical analysis of the plausibility of Fox’s reported results.
  11. In constructing a realization of a sample of size m, we have m degrees of freedom. These allow us to simultaneously satisfy m independent conditions. However, the existence of infinitely many possible tests means there are infinitely many conditions to satisfy. With a continuous distribution, infinitely many of the tests will be independent.
  12. The particular generator used in this analysis was the so-called DRAND48 linear congruential generator, which has parameters η = 2⁴⁸, a = 25,214,903,917 and c = 0. We discuss linear congruential generators next.
  13. Lehmer (1951) considers the case c = 0. Obviously, if c = 0 and z[k–1] = 0, then all subsequent values z[k], z[k+1], z[k+2], … will equal 0. This can’t happen as long as 0 is not used as a seed and η is not divisible by a.
  14. This is the name given to the generator in IBM’s System/360 Scientific Subroutine Package, Version III, Programmer’s Manual, 1968.
  15. See, for example, Park and Miller (1988).
  16. More generally, all that is required is that b not be divisible by η.
  17. Knuth (1997, p. 103) indicates that the number of calculations for dimension n is on the order of 3ⁿ. Fincke and Pohst (1985) provide a more detailed complexity analysis.
  18. See J. M. Hammersley and D. C. Handscomb (1964). Monte Carlo Methods, New York: John Wiley & Sons., p. 50.

Chapter 6

  1. Time differences reflect periods when daylight savings time is nowhere in effect.
  2. The two exchanges have an offset arrangement that allows an open contract on one exchange to be closed on the other, so the two exchanges’ contracts are truly fungible.
  3. Open outcry trading for both contracts ends at 2:00 PM each day.
  4. Trading closes in Singapore at 7:00 PM local time.
  5. Log returns are used. Data from days when any of the exchanges were closed is omitted from the calculation. Results are inferred using the method of uniformly weighted moving averages (UWMA) discussed in Chapter 7.
  6. The CSCE is a subsidiary of the New York Board of Trade (NYBOT).
  7. A normalized delta is an option’s delta divided by the option’s notional amount. For vanilla options, normalized deltas are between –1 and 1.

Chapter 7

  1. The effect is partially offset if short-term interest rates are more volatile than long-term interest rates.
  2. Exponentially weighted moving average estimation had been used in time series analysis for some time. Zangari’s contribution is to propose its use in value-at-risk analyses.
  3. A histogram of a time series can be treated as a realization of a sample from the unconditional distribution of the underlying stochastic process if the process is strictly stationary.

Chapter 8

  1. Classic papers include French (1980), Gibbons and Hess (1981) and French and Roll (1986).
  2. If there are no intervening nontrading days, an overnight loan is a loan that commences today and matures tomorrow. A tom-next (tomorrow-next) loan commences tomorrow and matures the next day. A spot-next loan commences in 2 days (spot) and matures the next day. Such loans are convenient for extending an existing loan by a day.

Chapter 9

  1. For simplicity, we assume the portfolio is due to receive no fixed cash flows from caplets whose rate-determination dates have already passed.
  2. Notation 0E( ) indicates an expected value conditional on information available at time 0. See Section 0.4. The vertical bar to the right of each partial derivative is read “evaluated at”, so both partial derivatives are “evaluated at 0E(1R)”. See Section 2.2.4.
  3. Any such portfolio would also have exposures to interest rates and implied volatilities. For this example, we treat these as constant.
  4. Formula [9.30] defines an ellipsoid as long as 1|0Σ (and hence 1|0Σ–1) is positive definite.
  5. A trivial solution is to space the points at equal intervals about the equator of the sphere. This works in all cases and is perfectly symmetrical, but it is uninteresting for our purpose.
  6. It is not uniquely defined. Two stable arrangements of 16 electrons are possible. However, one of these has a lower potential energy as defined by [9.35].
  7. The algorithm is not intended to reproduce the exact motion of l – 1 electrons. To the precision dictated by our stopping condition, the result will be a locally minimum-energy configuration.
  8. Corresponding put deltas are –.75, –.50, and –.25.
  9. These may have negative or imaginary values due to roundoff error.
  10. I am indebted to Craig Dibble, formerly of Bankers Trust, for bringing Garbade’s paper to my attention.
  11. If the basis point covariances seem large, remember that they are based on data from the 1980s.
  12. Because all eigenvectors have length 1, it is meaningful to directly compare variances of corresponding principal components.
  13. In practice, we might not apply a principal-component remapping to eliminate just two dimensions. We apply the remapping here for practice.
  14. For expositional convenience, we change our units of measure from the earlier example.

Chapter 10

  1. In Chapter 3, we adopted the inverse CDF notation Φ⁻¹(q) for quantiles. This was because, if a random variable has unique quantiles, they equal corresponding values of the inverse CDF. A random variable has unique quantiles for q ∈ (0,1) if it is continuous with a PDF that is nonzero on some interval (which can be unbounded or all of the real numbers) and zero elsewhere. In essentially all value-at-risk applications, random variables 1L and 1P have conditional distributions that satisfy this criterion. Contrived exceptions are possible; consider a portfolio composed entirely of expiring digital options.
  2. Value-at-risk measures that employ these have sometimes been called delta-gamma value-at-risk measures, reflecting an assumption that the transformation procedure would be preceded by a quadratic remapping based on delta-gamma approximations. The name is unfortunate because, as explained in Sections 9.3.6 – 9.3.7, quadratic remappings should be based on less localized approximations, such as those obtained by interpolation or the method of least squares.
  3. To clarify notation, in Section 1.8.1 we mathematically defined a portfolio as an ordered pair (0p, 1P). Accordingly, notation (53600, 1P) tells us that 0p = 53600.
  4. This is worth emphasizing. For a given sample size and value-at-risk metric, standard error depends entirely upon the PDF of 1P. The Monte Carlo transformation procedure works by constructing a realization of a sample for 1P. The actual mechanics of how that realization is constructed are unimportant for standard error. Factors such as the composition of the portfolio, the number of key factors upon which it depends, or the portfolio mapping affect standard error only to the extent that they shape the PDF of 1P. If we know the PDF of 1P, we don’t need to consider these other factors. Understand this, and you will understand why the Monte Carlo method does not suffer from the curse of dimensionality.
  5. Each Monte Carlo analysis was performed with sample size m = 1000. Standard errors for a sample size of m = 20000 were estimated by taking the sample standard deviation of the results and dividing by the square root of 20.

Chapter 11

  1. Allen (1994) and Wilson (1994b) had already described variants of the approach. Also, as part of the public rollout of RiskMetrics, J.P. Morgan distributed a document entitled RiskMetrics – Directory of Products. This listed third-party vendor products that were compatible with RiskMetrics. One of the vendors, Sailfish, was indicated as offering historical simulation in addition to a value-at-risk measure styled on RiskMetrics.
  2. Wilson (1994b).
  3. Heron and Irving (1996).

Chapter 12

  1. Such a crude value-at-risk measure would probably treat implied volatilities as constant. It could capture vega effects by modeling implied volatilities as key factors. In my work, I have come across a number of linear value-at-risk measures inappropriately applied to non-linear portfolios. All lacked the sophistication to model implied volatilities as key factors.

Chapter 14

  1. Source: 2008 phone interview with Till Guldimann.
  2. Values of n greater than 1 generally don’t come into play.
  3. Since a continuous distribution is being used to approximate a discrete one, a case could be made that rounding the lower solution up and the higher one down would be more consistent with [14.8], but we present the test as Kupiec specified it.
  4. They refer readers to Press et al. (1992) for a description of Kuiper’s statistic.
  5. See Berkowitz and O’Brien (2002) and Pérignon, Deng and Wang (2008).

Title Page

Theory and Practice
Second Edition
Glyn A. Holton
Published by the author.


copyright © 2014, Glyn A. Holton
All rights reserved. No part of this book may be reproduced or transmitted without the express written permission of the author except for the use of brief quotations.
Second Edition, 2014

Holton, Glyn A. (2014). Value-at-Risk: Theory and Practice, second edition, e-book published by the author at

First Edition, 2003

Holton, Glyn A. (2003). Value-at-Risk: Theory and Practice, San Diego: Academic Press.

Published by the author
66 Winslow Rd.
Belmont, MA 02478
United States

14.8  Further Reading – Backtesting

For an alternative discussion of backtesting, see Campbell (2005).

For some notable backtesting methodologies not discussed in this chapter, see Haas (2001), Engle and Manganelli (2004), and Ziggel et al. (2014). See also Christoffersen and Pelletier (2004), Haas (2005), and Berkowitz et al. (2011), who discuss duration-based backtesting methodologies. These are a form of exceedance independence test that assesses whether intervals between exceedances appear random.

Da Silva et al. (2006), Berkowitz et al. (2011) and Røynstrand et al. (2012) assess the performance of various backtesting methodologies using actual and/or simulated P&L data.


14.7  Backtesting Strategy

Specifying a backtesting program for a trading organization can be an unsettling experience, plagued by data limitations and philosophical quandaries. Here we shall address issues and present practical advice on how to proceed.

14.7.1 Backtesting as Hypothesis Testing

Backtesting, as it is commonly practiced, is hypothesis testing. It poses all the familiar challenges of hypothesis testing. Let’s focus on two:

  • Philosophically, hypothesis testing treats the null hypothesis as “valid” or “invalid” whereas in many applications the question is more one of the null hypothesis being either an imperfect but useful assumption or an imperfect and not useful assumption. Stated another way, hypothesis testing is often applied to situations that are “gray” to determine if they are “black” or “white.”
  • If we accept the null hypothesis as either “valid” or “invalid” there remains an uncomfortable tradeoff between the risk of Type I error and that of Type II error—reducing one increases the other, making it difficult—or controversial—to find a balance.

The problems are related. Value-at-risk measures aren’t “valid” or “invalid,” just as the approximation 3.142 for π is not “valid” or “invalid.” Value-at-risk measures and approximations are either “useful” or “not useful,” and usefulness depends on context. For a carpenter, 3.142 may be a useful approximation of π, but it might not be for an astronomer. A particular value-at-risk measure may be useful for assessing the market risk of futures portfolios but not of portfolios containing options on those futures. While we generally speak of “backtesting a value-at-risk measure,” in fact we backtest a value-at-risk measure as applied to a particular portfolio.

With backtesting, we distinguish between those value-at-risk measures we will reject and those we will continue to use for a particular trading portfolio. Where we draw the line is a compromise to balance the risk of rejecting a “valid” value-at-risk measure against that of failing to reject an “invalid” value-at-risk measure. Never mind that this is a compromise over a contrived issue. It really isn’t a compromise at all. Researchers in the social sciences long ago adopted the convention of testing at the .05 or .01 significance level. Use of the .05 significance level predominates, but a researcher whose data is particularly strong may report results at the .01 significance level to emphasize the fact. Accordingly, there is no real debate about what significance level to use. In backtesting, we use the .05 significance level based solely on the established convention and the fact that backtest data is rarely good enough to warrant the .01 significance level.

Bluntly stated, we accept or reject value-at-risk measures based on a convention for how to compromise over a contrived issue. The convention is use of the .05 significance level. The compromise is about balancing the risks of Type I vs. Type II errors. The contrived issue is that of a particular value-at-risk measure being somehow “valid” or “invalid”.

These problems exist for hypothesis testing in fields other than finance. Social scientists embrace the hypothesis testing approach because there aren’t really good alternatives. In backtesting, we are fortunate to have two or three years of data on the performance of a value-at-risk measure. If historical data weren’t so limited, we could go beyond the contrived issue of value-at-risk measures being “valid” or “invalid” and truly assess the usefulness of individual value-at-risk measures. It is limited data, more than anything else, that drives us to accept the hypothesis testing approach to backtesting. Formal hypothesis testing largely substitutes convention for meaningful test design. This may be a weakness, but it is also a strength. Without extensive data, careful test design is impossible. Convention-driven hypothesis testing allows us to make decisions with limited data in a manner that, despite only loosely conforming to our needs, is consistent. Arguably, it represents the best option available to us for interpreting limited data.

14.7.2 Alternatives

The Basel Committee’s traffic light backtest doesn’t employ hypothesis testing. It is just a rule specified by regulators based on their intuitive sense of what seemed reasonable. Its graded response of increasing capital charges within the yellow zone avoids the stark “valid” or “invalid” distinction of hypothesis testing at the expense of creating an illusion of precision. With just α + 1 = 250 data points, it is difficult to draw any conclusion whatsoever about a value-at-risk measure, especially a value-at-risk measure that is supposed to experience just one exceedance every 100 days.
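To get a feel for how little 250 observations can reveal, the following minimal sketch computes the probability of landing in the green zone. It rests on assumptions not stated above: exceedances are treated as independent with a constant probability, and the zone boundaries for 250 observations (0 to 4 green, 5 to 9 yellow, 10 or more red) are taken from the Basel Committee's 1996 backtesting framework. The function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def traffic_light_zone(exceedances):
    """Zone for 250 observations of a 99% value-at-risk measure.

    ASSUMPTION: boundaries as published in the Basel Committee's
    1996 backtesting framework.
    """
    if exceedances <= 4:
        return "green"
    if exceedances <= 9:
        return "yellow"
    return "red"


n = 250
# Measure whose true coverage is 0.99 (exceedance probability 0.01).
p_green_accurate = binom_cdf(4, n, 0.01)
# Measure whose true coverage is only 0.98 (exceedance probability 0.02).
p_green_weak = binom_cdf(4, n, 0.02)

print(f"P(green | coverage 0.99) = {p_green_accurate:.4f}")
print(f"P(green | coverage 0.98) = {p_green_weak:.4f}")
```

Under these assumptions, a measure that experiences exceedances twice as often as it should still lands in the green zone more than 40% of the time, which illustrates why so few data points support so few conclusions.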

For banks, having their value-at-risk measure perform poorly on the traffic light test would cost more than elevated capital charges. Regulators might force them to go through an expensive and time consuming process of implementing a new value-at-risk measure. At a minimum, poor performance on the traffic light test would attract scrutiny, which banks generally want to avoid.

Rather than entrust such matters to luck, banks have tended to implement conservative value-at-risk measures whose coverage q* well exceeds the 0.99 quantile of loss they purport to measure. Some such measures are so conservative they practically never experience an exceedance.⁵ This all but guarantees the value-at-risk measures perform well on the traffic light test.

Lopez (1999) builds on the traffic light approach by more finely grading backtest results. Drawing on decision theory, he suggests that the accuracy of value-at-risk measures be gauged by how well they minimize a “loss function” reflective of the evaluator’s priorities, which might include avoiding extraordinary one-day losses or avoiding increased regulatory capital charges. While this is consistent with the goal of accepting or rejecting value-at-risk measures based on an assessment of their usefulness, it poses a risk of drawing conclusions not warranted by limited data available for backtesting.

Lopez’s approach compares

  • the value of the loss function achieved by a value-at-risk measure over a period of α + 1 observations, and
  • some benchmark value an accurate value-at-risk measure might have achieved over the same period.

Depending on how the loss function is defined, this can be straightforward, or it can entail assumptions. For example, if the loss function is set equal to the number of exceedances experienced over the α + 1 observations, Lopez’s methodology reduces to a simple coverage test. For a more interesting (and problematic) loss function, define the magnitude of an exceedance as the maximum of 1) a portfolio’s actual loss minus the value-at-risk for that period, and 2) zero. A loss function based on the magnitude of exceedances addresses a concern of many managers: how bad can a loss be on days it exceeds reported value-at-risk? But evaluating a benchmark for such a loss function requires some assumptions as to how an accurate value-at-risk measure might have performed. Should a value-at-risk measure fail a backtest based on such a loss function, the question arises as to whether the problem resides with the value-at-risk measure or with the assumptions used to model the benchmark.
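The two loss functions just described can be sketched in a few lines. This is only an illustration: the data are hypothetical, losses are expressed as positive numbers, and aggregating magnitudes by simple summation is an assumption of this sketch rather than something prescribed above.

```python
def exceedance_magnitudes(losses, var_values):
    """max(loss - VaR, 0) for each period, as defined in the text."""
    return [max(loss - var, 0.0) for loss, var in zip(losses, var_values)]


def count_loss_function(losses, var_values):
    """Number of exceedances: the loss function that reduces to a coverage test."""
    return sum(1 for loss, var in zip(losses, var_values) if loss > var)


def magnitude_loss_function(losses, var_values):
    """Total magnitude of exceedances (ASSUMPTION: aggregated by summation)."""
    return sum(exceedance_magnitudes(losses, var_values))


# Hypothetical daily losses and the value-at-risk reported for each day.
losses = [1.0, 5.0, 3.0, 6.5]
var_values = [2.0, 4.0, 4.0, 5.0]

print(count_loss_function(losses, var_values))
print(magnitude_loss_function(losses, var_values))
```

Computing the achieved value is the easy part. The benchmark value for the magnitude-based function is where modeling assumptions enter, because it depends on how losses beyond value-at-risk are assumed to be distributed.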

14.7.3 Joint Tests

Joint tests are backtests that simultaneously assess two or more criteria for a value-at-risk measure—say coverage and exceedance independence. Such tests have been proposed by Christoffersen (1998) and Christoffersen and Pelletier (2004). Campbell (2005) recommends against their use:

While joint tests have the property that they will eventually detect a value-at-risk measure which violates either of these properties, this comes at the expense of a decreased ability to detect a value-at-risk measure which only violates one of the two properties. If, for example, a value-at-risk measure exhibits appropriate unconditional coverage but violates the independence property, then an independence test has a greater likelihood of detecting this inaccurate value-at-risk measure than a joint test.

14.7.4 Designing a Backtesting Program

When a value-at-risk measure is first implemented its performance will be closely monitored. Data will be insufficient for meaningful statistical analyses, but a graph such as Exhibit 14.1 can be updated monthly and monitored for signs of irregular performance. Parallel testing against a legacy value-at-risk measure is also appropriate. At this stage, the goal is primarily to address Type B model implementation risk. Coding or implementation errors can produce noticeable distortions in a value-at-risk measure’s performance, even over short periods of time.

At six months, coding or other implementation issues should have been identified and resolved. If any of these motivated substantive changes in the value-at-risk measure or its output, you will want to wait until six months after the last substantive change before performing any statistical backtests. Results from our recommended standard distribution test are likely to be the most meaningful at this point, as six months of data really isn’t enough for coverage or independence tests.

Perform another backtest at one year. Now include our recommended standard independence test. If you calculate value-at-risk at the 90% or 95% level, also include our recommended standard coverage test. Otherwise, wait two years before performing all three of our recommended standard tests. Continue to backtest annually using those three tests. Use all available data generated since the last substantive change to the value-at-risk system, up to a maximum of five years.

I recommend institutions use the three recommended standard tests described in this chapter. They are as good as any you will find in the literature, and better than most. Some widely cited backtests are flawed or ineffective. Banks will also need to perform the traffic light backtest, as required by their regulators. Backtests should be performed with both clean and dirty data.

14.7.5 Failing a Backtest

Because they are performed at the .05 significance level, failure of any one of our recommended standard backtests is a strong indication of a material shortcoming in a value-at-risk measure’s performance. Your response will depend on the particular test failed, whether it was failed with clean or dirty data, and your assessment of the circumstances that caused the failure. A graph similar to Exhibit 14.10 is useful for diagnosing problems identified by coverage or distribution tests.

Failure of a clean test—or both a clean test and the corresponding dirty test—is indicative of a Type A (model design) or Type B (implementation) problem with the value-at-risk measure. Focus your analysis first on eliminating the possibility of an implementation or coding error. Only then address the possibility of Type A design shortcomings.

A design shortcoming may not necessarily dictate a fundamental change in the design of your value-at-risk measure. If your value-at-risk measure already incorporates sophisticated analytics suitable for your portfolio, modifying those analytics may not be productive. A review of your backtesting data may indicate that an ad hoc solution, such as multiplying output by a scalar, may fix the problem.

For example, if your value-at-risk measure failed a clean recommended standard distribution test, and you are comfortable the model design is appropriate for your portfolio, you can go back and redo the distribution test using the same past value-at-risk measurements, but multiply each by a scalar w. Through trial and error, or some search routine, you can solve for that value w that optimizes performance on the test (i.e. maximizes the sample correlation between the nj and their counterparts from [14.11]). Going forward, scale value-at-risk measurements by that value w.

Some may feel uncomfortable with an ad hoc solution like this. Keep in mind that a value-at-risk measure is a practical tool. Our goal is not to develop some theoretically beautiful model for the complex dynamics of markets. All we require is a reasonable indication of market risk. The philosophy of science tells us to judge a model based on the usefulness of its predictions and not on the nature of its assumptions. If we can fix a value-at-risk measure by simply scaling its output, then there is every reason to do so.

Of course, this solution only applies if a value-at-risk measure is already sophisticated enough to capture relevant market dynamics. If a portfolio is exposed to vega risk or basis risk, and the value-at-risk measure isn’t designed to capture these, no amount of scaling of that value-at-risk measure’s output is going to solve the problem. If a Monte Carlo value-at-risk measure is so computationally intensive that there is only time for a sample of size 250 for each overnight value-at-risk analysis, the standard error will be enormous. Scaling the output will not solve this problem. The computations need to be streamlined, perhaps with a holdings remapping and/or variance reduction, and the sample size increased.

Tweaking a poorly designed value-at-risk measure is only going to produce another poorly designed value-at-risk measure. If a value-at-risk measure is fundamentally unsuited for the portfolio it is applied to, it needs to be fundamentally redesigned.

Some shortcomings of value-at-risk measures must be lived with. The standard UWMA and EWMA techniques for modeling covariance matrices do not address market heteroskedasticity well. As we indicated in Chapter 7, there are currently no good solutions to this problem. Today’s value-at-risk measures are slow in responding to rising market volatility. During such periods, they tend to experience clustered exceedances. Similarly, when volatilities decline, they again lag, and may experience few or no exceedances. These phenomena may cause a value-at-risk measure to fail an independence test. There is little that can be done about the problem.

Failure of a dirty test and not the corresponding clean test is an indication of a Type C model application problem.

14.7.6 Backtesting Other PMMRs

This chapter, like the literature, has focused on backtesting of value-at-risk measures. If you employ some other PMMR, coverage and exceedance independence tests will not apply, but it may be possible to develop tests analogous to those tests for your particular PMMR. Our recommended standard distribution and independence tests are not limited to value-at-risk. They can be applied with most PMMRs.


14.6  Example: Backtesting a One-Day 95% EUR Value-at-Risk Measure

Assume a one-day 95% EUR value-at-risk measure was used for a period of 125 trading days. Data gathered for backtesting is presented in Exhibit 14.8. We have already used the data from the second and third columns to construct Exhibit 14.1. We will now use the data to apply coverage, distribution and independence backtests.

14.6.1 Example: Applying Coverage Tests

To apply a coverage test, we need

  • the quantile of loss the value-at-risk measure is intended to measure: q = 0.95,
  • the number of observations: α + 1 = 125, and
  • the number of exceedances x = 10.

The last value is obtained by summing the 0’s and 1’s in the fourth column of Exhibit 14.8.

Exhibit 14.8: Backtesting data for a one-day 95% EUR value-at-risk measure compiled over 125 trading days. Value-at-risk (VaR) and P&L values in the second and third columns are expressed in millions of euros. The exceedance column has a value of 1 if the portfolio realized a loss exceeding the 0.95 quantile of loss, as determined by the value-at-risk measure. Otherwise it has a value of 0. The last column indicates the specific quantile of loss for each P&L result, again, as determined by the value-at-risk measure.

It can also be obtained by visual inspection of Exhibit 14.1.

In Exhibit 14.3, we find that our recommended standard coverage test’s non-rejection interval for q = 0.95 and α + 1 = 125 is [2, 11]. Since our number of exceedances falls in this interval, we do not reject the value-at-risk measure.

In Exhibit 14.4, we find that the PF test’s non-rejection interval for q = 0.95 and α + 1 = 125 is [2, 12]. Since our number of exceedances falls in this interval, we do not reject the value-at-risk measure.
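As a cross-check on the two table lookups above, the following sketch evaluates the quantities directly. It makes assumptions that the text does not spell out here: under the null hypothesis the number of exceedances is taken to be Binomial(α + 1, 1 – q), and Kupiec's PF statistic is written in its usual likelihood-ratio form and compared with 3.841, the 0.95 quantile of a chi-squared distribution with one degree of freedom. Function names are illustrative.

```python
from math import comb, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_in_interval(lo, hi, n, p):
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


def kupiec_pf_statistic(x, n, q):
    """Kupiec's likelihood-ratio statistic, -2 log of the ratio of the
    likelihood under the null to the maximized likelihood.
    Requires 0 < x < n so that both logarithms are defined.
    """
    p = 1 - q          # exceedance probability under the null
    p_hat = x / n      # observed exceedance frequency
    log_null = (n - x) * log(1 - p) + x * log(p)
    log_alt = (n - x) * log(1 - p_hat) + x * log(p_hat)
    return -2.0 * (log_null - log_alt)


q, n, x = 0.95, 125, 10

# Probability that an accurate measure produces between 2 and 11 exceedances.
coverage_prob = prob_in_interval(2, 11, n, 1 - q)
# PF statistic for the observed 10 exceedances.
lr = kupiec_pf_statistic(x, n, q)

print(f"P(2 <= X <= 11) = {coverage_prob:.4f}")
print(f"PF statistic    = {lr:.4f} (critical value 3.841)")
```

Evaluating the statistic directly at an endpoint of a tabulated non-rejection interval need not reproduce the table, since the PF interval is constructed with a rounding convention (see note 3 to this chapter).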

We cannot use the Basel Committee’s traffic light coverage test because it applies only to 99% value-at-risk measures.

14.6.2 Example: Applying Distribution Tests

For distribution testing, we apply [14.10] to the loss quantiles tu and arrange the results in ascending order to obtain the nj. The values against which they are compared are obtained from [14.11], with α + 1 = 125. Both sets of values are presented in Exhibit 14.9.

Exhibit 14.9: Values nj and their counterparts from [14.11], calculated for the data of Exhibit 14.8.

These are plotted in Exhibit 14.10.

Exhibit 14.10: A plot of the nj against their counterparts from [14.11], using the values of Exhibit 14.9.

The graphical results are inconclusive. The points do fall near a line of slope one passing through the origin, but the fit isn’t particularly good. Is this due to the small sample size, or does it reflect shortcomings in the value-at-risk measure? For another perspective, we calculate the sample correlation between the nj and their counterparts from [14.11] as .997. Consulting Exhibit 14.6, we do not reject the value-at-risk measure at either the .05 or the .01 significance level.
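The mechanics of such a correlation calculation can be sketched as follows. This is not a reproduction of [14.10] and [14.11]: it assumes the loss quantiles are mapped through the inverse standard normal CDF, and it assumes reference values taken at the plotting positions (j – 0.5)/(α + 1), which is one common choice. The input data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sample_correlation(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def distribution_test_correlation(loss_quantiles):
    """Correlation between sorted normal-transformed loss quantiles
    and reference standard normal quantiles.

    Loss quantiles must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    n_sorted = sorted(std_normal.inv_cdf(u) for u in loss_quantiles)
    m = len(n_sorted)
    # ASSUMPTION: plotting positions (j - 0.5) / m for the reference values.
    reference = [std_normal.inv_cdf((j - 0.5) / m) for j in range(1, m + 1)]
    return sample_correlation(n_sorted, reference)


# Hypothetical loss quantiles for ten trading days.
loss_quantiles = [0.62, 0.08, 0.91, 0.35, 0.47, 0.73, 0.19, 0.55, 0.97, 0.28]
print(f"{distribution_test_correlation(loss_quantiles):.3f}")
```

A correlation near one indicates the points lie close to a straight line. Whether a given value is close enough is what a table of critical values such as Exhibit 14.6 decides.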

14.6.3 Example: Applying Independence Tests

Starting with Christoffersen’s test for independent tI, we use the data of Exhibit 14.8 to calculate α00 = 105, α01 = α10 = 9 and α11 = 1. From these, we calculate the estimates 105/114 = 0.9211, 9/10 = 0.9000 and 114/124 = 0.9194. Our likelihood ratio is


so –2log(Λ) = 0.0517. This does not exceed 3.841, so we do not reject the value-at-risk measure.
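The value of –2log(Λ) can be checked from the four counts alone. The sketch below uses the usual form of Christoffersen's independence statistic. It assumes a01 counts non-exceedances followed by exceedances and a10 the reverse; because the two counts are equal in this example, the result does not depend on that convention. The function name is illustrative.

```python
from math import log


def christoffersen_statistic(a00, a01, a10, a11):
    """-2 log(likelihood ratio) for first-order independence of exceedances.

    ASSUMPTION: a_ij counts days in state i followed by days in state j
    (0 = no exceedance, 1 = exceedance). All four counts must be positive.
    """
    q0 = a00 / (a00 + a01)                     # P(no exceedance | none the day before)
    q1 = a10 / (a10 + a11)                     # P(no exceedance | exceedance the day before)
    q = (a00 + a10) / (a00 + a01 + a10 + a11)  # unconditional estimate
    log_null = (a00 + a10) * log(q) + (a01 + a11) * log(1 - q)
    log_alt = (a00 * log(q0) + a01 * log(1 - q0)
               + a10 * log(q1) + a11 * log(1 - q1))
    return -2.0 * (log_null - log_alt)


stat = christoffersen_statistic(105, 9, 9, 1)
print(f"{stat:.4f}")
```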

Next, applying our recommended standard independence test, we use [14.21] to calculate values tn from the loss quantiles tu. Results are indicated in Exhibit 14.11. 

Exhibit 14.11: Example values tn for our basic independence test. They are identical to those in the first column of Exhibit 14.9, only they are ordered by time while those in Exhibit 14.9 are ordered by magnitude.

We calculate the sample autocorrelations of the tn for lags 1 through 5 as indicated in Exhibit 14.12. 

Exhibit 14.12: Example sample autocorrelations for use with our basic independence test.

Our test statistic—the largest absolute value of the autocorrelations—is 0.132. This is less than the non-rejection value 0.274 obtained from Exhibit 14.7, so we do not reject the value-at-risk measure at the .05 significance level.
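A minimal sketch of this statistic follows. The text does not say which sample autocorrelation estimator is used, so the common one (lagged cross-products of deviations from the overall mean, divided by the total sum of squared deviations) is assumed here, and the function names are hypothetical.

```python
def sample_autocorrelation(x, lag):
    """ASSUMPTION: standard estimator using the overall mean and full-sample variance."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def independence_statistic(x, max_lag=5):
    """Largest absolute sample autocorrelation over lags 1..max_lag."""
    return max(abs(sample_autocorrelation(x, k)) for k in range(1, max_lag + 1))


# The comparison made in the text: reject only if the statistic exceeds the table value.
reject = 0.132 > 0.274
```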

Exhibit 14.13 below presents 125 days of performance data for a one-day 99% USD value-at-risk measure for use in Exercises 14.8 through 14.11. For convenience, you should be able to cut and paste it from this webpage into a spreadsheet.
Columns (the three-column group repeats three times across the table): time t | 99% VaR at t – 1 | P&L at t
Exhibit 14.13: Daily performance data (in USD millions) for a one-day 99% USD value-at-risk measure for use in Exercises 14.8 through 14.11.

Exercise 14.8

In this exercise, you will perform several coverage backtests.

  1. Use the data of Exhibit 14.13 to calculate exceedance data ti. Save your results, as you will need them again in Exercise 14.10.
  2. Use the data of Exhibit 14.13 to construct a graphical backtest similar to Exhibit 14.1.
  3. Apply our recommended standard coverage test at the .05 significance level using your results from part 1.
  4. Apply Kupiec’s PF coverage test at the .05 significance level using your results from part 1.
  5. Apply the Basel Committee’s traffic light coverage test using your results from part 1.



Exercise 14.9

In this exercise, you will perform the graphical and recommended standard distribution tests of Section 14.4 using the data of Exhibit 14.13.

  1. Our value-at-risk measure is a linear value-at-risk measure that assumes tL is conditionally normal with t–1E(tL) = 0. Use this information and the data of Exhibit 14.13 to calculate loss quantile data tu.
  2. Apply the inverse standard normal CDF to your loss quantile data tu to obtain values tn. Save your results, as you will need them again in Exercise 14.11.
  3. Order your values tn by magnitude. Denote the ordered values nj.
  4. Calculate values $\hat{n}_j$ as described in Section 14.4.2.
  5. Plot the points (nj, $\hat{n}_j$) in a Cartesian plane. Interpret the result.
  6. Calculate the sample correlation between the nj and $\hat{n}_j$. Does the value-at-risk measure pass or fail our recommended standard distribution test at the .05 significance level?
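As a hint for the first step, the sketch below shows one way the stated assumptions translate into a loss quantile. It assumes the value-at-risk figure is the q-quantile of a zero-mean normal loss, so the conditional standard deviation is the value-at-risk divided by the standard normal q-quantile, and that a loss is the negative of the P&L. The function name and the sample inputs are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def loss_quantile(var, pnl, q=0.99):
    """Quantile of the realized loss under a zero-mean conditionally normal loss.

    ASSUMPTION: var is the q-quantile of loss, so sigma = var / z_q.
    """
    z_q = _STD_NORMAL.inv_cdf(q)
    sigma = var / z_q
    loss = -pnl
    return _STD_NORMAL.cdf(loss / sigma)


# A loss equal to the VaR sits at the q-quantile; a flat day sits at 0.5.
at_var = loss_quantile(2.0, -2.0)
flat = loss_quantile(2.0, 0.0)
```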



Exercise 14.10

In this exercise, you will perform Christoffersen’s exceedances independence test using the data of Exhibit 14.13.

  1. Retrieve the exceedance ti data you calculated for Exercise 14.8, and use it to calculate values α00, α01, α10 and α11.
  2. Use your results from part 1 and formulas [14.16], [14.17] and [14.18] to calculate values $\hat{q}_0$, $\hat{q}_1$ and $\hat{q}$.
  3. Use your results from parts 1 and 2 and [14.20] to calculate the log likelihood ratio –2log(Λ). What conclusion do you draw?



Exercise 14.11

In this exercise, you will perform our recommended standard loss quantile independence test using the data of Exhibit 14.13.

  1. Retrieve the tn data you calculated for Exercise 14.9, and calculate its sample autocorrelations for lags 1 through 5.
  2. Take the absolute value of each sample autocorrelation, and then take the maximum of the five results. Based on this, does the value-at-risk measure pass or fail the recommended standard independence test at the .05 significance level?



14.5 Backtesting With Independence Tests

Independence tests are backtests that assess some form of independence in a value-at-risk measure’s performance from one period to the next. Independence of exceedances tI and independence of loss quantiles tU are separate forms of independence that might be tested for. We have already seen that coverage tests assume the former and most distribution tests assume the latter. If a value-at-risk measure fails an independence test, that can cast doubt on coverage or distribution backtest results obtained for that value-at-risk measure.

There is no way to directly test for independence, so null hypotheses address specific properties of independence—say exceedances not clustering or loss quantiles not being autocorrelated. Accordingly, backtests for independence can be judged, among other things, based on how broad their null hypotheses are.

14.5.1 Christoffersen’s 1998 Exceedance Independence Test

Christoffersen’s (1998) independence test is a likelihood ratio test that looks for unusually frequent consecutive exceedances—i.e. instances when both t–1i = 1 and ti = 1 for some t. The test is well known, since it was first proposed in an often-cited endorsement of testing for independence of exceedances.

Extending our earlier notation q* for the coverage of a value-at-risk measure, we define

$$q^*_0 = \Pr\left({}^{t}I = 0 \mid {}^{t-1}I = 0\right)$$

$$q^*_1 = \Pr\left({}^{t}I = 0 \mid {}^{t-1}I = 1\right)$$

These are the value-at-risk measure’s conditional coverages: its actual probabilities of not experiencing an exceedance given that it did not (in the case of $q^*_0$) or did (in the case of $q^*_1$) experience an exceedance in the previous period. Our null hypothesis $\mathcal{H}_0$ is that $q^*_0 = q^*_1 = q^*$.

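If a value-at-risk measure is observed for α + 1 periods, there will be α pairs of consecutive observations (t–1i, ti). Disaggregate these as

$$\alpha = \alpha_{00} + \alpha_{01} + \alpha_{10} + \alpha_{11}$$

where α00 is the number of pairs (t–1i, ti) of the form (0, 0); α01 is the number of the form (0, 1); etc. We want to test if

$$\frac{\alpha_{00}}{\alpha_{00} + \alpha_{01}} \approx \frac{\alpha_{10}}{\alpha_{10} + \alpha_{11}}$$
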
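which would support our null hypothesis. We apply a likelihood ratio test as follows. Assuming $\mathcal{H}_0$ doesn’t hold, we estimate $q^*_0$ and $q^*_1$ with

$$\hat{q}_0 = \frac{\alpha_{00}}{\alpha_{00} + \alpha_{01}} \qquad [14.16]$$

$$\hat{q}_1 = \frac{\alpha_{10}}{\alpha_{10} + \alpha_{11}} \qquad [14.17]$$

Assuming $\mathcal{H}_0$ does hold, we estimate q* with

$$\hat{q} = \frac{\alpha_{00} + \alpha_{10}}{\alpha} \qquad [14.18]$$
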
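Our likelihood ratio is

$$\Lambda = \frac{\hat{q}^{\,\alpha_{00} + \alpha_{10}}\,(1 - \hat{q})^{\alpha_{01} + \alpha_{11}}}{\hat{q}_0^{\,\alpha_{00}}\,(1 - \hat{q}_0)^{\alpha_{01}}\;\hat{q}_1^{\,\alpha_{10}}\,(1 - \hat{q}_1)^{\alpha_{11}}}$$

and –2log(Λ) is approximately centrally chi-squared with one degree of freedom (that is, –2log(Λ) ~ χ²(1,0)) assuming $\mathcal{H}_0$. The 0.95 quantile of the χ²(1,0) distribution is 3.841, so we reject $\mathcal{H}_0$ at the .05 significance level if –2log(Λ) ≥ 3.841. Similarly, we reject it at the .01 significance level if –2log(Λ) ≥ 6.635.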
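The pair counts α00, α01, α10 and α11 used by the test can be tallied from a series of 0/1 exceedance indicators as sketched below. The function name and the sample series are hypothetical.

```python
def count_pairs(flags):
    """Count consecutive pairs (previous, current) in a 0/1 exceedance series."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, curr in zip(flags, flags[1:]):
        counts[(prev, curr)] += 1
    return counts


# Hypothetical seven-period series, giving six consecutive pairs.
flags = [0, 1, 1, 0, 0, 1, 0]
counts = count_pairs(flags)
a00, a01 = counts[(0, 0)], counts[(0, 1)]
a10, a11 = counts[(1, 0)], counts[(1, 1)]
```

A series of α + 1 observations always yields α pairs, so the four counts sum to one less than the length of the series.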

The test largely depends on the frequency with which consecutive exceedances are experienced. As these are inherently rare events, the test has limited power. Also, the test isn’t defined when there are no consecutive exceedances at all, which is common. Christoffersen doesn’t address this situation. In some cases it may be reasonable to simply accept the null hypothesis when there are no consecutive exceedances, but not always. For example, if you backtest a one-day 90% value-at-risk measure with 1,000 days of data, there should be about 10 instances of consecutive exceedances. If there are none, it might be inappropriate to accept the null hypothesis.
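The figure of about 10 follows from a short calculation, assuming exceedances are independent and each occurs with probability 1 – 0.90 = 0.10. With 1,000 days there are 999 pairs of consecutive days, and each pair consists of two exceedances with probability 0.10², so the expected number of such pairs is

$$(1000 - 1)\,(1 - 0.90)^2 = 999 \times 0.01 \approx 10$$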

14.5.2 A Recommended Standard Loss-Quantile Independence Test

For a recommended standard test, we assess the independence of the values tN obtained by applying the inverse standard normal CDF to the loss quantiles tU:

$${}^{t}N = \Phi^{-1}\left({}^{t}U\right) \qquad [14.21]$$

Note that this is the same transformation we made with [14.10]. As before, given loss quantile data mu, m+1u, … , –1u, we apply [14.21] to obtain values mn, m+1n, … , –1n.

We adopt the null hypothesis that the autocorrelations

$$\rho_k = \operatorname{corr}\left({}^{t}N,\; {}^{t-k}N\right)$$

are all 0 for lags k = 1, 2, 3, 4 and 5. We test this hypothesis by calculating the sample autocorrelations of our data mn, m+1n, … , –1n for those same five lags. We take the maximum of the absolute values of the five sample autocorrelations. That is our test statistic. We reject the null hypothesis at the .05 significance level if the test statistic exceeds the non-rejection value indicated for sample size α + 1 in Exhibit 14.7.

Exhibit 14.7: Non-rejection values for the recommended standard independence test at the .05 and .01 significance levels. If the test statistic exceeds the non-rejection value, the null hypothesis is rejected at the indicated significance level.

Non-rejection values were calculated for each sample size α + 1 with a Monte Carlo analysis that found the 0.95 (for the .05 significance level) or 0.99 (for the .01 significance level) quantile for the test statistic assuming the null hypothesis.
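A Monte Carlo analysis of this kind might look like the sketch below. It is not the procedure used to build Exhibit 14.7: the number of trials, the seed, the autocorrelation estimator and the empirical quantile rule are all assumptions, and the function names are hypothetical. Under the null hypothesis the tn are treated as independent standard normal draws.

```python
import random


def max_abs_autocorrelation(x, max_lag=5):
    """Largest absolute sample autocorrelation over lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return max(
        abs(sum(dev[t] * dev[t - k] for t in range(k, n)) / denom)
        for k in range(1, max_lag + 1)
    )


def non_rejection_value(sample_size, level=0.05, trials=2000, seed=0):
    """Estimate the (1 - level) quantile of the statistic under independence."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    stats = sorted(
        max_abs_autocorrelation([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(trials)
    )
    index = min(trials - 1, int(round((1 - level) * trials)))
    return stats[index]
```

For example, `non_rejection_value(125)` gives a rough .05-level value for 125 observations. More trials reduce the simulation noise, and the estimate need not match the tabulated figures exactly.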


In Christoffersen’s 1998 independence test, α01 routinely equals α10. Why is this, and what would cause them to differ?


A value-at-risk measure is to be backtested using Christoffersen’s 1998 independence test. Based on 250 days of exceedance data, α00 = 237, α01 = α10 = 5, and α11 = 2. Do we reject the value-at-risk measure at the .10 significance level?


A value-at-risk measure is to be backtested using our recommended standard independence test and 500 days of data. Values tn are calculated, and their sample autocorrelations are determined to be 0.034, –0.078, –0.124, 0.107 and 0.029 for lags 1 through 5, respectively. Do we reject the value-at-risk measure at the .05 significance level?