Bureau of Business Research
Graduate School of Business Administration
University of Michigan
August 1972

A Nonparametric Approach to the Construction of Prediction Intervals for Time Series Forecasts

Working Paper No. 63

W. Allen Spivey
William E. Wecker
Graduate School of Business Administration
University of Michigan

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the express permission of the Bureau of Business Research.

BACKGROUND OF THIS PAPER

This paper was given before the meetings of the American Statistical Association, August 16, 1972, in Montreal, Canada.

ABSTRACT

The usual approach to the construction of confidence limits for time series forecasts begins with the assumption that the forecast errors are normally distributed. Williams and Goodman (JASA, December 1971) have recently reported on a practical forecasting problem in which the data did not support an assumption of normality. They suggested an alternative procedure which requires that the underlying distribution of forecast errors be specified, although not necessarily as a normal distribution. In this paper we develop a nonparametric approach to the construction of confidence limits which requires no such assumption concerning the distribution of forecast errors. This nonparametric procedure can be calculated from data more simply than either procedure above, and it avoids the possibility of an erroneous distributional assumption and its attendant problems.

A Nonparametric Approach to the Construction of Prediction Intervals for Time Series Forecasts

1. Introduction.

When forecasting a time series one is sometimes interested in specifying a random interval such that the actual future observation will lie in the interval with a given probability. This type of interval is sometimes called a prediction interval and is to be contrasted with a confidence interval, which, with a given probability, contains a fixed but unknown parameter.

Denote the time series to be forecast as $\{\ldots, X_{t-1}, X_t, X_{t+1}, \ldots\}$, the forecast of $X_{t+1}$ made at time $t$ as $\hat{X}_{t+1}$, and the forecast error as $e_{t+1} = X_{t+1} - \hat{X}_{t+1}$. Then we seek a prediction interval $(\hat{X}_{t+1} + L, \hat{X}_{t+1} + U)$ such that the probability is $\alpha$ that the random variable $X_{t+1}$ will appear in the interval, i.e., an interval such that

(1) $\Pr\{X_{t+1} \in (\hat{X}_{t+1} + L,\ \hat{X}_{t+1} + U)\} = \alpha$,  $L < 0 < U$.

It is convenient to rewrite (1) so as to refer to the forecast error $e_{t+1}$, and we have analogously for this random variable

(2) $\Pr\{e_{t+1} \in (L, U)\} = \alpha$.

The standard approach to the construction of prediction intervals requires the assumption that the $e_t$ are independent, identically distributed normal random variables [1], [2], [3]. Recently, Williams and Goodman described a practical forecasting study in which prediction intervals were desired, but in which the normality assumption was also found to be inappropriate [7]. As a result, prediction intervals calculated on the basis of normal theory failed to enclose the forecast error with the theoretically expected frequency. They developed an alternate procedure which requires one to determine a family of distributions having a member which gives a good fit to the empirical distribution of the forecast errors. For the data they considered, the gamma family was chosen.

The Williams and Goodman procedure provides a general method for the construction of prediction intervals when the assumption of normality is not appropriate; however, a sophisticated data analysis capability is required to determine an appropriate alternate family of distributions. The purpose of this paper is to suggest another approach to the construction of prediction intervals -- distribution-free prediction intervals -- which does not require specification of an alternate family of distributions and so avoids the extensive data analysis necessary in the Williams and Goodman procedure.

2. Distribution-Free Prediction Intervals.

We assume that the one-step-ahead forecast errors $e_t$ are independent, identically distributed random variables1 with a fixed (continuous) density function $f(e)$.

Following the suggestion of Williams and Goodman [7], we have a moving sample of fixed size $n$. That is, as the most recent forecast error is observed, the error for the most remote time point in the sample is deleted so as to maintain a fixed sample size $n$. Let the sample observations be $e_{t-n+1}, \ldots, e_t$, and let $L$ and $U$, the lower and upper end points, respectively, of the prediction interval, be defined by the following functions:

(3) $L = g(e_{t-n+1}, \ldots, e_t) = e_{(1)}$

(4) $U = h(e_{t-n+1}, \ldots, e_t) = e_{(n)}$,

where $e_{(1)}$ and $e_{(n)}$ are the first and $n$th order statistics from a sample of size $n$ from the distribution of errors $f(e)$. We now calculate the probability $\alpha$ that the random variable $e_{t+1}$ will appear in this interval,

(5) $\alpha = \Pr\{L < e_{t+1} < U\}$.

If $e_{t+1}$ is in the interval $(L, U)$, then it must with probability one be in one of the subintervals $(e_{(i)}, e_{(i+1)})$, $i = 1, \ldots, n-1$, and the probability that $e_{t+1}$ will be in any one of these subintervals is the same as for any other subinterval. Since the endpoints of the subintervals are order statistics, and since the order statistics can be selected in $n!$ equally likely ways, there is a total of $n!(n-1)$ equally likely ways for the random variable $e_{t+1}$ to be in the prediction interval. Now there are $(n+1)!$ equally likely ways for the order statistics $e_{(1)}, \ldots, e_{(n+1)}$ to be arranged, and therefore

$\alpha = \Pr\{L < e_{t+1} < U\} = \frac{(n-1)\,n!}{(n+1)!} = \frac{n-1}{n+1}$.

More generally, for $L = e_{(r)}$, $U = e_{(s)}$, $r < s \le n$,

(6) $\alpha = \Pr\{e_{(r)} < e_{t+1} < e_{(s)}\} = \frac{s-r}{n+1}$.

1 We regard the assumption of independence as a reasonable one because, if the errors were dependent, one could in principle utilize this in predicting future errors based on past errors and thereby improve the forecast. For example, if the observed serial correlation of the errors suggests that they can be modeled by $e_t = 0.9\, e_{t-1} + a_t$, where $a_t$ is serially uncorrelated with mean 0, then we can use this equation for forecasting future errors. The forecast accuracy would, of course, depend on the error term $a_t$.
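As an illustration of (3), (4), and (6), the following is a minimal Python sketch (not part of the original paper; the function and variable names are hypothetical) of forming a distribution-free prediction interval from the $n$ most recent one-step-ahead forecast errors.

```python
import numpy as np

def prediction_interval(errors, r=1, s=None):
    """Distribution-free prediction interval (e_(r), e_(s)) for the next
    one-step-ahead forecast error, built from a moving sample of errors.

    errors : the n most recent one-step-ahead forecast errors
    r, s   : ranks of the order statistics used as end points (1-based);
             by default the sample minimum and maximum, as in (3) and (4).
    Returns (L, U, alpha), where alpha = (s - r) / (n + 1) is the coverage
    probability given by (6).
    """
    e = np.sort(np.asarray(errors, dtype=float))
    n = len(e)
    if s is None:
        s = n
    L, U = e[r - 1], e[s - 1]          # order statistics e_(r) and e_(s)
    alpha = (s - r) / (n + 1)          # coverage probability from (6)
    return L, U, alpha

# Example: a moving window of n = 19 errors gives alpha = 18/20 = 0.90
# for the interval (e_(1), e_(19)).
rng = np.random.default_rng(0)
window = rng.standard_t(df=5, size=19)     # stand-in for observed errors
L, U, alpha = prediction_interval(window)
x_hat = 100.0                              # hypothetical point forecast
print(f"interval for X_{{t+1}}: ({x_hat + L:.2f}, {x_hat + U:.2f}), alpha = {alpha:.2f}")
```

As in (1), the interval for the observation itself is obtained by adding the end points to the point forecast, as in the last line above.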

3. Distribution-Free Tolerance Intervals.

Yet another distribution-free interval of potential use in forecasting is the tolerance interval. Let $L$ and $U$ be symmetric functions of the sample $e_{t-n+1}, \ldots, e_t$ such that the random interval $(L, U)$ will, with probability $\gamma$, cover at least a fixed, preassigned proportion $p$ of the density $f(e)$, i.e., an interval such that

(7) $\Pr\left\{\int_L^U f(e)\,de \ge p\right\} = \gamma$.

Such an interval is called a $100p\%$ tolerance interval at probability level $\gamma$ [6, pp. 334-336]. We observe that (7) can also be written

(8) $\Pr\{F(U) - F(L) \ge p\} = \gamma$,

where $F(U) = \int_{-\infty}^{U} f(e)\,de$.

For the case in which $L$ and $U$ are given by $L = e_{(r)}$, $U = e_{(s)}$, $r < s \le n$, it is well known that $F(e_{(s)}) - F(e_{(r)})$ has a beta distribution and is independent of $f(e)$ [6, p. 238]. Therefore the tolerance intervals defined by the order statistics $e_{(r)}$ and $e_{(s)}$ will be distribution-free. For fixed, preassigned $p$, $r$, $s$, and $n$, the incomplete beta function can be used to evaluate $\gamma$ for the tolerance interval (7),

(9) $\gamma = \int_p^1 \frac{n!}{(s-r-1)!\,(n-s+r)!}\, u^{s-r-1} (1-u)^{n-s+r}\, du$,  $0 \le p \le 1$.

4. Relations Between Distribution-Free Prediction and Tolerance Intervals.

For a fixed sample size $n$, the distribution-free tolerance interval $(L, U)$ will have an expected coverage

$E\left\{\int_L^U f(e)\,de\right\} = E\{F(e_{(s)}) - F(e_{(r)})\} = \frac{s-r}{n+1}$,

the mean of the beta distribution parameterized as in (9). Therefore the $\alpha$ level of a prediction interval defined by $(e_{(r)}, e_{(s)})$ is identically the expected coverage of the $(e_{(r)}, e_{(s)})$ tolerance interval. Additionally, the values $p$ and $\gamma$ of the tolerance interval satisfy the relation

(10) $\alpha \ge p\gamma$,

as can be easily shown.2
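Since (9) is the upper tail of a beta distribution with parameters $s-r$ and $n-s+r+1$, $\gamma$ can be evaluated with standard routines. The following is a minimal sketch (not from the paper; it assumes SciPy is available):

```python
from scipy.stats import beta

def tolerance_gamma(n, r, s, p):
    """Probability gamma, as in (9), that the tolerance interval
    (e_(r), e_(s)) from a sample of size n covers at least a
    proportion p of the error density f(e).

    F(e_(s)) - F(e_(r)) has a Beta(s - r, n - s + r + 1) distribution,
    so gamma = Pr{coverage >= p} is the beta upper-tail probability.
    """
    return beta.sf(p, s - r, n - s + r + 1)

# Example: n = 19, r = 1, s = 19 (the sample extremes), p = 0.80
print(tolerance_gamma(19, 1, 19, 0.80))   # prob. of covering >= 80% of f(e)

# Expected coverage is (s - r)/(n + 1), the mean of the beta distribution,
# which equals the alpha level of the corresponding prediction interval.
print((19 - 1) / (19 + 1))                # 0.90
```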

5. The Application of Distribution-Free Intervals in Forecasting.

In order to make distribution-free interval forecasts, one begins by generating sequentially point forecasts for known values of the time series, using only information which was available prior to the observation being forecast. In this way a time series of forecast errors can be constructed. In practice, one chooses two positive integers, $r$ and $s$, $r < s$, and values of $\alpha$ from (6), and $p$ and $\gamma$ from (7), to satisfy the needs of the particular forecasting problem under study. Relation (6) can then be used to determine the sample size $n$ required for a chosen $\alpha$-level distribution-free prediction interval $(e_{(r)}, e_{(s)})$. Tables of the incomplete beta function can be used to determine a sample size for choices of $p$ and $\gamma$; charts are also available [4], [5] which can be used to quickly determine $n$ given $r$, $s$, $p$, and $\gamma$.

2 Let $f(U, L)$ be the joint density of $U$ and $L$, and let $\Gamma$ be the set of values of $U$ and $L$ such that $\int_L^U f(e_{t+1})\,de_{t+1} \ge p$. Then

$\alpha = \Pr(L < e_{t+1} < U) = \iint_{\text{all } U,L} \Pr(L < e_{t+1} < U \mid U, L)\, f(U, L)\, dU\, dL$
$\ge \iint_{U,L \in \Gamma} \Pr(L < e_{t+1} < U \mid U, L)\, f(U, L)\, dU\, dL \ge p \iint_{U,L \in \Gamma} f(U, L)\, dU\, dL = p\gamma$.
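For example, with $r = 1$ and $s = n$ (the sample extremes), (6) gives $\alpha = (n-1)/(n+1)$, so the required sample size follows directly; for the tolerance interval, the smallest $n$ satisfying chosen values of $p$ and $\gamma$ can be found by searching (9). The following is a minimal sketch of both calculations (not from the paper; it assumes SciPy and hypothetical function names):

```python
from scipy.stats import beta

def n_for_prediction_interval(alpha, r=1, s_offset=0):
    """Smallest n with (s - r)/(n + 1) >= alpha when L = e_(r), U = e_(s)
    and s = n - s_offset; for the default r = 1, s = n this is the
    smallest n with (n - 1)/(n + 1) >= alpha, as in (6)."""
    n = r + 1 + s_offset
    while (n - s_offset - r) / (n + 1) < alpha:
        n += 1
    return n

def n_for_tolerance_interval(p, gamma, r=1, s_offset=0):
    """Smallest n whose (e_(r), e_(s)) tolerance interval, s = n - s_offset,
    covers at least proportion p of f(e) with probability >= gamma (eq. 9)."""
    n = r + 1 + s_offset
    while beta.sf(p, n - s_offset - r, r + s_offset + 1) < gamma:
        n += 1
    return n

print(n_for_prediction_interval(0.90))        # 19, the sample size used in Section 6
print(n_for_tolerance_interval(0.80, 0.90))   # n for an 80% tolerance interval at level 0.90
```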

When the intervals have been specified, the following statements can be made with respect to future forecasts based on these intervals. For continued one-step-ahead forecasting, we can expect the interval forecast to enclose $100\alpha\%$ of the actual forecast errors, as in (2). We can also be $100\gamma\%$ confident that at least $100p\%$ of the future observed errors will be enclosed. Analogous statements can be made with respect to the observations on the time series of interest, instead of the time series of forecast errors, by making use of (1) instead of (2).

6. Empirical Results.

A brief empirical investigation was conducted on six time series of forecast errors of 131 points each, which were supplied by Bell Laboratories. The series being forecast are the same series discussed in Williams and Goodman [7]; however, the forecasting equation used differs slightly from their equation (3.1) in that the quadratic term $t^2$ is not present. Five distribution-free prediction intervals were calculated, each from a moving sample of size 19:

Prediction Interval
  I    $0 \pm \max\{|e_{(19)}|, |e_{(1)}|\}$
  II   $(e_{(1)}, e_{(19)})$
  III  $0 \pm$ 2nd largest of $\{|e_{(19)}|, |e_{(18)}|, |e_{(1)}|, |e_{(2)}|\}$
  IV   $0 \pm$ 4th largest of $\{|e_{(19)}|, |e_{(18)}|, |e_{(17)}|, |e_{(16)}|, |e_{(1)}|, |e_{(2)}|, |e_{(3)}|, |e_{(4)}|\}$
  V    $(e_{(2)}, e_{(18)})$
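Intervals I, III, and IV above are symmetric about zero; their half-widths are, respectively, the largest, 2nd largest, and 4th largest absolute forecast errors in the sample (the order statistics listed for each interval are the only candidates for those ranks). A minimal sketch of this construction, with hypothetical names and not taken from the paper:

```python
import numpy as np

def symmetric_interval(errors, k):
    """Interval 0 +/- (kth largest absolute forecast error in the sample),
    as in intervals I (k = 1), III (k = 2), and IV (k = 4) above."""
    abs_sorted = np.sort(np.abs(np.asarray(errors, dtype=float)))[::-1]
    half_width = abs_sorted[k - 1]
    return -half_width, half_width
```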

The sample size was chosen using equation (6) so that the prediction intervals would enclose the forecast error with the following probabilities:

Prediction Interval    Probability Forecast Error Enclosed
  I                      .95
  II                     .90
  III                    .90
  IV                     .80
  V                      .80

The observed relative frequency of successful enclosures for the distribution-free prediction intervals was compared to the observed relative frequency of successful enclosures based on the normal theory. The latter were supplied by Williams and Goodman in a private communication. The results are presented in Table 1.

Table 1

Theoretically Expected     Observed Relative Frequency      Observed Relative Frequency
Relative Frequency of      of Successful Enclosures,        of Successful Enclosures,
Successful Enclosures      Based on Normal Assumption       Distribution-Free
  .95                        .8948                            .9479
  .90                        .8257                            .8988 (interval II)
                                                              .8929 (interval III)
  .80                        .7163                            .8140 (interval IV)
                                                              .7976 (interval V)
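A sketch of how the observed relative frequencies in Table 1 could be computed for interval II (again hypothetical code, not from the paper): each forecast error is checked against the interval built from the 19 errors that preceded it.

```python
import numpy as np

def enclosure_frequency(errors, n=19, r=1, s=19):
    """Observed relative frequency with which the moving-sample
    prediction interval (e_(r), e_(s)) encloses the next forecast error."""
    errors = np.asarray(errors, dtype=float)
    hits = 0
    trials = 0
    for t in range(n, len(errors)):
        window = np.sort(errors[t - n:t])     # the n most recent errors
        L, U = window[r - 1], window[s - 1]   # order statistics e_(r), e_(s)
        hits += L < errors[t] < U
        trials += 1
    return hits / trials

# Illustration on simulated (non-normal) errors; for a 131-point series,
# as in Section 6, the expected frequency for interval II is 0.90.
rng = np.random.default_rng(1)
simulated_errors = rng.standard_t(df=4, size=131)
print(enclosure_frequency(simulated_errors))
```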

REFERENCES

1. Box, G.E.P. and G.M. Jenkins, Time Series Analysis, San Francisco, California: Holden-Day, Inc., 1970.

2. Goldberger, A.S., Econometric Theory, New York: John Wiley and Sons, 1964.

3. Graybill, F.A., An Introduction to Linear Statistical Models, vol. I, New York: McGraw-Hill Book Co., 1961.

4. Murphy, R.B., "Nonparametric Tolerance Intervals," Annals of Mathematical Statistics, vol. 19, 1948, pp. 581-589.

5. Walsh, J.E., Handbook of Nonparametric Statistics, Princeton, N.J.: D. Van Nostrand Co., 1962.

6. Wilks, S.S., Mathematical Statistics, New York: John Wiley and Sons, 1962.

7. Williams, W.H. and M.L. Goodman, "Constructing Empirical Confidence Limits for Economic Forecasts," Journal of the American Statistical Assn., vol. 66, no. 336, December 1971, pp. 752-754.
