Bureau of Business Research
Graduate School of Business Administration
University of Michigan
December, 1970

BUREAU OF BUSINESS RESEARCH WORKING PAPER NO. 16

TIME SERIES FORECASTING PROCEDURES FOR AN ECONOMIC SIMULATION MODEL

by Kenneth O. Cogger
Research Fellow, University of Michigan

Copyright © 1970 by The University of Michigan
Printed in the United States of America
All rights reserved

BACKGROUND OF THIS PAPER

This article is based on research performed by the author in partial fulfillment of requirements for the Ph.D. degree at the University of Michigan. Additional research for this paper has been done at the Bureau of Business Research, Graduate School of Business Administration.

I. Introduction

In order to introduce the topic of time series analysis with emphasis on forecasting, it is necessary to clarify and define several concepts. A time series will, for our purposes, be narrowly defined as a family of real-valued random variables $\{X(t), t \in T\}$ where the index set $T$ is the set of all integers. That is, $T = \{\ldots, -1, 0, 1, \ldots\}$. It will be assumed that every finite collection $\{X(t_1), \ldots, X(t_n)\}$ is a set of random variables with a specified joint distribution function $F_{t_1, \ldots, t_n}(x(t_1), \ldots, x(t_n))$. This distribution function must satisfy two consistency requirements. First, if $\alpha_1, \ldots, \alpha_n$ is any permutation of $1, 2, \ldots, n$, then

$F_{t_1, \ldots, t_n}(x(t_1), \ldots, x(t_n)) = F_{t_{\alpha_1}, \ldots, t_{\alpha_n}}(x(t_{\alpha_1}), \ldots, x(t_{\alpha_n})).$

Second, for any integer $m < n$,

$F_{t_1, \ldots, t_m}(x(t_1), \ldots, x(t_m)) = F_{t_1, \ldots, t_n}(x(t_1), \ldots, x(t_m), \infty, \ldots, \infty).$

Under these definitions and restrictions, our discussion is necessarily limited to real-valued time series defined over a discrete time domain. The class of all such time series is sufficiently large, however, to include any time series appearing in the economic sector of the real world. In particular, this class includes all time series that would appear in a simulation model of the automobile industry.

The definition of a time series given above leads to the equivalence of the terms "time series" and "stochastic process." The theory of stochastic processes has developed rapidly over the past thirty years. During this period, time series analysis has utilized two distinct, but overlapping, approaches. The first of these approaches is restricted to what may be termed the frequency domain.

The object of time series analysis in the frequency domain is to obtain estimates of what is called the spectrum in order to determine whether a given time series exhibits certain fundamental rhythms or harmonics. This method of "hidden periodicities" has given way in recent years to utilizing the spectrum as a broad characterization of any time series regardless of the existence of harmonics. Time series analysis in the frequency domain, usually called spectral analysis, has been utilized most effectively in physics, engineering, and other non-economic areas. In fact, the early theoretical developments in time series analysis by Kolmogorov[1] and Wiener[2] were based mainly on spectral analysis.

The second approach to time series analysis is restricted to the more familiar (at least to economists) time domain. There are several important reasons for the fact that most applied economic time series analyses have taken place in the time domain. The first of these reasons has already been suggested: the time domain is simply a more familiar setting for economists interested in time series analysis. A second reason is that spectral analysis does not provide ready answers to questions related to forecasting. While spectral analysis may have merits for characterizing a given time series in a general way, the spectrum has no direct connection with time series forecasting. Since forecasting is often an ultimate goal, as in the automobile study, the decision is usually made to conduct economic time series analyses in the time domain, where statistical results are immediately applicable to forecasting. A third reason for preferring analyses in the time domain over spectral analysis is particularly pertinent to the automobile study. Within a simulation model for an industry, managerial decision rules must be formulated in an unambiguous manner. These rules can be implemented only after the input of various estimates and forecasts based upon time series analysis.

[1] Kolmogorov, A. N., "Anfangsgründe der Theorie der Markoffschen Ketten mit unendlich vielen möglichen Zuständen," Mat. Sbornik (N.S.), 1 (1936), 607-10.
[2] Wiener, N., Extrapolation, Interpolation, and Smoothing of Stationary Time Series. New York: Wiley, 1949.

From the standpoint of operating the simulation model, both the decision rules and their inputs must be capable of unambiguous description in terms of an algorithm. It is precisely this requirement of a simulation model that leads to the rejection of spectral analysis as a procedure for providing these inputs. Spectral analysis yields primarily graphical results which require a great deal of visual (and therefore subjective) interpretation. This lack of objectivity leads to considerable ambiguity in practice regarding the interpretation of sample spectra. To be sure, graphical results abound within the field of time series analysis in the time domain also, but these results are not usually interpreted visually. A fourth reason for preferring analyses in the time domain over spectral analysis is that the latter is strictly valid only for stationary time series. The concept of stationarity will be fully described in the next section, but it suffices to state now that many economic time series exhibit non-stationarity, effectively limiting the usefulness of spectral analysis within an economic simulation model.

II. Time Series Representations or Models

Time series may be characterized according to several different properties. Many of these properties are summarized in detail in Doob,[3] a valuable reference source for the theory of stochastic processes. One of the most frequently discussed properties of time series is that of stationarity. A strictly stationary time series is defined to be any time series possessing the property that every finite collection of random variables $\{X(t_1), \ldots, X(t_n)\}$ has the same joint distribution function as $\{X(t_1+h), \ldots, X(t_n+h)\}$ for any integer $h$. A consequence of this definition is that if the above collection of random variables has a multivariate normal distribution function, then the time series is strictly stationary if and only if the expected value of $X(t)$ is independent of $t$ and the covariance of $X(t_j)$ and $X(t_k)$ is a function only of the difference $t_j - t_k$.

[3] Doob, J. L., Stochastic Processes. New York: Wiley, 1953.

This characterization of multivariate normal distributions by their second-order, or covariance, properties leads to a less restrictive distinction between types of time series. A weakly stationary or second-order stationary time series (not necessarily normally distributed) is defined to be any time series such that the expected value of $X(t)$ is independent of $t$ and the covariance of $X(t)$ and $X(t+h)$ is a finite-valued function only of $h$. For future reference it is convenient to introduce the notation

$\gamma_x(t_j, t_k) = C(X(t_j), X(t_k))$

for the covariance of the random variables $X(t_j)$ and $X(t_k)$. If the time series of interest is weakly stationary, we shall write

$\gamma_x(h) = C(X(t), X(t+h))$

as the covariance of the random variables $X(t)$ and $X(t+h)$. It is important to note that many economic time series are non-stationary since their expected values are not constant. Other types of non-stationarity are possible, of course, including the possibility that $\gamma_x(t_j, t_k)$ is not a function of $t_j - t_k$ and the possibility that this covariance function fails to exist. These three types of non-stationarity will be referred to as non-stationarity in the mean, in the covariance, and in the variance, respectively.

Within the class of weakly stationary time series, further characterization is possible in terms of linear representations. The first type of linear representation for a weakly stationary time series is called a moving average. A weakly stationary time series $\{X(t), t \in T\}$ possessing an expected value $M = E\{X(t)\}$ is said to be a moving average process of order $q$ if there exists a sequence of real numbers $b_0 = 1, b_1, b_2, \ldots, b_q$ such that

(1)  $X(t) = M + \sum_{j=0}^{q} b_j \varepsilon(t-j); \quad t \in T,$

where $\{\varepsilon(t), t \in T\}$ is a weakly stationary time series with zero expected value and a covariance $\gamma_\varepsilon(h)$ which is zero for all $h \neq 0$. Usually $M$ is set equal to zero without loss of generality.

A second type of linear representation for a weakly stationary time series is known as an autoregressive process. A weakly stationary time series $\{X(t), t \in T\}$ possessing an expected value $M = E\{X(t)\}$ is said to be an autoregressive process of order $p$ if there exists a sequence of real numbers $a_0 = 1, a_1, a_2, \ldots, a_p$ such that

(2)  $\sum_{j=0}^{p} a_j [X(t-j) - M] = \varepsilon(t); \quad t \in T,$

where $\{\varepsilon(t), t \in T\}$ has the same properties as given above.

A time series $\{X(t), t \in T\}$ whose covariance $\gamma_x(t_j, t_k)$ is a function only of $t_j - t_k$ but whose expected value is a function of $t$ is non-stationary in the mean, as we said before. A time series exhibiting such non-stationarity may still, however, possess a representation given by (1) or (2). In this event, we shall still refer to these representations as a moving average process and an autoregressive process, respectively.

In examining representations (1) and (2) with $M = 0$, the theory of ordinary linear difference equations plays a major role. Since (2) is (apart from its stochastic nature) merely a linear difference equation of order $p$, a general solution for $X(t)$ as a linear combination of $\varepsilon(t), \varepsilon(t-1), \ldots$ can be found. This suggests, then, that any representation of the form (2) can be inverted into a representation of the form (1). Since this suggests the possible equivalence of a moving average representation and an autoregressive representation, it is worthwhile to examine this possible equivalence in detail. Such a detailed examination is beyond the scope of this paper, but we certainly should state those conditions under which a moving average process permits an equivalent representation as an autoregressive process. A sufficient condition for this equivalence is that the variance of $X(t)$, given by $\gamma_x(0)$, be finite.[4]

[4] Ibid., 95.

From (1), this requires

(3)  $\gamma_x(0) = \gamma_\varepsilon(0) \sum_{j=0}^{q} b_j^2 < \infty,$

where $\gamma_\varepsilon(0)$ is simply the variance of $\varepsilon(t)$. While for finite values of $q$ this may appear to be a trivial condition, one last property of the representations (1) and (2) indicates the non-triviality of condition (3). This property is that a moving average process of finite order $q$ is equivalent to an autoregressive process of infinite order, and that an autoregressive process of finite order $p$ is equivalent to a moving average process of infinite order. Depending on whether $p$ is finite, then, we may have to let $q$ approach infinity in (3), and convergence of the sum in (3) must be verified before deciding on the equivalence of the representations (1) and (2).

Given an equivalence between the representations (1) and (2) when $\{X(t), t \in T\}$ is weakly stationary and where either $p$ or $q$ (or both) must be infinite, it is apparent that the number of parameters needed to characterize the time series of interest ($a_1, a_2, \ldots, a_p, b_1, b_2, \ldots, b_q$) may be large, depending upon which representation is of direct interest. Therefore, in the interest of parsimony, a third linear representation for a weakly stationary time series has received an increasing amount of attention in recent years. Specifically, a time series $\{X(t), t \in T\}$ is said to possess a mixed autoregressive moving average representation of order $(p,q)$ if there exist sequences of real numbers $a_0 = 1, a_1, \ldots, a_p$ and $b_0 = 1, b_1, \ldots, b_q$ such that

(4)  $\sum_{j=0}^{p} a_j X(t-j) = \sum_{j=0}^{q} b_j \varepsilon(t-j); \quad t \in T,$

where $\{\varepsilon(t), t \in T\}$ is a weakly stationary time series with zero expected value and a covariance $\gamma_\varepsilon(h)$ which is zero for all $h \neq 0$. Representation (4) therefore includes representations (1) and (2) as special cases while avoiding the practical problems associated with a proliferation of parameters.
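To make representation (4) concrete, the following sketch, a modern illustration rather than anything specified in this paper, simulates realizations of a mixed autoregressive moving average process of order $(p,q)$. The function name and the Gaussian choice for $\{\varepsilon(t)\}$ are assumptions of the example; (4) only requires $\varepsilon(t)$ to be uncorrelated with zero mean.

```python
import numpy as np

def simulate_arma(a, b, n, burn_in=200, seed=0):
    """Simulate n realizations of (4): sum_j a[j] X(t-j) = sum_j b[j] eps(t-j),
    with a[0] = b[0] = 1 and eps(t) taken here as Gaussian white noise."""
    rng = np.random.default_rng(seed)
    p, q = len(a) - 1, len(b) - 1
    eps = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(max(p, q), n + burn_in):
        ma_part = sum(b[j] * eps[t - j] for j in range(q + 1))
        ar_part = sum(a[j] * x[t - j] for j in range(1, p + 1))
        x[t] = ma_part - ar_part        # solve (4) for X(t), since a[0] = 1
    return x[burn_in:]                  # discard start-up transients

# A weakly stationary ARMA(1,1) example with a1 = -0.5, b1 = 0.3
x = simulate_arma(a=[1.0, -0.5], b=[1.0, 0.3], n=500)
print(x.mean(), x.var())
```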

Representation (4) can be simplified by the introduction of linear operators. Specifically, let $B$ be the backward shift operator and let $\nabla$ be the backward difference operator, so that $BX(t) = X(t-1)$ and $\nabla X(t) = X(t) - X(t-1) = (1-B)X(t)$. Then if

(5)  $\Phi_p(B) = \sum_{j=0}^{p} a_j B^j \quad \text{and} \quad \Theta_q(B) = \sum_{j=0}^{q} b_j B^j,$

the representation (4) may be rewritten in the form

(6)  $\Phi_p(B) X(t) = \Theta_q(B) \varepsilon(t); \quad t \in T.$

Keeping the representations (4) and (6) in mind, it is helpful to recall that we are still considering time series $\{X(t), t \in T\}$ that are weakly stationary. For some time series, it is clearly unrealistic to assume, a priori, that the property of weak stationarity holds. Box and Jenkins,[5] in a series of papers beginning in 1962, have suggested generalized forms of (4) and (6) for the linear representation of a broad class of time series exhibiting non-stationarity.[6] To motivate their generalization, consider the autoregressive process of order one given by setting $p = 1$ and $q = 0$ in (4):

$X(t) + a_1 X(t-1) = \varepsilon(t); \quad t \in T.$

The equivalent moving average representation is then

$X(t) = \varepsilon(t) - a_1 \varepsilon(t-1) + a_1^2 \varepsilon(t-2) - a_1^3 \varepsilon(t-3) + \ldots$

In order for $\{X(t), t \in T\}$ to be weakly stationary, it is necessary that the variance of $X(t)$ be finite. This requires that $|a_1| < 1$. If $|a_1| > 1$, the process $\{X(t), t \in T\}$ is said to diverge and is clearly non-stationary in variance.

[5] Box, G. E. P., and Jenkins, G. M., Models for Prediction and Control, III: Linear Non-Stationary Models, Technical Report No. 79, University of Wisconsin Department of Statistics.
[6] Watts, D. G., An Introduction to Linear Modeling of Stationary and Non-Stationary Time Series, Technical Report No. 129, August 1967, University of Wisconsin Department of Statistics.

If $|a_1| = 1$, the process is still non-stationary, but in an interesting way. For, if $a_1 = +1$, the process oscillates with an unbounded range. A more important case is when $a_1 = -1$. In this case,

$X(t) = \varepsilon(t) + \varepsilon(t-1) + \varepsilon(t-2) + \varepsilon(t-3) + \ldots$

is the realization of what is termed a random walk. In fact, it is seen that the first backward difference given by

$X(t) - X(t-1) = \nabla X(t) = (1-B)X(t) = \varepsilon(t)$

is weakly stationary and, in fact, equivalent to a process with a covariance $\gamma_\varepsilon(h)$ which is zero for all $h \neq 0$. While the original process $\{X(t), t \in T\}$ has no variance (or higher central moments in general), its behavior is not divergent when $a_1 = -1$, but is said to be "homogeneously" non-stationary. This terminology was first introduced by Box and Jenkins.[7]

The above motivation suggests a generalization of (6) to a more complete representation given by

(7)  $\Phi_p(B)(1-B)^d X(t) = \Theta_q(B) \varepsilon(t); \quad t \in T,$

which of course includes both stationary and non-stationary time series. Representation (7), following Box and Jenkins,[8] is called an integrated autoregressive moving average of order $(p,d,q)$. If $d = 0$, (7) represents "well-behaved," weakly stationary time series. If $d > 0$, (7) represents "homogeneously" non-stationary time series possessing infinite variances. This latter possibility is one which has frequently caused concern in the analysis of economic time series. It has been argued that time series possessing infinite variances cannot be possible in the economic sector. If this is indeed the case, then careful analysis of any given time series should reveal whether it possesses a representation in (7) with $d = 0$. The truth of this argument, therefore, does not prohibit us from at least considering the possibility that $d > 0$ in (7).

[7] Box and Jenkins, Op. cit.
[8] Box and Jenkins, Op. cit.
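A short numerical illustration of homogeneous non-stationarity, again ours rather than the paper's: generate the random walk obtained above with $a_1 = -1$ and verify that, while the variance of the level grows without bound, the first backward difference $\nabla X(t)$ recovers the uncorrelated series $\{\varepsilon(t)\}$ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.standard_normal(5000)
x = np.cumsum(eps)                      # random walk: X(t) - X(t-1) = eps(t)

# Non-stationarity in the variance of the level ...
print(np.var(x[:500]), np.var(x))       # grows with the realization length

# ... removed entirely by one application of (1-B)
dx = np.diff(x)
print(np.allclose(dx, eps[1:]))         # True: the difference is eps(t) itself
```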

Alternatively, it has been shown in some recent work by Harrison[9] and Cogger[10] that several time series from the economic sector exhibit behavior typical of time series represented by (7) with $d > 0$.

Given that a time series, either stationary or non-stationary, can be represented by (7) for some set of parameters $a_1, a_2, \ldots, a_p$, $b_1, b_2, \ldots, b_q$, $p$, $d$, and $q$, it is necessary to estimate these parameters. Specifically, a sequence of realizations $X(1), \ldots, X(N)$ of this time series must be utilized to estimate $p$, $d$, $q$, $a_1, a_2, \ldots, a_p$, $b_1, b_2, \ldots, b_q$ before forecasts of future realizations $X(N+1), X(N+2), \ldots$ of this time series can be calculated. The logical approach to take at this point might therefore be to discuss estimation procedures before examining forecasting procedures. It is convenient for the purposes of the automobile study, however, to reverse our approach and discuss forecasting procedures first. This approach is taken in light of the evolutionary manner in which economic simulation models are usually constructed. A modular approach is usually taken in constructing such models, beginning with a relatively simple model and then adding sophistication to the model as familiarity with its workings is attained. This incremental or modular approach is ideally suited to the incorporation of a time series analysis program within the simulation model, since the problems of forecasting and estimation can be totally segregated. Initially, a simulation model will have a forecasting segment in which the time series model (7) is specified a priori. As the simulation model attains a higher level of sophistication, the appropriate model will no longer be specified a priori but on the basis of information flowing from a separate estimation segment within the simulation model.

[9] Harrison, P. J., "Exponential Smoothing and Short Term Sales Forecasting," Management Science, 13 (1967), 821-42.
[10] Cogger, K. O., "Statistical Foundations of General Order Exponential Smoothing with Implications for Applied Time Series Analysis," unpublished Ph.D. dissertation, The University of Michigan, to appear.

Forecasting procedures will therefore be the subject of Section III, and estimation procedures will be developed in Section IV.

III. Forecasting

Initially, the auto study simulation model will utilize a time series forecasting technique based upon a single model from (7). Regardless of the time series under consideration, the parameters $p$, $d$, $q$, $a_1, a_2, \ldots, a_p$, $b_1, \ldots, b_q$ will remain fixed. For this particular time series representation, the expected squared forecast error will be a minimum only when the time series under consideration possesses the chosen representation. It is important, then, to choose a representation which will achieve reasonably accurate forecasts regardless of the actual time series representation. It has been shown by Harrison[11] and Cogger[12] that model (7) with $p = 0$, $d = q = 1$, $b_1 = -\beta$, $|\beta| < 1$ is one such representation that achieves this type of "robustness" under misspecification. With these parameter values, model (7) may be written in the form $(1-B)X(t) = (1-\beta B)\varepsilon(t)$, which can be written as

(8)  $X(t) - X(t-1) = \varepsilon(t) - \beta\varepsilon(t-1)$

after removing the backward shift operator $B$. The basic problem is to find a forecast, denoted by $\hat{X}_t(\ell)$, of some future value $X(t+\ell)$, given only the realizations $X(t), X(t-1), \ldots$, such that the mean squared forecast error given by $E[X(t+\ell) - \hat{X}_t(\ell)]^2$ is a minimum for any $\ell > 0$. While this problem seems formidable, it is actually quite easy to solve given (8). From (8), which is an ordinary difference equation, the time series $\{X(t), t \in T\}$ has an alternative representation of the form

(9)  $X(t) = M + \varepsilon(t) + (1-\beta) \sum_{j=1}^{\infty} \varepsilon(t-j),$

where $M$ is a constant.

[11] Harrison, P. J., Op. cit.
[12] Cogger, K. O., Op. cit.

Representation (9) is simply the sum of the solution (given by $M$) of the homogeneous difference equation ($X(t) - X(t-1) = 0$) and the particular solution of the non-homogeneous difference equation ($X(t) - X(t-1) = \varepsilon(t) - \beta\varepsilon(t-1)$). The forecast of $X(t+\ell)$ which achieves minimum mean squared error is known to be simply the conditional expectation of $X(t+\ell)$ given $X(t), X(t-1), \ldots$, which is given by

(10)  $\hat{X}_t(\ell) = E\{X(t+\ell) \mid X(t), X(t-1), \ldots\} = M + (1-\beta) \sum_{j=0}^{\infty} \varepsilon(t-j).$

For a complete discussion of this well-known property of conditional expectations, the reader is referred to Doob[13] for an advanced discussion and Malinvaud[14] for a less theoretical discussion. From (9) and (10), it is seen that

(11)  $X(t+\ell) - \hat{X}_t(\ell) = \begin{cases} \varepsilon(t+\ell) + (1-\beta) \sum_{j=1}^{\ell-1} \varepsilon(t+\ell-j); & \ell \geq 2 \\ \varepsilon(t+\ell); & \ell = 1. \end{cases}$

The mean squared forecast error for any $\ell \geq 1$ is given by

(12)  $E[X(t+\ell) - \hat{X}_t(\ell)]^2 = \gamma_\varepsilon(0)\left[1 + (\ell-1)(1-\beta)^2\right]; \quad \ell \geq 1,$

since the time series $\{\varepsilon(t), t \in T\}$ has been assumed to be uncorrelated with mean zero.

The form of the $\ell$-step ahead forecast given by (10) is not utilized in practice. Instead, such a forecast is usually given in terms of $X(t), X(t-1), \ldots$, since these are the observable random variables. To calculate this more useful form of a forecast, we start with the original representation (8) in the form

(13)  $X(t) = X(t-1) + \varepsilon(t) - \beta\varepsilon(t-1).$

The observation to be forecast can then be expressed as

(14)  $X(t+\ell) = X(t+\ell-1) + \varepsilon(t+\ell) - \beta\varepsilon(t+\ell-1); \quad \ell \geq 1.$

[13] Doob, J. L., Op. cit.
[14] Malinvaud, E., Statistical Methods of Econometrics. Chicago: Rand McNally and Co., 1966.

The forecast of $X(t+\ell)$ made at time $t$, which we have denoted by $\hat{X}_t(\ell)$, is now found by taking conditional expectations at time $t$ on both sides of (14), using the relations

(15)
$E\{X(t+\ell) \mid X(t), X(t-1), \ldots\} = \hat{X}_t(\ell); \quad \ell \geq 1$
$E\{X(t+\ell-1) \mid X(t), X(t-1), \ldots\} = \hat{X}_t(\ell-1); \quad \ell \geq 2$
$E\{X(t+\ell-1) \mid X(t), X(t-1), \ldots\} = X(t); \quad \ell = 1$
$E\{\varepsilon(t+\ell) \mid X(t), X(t-1), \ldots\} = 0; \quad \ell \geq 1$
$E\{\varepsilon(t+\ell-1) \mid X(t), X(t-1), \ldots\} = 0; \quad \ell \geq 2$
$E\{\varepsilon(t+\ell-1) \mid X(t), X(t-1), \ldots\} = \varepsilon(t); \quad \ell = 1.$

Whence we obtain the forecast $\hat{X}_t(\ell)$ entirely in terms of previous forecasts and of known values of the series, since from (11) it is seen that $\varepsilon(t) = X(t) - \hat{X}_{t-1}(1)$. Specifically, (14) and (15) reveal that

(16)  $\hat{X}_t(1) = X(t) - \beta[X(t) - \hat{X}_{t-1}(1)] = (1-\beta)X(t) + \beta\hat{X}_{t-1}(1); \quad \ell = 1;$
$\hat{X}_t(\ell) = \hat{X}_t(\ell-1); \quad \ell \geq 2.$

The form of (16) reveals several important properties of time series forecasts based upon representation (8). First, the forecasting procedure is (computationally) extremely efficient. It is recursive in nature, allowing forecasts to be quickly revised in the light of new observations on the time series of interest. A new forecast is obtained simply as a weighted average (depending on the value of $\beta$) of the most recent observation and the most recent one-step ahead forecast. Second, the forecast is constant for all future time periods. This second property is a necessary consequence of representation (8), since the underlying time series is assumed to have a constant expected value $M$. Third, the forecast implicitly assumes that an infinite number of past realizations of the time series are available for forecasting purposes. This property follows from the fact that the conditional expectations in (15) are based upon $X(t), X(t-1), \ldots$ ad inf.

This last property deserves special attention, since in the automobile study's simulation model any time series will be observed for only a finite number of time periods. To apply the forecasting procedure (16), suppose that an initial forecast is used to determine a forecast at time $t = 1$.

Then

$\hat{X}_1(1) = (1-\beta)X(1) + \beta\hat{X}_0(1),$

where $\hat{X}_0(1)$ is determined in some arbitrary manner. By recursively updating the forecast as new observations become available, (16) finally reveals that

(17)  $\hat{X}_N(1) = (1-\beta) \sum_{j=0}^{N-1} \beta^j X(N-j) + \beta^N \hat{X}_0(1)$

is the forecast of $X(N+1)$ based on the available realizations $X(1), X(2), \ldots, X(N)$ and on the initial forecast $\hat{X}_0(1)$. As the number, $N$, of observations increases, the effect of the arbitrarily specified value for $\hat{X}_0(1)$ becomes negligible (since $|\beta| < 1$, $\lim_{N \to \infty} \beta^N = 0$). The effect of $\hat{X}_0(1)$ is significant if $N$ is small, however, and it is important to eliminate its effect upon $\hat{X}_N(1)$ in this case.[15] Taking expected values on both sides of (17), it is seen that

$E\{\hat{X}_N(1)\} = (1-\beta^N)M + \beta^N \hat{X}_0(1),$

where $M$ is the expected value of $X(N-j)$ for all $j \geq 0$. But then

$E\left\{\frac{\hat{X}_N(1) - \beta^N \hat{X}_0(1)}{1-\beta^N}\right\} = M.$

Therefore, for any value of $N$ and any initial forecast $\hat{X}_0(1)$, we may correct the forecast $\hat{X}_N(1)$ in such a way that it will be an unbiased estimate of $M$ for the time series $\{X(t), t \in T\}$. Specifically, it is recommended that $\hat{X}_0(1)$ be equated to zero in the forecasting procedure described by (16) and that $\hat{X}_t(\ell)$ be adjusted by the factor $1/(1-\beta^t)$ for all $t \geq 1$. For large values of $t$, this correction will not be needed, but for small values of $t$ its use will insure unbiasedness in the resulting forecast.

[15] Wade, R. C., "A Technique for Initializing Exponential Smoothing Forecasts," Management Science, 13 (1967), 601-2.

To summarize the institution of this forecasting procedure in the automobile simulation model, we now recall the pertinent steps that must be carried out. It is assumed that $N$ realizations of a time series have been observed and denoted $X(1), X(2), \ldots, X(N)$. It is further assumed that a forecast is required for any future realization $X(N+\ell)$ where $\ell \geq 1$. We denote this forecast by $\hat{X}_N(\ell)$. Recursively, the following steps are carried out.

Step 1: $\hat{X}_1(1) = (1-\beta)X(1)$
Step 2: $\hat{X}_t(1) = (1-\beta)X(t) + \beta\hat{X}_{t-1}(1); \quad t = 2, 3, \ldots, N$
Step 3: $\hat{X}_N(\ell) = \hat{X}_N(1)/(1-\beta^N); \quad \ell \geq 1.$

Step 3, of course, is not necessary for large $N$, but it should always be made for the small sample sizes ($N \leq 20$) extant in the automobile study.
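Steps 1-3 translate directly into a few lines of code. The sketch below is illustrative (the names and data are ours); it starts the recursion from $\hat{X}_0(1) = 0$ and applies the correction factor $1/(1-\beta^N)$ of Step 3, so the result is an unbiased estimate of $M$ even for small $N$.

```python
import numpy as np

def smooth_forecast(x, beta=0.5):
    """First order exponential smoothing per Steps 1-3.
    x holds X(1),...,X(N); the returned forecast of X(N+l) is the
    same for every lead l >= 1 under model (8)."""
    s = (1.0 - beta) * x[0]                 # Step 1
    for obs in x[1:]:
        s = (1.0 - beta) * obs + beta * s   # Step 2
    return s / (1.0 - beta ** len(x))       # Step 3: unbiasedness correction

x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])
print(smooth_forecast(x))                   # constant for all future periods
```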

Finally, it is necessary to choose a value for the parameter $\beta$. Since we wish the forecasts obtained by this forecasting procedure to be as accurate as possible (in the sense of minimizing the mean squared forecast error), the choice of a value for $\beta$ is important. A value of $\beta = .5$ is recommended for several reasons. First, the above forecasting procedure is based on the model (7) when $(p,d,q) = (0,1,1)$. If the time series under consideration is truly represented by this model with $\beta = .5$, the forecast is known to possess minimum mean squared error. If the true value of $\beta$ is not $.5$, this forecast will possess a larger mean squared error than is theoretically possible. The increase in mean squared error will be, however, less than 33-1/3 percent for any true value of $\beta$ in the range $0 < \beta < 1$, a considerably wide interval. Second, suppose the time series under consideration cannot be represented by model (7) with $(p,d,q) = (0,1,1)$ for any value of $\beta$. Then our choice of a value for $\beta$ (given that we still insist upon using the above forecasting procedure) should be based on the principle of providing a reasonable amount of protection against the existence of alternative representations for the time series of interest. One commonly assumed time series model is given by the condition $(p,d,q) = (1,0,0)$ with $-1 < a_1 < 0$. Such a time series representation is called a first order Markov process, and the above forecasting procedure will not achieve the minimum mean squared error that is theoretically possible. For a value of $\beta = .5$, however, the mean squared error of the above forecast will be increased over this minimum by less than 30 percent. The above properties of the chosen forecasting procedure are discussed in detail in Cogger.[16] The robustness of this forecasting procedure under misspecification of the true time series representation renders it quite valuable when this true representation is unknown, as it will be in the automobile study.

The simple forecasting procedure described in Steps 1-3 is called first order exponential smoothing. It is called "first order" because the expected value of the underlying time series is constant. The term "exponential" refers to the geometrically declining weights attached to previous observations in the forecast (17). This particular forecasting procedure, of course, is only one of many that can be derived from the integrated autoregressive moving average representation (7). Accordingly, we shall now consider more general time series forecasting procedures derived from (7).

The first generalization we wish to discuss is the time series forecasting procedure known as general order exponential smoothing. To motivate our discussion, it is convenient to reconsider Steps 1-3 for first order exponential smoothing in the form:

Step 1': $S_1^{[1]} = (1-\beta)X(1)$
Step 2': $S_t^{[1]} = (1-\beta)X(t) + \beta S_{t-1}^{[1]}; \quad t = 2, 3, \ldots, N$
Step 3': $\hat{X}_N(\ell) = S_N^{[1]}/(1-\beta^N); \quad \ell \geq 1.$

This notation is more suitable for the discussion which follows, and we shall always refer to $S_t^{[1]}$ as the first order exponentially smoothed value of the data $X(1), \ldots, X(t)$. These steps suggest an immediate generalization, however.

[16] Cogger, K. O., Op. cit.

Suppose we define $S_t^{[p]}$ as the $p$-th order exponentially smoothed value of the data $X(1), \ldots, X(t)$, given by applying Steps 1'-3' not to the original data but to the $(p-1)$-st order exponentially smoothed value of the data. More precisely, let $S_t^{[p]}$ be defined by

(18)  $S_t^{[p]} = \begin{cases} (1-\beta)X(t) + \beta S_{t-1}^{[1]}; & p = 1; \ t \geq 1 \\ (1-\beta)S_t^{[p-1]} + \beta S_{t-1}^{[p]}; & p \geq 2; \ t \geq 1 \\ 0; & p \geq 1; \ t = 0. \end{cases}$

To see that this definition can lead to useful results, (18) can be solved recursively for $t = 1, 2, \ldots, N$ to obtain[17]

(19)  $S_N^{[p]} = (1-\beta)^p \sum_{j=0}^{N-1} \binom{j+p-1}{p-1} \beta^j X(N-j); \quad p \geq 1; \ N \geq 1.$

(Note that if $p = 1$, (19) reduces to the previously defined first order smoothed value of the data.) We now assume that the expected value of $X(N-j)$ is a polynomial in $j$ of degree $n-1$. Hence, this expected value may be expanded in a Taylor series about the point $N$ in the form

(20)  $E\{X(N-j)\} = \sum_{k=0}^{n-1} (-j)^k A_N^{[k]}/k!,$

where the $A_N^{[k]}$, for any fixed value of $N$, are unknown constants. The important question to be answered at this point is whether the exponentially smoothed values of the data in (19) can be used to estimate the unknown parameters $A_N^{[k]}$ for $k = 0, 1, \ldots, n-1$. The answer to this question is in the affirmative, as shown in Brown[18] for the asymptotic case ($N$ tending in the limit to infinity) and in Cogger[19] for any value of $N$.

[17] Brown, R. G., Op. cit.
[18] Ibid.
[19] Cogger, K. O., Op. cit.

To show this, we take expected values on both sides of (19) and utilize (20) to get

(21)  $E\{S_N^{[p]}\} = (1-\beta)^p \sum_{j=0}^{N-1} \binom{j+p-1}{p-1} \beta^j \sum_{k=0}^{n-1} (-j)^k A_N^{[k]}/k!; \quad p \geq 1; \ N \geq 1.$

Defining the $n \times 1$ vector $S_N$ in the form

(22)  $S_N = \left[S_N^{[1]}, S_N^{[2]}, \ldots, S_N^{[n]}\right]',$

the $n \times 1$ vector $A_N$ in the form

(23)  $A_N = \left[A_N^{[0]}, A_N^{[1]}, \ldots, A_N^{[n-1]}\right]',$

and the $n \times n$ matrix $M(N)$ with the element in the $p$-th row and $(k+1)$-st column given by

(24)  $M_{p,k+1}(N) = \frac{(-1)^k (1-\beta)^p}{k!\,(p-1)!} \sum_{j=0}^{N-1} \beta^j j^k \frac{(j+p-1)!}{j!}; \quad 1 \leq p \leq n; \ 0 \leq k \leq n-1,$

we now rewrite (21) in the matrix form

(25)  $E\{S_N\} = M(N) A_N; \quad N \geq 1.$

But then an unbiased estimate of the elements of $A_N$ is given by

(26)  $\hat{A}_N = M(N)^{-1} S_N; \quad N \geq 1,$

where $M(N)^{-1}$ is merely the inverse of the matrix $M(N)$. Defining

(27)  $\boldsymbol{\ell} = \left[1, \ell, \ell^2/2!, \ldots, \ell^{n-1}/(n-1)!\right]'$

as an $n \times 1$ vector, it is clear from (20) that

$E\{X(N+\ell)\} = \sum_{k=0}^{n-1} \frac{\ell^k}{k!} A_N^{[k]} = \boldsymbol{\ell}' A_N,$

and therefore an unbiased estimate of the mean of the time series at time $N+\ell$ is given by

(28)  $\hat{X}_N(\ell) = \boldsymbol{\ell}' \hat{A}_N = \boldsymbol{\ell}' M(N)^{-1} S_N; \quad N \geq 1.$

Alternatively, (28) describes the $\ell$-step ahead forecast of the time series based on the observations $X(1), X(2), \ldots, X(N)$.

The above forecasting procedure is called exponential smoothing of order $n$ (or, sometimes, simply general order exponential smoothing). It remains to identify the time series represented in (7) for which general order exponential smoothing minimizes the mean squared forecast error. The time series representation in (7) for which this property holds is given by setting $p = 0$, $d = q = n$, and $b_j = \binom{n}{j}(-\beta)^j$; $j = 0, 1, \ldots, n$. That is, the forecast given by exponential smoothing of order $n$ achieves minimum mean squared error for a time series possessing the linear representation

(29)  $\nabla^n X(t) = \sum_{j=0}^{n} \binom{n}{j} (-\beta)^j \varepsilon(t-j),$

where the $\varepsilon(t)$ are uncorrelated with mean zero. A detailed proof of this result is beyond the scope of this paper but may be found in Cogger.[20]

Exponential smoothing of order $n$ is perhaps one of the most widely used forecasting procedures based on (7). Its wide use has been promoted by the ease with which time series forecasts may be made (cf. (18), (22), (24), (27), and (28)). The only apparent difficulty with the procedure is the complexity of the matrix $M(N)$ for large values of $n$. This problem has been recently resolved.[21] A recursive method of evaluating this matrix for any set of values for $n$, $N$, and $\beta$ is given by

(30)  $M_{p,k+1}(N) = \begin{cases} 1-\beta^N; & p = 1; \ k = 0 \\ M_{p-1,1}(N) - (1-\beta)^{p-1}\beta^N\binom{p+N-2}{p-1}; & p \geq 2; \ k = 0 \\ (p/k)\left[M_{p,k}(N) - M_{p+1,k}(N)/(1-\beta)\right]; & p \geq 1; \ k \geq 1. \end{cases}$

The relationships in (30) are easily verified by substitution from (24).

[20] Cogger, K. O., Op. cit.
[21] Ibid.
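The entire procedure, smoothing by (18), construction of $M(N)$, and the forecast (28), can be sketched as follows. This is an illustration of ours only; for simplicity it evaluates $M(N)$ directly from (24) rather than by the recursion (30), which would be preferred for large $n$.

```python
import numpy as np
from math import factorial

def general_smoothing_forecast(x, n, lead, beta=0.5):
    """Exponential smoothing of order n: recursion (18), matrix (24),
    forecast (28). Returns the lead-step ahead forecast X^_N(lead)."""
    N = len(x)
    s, prev = np.zeros(n), np.zeros(n)      # S[1],...,S[n]; S_0[p] = 0
    for t in range(N):
        layer = x[t]
        for p in range(n):
            s[p] = (1 - beta) * layer + beta * prev[p]   # recursion (18)
            layer = s[p]
        prev = s.copy()
    M = np.empty((n, n))                    # element (24) in row p, column k+1
    for p in range(1, n + 1):
        for k in range(n):
            total = sum(beta**j * j**k * factorial(j + p - 1) / factorial(j)
                        for j in range(N))
            M[p - 1, k] = ((-1)**k * (1 - beta)**p * total
                           / (factorial(k) * factorial(p - 1)))
    lvec = np.array([lead**k / factorial(k) for k in range(n)])  # vector (27)
    return lvec @ np.linalg.solve(M, s)     # forecast (28)

x = np.array([3.1, 3.4, 3.9, 4.1, 4.6, 5.0, 5.3, 5.9])
print(general_smoothing_forecast(x, n=2, lead=1))   # extrapolates the trend
```

With $n = 1$ this reduces exactly to Steps 1'-3' above, since $M_{1,1}(N) = 1-\beta^N$.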

We now consider representation (7) in its completely general form given by

(31)  $\sum_{j=0}^{p} a_j \nabla^d X(t-j) = \sum_{j=0}^{q} b_j \varepsilon(t-j).$

For given values of $p$, $d$, $q$, $a_1, \ldots, a_p$, $b_1, \ldots, b_q$, the representation (31) may be written alternatively in the form

(32)  $X(t) = m(t) + \sum_{k=0}^{\infty} \psi_k \varepsilon(t-k),$

where $m(t)$ is the solution of the homogeneous difference equation given by

(33)  $\sum_{j=0}^{p} a_j \nabla^d X(t-j) = 0,$

and where the first few coefficients $\psi_0 = 1, \psi_1, \psi_2, \ldots$ which are needed in what follows may be obtained by equating coefficients of $B^0 = 1, B, B^2, \ldots$ in the expression

(34)  $\sum_{k=0}^{\infty} \psi_k B^k = \sum_{j=0}^{q} b_j B^j \bigg/ \left[(1-B)^d \sum_{j=0}^{p} a_j B^j\right].$

This follows from (7) by noting that if

$\Phi_p(B)(1-B)^d X(t) = \Theta_q(B)\varepsilon(t),$

then

$X(t) = \frac{\Theta_q(B)}{\Phi_p(B)(1-B)^d}\,\varepsilon(t) = \left[\sum_{k=0}^{\infty} \psi_k B^k\right]\varepsilon(t).$

But then from (32), it is seen that

(35)  $\hat{X}_t(\ell) = E\{X(t+\ell) \mid X(t), X(t-1), \ldots\} = m(t+\ell) + \sum_{k=\ell}^{\infty} \psi_k \varepsilon(t+\ell-k).$
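The coefficients $\psi_k$ in (34) are obtained mechanically by equating powers of $B$, that is, by dividing one polynomial by another. A small illustrative routine (ours, with hypothetical argument names):

```python
import numpy as np

def psi_weights(a, b, d, n_weights):
    """psi_k from (34): psi(B) = Theta_q(B) / [(1-B)^d Phi_p(B)].
    a = [1, a1, ..., ap], b = [1, b1, ..., bq]."""
    phi = np.array(a, dtype=float)
    for _ in range(d):                      # form Phi_p(B) (1-B)^d
        phi = np.convolve(phi, [1.0, -1.0])
    psi = np.zeros(n_weights)
    for k in range(n_weights):
        bk = b[k] if k < len(b) else 0.0
        acc = sum(phi[j] * psi[k - j]
                  for j in range(1, min(k, len(phi) - 1) + 1))
        psi[k] = bk - acc                   # equate coefficients of B^k
    return psi

# The (0,1,1) model (8): psi_0 = 1 and psi_k = 1 - beta thereafter, as in (9)
print(psi_weights(a=[1.0], b=[1.0, -0.5], d=1, n_weights=6))
```

These $\psi_k$ are exactly what the mean squared error expression derived next requires.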

Combining (32) and (35), the minimum mean squared error of the $\ell$-step ahead forecast will be given by

(36)  $E\{X(t+\ell) - \hat{X}_t(\ell)\}^2 = \gamma_\varepsilon(0) \sum_{k=0}^{\ell-1} \psi_k^2; \quad \ell \geq 1.$

The above results enable us to determine the variance of the $\ell$-step ahead forecast. The form of the forecast itself is most easily found in a different manner. Accordingly, we rewrite (31) in the form

(37)  $\sum_{j=0}^{p} a_j \nabla^d X(t-j) = X(t) - \sum_{j=1}^{p+d} \theta_j X(t-j) = \sum_{j=0}^{q} b_j \varepsilon(t-j),$

where the coefficients $\theta_j$ are obtained by expanding $\Phi_p(B)(1-B)^d = 1 - \sum_{j=1}^{p+d} \theta_j B^j$. This form assumes that $p+d \geq 1$, an assumption that will be maintained throughout our discussion. But then

(38)  $X(t+\ell) = \sum_{j=1}^{p+d} \theta_j X(t+\ell-j) + \sum_{j=0}^{q} b_j \varepsilon(t+\ell-j).$

We now take expectations, conditional on $X(t), X(t-1), \ldots$, on both sides of (38) (recalling (15)) to get

(39)  $\hat{X}_t(\ell) = E\left\{\sum_{j=1}^{p+d} \theta_j X(t+\ell-j) \,\middle|\, X(t), X(t-1), \ldots\right\} + E\left\{\sum_{j=0}^{q} b_j \varepsilon(t+\ell-j) \,\middle|\, X(t), X(t-1), \ldots\right\},$

where

(40)  $E\left\{\sum_{j=1}^{p+d} \theta_j X(t+\ell-j) \,\middle|\, \ldots\right\} = \begin{cases} \sum_{j=1}^{\ell-1} \theta_j \hat{X}_t(\ell-j) + \sum_{j=\ell}^{p+d} \theta_j X(t+\ell-j); & p+d \geq \ell \geq 2 \\ \sum_{j=1}^{p+d} \theta_j X(t+1-j); & p+d \geq \ell = 1 \\ \sum_{j=1}^{p+d} \theta_j \hat{X}_t(\ell-j); & p+d < \ell; \ \ell \geq 2 \end{cases}$

and where

(41)  $E\left\{\sum_{j=0}^{q} b_j \varepsilon(t+\ell-j) \,\middle|\, \ldots\right\} = \begin{cases} \sum_{j=\ell}^{q} b_j \varepsilon(t+\ell-j); & 1 \leq \ell \leq q \\ 0; & \ell > q. \end{cases}$

Since, from (32) and (35), $\varepsilon(t+\ell-j) = X(t+\ell-j) - \hat{X}_{t+\ell-j-1}(1)$ for $j \geq \ell$, (39)-(41) completely determine the minimum mean squared error forecast $\hat{X}_t(\ell)$ in terms of previous observations $X(t), X(t-1), \ldots$ and of previous forecasts $\hat{X}_{t-1}(1), \hat{X}_{t-2}(1), \ldots$ for any values of $p$, $d$, and $q$ such that $p+d \geq 1$.

For the case $p+d = 0$, (31) may be written as

(42)  $X(t) = \sum_{j=0}^{q} b_j \varepsilon(t-j).$

But then

(43)  $X(t+\ell) = \sum_{j=0}^{q} b_j \varepsilon(t+\ell-j).$

Taking conditional expectations on both sides of (43), it is seen that

(44)  $\hat{X}_t(\ell) = \begin{cases} \sum_{j=\ell}^{q} b_j \varepsilon(t+\ell-j); & 1 \leq \ell \leq q \\ 0; & \ell > q, \end{cases}$

which is the same as the expression in (41). Hence, equations (38)-(41) and (43)-(44) enable forecasts to be made for any set of values for $p$, $d$, and $q$.

One last point to be considered is the need to calculate "initial conditions" for the forecast errors $\varepsilon(t+\ell-j) = X(t+\ell-j) - \hat{X}_{t+\ell-j-1}(1)$ for $j \geq \ell$ found in (41) and (44). The recommended procedure is to set these equal to zero or, equivalently, equal to their expected values. For a large number of observations, of course, these "initial conditions" will have a negligible effect upon the forecast $\hat{X}_t(\ell)$, and for any value of $t \geq 1$, the forecast will be an unbiased estimate of $X(t+\ell)$. If the particular representation in (7) corresponds to general order exponential smoothing, of course, these initial conditions may be eliminated in a more rigorous manner, and the necessary forecasts may be calculated, as before, without the use of equations (38)-(44).
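For concreteness, here is an illustrative routine (ours, not the study's production code) that carries out the recursions (38)-(41): residuals $\varepsilon(t) = X(t) - \hat{X}_{t-1}(1)$ are accumulated with the recommended zero initial conditions, after which forecasts for any lead are generated by replacing future observations with forecasts and future errors with their zero means.

```python
import numpy as np

def arima_forecasts(x, a, b, d, n_leads):
    """l-step ahead forecasts from (38)-(41) for given parameters,
    a = [1, a1, ..., ap], b = [1, b1, ..., bq], and difference order d."""
    phi = np.array(a, dtype=float)
    for _ in range(d):
        phi = np.convolve(phi, [1.0, -1.0])
    theta = -phi[1:]                    # (37): X(t) = sum theta_j X(t-j) + MA part
    m, q, n = len(theta), len(b) - 1, len(x)
    eps = np.zeros(n)                   # eps(1) = 0: zero "initial condition"
    for t in range(1, n):               # residuals from one-step forecasts;
        ar = sum(theta[j - 1] * x[t - j] for j in range(1, min(t, m) + 1))
        ma = sum(b[j] * eps[t - j] for j in range(1, min(t, q) + 1))
        eps[t] = x[t] - (ar + ma)       # pre-sample terms treated as zero
    fc = []
    for lead in range(1, n_leads + 1):  # (39)-(41): future eps have mean zero
        s = n + lead - 1
        ar = sum(theta[j - 1] * (x[s - j] if s - j < n else fc[s - j - n])
                 for j in range(1, m + 1))
        ma = sum(b[j] * eps[s - j] for j in range(lead, q + 1) if s - j >= 0)
        fc.append(ar + ma)
    return fc

# Model (8) again, (p,d,q) = (0,1,1) with beta = 0.5: a constant forecast
x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.8, 10.6])
print(arima_forecasts(x, a=[1.0], b=[1.0, -0.5], d=1, n_leads=3))
```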

IV. Estimation of Forecasting Parameters

Suppose that $N$ consecutive realizations $X(1), X(2), \ldots, X(N)$ of a time series $\{X(t), t \in T\}$ represented by the model (7) have been observed. In order to implement the forecasting procedures described in Section III, appropriate estimates of the unknown parameters $a_1, a_2, \ldots, a_p, b_1, b_2, \ldots, b_q$ must be obtained. In addition, the order $(p,d,q)$ of the process must be determined. Each of the three estimation procedures described in this section is based on the assumption that $(p,d,q)$ is known. For this given order, each procedure yields estimates of the unknown parameters as well as a statistical measure of the goodness-of-fit of the available realizations to the model (7). These estimation procedures do, however, permit the determination of the proper order $(p,d,q)$ in the following manner. First, estimation procedures are instituted for a large number of possible orders of the process. Empirical evidence has suggested that most economic time series can be modeled reasonably well by one of the following orders: $d = 0, 1,$ or $2$ and $p = q = 0, 1,$ or $2$. These nine different orders are therefore recommended for use in economic simulation models. Second, a statistical measure of goodness-of-fit is calculated for each of these nine different orders. Finally, the order $(p,d,q)$ which yields the best fit of the available realizations to the model (7) is specified as the appropriate order of the process, and the corresponding estimates of the parameters $a_1, a_2, \ldots, a_p, b_1, b_2, \ldots, b_q$ are then used to carry out the forecasting procedures of Section III.

Each of the three estimation procedures to be presented is based only upon realizations of $\nabla^d X(t)$, where $d$, as indicated previously, has been specified. For notational convenience, we therefore define $Y(t) = \nabla^d X(t)$ as the $d$-th backward difference of $X(t)$ and rewrite the model (7) in the form

(45)  $\sum_{j=0}^{p} a_j Y(t-j) = \sum_{j=0}^{q} b_j \varepsilon(t-j).$

Estimation Procedure A ($q = 0$)

The first estimation procedure to be discussed was first derived by Mann and Wald[22] for $q = 0$ in (45). In this case, (45) may be rewritten in the form

(46)  $Y(t) = -a_1 Y(t-1) - a_2 Y(t-2) - \ldots - a_p Y(t-p) + \varepsilon(t).$

But (46) is similar in appearance to equations appearing in ordinary multiple regression theory, and it might therefore be suggested that appropriate estimates of the parameters $a_j$ could be obtained by treating (46) as a classical least-squares regression problem where $Y(t)$ is the dependent variable and $Y(t-1), Y(t-2), \ldots, Y(t-p)$ are the independent variables. Estimates of $a_j$ obtained in this manner do not possess the usual statistical properties of least squares estimates, however, since the random variables $Y(t)$ are not independent. Mann and Wald, however, showed rigorously for the first time that asymptotically (large $N$) the statistical properties of least squares estimates of $a_j$ in (46) are the same as those of least squares regression estimates in multivariate normal systems.

Summarizing Mann and Wald's results, we note first that if $N$ realizations of $X(t)$ are available, then only $(N-d)$ realizations of $Y(t)$ are available, and these are denoted by $Y(1), Y(2), \ldots, Y(N-d)$. Second, estimates $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_p$ of the parameters $a_1, a_2, \ldots, a_p$ are given by those values which minimize the sum of squares

(47)  $S = \sum_{t=p+1}^{N-d} \left[Y(t) + a_1 Y(t-1) + \ldots + a_p Y(t-p)\right]^2.$

Differentiating $S$ with respect to $a_1, a_2, \ldots, a_p$ and equating to zero yields the following matrix formulation of the normal equations:

(48)  $(Y'Y)\hat{a} = Y'y,$

[22] Mann, H. B., and Wald, A., "On the Statistical Treatment of Linear Stochastic Difference Equations," Econometrica, 11 (1943), 173-220.

where

$Y'Y = \begin{bmatrix} \sum Y^2(t-1) & \sum Y(t-1)Y(t-2) & \cdots & \sum Y(t-1)Y(t-p) \\ \sum Y(t-1)Y(t-2) & \sum Y^2(t-2) & \cdots & \sum Y(t-2)Y(t-p) \\ \vdots & \vdots & & \vdots \\ \sum Y(t-1)Y(t-p) & \sum Y(t-2)Y(t-p) & \cdots & \sum Y^2(t-p) \end{bmatrix},$

$Y'y = \begin{bmatrix} -\sum Y(t)Y(t-1) \\ -\sum Y(t)Y(t-2) \\ \vdots \\ -\sum Y(t)Y(t-p) \end{bmatrix}, \qquad \hat{a} = \begin{bmatrix} \hat{a}_1 \\ \hat{a}_2 \\ \vdots \\ \hat{a}_p \end{bmatrix},$

and where all summations are from $t = p+1$ to $t = N-d$. After obtaining the estimates $\hat{a}$ from (48), the estimate of the variance of $\varepsilon(t)$ in (46) is given by

(49)  $\hat{\gamma}_\varepsilon(0) = \frac{S}{N-d-p} = \frac{\sum_{t=p+1}^{N-d} \left[Y(t) + \hat{a}_1 Y(t-1) + \ldots + \hat{a}_p Y(t-p)\right]^2}{N-d-p} = \frac{\sum_{t=p+1}^{N-d} Y^2(t) - \hat{a}'(Y'Y)\hat{a}}{N-d-p}.$

For purposes of testing hypotheses and setting confidence intervals, Mann and Wald show that the asymptotic covariance matrix of $\hat{a}$ is given by

(50)  $\operatorname{Var} \hat{a} = (Y'Y)^{-1}\hat{\gamma}_\varepsilon(0),$

and that $\hat{a}$ has a limiting multivariate normal distribution. That is, if the element in the $i$-th row and $i$-th column of $(Y'Y)^{-1}$ is given by $C_{ii}$, then

$Z = \frac{\hat{a}_i - a_i}{\sqrt{\hat{\gamma}_\varepsilon(0) C_{ii}}}$

is asymptotically distributed as a standard normal random variate.
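Procedure A reduces to solving the normal equations (48). The sketch below is illustrative (`estimate_ar` is a name of ours); it also returns the residual variance (49) and the standard errors implied by (50).

```python
import numpy as np

def estimate_ar(y, p):
    """Least squares fit of (46): normal equations (48), residual
    variance (49), and asymptotic standard errors from (50)."""
    n = len(y)
    # columns Y(t-1),...,Y(t-p) for t = p+1,...,N-d (0-based slices)
    L = np.column_stack([y[p - j: n - j] for j in range(1, p + 1)])
    a_hat = np.linalg.solve(L.T @ L, -L.T @ y[p:])      # (48)
    resid = y[p:] + L @ a_hat
    var_eps = resid @ resid / (n - p)                   # (49)
    cov = np.linalg.inv(L.T @ L) * var_eps              # (50)
    return a_hat, var_eps, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
e = rng.standard_normal(400)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 0.7 * y[t - 1] + e[t]    # true a1 = -0.7 in the sign convention of (46)
a_hat, v, se = estimate_ar(y, p=1)
print(a_hat, se)                    # estimate near -0.7
```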

Estimation Procedure B ($p = 0$)

The second estimation procedure to be discussed was first derived by Durbin[23] for the case $p = 0$ in (45). In this case, (45) may be rewritten in the form

(51)  $Y(t) = \sum_{j=0}^{q} b_j \varepsilon(t-j).$

Durbin noted that since (51) describes a moving average process of finite order $q$, the time series $\{Y(t), t \in T\}$ must also possess an autoregressive representation of infinite order. If this autoregressive representation were of finite order, say $k$, then Estimation Procedure A would be directly applicable. Durbin showed that if $k$ were taken to be sufficiently large, then this approach could be taken. Following Durbin, we first approximate the autoregressive representation of $\{Y(t), t \in T\}$ by

(52)  $\sum_{j=0}^{k} a_j Y(t-j) = \varepsilon(t)$

for large $k$. Let $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_k$ be the least squares estimates of $a_1, a_2, \ldots, a_k$ in (52) given by a direct application of Estimation Procedure A for $p = k$ (cf. (48)). Durbin then showed that these estimates could be used to obtain consistent and asymptotically normal estimates of $b_1, b_2, \ldots, b_q$ in the following manner. Let the $q \times q$ matrix $A$ be defined by

[23] Durbin, J., "Efficient Estimation of Parameters in Moving Average Models," Biometrika, 46 (1959), 306-16.

$A = \begin{bmatrix} \sum_{j=0}^{k} \hat{a}_j^2 & \sum_{j=0}^{k-1} \hat{a}_j \hat{a}_{j+1} & \cdots & \sum_{j=0}^{k-q+1} \hat{a}_j \hat{a}_{j+q-1} \\ \sum_{j=0}^{k-1} \hat{a}_j \hat{a}_{j+1} & \sum_{j=0}^{k} \hat{a}_j^2 & \cdots & \sum_{j=0}^{k-q+2} \hat{a}_j \hat{a}_{j+q-2} \\ \vdots & \vdots & & \vdots \\ \sum_{j=0}^{k-q+1} \hat{a}_j \hat{a}_{j+q-1} & \sum_{j=0}^{k-q+2} \hat{a}_j \hat{a}_{j+q-2} & \cdots & \sum_{j=0}^{k} \hat{a}_j^2 \end{bmatrix},$

with $\hat{a}_0 = 1$. Let

$\hat{b} = \begin{bmatrix} \hat{b}_1 \\ \vdots \\ \hat{b}_q \end{bmatrix} \qquad \text{and} \qquad a^* = -\begin{bmatrix} \sum_{j=0}^{k-1} \hat{a}_j \hat{a}_{j+1} \\ \vdots \\ \sum_{j=0}^{k-q} \hat{a}_j \hat{a}_{j+q} \end{bmatrix}.$

Then $\hat{b}$ is given as the solution to the equation

(53)  $A\hat{b} = a^*.$

The sampling variances of these estimates are difficult to express in general. However, when $q \leq 2$ (as will be the case according to our recommendations made at the beginning of this section), these variances are easily found. Durbin showed that if $q = 1$, the variance of $\hat{b}_1$ is given by

$\operatorname{var} \hat{b}_1 = (1-b_1^2)/(N-d-k).$

If $q = 2$, the variance of $\hat{b}_1$ or $\hat{b}_2$ is given by

$\operatorname{var} \hat{b}_1 = \operatorname{var} \hat{b}_2 = (1-b_2^2)/(N-d-k).$

Hypothesis tests and confidence intervals may therefore be constructed by noting that

$z = \sqrt{N-d-k}\,(\hat{b}_1 - b_1)\big/\sqrt{1-\hat{b}_1^2}$

is a standard normal random variable when $q = 1$, and that

$Z = \sqrt{N-d-k}\,(\hat{b}_j - b_j)\big/\sqrt{1-\hat{b}_2^2}; \quad j = 1 \text{ or } 2,$

is a standard normal random variable when $q = 2$.

Finally, since this estimation procedure utilizes Procedure A explicitly, an estimate of the variance of $\varepsilon(t)$ may be obtained from (49) after setting $p = k$.
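Procedure B in code form, an illustrative sketch of ours that reuses the `estimate_ar` routine sketched after (50): fit a long autoregression, form the matrix $A$ and vector $a^*$ from the fitted coefficients, and solve (53).

```python
import numpy as np

def estimate_ma(y, q, k):
    """Durbin's Procedure B: long AR of order k, then solve (53) A b^ = a*."""
    a_long, var_eps, _ = estimate_ar(y, k)      # Procedure A with p = k
    a = np.concatenate(([1.0], a_long))         # a^_0 = 1, a^_1, ..., a^_k
    A = np.empty((q, q))
    for r in range(1, q + 1):                   # A[r,s] = sum_j a^_j a^_{j+|r-s|}
        for s in range(1, q + 1):
            lag = abs(r - s)
            A[r - 1, s - 1] = a[: k + 1 - lag] @ a[lag:]
    a_star = -np.array([a[: k + 1 - r] @ a[r:] for r in range(1, q + 1)])
    return np.linalg.solve(A, a_star), var_eps  # b^_1,...,b^_q and (49)
```

For $q = 1$ this collapses to $\hat{b}_1 = -\sum_j \hat{a}_j \hat{a}_{j+1} \big/ \sum_j \hat{a}_j^2$.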

Estimation Procedure C (any value for $p$, any value for $q$)

This last estimation procedure to be discussed was first derived by Durbin[24] for general values of $p$ and $q$ in (45). The procedure is iterative in nature, and each iteration involves a number of steps. The first step suggested by Durbin is to approximate the representation for $\{Y(t), t \in T\}$ in (45) by an autoregressive representation

(54)  $\sum_{j=0}^{k} a_j' Y(t-j) = \varepsilon(t)$

for large $k$, and to apply Procedure A to obtain estimates $\hat{a}_1', \hat{a}_2', \ldots, \hat{a}_k'$ of the parameters in (54). The second step taken by Durbin is to calculate

(55)  $e(t) = \sum_{j=0}^{k} \hat{a}_j' Y(t-j),$

which is simply the residual error from the regression of Procedure A. Estimates $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_p, \hat{b}_1, \hat{b}_2, \ldots, \hat{b}_q$ of the unknown parameters in (45) are then obtained by calculating those values which minimize the sum of squares

(56)  $S = \sum_{t=p+k+1}^{N-d} \left[Y(t) + a_1 Y(t-1) + \ldots + a_p Y(t-p) - b_1 e(t-1) - \ldots - b_q e(t-q)\right]^2.$

(The limits on the summation are easily obtained by noting which terms in the summand are available.) The appropriate estimates are given by solving

(57)  $\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$

where

$M_{11} = \begin{bmatrix} \sum Y^2(t-1) & \cdots & \sum Y(t-1)Y(t-p) \\ \vdots & & \vdots \\ \sum Y(t-p)Y(t-1) & \cdots & \sum Y^2(t-p) \end{bmatrix}, \qquad M_{12} = \begin{bmatrix} -\sum e(t-1)Y(t-1) & \cdots & -\sum e(t-q)Y(t-1) \\ \vdots & & \vdots \\ -\sum e(t-1)Y(t-p) & \cdots & -\sum e(t-q)Y(t-p) \end{bmatrix},$

$M_{21} = M_{12}', \qquad M_{22} = \begin{bmatrix} \sum e^2(t-1) & \cdots & \sum e(t-1)e(t-q) \\ \vdots & & \vdots \\ \sum e(t-q)e(t-1) & \cdots & \sum e^2(t-q) \end{bmatrix},$

$x_1 = \begin{bmatrix} -\sum Y(t)Y(t-1) \\ \vdots \\ -\sum Y(t)Y(t-p) \end{bmatrix}, \qquad x_2 = \begin{bmatrix} \sum Y(t)e(t-1) \\ \vdots \\ \sum Y(t)e(t-q) \end{bmatrix}.$

[24] Durbin, J., "The Fitting of Time Series Models," Revue de l'Institut International de Statistique, 28 (1960), 233-43.

Here $\hat{a} = [\hat{a}_1, \ldots, \hat{a}_p]'$ and $\hat{b} = [\hat{b}_1, \ldots, \hat{b}_q]'$, and all summations are from $t = p+k+1$ to $N-d$.

The third step suggested by Durbin is the following. Suppose that the true values of $a_1, a_2, \ldots, a_p$ were known in (45). Let $Z(t) = Y(t) + a_1 Y(t-1) + \ldots + a_p Y(t-p)$ to obtain

(58)  $Z(t) = \varepsilon(t) + b_1 \varepsilon(t-1) + b_2 \varepsilon(t-2) + \ldots + b_q \varepsilon(t-q)$

for $t = p+1, p+2, \ldots, N-d$. Then Procedure B ($p = 0$) can be directly applied to obtain estimates $\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_q$ by simply letting $Y(t) = Z(t)$ in the indicated procedure.

The fourth step suggested by Durbin is the following. Suppose that the true values of $b_1, b_2, \ldots, b_q$ were known in (45). Utilizing the backward shift operator $B$, (45) may be written in the form

(59)  $\sum_{j=0}^{p} a_j B^j Y(t) = \sum_{j=0}^{q} b_j B^j \varepsilon(t).$

But then

(60)  $\sum_{j=0}^{p} a_j B^j W(t) = \varepsilon(t) = \sum_{j=0}^{p} a_j W(t-j)$

when we define

(61)  $W(t) = \sum_{j=0}^{\infty} \gamma_j B^j Y(t)$

such that

(62)  $\sum_{j=0}^{\infty} \gamma_j B^j = \left[\sum_{j=0}^{q} b_j B^j\right]^{-1}.$

Approximating (61) and (62) for large $k$, we write

(63)  $W(t) = \sum_{j=0}^{k} \gamma_j Y(t-j)$

and

(64)  $\left[\sum_{j=0}^{k} \gamma_j B^j\right] \left[\sum_{j=0}^{q} b_j B^j\right] = 1.$

Knowing the values of $b_j$, the values of $\gamma_j$ may easily be found by equating coefficients of like powers of $B$ in (64). For the values of $q$ that are of concern, the following results are easily obtained.

$q = 1$: $\gamma_j = (-b_1)^j; \quad j \geq 0.$
$q = 2$: $\gamma_0 = 1; \quad \gamma_1 = -b_1; \quad \gamma_j = -b_1\gamma_{j-1} - b_2\gamma_{j-2} \text{ for } j \geq 2.$

After making the indicated transformation (63) for large $k$, the parameters $a_1, a_2, \ldots, a_p$ may be estimated by a direct application of Procedure A, substituting $W(t)$ for $Y(t)$ in the normal equations (48). Thus, knowing $b_1, b_2, \ldots, b_q$, we obtain estimates $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_p$.

The above steps are related to each other in the following manner. First, preliminary (and usually quite accurate) estimates of $a_1, a_2, \ldots, a_p, b_1, b_2, \ldots, b_q$ are obtained by carrying out the first two steps described above. Given the resultant estimates $\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_p$, the third step is carried out under the assumption that these estimates are equal to the actual values $a_1, a_2, \ldots, a_p$. This third step, of course, will yield new estimates of the unknown parameters $b_1, b_2, \ldots, b_q$. Given these latter estimates, the fourth step is then carried out under the assumption that these estimates are equal to the actual values $b_1, b_2, \ldots, b_q$. This fourth step, of course, will yield new estimates of the unknown parameters $a_1, a_2, \ldots, a_p$. This sequence of estimation procedures constitutes the first iteration. Additional iterations are then carried out beginning with the third step.
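The inversion (64) is a one-line recursion once written out; the following illustrative routine of ours reproduces the $q = 1$ and $q = 2$ results above and extends them to any $q$.

```python
import numpy as np

def gamma_weights(b, k):
    """gamma_0,...,gamma_k of (62), truncated as in (63)-(64):
    equate coefficients of like powers of B. b = [1, b1, ..., bq]."""
    q = len(b) - 1
    g = np.zeros(k + 1)
    g[0] = 1.0                          # b_0 = 1 forces gamma_0 = 1
    for j in range(1, k + 1):
        g[j] = -sum(b[i] * g[j - i] for i in range(1, min(j, q) + 1))
    return g

print(gamma_weights([1.0, 0.5], k=5))   # q = 1: gamma_j = (-b1)^j
```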

This estimation procedure clearly requires the establishment of a non-arbitrary rule which can be used to terminate the iterative process above. Durbin has indicated that the preliminary estimates obtained with steps one and two above are usually quite good, thereby reducing the number of iterations that might be required if poor "starting values" were utilized. Accordingly, it is recommended that an upper limit of five iterations (including the first pass through the indicated four steps) be established to reduce computational effort. Further, it is recommended that each time the fourth step is carried out, the estimate of the variance of $\varepsilon(t)$ given in (49) be calculated. The iterative procedure should be terminated whenever $\hat{\gamma}_\varepsilon(0)$ changes by less than 10 percent from the previous iteration. This additional termination condition should further reduce the expected number of iterations without adversely affecting any forecasts that are subsequently made.

Finally, the only remaining ambiguities are the number of lags ($k$) to be taken in some of the steps above, where $k$ was simply taken to be large, and the statistical measure of goodness-of-fit that will be used to determine the proper order of the process. It has been shown in recent Monte Carlo studies that when both the sample size ($N-d-k$) and the number of lags ($k$) are assumed to be large, a reasonable rule of thumb is to take $N > 50$ and $k$ to be approximately $N/10$.[25] When these conditions cannot be satisfied, the only recourse is to rely upon the proven robustness of time series forecasts derived from the models discussed in this paper.

The statistical measure of goodness-of-fit that should be used in deciding upon the order $(p,d,q)$ of the process is given by the estimate of the variance of the time series $\{\varepsilon(t), t \in T\}$ in (49). This measure is a logical one to utilize in the context of time series forecasting, since the variance of a one-step ahead forecast will be equal to the variance of the time series $\{\varepsilon(t), t \in T\}$. We would prefer to choose, as a model of the time series of interest, that particular model (and order) which minimizes the residual variance given by (49).

[25] Cogger, K. O., Op. cit.

Moreover, this measure of goodness-of-fit can be shown to be extremely sensitive to misspecification of the correct value of $d$. It is less sensitive to misspecification of $p$ and $q$, but, fortunately, it can be shown that this latter type of misspecification does not usually entail very large increases in the variance of a forecast.[26]

[26] Ibid.
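Pulling the pieces of this section together, the order-determination rule can be sketched as a simple sweep over the nine recommended orders, keeping whichever minimizes the residual variance (49). The dispatcher `fit` below is hypothetical; in practice it would call Procedure A, B, or C as the candidate order dictates.

```python
import numpy as np

def select_order(x, fit):
    """fit(y, p, q) returns the residual variance estimate (49) for the
    differenced series y, using whichever of Procedures A, B, C applies."""
    best_order, best_var = None, np.inf
    for d in (0, 1, 2):
        y = np.diff(np.asarray(x, dtype=float), n=d)    # Y(t) = (1-B)^d X(t)
        for pq in (0, 1, 2):            # the nine recommended orders, p = q
            v = fit(y, pq, pq)
            if v < best_var:
                best_order, best_var = (pq, d, pq), v
    return best_order, best_var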

