August 1976

Division of Research
Graduate School of Business Administration
The University of Michigan

MULTIPERIOD PREDICTIONS FROM AN AUTOREGRESSIVE MODEL
USING EMPIRICAL BAYES METHODS

Working Paper No. 135

by

R. W. Andrews
The University of Michigan

FOR DISCUSSION PURPOSES ONLY

None of this material is to be quoted or reproduced without the express permission of the Division of Research.

ABSTRACT

The ability of an estimated dynamic model to produce accurate predictions depends on the method used to estimate the parameters of the model. This report investigates multiperiod predictions for a first-order autoregressive process. A Bayes predictor is derived using a bivariate normal prior distribution over the parameters of the model. The loss structure depends on the prediction and on the variable to be predicted; the problem is therefore one of statistical decision theory. The form of the Bayes predictor lends itself to Empirical Bayes procedures. Two Empirical Bayes procedures are presented which bypass the assessment of a prior distribution. All three procedures are developed so that the estimates of the model parameters depend on the number of steps ahead for which the forecast is desired.

Introduction

Since economic time series data are often analyzed using autoregressive models, the methods employed to estimate the parameters of these models are of paramount importance. Using the estimated model to predict is often the purpose of observing time series data. The predictions could be required for one period in the future or for many periods ahead. Estimated models which exhibit certain properties when predicting one period ahead may not exhibit these same properties for multiperiod predictions. Therefore, when estimating the model, it is reasonable to consider the number of periods ahead for which the forecast is desired. Consequently, this is a statistical decision theory problem rather than just an exercise in fitting the data to a model.

The dynamic economic structure considered in this report is limited to a first-order autoregressive model. The problem of finding multiperiod predictions from an autoregressive process has been investigated in Klein [3], Chow [1], and Lahiri [4]. The estimator suggested by Klein [3], and referred to by Fair [2] as the DYN estimator, attempts to account for the unknown values of all lagged endogenous variables in multiperiod predictions. A Bayesian analysis of the problem of estimating an autoregressive model has been reported by Chow [1], who gives a Bayesian solution for the first-order autoregressive process. His analysis assumes that the prior distribution is diffuse. In section three, a Bayesian solution with respect to a normal prior distribution is reported. The loss function is the square of the specific multiperiod prediction error. The use of this loss structure results in an estimator that depends on the number of periods ahead for which the prediction is desired.

In this report, Empirical Bayes procedures are utilized in estimating a first-order autoregressive process. The Empirical Bayes methods of estimating the parameters are developed to depend on the number of periods ahead for which the forecast is required. These procedures employ the same criterion as Chow [1]. However, the Empirical Bayes procedures do not require the assumption of a specific prior distribution on the parameters. The Empirical Bayes estimators presented below use the form of the derived Bayes estimator as given in section three. The artificial assumption of a known prior distribution is circumvented, however. In section three the Bayes estimator is given as a function of the prior distribution. From this form two Empirical Bayes predictors are presented in section four. The concluding section comments on the applicability of these predictors and suggests essential further research.

The Model

A first-order autoregressive model can be represented as follows:

(1)    $y_t = a y_{t-1} + c + u_t$;    for $t = 1, 2, \ldots$,

for which $E[u_t] = 0$ and $E[u_t u_s] = h^{-1}\delta_{ts}$. Throughout this development it is assumed that the initial value of the series, $y_0$, is fixed and known. In addition, the error term is assumed to have a normal distribution with known precision $h = 1/\sigma^2$.
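As a brief numerical illustration of model (1), the following Python sketch simulates a series of length n from the first-order autoregressive process; the parameter values in the example (a = 0.6, c = 1.0, h = 4) are assumptions chosen only for illustration.

    import numpy as np

    def simulate_ar1(n, a, c, h, y0=0.0, seed=0):
        """Simulate y_1, ..., y_n from y_t = a*y_{t-1} + c + u_t,
        where u_t ~ N(0, 1/h) and y_0 is fixed and known."""
        rng = np.random.default_rng(seed)
        y = np.empty(n + 1)
        y[0] = y0
        for t in range(1, n + 1):
            y[t] = a * y[t - 1] + c + rng.normal(scale=np.sqrt(1.0 / h))
        return y  # y[0] is the fixed initial value

    # Example: a stationary series with a = 0.6, c = 1.0, h = 4 (sigma^2 = 0.25)
    y = simulate_ar1(n=50, a=0.6, c=1.0, h=4.0, y0=0.0)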

The unknown parameters are therefore the coefficients a and c. It is not the purpose of this report to investigate procedures for directly estimating a and c. The problem of interest will be predicting $y_{n+k}$ based on the observations $y_1, y_2, \ldots, y_n$. In the next section, Bayes predictors of $y_{n+2}$ and $y_{n+3}$ are reported. The following section gives two different forms of the Empirical Bayes predictor of $y_{n+k}$.

Since the Bayesian framework will be utilized for both the Bayes and Empirical Bayes predictors, an assumption must be made concerning the realization of the random variables a and c. This assumption may be made in two different ways, and augmenting equation (1) with it results in two different models. The assumption, denoted by either A or B, concerns the function of the prior distribution on a and c.

Assumption A: For a specific time series $y_0, y_1, y_2, \ldots, y_n, \ldots$, (a,c) is a realization from a bivariate prior distribution and (a,c) is fixed throughout the time series. That is, the value of (a,c) is the same for each time $t = 1, 2, \ldots, n, \ldots$.

Assumption B: For a specific time series $y_0, y_1, y_2, \ldots, y_n, \ldots$, and for each time $t = 1, 2, \ldots, n, \ldots$, a realization from the prior distribution of (a,c) is experienced. That is, for a given time series, the value of (a,c) at time t might be different from the value of (a,c) at time $s \neq t$, even though both are drawn from the same distribution.

In essence, under Assumption A the values of (a,c) are fixed from one step to the next, whereas under Assumption B the values of (a,c) change from step to step. Throughout this report it is assumed that the prior distributions of a and c are independent.

(Comments as to how the predictors developed here should perform under Assumption A or B will be given in the concluding remarks.) All classical developments of this problem, such as Klein [3], assume that a and c are fixed throughout the time series. This is consistent with Assumption A, except that classical theory does not assume a prior distribution. Under Assumption B, equation (1) should be written:

(2)    $y_t = a_t y_{t-1} + c_t + u_t$;    for $t = 1, 2, \ldots$.

The Bayes Predictor

This section will develop the form of the Bayes estimator as well as the Bayes predictors of $y_{n+2}$ and $y_{n+3}$ for an assumed normal prior. The unknown parameters of the model as given by equation (1) are (a,c). Let p(a,c) denote the prior distribution for these parameters. At the nth stage we have observed $y_n = (y_1, y_2, \ldots, y_n)'$ and we want to predict $y_{n+k}$. Since the predictor or forecast is a function of $y_n$, we write $\hat{y}_{n+k} = \hat{y}_{n+k}(y_1, y_2, \ldots, y_n)$ as the predictor of $y_{n+k}$. To find the form of $\hat{y}_{n+k}$, the criterion is to minimize the Bayes risk given by

(3)    $R(p) = E_p E\{(\hat{y}_{n+k} - y_{n+k})^2\}$.

This implicitly assumes a squared error loss on the value of the y we wish to predict. This also will be the loss function assumed in the development of the Empirical Bayes procedures. The interior expected value of equation (3) is taken with respect to the conditional distribution of $y_1, y_2, \ldots, y_n$ and $y_{n+k}$ given a and c.

The other expectation is taken with respect to the prior density p(a,c), as noted. Using equation (1) repeatedly we may write

$y_{n+k} = a^k y_n + (a^{k-1}c + a^{k-2}c + \cdots + c) + (a^{k-1}u_{n+1} + a^{k-2}u_{n+2} + \cdots + u_{n+k})$.

Then,

(4)    $E\{(\hat{y}_{n+k} - y_{n+k})^2\} = E\{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c) - (a^{k-1}u_{n+1} + a^{k-2}u_{n+2} + \cdots + u_{n+k})\}^2$
$= E\{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2 + E\{a^{k-1}u_{n+1} + a^{k-2}u_{n+2} + \cdots + u_{n+k}\}^2$.

The last equality holds since the $u_t$ with $t > n$ are uncorrelated with $y_n$. Since we want to find the $\hat{y}_{n+k}$ which minimizes equation (3), and since the second term on the right-hand side of equation (4) does not depend on $\hat{y}_{n+k}$, we need only consider the first term. Taking its expectation with respect to the prior gives

(5)    $E_p E\{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2 = \iiint \{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2\, p(y_n \mid a,c)\, p(a,c)\, dy_n\, da\, dc$,

where $p(y_n \mid a,c)$ is the conditional distribution of $y_n$ given (a,c). The integrals written here, as throughout this entire report, are taken over the entire appropriate sample space.
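As a small elaboration, the second term on the right-hand side of equation (4) is the part of the k-step risk that no predictor can remove; for fixed (a,c), and using the error assumptions of model (1), it reduces to a finite geometric sum:

$E\{a^{k-1}u_{n+1} + a^{k-2}u_{n+2} + \cdots + u_{n+k}\}^2 = h^{-1}\sum_{i=0}^{k-1} a^{2i} = h^{-1}\,\dfrac{1 - a^{2k}}{1 - a^{2}}$  (for $a^2 \neq 1$),

since the $u_t$ are uncorrelated with common variance $h^{-1}$.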

Since $p(y_n \mid a,c)\, p(a,c) = p(a,c \mid y_n)\, p(y_n)$, we may rewrite equation (5) as

(6)    $E_p E\{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2 = \iiint \{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2\, p(a,c \mid y_n)\, p(y_n)\, dy_n\, da\, dc$.

For an observed $y_n = (y_1, y_2, \ldots, y_n)'$, equation (6) is minimized if we minimize the integrand

$\iint \{\hat{y}_{n+k} - a^k y_n - (a^{k-1}c + a^{k-2}c + \cdots + c)\}^2\, p(a,c \mid y_n)\, da\, dc$

with respect to $\hat{y}_{n+k}$. Setting the derivative equal to zero we obtain

(7)    $\hat{y}_{n+k} = \iint \{a^k y_n + (a^{k-1}c + a^{k-2}c + \cdots + c)\}\, p(a,c \mid y_n)\, da\, dc$.

We may write

$p(a,c \mid y_n) = \dfrac{p(y_n \mid a,c)\, p(a,c)}{\iint p(y_n \mid a,c)\, p(a,c)\, da\, dc}$,

and therefore the form of the Bayes estimator may be written

(8)    $\hat{y}_{n+k} = \dfrac{\iint \{a^k y_n + a^{k-1}c + a^{k-2}c + \cdots + c\}\, p(y_n \mid a,c)\, p(a,c)\, da\, dc}{\iint p(y_n \mid a,c)\, p(a,c)\, da\, dc}$.

Equation (7) is the form of the Bayes estimator from which the Bayes predictors for two and three steps ahead will be developed. Equation (8) is the form which will motivate the Empirical Bayes estimator of $y_{n+k}$ given in the next section.
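To make equation (8) concrete, the following Python sketch approximates the Bayes predictor by evaluating the numerator and denominator of (8) on a grid over (a,c), using the normal likelihood of model (1) and whatever prior density the user supplies. The function name, grid ranges, and the prior in the example are assumptions for illustration only.

    import numpy as np

    def bayes_predictor_grid(y, y0, k, h, prior_pdf, a_grid, c_grid):
        """Approximate equation (8): a ratio of integrals over (a, c),
        replaced here by sums over a rectangular grid."""
        yn = y[-1]
        lags = np.concatenate(([y0], y[:-1]))           # y_0, ..., y_{n-1}
        num = den = 0.0
        for a in a_grid:
            for c in c_grid:
                resid = y - a * lags - c                # y_i - a*y_{i-1} - c
                lik = np.exp(-0.5 * h * np.sum(resid ** 2))   # p(y_n | a, c), up to a constant
                w = lik * prior_pdf(a, c)
                forecast = a ** k * yn + c * sum(a ** m for m in range(k))
                num += forecast * w
                den += w
        return num / den

    # Example with an assumed independent normal prior on (a, c);
    # normalizing constants cancel in the ratio.
    prior = lambda a, c: np.exp(-0.5 * ((a - 0.5) ** 2 / 0.1 + (c - 1.0) ** 2 / 0.5))
    # yhat = bayes_predictor_grid(y[1:], y[0], k=2, h=4.0, prior_pdf=prior,
    #                             a_grid=np.linspace(-0.99, 0.99, 81),
    #                             c_grid=np.linspace(-2.0, 4.0, 81))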

If we assume that (a,c) is bivariate normal with mean vector $(\mu_1, \mu_2)$ and variance-covariance matrix

$\begin{pmatrix} 1/(f_1 h) & 0 \\ 0 & 1/(f_2 h) \end{pmatrix}$,

then

(9)    $p(a,c) \propto \exp\left\{ -\dfrac{h}{2} \begin{pmatrix} a-\mu_1 \\ c-\mu_2 \end{pmatrix}' \begin{pmatrix} f_1 & 0 \\ 0 & f_2 \end{pmatrix} \begin{pmatrix} a-\mu_1 \\ c-\mu_2 \end{pmatrix} \right\}$.

The conditional distribution of the data may be represented by

(10)    $p(y_n \mid a,c) \propto \exp\left\{ -\dfrac{h}{2} \sum_{i=1}^{n} (y_i - a y_{i-1} - c)^2 \right\}$.

Using equations (9) and (10) we derive

$p(y_n \mid a,c)\, p(a,c) \propto \exp\left\{ -\dfrac{h}{2} \left[ \begin{pmatrix} a-\mu_1 \\ c-\mu_2 \end{pmatrix}' \begin{pmatrix} f_1 & 0 \\ 0 & f_2 \end{pmatrix} \begin{pmatrix} a-\mu_1 \\ c-\mu_2 \end{pmatrix} + \sum_{i=1}^{n} (y_i - a y_{i-1} - c)^2 \right] \right\}$.

In order to simplify this conditional density we will use the following identity, as given in Raiffa and Schlaifer [7, p. 337]:

$(y - X\beta)'(y - X\beta) + (\beta - b)'V(\beta - b) = (\beta - [V + X'X]^{-1}[Vb + X'y])'(V + X'X)(\beta - [V + X'X]^{-1}[Vb + X'y]) + b'Vb + y'y - (Vb + X'y)'(V + X'X)^{-1}(Vb + X'y)$.

Notice that $\beta$ is contained only in the first term of the right-hand side of this identity.

If we set

$\beta = \begin{pmatrix} a \\ c \end{pmatrix}$,  $b = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$,  $V = \begin{pmatrix} f_1 & 0 \\ 0 & f_2 \end{pmatrix}$,  $y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$,  and  $X = \begin{pmatrix} y_0 & 1 \\ y_1 & 1 \\ \vdots & \vdots \\ y_{n-1} & 1 \end{pmatrix}$,

then we can find the posterior distribution of (a,c) given $y_n$; it is normal with the parameters given below. All summations are from $i = 1$ to $n$. Set

$D^{-1} = (n + f_2)\left(f_1 + \sum y_{i-1}^2\right) - \left(\sum y_{i-1}\right)^2$.

The posterior mean of a is

$\mu_a = D\left\{ (n + f_2)\left(\mu_1 f_1 + \sum y_i y_{i-1}\right) - \left(\mu_2 f_2 + \sum y_i\right)\sum y_{i-1} \right\}$;

the posterior mean of c is

$\mu_c = D\left\{ \left(f_1 + \sum y_{i-1}^2\right)\left(\mu_2 f_2 + \sum y_i\right) - \left(\mu_1 f_1 + \sum y_i y_{i-1}\right)\sum y_{i-1} \right\}$;

the posterior variance of a is

$\sigma_a^2 = D h^{-1}(n + f_2)$;

the posterior variance of c is

$\sigma_c^2 = D h^{-1}\left(f_1 + \sum y_{i-1}^2\right)$;

and the posterior correlation coefficient of a and c is

$\rho_{ac} = -\sum y_{i-1} \bigm/ \left[(n + f_2)\left(f_1 + \sum y_{i-1}^2\right)\right]^{1/2}$.

Using these posterior parameters and equation (7) we can find the Bayes predictors for two and three steps ahead. In order to accomplish this, we need the following statements: if (X,Y) is bivariate normal with parameters $(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, then

$E[Y] = \mu_y$,
$E[XY] = \rho\sigma_x\sigma_y + \mu_x\mu_y$,
$E[X^2 Y] = 2\rho\mu_x\sigma_x\sigma_y + \mu_y\sigma_x^2 + \mu_y\mu_x^2$,
$E[X^3 Y] = 3\rho\mu_x^2\sigma_x\sigma_y + 3\rho\sigma_x^3\sigma_y + 3\mu_x\mu_y\sigma_x^2 + \mu_y\mu_x^3$.

Using these identities in equation (7) we find that

$\hat{y}_{n+2} = y_n(\sigma_a^2 + \mu_a^2) + \rho_{ac}\sigma_a\sigma_c + \mu_a\mu_c + \mu_c$

and

$\hat{y}_{n+3} = y_n(3\mu_a\sigma_a^2 + \mu_a^3) + \mu_c\sigma_a^2 + \mu_a^2\mu_c + 2\rho_{ac}\sigma_a\sigma_c\mu_a + \rho_{ac}\sigma_a\sigma_c + \mu_a\mu_c + \mu_c$.
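The two-step and three-step Bayes predictors above can be computed directly from the data and the prior constants. The following Python sketch is an illustration only; the function and variable names are chosen here and are not taken from the paper.

    import numpy as np

    def bayes_predict_2_3(y, y0, h, f1, f2, mu1, mu2):
        """Posterior parameters of (a, c) under the normal prior of equation (9),
        followed by the Bayes predictors for two and three steps ahead."""
        lags = np.concatenate(([y0], y[:-1]))        # y_0, ..., y_{n-1}
        n = len(y)
        S_lag, S_lag2 = lags.sum(), (lags ** 2).sum()
        S_y, S_cross = y.sum(), (y * lags).sum()

        D = 1.0 / ((n + f2) * (f1 + S_lag2) - S_lag ** 2)
        mu_a = D * ((n + f2) * (mu1 * f1 + S_cross) - (mu2 * f2 + S_y) * S_lag)
        mu_c = D * ((f1 + S_lag2) * (mu2 * f2 + S_y) - (mu1 * f1 + S_cross) * S_lag)
        var_a = D * (n + f2) / h
        var_c = D * (f1 + S_lag2) / h
        rho = -S_lag / np.sqrt((n + f2) * (f1 + S_lag2))

        sa, sc = np.sqrt(var_a), np.sqrt(var_c)
        yn = y[-1]
        yhat2 = yn * (var_a + mu_a ** 2) + rho * sa * sc + mu_a * mu_c + mu_c
        yhat3 = (yn * (3 * mu_a * var_a + mu_a ** 3)
                 + mu_c * var_a + mu_a ** 2 * mu_c + 2 * rho * sa * sc * mu_a
                 + rho * sa * sc + mu_a * mu_c + mu_c)
        return yhat2, yhat3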

Empirical Bayes Predictors

Using the form of the Bayes predictor given in equation (8), Empirical Bayes procedures will now be investigated. In order to demonstrate the Empirical Bayes procedures as described in Lemon and Krutchkoff [5], we will display the data in the following form:

    experience:   0     1     2     3     4    ...    j    ...    n
    data:         y_0   y_0   y_0   y_0   y_0         y_0         y_0
                        y_1   y_1   y_1   y_1         y_1         y_1
                              y_2   y_2   y_2         y_2         y_2
                                    y_3   y_3         y_3         y_3
                                          y_4         y_4         y_4
                                                        :           :
                                                       y_j         y_j
                                                                    :
                                                                   y_n
    estimates:                            â_4, ĉ_4,   â_j, ĉ_j,   â_n, ĉ_n,
                                          σ̂²_4        σ̂²_j        σ̂²_n

If the problem is considered as arrayed above, every new observation enables us to consider a new experience. For example, after j observations we are at the jth experience and have the data $y_1, y_2, \ldots, y_j$ available. Consider the situation in which we have observed $y_n = (y_1, y_2, \ldots, y_n)'$ and we wish to predict $y_{n+k}$. We want to use the form of the Bayes predictor, equation (8); however, we do not have, nor do we want to assess, a prior distribution. For each experience after the third stage we have been able to estimate a, c, and $\sigma^2 = h^{-1}$. These estimates could have been developed in numerous ways, but throughout this report we will use the ordinary least squares estimates and designate them by $\hat{a}_j$, $\hat{c}_j$, and $\hat{\sigma}_j^2$, for $j = 4, 5, \ldots, n$. If we give equal weight to each of these estimates of the parameters, i.e., let $p(a,c) = 1/(n-3)$ for $(a,c) = (\hat{a}_j, \hat{c}_j)$, then we can rewrite equation (8) as

(11)    $\hat{y}^{E}_{n+k} = \dfrac{\sum_j \{\hat{a}_j^k y_n + \hat{a}_j^{k-1}\hat{c}_j + \hat{a}_j^{k-2}\hat{c}_j + \cdots + \hat{c}_j\}\, p(y_n \mid \hat{a}_j, \hat{c}_j)}{\sum_j p(y_n \mid \hat{a}_j, \hat{c}_j)}$,

where the summations are taken from $j = 4$ to $n$. The quantity $\hat{a}_j^k y_n + (\hat{a}_j^{k-1}\hat{c}_j + \hat{a}_j^{k-2}\hat{c}_j + \cdots + \hat{c}_j)$ is a prediction of $y_{n+k}$ based on the estimates obtained at experience j. Therefore, equation (11) is a weighted average of estimates of $y_{n+k}$. The weighting function is given by

$p(y_n \mid \hat{a}_j, \hat{c}_j) \propto \dfrac{1}{\hat{\sigma}_j^{\,n}} \exp\left\{ -\dfrac{1}{2\hat{\sigma}_j^2} \sum (y_i - \hat{a}_j y_{i-1} - \hat{c}_j)^2 \right\}$,

with the summation taken from $i = 1$ to $n$. The weight is proportional to the conditional likelihood of the present experience given the estimates of the parameters at the experience to be weighted. In essence, we are weighting the prediction developed from each experience by how well the estimated parameters explain the current data, $y_n$. Notice that in using this technique we may, without loss of generality, assume that (a,c) is changing from experience to experience. Also, in using the Empirical Bayes procedure we have eliminated the assumption of a known conditional precision, h. Using ordinary least squares methods we are able to estimate $\sigma^2 = 1/h$. At the jth experience, the ordinary least squares estimates are given by

$(\hat{a}_j, \hat{c}_j)' = (X_j' X_j)^{-1} X_j' y_j$,

where

$X_j = \begin{pmatrix} y_0 & 1 \\ y_1 & 1 \\ \vdots & \vdots \\ y_{j-1} & 1 \end{pmatrix}$,  $y_j = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_j \end{pmatrix}$,  and  $\hat{\sigma}_j^2 = \dfrac{\sum (y_i - \hat{c}_j - \hat{a}_j y_{i-1})^2}{j - 2}$,

with the summation taken from $i = 1$ to $j$.
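As a sketch of this first Empirical Bayes procedure (equation (11)), the following Python fragment fits ordinary least squares at each experience j = 4, ..., n on the expanding data set and forms the likelihood-weighted average of the k-step predictions. It is an illustration under the assumptions stated above, not code from the paper.

    import numpy as np

    def eb_predictor_expanding(y, y0, k):
        """Empirical Bayes predictor of y_{n+k} in the spirit of equation (11):
        OLS estimates (a_j, c_j, sigma_j^2) from experiences j = 4..n,
        weighted by the conditional likelihood of the full data y_1..y_n."""
        full = np.concatenate(([y0], y))      # y_0, y_1, ..., y_n
        n = len(y)
        yn = y[-1]
        num = den = 0.0
        for j in range(4, n + 1):
            X = np.column_stack((full[:j], np.ones(j)))     # rows (y_{i-1}, 1), i = 1..j
            yj = full[1:j + 1]
            a_j, c_j = np.linalg.lstsq(X, yj, rcond=None)[0]
            sigma2_j = np.sum((yj - a_j * full[:j] - c_j) ** 2) / (j - 2)
            # k-step prediction based on experience j
            forecast = a_j ** k * yn + c_j * sum(a_j ** m for m in range(k))
            # weight: likelihood of the whole observed series under (a_j, c_j, sigma2_j)
            resid = full[1:] - a_j * full[:-1] - c_j
            w = np.exp(-n * np.log(np.sqrt(sigma2_j)) - np.sum(resid ** 2) / (2 * sigma2_j))
            num += forecast * w
            den += w
        return num / den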

The decision to start after the third observation insures that we have enough degrees of freedom to estimate the conditional variance. The weights assigned to the early experiences are usually, but not always, smaller than the weights assigned to the later experiences.

The properties of Empirical Bayes procedures such as equation (11) have always been investigated using Monte Carlo experiments. The results of these experiments seem to indicate that the procedure is asymptotically optimal [5]. That is, the risk of using the proposed estimator approaches the risk of the Bayes estimator as the number of experiences approaches infinity. Only for some very simple examples have Empirical Bayes procedures actually been shown to be asymptotically optimal using analytic, rather than Monte Carlo, techniques. In all cases heretofore considered, an assumption of the model was that the conditional data at experience i were independent of the conditional data at experience $j \neq i$. This is certainly not the case in our development, since the data at experience j contain all the observations from experience j-1. In order to adjust for the dependence between the experiences, a fixed number of observations could be assigned to each experience.

For example, with quarterly data, four observations per experience might be chosen. That situation could be represented as follows:

    experience:   0      1                 2                 ...   j                                   ...   n
    data:         y_0    y_1 y_2 y_3 y_4   y_5 y_6 y_7 y_8         y_{4j-3} y_{4j-2} y_{4j-1} y_{4j}         y_{4n-3} y_{4n-2} y_{4n-1} y_{4n}
    estimates:           â_1, ĉ_1, σ̂²_1    â_2, ĉ_2, σ̂²_2          â_j, ĉ_j, σ̂²_j                            â_n, ĉ_n, σ̂²_n

Since we are assuming the jth experience is an autoregressive model as given by equation (1), it is still not independent of the (j-1)th experience. For example, $y_5$ is not independent of $y_4$; however, the conditional distributions of nonadjacent experiences are independent. In this case the conditioning occurs with respect to the last observation of the preceding experience. The ordinary least squares estimates of a, c, and $\sigma^2$ are given by

$(\hat{a}_j, \hat{c}_j)' = (X_j' X_j)^{-1} X_j' y_j$,

where

$X_j = \begin{pmatrix} y_{4j-4} & 1 \\ y_{4j-3} & 1 \\ y_{4j-2} & 1 \\ y_{4j-1} & 1 \end{pmatrix}$  and  $y_j = \begin{pmatrix} y_{4j-3} \\ y_{4j-2} \\ y_{4j-1} \\ y_{4j} \end{pmatrix}$.

Therefore the Empirical Bayes predictor of $y_{n+k}$, based on the form of the Bayes predictor in equation (8), is given by

(12)    $\hat{y}^{*}_{n+k} = \dfrac{\sum_j \{\hat{a}_j^k y_{4n} + \hat{a}_j^{k-1}\hat{c}_j + \hat{a}_j^{k-2}\hat{c}_j + \cdots + \hat{c}_j\}\, p(y_n^* \mid \hat{a}_j, \hat{c}_j)}{\sum_j p(y_n^* \mid \hat{a}_j, \hat{c}_j)}$,

where $y_n^* = (y_{4n-3}, y_{4n-2}, y_{4n-1}, y_{4n})'$ and

$p(y_n^* \mid \hat{a}_j, \hat{c}_j) \propto \dfrac{1}{\hat{\sigma}_j^{4}} \exp\left\{ -\dfrac{1}{2\hat{\sigma}_j^2} \sum (y_{4n-i} - \hat{a}_j y_{4n-i-1} - \hat{c}_j)^2 \right\}$,

with the summation taken from $i = 0$ to $3$.
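A sketch of this second procedure (equation (12)), again only an illustration under the assumptions above: each experience consists of four consecutive observations, OLS is fit within each block, and the weights use only the likelihood of the most recent block.

    import numpy as np

    def eb_predictor_blocked(y, y0, k, block=4):
        """Empirical Bayes predictor in the spirit of equation (12):
        OLS within each block of `block` observations, weighted by the
        likelihood of the final block."""
        full = np.concatenate(([y0], y))            # y_0, y_1, ..., y_{4n}
        n_blocks = len(y) // block
        last = full[-(block + 1):]                  # final block plus its preceding observation
        y_last_obs = y[-1]
        num = den = 0.0
        for j in range(1, n_blocks + 1):
            seg = full[(j - 1) * block: j * block + 1]      # y_{4j-4}, ..., y_{4j}
            X = np.column_stack((seg[:-1], np.ones(block)))
            a_j, c_j = np.linalg.lstsq(X, seg[1:], rcond=None)[0]
            sigma2_j = np.sum((seg[1:] - a_j * seg[:-1] - c_j) ** 2) / (block - 2)
            forecast = a_j ** k * y_last_obs + c_j * sum(a_j ** m for m in range(k))
            resid = last[1:] - a_j * last[:-1] - c_j        # residuals over the final block
            w = np.exp(-block * np.log(np.sqrt(sigma2_j)) - np.sum(resid ** 2) / (2 * sigma2_j))
            num += forecast * w
            den += w
        return num / den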

Concluding Remarks

The Bayes predictor, as given by equation (7), implicitly incorporates Assumption A into its model. Since the Bayes predictor uses the entire series $y_n$ as a single entity, it assumes that (a,c) is a fixed realization from p(a,c) throughout the series. The Bayes predictor also has the difficult requirement of assessing a prior distribution on (a,c). Even though the mean vector of (a,c) might be easy to assess, it is unlikely that the variance of (a,c) could be assessed accurately.

The Empirical Bayes predictor $\hat{y}^{E}_{n+k}$ permits Assumption B to be part of the model. However, the advantage of a more flexible model is probably offset by the disadvantage of strong dependence between experiences. The Empirical Bayes predictor $\hat{y}^{*}_{n+k}$ implicitly allows the assumption that (a,c) changes after every fourth observation. For quarterly data, this would be consistent with an annual revision of model parameters. Also, this Empirical Bayes procedure has the advantage of conditional independence of experiences, as explained in the last section.

In order to test the worth of these proposed methods of prediction, a Monte Carlo study similar to those reported by Orcutt and Winokur [6] and Lahiri [4] should be executed. The study should include observations generated from both models (1) and (2).
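A minimal sketch of such a Monte Carlo comparison, assuming the helper functions from the earlier sketches (simulate_ar1, bayes_predict_2_3, and eb_predictor_expanding) are in scope and with parameter and prior values chosen only for illustration:

    import numpy as np

    def monte_carlo_compare(reps=500, n=48, k=2, a=0.6, c=1.0, h=4.0, seed=1):
        """Average squared k-step prediction error of the Bayes predictor
        and the first Empirical Bayes predictor on data from model (1)."""
        rng = np.random.default_rng(seed)
        se_bayes, se_eb = [], []
        for _ in range(reps):
            y = simulate_ar1(n + k, a, c, h, y0=0.0, seed=rng.integers(1 << 31))
            obs, future = y[1:n + 1], y[n + k]
            yhat2, yhat3 = bayes_predict_2_3(obs, y[0], h, f1=1.0, f2=1.0, mu1=0.5, mu2=1.0)
            yhat_bayes = yhat2 if k == 2 else yhat3
            yhat_eb = eb_predictor_expanding(obs, y[0], k)
            se_bayes.append((yhat_bayes - future) ** 2)
            se_eb.append((yhat_eb - future) ** 2)
        return np.mean(se_bayes), np.mean(se_eb)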

REFERENCES

[1] Chow, G. C. "Multiperiod Predictions from Stochastic Difference Equations by Bayesian Methods." Econometrica 41, No. 1 (Jan. 1973): 109-18.

[2] Fair, R. C. "A Comparison of Alternative Estimators of Macroeconomic Models." International Economic Review 14, No. 2 (June 1973): 261-77.

[3] Klein, L. R. "An Essay on the Theory of Economic Prediction." The 1968 Yrjö Jahnsson Lectures. Chicago, Ill.: Markham Publishing Company, 1971.

[4] Lahiri, K. "Multiperiod Predictions in Dynamic Models." International Economic Review 16, No. 3 (Oct. 1975): 699-711.

[5] Lemon, G. H., and Krutchkoff, R. G. "An Empirical Bayes Smoothing Technique." Biometrika 56, No. 2 (1969): 361-65.

[6] Orcutt, G. H., and Winokur, H. S., Jr. "First Order Autoregression: Inference, Estimation, and Prediction." Econometrica 37, No. 1 (Jan. 1969): 1-14.

[7] Raiffa, H., and Schlaifer, R. Applied Statistical Decision Theory. Boston, Mass.: Harvard University, 1961.