Division of Research
Graduate School of Business Administration
The University of Michigan
December 1973

Computer Simulation in Varying Parameter Regression Models

Working Paper No. 88

by W. Allen Spivey and William J. Wrobleski
The University of Michigan

© The University of Michigan

FOR DISCUSSION PURPOSES ONLY. None of this material is to be quoted or reproduced without the express permission of the Division of Research.

Introduction

This paper is concerned with the p-by-1 time series $y(t)$ which, for time periods $t = 1, \ldots, n$, has the linear regression structure

(1)  $y(t) = X(t)\beta(t) + e(t)$,

where $X(t)$ is a known p-by-r matrix of fixed regressors, and $e(t)$ is a p-by-1 vector of equation errors having mean 0 and known p-by-p variance-covariance matrix $V_e(t)$. The equation errors $e(1), \ldots, e(n)$ are serially uncorrelated. In conventional regression theory the r-by-1 vector of regression coefficients $\beta(t)$ is regarded as a fixed, nonrandom vector. In this paper, however, we investigate varying parameter regression models in which $\beta(t)$ is taken to be a random coefficient vector formed for time periods $t = 1, \ldots, n$ from the transition equation

(2)  $\beta(t) = T(t)\beta(t-1) + u(t)$,

where $T(t)$ is a known coefficient updating matrix of size r-by-r, and the r-by-1 random coefficient error vector $u(t)$ has mean 0 and known r-by-r variance-covariance matrix $V_u(t)$. The coefficient errors $u(1), \ldots, u(n)$ are also serially uncorrelated. In addition, the

random coefficient errors $u(t)$ and the random equation errors $e(t)$ are mutually uncorrelated both within and between time periods. Finally, for the first time period ($t = 1$) we have

(3)  $\beta(1) = T(1)\beta(0) + u(1)$,

where the r-by-1 vector $\beta(0)$, called the starting value, is taken to be a known, fixed, nonrandom vector.

This class of models was first introduced by Kalman [4] and by Kalman and Bucy [5], and it forms a part of the theory of optimal control. It has been applied to many engineering problems, and there is an extensive literature on the theory and on these applications (good survey papers are [2] and [9]). Application of the Kalman-Bucy model requires estimation of the varying regression coefficients $\beta(t)$ from the observed data (the $y(t)$ and $X(t)$). Using an approach involving wide sense conditional probability distributions and expectations, Kalman obtained recursive updating equations as orthogonal projections on linear manifolds for the minimum mean squared linear estimators $\hat{\beta}(t)$ of $\beta(t)$ based on the use of all the data through time period $t$. The estimators $\hat{\beta}(t)$ are frequently called filtered estimators. These equations can be written as follows:

(4)  $\hat{\beta}(t) = \hat{\beta}(t|t-1) + S(t|t-1)X'(t)D^{-1}(t)\,[\,y(t) - X(t)\hat{\beta}(t|t-1)\,]$

where

(5)  $\hat{\beta}(1|0) = T(1)\beta(0)$
(6)  $S(1|0) = V_u(1)$
(7)  $\hat{\beta}(t|t-1) = T(t)\hat{\beta}(t-1)$,  $t = 2, 3, \ldots, n$
(8)  $S(t|t-1) = T(t)S(t-1)T'(t) + V_u(t)$,  $t = 2, 3, \ldots, n$
(9)  $S(t-1) = S(t-1|t-2) - S(t-1|t-2)X'(t-1)D^{-1}(t-1)X(t-1)S(t-1|t-2)$,  $t = 2, 3, \ldots, n$
(10)  $D(t) = V_e(t) + X(t)S(t|t-1)X'(t)$,  $t = 1, 2, \ldots, n$

and where the symbol $'$ denotes matrix transposition and the notation $t|t-1$ means that the associated variable is an estimate for time $t$ based on all the data through time $t-1$. It is also shown that the differences $\hat{\beta}(t) - \beta(t)$ have means 0 and variance-covariance matrices $S(t)$ and are uncorrelated with the observations through time $t$. When the equation errors $e(t)$ and coefficient errors $u(t)$ have a Gaussian distribution, the Kalman estimator $\hat{\beta}(t)$ of $\beta(t)$ will be the minimum mean squared estimator rather than merely the minimum mean squared linear estimator.

The relevance of Kalman-Bucy models for regression theory has been examined recently by Duncan and Horn [3], who presented a wide-sense random coefficient regression interpretation in which, as

a consequence of an extended Gauss-Markov theorem, the Kalman estimators $\hat{\beta}(t)$ are identified as minimum variance linear (unconditionally) unbiased estimators of $\beta(t)$. Under the Gaussian assumption about the distributions of the errors $e(t)$ and $u(t)$, the Kalman estimator will be a minimum variance (unconditionally) unbiased estimator of $\beta(t)$.

Kalman-Bucy modelling has been applied in econometrics (in [7] and [8]), and the present authors have used the approach in the econometric modelling and forecasting of short-term interest rates [6]. In the latter work a Kalman-Bucy varying parameter adaptation of the reduced form of a standard, fixed coefficient econometric equation system is studied.

It must be emphasized that applying the Kalman-Bucy model to an economic time series offers some interesting challenges because of the problem of specifying $T(t)$, $\beta(0)$, $V_e(t)$ and $V_u(t)$. In engineering applications $T(t)$ and $\beta(0)$ are typically given by known physical properties of some real world system, so most of the attention in this literature is given to an examination of $V_e(t)$ and $V_u(t)$. In economics, however, none of these parameter sets can be considered to be known, and a tractable and generally applicable method of estimation is not yet available. For example, maximum likelihood estimation of these parameter sets jointly appears to be extremely complex even in the identifiable case. Moreover, in simpler cases in which maximum

likelihood estimates are available, their determination requires simultaneous solution of systems of nonlinear equations. This in turn requires extensive numerical analysis even for small systems (see [1]).

These complexities have led us to use simulation studies to examine the influence of various specifications of the parameter sets in the single equation case. Specifically, we explore by means of simulation the question of how sensitive the Kalman-Bucy model is to the specification of the four parameter sets $T(t)$, $\beta(0)$, $V_e(t)$ and $V_u(t)$. In other words, are the properties of the Kalman estimators $\hat{\beta}(t)$ of $\beta(t)$ and $\hat{y}(t)$ of $y(t)$ materially influenced by misspecification of one or more of these parameter sets?

We have developed a simulation model with which we can (1) generate simulated observations on $\beta(t)$ and $y(t)$ from a known (specified) Kalman-Bucy model, (2) obtain filtered estimates of $\beta(t)$ and $y(t)$ from the simulated data using parameter sets which have been both correctly specified and then misspecified, and (3) compare the estimates obtained by correctly specifying the model with those obtained by misspecifying it. In every case but one we have restricted our experiments to single-equation Kalman-Bucy models in which $V_e(t)$, $V_u(t)$, and $T(t)$ do not vary with time, so in order to simplify notation we will write these parameter sets as $V_e$, $V_u$, and $T$ whenever appropriate. Finally, we have restricted our simulation experiments to the

misspecification of one parameter set at a time over what we feel is a reasonably wide variety of underlying models.

Description of the Simulation Program

The computer program used to generate the results described in the next section requires the following inputs:

1. A set of $n$ observations on $k - 1$ independent variables. Rather than randomly generate such a set of independent variables, we have used, for all of the experiments discussed here, thirty observations on three economic time series (thus, $n = 30$, $k = 4$). The means of $X_1(t)$, $X_2(t)$, and $X_3(t)$ are 4.94, -.0014, and -.0116, and the corresponding standard deviations are .694, .0018, and .0138. The correlation between $X_1(t)$ and $X_2(t)$ is -.811, between $X_1(t)$ and $X_3(t)$ is .269, and between $X_2(t)$ and $X_3(t)$ is -.524.

2. The parameters $\beta(0)$, $V_e$ (note that $V_e$ is a scalar in the single equation case), $V_u$, and $T$ (to be called the underlying parameters) of the Kalman-Bucy model that are to be used to generate simulated observations on $\beta(t)$ and $y(t)$, $t = 1, \ldots, 30$.

3. The parameters $\beta^*(0)$, $V_e^*$, $V_u^*$, and $T^*$ (to be called the assigned parameters) that will be used to obtain filtered estimates of $\beta(t)$ and $y(t)$, $t = 1, \ldots, 30$, from the three independent

variables and the simulated $y(t)$ values. For a given experiment at most one of the assigned parameters will differ from the underlying parameters. In most cases, exactly one will be different. When the assigned and underlying parameters are identical, all parameters are then correctly specified and the simulation results conform to the properties of the Kalman estimators described above.

When the above information has been entered into the program, it proceeds as follows:

1. $\beta(t)$ and $y(t)$ are generated from (1) and (2) for $t = 1, \ldots, 30$ using the underlying parameters. In all cases, the errors $e(t)$ and $u(t)$ are generated internally as pseudorandom selections from independent normal distributions with means of 0 and variances as specified by $V_e$ and $V_u$. In the case of a nondiagonal $V_u$, $k$ independent $N(0, 1)$ variates are selected; the resulting four-component vector is then premultiplied by an appropriate matrix $P$, chosen such that $PP' = V_u$, giving $u(t)$, which is distributed as $N(0, V_u)$.

2. Given the resulting 30x5 data matrix $[y : X]$, where the first column of $X$ is a column of 1's, filtered estimates of $\beta(t)$ and $y(t)$, $t = 1, \ldots, 30$, are then obtained using the assigned parameters.

3. Steps 1 and 2 are repeated as many times as desired, and various statistics (described below) are accumulated. In each of the experiments reported on here, we generated 30 such replications (i.e., for a given set of inputs, steps 1 and 2 were repeated 30 times) and then printed a variety of summary statistics.

Description of Summary Statistics

The large amount of output generated by our simulation experiments compels us to condense and summarize the results and limits the summary statistics that can be discussed. Of the many statistics that were accumulated for each experiment, only two will be referred to here: the average sum of squares of error per trial, and the mean error per trial (and a related t-statistic). For trial $j$ of any given experiment ($j = 1, \ldots, 30$), we define the following:

$y_j(t)$ = observed value (simulated using the underlying parameters) of the dependent variable at time $t$ (a scalar).

$x(t)$ = 1x4 vector of observations on four independent variables at time $t$ (the first element in $x(t)$ is equal to 1 for all $t$). Note that $x(t)$ does not vary over trials and so is the same for all $j$.

$\beta_j(t)$ = 4x1 vector of observed coefficients (simulated using the underlying parameters) at time $t$. The elements of $\beta_j(t)$ are denoted $b_{ij}(t)$, $i = 0, 1, 2, 3$.

$\hat{y}_j(t)$ = the Kalman-Bucy filtered estimate of $y_j(t)$ derived using the assigned parameters.

$\hat{\beta}_j(t)$ = the Kalman-Bucy filtered estimate of $\beta_j(t)$ derived using the assigned parameters; the elements of $\hat{\beta}_j(t)$ are denoted $\hat{b}_{ij}(t)$.

The summary statistics on which we will base much of the discussion in the next section are described below in terms of the $y_j(t)$ and $\hat{y}_j(t)$. Statistics similar to those described below are developed based on $b_{ij}(t)$ and $\hat{b}_{ij}(t)$ in place of $y_j(t)$ and $\hat{y}_j(t)$ for $i = 0, 1, 2, 3$.

Table 1 illustrates some typical output from several experiments related to the misspecification of $V_e$. All three lines of statistics apply to data generated using the same underlying parameters, but each line represents estimation results for a different set of assigned parameters. For these three experiments, the data were generated with $V_e$ equal to 1.25 and the estimates were derived by assigning $V_e^*$ the values of 1.25 (Experiment 1), .125 (Experiment 2), and 5.00 (Experiment 3). The values of $\beta^*(0)$, $V_u^*$ and $T^*$ were assigned correctly (i.e., equal to the underlying parameters) for all three experiments.

TABLE 1
ESTIMATING y(t) FOR VARIOUS ASSIGNMENTS OF $V_e^*$

                          Estimation of y(t)
Experiment    Avg. SSE/Trial    Mean Error/Trial    (t-statistic)
    1             18.342            -0.0172            (-0.568)
    2              1.843            -0.0026            (-0.482)
    3             39.161            -0.0004            (-0.537)
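The simulation procedure described above, together with the recursions (4) through (10), can be sketched in Python for the single-equation case. This is only an illustrative sketch under stated assumptions, not the authors' program: the function names, the use of NumPy, the Cholesky factor as the choice of $P$, and the randomly generated regressors (standing in for the three economic series, which are not reproduced in the paper) are all hypothetical. It also assumes that the filtered estimate of $y(t)$ is $x(t)\hat{\beta}(t)$, and it starts the recursion from $S(0) = 0$ so that $S(1|0) = V_u$ as in (6).

```python
import numpy as np

def simulate(X, beta0, Ve, Vu, T, rng):
    """Generate beta(t) and y(t), t = 1..n, from equations (1)-(3)."""
    n, k = X.shape
    P = np.linalg.cholesky(Vu)            # assumed choice of P with P @ P.T == Vu
    beta, y = np.empty((n, k)), np.empty(n)
    prev = np.asarray(beta0, dtype=float)
    for t in range(n):
        prev = T @ prev + P @ rng.standard_normal(k)               # transition (2)
        beta[t] = prev
        y[t] = X[t] @ prev + np.sqrt(Ve) * rng.standard_normal()   # regression (1)
    return beta, y

def kalman_filter(y, X, beta0, Ve, Vu, T):
    """Filtered estimates of beta(t) and y(t) from recursions (4)-(10)."""
    n, k = X.shape
    b_hat, y_hat = np.empty((n, k)), np.empty(n)
    b_prev = np.asarray(beta0, dtype=float)
    S_prev = np.zeros((k, k))             # beta(0) is fixed, so S(1|0) = Vu as in (6)
    for t in range(n):
        x = X[t]
        b_pred = T @ b_prev                            # (5), (7)
        S_pred = T @ S_prev @ T.T + Vu                 # (6), (8)
        D = Ve + x @ S_pred @ x                        # (10), a scalar here
        K = S_pred @ x / D                             # gain S(t|t-1) x'(t) / D(t)
        b_prev = b_pred + K * (y[t] - x @ b_pred)      # (4)
        S_prev = S_pred - np.outer(K, x) @ S_pred      # (9)
        b_hat[t], y_hat[t] = b_prev, x @ b_prev        # assumed: y_hat(t) = x(t) b_hat(t)
    return b_hat, y_hat

def run_experiment(X, underlying, assigned, trials=30, seed=0):
    """Return (average SSE per trial, mean of the per-trial average errors) for y(t)."""
    rng = np.random.default_rng(seed)
    sse, avg_err = np.empty(trials), np.empty(trials)
    for j in range(trials):
        _, y = simulate(X, rng=rng, **underlying)
        _, y_hat = kalman_filter(y, X, **assigned)
        sse[j] = np.sum((y - y_hat) ** 2)
        avg_err[j] = np.mean(y - y_hat)
    return sse.mean(), avg_err.mean()

# Hypothetical regressors: independent normal draws matching only the reported
# means and standard deviations (the reported correlations are ignored here).
n = 30
rng_x = np.random.default_rng(1)
X = np.column_stack([np.ones(n),
                     rng_x.normal(4.94, 0.694, n),
                     rng_x.normal(-0.0014, 0.0018, n),
                     rng_x.normal(-0.0116, 0.0138, n)])
underlying = dict(beta0=np.array([0.67, 0.85, -220.0, 9.92]),
                  Ve=0.125, Vu=0.025 * np.eye(4), T=np.eye(4))
correct = run_experiment(X, underlying, underlying)
too_small = run_experiment(X, underlying, {**underlying, "Ve": 0.0125})
too_large = run_experiment(X, underlying, {**underlying, "Ve": 1.25})
```

Because the same seed is used for every call, the three runs filter identical simulated data and differ only in the assigned value of $V_e$, which mirrors the way each line of a table in this paper shares one set of underlying parameters. The numbers produced will not match the paper's tables, since the regressors here are synthetic.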

Avg. SSE/Trial. If we define the sum of squares of error for trial $j$ to be

$SSE_j = \sum_{t=1}^{30} (y_j(t) - \hat{y}_j(t))^2$,

then the average sum of squares of error per trial is given by

$\overline{SSE} = \left( \sum_{j=1}^{30} SSE_j \right) / 30$.

Mean Error/Trial. A more accurate title for this statistic would be "mean average error/trial." If we define the average error for trial $j$ to be

$A_j = \left( \sum_{t=1}^{30} (y_j(t) - \hat{y}_j(t)) \right) / 30$,

then the mean error per trial and its associated t-statistic (which provides a test of the hypothesis that the expected value of the mean error per trial is zero) are given by

$M = \left( \sum_{j=1}^{30} A_j \right) / 30$

and

$t = M\sqrt{30} \,\Big/ \left( \sum_{j=1}^{30} (A_j - M)^2 / 29 \right)^{1/2}$.

Discussion of Simulation Results

Since we restricted our experiments to the misspecification of one parameter set at a time, it will be convenient to break this

discussion of the sensitivity results into four parts, treating in turn the effects of misspecifying $V_e$, $\beta(0)$, $V_u$, and $T$.

Misspecification of $V_e$

In general, the size of the equation disturbance variance relative to the size of the variance contributed by the coefficient disturbances determines how well the $y(t)$ are estimated by the $\hat{y}(t)$. In fact one can show that $\hat{y}(t)$ will be identical to $y(t)$ if $V_e(t)$ is identically zero for all $t$. It also appears from the simulation results that the smaller the scalar $V_e^*$ is for a given assignment of $V_u^*$, whatever the specification of $T$ and $\beta(0)$, the closer the $\hat{y}(t)$ will be to $y(t)$. However, this result may induce considerable fluctuations in the estimates $\hat{\beta}(t)$ of $\beta(t)$ from one time period to the next. Moreover, these fluctuations may be greater still if other parameters have been misspecified also.

In all of our experiments concerning $V_e$ (as well as those for $V_u$ and $T$), $\beta(0)$ and $\beta^*(0)$ were set equal to $(.67, .85, -220., 9.92)'$, and $V_u$ and $V_u^*$ were of the form $\sigma_u^2 I$. In the first twelve experiments, $T$ and $T^*$ were set equal to the identity matrix and the relative magnitudes of $V_e$ and $V_u$ were varied. The final twelve experiments were identical to the first twelve except that $T$ and $T^*$ were specified to be the matrix

1. O 0 0 1. 3 0 0 0 0 ) 30.0 0 25.0 0 00 ]. 0

The general conclusion that can be drawn from the first twelve experiments is that, regardless of the relative sizes of the variances of the equation and coefficient disturbances, misspecifying $V_e$ has considerable effect on the estimation of $y(t)$ and comparatively little effect on the estimation of $\beta(t)$ (given that $V_u$, $\beta(0)$, and $T$ are correctly specified). To illustrate, Table 2 contains partial results of Experiments 1, 2, and 3, in which the data were generated with $V_e = .125$ and $V_u = (.025)I$, and the estimates were derived by assigning $V_e^*$ the values .125, .0125, and 1.25. The results for $b_2(t)$ and $b_3(t)$ are similar and have been omitted. The differences in the Avg. SSE/Trial and Mean Error/Trial figures for estimating the $y(t)$'s are dramatic, particularly in light of the comparatively stable analogous figures for estimating the coefficients. None of the t-statistics associated with the above mean error figures had an absolute value larger than 0.6, so those figures were omitted from Table 2.

Experiments 13 through 24 were identical to the first twelve except that $T$ and $T^*$ were not the identity matrix. The overall results for these experiments were similar to those described for the first twelve experiments, although using a non-identity $T$ matrix greatly

TABLE 2
SELECTED ESTIMATION RESULTS FOR VARIOUS ASSIGNMENTS OF $V_e^*$

                       Estimation of y(t)       Estimation of b0(t)      Estimation of b1(t)
Experiment             Avg. SSE   Mean Error    Avg. SSE   Mean Error    Avg. SSE   Mean Error
1 (Ve* = .125)          0.5320     -.0020        10.868      .0346        0.5776     -.0080
2 (Ve* = .0125)         0.0097     -.0002        11.018      .0362        0.5931     -.0080
3 (Ve* = 1.25)          7.2258     -.0152        10.693      .0303        0.7842     -.0100

reduced the average sum of squares of error per trial and the mean error per trial for estimating $b_0(t)$ and $b_1(t)$ as compared to the results presented in Table 2 above. The corresponding values for estimating $y(t)$ changed very little from those in the first twelve experiments. In none of the experiments concerning $V_e$ did any of the misspecifications of $V_e$ appear to bias the estimators when the other parameter sets were assigned correctly.

Misspecification of $\beta(0)$

We ran 56 experiments representing a variety of underlying models to examine the effect of incorrectly specifying $\beta(0)$. In general, we found the Kalman-Bucy model to be very sensitive to the choice of the starting value $\beta(0)$. In some of the experiments, each element of $\beta(0)$ was misspecified with an error of approximately 10 percent of the true value. In other experiments, the error was closer to 50 percent. In still other experiments, only one element of $\beta(0)$ was misspecified by varying amounts. The relative sizes of $V_e$ and $V_u$ were varied over a number of the experiments, and some experiments used a $T$ matrix which was not the identity matrix.

The most significant results can be better understood by rewriting (4) in the form

(11)  $\hat{\beta}(t) = \hat{\beta}(t|t-1) + K(t)v(t)$,

where $K(t) = S(t|t-1)X'(t)D^{-1}(t)$ is the so-called Kalman gain and $v(t)$ is a scalar equal to $y(t) - x(t)\hat{\beta}(t|t-1)$. If $T$ is the identity matrix, (11) can be rewritten as

(12)  $\hat{\beta}(t) = \hat{\beta}(t-1) + K(t)v(t)$  $(t = 1, \ldots, n)$.

Obviously, then, particularly in the $T = I$ case, the amount of correction that $\hat{\beta}(t)$ will realize from one time period to the next depends on the elements $k_i(t)$ of $K(t)$. The magnitude of $k_i(t)$ is considerably affected by the magnitude of $x_i(t)$ and the relative sizes of $V_e$ and $V_u$. For example, in many of our experiments concerning $\beta(0)$, we specified $V_u$ and $V_u^*$ to be of the form $\sigma_u^2 I$. The mean of $x_1(t)$, $t = 1, \ldots, 30$, is approximately 5.0, but the mean of $x_3(t)$, $t = 1, \ldots, 30$, is approximately $10^{-3}$. Consequently, if $T = I$ and $V_u = \sigma_u^2 I$, $b_1(t)$ is much more likely to correct toward its true value if misspecified than is $b_3(t)$. In general, then, if $T = I$ and $V_u$ is diagonal, the effect of misspecifying $\beta(0)$ on the estimation of $\beta(t)$ and $y(t)$ depends on the magnitude of the specification error of $b_i(0)$ relative to the magnitude of $k_i(t)$.

In addition, misspecifying only one of the elements of $\beta(0)$ may introduce bias in the estimation of the other coefficients that were correctly specified. If $T$ is not a diagonal matrix (i.e., if there are interrelationships between the coefficients), it is easy to see that

incorrectly specifying one coefficient may lead to biased results for several coefficients. This was observed in several of our experiments.

For the most part, misspecifying $\beta(0)$, in whatever way, had very little effect on the estimation of $y(t)$ unless the variances of the coefficient disturbances were very small compared to the variance of the equation disturbance. In this case none of the coefficients was able to adjust itself if misspecified. For example, in one pair of experiments, the underlying parameters included $T = I$, $V_e = .125$, and $V_u = (.025)I$. The average sum of squares of error for $y(t)$ was 0.532 with $\beta^*(0) = \beta(0)$ and 0.771 with $\beta^*(0)$ approximately equal to $.5\beta(0)$. Furthermore, the t-statistics associated with the mean error per trial were -0.38 in the correctly specified case and -3.33 in the misspecified case. In contrast to those results, another pair of experiments was run in which the conditions were identical except that $V_u$ was set equal to $(.00025)I$. The average sum of squares of error for $y(t)$ was 3.03 for the correctly specified case and 24.35 in the misspecified case. The corresponding t-statistics for the mean errors per trial were -0.43 and 43.35.

Misspecification of $V_u$

It seems reasonable to believe that in a typical application, $V_u$ will be assumed to be a diagonal matrix. Thus, we limited these

experiments to cases in which $V_u^*$ was specified to be a diagonal matrix whether the underlying $V_u$ was diagonal or not.

In the first nine $V_u$ experiments, we set $T = I$ and $V_u = \sigma_u^2 I$ and looked at the effects of misspecifying $\sigma_u^2$ for various relative sizes of $\sigma_u^2$ and $V_e$. In all cases (regardless of the size of $\sigma_u^2$ relative to $V_e$, at least in our experiments), misspecifying $\sigma_u^2$ either on the high or low side had virtually no effect on the estimation of $\beta(t)$. On the other hand, underestimating $\sigma_u^2$ effectively increases the relative size of $V_e$, and the model then fits $y(t)$ less well. Conversely, overestimating $\sigma_u^2$ reduces the relative size of $V_e$, and the model fits $y(t)$ more closely than if $\sigma_u^2$ is specified correctly.

The next nine experiments were identical to the first nine except that instead of setting $T$ equal to $I$, we specified

I 1 0 00 0 | ~. 1 0 0 0 30 -25 0 0 0 0 1

This transition matrix, if specified correctly as it was here, provides closer estimates of $y(t)$ than were realized by correctly specifying an underlying identity $T$ matrix. Nevertheless, misspecifying $\sigma_u^2$ again had little noticeable effect on the estimation of $\beta(t)$, so the results were essentially the same as those observed for the first nine experiments.

The last eight experiments were designed to test the effect of specifying a diagonal $V_u^*$ with correct diagonal elements when the underlying $V_u$ had numerous nonzero off-diagonal elements. In other words, what happens if there is correlation among the coefficient disturbances but we fail to recognize that correlation when estimating $y(t)$ and $\beta(t)$? Our results, all derived with $T = I$, indicate that as the correlations among the coefficient disturbances grow larger, so does the error of estimation if those correlations are ignored, particularly with respect to estimating $\beta(t)$. For example, in one pair of experiments we specified $V_u$ to be a matrix with each diagonal element equal to .025 and each off-diagonal element equal to .005. This gave a correlation of .20 for each pair $(u_i(t), u_j(t))$, $i \neq j$. In the first of those two experiments, we assigned $V_u^*$ correctly, and in the second we set $V_u^* = (.025)I$. In another pair of experiments, the same approach was taken, but in this case the off-diagonal elements of $V_u$ were set equal to .020. This gave a correlation of .80. Table 3 presents the results for estimating $y(t)$ and $b_0(t)$ only; note how much the results for estimating $b_0(t)$ were affected, particularly when the larger correlation was ignored in the estimation (the t-statistics associated with the mean error per trial were virtually the same whether the correlation was correctly specified or not, so they are not included in Table 3).
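The link between the off-diagonal elements of $V_u$ and the stated correlations, and the generation of correlated disturbances by premultiplying independent $N(0, 1)$ variates by a matrix $P$ with $PP' = V_u$, can be illustrated with a short Python sketch. The helper names are hypothetical, and the Cholesky factor is only one assumed choice of $P$; the paper does not say which factorization was used.

```python
import numpy as np

def equicorrelated_cov(k, variance, covariance):
    """k-by-k matrix with `variance` on the diagonal and `covariance` elsewhere."""
    return np.full((k, k), covariance) + (variance - covariance) * np.eye(k)

def draw_disturbances(Vu, size, rng):
    """Draw `size` vectors u = P z with z ~ N(0, I), so that u ~ N(0, Vu)."""
    P = np.linalg.cholesky(Vu)                    # lower triangular, P @ P.T == Vu
    z = rng.standard_normal((size, Vu.shape[0]))  # rows are independent N(0, I) vectors
    return z @ P.T                                # each row is (P z)'

Vu = equicorrelated_cov(4, 0.025, 0.005)          # underlying Vu of the first pair
Vu_assigned = np.diag(np.diag(Vu))                # (.025) I: correlations ignored
corr = Vu[0, 1] / np.sqrt(Vu[0, 0] * Vu[1, 1])    # implied correlation, .005 / .025

u = draw_disturbances(Vu, 100_000, np.random.default_rng(0))
sample_cov = np.cov(u, rowvar=False)              # should be close to Vu
```

The same arithmetic shows why a correlation of .80 with diagonal elements of .025 corresponds to off-diagonal elements of .020.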

TABLE 3
SELECTED RESULTS FOR VARIOUS SPECIFICATIONS OF $V_u^*$

                           Estimation of y(t)       Estimation of b0(t)
Experiment                 Avg. SSE   Mean Error    Avg. SSE   Mean Error
1 (r = .20, Vu* = Vu)       0.339      0.00367       1.071      0.0625
2 (r = .20, Vu* ≠ Vu)       0.379      0.00394       1.877      0.0830
3 (r = .80, Vu* = Vu)       0.423      0.00449       0.185      0.0204
4 (r = .80, Vu* ≠ Vu)       0.642      0.00578       4.010      0.1234

Misspecification of $T$

Our experiments with the transition matrix involved four matrices, $T_1$, $T_2$, $T_3$, and $T_4$. $T_1$ was chosen to be the identity matrix, and the remaining three matrices were chosen so that each matrix represented more complex interrelationships among the coefficients than the preceding one. Thus, $T_1$ was the least structured transition matrix and $T_4$ was the "most structured," forcing the largest number of interrelationships among the coefficients. $T_2$, $T_3$, and $T_4$ all were chosen in such a way that the elements of $\beta(0)$ (which were the same for all of these $T$ experiments) conformed to the relationships imposed by those matrices.

We conducted 16 experiments to examine the effects of misspecifying $T$. In the first four of these experiments the underlying