Division of Research
Graduate School of Business Administration
The University of Michigan

POOLING CROSS-SECTIONAL AND TIME SERIES DATA:
A REVIEW OF STATISTICAL ESTIMATION TECHNIQUES

Working Paper No. 141

by
Terry Dielman and Roger L. Wright
The University of Michigan

January, 1977

© The University of Michigan, 1977

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the express permission of the Division of Research.

Introduction

Several recent articles have examined methods of pooling cross-sectional and time series data. Data sets that observe a large number of individuals over several periods of time are becoming more common, and these pooling methods are consequently attracting more attention. The current literature distinguishes three cases, involving slight alterations of the model: (1) a model with exogenous variables but no lagged values of the dependent variable, (2) a model with both exogenous and lagged dependent variables, and (3) a model with lagged values of the dependent variable but no exogenous variables. Simulation studies have been performed to determine the small-sample properties of estimation procedures for these three cases. In this paper we present a review of these studies as well as a summary of findings from other available literature. Within our review we note when and how the estimation procedures may best be utilized in pooling cross-section and time series data.

Linear Model and Generalized Least Squares

In general, the model we are concerned with can be written as

    Y = Xβ + u,  where  y_it = Σ_{j=1}^K x_itj β_j + u_it,

with Y an NT × 1 vector of the dependent variable y_it; X an NT × K matrix of K variables, which may be exogenous or lagged dependent; β a K × 1 vector of unknown parameters; and u an NT × 1 vector of the unknown stochastic components u_it. The generalized least squares estimate of the parameter β is

    β_GLS = (X'Ω⁻¹X)⁻¹(X'Ω⁻¹Y),  where  Ω = var(u) = E(uu').

Special Case: Error Component Model

Suppose now that u_it = μ_i + ν_it, where

    E(μ_i ν_{i't}) = 0  for all i, i', and t;
    E(μ_i) = 0 and E(ν_it) = 0  for all i and t;
    E(μ_i μ_{i'}) = σ_μ²  if i = i', 0 otherwise;
    E(ν_it ν_{i't'}) = σ_ν²  if i = i' and t = t', 0 otherwise.

The μ_i are time-invariant individual effects, and the ν_it represent remaining effects which are assumed to vary over both individuals and time. (Note that in our variance or error component representation of u_it as μ_i + ν_it, we have simplified from the more general case of

u_it = μ_i + λ_t + ν_it, where the λ_t represent period-specific and individual-invariant effects. Further comment on this simplification will be made later.)

We now have

    E(uu') = Ω = σ² · blockdiag(A, A, ..., A),

where A is the T × T matrix with unit diagonal elements and all off-diagonal elements equal to ρ, with

    σ² = σ_μ² + σ_ν²   and   ρ = σ_μ²/(σ_μ² + σ_ν²).

We can then write the GLS estimator as follows (see Appendix A for the derivation):

    β_GLS = [W_xx + θB_xx]⁻¹ [W_xy + θB_xy],

where

    T_xx = Σ_i X_i'X_i,    B_xx = (1/T) Σ_i X_i'ee'X_i,    W_xx = T_xx − B_xx,
    T_xy = Σ_i X_i'Y_i,    B_xy = (1/T) Σ_i X_i'ee'Y_i,    W_xy = T_xy − B_xy.

T_xx, W_xx, and B_xx are K × K; T_xy, B_xy, and W_xy are K × 1;

e is a T × 1 vector with all elements unity, and

    θ = σ_ν²/(σ_ν² + Tσ_μ²)  [4, 5].

The problem in using the GLS estimator is that the value of θ is unknown. Therefore we must produce an estimate of θ by estimating σ_ν² and σ_μ². (Note that we could also proceed by estimating ρ = σ_μ²/(σ_μ² + σ_ν²) and using this in an equivalent GLS estimator expressed in terms of ρ. The Nerlove articles speak in terms of finding ρ rather than θ; see Appendix C for details.)

We now present several methods used to produce estimators of β in the articles examined (see Appendix B for a further listing):

1. The true GLS estimator, with θ assumed known.

2. The ordinary least squares estimator (OLS):

    β_OLS = (T_xx)⁻¹ T_xy.

(This is β_GLS with θ = 1.)

3. The least squares with dummy variables estimator (LSDV):

    β_LSDV = (W_xx)⁻¹ W_xy.

(This is β_GLS with θ = 0.)

The model for the LSDV method can be written as

    Y = Xβ + Zμ + ν,

where X is the matrix of exogenous and/or lagged dependent variables;

Z is an NT × N design matrix introducing the "individual dummy variables"; μ is an N × 1 vector of coefficients for the dummy variables; and ν is an NT × 1 vector of disturbance terms.

4. The maximum likelihood estimator (ML):

The value of θ_ML is determined by searching for the value between zero and one that maximizes the likelihood function. We then determine

    σ²_ν,ML = (1/NT) [(W_yy + θ_ML B_yy) − (W_xy + θ_ML B_xy)'(W_xx + θ_ML B_xx)⁻¹(W_xy + θ_ML B_xy)],

where W_yy, B_yy, and T_yy are defined analogously to W_xx, B_xx, and T_xx, with Y replacing X. σ²_μ,ML can then be derived from θ_ML and σ²_ν,ML, since

    θ_ML = σ²_ν,ML / (σ²_ν,ML + Tσ²_μ,ML).

Therefore we have

    σ²_μ,ML = (1/T) [σ²_ν,ML/θ_ML − σ²_ν,ML].

5. Nerlove's estimator (2RC):

Compute the estimate of σ_ν² from the estimated residuals of LSDV, and estimate σ_μ² from the estimates of the coefficients of the dummy variables [6]:

    σ²_ν,2RC = (1/NT) [W_yy − W_xy'(W_xx)⁻¹W_xy],

    σ²_μ,2RC = (1/N) Σ_{i=1}^N (μ̂_i,LSDV − μ̄_LSDV)²,  where  μ̄_LSDV = (1/N) Σ_{i=1}^N μ̂_i,LSDV.
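The cross-product matrices and the OLS, LSDV, and θ-weighted GLS estimators above can be sketched numerically. The following is a minimal numpy illustration on simulated error-components data (the design, parameter values, and all variable names are ours, not from the papers reviewed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 10, 2
beta = np.array([1.0, -0.5])

X = rng.normal(size=(N, T, K))                 # exogenous regressors
mu = rng.normal(scale=2.0, size=(N, 1))        # individual effects, sigma_mu = 2
nu = rng.normal(scale=1.0, size=(N, T))        # remainder effects, sigma_nu = 1
Y = X @ beta + mu + nu

# Cross-product matrices as defined in the text
Txx = np.einsum('itk,itl->kl', X, X)           # T_xx = sum_i X_i'X_i
Txy = np.einsum('itk,it->k', X, Y)             # T_xy = sum_i X_i'Y_i
Xe = X.sum(axis=1)                             # X_i'e for each i, shape (N, K)
Ye = Y.sum(axis=1)                             # Y_i'e for each i, shape (N,)
Bxx = Xe.T @ Xe / T                            # B_xx = (1/T) sum_i X_i'ee'X_i
Bxy = Xe.T @ Ye / T                            # B_xy = (1/T) sum_i X_i'ee'Y_i
Wxx, Wxy = Txx - Bxx, Txy - Bxy                # within ("W") parts

b_ols = np.linalg.solve(Txx, Txy)              # OLS: theta = 1
b_lsdv = np.linalg.solve(Wxx, Wxy)             # LSDV: theta = 0
theta = 1.0 / (1.0 + T * 2.0 ** 2)             # sigma_nu^2 / (sigma_nu^2 + T sigma_mu^2)
b_gls = np.linalg.solve(Wxx + theta * Bxx, Wxy + theta * Bxy)
```

With σ_μ large relative to σ_ν, θ is close to zero and the GLS estimate lies near the LSDV estimate; with σ_μ = 0, θ = 1 and GLS reduces to OLS.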

We now present a summary of three simulation studies examining the results of estimation procedures for the cases mentioned in the introduction.

Results of Comparison

1. Presence of exogenous variables but no lagged values of the dependent variable

A Monte Carlo simulation study by Maddala and Mount examines this first case [5].

The Model: Y = Xβ + u, where y_it = Σ_j x_itj β_j + u_it, with u_it = μ_i + ν_it.

Comparison of Methods:

* Bias: The bias in all cases was reported to be negligible. The maximum bias was .007, less than one percent, with no systematic variation of the relative bias.

* Mean square error: No appreciable differences were found to exist between any of the two-step estimators examined (ML and 2RC); all performed equally well. For overall performance, the two-step estimators outperformed OLS and LSDV. Examining the MSEs of the estimates, the MSE is seen to increase as θ decreases (ρ increases) for OLS and to decrease as θ increases for LSDV.

* Notes: The possibility of a negative variance estimate exists with the ML method. In such cases σ̂_μ² was put equal to zero.
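The qualitative MSE pattern for OLS and LSDV can be reproduced in a toy Monte Carlo. This is an illustrative design of our own, not the Maddala-Mount experiment: the regressor is given an individual-specific level so that the individual effects matter for OLS efficiency.

```python
import numpy as np

def mse_of_estimators(rho, n_rep=200, N=25, T=6, seed=1):
    """Monte Carlo MSE of the OLS and LSDV slope estimates under an
    error-components disturbance with sigma_mu^2 + sigma_nu^2 = 1 and
    rho = sigma_mu^2 / (sigma_mu^2 + sigma_nu^2)."""
    rng = np.random.default_rng(seed)
    beta, s_mu, s_nu = 1.0, np.sqrt(rho), np.sqrt(1.0 - rho)
    err_ols, err_lsdv = [], []
    for _ in range(n_rep):
        # regressor with an individual-specific level component
        x = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))
        y = (beta * x + rng.normal(scale=s_mu, size=(N, 1))
             + rng.normal(scale=s_nu, size=(N, T)))
        err_ols.append((x * y).sum() / (x * x).sum() - beta)   # theta = 1
        xw = x - x.mean(axis=1, keepdims=True)                 # theta = 0: within-demeaned
        yw = y - y.mean(axis=1, keepdims=True)
        err_lsdv.append((xw * yw).sum() / (xw * xw).sum() - beta)
    return np.mean(np.square(err_ols)), np.mean(np.square(err_lsdv))

mse_ols_lo, mse_lsdv_lo = mse_of_estimators(rho=0.1)
mse_ols_hi, mse_lsdv_hi = mse_of_estimators(rho=0.9)
```

In this design the OLS MSE rises as ρ grows while the LSDV MSE falls; GLS with the true θ would do at least as well as the better of the two in each regime.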

The above findings also hold for the additional methods mentioned in Appendix B.

Monte Carlo studies by Nerlove have examined properties of estimators in the two cases involving lagged dependent variables:

2. Presence of both exogenous and lagged dependent variables

The Model: Y = Xβ + u, where y_it = β_0 + β_1 y_{i,t−1} + β_2 x_it + u_it, with u_it = μ_i + ν_it.

Comparison of Methods:

* Bias: Small-sample bias was known to exist in all cases in which a lagged value of the dependent variable is one of the explanatory variables. Using GLS with the true value of ρ resulted in only a slight bias in all cases.

OLS: (1) ρ = 0: The OLS and GLS estimates coincide. (2) ρ ≠ 0: Severe bias is encountered. β̂_1 is severely biased upwards; β̂_2 is severely biased downwards for large ρ values except when β_2 = 0; σ̂² is severely biased downward.

LSDV: β̂_1 is biased downwards (less so for higher values of ρ and β_2); β̂_0, β̂_2, σ̂², and ρ̂ are biased upwards.

2RC: Despite the upward bias in ρ̂ using LSDV, 2RC was found to be superior over a wide range of true

parameter values to all other estimates considered. Estimates of β_0, β_1, β_2, and σ² are not seriously biased in comparison with either OLS or LSDV. The principal source of difficulty noted is the upward bias in the estimate of ρ.

ML: Biases in cases where a large number of boundary solutions (ρ̂ = 0) did not occur were slight. Where boundary solutions occurred, the ML estimates were greatly affected.

* Relative mean square error: Nerlove considered the mean square errors (MSEs) for the GLS method as an ideal of comparison and formed ratios of the MSEs of β̂_0, β̂_1, and β̂_2 for each method to the corresponding MSE of GLS.

OLS: ρ = 0: MSEs coincide with GLS or are somewhat smaller. ρ ≠ 0: The above result no longer applies and the estimates deteriorate; note that even the LSDV estimates are superior for β_1 and β_2.

LSDV: Superior to OLS for β_1 and β_2; worse than OLS for β_0.

2RC: Lower MSEs than LSDV except when ρ is very small. Lower than ML even when that method might be expected to perform well. 2RC compares favorably with all other estimates over a wide range of parameter values.

ML: Large numbers of boundary solutions were encountered for the ML estimates even when they were not expected to occur.

The ML method does not compare favorably to 2RC.

3. Presence of lagged values of the dependent variable but no exogenous variables

The Model: Y = Xβ + u, where y_it = β y_{i,t−1} + u_it, with u_it = μ_i + ν_it.

Comparison of Methods:

* Bias:

GLS: A small amount of bias.

OLS: Estimates of β are strongly biased upwards when ρ ≠ 0; estimates of σ² are strongly biased towards zero.

LSDV: Estimates of β are biased towards zero; estimates of ρ and σ² are biased upwards.

ML: Estimates of σ² are highly erratic.

2RC: Estimates of β show some bias downwards for large true β and some bias upwards for small true β; estimates of σ² are less erratic for this method than for ML.

Conclusions

These simulation studies yield the following conclusions:

1. With a lagged value of the dependent variable present, Nerlove's estimator, 2RC, appears to be superior to other suggested methods of estimation.

2. With no lagged value present there is no clearly superior method among the two-step procedures in terms of the properties examined, although these are superior to methods such as OLS and

LSDV. Maddala and Mount suggest, in this case, that two of the methods be applied to the data at hand and the results compared. Widely differing estimates should be taken as an indication of a possible misspecification of the model.¹ When looking for a single method of estimation to be used with all data sets, however, the choice would go to Nerlove's method.

Prospects for Further Study

The literature thus far has examined cases where u_it = μ_i + ν_it, i.e., where the random time effect has been omitted from the disturbance term. This is simply a matter of choice in specifying the model. With the time effect included, we would have the following specification:

    Y = Xβ + u,  u_it = μ_i + λ_t + ν_it.

The GLS estimate of β will then be

    β_GLS = [W_xx + θ_1 B_xx + θ_2 C_xx]⁻¹ [W_xy + θ_1 B_xy + θ_2 C_xy],

where

    θ_1 = σ_ν²/(σ_ν² + Tσ_μ²),   θ_2 = σ_ν²/(σ_ν² + Nσ_λ²),

and C_xx, C_xy, and C_yy represent the between-time-period decomposition of the variances [4].

¹See Arora [2] for another study supporting use of a two-step method.

Nerlove suggests that further study of the small-sample properties of estimators would be desirable in this case, especially in terms of the results when the λ_t are erroneously assumed absent [7].

The assumptions made concerning the u vector, i.e., that

    E(uu') = σ² · blockdiag(A, A, ..., A),

where σ² = σ_μ² + σ_ν², ρ = σ_μ²/(σ_μ² + σ_ν²), and A is the T × T matrix with unit diagonal elements and all off-diagonal elements equal to ρ, imply a specific form of serial correlation. The effect of a misspecification here needs to be investigated [7].

Another area of study concerning the pooling of cross-section and time series data is the use of the random coefficient regression (RCR) model. Study in this area has begun. For example, Hsiao has proposed the following model [3]:

    y_it = (β + δ_i + γ_t) x_it + ε_it,

with δ_i, γ_t, and ε_it random disturbances,

where

    E(ε_it ε_jt') = σ_ε²  if i = j and t = t', 0 otherwise;
    E(δ_i δ_j) = σ_δ²  if i = j, 0 otherwise;
    E(γ_t γ_t') = σ_γ²  if t = t', 0 otherwise;
    E(δ_i γ_t) = E(δ_i ε_jt) = E(γ_t ε_it) = 0;
    E(ε_it) = E(δ_i) = E(γ_t) = 0  for all i, j, and t.

Note that this case can include the error components model. Hsiao writes the model as

    y_it = Σ_{k=1}^K (β_k + δ_ik + γ_tk) x_itk + ε_it.

Now, if each element of the first column of X is equal to one, we have

    y_it = β_1 + Σ_{k=2}^K (β_k + δ_ik + γ_tk) x_itk + δ_i1 + γ_t1 + ε_it,

where δ_i1, γ_t1, and ε_it correspond to μ_i, λ_t, and ν_it in our previous error components model. In this case, however, we also allow for the possibility of random slope coefficients.

RCR models are justifiable in economic use. The coefficient of a variable may not remain constant because of unobservable influences on the variable. Thus, we may be better off predicting the mean of some process that determines the coefficient rather than assuming the coefficient to be constant.

Since these models represent additional general instances of the

error components model used in pooling cross-section and time series data, they deserve the attention of further study.
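A small simulated sketch of Hsiao's specification shows the mean coefficient vector β being recovered by pooled OLS while the composite disturbance carries the δ_i1 + γ_t1 + ε_it error-components structure. The parameter values are our own toy choices, and σ_δ and σ_γ are taken equal across coefficients for simplicity:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 200, 20, 3
beta = np.array([2.0, 1.0, -1.0])

# Regressors; the first column is the constant, as in Hsiao's reduction
X = rng.normal(size=(N, T, K))
X[:, :, 0] = 1.0

delta = rng.normal(scale=0.3, size=(N, K))   # individual coefficient shifts delta_ik
gamma = rng.normal(scale=0.3, size=(T, K))   # period coefficient shifts gamma_tk
eps = rng.normal(scale=1.0, size=(N, T))     # remainder disturbance eps_it

# y_it = sum_k (beta_k + delta_ik + gamma_tk) x_itk + eps_it
coef = beta[None, None, :] + delta[:, None, :] + gamma[None, :, :]
Y = (coef * X).sum(axis=2) + eps

# Pooled OLS recovers the mean coefficients beta
Xf = X.reshape(N * T, K)
b_hat = np.linalg.lstsq(Xf, Y.reshape(N * T), rcond=None)[0]
```

Since δ_i and γ_t are drawn independently of X, the coefficient shifts enter the composite error with mean zero and pooled OLS stays unbiased for β, though an efficient estimator would weight by the induced covariance structure.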

Appendix A

Derivation of the GLS Estimator

We have written β_GLS = [W_xx + θB_xx]⁻¹ [W_xy + θB_xy], following Maddala and Mount. The following is a verification that our form is equivalent to the GLS estimator.

    Y = Xβ + u,  where  u_it = μ_i + ν_it.

The GLS estimator for the above model is

    β_GLS = (X'Ω⁻¹X)⁻¹ X'Ω⁻¹Y,

where

    Ω = E(uu') = σ² · blockdiag(A, A, ..., A)   (NT × NT),

A is the T × T matrix with unit diagonal elements and all off-diagonal elements equal to ρ, σ² = σ_μ² + σ_ν², and ρ = σ_μ²/(σ_μ² + σ_ν²).

To obtain Ω⁻¹ we can make use of the orthogonal transformation C formed by stacking the 1 × T row e'/√T above the (T − 1) × T matrix C_1, where e is the T × 1 vector of ones

and C_1 is defined so that

    C_1 e = 0,   C_1 C_1' = I_{T−1},   and   C_1'C_1 = I_T − (1/T) ee'.

Then

    CAC' = diag((1 − ρ) + Tρ, 1 − ρ, ..., 1 − ρ).

Therefore

    Ω⁻¹ = (1/σ²) · blockdiag(A⁻¹, A⁻¹, ..., A⁻¹),

where

    A⁻¹ = C' · diag(1/[(1 − ρ) + Tρ], 1/(1 − ρ), ..., 1/(1 − ρ)) · C.

Note: This formulation follows from a result in Theil [9] (pp. 27-28), which may be stated as follows: for any symmetric matrix M there exists an orthogonal matrix B such that BM = QB, BMB' = Q, and M = B'QB, where Q is diagonal and contains the latent roots of M along the diagonal.

(**) Then

    A⁻¹ = (1/(1 − ρ)) I_T + (1/T) [1/((1 − ρ) + Tρ) − 1/(1 − ρ)] ee'
        = (1/(1 − ρ)) I_T + (1/T) [(1 − ρ) − (1 − ρ + Tρ)] / [(1 − ρ + Tρ)(1 − ρ)] ee'
        = (1/(1 − ρ)) I_T − [ρ / ((1 − ρ)(1 − ρ + Tρ))] ee'
        = λ_2 I_T + λ_1 ee',

where

    λ_1 = −ρ / [(1 − ρ)(1 − ρ + Tρ)]   and   λ_2 = 1/(1 − ρ).

We can now write the GLS estimator as follows:

    β_GLS = [X' blockdiag(A⁻¹, ..., A⁻¹) X]⁻¹ [X' blockdiag(A⁻¹, ..., A⁻¹) Y]
          = [Σ_{i=1}^N X_i'A⁻¹X_i]⁻¹ [Σ_{i=1}^N X_i'A⁻¹Y_i]

(the factor 1/σ² cancels), where X' = [X_1' X_2' ... X_N'] and Y' = [Y_1' Y_2' ... Y_N'], with X_i' being K × T and Y_i' being 1 × T. Then

    β_GLS = [Σ_i X_i'(λ_1 ee' + λ_2 I)X_i]⁻¹ [Σ_i X_i'(λ_1 ee' + λ_2 I)Y_i]

    = [λ_1 Σ_i X_i'ee'X_i + λ_2 Σ_i X_i'X_i]⁻¹ [λ_1 Σ_i X_i'ee'Y_i + λ_2 Σ_i X_i'Y_i]

    = [λ_1 T B_xx + λ_2 T_xx]⁻¹ [λ_1 T B_xy + λ_2 T_xy]

    = [(λ_1/λ_2) T B_xx + T_xx]⁻¹ [(λ_1/λ_2) T B_xy + T_xy],

and, on setting θ = 1 + (λ_1/λ_2)T, we have

    = [(θ − 1) B_xx + T_xx]⁻¹ [(θ − 1) B_xy + T_xy]
    = [θ B_xx + (T_xx − B_xx)]⁻¹ [θ B_xy + (T_xy − B_xy)]
    = [θ B_xx + W_xx]⁻¹ [θ B_xy + W_xy],

where W_xx, B_xx, T_xx, W_xy, B_xy, and T_xy are defined in the main text.

Note:

    (λ_1/λ_2) T = −ρT / (1 − ρ + ρT),

so that

    θ = 1 − ρT/(1 − ρ + ρT) = (1 − ρ)/(1 − ρ + ρT).

Substituting ρ = σ_μ²/(σ_μ² + σ_ν²) gives

    1 − ρ = σ_ν²/(σ_μ² + σ_ν²)   and   1 − ρ + ρT = (σ_ν² + Tσ_μ²)/(σ_μ² + σ_ν²),

so that

    θ = σ_ν² / (σ_ν² + Tσ_μ²).
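The algebra of this appendix can be checked numerically. The following sketch (arbitrary simulated data; names are ours) verifies the closed form of A⁻¹, the identity θ = σ_ν²/(σ_ν² + Tσ_μ²), and the equality of the direct GLS formula with the [W + θB] form:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 8, 5, 2
s_mu2, s_nu2 = 2.0, 1.5
rho = s_mu2 / (s_mu2 + s_nu2)
sigma2 = s_mu2 + s_nu2

# A = (1 - rho) I + rho ee', and the closed-form inverse from the appendix
e = np.ones((T, 1))
A = (1 - rho) * np.eye(T) + rho * (e @ e.T)
lam1 = -rho / ((1 - rho) * (1 - rho + T * rho))
lam2 = 1 / (1 - rho)
A_inv = lam2 * np.eye(T) + lam1 * (e @ e.T)
assert np.allclose(A @ A_inv, np.eye(T))

# theta = 1 + (lam1/lam2) T  equals  sigma_nu^2 / (sigma_nu^2 + T sigma_mu^2)
theta = 1 + (lam1 / lam2) * T
assert np.isclose(theta, s_nu2 / (s_nu2 + T * s_mu2))

# Direct GLS (X'Omega^-1 X)^-1 X'Omega^-1 Y equals the within/between form
X = rng.normal(size=(N, T, K))
Y = rng.normal(size=(N, T))
Omega_inv = np.kron(np.eye(N), A_inv / sigma2)      # blockdiag(A^-1)/sigma^2
Xf, Yf = X.reshape(N * T, K), Y.reshape(N * T)
b_direct = np.linalg.solve(Xf.T @ Omega_inv @ Xf, Xf.T @ Omega_inv @ Yf)

Txx = np.einsum('itk,itl->kl', X, X)
Txy = np.einsum('itk,it->k', X, Y)
Bxx = X.sum(1).T @ X.sum(1) / T
Bxy = X.sum(1).T @ Y.sum(1) / T
b_wb = np.linalg.solve((Txx - Bxx) + theta * Bxx, (Txy - Bxy) + theta * Bxy)
assert np.allclose(b_direct, b_wb)
```

The check exercises exactly the chain of equalities above: spectral inverse of A, the definition of θ, and the cancellation of σ² between the two sides of the normal equations.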

Appendix B

Alternative Estimation Methods

Descriptions of the other estimators examined in certain studies are as follows:

6. The least squares between groups estimator (LSBG):

    β_LSBG = (B_xx)⁻¹ (B_xy).

7. Wallace and Hussain's estimator (WH) [10]:

Wallace and Hussain propose using the estimated OLS residuals as if they were the true residuals to estimate σ_μ² and σ_ν²:

    σ²_ν,WH = [1/(N(T−1))] [W_yy − 2 T_xy'(T_xx)⁻¹W_xy + T_xy'(T_xx)⁻¹W_xx(T_xx)⁻¹T_xy]

    σ²_μ,WH = (1/NT) [B_yy − 2 T_xy'(T_xx)⁻¹B_xy + T_xy'(T_xx)⁻¹B_xx(T_xx)⁻¹T_xy] − (1/T) σ²_ν,WH

8. Amemiya's estimator (AM) [1]:

Identical to WH except using LSDV rather than OLS residuals in the first step:

    σ²_ν,AM = [1/(N(T−1))] [W_yy − W_xy'(W_xx)⁻¹W_xy]

    σ²_μ,AM = (1/NT) [B_yy − 2 W_xy'(W_xx)⁻¹B_xy + W_xy'(W_xx)⁻¹B_xx(W_xx)⁻¹W_xy] − (1/T) σ²_ν,AM

9. Instrumental-variable estimates using x_{i,t−1} as an instrument for y_{i,t−1} (IV) [6]:

With this method ρ is estimated from the calculated residuals û_it by

    ρ̂ = (1/N) Σ_{i=1}^N ū_i² / σ̂_u²,   where   ū_i = (1/T) Σ_{t=1}^T û_it   and   σ̂_u² = Σ_{i=1}^N Σ_{t=1}^T û_it² / NT.

10. Two-round instrumental (2RI) estimates, similar to those in 2RC but using the calculated residuals from the IV method rather than from LSDV [6].

11. Analysis of covariance method (ANCOVA):

    σ²_ν,ANCOVA = [1/(N(T−1)−K)] [W_yy − W_xy'(W_xx)⁻¹W_xy]

    σ²_μ,ANCOVA = [1/(T(N−K))] [B_yy − B_xy'(B_xx)⁻¹B_xy] − (1/T) σ²_ν,ANCOVA

where K = number of slope parameters.

12. Henderson's Method III (H3):

    σ²_ν,H3 = σ²_ν,ANCOVA

    σ²_μ,H3 = a⁻¹ [B_yy + W_xy'(W_xx)⁻¹W_xy − T_xy'(T_xx)⁻¹T_xy − N σ²_ν,H3],

where a = T[N − trace((T_xx)⁻¹B_xx)].

13. Minimum norm quadratic unbiased estimator (MINQUE) [8]:

This method (due to Rao) proposes that we minimize the Euclidean norm of the difference between the actual estimator and the "natural" estimator given μ and ν in

    Y = Xβ + Zμ + ν.

Estimators of σ_ν² and σ_μ² are computed by solving the following simultaneous equation system:

    trace{(RZZ')²} σ²_μ,MINQUE + trace{RZZ'R} σ²_ν,MINQUE = Y'RZZ'RY
    trace{RZZ'R} σ²_μ,MINQUE + trace{R²} σ²_ν,MINQUE = Y'RRY

where

    R = H⁻¹ − H⁻¹X(X'H⁻¹X)⁻¹X'H⁻¹   and   H = (ZZ' + I_NT).

Studies involving the above methods and results:

1. Presence of exogenous variables but no lagged dependent variables. Also examined were methods

    6. LSBG, 7. WH, 8. AM, 11. ANCOVA, 12. H3, 13. MINQUE.

Notes: Possibilities for negative variance estimates exist with any of the

following: MINQUE, ANCOVA, H3, WH, and AM. In such cases σ̂_μ² was put equal to zero. The ANCOVA and H3 methods both lack the property of uniqueness in the estimators produced. The results stated in the main body of the paper hold for these six methods also. The two-step methods listed above are 7, 8, 11, 12, and 13.

2. Presence of exogenous and lagged dependent variables. Also examined were methods

    9. IV, 10. 2RI.

Results:

* Bias:

IV: When β_2 = 0, these estimates are inconsistent and highly erratic behavior occurs. When β_2 = 1, the bias in all parameters is slight.

2RI: The 2RI estimates are greatly affected by the extreme variability of the estimates of ρ obtained from the IV method. The estimates of σ² show the greatest bias and erratic behavior.

* Mean square error:

IV: The MSE ratio is high for β_2 = 0 but falls markedly as β_2 increases.

2RI: Erratic behavior when β_2 = 0, because the underlying estimates of ρ are so poor. Higher MSEs than 2RC even for large values of β_2.

3. Lagged dependent variables only. No further methods were examined.
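As a concreteness check on method 11 (as reconstructed above), the following sketch estimates the two variance components from simulated data; the truncation of a negative σ̂_μ² at zero follows the note above. The design (no intercept, normal regressors) and all names are ours:

```python
import numpy as np

def ancova_variance_components(X, Y):
    """Two-step variance-component estimates (ANCOVA, method 11) from a
    panel X of shape (N, T, K) and Y of shape (N, T); no-intercept design."""
    N, T, K = X.shape
    Txx = np.einsum('itk,itl->kl', X, X)
    Txy = np.einsum('itk,it->k', X, Y)
    Tyy = (Y * Y).sum()
    Bxx = X.sum(1).T @ X.sum(1) / T
    Bxy = X.sum(1).T @ Y.sum(1) / T
    Byy = (Y.sum(1) ** 2).sum() / T
    Wxx, Wxy, Wyy = Txx - Bxx, Txy - Bxy, Tyy - Byy
    # within residual sum of squares over its degrees of freedom
    s_nu2 = (Wyy - Wxy @ np.linalg.solve(Wxx, Wxy)) / (N * (T - 1) - K)
    # between residual sum of squares, corrected by sigma_nu^2 / T
    s_mu2 = (Byy - Bxy @ np.linalg.solve(Bxx, Bxy)) / (T * (N - K)) - s_nu2 / T
    return s_nu2, max(s_mu2, 0.0)   # truncate a negative variance at zero

# Simulated check with true sigma_nu^2 = 1 and sigma_mu^2 = 4
rng = np.random.default_rng(5)
N, T, K = 300, 8, 2
X = rng.normal(size=(N, T, K))
Y = (X @ np.array([1.0, -0.5])
     + rng.normal(scale=2.0, size=(N, 1))      # mu_i
     + rng.normal(size=(N, T)))                # nu_it
s_nu2, s_mu2 = ancova_variance_components(X, Y)
```

The within regression residuals identify σ_ν², while the between (group-mean) regression residuals estimate σ_μ² + σ_ν²/T, which motivates the final correction term.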

Appendix C

Derivation of Nerlove's GLS Estimator

To derive Nerlove's form of the GLS estimator (which involves ρ rather than θ), we note that from line (**) of Appendix A we have

    A⁻¹ = (1/(1 − ρ)) [I_T − (1/T) ee'] + [1/(1 − ρ + ρT)] (1/T) ee'.

Then

    β_GLS = (X'Ω⁻¹X)⁻¹ (X'Ω⁻¹Y)
          = [Σ_{i=1}^N X_i'A⁻¹X_i]⁻¹ [Σ_{i=1}^N X_i'A⁻¹Y_i]
          = [ (1/(1 − ρ)) Σ_i X_i'(I_T − (1/T)ee')X_i + (1/(1 − ρ + ρT)) (1/T) Σ_i X_i'ee'X_i ]⁻¹
            × [ (1/(1 − ρ)) Σ_i X_i'(I_T − (1/T)ee')Y_i + (1/(1 − ρ + ρT)) (1/T) Σ_i X_i'ee'Y_i ]
          = [ (1/(1 − ρ)) W_xx + (1/(1 − ρ + ρT)) B_xx ]⁻¹ [ (1/(1 − ρ)) W_xy + (1/(1 − ρ + ρT)) B_xy ].

This is the form used by Nerlove [6].
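A quick numerical check (simulated data, our names) that Nerlove's ρ-form coincides with the θ-form of Appendix A, using θ = (1 − ρ)/(1 − ρ + ρT):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 6, 4, 2
rho = 0.6
theta = (1 - rho) / (1 - rho + rho * T)

X = rng.normal(size=(N, T, K))
Y = rng.normal(size=(N, T))
Txx = np.einsum('itk,itl->kl', X, X)
Txy = np.einsum('itk,it->k', X, Y)
Bxx = X.sum(1).T @ X.sum(1) / T
Bxy = X.sum(1).T @ Y.sum(1) / T
Wxx, Wxy = Txx - Bxx, Txy - Bxy

# theta-form (Maddala-Mount) vs rho-form (Nerlove); they differ only by
# the common factor 1/(1 - rho) multiplying both sides of the normal equations
b_theta = np.linalg.solve(Wxx + theta * Bxx, Wxy + theta * Bxy)
b_rho = np.linalg.solve(Wxx / (1 - rho) + Bxx / (1 - rho + rho * T),
                        Wxy / (1 - rho) + Bxy / (1 - rho + rho * T))
assert np.allclose(b_theta, b_rho)
```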

References

1. Amemiya, T. "The Estimation of the Variances in a Variance-Components Model." International Economic Review 12, 1 (February 1971): 1-13.

2. Arora, Swarnjit S. "Error Components Models and Their Applications." Annals of Economic and Social Measurement 2, 4 (1973): 451-67.

3. Hsiao, Cheng. "Some Estimation Methods for a Random Coefficient Model." Econometrica 43, 2 (March 1975): 305-25.

4. Maddala, G. S. "The Use of Variance Components Models in Pooling Cross-Section and Time Series Data." Econometrica 39, 2 (March 1971): 341-58.

5. Maddala, G. S., and Mount, T. D. "A Comparative Study of Alternative Estimators for Variance Components Models Used in Econometric Applications." Journal of the American Statistical Association 68, 342 (June 1973): 324-28.

6. Nerlove, Marc. "Further Evidence on the Estimation of Dynamic Economic Relations from a Time Series of Cross-Sections." Econometrica 39, 2 (March 1971): 359-82.

7. Nerlove, Marc. "A Note on Error Components Models." Econometrica 39, 2 (March 1971): 383-96.

8. Rao, C. R. "Estimation of Variance and Covariance Components in Linear Models." Journal of the American Statistical Association 67 (March 1972): 112-15.

9. Theil, Henri. Principles of Econometrics. New York: John Wiley & Sons, 1971.

10. Wallace, T. D., and Hussain, A. "The Use of Error Components Models in Combining Cross-Section with Time Series Data." Econometrica 37, 1 (January 1969): 55-72.
