Division of Research
Graduate School of Business Administration
The University of Michigan

EFFICIENT INFERENCE IN RANDOM COEFFICIENT MODELS WITH
MULTICOLLINEARITY IN THE TIME SERIES REGRESSIONS

Working Paper No. 228

Robert K. Rayner
Roger L. Wright*
The University of Michigan

August 1980

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the express permission of the Division of Research.

*We are grateful to E. Phillip Howrey for helpful discussions and valuable comments.

Introduction

In applications involving pooled cross-sectional and time series data, one often has several variables describing individuals (units, firms) which exhibit cross-sectional variability but little or no time series variability for each individual. This type of data presents a problem for random coefficient regression (RCR) pooling. The usual method of estimation for this model (Swamy, 1971) is a feasible Aitken procedure in which the first step is the estimation of a time series regression for each individual. If some explanatory variables are fixed or nearly fixed over time for one or more individuals, there are problems in implementing this first step. This paper provides an extension of Swamy's RCR model to the case where there may be multicollinearity in some or all of the individual time series regressions. The first section of this paper describes an application in finance which motivates our interest in this problem. In Section Two, we describe a generalization of Swamy's two-stage estimator of the mean β of the vector of coefficients, conditional on known variance-covariance parameters. Section Three deals with the problem of predicting the coefficients of a particular individual, again assuming known variance-covariance parameters. In Section Four, estimation and prediction methods are developed assuming unknown variance-covariance parameters. In Section Five we discuss some empirical results on the finance problem introduced in Section One. A brief summary of the paper is given in Section Six.

1. Estimating Cost of Equity Capital

Rate regulation for electric utilities depends heavily on estimates of the cost of the firm's equity capital. The capital asset pricing model is one of the many techniques used to provide these estimates. According to capital asset pricing theory, the cost of equity capital (in excess of the risk-free rate) to firm i in period t is given by E(y_it) in the model

E(y_it) = γ_i E(R_t).   (1.1)

Here R_t is the excess return in period t of the market portfolio of all stocks. Equation (1.1) states that systematic risk, γ_i, determines differential expected returns among securities and that there is a linear relationship between γ_i and expected return, i.e., cost of capital. In a rate case application, an estimate of γ_i may be obtained from a time series regression of the form

y_i = R γ_i + ε_i,   (1.2)

where y_i and R are T x 1 vectors of the observed excess rates of return on the ith stock and the market, respectively, γ_i is an unknown scalar parameter, and ε_i is a T x 1 vector of disturbances. The estimate g_i of γ_i is then used to help estimate the cost of capital to firm i. However, the question arises as to how well g_i predicts the firm's true γ_i, which is what is needed in estimating cost of capital. We might expect to get better estimates of cost of capital by "comparing" relevant characteristics of the firm to those of other firms like it, i.e., other firms in the industry. As Myers (1978) has noted:

     The distinction between industry and firm systematic risk is important in rate cases. It is hard to estimate a regulated firm's cost of capital if data on only that firm are available. This is true regardless of the approach taken. It is necessary to broaden the sample.

Rosenberg and McKibbon (1973) have devised a pooling approach which is potentially useful for predicting γ_i for an electric utility. They estimate the systematic risk for firm i in period t using a model in which systematic risk is assumed to be linearly related to certain accounting variables,

γ_it = W_it β_i.

In this paper we use the recent theoretical results of Bowman (1979) to help specify the model for systematic risk, and then use a modified Rosenberg-McKibbon (R-M) pooling approach to predict γ_i for electric utilities. Bowman has shown that there is a theoretical relationship between a firm's systematic risk γ_i and the firm's debt ratio. We note that debt ratio is a variable which exhibits variability across firms, but little or no variability over time, for each firm. Thus, we investigate the model

γ_i = W_i β_i,   (1.3)

in which γ_i is an unknown scalar parameter, W_i is a 1 x 2 vector (including the intercept) of observations assumed to be constant over time, and β_i is a 2 x 1 vector of unknown parameters. Then, substitution of (1.3) into (1.2) results in

y_i = R W_i β_i + ε_i   (1.4)
    = X_i β_i + ε_i,

where y_i and R are T x 1 vectors of observations on each of the N firms and the market, respectively, and X_i is the T x 2 matrix defined by X_i = R W_i. Note that for each i, (1.4) is a singular model, because X_i is of rank 1. Now, if we are willing to assume that β_i = β for all i (i = 1,2,...,N), multicollinearity may not be a problem in the pooling model. But a more reasonable assumption is that there are firm differences, and so we consider random coefficient regression pooling.
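To make the singularity in (1.4) concrete, the following minimal numerical sketch (all values hypothetical, and not part of the empirical work reported later) constructs X_i = R W_i for one firm and verifies that it has rank 1, so that the firm-specific OLS estimator of the 2 x 1 vector β_i does not exist.

    # A minimal illustration of the singularity of X_i = R W_i in (1.4): every
    # column of X_i is a multiple of R, so X_i has rank 1.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 24                                   # months of data for one firm
    R = rng.normal(0.0, 0.05, size=(T, 1))   # simulated market excess returns
    W_i = np.array([[1.0, 0.45]])            # 1 x 2: intercept and a hypothetical debt ratio
    X_i = R @ W_i                            # T x 2

    print(np.linalg.matrix_rank(X_i))            # 1, not 2
    print(np.linalg.matrix_rank(X_i.T @ X_i))    # 1, so (X_i'X_i)^{-1} does not exist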

RCR is a parsimonious specification that allows for cross-sectional variation in model parameters. In RCR pooling, each coefficient vector is considered to be a random draw from the same multivariate normal distribution. This seems reasonable for a homogeneous industry like the electric utilities industry. Random coefficient regression is a promising modeling approach for other reasons. It is well suited for drawing inferences from a sample to a population (see, for example, Dielman, Nantell and Wright, 1980). Also, the procedure is computationally efficient. In RCR pooling it is assumed that the coefficient vector β_i is fixed over time. Although not always so, this may be a reasonable assumption in this application, at least over the relatively short time period of the last four years. For a portfolio of electric utilities, Warga (1980) has found that there are dramatic fluctuations in model parameters around the period of the 1973 oil embargo, but that the parameters are relatively stable after 1975. Using monthly data from 1976-1979, we have a reasonable number of observations for the individual time series regressions. Overall, random coefficient regression seems to be a good choice of methodology for the cost-of-capital problem, and so it is the approach that is used here. Coefficients in (1.4) are assumed to be fixed over time for each firm, but they are allowed to vary randomly across firms. Specifically, we model cost of capital by

y_i = (R W_i) β_i + ε_i   (i = 1,2,...,N)   (1.5)
    = X_i β_i + ε_i,

where β_i ~ N_2(β, A), X_i = R W_i, and ε_i ~ N_T(0, σ_ii I).
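The sketch below simulates one cross-section of data from model (1.5). The mean vector β, the matrix A, the debt ratios, and σ_ii are hypothetical values chosen only so that the estimation sketches in later sections have something to run on; none of these numbers come from the data used in Section Five.

    import numpy as np

    rng = np.random.default_rng(1)
    N, T = 64, 24
    beta = np.array([0.65, -0.10])                   # population mean of the beta_i
    A = np.array([[0.36, -0.06],
                  [-0.06, 0.14]])                    # var-cov matrix of the beta_i
    R = rng.normal(0.0, 0.05, size=(T, 1))           # market excess returns, common to all firms

    y_list, X_list, W_list = [], [], []
    for i in range(N):
        W_i = np.array([[1.0, rng.uniform(0.3, 0.6)]])   # intercept and a simulated debt ratio
        beta_i = rng.multivariate_normal(beta, A)        # beta_i ~ N_2(beta, A)
        eps_i = rng.normal(0.0, 0.03, size=T)            # eps_i ~ N_T(0, sigma_ii I)
        X_i = R @ W_i                                    # T x 2 and of rank 1
        y_list.append(X_i @ beta_i + eps_i)              # model (1.5)
        X_list.append(X_i)
        W_list.append(W_i)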

Because X_i is not of full rank, Swamy's (1971) feasible Aitken procedure for estimating β cannot be used directly. The first step in that method is the estimation of individual time series regressions. In the next section we show how to estimate β, conditional on known variance-covariance parameters, when there is multicollinearity in the time series regressions.

2. Estimation With Known Variance-Covariance Parameters

This section introduces the random coefficient regression (RCR) pooling model which is useful in many applications including our financial example. Our version of the RCR model is similar to the pooling models studied extensively by Swamy and others, but features singularity (i.e., multicollinearity) in the time series datasets describing some or all of the individuals (e.g., securities). In this section we will describe an extension of Swamy's two-stage procedure for computing Aitken's generalized least squares estimators in the presence of this multicollinearity. Throughout this section, all variance-covariance parameters are assumed to be known. Consider the following model:

y_i = X_i β_i + ε_i   (i = 1,2,...,N)   (2.1)

where y_i and ε_i are T x 1 vectors, X_i is a T x k matrix, and β_i is a k x 1 vector. The vector y_i and the matrix X_i contain observed variables characterizing individual i, while β_i and ε_i are unobserved random vectors. We assume that

(a) rank(X_i) = r_i ≤ k, but rank(X) = k, where X' = [X_1' X_2' ... X_N'] is the k x NT matrix comprised of the X_i,
(b) β_i is N_k(β, A), where A is positive semidefinite,
(c) ε_i is N_T(0, σ_ii I) with σ_ii > 0, and
(d) β_1, ..., β_N, ε_1, ..., ε_N are mutually independent.

In addition, throughout this section and the next, it is assumed that σ_ii and A are known.

Equation (2.1) can be rewritten as

y_i = X_i β + u_i,   (2.2)

where u_i = X_i(β_i - β) + ε_i. Here u_i is N_T(0, Ω_i) with known variance-covariance matrix Ω_i = X_i A X_i' + σ_ii I, and u_1, ..., u_N are mutually independent. Since X is of full rank, the best linear unbiased estimator of β is uniquely determined by Aitken's generalized least squares estimator:

b = (Σ_{i=1}^{N} X_i' Ω_i^{-1} X_i)^{-1} Σ_{i=1}^{N} X_i' Ω_i^{-1} y_i.   (2.3)

Moreover, the variance-covariance matrix of the estimator b, denoted here as C, is

C = (Σ_{i=1}^{N} X_i' Ω_i^{-1} X_i)^{-1}.

We now turn to the case that each X_i is of full rank, i.e., r_i = k. This situation has been extensively studied by Swamy (1971). Swamy shows that b can be efficiently computed following a two-stage procedure. In the first stage of this procedure, ordinary regression statistics are calculated which summarize the time series observations of each individual. In the second stage, these N sets of time series regression statistics are pooled to estimate the population parameter β. Specifically, in the first-stage time series analysis of each individual i, the following statistics are calculated:

b_i = (X_i'X_i)^{-1} X_i' y_i  and  C_i = A + σ_ii (X_i'X_i)^{-1}.

Here b_i is the vector of ordinary regression coefficients summarizing the time series data describing individual i, and C_i is the variance-covariance matrix of b_i, taking into account both the variance of β_i around β and the conditional variance of b_i as an estimator of β_i:

C_i = E(b_i - β)(b_i - β)' = E(β_i - β)(β_i - β)' + E[E{(b_i - β_i)(b_i - β_i)' | β_i}].

Swamy's second-stage, cross-sectional analysis follows from the GLS estimator (2.3) of β together with the identities X_i' Ω_i^{-1} X_i = C_i^{-1} and X_i' Ω_i^{-1} y_i = C_i^{-1} b_i. In fact,

b = (Σ_{i=1}^{N} C_i^{-1})^{-1} Σ_{i=1}^{N} C_i^{-1} b_i  and   (2.4)

C = (Σ_{i=1}^{N} C_i^{-1})^{-1}.

So in the second stage, b is calculated as a weighted average of the b_i. The computational advantage of (2.4) over (2.3) is substantial, since (2.3) involves the inverse of each Ω_i, which is a T x T matrix, while (2.4) involves the inverses of matrices of size k x k. In many applications, including the finance example described in Section One, Swamy's two-stage procedure cannot be directly applied, since the X_i are of less than full rank (i.e., multicollinear), so that the b_i are undefined. The remainder of this section describes a generalization of Swamy's two-stage procedure which preserves its intuitive appeal and computational advantages and is applicable to the multicollinear case in which some of the r_i are less than k. This generalization utilizes the rank factorization (Rao, 1973, p. 19) of each X_i with rank r_i less than k. Using the rank factorization, we compute matrices R_i and W_i such that X_i = R_i W_i, where R_i is of size T x r_i and is of full rank r_i, and W_i is of size r_i x k and is also of rank r_i. (If X_i is of full rank k, we take R_i = X_i and W_i = I.)

Then we make the following definitions:

γ_i = W_i β_i,   (2.5)
g_i = (R_i'R_i)^{-1} R_i' y_i, and
H_i = W_i A W_i' + σ_ii (R_i'R_i)^{-1}.

To motivate these definitions, we note that the model (2.1) can be rewritten as y_i = R_i γ_i + ε_i, and g_i is the ordinary least squares time series estimator of γ_i, with variance-covariance matrix H_i once the variability of γ_i is taken into account. By employing these definitions and a matrix identity of Rao (1973, exercise 2.9, p. 33), we obtain the identities

X_i' Ω_i^{-1} X_i = W_i' H_i^{-1} W_i and   (2.6)
X_i' Ω_i^{-1} y_i = W_i' H_i^{-1} g_i.

These identities together with (2.3) give

b = (Σ_{i=1}^{N} W_i' H_i^{-1} W_i)^{-1} Σ_{i=1}^{N} W_i' H_i^{-1} g_i,   (2.7)
C = (Σ_{i=1}^{N} W_i' H_i^{-1} W_i)^{-1}.

So Swamy's second-stage weighted average (2.4), which is only applicable if each X_i is of full rank, generalizes to (2.7), which represents the best linear unbiased estimator b as a cross-sectional, generalized MANOVA estimator in which the g_i, computed from the time series regressions, play the role of a vector of observed dependent variables, the W_i are matrices of independent variables, and the H_i represent the variance-covariance matrices of the observations g_i.

The form of (2.7) is consistent with the fact that each g_i is N_{r_i}(W_i β, H_i) and g_1, ..., g_N are mutually independent. The rank factorization of each X_i plays an essential role in this representation of b. The first factor, R_i, is used only in the ith first-stage time series regression, i.e., in the computation of g_i and H_i. These statistics from the N time series regressions are used in the second-stage pooling in conjunction with the set of second factors W_i, which determine the relationships between the observed g_i and the underlying β_i. Hence the two factors of X_i are related respectively to the two stages of analysis: the R_i comprise the time series observations on each individual, and the set of W_i, together with the time series regression statistics g_i and H_i, comprise the cross-sectional information characterizing the sample of individuals.

In concluding this section we note that in many applications involving cross-sectional time series datasets, including our finance example, it is natural to formulate a model which has two components corresponding to the time series and cross-sectional stages of analysis (see, for example, Amemiya (1978) and Hanushek (1974)). In these cases, the rank factorization may arise naturally. In the situations that we have in mind, a time series model, say y_i = R_i γ_i + ε_i, is specified to represent the process generating the time series observations of each individual i (i = 1,2,...,N). Here each R_i is T x r_i and of full rank r_i, and γ_i is an r_i x 1 vector of time-invariant, unobserved characteristics of individual i. A second, cross-sectional model, γ_i = W_i β_i, is specified which expresses the unobserved γ_i as an observed linear transformation W_i of a vector β_i (k x 1) of unobserved, time-invariant characteristics of the individual.

These vectors β_i are assumed to comprise a random sample from a multivariate normal population with unknown mean β and known variance-covariance matrix A. If the cross-sectional model is embedded into the time series model, we obtain the RCR model (2.1), where each X_i is singular, having rank factorization X_i = R_i W_i.
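The two-stage estimator (2.7) is straightforward to program. The following sketch is a minimal implementation under the assumptions of this section (A and the σ_ii known); the function names are ours, and the rank factorization is obtained from a truncated singular value decomposition, although any factorization X_i = R_i W_i with the stated ranks would do.

    import numpy as np

    def rank_factorization(X, tol=1e-10):
        # X (T x k) = R W with R (T x r) and W (r x k), both of rank r; a truncated
        # singular value decomposition is one convenient way to obtain such a factorization.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))
        return U[:, :r] * s[:r], Vt[:r, :]

    def pooled_gls(y_list, X_list, A, sigma_list):
        # Generalized two-stage estimator (2.7):
        #   first stage:  g_i = (R_i'R_i)^{-1} R_i'y_i,  H_i = W_i A W_i' + sigma_ii (R_i'R_i)^{-1}
        #   second stage: b = (sum W_i'H_i^{-1}W_i)^{-1} sum W_i'H_i^{-1} g_i
        k = X_list[0].shape[1]
        M = np.zeros((k, k))
        v = np.zeros(k)
        for y_i, X_i, s_ii in zip(y_list, X_list, sigma_list):
            R_i, W_i = rank_factorization(X_i)
            RtR_inv = np.linalg.inv(R_i.T @ R_i)
            g_i = RtR_inv @ R_i.T @ y_i
            H_inv = np.linalg.inv(W_i @ A @ W_i.T + s_ii * RtR_inv)
            M += W_i.T @ H_inv @ W_i
            v += W_i.T @ H_inv @ g_i
        C = np.linalg.inv(M)
        return C @ v, C     # b and its variance-covariance matrix C

When every X_i has full rank k, one may take R_i = X_i and W_i = I, and the second stage of this sketch collapses to Swamy's weighted average (2.4).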

3. Predicting With Known Variance-Covariance Parameters

In many applications, there is a need to predict the value of β_i (a k x 1 vector) for a specific individual i. For example, in our finance application, interest is in predicting the β_i which, in conjunction with the observed characteristics W_i, determines the ith security's systematic risk. In this section we consider the problem of efficient prediction of β_i. We assume that the RCR model (2.1) is applicable to an available pooled cross-sectional time series database describing a sample of N individuals, and that all variance-covariance parameters are known. The case that each X_i is of full rank is discussed first, followed by the case where some or all of the X_i may be of less than full rank. A natural predictor of β_i when X_i is of full rank is the OLS predictor, b_i = (X_i'X_i)^{-1} X_i' y_i, based on the ith individual. This predictor is unbiased, and of all predictors that are linear in y_i, it is the most efficient. However, as Kadiyala and Oberhelman (1979) note, b_i uses information only on the ith individual and ignores information about all of the other individuals. Therefore, they consider predictors of β_i which are linear in y, i.e., of the form A_i y, where A_i is a (k x NT) matrix of constants. They show that the unbiased, linear predictor of β_i with the lowest mean squared error is given by

b_ci = b + A X_i' Ω_i^{-1} y_i - A X_i' Ω_i^{-1} X_i b,   (3.1)

which simplifies to

b_ci = A(A + σ_ii(X_i'X_i)^{-1})^{-1} b_i + σ_ii(X_i'X_i)^{-1}(A + σ_ii(X_i'X_i)^{-1})^{-1} b.   (3.2)

As Kadiyala and Oberhelman have observed, equation (3.2) has an intuitive interpretation: the predictor is simply a weighted average of the estimator b_i based on the ith individual and the estimator b.

In this formula, σ_ii(X_i'X_i)^{-1} is the variance-covariance matrix of b_i conditional on β_i, and A is the variance-covariance matrix of the β_i. The larger the variance of the OLS estimator b_i, the greater the weight given to b, the estimator of the mean vector; and the greater the dispersion of the β_i, the greater the weight given to the estimator b_i based on observations on the ith individual.

The extension of equation (3.1) to the case that rank(X_i) < k is straightforward. In the Kadiyala-Oberhelman derivation of the "best" predictor of β_i, it is sufficient that X be of full rank, so (3.1) remains valid under our assumptions. Using (2.6), we rewrite (3.1) to obtain the more computationally efficient formula

b_ci = b + A W_i'(W_i A W_i' + σ_ii(R_i'R_i)^{-1})^{-1}(g_i - W_i b).   (3.3)

Equation (3.3) provides the predictor of β_i that has the smallest mean squared error (in the class of linear, unbiased predictors) when rank(X_i) < k. Equation (3.3) appears at first not to have an easy interpretation. However, we note that it is, in fact, closely related to an intuitively appealing estimator which we derive now. Although it has been observed previously that β_i may not be estimable under generalized rank conditions on X_i, it is possible to get a g-inverse solution for β_i in the equation

W_i β_i = g_i.   (3.4)

Given an estimate b of β, an intuitively appealing estimator of β_i for the ith individual is the minimum norm g-inverse solution of (3.4). (For a suitably chosen positive definite matrix Q, the norm of a vector a is defined by ||a|| = (a'Qa)^{1/2}.) Rao (1973, p. 48) shows that the minimum norm g-inverse, with respect to this norm, is given by Q^{-1}W_i'(W_i Q^{-1} W_i')^{-}, where "-" denotes a g-inverse.

Thus, after transforming (3.4) to an origin at b,

W_i(β_i - b) = g_i - W_i b,

the minimum norm solution to (3.4) is given by b + Q^{-1}W_i'(W_i Q^{-1}W_i')^{-}(g_i - W_i b), and choosing Q = A^{-1}, we get

b + A W_i'(W_i A W_i')^{-}(g_i - W_i b).   (3.5)

The expression in (3.5) is very similar to the predictor in (3.3). However, (3.5) does not allow for the variability of g_i conditional on known β_i. In some applications, it is of interest to predict γ_i = W_i β_i. For example, in our finance problem, we are interested in systematic risk γ_i. It is easy to show that the best predictor of W_i β_i is W_i b_ci, and from (3.3)

W_i b_ci = W_i b + W_i A W_i'(W_i A W_i' + σ_ii(R_i'R_i)^{-1})^{-1}(g_i - W_i b),

which can be written as

W_i b_ci = (W_i A W_i')(W_i A W_i' + σ_ii(R_i'R_i)^{-1})^{-1} g_i + σ_ii(R_i'R_i)^{-1}(W_i A W_i' + σ_ii(R_i'R_i)^{-1})^{-1} W_i b.   (3.6)

Equation (3.6) admits the same kind of interpretation as did (3.2). In this section we have derived efficient predictors for β_i and γ_i = W_i β_i for the case that rank(X_i) < k. As in the previous section, results were obtained assuming known variance-covariance parameters in model (2.1). In practice, the variance-covariance parameters must be estimated, and so it is relevant to ask how prediction is affected. Kadiyala and Oberhelman cite some Monte Carlo evidence that the use of b_ci, computed from (3.3) with estimated parameters, is particularly warranted when N is large and T is small.
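A minimal sketch of the predictors (3.3) and (3.6), again assuming A and σ_ii are known and the first-stage statistics g_i, R_i, W_i have been computed; the function name is ours.

    import numpy as np

    def predict_individual(b, A, W_i, R_i, g_i, sigma_ii):
        # Predictor (3.3) of beta_i and the implied predictor (3.6) of gamma_i = W_i beta_i:
        #   b_ci = b + A W_i'(W_i A W_i' + sigma_ii (R_i'R_i)^{-1})^{-1} (g_i - W_i b)
        RtR_inv = np.linalg.inv(R_i.T @ R_i)
        H_inv = np.linalg.inv(W_i @ A @ W_i.T + sigma_ii * RtR_inv)
        b_ci = b + A @ W_i.T @ H_inv @ (g_i - W_i @ b)
        return b_ci, W_i @ b_ci      # W_i b_ci is the weighted average in (3.6)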

4. Estimation with Unknown Variance-Covariance Parameters

In this section we consider estimation in the RCR model with unknown variance-covariance parameters and multicollinearity in some or all of the individual time series regressions. A computationally efficient form for the likelihood function of the parameters in the case rank(X_i) < k is derived first. Then, because it is not clear whether the parameters are identified, we look at identification in the RCR model in the presence of collinearity. The information matrix for the parameters is derived, and two conditions are obtained which, in conjunction with assumptions (a) thru (d) in Section Two, are sufficient for the information matrix to be of full rank, thus establishing local identifiability (see Rothenberg (1971)). Then Fisher's method of scoring (Rao, 1973, pp. 366-374), a procedure for maximum likelihood estimation, is proposed for estimating the A matrix. We conclude this section with an iterative estimation procedure based on equation (3.3).

Introducing some additional notation, we write the system of equations (2.2) as y = Xβ + u, with y' = (y_1' ... y_N'), u' = (u_1' ... u_N'), and E(uu') = Ω. The vectors u_1, ..., u_N are mutually independent, so that Ω = diag(Ω_i). We assume that A depends on a finite number of unknown parameters δ' = (δ_1 ... δ_M). Let θ' = (δ' σ'), where σ' = (σ_11 ... σ_NN). Then by our assumptions, the likelihood function for the parameters β and θ is given by

L(β, θ | y, X) = (2π)^{-NT/2} Π_{i=1}^{N} |X_i A X_i' + σ_ii I_T|^{-1/2}
                 · exp{-1/2 Σ_{i=1}^{N} (y_i - X_i β)'(X_i A X_i' + σ_ii I_T)^{-1}(y_i - X_i β)}.   (4.1)
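For reference, the following sketch evaluates the log of (4.1) directly; each term forms and factors the T x T matrix Ω_i = X_i A X_i' + σ_ii I, which is exactly the cost that the rank factorization form derived next avoids. The function name is ours.

    import numpy as np

    def loglik_direct(beta, A, sigma_list, y_list, X_list):
        # Direct evaluation of the log of (4.1); each term works with a T x T matrix.
        ll = 0.0
        for y_i, X_i, s_ii in zip(y_list, X_list, sigma_list):
            T = len(y_i)
            Omega_i = X_i @ A @ X_i.T + s_ii * np.eye(T)
            resid = y_i - X_i @ beta
            _, logdet = np.linalg.slogdet(Omega_i)
            ll -= 0.5 * (T * np.log(2.0 * np.pi) + logdet
                         + resid @ np.linalg.solve(Omega_i, resid))
        return ll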

Using the rank factorization of the X_i, the definitions (2.5), and again the matrix identity in Rao, equation (4.1) can be written as

L(β, θ | y, X) = (2π)^{-NT/2} Π_{i=1}^{N} σ_ii^{-(T-r_i)/2} |R_i'R_i|^{-1/2} |H_i|^{-1/2}
                 · exp{-1/2 Σ_{i=1}^{N} [(T - r_i) s_ii/σ_ii + (g_i - W_i β)' H_i^{-1}(g_i - W_i β)]},   (4.2)

where s_ii is the OLS estimator of σ_ii from the ith time series equation. Equation (4.2) provides substantial computational advantages over (4.1), because the matrices in (4.2) are of order r_i x r_i, whereas those in (4.1) are T x T (cf. Swamy, 1971, pp. 111-112).

We now look at identification in the RCR model with generalized rank assumptions. It is clear that if rank(X_i) < k, then β_i from the ith time series regression is unidentified, but what of the parameters β and θ? It is not clear from (4.2) whether they are identified or not. We use the information matrix to examine parameter identifiability. According to Rothenberg, a necessary and sufficient condition for the β and θ parameters to be locally identified is that the information matrix be nonsingular. Magnus (1978) has shown that the information matrix is block diagonal and is given by

diag[ X'Ω^{-1}X ,  1/2 (∂vecΩ^{-1}/∂θ)'(Ω ⊗ Ω)(∂vecΩ^{-1}/∂θ) ].   (4.3)

In (4.3), the notation vec Ω^{-1} means the (NT)^2 x 1 vector obtained from Ω^{-1} by stacking its first column, then the second, and so forth; and ⊗ is the Kronecker product operator. Also, we use the definition of matrix derivative of Dhrymes (1978), so that ∂vecΩ^{-1}/∂θ is an (NT)^2 by (M + N) matrix of first partial derivatives.

By assumption, X'Ω^{-1}X is of full rank, and hence β is locally identified. A necessary and sufficient condition for θ to be locally identified is that the matrix 1/2 (∂vecΩ^{-1}/∂θ)'(Ω ⊗ Ω)(∂vecΩ^{-1}/∂θ) be nonsingular. Now Ω is nonsingular, and so Ω ⊗ Ω is nonsingular. Thus, we need to establish conditions under which ∂vecΩ^{-1}/∂θ (or equivalently ∂vecΩ/∂θ) is of full rank (M + N).

We approach the problem by deriving a convenient expression for ∂vecΩ/∂θ, using the matrix formulas in Dhrymes and, again, the rank factorization idea and Rao's matrix identity. The Ω matrix is block diagonal, so that

rank(∂vecΩ/∂θ) = rank[(∂vecΩ_1/∂θ)' ... (∂vecΩ_N/∂θ)']'.   (4.4)

Also, it can be shown that

∂vecΩ_i/∂θ = [(R_i ⊗ R_i)(W_i ⊗ W_i)(∂vecA/∂δ)   ∂vec(σ_ii I_T)/∂σ'].   (4.5)

Then, after substituting (4.5) into the matrix V = [(∂vecΩ_1/∂θ)' ... (∂vecΩ_N/∂θ)']' of (4.4), it follows that

rank(∂vecΩ/∂θ) = rank(V) = rank(V'V),   (4.6)

and so we need conditions under which the matrix V'V is of full rank. It can be shown that sufficient for V'V to be of full rank, and hence θ identified, are: (1) the matrix W(∂vecA/∂δ) is of rank M, where W = [(W_1 ⊗ W_1)' ... (W_N ⊗ W_N)']', and (2) T > r_i, for i = 1, 2, ..., N.
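Condition (1) is easy to check numerically. The sketch below does so for the special case in which A is an unrestricted symmetric matrix, so that δ consists of its M = k(k+1)/2 distinct elements and ∂vecA/∂δ is the duplication matrix; this parameterization is our own illustrative assumption, and the function names are ours.

    import numpy as np

    def duplication_matrix(k):
        # D with vec(A) = D vech(A) for symmetric A; its columns correspond to the
        # M = k(k+1)/2 distinct elements of A, taken column by column.
        M = k * (k + 1) // 2
        D = np.zeros((k * k, M))
        col = 0
        for j in range(k):
            for i in range(j, k):
                D[i + j * k, col] = 1.0
                if i != j:
                    D[j + i * k, col] = 1.0
                col += 1
        return D

    def condition_one_holds(W_list, k):
        # Condition (1): rank of W (dvecA/ddelta) equals M, with W the stacked W_i (x) W_i.
        W_stack = np.vstack([np.kron(W_i, W_i) for W_i in W_list])
        D = duplication_matrix(k)
        return np.linalg.matrix_rank(W_stack @ D) == D.shape[1]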

Assumptions (a) thru (d), with conditions (1) and (2) above, are sufficient for identifiability of all of the RCR parameters. Now consider estimation of (β, δ, σ). It does not seem to be computationally feasible to compute the maximum likelihood estimator of σ, since matrices of order N^2 are encountered. However, using the rank factorization, an ordinary regression estimator s_ii is easily calculated and is an unbiased, consistent (T → ∞) estimator of σ_ii (Goldberger, 1964, pp. 269-272). Conditional on these estimates of σ_ii, maximum likelihood estimation of β and δ is computationally feasible using Fisher's method of scoring (Rao, 1973, pp. 366-374). We use scoring on δ, combined with (2.7), for estimation of A. To implement this procedure, it is necessary to derive the first-order conditions and information matrix for δ. Starting from equation (13) in Magnus, we can show that the first-order condition on δ is given by

∂ log L/∂δ = -1/2 Σ_{i=1}^{N} (∂vecA/∂δ)'(W_i ⊗ W_i)' vec(H_i^{-1})
             + 1/2 Σ_{i=1}^{N} (∂vecA/∂δ)'(W_i ⊗ W_i)' vec(H_i^{-1}(g_i - W_i β)(g_i - W_i β)' H_i^{-1}).   (4.7)

Similarly, the information matrix for δ is given by

1/2 Σ_{i=1}^{N} (∂vecA/∂δ)'(W_i ⊗ W_i)'(H_i^{-1} ⊗ H_i^{-1})(W_i ⊗ W_i)(∂vecA/∂δ).   (4.8)

Unfortunately, there is no assurance that a scoring procedure using (4.7)-(4.8) will give positive semidefinite estimates of A. This problem of "negative variance components" is one that has long been troublesome for random coefficient regression and other variance components models (see, for example, Swamy, 1971, pp. 107-111).

A potential remedy for this problem is suggested by the papers of Box (1966) and Dent-Hildreth (1977). Box shows that in certain specific cases, a problem of constrained maximization of the likelihood function can sometimes be converted to one involving unconstrained optimization without introducing additional local optima. We are investigating the transformation A = TT', where T is lower triangular, as a method of obtaining a positive semidefinite estimate of the A matrix. Unconstrained optimization is used, with the search being conducted over the elements of the T matrix rather than over the elements of A. Equations (4.7) and (4.8) need to be modified to take advantage of the A = TT' transformation. Letting t be the M x 1 vector of elements of T, where A = TT', it can be shown that

∂vecA/∂t = ∂vec(TT')/∂t = (T ⊗ I)(∂vecT/∂t) + (I ⊗ T)(∂vecT'/∂t),

so that

(W_i ⊗ W_i)(∂vecA/∂t) = (W_iT ⊗ W_i)(∂vecT/∂t) + (W_i ⊗ W_iT)(∂vecT'/∂t).   (4.9)

While equations (4.7) and (4.8), in conjunction with (4.9), may appear formidable at first glance, our initial investigations indicate that the formulas are actually computationally efficient and relatively easy to program. The matrix ∂vecT/∂t, for example, is k^2 x M, but each column of this matrix contains only one 1, all other elements in the column being 0. Computational advantage can be had using this fact. And although we went to some trouble to obtain formulas (4.7)-(4.9), computational advantage may be obtained because the first-order conditions and the information matrix are given explicitly, rather than their having to be computed numerically. Also, in practice it may not be necessary to compute and invert the information matrix at each iteration (Rao, 1973, p. 370).
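The following sketch carries out one step of the method of scoring on t, using (4.7), (4.8), and (4.9) with A = TT'; it assumes that b and the first-stage statistics g_i, W_i, R_i, and s_ii are already in hand, and the function names are ours. Because the new estimate of A is T T', it is positive semidefinite by construction.

    import numpy as np

    def lower_triangular_derivatives(k):
        # dvecT/dt and dvecT'/dt for a lower triangular T whose free elements t are
        # taken column by column; each column of either matrix contains a single 1.
        idx = [(i, j) for j in range(k) for i in range(j, k)]
        dvecT = np.zeros((k * k, len(idx)))
        dvecTp = np.zeros((k * k, len(idx)))
        for col, (i, j) in enumerate(idx):
            dvecT[i + j * k, col] = 1.0      # position of T[i, j] in vec(T)
            dvecTp[j + i * k, col] = 1.0     # position of T[i, j] in vec(T')
        return idx, dvecT, dvecTp

    def scoring_step(Tmat, b, W_list, R_list, g_list, sigma_list):
        # One Fisher scoring step on t using (4.7)-(4.9) with A = T T'.
        k = Tmat.shape[0]
        idx, dvecT, dvecTp = lower_triangular_derivatives(k)
        I_k = np.eye(k)
        dvecA_dt = np.kron(Tmat, I_k) @ dvecT + np.kron(I_k, Tmat) @ dvecTp   # (4.9)
        A = Tmat @ Tmat.T
        m = len(idx)
        score, info = np.zeros(m), np.zeros((m, m))
        for W_i, R_i, g_i, s_ii in zip(W_list, R_list, g_list, sigma_list):
            RtR_inv = np.linalg.inv(R_i.T @ R_i)
            H_inv = np.linalg.inv(W_i @ A @ W_i.T + s_ii * RtR_inv)
            r_i = g_i - W_i @ b
            J_i = np.kron(W_i, W_i) @ dvecA_dt                     # dvecH_i/dt
            U_i = H_inv @ np.outer(r_i, r_i) @ H_inv - H_inv
            score += 0.5 * J_i.T @ U_i.reshape(-1, order="F")      # (4.7)
            info += 0.5 * J_i.T @ np.kron(H_inv, H_inv) @ J_i      # (4.8)
        t = np.array([Tmat[i, j] for i, j in idx]) + np.linalg.solve(info, score)
        T_new = np.zeros_like(Tmat)
        for col, (i, j) in enumerate(idx):
            T_new[i, j] = t[col]
        return T_new      # the new estimate of A is T_new @ T_new.T

In practice each scoring step would be alternated with a recomputation of b from (2.7), with the log-likelihood (4.2) monitored for convergence.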

By way of summarizing this part of the paper, we note that even though the transformation A = TT' is not one to one, our limited investigations (at this time) show that this lack of uniqueness appears not to cause convergence problems, and that this transformation, with the method of scoring, may provide computationally efficient, positive semidefinite estimates of the A matrix of the RCR model. In Section Five we present some empirical results which were obtained using formulas (4.1) thru (4.9).

At this point we briefly discuss another potential estimation method for (β, A, σ). Equation (3.3) suggests an iterative procedure of estimating A by

Σ_{i=1}^{N} (b_ci - b)(b_ci - b)'/N.

The procedure may at least be useful in providing a starting value for a maximum likelihood search.
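A minimal sketch of this iteration, with our own function name and assuming the first-stage statistics g_i, W_i, R_i, and s_ii are available; because the updated A is an average of outer products, it is positive semidefinite, which makes it a convenient starting value.

    import numpy as np

    def starting_value_for_A(W_list, R_list, g_list, sigma_list, k, n_iter=10):
        # Iterate: (i) pooled GLS b given A from (2.7); (ii) predictors b_ci from (3.3);
        # (iii) replace A by the average of (b_ci - b)(b_ci - b)'.
        A = np.eye(k)
        for _ in range(n_iter):
            M, v, kept = np.zeros((k, k)), np.zeros(k), []
            for W_i, R_i, g_i, s_ii in zip(W_list, R_list, g_list, sigma_list):
                RtR_inv = np.linalg.inv(R_i.T @ R_i)
                H_inv = np.linalg.inv(W_i @ A @ W_i.T + s_ii * RtR_inv)
                M += W_i.T @ H_inv @ W_i
                v += W_i.T @ H_inv @ g_i
                kept.append((W_i, H_inv, g_i))
            b = np.linalg.solve(M, v)                                        # (2.7)
            devs = [A @ W_i.T @ H_inv @ (g_i - W_i @ b) for W_i, H_inv, g_i in kept]
            A = sum(np.outer(d, d) for d in devs) / len(devs)
        return A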

5. An Application: Predicting Systematic Risk for Electric Utilities

In this section we return to the financial application of Section One. The model we use here is oversimplified and is not intended as a meaningful test of financial hypotheses. Its purpose is to illustrate the procedures developed in the previous sections. We are pursuing a more comprehensive analysis of systematic risk in financial markets. However, this simple illustration suggests that pooling is a promising approach.

A sample was taken of 64 electric utilities that have data available on both the CRSP and COMPUSTAT data tapes over the period January 1976 thru December 1979. Estimation was done using the 24 monthly observations from 1976-77; forecasting was done over the period 1978-79. Debt ratio is a variable that is available from annual data. Because it typically changes very little over time periods as short as several years, we simply calculated W_i as the average of the two annual values for 1976-77 for each electric utility in the sample. The procedures developed in the previous sections are applicable because all of the X_i matrices are singular. The σ_ii were estimated from the individual time series regressions. The identity matrix was used as an initial estimate of the A matrix, and b was calculated from the GLS formula (2.7). Scoring formulas (4.7)-(4.9) were then used to provide a new estimate of A, and the iterative procedure was continued until the estimates of β and A were stable to four decimal places. The results are reported below:

b = (0.6455, -0.0928)', with t-statistics of 1.64 and -0.10, respectively; and the estimated elements of A were (a_11, a_21, a_22) = (0.3569, -0.0640, 0.1417).

Neither b coefficient is significant at the .05 level of significance. Thus, we are not able to conclude here that debt ratio is a determinant of systematic risk γ_i, as the theoretical finance literature suggests. A relationship may still exist, however, because there is evidence of multicollinearity in our model. The condition number, the ratio of the largest singular value of the variance-covariance matrix of b to the smallest, is 40.9, and multicollinearity may cause problems in parameter estimation when the condition number is greater than 10 to 30; see Belsley, Kuh and Welsch (1980). It appears that there may not be enough cross-sectional variability of debt ratio in our sample.

The main concern of this application is prediction of systematic risk γ_i. We did a mean squared error study, using the following forecasting methods:

1. A firm-specific method in which the estimates g_i of γ_i over the period 1976-77 were used to predict the g_i calculated over the forecast period 1978-79.

2. A pooling method in which the forecasting formula (3.6), evaluated over the period 1976-77, was used to predict the g_i calculated from the period 1978-79.

The results are reported below:

Method 1, Root MSE: 0.2636
Method 2, Root MSE: 0.1857

These results are encouraging. In this application, the pooling approach has a root MSE that is 30% less than that of the firm-specific method, which is the approach usually taken.

6. Summary

This paper is concerned with efficient inference in Swamy's random coefficient regression (RCR) model when there is singularity, i.e., multicollinearity, in some or all of the individual time series regressions. In particular, efficient estimation of the parameters of the RCR model is considered, as is efficient prediction of the coefficient vector β_i of the ith individual. Methods are developed first under the assumption of known variance-covariance parameters, and then with this assumption relaxed. We derive sufficient conditions for identifiability of all of the parameters of the RCR model. Then Fisher's method of scoring is developed for estimation of the A matrix parameters. A transformation, suggested by the papers of Box and Dent-Hildreth, is incorporated into the scoring procedure. Our initial investigations suggest that scoring with this transformation produces computationally efficient, positive semidefinite estimates of the A matrix.

The problem of multicollinearity in the time series regressions is motivated by an application in finance, namely, estimating the cost of equity capital to electric utilities. The model is introduced in Section One and some preliminary empirical results are given in Section Five. These results are encouraging; in this particular application, the pooling methods gave a 30% reduction in root mean squared forecast error compared to forecasts from the usual firm-specific technique.

References

1. Amemiya, T. (1978), "A Note on a Random Coefficients Model," International Economic Review, 19, 793-796.

2. Belsley, D. A., Kuh, E. and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley & Sons.

3. Bowman, R. G. (1979), "The Theoretical Relationship Between Systematic Risk and Financial (Accounting) Variables," Journal of Finance, 34, 617-630.

4. Box, M. J. (1966), "A Comparison of Several Current Optimization Methods, and the Use of Transformations in Constrained Problems," Computer Journal, 9, 67-77.

5. Dent, W. T. and Hildreth, C. (1977), "Maximum Likelihood Estimation in Random Coefficient Models," Journal of the American Statistical Association, 72, 69-72.

6. Dielman, T., Nantell, T. J. and Wright, R. L. (1980), "Price Effects of Stock Repurchasing: A Random Coefficient Regression Approach," Journal of Financial and Quantitative Analysis, 15, 171-189.

7. Dhrymes, P. J. (1978), Mathematics for Econometrics, New York: Springer-Verlag.

8. Goldberger, A. S. (1964), Econometric Theory, New York: John Wiley & Sons.

9. Hanushek, E. A. (1974), "Efficient Estimators for Regressing Regression Coefficients," The American Statistician, 28, 66-67.

10. Kadiyala, K. R. and Oberhelman, D. (1979), "Response Predictions in Regressions on Panel Data," Proceedings of the American Statistical Association, Business and Economic Statistics Section, 92-97.

11. Magnus, J. R. (1978), "Maximum Likelihood Estimation of the GLS Model with Unknown Parameters in the Disturbance Covariance Matrix," Journal of Econometrics, 7, 281-312.

12. Myers, S. C. (1978), "On the Use of Modern Portfolio Theory in Public Utility Cases: Comment," Financial Management, 7, Number 3, 66-76.

13. Rao, C. R. (1973), Linear Statistical Inference and Its Applications, Second Edition, New York: John Wiley & Sons.

14. Rosenberg, B. and McKibbon, W. (1973), "The Prediction of Systematic and Specific Risk in Common Stocks," Journal of Financial and Quantitative Analysis, 8, 317-334.

15. Rothenberg, T. J. (1971), "Identification in Parametric Models," Econometrica, 39, 577-591.

16. Swamy, P. A. V. B. (1971), Statistical Inference in Random Coefficient Regression Models, Berlin: Springer-Verlag.

17. Swamy, P. A. V. B. (1973), "Criteria, Constraints and Multicollinearity in Random Coefficient Regression Models," Annals of Economic and Social Measurement, 2, 429-450.

18. Tracy, D. S. and Dwyer, P. S. (1969), "Multivariate Maxima and Minima with Matrix Derivatives," Journal of the American Statistical Association, 64, 1576-1594.

19. Warga, A. D. (1980), "Risk Instability in the Electric Utility Industry," unpublished doctoral dissertation, University of Michigan.