RESEARCH SUPPORT
UNIVERSITY OF MICHIGAN BUSINESS SCHOOL
AUGUST 1997

HIERARCHICAL BAYES METHODS FOR MARKET MODEL ESTIMATION AND PORTFOLIO SELECTION

WORKING PAPER #9712-16

BY MARTIN R. YOUNG, UNIVERSITY OF MICHIGAN BUSINESS SCHOOL
PETER J. LENK, UNIVERSITY OF MICHIGAN BUSINESS SCHOOL

Hierarchical Bayes Methods for Market Model Estimation and Portfolio Selection

Martin R. Young (myoung@umich.edu) and Peter J. Lenk (plenk@umich.edu)
University of Michigan School of Business, Department of Statistics and Management Science, Ann Arbor, MI 48109-1234
August, 1997

Abstract

The market model is an important construct for both portfolio managers and researchers in modern finance. For practitioners, market model coefficients are used to guide the construction of optimal portfolios. For academicians, the market model parameters play a fundamental role in explaining equilibrium asset prices and other market phenomena. This paper presents a hierarchical modeling procedure which can substantially improve the accuracy of market model parameter estimates through incorporation of cross-sectional information. It is shown that this improvement in parameter estimation accuracy translates into substantial improvement in portfolio performance. Expressions are derived which characterize the sensitivity of portfolio performance to parameter estimation error. Evidence with NYSE data suggests that the hierarchical estimation technique leads to superior out-of-sample portfolio performance when compared to alternative estimation approaches.

KEY WORDS: Beta, Estimation Risk, Markov Chain Monte Carlo, Sensitivity, Shrinkage

1 Introduction

Stochastic linear models for asset returns play a central role in modern financial theory and practice. Such models describe the return on each particular asset in a market as varying linearly with some exogenous factors:

$y_{jt} = \alpha_j + \beta_j' f_t + \varepsilon_{jt}, \quad j = 1, \ldots, p, \quad t = 1, \ldots, n,$

where $y_{jt}$ is the return of stock $j$ at time $t$, $f_t = (f_{t1}, \ldots, f_{tr})'$ are the values of the underlying factors at time $t$, $\alpha_j$ is the intercept and $\beta_j = (\beta_{j1}, \ldots, \beta_{jr})'$ are the factor coefficients for stock $j$, and $\varepsilon_{jt}$ is a random deviation, independent of $f_t$, with mean zero and variance $v_j$. For example, the traditional single factor market model (e.g., Elton and Gruber 1995) describes the covariation of asset returns via the model:

$y_{jt} = \alpha_j + \beta_j m_t + \varepsilon_{jt},$  (1)

where the single factor $m_t$ represents a market index, such as the S&P 500 index, which measures the general performance of the stock market. More recently, Fama and French (1995) modeled the covariation of stock returns via the three factor model

$y_{jt} - r_{ft} = \alpha_j + \beta_{1j}(m_t - r_{ft}) + \beta_{2j}(SML_t - r_{ft}) + \beta_{3j}(HML_t - r_{ft}) + \varepsilon_{jt},$  (2)

where $r_{ft}$ is the "risk free" rate of return, $SML_t$ ("Small Minus Large") is the return on a portfolio composed of the firms with smallest market value minus the return on a portfolio composed of firms with large market value, and $HML_t$ ("High Minus Low") is the return on a portfolio composed of the firms with a high ratio of book value to market value minus the return on a portfolio composed of stocks with low values of this ratio.

The parameter values $(\alpha_j, \beta_j, v_j)$ for the different assets can be used to select portfolios that achieve optimal tradeoffs between risk and return (Elton and Gruber 1995). In addition, these parameters appear as explanatory variables in financial models for asset

pricing (e.g., Fama and MacBeth 1973; Fama and French 1992; Fama and French 1995). The parameter values $(\alpha_j, \beta_j, v_j)$ of course cannot be directly observed, but can only be imperfectly estimated from finite samples of data. This unavoidable estimation error presents a problem to both theorists and practitioners.

The sensitivity of financial models to estimation error suggests two remedies: reducing estimation error, and correctly accounting for it. Vasicek (1973) noted, in the context of a single factor model such as (1), that the estimate of a beta coefficient for a particular security could be improved by use of cross-sectional information. For example, if the betas of the stocks tend to range between 0.5 and 1.5, then a beta estimate of 2.0 is more likely to be an overestimate than an underestimate. Thus, the estimate can possibly be improved by shrinking it towards the cross-sectional mean of 1.0. Barry (1973) developed a Bayesian approach to portfolio analysis, which includes the use of predictive distributions to account for parameter uncertainty. Rosenberg and James (1976) suggested that the incorporation of fundamental financial quantities, such as firm size or liquidity, could lead to improved estimates of betas. More recently, Jorion (1986), Frost and Savarino (1986), and Board and Sutcliffe (1994) used Stein-type shrinkage estimators to improve upon the usual least squares procedure for obtaining parameter estimates for the portfolio selection problem, and Karolyi (1992) used multiple shrinkage to help estimate betas.

In this paper, we introduce a hierarchical market model and Bayesian estimation procedure which incorporate the above improvements to the simple least squares estimation procedure in a unified model.
This hierarchical modeling procedure, which jointly models both the cross-sectional and the time-series variation in stock returns, has the desirable feature that the degree of adjustment of the usual least squares estimates is determined automatically for each security, based on relevant attributes of the data. For securities whose parameter estimates are uncertain, either because the security has only been observed for a short period of time, or because the security's returns do not closely

fit the market model, the parameter estimates will be substantially modified toward a cross-sectional mean, which may be determined by fundamental characteristics of the firm, such as firm size or industry sector. For securities whose least squares parameter estimates are reliable, in the sense of having small standard errors, the estimates will be relatively unmodified.

In addition, the paper explores the impact of estimation error on portfolio selection. We demonstrate that the improvement in estimation accuracy achieved by hierarchical Bayes estimation leads to improved portfolio selection. We develop analytic measures which characterize the sensitivity of optimal portfolio allocations with respect to parameter estimation error. One result obtained is that small idiosyncratic variance for a given asset indicates potentially large error in the estimated portfolio allocation for that asset.

The paper has the following outline. Section 2 defines the hierarchical market model. Section 3 presents a simulation study of the hierarchical Bayesian (HB) parameter estimation and portfolio selection procedure, and in Section 4, historical data from the NYSE are analyzed by HB and alternative methods; with both the simulated and the real data, it is shown that the HB method outperforms the competitors in terms of out-of-sample forecast accuracy and portfolio performance. Section 5 presents a sensitivity analysis. Appendix A describes the numerical algorithm used to implement the HB method.

2 The Hierarchical Market Model

The linear factor model (3) relates the return $y_{jt}$ for asset or firm $j$ at time $t$ to the returns $f_t$ on a vector of economic factors by a simple linear regression:

$y_{jt} = \alpha_j + \beta_j' f_t + \varepsilon_{jt}, \quad j = 1, \ldots, p, \quad t = 1, \ldots, n,$  (3)

where the $\varepsilon_{jt}$ are independent residuals, which will be assumed to have a normal distribution with mean 0 and variance $v_j$: $\varepsilon_{jt} \sim N(0, v_j)$.

The hierarchical market model describes the cross-sectional variation in the parameters $\alpha_j$, $\beta_j$, and $v_j$ across the firms in the population. Here, the parameters $(\alpha_j, \beta_j)$ are assumed to be related to covariates $z_j^{\alpha}$ and $z_j^{\beta}$ according to the linear regression equations:

$\alpha_j = \theta_0' z_j^{\alpha} + u_{j0}, \quad u_{j0} \sim N(0, \Lambda_0),$  (4)

$\beta_{jk} = \theta_k' z_j^{\beta} + u_{jk}, \quad u_{jk} \sim N(0, \Lambda_k), \quad j = 1, \ldots, p, \quad k = 1, \ldots, r,$  (5)

with the $u_{jk}$ mutually independent for $k = 0, 1, \ldots, r$. The covariates $z_j^{\alpha}$ and $z_j^{\beta}$ may contain such variables as firm size, leverage, and other accounting numbers, as well as indicator variables representing the industry segment for the $j$th firm.

The firm-specific variances $v_j$ may also be related to fundamental quantities of the firm. Let $z_j^{v}$ be a vector of fundamental variables, and let $r_j = \log(v_j)$. Then $r_j$ can be modeled as:

$r_j = \gamma' z_j^{v} + w_j, \quad w_j \sim N(0, \delta).$  (6)

Equation (6) establishes the prior distribution on the unknown firm-specific variance $v_j = \exp(r_j)$.
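As an illustration only, the following Python sketch draws firm-level parameters from the prior equations (4) through (6). The function name, the use of a single covariate matrix for all three equations, the name `gamma` for the coefficient vector in equation (6), and the numerical values for that vector and for the log-variance spread are assumptions made for the example, not quantities taken from the paper.

```python
import numpy as np

def draw_firm_parameters(Z, theta, lam, gamma, delta, rng):
    """Draw (alpha_j, beta_j, v_j) for every firm from priors (4)-(6).

    Z     : (p, q) covariates, shared by all three equations (an assumption)
    theta : (r + 1, q) coefficients; row 0 is for alpha, rows 1..r for beta
    lam   : (r + 1,) prior variances Lambda_0, ..., Lambda_r
    gamma : (q,) coefficients of the log-variance equation (6)
    delta : prior variance of the log idiosyncratic variance
    """
    p = Z.shape[0]
    means = Z @ theta.T                        # (p, r + 1) prior means
    coefs = means + rng.normal(size=means.shape) * np.sqrt(lam)
    alpha, beta = coefs[:, 0], coefs[:, 1:]
    log_v = Z @ gamma + rng.normal(size=p) * np.sqrt(delta)
    return alpha, beta, np.exp(log_v)

rng = np.random.default_rng(0)
Z = np.ones((30, 1))                           # intercept-only covariates
theta = np.array([[0.0], [1.0]])               # prior means of alpha and beta
lam = np.array([0.70 ** 2, 0.25 ** 2])         # prior variances
# gamma = 4.0 and delta = 0.5 are hypothetical values chosen for illustration.
alpha, beta, v = draw_firm_parameters(Z, theta, lam, np.array([4.0]), 0.5, rng)
```

Because the variance is modeled on the log scale, every draw of `v` is positive without any truncation.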

Equations (3)-(6) describe a hierarchical regression model (Lindley and Smith 1972), sometimes referred to as a population model (Wakefield, Smith, Racine-Poon, and Gelfand 1994); the multiple shrinkage estimator of George (1986) is a related approach. Wakefield, Smith, Racine-Poon, and Gelfand (1994) and Mueller and Rosner (1994) describe alternative models that could be used for the Gaussian hyper-prior distributions; these alternatives include a Student-t distribution and a mixture of Gaussian distributions.

As is discussed in Section 3.2, the selection of optimal portfolios is based upon an estimate of the joint distribution of future returns $\{y_{jt}\}$. To obtain this distribution, one must model both the conditional distribution of the returns $y_{jt}$ given the factor returns $f_t$, and also the marginal distribution of the factor returns. Here, this latter will be treated as a multivariate normal:

$f_t \sim N(\mu_f, \Omega_f).$  (7)

Equations (3) and (7) together imply that the joint moments for the returns $y_{jt}$ are given by:

$E[y_{jt}] = \alpha_j + \beta_j' \mu_f, \quad Var[y_{jt}] = \beta_j' \Omega_f \beta_j + v_j, \quad Cov[y_{jt}, y_{kt}] = \beta_j' \Omega_f \beta_k.$  (8)

2.1 Prior Distributions on Model Hyperparameters

The parameters $\{\theta_k, \Lambda_k\}$, $k = 0, \ldots, r$, $\gamma$, and $\delta$ in the prior distributions (4)-(6) are typically referred to as hyperparameters; to complete the specification of the Bayesian model, one will require prior distributions on these hyperparameters, as well as on the parameters $\mu_f$ and $\Omega_f$ which characterize the distribution of the independent variables in the market model (3). In this paper, we use noninformative priors for all of these parameters, in order to allow the data, rather than the priors, to determine posterior

conclusions. The priors on the location parameters $\theta$, $\gamma$, and $\mu_f$ are (improper) uniform distributions over all possible parameter values. The prior on $\Omega_f^{-1}$ is also taken to be uniform. The priors on the $\Lambda_k$ and on $\delta$ are taken to be $IG(a_0, a_1)$, where $v \sim IG(a_0, a_1)$ denotes that $v^{-1}$ has the gamma distribution with mean $a_0/a_1$ and variance $a_0/a_1^2$. In the applications described in this paper, we use $a_0 = 1$, $a_1 = 0.1$, to provide proper, but very diffuse, priors.[1]

The joint posterior distribution for the unknown model parameters in equations (3)-(7) cannot be evaluated analytically. Appendix A, though, describes how the model can be analyzed numerically, using a Markov chain Monte Carlo algorithm (Roberts and Smith 1993). Section 3 describes a simulation study designed to evaluate the performance of the hierarchical Bayes estimator.

3 Estimation Accuracy and Portfolio Performance

This section evaluates the hierarchical Bayes (HB) and least squares (LS) estimators in terms of estimation accuracy and portfolio performance. We show in Section 3.1 that the HB estimators have significantly smaller estimation error than the LS estimators, and in Section 3.2 we show that this improvement in estimation accuracy leads to improved portfolio performance. In Section 5, we develop analytic expressions characterizing the sensitivity of portfolio performance to estimation accuracy.

[1] Berger (1985, page 187) discusses the fact that the usual Jeffreys noninformative prior cannot be used for the scale parameters in a hierarchical model, since such priors can lead to improper posterior distributions.

3.1 Parameter Estimation Accuracy

For the simulation study used in this section and the next, random datasets were generated from the hierarchical model defined by equations (3)-(7), with $r = 1$, and with $z_j^{\alpha} = z_j^{\beta} = z_j^{v} = 1$, $j = 1, \ldots, p$. For each of $p$ stocks, values of $\alpha_j$, $\beta_j$, and $v_j$ were assigned by random generation from the laws given by (4)-(6). Then, for each time period $t$, a value of the (scalar) factor index was generated from a $N(\mu_f, \Omega_f)$ distribution, and for each stock $j$ a value of the return $y_{jt}$ was generated by equation (3). This model corresponds to the standard single factor model, as in equation (1).

Four different settings, or blocks, for model parameters were used in the simulation. The settings were selected to mimic typical monthly stock returns and to demonstrate the performance of HB and LS estimates with different sample sizes and amounts of parameter heterogeneity. Block 1 mimics two years (24 months) of monthly data; the cross-sectional means and standard deviations for $\alpha$, $\beta$, and $r$ in this block were set equal to values estimated from a random sample of 500 New York Stock Exchange companies during 1988-1991: $E[\alpha] = 0.00$, $SD[\alpha] = 0.70$, $E[\beta] = 1.00$, $SD[\beta] = 0.25$, $E[v] = 105$, $SD[v] = 175$. Blocks 2 and 3 change the sample size, to 72 months and 12 months. In Block 4, the sample size is 24 months, but the cross-sectional parameter heterogeneity is increased by a factor of 4, relative to Block 1: $SD[\alpha] = 2.80$, $SD[\beta] = 1.00$. For each simulation block, the number of firms generated equalled 30, the mean and standard deviation of the market index $f_t$ were 1.0 and 4.0 respectively, and the cross-sectional correlations between $\alpha$, $\beta$, and $r$ were equal to 0.0. 100 replications were simulated for each block. In the following, Blocks 1, 2, and 3 will be referred to as "NYSE(24)", "NYSE(72)", and "NYSE(12)", and Block 4 will be referred to as "HIGHHET(24)" (for "high heterogeneity").
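The data-generating step for one replication of the 24-month block can be sketched in Python as follows, together with the least squares benchmark. This is a sketch under stated assumptions rather than the authors' code: in particular, the lognormal moment matching used to turn E[v] = 105 and SD[v] = 175 into a mean and variance for log v is an assumption about how those settings were applied, and the seed is arbitrary. The hierarchical Bayes fit is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 30, 24                                  # firms, months (Block 1)

# Assumed lognormal moment matching so that E[v] = 105 and SD[v] = 175.
delta = np.log(1.0 + (175.0 / 105.0) ** 2)     # variance of log v
mean_log_v = np.log(105.0) - delta / 2.0       # mean of log v

alpha = rng.normal(0.00, 0.70, size=p)
beta = rng.normal(1.00, 0.25, size=p)
v = np.exp(rng.normal(mean_log_v, np.sqrt(delta), size=p))

f = rng.normal(1.0, 4.0, size=n)               # market index returns
eps = rng.normal(size=(n, p)) * np.sqrt(v)     # idiosyncratic deviations
y = alpha + np.outer(f, beta) + eps            # (n, p) returns, equation (3)

# Least squares fit of every stock on the index in one call.
X = np.column_stack([np.ones(n), f])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
alpha_ls, beta_ls = coef[0], coef[1]
resid = y - X @ coef
v_ls = (resid ** 2).sum(axis=0) / (n - 2)      # unbiased residual variance

# Mean absolute errors of the least squares estimates for this replication.
mae_alpha = np.abs(alpha_ls - alpha).mean()
mae_beta = np.abs(beta_ls - beta).mean()
```

Repeating this over many seeds and averaging `mae_alpha` and `mae_beta` would give the least squares side of a comparison like the one the simulation study reports.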
Table 1 compares the mean absolute error (MAE) of the hierarchical Bayes (HB) and

least squares (LS) estimates for the simulations. The table lists the mean and standard deviation of the MAE taken over all 100 simulation replications within a block. In the NYSE(24) block, the MAEs are substantially lower for the HB estimates: the differences are significant at p < .001 for all comparisons of HB and LS, using a nonparametric sign test. The comparison for the alpha coefficient estimates is noteworthy: over the 100 replications, the average HB error is less than 22% of the average LS error. The results from the NYSE(72) block show that even for a large sample size, HB offers substantial advantages relative to LS: average HB error is less than 31% of the average LS error. Also, the results from the HIGHHET(24) block show that, even with cross-sectional parameter heterogeneity much higher than is seen in actual returns data, the HB estimator is considerably more efficient than the LS estimator.

Researchers analyzing stock market time series data face the problem of choosing an appropriate length of data for analysis. If the dataset is too long, then the parameters may not be constant over the entire period of observation, while if the dataset is too short, the data may be insufficient to accurately estimate model parameters. Here, it is shown that an HB estimation approach may help with this conflict. Comparing the estimation error of betas in the NYSE(72) and NYSE(12) blocks, it is seen that the average LS error with 72 time series observations, 0.21, is greater than the average HB error with only 12 time series observations, 0.19. Thus, HB may permit the efficient analysis of time series short enough that parameters may reasonably be assumed to be constant over the observation period.[2]

[2] Alternatively, the HB framework can be used to explicitly model time-varying parameters, in the spirit of West and Harrison (1989).

The hierarchical Bayes method makes some additional assumptions about the underlying data generating process which are not required with the least squares method.
In this simulation experiment, the assumptions are not violated, but in practice, if the

assumptions are grossly incorrect, then the least squares estimator could outperform the shrinkage estimator in terms of estimation accuracy. Section 4 describes an experiment with real stock returns data, and it is seen that, in this one setting with non-simulated data, the HB method again substantially outperforms the least squares estimator.

3.2 Portfolio Performance

Portfolio allocations $w$ were chosen to maximize the expected utility, where the utility function was taken to be the negative exponential $U(y) = 1 - \exp(-\lambda y)$, with $\lambda$ being a parameter expressing aversion to risk (e.g., Frost and Savarino 1986). In general, maximizing the expected utility will require numerical optimization, but in the case in which the distribution of future returns $y_{jt}$ is well approximated by a normal distribution, with cross-sectional mean and variance $\mu$ and $\Sigma$, the distribution of any portfolio $w'y = \sum_{j=1}^{p} w_j y_{jt}$ will also be normal, and the allocation vector which maximizes the expected utility will be the solution to the Markowitz (1952) optimization problem

$\max_w \; w'\hat{\mu} - \frac{\lambda}{2} w'\hat{\Sigma} w, \quad \text{subject to } w'1 = 1,$  (9)

where $\hat{\mu}$ and $\hat{\Sigma}$ represent estimates of the joint moments, and $1$ is a vector of 1's. The maximizer of (9) is given by

$w^* = \lambda^{-1} \hat{\Sigma}^{-1} \hat{\mu} + \frac{1 - \lambda^{-1} 1'\hat{\Sigma}^{-1}\hat{\mu}}{1'\hat{\Sigma}^{-1} 1} \hat{\Sigma}^{-1} 1.$

The least squares estimates of the moments $\mu$ and $\Sigma$ can be determined from the respective estimates of parameters $(\alpha_j, \beta_j, v_j)$ and $(\mu_f, \Omega_f)$, using the formulas (8). For Bayes methods, the appropriate variance matrix to use is the so-called predictive variance (Barry 1973); the predictive variance incorporates uncertainty about model parameters in addition to the usual sampling uncertainty about future observed returns. The method for obtaining the hierarchical Bayes predictive moments is described in Appendix A.3. Press (1982) and Quintana (1992) provide further discussion of the use of Bayesian techniques in portfolio

selection.

The quality of an implied portfolio allocation vector $w$ can be evaluated by the expected utility under the true model for the future returns. Let this true model be denoted by $y \sim N(\mu_0, \Sigma_0)$. The true expected utility can be converted to a certainty equivalent (Elton and Gruber 1995), which is the return, in dollars, such that the investor will be indifferent between holding the risky portfolio $w'y$ and holding the certainty equivalent. In our case, the certainty equivalent $C$ is found by solving $1 - \exp(-\lambda C) = E[1 - \exp(-\lambda w'y)]$, and the solution is $C = w'\mu_0 - \frac{\lambda}{2} w'\Sigma_0 w$. The benchmark for the certainty equivalent is the value obtained when $w$ is computed by solving problem (9) using the true moments $\mu_0$, $\Sigma_0$.

The bottom rows of Table 1 display the quality of portfolios formed on the basis of the true mean and covariance, as well as the hierarchical Bayes and least squares estimates of these moments. For each simulation replication, the portfolios are obtained by solving the maximization problem (9). In all blocks, the performance of the HB portfolios, as measured by the mean certainty equivalent, is significantly better than that of the LS portfolios (p < .001). In the NYSE(24) block, the mean optimal certainty equivalent if the true model parameters were known was 1.29 (1.29% per month), while the mean certainty equivalent for the portfolio based on HB estimates was 0.46. However, the mean certainty equivalent for the LS portfolios was -12.85. Thus, the deterioration in portfolio performance due to estimation error is much more severe for the LS than for the HB method. A similar pattern is observed in the other simulation blocks.

[Insert Table 1 Here]
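A minimal Python sketch of the allocation rule and the certainty equivalent follows. It assumes the mean-variance objective w'mu - (lambda/2) w'Sigma w, which is the form implied by negative exponential utility under normal returns; the three-asset parameter values and the perturbation added to the means are invented for illustration and are not taken from the simulation study.

```python
import numpy as np

def optimal_weights(mu, sigma, lam):
    """Maximize w'mu - (lam / 2) w'Sigma w subject to w'1 = 1."""
    ones = np.ones(len(mu))
    s_mu = np.linalg.solve(sigma, mu)          # Sigma^{-1} mu
    s_one = np.linalg.solve(sigma, ones)       # Sigma^{-1} 1
    shift = (1.0 - ones @ s_mu / lam) / (ones @ s_one)
    return s_mu / lam + shift * s_one

def certainty_equivalent(w, mu0, sigma0, lam):
    """C = w'mu0 - (lam / 2) w'Sigma0 w, evaluated at the true moments."""
    return w @ mu0 - 0.5 * lam * (w @ sigma0 @ w)

# Single-factor moments built as in equation (8); all values are made up.
alpha = np.array([0.2, -0.1, 0.4])
beta = np.array([0.8, 1.0, 1.3])
v = np.array([60.0, 100.0, 150.0])
mu_f, var_f, lam = 1.0, 16.0, 10.0
mu0 = alpha + beta * mu_f
sigma0 = var_f * np.outer(beta, beta) + np.diag(v)

w_true = optimal_weights(mu0, sigma0, lam)
# Weights computed from mis-estimated means, then scored under the truth.
w_noisy = optimal_weights(mu0 + np.array([1.0, -1.0, 0.5]), sigma0, lam)
ce_true = certainty_equivalent(w_true, mu0, sigma0, lam)
ce_noisy = certainty_equivalent(w_noisy, mu0, sigma0, lam)
```

Since `w_true` maximizes the true objective over all fully invested portfolios, `ce_noisy` cannot exceed `ce_true`; the gap between the two is the cost of estimation error in certainty-equivalent terms.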

4 Example: N.Y.S.E. Stock Returns

The previous section compared the performance of the hierarchical Bayesian method to the performance of a non-pooled estimator, ordinary least squares, using simulated data. This section reports the results of an experiment in which monthly returns data from the New York Stock Exchange were analyzed via hierarchical Bayesian methods and via an alternative shrinkage method, the multiple shrinkage estimator of Karolyi (1992).

In this experiment, monthly returns data on 500 randomly selected securities in the Center for Research in Security Prices (C.R.S.P.) database were obtained for 19 four-year intervals: 1955-1959, 1957-1961, ..., 1991-1994. For each four-year interval, the data from the first two years were used to obtain parameter estimates for a linear factor model, along with the implied optimal portfolio allocations, using hierarchical Bayes and multiple shrinkage techniques. The parameter estimates were then compared to the ordinary least squares estimates obtained over the third and fourth years of the interval, to evaluate out-of-sample parameter estimation accuracy of the two methods. Also, the estimated portfolio weights were applied to the returns in the third and fourth years to assess the quality of the portfolio selections.

The particular factor model used in the analyses was

$y_{jt} = \alpha_j + \beta_{1j} m_t + \beta_{2j} SML_t + \varepsilon_{jt}, \quad \varepsilon_{jt} \sim N(0, v_j),$  (10)

where $m_t$, the overall market factor, denotes the C.R.S.P. value-weighted market index, and $SML_t$, the "size sensitivity" factor, represents the return on a portfolio composed of all the firms with smallest market value (lowest size decile in the C.R.S.P. dataset) minus the return on a portfolio composed of firms with large market value (highest size decile in the C.R.S.P. dataset). See Fama and French (1995) for further discussion of this factor model.

The multiple shrinkage method of Karolyi (1992) was originally introduced in order to obtain superior estimates of $\beta$ coefficients in the single index model (1). The method produces an estimate for stock $j$ which is a weighted average of, say, $L + 1$ different estimates $\hat{\beta}_{jl}$, $l = 0, \ldots, L$, where $\hat{\beta}_{j0}$ is just the ordinary least squares estimate. The other $L$ estimates $\hat{\beta}_{jl}$, $l = 1, \ldots, L$, are cross-sectional group means for the OLS coefficients; for example, $\hat{\beta}_{j1}$ might be the average OLS $\beta$ estimate for all companies in the same size class as firm $j$, and $\hat{\beta}_{j2}$ the average OLS $\beta$ estimate for all companies in the same industry as firm $j$. The weights $\nu_l$ are set equal to the relative precision of each of the estimates $\hat{\beta}_{jl}$: the weight $\nu_0$ on the OLS estimate is just the inverse of the sampling variance for the OLS estimator, and the weights $\nu_l$, $l = 1, \ldots, L$, are the inverses of the cross-sectional variances of the $\beta$ estimates within the $L$ classes.

Chan and Chen (1988) suggest using firm size as an instrument for predicting market beta. For the data analysis in this section, the classes used for the multiple shrinkage estimator were the firm size (market value) decile and the industry segment, where the different industry classifications used were "manufacturing" (SIC code ∈ [2000, 3999]), "utilities" (SIC ∈ [4900, 4999]), "finance/insurance" (SIC ∈ [6000, 6999]), "services" (SIC ∈ [7000, 8999]), and "other"; the industries represented in the last class included agriculture, mining, construction, and retail trade. The multiple shrinkage method of Karolyi was applied, using these classes, for obtaining pooled estimates for the $\beta$ coefficients, the $\alpha$ coefficients, and the idiosyncratic variances $v$ in model (10).

For the hierarchical Bayes method, the same information set (the market values and the industry classifications for the firms) was used to construct the predictor covariates for the hierarchical prior distributions for $\alpha_j$, $\beta_{jk}$, and $v_j$ as was used to form classes for the multiple shrinkage method.
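The precision-weighted average described above can be sketched in a few lines of Python. This is an illustrative reading of the verbal description, not Karolyi's published formula; the function name, the toy inputs, and the use of the sample variance with `ddof=1` for the within-class variance are assumptions.

```python
import numpy as np

def multiple_shrinkage(beta_ols, se_ols, groupings):
    """Precision-weighted average of OLS betas and class means.

    beta_ols  : (p,) OLS estimates
    se_ols    : (p,) OLS standard errors
    groupings : list of (p,) integer label arrays, one per classification
    Assumes every class has at least two members with distinct estimates.
    """
    weights = [1.0 / se_ols ** 2]              # precision of the OLS estimate
    estimates = [beta_ols]
    for labels in groupings:
        group_mean = np.empty_like(beta_ols)
        group_prec = np.empty_like(beta_ols)
        for g in np.unique(labels):
            members = labels == g
            group_mean[members] = beta_ols[members].mean()
            # Inverse of the cross-sectional variance within the class.
            group_prec[members] = 1.0 / beta_ols[members].var(ddof=1)
        estimates.append(group_mean)
        weights.append(group_prec)
    weights, estimates = np.array(weights), np.array(estimates)
    return (weights * estimates).sum(axis=0) / weights.sum(axis=0)

# Toy inputs: six firms, two size classes, two industry classes.
beta_ols = np.array([0.6, 0.9, 1.1, 1.4, 2.0, 0.7])
se_ols = np.array([0.2, 0.2, 0.3, 0.3, 0.5, 0.1])
size_class = np.array([0, 0, 0, 1, 1, 1])
industry = np.array([0, 1, 0, 1, 0, 1])
beta_ms = multiple_shrinkage(beta_ols, se_ols, [size_class, industry])
```

A firm with a large OLS standard error receives a small weight on its own estimate and is pulled more strongly toward its class means, which is the behavior the weighting scheme is designed to produce.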
In this application, $z_j^{\alpha}$, $z_j^{\beta}$, and $z_j^{v}$ were specified as $z_j^{\alpha} = z_j^{\beta} = z_j^{v} = z_j$, where $z_j = (z_{j1}, z_{j2}, z_{j3}, z_{j4}, z_{j5})$, with $z_{j1}$ through $z_{j4}$ defined as indicator variables for manufacturing, utilities, finance/insurance, and services firms,

respectively, and $z_{j5}$ defined as the size decile for firm $j$. The Markov chain used in the hierarchical Bayes method was run for 2000 iterations, with the first 1000 samples discarded.

For each estimation method, portfolio weights were selected as the solution to optimization problem (9), with risk aversion parameter $\lambda$ fixed at 10.0. The MS estimates of the moments $\mu$ and $\Sigma$ were determined from the estimates of parameters $(\alpha_j, \beta_j, v_j)$ and $(\mu_f, \Omega_f)$ using equation (8). The calculation of the HB estimates of $\mu$ and $\Sigma$ is described in Appendix A.3.

4.1 Parameter Estimates

Figure 1 plots the posterior means for the coefficients of $z$ in the prior equations (4)-(6), for the 19 two-year estimation periods between 1955-1956 and 1991-1992. The signs for several of the coefficients are consistent over time. For example, the coefficient for $z_2$, the indicator for utilities, is consistently negative, both for predicting idiosyncratic variance (Figure 2(b)) and market beta (Figure 2(c)). The small market beta for utilities suggests that these firms have relatively little undiversifiable risk; the sensitivity analysis to be presented in Section 5, however, demonstrates that the low idiosyncratic variance for utilities may render the firms highly sensitive to estimation risk. As expected, $z_5$, firm market value, is highly related to the size sensitivity measure $\beta_2$ (Figure 2(d)); it is also the case that larger firms have consistently lower market beta, $\beta_1$, and, as is suggested in Malkiel and Xu (1997), lower idiosyncratic risk $v$.

4.2 Prediction Accuracy and Portfolio Performance

Table 2 describes the accuracy with which the hierarchical Bayes (HB) and multiple shrinkage (MS) estimates from the leading two-year periods predict least squares

estimates from the trailing two-year hold-out periods. The table lists, for each of the methods, the mean and standard deviation, over the 19 analyses, of the mean absolute error (MAE), across all 500 firms, in predicting future least squares parameter values. The row in the table labeled "# HB" refers to the number of periods in which the HB method outperformed the alternative estimator; the p-value reported is from the associated binomial sign test. For example, the MAE for the HB estimator for the $\beta$ coefficients was smaller than that for the MS estimator in 18 of the 19 analyses; this difference is statistically significant (p < .001). For the other model parameters, $\alpha$ and $v$, the HB estimator was more accurate as well.

Table 2 also displays the out-of-sample performance of the estimated portfolios, in terms of certainty equivalents, the risk-adjusted measure of portfolio performance. It is seen that the HB method significantly outperforms the MS method in terms of portfolio performance: in 14 out of 19 periods, the HB portfolio had a higher risk-adjusted return than the MS portfolio.

The HB and the MS methods both employ shrinkage, and both make use of fundamental information on the firm to estimate model parameters. One possible reason that the HB method could outperform the MS method in some settings is that the MS formula does not correspond exactly to a Bayes estimate, in that the formula takes into account the precisions of the different estimates used to form the weighted average, but does not take into account the correlations between the estimates. While in the present setting the HB method outperformed the MS method, one can expect that in many cases the two methods could have similar performance.

[Insert Table 2 Here]

5 Sensitivity Analysis

The previous sections demonstrated the considerable extent to which estimation error degrades portfolio performance. In this section, the relationship between mis-estimated market model parameters and mis-estimated portfolio weights is explored in greater detail.

Sensitivity analysis for portfolio selection in the general mean-variance framework has been discussed in Best and Grauer (1991) and Chopra and Ziemba (1993). In these studies, it was seen that portfolio selection is sensitive to errors in parameter estimates, and particularly to errors in estimates of the asset means. In this section, the sensitivity analysis is specialized to the case of the factor model for stock returns, in which setting the intercept parameters $\alpha_j$ are critical in determining asset means. It is shown that, for data with similar characteristics to N.Y.S.E. stock returns: (a) the error in estimating $w_j^*$, the optimal weight for stock $j$, is due largely to the error in estimating $\alpha_j$, and less to estimating $\alpha_k$, $k \neq j$; and (b) the sampling variance of the estimated weight for stock $j$, $\hat{w}_j$, is approximately inversely proportional to $\hat{v}_j$. The findings suggest that the value $1/\hat{v}_j$ can serve as a useful diagnostic in portfolio optimization: a security with a high value of this quantity is "suspect", in the sense that the security's estimated portfolio weight is likely to be far from the true optimal weight.

The sensitivity analysis derived in this section will be conducted with respect to the least squares estimator, though it is appropriate asymptotically for the hierarchical Bayes estimator as well. Attention will be directed to the common special case in which there is just a single independent factor in model (3):

$y_{jt} = \alpha_j + \beta_j f_t + \varepsilon_{jt}, \quad \varepsilon_{jt} \sim N(0, v_j);$

this restriction simplifies the necessary notation. In this setting, if the factor $f_t$ in

equation (3) has been scaled to have zero mean and variance $v_f$, and if the $\varepsilon_{jt}$ are independent of $\varepsilon_{kt}$ for $k \neq j$, then the sampling distributions of the model parameter estimates $\{\hat{\alpha}_j, \hat{\beta}_j, \hat{v}_j; \; j = 1, \ldots, p\}$ are all independent of each other, with sampling variances given by

$Var[\hat{\alpha}_j] = v_j/n, \quad Var[\hat{\beta}_j] = v_j/(n v_f), \quad Var[\hat{v}_j] = 2 v_j^2/n.$

It follows from the independence of the parameter estimates that the first order approximation to the sampling variances for the optimal portfolio weights $\hat{w}_j$, $j = 1, \ldots, p$, is given by

$Var[\hat{w}_j] \approx \sum_{k=1}^{p} \left(\frac{\partial w_j}{\partial \alpha_k}\right)^2 Var[\hat{\alpha}_k] + \sum_{k=1}^{p} \left(\frac{\partial w_j}{\partial \beta_k}\right)^2 Var[\hat{\beta}_k] + \sum_{k=1}^{p} \left(\frac{\partial w_j}{\partial v_k}\right)^2 Var[\hat{v}_k].$  (11)

Because of the special importance of estimating the asset means in portfolio analysis (Chopra and Ziemba (1993) find that "errors in means are about eleven times as important as errors in variances"), attention will be focused on estimation of the $\alpha$ coefficients; the variance of $\hat{w}_j$ will thus be approximated by

$Var[\hat{w}_j] \approx \sum_{k=1}^{p} \left(\frac{\partial w_j}{\partial \alpha_k}\right)^2 Var[\hat{\alpha}_k].$

It is shown in Appendix B that $\partial w_j/\partial \alpha_j$ will tend to be more significant in magnitude than $\partial w_j/\partial \alpha_k$, $k \neq j$; further, the analysis in Appendix B and the computations below show that $\partial w_j/\partial \alpha_j$ can be very accurately approximated by $1/(\lambda v_j)$. Altogether, this leads to the simple approximation:

$Var[\hat{w}_j] \approx \left(\frac{\partial w_j}{\partial \alpha_j}\right)^2 Var[\hat{\alpha}_j] \approx (\lambda v_j)^{-2} \cdot v_j/n = 1/(\lambda^2 n v_j) \propto 1/v_j;$

i.e., the uncertainty about $w_j$ is inversely related to the estimated idiosyncratic variance $\hat{v}_j$.

The simulation described in Section 3 helps to clarify the relationship between parameter estimation error and portfolio weight estimation error. Table 3 describes the correlations between the squared errors in estimating $w_j$, and: (a) the squared errors in estimating $\alpha_j$ (ERR(alpha)); (b) the squared errors in estimating $\beta_j$ (ERR(beta)); (c) the squared errors in estimating $v_j$ (ERR(var)); (d) the squared errors in estimating $\alpha_j$, multiplied by the sensitivity $1/v_j^2$ (ERR-SENS(alpha)); and (e) the estimated variance of $\hat{w}_j$, $1/\hat{v}_j$ (EST-VAR(w)).
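The own-sensitivity approximation can be checked numerically with a short Python sketch. It assumes the allocation rule that maximizes w'mu - (lambda/2) w'Sigma w subject to full investment, a single zero-mean factor so that asset means equal the intercepts, and hypothetical parameter draws; the helper names are illustrative and nothing here reproduces Appendix B.

```python
import numpy as np

def optimal_weights(mu, sigma, lam):
    """Fully invested mean-variance weights for risk aversion lam."""
    ones = np.ones(len(mu))
    s_mu = np.linalg.solve(sigma, mu)
    s_one = np.linalg.solve(sigma, ones)
    return s_mu / lam + (1.0 - ones @ s_mu / lam) / (ones @ s_one) * s_one

def weight_jacobian(sigma, lam):
    """Matrix of partial derivatives d w_j / d alpha_k for the rule above."""
    ones = np.ones(sigma.shape[0])
    s_inv = np.linalg.inv(sigma)
    s_one = s_inv @ ones
    return (s_inv - np.outer(s_one, s_one) / (ones @ s_one)) / lam

rng = np.random.default_rng(2)
p, lam, var_f = 30, 10.0, 16.0                 # hypothetical settings
alpha = rng.normal(0.0, 0.7, size=p)           # asset means (zero-mean factor)
beta = rng.normal(1.0, 0.25, size=p)
v = rng.uniform(40.0, 300.0, size=p)
sigma = var_f * np.outer(beta, beta) + np.diag(v)

jac = weight_jacobian(sigma, lam)
own = np.diag(jac)                             # exact d w_j / d alpha_j
approx = 1.0 / (lam * v)                       # approximation 1 / (lambda v_j)
rel_gap = np.abs(own / approx - 1.0)           # relative approximation error

# The rule is affine in alpha, so a unit bump reproduces a Jacobian column.
bumped = (optimal_weights(alpha + np.eye(p)[0], sigma, lam)
          - optimal_weights(alpha, sigma, lam))
```

Under this rule the exact own-sensitivity is always positive and never exceeds 1/(lambda v_j), so `rel_gap` measures how much the approximation overstates it for each asset.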
The correlations are computed across the 30 securities for each simulation replication; Table 3 lists the average and standard deviation of these

correlations, over the 100 simulation replications. The rows corresponding to ERR(alpha), ERR(beta) and ERR(var) confirm the central importance of estimating the $\alpha$ parameters in determining portfolio weights: the average correlation between mis-estimation of $\alpha$ and mis-estimation of $w^*$ is generally very high. The rows corresponding to ERR-SENS(alpha) demonstrate the additional importance of the sensitivity factors $1/v_j^2$, as the mean correlations reported are even greater than those for ERR(alpha). In practice, ERR(alpha) cannot be observed, since its calculation depends on the true, but unknown, value of $\alpha_j$. However, Table 3 shows that the observable quantity $1/\hat{v}_j$ (EST-VAR(w)) tends to be positively correlated with the error in estimating $w_j^*$: the measure $1/\hat{v}_j$ can thus serve as a diagnostic in assessing possible errors in portfolio allocation. In each of the blocks, the correlation of $1/v_j$ with the exact value of $\partial w_j/\partial \alpha_j$, presented in Appendix B, is over 0.95.

Classical portfolio theory (e.g., Elton and Gruber 1995) has held that the idiosyncratic variance of a security, $v_j$, is economically inconsequential, since idiosyncratic risks can, in principle, be diversified away. Malkiel and Xu (1997), though, show that idiosyncratic variance appears to be cross-sectionally correlated with ex-post expected returns. This section has demonstrated a second respect in which idiosyncratic variance may be significant: the uncertainty of an estimate of optimal portfolio weight $w_j$ can in practice be quantified by the simple expression $1/\hat{v}_j$.

[Insert Table 3 Here]

6 Conclusion

This paper advocates the use of hierarchical Bayes methods for estimating market model parameters and for selecting portfolios. These methods automatically and optimally use cross-sectional data to improve upon parameter estimates for each individual firm.

The hierarchical Bayes procedure incorporates parameter uncertainty into the estimate of predictive variance, thus allowing for rational management of estimation risk. The improvement in estimation accuracy leads to improved portfolio performance. This paper shows through a sensitivity analysis that the optimal portfolio weights are most strongly affected by estimation error when the idiosyncratic variances are small.

A Markov Chain Monte Carlo Estimation

A.1 Gibbs Sampling

Exact finite sample inferences on the parameters of the hierarchical regression model can be obtained using Gibbs sampling (Gelfand and Smith 1990; Müller 1991). Gibbs sampling is a particular variant of the class of procedures known as "Markov chain Monte Carlo methods" (Roberts and Smith 1993), in which parameter vectors are randomly generated from a Markov chain whose stationary distribution is equal to the joint posterior distribution of the model parameters. A Gibbs sampler involves iterative resampling from the full conditional posterior distributions of all of the model parameters (Gelfand and Smith 1990). For the hierarchical market model of Section 2, all of the associated parameters can be sampled directly, except for the variances $v_j$, and for these parameters a Metropolis-Hastings step can be embedded (Tierney 1994). The derivations of the conditional distributions follow from standard results in Bayesian analysis (Zellner 1971). Details are provided in the following sections.
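As a rough illustration of the embedded Metropolis-Hastings step for $\tau_j = \log(v_j)$ (derived in Section A.2.4), the following Python sketch draws $\tau_j$ with an independence proposal built from the normal approximation to the likelihood. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the simulated residuals, and the values chosen for `prior_mean` (standing in for the prior mean of $\tau_j$) and `delta` (the prior variance) are all hypothetical.

```python
import numpy as np


def log_target(tau, S, n, prior_mean, delta):
    """Log conditional posterior of tau = log(v), up to an additive constant."""
    log_lik = -0.5 * n * tau - 0.5 * S * np.exp(-tau)
    log_prior = -0.5 * (tau - prior_mean) ** 2 / delta
    return log_lik + log_prior


def proposal_params(S, n, prior_mean, delta):
    """Mean and variance of the normal proposal: a N(log(S/n), 2/n)
    approximation to the likelihood combined with the normal prior."""
    precision = 0.5 * n + 1.0 / delta
    mean = (0.5 * n * np.log(S / n) + prior_mean / delta) / precision
    return mean, 1.0 / precision


def metropolis_step(tau_old, S, n, prior_mean, delta, rng):
    """One independence Metropolis-Hastings update; returns (tau, accepted)."""
    mean, var = proposal_params(S, n, prior_mean, delta)
    tau_new = rng.normal(mean, np.sqrt(var))

    def log_f(t):
        # Log proposal density, up to a constant that cancels in the ratio.
        return -0.5 * (t - mean) ** 2 / var

    # Accept with probability min{1, [pi(new) f(old)] / [pi(old) f(new)]}.
    log_ratio = (log_target(tau_new, S, n, prior_mean, delta) - log_f(tau_new)) - (
        log_target(tau_old, S, n, prior_mean, delta) - log_f(tau_old)
    )
    if np.log(rng.uniform()) < log_ratio:
        return tau_new, True
    return tau_old, False


# Hypothetical data: n residuals with true idiosyncratic variance 4.
rng = np.random.default_rng(0)
n = 60
residuals = rng.normal(0.0, 2.0, size=n)
S = float(np.sum(residuals ** 2))
prior_mean, delta = np.log(4.0), 1.0

tau = np.log(S / n)  # start at the likelihood mode
draws, accepts = [], 0
for _ in range(2000):
    tau, accepted = metropolis_step(tau, S, n, prior_mean, delta, rng)
    draws.append(tau)
    accepts += accepted
accept_rate = accepts / len(draws)
v_draws = np.exp(np.array(draws))  # back-transform to the variance scale
```

Working on the log scale keeps every draw of $v_j$ positive, and because the proposal does not depend on the current state, a close approximation yields a high acceptance rate.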

A.2 The Full Conditional Posterior Distributions

A.2.1 Generating $\mu_f$ and $\Sigma_f$

Given the observed factor returns $f_t$, $t = 1, \ldots, n$, the conditional posterior distribution of $\mu_f$ is $N(\bar f, \Sigma_f/n)$, where $\bar f = \sum_{t=1}^{n} f_t/n$, and the conditional posterior for $\Sigma_f^{-1}$ is $W\left(\sum_{t=1}^{n} (f_t - \bar f)(f_t - \bar f)',\ \cdot\,\right)$, where $V \sim W(T, d)$ denotes that $V$ has the Wishart probability density proportional to $|T|^{d/2} |V|^{(d-r-1)/2} \exp\left(-\tfrac{1}{2}\mathrm{Tr}(TV)\right)$, and expected value $dT^{-1}$.

A.2.2 Generating $\alpha_j$

Let $y^*_{jt} = y_{jt} - \beta_j' f_t$. Then, conditional on $\beta_j$ and $f_t$, the quantity $y^*_{jt}$ is normally distributed with mean $\alpha_j$ and variance $v_j$. Since the prior distribution for $\alpha_j$ is $N(\theta_0' z_j^{\alpha}, \Lambda_0)$, the conditional posterior for $\alpha_j$ is

$$N\left( (\Lambda_0^{-1} \theta_0' z_j^{\alpha} + n v_j^{-1} \bar y^*_j)(\Lambda_0^{-1} + n v_j^{-1})^{-1},\ (\Lambda_0^{-1} + n v_j^{-1})^{-1} \right),$$

where $\bar y^*_j = n^{-1} \sum_{t=1}^{n} y^*_{jt}$.

A.2.3 Generating $\beta_j$

Now let $X = ((f_{tk}))_{1 \le t \le n,\ 1 \le k \le r}$ and $\tilde y_j = (y_{jt} - \alpha_j)_{1 \le t \le n}$ denote the regression data for the market model for stock $j$. From (5), the prior mean for $\beta_j$ is $\Theta' z_j^{\beta}$, and the prior variance is $\Lambda = \mathrm{diag}(\Lambda_k)$. The full conditional posterior distribution for $\beta_j$ is then $N(\tilde\beta_j, \tilde\Lambda_j)$, where the posterior parameters are

$$\tilde\beta_j = (\Lambda^{-1} + v_j^{-1} X'X)^{-1} (\Lambda^{-1} \Theta' z_j^{\beta} + v_j^{-1} X' \tilde y_j), \qquad \tilde\Lambda_j = (\Lambda^{-1} + v_j^{-1} X'X)^{-1}.$$

A.2.4 Generating $v_j$ and $\tau_j$

The conditional posterior of $v_j$ is not of a standard form. However, $\tau_j = \log(v_j)$ can be generated from its correct distribution through use of an embedded Metropolis chain (Hastings 1970; Tierney 1994). Let $\pi(\tau_j)$ denote the exact conditional posterior density for $\tau_j$, and let $\tau_j^{(g)}$ be the value of $\tau_j$ after $g$ iterations of the Markov chain algorithm. In a Metropolis sampler, a new candidate value of $\tau_j$, $\tau_j^*$, is generated from some density $f(\tau)$ which approximates the desired conditional posterior $\pi(\tau)$. With probability

$$\min\left\{1,\ \frac{\pi(\tau_j^*)\, f(\tau_j^{(g)})}{\pi(\tau_j^{(g)})\, f(\tau_j^*)}\right\},$$

this value is accepted as the $(g+1)$st sample of $\tau_j$; otherwise, the old value is retained, and the Markov chain proceeds. The $\tau_j$ so generated will, asymptotically, have the correct distribution $\pi(\tau_j)$.

By Bayes' theorem, the exact conditional posterior distribution $\pi(\tau_j)$ is proportional to the product of the likelihood and the prior for $\tau_j$; the likelihood of the data is proportional to $\exp\left(-\tfrac{n}{2}\tau_j - \tfrac{1}{2} S_j e^{-\tau_j}\right)$, where $S_j = \sum_{t=1}^{n} (y_{jt} - \alpha_j - \beta_j' f_t)^2$, and the prior for $\tau_j$ is proportional to $\exp\left(-\tfrac{1}{2\delta}(\tau_j - \psi' z_j^{\tau})^2\right)$. The likelihood, as a function of $\tau_j$, can be approximated closely by a normal distribution by matching the modes, and the second derivatives at the modes. The resulting normal approximation has mean $\log(S_j/n)$ and variance $2/n$. Given a normal prior and approximately normal likelihood, the conditional posterior of $\tau_j$ is also approximately normal, with mean $\left(\tfrac{n}{2}\log(S_j/n) + \delta^{-1}\psi' z_j^{\tau}\right) / \left(\tfrac{n}{2} + \delta^{-1}\right)$ and variance $\left(\tfrac{n}{2} + \delta^{-1}\right)^{-1}$; this normal distribution forms a suitable generating density $f$ for the Metropolis chain. In simulation studies, the normal approximation is seen to be very accurate, with the acceptance probability usually close to 1.0, and seldom less than 0.9. Given a sample of $\tau_j$, the corresponding value of $v_j$ is given by $v_j = \exp(\tau_j)$.

A.2.5 Generating $\Theta$, $\Lambda$

Let $\alpha$ denote the $\alpha$ coefficients for all securities, arrayed as a vector, and let $B_k$ similarly denote the $\beta$ coefficients associated with the $k$th factor. Let $Z_\alpha$ be the set of fundamental variables related to $\alpha$, and $Z_\beta$ the corresponding set of variables related to $\beta$. Then the full conditional distributions for $\theta_0$ and $\theta_k$, $k = 1, \ldots, r$, are $N\left((Z_\alpha' Z_\alpha)^{-1} Z_\alpha' \alpha,\ \Lambda_0 (Z_\alpha' Z_\alpha)^{-1}\right)$ and $N\left((Z_\beta' Z_\beta)^{-1} Z_\beta' B_k,\ \Lambda_k (Z_\beta' Z_\beta)^{-1}\right)$, respectively. The full conditional distributions for $\Lambda_0$ and $\Lambda_k$, $k = 1, \ldots, r$, are $IG\left(a_0 + p/2,\ a_1 + \sum_{j=1}^{p} (\alpha_j - \theta_0' z_j^{\alpha})^2/2\right)$ and $IG\left(a_0 + p/2,\ a_1 + \sum_{j=1}^{p} (\beta_{jk} - \theta_k' z_j^{\beta})^2/2\right)$.

A.2.6 Generating $\psi$, $\delta$

Let $\tau$ denote the log variances $\tau_j$ for all securities, arrayed as a vector, and let $Z_\tau$ denote the set of fundamental variables related to $\tau$. The conditional posterior for $\psi$ is $N\left((Z_\tau' Z_\tau)^{-1} Z_\tau' \tau,\ \delta (Z_\tau' Z_\tau)^{-1}\right)$, and the conditional posterior for $\delta$ is $IG\left(a_0 + p/2,\ a_1 + \sum_{j=1}^{p} (\tau_j - \psi' z_j^{\tau})^2/2\right)$.

A.3 Predictive Moments

Let $Y$ represent the future returns for the vector of assets under analysis, and let $S$ denote the set of all model parameters. Then the predictive moments are defined as

$$\hat\mu = E[Y \mid D] = E\big[E[Y \mid S, D]\big], \qquad \hat\Sigma = \mathrm{Var}[Y \mid D] = E\big[\mathrm{Var}[Y \mid S, D]\big] + \mathrm{Var}\big[E[Y \mid S, D]\big].$$

The inner conditional expectations and variances $E[Y \mid S, D]$ and $\mathrm{Var}[Y \mid S, D]$ are determined by equation (8); thus, one can generate realizations of these conditional moments at each step of the Markov chain. The estimate of the predictive mean $E[Y \mid D]$ will then be the sample average of the generated values of $E[Y \mid S, D]$, and the estimate of the predictive variance will be the sample average of the generated values of $\mathrm{Var}[Y \mid S, D]$ plus the sample variance of the generated values of $E[Y \mid S, D]$.

A.4 Initial Conditions

The algorithm requires initial values for starting the Markov chain. Initial values for the parameters $\alpha_j$, $\beta_j$, $v_j$ can be obtained by using ordinary least squares regression, i.e., the usual estimators for these parameters. Initial estimates for $\Theta$, $\Lambda$, $\psi$ and $\delta$ can be obtained by multivariate regression of these estimates of the $\alpha_j$'s, $\beta_j$'s and $\tau_j$'s versus the $z_j^{\alpha}$'s, $z_j^{\beta}$'s and $z_j^{\tau}$'s.

A.5 Markov Chain Monte Carlo Algorithm: A Summary

The Gibbs sampling algorithm for generating samples from the posterior distribution of the model parameters can be summarized as follows:

1. Obtain preliminary estimates of the parameters $\alpha_j$, $\beta_j$, $v_j$, via ordinary LS regression.
2. Based on the initial estimates of $\alpha_j$, $\beta_j$, $v_j$, obtain preliminary estimates of the parameters $\Theta$, $\Lambda$, $\psi$, and $\delta$, via LS regression.
3. Repeat for $G$ Gibbs iterations:
   (a) Generate samples of factor moments $\mu_f$ and $\Sigma_f$.
   (b) For each security $j = 1, \ldots, p$, generate samples of $\alpha_j$, $\beta_j$, $\tau_j$ and $v_j$ from their respective posterior conditional densities.
   (c) Given the new samples of $\alpha_j$, $\beta_j$, and $\tau_j$, generate samples of hyperparameters $\Theta$, $\Lambda$, $\psi$ and $\delta$.
   (d) Given the samples of $\mu_f$, $\Sigma_f$, and $\alpha_j$, $\beta_j$, $v_j$, $j = 1, \ldots, p$, compute values for $E[Y \mid S, D]$ and $\mathrm{Var}[Y \mid S, D]$, using equation (8).

The estimate of a particular parameter, say $\beta_j$, is obtained by its posterior mean:

$$E[\beta_j \mid D] \approx \frac{1}{G - B} \sum_{g=B+1}^{G} \beta_j^{(g)},$$

where $B$ is the number of initial samples discarded, and $\beta_j^{(g)}$ is the value of $\beta_j$ generated in step 3b above during the $g$th iteration of the Markov chain. The predictive moments $E[Y \mid D]$ and $\mathrm{Var}[Y \mid D]$ are determined by the formulas:

$$E[Y \mid D] \approx \frac{1}{G - B} \sum_{g=B+1}^{G} E[Y \mid S, D]^{(g)},$$

$$\mathrm{Var}[Y \mid D] \approx \frac{1}{G - B} \sum_{g=B+1}^{G} \mathrm{Var}[Y \mid S, D]^{(g)} + \frac{1}{G - B - 1} \sum_{g=B+1}^{G} \left(E[Y \mid S, D]^{(g)} - E[Y \mid D]\right)\left(E[Y \mid S, D]^{(g)} - E[Y \mid D]\right)',$$

where $E[Y \mid S, D]^{(g)}$ and $\mathrm{Var}[Y \mid S, D]^{(g)}$ are the values of the conditional moments generated in step 3d during the $g$th iteration of the Markov chain.

B Sensitivities of Portfolio Weights

Section 5 derived an approximation to the variance of the estimated portfolio weights, in terms of the derivatives of these weights with respect to model parameters $(\alpha, \beta, v)$. In this section, the relevant derivatives are derived. Let $\alpha = (\alpha_1, \ldots, \alpha_p)'$, $\beta = (\beta_1, \ldots, \beta_p)'$, $v = (v_1, \ldots, v_p)'$, and $V = \mathrm{diag}(v)$. The vector of mean returns of the securities is $\mu = \alpha + \mu_M \beta$, and the covariance matrix is $\Sigma = V + v_M \beta\beta'$. The optimal portfolio weights from the previous section are:

$$w^* = \lambda^{-1} \Sigma^{-1} (\mu - \eta \mathbf{1}), \qquad (12)$$

where the scalar $\eta$ is set so that the weights sum to one, and

$$\Sigma^{-1} = V^{-1} - (v_M^{-1} + \beta' V^{-1} \beta)^{-1} V^{-1} \beta \beta' V^{-1}. \qquad (13)$$

The sensitivity of the optimal weights with respect to $\alpha$ is:

$$\frac{\partial w^*}{\partial \alpha'} = \lambda^{-1} \left( \Sigma^{-1} - \frac{\Sigma^{-1} \mathbf{1} \mathbf{1}' \Sigma^{-1}}{\mathbf{1}' \Sigma^{-1} \mathbf{1}} \right), \qquad (14)$$

where $\partial w^*/\partial \alpha'$ is a $p \times p$ matrix with $(i, j)$th element equal to $\partial w_i^*/\partial \alpha_j$.
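The expressions in (12) through (14) can be checked numerically. The sketch below assumes that (12) describes budget-constrained mean-variance weights (weights summing to one, with risk aversion $\lambda$); that reading, the function names, and all numerical values are assumptions made for illustration rather than details taken from the paper. It verifies the closed-form inverse in (13) against the covariance matrix and recovers one column of (14) by a finite difference.

```python
import numpy as np


def sigma_inverse(beta, v, v_M):
    """Equation (13): inverse of V + v_M * beta beta' (Sherman-Morrison)."""
    b = beta / v  # V^{-1} beta, since V is diagonal
    return np.diag(1.0 / v) - np.outer(b, b) / (1.0 / v_M + beta @ b)


def optimal_weights(alpha, beta, v, mu_M, v_M, lam):
    """Budget-constrained mean-variance weights (assumed form of (12))."""
    mu = alpha + mu_M * beta
    S_inv = sigma_inverse(beta, v, v_M)
    a = S_inv @ np.ones_like(mu)       # Sigma^{-1} 1
    eta = (a @ mu - lam) / a.sum()     # multiplier that makes weights sum to one
    return S_inv @ (mu - eta) / lam


def weight_sensitivity(beta, v, v_M, lam):
    """Equation (14): derivative of w* with respect to alpha'."""
    S_inv = sigma_inverse(beta, v, v_M)
    a = S_inv.sum(axis=1)              # Sigma^{-1} 1
    return (S_inv - np.outer(a, a) / a.sum()) / lam


# Hypothetical inputs for 30 securities.
rng = np.random.default_rng(1)
p, lam, mu_M, v_M = 30, 2.0, 1.0, 20.0
alpha = rng.normal(0.0, 0.5, p)
beta = rng.normal(1.0, 0.3, p)
v = rng.uniform(20.0, 80.0, p)

Sigma = np.diag(v) + v_M * np.outer(beta, beta)
w = optimal_weights(alpha, beta, v, mu_M, v_M, lam)
J = weight_sensitivity(beta, v, v_M, lam)

# w* is linear in alpha, so a unit finite difference recovers column 0 of J.
step = np.zeros(p)
step[0] = 1.0
fd_col = optimal_weights(alpha + step, beta, v, mu_M, v_M, lam) - w
```

Because $V$ is diagonal, (13) avoids a general $p \times p$ inversion, which matters when the sensitivities are recomputed at every iteration of a sampler.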

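It can be shown that:

$$\frac{\partial w_i^*}{\partial \alpha_i} = (\lambda v_i)^{-1} \left[ 1 - (u v_i)^{-1} \left( 1 + \frac{v_M u (\beta_i - \bar\beta)^2}{1 + v_M u \sigma_\beta^2} \right) \right], \qquad (15)$$

$$\frac{\partial w_i^*}{\partial \alpha_j} = -(\lambda u v_i v_j)^{-1} \left( 1 + \frac{v_M u (\beta_i - \bar\beta)(\beta_j - \bar\beta)}{1 + v_M u \sigma_\beta^2} \right) \quad \text{for } i \neq j, \qquad (16)$$

where $u = \sum_{k=1}^{p} 1/v_k$, $\bar\beta = u^{-1} \sum_{k=1}^{p} \beta_k/v_k$, and $\sigma_\beta^2 = u^{-1} \sum_{k=1}^{p} (\beta_k - \bar\beta)^2/v_k$.

The analysis of these results is facilitated by recognizing that the matrix $\partial w^*/\partial \alpha'$ is equivalent to a conditional covariance matrix. Let $y$ be a multivariate normal random variable with mean 0 and covariance matrix $\lambda^{-1}\Sigma^{-1}$. Then the conditional variance of $y$ given its sum, $\mathbf{1}'y$, is the same as $\partial w^*/\partial \alpha'$. Consequently, $\partial w_i^*/\partial \alpha_i$, as given in Equation (15), must be positive, as it is the conditional variance of $y_i$ given $\mathbf{1}'y$. Although $\partial w_i^*/\partial \alpha_j$ in Equation (16) can be positive if either $\beta_i$ or $\beta_j$ is much greater than $\bar\beta$ while the other is much less than $\bar\beta$, for the typical values of the market risk it is usually negative. Moreover, $\sum_{j=1}^{p} \partial w_i^*/\partial \alpha_j = 0$, so that $\partial w_i^*/\partial \alpha_i = -\sum_{j \neq i} \partial w_i^*/\partial \alpha_j$. Thus, when Equation (16) is negative for all $j \neq i$, $w_i$ tends to be more sensitive to estimation error in $\alpha_i$ than to estimation error in $\alpha_j$ for $j \neq i$.

Matrix expressions for the partial derivatives of the optimal weights with respect to $\beta$ and $v$ can be obtained as well, but they do not have simple interpretations. Formulas for these derivatives can be obtained from the authors. In practice, the exact values of $\alpha$, $\beta$ and $v$ in equations (12)-(16) will be unknown, and so will be replaced by their respective parameter estimates.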
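The elementwise sensitivities can be compared with the matrix form in (14) numerically. The following sketch is illustrative only: it assumes the elementwise expressions take the form $\partial w_i/\partial \alpha_j = \lambda^{-1}\left[\delta_{ij}/v_i - (u v_i v_j)^{-1}\left(1 + v_M u (\beta_i - \bar\beta)(\beta_j - \bar\beta)/(1 + v_M u \sigma_\beta^2)\right)\right]$, and every name and numerical value in it (including the sample size `n`) is hypothetical. It also evaluates the ratio of the exact own-sensitivity to the approximation $1/(\lambda v_j)$ used in Section 5.

```python
import numpy as np


def sensitivity_matrix(beta, v, v_M, lam):
    """Matrix form (14), using a direct inverse of Sigma = V + v_M beta beta'."""
    Sigma = np.diag(v) + v_M * np.outer(beta, beta)
    S_inv = np.linalg.inv(Sigma)
    a = S_inv.sum(axis=1)  # Sigma^{-1} 1
    return (S_inv - np.outer(a, a) / a.sum()) / lam


def sensitivity_closed_form(beta, v, v_M, lam):
    """Elementwise form: diagonal as in (15), off-diagonal as in (16)."""
    u = np.sum(1.0 / v)
    beta_bar = np.sum(beta / v) / u      # precision-weighted mean beta
    d = beta - beta_bar
    s2 = np.sum(d ** 2 / v) / u          # precision-weighted beta variance
    bracket = 1.0 + v_M * u * np.outer(d, d) / (1.0 + v_M * u * s2)
    return (np.diag(1.0 / v) - bracket / (u * np.outer(v, v))) / lam


# Hypothetical inputs for 30 securities observed over n periods.
rng = np.random.default_rng(2)
p, n, lam, v_M = 30, 24, 2.0, 20.0
beta = rng.normal(1.0, 0.3, p)
v = rng.uniform(20.0, 80.0, p)

J_matrix = sensitivity_matrix(beta, v, v_M, lam)
J_closed = sensitivity_closed_form(beta, v, v_M, lam)

row_sums = J_closed.sum(axis=1)          # each row should sum to zero
ratio = np.diag(J_closed) * lam * v      # exact own-sensitivity vs 1/(lam * v_j)
approx_var_w = 1.0 / (lam ** 2 * n * v)  # Section 5 approximation to Var[w_j]
```

The entries of `ratio` lie strictly between 0 and 1 and approach 1 as $u v_j$ grows, which is the sense in which $1/(\lambda v_j)$ approximates the own-sensitivity when there are many securities.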

References

Barry, C. B. (1973). Portfolio analysis under uncertain means, variances, and covariances. Journal of Finance 29, 515-522.

Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. New York, NY: Springer-Verlag.

Best, M. J. and R. R. Grauer (1991). On the sensitivity of mean-variance-efficient portfolios to changes in asset means. Review of Financial Studies 4, 315-342.

Board, J. L. G. and C. M. S. Sutcliffe (1994). Estimation methods in portfolio selection and the effectiveness of short sales restrictions: U.K. evidence. Management Science 40, 516-534.

Chan, K. C. and N.-F. Chen (1988). An unconditional asset-pricing test, and the role of firm size as an instrumental variable for risk. Journal of Finance 43, 309-325.

Chopra, V. K. and W. T. Ziemba (1993). The effect of errors in means, variances, and covariances on optimal portfolio choice. The Journal of Portfolio Management Winter, 559-582.

Elton, E. J. and M. J. Gruber (1995). Modern Portfolio Theory and Investment Management (5th ed.). New York, NY: John Wiley and Sons.

Fama, E. and K. R. French (1992). The cross section of expected stock returns. Journal of Finance 47, 427-465.

Fama, E. and K. R. French (1995). Explaining the cross section of expected stock returns. Journal of Finance 50, 427-465.

Fama, E. and J. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. Journal of Political Economy 71, 607-636.

Frost, P. A. and J. E. Savarino (1986). An empirical Bayes approach to efficient portfolio selection. Journal of Financial and Quantitative Analysis 21, 293-305.

Gelfand, A. E. and A. F. M. Smith (1990). Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398-409.

George, E. I. (1986). A formal Bayes multiple shrinkage estimator. Communications in Statistics, A 15, 2099-2114.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97-109.

Jorion, P. (1986). Bayes-Stein estimation for portfolio analysis. Journal of Financial and Quantitative Analysis 21, 279-292.

Karolyi, G. A. (1992). Predicting risk: Some new generalizations. Management Science 38, 57-74.

Lindley, D. V. and A. F. M. Smith (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society, Series B 34, 1-41.

Malkiel, B. G. and Y. Xu (1997). Risk and return revisited. Journal of Portfolio Management 23, 9-14.

Markowitz, H. M. (1952). Portfolio Selection. Journal of Finance 7, 77-91.

Müller, P. (1991). A generic approach to posterior integration and Gibbs sampling. Technical Report 91-09, Department of Statistics, Purdue University.

Müller, P. and G. Rosner (1994). A semiparametric Bayesian population model with hierarchical mixture priors. Technical Report 94-17, Duke University Institute of Statistics & Decision Sciences.

Press, S. J. (1982). Applied Multivariate Analysis. Melbourne, FL: Krieger Publishing Company, Inc.

Quintana, J. M. (1992). Optimal portfolios of forward currency contracts. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics 4: Proceedings of the Fourth Valencia International Meeting, pp. 753-762. New York: Oxford University Press.

Roberts, G. O. and A. F. M. Smith (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society, Series B 55, 3-23.

Rosenberg, B. and G. James (1976). Prediction of beta from investment fundamentals. Financial Analysts Journal 32, 60-72.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 4, 1701-1762.

Vasicek, O. (1973). A note on using cross-sectional information in Bayesian estimation of security betas. Journal of Finance 28, 1233-1239.

Wakefield, J. C., A. F. M. Smith, A. Racine-Poon, and A. E. Gelfand (1994). Bayesian analysis of linear and non-linear population models by using the Gibbs sampler. Applied Statistics 43, 201-221.

West, M. and J. Harrison (1989). Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag.

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. New York, NY: John Wiley and Sons.

Table 1: Estimation accuracy (mean absolute error, MAE) and portfolio performance (certainty equivalent) of Hierarchical Bayes (HB) and Least Squares (LS) estimators on simulated data. 100 simulation replications per block. MEAN and SD denote mean and standard deviation over the 100 replications.

                              NYSE(24)          NYSE(72)          NYSE(12)          HIGHHET(24)
                              MEAN      SD      MEAN      SD      MEAN      SD      MEAN      SD
Alpha MAE             HB      0.33     0.09     0.27     0.04     0.42     0.20     0.90     0.16
                      LS      1.54     0.36     0.88     0.18     2.17     0.47     1.56     0.37
Beta MAE              HB      0.17     0.03     0.13     0.02     0.19     0.03     0.30     0.07
                      LS      0.38     0.10     0.21     0.04     0.55     0.17     0.38     0.11
Variance MAE          HB     22.45     7.46    14.49     6.24    32.31    12.50    23.86     8.40
                      LS     24.25     8.29    14.75     6.32    35.23    13.06    25.35     9.08
Certainty Equivalent  TRUE    1.29     0.35     1.36     0.36     1.38     0.41    12.45     5.70
                      HB      0.46     0.34     0.78     0.36     0.38     0.53     5.58     5.11
                      LS    -12.85     4.70    -2.04     1.18   -46.98    24.02   -10.32    17.63

Table 2: Estimation accuracy (mean absolute error) and portfolio performance (certainty equivalent) of Hierarchical Bayes (HB) and Multiple Shrinkage (MS) estimators in N.Y.S.E. cross validation study, 1955-1994. # HB denotes the number of periods, out of 19, in which the HB estimator outperformed the MS estimator; P value is the associated significance level (binomial sign test).

                              MEAN       SD      # HB    P value
Alpha                 HB      1.57     0.30       18     <.0001
                      MS      1.64     0.33
Beta                  HB      0.41     0.07       18     <.0001
                      MS      0.43     0.07
Variance              HB      2.33     0.54       11     0.18
                      MS      2.34     0.53
Certainty Equivalent  HB    -49.58    30.56       14     0.01
                      MS    -78.56    51.76

Table 3: Correlations between squared error of estimated portfolio weights, and the potential sources of this error, for simulated data. MEAN and SD denote mean and standard deviation of the correlations over the 100 simulation replications.

CORRELATIONS                NYSE(24)   NYSE(72)   NYSE(12)   HIGHHET(24)
ERR(alpha)        MEAN       0.656      0.754      0.480      -0.018
                  SD         0.161      0.128      0.216       0.130
ERR(beta)         MEAN      -0.035      0.011     -0.007      -0.130
                  SD         0.154      0.183      0.180       0.093
ERR(var)          MEAN       0.112      0.008      0.171      -0.131
                  SD         0.199      0.203      0.223       0.071
ERR-SENS(alpha)   MEAN       0.892      0.922      0.856       0.596
                  SD         0.095      0.075      0.167       0.279
EST-VAR(w)        MEAN       0.377      0.247      0.512       0.514
                  SD         0.199      0.180      0.217       0.241

Figure 1: Posterior means of hierarchical model coefficients. Series shown: Manufacturing, Utility, Finance, Service, Firm Size. Horizontal axis: Year (tick labels 55 through 91). Panels: (a) Alphas; (b) Variances; (c) Market Betas; (d) Size Betas.