University of Michigan Business School, Research Support
Working Paper #98005, February 1998

UNIFORM SCALE MIXTURE MODELS WITH APPLICATIONS TO BAYESIAN INFERENCE

Zhaohui Qin, University of Michigan, Department of Statistics
Paul Damien, University of Michigan Business School
Stephen Walker, Imperial College, London, England

Uniform Scale Mixture Models With Applications to Bayesian Inference

Zhaohui Qin^1, Paul Damien^2, Stephen Walker^3

^1 Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA.
^2 University of Michigan Business School, Ann Arbor, MI 48109, USA.
^3 Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK.

SUMMARY

We show that the scale mixture of uniform family generalises the scale mixture of normal family. We also show that inference for the former family is more straightforward and can easily accommodate both mean and variance regression functions. An illustrative analysis of the autocorrelated-heteroscedastic regression model is provided using data obtained from 14 US banks.

Key words: Scale mixture of uniforms, Uniform distribution, Gibbs sampler, Autocorrelation, Heteroscedasticity, Truncated densities, Share price, Economic value added, Net operating profit, Market capitalization.

1 Introduction

We start this paper by considering the scale mixture of normal family of distributions (Andrews and Mallows, 1974; Karim and Paruolo, 1996).

As is well known, these provide generalisations of the normal family; for example, the exponential power and Student t distributions. Their practical use is in providing probability density functions with heavier tails than the normal, which is useful for robust modelling (Box and Tiao, 1973; Choy and Smith, 1997). The heaviness of the tails is measured by the kurtosis. The normal density has zero (excess) kurtosis, whereas a scale mixture of normal density can have positive kurtosis (leptokurtic).

Consider the density function for $X$, defined on $(-\infty, +\infty)$, given by
$$f_X(x) = \int N(x \mid \mu, \sigma^2\lambda)\,\pi(\lambda)\,d\lambda,$$
where $N(\mu, \sigma^2)$ denotes a normal density with mean $\mu$ and variance $\sigma^2$ and $\pi(\cdot)$ is a density defined on $(0, +\infty)$. The exponential power arises when $\pi(\cdot)$ is an inverse positive stable density and the Student t arises when $\pi(\cdot)$ is an inverse gamma density. However, it is a surprise that interest has focused on these two alone. After all, if $E[\lambda] = 1$ and $\mathrm{var}[\lambda] = r/3$ then $E[X] = \mu$, $\mathrm{var}[X] = \sigma^2$ and $\kappa(X) = r$, where $\kappa(X)$ denotes the kurtosis. Note then that the kurtosis must be positive.

Our work is based on the fact that we can express the normal distribution as a scale mixture of a uniform distribution. Therefore, any distribution which has a scale mixture of normal representation also has a scale mixture of uniform representation. Moreover, the scale mixture of uniform family includes densities with negative kurtosis (platykurtic). Here we present a result which does not appear to be widely known, or at least, if known, not its relevance for making statistical inference. In the following $G$ represents a gamma distribution (parametrised by shape and rate) and $U$ a uniform distribution. We state without proof:

Theorem 1. If $X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v})$ and $V \sim G(3/2, 1/2)$ then $X \sim N(\mu, \sigma^2)$.
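Theorem 1 is easy to check by simulation. The following sketch (ours, not part of the original analysis; it assumes NumPy is available) draws from the uniform scale mixture and compares the sample moments with those of the target normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 2.0, 200_000

# Theorem 1: V ~ G(3/2, 1/2) (shape 3/2, rate 1/2) and X | V = v ~ U(mu - sigma*sqrt(v), mu + sigma*sqrt(v))
v = rng.gamma(shape=1.5, scale=2.0, size=n)               # rate 1/2 corresponds to scale 2
x = rng.uniform(mu - sigma * np.sqrt(v), mu + sigma * np.sqrt(v))

print(x.mean(), x.var())                                  # approximately mu and sigma^2
print(np.mean(((x - mu) / sigma) ** 4))                   # approximately 3, the normal fourth moment
```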

2 Scale mixtures of uniform distributions

We introduce the scale mixture of uniform family by considering the scale mixture of normal family and using the result of Theorem 1. Suppose then that
$$f_X(x) = \int N(x \mid \mu, \sigma^2/\lambda)\,\pi(\lambda)\,d\lambda.$$
We can write this as a three-level model, given by
$$X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v}), \qquad V \mid \lambda \sim G(3/2, \lambda/2), \qquad \lambda \sim \pi(\cdot).$$
We can combine the last two levels to give
$$f_V(v) \propto \sqrt{v} \int \lambda^{3/2} \exp(-\lambda v/2)\,\pi(\lambda)\,d\lambda.$$
The family of distributions we will be looking at in this paper therefore has the representation
$$X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v}), \qquad V \sim f_V(\cdot).$$

Here we note that the Student t and exponential power distributions arise as scale mixtures of uniform distributions. The Student t, with $\alpha$ degrees of freedom, has the representation
$$X \mid \lambda \sim N(\mu, \sigma^2/\lambda), \qquad \lambda \sim G(\alpha/2, \alpha/2).$$
We can write the first level as a scale mixture of a uniform:
$$X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v}), \qquad V \mid \lambda \sim G(3/2, \lambda/2).$$
We can then combine $f(v \mid \lambda)$ and $f(\lambda)$, integrating over $\lambda$, to obtain the marginal distribution of $V$ and hence the scale mixture of a uniform representation:

Theorem 2. If $V$ has density given, up to proportionality, by
$$f_V(v) \propto \frac{\sqrt{v}}{(\alpha + v)^{(\alpha+3)/2}}$$
and $X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v})$, then $X$ has a Student t distribution with mean $\mu$, scale parameter $\sigma$ and $\alpha$ degrees of freedom.

We state the corresponding result for the exponential power distribution:

Theorem 3. If $V$ has density proportional to
$$f_V(v) \propto v^{1/r - 1/2} \exp(-v^{1/r}),$$
so that $V^{1/r} \sim G(1 + r/2, 1)$, and $X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v})$, then
$$f_X(x) \propto \exp\left(-\left|\frac{x - \mu}{\sigma}\right|^{2/r}\right),$$
where $r \in (0, 2]$.

This characterisation of the exponential power distribution appears to be more tractable than the alternative scale mixture of normal characterisation (West, 1987), which is only valid for $r \in (1, 2]$.
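As a numerical check on the Student t representation, one can draw $V$ by composition, $\lambda \sim G(\alpha/2, \alpha/2)$ followed by $V \mid \lambda \sim G(3/2, \lambda/2)$, and then draw $X \mid V$ uniformly; the marginal of $X$ should then match a Student t with $\alpha$ degrees of freedom. A minimal NumPy sketch of ours (the variable names and parameter values are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, alpha, n = 0.0, 1.0, 5.0, 200_000

# lambda ~ G(alpha/2, rate alpha/2), then V | lambda ~ G(3/2, rate lambda/2)
lam = rng.gamma(shape=alpha / 2, scale=2.0 / alpha, size=n)
v = rng.gamma(shape=1.5, scale=2.0 / lam)
x = rng.uniform(mu - sigma * np.sqrt(v), mu + sigma * np.sqrt(v))

# a Student t with alpha d.f. and scale sigma has variance sigma^2 * alpha/(alpha - 2)
print(x.var(), sigma ** 2 * alpha / (alpha - 2))
```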

We can obtain an interesting result by combining the result of West with ours.

Theorem 4 (West, 1987). If $X \mid \lambda \sim N(0, 1/\lambda)$, so that $\lambda$ denotes the reciprocal of the variance, and $f(\lambda) \propto \lambda^{-1/2} p_{1/r}(\lambda)$, $(1 < r < 2)$, where $p_\alpha(\cdot)$ denotes the density of the positive stable distribution with index $\alpha$ $(0 < \alpha < 1)$, then $f_X(x) \propto \exp(-|x|^{2/r})$.

We can now insert the uniform and gamma mixture in place of the normal, leading to the following three-stage mixture:
$$X \mid [V = v] \sim U(-\sqrt{v}, +\sqrt{v}), \qquad V \mid \lambda \sim G(3/2, \lambda/2), \qquad f(\lambda) \propto \lambda^{-1/2} p_{1/r}(\lambda).$$
Combining the last two stages implies
$$v^{1/r - 1/2} \exp(-v^{1/r}) \propto \sqrt{v} \int_0^{\infty} \lambda^{3/2} \exp(-\lambda v/2)\, \lambda^{-1/2} p_{1/r}(\lambda)\, d\lambda.$$
Therefore:

Theorem 5. If $V \mid \lambda$ has the exponential distribution with mean $2/\lambda$ and $f(\lambda) = p_{1/r}(\lambda)$, $(1 < r < 2)$, then $f_V(v) \propto v^{1/r - 1}\exp(-v^{1/r})$ (up to a rescaling of $v$), a Weibull distribution.

There does not seem to be any reason why we should consider only the Student t and exponential power distributions. Consider the general family of distributions
$$X \mid [V = v] \sim U(\mu - \sigma\sqrt{v},\, \mu + \sigma\sqrt{v}), \qquad V \sim f_V.$$
Then the following hold: $E[X] = \mu$, $\mathrm{var}[X] = \sigma^2 E[V]/3$ and
$$\kappa[X] = \frac{9\,E[V^2]}{5\,(E[V])^2} - 3,$$
so the mean and variance of $V$ determine the variance and kurtosis of $X$. To obtain $\mathrm{var}[X] = \sigma^2$ and $\kappa[X] = r$ we require $E[V] = 3$ and $\mathrm{var}[V] = 5r + 6$. Note that we must have $r > -6/5$, which is the kurtosis of the uniform density. A particular distribution which satisfies these requirements is given by $f_V = G(9a, 3a)$, where $a = (5r + 6)^{-1}$. This new family of distributions has parameters $(\mu, \sigma, r)$, with mean $\mu$, variance $\sigma^2$ and kurtosis $r$. We recover the normal distribution when $r = 0$ ($a = 1/6$).
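The $G(9a, 3a)$ choice is easy to verify by simulation: the resulting $X$ should have mean approximately $\mu$, variance approximately $\sigma^2$ and excess kurtosis approximately $r$, including negative values of $r$. A short sketch of ours (NumPy assumed; the particular values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, r, n = 0.0, 1.0, -0.8, 500_000        # any r > -6/5; negative r gives a platykurtic density

a = 1.0 / (5.0 * r + 6.0)
v = rng.gamma(shape=9 * a, scale=1.0 / (3 * a), size=n)   # G(9a, 3a): E[V] = 3, var[V] = 5r + 6
x = rng.uniform(mu - sigma * np.sqrt(v), mu + sigma * np.sqrt(v))

z = (x - x.mean()) / x.std()
print(x.mean(), x.var(), np.mean(z ** 4) - 3.0)           # approximately mu, sigma^2 and r
```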

The scale mixture of uniform family coincides with the class of unimodal, symmetric distributions:

Theorem 6. If $f_X$ is a unimodal density, symmetric about $0$, and $f_X'(x)$ exists for all $x$, then
$$f_X(x) = \frac{1}{2}\int_{v > x^2} \frac{f_V(v)}{\sqrt{v}}\, dv, \qquad \text{where } f_V(v) = -f_X'(\sqrt{v}).$$
Therefore we can write $X \mid [V = v] \sim U(-\sqrt{v}, +\sqrt{v})$ with $V \sim f_V$, provided $f_V$ is a density on $(0, \infty)$. Note that $-\int_0^{\infty} f_X'(\sqrt{v})\, dv = 1$, which follows from
$$1 = \int_{-\infty}^{+\infty} f_X(x)\, dx = \frac{1}{2}\int_0^{\infty} \int_{-\sqrt{v}}^{+\sqrt{v}} \frac{-f_X'(\sqrt{v})}{\sqrt{v}}\, dx\, dv,$$
and $f_V(v) \geq 0$ for all $v$ if and only if $f_X$ is unimodal.

We have already seen that the scale mixture of uniform family generalises the scale mixture of normal family to include platykurtic shapes. In this section we demonstrate one other advantage of using the scale mixture of uniform family, namely the ease of analysis of a variance regression model. Suppose we wish to model data on covariates $Z$ and $W$, such that $E[X_i] = Z_i\beta$ and $\log \mathrm{var}[X_i] = 2W_i\theta$, where $\beta = (\beta_1, \ldots, \beta_p)$ and $\theta = (\theta_1, \ldots, \theta_q)$. This is difficult to implement using the normal model, even within a Gibbs sampling strategy. For the uniform model, however, analysis via the Gibbs sampler is remarkably easy. The model is given by
$$X_i \mid [V_i = v_i] \sim U\big(Z_i\beta - \exp(W_i\theta)\sqrt{v_i},\; Z_i\beta + \exp(W_i\theta)\sqrt{v_i}\big), \qquad V_i \sim_{iid} f_V(\cdot).$$
The condition on the variance is satisfied provided we constrain $E[V] = 3$.

For the implementation of a Gibbs sampler we require the full conditional distributions. We concentrate on those for $\beta_k$, $\theta_k$ and the $V_i$. There will be a parameter associated with $f_V$, but the full conditional for this parameter is based on the $V_i$ being iid from $f_V$ and so should not pose any problem. If $\pi(\cdot)$ represents the priors, assumed independent, then the full conditional for $\beta_k$ is
$$f(\beta_k \mid \cdots) \propto \pi(\beta_k)\, I(\beta_k \in A_k),$$
where
$$A_k = \left( \max_{i: Z_{ik} > 0} \left\{ \frac{X_i - \exp(W_i\theta)\sqrt{v_i} - \mu_{-ik}}{Z_{ik}} \right\},\; \min_{i: Z_{ik} > 0} \left\{ \frac{X_i + \exp(W_i\theta)\sqrt{v_i} - \mu_{-ik}}{Z_{ik}} \right\} \right)$$
and $\mu_{-ik} = \sum_{j \neq k} Z_{ij}\beta_j$ (with the two bounds interchanged for those $i$ with $Z_{ik} < 0$). The full conditional for $\theta_k$ is
$$f(\theta_k \mid \cdots) \propto \exp\left(-\theta_k \sum_i W_{ik}\right) \pi(\theta_k)\, I(\theta_k \in B_k),$$
where
$$B_k = \left( \max_{i: W_{ik} > 0} \left\{ \frac{0.5\log\xi_i - \sum_{l \neq k} W_{il}\theta_l}{W_{ik}} \right\},\; \min_{i: W_{ik} < 0} \left\{ \frac{0.5\log\xi_i - \sum_{l \neq k} W_{il}\theta_l}{W_{ik}} \right\} \right)$$
and $\xi_i = (X_i - Z_i\beta)^2/v_i$. Therefore, if the priors are normal, these full conditionals are simply truncated normals. The full conditional for $V_i$ (taking $f_V = G(3/2, 1/2)$, the normal model) is
$$f_V(v \mid \cdots) \propto \exp(-v/2)\, I\big(v > (X_i - Z_i\beta)^2 \exp(-2W_i\theta)\big).$$
These straightforward full conditionals mean that implementation of this model via the Gibbs sampler is 'automatic', in that no rejection and/or Metropolis steps need to be tuned each time a new data set is analysed. Therefore, we have provided the basis for routine mean/variance regression models. This could have a dramatic impact for financial time series data, which typically include the modelling of volatility via a variance regression; see, for example, the stochastic volatility model of Jacquier et al. (1994).
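To illustrate, here is a schematic Gibbs sampler for the mean/variance regression model with normal errors and $N(0, \tau^2)$ priors on each $\beta_k$ and $\theta_k$. It is a sketch of ours, not the authors' code; the simulated design, prior values and run length are assumptions chosen purely for illustration, and the truncated normal draws use the auxiliary-variable step described in the Appendix rather than rejection. In practice a longer run and convergence diagnostics would be needed.

```python
import numpy as np

rng = np.random.default_rng(3)

def trunc_normal_step(cur, m, s, lo, hi):
    """One auxiliary-variable update for N(m, s^2) restricted to (lo, hi), as in the Appendix
    (Damien and Walker, 1997); the current value cur must already lie in (lo, hi)."""
    g = -0.5 * ((cur - m) / s) ** 2 + np.log(rng.uniform())   # log of the latent uniform variable
    half = s * np.sqrt(-2.0 * g)
    return rng.uniform(max(lo, m - half), min(hi, m + half))

# simulated data with E[X_i] = Z_i beta and log var[X_i] = 2 W_i theta  (illustrative design)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
W = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, theta_true = np.array([1.0, 2.0]), np.array([-0.5, 0.3])
X = Z @ beta_true + np.exp(W @ theta_true) * rng.normal(size=n)

tau = 10.0                                    # N(0, tau^2) priors on each beta_k and theta_k
beta, theta = np.zeros(2), np.zeros(2)
draws = []
for it in range(5000):
    # V_i | ... : Exp(1/2) shifted above (X_i - Z_i beta)^2 exp(-2 W_i theta)
    V = (X - Z @ beta) ** 2 * np.exp(-2.0 * (W @ theta)) + rng.exponential(2.0, size=n)

    half_i = np.exp(W @ theta) * np.sqrt(V)       # half-width of each uniform interval
    for k in range(2):                            # beta_k | ... : N(0, tau^2) truncated to A_k
        resid = X - Z @ beta + Z[:, k] * beta[k]  # X_i minus the other terms of Z_i beta
        lo = ((resid - np.sign(Z[:, k]) * half_i) / Z[:, k]).max()
        hi = ((resid + np.sign(Z[:, k]) * half_i) / Z[:, k]).min()
        beta[k] = trunc_normal_step(beta[k], 0.0, tau, lo, hi)

    xi = (X - Z @ beta) ** 2 / V                  # admissibility requires 2 W_i theta > log xi_i
    for k in range(2):                            # theta_k | ... : the exp tilt shifts the prior mean
        bound = (0.5 * np.log(xi) - (W @ theta - W[:, k] * theta[k])) / W[:, k]
        lo = bound[W[:, k] > 0].max() if np.any(W[:, k] > 0) else -np.inf
        hi = bound[W[:, k] < 0].min() if np.any(W[:, k] < 0) else np.inf
        theta[k] = trunc_normal_step(theta[k], -tau ** 2 * W[:, k].sum(), tau, lo, hi)

    if it >= 1000:
        draws.append(np.r_[beta, theta])

print(np.mean(draws, axis=0))                     # posterior means; compare with beta_true, theta_true
```

Note the design choice above: because the half-widths do not depend on $\beta$, the $\beta_k$ conditional is the truncated prior, while the $\exp(-\theta_k \sum_i W_{ik})$ factor simply shifts the normal prior mean for $\theta_k$ before truncation.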
3 The autocorrelated-heteroscedastic model

We now discuss the likelihood (expressed as a scale mixture of uniforms), prior, and posterior conditional distributions for the autoregressive-heteroscedastic model. In particular, the simplifications for the Gibbs sampler resulting from the scale mixture of uniforms representation are noted. Following Zellner (1987), chapter 4, we consider the following model.

* Likelihood:
$$y_{it} = \alpha + \beta x_{it} + \epsilon_{it}, \qquad \epsilon_{it} = \rho_i \epsilon_{i,t-1} + u_{it}, \qquad i = 1, \ldots, N, \quad t = t_0 + 1, \ldots, T,$$
where $u_{it} \sim N(0, \sigma^2)$. These two equations are equivalent to
$$y_{it} = \rho_i y_{i,t-1} + \beta(x_{it} - \rho_i x_{i,t-1}) + \alpha(1 - \rho_i) + u_{it}.$$

* Scale Mixture:
$$y_{it} \mid [V_{it} = v_{it}] \sim U\big(\mu_{it} - v_{it}^{1/2},\; \mu_{it} + v_{it}^{1/2}\big),$$
where
$$\mu_{it} = \rho_i y_{i,t-1} + \beta(x_{it} - \rho_i x_{i,t-1}) + \alpha(1 - \rho_i), \qquad i = 1, \ldots, N, \quad t = t_0 + 1, \ldots, T,$$
with $t_0$ the initial time period, and
$$V_{it} \mid \lambda \sim G(3/2, \lambda/2), \qquad \text{where } \lambda = \sigma^{-2}.$$

* Prior Distributions:
$$\pi(\alpha) \sim N(\mu_\alpha, \sigma_\alpha^2), \qquad \pi(\beta) \sim N(\mu_\beta, \sigma_\beta^2), \qquad \pi(\rho_i) \sim N(\mu_\rho, \sigma_\rho^2), \qquad \pi(\lambda) \sim G(a_\lambda, b_\lambda).$$

* Posterior Conditional Distributions:

i) $V_{it}$:
$$P(V_{it} \mid \alpha, \beta, \rho, \lambda, Y) = G(1, \lambda/2)\, I\big(v_{it} > (y_{it} - \mu_{it})^2\big), \qquad i = 1, \ldots, N, \quad t = t_0 + 1, \ldots, T.$$

ii) $\alpha$:
$$P(\alpha \mid Y, V, \beta, \rho, \lambda) \propto \pi(\alpha) \prod_{i,t} P(y_{it} \mid V_{it}, \rho_i, \beta, \lambda, \alpha) \propto \pi(\alpha) \prod_{i,t} I_{[\mu_{it} - v_{it}^{1/2},\; \mu_{it} + v_{it}^{1/2}]}(y_{it}).$$

Let $a_i = 1 - \rho_i$ and $b_{it} = \rho_i y_{i,t-1} + \beta(x_{it} - \rho_i x_{i,t-1})$, $i = 1, \ldots, N$ and $t = t_0 + 1, \ldots, T$, so that $\mu_{it} - v_{it}^{1/2} < y_{it} < \mu_{it} + v_{it}^{1/2}$ is equivalent to
$$\max_{i,t} \left\{ \frac{y_{it} - b_{it} - v_{it}^{1/2}}{a_i} \right\} < \alpha < \min_{i,t} \left\{ \frac{y_{it} - b_{it} + v_{it}^{1/2}}{a_i} \right\}$$
for all $i$ and $t$. The posterior conditional distribution for $\alpha$ is therefore the truncated normal
$$N(\mu_\alpha, \sigma_\alpha^2)\, I\left[ \max_{i,t}\left\{ \frac{y_{it} - b_{it} - v_{it}^{1/2}}{a_i} \right\},\; \min_{i,t}\left\{ \frac{y_{it} - b_{it} + v_{it}^{1/2}}{a_i} \right\} \right].$$

iii) $\beta$:
$$P(\beta \mid Y, V, \alpha, \rho, \lambda) \propto \pi(\beta) \prod_{i,t} P(y_{it} \mid V_{it}, \rho_i, \alpha, \lambda, \beta) \propto \pi(\beta) \prod_{i,t} I_{[\mu_{it} - v_{it}^{1/2},\; \mu_{it} + v_{it}^{1/2}]}(y_{it}).$$
Let $c_{it} = x_{it} - \rho_i x_{i,t-1}$ and $d_{it} = \rho_i y_{i,t-1} + \alpha(1 - \rho_i)$, $i = 1, \ldots, N$ and $t = t_0 + 1, \ldots, T$, so that $\mu_{it} - v_{it}^{1/2} < y_{it} < \mu_{it} + v_{it}^{1/2}$ is equivalent to
$$\max_{i,t} \left\{ \frac{y_{it} - d_{it} - v_{it}^{1/2}}{c_{it}} \right\} < \beta < \min_{i,t} \left\{ \frac{y_{it} - d_{it} + v_{it}^{1/2}}{c_{it}} \right\}$$
for all $i$ and $t$. The posterior conditional distribution for $\beta$ is the truncated normal
$$N(\mu_\beta, \sigma_\beta^2)\, I\left[ \max_{i,t}\left\{ \frac{y_{it} - d_{it} - v_{it}^{1/2}}{c_{it}} \right\},\; \min_{i,t}\left\{ \frac{y_{it} - d_{it} + v_{it}^{1/2}}{c_{it}} \right\} \right].$$

iv) $\rho_i$, $i = 1, \ldots, N$:
$$P(\rho_i \mid Y, V, \alpha, \beta, \lambda) \propto \pi(\rho_i) \prod_{t} P(y_{it} \mid V_{it}, \alpha, \beta, \lambda, \rho_i) \propto \pi(\rho_i) \prod_{t} I_{[\mu_{it} - v_{it}^{1/2},\; \mu_{it} + v_{it}^{1/2}]}(y_{it}).$$
Let $e_{it} = y_{i,t-1} - \beta x_{i,t-1} - \alpha$ and $f_{it} = \alpha + \beta x_{it}$, $i = 1, \ldots, N$ and $t = t_0 + 1, \ldots, T$, so that $\mu_{it} - v_{it}^{1/2} < y_{it} < \mu_{it} + v_{it}^{1/2}$ is equivalent to
$$\frac{y_{it} - f_{it} - v_{it}^{1/2}}{e_{it}} < \rho_i < \frac{y_{it} - f_{it} + v_{it}^{1/2}}{e_{it}}$$
for all $t$. The posterior conditional distribution for $\rho_i$ is the truncated normal
$$N(\mu_\rho, \sigma_\rho^2)\, I\left[ \max_{t}\left\{ \frac{y_{it} - f_{it} - v_{it}^{1/2}}{e_{it}} \right\},\; \min_{t}\left\{ \frac{y_{it} - f_{it} + v_{it}^{1/2}}{e_{it}} \right\} \right].$$
(These bounds are written assuming the coefficients $a_i$, $c_{it}$ and $e_{it}$ are positive; terms with negative coefficients contribute to the opposite bound.)

v) $\lambda$:
$$P(\lambda \mid Y, V, \alpha, \beta, \rho) \propto \pi(\lambda) \prod_{i,t} P(V_{it} \mid \lambda) \propto \lambda^{a_\lambda - 1} e^{-b_\lambda \lambda}\, \lambda^{3N(T - t_0)/2} e^{-\lambda \sum_{i,t} V_{it}/2} = G\Big(a_\lambda + 3N(T - t_0)/2,\; b_\lambda + \sum_{i,t} V_{it}/2\Big).$$

Since most of the conditional distributions above are truncated normals, a new and simple way of sampling truncated normal distributions, developed by Damien and Walker (1997), is described in the Appendix; a schematic implementation of the resulting sampler is sketched below.
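The following sketch (ours, not the authors' FORTRAN implementation) puts the five conditional distributions together for simulated panel data. The prior values, data dimensions and run length are illustrative assumptions only, and the truncated normal draws again use the auxiliary-variable step from the Appendix; the paper's own analysis used a far longer run with thinning.

```python
import numpy as np

rng = np.random.default_rng(4)

def trunc_normal_step(cur, m, s, lo, hi):
    """Auxiliary-variable update for N(m, s^2) restricted to (lo, hi); cur must lie in (lo, hi)."""
    g = -0.5 * ((cur - m) / s) ** 2 + np.log(rng.uniform())
    half = s * np.sqrt(-2.0 * g)
    return rng.uniform(max(lo, m - half), min(hi, m + half))

def bounds(coef, rhs, width):
    """Interval for a scalar p given coef*p in (rhs - width, rhs + width) entrywise."""
    lo = ((rhs - np.sign(coef) * width) / coef).max()
    hi = ((rhs + np.sign(coef) * width) / coef).min()
    return lo, hi

# simulate a small panel from y_it = rho_i y_{i,t-1} + beta (x_it - rho_i x_{i,t-1}) + alpha (1 - rho_i) + u_it
N, T = 14, 40
alpha_true, beta_true, lam_true = 2.0, 3.0, 4.0             # lam = 1/sigma^2
rho_true = rng.uniform(0.3, 0.7, size=N)
x = rng.normal(size=(N, T)); y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = (rho_true * y[:, t - 1] + beta_true * (x[:, t] - rho_true * x[:, t - 1])
               + alpha_true * (1 - rho_true) + rng.normal(size=N) / np.sqrt(lam_true))

yl, xl, yc, xc = y[:, :-1], x[:, :-1], y[:, 1:], x[:, 1:]   # lagged and current observations
alpha, beta, rho, lam = 0.0, 0.0, np.full(N, 0.5), 1.0       # starting values
m_a, s_a, m_b, s_b, m_r, s_r, a_l, b_l = 0.0, 10.0, 0.0, 10.0, 0.8, 1.0, 1.0, 1.0   # prior parameters
keep = []
for it in range(5000):
    mu = rho[:, None] * yl + beta * (xc - rho[:, None] * xl) + alpha * (1 - rho[:, None])
    V = (yc - mu) ** 2 + rng.exponential(2.0 / lam, size=mu.shape)      # i) shifted Exp(lam/2)
    w = np.sqrt(V)
    b_it = rho[:, None] * yl + beta * (xc - rho[:, None] * xl)          # ii) alpha, coefficient 1 - rho_i
    alpha = trunc_normal_step(alpha, m_a, s_a, *bounds((1 - rho)[:, None], yc - b_it, w))
    c_it = xc - rho[:, None] * xl                                       # iii) beta, coefficient c_it
    d_it = rho[:, None] * yl + alpha * (1 - rho[:, None])
    beta = trunc_normal_step(beta, m_b, s_b, *bounds(c_it, yc - d_it, w))
    for i in range(N):                                                  # iv) rho_i, one unit at a time
        e_it = yl[i] - beta * xl[i] - alpha
        f_it = alpha + beta * xc[i]
        rho[i] = trunc_normal_step(rho[i], m_r, s_r, *bounds(e_it, yc[i] - f_it, w[i]))
    lam = rng.gamma(a_l + 1.5 * V.size, 1.0 / (b_l + V.sum() / 2.0))    # v) Gamma update for lam
    if it >= 1000:
        keep.append([alpha, beta, lam])

print(np.mean(keep, axis=0))      # compare with alpha_true, beta_true, lam_true
```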

4 Data Analysis

Simulated Data Analysis

Two scenarios for the model developed in the previous section were studied: the explosive ($\rho = 1.25$) case and the non-explosive ($\rho = 0.50$) case. Following Zellner, and without loss of generality, we omitted the intercept term and set the slope parameter $\beta$ equal to 3.0. The prior distributions for $\beta$ and $\rho$ were, respectively, Normal(5.0, 4) and Normal(0.8, 1.0). Under each scenario we simulated 50 samples. The Gibbs sampler detailed in the text was implemented. To ensure that the final samples were not autocorrelated, we took every 100th iterate after running the chain for 100,000 iterations, which took approximately half an hour on a Sun Sparc station, using FORTRAN. Based on the resulting 1000 samples, we obtained the posterior parameter estimates for $\beta$ and $\rho$ under the two scenarios; the standard deviations are given in parentheses. Non-explosive: $\beta = 3.03$ (0.04); $\rho = 0.56$ (0.09). Explosive: $\beta = 2.98$ (0.08); $\rho = 1.25$ (2.0E-14).

Bank Data Analysis

In recent years, following the pioneering work of the Stern Stewart Company, the financial metric Market Value Added (MVA) has come to be recognized as a robust indicator of the performance of a firm (Stewart, 1991, 1994; Uyemura et al., 1991, 1996; Copeland et al., 1996).

Based on the concept of Economic Value Added (EVA), a cash-based model founded on the economic theory of residual income, MVA has been recognized as one of the most useful measures for assessing the performance of a company. A growing number of FORTUNE 500 firms and investment analysts now subscribe to the Stern Stewart Performance 1000 index. In companies such as Microsoft, General Electric, Coca-Cola, Philip Morris, Whirlpool, Quaker Oats and Berkshire Hathaway, senior management has made the pursuit of its MVA ranking a primary goal; FORTUNE magazine now reports on the performance of companies using MVA as one of the financial indicators of a firm's viability; see, for example, Dempster (1997).

What is MVA? The wealth of shareholders is maximized by maximizing the difference between a company's total value (TV) and the total capital (TC) investors have vested in it. Hence, MVA = TV - TC; TV is also known as Market Capitalization (MC). MC is defined as the product of the number of outstanding shares and the share price of a firm. It is this financial measure that is of interest in our context; i.e., MC forms the dependent variable in our model; see also Chen et al. (1997).

So what influences or drives MC (and hence MVA)? Clearly there are several macroeconomic factors (inflation, commodity prices, treasury bill return rates, foreign exchange rates, etc.) and microeconomic factors (price-to-earnings ratios, corporate announcements, debt-level ratios, analyst recommendations, etc.) that will likely influence MC. In fact, Stewart (1994) notes 164 factors (or measures) that are likely to influence the MC of a firm. But he also goes on to note that only 5 to 10 factors are really used in practice; furthermore, all such factors can be collapsed into a single influencing factor - Economic Value Added (EVA) - which Stern Stewart have trademarked.

What is EVA? EVA is simply a firm's expected performance discounted at the cost of capital. By definition, there is no unique method of calculating the EVA of a firm. The accounting protocols vary substantially among firms in an industry and across industries; see, for example, Imhoff et al. (1991), Uyemura et al. (1996) and Black et al. (1997a, b). From a statistical perspective, EVA is positively correlated with MC (or MVA). It is evident that EVA from one time period will likely be affected by past EVA (autocorrelation).

Note also that there is substantial variability across firms within an industry, such as the banking industry; hence the econometrician will likely entertain a random effects model (heteroscedastic influences); see, for example, Zellner (1971).

The EVA concept has usually been implemented as a management tool within an organization; see, for example, Baciadore et al. (1998). Since projects that generate a larger EVA will lead to an increase in wealth, management can be rewarded or reprimanded based on whether or not they have added or reduced value to the firm. Only recently has EVA also come to be recognized as a marketing tool. This challenge was posed by Citicorp Securities to an MBA student team at the University of Michigan, which resulted in the research reported in Black et al. (1997a, b). Larger banks provide services to customers of smaller banks for a fee. Before entering into a contract, the service provider would like to assess the viability of its potential client partner. Reversing this, a larger bank can market its services to smaller banks based on its assessment of the potential client's EVA. In this sense, EVA becomes a marketing tool. Also, EVA assessments can pave the way for mergers and takeovers. A general EVA model for all banking institutions was developed in Black et al. (1997a, b).

The details of financial accounting protocols need not detain us. We will instead concern ourselves with the following questions: given a firm's EVA and MC, is it possible to develop a statistical model that will allow us to predict the firm's future MC? How can we factor in prior knowledge to better forecast the future worth of a firm? Can an iterative and straightforward mathematical system be developed so that a bank can routinely update its data base and forecasts?

As noted in Black et al. (1997a, b), obtaining the data needed to calculate EVA for a given year is made substantially more difficult by inconsistencies in the raw data. Two sources (SNL and One Source) contain electronically stored bank profile data. Neither source carries all the past data needed to compute EVA. To verify whether these two sources of data could be combined for different years, for a specific bank, a consistency check must be carried out by comparing accounting information detailed in the firm's 10-K report. The latter is itself a source of substantial variability, for reasons such as year-to-year changes in itemizing components of the balance sheet. The difficulty increases the farther one goes back in time.

We originally targeted 26 banks that were representative of the US banking industry. These banks had varying asset structures ("large" to "small") and somewhat different internal accounting protocols.

I for four years, 1993 to 1996. Thus the cross-sectional collection of 14 banks data over four years netted in 56 data points for MC and EVA. Fortunately, the 14 banks were representative of the US banking industry as a whole. Further, we dropped a "large" bank from this set of 14 banks. The motivation was to find out how well the model would predict these out-of-sample bank's market capitalization (or share price). We chose not to include any macroeconomic variables in the model based on the fact that the US economy was fairly stable with respect to the banking industry during this time period. We refrained from including any microeconomic variables in the model directly mainly because the bank sponsoring the research was keen on studying the effect of only EVA on MC. Also, we reached an agreement that this would not be a one-off study; both the data-base and the statistical models would be updated periodically. Also, somewhat to our surprise, we found statistical practice in corporate finance to be somewhat archaic. We felt that suggesting more sophisticated methods at the outset may be off-putting to our customer. Instead, we were able to make a compelling argument in favor of increasing the sophistication in stages by demonstrating the value in using - to borrow a phrase of Arnold Zellner - sophisticatedly simple statistical procedures. Prior parameters Based on discussions with the bank's analysts, and the segment of data that was "messy" we entertained the following prior choices for the parameters in the model developed in the previous section. ca ^ N(7.86e9, (3.0d8)2);, - N(3.5e3, (8.0e2)2); and A Gamma(5.0e - 22, 0.1). Having classified the banks into "large" and "small" groups, based on their asset base, resulted in Plarge N(1.873,.70) and pmat - N(0.968,.70). Posterior Estimates As an illustration, the posterior distributions, along with summary statistics, for fi and the p parameters corresponding to the "largest" and "smallest" bank are given in Figure 1. The sponsoring bank was interested in forecasting the market capitalization (or stock price) of another institution in its peer group. We used the p (see, Figure 1) corresponding to the largest bank in the pool of 13 banks as a proxy, and the EVA of the out-of-sample bank to forecast the latter's stock price for the years 1994-1996; these predictions appear in Figure 2. 13

The posterior mean values from these predictions appear reasonable when compared to the actual values. We note that these predictions will likely become more robust as more (clean) data are included in the analysis. Factoring in other variables would also help to better assess the financial viability of a banking institution over time. The sponsoring bank, as noted earlier, has embarked on these extensions as part of a long-term strategy within its econometrics unit.

5 Discussion

The scale mixture of uniform family appears to be a very useful way to encapsulate more realistic assumptions (such as heavy-tailed behaviour) in a variety of modeling contexts. In this paper, theory pertaining to this new family was developed. A Bayesian illustrative analysis within the context of modeling the market value of financial institutions was provided using data obtained from 14 banks. Further extensions of the general idea presented in this paper, in a variety of economic contexts, will be reported elsewhere.

Acknowledgments

This research was supported by a fellowship from IBM Canada and an EPSRC Realising Our Potential Award. Thanks to Andre McKoy, Vice-President of Securities at Citicorp, Inc. for sponsoring and participating in the University of Michigan Business School MAP project, which led to this research.

References

Andrews, D.F. and Mallows, C.L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B 36, 99-102.

Baciadore, J., Boquist, J.A., Milbourn, T.T., and Thakor, A.V. (1996). A search for the best financial performance measure. Journal of Applied Corporate Finance. To appear.

Black, F., Martinez, S., Ou, C., Reinhard, S., Rodriquez, A., Savage, T.W.E., and Wilton, S. (1997). Economic value added model for banks. Multidisciplinary Action Project Report, The University of Michigan Business School, Ann Arbor.

Black, F., Martinez, S., Ou, C., Reinhard, S., Rodriquez, A., Savage, T.W.E., Wilton, S., and Damien, P. (1997). Economic value added model to assess bank performance. Submitted for publication.

Box, G.E.P. and Tiao, G.C. (1973). Bayesian Inference in Statistical Analysis. Addison-Wesley, Massachusetts.

Chen, T., Magnus, E., Marks, A. and Damien, P. (1997). Statistical insights into the performance of EVA. Submitted for publication.

Choy, S.T.B. and Smith, A.F.M. (1997). On robust analysis of a normal location parameter. Journal of the Royal Statistical Society, Series B 59, 463-474.

Copeland, T., Koller, T., and Murrin, J. (1996). Valuation: Measuring and Managing the Value of Companies. John Wiley & Sons, Inc., New York.

Damien, P., Wakefield, J.C. and Walker, S.G. (1997). Gibbs sampling for Bayesian nonconjugate models using auxiliary variables. Journal of the Royal Statistical Society, Series B. To appear.

Damien, P. and Walker, S.G. (1997). Sampling truncated normal, gamma and beta densities. Revised for Statistics and Computing.

Dempster, M. (1997). EVA: get it while it's hot. Monroe Street Journal, October 20, 1997, page 17. University of Michigan Business School publication.

Devroye, L. (1986). Non-uniform Random Variate Generation. Springer-Verlag, New York.

George, E.I. and McCulloch, R. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association 88, 881-889.

Imhoff, E.A., Lipe, R.C., and Wright, D.W. (1991). Operating leases: impact of constructive capitalization. Accounting Horizons, March issue.

Jacquier, E., Polson, N.G., and Rossi, P.E. (1994). Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics 12, 371-389.

Karim, A.M. and Paruolo, P. (1996). Two mixed normal densities from cointegration analysis. Econometrica 65, 671-680.

Smith, A.F.M. and Roberts, G.O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). Journal of the Royal Statistical Society, Series B 55, 3-23.

Stewart, G.B. (1991). The Quest for Value. Harper Collins, New York.

Stewart, G.B. (1994). EVA: fact and fantasy. Journal of Applied Corporate Finance 7(2), 71-84.

Uyemura, D.G. and Kantor, C.C. (1997). Mind & muscle. Banking Strategies 73, January/February issue, 57-59.

Uyemura, D.G., Kantor, C.C., Pettit, J.M., and Stern & Stewart Co. (1992). EVA for banks: value creation, risk management, and profitability measurement. Journal of Applied Corporate Finance 9.

West, M. (1987). On scale mixtures of normal distributions. Biometrika 74, 646-648.

Zaik, E., Walter, J., Kelling, G. and James, C. (1996). RAROC at Bank of America: from theory to practice. Journal of Applied Corporate Finance 9.

Zellner, A. (1987). An Introduction to Bayesian Inference in Econometrics. Robert E. Krieger Publishers, Florida.

Appendix

While the focus of this paper was on the use of EVA to predict MC, it may be desirable to include macroeconomic variables in the model formulation over time as the "clean" data base gets updated. This would then lead to posterior conditional distributions that are proportional to a truncated multivariate normal. Sampling from such a distribution is accomplished as follows; see Damien and Walker (1997). Let
$$f_{X_1, \ldots, X_p}(x_1, \ldots, x_p) \propto \exp\left(-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right) I(x \in A),$$
where we assume that the bounds for $x_i$ given $x_{-i}$ are available and given by, say, $(a_i, b_i)$. Therefore
$$f_{X_i \mid X_{-i}}(x_i \mid x_{-i}) \propto \exp\left(-\tfrac{1}{2}(x_i - \nu_i)^2/\sigma_i^2\right) I(x_i \in (a_i, b_i)), \qquad i = 1, \ldots, p,$$
are the full conditionals, where $\nu_i = \mu_i - \sum_{j \neq i}(x_j - \mu_j)e_{ij}/e_{ii}$ and $\sigma_i^2 = 1/e_{ii}$, with $e_{ij}$ the $(i,j)$th element of $\Sigma^{-1}$.

However, since we are already within a Gibbs sampler, it seems appropriate to implement the auxiliary variable idea (Damien et al., 1997). We do not need to introduce $p$ latent variables; one is sufficient. We define the joint density of $(X_1, \ldots, X_p, Y)$ by
$$f_{X_1, \ldots, X_p, Y}(x_1, \ldots, x_p, y) \propto \exp(-y/2)\, I\big(y > (x - \mu)'\Sigma^{-1}(x - \mu)\big)\, I(x \in A).$$
The full conditional distributions are given by
$$f_{X_i \mid X_{-i}, Y}(x_i \mid x_{-i}, y) \propto I(x_i \in A_i),$$
where $A_i = (a_i, b_i) \cap B_i$, and $B_i$ is the set $\{x_i \mid x_{-i}: (x - \mu)'\Sigma^{-1}(x - \mu) < y\}$, so the bounds for $B_i$ are obtained by solving a quadratic equation. The full conditional for $Y \mid X$ is clearly a truncated exponential distribution, which can be sampled using the cdf inversion technique. Therefore we have a Gibbs sampler which runs on $p + 1$ full conditionals, all of which can be sampled directly using uniform variates.
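A minimal sketch of this scheme (ours; the example mean, covariance and box are purely illustrative) is given below. The coordinate-wise set $B_i$ is found by solving the quadratic in $x_i$, and $Y$ is refreshed with a shifted exponential draw, which is equivalent to the cdf inversion mentioned above.

```python
import numpy as np

rng = np.random.default_rng(5)

def gibbs_truncated_mvn(mu, Sigma, lower, upper, n_iter=5000):
    """Gibbs sampler for N(mu, Sigma) restricted to the box (lower, upper), using the single
    latent variable Y of the Appendix with y > (x - mu)' Sigma^{-1} (x - mu)."""
    P = np.linalg.inv(Sigma)                       # precision matrix, entries e_ij
    x = 0.5 * (lower + upper)                      # a starting point inside the box
    draws = []
    for _ in range(n_iter):
        y = (x - mu) @ P @ (x - mu) + rng.exponential(2.0)      # Y | x: shifted Exp(1/2)
        for i in range(len(mu)):
            d = x - mu
            lin = P[i] @ d - P[i, i] * d[i]        # linear coefficient from the other coordinates
            const = d @ P @ d - 2.0 * lin * d[i] - P[i, i] * d[i] ** 2
            disc = lin ** 2 - P[i, i] * (const - y)   # positive because y exceeds the current quadratic form
            root = np.sqrt(max(disc, 0.0)) / P[i, i]
            centre = mu[i] - lin / P[i, i]
            x[i] = rng.uniform(max(lower[i], centre - root), min(upper[i], centre + root))
        draws.append(x.copy())
    return np.array(draws)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
sample = gibbs_truncated_mvn(mu, Sigma, lower=np.array([0.0, -1.0]), upper=np.array([2.0, 1.0]))
print(sample.mean(axis=0))
```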

In the context of the analyses in this paper, the above algorithm reduces to the following special case of sampling a univariate truncated normal. Suppose we wish to sample from the density given by
$$f_X(x) \propto \exp(-x^2/2)\, I(x \in (a, b)).$$
Again we introduce a latent variable $Y$, which has joint density with $X$ given by
$$f_{X,Y}(x, y) \propto I\big(0 < y < \exp(-x^2/2)\big)\, I(x \in (a, b)),$$
leading to the new full conditionals:
$$Y \mid (X = x) \sim U\big(0, \exp(-x^2/2)\big), \qquad X \mid (Y = y) \sim U\Big(\max\big\{a, -\sqrt{-2\log y}\big\},\; \min\big\{b, \sqrt{-2\log y}\big\}\Big).$$
The algorithm extends the Gibbs loop by one more full conditional, which is a uniform distribution; the new full conditional for $X$ is also uniform.
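A direct transcription of this two-step loop (our sketch; the interval (1, 3) is an arbitrary example) is given below. For intervals far out in the tail one would work with $\log y$ rather than $y$ to avoid underflow of $\exp(-x^2/2)$.

```python
import numpy as np

rng = np.random.default_rng(6)

def truncated_std_normal(a, b, n_iter=2000):
    """Gibbs chain targeting f(x) proportional to exp(-x^2/2) on (a, b), via the latent Y above."""
    x = 0.5 * (a + b)
    out = np.empty(n_iter)
    for j in range(n_iter):
        y = rng.uniform(0.0, np.exp(-0.5 * x * x))     # Y | x ~ U(0, exp(-x^2/2))
        r = np.sqrt(-2.0 * np.log(y))                  # |x| may not exceed r on the slice
        x = rng.uniform(max(a, -r), min(b, r))         # X | y uniform on the overlap with (a, b)
        out[j] = x
    return out

draws = truncated_std_normal(1.0, 3.0)
print(draws.mean())    # close to the mean of a standard normal restricted to (1, 3)
```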

There are alternative ways to sample truncated normal distributions; see, for example, Devroye (1986). Typically, however, these methods require rejection and "tuning"; these factors depend on the likelihood function, so that each time the database is updated, maximization routines have to be re-engineered. The algorithm proposed above bypasses this difficulty. The statistician implementing the model in a bank simply runs the Gibbs sampler like a "black-box" method after merely updating the prior parameters. This obviates the specialized training needed to search for dominating densities or maximize functions, and the need to worry about "tuning" each time the likelihood or prior is updated. Of course, alternative methods may be more efficient, but we found that simplicity in coding and execution was the preferred choice. In any case, Damien and Walker (1997) demonstrate that the algorithm discussed above is at least as efficient as other methods.

Yet another reason for preferring a Gibbs sampler that involves no rejection sampling and/or Metropolis-Hastings detours is the one given in the first paragraph of this Appendix. We noted in the text that it is very unlikely that the number of independent variables in this context will ever exceed six or seven. Model selection in such cases is easily done within the Gibbs loop using the method of George and McCulloch (1993), which is very elegant when the number of dimensions is small. Combining their idea with the method in Damien et al. (1997) will result in an almost "black-box" implementation strategy within a bank's statistical services unit.

Figure 1: Posterior distributions of $\alpha$, $\beta$ and $\rho$. Panel summaries: $\alpha$: mean $7.85 \times 10^9$, std. dev. $3.06 \times 10^8$; $\beta$: mean 3748, std. dev. 698; $\rho$ (large): mean 1.32, std. dev. 0.54; $\rho$ (small): mean 0.53, std. dev. 0.37.

Figure 2: Posterior predictive distributions for a "large" bank. 1994: actual price 41.38, predictive mean 50.4, std. dev. 12.6; 1995: actual price 67.25, predictive mean 72.0, std. dev. 34.3; 1996: actual price 103.0, predictive mean 103.1, std. dev. 85.3.