RESEARCH SUPPORT
UNIVERSITY OF MICHIGAN BUSINESS SCHOOL
DECEMBER 1996

SAMPLING A DIRICHLET PROCESS MIXTURE MODEL

WORKING PAPER #9612-39

STEPHEN WALKER
IMPERIAL COLLEGE, LONDON, UK

AND

PAUL DAMIEN
UNIVERSITY OF MICHIGAN BUSINESS SCHOOL

Sampling a Dirichlet Process Mixture Model

Stephen Walker$^1$ and Paul Damien$^2$

$^1$Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2BZ, UK.
$^2$Department of Statistics and Management Science, School of Business, University of Michigan, Ann Arbor, 48109-1234, USA.

SUMMARY

We introduce a new method for sampling the so-called mixture of Dirichlet process (MDP) model, which has recently received a great deal of attention in the literature (Bush & MacEachern, 1996; Mueller et al., 1996). The solution is based on the introduction of strategic latent variables to facilitate the implementation of a Gibbs sampler in which all full conditional distributions are of known types and can be sampled directly.

Keywords: Latent variables, Gibbs sampler, Dirichlet process.

1 Introduction

This paper is concerned with the following general hierarchical model involving the Dirichlet process, commonly known as the MDP model:

$$z_i \mid \theta_i \sim f(\cdot\,; \theta_i), \quad i = 1, \ldots, n,$$

$$\theta_1, \ldots, \theta_n \mid G \sim_{\text{iid}} G, \qquad G \sim D(\alpha G_0).$$

The notation is taken from the paper of Mueller et al. (1996). Briefly, the probability model assumes $z_i$ given $\theta_i$ ($i = 1, \ldots, n$) is an observation from the known distribution $f(\cdot\,; \theta_i)$. Here $D$ represents a Dirichlet process (Ferguson, 1973) and $G$ is a discrete random distribution (taken from the Dirichlet process) conditional on which the $\{\theta_i\}$ are independent and identically distributed (iid). The parameters of the Dirichlet process are the distribution $G_0$, which centres the process in that $EG = G_0$, and $\alpha > 0$, which loosely represents the strength of belief in $G_0$.

There is now a substantial amount of literature involving this model (Escobar, 1994; Escobar & West, 1995; MacEachern, 1994; West et al., 1994; MacEachern & Mueller, 1994; Bush & MacEachern, 1996; Mueller et al., 1996). Inference is performed using Markov chain Monte Carlo (MCMC) methods (Tierney, 1994) and in particular the Gibbs sampler (Smith and Roberts, 1993). This requires sampling sequentially from the full conditional distributions. For us (at least for the moment) there is no loss of generality in assuming that both $\alpha$ and $G_0$ are fixed.

The full conditional distribution for $\theta_i$ poses the problem for the Gibbs sampler (usually the others are of known types). This distribution is given, up to a constant of proportionality, by

$$f^*(\theta_i) \propto f(z_i \mid \theta_i)\Big\{\alpha G_0(\theta_i) + \sum_{j \ne i} \delta_{\theta_j}(\theta_i)\Big\},$$

where $\delta_\theta(\cdot)$ is the measure putting mass 1 at $\theta$. Here we will always use a $*$ to represent a conditional distribution. If $q_{i0} = \alpha \int f(z_i \mid \theta_i)\, dG_0(\theta_i)$ is analytically tractable then $f^*(\theta_i)$ can be sampled directly. That is, $\theta_i$ is taken from the distribution proportional to $f(z_i \mid \theta_i) G_0(\theta_i)$ with probability proportional to $q_{i0}$, or is taken to be $\theta_j$ ($j \ne i$) with probability proportional to $q_{ij} = f(z_i \mid \theta_j)$. The only difficulties with implementing a Gibbs sampler for the MDP model arise when the integral $\int f(z_i \mid \theta_i)\, dG_0(\theta_i)$ is intractable and hence $q_{i0}$ is not available.
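The tractable case is worth seeing concretely. A minimal sketch of the update above (Python), assuming for illustration the conjugate pair $f(\cdot\,;\theta) = N(\theta, \sigma^2)$ and $G_0 = N(0, \tau^2)$ — our choice for the sketch only, since the paper keeps $f$ and $G_0$ generic:

```python
import numpy as np

def update_theta_i(i, z, theta, alpha, rng, sigma=1.0, tau=1.0):
    """One draw from f*(theta_i) for a conjugate normal MDP model:
    f(z | theta) = N(theta, sigma^2), G0 = N(0, tau^2).
    Weights are computed up to a common constant, as in the text."""
    others = np.delete(theta, i)
    # q_ij proportional to f(z_i | theta_j)
    q = np.exp(-0.5 * (z[i] - others) ** 2 / sigma**2)
    # q_i0 = alpha * integral f(z_i | t) dG0(t) = alpha * N(z_i; 0, sigma^2 + tau^2)
    s2 = sigma**2 + tau**2
    q0 = alpha * np.exp(-0.5 * z[i] ** 2 / s2) * sigma / np.sqrt(s2)
    probs = np.append(q0, q)
    probs /= probs.sum()
    k = rng.choice(len(probs), p=probs)
    if k == 0:
        # fresh draw from the density proportional to f(z_i | t) G0(t)
        post_var = 1.0 / (1.0 / sigma**2 + 1.0 / tau**2)
        return rng.normal(post_var * z[i] / sigma**2, np.sqrt(post_var))
    return others[k - 1]
```

With $\alpha = 0$ the draw always equals one of the other $\theta_j$; as $\alpha$ grows, fresh draws from the base posterior dominate.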
According to MacEachern & Mueller (1994):

Evaluation of the integral expression for $q_{i0}$ is non-trivial unless $G_0(\theta)$ and $f(\cdot \mid \theta)$ are a conjugate pair. Current implementations

therefore either use a conjugate model or rely on approximate computations. Overcoming this computational hurdle is important because of the wide range of current and potential applications of MDP models, and the need in most applications to leave the conjugate framework.

An excellent review of these current approximate computations is found in the MacEachern & Mueller paper. They go on to say:

If the base distribution $G_0(\theta)$ and the likelihood $f(\cdot \mid \theta)$ are chosen as a conjugate pair, then all distributions can be efficiently generated from and no complications arise. If, however, $G_0(\theta)$ is not conjugate with $f(\cdot \mid \theta)$ then resampling the configuration becomes difficult, as the integral $q_{i0}$ may be computationally intensive.

The method of MacEachern & Mueller for solving the nonconjugate model relies on the introduction of a latent model and is described in Section 3 of their paper. We omit the details of their algorithm. In the next Section we introduce a latent model which runs on a Gibbs sampler in which all the full conditional distributions are of known types and can be sampled directly. Essentially we are introducing latent variables which mean that the $q_{i0}$s can be evaluated.

2 A New Latent MDP Model

Recall the problem is sampling from the distribution given by

$$f^*(\theta_i) \propto l(\theta_i)\Big\{\alpha G_0(\theta_i) + \sum_{j \ne i} \delta_{\theta_j}(\theta_i)\Big\},$$

where we have written $f(z_i \mid \theta_i)$ as $l(\theta_i)$. Here we introduce the latent variable $u_i$ and construct the joint density with $\theta_i$, given up to a constant of proportionality by

$$f^*(\theta_i, u_i) \propto I\big(u_i < l(\theta_i)\big)\Big\{\alpha G_0(\theta_i) + \sum_{j \ne i} \delta_{\theta_j}(\theta_i)\Big\},$$

where $I(\cdot)$ represents an indicator function. Clearly the marginal distribution for $\theta_i$ is $f^*$. Note that the full posterior distribution for $(\theta, u)$ is given by

$$f^*(\theta, u) \propto \prod_i \Big\{ I\big(u_i < l(\theta_i)\big) \Big(\alpha G_0(\theta_i) + \sum_{j \ne i} \delta_{\theta_j}(\theta_i)\Big) \Big\}.$$

Therefore the Gibbs sampler now runs over the additional full conditionals given by

$$f^*(u_i \mid \theta_i) = U\big(0, l(\theta_i)\big),$$

where $U(a, b)$ is the uniform distribution on the interval $(a, b)$, and the full conditional for $\theta_i$ is now in the more friendly form

$$f^*(\theta_i \mid u_i) \propto \alpha G_0(\theta_i)\, I\big(l(\theta_i) > u_i\big) + \sum_{j:\, l(\theta_j) > u_i} \delta_{\theta_j}(\theta_i).$$

The full conditional for $u_i$ is obviously trivial to sample. The second should not pose any problem either; it is merely sampling from $G_0$ restricted to a particular set.

Example 1

Here we consider a one-dimensional nonconjugate normal/uniform MDP model. Then $l(\theta_i) = \exp\{-(z_i - \theta_i)^2/(2\sigma^2)\}$ and $G_0 = U(\mu_L, \mu_H)$, and

$$f^*(\theta_i \mid u_i) \propto \alpha(\mu_H - \mu_L)^{-1}\, I\big(\max\{\mu_L, a_{ui}\} < \theta_i < \min\{\mu_H, b_{ui}\}\big) + \sum_{j:\, l(\theta_j) > u_i} \delta_{\theta_j}(\theta_i),$$

where $a_{ui} = z_i - \sigma\sqrt{-2\log u_i}$ and $b_{ui} = z_i + \sigma\sqrt{-2\log u_i}$ (note that $u_i < 1$). Thus $\theta_i$ is taken from the uniform distribution on the interval $(\max\{\mu_L, a_{ui}\}, \min\{\mu_H, b_{ui}\})$ with probability proportional to

$$q_{i0} = \alpha(\mu_H - \mu_L)^{-1}\big(\min\{\mu_H, b_{ui}\} - \max\{\mu_L, a_{ui}\}\big),$$

or is equal to $\theta_j$ with $l(\theta_j) > u_i$ with probability proportional to 1. Note therefore that if $\{\theta_j : l(\theta_j) > u_i\} = \emptyset$ then $\theta_i$ is automatically taken from the uniform distribution.
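A single update of $(u_i, \theta_i)$ for this example can be sketched as follows (Python; the function name and in-place updating are our conventions, not the paper's):

```python
import numpy as np

def update_example1(i, z, theta, u, alpha, mu_L, mu_H, sigma, rng):
    """Latent-variable Gibbs step for the normal/uniform MDP model of
    Example 1: draw u_i ~ U(0, l(theta_i)), then draw theta_i from its
    now-tractable full conditional."""
    # l(theta) = exp{-(z_i - theta)^2 / (2 sigma^2)}
    l = lambda t: np.exp(-0.5 * (z[i] - t) ** 2 / sigma**2)
    u[i] = rng.uniform(0.0, l(theta[i]))
    half = sigma * np.sqrt(-2.0 * np.log(u[i]))
    lo, hi = max(mu_L, z[i] - half), min(mu_H, z[i] + half)
    others = np.delete(theta, i)
    atoms = others[l(others) > u[i]]          # {theta_j : l(theta_j) > u_i}
    q0 = alpha * (hi - lo) / (mu_H - mu_L)    # the 'new' q_i0
    if rng.uniform() < q0 / (q0 + len(atoms)):
        theta[i] = rng.uniform(lo, hi)        # restricted G0 (uniform) draw
    else:
        theta[i] = rng.choice(atoms)          # reuse an existing atom
    return theta[i]
```

When the atom set is empty the uniform branch is taken with probability one, matching the remark above.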

For the general nonconjugate model let $A_{ui} = \{\theta_i : l(\theta_i) > u_i\}$, so we can write

$$f^*(\theta_i \mid u_i) \propto \alpha G_0(A_{ui})\, G_0(\theta_i \mid A_{ui}) + \sum_{j:\, l(\theta_j) > u_i} \delta_{\theta_j}(\theta_i),$$

where $G_0(\theta_i \mid A_{ui})$ is the normalised $G_0(\theta_i) I(\theta_i \in A_{ui})$. Therefore we take $\theta_i$ from $G_0(\cdot \mid A_{ui})$ with probability $\alpha G_0(A_{ui})/\big(\alpha G_0(A_{ui}) + \sum_{j:\, l(\theta_j) > u_i} 1\big)$, or take $\theta_i = \theta_j$ (with $l(\theta_j) > u_i$) with probability $1/\big(\alpha G_0(A_{ui}) + \sum_{j:\, l(\theta_j) > u_i} 1\big)$. Note that the 'new' $q_{i0} = \alpha G_0(A_{ui})$.

Example 2

Here we consider a binomial model with logit link; that is, $z_i \mid p_i \sim \text{binomial}(p_i, n_i)$ and $\text{logit}\, p_i = \theta_i$. Therefore

$$f(z_i \mid \theta_i) \propto \frac{\exp(\theta_i z_i)}{(1 + \exp(\theta_i))^{n_i}}.$$

For this model we introduce the latent variables $(u_i, v_i)$ so that the joint posterior with $\theta_i$ is given by

$$f^*(\theta_i, u_i, v_i) \propto e^{-z_i u_i - (n_i - z_i) v_i}\, I\big(u_i > \log\{1 + e^{-\theta_i}\},\; v_i > \log\{1 + e^{\theta_i}\}\big)\, G_0(\theta_i),$$

where $G_0(\theta_i) = N(\theta_i \mid \mu, \sigma^2)$ is the prior. Clearly the marginal posterior for $\theta_i$ is as required. With the inclusion of the Dirichlet prior the full conditional for $\theta_i$ becomes

$$f^*(\theta_i \mid u_i, v_i) \propto I(a_i < \theta_i < b_i)\Big\{\alpha G_0(\theta_i) + \sum_{j \ne i} \delta_{\theta_j}(\theta_i)\Big\},$$

where $a_i = -\log(e^{u_i} - 1)$ and $b_i = \log(e^{v_i} - 1)$. Therefore we take $\theta_i = \theta_j$ with $a_i < \theta_j < b_i$, or from a truncated normal distribution, according to easily available probabilities, noting that $q_{i0} = \alpha \int_{a_i}^{b_i} N(\theta \mid \mu, \sigma^2)\, d\theta$. Complications can arise when $\theta_i$ is $p$-dimensional ($p > 1$) since obtaining
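The $(u_i, v_i)$ step for this example reduces to drawing two shifted exponentials and inverting them into an interval for $\theta_i$. A minimal sketch (Python; `logit_step` is our naming, and we assume $0 < z_i < n_i$ — the boundary cases simply drop the corresponding latent variable):

```python
import numpy as np

def logit_step(theta_i, z_i, n_i, rng):
    """One latent-variable step for Example 2's binomial-logit model.
    Draws (u_i, v_i) from their full conditionals and returns the
    truncation interval (a_i, b_i) for theta_i."""
    lo_u = np.log1p(np.exp(-theta_i))   # constraint u_i > log(1 + e^{-theta_i})
    lo_v = np.log1p(np.exp(theta_i))    # constraint v_i > log(1 + e^{theta_i})
    # full conditionals are exponentials e^{-z_i u}, e^{-(n_i - z_i) v}
    # shifted to start at the constraint boundaries
    u = lo_u + rng.exponential(1.0 / z_i)
    v = lo_v + rng.exponential(1.0 / (n_i - z_i))
    a = -np.log(np.expm1(u))            # a_i = -log(e^{u_i} - 1)
    b = np.log(np.expm1(v))             # b_i =  log(e^{v_i} - 1)
    return a, b
```

By construction the current $\theta_i$ always lies in $(a_i, b_i)$, so the truncated set is never empty.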

a $p$-dimensional $A_{ui}$ may be difficult. A solution to this problem is to consider $\theta_i = (\theta_{i1}, \ldots, \theta_{ip})$, where $p$ is the dimension of each $\theta_i$. The strategy is to replace the sampling from $f^*(\theta_i \mid u_i)$ by sampling from $f^*(\theta_{ik} \mid \theta_{i(-k)}, u_i)$, for $k = 1, \ldots, p$. Here

$$f^*(\theta_{ik} \mid \theta_{i(-k)}, u_i) = \frac{\alpha G_0(\theta_{ik}, A_{ui}) + \sum_{j:\, l(\theta_j) > u_i} \delta_{\theta_{jk}}(\theta_{ik})}{\alpha G_0(\theta_{i(-k)}, A_{ui}) + \sum_{j:\, l(\theta_j) > u_i} 1},$$

where $q_{i0} = \alpha G_0(A_{ui})$ and $G_0(\theta_{i(-k)}, A_{ui}) = \int G_0(\theta_i, A_{ui})\, d\theta_{ik}$.

Example 3

Here we consider a probit model for which $z_i \mid \beta_i \sim \text{binomial}\big(\Phi(\beta_{0i} + \beta_{1i} x_i), n_i\big)$, $i = 1, \ldots, n$, where $\Phi(\cdot)$ represents the standard normal distribution function. The full conditional distribution for $\beta_i$ is given by

$$f^*(\beta_i) \propto \Big\{\Phi(\beta_{0i} + \beta_{1i} x_i)^{z_i} \big(1 - \Phi(\beta_{0i} + \beta_{1i} x_i)\big)^{n_i - z_i}\Big\}\Big\{\alpha G_0(\beta_i) + \sum_{j \ne i} \delta_{\beta_j}(\beta_i)\Big\},$$

and we assume $G_0(\beta_i) = N(\beta_i; \mu, \Sigma)$. We introduce the latent variables $(u_i, v_i)$ such that their joint density with $\beta_i$ is given, up to a constant of proportionality, by

$$f^*(\beta_i, u_i, v_i) \propto I\Big(u_i < \Phi(\beta_{0i} + \beta_{1i} x_i)^{z_i},\; v_i < \big\{1 - \Phi(\beta_{0i} + \beta_{1i} x_i)\big\}^{n_i - z_i}\Big) \times \Big(\alpha G_0(\beta_i) + \sum_{j \ne i} \delta_{\beta_j}(\beta_i)\Big).$$

The full conditional for $\beta_{0i}$ is given by

$$f^*(\beta_{0i} \mid \beta_{1i}, u_i, v_i) \propto I(a_i < \beta_{0i} < b_i)\Big\{\alpha N(\beta_{0i}; \mu_0, \sigma_0^2) + \sum_j \delta_{\beta_{0j}}(\beta_{0i})\Big\},$$

and the full conditional for $\beta_{1i}$ is given by

$$f^*(\beta_{1i} \mid \beta_{0i}, u_i, v_i) \propto I(c_i < \beta_{1i} < d_i)\Big\{\alpha N(\beta_{1i}; \mu_1, \sigma_1^2) + \sum_j \delta_{\beta_{1j}}(\beta_{1i})\Big\},$$

where $a_i = \Phi^{-1}\big(u_i^{1/z_i}\big) - \beta_{1i} x_i$, $b_i = \Phi^{-1}\big(1 - v_i^{1/(n_i - z_i)}\big) - \beta_{1i} x_i$, $c_i = \big\{\Phi^{-1}\big(u_i^{1/z_i}\big) - \beta_{0i}\big\}/x_i$ and $d_i = \big\{\Phi^{-1}\big(1 - v_i^{1/(n_i - z_i)}\big) - \beta_{0i}\big\}/x_i$. Obvious modifications are required if $z_i = 0$, $z_i = n_i$, or $x_i < 0$.

Provided we can implement the algorithm for the parametric model $z_i \mid \theta_i \sim f(\cdot\,; \theta_i)$ with $\theta_i \sim_{\text{iid}} G_0$ using the latent variable idea, then we can also implement it for the corresponding MDP model. Damien and Walker (1996) show that such an algorithm is applicable for a large class of nonconjugate models. This is detailed in the following Theorem, whose proof is omitted since it is similar to the theorem appearing in Damien and Walker (1996), in which further details and examples can be found.

THEOREM. If $l(\theta) = \prod_{k=1}^{m} l_k(\theta)$, where the $l_k(\theta)$ are nonnegative invertible functions, that is, if $l_k(\theta) > u_k$ then it is possible to obtain the set $A_{k u_k} = \{\theta : l_k(\theta) > u_k\}$, then it is possible to implement a Gibbs sampler for generating random variates from the MDP model in which all the full conditionals are of known types.

Note that the 'new' $q_{i0} = \alpha G_0\big(\bigcap_k A_{k u_k}\big)$.

3 Numerical Example

We use the same example as that used by MacEachern et al. (1996), who demonstrate their "next generation" sequential importance sampling technique in that paper. The data come from Beckett and Diaconis (1994) and consist of 320 binomial experiments in which 320 thumbtacks were flicked nine times each. The datum for each experiment is the number of times the tack landed point up. That is, for $i = 1, \ldots, 320$,

$$z_i \mid p_i \sim \text{binomial}(p_i, 9).$$

Therefore, taking $G_0 = U(0, 1)$ and introducing latent variables $(u_i, v_i)$, we have

$$f^*(p_i, u_i, v_i) \propto I\big(u_i < p_i^{z_i},\; v_i < (1 - p_i)^{9 - z_i}\big)\Big\{\alpha I(0 < p_i < 1) + \sum_{j \ne i} \delta_{p_j}(p_i)\Big\}.$$
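This joint density is an instance of the Theorem: $l(p_i)$ factors into $p_i^{z_i}$ and $(1-p_i)^{9-z_i}$, each invertible. A minimal sketch of the resulting sampler (Python; `thumbtack_gibbs` is our naming, the counts passed in would be the observed data, and the interval endpoints below follow by inverting the two indicator constraints):

```python
import numpy as np

def thumbtack_gibbs(z, alpha, iters, rng, m=9):
    """Gibbs sampler for the thumbtack MDP model with G0 = U(0,1),
    using latent (u_i, v_i).  Inverting u_i < p_i^{z_i} and
    v_i < (1 - p_i)^{m - z_i} gives the interval (a_i, b_i) for p_i."""
    n = len(z)
    p = rng.uniform(size=n)
    pred = []                              # draws of p_{n+1} | data
    for _ in range(iters):
        for i in range(n):
            u = rng.uniform(0.0, p[i] ** z[i])
            v = rng.uniform(0.0, (1.0 - p[i]) ** (m - z[i]))
            a = u ** (1.0 / z[i]) if z[i] > 0 else 0.0
            b = 1.0 - v ** (1.0 / (m - z[i])) if z[i] < m else 1.0
            others = np.delete(p, i)
            atoms = others[(others > a) & (others < b)]
            q0 = alpha * (b - a)           # alpha * G0((a_i, b_i)) for G0 = U(0,1)
            if rng.uniform() < q0 / (q0 + len(atoms)):
                p[i] = rng.uniform(a, b)   # 'new' value from restricted G0
            else:
                p[i] = rng.choice(atoms)   # reuse an existing atom
        # predictive draw for p_{n+1}: Polya-urn step given the current p
        pred.append(rng.uniform() if rng.uniform() < alpha / (alpha + n)
                    else rng.choice(p))
    return np.array(pred)
```

The collected `pred` draws, suitably smoothed, estimate the predictive density of $p_{n+1}$ given the data.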

Therefore

$$f^*(p_i \mid u_i, v_i) \propto \alpha I(a_i < p_i < b_i) + \sum_{j:\, a_i < p_j < b_i} \delta_{p_j}(p_i),$$

where

$$a_i = \begin{cases} u_i^{1/z_i} & \text{if } z_i > 0 \\ 0 & \text{otherwise,} \end{cases} \qquad b_i = \begin{cases} 1 - v_i^{1/(9 - z_i)} & \text{if } z_i < 9 \\ 1 & \text{otherwise.} \end{cases}$$

Also $f^*(u_i) = U\big(0, p_i^{z_i}\big)$ and $f^*(v_i) = U\big(0, (1 - p_i)^{9 - z_i}\big)$. These full conditionals are sampled straightforwardly.

We implement the Gibbs sampler with $\alpha$ fixed at four different values (0.1, 1, 5, 10). Within each iteration we generate a sample from the distribution of $p_{n+1} \mid \text{data}$ to compare with the analysis of MacEachern et al. (1996). We collect 5000 samples for each of the four analyses, which takes several seconds. The estimates of the predictive density of $p_{n+1} \mid \text{data}$ for the four values of $\alpha$ are shown in Figure 1. These compare very well with the results of MacEachern et al. (1996, Figure 2).

4 Conclusions

The MDP model is very useful in a variety of applications. With the increasing use of the Gibbs sampler in Bayesian analysis it is necessary to have easy and fast ways of generating random variates from awkward conditional distributions. For the MDP model, several researchers referred to in this paper have pointed out a serious computational hurdle in implementing the Gibbs sampler. In this paper, we have provided a general solution to the problem which bypasses the computational difficulty.

References

Beckett, L. & Diaconis, P. (1994). Spectral analysis for discrete longitudinal data. Adv. in Math. 103, 107-128.

Bush, C.A. & MacEachern, S.N. (1996). A semiparametric Bayesian model

for randomised block designs. Biometrika 83, 275-286.

Damien, P. & Walker, S.G. (1996). Sampling probability densities via uniform random variables and a Gibbs sampler. Submitted for publication.

Escobar, M.D. (1994). Estimating normal means with a Dirichlet process prior. J. Am. Statist. Assoc. 89, 268-277.

Escobar, M.D. & West, M. (1995). Bayesian density estimation and inference using mixtures. J. Am. Statist. Assoc. 90, 577-588.

Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.

MacEachern, S.N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Commun. Statist. Simul. and Comp. 23, 727-741.

MacEachern, S.N. & Mueller, P. (1994). Estimating mixtures of Dirichlet process models.

MacEachern, S.N., Clyde, M. & Liu, J.S. (1996). Sequential importance sampling for nonparametric Bayes models: The next generation.

Mueller, P., Erkanli, A. & West, M. (1996). Bayesian curve fitting using multivariate normal mixtures. Biometrika 83, 67-79.

Smith, A.F.M. & Roberts, G.O. (1993). Bayesian computations via the Gibbs sampler and related Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 55, 3-23.

Tierney, L. (1994). Markov chains for exploring posterior distributions. Ann. Statist. 22, 1701-1762.

West, M., Mueller, P. & Escobar, M.D. (1994). Hierarchical priors and mixture models, with application in regression and density estimation. In Aspects of Uncertainty: A Tribute to D.V. Lindley, eds A.F.M. Smith and P. Freeman, 363-368.

Figure 1: Estimates of the densities for $p_{n+1}$ given the data, for four values of $\alpha$ (panels: $\alpha = 0.1$, $\alpha = 1$, $\alpha = 5$, $\alpha = 10$).