Computations via Auxiliary Random Functions for Survival Models

Purushottam W. Laud (1), Paul Damien (2), Stephen G. Walker (3)

(1) Division of Biostatistics, Medical College of Wisconsin, Milwaukee, USA.
(2) University of Michigan Business School, Ann Arbor, USA.
(3) Department of Mathematical Sciences, University of Bath, UK.

SUMMARY

A new simulation method, Auxiliary Random Functions, is introduced. When used within a Gibbs sampler, this method enables a unified treatment of exact, right-censored, left-censored, left-truncated and interval-censored data, with and without covariates, in survival models. The models and methods are exemplified via illustrative analyses.

Key Words: Extended gamma process, Censored data, Gibbs sampling, Auxiliary functions, Covariates, Bioassay.

1 Introduction

The prior modelling of increasing or decreasing hazard rate functions using the extended gamma process is relatively straightforward; see, for example, Dykstra and Laud (1981). However, serious problems arise in summarising posterior distributions. Even using a Gibbs sampler, a number of computational difficulties remain (Laud et al., 1996). In this paper, we synthesize the Dykstra and Laud (1981) model with computational ideas detailed in Damien, Wakefield and Walker (1999). These authors use auxiliary variables to simplify existing Gibbs samplers by ensuring that full conditional distributions take simple forms. Here we develop the idea of Damien et al. (1999) to solve the computational problems of Laud et al. (1996) via the use of auxiliary random functions (ARF). It is well known that there are unique computational difficulties associated with Bayes nonparametric models involving the Gibbs sampler; see, for example, the innovative ideas adopted by MacEachern and Müller (1998) and reviewed by MacEachern (1998) for the Dirichlet process.

2 Prior Model

Here we collect, briefly, the key properties of the extended gamma process that will be of use in the rest of the paper; see Dykstra and Laud (1981) and Amman (1984) for further details. To begin, we consider increasing hazard rate functions. Let $G(\alpha, \beta)$ denote the gamma distribution with shape parameter $\alpha \ge 0$ and scale parameter $1/\beta > 0$. For $\alpha = 0$, we define this distribution to be degenerate at zero. For $\alpha > 0$, its density is
$$
g(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0,
$$
and $g(x \mid \alpha, \beta) = 0$ for $x \le 0$.

Let $\alpha(t)$, $t \ge 0$, be a non-decreasing left-continuous real-valued function such that $\alpha(0) = 0$, and let $\beta(t)$, $t \ge 0$, be a positive right-continuous real-valued function with left-hand limits existing and bounded away from $0$ and $\infty$. Take $Z(t)$, $t \ge 0$, to be a gamma process with parameter function $\alpha(\cdot)$. That is, $Z(0) = 0$, $Z(t)$ has independent increments and, for $t > s$, $Z(t) - Z(s)$ is $G(\alpha(t) - \alpha(s), 1)$. Considering a version of this process such that all its sample paths are non-decreasing and left-continuous, let
$$
r(t) = \int_0^t [\beta(s)]^{-1}\, dZ(s).
$$
This process $\{r(t),\ t \ge 0\}$ is called the Extended Gamma (EG) process and we denote it by $r(t) \sim \mathrm{EG}(\alpha(\cdot), \beta(\cdot))$.

Several aspects of the posterior distribution are given by Dykstra and Laud (1981), Laud (1977) and Amman (1984). These can be summarized as follows:

Fact 2.1: The EG process is conjugate with respect to right-censored data. In particular, given an observation $X > x$, writing $\int_0^x r(t)\,dt$ as $\int_0^\infty (x - t)^+\, dr(t)$, it follows that the posterior process is again EG with $\alpha(\cdot)$ unchanged and the $\beta$ parameter modified as $\hat\beta(t) = \beta(t) + (x - t)^+$. Here the superscript $+$ on a quantity denotes its positive part.

Fact 2.2: The posterior with respect to exact data is a mixture of EG processes. The mixing measure is $n$-dimensional, where $n$ is the number of exact observations, and has both continuous and jump components.

Fact 2.3: Although analytic expressions for the posterior mean and variance functions are given by the above-mentioned authors, their numerical evaluation is virtually impossible, especially as the sample size increases to more than around 20. As a consequence, these authors do not even consider the possibility of including covariates.
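For concreteness, here is a minimal sketch (in Python) of one way to approximate a prior sample path of the EG process on a finite grid: the gamma-process increments are drawn directly as gamma variates and scaled by $1/\beta$. The grid and the parameter functions in the example call are illustrative assumptions only, not the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_eg_path(alpha, beta, grid):
    """Approximate a sample path of the EG process r(t) on a grid.

    alpha : callable, non-decreasing shape function with alpha(0) = 0
    beta  : callable, positive scale function
    grid  : increasing array of time points starting at 0

    Each increment of the gamma process Z over (t_{j-1}, t_j] is drawn as
    Gamma(alpha(t_j) - alpha(t_{j-1}), 1); dividing by beta and summing
    approximates r(t) = int_0^t beta(s)^{-1} dZ(s).
    """
    shapes = np.diff([alpha(t) for t in grid])             # alpha increments
    dZ = rng.gamma(shape=np.maximum(shapes, 1e-12), scale=1.0)
    dr = dZ / np.array([beta(t) for t in grid[1:]])        # scale by 1/beta
    return np.concatenate([[0.0], np.cumsum(dr)])          # r on the grid

# Illustrative (hypothetical) parameter functions
grid = np.linspace(0.0, 10.0, 201)
r_path = sample_eg_path(lambda t: 0.1 * t**2, lambda t: 2.0, grid)
```

Finer grids improve the approximation; as noted in Section 3, this gamma-variate approximation of the increments is often adequate, with exact simulation via Bondesson (1982) or Laud, Smith and Damien (1996) available as an alternative.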

Our goal is to provide the basis for a full Bayesian analysis of the increasing (and later the decreasing and bathtub-shaped) hazard rate models for various types of censored data as well as exact observations, both with and without covariates. For ease of exposition, we first focus attention on the increasing case. The rest of the discussion applies also to data arising in a bioassay, because a bioassay can be treated as a censored data problem. In addition, these methods also work for estimating the intensity rate function of a nonhomogeneous Poisson process, because the form of the likelihood in this case is the same as that for survival data (see, for example, Lawless (1982)).

3 Posterior Computations Via ARFs

Damien et al. (1999) develop a general method to sample from posterior distributions using auxiliary variables. Suppose the required conditional distribution for a random variable $X$ is denoted $f$. The basic idea is to introduce an auxiliary variable $U$, construct the joint density of $U$ and $X$ with marginal density for $X$ given by $f$, and then extend the Gibbs sampler to include the extra full conditional for $U$. In a Bayesian context, consider the posterior density given by $f(x) \propto l(x)\pi(x)$ and suppose it is not possible to sample directly from $f$. The general idea proposed by Damien et al. is to introduce an auxiliary variable $Y$, defined on the interval $(0, \infty)$, or more strictly the interval $(0, l(\hat x))$, where $\hat x$ maximises $l(\cdot)$, and to define the joint density with $X$ by $f(x, y) \propto I(y < l(x))\,\pi(x)$. The full conditional for $Y$ is $U(0, l(x))$, a uniform variable on the interval $(0, l(x))$, and the full conditional for $X$ is $\pi$ restricted to the set $A_y = \{x : l(x) > y\}$.
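As a concrete illustration of this auxiliary-variable scheme, here is a minimal sketch of the two-step Gibbs cycle for a generic target $f(x) \propto l(x)\pi(x)$. The particular choices of $l$ and $\pi$ in the example call are hypothetical, chosen only so that the restricted draw from $\pi$ on $A_y$ can be done by inversion.

```python
import numpy as np

rng = np.random.default_rng(1)

def slice_gibbs(l, sample_pi_restricted, x0, n_iter=1000):
    """Auxiliary-variable Gibbs sampler for f(x) proportional to l(x)*pi(x).

    l                    : the likelihood factor l(x)
    sample_pi_restricted : draws from pi restricted to A_y = {x : l(x) > y}
    """
    x, draws = x0, []
    for _ in range(n_iter):
        y = rng.uniform(0.0, l(x))          # [Y | x] ~ U(0, l(x))
        x = sample_pi_restricted(y)         # [X | y] ~ pi restricted to A_y
        draws.append(x)
    return np.array(draws)

# Hypothetical example: pi = Exp(1) prior, l(x) = exp(-(x - 2)^2).
# Then A_y = {x : (x - 2)^2 < -log y}, an interval, so the restricted
# draw can be made by inverting the exponential CDF on that interval.
def sample_exp_on_interval(a, b):
    u = rng.uniform(np.exp(-b), np.exp(-max(a, 0.0)))
    return -np.log(u)

draws = slice_gibbs(
    l=lambda x: np.exp(-(x - 2.0) ** 2),
    sample_pi_restricted=lambda y: sample_exp_on_interval(
        2.0 - np.sqrt(-np.log(y)), 2.0 + np.sqrt(-np.log(y))),
    x0=2.0,
)
```

Any method of sampling $\pi$ restricted to $A_y$ will do; inversion is used here only because the example makes $A_y$ an interval.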

In this paper, we develop a simulation method that is particularly suited to the context of sampling the monotone hazard rate processes. Whereas Damien et al. use latent variables to put in place a function which depends on a finite number of parameters, the latent variables used in this paper put an entire random function into place. Hence we refer to it as the auxiliary random function (ARF) method. To introduce the method, consider the increasing hazard rate case without covariates, and therefore drop the superscript $I$.

Exact observations. The contribution to the likelihood resulting from one exact observation at $x$ is given by
$$
f(x) = r(x) F(x) = r(x) \exp\left(-\int_0^x r(s)\,ds\right),
$$
where $F$ is the survival function. Based on Fact 2.1, it is clear that $F(x)$, being the contribution of a right-censored observation, updates the prior parameters of the EG process. In the above equation, from an ARF perspective, the following question arises: how do we substitute the random function $r(\cdot)$ so that sampling from the posterior is accomplished readily? Take the cumulative distribution function of a random variable $Y$, given the hazard rate process $r(\cdot)$, to be
$$
P(Y \le y \mid r(\cdot)) = \frac{r(y)}{r(x)}\, I(0 \le y < x) + I(y \ge x).
$$
The distribution of $Y$ is defined in terms of a random function, hence the phrase "auxiliary random function". It is in this sense that the ARF algorithm differs from Damien et al.'s algorithm. To sample $[Y \mid r(\cdot)]$, let $Y = r^{-1}(U r(x))$, where $U \sim U(0,1)$ and $r^{-1}(w) = \inf\{t : r(t) \ge w\}$. To address sampling $[r(\cdot) \mid Y = y]$, write the density of $Y$ given $r(\cdot)$ as
$$
[y \mid r(\cdot)] \propto \frac{dr(y)}{r(x)}\, I(0 \le y \le x).
$$
Combining this with the prior and the likelihood, we arrive at
$$
[r(\cdot) \mid y, X = x] \propto [r(\cdot)]\, dr(y) \exp\left(-\int_0^x r(s)\,ds\right),
$$
where $dr(y)$ denotes the increment of $r(\cdot)$ at $y$. It is clear that this term combines naturally with the EG process, adding a unit jump at $y$ to the shape parameter $\alpha(\cdot)$. Thus, as a result of the ARF substitution and Fact 2.1, the posterior full conditional for $r(\cdot)$ is once again an EG process with the parameters updated by
$$
\hat\alpha(s) = \alpha(s) + I(s > y) \quad \text{and} \quad \hat\beta(s) = \beta(s) + (x - s)^+.
$$
The full conditional distribution of the hazard rate function after the ARF substitution thus reduces to an EG process. The increments of the EG process can be simulated using the Bondesson (1982) method, as described in Laud, Smith and Damien (1996), or, as is often adequate, simply approximated as gamma variates.

Left censored data. The contribution to the likelihood from a left-censored observation $X < x$ is
$$
1 - F(x) = 1 - \exp\left(-\int_0^x r(s)\,ds\right).
$$
First, introducing the auxiliary variate $[W \mid r(\cdot)]$ with hazard rate $r(\cdot)$ restricted to $[0, x]$, we have
$$
[w \mid r(\cdot), X < x] = \frac{r(w)\exp\left(-\int_0^w r(s)\,ds\right)}{1 - \exp\left(-\int_0^x r(s)\,ds\right)}\, I(0 \le w \le x)
$$
and
$$
[r(\cdot) \mid w, X < x] \propto [r(\cdot)]\, r(w) \exp\left(-\int_0^w r(s)\,ds\right).
$$
Now, in a manner similar to the exact data scenario, we introduce an ARF $Y$ and proceed as described above.
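To make the exact-data update concrete, the following hedged, grid-based sketch performs one ARF Gibbs step for a single exact observation at $x$: draw $Y$ by inverting the current path of $r(\cdot)$, update $\alpha(\cdot)$ and $\beta(\cdot)$ as above, and redraw the increments of $r(\cdot)$ approximately as gamma variates (the approximation noted above as often adequate). The grid, the prior parameter functions and the helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def arf_gibbs_step(r_path, grid, x, alpha0, beta0):
    """One ARF Gibbs update for a single exact observation at x.

    r_path : current non-decreasing hazard rate path r(t) evaluated on `grid`
    alpha0, beta0 : prior shape and scale parameter functions (callables)
    Returns (y, updated r path on `grid`).
    """
    # 1. [Y | r(.)]: Y = r^{-1}(U r(x)) with U ~ U(0, 1)
    r_x = np.interp(x, grid, r_path)
    y = grid[np.searchsorted(r_path, rng.uniform(0.0, 1.0) * r_x)]

    # 2. ARF substitution plus Fact 2.1: alpha gains a unit jump at y,
    #    beta gains (x - s)^+
    d_alpha = np.diff([alpha0(t) for t in grid]).astype(float)
    d_alpha[min(np.searchsorted(grid, y), len(d_alpha) - 1)] += 1.0
    mids = 0.5 * (grid[:-1] + grid[1:])
    beta_hat = np.array([beta0(t) for t in mids]) + np.maximum(x - mids, 0.0)

    # 3. [r(.) | y, X = x]: redraw the EG increments, approximated here
    #    by independent gamma variates on the grid
    dr = rng.gamma(shape=np.maximum(d_alpha, 1e-12), scale=1.0) / beta_hat
    return y, np.concatenate([[0.0], np.cumsum(dr)])
```

In a full sampler one would carry one auxiliary $y_i$ per exact observation and let the scale function accumulate a $(x_i - s)^+$ term per observation, which is the natural repeated application of Fact 2.1.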

Interval censored data. The contribution of an observation $a < X < b$ to the likelihood is
$$
F(a) - F(b) = \exp\left(-\int_0^a r(s)\,ds\right) - \exp\left(-\int_0^b r(s)\,ds\right).
$$
Employing $[W \mid r(\cdot)]$ with hazard rate $r(\cdot)$ restricted to $(a, b]$ yields the same full conditional for $r(\cdot)$ as in the left-censored data scenario above.

Left truncated data. Here one observes $X$ (or a censored version of it) only if it exceeds an observed left truncation time $L$. Thus the contribution of $X = x$, $L = l$ to the likelihood is
$$
\frac{f(x)}{F(l)}\, I(l < x) = r(x) \exp\left(-\int_l^x r(t)\,dt\right) I(l < x).
$$
Once again, the ARF $Y$ as defined above suffices to substitute for $r(x)$. It can be shown easily that the effect of the term $\exp\left(-\int_l^x r(t)\,dt\right)$ is to modify the scale function given in Fact 2.1 as
$$
\hat\beta(t) = \beta(t) + \left(x - \max(t, l)\right)^+.
$$

4 Cox Model Computations Via ARFs

Consider now the extension of the nonparametric analysis to include covariates. Employing the Cox model (Cox, 1972), and initially tackling a single right-censored observation, let $\beta$ and $z$ denote, respectively, the vectors of regression parameters and covariates. The likelihood function for a right-censored observation is given by
$$
L(r(\cdot), \beta) = P(X > x \mid r(\cdot), \beta) = \exp\left(-e^{z'\beta} \int_0^x r(s)\,ds\right).
$$
Conditioning on $\beta$ and following the development leading to Fact 2.1, we easily arrive at $[r(\cdot) \mid \beta, X > x] \sim \mathrm{EG}(\alpha(\cdot), \hat\beta(\cdot))$, where $\hat\beta(t) = \beta(t) + e^{z'\beta}(x - t)^+$. In the case of exact data,
$$
[r(\cdot) \mid \beta, X = x] \propto [r(\cdot)]\, e^{z'\beta} r(x) \exp\left(-e^{z'\beta} \int_0^x r(s)\,ds\right).
$$
Clearly, the problem now yields to the same ARF substitution employed in the case without covariates. To examine the full conditional distribution of $\beta$, let $\delta$ denote the usual "non-censoring" indicator and write the likelihood as
$$
L(r(\cdot), \beta; x, \delta) \propto e^{\delta z'\beta} \exp\left(-e^{z'\beta} \int_0^x r(s)\,ds\right).
$$
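As a small, hedged illustration of the right-censored Cox update just derived, the sketch below evaluates the modified scale function on a grid. For a single observation this is exactly the displayed update $\hat\beta(t) = \beta(t) + e^{z'\beta}(x - t)^+$; accumulating the contributions of several right-censored observations is the natural repeated application of the same fact. The regression coefficients are written as `b` in the code to avoid the clash with the scale function, and all inputs in the example call are hypothetical.

```python
import numpy as np

def cox_updated_scale(beta0, t_grid, x_cens, Z_cens, b):
    """Updated EG scale function beta_hat(t) on a grid, given the regression
    coefficients, for right-censored observations under the Cox model.

    beta0  : prior scale function beta(t) (callable)
    t_grid : time points at which to evaluate beta_hat
    x_cens : array of right-censoring times x_i
    Z_cens : matrix of covariate vectors z_i (one row per observation)
    b      : regression coefficient vector ("beta" in the text)
    """
    risk = np.exp(Z_cens @ b)                              # e^{z_i' b}
    beta_hat = np.array([beta0(t) for t in t_grid], dtype=float)
    for xi, ri in zip(x_cens, risk):
        beta_hat += ri * np.maximum(xi - t_grid, 0.0)      # + e^{z'b}(x_i - t)^+
    return beta_hat

# Illustrative call with hypothetical inputs
t_grid = np.linspace(0.0, 5.0, 101)
beta_hat = cox_updated_scale(lambda t: 1.0, t_grid,
                             x_cens=np.array([2.5, 4.0]),
                             Z_cens=np.array([[1.0, 0.0], [0.0, 1.0]]),
                             b=np.array([0.4, 0.7]))
```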

Simulating from $[\beta \mid \cdots]$ is easily accomplished using the adaptive rejection sampling algorithm of Gilks and Wild (1992), based on the log-concavity of $[\beta \mid \cdots]$, having assigned independent normal priors for the $\beta$ components. Sampling $r(\cdot)$ in a way that avoids the grid approximation methods of Bondesson (1982) and Damien et al. (1995) is available using representations of Lévy processes detailed in, for example, Walker and Damien (2000).

5 Illustrative Analysis

For illustrative purposes we present here an example using a simulated data set of size $n = 500$. We take the simulation model to be a Weibull proportional hazards model; that is, $h_i(t) = h_0(t) \exp(\beta_1 z_{i1} + \beta_2 z_{i2})$, where $h_0(t) = t^2/7200$, $\beta_1 = 0.405$ and $\beta_2 = 0.693$. The $z_{ij}$ are taken to be in $\{0, 1\}$, independently generated with $P(z_{ij} = 0) = 0.5$. We took independent normal priors for $\beta_1$ and $\beta_2$ with zero means and standard deviation of 10, resulting in a non-informative set-up. The prior parameter functions $\alpha(\cdot)$ and $\beta(\cdot)$ were chosen so that $r(\cdot)$ was centred on a Weibull hazard rate function but with a large variance. The Gibbs sampler was run for 4000 iterations with a burn-in of 2000, leaving 2000 samples from the posterior distributions with which to make posterior summaries. In the Figure we present the estimate of the baseline hazard rate function, with associated 2 standard deviation limits, alongside the true hazard rate function. It is evident that the algorithm has performed well in this case.

6 Decreasing and Bathtub Shaped Hazard Functions

In a manner similar to the description of the increasing hazard rate function, one can define
$$
r(t) = \int_{[t,\infty]} [\beta(s)]^{-1}\, dZ(s) = \int_{[t,\infty)} [\beta(s)]^{-1}\, dZ(s) + [\beta(\infty)]^{-1}\,[Z(\infty) - Z(\infty^-)].
$$
The main purpose of the values at infinity is to preclude $\lim_{t\to\infty} r(t) = 0$, so that $\int_0^\infty r(t)\,dt$ is infinite and the survival time described by $r(\cdot)$ does not have positive probability at infinity. It is clear that such a process generates non-increasing hazard rates. We call it the Decreasing Extended Gamma (DEG) process and denote it by $r(t) \sim \mathrm{DEG}(\alpha(\cdot), \beta(\cdot))$.
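A minimal sketch of one way to approximate a DEG sample path on a finite grid follows: gamma increments are accumulated from the right, and a single additional gamma variate stands in for the mass beyond the grid, playing the role of the values at infinity that keep the hazard bounded away from zero. The truncation point, the tail parameters and the parameter functions in the call are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_deg_path(alpha, beta, grid, alpha_tail=0.5, beta_tail=1.0):
    """Approximate a DEG (decreasing extended gamma) sample path on a grid.

    Increments of the gamma process over (t_{j-1}, t_j] are drawn as
    Gamma(alpha(t_j) - alpha(t_{j-1}), 1) and scaled by 1/beta; summing them
    from the right gives a non-increasing path.  A single extra variate
    (alpha_tail, beta_tail) stands in for the mass beyond the grid.
    """
    shapes = np.diff([alpha(t) for t in grid])
    dZ = rng.gamma(shape=np.maximum(shapes, 1e-12), scale=1.0)
    dr = dZ / np.array([beta(t) for t in grid[1:]])
    tail = rng.gamma(shape=alpha_tail, scale=1.0) / beta_tail
    # r(grid[j]) collects all increments at or beyond grid[j], plus the tail
    return np.concatenate([np.cumsum(dr[::-1])[::-1], [0.0]]) + tail

grid = np.linspace(0.0, 10.0, 201)
rD_path = sample_deg_path(lambda t: 0.1 * t, lambda t: 2.0, grid)
```

Only the qualitative features matter here: the path is non-increasing and bounded below by the tail term, mirroring the role of the values at infinity described above.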

Amman (1984) uses the above development and defines a combined process by $r(t) = r^I(t) + r^D(t)$, where the superscript, inherited also by the parameter functions, indicates the direction of monotonicity, and the two component processes are independent. Amman termed this a U-shaped process, whereas elsewhere in the reliability and survival analysis literature one also encounters the usage "bathtub-shaped hazard rates". It should be noted that such a combined process does not necessarily generate hazard rates with this shape. It does, however, add a great deal of flexibility in modelling the shape. For example, the expected hazard rate can be arranged to decrease initially and to increase beyond a certain time point by a suitable choice of the parameter functions. A parametric choice for the expectation can be a Weibull distribution with shape parameter less than one for the decreasing component and shape parameter exceeding two for the increasing component. In general, one can require
$$
\frac{d^2}{dt^2} E(r(t)) = \frac{d^2}{dt^2} E(r^I(t)) + \frac{d^2}{dt^2} E(r^D(t)) > 0,
\quad \text{with} \quad \frac{d}{dt} E(r(t)) = 0 \ \text{for some } t > 0.
$$
Since
$$
E(r(t)) = \int_0^t [\beta^I(s)]^{-1}\, d\alpha^I(s) + \int_t^\infty [\beta^D(s)]^{-1}\, d\alpha^D(s),
$$
we get the conditions
$$
\frac{\alpha^I_{(2)}(t)\,\beta^I(t) - \alpha^I_{(1)}(t)\,\beta^I_{(1)}(t)}{[\beta^I(t)]^2}
- \frac{\alpha^D_{(2)}(t)\,\beta^D(t) - \alpha^D_{(1)}(t)\,\beta^D_{(1)}(t)}{[\beta^D(t)]^2} > 0
$$
and
$$
\frac{\alpha^I_{(1)}(t)}{\beta^I(t)} = \frac{\alpha^D_{(1)}(t)}{\beta^D(t)} \ \text{for some } t > 0,
$$
where the parenthesized subscript $i$ on a function denotes its $i$th derivative. Such a prior, while maintaining a bathtub expectation, allows the hazard rates to have more general shapes. The data can then either reinforce the bathtub shape or indicate otherwise via the posterior expectation.

Facts 2.1 to 2.3 also apply to the decreasing hazard rate function, except that in Fact 2.1 the scale update becomes $\hat\beta(t) = \beta(t) + \min(x, t)$ in the decreasing hazard case. The case of the decreasing hazard rates is otherwise similar: the auxiliary variable $Y$ has the survival function
$$
P(Y > y \mid r(\cdot)) = I(y < x) + \frac{r(y)}{r(x)}\, I(y \ge x),
$$
and possibly takes the value $\infty$.

This results in a DEG process for the full conditional for $r(\cdot)$, with parameters $\hat\alpha(s) = \alpha(s) + I(s > y)$ and $\hat\beta(s) = \beta(s) + \min(x, s)$.

Turning to the case of the combined process, we write the posterior distribution as
$$
[r^D(\cdot), r^I(\cdot) \mid X = x] \propto [r^D(\cdot)]\,[r^I(\cdot)]\, \{r^D(x) + r^I(x)\} \exp\left(-\int_0^x \{r^D(s) + r^I(s)\}\,ds\right).
$$
Now the stochastic substitution takes the form
$$
P(Y \le y \mid X = x, r^D(\cdot), r^I(\cdot)) = \frac{r^I(y)\, I(0 \le y < x) + \{r^I(x) + r^D(x) - r^D(y)\}\, I(y \ge x)}{r^I(x) + r^D(x)}.
$$
Straightforward algebra results in
$$
[r^I(\cdot), r^D(\cdot) \mid y, X = x] \propto \mathrm{EG}(\hat\alpha^I, \hat\beta^I)\, \mathrm{EG}(\hat\alpha^D, \hat\beta^D),
$$
where $\hat\alpha^I(s) = \alpha^I(s) + I(0 \le y < x)\, I(s > y)$, $\hat\alpha^D(s) = \alpha^D(s) + I(y \ge x)\, I(s > y)$, and the two scale parameter functions are as given above. The updates for left censored data, interval censored data and left truncated data follow straightforwardly.

7 Discussion

In this paper we have provided a complete and easy to implement Bayesian solution to modelling monotone increasing hazard rate functions nonparametrically. We point out that the approach readily extends to the modelling of monotone decreasing and bathtub hazard rate functions based on the EG process (see Laud, 1977, and Amman, 1984). A new stochastic simulation method was introduced to sample the posterior process. We also provided a solution for the semi-parametric model with covariates. Natural extensions are to consider time-dependent covariates and to detect when data contradict the proportional hazards assumption. From the perspective of the physicians and scientists wishing to use these methods, graphical implementations, based on the stored posterior samples, that can respond to queries regarding substantively interesting functionals would be attractive indeed.

References

Amman, L. (1984). Bayesian nonparametric inference for quantal response data. Annals of Statistics, 12, 636-645.

Bondesson, L. (1982). On simulation from infinitely divisible distributions. Adv. Appl. Prob., 14, 855-869.

Cox, D.R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. B, 34, 187-220.

Damien, P., Wakefield, J.C. and Walker, S.G. (1999). Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. J. Roy. Statist. Soc. B, 61, 331-344.

Dykstra, R.L. and Laud, P.W. (1981). A Bayesian nonparametric approach to reliability. Annals of Statistics, 9, 356-367.

Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337-348.

Laud, P.W. (1977). Bayesian nonparametric inference in reliability. Ph.D. Dissertation, University of Missouri, Columbia, MO.

Laud, P.W., Smith, A.F.M. and Damien, P. (1996). Monte Carlo methods for approximating a posterior hazard process. Statistics and Computing, 6, 77-83.

Lawless, J.F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York.

MacEachern, S.N. and Müller, P. (1998). Estimating mixtures of Dirichlet process models. Journal of Computational and Graphical Statistics, 7, 223-238.

MacEachern, S.N. (1998). Computational methods for mixtures of Dirichlet process models. In Practical Nonparametric and Semiparametric Bayesian Statistics (Eds. D. Dey, P. Müller and D. Sinha), Chapter 2, Springer.

Walker, S.G. and Damien, P. (2000). Representations of Lévy processes without Gaussian components. Biometrika, 87, 477-483.

Figure. Solid lines represent the posterior mean and two standard deviation limits. The lower dashed line shows the prior mean, the upper the true hazard.