I TO 1 8 99 KRESGl BUS. ADC LIBRARY RESEARCH SUPPORT UNIVERSITY OF MICHIGAN BUSINESS SCHOOL DECEMBER 1996 A BAYESIAN BIVARIATE FAILURE TIME REGRESSION MODEL WORKING PAPER #9612-35 PAUL DAMIEN UNIVERSITY OF MICHIGAN BUSINESS SCHOOL AND > PETER MULLER ISDS, DUKE UNIVERSITY

I

A BAYESIAN BIVARIATE FAILURE TIME REGRESSION MODEL By PAUL DAMIEN School of Business Administration University of Michigan, Ann Arbor. M'I 48109-1217. USA email: damienOdamien. bus. umich. edu and PETER MULLER ISDS. Duke University. Durham NC 27708-0251. USA (919) 684 3437 email: pmOstat. duke. edu

A BAYESIAN BIVARIATE FAILURE TIME REGRESSION MODEL SU.I.MMIARY In this paper. we model bivariate failure time data under the assumption that the two random quantities corresponding to each of the failure times are dependent. We extend the analysis from a bivariate Gumbel model by regressing the log marginal hazard rate against context-motivated covariates. Illustrative analyses using data from electrical treeing and cancer relapse times are exemplified using Markov Chain Monte Carlo methods. Keywords: Gumbel Distribution, Gibbs Sampling, Correlation. Failure Time, Hazard Rate. Covariates. V

I 1 INTRODUCTION We use the term failure time to refer both to reliability data arising in engineering studies and survival data arising in medical or clinical trials. In engineering studies. reliability is defined as the probability that the system has not failed at time t. Likewise. in clinical trials, survival is defined as the probability that a patient has not died. or contracted the disease under investigation at time t. Most work in this area has been concerned with the case of one observed failure time while in the case of two observed failure times. most authors. using a frequentist approach, have considered such failure times to be independent: see. for example. Mann and Grubbs': Guttman and Sinha2: Mlivamura3. Tawn4 considers bivariate models and estimation by developing an asymptotic estimate for the dependence function. Tawn extends the bivariate structure to multivariate models using extreme value distributions. Oakes6 also considers a non-Bavesian analysis of bivariate models that are induced by frailties. The Gumbel model is studied in that paper. Most recently. Oakes and Manatunga7 obtain an asymptotic estimate for the Fisher information for a bivariate extreme value distribution. Thus, there appears to be several useful papers in recent literature that seek to provide asymptotic estimates for the parameters of bivariate failure time distributions. Recent Bayesian work in this area is, however. somewhat sparse, the notable exception being Draper et al.8. Our approach extends the work of Draper et al.8, who provide a Bayesian analysis of system reliability when two components are dependent, in two respects. Firstly, we bypass the need for numerical integration of awkard posterior forms via Monte Carlo methods that are straightforward to implement. In addition to ease in implementation, these methods provide a full Bayesian solution to the problem that Draper et al.'s method does not accomplish. Secondly. we extend the bivariate Gumbel model by modelling the logarithm of the marginal hazard rates through a regression set-up. A full Bayesian analysis for the latter model is provided and exemplified using bladder cancer data. 3

I 2 THE BIVARIATE GUMBEL MODEL Suppose two components have failure times (T1. T2) that are distributed with density function f(tl. t.) = A1A2[exp(-AXtt - At2j)][l + ahi(t1)I 2(t2)]. (1) where A1. A2.1. t2 > 0.-1 < a < 1. and hj is defined by I z(tj)= 2exp(-Ajtj)- 1. j = 12. This is the Gumbel bivariate exponential distribution (Gumbel9). The marginals of this joint density are fj(tj) = Ajexp(-At), with E(Tj) = AJ' and Var(Tj) ='A-2. The reliability of a system using the model in (1) is given by R(t) = P(Ti > t, T2 > t) = exp(-At)[I + a{e-lA - l}{e-it- 1}]. (2) with A = A1 + A2. hloeschberger and Klein'~ have argued, from a frequentist point of view. that the consequences of assuming a = 0 (independence) in the above set-up could be very misleading. Draper et al. using a Bayesian approach, do not assume that a = 0. but. they are only able to provide estimates of the posterior mean of R(t) by numerically integrating out appropriate random variables after a scaling transformation of the Bayesian model described below. 3 THE BAYESIAN MODEL Let t, = (tl,, t2i)'. i = 1,... n be n independent observations on (T1 T2) whose density is given by (1), so that denoting (t.... t,,) by T, the likelihood function is given by n f(TA1, A9, a) AA.2exp(-(ASl + A2S2)) f[li +~ ah(ti)h2(t2i)J] (3) 2=1 with Sj = n=l tji, j = 1, 2. Denoting the prior joint distribution for (Ai, A2, a) by [Ai\. A2 a], the posterior joint distribution of (A1 A2, a) is given,by Bayes' Theorem, as [Ai, A2, caT] c f(TIA,, A2, a)[A, A2, a], (4) 4

I where the notation [.] denotes the probability density of.. In this paper. following Draper et al... we will only consider priors of the form 2 [A1. A2. a] oc (1 - a)r-l( + Ca)-r1 [ AX exp(-AXqj). (5) 7=1 Equation (5) implies that (1- c)/2 has a Beta (rl. r) distribution. A\ has a Gamma (pj. qj) distribution, and that the three parameters are mutually independent. Under certain conditions. (5) can be shown to reduce to the non-informative prior [A. A2. a] xc AX 'A, A, A,2 > 0, -1 < a < 1. (6) Draper et al. use numerical integration methods after suitably transforming (4) to obtain the exact posterior distribution of the three parameters. 4 BIVARIATE GUMBEL EXPONENTIAL REGRESSION We note that our choice of the bivariate gumbel distribution in this paper is purely to illustrate the Bayesian method when one observes two random quantities (failure times) that may be correlated. Other bivariate models. such as the Pareto. exist and may be more appropriate in a given context. Often it is the case that failure times in clinical trials are related to covariates such as gender. age etc. We include a regression on covariates by extending the model in (3) to f(ti. t2ilx, A~, A~, a, 31.32) = A1, A2[exp(-Aitl - 2t2)][1 + ahi(ti, A1)h2(t2) A2)]. log(Ai) = logA + 31,lxli +... + 1,mxmi, log(A2) = logA +- 32.lxUli +... + f2,mxmi, (7) where x = (x.i, z) i = 1,... m is the vector of m observations on each covariate for each of the two failure times, tl and t2. We complete the model Iby specifying a prior distribution on the 3 vectors taking these to have a multivariate normal distribution. 3j -.V (b. B). j-1,2. We assume that 3 and (Al, A2. a) are a priori independent. Note that any other prior distribution on 3 is possible, and our choice of the normal distribution is simply to exemplify the Bayesian approach to the problem. 5

I 5 COMPUTATIONS By a combination of Metropolis and independence chain steps we build a Markov chain Monte Carlo scheme to estimate the Gumbel bivariate exponential model. For a review of Markov chain Monte Carlo methods see. for example. Smith and Roberts11. Tierney12 or Gilks et al.13. The underlying rationale of Markov chain Monte Carlo methods is to simulate a Markov chain which is constructed to have the desired posterior as the limiting distribution. Under conditions discussed in Tierney'2 appropriate ergodic averages over the simulated chain can then be used to estimate posterior expectations. marginal posterior distributions, etc. 5.1 Estimating the Gumbel bivariate exponential model We discuss first the Markov chain used for the estimation of the bivariate exponential model in (3) with the prior given in (5). Starting with initial values for the three parameters A1 A2, a we define a chain by the following three steps. First, draw a "candidate" Ai ~ Gamma(a. b) with a = pi + n and b = q + E tli. Denote by g(A) the Gamma(ab) p.d.f. Compute a.(A.:1) - mi (1. p(A IA1,a T)g(AI) H nl=[1+ achl(tli, A1)h2(t2. A2)] p(1AiA1 a,T)g(XA)- J} l[ + thl(tli, ))h2(t2i A2)] With probability a set A1:= A,l else leave the current value of A1 unchanged. This is an implementation of an independence chain step, using the Gamma posterior of the marginal model in tli as approximation of the full conditional posterior. See Tiernev12 for more discussion. Second. replace A2 by repeating the independence chain desrcibed above for A2. Third. update a by implementing a Metropolis step: Generate &a N(a, a,) and evaluate a(a.&)= min (1,p(&IAll A2:: T)) a(a. c>) = min 1, p(alAl, A2, T) With probability a replace ct by c. else keep the currently imputed value of a; see Section 6.3 for comments on choosing the arbitrary scale factor a,. 6

5.2 Extension to the Gumbel bivariate regression model The Mlarkov chain Monte Carlo scheme described for model (1) is easily modified to include the estimation of regression parameters in the Gumbel bivariate regression model (7). Starting with initial values for the paramter vector 0 = (A~. A,. a. 311..,). we simulate a Markov chain defined by iteratively scanning over all paramleters. replacing each parameter by simulating a Metropolis step. Relabel the parameter vector as 0 = (0,.... 0p) (i.e. write 01 for A~ etc.). Each Metropolis step is given by: First draw a "candidate" Qj N(0j. aj) and compute a(0. Oj) - mi (1 p(0j- ej-T)) where 0_j denotes the full parameter vector without Qj. Second. with probability a replace Oj by tj, else keep the currently imputed value of 0J. The aj are arbitrary scale factors. Section 6.3 suggests some default choices. Note this is identical to the Metropolis step which was used in simulating the a parameter in section 5.1. 6 EXAMPLES 6.1 Example 1: A Gumbel bivariate exponential To illustrate the proposed Markov chain Monte Carlo scheme we estimated the model with the electrical cable failure data from Lawless14. The same data set was used by Draper et al.8. The data concerns time to inception of a microscopic defect in the material (tli) and the subsequent additional time to eventual failure (t2,). This phenomenon is commonly referred to as electrical treeing in the engineering literature. The posterior marginal distributions of the three parameters are plotted in Figures 1 and 2. In addition. easily obtained forms of inference summaries using the Monte Carlo samples provide a complete description of the uncertainty in the three parameters. It is interesting to observe that there is considerable evidence from the data that a is non-zero, suggesting a strong dependence between the two failure times. WVe note that the results from this simulation are in close agreement with results obtained using exact numerical (quadrature) methods by Draper et al.8. The key function of interest is the reliability i

I function given in (2). A full description of the uncertainty in the reliability function at each observed time is easily obtained by simply evaluating (2) at each of the simulated samples. As an illustration, a complete description of the uncertainty in R(t) at times t = 50.100, 150 is provided in Figure 3. We note. once again. that Draper et al. only provide an estimate of the mean of R(t) at each point along the time axis. 6.2 Example 2: A Gumbel bivariate exponential regression model Byar et al.'1 report a study comparing placebo, pyridoxine and topical thiotepa in preventing recurrence of Stage I bladder cancer. Stage I tumors tend to recur in around 50% of the patients within two years after removal. Instillations of thiotepa and regular ingestion of pyridoxine are believed to retard the growth of tumors. or cure them altogether. We consider patients with 2 or more recurrence times. For each of the i = 28 patients we record the two relapse times (ti. t2i, two ilndicators for the assigned treatment (x1i = 1 for pyridoxine, x2i = 1 for thiotepa, x1 = x2t = 0 for placebo), and number (x3i) and size (x4i) of initial bladder cancer occurences. \We estimate the bivariate life time regression model (7) assuming a noninformative prior (6) for (At, A2, a), and taking f(t1z, t2,x, A~, AO. a, 1,.32) = A1. A2[exp(-Alxl - A2x2)][1 + ahl(Xl, Al)h(x2. A-2)]. log(A1) = A? + 31,lx1l + 1.2X2i + 31,3Z3i + 1/,4X4t, log(A2) = A +.32,lxli + 32.2XI i +- 2,33i + 3,4X4i. A strength of the Bayesian approach is to enable the statistician to factor in context-dependent constraints. Thus, a priori, we believe, based on medical knowledge, that the coefficients in the 3 vectors corresponding to the covariates. number. and size of initial bladder cancer occurences are non-negative. We factor this qualitative information by restricting the appropriate components in the 3 vectors in the prior distribution to be non-negative. Figure 4 shows the estimated marginal posterior distributions for A1, A2 and a. Since inference is based on simulated Monte Carlo samples, any desired inference summaries can be provided. Like in the previous example, we observe strong dependency between the two failure times as depicted in Panel (b) of Figure 4. As an illustration, a complete posterior description for,3j and fj2 in the regression model is given in Figure 5. The reliability function in (2) in the 8

ka- - - - - -..~~~~~~~~~~~~- ---- context of these data is the cumulative survival based on the two recurrence times. As an illustration. a plot of the survival function is provided in Figure 6a. Again. as an illustration, in Figures Ta and b. we plot the posterior marginal distribution in the form of histograms for the coefficients 313 and 314, respectively. Recall that these coefficients correspond to the effects of prevalance (X3) and size (X4), respectively. of initial bladder occurence. Moreover. as noted previously, the prior distribution incorporates the restriction that both of these parameters are non-negative. A remarkable feature of Monte Carlo based methods in Bayesian inference is that questions of practical interest such as the following can be answered straightforwardly: -what is the posterior probability that a patient will have first and second relapse times greater than 5 months if the patient is subjected to thiotepa treatment and intial tumor size and prevalance are x3i = 1 and x4, = 1?" Using the simulated samples for each of the parameters, we simply calculate, for each sample, the required probability from (2), thus obtaining a complete description of the uncertainty for this prediction. This is reported in Figure 6b. 6.3 Implementation and initialization The Markov chain Monte Carlo scheme proposed in Section 5.2 implements a Metropolis-within-Gibbs algorithm as proposed in Miiller17. In the examples reported in this paper we chose the scale factors aj by approximately estimating posterior standard deviations via Laplace integration first. For Example 2. we estimated a simplified univariate regression model in til only, using the approximate posterior standard deviations of A?, 31,j for both, the scale factors on A? and 31j as well as A~ and 32.j. Based on the restricted parameter space for a we set aa = 0.5. The approximate posterior means from the Laplace integration served as initial values to start the Markov chain Monte Carlo simulations. VWe arbitrarily, initialized a at a = 0. Another issue to be resolved for practical implementation of the scheme is how to decide termination of the simulation scheme. In the reported examples we started by running 5000 iterations of the chain. Then we evaluated the convergence diagnostic proposed by Geweke17. In Example 1, the diagnostic indicated that the chain had already practically converged. For Example 2, the diagnostic indicated convergence after 15,000 iterations. 9

Both Markov chain Monte Carlo schemes were implemented as C programs on a Digital workstation 3500 check. The 5000 iterations in Example 1 required 27 seconds CPU time. The 15,000 iterations in Example 2 took 23 minutes CPU time. To plot posterior marginal distributions like in Figure 1 we used a "RaoBlackwellization" type estimate as proposed in Gelfand and Smith'8. For example, let 0(t) denote the imputed parameter vector after t iterations of the Markov chain. To estimate p(A IT) for A1 = 0.002 we evaluate p(,1 jA.t). (t) T) for t = 100,150,200,..., 5000 and compute p(A1|T) =, Ep(AilA,), a(t, T). We only use every 50-th iteration for reasons of computational efficiency. 7 SOME EXTENSIONS In this paper, we have considered modelling two failure times by allowing for potential dependence in the random quantities corresponding to each failure time. Several extensions are possible. First, other bivariate models, such as the bivariate Pareto distribution, could be easily considered using a similar framework. Secondly, the implications of such models in competing risk analysis needs investigation. Thirdly, hierarchical models and nonparametric generalisations for the marginal hazard rate are possible. For example, one could model the hyperparameters of the regression model as realisations of a stochastic process; see, for example, West, Mueller & Escobar19. Fourthly, following Tawn5, the computational details for a multivariate extension from a Bayesian perspective needs study. These and other extensions will be reported elsewhere. ACKNOWLEDGEMENTS The authors would like to thank Adrian Smith at Imperial College, London for drawing our attention to the paper by Draper et al. REFERENCES 1. Mann, N.R. & Grubbs, F.E., 'Approximately optimum confidence bounds for system reliability based on component test data', Technometrics, 61, 335-347 (1974). 10

2. Guttman. J. & Sinha, S.K.. 'Bayesian inference about the reliability function for the exponential distributions', Communications in Statistics. A.5. 471-479 (1976). 3. Mliamura, T., 'Estimating component failure rates from combined component and systems data: Exponentially distributed component lifetimes,' Technometrics, 24, 313-318 (1982). 4. Tawn, J.A., 'Bivariate extreme value theory: models and estimation'. Biometrika, 75, 397-415 (1988). 5. Tawn, J.A., "'Modelling multivariate extreme value distributions', Biometrika. 77, 245-253 (1990). 6. Oakes, D., 'Bivariate survival models induced by frailties', Journal of the American Statistical Association, 84, 487-493 (1989). 7. Oakes, D. & Mlanatunga, A.K., 'Fisher information for a bivariate extreme value distribution', Biometrika, 79, (1992). 8. Draper, N.R., Evans, M., & Guttman, I., 'A Bayesian approach to system reliability when two components are dependent', Computational Statistics and Data Analysis, 7, 39-49 (1977). 9. Gumbel, E.J., 'Bivariate exponential distributions', Journal of the American Statistical Association, 55, 698-707, (1960). 10. MIoeschberger, M.L. & Klein, J.P., 'Consequences of departures from independence in exponential series system', Technometrics, 26, 277-284, (1984). 11. Smith, A.F.M. & Roberts, G.O., 'Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion)', Journal of the Royal Statistical Society, Series B, 55, 3-23, (1993). 12. Tierney, L., 'Markov chains for exploring posterior distributions', Annals of Statistics, 22, 1701-1728 (1994). 13. Gilks, W.R., Clayton, D.G., Spiegelhalter, D.J., Best, N.G., McNeil, A.J., Sharples, L.D.. & Kirby, A.J., 'Modelling complexity: applications 11

I of Gibbs sampling in-medicine'. Journal of the Royal Statistical Society B. 55. 39-52 (1993). 14. Lawless, J.F.. Statistical Mfodels and,ethods for Lifetime Data. New York: Wiley. 1982. 15. Byar. D.P.. Blackard. C. k VACURG 'Comparison of Placebo. Pyridoxine and Topical Thiotepa in Preventing Recurrence of Stage I Bladder Cancer', Urology, 10, pp 556-561 (1977). 16. Mueller. P.. 'Metropolis Based Posterior Integration Schemes'. Technical Report 91-09. Statistics Department, Purdue University. (1991). 17. Geweke, J.. 'Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments', in Bayesian Statistics 4, ed. J.O. Berger, J.M. Bernardo, A.P. Dawid and Smith, A.F.M., 169-194 (1992). 18. Gelfand, A.E. and Smith A.F.M. 'Sampling based approaches to calculating marginal densities', Journal of the American Statistical Association. 85, 398-409 (1990). 19. Wlest, NM., Mueller. P., & Escobar, M.D., 'Hierarchical priors and mixture models, with application in regression and density estimation', in Aspects of Uncertainty: A tribute to D. V. Lindley, ed. A.F.M. Smith and P. Freeman, Wiley, New-York. (1994). 12

I 0 0 0 0 oi \ 0 a 0.0010 0.0020 0.0030 0.004C 0.010 0.020 0.030 0.040 LAMBDAI LAMBDA2 Figure 1: Example 1. Marginal posterior distributions p(Allx) and p(A2lx). First quartile, median, mean and third quartile are estimated by (Qi = 0.0018. Mdn = 0.0021, A1 = 0.0021,Q3 = 0.0024) and (Q1 = 0.019, Mdn = 0.023. A2 = 0.023, Q3 = 0.026) respectively. 13

O ci aLi 0 C= I I -1.0 -0.5 0.0 0.5 1.0 ALPHA Figure 2: Example 1. Marginal posterior p(alx) on the correlation parameter. Estimated posterior quartiles and mean are Q, = 0.38, Mldn = 0.68. d = 0.58 and Q3 = 0.86. 14

I cO o 0 c1 j 0 50 100 150 200 250 TIME Figure 3: Example 1. The estimated reliability function R(t) = P(-X > t..2 > t). The graph plots the posterior mean estimate for R(t) against t. In addition to point estimates for R(t) the Bayesian modelling framework provides a full description of the uncertainty about R(t) at any t. For some selected values of t the figure shows the full posterior distribution p(R(t)jT). 15

in 'T -1 tO cj 0 CCi Un To t i -- o ] Q. 0 To 0 6 0.15 0.25 -1.0 -0.5 0.0 0.5 1.0 0.0 0.05 (a) p(A~lrx) (b) p(a|Ix). Figure 4: Example 2. Panel (a) shows the posterior marginal of A~ (solid line) and A. (dotted line). Panel (b) plots the posterior marginal of the correlation parameter a. 16

I o Co co Fiur aExm 2a. ri o C0 04 - 0 0 0 \ qU i [^ I " t **.... - I I I!! -1.5 -0.5 0.0 0.5 1.0 1.5 -1.5 -0.5 0.0 0.5 1.0 1.5 BETA1 BETA2 (a) p(N 1 I|) (b) p(~I). Figure 5: Example 2. Posterior marginal distributions for the regression parameters 831 (solid line in panel (a)), 321 (dotted line in panel (a)) for first relapse time and 312 (solid line in panel (b)) and 322 (dotted line in panel (b)) for the second relapse time. The posterior distribution indicates a slightly lower log hazard for the first relapse time under the thiotepa treatment. For the second relapse time the two treatments are practically indistinguishable. 17

0 o. 0 - TIM j0 I O~~~~~CO O0 - Figure 6: Example 2. The estimated reliability function R(t) = P( > 0 o \ o - M. _ \ — * estimates for R(t) the Baesian modelling framework allows to put a posterior 0 5 10 15 20 25 30 0.0 0.1 0.2 0.3 0.4 0.5 0.6 TIME y (a) R(t) (b) p(R(5)|x= (0, 1,1, 1),T) Figure 6: Example 2. The estimated reliability function R(t) = P(T\ > t. T2 > t) for a patient with covariates x = (0,1,1, 1). In addition to point estimates for R(t) the Bayesian modelling framework allows to put a posterior predictive distribution on quantities like the probability that a patient has first and second relapse time greater than 5 months when subjected to thiotepa treatment and with initial size and number of bladder cancer occurences equal X3 = 1 and X4 = 4. Panel (b) shows the histogram of the predictive posterior distribution for this probability. 18

0 O Ct 0 0 - lq 0 C) - CO 0 O - C\J 0 - 0 - 0 0 CO 0 0 cJ 0 0 -! 0 I I I I I I 0.0 0.1 0.2 0.3 0.4 0.5 BETA3 (a) p(313:T) 0.0 0.1 0.2 0.3 0.4 BETA4 (b) p(8i4 T). Figure 7: Example 2. Posterior marginal distributions p(13fIT) (panel a), and p(314jT) (panel b) for the the effects of prevalance (X3) and size (x4) of initial bladder cancer occurence. Here T = (t1i. t.i, x, i = 1..... n) denotes the full data vector. The proposed Markov chain Monte Carlo scheme allows to include prior information on tile paranleters. In this example, we enforce a priori that 313 and 314 bIe noln-legative. 19