A Simple Multi-Stage Model of Urinary Tract Infection Occurrence After Coitarche

Barbara Rocci
Stephen M. Pollock
Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109

Betsy Foxman
Department of Epidemiology
University of Michigan
Ann Arbor, MI 48109

Technical Report 97-05
May 1997

A SIMPLE MULTI-STAGE MODEL OF URINARY TRACT INFECTION OCCURRENCE AFTER COITARCHE

Barbara Rocci, Stephen M. Pollock, Betsy Foxman

About half of all women will experience a urinary tract infection (UTI) by age 30; approximately 20-25% of these will develop a second infection within 6 months of the first one; nearly 5% of women will experience multiple recurring infections. The etiology of UTIs is not well understood, and there is evidence that at least some may be sexually transmitted. Mathematical models can provide insight into the transmission dynamics of these infections, as well as help to evaluate the possible effects of prevention strategies. One such model is presented here.

MODEL DESCRIPTION

In the basic model, a woman is tracked from the start of sexual activity (coitarche) and can progress through a number of possible stages, as shown in Figure 1. The model assumes a woman can be in any one of the following states at any given time:

a) State $S_t$, t = 1, 2, ..., 14, where t = years after coitarche with no UTI
b) State $M_j$, j = 1, 2, ..., 6, where j = months after first UTI with no second UTI
c) State "UTI1" = occurrence of first UTI
d) State "UTI2" = occurrence of second UTI
e) State "No UTI 1 in 14 yrs" = no first UTI within 14 years of coitarche
f) State "No UTI 2 in 6 mos" = no second UTI within 6 months of first UTI

From state $S_t$, there is a probability h(t) per month of moving into state "UTI1". From state $M_j$, there is a probability m(j) per month of moving into state "UTI2". Let T = year (after coitarche) of first UTI, and J = month after first UTI of second UTI. Then these probabilities are, by definition, the "hazards":

$$h(t) = P(T = t \mid T > t - 1), \quad t = 1, 2, \ldots, 14$$

$$m(j) = P(J = j \mid J > j - 1), \quad j = 1, 2, \ldots, 6$$

The model also includes an aging process. For each month i in year t after coitarche,

$$p_i(t) = P(S_t \to S_{t+1} \text{ in month } i) = \begin{cases} 1 & i = 12 \\ 0 & i < 12 \end{cases}$$

The transition to year t+1 occurs at the end of month 12 in year t. This is illustrated in the detailed view of state $S_2$ in Figure 1.
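One way to read the state description above is as a chain that is stepped once per month. The Python sketch below propagates state probabilities under that reading. It is an illustration under stated assumptions, not the authors' implementation: `h_month(t)` and `m_month(j)` are hypothetical callables supplying the per-month transition probabilities out of $S_t$ and $M_j$, a woman whose first UTI occurs in one month is assumed to occupy $M_1$ in the next month, and the constant values in the usage lines are placeholders rather than estimates.

```python
def propagate(h_month, m_month, years=14, months_after=6):
    """Return a list of (month, cum_first, cum_second) since coitarche.

    h_month(t): assumed per-month probability of S_t -> UTI1 (t = year).
    m_month(j): assumed per-month probability of M_j -> UTI2 (j = month).
    """
    n_months = years * 12 + months_after + 1
    s = 1.0                          # mass still in some S_t (no UTI yet)
    m = [0.0] * (months_after + 1)   # m[j] = mass in M_j; index 0 unused
    cum_first = cum_second = 0.0
    trace = []
    for k in range(1, n_months + 1):
        new_m = [0.0] * (months_after + 1)
        # Second-UTI stage: M_j -> UTI2, otherwise advance to M_{j+1}.
        for j in range(1, months_after + 1):
            hit = m[j] * m_month(j)
            cum_second += hit
            if j < months_after:
                new_m[j + 1] = m[j] - hit
            # Mass leaving M_6 without a UTI is absorbed in
            # "No UTI 2 in 6 mos" and needs no further tracking.
        # First-UTI stage: only during the 14 years after coitarche.
        if k <= years * 12:
            t = (k - 1) // 12 + 1    # aging: year changes after month 12
            first = s * h_month(t)
            s -= first
            cum_first += first
            new_m[1] += first        # enters M_1 in the following month
        m = new_m
        trace.append((k, cum_first, cum_second))
    return trace

# Usage with placeholder (not estimated) probabilities.
trace = propagate(lambda t: 0.01, lambda j: 0.03)
print(trace[-1])
```

Passing the hazards in as functions keeps the sketch independent of how h(t) and m(j) are estimated, so an intervention can be explored by wrapping a hazard, for example `lambda t: 0.8 * h(t)` for a 20% reduction.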
Parameter Estimation - First UTI

We can estimate h(t) using data from Foxman et al. (unpublished), in which the time from coitarche to first UTI was recorded for 210 women. Since all the women in this study developed a UTI, h(t) cannot be estimated directly. The longest time from coitarche to first UTI in the study was 14 years. Thus we first estimate the probability $g(t) = P(T = t \mid t - 1 < T \le 14)$, then remove the condition that $T \le 14$ to obtain an estimate of h(t).

Kaplan-Meier estimates of g(t) were first calculated. This non-parametric technique involves recording the number of women at risk of UTI and

[Figure 1: Model picture]

the number developing a UTI during each year, then estimating the overall probability of surviving up to t. From the data, we have, for t = 1, 2, ..., 14:

$n_t$ = number of women for whom T > t - 1
$d_t$ = number of women for whom T = t

Detailed calculations are given in Appendix A. The resulting estimates are shown by the squares in Figure 2. These results suggest that g(t) might reasonably take the form

$$g(t) = \begin{cases} a t & t < b \\ a b & t \ge b \end{cases} \qquad (1)$$

Using this representation, maximum likelihood parameters a and b were calculated (see Appendix B), yielding $\hat{a} = 0.18$ and $\hat{b} = 4$ years. The solid line in Figure 2 shows the resulting g(t).

[Figure 2: Conditional probability per year after coitarche of developing first UTI (o = K-M estimate, - = parametric estimate using equation (1)). Horizontal axis: t (years since coitarche).]

The parametric estimate

$$\hat{g}(t) = \begin{cases} \hat{a} t & t < \hat{b} \\ \hat{a} \hat{b} & t \ge \hat{b} \end{cases} \qquad (2)$$

is used in the model. As defined above, g(t) is an estimate of the conditional probability of developing the first UTI in year t given that a UTI will occur by year 14 after coitarche. Since a woman will not necessarily experience a UTI within 14 years after initiating sexual activity, it is necessary to calculate the unconditional estimate for use as h(t) in the model. Again, letting T be the year of first UTI, we have:

$$g(t) = P(T = t \mid t - 1 < T \le 14) = \frac{P(T \le 14,\ T = t \mid T > t - 1)}{P(T \le 14)} = \frac{P(T \le 14 \mid T = t)\, P(T = t \mid T > t - 1)}{P(T \le 14)} \qquad (3)$$

Clearly, $P(T \le 14 \mid T = t) = 1$ for $t \le 14$. Thus equation (3) becomes:

$$\hat{g}(t) \approx P(T = t \mid t - 1 < T \le 14) = \frac{P(T = t \mid T > t - 1)}{P(T \le 14)}$$

Thus an estimate of h(t) is given by:

$$h(t) = P(T = t \mid T > t - 1) = P(T = t \mid t - 1 < T \le 14)\, P(T \le 14) \approx \hat{g}(t)\, P(T \le 14) \qquad (4)$$

Studies suggest that about half of all women experience a UTI by age 30. Assuming that the average age of coitarche is 16, $P(T \le 14) \approx 0.5$. Thus equation (4) becomes:

$$\hat{h}(t) = (0.5)\, \hat{g}(t) \qquad (5)$$

Equation (5) is used to calculate h(t), using the parametric estimate $\hat{g}(t)$ of equation (2).

Parameter Estimation - Second UTI

The model assumes that an infection is treated and cured within one month, and then allows the development of a second UTI within six months after the first one. In particular, for month j, j = 1, ..., 6 after the first UTI, there is a conditional probability m(j) of developing a second UTI, given no second UTI through month j-1.

Kaplan-Meier estimates of m(j), j = 1, ..., 6 were calculated using data from Foxman et al. (unpublished). A total of 263 women were followed from their first UTI for six months or until they developed a second UTI, whichever came first. Figure 3 shows the resulting Kaplan-Meier estimates. These non-parametric estimates are used in the model.

[Figure 3: Conditional probability per month after first UTI of developing a second UTI. Horizontal axis: j (months since first UTI).]
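As a worked illustration of equations (2) and (5), the short Python sketch below evaluates the parametric conditional hazard and the resulting unconditional hazard for each year, using the reported estimates a = 0.18 and b = 4 and the assumed value P(T ≤ 14) = 0.5. The function and constant names are illustrative and do not come from the report.

```python
A_HAT = 0.18        # reported maximum likelihood estimate of a
B_HAT = 4           # reported maximum likelihood estimate of b (years)
P_UTI_BY_14 = 0.5   # P(T <= 14), as assumed in the text

def g_hat(t):
    """Equation (2): a*t below b, constant a*b from b onward."""
    return A_HAT * t if t < B_HAT else A_HAT * B_HAT

def h_hat(t):
    """Equation (5): unconditional yearly hazard of a first UTI."""
    return P_UTI_BY_14 * g_hat(t)

for t in range(1, 15):
    print(f"t={t:2d}  g_hat={g_hat(t):.2f}  h_hat={h_hat(t):.2f}")
```

With these inputs the unconditional hazard rises linearly from 0.09 in year 1 to 0.36 in year 4 and is constant afterwards.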

RESULTS

The model, using $\hat{g}(t)$ of equation (2) and $\hat{m}(j)$, was solved using STELLA II, a differential equation solver. The resulting cumulative probability of experiencing a first UTI as a function of time since coitarche is shown in Figure 4. The cumulative probability of having a second UTI was also found using the model, and is shown in Figure 5.

[Figure 4: Cumulative probability of experiencing a first UTI. Horizontal axis: t (time since coitarche).]

[Figure 5: Cumulative probability of experiencing a second UTI. Horizontal axis: months since coitarche.]

In Figure 4, we see that by 2 years after coitarche, approximately 40% of women will have experienced a first UTI. Figure 5 shows that approximately 7% of women will have had at least two UTIs within 2 years of coitarche. Clearly, women will experience a second UTI only after having had a first one; thus the curve in Figure 5 lags behind that in Figure 4 by a few months. Also, the two figures are displayed on very different scales, indicating the different magnitude of probabilities of first and second UTIs.

Potential Use of the Model

The model can readily provide information on how intervention strategies might change the probability of developing a first or second UTI. A simple example serves to illustrate how we can evaluate possible public health strategies. Suppose an intervention strategy were available that would lower all h(t) by 20%; the results are shown in Figures 6 and 7. Lowering h(t) by 20% does not appear to have a significant effect on the asymptotic values of P(UTI 1 < t) or P(UTI 2 < t), but shifts both curves to the right somewhat, indicating that for time t, the probability of having developed a UTI before t is lower after the intervention. The shift is greater for second UTI than first UTI. Information such as this could help public health officials determine whether intervention strategies are justified.

Other examples of the model's usefulness in evaluating alternatives include:

a) Different Strains of Bacteria

Approximately 50% of second UTIs are caused by the same strain of bacteria that caused the first UTI. The model can be modified to allow for the possibility of second infection by the same or a different strain, by adding more states. Thus instead of a single "UTI 2" state, there would be two: one for re-infection with the same strain, and one for infection with a different strain. Intervention strategies might affect the probabilities per month of developing a second infection with the same or a different strain as the first.
Similarly, the model can be used to study the difference between being infected with E. coli versus non-E. coli bacteria for the first UTI. There may be different recurrence rates for these two types; thus the model could be used to study possible interventions based on behavior profiles or symptom profiles.

b) Condom Use

There is evidence that condom use is protective against a second infection but may be associated with higher probabilities of first infection. These elements could be accounted for in the model by raising or lowering the appropriate probabilities of infection to see how the final probabilities of having an infection change. To simulate the effect of condom use, h(t) and m(j) could be raised and lowered, respectively, and the results examined to determine the effect on the probability of developing an infection by a given time.

In all of these examples, the model can also provide insight on how sensitive the final distributions are to changes in parameters. For example, Figures 6 and 7 show that h(t) would have to be lowered by more than 20% to have an impact on the asymptotic cumulative probabilities.

The responses of 2000 women to an extensive questionnaire are currently being analyzed. The questions cover various areas that could be related to the etiology of UTI, including number of sexual partners, type of contraceptive used, and estrogen therapy. We anticipate that these data will provide many avenues of research for which the model described here might be useful, such as an examination of the effects of estrogen therapy or sexual behaviors on the probability of developing UTI.

[Figure 6: Cumulative probability of experiencing a first UTI (two curves: original h(t), and h(t) decreased by 20%). Horizontal axis: t (months since coitarche).]

[Figure 7: Cumulative probability of experiencing a second UTI (two curves: original h(t), and h(t) decreased by 20%). Horizontal axis: t (months since coitarche).]

APPENDIX A

Kaplan-Meier Estimation

In order to estimate $h(t) = P(T = t \mid T > t - 1)$, we first need to estimate the conditional hazard, $g(t) = P(T = t \mid t - 1 < T \le 14)$, from the available data. The condition T > t - 1 is implicitly assumed to be present in the work that follows. Let $n_t$ be the number of women who have not yet had a UTI by the start of year t, and $d_t$ be the number of women who develop a UTI in year t. The hazard is defined as:

$$g(t) = \frac{f(t)}{S(t)} \qquad (1)$$

where f(t) is the probability mass function of T (the time of first UTI), and S(t), the "survival" function, is P(T > t). Both f(t) and S(t) can be estimated from the data.

The number of women who do not develop a UTI during year t is given by $(n_t - d_t)$. An estimate of f(t) is obtained by dividing the number of women developing a UTI during year t by the total number of women in the data set. Thus for each year t, we have:

$$\hat{f}(t) = \frac{d_t}{210} \qquad (2)$$

where 210 is the total number of women in the data set.

The estimation of S(t) is slightly more involved. Let $p_t$ = P(no UTI in year t | no UTI up to t - 1). Then $S(t) = p_1 p_2 \cdots p_t$. An estimate of $p_t$ is:

$$\hat{p}_t = \frac{n_t - d_t}{n_t} = 1 - \frac{d_t}{n_t} \qquad (3)$$

We can now use (3) to provide the Kaplan-Meier estimates $\hat{S}(t)$:

$$\hat{S}(t) = \prod_{i=1}^{t} \left(1 - \frac{d_i}{n_i}\right) = \hat{S}(t - 1) \left(1 - \frac{d_t}{n_t}\right) \qquad (4)$$

Finally, using (2) and (4) in equation (1) produces an estimate of the conditional hazard, g(t). The results are shown in Table A.1 and Figure 2.

The same procedure was used to estimate $m(j) = P(J = j \mid J > j - 1)$. The results are shown in Table A.2 and Figure 3.

APPENDIX A, cont'd

Year t   d_t   n_t   S(t)    f(t)    g(t)
  0        0   210   1       0       0
  1       24   210   0.886   0.114   0.129
  2       33   186   0.729   0.157   0.216
  3       30   153   0.586   0.143   0.244
  4       38   123   0.405   0.181   0.447
  5       25    85   0.286   0.119   0.417
  6       20    60   0.190   0.095   0.500
  7       18    40   0.105   0.086   0.818
  8        9    22   0.062   0.043   0.692
  9        5    13   0.038   0.024   0.625
 10        2     8   0.029   0.010   0.333
 11        2     6   0.019   0.010   0.500
 12        1     4   0.014   0.005   0.333
 13        1     3   0.010   0.005   0.500
 14*       1     2   0.005   0.005   1.000

* Note that only one person developed a UTI during year 14, while two were at risk. The estimation method gives an estimate of zero for the hazard function for all subsequent years until the last person developed a UTI, at which point the estimate of the hazard is undefined. Thus estimates were carried out only until year 14.

Table A.1: Kaplan-Meier Estimation of Yearly Hazard to First UTI

Month j   d_j   n_j   S(j)    f(j)    m(j)
  0         0   263   1       0       0
  1         8   263   0.970   0.030   0.031
  2        17   255   0.905   0.065   0.071
  3         4   238   0.890   0.015   0.017
  4         7   234   0.863   0.027   0.031
  5         7   227   0.837   0.027   0.032
  6         3   220   0.825   0.011   0.014

Table A.2: Kaplan-Meier Estimation of Monthly Hazard to Second UTI
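The procedure of Appendix A can be retraced in a few lines of Python. The sketch below starts from the yearly counts d_t of Table A.1 and recomputes the survival, mass function and hazard columns. It assumes, as the table itself suggests, that the number at risk falls only through new UTIs (no censoring); the function and variable names are illustrative.

```python
# Yearly counts d_t of first UTIs for years 1..14, copied from Table A.1.
D = [24, 33, 30, 38, 25, 20, 18, 9, 5, 2, 2, 1, 1, 1]
N_WOMEN = 210

def kaplan_meier(d, n_total):
    """Return rows (t, d_t, n_t, S_t, f_t, g_t) following Appendix A."""
    rows = []
    at_risk = n_total
    survival = 1.0
    for t, d_t in enumerate(d, start=1):
        survival *= 1.0 - d_t / at_risk   # equation (4)
        f_t = d_t / n_total               # equation (2)
        g_t = f_t / survival              # equation (1)
        rows.append((t, d_t, at_risk, survival, f_t, g_t))
        at_risk -= d_t                    # assumption: no censoring
    return rows

for t, d_t, n_t, s_t, f_t, g_t in kaplan_meier(D, N_WOMEN):
    print(f"{t:2d} {d_t:3d} {n_t:4d} {s_t:.3f} {f_t:.3f} {g_t:.3f}")
```

Replacing `D` and `N_WOMEN` with the monthly counts and the 263 women of Table A.2 retraces the second-UTI estimates in the same way.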

APPENDIX B

Maximum Likelihood Estimation

As an alternative to using the Kaplan-Meier (non-parametric) estimates of g(t), a parametric model can be used. The following functional form is assumed for the hazard:

$$g(t) = \begin{cases} a t & t < b \\ a b & t \ge b \end{cases} \qquad (1)$$

For convenience, this formulation assumes that t is continuous. From equation (1), the probability density function for T (the time of first UTI) is:

$$f(t) = g(t)\, e^{-\int_0^t g(s)\, ds} = \begin{cases} a t\, e^{-a t^2 / 2} & t < b \\ a b\, e^{-a b (t - b)} & t \ge b \end{cases} \qquad (2)$$

Given the data $t_i$ = time of the ith UTI developed, and letting m(b) indicate the last UTI developed before time b (so that m(b) of the n observations satisfy $t_i < b$), the likelihood function L(a,b) is:

$$L(a,b) = \prod_{i} f(t_i) = \prod_{\{i: t_i < b\}} a t_i\, e^{-a t_i^2 / 2} \prod_{\{i: t_i \ge b\}} a b\, e^{-a b (t_i - b)}$$

$$= a^{m(b)} \left( \prod_{\{i: t_i < b\}} t_i \right) e^{-\frac{a}{2} \sum_{\{i: t_i < b\}} t_i^2}\; a^{n - m(b)}\, b^{n - m(b)}\, e^{-a b \sum_{\{i: t_i \ge b\}} (t_i - b)} \qquad (3)$$

The maximum likelihood estimates $\hat{a}$ and $\hat{b}$ are the values that maximize L(a,b). To maximize L(a,b), we take the logarithm of each side to get the log-likelihood function, l(a,b):

$$l(a,b) = m(b) \ln a + \ln \prod_{\{i: t_i < b\}} t_i - \frac{a}{2} \sum_{\{i: t_i < b\}} t_i^2 + [n - m(b)] \ln a + [n - m(b)] \ln b - a b \sum_{\{i: t_i \ge b\}} (t_i - b) \qquad (4)$$

Values of a and b that maximize L(a,b) will also maximize l(a,b). Each side of equation (4) can be differentiated with respect to a, and set equal to zero (a necessary condition for maximization, since l(a,b) is a concave function of a):

$$\frac{\partial l}{\partial a} = \frac{m(b)}{a} - \frac{1}{2} \sum_{\{i: t_i < b\}} t_i^2 + \frac{n - m(b)}{a} - b \sum_{\{i: t_i \ge b\}} (t_i - b) = 0 \qquad (5)$$

Equation (5) can be solved to get $\hat{a}$ in terms of b:

$$\hat{a} = \frac{n}{\left( \sum_{\{i: t_i < b\}} t_i^2 / 2 \right) + b \sum_{\{i: t_i \ge b\}} (t_i - b)} \qquad (6)$$

A crude search technique was used, in which candidate integer values of b were selected, the corresponding $\hat{a}$ values found from equation (6), and l(a,b) then evaluated to find the maximum. Final estimates were $\hat{a} = 0.18$ and $\hat{b} = 4$.
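The crude search described above can be sketched in Python as follows. The report does not list the individual times $t_i$, so this sketch assumes they can be taken as the integer years of Table A.1 (24 women with t = 1, 33 with t = 2, and so on), which leaves out the one woman whose UTI occurred after year 14. The function names are illustrative, and the log-likelihood is coded exactly as equation (4) is written.

```python
import math

# Year of first UTI -> number of women, copied from Table A.1 (assumed input).
COUNTS = {1: 24, 2: 33, 3: 30, 4: 38, 5: 25, 6: 20, 7: 18,
          8: 9, 9: 5, 10: 2, 11: 2, 12: 1, 13: 1, 14: 1}

def a_given_b(times, b):
    """Equation (6): value of a that maximizes l(a, b) for a fixed b."""
    early = sum(t * t / 2 for t in times if t < b)
    late = b * sum(t - b for t in times if t >= b)
    return len(times) / (early + late)

def log_likelihood(times, a, b):
    """Equation (4), term by term."""
    early = [t for t in times if t < b]
    late = [t for t in times if t >= b]
    return (len(early) * math.log(a)
            + sum(math.log(t) for t in early)
            - a / 2 * sum(t * t for t in early)
            + len(late) * (math.log(a) + math.log(b))
            - a * b * sum(t - b for t in late))

def crude_search(times, candidates):
    """Try each candidate b, plug in a from equation (6), keep the best."""
    best = None
    for b in candidates:
        a = a_given_b(times, b)
        ll = log_likelihood(times, a, b)
        if best is None or ll > best[2]:
            best = (a, b, ll)
    return best

times = [t for t, count in COUNTS.items() for _ in range(count)]
a_hat, b_hat, ll = crude_search(times, range(1, 15))
print(f"a = {a_hat:.2f}, b = {b_hat}")
```

Under these assumptions the search selects b = 4 with a of about 0.18, in line with the estimates reported above.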