MAINTENANCE POLICIES FOR A MULTI-STAGE DETERIORATING SYSTEM WITH ACYCLIC PHASE-TYPE SOJOURN TIME DISTRIBUTIONS R.H. Yeh and C. Teresa Lam Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109-2117 Technical Report 93-3 February 1993

MAINTENANCE POLICIES FOR A MULTI-STAGE DETERIORATING SYSTEM WITH ACYCLIC PHASE-TYPE SOJOURN TIME DISTRIBUTIONS R. H. Yeh and C. Teresa Lam Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109-2117 Technical Report No. 93-03 February 1993 Submitted to IEEE Transactions on Reliability

Maintenance Policies for a Multi-Stage Deteriorating System with Acyclic Phase-Type Sojourn Time Distributions Ruey H. Yeh* and C. Teresa Lam* The University of Michigan, Ann Arbor Key Words - Acyclic Phase-Type Distribution, Markovian deterioration, Expected long run cost rate, Inspection and replacement policy, Policy improvement algorithms Reader Aids - Purpose: Report methods to derive inspection and replacement policies Special math need for derivations: Markov probability theory, nonlinear optimization, dynamic programming Special math need to use results: Nonlinear optimization, dynamic programming Results useful to: Maintenance planners, reliability analysts and theoreticians * Research Supported in part by a grant from the Research Partnership Program, Horace H. Rackham School of Graduate Studies, University of Michigan. 1

Abstract - In this paper we investigate inspection and replacement policies for a multi-stage deteriorating system with acyclic phase-type (ACPH) sojourn time distributions. We assume that the stage (except the failure stage) of the system is known only through inspections and the system can be replaced at any point of time. The time durations of inspection and replacement are nonnegligible. The costs incurred are inspection cost, replacement cost, operating cost, and downtime (idle) cost. In particular, the operating cost rates, replacement costs and times are functions of the degree of deterioration. Various methods are developed for deriving inspection and replacement policies to minimize the expected long run cost rate. 2

1. INTRODUCTION This paper studies inspection and replacement policies for a multi-stage deteriorating system whose sojourn time distribution in each stage is acyclic phase-type (ACPH). An ACPH distribution is the distribution of the time until absorption in a finite state NMarkov process with an upper triangular infinitesimal generating matrix. This multi-stage deteriorating system is considered here for the following two reasons. 1. Multi-stage deteriorating systems with exponential sojourn time distributions have been widely investigated in the literature [11,14-16] and are special cases of the model considered here. One well-known limitation of exponential distributions is that they assign a large probability to a shorter sojourn time and cannot express multimodality of a distribution. Because of the limitation of exponential distributions, semi-Markovian processes have been employed to model two or three-stage deteriorating systems [7,10,12,13] by allowing the sojourn time distributions to be general (non-exponential) distributions. However when the number of stages increases, analytical approaches to derive inspection and replacement policies become so complicated that they are essentially intractable. As we will see later on, if the sojourn time distribution in each stage is ACPH, then the overall model for the deteriorating system is again Markovian. While our model is more general, it also preserves the analytic tractability of Markovian models. 2. As discussed in [2,3,5,8], the class of ACPH distributions is very versatile and can be used to approximate general sojourn time distributions. Many methods have been developed to approximate general distributions or estimate empirical data by ACPH distributions. Discussion of these methods is outside the scope of this paper. We refer interested readers to [1,2,4,6,8,9]. The objective of this paper is to derive inspection and replacement policies which minimize the expected long run cost rate. 3

2. MODEL We consider a deteriorating system with the following properties. 1. The deterioration of the system at any point of time can be classified into one of a finite number of stages 1,2,.., n + 1. Stage 1 represents the initial new condition of the system. The intermediate stages 2, 3,.., n are ordered to reflect the relative degree of deterioration of the system (in ascending order). Stage n + 1 represents the failure condition of the deteriorating process. From stage i, the system will deteriorate to stage j with probability pij. In particular, we assume that p.. = 0 whenever j < i as shown in Figure 1. Plns+I PIn P2== New 2 Failed New _ Failed Figure 1: A Flow Diagram of a Multi-Stage Deteriorating System. 2. Suppose that the time for the system to stay in stage i E {1,2,..,n} is a ki-phase acyclic phase-type (ACPH) distribution. A cki-phase ACPH distribution is defined as the distribution of the time until absorption in a ki + 1-state Markov process with an absorbing state ki + 1, an initial probability given by a 1 x (ki + 1) vector (1,0,., 0), and an upper triangular infinitesimal generating matrix Ti Ti oT 0 L J;OT T:(ki; +I)x(ki +1) where the entries of T, satisfy Ti(u,u) < 0 for 1 < u < kI, Ti(u,v) > 0 for 4

1 < u < v < k;, T(u, v) = 0 for kI > u > v > 1, T, 1 + T = 0, and O and 1 are column vectors with all entries equal to 0 and 1, respectively. 3. The overall system can now be modeled by a Markov process with N + 1 = E ki + 1 i=1 states where state N + 1 is an absorbing state. The infinitesimal generating matrix of this Markov process is given by T1 T12 T13' * T1n Pl,n+l T1 0 T2 T23 T2n P2,n+l T2 T=:::: 0 0 0... Tn pn,n+l Tn OT 0T oT. T o ( N+1) x (N+1) where Tij = [pij T, ]0 for i < j and 0 is a matrix with all entries equal to 0. I ki x kj 3. MAINTENANCE ACTIONS At any point of time, we assume that there are two maintenance actions availableinspection and replacement. 1. Two types of inspection are considered here - complete and incomplete inspections. A complete inspection can identify the current stage as well as the time that the system has spent in the current stage. An incomplete inspection can only identify the current stage of the system. Since the states within stages are fictitious, it is therefore reasonable that inspections can only identify stage but not state in practice. We assume that each inspection incurs a fixed cost M and it takes q units of time. 2. A replacement action is taken to replace the system with a new identical system or to repair the system back to its initially new stage. When the system is in stage i, 5

the replacement cost and time are cs (cl = 0) and r-, respectively. We assume that when a failure occurs, the system must be replaced immediately with an identical new system. 3. WVe assume that during an inspection or a replacement, the system is neither operating nor deteriorating and it incurs a loss of m per unit time. When the system is operating in stage i, the operating cost is as per unit time. 4. NOTATIONS S The set of all stages for the deteriorating system, S = {1, 2,..., n+ 1} S The set of all states in the overall Markovian model, S = {1, 2,..., N + 1} Si Si C S and Si = kj + 1,, kj, i.e., it is the set of all states within stage i E S in the overall Markovian model after renumbering. Note that Sn+1 = {N+l}. ci The system replacement cost if it is in state i E S, ci = cj whenever i E Sj for some j E S. fr The system replacement time if it is in state i E S ri = rj whenever i E Sj for some j E S. as The system operating cost rate if it is in state i E S \ {N + 1}, ai = aj whenever i E Sj for some j E S \ {n + 1}. Pij(t) The probability that the system presently in state i will be in state j after t units of time, i,j E S. It is clear that Pij(t) = 0 for all j < i. Fi(t) The failure time distribution of the system starting from state i. Note that F1(t) = N Pi,N+l(t) and Fi(t) = 1 - F (t) = E Pij(t). j=i Ai(t) The expected operating cost of the system during [0, t] given it starts from state N t i. Ai(t) = Z ajt Pij(u) du. 6

pi The expected time to failure given the system starts from state i E S. [i = F,(u) du. 5. A POLICY IMPROVEMENT ALGORITHM In this section, we provide a Policy Improvement Algorithm (PIA) to find the optimal inspection and replacement policy by assuming that the current state of the system can be estimated after each inspection. Our objective here is to minimize the expected long run cost rate. Later on we will provide methods to estimate the current state of the system using information revealed by a complete or an incomplete inspection. Let 6(i) be the action selected at the time instant when the current state of the system is estimated as in state i E S and 6 = (6(1),...,6(N + 1)) be the sequence of actions selected for all the states. If 6(i) = 0, then the system should be replaced immediately. If 6(i) = ti > 0, then the system is inspected ti units of time later. Since a failure system must be replaced, we will only consider the set of all policies, A, with 6(N + 1) = 0. Let Ts(i) and C6(i) be the expected time and cost of the system starting from state i to the completion of the next replacement under policy 6 E A. Then, Ts(i) and Cs(i) can be calculated recursively by t _ N+1 T( Fi(u) du + qFi(ti) + Pij(t)T6(j) if (i) = t > (1) T6( - j=i (1) ri if 6(i) = 0 and N+1 Ai(ti) + (M + mq)i(ti) + E Pij(ti)C6(j) if s(i) = t, > 0 Csi) = j=i (2) ci + mri if 6(i) = 0 For an infinite time span, minimizing the expected long run cost rate is equivalent to minimizing the expected cost rate of a maintenance cycle which is the time interval between two successive replacements. Therefore, the objective here is to find 6* = (tt, t,... * * *, tv, 0) 7

such that = C6(1) C. (1) SEA Ts(l) T6.(1) Clearly, there are two trivial policies:'r = (0,0,., 0) and 6/ = (oc, o,., o, 0) with expected cost rates gr = m + cl/rl = m and gf = [Al(oo) + CN+1 + mrN+l]/(/I + rN+l), respectively. The optimal expected cost rate g* is therefore bounded above by the minimum of g, and gf. From [16], solving Equation (3) is equivalent to finding a g' E [0, o) and a policy'6 E A such that inf [Cs(1) - gT6s(1)] = C6.(1) - g*Ts.(1) = 0. Given any g E [O,m], SEA i E S, and 6 E iA, define Vs(i,g) = Cs(i) - gTs(i). From Equations (1) and (2) above, we have V6(i,g) = v6(it) if 6(i) > ci + (m - g)rf if 6(i) = 0 where v(i,g, t) = _ ( {A(t) + [M + (m - g)q] Fi(t) N+1 jt _ + E Pij(t)Vs(,g) - g (u)du (4) Since all the functions on the right hand side of Equation (4) are continuous for any 6 E A, v6(i,g,t) is a continuous function of t for all t E (0,oo). Furthermore, for each fixed g E [0,m], we have lim v6(i,g,t) = Ai(oo) + cN+i + (m - g)rN+l - g^i < 00 t-0oo and lim v,(i, g, t) = oo. Therefore, for each g E [0,m], inf v,(i,g,t) exists and is t-O tE[(,oo] finite. Furthermore, if W(g) = inf[Cs(l) - gT6(1)] for g E [0,m], then W(g) is clearly a continuous and non-increasing function in g with W(O) = inf C6(1) > 0 and W(m) < SEA 0. Therefore, there always exists a g E [0, m] and a corresponding policy 6g such that W(g) = 0, i.e., 6* = 6g and g* = g. The optimal policy 6* and cost rate g* can be obtained by the following Policy Improvement Algorithm (PIA). 8

Step 1: Select an initial policy 6, e.g., 6 = 6, and set g = m. Step 2: Construct a policy 5 as follows. Set s(N + 1,g) = cN+l + (m - g)rN+l and 6(N + 1) = 0. For i =, -1,.., 1, Set Vs(i,) =minm inf v(ig,t), + (m - g)r, (tE[O,co] 6 ( If Vs(i,g) = inf v,(i,g,t) = v6(i,g, t), then 3(i) = ti. tE[o,oo] If Vs(i,g) = c5 + (m - g)r, then 6(i) = 0. Step 3: If Vs(l,g) = 0, then g* = g and 3' = 6. STOP. Otherwise, g = C6(1)/T6(1). GOTO Step 2. A similar argument given in [14] can be used to show that the above algorithm converges to the optimal policy 6*. 6. ADAPTIVE METHODS Adaptive methods are developed to estimate the most likely state of the system after each complete or incomplete inspection and to apply the optimal policy 6* obtained by PIA. 6.1 Complete Inspection After a complete inspection, suppose that the system is identified to be in stage i E S \ {n + 1} for ri units of time. Then, the conditional probability that the system is currently in state j E Si is given by h,(r) = P ) (5) uES, within stage i. Then, we have hi(r,) = max hj(r1). Obviously, we also have P;,k(ri) = ~~~jE ~ ~ ~h (e Si= max Pij) since i the denominator of Equation (5) isb indepeyndent of j, given any ri E [0, oo). The method to implement 8* under complete inspection is summarized as follows. 9

Step 1: Use PIA to derive the optimal policy 6'. Step 2: Set inspection time t = t. Step 3: Perform an inspection t units of time later and identify the stage i and 7i. Step 4: Find k E S, such that Pk(Ti) = max Pij(7i) where i' is the first state in stage i. Step 5: If 6*(k) = t > 0, then t = t! and GOTO Step 3. Otherwise, replace the system immediately and GOTO Step 2. 6.2 Incomplete Inspection Under incomplete inspection, the estimation of the current state relies on the information revealed by all the previous inspections since the amount of time the system has spent in the current stage cannot be realized. Suppose that a new system is just installed, i.e., the system is in state 1 at time 0 and clearly the optimal action is 6'(1). To avoid the trivial case that a new system is replaced, we assume that 6*(1) = t; > 0. The system is therefore inspected at time t*. Set t' = t;. After the first incomplete inspection, suppose that the system is identified to be in stage il G S \ {n + 1}. Then, the conditional probability that the system is in state j E $il is given by M(tl) -pli^i) E P1j(tl) uESll More generally, let t' be the time interval between the r - 1st and the rth incomplete inspections. After the rth incomplete inspection, if the system is identified to be in stage I, then the conditional probability that the system is in state j E Sir is given by hr(tr) = tSE.r-l j m l h, Z (t - u)(th) vESir uE$ir-l The most likely current state Ic of the system is the state such that h (tr) = max h(t).' If 6'(k) =, then the system is replaced immediately and brought back to state 1. If 6'(k) = tk > O, then the system is inspected t. units of time later and t+1' = t~ > 0. The method to implement 6* under incomplete inspection is summarized as follows. 10

Step 1: Use PIA to derive the optimal policy 6*. Step 2: Set r = 1, t= 6-(1), h~(t) = and h(t) = 0 for allj S \ 1}. Step 3: Perform an inspection tr units of time later and identify the current stage iT. Step 4: Find k E S-r such that h;(tr) = max h(t'). Step 5: If 6*(k) = t > 0, then set r = r + 1, t = t and GOTO Step 3. Otherwise, replace the system immediately and GOTO Step 2. 7. RESTRICTIVE METHOD In this section, we present an alternative algorithm to derive an inspection and replacement policy for our multi-stage deteriorating system. The resulting policy can be implemented without estimating the current state of the system after a complete or an incomplete inspection. In this algorithm, we restrict 6(j) = 6(k) for all j, k E S. Let il and i2 E Si be the first state and the last state in stage i, respectively. Under this restriction, we modify Equation (4) by v6(lgt) = i p (t) {Al(t) + [M + (m - g)q] (t) + C Plj(t)v,(j,g,t) N+1 _ + E Plj(t)Vs(j,g) - g Fl(u)du (6) j=i2+l J for all ii < I < i2. Using the above modification, the optimal restricted policy can be obtained by adjusting Step 2 of PIA as follows. Step 2': Construct a policy 6 as follows. Set Vs(N + l,g) = cN+1 + (m - g)rN+ and 6(N + 1) = 0. For i = n,n- 1,,2,1, Find ii, i2 E 5, and v,(il,g,t) as given in Equation (6). Set Vs(i1,g) = m [in if tv(il,g,t),ci + (m-g)ri]. 11

If V6(i, g) = inf v6(il, g, t) = v,(iI, g, t), then 6(j) = ti for all j E S;. tE[0,*o] If Vs(il,g) = c4 + (m - g)ri, then 6(j) = 0 for all j E Si. Under the optimal restricted policy derived using the modified PIA above, if the system is identified to be in stage i E S, then the optimal action is either to inspect the system ti units of time later or to replace the system immediately. 8. EXAMPLES In this section, we consider two 5-stage deteriorating systems with the following common cost and time parameters. 1. al = 1, a2 = 3, a3 = 6, a4 = 9, 2. c1 = 500, c2 = 600, c3 = 1000, c4 = 1400, cs = 2100, 3. r1 = 20, r2 = 21, r3 = 23, r4 = 26, rs = 30, 4. M=, q = 0.1, m= 10. Using these cost and time parameters, gf can be computed and is equal to 10.99. 8.1 Example 1 In this example, kc = 1, 2 = 4, 3 = 1, Ik = 1 and the T matrix is given by -.01.009 0 0 0 0 0.001 0 -.04762.04762 0 0 0 0 0 0 0 -.04546.04546 0 0 0 0 0 0 0 -.04348.04348 0 0 0 0 0 0 0 -.04167.0375 0.00417 0 0 0 0 0 -.0125.01125.00125 0 0 0 0 0 0 -.01429.01429 0 0 0 0 0 0 0 0 12

Using PIA, the optimal policy 6* = (25.17, 11.75, 6.03, 1.85, 0, 0, 0, 0) and the corresponding expected cost rate g' = 7.11. For adaptive methods, the expected cost rates are estimated from 1000 simulated maintenance cycles and are given in Table 1 below. For restrictive method, the optimal restricted policy is obtained using the modified PIA and is given by (63.13,0,0,0,0,0,0,0) with expected cost rate equal to 8.01. Method g (g - gf)/gf Complete inspection 7.96 27.57% Incomplete inspection 7.97 27.48% Restrictive 8.01 27.12% Table 1: Results for Example 1 8.2 Example 2 In this example, kl = k2 = k3 = k4 = 2 and the T matrix is given by -.0204.0204 0 0 0 0 0 0 0 0 -.01961.01765 0 0 0 0 0.00196 0 0 -.02273.02273 0 0 0 0 0 0 0 0 -.02173.01956 0 0 0.00217 0 0 0 0 -.02564.02564 0 0 0 0 0 0 0 0 -.02439.021c. 0.00244 0 0 0 0 0 0 -.02941.02941 0 0 0 0 0 0 0 0 -.02778.02778 0 0 0 0 0 0 0 0 0 Using PIA, the optimal policy 3* = (28.55, 14.61,4.3,0,3.12,0,0,0,0) and the corresponding expected cost rate g* = 7.55. For adaptive methods, the expected cost rates are estimated from 1000 simulated maintenance cycles and are given in Table 2 below. For restrictive method, the optimal restricted policy is obtained using the modified PIA and is given by (62.6,62.6,0,0,0,0,0,0,0) with expected cost rate equal to 8.32. 13

Method g (g - gf)/gf Complete inspection 8.27 24.75% Incomplete inspection 8.38 23.75% Restrictive 8.32 24.29% Table 2: Results for Example 2 REFERENCES [1] T. Altiok, "On the Phase-Type Approximations of General Distributions," IIE Transactions, vol. 17, 1985, pp 110-116. [2] A. Bobbio, A. Premoli, A., and 0. Saracco, "Modeling and Identification of Nonexponential Distributions By Homogeneous Markov Processes," Proceedings of the 6th Advances in Reliability Technology Symposium, April 9-11, 1980, pp 373-392. [3] R. F. Botta, C. M. Harris, and W. G. Marchal, "Characterizations of Generalized Hyperexponential Distribution Functions," Commun. Statist. Stochastic Models, vol. 3, 1987, pp 115-148. [4] W. Bug, and U. Herzog, "The Phase Concept: Approximation of Measured Data and Performance Analysis," Computer Performance, Proceedings of the International Symposium on Computer Modeling, Measurement, and Evaluation, August 16-18, 1977, pp 23-38. [5] R. D. Cox, "A Use of Complex Probabilities in the Theory of Stochastic Processes," Proceedings of the Cambridge Philosophical Society, vol. 51, 1955, pp 313-319. [6] A. Cumani, "On the Canonical Representation of Homogeneous Markov Processes Modeling Failure-Time Distributions," Microelectronic Reliability, vol. 22, 1982, pp 583-602. 14

[7] V. Jayabalan, and D. Chaudhuri, "Optimal Maintenance and Replacement Policy for a Deteriorating System with Increased Mean Downtime," Naval Research Logistics, vol. 39, 1992, pp 67-78. [8] M. A. Johnson, "Matching Moments to Phase Distributions: Density Function Shapes," Commun. Statist. Stochastic Models, vol. 6, 1990, pp 183-306. [9] M. A. Johnson, "Matching Moments to Phase Distributions: Nonlinear Programming Approaches," Commun. Statist. Stochastic Models, vol. 6, 1990, pp 259-281. [10] J. Karpinski "Checking Policy for Deteriorating Multistate Systems," Naval Research Logistics, vol. 35, 1988 pp 259-268. [11] H. Luss, "Maintenance policies when deterioration can be observed by inspections", Operat. Res., vol. 24, 1976, pp 359-366. [12] A. Z. Milioni, and S. R. Pliska,"Optimal Inspection Under Semi-Markovian Deterioration: Basic Results," Naval Research Logistics, vol. 35, 1988 pp 373-392. [13] A. Z. Milioni, and S. R. Pliska, "Optimal Inspection Under Semi-Markovian Deterioration: The Catastrophic Case," Naval Research Logistics, vol. 35, 1988, pp 393-411. [14] H. Mine, H. Kawai, "An optimal inspection and replacement policy", IEEE Trans. Reliability, vol. 24, 1975, pp 305-309. [15] H. Mine, H. Kawai, "An optimal inspection and replacement policy of a deteriorating system", J. Operat. Res. Japan, vol. 25, 1982, pp 1-15. [16] M. Ohnishi, H. Kawai, and H. Mine, "An optimal inspection and replacement policy for a deteriorating system", J. Appl. Prob., vol. 23, 1986, pp 973-988. 15

AUTHORS R. H. Yeh; Industrial and Operations Engineering; University of Michigan; Ann Arbor, Michigan 48109, USA. R. H. Yeh received the BA degree in Industrial Engineering from National Tsing Hua University, Taiwan, in 1984. He also received MSE in Industrial and Operations Engineering from University of Michigan in 1988. He is currently a PhD candidate in the Industrial and Operations Engineering Department, University of Michigan. Professor C. Teresa Lam; Industrial and Operations Engineering; University of Michigan; Ann Arbor, Michigan 48109, USA. C. Teresa Lam received the BA degree in Mathematics from Oxford University, England in 1984. She also received MSc & PhD in Statistics from Carnegie Mellon University in 1985 and 1988 respectively. She is currently an assistant professor in the Industrial and Operations Engineering Department, University of Michigan. 16