Bounds on Optimal Values in Stochastic Scheduling John R. Birge Department of Industrial and Operations Engineering The University of Michigan Ann Arbor MI 48109-2117, USA K.D. Glazebrook Department of Statistics University of Newcastle upon Tyne Newcastle NE1 7RU UK February 5, 1996 Technical Report 96-2

Bounds on Optimal Values in Stochastic Scheduling J.R. Birge * Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109 USA K.D. Glazebrook Department of Statistics University of Newcastle upon Tyne Newcastle NE1 7RU UK February 5, 1996 Abstract: Consistent stochastic orders of processing times and objective functions yield optimal policies in many stochastic scheduling problems. When these orders fail to hold, however, finding optimal values may be difficult. In this paper, we show how to bound these values in general situations including problems with unreliable machines and tardiness-based objectives. 1. Introduction. Most practical scheduling problems involve some uncertainty about the lengths of time to process jobs and the availability of machines and other resources. Recent results (see, e.g., Chang and Yao [1993] and Righter [1994]) using stochastic order relationships have characterized optimal policies when these orders are consistent. While these results may be quite useful in certain situations, the conditions for optimality are most likely rare in practice. We show how the results may be extended to broad classes of problems by constructing bounds on optimal values. While our basic approach may apply in multiple machine problems, to keep the analysis simple, we consider a single-machine and fixed set of N jobs. Costs are assumed to be the expectation of * Based in part on work supported by the National Science Foundation under Grant DDM9215921. 1

functions of job completion times, including tardiness objectives. The processing times may include breakdown processes. Our results represent extensions of results in weighted flow time models as well as previous models with due dates. In the weighted flow time case, the application of Gittins indices (Gittins [1979], Glazebrook and Gittins [1981], Glazebrook [1976]) has led to characterizations of optimal strategies in many contexts. Special cases with machine breakdowns also appear in Glazebrook [1984], Birge et al. [1990], and Birge and Glazebrook [1988]. Birge et al [1990] also gives limited results for linear tardiness penalties. Weighted fixed cost penalties for each tardy job are considered in Glazebrook [1983]. Optimal policy results for general functions of completion times appear in Righter [1994]. We use the optimality results in Chang and Yao [1993] extensively below. Shaked and Shantikumar [1994] give general properties of stochastic orderings that lead to many of the optimality results. In Section 2, we describe our model and previous results. Section 3 gives bounds for deviations when jobs have order specific costs. Section 4 provides corresponding results when costs are job specific. Section 5 gives bounds in the case of nonagreeable costs while Section 6 discusses optimality conditions for nonpreemptive policies in the presence of breakdowns. Section 7 presents conclusions. 2. Model and Background. We suppose a set J = {1,2,..., N} of jobs is to be processed on a single machine. Each job j has a random processing requirement Xj. The processing requirements are independent of each other. At t = 0,1,..., the state of the system consists of the set of completed jobs and the cumulative processing time xj for all unfinished jobs in J. Decisions are made at t on which job to process during the interval [t, t + 1]. A policy ir is any nonanticipative rule dependent on the history of the process that determines which jobs to process at t until all jobs in J are completed. We assume the cost of a policy is the expectation of a function of the completion times, V(7r) = E[f(cl,...,CN)], (1) where ci denotes the completion time of job i and E, denotes the expectation under policy 7r. The objective is to minimize V(7r). The objective in (1) is job specific as in the case where each job may correspond to a different product. When each job is substitutable, the cost may be order specific, only depending on the order of finish, with objective f written as f(c(i),..., c(c(N)). 2

A common objective in (1) is weighted flow time f(cl,..., CN) = N1 iCi. In this case, assuming no breakdowns, if job j has received xj units of processing, then Gittins [1979] and Glazebrook [1984] show that an optimal policy processes the uncompleted job with highest Gittins index yj(x) where ( -r rtt-2 wjZpj(x + t- 1) H(1 -j( + S)) i E (1 - pj(X + S)) t=O0 s=0 J and pj(t) = P(Xj = t + 1 Xj > t), t = 0,1,..., is the completion rate function. It is not hard to see that if pj is nondecreasing, then the supremum in (2) is attained at r = +oo and so yj(x) = wjlpj(x), where pj (x) - E[xj -xjx>x When all jobs have nondecreasing completion rates, then ordering jobs by the weighted expected remaining processing time is optimal. An optimal nonpreemptive policy is also given in Glazebrook [1983] for an objective that includes a reward for completion before a due date, di for each i, and no reward beyond the due date. In this model, if weights are decreasing, wl >... > wN, due dates are increasing d1 <... < dN and processing times are stochastically increasing, P[X1 < x] < P[X2 < x]... < P[XN < x] for all x, with the order unchanged when conditioning on past processing, then processing in the order 1,2,..., N until each job is completed is an optimal policy. For a breakdown model with deterministic processing times and with linear tardiness penalties, Birge et al. [1990] show this order is again optimal. Chang and Yao [1993] provide results for more general objectives with various conditions. Righter and Shantikumar [1992] also provide generalizations in other ways. The variety of proof techniques and other extensions appear in Righter [1994]. We use these results in the following. Before summarizing them, we define some notation. We assume that the order, 1,2,..., N, is consistent with shortest expected processing time (SEPT). Following this policy to the completion of each job has cost, fSEPT. Following the order, N, N - 1,..., 1, until each job is completed is longest expected processing time (LEPT) order with cost, fLEPT. Following the wAL order to the completion of each job is weighted shortest expected processing time (WSEPT) with cost, fWSEPT. We denote the ordinary stochastic order for two positive random variables, X and Y, by X <st Y, meaning that P{X > t} < P{Y > t} for all t E [O.oo). We also use that X is less than Y in hazard rate order, X <hr Y, to mean that 4pYt4 is decreasing in t for all t. When X and Y have 3

(here discrete) densities, gx and gy, respectively such that gx(t)lgy(t) is decreasing over the union of their support, then X is less than Y in likelihood ratio order, X 1,r Y. We note that likelihood ratio order implies hazard ratio order which, in turn, implies stochastic order. It is often useful to have the orders consistent regardless of the amount of processing that has been completed. Here, X has increasing likelihood ratio (ILR) if [X - tlX > t] <Ir [X - slX > s] for all 0 < s < t, which implies that X has an increasing hazard rate (IHR). If likelihood ratios of two ILR random variables, X and Y, are ordered so that gx(s + &)gy(t) < gx(s)gy(t + 6) for all 6 > 0 and any s and t, then X is less than Y in increasing likelihood ratio order, X <iir Y. Similarly, if PX>t+^} is decreasing in 6 for all s and t with IHR X and Y, then X is less than Y in increasing hazard rate order, X <ihr Y. The results also generally make use of restrictions on the objective functions that are consistent with processing time orders. We say that the costs are aggreeable if f(ci,..., ci,...,cj,..., cpv) > f(C1i,..... C,..Ci,...,CN) whenever ci > cj. In particular, this implies, when f is separable (f(ci,..., CN) = EN= fi(ci)), that fi - fj is increasing for i < j. From these relationships, we have the following results in Table 1 where we use A = ST, (I)LR or (I)HR to indicate Xi <A Xi+l for all i under ordering A. A function f is supermodular if f(x + eiei + ejj) > f(x +eei i) + f(x + ejEj) for any ei, ej > O, where ei is the ith unit vector. We use CY for Change and Yao [1993], RS for Righter and Shantikumar [1992] and R for Righter [1994]. Proc. Times Preempt? Conditions on f Result Source LR No Order specific, increasing E[fsEPT] < V(7r) < E[fLEPT] CY HR No Order specific, increasing, E[fsEPT] < V(7r) < E[fLEPT] CY supermodular ST No Order specific, increasing, E[fSEPT] < V(7r) < E[fLEPT] CY separable IHR(DHR) Yes Order specific, increasing, E[fsEPT] < V(7r) < E[fLEPT] CY supermodular ILR(DLR) Yes Order specific, increasing E[fsEPT] < V(7r) < E[fLEPT] RS LR No Job specific, increasing, E[fsEPT] < V(7r) < E[fLEPT] RS(R) aggreeable HR No Job specific, increasing, E[fsEPT] < V(7r) < E[fLEPT] RS(R) supermodular, aggreeable ST No Job specific, increasing, E[fsEPT] < V(7r) < E[fLEPT] RS separable, aggreeable IHR(DHR) Yes Job specific, increasing, E[fsEPT] < V(wT) E[fLEPT] R(comment) supermodular, aggreeable ILR(DLR) Yes Job specific, increasing, E[fsEPT] < V(fr) < E[fLEPT] R(comment) aggreeable 4

Table 1. Order conditions on optimal policies. As noted, these results generally extend to models with breakdowns (see also Hirayama and Kijima [1992]) when the breakdown process does not depend on the job in process. It also applies to arrivals with two classes of jobs (assuming nonidling policies, see Righter [1994] and Chang and Yao [1993]). Our interest here, however is when the conditions in Table 1 are not satisfied. In this way, we derive results that extend the weighted flow time bounds on optimal policies in Birge and Glazebrook [1988]. We begin by considering the order specific objective case. 3. Order Specific Costs. In general, our approach is to relax the processing time order conditions by constructing an artificial processing time Xi+, distribution for i + 1 such that Xi+, <~hr Xi+l and Xi <hr Xi+l4 If this substitution is consistent with a given order, then we can apply the results in Table 1 and construct bounds on optimality. One case in which ordered processing times can be constructed is when each processing time has an upper and lower geometric rate. Such conditions were also used for breakdown rates and bounds in weighted completion time problems in Birge and Glazebrook [1988]. For these rates, we assume for processing time Xi, that fi is the upper geometric rate of Xi with Fi = inf{,L > 0 | X, ~ST X, where X, is geometric with rate pL (i.e., follows a geometric distribution with expectation 1/t) }. The lower geometric rate i. for Xi is defined such that pi = sup{/P > 0 Xi <ST X, where X, is geometric with rate /)}. Assuming that each processing time has an upper and lower geometric rate, we can define two permutations on 1,..., N, 7r and ir, that are optimal for the problems with geometric random variables at upper and lower rates, respectively, replacing the original processing times. Let the value with upper rates be V and the value with lower rates be V. An immediate consequence is the following. Theorem 1. Suppose each processing time Xi has upper and lower geometric rates as defined above and suppose 7r* is an optimal policy for objective, V(7r), with order specific costs and f increasing, then, for the policies following permutations r and o_ as defined above, V(i) < V(7*) < V(Er). (3) 5

Proof: To see this result, note that V(7r) < V(7r) and V(7r) < V(1r) for any 7r. From Table 1, the optimal permutation policies 7r and ir exist for V and V by using orders consistent with the upper and lower rates that then satisy ILR order. Note that Xri <st Xi. With standard coupling results, we can define Xi =distribution Xi and Xi < Xi almost surely. Then, it follows that substituting Xi for Xi, V(7r) < V(7r*) < V(7r*). Similarly, V(7r*) < V(r) < V(7r). The result follows. ~ We can in fact strengthen the result in (3) to measure the maximum deviation possible from an optimal nonpreemptive policy to an optimal policy allowing preemption. Theorem 2. Suppose each processing time Xi has upper and lower geometric rates as defined above and suppose i* is an optimal policy for objective, V(7r), with order specific, separable, and increasing costs, and -r is an optimal policy for V as defined above, then V(*) - V(7r*) < <K (4) V(7r*) -1-e where e = min[l, Ej=1 1 — ] -3 Proof. From Theorem 1, assuming the t order is 1,..., N, we have: N (r) - V(7r*) < V(7r) - V(r*) V(7) - V(r) = Z E[fj(X1+.+Xj) - fj (Xi +~ + X)]. (5) j=l We begin with the first term on the left-hand side in (5). Here we have: 00 E[fl(X,)] - E[fl(Xl)] = fl(x){_(l - pl)x - _1(1 - f_)x-l} x=1 00 X = E[Z{fl(y) - fl(y - 1)}]{_1(1 - _l)x-l - i1(1 -,)X-1} x=l y=l (6) 00 = E{fi(y) - fl(y - 1)}{(1 - )y- - (1 -- )y-l} y=l y= - 1 - where the last inequality follows since (1- _l)yl - (1- f)y-l < (i - _1)(1-,i)v-2 = (1 _1)Y-1 [l —-]1 6

Now, we consider the second term where E[f2(Xl + X2)] - E[f2(X1 + X2)] = E[f2(Xl + X2)] - E[f2(Xl + X2)] (7) + E[f2(X1 + X2)] - E[f2(X1 + X2)]. Conditioning on X1 and X2 respectively, we can proceed as in (6) to show that /2 - ]E[f(x E[f2(X1 + X2)] - E[f2(Xl + X2)] < [ - ]E[2(X X2)] (8) 1_~2 and E[f2(X + X2)] - E[f2(X1 + X2)] < [1 - ]E[f2(X1 + X2)]. (9) 1 -- Hence, from (7-9), tl2 - /L2 E[f2(X1 + X2)] -E[f2(Xl + X2)] < [- 2 ]E[f2(X + X2)] +[I- 1 - ]E[f2(X, X2)] (10) 1 - 1 2 < {[_ - ] + [ [ 1 ]}E[f2(Xl + X2)]. Repeating the argument for (10) up to job j, we obtain E[fj(X +.. +X)] - fj(X + +Xj)] < { [ I ]}E[,fj]. (11) i=1 3 Now, we have from (5) V(7T) - V(7r*) < { L[ - - ]}yV(). (12) i.=l — J We can then let e = min{l,N[ -_]} to obtain V(7r) - V(7r*) < V'(7r) - V(70) < eV(O). (13) Noting that we have V(7t) < (1 - e)-1V(r) < (1 - e)-lV(7r*) from (13), we obtain the result. U 7

4. Bounds with Job Specific Costs. The results of the previous section assume that costs are order specific. We now wish to extend the results to job specific cases. For these results, we must define rates to be consistent with an agreeable objective function order. We assume that the objective function has an agreeable order, 1,..., N. Then, we define Aii = maxj>i ftj for i = 1,.... We then have Xi which is geometric with rate Ai in IHR order. We suppose V is the objective for these processing distributions and that *r is a corresponding optimal policy. We then obtain the following. Theorem 3. Suppose each processing time Xi has upper and lower geometric rates as defined above and suppose 7r* is an optimal policy for objective, V(r), with job specific and agreeable costs, then, for the policies following permutations #r and r as defined above, V(f7) < V( (*)< V(7r). (14) Moreover, V(ii) - V(*) (15) V(7r*) - -' where e = min[1, EN=l[ -+1 = min[l, E 1[ [ + ]i Proof. This follows exactly as in the proof of Theorem 2 where we note that the i values provide the necessary ordering. * Note in (15) that e has two components, the first for the extent of incorrect ordering and the second for the deviation from geometric times. 5. Bounds for Nonagreeable Costs. The conditions of the optimal permutations in Theorems 1-3 are fairly restrictive, but they may also be used to obtain bounds on optimal objective values whenever processing times follow a consistent stochastic order or have upper and lower bounding geometric (or, in the continuous case, exponential) distributions. For the following, we first assume an ordering on the processing times and then define an alternate objective that is agreeable. To' define the new objective, suppose f is separable and increasing. We define f+ by N f+(cl,...,CN) = f(Ci), (16) i=l 8

where f+(ci) = fi(ci) + 6+(ci) and 6+ is such that f+ is agreeable. For example, we may define 6+ by t + -(t) Z64+(t), (17) 1 and 6 (t) = sup{fk(t) - f(t), 0} (18) k>i where f(t) = f(t) - f(t - 1) (or df/dt in the continuous case with an integral replacing the sum in (17)). Note that 6+ is nonnegative so that 6+ is increasing for all i and that 6+ is identically zero whenever costs are agreeable. Similarly, we can define f- as in (16) with f-(ci) = fi(ci) + 6- (ci), 67-(t) = Z - 67(t) and i-(t) = -supk<i{fi(t) - fk(t),0}. In this case, we again have 6- identically zero whenever costs are agreeable. As an example, for the case of expected weighted tardiness objectives, fi (t) = wi(t - di)+, where di is a due date for job i. Suppose that the processing times are ILR, we then have (t) = - sup{f(t) -fk(t), 0} k<i = -SUP{Wilt>di,k<i:t<dk, SUPk<i,t>dk;t>di {Wi - Wk}, 01t<di} * (19) = -Wilt>di,k<i:t<dk + SUPk<i,t>dk;tdi Wi - Wk }ltdk,k<i We, therefore, have that 6i = -wi(t-di)lt>di,k<i:t<dk -SUPk<i,t>dk;t>di (Wi-Wk}(t-SUpkidk)lt>dk,k>i. If we define a new set of weights, wi, i = 1,..., N, recursively by w1 = Wi, w^+1 = min({w, wi+)}, i = 1,..., N - 1 and a new set of due dates, di-, i = 1,..., N, by dj = dl, di+l = max{di, di)i}, i = 1,...,N - 1, then f-(t) = w-(t - d-)+. Here, if, for example, wl = 3,W2 = 2,wl = 1, and d1 = 3, d2 = 2, di = 1, then, for f-, we have wF = 3, w = 2, w = 1 and d- = 3, d = 3, d- = 3. On the other hand, if the weights are, w1 = 1, w2 = 2, wi = 3, then w1 = w2 = w1 = 3. Suppose the objective value of any policy 7r with f+(-) replacing f is V+(-)(7r). This provides bounds on the optimal value of V(7r). Combined with the result in Theorem 3, we can then define V- (7r) as the objective value under policy ir for using distributions with the upper geometric rates, t, and the objective f-. We can also define V+(Ir) as the objective value under policy 7r for using distributions with the lower geometric rates, A/, and the objective f+. We then have a general bound whenever processing times about upper and lower geometric rates. 9

Theorem 4. Suppose each processing time Xi has upper and lower geometric rates, f is job specific, separable and increasing,'r* is an optimal policy for objective, V(7r), and 7r+(-) is an optimal policy for V+(-) with objective f+(-) as defined above, then V-(r+(-)) < V(r*) < v(r+(-)) < V+(7r+). (20) Proof. By construction, 7r- is optimal for the objective V+(-). This policy in fact correspond to SEPT (under the Ai order) according to the result in Table 1 and our assumptions. For any policy 7r, we have V(7*) = E, [f(cl,..., CN)] < Er[f(cl,..., CN)] < Er[f+(cl,..., CN)] (21) N = E[f(cl,..., CN)] + Er 6 (ci)]. i=l For ir = r+/-, we have the middle inequality of (20). We also have N V(7r*) > Er* [f(c1,.., CN)] + E*- [ai- (ci)] i=1 (22) ~ V-(7*) > V-(7T-). Since the differences in (20) depend on all the costs, we do not have as readily available comparisons as in Theorems 2 and 3. We have an additional bound possible using the increasing property of the 6+ functions. If we apply the same process as in the construction of f+ to obtain 6++ from 6+ which is agreeable. Then, for any policy ir under the conditions of Theorem 4, we have Er [E-= ++t(Ci)] < Er [E-1 6i (ci(Tr))] < ELEPT[[E 1 6++(ci)]. In this case, N N V(Tr*) = V+(TT*) - E, [ +(Ci)] > V+ (T) - ELEPT[- 6+(i)], (23) i=1 i=1 where, again the last term vanishes when costs are agreeable. 10

6. Optimality of Nonpreemptive Policies with Breakdowns. Other results may also be possible with specific distributions when costs are not agreeable. It may be advantageous, for example, to identify when nonpreemptive policies are actually optimal within the class of preemptive policies. We have seen this result for ILR and IHR families. The next result shows that it is extendable to arbitrary deterministic processing times with increasing separable convex costs and a breakdown process. In the following, we assume integral processing times, pi, i = 1,..., N, and suppose that the class of preemptive policies is stationary, meaning that decisions may depend on the completed processing of each job but cannot depend explicitly on the time or breakdown process. Theorem 5. Suppose that Xi = pi (deterministic), completion times are affected by a random breakdown process which is independent of the job order and has finite expected number and downtimes in any fixed interval, and f is separable, increasing and convex, then there exists an optimal policy within the class of stationary preemptive policies which is nonpreemptive. Proof. Suppose, without loss of generality, that an optimal preemptive policy * completes jobs in the order, 1, 2,..., N. By our stationarity assumption and deterministic processing times, this order does not depend on the breakdown process. Thus, the completion order remains deterministic. Suppose job i consists of pi separate unit length jobs labelled, ii, i2,..., ipi. We define a new cost function, f', on job ij, 1 < j < Pi, by fj(t) = E{fk(t)- fk(t- 1)}. (24) k>i Let V'(r) be the value for the process defined with objective f' in (24) and some policy Tr. We then have N pi V'(7r) - =,[ E Zf (cI )] i=1 j= N > E,[I fj (max[c,..., cj])] (25) j=1 N > E[E fj(cj)]. j=1 We can show that * is in fact optimal for V'. For the jobs in the process for V', consider the lexicographic order on {ij}. We have fjl(t) - fjk(t) = 0 for any k, 1. Also, whenever j < k, we have 11

fjl(t)-fkm(t) = Ekj1(fi(t) -f(tt-l)), which is increasing by the convexity of f. Therefore, by Theorem 2.2 in Birge et al. [1990], the lexicographic ordering, 11, 12,..., Ipl, 21,..., 2p2,..., N1,..., NpN, is an optimal policy for V'. This policy in fact corresponds to the nonpreemptive policy r to process in the order 1, 2,..., N. From (25), N V(i) = E* [Z fj,(cj)] j=1 N = E*[Z fj(max[ci,..., cj]) j=1 (26) N > E [Z f (max[ci,..., c])] j=1 N > E. [_ f (cj)] = V(i). j=l Thus, the nonpreemptive sequence is optimal. m We can extend the result in Theorem 5 to stochastic processing times if Xi = Z(i) Yj, where the Yj are independent and identically distributed. In this way, we can replace the time t- 1 with the time of the last completion before t and apply the approach in Righter [1994] to show that interchanges cannot improve the objective for V' as in the proof of Theorem 5. 7. Conclusions. We have presented results that provide bounds on the value of optimal policies in stochastic scheduling. The bounds measure the extent to which distributions and cost functions vary from the orderings that produce optimality for simple orders. These bounds may be used in heuristic or optimum seeking algorithms to eliminate suboptimal strategies or provide e-optimal results. References. 1. Birge, J.R., Frenk, J.B.G, Mittenthal, J., and Rinnooy Kan, A.H.G., "Single-Machine Scheduling Subject to Stochastic Breakdown," Naval Research Logistics 37 (1990) 661-677. 2. Birge, J.R., and Glazebrook, K.D., "Assessing the Effects of Machine Breakdowns in Stochastic Scheduling," OR Letters 6 (1988) 267-271. 12

3. Chang, C.-S., and Yao, D.D., "Rearrangement, Majorization, and Stochastic Scheduling," Mathematics of Operations Research 18 (1993) 658-684. 4. Gittens, J.C., "Bandit Processes and Dynamic Allocation Indices," J. Royal Statistical Society, Series B 41 (1979) 148-177. 5. Glazebrook, K.D., "Stochastic Scheduling with Order Constraints," International Journal of Systems Science 7 (1976) 657-666. 6. Glazebrook, K.D., "On Stochastic Scheduling Problems with Due Dates," Int. J. Systems Sci. 14 (1983) 1259-1271. 7. Glazebrook, K.D., "Scheduling Stochastic Jobs on a Single Machine Subject to Breakdowns," Naval Res. Logist. Qualt. 31 (1984) 251-264. 8. Glazebrook, K.D., and Gittins, J.C., "On Single-Machine Scheduling with Precedence Relations and Linear or Discounted Costs," Operations Research 29 (1981)161-173. 9. Hirayama, T., and Kijima, M., "Single machine scheduling problem when the machine capacity varies stochastically," Operations Research 40 (1992) 376-383. 10. Righter, R.D., "Scheduling," Chapter 13 in M. Shaked and J.G. Shantikumar, Stochastic Orders and their Applications, Academic Press, San Diego, 1994. 11. Righter, R.D., and Shantikumar, J.G., "Extremal properties of the FIFO discipline in queueing networks," J. Applied Probability 29 (1992) 967-978. 12. Shaked, M., and Shantikumar, J.G., Stochastic Orders and their Applications, Academic Press, San Diego, 1994. 13