Stochastic Programming Approaches to Stochastic Scheduling John R. Birge Department of Industrial and Operations Engineering The University of Michigan Ann Arbor MI 48109-2117, USA and M.A.H. Dempster Department of Mathematics University of Essex Clochester, England C04 3SQ Technical Report 95-12

Stochastic Programming Approaches to Stochastic Scheduling John R. Birge* Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, Michigan, USA 48109 M. A. H. Dempstert Department of Mathematics University of Essex Colchester, England C04 3SQ Abstract: Practical scheduling problems typically require decisions without full information about the outcomes of those decisions. Yields, resource availability, performance, demand, costs, and revenues may all vary. Incorporating these quantities into stochastic scheduling models often produces difficulties in analysis that may be addressed in a variety of ways. In this paper, we present results based on stochastic programming approaches to the hierarchy of decisions in typical stochastic scheduling situations. Our unifying framework allows us to treat all aspects of a decision in a similar framework. We show how views from different levels enable approximations that can overcome nonconvexities and duality gaps that appear in deterministic formulations. In particular, we show that the stochastic program structure leads to a vanishing Lagrangian duality gap in stochastic integer programs as the number of scenarios increases. Keywords: stochastic programming, hierarchical decision making, stochastic scheduling, queueing networks. large deviations, turnpikes, Lagrangian relaxation. * This author's work was supported in part by the National Science Foundation under Grants ECS 88-15101, ECS 92-16819, and SES 92-11937. t This author's work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant A-5489 and by the UK Engineering and Physical Sciences Research Council under Grants J90855 and K;17897. 1

1. Introduction Scheduling decisions involve consideration of many quantities that are not known with certainty. Important characteristics such as process yield, processing time, resource availability, product quality, demand, input costs and eventual revenues may all be random quantities with varying levels of certainty. The inevitability of these uncertainties has promoted interest in stochastic scheduling (see, for example, Dempster et al [1982] and Righter [in Shaked and Shantikumar,1994]). The great majority of the stochastic scheduling literature concentrates on structural results for optimal solutions that show that, under certain conditions, it may be optimal to follow a fixed permutation schedule or use a form of an index rule (see, for example, Gittins [1981], Glazebrook [1981], Pinedo [1983], Shantikumar and Yao [1992]). While these results are quite useful in many settings, especially in terms of overall system evaluation, they do not, generally, extend directly to near-term operating conditions. Our interest in this paper is to use stochastic programming formulations to explore the relationship between broad system results for planning purposes and short-term results for actual operations. Our approach fits into a hierarchical approach to decision making (see also Dempster [1982], Dempster et al [1983] and Dempster [1994]) in which we view the decision process as breaking down into: (1) long-term, strategic dlecisions about overall capacity and scope (market): (2) medium-term, tactical or planning decisions about aggregate production for known items or services, or fixed orders; and (3) short-term, operational decisions for current situations. Scheduling decisions impact on each level of this hierarchy since they may deternine overall capacity and fixed term production, as well as decisions on what or how to produce in the present. Hierarchical stochastic programming provides a unifying framework for this analysis. All levels of the hierarchy can appear in the same model to allow for various methods of decomposition, approximation and solution. In this paper, we explore different types of approximation associated with each level of the hierarchy and shoNw how that approximation may aid the overall decision process. Our results are consistent with other attempts to unify stochastic and dynamic optimization as in, for example, Bertsimas [1994]. These are most compatible when we take a long-term view of design decisions 2

for overall performance and capacity. In Section 2, we explore this strategic decision making level. Our approximation in this section is to assume that operational scheduling effects are well approximated by heavy traffic fluid, or Markov-modulated fluid, queueing approximations, Dempster [1994]. We use an example from telecommunications (see Dempster, Key and Medova [1995]) to illustrate how this approximation and large deviation theory can reduce a complicated stochastic program to a simple deterministic problem. When higher level capacity and scope decisions are set, aggregate production scheduling becomes a dominant concern. In Section 3, we study this level of the decision hierarchy when production decisions can be approximated in a convex model. We give some justification for this in a planning setting. We then provide results from Birge and Dempster [1992] to give optimality conditions for this problem and provide a turnpike result and conditions for optimal cyclic schedules. This also justifies a match-up scheduling approach to disruptions, which was discussed for deterministic systems in Bean et al [1991]. At an operational level however, it is often difficult to eliminate the effects of nonconvexitis. In this case, other approaches are necessary. In Section 4, we show how to use a Lagrangian relaxation approach to the general stochastic scheduling problem with setups. This method has been used in power system sclheduling (see Takriti, Birge and Long [1994a]) with small duality gaps. Extending results from Bertsekas {1982]. *we show. in fact. that the duality gap in this relaxation approach actually decreases to zero as the ilnumber of samples in a discrete approximation procedure to the stochastic problem increases. 2. The Strategic Level and Queueing Approximation. Following the original ideas of G.B. Dantzig (see, Dantzig [1963]) in the context of military logistics, from tlle 1940's management science has been concerned with models at the macro, meso and micro levels of detail to support corporate decision-making in planning, management and control. Integrated hierarchical lumathemllatical and computer models reflect the classical three-level military planning concepts of strategic loIg-run, tactical medium-run and operational short-run. As decision-making moves down tihe corporate hierarchy, it becomes increasingly detailed and involves many more, but smaller, uncertainties. The modelling involved at succesively lower levels reflects these differences - paralleling the macro, meso and micro scale mathematical models of classical physics - increasing in complexity at each level. In a stationary corporate 3

environment, operational planning models - involving mainly management and control functions - can become extremely complex. In a highly dynamic uncertain environment, useful mathematical and computer models tend to become simpler, as it is the strategic and tactical decisions involving rarer major uncertainties which are critical for survival (see Table 1). Lead Time Cost Complexity Uncertainty Level 1 Strategic T I Macro T Level 2 Tactical I I I Meso I Level 3 Operational I I I Micro l Table 1. Strategic and tactical planning levels handle complexities and uncertainties by aggregation in a stable environment in order to focus on rare major environmental change. Msany problems in manufacturing, logistics and telecommunications are modelled at the operational letvel by open queueing system (see, for example, Buzacott and Shantikumar [1980], Harrison [1985], Chen an(d landelbaum [1991], Sethi and Zhang [1995] and Kelly [1991]), although the study of the behaviour of these systems with active scheduling policies is in its infancy (see, however, Bertsimas and Ninio-Mora 1'3. Bertsialls [1994] and Garbe and Glazebrook [1995] for situations in which simple index rules can be jltustil fl). Iil tflis section we first outline the aggregation of minor operational event uncertainties by changes of tillne' 111(l space scales appropriate to macro level strategic decisions and modelling for open queueing IV>x siIl> wl itll finite buffers. (Here we ignore the intervening meso - tactical - level regarding these queueing app))roximi;Ltiolls: thle reader is referred to Harrison [1985], Chen and Mandelbaum [1991] and Dempster 191l ftor Imiore details.) Using functional central limit theory, the complex operational stochastic model is lt.tfL-.{itot( to a simple deterministic fluid flow model for strategic planning purposes. hFor practical applications of hierarchical planning models, however, this aggregation is often too strong in tliat tle resulting approximation ignores the occasional random extraordinary influences of extreme tactical;11(1 operational level considerations on strategic planning. For modeling at this meso-macro boundary, I;larkov-lnodulated fluid flow models are appropriate. We illustrate strategic planning at this level with I hierarchical telecomunications planning model (treated in detail in Medova [1993], Dempster [1994] and DemIpster. Key and Medova [1995]) which uses large deviations theory to produce, from the original chance 4

constrained stochastic programming formulation, a simple deterministic mathematical programming model for network dimensioning and traffic balancing which can be used as a framework for fast real-time routing and scheduling algorithms. The section concludes by briefly indicating some extensions of this approach to other scheduling contexts. Consider now an open (i.e. exogenous input) queueing network on an arbitrary graph G(N, A) of nodes (with finite buffer capacity) and (directed) arcs, with random routing of particles from each node on each of its output links to each other node. (Classical sources for queueing networks are Kleinrock [1975] and Kelly [1979].) We assume arbitrary independent identically distributed exogenous input inter-arrival and service time processes at each node subject to the existence of a dynamic equilibrium (long-run) limiting process distribution for the system. This stationary stochastic process is characterized by an erogenous arrival rate A~, a potential service rate j and the switching fractions Pjk of particles that are routed directly to node k on link (j, k) after service at node j on each of the J:= INI nodes j in the network. The corresponding row vector/matrix triplet (Al, L', P) specifies the long-run average performance of the system and - assuming that particles arriving at a node with a full buffer are lost to the system - the total inflow vector A' of total arrival rates Aj at nodes j = 1,... J is the maximum solution of the traffic equations A' = A + (A' A p')P, (1) where A denotes coordinatewise minimum. Since the network is open, the routing transition matrix P has spectral radius o(P) < 1. ( A closed network has v(P) 1 and A':= 0.) The traffic intensity at node j is defined to be pj:= Xj/yj and node j is defined to be a bottleneck if pj = 1 and a nonbottleneck if pj < 1. Let Q':= {Q'(t):= (Ql(t)....QJ(t)): t > o} denote the integral equilibrium queue length process representing the network node (i.e. buffer and server) occupancies. Not surprisingly, it may be shown that the macro level fluid approximation of this process is supported on the set of bottleneck nodes. Moreover, it may be shown that Q can be expressed in terms of the potential net throughput process X':= {X'(t):= (X1(t),....,Xj(t)): t > 0}, 5

which represents the difference between the equilibrium cumulative input and potential output processes. and the equilibrium lost output process (due to empty nodes) Y':= {Y'(t):= (Y(t),...,YJ(t)): t > 0}. Together, X' and P completely specify the system as almost surely (a.s.) the unique solution of Q = X' + [Y' - Z'(I - P)] < B', (2) where B' is the constant process representing fixed node capacities and Z' represents the buffer overflow loss process, subject to the appropriate a.s. nonnegativity conditions and the requirement that Y' and Z' are a.s. nondecreasing and increase from 0 (the identically zero process) only when the corresponding coordinates of Q' and B' - Q' respectively are 0. These latter conditions may be neatly summarized by observing that Q' must be pathwise the a.s. unique solution of the linear order complementarity problem (see e.g. Dempster [1994]) (OCP) Q':= X' + [Y' - Z'(I - P)] < B' (3) Y'> 0' Z' > ' Q' > 0' (4) Q' A AY' = O' [B' - Q'] A AZ' = 0 (5) posedt il the Skorohod space of left-limited right-continuous functions of time. Here we note that we may express Y' = AY'(ti) and Z' = AZZ(,i) t.<(() Ti< (') ill terms of the jumps AY'(ti) and ZZ'(Ti) of the respective processes at jump epochs ti, T, up to time t. It follows that Y' and Z' are a.s. the least element processes satisfying the inequalities (3) and (4). tils ((CP) provides a complete description of the equilibrium behaviour of the specified finite buffer open (lJenlteinlg systeim at the operational level of equilibrium stochastic particle flow. Aggregating numbers of particles by the scaling X'/n, accelerating time by the scaling nt and applying a suitable functional law of large numbers for stochastic processes (see e.g. Chen and Mandelbaum [1991]), the 6

expected queue length process Q emerges at the strategic level in the a.s. limit as n -* oc as (equilibrium) deterministic fluid flow. This flow is regulated by a deterministic increasing process for expected lost output Y' and expected buffer overflow Z' and driven by the deterministic expected potential net throughput process, X:= {X'(t):= [(A' - /') V /']t: t > O}, (6) where V denotes coordinatewise maximum. Taken together these deterministic fluid flows satisfy a deterministic version of (OCP). As mentioned above, positive coordinates of Q and Z are associated with bottleneck nodes, as is appropriate to strategic decisions such as decreasing exogenous input rates or increasing service rates at bottleneck nodes or changing routing frequencies in the network so as to decrease their input loads. Notice that at this macro expected level - which is by no means the case at lower levels - buffer sizes at individual nodes are irrelevant. To illustrate a situation at the meso-macro level in which such dimensioning becomes relevant to strategic planning, we turn to a hierarchical telecommunications model with extensions to more general scheduling models. Following Dempster[1994], the general 3-level hierarchical telecommunication network planning model is given by minx1,,2[f(xi) + Level 1: Topology Design g(x. X2) + Level 2: Node and Link Capacity Allocation (7) IEh(xli 2, d)] Level 3: Traffic Management subject to x1 E Xl,x2 E X2, where. in a manufacturing setting, topology design includes plant layout, node and link capacity allocation illvolves tile material handling system design, and traffic management determines routing and sequencing. In this model. x1 and x2 represent a choice from usually an exponentially large number of integer design vectors in X1 and X2, representing node and link placements and capacity allocations respectively, while IE(-) denotes mathematical expectation in terms of the stationary distribution of the traffic demand process with state d in terms of bandwidth requirements on the allocated directed links. The function given by h(x1, X2, d) represents the value of a multicommodity flow problem (see e.g. Hu[1969]) which allocates the traffic demand d by origin-destination pair to routing paths over the physical network topology determined by xz within the 7

capacities determined by x2. Since real telecommunication networks tend to grow incrementally, in practical planning problems the physical topology of the network (determined by xz in the model) is usually taken as given, so that the main strategic decisions in telecommunications network management concern node (i.e. switch) and link (i.e. optic fibre) capacity allocation - particularly so as to allow resilience of the network in the face of random node and link failures. In manufacturing systems, the nodes become machines and storage areas while the links are material handling routes, such as conveyors and automatic-guided vehicle paths. Modern telecommunication networks involve many layers, both physically and with respect to traffic demand time-scales such as private network contracts (months or years), calls (minutes), packet bursts (milliseconds) and bit flow (microseconds). In manufacturing, similar contracts run from product lifetime supply (years) to hourly just-in-time deliveries. To illustrate the influence of meso level rare events on macro level strategic decisions, we shall present a simplified 2-level special case of (7) involving its second and third level. Our model is a special case of a more detailed model of asynchronous transfer mode (ATM) network dimensioning and traffic routing and balancing developed in Medova[1993], Dempster[1994] and Dempster. Key and MIedova[1995]. to which the reader is referred for more details. In particular, the optimal traffic managemnent cost function h of (7) will result from solving a certainty equivalent form of a chanceconsta liined stochastic program. Applications to telecommunications network planning with the alternative recourse stochastic programming formulation have recently been given in Sen et al [1992] and Bai et al [1994]. In general, the chance-constrained approach is natural to grade-of-service (GoS) considerations expressed in termls of acceptable blocking or loss probabilities at various time-scale layers of the network and usually results in a very much more compact computational representation of the problem than the recourse formulation. Simlilarly in manufacturing systems, a model can either incorporate penalties for shipping delays for recourse formulations or use service level constraints of the chance-constrained form. To see this in the telecommunications context, consider once more a fixed network topology represented 1b a directed graph G(N, A) and consider stationary stochastic (packet, cell or bit) traffic flows fw (in packets, cells or megabits per second) between each origin-destination (OD) pair w E W in the network during a specified traffic regime. The network dimensioning and traffic management problem we shall consider is to 8

allocate link capacity Ca, a e A, and route traffic fp on paths p E P,, a set of specified routes through the network from origin to destination node w, so as to optimize the expected net revenue occuring from carrying the stochastic traffic in terms of total OD pair bit flows fw = Z fp wEW (8) pE P., offered to the network. For simplicity, we treat a linear objective, but the arguments given below can easily extended to (level) separable convex cost functions and to the provision of node (switch), as well as link, capacity (see Dempster, Medova and Moise [1995]). Specifically, we consider the chance-constrained 2-level problem min [Z baCa - IE( E r, Z f)] (9) aEA wEW pEP,, subject to P{ y fp > D} < 9call w W (10) pEP,,, P{ Z fp Ca} < call a EA (11) pEQn fp>0 a.s. pEPw wEW. (12) Tihe objective (9) of this optimization problem contains coefficients representing the unit cost of link capacity provlsior, b,. a C A. and unit OD pair net revenue rw, w e W, resulting from carrying traffic between OD pairs. Tllhe first chance. or probabilistic, constraint (10) states that the probability of offered stochastic OD pair traffic flow f,,, exceeding a specified demanded capacity level Du, must not exceed the GoS for:;.l refusal probability gc,,l. The probabilistic constraint (11) states that call GoS 9call must be maintained f,;: the sum of offeredl traffic flows on the set Qa of all paths passing through each (directed) link a E A. The constraint (12) requires the a.s. nonnegativity of traffic flowxs. Problem (9-12) is completely analogous to the manufacturing situation where W corresponds to a set of products. P,, includes alternate production routings to complete product w, and Ca represents capacity of both paths and machines. At this level, we would assume that operational level sequencing effects would not be significant in reducing overall capacity. In the result below we shall use a large deviation bound to approximate the OD pair demanded capacities D,, which meet the probabilistic constraints (10) and (11) exactly. It is well known that a linearly 9

chance constrained problem of the above form has a deterministic equivalent problem which is obtained by replacing the probabilistic constraints involving random variables with equivalent constraints written in terms of deterministic variables which meet the probability restrictions as nearly as possible. These equivalent constraints are usually found by inverting the joint distribution of the random variables. However, if D, is specified (approximately) by a conservative traffic bound derived from large deviation theory, we may replace the stochastic traffic flows fp by deterministic fluid flows fp, p E Pw, w E W, which represent traffic flows just at the rate above which fluctuations of the stochastic flows fp would begin to exceed call GoS requirements. Then we can formulate the (approximate) deterministic equivalent of our problem as the path and capacity allocation (PCA) problem of Medova [1993], viz. (PCA) min [ baCa - Z r, E fp] (13) aEA wEW pEP,,, subject to E fp = Dw w E W (14) pEP,,, fp < Ca a E A (15) pEQ, fp > 0 p Pw w E W. (16) liHi> 111(1( 1 (lielltldsions link capacities and routes traffic at the maximal flow rates (bandwidths) consistelit w\ithl the specified individual call GoS. (In fact, the multilaver multiservice effective bandwidths Dv,, is - I'. (deve(lo()ped in Dempster, Key and Medova [1995] for ATM networks are also consistent with more st ri-Iiit (;,S criteriteria at the burst and cell timescale layers of the network.) The linear programming problem spe (itlld )by tihe PCA model (13)-(16) is a 2-level deterministic network planning and traffic management 11iodel in wlich the strategic level (first stage) dimensions link capacities and the lower level (second stage) allowates and routes (approximate) deterministic equivalent traffic between all specified OD pairs in the ncetwork so aLs to mlaintain GoS for its corresponding stochastic traffic on all paths and links. For fixed link (ILapaities. tlhe second stage problem is a multicommiodity flow problem in compact format instantiated using ()D pair-p)athi and arc-path 0-1 incidence matrices. \e summarize the above discussion in the following proposition. 10

Theorem 1. The chance-constrained stochastic programming problem (9)-(12) with Poisson flows and random call routing has an approximate deterministic equivalent linear programming problem given by (13)-(16) whose accuracy depends on the tightness of the large deviation bound employed to derive it. Remark. We establish the theorem using the least refined of the large deviation bounds, the Chernoff,bound, but both reducing constants and, under appropriate stochastic assumptions, tighter bounds, such as the Bahadur-Rao bound, can be obtained (see Dempster, Key and Medova [1995] for references). * Proof. First consider the total offered traffic f, between an arbitrary specified OD pair uw e W. Recall that the Chernoff bound is given by P{fw > D} < exp {-max,[sD - pf,,,(s)]}, (17) where the logarithmic generating function pf,, of fw is defined as /f,,, (S):= In Eesf'". (18) Rearranging (17) yields -ln P{fw > D} < maxs[sD - f,, (s) and consequently to approximately invert the constraint (10), taking account of (8), we must solve maxs[sD - f,,, (s)] = -In gca,, (19) to obtain Dw - =f,(s ) (20) and the corresponding deterministic constraint fw, < D,. (21) If we assume that the stationary traffic flow f1, has a Poisson distribution with parameter pw, that fl = Z fp pEP,,, 11

and that each arriving particle (call) of the Poisson flow f, is independently randomly routed on a path p E P, with probability fp/fw, then it follows that a, fp < D, (22) pE6P, is the deterministic equivalent of (10), since Pfp:= fpfw/fw > fpDw/f:= Dp} < gcall (23) Moreover, since w E W was arbitrary, we have that P{ E fp > Ca} < call a E A pEQ,, provided that fp < Ca a EA, (24) PEP,, since it may be shown that for sufficiently small 9call (typically 10-3 or 10-4), the effective bandwidths Dp determined by the Chernoff bound are additive, i.e. Z Dp = Dw, (25) pE P,,, (see Demnpster Key and Medova [1995]). Hence (24) is the deterministic equivalent of (11) as required. U 'Ilis theorem can be extended in a number of ways. For telecommunications network planning perhaps tile Ilost iIll)ortanlt generalization is to multiservice traffic mixtures of independent Poisson streams of Nlairkov mlo(ulated( fluid calls with negative exponential hdlding times (see Dempster, Key and Medova |1995]). Inl this context the (PCA) model provides a framework for real-time routing and rerouting of actual (cals Iased on stochastic knapsack and bin packing models. Since Theorem 1 applies to the basic queueing network model treated earlier, it (and its generalizations) canl be used to study hierarchical planning problems in, for example, flexible manufacturing. Strategic and tactical decisions might involve shop layout. factory and machine centre product capability, machine centre sizing. setulps, etc: while operational decisions would involve work-in-progress scheduling, processing and routing betweeln machine centres. We consider tactical level decisions on overall production in the next section. 12

3. The Tactical Level and Convex Approximation. The example of the previous section fits the strategic framework of a manufacturer considering capacity planning. The analysis considered only aggregate production service requirements without consideration of the timeliness of order fulfillment. At a lower decision level, the time element becomes more important. iWe first consider appropriate methods for aggregate production (for example, daily or monthly) decisions. Our approximation is based on a convex model in contrast to many scheduling models, in which nonconvexities immediately appear through disjunctive constraints. This model gives a good approximation at a tactical level where the production system appears large enough that a continuous approximation to setup conditions can apply. In this section, we follow the model in Birge and Dempster ([1992,1995]). Our model is similar to the continuous time model in Solel [1987] (see also Dempster and Solel [1987]) but we consider a discrete time model that allows us a wider class of problems. Related results for general stochastic optimization models appear in Dempster [1988], Flam [1983,1985], Kushner [1972], Arkin and Evstigneev [1979], Rockafellar and Wets [1983] and Hiriart-Urruty [1982]. One of our main results in this section concerns turnpike optimality which in the deterministic case may be found in McKenzie [1976] and in stochastic models in. for example, Sethi et al. [1992]. 'To set the stage for our model, we assume a data process, w:=: t = 0,...} in a (canonical) probability space (fQ.E,). We also assume a decision process x:= {xt: t = 0...} such that x is a measurable function x: w x(uz). The space of the decision processes is the space of essentially bounded filn.ctions. Ln:= L( (Q x N, E x P(N), p x #: Rn), where P is the power set and # is counting measure. Associated with the data process is a filtration IF:= {St}I1, where Et:= a(cwt) is the a-field of the history process ^t:= {Wo,...,Lt} and the Et satisfy {0, } C Eo C C E. A fundamental property of the decision process at time t is that it must only depend on the data up to tinle t. i.e. xt must be Et-measurable. An alternative characterization of this nonanticipative property is that xt = E{XtlEt} almost surely (a.s.), t = 0,..., where IE{IEt} is conditional expectation with respect to the c-field Et. Using the projection operator nt: z '7r tZ: I=E{zEt}, t = 0,..., on Lt, this is equivalent 13

to (I- nt)Xt = 0, t = 0,.... (26) We let KX denote the closed linear subspace of nonanticipative processes in L' and denote by I:= (ILo, r111...) the projection operator from Lo onto J/. Our general optimization model is to find 00 infxeAlE:( ft(, xt(a), xt+i()), (27) t=O where "IE" denotes expectation with respect to E. We use the notation xt and ft to denote respectively xt and ft as functions of w, i.e. as random entities. Expression (27) then becomes 00 infxeNIE ft (xt,xt+ ), (28) t=O with objective F(x):= 1E t= ft(xt,xt+). BWe assume in (28) that the objective components ft are proper convex normal integrands (see Rockafellar [1976]) with the following additional property: Assumption 1: For any y = (xt,xt+l), there exists 7y > 0 (independent of t) such that for rT e aft(y) c (Lx)'. the Banach dual space of L'. and for all w, either (a) r c aft(w), or (b) there exists z such that 7r e Aft(z) and ft(z) + r(w - z) < ft(w) - '||z - yll a.s. (29) for all t > 0. Note that uniform convexity implies Assumption l.which allows nonstrict convexity involving a von Neumann facet. We use this more general assumption here because it allows us to use the common scheduling objectives which involve linear tardiness and earliness penalties. 14

We present the main results from Birge and Dempster [1992] on optimality conditions, turnpike results concerning the optimality of cyclic policies and the asymptotic optimality of match-up strategies. The proofs of these results may be found there. In general, the objective in (28) is infinite. We can avoid this difficulty by defining a policy x* {xo, x,...} as (weakly) optimal, as in McKenzie [1976], if it is not overtaken by any other policy, i.e. if there does not exist x' such that -r limsupIE [ft(xt, t+l) - ft(xt,x t+)] < -e, (30) r ~ — 00 t=o where > 0. We also assume that the objective functions satisfy a condition ensuring that no infinite terms are present in the sum in (30). Assumption 2: For any t and 6 < oc, there exists e < oo such that li||xt || < 6 a.s. implies IE ft(xt, xt+l) > -e and lxt+i1I < e a.s. for xt+l feasible. U Given Assumption 2, we can subtract a constant from each ft without changing the weak optimality of x*. By setting this constant equal to the expected objective value in each period, we obtain an infimum of o in (28). Thus without loss of generality we assume a finite infimum in (28) in the sequel. Thel first result from Birge and Dempster [1992] is that there exist prices supporting th e objective terlms in (28). These price supports provide the optimality conditions in the following theorem that allow de&collmposition of the conditions by time period. Theorem 2. Suppose Assumption 2 holds and that x* is optimal in (28) with finite infimum, a.n,! (a) (inonran ticipative feasibility) For any x a dom F (i.e., such that IE Et=o ft (xt, Xt+ 1) < ~o), the projection of x onto Ir, fx, is such that IE EoO ft (txt l 1 Xt i ) < o, (b) (strict feasibility) For some x E A. such that IE ot=o ft(xt, Xt+) < oc, there exists 6 > 0 such that for all ly - x[l < 6 y e L, IE E ft (ytyt+ ) < X. (c) (finite horizon continuation approximation) There exists x' such that for all Tk in some sequence {T1.T2...}. and, for any x, dom F, (xT,X.+1,x+. 2,...) is also feasible, and the transition cost 15

to x' is such that IIE[fTk-1(xTk.1,xT ) + fTk(xTk,x'~,)]I -p 0 as k -+ oc and IlE[fTk-1(xT-1,xT,) + fTk (XTk, XTk +1)H ~ IIE [fTk -l(XTkl1XTk) ~ fTk (XTk I XTk +l0fl fork = 1. Then, x' is optimal with given initial conditions xo if and only if there exist Pt e- L'(E), t = 0. such that (i) Pt is nonanticipative, i.e. Pt = IEfptIjt} a.s. for t = 0,..., (ii) JEo(fo(xo,xi) - poxo + pix1) is a~s. minimized by x1:= xt over x1 = xl- I}Ell, and, for t > 0, IEt(ft(xt.xt+i) - ptxt + Pt+lxt+i) is a~s. minimized by (Xt,Xt+i) (x;,x;+i) over Xt = IE{Xt I Et} and xt+l = 1Efxt+i I Et+i},and (inz) IE ptk(xt, - x*,) - 0 as tk -* oc, for all x E dom F. U Tlie optinmality conditions (c) characterize optimal solutions which approach a common facet - the uorn.\(cunanr facet - from any given starting condition xO. The main implication of this result is that it is iasviiptotically optimal to match up with a decision process that is optimal for a specific initial condition even if tli;tt iitial condition changes. This result may be elaborated by showing that if the data process is cyclic tHl(Ii it iLs LvIII)toticallyv optimal to return to an optimal cyclic policy even if other conditions temporarily (1)h1 tIn. 'Ihiese results justify the match-up scheduling policy in Bean et al [1991] and extend the deterministic rDlsiltlf IIn Bevani and Birge [1986J. The following results for the general stochastic optimization model (28) drcsuo,1() froin Birge and Dempster [1992] and apply in a variety of contexts. Proposition 1. Given Assumptions 1 and 2 and Conditions (a) - (c) in Theorem 2, let X* be the set of.%(IIIrtl(Jn.% (x;.x7~i) that are minimal in (ii) of Theorem 1 for p~ a set of optimal supporting pm'ces given the miTlil coUflitio71n Xo and let x' be an optimal decision process given the initial condition 4b. Then, for any H mid > 0. there exists T < oc. such that, for all t > T. PLw: inf(X;..x,)c-x. II(x'.x1~l) - (x*,x*~1)ll > E} K 6. (31) U 16

Theorem 3. Under the conditions of Proposition 1, we may conclude that as t -* oo, in(xf,x +)ex I(xt,x;+l) - (X?,xt+1) -- 0 a.s. (32) For Theorem 3 to be fully applicable, we would like to have a method for determining an optimal policy for some initial state so that the policy of matching up to that strategy can 'be implemented. This determination is simpler if we can show that cyclic policies are optimal. In this case, only a single cycle needs to be analyzed to determine the turnpike optimal policy. In this development, we follow a similar approach to Arkin and Evstigneev [1979]. We first assume that the data process has a left tail, i.e. that w0 can be interpreted as... w'lWo. An alternative is to assume some type of Markovian property of the data process (see Arkin and Evstigneev). The data process is assumed to be cyclic with cycle k if the measure 1i is invariant with respect to the k-period forward shift operator Tk where Tkw = w' such that wt:= Wt+k, i.e., wt = Tkwt = Wt+k a.s. It follows that we may define Tk t:= Et for t = 0..., k - 1. We also assume that the objective is invariant with respect to Tk so that ft+k(Tkxt, Tkxt+1) = ft(xt,xt+i) a.s., where Tkxt(w):= xt(Tkw). In this context, x is a cyclic policy if xt+k = Tkxt a.s. and (28) becomes k-2 infxEIE(Z ft(xtxt+l) + fk-(xk-1, Tkxo)). (33) t=o Corollary 1. Given conditions (a) - (c) of Theorem 2 and a cyclic data process with cycle k, then there exists a weakly optimal policy with cycle k for any initial condition xo. As an example of extending these general results to a scheduling framework, consider a model with several commodities i = 1,..., n processed according to random processing times p(i), random release dates r(i) and random due dates d(i), with a penalty weight of w, for every period after the due date in which processing is not completed. We wish to model a situation in which orders for each item arrive randomly (according to r(i)), in varying amounts (according to p(i)), and with random due dates (d(i)). The random entities. r(i). p(i). d(i), correspond to sequences of times, {r(w, i, 1),r(, i, 2),...,, {p(w,i, 1),p(w, i, 2),...,}, 17

{d(w, i, 1), d(w, i, 2),..., }, for each order number 1,2,.... The data process is defined so that when the jth order for i arrives at t = r(w, i, j), then the processing time p(w, i, j) and due date d(w, i, j) are also known. Thus Et distinguishes r(w, i, I), p(w, i, ), d(w, i, l) for 1 < I < j, but not for I > j. Decisions are the amount of processing performed on each item i in each period t. Since the state of each item is reduced by the processing requirement at each due date, the total processing in period t is xt+(i2) - xt(i) if t is not a due date (t $ d(i,j) for any j = 1,2,...) or xt+-(i) + p(i,j) - xt(i) if t is a due date (t = d(i,j)). The decisions are constrained so that no processing can occur if an item is not released (t < r(i, j)) and processing in each period on each item is at most one. Other restrictions on feasible processes appear in an indicator function (w, xt,xt+l) which considers all resource availabilities. The only costs in this model are due to tardiness. A penalty wi is charged in each period for every unit of item i backordered (xt(i) < 0). The total tardiness cost at time t given w is then E,=1 Wi(-Xt(wJ, i)). The objective is to minimize the expected total tardiness. The single period objective contribution is thus: n ft(xt, Xt+l):= fti(xt(i), Xt+l(i)) + 6(w x, xt+i), (34) i=l wVi here -wixt(i) if Ej -1-l{t=d(i,j)}p(i,j) < Xt+(i) - Xt(i) — < 1') — 1 l{t>r(i,j)}p(i,j) - l{t=d(i,j)}p(i, j) xt(i) < 0 a.s., f/ (x, (i) Xt+l (i)):= if =1 -l{=d(i, i) < t+l(i) - t(i) < j E= l{t>r(itj)}P(ij) - l{t=d(ij)}p(ij), xt(i) > 0 a.s., xo otherwise. Thiis ionclyclic model is a generalization of the model with cyclic data process (with cycle k) in Birge and Deinlpsrer [1992]. In that model, it is assumed that r(i,j + 1) - r(i,j), d(i,j) - r(i,j), and p(i,j) are all i(tleiticallv distributed and that r(i,j) < d(i,j) < k a.s. for all j. Tl1(e data process is assumed to determine the availability of the resources (such as machines, labor and tools) for processing all commodities. We allow the 6 indicator term to represent feasibility generally by assunliiiig a value of "'0" if xt+l is feasibly reached from xt and "oo" otherwise. For example, suppose that each process i requires a resource m(i) where m(i) e {1..., M}, the set of resources, and each resource canl process at most one unit during a time interval if available and cannot process anything if unavailable. 18

In this case, wt can be interpreted to have several components such that the first M components form an M-vector of ones and zeroes corresponding to availability and unavailability of resources. We then have 0 if IE=1 l{j=m(i)}(Xt+l(i) - Xt(i) + p(i)l1t=d(i)}) 65(, Xt, Xt+l) = < <o t(j) for j = 1,...,M, (35) oc otherwise. Other constraints can also be represented in this way. Our only requirement is that 6(w, ) is convex. As an example of an alternative convex function, this constraint on feasibility can be broadened further to a situation with setups by including an additional set of variables, st(i,j), which indicate the extent to which resource group j is set up to process commodity i. If resource j is a large number of machines, then certain fractions of the group could be set up for i at different times. By indicating the total single period production capacity of resource j by Kj (which might also vary in time and be an additional decision variable), we would include a constraint that E=1 st(i, j) < Kj. We can also include a convex constraint on the maximum setup change in a single period as Aj, so that Z>l Ist+1(i, j) - St(i, j) < A (with a setup change cost also entering the objective if relevant). In this case, the indicator of feasibility becomes: 0 if (xt+l(i) - xt(i) + p(i)l{t=d(i)}) < t(j)st(i,j) for i =1,...,n;j = 1,...,M, 6(,xxt Xt+l):= E=__ st+l(i, j)< Kj, (36) E= Ist+(ij)- St(ij)| < A,,j = 1 M, cx otherwise. Note that this function preserves the convexity of the objective function. This model is a basic multipleprocessor. minimum expected weighted tardiness problem. With the convexity assumption, it meets the criteria for optimality and asymptotic stability given in Theorems 2 and 3. In some cases, the stationary distribution for this model follows a deterministic path between disruptions and an optimal match point is achieved as quickly as possible (cf. Bean et al [1991]). To see this, let x* be an optimal turnpike schedule in X. the set of optimal turnpike schedules. Assume that some state x0 < x0 a.s. is the initial state instead of x'. Let x' be an optimal trajectory given x4. In Birge and Dempster [1992], it is shown that a trajectory x that starts at x0 and matches up with x* at the earliest feasible t can be constructed with the same objective value as x' for the cyclic problem. Essentially the same proof mutatis mutandis yields this result for the current noncyclic problem. Theorem 4. Suppose x* is optimal from x; above in the tardiness model, find infxe.x,,=x,. a.s. EE o ft(xt.xt+1) given x > x4 a.s., with, ft defined in (34) and 6t in (35), and that there exists a 19

feasible solution x such that x = x; a.s., and xt = x* a.s. for some 7 < oo. Then there exists an optimal solution x' given x; such that x; = xt, t > r a.s.. U To illustrate the use of this result, we consider a one-machine, multiple commodity version of the model with single period objective (34). In this case, M:= 1 and m(i):= 1 for i = 1,..., n. For simplicity, we also assume that wi:= 1 for i = 1,..., n. With these assumptions, the familiar earliest due date scheduling policy is optimal. To define this policy, suppose that r(w,i,j(w, t, i)) < t < r(w,j(, t,i) + 1) and that d(w,z i1,J(w,t,ii)) < d(w, i 2, (t,i2)) K< * < d(w,in,j(Wt,in)) and define Xt+l(w,ii) recursively from I = to I = n by: Xt+l(W, ii) = xt(w, il) - l{t=d(w,il,J(w,t,i )))P(w, ilj(w, t, i)) 1-1 + l{,,(1)=l}(min{l - (xt+l(w, is) - t(w, is) (37) s=1 + l{t=d(,,.(J(W,t,.))}P(W, i5J(w, t, is)),P(w, i,j(w, t, il))M). Equation (37) then forces production to occur up to the machine availability in due date order on any items that have not yet reached the order quantity p(w, i1,j(w, t, i1)). This definition implies that order j for item i is always completed before order j + 1 is released. i.e. that xr(ij)(i) > 0 a.s. This assumption can be relaxed bv allowing due date order to apply also to previous orders j < j(, t, i) that are not yet complete. Theorem 5. An optimal solution to problem. (28) with objective defined by (15), A:= 1, xo:= 0, and 7m(i):= 1.Wi:= 1 for i = 1,..., n is to process items according to earliest due date of released items first, i.e. according to (37). The result of Theorem 5 for due date order is not valid (even with equal weights) in cases where penalties are charged only when jobs are finished. In such cases, the optimal order follows due dates if all jobs can complete on time, but the optimal order switches to shortest processing time if all jobs are late. The result of Theorem 5 does apply, however, if processing times and due dates follow the same order (see Birge et al [1990]. Lemma 2.3). It also applies if the weights are ordered in decreasing order from earliest due date to last due date. Due date order is optimal here regardless of processing time because, according to (37), charges are 20

incurred only on the incomplete portion of each job. This assumption is practical if an order is large and small batches within the large order can be shipped to the customer as they are finished. This ability to break up jobs is the critical factor in our convexity assumptions. Processing available jobs in due date order according to Theorem 5 provides a long run optimal solution provided the initial system is empty or that we can assume some point in time at which we have nonnegative processing on all released jobs. We would like to show that this policy satisfies the conditions for match-up optimality in Proposition 1 and Theorem 3. Assumptions 1 and 2 are valid since the costs are just piecewise linear with fixed increment in each period and the set of possible states is at most a unit L1 -distance from the current state. For condition (a) in Theorem 2 (nonanticipative feasibility), note that the feasibility conditions only depend on information available when an order is accepted. The second condition (b), strict feasibility, is satisfied if we assume that the system has sufficient slack such that the objective can be obtained without completely using all available capacity in some period. We assume this is possible (although an optimal solution may use all capacity). The third condition (c), finite horizon continuation, is that we can at some point reach a trajectory starting frotm. for example, the empty inventory state. Withi these assumptions and following Theorem 4, the optimal policy for any initial inventory state is to iiiatch up with the state from the empty inventory position as quickly as possible. The t'ternative initial states in Theorem 4 would correspond to entering period 0 with some overdue orders causlitg initial negative inventories for these items. The optimal match up response then corresponds to processing any ittems witih neative inventories before proceeding to items with zero or positive inventories. T'h- result is that one reaches by time t the same state as in the zero initial inventory state whenever the cumulative excess capacity up to time t is greater than the total negative inventory at time 0. 4. The Operational Level and Lagrangian Relaxation. At the operational level, distinct setup times and the consideration of discrete variables is often unavoidable. In these situations, however, the stochastic nature of the problems can frequently provide advantageous 21

solution characteristics that are not possible in deterministic problems. For example, consider a single machine scheduling problem with the objective to minimize weighted completion time when the machine is unavailable at some point during processing. If that downtime is known with certainty and occurs sometime strictly between 0 and the time to process all jobs, then finding the optimal order of jobs becomes an NP-complete problem. However, if downtimes occur randomly with negative exponential interarrivals, then the optimal order is simply the same weighted shortest processing time order as in the reliable machine case (Birge et al. [1990]). Other beneficial effects of random problem parameters include obtaining continuity and other useful objective function characteristics. For example, Van der Vlerk [1995] surveys quite general conditions for continuitv in input variables of the expectation over random parameters of the optimal value of optimization problemlls with integer variables. The applications of these results include problems with fixed order costs. Tlhese types of results are reminiscent of the classical Lyapunov theorem (see, e.g. Young[1969], Theorem 79. 1) which implies that the expectation of a nonlinear multifunction of a random (vector) argument with an labsolultely continuous (joint) distribution is convex. Another characteristic of stochastic scheduling problems is that similairities of problem structure across different random outcomes can be exploited in solution pre)((lI.es. Ilsis observation has led to a variety of Lagrangian-based solution methods (see. for example, Dteptrel 19S~ and Rockafellar and Wets [1991]). A particular advantage for scheduling problems is that tif iltg-va\. rial)le )prol)lems can be solved separately with warm starts provided by solutions to problems wit siiilair (lata. \e show below, in fact. that the duality gap in discrete time stochastic programs of this t Vle;actuallll decreases to zero as the number of scenarios increases. ThI lh isic approach here follows Bertsekas's [1982] original results in bounding duality gaps. Our (I evelopInliiIt here is a generalization of the results in Takriti, Birge and Long [1994b], which apply the Lagraltlii;a relaxation approach to scheduling electric power units. In that paper, it is shown that the (lutlit,;IIa is indeed limited by the number of decision points and the total capacity and that the average llap) (ldecliines to zero as the number of random scenarios increases. Computational results (Takriti, Birge and Long '1994a1) also show that the solutions are close to optimal with only a few scenarios and that in this case stochastic scheduling problems provide significant savings over deterministic methods. 22

As a general model, we postulate the model of (27) with a finite number of scenarios Q = {w1,..., wN} with probabilities, pl,...,pN, respectively, and a finite time horizon t. For decisions, x, i = 1,..., N, we may have integer variables (in addition to continuous decisions) where we use a superscript i to indicate dependence on scenario wa. Our problem becomes: T N inf Zp ift (Xi, ~xi+i) (38) xEAV,xEX' = 0 = t t where Xi includes any feasibility conditions such as integrality on certain components of Xi, ft is a convex function, and the only constraints linking scenarios are represented through KJ. As noted earlier, KJ can be written as a set of linear constraints, such as x( E p'k) E pk ==O Vit, ke~t (i) kE St(i) where Et(i) is the set of scenarios sharing the same history process as i at time t. If there are T periods and xi c RinT, we would typically have N(T - l)n of these constraints (one for each scenario in every period from 1 to T - 1). If we collectively write these constraints as N Z H'x = b, b E U q -=1 then (38) becomes min P N subject to P = E piF(xt) lV (39) xE X,N, E HtXi = b. tivity constraints. This program has the form: sup D subject to D < D(A) A where min [pi Fi() + ATHix] - ATb D(A) = subject to x1 E Xi, = 1,...,N, (40) tI ~ A >0, 23

which then decomposes into completely independent subproblems for each i. The following result from Bertsekas [1982] provides the basis for the duality gap result in the Lagrangian approach. Theorem 6. If problem (39) has a solution, for every i, the set {(xi, Fi(x))lxi e Xi} is compact, and, for every x C convXi, there exists x E Xi such that Hix < Hi, then infP - supD < (q + l)p, where p = maxi=....N sup(piFi(xi)xzi E Xi) - inf(piFi(xi)xi E Xi). Ptvof This proof follows immediately froim Berteskas [Proposition 5.6, 1982]. The only observation is that p'F' corresponds to the objective function fi in Bertsekas' work, and that we use a linear function Hi which preserves compactness in X1. U The significance of this result is that the gap decreases to zero as the probabilities decrease (as pi, for example set equal to 1,- O) without a corresponding increase in q. This tendency occurs in solution proce(llres that either sample the set of possible outcomes (for statistical results, see, for example, King and \\t~s 1991l) or include bounding approximations with increasing accuracy (see, for example. Birge and \Wets G19S _i). In either case. the number of scenarios increases and the probability of any individual scenario dec(reases withl decreasing confidence intervals or bounds on the expected value of a full solution of (39). ()Or g(oal is thus to show that the number'q of constraints necessary to maintain nonanticipativity need not illnCrease( at the same rate when we consider integer variable values separately. The burden of proof is in fact less thian this since we really need consider only constraints involving integer variables. We show this il tlie following theorem. For this result, assume that xi has a continuous part and a binary part, so that X.;:= (t/. st) and XA:= Y1 x SZ, where Y' is a convex set and SZ is the interesection of a convex set with XJl {0). 1 }. where J = mT if we have T periods with m integer variables in each period. We also suppose tliat the N total scenarios are combined by the nonanticipativity constraints into a tree with K non-leaf nodes or branching points. We then have the following result. 24

Theorem 7. Suppose the conditions of Theorem 6, that Xi = yi x Si as defined above, and that the equivalent decision tree for problem (39) includes K non-leaf nodes, then inf P - sup D < (Km + 1)p, (41) where p:= maxi=,....N sup(piFi(xi)Ixi E Xi) - inf(piFi(xi)[xi E Xi). Proof. The proof of this result follows the lines of the proof in Bertsekas [1982] except that we recognize the difference between the continuous and integer parts of the variables in order to reduce the first factor of the bound from the general case of q = N(T - 1), where T is the number of periods, to Km. First observe that we can replace all nonanticipativity constraints in P for variables s by the following: st( E pk)- pkS =0,vt, (42) kEC, (i) kEE E(i) where we have only chosen a single representative, i* E Et(i), for each set of scenarios sharing the same history process as i at time t. We therefore need only a single constraint (42) for every node of the tree. To see this note that the s\ components are all binary. If s (j) = 0, then (42) is only satisfied when sk(j) = 0 for all A- t Z(i). If st (j) = 1 then, of course, the same condition holds. Thus, constraints (42) can substitute for all st nonanticipativity constraints. If there are K non-leaf nodes and st E Rm, then we have Km constraints of the form (42). If we let the constraints of form (42) be denoted collectively by Z GIs = 9, and let the nonanticipativity constraints on the y' variables be of the form L'y' = 1, 25

then the primal problem becomes N inf p'(Ci(y) + D'(si)) i=l subject to y E Y i = 1,...,N, siESi i=1,...,N, (3) N N - Liy = 1, i=l where we have written Fi(xi) as Ci(yi) + Di(si) and note that g E 3R2Km and I E 3, for example. Now, following Bertsekas, consider Wi:= {wi: w = [Liyi,Ci(yi)],yi e Yi} and Z:= {zi: zi = [Gisi, Di(sz)], si E S1}, with W:-=,N Wi and Z:= N=i.i. Then we have: infP = min{u + v13((w,u), (z,v)) E W x Z such that w= g,z = 1}. (44) From duality theory (see, for example, Magnanti et al. [1976]), we have that supD = min{u + v3((w, u),(z,v)) E conv(W x Z) such that w =g, z = }, (45) where con7' denotes the convex hull. Next observe that conv(W x Z) = W x conv(Z), since Yi is convex. Now, following Bertsekas, we can use tile Shapley-Folkman theorem to write every z E conv(Z) using a subset I(z) C {1,..., N} with at most I',- 1 indices such that z E ~EI(-)Z~ (46) Now, suppose ((w, ii), (z, )) c W x conv(Z) with i + v = sup D and fl = g, z = i. Then we have yi C Yi such that i=1 L = g and CE=1 C0i(p) = u. From Shapley-Folkman, we also have some I C {1...., N}, I\ < Km - 1, with (l1, vi) e conv(Z') and s1 e S,. i I7 such that E Gi(si) + E l = =1, (47) 2Ei 26

and N C( i)~+ E Dsi + E i= sup D (48) i=l fI iEI Now, using the approach in Bertsekas and our assumptions, we can obtain for every i C I, some Si such that Gisi = li and f,(s ) < vi + pi + e for any e > 0. Thus, we have found a feasible solution (y, s) for (P) such that N N inf P < Ci(i) + D < sup D + p, (49) i=l i=l iEI which, since I < 2Km + 1, yields the result. This result is only valuable in the present context if we can restrict the growth of K to o(N) so that the gap in Theorem 6 goes to zero. In fact, this is the general case. Suppose, for example, that we employ a sampling scheme where every scenario path has weight pi:=. If each path has v branches at lach node in each time period (see Figure 1 for an example), we then have N = ^T scenarios and K = (vT - 1)/(v - 1) branching nodes. The duality gap vanishes in the limit since since K/N O0 as v -> oo. We state the result as follows. O = Nonleaf node (K=13) v =3, T=3, N=27 Figure 1. Scenario tree where with T = 3 periods (excluding t = 0), v = 3 branches at each non-leaf nodes, AN = 27 scenarios, and K = 13 non-leaf nodes. 27

Corollary 2. If the conditions of Theorem 7 hold, N = vT, pi:= N, K = (vT - 1)/(V - 1), and supi(Fi(xi)lxi E Xi) - inf(Fi(xi)lxi E Xi) < M < oo uniformly for all v, then lim (inf P - sup D) -- 0. (50) V —OC The vanishing duality gap result holds for other general sampling schemes. The result also indicates that the Lagrangian duality gap for a problem with any distribution obtained in the limit of the sampling procedure must also have zero duality gap. It implies in general that randomness - absolutely continuous radolm variables in this case - can indeed simplify some of the combinatorial difficulties in problems. Of course, one must still solve the separate deterministic subproblems optimally, but no additional gap is present il this case. This result also indicates that parallel processing may offer distinct advantages for stochastic scheduling. ReIpeate(l deterministic solutions of deterministic Lagrangian subproblems on separate processors will yield iilc('re siinly accurate solutions as the number of scenarios increases. Conclusion. I'lii. Iapelr lhas explored three methods for incorporating optimization procedures into stochastic scheduliIl" pIroeltllls as ) part of,a decision hierarchy. At the capacity planning level, we showed how queueing appIx.allltiosll ield a convenient model when lower level timing effects are not - or are only occasionally - switni anit. At the aggregate production level, wae demonstrated that a convex approximation can yield useful c.l;a( t('rizations of optimal policies. At the lowest level of detailed scheduling, we showed that Lagrangian relax.a;t ion lmethods obtain smaller duality gaps as stochastic program formulation sizes grow. 'Iliese results can all be viewed as separate solutions of parts of the 3-level model in (7) with each higher lev.el illcorp)orating some approximation of the lower level objectives. Our results give some indication of structural properties and uses of optimization procedures. We anticipate that this general framework will yield Illany other characterizations and procedures for alternate problem structures. 28

References. V.I. Arkin and I.V. Evstigneev (1979). Stochastic Models of Control and Economic Dynamics. Nauka, Moscow. (In Russian). English translation by E.A. Medova and M.A.H. Dempster. Academic, Orlando (1986). D. Bai, T. Carpenter and J.M. Mulvey (1994). Stochastic programming to promote network survivability. Technical Report SOR-94-14, Statistics and Operations Research, Princeton University, August 1994. J.C. Bean and J.R. Birge (1986). Match-up real-time scheduling. In Real-Time Optimization in Automated Manufacturing Facilities, R.H.F. Jackson and A.W.T. Jones, eds., NBS Special Publication 724, National Bureau of Standards, 197-212. J.C. Bean, J.R. Birge, J. Mittenthal and C.E. Noon (1991). Match-up scheduling with multiple resources, release dates and disruptions Operations Research 39, 470-483. D.P. Bertsekas (1982). Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York. D. Bertsinias (1994). A mathematical programming approach to stochastic and dynamic optimization problenms. Technical Report, Sloan School of Management and Operations Research Center. Massachusetts Institute of Technology, March 1994. D. Bertsimas and J. Ninfo-Mora (1993). Conservation laws, extended polymatroids and multi-armed bandit problems: A unified approach to indexable systems. Technical Report, Sloan School of Management, MIassachusetts Institute of Technology, March 1993. J.R. Birge (1982). The value of the stochastic solution in stochastic linear programs with fixed recourse. Mathematical Programming 24, 314-325. J.R. Birge and I.A.H. Dempster (1992). Optimality for match-up strategies in stochastic scheduling. Technical Report, Department of Industrial and Operations Engineering, The University of Michigan. Subiitted to: Mathematics of Operations Research. J.R. Birge and M.A.H. Dempster (1995). Optimal match-up strategies in stochastic scheduling. Discrete 29

Applied Mathematics 57, 105-120. J.R. Birge, J.B.G. Frenk, J. Mittenthal and A.H.G. Rinnooy Kan (1990). Single machine scheduling subject to stochastic breakdowns. Naval Research Logistics 37, 661-677. J.R. Birge and K.D. Glazebrook (1988). Assessing the effects of machine breakdowns in stochastic scheduling. Operations Research Letters 7, 267-271. J.R. Birge and R. J-B. Wets (1986). Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse. Mathematical Programming Study 27, 54-102. J.A. Buzacott and J.G. Shantikumar (1980). Models for understanding flexible manufacturing systems. AIIE Transactions 12, 339-350. H. Chen and A. Mandelbaum (1991). Stochastic discrete flow networks: Diffusion approximations and bottlenecks. Annals of Probability 19, 1463-1510. G.B. Dantzig (1963). Linear Programming and Extensions. Princeton University Press, Princeton, NJ. MI.A.H. Dempster (1988). On stochastic programming: II. Dynamic problems under risk. Stochastics 25, 15 —42. NI.A.H. DemIipster (1994). Hierarchical approximation of telecommunications networks. BT Technology J. 12. 40-49. NI.A.H. Delll)ster. I.L. Fisher, L. Jansen, B.J. Lageweg, J.K. Lenstra and A.H.G. Rinnooy Kan. (1983). Aiialvsis of heuristics for stochastic programming:Results for hierarchical sceduling problems. Mathetrat tc.. of Operations Research 8, 525-537. \I.A.H. Deliipster. P.B. Key and E.A. Medova (1995). Integrated system modelling for ATM. Research Report 95-7, Department of Mathematics, University of Essex. Submitted to: Telecommunications Rese(arch. NI.A.H. Dempster, J.K. Lenstra and A.H.G. Rinnooy Kan, eds. (1982). Deterministic and Stochastic Sch(Idulzng. Reidel. Dordrecht. NI.A.H. DeniIpster and E. Solel (1987). Stochastic scheduling via stochastic control. Proceedings 1st World Conference of the Bernoulli Society. K. Krickeberg and A. Shiryaev, eds. VNU Science Press, Utrecht, 30

783-788. S.D. Flam (1983). Asymptotically stable solutions to stochastic problems of Bolza. Chr. Michelsen Institute Report, Fantoft, Norway. S.D. Flam (1985). Nonanticipativity in stochastic programming. Journal of Optimization Theory and Applications 46, 23- 30. R. Garbe and K.D. Glazebrook (1995). Conservation laws, submodular returns and greedy heuristics for job selection and scheduling problems. Research Report, Department of Mathematics and Statistics, University of Newcastle, April 1995. J.C. Gittins (1981). Multiserver scheduling of jobs with increasing completion times. Jounrals of Applied Probability 16, 321-324. K.D. Glazebrook (1981). On nonpreemptive strategies in stochastic scheduling. Naval Research Logistics Quarterly 28, 289-300. K.D. Glazebrook (1984). Scheduling stochastic jobs on a single machine subject to breakdowns. Naval Research Logistics Quarterly 31, 251-264. K.D. Glazebrook (1987). Evaluating the effects of machine breakdowns in stochastic scheduling problems. Naval Research Logistics 34. 319-335. K.D. Glazebrook (1991). On non-preemptive policies for stochastic single machine scheduling with breakdowns. Probability in Engineering and Informational Sciences 5, 77-87. S.C. Graves (1981). A review of production scheduling. Operations Research 29, 646-675. J.M. Harrison (1985). Brownian Motion and Stochastic Flow Systems. John Wiley, New York. J-B. Hiriart-Urruty (1982). Extension of Lipschitz integrands and minimization of nonconvex integral functionals: Applications to the optimal recourse problem in discrete time. Probability and Mathematical Statistics 3, 19-36. F.P. Kelly (1991). Loss networks. Annals Applied Probability 1, 319-378. A.J. King and R. J-B Wets (1991). Epiconsistency of convex stochastic programs. Stochastics and Stochastics 31

Reports 34, 83-92. H.J. Kushner (1972). Necessary conditions for discrete parameter stochastic optimization problems. In: Proceedings Sixth Berkeley Symposium Math. Stats. and Probability, J. Neyman, ed. U. California, Berkeley, 667-685. L. McKenzie (1976). Turnpike theory. Econometrica 44, 841-864. T.L. Magnanti, J.F. Shapiro, and M. H. Wagner (1976). Generalized linear programming solves the dual. Management Science 22, 1195-1203. E.A. Medova (1993). ATM admission control and routing. BT Laboratories Performance Engineering Report DS DZ/783-03/DP/00029, December 1993. J. Mittenthal (1986). Single Machine Scheduling Subject to Random Breakdowns. Ph.D. dissertation, The University of Michigan, Ann Arbor, Michigan. M.L. Pinedo (1983). Stochastic scheduling with release dates and due dates. Operations Research 31, 559-572. NI.L. Pinedo and S.M. Ross (1980). Scheduling jobs subject to nonhomogeneous Poisson shocks. Management Science 26, 1250-1257. R. Righter (1994). Scheduling. Chapter 13 in NI. Shaked and J.G. Shantikumar. Stochastic Orders and their.Applications. Academic Press, San Diego, CA. A.H.G. Rinnooy KNan (1976). Machine Scheduling Problems: Classification, Complexity and Computations. %Iartinus Nijhoff, The Hague. R.T. Rockafellar (1976). Integral Functionals. Normal Integrands and Measurable Selections. Lecture Notes in Mathematics 543. Springer-Verlag, Berlin. R.T. Rockafellar and R. J-B. Wets (1983). Deterministic and stochastic optimization problems of Bolza type in discrete time. Stochastics 10, 273-312. S. Sen, R.D. Doverspike and S. Cosares (1992). Network planning with random demand. Technical Report, Systems and Industrial Engineering Department, University of Arizona, December 1992. 32

S.P. Sethi, H.M. Soner, Q. Zhang and J. Jiang (1992). Turnpike sets and their analysis in stochastic production planning problems. Mathematics of Operations Research 17, 932-950. S.P. Sethi and Q. Zhang (1995). Multilevel hierarchical decision making in stochastic marketing-production systems. SIAM J. Control and Optimization 33. 529-553. M. Shaked and J.G. Shantikumar (1994). Stochastic Orders and their Applications. Academic Fress. San Diego, CA. J.G. Shantikumar and D. Yao (1992). Multiclass queueing systems: Polymatroid structure and optimal scheduling and control, Operations Research 40, S293-S299. E. Solel (1986). A Dynamic Approach to Stochastic Scheduling via Stochastic Control. Ph.D. dissertation, Dalhousie University, Halifax, Nova Scotia. S. Takriti, J.R. Birge and E. Long (1994 a). A stochastic model for the unit commitment problem. IEEE Transactions on Power Systems, to appear. S. Takriti, J.R. Birge and E. Long (1994 b). Lagrangian solution techniques and bounds for loosely-coupled mixed-interger stochastic programs. Technical Report, Department of Industrial and Operations Engineering. University of Michigan. December 1994. NI.H. van der Vlerk (1995). Stochastic Programming with Integer Recourse. Ph.D. Thesis, University of Groninigen. L.C. Youig (1969). Lectures on the Calculus of Variations and Optimal Control Theory. W.B. Saunders, Philadelphia. 33