A MULTICUT ALGORITHM FOR TWO-STAGE STOCHASTIC LINEAR PROGRAMS

John R. Birge
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, MI 48109, USA

François V. Louveaux
Faculté des Sciences Économiques et Sociales
Facultés Universitaires N-D de la Paix
B-5000 Namur, Belgium

Technical Report 85-15

A Multicut Algorithm for Two-Stage Stochastic Linear Programs

John R. Birge*
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, MI 48109, USA

François V. Louveaux
Faculté des Sciences Économiques et Sociales
Facultés Universitaires N-D de la Paix
B-5000 Namur, Belgium

Abstract: Outer linearization methods, such as Van Slyke and Wets's L-shaped method for stochastic linear programs, generally apply a single cut on the nonlinear objective at each major iteration. The structure of stochastic programs allows several cuts to be placed at once. This paper describes a multicut algorithm that carries out this procedure. It presents experimental and theoretical justification of reductions in major iterations.

Keywords: stochastic programming, outer linearization, cutting plane methods.

*Supported by National Science Foundation Grant No. ECS-8304065.

1. Introduction

Two-stage stochastic linear programs have a deterministic equivalent program with a convex objective function that can be solved by a variety of methods. The L-shaped method of Van Slyke and Wets [9] is a cutting plane or outer linearization technique for solving this program when the random variables have finite support. It has been extended to multi-stage stochastic linear and quadratic programs by Birge [1] and Louveaux [7], respectively. Their analyses showed the L-shaped algorithm to be an effective solution technique for a variety of examples.

The structure of stochastic programs, however, allows the L-shaped method to be extended to include multiple cuts on the objective in each major iteration. This paper describes this procedure for two-stage stochastic linear programs. A multi-stage version has been proposed by Silverman [8].

In Section 2, we briefly describe the L-shaped algorithm and the problem structure. In Section 3, we present the multicut algorithm and, in Section 4, we discuss its efficiency in terms of bounds on the number of major iterations for general problems. The specific case of simple recourse problems is discussed in Section 5. Section 6 presents results of numerical experiments, and the appendices provide illustrative examples of claims made in the text.

2. The L-shaped algorithm

The classical two-stage stochastic linear program with fixed recourse is the problem of finding

  $\min\ z = cx + E_{\xi}[\min\ q(\omega)y]$   (1)
  s.t. $Ax = b$,
       $T(\omega)x + Wy = h(\omega)$,
       $x \ge 0,\ y \ge 0$,

where $c$ is a known vector in $R^{n_1}$, $b$ a known vector in $R^{m_1}$, $\xi$ a random $N$-vector defined on the probability space $(\Omega, \mathcal{A}, P)$, and $A$ and $W$ are known matrices of size $m_1 \times n_1$ and $m_2 \times n_2$, respectively. $W$ is called the recourse matrix. For each $\omega$, $T(\omega)$ is $m_2 \times n_1$, $q(\omega) \in R^{n_2}$ and $h(\omega) \in R^{m_2}$. Piecing together the stochastic components of the problem, we get a vector $\xi(\omega) = (q(\omega), h(\omega), T_1(\omega), \ldots, T_{m_2}(\omega))$ with $N = n_2 + m_2 + (m_2 \times n_1)$ components, where $T_i(\omega)$ is the $i$-th row of $T(\omega)$. Transposes have been eliminated for simplicity. $E_{\xi}$ represents the mathematical expectation with respect to $\xi$.

Problem (1) is equivalent to the so-called deterministic equivalent program (D.E.P.):

  $\min\ z = cx + Q(x)$   (2)
  s.t. $Ax = b$, $x \ge 0$,

where $Q(x) = E_{\xi}\, Q(x, \xi(\omega))$ and

  $Q(x, \xi(\omega)) = \min_y \{ q(\omega)y \mid Wy = h(\omega) - T(\omega)x,\ y \ge 0 \}$.

Properties of the D.E.P. have been extensively studied (Wets [10], Garstka and Wets [4]). Of particular interest for computational aspects is the fact that $Q(x, \xi)$ is a convex piecewise linear function of $x$ and that $Q(x)$ is also piecewise linear convex if $\xi$ has finite support.
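The computation behind $Q(x)$ can be made concrete as follows. The sketch below is an illustrative Python fragment, not the paper's FORTRAN codes: for a finite list of realizations it evaluates $Q(x, \xi_k)$ by solving the second-stage linear program with an off-the-shelf solver and then forms the probability-weighted sum. The function and variable names are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def second_stage_value(q_k, W, h_k, T_k, x):
    """Q(x, xi_k) = min { q_k y : W y = h_k - T_k x, y >= 0 }."""
    res = linprog(c=q_k, A_eq=W, b_eq=h_k - T_k @ x,
                  bounds=[(0, None)] * len(q_k), method="highs")
    if res.status == 2:          # infeasible second stage: Q(x, xi_k) = +infinity
        return np.inf
    return res.fun


def expected_recourse(x, scenarios):
    """Q(x) = sum_k p_k Q(x, xi_k), scenarios = [(p_k, q_k, W, h_k, T_k), ...]."""
    return sum(p * second_stage_value(q, W, h, T, x)
               for p, q, W, h, T in scenarios)
```

For instance, with $W = (1, -1)$, $q_k = (1, 1)$, $T_k = (1)$ and $h_k = \xi_k$, the second-stage value reduces to $|\xi_k - x|$, which is the recourse function used in the example of Appendix A.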

When $T$ is non-stochastic, the original formulation (2) can be replaced by

  $\min\ z = cx + \Psi(\chi)$   (3)
  s.t. $Ax = b$,
       $Tx - \chi = 0$,
       $x \ge 0$,

where $\Psi(\chi) = E_{\xi}\, \psi(\chi, \xi(\omega))$ and

  $\psi(\chi, \xi(\omega)) = \min_y \{ q(\omega)y \mid Wy = h(\omega) - \chi,\ y \ge 0 \}$.

This formulation stresses the fact that choosing $x$ corresponds to generating an $m_2$-dimensional tender $\chi = Tx$ to be "bid" against the outcomes $h(\omega)$ of the random events.

In this paper, we concentrate on algorithms for solving (2) or (3). Excluding algorithms for the specific simple recourse problem (see, e.g., Kall [6], Wets [11]), the basic method for solving (2) is the L-shaped algorithm due to Van Slyke and Wets [9], which is directly related to Benders' decomposition. For more details, see the discussion of algorithmic procedures in Wets [11]. The method consists of solving an approximation of (2) by using an outer linearization of $Q$. Two types of constraints are sequentially added: (i) feasibility cuts (5), determining $\{x \mid Q(x) < +\infty\}$, and (ii) optimality cuts (6), which are linear approximations to $Q$ on its domain of finiteness.

Assumption: The random variable $\xi$ has finite support. Let $k = 1, \ldots, K$ index its possible realizations and let $p_k$ denote their probabilities.

L-shaped algorithm

Step 0. Set $s = t = \nu = 0$.

Step 1. Set $\nu = \nu + 1$. Solve the linear program (4)-(6):

  $\min\ z = cx + \theta$   (4)
  s.t. $Ax = b$,
       $D_{\ell}\, x \ge d_{\ell}$, $\ell = 1, \ldots, s$,   (5)
       $E_{\ell}\, x + \theta \ge e_{\ell}$, $\ell = 1, \ldots, t$,   (6)
       $x \ge 0$, $\theta \in R$.

Let $(x^{\nu}, \theta^{\nu})$ be an optimal solution. If no constraint (6) is present, $\theta^{\nu}$ is set equal to $-\infty$ and is ignored in the computation.

Step 2. For $k = 1, \ldots, K$ solve the linear program

  $\min\ w_1 = e v^{+} + e v^{-}$
  s.t. $Wy + I v^{+} - I v^{-} = h_k - T_k x^{\nu}$,
       $y \ge 0$, $v^{+} \ge 0$, $v^{-} \ge 0$,

where $e = (1, \ldots, 1)$, until, for some $k$, the optimal value $w_1 > 0$. Let $\sigma^{\nu}$ be the associated simplex multipliers and define

  $D_{s+1} = \sigma^{\nu} T_k$ and $d_{s+1} = \sigma^{\nu} h_k$

to generate a feasibility cut of type (5). Set $s = s + 1$ and return to Step 1. If, for all $k$, $w_1 = 0$, go to Step 3.

Step 3. For $k = 1, \ldots, K$ solve the linear program

  $\min\ w = q_k y$   (7)
  s.t. $Wy = h_k - T_k x^{\nu}$, $y \ge 0$.

Let $\pi_k^{\nu}$ be the simplex multipliers associated with the optimal solution of problem $k$ of type (7). Define

  $E_{t+1} = \sum_{k=1}^{K} p_k\, \pi_k^{\nu}\, T_k$   (8)

and

  $e_{t+1} = \sum_{k=1}^{K} p_k\, \pi_k^{\nu}\, h_k$.   (9)

Let $w_2 = e_{t+1} - E_{t+1} x^{\nu}$. If $\theta^{\nu} \ge w_2$, stop; $x^{\nu}$ is an optimal solution. Otherwise, set $t = t + 1$ and return to Step 1.
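In implementation terms, Step 3 amounts to one dual-based cut assembled from all $K$ subproblems. The fragment below is a hedged sketch of that computation, not the paper's code: it assumes the scenario data layout of the previous sketch and reads the simplex multipliers from SciPy's HiGHS interface (`res.eqlin.marginals`, available in recent SciPy releases and interpreted here as the sensitivities of the optimal value to the right-hand side).

```python
import numpy as np
from scipy.optimize import linprog


def lshaped_optimality_cut(x_nu, scenarios):
    """Assemble E_{t+1} and e_{t+1} of (8)-(9); scenarios = [(p_k, q_k, W, h_k, T_k), ...]."""
    E, e = 0.0, 0.0
    for p_k, q_k, W, h_k, T_k in scenarios:
        res = linprog(c=q_k, A_eq=W, b_eq=h_k - T_k @ x_nu,
                      bounds=[(0, None)] * len(q_k), method="highs")
        pi_k = res.eqlin.marginals    # duals of W y = h_k - T_k x_nu (assumed sign: d(obj)/d(rhs))
        E = E + p_k * (pi_k @ T_k)    # contribution to (8)
        e = e + p_k * (pi_k @ h_k)    # contribution to (9)
    return E, e                       # stop when theta_nu >= e - E @ x_nu
```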

Improvements on this algorithm have been given in two directions: (i) the study of cases in which Step 2 can be modified to solve only one linear program instead of $K$, and (ii) the study of bunching and sifting procedures to reduce the work in Step 3 (Garstka and Rutenberg [3]). We again refer to Wets [11] for a detailed account of these improvements.

In this paper, we propose to replace the outer linearization of $Q$ used in the L-shaped method by an outer linearization of all the functions

  $Q_k(x) = \min_y \{ q_k y \mid Wy = h_k - T_k x,\ y \ge 0 \}$,   (10)

of which $Q(x)$ constitutes the expectation.

3. The multicut L-shaped algorithm

The multicut L-shaped algorithm is defined as follows:

Step 0. Set $s = \nu = 0$ and $t_k = 0$ for all $k = 1, \ldots, K$.

Step 1. Set $\nu = \nu + 1$. Solve the linear program (11)-(13):

  $\min\ z = cx + \sum_{k=1}^{K} \theta_k$   (11)
  s.t. $Ax = b$,
       $D_{\ell}\, x \ge d_{\ell}$, $\ell = 1, \ldots, s$,   (12)
       $E_{\ell k}\, x + \theta_k \ge e_{\ell k}$, $\ell = 1, \ldots, t_k$, $k = 1, \ldots, K$.   (13)

Let $(x^{\nu}, \theta_1^{\nu}, \ldots, \theta_K^{\nu})$ be an optimal solution of (11). If no constraint (13) is present for some $k$, $\theta_k^{\nu}$ is set equal to $-\infty$ and is ignored in the computation.

Step 2. As before.

Step 3. For $k = 1, \ldots, K$ solve the linear program (7). Let $\pi_k^{\nu}$ be the simplex multipliers associated with the optimal solution of problem $k$. If

  $\theta_k^{\nu} < p_k\, \pi_k^{\nu} (h_k - T_k x^{\nu})$,   (14)

define

  $E_{t_k+1,\,k} = p_k\, \pi_k^{\nu}\, T_k$   (15)

and

  $e_{t_k+1,\,k} = p_k\, \pi_k^{\nu}\, h_k$,   (16)

and set $t_k = t_k + 1$. If (14) does not hold for any $k = 1, \ldots, K$, stop; $x^{\nu}$ is an optimal solution. Otherwise, return to Step 1.
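The cut-generation step can be contrasted with the single-cut computation sketched after Step 3 of Section 2. The fragment below is again an illustrative sketch rather than the paper's code: it keeps one $\theta_k$ per realization, applies test (14) to each one, and returns only the cuts that are violated. The data layout and names are assumptions carried over from the earlier sketches.

```python
import numpy as np
from scipy.optimize import linprog


def multicut_cuts(x_nu, theta_nu, scenarios, tol=1e-8):
    """theta_nu[k] is the master's current theta_k; returns a list of (k, E, e) cuts."""
    cuts = []
    for k, (p_k, q_k, W, h_k, T_k) in enumerate(scenarios):
        res = linprog(c=q_k, A_eq=W, b_eq=h_k - T_k @ x_nu,
                      bounds=[(0, None)] * len(q_k), method="highs")
        pi_k = res.eqlin.marginals                   # simplex multipliers of (7)
        if theta_nu[k] < p_k * (pi_k @ (h_k - T_k @ x_nu)) - tol:   # test (14)
            cuts.append((k,
                         p_k * (pi_k @ T_k),         # E_{t_k+1, k} as in (15)
                         p_k * (pi_k @ h_k)))        # e_{t_k+1, k} as in (16)
    return cuts                                      # empty list: x_nu is optimal
```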

We illustrate the differences and similarities between the multicut approach and the standard L-shaped algorithm in the example of Appendix A. The multicut approach is based on the idea that sending outer approximations of all the $Q_k(x)$ carries more information than a single cut on $Q(x)$ and that, therefore, fewer iterations are needed.

4. Efficiency and bounds

The following dominance property can be established. Define $K_1 \cap K_2$ to be the constraint set of the stochastic program (2), where $K_1 = \{x \mid Ax = b,\ x \ge 0\}$ and $K_2 = \bigcap_{k=1,\ldots,K} \{x \mid \exists\, y \ge 0 \text{ such that } Wy = h_k - T_k x\}$, where by assumption $\xi$ has finite support. $Q(x)$ is known to be piecewise linear; hence there exists a polyhedral decomposition of $K_1 \cap K_2$ into a finite collection of closed convex sets $C_r$, called the cells of the decomposition, such that the intersection of two distinct cells has an empty interior and such that either the function $Q(x)$ is identically $-\infty$, or on each cell the function $Q(x)$ is affine.

In the following proposition, we define a major iteration to be the operations performed between returns to Step 1 in both algorithms. Simplex iterations are the number of simplex-algorithm pivots performed on any of the linear programs considered by the algorithms.

Proposition: Let $\{x^{\nu}\}$ be a sequence of points generated by the multicut algorithm and let $\{y^{\nu}\}$ be a sequence generated by the L-shaped algorithm. Then, if at all major iterations $x^{\nu}$ and $y^{\nu}$ belong to the same cells of the decomposition, the number of major iterations needed by the multicut algorithm will be less than or equal to the number of major iterations of the L-shaped algorithm.

Proof: The multicut approach sends more information on the function $Q(x)$ than the L-shaped algorithm. If the L-shaped algorithm stops for optimality, the multicut algorithm will do so also. The reverse is not true. $\square$

Unfortunately, whether the iterate points belong to cells that are close to or far from the optimal point is partly a matter of chance. Therefore, the L-shaped method can conceivably do better than the multicut approach (the reverse is obviously also true) in terms of the number of major iterations. We illustrate this by the example in Appendix B. Other examples where the multicut approach does better than the L-shaped method can easily be constructed. Since neither method is superior to the other in all circumstances, the efficiency of the two approaches is measured in terms of a worst-case analysis of the number of major iterations.

Definition: Let $b(\xi)$ represent the maximum number of different slopes of $Q(x, \xi)$ in any direction parallel to one of the axes for a given $\xi$, i.e., the maximum number of different cells (of the polyhedral decomposition of $K_1 \cap K_2$ relative to $Q(x, \xi)$ for a given $\xi$) encountered by any ray (parallel to one of the axes) originating at a point arbitrarily chosen in $K_1 \cap K_2$.

Define $b = \max_{\xi} b(\xi)$ to be the "slope number of the second stage of (2)."

Theorem: Let $b$ be the slope number of the second stage of (2). Then the maximum number of iterations for the multicut algorithm is

  $1 + K\,(b^{m_2} - 1)$,   (17)

while the maximum number of iterations for the L-shaped algorithm is

  $[1 + K\,(b - 1)]^{m_2}$,   (18)

where $K$ is the number of different realizations of $\xi$.

Proof: If $b$ is the slope number of the second stage of (2), then in the worst case $b(\xi) = b$ for all $\xi \in \Xi$, and the maximal number of different slopes of $Q(x)$ in one direction is $1 + K(b - 1)$. The worst case for the total number of slopes is when the effective number of slopes equals $b(\xi)$ in all directions for all $\xi$ (as is the case in a simple recourse problem). Then the total number of facets of $Q(x)$ (or cells of the decomposition of $K_1 \cap K_2$) is equal to $[1 + K(b - 1)]^{m_2}$. The worst case for the L-shaped algorithm is to consider all the facets of $Q(x)$, which proves (18). For the multicut approach, the worst case is to send at each iteration a cut for only one among the $K$ realizations of the random event. For each realization, namely for each $\theta_k$, there are at most $b^{m_2}$ facets of $Q_k(x)$, so at most $K\,b^{m_2}$ cuts in total are needed to describe $Q(x)$ over all the $Q_k$'s. The result given in (17) follows from the fact that $K$ cuts are sent at the first iteration. $\square$
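To make the gap between (17) and (18) concrete, the short fragment below evaluates both bounds for small, arbitrary values of $K$, $b$ and $m_2$ (these values are illustrative and are not taken from the paper's test problems).

```python
K, b, m2 = 3, 2, 2
multicut_bound = 1 + K * (b**m2 - 1)    # (17): 1 + 3*(2**2 - 1) = 10
lshaped_bound = (1 + K * (b - 1))**m2   # (18): (1 + 3*1)**2     = 16
print(multicut_bound, lshaped_bound)    # 10 16
```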

The maximal number of iterations has an immediate consequence for the size of the first-stage problems to be solved. While problems of smaller size are needed in the first iterations of the L-shaped method ($m_1 + 1$ constraints, $n_1 + 1$ variables) as compared to the multicut method ($m_1 + K$ constraints, $n_1 + K$ variables), the above theorem shows that the size of the problem is in the worst case of the order of $(b - 1)^{m_2} K^{m_2}$ for the L-shaped approach and $K\,(b^{m_2} - 1)$ for the multicut strategy. One can therefore expect the multicut approach to be especially efficient for problems where $m_2$ is large and where many cuts are needed. In the next section, the number of facets for the particular case of simple recourse is given explicitly.

5. The simple recourse case

The simple recourse problem is a particular case of formulation (3) with a non-stochastic matrix $T$ in which the function $\psi(\chi, \xi)$ is separable,

  $\psi(\chi, \xi) = \sum_{i=1}^{m_2} \psi_i(\chi_i, \xi_i)$   (19)

and

  $\psi_i(\chi_i, \xi_i) = \min \{ q_i^{+} y_i^{+} + q_i^{-} y_i^{-} \mid y_i^{+} - y_i^{-} = h_i - \chi_i,\ y_i^{+} \ge 0,\ y_i^{-} \ge 0 \}$,   (20)

where $\xi_i = (q_i^{+}, q_i^{-}, h_i)$. Assume that, for each $i$, $\xi_i$ can take on $J$ different values (where, for simplicity of exposition, $J$ is assumed to be the same for all $i$). Then, using the multicut approach consists of approximating the recourse function $\Psi(\chi)$ by the outer linearization

  $\sum_{i=1}^{m_2} \sum_{j=1}^{J} \theta_{ij}$.   (21)

Due to the simple recourse property, only two cuts of type (13) can be generated for each $\theta_{ij}$, namely

  $\theta_{ij} \ge p_{ij}\, q_{ij}^{-} (\chi_i - h_{ij})$   (22)

and

  $\theta_{ij} \ge p_{ij}\, q_{ij}^{+} (h_{ij} - \chi_i)$,   (23)

where $p_{ij}$ denotes the probability of the $j$-th realization of $\xi_i$. Introducing the slack variable $u_{ij}$ in constraint (22),

  $p_{ij}\, q_{ij}^{-} (\chi_i - h_{ij}) + u_{ij} = \theta_{ij}$,   (24)

and substituting $\theta_{ij}$ from (24) into (21)-(23), the simple recourse problem (19), (20) is equivalent to

  $\min\ cx + \sum_{i=1}^{m_2} \sum_{j=1}^{J} p_{ij}\, q_{ij}^{-} (\chi_i - h_{ij}) + \sum_{i=1}^{m_2} \sum_{j=1}^{J} u_{ij}$   (25)
  s.t. $Ax = b$,
       $Tx - \chi = 0$,
       $u_{ij} \ge p_{ij}\, q_{ij} (h_{ij} - \chi_i)$, $i = 1, \ldots, m_2$, $j = 1, \ldots, J$,
       $u_{ij} \ge 0$, $i = 1, \ldots, m_2$, $j = 1, \ldots, J$,

where $q_{ij} = q_{ij}^{+} + q_{ij}^{-}$.

From (25), we can derive the following algorithm.

Multicut algorithm for the simple recourse problem

Step 0. Set $\nu = t = 0$.

Step 1. Set $\nu = \nu + 1$. Solve the linear program

  $\min\ z = cx + \sum_{i=1}^{m_2} \sum_{j=1}^{J} p_{ij}\, q_{ij}^{-} (T_i x - h_{ij}) + \sum_{\ell=1}^{t} u_{\ell}$   (26)
  s.t. $Ax = b$,
       $u_{\ell} \ge e_{\ell} - E_{\ell}\, x$, $\ell = 1, \ldots, t$,
       $u_{\ell} \ge 0$, $\ell = 1, \ldots, t$.

Let $(x^{\nu}, u^{\nu})$ be an optimal solution. If $t = 0$, then $u$ is ignored in the computation.

Step 2. For each $i = 1, \ldots, m_2$ and $j = 1, \ldots, J$, if the constraint

  $0 \ge p_{ij}\, q_{ij} (h_{ij} - T_i x^{\nu})$   (27)

is violated, define $E_{t+1} = p_{ij}\, q_{ij}\, T_i$ and $e_{t+1} = p_{ij}\, q_{ij}\, h_{ij}$, and set $t = t + 1$. If no constraint (27) is violated, stop; $x^{\nu}$ is an optimal solution. Otherwise, return to Step 1.
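Step 2 only requires scanning the pairs $(i, j)$ and testing (27) at the current iterate; no second-stage linear programs need to be solved. The fragment below is an illustrative sketch of that scan (the array names and shapes are assumptions, not the paper's data structures).

```python
import numpy as np


def simple_recourse_cuts(x_nu, T, p, q_plus, q_minus, h, tol=1e-8):
    """p, q_plus, q_minus, h are (m2 x J) arrays; T is the (m2 x n1) technology matrix."""
    chi = T @ x_nu                                  # current tender chi = T x
    q_bar = q_plus + q_minus                        # q_ij = q_ij^+ + q_ij^-
    m2, J = h.shape
    cuts = []
    for i in range(m2):
        for j in range(J):
            if p[i, j] * q_bar[i, j] * (h[i, j] - chi[i]) > tol:   # (27) violated
                cuts.append((p[i, j] * q_bar[i, j] * T[i],         # E_{t+1} = p_ij q_ij T_i
                             p[i, j] * q_bar[i, j] * h[i, j]))     # e_{t+1} = p_ij q_ij h_ij
    return cuts                                     # each cut enters (26) as u >= e - E x
```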

The initial problem (26) involves $m_1$ constraints and $n_1$ variables. For this problem, the worst-case situation is when, at each iteration, only one constraint (27) is violated in Step 2. Then the maximal number of iterations is $J\,m_2 + 1$. To compare this with the maximal number of iterations an L-shaped type algorithm would require to solve the same problem, note that, for each $i$, the function $\Psi_i(\chi_i) = E_{\xi_i}\, \psi_i(\chi_i, \xi_i)$ contains $J + 1$ facets and, since $\Psi(\chi)$ is separable in $i$, it contains at most $(J + 1)^{m_2}$ facets. This is precisely the worst-case upper bound on the number of iterations for an L-shaped type algorithm, as in the Theorem of Section 4.

6. Numerical experimentation and conclusions

The L-shaped algorithm and the multicut method have been coded in FORTRAN in the codes NDREG and NDSEP, respectively. NDREG is a two-stage version of the multi-stage code developed by Birge in [1] and described in [2]. NDSEP uses the same subroutines for linear program solutions, constraint generation and constraint elimination as NDREG. The subroutines that control where cuts are placed and that determine optimality have been modified in NDSEP to reflect the differences between the standard L-shaped method and the multicut approach.

The set of test problems and their size characteristics appear in Table 1.

The first four problems are small energy examples with varying objectives and constraints, and the last example is a stochastic two-stage version of one of Ho and Loute's [5] staircase problems. These examples were chosen because of their applicability and the facet structure of their recourse functions. The problems were solved using the FORTRAN-G compiler at The University of Michigan on an Amdahl 5860. The number of major iterations, simplex iterations and CPU seconds are given for each problem in Table 2. Both NDREG and NDSEP used the bunching approach (Wets [11]) for solving second-period problems. They also both included the deletion of slack cuts, which resulted in savings of up to twenty percent in CPU times.

The results in Table 2 illustrate the effectiveness of the multicut approach and some of its shortcomings. In each example, the number of major iterations is reduced. This is due to the passing of more information in each major iteration, as noted above. A difficulty arises, however, because of the increased size of (11)-(13) over (4)-(6). Although (4)-(6) may in the worst case have many more constraints than (11)-(13), program (11)-(13) is initially larger and hence requires more time to solve. This leads to the increased time in solving NRG4 by NDSEP. NDSEP, in fact, spends 2.8 more CPU seconds solving (11)-(13) than NDREG spends solving (4)-(6) on NRG4. This problem is an especially bad case because the original problem is so small that the addition of 27 extra constraints increases its size nine-fold and has a significant slowing effect.

These examples suggest that the multicut approach can lead to significant reductions in the number of major iterations. As indicated above, the worst-case advantage of the multicut approach in limiting major iterations is enhanced as $m_2$ increases in size, and the experiments show that the multicut approach is most effective when the number of realizations $K$ is not significantly larger than the number of first-period constraints $m_1$.

When $K$ is large relative to $m_1$, it may be advantageous to use a hybrid approach in which subsets of the realizations are grouped together to form a reduced number of combination cuts. The worth of this and of other strategies is, however, problem dependent and should be demonstrated through experimentation in different and varied application areas.
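One possible reading of such a hybrid scheme, offered here only as a hedged sketch and not as a procedure from the paper, is to partition the $K$ realizations into $G$ groups, keep one variable $\theta_g$ per group, and aggregate the scenario cuts within each group exactly as the single-cut method aggregates all of them ($G = 1$ recovers the L-shaped cut, $G = K$ the multicut scheme).

```python
def aggregate_group_cuts(scenario_cuts, groups):
    """scenario_cuts[k] = (p_k pi_k T_k, p_k pi_k h_k); groups = list of lists of scenario
    indices.  Returns one aggregated cut (E_g, e_g) per group, to be placed on theta_g."""
    aggregated = []
    for group in groups:
        E_g = sum(scenario_cuts[k][0] for k in group)   # sum of p_k pi_k T_k over the group
        e_g = sum(scenario_cuts[k][1] for k in group)   # sum of p_k pi_k h_k over the group
        aggregated.append((E_g, e_g))
    return aggregated
```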

Table 1. Problem parameters

  Problem      Period 1 (A)            Period 2 (W)            Realizations (K)
               n1    m1    ρ*          n2    m2    ρ*
  NRG1          7     3    1.000       20     8    .375         3
  NRG2          7     3    1.000       20     8    .375         3
  NRG3          7     3    1.000       20     8    .375         9
  NRG4          7     3    1.000       20     8    .375        27
  SCAGR7.S2    36    16     .191       79    39    .092         8

  *Fraction of elements (excluding slack variable elements) which are nonzero.

Table 2. Experimental results

                         NDREG                                NDSEP
  Problem      Major iter.   Simplex iter.   CPU s    Major iter.   Simplex iter.   CPU s
  NRG1         10            117              .34     6             64               .23
  NRG2         13            163              .49     9             92               .35
  NRG3         14            196             1.26     8            121              1.11
  NRG4         14            207             3.19     7            166              5.66
  SCAGR7.S2    10            138             1.66     7            108              1.40

Appendix A

Assume that

  $Q(x, \xi) = \xi - x$ if $x \le \xi$,
  $Q(x, \xi) = x - \xi$ if $x \ge \xi$,

and that $\xi$ can take on the values 1, 2 and 4, each with probability 1/3. Assume also $cx = 0$ and $0 \le x \le 10$. Figure 1 represents the functions $Q_1(x)$, $Q_2(x)$, $Q_3(x)$, and $Q(x)$. Since the first-stage objective $cx$ is zero, $Q(x)$ is also the function $z(x)$ to be minimized. Assume the starting point is $x^1 = 0$. The sequence of iterations for the L-shaped method would be:

Iteration 1: $x^1$ is not optimal; send the cut $\theta \ge 7/3 - x$.
Iteration 2: $x^2 = 10$, $\theta^2 = -23/3$ is not optimal; send the cut $\theta \ge x - 7/3$.
Iteration 3: $x^3 = 7/3$, $\theta^3 = 0$ is not optimal; send the cut $\theta \ge (x + 1)/3$.
Iteration 4: $x^4 = 1.5$, $\theta^4 = 2.5/3$ is not optimal; send the cut $\theta \ge (5 - x)/3$.
Iteration 5: $x^5 = 2$, $\theta^5 = 1$, which is the optimal solution.

Starting from $x^1 = 0$, the multicut approach would yield the following sequence:

Iteration 1: $x^1$ is not optimal; send the cuts $\theta_1 \ge (4 - x)/3$, $\theta_2 \ge (2 - x)/3$ and $\theta_3 \ge (1 - x)/3$.
Iteration 2: $x^2 = 10$, $\theta_1^2 = -2$, $\theta_2^2 = -8/3$, $\theta_3^2 = -3$ is not optimal; send the cuts $\theta_1 \ge (x - 4)/3$, $\theta_2 \ge (x - 2)/3$ and $\theta_3 \ge (x - 1)/3$.
Iteration 3: $x^3 = 2$, $\theta_1^3 = 2/3$, $\theta_2^3 = 0$, $\theta_3^3 = 1/3$ is the optimal solution.

Therefore, by sending separate cuts on $Q_1(x)$, $Q_2(x)$, and $Q_3(x)$, the full description of $Q(x)$ is obtained in two iterations.
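The third master problem of this trace can be checked numerically. The fragment below is a verification sketch (not part of the paper): it feeds the six cuts generated in iterations 1 and 2 to an off-the-shelf LP solver and recovers $x = 2$, $\theta = (2/3, 0, 1/3)$ with objective value 1.

```python
import numpy as np
from scipy.optimize import linprog

xi = np.array([4.0, 2.0, 1.0])             # realizations, in the order used for theta_1..theta_3
c = np.array([0.0, 1.0, 1.0, 1.0])         # variables (x, theta_1, theta_2, theta_3); cx = 0
A_ub, b_ub = [], []
for k in range(3):
    row = np.zeros(4)
    row[0], row[1 + k] = -1.0 / 3.0, -1.0  # theta_k >= (xi_k - x)/3   (iteration 1 cut)
    A_ub.append(row.copy()); b_ub.append(-xi[k] / 3.0)
    row[0] = 1.0 / 3.0                     # theta_k >= (x - xi_k)/3   (iteration 2 cut)
    A_ub.append(row.copy()); b_ub.append(xi[k] / 3.0)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 10)] + [(None, None)] * 3, method="highs")
print(res.x, res.fun)                      # approximately [2, 2/3, 0, 1/3] and 1.0
```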

Appendix B

Let $n_1 = 1$ and let $\xi$ have $K = 2$ realizations, each with probability 1/2. For the first value of $\xi$, $Q(x, \xi)$ has two pieces, such that

  $Q_1(x) = -x - 1$ if $x \le -1$,
  $Q_1(x) = 0$ if $x \ge -1$,

while for the second value of $\xi$, $Q(x, \xi)$ has four pieces, such that

  $Q_2(x) = -1.5\,x$ if $x \le 0$,
  $Q_2(x) = 0$ if $0 \le x \le 2$,
  $Q_2(x) = (2/7)(x - 2)$ if $2 \le x \le 9$,
  $Q_2(x) = x - 7$ if $x \ge 9$.

Assume also that $x$ is bounded by $-20 \le x \le 20$ and $c = 0$. Starting from any initial point $x^1 \le -1$, one obtains the following sequence of iterate points and cuts for the L-shaped method.

Iteration 1: $x^1 = -2$; $\theta^1$ is omitted; new cut $\theta \ge -0.5 - 1.25\,x$.
Iteration 2: $x^2 = +20$, $\theta^2 = -25.5$; new cut $\theta \ge 0.5\,x - 3.5$.
Iteration 3: $x^3 = 12/7$, $\theta^3 = -37/14$; new cut $\theta \ge 0$.
Iteration 4: $x^4 \in [-2/5, 7]$, $\theta^4 = 0$. If $x^4$ is chosen to be any value in $[0, 2]$, then the algorithm terminates at Iteration 4.

The multicut approach would generate the following sequence.

Iteration 1: $x^1 = -2$; $\theta_1^1$ and $\theta_2^1$ omitted; new cuts $\theta_1 \ge -0.5\,x - 0.5$ and $\theta_2 \ge -(3/4)\,x$.
Iteration 2: $x^2 = 20$, $\theta_1^2 = -10.5$, $\theta_2^2 = -15$; new cuts $\theta_1 \ge 0$ and $\theta_2 \ge 0.5\,x - 3.5$.
Iteration 3: $x^3 = 2.8$, $\theta_1^3 = 0$, $\theta_2^3 = -2.1$; new cut $\theta_2 \ge (2/7)(x - 2)$.
Iteration 4: $x^4 = 0.552$, $\theta_1^4 = 0$, $\theta_2^4 = -0.414$; new cut $\theta_2 \ge 0$.
Iteration 5: $x^5 = 0$, $\theta_1^5 = \theta_2^5 = 0$, STOP.

Figure 1. Recourse functions $Q_1(x)$, $Q_2(x)$, $Q_3(x)$ and $Q(x)$ (plot not reproduced).

References

[1] J.R. Birge, "Decomposition and partitioning methods for multi-stage stochastic linear programs", Technical Report 82-6, Department of Industrial and Operations Engineering, The University of Michigan (Ann Arbor, MI, 1982); also to appear in Operations Research.
[2] J.R. Birge, "An L-shaped method computer code for multi-stage stochastic linear programs", in: Y. Ermoliev and R. J-B. Wets, eds., Numerical Methods in Stochastic Programming, to appear.
[3] S.J. Garstka and D.P. Rutenberg, "Computation in discrete stochastic programs with recourse", Operations Research 21 (1973) 112-122.
[4] S.J. Garstka and R. J-B. Wets, "On decision rules in stochastic programming", Mathematical Programming 7 (1974) 117-143.
[5] J.K. Ho and E. Loute, "A set of staircase linear programming test problems", Mathematical Programming 20 (1981) 245-250.
[6] P. Kall, "Computational methods for solving two-stage stochastic linear programming problems", Zeitschrift für angewandte Mathematik und Physik 30 (1979) 261-271.
[7] F.V. Louveaux, "A solution method for multistage stochastic programs with application to an energy investment problem", Operations Research 28 (1980) 889-902.
[8] J. Silverman, "An accelerated nested decomposition algorithm for the solution of multi-period multi-scenario LPs", unpublished notes, Department of Operations Research, Stanford University (Stanford, CA, 1983).
[9] R. Van Slyke and R. J-B. Wets, "L-shaped linear programs with application to optimal control and stochastic programming", SIAM Journal on Applied Mathematics 17 (1969) 638-663.
[10] R. J-B. Wets, "Characterization theorems for stochastic programs", Mathematical Programming 2 (1972) 166-175.
[11] R. J-B. Wets, "Stochastic programming: solution techniques and approximation schemes", in: A. Bachem, M. Grötschel and B. Korte, eds., Mathematical Programming: The State of the Art, Bonn 1982 (Springer-Verlag, Berlin, 1983) pp. 566-603.