REFINING BOUNDS FOR STOCHASTIC LINEAR PROGRAMS WITH LINEARLY TRANSFORMED INDEPENDENT RANDOM VARIABLES

JOHN BIRGE*
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA

STEIN W. WALLACE
Chr. Michelsen Institute, N-5036 Fantoft, Bergen, Norway

Technical Report # 85-11

Abstract: We consider a linear stochastic program in which the elements of the right-hand side are linear transformations of independent stochastic variables, and show how bounds on the recourse (second-stage) problem can be found by working directly on the independent stochastic variables rather than on the elements of the right-hand side themselves.

Keywords: Stochastic programming, bounds, dependent right-hand side.

* Supported by National Science Foundation Grant No. ECS-8304065.

1. INTRODUCTION

When stochastic programs with recourse are solved in practice, it is usually assumed that the stochastic variables are independent. In this note we show how algorithmic obstacles can be overcome for one type of dependency. We consider the case where only the right-hand side is stochastic, and assume that each element of the right-hand side is a linear combination of independent stochastic variables.

2. PROBLEM FORMULATION

A stochastic program with recourse can be formulated as follows. Find the vector x to

    minimize    z = cx + Q(x)                                    (1)
    subject to  Ax = b, x ≥ 0,

where Q(x) = Σ_i Q(x, ξ_i) p_i and Q(x, ξ_i) = min { qy | Wy = ξ_i − Tx, y ≥ 0 }. Here p_i is the probability that ξ = ξ_i. These problems can be solved by the L-shaped decomposition method developed by Van Slyke and Wets [8]. This decomposition principle is a specialized variant of Benders decomposition [1]. Different algorithmic procedures based on this can be found in Wets [11,12], Wallace [9,10] and Birge [3,4].
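The following minimal sketch illustrates the evaluation of Q(x) for a discrete distribution by solving one second-stage linear program per realization. All data in the sketch (q, W, T, the scenarios and their probabilities) are hypothetical and chosen only for illustration.

```python
# Sketch: evaluating the expected recourse cost Q(x) for a discrete
# distribution by solving one LP per realization xi_i (hypothetical data).
import numpy as np
from scipy.optimize import linprog

q = np.array([2.0, 5.0])          # second-stage cost vector
W = np.array([[1.0, -1.0]])       # recourse matrix (here one row, two columns)
T = np.array([[1.0, 1.0]])        # technology matrix

scenarios = [(np.array([0.5]), 0.3),   # pairs (xi_i, p_i)
             (np.array([1.0]), 0.4),
             (np.array([1.5]), 0.3)]

def Q(x):
    """Q(x) = sum_i p_i * min{ q y | W y = xi_i - T x, y >= 0 }."""
    total = 0.0
    for xi, p in scenarios:
        res = linprog(q, A_eq=W, b_eq=xi - T @ x,
                      bounds=[(0, None)] * len(q))
        assert res.status == 0, "second-stage LP should be feasible and bounded"
        total += p * res.fun
    return total

print(Q(np.array([0.2, 0.3])))    # expected recourse cost at a trial x
```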

In all algorithmic work so far, the usual assumption is that the different elements of the ξ-vector are independent. Let us now drop this assumption, and instead assume that ξ = Hη, where H is some deterministic matrix and η is a vector of independent stochastic variables. In a production context this could mean that the demand for raincoats in several neighboring cities depends on the same stochastic variables, namely some measures of the local weather conditions. The cities' dependencies on these conditions will, however, not be the same. Therefore the rows of H will not in general be linearly dependent.

3. APPROXIMATION SCHEMES

If the number of possible outcomes for ξ (or η) is too large, we cannot find Q(x), but have to work with a lower bound L(x) and an upper bound U(x). These bounds can be found by approximating the distribution of the stochastic variables. A description of such procedures can be found in Birge and Wets [5]. The distributions are approximated by partitioning the support of the stochastic variables into smaller rectangles, hereafter called cells. Note that since the η-variables are independent, the support of η is a rectangle. The support of the ξ-variables, on the other hand, is not a rectangle; the sketch below illustrates this. The question now is the following: should we work in the ξ-space or the η-space? In order to answer this, let us consider the calculations that are necessary in order to find the upper and the lower bounds.
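The geometry is easy to see numerically. In the sketch below (H and the bounds l, u are hypothetical), the corners of the rectangular η-support map under ξ = Hη to the corners of a parallelogram, not to an axis-aligned rectangle.

```python
# Sketch: the support of eta (independent components) is a rectangle,
# but its image under xi = H eta generally is not (hypothetical data).
import numpy as np

H = np.array([[1.0, 0.5],
              [0.3, 1.0]])
l = np.array([0.0, 0.0])          # lower bounds of the eta-rectangle
u = np.array([1.0, 2.0])          # upper bounds of the eta-rectangle

# Corners of the eta-rectangle and their images in xi-space.
corners = np.array([[l[0], l[1]], [l[0], u[1]],
                    [u[0], l[1]], [u[0], u[1]]])
images = corners @ H.T            # each row is H @ corner
print(images)                     # a parallelogram in the xi-plane
```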

To find the lower bound we shall need:

I. The conditional expected values within each cell.
II. The probability attached to each cell.

To find the upper bound, on the other hand, we might have to compute probabilities attached to each extreme point of every cell. See [5, Section 5] for a general discussion of this problem. In the case of independent right-hand sides, very good schemes exist. Some new work has also been done for the dependent case; see Gassmann and Ziemba [6]. Hence we might need to calculate:

III. Probabilities attached to the extreme points.

Finally there is the question of how to refine the approximation of the distributions if the lower and upper bounds are not good enough. This is usually done by partitioning one of the cells into two new cells. The question to answer is:

IV. Which cell should we partition?

Let us now see how these four questions can be answered depending on whether we work in the ξ-space or the η-space.

Question I. This is very simple in the η-space since the variables are independent. All we need to do is to find the expected value of one variable at a time. On the other hand, finding the conditional expected value in the ξ-space is rather cumbersome since the variables are dependent within each cell. This problem can, however, be regarded as a purely statistical problem, and for some distributions it is relatively easy to solve. Most of the known results in the dependent case treat the case where the ξ-variables are jointly distributed (e.g. a multinormal distribution) and not where each element of ξ is a sum of independent stochastic variables. One problem is, for example, that most distributions are not closed under summation.

Question II. The situation is approximately the same here: things are easier in the η-space than in the ξ-space. The general case in the ξ-space requires multiple integration, which becomes extremely difficult in higher dimensions. A sketch of both computations in the η-space follows.
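The sketch below shows Questions I and II in the η-space: with independent components, a cell's probability is a product of one-dimensional marginal probabilities, and the conditional mean is computed one coordinate at a time. The uniform marginals are a hypothetical choice.

```python
# Sketch: cell probability and conditional mean for independent eta
# (hypothetical uniform marginals; any independent marginals would do).
import numpy as np
from scipy import stats

marginals = [stats.uniform(loc=0.0, scale=2.0),   # eta_1 ~ U[0, 2]
             stats.uniform(loc=1.0, scale=3.0)]   # eta_2 ~ U[1, 4]

def cell_probability(a, b):
    """P(a <= eta <= b) = prod_j (F_j(b_j) - F_j(a_j)) by independence."""
    return np.prod([m.cdf(bj) - m.cdf(aj)
                    for m, aj, bj in zip(marginals, a, b)])

def cell_conditional_mean(a, b):
    """E[eta | a <= eta <= b], one coordinate at a time."""
    return np.array([m.expect(lb=aj, ub=bj, conditional=True)
                     for m, aj, bj in zip(marginals, a, b)])

a, b = np.array([0.0, 1.0]), np.array([1.0, 2.5])
print(cell_probability(a, b))        # 0.25 for the marginals above
print(cell_conditional_mean(a, b))   # (0.5, 1.75): midpoints of the cell
```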

Question III. As mentioned above, this problem is easy in the independent case. Kall and Stoyan [7] proposed using the extreme points with the highest value as an upper bound, and in [5] a general problem is formulated that results in a unique determination of appropriate extreme-point weights for cells divided into simplices. A related result is in [6], where a linear program that can be solved for finding appropriate extreme-point weights is provided.

Question IV. The general belief today is that we should partition the cell which contains the vector Tx̂, where x̂ is the current approximation of the optimal solution. The reason is twofold. First, the polyhedral function Q(x) (and also L(x) and U(x)) is most nonlinear close to Tx̂. Second, the point Tx̂ is almost certainly a non-differentiable point of Q(x). It is therefore a point at which we would like to partition, so that our linear approximations become more accurate. It is of course easy to find in which cell in the ξ-space Tx̂ can be found. But where is (are) the point(s) corresponding to Tx̂ in the η-space? To our knowledge this is an unanswered question. We shall in the next section show how the point(s) in the η-space that correspond(s) to the point Tx̂ in the ξ-space can be found.

4. FINDING THE PARTITIONING POINT

The question to answer in this section is as follows. Given the linear transformation ξ = Hη and the point Tx̂ in the ξ-space, which point(s) in the η-space correspond(s) to Tx̂ in the ξ-space?

In order to answer this question, we have to recall the importance of the point Tx̂. First note that it does not represent the minimizing point of Q(x̂, ·). Figure 1 illustrates this in the one-dimensional case. But although Tx̂ is not the minimizing point, it is a very interesting point, because in almost all cases Q(x̂, ·) will be non-differentiable there. In other words, if a parametric analysis is performed on ξ, a pivot will have to take place when ξ reaches the value Tx̂. This is a very important observation: the vector Tx̂ represents the only point where a pivot will have to be performed whichever element of ξ the parametric analysis is performed on. (We cannot guarantee, however, that Tx̂ is in the support.)

[Figure 1. Possible form of Q(x̂, ·) as a function of ξ on the interval [l, u].]

Hence, if we partition [l, u] in Figure 1 into [l, Tx̂] and [Tx̂, u], both the lower and the upper bound will become exact. In the general case the bounds will become better, and Tx̂ will be the best point to partition, but the bounds will not become exact because of other nonlinearities elsewhere.

Let us then return to the question of where to partition in the η-space when Tx̂ is known in the ξ-space. There are several cases to consider; let us split the discussion into three parts. When doing this we shall assume that H has m rows and n columns and that rank H = min{m, n}, i.e. if m < n all the rows of H are linearly independent, and if m > n all the columns of H are linearly independent.

1. m = n. In this case we simply have that η* = H⁻¹Tx̂ is the best place to partition. If this point is not in the support of η, then we partition the cell adjacent to η* which has the most "non-linearity" according to the partition-direction-finding methods of Section 5.

2. m < n. In this case the number of η-variables is larger than the number of ξ-variables, a possible but rather improbable situation in most applications, as we see it. What we seek in this case is some η such that Hη = Tx̂, l ≤ η ≤ u, where the vectors l and u represent the lower and upper bounds on the support of η. The problem is not necessarily feasible. But if it is not, we know that Tx̂ is outside the support of ξ. This is not an unusual case; it can happen if cx in (1) is much larger than Q(x) or if the minimum of Q(x, ξ) over ξ is always a corner solution. Consider the following program:

    min         w = e z⁺ + e z⁻
    subject to  Hη + I z⁺ − I z⁻ = Tx̂                            (2)
                l ≤ η ≤ u, z⁺, z⁻ ≥ 0,

where I is a unit matrix of appropriate size and e is a row vector of ones. We shall in the following denote by w* the minimal value of w in (2), and let η* be the value of η at the optimal solution of (2). A small sketch of (2) follows.
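The sketch below sets up (2) as an ordinary linear program: it finds an η* in the box l ≤ η ≤ u such that Hη* is as close as possible to Tx̂ in the 1-norm. The matrix H, the bounds and the point Tx̂ are hypothetical data.

```python
# Sketch of problem (2): l1-projection of Tx onto H[l, u] (hypothetical data).
import numpy as np
from scipy.optimize import linprog

H = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [1.0, 1.0]])                  # m = 3 rows, n = 2 columns
m, n = H.shape
l = np.zeros(n)
u = np.array([1.0, 2.0])
Tx = np.array([0.8, 1.1, 1.9])              # current point T x in xi-space

# Variables (eta, z_plus, z_minus); minimize e z_plus + e z_minus.
c = np.concatenate([np.zeros(n), np.ones(2 * m)])
A_eq = np.hstack([H, np.eye(m), -np.eye(m)])    # H eta + z+ - z- = Tx
bounds = [(l[j], u[j]) for j in range(n)] + [(0, None)] * (2 * m)

res = linprog(c, A_eq=A_eq, b_eq=Tx, bounds=bounds)
eta_star, w_star = res.x[:n], res.fun
print(eta_star, w_star)      # w* = 0 exactly when Tx lies in the image H[l, u]
```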

If Tx̂ is inside the support of ξ, then w* = 0 will be the optimal value in (2). The corresponding η* will be a point in the support of η that maps into Tx̂ in the support of ξ. The optimal solution above is in general not unique. If it is not, the η-values that map into Tx̂ form a convex set. We then suggest finding an interior point of this set as the partitioning point, so that we have the possibility of partitioning in all directions. (It is not possible to partition on a variable that is at one of its bounds.)

If Tx̂ is outside the support of ξ, then w* in (2) will be larger than zero. In this case the optimal solution η* will represent a point in the support of η that maps into the point in the support of ξ which is closest to Tx̂ in the sense of the l₁-norm.

In all cases, (2) should therefore give us a useful point in the support of η at which we can partition.

3. m > n. In the previous case there is always a point in the η-space (but possibly outside the support of η) that maps into Tx̂. When m > n this is no longer the case.

Consider again problem (2). Still we have that w* = 0 if Tx̂ is in the support of ξ. (The support of ξ is part of an n-dimensional hyperplane in R^m.) The unique point η* is again an interesting partitioning point.

If w* > 0, then Tx̂ is either inside the range of H but outside the image of the support of η, or Tx̂ is outside the range of H. If we replace l ≤ η ≤ u in (2) by η ∈ R^n, we can decide whether Tx̂ is in the range of H or not. In both cases η* will correspond to a point in the support of η that maps into a point in the support of ξ which is closest to Tx̂ in the sense of the l₁-norm. In these cases η* might not be unique.

This should fully describe the different possible situations. By solving (2) we get a point η* in the support of η at which it is relevant to partition.

We mentioned earlier in this section that Tx̂ in general does not correspond to the minimizing point of Q(x̂, ·). There is one exception. Assume that q > 0. In this case min_ξ Q(x̂, ξ) = Q(x̂, Tx̂) = 0. If this is the case we can choose to solve the following problem in order to find η*:

    min         v = qy
    subject to  Wy − Hη = −Tx̂                                    (3)
                l ≤ η ≤ u, y ≥ 0.

Problem (3) will result in a point in the support of η that minimizes Q(x̂, ·). If v* = 0, then Tx̂ was in the support of ξ; otherwise it was not. A sketch of (3) follows.
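The following sketch poses (3) as a joint LP in (y, η). All data (q, W, H, the bounds and Tx̂) are hypothetical; with the particular Tx̂ chosen here the optimal value is zero, which signals that Tx̂ is in the support of ξ.

```python
# Sketch of problem (3), applicable when q > 0: choose eta in the box and
# y >= 0 to minimize q y subject to W y - H eta = -Tx (hypothetical data).
import numpy as np
from scipy.optimize import linprog

q = np.array([2.0, 5.0])
W = np.array([[1.0, -1.0],
              [0.0,  1.0]])
H = np.array([[1.0],
              [0.5]])
l, u = np.array([0.0]), np.array([2.0])
Tx = np.array([0.8, 0.4])                   # here Tx = H * 0.8, so v* = 0

n_y, n_eta = W.shape[1], H.shape[1]
c = np.concatenate([q, np.zeros(n_eta)])    # variables (y, eta)
A_eq = np.hstack([W, -H])                   # W y - H eta = -Tx
bounds = [(0, None)] * n_y + [(l[j], u[j]) for j in range(n_eta)]

res = linprog(c, A_eq=A_eq, b_eq=-Tx, bounds=bounds)
y_star, eta_star, v_star = res.x[:n_y], res.x[n_y:], res.fun
print(eta_star, v_star)     # since q > 0: v* = 0 iff Tx is in the support of xi
```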

5. FINDING THE PARTITION DIRECTION

Section 4 describes procedures for finding a cell S* (containing η*) that is in the support of η and that will be partitioned to improve the bounds L(x) and U(x). There are, however, many ways to partition S*. We could, for example, split S* along all coordinate directions, but, when n is large, it may be impractical to do this at every refinement of the bounds. Instead, we describe here a method for determining a single coordinate direction through which to divide S*. This procedure follows the development in [5].

Consider the function h(η) = Q(x̂, Hη). This function is piecewise linear on S*, and we are currently approximating it on S* by using its value at the conditional expectation of η on S* and at the extreme points of S*. To better approximate h, we would like to split S* into two new cells so that h is as nearly linear as possible on these new cells. In other words, we would like to cut across the direction in which h is most nonlinear. Birge and Wets [5] described a procedure for doing this in which one partitions a cell through the edge along which its subgradients vary most. This edge then represents the direction of greatest nonlinearity. We can do this for h provided we can find the subgradients of h at the extreme points of S*. Subgradients at neighboring extreme points would be compared, and the pair having the greatest difference (in the direction between the points) would give the edge through which to split S*.

To find the subgradients at the extreme points s^i, i = 1, ..., r, of S*, we solve

    maximize    w (H s^i − Tx̂)
    subject to  wW ≤ q.                                          (4)

A subgradient of h at s^i is then w̄H, where w̄ solves (4). This is clear from the definition of h.
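The sketch below solves (4), the dual of the second-stage LP, at one extreme point. The data (q, W, H, Tx̂ and the extreme point s) are hypothetical.

```python
# Sketch of problem (4): a subgradient of h(eta) = Q(x, H eta) at an extreme
# point s of the cell S* is w H, where w maximizes w (H s - Tx) s.t. w W <= q.
import numpy as np
from scipy.optimize import linprog

q = np.array([2.0, 5.0])
W = np.array([[1.0, -1.0]])
H = np.array([[1.0, 0.5]])
Tx = np.array([0.7])
s = np.array([1.0, 2.0])                    # an extreme point of the cell S*

rhs = H @ s - Tx                            # the vector H s - Tx
res = linprog(-rhs,                         # maximize w (H s - Tx)
              A_ub=W.T, b_ub=q,             # constraints w W <= q, w free
              bounds=[(None, None)] * W.shape[0])
w_bar = res.x
print(w_bar @ H)                            # a subgradient of h at s
```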

6. A NUMERICAL EXAMPLE

To show the usefulness of the procedures in Section 4, we consider the following small example:

    minimize    z = x1 + x2 + E_η { 2y1⁺ + 5y1⁻ + 2y2⁺ + 5y2⁻ + 2y3⁺ + 5y3⁻ }
    subject to  x1 + x2 ≥ 0.5
                x1 + y1⁺ − y1⁻ = h1 η                             (5)
                x2 + y2⁺ − y2⁻ = h2 η
                x1 + x2 + y3⁺ − y3⁻ = h3 η
                x1, x2, y1⁺, y1⁻, y2⁺, y2⁻, y3⁺, y3⁻ ≥ 0,

where η is uniform on [0, 2]. Problem (5) is an example of a simple recourse problem, so for a fixed first-stage decision the recourse cost separates by row; see the sketch following this discussion.

In the table below, we solve (5) using the L-shaped code [4] and the general refinement scheme presented in [2]. We present two possibilities for partitioning the support of η. The first (Method 1) solves (2) and partitions through η* as described above. The second (Method 2) partitions the subinterval with the greatest probability attached to it. The table presents results for varying values of H = (h1, h2, h3). In each of the examples, Method 1 required fewer partitions. In all but one example, Method 1 was also faster. We note, however, that in that example Method 1 had achieved a tighter bound than Method 2 when it terminated. In two of the examples, Method 2 required more than fifty major iterations of the L-shaped code and was stopped before achieving convergence.
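The sketch below evaluates the objective of (5) for a fixed first-stage x: with simple recourse, y_i⁺ = (h_i η − χ_i)⁺ and y_i⁻ = (χ_i − h_i η)⁺, where χ = (x1, x2, x1 + x2), so the expectation over η ~ U[0, 2] reduces to a one-dimensional integral. The trial point x is arbitrary, and H is taken from example 3 of Table I.

```python
# Sketch: evaluating z = x1 + x2 + E[recourse] for problem (5) at a fixed x.
import numpy as np
from scipy.integrate import quad

h = np.array([1.5, 1.0, 1.0])               # H = (h1, h2, h3), example 3
x = np.array([0.4, 0.6])                    # arbitrary trial point, x1 + x2 >= 0.5
chi = np.array([x[0], x[1], x[0] + x[1]])   # left-hand sides of the recourse rows

def recourse_cost(eta):
    dev = h * eta - chi                     # shortage/surplus in each row
    return np.sum(2.0 * np.maximum(dev, 0) + 5.0 * np.maximum(-dev, 0))

expected, _ = quad(recourse_cost, 0.0, 2.0)
expected /= 2.0                             # density of U[0, 2] is 1/2
print(x.sum() + expected)                   # z = x1 + x2 + E[recourse]
```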

Table I. Comparison of alternative refinements. Columns: I = number of partitions, II = number of L-shaped iterations, III = number of simplex iterations, IV = CPU seconds. The simplex iteration count for Method 1 includes iterations in problem (2).

                          Method 1                Method 2
Ex   H              I    II   III    IV      I    II   III    IV
1    (1,1,1)        4    11   132   .25      8    11   180   .45
2    (2,1,1)        7     9   187   .38      8     6    96   .30
3    (1.5,1,1)      3     8    91   .16     15    12   314  1.16
4    (2,0,1)        2     9    46   .08      8     8   117   .37
5    (1.5,1.5,0)    8    11   228   .48      -   50+     -   .8+
6    (1.5,0.5,1)   12    10   290   .86      -   50+     -   .8+

REFERENCES

[1] J. F. Benders, Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4 (1962) 238-252.
[2] J. R. Birge, Using sequential approximations in the L-shaped and generalized programming algorithms for stochastic linear programs. Technical Report 83-12, Department of Industrial and Operations Engineering, The University of Michigan (1983).
[3] J. R. Birge, Decomposition and partitioning methods for multistage stochastic linear programs. Operations Research, to appear.
[4] J. R. Birge, An L-shaped method computer code for multi-stage stochastic linear programs. In Y. Ermoliev and R. J-B. Wets (eds.), Numerical Methods in Stochastic Programming, to appear.
[5] J. R. Birge and R. J-B. Wets, Designing approximation schemes for stochastic optimization problems, in particular for stochastic programs with recourse. Working Paper WP-83-111, IIASA, Laxenburg, Austria (1983).
[6] H. Gassmann and W. T. Ziemba, A tight upper bound for the expectation of a convex function of a multivariate random variable. Mathematical Programming Study, to appear.
[7] P. Kall and D. Stoyan, Solving stochastic programming problems with recourse including error bounds. Math. Operationsforsch. Statist., Ser. Optimization 13 (1982) 431-447.
[8] R. Van Slyke and R. J-B. Wets, L-shaped linear programs with applications to optimal control and stochastic programming. SIAM J. Appl. Math. 17 (1969) 638-663.
[9] S. W. Wallace, A two-stage stochastic facility-location problem with time-dependent supply. In Y. Ermoliev and R. J-B. Wets (eds.), Numerical Methods in Stochastic Programming, to appear.
[10] S. W. Wallace, On network structured stochastic optimization problems. Report No. 842555-8, Chr. Michelsen Institute, Bergen, Norway (1984).
[11] R. J-B. Wets, Large scale programming techniques in stochastic programming. Working Paper WP-84-90, IIASA, Laxenburg, Austria (1984). To appear in a volume on stochastic programming edited by Prekopa and Wets.
[12] R. J-B. Wets, Algorithmic procedures for stochastic optimization. Report No. 842650-2, Chr. Michelsen Institute, Bergen, Norway (1984).
