A Quadratic Recourse Function for the Two-Stage Stochastic Program

John R. Birge, Stephen M. Pollock
Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, MI 48109-2117, USA

Liqun Qi
School of Mathematics, The University of New South Wales, Sydney, New South Wales 2052, Australia

Technical Report 95-13

July 28, 1995

Abstract

We present a quadratic recourse representation of the two-stage stochastic linear problem. Unlike the usual linear recourse model, it is differentiable with respect to the first-stage decision variables. This offers the possibility of applying methods with high convergence rates to solve the two-stage problem. We show that the quadratic recourse function approximates the linear recourse function (and the corresponding solution of the two-stage problem with quadratic recourse converges to the solution of the two-stage problem with linear recourse) as a parameter $k \to \infty$ and another parameter $\epsilon_k \to 0$. We also give a bound for this approximation.

The work of J.R. Birge was supported in part by the National Science Foundation under Award Number DDM-9215921. The work of L. Qi was supported in part by the Australian Research Council.

1 Introduction

One of the main paradigms of stochastic programming is represented by the two-stage stochastic linear program, formulated as a master problem and a recourse problem [8, 9]. The master problem is

$$\min \; f(x) = c^t x + \Phi(x) \quad \text{s.t.} \quad Ax = b, \;\; x \ge 0, \tag{1}$$

where $x \in \mathbb{R}^n$ is the first-stage decision vector, $c \in \mathbb{R}^n$ is the cost coefficient vector for $x$, and $Ax = b$ and $x \ge 0$ are the linear constraints on $x$, with $b \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times n}$. $\Phi(x)$ is the expected value of the linear recourse function, $\Phi(x) = E(\phi(x, \xi))$, where $\phi(x, \xi)$ is defined by the recourse problem

$$\phi(x, \xi) = \min \; q(\xi)^t y \quad \text{s.t.} \quad W(\xi)\, y = T(\xi)\, x - h(\xi), \;\; y \ge 0. \tag{2}$$

Here $\xi$ is a random variable of dimension $r$ with distribution $P(\cdot)$, so by definition

$$\Phi(x) = E(\phi(x, \xi)) = \int \phi(x, \xi)\, P(d\xi). \tag{3}$$

The vector $y \in \mathbb{R}^s$ is the second-stage decision vector, $h \in \mathbb{R}^r$ is the demand or resource vector, and $T \in \mathbb{R}^{r \times n}$ is a technology matrix. The usual origin of the constraint equations in (2) is the desire to satisfy the condition $T(\xi)x - h(\xi) = 0$. However, because of the random nature of both $T$ and $h$, $T(\xi)x - h(\xi)$ does not equal zero in general. Thus $Wy$ is introduced to represent the "discrepancy" (where $W \in \mathbb{R}^{r \times s}$), and $q \in \mathbb{R}^s$, $q \ge 0$, is the associated cost coefficient vector for nonzero $y$.

To simplify our discussion, we only treat in detail the fixed recourse problem, in which $W$ does not depend upon $\xi$. We also assume that $T$ and $q$ are independent of $\xi$ and that $h = \xi$, conditions corresponding to uncertain resource levels but known prices and technologies. Equation (2) then becomes

$$\phi(x, \xi) = \min \; q^t y \quad \text{s.t.} \quad Wy = Tx - \xi, \;\; y \ge 0. \tag{4}$$

For convenience, we let $\xi$ be a discrete random vector, so that

$$\Phi(x) = \sum_{j=1}^{l} \phi(x, \xi_j)\, p_j, \tag{5}$$

where $p_j > 0$ and $\sum_{j=1}^{l} p_j = 1$. Our results can be readily extended to continuous distributions.

There are two disadvantages associated with the two-stage stochastic program with fixed recourse represented by equations (1), (5) and (4). First, generally $\Phi$ is
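Before turning to the quadratic recourse, the objects in (1)-(5) can be made concrete numerically. The following sketch (not from the paper; the data $q$, $W$, $T$ and the scenarios $\xi_j$ are assumptions chosen for illustration) evaluates the linear recourse function (4) with a linear programming solver and the expectation (5) as a probability-weighted sum.

```python
# Illustrative sketch: evaluating the linear recourse phi(x, xi) of (4)
# and its expectation Phi(x) of (5).  All data below are assumed demo values.
import numpy as np
from scipy.optimize import linprog

def phi(x, xi, q, W, T):
    """phi(x, xi) = min { q^t y : W y = T x - xi, y >= 0 }."""
    res = linprog(q, A_eq=W, b_eq=T @ x - xi, bounds=[(0, None)] * len(q))
    assert res.status == 0, "recourse LP infeasible or unbounded"
    return res.fun

def Phi(x, scenarios, probs, q, W, T):
    """Phi(x) = sum_j p_j phi(x, xi_j) over a discrete distribution."""
    return sum(p * phi(x, xi, q, W, T) for xi, p in zip(scenarios, probs))

# W = [I, -I] gives complete recourse: W y = T x - xi is feasible for any
# right-hand side, with y acting as surplus/shortage variables.
q = np.ones(4)
W = np.hstack([np.eye(2), -np.eye(2)])
T = np.eye(2)
x = np.array([1.0, 1.0])
scenarios = [np.array([0.5, 0.5]), np.array([1.5, 0.5])]
probs = [0.5, 0.5]
print(Phi(x, scenarios, probs, q, W, T))  # expected discrepancy cost
```

Each scenario costs one LP solve, which is the usual starting point for decomposition methods on (1).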

quadratic recourse function:

$$\phi_k(x, \xi) = \min \; (q^t y)^2 + k\, \|Wy - Tx + \xi\|^2 + \epsilon_k \quad \text{s.t.} \quad y \ge 0, \tag{7}$$

where $\|v\|^2 = v^t v$ is the squared 2-norm of the vector $v$, and $k$ and $\epsilon_k$ are two positive parameters. Note that $\phi_k(x, \xi) \ge \epsilon_k > 0$, so there is a clearly defined $\sqrt{\phi_k}$ satisfying equation (7). Correspondingly, (1) and (5) are replaced by

$$\min \; f_k(x) = c^t x + \Phi_k(x) \quad \text{s.t.} \quad Ax = b, \;\; x \ge 0, \tag{8}$$

where

$$\Phi_k(x) = E(\phi_k(x, \xi)) = \sum_{j=1}^{l} \phi_k(x, \xi_j)\, p_j. \tag{9}$$

We may think of (7) as an alternative representation of the situation expressed by the equations of (6). Instead of minimizing $(q^t y)^2$ subject to the constraint $Wy = Tx - \xi$, we minimize the weighted sum of $(q^t y)^2$ and $\|Wy - Tx + \xi\|^2$, with $k$ reflecting the relative importance of satisfying the constraint $Wy = Tx - \xi$ compared to minimizing $(q^t y)^2$. The additional parameter $\epsilon_k$ is added to ensure that $\phi_k \ge \epsilon_k > 0$. This will be useful for establishing differentiability properties of $\phi_k$.

[We could also replace $k$ by an $r \times r$ diagonal matrix $K$ such that each diagonal element of $K$ "weights" a component of $Wy - Tx + \xi$. In this case, (7) would have the form

$$\phi_K(x, \xi) = \min \; (q^t y)^2 + (Wy - Tx + \xi)^t K (Wy - Tx + \xi) + \epsilon_k \quad \text{s.t.} \quad y \ge 0. \tag{10}$$

Again, for simplicity, we restrict our discussion to the form (7), although the results below can be shown to hold under the conditions $\|K\| \to \infty$ and $\epsilon_k \to 0$.]

The idea of converting the objective function of (2) to a quadratic form is not new. For example, the ELQP model considers the dual problem of (4):

$$\phi(x, \xi) = \max \; (Tx - \xi)^t u \quad \text{s.t.} \quad W^t u \le q, \tag{11}$$

where $u$ is the dual variable vector. ELQP adds a quadratic term $-\frac{1}{2} u^t H u$ to (11), to obtain

$$\phi_H(x, \xi) = \max \; (Tx - \xi)^t u - \tfrac{1}{2} u^t H u \quad \text{s.t.} \quad W^t u \le q, \tag{12}$$

where $H$ is an $r \times r$ positive definite symmetric monitor matrix. As a result of formulations such as (12), superlinearly convergent algorithms have been developed by Qi and Womersley [16], Chen, Qi and Womersley [5], and Birge, Chen and Qi [2].
However, the model represented by (12) has no immediate relation to the original linear recourse model (4), and, in particular, it is hard to interpret the "meaning" of the matrix $H$. Moreover, although we may let $\|H\| \to 0$ so that $\phi_H \to \phi$, this results in computational instability in the second-order algorithms used to solve the two-stage recourse problem with recourse $\phi_H$. On the other hand, as we show below, when we solve the quadratic version of the original problem, not only does $\phi_k$ approximate $\phi$ as $k \to \infty$, but the algorithms used to solve the problem represented by (8), (9) and (7) are stable as $k \to \infty$.
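For fixed $x$ and $\xi$, the objective of (7) is itself a sum of squares: with $z = Tx - \xi$, it equals $\|\bar W y - \bar z\|^2 + \epsilon_k$ for the stacked matrix $\bar W = (q, \sqrt{k}\, W^t)^t$ and $\bar z = (0, \sqrt{k}\, z^t)^t$, so it can be evaluated by nonnegative least squares. The sketch below (illustrative data, not the authors' code) uses this observation.

```python
# Sketch: evaluating the quadratic recourse (7) as a nonnegative
# least-squares problem.  Data are assumed demo values.
import numpy as np
from scipy.optimize import nnls

def phi_k(x, xi, q, W, T, k, eps_k):
    """phi_k(x, xi) of (7): (q^t y)^2 + k ||W y - z||^2 + eps_k, y >= 0."""
    z = T @ x - xi
    Wbar = np.vstack([q, np.sqrt(k) * W])          # stacked matrix
    zbar = np.concatenate([[0.0], np.sqrt(k) * z])
    y, resid = nnls(Wbar, zbar)                    # resid = ||Wbar y - zbar||
    return resid ** 2 + eps_k, y

# As k grows, sqrt(phi_k - eps_k) approaches the linear recourse value
# (here 1.0 for this data; see Section 4 for the precise statement).
q = np.ones(4)
W = np.hstack([np.eye(2), -np.eye(2)])
T = np.eye(2)
x, xi = np.array([1.0, 1.0]), np.array([0.5, 0.5])
for k in (1.0, 100.0, 1e6):
    val, _ = phi_k(x, xi, q, W, T, k, eps_k=1e-3)
    print(k, np.sqrt(val - 1e-3))
```

The quadratic subproblem thus stays cheap (a small NNLS solve) while being smooth in $x$, which is the point of the construction.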

3 Differentiability of the Quadratic Recourse Function

By the theory of linear programming, $\phi(x, \xi)$, defined by (4), is not in general differentiable with respect to $x$. This makes it impossible to apply superlinearly convergent methods, such as the Newton method, to solve the stochastic program defined by (1), (4) and (5). Classically, superlinear convergence of a Newton method for a nonlinear optimization problem requires that the objective and the constraint functions of the problem be twice continuously differentiable. In [14], based upon the superlinear convergence theory for nonsmooth equations [11, 13, 15], Qi developed superlinearly convergent generalized Newton methods for nonlinear optimization problems whose objective and constraint functions are $SC^1$, i.e., continuously differentiable with semismooth derivatives [12]. In general, an $SC^1$ function is not twice differentiable. A nonlinear optimization problem with an $SC^1$ objective function and linear constraints is called an $SC^1$ problem [12]. It has been shown that the ELQP is an $SC^1$ problem [14, 16]. The superlinearly convergent generalized Newton method proposed in [14] was globalized by using a line search and a trust region strategy in [12] and [7], respectively. These methods were applied to the ELQP in [2, 5, 16].

In this section, we show that the two-stage stochastic program defined by (8), (9) and (7) is also an $SC^1$ problem. This opens the way to applying the superlinearly and globally convergent generalized Newton methods of [14, 12, 7, 2] to this problem. Before doing this, we briefly review the definition of semismoothness of a vector function and the related concepts of generalized Jacobians.

Suppose that $F: \mathbb{R}^n \to \mathbb{R}^m$ is a Lipschitz vector function. By Rademacher's theorem, $F$ is differentiable almost everywhere. Let $D_F$ be the set of points where $F$ is differentiable.
At any point $x \in \mathbb{R}^n$, the B-differential [11, 13] of $F$ at $x$ is defined by

$$\partial_B F(x) = \Big\{ \lim_{y \to x,\, y \in D_F} \nabla F(y) \Big\},$$

which is a nonempty set. The Clarke subdifferential [6] of $F$ at $x$ is

$$\partial F(x) = \operatorname{conv} \partial_B F(x),$$

which is a nonempty convex set. If for any $h \in \mathbb{R}^n$,

$$\lim_{V \in \partial F(x + t h'),\, h' \to h,\, t \downarrow 0} \{ V h' \} \tag{13}$$

exists, then we say that $F$ is semismooth at $x$. In this case, $F$ is also directionally differentiable at $x$ and $F'(x; h)$ equals the limit in (13). The concept of semismooth functions was introduced for functionals by Mifflin in [10] and extended to vector functions by Qi and Sun in [15]. The following theorem establishes the $SC^1$ property of the stochastic two-stage problem with quadratic recourse.

Theorem 1 The stochastic program (8), where $\Phi_k$ is defined by (9) and (7), is an $SC^1$ problem.

Proof. Since $\phi_k \ge \epsilon_k > 0$, it suffices to prove that $\phi_k$ is an $SC^1$ function with respect to $x$. Define $z = Tx - \xi$ and $\psi_k(z) = \phi_k(x, \xi)$. Then rewriting (7) gives

$$\psi_k(z) = \min \; (q^t y)^2 + k \|Wy - z\|^2 + \epsilon_k \quad \text{s.t.} \quad y \ge 0. \tag{14}$$

If $\psi_k$ is differentiable with respect to $z$, then $\phi_k$ is differentiable with respect to $x$ and $\nabla_x \phi_k(x, \xi) = T^t \nabla_z \psi_k(z)$.

Let $\bar z = (\sqrt{k}\, z^t, 0)^t \in \mathbb{R}^{r+1}$, let $\bar W \in \mathbb{R}^{(r+1) \times s}$ be the matrix whose first $r$ rows are $\sqrt{k}\, W$ and whose last row is $q^t$, and let $u = \bar W y \in \mathbb{R}^{r+1}$. Then $U = \{ u = \bar W y : y \ge 0 \}$ is a polyhedron in $\mathbb{R}^{r+1}$. Define $g: \mathbb{R}^{r+1} \to \mathbb{R} \cup \{+\infty\}$ by

$$g(u) = \begin{cases} u^t u, & \text{if } u \in U, \\ +\infty, & \text{otherwise.} \end{cases}$$

Then $g$ is a closed proper extended-valued strongly convex function. Since $\|\bar W y - \bar z\|^2 = k \|Wy - z\|^2 + (q^t y)^2$ and $\bar z^t \bar z = k z^t z$, we can now write (14) as

$$\psi_k(z) = k z^t z - \max_{u \in U} \{ 2 \bar z^t u - u^t u \} + \epsilon_k \tag{15}$$
$$\quad\;\; = k z^t z - g^*(2 \bar z) + \epsilon_k, \tag{16}$$

where $g^*$ is the conjugate function of $g$. By Theorem 23.5 of [17], since $g$ is strongly convex, $g^*$ is finite and continuously differentiable everywhere, and its derivative at $2\bar z$ is the unique maximum point of the maximum operation in (15). Furthermore, the derivative of $g^*$ is Lipschitz. In fact, it is not difficult to see that the unique maximizer in (15) is piecewise linear with respect to $z$. Hence the derivative mapping of $g^*$ is semismooth [15]. This shows that $\psi_k$, and hence $\phi_k$, is an $SC^1$ function. Therefore, the stochastic programming problem defined by (8), (9) and (7) is an $SC^1$ optimization problem. □

Now we can apply the generalized Newton (SQP) methods proposed in [14, 12, 7] to solve (8). Under an adequate nonsingularity condition, these methods are superlinearly and globally convergent. In the next section, we will show that $\psi_k(z) - \epsilon_k$ converges to $\psi(z) \equiv \phi^2(x, \xi)$ as $k \to \infty$, and give an error bound. This shows that the generalized Newton method is also stable for this problem.

It is noted that, although $g^*$ is convex, $\psi_k$ and $\Phi_k$ are not convex in general. In fact, by (15), if $W$ is a nonnegative matrix, then $\psi_k$ is the difference of two convex functions of $z$.
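The differentiability asserted by Theorem 1 can be probed numerically. Applying the envelope theorem to (14) gives $\nabla \psi_k(z) = 2k\,(z - W y_k)$, where $y_k$ attains the minimum (this closed form is our derivation, not stated in the paper). A hedged check against central finite differences, on assumed toy data:

```python
# Sketch: verify the envelope-theorem gradient of psi_k in (14) by
# finite differences.  Data (q, W, k, eps_k, z) are assumed demo values.
import numpy as np
from scipy.optimize import nnls

q = np.ones(4)
W = np.hstack([np.eye(2), -np.eye(2)])
k, eps_k = 50.0, 1e-3

def psi_k(z):
    """psi_k(z) of (14) via nonnegative least squares; returns (value, y_k)."""
    Wbar = np.vstack([q, np.sqrt(k) * W])
    zbar = np.concatenate([[0.0], np.sqrt(k) * z])
    y, resid = nnls(Wbar, zbar)
    return resid ** 2 + eps_k, y

z = np.array([0.7, -0.3])
val, y_k = psi_k(z)
grad_env = 2.0 * k * (z - W @ y_k)      # envelope-theorem gradient

h = 1e-6                                 # central finite differences
grad_fd = np.zeros(2)
for i in range(2):
    e = np.zeros(2); e[i] = h
    grad_fd[i] = (psi_k(z + e)[0] - psi_k(z - e)[0]) / (2 * h)
print(grad_env, grad_fd)                 # the two should agree closely
```

The agreement reflects the smoothness established in the proof; for the linear recourse (4) no such gradient exists in general.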
If $P$ is continuous, then with an argument similar to that in [3] we can show that $\Phi_k$ is twice differentiable. Superlinear convergence can then be established for quasi-Newton methods solving (8).
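To illustrate the contrast driving this section, consider the scalar toy problem $\min\{y_1 + y_2 : y_1 - y_2 = z,\ y \ge 0\}$ (an assumption for illustration, not from the paper): its linear recourse value is $|z|$, with a kink at $z = 0$, while its quadratic counterpart (14) is differentiable there.

```python
# Sketch: one-sided slopes at z = 0 for the linear recourse (kinked)
# versus the quadratic recourse (14) (smooth).  Toy data only.
import numpy as np
from scipy.optimize import linprog, nnls

def phi(z):
    """Linear recourse min { y1 + y2 : y1 - y2 = z, y >= 0 } = |z|."""
    res = linprog([1.0, 1.0], A_eq=[[1.0, -1.0]], b_eq=[z],
                  bounds=[(0, None)] * 2)
    return res.fun

def psi_k(z, k=100.0, eps_k=0.0):
    """Quadratic recourse (14): min_{y>=0} (y1+y2)^2 + k (y1-y2-z)^2 + eps_k."""
    Wbar = np.array([[1.0, 1.0], [np.sqrt(k), -np.sqrt(k)]])
    zbar = np.array([0.0, np.sqrt(k) * z])
    _, resid = nnls(Wbar, zbar)
    return resid ** 2 + eps_k

h = 1e-4
# one-sided slopes of phi at 0 disagree: the value function has a kink
print((phi(h) - phi(0.0)) / h, (phi(0.0) - phi(-h)) / h)
# one-sided slopes of psi_k at 0 agree (both near 0): the kink is smoothed
print((psi_k(h) - psi_k(0.0)) / h, (psi_k(0.0) - psi_k(-h)) / h)
```

The first pair of slopes is roughly $+1$ and $-1$; the second pair is near zero on both sides, which is exactly the regularity that lets generalized Newton methods apply.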

4 Approximation to the Linear Recourse Function

In this section, we show that $\psi_k(z)$ approximates $\psi(z)$ monotonically from below as $k \to \infty$, and give an error bound for this approximation.

Theorem 2 Suppose that (4) is feasible for $z = Tx - \xi$. Then for any $0 < k < K$, we have

$$\sqrt{\phi_k(x,\xi) - \epsilon_k} \;\le\; \sqrt{\phi_K(x,\xi) - \epsilon_K} \;\le\; \phi(x,\xi).$$

Proof. Let $y_k$ and $y_K$ be solutions of the minimum operations in the definitions (14) of $\psi_k(z)$ and $\psi_K(z)$, respectively, and let $y^*$ be an optimal solution of (4). Then we have

$$\psi_k(z) - \epsilon_k = (q^t y_k)^2 + k \|W y_k - z\|^2$$
$$\le (q^t y_K)^2 + k \|W y_K - z\|^2 \quad \text{by optimality of } y_k$$
$$\le (q^t y_K)^2 + K \|W y_K - z\|^2 \quad \text{since } k < K$$
$$= \psi_K(z) - \epsilon_K$$
$$\le (q^t y^*)^2 + K \|W y^* - z\|^2 \quad \text{by optimality of } y_K$$
$$= (q^t y^*)^2 \quad \text{by feasibility of } y^*$$
$$= \phi(z)^2. \tag{17}$$

The conclusion of the theorem follows by taking square roots (noting that $q^t y \ge 0$). □

Corollary 1 In Theorem 2, as $k \to \infty$, $\|W y_k - z\| \to 0$, where $y_k$ is a solution of the minimum operation in the definition of $\psi_k(z)$.

Proof. By the chain of inequalities (17), $0 \le k \|W y_k - z\|^2 \le \phi(z)^2 - (q^t y_k)^2 \le \phi(z)^2 < \infty$. The conclusion follows since $\phi(z)$ is finite. □

The next theorem shows that as $k \to \infty$, $\sqrt{\phi_k(x,\xi) - \epsilon_k}$ converges to $\phi(x,\xi)$, and gives an error bound.

Theorem 3 Suppose that (4) is feasible for $z = Tx - \xi$ and that $\alpha$ is the maximum of the 2-norms of the optimal dual solutions of (4). Then $\sqrt{\phi_k(x,\xi) - \epsilon_k}$ monotonically converges to $\phi(x,\xi)$ from below and, for $k$ large enough,

$$0 \;\le\; \phi(x,\xi) - \sqrt{\phi_k(x,\xi) - \epsilon_k} \;\le\; \frac{\alpha\, \phi(x,\xi)}{\sqrt{k}}. \tag{18}$$

Proof. Let $y_k$ and $y^*$ be as in Theorem 2. Since the quadratic term in (14) is constant over $\{ y \ge 0 : Wy = W y_k \}$, $y_k$ solves

$$q^t y_k = \min \; q^t y \quad \text{s.t.} \quad Wy = W y_k, \;\; y \ge 0, \tag{19}$$

while $y^*$ solves

$$\phi(x,\xi) = \min \; q^t y \quad \text{s.t.} \quad Wy = z, \;\; y \ge 0. \tag{20}$$

By Corollary 1, (19) can be regarded as a perturbation of (20) with a small change $W y_k - z$ on the right-hand side of the constraint of the linear program (20). By the perturbation theory of linear programming, we have

$$|\phi(x,\xi) - q^t y_k| \le \alpha \|W y_k - z\|.$$

By the proof of Corollary 1,

$$\|W y_k - z\| \le \frac{\phi(x,\xi)}{\sqrt{k}} = \frac{q^t y^*}{\sqrt{k}}.$$

By Theorem 2 and the definition (14), $q^t y_k \le \sqrt{\phi_k(x,\xi) - \epsilon_k} \le \phi(x,\xi)$. Combining these three inequalities, we have (18). The last conclusion follows. □

Corollary 2 If $\epsilon_k \to 0$ as $k \to \infty$, then $\sqrt{\phi_k(x,\xi)}$ converges to $\phi(x,\xi)$.

Proof. The conclusion follows from (18) and Theorem 2. □

Example. To see how these bounds appear in practice, consider the following simple example (where $\epsilon_k = 0$ for illustration):

$$\phi(x,\xi) = \min \; y_1 + y_2 + y_3 \quad \text{s.t.} \quad y_1 + y_3 = z_1, \;\; y_2 + y_3 = z_2, \;\; y \ge 0. \tag{21}$$

Figures 1 and 2 show the convergence of $\psi_k$ for $z = (1, 0.5)$ and $z = (1, 1)$. Note that the convergence is somewhat faster for $z = (1, 1)$. In this case, the dual solution to (21) is not unique but has the same maximum norm, $\alpha = 1$, as for $z = (1, 0.5)$. The convergence behavior for $z = (1, 1)$ may benefit from a smaller average of the norms of the dual solutions.

Theorem 3 also gives us the conditions for the optimal solutions of (7) to converge to an optimal solution of (2). We use the following two results from the theory of epi-convergence ([1]). A sequence of functions $\{f^\nu\}$ epi-converges to $f$ if the epigraphs of $f^\nu$ converge as sets to the epigraph of $f$.

[Figure 1: Convergence of $\psi_k$ to $\psi$ for $z = (1, 0.5)$; $k$ ranges from 0 to 80.]

[Figure 2: Convergence of $\psi_k$ to $\psi$ for $z = (1, 1)$; $k$ ranges from 0 to 40.]
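The convergence shown in Figures 1 and 2 can be reproduced numerically for example (21). The sketch below (our reconstruction, not the authors' code) evaluates $\sqrt{\psi_k(z)}$ with $\epsilon_k = 0$ by nonnegative least squares; by Theorems 2 and 3 the values increase toward $\phi(z) = 1$ at both points, with gap at most $\alpha \phi(z)/\sqrt{k} = 1/\sqrt{k}$.

```python
# Sketch: sqrt(psi_k(z)) for example (21), mirroring Figures 1 and 2.
import numpy as np
from scipy.optimize import nnls

# Example (21): q = (1, 1, 1), W = [[1, 0, 1], [0, 1, 1]], eps_k = 0.
q = np.ones(3)
W = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

def sqrt_psi_k(z, k):
    """sqrt of the quadratic recourse value (14) with eps_k = 0."""
    Wbar = np.vstack([q, np.sqrt(k) * W])
    zbar = np.concatenate([[0.0], np.sqrt(k) * np.asarray(z)])
    _, resid = nnls(Wbar, zbar)
    return resid   # = sqrt((q^t y)^2 + k ||W y - z||^2) at the optimum

for z in ([1.0, 0.5], [1.0, 1.0]):
    print(z, [round(sqrt_psi_k(z, k), 4) for k in (1, 10, 100, 1000)])
# both sequences increase toward phi(z) = 1, within alpha/sqrt(k) of it
```

As in the figures, the $z = (1, 1)$ sequence approaches the limit somewhat faster.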

Theorem 4 ([4], Theorem 2.3) Suppose a sequence of functions $\{g^\nu, \nu = 1, \ldots\}$ epi-converges to $g$. Then $\limsup_\nu\, (\inf g^\nu) \le \inf g$ and, if $x^k \in \operatorname{argmin} g^{\nu_k}$ for some subsequence $\{g^{\nu_k}\}$ of functions from $\{g^\nu\}$, and $x = \lim_k x^k$, then $x \in \operatorname{argmin} g$ and

$$\lim_k\, (\inf g^{\nu_k}) = \inf g. \tag{22}$$

The next result gives conditions under which pointwise convergence implies epi-convergence.

Theorem 5 ([1], Proposition 3.12) Suppose a sequence of functions $\{g^\nu, \nu = 1, \ldots\}$ converges pointwise everywhere to $g$, i.e., $\lim_\nu g^\nu(x) = g(x)$. If the $g^\nu$ are monotone increasing, or monotone decreasing, and $g$ is lower semicontinuous, then $g^\nu$ epi-converges to $g$.

The following theorem follows from these observations.

Theorem 6 ([4], Theorem 2.7) Suppose $\{g, g^\nu, \nu = 1, \ldots\}$ is a collection of functions defined for $\xi \in \Xi$, with values in $\mathbb{R} \cup \{\infty\}$, such that for all $x$, $\xi \mapsto g(x, \xi)$ is measurable and, if $P[g(x,\xi) < \infty] = 1$, then $\int_\Xi g(x,\xi)\, P(d\xi) < \infty$. If $g^\nu(x, \xi)$ epi-converges to $g(x, \xi)$ for all $\xi \in \Xi$, and all $g^\nu$ are absolutely bounded by integrable functions, then the expectation functionals $\int_\Xi g^\nu(x,\xi)\, P(d\xi)$ epi- and pointwise converge to $\int_\Xi g(x,\xi)\, P(d\xi) < \infty$.

Now we can state our result for the convergence of the optimal solutions.

Theorem 7 Suppose that $\xi$ has finite second moments, and let $\{x^k, \pi^k\}$ be a sequence of optimal primal-dual solution pairs to (8), where the multipliers $\pi^k$ are associated with the linear constraints $Ax = b$. If $\{x^k, \pi^k\}$ has a limit point $(x^*, \pi^*)$, with $\epsilon_k \to 0$, then $(x^*, \pi^*)$ is an optimal primal-dual solution pair of (1).

Proof. If $\xi$ has second moments, then it can be shown easily that $\phi$ is absolutely integrable (see, for example, [8]) and that, if $P[\phi(x,\xi) < \infty] = 1$, then $\int \phi(x,\xi)\, P(d\xi) < \infty$. The extensions of these results to $\phi_k$ follow, for example, from Theorem 3. To invoke Theorem 6, let

$$\bar\phi_k(x, \xi) = \sqrt{\phi_k(x,\xi) - \epsilon_k}$$

and define

$$\bar g_k(x, \xi) = \bar\phi_k(x, \xi) + \delta_{\{x \,:\, Ax = b,\; x \ge 0\}}(x),$$

where the indicator function $\delta_S(x) = 0$ if $x \in S$ and $+\infty$ if $x \notin S$.
Using the bound in Theorem 3 and the result of Theorem 5, we then meet the conditions of Theorem 6. If we replace $f_k$ in (8) with $\bar f_k$, in which $\Phi_k$ is replaced by $\bar\Phi_k(x) = \sum_j \bar\phi_k(x, \xi_j)\, p_j$, then the expectation functionals $\bar f_k$ converge. Using Theorem 4, we have that limit points of minimizers of $\bar f_k$ are optimal in (1).

Next, we must relate $\bar f_k$ to $f_k$. Note that the optimality conditions for $\bar f_k$ yield $\bar x^k$ and $\bar \pi^k$ such that

$$c + \nabla \bar\Phi_k(\bar x^k) - A^t \bar\pi^k = \bar\mu^k \ge 0, \quad (\bar\mu^k)^t \bar x^k = 0. \tag{23}$$

For $f_k$, the optimality conditions are

$$c + \nabla \Phi_k(x^k) - A^t \pi^k = \mu^k \ge 0, \quad (\mu^k)^t x^k = 0. \tag{24}$$

Note that for any $x$ such that $\phi_k(x, \xi)$ is finite for all $\xi$, $\nabla \bar\phi_k$ is continuous and, thus,

$$\nabla \bar\Phi_k(x) = \int \nabla_x \bar\phi_k(x,\xi)\, P(d\xi) = \int \frac{\nabla_x \phi_k(x,\xi)}{2\sqrt{\phi_k(x,\xi) - \epsilon_k}}\, P(d\xi) = \lambda_k \nabla \Phi_k(x), \tag{25}$$

where $\lambda_k \to 1$ as $k \to \infty$. For any limit point $(\bar x, \bar\pi)$ of $(x^k, \pi^k)$, we must have, from (24),

$$c + \nabla \bar\Phi(\bar x) - A^t \bar\pi = \bar\mu \ge 0, \tag{26}$$

where $\bar\Phi(x) = \lim_k \bar\Phi_k(x)$. Thus $(\bar x, \bar\pi)$ forms an optimal primal-dual pair in (8) with $\bar\Phi$ replacing $\Phi_k$, and $(\bar x, \bar\pi)$ must be optimal in (1) by the argument above. □

5 Conclusions

We have presented a quadratic approximation of the stochastic program with linear recourse. The approximation uses a linear-quadratic loss function similar to that used in decision theory. It also allows the use of superlinearly convergent methods that cannot be applied directly to the linear recourse problem. We showed that the approximation converges and that solutions of the approximation also converge to those of the original problem.

Our interest in this form of quadratic approximation extends beyond an advantage for fast computational methods. We believe that the approach may also enable further approximations using known moments of the relevant random variables.

References

[1] H. Attouch and R. J.-B. Wets, Approximation and convergence in nonlinear optimization, in: O.L. Mangasarian, R.R. Meyer and S.M. Robinson, eds., Nonlinear Programming 4, Academic Press, New York-London, 1981.

[2] J.R. Birge, X. Chen and L. Qi, A stochastic Newton method for solving stochastic quadratic programs with recourse, AMR 94/9, Applied Mathematics Report, University of New South Wales, 1994.

[3] J.R. Birge and L. Qi, Subdifferentials in approximation for stochastic programming, SIAM Journal on Optimization 5 (1995) 436-453.

[4] J.R. Birge and R. J.-B. Wets, Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse, Mathematical Programming Study 27 (1986) 54-102.

[5] X. Chen, L. Qi and R.S. Womersley, Newton's method for quadratic stochastic programs with recourse, Journal of Computational and Applied Mathematics, forthcoming.

[6] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[7] H. Jiang and L. Qi, A globally and superlinearly convergent trust region algorithm for convex SC^1 minimization problems and its application to stochastic programs, AMR 94/7, Applied Mathematics Report, University of New South Wales, 1994.

[8] P. Kall, Stochastic Linear Programming, Springer-Verlag, Berlin, 1976.

[9] P. Kall and S.W. Wallace, Stochastic Programming, Wiley, New York, 1994.

[10] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.

[11] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization 3 (1993) 443-465.

[12] J.-S. Pang and L. Qi, A globally convergent Newton method for convex SC^1 minimization problems, Journal of Optimization Theory and Applications, forthcoming.

[13] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[14] L. Qi, Superlinearly convergent approximate Newton methods for LC^1 optimization problems, Mathematical Programming 64 (1994) 277-294.

[15] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-367.

[16] L. Qi and R.S. Womersley, An SQP algorithm for extended linear-quadratic problems in stochastic programming, Annals of Operations Research, forthcoming.

[17] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[18] R.T. Rockafellar and R. J.-B. Wets, A Lagrangian finite-generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Study 28 (1986) 63-93.