SUBDIFFERENTIAL CONVERGENCE IN STOCHASTIC PROGRAMS

John R. Birge
Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109

Liqun Qi
School of Mathematics
The University of New South Wales
Kensington, N.S.W. 2033, Australia

Technical Report 91-12
April 1991

SUBDIFFERENTIAL CONVERGENCE IN STOCHASTIC PROGRAMS *

John R. Birge
Department of Industrial and Operations Engineering
University of Michigan
Ann Arbor, MI 48109

and

Liqun Qi
School of Mathematics
The University of New South Wales
Kensington, N.S.W. 2033, Australia

(Revised on April 15, 1991)

Abstract

In this paper, we discuss the convergence behavior of subdifferentials in approximation schemes for stochastic programs. This information is useful for solving stochastic programs by nonlinear programming techniques. Wets [33] showed that epiconvergence of closed convex functions implies set convergence of the graphs of the subdifferentials of these functions. By a counterexample of Higle and Sen [14], this conclusion is not true in general. We show that epiconvergence of closed convex functions implies set convergence of the subdifferentials of these functions at points where the limit function is G-differentiable, and we apply this result to convex stochastic programs. We also show that similar results can be achieved in three other cases of expectational functionals: piecewise smooth integrands, continuous probability distributions, and loss functions. In the case of loss functions, we extend the existing results of Marti [19] to more general situations. Some basic methods using the approximate derivative information are also discussed.

Keywords: approximation, subdifferential, stochastic programming.

* This work is supported in part by the Australian Research Council. John Birge's work is also supported in part by Office of Naval Research Grant N0014-86-K-0628 and the National Science Foundation under Grant EECS-885101.

Section 1. Introduction

In general, a stochastic programming problem can be formulated as [36]:

    minimize    E{f_0(x, ξ)}
    subject to  E{f_i(x, ξ)} ≤ 0,  i = 1, ..., s,                 (1.1)
                E{f_i(x, ξ)} = 0,  i = s + 1, ..., m,
                x ∈ X ⊆ R^n,

where
- ξ is a random vector with support Ξ ⊆ R^N and a probability distribution function P on R^N,
- f_0 : R^n × Ξ → R ∪ {+∞},
- f_i : R^n × Ξ → R, i = 1, ..., m,
- X is closed,
- for all x ∈ X and i = 0, 1, ..., m, the expectational functional

      (Ef_i)(x) := E{f_i(x, ξ)} = ∫_Ξ f_i(x, ξ) dP(ξ)

  is finite.

The stochastic program with recourse and the stochastic program with chance constraints are two special cases of this model [36]. The numerical solution of (1.1) is not an easy task [11][26]. One popular method is to use a discrete distribution to approximate P. In the case of the stochastic program with recourse, some approximations for the recourse function have also been proposed [5][6][16]. This results in an approximation of f_0(x, ξ). The resulting problem is a large-scale program, which can be solved either by decomposition methods [34] or perhaps by variants of interior point algorithms [2]. One may also propose methods to approximate f_i(x, ξ), for i = 1, ..., m, and solve the resulting relatively tractable problem. Most of the stochastic programming literature focuses on the convergence of the solution vector as a by-product of the epi-convergence of the approximating expectational functionals. Some recent papers on this topic include [5], [9], [12], [14], [15], [17], [18] and [30]. Can we also get some approximation information on the subdifferentials of the objective function and the constraints of (1.1)? This information is useful for solving stochastic programs by nonlinear programming techniques which use subgradients [11][19][20][23][24].
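As a concrete illustration of the discrete-approximation idea, here is a minimal sketch; the quadratic integrand and the three-point measure are invented for illustration and are not from the paper:

```python
def expected_value(f, x, atoms, probs):
    """Discrete approximation of the expectational functional
    (Ef)(x) = ∫ f(x, xi) dP(xi): replace P by a measure carried
    on finitely many atoms with given probabilities."""
    return sum(p * f(x, xi) for xi, p in zip(atoms, probs))

# toy integrand: quadratic tracking loss f(x, xi) = (x - xi)^2
f = lambda x, xi: (x - xi) ** 2

atoms = [-1.0, 0.0, 2.0]      # support of the discrete measure
probs = [0.25, 0.5, 0.25]     # probabilities (sum to 1)

val = expected_value(f, 1.0, atoms, probs)   # 0.25*4 + 0.5*1 + 0.25*1 = 1.75
```

Refining the atoms and probabilities toward P is exactly the approximation scheme whose subdifferential behavior the paper studies.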
In [31], Wang suggested solving stochastic programs by an approximate nonlinear programming method, i.e., solving the original problem by a nonlinear programming method and, at the kth step, using the kth approximation function value and its derivative or subgradient in place of the value and derivative or subgradient of the original problem. For this approach, approximation information about subdifferentials is especially useful. Wang used the term differential stability to describe this information.
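A minimal sketch of this scheme as we read it; the quadratic toy objective and its shifted approximations below are invented for illustration:

```python
def approximate_gradient_descent(grad_approx, x0, steps=50, lr=0.1):
    """Run a single descent loop, but at step k use the gradient of
    the k-th approximation (here nu(k) = k) in place of the gradient
    of the intractable exact expectational functional."""
    x = x0
    for k in range(1, steps + 1):
        x = x - lr * grad_approx(x, k)
    return x

# toy: exact objective (x - 1)^2; k-th approximation (x - 1 - 1/k)^2,
# whose minimizers 1 + 1/k converge to the true minimizer 1
grad_approx = lambda x, k: 2.0 * (x - 1.0 - 1.0 / k)
x_final = approximate_gradient_descent(grad_approx, x0=5.0)
```

The iterates track the minimizers of the successively better approximations, ending near the true minimizer without ever evaluating the exact objective.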

It seems that Marti [19] first addressed this problem. He considered a special class of expectational functionals: convex loss functions. This class is the most common expectational functional in the recourse problem. Wets [33] proved that epiconvergence of closed convex functions implies set convergence of the graphs of the subdifferentials of these functions. In Section 2, we show that epiconvergence of closed convex functions implies set convergence of the subdifferentials of these functions at points where the limit function is G-differentiable, and we apply this result to convex stochastic programs. In particular, combined with the recent epi-consistency results for convex stochastic programs of King and Wets [17], we establish general conditions for subdifferential convergence of approximation schemes for those problems with probability one. Based on this result, a stochastic quasigradient-type method (see [10]) is suggested at the end of that section. Recently, Higle and Sen [14] gave examples showing that epi-convergence does not imply subdifferential convergence in the nonconvex case. They also introduced a notion of subdifferential convergence which is weaker than the one we use here. Therefore, we cannot expect that nice subdifferential approximation behavior can be achieved as a bonus of epi-convergence in general. Since in practice we always use some approximation functions, it is important to ask in which cases nice subdifferential approximation behavior can be achieved. This behavior may be especially critical when subdifferential information is required by an algorithm or by sensitivity analysis. We show that nice approximation behavior of subdifferentials of expectational functionals can be achieved in three other cases, namely, piecewise smooth integrands, continuous distributions and loss functions. We study these three cases in Sections 3-5, respectively. The significance of piecewise smooth integrands is clear.
The continuous distribution approximation probably was first suggested by Wets [32]. In that paper, he suggested a piecewise linear distribution approximation. In [8], Dexter, Yu and Ziemba suggested using linear combinations of lognormal univariate distributions as approximations. Wets also discussed the possibility of continuous distribution approximations in [34] and [35]. Gassmann [13] discussed applications of normal distributions in stochastic programming. Important special cases of expectational functionals are loss functions. Marti [19] studied the subdifferential convergence behavior of this class of functions, as we noted before. Our results cover more general cases, and we show that strict differentiability everywhere can be obtained under reasonable conditions. Thus, in these special cases, differentiable methods may be used instead of the nondifferentiable methods that are otherwise necessary. The main uses for our results are in the ability to use general optimization techniques that do not require complete optimization for a single approximation and that allow differentiable techniques to be used

in intermediate approximation iterations. We discuss possible applications in algorithms in Section 6. We give general methods and show how approximations may be used without requiring optimization for each approximation ν. Certainly, this section gives only general ideas for possible uses of subdifferential approximation results. For more sophisticated nonlinear programming approaches in stochastic programming, the reader may refer to Marti [20], Marti and Fuchs [21], and Nazareth and Wets [23][24]. For the approximate nonlinear programming method, see Wang [31]. As said before, the issue of subdifferential convergence has been investigated by Marti [19], Wets [33], Higle and Sen [14] and Wang [31] in different contexts. While we study this issue again in this paper, we do not wish to deprecate other approaches, such as the stochastic quasigradient method [10][11], which depends upon sampling. Each method has its advantages in particular situations. Moreover, they can sometimes be employed together, as in the case discussed in Section 2.

Section 2. Convex Stochastic Programs

Consider the expectational functional Ef = E{f(·, ξ)}, where ξ is a random vector with support Ξ ⊆ R^N and f is an extended real-valued function on R^n × Ξ, as described in the introduction. One has

    Ef(x) = ∫ f(x, ξ) P(dξ),                                      (2.1)

where P is a probability measure defined on R^N. It is usually difficult to calculate Ef and its derivative or subdifferential. A popular approach is to approximate (2.1) by

    Ef^ν(x) = ∫ f(x, ξ) P_ν(dξ),                                  (2.2)

where {P_ν, ν = 1, ...} is a sequence of probability measures converging in distribution to the probability measure P. In the following, we use Ef^0 and P_0 instead of Ef and P for convenience. We denote the Lebesgue measure by m. If C ⊆ R^n is a closed convex set, then ψ*(·|C) is the support function of C, defined by

    ψ*(h|C) = sup{⟨x, h⟩ : x ∈ C}.
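To make the support function concrete, here is a small numeric sketch; the square and the two directions are our own example:

```python
def support(h, vertices):
    """Support function psi*(h|C) = sup{<x, h> : x in C} of a convex
    polytope C; the supremum is attained at one of its vertices."""
    return max(sum(hi * xi for hi, xi in zip(h, v)) for v in vertices)

# C = conv{(±1, ±1)}, the unit square in the max-norm
square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

v1 = support((1.0, 0.0), square)   # 1.0
v2 = support((1.0, 1.0), square)   # 2.0
```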
A sequence of closed convex sets {C_ν : ν = 1, ...} in R^n is said to converge to a closed convex set C in R^n if for any h ∈ R^n,

    lim_{ν→∞} ψ*(h|C_ν) = ψ*(h|C).

One may easily prove the following proposition:

Proposition 2.1 Suppose that C and C_ν, for ν = 1, ..., are closed convex sets in R^n. The following two statements are equivalent:

(a) C_ν converges to C as ν → +∞;
(b) a point x ∈ C if and only if there are x^ν ∈ C_ν such that x^ν → x. ∎

In the following, ∂ is the symbol for the generalized subdifferential in the sense of Clarke [7]. Suppose an extended real-valued function g : R^n → R ∪ {+∞} is locally Lipschitz at x, where x is in the essential domain of g. Then the Clarke directional derivative of g at x with respect to h ∈ R^n is g°(x; h), where

    g°(x; h) := limsup_{y→x, t↓0} [g(y + th) − g(y)]/t.

The subdifferential of g at x is then

    ∂g(x) := {u ∈ R^n : ⟨u, h⟩ ≤ g°(x; h), ∀h ∈ R^n}.

When g is convex, the above definition coincides with the definition in convex analysis:

    ∂g(x) := {u ∈ R^n : g(x + h) ≥ g(x) + ⟨u, h⟩, ∀h ∈ R^n}.

The conditions of the following theorem are the same as the conditions of Theorem 2.8 of [5]. We only slightly strengthen its conclusions for our further use.

Theorem 2.2 Suppose that
(i) {P_ν, ν = 1, ...} converges in distribution to P;
(ii) f(x, ·) is continuous on Ξ for each x ∈ D, where D = {x : Ef(x) < +∞} = {x : f(x, ξ) < +∞ a.s.};
(iii) f(·, ξ) is locally Lipschitz on D with Lipschitz constant independent of ξ;
(iv) for any x ∈ D and ε > 0, there exist a compact set S_ε and ν_ε such that for all ν ≥ ν_ε,

    ∫_{Ξ\S_ε} |f(x, ξ)| P_ν(dξ) < ε,

and, with V_x = {ξ : f(x, ξ) = +∞}, P(V_x) > 0 if and only if P_ν(V_x) > 0 for ν = 0, 1, ....
Then
(a) Ef^ν epi- and pointwise converges to Ef; if x, x^ν ∈ D for ν = 1, 2, ... and x^ν → x, then

    lim_{ν→∞} Ef^ν(x^ν) = Ef(x);                                  (2.3)

(b) Ef^ν, where ν = 0, 1, ..., is locally Lipschitz on D; furthermore, for each x ∈ D, {∂Ef^ν(x) : ν = 0, ...} is bounded;
(c) if x^ν ∈ D minimizes Ef^ν for each ν and x is a limiting point of {x^ν}, then x minimizes Ef.

Proof: The epi- and pointwise convergence were established in [5]. Let x, x^ν ∈ D, x^ν → x. Then

    lim_{ν→∞} |Ef^ν(x^ν) − Ef^ν(x)| ≤ lim_{ν→∞} ∫ |f(x^ν, ξ) − f(x, ξ)| P_ν(dξ)
                                    ≤ lim_{ν→∞} ∫ L_x ||x^ν − x|| P_ν(dξ)
                                    = lim_{ν→∞} L_x ||x^ν − x|| = 0,

where L_x is the Lipschitz constant of f(·, ξ) near x, which is independent of ξ by (iii). By pointwise convergence of Ef^ν at x, lim Ef^ν(x) = Ef(x). Putting these two results together, we have (2.3). This proves (a).

For any x ∈ D, y and z close to x, and ν = 0, 1, ...,

    |Ef^ν(y) − Ef^ν(z)| ≤ ∫ |f(y, ξ) − f(z, ξ)| P_ν(dξ) ≤ ∫ L_x ||y − z|| P_ν(dξ) = L_x ||y − z||,

where L_x is the Lipschitz constant of f(·, ξ) near x, which is independent of ξ by (iii). By [7], ∂Ef^ν(x) is a nonempty, compact convex set for each ν, and the 2-norms of subgradients in these subdifferentials are bounded by L_x. This proves (b).

By (b), the Ef^ν are lower semicontinuous functions. By (a), Ef^ν epi-converges to Ef. By Theorem 3.7 of [36], we get the conclusion of (c). This completes the proof. ∎

Certainly, the above proof is not difficult. We are interested in the conclusions. Since Ef^ν is locally Lipschitz on D, by [7], for each x ∈ D, ∂Ef^ν(x) exists and is a nonempty, compact convex set. Will ∂Ef^ν(x)

converge to ∂Ef(x) in the sense of set convergence we just defined? Furthermore, by the Rademacher Theorem, Ef^ν is G-differentiable a.e. on D. Will ∇Ef^ν(x) converge to ∇Ef(x) for each x ∈ D \ D_1, where m(D_1) = 0? These are the topics of investigation of this paper.

In the case of closed convexity, we may invoke Theorem 3 of Wets [33]. He proved that if g, g^ν : R^n → R ∪ {+∞}, ν = 1, 2, ..., are closed convex functions and {g^ν} epiconverges to g, then the graphs of the subdifferentials of the g^ν converge to the graph of the subdifferential of g, i.e., for any convergent sequence {(x^ν, u^ν) : u^ν ∈ ∂g^ν(x^ν)} with (x, u) as its limit, one has u ∈ ∂g(x); and for any (x, u) with u ∈ ∂g(x), there exists at least one such sequence {(x^ν, u^ν) : u^ν ∈ ∂g^ν(x^ν)} converging to it. However, in general it is not true that

    ∂g(x) = lim_{ν→∞} ∂g^ν(x)                                     (2.4)

even if x ∈ int(dom(g)). For example, let n = 1, g(x) = |x|, and

    g^ν(x) = |x|  if |x| ≥ 1/ν,    g^ν(x) = ν x²/2 + 1/(2ν)  if |x| < 1/ν,

for ν = 1, 2, .... Then g and the g^ν are closed convex functions and {g^ν} epiconverges to g. For ν = 1, 2, ..., g^ν is differentiable at x = 0 with ∇g^ν(0) = 0, but ∂g(0) = [−1, 1]. However, if g is G-differentiable at x, (2.4) is true. We prove this fact here.

Theorem 2.3 Suppose that g, g^ν : R^n → R ∪ {+∞}, ν = 1, 2, ..., are closed convex functions and {g^ν} epiconverges to g. Suppose further that g is G-differentiable at x. Then

    ∇g(x) = lim_{ν→∞} ∂g^ν(x).                                    (2.5)

Proof: By Theorem 3 of [33], any convergent subsequence of {u^ν ∈ ∂g^ν(x)} converges to ∇g(x). What we need to prove is that there exists ν_x such that for any ν ≥ ν_x, ∂g^ν(x) is nonempty, and {∂g^ν(x) : ν ≥ ν_x} is bounded. We now prove these.

Since g is G-differentiable at x, x ∈ int(dom(g)). By Corollary 5 of [33], g^ν(x) converges to g(x). Thus, for ν big enough, x ∈ dom(g^ν), i.e., ∂g^ν(x) is nonempty since g^ν is closed convex. For any ε > 0, since g is continuous at x, there exists t > 0 such that

    g(x + t e_i) < g(x) + ε,    g(x − t e_i) < g(x) + ε,

for i = 1, 2, ..., n, where e_i, i = 1, 2, ..., n, are the unit vectors in R^n. Since g^ν epiconverges to g, by the properties of epiconvergence [32][33][35], there exist y_i^ν → x + t e_i and z_i^ν → x − t e_i such that

    limsup_{ν→∞} g^ν(y_i^ν) ≤ g(x + t e_i) < g(x) + ε

and

    limsup_{ν→∞} g^ν(z_i^ν) ≤ g(x − t e_i) < g(x) + ε.

Then there exists ν_x such that for all ν ≥ ν_x, x ∈ dom(g^ν),

    g^ν(y_i^ν) < g(x) + 2ε,                                       (2.6)
    g^ν(z_i^ν) < g(x) + 2ε,                                       (2.7)
    g^ν(x) > g(x) − ε,                                            (2.8)
    ||y_i^ν − (x + t e_i)|| < t/(2n),                             (2.9)

and

    ||z_i^ν − (x − t e_i)|| < t/(2n),                             (2.10)

where (2.8) holds because g^ν(x) converges to g(x). By (2.9) and (2.10), for any y satisfying ||y − x|| ≤ t/(2n),

    y ∈ conv{y_1^ν, ..., y_n^ν, z_1^ν, ..., z_n^ν}.

By (2.6), (2.7) and the convexity of g^ν, for any y satisfying ||y − x|| ≤ t/(2n),

    g^ν(y) < g(x) + 2ε.                                           (2.11)

Now, for any u ∈ ∂g^ν(x) where ν ≥ ν_x, if u ≠ 0, let

    y = x + (t/(2n)) u/||u||.

By the convexity of g^ν,

    g^ν(y) ≥ g^ν(x) + ⟨u, y − x⟩ = g^ν(x) + (t/(2n)) ||u||.       (2.12)

By (2.12), (2.11) and (2.8),

    ||u|| ≤ 2n [g^ν(y) − g^ν(x)]/t < 6nε/t.

This proves that {∂g^ν(x) : ν ≥ ν_x} is bounded. The proof of this theorem is completed. ∎
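The counterexample preceding Theorem 2.3 can be checked numerically. This sketch assumes the standard quadratic smoothing of |x|; the exact formula for g^ν is garbled in our copy, so the smoothing below is our reconstruction:

```python
def g(x):                     # g(x) = |x|
    return abs(x)

def g_nu(x, nu):
    """Quadratic smoothing: equals |x| for |x| >= 1/nu, and is a
    quadratic (hence differentiable at 0) inside."""
    if abs(x) >= 1.0 / nu:
        return abs(x)
    return nu * x * x / 2.0 + 1.0 / (2.0 * nu)

def dg_nu(x, nu):             # derivative of g_nu
    if abs(x) >= 1.0 / nu:
        return 1.0 if x > 0 else -1.0
    return nu * x

# where g is differentiable, gradients converge: g'(0.3) = 1
grads = [dg_nu(0.3, nu) for nu in (1, 2, 4, 8)]   # 0.3, 0.6, 1.0, 1.0
# at the kink, dg_nu(0, nu) = 0 for every nu, while ∂g(0) = [-1, 1]
```

This illustrates both halves of the discussion: (2.4) fails at the kink x = 0, while the gradients converge at every point where g is G-differentiable, as Theorem 2.3 asserts.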

Remark: In fact, for any x ∈ int(dom(g)), there exists ν_x such that for any ν ≥ ν_x, ∂g^ν(x) is nonempty, and {∂g^ν(x) : ν ≥ ν_x} is bounded. Thus, for any x ∈ int(dom(g)), the right hand side of (2.4) is nonempty and always contained in the left hand side of (2.4). But equality does not necessarily hold, by our example. ∎

We now apply Theorem 2.3 to expectational functionals.

Corollary 2.4 In (2.1) and (2.2), suppose that f(·, ξ) is closed convex for each ξ ∈ Ξ and that Ef^ν epiconverges to Ef. Then for D = dom(Ef),
(d) there is a Lebesgue zero-measure set D_1 ⊆ D such that Ef is G-differentiable on D \ D_1, Ef is not G-differentiable on D_1, and for each x ∈ D \ D_1,

    lim_{ν→∞} ∂Ef^ν(x) = ∇Ef(x);

(e) for each x ∈ D,

    ∂Ef(x) = {lim_{ν→∞} u^ν : u^ν ∈ ∂Ef^ν(x^ν), x^ν → x}.

Proof: By the closed convexity of f(·, ξ), the Ef^ν are also closed convex for all ν. Now (d) follows from Theorem 2.3 and the differentiability property of convex functions, and (e) follows from Theorem 3 of [33]. ∎

One can now combine Corollary 2.4 with any epiconvergence results for convex stochastic programs. For example, one may combine Corollary 2.4 with Theorem 2.2.

Corollary 2.5 In the setting of Theorem 2.2, if f(·, ξ) is convex for each ξ ∈ Ξ, then (d) and (e) hold.

Proof: By Theorem 2.2, Ef^ν epi-converges to Ef. By the convexity of f(·, ξ), the Ef^ν are also convex for all ν. By (b), the Ef^ν are lower semicontinuous, thus closed convex. Now the conclusions follow from Corollary 2.4. ∎

One may also combine Corollary 2.4 with some recent results on epiconvergence, such as the results of Lepp [18]. Perhaps an interesting combination is with the results of King and Wets [17]. We now let P_ν be an empirical measure derived from an independent series of random observations {ξ^1, ..., ξ^ν}, each with common distribution P. Then for all x,

    Ef^ν(x) = (1/ν) Σ_{i=1}^{ν} f(x, ξ^i).                        (2.13)
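A sample-average sketch of (2.13): the exponential demand model and the holding-cost term below are invented for illustration; a subgradient of the sample average is simply the average of subgradients of the integrand.

```python
import random

random.seed(0)

def f(x, xi):
    """Convex integrand: newsvendor-style shortage plus holding cost."""
    return max(xi - x, 0.0) + 0.1 * x

def Ef_nu(x, samples):
    """Empirical approximation (2.13): average of f over the sample."""
    return sum(f(x, xi) for xi in samples) / len(samples)

def subgrad_nu(x, samples):
    # a subgradient of the sample average: -1 on {xi > x}, 0 elsewhere,
    # plus the derivative 0.1 of the smooth holding-cost term
    return sum(-1.0 for xi in samples if xi > x) / len(samples) + 0.1

samples = [random.expovariate(1.0) for _ in range(2000)]
val = Ef_nu(1.0, samples)
g = subgrad_nu(1.0, samples)
```

As ν grows, both the empirical values and these subgradients converge to those of Ef with probability one under the conditions of Corollary 2.6 below.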

Let (Ξ, A, P) be a probability space completed with respect to P. A closed-valued multifunction G mapping Ξ to R^n is called measurable if for all closed subsets C ⊆ R^n, one has

    G^{-1}(C) := {ξ ∈ Ξ : G(ξ) ∩ C ≠ ∅} ∈ A.

In the following, "with probability one" refers to the sampling probability measure on {ξ^1, ..., ξ^ν, ...} that is consistent with P (see [17] for details). Applying Theorem 2.3 of [17] and Corollary 2.4 of this paper, we have:

Corollary 2.6 Suppose that for each ξ ∈ Ξ, f(·, ξ) is closed convex and the epigraphical multifunction ξ → epi f(·, ξ) is measurable. Let Ef^ν be calculated by (2.13). If there exist a point x ∈ dom(Ef) and a measurable selection u(ξ) ∈ ∂f(x, ξ) with ∫ ||u(ξ)|| P(dξ) finite, then the conclusions of Corollary 2.4 hold with probability one. ∎

King and Wets [17] applied their results to the stochastic program with fixed recourse

    minimize    cx + ∫ Q(x, ξ) P(dξ)
    subject to  Ax = b,                                           (2.14)
                x ≥ 0,

where x ∈ R^n and

    Q(x, ξ) = inf{q(ξ) y : W y = T(ξ) x − ζ(ξ), y ≥ 0}.           (2.15)

It is a fixed recourse problem since W is deterministic. Combining their Theorem 3.1 with our Corollary 2.4, we have

Corollary 2.7 Suppose that the stochastic program (2.14) has fixed recourse (2.15) and that for all i, j, k, the random variables q_i ζ_j and q_i T_jk have finite first moments. If there exists a feasible point x of (2.14) with the objective function of (2.14) finite, then the conclusions of Corollary 2.4 hold with probability one for f(x, ξ) = cx + Q(x, ξ) + δ(x), where δ(x) = 0 if Ax = b, x ≥ 0, and δ(x) = +∞ otherwise. ∎
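For intuition, here is a toy one-row instance of the recourse function Q of (2.15), with Q and a subgradient computed from the closed-form optimal dual; in practice the dual multipliers would come from an LP solver. The costs, W, T and ζ below are invented for illustration:

```python
def recourse_and_subgrad(x, t=1.0, zeta=0.5):
    """Toy recourse value Q(x, xi) = min{ y1 + 2*y2 :
    y1 - y2 = t*x - zeta, y >= 0 }.  For this one-row problem the
    optimal dual multiplier is pi = 1 when the right-hand side is
    positive and pi = -2 when it is negative, so a subgradient of
    Q(., xi) at x is pi * t."""
    b = t * x - zeta
    pi = 1.0 if b > 0 else -2.0
    return max(b, -2.0 * b), pi * t

val, g = recourse_and_subgrad(2.0)    # rhs = 1.5: Q = 1.5, subgradient 1.0
```

Averaging such values and subgradients over a sample of ξ's gives exactly the approximate objective and subdifferential information discussed next.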

By Theorem 3.1 of [17], one may solve the approximation problem

    minimize    cx + (1/ν) Σ_{i=1}^{ν} Q(x, ξ^i)
    subject to  Ax = b,                                           (2.16)
                x ≥ 0,

instead of solving (2.14). If the solution of (2.16) converges as ν tends to infinity, then the limiting point is a solution of (2.14). Alternatively, by Corollary 2.7, one may directly solve (2.14) with a nonlinear programming method and use

    cx + (1/ν) Σ_{i=1}^{ν} Q(x, ξ^i)

and

    c + (1/ν) Σ_{i=1}^{ν} ∂_x Q(x, ξ^i)

as approximate objective function values and subdifferentials of (2.14), with ν = ν(k) at the kth step. Notice that u ∈ ∂_x Q(x, ξ^i) if and only if u = T(ξ^i)^T π for an optimal dual solution π of (2.15) with ξ = ξ^i. In this way, one may directly solve the original problem but avoid the multiple integrals. Certainly, this approach needs further investigation of its convergence and practical performance.

Section 3. Piecewise Smooth Integrands

In the nonconvex case, if f is a "piecewise smooth" function in x, then we may still get similar results. We say a mapping (x, ξ) → f(x, ξ) : R^n × Ξ → R ∪ {+∞} is a piecewise smooth function in x if R^n × Ξ can be partitioned into countably many convex polyhedral pieces such that for any (x, ξ) in the interior of a piece, ∇_x f(x, ξ) exists and is continuous in ξ, and m(K(ξ)) = 0 for each ξ ∈ Ξ, where

    K(ξ) = {x : (x, ξ) is on the boundary of a piece of f}.

We assume that any polyhedral set in R^n × R^N is measurable under the product measure m × P_ν, for ν = 0, 1, ....

Theorem 3.1 In the setting of Theorem 2.2, we assume only that (i) and (iii) hold. If f is a piecewise smooth function in x as defined above, then conclusion (b) holds and

(f) there is a Lebesgue zero-measure set D_1 ⊆ D such that Ef^ν, for ν = 0, 1, 2, ..., are G-differentiable on D \ D_1 and for each x ∈ D_2 = D \ D_1,

    lim_{ν→∞} ∇Ef^ν(x) = ∇Ef(x).                                  (3.1)

Proof: Conclusion (b) holds since in the proof of Theorem 2.2 (b) only (iii) is invoked. Let J be the set of all the boundary (facet) points of the polyhedral pieces of f. Then J is measurable under the product measure m × P_ν, for ν = 0, 1, .... Let J(x) = {ξ ∈ Ξ : (x, ξ) ∈ J} for each x ∈ D. Let D^ν = {x ∈ D : J(x) has positive P_ν measure}. By the measurability of J and the Fubini Theorem, m(D^ν) = 0. Let D_1 = ∪_{ν=0,1,...} D^ν and D_2 = D \ D_1. Then the Lebesgue measure of D_1 is zero. We now prove that (f) is true for the sets D_1 and D_2.

Let x ∈ D_2. By the property of D_2, the set J(x) has zero measure under P_ν and m, and ∇_x f(x, ξ) exists and is continuous in ξ for ξ ∈ Ξ \ J(x). Let h ∈ R^n. Denote f(x, ξ) by f_ξ(x). By the dominated convergence theorem, we have the existence of (Ef^ν)'(x; h) and the equality

    (Ef^ν)'(x; h) = ∫ f_ξ'(x; h) P_ν(dξ).

But f_ξ'(x; h) = ⟨∇f(x, ξ), h⟩ for ξ ∈ Ξ \ J(x). The existence of ∇Ef^ν(x) follows.

Let ε > 0 be given. Let h ∈ R^n. By (b) and the property of D_2, f_ξ'(x; h) is bounded by L_x and continuous on Ξ \ J(x), where L_x is the Lipschitz constant in the proof of Theorem 2.2 (b). Let

    C_k = {ξ ∈ Ξ : d(ξ, J(x)) ≤ δ/k},

where δ is a positive number. Since J(x) is a union of polyhedra of dimension less than N, C_k is also a union of polyhedra, thus measurable under P_ν for ν = 0, 1, .... But ∩_{k=1}^∞ C_k = J(x). Since P(J(x)) = 0, for sufficiently large k we have

    P(C_{k+1}) ≤ P(C_k) < ε/(5 L_x).

We may define a bounded continuous function θ : Ξ → R such that θ(ξ) = 1 if ξ ∈ C_{k+1}, 0 ≤ θ(ξ) ≤ 1 if ξ ∈ C_k \ C_{k+1}, and θ(ξ) = 0 otherwise. Then

    ∫ θ(ξ) P(dξ) ≤ P(C_k) < ε/(5 L_x)

and

    P_ν(C_{k+1}) ≤ ∫ θ(ξ) P_ν(dξ)   for ν = 1, 2, ....

By (i),

    lim_{ν→∞} ∫ θ(ξ) P_ν(dξ) = ∫ θ(ξ) P(dξ).

Combining these, we see that there is a ν' such that for all ν ≥ ν',

    P_ν(C_{k+1}) < ε/(5 L_x).

Denote such a C_{k+1} by C. We may also define a continuous bounded function g : Ξ → R such that g(ξ) = f_ξ'(x; h) for ξ ∈ Ξ \ C, with g also bounded by L_x. By (i), there is a ν'' such that for all ν ≥ ν'',

    |∫ g(ξ) P_ν(dξ) − ∫ g(ξ) P(dξ)| < ε/5.

Now, for all ν ≥ max{ν', ν''},

    |(Ef^ν)'(x; h) − (Ef)'(x; h)|
      = |∫ f_ξ'(x; h) P_ν(dξ) − ∫ f_ξ'(x; h) P(dξ)|
      ≤ ∫ |f_ξ'(x; h) − g(ξ)| P_ν(dξ) + ∫ |f_ξ'(x; h) − g(ξ)| P(dξ) + |∫ g(ξ) P_ν(dξ) − ∫ g(ξ) P(dξ)|
      ≤ 2 L_x P_ν(C) + 2 L_x P(C) + ε/5
      < ε.

Thus,

    lim_{ν→∞} (Ef^ν)'(x; h) = (Ef)'(x; h).

Since this is true for all h ∈ R^n, (3.1) holds. This completes the proof. ∎

Section 4. Continuous Distributions

We say that a function p : R^N → R is piecewise continuous if R^N can be partitioned into countably many convex polyhedral pieces such that p is continuous in the interior of each piece. We say a probability measure P on R^N is continuous if there is a piecewise continuous function p : R^N → R_+ such that P(dξ) = p(ξ) dξ. One may anticipate that many practical probability measures are continuous. Piecewise linear and piecewise constant probability measures may be regarded as possible candidates for the approximating measures P_ν. In the following, we will see that if P is a continuous probability measure and we use continuous probability measures to approximate it, then we have results similar to Corollary 2.4 and Theorem 3.1, and even slightly stronger ones.

In the proofs of the following theorem and Theorem 5.2, we use an alternative subdifferential of nonconvex functions besides the Clarke subdifferential to establish G-differentiability results. We let ∂⋄ be the symbol for the generalized subdifferential in the sense of Michel and Penot [22]. Again, suppose an extended real-valued function g : R^n → R ∪ {+∞} is locally Lipschitz at x, where x is in the essential domain of g. Then the Michel-Penot directional derivative of g at x with respect to h ∈ R^n is g⋄(x; h), where

    g⋄(x; h) := sup_{k ∈ R^n} {limsup_{t↓0} [g(x + tk + th) − g(x + tk)]/t}.

The Michel-Penot subdifferential of g at x is then

    ∂⋄g(x) := {u ∈ R^n : ⟨u, h⟩ ≤ g⋄(x; h), ∀h ∈ R^n}.

An advantage of the Michel-Penot subdifferential is that ∂⋄g(x) is a singleton if and only if g is G-differentiable at x. Thus, g is G-differentiable at x if and only if

    g⋄(x; e_i) + g⋄(x; −e_i) = 0,   for i = 1, 2, ..., n,

where e_i is the ith unit vector. This fact is used in the proof of the following theorem.

Theorem 4.1 In the setting of Theorem 2.2, we assume only that (iii) holds. If {P_ν, ν = 0, 1, ...} are continuous probability measures with P_ν(dξ) = p_ν(ξ) dξ, where p_ν : Ξ → R_+, ν = 0, 1, 2, ..., are piecewise continuous functions, and if

(v) lim_{ν→∞} ∫ |p_ν(ξ) − p(ξ)| dξ = 0;

(vi) the map (x, ξ) → f(x, ξ) is Lebesgue measurable on D × R^N;

then conclusions (f) and (g) hold:

(f) the same as in Theorem 3.1;
(g) for any x ∈ int(D),

    lim_{ν→+∞} ∂Ef^ν(x) = ∂Ef(x).                                 (4.1)

Proof: (f). By (iii), f(·, ξ) is locally Lipschitz for each ξ ∈ Ξ. By the Rademacher Theorem, f(·, ξ) is G-differentiable almost everywhere in D for each ξ ∈ Ξ. Let

    J = {(x, ξ) ∈ D × Ξ : ∇_x f(x, ξ) exists}.

We will show that J is Lebesgue measurable. Let e_i be the ith unit vector in R^n. Denote f(x, ξ) by f_ξ(x). Consider

    f_ξ⋄(x; e_i) = sup_{k ∈ R^n} {limsup_{t↓0} [f(x + t e_i + t k, ξ) − f(x + t k, ξ)]/t},   (4.2)

the Michel-Penot directional derivative of f_ξ at x with respect to e_i. Since f(·, ξ) is continuous, we may let t ↓ 0 and k take only rational values in (4.2). By (vi), (x, ξ) → f(x + t e_i + t k, ξ) and (x, ξ) → f(x + t k, ξ) are Lebesgue measurable. Therefore f_ξ⋄(x; e_i), as a "countable sup limsup" of Lebesgue measurable functions of (x, ξ), is Lebesgue measurable. Similarly, f_ξ⋄(x; −e_i) is also Lebesgue measurable. So the set

    J_i = {(x, ξ) ∈ D × Ξ : f_ξ⋄(x; e_i) + f_ξ⋄(x; −e_i) = 0}

is also Lebesgue measurable. However, by the discussion on the Michel-Penot subdifferential before this theorem, J = ∩_{i=1}^n J_i. Thus J is also Lebesgue measurable.

Let J(x) = {ξ ∈ Ξ : (x, ξ) ∉ J} for each x ∈ D. Let D_1 = {x ∈ D : m(J(x)) > 0}. By the Fubini Theorem, m(D_1) = 0. Let D_2 = D \ D_1. We now prove that (f) is true for the sets D_1 and D_2.

First we show that the G-derivative ∇Ef^ν(x) exists for x ∈ D_2. Let x ∈ D_2. Then ∇_x f(x, ξ) exists a.s. Let h ∈ R^n. Similarly, we may show that the map ξ → f_ξ'(x; h) is Lebesgue measurable and bounded. By the dominated convergence theorem, we have the existence of (Ef^ν)'(x; h) and the equality

    (Ef^ν)'(x; h) = ∫ f_ξ'(x; h) p_ν(ξ) dξ = ∫ f_ξ'(x; h) P_ν(dξ).   (4.3)

But f_ξ'(x; h) = ⟨∇f(x, ξ), h⟩ for ξ ∉ J(x). The existence of ∇Ef^ν(x) follows. By (4.3),

    |(Ef^ν)'(x; h) − (Ef)'(x; h)| ≤ ∫ |f_ξ'(x; h)| · |p_ν(ξ) − p(ξ)| dξ
                                  ≤ L_x ||h|| ∫ |p_ν(ξ) − p(ξ)| dξ,      (4.4)

where L_x is the Lipschitz constant of f(·, ξ) near x, which is independent of ξ by (iii). By (4.4) and (v), (3.1) holds. This proves (f).

(g). Let x ∈ int(D). Notice that (b) still holds since it relies only on (iii). By (b), the ∂Ef^ν(x) are nonempty, compact and convex sets for ν = 0, 1, .... Let h be an arbitrary vector in R^n and ε a small positive number. Let U(x) be a very small neighborhood of x. For any y ∈ U(x) ∩ D_2, by (4.4), (Ef^ν)'(y; h) tends to (Ef)'(y; h) uniformly. Then there is a ν(ε) such that for any ν ≥ ν(ε) and y ∈ U(x) ∩ D_2,

    |(Ef^ν)'(y; h) − (Ef)'(y; h)| < ε.                            (4.5)

By 2.5.1 of [7],

    ∂Ef^ν(x) = conv{lim ∇Ef^ν(y) : y → x, y ∈ D_2},               (4.6)

for ν = 0, 1, .... Then there is a sequence y_j → x, y_j ∈ U(x) ∩ D_2 such that

    ψ*(h|∂Ef(x)) = lim_{j→∞} (Ef)'(y_j; h).                       (4.7)

By (4.6),

    limsup_{j→∞} (Ef^ν)'(y_j; h) ≤ ψ*(h|∂Ef^ν(x)),                (4.8)

for ν = 1, .... By (4.5), (4.7) and (4.8), for any ν ≥ ν(ε),

    ψ*(h|∂Ef(x)) ≤ ψ*(h|∂Ef^ν(x)) + ε.                            (4.9)

Similarly, by (4.6), there is a sequence ỹ_j → x, ỹ_j ∈ U(x) ∩ D_2 such that

    ψ*(h|∂Ef^ν(x)) = lim_{j→∞} (Ef^ν)'(ỹ_j; h),                   (4.10)

and

    limsup_{j→∞} (Ef)'(ỹ_j; h) ≤ ψ*(h|∂Ef(x)),                    (4.11)

for ν = 1, .... By (4.5), (4.10) and (4.11), for any ν ≥ ν(ε),

    ψ*(h|∂Ef^ν(x)) ≤ ψ*(h|∂Ef(x)) + ε.                            (4.12)

By (4.9) and (4.12),

    lim_{ν→∞} ψ*(h|∂Ef^ν(x)) = ψ*(h|∂Ef(x)).

Then (4.1) follows. This completes the proof of this theorem. ∎

Section 5. Loss Functions

We now discuss a special case of (2.1), in which f(x, ξ) = u(T(ξ)x − ζ(ξ)), where u : R^m → R ∪ {+∞} is a loss function, ζ(ξ) ∈ R^m and T(ξ) ∈ R^{m×n}. Consider

    Eu(x) = ∫ u(T(ξ)x − ζ(ξ)) P(dξ),                              (5.1)

where P is a probability measure on R^N. The expectational functional (5.1) arises in the stochastic program with recourse; see [5][6]. It also appears in other applications such as error optimization and optimal design; see [21]. In stochastic linear programming with recourse, u is piecewise linear and convex. In more general problems, however, u may not even be convex. This situation occurs, for example, when u is a loss incurred if inventory, T(ξ)x, is less than demand, ζ(ξ). This penalty will generally become flat for very low values of T(ξ)x − ζ(ξ), voiding convexity. Nevertheless, it is useful to characterize the convergence to critical points as we do here. Again, approximate (5.1) by

    Eu^ν(x) = ∫ u(T(ξ)x − ζ(ξ)) P_ν(dξ),

and denote Eu by Eu^0 sometimes. In particular, we consider continuous approximations as in Theorem 4.1.

Theorem 5.1 Suppose that
(i) for any x ∈ D, where D is an open set in R^n, Eu^ν(x) < +∞, for ν = 0, 1, ...;
(ii) u is locally Lipschitz on its support Q, and strictly differentiable on Q_1 ⊆ Q such that Q_2 = Q \ Q_1 is a set of Lebesgue measure zero;
(iii) for any ξ ∈ R^N, there is a constant L such that ||T(ξ)|| ≤ L;

(iv) for any x ∈ D, the map ξ → T(ξ)x − ζ(ξ) maps a set in R^N to a Lebesgue zero measure set in R^m only if this set has Lebesgue measure zero in R^N;
(v) {P_ν, ν = 0, 1, ...} are continuous probability measures with P_ν(dξ) = p_ν(ξ) dξ, where p_ν : Ξ → R_+, ν = 0, 1, 2, ..., are piecewise continuous functions;
(vi) there are a positive number δ and a function η : Q → R such that

    η(w) ≥ sup{||d|| : d ∈ ∂u(w̄), ||w̄ − w|| ≤ δ},

and for any x ∈ D, η(T(·)x − ζ(·)) p_ν(·) is Lebesgue integrable, for ν = 0, 1, ...;
(vii) assumption (iv) of Theorem 2.2 holds;
(viii) either u is convex or piecewise smooth, or Theorem 4.1 (v) holds.

Then
(a) the Eu^ν are strictly differentiable in D for ν = 0, 1, ...;
(b) for any x ∈ D and x^ν → x,

    lim_{ν→+∞} Eu^ν(x^ν) = Eu(x);

(c) if x^ν ∈ D minimizes Eu^ν for each ν and x is a limiting point of {x^ν}, then x minimizes Eu;
(d) for any x ∈ D,

    lim_{ν→+∞} ∇Eu^ν(x) = ∇Eu(x).                                 (5.2)

Proof: By (ii), (iii), (v) and (vii), the conditions of Theorem 2.2 hold. We have conclusions (b) and (c). It suffices now to prove (a) and (d). Suppose x ∈ D. Let U = {x̄ ∈ D : ||x̄ − x|| ≤ δ/L} and

    g_ν(ξ) = L η(T(ξ)x − ζ(ξ)) p_ν(ξ).

Then for x̄, x̂ ∈ U, by (ii) and 2.3.7 of [7],

    u(T(ξ)x̄ − ζ(ξ)) − u(T(ξ)x̂ − ζ(ξ)) ∈ ⟨∂u(w̄), T(ξ)(x̄ − x̂)⟩,

where w̄ is a point on the segment [T(ξ)x̄ − ζ(ξ), T(ξ)x̂ − ζ(ξ)]. Let w = T(ξ)x − ζ(ξ). Then

    ||T(ξ)x̄ − ζ(ξ) − w|| ≤ ||T(ξ)(x̄ − x)|| ≤ δ,

    ||T(ξ)x̂ − ζ(ξ) − w|| ≤ ||T(ξ)(x̂ − x)|| ≤ δ.

Thus, ||w̄ − w|| ≤ δ. By (iii), (v) and (vi),

    |u(T(ξ)x̄ − ζ(ξ)) p_ν(ξ) − u(T(ξ)x̂ − ζ(ξ)) p_ν(ξ)| ≤ g_ν(ξ) ||x̄ − x̂||,

where g_ν is Lebesgue integrable on R^N. However,

    Eu^ν(x) = ∫ u(T(ξ)x − ζ(ξ)) p_ν(ξ) dξ.

Hence, by 2.7.2 of [7], for x ∈ D and ν = 0, 1, ..., Eu^ν is locally Lipschitz and

    ∂Eu^ν(x) ⊆ ∫ T(ξ)^T ∂u(T(ξ)x − ζ(ξ)) P_ν(dξ).                 (5.3)

But the right hand side of (5.3) is a singleton by (ii), (iii) and (iv). Thus ∂Eu^ν(x) is also a singleton and equality holds in (5.3). By 2.2.4 of [7], Eu^ν is strictly differentiable at x for ν = 0, 1, .... This proves (a). Now, by Corollary 2.5 and Theorems 3.1 and 4.1, (5.2) follows from (viii). This completes the proof of this theorem. ∎

We note that Marti [19] establishes similar results for u continuous, positively homogeneous, and subadditive. We extend his results to this more general loss function.

We now justify the conditions of Theorem 5.1. If T is deterministic, then conditions (iii) and (iv) of this theorem hold naturally. Condition (ii) is equivalent to saying that u is locally Lipschitz and that the Clarke subdifferential of u is single-valued almost everywhere. Such functions are called primal functions in [27][28][29]. Convex functions, concave functions, differences of convex functions, regular functions and semismooth functions are all examples of primal functions. All primal functions defined on an open set form a linear space, i.e., the class of primal functions is closed under addition and scalar multiplication. For more discussion on primal functions, see [27][28][29]. As said before, in stochastic linear programming with recourse, u is a convex piecewise linear function. Thus, conditions (ii) and (viii) hold. Furthermore, condition (vi) also holds as long as the p_ν are integrable for ν = 0, 1, ....
For condition (v), as we said before, the piecewise linear distribution approximation suggested by Wets [32] and the linear combination of lognormal univariate distributions suggested by Dexter, Yu and Ziemba [8] are good examples of continuous approximations. We discuss this topic further in [4].
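As a concrete, hedged illustration of conclusion (d) of Theorem 5.1 (a sketch of our own, not part of the text), consider a one-dimensional convex piecewise linear loss u(w) = c⁺ max(w, 0) + c⁻ max(−w, 0) with T ≡ 1 and ζ(ξ) = ξ, and approximate a standard normal by piecewise constant (histogram) densities p_ν. The constants, grid, and quadrature below are all our own illustrative choices:

```python
import math

# Toy setup (ours, hedged): u(w) = C_PLUS*max(w,0) + C_MINUS*max(-w,0),
# E_u^nu(x) = integral of u(x - xi) p_nu(xi) dxi, and then
# d/dx E_u^nu(x) = C_PLUS * P_nu(xi < x) - C_MINUS * P_nu(xi > x).
C_PLUS, C_MINUS = 2.0, 3.0

def normal_pdf(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def grad_E(x, pdf, lo=-8.0, hi=8.0, cells=20000):
    """Midpoint-rule quadrature for C_PLUS*P(xi < x) - C_MINUS*P(xi > x)."""
    h = (hi - lo) / cells
    mids = [lo + (i + 0.5) * h for i in range(cells)]
    below = sum(pdf(t) * h for t in mids if t < x)
    total = sum(pdf(t) * h for t in mids)
    return C_PLUS * below - C_MINUS * (total - below)

def histogram_pdf(nu):
    """Piecewise constant density p_nu: N(0,1) mass averaged over 2**nu cells."""
    h = 16.0 / 2 ** nu
    def pdf(t):
        if t < -8.0 or t >= 8.0:
            return 0.0
        a = -8.0 + int((t + 8.0) / h) * h          # left endpoint of t's cell
        return (normal_cdf(a + h) - normal_cdf(a)) / h
    return pdf

x = 0.7
exact = C_PLUS * normal_cdf(x) - C_MINUS * (1.0 - normal_cdf(x))
for nu in (3, 6, 9):
    err = abs(grad_E(x, histogram_pdf(nu)) - exact)
    print(nu, err)   # the error shrinks as the densities p_nu refine
```

Here each E_u^ν is differentiable even though u is not, because the continuous distribution smooths the kink; refining p_ν drives ∇E_u^ν(x) toward ∇E_u(x), as conclusion (d) asserts.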

Theorem 5.2. If we delete the almost-everywhere strict differentiability requirement on u in Theorem 5.1, then all the conclusions still hold, except that conclusion (a) is changed to:

(a) E_u^ν is G-differentiable in D for ν = 0, 1, ....

Proof: We use the Michel-Penot subdifferential instead of the Clarke subdifferential in the proof of Theorem 5.1. Instead of invoking 2.7.2 of [7], we now invoke Theorem 5.2 of [3]. This implies that the left-hand side of (5.3), now the Michel-Penot subdifferential of E_u^ν at x, is a singleton. By the properties of the Michel-Penot subdifferential [3], E_u^ν is G-differentiable at x ∈ D. This completes the proof. □

Another approach to approximating (2.1) uses

E_f^ν(x) = ∫ f^ν(x, ξ)dP(ξ),

where {f^ν, ν = 1, ...} is a sequence of extended real-valued functions converging pointwise to f. The convergence of E_f^ν to E_f was also discussed in [5] in the context of epi-convergence; see Theorem 2.7 of [5]. One may also derive results similar to Corollary 2.5, Theorems 3.1, 4.1, 5.1 and 5.2 by considering these four cases.

Section 6. Application in Algorithms

The major motivation for developing continuous approximation schemes is their use in algorithms. In this section, we demonstrate that some basic algorithms based on gradient and subgradient information can incorporate continuous approximations into convergent procedures.

We first consider methods based on the result in Corollary 2.5. In this case, we only suppose that subgradients are available. The major result is that we obtain convergence to a value below some goal or else demonstrate that no value below the goal exists. The algorithm is a direct modification of the subgradient method given by Polyak [25]. It can also be strengthened to allow for other step sizes as in Allen et al. [1]. We state the algorithm as follows. We assume that f is a convex function of x. The objective is to minimize E_f(x) over x ∈ D.

Subgradient Algorithm

Step 0.
Choose sequences ε_ν → 0 and γ_k → 0 such that Σ_{k=0}^∞ γ_k = +∞, a goal value G, a maximum iteration count per approximation I_max, and an initial point x^0. Let k = 0, i = 0, and ν = 0.

Step 1. Pick ζ ∈ ∂E_f^ν(x^k). If ||ζ|| ≤ ε_ν, let x^{k+1} = x^k and go to Step 2. Otherwise, let

x^{k+1} = x^k − γ_k(ζ/||ζ||).   (6.1)

If E_f^ν(x^{k+1}) < G, go to Step 2. Otherwise, go to Step 3.

Step 2. Let ν = ν + 1, i = 0, and k = k + 1. Go to Step 1.

Step 3. Let i = i + 1. If i > I_max, go to Step 2. Otherwise, let k = k + 1 and go to Step 1.

We note that alternative choices for the step length parameter (here γ_k/||ζ||) are possible, but we will show convergence just for this case. The basic idea of the algorithm is to follow the subgradient algorithm steps unless a small subgradient is found, an approximate function value is less than the goal value, or neither has been found within I_max iterations with a single approximation. In these cases, the approximation is refined.

From Corollary 2.5, we have conditions for the subdifferential sets to converge. We need an additional condition on the boundedness of the sequence of iterates:

(vii) The iterates x^k of the subgradient algorithm all belong to D, and the diameter of D is finite, M = sup_{y,z∈D} ||y − z||.

This last condition can be guaranteed by adding a barrier or penalty function near the boundary of D. We assume it here for simplicity. Together the conditions lead to convergence of this algorithm, as in the following theorem.

Theorem 6.1. Suppose the conditions of Theorem 4.1 and (vii) above hold. Then the subgradient algorithm given above produces a sequence {x^k} such that either E_f(x̄) ≤ G for some limit point x̄ of {x^k}, or E_f(x) > G for all x ∈ D.

Proof: The proof follows the same form as a proof of the subgradient method's convergence. Suppose that there exists x* such that E_f(x*) < G but E_f(x̄) > G for every limit point x̄ of {x^k}. Since D is bounded, there exists K such that for all k ≥ K, E_f(x^k) > G. Otherwise, there exists {x^{k_i}} with E_f(x^{k_i}) ≤ G and some x̄ = lim_i x^{k_i} with E_f(x̄) ≤ G. By the continuity of E_f, for all sufficiently small ε, there exists some δ such that for all ||x − x*|| ≤ δ, E_f(x) < G − εM.
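As an aside, the subgradient algorithm stated above can be sketched in code. This is a minimal, hypothetical instantiation: the toy objective, the particular sequences ε_ν and γ_k, and all names are our own choices, with E and subgrad standing in for E_f^ν and a selection from its subdifferential.

```python
import math

# Minimal sketch of the subgradient algorithm (hedged; toy choices:
# eps_nu = 1/(nu+1) -> 0, gamma_k = 1/(k+1) -> 0 with divergent sum).
def subgradient_algorithm(E, subgrad, x0, G, I_max=50, max_iter=500):
    x, k, i, nu = list(x0), 0, 0, 0
    eps = lambda n: 1.0 / (n + 1)
    gamma = lambda m: 1.0 / (m + 1)
    while k < max_iter:
        g = subgrad(nu, x)                                        # Step 1
        norm = math.sqrt(sum(gj * gj for gj in g))
        if norm <= eps(nu):               # small subgradient: refine (Step 2)
            nu, i, k = nu + 1, 0, k + 1
            continue
        x = [xj - gamma(k) * gj / norm for xj, gj in zip(x, g)]   # (6.1)
        if E(nu, x) < G:                  # value below goal: refine (Step 2)
            nu, i, k = nu + 1, 0, k + 1
        else:                                                     # Step 3
            i, k = i + 1, k + 1
            if i > I_max:                 # too long at this approximation
                nu, i = nu + 1, 0
    return x

# Toy instance: E_f^nu(x) = |x1| + |x2| + 1/(nu+1), approximating |x1| + |x2|.
E = lambda nu, x: abs(x[0]) + abs(x[1]) + 1.0 / (nu + 1)
sg = lambda nu, x: [math.copysign(1.0, x[0]), math.copysign(1.0, x[1])]
x = subgradient_algorithm(E, sg, [3.0, -2.0], G=0.05)   # drifts toward (0, 0)
```

On this toy instance the iterates drift toward the minimizer of |x1| + |x2| while the approximation index ν is refined whenever the goal test, the small-subgradient test, or the I_max cap fires.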
Suppose K̄ is such that for all k ≥ K̄, the subgradient η^ν ∈ ∂E_f^ν(x^k) chosen by the algorithm (where ν is the corresponding approximation index used by the algorithm at iteration k) satisfies ||η − η^ν|| ≤ εM for some η ∈ ∂E_f(x^k). This is always possible for any ε > 0 by Theorem 4.1, noting that the algorithm always increases ν if a value below G, a small subgradient, or neither condition has been found in I_max iterations.

Next, we suppose that y = x* + δ(η^ν/||η^ν||). From this, for k ≥ max{K, K̄}, we obtain

E_f(y) ≥ E_f(x^k) + ⟨η, y − x^k⟩
      > G + ⟨η − η^ν, y − x^k⟩ + ⟨η^ν, x* − x^k⟩ + ⟨η^ν, y − x*⟩
      ≥ G − ||η − η^ν|| ||y − x^k|| + ⟨η^ν, x* − x^k⟩ + δ||η^ν||   (6.2)
      ≥ G − εM + ⟨η^ν, x* − x^k⟩ + δ||η^ν||,

which, since E_f(y) < G − εM, implies that ⟨η^ν, x* − x^k⟩/||η^ν|| < −δ. But we also have that

||x^{k+1} − x*||² = ||x^k − x*||² − 2γ_k⟨η^ν/||η^ν||, x^k − x*⟩ + γ_k².

Hence,

γ_k(2δ − γ_k) ≤ −||x^{k+1} − x*||² + ||x^k − x*||²,   (6.3)

but summing yields infinity on the left-hand side of (6.3) and a finite bound on the right, a contradiction. The result follows. □

This result shows that subgradient information may be sufficient to produce an algorithm that achieves an optimal value. If continuous distribution approximations are used, however, we may have the results of Theorem 5.1 and convergence of derivatives. This allows the use of any method that relies on derivatives. It is contained in the following general method. Now, in the context of Theorem 5.1, we suppose that E_u has a finite minimum and that the solution set is compact.

Gradient Method

Step 0. Choose a sequence ε_ν → 0 and an initial point x^0 ∈ D.

Step 1. Follow a convergent descent algorithm for unconstrained minimization of a differentiable convex function that uses ∇E_u^ν(x^k) to generate a new point x^{k+1} (i.e., an algorithm that generates a sequence of points with decreasing function values that either terminates with an optimal solution or has all limit points optimal solutions of the unconstrained minimization problem, such as steepest descent). Go to Step 2.

Step 2. If ||∇E_u^ν(x^{k+1})|| ≤ ε_ν, let ν = ν + 1, k = k + 1 and go to Step 1. Otherwise, let k = k + 1 and go to Step 1.

Theorem 6.2. Suppose the conditions of Theorem 5.1 hold. Then the gradient method given above produces a sequence {x^k} such that all limit points x* of {x^k} minimize E_u.
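The gradient method can likewise be sketched in code. In this hypothetical instantiation (ours, not from the text), fixed-step steepest descent plays the role of the convergent inner algorithm, ε_ν = 1/(ν + 1), and a smooth toy family stands in for the approximations E_u^ν:

```python
# Minimal sketch of the gradient method (hedged; toy choices):
# E_u^nu(x) = x1^2 + x2^2 + (x1 + x2)/(nu + 1), whose gradients converge
# to those of E_u(x) = x1^2 + x2^2 as nu -> infinity.
def gradient_method(grad, x0, outer=30, step=0.1):
    x, nu = list(x0), 0
    while nu < outer:
        g = grad(nu, x)                                  # Step 1: descent step
        x = [xj - step * gj for xj, gj in zip(x, g)]
        gnorm = sum(gj * gj for gj in grad(nu, x)) ** 0.5
        if gnorm <= 1.0 / (nu + 1):                      # Step 2: ||grad|| <= eps_nu
            nu += 1                                      # refine the approximation
    return x

grad = lambda nu, x: [2.0 * x[0] + 1.0 / (nu + 1), 2.0 * x[1] + 1.0 / (nu + 1)]
x_star = gradient_method(grad, [4.0, -3.0])   # approaches the minimizer (0, 0)
```

Each refinement is triggered exactly when the current approximation's gradient norm falls below ε_ν, mirroring Step 2 above.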

Proof: When ν is not updated, the method simply follows whatever algorithm has been employed in Step 1. By the differentiability assumption, the algorithm will generate some x^{k(ν)} such that ||∇E_u^ν(x^{k(ν)})|| ≤ ε_ν. Thus, the algorithm continues to refine the approximation. At each x^{k(ν)}, the bound on the norm of the gradient decreases, so there exists some K₁ such that for all ν ≥ K₁, ||∇E_u^ν(x^{k(ν)})|| ≤ ε/2. From Theorem 5.1, for any ε, there exists K₂ such that for all ν ≥ K₂, ||∇E_u(x^{k(ν)}) − ∇E_u^ν(x^{k(ν)})|| ≤ ε/2. Hence, for any ν ≥ max{K₁, K₂}, ||∇E_u(x^{k(ν)})|| ≤ ε. Thus, any limit point x* of {x^{k(ν)}} minimizes E_u. Moreover, for any k with k(ν) ≤ k ≤ k(ν + 1), E_u^ν(x^{k(ν)}) ≥ E_u^ν(x^k). Thus, we also have E_u(x^{k(ν)}) + δ_k ≥ E_u(x^k) for some δ_k → 0. Hence, E_u(x^k) → min E_u(x), proving the result. □

These results show that continuous approximation schemes have the advantage of allowing optimization with methods that require derivatives, and that the resulting algorithms may converge under suitable assumptions. Specific forms of the approximations and computational results with these procedures will be reported in a subsequent paper [4]. We note that similar results are reported in Wang [31] for several algorithms. His procedure relies on the epiconvergence of E_f^ν to E_f and requires descent when using subgradients. Since descent may not occur at some approximation, refinement may be necessary before reaching optimality at that approximation and before any steps are possible. Our procedure allows progress at each approximation, either through differentiability or through the subgradient method.

REFERENCES

[1] E. Allen, R. Helgason, J. Kennington and B. Shetty, "A generalization of Polyak's convergence result for subgradient optimization," Mathematical Programming 37 (1987) 309-317.

[2] J.R. Birge and L. Qi, "Computing block-angular Karmarkar projections with applications to stochastic programming," Management Science 34 (1988) 1472-1479.

[3] J.R. Birge and L. Qi, "The Michel-Penot subdifferential and stochastic programming," Applied Mathematics Preprint 89/12, School of Mathematics, The University of New South Wales, 1989.

[4] J.R. Birge and L. Qi, "Continuous approximation schemes for stochastic programs," in preparation.

[5] J.R. Birge and R.J-B. Wets, "Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse," Mathematical Programming Study 27 (1986) 54-102.

[6] J.R. Birge and R.J-B. Wets, "Sublinear upper bounds for stochastic programs with recourse," Mathematical Programming 43 (1989) 131-149.

[7] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[8] A.S. Dexter, J.N.W. Yu, and W.T. Ziemba, "Portfolio selection in a lognormal market when the investor has a power utility function: computational results," in: Stochastic Programming, ed. M.A.H. Dempster, Academic Press, New York, 1980, pp. 507-523.

[9] J. Dupacova and R.J-B. Wets, "Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems," Annals of Statistics 16 (1988) 1517-1549.

[10] Y. Ermoliev, "Stochastic quasigradient methods and their applications to systems optimization," Stochastics 9 (1983) 1-36.

[11] Y. Ermoliev and R.J-B. Wets, eds., Numerical Techniques in Stochastic Programming, Springer-Verlag, Berlin, 1988.

[12] K. Frauendorfer, "Solving SLP recourse problems with arbitrary multivariate distributions - the dependent case," Mathematics of Operations Research 13 (1988) 377-394.

[13] H. Gassmann, "Conditional probability and conditional expectation of a random variable," in: Numerical Techniques in Stochastic Programming, eds. Y. Ermoliev and R. Wets, Springer-Verlag, Berlin, 1988, pp. 237-254.

[14] J.L. Higle and S. Sen, "Statistical verification of optimality conditions for stochastic programs with recourse," Working Paper 89-028, Department of Systems and Industrial Engineering, University of Arizona, Tucson, AZ, November 1989.

[15] P. Kall, "On approximations and stability in stochastic programming," in: Parametric Optimization and Related Topics, eds. J. Guddat et al., Akademie-Verlag, Berlin, 1987, pp. 387-407.

[16] P. Kall, A. Ruszczynski and K. Frauendorfer, "Approximation techniques in stochastic programming," in: Numerical Techniques in Stochastic Programming, eds. Y. Ermoliev and R. Wets, Springer-Verlag, Berlin, 1988, pp. 33-64.

[17] A.J. King and R.J-B. Wets, "Epi-consistency of convex stochastic programs," Stochastics and Stochastics Reports 34 (1991) 83-92.

[18] R. Lepp, "Approximations to stochastic programs with complete recourse," SIAM J. Control and Optimization 28 (1990) 382-394.

[19] K. Marti, "Approximationen von Entscheidungsproblemen mit linearer Ergebnisfunktion und positiv homogener, subadditiver Verlustfunktion," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 31 (1975) 203-233.

[20] K. Marti, "Computation of descent directions and efficient points in stochastic optimization problems with invariant distributions," ZAMM 65 (1985) 132-156.

[21] K. Marti and E. Fuchs, "Computation of descent directions and efficient points in stochastic optimization problems without using derivatives," Mathematical Programming Study 28 (1986) 132-156.

[22] P. Michel and J.-P. Penot, "Calcul sous-differentiel pour des fonctions lipschitziennes et non lipschitziennes," C.R. Acad. Sci. Paris 298 (1984) 269-272.

[23] J.L. Nazareth and R.J-B. Wets, "Algorithms for stochastic programs: the case of nonstochastic tenders," Mathematical Programming Study 28 (1986) 1-28.

[24] J.L. Nazareth and R.J-B. Wets, "Nonlinear programming techniques applied to stochastic programs with recourse," in: Numerical Techniques in Stochastic Programming, eds. Y. Ermoliev and R. Wets, Springer-Verlag, Berlin, 1988, pp. 95-119.

[25] B.T. Polyak, "Subgradient methods: A survey of Soviet research," in: Nonsmooth Optimization, eds. C. Lemarechal and R. Mifflin, Pergamon Press, Oxford, 1978, pp. 5-29.

[26] A. Prekopa and R.J-B. Wets, eds., Stochastic Programming 84, Mathematical Programming Study 27 & 28 (1986).

[27] L. Qi, "The maximal normal operator space and integration of subdifferentials of nonconvex functions," Nonlinear Analysis, Theory, Methods and Applications 13 (1989) 1003-1011.

[28] L. Qi, "Semismoothness and decomposition of maximal normal operators," Journal of Mathematical Analysis and Applications 146 (1990) 271-279.

[29] L. Qi, "Quasidifferentials and maximal normal operators," Mathematical Programming 49 (1991) 263-271.

[30] S.M. Robinson and R.J-B. Wets, "Stability in two-stage stochastic programming," SIAM J. Control and Optimization 25 (1987) 1409-1416.

[31] J. Wang, "Approximate nonlinear programming algorithms for solving stochastic programs with recourse," Annals of Operations Research, to appear.

[32] R.J-B. Wets, "Solving stochastic programs with simple recourse, II," in: Proceedings of the 1975 Conference on Information Sciences and Systems, Johns Hopkins University, Baltimore, Maryland, 1975.

[33] R.J-B. Wets, "Convergence of convex functions, variational inequalities and convex optimization problems," in: Variational Inequalities and Complementarity Problems, eds. R.W. Cottle, F. Giannessi and J.-L. Lions, John Wiley, New York, 1980, pp. 375-404.

[34] R.J-B. Wets, "Stochastic programming: Solution techniques and approximation schemes," in: Mathematical Programming: The State of the Art - Bonn 1982, eds. A. Bachem, M. Grotschel and B. Korte, Springer-Verlag, Berlin, 1983, pp. 566-603.

[35] R.J-B. Wets, "Stochastic programs with simple recourse," Stochastics 10 (1983) 219-242.

[36] R.J-B. Wets, "Stochastic programming," in: Handbooks in Operations Research and Management Science, Volume 1: Optimization, eds. G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd, North-Holland, Amsterdam, 1989, pp. 573-630.
