The Michel-Penot Subdifferential and Stochastic Programming

John R. Birge
Department of Industrial and Operations Engineering
University of Michigan
Ann Arbor, MI 48109

Liqun Qi
School of Mathematics
The University of New South Wales
Kensington, N.S.W. 2033
AUSTRALIA

Technical Report 89-23
July 1989

This report is also issued as Applied Mathematics Preprint 89/12, School of Mathematics, The University of New South Wales, Kensington, N.S.W., AUSTRALIA.

THE MICHEL-PENOT SUBDIFFERENTIAL AND STOCHASTIC PROGRAMMING

John R. Birge*
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109

and

Liqun Qi
School of Mathematics, The University of New South Wales, Kensington, N.S.W. 2033, Australia

Abstract

The Michel-Penot subdifferential of a locally Lipschitzian function is the principal part of the Clarke subdifferential. It does not contain extraneous "subgradients" at abnormal points. A locally Lipschitzian function is determined by its Michel-Penot subdifferential uniquely up to an additive constant, although this cannot be done with the Clarke subdifferential if the set of abnormal points is not negligible. A set-valued operator is the Michel-Penot subdifferential of a locally Lipschitzian function if and only if it is a maximal seminormal operator satisfying a cyclical condition. Various calculus rules hold for the Michel-Penot subdifferential. Equalities hold for these rules at a point under semiregularity, a condition weaker than regularity, and almost everywhere under conditions on the space alone. Applications in stochastic programming are discussed.

Section 1. Introduction

The Clarke subdifferential theory is a breakthrough in nonsmooth analysis. It extends the subdifferential theory of convex functions to nonconvex functions and has found applications in several fields [5]. However, at some abnormal points of certain nonsmooth functions, the Clarke subdifferential may include some "extraneous subgradients", and these "extraneous subgradients" may be further expanded through calculus. These abnormal phenomena are not so important for "not seriously" nonsmooth functions, such as semismooth functions and regular functions. However, for some "seriously" nonsmooth functions, this abnormality cannot be ignored; such functions can even be Lipschitzian. The following examples demonstrate these abnormalities:

1.1. Even at a point where the G-derivative of a Lipschitzian function exists, the Clarke subdifferential of the function at this point may not be a singleton. For example, let f: R → R be defined by f(x) = x²sin(1/x) for x ≠ 0 and f(0) = 0. Then f is a globally Lipschitzian and everywhere differentiable function with f'(x) = 2x sin(1/x) − cos(1/x) for x ≠ 0 and f'(0) = 0. However, the Clarke subdifferential of f at 0 is [−1, 1]. Thus, the Clarke subdifferential includes some extra "subgradients" at this point. When the measure of the set of such abnormal points is not zero, these "extraneous subgradients" may overgrow to cover the real derivatives.

1.2. Such abnormalities make the characterization problem of the Clarke subdifferential very difficult in general, if not impossible. In fact, such a "seriously" nonsmooth function cannot be determined uniquely by its Clarke subdifferential up to an additive constant. For example [20], consider f: R → R. Let S be a measurable subset of R with the property that for every nonempty open interval I, the intersection of I with S and the intersection of I with R \ S are both of positive measure. Define g(x) = 1 if x ∈ S and g(x) = −1 if x ∈ R \ S, and let f(x) = ∫₀ˣ g(t) dt. Then f is globally Lipschitzian on R (with Lipschitz constant 1), and in particular f is differentiable almost everywhere: f'(x) = g(x) for almost every x. From the construction of S, however, the Clarke subdifferential is ∂°f(x) = [−1, 1] for every x ∈ R. If we let f_t(x) = f(x + t), then f_t still has the same Clarke subdifferential, yet it cannot be true in general that f_t = f + const.

* Supported by Office of Naval Research Grant N0014-86-K-0628.
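The oscillation behind Example 1.1 can be checked numerically. The following Python sketch (illustrative only, not part of the paper) verifies that the difference quotient at 0 vanishes while the derivative clusters at both −1 and +1 along sequences tending to 0:

```python
import math

def f(x):
    # Example 1.1: globally Lipschitzian and differentiable everywhere
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def fprime(x):
    # f'(x) = 2x sin(1/x) - cos(1/x) for x != 0, and f'(0) = 0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) if x != 0.0 else 0.0

# The difference quotient at 0 is t*sin(1/t), which vanishes as t -> 0:
for t in (1e-3, 1e-5, 1e-7):
    assert abs(f(t) / t) < 1e-2

# Yet along points x_n -> 0 the derivative accumulates at both -1 and +1,
# so the Clarke subdifferential at 0 is all of [-1, 1]:
for n in range(1, 4):
    assert abs(fprime(1.0 / (2 * n * math.pi)) + 1.0) < 1e-9        # near -1
    assert abs(fprime(1.0 / ((2 * n + 1) * math.pi)) - 1.0) < 1e-9  # near +1
# The Michel-Penot subdifferential at 0 is just {f'(0)} = {0}.
```

The check uses the points x = 1/(2nπ) and x = 1/((2n+1)π), where cos(1/x) is +1 and −1 respectively and the 2x sin(1/x) term vanishes.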
Thus, it is not generally true that a function can be determined by its Clarke subdifferential uniquely up to an additive constant, even in the one-dimensional Lipschitzian case.

1.3. The calculus of the Clarke subdifferential only has inclusion relationships. Under some regularity conditions, these inclusion relationships become equalities.

One approach to treating these abnormalities is to assume that the set of abnormal points is negligible for the function under consideration, i.e., that the Clarke subdifferential of this function is a singleton almost everywhere. In [16][17], Qi studied the characterization problem of

the Clarke subdifferential under this assumption. He characterized the Clarke subdifferentials of such functions as cyclically maximal normal operators [16] and proved that semismooth functions satisfy this assumption [17]. One may argue that this assumption is artificial and thus trivializes the problem. However, according to a generalized version of the Rademacher Theorem [4], a locally Lipschitzian function on a separable Banach space is differentiable almost everywhere with respect to the Haar measure. Thus, if we can exclude those "extraneous subgradients" from the Clarke subdifferential, this assumption would hold for all locally Lipschitzian functions.

Several authors have proposed alternative definitions to remove these extraneous subgradients (see Treiman [21]). Michel and Penot [9][10] gave such a definition of a subdifferential. The Michel-Penot subdifferential of a locally Lipschitzian function at a point is a nonempty, convex, weak*-compact set contained in the Clarke subdifferential of the function at this point. It is a singleton whenever the G-derivative of this function exists at this point. Thus, no "extraneous subgradients" are included in their definition. The Michel-Penot subdifferential forms the principal part of the Clarke subdifferential and coincides with the Clarke subdifferential at normal points. It also coincides with Treiman's B-subdifferential in finite dimensions and was also defined by Caligaris and Oliva [3] in 1986. We follow Borwein, Fitzpatrick and Giles [2] and Ward [22] in calling it the Michel-Penot subdifferential.

We summarize the definition and basic properties of the Michel-Penot subdifferential in Section 2. In Section 3, we prove various calculus rules for the Michel-Penot subdifferential. They include two chain rules, pointwise maxima, and basic calculus. We show that equalities hold for these rules under semiregularity, a condition weaker than regularity.
In fact, a function is regular at a point if and only if it is semiregular and normal at this point. We also discuss some applications of these rules to the problem of minimizing the maximum eigenvalue of a symmetric matrix.

We study in Section 4 the characterization problem for the Michel-Penot subdifferential. We prove that a locally Lipschitzian function on a separable Banach space is determined uniquely by its Michel-Penot subdifferential up to an additive constant. We define the maximal seminormal operator there. The Michel-Penot subdifferentials of locally Lipschitzian functions, and maximal monotone operators in finite-dimensional spaces, are examples of maximal seminormal operators. A maximal seminormal operator is the Michel-Penot subdifferential of a locally Lipschitzian function if and only if it is cyclical. When a maximal seminormal operator is weak*-upper semicontinuous, it is a maximal normal operator.

One of our motivations for this study is stochastic programming. One major method for solving stochastic programs is the subgradient method [7][15]. However, the objective function of a stochastic program generally is a multiple integral in several variables [1], where the number of variables ranges from 5 to 100 [15]. The "extraneous subgradients" may have catastrophic effects on the computation. We study ways to avoid such "extraneous subgradients" by using the Michel-Penot subdifferential of integral functionals in Section 5. We first establish an inclusion relationship for the Michel-Penot subdifferential of integral functionals. Again, this inclusion relationship becomes an equality under semiregularity. Furthermore, by the generalized Rademacher Theorem, equality holds almost everywhere in finite dimensions and, in the Haar measure, in separable Banach spaces. (This is not true in general for the Clarke subdifferential.) We may estimate the subdifferentials at other points using this information. In Section 6 we study applications of the above discussions in stochastic programming. In particular, we establish general optimality conditions and propose a framework for the use of subgradient methods for finding stationary points of Lipschitzian functions.

Section 2. The Michel-Penot Subdifferential

Suppose X is a Banach space and Y is an open subset of X. Consider a locally Lipschitzian function f: Y → R. The Michel-Penot directional derivative of f at x ∈ Y in the direction h ∈ X is

    f⋄(x; h) := sup_{k∈X} limsup_{t↓0} [f(x + th + tk) − f(x + tk)]/t.

The Michel-Penot subdifferential of f at x is the set

    ∂⋄f(x) := {u ∈ X*: <u, h> ≤ f⋄(x; h), ∀h ∈ X}.
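The definition above can be probed numerically. The following Python sketch replaces the sup over all k ∈ X by a finite sample of directions and the limsup by a few small step sizes; these discretizations are our own illustrative assumptions, not part of the paper:

```python
import math

def mp_directional_derivative(f, x, h, ts=(1e-5, 1e-6, 1e-7), ks=None):
    """Crude finite-difference sketch of the Michel-Penot directional
    derivative f^MP(x; h) = sup_k limsup_{t->0} [f(x+th+tk) - f(x+tk)]/t.
    The sup over k is approximated by the finite sample ks, and the
    limsup by the largest quotient over a few small t."""
    if ks is None:
        ks = [j / 4.0 for j in range(-8, 9)]  # sampled perturbation directions
    best = -math.inf
    for k in ks:
        q = max((f(x + t * (h + k)) - f(x + t * k)) / t for t in ts)
        best = max(best, q)
    return best

# For the convex function f(x) = |x| at x = 0, f^MP(0; h) = |h|
# (here the Michel-Penot and Clarke directional derivatives agree):
assert abs(mp_directional_derivative(abs, 0.0, 1.0) - 1.0) < 1e-8
assert abs(mp_directional_derivative(abs, 0.0, -1.0) - 1.0) < 1e-8
```

For |x| at 0 the quotient is [|h + k| − |k|], whose supremum over k is exactly |h|, so even the coarse sample containing k = 0 attains it.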
Most of the following properties of the Michel-Penot directional derivative and the Michel-Penot subdifferential were given in [9] and [10]. The others are obvious.

2.1. f⁻(x; h) ≤ f⁺(x; h) ≤ f⋄(x; h) ≤ f°(x; h), where f⁻(x; h) is the lower directional derivative,

    f⁻(x; h) := liminf_{t↓0} [f(x + th) − f(x)]/t;

f⁺(x; h) is the upper directional derivative,

    f⁺(x; h) := limsup_{t↓0} [f(x + th) − f(x)]/t;

and f°(x; h) is the Clarke directional derivative,

    f°(x; h) := limsup_{y→x, t↓0} [f(y + th) − f(y)]/t.

When f⁻(x; h) = f⁺(x; h), the usual directional derivative

    f'(x; h) := lim_{t↓0} [f(x + th) − f(x)]/t

exists and f'(x; h) = f⁻(x; h) = f⁺(x; h).

2.2. The mapping f⋄(x; ·) is sublinear; f⋄(x; −h) = (−f)⋄(x; h); ∂⋄f(x) is a nonempty, convex, weak*-compact set; ∂⋄f is locally bounded in Y; and for any real number s, ∂⋄(sf) = s∂⋄f.

2.3. f⋄(x; h) = max{<u, h>: u ∈ ∂⋄f(x)}.

2.4. sup_{k∈X} {f⁺(x; h + k) − f⁺(x; k)} ≤ f⋄(x; h) ≤ sup_{k∈X} {f⁺(x; h + k) − f⁻(x; k)}. In particular, if f'(x; k) exists for all k at x, then f⋄(x; h) = sup_{k∈X} {f'(x; h + k) − f'(x; k)}. The function f is G-differentiable at x if and only if ∂⋄f(x) is a singleton; in this case, ∂⋄f(x) = {f'(x)}.

2.5. If f is a convex function, then ∂⋄f = ∂f, the subdifferential of f in the sense of convex analysis.

2.6. f°(x; h) = limsup_{y→x} f⋄(y; h); ∂⋄f(x) ⊆ ∂°f(x); and ∂⋄f(x) = ∂°f(x) if and only if ∂⋄f is weak*-upper semicontinuous at x.

2.7. Suppose that {f_i: i ∈ S} is a collection of locally Lipschitzian functions on Y. Let f: Y → R be defined by f(y) = sup{f_i(y): i ∈ S}. Then for any x in Y,

    ∂⋄f(x) ⊆ weak*-cl conv{∂⋄f_i(x): i ∈ S(x)},

where S(x) = {i: f_i(x) = f(x)}.

2.8. If f attains a local minimum or maximum at x, then 0 ∈ ∂⋄f(x). For example, let f: R → R be defined by f(x) = x²sin(1/x) + 0.5x for x ≠ 0 and f(0) = 0. Then ∂⋄f(0) = {0.5}. Thus, 0 is not an optimum point. On the other hand, ∂°f(0) = [−0.5, 1.5]. We cannot draw a conclusion about optimality from ∂°f(0).

Section 3. Semiregularity and Calculus Rules

Suppose that X, Y and f are the same as in Section 2. Let x be a point in Y. We say that f is normal at x if ∂⋄f(x) = ∂°f(x). We say that f is semiregular at x provided that for all h, the usual directional derivative f'(x; h) exists and f'(x; h) = f⋄(x; h).

Proposition 3.1. Function f is regular at x if and only if f is normal and semiregular at x. Function f is strictly differentiable at x if and only if f is normal and G-differentiable at x.

Proof: By 2.3, f is normal at x if and only if for any h, f⋄(x; h) = f°(x; h). Recall that f is said to be regular at x provided that for all h, f'(x; h) exists and f'(x; h) = f°(x; h) [5]. By 2.1, f'(x; h) ≤ f⋄(x; h) ≤ f°(x; h). The first conclusion follows. By Proposition 2.2.4 of [5], f is strictly differentiable at x if and only if ∂°f(x) is a singleton. However, ∂°f(x) is a singleton if and only if ∂⋄f(x) is a singleton and f is normal at x. By 2.4, ∂⋄f(x) is a singleton if and only if f is G-differentiable at x. The second conclusion follows. ∎

Therefore, semiregularity is a property weaker than regularity. By 2.4, if f is G-differentiable at x, then f is semiregular at x. According to the generalized Rademacher Theorem [4], a locally Lipschitzian function is differentiable almost everywhere in a finite-dimensional space and, in the Haar measure, in a separable Banach space.
Recently, Preiss [14] showed that if X is a Banach space with an equivalent norm G-(F-) differentiable

away from the origin, then every locally Lipschitzian function defined on an open subset Y of X is G-(F-) differentiable on a dense subset of Y. These results show that semiregularity holds for locally Lipschitzian functions either almost everywhere or on a dense subset under conditions on the space alone. On the other hand, a locally Lipschitzian function need not be regular almost everywhere, as Example 1.2 shows. Certainly, if f is convex, then f is semiregular at x, since a convex function is regular at every point.

The following Mean-Value Theorem for the Michel-Penot subdifferential was proved in [2]. Since the Michel-Penot subdifferential of f at a point is contained, and sometimes properly contained, in its Clarke subdifferential at this point, this theorem is stronger than the corresponding version for the Clarke subdifferential, the Lebourg Theorem (Theorem 2.3.7 of [5]). This again implies that those "subgradients" in the Clarke subdifferential but not in the Michel-Penot subdifferential are extraneous in a certain sense.

Theorem 3.2 (Mean-Value Theorem). (Borwein, Fitzpatrick and Giles) Suppose that [x, y] is a line segment in Y. Then there exists a point z in (x, y) such that

    f(y) − f(x) ∈ <∂⋄f(z), y − x>.

We now discuss the chain rule for the Michel-Penot subdifferential. Suppose that f: Y → R is given by f(y) = g(p(y)), where p: Y → Rⁿ and g: Rⁿ → R are locally Lipschitzian. Then f is also locally Lipschitzian. Denote the components of p by p_i, i = 1, 2, ..., n. Assume that g is normal at p(x), where x is a point in Y. The inclusion relationship part of the following chain rule is similar to the corresponding part of Theorem 2.3.9 of [5] for the Clarke subdifferential, but the proofs are not all the same. The difference is that in general ∂⋄p is not weak*-upper semicontinuous at x. Thus, we cannot invoke here a lemma like the one used in the proof of Theorem 2.3.9 of [5].
The equality part of this chain rule is stronger than the corresponding part of Theorem 2.3.9 of [5], requiring only semiregularity of each p_i in (i) and G-differentiability of p in (ii).

Theorem 3.3 (Chain Rule I). One has

    ∂⋄f(x) ⊆ weak*-cl conv S,    (1)

where

    S = {Σ_i a_i v_i: v_i ∈ ∂⋄p_i(x), a ∈ ∂⋄g(p(x))},

and equality holds under either of the following additional hypotheses:

(i) g is regular at p(x), each p_i is semiregular at x, and every element of ∂⋄g(p(x)) has nonnegative components. In this case, it follows that f is semiregular at x.

(ii) g is semiregular at p(x) and p is G-differentiable at x. In this case, it follows that f is regular at x, and the weak*-cl conv is superfluous.

Proof: The set S is weak*-compact, and therefore its convex hull has weak*-compact closure. The support function of S evaluated at a point h in X is

    r(h) := max{Σ_i a_i <v_i, h>: v_i ∈ ∂⋄p_i(x), a ∈ ∂⋄g(p(x))}.    (2)

By 2.3, it suffices to prove that r(h) majorizes f⋄(x; h) for any h. Applying the Mean-Value Theorem to g, for any h and k in X and small positive t, we have

    [f(x + th + tk) − f(x + tk)]/t = [g(p(x + th + tk)) − g(p(x + tk))]/t
        = Σ_i a_i(z)[p_i(x + th + tk) − p_i(x + tk)]/t,    (3)

where a(z) ∈ ∂⋄g(z), z ∈ [p(x + tk), p(x + th + tk)]. Fix h and k in X. On one hand,

    max{<v_i, h>: v_i ∈ ∂⋄p_i(x)} = p_i⋄(x; h) ≥ limsup_{t↓0} [p_i(x + th + tk) − p_i(x + tk)]/t.

On the other hand,

    min{<v_i, h>: v_i ∈ ∂⋄p_i(x)} = −max{<v_i, −h>: v_i ∈ ∂⋄p_i(x)} = −p_i⋄(x; −h)
        ≤ −limsup_{t↓0} [p_i(x − th + tk') − p_i(x + tk')]/t,    where k' := k + h,
        = −limsup_{t↓0} [p_i(x + tk) − p_i(x + th + tk)]/t
        = liminf_{t↓0} [p_i(x + th + tk) − p_i(x + tk)]/t.

Thus, for ε > 0, there exists δ > 0 such that for all t ∈ (0, δ],

    [p_i(x + th + tk) − p_i(x + tk)]/t ∈ <∂⋄p_i(x), h> + [−ε, ε].    (4)

Since g is normal at p(x), we may choose δ small enough such that for all t ∈ (0, δ],

    a(z) ∈ ∂⋄g(p(x)) + [−ε, ε]ⁿ.    (5)

Combining (2), (3), (4) and (5), we have

    [f(x + th + tk) − f(x + tk)]/t ≤ r(h) + n(L + L_i ||h|| + ε)ε for any t ∈ (0, δ],

where L is the local bound of ∂⋄g near p(x) and the L_i are the local bounds of ∂⋄p_i near x. Therefore, for any h in X,

    f⋄(x; h) = sup_{k∈X} limsup_{t↓0} [f(x + th + tk) − f(x + tk)]/t ≤ r(h).

This proves (1). The proof of the equality under condition (i) is almost identical to that of the corresponding part of Theorem 2.3.9 of [5], except that we use semiregularity and the Michel-Penot subdifferential for the p_i and f instead of regularity and the Clarke subdifferential there. When p is G-differentiable at x, each p_i is semiregular at x by 2.4. Thus, condition (ii) is a special case of (i). In this case, each ∂⋄p_i(x) is single-valued. Therefore, S is already convex and weak*-closed, and the weak*-cl conv is superfluous. This completes the proof. ∎

Remark 3.4. If g is not normal at p(x), we may replace ∂⋄g(p(x)) by ∂°g(p(x)) in the expression for S. The claims still hold. ∎

Instead of assuming that g is normal at p(x), we now assume that g is G-differentiable at p(x). Notice that for locally Lipschitzian functions on a finite-dimensional space, G-differentiability is the same as F-differentiability. The first two examples in the introduction show that G-differentiability in general does not guarantee normality.

Theorem 3.5 (Chain Rule II). In this case, one has

    ∂⋄f(x) ⊆ S := {Σ_i a_i v_i: v_i ∈ ∂⋄p_i(x), a = g'(p(x))},    (6)

and equality holds if each p_i is semiregular at x and g'(p(x)) has nonnegative components. In this case, it follows that f is semiregular at x.

Proof: We only point out the differences between this proof and the proof of Theorem 3.3. Instead of (3), we have, for fixed h and k,

    [f(x + th + tk) − f(x + tk)]/t = [g(p(x + th + tk)) − g(p(x + tk))]/t
        = [<g'(p(x)), p(x + th + tk) − p(x + tk)> + o(||p(x + th + tk) − p(x + tk)||)]/t    (7)
        = Σ_i a_i [p_i(x + th + tk) − p_i(x + tk)]/t + o(1).

Combining (7) and (4), we obtain the conclusion. In this case, S is already convex and weak*-closed. Hence, we do not need to take the weak*-closure of the convex hull of S in (6). The proof for equality is similar to the corresponding part of the proof of Theorem 2.3.9 of [5]. ∎

Remark 3.6. Michel and Penot [9] also proved a chain rule, which requires that p be differentiable at x. Our chain rules are more useful for deriving the following calculus rules. ∎

Proposition 3.7 (Pointwise Maxima). Suppose that {f_i: i = 1, ..., n} is a finite collection of locally Lipschitzian functions on Y. Let f: Y → R be defined by f(y) = max{f_i(y): i = 1, ..., n}. Then for any x in Y,

    ∂⋄f(x) ⊆ conv{∂⋄f_i(x): i ∈ I(x)},

where I(x) = {i: f_i(x) = f(x)}. If f_i is semiregular at x for each i in I(x), then equality holds and f is semiregular at x.

Proof: The arguments are similar to those for Proposition 2.3.12 of [5], except that we use Theorem 3.3 here. ∎
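A one-dimensional sanity check of the pointwise-maxima rule can be run numerically; the pieces below are illustrative choices, not from the paper. With f = max(f1, f2), f1(x) = x, f2(x) = −x (so f(x) = |x|), both pieces are active at 0, the rule gives conv{1, −1} = [−1, 1], and by 2.3 the Michel-Penot directional derivative equals the support function max(h, −h) = |h|:

```python
# Sketch of Proposition 3.7 with two smooth (hence semiregular) pieces.
f1 = lambda x: x
f2 = lambda x: -x
f = lambda x: max(f1(x), f2(x))   # this is |x|

grads_active = [1.0, -1.0]        # gradients of the active pieces at x = 0
t = 1e-8
for h in (1.0, -2.5, 0.3):
    support = max(g * h for g in grads_active)   # predicted f^MP(0; h)
    quotient = (f(t * h) - f(0.0)) / t           # one-sided quotient at 0
    assert abs(quotient - support) < 1e-6
```

Since |x| is convex, its directional derivative at 0 exists and equals the one-sided quotient, which is why a single small t suffices here.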

Remark 3.8. The inclusion relationship is a special case of 2.7.

Proposition 3.9 (Basic Calculus). Suppose that f and g are two locally Lipschitzian functions on Y. Then

    ∂⋄(f + g)(x) ⊆ ∂⋄f(x) + ∂⋄g(x);
    ∂⋄(fg)(x) ⊆ g(x)∂⋄f(x) + f(x)∂⋄g(x);
    ∂⋄(f/g)(x) ⊆ [g(x)∂⋄f(x) − f(x)∂⋄g(x)]/[g(x)]², if g(x) ≠ 0.

Furthermore, if f and g are semiregular at x, then equality holds in the first formula and f + g is semiregular at x; if f and g are semiregular at x, and f(x) ≥ 0, g(x) ≥ 0, then equality holds in the second formula and fg is semiregular at x; if f and g are semiregular at x, and f(x) ≥ 0, g(x) > 0, then equality holds in the third formula and f/g is semiregular at x.

Proof: The arguments are similar to those for Propositions 2.3.13 and 2.3.14 of [5], except that we use Theorem 3.3 here. ∎

The following shows that equality is obtained in Proposition 3.9 almost everywhere in certain spaces.

Proposition 3.10. Suppose Y is an open subset of X, a separable Banach space. Then equality holds for the formulas in Proposition 3.9 almost everywhere in Y in the Haar measure.

Proof: When X is a separable Banach space, a generalized version of the Rademacher Theorem under the Haar measure holds [4]. Thus, f and g are differentiable almost everywhere in the sense of the Haar measure. The right-hand side of each formula in Proposition 3.9 is therefore a singleton almost everywhere in the Haar measure on Y, proving the result. ∎

We now give some applications of the above discussion.

Theorem 3.11 (Maximum Eigenvalue). Let A(x) be a G-differentiable n × n symmetric matrix-valued function of a variable x in Rᵐ. Let f(x) be the maximum eigenvalue of A(x). Then f is a locally Lipschitzian, semiregular function, and

    ∂⋄f(x) = conv{[<A_1'(x)u, u>, ..., <A_m'(x)u, u>]: u ∈ S},

where S is the set of unit eigenvectors belonging to the maximum eigenvalue of A(x) and A_i'(x) denotes the partial derivative of A with respect to the i-th component of x. If the maximum eigenvalue of A(x) has multiplicity 1, then f is G-differentiable at x, and

    f'(x) = [<A_1'(x)u, u>, ..., <A_m'(x)u, u>],

where u is the unit eigenvector belonging to the maximum eigenvalue of A(x).

Proof: Let g be the function mapping the n × n symmetric matrix space to R such that g(y) is the maximum eigenvalue of y. Then g is a convex function [12] and

    ∂g(y) = conv{uuᵀ: u ∈ S(y)},

where S(y) is the set of unit eigenvectors belonging to the maximum eigenvalue of y. Combining this with Theorem 3.3, we get our result. ∎

This theorem generalizes Proposition 2.8.8 of [5], which is valid when A is smooth. We can thus calculate the directional derivatives of f at x, since they are the same as the Michel-Penot directional derivatives in this case. This is useful for finding a descent direction of f at x if optimality has not been achieved [12][13]. For more discussion of eigenvalues and singular values of nonsmooth matrix-valued functions, see [19].

Section 4. Characterization of the Michel-Penot Subdifferential

Suppose again that X is a separable Banach space, that Y is an open subset of X, and that f is a locally Lipschitzian function on Y. Thus, f is differentiable almost everywhere in the sense of the Haar measure, as given above. By 2.4, ∂⋄f(x) is equal to {f'(x)} almost everywhere. Let [x, y] be a line segment in Y and x' → x, y' = y + (x' − x). It follows from the Fubini Theorem that there are points x' approximating x with the property that ∂⋄f is single-valued almost everywhere on the line segment [x', y']. On [x', y'],

    ∫₀¹ <∂⋄f(x' + t(y − x)), y − x> dt = f(y') − f(x').

Define

    ∫₀¹ <∂⋄f(x + t(y − x)), y − x> dt := lim_{x'→x} ∫₀¹ <∂⋄f(x' + t(y − x)), y − x> dt,

where the x' are those points such that ∂⋄f is single-valued almost everywhere on [x', y']. Then

    ∫₀¹ <∂⋄f(x + t(y − x)), y − x> dt = lim_{x'→x} [f(y') − f(x')] = f(y) − f(x).

Suppose now that for any two points x and y in Y, there exists a finite collection of line segments [x_0 = x, x_1], [x_1, x_2], ..., [x_{m−1}, x_m = y] in Y connecting them. We call such a set Y linearly connected.

Theorem 4.1 (Unique Determination Theorem). Under the above assumptions, f is determined by ∂⋄f uniquely up to an additive constant.

Proof: Fix a point x in Y. For any y in Y, let [x_0 = x, x_1], [x_1, x_2], ..., [x_{m−1}, x_m = y] be a finite collection of line segments in Y connecting x and y. Then for any such f with the same ∂⋄f,

    f(y) = f(x) + Σ_{i=1}^{m} [f(x_i) − f(x_{i−1})]
         = f(x) + Σ_{i=1}^{m} ∫₀¹ <∂⋄f(x_{i−1} + t(x_i − x_{i−1})), x_i − x_{i−1}> dt,

which is uniquely determined up to the additive constant f(x). ∎

Corollary 4.2. For any cycle of line segments [x_0 = x, x_1], [x_1, x_2], ..., [x_m, x_{m+1} = x] in Y,

    Σ_{i=1}^{m+1} ∫₀¹ <∂⋄f(x_{i−1} + t(x_i − x_{i−1})), x_i − x_{i−1}> dt = 0.

We now study the characterization problem for the Michel-Penot subdifferential, i.e., what kind of set-valued operators are the Michel-Penot subdifferentials of locally Lipschitzian functions? Suppose that X is a separable Banach space and that Y is a linearly connected open set in X. A set-valued operator F: Y → X* is called a maximal seminormal operator if

(i) F is single-valued almost everywhere in Y in the Haar measure;

(ii) F is locally bounded in Y;

(iii) for any x ∈ Y, F(x) is a nonempty, convex, weak*-compact set;

(iv) for any x ∈ Y and h ∈ X,

    max{<u, h>: u ∈ F(x)} = sup_{k∈X} limsup_{t↓0} ∫₀¹ <F(x + t(k + sh)), h> ds,    (8)

where the integral applies to the intervals on which F is single-valued almost everywhere in the corresponding one-dimensional measure.

By Section 2 and Corollary 4.2, the following is easy to see.

Proposition 4.3. The Michel-Penot subdifferential of a locally Lipschitzian function defined on Y is a maximal seminormal operator. ∎

Another candidate is the maximal monotone operator.

Proposition 4.4. Suppose that X is a finite-dimensional space. A maximal monotone operator F defined on Y is a maximal seminormal operator.

Proof: According to Mignot [11], we have (i). Conditions (ii) and (iii) are commonly known for maximal monotone operators. It suffices now to prove (iv). Let x be in Y, h be in X, t > 0 and s ∈ (0, 1]. Let u ∈ F(x) and v ∈ F(x + tsh). Then <v − u, tsh> ≥ 0, i.e., <u, h> ≤ <v, h>. Thus,

    <u, h> ≤ limsup_{t↓0} ∫₀¹ <F(x + tsh), h> ds.

Therefore, the left-hand side of (8) is not greater than its right-hand side. On the other hand, let x be in Y, h and k in X, t > 0 and s ∈ [0, 1). Let v ∈ F(x + t(k + sh)) and w ∈ F(x + tk + th). Then

    <w − v, t(1 − s)h> ≥ 0,

i.e., <v, h> ≤ <w, h>. Thus,

    limsup_{t↓0} ∫₀¹ <F(x + t(k + sh)), h> ds ≤ limsup_{t↓0} <w, h> ≤ max{<u, h>: u ∈ F(x)}.

Therefore, the right-hand side of (8) is not greater than its left-hand side. Hence (8) holds. ∎

Let [x, y] be a line segment in Y and x' → x, y' = y + (x' − x). It follows from the Fubini Theorem that there are points x' approximating x with the property that F is single-valued almost everywhere on the line segment [x', y']. Define

    ∫₀¹ <F(x + t(y − x)), y − x> dt := limsup_{x'→x} ∫₀¹ <F(x' + t(y − x)), y − x> dt,

where the x' are those points such that F is single-valued almost everywhere on [x', y']. This quantity is finite because of the local boundedness of F. We call F cyclical if for any cycle of line segments [x_0 = x, x_1], [x_1, x_2], ..., [x_m, x_{m+1} = x] in Y,

    Σ_{i=1}^{m+1} ∫₀¹ <F(x_{i−1} + t(x_i − x_{i−1})), x_i − x_{i−1}> dt = 0.

This property characterizes the Michel-Penot subdifferential of a locally Lipschitzian function on Y.

Theorem 4.5 (Characterization Theorem). A maximal seminormal operator F, defined on Y, is the Michel-Penot subdifferential of a locally Lipschitzian function if and only if it is cyclical.

Proof: The necessity of this condition is due to Corollary 4.2. We now prove its sufficiency. Suppose that F is cyclical. Fix a point x in Y. For any y ∈ Y, let [x_0 = x, x_1], [x_1, x_2], ..., [x_{m−1}, x_m = y] be a finite collection of line segments in Y connecting x and y. Define f: Y → R by

    f(y) = Σ_{i=1}^{m} ∫₀¹ <F(x_{i−1} + t(x_i − x_{i−1})), x_i − x_{i−1}> dt,

for any y in Y. Since F is cyclical, this definition is unique, no matter what collection of line segments we choose to connect x and y. By the local boundedness of F, we see that f is locally Lipschitzian. For fixed k and h, and very small t,

    ∫₀¹ <F(y + t(k + sh)), h> ds = [f(y + th + tk) − f(y + tk)]/t.

Thus, by (8) and the definitions of the Michel-Penot directional derivative and the Michel-Penot subdifferential,

    max{<u, h>: u ∈ F(y)} = f⋄(y; h)

for any y ∈ Y and any h ∈ X, and ∂⋄f(y) = F(y) for any y ∈ Y. This completes the proof. ∎

Theorem 4.5 is thus parallel to the characterization theorem for the subdifferential of convex functions. It is also parallel to the discussion of Qi [16] for the Clarke subdifferential of nonsmooth functions under the assumption mentioned in our introduction. If a set-valued operator F satisfies (i) and

(v) for any x ∈ Y,

    F(x) = weak*-cl conv{weak*-lim u_i: u_i ∈ F(x_i), x_i → x},    (9)

then F is called a maximal normal operator [16].

Proposition 4.6. If a maximal seminormal operator F, defined on Y, is weak*-upper semicontinuous at a point x in Y, then (9) holds at this point.

Proof: Denote the right-hand side of (9) by C. For any h ∈ X, by the weak*-upper semicontinuity of F at x,

    max{<u, h>: u ∈ C} ≤ max{<u, h>: u ∈ F(x)}.

On the other hand, by (8),

    max{<u, h>: u ∈ F(x)} = sup_{k∈X} limsup_{t↓0} ∫₀¹ <F(x + t(k + sh)), h> ds ≤ max{<u, h>: u ∈ C}.

Thus, F(x) = C, i.e., (9) holds. ∎

Therefore, a maximal seminormal operator is a maximal normal operator if it is weak*-upper semicontinuous on Y. A maximal normal operator is the Clarke subdifferential of a locally Lipschitzian function if and only if it is cyclical [16]. Theorem 4.5 generalizes this result. For other discussions of maximal normal operators, see [17] and [18].

Section 5. Integral Functionals

Again, we assume that X is a separable Banach space and Y is an open subset of X. Suppose that (W, A, P) is a positive measure space. Consider the following integral functional:

    f(x) = ∫_W f_w(x) P(dw),

where

(i) for each x in Y, the map w → f_w(x) is measurable;

(ii) there exists an integrable function g: W → R such that for any x and y ∈ Y and w ∈ W, one has

    |f_w(x) − f_w(y)| ≤ g(w)||x − y||.

Consider the Michel-Penot subdifferential of f. We first establish the inclusion relationship

    ∂⋄f(x) ⊆ ∫_W ∂⋄f_w(x) P(dw).    (10)

Then we try to get exact formulas for ∂⋄f(x). As in [5], (10) has the following interpretation: for every ζ ∈ ∂⋄f(x), there exists a mapping w → ζ_w from W to X*, where ζ_w ∈ ∂⋄f_w(x) a.e. relative to P, such that for every v ∈ X, the function w → <ζ_w, v> is integrable and <ζ, v> = ∫_W <ζ_w, v> P(dw).
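When W is a finite scenario set, the integral functional reduces to a probability-weighted sum, and the subgradient calculus can be checked directly. The following Python sketch uses simple-recourse integrands f_w(x) = max(0, d_w − x) with illustrative data p_w, d_w (assumptions of ours, not from the paper). At any x distinct from every d_w, each f_w is differentiable, hence semiregular, at x, so the inclusion (10) holds with equality:

```python
# Discrete-scenario sketch of the integral functional of Section 5:
# f(x) = sum_w p_w f_w(x), with f_w(x) = max(0, d_w - x) a shortage
# cost for demand d_w.  The data p_w, d_w are purely illustrative.
probs = [0.2, 0.5, 0.3]
demands = [1.0, 2.0, 4.0]

def f(x):
    return sum(p * max(0.0, d - x) for p, d in zip(probs, demands))

def subgrad(x):
    # gradient of f_w at x (valid when x != d_w): -1 if d_w > x, else 0,
    # weighted and summed as on the right-hand side of (10)
    return sum(p * (-1.0 if d > x else 0.0) for p, d in zip(probs, demands))

x, t = 2.5, 1e-7        # x differs from every d_w
fd = (f(x + t) - f(x - t)) / (2 * t)   # central-difference check
assert abs(fd - subgrad(x)) < 1e-6
```

At points x equal to some d_w the integrand is kinked and only the inclusion (10) is guaranteed; such points form a null set here, matching the almost-everywhere equality discussed below.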

Lemma 5.1. Suppose that f is defined at some point x ∈ Y. Then f is defined and Lipschitzian on Y.

Proof: Let y be any point in Y. Then the maps w → f_w(x) and w → f_w(y) are measurable. Hence,

    |f(x) − f(y)| ≤ ∫_W |f_w(x) − f_w(y)| P(dw) ≤ L||x − y||,

where L = ∫_W g(w) P(dw). Thus f is defined and Lipschitzian on Y. ∎

Theorem 5.2. Under the above assumptions, for any x ∈ Y, (10) holds. Further, if f_w is semiregular at x for each w, then f is semiregular at x and equality holds in (10).

Proof: This proof is similar to the proof of Theorem 2.7.2 of [5]. Let u ∈ ∂⋄f(x), and let h be any vector in X. Then by the definitions of ∂⋄f(x) and f⋄(x; h),

    <u, h> ≤ f⋄(x; h) = sup_{k∈X} limsup_{t↓0} ∫_W ([f_w(x + th + tk) − f_w(x + tk)]/t) P(dw).

According to conditions (i) and (ii) on f and Fatou's Lemma, the right-hand side is not greater than

    ∫_W sup_{k∈X} {limsup_{t↓0} [f_w(x + th + tk) − f_w(x + tk)]/t} P(dw) = ∫_W f_w⋄(x; h) P(dw).

By 2.2, the mapping f_w⋄(x; ·) is convex. Let

    r(h) = ∫_W f_w⋄(x; h) P(dw).

Then r is also a convex function and r(0) = 0. Thus,

    <u, h> ≤ r(h) − r(0)

for any h ∈ X. Thus, u ∈ ∂r(0). If the map w → r_w(h) := f_w⋄(x; h) is measurable for each h, then ∂r(0) is contained in ∫_W ∂r_w(0) P(dw) [5][8]. However, ∂r_w(0) = ∂⋄f_w(x) by construction, so the result would follow. Consider the map w → r_w(h) for each h:

    r_w(h) = sup_{k∈X} limsup_{t↓0} [f_w(x + th + tk) − f_w(x + tk)]/t.

Since f_w(·) is continuous, we may let t ↓ 0 take only rational values and let k take values in a countable dense subset of X. Therefore, r_w(h), as a function of w, is a countable supremum of countable lim sups of measurable functions of w. Thus, it is also a measurable function of w. This proves (10). The proof of the second assertion is identical to the corresponding part of the proof of Theorem 2.7.2 of [5], using semiregularity and the Michel-Penot subdifferential instead of regularity and the Clarke subdifferential. ∎

Theorem 5.2 is essentially a Michel-Penot subdifferential version of Theorem 2.7.2 of [5]. The advantage obtained in this case is that under moderate requirements on X and W, the right-hand side of (10) is a singleton almost everywhere in Y; thus (10) becomes an equality almost everywhere in Y. In stochastic programming, X is a space of functions on W. If W is countable, or X is restricted to continuous functions or a finite-dimensional space, we may use Theorem 5.2 directly. In general, however, X is an L_p space. The next theorem states the results for L_∞. The result for general p follows closely.

Theorem 5.3. Let (W, A, P) be a σ-finite positive measure space and let L_∞(W, Rⁿ) be the space of measurable essentially bounded functions mapping W to Rⁿ. Suppose X is a closed subspace of L_∞(W, Rⁿ) and f_w: Rⁿ → R is such that (i) holds and

(ii') there exist ε > 0 and a function g ∈ L₁(W, R) such that, for all w ∈ W and k₁, k₂ ∈ B(x(w), ε),

    |f_w(k₁) − f_w(k₂)| ≤ g(w)||k₁ − k₂||,    (11)

where B(x(w), ε) is the ball of radius ε about x(w) in Rⁿ. Then, for any x satisfying these conditions, f is Lipschitzian near x and (10) holds.
Furthermore, equality holds in (10) if fw is semiregular at x(w) for all w. Proof: First to show that f is Lipschitzian near x, suppose lx - ylloo = 6 < e, so that 19

P{w: ||x(w) − y(w)|| > δ} = 0. Then,

|f(x) − f(y)| ≤ ∫_W |f_w(x(w)) − f_w(y(w))| P(dw)
= ∫_{{w: ||x(w)−y(w)|| ≤ δ}} |f_w(x(w)) − f_w(y(w))| P(dw) + ∫_{{w: ||x(w)−y(w)|| > δ}} |f_w(x(w)) − f_w(y(w))| P(dw)   (12)
≤ L δ,

where L = ∫_W g(w) P(dw). Hence (12) implies that f is Lipschitzian for all y ∈ B_{L_∞}(x, ε). The proof of Theorem 5.2 can now be applied in exactly the same way, noting that X is separable. ∎

The results in Theorems 5.2 and 5.3 lead to a complete characterization of subgradients of the integral functional through subgradients of the integrand. In the first case, we assume that X and W are finite dimensional and use the Lebesgue measure. In stochastic programming, this corresponds to decisions x that must be taken before the realization of random outcomes. A common example is the simple recourse stochastic program (see Wets [23]).

Proposition 5.4. If X and W are finite dimensional, then under the assumptions of Theorem 5.2, the right-hand side of (10) is single-valued over a subset Y' of Y such that Y'' = Y \ Y' has zero measure. Thus equality holds in (10) on Y'.

Proof: Since f_w is Lipschitzian in Y for each w, ∂⋄f_w(x) is single-valued almost everywhere in Y for each w by the Rademacher theorem. Let

Y' = {x ∈ Y: ∂⋄f_w(x) is single-valued almost everywhere in W}.

Then the right-hand side of (10) is single-valued over Y'. By the Fubini theorem, the measure of Y'' = Y \ Y' is zero. ∎

In the more general case, equality in (10) holds almost everywhere in the Haar measure around x, using Theorem 5.3.

Proposition 5.5. Under the assumptions of Theorem 5.3, equality holds in (10) almost everywhere in the Haar measure on B_{L_∞}(x, ε).
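The simple recourse case mentioned above gives a concrete finite-dimensional illustration of Proposition 5.4. The following Python sketch uses the assumed illustrative form f_w(x) = max(w − x, 0) (a shortage penalty) with made-up scenario data: f(x) = E[max(w − x, 0)] is then piecewise linear, by the Rademacher theorem it is differentiable off the finite set of scenario values, and there f'(x) = −P{w > x}, matching the single-valued right-hand side of (10).

```python
# Finite-scenario sketch of Proposition 5.4 for a simple recourse integrand.
# Assumed illustrative form: f_w(x) = max(w - x, 0), scenarios (w, p_w) below
# are arbitrary choices, not data from the paper.

scenarios = [(1.0, 0.25), (2.0, 0.25), (3.0, 0.5)]   # (w, p_w), p sums to 1

def f(x):
    # f(x) = E[max(w - x, 0)], the expected shortage penalty
    return sum(p * max(w - x, 0.0) for w, p in scenarios)

def numeric_derivative(g, x, h=1e-6):
    # central finite difference, valid away from the kinks of g
    return (g(x + h) - g(x - h)) / (2 * h)

x = 2.5                                    # not a scenario value: f'(x) exists
expected = -sum(p for w, p in scenarios if w > x)    # -P{w > x}
print(abs(numeric_derivative(f, x) - expected) < 1e-6)
```

At a scenario value, f has a kink and the subdifferential is an interval rather than a singleton, which is exactly why equality in (10) is asserted only off the null set Y''.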

Proof: The proof follows similarly to that of Proposition 5.4. Note that f_w is Lipschitzian on R^n near x(w) for each w. Hence ∂⋄f_w is single-valued almost everywhere in B_{R^n}(x(w), ε). The assumptions imply that x(w) is finite for all w, so we may translate to the origin without loss of generality by suitably redefining f_w(v) as f_w(v − x(w)) for any v ∈ R^n. We follow this translation (which does not affect the Haar measure on L_∞) and assume x = 0. Consider the product measure on R^n × W. From the Fubini theorem, the set

{(w, y): ||y|| < ε and ∂⋄f_w(y) is not single-valued}

must have zero measure. Moreover, for almost all y ∈ R^n with ||y|| < ε, the set {w: ∂⋄f_w(y) is not single-valued} has zero measure.

We now construct finite-dimensional subspaces of L_∞ on which (10) holds almost everywhere near x. Since W is separable, we may construct a sequence of finite sigma fields U_i ⊂ A such that U_i ⊂ U_{i+1} and, for any B ∈ A, there exists B' ∈ ∪_i U_i such that B ⊃ B' and P{B \ B'} = 0. For each i, consider V_i ⊂ L_∞(W, R^n), where V_i = {y: y is measurable on U_i}. This subspace is finite dimensional and equivalent to a |U_i|-fold product of R^n. Also note that the union of the subspaces V_i is dense in L_∞. Again, ∂⋄f_w is single-valued almost everywhere in B_{V_i}(ε) under the Lebesgue product measure on V_i. The Fubini theorem gives that, for almost all y ∈ V_i with ||y|| < ε, the set {w: ∂⋄f_w(y) is not single-valued} has zero measure. Thus the set of y, ||y|| < ε, for which {w: ∂⋄f_w(y) is not single-valued} does not have zero measure, itself has zero measure in finite-dimensional subspaces whose union is dense in L_∞(W, R^n). Hence, for almost all ||y|| < ε in the Haar measure on L_∞(W, R^n), {w: ∂⋄f_w(y) is not single-valued} has zero measure (using [4, Theorem 7.2]). Therefore (10) holds almost everywhere near x. ∎

Notice that ∂⋄f(x) = {f'(x)} a.e. by 2.4. The next result relates this to the Clarke subdifferential.

Proposition 5.6. For x ∈ Y'', as defined above, the Clarke subdifferential

∂f(x) = conv{lim_i f'(x_i): x_i → x, x_i ∈ Y'};

the Michel-Penot subdifferential can be determined by

max{⟨u, h⟩: u ∈ ∂⋄f(x)}

= sup_{k ∈ X} limsup_{t↓0} ∫_0^1 ⟨f'(x + t(k + sh)), h⟩ ds,

where the integral is defined only where the intersection of Y'' with [x + tk, x + tk + th] has one-dimensional measure zero.

Proof: The first claim is due to Theorem 2.5.1 of [5]. The second is due to an argument similar to that in Section 4. ∎

Section 6. Applications in Stochastic Programming

Much of our motivation comes from analysis and computation in stochastic programming. Useful optimality conditions for these problems follow directly from the results above. The general form of the stochastic program that we consider is

(SP)   inf_{x ∈ X} F(x)   subject to   g_i(x, w) ≤ 0 a.e., ∀ i ∈ I,

where F(x) = ∫_W f_w(x) P(dw), X is a separable Banach space, I is a finite index set, and (W, A, P) is a positive measure space.

In the stochastic programming context, X is a space of functions on W. The functions f and g_i may be written f(x(w), w) and g_i(x(w), w). To distinguish a random vector, we use the bold-face notation convention, e.g., x. In general, X is a closed subspace of L_p(W, A, P; R^n) for 1 ≤ p ≤ ∞. The random vector x = (x_1, x_2), where x_1: W → R^{n_1} and x_2: W → R^{n_2}, with n = n_1 + n_2. This corresponds to an initial, or first-period, decision x_1 followed by a future, or second-period, decision x_2, which may in turn be partitioned into decisions at distinct points in time. To obtain implementable decisions in this context, all points in X satisfy nonanticipativity, defined by

x_1 − E{x_1 | A_1} = 0, a.e.,   (13)

where E{· | A_1} represents conditional expectation with respect to a sigma field A_1 ⊂ A. Practically, this corresponds to requiring that a decision enforced before the realization of some random outcome depend only on data available at that time. In multiple-period problems, several subfields are nested to correspond to the resolution of random outcomes as time proceeds.
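The nonanticipativity condition (13) is easy to make concrete when W has finitely many scenarios: A_1 is then generated by a partition of W into atoms, and E{x_1 | A_1} averages x_1 over each atom with respect to P. The Python sketch below (scenario probabilities, partition, and decision values are illustrative assumptions) computes this conditional expectation and checks that the projection is idempotent, so x_1 satisfies (13) exactly when it is constant on every atom of A_1.

```python
# Finite-scenario sketch of the nonanticipativity condition (13).
# probs, atoms, and x1 below are illustrative assumptions, not paper data.

probs = {"w1": 0.25, "w2": 0.25, "w3": 0.5}   # P on three scenarios
atoms = [{"w1", "w2"}, {"w3"}]                # atoms generating the field A_1

def conditional_expectation(x1, atoms, probs):
    """Return E{x_1 | A_1}: the atom-wise probability-weighted average."""
    out = {}
    for atom in atoms:
        mass = sum(probs[w] for w in atom)
        mean = sum(probs[w] * x1[w] for w in atom) / mass
        for w in atom:
            out[w] = mean
    return out

# An anticipative decision: it distinguishes w1 from w2 inside one atom.
x1 = {"w1": 4.0, "w2": 2.0, "w3": 7.0}
proj = conditional_expectation(x1, atoms, probs)      # nonanticipative version
again = conditional_expectation(proj, atoms, probs)   # projecting twice
print(all(abs(proj[w] - again[w]) < 1e-12 for w in probs))   # idempotent
```

In a multiple-period problem, one such projection is applied per stage, with the nested sigma fields described above.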

The following theorem gives a necessary condition for an optimal solution to the general problem (SP).

Theorem 6.1. If x̄ is an optimal solution to (SP), and f and g_i, i ∈ I, satisfy conditions 5.i and 5.ii', then there exist some λ ≥ 0 and a measurable function v ∈ X^⊥, the annihilator of X in X*, such that

v(w) = ξ(w) + λ η(w),   (14)

where ξ(w) ∈ ∂⋄f_w(x̄(w)), η(w) ∈ conv_{i ∈ I(x̄(w), w)} {∂⋄g_i(x̄(w), w)}, and I(x̄(w), w) = {i: g_i(x̄(w), w) = 0}.

Proof: We begin by rewriting the constraints as

G(x) = ∫_W {max_{i ∈ I} g_i(x(w), w)} P(dw) ≤ 0.   (15)

Now (SP) becomes an optimization problem in x with a single constraint. From Michel and Penot [9] (a direct application of 2.8 and Proposition 3.7), if x̄ is optimal in (SP), then

0 ∈ ∂⋄F(x̄) + λ ∂⋄G(x̄),   (16)

for some λ ≥ 0. Using Theorem 5.3, we have

∂⋄F(x̄) ⊂ ∫_W ∂⋄f_w(x̄) P(dw)   and   ∂⋄G(x̄) ⊂ ∫_W ∂⋄[max_{i ∈ I} g_i(x̄(w), w)] P(dw).

From Proposition 3.7,

∂⋄[max_{i ∈ I} g_i(x̄(w), w)] ⊂ conv{∂⋄g_i(x̄(w), w): i ∈ I(x̄(w), w)},   (17)

where I(x̄(w), w) = {i: g_i(x̄(w), w) = 0}. The application of Theorem 5.3 and (17) to the right-hand side of (16) yields

∂⋄F(x̄) + λ ∂⋄G(x̄) ⊂ ∫_W {∂⋄f_w(x̄(w)) + λ conv{∂⋄g_i(x̄(w), w): i ∈ I(x̄(w), w)}} P(dw).   (18)

Inclusions (16) and (18) imply that

0 = ∫_W ⟨ξ(w) + λ η(w), v⟩ P(dw),   (19)

for all v ∈ X and some ξ(w) ∈ ∂⋄f_w(x̄(w)) and η(w) ∈ conv_{i ∈ I(x̄(w), w)} {∂⋄g_i(x̄(w), w)}. This proves the result. ∎

The theorem is strengthened if we can guarantee that v = 0 may be taken in (14). This is true if A_1 = A, but not generally true otherwise. An additional assumption of some feasible extension of any initial decision is required to obtain this stronger result (see Dempster [6] for a discussion). Theorem 6.1 can, however, be used in conjunction with 5.5 in the construction of a class of algorithms for (SP). If we can find ξ(w) and η(w) such that the right-hand side of (19) is less than some ε, then Proposition 5.5 assures us that we have an ε-subgradient of F + λG. This can be combined with a subgradient method to obtain stationary points of F + λG.

ACKNOWLEDGEMENT

The authors are thankful to Dr. Jeyakumar for references [3] and [22].

REFERENCES

[1] J.R. Birge and L. Qi, "Computing block-angular Karmarkar projections with applications to stochastic programming", Management Science 34 (1988) 1472-1479.
[2] J.M. Borwein, S.P. Fitzpatrick and J.R. Giles, "The differentiability of real functions on normed linear space using generalized subgradients", Journal of Mathematical Analysis & Applications 128 (1987) 512-534.
[3] O. Caligaris and P. Oliva, "Necessary conditions for local Pareto minima", Bollettino U.M.I. 5-B (1986) 721-750.
[4] J.P.R. Christensen, Topology and Borel Structure, North-Holland, Amsterdam, 1974.
[5] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[6] M.A.H. Dempster, "On stochastic programming II: dynamic problems under risk", Stochastics 25 (1988) 15-42.
[7] Y. Ermoliev and R. Wets, Numerical Techniques in Stochastic Programming, Springer-Verlag, Berlin, 1988.

[8] A.D. Ioffe and V.M. Tihomirov, Theory of Extremal Problems, North-Holland, Amsterdam, 1974.
[9] P. Michel and J.-P. Penot, "Calcul sous-différentiel pour des fonctions lipschitziennes et non lipschitziennes", C. R. Acad. Sci. Paris 298 (1984) 269-272.
[10] P. Michel and J.-P. Penot, "A generalized derivative for calm and stable functions", preprint, 1988.
[11] F. Mignot, "Contrôle dans les inéquations variationnelles elliptiques", Journal of Functional Analysis 22 (1976) 130-185.
[12] M.L. Overton, "On minimizing the maximum eigenvalue of a symmetric matrix", SIAM Journal on Matrix Analysis & Applications 9 (1988) 256-268.
[13] M.L. Overton and R.S. Womersley, "On minimizing the spectral radius of a nonsymmetric matrix: optimality conditions and duality theory", SIAM Journal on Matrix Analysis & Applications 9 (1988) 473-498.
[14] D. Preiss, "Differentiability of Lipschitz functions on Banach spaces", to appear in: Journal of Functional Analysis.
[15] A. Prékopa and R.J.-B. Wets, Stochastic Programming 84, Mathematical Programming Study 27 & 28 (1986).
[16] L. Qi, "The maximal normal operator space and integration of subdifferentials of nonconvex functions", to appear in: Nonlinear Analysis: Theory, Methods and Applications.
[17] L. Qi, "Semismoothness and decomposition of maximal normal operators", to appear in: Journal of Mathematical Analysis & Applications.
[18] L. Qi, "Quasidifferentials and maximal normal operators", to appear in: Mathematical Programming.
[19] L. Qi and R.S. Womersley, "On extremal singular value functions", preprint.
[20] R.T. Rockafellar, The Theory of Subgradients and Its Applications to Problems of Optimization: Convex and Nonconvex Functions, Helderman-Verlag, West Berlin, 1981.
[21] J.S. Treiman, "Shrinking generalized gradients", Nonlinear Analysis: Theory, Methods and Applications 12 (1988) 1429-1450.
[22] D. Ward, "Convex directional derivatives in optimization", preprint, 1988.

[23] R.J.-B. Wets, "Solving stochastic programs with simple recourse", Stochastics 10 (1983) 219-242.