Bounding the Expectation of Convex Functions with Limited Distribution Information
John R. Birge, Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan, USA 48109
José H. Dulá, C.O.R.E., 34 Voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium
Technical Report 87-15

Bounding the Expectation of Convex Functions with Limited Distribution Information*
John R. Birge, Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, USA 48109
José H. Dulá, C.O.R.E., 34 Voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium
Keywords: integration, stochastic programming, moment problem, duality, approximation.
Abbreviated title: Bounding convex functions.
Mathematics Subject Classification: 90C15.
Abstract: This paper considers bounds on the expectation of a convex function of a random variable when only limited information is available about the underlying distribution. The problem is presented as a generalized moment problem. A special class of functions is shown to have an easily computable solution to this problem with first and second moment constraints. Extensions are given for general convex functions on finite intervals or with finitely valued recession functions.
* This research has been partially supported by the National Science Foundation under Grant ECS-8304065, by the Office of Naval Research under Grant N00014-86-K-0628, and by the National Research Council under a Research Associateship at the Naval Postgraduate School, Monterey, California.

1. Introduction. Many problems in applied mathematics involve an integral or an expectation of a convex function. For example, in mechanics, the average load on a given structure may be expressed as the expectation of convex functions (or linear combinations of convex functions) of the stresses. In finance, the present value of an option on a stock may be the expected value of a convex function of the future value of the stock price. In computer systems, the performance (throughput) may be a concave (i.e., the negative of a convex) function of the system load. The average performance is then the expectation of this performance function. Our interest has been in functions arising in stochastic programs, mathematical optimization models that involve random variables. The basic problem is to find

E f(x) = E{f(x(ω))} = ∫_Ω f(x(ω)) P(dω),   (1.1)

where x is a random vector mapping the probability space (Ω, A, P) onto (ℝ^N, B^N, F), F is the distribution function of x, and x ∈ X ⊂ ℝ^N. The expectation functional, E f(x), can also be written as a Lebesgue-Stieltjes integral with respect to F:

E f(x) = ∫_{ℝ^N} f(x) dF(x).   (1.2)

Difficulties arise in evaluating E f(x) when either the function f is difficult to evaluate (for example, requiring a complex simulation for each function evaluation, as in the computer system example) or the distribution function F is not known exactly (for example, when the demands on the computer system are not known beforehand). Many approximation formulas for integrals of the form (1.2) have been given (see Davis and Rabinowitz [8]), but they either do not provide efficient error bounds or impose strict differentiability requirements on the integrand. In this paper, we review methods for evaluating upper bounds on E f(x) with limited distribution information and convex f. In addition, we provide a method for evaluating upper bounds for a large class of convex functions on ℝ^1 given first and second moment information about the distribution of x. We also show how this result can be extended to a general bound for convex functions on ℝ^N. The bounds follow from basic properties of convex functions and from a generalized moment problem interpretation of the problem.

Section 2 provides background on previous approaches to bounding expectations. The generalized moment problem interpretation is given in Section 3. Section 4 presents basic results and some examples in ℝ^1. Section 5 provides the multi-dimensional extension.

2. Background and previous integral approximations. For f(x) = x, the expectation functional is the first moment, or mean, of the random vector x with respect to the distribution function F. For f(x) = x^i, the expectation functional is the i-th moment of x with respect to F. A function f: X → (−∞, +∞] for X a convex set is convex on X if and only if f((1 − λ)x + λy) ≤ (1 − λ)f(x) + λf(y) for all λ ∈ (0, 1) and every x and y in X. In the following, we assume that X is convex and that every convex function f on X is proper, i.e., there exists x ∈ X such that f(x) < +∞ and f(x) > −∞ for all x ∈ X. A function f is concave on X if −f is convex on X. A function f is affine on X if f is finite-valued, convex, and concave on X. If f(λx) = λf(x) for all λ ∈ (0, ∞) and x ∈ ℝ^N, then f is positively homogeneous (of degree one). A positively homogeneous function f: ℝ^N → (−∞, +∞] that is also convex on ℝ^N is sublinear. A sublinear function associated with any convex function is the recession function, f0⁺, defined as (see [31])

(f0⁺)(y) = sup{f(x + y) − f(x) | f(x) < ∞}.   (2.1)

In the sequel, we also consider optimization problems of the form

sup_{x ∈ C} f(x),   (2.2)

where C is a subset of a linear space. A point x is feasible in (2.2) if x ∈ C. A feasible point x is an extreme point of C if there do not exist y and z in C such that x = (1 − λ)y + λz, 0 < λ < 1, other than x = y = z. A point x* is optimal in (2.2) if it is feasible and f(x*) = sup_{x ∈ C} f(x). In this case, x* attains the supremum of f over C. When the supremum of f is assumed to be attained for some x* ∈ C, we replace "supremum" by "maximum."

For general functions f, the basic procedures to approximate E f(x) use some form of a discrete approximation for the distribution of x. For X = [a, b] ⊂ ℝ^1, the most basic procedures are the midpoint and the trapezoidal approximations. If one interval is used,

these approximations apply to a uniform distribution on [a, b]. They are M₁(f) = f((a + b)/2) for the midpoint approximation and T₁(f) = (f(a) + f(b))/2 for the trapezoidal approximation. The approximations are improved by dividing [a, b] into subintervals, appropriately weighting the subintervals, and applying M₁ and T₁ on each subinterval. A more sophisticated procedure is Gaussian quadrature, which finds an integral formula that fits all polynomials up to some degree. As noted by Miller and Rice [29], this can be used to find a discretization with N values that matches the first N + 1 moments of the distribution of x. To match the first three moments of the uniform distribution on [a, b], for example, Gaussian quadrature selects two points, (a + b)/2 ± (√3/6)(b − a), each with probability 1/2. A difficulty with the Gaussian quadrature formulas is that they do not generally provide bounds on the expectation. Restrictions on higher-order derivatives and Peano's theorem [30] may be used to provide bounds, but that requires, at least, differentiability of f and a density function that may not be available.

Generalizations of the midpoint and trapezoidal approximations do, however, obtain bounds on the expectation of a convex function. Jensen's inequality [21] can be interpreted as a generalization of the midpoint approximation that provides a lower bound on the expected value of convex f through the following:

∫_{ℝ^N} f(x) dF(x) ≥ f(∫_{ℝ^N} x dF(x)),   (2.3)

where x̄ = ∫_{ℝ^N} x dF(x) is assumed finite. Madansky, following Edmundson, ([27], [12]) provided a generalization of the trapezoidal approximation, called the Edmundson-Madansky inequality, that gives an upper bound on the expectation of a convex function. For N = 1, the basic inequality is

∫_a^b f(x) dF(x) ≤ (1/(b − a)) ((b − x̄) f(a) + (x̄ − a) f(b)),   (2.4)

where [a, b] is a finite interval. The Edmundson-Madansky inequality (2.4) can also be extended to multiple dimensions and infinite intervals (see, for example, [1], [14], and [16]).
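To make the two inequalities concrete, the short Python sketch below (ours, not part of the report) evaluates (2.3) and (2.4) for a convex integrand on a finite interval given only the mean. With f(x) = e^(−x), x̄ = 0.5, and [a, b] = [0, 1], the second value is 0.684, the E-M entry reported for e^(−x) in Table 1 below.

```python
import math

def jensen_lower(f, mean):
    """Jensen's inequality (2.3): E f(x) >= f(E x) for convex f."""
    return f(mean)

def edmundson_madansky_upper(f, mean, a, b):
    """Edmundson-Madansky inequality (2.4) on the finite interval [a, b]:
    E f(x) <= ((b - mean) f(a) + (mean - a) f(b)) / (b - a)."""
    return ((b - mean) * f(a) + (mean - a) * f(b)) / (b - a)

if __name__ == "__main__":
    f = lambda x: math.exp(-x)                      # a convex integrand
    a, b, mean = 0.0, 1.0, 0.5
    print(jensen_lower(f, mean))                    # lower bound on E f(x)
    print(edmundson_madansky_upper(f, mean, a, b))  # upper bound on E f(x)
```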

Refinements of the Jensen and Edmundson-Madansky inequalities are possible by subdividing the interval (or, more generally, the region) into smaller pieces on which the bounds can be reapplied, as in the traditional midpoint and trapezoidal approximations (see [3], [15], [19], and [22]). These refinements require additional function evaluations and conditional expectations on the subregions. As has been observed, the Jensen lower bound is generally reasonably accurate relative to the Edmundson-Madansky upper bound (e.g., [17]), which requires more function evaluations. The primary concern is then in obtaining more accurate upper bounds without additional computational effort. This paper shows that this is possible for a class of convex functions if second moment information is available. The next section provides the generalized moment problem formulation that leads to this new bound.

3. Generalized moment problem. To obtain bounds that hold for all distributions with certain properties, we seek a probability measure Q ∈ P, the set of probability measures on (X, B^N), to

maximize   ∫_X f(x) Q(dx)
subject to ∫_X v_i(x) Q(dx) ≤ α_i,  i = 1, ..., s,      (3.1)
           ∫_X v_i(x) Q(dx) = α_i,  i = s + 1, ..., M,

where M is finite and the v_i are bounded, continuous functions. A solution of (3.1) obtains an upper bound on the expectation of f with respect to any probability measure satisfying the conditions above. Problem (3.1) is a generalized moment problem ([25]). When the v_i are powers of x, the constraints restrict the moments of x with respect to Q. In this context, (3.1) determines an upper bound when only limited moment information on a distribution is available. Problem (3.1) can also be interpreted as an abstract linear program, since the objective and constraints are linear functions of the probability measure. The solution is then an extreme point (see [33] for a discussion of properties) in the infinite dimensional space of probability measures. The following theorem, proven in [23, Theorem 2.1], gives the explicit solution properties.

Theorem 1. Suppose X is compact. Then the set of feasible measures in (3.1), Q, is convex and compact (with respect to the weak* topology), and Q is the closure of the convex hull of the extreme points of Q. If f is continuous relative to X, then an optimum (maximum or minimum) of ∫_X f(x) Q(dx) is attained at an extreme point of Q. The extremal measures of Q are those measures that have finite support, {x₁, ..., x_L}, with L ≤ M + 1, such that the vectors

(1, v₁(x_l), v₂(x_l), ..., v_M(x_l)),  l = 1, ..., L,   (3.2)

are linearly independent.

Kemperman [24] showed that the supremum is attained under more general continuity assumptions and provides conditions (which we assume to hold) for Q to be nonempty. Dupačová's (formerly Žáčková) [9, 10, 36] work on a minimax approach to stochastic programming led to the use of the moment problem as a bounding procedure for stochastic programs. She showed that (3.1) attains the Edmundson-Madansky bound (and the Jensen bound if the objective is minimized) when the only constraint in (3.1) is v₁(x) = x, i.e., the constraints fix the first moment of the probability measure. She also provided some properties of the solution with an additional second moment constraint (v₂(x) = x²) for a specific objective function f. The general problem can be solved using the generalized linear programming procedure in [7, Chapter 24]. This procedure is given below.

Generalized Linear Programming Procedure for the Generalized Moment Problem (GLP)

Step 0. Initialization. Identify a set of L ≤ M + 1 linearly independent vectors as in (3.2) that satisfy the constraints in (3.1). (Note that a phase-one objective ([7]) may be used if such a starting solution is not immediately available. For N = 1, the Gaussian quadrature points may be used as mentioned above.) Let ν = L, k = 1, and go to Step 1.

Step 1. Master problem solution. Find p₁ ≥ 0, ..., p_ν ≥ 0 such that

Σ_{l=1}^{ν} p_l = 1,
Σ_{l=1}^{ν} v_i(x_l) p_l ≤ α_i,  i = 1, ..., s,      (3.3)
Σ_{l=1}^{ν} v_i(x_l) p_l = α_i,  i = s + 1, ..., M,

and z = Σ_{l=1}^{ν} f(x_l) p_l is maximized. Let {p₁ᵏ, ..., p_νᵏ} attain the optimum in (3.3), and let {θᵏ, π₁ᵏ, ..., π_Mᵏ} be the associated dual multipliers such that

θᵏ + Σ_{i=1}^{M} πᵢᵏ vᵢ(x_l) = f(x_l),  if p_lᵏ > 0,  l = 1, ..., ν,
θᵏ + Σ_{i=1}^{M} πᵢᵏ vᵢ(x_l) ≥ f(x_l),  if p_lᵏ = 0,  l = 1, ..., ν,      (3.4)
πᵢᵏ ≥ 0,  i = 1, ..., s.

Step 2. Subproblem solution. Find x_{ν+1} that maximizes

ρ(x, θᵏ, πᵏ) = f(x) − θᵏ − Σ_{i=1}^{M} πᵢᵏ vᵢ(x).      (3.5)

If ρ(x_{ν+1}, θᵏ, πᵏ) > 0, let ν = ν + 1, k = k + 1, and go to Step 1. Otherwise, stop: {p₁ᵏ, ..., p_νᵏ} are the optimal probabilities associated with {x₁, ..., x_ν} in a solution to (3.1).

The proof of the convergence of GLP is given in [7, Chapter 24]. This result is used in [13] to solve a class of problems (3.1).
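To illustrate how GLP can be carried out, the sketch below (our own construction, not the report's code) treats the equality-constrained case with power moments vᵢ(x) = x^i on an interval, solving the master problem (3.3) with scipy.optimize.linprog and the subproblem (3.5) by a simple grid search in place of a nonconvex optimizer. The starting support, tolerances, and the sign convention used to recover (θᵏ, πᵏ) from the HiGHS equality duals are assumptions worth checking against the solver version in use.

```python
import numpy as np
from scipy.optimize import linprog

def glp_upper_bound(f, moments, a, b, max_iter=50, tol=1e-9):
    """GLP sketch for problem (3.1) with equality moment constraints
    v_i(x) = x**i, i = 1, ..., M, on X = [a, b].
    Returns (upper_bound, support_points, probabilities)."""
    M = len(moments)
    v = lambda x: np.array([x ** (i + 1) for i in range(M)])
    # Step 0: a crude feasible start; the report notes that a phase-one
    # objective (or, for N = 1, Gaussian quadrature points) can be used.
    xs = list(np.linspace(a, b, 11))
    grid = np.linspace(a, b, 2001)           # candidate points for the subproblem
    for _ in range(max_iter):
        # Step 1: master problem (3.3): maximize sum_l p_l f(x_l) subject to
        # sum_l p_l = 1 and sum_l p_l v(x_l) = moments, with p >= 0.
        A_eq = np.vstack([np.ones(len(xs)), np.array([v(x) for x in xs]).T])
        b_eq = np.concatenate([[1.0], np.asarray(moments, dtype=float)])
        res = linprog(c=[-f(x) for x in xs], A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * len(xs), method="highs")
        if not res.success:
            raise RuntimeError("master problem not solved; adjust the start")
        duals = res.eqlin.marginals           # multipliers of the equality rows
        theta, pi = -duals[0], -duals[1:]     # recover (theta^k, pi^k); see note above
        # Step 2: subproblem (3.5): maximize rho(x) = f(x) - theta - pi . v(x).
        rho = np.array([f(x) - theta - pi @ v(x) for x in grid])
        j = int(np.argmax(rho))
        if rho[j] <= tol:                     # no violated dual constraint: stop
            return -res.fun, xs, res.x
        xs.append(float(grid[j]))             # add the new support point (column)
    return -res.fun, xs, res.x

# Example: an upper bound on E f(x) over all distributions on [0, 1]
# with mean 0.5 and second moment 1/3.
print(glp_upper_bound(lambda x: np.exp(-x), [0.5, 1.0 / 3.0], 0.0, 1.0)[0])
```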

The difficulty in GLP is in the solution of the subproblem (3.5), which generally involves a nonconvex function. Birge and Wets [3] describe how to solve (3.5) with constrained first and second moments if convexity properties of ρ can be identified. Cipra [6] describes other methods for this problem based on discretizations and random selections of the candidate points, x_l. This paper considers specific conditions under which (3.1) can be solved without requiring the repeated nonconvex optimization in (3.5). The goal is to show that a large class of problems for N = 1 requires only L = 2 points of support that can be identified in one line search, that the points of support are analytically calculable for certain functions, and that the solution procedure can be extended to a general class of functions in multiple dimensions. The next section describes the basic results.

Figure 1. The generalized moment problem in ℝ¹.

4. Two-point support functions. In this section, we restrict (3.1) to the case of N = 1, s = 0, and M = 2, where again the constraints correspond to first and second moment constraints. To distinguish this case of (3.1), we refer to it as Problem (3.1-(1,0,2)). The problem is illustrated geometrically in Figure 1. Here, X = [0, .6] and C is the convex hull of {(x, x², f(x)) : x ∈ X}. The first moment, x̄, and the second moment, x̄⁽²⁾, are shown. The objective in (3.1-(1,0,2)) is to find y* = (x̄, x̄⁽²⁾, z*) ∈ C that maximizes z. A generalization of Carathéodory's theorem ([35]) for the convex hull of a connected set tells us that y* can be expressed as a convex combination of at most three extreme points of C, giving us a special case of Theorem 1. Therefore, an optimal solution to (3.1-(1,0,2)) can be written {x*, p*}, where the points of support, x* = {x₁*, x₂*, x₃*}, have probabilities

p* = {p₁*, p₂*, p₃*}. We will show that, for a broad class of two-point support functions, this can indeed be further restricted to two points, {x₁*, x₂*}, with positive probability.

A useful result of the linear programming interpretation of (3.1) is the presence of a dual problem to (3.1). The dual to (3.1) is known as a semi-infinite program ([18]) and appears, for example, in Chebyshev approximation ([32]). For the one-dimensional, two-moment constraint problem considered here, this dual is to find θ, π₁, π₂ such that

θ + π₁ x + π₂ x² ≥ f(x)  for all x ∈ X,   (4.1)

and θ + π₁ x̄ + π₂ x̄⁽²⁾ is minimized. Note that (4.1) involves three variables and an infinite number of constraints, in contrast to the infinite dimensional, finitely constrained primal problem (3.1-(1,0,2)). Note also that an optimal solution to (4.1) is a quadratic function that dominates f and that has minimum expectation with respect to any probability measure in Q. The optimality conditions on a feasible solution to (3.1), x* = {x₁*, x₂*, x₃*}, with associated probabilities p* = {p₁*, p₂*, p₃*}, are that there exist dual variables, θ*, π₁*, π₂*, such that

θ* + π₁* xᵢ* + π₂* (xᵢ*)² = f(xᵢ*)  if pᵢ* > 0,   (4.2)
θ* + π₁* x + π₂* x² ≥ f(x)  for all x ∈ X,

where the first condition is the complementary slackness condition and the second condition is dual feasibility. A useful interpretation of these conditions in terms of the function ρ defined in (3.5) is that xᵢ* has a positive probability, pᵢ*, only if ρ(xᵢ*, θ*, π*) = 0 and xᵢ* maximizes ρ over X for fixed (θ*, π*). It is convenient to let ρ(x, θ, π) = f(x) − q(x, θ, π), where q(x, θ, π) = θ + π₁ x + π₂ x².
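A small numerical illustration of (4.1) (ours, not from the report): any quadratic that can be verified to dominate f on X gives the valid upper bound θ + π₁ x̄ + π₂ x̄⁽²⁾ on E f(x) for every feasible distribution, although its quality depends on how the quadratic is chosen. The construction below, which interpolates f at three points and then lifts the constant term until the quadratic dominates on a grid, is an arbitrary choice for demonstration.

```python
import numpy as np

def dual_bound_if_feasible(f, theta, pi1, pi2, mean, m2, a, b, n=10001):
    """If q(x) = theta + pi1*x + pi2*x**2 >= f(x) on [a, b] (checked on a grid),
    return the dual objective theta + pi1*mean + pi2*m2 of (4.1), an upper bound
    on E f(x) for every distribution on [a, b] with these two moments."""
    xs = np.linspace(a, b, n)
    q = theta + pi1 * xs + pi2 * xs ** 2
    if np.all(q >= f(xs) - 1e-10):            # dual feasibility (4.1), up to tolerance
        return theta + pi1 * mean + pi2 * m2
    return None

# Example: a crude dominating quadratic for f(x) = exp(-x) on [0, 1], built by
# interpolating f at 0, 1/2, and 1 and shifting it up until it dominates.
f = lambda x: np.exp(-x)
pi2, pi1, theta = np.polyfit([0.0, 0.5, 1.0], f(np.array([0.0, 0.5, 1.0])), 2)
xs = np.linspace(0.0, 1.0, 10001)
theta += max(0.0, float(np.max(f(xs) - (theta + pi1 * xs + pi2 * xs ** 2))))
print(dual_bound_if_feasible(f, theta, pi1, pi2, 0.5, 1.0 / 3.0, 0.0, 1.0))
```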

The following lemmas give additional properties for an optimal solution {x*, p*} to (3.1-(1,0,2)). We assume in what follows that f is convex. The first lemma refers to feasible solutions of (3.1-(1,0,2)).

Lemma 1. If a feasible solution to (3.1-(1,0,2)) is obtained with positive probability support points x₁ < x₂ < x₃, then another feasible solution exists with support points x₁ and x₄, where x₂ < x₄ < x₃.

Proof: This result is straightforward. Feasibility of {x₁, x₂, x₃} implies that there exist {p₁, p₂, p₃} ≥ 0 such that Σᵢ₌₁³ pᵢ (xᵢ, xᵢ²) = (x̄, x̄⁽²⁾), or (x₁, x₁²) + p₂(x₂ − x₁, x₂² − x₁²) + p₃(x₃ − x₁, x₃² − x₁²) = (x̄, x̄⁽²⁾). This implies that the line (x₁ + t(x̄ − x₁), x₁² + t(x̄⁽²⁾ − x₁²)) is a convex combination of the lines (x₁ + t(xᵢ − x₁), x₁² + t(xᵢ² − x₁²)), i = 2, 3. All three lines intersect the curve (x, x²) at (x₁, x₁²). The convexity property implies that (x₁ + t(x̄ − x₁), x₁² + t(x̄⁽²⁾ − x₁²)) intersects (x, x²) again at some x₄ with x₂ < x₄ < x₃. ∎

The next lemma considers f with a derivative f′ that has local convexity or concavity properties. These properties form the basis for the bounding approach given in this paper.

Lemma 2. If ρ(x̂, θ, π) = 0 for some (θ, π) feasible in (4.1), x̂ is in the interior of X, and f′ is convex or concave on A = (x̂ − η, x̂) or B = (x̂, x̂ + η) for some η > 0, then there exists δ > 0 such that, for any ε with 0 < ε < δ, f′(x̂ − ε) > q′(x̂ − ε, θ, π) when f′ is convex or concave on A, and f′(x̂ + ε) < q′(x̂ + ε, θ, π) when f′ is convex or concave on B.

Proof: Let ρ(·) = ρ(·, θ, π) and let q(·) = q(·, θ, π). Note that q is differentiable on X and that, for f convex, f is continuously differentiable at all but a countable number of points in X. Since ρ is then densely differentiable, there exist right and left open neighborhoods of x̂ on which ρ is differentiable. The following limits are well defined:

ρ′⁻(x̂) = lim_{t↓0} ρ′(x̂ − t);   ρ′⁺(x̂) = lim_{t↓0} ρ′(x̂ + t).   (4.3)

If ρ(x̂) = 0 for feasible (θ, π) in (4.1), then x̂ maximizes ρ. Therefore, 0 ∈ [ρ′⁺(x̂), ρ′⁻(x̂)]. By the local convexity (or concavity) assumption for f′ around x̂, there exist intervals (x̂ − t, x̂) or (x̂, x̂ + t) (t > 0) over which ρ′ has constant sign. Suppose f′(x̂ − ε) ≤ q′(x̂ − ε, θ, π) for all 0 < ε < t and f′(x̂ − s) < q′(x̂ − s, θ, π) for some 0 < s < t; then ρ(x̂) < ρ(x̂ − t), contradicting the maximality of x̂. A similar argument holds for (x̂, x̂ + t), proving the result. ∎

The previous lemma considers local convexity properties of f′ when it exists. The following results refer to functions with derivatives that are convex and then concave.

Lemma 3. Let g(x) = h(x) − c(x) be a function such that g is continuous on ℝ, h is convex on (−∞, γ) and concave on (γ, ∞), and c(x) is an affine function on ℝ. Then there exists a partition of (−∞, ∞) into subintervals, I₁ = (−∞, a₁), I₂ = [a₁, a₂), I₃ = [a₂, a₃), I₄ = [a₃, +∞), with −∞ ≤ a₁ ≤ a₂ ≤ a₃ ≤ ∞, such that g(x) > 0 for all x ∈ I₁ ∪ I₃ and g(x) < 0 for all x ∈ I₂ ∪ I₄.

Proof: First note that g is continuous on (−∞, γ) and (γ, ∞). Also, g cannot change sign from negative to positive twice on (−∞, γ), by the convexity of h in that region. If g has two sign changes on (−∞, γ), then there exist intervals I₁, I₂, I₃ such that g is positive on I₁ ∪ I₃ and negative on I₂. By continuity of g, g(γ) ≥ 0. By the concavity of h on (γ, ∞), g has at most one sign change on (γ, ∞), giving the result when g has two sign changes on (−∞, γ). A similar argument applies if g has one or no sign changes on (−∞, γ). ∎

The next lemma considers the case where g = ρ′ is constant on an interval. In the following, we use the notation X = [a, b] for convenience. It is assumed that this also includes the cases X = (−∞, b], X = [a, +∞), and X = (−∞, +∞) unless explicitly stated otherwise.

Lemma 4. If f is convex on X = [a, b] with derivative f′ defined as a convex function on (a, c) and as a concave function on (c, b) for a ≤ c ≤ b, and if ρ(x, θ, π) = 0 for some (θ, π) feasible in (4.1) and for all x ∈ (x̂ − ε, x̂ + ε) for some x̂ ∈ X and ε > 0, then there exists an interval D ⊇ (x̂ − ε, x̂ + ε) such that ρ(x, θ, π) = 0 for all x ∈ D and ρ(x, θ, π) < 0 for all x outside the closure of D.

Proof: Let D = (d, e) be the largest interval including (x̂ − ε, x̂ + ε) such that ρ(x) = 0 for all x ∈ D. By Lemma 2 and the assumption, if d > a, there exists an interval (d − η, d), and, if e < b, there exists an interval (e, e + η), for η > 0, such that ρ′ > 0 on (d − η, d) and ρ′ < 0 on (e, e + η). By the convex-concave assumption, f′ is convex on [a, d) and concave on (e, b]. Since q′ is affine, ρ′ would then be strictly positive on [a, d) and strictly negative on (e, b]. This would imply that ρ is negative on (e, b], violating the feasibility of (θ, π) in

(4.1). Hence, e = b, giving the result. ∎

The convex-concave property is now used to derive our main result about two-point support functions.

Theorem 2. If f is convex with derivative f′ defined as a convex function on (a, c) and as a concave function on (c, b) for X = [a, b] and a ≤ c ≤ b, then there exists an optimal solution to (3.1-(1,0,2)) with at most two support points, {x₁, x₂}, with positive probabilities, {p₁, p₂}.

Proof: Let {θ, π} be an optimal solution to (4.1). First, assume that there do not exist ε > 0 and x̂ ∈ (a, b) such that ρ(x, θ, π) = 0 for all x ∈ (x̂ − ε, x̂ + ε). By Lemmas 1 and 3, the only isolated points where ρ could be 0 and maximized are a₁ and a₃, when these belong to [a, b]. If a₁ ∉ [a, b], then a can replace a₁, and if a₃ ∉ [a, b], then b can replace a₃, but, in either case, at most two points meet the conditions for optimality. If there exist ε > 0 and x̂ ∈ (a, b) such that ρ(x, θ, π) = 0 for all x ∈ (x̂ − ε, x̂ + ε), then Lemma 4 implies that any optimal solution {x₁, x₂, x₃} must lie in the closure of D and that ρ(x) = 0 for all x ∈ D. By Lemma 1, we can select x₄ in (x₂, x₃) such that there exist {p₁, p₄} so that {x₁, x₄, p₁, p₄} is feasible in (3.1-(1,0,2)). The optimality conditions still hold since ρ(x₄) = 0. Hence, {x₁, x₄, p₁, p₄} is optimal in (3.1-(1,0,2)). ∎

A corollary of Theorem 2 is that any convex function f that has a convex or concave derivative has the two-point support property. The class of functions that meets the criteria of Theorem 2 contains many useful examples (a numerical check of the derivative condition is sketched below). Some of these functions are:

1. Polynomials defined over ranges with at most one sign change of the third derivative.
2. Exponential functions of the form c₀ e^(c₁ x), c₀ > 0.
3. Logarithmic functions of the form log_k(cx), for any base k > 0.
4. Certain hyperbolic functions such as sinh(cx), c, x > 0, and cosh(cx).
5. Certain trigonometric functions such as tan⁻¹(cx), c, x > 0.
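When f is three times differentiable, the hypothesis of Theorem 2 amounts to f‴ being nonnegative on (a, c) and nonpositive on (c, b), i.e., changing sign at most once from + to −. The finite-difference screen below (our own illustration, not part of the report) estimates f‴ on a grid and counts its sign changes.

```python
import numpy as np

def third_derivative_signs(f, a, b, n=400, h=1e-2, zero_tol=1e-8):
    """Estimate sign(f''') on a grid in (a, b) with a central difference and
    count its sign changes.  A single + -> - change (or no change at all) is
    consistent with f' being convex on (a, c) and concave on (c, b)."""
    xs = np.linspace(a + 2 * h, b - 2 * h, n)
    d3 = (f(xs + 2 * h) - 2 * f(xs + h) + 2 * f(xs - h) - f(xs - 2 * h)) / (2 * h ** 3)
    signs = np.sign(np.where(np.abs(d3) < zero_tol, 0.0, d3))
    nonzero = signs[signs != 0]
    changes = int(np.count_nonzero(np.diff(nonzero)))
    return nonzero, changes

# Example: f(x) = exp(-x) on [0, 1] has f'''(x) = -exp(-x) < 0 everywhere,
# so f' is concave on the whole interval and Theorem 2 applies with c = a.
_, flips = third_derivative_signs(lambda x: np.exp(-x), 0.0, 1.0)
print(flips)   # expected: 0 sign changes
```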

In fact, Theorem 2 can be applied to provide an upper bound on the expectation of any convex function with known third derivative when the distribution function has a known third moment, x̄⁽³⁾. Suppose a ≥ 0 (if not, this argument can be applied on [a, 0] and [0, b]), and let g(x) = αx³ + f(x). The function g is still convex on [0, b] for α ≥ 0. By defining α ≥ (−1/6) min(0, inf_{[a,b]} f‴(x)), g′ is convex on [a, b], and an upper bound, UB(g), on E g(x) has a two-point support. The expectation of f is then bounded by

E f(x) ≤ UB(g) − α x̄⁽³⁾.   (4.4)

The conditions in Theorem 2 are only sufficient for a two-point support function. They are not necessary. The following function, for example, has an optimal two-point support at x* = {1/3, 1}, for any corresponding feasible p₁ and p₂, when X = [0, 1]:

f(x) = 1 − 2x                       if 0 ≤ x < .2,
       1 − (2/5)x − 8x² + 10x³      if .2 ≤ x < .4,
       ...                          if .4 ≤ x < .6,
       1 − 4x + 4x²                 if .6 ≤ x ≤ 1.        (4.5)

The function defined in (4.5) does not, however, meet the conditions of Theorem 2. Note also that not all functions are two-point support functions (although bounds such as (4.4) are generally available). A function requiring three support points, for example, is f(x) = (1/2) − ((1/4) − (x − (1/2))²)^(1/2). This function and its optimal dominating quadratic function are illustrated in Figure 2.

Figure 2. A function requiring three support points.

Given that a function is a two-point support function, the points {x₁, x₂} can be found using a line search to find a maximum. For example, if some candidate x₁ < x̄ is given, then a feasible corresponding x₂ is

x₂ = (x̄⁽²⁾ − x₁ x̄) / (x̄ − x₁),   (4.6)

where p₁ = (x₂ − x̄)/(x₂ − x₁) and p₂ = 1 − p₁. Note that the problem is clearly infeasible if x̄ ∉ (x₁, x₂). The solution of (3.1-(1,0,2)) then reduces to maximizing

p₁(x₁) f(x₁) + p₂(x₁) f(x₂(x₁))   subject to x₁ ∈ [a, x̄).   (4.7)

A line search to find the maximum in (4.7) can be performed efficiently using, for example, Lemaréchal and Mifflin's procedure in [26], if the objective is C², or Mifflin and Strodiot's method without derivatives [28]. Table 1 gives the values (under "2-M") that were obtained by this procedure for three two-point support functions with distributions on [0, 1]. Figures 3-5 illustrate that the optimal points, {x₁*, x₂*}, may be at either endpoint or interior to [0, 1]. The table gives the expectation for a random variable with a beta distribution (under "Beta") with the given first and second moments for comparison. The Edmundson-Madansky upper bound ("E-M") is also provided. The "S-L" value (for "semi-linear bound") given in Table 1 is discussed below.

Table 1. Bounding values

Function              x̄       x̄⁽²⁾     Beta     2-M      S-L      E-M
e^(−x)                0.500    0.333    0.622    0.624    0.651    0.684
x³                    0.833    0.714    0.625    0.629    0.675    0.833
sin(π(x + 1)) + 1     0.500    0.333    0.363    0.384    0.577    1.000
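In place of the derivative-based line searches cited above, a plain grid search over x₁ already illustrates the computation in (4.6)-(4.7). The sketch below (our own helper, with our own names) skips candidate values of x₁ for which the implied x₂ leaves the interval.

```python
import numpy as np

def two_point_upper_bound(f, mean, m2, a, b, n=4000):
    """Grid-search sketch of (4.7): maximize p1(x1) f(x1) + p2(x1) f(x2(x1))
    over x1 in [a, mean), where x2(x1) = (m2 - x1*mean) / (mean - x1) as in (4.6),
    p1 = (x2 - mean) / (x2 - x1), and p2 = 1 - p1.  Candidates for which x2
    falls outside (mean, b] are infeasible and skipped."""
    best, best_support = -np.inf, None
    for x1 in np.linspace(a, mean, n, endpoint=False):
        x2 = (m2 - x1 * mean) / (mean - x1)              # (4.6)
        if not (mean < x2 <= b):
            continue
        p1 = (x2 - mean) / (x2 - x1)
        val = p1 * f(x1) + (1.0 - p1) * f(x2)
        if val > best:
            best, best_support = val, ((x1, p1), (x2, 1.0 - p1))
    return best, best_support

# Example: the moments of the first row of Table 1 (mean 0.5, second moment 1/3, X = [0, 1]).
print(two_point_upper_bound(lambda x: np.exp(-x), 0.5, 1.0 / 3.0, 0.0, 1.0))
```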

Figure 3. Optimal bounding function for e^(−x).

Figure 4. Optimal bounding function for x³.

A line search is not necessary for solving (3.1-(1,0,2)) in certain cases. The support points can be calculated analytically for semi-linear, convex functions, which are defined by

f(x) = q⁻ (c − x)   if x ≤ c,
       q⁺ (x − c)   if x > c,

where q⁺ + q⁻ ≥ 0. These functions clearly meet the conditions of Theorem 2 and, hence, require at most two points of support. The analytical results depend on the interval [a, b]. If [a, b] = [0, 1], consider the nonintersecting intervals A = (0, x̄⁽²⁾/(2x̄)), B = [x̄⁽²⁾/(2x̄), (1 − x̄⁽²⁾)/(2(1 − x̄))], and C = ((1 − x̄⁽²⁾)/(2(1 − x̄)), 1). The points of support for a semi-linear, convex function defined on [0, 1] are

{x₁, x₂} = {0, x̄⁽²⁾/x̄}                    if c ∈ A,
           {c − d, c + d}                  if c ∈ B,      (4.8)
           {(x̄ − x̄⁽²⁾)/(1 − x̄), 1}        if c ∈ C,

where d = (c² − 2c x̄ + x̄⁽²⁾)^(1/2). This result can obviously be extended to all finite intervals. It results from analyzing the zeroes of ρ.

Figure 5. Optimal bounding function for sin(π(x + 1)) + 1.

Infinite intervals can also be handled analytically for semi-linear, convex functions. For X = [0, ∞), the results are as in (4.8) with B = [x̄⁽²⁾/(2x̄), ∞) and C = ∅. For the interval (−∞, ∞), the points of support are those for interval B in (4.8). We note that special cases of these supports of semi-linear, convex functions were considered in [9], [20], and [34].

Semi-linear, convex functions are common in decision problems to represent penalties for being above or below a preferred value, c. They can also be used, however, to provide bounds for other convex functions when only the first and second moments of the distribution function are known. For example, the following function dominates any convex function f on [0, 1]:

v(x, c) = (c − x)((f(0) − f(c))/c) + f(c)             if x ≤ c,
          (x − c)((f(1) − f(c))/(1 − c)) + f(c)       if x > c,      (4.9)

where c ∈ (0, 1). The function v is a semi-linear, convex function for any convex function f. The values of v at the points in (4.8) can, therefore, be used to bound ∫ v(x) Q(dx) ≥ ∫ f(x) Q(dx) for any Q that is feasible in (3.1-(1,0,2)). The results of using v to bound f(x) = e^(−x), x³, and sin(π(x + 1)) + 1 are given in Table 1 as mentioned above. More extensive results in [11] indicate that this approximation is accurate for many functions and moment values.
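The case analysis in (4.8) and the dominating function (4.9) combine into a closed-form bound that requires no search. The helpers below (our own names; the kink c is a free choice in (0, 1)) return the support points for a semi-linear function on [0, 1] and evaluate the resulting "S-L"-style bound for an arbitrary convex f.

```python
import numpy as np

def semilinear_support(c, mean, m2):
    """Support points (4.8) on [0, 1] for a semi-linear convex function with
    kink at c, given the first two moments.  Returns ((x1, p1), (x2, p2))."""
    lo = m2 / (2.0 * mean)                       # right end of interval A
    hi = (1.0 - m2) / (2.0 * (1.0 - mean))       # left end of interval C
    if c < lo:                                   # c in A
        x1, x2 = 0.0, m2 / mean
    elif c <= hi:                                # c in B
        d = np.sqrt(c * c - 2.0 * c * mean + m2)
        x1, x2 = c - d, c + d
    else:                                        # c in C
        x1, x2 = (mean - m2) / (1.0 - mean), 1.0
    p1 = (x2 - mean) / (x2 - x1)
    return (x1, p1), (x2, 1.0 - p1)

def semilinear_bound(f, c, mean, m2):
    """Upper bound on E f(x) from the dominating semi-linear function v of (4.9),
    which agrees with f at 0, c, and 1, evaluated at the support points (4.8)."""
    def v(x):
        if x <= c:
            return (c - x) * (f(0.0) - f(c)) / c + f(c)
        return (x - c) * (f(1.0) - f(c)) / (1.0 - c) + f(c)
    (x1, p1), (x2, p2) = semilinear_support(c, mean, m2)
    return p1 * v(x1) + p2 * v(x2)

# Example: bound E exp(-x) on [0, 1] with mean 0.5 and second moment 1/3,
# placing the kink (arbitrarily) at c = 0.5.
print(semilinear_bound(lambda x: np.exp(-x), 0.5, 0.5, 1.0 / 3.0))
```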

Semi-linear, convex functions can also be defined to dominate functions defined on unbounded ranges if those functions have finitely valued recession functions in each direction. If (f0⁺)(y) < ∞ for all y ∈ (−∞, ∞), then there exists some c such that f(c) < ∞ and v(x, c) = (f0⁺)(x − c) + f(c) ≥ f(x) for all x. The corresponding formulas for semi-linear, convex functions on unbounded ranges can then again be used to bound E f(x) for any feasible distribution that meets the conditions in (3.1-(1,0,2)). Bounds on the expectation of sublinear functions are especially easy to compute because (f0⁺)(y) = f(y) for sublinear f by definition. The next section discusses how to use these bounds in multiple dimensions.

5. Bounds in multiple dimensions. The use of the generalized programming formulation is limited in multiple dimensions because of the difficulty in solving the subproblem (3.5). Another difficulty is that, even with a constraint only on the first moment of the distribution and constrained cross-moments, E[xᵢ xⱼ], i ≠ j, the moment problem solution ([14]) involves positive weights on 2^N extreme points of X when X is a compact set in ℝ^N (and a corresponding number of recession function evaluations when X is not compact ([5])). These computational disadvantages suggest that, for large N, a looser but more cheaply computed upper bound on the value of (3.1) may be more useful than solving (3.1) exactly.

If a separable function, v(x, c) = Σᵢ₌₁ᴺ vᵢ(x(i), c(i)), is available, it offers an obvious advantage by requiring only one-dimensional integrals. In this case, we would like to find v(x, c) = Σᵢ₌₁ᴺ vᵢ(x(i), c(i)) ≥ f(x), where each vᵢ(x(i), c(i)) is a semi-linear, convex function. Methods for constructing these functions to bound the optimal value of a linear program with random right-hand side are discussed in [2] and [4]. In these stochastic linear programs,

f(x) = min {q y | A y = x, y ≥ 0},   (5.1)

where q ∈ ℝⁿ and A ∈ ℝ^(N×n) are known parameters of the problem. Note that f defined in (5.1) is sublinear. The functions vᵢ are found by solving

qᵢ± = min {q y | A y = ±eᵢ, y ≥ 0},   (5.2)

where eᵢ is the i-th unit vector in ℝᴺ. We then have

vᵢ(xᵢ, 0) = qᵢ⁺ xᵢ    if xᵢ ≥ 0,
            qᵢ⁻ |xᵢ|   if xᵢ < 0,      (5.3)

whose expectation can now be bounded using the support points found in Section 4. Note that this bound requires 2N function evaluations instead of the exponential number of function evaluations required in the Edmundson-Madansky bound.
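A compact sketch of this construction is given below. It is our own illustration: it assumes the linear programs in (5.2) are feasible and bounded, uses scipy.optimize.linprog to compute qᵢ⁺ and qᵢ⁻, and replaces the analytic support formulas of Section 4 with a small grid search on each coordinate.

```python
import numpy as np
from scipy.optimize import linprog

def directional_costs(q, A):
    """Solve the 2N linear programs of (5.2):
    q_i^+ = min{q.y : A y = +e_i, y >= 0},  q_i^- = min{q.y : A y = -e_i, y >= 0}."""
    N, n = A.shape
    qp, qm = np.empty(N), np.empty(N)
    bounds = [(0, None)] * n
    for i in range(N):
        e = np.zeros(N); e[i] = 1.0
        qp[i] = linprog(q, A_eq=A, b_eq=e, bounds=bounds, method="highs").fun
        qm[i] = linprog(q, A_eq=A, b_eq=-e, bounds=bounds, method="highs").fun
    return qp, qm

def one_dim_bound(g, mean, m2, a, b, n=2000):
    """Upper bound on E g(x) over distributions on [a, b] with the given first
    two moments, via the two-point supports of Section 4 (grid search on x1)."""
    best = -np.inf
    for x1 in np.linspace(a, mean, n, endpoint=False):
        x2 = (m2 - x1 * mean) / (mean - x1)              # (4.6)
        if mean < x2 <= b:
            p1 = (x2 - mean) / (x2 - x1)
            best = max(best, p1 * g(x1) + (1.0 - p1) * g(x2))
    return best

def separable_upper_bound(q, A, means, m2s, lows, highs):
    """Section 5 separable bound: E f(x) <= sum_i (bound on E v_i(x_i)),
    with v_i as in (5.3) and c_i = 0."""
    qp, qm = directional_costs(q, A)
    total = 0.0
    for i in range(len(means)):
        v_i = lambda t, i=i: qp[i] * t if t >= 0 else qm[i] * (-t)   # (5.3)
        total += one_dim_bound(v_i, means[i], m2s[i], lows[i], highs[i])
    return total

# Example (hypothetical data): simple recourse with A = [I, -I] and unit costs,
# so f(x) = sum_i |x_i|; marginals on [-1, 1] with mean 0 and second moment 0.25.
N = 3
A = np.hstack([np.eye(N), -np.eye(N)])
q = np.ones(2 * N)
print(separable_upper_bound(q, A, [0.0] * N, [0.25] * N, [-1.0] * N, [1.0] * N))
```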

Values of cᵢ different from 0 in (5.3) are used when a deterministic vector also appears in the right-hand side of the linear program in (5.1) (i.e., the constraints are A y = x − t for some deterministic vector t). More precise bounding functions for f in (5.1) are possible for distributions on compact regions ([2]). Linear transformations can be used to obtain other separable, semi-linear convex upper bounding functions ([4]).

6. Conclusions. This paper describes how to bound the expectation of convex functions when only limited distributional information is available. Given the difficulties in estimating random phenomena, limited information in the form of bounds on the mean and second moments of distributions is a common practical situation. The bounds provided in this paper allow for efficient computation. We note that these procedures extend in an obvious way to lower bounds on the expectation of concave functions. Since the Jensen inequality is the solution of the generalized moment problem to minimize the expectation of a convex function subject to a first moment equality constraint and an upper bound on the second moment, upper and lower bounds are computable on the expectation of general functions that can be expressed as linear combinations of convex and concave functions. Given this extension and the use of upper bounding separable, semi-linear convex functions as in Section 5, the two-point support bounds apply to a wide range of problems.

REFERENCES

1. A. Ben-Tal and E. Hochman, More Bounds on the Expectation of a Convex Function of a Random Variable, J. Appl. Prob. 9 (1972) 803-812.
2. J. R. Birge and S. W. Wallace, A Separable Piecewise Linear Upper Bound for Stochastic Linear Programs, SIAM J. Control and Optimization, to appear.

3. J. R. Birge and R. J-B. Wets, Designing Approximation Schemes for Stochastic Optimization Problems, in particular, for Stochastic Programs with Recourse, Mathematical Programming Study 27 (1986) 54-102.
4. J. R. Birge and R. J-B. Wets, Sublinear Upper Bounds for Stochastic Linear Programs with Recourse, Technical Report No. 86-26, Department of Industrial and Operations Engineering, The University of Michigan (Ann Arbor, MI, 1986).
5. J. R. Birge and R. J-B. Wets, Computing Bounds for Stochastic Programming Problems by Means of a Generalized Moment Problem, Mathematics of Operations Research 12 (1987) 149-162.
6. T. Cipra, Moment Problem with Given Covariance Structure in Stochastic Programming, Ekonom.-Mat. Obzor 21 (1985) 66-77.
7. G. B. Dantzig, Linear Programming and Extensions, Princeton University Press, Princeton, NJ, 1963.
8. P. J. Davis and P. Rabinowitz, Methods of Numerical Integration, Academic Press, Orlando, Florida, 1984.
9. J. Dupačová, Minimax Stochastic Programs with Nonconvex Nonseparable Penalty Functions, in: Progress in Operations Research, A. Prékopa, ed., North-Holland, Amsterdam, 1976, pp. 303-316.
10. J. Dupačová, Minimax Approach to Stochastic Linear Programming and the Moment Problem. Recent Results, ZAMM 58 (1977) T466-467.
11. J. Dulá, Bounds on the Expectation of Convex Functions, Ph.D. dissertation, The University of Michigan, Ann Arbor, Michigan, 1986.
12. H. P. Edmundson, Bounds on the Expectation of a Convex Function of a Random Variable, The Rand Corporation Paper 982, Santa Monica, CA, 1956.
13. Y. Ermoliev, A. Gaivoronski, and C. Nedeva, Stochastic Optimization Problems with Partially Known Distribution Functions, SIAM J. Control and Optimization, to appear.
14. K. Frauendorfer, Solving S.L.P. Recourse Problems with Arbitrary Multivariate Distributions -

the Dependent Case, Mathematics of Operations Research, to appear.
15. K. Frauendorfer and P. Kall, A Solution Method for SLP Recourse Problems with Arbitrary Multivariate Distributions - the Independent Case, Institut für Operations Research der Universität Zürich, Technical Report, 1986.
16. H. Gassmann and W. T. Ziemba, A Tight Upper Bound for the Expectation of a Convex Function of a Multivariate Random Variable, Mathematical Programming Study 27 (1986) 39-53.
17. D. Hausch and W. T. Ziemba, Bounds on the Value of Information in Uncertain Decision Problems II, Stochastics 10 (1983) 181-217.
18. R. Hettich, An Implementation of a Discretization Method for Semi-Infinite Programming, Mathematical Programming 34 (1986) 354-361.
19. C. C. Huang, W. Ziemba, and A. Ben-Tal, Bounds on the Expectation of a Convex Function of a Random Variable: with Applications to Stochastic Programming, Operations Research 25 (1977) 315-325.
20. R. Jagannathan, A Minimax Procedure for a Class of Linear Programs under Uncertainty, Operations Research 25 (1977) 173-177.
21. J. L. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs moyennes, Acta Mathematica 30 (1906) 175-193.
22. P. Kall and D. Stoyan, Solving Stochastic Programming Problems with Recourse Including Error Bounds, Math. Operationsforsch. Statist. Ser. Optim. 13 (1982) 431-447.
23. A. Karr, Extreme Points of Certain Sets of Probability Measures, with Applications, Mathematics of Operations Research 8 (1983) 74-85.
24. J. H. B. Kemperman, The General Moment Problem, a Geometric Approach, The Annals of Mathematical Statistics 39 (1968) 93-122.
25. M. Krein and A. Nudelman, The Markov Moment Problem and Extremal Problems, Translations of Mathematical Monographs, 50, The American Mathematical Society, Providence, RI, 1977.

26. C. Lemaréchal and R. Mifflin, Global and Superlinear Convergence of an Algorithm for One-Dimensional Minimization of Convex Functions, Mathematical Programming 24 (1982) 241-256.
27. A. Madansky, Bounds on the Expectation of a Convex Function of a Multivariate Random Variable, Ann. Math. Statist. 30 (1959) 743-746.
28. R. Mifflin and J. Strodiot, Reliable and Rapid Univariate Minimization without Derivatives, presented at the TIMS/ORSA Joint National Meeting, Los Angeles, CA, April 1986.
29. A. C. Miller, III, and T. R. Rice, Discrete Approximations of Probability Distributions, Management Science 29 (1983) 352-362.
30. G. Peano, Residuo in formulas de quadratura, Mathesis 39 (1914) 5-10.
31. R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
32. T. J. Rivlin, An Introduction to the Approximation of Functions, Blaisdell, Waltham, MA, 1969.
33. N. M. Roy, Extreme Points of Convex Sets in Infinite Dimensional Spaces, American Mathematical Monthly 94 (1987) 409-422.
34. H. Scarf, A Minimax Solution of an Inventory Problem, in: Studies in the Mathematical Theory of Inventory and Production, K. J. Arrow, S. Karlin, and H. Scarf, eds., Stanford University Press, Stanford, CA, 1958.
35. F. A. Valentine, Convex Sets, McGraw-Hill, New York, 1964.
36. J. Žáčková, On Minimax Solutions of Stochastic Linear Programming Problems, Časopis pro Pěstování Matematiky 91 (1966) 423-430.