An Upper Bound on the Expected Value of a Non-Increasing Convex Function with Convex Marginal Return Functions1

Christopher J. Donohue2 and John R. Birge
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 95-6

Abstract: In this note, we show that if a convex function is non-increasing and has a special property we call convex marginal return functions, an effective upper bound can be established using only two function evaluations. Further, we show that this bound can be refined in such a way that the number of function evaluations needed grows linearly with the number of refinements performed.

Keywords: Upper Bound, Convex Function, Approximation for Stochastic Programming

1 This work has been partially supported by the National Science Foundation under Grant DDM-9215921 and the Great Lakes Center for Trucking and Transit Research.
2 Christopher J. Donohue, Department of Industrial and Operations Engineering, 1205 Beal Ave., University of Michigan, Ann Arbor, Michigan 48109-2117.

1 Introduction

Bounding the expected value of a convex function of a multivariate random variable is a problem with many applications in mathematical programming, in particular stochastic programming. Unfortunately, the effort required to find upper and lower bounds generally favors the lower bound, both in computational difficulty and effectiveness. Jensen's inequality gives the classic lower bound, which requires only one function evaluation. This lower bound is generally effective, and, as was shown in Huang, Ziemba and Ben-Tal [7], the bound can be replicated over finer and finer partitions of the probability space until it converges to the expected value. Edmundson and Madansky [8] developed the classic upper bound, which requires $2^n$ function evaluations, where $n$ is the number of random variables. This bound requires the random components to be independent and to have bounded support. Since the number of function evaluations grows exponentially with the number of random components, other methods for finding an upper bound have been developed. Gassmann and Ziemba [6] developed an upper bound which requires solving a linear program. This bound applies to random variables with arbitrary convex domains, which may be bounded or unbounded. Frauendorfer [5] obtained an extension of the Edmundson-Madansky bound for the case of dependent random components. Further, upper bounds have been developed for specific convex functions with special attributes. Birge and Wets [3] considered the recourse function of a two-stage stochastic linear program and showed that an upper bound could be obtained using separable sublinear functions. Wallace [10] showed that the network recourse function could be bounded above by solving three network flow problems. Birge and Wallace [2] extended this result to general linear recourse functions.

In this paper, we consider a special structure that admits an upper bound with only two function evaluations. We first observed this structure in minimum cost network flow problems with stochastic link capacities. We refer to the general property as convex marginal return functions. This property can be proven to exist for various aspects of minimum cost network flow problems with stochastic link capacities (see Donohue and Birge [4] for details). Applications include vehicle allocation problems and other logistical models.

2 Development of the Upper Bound

Let $X = (X_1, X_2, \ldots, X_n)$ be a multivariate random variable on the probability space $(\Omega, \Sigma, P)$ having distribution function $F$ and finite mean $\bar{x}$. Assume that the components of $X$ are independently distributed, each with bounded support. For $i = 1, \ldots, n$, let

\[ x_i^L = \inf\{x : F_i(x) > 0\}, \tag{1} \]
\[ x_i^H = \sup\{x : F_i(x) < 1\}. \tag{2} \]

Let $\phi(X)$ be a bounded convex function of $X$, $x_i^L \le X_i \le x_i^H$, $i = 1, \ldots, n$. Then the Edmundson-Madansky inequality gives the classic upper bound $M_0$ on $E[\phi(X)] \equiv \bar{\phi}$, as follows:

\[ \bar{\phi} \le \sum_{i_1 \in \{L,H\}} \cdots \sum_{i_n \in \{L,H\}} p_1^{i_1} \cdots p_n^{i_n} \, \phi(x_1^{i_1}, \ldots, x_n^{i_n}) \equiv M_0, \tag{3} \]

where

\[ p_i^L = \frac{x_i^H - \bar{x}_i}{x_i^H - x_i^L} \quad \text{and} \quad p_i^H = \frac{\bar{x}_i - x_i^L}{x_i^H - x_i^L}. \tag{4} \]

This corresponds to considering only the corner points of the $n$-dimensional rectangle defined by the values of $x_i^L$ and $x_i^H$ for all $i$. For a random vector $X$ of high dimension, this bound still requires the function values for $2^n$ realizations of $X$. For functions which are not directly calculable (such as the objective function of a mathematical program), this may be computationally prohibitive. The ability to reduce the number of realizations of $X$ that need to be calculated would therefore be useful. The following shows that, if $\phi(X)$ is a non-increasing, convex function of $X$ with convex marginal return functions with respect to any pair $(X_i, X_j)$, $i, j = 1, \ldots, n$, then an upper bound on $\bar{\phi}$ can be established using only two realizations of $X$.
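To make the construction concrete, the following is a minimal sketch (not part of the original report) of how $M_0$ can be computed for independent, bounded components; the function and marginals here are hypothetical stand-ins.

```python
from itertools import product

def edmundson_madansky(phi, lo, hi, mean):
    """Upper bound M0: sum over the 2^n corners of the box [lo, hi],
    with corner weights from equation (4) so each marginal keeps its mean."""
    n = len(lo)
    pL = [(hi[i] - mean[i]) / (hi[i] - lo[i]) for i in range(n)]
    bound = 0.0
    for corner in product((0, 1), repeat=n):  # 0 -> low corner, 1 -> high corner
        x = [hi[i] if corner[i] else lo[i] for i in range(n)]
        w = 1.0
        for i in range(n):
            w *= (1.0 - pL[i]) if corner[i] else pL[i]
        bound += w * phi(x)
    return bound

# Hypothetical example: a convex, non-increasing function of two components.
phi = lambda x: -(x[0] + 2 * x[1])
M0 = edmundson_madansky(phi, lo=[1.0, 0.0], hi=[25.0, 20.0], mean=[9.5, 6.9])
```

Note that the number of corner evaluations in the loop doubles with each additional component, which is exactly the growth the bound developed below avoids.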

First, the property of convex marginal return functions with respect to any pair of components of $X$ must be defined.

Definition: A function $f(X_1, X_2, \ldots, X_n)$ has convex marginal return functions with respect to any pair $(X_i, X_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, n$, if for all $1 \le i \le n$, $1 \le j \le n$, $\Delta > 0$, $\delta > 0$,

\[ f(x_1, \ldots, x_j + \delta, \ldots, x_n) - f(x_1, \ldots, x_j, \ldots, x_n) \le f(x_1, \ldots, x_i + \Delta, \ldots, x_j + \delta, \ldots, x_n) - f(x_1, \ldots, x_i + \Delta, \ldots, x_j, \ldots, x_n). \tag{5} \]

Note that if $f(X)$ is a differentiable function over all possible values of $X$, this property is equivalent to requiring that $\partial f(X_1, \ldots, X_n)/\partial X_j$ be a non-decreasing function of $X_i$.

Let $X$ be an $n$-dimensional random vector, where each $X_i$ is independently distributed with bounded support and finite mean. Further, let $x_i^L$, $x_i^H$ be the realizations of $X_i$ as defined in equations (1) and (2), so that $x_i^L \le x_i^H$ for all $i$. Let $p_i^L \equiv \mathrm{prob}\{X_i = x_i^L\}$ and $p_i^H \equiv \mathrm{prob}\{X_i = x_i^H\}$ for all $i$ be such that

\[ p_i^L x_i^L + p_i^H x_i^H = E[X_i], \quad p_i^L + p_i^H = 1, \quad p_i^L, p_i^H \ge 0. \]

Let $f(X)$ be a non-increasing convex function of $X$ with convex marginal return functions with respect to any pair $(X_i, X_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, n$. The following shows that an upper bound on $E[f(X)]$ can be established with only two function evaluations. While this bound may not be as effective as the Edmundson-Madansky bound, the loss in tightness may be more than outweighed by the savings in computation time.

Theorem 2.1 Let $X$ and $f(X)$ be as defined above. Let $p^L \equiv \max\{p_1^L, p_2^L, \ldots, p_n^L\}$ and $p^H \equiv 1 - p^L$. Then

\[ E[f(X)] \le p^L f(x_1^L, \ldots, x_n^L) + p^H f(x_1^H, \ldots, x_n^H) \equiv HL_0. \tag{7} \]
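Before the proof, here is a minimal sketch of evaluating the bound (7); only the two extreme realizations of $X$ are needed. The function and marginals are again hypothetical.

```python
def hl0(f, lo, hi, mean):
    """Theorem 2.1 bound HL0: two function evaluations.
    p^L is the largest of the Edmundson-Madansky low weights p_i^L."""
    pL = max((hi[i] - mean[i]) / (hi[i] - lo[i]) for i in range(len(lo)))
    return pL * f(lo) + (1.0 - pL) * f(hi)

# Hypothetical example with the same f as before.
f = lambda x: -(x[0] + 2 * x[1])
bound = hl0(f, lo=[1.0, 0.0], hi=[25.0, 20.0], mean=[9.5, 6.9])
```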

Proof: By induction. Let $n = 1$. The result holds by Edmundson-Madansky. Suppose the result is true for $n = t \ge 1$, and consider the case where $n = t + 1$. Then, by Edmundson-Madansky,

\[ E[f(X)] \le p_1^L E[f(x_1^L, X_2, \ldots, X_n)] + p_1^H E[f(x_1^H, X_2, \ldots, X_n)]. \]

For a particular value of $X_1 = x_1$, $f(x_1, X_2, \ldots, X_n)$ is a function of $t$ variables. By the induction hypothesis,

\[ E[f(x_1, X_2, \ldots, X_n)] \le \hat{p}^L f(x_1, x_2^L, \ldots, x_n^L) + \hat{p}^H f(x_1, x_2^H, \ldots, x_n^H), \]

where $\hat{p}^L \equiv \max\{p_2^L, \ldots, p_n^L\}$ and $\hat{p}^H \equiv 1 - \hat{p}^L$. Thus,

\[ E[f(X)] \le p_1^L \hat{p}^L f(x_1^L, x_2^L, \ldots, x_n^L) + p_1^L \hat{p}^H f(x_1^L, x_2^H, \ldots, x_n^H) + p_1^H \hat{p}^L f(x_1^H, x_2^L, \ldots, x_n^L) + p_1^H \hat{p}^H f(x_1^H, x_2^H, \ldots, x_n^H). \tag{8} \]

Note that the set of random variables $(X_2, \ldots, X_n)$ are now perfectly correlated and can therefore be replaced by a single random variable. Without loss of generality, assume that $\hat{p}^L \ge p_1^L$, so that $\hat{p}^L = p^L$. To simplify notation, let

\[ f(L, L) \equiv f(x_1^L, x_2^L, \ldots, x_n^L), \quad f(L, H) \equiv f(x_1^L, x_2^H, \ldots, x_n^H), \quad f(H, L) \equiv f(x_1^H, x_2^L, \ldots, x_n^L), \quad f(H, H) \equiv f(x_1^H, x_2^H, \ldots, x_n^H). \]

Then, by construction,

\[
\begin{aligned}
E[f(X)] &\le p_1^L \hat{p}^L f(L,L) + p_1^L \hat{p}^H f(L,H) + p_1^H \hat{p}^L f(H,L) + p_1^H \hat{p}^H f(H,H), \quad \text{from (8)},\\
&\le p_1^L \hat{p}^L f(L,L) + p_1^L \hat{p}^H f(L,H) + p_1^H \hat{p}^L \big[ f(L,L) + f(H,H) - f(L,H) \big] + p_1^H \hat{p}^H f(H,H),\\
&\qquad \text{since } f \text{ has convex marginal return functions, } f(H,L) \le f(L,L) + f(H,H) - f(L,H),\\
&= \hat{p}^L f(L,L) + (p_1^L \hat{p}^H - p_1^H \hat{p}^L) f(L,H) + p_1^H f(H,H)\\
&\le \hat{p}^L f(L,L) + (p_1^L \hat{p}^H - p_1^H \hat{p}^L) f(H,H) + p_1^H f(H,H),\\
&\qquad \text{since } f \text{ is non-increasing, } f(L,H) \ge f(H,H), \text{ and, by assumption, } p_1^L \hat{p}^H - p_1^H \hat{p}^L \le 0,\\
&= p^L f(L,L) + p^H f(H,H). \qquad \Box
\end{aligned}
\]

In [7], it was shown that upper and lower bounds on the expected value of a convex function of a multivariate random variable (with independent components) can be refined to arbitrary accuracy by repeating the bound over sharper and sharper partitions of the probability space of the random variable. For example, for a one-dimensional random variable $X$, a sharper bound $M_1$ on the expectation of $\phi(X)$ can be found by applying the Edmundson-Madansky bound on each of the subintervals $[x^L, x^M]$ and $[x^M, x^H]$, as follows:

\[ c^L = \int_{x^L}^{x^M} dF(x) > 0, \quad c^H = 1 - c^L > 0, \tag{9} \]

\[ \bar{x}^L = \frac{1}{c^L} \int_{x^L}^{x^M} x \, dF(x), \quad \bar{x}^H = \frac{1}{c^H} \int_{x^M}^{x^H} x \, dF(x), \tag{10} \]

so that

\[ M_1 = c^L \left[ \frac{x^M - \bar{x}^L}{x^M - x^L} \phi(x^L) + \frac{\bar{x}^L - x^L}{x^M - x^L} \phi(x^M) \right] + c^H \left[ \frac{x^H - \bar{x}^H}{x^H - x^M} \phi(x^M) + \frac{\bar{x}^H - x^M}{x^H - x^M} \phi(x^H) \right]. \]

Clearly, this method can be used to refine the bound shown in Theorem 2.1, as long as each partition is rectangular. By applying this method to increasingly fine partitions, however, we again may have many function evaluations.
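As a minimal sketch (with a hypothetical one-dimensional density), the refinement $M_1$ of equations (9)-(10) can be computed as follows.

```python
from scipy.integrate import quad

def m1_refinement(phi, pdf, xL, xM, xH):
    """Huang-Ziemba-Ben-Tal refinement: apply Edmundson-Madansky on each
    of the subintervals [xL, xM] and [xM, xH] (equations (9) and (10))."""
    cL, _ = quad(pdf, xL, xM)                        # mass of the left cell
    cH = 1.0 - cL                                    # mass of the right cell
    mL = quad(lambda x: x * pdf(x), xL, xM)[0] / cL  # conditional mean, left
    mH = quad(lambda x: x * pdf(x), xM, xH)[0] / cH  # conditional mean, right
    left = (xM - mL) / (xM - xL) * phi(xL) + (mL - xL) / (xM - xL) * phi(xM)
    right = (xH - mH) / (xH - xM) * phi(xM) + (mH - xM) / (xH - xM) * phi(xH)
    return cL * left + cH * right

# Hypothetical example: uniform density on [0, 1], convex decreasing phi.
M1 = m1_refinement(lambda x: (1 - x) ** 2, lambda x: 1.0, 0.0, 0.5, 1.0)
```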

The bound in Theorem 2.1 forces the individual random variables to become perfectly correlated. Simply reapplying this bound over various rectangular partitions of the probability space changes that correlation. However, the following shows that we may refine the $HL_0$ bound and keep the perfect correlation. This helps primarily with keeping the number of function evaluations small. First, we state without proof a rather obvious extension of the bound $M_1$ given above.

Lemma 2.1 If $p^L x^L + p^M x^M + p^H x^H = E[X]$, $p^L + p^M + p^H = 1$, and $p^L, p^M, p^H \ge 0$, then

\[ \bar{\phi} \le M_1 \le p^L \phi(x^L) + p^M \phi(x^M) + p^H \phi(x^H) \le M_0 \]

for $0 < p^M \le c^L \frac{\bar{x}^L - x^L}{x^M - x^L} + c^H \frac{x^H - \bar{x}^H}{x^H - x^M}$, where $c^L$, $c^H$, $\bar{x}^L$ and $\bar{x}^H$ are as defined in equations (9) and (10).

Using this lemma, the bound for non-increasing convex functions with convex marginal return functions can be refined. Let $X$ be an $n$-dimensional random vector, where each $X_i$ is independently distributed with bounded support and finite mean. Further, let $x_i^L$, $x_i^H$ be the realizations of $X_i$ as defined in equations (1) and (2), so that $x_i^L \le x_i^H$ for all $i$. Further, let $x_i^L < x_i^M < x_i^H$ for all $i$. Let $p_i^L \equiv \mathrm{prob}\{X_i = x_i^L\}$, $p_i^M \equiv \mathrm{prob}\{X_i = x_i^M\}$ and $p_i^H \equiv \mathrm{prob}\{X_i = x_i^H\}$ for all $i$ be such that

\[ p_i^L x_i^L + p_i^M x_i^M + p_i^H x_i^H = E[X_i], \quad p_i^L + p_i^M + p_i^H = 1, \quad p_i^L, p_i^M, p_i^H \ge 0. \]

Let $f(X)$ be a non-increasing convex function of $X$ with convex marginal return functions with respect to any pair $(X_i, X_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, n$.

Theorem 2.2 Let $X$ and $f(X)$ be as defined above. Let $\tilde{p}^L \equiv \max\{p_i^L/(p_i^L + p_i^H) : i = 1, \ldots, n\}$ and $\tilde{p}^H \equiv 1 - \tilde{p}^L$. Further, let

\[ p^M = \min\left\{ c_i^L \frac{\bar{x}_i^L - x_i^L}{x_i^M - x_i^L} + c_i^H \frac{x_i^H - \bar{x}_i^H}{x_i^H - x_i^M} : i = 1, 2, \ldots, n \right\}, \]

where $c_i^L$, $c_i^H$, $\bar{x}_i^L$ and $\bar{x}_i^H$ are as defined in equations (9) and (10). Then

\[ E[f(X)] \le p^L f(x_1^L, \ldots, x_n^L) + p^M f(x_1^M, \ldots, x_n^M) + p^H f(x_1^H, \ldots, x_n^H) \equiv HL_1, \]

where $p^L = \tilde{p}^L (1 - p^M)$ and $p^H = \tilde{p}^H (1 - p^M)$.

Proof: By induction. Let $n = 1$. The result holds by the Huang, Ziemba, and Ben-Tal extension of the Edmundson-Madansky bound. Assume the result is true for $n = t \ge 1$, and consider the case where $n = t + 1$. Then, by the extension of the Edmundson-Madansky bound,

\[ E[f(X)] \le p_1^L E[f(x_1^L, X_2, \ldots, X_n)] + p_1^M E[f(x_1^M, X_2, \ldots, X_n)] + p_1^H E[f(x_1^H, X_2, \ldots, X_n)]. \]

However, for a particular value of $X_1 = x_1$, $f(x_1, X_2, \ldots, X_n)$ is a function of $t$ variables, and so, by the induction hypothesis,

\[ E[f(x_1, X_2, \ldots, X_n)] \le \hat{p}^L f(x_1, x_2^L, \ldots, x_n^L) + \hat{p}^M f(x_1, x_2^M, \ldots, x_n^M) + \hat{p}^H f(x_1, x_2^H, \ldots, x_n^H), \]

where

\[ \hat{p}^M = \min\left\{ c_i^L \frac{\bar{x}_i^L - x_i^L}{x_i^M - x_i^L} + c_i^H \frac{x_i^H - \bar{x}_i^H}{x_i^H - x_i^M} : i = 2, 3, \ldots, n \right\}, \quad \hat{p}^L = \max\{p_i^L/(p_i^L + p_i^H) : i = 2, \ldots, n\} \cdot (1 - \hat{p}^M), \]

and $\hat{p}^H = (1 - \max\{p_i^L/(p_i^L + p_i^H) : i = 2, \ldots, n\}) \cdot (1 - \hat{p}^M)$. Thus, from above,

\[
\begin{aligned}
E[f(X)] \le{}& p_1^L \hat{p}^L f(x_1^L, x_2^L, \ldots, x_n^L) + p_1^L \hat{p}^M f(x_1^L, x_2^M, \ldots, x_n^M) + p_1^L \hat{p}^H f(x_1^L, x_2^H, \ldots, x_n^H)\\
&+ p_1^M \hat{p}^L f(x_1^M, x_2^L, \ldots, x_n^L) + p_1^M \hat{p}^M f(x_1^M, x_2^M, \ldots, x_n^M) + p_1^M \hat{p}^H f(x_1^M, x_2^H, \ldots, x_n^H)\\
&+ p_1^H \hat{p}^L f(x_1^H, x_2^L, \ldots, x_n^L) + p_1^H \hat{p}^M f(x_1^H, x_2^M, \ldots, x_n^M) + p_1^H \hat{p}^H f(x_1^H, x_2^H, \ldots, x_n^H).
\end{aligned}
\]

Note that the set of random variables $(X_2, \ldots, X_n)$ are now perfectly correlated and, therefore, can again be replaced by a single random variable. To simplify the notation, let $y^k \equiv (x_2^k, \ldots, x_n^k)$ for $k = L, M, H$. The main idea behind this part of the proof is to break the domain of $X$ into various rectangular regions, eliminate points within each of those regions, and continue this process until the only points remaining are the desired points. Assume without loss of generality that $p_1^L/(p_1^L + p_1^H) \le \max\{p_i^L/(p_i^L + p_i^H) : i = 2, \ldots, n\}$. Further, let

\[ \bar{p}_1^L = (1 - p^M) \cdot \frac{p_1^L}{p_1^L + p_1^H}, \quad \bar{p}_1^H = (1 - p^M) \cdot \frac{p_1^H}{p_1^L + p_1^H}, \quad \text{and} \quad \bar{p}^k = \hat{p}^k \, \frac{1 - p^M}{1 - \hat{p}^M}, \ k = L, H.
\]

Note that, by construction, $\bar{p}^L = \max\{p_i^L/(p_i^L + p_i^H) : i = 2, \ldots, n\} \cdot (1 - p^M) = \tilde{p}^L (1 - p^M)$. Then, by Lemma 2.1, the middle weights $p_1^M$ and $\hat{p}^M$ may both be reduced to the common value $p^M$, which only increases the bound:

\[
\begin{aligned}
E[f(X)] \le{}& \bar{p}_1^L \bar{p}^L f(x_1^L, y^L) + \bar{p}_1^L p^M f(x_1^L, y^M) + \bar{p}_1^L \bar{p}^H f(x_1^L, y^H)\\
&+ p^M \bar{p}^L f(x_1^M, y^L) + p^M p^M f(x_1^M, y^M) + p^M \bar{p}^H f(x_1^M, y^H)\\
&+ \bar{p}_1^H \bar{p}^L f(x_1^H, y^L) + \bar{p}_1^H p^M f(x_1^H, y^M) + \bar{p}_1^H \bar{p}^H f(x_1^H, y^H). \tag{11}
\end{aligned}
\]

Now, consider the region of the domain of $X$ in which $x_1^L \le X_1 \le x_1^M$ and $y^L \le y \le y^M$. By Theorem 2.1,

\[ \bar{p}_1^L \bar{p}^L f(x_1^L, y^L) + \bar{p}_1^L p^M f(x_1^L, y^M) + p^M \bar{p}^L f(x_1^M, y^L) + p^M p^M f(x_1^M, y^M) \le \bar{p}^L (\bar{p}_1^L + p^M) f(x_1^L, y^L) + p^M (\bar{p}_1^L + p^M) f(x_1^M, y^M). \tag{12} \]

Next, consider the region of the domain of $X$ in which $x_1^M \le X_1 \le x_1^H$ and $y^M \le y \le y^H$. Again by Theorem 2.1, since $\bar{p}_1^H \ge \bar{p}^H$, we obtain:

\[ p^M p^M f(x_1^M, y^M) + p^M \bar{p}^H f(x_1^M, y^H) + \bar{p}_1^H p^M f(x_1^H, y^M) + \bar{p}_1^H \bar{p}^H f(x_1^H, y^H) \le p^M (p^M + \bar{p}_1^H) f(x_1^M, y^M) + \bar{p}^H (p^M + \bar{p}_1^H) f(x_1^H, y^H). \tag{13} \]

Finally, looking over the entire domain of $X$ and considering only the corner points, since $\bar{p}^L \ge \bar{p}_1^L$, we can use Theorem 2.1 to obtain:

\[ \bar{p}_1^L \bar{p}^L f(x_1^L, y^L) + \bar{p}_1^L \bar{p}^H f(x_1^L, y^H) + \bar{p}_1^H \bar{p}^L f(x_1^H, y^L) + \bar{p}_1^H \bar{p}^H f(x_1^H, y^H) \le \bar{p}^L (\bar{p}_1^L + \bar{p}_1^H) f(x_1^L, y^L) + \bar{p}^H (\bar{p}_1^L + \bar{p}_1^H) f(x_1^H, y^H). \tag{14} \]

Combining equations (11)-(14) gives the desired result. $\Box$

Note that the only restrictions on the choice of $x_i^M$ are that $x_i^L < x_i^M < x_i^H$ and that $x_i^M$ be such that $0 < p_i^M < 1$. Therefore, the middle points can be chosen so that the $p_i^M$ values are all relatively close. This aids the effectiveness of the refined bound.
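The following is a minimal sketch of evaluating $HL_1$ for discrete three-point marginals; the helper and its inputs are hypothetical, and the middle weight is taken simply as the minimum of the given $p_i^M$ values, which is valid only when each $p_i^M$ is at or below its Lemma 2.1 limit.

```python
def hl1(f, lo, mid, hi, pL, pM, pH):
    """Theorem 2.2 bound: three function evaluations.
    pL[i], pM[i], pH[i] are the three-point weights of component i."""
    n = len(lo)
    p_tilde_L = max(pL[i] / (pL[i] + pH[i]) for i in range(n))
    # Simplification: assumes each pM[i] respects its Lemma 2.1 limit,
    # so their minimum is a valid common middle weight p^M.
    p_mid = min(pM)
    wL = p_tilde_L * (1.0 - p_mid)
    wH = (1.0 - p_tilde_L) * (1.0 - p_mid)
    return wL * f(lo) + p_mid * f(mid) + wH * f(hi)

# Hypothetical example with three-point marginals:
f = lambda x: -(x[0] + 2 * x[1])
b = hl1(f, [1, 0], [9.5, 6.9], [25, 20],
        pL=[0.4, 0.45], pM=[0.3, 0.3], pH=[0.3, 0.25])
```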

3 Example Functions

In this section, we consider two examples to illustrate the advantage of this new bound. The first example represents a common logarithmic utility function with two commodities. The second is taken from a vehicle allocation problem.

Example #1: $\phi(x_1, x_2) = -\ln(x_1^2 + 8x_2)$, $1 \le x_1 \le 25$, $0 \le x_2 \le 20$.

Since all second derivatives of $\phi(x_1, x_2)$ are nonnegative over the range of all possible values of $X_1$ and $X_2$, $\phi(X_1, X_2)$ is a convex function of $X_1$ and $X_2$ over that range (see, e.g., Rockafellar [9]). Since the argument of $\ln(\cdot)$ always increases as $X_1$ and/or $X_2$ increases, and since $-\ln(x)$ is a strictly decreasing function of $x > 0$, $\phi(X_1, X_2)$ is also a non-increasing function of $X_1$ and $X_2$. Finally, since

\[ \frac{\partial \phi(x_1, x_2)}{\partial x_2} = \frac{-8}{x_1^2 + 8x_2} \]

is a non-decreasing function of $x_1$, $\phi(X_1, X_2)$ has convex marginal return functions with respect to $X_1$ and $X_2$. Thus, the results from Section 2 can be used to generate upper bounds on $E[\phi(X_1, X_2)] \equiv \bar{\phi}$. Let the random variables $X_1$ and $X_2$ have the following probability density functions:

$X_1$: $f_1(x_1) = .0567 - .0011571 x_1$, $1 \le x_1 \le 25$ $\Rightarrow$ $E[X_1] = 9.4967$ $\Rightarrow$ E-M weights: $(.6460, .3540)$;
$X_2$: $f_2(x_2) = .11565 \, e^{-x_2/10}$, $0 \le x_2 \le 20$ $\Rightarrow$ $E[X_2] = 6.8710$ $\Rightarrow$ E-M weights: $(.6565, .3435)$.

Theorem 2.1 can be applied to give the following upper bound on $\bar{\phi}$:

\[ HL_0 = .6565 \, \phi(1, 0) + .3435 \, \phi(25, 20) = (.6565)(-\ln(1)) + (.3435)(-\ln(785)) = -2.28966 \ge \bar{\phi}, \]

compared to the Edmundson-Madansky bound, which is

\[ M_0 = .6460 \cdot .6565 \, \phi(1, 0) + .6460 \cdot .3435 \, \phi(1, 20) + .3540 \cdot .6565 \, \phi(25, 0) + .3540 \cdot .3435 \, \phi(25, 20) = -3.43439 \ge \bar{\phi}. \]

Then, by Jensen's inequality, a lower bound can be established:

\[ \bar{\phi} \ge \phi(9.4967, 6.8710) = -\ln(145.14671) = -4.97775. \]

Finally, the upper bound can be tightened using Theorem 2.2. Let $x_1^M = E[X_1]$ and $x_2^M = E[X_2]$. Then $p^M = .51365$. The improved upper bound is:

\[ HL_1 = .6565 (1 - .51365) \, \phi(1, 0) + .51365 \, \phi(9.4967, 6.8710) + .3435 (1 - .51365) \, \phi(25, 20) = -3.67024 \ge \bar{\phi}. \]

Note that this bound improves on the Edmundson-Madansky bound and requires fewer function evaluations.
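The numbers above can be reproduced with a short script (a sketch; it simply re-evaluates the closed-form bounds using the weights reported in the text):

```python
import math

phi = lambda x1, x2: -math.log(x1 ** 2 + 8 * x2)

# Edmundson-Madansky weights reported for X1 and X2.
p1L, p1H = 0.6460, 0.3540
p2L, p2H = 0.6565, 0.3435

HL0 = max(p1L, p2L) * phi(1, 0) + (1 - max(p1L, p2L)) * phi(25, 20)
M0 = (p1L * p2L * phi(1, 0) + p1L * p2H * phi(1, 20)
      + p1H * p2L * phi(25, 0) + p1H * p2H * phi(25, 20))
jensen = phi(9.4967, 6.8710)          # lower bound via Jensen's inequality
pM = 0.51365
HL1 = (p2L * (1 - pM) * phi(1, 0) + pM * phi(9.4967, 6.8710)
       + p2H * (1 - pM) * phi(25, 20))
# HL0 ~ -2.2897, M0 ~ -3.4344, jensen ~ -4.9777, HL1 ~ -3.670
```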

Example #2: Consider the network flow problem shown in Figure 1. To the left of Nodes 1 and 2 is the available supply for those nodes, and to the right of Nodes 3 through 7 is the demand requirement for each of those nodes. The link number, the link's upper capacity, and the cost per unit flow are listed along each link. Each link is assumed to have a lower capacity of zero. Note that Links 1, 2, 3, and 4 have random variables $\xi_1$, $\xi_2$, $\xi_3$ and $\xi_4$, respectively, for their upper capacities.

[Figure 1: Minimum cost network flow problem with supply Nodes 1 and 2, demand Nodes 3 through 7, and nine links; each link is labeled with (link number, upper capacity, cost per unit flow).]

For a particular realization of $\xi = (\xi_1, \xi_2, \xi_3, \xi_4)$, let $f(\xi)$ denote the value of minimizing costs in this network. Letting $x_i$ denote the flow along Link $i$, $f(\xi)$ can be written as a linear program of the form

\[
\begin{aligned}
f(\xi) = \min \ & \textstyle\sum_{i=1}^{9} c_i x_i\\
\text{s.t.} \ & \text{flow balance at Nodes 1 through 7 (supplies at Nodes 1 and 2, demands at Nodes 3 through 7)},\\
& x_i \le \xi_i, \quad i = 1, \ldots, 4,\\
& x_i \le u_i, \quad i = 5, \ldots, 9,\\
& x_i \ge 0, \quad i = 1, \ldots, 9,
\end{aligned}
\]

where the costs $c_i$, fixed capacities $u_i$, and node supplies and demands are as given in Figure 1.

Suppose $\xi_1$ and $\xi_3$ have the following distribution:

  x      11   12   13   14   15   16   17   18   19   20
  Pr(x) .02  .04  .05  .05  .06  .06  .06  .07  .07  .08

  x      21   22   23   24   25   26   27   28   29   30
  Pr(x) .07  .07  .06  .05  .05  .04  .04  .03  .02  .01

and $\xi_2$ and $\xi_4$ have the following distribution:

  x      21   22   23   24   25   26   27   28   29   30
  Pr(x) .01  .03  .05  .06  .07  .08  .09  .09  .10  .10

  x      31   32   33   34   35   36   37   38   39   40
  Pr(x) .11  .08  .05  .02  .01  .01  .01  .01  .01  .01

This gives the following:

  $E[\xi_1] = E[\xi_3] = 19.8$  $\Rightarrow$  E-M weights: $(.5368, .4632)$;
  $E[\xi_2] = E[\xi_4] = 28.69$  $\Rightarrow$  E-M weights: $(.5953, .4047)$.
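A short sketch (with a hypothetical helper name) of how these means and Edmundson-Madansky weights follow directly from the tables:

```python
def em_weights(support, probs):
    """Mean and Edmundson-Madansky weights (p^L, p^H) of a discrete marginal."""
    mean = sum(x * p for x, p in zip(support, probs))
    lo, hi = support[0], support[-1]
    pL = (hi - mean) / (hi - lo)
    return mean, pL, 1.0 - pL

probs_13 = [.02, .04, .05, .05, .06, .06, .06, .07, .07, .08,
            .07, .07, .06, .05, .05, .04, .04, .03, .02, .01]
probs_24 = [.01, .03, .05, .06, .07, .08, .09, .09, .10, .10,
            .11, .08, .05, .02, .01, .01, .01, .01, .01, .01]
print(em_weights(list(range(11, 31)), probs_13))  # (19.8, 0.5368..., 0.4631...)
print(em_weights(list(range(21, 41)), probs_24))  # (28.69, 0.5952..., 0.4047...)
```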

Solving each of the $20^4$ possible versions of this linear program could be very difficult, making the value of effective bounds quite useful. This mathematical program, viewed as a function of the random upper capacities $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$, is a convex function (see, e.g., Birge and Louveaux [1]); a non-increasing function, since any solution feasible for $\xi' = (\xi_1', \ldots, \xi_4')$ is also feasible for $\xi = (\xi_1, \ldots, \xi_4)$ such that $\xi_i \ge \xi_i'$, $i = 1, \ldots, 4$; and has convex marginal return functions with respect to each of these components, by [4]. To see the idea behind this last property, suppose the network flow problem is solved for particular values of $\xi_1$, $\xi_2$, $\xi_3$, and $\xi_4$. Suppose then that the value of $\xi_1$ is increased by one, while all other capacities remain unchanged. Suppose that the new problem is solved by a network algorithm which recursively deletes negative cost cycles, such as the out-of-kilter method, and the solution to the previous problem is used as the starting solution. The only possible negative cost cycle would have to include an increase of flow on Link 1, since the previous solution was optimal and the only changed aspect of the problem is the upper capacity of Link 1. But that implies that the solution to the updated problem either decreases or leaves unchanged the flow along Link 2. In either event, if $\xi_2$ is now increased by one, the effect of that increase is no greater than if $\xi_2$ had been increased before $\xi_1$.

Again, the results from Section 2 can be applied to bound the expected value of $f(\xi)$. The bound from Theorem 2.1 gives an upper bound on the expected value of $f(\xi)$ by solving two versions of this program, instead of the sixteen needed to obtain the E-M bound. Here,

\[ HL_0 = .5953 \, f(11, 21, 11, 21) + .4047 \, f(30, 40, 30, 40) = (.5953)(-270.00) + (.4047)(-365.00) = -308.4465 \ge E[f(\xi)]. \]

The Edmundson-Madansky bound for this function is:

\[ M_0 = -319.1815 \ge E[f(\xi)]. \]

Jensen's inequality gives a lower bound of $-343.09$.

Finally, Theorem 2.2 can be used to refine the upper bound. Let $\xi_1^M = \xi_3^M = 19.8$ and let $\xi_2^M = \xi_4^M = 32.5$. Then $p^M = .5851$. The improved upper bound is:

\[ HL_1 = .2470 \, f(11, 21, 11, 21) + .5851 \, f(19.8, 32.5, 19.8, 32.5) + .1679 \, f(30, 40, 30, 40) = -329.4819 \ge E[f(\xi)]. \]

In this case, the result is again better than the Edmundson-Madansky bound $M_0$, with only three instead of sixteen function evaluations.
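A sketch of how these bounds would be assembled once a network solver for $f(\xi)$ is available; here `f` is a hypothetical stand-in that simply returns the two corner values reported in the text rather than solving the linear program.

```python
def f(xi):
    """Stand-in for the min-cost network flow solver f(xi)."""
    known = {(11, 21, 11, 21): -270.00, (30, 40, 30, 40): -365.00}
    return known[tuple(xi)]

pL, pH = 0.5953, 0.4047        # max of the E-M low weights, and 1 - pL
HL0 = pL * f([11, 21, 11, 21]) + pH * f([30, 40, 30, 40])   # = -308.4465

pM = 0.5851                    # middle weight p^M from Theorem 2.2
# HL1 would additionally require one solver call at the middle realization:
# HL1 = pL*(1 - pM)*f(low) + pM*f((19.8, 32.5, 19.8, 32.5)) + pH*(1 - pM)*f(high)
```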

4 Conclusion

The difficulty of finding an upper bound for a convex function of a multivariate random variable, when the number of random components is greater than three, makes any special properties of certain classes of convex functions useful in analysis. In this paper, a bound is established for non-increasing convex functions which requires only two function evaluations whenever marginal return functions are convex with respect to any pair of the random components. Further, this bound can be refined in such a way that the number of function evaluations needed grows linearly with the number of refinements performed. The bound has already been effective in establishing upper bounds for certain stochastic network flow problems. In practice, this bound has been found to be consistently as good as or better than the Edmundson-Madansky bound after only one refinement, regardless of the number of random components.

References

[1] J. R. Birge and F. Louveaux, Stochastic Programming, unpublished manuscript (1995).

[2] J. R. Birge and S. W. Wallace, "A Separable Piecewise Linear Upper Bound for Stochastic Linear Programs", SIAM Journal on Control and Optimization 26, 725-739 (1988).

[3] J. R. Birge and R. J-B. Wets, "Sublinear Upper Bounds for Stochastic Programs with Recourse", Mathematical Programming 43, 131-149 (1989).

[4] C. J. Donohue and J. R. Birge, "An Upper Bound on the Network Recourse Function", Working Paper, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan (1995).

[5] K. Frauendorfer, "Solving Stochastic Linear Program Recourse Problems with Arbitrary Multivariate Distributions - The Dependent Case", Mathematics of Operations Research 13, 377-394 (1988).

[6] H. Gassmann and W. T. Ziemba, "A Tight Upper Bound for the Expectation of a Convex Function of a Multivariate Random Variable", Mathematical Programming Study 27, 39-53 (1986).

[7] C. C. Huang, W. T. Ziemba and A. Ben-Tal, "Bounds on the Expectation of a Convex Function of a Random Variable: With Applications to Stochastic Programming", Operations Research 25, 315-325 (1977).

[8] A. Madansky, "Bounds on the Expectation of a Convex Function of a Multivariate Random Variable", Annals of Mathematical Statistics 30, 743-746 (1959).

[9] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ (1970).

[10] S. W. Wallace, "A Piecewise Linear Upper Bound on the Network Recourse Function", Mathematical Programming 38, 133-146 (1987).

Lemma 2.1 If $p^L x^L + p^M x^M + p^H x^H = E[X]$, $p^L + p^M + p^H = 1$, and $p^L, p^M, p^H \ge 0$, then

\[ \bar{\phi} \le M_1 \le p^L \phi(x^L) + p^M \phi(x^M) + p^H \phi(x^H) \le M_0 \]

for $0 < p^M \le c^L \frac{\bar{x}^L - x^L}{x^M - x^L} + c^H \frac{x^H - \bar{x}^H}{x^H - x^M}$, where $c^L$, $c^H$, $\bar{x}^L$ and $\bar{x}^H$ are as defined above.

Proof: The first inequality comes from Huang, Ziemba, and Ben-Tal [7]. The second and third come from noting that at

\[ p^M = c^L \frac{\bar{x}^L - x^L}{x^M - x^L} + c^H \frac{x^H - \bar{x}^H}{x^H - x^M}, \]

we have $p^L \phi(x^L) + p^M \phi(x^M) + p^H \phi(x^H) = M_1$, and at $p^M = 0$, $p^L \phi(x^L) + p^M \phi(x^M) + p^H \phi(x^H) = M_0$. Between those two values of $p^M$, $p^L \phi(x^L) + p^M \phi(x^M) + p^H \phi(x^H)$ varies linearly in $p^M$. $\Box$
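A quick numerical check of this linearity (a sketch using a hypothetical uniform distribution on $[0, 1]$ with $x^M = 0.5$, for which the quantities in equations (9) and (10) are known in closed form):

```python
import numpy as np

phi = lambda x: (1 - x) ** 2            # convex test function
xL, xM, xH, mean = 0.0, 0.5, 1.0, 0.5   # uniform distribution on [0, 1]
cL = cH = 0.5                           # cell masses (equation (9))
mL, mH = 0.25, 0.75                     # conditional means (equation (10))
pM_max = cL * (mL - xL) / (xM - xL) + cH * (xH - mH) / (xH - xM)  # = 0.5

for pM in np.linspace(0.0, pM_max, 3):
    # Solve p^L + p^H = 1 - p^M and p^L*xL + p^H*xH = mean - p^M*xM.
    pH = (mean - pM * xM - (1 - pM) * xL) / (xH - xL)
    pL = 1 - pM - pH
    print(pM, pL * phi(xL) + pM * phi(xM) + pH * phi(xH))
# The printed bound moves linearly from M0 = 0.5 (at pM = 0)
# down to M1 = 0.375 (at pM = pM_max).
```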