A Quadratic Recourse Function for the
Two-Stage Stochastic Program
John R. Birge
Stephen M. Pollock
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor MI 48109-2117, USA
and
Liqun Qi
School of Mathematics
The University of New South Wales
Sydney, New South Wales 2052, Australia
Technical Report 95-13

July 28, 1995
Abstract
We present a quadratic recourse representation of the two-stage stochastic linear problem. Unlike the usual linear recourse model, it is differentiable with respect to the first-stage decision variables. This offers the possibility of applying methods with high convergence rates to solve the two-stage problem. We show that the quadratic recourse function approximates the linear recourse function (and the corresponding solution of the two-stage problem with quadratic recourse converges to the solution of the two-stage problem with linear recourse) as a parameter k → ∞ and another parameter ε_k → 0. We also give a bound for this approximation.
1 This author's work was supported in part by the National Science Foundation under Award Number DDM-9215921.
2 This author's work was supported in part by the Australian Research Council.

1 Introduction
One of the main paradigms of stochastic programming is represented by the two-stage stochastic linear program formulated as a master problem and a recourse problem [8, 9]. The master problem is

    min  f(x) = c^T x + Φ(x)
    s.t. Ax = b                                              (1)
         x ≥ 0,

where x ∈ R^n is the first-stage decision vector, c ∈ R^n is the cost coefficient vector for x, and Ax = b and x ≥ 0 are the linear constraints on x, with b ∈ R^m and A ∈ R^(m×n). Φ(x) is the expected value of the linear recourse function, Φ(x) = E(φ(x, ξ)), where φ(x, ξ) is defined by the recourse problem

    φ(x, ξ) = min  (q(ξ))^T y
              s.t. W(ξ)y = T(ξ)x − h(ξ)                      (2)
                   y ≥ 0.

Here ξ is a random variable of dimension r with distribution P(·), so by definition,

    Φ(x) = E(φ(x, ξ)) = ∫ φ(x, ξ) P(dξ).                     (3)

The vector y ∈ R^s is the second-stage decision vector, h ∈ R^r is the demand or resource vector, and T ∈ R^(r×n) is a technology matrix.

The usual origin of the constraint equations in (2) is the desire to satisfy the condition T(ξ)x − h(ξ) = 0. However, because of the random nature of both T and h, T(ξ)x − h(ξ) does not equal zero in general. Thus Wy is introduced to represent the "discrepancy" (where W ∈ R^(r×s)), and q ∈ R^s, q > 0, is the associated cost coefficient vector for non-zero y.
To simplify our discussion, we only discuss in detail the fixed recourse problem, in which W does not depend upon ξ. We also assume that T and q are independent of ξ and that h = ξ, conditions corresponding to uncertain resource levels but known prices and technologies. Equation (2) then becomes

    φ(x, ξ) = min  q^T y
              s.t. Wy = Tx − ξ                               (4)
                   y ≥ 0.

For convenience, we let ξ be a discrete random vector, so that

    Φ(x) = Σ_{j=1}^{l} φ(x, ξ_j) p_j,                        (5)

where p_j > 0 and Σ_{j=1}^{l} p_j = 1. Our results can be readily extended to continuous distributions.
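To make the fixed-recourse model concrete, the sketch below evaluates Φ(x) in (5) for a small hypothetical instance (the data W, q, T and the scenarios are our own illustrative assumptions, not from the paper); for this particular W and q the inner LP (4) has a closed-form value, so no LP solver is needed.

```python
import numpy as np

# Hypothetical fixed-recourse data (our assumption, not from the paper):
# with W = [[1, 0, 1], [0, 1, 1]] and q = (1, 1, 1), the recourse LP (4),
#   phi(z) = min { q^T y : W y = z, y >= 0 },   z = T x - xi,
# has the closed-form value max(z1, z2) for z >= 0: take y3 = min(z1, z2),
# y1 = z1 - y3, y2 = z2 - y3, so q^T y = z1 + z2 - min(z1, z2) = max(z1, z2).
def phi(z):
    assert np.all(z >= 0.0), "closed form valid only for z >= 0"
    return float(np.max(z))

def Phi(x, T, scenarios, probs):
    """Expected recourse (5): Phi(x) = sum_j p_j * phi(T x - xi_j)."""
    return sum(p * phi(T @ x - xi) for xi, p in zip(scenarios, probs))

# Two equally likely resource scenarios (illustrative numbers).
T = np.eye(2)
x = np.array([2.0, 2.0])
scenarios = [np.array([1.0, 0.5]), np.array([0.0, 1.0])]
probs = [0.5, 0.5]

print(Phi(x, T, scenarios, probs))   # 0.5*max(1, 1.5) + 0.5*max(2, 1) = 1.75
```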
There are two disadvantages associated with the two-stage stochastic program with fixed recourse represented by equations (1), (5) and (4). First, Φ is generally nondifferentiable with respect to x, which prevents the use of algorithms with high rates of convergence. Secondly, (4) may not be feasible for some x. (A problem with (4) feasible for all x is called a problem with complete recourse. If (4) is feasible for all x satisfying the linear constraints of (1), then it is called a problem with relatively complete recourse.)
To address these difficulties we consider below a problem with a quadratic recourse function φ_k(x, ξ), where k > 0 is a parameter. This quadratic recourse function is always continuously differentiable, which paves the way for using algorithms with high rates of convergence to solve the two-stage stochastic program.
Two-stage quadratic recourse models have been proposed before. In particular, Rockafellar and Wets discuss the extended linear-quadratic problem (ELQP) [18]. ELQP introduces additional coefficient matrices, but there are no direct links between its solutions and the solutions of the two-stage stochastic program with linear recourse. On the other hand, our quadratic recourse function will have a direct link with (4) in that it is proposed as an alternative way to model the original linear recourse problem. We show that the quadratic recourse function φ_k(x, ξ) converges to the linear recourse function φ(x, ξ), and that the solution of the two-stage stochastic program with quadratic recourse converges to the solution of the two-stage stochastic program with linear recourse, as k → ∞. An error bound is also given for this convergence.

The remaining part of this paper is as follows. In Section 2 we present the quadratic recourse function φ_k(x, ξ), discuss its meaning and point out its differences from the ELQP model. We discuss its differentiability properties and algorithmic implications in Section 3. In Section 4 we prove that it converges to the linear recourse function as k → ∞. An error bound is given for this convergence, and a simple example is presented to illustrate the use of this bound. We also show that the solution of the two-stage stochastic program with recourse φ_k(x, ξ) converges to the solution of the two-stage stochastic program with recourse φ(x, ξ). In Section 5 we make some concluding remarks.
2 The Recourse Function

We now consider an alternative to (1), (4) and (5) that has the same second-stage optimal solution under our nonnegative objective assumptions. We do this by minimizing the square of the recourse function; i.e., we solve

    φ²(x, ξ) = min  (q^T y)²
               s.t. Wy = Tx − ξ                              (6)
                    y ≥ 0,

for q > 0.

We could use this function directly in an optimization procedure, but more than first-order differentiability is needed for high convergence rate methods. We therefore approximate the problem represented in (6) by the following parametrized quadratic recourse function:

    φ_k²(x, ξ) = min  (q^T y)² + k‖Wy − Tx + ξ‖² + ε_k
                 s.t. y ≥ 0,                                 (7)
where ‖v‖² = v^T v is the squared 2-norm of the vector v, and k and ε_k are two positive parameters. We take φ_k(x, ξ) > 0 (the positive square root), so that φ_k(x, ξ) ≥ √ε_k > 0 and there is a clearly defined φ_k that satisfies equation (7). Correspondingly, (1) and (5) are replaced by

    min  f_k(x) = c^T x + Φ_k(x)
    s.t. Ax = b                                              (8)
         x ≥ 0,

where

    Φ_k(x) = E(φ_k(x, ξ)) = Σ_{j=1}^{l} φ_k(x, ξ_j) p_j.     (9)

We may think of (7) as an alternative representation of the situation expressed by the equations of (6). Instead of minimizing (q^T y)² subject to the constraint Wy = Tx − ξ, we minimize the weighted sum of (q^T y)² and ‖Wy − Tx + ξ‖², with k reflecting the relative importance of satisfying the Tx − ξ = Wy constraints compared to minimizing (q^T y)². The additional parameter ε_k is added to ensure that φ_k ≥ √ε_k > 0. This will be useful for establishing differentiability properties of φ_k. [We could also replace k by an r × r diagonal matrix K such that each diagonal element of K "weights" a component of Wy − Tx + ξ. In this case, (7) would have the form

    φ_K²(x, ξ) = min  (q^T y)² + (Wy − Tx + ξ)^T K (Wy − Tx + ξ) + ε_k      (10)
                 s.t. y ≥ 0.

Again, for simplicity, we restrict our discussion to the form (7), although the results below can be shown to hold under the conditions ‖K‖ → ∞ and ε_k → 0.]
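As a rough numerical sketch of (7), the Python fragment below evaluates φ_k²(x, ξ) by projected gradient descent; since the objective is a convex quadratic in y, this simple method suffices for a tiny instance. The data W, q, z are illustrative assumptions, and the solver itself is ours, not an algorithm from the paper.

```python
import numpy as np

def phi_k_squared(z, W, q, k, eps_k, iters=30000):
    """Evaluate (7): min over y >= 0 of (q^T y)^2 + k*||W y - z||^2 + eps_k.
    The objective is a convex quadratic in y, so plain projected gradient
    descent with a 1/L step converges for this tiny instance."""
    H = 2.0 * (np.outer(q, q) + k * (W.T @ W))   # constant Hessian of the objective
    step = 1.0 / np.linalg.eigvalsh(H).max()     # 1/L, L = largest eigenvalue
    y = np.zeros(W.shape[1])
    for _ in range(iters):
        grad = 2.0 * (q @ y) * q + 2.0 * k * (W.T @ (W @ y - z))
        y = np.maximum(y - step * grad, 0.0)     # projection onto y >= 0
    return (q @ y) ** 2 + k * np.linalg.norm(W @ y - z) ** 2 + eps_k

# Illustrative data (our assumption): for this W and q the linear recourse
# value is phi(z) = max(z1, z2) = 1, so phi^2 = 1 and the printed values
# should approach 1 from below as k grows.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
q = np.ones(3)
z = np.array([1.0, 0.5])

for k in (1.0, 10.0, 100.0):
    print(k, phi_k_squared(z, W, q, k, eps_k=1e-6))
```

For growing k the printed values increase toward the squared linear recourse value, in line with the monotone convergence established in Section 4.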
The idea of converting the objective function of (2) to a quadratic form is not new. For example, the ELQP model considers the dual problem of (4):

    φ(x, ξ) = max  (Tx − ξ)^T u
              s.t. W^T u ≤ q,                                (11)

where u is the dual variable vector. ELQP adds a quadratic term −(1/2) u^T H u to (11) to obtain

    φ_H(x, ξ) = max  (Tx − ξ)^T u − (1/2) u^T H u
                s.t. W^T u ≤ q,                              (12)

where H is an r × r positive definite symmetric monitor matrix.

As a result of formulations such as (12), superlinearly convergent algorithms have been developed by Qi and Womersley [16], Chen, Qi and Womersley [5], and Birge, Chen and Qi [2]. However, the model represented by (12) has no immediate relation to the original linear recourse model (4), and, in particular, it is hard to interpret the "meaning" of the matrix H. Moreover, although we may let ‖H‖ → 0 so that φ_H → φ, this results in computational instability in the second-order algorithms used to solve the two-stage recourse problem with recourse φ_H.

On the other hand, as we show below, when we solve the quadratic version of the original problem, not only does φ_k → φ as k → ∞, but the algorithms used to solve the problem represented by (8), (9) and (7) are stable as k → ∞.

3 Differentiability of the Quadratic Recourse Function

By the theory of linear programming, φ(x, ξ), defined by (4), is not in general differentiable with respect to x. This makes it impossible to apply superlinearly convergent methods, such as the Newton method, to solve the stochastic program defined by (1), (4) and (5). Classically, superlinear convergence of a Newton method for solving a nonlinear optimization problem requires that the objective and the constraint functions of the problem be twice continuously differentiable. In [14], based upon the superlinear convergence theory for nonsmooth equations [11, 13, 15], Qi developed superlinearly convergent generalized Newton methods for solving nonlinear optimization problems whose objective and constraint functions are SC¹ (i.e., they are continuously differentiable and their derivatives are semismooth [12]).

In general, an SC¹ function is not twice differentiable. A nonlinear optimization problem with an SC¹ objective function and linear constraints is called an SC¹ problem [12]. It has been shown that the ELQP is an SC¹ problem [14, 16]. The superlinearly convergent generalized Newton method proposed in [14] was globalized by using a line search and the trust region strategy in [12] and [7], respectively. These methods were applied to the ELQP in [2, 5, 16].
In this section, we will show that the two-stage stochastic program defined by (8), (9) and (7) is also an SC¹ problem. This opens the way to apply the superlinearly and globally convergent generalized Newton methods [14, 12, 7, 2] to solve this problem. Before doing this, we briefly review the definition of semismoothness of a vector function and related concepts of generalized Jacobians of vector functions.

Suppose that F: R^n → R^m is a Lipschitz vector function. By Rademacher's theorem, F is differentiable almost everywhere. Let D_F be the set where F is differentiable. At any point x ∈ R^n, the B-differential [11, 13] of F at x is defined by

    ∂_B F(x) = { lim_{y → x, y ∈ D_F} ∇F(y) },

which is a nonempty set. The Clarke subdifferential [6] of F at x is

    ∂F(x) = conv ∂_B F(x),

which is a nonempty convex set. If for any h ∈ R^n,

    lim_{V ∈ ∂F(x + th), t ↓ 0} {Vh}                         (13)

exists, then we say that F is semismooth at x. In this case, F is also directionally differentiable at x and F′(x; h) equals the limit in (13). The concept of semismooth functions was introduced for functionals by Mifflin in [10] and extended to vector functions by Qi and Sun in [15].
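A minimal numerical illustration of these definitions, using the scalar semismooth function F(x) = |x| (our toy example, not from the paper): F is differentiable except at 0, its B-differential at 0 is {−1, +1}, its Clarke subdifferential at 0 is the convex hull [−1, 1], and the limit in (13) recovers the directional derivative F′(0; h) = |h|.

```python
# Toy illustration (our example, not from the paper): F(x) = |x| is Lipschitz
# and differentiable everywhere except x = 0, where its derivative is sign(x).
def F(x):
    return abs(x)

def grad_F(x):
    """Derivative of F at a point of differentiability (x != 0)."""
    return 1.0 if x > 0 else -1.0

# B-differential at 0: limits of nearby derivatives from either side, {-1, +1}.
b_diff = sorted({grad_F(t) for t in (1e-8, -1e-8)})

# Semismoothness at 0: the limit (13) of V*h for V in the subdifferential at
# 0 + t*h, t -> 0+, exists and equals the directional derivative F'(0; h) = |h|.
def limit_13(h, t=1e-8):
    return grad_F(t * h) * h

print(b_diff)          # [-1.0, 1.0]
print(limit_13(2.0))   # 2.0
print(limit_13(-2.0))  # 2.0
```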
The following theorem establishes the SC1 property of the stochastic two-stage
problem with quadratic recourse.

Theorem 1 The stochastic program (8), where Φ_k is defined by (9) and (7), is an SC¹ problem.

Proof. Since φ_k ≥ √ε_k > 0, it suffices to prove that φ_k² is an SC¹ function with respect to x. Define z = Tx − ξ and ψ_k(z) = φ_k²(x, ξ). Then rewriting (7) gives

    ψ_k(z) = min  (q^T y)² + k‖Wy − z‖² + ε_k
             s.t. y ≥ 0.                                     (14)

If ψ_k is differentiable with respect to z, then φ_k² is differentiable with respect to x and

    ∇_x φ_k²(x, ξ) = T^T ∇_z ψ_k(z).

Let z̄ = (√k z^T, 0)^T ∈ R^(r+1), let W̄ be the (r+1) × s matrix obtained by stacking √k W on top of the row q^T, and let u = W̄y ∈ R^(r+1), so that (q^T y)² + k‖Wy − z‖² = ‖W̄y − z̄‖². Then U = {u = W̄y : y ≥ 0} is a polyhedron in R^(r+1). Define g: R^(r+1) → R ∪ {+∞} by

    g(u) = u^T u, if u ∈ U;  +∞, otherwise.

Then g is a closed proper extended-valued strongly convex function. We now can write (14) as

    ψ_k(z) = k z^T z − max_{u ∈ U} {2 z̄^T u − u^T u} + ε_k   (15)
           = k z^T z − g*(2z̄) + ε_k,                         (16)

where g* is the conjugate function of g. By Theorem 23.5 of [17], since g is strongly convex, g* is finite and continuously differentiable everywhere, and its gradient at 2z̄ is the unique maximum point of the maximum operation in (15). Furthermore, the derivative of g* is Lipschitz. Actually, it is not difficult to see that the unique maximum point of the maximum operation in (15) is piecewise linear with respect to z. Hence, the derivative mapping of g* is semismooth [15]. This shows that ψ_k, hence φ_k², is an SC¹ function. Therefore, the stochastic programming problem (8), (9) and (7) is an SC¹ optimization problem. □
Now we can apply the generalized Newton (SQP) methods proposed in [14, 12, 7] to solve (8). With an adequate nonsingularity condition, these methods are superlinearly and globally convergent. In the next section, we will show that ψ_k(z) converges to ψ(z) ≡ φ²(x, ξ) as k → ∞, and give an error bound. This shows that the generalized Newton method is also stable for this problem.

It is noted that, although g* is convex, ψ_k and Φ_k are not convex in general. In fact, by (15), if W is a nonnegative matrix, then ψ_k is the difference of two convex functions of z.

If P is continuous, with an argument similar to that in [3] we can show that Φ_k is twice differentiable. Then superlinear convergence can be established for quasi-Newton methods solving (8).

4 Approximation to the Linear Recourse Function

In this section, we show that ψ_k(z) approximates ψ(z) monotonically from below as k → ∞, and give an error bound for this approximation.

Theorem 2 Suppose that (4) is feasible for z = Tx − ξ. Then for any 0 < k < K, we have

    √(φ_k²(x, ξ) − ε_k) ≤ √(φ_K²(x, ξ) − ε_K) ≤ φ(x, ξ).
Proof. Let y_k and y_K be solutions of the minimum operations in the definition (14) of ψ_k(z) and ψ_K(z), respectively, and let y* be an optimal solution of (4). Then we have

    ψ_k(z) − ε_k = (q^T y_k)² + k‖Wy_k − z‖²
                 ≤ (q^T y_K)² + k‖Wy_K − z‖²      (by optimality of y_k)
                 ≤ (q^T y_K)² + K‖Wy_K − z‖²      (since k < K)
                 = ψ_K(z) − ε_K
                 ≤ (q^T y*)² + K‖Wy* − z‖²        (by optimality of y_K)
                 = (q^T y*)²                       (by feasibility of y*)
                 = φ²(x, ξ).                       (17)

The conclusion of the theorem follows by taking square roots (and noting that q^T y ≥ 0). □
Corollary 1 In Theorem 2, as k → ∞,

    ‖Wy_k − z‖ → 0,

where y_k is the unique solution of the minimum operation in the definition (14) of ψ_k(z).

Proof. By the chain of inequalities in the proof of Theorem 2,

    0 ≤ k‖Wy_k − z‖² ≤ φ²(x, ξ) − (q^T y_k)² ≤ φ²(x, ξ) < ∞.

The conclusion follows since φ²(x, ξ) is finite. □
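Theorem 2 and Corollary 1 can be checked numerically on a small instance. The sketch below (illustrative data and a home-made projected-gradient solver, assuming the notation above) computes ψ_k(z) − ε_k and ‖Wy_k − z‖ for increasing k; the values should increase toward ψ(z) while the residuals shrink.

```python
import numpy as np

def argmin_qr(z, W, q, k, iters=30000):
    """Approximate minimizer y_k of (q^T y)^2 + k*||W y - z||^2 over y >= 0
    (projected gradient descent on a convex quadratic; our own simple solver)."""
    H = 2.0 * (np.outer(q, q) + k * (W.T @ W))
    step = 1.0 / np.linalg.eigvalsh(H).max()
    y = np.zeros(W.shape[1])
    for _ in range(iters):
        grad = 2.0 * (q @ y) * q + 2.0 * k * (W.T @ (W @ y - z))
        y = np.maximum(y - step * grad, 0.0)
    return y

# Illustrative instance (assumed data): here psi(z) = phi(z)^2 = max(z1, z2)^2 = 1.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
q = np.ones(3)
z = np.array([1.0, 0.5])

vals, residuals = [], []
for k in (1.0, 10.0, 100.0):
    y = argmin_qr(z, W, q, k)
    vals.append((q @ y) ** 2 + k * np.linalg.norm(W @ y - z) ** 2)  # psi_k(z) - eps_k
    residuals.append(np.linalg.norm(W @ y - z))                     # ||W y_k - z||

print(vals)       # nondecreasing in k, bounded above by psi(z) = 1 (Theorem 2)
print(residuals)  # decreasing toward 0 (Corollary 1)
```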
The next theorem shows that, as k → ∞, φ_k(x, ξ) converges to φ(x, ξ), and gives an error bound.

Theorem 3 Suppose that (4) is feasible for z = Tx − ξ and that σ is the maximum value of the 2-norms of optimal dual solutions of (4). Then √(φ_k²(x, ξ) − ε_k) monotonically converges to φ(x, ξ) from below and, for k large enough,

    0 ≤ φ(x, ξ) − √(φ_k²(x, ξ) − ε_k) ≤ σ φ(x, ξ) / √k.      (18)

Proof. Let y_k and y* be the same as in Theorem 2 (with y* an optimal solution of (4)). Then y_k solves

    q^T y_k = min  q^T y
              s.t. Wy = Wy_k                                 (19)
                   y ≥ 0,

while y* solves

    φ(x, ξ) = min  q^T y
              s.t. Wy = z                                    (20)
                   y ≥ 0.

By Corollary 1, (19) can be regarded as a perturbation of (20) with a small change of Wy_k − z on the right-hand side of the constraint of the linear program (20). By the perturbation theory of linear programming, we have

    |q^T y_k − φ(x, ξ)| ≤ σ ‖Wy_k − z‖.

By the proof of Corollary 1,

    ‖Wy_k − z‖ ≤ q^T y* / √k = φ(x, ξ) / √k.

Since φ_k²(x, ξ) − ε_k = (q^T y_k)² + k‖Wy_k − z‖² ≥ (q^T y_k)² and, by Theorem 2, √(φ_k²(x, ξ) − ε_k) ≤ φ(x, ξ), we have

    q^T y_k ≤ √(φ_k²(x, ξ) − ε_k) ≤ φ(x, ξ).

Combining these inequalities gives (18); the monotone convergence follows from Theorem 2. □

Corollary 2 If ε_k → 0, then φ_k(x, ξ) converges to φ(x, ξ).

Proof. The conclusion follows from (18) and Theorem 2. □
Example. To see how these bounds appear in practice, consider the following simple example (where ε_k = 0 for illustration):

    φ(x, ξ) = min  y_1 + y_2 + y_3
              s.t. y_1 + y_3 = z_1                           (21)
                   y_2 + y_3 = z_2
                   y ≥ 0.

Figures 1 and 2 show the convergence of φ_k for z = (1, 0.5) and z = (1, 1). Note that the convergence is somewhat faster for z = (1, 1). In this case, the dual solution to (21) is not unique but has the same maximum norm, σ = 1, as for z = (1, 0.5). The convergence behavior for z = (1, 1) may benefit from a smaller norm for the average of the norms of dual solutions.
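A rough numerical check of the bound (18) on the example above (with ε_k = 0, z = (1, 0.5), σ = 1 and φ(z) = max(z_1, z_2) = 1); the inner minimization is solved by a simple projected-gradient loop of our own, so the computed gap is only approximate.

```python
import numpy as np

# Data of example (21) with eps_k = 0 and z = (1, 0.5); here sigma = 1 and
# phi(z) = max(z1, z2) = 1.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
q = np.ones(3)
z = np.array([1.0, 0.5])
phi_z, sigma = 1.0, 1.0

def sqrt_psi_k(k, iters=40000):
    """sqrt(psi_k(z) - eps_k) via projected gradient descent on the convex
    quadratic inner problem (our own simple solver, adequate for 3 variables)."""
    H = 2.0 * (np.outer(q, q) + k * (W.T @ W))
    step = 1.0 / np.linalg.eigvalsh(H).max()
    y = np.zeros(3)
    for _ in range(iters):
        grad = 2.0 * (q @ y) * q + 2.0 * k * (W.T @ (W @ y - z))
        y = np.maximum(y - step * grad, 0.0)
    return np.sqrt((q @ y) ** 2 + k * np.linalg.norm(W @ y - z) ** 2)

for k in (4.0, 16.0, 64.0, 256.0):
    gap = phi_z - sqrt_psi_k(k)           # phi(x, xi) - sqrt(phi_k^2 - eps_k)
    bound = sigma * phi_z / np.sqrt(k)    # right-hand side of (18)
    print(k, gap, bound)                  # 0 <= gap <= bound
```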
Theorem 3 also gives us the condition for the optimal solutions of (7) to converge to an optimal solution of (2). We use the following two results from the theory of epi-convergence ([1]). A sequence of functions {f^ν} epi-converges to f if the epigraphs of the f^ν converge as sets to the epigraph of f.

[Figure 1: Convergence of φ_k to φ for z = (1, 0.5).]
[Figure 2: Convergence of φ_k to φ for z = (1, 1).]

Theorem 4 ([4], Theorem 2.3.) Suppose a sequence of functions {g^ν, ν = 1, ...} epi-converges to g. Then lim sup_ν (inf g^ν) ≤ inf g and, if x^k ∈ argmin g^{ν_k} for some subsequence {g^{ν_k}} of functions from {g^ν} and x = lim_k x^k, then

    x ∈ argmin g   and   lim_k (inf g^{ν_k}) = inf g.        (22)
The next result gives conditions under which the sequence {φ_k} epi-converges to φ.

Theorem 5 ([1], Proposition 3.12.) Suppose a sequence of functions {g^ν, ν = 1, ...} converges pointwise everywhere to g, i.e., lim_ν g^ν(x) = g(x). If the g^ν are monotone increasing, or monotone decreasing, and g is lower semicontinuous, then g^ν epi-converges to g.
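A toy grid-based illustration of how Theorems 5 and 4 combine (our own example, not from the paper): the functions g^ν(x) = max(|x − 2| − 1, 0) + x²/ν decrease monotonically and converge pointwise to the lower semicontinuous g(x) = max(|x − 2| − 1, 0), so they epi-converge; minimizers of g^ν then accumulate at a minimizer of g, and inf g^ν → inf g = 0.

```python
import numpy as np

# g(x) = max(|x - 2| - 1, 0) attains its minimum value 0 on the interval [1, 3].
# g_nu(x) = g(x) + x^2/nu decreases pointwise to g as nu grows, so by
# Theorem 5 the g_nu epi-converge to g; Theorem 4 then says inf g_nu -> inf g
# and limit points of minimizers of g_nu minimize g.
x = np.linspace(-3.0, 5.0, 8001)              # grid with step 0.001
g = np.maximum(np.abs(x - 2.0) - 1.0, 0.0)

mins, argmins = [], []
for nu in (1, 10, 100, 1000):
    g_nu = g + x ** 2 / nu
    i = int(np.argmin(g_nu))
    mins.append(float(g_nu[i]))
    argmins.append(float(x[i]))

print(mins)      # decreasing toward inf g = 0
print(argmins)   # approaching 1.0, a minimizer of g (argmin g = [1, 3])
```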
The following theorem follows immediately from these observations.

Theorem 6 ([4], Theorem 2.7.) Suppose {g, g^ν, ν = 1, ...} form a collection of functions defined on Ξ, with values in R ∪ {∞}, such that, for all x, ξ → g(x, ξ) is measurable and, if P[ξ | g(x, ξ) < ∞] = 1, then ∫_Ξ g(x, ξ) P(dξ) < ∞. If g^ν(x, ξ) epi-converges to g(x, ξ) for all ξ ∈ Ξ, and all g^ν are absolutely bounded by integrable functions, then the expectation functionals ∫_Ξ g^ν(x, ξ) P(dξ) epi- and pointwise converge to ∫_Ξ g(x, ξ) P(dξ).
Now we can state our result for convergence of the optimal solutions.

Theorem 7 Suppose that ξ has finite second moments, and let {x^k, π^k} be a sequence of optimal primal-dual solution pairs to (8), where the multipliers π^k are associated with the linear constraints Ax = b. If {x^k, π^k} has a limit point (x*, π*), with ε_k → 0, then (x*, π*) is an optimal primal-dual solution pair of (1).
Proof. If ξ has second moments, then it can be shown easily that φ is absolutely integrable (see, for example, [8]) and that, if P[ξ | φ(x, ξ) < ∞] = 1, then ∫ φ(x, ξ) P(dξ) < ∞. The extensions of these results to φ_k follow, for example, from Theorem 3.

To invoke Theorem 6, let φ̂_k(x, ξ) = √(φ_k²(x, ξ) − ε_k) and define g^k(x, ξ) = φ̂_k(x, ξ) + δ_{Ax=b, x≥0}(x), where the indicator function δ_S(x) = 0 if x ∈ S and +∞ if x ∉ S. Using the bound in Theorem 3 and the result from Theorem 5, we then meet the conditions for Theorem 6. If we replace f_k in (8) with f̂_k, in which Φ_k is replaced by Φ̂_k = E(φ̂_k(x, ξ)), then the expectation functionals f̂_k converge. Using Theorem 4, we have that limit points of minimizers of f̂_k are optimal in (1).

Next, we must relate f̂_k to f_k. Note that the optimality conditions for f̂_k yield x̂^k and π̂^k such that

    c + ∇Φ̂_k(x̂^k) − A^T π̂^k = ρ̂^k ≥ 0,   (ρ̂^k)^T x̂^k = 0.    (23)

For f_k, the optimality conditions are:

    c + ∇Φ_k(x^k) − A^T π^k = ρ^k ≥ 0,   (ρ^k)^T x^k = 0.    (24)

Note that, for any x such that φ_k is finite for all ξ, ∇φ̂_k is continuous and, since ∇φ̂_k = (φ_k/φ̂_k) ∇φ_k,

    ∇Φ̂_k(x) = ∫ ∇φ̂_k(x, ξ) P(dξ)
             = ∫ (φ_k(x, ξ)/φ̂_k(x, ξ)) ∇φ_k(x, ξ) P(dξ)     (25)
             = λ_k ∇Φ_k(x),

where λ_k → 1 as k → ∞. For any limit point (x̄, π̄) of (x^k, π^k), we must have, from (24):

    c + ∇Φ(x̄) − A^T π̄ = ρ ≥ 0,                              (26)

where Φ(x) = lim_k Φ_k(x). Thus (x̄, π̄) forms an optimal primal-dual pair in (8) with Φ replacing Φ_k, and (x̄, π̄) must be optimal in (1) by the argument above. □
5 Conclusions

We have presented a quadratic approximation of the stochastic program with linear recourse. The approximation uses a linear-quadratic loss function similar to that used in decision theory. It also allows the use of superlinearly convergent methods that cannot be applied directly to the linear recourse problem. We showed that the approximation converges and that solutions of the approximation also converge to those of the original problem.

Our interest in this form of quadratic approximation extends beyond an advantage for fast computational methods. We believe that the approach may also enable further approximations using known moments of the relevant random variables.
References
[1] H. Attouch and R. J-B Wets, Approximation and convergence in nonlinear optimization, in: O.L. Mangasarian, R.R. Meyer and S.M. Robinson, eds., Nonlinear Programming 4, Academic Press, New York-London, 1981.
[2] J.R. Birge, X. Chen and L. Qi, A stochastic Newton method for solving stochastic quadratic programs with recourse, AMR 94/9, Applied Mathematics Report, University of New South Wales, 1994.
[3] J.R. Birge and L. Qi, Subdifferentials in approximation for stochastic programming, SIAM Journal on Optimization 5 (1995) 436-453.
[4] J.R. Birge and R. J-B. Wets, Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse, Mathematical Programming Study 27 (1986) 54-102.
[5] X. Chen, L. Qi, and R. Womersley, Newton's method for quadratic stochastic
programs with recourse, Journal of Computational and Applied Mathematics
forthcoming.

[6] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[7] H. Jiang and L. Qi, A globally and superlinearly convergent trust region algorithm for convex SC1 minimization problems and its application to stochastic
programs, AMR 94/7, Applied Mathematics Report, University of New South
Wales, 1994.
[8] P. Kall, Stochastic Linear Programming, Springer Verlag, Berlin, 1976.
[9] P. Kall and S.W. Wallace, Stochastic Programming, Wiley, New York, 1994.
[10] R. Mifflin, Semismooth and semiconvex functions in constrained optimization,
SIAM Journal on Control and Optimization 15 (1977) 957-972.
[11] J.-S. Pang and L. Qi, Nonsmooth equations: Motivation and algorithms, SIAM
Journal on Optimization 3 (1993) 443-465.
[12] J.-S. Pang and L. Qi, A globally convergent Newton method for convex SC¹ minimization problems, Journal of Optimization Theory and Applications, forthcoming.
[13] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.
[14] L. Qi, Superlinearly convergent approximate Newton methods for LC1 optimization problems, Mathematical Programming 64 (1994) 277-294.
[15] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical
Programming 58 (1993) 353-367.
[16] L. Qi and R.S. Womersley, An SQP algorithm for extended linear-quadratic problems in stochastic programming, Annals of Operations Research, forthcoming.
[17] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton,
N.J., 1970.
[18] R.T. Rockafellar and R.J.-B. Wets, A Lagrangian finite-generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Study 28 (1986) 63-93.