A VARIANT OF THE TOPKIS-VEINOTT METHOD FOR SOLVING INEQUALITY CONSTRAINED OPTIMIZATION PROBLEMS

J. R. BIRGE, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA
L. QI and Z. WEI, School of Mathematics, University of New South Wales, Sydney NSW 2052, Australia

Technical Report 97-11, June 18, 1997

A Variant of the Topkis-Veinott Method for Solving Inequality Constrained Optimization Problems^1

J. R. BIRGE, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109, USA
L. QI and Z. WEI, School of Mathematics, University of New South Wales, Sydney NSW 2052, Australia

June 18, 1997

Abstract. In this paper, we give a variant of the Topkis-Veinott method for solving inequality constrained optimization problems. This method uses a linearly constrained positive semi-definite quadratic problem to generate a feasible descent direction at each iteration. Under mild assumptions, the algorithm is shown to be globally convergent in the sense that every accumulation point of the sequence generated by the algorithm is a Fritz-John point of the problem. We introduce a Fritz-John (FJ) function, an FJ1 strong second-order sufficiency condition (FJ1-SSOSC) and an FJ2 strong second-order sufficiency condition (FJ2-SSOSC), and then show, without any constraint qualification (CQ), that (i) if an FJ point $z$ satisfies the FJ1-SSOSC, then there exists a neighborhood $N(z)$ of $z$ such that for any FJ point $y \in N(z) \setminus \{z\}$, $f_0(y) \ne f_0(z)$, where $f_0$ is the objective function of the problem; (ii) if an FJ point $z$ satisfies the FJ2-SSOSC, then $z$ is a strict local minimum of the problem. The result (i) implies that the entire iteration point sequence generated by the method converges to an FJ point. We also show that if the parameters are chosen large enough, a unit step length can be accepted by the proposed algorithm.

Key Words. Constrained optimization, feasibility, global convergence, unit step, stochastic programming.

AMS subject classification. 65K10, 90C30, 52A41.

^1 This material is based on work supported by the National Science Foundation under Award Numbers DDM-9215921 and DMI-9523275 and the Australian Research Council.

1. Introduction

Consider the optimization problem

    min { f_0(x) | x \in X },                                              (1.1)

where $X = \{x \in R^n \mid f_i(x) \le 0,\ i = 1, \ldots, m\}$ and $f_i : R^n \to R$, $i = 0, 1, \ldots, m$, are continuously differentiable. Assume that $X \ne \emptyset$. Denote $I = \{1, \ldots, m\}$ and $I^0 = I \cup \{0\}$. Let $x \in X$ be a given feasible point of (1.1). Denote $I(x) = \{ i \mid i \in I,\ f_i(x) = 0 \}$. Recall that $x$ is said to be a Fritz-John point of (1.1) if there is a vector $(\tau, u) \in R \times R^m$ such that $(\tau, u) \ge 0$, $(\tau, u) \ne 0$ and

    \tau \nabla f_0(x) + \sum_{i=1}^m u_i \nabla f_i(x) = 0;
    u_i f_i(x) = 0, for i \in I;                                           (1.2)
    f_i(x) \le 0, for i \in I.

We call such a pair $(\tau, u)$ an FJ multiplier of $x$, and denote the set of all possible FJ multipliers associated with $x$ by $M(x)$. If $\tau \ne 0$, then such an FJ point $x$ is also a Karush-Kuhn-Tucker (KKT) point of (1.1).

Given $x^1 \in X$, methods of feasible directions [1, 19, 24, 26, 35] for solving (1.1) construct a sequence $\{x^k\}_{k=1}^\infty$ such that, for all $k$,

    x^k \in X                                                              (1.3)

and

    f_0(x^{k+1}) < f_0(x^k).                                               (1.4)

These two conditions turn out to be important in their own right in many contexts, e.g., (i) when the objective function is not well defined outside the feasible set $X$ or (ii) in real-time applications, when it is crucial that a feasible solution is available at the next "stopping time" (see [22] for a discussion of optimization problems arising from design problems).

Some methods for solving (1.1), such as SQP type algorithms, stop at a KKT point of (1.1) [1, 5, 10, 13, 14, 18, 24, 25, 26, 27, 34]. In the following example, no KKT point exists, though the problem has a unique solution, a unique FJ point, and bounded level sets. Methods that only stop at KKT points may not work for this example.

Example 1.1 Consider the problem (P1):

    min  f_0(x) := (x_1)^2 + x_2 + (x_3)^2
    s.t. f_1(x) := x_1 + 5 (x_3)^2 \le 0,
         f_2(x) := -x_1 - (x_2)^5 \le 0.
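As a quick sanity check of the Fritz-John system (1.2) at the origin, the following sketch (our own code, not part of the paper; gradients hand-computed from the definitions of (P1)) verifies that every FJ multiplier of $x^* = (0, 0, 0)^T$ must have $\tau = 0$, so $x^*$ is an FJ point but not a KKT point:

```python
import numpy as np

# Gradients of (P1) at x* = (0, 0, 0), computed by hand from the definitions above.
g0 = np.array([0.0, 1.0, 0.0])   # grad f0(x) = (2*x1, 1, 2*x3)
g1 = np.array([1.0, 0.0, 0.0])   # grad f1(x) = (1, 0, 10*x3)
g2 = np.array([-1.0, 0.0, 0.0])  # grad f2(x) = (-1, -5*x2^4, 0)

# FJ system: tau*g0 + u1*g1 + u2*g2 = 0 with (tau, u1, u2) >= 0, not all zero.
# The second component forces tau = 0; the first forces u1 = u2.
tau, u = 0.0, np.array([1.0, 1.0])
assert np.allclose(tau * g0 + u[0] * g1 + u[1] * g2, 0.0)
# Any multiplier with tau > 0 leaves a residual tau in the second component,
# so no KKT multiplier exists: x* is a Fritz-John point but not a KKT point.
```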

It is not hard to verify the following properties:
(a1) (P1) has a unique solution $x^* = (0, 0, 0)^T$;
(b1) (P1) has a unique FJ point $x^* = (0, 0, 0)^T$;
(c1) (P1) does not have a KKT point;
(d1) for any $C_0 > 0$, the level set $L(C_0) = \{x \in R^3 \mid f_0(x) \le C_0,\ x \in X\}$ is bounded;
(e1) for each $x \in X \setminus \{x^*\}$, the linear independence constraint qualification (LICQ) holds.

The following example shows that even if (1.1) has a KKT point, the solution of (1.1) may be only an FJ point.

Example 1.2 Consider the following problem (P2):

    min  f_0(x) := (x_1)^2 + x_2
    s.t. f_1(x) := -x_1 - (x_2)^5 \le 0,
         f_2(x) := x_1 + (x_2)^2 + (x_3)^2 \le 0.

It is not hard to verify the following properties:
(a2) (P2) has a unique solution $x^* = (0, 0, 0)^T$;
(b2) $x^* = (0, 0, 0)^T$ is an FJ point of (P2), but it is not a KKT point of (P2);
(c2) (P2) has a KKT point $\bar x = (-1, 1, 0)^T$;
(d2) for any $C_0 > 0$, the level set $L(C_0) = \{x \in R^3 \mid f_0(x) \le C_0,\ x \in X\}$ is bounded.

From properties (a2) and (c2) of this example, even if the SQP type algorithms [1, 5, 10, 13, 14, 18, 24, 25, 26, 27, 34] work for (P2), they may not be able to find its solution.

Given an estimate $x \in X$ of the solution $x^*$ to the problem (1.1), Topkis and Veinott [35] generated a search direction using the following linear program:

    min  v
    s.t. \nabla f_0(x)^T d \le v,
         f_i(x) + \nabla f_i(x)^T d \le v,  i \in I,                       (1.5)
         -1 \le d_j \le 1,  j = 1, \ldots, n.

Like other feasible direction methods, once a feasible descent direction is obtained, a line search is needed for the Topkis-Veinott method. From [1] and [35], the Topkis-Veinott method works well for (P1) (also (P2)) though the SQP type methods mentioned above do not. However, the subproblem (1.5) is also a source of disadvantages associated with the Topkis-Veinott method. Due to linearity, (1.5) always produces extreme point solutions, and the directions obtained may therefore depend more on properties of the feasible set $X$ than on properties of the objective function $f_0$. Consequently, the convergence of the algorithm is, in general, slow and characterized by zigzagging.

The approach suggested in this paper is based on the above two examples and a modification of the subproblem (1.5) of the Topkis-Veinott method. In order to avoid nonuniqueness of the direction, the extreme point solutions of (1.5) and the slow convergence behaviour of the Topkis-Veinott method, we introduce a penalty term in the objective function of (1.5) and some weights in the constraints of (1.5). More precisely, we use the following positive semi-definite quadratic problem to generate a search direction $d$ at an estimate $x \in X$ of the solution $x^*$ to the problem (1.1):

    min  v + (1/2) d^T H(x) d
    s.t. \nabla f_0(x)^T d \le c_0 v,                                      (1.6)
         f_i(x) + \nabla f_i(x)^T d \le c_i v,  i \in I,

where $H(x)$ is a symmetric positive definite $n \times n$ matrix.
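For intuition, here is a minimal sketch of the direction-finding subproblem (1.6) in Python (our own helper names; the SLSQP solver is one convenient choice, not the paper's prescription). It minimizes $v + \frac{1}{2} d^T H d$ over $(d, v)$ subject to the weighted linearized constraints; the starting point $(d, v) = (0, 0)$ is feasible whenever $x \in X$.

```python
import numpy as np
from scipy.optimize import minimize

def search_direction(x, f, grads, H, c):
    """Solve (1.6): min v + 0.5 d'Hd  s.t.  grad f0(x)'d <= c0*v and
    f_i(x) + grad f_i(x)'d <= c_i*v (i = 1..m).  Variables z = (d, v);
    f[0] is the objective function, f[1:] the constraint functions."""
    n = len(x)
    fx = [fi(x) for fi in f]
    gx = [gi(x) for gi in grads]
    cons = [{'type': 'ineq',              # c0*v - grad f0'd >= 0
             'fun': lambda z, g=gx[0], c0=c[0]: c0 * z[n] - g @ z[:n]}]
    for i in range(1, len(f)):            # c_i*v - f_i(x) - grad f_i'd >= 0
        cons.append({'type': 'ineq',
                     'fun': lambda z, g=gx[i], fi=fx[i], ci=c[i]:
                            ci * z[n] - fi - g @ z[:n]})
    obj = lambda z: z[n] + 0.5 * z[:n] @ H @ z[:n]
    sol = minimize(obj, np.zeros(n + 1), constraints=cons, method='SLSQP')
    return sol.x[:n], sol.x[n]            # (d, v)
```

Because the objective is strictly convex in $d$, the returned direction is unique for a given $x$, in contrast with the extreme point solutions of (1.5).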

We present the algorithm in Section 2 and establish its global convergence in Section 3. It is known that the Topkis-Veinott method is globally convergent in the sense that every accumulation point of the iteration point sequence is an FJ point of the problem, without any CQ. However, two questions remain:

(I) Does the entire iteration point sequence converge to an FJ point of (1.1)?
(II) Is such an FJ point a strict local minimum of (1.1)?

Robinson [33] proved that if a KKT point satisfies the LICQ, the second-order sufficiency conditions (SOSC) and strict complementary slackness (SCS), then it is an isolated KKT point. This result has been widely used to prove the convergence of the entire iteration point sequence generated by algorithms for general nonlinear programming [33, 24, 25, 26, 27]. In 1982, Robinson [34] proved that if a KKT point satisfies the Mangasarian-Fromovitz constraint qualification (MFCQ) and the strong second-order sufficiency conditions (SSOSC), then it is an isolated KKT point. From this result and the fact that every FJ point of (1.1) is a KKT point if the MFCQ holds, we can easily deduce that if the MFCQ and the SSOSC hold, then the above question (I) has a positive answer. Recently, Qi and Wei [30] proved that the Robinson conditions can be reduced to the positive linear independent regular condition (PLIRC) and the SSOSC. For question (II), Qi [29] proved, for LC^1 optimization problems, that if a KKT point satisfies the SSOSC, then it is a strict local minimum.

In Section 4, we introduce an FJ function, an FJ1-SSOSC and an FJ2-SSOSC, and then prove, without any CQ, that (1) if an FJ point $z$ satisfies the FJ1-SSOSC, then there exists a neighborhood $N(z)$ of $z$ such that for any FJ point $y \in N(z) \setminus \{z\}$, $f_0(y) \ne f_0(z)$; (2) if an FJ point $z$ satisfies the FJ2-SSOSC, then it is a strict local minimum of (1.1). These results improve (a) the conditions for the convergence of the entire iteration point sequence generated by algorithms for general nonlinear programming [33, 24, 25, 26, 27] and (b) the conditions for a strict local minimum [29].

Consider the following two-stage stochastic programs with fixed recourse [2, 3, 4, 6, 7, 16, 31, 32, 36]:

    min  P(x) + \Phi(x)
    s.t. Ax \le b,

where $P$ is a twice continuously differentiable convex function from $X$ to $R$,

    \Phi(x) = \int \psi(\omega - Tx) \rho(\omega) d\omega,
    \psi(\omega - Tx) = \max \{ -(1/2) y^T H y + y^T (\omega - Tx) \mid Wy \le q \},

$H \in R^{m \times m}$ is a symmetric positive definite matrix, $c \in R^n$, $A \in R^{r \times n}$, $b \in R^r$, $T \in R^{m \times n}$, $q \in R^{m_1}$ and $W \in R^{m_1 \times m}$ are fixed matrices or vectors, $\omega \in R^m$ is a random vector and $\rho : R^m \to R_+$ is a continuously differentiable probability density function. Since it is impossible generally to demand the exact evaluation of the function $\Phi$ and its gradient, we always consider approximate problems of the form

    min  P(x) + \Phi_N(x)
    s.t. Ax \le b,                                                         (1.7)

where

    \Phi_N(x) = \sum_{i=1}^N \alpha_i \psi(\omega^i - Tx) \rho(\omega^i).  (1.8)

The weights $\{\alpha_i\}_{i=1}^N$ and points $\{\omega^i\}_{i=1}^N$ are generated by a form of multidimensional numerical integration rule.
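To make the cost of evaluating (1.8) concrete, the sketch below (our own code; data names $H$, $W$, $q$, $T$ as above, solver choice ours) computes $\psi$ by solving one small QP per sample point, so each evaluation of $\Phi_N$ costs $N$ QP solves:

```python
import numpy as np
from scipy.optimize import minimize

def psi(w, H, W, q):
    """Second-stage value: max -0.5 y'Hy + y'w  s.t.  Wy <= q,
    solved as the equivalent minimization of 0.5 y'Hy - w'y."""
    m = H.shape[0]
    obj = lambda y: 0.5 * y @ H @ y - w @ y
    cons = [{'type': 'ineq', 'fun': lambda y: q - W @ y}]
    sol = minimize(obj, np.zeros(m), constraints=cons, method='SLSQP')
    return -sol.fun

def phi_N(x, samples, weights, T, H, W, q, rho):
    """Quadrature approximation (1.8) of the recourse term: rho is the
    density, (weights, samples) come from the integration rule."""
    return sum(a * psi(w - T @ x, H, W, q) * rho(w)
               for a, w in zip(weights, samples))
```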

For the above problem (1.7), the calculation of the value of the objective function $P(x) + \Phi_N(x)$ at $x^k$ requires solving the $N$ quadratic programs ($i = 1, \ldots, N$)

    \psi(\omega^i - Tx^k) = \max \{ -(1/2) y^T H y + y^T (\omega^i - Tx^k) \mid Wy \le q \}.

This implies that the line search procedure of finding the smallest $j$ with

    f_0(x^k + \rho^j d^k) - f_0(x^k) \le \alpha \rho^j \nabla f_0(x^k)^T d^k

is very costly, where $f_0(x) = P(x) + \Phi_N(x)$. It is, therefore, an interesting issue to avoid a line search for such problems. In Section 5, we prove that the unit step can be accepted by the proposed algorithm if the parameters $c_i$ ($i \in I^0$) are chosen large enough, under the assumption that $\nabla f_i$ ($i \in I^0$) are globally Lipschitz on $R^n$. This may be useful for solving problems in which $f_i(x)$ and $\nabla f_i(x)$, $i \in I^0$, are extremely difficult to calculate at each point $x$.

Throughout the paper, we denote the Euclidean norm of a vector $w$ by $\|w\|$ and supp$(w) = \{ i \mid w_i \ne 0 \}$.

2. The algorithm

Algorithm 1

(Initialization) Let $\alpha \in (0, 1)$, $\rho \in (0, 1)$, $c_i^1 > 0$ ($i \in I^0$); choose $x^1 \in X$ and a symmetric positive definite matrix $H_1 \in R^{n \times n}$; set $k = 1$.

Step 1: Solve the following subproblem:

    min  v + (1/2) d^T H_k d
    s.t. \nabla f_0(x^k)^T d \le c_0^k v,                                  (2.1)
         f_i(x^k) + \nabla f_i(x^k)^T d \le c_i^k v,  i \in I.

Let $((d^k)^T, v_k)^T$ be one of its solutions.

Step 2: If $((d^k)^T, v_k)^T = 0$, then $x^k$ is a Fritz-John (FJ) point of (1.1); stop.

Step 3: Let $j_k$ be the smallest nonnegative integer $j$ such that

    f_0(x^k + \rho^j d^k) - f_0(x^k) \le \alpha \rho^j \nabla f_0(x^k)^T d^k    (2.2)

and

    f_i(x^k + \rho^j d^k) \le 0,  i \in I.                                 (2.3)

Set $t_k = \rho^{j_k}$.

Step 4: Generate $c_i^{k+1} > 0$ for $i \in I^0$. Compute a new symmetric positive definite approximation matrix $H_{k+1}$ to the Hessian of the Lagrangian. Set $x^{k+1} = x^k + t_k d^k$, $k := k + 1$ and return to Step 1.
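A compact sketch of the whole iteration follows (our own code; it reuses the `search_direction` helper sketched in Section 1, takes $H_k = I$ for all $k$ and keeps the weights $c_i^k$ constant, which are simplifications of Steps 1 and 4):

```python
import numpy as np

def algorithm1(x, f, grads, c, alpha=0.1, rho=0.5, tol=1e-8, iters=100):
    """Algorithm 1 with H_k = I and constant weights c (a simplification).
    f[0] / grads[0] are the objective and its gradient; f[1:] / grads[1:]
    the constraints.  x must be feasible (x in X)."""
    n = len(x)
    H = np.eye(n)
    for _ in range(iters):
        d, v = search_direction(x, f, grads, H, c)   # Step 1: solve (2.1)
        if max(np.linalg.norm(d), abs(v)) < tol:     # Step 2: FJ point reached
            return x
        g0d = grads[0](x) @ d
        t = 1.0                                      # Step 3: Armijo + feasibility
        while (f[0](x + t * d) - f[0](x) > alpha * t * g0d
               or any(fi(x + t * d) > 0 for fi in f[1:])):
            t *= rho                                 # Proposition 2.1 below
        x = x + t * d                                # guarantees termination
    return x                                         # Step 4 (simplified)
```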

Remark 2.1 There are many methods to solve (2.1). One may use the dual quadratic programming method of Kiwiel [17] or the interior point algorithms for convex quadratic programming of Daya and Shetty [9], Hertog, Roos and Terlaky [15], Goldfarb and Liu [12], Monteiro and Adler [20], Monteiro, Adler and Resende [21], and Ye and Tse [38]. In fact, if we let

    P_k = (\nabla f_0(x^k), \nabla f_1(x^k), \ldots, \nabla f_m(x^k))^T  and  a_k = (0, -f_1(x^k), \ldots, -f_m(x^k))^T,

then the dual problem of (2.1) is

    min  (1/2) y^T (P_k H_k^{-1} P_k^T) y + (a_k)^T y
    s.t. \sum_{i=0}^m c_i^k y_i = 1,                                       (2.4)
         y \ge 0,

where $y \in R^{m+1}$. It is clear that (2.4) is a standard-form positive semi-definite convex quadratic programming problem. Hence, we can use the interior point algorithms mentioned above to solve (2.4). Suppose that $y^k$ is one of the solutions of (2.4). Then $d^k = -H_k^{-1} P_k^T y^k$ is a solution of (2.1). Notice that (2.4) is also a least-squares problem. One can also use other methods, such as the methods given in [11], to solve (2.4).
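The following sketch (our own code, following the reconstruction of (2.4) above) assembles the dual data and recovers the primal direction from a dual solution; any of the QP solvers cited in Remark 2.1 could be applied to the pair $(Q, a_k)$:

```python
import numpy as np

def dual_data(x, f, grads, H):
    """Assemble P_k, a_k and Q = P_k H^{-1} P_k' of the dual problem (2.4).
    The dual is: min 0.5 y'Qy + a'y  s.t.  sum_i c_i^k y_i = 1, y >= 0."""
    P = np.array([g(x) for g in grads])              # rows: grad f_0, ..., grad f_m
    a = np.array([0.0] + [-fi(x) for fi in f[1:]])   # (0, -f_1(x), ..., -f_m(x))
    Q = P @ np.linalg.solve(H, P.T)                  # positive semi-definite
    return P, a, Q

def primal_from_dual(y, P, H):
    """Recover the search direction d = -H^{-1} P' y from a dual solution y."""
    return -np.linalg.solve(H, P.T @ y)
```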

The following proposition indicates that Algorithm 1 is well defined.

Proposition 2.1 For any given $k$, if $((d^k)^T, v_k)^T = 0$, then $x^k$ is an FJ point of (1.1). If $((d^k)^T, v_k)^T \ne 0$ and $x^k \in X$, then there exists $T_k > 0$ such that for any $t \in [0, T_k]$,

    f_0(x^k + t d^k) - f_0(x^k) \le \alpha t \nabla f_0(x^k)^T d^k

and

    f_i(x^k + t d^k) \le 0,  i \in I.

Proof: Since $((d^k)^T, v_k)^T$ is a solution of (2.1), we have the following Fritz-John conditions (Theorem 4.2.6 of [1]):

    \bar u_{-1}^k H_k d^k + \sum_{i=0}^m \bar u_i^k \nabla f_i(x^k) = 0;
    \sum_{i=0}^m c_i^k \bar u_i^k = \bar u_{-1}^k;
    \bar u_0^k (\nabla f_0(x^k)^T d^k - c_0^k v_k) = 0;
    \bar u_i^k (f_i(x^k) + \nabla f_i(x^k)^T d^k - c_i^k v_k) = 0, for i \in I;    (2.5)
    \nabla f_0(x^k)^T d^k \le c_0^k v_k;
    f_i(x^k) + \nabla f_i(x^k)^T d^k \le c_i^k v_k, for i \in I;

where $\bar u_{-1}^k \ge 0$ and $\bar u_i^k \ge 0$ are some multipliers with $(\bar u_{-1}^k, \bar u^k) \ne 0$. From the second equality of (2.5), $\bar u_{-1}^k \ne 0$. Hence, if we let $u_i^k = \bar u_i^k / \bar u_{-1}^k$, then the following Karush-Kuhn-Tucker (KKT) conditions for (2.1) hold:

    H_k d^k + \sum_{i=0}^m u_i^k \nabla f_i(x^k) = 0;
    \sum_{i=0}^m c_i^k u_i^k = 1;
    u_0^k (\nabla f_0(x^k)^T d^k - c_0^k v_k) = 0;
    u_i^k (f_i(x^k) + \nabla f_i(x^k)^T d^k - c_i^k v_k) = 0, for i \in I;         (2.6)
    \nabla f_0(x^k)^T d^k \le c_0^k v_k;
    f_i(x^k) + \nabla f_i(x^k)^T d^k \le c_i^k v_k, for i \in I.

If $((d^k)^T, v_k)^T = 0$, then the above KKT conditions imply that there exist some multipliers $u_i^k \ge 0$ such that

    \sum_{i=0}^m u_i^k \nabla f_i(x^k) = 0;
    u_i^k f_i(x^k) = 0, for i \in I;
    f_i(x^k) \le 0, for i \in I;

therefore, $x^k$ is an FJ point of (1.1), and the proof of the first part of this proposition is completed.

We now prove the second part of this proposition. From Step 1, since $(d^T, v)^T = (0, 0)^T$ is a feasible solution of (2.1), the optimal value of (2.1) is nonpositive, and we have

    \nabla f_0(x^k)^T d^k \le c_0^k v_k \le -(c_0^k / 2) (d^k)^T H_k d^k,          (2.7)

which implies that $d^k = 0$ if and only if $v_k = 0$. If $((d^k)^T, v_k)^T \ne 0$, then $d^k \ne 0$ and $v_k \le -(1/2) (d^k)^T H_k d^k < 0$. Therefore,

    \nabla f_0(x^k)^T d^k \le -(c_0^k / 2) (d^k)^T H_k d^k < 0                     (2.8)

and

    f_i(x^k) + \nabla f_i(x^k)^T d^k \le -(c_i^k / 2) (d^k)^T H_k d^k < 0, for i \in I.    (2.9)

If we let

    q_k(t) = f_0(x^k + t d^k) - f_0(x^k) - \alpha t \nabla f_0(x^k)^T d^k,

then we obtain, from (2.8), that

    q_k(t) = t \nabla f_0(x^k)^T d^k + o(t) - \alpha t \nabla f_0(x^k)^T d^k
           = (1 - \alpha) t \nabla f_0(x^k)^T d^k + o(t)
           \le -((1 - \alpha) c_0^k / 2) ((d^k)^T H_k d^k) t + o(t),

which implies that there exists $\tau_k^0 > 0$ such that for any $t \in [0, \tau_k^0]$,

    q_k(t) \le 0.                                                                  (2.10)

Similarly, for $i \in I_k = \{ i : f_i(x^k) = 0,\ i \in I \}$, from (2.9) and

    f_i(x^k + t d^k) = f_i(x^k) + t \nabla f_i(x^k)^T d^k + o(t)
                     = t \nabla f_i(x^k)^T d^k + o(t)
                     \le -((c_i^k / 2) (d^k)^T H_k d^k) t + o(t),

we have $\tau_k^i > 0$ such that for any $t \in [0, \tau_k^i]$,

    f_i(x^k + t d^k) \le 0.                                                        (2.11)

For $i \in I \setminus I_k$, from the continuity of $f_i$ and $f_i(x^k) < 0$, we have $\tau_k^i > 0$ such that for any $t \in [0, \tau_k^i]$,

    f_i(x^k + t d^k) \le 0.                                                        (2.12)

Let $T_k = \min \{ \tau_k^i \mid i \in I^0 \}$; the conclusion follows from (2.10), (2.11) and (2.12). Q.E.D.

Remark 2.2 It is worth noting that, for any given FJ point $x$ of (1.1), if the MFCQ holds, i.e.,

(H) there is a vector $z \in R^n$ such that $\nabla f_i(x)^T z < 0$ for $i \in I(x)$,

then such an FJ point $x$ is also a KKT point of (1.1). Furthermore, if $f_i$, $i \in I^0$, are convex, then the KKT point is also an optimal point of (1.1). In this paper, we do not assume that (H) holds. Therefore, Algorithm 1 generates only an FJ point of (1.1).

3. Global convergence

Throughout the sequel, the following conditions are assumed to hold.

(A1) For $i \in I^0$,

    0 < \liminf_{k \to \infty} c_i^k \le \limsup_{k \to \infty} c_i^k < +\infty;

(A2) the Hessian estimates $\{H_k\}_{k=1}^\infty$ are bounded, i.e., there exists a scalar $C_1 > 0$ such that for all $k$,

    \|H_k\| \le C_1;                                                               (3.1)

(A3) there exists a scalar $C_2 > 0$ such that, for all $k$, the Hessian estimates satisfy

    d^T H_k d \ge C_2 \|d\|^2, for any d \in R^n.                                  (3.2)

Theorem 3.1 Assume that (A1)-(A3) hold. Then Algorithm 1 described in Section 2 either stops at an FJ point or generates a sequence $\{x^k\}_{k=1}^\infty$ for which each accumulation point is an FJ point of (1.1).

Proof: The first statement is obvious by using Proposition 2.1. Thus, suppose that there exists an infinite index set $K$ such that $\{x^k\}_{k \in K} \to x^*$. From (2.7), we have for $k \in K$,

    -\|\nabla f_0(x^k)\| \|d^k\| \le c_0^k v_k \le -(C_2 c_0^k / 2) \|d^k\|^2,

which implies that for $k \in K$,

    \|d^k\| \le 2 \|\nabla f_0(x^k)\| / (C_2 \liminf_{k \in K} c_0^k)              (3.3)

and

    |v_k| \le 2 \|\nabla f_0(x^k)\|^2 / (C_2 (\liminf_{k \in K} c_0^k)^2).         (3.4)

The second equality of the KKT conditions (2.6) for (2.1), (A1), (3.3) and (3.4) imply that we may assume, without loss of generality, that

    \lim_{k \in K, k \to \infty} d^k = d^*;   \lim_{k \in K, k \to \infty} v_k = v^*;
    \lim_{k \in K, k \to \infty} u^k = u^*;   \lim_{k \in K, k \to \infty} c_i^k = c_i^*, for i \in I^0.

It is clear that $d^* = 0$ if and only if $v^* = 0$. We assume by contradiction that $x^*$ is not an FJ point of (1.1). From the KKT conditions (2.6) for (2.1), we can deduce that $d^* \ne 0$. On the other hand, from (2.8) and (2.9), we have

    \nabla f_0(x^*)^T d^* \le -(C_2 c_0^* / 2) \|d^*\|^2 < 0

and

    f_i(x^*) + \nabla f_i(x^*)^T d^* \le -(C_2 c_i^* / 2) \|d^*\|^2 < 0, for i \in I.

Therefore, there exists $\delta > 0$ such that for all $k \in K$, $k$ large enough,

    \nabla f_0(x^k)^T d^k \le -\delta,                                             (3.5)
    \nabla f_i(x^k)^T d^k \le -\delta, for i \in I(x^*),                           (3.6)

and

    f_i(x^k) \le -\delta, for i \in I \setminus I(x^*).                            (3.7)

Similar to the proof of Proposition 2.1, using (3.5), (3.6) and (3.7), we can prove that there exists $\tau^* > 0$ such that for $k \in K$, $k$ large enough,

    t_k \ge \tau^* > 0.                                                            (3.8)

From (2.2), (3.5) and (3.8), we have, for $k \in K$, $k$ large enough,

    f_0(x^{k+1}) - f_0(x^k) \le -\alpha \tau^* \delta,

which contradicts the fact that $f_0(x^{k+1}) - f_0(x^k) \to 0$ as $k \to \infty$. The proof is completed. Q.E.D.

The following proposition shows that the existence of an accumulation point of the sequence generated by Algorithm 1 induces some regularity properties on this sequence.

Proposition 3.1 Assume that (A1)-(A3) hold. Suppose that the sequence $\{x^k\}_{k=1}^\infty$ generated by Algorithm 1 has an accumulation point. Then

    \lim_{k \to \infty} \|x^{k+1} - x^k\| = 0.                                     (3.9)

Proof: Since $f_0(x^k)$ is monotonically decreasing, existence of an accumulation point of $\{x^k\}_{k=1}^\infty$ and continuity of $f_0$ imply that the sequence $\{f_0(x^k)\}_{k=1}^\infty$ is convergent. From

    f_0(x^k + t_k d^k) - f_0(x^k) \le \alpha t_k \nabla f_0(x^k)^T d^k,
    \nabla f_0(x^k)^T d^k \le -(C_2 c_0^k / 2) \|d^k\|^2

and (A1), we have

    \lim_{k \to \infty} t_k \|d^k\|^2 = 0,

which implies, since $t_k \le 1$, that

    \lim_{k \to \infty} t_k \|d^k\| = 0.                                           (3.10)

Since $\|x^{k+1} - x^k\| \le t_k \|d^k\|$, the claim follows from (3.10). Q.E.D.

Proposition 3.2 Let $Z$ denote the set of all possible accumulation points of $\{x^k\}_{k=1}^\infty$. If $Z \ne \emptyset$, then either $Z$ is a singleton or $Z$ is a connected set.

Proof: The claim follows from (3.9) and Remark 14.1.1 in [23]. Q.E.D.

4. Local structure of FJ points and the convergence of the entire iteration point sequence

In order to study the convergence of the sequence $\{x^k\}_{k=1}^\infty$, it suffices to show that $Z$ is a singleton. In doing so, we need some strong regularity assumptions on the functions involved in problem (1.1). For any $x \in R^n$ and any $(\tau, u) \in R \times R^m$, denote by $FJ(x, \tau, u)$ the FJ function

    FJ(x, \tau, u) = \tau f_0(x) + \sum_{i=1}^m u_i f_i(x).

Let

    F_{\tau,u}(x) = \nabla_x FJ(x, \tau, u) = \tau \nabla f_0(x) + \sum_{i=1}^m u_i \nabla f_i(x).

In what follows, in addition to (A1)-(A3), we assume that the following condition holds:

(A4) For each $i \in I^0$, $\nabla f_i$ is locally Lipschitzian, i.e., $f_i$ is an LC^1 function.

In general, assume that $F : R^n \to R^m$ is locally Lipschitzian. By Rademacher's theorem, $F$ is differentiable almost everywhere. Let $D_F$ be the set where $F$ is differentiable. Then the generalized Jacobian of $F$ at $x$ in the sense of Clarke [8] is

    \partial F(x) = co \{ \lim_{x^k \in D_F,\ x^k \to x} \nabla F(x^k) \}.         (4.1)

In [28], the author introduced

    \partial_B F(x) = \{ \lim_{x^k \in D_F,\ x^k \to x} \nabla F(x^k) \}.

$F$ is said to be semismooth at $x \in R^n$ if $F$ is Lipschitz continuous in an open neighbourhood of $x$ and the limit

    \lim_{V \in \partial F(x + t d'),\ d' \to d,\ t \downarrow 0} V d'

exists for any vector $d \in R^n$.
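As a small illustration of the FJ function introduced above (our own code; (P2) data from Section 1, gradients hand-computed): at the FJ point $x^* = 0$ of (P2) with multiplier $(\tau, u) = (0, 1, 1)$ we indeed get $F_{\tau,u}(x^*) = 0$.

```python
import numpy as np

# Gradients for (P2), hand-computed from the definitions in Example 1.2.
grad_f0 = lambda x: np.array([2 * x[0], 1.0, 0.0])
grad_f1 = lambda x: np.array([-1.0, -5 * x[1] ** 4, 0.0])
grad_f2 = lambda x: np.array([1.0, 2 * x[1], 2 * x[2]])

def F(x, tau, u):
    """F_{tau,u}(x) = tau * grad f0(x) + sum_i u_i * grad f_i(x)."""
    return tau * grad_f0(x) + u[0] * grad_f1(x) + u[1] * grad_f2(x)

x_star = np.zeros(3)
assert np.allclose(F(x_star, 0.0, [1.0, 1.0]), 0.0)  # (0, 1, 1) is an FJ multiplier
```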

Definition 4.1 Let $z$ be an FJ point of (1.1). We say that the point $z$ satisfies the FJ1 strong second-order sufficiency condition (FJ1-SSOSC) if for any FJ multiplier $(\tau, u) \in M(z)$, all $V \in \partial F_{\tau,u}(z)$ are positive definite on the following set:

    G_1(z, \tau, u) = \{ d \in R^n \mid \nabla f_0(z)^T d = 0,
                         \nabla f_i(z)^T d = 0 for i \in supp(u),
                         \nabla f_i(z)^T d \le 0 for i \in I(z) \setminus supp(u) \}.

Definition 4.2 Let $z$ be an FJ point of (1.1). We say that the point $z$ satisfies the FJ2 strong second-order sufficiency condition (FJ2-SSOSC) if for any FJ multiplier $(\tau, u) \in M(z)$, all $V \in \partial F_{\tau,u}(z)$ are positive definite on the following set:

    G_2(z, \tau, u) = \{ d \in R^n \mid \nabla f_0(z)^T d \le 0, \tau \nabla f_0(z)^T d = 0,
                         \nabla f_i(z)^T d = 0 for i \in supp(u),
                         \nabla f_i(z)^T d \le 0 for i \in I(z) \setminus supp(u) \}.

It is clear that if an FJ point $z$ satisfies the FJ2-SSOSC, then $z$ satisfies the FJ1-SSOSC. If $\tau \ne 0$, then $G_1 = G_2$, and the FJ1-SSOSC and the FJ2-SSOSC become the SSOSC introduced by Robinson [34].

The following second-order mean value theorem for LC^1 functions can be found in [29].

Lemma 4.1 Suppose that $f : W \to R$ is an LC^1 function on $W$, where $W$ is an open subset of $R^n$. Let $P = \nabla f$. Then for any $x, y \in W$, there exist $t \in [0, 1]$ and $V \in \partial P(x + t(y - x))$ such that

    f(y) - f(x) - \nabla f(x)^T (y - x) = (1/2) (y - x)^T V (y - x).

Theorem 4.1 Suppose that $z$ is an FJ point of (1.1). If $z$ satisfies the FJ1-SSOSC, then there exists a neighborhood $N(z)$ of $z$ such that for any FJ point $y \in N(z) \setminus \{z\}$,

    f_0(z) \ne f_0(y).                                                             (4.2)

Proof: Suppose, by contradiction, that such a neighborhood $N(z)$ of $z$ does not exist. Then there exists an FJ point sequence $\{z^k\}_{k=1}^\infty$ such that $z^k \ne z$,

    \lim_{k \to \infty} z^k = z                                                    (4.3)

and

    f_0(z^k) = f_0(z).                                                             (4.4)

For any given $k$, since $z^k$ is an FJ point of (1.1), we have some $(\bar\tau^k, \bar u^k) \in M(z^k)$. Let $(\tau^k, u^k) = (\bar\tau^k, \bar u^k) / \|(\bar\tau^k, \bar u^k)\|$. Then $(\tau^k, u^k) \in M(z^k)$ and $\|(\tau^k, u^k)\| = 1$. We may assume, without loss of generality, that

    \lim_{k \to \infty} (\tau^k, u^k) = (\tau, u).                                 (4.5)

From the FJ conditions at $z^k$ and $(\tau^k, u^k) \in M(z^k)$, we have, for all $k$,

    \tau^k \nabla f_0(z^k) + \sum_{i \in I} u_i^k \nabla f_i(z^k) = 0;
    u_i^k f_i(z^k) = 0, for i \in I;
    f_i(z^k) \le 0, for i \in I;

which, combined with (4.3) and (4.5), implies

    \tau \nabla f_0(z) + \sum_{i \in I} u_i \nabla f_i(z) = 0;
    u_i f_i(z) = 0, for i \in I;
    f_i(z) \le 0, for i \in I.

Hence, $(\tau, u) \in M(z)$ and

    F_{\tau,u}(z) = 0.                                                             (4.6)

Since $z^k$ is an FJ point of (1.1), we have $f_i(z^k) \le 0$ for $i \in I$, which, combined with (4.4) and the facts that $u_i f_i(z) = 0$ and $(\tau, u) \ge 0$, implies, for all $k$,

    FJ(z, \tau, u) = \tau f_0(z) + \sum_{i \in I} u_i f_i(z)
                   \ge \tau f_0(z^k) + \sum_{i \in I} u_i f_i(z^k) = FJ(z^k, \tau, u).    (4.7)

By Lemma 4.1, we have

    FJ(z^k, \tau, u) = FJ(z, \tau, u) + F_{\tau,u}(z)^T (z^k - z) + (1/2) (z^k - z)^T V_k (z^k - z),    (4.8)

where $V_k \in \partial F_{\tau,u}(z + t_k (z^k - z))$, $0 \le t_k \le 1$. By (4.6), (4.7) and (4.8), we have, for all $k$,

    (z^k - z)^T V_k (z^k - z) \le 0.                                               (4.9)

By (4.1) and the Caratheodory theorem, $V_k$ can be approximated by a convex combination of $\nabla F_{\tau,u}(z^{j,k})$, where the $z^{j,k} \in D_{F_{\tau,u}}$ are as close to $z + t_k (z^k - z)$ as we wish, $j = 0, \ldots, n$. Thus, we may choose $z^{j,k} \in D_{F_{\tau,u}}$ with $z^{j,k} \to z$ as $k \to \infty$, for $j = 0, \ldots, n$, such that

    \| V_k - \sum_{j=0}^n \lambda_j^k \nabla F_{\tau,u}(z^{j,k}) \| \le 1/k,

where $\sum_{j=0}^n \lambda_j^k = 1$ and $\lambda_j^k \ge 0$ for all $k$ and $j = 0, 1, \ldots, n$. By (4.9), we have

    \sum_{j=0}^n \lambda_j^k \frac{(z^k - z)^T}{\|z^k - z\|} \nabla F_{\tau,u}(z^{j,k}) \frac{(z^k - z)}{\|z^k - z\|} \le \frac{1}{k}.    (4.10)

Since $F_{\tau,u}$ is locally Lipschitzian, $\nabla F_{\tau,u}$ is locally bounded on $D_{F_{\tau,u}}$. Without loss of generality, by passing to a subsequence, we may assume that $\nabla F_{\tau,u}(z^{j,k}) \to V^j$ with $V^j \in \partial_B F_{\tau,u}(z)$ and

$\lambda_j^k \to \lambda_j$ as $k \to \infty$. Then $\sum_{j=0}^n \lambda_j = 1$, $\lambda_j \ge 0$. Since $\|(z^k - z)/\|z^k - z\|\| = 1$, we may also assume that $d = \lim_{k \to \infty} (z^k - z)/\|z^k - z\|$. With this consideration, letting $k \to \infty$ in (4.10), we have

    \sum_{j=0}^n \lambda_j d^T V^j d \le 0.                                        (4.11)

On the other hand, for any $i \in supp(u)$, since $u_i^k \to u_i$, there is $\bar k_i$ such that for all $k \ge \bar k_i$, $u_i^k > 0$. Hence, for all $k \ge \bar k_i$, $i \in supp(u^k)$. Letting $\bar k = \max \{\bar k_i \mid i \in supp(u)\}$, we have, for all $k \ge \bar k$, $supp(u) \subseteq supp(u^k)$, and hence, for $k \ge \bar k$, that

    f_i(z^k) = f_i(z) = 0, for i \in supp(u).                                      (4.12)

By Taylor's theorem, we have

    0 = f_i(z^k) - f_i(z) = \nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|).

Dividing the above equality by $\|z^k - z\|$ and letting $k \to \infty$, we have

    \nabla f_i(z)^T d = 0, for i \in supp(u).                                      (4.13)

Similarly, from (4.4), we have

    \nabla f_0(z)^T d = 0.                                                         (4.14)

For $i \in I(z)$, $f_i(z) = 0$. Thus, from

    0 \ge f_i(z^k) = f_i(z) + \nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|)
                   = \nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|),

i.e., $\nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|) \le 0$, we have, for $i \in I(z)$,

    \nabla f_i(z)^T d \le 0.                                                       (4.15)

The relations (4.11), (4.13), (4.14) and (4.15) contradict the assumption that $z$ satisfies the FJ1-SSOSC. Hence the claim holds. Q.E.D.

Corollary 4.1 Assume that $\{x^k\}$ is generated by Algorithm 1 and $Z \ne \emptyset$, $x^* \in Z$. If $x^*$ satisfies the FJ1-SSOSC, then $Z$ is a singleton and

    \lim_{k \to \infty} x^k = x^*.                                                 (4.16)

Proof: Suppose that $Z$ is not a singleton. From Proposition 3.2, we have a sequence $\{y^l\} \subseteq Z$ such that $y^l \ne x^*$ and $\lim_{l \to \infty} y^l = x^*$. Since for all $k$, $f_0(x^{k+1}) < f_0(x^k)$, and there is a set $K$ such that $\lim_{k \in K} x^k = x^*$, we have $f_0(x^k) \to f_0(x^*)$, which implies that for all $l$, $f_0(y^l) = f_0(x^*)$. Since every $y^l$ is an FJ point by Theorem 3.1, this contradicts Theorem 4.1. Hence $Z$ is a singleton and (4.16) holds. Q.E.D.

We now show that the FJ2-SSOSC implies a strict local minimum of (1.1).

Theorem 4.2 Assume that an FJ point $z$ of (1.1) satisfies the FJ2-SSOSC. Then $z$ is a strict local minimum of (1.1).

Proof: Suppose not. Then there is a sequence $\{z^k\}_{k=1}^\infty$ such that $z^k \to z$, $z^k \ne z$, $f_i(z^k) \le 0$ for $i \in I$ and

    f_0(z^k) - f_0(z) \le 0.                                                       (4.17)

Without loss of generality, we may assume that

    \lim_{k \to \infty} (z^k - z)/\|z^k - z\| = d.

Let $(\tau, u) \in M(z)$. Then $F_{\tau,u}(z) = 0$. Similar to the discussion of (4.6)-(4.11), we have

    \sum_{j=0}^n \lambda_j d^T V^j d \le 0.                                        (4.18)

Here the $V^j$ have the same meaning as in the proof of Theorem 4.1. For $i \in I$,

    0 \ge f_i(z^k) = f_i(z) + \nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|).

For $i \in I(z)$, $f_i(z) = 0$. Thus, we have

    \nabla f_i(z)^T (z^k - z) + o(\|z^k - z\|) \le 0.

Dividing the above expression by $\|z^k - z\|$ and letting $k \to \infty$, we have

    \nabla f_i(z)^T d \le 0,                                                       (4.19)

for $i \in I(z)$. From (4.17), we have

    \nabla f_0(z)^T d \le 0.                                                       (4.20)

On the other hand, from the FJ conditions, we have

    \tau \nabla f_0(z) + \sum_{i \in supp(u)} u_i \nabla f_i(z) = 0,

which, combined with (4.19), (4.20) and the fact that $supp(u) \subseteq I(z)$, yields

    \nabla f_i(z)^T d = 0, for i \in supp(u),                                      (4.21)

and

    \tau \nabla f_0(z)^T d = 0.                                                    (4.22)

The relations (4.18)-(4.22) contradict the assumption that $z$ satisfies the FJ2-SSOSC. Hence, the claim holds. Q.E.D.

By using Theorem 3.1, Corollary 4.1 and Theorem 4.2, we have the following result.

Corollary 4.2 Suppose that (A1)-(A4) hold and $\{x^k\}_{k=1}^\infty$ is generated by Algorithm 1. If $\{x^k\}_{k=1}^\infty$ has an accumulation point $x^*$ satisfying the FJ2-SSOSC, then $x^*$ is a strict local minimum of (1.1) and $x^k \to x^*$.

Example 4.1 Let $\mathcal{N} = \{1, 2, \ldots\}$ and $p \in \mathcal{N}$. Consider the following example (P3) with parameter $p$:

    min  f_0(x) := (x_1)^2 + x_2
    s.t. f_1(x) := -x_1 - (x_2)^5 \le 0,
         f_2(x) := x_1 + (x_2)^{2p} + (x_3)^2 \le 0.

It is not hard to verify the following properties:
(a3) (P3) has a unique solution $x^* = (0, 0, 0)^T$;
(b3) $x^* = (0, 0, 0)^T$ is an FJ point of (P3), but it is not a KKT point of (P3);
(c3) (P3) has a KKT point $\bar x = (-1, 1, 0)^T$ if $p \in \{1, 2\}$;
(d3) for any $c_0 > 0$, the level set $L(c_0) = \{x \in R^3 \mid f_0(x) \le c_0,\ x \in X\}$ is bounded.

Simple calculations yield that

    \nabla f_0(x^*) = (0, 1, 0)^T,  \nabla f_1(x^*) = (-1, 0, 0)^T,  \nabla f_2(x^*) = (1, 0, 0)^T,
    M(x^*) = \{(0, a, a)^T \mid a > 0\},
    G_1(x^*, \tau^*, u^*) = \{(0, 0, d_3)^T \mid d_3 \in R\}

and

    G_2(x^*, \tau^*, u^*) = \{(0, d_2, d_3)^T \mid d_2 \le 0,\ d_3 \in R\}.

For any $(\tau^*, (u^*)^T)^T = (0, a, a)^T \in M(x^*)$,

    \nabla F_{\tau^*,u^*}(x^*) = 2a diag(0, 1, 1), if p = 1,

and

    \nabla F_{\tau^*,u^*}(x^*) = 2a diag(0, 0, 1), if p \in \mathcal{N} \setminus \{1\}.

Hence, if $p = 1$, then $\nabla F_{\tau^*,u^*}(x^*)$ is positive definite on $G_1(x^*, \tau^*, u^*)$ and $G_2(x^*, \tau^*, u^*)$ for all FJ multipliers $(\tau^*, (u^*)^T)^T \in M(x^*)$. Therefore, $x^*$ satisfies the FJ1-SSOSC and the FJ2-SSOSC. Similarly, if $p \in \mathcal{N} \setminus \{1\}$, then $x^*$ satisfies the FJ1-SSOSC, but not the FJ2-SSOSC.

We now consider $\bar x$. Keep in mind that $p \in \{1, 2\}$ in this case. Simple calculations yield that

    \nabla f_0(\bar x) = (-2, 1, 0)^T,  \nabla f_1(\bar x) = (-1, -5, 0)^T,  \nabla f_2(\bar x) = (1, 2p, 0)^T,
    M(\bar x) = \{ ((5 - 2p) a / 11, (4p + 1) a / 11, a)^T \mid a > 0 \}

and

    G_1(\bar x, \bar\tau, \bar u) = \{(0, 0, d_3)^T \mid d_3 \in R\}.

Since for any $(\bar\tau, \bar u^T)^T \in M(\bar x)$, $\bar\tau \ne 0$, we have $G_2(\bar x, \bar\tau, \bar u) = G_1(\bar x, \bar\tau, \bar u)$. Since

    \nabla^2 f_0(\bar x) = diag(2, 0, 0),
    \nabla^2 f_1(\bar x) = diag(0, -20, 0)

and

    \nabla^2 f_2(\bar x) = diag(0, 2, 2), if p = 1,
    \nabla^2 f_2(\bar x) = diag(0, 12, 2), if p = 2,

we have, for any $(\bar\tau, \bar u^T)^T \in M(\bar x)$, that

    \nabla F_{\bar\tau,\bar u}(\bar x) = (a/11) diag(6, -78, 22), if p = 1,
    \nabla F_{\bar\tau,\bar u}(\bar x) = (a/11) diag(2, -48, 22), if p = 2.

On $G_1(\bar x, \bar\tau, \bar u) = G_2(\bar x, \bar\tau, \bar u) = \{(0, 0, d_3)^T \mid d_3 \in R\}$ these matrices are positive definite, since their $(3,3)$ entry equals $2a > 0$. Hence, if $p \in \{1, 2\}$, then $\bar x$ satisfies the FJ1-SSOSC and the FJ2-SSOSC.

From the above discussion, we can see that, if we solve (P3) by Algorithm 1, then Algorithm 1 either stops at an FJ point or generates a sequence $\{x^k\}_{k=1}^\infty$ that converges to an FJ point of (P3).
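A numerical spot-check of Example 4.1 at $x^*$ (our own code; the matrices and generators are those computed above): for $p = 1$ the matrix $2a\,diag(0, 1, 1)$ is positive definite on both $G_1$ and $G_2$, while for $p \ge 2$ it annihilates the $d_2$-direction of $G_2$, so the FJ2-SSOSC fails.

```python
import numpy as np

a = 1.0
V_p1 = 2 * a * np.diag([0.0, 1.0, 1.0])   # grad F_{tau*,u*}(x*) for p = 1
V_p2 = 2 * a * np.diag([0.0, 0.0, 1.0])   # grad F_{tau*,u*}(x*) for p >= 2

d_G1 = np.array([0.0, 0.0, 1.0])          # generator of G1(x*, tau*, u*)
d_G2 = np.array([0.0, -1.0, 0.0])         # a direction in G2 \ G1 (d2 <= 0)

assert d_G1 @ V_p1 @ d_G1 > 0 and d_G2 @ V_p1 @ d_G2 > 0   # FJ1 and FJ2 hold, p = 1
assert d_G1 @ V_p2 @ d_G1 > 0 and d_G2 @ V_p2 @ d_G2 == 0  # FJ2 fails for p >= 2
```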

5. Unit steps

In Section 3, we established global convergence for Algorithm 1. In this section, we show that if we choose the parameters $c_i^k$ large enough, then a unit step will always be acceptable in (2.2) and (2.3). We also obtain some new results for (1.1). In order to study the unit step, we need the following blanket assumption.

(A5) For each $i \in I^0$, there exists $L_i > 0$ such that for any $x \in R^n$ and any $y \in R^n$,

    \|\nabla f_i(x) - \nabla f_i(y)\| \le L_i \|x - y\|.

Theorem 5.1 Suppose that (A1)-(A5) hold and $\{x^k\}_{k=1}^\infty$ is generated by Algorithm 1. If

    \liminf_{k \to \infty} c_0^k > 2 L_0 / (C_2 (1 - \alpha))                      (5.1)

and

    \liminf_{k \to \infty} c_i^k > 2 L_i / C_2, for i \in I,                       (5.2)

then for all $k$ sufficiently large,

    f_0(x^k + d^k) - f_0(x^k) \le \alpha \nabla f_0(x^k)^T d^k                     (5.3)

and

    f_i(x^k + d^k) \le 0, for i \in I.                                             (5.4)

Proof: We first prove (5.3), i.e., that $q_k(1) \le 0$. From (2.8), (A3) and (A5), we have

    q_k(1) = \int_0^1 \nabla f_0(x^k + t d^k)^T d^k dt - \alpha \nabla f_0(x^k)^T d^k
           = (1 - \alpha) \nabla f_0(x^k)^T d^k + \int_0^1 (\nabla f_0(x^k + t d^k) - \nabla f_0(x^k))^T d^k dt
           \le -((1 - \alpha) c_0^k / 2) (d^k)^T H_k d^k + L_0 \|d^k\|^2
           \le -((1 - \alpha) C_2 c_0^k / 2) \|d^k\|^2 + L_0 \|d^k\|^2,

which implies that if $c_0^k > 2 L_0 / (C_2 (1 - \alpha))$, then $q_k(1) \le 0$. Therefore, (5.3) follows from (5.1).

We now prove (5.4). From (2.9), (A3) and (A5), we have, for each $i \in I$,

    f_i(x^k + d^k) = (f_i(x^k) + \nabla f_i(x^k)^T d^k) + \int_0^1 (\nabla f_i(x^k + t d^k) - \nabla f_i(x^k))^T d^k dt
                   \le c_i^k v_k + L_i \|d^k\|^2
                   \le -(c_i^k / 2) (d^k)^T H_k d^k + L_i \|d^k\|^2
                   \le -(C_2 c_i^k / 2) \|d^k\|^2 + L_i \|d^k\|^2,

which implies that if $c_i^k > 2 L_i / C_2$, then $f_i(x^k + d^k) \le 0$. Hence (5.4) follows from (5.2). Q.E.D.
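The thresholds in (5.1)-(5.2) are explicit once Lipschitz constants for the gradients are known or estimated; a tiny sketch with illustrative (made-up) constants:

```python
# Unit-step thresholds from Theorem 5.1 (illustrative constants, not from the paper).
L = [3.0, 1.0, 2.0]    # Lipschitz constants L_0, L_1, L_2 of the gradients (A5)
C2, alpha = 0.5, 0.1   # lower eigenvalue bound (A3) and Armijo parameter

c0_min = 2 * L[0] / (C2 * (1 - alpha))          # (5.1): c_0^k must exceed this
ci_min = [2 * Li / C2 for Li in L[1:]]          # (5.2): c_i^k must exceed these
c = [1.1 * c0_min] + [1.1 * t for t in ci_min]  # safe choices: unit step accepted
```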

Let $X^0$ denote the interior point set of $X$ and let $X_{FJ}$ denote the FJ point set of (1.1). From the proof of Theorem 5.1, we have the following result.

Proposition 5.1 Suppose that $f_i$, $i \in I$, satisfy (A5). If $X \ne X_{FJ}$, then $X^0$ is nonempty.

Proof: Let $z \in X \setminus X_{FJ}$. For each $i \in I$, choose $c_i$ satisfying the following relation:

    c_i \ge 2 (L_i + 1).                                                           (5.5)

Consider the following minimization problem:

    min  v + (1/2) d^T d
    s.t. \nabla f_0(z)^T d \le v,
         f_i(z) + \nabla f_i(z)^T d \le c_i v,  i \in I.

Let $d^z$ be one of its solutions. Since $z$ is not an FJ point of (1.1), $d^z \ne 0$ by Proposition 2.1. Similar to the proof of Theorem 5.1 (here $H = I$, so $C_2 = 1$), we have, for $i \in I$, that

    f_i(z + d^z) \le -(c_i / 2) \|d^z\|^2 + L_i \|d^z\|^2,

which implies that $f_i(z + d^z) < 0$ by using $d^z \ne 0$ and (5.5), i.e., $X^0 \ne \emptyset$. Q.E.D.

By Proposition 5.1, we have the following corollary.

Corollary 5.1 Suppose that $f_i$, $i \in I$, satisfy (A5). If $X^0 = \emptyset$, then $X = X_{FJ}$.

By Theorem 5.1, we can present a variant of Algorithm 1 which does not employ a line search. The variant (Algorithm 2) is also globally convergent if the assumptions (A1)-(A5) hold.

Algorithm 2

Initialization, Step 1 and Step 2 are the same as those given in Algorithm 1.

Step 3: If

    f_0(x^k + d^k) - f_0(x^k) \le \alpha \nabla f_0(x^k)^T d^k                     (5.6)

and

    f_i(x^k + d^k) \le 0,  i \in I,                                                (5.7)

go to Step 4. Otherwise, set

    c_0^k := c_0^k if (5.6) is satisfied;   c_0^k := 2 c_0^k if (5.6) is not satisfied;

and, for $i \in I$,

    c_i^k := c_i^k if (5.7) is satisfied;   c_i^k := 2 c_i^k if (5.7) is not satisfied.

Go to Step 1.

Step 4: Set $x^{k+1} = x^k + d^k$, $c_i^{k+1} = c_i^k$ ($i \in I^0$) and $k := k + 1$; return to Step 1.

Remark 5.1 It is worth pointing out that, for all $k \ge 1$, $t_k = 1$ does not imply that $\{x^k\}_{k=1}^\infty$ converges superlinearly to an FJ point of (1.1). In fact, $d^k$ in general is not a Newton direction. Therefore, the proposed algorithm does not possess a superlinear convergence rate. For feasible direction methods with a superlinear convergence rate, we refer readers to [24, 26, 30].
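A sketch of the line-search-free variant (our own code, again reusing the `search_direction` helper from Section 1; for simplicity it doubles every constraint weight when (5.7) fails, rather than only the offending ones). Under (A5), Theorem 5.1 guarantees that the doubling loop terminates once the weights exceed the thresholds (5.1)-(5.2):

```python
import numpy as np

def algorithm2(x, f, grads, c, alpha=0.1, tol=1e-8, iters=100):
    """Algorithm 2: unit steps, with weights doubled until (5.6)-(5.7) hold."""
    H = np.eye(len(x))
    c = list(c)
    for _ in range(iters):
        while True:
            d, v = search_direction(x, f, grads, H, c)      # Step 1
            if max(np.linalg.norm(d), abs(v)) < tol:
                return x                                    # Step 2: FJ point
            ok_descent = (f[0](x + d) - f[0](x)
                          <= alpha * (grads[0](x) @ d))     # test (5.6)
            ok_feasible = all(fi(x + d) <= 0 for fi in f[1:])  # test (5.7)
            if ok_descent and ok_feasible:
                break                                       # Step 3 passed
            if not ok_descent:
                c[0] *= 2                                   # double c_0
            if not ok_feasible:
                c[1:] = [2 * ci for ci in c[1:]]            # double the c_i
        x = x + d                                           # Step 4: unit step
    return x
```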

References

[1] M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, Wiley, New York, 1979.

[2] J. R. Birge and L. Qi, Subdifferentials in approximation for stochastic programming, SIAM Journal on Optimization 5 (1995) 436-453.

[3] J. R. Birge and L. Qi, Continuous approximation schemes for stochastic programs, Annals of Operations Research 56 (1995) 251-285.

[4] J. R. Birge and R. J.-B. Wets, Designing approximation schemes for stochastic optimization problems, in particular, for stochastic programs with recourse, Mathematical Programming Study 27 (1986) 54-102.

[5] J. V. Burke and S. P. Han, A robust sequential quadratic programming method, Mathematical Programming 43 (1989) 277-303.

[6] X. Chen, L. Qi and R. S. Womersley, Newton's method for quadratic stochastic programs with recourse, Journal of Computational and Applied Mathematics 60 (1995) 29-46.

[7] X. Chen and R. S. Womersley, A parallel inexact Newton method for stochastic programs with recourse, Annals of Operations Research 64 (1996) 113-141.

[8] F. H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[9] M. B. Daya and C. M. Shetty, Polynomial barrier function algorithms for convex quadratic programming, Arabian Journal for Science and Engineering 15 (1990) 657-670.

[10] M. Fukushima, A successive quadratic programming algorithm with global and superlinear convergence properties, Mathematical Programming 35 (1986) 253-264.

[11] P. E. Gill, S. J. Hammarling, W. Murray, M. A. Saunders and M. H. Wright, User's guide for LSSOL (version 1.0): A Fortran package for constrained least-squares and convex quadratic programming, Report SOL 86-1, Department of Operations Research, Stanford University, 1986.

[12] D. Goldfarb and S. Liu, An O(n^3 L) primal interior point algorithm for convex quadratic programming, Mathematical Programming 49 (1991) 325-340.

[13] S. P. Han, A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications 22 (1977) 297-309.

[14] J. Herskovits, A two-stage feasible directions algorithm for nonlinear constrained optimization, Mathematical Programming 36 (1986) 19-38.

[15] D. D. Hertog, C. Roos and T. Terlaky, A polynomial method of weighted centers for convex quadratic programming, Journal of Information & Optimization Sciences 12 (1991) 187-205.

[16] P. Kall and S. W. Wallace, Stochastic Programming, Wiley, New York, 1994.

[17] K. C. Kiwiel, A dual method for solving certain positive semi-definite quadratic programming problems, SIAM Journal on Scientific and Statistical Computing 10 (1989) 175-186.

[18] D. Q. Mayne and E. Polak, A superlinearly convergent algorithm for constrained optimization problems, Mathematical Programming Study 16 (1982) 45-61.

[19] A. Migdalas, A regularization of the Frank-Wolfe method and unification of certain nonlinear programming methods, Mathematical Programming 65 (1994) 331-345.

[20] R. D. C. Monteiro and I. Adler, Interior path following primal-dual algorithms, part II: convex quadratic programming, Mathematical Programming 44 (1989) 43-66.

[21] R. D. C. Monteiro, I. Adler and M. G. C. Resende, A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension, Mathematics of Operations Research 15 (1990) 191-214.

[22] W. T. Nye and A. L. Tits, An application-oriented, optimization-based methodology for interactive design of engineering systems, International Journal of Control 43 (1986) 1693-1721.

[23] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.

[24] E. R. Panier and A. L. Tits, A superlinearly convergent feasible method for the solution of inequality constrained optimization problems, SIAM Journal on Control and Optimization 25 (1987) 934-950.

[25] E. R. Panier and A. L. Tits, Avoiding the Maratos effect by means of a nonmonotone line search I: general constrained problems, SIAM Journal on Numerical Analysis 28 (1991) 1183-1195.

[26] E. R. Panier and A. L. Tits, On combining feasibility, descent and superlinear convergence in inequality constrained optimization, Mathematical Programming 59 (1993) 261-276.

[27] E. R. Panier, A. L. Tits and J. N. Herskovits, A QP-free, globally convergent, locally superlinearly convergent algorithm for inequality constrained optimization, SIAM Journal on Control and Optimization 26 (1988) 788-811.

[28] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.

[29] L. Qi, Superlinearly convergent approximate Newton methods for LC^1 optimization problems, Mathematical Programming 64 (1994) 277-294.

[30] L. Qi and Z. Wei, Constant positive linear independence, KKT points and convergence of feasible SQP methods, AMR 97/4, Applied Mathematics Report, University of New South Wales, 1997.

[31] R. T. Rockafellar and R. J.-B. Wets, A Lagrangian finite-generation technique for solving linear-quadratic problems in stochastic programming, Mathematical Programming Study 28 (1986) 63-93.

[32] R. T. Rockafellar and R. J.-B. Wets, Linear-quadratic problems with stochastic penalties: the finite generation algorithm, in: V. I. Arkin, A. Shiraev and R. J.-B. Wets, eds., Stochastic Optimization, Lecture Notes in Control and Information Sciences 81, Springer-Verlag, Berlin, 1987, pp. 55-69.

[33] S. M. Robinson, A quadratically convergent algorithm for general nonlinear programming problems, Mathematical Programming 3 (1972) 145-156.

[34] S. M. Robinson, Generalized equations and their solutions, part II: Applications to nonlinear programming, Mathematical Programming Study 19 (1982) 200-221.

[35] D. M. Topkis and A. F. Veinott, On the convergence of some feasible direction algorithms for nonlinear programming, SIAM Journal on Control 5 (1967) 268-279.

[36] J. Wang, Approximate nonlinear programming algorithms for solving stochastic programs with recourse, Annals of Operations Research 30 (1991) 371-384.

[37] R. J.-B. Wets, Stochastic programming: Solution techniques and approximation schemes, in: A. Bachem, M. Grotschel and B. Korte, eds., Mathematical Programming: The State of the Art - Bonn 1982, Springer-Verlag, Berlin, 1983, pp. 566-603.

[38] Y. Ye and E. Tse, An extension of Karmarkar's projective algorithm for convex quadratic programming, Mathematical Programming 44 (1989) 157-179.