A UNIFIED APPROACH TO CONVERGENCE PROPERTIES OF SOME METHODS FOR NONSMOOTH CONVEX OPTIMIZATION

John R. Birge
Department of Industrial & Operations Engineering
The University of Michigan
Ann Arbor, Michigan 48109-2117

Liqun Qi and Zengxin Wei
School of Mathematics
University of New South Wales
Sydney, NSW 2052, Australia

Technical Report 95-28
November 1995

Abstract. Based on the notion of the $\epsilon$-subgradient, we present a unified technique to establish convergence properties of several methods for nonsmooth convex minimization problems. Starting from the technical results, we obtain the global convergence of: (i) the variable metric proximal methods presented by Bonnans, Gilbert, Lemarechal and Sagastizabal, (ii) some algorithms proposed by Correa and Lemarechal, and (iii) the proximal point algorithm given by Rockafellar. In particular, we prove that the Rockafellar-Todd phenomenon does not occur for any of the above-mentioned methods. Moreover, we explore the convergence rate of $\{\|x_k\|\}$ and $\{f(x_k)\}$ when $\{x_k\}$ is unbounded and $\{f(x_k)\}$ is bounded for the nonsmooth minimization methods (i), (ii) and (iii).

Key Words. Nonsmooth convex minimization, global convergence, convergence rate.

AMS Subject Classifications. 90C25, 90C30, 90C33.

1. Introduction

To establish convergence of algorithms for convex minimization, a usual assumption is, at least, the existence of a minimum. This assumption has been removed for some methods [11, 9, 8, 10, 17, 6, 15, 16]. The study of minimizing sequences was pioneered by Auslender, Crouzeix and their colleagues [3, 2, 1]; the relation between minimizing and stationary sequences of unconstrained and constrained optimization problems has appeared recently (see [5]). Similar results for complementarity problems and variational inequalities appeared in [7]. These papers were motivated by examples presented in

1 The research is supported by the National Science Foundation under Grant DDM-9215921 and the Australian Research Council.

Rockafellar [12] and Todd [14], which show that in general a stationary sequence is not necessarily a minimizing sequence. Todd's example has the following properties:

* $h: R^n \to R$ is convex and continuously differentiable.
* The sequence $\{h(x_k)\}$ is monotonically decreasing and $\lim_{k\to\infty} \nabla h(x_k) = 0$.
* $\lim_{k\to\infty} h(x_k) > \inf_{x\in R^n} h(x)$.

We call the above phenomenon the Rockafellar-Todd (RT) phenomenon. Since most optimization algorithms produce a sequence $\{x_k\}$ that is only stationary, i.e., $\lim_{k\to\infty} \nabla h(x_k) = 0$, it is therefore important to know what kind of algorithms generate sequences that are also minimizing, i.e., $\lim_{k\to\infty} h(x_k)$ equals the infimal value of $h$.

The purpose of this paper is to propose a general model algorithm for minimizing a proper lower-semicontinuous extended-valued convex function $f: R^n \to R \cup \{\infty\}$ and to establish its convergence properties without any additional assumption on $f$. We shall focus on two aspects: the RT phenomenon and the convergence rates of $\{\|x_k\|\}$ and $\{f(x_k)\}$ when $\{x_k\}$ is unbounded and $\{f(x_k)\}$ is bounded from below. These two issues have not been discussed in the literature.

Let $\|x\|$ denote the Euclidean norm of the vector $x \in R^n$. The subdifferential of $f$ at $x$ is the nonempty convex compact set
$$\partial f(x) = \{g \in R^n : f(y) \ge f(x) + \langle g, y - x\rangle \ \text{for all } y \in R^n\}. \quad (1.1)$$
For any $\epsilon \ge 0$, let
$$\partial_\epsilon f(x) = \{g \in R^n : f(y) \ge f(x) + \langle g, y - x\rangle - \epsilon \ \text{for all } y \in R^n\}. \quad (1.2)$$
We are interested in estimating the $\epsilon$-minimum value of $f$, say $f_\epsilon$, and also in identifying an $\epsilon$-minimum point, for example $x^*$, where
$$f(x) \ge f_\epsilon - \epsilon \ \text{for all } x \in R^n, \quad (1.3)$$
and
$$f(x) \ge f(x^*) - \epsilon \ \text{for all } x \in R^n. \quad (1.4)$$
Let $f_* = f_0$, the infimal value of $f$, $R_+ = \{a \in R : a > 0\}$ and $R_{+0} = R_+ \cup \{0\}$.

With the above notation, we may now state the method in detail:

Algorithm 1: Let $x_1 \in R^n$ be given. At the $k$th iteration, given $x_k \in R^n$, generate $(\epsilon_{1,k}, \epsilon_{2,k}, t_k, x_{k+1}, g_{k+1}) \in R_{+0} \times R_{+0} \times R_+ \times R^n \times R^n$ with $g_{k+1} \in \partial_{\epsilon_{2,k}} f(x_{k+1})$ satisfying the following inequality:
$$f(x_{k+1}) \le f(x_k) - t_k\langle g_{k+1}, g_{k+1}\rangle + \epsilon_{1,k}. \quad (1.5)$$
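As a concrete sanity check of condition (1.5), the following sketch runs one implementable instance of Algorithm 1: exact proximal steps on the one-dimensional function $f(x) = |x|$, for which the proximal step is soft-thresholding. Here $\epsilon_{1,k} = \epsilon_{2,k} = 0$, $t_k = \lambda$, and $g_{k+1} = (x_k - x_{k+1})/\lambda \in \partial f(x_{k+1})$. The function names and the choice $f(x) = |x|$ are our own illustration, not part of the paper.

```python
def prox_abs(x, lam):
    # Proximal step for f(x) = |x| with parameter lam: soft-thresholding.
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def algorithm1_prox(x1, steps, lam=0.5):
    """A hypothetical demo instance of Algorithm 1: exact proximal steps
    on f(x) = |x|, so eps_{1,k} = eps_{2,k} = 0 and t_k = lam, with
    g_{k+1} = (x_k - x_{k+1}) / lam a subgradient at the new point."""
    f = abs
    x = x1
    traj = [x]
    for _ in range(steps):
        x_next = prox_abs(x, lam)
        g = (x - x_next) / lam
        # Descent condition (1.5): f(x_{k+1}) <= f(x_k) - t_k <g, g> + eps_{1,k}
        assert f(x_next) <= f(x) - lam * g * g + 1e-12
        x = x_next
        traj.append(x)
    return traj
```

Starting from $x_1 = 3$ with $\lambda = 0.5$, the iterates decrease by $\lambda$ per step until they reach the minimizer $0$ and stay there, and (1.5) holds with equality at every step.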

The remainder of the paper is organized as follows. In Section 2, we give some basic global convergence results for Algorithm 1 without any additional assumption on $f$. In particular, we give a sufficient condition for avoiding the RT phenomenon. In Section 3, we discuss the convergence rate of Algorithm 1. In Section 4, we demonstrate that a number of methods for convex optimization problems are special cases of Algorithm 1. In addition to results on the convergence rates of $\{\|x_k\|\}$ and $\{f(x_k)\}$, a class of descent algorithms for minimizing a continuously differentiable function is studied in [16].

2. Global Convergence

Theorem 2.1 Let $\{(\epsilon_{1,k}, \epsilon_{2,k}, t_k, x_{k+1}, g_{k+1})\}$ be any sequence generated by Algorithm 1 with $\sum_{k=1}^\infty \epsilon_{1,k} < +\infty$ and $\sum_{k=1}^\infty t_k = +\infty$.
(i) Either $\lim_{k\to\infty} f(x_k) = -\infty$ or $\liminf_{k\to\infty} \|g_k\| = 0$. In particular, if $\{x_k\}$ is bounded, then every accumulation point $x^* \in R^n$ is an $\epsilon^*$-minimum point of $f$, $f_{\epsilon^*} = f(x^*)$ and $\lim_{k\to\infty} f(x_k) = f_{\epsilon^*}$, where
$$\epsilon^* = \sup\Big\{\limsup_{k\in K_1,\,k\to\infty} \epsilon_{2,k-1} : K_1 \ \text{is an index set such that} \lim_{k\in K_1,\,k\to\infty}\|g_k\| = 0\Big\}. \quad (2.1)$$
(ii) If $\inf\{t_k\} > 0$, then either $\lim_{k\to\infty} f(x_k) = -\infty$ or $\|g_k\| \to 0$. In this case, every accumulation point of $\{x_k\}$ (if one exists) is an $\epsilon^*$-minimum point of $f$.
(iii) If $\epsilon_{2,k} \to 0$ and there exists a sequence $\{m_k\}$ with $m_k \ge m > 0$ such that for all large $k$,
$$m_k\|x_{k+1} - x_k\| \le t_k\|g_{k+1}\|, \quad (2.2)$$
then $f(x_k) \to f_*$. Furthermore, if $\sum_{k=1}^\infty t_k\epsilon_{2,k} < +\infty$ and, for some constant $\alpha > 0$ and all large $k$,
$$x_{k+1} - x_k = -\alpha t_kg_{k+1}, \quad (2.3)$$
then $\{x_k\}$ converges to a minimum point of $f$ if one exists.

Proof: It is clear that either $\lim_{k\to\infty} f(x_k) = -\infty$ or $\{f(x_k)\}$ is a convergent sequence.
(i) Suppose that $\{f(x_k)\}$ is bounded from below. From (1.5), we obtain
$$\sum_{k=1}^\infty [f(x_{k+1}) - f(x_k)] > -\infty,$$
which, together with $\sum_{k=1}^\infty \epsilon_{1,k} < +\infty$ and $\sum_{k=1}^\infty t_k = +\infty$, implies that
$$\liminf_{k\to\infty}\|g_k\| = 0. \quad (2.4)$$
Thus, there exists an infinite index set $K_1$ such that $\lim_{k\in K_1,\,k\to\infty}\|g_k\| = 0$. If $\{x_k\}$ is a bounded set, then $\{x_k\}$ has at least one accumulation point. Since $\{x_k : k \in K_1\}$ is also a

bounded set, without loss of generality we may assume that $\lim_{k\in K_1,\,k\to\infty}\|x_k - x^*\| = 0$. Applying the $\epsilon$-subgradient inequality,
$$f(x) \ge f(x_k) + \langle g_k, x - x_k\rangle - \epsilon_{2,k-1}, \quad (2.5)$$
we have that, for all $x \in R^n$, $f(x) \ge f(x^*) - \epsilon^*$. This implies that $f(x^*) = f_{\epsilon^*}$. Conclusion (i) follows from $f(x_k) \to f(x^*)$.
(ii) Suppose that $\{f(x_k)\}$ is bounded from below. Using (1.5) and $\inf\{t_k\} > 0$, we have
$$\lim_{k\to\infty}\|g_k\| = 0. \quad (2.6)$$
Let $\{x_k : k \in K\}$ be any convergent subsequence of $\{x_k\}$, with $\lim_{k\in K,\,k\to\infty}\|x_k - x^*\| = 0$. For this subsequence, by (2.5) and (2.6), we have that $x^*$ is an $\epsilon^*$-minimum point of $f$.
(iii) From (i) of this theorem, it suffices to consider the case where $\{f(x_k)\}$ is bounded and $\{x_k\}$ is unbounded. Suppose, to derive a contradiction, that there exist $x \in R^n$, $\tau > 0$ and $k_0$ such that for all $k \ge k_0$,
$$\langle g_k, x - x_k\rangle \le -\tau.$$
This inequality and (1.5) imply that
$$f(x_{k+1}) - f(x_k) \le -t_k\|g_{k+1}\|^2 + \epsilon_{1,k} \le t_k\|g_{k+1}\|\frac{\langle g_{k+1}, x - x_{k+1}\rangle}{\|x - x_{k+1}\|} + \epsilon_{1,k}.$$
Therefore, we have that for all large $k$,
$$f(x_{k+1}) - f(x_k) \le -\tau t_k\frac{\|g_{k+1}\|}{\|x - x_{k+1}\|} + \epsilon_{1,k}. \quad (2.7)$$
From (2.2),
$$\|x_{k+1} - x_1\| \le \sum_{i=1}^k\|x_{i+1} - x_i\| \le m^{-1}\sum_{i=1}^k t_i\|g_{i+1}\|,$$
which implies that
$$\sum_{k=1}^\infty t_k\|g_{k+1}\| = +\infty \quad (2.8)$$
by the unboundedness of $\{\|x_k\|\}$. Therefore, there exists $k_1$ such that for all $k \ge k_1$,
$$\|x_{k+1} - x\| \le \|x_1 - x\| + \sum_{i=1}^k\|x_{i+1} - x_i\| \le 2m^{-1}\sum_{i=1}^k t_i\|g_{i+1}\|. \quad (2.9)$$

From (2.7) and (2.9), we have, for all large $k$,
$$f(x_{k+1}) - f(x_k) \le -\frac{m\tau}{2}\cdot\frac{t_k\|g_{k+1}\|}{\sum_{i=1}^k t_i\|g_{i+1}\|} + \epsilon_{1,k}. \quad (2.10)$$
This inequality, (2.8) and Lemma 3.1 of [17] yield
$$\sum_{k=1}^\infty [f(x_{k+1}) - f(x_k)] = -\infty,$$
which contradicts that $\{f(x_k)\}$ is bounded from below. Therefore, for all $x \in R^n$,
$$\limsup_{k\to\infty}\langle g_k, x - x_k\rangle \ge 0. \quad (2.11)$$
The $\epsilon$-subgradient inequality, $f(x) - f(x_k) \ge \langle g_k, x - x_k\rangle - \epsilon_{2,k-1}$, and (2.11) yield that for all $x \in R^n$,
$$f(x) \ge \limsup_{k\to\infty} f(x_k) \ge f_0.$$
This implies that $f(x_k) \to f_0$, which completes the proof of the first conclusion of (iii).

We now prove the second conclusion of (iii). It is easy to verify that for all $x \in R^n$ and all $k$:
$$\|x_k - x\|^2 = \|x_{k+1} - x\|^2 + \|x_k - x_{k+1}\|^2 + 2\langle x_k - x_{k+1}, x_{k+1} - x\rangle. \quad (2.12)$$
From (2.3) and the definition of $g_{k+1}$, for all large $k$,
$$\langle x_k - x_{k+1}, x_{k+1} - x\rangle \ge \alpha t_k[f(x_{k+1}) - f(x) - \epsilon_{2,k}].$$
Combining this with (2.12), we have
$$\|x_k - x\|^2 \ge \|x_{k+1} - x\|^2 + \alpha^2t_k^2\|g_{k+1}\|^2 + 2\alpha t_k[f(x_{k+1}) - f(x) - \epsilon_{2,k}]. \quad (2.13)$$
Using any given minimum point $x^*$ in (2.13), we have
$$\|x_k - x^*\|^2 \ge \|x_{k+1} - x^*\|^2 - 2\alpha t_k\epsilon_{2,k}.$$
This, together with the summability of $\{t_k\epsilon_{2,k}\}$ (automatic when $\epsilon_{2,k} = 0$), implies that $\{x_k\}$ is bounded and that the limit
$$\lim_{k\to\infty}\|x_k - x^*\|^2 = l < +\infty$$
exists. Suppose there are two accumulation points $x^1$ and $x^2$ of $\{x_k\}$. From the first part of this theorem, $x^1$ and $x^2$ are minimum points of $f$. As above, the limits
$$\lim_{k\to\infty}\|x_k - x^i\|^2 = l_i < +\infty, \quad i = 1, 2,$$
exist. Since a subsequence of $\{x_k\}$ converges to $x^i$, we have $l_i = 0$ for $i = 1, 2$, and hence $x^1 = x^2$. Therefore $\{x_k\}$ has a unique accumulation point. □
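The proofs above repeatedly use the $\epsilon$-subgradient inequality (2.5), which lets a subgradient computed at one point serve as an $\epsilon$-subgradient at another. The following self-contained check illustrates this transport for the convex function $f(t) = t^2$; the example and all helper names are our own illustration. For convex $f$ and $g \in \partial f(y)$, the choice $\epsilon = f(x) - f(y) - g(x - y) \ge 0$ always puts $g$ into $\partial_\epsilon f(x)$ in the sense of (1.2).

```python
def transport_eps(f, g, y, x):
    # For convex f and g in the subdifferential of f at y, the value
    # eps = f(x) - f(y) - g*(x - y) >= 0 makes g an eps-subgradient at x.
    return f(x) - f(y) - g * (x - y)

def is_eps_subgradient(f, g, x, eps, grid):
    # Check the defining inequality (1.2), f(z) >= f(x) + g*(z - x) - eps,
    # on a finite sample grid.
    return all(f(z) >= f(x) + g * (z - x) - eps - 1e-12 for z in grid)

f = lambda t: t * t
y, g = 1.0, 2.0                    # g = f'(1): the exact subgradient at y = 1
x = 0.5
eps = transport_eps(f, g, y, x)    # here eps = (x - y)^2 = 0.25
grid = [i / 10 - 3.0 for i in range(61)]
```

On this example `is_eps_subgradient(f, g, x, eps, grid)` holds, while any smaller $\epsilon$ fails at $z = y$, since $f(t) = t^2$ touches the shifted linear minorant exactly at $t = 1$.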

3. Local Convergence

In this section, we discuss the convergence rate of Algorithm 1 in the following two cases:
Case 1: $x^*$ minimizes $f$ and $\lim_{k\to\infty} x_k = x^*$.
Case 2: A global minimizer of $f$ does not exist but $\inf_{x\in R^n} f(x) > -\infty$. In this case, $\{\|x_k\|\}$ is unbounded and $f_0 > -\infty$.

Let
$$a_k = \frac{\epsilon_{1,k}}{\|g_{k+1}\|^2}, \qquad b_k = \frac{\epsilon_{2,k}}{\|g_{k+1}\|},$$
and
$$c_k = (t_k - a_k)\frac{f(x_k) - f(x^*)}{(b_k + \|x_{k+1} - x^*\|)^2}.$$

Theorem 3.1 Suppose that $\inf\{t_k\} > 0$, $x^*$ minimizes $f$ and $\lim_{k\to\infty} x_k = x^*$. If, for all $k$, $t_k > a_k$, then
$$\frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)} \le \frac{1}{\sqrt{c_k + \frac{1}{4}} + \frac{1}{2}}. \quad (3.1)$$
Consequently,
(i) if for all $k$, $c_k \ge c^* \in (0, +\infty)$, then
$$\limsup_{k\to\infty}\frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)} \le \frac{1}{\sqrt{c^* + \frac{1}{4}} + \frac{1}{2}} < 1;$$
(ii) if $\lim_{k\in K,\,k\to\infty} c_k = +\infty$, then
$$\lim_{k\in K,\,k\to\infty}\frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)} = 0.$$

Proof: Using $g_{k+1} \in \partial_{\epsilon_{2,k}}f(x_{k+1})$, we have
$$f(x^*) \ge f(x_{k+1}) + \langle g_{k+1}, x^* - x_{k+1}\rangle - \epsilon_{2,k} \ge f(x_{k+1}) - \|g_{k+1}\|\|x^* - x_{k+1}\| - b_k\|g_{k+1}\|.$$
This implies
$$\|g_{k+1}\| \ge \frac{f(x_{k+1}) - f(x^*)}{b_k + \|x_{k+1} - x^*\|},$$
which combined with (1.5) yields
$$f(x_k) \ge f(x_{k+1}) + t_k\|g_{k+1}\|^2 - \epsilon_{1,k} = f(x_{k+1}) + (t_k - a_k)\|g_{k+1}\|^2 \ge f(x_{k+1}) + \frac{(t_k - a_k)[f(x_{k+1}) - f(x^*)]^2}{(b_k + \|x_{k+1} - x^*\|)^2}.$$
Hence
$$f(x_k) - f(x^*) \ge [f(x_{k+1}) - f(x^*)]\Big[1 + \frac{(t_k - a_k)(f(x_{k+1}) - f(x^*))}{(b_k + \|x_{k+1} - x^*\|)^2}\Big].$$

Therefore
$$\frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)} \le \frac{1}{1 + \dfrac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)}\cdot\dfrac{(t_k - a_k)(f(x_k) - f(x^*))}{(b_k + \|x_{k+1} - x^*\|)^2}} = \frac{1}{1 + c_k\dfrac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)}}.$$
Writing $r_k = \frac{f(x_{k+1}) - f(x^*)}{f(x_k) - f(x^*)}$, this reads $c_kr_k^2 + r_k - 1 \le 0$, and hence $r_k \le \frac{2}{1 + \sqrt{1 + 4c_k}}$; thus (3.1) follows. Since $1/(\sqrt{t + \frac{1}{4}} + \frac{1}{2})$ ($t \ge 0$) is a decreasing function of $t$, by (3.1) we have conclusions (i) and (ii). □

The following theorem extends the related result of [16] for smooth optimization to the case where $f$ is only a proper lower-semicontinuous extended-valued convex function.

Theorem 3.2 Suppose that $\{x_k\}$ is generated by Algorithm 1 with $\epsilon_{2,k} \to 0$ and $\sum_{k=1}^\infty\sqrt{\epsilon_{1,k}} < +\infty$. Suppose that (2.2) holds and $\{t_k\}$ is a bounded set. If $\{x_k\}$ is unbounded and $\{f(x_k)\}$ is bounded, then the rate of $\sqrt{f(x_k) - f_0}$ converging to zero is less than geometric. Furthermore, $\{\|x_k\|/\sqrt{k}\}$ is bounded.

Proof: From (2.2), we have for all $k$
$$\|x_{k+1} - x_1\| \le \sum_{i=1}^k\|x_{i+1} - x_i\| \le m^{-1}\sum_{i=1}^k t_i\|g_{i+1}\|,$$
which implies that
$$\sum_{k=1}^\infty t_k\|g_{k+1}\| = +\infty, \quad (3.2)$$
using that $\{x_k\}$ is unbounded. Since $f(x_k) \ge f_0$ for all $k$ and $t_k \le T := \sup\{t_i\} < +\infty$, we obtain from (1.5) that
$$f_0 - f(x_k) \le f(x_{k+1}) - f(x_k) \le -t_k\|g_{k+1}\|^2 + \epsilon_{1,k} \le -\frac{(t_k\|g_{k+1}\|)^2}{T} + \epsilon_{1,k}.$$
Hence
$$\sqrt{f(x_k) - f_0} \ge \max\{0,\, T^{-1/2}t_k\|g_{k+1}\| - \sqrt{\epsilon_{1,k}}\}.$$
This inequality and our assumptions on $t_k$ and $\epsilon_{1,k}$ yield
$$\sum_{k=1}^\infty\sqrt{f(x_k) - f_0} = +\infty,$$

which implies that $\sqrt{f(x_k) - f_0}$ cannot converge to 0 at a geometric rate.

We now prove the second part of the theorem. From (1.5) and (2.2),
$$f(x_{k+1}) - f(x_k) \le -t_k\|g_{k+1}\|^2 + \epsilon_{1,k} \le -\frac{m^2}{t_k}\|x_{k+1} - x_k\|^2 + \epsilon_{1,k},$$
which implies
$$f(x_{k+1}) - f(x_1) \le -m^2\inf\Big\{\frac{1}{t_i} : i = 1,\dots,k\Big\}\sum_{i=1}^k\|x_{i+1} - x_i\|^2 + \sum_{i=1}^k\epsilon_{1,i}.$$
On the other hand,
$$\|x_{k+1} - x_1\|^2 \le \Big(\sum_{i=1}^k\|x_{i+1} - x_i\|\Big)^2 \le k\sum_{i=1}^k\|x_{i+1} - x_i\|^2.$$
Hence we obtain the following inequality by combining the above two inequalities:
$$f(x_{k+1}) - f(x_1) \le -m^2\inf\Big\{\frac{1}{t_i} : i = 1,\dots,k\Big\}\frac{\|x_{k+1} - x_1\|^2}{k} + \sum_{i=1}^k\epsilon_{1,i},$$
which implies that $\{\|x_{k+1} - x_1\|^2/k\}$ is bounded, since $\{f(x_k)\}$ is bounded from below and $\{t_k\}$ is bounded. Therefore, $\{\|x_k\|/\sqrt{k}\}$ is bounded. □

4. Applications

In this section, we demonstrate that a number of methods for convex optimization problems are special cases of Algorithm 1. These contain:

* A family of variable metric proximal methods proposed in [4].
* The methods for convex minimization given in [6].
* Proximal point algorithms introduced in [13].

Example 4.1: A family of variable metric proximal methods [4].

In [4], the authors proposed a family of variable metric proximal algorithms based on the Moreau-Yosida regularization and quasi-Newton approximations. Given $x \in R^n$ and a symmetric positive definite $n \times n$ matrix $B$, let
$$\varphi_B(z) := f(z) + \frac{1}{2}\langle B(z - x), z - x\rangle, \quad (4.1)$$

$$x^p = p_B(x) := \arg\min\{\varphi_B(z) : z \in R^n\}, \quad (4.2)$$
$$\delta_k = f(x_k) - f(x_k^p) - \frac{1}{2}\langle g_k^p, W_kg_k^p\rangle, \quad (4.3)$$
where $W_k = B_k^{-1}$ and $g_k^p \in \partial f(x_k^p)$. With the notation in (4.1)-(4.3), we can state the algorithm of [4] as follows:

Algorithm 4.1: (GAP of [4])
Step 0: Start with some initial point $x_1$ and matrix $B_1$; choose some parameter $m_0 \in (0, 1)$; set $k = 1$.
Step 1: With $\delta_k$ given by (4.3), compute $x_{k+1}$ satisfying
$$f(x_{k+1}) \le f(x_k) - m_0\delta_k. \quad (4.4)$$
Step 2: Update $B_k$, increase $k$ by 1 and loop to Step 1.

Lemma 4.1 Suppose that $(x_{k+1}, g_k^p)$ is generated by Algorithm 4.1. Let
$$\epsilon_{2,k} = \max\{0,\, [f(x_{k+1}) - f(x_k) - \langle g_k^p, x_{k+1} - x_k\rangle] + [f(x_k) - f(x_k^p) - \langle g_k^p, W_kg_k^p\rangle]\}; \quad (4.5)$$
then
$$g_k^p \in \partial_{\epsilon_{2,k}}f(x_{k+1}) \quad (4.6)$$
and
$$f(x_{k+1}) \le f(x_k) - \frac{m_0}{2}\langle g_k^p, W_kg_k^p\rangle. \quad (4.7)$$
Thus, (1.5) holds with $t_k = \frac{m_0}{2}\lambda_{\min}(W_k)$, where $\lambda_{\min}(W)$ denotes the smallest eigenvalue of a symmetric matrix $W$.

Proof: Using $g_k^p \in \partial f(x_k^p)$, we have for all $x \in R^n$:
$$f(x) \ge f(x_k^p) + \langle g_k^p, x - x_k^p\rangle = f(x_{k+1}) + \langle g_k^p, x - x_{k+1}\rangle - [f(x_{k+1}) - f(x_k) - \langle g_k^p, x_{k+1} - x_k\rangle] - [f(x_k) - f(x_k^p) - \langle g_k^p, W_kg_k^p\rangle]. \quad (4.8)$$
On the other hand, from the definitions of $x_k^p$ and $g_k^p$, we have
$$x_k^p = x_k - W_kg_k^p. \quad (4.9)$$
Substituting (4.9) into (4.8), (4.6) follows.

Since for all $x \in R^n$, $f(x) \ge f(x_k^p) + \langle g_k^p, x - x_k^p\rangle$, setting $x = x_k$ and using (4.9) we have
$$f(x_k^p) \le f(x_k) - \langle g_k^p, W_kg_k^p\rangle. \quad (4.10)$$
Relations (4.10), (4.3) and (4.4) imply (4.7). □

Conclusion (a) in Theorem 4.1 below is the global convergence result of [4], while conclusion (b) is a new one for this algorithm. Indeed, the assumptions in (b) can be satisfied in view of (qN-AP3) (or (BFGS-AP3)) and Remark 4.1 of [4].

Theorem 4.1 (a) (Theorem 2.3 of [4]). Assume that $f$ has a nonempty bounded set of minima, and let $\{x_k\}$ be a sequence generated by (GAP). Then $\{x_k\}$ is bounded and, if
$$\sum_{k=1}^\infty\lambda_{\min}(W_k) = +\infty, \quad (4.11)$$
any accumulation point of $\{x_k\}$ minimizes $f$. The same properties hold for the sequence of proximal points $\{x_k^p\}$. It also holds that $\liminf_{k\to\infty}\|g_k^p\| = 0$.
(b) Suppose there exists $t_k$ such that for all large $k$,
$$x_{k+1} = x_k + t_k(x_k^p - x_k), \quad (4.12)$$
where $t_k \le \bar{t} < +\infty$. If $\{\|W_k\|\}$ is bounded and (4.11) holds, then $f(x_k) \to f_0$.

Proof: (a) Since $f$ has a nonempty bounded set of minima, the level sets of $f$ are bounded. Hence, $\{x_k\}$ and $\{x_k^p\}$ are bounded by (4.7) and (4.9). (In fact, this conclusion follows from [4, Theorem 2.3].) Using (4.4), we have
$$\delta_k \to 0. \quad (4.13)$$
From (4.7), we obtain
$$\langle g_k^p, W_kg_k^p\rangle \to 0. \quad (4.14)$$
Hence,
$$f(x_k) - f(x_k^p) - \langle g_k^p, W_kg_k^p\rangle = \delta_k - \frac{1}{2}\langle g_k^p, W_kg_k^p\rangle. \quad (4.15)$$
Results (4.13), (4.14) and (4.15) imply that if $\lim_{k\in K}\|g_k^p\| = 0$, then $\lim_{k\in K}\epsilon_{2,k} = 0$. From the definition of $\epsilon^*$, we have $\epsilon^* = 0$. Let $t_k = \frac{m_0}{2}\lambda_{\min}(W_k)$; then (4.11) and (i) of Theorem 2.1 yield the first conclusion. From (4.13) and (4.14), we have
$$\lim_{k\to\infty}f(x_k) = \lim_{k\to\infty}f(x_k^p).$$

This implies that every accumulation point of $\{x_k^p\}$ minimizes $f$, by the first conclusion. Using (4.7) and (4.11), we have $\liminf_{k\to\infty}\|g_k^p\| = 0$.
(b) From (4.12) and (4.9), we have, for all large $k$, that
$$\|x_{k+1} - x_k\| = t_k\|W_kg_k^p\| \le \bar{t}\,\|W_k\|\,\|g_k^p\|,$$
which implies that $f(x_k) \to f_0$ by using the assumptions in (b) and result (iii) in Theorem 2.1. In fact, since $\{\|W_k\|\}$ is bounded, we only need to note that (2.2) holds with a suitable $m_k \ge m > 0$ for all large $k$. □

Theorem 4.2 Suppose that the assumptions of (b) in Theorem 4.1 hold. If $\{x_k\}$ is unbounded and $f_0 > -\infty$, then the rate of $\sqrt{f(x_k) - f_0}$ converging to zero is less than geometric and $\{\|x_k\|/\sqrt{k}\}$ is bounded.

Proof: The conclusions follow from Lemma 4.1, Theorem 3.2 and (b) of Theorem 4.1. □

Example 4.2: Algorithms given in [6].

In [6], Correa and Lemarechal presented a simple and unified technique to establish convergence of a number of minimization methods. These contain (i) the exact prox-iteration, (ii) its implementable approximations, which include in particular (iii) bundle methods, and finally (iv) the classical subgradient optimization scheme. Their methods can be summarized as follows:

Algorithm 4.2: From an arbitrary point $x_1 \in R^n$, the sequence $\{x_k\}$ is constructed with the following formulas:
$$x_{k+1} = x_k - \tau_k\gamma_k, \quad (4.16)$$
$$\gamma_k \in \partial_{\epsilon_{3,k}}f(x_k), \quad (4.17)$$
$$f(x_{k+1}) \le f(x_k) - m\tau_k\|\gamma_k\|^2, \quad (4.18)$$
where $\epsilon_{3,k}$ is nonnegative, $\tau_k > 0$ is the stepsize and $m$ is a positive constant.

The following lemma shows that Algorithm 4.2 is a special case of Algorithm 1.

Lemma 4.2 Suppose that $(\epsilon_{3,k}, \tau_k, x_{k+1}, \gamma_k)$ is generated by Algorithm 4.2. Let
$$t_k = m\tau_k, \quad (4.19)$$
$$\epsilon_{2,k} = \max\{0,\, \epsilon_{3,k} + (1 - m)\tau_k\|\gamma_k\|^2\}. \quad (4.20)$$

Then
$$\gamma_k \in \partial_{\epsilon_{2,k}}f(x_{k+1}) \quad (4.21)$$
and (1.5) holds for any $\epsilon_{1,k} \ge 0$.

Proof: It suffices to prove that (4.21) holds. By (4.17), we have for all $x \in R^n$,
$$f(x) \ge f(x_k) + \langle\gamma_k, x - x_k\rangle - \epsilon_{3,k} = f(x_{k+1}) + \langle\gamma_k, x - x_{k+1}\rangle + f(x_k) - f(x_{k+1}) + \langle\gamma_k, x_{k+1} - x_k\rangle - \epsilon_{3,k}.$$
This inequality, (4.16) and (4.18) imply that
$$f(x) \ge f(x_{k+1}) + \langle\gamma_k, x - x_{k+1}\rangle - [\epsilon_{3,k} + (1 - m)\tau_k\|\gamma_k\|^2].$$
So (4.21) follows. □

The following is a main result of [6].

Theorem 4.3 (Proposition 2.2 of [6]). Suppose that $(\epsilon_{3,k}, \tau_k, x_{k+1}, \gamma_k)$ is generated by Algorithm 4.2.
(i) Assume that
$$\sum_{k=1}^\infty\tau_k = +\infty \quad (4.22)$$
and
$$\epsilon_{3,k} \to 0; \quad (4.23)$$
then $f(x_k) \to f_*$.
(ii) If $\{\tau_k\}$ is bounded and
$$\sum_{k=1}^\infty\epsilon_{3,k} < +\infty, \quad (4.24)$$
then $\{x_k\}$ converges to a minimum point of $f$ if there is such a minimum point.

Proof: (i) If the decreasing sequence $\{f(x_k)\}$ tends to $-\infty$, then the conclusion follows. Otherwise, from (4.18), we have
$$\sum_{k=1}^\infty\tau_k\|\gamma_k\|^2 < +\infty. \quad (4.25)$$
This and (4.20) imply that $\{\epsilon_{3,k}\}$ tends to 0 if and only if $\{\epsilon_{2,k}\}$ tends to 0. So the conclusion follows by Lemma 4.2 and (iii) of Theorem 2.1.
(ii) Suppose $f$ has a minimum point. Then $\{f(x_k)\}$ is bounded from below. Thus, (4.25) holds. Inequalities (4.25) and (4.24) imply that
$$\sum_{k=1}^\infty t_k\epsilon_{2,k} < +\infty$$

by the boundedness of $\{\tau_k\}$ and (4.19). The results of (iii) in Theorem 2.1 imply that $\{x_k\}$ converges to a minimum point of $f$. □

Note that from (4.16) and Theorem 3.2, we obtain the following new convergence rate result for Algorithm 4.2.

Theorem 4.4 Let $(\epsilon_{3,k}, \tau_k, x_{k+1}, \gamma_k)$ be generated by Algorithm 4.2 with (4.22) and (4.23). If $\{\tau_k\}$ is bounded, $\{x_k\}$ is unbounded and $f_0 > -\infty$, then the rate of $\sqrt{f(x_k) - f_0}$ converging to zero is less than geometric and $\{\|x_k\|/\sqrt{k}\}$ is bounded.

Example 4.3: A proximal point algorithm introduced in [13].

In [13], Rockafellar introduced two general criteria for finding a zero of an arbitrary maximal monotone operator when the iteration points are computed approximately. As an application, he applied the results to a lower semicontinuous proper convex function $f$. In this case, one of the algorithms is as follows:

Algorithm 4.3: For $x_k$, generate $(\sigma_k, \lambda_k, x_{k+1}, g_{k+1}) \in R_{+0} \times R_+ \times R^n \times R^n$ (with $g_{k+1} \in \partial f(x_{k+1})$) satisfying
$$\mathrm{dist}(0, S_k(x_{k+1})) \le \frac{\sigma_k}{\lambda_k}\|x_{k+1} - x_k\|, \quad (4.26)$$
where
$$\sum_{k=1}^\infty\sigma_k < \infty \quad (4.27)$$
and
$$S_k(x) = \partial f(x) + \frac{1}{\lambda_k}(x - x_k). \quad (4.28)$$
In the following discussion, we only assume that for all $k$,
$$\sigma_k \in [0, \tfrac{1}{2}]. \quad (4.29)$$

Lemma 4.3 Suppose that $(\sigma_k, \lambda_k, x_{k+1}, g_{k+1})$ is generated by Algorithm 4.3. Let
$$t_k = \frac{1 - \sigma_k}{(1 + \sigma_k)^2}\lambda_k. \quad (4.30)$$
Then (1.5) holds with $\epsilon_{1,k} = \epsilon_{2,k} = 0$. Furthermore,
$$\frac{(1 - \sigma_k)^2}{(1 + \sigma_k)^2}\|x_{k+1} - x_k\| \le t_k\|g_{k+1}\|. \quad (4.31)$$

Proof: By (4.26), (4.29) and (4.28), we have
$$\Big\|g_{k+1} + \frac{1}{\lambda_k}(x_{k+1} - x_k)\Big\| \le \frac{\sigma_k}{\lambda_k}\|x_{k+1} - x_k\|. \quad (4.32)$$
The inequality
$$\Big\langle g_{k+1} + \frac{1}{\lambda_k}(x_{k+1} - x_k), x_{k+1} - x_k\Big\rangle \le \Big\|g_{k+1} + \frac{1}{\lambda_k}(x_{k+1} - x_k)\Big\|\,\|x_{k+1} - x_k\|$$
and (4.32) imply that
$$\langle g_{k+1}, x_{k+1} - x_k\rangle \le \frac{\sigma_k - 1}{\lambda_k}\|x_{k+1} - x_k\|^2.$$
Therefore,
$$\langle g_{k+1}, x_k - x_{k+1}\rangle \ge \frac{1 - \sigma_k}{\lambda_k}\|x_{k+1} - x_k\|^2. \quad (4.33)$$
On the other hand, by (4.32), we have
$$\|g_{k+1}\| \le \frac{1 + \sigma_k}{\lambda_k}\|x_{k+1} - x_k\|. \quad (4.34)$$
Inequalities (4.33) and (4.34) yield
$$\langle g_{k+1}, x_k - x_{k+1}\rangle \ge t_k\langle g_{k+1}, g_{k+1}\rangle. \quad (4.35)$$
Applying the subgradient inequality for convex functions,
$$f(x_k) \ge f(x_{k+1}) + \langle g_{k+1}, x_k - x_{k+1}\rangle \ge f(x_{k+1}) + t_k\|g_{k+1}\|^2.$$
Hence, (1.5) follows with $\epsilon_{1,k} = \epsilon_{2,k} = 0$. From (4.32), we have
$$\frac{1 - \sigma_k}{\lambda_k}\|x_{k+1} - x_k\| \le \|g_{k+1}\|,$$
which implies that (4.31) holds. □

The following theorem indicates that the RT phenomenon does not occur for the well-known Algorithm 4.3. In view of Theorem 2.1 and Lemma 4.3, it does not require a separate proof.

Theorem 4.5 Suppose that $\{(\sigma_k, \lambda_k, x_{k+1}, g_{k+1})\}$ is generated by Algorithm 4.3 with $\sum_{k=1}^\infty\lambda_k = +\infty$.
(i) Either $\lim_{k\to\infty}f(x_k) = -\infty$ or $\liminf_{k\to\infty}\|g_k\| = 0$. In particular, if $\{x_k\}$ is bounded, then $f(x_k) \to f_0$ and every accumulation point of $\{x_k\}$ is a minimum point of $f$.

(ii) If $\inf\{\lambda_k\} > 0$, then either $\lim_{k\to\infty}f(x_k) = -\infty$ or $\|g_k\| \to 0$. In this case, every accumulation point of $\{x_k\}$ (if one exists) is a minimum point of $f$.
(iii) $f(x_k) \to f_*$.
(iv) If $\sigma_k = 0$ for all $k$, then $\{x_k\}$ converges to a minimum point of $f$ if such a point exists.

For Algorithm 4.3, we obtain the following two basic convergence rate results from Theorem 3.1 and Theorem 3.2.

Theorem 4.6 (a) Suppose that $x^*$ minimizes $f$, $x_k \to x^*$, and there exist two scalars $r > 0$ and $M > 0$ such that for any $x$ satisfying $\|x - x^*\| \le r$,
$$f(x) - f(x^*) \ge M\|x - x^*\|^2. \quad (4.36)$$
Then:
(a1) if
$$\lim_{k\to\infty}\lambda_k = \lambda^* \in (0, +\infty),$$
then $f(x_k)$ tends to $f(x^*)$ linearly;
(a2) if
$$\lim_{k\to\infty}\lambda_k = +\infty,$$
then $f(x_k)$ tends to $f(x^*)$ superlinearly.
(b) Suppose that $\sum_{k=1}^\infty\lambda_k = +\infty$, $\{\lambda_k\}$ is bounded, $\{x_k\}$ is unbounded and $f_0 > -\infty$. Then the rate of $\sqrt{f(x_k) - f_0}$ converging to zero is less than geometric and $\{\|x_k\|/\sqrt{k}\}$ is bounded.

Proof: We first prove (a). Since $\epsilon_{1,k} = \epsilon_{2,k} = 0$ for all $k$, we have $a_k = b_k = 0$,
$$c_k = \frac{1 - \sigma_k}{(1 + \sigma_k)^2}\lambda_k\frac{f(x_k) - f(x^*)}{\|x_{k+1} - x^*\|^2},$$
and
$$f(x_{k+1}) - f(x^*) \le f(x_k) - f(x^*).$$
Hence, we have, from (4.29) and (4.36), that
$$c_k \ge \frac{1 - \sigma_k}{(1 + \sigma_k)^2}\lambda_k\frac{f(x_{k+1}) - f(x^*)}{\|x_{k+1} - x^*\|^2} \ge \frac{2}{9}M\lambda_k,$$
which implies that (a1) and (a2) hold by Theorem 3.1.

(b) From the definition of $m_k$ in (2.2) and from (4.31), we may take, for Algorithm 4.3,
$$m_k = \frac{(1 - \sigma_k)^2}{(1 + \sigma_k)^2} \ge \frac{1}{9}$$
by (4.29). This implies that the results of (b) hold by using Theorem 3.2. □

It is worth noting that conclusion (iii) in Theorem 4.5 and conclusion (b) in Theorem 4.6 are not contained in the convergence results given in [13]. Since Algorithm 4.3 is different from methods that use a line search to produce the next iterate $x_{k+1} = x_k + t_kd_k$, where $d_k$ is a search direction at the $k$th iteration, it is surprising that we can easily obtain the same convergence properties for these two types of methods. We believe that the tool in this paper is useful in the convergence analysis for optimization problems under a unified framework.

References

[1] A. Auslender, Convergence of stationary sequences for variational inequalities with maximal monotone operators, Applied Mathematics and Optimization 28 (1993) 161-172.

[2] A. Auslender, R. Cominetti and J.-P. Crouzeix, Convex functions with unbounded level sets and applications to duality theory, SIAM Journal on Optimization 3 (1993) 669-687.

[3] A. Auslender and J.-P. Crouzeix, Well behaved asymptotical convex functions, Analyse Non Lineaire (1989) 101-122.

[4] J. F. Bonnans, J. C. Gilbert, C. Lemarechal and C. A. Sagastizabal, A family of variable metric proximal methods, Mathematical Programming 68 (1995) 15-47.

[5] C. C. Chou, K. F. Ng and J. S. Pang, Minimizing and stationary sequences for optimization problems, manuscript, Department of Mathematical Sciences, The Johns Hopkins University, Baltimore, Maryland, August 1995.

[6] R. Correa and C. Lemarechal, Convergence of some algorithms for convex minimization, Mathematical Programming 62 (1993) 261-275.

[7] M. Fukushima and J.-S. Pang, Minimizing and stationary sequences of merit functions for complementarity problems and variational inequalities, manuscript, Department of Mathematical Sciences, The Johns Hopkins University, Baltimore, Maryland, August 1995.

[8] O. Guler, On the convergence of the proximal point algorithm for convex minimization, SIAM Journal on Control and Optimization 29 (1991) 403-419.

[9] K. C. Kiwiel, An aggregate subgradient method for nonsmooth convex minimization, Mathematical Programming 27 (1983) 320-341.

[10] B. Lemaire, About the convergence of the proximal method, in: D. Pallaschke, ed., Advances in Optimization, Lecture Notes in Economics and Mathematical Systems (Springer, Berlin, 1992) 39-51.

[11] B. T. Poljak, A general method for solving extremum problems, Soviet Mathematics Doklady 8 (1967) 593-597.

[12] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[13] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization 14 (1976) 877-898.

[14] M. J. Todd, On convergence properties of algorithms for unconstrained minimization, IMA Journal of Numerical Analysis 9 (1989) 435-441.

[15] Z. Wei and L. Qi, Convergence analysis of a proximal Newton method, AMR 95/33, Applied Mathematics Report, University of New South Wales, 1995.

[16] Z. Wei, L. Qi and H. Jiang, Some convergence properties of descent methods, AMR 95/34, Applied Mathematics Report, University of New South Wales, 1995.

[17] S. Q. Wu, Convergence properties of descent methods for unconstrained minimization, Optimization 26 (1992) 229-237.