ON THE PRIMAL-DUAL AFFINE SCALING METHOD

Romesh Saigal
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 94-10
March 1994

On the Primal-Dual Affine Scaling Method

Romesh SAIGAL
Department of Industrial and Operations Engineering,
The University of Michigan, Ann Arbor, Michigan 48109-2117, USA

March 31, 1994

Abstract

We consider here the primal-dual affine scaling method with a fixed step size $\alpha$ to the boundary. We show that for all $\alpha < 1$ the primal and dual objective function values are monotone, and the generated sequences of primal and dual solutions converge to solutions on the boundaries of the respective polyhedra. Also, when $\alpha < 2/(1+\sqrt{5})$, the limiting primal and dual solutions are either optimum solutions, or non-optimum solutions at which no pair of variables satisfies the strict complementarity condition. The proof of this result uses a modified version of the potential function of Tanabe and of Todd and Ye, which Mizuno and Nagasawa used in their study of the potential reduction strategy for this method.

Key words: Linear programming, primal-dual affine scaling methods, interior point methods.

Abbreviated title: Primal-dual affine scaling method

1 Introduction

We consider here the linear programming problem:

    minimize   $c^T x$
    subject to $Ax = b$                                            (1)
               $x \geq 0$

where $A$ is an $m \times n$ matrix and $b$ and $c$ are appropriate vectors; and its dual:

    maximize   $b^T y$
    subject to $A^T y + s = c$                                     (2)
               $s \geq 0$.

We also assume that:

Assumption 1 The primal and the dual linear programs have interior solutions.

Assumption 2 The objective functions of the primal (1) and the dual (2) are not constant on their respective feasible regions.

Assumption 3 The matrix $A$ has rank $m$.

In this paper we consider the primal-dual affine scaling method and investigate its convergence properties. This method was presented in Section 3 of Monteiro, Adler and Resende [9], where its convergence was established for a small step size. Recently, Mizuno and Nagasawa [7] have developed a potential reduction method and proved its convergence for a larger step size, which is selected at each iteration so that the potential function will not increase. We show here that the objective functions are monotone along the generated sequences; this result has also been obtained by Todd, Tuncel and Mizuno [13]. Also, the sequences are shown to converge for an arbitrary fixed step size $\alpha < 1$. When the fixed step size $\alpha$ is strictly less than $2/(1+\sqrt{5})$, we show that the sequences converge either to the respective optimum solutions or to non-optimum solutions at which no pair of variables satisfies the strict complementarity condition. It is generally believed that some "centering" is needed in the primal-dual methods. Our

result does not contradict this. We give two proofs of the main result in the hope that these will help in the construction of an example demonstrating this need.

Besides the introduction, this paper has 6 other sections. In section 2 we present the primal-dual affine scaling method. In section 3 we prove the convergence of the primal sequence, while in section 4 we investigate the dual sequence. In section 5 we prove some important properties of the primal-dual sequences, and in section 6 we prove that if the step size is strictly less than $2/(1+\sqrt{5})$, the sequences converge to optimum solutions or to non-optimum solutions for which no pair of variables satisfies the strict complementarity condition. Finally, we conclude the paper with a concluding section and an appendix presenting another proof of the main theorem of section 6.

2 The Primal-Dual Affine Scaling Method

From the complementary slackness theorem, vectors $x$ and $(y, s)$ are optimal for the primal (1) and the dual (2) respectively if and only if the following conditions hold:

    $Ax = b$
    $A^T y + s = c$                                                (3)
    $Xs = 0, \quad x \geq 0, \quad s \geq 0$

where $X = \mathrm{diag}(x)$ and $S = \mathrm{diag}(s)$. At a given interior point $x > 0$ of the primal and $(y, s)$, $s > 0$, of the dual, we derive the primal-dual affine scaling step by applying Newton's method to the nonlinear system (3). The Newton step $(\Delta x, \Delta y, \Delta s)$ is generated by solving the following system:

    $A\Delta x = 0$
    $A^T\Delta y + \Delta s = 0$                                   (4)
    $S\Delta x + X\Delta s = -Xs.$
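Eliminating $\Delta s = -A^T\Delta y$ and then $\Delta x$ from system (4) reduces the Newton step to the $m \times m$ normal equations $ADA^T\Delta y = b$, with $D = S^{-1}X$ (this closed form reappears in section 4). As a minimal illustration, not part of the paper, the following Python sketch computes the step this way with dense linear algebra:

```python
import numpy as np

def newton_direction(A, x, s):
    """Solve system (4) at an interior pair x > 0, s > 0.

    Eliminating ds = -A^T dy and then dx from (4) gives
    A D A^T dy = A x = b, with D = S^{-1} X."""
    D = x / s                                  # diagonal of D = S^{-1} X
    dy = np.linalg.solve((A * D) @ A.T, A @ x)
    ds = -A.T @ dy
    dx = -x - D * ds                           # from S dx + X ds = -X s
    return dx, dy, ds
```

One checks directly that $A\Delta x = -Ax + ADA^T\Delta y = 0$, so the computed step satisfies all three equations of (4).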

The primal-dual affine scaling method that thus results is as follows:

Step 0 Let $(x^0, y^0, s^0)$ be an interior point solution, let $0 < \alpha < 1$, and let $k = 0$.

Step 1 Compute the direction $(\Delta x^k, \Delta y^k, \Delta s^k)$ by solving the system (4). If $\Delta x^k = 0$, $\Delta y^k = 0$ and $\Delta s^k = 0$, then stop; the last solution found is an optimum solution.

Step 2 Minimum Ratio Test: Define

    $\theta_k = \max\{\phi(-X_k^{-1}\Delta x^k), \ \phi(-S_k^{-1}\Delta s^k)\}$

where $\phi(u) = \max_j u_j$ for a given vector $u$.

Step 3 Next Interior Point: Set

    $x^{k+1} = x^k + \frac{\alpha}{\theta_k}\Delta x^k$
    $y^{k+1} = y^k + \frac{\alpha}{\theta_k}\Delta y^k$
    $s^{k+1} = s^k + \frac{\alpha}{\theta_k}\Delta s^k.$

Step 4 Iterative Step: Set $k = k + 1$, and go to Step 1.

We now show that the minimum ratio test in Step 2 is well defined.

Theorem 1 Let $\Delta x^k \neq 0$ and $\Delta s^k \neq 0$. Then $\theta_k > 1$.

Proof: From system (4), it is easily established that $(\Delta x^k)^T\Delta s^k = 0$. Thus, under the hypothesis of the theorem, there is a $j$ such that $\Delta x_j^k\Delta s_j^k < 0$, and so $\theta_k > 0$. Also, using the system (4), we obtain

    $x^k + \Delta x^k = -S_k^{-1}X_k\Delta s^k$
    $s^k + \Delta s^k = -X_k^{-1}S_k\Delta x^k$

and thus $(x_j^k + \Delta x_j^k)(s_j^k + \Delta s_j^k) = \Delta x_j^k\Delta s_j^k < 0$ for this $j$. Thus, there is a $0 < \mu_k < 1$ such that $x^k + \mu_k\Delta x^k \geq 0$ and $s^k + \mu_k\Delta s^k \geq 0$. Our result follows by noting that $\theta_k^{-1} = \mu_k$. ∎

Since the conditions of Theorem 1 are satisfied at non-optimum solutions of the pair of linear programs, we will henceforth assume that the algorithm generates an infinite sequence satisfying the conditions of the above theorem.
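Steps 0 through 4 assemble into a short loop. The sketch below is again an illustration rather than the paper's artifact: the stopping tolerance, the iteration cap, and the default $\alpha$ are choices of this note.

```python
import numpy as np

def pd_affine_scaling(A, b, x, y, s, alpha=0.5, tol=1e-10, max_iter=500):
    """Primal-dual affine scaling with a fixed step fraction alpha < 1."""
    for _ in range(max_iter):
        D = x / s
        dy = np.linalg.solve((A * D) @ A.T, b)   # A x = b along the iterates
        ds = -A.T @ dy
        dx = -x - D * ds
        theta = max((-dx / x).max(), (-ds / s).max())   # Step 2
        if theta <= 0 or x @ s < tol:            # Step 1 stop, or gap small
            break
        step = alpha / theta                     # theta > 1 by Theorem 1,
        x = x + step * dx                        # so the new point stays
        y = y + step * dy                        # strictly interior
        s = s + step * ds
    return x, y, s
```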

3 The Primal Sequence $\{x^k\}$

Define the diagonal matrix $D = S^{-1}X$. Then the system (4) can be written as

    $A\Delta x = 0$
    $A^T\Delta y + \Delta s = 0$                                   (5)
    $D^{-1}\Delta x + \Delta s = -s$

which is readily seen to be equivalent to the following system:

    $-c - D^{-1}\Delta x + A^T\bar{y} = 0$                         (6)
    $A\Delta x = 0$

where $\bar{y} = y + \Delta y$. The system (6) represents the K.K.T. conditions of the following convex quadratic programming problem:

    minimize   $c^T\Delta x + \frac{1}{2}\Delta x^T D^{-1}\Delta x$
    subject to $A\Delta x = 0$                                     (7)

where $\bar{y}$ is the Lagrange multiplier associated with the constraint of the quadratic program (7). Also, consider the following ellipsoidal approximating problem:

    maximize   $-c^T v$
    subject to $Av = 0$                                            (8)
               $\|D^{-\frac{1}{2}}v\| \leq 1.$

We now prove a simple lemma:

Lemma 2 The direction $\Delta x^*$ generated by the quadratic program (7) is a positively scaled multiple of the direction $v^*$ generated by the ellipsoidal approximating problem (8).

Proof: It is readily seen that the solution to the quadratic program (7) is

    $\bar{y} = (ADA^T)^{-1}ADc$
    $\Delta x^* = -D(c - A^T\bar{y})$

and the solution to the ellipsoidal approximating problem is

    $y' = -(ADA^T)^{-1}ADc, \qquad v^* = \frac{-D(c + A^Ty')}{\|D^{\frac{1}{2}}(c + A^Ty')\|}$

and, since $y' = -\bar{y}$, we have our result. ∎
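Lemma 2 is easy to confirm numerically: the componentwise ratio of the two directions is a single positive constant. A sketch with randomly generated illustrative data (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 5
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
D = rng.uniform(0.5, 2.0, n)               # any positive diagonal D

ybar = np.linalg.solve((A * D) @ A.T, (A * D) @ c)
dx_qp = -D * (c - A.T @ ybar)              # direction of the QP (7)
v_ell = dx_qp / np.linalg.norm(dx_qp / np.sqrt(D))   # solution of (8)

print(dx_qp / v_ell)                       # all entries equal and positive
```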

We now show that the sequence $\{c^Tx^k\}$ is strictly monotone decreasing. This result also appears in Todd, Tuncel and Mizuno [13].

Theorem 3 The sequence $\{c^Tx^k\}$ is strictly monotone decreasing and is bounded, and thus converges, say to $c^*$.

Proof: Note that

    $c^Tx^{k+1} - c^Tx^k = \frac{\alpha}{\theta_k}c^T\Delta x^k$

and

    $c^T\Delta x^k = -c^TD_k^{\frac{1}{2}}\left(I - D_k^{\frac{1}{2}}A^T(AD_kA^T)^{-1}AD_k^{\frac{1}{2}}\right)D_k^{\frac{1}{2}}c = -\|P_kD_k^{\frac{1}{2}}c\|^2 = -\|D_k^{\frac{1}{2}}(c - A^T\bar{y}^k)\|^2$

where $P_k$ is the projection matrix onto the null space $\mathcal{N}(AD_k^{\frac{1}{2}})$. From Assumption 2, $c$ does not belong to the row space $\mathcal{R}(A^T)$ of $A$, and so $D_k^{\frac{1}{2}}c$ does not belong to the row space $\mathcal{R}(D_k^{\frac{1}{2}}A^T)$ of $AD_k^{\frac{1}{2}}$; thus $P_kD_k^{\frac{1}{2}}c$ is not zero, and we are done. ∎

Before we prove the convergence of the primal sequence we will need the following result related to the ellipsoidal approximating problem (8).

Theorem 4 There exists a constant $\rho > 0$ such that for every $k = 1, 2, \ldots$

    $\|v^k\| \leq -\rho\, c^Tv^k$

where $v^k$ is the solution to (8) with $D = S_k^{-1}X_k$.

Proof: See Corollary 6, Saigal [10]. ∎

We are now ready to show that the sequence $\{x^k\}$ converges.

Theorem 5 The sequence $\{x^k\}$ converges, say to $x^*$.

Proof: From Step 3, Lemma 2 and Theorem 4 we note that for each $k = 1, 2, \ldots$ there is a $\mu_k > 0$ such that $x^{k+1} - x^k = \mu_k v^k$. Thus

    $\infty > c^Tx^0 - c^* = \sum_{k=0}^{\infty}c^T(x^k - x^{k+1}) = -\sum_{k=0}^{\infty}\mu_k c^Tv^k \geq \frac{1}{\rho}\sum_{k=0}^{\infty}\mu_k\|v^k\| = \frac{1}{\rho}\sum_{k=0}^{\infty}\|x^{k+1} - x^k\|$

and thus the sequence $\{x^k\}$ is a Cauchy sequence and converges, say to $x^*$. ∎
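On a small example the monotonicity of Theorem 3 is visible immediately. The sketch below (the data, starting point and $\alpha$ are illustrative choices of this note) runs the method on: minimize $x_1$ subject to $x_1 + x_2 = 1$, $x \geq 0$.

```python
import numpy as np

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 0.0])
x, y = np.array([0.5, 0.5]), np.array([-1.0])
s = c - A.T @ y                     # s = (2, 1) > 0: an interior dual pair
alpha = 0.5
for k in range(25):
    D = x / s
    dy = np.linalg.solve((A * D) @ A.T, b)
    ds = -A.T @ dy
    dx = -x - D * ds
    theta = max((-dx / x).max(), (-ds / s).max())
    x, y, s = x + (alpha/theta)*dx, y + (alpha/theta)*dy, s + (alpha/theta)*ds
    print(k, c @ x)                 # strictly decreasing, tending to c* = 0
```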

We now prove an important result that is needed in the proof of convergence to optimality.

Theorem 6 There exists a $\delta > 0$ such that for each $j$ with $x_j^* = 0$, and every $k = 1, 2, \ldots$,

    $\frac{c^Tx^k - c^*}{x_j^k} \geq \delta.$

Proof: Let $k$ be arbitrary. Then, from Step 3, Lemma 2 and Theorem 4,

    $c^Tx^k - c^* = \sum_{l=k}^{\infty}c^T(x^l - x^{l+1}) \geq \frac{1}{\rho}\sum_{l=k}^{\infty}\|x^{l+1} - x^l\| \geq \frac{1}{\rho}\|x^k - x^*\| \geq \frac{1}{\rho}x_j^k$

where the last inequality uses $x_j^* = 0$, and our result follows with $\delta = \frac{1}{\rho}$. ∎

4 The Dual Sequence $\{(y^k, s^k)\}$

From system (4), it is readily seen that

    $\Delta y = (ADA^T)^{-1}b.$

$\Delta y$ can also be obtained by solving the following quadratic program:

    maximize   $b^T\Delta y$                                       (9)
    subject to $\Delta y^TADA^T\Delta y \leq 1.$

The following theorem relates the solution vector of this quadratic program to its objective function value.

Theorem 7 There exists a positive constant $\rho > 0$, which is determined only by $A$ and $b$, such that

    $\|\Delta y\| \leq \rho\, b^T\Delta y.$

Proof: See Theorem 5, Saigal [10]. ∎
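Since $ADA^T$ is positive definite whenever $A$ has full row rank and $x, s > 0$, the dual step always ascends: $b^T\Delta y = b^T(ADA^T)^{-1}b > 0$ for $b \neq 0$. This is the heart of Theorem 8 below, and a one-line check illustrates it (random illustrative data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 7
A = rng.standard_normal((m, n))
x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
b = A @ x                                  # so that x is primal feasible

M = (A * (x / s)) @ A.T                    # A D A^T, positive definite
dy = np.linalg.solve(M, b)
print(b @ dy > 0)                          # True: the dual objective ascends
```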

We can now prove that the dual objective function is strictly monotone increasing on the sequence $\{y^k\}$. This result also appears in [13].

Theorem 8 The sequence $\{b^Ty^k\}$ is strictly monotone increasing and thus converges, say to $b^*$.

Proof: The result follows from the fact that

    $b^T(y^{k+1} - y^k) = \frac{\alpha}{\theta_k}b^T\Delta y^k = \frac{\alpha}{\theta_k}b^T(AD_kA^T)^{-1}b$

and that $AD_kA^T$ is a positive definite matrix and, from Assumption 2, $b \neq 0$. Our result now follows since the sequence $\{b^Ty^k\}$ is bounded (by the weak duality theorem) and every bounded monotone sequence converges. ∎

We are now ready to show that the dual sequence also converges.

Theorem 9 The sequences $\{y^k\}$ and $\{s^k\}$ converge, say to $y^*$ and $s^*$ respectively.

Proof: From Step 3 and Theorems 7 and 8, we see that

    $\infty > b^* - b^Ty^0 = \sum_{k=0}^{\infty}b^T(y^{k+1} - y^k) = \sum_{k=0}^{\infty}\frac{\alpha}{\theta_k}b^T\Delta y^k \geq \frac{1}{\rho}\sum_{k=0}^{\infty}\frac{\alpha}{\theta_k}\|\Delta y^k\| = \frac{1}{\rho}\sum_{k=0}^{\infty}\|y^{k+1} - y^k\|$

and thus $\{y^k\}$ is a Cauchy sequence and thus converges, say to $y^*$. But $s^k = c - A^Ty^k$, and thus $\{s^k\}$ also converges, to $s^* = c - A^Ty^*$. ∎

The following theorem plays an important role in proving convergence to optimality.

Theorem 10 There exists a $\delta > 0$ such that for each $j$ with $s_j^* = 0$, and every $k = 1, 2, \ldots$,

    $\frac{b^* - b^Ty^k}{s_j^k} \geq \delta.$

Proof: Follows exactly as the proof of Theorem 6, using the fact that

    $\|s^k - s^*\| = \|A^T(y^* - y^k)\| \leq K\|y^k - y^*\|$

for some constant $K > 0$. ∎

5 The Primal-Dual Sequences

We have seen that there exist vectors $x^* \geq 0$, $y^*$ and $s^* \geq 0$ such that

    $x^k \to x^*, \quad y^k \to y^*, \quad s^k \to s^*.$

Define

    $B = \{j : x_j^* > 0\}, \quad N = \{j : s_j^* > 0\}.$          (10)

We do not assume here that $B \cap N = \emptyset$ nor that $B \cup N = \{1, \ldots, n\}$. We will obtain some important results in this section; these will be used in the next section to establish the optimality of the limit points.

We now prove two simple lemmas:

Lemma 11 The sequence $\{(x^k)^Ts^k\}$ is monotone decreasing.

Proof: We note that, using system (4) and Step 3, we obtain

    $(x^{k+1})^Ts^{k+1} = \left(x^k + \frac{\alpha}{\theta_k}\Delta x^k\right)^T\left(s^k + \frac{\alpha}{\theta_k}\Delta s^k\right) = \left(1 - \frac{\alpha}{\theta_k}\right)(x^k)^Ts^k$

since $(s^k)^T\Delta x^k + (x^k)^T\Delta s^k = -(x^k)^Ts^k$ and $(\Delta x^k)^T\Delta s^k = 0$, and our result follows from Theorem 1. ∎

Lemma 12 For every $k = 1, 2, \ldots$

    $\|X_k^{-\frac{1}{2}}S_k^{\frac{1}{2}}\Delta x^k\| \leq ((x^k)^Ts^k)^{\frac{1}{2}}$ and $\|X_k^{\frac{1}{2}}S_k^{-\frac{1}{2}}\Delta s^k\| \leq ((x^k)^Ts^k)^{\frac{1}{2}}.$

Proof: This lemma readily follows from the following facts: for $\bar{D}_k = (X_k^{-1}S_k)^{\frac{1}{2}}$, from system (4) we obtain

    $\bar{D}_k\Delta x^k + \bar{D}_k^{-1}\Delta s^k = -(X_kS_k)^{\frac{1}{2}}e$

and $(\bar{D}_k\Delta x^k)^T(\bar{D}_k^{-1}\Delta s^k) = (\Delta x^k)^T\Delta s^k = 0$. Thus $\|\bar{D}_k\Delta x^k\| \leq \|(X_kS_k)^{\frac{1}{2}}e\| = ((x^k)^Ts^k)^{\frac{1}{2}}$. We are done, as the other result follows by the same argument. ∎
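Both lemmas are exact statements about a single iteration and are easy to check numerically. A sketch with illustrative random data, made primal feasible by construction (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 8
A = rng.standard_normal((m, n))
x, s = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
b = A @ x                                   # makes x primal feasible

D = x / s
dy = np.linalg.solve((A * D) @ A.T, b)
ds = -A.T @ dy
dx = -x - D * ds
theta = max((-dx / x).max(), (-ds / s).max())
alpha = 0.5

gap_new = (x + (alpha/theta)*dx) @ (s + (alpha/theta)*ds)
print(np.isclose(gap_new, (1 - alpha/theta) * (x @ s)))       # Lemma 11
print(np.linalg.norm(np.sqrt(s / x) * dx) <= np.sqrt(x @ s))  # Lemma 12
```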

We now establish an important property of the sequences $\{\Delta x^k\}$, $\{\Delta s^k\}$ and $\{(x^k)^Ts^k\}$. This theorem also appears in Ye, Guler, Tapia and Zhang [16].

Theorem 13 There exist constants $\beta_1 > 0$ and $\beta_2 > 0$ such that for all $k = 1, 2, \ldots$

    $\|\Delta x^k\| \leq \beta_1(x^k)^Ts^k$
    $\|\Delta s^k\| \leq \beta_2(x^k)^Ts^k.$

Proof: From the definitions and the weak duality theorem we derive the following inequalities:

    $(x^k)^Ts^k = c^Tx^k - b^Ty^k \geq c^Tx^k - c^Tx^* = (s^* + A^Ty^*)^T(x^k - x^*) = (s^*)^Tx^k - (x^*)^Ts^*.$

From Lemma 11, $(x^k)^Ts^k \geq (x^*)^Ts^*$; adding the two inequalities gives $2(x^k)^Ts^k \geq (s^*)^Tx^k$, and thus there is a $\mu_1 > 0$ such that $(x^k)^Ts^k \geq \mu_1\|x_N^k\|$. Also, from Lemma 12,

    $\|\Delta x_N^k\| \leq \|X_{k,N}^{\frac{1}{2}}S_{k,N}^{-\frac{1}{2}}\|\,\|X_{k,N}^{-\frac{1}{2}}S_{k,N}^{\frac{1}{2}}\Delta x_N^k\| \leq \|X_{k,N}^{\frac{1}{2}}S_{k,N}^{-\frac{1}{2}}\|\,((x^k)^Ts^k)^{\frac{1}{2}} \leq \beta_3(x^k)^Ts^k$

for some $\beta_3 > 0$, since $s_N^k \to s_N^* > 0$ while $\|x_N^k\| \leq \frac{1}{\mu_1}(x^k)^Ts^k$; and, from Theorem 4, for some $\rho > 0$,

    $\|\Delta x^k\| \leq -\rho\, c^T\Delta x^k = -\rho(s^* + A^Ty^*)^T\Delta x^k = -\rho(s^*)^T\Delta x_N^k \leq \rho\|s^*\|\,\|\Delta x_N^k\|$

since $A\Delta x^k = 0$ and $s_j^* = 0$ for $j \notin N$. Combining these bounds, for some $\beta_1 > 0$ we have our first result. The second result follows in a similar manner, with the use of Theorem 7. ∎
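Theorem 13 says the ratios $\|\Delta x^k\|/(x^k)^Ts^k$ and $\|\Delta s^k\|/(x^k)^Ts^k$ stay bounded along the entire run. Tracking them on the small illustrative example used earlier (a sketch, not from the paper):

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 0.0])
x, y = np.array([0.5, 0.5]), np.array([-1.0]); s = c - A.T @ y
alpha = 0.5
for k in range(20):
    D = x / s
    dy = np.linalg.solve((A * D) @ A.T, b)
    ds = -A.T @ dy
    dx = -x - D * ds
    gap = x @ s
    print(k, np.linalg.norm(dx) / gap, np.linalg.norm(ds) / gap)  # bounded
    theta = max((-dx / x).max(), (-ds / s).max())
    x, y, s = x + (alpha/theta)*dx, y + (alpha/theta)*dy, s + (alpha/theta)*ds
```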

The next theorem characterizes the convergence property of the sequences:

Theorem 14 $(x^k)^Ts^k \to 0$ if and only if

    $X_{k,N}^{-1}\Delta x_N^k \to -e$                              (12)
    $S_{k,B}^{-1}\Delta s_B^k \to -e.$                             (13)

Proof: Let $(x^k)^Ts^k \to 0$. Then $B \cap N = \emptyset$. From Theorem 13, $\Delta x^k \to 0$ and $\Delta s^k \to 0$. Also, from system (4),

    $X_{k,B}^{-1}\Delta x_B^k + S_{k,B}^{-1}\Delta s_B^k = -e.$

Thus $x_B^* > 0$ implies (13). Similarly, from (4),

    $X_{k,N}^{-1}\Delta x_N^k + S_{k,N}^{-1}\Delta s_N^k = -e.$

Thus $s_N^* > 0$ implies (12). We complete the proof by noting that if the conditions (12) and (13) hold, then, by the identity (14) below, $\Delta s_N^k \to 0$ and $\Delta x_B^k \to 0$. Thus, if there is a $j \in B \cap N$, then $\Delta x_j^k \to 0$ while (12) gives $\Delta x_j^k/x_j^k \to -1$ with $x_j^k \to x_j^* > 0$, which is a contradiction. Hence $B \cap N = \emptyset$, and, with Lemma 11, $(x^k)^Ts^k \to (x^*)^Ts^* = 0$. ∎

We now establish an important property:

Theorem 15 For each $j = 1, \ldots, n$ and $k = 1, 2, \ldots$

    $-1 \leq \frac{\Delta x_j^k\Delta s_j^k}{\theta_k^2 x_j^k s_j^k} \leq 1.$

Proof: Let $k$ be arbitrary. From system (4), for each $j = 1, \ldots, n$,

    $\frac{\Delta x_j^k}{x_j^k} + \frac{\Delta s_j^k}{s_j^k} = -1.$                 (14)

Thus, either $\Delta x_j^k \leq 0$ and $\Delta s_j^k \leq 0$, or $\Delta x_j^k\Delta s_j^k < 0$. In the first case, by the definition of $\theta_k$, we have

    $0 \geq \frac{\Delta x_j^k}{\theta_k x_j^k} \geq -1$ and $0 \geq \frac{\Delta s_j^k}{\theta_k s_j^k} \geq -1$

and thus their product satisfies the required inequality. To see the second case, without loss of generality assume that $\Delta x_j^k > 0$. Then $0 > \frac{\Delta s_j^k}{\theta_k s_j^k} \geq -1$, and so

    $0 < \frac{\Delta x_j^k}{\theta_k x_j^k} = -\frac{1}{\theta_k} - \frac{\Delta s_j^k}{\theta_k s_j^k} < 1$

and we are done. ∎

We now state a technical lemma on the natural logarithm function.

Lemma 16 Let $w$ be an $n$-vector and $0 \leq \lambda < 1$ be such that $w_j \leq \lambda$ for each $j$. Then

    $\sum_{j=1}^{n}\log(1 - w_j) \geq -e^Tw - \frac{\|w\|^2}{2(1 - \lambda)}.$

Proof: See Lemma 8, Saigal [10]. ∎
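Lemma 16 is the standard second-order lower bound on the logarithm for arguments bounded away from zero. A quick numerical check (illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.8
w = rng.uniform(-1.0, lam, 50)          # entries bounded above by lambda
lhs = np.sum(np.log1p(-w))              # sum of log(1 - w_j)
rhs = -np.sum(w) - (w @ w) / (2 * (1 - lam))
print(lhs >= rhs)                       # True
```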

6 On Convergence to Optimality

We are now ready to investigate the limits $x^*$, $y^*$ and $s^*$ as optimal solutions of their respective problems. We note that there is a $j$ such that either $x_j^* = 0$ or $s_j^* = 0$ or both. With such a $j$, we define the following potential function:

    $F_r(x, s, j) = (2r + 1)\log(x^Ts) - \log(x_js_j)$

where $r$ is a large positive constant. This function is a specialization of the one considered by Tanabe [12], Todd and Ye [14] and Mizuno and Nagasawa [7]. We can then prove the following about this function.

Theorem 17 Let $(x^k)^Ts^k \not\to 0$. Then $F_r(x^k, s^k, j) \to \infty$.

Proof: Since $x_j^*s_j^* = 0$, from Theorems 5 and 9,

    $x_j^ks_j^k \to 0$

and thus, under the hypothesis of the theorem, $F_r(x^k, s^k, j)$ cannot be bounded above. ∎
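The main theorem below shows that, for small enough $\alpha$, this potential cannot increase, which by Theorem 17 forces the duality gap to zero whenever a strictly complementary pair of limits exists. The decrease is easy to watch on the small example of section 3, where $x_1^* = 0$ and $s_1^* > 0$; the sketch below is illustrative, with an arbitrary choice $r = 5$:

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 0.0])
x, y = np.array([0.5, 0.5]), np.array([-1.0]); s = c - A.T @ y
alpha, r, j = 0.5, 5.0, 0               # alpha < 2/(1 + sqrt(5)) ~ 0.618

def F(x, s):
    return (2*r + 1) * np.log(x @ s) - np.log(x[j] * s[j])

for k in range(20):
    D = x / s
    dy = np.linalg.solve((A * D) @ A.T, b)
    ds = -A.T @ dy
    dx = -x - D * ds
    theta = max((-dx / x).max(), (-ds / s).max())
    F_old = F(x, s)
    x, y, s = x + (alpha/theta)*dx, y + (alpha/theta)*dy, s + (alpha/theta)*ds
    print(k, F(x, s) - F_old)           # non-positive for r this large
```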

We now establish our main theorem.

Theorem 18 Let $\alpha < 2/(1 + \sqrt{5})$. Then, there exist vectors $x^*$, $y^*$ and $s^*$ such that

1. $x^k \to x^*$
2. $y^k \to y^*$
3. $s^k \to s^*$

where exactly one of the following holds:

1. $x^*$ is an optimum solution of the primal, and $(y^*, s^*)$ is an optimum solution of the dual.
2. $B = N$, i.e., $x^*$ and $(y^*, s^*)$ are non-optimum solutions where no pair of variables satisfies the strict complementarity condition.

Proof: The convergence of the sequences, and thus the existence of the required vectors, follows from Theorems 5 and 9. Let $\alpha$ satisfy the hypothesis of the theorem, and let $\alpha_k = \frac{\alpha}{\theta_k}$. Also, let

    $\beta = \frac{\alpha}{1 - \alpha} + \frac{\alpha^3}{2(1 - \alpha)(1 - \alpha - \alpha^2)}$

and note that $1 - \alpha - \alpha^2 > 0$ for the chosen $\alpha$. Either $B = N$, or there is a $j$ such that either $x_j^* = 0$ and $s_j^* > 0$, or $s_j^* = 0$ and $x_j^* > 0$. Let the potential function $F_r(x^k, s^k, j)$ be defined for this $j$, with some $r$. Now, using the identity (14),

    $x_j^{k+1}s_j^{k+1} = \left(x_j^k + \frac{\alpha}{\theta_k}\Delta x_j^k\right)\left(s_j^k + \frac{\alpha}{\theta_k}\Delta s_j^k\right) = x_j^ks_j^k(1 - \alpha_k + \alpha^2w_k)$    (11)

where

    $w_k = \frac{\Delta x_j^k\Delta s_j^k}{\theta_k^2 x_j^ks_j^k}.$

First assume that $x_j^* = 0$ and $s_j^* > 0$. The analysis of the case when $x_j^* > 0$ and $s_j^* = 0$ is similar. Let $r > 0$ be sufficiently large such that there is an $L \geq 1$ with

    $\beta\frac{|\Delta s_j^k|}{s_j^k} < 2r$ for every $k \geq L$.

Such an $r$ exists from our choice of $\alpha$ and from Theorems 9 and 13, which keep $s_j^k$ away from zero and $\|\Delta s^k\|$ bounded. By a simple substitution of (11) and the identity in the proof of Lemma 11, we obtain

    $F_r(x^{k+1}, s^{k+1}, j) - F_r(x^k, s^k, j) = (2r + 1)\log(1 - \alpha_k) - \log(1 - \alpha_k + \alpha^2w_k) = 2r\log(1 - \alpha_k) - \log\left(1 + \frac{\alpha^2w_k}{1 - \alpha_k}\right).$

Using the fact that $\log(1 - \alpha_k) \leq -\alpha_k$ and Lemma 16 (applicable since $\frac{\alpha^2|w_k|}{1 - \alpha_k} \leq \frac{\alpha^2}{1 - \alpha} < 1$), we obtain

    $F_r(x^{k+1}, s^{k+1}, j) - F_r(x^k, s^k, j) \leq -2r\alpha_k + \frac{\alpha^2|w_k|}{1 - \alpha_k} + \left(\frac{\alpha^2w_k}{1 - \alpha_k}\right)^2\frac{1 - \alpha}{2(1 - \alpha - \alpha^2)}.$    (15)

From Theorem 15 and its proof we have

    $|w_k| \leq 1$ and $\theta_k|w_k| = \left|\frac{\Delta x_j^k}{\theta_k x_j^k}\right|\frac{|\Delta s_j^k|}{s_j^k} \leq \frac{|\Delta s_j^k|}{s_j^k}$

and, from Theorem 1, $\alpha_k < \alpha$. Thus, substituting the above expressions in (15), and using $\alpha^2 = \alpha_k\theta_k\alpha$, we obtain

    $F_r(x^{k+1}, s^{k+1}, j) - F_r(x^k, s^k, j) \leq \alpha_k\left(-2r + \beta\frac{|\Delta s_j^k|}{s_j^k}\right)$

and for the chosen $r$ the expression in the bracket remains strictly less than zero for all $k \geq L$. Thus $\{F_r(x^k, s^k, j)\}$ cannot increase beyond the iterate $L$, so from Theorem 17, $(x^k)^Ts^k \to 0$; thus $(x^*)^Ts^* = 0$, by (3) the limits are optimal, and our theorem follows. ∎

7 Concluding Remarks

It is generally believed that the primal-dual methods will not converge to optimality without "centering" steps. Our Theorem 18 does not contradict this. Another proof of this theorem is given in the appendix, and it gives more insight into the non-convergence to optimality. Also, convergence to optimality follows readily if the sequence $\{\Delta y^k\}$ converges. So long as at least one pair of complementary variables satisfies the strict complementarity condition, convergence to optimality is guaranteed. Adler and Monteiro [2] show that for an infinitesimal step size, Part 2 of Theorem 18 cannot occur. This paper is written in the hope that strong believers in centering will look for an example where convergence to optimality for large step sizes does not occur.

8 The Appendix

We now give another proof of Theorem 18 which does not make use of a potential function, and gives further insight into non-convergence to optimality. It also indicates that convergence to optimality occurs if the convergence rate of the sequence $\{c^Tx^k - c^*\}$ or $\{b^* - b^Ty^k\}$ is at least linear. We now prove a simple lemma before proving the main theorem of this section.

Lemma 19 Let $\alpha_j = \frac{\alpha}{\theta_j}$ for each $j = 1, 2, \ldots$ Then

1. $(x^k)^Ts^k \not\to 0$ if and only if $\sum_{j=1}^{\infty}\log(1 - \alpha_j) > -\infty$.
2. $\sum_{j=1}^{\infty}\log(1 - \alpha_j) > -\infty$ if and only if $\sum_{j=1}^{\infty}\alpha_j < \infty$.

Proof: Note that

    $(s^{k+1})^Tx^{k+1} = (1 - \alpha_k)(s^k)^Tx^k = \prod_{j=0}^{k}(1 - \alpha_j)\,(s^0)^Tx^0$

and Part 1 follows by taking logarithms of the above expression. To see the "only if" part of Part 2, since $\log(1 - \alpha_j) \leq -\alpha_j$, note that

    $-\infty < \sum_{j=1}^{\infty}\log(1 - \alpha_j) \leq -\sum_{j=1}^{\infty}\alpha_j.$

To see the "if" part, we note that from Lemma 16 (with $\lambda = \alpha$, since each $\alpha_j \leq \alpha < 1$),

    $\sum_{j=0}^{\infty}\log(1 - \alpha_j) \geq -\sum_{j=0}^{\infty}\alpha_j - \sum_{j=0}^{\infty}\frac{\alpha_j^2}{2(1 - \alpha)} > -\infty$

and we are done. ∎
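The product identity in this proof is exact: the duality gap at iterate $k$ equals $(s^0)^Tx^0\prod_{j<k}(1 - \alpha_j)$, so the gap vanishes exactly when the step fractions $\alpha_j$ do not shrink too fast. Watching it on the earlier small example (illustrative sketch, not from the paper):

```python
import numpy as np

A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 0.0])
x, y = np.array([0.5, 0.5]), np.array([-1.0]); s = c - A.T @ y
alpha, prod, gap0 = 0.5, 1.0, x @ s
for k in range(15):
    D = x / s
    dy = np.linalg.solve((A * D) @ A.T, b)
    ds = -A.T @ dy
    dx = -x - D * ds
    theta = max((-dx / x).max(), (-ds / s).max())
    prod *= 1 - alpha / theta
    x, y, s = x + (alpha/theta)*dx, y + (alpha/theta)*dy, s + (alpha/theta)*ds
    print(k, np.isclose(x @ s, gap0 * prod))     # True at every iterate
```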

The next theorem gives another proof of Theorem 18.

Theorem 20 Let $\sum_{k=1}^{\infty}\log(1 - \alpha_k) > -\infty$ (equivalently, by Lemma 19, $(x^k)^Ts^k \not\to 0$) and let $x_j^ks_j^k \to 0$ for some $j$. Then

1. $\theta_k \to \infty$; and
2. for that $j$, with $a_k = \frac{\Delta x_j^k\Delta s_j^k}{x_j^ks_j^k}$:
   (a) $x_j^k \to 0$,
   (b) $s_j^k \to 0$,
   (c) the sequence $\{a_k\}$ is unbounded.

Proof: From Lemma 19, $\sum_k\alpha_k < \infty$, so $\alpha_k \to 0$ and thus $\theta_k \to \infty$, which is Part 1. To see 2(c), assume on the contrary that the sequence $\{a_k\}$ is bounded, say $|a_k| \leq R$ for all $k$. From Step 3 and the identity (14),

    $x_j^{k+1}s_j^{k+1} = x_j^ks_j^k(1 - \alpha_k + \alpha_k^2a_k).$

Let $b_k = x_j^ks_j^k$. Then

    $b_{r+1} = b_k\prod_{i=k}^{r}(1 - \alpha_i + \alpha_i^2a_i).$

Thus

    $\log(b_{r+1}) = \log(b_k) + \sum_{i=k}^{r}\log(1 - \alpha_i) + \sum_{i=k}^{r}\log\left(1 + \frac{\alpha_i^2a_i}{1 - \alpha_i}\right) \geq \log(b_k) - M + N$

where

    $M = -\sum_{i=k}^{\infty}\log(1 - \alpha_i) < \infty$

by the hypothesis and $N = \sum_{i=k}^{r}\log\left(1 + \frac{\alpha_i^2a_i}{1 - \alpha_i}\right)$. Using Lemma 16,

    $N \geq -\sum_{i=k}^{r}\frac{\alpha_i^2|a_i|}{1 - \alpha_i} - \sum_{i=k}^{r}\left(\frac{\alpha_i^2a_i}{1 - \alpha_i}\right)^2\frac{1}{2(1 - \lambda)}$

for any $0 \leq \lambda < 1$ bounding the terms $-\frac{\alpha_i^2a_i}{1 - \alpha_i}$ from above.

Since $\alpha_i \to 0$, there is an $L \geq 1$ such that for all $i \geq L$, $\alpha_i < \frac{1}{2}$ and $\frac{\alpha_i^2R}{1 - \alpha_i} \leq \frac{1}{2}$, so that we may take $\lambda = \frac{1}{2}$ and, without loss of generality, $k \geq L$. Then

    $\frac{\alpha_i^2|a_i|}{1 - \alpha_i} \leq 2\alpha_i^2R$ and $\left(\frac{\alpha_i^2a_i}{1 - \alpha_i}\right)^2 \leq 4\alpha_i^4R^2$

and, since $\sum_i\alpha_i < \infty$ implies that $\sum_i\alpha_i^2$ and $\sum_i\alpha_i^4$ also converge,

    $N \geq -\sum_{i=k}^{\infty}\left(2\alpha_i^2R + 4\alpha_i^4R^2\right) = -M' > -\infty.$

Thus, for every $r \geq k$,

    $\log(b_{r+1}) \geq \log(b_k) - M - M' > -\infty$

and so, as $r \to \infty$, $b_{r+1} \not\to 0$, which contradicts the fact that $x_j^ks_j^k \to 0$. Thus 2(c) follows. To see 2(a) and 2(b), note that, by the identity (14),

    $a_k = \frac{\Delta x_j^k}{x_j^k}\cdot\frac{\Delta s_j^k}{s_j^k}$ with $\frac{\Delta x_j^k}{x_j^k} + \frac{\Delta s_j^k}{s_j^k} = -1$

so that $a_k \leq \frac{1}{4}$ always, and the unbounded sequence $\{a_k\}$ has a subsequence diverging to $-\infty$; along it both ratios are unbounded. Since the numerators of these expressions are bounded (by Theorem 13 and Lemma 11), the denominators must go to zero along this subsequence and, as $x_j^k$ and $s_j^k$ converge by Theorems 5 and 9, we have parts 2(a) and 2(b) from part 2(c). ∎

In particular, if $(x^k)^Ts^k \not\to 0$, then every $j$ with $x_j^*s_j^* = 0$ has $x_j^* = s_j^* = 0$, so no pair of variables satisfies the strict complementarity condition and $B = N$; this is precisely the dichotomy of Theorem 18.

References

[1] I. Adler, N. K. Karmarkar, M. G. C. Resende and G. Veiga, "An implementation of Karmarkar's algorithm for linear programming," Mathematical Programming 44 (1989) 297-335.

[2] I. Adler and R. D. C. Monteiro, "Limiting behavior of the affine scaling continuous trajectories for linear programming," Mathematical Programming 51 (1991) 29-51.

[3] E. R. Barnes, "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming 36 (1986) 174-182.

[4] I. I. Dikin, "Iterative solution of problems of linear and quadratic programming," Soviet Mathematics Doklady 8 (1967) 674-675.

[5] N. Karmarkar, "A new polynomial-time algorithm for linear programming," Combinatorica 4 (1984) 373-395.

[6] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point algorithm for linear programming," in: N. Megiddo, ed., Progress in Mathematical Programming, Interior Point and Related Methods (Springer, New York, 1989) pp. 29-47.

[7] S. Mizuno and A. Nagasawa, "A primal-dual affine scaling potential reduction algorithm for linear programming," Mathematical Programming 62 (1993) 119-131.

[8] R. D. C. Monteiro and I. Adler, "Interior path following primal-dual algorithms. Part 1," Mathematical Programming 44 (1989) 27-41.

[9] R. D. C. Monteiro, I. Adler and M. G. C. Resende, "A polynomial time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension," Mathematics of Operations Research 15 (1990) 191-214.

[10] R. Saigal, "A simple proof of the primal affine scaling algorithm," Technical Report No. 92-60, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan, December 1992.

[11] R. Saigal, "Efficient variants of the affine scaling method," Technical Report No. 93-21, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan, August 20, 1993 (revised October 22, 1993).

[12] K. Tanabe, "Centered Newton method for linear programming and quadratic programming," Proceedings of the 8th Mathematical Programming Symposium, Japan (1987) 131-152.

[13] M. Todd, L. Tuncel and S. Mizuno, "Monotonicity of primal-dual objective values in primal-dual interior-point algorithms," Research Report 1014, ORIE, Cornell University, Ithaca, New York (1992).

[14] M. Todd and Y. Ye, "A centered projective algorithm for linear programming," Mathematics of Operations Research 15 (1990) 508-529.

[15] T. Tsuchiya and M. Muramatsu, "Global convergence of a long-step affine scaling algorithm for degenerate linear programming problems," Research Memorandum 423, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan (January 1992; revised September 1992). (To appear in SIAM Journal on Optimization.)

[16] Y. Ye, O. Guler, R. A. Tapia and Y. Zhang, "A quadratically convergent O(√nL)-iteration algorithm for linear programming," Mathematical Programming 59 (1993) 151-162.