A SIMPLIFIED PROOF OF THE GENERAL CONVERGENCE OF AFFINE SCALING

John R. Birge
Charles Rosa

Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109-2117, USA

Technical Report 91-7
March 1991
Revised September 1991

A Simplified Proof of the Convergence of Affine Scaling

John R. Birge*, Charles Rosa

Abstract: This paper analyzes the convergence of the (dual) affine scaling interior point algorithm for linear programs. The analysis includes a primal nondegeneracy assumption and allows full steps in the normalized search direction. The proof is motivated within the geometry of the problem and does not rely on reference to the projective scaling step.

* Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, MI 48109, USA. His work was supported by Office of Naval Research Grant N0014-86-K-0628 and the National Science Foundation under Grant ECS-885101.

1. Introduction

Following Karmarkar's [8] projective scaling algorithm, several authors proposed affine scaling methods (Vanderbei, Meketon and Freedman [12], Barnes [3]) and showed convergence under primal and dual nondegeneracy. The method had actually been proposed earlier by Dikin [5], who also gave a proof of convergence using only primal nondegeneracy [6]. Vanderbei and Lagarias [13] later refined this proof and answered some open questions concerning it. All convergence proofs assumed some form of nondegeneracy until Tsuchiya's [11] proof that affine scaling (in the "dual" form also proposed by Adler et al. [1] and Monma and Morton [9] for affine scaling and by Freund [7] for projective scaling) is convergent without any degeneracy assumptions, but with a maximum step size of 1/8 of the maximum distance within an inscribed ellipsoid. We also use this inequality form of affine scaling (which is equivalent to the equality primal form, as noted by Tsuchiya and explained in Birge and Freund [4]). We derive a proof with full step sizes in this form but again include a primal nondegeneracy assumption. A key to the result is a lemma from Vanderbei and Lagarias about the boundedness of a fundamental transformation.

2. General Problems and Affine Scaling

We consider a linear program of the form:

$\max\ c^Tx$ subject to $Ax \le b$,   (1)

where $x \in \mathbb{R}^n$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$, with a finite optimal value $c^Tx^* = z^*$. We assume the following:

(A1) The feasible region has an interior point, the set of optima is nonempty, and $c \ne 0$;
(A2) $A$ has full rank;
(A3) The set of optima in (1) is bounded;

(A4) Every extreme point of the feasible region is nondegenerate.

We also suppose that a strict interior feasible point is available ($x^0$ such that $Ax^0 < b$). We may obtain such a starting value using phase-one procedures as, for example, in Anstreicher [2] and Karmarkar [8]. We use the form in (1) because of its generality and intuitive geometry. It is often called the dual form of a linear program with nonnegative variables and equality constraints, but it may be derived directly. Equality constraints may also be added to (1), which would complicate the following analysis through an additional projection, although the results would remain the same.

We now give the affine scaling method. Our approach is similar to the original procedure in Dikin [5,6]. Given an initial interior point, the procedure creates a sequence of points $x^0, x^1, \ldots, x^k, \ldots$ by following the basic steps of finding a search direction, $u^k$, choosing a step size, $\alpha_k$, and updating the iterate, $x^{k+1} = x^k - \alpha_k u^k$. We describe these steps below.

Affine Scaling Algorithm

Step 1. Search direction. Consider an iterate $x^k$ and a positive diagonal matrix $W^k = \mathrm{diag}\{w^k\}$. Often, the matrix $W^k$ is composed of the slack variables, $s^k = b - Ax^k > 0$, so that $w^k = s^k$. This choice is not necessary (as long as the ratios between the corresponding elements of $w^k$ and $s^k$ remain bounded), but we will use it to simplify the analysis. In any ascent procedure, we would like to find a step $v$ that improves the objective and is feasible, i.e., such that $A(x^k + v) \le b$, or

$(W^k)^{-1}Av \le (W^k)^{-1}(b - Ax^k) = e$ (if $w^k = s^k$),   (2)

where $e$ is the vector of ones, and $c^Tv > 0$. Let this set of feasible directions be $V^k$, which we note depends on $x^k$. An intuitive simplifying transformation is to let $y = (W^k)^{-1}Av$. Its inverse transformation is $v = (A^T(W^k)^{-2}A)^{-1}A^T(W^k)^{-1}y$.

The objective, $c^Tv$, becomes $d^{kT}y$, where $d^{kT} = c^T(A^T(W^k)^{-2}A)^{-1}A^T(W^k)^{-1}$. The region in (2) becomes $Y = \{y \mid y \le e\}$, plus a restriction that $y$ must lie in the subspace spanned by the columns of $(W^k)^{-1}A$. We let this subspace be $F^k = \{y \mid (I - (W^k)^{-1}A(A^T(W^k)^{-2}A)^{-1}A^T(W^k)^{-1})y = 0\}$. Problem (1) becomes

$\max\{d^{kT}y\ (+\,c^Tx^k) \mid y \in Y \cap F^k\}$.   (3)

Since the unit ball is completely contained in the feasible region $Y$, it seems reasonable to choose a search direction by optimizing $d^{kT}y$ over the intersection of $F^k$ and the unit ball, $S = \{y \mid y^Ty \le 1\}$. The affine scaling search direction is, therefore, chosen by solving

$\max\{d^{kT}y \mid y \in S \cap F^k\}$.   (4)

Since $d^k \in F^k$, the optimal solution of (4) is $y^k = d^k/\|d^k\|$. The search direction in the original problem variables is $v^k$, where

$v^k = (A^T(W^k)^{-2}A)^{-1}c\,/\,\sqrt{c^T(A^T(W^k)^{-2}A)^{-1}c}$.

This is often given as $u^k = -(A^T(W^k)^{-2}A)^{-1}c$, where $x^{k+1} = x^k - \alpha_ku^k$ for some step size $\alpha_k$.

Step 2. Step size. We have already noted that we can choose a feasible step size $\alpha_k$ as large as $\alpha^* = 1/\sqrt{c^T(A^T(W^k)^{-2}A)^{-1}c}$, as in the convergence proofs in Dikin [6] and Vanderbei and Lagarias [13], although larger step sizes appear to work better in practice. Other procedures take a step as some fraction $\gamma$ of the maximum feasible step size $\bar\alpha = \min_i\{w_i^k/(-a_i^Tu^k) \mid a_i^Tu^k < 0\}$ or, in Tsuchiya's approach, as the fraction $1/8$ of $\alpha^*$. If $\alpha^* = \bar\alpha$, then a face has been hit and the algorithm stops. Otherwise, the algorithm proceeds to Step 3.

Step 3. Update. Let $x^{k+1} = x^k + v^k$ and $w^{k+1} = b - Ax^{k+1}$. We continue with a full update of $W^{k+1}$, although rank-one updating procedures can be used here, as in projective scaling methods (see Karmarkar [8] and Shanno [10]).
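To make Steps 1-3 concrete, the following is a minimal NumPy sketch of the iteration with $W^k = \mathrm{diag}(s^k)$ and the full step $\alpha_k = \alpha^*$. The dense linear algebra, the tolerance `tol`, and the iteration cap are illustrative assumptions of ours, not part of the method as analyzed.

```python
import numpy as np

def affine_scaling(A, b, c, x, tol=1e-9, max_iter=500):
    """Sketch of the dual affine scaling iteration for max c^T x s.t. Ax <= b,
    started from a strict interior point x (Ax < b)."""
    for _ in range(max_iter):
        w = b - A @ x                     # slacks s^k = b - Ax^k, so w^k = s^k
        if np.any(w <= 0):                # a face was hit exactly: the terminal
            break                         # (optimal) case of Lemma 1 below
        H = A.T @ np.diag(w**-2) @ A      # A^T (W^k)^{-2} A
        d = np.linalg.solve(H, c)         # (A^T (W^k)^{-2} A)^{-1} c  (= -u^k)
        gamma = np.sqrt(c @ d)            # c^T v^k = sqrt(c^T H^{-1} c) = 1/alpha^*
        if gamma < tol:                   # vanishing objective gain: stop
            break
        x = x + d / gamma                 # x^{k+1} = x^k + v^k = x^k - alpha^* u^k
    return x
```

With exact arithmetic the full step keeps $Ax \le b$, with equality only when $\alpha^* = \bar\alpha$; the slack test above detects that terminal case.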

3. Proof of Convergence

Assuming nondegeneracy of the constraints in (1), the affine scaling algorithm obtains an optimal point as a limit point of the sequence $x^k$ (see Vanderbei and Lagarias [13] for the proof for the dual to (1)). Tsuchiya has proved that this method also converges under degeneracy if a step of $\alpha_k = (1/8)\alpha^*$ is used. We will show that convergence is generally obtained with the full step, $\alpha_k = \alpha^*$. Our proof applies directly to the problem with inequality constraints. It relies on the basic geometry of the problem and is stated in the following theorem.

Theorem 1. Under the above assumptions, the affine scaling algorithm converges to an optimal solution of (1).

Before we proceed directly with the proof of this main theorem, we prove several useful lemmas. The first concerns the termination of the algorithm. In each lemma, we make only assumptions A1-A3.

Lemma 1. The affine scaling algorithm either terminates finitely with an optimal solution of (1) or obtains an infinite sequence of iterates.

Proof: The algorithm terminates if $\alpha^* = \bar\alpha$ in Step 2. In this case, there exists $i$ such that $a_i^T(x^k + v^k) = b_i$ (or $a_i^Tv^k = b_i - a_i^Tx^k$) and, for all $j \ne i$, $a_j^Tv^k < b_j - a_j^Tx^k$ (i.e., a face has been hit). All movement occurs within the ellipsoid $\{v \mid g(v) = v^T(A^T(W^k)^{-2}A)v \le 1\}$, and the $v^k$ we choose lies on the boundary of this ellipsoid, where the gradient of $g$ is simply a multiple of $c$. By definition of $v^k$, $g(v^k) = \sum_{j=1}^m (w_j^k)^{-2}(a_j^Tv^k)^2 = 1$, and, for $\alpha^* = \bar\alpha$, $(w_i^k)^{-1}a_i^Tv^k = 1$, which implies that $a_j^Tv^k = 0$ for all $j \ne i$. Hence, for $\nabla g(v) = 2A^T(W^k)^{-2}Av$, we have both $\nabla g(v^k) = 2(w_i^k)^{-1}a_i$ and $\nabla g(v^k) = 2c/\sqrt{c^T(A^T(W^k)^{-2}A)^{-1}c}$. So,

$c = \bigl(\sqrt{c^T(A^T(W^k)^{-2}A)^{-1}c}\,/\,w_i^k\bigr)\,a_i$

for an active constraint, $a_i^T(x^k + v^k) = b_i$. This implies that the K.K.T. conditions are met at $x^k + v^k$, implying optimality. Thus, the algorithm either terminates finitely with $x^k + v^k$ or produces an infinite sequence of feasible iterates. ∎
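As a numerical illustration of the boundary fact used in this proof (on toy data of our own, not from the paper), the full-step direction always satisfies $g(v^k) = 1$:

```python
import numpy as np

# Toy check that v^k lies on the boundary of the Dikin ellipsoid: g(v^k) = 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))              # any full-rank A (assumption A2)
c = rng.normal(size=3)
w = rng.uniform(0.5, 2.0, size=6)        # arbitrary positive slacks w^k
H = A.T @ np.diag(w**-2) @ A             # A^T (W^k)^{-2} A
d = np.linalg.solve(H, c)
v = d / np.sqrt(c @ d)                   # v^k = H^{-1} c / sqrt(c^T H^{-1} c)
print(np.isclose(v @ H @ v, 1.0))        # g(v^k) = (v^k)^T H v^k = 1 -> True
```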

Lemma 2. If the affine scaling algorithm produces an infinite sequence of iterates, then there exists some convergent subsequence of iterates in $x^k$ and $w^k$.

Proof: Since the set of optimal solutions is bounded and the optimal value is finite, there exists no nonzero $v$ such that $c^Tv \ge 0$ and $Av \le 0$. Hence, every level set $\{x \mid Ax \le b,\ c^Tx \ge c^Tx^0\}$ is bounded. Since the algorithm starts with a finite objective value and always improves the objective, the iterates remain within a bounded set. ∎

Lemma 3. Along any sequence $\{w^k \mid w^k > 0\}$, the corresponding price vector, $\pi(w^k) = (W^k)^{-2}A(A^T(W^k)^{-2}A)^{-1}c$, is bounded.

Proof: Using Cramer's rule and the Cauchy-Binet theorem, Vanderbei and Lagarias [13] prove that $f^TX^2A(A^TX^2A)^{-1}$ is bounded for any positive, diagonal matrix $X$ and fixed $f$ and $A$. Letting $X = (W^k)^{-1}$, $f_i = 1$ if $\pi(w^k)_i > 0$, and $f_i = -1$ if $\pi(w^k)_i \le 0$, we obtain that $\pi(w^k)_i$ is bounded for every $i$ and any $w^k > 0$. ∎

Lemma 4. If the affine scaling algorithm produces a convergent sequence of iterates in $\{w^k\}$, then the algorithm converges to an optimal solution of (1).

Proof: To show that an optimal solution occurs, we must show that complementary slackness holds and that the dual variables converge to a nonnegative vector in the limit. To show complementary slackness, observe that $c^Tv^k = (-c^Tu^k)^{1/2}$, where

$-c^Tu^k = c^T(A^T(W^k)^{-2}A)^{-1}c = \sum_{i=1}^m \bigl(c^T(A^T(W^k)^{-2}A)^{-1}a_i(w_i^k)^{-1}\bigr)^2$.

Since the objective values increase monotonically and are bounded (Lemma 2), $c^Tv^k = c^Tx^{k+1} - c^Tx^k \to 0$, so $c^Tu^k \to 0$, which implies that $c^T(A^T(W^k)^{-2}A)^{-1}a_i(w_i^k)^{-1} \to 0$. We write this as $w_i^k\pi_i^k \to 0$, where $\pi_i^k = (w_i^k)^{-2}a_i^T(A^T(W^k)^{-2}A)^{-1}c$. Hence, complementary slackness holds in the limit. Let $\bar w = \lim_k w^k$. Since $\pi^k$ is a bounded, continuous function of $w^k$, the sequence $\{\pi^k\}$ has some limit point. For any such limit point, $\bar\pi$, we must have $\bar\pi_j = 0$ for all $j \notin I = \{i \mid \bar w_i = 0\}$. However, we also have $\pi^{kT}A = c^T$, so $\bar\pi^TA = c^T$ and hence $\bar\pi_I^TA_I = c^T$. By the nondegeneracy assumption, the rows of $A_I$ are linearly independent. Thus, $\bar\pi_I^TA_I = c^T$ has a unique solution, $\bar\pi$, which is then the unique limit of $\{\pi^k\}$. Hence, the sequence $\{\pi^k\}$ converges.
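The identity $\pi(w^k)^TA = c^T$ invoked above holds for every positive $w$; a quick NumPy check on arbitrary data (our own toy construction) illustrates it:

```python
import numpy as np

# pi(w) = W^{-2} A (A^T W^{-2} A)^{-1} c satisfies A^T pi(w) = c for any w > 0,
# so every limit point of {pi^k} automatically satisfies the dual equations.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))
c = rng.normal(size=3)
w = rng.uniform(0.1, 2.0, size=6)        # arbitrary positive slack vector
W2inv = np.diag(w**-2)                   # (W^k)^{-2}
pi = W2inv @ A @ np.linalg.solve(A.T @ W2inv @ A, c)
print(np.allclose(A.T @ pi, c))          # pi^T A = c^T identically -> True
```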

To show that $\pi^k$ is nonnegative in the limit, note that:

$w_i^{k+1} = (b - Ax^{k+1})_i = w_i^k + (Ax^k)_i - (Ax^{k+1})_i = w_i^k - (A(x^{k+1} - x^k))_i = w_i^k - \dfrac{(w_i^k)^2\pi_i^k}{\sqrt{c^T[A^T(W^k)^{-2}A]^{-1}c}}$.

If the limit $\bar\pi$ had $\bar\pi_i < 0$ for some $i$, then $i \in I$ (since $\bar\pi_j = 0$ for $j \notin I$), yet the recursion above gives $w_i^{k+1} \ge w_i^k$ whenever $\pi_i^k \le 0$, so $w_i^k$ could not converge to zero. This contradiction shows that $\bar\pi \ge 0$, so the limiting prices are optimal dual variables and the algorithm converges to an optimal solution of (1). ∎

Proof of Theorem 1: Since the objective values $c^Tx^k$ increase monotonically and are bounded above, we must have:

$\sum_{k=1}^\infty c^Tv^k < \infty$.   (5)

Next, we wish to show that the sequence of iterates $x^k$ also converges. If so, then $w^k = b - Ax^k$ converges and Lemma 4 gives the result. We, therefore, wish to show that $\sum_{k=1}^\infty \|x^{k+1} - x^k\| = \sum_{k=1}^\infty \|v^k\| < +\infty$. Given (5), this result follows if we have $c^Tv^k \ge (1/a)\|v^k\|$ for some $a < \infty$. To show this result, let $B = [\beta_i^T]$ be composed of a maximal independent set of vectors such that $\beta_i^Tc = 0$ (i.e., $B$ is a basis for the subspace orthogonal to the subspace spanned by $c$). Let $a_i = \gamma_ic + B^T\rho_i$ for each $i = 1, \ldots, m$. For $M = [\rho_i]$, we then have $A = \gamma c^T + M^TB$. Now, we rewrite the optimization problem in (4), expressed in the original variables as $\max\{c^Tv \mid v^TA^T(W^k)^{-2}Av \le 1\}$, by substituting $v = c\lambda + B^T\nu$. Since $Bc = 0$, we obtain:

$\max\ c^Tc\,\lambda$ subject to   (6)

$(\lambda c^T + \nu^TB)(c\gamma^T + B^TM)(W^k)^{-2}(\gamma c^T + M^TB)(c\lambda + B^T\nu) \le 1$.

By the construction, the feasible region is equivalent to

$\lambda^2(c^Tc)\gamma^T(W^k)^{-2}\gamma(c^Tc) + 2\lambda(c^Tc)\gamma^T(W^k)^{-2}(M^TB)B^T\nu + \nu^TBB^TM(W^k)^{-2}(M^TB)B^T\nu \le 1$.   (7)

The optimality conditions for $(\lambda^k, \nu^k)$ optimal in (6) state that there exists some $\zeta^k > 0$ such that

$c^Tc - 2\zeta^k\bigl(\lambda^k(c^Tc)^2\gamma^T(W^k)^{-2}\gamma + (c^Tc)\gamma^T(W^k)^{-2}(M^TB)B^T\nu^k\bigr) = 0$,   (8)

$\lambda^k(c^Tc)\gamma^T(W^k)^{-2}(M^TB)B^T + (\nu^k)^TBB^TM(W^k)^{-2}(M^TB)B^T = 0$.

Since $A$ has full rank by assumption, $BB^TM(W^k)^{-2}M^TBB^T$ has full rank $(n-1)$ and is invertible. The second set of conditions in (8) therefore implies that

$(\nu^k)^T = -\lambda^k(c^Tc)\gamma^T(W^k)^{-2}(M^TB)B^T\bigl(BB^TM(W^k)^{-2}M^TBB^T\bigr)^{-1}$.

Letting $\bar A = M^TBB^T$, we have $(\nu^k)^T = -(\lambda^kc^Tc)\gamma^T(W^k)^{-2}\bar A(\bar A^T(W^k)^{-2}\bar A)^{-1}$. Again applying the result from Vanderbei and Lagarias as in Lemma 3, we have that $|\nu_i^k| \le \eta(c^Tc)|\lambda^k|$ for every $i = 1, \ldots, n-1$ and some $\eta < \infty$ that does not depend on $W^k$. Hence,

$\|v^k\| = \|c\lambda^k + B^T\nu^k\| \le \|c\||\lambda^k| + \sum_{i=1}^{n-1}\|\beta_i\||\nu_i^k| \le \|c\||\lambda^k|\bigl(1 + (n-1)\bar\eta\bigr) = a\,c^Tv^k$,   (9)

for $a = (1 + (n-1)\bar\eta)/\|c\| < \infty$, where $\bar\eta = \eta(c^Tc)\max_i\|\beta_i\|/\|c\|$, since $c^Tv^k = c^T(c\lambda^k + B^T\nu^k) = \lambda^k\|c\|^2$. This gives the desired result and completes the proof. ∎
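The decomposition $A = \gamma c^T + M^TB$ used in this proof can be built explicitly; the sketch below (our own toy data, with orthonormal rows of $B$ obtained from an SVD, so that $\rho_i = Ba_i$) verifies it numerically:

```python
import numpy as np

# Construct B (rows spanning the subspace orthogonal to c), gamma, and M,
# then verify A = gamma c^T + M^T B as in the proof of Theorem 1.
rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))              # toy data: m = 5, n = 3
c = rng.normal(size=3)
_, _, Vt = np.linalg.svd(c[None, :])     # row 0 of Vt is c/||c||; the rest
B = Vt[1:]                               # form an orthonormal basis with Bc = 0
gamma = A @ c / (c @ c)                  # gamma_i = a_i^T c / ||c||^2
M = B @ A.T                              # column i is rho_i = B a_i
print(np.allclose(np.outer(gamma, c) + M.T @ B, A))   # True
```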

References

[1] I. Adler, N. K. Karmarkar, M. G. C. Resende, and G. Veiga, "An implementation of Karmarkar's algorithm for linear programming," Technical Report ORC 86-8, Operations Research Center, University of California (Berkeley, CA, 1986).

[2] K. M. Anstreicher, "A combined Phase I-Phase II projective algorithm for linear programming," Mathematical Programming 43 (1989) 209-223.

[3] E. R. Barnes, "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming 36 (1986) 174-182.

[4] J. R. Birge and R. M. Freund, "Prior reduced fill-in in the solution of equations in interior point algorithms," Technical Report, Sloan School of Management, MIT (Cambridge, MA, 1990).

[5] I. I. Dikin, "Iterative solution of problems of linear and quadratic programming," Soviet Mathematics Doklady 8 (1967) 674-675.

[6] I. I. Dikin, "On the speed of an iterative process," Upravlyaemye Sistemi 12 (1974) 54-60.

[7] R. M. Freund, "An analog of Karmarkar's algorithm for inequality constrained linear programs, with a 'new' class of projective transformations for centering a polytope," Operations Research Letters 7 (1988) 9-13.

[8] N. Karmarkar, "A new polynomial-time algorithm for linear programming," Combinatorica 4 (1984) 373-395.

[9] C. L. Monma and A. J. Morton, "Computational experiments with a dual affine variant of Karmarkar's method for linear programming," Operations Research Letters 6 (1987) 261-267.

[10] D. F. Shanno, "Computing Karmarkar projections quickly," Working Paper 85-10, Graduate School of Administration, University of California, Davis (Davis, CA, 1985).

[11] T. Tsuchiya, "Global convergence of the affine scaling methods for degenerate linear programming problems," Research Memorandum No. 373, The Institute of Statistical Mathematics (Minato-ku, Tokyo, 1990).

[12] R. J. Vanderbei, M. S. Meketon, and B. A. Freedman, "A modification of Karmarkar's linear programming algorithm," Algorithmica 1 (1986) 395-407.

[13] R. J. Vanderbei and J. C. Lagarias, "I. I. Dikin's convergence result for the affine-scaling algorithm," Technical Report, AT&T Bell Laboratories (Murray Hill, NJ, 1988).