AN INFINITELY SUMMABLE SERIES IMPLEMENTATION OF INTERIOR POINT METHODS

Romesh Saigal
Department of Industrial & Operations Engineering
University of Michigan

Technical Report 92-37
May 1992
Revised October 1993

An Infinitely Summable Series Implementation of Interior Point Methods

Romesh SAIGAL
Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, Michigan 48109-2117, USA

May 1992 (Revised October 1993)

Abstract

In this paper we consider an alternative implementation of the interior point methods. In the popular implementations, a variant of sparse Cholesky factorization is integrated with a conjugate gradient type iterative method. In contrast, we set up the problem as an infinitely summable series of vectors. This series is then summed iteratively. At each iteration, a procedure based on "least squares" is used to obtain an estimate of the "tail" of the series. In addition, using Chebychev approximations, we show how to accelerate the convergence of this procedure. In this alternative approach, besides finding Cholesky factors of some simpler matrices, only multiplications by some matrix are involved. This may be an advantage since in the later part of these methods, the linear system to be solved has a tendency to become ill-conditioned. We present some evidence (for the case of the dual affine scaling method applied to the assignment problem) which supports this conclusion.

Key words: Linear programming, interior point methods, iterative methods, infinitely summable series, Chebychev approximations.

Abbreviated title: Implementation of interior point methods.

1 Introduction

In this paper we consider the following linear programming problem:

    minimize    c^T x
    subject to  Ax = b
                x >= 0

where A is an m x n matrix of full rank m, b is an m-vector and c is an n-vector, and consider the recent interior point methods, a study of which was started by the seminal work of Karmarkar [4]. Most of the material on these methods can be found in the following references: Barnes [1], Kojima, Mizuno and Yoshise [5], Vanderbei, Meketon and Freedman [10] and the recent book of Fang and Puthenpura [2]. We leave the reader to browse through these and many other works for an introduction. Our starting point is the implementation of these methods, which requires, at each iteration, the solution of an m x m linear system of equations

    Cz = d                                                                (1)

where C = ADA^T for some n x n diagonal matrix D which has positive diagonal elements, and d is some vector generated by the matrices A, D and the vectors b, c, the exact form depending on the particular interior point method being implemented. The popular implementations to date all use some version of the sparse Cholesky techniques, George and Liu [3], integrated with the conjugate gradient method. The goal of this paper is to present another implementation methodology for solving (1).

The first approach for solving (1) as an infinite series was presented by Puthenpura, Saigal and Sinha [7] (this can also be found in Fang and Puthenpura [2]), where the solution to (1) is generated as the sum of an infinite series. In section 2 we give some details of this proposal. Another recent approach for implementation of large sparse or structured problems is presented by Saigal [9]. Here the rows of A are partitioned into two parts, with each part treated separately. In section 3 we give some details of this proposal. Our approach here builds on both these approaches, and the problem is reduced to solving the system

    (I - N)y = q                                                          (2)
where N is an r x r symmetric positive semi-definite matrix with spectral radius less than 1. The variation in this implementation is to solve (2) by the "power expansion"

    (I - N)^{-1} = I + N + N^2 + ...                                      (3)

and then write the solution to (2) as the infinite sum:

    y = q + Nq + N^2 q + ...                                              (4)

An apparent advantage of (4) is that its implementation involves only multiplications by N. But, as we shall see later, in applications to interior point methods, the spectral radius of N is close to 1 (and gets closer to 1 as the algorithm proceeds to the solution). Thus the convergence rate of the sequence {s_k} could be very slow, where

    s_k = q + Nq + ... + N^k q = q + N s_{k-1}.

As is easily seen, the convergence rate of s_k is bounded by the spectral radius of N; i.e.,

    ||s_k - s|| <= rho(N) ||s_{k-1} - s||

where s is the limit of the sequence {s_k} and rho(N) is the spectral radius of N, in our case the magnitude of the largest eigenvalue (see, for example, [7]).

In this paper we present two methodologies for summing (4). The first approach is to generate s_k, and an estimate t̂_k of the tail

    t_k = N^{k+1} q + N^{k+2} q + ...

and to declare

    ŷ_k = s_k + t̂_k

as an estimate of the sum (4). This estimate is generated by a linear model, which is solved using "least squares". This model establishes a linear relationship between the coefficients of a polynomial whose roots contain certain eigenvalues of N. It is based on the following: given the number of "similar classes" of eigenvalues of N (i.e., the number of classes of eigenvalues behaving similarly when raised to increasing powers), the model estimates the coefficients of a polynomial which includes each class within its set of roots. One class that is easy to identify is that of the small eigenvalues. These all go to zero rapidly as they are raised to higher powers, and also contribute little to the tail. Our model distinguishes
these classes of eigenvalues by the functions (I - N)^l N^k of N, for some integers l and k. The series (I - N)^l N^k q, (I - N)^l N^{k+1} q, ... is then used in the estimation, and is generated by finite differencing of the original series q, Nq, ..., N^k q, N^{k+1} q, .... This model has the interesting property that the residual vector of the least squares problem is the same as the error vector when the estimate ŷ_k is used in place of the "true" solution y in (2). Using the formulation (2) for the assignment problem (presented by Saigal [9]), we show here some limited computational experience which compares a model with two classes to one with at most 14 classes. This experience suggests that the model, for this problem, will perform well.

The second approach, which accelerates the convergence of the one presented above, uses N^r q, r = 0, 1, ..., k', to generate ŝ_k, a lower order Chebychev polynomial approximation to s_k, k > k'; and then uses ŷ_k = ŝ_k + t̂_k as an estimate of the sum (4), where t̂_k is the estimate of the tail defined earlier. Here the series p(N) = I + N + N^2 + ... + N^k is approximated by q(N) = A'_0 I + A'_1 N + ... + A'_{k'} N^{k'}, and the approximation ŝ_k = q(N)q of s_k is added to the estimate t̂_k to obtain a better estimate of the solution. The A'_j are constants determined so that q is a Chebychev polynomial approximation to p.

After presenting, in sections 2 and 3, two formulations that convert equation (1) into the form (2), we develop the basic model in section 4. In section 5, we show how the estimated parameters are used to generate an estimate of the tail, and an estimate of the solution. In section 6 we present a method to modify the eigenvalue structure, while in section 7 we present an acceleration scheme based on Chebychev polynomials. Finally, in section 8, we present some computational justification for this model when used in the context of the assignment problem.
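As a point of reference for the methods developed below, the unaccelerated summation of (4) amounts to the following sketch (plain numpy, illustrative only; the stopping rule on the newest term is a heuristic, since the true error is the entire tail of the series):

```python
import numpy as np

def power_series_solve(N, q, tol=1e-10, max_terms=5000):
    # Sum y = q + Nq + N^2 q + ...  via the recursion s_k = q + N s_{k-1}.
    # Convergence is linear with rate rho(N), hence very slow when rho(N)
    # is close to 1.
    s = q.copy()
    term = q.copy()
    for _ in range(max_terms):
        term = N @ term                      # next term N^k q
        s = s + term
        if np.linalg.norm(term) <= tol * np.linalg.norm(q):
            break
    return s
```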

2 LQ Factorization

In this section we show how the system (1) can be transformed into (2) with an appropriate matrix N. We use the methodology of Puthenpura, Saigal and Sinha [7] to show this transformation. For this purpose we will assume that A is an orthonormal matrix, i.e., AA^T = I. We can assume this because of the following standard theorem, and the implied transformation of the linear program that follows the statement of the theorem.

Theorem 1  Let A be an m x n matrix of rank m. There exists a unique lower triangular matrix L and a unique orthonormal matrix Q such that A = LQ. The uniqueness is up to a choice of the signs of the columns of L and the corresponding rows of Q.

Proof: See, for example, Wilkinson [11]. □

A consequence of this theorem is that, without loss of generality, we may assume the matrix A is orthonormal. This can be guaranteed by the linear transformation L^{-1}Ax = L^{-1}b of the system in the linear program. In this case, Q = L^{-1}A is the required matrix. We note here that the matrix Q is dense, even when A and L may be sparse. In [7], it is shown how all the iterations can be carried out without explicitly requiring Q.

Now consider C = ADA^T, where A is orthonormal, and let α > 0 be a large positive scalar. Then

    C = α(I - A(I - (1/α)D)A^T)

has the required property provided α > max_i D_ii. In this case,

    N = A(I - (1/α)D)A^T,    q = (1/α)d.

The following theorem appears in Puthenpura, Saigal and Sinha [7].

Theorem 2  For an orthonormal matrix A, ||A||_2 <= 1 and ||A^T||_2 = 1; thus, for α > max_i D_ii, the matrix N defined above has spectral radius less than 1. Here ||A||_2 = max_{||x||_2 = 1} ||Ax||_2.
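A minimal sketch of this construction (illustrative only; the helper orthonormalize_rows carries out the LQ step of Theorem 1 through a QR factorization of A^T, and series_form uses any α exceeding max_i D_ii):

```python
import numpy as np

def orthonormalize_rows(A):
    # A = L Q with Q having orthonormal rows (Theorem 1), via QR of A^T:
    # A^T = Qr R  =>  A = R^T Qr^T, so L = R^T is lower triangular, Q = Qr^T.
    Qr, R = np.linalg.qr(A.T)
    return R.T, Qr.T

def series_form(Q, D_diag, d):
    # Given Q with orthonormal rows, the diagonal of D, and the right-hand
    # side d of  Q D Q^T z = d,  return N and q of (2):  (I - N) z = q.
    alpha = 1.1 * D_diag.max()                       # any alpha > max_i D_ii
    N = Q @ (np.diag(1.0 - D_diag / alpha) @ Q.T)    # N = Q (I - D/alpha) Q^T
    q = d / alpha
    return N, q
```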

3 Matrix Partitioning and Special Structures

In this section we present another example in which system (1) can be transformed into system (2). This derivation is due to Saigal [9]. Assume that the rows of A have the partition:

    A = [ E ],    b = [ e ]                                               (5)
        [ F ]         [ f ]

where E and F can be readily identified. Examples where this happens include the transportation problem, the generalized upper bounded problem, and the block-diagonal linear programming problem. See, for example, Lasdon [6] for more on these special linear programs. This partition could also be generated to preserve sparsity of the Cholesky factors. The following theorem is from Saigal [9]:

Theorem 3  There exist a lower triangular matrix L_1, a matrix L_2 and D̃ = FDF^T - L_2 L_2^T such that

    ADA^T = L̄ D̄ L̄^T,    EDE^T = L_1 L_1^T,    L_1 L_2^T = EDF^T,

with

    L̄ = [ L_1  0 ],    D̄ = [ I  0 ].
        [ L_2  I ]          [ 0  D̃ ]

We now show how Theorem 3 can be used to reduce (1) into the appropriate form. Partitioning

    z = [ z_1 ],    d = [ d_1 ]
        [ z_2 ]         [ d_2 ]

to correspond to the partition of A, the following can be easily proved:

Theorem 4  Equation (1) can be readily solved by the following steps:

    L_1 z̄_1 = d_1                                                         (6)
    D̃ z_2 = d_2 - L_2 z̄_1                                                 (7)

    L_1^T z_1 = z̄_1 - L_2^T z_2                                           (8)

Proof: See Saigal [9]. □

Note that (6) and (8) are triangular systems, while (7) has the form

    (FDF^T - L_2 L_2^T) z_2 = d_2 - L_2 z̄_1                               (9)

To reduce equation (7) into the form (2), define a lower triangular matrix L_3 such that FDF^T = L_3 L_3^T, and the system (9) becomes

    (I - L_3^{-1} L_2 L_2^T L_3^{-T}) L_3^T z_2 = L_3^{-1}(d_2 - L_2 z̄_1)  (10)

Letting N = L_3^{-1} L_2 L_2^T L_3^{-T}, the following is proved in Saigal [9].

Theorem 5  N is a symmetric positive semi-definite matrix with spectral radius less than 1.

An advantage in partitioning A into two parts is that the Cholesky factorizations EDE^T = L_1 L_1^T and FDF^T = L_3 L_3^T can be found separately, and can thus be very sparse and structured. The price paid is in solving (10). For the assignment problem, both EDE^T and FDF^T are diagonal matrices, and the complexity of solving this problem is reduced to that of solving (10).

4 The Model

In this section we develop a model that will generate information useful for estimating the tail, t_k. For example, if the eigenvalues and eigenvectors of N are known, the estimate of the tail can be readily obtained by using the results of the corollary to the standard theorem that appears below:

Theorem 6  For every r x r symmetric and positive semi-definite matrix N, there exist an orthonormal matrix P and a diagonal matrix Λ, with non-negative diagonal elements, such that N = PΛP^T. The diagonal entries of Λ are the eigenvalues of N.

Proof: See Wilkinson [11]. □

Since P in Theorem 6 is orthonormal, the following corollary readily follows:

Corollary 7  For some matrix P̄, and all k = 1, 2, ...,

    N^k q = P̄ Λ^k e,

where e = (1, 1, ..., 1)^T is the r-vector of all ones.

Proof: Using Theorem 6, N^k q = PΛ^k P^T q. Let q̄ = P^T q and P̄_j = q̄_j P_j, where P_j is the jth column of P. Since P is orthonormal, it is readily confirmed that N^k q = P̄ Λ^k e, and the result follows. □

Using the result of Corollary 7, we can rewrite the tail

    t_k = P̄ (Λ^{k+1} + Λ^{k+2} + ...) e,

and, as the diagonal entries of Λ are non-negative and less than 1, the infinite series in the above expression sums to Λ^{k+1}(I - Λ)^{-1}, and we have a formula for the tail. Since calculating the eigenvalues and eigenvectors of N is as difficult as solving the original system, this result is of little value. Our starting point towards the model is that a good estimate of the tail can still be obtained if only the larger eigenvalues are known. This is so because the smaller eigenvalues approach zero rapidly as k becomes large. This is the basis of the model developed in this section.

The model developed here generates estimates of coefficients of a polynomial whose roots are certain eigenvalues of N, generally the larger ones. These, instead of the eigenvalues themselves, are used to calculate the estimate, t̂_k, of t_k (as is done in the next section). We now develop the fundamental ideas of the model. Using the result of Corollary 7, we note that

    Nq = Σ_{j=1}^r λ_j P̄_j

and, allowing for multiplicities in the eigenvalues,

    Nq = Σ_{i=1}^p Σ_{j∈S_i} λ_j P̄_j = Σ_{i=1}^p λ_i Σ_{j∈S_i} P̄_j = Σ_{i=1}^p λ_i Q_i

where S_i, i = 1, ..., p, is a partition of {1, ..., r} with λ_j = λ_i for j ∈ S_i, i.e., S_i represents the indices of eigenvalues that have the same value λ_i.
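As a small numerical check of Corollary 7 and of the closed-form tail above (illustrative only; it forms the eigendecomposition explicitly, which is exactly what the rest of the paper avoids):

```python
import numpy as np

rng = np.random.default_rng(0)
r, k = 6, 10
B = rng.standard_normal((r, r))
N = B @ B.T
N *= 0.9 / np.linalg.eigvalsh(N).max()      # symmetric PSD, spectral radius 0.9
q = rng.standard_normal(r)

lam, P = np.linalg.eigh(N)                  # N = P diag(lam) P^T   (Theorem 6)
Pbar = P * (P.T @ q)                        # column j of Pbar is (P^T q)_j P_j (Corollary 7)

tail_closed = Pbar @ (lam ** (k + 1) / (1.0 - lam))   # Pbar Lam^{k+1} (I - Lam)^{-1} e

v = np.linalg.matrix_power(N, k + 1) @ q    # truncated direct summation of the tail
tail_direct = np.zeros(r)
for _ in range(300):
    tail_direct += v
    v = N @ v

print(np.allclose(tail_closed, tail_direct))   # True (up to the truncation)
```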

In addition, since

    N^k q = Σ_{i=1}^p λ_i^k Q_i,

for sufficiently large k, the contribution to N^k q of "small" λ_i will be minimal. The partition of the eigenvalues into "similar" groups, and the previous statement about "small" eigenvalues, forms the basis of our model. Since the terms in the tail are associated with large values of k, we note that the contribution to it of "small" eigenvalues will be minimal. We now develop a model which attempts to estimate the contribution of the other classes of eigenvalues. For this purpose, consider the function (I - N)^l N^k of N for some l >= 1 and k >= 1. We then have:

Theorem 8  The eigenvalues of (I - N)^l N^k are {(1 - λ_i)^l λ_i^k}, where {λ_i} are the eigenvalues of N.

Proof: Since N = PΛP^T, using the orthonormality of P we can write (I - N)^l N^k = P(I - Λ)^l Λ^k P^T, and we have the result. □

We note that for l = 1 and large k, the eigenvalues of (I - N)N^k corresponding to eigenvalues of N that are either close to 1 or small are themselves small. Thus the function concentrates on the relatively large eigenvalues of N. It is this observation that will be used to identify the sets S_i. We now investigate this function in the next theorem.

Theorem 9  The function (1 - x)^l x^k is maximized on [0, 1] at x* = k/(k + l).

Proof: We have the result as this function is unimodal on [0, 1] (its logarithm is concave), and its derivative vanishes at x*. □

To understand further the relation of the function (I - N)^l N^k to the original sequence q, Nq, N^2 q, ..., and to see how to translate the sequence into the above functional form, we introduce the following finite difference operator on an ordered sequence of vectors x_0, x_1, ..., x_k:

    ∂x_j = x_j - x_{j+1}.

Then

    ∂^{k+1} x_j = ∂(∂^k x_j) = ∂^k x_j - ∂^k x_{j+1},
which is the same as the operator ∂ applied k + 1 times. The following result relates this operator to the required form.

Theorem 10  Given q, Nq, ..., N^t q, ∂^s(N^j q) is well defined for all s + j <= t, and

    ∂^s(N^j q) = (I - N)^s N^j q.

Proof: This is clearly true for s = 1. Assume it true for some s and consider ∂^{s+1}(N^j q), s + j + 1 <= t. By definition this is ∂^s(N^j q) - ∂^s(N^{j+1} q), with both terms well defined. Using the induction hypothesis, we see that

    ∂^{s+1}(N^j q) = (I - N)^s N^j q - (I - N)^s N^{j+1} q = (I - N)^{s+1} N^j q

and we are done. □

Using Theorem 10 one can readily develop an algorithm for generating the required functional form from the original sequence. Thus, generating (I - N)^l N^k q requires an l-fold finite differencing of the original sequence, which involves only vector subtractions. Since this is self-evident, we omit the details.

We are now ready to develop the linear model that will enable, at iteration k, the estimation of the tail t̂_k from the given sequence q, Nq, ..., N^k q. To do this, assume the eigenvalues fall into the classes S_1, S_2, ..., S_p (which form a disjoint partition of {1, ..., r}); and, for each class, assume that the functional form (I - N)^{s_i} N^j q, s_i >= 1, distinguishes the class S_i from the others. Based on this information, we now develop the linear model. Let s_i + t <= k and define

    Δ_t^{(i)} = ∂^{s_i}(N^t q) = (I - N)^{s_i} N^t q = Σ_{j=1}^p P̄_j (I - Λ_j)^{s_i} Λ_j^t e.

Here P̄_j, Λ_j are the corresponding matrices associated with S_j, and, as mentioned earlier, Δ_t^{(i)} is known, and can be generated from the original sequence by finite differencing. Under the assumption that s_i distinguishes the eigenvalues of S_i from the others, we can write:

    Δ_t^{(i)} = P̄_i (I - Λ_i)^{s_i} Λ_i^t e + ε_t^{(1)}                   (11)

where we treat ε_t^{(1)} as the "error" of ignoring the terms pertaining to the eigenvalues not in S_i.
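The data appearing above are thus obtained from the stored sequence by vector subtractions alone; a minimal illustration of Theorem 10 (plain numpy, with a small random N used only for the check):

```python
import numpy as np

def diff_power(seq, s, t):
    # partial^s (N^t q) = (I - N)^s N^t q, by s-fold finite differencing of
    # the stored vectors seq[j] = N^j q (Theorem 10); requires s + t < len(seq).
    work = [seq[t + j] for j in range(s + 1)]
    for _ in range(s):
        work = [work[j] - work[j + 1] for j in range(len(work) - 1)]
    return work[0]

# quick check on a small random symmetric positive semi-definite N
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
N = B @ B.T
N *= 0.9 / np.linalg.eigvalsh(N).max()
q = rng.standard_normal(5)
seq = [q]
for _ in range(10):
    seq.append(N @ seq[-1])
I = np.eye(5)
direct = np.linalg.matrix_power(I - N, 3) @ np.linalg.matrix_power(N, 4) @ q
print(np.allclose(diff_power(seq, 3, 4), direct))   # True
```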

For each i, let s_i + t_i <= k, where t_i is the number of eigenvalues in S_i. Define the r x t_i matrix:

    Δ^{(i)} = (Δ^{(i)}_{k-s_i-t_i+1}, ..., Δ^{(i)}_{k-s_i}).               (12)

Using (11), it can be readily shown that

    Δ^{(i)} = P̄_i (I - Λ_i)^{s_i} Λ_i^{k-s_i-t_i+1} V_i + η^{(1)}

where

    V_i = [ 1  λ_1^{(i)}     (λ_1^{(i)})^2     ...  (λ_1^{(i)})^{t_i-1}    ]
          [ 1  λ_2^{(i)}     (λ_2^{(i)})^2     ...  (λ_2^{(i)})^{t_i-1}    ]
          [ ...                                                            ]
          [ 1  λ_{t_i}^{(i)} (λ_{t_i}^{(i)})^2 ...  (λ_{t_i}^{(i)})^{t_i-1} ]

is a t_i x t_i Vandermonde matrix. Here, the λ_j^{(i)} are the eigenvalues of N indexed in S_i. One would expect η^{(1)} to be small only when such distinguishing s_i exist. We do not expect them to exist in all such systems, but believe that they will exist when these systems are generated by an interior point method. We provide some preliminary evidence of this later in this work. Thus, for each i, we can write

    P̄_i = Δ^{(i)} V_i^{-1} Λ_i^{-(k-s_i-t_i+1)} (I - Λ_i)^{-s_i} + η^{(2)}.    (13)

Now,

    N^k q = P̄ Λ^k e = Σ_{i=1}^{p-1} P̄_i Λ_i^k e + P̄_p Λ_p^k e,

where we will assume that S_p contains the indices of all the "small" eigenvalues. Substituting (13) into the above, we get

    N^k q = Σ_{i=1}^{p-1} Δ^{(i)} V_i^{-1} Λ_i^{s_i+t_i-1} (I - Λ_i)^{-s_i} e + ε^{(2)}.    (14)

Defining, for each i = 1, ..., p-1,

    α^{(i)} = V_i^{-1} Λ_i^{s_i+t_i-1} (I - Λ_i)^{-s_i} e,

and Δ^{(0)} = N^k q, we obtain the linear model

    Δ^{(0)} = Σ_{i=1}^{p-1} Δ^{(i)} α^{(i)} + ε^{(2)}                      (15)

where ε^{(2)} is the "deterministic error", and the α^{(i)} are the parameters to be estimated.

A comment on this model is in order here. The parameters α^{(i)} are complicated functions of the eigenvalues of N, but, by a simple analysis, can be shown to generate, through a
linear transformation, the coefficients of a polynomial which has the λ^{(i)} as roots (besides others). Fortunately, we do not need to compute the λ^{(i)}, and need only estimates of the α^{(i)} (see the next section). Also, the data Δ^{(i)}, i = 0, ..., p-1, in the model are known, and are generated by finite differencing of the original data. In pedagogical terms, this model sets up a linear relationship between the most recently generated data N^k q and the past data q, Nq, ..., N^{k-1} q. In estimating the tail, this linear relationship is assumed to hold between all future data (to be estimated) and the past data (assumed known). This model now allows us to estimate the parameters by the "least squares" procedure, requiring the solution of the normal equations:

    Δ^T Δ α = Δ^T Δ^{(0)},                                                (16)

where Δ = (Δ^{(1)}, ..., Δ^{(p-1)}) and α = (α^{(1)}, ..., α^{(p-1)})^T. This is a u x u system of linear equations, with u = Σ_{i=1}^{p-1} t_i.

5 Estimating the Tail and the Solution

Let α̂ be the estimate of the coefficient vector α obtained by solving the model (15) by the method of least squares, i.e., by solving (16). We now show how this estimate can be used to compute t̂_{k-1}, an estimate of the tail

    t_{k-1} = Σ_{j=0}^∞ N^{k+j} q.

Please note that the "known" term N^k q has been included in the tail; the reason for this inclusion will become clear later. Using Corollary 7, we can write

    t_{k-1} = Σ_{j=0}^∞ Σ_{i=1}^{p-1} P̄_i Λ_i^{k+j} e + ε^{(3)}
            = Σ_{i=1}^{p-1} P̄_i Λ_i^k (I - Λ_i)^{-1} e + ε^{(3)}

where ε^{(3)} is the error vector.

Our approach now is to obtain an expression for P̄_i in terms of Λ_i such that its substitution into the above expression can be readily identified as some function of α^{(i)}. We could use formula (13), but it is unsatisfactory (the final terms involve both α^{(i)} and Λ_i in the simplification). We now derive an expression for P̄_i that does the trick. Define

    Δ̂_t^{(i)} = ∂^{s_i - 1}(N^t q).                                        (17)

Using (17), for t = k-s_i-t_i+1, ..., k-s_i, we can define

    Δ̂^{(i)} = (Δ̂^{(i)}_{k-s_i-t_i+1}, ..., Δ̂^{(i)}_{k-s_i}) = P̄_i Λ_i^{k-s_i-t_i+1} (I - Λ_i)^{s_i - 1} V_i + η^{(3)}.

As before, Δ̂^{(i)} is obtained from the original data by finite differencing, and is known. Expressing P̄_i as in (13), but now in terms of Δ̂^{(i)}, and substituting in the above expression for the tail, we get

    t_{k-1} = Σ_{i=1}^{p-1} Δ̂^{(i)} V_i^{-1} Λ_i^{s_i+t_i-1} (I - Λ_i)^{-s_i} e + ε^{(4)}
            = Σ_{i=1}^{p-1} Δ̂^{(i)} α^{(i)} + ε^{(4)},

and we can use

    t̂_{k-1} = Δ̂ α̂                                                         (18)

(where Δ̂ = (Δ̂^{(1)}, ..., Δ̂^{(p-1)}) and α̂^T = (α̂^{(1)}, ..., α̂^{(p-1)}) is the "least squares" estimate of α in (15)) as an estimate of t_{k-1}. There is clearly a relationship between Δ and Δ̂: each column of Δ is obtained by applying the operator ∂ (equivalently, I - N) once more to the corresponding column of Δ̂. Using the estimate t̂_{k-1} defined in (18), we set

    ŷ_k = s_{k-1} + t̂_{k-1}

and use this as an estimate of the solution to (2). Of interest is the error in the equation (2) when this estimate is used. Define the vector

    e_k = q - (I - N) ŷ_k.

Indeed, a good estimate of the solution to (2) results in a small error e_k, and a good model (15) would generate a small residual error in the least squares setting. The next theorem asserts that these two errors are the same, and also justifies the inclusion of N^k q as a part of the tail.
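Before stating the theorem, the complete estimation step for the simplest case p = 2 can be summarized in a short sketch (illustrative only: dense numpy arrays, a single distinguished class with user-chosen parameters s1 and t1, and names that are not taken from the implementation of section 8):

```python
import numpy as np

def diff_power(seq, s, t):
    # partial^s (N^t q) = (I - N)^s N^t q by s-fold differencing of the
    # stored vectors seq[j] = N^j q (Theorem 10).
    work = [seq[t + j] for j in range(s + 1)]
    for _ in range(s):
        work = [work[j] - work[j + 1] for j in range(len(work) - 1)]
    return work[0]

def estimate_solution(N, q, k, s1, t1):
    # Estimate the solution of (I - N) y = q from q, Nq, ..., N^k q for the
    # single-class model (p = 2); s1 is the distinguishing power, t1 the
    # class size, with s1 + t1 <= k.
    seq = [q]
    for _ in range(k):
        seq.append(N @ seq[-1])          # only matrix-vector products with N
    s_km1 = sum(seq[:k])                 # partial sum s_{k-1}

    ts = range(k - s1 - t1 + 1, k - s1 + 1)
    Delta0 = seq[k]                                                    # N^k q
    Delta = np.column_stack([diff_power(seq, s1, t) for t in ts])      # (12)
    Dhat = np.column_stack([diff_power(seq, s1 - 1, t) for t in ts])   # (17)

    alpha, *_ = np.linalg.lstsq(Delta, Delta0, rcond=None)  # least squares for (15)/(16)
    y_hat = s_km1 + Dhat @ alpha                            # s_{k-1} + tail estimate (18)

    residual = Delta0 - Delta @ alpha
    # By Theorem 11 this residual coincides with the equation error q - (I - N) y_hat.
    return y_hat, np.linalg.norm(residual)
```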

Theorem 11  Let ŷ_k be the estimate of the solution to (2) as computed above using α̂, the least squares solution to (15). Then

    e_k = q - (I - N) ŷ_k = N^k q - Σ_{i=1}^{p-1} Δ^{(i)} α̂^{(i)}.

Proof:

    q - (I - N) ŷ_k = q - (I - N)(s_{k-1} + Δ̂ α̂)
                    = q - (I - N)(q + Nq + ... + N^{k-1} q + Δ̂ α̂)
                    = N^k q - (I - N) Δ̂ α̂.

But

    (I - N) Δ̂^{(i)} = (∂^{s_i}(N^t q))_{t = k-s_i-t_i+1, ..., k-s_i} = Δ^{(i)}

and we have our result. □

This is an important theorem, since it connects the error in the solution of (2) with the residual vector of the least-squares solution. Thus, if the residual is large, the resulting error in the equation will also be large. Hence a large residual would indicate that the model should be modified; the model is defined by the number of sets S_i into which the eigenvalues are partitioned, the number of eigenvalues t_i in each set, and the "distinguishing" power s_i.

6 Separating Eigenvalues

The "goodness of fit" of the model developed in section 4 is also affected by the "separation" of the eigenvalues of N into classes that behave similarly when raised to powers. Thus, for systems with well separated eigenvalues, our models will perform better. We present here a technique for enhancing the separation of the eigenvalues of N. This separation is achieved by rewriting the series in terms of higher powers of N, using the fact that the eigenvalues of N^k, for k >= 2, are "further apart" than those of N. Consider the sum (4).

Theorem 12  For each l >= 1, there exist a vector q̄ and a symmetric positive semi-definite matrix N̄ such that

    y = q̄ + N̄ q̄ + N̄^2 q̄ + ...

where the eigenvalues of N̄ are the (l + 1)st powers of the eigenvalues of N.

Proof: For each given l >= 1, the sum (4) can be rearranged as

    y = (q + Nq + ... + N^l q) + N^{l+1}(q + Nq + ... + N^l q) + N^{2(l+1)}(q + Nq + ... + N^l q) + ...

and we obtain our theorem by letting q̄ = q + Nq + ... + N^l q and N̄ = N^{l+1}. □

Implementation of the result of this theorem requires generating the partial sums

    N̄^j q̄ = N^{j(l+1)} q + N^{j(l+1)+1} q + ... + N^{j(l+1)+l} q

from the original data q, Nq, N^2 q, .... In any computational analysis, this "overhead" must be compared to the resulting savings.

7 Acceleration by Chebychev Polynomials

Chebychev polynomials (see, for example, Rivlin [8]) provide a convenient and efficient mechanism for making a best lower order polynomial approximation, over a range of values of the variable, to a given polynomial. In this section we show how these polynomials can be used to accelerate the convergence of the techniques developed in this paper. Before we do this, we present a brief introduction. The Chebychev polynomial of degree n is written as

    T_n(x) = a_0^{(n)} + a_1^{(n)} x + ... + a_n^{(n)} x^n                 (19)

for all x ∈ [-1, 1]; and the relationship between these polynomials is defined by the difference equation

    T_0(x) = 1
    T_1(x) = x
    T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x).

This difference equation can be used to generate the coefficients of T_n, and formulas for these coefficients can be found in [8] and other references.

Given a polynomial p(x) of degree n, a lower order approximation q(x) of degree m, m < n, is generated by first expressing p(x) in terms of the T_j(x), and then dropping the terms
involving the higher order Chebychev polynomials. Thus, to get a degree m approximation q(x) to p(x), we first define A_0, ..., A_n such that

    p(x) = A_0 T_0(x) + A_1 T_1(x) + ... + A_n T_n(x)                     (20)

and then define

    q(x) = A_0 T_0(x) + A_1 T_1(x) + ... + A_m T_m(x).                    (21)

The following is well known:

Theorem 13  |p(x) - q(x)| <= Σ_{j=m+1}^n |A_j| for all x ∈ [-1, 1].

In our application, we will approximate the series

    p(N) = I + N + N^2 + ... + N^{k-1}                                    (22)

of matrices N by the lower order series

    q(N) = A_0 + A_1 T_1(N) + ... + A_{k'} T_{k'}(N)                      (23)

where k' < k - 1. A caution is in order here. For ease of exposition, we have abused notation and used the same symbols p, q and T with either the set of reals (as in (19) - (21)) or the set of matrices (as in (22) and (23)) as their domain and range. We can show that:

Theorem 14  For every k, ||T_k(N)||_2 <= max_i |T_k(λ_i)| <= 1.

Proof: Using the fact that N = PΛP^T for some orthonormal matrix P, we see that

    T_k(N) = P(a_0^{(k)} I + a_1^{(k)} Λ + ... + a_k^{(k)} Λ^k) P^T

and thus

    ||T_k(N)||_2 <= ||a_0^{(k)} I + a_1^{(k)} Λ + ... + a_k^{(k)} Λ^k||_2
                  = max_i |a_0^{(k)} + a_1^{(k)} λ_i + ... + a_k^{(k)} λ_i^k|
                  = max_i |T_k(λ_i)| <= 1
and we are done. □

In the estimation of the solution to (2) we will use ŝ_{k-1} = q(N)q instead of s_{k-1}. Since p and q are polynomials in N, the error calculation of Theorem 13 will involve the eigenvalues λ_i of N. This follows from the fact that

    ||p(N) - q(N)||_2 <= Σ_{j=k'+1}^{k-1} |A_j| ||T_j(N)||_2

and, from Theorem 14, ||T_j(N)||_2 <= max_i |T_j(λ_i)|. Since λ_i ∈ [0, 1], we need to modify the definition of the polynomials so that the corresponding domain of interest is [0, 1]. This is readily done by a change of variable to obtain

    T_0(x) = 1
    T_1(x) = 2x - 1                                                       (24)
    T_{n+1}(x) = 2(2x - 1) T_n(x) - T_{n-1}(x).

We now compute the coefficients of T_n(x).

Theorem 15  Let T_j be defined by (24). Then for each n >= 1 and j <= n,

    a_j^{(n)} = (-1)^{n+j} (n/(n + j)) C(n + j, n - j) 4^j.

Proof: Using (24), we can readily establish that for each n >= 1:

    a_{n+1}^{(n+1)} = 4 a_n^{(n)}
    a_n^{(n+1)}     = 4 a_{n-1}^{(n)} - 2 a_n^{(n)}
    a_j^{(n+1)}     = -2 a_j^{(n)} + 4 a_{j-1}^{(n)} - a_j^{(n-1)},   0 < j < n
    a_0^{(n+1)}     = -2 a_0^{(n)} - a_0^{(n-1)}.

The result now follows by a simple induction using a_0^{(0)} = 1, a_1^{(1)} = 2 and a_0^{(1)} = -1. □

To generate the approximation (23) we need to determine the coefficients A_j^{(k-1)}, j = 0, ..., k-1, such that

    p(N) = Σ_{j=0}^{k-1} A_j^{(k-1)} T_j(N).                              (25)
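The coefficients A_j^{(k-1)} can also be obtained numerically; the sketch below (illustrative only, using numpy's Chebyshev utilities rather than the closed form derived next) expands p(x) = 1 + x + ... + x^{k-1} in the shifted polynomials (24) and evaluates the truncation (23) using only matrix-vector products with N:

```python
import numpy as np
from numpy.polynomial import chebyshev as C, polynomial as P

def shifted_cheb_coeffs(k):
    # Coefficients A_j with 1 + x + ... + x^{k-1} = sum_j A_j T_j(2x - 1),
    # obtained by substituting x = (u + 1)/2 and converting the resulting
    # power series in u to the Chebyshev basis.
    acc = np.zeros(k)
    term = np.array([1.0])                     # ((u + 1)/2)^0
    for _ in range(k):
        acc[:term.size] += term
        term = P.polymul(term, [0.5, 0.5])     # multiply by (u + 1)/2
    return C.poly2cheb(acc)

def truncated_sum(N, q, k, kprime):
    # Approximate s_{k-1} = (I + N + ... + N^{k-1}) q by the degree-k'
    # truncation (23), via the three-term recurrence (24):
    #   T_0 q = q,  T_1 q = (2N - I) q,  T_{n+1} q = 2(2N - I) T_n q - T_{n-1} q.
    A = shifted_cheb_coeffs(k)[:kprime + 1]
    Tprev, Tcur = q, 2.0 * (N @ q) - q
    out = A[0] * Tprev + (A[1] * Tcur if kprime >= 1 else 0.0)
    for j in range(2, kprime + 1):
        Tprev, Tcur = Tcur, 2.0 * (2.0 * (N @ Tcur) - Tcur) - Tprev
        out = out + A[j] * Tcur
    return out
```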

It can be readily established, using the form (19) of the T_j, that the A_j^{(k-1)} solve the following triangular system of equations:

    [ a_0^{(0)}  a_0^{(1)}  ...  a_0^{(k-1)}     ] [ A_0^{(k-1)}     ]   [ 1 ]
    [            a_1^{(1)}  ...  a_1^{(k-1)}     ] [ A_1^{(k-1)}     ]   [ 1 ]
    [                       ...                  ] [ ...             ] = [ ... ]    (26)
    [                            a_{k-1}^{(k-1)} ] [ A_{k-1}^{(k-1)} ]   [ 1 ]

Theorem 16  The solution to the system (26) is: for j >= 1,

    A_j^{(k-1)} = Σ_{i=0}^{k-1-j} 2^{1-2(j+i)} C(2(j+i), i),              (27)

with

    A_0^{(k-1)} = 1 + Σ_{j=1}^{k-1} (-1)^{j+1} A_j^{(k-1)}.               (28)

Proof: The system (26) can be rewritten as

    L A_0 = e

for the appropriate upper triangular matrix L (with entries a_i^{(j)}), the vector A_0 of unknown coefficients, and e = (1, 1, ..., 1)^T. After substituting the result of Theorem 15, the solution (27) can be obtained by solving a rescaled triangular system

    L̃ x = r                                                               (29)

where
x collects the scaled coefficients 2A_j^{(k-1)}, j = 1, ..., k-1, L̃ is a unit upper triangular matrix with signed binomial entries, and r is a fixed right-hand side determined by Theorem 15. Using the standard identity

    C(m, j) = C(m-1, j) + C(m-1, j-1),

it can be readily shown that L̃^{-1} is again unit upper triangular, and is obtained by successively adding the columns of the identity starting with the jth. Since L̃^{-1} is triangular, the last component of the solution separates from the rest: writing u_{k-1} = (0, 0, ..., 1)^T for the (k-1)st unit vector and r = r̄ + r_{k-1} u_{k-1}, the solution of (29) splits into the solution of a system of the same form of one lower dimension, plus a correction involving r_{k-1} and the last column of L̃^{-1}.
Using the last identity, the result of the theorem can be established by a straightforward induction on k. □

Theorem 16 can now be used to derive the coefficients of the approximating polynomial (23). If

    q(N) = A'_0 + A'_1 N + ... + A'_{k'} N^{k'}

then A' = L' A, where A' = (A'_0, ..., A'_{k'})^T, A = (A_0, ..., A_{k'})^T and

    L' = [ a_0^{(0)}  a_0^{(1)}  ...  a_0^{(k')}    ]
         [            a_1^{(1)}  ...  a_1^{(k')}    ]
         [                       ...                ]
         [                            a_{k'}^{(k')} ]

8 Validation of the Model

In this section we present some computational results validating the model of Section 4. This is done by its application to the series (4) generated by an assignment problem. This problem has an optimum solution which is highly degenerate, and can thus present special problems for interior point methods. In several examples on which we tested this model, the spectral radius of N approaches 1 as the algorithm proceeds to the solution (see Table 3), thus making the summation of the series (4) very hard. This happens when the optimum solution is unique, and thus a highly degenerate vertex of the assignment polytope. In the case when the algorithm converges to an interior solution, our model performs exceptionally well (see Tables 1 and 2). These computational results are with the dual affine scaling method; see, for example, Vanderbei, Meketon and Freedman [10] for the specifics. We now derive the matrix N used for the validation. The assignment problem is the following linear program: given n^2 positive real numbers c_ij, i = 1, ..., n, j = 1, ..., n, find
x_ij that solve the following linear program:

    min Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij
    Σ_{j=1}^n x_ij = 1,    i = 1, ..., n
    Σ_{i=1}^n x_ij = 1,    j = 1, ..., n
    x_ij >= 0,             i = 1, ..., n, j = 1, ..., n.

For this problem, the vertex optimum solutions are known to be highly degenerate. The dual to this problem is the linear program:

    max Σ_{i=1}^n u_i + Σ_{j=1}^n v_j
    u_i + v_j + s_ij = c_ij,    i = 1, ..., n, j = 1, ..., n
    s_ij >= 0,                  i = 1, ..., n, j = 1, ..., n,

where u_i, v_j are the dual variables and the s_ij are the dual slacks. We apply the dual affine scaling method, starting with the interior dual solution u_i = 0, v_j = 0 and s_ij = c_ij (> 0) for each i and j. In this case the diagonal matrix D is an n^2 x n^2 matrix with diagonal entries D_ij = 1/s_ij^2 for i = 1, ..., n, j = 1, ..., n; and the matrix A has the partition (5) of section 3 with

    E = [ e^T  0   ...  0  ]
        [ 0    e^T ...  0  ]
        [ ...              ]     F = (I, I, ..., I),
        [ 0    0   ...  e^T ]

where e is an n-vector of all 1's, and I is the n x n identity matrix. Thus EDE^T and FDF^T of section 3 are diagonal matrices with positive diagonal entries. L_1 and L_3 are also diagonal matrices, with L_2 = FDE^T L_1^{-T}. The matrix N = L_3^{-1} L_2 L_2^T L_3^{-T} is an n x n symmetric positive semi-definite matrix with spectral radius less than 1; see, for example, Saigal [9].

Computational testing of the model is performed on three sets of randomly generated assignment problems, i.e., the positive constants c_ij are determined randomly. Each set consists of two similarly behaving problems, one of size 200 x 200 and the other 300 x 300. These problems are solved by the dual affine scaling method adopting the methodology of Saigal [9] presented above, and they present an increasing degree of difficulty to the affine scaling method and to the estimation model presented in this work. On the first set of problems, the
method converges to a solution on the boundary of the primal polyhedron with very few positive variables at value 1; on the second set the method converges to almost a vertex, with about 75% of the positive variables at value 1; and on the third set the method converges to a vertex, with all positive variables at value 1.

The "step-size" in the affine scaling method is given the following values: during the first 20 iterations the values, respectively, are

    0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.9, 0.90, 0.93, 0.93, 0.94, 0.95, 0.95, 0.97

and the step-size remains at 0.97 for the 21st and higher iterations. Thus the method approaches the boundary gradually. The matrix N for testing the model is generated by the affine scaling algorithm. The estimated solution ŷ generated by the model is not used in the method; instead (2) is solved by the technique proposed in Saigal [9]. This is done to guarantee that the same matrix N is used in the testing of different parameters. During each iteration of the affine scaling method, the model is tested on the resulting N with p = 2 and with p <= 14. In the latter case, the iterative method automatically increases p as more entries N^j q of the series are generated. With p = 2, the value t_1 = 5 is used (with a fixed s_1); for p <= 14, the s_i and t_i are given in the table below:

    i    1  2  3  4  5   6   7   8   9   10  11  12  13
    t_i  3  2  2  2  2   2   2   2   2   2   2   2   2
    s_i  1  4  6  8  10  12  14  16  18  20  22  24  26

The results of this test are presented in Tables 1 and 2. At each iteration of the dual affine scaling method, the matrix N is used to recursively generate Nq, ..., N^K q. This sequence is then used in the model to estimate the solution to the system (2). This system is considered adequately solved when either the error "ERROR" (reported in Tables 1 and 2) of this estimate satisfies

    ERROR < min{10^{-5}, ||q||_2 x 10^{-4}}

or N^{500} q has been generated. As can be verified from Tables 1 and 2, the problem complexity greatly affects the model performance. The first set of problems (converging to the interior of a face) is solved quickly; i.e., a good estimate of the solution is obtained for small K. Solving (2), for
example, requires (at iteration 14) 44 or 23 matrix multiplications for the 200 x 200 case and 44 or 24 for the 300 x 300 case, instead of the inversion of I - N. From the tables, it appears that the second class of problems is easier to solve than the third; and for these problems, the model performs well at the earlier and the later iterations. This limited experience suggests that the performance is enhanced when p is increased. It can be verified from Table 3 that the spectral radius of the matrix N approaches 1, especially for the harder problems, and that the model performance improves in the later iterations. This may be because the model performance is affected by the number of sets of "similar" eigenvalues, and not necessarily by the magnitude of the largest one. This phenomenon is under investigation. We expect this experience to carry over to the general problem as well. To date no computations that confirm this have been performed.

9 Acknowledgment

The author gratefully acknowledges many helpful discussions with Teresa Lam.

References

[1] E. R. Barnes, "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming 36 (1986) 174-182.

[2] S. C. Fang and S. Puthenpura, Linear Optimization and Extensions, Prentice Hall, in print.

[3] A. George and J. W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall, Englewood Cliffs, New Jersey (1981).

[4] N. Karmarkar, "A new polynomial time algorithm for linear programming," Combinatorica 4 (1984) 373-395.

[5] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point method for linear programming," in Progress in Mathematical Programming: Interior Point Methods,
edited by N. Megiddo, Springer Verlag, New York (1989) 29-48.

[6] L. S. Lasdon, Optimization Theory for Large Systems, The Macmillan Company, London (1970).

[7] S. Puthenpura, R. Saigal and L. P. Sinha, "Application of LQ factorization in implementing the Karmarkar algorithm and its variants," Technical Memorandum No. 51173-900205-01TM, AT&T Bell Laboratories (1990).

[8] T. J. Rivlin, The Chebyshev Polynomials, John Wiley and Sons, New York (1974).

[9] R. Saigal, "Matrix partitioning methods for interior point algorithms," Technical Report 92-39, Department of Industrial and Operations Engineering, University of Michigan, June 1992.

[10] R. Vanderbei, M. S. Meketon and B. A. Freedman, "A modification of Karmarkar's linear programming algorithm," Algorithmica 1 (1986) 395-407.

[11] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, 1965.

200 x 200 ASSIGNMENT PROBLEM Interior Facet Almost Vertex Vertex Iter no p ~=2p<14 p =2 p <14 p~=2p<14 Error K K Error Error K K Error Error K K Error 1I 2E0 4 4.23E-06.21E-07 5 5.21E-07.38E-05 7 7.38E-O5 2.66E-06 4 4.66E-06.87E-07 5 5.87E-07.15E-07 8 8.15E-07 3.22E-05 4 4.22E-05.66E-06 5 5.66E-06.84E-06 8 8.84E-06 4.17E-06 5 5.17E-06.85E-05 5 5.85E-05.1OE-05 10 10.1OE-05 5.30E-05 5 5.30E-05.25E-05 6 6.28E-05.1OE-05 11 11.10E-05 6.32E-05 6 6.34E-05.62E-O5 7 8.58E-OS.44E-06 16 16.44E-06 7.99E-06 8 9.21E-06.57E-05 9 10.14E-05.41E-05 19 19.41E-05 8.26E-05 9 9.48E-05.53E-05 12 12.20E-05.85E-05 26 21.35E-05 9.36E-05 11 11.22E-05.38E-05 17 14.87E-05.71E-05 41 33.5OE-05 10.31E-05 14 13.11E-05.92E-05 25 18.71E-05.95E-05 65 47.12E-05 11.442E-05 16 14.32E-05.76E-05 30 20.73E-05.92E-05 97 70.72E-05 12.16E-05 20 16.1OE-05.87E-05 43 23.91E-05.59E-05 117 79.50E-05 13.11E-05 25 18.64E-06.91E-05 58 29.98E-05.82E-05 183 110.75E-05 14.78E-07 44 23.12E-06.66E-05 127 61.89E-05.91E-05 475 367.86E-05 15.72E-08 19 15.70E-08.99E-05 268 136.1OE-04.20E-02 500 500.32E-03 16.95E-09 60 29.1OE-08.19E-05 282 147.18E-05.66E+00 500 500.52E-02 17.33E-10 18 15.18E-10.1 7E-05 500 500.67E-06.13E+00 500 500.25E-01 18.157E- 11 8 1 39.57E- 11.30E-05 500 500.12E-05-.37E-01 500 500.15E-01 19.19E-12 18 15.11E-12.85E-06 500 500.28E-05.40E-02 500 500.29E-02 20.22E-13 115 52.36E-13.62E-07 500 500.92E-06.90E-04 500 500.72E-04 2 21.13E-14 22 1 7.81 E- 15.20E-08 500 500.20E-08.31E.05 500 500.20E-05 2 2.23E- 15 215 95.23E- 15.81E-11 500 500.81E-11.29E-07 3 3.29E-07 23.60E- 17 18 15.43E-17.12E-11 177 79.l1E-11.42E-09 2 2.42E-09 24.85E-18 77 36.11E-17.-23E'-12 500 319.87E-13.16E-1I1 2 2 I.16E-I1l 25.60E- 14 187 86.60E-14.68E- 14 2 2.68E- 14 26 ______ ____.59E- 15 500 276.41E-15.31E-16 2 2.31E-16 27 ____.30E- 16 239 110.29E- 16.20E- 18 2 2.20E-18 28.24E- 17 410 210.25E- 17.62E-21 2 2.62E-21 29 _______.27E-23 2 2.27E-23 1 0 ~ ~ ~ ____ __ _____.20E-25 2 2.20E-25 3_ _ _ __ _ _ _ _ __ _ _ _ _ _ _ _ _ _ _ _ _.11E-27 2 2.11E-27 Table 1 24

_____ ~~~300 x 300 ASSIGNMENT PROBLEM___________ Interior Facet Almost Vertex Vertex Iter no p =2 p <l4 p =2 p <l4 p=2p< 14 Error K K Error Error K K Error Error K K Error 1.16E-06 4 4.16E-06.44E-07 5 5.44E-07.43E-05 6 6.43E-05 2.44E-06 4 4.44E-06.49E-07 5 5.49E-07.26E-06 7 7.26E-06 3.20E-05 4 4.20E-05.43E-06 5 5.43E-06.89E-07 9 9.89E-07 4.11E-06 5 5.11E-06.61E-05 5 5.61E-05.66E-08 11 10.44E-06 5.24E-05 5 5.24E-05.1OE-05 6 6.11E-05.23E-05 10 10.23E-05 6.11E-05 6 6.12E-05.67E-06 7 7.60E-05.39E-05 12 12.39E-05 7.86E-06 7 7.69E-05.67E-05 8 9.97E-06.47E-05 16 16.47E-05 8.42E-05 8 9.98E-06.51E-05 11 11.21E-05.38E-05 24 24.38E-05 9.49E-05 10 11.58E-06.91E-05 15 14.30E-05.35E-05 37 37.35E-05 10.52E-05 12 11.93E-05.58E-05 21 17.20E-05.81E-05 58 42.20E-05 11.44E-05 14 13.24E-05.69E-05 26 19.43E-05.98E-05 84 84.98E-05 12.35E-05 16 14.33E-05.99E-05 33 22.54E-05.69E-05 139 67.38E-0-5 13.20E-05 23 17.16E-05.81E-05 51 27.67E-05.83E-05 185 107.87E-05 14.42E-06 44 24.31E-06.82E-05 125 63.90E-05.99E-05 354 226.91E-05 15.81E-08 24 17.90E-08.77E-05 213 119.96E-05.51E-03 500 500.8E4 16.29E-08 100 36.26E-08.39E-05 488 288.38E-05.12E+00 500 500.46E-02 17.47E- 10 16 14.33E-10.37E-05 500 500.25E-05.14E+00 500 500.73E-01 18.14E-10 150 50.16E-10.52E-05 500 500.43E-05.89E-01 500 500.56E-01 19.39E- 12 2 1 17.12E-12.18E-05 500 500.15E-05.32E-01 500 500.22E-01 20.97E-13 138 57.91E-13.21E-06 500 500.57E-05.79E-02 500 500.52E-02 2 1.20E- 14 20 16.15E-14.16E-07 500 500.27E-06.27E-02 500 500.15E-02 22.40E- 15 116 57.35E- 15.39E-10 124 3 1.38E-10.20E-03 500 500.25E-03 23.24E- 16 4 1 23.23E- 16.30E- 11 77 28.38E- 11.11E-04 500 500.67E-05 24.21E-17 125 53.22E- 17.20E-12 41 2 1.13E-12.88E-07 500 500.34E-07 25.25E- 18 67 28.24E- 18.22E- 13 69 27.21E-13.27E-08 32 17.24E-08 26 ____ __.11E-14 77 29.71 E- 15.19E-10 53 1 7.14E- 10 27 ___.93E-16 49 23.27E-16.55E- 11 39 1 7.23E- 12 28 ___.51E-17 250 40.11E-17.83E- 12 6 1 19.99E —13 -29 ____ ___.73E- 18 96 33.41 E- 18.45E- 13 57 1 7.57E- 15 30 _____.44E- 14 77 19.37E- 14 3 1 ________.17E-15 43 18.67E- 18 32 __ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _.21E-16 37 17.15E- 18 33 _ ___ _ _ __ _ _ _ _ _ __ _ _ _ _ _.12E-17 71 1 7.57E- 19 Table 2 25

tC 0) | _______________THE LARGEST EIGENVALUE Interior Facet Almost Vertex Vertex iter 200 300 200 300 200 300 1.9901447238672399.9934978175014585.9896131723699559. 993234766Q942302.9902704011386622.934762989524124 2.9900961443084537.9934186912268195.9899056464756422.9933083337386703.9901099097716491.9933895689884312 3.9900607937971756.9933559054478737.9901720102742565.9933864437617831.9899756389655071.9933091899889467 4.9899965061222104.9932791318740413.9905098393465249.9935104212676482.9898290242692684.9932136733431952 5.9899036721613828.9931941062181071.9909246054695797.9936881369782937.9896816509502613.9931259144676824 6.9898362574474263.9931267666279024.9914143785352492.9939091182799246.9895654883729561.9930849464235598 7.9899194893548973.9930952597987705.9920528292478605.9941600282091113.9895965563950285.9931239002123157 8.9902630967647999.9931056428988306.9927454269019744.9944891617919949.9900471983168925.9932731057716612 9.9909347495951451.9938442214108793.9934552173991646.9949007307038975.9910334510805776.9935938447387953 10.991698350999346.9933668666248594.9942917996965845.9953666836700457.9924207814025372.9941748293727437 11.9923520122869073.9936250698922473.9950296766226088.9957957479586064.9937359111635224.9949332859291869 12.9930303653401724.9939487465725621.9957786450015449.9961745224941985.9949046663890485.9957201695856718 13.9935758356117154.9943471698211603.9963645109429915.9966200644015995.9960325015350879.9964116190494283 14.9952617651221247.9951934721853056.99770385857701.9978513542039164.9986825570584397.9982736953381484 15 1.9932782887906138.9945002638745208.9982106973541132.9986067117779875.9994308944679383.9987384902708429 16.9965906303594346.9958084656045198.9986350889093472.9990509284216897.9998158719261547.9993068865577555 17.9931044028591927.9942191630970714.9993053726008107.9993768099938902.9999507392517608.9996908839712081 18.9971093266767113.9966531473552512.9998011882172851.999656820722268.9999914233005139.9998352646075588 19.9922950323648496.9948769102561883.9999726712124946.99980850749491.9999975689126962.9998816294842967 20.9973853729691724.9967373895081607.9999978542185801.999939258173573.999999824948421.9999699825813522 21.9936907023585453.9935361068659556.9999998914042579.9999894909179856.9999999816879248.9999984634373612 22.9980711062108224.9969089383222424.9999999841440135.9999985974184675.9999999995858695.999999833594602 23.9911968175530437.9957296613353846.9999999997658635.9999999503334166.9999999999067528.9999999907302287 24.9969672707273868.9966921636270423.9999999999165852.9999999881180519.9999999999997641.9999999991419738 25.9968005968116425.9999999999985193.9999999999413154.9999999999998697.9999999999731211 26.999999999999502.9999999999677954.9999999999999836.9999999999951627 27.9999999999999911.9999999999964501.9999999999999946.9999999999999893 28.9999999999999983.9999999999998771.9999999999999996.9999999999999925 29.999999999999977 1.000000000000002*.9999999999999992 30 _________1.000000000000002* 1.000000000000003* 31 1.000000000000003* 1.000000000000004* 32 1.000000000000004* 33 ____________1.000000000000004* Table 1: * - No eigenvalue of N is larger than 1. This may be due to numerical problems encountered by EISPACK.