MATRIX PARTITIONING METHODS FOR INTERIOR POINT ALGORITHMS

Romesh Saigal
Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 92-39
June 1992

Abstract

We consider here a linear programming problem whose rows of the constraint matrix can be partitioned into two parts. Such natural partitions exist in several special linear programs, including the assignment problem, the transportation problem, the generalized upper bounded variable problem and the block diagonal linear program; such partitions can also be used to induce sparsity patterns in Cholesky factorizations. In this paper we propose a matrix partitioning method for interior point algorithms. The proposed method independently generates the Cholesky factorization of each part, and reduces the complexity to that of solving a generally dense linear system involving several rank one updates of the identity matrix. We propose solving this linear system by an inductive use of the Sherman-Morrison-Woodbury formula. The proposed method is easily implemented on a vector or parallel machine as well as on a distributed system. Such partitioning methods have been popular in the context of the simplex method, where the special structure of the basis matrix is exploited.

Key words: Linear programming, interior point methods, Cholesky factorizations, matrix partitioning methods, vector-parallel machines, distributed systems.

Abbreviated title: Matrix Partitioning Methods.

1 Introduction

We consider here the linear programming problem

    min  c^T x
    s.t. Ax = b
         x >= 0,

where A is an m x n matrix of rank m, b is an m-vector and c is an n-vector. We assume that the matrix A has the partition

    A = [ G ]        b = [ g ]
        [ H ] ,          [ h ] ,        (1)

where G and H are, respectively, m_1 x n and m_2 x n matrices, and g and h are, respectively, m_1- and m_2-vectors, with m_1 + m_2 = m. We also assume that such a partition is determined either by the structure of the problem or has been induced by sparsity considerations. Specially structured linear programs that have this natural partition include the assignment problem, the transportation problem, the generalized upper bounded variable problem and the block diagonal linear program. Such partitions may also be considered to arise in situations where rows are added, iteratively, to a linear program, and one of the sets of rows, G or H, may be considered to be the ones added.

Many variants of the simplex method exist which exploit the induced special structure of the basis matrix. We refer the reader to Lasdon [6] for a background on these variants. Our goal in this paper is to consider the application of the recent interior point methods to this partitioned linear program. The study of these methods was started by the seminal work of Karmarkar [4]. Some notable papers related to this work are Barnes [1], Kojima, Mizuno and Yoshise [5] and Vanderbei, Meketon and Freedman [8]. The reader is encouraged to browse through these, the recent book of Fang and Puthenpura [2], and the many references therein, for an introduction to these methods. Our starting point in this paper is the implementation of these methods, which requires the solution of a linear system

    Cz = d        (2)

where C = ADA^T for some diagonal matrix D, and d, as a function of A, b, c and D, is determined by the particular interior point method employed. In this paper, we reduce (2) to solving an m_2 x m_2 system of the form

    (I - EE^T) u = q        (3)

where E is an m_2 x m_1 matrix. If E_j is the jth column of the matrix E, it can be readily seen that

    I - EE^T = I - \sum_{j=1}^{m_1} E_j E_j^T,        (4)

and we view (4) as m_1 rank one updates to the identity matrix I. We solve the system (3) by an inductive version of the Sherman-Morrison-Woodbury formula. This inductive method requires O(m_1^2 m_2) multiplications, and is advantageous over inverting I - EE^T when m_2 > m_1. Another advantage of this method is its ready implementation in a vector and a parallel/distributed computing environment.

In section 2 we present the partial Cholesky factorization, the basic idea behind the partitioning technique; and in section 3 we present a technique to handle several rank one updates. In section 4 we present the variant that handles the transportation and assignment problems, in section 5 the GUB problem, and in section 6 the block diagonal linear program and the multicommodity flow problem. Finally, in section 7, we give some preliminary computational results comparing the transportation variant with LOQO, Vanderbei [9].

2 Partial Cholesky factorization of ADA^T

In this section we develop the general theory we will use to exploit the partition (1) of A. The main result we need for this purpose is the following theorem:

Theorem 1 Let C be an m x m symmetric positive definite matrix. Then there exists a unique m x m lower triangular matrix L with positive diagonal entries such that C = LL^T. In case C is dense, \frac{1}{6}(m^3 + 3m^2 - 9m) multiplications are required to compute L.

Proof: See George and Liu [3].
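To make Theorem 1 concrete, the following sketch (in Python with numpy/scipy, on hypothetical random data standing in for an actual interior point iteration) solves the system (2) by computing the Cholesky factor of C = ADA^T and performing one forward and one backward triangular solve; it is an illustration only, not the implementation used in this paper.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(0)
    m, n = 5, 12                                  # hypothetical sizes
    A = rng.standard_normal((m, n))               # full row rank (almost surely)
    D = np.diag(rng.uniform(0.1, 1.0, size=n))    # positive diagonal matrix
    d = rng.standard_normal(m)

    C = A @ D @ A.T                               # symmetric positive definite
    L = cholesky(C, lower=True)                   # C = L L^T (Theorem 1)
    y = solve_triangular(L, d, lower=True)        # forward solve:  L y = d
    z = solve_triangular(L.T, y, lower=False)     # backward solve: L^T z = y
    assert np.allclose(C @ z, d)                  # z solves (2)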

The matrix L in the above theorem is called the Cholesky factor of C. When C is sparse, we refer the reader to George and Liu [3], an excellent reference on the methodology that preserves the sparsity of L. We now use this theorem to exploit the partition (1) of the rows of A. Assuming this partition, it can be easily seen that

    ADA^T = [ GDG^T   GDH^T ]
            [ HDG^T   HDH^T ] .        (5)

Then we can prove:

Lemma 2 There exist a lower triangular m_1 x m_1 matrix L_1 and a lower triangular m_2 x m_2 matrix L_3 such that GDG^T = L_1 L_1^T and HDH^T = L_3 L_3^T.

Proof: Since A has full row rank, so do G and H. Since D has positive diagonal entries, GDG^T and HDH^T are symmetric positive definite matrices, and we have our result from theorem (1).

We now use the lower triangular matrices L_1 and L_3, guaranteed by lemma (2), for solving (2) when A has the partition (1). This is done in the next theorem, which generates a partial Cholesky factor of ADA^T.

Theorem 3 Let the m_1 x m_1 matrix L_1 and the m_2 x m_2 matrix L_3 be defined as in lemma (2), and let the m_2 x m_1 matrix L_2 and the m_2 x m_2 matrix \bar{D} be defined by

    \bar{D} = L_3 (I - L_3^{-1} L_2 L_2^T L_3^{-T}) L_3^T,        (6)

    L_1 L_2^T = GDH^T.        (7)

Then, with

    L = [ L_1   0 ]        \bar{D}' = [ I   0       ]
        [ L_2   I ] ,                 [ 0   \bar{D} ] ,

we have ADA^T = L \bar{D}' L^T.

Proof: Can be readily verified by a direct multiplication of L, \bar{D}' and L^T.

As a result of this theorem, the solution of the equation (2) can be expressed in terms of the matrices L_1, L_2 and L_3. This is done in the next result. Assume that the partitions

    z = [ z_1 ]        d = [ d_1 ]
        [ z_2 ] ,          [ d_2 ]

conform to the partition (5) of C (= ADA^T). Here, d_1, z_1 are m_1-vectors and d_2, z_2 are m_2-vectors.

Theorem 4 Let the m_2 x m_1 matrix E be defined so that L_3 E = L_2. The equation (2) can be solved by the following sequence of steps:

Step 1 Find \bar{z}_1 by the forward solve L_1 \bar{z}_1 = d_1.

Step 2 Define \bar{z}_2 = d_2 - L_2 \bar{z}_1.

Step 3 Find z_2' by the forward solve L_3 z_2' = \bar{z}_2.

Step 4 Solve

    (I - EE^T) z_2'' = z_2'.        (8)

Step 5 Find z_2 by the backward solve L_3^T z_2 = z_2''.

Step 6 Find z_1 by the backward solve L_1^T z_1 = \bar{z}_1 - L_2^T z_2.

Proof: Using the structure of L and \bar{D}' (of theorem (3)), the following equations can be readily derived:

    L_1 \bar{z}_1 = d_1,
    \bar{z}_2 = d_2 - L_2 \bar{z}_1,
    \bar{D} z_2 = \bar{z}_2,
    L_1^T z_1 = \bar{z}_1 - L_2^T z_2.

The theorem now follows from the structure (6) of \bar{D}.
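A minimal sequential sketch of the six steps above, again in Python with numpy/scipy on hypothetical data; here Step 4 is performed by a direct dense solve, which the inductive procedure of section 3 is designed to replace.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(1)
    m1, m2, n = 3, 4, 10                          # hypothetical partition sizes
    G = rng.standard_normal((m1, n))
    H = rng.standard_normal((m2, n))
    D = np.diag(rng.uniform(0.1, 1.0, size=n))
    d1, d2 = rng.standard_normal(m1), rng.standard_normal(m2)

    L1 = cholesky(G @ D @ G.T, lower=True)        # G D G^T = L1 L1^T (lemma 2)
    L3 = cholesky(H @ D @ H.T, lower=True)        # H D H^T = L3 L3^T (lemma 2)
    L2 = solve_triangular(L1, G @ D @ H.T, lower=True).T   # L1 L2^T = G D H^T, eq. (7)
    E = solve_triangular(L3, L2, lower=True)      # L3 E = L2

    zb1 = solve_triangular(L1, d1, lower=True)             # Step 1
    zb2 = d2 - L2 @ zb1                                    # Step 2
    z2p = solve_triangular(L3, zb2, lower=True)            # Step 3
    z2pp = np.linalg.solve(np.eye(m2) - E @ E.T, z2p)      # Step 4, dense for now
    z2 = solve_triangular(L3.T, z2pp, lower=False)         # Step 5
    z1 = solve_triangular(L1.T, zb1 - L2.T @ z2, lower=False)  # Step 6

    A = np.vstack([G, H])                         # check against a direct solve of (2)
    assert np.allclose(A @ D @ A.T @ np.concatenate([z1, z2]),
                       np.concatenate([d1, d2]))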

The result of theorem (4) requires the Cholesky factors of GDG^T and HDH^T, which are expected to be less dense than the factor of ADA^T. The price for this enhanced sparsity is paid at Step 4 of the theorem, where, in general, a dense system of equations has to be solved. In the next section, we will present a procedure that solves this system in O(m_1^2 m_2) multiplications. If this system were instead solved by Cholesky factorization, the calculation of EE^T would require m_1 m_2^2 multiplications, and the calculation of the Cholesky factor another \frac{1}{6}(m_2^3 + 3m_2^2 - 9m_2). In the case m_2 >= m_1, solving directly is thus more expensive. For a dense matrix C, system (2) can be solved, using Cholesky factors, in \frac{1}{6}(m^3 + 9m^2 - 3m) multiplications. Using the technique of this section with a partitioned but dense matrix A requires \frac{1}{6}(m_1^3 + 9m_1^2 - 3m_1 + m_2^3 + 9m_2^2 - 3m_2) multiplications plus the work of forming L_2, E and solving Step 4, and is not competitive; the advantage of the partition lies in the sparsity it preserves.

We now establish a basic property of the matrix EE^T encountered in Step 4.

Theorem 5 Let E be defined as in theorem (4), Step 4. The spectral radius of EE^T is less than 1.

Proof: Since ADA^T is positive definite, so is L \bar{D}' L^T (see theorem (3) for definitions), and thus \bar{D}' is positive definite. Using the structure of \bar{D}', \bar{D} is positive definite. Thus, from equation (6), z^T (I - EE^T) z > 0 for all z != 0. Thus z^T EE^T z < z^T z for all z != 0, and we have our result.

3 A method for a system with several rank one updates

In this section we develop an inductive version of the Sherman-Morrison-Woodbury formula for solving the system (8). For each j = 1, ..., m_1, let E_j be the jth column of the matrix E, and let E_{m_1+1} = z_2'.

Also, let B_1 = I, and for each k = 1, ..., m_1,

    B_{k+1} = B_k - E_k E_k^T,

and note that B_{m_1+1} = I - EE^T. Now, for each k = 1, ..., m_1 + 1 and j = 1, ..., m_1 + 1, define

    B_k E_j^{(k)} = E_j.

Thus, for each k = 1, ..., m_1 and j = k + 1, ..., m_1 + 1,

    B_{k+1} E_j^{(k+1)} = E_j

or

    (B_k - E_k E_k^T) E_j^{(k+1)} = E_j

or

    (I - E_k^{(k)} E_k^T) E_j^{(k+1)} = E_j^{(k)}.        (9)

Using the Sherman-Morrison-Woodbury formula, we can write the solution to equation (9) as

    E_j^{(k+1)} = E_j^{(k)} + \frac{< E_k, E_j^{(k)} >}{1 - < E_k, E_k^{(k)} >} E_k^{(k)},

where < . , . > is the usual inner product of two vectors. We can prove the following result about the above procedure:

Theorem 6 The solution z_2'' of the equation (8) is E_{m_1+1}^{(m_1+1)}.

Proof: This is readily seen since, by definition, B_{m_1+1} E_{m_1+1}^{(m_1+1)} = E_{m_1+1}; substituting the identities B_{m_1+1} = I - EE^T and E_{m_1+1} = z_2', we obtain the theorem.
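Before turning to the parallel organization, here is a sequential Python/numpy sketch of the recursion (9) on hypothetical data; the columns E_j^{(k)} are updated in place, and the last column, initialized to z_2', ends as the solution E_{m_1+1}^{(m_1+1)} of (8).

    import numpy as np

    def smw_inductive_solve(E, q):
        # Solve (I - E E^T) u = q by m1 successive rank one updates, eq. (9).
        m2, m1 = E.shape
        V = np.hstack([E, q.reshape(-1, 1)])      # column j of V holds E_{j+1}^(k)
        for k in range(m1):
            Ek = E[:, k]
            Vk = V[:, k].copy()                   # E_k^(k), fixed during pass k
            denom = 1.0 - Ek @ Vk                 # 1 - <E_k, E_k^(k)>
            for j in range(k + 1, m1 + 1):
                V[:, j] += (Ek @ V[:, j]) / denom * Vk
        return V[:, m1]                           # E_{m1+1}^{(m1+1)} = z_2''

    rng = np.random.default_rng(2)
    m1, m2 = 3, 6
    E = 0.1 * rng.standard_normal((m2, m1))       # scaled so that the spectral
    q = rng.standard_normal(m2)                   # radius of E E^T is below 1
    u = smw_inductive_solve(E, q)
    assert np.allclose((np.eye(m2) - E @ E.T) @ u, q)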

The above inductive procedure (on k) suggests the following algorithm in a vector and a distributed/parallel environment:

Step 1
  1. Communicate E_j to processor j, for j = 1, ..., m_1 + 1.
  2. At processor j, set E_j^{(1)} = E_j.
  3. Set k = 1.

Step 2 From processor k, communicate E_k^{(k)}, E_k and < E_k, E_k^{(k)} > to each processor j = k + 1, ..., m_1 + 1.

Step 3 At processor j, for j = k + 1, ..., m_1 + 1, compute

    E_j^{(k+1)} = E_j^{(k)} + \frac{< E_k, E_j^{(k)} >}{1 - < E_k, E_k^{(k)} >} E_k^{(k)}.

Step 4 Set k = k + 1. If k <= m_1, then go to Step 2; otherwise declare E_{m_1+1}^{(m_1+1)} as the solution z_2'' of equation (8).

A careful count of the number of multiplications needed is summarized in the following theorem:

Theorem 7 The number of multiplications required by the algorithm to solve equation (8) is m_1^2 m_2 + 2 m_1 m_2. In the vector and parallel environment, the number of vector operations required is 3 m_1.

Proof: Can be readily verified by a careful counting.

There is a considerable advantage in keeping m_1 < m_2 while solving equation (8), an m_2 x m_2 system, by the above strategy. Otherwise, it would require m_1 m_2^2 multiplications to obtain EE^T, \frac{1}{6}(m_2^3 + 3m_2^2 - 9m_2) to obtain the Cholesky factor of I - EE^T, and m_2(m_2 + 1) to solve the system using this factor. The method of this section is less advantageous if m_2 < m_1; to use it effectively in that case, the roles of m_1 and m_2 should be re-defined.

4 Transportation and Assignment Problem

The transportation problem is the following: given m supply depots, with s_i (> 0) units of supply of some good at depot i for each i = 1, ..., m; n demand centers, with d_j (> 0) units of demand of the good at center j for each j = 1, ..., n; and c_{ij} (> 0) the cost of shipping one unit of the good between depot i and center j for each i = 1, ..., m, j = 1, ..., n; find the least cost quantities of the good shipped between each depot and center.

We assume here that the transportation problem is balanced, i.e., \sum_{i=1}^m s_i = \sum_{j=1}^n d_j; there is just enough of the good available at the supply depots to meet the demand. In this case it is well known that the transportation problem has an optimal shipping schedule, and it can be found by solving the following transportation linear program:

    min  \sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij}
    s.t. \sum_{j=1}^n x_{ij} = s_i,    i = 1, ..., m
         \sum_{i=1}^m x_{ij} = d_j,    j = 1, ..., n
         x_{ij} >= 0,    i = 1, ..., m; j = 1, ..., n,

where x_{ij} is the number of units of the good shipped from depot i to center j, for each i and j.

The assignment problem is the following: given n individuals, each of whom can do any of n tasks, and a cost c_{ij} (> 0) of assigning individual i to task j for each i = 1, ..., n, j = 1, ..., n, find the least cost assignment of individuals to tasks. It is also required that each individual perform exactly one task, and that each task be performed by exactly one individual. This problem can be cast as a balanced transportation problem, with each individual associated with a supply depot having exactly one unit of supply, and each task with a demand center having exactly one unit of demand. The only difference is that the variables x_{ij} are required to take on values 0 or 1. As is well known, because of the total unimodularity property of the constraint matrix, setting s_i = 1 and d_j = 1 for each i = 1, ..., n and j = 1, ..., n and solving the transportation linear program finds the optimal assignment.

The dual of this linear program is the following:

    max  \sum_{i=1}^m s_i u_i + \sum_{j=1}^n d_j v_j
    s.t. u_i + v_j + s_{ij} = c_{ij}    for all i = 1, ..., m, j = 1, ..., n
         s_{ij} >= 0    for all i = 1, ..., m, j = 1, ..., n,

where u_i and v_j are the dual variables and s_{ij} are the dual slacks for each i and j.

The constraints of the (primal) linear program fall naturally into two sets, the supply constraints and the demand constraints. It is this partition that we will exploit in the algorithm. As is well known, the rank of the constraint matrix is m + n - 1, which is one less than the number of constraints. Thus one constraint must be discarded to ensure that the constraint matrix has full row rank. We will discard the supply constraint associated with depot m, and use the following constraint matrix:

    [ e^T  0    ...  0    0 ]
    [ 0    e^T  ...  0    0 ]
    [ .    .         .    . ]
    [ 0    0    ...  e^T  0 ]
    [ I    I    ...  I    I ]

where e^T = (1, 1, ..., 1) is an n-vector and I is the n x n identity matrix. Thus we define

    G = [ e^T  0    ...  0    0 ]
        [ 0    e^T  ...  0    0 ]
        [ .    .         .    . ]
        [ 0    0    ...  e^T  0 ] ,        H = ( I, I, ..., I ),

where G is an (m - 1) x mn matrix and H is an n x mn matrix. The diagonal matrix D has mn entries along the diagonal, and has the partition

    D = diag( D^{(1)}, D^{(2)}, ..., D^{(m)} ),

where D^{(i)} is an n x n diagonal matrix whose jth diagonal entry D_j^{(i)} is the diagonal entry of D associated with the variable x_{ij}. The exact form of this entry depends on the particular interior point algorithm implemented.
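The following sketch builds G, H and D for a small hypothetical transportation instance and confirms numerically the diagonal structure derived next; the sizes and the diagonal of D are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 4, 3                               # m depots, n demand centers
    e = np.ones(n)

    G = np.zeros((m - 1, m * n))              # supply rows, depot m discarded
    for i in range(m - 1):
        G[i, i * n:(i + 1) * n] = e           # e^T in the ith block of columns
    H = np.hstack([np.eye(n)] * m)            # H = (I, I, ..., I)

    D = np.diag(rng.uniform(0.1, 1.0, size=m * n))   # one entry per x_ij

    GDG = G @ D @ G.T                         # (m-1) x (m-1), diagonal
    HDH = H @ D @ H.T                         # n x n, the sum of the D^(i)
    assert np.allclose(GDG, np.diag(np.diag(GDG)))
    assert np.allclose(HDH, np.diag(np.diag(HDH)))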

It can now be readily confirmed that

    GDG^T = diag( e^T D^{(1)} e, e^T D^{(2)} e, ..., e^T D^{(m-1)} e )

is an (m - 1) x (m - 1) diagonal matrix, and

    HDH^T = \sum_{i=1}^m D^{(i)}

is an n x n diagonal matrix. Thus L_1 and L_3 are diagonal matrices, with the ith diagonal entry of L_1 being (e^T D^{(i)} e)^{1/2} = (\sum_{k=1}^n D_k^{(i)})^{1/2} and the jth diagonal entry of L_3 being (\sum_{i=1}^m D_j^{(i)})^{1/2}. Also,

    GDH^T = [ e^T D^{(1)}   ]
            [ e^T D^{(2)}   ]
            [ ...           ]
            [ e^T D^{(m-1)} ]

and

    L_2 = ( D^{(1)} e, D^{(2)} e, ..., D^{(m-1)} e ) L_1^{-1}.

Thus

    E = L_3^{-1} L_2 = L_3^{-1} ( D^{(1)} e, D^{(2)} e, ..., D^{(m-1)} e ) L_1^{-1},

which is an n x (m - 1) positive matrix.

Theorem 8 Each iteration of the interior point method requires nm^2 + 6nm + 2(m - n + 1) multiplications for the partitioned assignment problem.

Proof: Using the structure of L_1, L_2 and L_3, solving (8) by the method of section 3, and carefully counting the multiplications at each step, we get the theorem.
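Continuing in the same hypothetical setting, the factors and E follow from elementwise square roots and diagonal scalings; the array Dv below stores the diagonal of D^{(i)} in its ith row, so D^{(i)} e is simply Dv[i].

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 4, 3
    Dv = rng.uniform(0.1, 1.0, size=(m, n))   # Dv[i, j] = D_j^(i), invented data

    L1 = np.diag(np.sqrt(Dv[:m - 1].sum(axis=1)))   # ith entry (e^T D^(i) e)^(1/2)
    L3 = np.diag(np.sqrt(Dv.sum(axis=0)))           # jth entry (sum_i D_j^(i))^(1/2)
    L2 = Dv[:m - 1].T @ np.linalg.inv(L1)           # (D^(1)e, ..., D^(m-1)e) L1^{-1}
    E = np.linalg.inv(L3) @ L2                      # n x (m-1), entrywise positive
    assert (E > 0).all()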

5 Generalized Upper Bounding Problem

The generalized upper bounding (GUB) linear program is the following:

    minimize   c_0^T x_0 + c_1^T x_1 + c_2^T x_2 + ... + c_p^T x_p
    subject to A_0 x_0 + A_1 x_1 + A_2 x_2 + ... + A_p x_p = b
               e_1^T x_1                                   = 1
                          e_2^T x_2                        = 1
                                      ...
                                             e_p^T x_p     = 1
               x_0 >= 0, x_1 >= 0, x_2 >= 0, ..., x_p >= 0,

where, for each j = 0, ..., p, A_j is an m x n_j matrix and c_j and x_j are n_j-vectors; and, for each j = 1, ..., p, e_j is an n_j-vector of all ones. Transportation and assignment problems have this structure as well, but in important applications m << p (i.e., m is much less than p). We make the usual assumption that the constraint matrix has full row rank m + p. As is evident, the constraints of this problem have the natural partition (1) with

    G = ( A_0, A_1, ..., A_p )

and

    H = [ 0  e_1^T  0      ...  0     ]
        [ 0  0      e_2^T  ...  0     ]
        [ .  .      .           .     ]
        [ 0  0      0      ...  e_p^T ] ,

and the diagonal matrix D (corresponding to the partition of A) has the partition

    D = diag( D^{(0)}, D^{(1)}, ..., D^{(p)} ),        (10)

where D^{(j)} is an n_j x n_j diagonal matrix corresponding to the columns of x_j in A, for each j = 0, 1, ..., p.

Here

    GDG^T = \sum_{j=0}^p A_j D^{(j)} A_j^T

and

    HDH^T = diag( e_1^T D^{(1)} e_1, e_2^T D^{(2)} e_2, ..., e_p^T D^{(p)} e_p ),

a p x p diagonal matrix. Thus L_3 is a diagonal matrix, and L_1 is the Cholesky factor of the m x m matrix

    \sum_{j=0}^p A_j D^{(j)} A_j^T = L_1 L_1^T.

In applications, even when the A_j are sparse, we would expect their sum to be considerably more dense, and thus we expect L_1 to be relatively dense. Also,

    GDH^T = ( A_1 D^{(1)} e_1, A_2 D^{(2)} e_2, ..., A_p D^{(p)} e_p ).

Thus, if L_2^T = ( l_1, l_2, ..., l_p ), then L_1 l_j = A_j D^{(j)} e_j, and we can expect the p x m matrix L_2 to be dense. In addition, E = L_3^{-1} L_2.

Theorem 9 Each step of an interior point method requires 3pm^2 + 2m^3 + 2pm + 3m^2 - m + 2p multiplications when applied to the partitioned GUB problem.

Proof: Can be obtained by a careful count of the work at each step of the algorithm. The multiplications required to obtain the Cholesky factor L_1 are included in the above formula.
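A hypothetical numpy/scipy sketch of these GUB formulas: L_3 comes from the diagonal HDH^T at negligible cost, while L_1 requires a generally dense m x m Cholesky factorization; all sizes and data are invented.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(5)
    m, p = 4, 6
    nj = [5] * (p + 1)                                      # n_0, ..., n_p
    A = [rng.standard_normal((m, k)) for k in nj]           # A_0, ..., A_p
    Dj = [np.diag(rng.uniform(0.1, 1.0, size=k)) for k in nj]

    GDG = sum(A[j] @ Dj[j] @ A[j].T for j in range(p + 1))  # sum_j A_j D^(j) A_j^T
    L1 = cholesky(GDG, lower=True)                          # relatively dense factor
    L3 = np.diag(np.sqrt([np.trace(Dj[j]) for j in range(1, p + 1)]))  # e_j^T D^(j) e_j

    cols = []                                               # l_j with L1 l_j = A_j D^(j) e_j
    for j in range(1, p + 1):
        rhs = A[j] @ np.diag(Dj[j])                         # A_j D^(j) e_j
        cols.append(solve_triangular(L1, rhs, lower=True))
    L2 = np.column_stack(cols).T                            # p x m, generally dense
    E = solve_triangular(L3, L2, lower=True)                # L3 E = L2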

6 Block Diagonal Linear Program

The block diagonal linear program is the following:

    minimize   c_0^T x_0 + c_1^T x_1 + c_2^T x_2 + ... + c_p^T x_p
    subject to A_0 x_0 + \bar{A}_1 x_1 + \bar{A}_2 x_2 + ... + \bar{A}_p x_p = b_0
                         A_1 x_1                                             = b_1
                                       A_2 x_2                               = b_2
                                                     ...
                                                          A_p x_p            = b_p
               x_0 >= 0, x_1 >= 0, x_2 >= 0, ..., x_p >= 0,

where, for each j = 0, ..., p, \bar{A}_j is an m_0 x n_j matrix (with \bar{A}_0 = A_0) and c_j and x_j are n_j-vectors; and, for each j = 1, ..., p, A_j is an m_j x n_j matrix. Let m = m_0 and M = m_1 + ... + m_p. The constraints of this problem have the natural partition (1) with

    G = ( A_0, \bar{A}_1, ..., \bar{A}_p )

and

    H = [ 0  A_1  0    ...  0   ]
        [ 0  0    A_2  ...  0   ]
        [ .  .    .         .   ]
        [ 0  0    0    ...  A_p ] ,

and the diagonal matrix D has the partition (10). It can be verified that

    GDG^T = \sum_{j=0}^p \bar{A}_j D^{(j)} \bar{A}_j^T

and

    HDH^T = diag( A_1 D^{(1)} A_1^T, A_2 D^{(2)} A_2^T, ..., A_p D^{(p)} A_p^T )

is a block diagonal matrix. As in the GUB case, the factor L_1 of GDG^T will be relatively dense. L_3, the Cholesky factor of HDH^T, is block diagonal, and each diagonal factor can be computed independently.

When A_j is sparse, we can preserve the sparsity of L_{3j} (the Cholesky factor of A_j D^{(j)} A_j^T) by the techniques of sparse Cholesky factorization, George and Liu [3]. Also,

    GDH^T = ( \bar{A}_1 D^{(1)} A_1^T, ..., \bar{A}_p D^{(p)} A_p^T )

is an m x M matrix. If L_2^T = ( L_{21}^T, ..., L_{2p}^T ), then L_1 L_{2j}^T = \bar{A}_j D^{(j)} A_j^T, and we expect each L_{2j} to be, generally, dense. Let E^T = ( E^{(1)T}, E^{(2)T}, ..., E^{(p)T} ) conform to the partition of L_3. Then, for each j = 1, 2, ..., p,

    L_{3j} E^{(j)} = L_{2j},

and we can readily establish the following theorem:

Theorem 10 For a dense problem, it takes

    \frac{1}{6} \sum_{j=0}^p (m_j^3 + 3m_j^2 - 9m_j) + \frac{1}{2} M m (m + 1) + \frac{m}{2} \sum_{j=1}^p m_j (m_j + 1)

multiplications (with m_0 = m) to generate L_1, L_2, L_3 and E; and it takes

    M m^2 + 4 M m + m^2 + m + \sum_{j=1}^p m_j^2 + M

multiplications to solve the systems of theorem (4). If m_j = \bar{m} for all j = 1, ..., p, the above expressions reduce to

    \frac{1}{6}(m^3 + 3m^2 - 9m + p\bar{m}^3 + 3p\bar{m}^2 - 9p\bar{m}) + \frac{1}{2} p \bar{m} m (m + 1) + \frac{m}{2} p \bar{m} (\bar{m} + 1)

and

    p \bar{m} m^2 + 4 p \bar{m} m + m^2 + m + p \bar{m}^2 + p \bar{m},

respectively. Thus, in this case, the computations grow linearly in p.

Proof: This can be readily proved by a careful counting of the required multiplications.
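A sketch of the block diagonal case in the same invented numpy/scipy setting: each L_{3j} is factored independently (and could be handled by a separate processor), and E is assembled blockwise from L_{3j} E^{(j)} = L_{2j}.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(6)
    m, p = 3, 4                                   # m = m_0 coupling rows
    mj = [2, 3, 2, 3]                             # m_1, ..., m_p (invented)
    nj = [4, 5, 4, 5]                             # n_1, ..., n_p
    A0 = rng.standard_normal((m, 4))              # block of x_0
    D0 = np.diag(rng.uniform(0.1, 1.0, size=4))
    Abar = [rng.standard_normal((m, k)) for k in nj]               # Abar_1, ..., Abar_p
    Ablk = [rng.standard_normal((r, k)) for r, k in zip(mj, nj)]   # A_1, ..., A_p
    Dj = [np.diag(rng.uniform(0.1, 1.0, size=k)) for k in nj]

    GDG = A0 @ D0 @ A0.T + sum(Ab @ Dk @ Ab.T for Ab, Dk in zip(Abar, Dj))
    L1 = cholesky(GDG, lower=True)                # relatively dense m x m factor
    L3 = [cholesky(Aj @ Dk @ Aj.T, lower=True)    # independent block factors
          for Aj, Dk in zip(Ablk, Dj)]

    E_blocks = []
    for Ab, Aj, Dk, L3j in zip(Abar, Ablk, Dj, L3):
        L2jT = solve_triangular(L1, Ab @ Dk @ Aj.T, lower=True)     # L1 L2j^T = Abar_j D^(j) A_j^T
        E_blocks.append(solve_triangular(L3j, L2jT.T, lower=True))  # L3j E^(j) = L2j
    E = np.vstack(E_blocks)                       # M x m, M = m_1 + ... + m_p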

One specially structured and important problem which shares the block diagonal structure is the multicommodity flow problem. In its node-arc formulation, p is the number of commodities, \bar{A}_i = I for each i = 0, ..., p, and A_i = R, where R is a node-arc incidence matrix of the directed network through which these commodities flow. Thus

    GDG^T = \sum_{j=0}^p D^{(j)}

is diagonal, and the sparsity pattern of A_i D^{(i)} A_i^T = R D^{(i)} R^T is the same for each i = 1, ..., p. Also,

    \bar{A}_i D^{(i)} A_i^T = D^{(i)} R^T.

Thus

    L_{2i}^T = ( \sum_{j=0}^p D^{(j)} )^{-1/2} D^{(i)} R^T

and

    L_{3i} E^{(i)} = R D^{(i)} ( \sum_{j=0}^p D^{(j)} )^{-1/2}.        (11)

L_{3i} is the Cholesky factor of R D^{(i)} R^T and, for each i, has the same sparsity pattern as that of the factor of RR^T. It is readily confirmed that, in the notation of George and Liu [3], the graph associated with RR^T is the unordered network on which the commodities flow. This facilitates considerably the use of their techniques for generating the sparse Cholesky factors L_{3i}. Since each column of R has only two nonzero entries, E^{(i)} will be sparse, and will have the same sparsity pattern for each i.

For a network with m nodes, n arcs and p commodities, L_1 is an n x n diagonal matrix; for each i = 1, ..., p, L_{3i} is an m x m matrix and L_{2i} is m x n; and E is pm x n. Also, assume that it takes \theta multiplications to obtain a sparse Cholesky factor of RR^T, and note that \theta <= \frac{1}{6}(m^3 + 3m^2 - 9m); and let \eta be the number of nonzero elements in this factor. Then the following can be shown:

Theorem 11 Given L_1, L_2, L_3 and E, it takes pmn^2 + 4pn + 2n + p\eta multiplications to solve all the systems of theorem (4).

7 Computational Experience

In this section, we present some computational experience of solving assignment problems by the procedure suggested here and by the state-of-the-art code LOQO, Vanderbei [9].

LOQO is a state-of-the-art interior point code based on the primal-dual homotopy method, and implements a predictor-corrector strategy for tracing the path of centers. The per iteration times of this code are compared with the per iteration times of a specialized transportation code implementing the dual affine scaling strategy. The LOQO per iteration times may be a little larger because of the step size selection in the predictor-corrector strategy. The per iteration times and their ratio are given in the table below. We point out that, for these special problems, the specialized version was about 3 to 4 times faster than the general purpose code LOQO.

                        Assignment                   LOQO                    Ratio
    Problem  Size       iter  time*   time/iter      iter  time    time/iter  LOQO/Asgn
    1        200x200    14    20.78   1.48           15    75.72   5.048      3.41
    2        200x200    17    25.09   1.476          17    85.30   4.911      3.33
    3        200x200    21    31.02   1.477          20    96.61   4.830      3.27
    4        300x300    14    65.72   4.69           18    301.07  16.72      3.565
    5        300x300    19    88.80   4.67           19    314.16  16.53      3.540
    6        300x300    21    98.06   4.67           21    418.5   16.86      3.61

    * This time is the total time for all the iterations, excluding the input/output time. All times are in seconds.

The three problems of size 200x200, and the three of size 300x300, were generated randomly, and present increasing difficulty to interior point methods. On the first problem, these methods converge to a solution in the interior of a face, with very few variables at value 1. On the second problem, they converge to a solution with more than 75% of the variables at value 1, while on the third problem they converge to a vertex, with all variables at value 1. For the first problem, LOQO found a solution to an accuracy of eight significant figures; for the second and third, it was unable to find this accurate a solution, finding one to 7 digits of accuracy.

References

[1] E. R. Barnes, "A variation on Karmarkar's algorithm for solving linear programming problems," Mathematical Programming 36 (1986) 174-182.

[2] S. C. Fang and S. Puthenpura, Linear Optimization and Extensions, Prentice Hall, in print.

[3] A. George and J. W. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall, Englewood Cliffs, New Jersey (1981).

[4] N. Karmarkar, "A new polynomial time algorithm for linear programming," Combinatorica 4 (1984) 373-395.

[5] M. Kojima, S. Mizuno and A. Yoshise, "A primal-dual interior point method for linear programming," in Progress in Mathematical Programming: Interior-Point and Related Methods, edited by N. Megiddo, Springer Verlag, New York (1989) 29-48.

[6] L. S. Lasdon, Optimization Theory for Large Systems, The Macmillan Company, London (1970).

[7] R. Saigal, "An infinitely summable series implementation of interior point methods," Technical Report 92-37, Department of Industrial and Operations Engineering, University of Michigan, May 1992.

[8] R. J. Vanderbei, M. S. Meketon and B. A. Freedman, "A modification of Karmarkar's linear programming algorithm," Algorithmica 1 (1986) 395-407.

[9] R. J. Vanderbei, LOQO User's Manual, Program in Statistics and Operations Research, Princeton University, Princeton, NJ 08544, June 1992.