NEW ITERATIVE METHODS FOR LINEAR INEQUALITIES

Kai Yang
Department of Industrial & Operations Engineering
University of Michigan

Technical Report 90-6
February 1990

NEW ITERATIVE METHODS FOR LINEAR INEQUALITIES

Kai Yang
February 1990

Abstract

Two classes of new iterative methods for solving systems of linear inequalities are considered in this report. In the first class of methods, a group of violated constraints is identified and a surrogate constraint is derived by taking a convex combination of these violated constraints. Each iterate is obtained by making an orthogonal projection from the current iterative solution onto the surrogate constraint. The second class of methods is a modified Newton's method that can find a least squares solution of a system of linear inequalities. These methods can be implemented very efficiently, especially when the systems are large and sparse. Convergence proofs are provided.

Key words: Linear inequalities, Surrogate constraint, Iterative methods, Least squares solution, Modified Newton's method.

1. INTRODUCTION

We are concerned with a system of linear inequalities:

    Ax ≤ b                                                                          (1.1)

where A is an m x n matrix and b is an m-vector. We are interested in finding a feasible solution x for system (1.1). Solving a system of linear inequalities is a fundamental problem in linear optimization and it has many applications. These applications include linear programming [22] and, in particular, the problem of image reconstruction from projections [7]. The image reconstruction problem arises in a large number of scientific fields. In medical science, computerized tomography reconstructs images of cross-sections of the human body by processing data obtained from measuring the attenuation of X-rays along a large number of lines through the cross-section. Other image reconstruction problems include remote sensing [13], seismic tomography [3] and industrial nondestructive testing.

There are basically two approaches which could be used to solve the system (1.1). The first approach is to transform the linear inequality system (1.1) into a linear programming problem and then use well-established methods such as Karmarkar's method [21] or the simplex method to solve the resulting linear program. Usually this approach is not applicable to image reconstruction problems due to the following two difficulties. The first difficulty is the special environment within which the problem has to be solved. This environment is characterized by:
(i) the immense dimension of the system (1.1); for example, n > 10^5 and m is even greater;
(ii) the sparsity of the A matrix; usually fewer than 1% of the entries of A are nonzero, and the

sparsity structure of A varies greatly from matrix to matrix, so it is impossible to take advantage of any structural pattern of sparsity;
(iii) the poor conditioning of the matrix A;
(iv) the restrictions on computer power; usually image reconstruction problems are solved by small, dedicated on-site computers and the objective is to obtain approximate results in a relatively short period of time.
Most linear programming algorithms require matrix manipulations such as factorization, projection, etc. Since it is difficult to take advantage of the sparsity pattern of the A matrix, sparse factoring techniques cannot be used. Using conjugate gradient type methods to do the matrix manipulations is also not an option, since it is very difficult to find a preconditioning technique to preprocess this kind of A matrix. Another difficulty is that system (1.1) may be inconsistent. It is then often desirable to find a least squares solution [16], but most linear programming algorithms cannot be used to find least squares solutions.

The second approach involves using iterative methods to solve system (1.1). Most of the iterative methods are derived from the relaxation method for linear inequalities (Agmon, Motzkin and Schoenberg [1954]) [1][20] and Kaczmarz's method [17] for linear equations. There are numerous special implementations of the relaxation method which are called Algebraic Reconstruction Techniques (ART) [7][9]. Another method is derived from Cimmino's algorithm [10] for linear equations. Y. Censor and T. Elfving [1982] [8] and A.R. De Pierro and A.N. Iusem [1985] [11] developed Cimmino type algorithms for linear inequalities. Other iterative methods for linear inequalities include Mangasarian's SOR type methods [18]. These iterative methods always work with the original data and most of them do not need matrix manipulations. The basic computation steps in iterative methods are extremely simple and easy to program. Because of these advantages, all linear

inequality solvers which are used in image reconstruction are iterative methods. However, the rates of convergence of these iterative algorithms are usually slow [25] and, therefore, it often takes hours to solve a single image reconstruction problem.

In this report we propose two classes of new iterative algorithms for solving (1.1) which are sparsity preserving and have finite convergence properties. The report is divided into two parts and each part describes one class of methods. The first class of methods is called 'surrogate constraint methods'. There are three different surrogate constraint methods in this report designed to tackle image reconstruction problems. The second class of methods is a revised version of S. P. Han's method for finding the least squares solutions of linear inequalities [16], which belongs to the class of modified Newton's methods [24].

In Section 2 we introduce the surrogate constraint methods and define the notation used in PART I. Sections 3, 4 and 5 describe the first, second and third surrogate constraint algorithms, respectively. Section 6 presents the geometric interpretation of the surrogate algorithms and Section 7 presents their extension to linear equations. Section 8 introduces the second class of methods and defines the notation used in PART II. Section 9 describes the method. Convergence proofs and geometric interpretations for this method are presented in Section 10 and some computational results are presented in Section 11. Some theorems and their proofs are summarized in the Appendix.

PART I  THE SURROGATE CONSTRAINT METHODS

2. INTRODUCTION TO SURROGATE CONSTRAINT METHODS

In their classical papers Agmon [1] and Motzkin and Schoenberg [20] introduced the relaxation method to solve a system of linear inequalities. The method is called the 'relaxation' method because the constraints in system (1.1) are considered one at a time. At each iteration a violated constraint is identified and an orthogonal projection is made onto this constraint from the current iterative solution (a one-line sketch of this step is given at the end of this section). So it is also called the 'successive orthogonal projection' method. Bregman [1965, 1967] [5][6], Eremin [1965] [12] and Gubin et al. [1967] [15] extended this 'successive orthogonal projection' method to general convex feasibility problems.

Making an orthogonal projection onto a single linear constraint is computationally inexpensive. However, when solving a huge system of linear inequalities, which may have hundreds of thousands of constraints, considering only one constraint at a time would lead to slow convergence. Instead, we would like to process a group of constraints at a time. But making an orthogonal projection onto a group of constraints is computationally expensive. In PART I we will introduce a class of methods which are able to process a group of violated constraints at a time but retain the same computational simplicity as the relaxation method. This class of methods is named the 'surrogate constraint method' since at each iteration a group of violated constraints is identified and a 'surrogate constraint' is derived from those constraints. An orthogonal projection is then made onto this surrogate constraint from the current iterative solution and the process is repeated until a feasible solution is found.
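For reference, the single-constraint relaxation step described above can be written as a short numpy sketch (the rows of A are assumed normalized as in Section 2; the function name and tolerance handling are ours, not from the original papers):

```python
import numpy as np

def relaxation_step(A, b, x, i, lam=1.0):
    """Project x toward the single violated constraint A_i x <= b_i (0 < lam < 2)."""
    residual = A[i] @ x - b[i]
    if residual <= 0:
        return x                                   # constraint i is already satisfied
    return x - lam * residual * A[i] / (A[i] @ A[i])
```

The surrogate constraint methods below replace the single row A_i by a convex combination of all rows violated at x.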

In order to describe the surrogate constraint methods precisely, it is necessary to define some notation and list the assumptions made for PART I.

Assumptions and Notation

Notation:
A_i : the i-th row of matrix A.
a_ij : the (i,j)-th element of matrix A.
b_i : the i-th element of the vector b.
K = {x | Ax ≤ b}, the feasible solution set of (1.1).

Assumptions:
‖A_i‖_2 = 1 for all i = 1,...,m.
System (1.1) is feasible, i.e. K ≠ ∅, in Sections 3, 4 and 5.

Additional Notation:
I ⊆ {1,...,m} is an index set.
H_i = {x | A_i x = b_i}, where i ∈ {1,...,m}.
K_i = {x | A_i x ≤ b_i}, where i ∈ {1,...,m}.
K_I = ∩_{i∈I} K_i. K_I is a convex set and K_I ≠ ∅, since K_I ⊇ K.
d(x, H_i) = the minimum Euclidean distance from x to H_i.
d(x, K_i) = the minimum Euclidean distance from x to K_i. Note that d(x, K_i) = 0 if x ∈ K_i; otherwise d(x, K_i) = d(x, H_i).
Φ(x) = sup_{i∈{1,...,m}} d(x, K_i).
d(x, K) = the minimum Euclidean distance from x to K.

Here we define the length of the binary encoding of all the problem data in (1.1) as:

    L = Σ_i Σ_j log(|a_ij| + 1) + Σ_i log(|b_i| + 1) + log(nm) + 2                  (2.1)

In Sections 3, 4 and 5 the a_ij and b_i are required to be integers. This is not a serious restriction even if we require ‖A_i‖_2 = 1 for all i, since we can simply determine L first and then rescale the problem. Two lemmas which will be used in the convergence proofs are as follows.

LEMMA 2.1. If the system (1.1) is feasible, then there is a feasible solution x̂ with |x̂_j| ≤ 2^L/(2n), j = 1,...,n.

LEMMA 2.2. If the system

    A_i x ≤ b_i + 2^{-L}     (i = 1,...,m)                                          (2.2)

has a solution, then system (1.1) is feasible.

For the proofs of these lemmas, see [14][22]. It is well known that if we are able to find a solution x for system (2.2), then there exists a procedure of polynomial complexity which can construct a feasible solution for (1.1) from x. So, theoretically, if an algorithm is able to find a solution for (2.2) in a finite number of steps, then the algorithm is also a finite algorithm for solving the linear inequality system (1.1) [22].

3. THE FIRST ALGORITHM (Basic Surrogate Constraint Method)

The first algorithm is called the 'basic surrogate constraint method'. At each iteration all violated constraints are identified and a 'surrogate constraint' is derived by

taking a convex combination of these violated constraints. An orthogonal projection is made onto the surrogate constraint from the current iterative solution and this process is repeated until a feasible solution is found.

Additional Notation
I = I(x) = {i | A_i x - b_i > 2^{-L}}.
A_I = (A_i)_{i∈I}, b_I = (b_i)_{i∈I}.
|I(x)| is the cardinality of the index set I(x); it is assumed that |I(x)| = q.
π = (π_1,...,π_q) is a weight vector consisting of positive real numbers satisfying π_i > 0 for all i = 1,...,q and Σ_{i=1}^q π_i = 1.
Surrogate Constraint: a surrogate constraint for the system A_I x ≤ b_I is a constraint of the form π A_I x ≤ π b_I.
Surrogate Hyperplane: H_s = {x | π A_I x = π b_I}.

3.1. Algorithm 1 (Basic Surrogate Constraint Method)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: If A x^k ≤ b is satisfied, stop; x^k is a solution of system (1.1).
Step 2: Determine the index set I = I(x^k) and select a weight vector π. Compute

    x^{k+1} = x^k - λ ((π A_I x^k - π b_I) / ‖π A_I‖^2) (π A_I)^T                    (3.1)

where 0 < λ < 2. Set k = k+1 and go to Step 1.
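A minimal numpy sketch of one iteration of Algorithm 1 is given below (dense A for clarity; the weight vector π is supplied by the caller through pi_of, see Remark 3.2 for recommended choices). The tolerance tol plays the role of 2^{-L} and is our assumption, as are the function names.

```python
import numpy as np

def surrogate_step(A, b, x, pi_of, lam=1.0, tol=1e-8):
    """One iteration of Algorithm 1: project onto the surrogate hyperplane, update (3.1)."""
    residual = A @ x - b
    I = np.where(residual > tol)[0]        # indices of violated constraints
    if I.size == 0:
        return x, True                     # x satisfies Ax <= b (up to tol)
    pi = pi_of(residual[I])                # weights: pi_i > 0, sum(pi) = 1
    s = pi @ A[I]                          # surrogate row  pi A_I
    viol = pi @ residual[I]                # pi A_I x - pi b_I  (> 0)
    return x - lam * viol * s / (s @ s), False
```

With, for example, pi_of = lambda r: np.full(r.size, 1.0 / r.size) this uses equal weights; repeating the call until the second return value is True realizes Algorithm 1.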

Remark 3.1. Clearly, at every iteration of this surrogate constraint algorithm, if the relaxation parameter λ is equal to 1, then the new point x^{k+1} is the orthogonal projection of the current point x^k onto the surrogate hyperplane H_s. (See FIG. 3.1.)

[FIG. 3.1. Illustration of the Surrogate Constraint Method]

Remark 3.2. (Recommended choices of the π vector):
(1) Weight by error. The quantity r_i = A_i x^k - b_i denotes the Euclidean distance from the current point x^k to K_i, for each i ∈ I(x^k). Since a larger r_i corresponds to greater infeasibility with respect to K_i, it may be desirable to make π_i proportional to r_i. Therefore the following formula for computing the π vector is recommended:

    π_i = r_i / Σ_{i∈I} r_i ;   clearly π_i > 0 and Σ π_i = 1.

(2) Weight equally.

    π_i = 1/q .

(3) Convex combination of the two choices:

    π_i = μ (r_i / Σ_{i∈I} r_i) + (1-μ)(1/q),  where 0 ≤ μ ≤ 1

(we write μ for the mixing parameter to avoid confusion with the relaxation parameter λ).
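In code, the three recommended choices of the π vector can be written as follows (a sketch; r is the vector of violations of the constraints in I(x^k), and mu plays the role of the mixing parameter μ above). Any of these functions can serve as the pi_of argument of the sketch after Algorithm 1.

```python
import numpy as np

def weights_by_error(r):            # (1): proportional to the violations r_i = A_i x - b_i
    return r / r.sum()

def weights_equal(r):               # (2): uniform over the q violated constraints
    return np.full(r.size, 1.0 / r.size)

def weights_mixed(r, mu=0.5):       # (3): convex combination of (1) and (2), 0 <= mu <= 1
    return mu * weights_by_error(r) + (1.0 - mu) * weights_equal(r)
```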

3.2. Convergence Results

DEFINITION 3.1. A sequence {x^k}_{k=1}^∞ is called strictly Fejér-monotone with respect to the set K if, for every x ∈ K,

    ‖x^{k+1} - x‖ < ‖x^k - x‖   for all k ≥ 0.                                      (3.2)

It is easy to check that every Fejér-monotone sequence is bounded.

THEOREM 3.2. Any sequence {x^k}_{k=1}^∞ generated by Algorithm 1 is strictly Fejér-monotone with respect to K, provided that x^k ∉ K for all k ≥ 0.

PROOF: For any x ∈ K define e^k = x^k - x. Then

    e^{k+1} = e^k - λ ((π A_I x^k - π b_I) / ‖π A_I‖^2) (π A_I)^T

and

    ‖e^{k+1}‖^2 = ‖e^k‖^2 + λ^2 (π A_I x^k - π b_I)^2 / ‖π A_I‖^2 - 2λ ((π A_I x^k - π b_I) / ‖π A_I‖^2) (π A_I) e^k
                = ‖e^k‖^2 + λ^2 (π A_I x^k - π b_I)^2 / ‖π A_I‖^2
                  - 2λ ((π A_I x^k - π b_I) / ‖π A_I‖^2) (π A_I x^k - π b_I - π A_I x + π b_I)
                = ‖e^k‖^2 - λ(2-λ) (π A_I x^k - π b_I)^2 / ‖π A_I‖^2
                  + 2λ (π A_I x^k - π b_I)(π A_I x - π b_I) / ‖π A_I‖^2 .

Since π A_I x^k > π b_I and π A_I x ≤ π b_I, we have

    2λ (π A_I x^k - π b_I)(π A_I x - π b_I) / ‖π A_I‖^2 ≤ 0.

Therefore it follows that

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) (π A_I x^k - π b_I)^2 / ‖π A_I‖^2 < ‖e^k‖^2 .     (3.3)

Q.E.D.

Recall that Φ(x) = sup_{i∈{1,...,m}} d(x, K_i).

THEOREM 3.3. Any sequence {x^k}_{k=1}^∞ generated by Algorithm 1 has the property

    lim_{k→∞} Φ(x^k) = 0.

PROOF: Fejér-monotonicity implies that the sequence {‖e^k‖}_{k=1}^∞ is monotonically decreasing, and thus converges. It then follows from (3.3) that

    lim_{k→∞} (π A_I x^k - π b_I) = 0.

Since π_i > 0 for all i, this implies that

    lim_{k→∞} (A_i x^k - b_i) = 0   for all i ∈ I.

Therefore lim_{k→∞} Φ(x^k) = 0.   Q.E.D.

LEMMA 3.4. If K ≠ ∅ and the sequence {x^k}_{k=1}^∞ satisfies the conditions
(i) {x^k}_{k=1}^∞ is Fejér-monotone with respect to K, and
(ii) lim_{k→∞} Φ(x^k) = 0,
then lim_{k→∞} x^k = x ∈ K.

PROOF: Follows from Lemma 5 and Lemma 6 of Gubin et al. [15].

THEOREM 3.5. Algorithm 1 converges to a point x ∈ K in a finite number of iterations.

PROOF: It follows from Theorem 3.2, Theorem 3.3 and Lemma 3.4 that Algorithm 1 converges to a point x ∈ K. From Lemma 2.1 it follows that ‖e^0‖ ≤ 2^{L-1}/√n. If at iteration k we have A_i x^k - b_i ≤ 2^{-L} for all i ∈ {1,...,m}, then from Lemma 2.2 the system is feasible and we have found a solution for (2.2); a solution for (1.1) can then be constructed from x^k by a polynomial procedure. So assume that I(x^k) ≠ ∅ for every k. Since A_i x^k - b_i > 2^{-L} for every i ∈ I(x^k), it follows that

    π A_I x^k - π b_I ≥ (Σ_{i=1}^q π_i) · 2^{-L} = 2^{-L}

and

    ‖π A_I‖ ≤ ‖π‖_2 ‖A_I‖_2 ≤ ‖π‖_2 ‖A_I‖_F = (Σ_{i=1}^q π_i^2)^{1/2} (Σ_{i∈I} Σ_{j=1}^n |a_ij|^2)^{1/2} ≤ √q ,

where ‖A_I‖_2 = sup_x ‖A_I x‖_2 / ‖x‖_2 and ‖A_I‖_F is the Frobenius norm of A_I,

    ‖A_I‖_F = (Σ_{i∈I} Σ_{j=1}^n |a_ij|^2)^{1/2} .

It follows that

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) 2^{-2L}/q .

Therefore, Algorithm 1 converges within k steps, where

    k ≤ (2^{2L-2}/n - 2^{-2L}) / (λ(2-λ) 2^{-2L}/q) ≤ q·2^{4L-2} / (λ(2-λ) n) .

Q.E.D.

4. THE SECOND ALGORITHM (Sequential Surrogate Constraint Method)

In many applications requiring the solution of linear inequalities, the coefficient matrices of (1.1), that is, the A matrices, are often very large (m and n are of the order of 10^5 or more) and sparse (less than 1% of the entries are nonzero). If the system

(1.1) is to be solved by an on-site computer, working on the whole matrix A is almost impossible. So it is preferable to work on one small subset of the constraints of (1.1) at a time. Specifically, the matrix A can be partitioned into p submatrices, and the right-hand-side vector b can be partitioned into p subvectors, as follows:

    A = (A^1; A^2; ...; A^p)   and   b = (b^1; b^2; ...; b^p)                       (4.1)

where A^i is a matrix with m_i rows and n columns and b^i has m_i rows, for i = 1 to p, and Σ_{i=1}^p m_i = m. Now we will show that the surrogate constraint method can be used to solve the system (1.1) by successively processing the subproblems A^i x ≤ b^i, i = 1 to p, in cyclic order.

4.1 Notation

I_i = I_i(x) = {j | A^i_j x - b^i_j > 2^{-L}}, for i = 1 to p.
q_i = |I_i|, the cardinality of I_i.
π^i = (π^i_1,...,π^i_{m_i}) for i = 1 to p, where π^i_j > 0 if j ∈ I_i, π^i_j = 0 if j ∉ I_i, and Σ_{j∈I_i} π^i_j = 1 for all i = 1 to p.
The surrogate constraint of the i-th subproblem is π^i A^i x ≤ π^i b^i, and the surrogate hyperplane of the i-th subproblem is π^i A^i x = π^i b^i.

4.2 Algorithm 2 (Sequential Surrogate Constraint Method)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: Do i = 1 to p while I_i(x^k) ≠ ∅ for at least one i ∈ {1,...,p}.
   If the index set I_i = I_i(x^k) ≠ ∅ for subproblem i, then select a weight vector π^i and compute

       x^{k+1} = x^k - λ d^k ,   where   d^k = ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) (π^i A^i)^T          (4.2)

   and 0 < λ < 2.
   Else, if the index set I_i = I_i(x^k) = ∅ for subproblem i, then x^{k+1} = x^k.
   Endif
   Set k = k+1; next i; continue.
Step 2: If I_i(x^k) = ∅ for all the subproblems A^i x ≤ b^i, i = 1 to p, then x^k is a feasible solution; stop.

[FIG. 4.1. Diagram of the Sequential Surrogate Constraint Method]
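In code, the sequential method sweeps cyclically over the p row blocks, applying the same surrogate projection to each block that has violated constraints. A sketch, with blocks a list of (A^i, b^i) pairs and equal weights used for simplicity (one of the choices of Remark 3.2); names and the tolerance are our assumptions:

```python
import numpy as np

def sequential_sweep(blocks, x, lam=1.0, tol=1e-8):
    """One cycle of Algorithm 2 over the p subproblems A^i x <= b^i."""
    all_satisfied = True
    for A_i, b_i in blocks:
        residual = A_i @ x - b_i
        I = np.where(residual > tol)[0]
        if I.size == 0:
            continue                               # this subproblem is already satisfied
        all_satisfied = False
        pi = np.full(I.size, 1.0 / I.size)         # equal weights on the violated rows
        s = pi @ A_i[I]
        viol = pi @ residual[I]
        x = x - lam * viol * s / (s @ s)           # update (4.2)
    return x, all_satisfied
```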

4.3. Convergence

LEMMA 4.1. Let Γ ⊂ R^n be the half space corresponding to the inequality c^T x ≤ c^T c_0. Let z be an element of R^n such that z ∉ Γ and whose projection onto Γ coincides with c_0. Then the inequality ‖y - z_λ‖ ≤ ‖y - z‖ is satisfied, where z_λ = z - λ(z - c_0) for 0 < λ < 2 and y is an arbitrary element of Γ.

PROOF: See [1][12].

[FIG. 4.2. Illustration of Lemma 4.1]

COROLLARY 4.2. Let Γ^i be the half space corresponding to the inequality π^i A^i x ≤ π^i b^i = π^i A^i (x^k - d^k), for each i = 1 to p in Algorithm 2. Let x_λ = x^k - λ d^k, where 0 < λ < 2. Then ‖y - x_λ‖ ≤ ‖y - x^k‖ for all y ∈ Γ^i.

LEMMA 4.3. If Γ^i = {x | π^i A^i x ≤ π^i b^i = π^i A^i (x^k - d^k)}, then K ⊆ Γ^i.

PROOF: Recall that K = {x | Ax ≤ b}. Clearly, for every x ∈ K, π^i A^i x ≤ π^i b^i has to be satisfied for any i = 1 to p. Hence the result follows.

[FIG. 4.3. Interpretation of Lemma 4.3]

THEOREM 4.4. For every x ∈ K, ‖x - x^{k+1}‖ ≤ ‖x - x^k‖ in Algorithm 2.

PROOF: It follows immediately from Corollary 4.2 and Lemma 4.3.

THEOREM 4.5. Any sequence {x^k}_{k=1}^∞ generated by Algorithm 2 has the property

    lim_{k→∞} Φ(x^k) = 0.

PROOF: Theorem 4.4 implies that the sequence {‖x^k - x‖}_{k=1}^∞ is monotonically decreasing, and thus converges. It follows from (4.2) that lim_{k→∞} (π^i A^i x^k - π^i b^i) = 0 for each subproblem A^i x ≤ b^i, i = 1 to p. Since for each subproblem π^i_j is strictly positive for all j ∈ I_i, this implies that lim_{k→∞} (A^i_j x^k - b^i_j) = 0 for all j ∈ I_i and for all subproblems A^i x ≤ b^i, i = 1 to p. Therefore lim_{k→∞} Φ(x^k) = 0.

Q.E.D.

THEOREM 4.6. Algorithm 2 converges to a point x ∈ K in a finite number of iterations.

PROOF: It follows from Theorem 4.4, Theorem 4.5 and Lemma 3.4 that Algorithm 2 converges to a point x ∈ K. Define e^k = x^k - x, where x ∈ K; from Lemma 2.1 it follows that ‖e^0‖ ≤ 2^{L-1}/√n. If I_i(x^k) ≠ ∅ for the i-th subproblem, then

    e^{k+1} = e^k - λ ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) (π^i A^i)^T

and

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) (π^i A^i x^k - π^i b^i)^2 / ‖π^i A^i‖^2 .

After a complete scan of all p subproblems, if I_i(x^k) = ∅ for all i = 1 to p, then x^k is a feasible solution for the problem and we are done. So we assume that for each complete scan of all p subproblems there exists at least one subproblem i, i ∈ {1,...,p}, such that I_i(x^k) ≠ ∅, and we assume that max_i |I_i(x^k)| = q. It follows that

    π^i A^i x^k - π^i b^i ≥ (Σ_j π^i_j) · 2^{-L} = 2^{-L}

and

    ‖π^i A^i‖ ≤ ‖π^i‖_2 ‖A^i‖_2 ≤ ‖π^i‖_2 ‖A^i‖_F = (Σ_j (π^i_j)^2)^{1/2} (Σ_{j∈I_i} Σ_{l=1}^n |a_jl|^2)^{1/2} ≤ √q .

Hence

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) 2^{-2L}/q   for every p iterations.

Therefore, Algorithm 2 converges within k steps, where

    k ≤ p (2^{2L-2}/n - 2^{-2L}) / (λ(2-λ) 2^{-2L}/q) ≤ p·q·2^{4L-2} / (λ(2-λ) n) .

Q.E.D.

5. THE THIRD ALGORITHM (Parallel Surrogate Constraint Method)

The surrogate constraint method can also be implemented to work on ALL of the subproblems A^i x ≤ b^i, i = 1 to p, of (1.1) SIMULTANEOUSLY. This is particularly suitable for computers with parallel processors.

5.1 Algorithm 3 (Parallel Surrogate Constraint Method)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: Do while I_i(x^k) ≠ ∅ for at least one i ∈ {1,...,p}.
   For all i = 1 to p:
   If I_i(x^k) ≠ ∅, then determine the index set I_i = I_i(x^k), select a weight vector π^i, and define

       P_i(x^k) = x^k - d^k_i ,   where   d^k_i = ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) (π^i A^i)^T       (5.1)

   Else (I_i(x^k) = ∅), define P_i(x^k) = x^k.
   Endif
   Compute P_i(x^k) for all i = 1 to p simultaneously on parallel processors.
   Define

       P(x^k) = Σ_{i=1}^p τ_i P_i(x^k)   and   P_λ(x^k) = x^k + λ (P(x^k) - x^k),

   where Σ_{i=1}^p τ_i = 1, τ_i ≥ 2^{-L} > 0 for all i such that I_i(x^k) ≠ ∅, and 0 < λ < 2.
   Compute x^{k+1} = P_λ(x^k).
Step 2: If I_i(x^k) = ∅ for all the subproblems A^i x ≤ b^i, i = 1 to p, then x^k is a feasible solution; stop.

[FIG. 5.1. Diagram of the Parallel Surrogate Constraint Method]
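A sketch of one parallel iteration follows: each block's surrogate projection P_i(x^k) can be computed independently (for example on separate processors), and the results are then convex-combined. Here the combination weights τ are taken uniform over the active blocks and the names are ours, both assumptions:

```python
import numpy as np

def parallel_step(blocks, x, lam=1.0, tol=1e-8):
    """One iteration of Algorithm 3: combine the per-block surrogate projections."""
    projections = []
    for A_i, b_i in blocks:                        # independent; parallelizable
        residual = A_i @ x - b_i
        I = np.where(residual > tol)[0]
        if I.size == 0:
            continue
        pi = np.full(I.size, 1.0 / I.size)
        s = pi @ A_i[I]
        projections.append(x - (pi @ residual[I]) * s / (s @ s))   # P_i(x)
    if not projections:
        return x, True                             # all subproblems satisfied
    P = np.mean(projections, axis=0)               # uniform tau over the active blocks
    return x + lam * (P - x), False                # x_{k+1} = x^k + lam (P(x^k) - x^k)
```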

5.2. Convergence

THEOREM 5.1. For every x ∈ K, ‖x^{k+1} - x‖ ≤ ‖x^k - x‖ in Algorithm 3.

PROOF: We have

    x^{k+1} = P_λ(x^k) = x^k - λ Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) (π^i A^i)^T ,

where the sum extends over the subproblems with I_i(x^k) ≠ ∅ (for the others P_i(x^k) = x^k). Let e^k = x^k - x. Then

    e^{k+1} = e^k - λ Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) (π^i A^i)^T ,

and it is clear that

    ρ_i := τ_i (π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2 ≥ 0   for all i = 1 to p.

Let Γ^T = (ρ_1 π^1, ..., ρ_p π^p) ∈ R^{1×m}. Then (e^{k+1})^T = (e^k)^T - λ Γ^T A and

    ‖e^{k+1}‖^2 = ‖e^k‖^2 + λ^2 Γ^T A A^T Γ - 2λ Γ^T A e^k
                = ‖e^k‖^2 + λ^2 ‖Γ^T A‖^2
                  - 2λ Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) [π^i A^i (x^k - x)]
                = ‖e^k‖^2 + λ^2 ‖Γ^T A‖^2
                  - 2λ Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) [π^i (A^i x^k - b^i)]

                  + 2λ Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i) / ‖π^i A^i‖^2) [π^i (A^i x - b^i)]
                ≤ ‖e^k‖^2 + (λ^2 - 2λ) Σ_{i=1}^p τ_i (π^i A^i x^k - π^i b^i)^2 / ‖π^i A^i‖^2              (5.2)
                ≤ ‖e^k‖^2 - λ(2-λ) ‖Γ^T A‖^2                                                              (5.3)
                < ‖e^k‖^2 ,

where the first inequality uses π^i A^i x ≤ π^i b^i, and the last two use λ^2 - 2λ < 0 together with

    ‖Γ^T A‖^2 = ‖Σ_{i=1}^p τ_i ((π^i A^i x^k - π^i b^i)/‖π^i A^i‖^2) (π^i A^i)‖^2 ≤ Σ_{i=1}^p τ_i (π^i A^i x^k - π^i b^i)^2 / ‖π^i A^i‖^2 ,

which follows from the convexity of ‖·‖^2.

Q.E.D.

THEOREM 5.2. Any sequence {x^k}_{k=1}^∞ generated by Algorithm 3 has the property

    lim_{k→∞} Φ(x^k) = 0.

PROOF: Theorem 5.1 implies that the sequence {‖x^k - x‖}_{k=1}^∞ is monotonically decreasing, and thus converges. It follows from (5.2), since τ_i > 0 for all i such that I_i(x^k) ≠ ∅, that lim_{k→∞} (π^i A^i x^k - π^i b^i) = 0 for each subproblem A^i x ≤ b^i, i = 1 to p. Since for each subproblem π^i_j is strictly positive for all j ∈ I_i, this implies that lim_{k→∞} (A^i_j x^k - b^i_j) = 0 for all j ∈ I_i and for all subproblems A^i x ≤ b^i, i = 1 to p. Therefore lim_{k→∞} Φ(x^k) = 0.

Q.E.D.

THEOREM 5.3. Algorithm 3 converges to a point x ∈ K in a finite number of iterations.

PROOF: It follows from Theorem 5.1, Theorem 5.2 and Lemma 3.4 that Algorithm 3 converges to a point x ∈ K. Define e^k = x^k - x, where x ∈ K; from Lemma 2.1 it follows that ‖e^0‖ ≤ 2^{L-1}/√n. If I_i(x^k) ≠ ∅ for at least one subproblem, say the i-th subproblem, then from (5.2) and (5.3) we get

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) ‖Γ^T A‖^2
               ≤ ‖e^k‖^2 - λ(2-λ) τ_i (π^i A^i x^k - π^i b^i)^2 / ‖π^i A^i‖^2 .

If I_i(x^k) = ∅ for all subproblems i = 1 to p, then x^k is a feasible solution for the problem and we are done. So we assume that there exists at least one subproblem i, i ∈ {1,...,p}, such that I_i(x^k) ≠ ∅. Again, A^i_j x^k - b^i_j > 2^{-L} for all j ∈ I_i(x^k), and we assume that max_i |I_i(x^k)| = q. Then

    π^i A^i x^k - π^i b^i ≥ (Σ_j π^i_j) · 2^{-L} = 2^{-L}

and

    ‖π^i A^i‖ ≤ ‖π^i‖_2 ‖A^i‖_2 ≤ ‖π^i‖_2 ‖A^i‖_F = (Σ_j (π^i_j)^2)^{1/2} (Σ_{j∈I_i} Σ_{l=1}^n |a_jl|^2)^{1/2} ≤ √q .

It follows, using τ_i ≥ 2^{-L}, that

    ‖e^{k+1}‖^2 ≤ ‖e^k‖^2 - λ(2-λ) 2^{-3L}/q   for every p iterations.

Therefore, Algorithm 3 converges within k steps, where

    k ≤ p·q (2^{2L-2}/n - 2^{-2L}) / (λ(2-λ) 2^{-3L}) ≤ p·q·2^{5L-2} / (λ(2-λ) n) .

Q.E.D.

Remark 5.1. It is preferable to choose the τ_i in the following way: if I_i(x^k) = ∅, then τ_i = 0; else τ_i > 0 for all i with I_i(x^k) ≠ ∅ and Σ_{i=1}^p τ_i = 1; and if I_i(x^k) = ∅ for all i = 1 to p, stop.

6. GEOMETRIC INTERPRETATIONS OF THE SURROGATE CONSTRAINT METHODS

It is clear from Lemma 4.3 that the surrogate hyperplanes H_s in Algorithms 1, 2 and 3 are actually separating hyperplanes which separate the current iterative solution x^k from the feasible solution set K. So the surrogate constraint methods can also be called 'successive separating hyperplane projection' methods. It is clear from Theorem 4.4 that all surrogate constraint methods possess the Fejér-monotone property, which means

    d(x^{k+1}, K) ≤ d(x^k, K)   at every iteration.

In Algorithm 1 (the basic surrogate constraint method) all violated constraints are identified and the resulting surrogate constraint contains the information of all violated constraints. In Algorithm 2 (the sequential surrogate constraint method) only a subset I ⊆ {1,...,m} of constraints is visited, so the resulting surrogate hyperplane H_s is actually a separating hyperplane which separates K_I from x^k, where K_I ⊇ K. In general, the smaller the subproblem, that is, the smaller |I|, the less information is contained in the surrogate hyperplane and the smaller the improvement at every iteration of the surrogate constraint method, and vice versa. (See FIG. 6.1.) On the other hand, the smaller the subproblem, the less computational work is needed to construct a

surrogate constraint (including scanning for and identifying violated constraints), and hence the cheaper every iteration of the surrogate constraint method becomes.

[FIG. 6.1. Comparison of the Surrogate Hyperplanes Generated by Algorithm 1 and Algorithm 2]

The rate of convergence of all surrogate constraint methods depends on the choice of the π vector. A 'better' π vector can give a larger improvement at each iteration of the surrogate constraint method. (See FIG. 6.2.)

[FIG. 6.2. The Effect of Different Choices of the π Vector on the Performance of the Surrogate Constraint Methods; (a) π vector chosen by 'weight by error', (b) π vector chosen by 'weight equally']

The geometric interpretation of Algorithm 3 (the parallel surrogate constraint method) is summarized in FIG. 6.3. In this example three surrogate hyperplanes are derived from three subproblems and orthogonal projections are made simultaneously onto these three surrogate hyperplanes. The three projection points are denoted by the three vertices of the shaded triangle. The new iterative solution x^{k+1} will be somewhere inside the triangle, depending on the choice of the τ vector.

[FIG. 6.3. Geometric Interpretation of Algorithm 3 (Parallel Surrogate Constraint Method)]

Now let us compare the surrogate constraint methods, that is, Algorithms 1, 2 and 3 of this report, with the relaxation method for solving linear inequalities. The relaxation method is also called the 'successive orthogonal projection method': at each iteration an orthogonal projection is made from the current iterative solution x^k onto an individual convex set K_i. However, K_i only contains the information of one constraint. Sometimes the projection onto K_i offers little improvement in reducing the distance from the iteration point x^k to the set K. On the other hand, the surrogate hyperplane contains the information of more than one violated constraint, so it generates a better new iterative solution than that of the relaxation method. (See FIG. 6.4.)

[FIG. 6.4. Comparison of the Surrogate Constraint Method with the Relaxation Method]

We could also try to make successive orthogonal projections from the current point x^k onto the set K_I, where I is an index set and K_I = ∩_{i∈I} K_i denotes a group of constraints, with ∩_I K_I = K. When |I| = 1, K_I denotes one constraint, and the successive orthogonal projection approach is equivalent to the relaxation method. If |I| > 1, then K_I is a polyhedron. In this case, making an orthogonal projection from the current point x^k onto the set K_I amounts to finding the nearest point of the polyhedron K_I to the current iterative solution x^k, which is expensive even when |I| = 2. In the surrogate constraint methods, instead of making orthogonal projections onto the set K_I, the projection is made onto the separating hyperplane H_s. Projections onto hyperplanes are easy to calculate, and the computational work for constructing the hyperplanes H_s is very small.

Cimmino's method for linear inequalities identifies all violated constraints. Orthogonal projections are made simultaneously onto all violated constraints from the current iterative solution, and then a convex combination of those projection points becomes the new iterative solution. (See FIG. 6.5.)

[FIG. 6.5. Geometric Interpretation of Cimmino's Method]

Cimmino's method can be implemented on computers with parallel processors. However, when dealing with huge systems of linear inequalities with many violated constraints, making projections onto all violated constraints is expensive and time consuming. On the other hand, in the parallel surrogate constraint method the number of subproblems can be chosen by the user, and only one projection is made onto the surrogate hyperplane of each subproblem. Thus, the amount of computational work is much less than that of Cimmino's method.

7. EXTENSIONS TO LINEAR EQUATIONS

It is very easy to modify Algorithms 1, 2 and 3 to solve a system of linear equations

    Ax = b                                                                          (7.1)

by replacing (7.1) with the following equivalent system of linear inequalities

    Ax ≤ b,   -Ax ≤ -b                                                              (7.2)

and applying Algorithms 1, 2 and 3 to (7.2).
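Equivalently, in code one can simply stack the two inequality systems and hand the result to any of the inequality solvers sketched above (a minimal sketch; the function name is ours):

```python
import numpy as np

def as_inequalities(A, b):
    """Rewrite Ax = b as the stacked inequality system (7.2):  [A; -A] x <= [b; -b]."""
    return np.vstack([A, -A]), np.concatenate([b, -b])
```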

Many of the classical iterative methods, such as the successive approximation method, the Gauss-Seidel method, the SOR method, and the steepest descent method, may not always converge for an arbitrary coefficient matrix A. Some methods require A to be positive definite or diagonally dominant; otherwise those methods would have to be applied to the normal equations A^T A x = A^T b. In the case of successive approximations, convergence requires that the spectral radius of an approximation matrix be less than one. The surrogate constraint methods, on the other hand, only require that the system (7.1) be feasible. This is one advantage of the surrogate constraint methods over the classical iterative methods. The surrogate constraint methods for solving linear equations are presented as follows.

Algorithm 1.1 (Surrogate Constraint Method for Linear Equations)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: If |A x^k - b| ≤ ε holds componentwise, where ε is a predetermined small positive number, stop; x^k is an (approximate) solution of system (7.1).
Step 2: Otherwise, partition the rows of the matrix A into three sets, A_I, A_II and A_III, such that

    |A_I x^k - b_I| ≤ ε ,
    A_II x^k < b_II - ε ,
    A_III x^k > b_III + ε .

Select two sets of weight vectors, π^II and π^III, such that

    π^II_i ≥ 0,  π^III_i ≥ 0,  and  Σ_{i∈II∪III} (π^II_i + π^III_i) = 1 .

The surrogate constraint for this problem is

    π^III A_III x - π^II A_II x ≤ π^III b_III - π^II b_II .

Then compute

    x^{k+1} = x^k - λ ((π^III A_III x^k - π^II A_II x^k - π^III b_III + π^II b_II) / ‖π^III A_III - π^II A_II‖^2) (π^III A_III - π^II A_II)^T

where 0 < λ < 2. Set k = k+1 and go to Step 1.

Algorithm 2.1 (Sequential Surrogate Constraint Method for Linear Equations)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: Partition the rows of the matrix A^i into three sets, A^i_I, A^i_II and A^i_III, for all i = 1 to p, such that

    |A^i_I x^k - b^i_I| ≤ ε ,

    A^i_II x^k < b^i_II - ε ,
    A^i_III x^k > b^i_III + ε .

Select two sets of weight vectors, π^{i,II} and π^{i,III}, such that

    π^{i,II}_j ≥ 0,  π^{i,III}_j ≥ 0,  and  Σ_{j∈II∪III} (π^{i,II}_j + π^{i,III}_j) = 1 .

The surrogate constraint for the i-th subproblem is

    π^{i,III} A^i_III x - π^{i,II} A^i_II x ≤ π^{i,III} b^i_III - π^{i,II} b^i_II .

Define I_i(x^k) = II ∪ III for all i = 1 to p.

Step 2: Do i = 1 to p while I_i(x^k) ≠ ∅ for at least one i ∈ {1,...,p}.
   If the index set I_i = I_i(x^k) ≠ ∅ for subproblem i, then select the weight vectors and compute

       x^{k+1} = x^k - λ d^k ,

   where

       d^k = ((π^{i,III} A^i_III x^k - π^{i,II} A^i_II x^k - π^{i,III} b^i_III + π^{i,II} b^i_II) / ‖π^{i,III} A^i_III - π^{i,II} A^i_II‖^2) (π^{i,III} A^i_III - π^{i,II} A^i_II)^T

   and 0 < λ < 2.
   Else, if the index set I_i = I_i(x^k) = ∅ for subproblem i, then x^{k+1} = x^k.
   Endif
   Set k = k+1; next i; continue.

Step 3: If I_i(x^k) = ∅ for all the subproblems A^i x = b^i, i = 1 to p, then x^k is a solution; stop.

Algorithm 3.1 (Parallel Surrogate Constraint Method for Linear Equations)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: Partition the rows of the matrix A^i into three sets, A^i_I, A^i_II and A^i_III, for all i = 1 to p, such that

    |A^i_I x^k - b^i_I| ≤ ε ,
    A^i_II x^k < b^i_II - ε ,
    A^i_III x^k > b^i_III + ε .

Select two sets of weight vectors, π^{i,II} and π^{i,III}, such that

    π^{i,II}_j ≥ 0,  π^{i,III}_j ≥ 0,  and  Σ_{j∈II∪III} (π^{i,II}_j + π^{i,III}_j) = 1 .

The surrogate constraint for the i-th subproblem is

    π^{i,III} A^i_III x - π^{i,II} A^i_II x ≤ π^{i,III} b^i_III - π^{i,II} b^i_II .

Define I_i(x^k) = II ∪ III for all i = 1 to p.

Step 2: Do while I_i(x^k) ≠ ∅ for at least one i ∈ {1,...,p}.
   For all i = 1 to p:

   If I_i(x^k) ≠ ∅, then determine the index set I_i = I_i(x^k), select the weight vectors, and define

       P_i(x^k) = x^k - d^k_i ,

   where

       d^k_i = ((π^{i,III} A^i_III x^k - π^{i,II} A^i_II x^k - π^{i,III} b^i_III + π^{i,II} b^i_II) / ‖π^{i,III} A^i_III - π^{i,II} A^i_II‖^2) (π^{i,III} A^i_III - π^{i,II} A^i_II)^T .

   Else (I_i(x^k) = ∅), define P_i(x^k) = x^k.
   Endif
   Compute P_i(x^k) for all i = 1 to p simultaneously on parallel processors.
   Define

       P(x^k) = Σ_{i=1}^p τ_i P_i(x^k)   and   P_λ(x^k) = x^k + λ (P(x^k) - x^k),

   where Σ_{i=1}^p τ_i = 1, τ_i ≥ 2^{-L} > 0 for all i such that I_i(x^k) ≠ ∅, and 0 < λ < 2.
   Compute x^{k+1} = P_λ(x^k).

Step 3: If I_i(x^k) = ∅ for all the subproblems A^i x = b^i, i = 1 to p, then x^k is a solution; stop.

PART II  THE LEAST SQUARES SOLUTION OF LINEAR INEQUALITIES

8. INTRODUCTION AND NOTATION

If the feasibility of the system Ax ≤ b is unknown, it is often desirable to find a vector x that satisfies the system in the least squares sense. In other words, we are interested in a vector x that minimizes the quantity ‖(Ax - b)+‖_2, where (Ax - b)+ is the m-vector whose i-th element is max{(Ax - b)_i, 0}. The definition of the least squares solution of linear inequalities and the necessary and sufficient condition for a vector x to be a least squares solution are given as follows.

DEFINITION 8.1. A least squares solution x of Ax ≤ b is a vector which minimizes

    f(x) = (1/2)(Ax - b)+^T (Ax - b)+ = (1/2)‖(Ax - b)+‖^2 .

LEMMA 8.2. x is a least squares solution of the system Ax ≤ b if and only if

    A^T (Ax - b)+ = 0 .                                                             (8.1)

PROOF: It follows directly from the facts that f(x) is differentiable and convex and that its gradient is ∇f(x) = A^T (Ax - b)+ [16].   Q.E.D.

In order to find a least squares solution for system (1.1), we can simply try to use an unconstrained minimization method to minimize the function

    f(x) = (1/2)(Ax - b)+^T (Ax - b)+ = (1/2)‖(Ax - b)+‖^2 .
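As a concrete illustration, a minimal numpy sketch that evaluates f(x) and the optimality test of Lemma 8.2 (the function names are ours):

```python
import numpy as np

def f(A, b, x):
    """Least squares objective f(x) = 0.5 * ||(Ax - b)+||^2."""
    r = np.maximum(A @ x - b, 0.0)                 # (Ax - b)+
    return 0.5 * r @ r

def grad_f(A, b, x):
    """Gradient of f:  A^T (Ax - b)+  (Lemma 8.2)."""
    return A.T @ np.maximum(A @ x - b, 0.0)

# x is a least squares solution exactly when grad_f(A, b, x) = 0.
```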

However, the function f(x) is not twice differentiable and the powerful Newton's methods are not directly applicable. One approach is to solve the above problem iteratively, using at any iteration point x the function

    f̃(x) = (1/2)(A_I x - b_I)^T (A_I x - b_I) = (1/2)‖A_I x - b_I‖^2

as a substitute for f(x), where I is an index set and I = I(x) = {i | A_i x ≥ b_i}. Clearly, f̃(x) is twice differentiable and Newton's method may be applicable. This approach has been studied by S. P. Han [16]. Though iterative in nature, Han's method is a very fast method for linear inequalities. When applied to linear inequalities with a full dimensional feasible set, that is, Dim(K) = n, it often produces a solution in a very small number of iterations. The number of iterations required to find a solution is almost independent of the dimension of the problem. Some computational results on Han's method are summarized in Section 11. Han's method is also a finite iterative method; it will find a least squares solution for system (1.1) in a finite number of iterations even if the entries of the matrix A or the vector b are real numbers [16].

The major problem with Han's method is the substitution of f̃(x) for f(x). Han's method finds the new iterative solution by taking a full Newton step from the original iterative solution x in the Newton direction for f̃(x). However, since f̃(x) equals f(x) only in a neighborhood of the original iterative solution x, many originally satisfied constraints in A_J x ≤ b_J may be violated at the new iterative solution, where J is an index set and J = {1,...,m}\I. This hampers the performance of Han's method when the feasible set is lower dimensional, that is, when Dim(K) < n.

One possible alternative is to treat A_J x ≤ b_J as a constraint set for f̃(x). A 'barrier' function B(J,x) would be designed to measure the violation of those constraints. It is then possible to minimize the function T(x) = f̃(x) + B(J,x) by Newton's method. This is the basic idea behind the fourth algorithm (the Revised Han's Method).

Notation and Assumptions

(A_i x - b_i)+ = max{A_i x - b_i, 0}; (Ax - b)+ = the m-vector whose i-th element is (A_i x - b_i)+.
f(x) = (1/2)(Ax - b)+^T (Ax - b)+ = (1/2)‖(Ax - b)+‖^2.
I = I(x) = {i | A_i x ≥ b_i}.
J = J(x) = {i | A_i x < b_i} = {1,...,m}\I.
|I| = cardinality of I; |J| = cardinality of J, and we assume that |J| = q.
A_I = {A_i}_{i∈I}, A_J = {A_i}_{i∈J}, b_I = {b_i}_{i∈I}, b_J = {b_i}_{i∈J}.
W = diag(w_i) ∈ R^{q×q}, where w_i ≥ 0 for all i.
H_I = ∩_{i∈I} H_i. H_I is a convex set if the system A_I y = b_I is consistent; otherwise H_I = ∅.
d_min = min_{i∈I} d(x, H_i).
d(x, H_I) = inf ‖x̂ - x‖_2, where x̂ is a least squares solution of the system A_I y = b_I.
d(x, K_I) = the minimum Euclidean distance from x to K_I.

9. THE REVISED HAN'S METHOD

Algorithm 4 (Revised Han's Method)

Initialize: Set x^0 = 0 ∈ R^n and k = 0; start.
Step 1: If (Ax^k - b)+ = 0, then x^k is a feasible solution; stop. If A^T(Ax^k - b)+ = 0, then x^k is a least squares solution of the system Ax ≤ b; stop. Otherwise, go to Step 2.
Step 2: Detect I = I(x^k) and J = J(x^k), and solve the system

    A_I y = b_I                                                                     (9.1)
    W A_J y = W A_J x^k                                                             (9.2)

in the least squares sense; call the solution x̂.
Step 3: Let d^k = x̂ - x^k and φ(λ) = f(x^k + λ d^k). Do a line search to find the optimal step size λ by minimizing the function φ(λ).
Step 4: Let x^{k+1} = x^k + λ d^k. Set k = k+1 and go to Step 1.

Remark 9.1. Since the function φ(λ) = f(x^k + λ d^k) is convex, piecewise quadratic and a function of one variable, the optimal step size λ can be computed accurately and efficiently [19].

Remark 9.2. Algorithm 4 with W = 0 is exactly Han's method.
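The following Python sketch outlines one iteration of Algorithm 4 for a dense A. The stacked system (9.1)-(9.2) is solved here with numpy's lstsq; for large sparse problems an iterative solver such as LSQR (discussed below) would be used instead. The grid line search is only a stand-in for the exact one-dimensional minimization of [19], and the helper names, the weight vector w, and the search bracket [0, 2] are our assumptions.

```python
import numpy as np

def revised_han_step(A, b, x, w):
    """One iteration of Algorithm 4 (sketch). w: nonnegative weights for the satisfied rows J."""
    r = A @ x - b
    I = r >= 0                                     # active / violated constraints
    J = ~I                                         # satisfied constraints
    if not I.any():
        return x, True                             # feasible
    # Stack (9.1) A_I y = b_I and (9.2) W A_J y = W A_J x, solve in the least squares sense.
    W = np.diag(w[J])
    M = np.vstack([A[I], W @ A[J]])
    rhs = np.concatenate([b[I], W @ (A[J] @ x)])
    x_hat = np.linalg.lstsq(M, rhs, rcond=None)[0]
    d = x_hat - x
    # Line search on the convex piecewise quadratic phi(lam) = f(x + lam d).
    f = lambda y: 0.5 * np.sum(np.maximum(A @ y - b, 0.0) ** 2)
    lams = np.linspace(0.0, 2.0, 201)
    lam = lams[np.argmin([f(x + t * d) for t in lams])]
    return x + lam * d, False
```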

To begin the discussion of Algorithm 4 and its convergence proofs, we first analyze some properties of Han's method.

LEMMA 9.1. In Han's method,

    x^{k+1} - x^k = -A_I^+ (A_I x^k - b_I)                                          (9.3)

[16], where A_I^+ denotes the pseudo-inverse of A_I. The pseudo-inverse of any matrix B, i.e. X = B^+, is the unique matrix X satisfying the following Moore-Penrose conditions:

    (a) BXB = B;   (b) XBX = X;   (c) (BX)^T = BX;   (d) (XB)^T = XB.               (9.4)

There are many efficient methods for solving large, sparse least squares problems such as the system (9.1)-(9.2) [23]. An iterative method developed by Paige and Saunders is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable numerical properties. The method, called algorithm LSQR, is derived by applying the well-known Lanczos process. In this method the matrix A is used only to compute products of the form Av and A^T u for various vectors v and u. Hence the sparsity of the matrix A can be fully exploited.

Remark 9.3. Newton's method for solving a system of m (nonlinear) equations in n variables,

    g_1(x_1,...,x_n) = 0
    ...
    g_m(x_1,...,x_n) = 0,   or   g(x) = 0,

is given, for the case m = n, by

    x^{k+1} = x^k - g'(x^k)^{-1} g(x^k)     (k = 0,1,...),

where g'(x^k) is the derivative of g at x^k, represented by the matrix of partial derivatives (the Jacobian matrix). If the nonsingularity of g'(x^k) cannot be assumed for every x^k, and in

particular, if the number of equations is different from the number of variables, then it is logical to use the generalized inverse of g'(x^k). The resulting method is called the modified Newton method [24].

We will show that Han's method is a modified Newton's method for solving g(x) = ∇f(x) = 0. Indeed,

    ∇f(x^k) = A^T(Ax^k - b)+ = A_I^T(A_I x^k - b_I) = A_I^T A_I x^k - A_I^T b_I

and

    (∇f(x^k))' = A_I^T A_I .

Thus x^{k+1} = x^k - g'(x^k)^{-1} g(x^k), with the generalized inverse in place of the inverse, becomes

    x^{k+1} = x^k - (A_I^T A_I)^+ A_I^T (A_I x^k - b_I) .

It is well known that A_I^+ = (A_I^T A_I)^+ A_I^T. So

    x^{k+1} = x^k - A_I^+ (A_I x^k - b_I) ,

which is equivalent to solving the system A_I x^{k+1} = b_I in the least squares sense. Hence Han's method is equivalent to a modified Newton's method.

Remark 9.4. At each iteration of Algorithm 4, the current iterative solution x^k has the following properties:

    A_I x^k ≥ b_I ,    A_J x^k < b_J .

If we use the above modified Newton method, that is, Han's method, then the new iterative solution x^{k+1} is not exactly the new iterative solution desired. While we want to minimize f(x)

= (1/2)(Ax - b)+^T(Ax - b)+, we would actually be minimizing f̃(x) = (1/2)(A_I x - b_I)^T(A_I x - b_I). At the new iterative solution x^{k+1} some of the constraints in A_J x ≤ b_J could be violated. To remedy this, we could impose some penalty for approaching the boundary of A_J x ≤ b_J from inside. One way to do this would be to add a barrier function to f̃(x^k), for example a logarithmic barrier function B(x^k,J), where

    B(x,J) = -Σ_{i∈J} log(z_i)   and   z_i = b_i - A_i x   for all i ∈ J.

Thus we would now minimize f̃(x^k) + B(x^k,J). This approach, however, has the following deficiencies:
1) Since B(x,J) is a log function, there are some implementation difficulties.
2) If x approaches the boundary of A_J x ≤ b_J from inside, then B(x,J) → ∞. This may be too restrictive, since the original system may not be feasible. Adding this logarithmic barrier function may prevent us from finding the least squares solution of the linear inequalities.

So we may want a 'soft' barrier function that imposes only moderate penalties when x approaches the boundary of A_J x ≤ b_J from inside. This is exactly why we include

    W A_J y = W A_J x^k                                                             (9.2)

in system (9.1). It is equivalent to adding to f̃(x) a soft barrier function B(x,J) of the following form:

    B(x,J) = (1/2)(A_J x - A_J x^k)^T W^2 (A_J x - A_J x^k) .

This is also called the weighted least squares method. Since (9.1) and (9.2) together usually form an overdetermined system, a least squares solution may not satisfy all the equations. Also, in general, the smaller the weight for a particular equation, the larger its error. If the current iterative solution is close to the boundary H_j, we assign a larger weight w_j to the j-th equation, and if the current iterative solution is far away from the boundary H_j, we assign a smaller or zero weight w_j to the j-th equation. In this way we can impose the appropriate penalties when x approaches H_j.
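The report does not give an explicit formula for the weights w_j; one plausible choice consistent with the rule above (larger weight near the boundary H_j, smaller or zero weight far from it) is sketched below. The cutoff delta and the linear decay are our assumptions, not part of the method's specification:

```python
import numpy as np

def choose_weights(A, b, x, delta=1.0):
    """Heuristic weights for the satisfied rows: large when x is close to H_j,
    zero when the slack b_j - A_j x exceeds delta. (Assumed form, not from the report.)"""
    slack = b - A @ x                          # positive on the satisfied rows J
    w = np.zeros_like(slack)
    near = (slack > 0) & (slack < delta)
    w[near] = 1.0 - slack[near] / delta        # w in (0, 1), growing toward the boundary
    return w
```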

10. CONVERGENCE PROOFS AND GEOMETRIC INTERPRETATIONS OF THE REVISED HAN'S METHOD

10.1. Convergence Results

LEMMA 10.1. ∇f(x^k) = -(A_I^T A_I + A_J^T W^2 A_J) d^k.

PROOF: Recall that

    ∇f(x^k) = A^T(Ax^k - b)+ = A_I^T(A_I x^k - b_I) = A_I^T A_I x^k - A_I^T b_I .

The normal equation of (9.1)-(9.2) is

    (A_I^T A_I + A_J^T W^2 A_J)(x^k + d^k) = A_I^T b_I + A_J^T W^2 A_J x^k .        (10.1)

But (10.1) is equivalent to

    A_I^T A_I x^k - A_I^T b_I = -(A_I^T A_I + A_J^T W^2 A_J) d^k .

Hence

    ∇f(x^k) = -(A_I^T A_I + A_J^T W^2 A_J) d^k .                                    (10.2)

Q.E.D.

COROLLARY 10.2. (∇f(x^k))^T d^k = -‖A_I d^k‖^2 - ‖W A_J d^k‖^2.

LEMMA 10.3 [16]. For any u, v in R^n, ‖∇f(u) - ∇f(v)‖ ≤ ‖A‖^2 ‖u - v‖.

LEMMA 10.4. If x^0 is the initial iterative solution, then ‖∇f(x^k)‖^2 ≤ 2‖A‖^2 f(x^0) for all k in Algorithm 4.

PROOF: From Corollary 10.2 it is clear that d^k is a descent direction, and since x^{k+1} is obtained by performing an optimal line search on f(x), it follows that f(x^{k+1}) ≤ f(x^k) for all k. From Lemma 8.2, ∇f(x^k) = A^T(Ax^k - b)+, so

    ‖∇f(x^k)‖^2 ≤ ‖A^T‖^2 ‖(Ax^k - b)+‖^2 = 2‖A‖^2 f(x^k) ≤ 2‖A‖^2 f(x^0).

Q.E.D.

THEOREM 10.5. Let {x^k} be generated from any starting point by Algorithm 4. Then either ∇f(x^k) = 0 for some k < ∞, or lim_{k→∞} ∇f(x^k) = 0.

PROOF: We first note that if ∇f(x^k) = 0 then x^k is a least squares solution of the system and we are done. Hence we assume that x^k is not a solution and ∇f(x^k) ≠ 0. This also implies that the index set I(x^k) ≠ ∅. Define

    c_1 = ‖A‖^2 ,
    c_2 = max_k ‖(A_I^T A_I + A_J^T W^2 A_J)^+‖ ,
    λ̂ = -(∇f(x^k))^T d^k / (c_1 ‖d^k‖^2) .

Then it follows that

    f(x^k + λ̂ d^k) - f(x^k) = ∫_0^λ̂ ∇f(x^k + t d^k)^T d^k dt
                             = λ̂ ∇f(x^k)^T d^k + ∫_0^λ̂ (∇f(x^k + t d^k) - ∇f(x^k))^T d^k dt .

From Lemma 10.3, ‖∇f(x^k + t d^k) - ∇f(x^k)‖ ≤ t ‖A‖^2 ‖d^k‖, so it follows that

    f(x^k + λ̂ d^k) - f(x^k) ≤ λ̂ ∇f(x^k)^T d^k + c_1 ‖d^k‖^2 ∫_0^λ̂ t dt
                             = λ̂ ∇f(x^k)^T d^k + (c_1/2) ‖d^k‖^2 λ̂^2
                             = -(∇f(x^k)^T d^k)^2 / (2 c_1 ‖d^k‖^2) .

Also, from (10.2) it follows that d^k = -(A_I^T A_I + A_J^T W^2 A_J)^+ ∇f(x^k). Hence

    ‖d^k‖^2 ≤ ‖(A_I^T A_I + A_J^T W^2 A_J)^+‖^2 ‖∇f(x^k)‖^2 ≤ c_2^2 ‖∇f(x^k)‖^2 .

Therefore we have

    f(x^k) - f(x^k + λ̂ d^k) ≥ (∇f(x^k)^T d^k)^2 / (2 c_1 c_2^2 ‖∇f(x^k)‖^2) .

Since λ is the optimal step size and x^{k+1} = x^k + λ d^k, we have

    f(x^k) - f(x^{k+1}) ≥ f(x^k) - f(x^k + λ̂ d^k) ≥ (∇f(x^k)^T d^k)^2 / (2 c_1 c_2^2 ‖∇f(x^k)‖^2) .

Since the sequence {f(x^k)} is monotonically decreasing and bounded below, we have

    ∞ > Σ_{k=0}^∞ (f(x^k) - f(x^{k+1})) ≥ (1/(2 c_1 c_2^2)) Σ_{k=0}^∞ (∇f(x^k)^T d^k)^2 / ‖∇f(x^k)‖^2 ,

which implies that

    lim_{k→∞} (∇f(x^k)^T d^k)^2 / ‖∇f(x^k)‖^2 = 0 .

From Lemma 10.4, ‖∇f(x^k)‖^2 ≤ 2‖A‖^2 f(x^0) is bounded, which implies

    lim_{k→∞} ∇f(x^k)^T d^k = 0 .

Since |∇f(x^k)^T d^k| = ‖A_I d^k‖^2 + ‖W A_J d^k‖^2 (Corollary 10.2), it follows that

    lim_{k→∞} A_I d^k = 0   and   lim_{k→∞} W A_J d^k = 0 .

It follows from ∇f(x^k) = -(A_I^T A_I + A_J^T W^2 A_J) d^k that

    lim_{k→∞} ∇f(x^k) = 0 .

Q.E.D.

10.2 Geometric Interpretation of Algorithm 4

10.2.1. Geometric Interpretation of Han's Method

Some preliminaries are needed before we formally give the geometric interpretation of Han's method.

LEMMA 10.6. In Han's method, let x̂ = x^k + d^k. Then x̂ is the least squares solution of the system A_I y = b_I that is closest to the point x^k in the 2-norm.

PROOF: See Theorem 3.2 of [16].

COROLLARY 10.7. ‖d^k‖_2 = d(x^k, H_I).

THEOREM 10.8. If the system A_I y = b_I is consistent, let x̂ = x^k + d^k, where x̂ is the solution of (9.1). Then x̂ ∈ K_I. In addition,

    λ_i = (b_i - A_i x^k) / (A_i d^k) = 1   for all i ∈ {i | A_i x^k > b_i} .

PROOF: If the system A_I y = b_I is consistent, any least squares solution of A_I y = b_I is also an exact solution of A_I y = b_I. Since x̂ is a least squares solution of A_I y = b_I, it is also a solution of A_I y = b_I, so x̂ ∈ H_I; and it follows that x̂ ∈ K_I, since H_I ⊆ K_I. Also, substitution of y = x̂ = x^k + d^k into A_i y = b_i, for all i ∈ {i | A_i x^k > b_i}, yields

    A_i x̂ = A_i x^k + A_i d^k = b_i   for all i ∈ {i | A_i x^k > b_i} .

Hence λ_i = (b_i - A_i x^k) / (A_i d^k) = 1 for all i ∈ {i | A_i x^k > b_i}.   Q.E.D.

Remark 10.1. During the process of solving system (9.1), the consistency of the system A_I y = b_I can be determined automatically by checking whether the residual is equal to zero or not.

In Han's method we know that x^{k+1} = x^k + d^k. We may treat

    x^{k+1}(λ) = x^k + λ d^k,   λ ≥ 0,                                              (10.3)

as a half line from x^k through x̂. When the system A_I y = b_I is consistent, A_i x^{k+1}(λ) > b_i for all λ < 1, and when λ = 1, A_i x^{k+1} = b_i, for all i ∈ {i | A_i x^k > b_i}. In other words, when λ = 1 the end point x^{k+1} hits all the hyperplanes {H_i}_{i∈I} simultaneously along the half line (10.3). This is the geometric interpretation of Han's method when the system A_I y = b_I is consistent. (See FIG. 10.1.)

[FIG. 10.1. Geometric Interpretation of Han's Method when the System A_I y = b_I is Consistent]

The results for the case where A_I y = b_I is inconsistent are given as follows. The proofs of the theorems are summarized in the Appendix.

THEOREM 10.9. If the system A_I y = b_I is inconsistent, let x̂ = x^k + d^k, where x̂ is the solution of (9.1). Then it is not the case that A_I x̂ ≥ b_I; in other words, there exists at least one i ∈ I such that A_i x̂ < b_i.

Theorem 10.9 indicates that if the step length λ = 1 is taken, then at the new iterative solution x^{k+1} at least one originally violated constraint is satisfied.

COROLLARY 10.10. If the system A_I y = b_I is inconsistent, then there exists λ̄ such that 0 < λ̄ ≤ 1 and

    λ̄ = min_{i∈I} (b_i - A_i x^k) / (A_i d^k) .

Let x^{k+1} = x^k + λ̄ d^k; then for the index i at which the minimum is achieved, A_i x^{k+1} = b_i.

When the system A_I y = b_I is inconsistent, we can conclude from the definition of the least squares solution of A_I y = b_I that x̂ lies at the "gravitational center" of the set {H_i}_{i∈I}. (See FIG. 10.2.)

[FIG. 10.2. Geometric Interpretation of Han's Method when the System A_I y = b_I is Inconsistent]

10.2.2. Geometric Interpretation of Algorithm 4

The comparison of the geometric interpretations of Han's method and Algorithm 4 is illustrated in FIG. 10.3.

[FIG. 10.3. Comparison Between Han's Method and Algorithm 4: a) new point by Han's method; b) new point by the Revised Han's Method]

In FIG. 10.3 a), at the current iterative solution x^k only constraint 2 is violated. For Han's method, the new search direction towards x^{k+1} will be perpendicular to constraint 2. The originally satisfied constraint 1 will soon be violated when moving towards constraint 2. The 'soft barrier' on constraint 1 in Algorithm 4 makes it less likely for the new iterative solution x^{k+1} to cross constraint 1, as illustrated in FIG. 10.3 b).

11. NUMERICAL RESULTS ON LINEAR INEQUALITIES

Han's method, as described above, has been tested on many randomly generated problems. The search direction d is computed by using algorithm LSQR. The method is very satisfactory in CPU time and number of iterations, where the number of

iterations refers to the number of times a new direction d is computed by LSQR. For each test problem a point with ‖A^T(Ax - b)+‖_2 < 10^{-20} is found. The number of iterations is usually much smaller than the size of the problem and remains almost constant as the problem size grows. In Table 11.1, Rows is the number of inequalities and Columns is the number of variables. The computation was done on an IBM 3090 system at the University of Michigan.

TABLE 11.1  Computational Results for Han's Method

    Problem Size          Number of     Total LSQR    Average LSQR
    Rows      Columns     Iterations    Steps         Steps
    100       100         3             69            23
    200       100         7             167           24
    200       200         3             94            31
    1000      1000        5             243           49
    2000      2000        9             416           46
    4000      2000        12            457           38
    4000      4000        8             277           35

REFERENCES

[1] AGMON, S., The Relaxation Method for Linear Inequalities, Canadian Journal of Mathematics 6 (1954), 382-392.
[2] AL-SULTAN, K.S.; MURTY, K.G., Exterior Point Algorithms for Nearest Point and Convex Quadratic Programs, Tech. Rept. 89-31, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, MI.
[3] ANDERSON, D.L.; DZIEWONSKI, A.M., Seismic Tomography, Sci. Amer. 251 (1984), 58-66.
[4] BEN-ISRAEL, A.; GREVILLE, T.N.E., Generalized Inverses: Theory and Applications, John Wiley & Sons, 1974.
[5] BREGMAN, L.M., The Method of Successive Projection for Finding a Common Point of Convex Sets, Soviet Mathematics Doklady 6 (1965), 688-692.
[6] BREGMAN, L.M., The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming, U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1967), 200-217.
[7] CENSOR, Y.; HERMAN, G.T., On Some Optimization Techniques in Image Reconstruction from Projections, Applied Numerical Mathematics 3 (1987), 365-391.
[8] CENSOR, Y.; ELFVING, T., New Methods for Linear Inequalities, Linear Algebra and Its Applications 42 (1982), 199-211.
[9] CENSOR, Y., Row-Action Methods for Huge and Sparse Systems and Their Applications, SIAM Review 23, No. 4 (1981), 444-466.
[10] CIMMINO, G., Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, Ricerca Sci. (Roma), Ser. II, Anno IX, 1 (1938), 326-333.

[11] DE PIERRO, A.R.; IUSEM, A.N., A Simultaneous Projections Method for Linear Inequalities, Linear Algebra and Its Applications 64 (1985), 243-253.
[12] EREMIN, I.I., The Relaxation Method of Solving Systems of Inequalities with Convex Functions on the Left Sides, Soviet Mathematics Doklady 6 (1965), 219-222.
[13] FLEMING, H.E., Satellite Remote Sensing by the Technique of Computerized Tomography, J. Appl. Meteorology 21 (1982), 1538-1549.
[14] GACS, P.; LOVASZ, L., Khachiyan's Algorithm for Linear Programming, Report STAN-CS-79-750, Department of Computer Science, Stanford University, 1979.
[15] GUBIN, L.G.; POLYAK, B.T.; RAIK, E.V., The Method of Projections for Finding the Common Point of Convex Sets, U.S.S.R. Computational Mathematics and Mathematical Physics 6 (1967), 1-24.
[16] HAN, S.P., Least-Squares Solution of Linear Inequalities, Tech. Rept. 2141, Mathematics Research Center, University of Wisconsin, Madison, WI, 1980.
[17] KACZMARZ, S., Angenäherte Auflösung von Systemen linearer Gleichungen, Bull. Internat. Acad. Polon. Sci. Lett. A 35 (1937), 355-357.
[18] MANGASARIAN, O.L., Normal Solutions of Linear Programs, Mathematical Programming Study 22 (1984), 206-216.
[19] MIFFLIN, R., A Superlinearly Convergent Algorithm for One-Dimensional Constrained Minimization Problems with Convex Functions, Mathematics of Operations Research 8 (1983), 185-195.
[20] MOTZKIN, T.S.; SCHOENBERG, I.J., The Relaxation Method for Linear Inequalities, Canad. J. Math. 6 (1954), 393-404.
[21] MURTY, K.G., Linear Complementarity, Linear and Nonlinear Programming, Heldermann Verlag, Berlin, 1988.
[22] MURTY, K.G., Linear Programming, John Wiley & Sons, 1983.

[23] PAIGE, C.C.; SAUNDERS, M.A., LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares, ACM Transactions on Mathematical Software 8, No. 1 (1982), 43-71.
[24] RHEINBOLDT, W.C., A Unified Convergence Theory for a Class of Iterative Processes, SIAM J. Numer. Anal. 5 (1968), 42-63.
[25] TELGEN, J., On Relaxation Methods for Systems of Linear Inequalities, European Journal of Operational Research 9 (1982), 184-189.

APPENDIX

This Appendix provides proofs for some results of Section 10.

LEMMA A.1 (Gordan's theorem of the alternatives) [21]. Consider the two systems

    (I)   B^T x = 0,  x ≥ 0,  x ≠ 0 ;
    (II)  B y > 0 .

Either (I) or (II) has a solution, but not both.

LEMMA A.2. The system

    B^T B u = B^T v,   B u > v,   v > 0                                             (A.1)

has no feasible solution.

PROOF: Suppose (A.1) has a feasible solution and let g = Bu - v > 0. Then B^T g = B^T B u - B^T v = 0, so the system

    B^T g = 0,   g > 0                                                              (A.2)

has a solution. Since (A.2) is an instance of system (I) of Lemma A.1, the system B y > 0 then has no solution. But Bu > v > 0 implies Bu > 0, so y = u solves B y > 0, a contradiction. Hence (A.1) has no feasible solution.   Q.E.D.

THEOREM A.3. If the system A_I y = b_I is inconsistent, let x̂ = x^k + d^k, where x̂ is the solution of (9.1). Then x̂ ∉ K_I.

PROOF: Since the system A_I y = b_I is inconsistent, it follows that

    min_d ‖A_I (x^k + d) - b_I‖^2 = ‖A_I (x^k + d^k) - b_I‖^2 ≠ 0 .

Also, the solution d^k of this least squares problem must satisfy the normal equation

    A_I^T A_I d^k = -A_I^T (A_I x^k - b_I) .

Now assume that x̂ ∈ K_I. Since A_I y = b_I is inconsistent, this gives

    A_I x̂ = A_I x^k + A_I d^k < b_I .

In other words, the following system (in the unknown u = -d^k) is feasible:

    A_I^T A_I u = A_I^T (A_I x^k - b_I) ,
    A_I u > A_I x^k - b_I ,
    A_I x^k - b_I > 0 .

But Lemma A.2 (with B = A_I and v = A_I x^k - b_I) has shown that this system cannot have a feasible solution, which is a contradiction. Hence x̂ ∉ K_I.   Q.E.D.

Theorem A.3 indicates that when A_I y = b_I is inconsistent, the geometric picture of Algorithm 4 is quite different from that of the consistent case. Now we will show that x̂ = x^k + d^k must hit or cross at least one of the hyperplanes {H_i}_{i∈I}.

LEMMA A.4. Let y and z both be m-vectors with y ≥ 0 and z ≤ 0. If y^T z ≠ 0, then there exists ε > 0 such that

    ‖y + ε z‖^2 < ‖y‖^2 .

PROOF: Define

    f(ε) = (y + εz)^T (y + εz) = ‖y‖^2 + 2ε z^T y + ε^2 ‖z‖^2 .

Solving df(ε)/dε = 0 yields ε = -z^T y / ‖z‖^2 > 0. Substituting this ε into f(ε) yields

    f(ε) = ‖y‖^2 - (z^T y)^2 / ‖z‖^2 < ‖y‖^2 .

Q.E.D.

THEOREM 10.9. If the system A_I y = b_I is inconsistent, let x̂ = x^k + d^k, where x̂ is the solution of (9.1). Then it is not the case that A_I x̂ ≥ b_I; in other words, there exists at least one i ∈ I such that A_i x̂ < b_i.

PROOF: From Lemma 10.6, x̂ is a least squares solution of the system A_I y = b_I. Assume, to the contrary, that A_I x̂ ≥ b_I. Pick any y ∈ K and consider δd = y - x̂ ≠ 0; then A_I δd = A_I y - A_I x̂. Since A_I y ≤ b_I and A_I x̂ ≥ b_I, it follows that A_I δd ≤ 0. Consider

    ‖A_I (x̂ + ε δd) - b_I‖^2 = ‖(A_I x̂ - b_I) + ε A_I δd‖^2 .

Since A_I x̂ - b_I ≥ 0 and A_I δd ≤ 0, it follows from Lemma A.4 that there exists ε > 0 such that

    ‖A_I (x̂ + ε δd) - b_I‖^2 < ‖A_I x̂ - b_I‖^2 ,

which contradicts the fact that x̂ is a least squares solution of A_I y = b_I. So A_I x̂ ≥ b_I cannot hold, and it immediately follows that there exists at least one i ∈ I such that A_i x̂ < b_i.   Q.E.D.

COROLLARY 10.10. If the system A_I y = b_I is inconsistent, then there exists λ̄ such that 0 < λ̄ ≤ 1 and

    λ̄ = min_{i∈I} (b_i - A_i x^k) / (A_i d^k) .

Let x^{k+1} = x^k + λ̄ d^k; then for the index i at which the minimum is achieved, A_i x^{k+1} = b_i.
