NEW ITERATIVE METHODS FOR LINEAR INEQUALITIES

Kai Yang and Katta G. Murty
Department of Industrial and Operations Engineering
University of Michigan
Ann Arbor, Michigan 48109

Technical Report 90-9
March, 1990
Revised September, 1990


Abstract

New iterative methods for solving systems of linear inequalities are presented. Each step in these methods consists of finding the orthogonal projection of the current point onto a hyperplane corresponding to a surrogate constraint, which is constructed from a positive combination of a group of violated constraints. Both sequential and parallel implementations are discussed.

Key words: Linear inequalities, surrogate constraints, iterative methods, sequential and parallel implementations.

1. INTRODUCTION

We consider the problem of finding a feasible solution x ∈ R^n to a system of linear inequalities

Ax ≤ b,   (1.1)

where A ∈ R^{m×n} and b ∈ R^m. Large scale versions of this problem have many applications. Some recent applications include image reconstruction from projections [5], which is becoming an important problem in many scientific fields. In medical science, computerized tomography reconstructs the images of cross-sections of the human body by processing data obtained from measuring the attenuation of X-rays along a large number of lines through the cross-section. Other image reconstruction problems arise in remote sensing [11], seismic tomography [2] and industrial nondestructive testing.

There are basically two approaches that could be used to solve the system (1.1). The first approach is to transform the linear inequality system (1.1) into a linear programming problem and then use well-established methods such as Karmarkar's method [15] [17] or the simplex method to solve the resulting linear program. These methods require matrix operations which are often impractical for the large scale systems that arise in applications such as image reconstruction. Also, in these applications the matrix A tends to have a special structure [5] which is hard to exploit in these methods.

The second approach involves using iterative methods. They always work with the original data, and most of them need no matrix manipulations. The basic computational step in them is extremely simple and easy to program. Because of these advantages, the linear inequality solvers employed in image reconstruction are most often iterative methods.

One class of iterative methods is derived from the relaxation method for linear inequalities (Agmon, Motzkin and Schoenberg [1954]) [1] [16] and Kaczmarz's method [14] for linear equations. The word 'relaxation' in the name refers to the fact that these methods consider one constraint at a time, so in each step all but one constraint are 'relaxed'. At each iteration a violated constraint is identified and an orthogonal projection is made onto it from the current point, so they are also called 'successive orthogonal projection' methods. Bregman [1965, 1967] [3] [4], Eremin [1965] [10] and Gubin et al. [1967] [13] extended this successive orthogonal projection idea to find a point in a convex set defined by a system of inequalities involving convex functions. Making an orthogonal projection onto a single linear constraint is computationally inexpensive. However, when solving a huge system, considering only one constraint at a time leads to slow convergence. Instead, it is better to process a group of constraints at a time. But making an orthogonal projection onto the affine space corresponding to a group of constraints is computationally expensive: the amount of work for this projection grows as the cube of the number of constraints in the group.

Another class of iterative methods is derived from Cimmino's algorithm [8] for linear equations. Y. Censor and T. Elfving [1982] [6] and A.R. De Pierro and A.N. Iusem [1985] [9] developed Cimmino-type algorithms for linear inequalities. This method makes orthogonal projections simultaneously onto each of the violated constraints from the current point and takes the new point to be a convex combination of those projection points. Cimmino's method can be implemented very easily on parallel computers.
However, when dealing with large systems with many violated constraints, making projections onto every violated constraint is computationally expensive.

In this paper we propose a class of new iterative algorithms called 'surrogate constraint methods'. They are able to process a group of violated constraints at a time, but retain the computational simplicity of the relaxation method, and at the same time they are highly amenable to parallel implementation. In each iteration, a 'surrogate constraint' is derived from a group of violated constraints. The current point is orthogonally projected onto this surrogate constraint treated as an equation, and the process is repeated until a feasible solution is found. We discuss three different surrogate constraint methods. Section 2 defines notation. Sections 3, 4 and 5 describe the various algorithms. Section 6 compares these methods with relaxation methods and provides some computational results. Section 7 presents the extension of these methods to solve linear equations.

2. NOTATION AND ASSUMPTIONS

A = (a_ij), b = (b_i), and A_i denotes the i-th row vector of A. We assume that all this data is integer and A_i ≠ 0 for all i. We let K denote the set of feasible solutions of (1.1), and we assume that K ≠ ∅.

I ⊂ {1, ..., m} denotes an index set which identifies a subset of the constraints.
K_i = {x : A_i x ≤ b_i} is the half-space corresponding to the i-th constraint.
H_i = {x : A_i x = b_i} is the hyperplane corresponding to the boundary of the i-th constraint.
K_I = ∩_{i∈I} K_i; K_I ≠ ∅, since K_I ⊇ K.
|S| = cardinality of the set S.
A_I = the |I| × n matrix with rows A_i, i ∈ I.
b_I = (b_i : i ∈ I), a column vector.
‖x‖ = the Euclidean norm of the vector x, ‖x‖ = (x^T x)^{1/2}.
d(x, H_i) = minimum Euclidean distance from x to H_i.
d(x, K_i) = minimum Euclidean distance from x to K_i. Note that d(x, K_i) = 0 if x ∈ K_i; otherwise d(x, K_i) = d(x, H_i).
φ(x) = sup_{i∈{1,...,m}} d(x, K_i).
d(x, K) = minimum Euclidean distance from x to K.

We define the length of the binary encoding of all the problem data in (1.1) as

L = Σ_{i,j} ⌈1 + log₂(|a_ij| + 1)⌉ + Σ_i ⌈1 + log₂(|b_i| + 1)⌉ + ⌈1 + log₂(mn)⌉ + 2.   (2.1)

The following lemma [12, 18] will be used in the convergence proofs.

LEMMA 2.1. If the system (1.1) is feasible, then there is a feasible solution x̄ with |x̄_j| ≤ 2^L/(2n), j = 1, ..., n.
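For integer data, the quantity (2.1) is mechanical to compute. The sketch below is our own transcription of (2.1), assuming base-2 logarithms; the function name encoding_length is ours, not from the report.

    import math

    def encoding_length(A, b):
        # Binary encoding length L of the integer data (A, b) in (1.1), as in (2.1)
        m, n = len(A), len(A[0])
        L = sum(math.ceil(1 + math.log2(abs(a) + 1)) for row in A for a in row)
        L += sum(math.ceil(1 + math.log2(abs(bi) + 1)) for bi in b)
        return L + math.ceil(1 + math.log2(m * n)) + 2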

Without any loss of generality, we assume that each row of A is normalized so that ‖A_i‖ = 1 for all i = 1 to m. This has no effect on K, K_i or H_i, but makes it easier to write down the projections of points onto the hyperplanes H_i and to prove convergence.

Clearly, a point x ∈ K iff φ(x) = 0. In practice, we are usually interested in getting an approximate solution of (1.1) within some tolerance. Given a tolerance ε > 0, a point x is said to be 'feasible to (1.1) within tolerance ε' if φ(x) ≤ ε; in other words, we want to find a point x ∈ K_ε = {x : A_i x ≤ b_i + ε for all i = 1 to m}. Clearly, if ε = 0, K_ε = K. (It is well known that if a solution x is found that is feasible with tolerance ε = 2^{-L}, then there is a procedure of polynomial complexity which constructs a true feasible solution of (1.1) from this approximate solution x [12] [18].)

3. THE BASIC SURROGATE CONSTRAINT METHOD

In this method, at each iteration, the next point is some point on the line segment joining the current point and its reflection in the hyperplane corresponding to a surrogate constraint, which is a nonnegative combination of all the constraints violated by the current point. The actual point selected on this line segment depends on a parameter λ in the algorithm, which can be set by the user anywhere between 0 and 2 (λ = 1 corresponds to the orthogonal projection).

When x is the current point, let I = I(x) = {i : A_i x - b_i > 0}. For a given point x, the problem of finding the set I(x), the index set of violated constraints at x, is highly amenable to parallel implementation. If I(x) ≠ ∅, the row vector π = (π_i : i ∈ I(x)) denotes a positive vector of weights. Given such π and I, the surrogate constraint defined by them is (πA_I)x ≤ πb_I, and the corresponding surrogate hyperplane is H_s = {x : (πA_I)x = πb_I}. Also, without any loss of generality, we assume that the vector π is normalized so that Σ_{i∈I(x)} π_i = 1. This has no effect on the following algorithms; the assumption is purely for the sake of simplifying some statements.

Algorithm 1: Basic Surrogate Constraint Method

Initialization: Let x^0 ∈ R^n be some initial point (it could be 0, or some known nearly feasible point). Go to Step 1.

General Step k+1: Let x^k be the point obtained at the end of the previous step. Check whether A_i x^k ≤ b_i holds for all i = 1 to m, and identify the index set I_k = I(x^k) of violated constraints. If I_k = ∅, x^k is feasible to (1.1); terminate. Otherwise, select a weight vector π(k) = (π_i(k) : i ∈ I_k) and compute

x^{k+1} = x^k - λ (π(k)A_{I_k} x^k - π(k)b_{I_k}) (π(k)A_{I_k})^T / ‖π(k)A_{I_k}‖²,   (3.1)

where 0 < λ < 2. Go to the next step.
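To make the step concrete, here is a minimal sketch of Algorithm 1 in Python/NumPy. It is an illustration under our own simplifying assumptions: dense arrays, rows of A already normalized as in Section 2, the tolerance test of Theorem 3.3 below, and the 'weight equally' rule of Remark 3.2 below. The function name algorithm1 is ours, not part of the report.

    import numpy as np

    def algorithm1(A, b, x0, lam=1.0, eps=1e-6, max_steps=100000):
        # Basic surrogate constraint method; assumes ||A_i|| = 1 for every row i
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_steps):
            r = A @ x - b                         # residuals; r_i > eps marks a violated constraint
            I = np.where(r > eps)[0]              # index set I(x^k)
            if I.size == 0:
                return x                          # x is in K_eps; terminate
            pi = np.full(I.size, 1.0 / I.size)    # positive weights summing to 1
            s = pi @ A[I]                         # surrogate row vector (pi A_I)
            x -= lam * (pi @ r[I]) * s / (s @ s)  # relaxed projection step (3.1)
        raise RuntimeError("no eps-feasible point found in max_steps steps")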

Remark 3.1. If λ = 1, x^{k+1} will be the orthogonal projection of the current point x^k onto the surrogate hyperplane H_s. (See FIG. 3.1.)

[FIG. 3.1. Illustration of a step in the surrogate constraint method with λ = 1.]

Computationally, the most expensive piece of work in this method is finding the index set of violated constraints, I(x^k), in each step. As mentioned above, this can be implemented for parallel computation very easily. This is one of the major advantages of these surrogate constraint methods.

Remark 3.2. Recommended choices of the weight vector π. The following are some of the rules that can be used to select the vector of weights in each step; a sketch of all three rules appears after this remark.

(1) Weight by error: the quantity r_i = A_i x^k - b_i is the Euclidean distance from the current point x^k to K_i, for each i ∈ I(x^k). Since a larger r_i corresponds to greater infeasibility with respect to K_i, it may be desirable to make π_i proportional to r_i for all i ∈ I(x^k), that is, π_i = r_i / Σ_{j∈I(x^k)} r_j.

(2) Weight equally: in this rule, π_i = 1/|I(x^k)| for all i ∈ I(x^k).

(3) Convex combination of the two weights given above: π_i = α r_i / Σ_{j∈I(x^k)} r_j + (1 - α)/|I(x^k)|, where 0 < α < 1, for i ∈ I(x^k).
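The three rules can be coded directly. The sketch below takes the vector r_I of positive residuals of the violated constraints; the default α = 0.2 anticipates the value used in the experiments of Section 6, and the helper name weights is ours.

    import numpy as np

    def weights(r_I, alpha=0.2):
        # Weight rules of Remark 3.2 for the violated residuals r_I = (A_i x - b_i, i in I(x))
        by_error = r_I / r_I.sum()                       # rule (1): pi_i proportional to r_i
        equal = np.full(r_I.size, 1.0 / r_I.size)        # rule (2): pi_i = 1/|I(x)|
        return alpha * by_error + (1.0 - alpha) * equal  # rule (3): convex combination of (1), (2)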

For the sake of simplicity, all our convergence proofs will be based on the assumption that the weight vector π(k) = (π_i(k) : i ∈ I(x^k)) is selected so as to satisfy π_i(k) ≥ γ for all i ∈ I(x^k) when x^k is the current point, where γ is some predetermined small positive quantity.

Convergence Results

DEFINITION 3.1. When K ≠ ∅, a sequence {x^k}_{k=0}^∞ in R^n is called strictly Fejér-monotone with respect to the set K if for every x ∈ K,

‖x^{k+1} - x‖ < ‖x^k - x‖ for all k ≥ 0.   (3.2)

Every Fejér-monotone sequence is bounded if K ≠ ∅, since ‖x^k - x‖ is positive and monotonically decreasing in k.

THEOREM 3.1. If K ≠ ∅, any sequence {x^k} generated by Algorithm 1 is strictly Fejér-monotone with respect to K, provided that x^k ∉ K for all k ≥ 0.

PROOF. Select any point x ∈ K. Define e^k = x^k - x, k = 0, 1, .... For simplicity we denote π(k) by π and I_k by I. Then, if I ≠ ∅,

e^{k+1} = e^k - λ (πA_I x^k - πb_I)(πA_I)^T / ‖πA_I‖²,

and

‖e^{k+1}‖² = ‖e^k‖² + λ² (πA_I x^k - πb_I)² / ‖πA_I‖² - 2λ (πA_I x^k - πb_I)(πA_I)e^k / ‖πA_I‖².

Writing (πA_I)e^k = (πA_I x^k - πb_I) - (πA_I x - πb_I), this becomes

‖e^{k+1}‖² = ‖e^k‖² - λ(2-λ)(πA_I x^k - πb_I)²/‖πA_I‖² + 2λ (πA_I x^k - πb_I)(πA_I x - πb_I)/‖πA_I‖².

Since πA_I x^k > πb_I and πA_I x ≤ πb_I, we have (πA_I x^k - πb_I)(πA_I x - πb_I) ≤ 0. Therefore it follows that

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ)(πA_I x^k - πb_I)²/‖πA_I‖² < ‖e^k‖².   (3.3) ∎

THEOREM 3.2. If K ≠ ∅, then any sequence {x^k} generated by Algorithm 1 has the property lim_{k→∞} φ(x^k) = 0.

PROOF. Fejér-monotonicity implies that the sequence {‖e^k‖} is monotonically decreasing. Since this sequence is positive and bounded below, it converges, which implies lim_{k→∞}‖e^{k+1}‖ = lim_{k→∞}‖e^k‖. It then follows from (3.3) that

lim_{k→∞} (π(k)A_{I_k} x^k - π(k)b_{I_k}) = 0.

Since π_i(k) ≥ γ for all i ∈ I(x^k) and Σ_{i∈I_k} π_i(k) = 1, and since A_i x^k - b_i > 0 for all i ∈ I_k, this implies that either lim_{k→∞}(A_i x^k - b_i) = 0 for all i ∈ I_k, or at some k < ∞, I(x^k) = ∅. Therefore lim_{k→∞} φ(x^k) = lim_{k→∞} sup_{i∈{1,...,m}} d(x^k, K_i) = 0. ∎

LEMMA 3.1. If K ≠ ∅, and if the sequence {x^k}_{k=0}^∞ satisfies the conditions (i) {x^k} is Fejér-monotone with respect to K, and (ii) lim_{k→∞} φ(x^k) = 0, then x^k converges to an x̄ ∈ K.

PROOF. Follows from Lemma 5 and Lemma 6 of Gubin et al. [13]. ∎

In practice, we are more interested in finding a point x ∈ K_ε quickly. We will show that if we set I(x) = {i : A_i x - b_i > ε} in Algorithm 1, then Algorithm 1 converges to a point x ∈ K_ε in a finite number of iterations if K ≠ ∅.

THEOREM 3.3. If K ≠ ∅, and we set I(x^k) = {i : A_i x^k - b_i > ε} in Algorithm 1, then it converges to a point x ∈ K_ε in a finite number of iterations.

PROOF. I) First we show that for any x ∈ K, if x^k ∉ K_ε, then ‖x^{k+1} - x‖ < ‖x^k - x‖ for all k ≥ 0. This follows directly from the fact that if x^k ∉ K_ε, then I(x^k) ≠ ∅, so Algorithm 1 does not terminate, and from (3.3) we have

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ)(πA_I x^k - πb_I)²/‖πA_I‖² < ‖e^k‖².

Hence the result follows.

II) Next we show that any sequence {x^k} generated by Algorithm 1 now satisfies lim_{k→∞} φ(x^k) ≤ ε. The sequence {‖e^k‖} is positive and monotonically decreasing, so it converges, and hence lim_{k→∞}‖e^{k+1}‖ = lim_{k→∞}‖e^k‖. It then follows from (3.3) that lim_{k→∞}(π(k)A_{I_k}x^k - π(k)b_{I_k}) = 0. But since π_i(k) > 0 with Σ_{i∈I_k} π_i(k) = 1, and since A_i x^k - b_i > ε for all i ∈ I_k, we have π(k)A_{I_k}x^k - π(k)b_{I_k} > ε as long as I_k ≠ ∅, which contradicts the limit above. So there must exist a k < ∞ such that I(x^k) = ∅, which implies that at iteration k, φ(x^k) = sup_{i∈{1,...,m}} d(x^k, K_i) ≤ ε. In other words, x^k ∈ K_ε.

III) Bound on k: Take x^0 = 0 and let e^k = x^k - x̄, where x̄ ∈ K is the solution of Lemma 2.1; then ‖e^0‖ = ‖x̄‖ ≤ √n · 2^L/(2n) = 2^{L-1}/√n. If at iteration k, I(x^k) ≠ ∅, then A_i x^k - b_i > ε for all i ∈ I(x^k), so

πA_{I_k}x^k - πb_{I_k} ≥ (Σ_{i∈I_k} π_i) min_{i∈I_k}(A_i x^k - b_i) ≥ (Σ_{i∈I_k} π_i) ε = ε,

and

‖πA_{I_k}‖ ≤ ‖π‖₂ ‖A_{I_k}‖₂ ≤ ‖π‖₂ ‖A_{I_k}‖_F = (Σ_{i∈I_k} π_i²)^{1/2} (Σ_{i∈I_k} Σ_{j=1}^n |a_ij|²)^{1/2} ≤ m,

where ‖A_{I_k}‖₂ = sup_x ‖A_{I_k}x‖/‖x‖ and ‖A_{I_k}‖_F = (Σ_{i∈I_k} Σ_{j=1}^n |a_ij|²)^{1/2} is the Frobenius norm of A_{I_k}. It follows that

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ)ε²/m².

Therefore Algorithm 1 converges within k steps, where

k ≤ m² 2^{2L-2} / (n λ(2-λ) ε²). ∎
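The strict decrease of ‖x^k - x‖ and the finite termination are easy to observe numerically. The following self-contained check is our own illustration (random feasible data, λ = 1, equal weights), not an experiment from the report.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 200, 20
    A = rng.normal(size=(m, n))
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # normalize rows as in Section 2
    x_feas = rng.normal(size=n)
    b = A @ x_feas + rng.uniform(0.1, 1.0, size=m)  # makes x_feas an interior feasible point
    x, dist = np.zeros(n), []
    for _ in range(100):
        r = A @ x - b
        I = np.where(r > 0)[0]
        if I.size == 0:
            break                                   # x is feasible to (1.1)
        pi = np.full(I.size, 1.0 / I.size)
        s = pi @ A[I]
        x = x - (pi @ r[I]) * s / (s @ s)           # step (3.1) with lambda = 1
        dist.append(np.linalg.norm(x - x_feas))
    # Theorem 3.1 asserts strict decrease while infeasible; we allow equality
    # here only to absorb floating-point rounding near convergence.
    assert all(d1 <= d0 for d0, d1 in zip(dist, dist[1:]))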

4. THE SEQUENTIAL SURROGATE CONSTRAINT METHOD

In many applications requiring the solution of linear inequalities, the coefficient matrix of (1.1), the matrix A, is often very large (m and n of the order of 10^5 or more) and sparse (less than 0.1% of the entries nonzero). If the system (1.1) is to be solved by the computer on site, working on the whole matrix A is almost impossible, so it is preferable to work on one small subset of the constraints of (1.1) at a time. Specifically, the matrix A can be partitioned into p submatrices, and the right-hand-side vector b can be partitioned compatibly into p subvectors:

A = [A^1; A^2; ...; A^p],  b = [b^1; b^2; ...; b^p],   (4.1)

where A^t is an m_t × n matrix and b^t has m_t rows, for t = 1 to p, and Σ_{t=1}^p m_t = m.

Now we will show that the surrogate constraint method can be used to solve the system (1.1) by successively operating on the subsystems A^t x ≤ b^t, t = 1 to p, in cyclic order. For any x ∈ R^n, define I_t(x) = {i : the i-th constraint in the t-th subsystem is violated by x}. We denote by π^t = (π^t_1, ..., π^t_{m_t}) the weight vector for the t-th subsystem, t = 1 to p. When the current point is x^k and the t-th subsystem is to be operated on next, we set π^t_i > 0 if i ∈ I_t(x^k), and π^t_i = 0 otherwise. The corresponding surrogate constraint for the t-th subsystem is π^t A^t x ≤ π^t b^t, and the surrogate hyperplane of the t-th subsystem is H^t_s = {x : π^t A^t x = π^t b^t}.

The algorithm goes through major cycles. In every major cycle, each of the p subsystems is operated on once, in serial order t = 1 to p. A sketch of one way to form the partition (4.1) follows.
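For instance, the rows can be split into p nearly equal consecutive blocks. This particular choice is our own illustration, since the report leaves the partition to the user.

    import numpy as np

    def partition(A, b, p):
        # Row-wise partition of (A, b) into p consecutive blocks (A^t, b^t), as in (4.1)
        blocks = np.array_split(np.arange(A.shape[0]), p)  # block sizes m_1, ..., m_p sum to m
        return [(A[rows], b[rows]) for rows in blocks]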

Algorithm 2: Sequential Surrogate Constraint Method

Initialization is the same as in Algorithm 1.

Consider a major cycle. In this major cycle, operate on the subsystems in the order t = 1 to p. Let x^k be the current point and let the t-th subsystem be the one to be operated on next. Find I_t(x^k). If I_t(x^k) = ∅, define x^{k+1} = x^k and go to the next subsystem with x^{k+1} if t < p; if t = p, this completes the major cycle. If there has been no change in the current point throughout this major cycle, then the current point is feasible to (1.1); terminate. Otherwise, go to the next major cycle with the current point.

If I_t(x^k) ≠ ∅, select a weight vector π^t and define x^{k+1} = x^k - λd^k, where

d^k = (π^t A^t x^k - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖²   (4.2)

and 0 < λ < 2. With x^{k+1}, go to the next subsystem if t < p, or to the next major cycle if t = p.

[FIG. 4.1. Diagram of the sequential surrogate constraint method: x^k → subsystem 1 → x^{k+1} → subsystem 2 → x^{k+2} → ... → subsystem p → x^{k+p}.]
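A minimal sketch of one major cycle follows, reusing the hypothetical partition output sketched above; equal weights and a user-supplied λ are our own simplifications.

    import numpy as np

    def major_cycle(subsystems, x, lam=1.0, eps=1e-6):
        # One major cycle of Algorithm 2 over subsystems = [(A^1, b^1), ..., (A^p, b^p)]
        changed = False
        for At, bt in subsystems:                    # serial order t = 1 to p
            r = At @ x - bt
            I = np.where(r > eps)[0]                 # I_t(x^k)
            if I.size == 0:
                continue                             # x^{k+1} = x^k; next subsystem
            pi = np.full(I.size, 1.0 / I.size)
            s = pi @ At[I]
            x = x - lam * (pi @ r[I]) * s / (s @ s)  # step (4.2)
            changed = True
        return x, changed

    # Repeat until a full major cycle makes no change, i.e. the point is feasible:
    #   x, changed = major_cycle(subsystems, x); stop when changed is False.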

Convergence Results

LEMMA 4.1. Let c^0 ∈ R^n and let C be a row vector in R^n. Let Γ be the half-space {x : Cx ≤ Cc^0}. Let z ∉ Γ be such that its orthogonal projection on Γ is c^0. Then

‖y - z_λ‖ < ‖y - z‖ for all y ∈ Γ, where z_λ = z - λ(z - c^0), for 0 < λ < 2.

PROOF. See [1] [10]. ∎

[FIG. 4.2. Illustration for Lemma 4.1.]

Suppose the point x^{k+1} is obtained by operating on the t-th subsystem with x^k as the current point. Then from Algorithm 2, x^{k+1} = x^k - λd^k for some 0 < λ < 2, with d^k as defined in (4.2). The surrogate hyperplane at this iteration is H^t_s = {x : π^t A^t x = π^t b^t}, and x^k - d^k is the orthogonal projection of x^k onto H^t_s, so π^t A^t(x^k - d^k) = π^t b^t. Also recall that K = {x : Ax ≤ b}; clearly, for all x ∈ K, π^t A^t x ≤ π^t b^t has to be satisfied for every t = 1 to p. Using the above arguments and Lemma 4.1, the following lemma follows directly.

LEMMA 4.2. Suppose the point x^{k+1} is obtained by operating on the t-th subsystem with x^k as the current point. Let Γ_t be the half-space corresponding to the surrogate constraint in this step, π^t A^t x ≤ π^t b^t = π^t A^t(x^k - d^k). Then ‖y - (x^k - λd^k)‖ < ‖y - x^k‖ for all 0 < λ < 2 and for all y ∈ Γ_t. Also, K ⊆ Γ_t.

[FIG. 4.3. Illustration for Lemma 4.2.]

THEOREM 4.1. In Algorithm 2, if x^{k+1} ≠ x^k, then for all x ∈ K, ‖x - x^{k+1}‖ < ‖x - x^k‖.

PROOF. Follows directly from Lemma 4.2. ∎

THEOREM 4.2. If K ≠ ∅, any sequence {x^k} generated by Algorithm 2 has the property lim_{k→∞} φ(x^k) = 0.

PROOF. For any x ∈ K, Theorem 4.1 implies that the sequence {‖x^k - x‖} is monotonically decreasing, and thus it converges. So if t(k) denotes the subsystem operated upon when x^k is the current point, and π^{t(k)} the weight vector used in that step, then lim_{k→∞}(π^{t(k)}A^{t(k)}x^k - π^{t(k)}b^{t(k)}) = 0. Since π^{t(k)}_i ≥ γ > 0 for all i ∈ I_{t(k)}(x^k) and these weights sum to one, and A^{t(k)}_i x^k - b^{t(k)}_i > 0 for all i ∈ I_{t(k)}(x^k), this implies that either lim_{k→∞}(A^{t(k)}_i x^k - b^{t(k)}_i) = 0 for all i ∈ I_{t(k)}(x^k), or there exists an r < ∞ such that at major cycle r, I_t(x^k) = ∅ for all t = 1 to p. Therefore lim_{k→∞} φ(x^k) = 0. ∎

THEOREM 4.3. If K ≠ ∅ and we define I_t(x^k) = {i : A^t_i x^k - b^t_i > ε}, then Algorithm 2 converges to a point x ∈ K_ε in a finite number of major cycles.

PROOF. As in the proof of Theorem 3.3, and from Theorem 4.1, we know that if x^k ∉ K_ε, then for all x ∈ K, ‖x - x^{k+1}‖ < ‖x - x^k‖. Define e^k = x^k - x; since ‖e^k‖ is monotonically decreasing and bounded below, it converges, which implies lim_{k→∞}‖e^{k+1}‖ = lim_{k→∞}‖e^k‖. But this happens only if there exists an r < ∞ such that at major cycle r, I_t(x^k) = ∅ for all t = 1 to p. Otherwise, for every k there exists a subsystem t such that I_t(x^k) ≠ ∅, and then π^t A^t x^k - π^t b^t > ε, hence lim_{k→∞}‖e^k‖ ≠ lim_{k→∞}‖e^{k+1}‖, a contradiction. So Algorithm 2 must converge to a point in K_ε in a finite number of major cycles.

Now we derive a bound on r. From Lemma 2.1 it follows that ‖e^0‖ ≤ 2^{L-1}/√n. If at the beginning of major cycle r the current point x^k ∉ K_ε, there exists at least one subsystem t, and at least one constraint in it, such that A^t_i x^k - b^t_i > ε. So a change of point must occur in some step of this major cycle. Let t be the subsystem operated on in that step, x^g, x^{g+1} the points at the beginning and end of this step, and π^t the weight vector used. So

e^{g+1} = e^g - λ(π^t A^t x^g - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖². Thus:

‖e^{g+1}‖² ≤ ‖e^g‖² - λ(2-λ)(π^t A^t x^g - π^t b^t)²/‖π^t A^t‖².

As in the proof of Theorem 3.3, we get

‖e^{g+1}‖² ≤ ‖e^g‖² - λ(2-λ)ε²/m² < ‖e^g‖².

From this it follows that Algorithm 2 converges within r major cycles, where r ≤ m² 2^{2L-2} / (n λ(2-λ) ε²). ∎

5. THE PARALLEL SURROGATE CONSTRAINT METHOD

The surrogate constraint method can also be implemented to work on ALL of the subsystems A^t x ≤ b^t, t = 1 to p, of (1.1) SIMULTANEOUSLY. This is particularly suited for parallel computation. The algorithm generates one new point in each step; an operation is carried out with the current point on each subsystem in a parallel manner.

Algorithm 3: Parallel Surrogate Constraint Method

Initialization is the same as in Algorithm 1.

General Step k+1: Let x^k be the point obtained at the end of the previous step. Do the following for each subsystem A^t x ≤ b^t, t = 1 to p, in parallel.

Find I_t(x^k) as in Algorithm 2. If I_t(x^k) = ∅, define P_t(x^k) = x^k. If I_t(x^k) ≠ ∅, select the weight vector π^t as in Algorithm 2 and define

P_t(x^k) = x^k - d^k_t, where d^k_t = (π^t A^t x^k - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖².   (5.1)

If I_t(x^k) = ∅ for all t = 1 to p, then x^k is feasible to (1.1); terminate. Otherwise, define

P(x^k) = Σ_{t=1}^p τ_t P_t(x^k),

where the τ_t are nonnegative numbers summing to 1, with τ_t > 0 for all t such that I_t(x^k) ≠ ∅, and define

x^{k+1} = x^k + λ(P(x^k) - x^k), where 0 < λ < 2.   (5.2)
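One step of Algorithm 3 can be sketched as follows. The loop iterations are mutually independent, so in an actual parallel implementation each subsystem would be handled by its own processor; equal weights π and τ_t = 1/p are our simplifying choices here.

    import numpy as np

    def parallel_step(subsystems, x, lam=1.0, eps=1e-6):
        # One step of Algorithm 3; P[t] holds P_t(x^k) from (5.1)
        p = len(subsystems)
        P = np.tile(x, (p, 1))                     # default P_t(x^k) = x^k when I_t(x^k) is empty
        any_violated = False
        for t, (At, bt) in enumerate(subsystems):  # independent -> one processor per subsystem
            r = At @ x - bt
            I = np.where(r > eps)[0]               # I_t(x^k)
            if I.size == 0:
                continue
            any_violated = True
            pi = np.full(I.size, 1.0 / I.size)
            s = pi @ At[I]
            P[t] = x - (pi @ r[I]) * s / (s @ s)   # projection (5.1)
        if not any_violated:
            return x, True                         # x^k is feasible; terminate
        Px = P.mean(axis=0)                        # P(x^k) with tau_t = 1/p for all t
        return x + lam * (Px - x), False           # step (5.2)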

[FIG. 5.1. Diagram of the parallel surrogate constraint method.]

Convergence Results

LEMMA 5.1. Let V_t ∈ R^n for t = 1 to p, and let V = Σ_{t=1}^p τ_t V_t, where Σ_{t=1}^p τ_t = 1 and 0 ≤ τ_t ≤ 1 for all t = 1 to p, p a positive integer. Then ‖V‖² ≤ Σ_{t=1}^p τ_t ‖V_t‖².

PROOF. By induction on p. For p = 2, V = τ₁V₁ + τ₂V₂, where τ₁ + τ₂ = 1, τ₁ ≥ 0 and τ₂ ≥ 0. So ‖V‖² = ‖τ₁V₁ + τ₂V₂‖² = τ₁²‖V₁‖² + τ₂²‖V₂‖² + 2τ₁τ₂(V₁)^T V₂. Hence

τ₁‖V₁‖² + τ₂‖V₂‖² - ‖V‖² = τ₁(1-τ₁)‖V₁‖² + τ₂(1-τ₂)‖V₂‖² - 2τ₁τ₂(V₁)^T V₂ = τ₁τ₂‖V₁‖² + τ₂τ₁‖V₂‖² - 2τ₁τ₂(V₁)^T V₂ = τ₁τ₂‖V₁ - V₂‖² ≥ 0.

It follows that ‖V‖² ≤ τ₁‖V₁‖² + τ₂‖V₂‖². Hence the assertion is valid for p = 2.

Induction hypothesis: assume that the assertion is valid for p - 1. We will prove that the assertion is also valid for p. Let V = Σ_{t=1}^p τ_t V_t = Σ_{t=1}^{p-1} τ_t V_t + τ_p V_p = (1-τ_p)V̄ + τ_p V_p, where V̄ = Σ_{t=1}^{p-1} (τ_t/(1-τ_p)) V_t. From the above argument,

‖V‖² ≤ τ_p‖V_p‖² + (1-τ_p)‖V̄‖² ≤ τ_p‖V_p‖² + (1-τ_p) Σ_{t=1}^{p-1} (τ_t/(1-τ_p)) ‖V_t‖² = Σ_{t=1}^p τ_t ‖V_t‖²,

by the induction hypothesis. So the assertion is valid for all finite positive integers p. ∎

THEOREM 5.1. In Algorithm 3, if x^{k+1} ≠ x^k, then for all x ∈ K, ‖x - x^{k+1}‖ < ‖x - x^k‖.

PROOF. From (5.1) and (5.2),

x^{k+1} = x^k - λ Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖².

Let e^k = x^k - x; then

e^{k+1} = e^k - λ Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖².   (5.3)

Let V_t = (π^t A^t x^k - π^t b^t)(π^t A^t)^T / ‖π^t A^t‖² (with V_t = 0 for those t with I_t(x^k) = ∅), so that ‖V_t‖² = (π^t A^t x^k - π^t b^t)²/‖π^t A^t‖², and let V = Σ_{t=1}^p τ_t V_t. Then e^{k+1} = e^k - λV, and

‖e^{k+1}‖² = ‖e^k‖² + λ²‖V‖² - 2λ V^T e^k = ‖e^k‖² + λ²‖V‖² - 2λ Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)[π^t A^t(x^k - x)]/‖π^t A^t‖².

Since x ∈ K, π^t A^t x ≤ π^t b^t, so π^t A^t(x^k - x) ≥ π^t A^t x^k - π^t b^t ≥ 0, and the above gives

‖e^{k+1}‖² ≤ ‖e^k‖² + λ²‖V‖² - 2λ Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)²/‖π^t A^t‖² = ‖e^k‖² + λ²‖V‖² - 2λ Σ_{t=1}^p τ_t ‖V_t‖².

From Lemma 5.1 we have λ²‖V‖² ≤ λ² Σ_{t=1}^p τ_t ‖V_t‖², so

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ) Σ_{t=1}^p τ_t ‖V_t‖² < ‖e^k‖².   (5.4) ∎

THEOREM 5.2. If K ≠ ∅, any sequence {x^k} generated by Algorithm 3 has the property lim_{k→∞} φ(x^k) = 0.

PROOF. For any x ∈ K, Theorem 5.1 implies that the sequence {‖x^k - x‖} is monotonically decreasing, and thus it converges. It follows from (5.4) that

lim_{k→∞} Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)² / ‖π^t A^t‖² = 0.

Since τ_t > 0 for all t such that I_t(x^k) ≠ ∅, and the τ_t sum to 1, the above implies that lim_{k→∞}(π^t A^t x^k - π^t b^t) = 0 for all such t. Since for each subsystem t the weights π^t_j are strictly positive for all j ∈ I_t(x^k) and sum to 1, this implies that either lim_{k→∞}(A^t_j x^k - b^t_j) = 0 for all j ∈ I_t(x^k) and for all subsystems t = 1 to p, or there exists a k < ∞ such that I_t(x^k) = ∅ for all t = 1 to p. Therefore lim_{k→∞} φ(x^k) = 0. ∎

For the sake of simplicity, our following finite convergence proofs will be based on the assumption that the τ_t are all equal, that is, τ_t = 1/p for all t = 1 to p. In practice, a different choice may yield better computational performance; this has to be determined by computational experiment.

THEOREM 5.3. If K ≠ ∅, and we define I_t(x^k) = {i : A^t_i x^k - b^t_i > ε} in Algorithm 3, then it converges to a point x ∈ K_ε in a finite number of steps.

PROOF. As in the proof of Theorem 3.3, and from Theorem 5.1, we know that if x^k ∉ K_ε, then for all x ∈ K, ‖x - x^{k+1}‖ < ‖x - x^k‖. Define e^k = x^k - x; since ‖e^k‖ is monotonically decreasing and bounded below, it converges, which implies lim_{k→∞}‖e^{k+1}‖ = lim_{k→∞}‖e^k‖. But this happens only if there exists a k < ∞ such that at the k-th iteration I_t(x^k) = ∅ for all t = 1 to p. Otherwise, for every k there exists at least one subsystem t with I_t(x^k) ≠ ∅, and then π^t A^t x^k - π^t b^t > ε; since

‖e^{k+1}‖² ≤ ‖e^k‖² + (λ² - 2λ) Σ_{t=1}^p τ_t (π^t A^t x^k - π^t b^t)²/‖π^t A^t‖²,

this implies lim_{k→∞}‖e^k‖ ≠ lim_{k→∞}‖e^{k+1}‖, a contradiction. So Algorithm 3 must converge to a point in K_ε.

From Lemma 2.1 it follows that ‖e^0‖ ≤ 2^{L-1}/√n. If I_t(x^k) ≠ ∅ for at least one subsystem, say the t-th, then from (5.3) and (5.4) we get

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ) Σ_{s=1}^p τ_s (π^s A^s x^k - π^s b^s)²/‖π^s A^s‖² ≤ ‖e^k‖² - λ(2-λ) τ_t (π^t A^t x^k - π^t b^t)²/‖π^t A^t‖².

Since τ_t = 1/p ≥ 1/m, π^t A^t x^k - π^t b^t ≥ (Σ_i π^t_i) ε = ε, and ‖π^t A^t‖ ≤ ‖π^t‖₂ ‖A^t‖₂ ≤ ‖π^t‖₂ ‖A^t‖_F ≤ m, it follows that

‖e^{k+1}‖² ≤ ‖e^k‖² - λ(2-λ)ε²/m³.

Therefore Algorithm 3 converges to a point in K_ε within k steps, where k ≤ m³ 2^{2L-2}/(n λ(2-λ) ε²). ∎

A different parallel method. In the parallel method discussed above, we obtained for the t-th subsystem a surrogate constraint

π^t A^t x ≤ π^t b^t   (5.5)

and a point P_t(x^k) by projecting x^k onto this surrogate hyperplane, for each t = 1 to p such that I_t(x^k) ≠ ∅; the new point x^{k+1} is derived from a weighted average of these P_t(x^k). So this method can be viewed as a Cimmino-type method using groups of constraints instead of individual constraints, with surrogation within each group.

Another parallel method would just obtain the surrogate constraint (5.5) for each subsystem t such that I_t(x^k) ≠ ∅, and then take a positive combination of all the surrogate constraints so generated, leading to a surrogate constraint for the entire original system (1.1). If the weight assigned to subsystem t is δ_t > 0, this constraint is

Σ_{t: I_t(x^k)≠∅} δ_t (π^t A^t) x ≤ Σ_{t: I_t(x^k)≠∅} δ_t (π^t b^t).   (5.6)

The point P(x^k) is then defined to be the orthogonal projection of x^k onto (5.6) treated as an equation, and the next point x^{k+1} is obtained as in (5.2) using this P(x^k). This method is essentially Algorithm 1 using a parallel implementation for identifying all the violated constraints, with a different processor examining the constraints in each subsystem.
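A sketch of this variant follows: each (conceptually parallel) scan returns its surrogate row and right-hand side from (5.5), and they are combined into the single constraint (5.6) before one projection. Equal δ_t and equal π weights are our own choices here.

    import numpy as np

    def combined_surrogate_step(subsystems, x, lam=1.0, eps=1e-6):
        # Combine the per-subsystem surrogates (5.5) into (5.6), then project as in Algorithm 1
        rows, rhss = [], []
        for At, bt in subsystems:                  # each subsystem scanned by its own processor
            r = At @ x - bt
            I = np.where(r > eps)[0]
            if I.size == 0:
                continue
            pi = np.full(I.size, 1.0 / I.size)
            rows.append(pi @ At[I])                # surrogate row pi^t A^t
            rhss.append(pi @ bt[I])                # surrogate right-hand side pi^t b^t
        if not rows:
            return x, True                         # feasible; terminate
        s = np.mean(rows, axis=0)                  # combined surrogate (5.6) with equal delta_t
        c = np.mean(rhss)
        return x - lam * (s @ x - c) / (s @ s) * s, False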

6. COMPARISONS WITH EARLIER METHODS

The rate of convergence of all the surrogate constraint methods depends on the choice of the π vector. A 'better' π vector can make a larger improvement; see FIG. 6.1.

[FIG. 6.1. The effect of the π vector on the performance of the surrogate constraint methods: (a) poor improvement due to an improper surrogate hyperplane, π chosen by 'weight by error'; (b) good improvement due to a proper surrogate hyperplane, π chosen by 'weight equally'.]

Now let us compare the surrogate constraint methods, that is, Algorithms 1, 2 and 3 of this report, with the relaxation method for solving linear inequalities.

[FIG. 6.2. Comparison of the surrogate constraint method with the relaxation method: from x^k, the relaxation method projects onto a single original constraint, while the surrogate constraint method projects onto the surrogate hyperplane.]

In the relaxation method, at each iteration an orthogonal projection is made from the current point x^k onto an individual K_i for some i. However, K_i contains the information of only one constraint, and sometimes the projection onto K_i offers little improvement in reducing the distance from the iterate x^k to the set K. The surrogate hyperplane, on the other hand, contains the information of more than one violated constraint, so it can be expected to generate a better new point than the relaxation method. (See FIG. 6.2.)

Cimmino's method for linear inequalities identifies all violated constraints in each iteration; orthogonal projections are made simultaneously onto all violated constraints from the current point, and the new point is a convex combination of those projection points. (See FIG. 6.3.)

[FIG. 6.3. Geometric interpretation of Cimmino's method.]

Computational experiments have been carried out to compare the sequential surrogate constraint method (Algorithm 2) with the version of the relaxation method that processes the inequalities in cyclic order. We give below our preliminary computational results on randomly generated large sparse problems, carried out on the IBM 3090-400/VM mainframe computer at the University of Michigan. The problems are generated in such a way that the system has an interior feasible solution. The sequential surrogate

constraint method is implemented using in each step the weights suggested in rule (3) of Remark 3.2 with α = 0.2. The value of λ for both methods was taken to be 1.7. The results are listed in Table 6.1. Five test problems were generated in each dimension, all solved to the same accuracy; the figures reported are averages over the five problems. The speedup of the surrogate constraint method over the relaxation method ranged from 30 to 60, and the speedup increases as the problem size increases.

TABLE 6.1. Comparison of the Relaxation Method and Algorithm 2 (Sequential Surrogate Constraint Method)

    Problem size      Sparsity    Relaxation method           Sequential surrogate constraint method
    Rows   Columns       %        Proj.*  Sweeps**   CPU+     Subsys.#  Rows/subsys.##  Cycles++   CPU+
     5000    2500       2.0       10500     4.9      17.04        2          2500          3.4     0.511
     5000    5000       1.0       10675     5.2      23.49        2          2500          3.2     0.544
    10000    2500       1.0       22375     6.3      45.11        5          2000          2.7     0.806
    10000    5000       0.4       20985     5.9      54.06        5          2000          3.7     1.146
    10000   10000       0.4       23125     7.1      84.22        5          2000          2.8     1.371
    18000    5000       0.5       41125     8.4     213.00        9          2000          3.4     3.332
    18000    9000       0.2       44750     9.3     255.13        9          2000          3.8     4.002

    *  Average number of projections.    ** Average number of sweeps.
    +  Average CPU time (seconds).       #  Number of subsystems.
    ## Number of rows in each subsystem. ++ Average number of major cycles.

In the relaxation method, in each sweep all the constraints are examined once, from top to bottom. The average number of sweeps before termination varied from 5 to 10 among the problem sizes. Since the current point changes after each projection, it is not possible to implement a sweep of this method in a parallel fashion. In the sequential surrogate constraint method, in each major cycle the number of projections made is at most equal to the number of subsystems. In each major cycle, each constraint is examined once, but as explained earlier, this work can easily be parallelized. Also, the number of major cycles needed in the surrogate constraint method is much smaller than the number of sweeps needed in the relaxation method to achieve the same accuracy.

These computational results are very encouraging. More extensive experimentation is necessary to determine the strategies that give the best performance in implementing the surrogate constraint methods, such as the best choice of the weight vector in each step.

7. EXTENSIONS TO LINEAR EQUATIONS

It is very easy to modify Algorithms 1, 2 and 3 to solve a system of linear equations

Ax = b   (7.1)

by applying them to the equivalent system of linear inequalities

Ax ≤ b,  -Ax ≤ -b.   (7.2)

Many of the classical iterative methods, such as the successive approximation method, the Gauss-Seidel method, the SOR method, and the steepest descent method, may not converge for an arbitrary coefficient matrix A. Some methods require A to be positive definite or diagonally dominant; otherwise those methods would have to be applied to the system A^T Ax = A^T b. In the case of successive approximations, convergence requires that the spectral radius of an iteration matrix be less than one. The surrogate constraint methods, in contrast, only require that the system (7.1) be feasible. This is one advantage of the surrogate constraint methods over the classical iterative methods.

For each i, system (7.2) has both the constraints A_i x ≤ b_i and -A_i x ≤ -b_i. When x^k is the current point, if A_i x^k = b_i, both these constraints are satisfied. Otherwise, A_i x^k ≠ b_i, and exactly one of the two constraints is violated while the other is satisfied. Thus, when x^k is the current point, the set of violated constraints in (7.2) includes at most one of the constraints from the pair A_i x ≤ b_i, -A_i x ≤ -b_i. Using this, simplifications can be made in executing Algorithm 1, 2 or 3 on the system (7.2); a sketch of the reduction appears below.
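A sketch of this reduction, reusing the hypothetical algorithm1 routine sketched in Section 3; stacking A over -A doubles the row count, and the rows are re-normalized as Section 2 requires.

    import numpy as np

    def solve_equations(A, b, x0, lam=1.0, eps=1e-6):
        # Solve Ax = b by applying Algorithm 1 to the inequality system (7.2)
        A2 = np.vstack([A, -A])             # the 2m rows:  Ax <= b  and  -Ax <= -b
        b2 = np.concatenate([b, -b])
        norms = np.linalg.norm(A2, axis=1)  # assumes A_i != 0 for all i (Section 2)
        return algorithm1(A2 / norms[:, None], b2 / norms, x0, lam=lam, eps=eps)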

Acknowledgements: We are grateful to a referee for pointing out the result in Lemma 5.1 and its importance in the proof of Theorem 5.1.

REFERENCES

[1] AGMON, S., The Relaxation Method for Linear Inequalities, Canadian Journal of Mathematics 6 (1954), 382-392.
[2] ANDERSON, D.L.; DZIEWONSKI, A.M., Seismic Tomography, Scientific American 251 (1984), 58-66.
[3] BREGMAN, L.M., The Method of Successive Projection for Finding a Common Point of Convex Sets, Soviet Mathematics Doklady 6 (1965), 688-692.
[4] BREGMAN, L.M., The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming, U.S.S.R. Computational Mathematics and Mathematical Physics 3 (1967), 200-217.
[5] CENSOR, Y.; HERMAN, G.T., On Some Optimization Techniques in Image Reconstruction from Projections, Applied Numerical Mathematics 3 (1987), 365-391.
[6] CENSOR, Y.; ELFVING, T., New Methods for Linear Inequalities, Linear Algebra and Its Applications 42 (1982), 199-211.
[7] CENSOR, Y., Row-Action Methods for Huge and Sparse Systems and Their Applications, SIAM Review 23, No. 4 (1981), 444-466.
[8] CIMMINO, G., Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, La Ricerca Scientifica (Roma), Ser. II, Anno IX, 1 (1938), 326-333.
[9] DE PIERRO, A.R.; IUSEM, A.N., A Simultaneous Projections Method for Linear Inequalities, Linear Algebra and Its Applications 64 (1985), 243-253.
[10] EREMIN, I.I., The Relaxation Method of Solving Systems of Inequalities with Convex Functions on the Left Sides, Soviet Mathematics Doklady 6 (1965), 219-222.
[11] FLEMING, H.E., Satellite Remote Sensing by the Technique of Computerized Tomography, Journal of Applied Meteorology 21 (1982), 1538-1549.
[12] GACS, P.; LOVASZ, L., Khachiyan's Algorithm for Linear Programming, Report STAN-CS-79-750, Department of Computer Science, Stanford University, 1979.
[13] GUBIN, L.G.; POLYAK, B.T.; RAIK, E.V., The Method of Projections for Finding the Common Point of Convex Sets, U.S.S.R. Computational Mathematics and Mathematical Physics 6 (1967), 1-24.
[14] KACZMARZ, S., Angenäherte Auflösung von Systemen linearer Gleichungen, Bulletin International de l'Académie Polonaise des Sciences et des Lettres A 35 (1937), 355-357.
[15] KARMARKAR, N., A New Polynomial-Time Algorithm for Linear Programming, Combinatorica 4 (1984), 373-395.
[16] MOTZKIN, T.S.; SCHOENBERG, I.J., The Relaxation Method for Linear Inequalities, Canadian Journal of Mathematics 6 (1954), 393-404.
[17] MURTY, K.G., Linear Complementarity, Linear and Nonlinear Programming, Heldermann Verlag, Berlin, 1988.
[18] MURTY, K.G., Linear Programming, John Wiley & Sons, 1983.
[19] TELGEN, J., On Relaxation Methods for Systems of Linear Inequalities, European Journal of Operational Research 9 (1982), 184-189.