NEAREST POINTS IN NONSIMPLICIAL CONES AND LCPs WITH PSD SYMMETRIC MATRICES

K.S. AL-SULTAN
Department of Systems Engineering
King Fahd University of Petroleum and Minerals
Dhahran 31261, Saudi Arabia

and

K.G. MURTY
Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 90-22
August 1990

Abstract. We discuss conditions for the equivalence of a linear complementarity problem (LCP) with a positive semidefinite (PSD) symmetric matrix to a nearest point problem over a Pos cone. We then develop an algorithm for this nearest point problem, and report on its computational performance.

Key Words: nearest point problem, convex polyhedral cones, Pos cones, linear complementarity problem.

1. INTRODUCTION

Given a point q ∈ R^n and a matrix Q of order n × m, we consider the nearest point problem (NPP), denoted by the symbol [Q; q], of finding the nearest point (by Euclidean distance) to q in the cone Pos(Q) = {x : x = Qλ, λ = (λ_1, ..., λ_m)^T ≥ 0}. A variety of applications of this problem in robotics have been discussed [2]. The problem also appears prominently in remote sensing, where the composition of a mixture of various constituents is to be estimated using measurements on signals from the mixture and from the pure constituents. D.R. Wilhelmsen [8] discusses applications of the NPP in constructing approximations of functions. Finally, the direction finding routine in each step of the gravitational method for linear programs [1] is an NPP. We extend an algorithm developed in [7] for the case of a simplicial cone to the NPP considered here.
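As a concrete point of reference, the NPP [Q; q] is exactly a nonnegative least squares problem, so a standard NNLS solver can serve as a small-scale oracle when experimenting with the methods in this report. The following sketch is ours, not part of the original report; it uses scipy.optimize.nnls on arbitrary random test data.

```python
# Reference oracle for the NPP [Q; q]: finding the nearest point in
# Pos(Q) to q is a nonnegative least squares problem.
# Illustrative sketch only; Q and q are arbitrary test data.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, m = 5, 8
Q = rng.uniform(-5, 5, size=(n, m))   # columns generate the cone Pos(Q)
q = rng.uniform(-20, 20, size=n)

lam, _ = nnls(Q, q)        # optimum combination vector, lambda* >= 0
x_star = Q @ lam           # nearest point in Pos(Q) to q
print("distance:", np.linalg.norm(q - x_star))
```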

For any matrix A, we denote by A_i., A.j its ith row and jth column respectively. If S and T are two sets, S\T denotes the set of all elements of S which are not in T. |S| denotes the cardinality of the set S. For any vector x, ‖x‖ denotes its Euclidean norm. Without any loss of generality we assume that every column vector of Q is nonzero. The problem is to find λ = (λ_1, ..., λ_m)^T to

$$\text{minimize } \|q - Q\lambda\|^2 \quad \text{subject to } \lambda \ge 0 \qquad (1)$$

An optimum solution λ* for (1) is called an optimum combination vector for the NPP [Q; q]; the desired nearest point in Pos(Q) is then x* = Qλ*. The nearest point x* is always unique, but the optimum combination vector λ* may not be unique.

If the rank of Q is p < n, obtain any maximal linearly independent subset of rows of Q, say {Q_1., ..., Q_p.}, let Q' be the p × m submatrix of Q consisting of these rows, and F their linear hull. Let q̄ = (q̄_1, ..., q̄_n)^T be the orthogonal projection of q in F, and let q' = (q̄_1, ..., q̄_p)^T be the subvector of q̄ corresponding to the rows in Q'. Then any optimum combination vector for [Q; q] is also an optimum combination vector for [Q'; q'], and conversely. Thus it is possible to transform the NPP [Q; q] into one of the same form with the matrix of full row rank; however, this transformation is computationally expensive, as it involves finding a maximal linearly independent subset of rows of the original Q. Our algorithm works whether Q is of full rank or not.

For any 1 ≤ r ≤ m, if the ray of Q.r is not an extreme ray of Pos(Q), deleting the column Q.r from Q does not change the cone Pos(Q), and hence the NPP, and this simplifies the NPP. However, detecting such Q.r is computationally expensive. Our algorithm will work even if some Q.r correspond to non-extreme rays of Pos(Q).
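A related, easily verified preprocessing fact: Pos(Q) lies in the linear hull of the columns of Q, so replacing q by its orthogonal projection onto that hull leaves the nearest point unchanged. A minimal numpy sketch of this projection (our illustration; the SVD-based basis and the rank tolerance are implementation choices, not from the report):

```python
import numpy as np

def project_to_column_space(Q: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Orthogonal projection of q onto the linear hull of the columns of Q.

    Replacing q by this projection does not change the nearest point in
    Pos(Q), since the cone lies inside the column space of Q.
    """
    U, s, _ = np.linalg.svd(Q, full_matrices=False)
    tol = max(Q.shape) * np.finfo(float).eps * (s[0] if s.size else 0.0)
    B = U[:, s > tol]          # orthonormal basis for the column space
    return B @ (B.T @ q)
```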

2. THE SPECIAL CASE OF DIMENSION 2

Suppose n = 2. The following direct method can be used.

Step 1: In this case it is possible to find a feasible solution λ* to the 2 × m system Qλ = q, λ ≥ 0, if one exists, by simple geometric procedures. Such a λ* is an optimum combination vector for [Q; q]. Otherwise, there is no feasible solution to this system; go to Step 2.

Step 2: For 1 ≤ j ≤ m, the nearest point q^j in the ray of Q.j to q is given by

$$q^j = \begin{cases} 0 & \text{if } (Q_{.j})^T q \le 0 \\[4pt] \dfrac{(Q_{.j})^T q}{\|Q_{.j}\|^2}\, Q_{.j} & \text{if } (Q_{.j})^T q > 0 \end{cases} \qquad (2)$$

Let p be such that q^p is the closest among all the q^j to q. Define λ* = (λ*_j) by λ*_j = 0 for all j ≠ p, and λ*_p = (Q.p)^T q / ‖Q.p‖². This λ* is the optimum combination vector for [Q; q].

3. RELATIONSHIPS TO LCPs WITH SYMMETRIC PSD MATRICES

The KKT optimality conditions for (1) lead to the following LCP: find μ = (μ_1, ..., μ_m)^T and λ satisfying

$$\mu - (Q^TQ)\lambda = -Q^Tq, \quad \mu \ge 0,\ \lambda \ge 0, \quad \mu^T\lambda = 0 \qquad (3)$$

If (μ, λ) is a solution of the LCP (3), then λ is an optimum solution for (1). Conversely, if λ is an optimum solution of (1), let μ = Q^TQλ − Q^Tq; then (μ, λ) is a solution of the LCP (3). Thus, any NPP is equivalent to an LCP with a PSD symmetric matrix.

Now suppose M is a PSD symmetric matrix of order m × m and rank n. Consider the LCP (b, M), which is to find w ∈ R^m, z ∈ R^m satisfying

$$w - Mz = b, \quad w \ge 0,\ z \ge 0, \quad w^Tz = 0 \qquad (4)$$

We can find a matrix Q of order n × m and rank n such that M = Q^TQ (for example, Q can be taken as the Cholesky factor of M; subroutine LCHRG in IMSL computes this). Consider the following system in the variables y = (y_1, ..., y_n)^T:

$$Q^Ty = -b \qquad (5)$$

The system (5) may not have a solution.
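Numerically, the construction of Q and the solvability test for system (5) can be sketched as follows. This is our illustration, not the report's FORTRAN implementation: since np.linalg.cholesky requires strict positive definiteness, the sketch factors a possibly singular PSD matrix M through its eigendecomposition instead; any Q with M = Q^TQ serves equally well here.

```python
import numpy as np

def factor_psd(M: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return Q of order (rank M) x m with M = Q^T Q.

    The report mentions the Cholesky factor; we factor through the
    eigendecomposition so that singular PSD matrices are handled too.
    """
    w, V = np.linalg.eigh(M)
    keep = w > tol * w.max()
    return np.sqrt(w[keep])[:, None] * V[:, keep].T

def npp_from_lcp(M: np.ndarray, b: np.ndarray, tol: float = 1e-8):
    """If the LCP (b, M) is transformable, return (Q, y) for the NPP [Q; y]."""
    Q = factor_psd(M)
    y, *_ = np.linalg.lstsq(Q.T, -b, rcond=None)
    if np.linalg.norm(Q.T @ y + b) > tol * (1.0 + np.linalg.norm(b)):
        return None            # system (5) inconsistent: no equivalent NPP
    return Q, y
```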

As an example, consider

$$M = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} -4 \\ -7 \end{pmatrix}$$

Here Q = (1, 1). In this example it can be verified that system (5) has no solution, even though the LCP (b, M) has the unique solution (w = (3, 0)^T, z = (0, 7)^T). If (5) does have a solution y, then the LCP (b, M) is equivalent to the NPP [Q; y] as discussed above.

Theorem 1: Consider the LCP (b, M), where M is a PSD symmetric matrix of order m and rank n. Let Q be any matrix of order n × m satisfying M = Q^TQ. Then the LCP (b, M) can be transformed into a nearest point problem iff the system Q^Ty = −b has a solution y.

Proof: If y is a solution of Q^Ty = −b, the LCP (b, M) is equivalent to the NPP [Q; y] as discussed above. Now suppose the LCP (b, M) is equivalent to an NPP [Q̄; q̄]. Without any loss of generality we can assume that Q̄ is of full row rank. From the above discussion we know that Q̄^TQ̄ = M and Q̄^Tq̄ = −b. Since M is of order m × m and rank n, these facts imply that Q̄ is of order n × m. From standard results in linear algebra we know that if Q is any matrix of order n × m satisfying Q^TQ = M, then the linear hull of the column vectors of Q^T is the same as F, the linear hull of the column vectors of M; hence (5) has a solution iff −b ∈ F, a condition which depends only on M. Thus, if (5) has a solution y for some Q satisfying Q^TQ = M, then it has a solution for all such Q. These facts imply that the system Q^Ty = −b must have a solution y in this case too.

Corollary 1: The LCP (b, M), where M is a PSD symmetric matrix, can be transformed into an NPP iff b ∈ F, the linear hull of the set of column vectors of M.

Proof: Follows from the proof of Theorem 1.

A vector λ ∈ R^m, λ ≥ 0, is an optimum solution for (1) iff there exists a μ ∈ R^m, μ ≥ 0, such that (μ, λ) together satisfy (3); hence μ = Q^T(Qλ) − Q^Tq, and as Qλ is constant for all optimum solutions λ to (1), the vector μ remains the same in all solutions of the LCP (3). Thus, if (μ, λ) is a solution of the LCP (3), every solution of it is of the form (μ, λ̄) where λ̄ satisfies: λ̄_j = 0 if j is such that μ_j > 0, μ − Q^TQλ̄ = −Q^Tq, and λ̄ ≥ 0.
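To make the NPP-to-LCP correspondence concrete, here is a short sketch of ours that assembles a solution of the LCP (3) from an NNLS solution of the NPP; the tolerances are arbitrary implementation choices.

```python
import numpy as np
from scipy.optimize import nnls

def lcp3_solution_from_npp(Q, q, tol=1e-8):
    """Solve the NPP [Q; q] by NNLS and assemble the corresponding
    solution (mu, lam) of the LCP (3), with mu = Q^T Q lam - Q^T q."""
    lam, _ = nnls(Q, q)
    mu = Q.T @ (Q @ lam - q)
    # Sanity checks of (3): nonnegativity and complementarity.
    assert mu.min() > -tol and lam.min() >= 0.0
    assert abs(mu @ lam) <= tol * (1.0 + np.abs(mu) @ np.abs(lam))
    return mu, lam
```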

Definition: An index j, 1 ≤ j ≤ m, is said to be a critical index for the LCP (3), or for the NPP (1), if λ_j > 0 in some solution of (3), or equivalently, if the nearest point in Pos(Q) to q can be expressed as Σ_{k=1}^m λ_kQ.k where λ_k ≥ 0 for all k, and λ_k > 0 for k = j. Hence, if j is a critical index for (3), (1), then μ_j = 0 in every solution of (3).

Reduction to a lower dimensional problem using a critical index

Consider (1), (3). Let M = (m_ij) = Q^TQ, b = (b_i) = −Q^Tq. Suppose 1 is a critical index. Then in every solution to (3) we have μ_1 = 0, and hence (3) is equivalent to

$$\begin{aligned} -M_{1.}\lambda &= b_1 \\ \mu_i - M_{i.}\lambda &= b_i, \quad i = 2 \text{ to } m \\ \mu_2, \dots, \mu_m \ge 0, \quad \lambda &\ge 0; \quad \text{and } \mu_i\lambda_i = 0, \ i = 2 \text{ to } m. \end{aligned} \qquad (6)$$

Since m_11 = (Q.1)^TQ.1 > 0, from the first equation in (6) we can get λ_1 in terms of λ_2 to λ_m and eliminate it from the system. This leads directly to an LCP in the variables μ_2 to μ_m and λ_2 to λ_m. To get this lower order LCP, perform a Gaussian pivot step on (−M ⋮ b) with −M.1 as the pivot column and row 1 as the pivot row. Suppose this leads to

$$\begin{pmatrix} -m_{11} & -m_{12} & \cdots & -m_{1m} & b_1 \\ 0 & -\bar m_{22} & \cdots & -\bar m_{2m} & \bar b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & -\bar m_{m2} & \cdots & -\bar m_{mm} & \bar b_m \end{pmatrix}$$

Let M̄ = (m̄_ij : i, j = 2 to m), b̄ = (b̄_2, ..., b̄_m)^T, μ̄ = (μ_2, ..., μ_m)^T, λ̄ = (λ_2, ..., λ_m)^T. Then the lower order LCP obtained after eliminating λ_1 from (6) is

$$\bar\mu - \bar M\bar\lambda = \bar b, \quad \bar\mu, \bar\lambda \ge 0, \quad \bar\lambda^T\bar\mu = 0 \qquad (7)$$

Since 1 is a critical index, every solution for (3) comes from a solution to (7), with μ_1 = 0 and λ_1 obtained from the first equation in (6).
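The pivot step above is compact to express with numpy. The sketch below is ours; it takes M, b, and a critical index j (0-based here, so the text's index 1 corresponds to j = 0) and returns the reduced pair (M̄, b̄) of (7). As in the text, it assumes j really is a critical index.

```python
import numpy as np

def reduce_by_critical_index(M, b, j=0):
    """Gaussian pivot eliminating lambda_j from (6); returns (Mbar, bbar).

    Mirrors the pivot on (-M : b) with row j as pivot row: for i != j,
    mbar_ik = m_ik - m_ij * m_jk / m_jj, bbar_i = b_i - m_ij * b_j / m_jj.
    Assumes m_jj > 0 (columns of Q are nonzero).
    """
    mjj = M[j, j]
    keep = [i for i in range(M.shape[0]) if i != j]
    Mbar = M[np.ix_(keep, keep)] - np.outer(M[keep, j], M[j, keep]) / mjj
    bbar = b[keep] - M[keep, j] * (b[j] / mjj)
    return Mbar, bbar
```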

Since M is PSD, symmetric, and of rank n, from the manner in which M̄ is obtained, M̄ of order (m − 1) × (m − 1) is PSD, symmetric, and of rank (n − 1). Also, since b is in the linear hull of the columns of M, b̄ is in the linear hull of the columns of M̄. So M̄ has a Cholesky factor Q̄ of order (n − 1) × (m − 1) and rank (n − 1), and b̄ is in the linear hull of the columns of Q̄^T, i.e., the system Q̄^Ty = −b̄ has a solution y ∈ R^{n−1}. Thus (7) is equivalent to the lower order NPP [Q̄; y], where Q̄ is of order (n − 1) × (m − 1) and rank n − 1. Hence, if a critical index for (1) is known, the problem can be reduced to a lower dimensional one.

Reduction Using Geometrical Arguments

Given a critical index for the NPP [Q; q], its reduction to a lower dimensional NPP can also be carried out using geometric arguments, as discussed in [7]. Again, suppose 1 is a critical index. Then the nearest point to q in Pos(Q) is also the nearest point to q in Pos((Q ⋮ −Q.1)). Define

$$Q^-_{.j} = Q_{.j} - \frac{Q_{.1}(Q_{.1})^TQ_{.j}}{\|Q_{.1}\|^2}, \quad j = 2 \text{ to } m, \qquad q^- = q - \frac{Q_{.1}(Q_{.1})^Tq}{\|Q_{.1}\|^2}$$

For j = 2 to m, Q⁻.j is the orthogonal projection of Q.j in the hyperplane through the origin orthogonal to the ray of Q.1; likewise q⁻ is the orthogonal projection of q in the same hyperplane. The cone Pos((Q ⋮ −Q.1)) is the direct sum of the full line generated by Q.1 and the cone Pos(Q⁻), where Q⁻ = (Q⁻.2, ..., Q⁻.m). If x⁻* is the solution of the NPP [Q⁻; q⁻] as embedded in R^n, then x* = x⁻* + Q.1(Q.1)^Tq/‖Q.1‖² is the solution of the original NPP [Q; q]. Q⁻ is of order n × (m − 1), and from the construction it is clear that the rank of Q⁻ is one less than the rank of Q; hence the set of row vectors of Q⁻ forms a linearly dependent set. It can be verified that (Q⁻)^T(Q⁻) = M̄ discussed earlier, and that the NPP [Q⁻; q⁻] is equivalent to the LCP (b̄, M̄). Since Q⁻ is not of full row rank, it is possible to transform the NPP [Q⁻; q⁻] into an equivalent NPP of the form [Q⁼; q⁼], where Q⁼ is obtained from Q⁻ by dropping a dependent row vector from it, as discussed in Section 1, but it is not necessary to do this work.
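The geometric reduction and the recovery of x* are a few lines of numpy. The following sketch is ours (again 0-based, so the text's critical index 1 is j = 0); the lift step is valid precisely because the deleted index is critical.

```python
import numpy as np

def deflate(Q, q, j=0):
    """Project the columns of Q (except column j) and q onto the hyperplane
    through the origin orthogonal to Q_.j, per the geometric reduction."""
    d = Q[:, j] / np.linalg.norm(Q[:, j])   # unit vector along the ray of Q_.j
    keep = [k for k in range(Q.shape[1]) if k != j]
    Q_minus = Q[:, keep] - np.outer(d, d @ Q[:, keep])
    q_minus = q - d * (d @ q)
    return Q_minus, q_minus, d

def lift(x_minus, q, d):
    """Recover the nearest point in Pos(Q): x* = x^- + Q_.j (Q_.j)^T q / ||Q_.j||^2,
    valid when the deleted index j is a critical index."""
    return x_minus + d * (d @ q)
```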

4. CONDITIONS FOR CRITICALITY

Given z ∈ R^n, z ≠ q, define B(q; z) = {x : ‖x − q‖ < ‖q − z‖} ⊂ R^n. T(q; z) = {x : (x − z)^T(q − z) = 0} is the tangent plane to B(q; z) at its boundary point z.

Property 1: The point z ∈ R^n is said to satisfy Property 1 if z ≠ 0 and it is the nearest point (by Euclidean distance) to q on its ray {γz : γ ≥ 0}. It can be verified that z satisfies Property 1 iff z ≠ 0 and z^T(q − z) = 0, i.e., 0 ∈ T(q; z). For such points we therefore have T(q; z) = {x : x^T(q − z) = 0}.

The open half space defined by T(q; z) containing q is called the near side of T(q; z), while its complement is called the far side of T(q; z). So, for points z satisfying Property 1, the near side of T(q; z) is {x : x^T(q − z) > 0}, and the far side of T(q; z) is {x : x^T(q − z) ≤ 0}.

For z satisfying Property 1, we define N(z) = {j : j = 1 to m and (Q.j)^T(q − z) > 0}; it is the set of j such that Q.j is on the near side of T(q; z).

Let Γ = {1, ..., m}. For ∅ ≠ S ⊆ Γ, we define

Q.S = the matrix of order n × |S| whose column vectors are Q.j for j ∈ S;
H(S) = the linear hull of the column vectors of Q.S in R^n;
q(S) = the orthogonal projection of q in H(S).

In the algorithm we will deal with subsets S ⊆ Γ satisfying: {Q.j : j ∈ S} is linearly independent. For such sets S, we have q(S) = Q.S((Q.S)^TQ.S)^{-1}(Q.S)^Tq.

We have the following Theorems 2 and 3, based on corresponding results [7] for the NPP in simplicial cones.

Theorem 2: A point x̄ ∈ Pos(Q) is the nearest point in Pos(Q) to q iff (i) x̄ satisfies Property 1, and (ii) either x̄ = q itself, or N(x̄) = ∅.

Proof: If x̄ = q, then q ∈ Pos(Q) and hence q is the nearest point in Pos(Q) to itself. So assume that x̄ ≠ q. Since x̄ ∈ Pos(Q), it is the nearest point in Pos(Q) to q iff Pos(Q) ∩ B(q; x̄) = ∅, i.e., iff x̄ is a boundary point of Pos(Q) and T(q; x̄) separates Pos(Q) and B(q; x̄), which happens iff N(x̄) = ∅.
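Theorem 2 gives a cheap numerical optimality certificate. A sketch of ours (tolerances are arbitrary; note that for x = 0 the Property 1 test holds trivially and optimality is decided by the N(x) test alone, exactly as the algorithm in Section 5 uses it):

```python
import numpy as np

def is_nearest(Q, q, x, tol=1e-8):
    """Check the two conditions of Theorem 2 for a candidate x in Pos(Q):
    (i) Property 1, i.e. x^T (q - x) = 0, and
    (ii) x = q, or N(x) is empty, i.e. (Q_.j)^T (q - x) <= 0 for all j."""
    if np.allclose(x, q):
        return True
    prop1 = abs(x @ (q - x)) <= tol * (1.0 + np.linalg.norm(x))
    N_empty = (Q.T @ (q - x) <= tol).all()
    return prop1 and N_empty
```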

Theorem 3: Assume that q ∉ Pos(Q). Let x̄ ∈ Pos(Q) satisfy Property 1. If N(x̄) is a singleton set {h}, then h is a critical index for the NPP [Q; q].

Proof: Since N(x̄) ≠ ∅, by Theorem 2, x̄ is not the nearest point in Pos(Q) to q. Let x* be the nearest point in Pos(Q) to q. So ‖x* − q‖ < ‖x̄ − q‖, and hence x* ∈ B(q; x̄). Since x̄ satisfies Property 1, 0 ∈ T(q; x̄). Also, since N(x̄) = {h}, Q.j is on the far side of T(q; x̄), that is, T(q; x̄) separates the ray of Q.j from B(q; x̄), for all j ≠ h. Hence T(q; x̄) separates Pos(Q.1, ..., Q.h−1, Q.h+1, ..., Q.m) and B(q; x̄). These facts imply that Pos(Q) ∩ B(q; x̄) ≠ ∅, and that none of the points in it is in Pos(Q.1, ..., Q.h−1, Q.h+1, ..., Q.m). Hence x* satisfies this property too; that is, if x* = Σ_{j=1}^m γ_jQ.j with γ_j ≥ 0 for all j, then we must have γ_h > 0. Therefore h is a critical index for the NPP [Q; q].

5. THE ALGORITHM

If (Q.j)^Tq ≤ 0 for all j = 1 to m, then N(0) = ∅, and hence 0 is the nearest point in Pos(Q) to q by Theorem 2; in this case (μ* = −Q^Tq, λ* = 0) is the solution of the LCP (3). So, in the sequel we assume that (Q.j)^Tq > 0 for at least one j.

The algorithm applies a routine for finding a critical index at most n − 2 times. This routine itself may either terminate with the nearest point (in which case the whole algorithm terminates), or with a critical index. In the latter case we use the critical index to reduce the NPP to one of lower dimension as discussed above, and repeat the whole procedure on the reduced problem. As the rank reduces by 1 with each reduction, the routine will be called at most (n − 2) times during the algorithm: when the rank becomes 2, the reduced problem can be solved by the special direct method discussed in Section 2, and from its solution an optimum combination vector for the original problem is built up using equations of the form of the first in (6) from the earlier reduction steps.

The routine maintains a subset S ⊆ Γ such that {Q.j : j ∈ S} is linearly independent, and a current point x̄ ∈ Pos(Q.S) which always satisfies Property 1; x̄ gets closer to q as the algorithm progresses. λ = (λ_j) ≥ 0 is the combination vector corresponding to x̄, satisfying λ_j = 0 for all j ∉ S and Qλ = x̄.

Routine for Finding a Critical Index

Step 1 [Initialization]: For each j = 1 to m, define q^j as in (2); q^j is the nearest point to q in the ray of Q.j. If q^j = 0 for all j = 1 to m, then 0 is the nearest point in Pos(Q) to q; terminate the whole algorithm. If q^j ≠ 0 for at least one j, let q^h be the nearest to q among all the q^j, breaking ties arbitrarily. Set x̄ = q^h; it satisfies Property 1. Let S = {h}. Define the corresponding λ = (λ_j) where λ_h = (Q.h)^Tq/‖Q.h‖² and λ_j = 0 for j ≠ h.

Step 2: Compute N(x̄) = {j : 1 ≤ j ≤ m and (Q.j)^T(q − x̄) > 0}. If N(x̄) = ∅, x̄ is the nearest point in Pos(Q) to q and the corresponding λ is an optimum combination vector; terminate the whole algorithm. If N(x̄) is a singleton set {h}, h is a critical index for the present nearest point problem; terminate this routine. If |N(x̄)| ≥ 2, go to Step 3.

Step 3: If N(x̄) ∩ (Γ\S) = ∅, go to Step 5; otherwise go to Step 4.

Step 4: Select a p ∈ N(x̄) ∩ (Γ\S). Compute q̂, the orthogonal projection of q onto the linear hull of {x̄, Q.p}; q̂ will be in Pos(x̄, Q.p). Actually q̂ = α_1x̄ + α_2Q.p, where both α_1 > 0 and α_2 > 0. Define λ̂ = (λ̂_j) where λ̂_j = α_1λ_j for j ∈ S, α_2 for j = p, and 0 otherwise. Let S_1 = S ∪ {p}. Obtain ((Q.S_1)^TQ.S_1)^{-1} from ((Q.S)^TQ.S)^{-1} using any of the updating schemes. If linear dependence of Q.p with the columns of Q.S is signaled, go to Step 5; otherwise replace S, x̄, λ by S_1, q̂, λ̂ respectively, and go back to Step 2.

Step 5: Compute q(S) = Q.Sα(S), where α(S) = (α_j(S) : j ∈ S) = ((Q.S)^TQ.S)^{-1}(Q.S)^Tq. If α(S) ≥ 0, define λ̂ = (λ̂_j) by λ̂_j = α_j(S) if j ∈ S, 0 otherwise; replace x̄, λ by q(S), λ̂ respectively, and go back to Step 2. If α(S) ≱ 0, go to Step 6.

Step 6: Let x̂ denote the last point in Pos(Q) as we move from x̄ along the line segment joining it to q(S). Actually

$$\hat x = \sum_{j \in S} \big((1 - \beta)\lambda_j + \beta\alpha_j(S)\big)\,Q_{.j}$$

where

$$\beta = \min\left\{\frac{\lambda_j}{\lambda_j - \alpha_j(S)} : j \in S \text{ such that } \alpha_j(S) < 0\right\} \qquad (8)$$

Let k ∈ S attain the minimum in (8), breaking ties arbitrarily. Define λ̂ = (λ̂_j) where λ̂_j = (1 − β)λ_j + βα_j(S) for j ∈ S, and 0 for j ∉ S. Replace x̄, λ by x̂, λ̂ respectively, delete k from S, and go to Step 5.
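Steps 5 and 6 together form an inner loop very much like the one in Lawson–Hanson NNLS [5]. Here is a compact numpy sketch of ours; lstsq stands in for the maintained inverse or Cholesky factor, and S holds 0-based column indices.

```python
import numpy as np

def steps_5_and_6(Q, q, S, lam, tol=1e-12):
    """Sketch of the Step 5 / Step 6 cycle: project q onto the linear hull
    of {Q_.j : j in S}; while some alpha_j(S) < 0, apply the ratio test (8),
    drop the blocking index k, and re-project."""
    S = list(S)
    lam = lam.copy()
    while True:
        alpha, *_ = np.linalg.lstsq(Q[:, S], q, rcond=None)  # alpha(S)
        if alpha.min() >= -tol:              # Step 5: q(S) lies in Pos(Q_.S)
            lam[:] = 0.0
            lam[S] = np.maximum(alpha, 0.0)
            return S, lam                    # new x-bar is Q @ lam = q(S)
        lamS = lam[S]                        # Step 6: ratio test (8)
        neg = np.flatnonzero(alpha < -tol)
        ratios = lamS[neg] / (lamS[neg] - alpha[neg])
        beta = ratios.min()
        k = neg[ratios.argmin()]             # position of the blocking index in S
        lamS = (1.0 - beta) * lamS + beta * alpha
        lamS[k] = 0.0                        # exact zero for the dropped index
        lam[:] = 0.0
        lam[S] = lamS
        del S[k]
```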

Finite Convergence of the Algorithm

We have the following facts:

1. At each pass through Step 4, ‖x̄ − q‖ strictly decreases. At each pass through Steps 5 and 6, ‖x̄ − q‖ may decrease or stay the same.

2. Step 4 can be performed at most n times consecutively before linear dependence is signaled.

3. Steps 5 and 6 can be performed at most n − 1 times consecutively before the projection gets into the relative interior of the Pos of the current set.

4. By fact 1, the same subset of Γ cannot reappear as S once it changes.

Since there are only a finite number of subsets of Γ, these facts imply that the routine must terminate in a finite number of steps. Also, the overall algorithm is finite, as it uses the above routine at most (n − 2) times.
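Putting the pieces together, the overall algorithm is a loop of routine calls and reductions. The skeleton below is our sketch only: find_critical_index is an assumed callable implementing the routine above, deflate is the helper from the geometric-reduction sketch in Section 3, and the rank-2 base case (the direct method of Section 2) is omitted for brevity.

```python
def nearest_point(Q, q, find_critical_index):
    """Skeleton driver (our sketch). find_critical_index(Q, q) is assumed
    to return ('optimal', x) when it finds the nearest point, or
    ('critical', j) with a 0-based critical index j."""
    status, val = find_critical_index(Q, q)
    if status == 'optimal':
        return val
    Qm, qm, d = deflate(Q, q, val)       # reduce using the critical index
    x_minus = nearest_point(Qm, qm, find_critical_index)
    return x_minus + d * (d @ q)         # lift back, as in Section 3
```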

6. IMPLEMENTATION AND RELATED NUMERICAL ISSUES

((Q.S)^TQ.S)^{-1} can be updated by standard updating schemes for projection matrices whenever S changes (it always changes by the addition or deletion of one element). These schemes will detect linear dependence of the set of column vectors of the new Q.S, if it occurs when a new element is added to the set S. An even better implementation is obtained by updating and using the Cholesky factor of (Q.S)^TQ.S instead of its inverse; this cuts down the amount of work for each updating from O(|S|³) to O(|S|²).

Computing N(x̄) completely in Step 2 is expensive, particularly when m is large. We found that an efficient way to carry out Step 2, and to select a p for carrying out Step 4 if that is the step to move to next, is to begin the search for the new p from the previous p, continue up to m, and then from 1 up to the previous p again, until the first eligible candidate is noticed, at which point the search is terminated by selecting that candidate as the new p. This is similar to the LRC (least recently considered) entering index strategy commonly mentioned in the literature on the simplex algorithm for linear programming.
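The wrap-around search just described can be sketched as follows (our illustration; prev_p is the 0-based index at which the previous search stopped):

```python
def next_entering_index(Q, q, x, prev_p, tol=1e-10):
    """LRC-style search: scan j = prev_p+1, ..., m-1, 0, ..., prev_p and
    return the first j with (Q_.j)^T (q - x) > 0, i.e. the first eligible
    entering index; return None if N(x) is empty."""
    m = Q.shape[1]
    r = q - x                      # only the sign of (Q_.j)^T r matters
    for step in range(1, m + 1):
        j = (prev_p + step) % m
        if Q[:, j] @ r > tol:
            return j
    return None
```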

Computational Experience

The above algorithm was coded in FORTRAN 77 and tested on APOLLO series 4000 machines. Its performance is compared with that of the following two algorithms. The first, HH, is the algorithm of K. Haskell and R. Hanson [4] for solving linearly constrained least squares problems, which in our case boils down to their implementation of the Lawson and Hanson algorithm [5] for Pos cones; the code is available as ACM software 587. The other algorithm, WIL, is that of D.R. Wilhelmsen [8]. We coded Wilhelmsen's algorithm in FORTRAN 77, using updating procedures based on the Cholesky factor and the LRC (least recently considered) entering rule discussed above. Surprisingly, this rule turns out to be far superior to other rules; these strategies accelerate the algorithm considerably.

Test problems were constructed by generating the elements of Q to be uniformly distributed between −5 and 5, and the elements of q to be uniformly distributed between −20 and 20. All the test problems are fully dense. The timings of all three algorithms are shown in Table 1.

Table 1: Summary of performance results (average time per problem in APOLLO 4000 seconds on fully dense randomly generated problems).

   n     m    # problems   our algorithm        HH        WIL
   50    70       10             6.84           8.53       9.55
  100   150       10            49.9           74.1       93.6
  200   250       10           180.3          412.5      400.1
  300   400       10           790.0         1544.0     1644.1
  400   500        5          1194.5         3038.9     3118.4
  500   550        5          1470.4         4778.5     4364.4
  600   800        3          5285.0        11927.0    13260.7

It is clear from the table that our algorithm is, on average, two to three times faster than the HH and Wilhelmsen algorithms. Moreover, the algorithm performs much better than HH when the cone is close to simplicial (when m is comparable to n), and only marginally better when m is much larger than n. This may be due to the fact that the computationally cheap two-ray sequence of projections (Step 4 of the algorithm) becomes less effective when m ≫ n.

The work in our algorithm consists of three types of steps: (i) the orthogonal projection of a point into a two dimensional subspace, (ii) the orthogonal projection of a point into a subspace of dimension > 2, and (iii) the reduction of the problem into one of lower dimension using a critical index. Between steps (i) and (ii), clearly (ii) is far more expensive than (i). In Table 2 we provide the total number of steps of types (i) and (ii) carried out on average per problem in our algorithm as the order of Q ranges from 50 × 70 to 600 × 800.

Table 2: Average number of type (i) and type (ii) projections per problem.

   n     m    # type (i) projections   # type (ii) projections
   50    70            52.8                      3.5
  100   150           116.4                      4.5
  200   250           177.6                      3.7
  300   400           303.4                      4.2
  400   500           351.6                      3.9
  500   550           357.2                      3.4
  600   800           587.0                      4.67

The number of steps of type (i) clearly grows as the order of Q increases, but the number of steps of type (ii) seems to be more or less constant (between 3 and 5). The good performance of our algorithm stems from the fact that the majority of the work in it involves steps of type (i), which are computationally cheap.

ACKNOWLEDGEMENTS

This work was partially supported by NSF Grant No. ECS-8521183 and by King Fahd University of Petroleum and Minerals.

7. REFERENCES

1. Chang, S.Y. and Murty, K.G. (1989) The Steepest Descent Gravitational Method for Linear Programming, Discrete Applied Mathematics, 25, 211-239.

2. Gilbert, E.G., Johnson, D.W., and Keerthi, S.S. (1988) A Fast Procedure for Computing the Distance Between Complex Objects in Three-Dimensional Space, IEEE Journal of Robotics and Automation, 4 (2), 193-203.

3. Golub, G.H. and Van Loan, C.F. (1989) Matrix Computations, Baltimore, MD, Johns Hopkins University Press.

4. Haskell, K. and Hanson, R. (1981) An Algorithm for Linear Least Squares Problems with Equality and Nonnegativity Constraints, Mathematical Programming, 21, 98-118.

5. Lawson, C.L. and Hanson, R.J. (1974) Solving Least Squares Problems, Englewood Cliffs, NJ, Prentice-Hall Inc.

6. Murty, K.G. (1988) Linear Complementarity, Linear and Nonlinear Programming, West Berlin, Heldermann Verlag.

7. Murty, K.G. and Fathi, Y. (1982) A Critical Index Algorithm for the Nearest Point Problem on Simplicial Cones, Mathematical Programming, 23, 206-215.

8. Wilhelmsen, D.R. (1976) A Nearest Point Algorithm for Convex Polyhedral Cones and Applications to Positive Linear Approximations, Mathematics of Computation, 30, 48-57.

9. Wolfe, P. (1974) Algorithm for a Least-Distance Programming Problem, Mathematical Programming Study, 1, 190-205.

10. Wolfe, P. (1976) Finding the Nearest Point in a Polytope, Mathematical Programming, 11, 128-149.