EXTERIOR POINT ALGORITHMS FOR NEAREST POINTS AND CONVEX QUADRATIC PROGRAMS*

K. S. Al-Sultan and K. G. Murty
Department of Industrial and Operations Engineering
The University of Michigan, Ann Arbor, MI 48109-2117, U.S.A.
and Systems Engineering Department
King Fahd University of Petroleum and Minerals, Dhahran-31261, Saudi Arabia

October 1989
Technical Report 89-31

*Partially supported by NSF grant No. ECS-8521183

Abstract

We consider the problem of finding the nearest point (by Euclidean distance) in a convex polyhedral cone to a given point, and develop an exterior penalty algorithm for it. Each iteration in the algorithm consists of a single Newton step followed by a reduction in the value of the penalty parameter. Proofs of convergence of the algorithm are given. Various other versions of exterior penalty algorithms for this problem and for convex quadratic programs, all based on a single descent step followed by a reduction in the value of the penalty parameter per iteration, are discussed. The performance of these algorithms in large scale computational experiments is very encouraging. It shows that the number of iterations grows very slowly, if at all, with the dimension of the problem.

Key words: Exterior penalty function methods, nearest point problems, convex quadratic programs, Newton step, SOR methods.

Abbreviated title: Exterior point algorithms

1. Introduction

We consider the nearest point problem, which is that of finding the nearest point (by Euclidean distance) in a convex polyhedral cone K ⊂ R^n to a point q ∈ R^n. It is a special case of a convex quadratic program with many important applications. In one class of applications in remote sensing, we need to find the composition of a mixture of various pure constituents. Signals are obtained from the mixture and from each of the pure constituents, and the problem of estimating the composition of the mixture from this data can be posed as a nearest point problem using the least squares formulation. Other applications of the nearest point problem, in constructing approximations of functions, have been discussed by D. R. Wilhelmsen [1976]. The nearest point problem also appears as a subproblem in each step of the gravitational method for solving linear programs (S. Y. Chang and K. G. Murty [1988]). It also has important applications to a variety of problems in robotics (E. G. Gilbert, D. W. Johnson and S. S. Keerthi [1988]). There is a linear complementarity problem (LCP) associated with the nearest point problem; it consists of the KKT conditions that provide the necessary and sufficient optimality conditions for it. The nearest point problem can be solved through the associated LCP using any of the available algorithms for solving LCPs, such as Lemke's complementary pivot algorithm or the principal pivoting methods (C. Lemke [1965], R. W. Cottle and G. B. Dantzig [1968], K. G. Murty [1988]), or the recently developed polynomial time interior point methods (M. Kojima [1989], M. Kojima, S. Mizuno and A. Yoshise [1989]). The ellipsoid algorithm of linear programming has been extended to solve the nearest point problem in polynomial time, but it is only of theoretical interest so far, as its computational performance is poor (S. J. Chung and K. G. Murty [1981]).
Special algorithms for the nearest point problem, based on its geometry, are discussed in E. G. Gilbert [1966], R. O. Barr and E. G. Gilbert [1969], K. G. Murty and Y. Fathi [1982], D. R. Wilhelmsen [1976], and P. Wolfe [1976]. The nearest point problem can also be solved by specializations of any of several descent methods or active set methods for linearly constrained nonlinear programs discussed in the nonlinear programming literature. Thus, a variety of methods are already available for solving the nearest point problem, and some of these are practically efficient. But the nearest point problem appears in many practical applications, often as a large scale problem. This has provided a continuing motivation for the development of new algorithms which can solve large scale problems faster, and it is also the motivation for the work reported in this paper. Our inspiration comes from recent reports of significant improvements in computational performance for solving large scale linear programs and convex quadratic programs through the use of barrier methods in novel ways. In this paper, we develop several penalty methods for the nearest point problem, and report on their encouraging computational performance.

If the dimension of K is n1 < n, let F be the subspace of R^n which is the linear hull of K, and q* the orthogonal projection of q onto F. Then the nearest point in K to q and to q* is the same, and the problem of finding the nearest point in K to q* can be carried out completely in the subspace F itself, with K a full dimensional cone in it. Thus, without any loss of generality, we assume that the cone K is of full dimension.

For any matrix D, we denote by D_i., D._j its ith row and jth column respectively. (x_j) denotes the vector whose jth component is x_j. If S, T are two sets, S\T denotes the set of all elements of S not in T. When x is a vector, ||x|| denotes its Euclidean norm. When A is a square matrix, ||A|| denotes its norm, which is max { ||Ax|| : x satisfying ||x|| = 1 }.

2. The Two Forms of Input Data for the Nearest Point Problem

There are two distinct ways in which the convex polyhedral cone K may be specified.

One is as the POS cone of a specified set of column vectors in R^n, that is, K = POS(Q) = { y : y = Σ_{j=1}^m λ_j Q._j, λ_j ≥ 0 for all j }, where Q is an n×m data matrix. As mentioned above, when K is given in this form, we assume that the dimension of K is n, i.e., that the rank of Q is n. In this form the data for the nearest point problem consists of q and Q.

The second form in which K may be given is as { x : Ax ≥ 0 }, where A is an m×n data matrix. Here also we will assume that the dimension of K is n. In this form, the data for the problem consists of q and A.

In practical applications of the nearest point problem, K may appear in either form. When K is given in one of these two forms, transforming it into the other form typically requires exponential effort. A major exception occurs when K is simplicial; in this case both Q and A are square and nonsingular, A = Q^{-1}, and we can pass from one form to the other with just one matrix inversion. We develop special versions of exterior penalty methods for the nearest point problem depending on the form in which K is given.

3. A Penalty Algorithm When K is Given in the POS Form

Let K = POS(Q), where Q is of order n×m and rank n. In this case, the nearest point problem is: find λ = (λ_1, ..., λ_m)^T to

    minimize ||q - Qλ||^2 subject to λ ≥ 0.

If λ* is an optimum solution of this problem, x* = Qλ* is the nearest point in K to q. It is well known that this problem always has an optimum solution, and that the nearest point in K to q exists and is unique. The penalty approach solves this problem through an unconstrained minimization problem in λ of the form

    minimize f(λ, μ) = ||q - Qλ||^2 + (1/μ) P(λ)  over λ ∈ R^m,

where P(λ) is a penalty function associated with the region { λ : λ ≥ 0 } and μ is a positive penalty parameter. The various methods differ in the choice of the penalty function P(λ). In this study we consider the penalty functions

    P_r(λ) = Σ_{j=1}^m (max { 0, -λ_j })^r,  for r = 2, 3, 4,

which result in the unconstrained minimizing functions, distinguished by a subscript,

    f_r(λ, μ) = ||q - Qλ||^2 + (1/μ) Σ_{j=1}^m (max { 0, -λ_j })^r.

f_2(λ, μ) is continuously differentiable in λ up to the first order. f_3(λ, μ) and f_4(λ, μ) are twice continuously differentiable in λ. All the functions f_2(λ, μ), f_3(λ, μ), f_4(λ, μ) are convex in the λ-space for fixed μ > 0. We denote the row vector of partial derivatives of f_r(λ, μ) with respect to λ by g_r(λ, μ) = ∇f_r(λ, μ), and the Hessian matrix of f_r(λ, μ) with respect to λ, for r = 3, 4, by H(f_r(λ, μ)). We have, for r = 2, 3, 4,

    g_r(λ, μ) = ∇f_r(λ, μ) = -2q^T Q + 2λ^T Q^T Q + (1/μ) P'_r(λ),

where P'_r(λ) = r(-1)^r (δ_1 λ_1^{r-1}, ..., δ_m λ_m^{r-1}), with δ_j defined by

    δ_j = 0 if λ_j ≥ 0,  δ_j = 1 if λ_j < 0,  for j = 1 to m.

Also, for r = 3, 4, we have

    H(f_r(λ, μ)) = 2Q^T Q + (1/μ) r(r-1) (-1)^r diag(δ_1 λ_1^{r-2}, ..., δ_m λ_m^{r-2}),

where for any a_1, ..., a_m, diag(a_1, ..., a_m) is the diagonal matrix of order m×m with entries in the principal diagonal equal to a_1, ..., a_m.

Line Search Routine for f_r(λ, μ)

Let λ ∈ R^m and suppose d = (d_1, ..., d_m)^T is a descent direction for f_r(λ, μ) at λ. Then

    d f_r(λ + αd, μ)/dα = (-2d^T Q^T q + 2d^T Q^T Q λ) + 2α d^T Q^T Q d + (1/μ) r(-1)^r Σ_{j=1}^m (λ_j + α d_j)^{r-1} d_j δ_j(α)     (1)

where δ_j(α) = 0 if λ_j + α d_j ≥ 0, and 1 if λ_j + α d_j < 0. So d f_r(λ + αd, μ)/dα is a sum of at most m + 2 terms. Determine the set of ratios { -λ_j/d_j : j such that either λ_j > 0, d_j < 0, or λ_j < 0, d_j > 0 }, and arrange them in increasing order. Suppose there are s distinct values in this set, α_1 < α_2 < ... < α_s. Define α_{s+1} = ∞. For any u = 1 to s, in the interval α_u ≤ α ≤ α_{u+1}, each of the terms on the right hand side of (1) is monotonic and has the same sign. Since d is a descent direction for f_r(λ, μ) at λ, d f_r(λ + αd, μ)/dα < 0 at α = 0, and as f_r(λ, μ) is convex and has a minimum, this derivative becomes 0 at some point in this direction. So, if we begin computing d f_r(λ + αd, μ)/dα for α = 0, α_1, α_2, ..., there will be a t such that it remains negative until we get to α_t, and at α_{t+1} it becomes ≥ 0. Then we know that the minimum of f_r(λ, μ) over the half-line { λ + αd : α ≥ 0 } is attained by some α in the interval α_t ≤ α ≤ α_{t+1}, and it can be found by an efficient search for the zero of d f_r(λ + αd, μ)/dα in this interval, as all the terms on the right hand side of (1) are monotonic and of the same sign. Thus, for the problem of minimizing f_r(λ + αd, μ) over α ≥ 0, when d is a descent direction for f_r(λ, μ) at λ, the optimum step length can be determined efficiently by the above method.

Selection of Search Directions

Given a fixed positive value for the penalty parameter μ, a descent method can be used for finding an unconstrained minimum of f_r(λ, μ) in λ. This method begins with an initial point λ = λ^0 and goes through several steps. If λ^0 is such that q = Qλ^0, then λ^0 is a global minimum of the problem when μ = ∞, and thus could be used as a starting solution for μ > 0 and large. In each step, a descent direction at the current point λ^k is generated, and a step is taken from λ^k in that direction. We get a variety of methods by varying the selection of search directions in each step. We basically consider two search directions.
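The breakpoint-scanning line search above can be sketched numerically for the case r = 2, where the directional derivative is piecewise linear in α, so the zero inside the sign-change interval is found exactly by linear interpolation. This is a NumPy sketch under our own naming; the paper itself only requires monotone terms per interval.

```python
import numpy as np

def f2_dir_derivative(Q, q, lam, d, mu, alpha):
    """phi'(alpha) for phi(alpha) = f2(lam + alpha*d, mu), r = 2."""
    x = lam + alpha * d
    # gradient of f2 at x: -2 Q^T (q - Q x) + (2/mu) min(x, 0)
    grad = -2.0 * Q.T @ (q - Q @ x) + (2.0 / mu) * np.minimum(x, 0.0)
    return float(grad @ d)

def exact_line_search_f2(Q, q, lam, d, mu):
    """Minimize f2(lam + alpha*d, mu) over alpha >= 0 by scanning the
    breakpoints -lam_j/d_j as in the Line Search Routine of Section 3."""
    phi = lambda a: f2_dir_derivative(Q, q, lam, d, mu, a)
    # breakpoints where a coordinate of lam + alpha*d crosses zero
    mask = ((lam > 0) & (d < 0)) | ((lam < 0) & (d > 0))
    bps = sorted(set((-lam[mask] / d[mask]).tolist()))
    pts = [0.0] + bps
    for a, b in zip(pts, pts[1:]):
        if phi(b) >= 0.0:            # sign change inside [a, b]
            ga, gb = phi(a), phi(b)
            return a if gb == ga else a - ga * (b - a) / (gb - ga)
    # last (unbounded) interval: phi' is linear with positive slope there
    a = pts[-1]
    ga, gb = phi(a), phi(a + 1.0)
    return a - ga / (gb - ga)
```

For instance, with Q = I, q = (1, 1), λ = (-1, -1), μ = 1 and the descent direction d = (6, 6), the routine crosses the single breakpoint at α = 1/6 and returns the exact minimizer α = 1/3, at which λ + αd = q.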

Steepest descent search direction: Under this rule, the search direction at the current point λ^k is -∇f_r(λ^k, μ), and the step length for the move in this direction is taken to be the optimum step length determined as discussed above. The method based on this strategy is the steepest descent version of the method.

Newton search direction: Under this rule, the search direction y at the current point λ^k is a solution of the system

    H(f_r(λ^k, μ)) y = -∇f_r(λ^k, μ)^T.     (2)

If K is simplicial (i.e., if m = n and Q is of order n×n, since we already assumed that Q has rank n), then H(f_r(λ^k, μ)) is positive definite and (2) always has a unique solution. If K is not simplicial (i.e., Q is of order n×m with m > n), then H(f_r(λ^k, μ)) is positive semidefinite and may be singular, and (2) may not have a solution. In this case we can use one of the standard modifications in the literature for defining the Newton search direction: we replace (2) by

    (H(f_r(λ^k, μ)) + τI) y = -∇f_r(λ^k, μ)^T     (3)

where τ > 0 and I is the unit matrix of order m. The matrix on the left hand side of (3) is positive definite, and hence (3) again has a unique solution, which is a descent direction for f_r(λ, μ) at the current point λ^k. When using the Newton search direction, we can take the step length for the move to be 1 in all the steps; this leads to the standard Newton method. Or, we can take the step length for the move to be the optimum step length determined as discussed above. The method based on this strategy is the Newton method with line searches.
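Systems (2) and (3) can be sketched for r = 3 as follows; the gradient and Hessian are those of (5) in Section 4, and the fallback to the regularized system (3) fires only when the factorization of H fails. The function name and the default τ are ours, not from the paper.

```python
import numpy as np

def newton_direction_f3(Q, q, lam, mu, tau=1e-8):
    """Solve H(f3) y = -grad(f3), i.e. system (2); fall back to the
    regularized system (3) when H is singular (non-simplicial case)."""
    delta = (lam < 0).astype(float)                  # delta_j as in (6)
    # gradient (as a column): -2 Q^T (q - Q lam) - (3/mu) delta_j lam_j^2
    g = -2.0 * Q.T @ (q - Q @ lam) - (3.0 / mu) * delta * lam**2
    # Hessian: 2 Q^T Q - (6/mu) diag(delta_j lam_j)
    H = 2.0 * Q.T @ Q - (6.0 / mu) * np.diag(delta * lam)
    try:
        return np.linalg.solve(H, -g)                # system (2)
    except np.linalg.LinAlgError:
        return np.linalg.solve(H + tau * np.eye(len(lam)), -g)   # system (3)
```

For Q = I, q = (1, -2), λ = (0.5, -0.5), μ = 1, the gradient is (-1, 2.25), the Hessian is diag(2, 5), and the direction works out to (0.5, -0.45).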

Termination

In practice we select a tolerance ε, positive and sufficiently small, and terminate the methods when the current point λ^k satisfies (1/μ) P_r(λ^k) ≤ ε, or if λ^k ≥ -εe, where e is the column vector of all 1's in R^m. At that stage, the point Qλ^k is accepted as the nearest point in K to q.

The Classical Exterior Penalty Method

This method, discussed extensively in the nonlinear programming literature, goes through many iterations. In each iteration, the value of the penalty parameter μ is fixed, and an unconstrained minimization algorithm is used to find the global minimum of f_r(λ, μ) over λ ∈ R^m. This algorithm itself may take several descent steps in each iteration. A commonly used termination criterion for this unconstrained minimization is

    ||g_r(λ, μ)|| ≤ ε*     (4)

where ε* is a positive and sufficiently small tolerance. Then the value of the penalty parameter μ is decreased, and the method moves to the next iteration. It has been shown (see, for example, A. V. Fiacco and G. P. McCormick [1968], K. Truemper [1975]) that this method converges to the optimum solution of the original problem. If λ(μ) denotes the unconstrained minimum of f_r(λ, μ) over λ, as a function of μ, then λ(μ) converges to λ*, an optimum λ for the original problem, as μ → 0. In fact, as shown in K. Truemper [1975], it is possible to select a target value μ* > 0 for the penalty parameter, such that when μ decreases to this value or becomes smaller, λ(μ) is within a specified tolerance of λ*.

The New Exterior Penalty Method

We will now describe an algorithm based on the ideas of the classical penalty method, but differing from it in one important aspect. In this algorithm we do not find the unconstrained minimum of f_r(λ, μ) for each fixed μ. Instead, we carry out only one descent step of the unconstrained minimization algorithm, then reduce the value of the penalty parameter μ, and repeat in the same way. One can visualize a path in the λ-space, parametrized by μ, { λ : ∇f_r(λ, μ) = 0, μ > 0 }, called the exterior path, and an envelope containing this path defined by { λ : ||∇f_r(λ, μ)|| ≤ ε, μ > 0 } for a certain ε > 0. We will later prove that the points obtained in this new algorithm can be interpreted as a sequence of points in this envelope converging to the limit of the point on the exterior path as μ tends to 0.

Basic algorithm

Initialization: Let ε be a prespecified accuracy, and p a multiplier satisfying 0 < p < 1. Select μ^0 > 0 and large, select λ^0 satisfying Qλ^0 = q, and let k = 0.

Main step:
1. λ^k is the current point. Find the direction d^k at λ^k (either the steepest descent direction or Newton's direction).
2. Select the step length α_k. This is 1 for the standard Newton method, or the optimum step length for the other methods.
3. Let λ^{k+1} = λ^k + α_k d^k. If λ^{k+1} ≥ -εe, stop. Otherwise, let μ^{k+1} = p μ^k, set k = k + 1, and return to 1.
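The basic algorithm can be sketched for the standard Newton version with f_2 in the simplicial case (Q square and nonsingular). This is our own minimal rendering: the multiplier p, the tolerance ε and the loop cap are illustrative defaults to be tuned, and μ^0 = 0.01 is the value reported to work well in Section 7.

```python
import numpy as np

def exterior_penalty_newton(Q, q, p=0.5, eps=1e-8, max_iter=200):
    """One Newton step on f2 per iteration, then mu <- p*mu, as in the
    Basic algorithm (simplicial case: Q square, nonsingular)."""
    lam = np.linalg.solve(Q, q)          # lam0 with Q lam0 = q
    if np.all(lam >= 0):                 # q already lies in POS(Q)
        return lam
    mu = 0.01                            # mu0 as reported in Section 7
    for _ in range(max_iter):
        neg = np.minimum(lam, 0.0)
        # gradient and Hessian of f2(lam, mu)
        g = -2.0 * Q.T @ (q - Q @ lam) + (2.0 / mu) * neg
        H = 2.0 * Q.T @ Q + (2.0 / mu) * np.diag((lam < 0).astype(float))
        lam = lam + np.linalg.solve(H, -g)   # single Newton step, length 1
        if np.all(lam >= -eps):              # stop: lam >= -eps * e
            break
        mu *= p
    return lam
```

On the toy instance Q = I, q = (-1, 2), whose nearest point in POS(I) is (0, 2), the iterates drive the negative component toward zero as μ shrinks, terminating near λ = (0, 2).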

4. Theoretical Results on Convergence of the Algorithm

Classical penalty methods are known to converge to an optimum solution of the original problem. Proofs can be found in the nonlinear programming literature (for example, D. G. Luenberger [1984], M. S. Bazaraa and C. M. Shetty [1979], A. V. Fiacco and G. P. McCormick [1968], K. Truemper [1975], J. P. Evans, F. J. Gould and J. W. Tolle [1973], T. Pietrzykowski [1968]). However, these proofs assume that an actual optimum solution of the problem of minimizing f_r(λ, μ) over λ ∈ R^m is obtained for each fixed value of μ in a sequence converging to zero. We do not really obtain the unconstrained minimum of f_r(λ, μ) over λ ∈ R^m for any value of μ before we change it. Hence, standard proofs for penalty methods do not apply directly to our algorithm. In this section we give a convergence proof for the standard Newton version of our algorithm in the simplicial case, that is, when Q is a nonsingular square matrix. So m = n, and the algorithm considered is the one-step Newton method (step length = 1) attended by a decrease in the penalty parameter μ after every step. We restrict our attention to the case r = 3, i.e., the unconstrained minimizing function is f_3(λ, μ), which we denote by f(λ, μ) for the sake of simplicity. We denote by g(λ, μ), H(λ, μ) the gradient vector (as a row) and the Hessian matrix of f_3(λ, μ) with respect to λ. So,

    g(λ, μ) = -2(q - Qλ)^T Q - (3/μ)(δ_1 λ_1^2, ..., δ_n λ_n^2)     (5)
    H(λ, μ) = 2Q^T Q - (6/μ) diag(δ_1 λ_1, ..., δ_n λ_n)

where δ = (δ_1, ..., δ_n) is a function of λ defined by: for j = 1 to n,

    δ_j = 0 if λ_j ≥ 0,  δ_j = 1 if λ_j < 0.     (6)

THEOREM 1: f(λ, μ) is a strictly convex function in λ ∈ R^n, for all μ > 0.

PROOF: Since Q is nonsingular, Q^T Q is PD (positive definite). Also, from (6), we verify that -(6/μ) diag(δ_1 λ_1, ..., δ_n λ_n) is a positive semidefinite matrix for every μ > 0 and λ ∈ R^n. So H(λ, μ) is PD for every μ > 0 and λ ∈ R^n. Hence f(λ, μ) is a strictly convex function in λ ∈ R^n, for every μ > 0. ∎

THEOREM 2: For each μ > 0, the system

    g(λ, μ) = 0     (7)

has a unique solution in λ, say λ(μ). λ(μ) is a continuous function of μ in { μ : μ > 0 }. Also, as μ → 0 through positive values, ||λ(μ)|| remains bounded above by a constant which depends only on q and Q.

PROOF: Fix μ > 0. Select any point x from K = POS(Q). Let φ = ||q - x||, S = { y : ||q - y|| ≤ φ }, Γ = { λ : λ = Q^{-1} y, y ∈ S }. Γ is a compact set, and as f(λ, μ) is a continuous function in λ, there is a point λ̄ ∈ Γ where f(λ, μ) attains its minimum over Γ. For all y ∉ S, ||q - y|| > φ, and hence for all λ ∉ Γ, ||q - Qλ||^2 > φ^2, which implies that f(λ, μ) > φ^2, since P_3(λ) ≥ 0 for all λ. Let λ̃ = Q^{-1} x. As x ∈ K we have λ̃ ≥ 0, thus P_3(λ̃) = 0, and f(λ̃, μ) = φ^2. Hence for all λ ∉ Γ we have f(λ, μ) > φ^2, and for at least one λ ∈ Γ we have f(λ, μ) ≤ φ^2. This implies that λ̄, the minimizer of f(λ, μ) over Γ, is its global minimum. Hence f(λ, μ) attains its minimum in λ, and since it is strictly convex in λ by Theorem 1, it has a unique global minimum, which is attained at the solution of g(λ, μ) = 0. Hence the system g(λ, μ) = 0 has a unique solution, λ(μ) say.

H(λ, μ) is nonsingular for all λ whenever μ > 0. Hence, by the implicit function theorem, λ(μ) is continuous in μ in the region { μ : μ > 0 }. From penalty function theory we know that λ(μ) → λ*, where λ* = Q^{-1} x*, x* being the nearest point in POS(Q) to q. This and the continuity of λ(μ) imply that ||λ(μ)|| remains bounded above by a constant as μ → 0 through positive values, where this constant depends on ||λ*||, which itself depends only on q and Q. ∎

From Theorem 2 we know that { λ(μ) : μ > 0 } is a path in the λ-space, called the exterior path. The set { λ : ||g(λ, μ)|| ≤ ε, μ > 0 } for some ε > 0 defines an envelope containing the exterior path. We will now show that it is possible to interpret the points obtained in our algorithm as a sequence of points in this envelope leading to λ* as μ → 0. For μ > 0, ε > 0, define

    Ω(μ, ε) = { λ : ||g(λ, μ)|| ≤ ε }.     (8)

All the eigenvalues of Q^T Q are real and positive since it is symmetric and PD. Let σ_1, σ_n be the smallest and largest eigenvalues, respectively, of the symmetric PD matrix Q^T Q. Hence σ_1 > 0.

THEOREM 3: For μ > 0 and every λ ∈ R^n, the smallest eigenvalue of H(λ, μ) is ≥ 2σ_1. Hence f(λ, μ) is a strongly convex function in λ for all μ > 0.

PROOF: Let σ_s be the smallest eigenvalue of H(λ, μ). Then

    σ_s = min { z^T H(λ, μ) z : ||z|| = 1 }.     (9)

From (5), we have

    z^T H(λ, μ) z = 2 z^T Q^T Q z - (6/μ) z^T diag(δ_1 λ_1, ..., δ_n λ_n) z ≥ 2 z^T Q^T Q z

from the definition of the δ_j in (6). Hence, from (9), σ_s ≥ min { 2 z^T Q^T Q z : ||z|| = 1 } = 2σ_1. This automatically implies that f(λ, μ) is strongly convex in λ for every μ > 0. ∎

THEOREM 4: The set Ω(μ, ε) is bounded for all ε > 0, whenever μ > 0.

PROOF: Given μ > 0, λ(μ) is the solution of g(λ, μ) = 0 in λ. In Theorem 3 we established that f(λ, μ) is a strongly convex function in λ. Hence, there exists a constant γ > 0 such that for all λ, λ',

    ||g(λ, μ) - g(λ', μ)|| ≥ γ ||λ - λ'||.     (10)

See B. T. Polyak [1987]. Substituting λ' = λ(μ) in (10) leads to

    ||g(λ, μ)|| ≥ γ ||λ - λ(μ)||     (11)

for all λ, since g(λ(μ), μ) = 0. From (11) we conclude that if λ ∈ Ω(μ, ε), then

    ||λ - λ(μ)|| ≤ ε/γ.     (12)

The fact that λ(μ) is bounded, and (12), together imply the result in this theorem. ∎

We will now investigate what happens in one iteration of our algorithm.

THEOREM 5: Let μ, λ be the penalty parameter and the vector, respectively, at the end of one iteration, and let μ̄ = pμ be the value of the penalty parameter for the next iteration. If ε > 0 is such that ||g(λ, μ)|| ≤ ε, then ||g(λ, μ̄)|| ≤ ((1-p)/p) β + ε/p, where β is a constant.

PROOF: g(λ, μ) = a + (1/μ) b, where a = -2(q - Qλ)^T Q and b = -3(δ_1 λ_1^2, ..., δ_n λ_n^2). So

    ||g(λ, μ̄)|| = ||a + (1/(pμ)) b||
                = ||((p-1)/p) a + (1/p)(a + (1/μ) b)||
                ≤ ((1-p)/p) ||a|| + (1/p) ||a + (1/μ) b||
                ≤ ((1-p)/p) ||a|| + ε/p.     (13)

Now ||a|| = ||-2(q - Qλ)^T Q|| ≤ 2||q^T Q|| + 2||Q^T Q|| ||λ||

≤ β, where β is some positive constant, since ||λ|| is bounded on Ω(μ, ε) by Theorem 4. Hence, from (13), we have ||g(λ, μ̄)|| ≤ ((1-p)/p) β + ε/p. ∎

LEMMA 1: ||(H(λ, μ))^{-1}|| ≤ 1/(2σ_1), where σ_1 is the smallest eigenvalue of Q^T Q.

PROOF: This follows from the arguments in the proof of Theorem 3. ∎

THEOREM 6: Let μ, λ be the penalty parameter and the vector at the end of one iteration. In the next iteration, set the penalty parameter to μ̄ = pμ, and carry out a single Newton step leading to the new vector λ̄. Then ||g(λ̄, μ̄)|| ≤ (α/μ̄) ||g(λ, μ̄)||^2, where α = 3/(4σ_1^2).

PROOF: Define B = -(3/μ̄) diag(δ_1 λ_1, ..., δ_n λ_n). So

    H = H(λ, μ̄) = 2Q^T Q + 2B     (14)
    g = g(λ, μ̄) = -2q^T Q + 2λ^T Q^T Q - (3/μ̄)(δ_1 λ_1^2, ..., δ_n λ_n^2)

From the Newton step we know that λ̄ = λ - H^{-1} g(λ, μ̄) = λ - H^{-1} g. Hence

    g(λ̄, μ̄) = -2q^T Q + 2λ̄^T Q^T Q - (3/μ̄)(δ_1 λ̄_1^2, ..., δ_n λ̄_n^2)
             = -2q^T Q + 2(λ - H^{-1}g)^T Q^T Q - (3/μ̄)(δ_i (λ_i - (H^{-1})_i. g)^2)
             = -2q^T Q + (λ - H^{-1}g)^T (2Q^T Q + 2B - 2B)
               - (3/μ̄)(δ_i (λ_i^2 - 2λ_i (H^{-1})_i. g + ((H^{-1})_i. g)^2))     (15)
             = -2q^T Q + 2λ^T Q^T Q - g^T H^{-1}(H - 2B)
               - (3/μ̄)(δ_i (λ_i^2 - 2λ_i (H^{-1})_i. g + ((H^{-1})_i. g)^2))
             = -2q^T Q + 2λ^T Q^T Q - g^T + 2g^T H^{-1} B - (3/μ̄)(δ_i λ_i^2)
               + (6/μ̄)(δ_i λ_i (H^{-1})_i. g) - (3/μ̄)(δ_i ((H^{-1})_i. g)^2)
             = -(3/μ̄)(δ_i ((H^{-1})_i. g)^2)

by substituting for g^T from (14) and cancelling terms (the 2g^T H^{-1} B term equals -(6/μ̄)(δ_i λ_i (H^{-1})_i. g)).

So,

    ||g(λ̄, μ̄)||^2 = (9/μ̄^2) Σ_{i=1}^n δ_i ((H^{-1})_i. g)^4
                   ≤ (9/μ̄^2) ( Σ_{i=1}^n ((H^{-1})_i. g)^2 )^2
                   = (9/μ̄^2) ||H^{-1} g||^4.

Hence

    ||g(λ̄, μ̄)|| ≤ (3/μ̄) ||H^{-1} g||^2 ≤ (3/μ̄) ||H^{-1}||^2 ||g||^2 ≤ (3/(4σ_1^2)) ||g||^2 / μ̄,  by Lemma 1.

That is, ||g(λ̄, μ̄)|| ≤ (α/μ̄) ||g(λ, μ̄)||^2 with α = 3/(4σ_1^2). ∎

Now we select a sufficiently small target value μ̂ for the penalty parameter and a tolerance ε̂, and show that the algorithm converges to a point λ satisfying (4), i.e., ||g(λ, μ*)|| ≤ ε*, for some μ* ≤ μ̂, ε* ≤ ε̂. By the results on the classical penalty method mentioned above, this will imply that the point λ at termination is within the specified tolerance of λ*, the optimum λ. Define

    α* = max { 27/(2ε̂), 3/(4μ̂ σ_1^2) }     (16)

    μ* = 3/(4α* σ_1^2),  ε* = 1/(2α*).

Therefore μ* ≤ μ̂ and ε* ≤ ε̂. Since μ̂, ε̂ are small, α* is a large positive number. Let β* > 1 be a large positive number which is an upper bound for ||λ|| over the set Ω(μ*, ε*), guaranteed by Theorem 4. Select

    p = 1 - 1/(6α* β*).     (17)

We will now denote by μ^k, λ^k the penalty parameter and the λ-vector at the end of iteration k in the algorithm, for k = 1, 2, .... We have μ^{k+1} = p μ^k < μ^k for all k, and we will show that when k is such that μ^k ≥ μ*, then ||g(λ^k, μ^k)|| ≤ ε* will hold.

LEMMA 2: If the algorithm did not terminate in iteration k, then we have

    ||g(λ^{k+1}, μ^{k+1})|| ≤ α* ||g(λ^k, μ^{k+1})||^2.

PROOF: Since the algorithm did not terminate, we have μ^{k+1} ≥ μ*, and this lemma follows from Theorem 6. ∎

THEOREM 7: If ||g(λ^k, μ^k)|| ≤ ε*, then ||g(λ^{k+1}, μ^{k+1})|| ≤ ε* too.

PROOF: From Lemma 2 and Theorem 5, we have

    ||g(λ^{k+1}, μ^{k+1})|| ≤ α* ||g(λ^k, μ^{k+1})||^2
                            ≤ α* ( ((1-p)/p) β* + ε*/p )^2
                            = (α*/p^2) ( (1-p)^2 (β*)^2 + 2ε* β* (1-p) + (ε*)^2 )     (18)

by substituting for 1 - p from (17). From the choice of p it is clear that the right hand side of (18) is ≤ ε*, proving the theorem. ∎

How to initiate the algorithm?

If Q^{-1} q ≥ 0, then q ∈ POS(Q), and hence q is itself the nearest point in POS(Q) to q; terminate. So, assume Q^{-1} q ≱ 0. We will now show how to select an initial penalty parameter μ^0 and point λ^0 so that ||g(λ^0, μ^0)|| = ε*. Select

    λ^0 = Q^{-1} q
    μ^0 = (3/ε*) √( Σ { (λ_j^0)^4 : over j such that λ_j^0 < 0 } ).     (19)

Then it can be verified that ||g(λ^0, μ^0)|| = ε* (since Qλ^0 = q, only the penalty term in (5) contributes to the gradient). When initiated with λ^0, μ^0 as given in (19), with p selected as in (17), it is clear that the method terminates after at most (log μ^0 - log μ*)/(-log p) iterations, with a λ close to the optimum point λ*.
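The initialization rule and the iteration bound can be checked numerically. The sketch below uses our own function names and assumes the reconstruction μ^0 = (3/ε*) √(Σ (λ_j^0)^4 over negative components), under which the gradient at λ^0 has norm exactly ε*.

```python
import math
import numpy as np

def initial_parameters(Q, q, eps_star):
    """lam0 = Q^{-1} q and mu0 chosen so that ||g(lam0, mu0)|| = eps*,
    a sketch of (19) for the simplicial f3 case."""
    lam0 = np.linalg.solve(Q, q)
    s = math.sqrt(float(np.sum(np.minimum(lam0, 0.0) ** 4)))
    mu0 = 3.0 * s / eps_star
    return lam0, mu0

def iteration_bound(mu0, mu_star, p):
    """Upper bound (log mu0 - log mu*) / (-log p) on the iteration count."""
    return math.ceil((math.log(mu0) - math.log(mu_star)) / (-math.log(p)))
```

For Q = I and q = (-1, 2) with ε* = 1, this gives λ^0 = (-1, 2) and μ^0 = 3, and the gradient -(3/μ^0)(δ_j (λ_j^0)^2) indeed has norm 1.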

5. The Exterior Penalty Method for the Nearest Point Problem When K is Specified by Homogeneous Linear Inequalities

Let K = { x : Ax ≥ 0 }, where A = (a_ij) is of order m×n. We assume that K is of full dimension in R^n. In this case the nearest point problem is: find x ∈ R^n to

    minimize ||q - x||^2 subject to Ax ≥ 0.

In the penalty approach we solve this problem through the unconstrained minimization problem of the form

    minimize h_r(x, μ) = ||q - x||^2 + (1/μ) Σ_{i=1}^m (max { 0, -A_i. x })^r  over x ∈ R^n

for r = 2, 3, or 4, where μ is a positive penalty parameter as before. For μ > 0, h_r(x, μ) is convex in x for all these values of r. For r = 2, 3, 4, the gradient vector (as a row) of h_r(x, μ) with respect to x is

    ∇h_r(x, μ) = -2(q - x)^T - (r/μ) Σ_{i=1}^m ν_i (max { 0, -A_i. x })^{r-1} A_i.

where ν_i = 0 if A_i. x ≥ 0, and ν_i = 1 if A_i. x < 0,     (20)

for i = 1 to m. Also, for r = 3, 4, we have H(h_r(x, μ)), the Hessian matrix of h_r(x, μ) with respect to x,

    H(h_r(x, μ)) = 2I + E,

where I is the unit matrix of order n, and E = (e_ij) of order n×n is given by

    e_ij = (r(r-1)/μ) Σ_{t=1}^m ν_t (max { 0, -A_t. x })^{r-2} a_ti a_tj

for i, j = 1 to n, with ν_t as defined in (20). Using these, the new exterior penalty algorithm for this problem, based on a single descent step followed by a decrease in the penalty parameter per iteration, is constructed in a way similar to that discussed in Section 3.

6. Extension to Convex Quadratic Programs

Consider the convex quadratic program:

    minimize z(x) = cx + (1/2) x^T D x
    subject to Ax ≥ b, x ≥ 0,

where D is a symmetric positive semidefinite matrix and A is of order m×n. In the penalty approach we solve this problem through the unconstrained minimization problem of the form

    minimize w_r(x, μ) = cx + (1/2) x^T D x + (1/μ) Σ_{i=1}^m (max { 0, b_i - A_i. x })^r + (1/μ) Σ_{j=1}^n (max { 0, -x_j })^r  over x ∈ R^n.

Using the function w_r(x, μ), versions of the new exterior penalty algorithms for this problem are constructed exactly as before.

7. Results from Computational Experiments

So far, we have conducted a computational experiment with the new exterior penalty algorithm described in Section 3, for solving randomly generated nearest point problems in simplicial cones of dimension n ranging from 10 to 700.

DATA GENERATION: Results are reported for problems in which q and Q are generated randomly as follows. Each element of q is generated from the uniform distribution between -5 and +5, of double word length (8 bytes). Likewise, the elements of Q are generated from the uniform distribution between -20 and +20, of the same word length. No nonsingularity check was ever carried out on any of the matrices Q generated, but the system of equations (2) always had a solution in the methods which used it to generate descent directions. The vector q and the matrix Q are fully dense in all the problems generated. We have also solved various other problems in which q and Q are randomly generated according to different distributions, but the performance of the algorithms turned out to be the same.
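The data generation scheme above can be sketched as follows; the seed argument is our addition for reproducibility and is not part of the original protocol.

```python
import numpy as np

def generate_problem(n, seed=0):
    """Random dense test instance as in Section 7: q uniform on [-5, 5],
    Q uniform on [-20, 20], both fully dense, double precision."""
    rng = np.random.default_rng(seed)
    q = rng.uniform(-5.0, 5.0, size=n)
    Q = rng.uniform(-20.0, 20.0, size=(n, n))
    return Q, q
```

A random dense matrix drawn this way is nonsingular with probability one, which is consistent with the report that no nonsingularity check was ever needed.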

Implementation Details

Experiments were carried out using the following versions of the algorithm.

VERSION 1 (Newton direction with step length 1): In this version, exactly one descent step is carried out, followed by a decrease in the penalty parameter, in each iteration. f_3(λ, μ) and f_4(λ, μ) both have second derivatives at all λ ∈ R^n, but f_2(λ, μ) has second derivatives only at points in which all components are nonzero. It has been observed that all the λ^k generated in the algorithm tended to have all components nonzero, even though some of the components were clearly converging to zero (as explained in D. G. Luenberger [1984], this might be due to the fact that exterior methods approach the optimum from outside the feasible region). So, we implemented this version also with f_2(λ, μ), using the formula for the Hessian given in Section 3, and the method never encountered any problems in thousands of runs in early experiments.

VERSION 2 (Newton direction with line searches): Unlike Version 1, we implemented this version only using f_2(λ, μ), because of the simplicity of the line search routine for r = 2. Again, only one descent step was carried out per iteration.

VERSION 3 (Steepest descent strategy): We implemented this version using f_2(λ, μ) and also f_1(λ, μ). Although f_1(λ, μ) is not differentiable at points λ with some components zero, the previous discussion on second order differentiability applies here. Unfortunately, the version based on f_1(λ, μ) did not converge in most cases. A close look at the gradient reveals that this divergence is to be expected, since the gradient does not carry much information about the negative λ_j's.

In this version we also experimented with carrying out several descent steps per iteration (i.e., several descent steps between changes in the value of the penalty parameter).

Termination Criteria

For all versions, the following criteria have been tried as signals for convergence in early experimentation. Small positive tolerances ε_1, ε_2 are selected, and the algorithm is terminated in iteration k if the penalty parameter μ^k and the vector λ^k satisfy both of the following conditions:

    i) λ_j^k ≥ -ε_1, for all j = 1 to n
    ii) (1/μ^k) P_r(λ^k) ≤ ε_2

Condition (i) relates to "near feasibility" (within the tolerance ε_1). Both conditions have been used in early experiments, and it always turned out that when one held, the other held too in the same iteration (depending on the choice of suitable values for ε_1 and ε_2). Hence, in later tests, we used condition (i) as the sole termination criterion. In Figure 1 we provide the plot of Σ ( |λ_j| : over j such that λ_j < 0 ), which is a measure of the infeasibility of the current λ-vector, in each iteration of the standard Newton version based on f_2(λ, μ), for a problem of dimension n = 40. It can be seen that this infeasibility measure drops very sharply and becomes almost zero in about 6 iterations.
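The two stopping conditions can be sketched as a small check routine (the function name is ours):

```python
import numpy as np

def terminated(lam, mu, eps1, eps2, r=2):
    """Evaluate the two stopping conditions from Section 7:
    (i) near feasibility: lam_j >= -eps1 for all j;
    (ii) small scaled penalty: (1/mu) P_r(lam) <= eps2."""
    neg = np.maximum(-lam, 0.0)                     # violation magnitudes
    near_feasible = bool(np.all(lam >= -eps1))
    small_penalty = float(np.sum(neg ** r)) / mu <= eps2
    return near_feasible, small_penalty
```

In the experiments reported above the two signals almost always fired in the same iteration, which is why condition (i) alone was used in later tests.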

[Figure 1. Infeasibility versus iteration number, for the standard Newton version based on f_2(λ, μ) on a problem with n = 40; the infeasibility measure drops to nearly zero in about 6 iterations.]

Updating the Penalty Parameter

For the algorithm to converge, we observed that the penalty parameter μ must reach a sufficiently small value (like 10^{-12}; this depends on the desired accuracy ε_1). The penalty parameter is updated using the formula μ^{k+1} = p μ^k. Clearly, the smaller the value of p, the more rapid the decrease in μ. However, if p is taken too small, the unconstrained minimum of f_r(λ, μ^{k+1}) tends to be far away from the unconstrained minimum of f_r(λ, μ^k), thus ruining the chance of gaining an advantage from using steps of Newton's method, which is only locally convergent. So, we experimented with different values of p, and provide the best observed values of p in the tables. A lot of experimentation has also been done to determine the best value of μ^0, the initial penalty parameter. The smaller the value of μ^0, the smaller the number of iterations needed to drive μ to its terminal value. However, since the starting point λ^0 = Q^{-1} q is the unconstrained minimum of f_r(λ, μ) only at μ = ∞, choosing a small value for μ^0 puts the unconstrained minimum of f_r(λ, μ^0) far away from λ^0, resulting in slow convergence. In most experiments, selecting μ^0 = 0.01 yielded excellent results.

Solving (2) to Find Newton Directions

Several strategies have been tried for this. One is to solve the system directly using matrix factorizations. This provides accurate solutions but is expensive in computer time when the dimension n is large (particularly because all our problems are fully dense). Iterative methods based on preconditioned conjugate gradients turned out to be even more expensive computationally than direct matrix factorization methods. We also tried the iterative successive overrelaxation (SOR) method, which turned out to be the best among all the methods we tried. In fact, approximately √n iterations of these SOR methods, each with O(n^2) effort, yielded results comparable in accuracy to those obtained with O(n^3) effort in the other methods.
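A generic SOR sweep for the symmetric positive definite system H y = -g can be sketched as follows. This is not the authors' FORTRAN implementation; the relaxation parameter ω and the sweep count are illustrative (the paper notes the best ω* grows with n).

```python
import numpy as np

def sor_solve(H, rhs, omega=1.2, sweeps=50):
    """Successive overrelaxation sweeps for H y = rhs, with H symmetric
    positive definite; converges for 0 < omega < 2 in that case."""
    n = len(rhs)
    y = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            # sum of off-diagonal terms of row i, using updated entries
            sigma = H[i] @ y - H[i, i] * y[i]
            y[i] = (1.0 - omega) * y[i] + omega * (rhs[i] - sigma) / H[i, i]
    return y
```

Each sweep costs O(n^2), so stopping after a small number of sweeps gives an inexact Newton direction at far less cost than an O(n^3) factorization, which is the trade-off exploited above.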
The best value for the relaxation parameter, ω*, seems to increase with n (Table 2). However, a lot more work remains to be done to obtain a good implementation of this aspect of the algorithm. Since almost all the work in algorithms based on Newton directions goes into solving system (2) in various steps, optimizing this aspect is very important.

The algorithms were coded in FORTRAN 77 using double precision arithmetic and run on an IBM 3033 or on an APOLLO Series 4000.

Summary of Computational Results: Number of Iterations

NEWTON BASED METHODS: In all versions based on Newton directions, it can be seen from Table 1 that the number of iterations grows extremely slowly, if at all, with the problem dimension. As mentioned earlier, there is tremendous scope for improving the computer time taken for solving (2) in each iteration; hence the number of iterations is a more reliable guide to algorithm performance than the computer time. This number is about 6 for version 1 (constant step length) and marginally less for version 2 (Newton with line searches), almost independent of the problem dimension. Even with our current tentative equation-solving routines for direction finding, it is clear that these algorithms are superior to implementations of other existing algorithms. The almost constant number of iterations was about 6 for versions based on f2(λ, μ), 38 for versions based on f3(λ, μ), and 70 for versions based on f4(λ, μ). The main reason for this may be the fact that f2(λ, μ) is very nearly quadratic in λ, making it well suited for Newton's method. Since f2(λ, μ) is almost quadratic, one Newton step almost always leads very close to the unconstrained minimum of this function in each iteration, thus enabling a much faster reduction in the value of μ. The barrier terms based on logarithmic functions employed by interior point methods do not share this nice property.
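A minimal sketch of an SOR sweep for a symmetric positive definite system such as (2) follows; the routine name and defaults are ours, and in practice the relaxation parameter ω* and the number of sweeps (roughly √n, per Table 2) must be tuned by experimentation.

```python
import numpy as np

def sor_solve(A, b, omega=1.3, sweeps=None, x0=None):
    """Successive overrelaxation for A x = b, with A symmetric positive definite.

    omega plays the role of the relaxation parameter omega*; each sweep
    costs O(n^2), and roughly sqrt(n) sweeps sufficed in the experiments
    described above.
    """
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    if sweeps is None:
        sweeps = max(1, int(round(np.sqrt(n))))
    for _ in range(sweeps):
        for i in range(n):
            # Gauss-Seidel value for x[i], using already-updated components
            sigma = A[i] @ x - A[i, i] * x[i]
            gs = (b[i] - sigma) / A[i, i]
            x[i] = (1.0 - omega) * x[i] + omega * gs   # overrelaxed update
    return x
```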

We compared the performance of our algorithms with M. J. D. Powell's implementation of A. Idnani and D. Goldfarb's dual quadratic programming algorithm (available through IMSL as subroutine QPROG), with K. Haskell and R. Hanson's [1981] routine for linearly constrained least squares, available through ACM algorithm 587, and with our implementation of D. R. Wilhelmsen's nearest point algorithm. Our algorithm was superior to each of these, but we display comparative figures only for K. Haskell and R. Hanson's code and QPROG, to conserve space (comparative timings for the other algorithm can be obtained from the authors).

We found that the total number of iterations depends critically on the value of ρ used. We also found that once a λ_j becomes > 0, it remains > 0 in almost all subsequent iterations. This may explain the excellent performance of the algorithm.

STEEPEST DESCENT BASED METHODS: We found that it is better to make many descent moves between consecutive updates of the penalty parameter. In each iteration we continue making descent moves as long as there is significant change in the solution vector λ. When this change becomes small in the L∞ norm, we update μ and go to the next iteration. There is the possibility of using conjugate gradient based moves rather than steepest descent moves in each iteration. Implementing conjugate gradient moves may be tricky as μ becomes smaller, hence this has not been tested yet. In comparing these methods with those based on Newton directions, one should bear in mind that each move in these methods is computationally cheaper, as there is no equation solving involved. These methods are more sensitive than Newton based methods to the strategy for updating the penalty parameter. In general the rate of reduction of μ has to be slower, more so as n increases.
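This inner/outer loop structure might be sketched as follows, again using the assumed quadratic penalty from the earlier sketch (not necessarily the report's f2) and an Armijo backtracking line search of our own choosing; the names and tolerances are illustrative.

```python
import numpy as np

def penalty_descent(Q, q, mu0=0.01, rho=0.5, mu_min=1e-6, tol=1e-8):
    """Many steepest descent moves per value of mu; the inner loop stops
    when the change in lam is small in the L-infinity norm, then mu shrinks.
    The penalty form and the backtracking rule are illustrative assumptions."""
    def grad(lam, mu):
        return Q @ lam - q + (2.0 / mu) * np.minimum(lam, 0.0)

    def f(lam, mu):
        neg = np.minimum(lam, 0.0)
        return 0.5 * lam @ Q @ lam - q @ lam + (neg @ neg) / mu

    lam = np.linalg.solve(Q, q)
    mu = mu0
    while mu > mu_min:
        while True:
            g = grad(lam, mu)
            t = 1.0
            # Armijo backtracking: halve t until sufficient decrease holds
            while f(lam - t * g, mu) > f(lam, mu) - 0.5 * t * (g @ g):
                t *= 0.5
            new = lam - t * g
            done = np.max(np.abs(new - lam)) < tol   # L-infinity stopping rule
            lam = new
            if done:
                break
        mu *= rho   # note: mu is reduced more slowly than in the Newton versions
    return np.maximum(lam, 0.0)
```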

The number of descent moves per iteration was almost constant at 15, independent of the problem dimension. The number of iterations itself averaged around 37 in problems with n up to 250. From Tables 1, 2, and 3 we see that our current implementation of this version does not compare favorably with the implementations of Newton based versions, or even with QPROG, in terms of computer time. However, there is tremendous scope to improve the coding of this version.

LINE SEARCHES IN NEWTON DIRECTIONS: In versions based on Newton's method with line searches, the step length almost always ranged between 0.8 and 1.2, with many steps very close to 1. Moreover, the line search faces numerical difficulties when the penalty parameter gets small. Thus, the following strategy was implemented: apply Newton's method with line search until the penalty parameter reaches a certain value, and then switch to the standard Newton's method (i.e., with a constant step length of 1). Unfortunately, Newton's method with line search does not reduce the number of iterations significantly (this may be attributed to the surprisingly small number of iterations, and to the optimum step length being close to 1).

SAVING ITERATIONS BY PROJECTIONS: It has already been mentioned that if λ_j becomes > 0, it tended to remain > 0 in subsequent iterations. In the nearest point problem, if we know the set J = { j : λ_j > 0 in the optimum solution }, it is well known that the nearest point itself can be found by orthogonally projecting q onto the linear hull of the face of K corresponding to J. In some experiments we selected a tolerance ε3 (about 10^-3), and when the current solution vector λ^k satisfied λ_j > -ε3 for all j, we took { j : λ_j > 0 } as an estimate for J and used the projection strategy. This cut the number of iterations by about 1/3 on average.
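If the cone is given by generators, K = { Bλ : λ ≥ 0 } (our notation; the report's setup is equivalent), the projection step for an estimated support set J might look like this sketch:

```python
import numpy as np

def project_onto_face(B, q, J):
    """Orthogonally project q onto the linear hull of the face of K
    spanned by the generator columns B[:, j] for j in J.

    If J is the true optimal support { j : lam_j > 0 }, this projection
    is the nearest point in the cone K to q.
    """
    BJ = B[:, list(J)]
    # least-squares coefficients of q against the generators in J
    coeff, *_ = np.linalg.lstsq(BJ, q, rcond=None)
    return BJ @ coeff
```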

μ Fixed At A Small Value

We experimented with setting μ at the small target value initially and carrying out several descent steps keeping μ fixed. This scheme worked well and converged in the same (sometimes a smaller) number of iterations as the usual scheme. However, numerical difficulties were encountered because the quantities λ_j/μ became large in magnitude, making the Hessian unstable. This does not happen when μ is gradually reduced, because as μ^k gets smaller, |λ_j| for j such that λ_j < 0 also gets smaller, and the numerical difficulty is avoided.

TABLE 1: Time in Seconds on IBM 3033 for Version 1 (Standard Newton Steps) Based on fr(λ, μ)
(for each r: average number of iterations, time, and optimal ρ)

  n   # problems   r=2: iter  time   ρ     r=3: iter  time   ρ     r=4: iter  time   ρ     QPROG* time
 10      200          5.80    .658   .02      37.4     .71    .29      69.5     .77    .4      .656
 20      200          6.01    .681   .02      38.0     .877   .29      70.1    1.075   .4      .675
 30      200          6.03    .722   .02      38.1    1.17    .29      70.5    1.62    .4      .73
 40      200          6.04    .798   .02      38.13   1.61    .29      70.6    2.44    .4      .819
 50      200          6.04    .895   .02      38.4    2.24    .29      70.79   3.59    .4      .964
100      100          6.08   1.98    .02      39.1    9.27    .29      71.2   16.35    .4     2.88
700        1          7     339.7    .02       +       +      +        +       +      +     652.

Accuracy = 10^-8. System (2) solved by direct factorization using IMSL routines LFCSF and LSLSF.
* QPROG is M. J. D. Powell's implementation of A. Idnani and D. Goldfarb's dual QP algorithm (IMSL routine QPROG).
+ Not tried.

TABLE 2: Time in Seconds on APOLLO Series 4000 for Version 1 (Standard Newton Steps) Based on f2(λ, μ)

  n   # problems   Avg. # iterations   time     optimal ρ   ω*     time for ACM 587**
 10       10            6.0               6.12     .02       1.02     .07
 50       10            6.0               4.05     .02       1.05     .50
100       10            6.0              24.0      .02       1.25    4.0
200       10            6.1             150.0      .02       1.30   59.6
300       10            6.3             405.1      .02       1.30   90.0
400       10            6.3             902.3      .02       1.30   23.0
700        2            6.5            4302.0      .02       1.45  768.0

Accuracy = 10^-7. System (2) solved by SOR with relaxation parameter ω* (found best by experimentation). In almost all cases √n SOR iterations give an accurate solution to (2).
** K. Haskell and R. Hanson's routine, available through ACM algorithm 587.

TABLE 3: Time in Seconds on IBM 3033 for Version 3 (Steepest Descent Steps) Based on f2(λ, μ)

  n   # problems   Avg. # iterations   Avg. # steps per iteration   Time     optimal ρ   Time for QPROG
 10      200            12                    14.66                    1.807     .1         .565
 20      200            12                    15.16                    1.17      .1         .67
 30      200            35                    14.9                     3.76      .5         .727
 40      200            36                    15.56                    6.26      .5         .815
 50      200            36                    13.38                    8.08      .5         .96
100      100            36                    15.58                   32.9       .5        3.3
250        1            37                    14.9                   197.3       .5        1.7

Accuracy = 10^-3.

Conclusion

In our experiments so far, we found that the best performance is obtained by the version using standard Newton steps and based on f2(λ, μ). Of course, several other versions are still under investigation.

Acknowledgements

We acknowledge the many discussions we had with R. Saigal. We are also grateful to O. L. Mangasarian [1989] for bringing the result on strongly convex functions to our attention, which greatly simplified the proof of Theorem 4.

References

1. M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: Theory and Algorithms (Wiley, NY, 1979).
2. S. Y. Chang and K. G. Murty, "The Steepest Descent Gravitational Method for Linear Programming," Discrete Applied Mathematics, to appear.
3. S. J. Chung and K. G. Murty, "Polynomially Bounded Ellipsoid Algorithms for Convex Quadratic Programming," in O. L. Mangasarian, R. R. Meyer, and S. M. Robinson (eds), Nonlinear Programming 4 (Academic Press, 1981).
4. M. B. Daya and C. M. Shetty, "Polynomial Barrier Function Algorithms for Convex Quadratic Programming," School of ISE, Georgia Tech (Atlanta, GA, 1988).
5. J. P. Evans, F. J. Gould, and J. W. Tolle, "Exact Penalty Functions in Nonlinear Programming," Mathematical Programming, 4, 2 (1973) 72-97.
6. A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques (Wiley, NY, 1968).
7. E. G. Gilbert, D. W. Johnson, and S. S. Keerthi, "A Fast Procedure for Computing the Distance Between Complex Objects in Three-Dimensional Space," IEEE Journal of Robotics and Automation, 4, 2 (1988) 193-203.
8. G. H. Golub and C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, MD, 1989).

9. K. Haskell and R. Hanson, "An Algorithm for Linear Least Squares Problems with Equality and Nonnegativity Constraints," Mathematical Programming, 21 (1981) 98-118.
10. N. Karmarkar, "A New Polynomial Algorithm for Linear Programming," Combinatorica, 4 (1984) 373-395.
11. M. Kojima, "A Polynomial-Time Algorithm for a Class of Linear Complementarity Problems," Mathematical Programming, to appear [1989].
12. M. Kojima, S. Mizuno, and A. Yoshise, "A Polynomial-Time Algorithm for a Class of Linear Complementarity Problems," Mathematical Programming, 44, 1 (1989) 1-26.
13. C. E. Lemke, "Bimatrix Equilibrium Points and Mathematical Programming," Management Science, 11 (1965) 681-689.
14. C. E. Lemke, "On Complementary Pivot Theory," in G. B. Dantzig and A. Veinott (eds), Mathematics of the Decision Sciences (1968).
15. D. G. Luenberger, Linear and Nonlinear Programming, 2nd edition (Addison-Wesley, Menlo Park, CA, 1984).
16. O. L. Mangasarian, Private Communication (1989).

17. K. G. Murty, Linear Complementarity, Linear and Nonlinear Programming (Heldermann Verlag, West Berlin, 1988).
18. K. G. Murty and Y. Fathi, "A Critical Index Algorithm for the Nearest Point Problem on Simplicial Cones," Mathematical Programming, 23 (1982) 206-215.
19. T. Pietrzykowski, "An Exact Potential Method for Constrained Maxima," SIAM J. Numerical Analysis, 6, 2 (1968) 299-304.
20. B. T. Polyak, Introduction to Optimization (Optimization Software, Inc., NY, 1987).
21. J. Renegar, "A Polynomial-Time Algorithm Based on Newton's Method for Linear Programming," Mathematical Programming, 40, 1 (1988) 59-95.
22. J. Renegar and M. Shub, "Simplified Complexity Analysis for Newton LP Methods," Tech. Report No. 807, SORIE, Cornell University (Ithaca, NY, 1988).
23. K. Truemper, "Note on Finite Convergence of Exterior Penalty Functions," Management Science, 21, 5 (1975) 600-606.
24. D. R. Wilhelmsen, "A Nearest Point Algorithm for Convex Polyhedral Cones and Applications to Positive Linear Approximations," Mathematics of Computation, 30 (1976) 48-57.
25. P. Wolfe, "Algorithm for a Least-Distance Programming Problem," Mathematical Programming Study, 1 (1974) 190-205.

26. P. Wolfe, "Finding the Nearest Point in a Polytope," Mathematical Programming, 11 (1976) 128-149.
27. Y. Ye and E. Tse, "An Extension of Karmarkar's Projective Algorithm for Convex Quadratic Programming," Mathematical Programming, 44, 2 (1989) 157-181.
28. D. M. Young, "Convergence Properties of the Symmetric and Unsymmetric Successive Overrelaxation Methods and Related Methods," Mathematics of Computation, 24, 112 (1970) 793-807.