"/' J.'' / ( / / EXTERIOR POINT ALGORITHMS FOR NEAREST POINTS AND CONVEX QUADRATIC PROGRAMS iZ~c i L.* r^" -I... I7 /...', V..'""',-. | I... fC../- bf L.i'l'a::." I'.-''.(/ * /' i /':t:".'..... K.S. Al-Sultan and K.G. Murty Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109-2117, U.S.A. and Systems Engineering Department King Fahd University of Petroleum and Minerals Dhahran-31261, Saudi Arabia October, 1989 Technical Report 91-6 (revised March 1991) Partially supported by NSF Grant No. ECS-8521183:'/s " O'z'{t.:'"' /?',"<:,'-f,,,.';: ~...t,'', -. A /..'::'". 1, 1 jI^ / u

Abstract

We consider the problem of finding the nearest point (by Euclidean distance) in a simplicial cone to a given point, and develop an exterior penalty algorithm for it. Each iteration in the algorithm consists of a single Newton step following a reduction in the value of the penalty parameter. Proofs of convergence of the algorithm are given. Various other versions of exterior penalty algorithms for nearest point problems in nonsimplicial polyhedral cones and for convex quadratic programs, all based on a single descent step following a reduction in the value of the penalty parameter per iteration, are discussed. The performance of these algorithms in large scale computational experiments is very encouraging. It shows that the number of iterations grows very slowly, if at all, with the dimension of the problem.

Key words: Exterior penalty function methods, nearest point problems, convex quadratic programs, Newton step, SOR methods.

Abbreviated title: Exterior Point Algorithms

1 Introduction

We consider the nearest point problem, which is that of finding the nearest point (by Euclidean distance) in a convex polyhedral cone $K \subset R^n$ to a point $q \in R^n$. It is a special convex quadratic program with many important applications (see [1, 2, 4, 8, 16, 17]). The KKT optimality conditions for the problem form the linear complementarity problem (LCP) associated with it. The nearest point problem can be solved by specializations of descent or active set methods for linearly constrained nonlinear programs [3, 15, 16], or through the associated LCP using any of the available algorithms for solving LCPs, such as complementary and principal pivoting methods [2, 13, 14, 16], interior point methods [12, 24], and special methods based on its geometry [2, 8, 17, 22, 23]. Thus a variety of methods are already available for the nearest point problem. However, since it appears often as a large scale problem, there has been a continuing motivation to develop new algorithms which are faster. This is our motivation too, and our inspiration comes from recent reports of significant improvements in computational performance for solving large scale linear and convex quadratic programs through the use of barrier methods in novel ways. We discuss a family of penalty methods for our problem and report on their encouraging computational performance.

For any matrix $D$, we denote its $i$th row and $j$th column by $D_{i.}$ and $D_{.j}$ respectively. The symbol $(x_j)$ denotes the vector whose $j$th component is $x_j$. For any real numbers $a_1, \ldots, a_m$, $\mathrm{diag}(a_1, \ldots, a_m)$ denotes the $m \times m$ diagonal matrix with entries in the principal diagonal equal to $a_1, \ldots, a_m$. The symbol $S \setminus T$ denotes the set of all elements in the set $S$ which are not in the set $T$. For any vector $x$, $\|x\|$ denotes its Euclidean norm. For a square matrix $A$, $\|A\|$ denotes its matrix norm, which is $\max\{\|Ax\| : \|x\| = 1\}$. Given a matrix $Q$, $\mathrm{Pos}(Q) = \{Qy : y \ge 0\}$ is the cone which is the nonnegative hull of its column vectors.
A square matrix $A$ is said to be positive definite (PD) if $y^T A y > 0$ for all $y \ne 0$, or positive semidefinite (PSD) if $y^T A y \ge 0$ for all $y$. Given any differentiable real valued function $h(x)$ defined over $R^n$, we denote by $\nabla h(x)$ its gradient vector at $x$, written as a row vector.

We use both superscripts and exponents in this paper, and distinguish between them by typeface. Boldface superscripts indicate exponents; for example, $\lambda_j^{\mathbf{r}}$ is $\lambda_j$ raised to the power $r$. Regular superscripts, as in $\lambda^k$, are used to enumerate various vectors, etc.

There are two forms of the nearest point problem, depending on the way the polyhedral cone $K$ is specified. In one form, $K$ is specified as $\mathrm{Pos}(Q)$ for a given matrix $Q$. In the other form, $K$ is specified as $\{x : Ax \ge 0\}$ where $A$ is given. Both forms appear in applications. When the problem is given in one of these forms, transforming it into the other form typically requires exponential effort. We develop a family of exterior penalty methods for the nearest point problem when $K$ is given as $\mathrm{Pos}(Q)$. Our convergence analysis and computational experiments treat the problem in this form. However, we show how the methods can be easily adapted to handle problems in the other form, and convex quadratic programs in general. Without any loss of generality, we assume that the cone $K$ is of full dimension.

2 Penalty Algorithms When K is a Simplicial Cone Given in the Pos Form

Let $K = \mathrm{Pos}(Q)$ where $Q$ is a nonsingular square matrix of order $n$. This special case of the nearest point problem itself has many applications [1, 4]. In this case, the nearest point problem is: find $\lambda = (\lambda_1, \ldots, \lambda_n)^T$ to

minimize $\| q - Q\lambda \|^2$   (1)
subject to $\lambda \ge 0$.

It is well known that this problem always has a unique optimum solution. If $\lambda^*$ is the optimum solution of this problem, $x^* = Q\lambda^*$ is the nearest point in $K$ to $q$.

The penalty approach transforms the problem into one of unconstrained minimization of a composite function, which is the sum of the original objective function and a penalty function associated with the feasible region. The various penalty methods differ in the choice of the penalty function. In this study we consider penalty functions of the form

$P_r(\lambda) = \sum_{j=1}^n (\max\{0, -\lambda_j\})^r$

for $r = 2, 3, 4$. This leads to the following unconstrained minimization problem in $\lambda$, where $\mu$ is a positive penalty parameter:

minimize $f_r(\lambda, \mu) = \| q - Q\lambda \|^2 + \frac{1}{\mu} \sum_{j=1}^n (\max\{0, -\lambda_j\})^r$ over $\lambda \in R^n$   (2)

$f_r(\lambda, \mu)$ is convex in the $\lambda$-space for fixed $\mu > 0$. $f_2(\lambda, \mu)$ is continuously differentiable in $\lambda$ up to the first order; $f_3(\lambda, \mu)$ and $f_4(\lambda, \mu)$ are twice continuously differentiable in $\lambda$. Let $g_r(\lambda, \mu)$, $H(f_r(\lambda, \mu))$ be the gradient as a row vector, and the Hessian matrix, of $f_r(\lambda, \mu)$ with respect to $\lambda$. $g_r(\lambda, \mu)$ for $r = 2, 3, 4$, and $H(f_r(\lambda, \mu))$ for $r = 3, 4$, are given below.

$g_r(\lambda, \mu) = \nabla f_r(\lambda, \mu) = -2q^T Q + 2\lambda^T Q^T Q + \frac{r}{\mu}(\delta_1 \lambda_1^{r-1}, \ldots, \delta_n \lambda_n^{r-1})$

$H(f_r(\lambda, \mu)) = 2Q^T Q + \frac{r(r-1)}{\mu}\,\mathrm{diag}(\delta_1 \lambda_1^{r-2}, \ldots, \delta_n \lambda_n^{r-2})$   (3)

where $\delta_j = \delta_j(\lambda_j) = 0$ if $\lambda_j \ge 0$, and $= (-1)^r$ if $\lambda_j < 0$.

In penalty algorithms, the unconstrained minimization of $f_r(\lambda, \mu)$ in $\lambda$ is carried out by a descent method. This method begins with an initial point $\lambda^0$ and goes through several steps. In each step a descent direction at the current point $\lambda^k$ for $f_r(\lambda, \mu)$ is generated, and a step is taken from $\lambda^k$ in that direction. The step length to move in that direction is either 1 (for algorithms based on the standard Newton method) or the optimum step length determined by

a line search in that direction. We will now describe the line search routine, and the choice of descent directions, that we will use.

Line Search Routine for $f_r(\lambda, \mu)$

Let $\lambda \in R^n$ and suppose $d = (d_1, \ldots, d_n)^T$ is a descent direction for $f_r(\lambda, \mu)$ at $\lambda$. Then

$\frac{d f_r(\lambda + \alpha d, \mu)}{d\alpha} = (-2d^T Q^T q + 2d^T Q^T Q \lambda) + 2\alpha d^T Q^T Q d + \frac{1}{\mu} \sum_{j=1}^n r(-1)^r (\lambda_j + \alpha d_j)^{r-1} d_j \, \xi_j(\alpha)$   (4)

where $\xi_j(\alpha) = 0$ if $\lambda_j + \alpha d_j \ge 0$, and $= 1$ if $\lambda_j + \alpha d_j < 0$. So $\frac{d f_r(\lambda + \alpha d, \mu)}{d\alpha}$ is a sum of at most $n + 2$ terms. Determine the set of ratios $\{-\lambda_j/d_j : $ either $\lambda_j > 0, d_j < 0$, or $\lambda_j < 0, d_j > 0\}$, and arrange them in increasing order. Suppose there are $s$ distinct values in this set, $\alpha_1 < \alpha_2 < \ldots < \alpha_s$. Define $\alpha_{s+1} = \infty$. For any $u = 1$ to $s$, in the interval $\alpha_u \le \alpha \le \alpha_{u+1}$, each of the terms on the right hand side of (4) is monotonic and has the same sign. Since $d$ is a descent direction for $f_r(\lambda, \mu)$ at $\lambda$, $\frac{d f_r(\lambda + \alpha d, \mu)}{d\alpha} < 0$ at $\alpha = 0$, and as $f_r(\lambda, \mu)$ is convex and has a minimum, this derivative becomes 0 at some point in this direction. So, if we begin computing $\frac{d f_r(\lambda + \alpha d, \mu)}{d\alpha}$ for $\alpha = 0, \alpha_1, \alpha_2, \ldots$, there will be a $t$ such that it remains negative until we get to $\alpha_t$, and at $\alpha_{t+1}$ it becomes $\ge 0$. Then we know that the minimum of $f_r(\lambda, \mu)$ in $\lambda$, over the half-line $\{\lambda + \alpha d : \alpha \ge 0\}$, is attained by some $\alpha$ in the interval $\alpha_t \le \alpha \le \alpha_{t+1}$, and it can be found by an efficient search for the zero of $\frac{d f_r(\lambda + \alpha d, \mu)}{d\alpha}$ in this interval, as all the terms on the right hand side of (4) are monotonic and of the same sign.
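To make the routine concrete, here is a sketch in Python (our illustration, not the authors' implementation; the function name, the tolerance, and the use of bisection for the final zero-finding are our choices). It evaluates the derivative (4), scans the breakpoints $-\lambda_j/d_j$, and then bisects for the zero of the derivative:

```python
import numpy as np

def line_search(lam, d, mu, Q, q, r=2, tol=1e-10):
    """Exact line search for f_r along a descent direction d, following (4):
    scan the breakpoints where some lam_j + alpha*d_j changes sign, then
    locate the zero of the directional derivative.  Sketch only."""
    def dphi(alpha):                      # d f_r(lam + alpha d, mu) / d alpha
        x = lam + alpha * d
        val = -2 * d @ Q.T @ (q - Q @ x)  # quadratic part of (4)
        neg = x < 0                       # xi_j(alpha) = 1 iff lam_j + alpha d_j < 0
        val += (r / mu) * np.sum(((-1.0) ** r) * x[neg] ** (r - 1) * d[neg])
        return val
    # breakpoints: ratios -lam_j/d_j where the sign of lam_j + alpha d_j flips
    mask = ((lam > 0) & (d < 0)) | ((lam < 0) & (d > 0))
    bps = np.sort(-lam[mask] / d[mask])
    lo, hi = 0.0, None
    for b in bps:                         # find the interval bracketing the zero
        if dphi(b) >= 0:
            hi = b
            break
        lo = b
    if hi is None:                        # derivative still negative past the last
        hi = lo + 1.0                     # breakpoint: expand until it turns
        while dphi(hi) < 0:
            lo, hi = hi, 2 * hi
    while hi - lo > tol:                  # bisection on the bracketing interval
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since $f_r(\lambda + \alpha d, \mu)$ is convex in $\alpha$, its derivative is nondecreasing, so bisection on the bracketing interval is valid.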

Selection of Search Directions

We basically consider two directions.

Steepest descent search direction: Under this rule, the search direction at the current point $\lambda^k$ for $f_r(\lambda, \mu)$ is $-\nabla f_r(\lambda^k, \mu)$, and the step length for the move in this direction is taken to be the optimum step length determined as discussed above. The method based on this strategy is the steepest descent version of the method.

Newton search direction: Under this rule the search direction $y$ at the current point $\lambda^k$ for $f_r(\lambda, \mu)$ is the solution of the system

$H(f_r(\lambda^k, \mu))\, y = -(\nabla f_r(\lambda^k, \mu))^T$   (5)

Since $Q$ is square and nonsingular, $H(f_r(\lambda, \mu))$ is PD and (5) always has a unique solution. When using the Newton search direction, we can take the step length for the move to be 1 in all the steps; this leads to the standard Newton method. Or we can use optimum step lengths, leading to the Newton method with line searches.

The Classical Exterior Penalty Method

This method, discussed extensively in the nonlinear programming literature, goes through several iterations. In each iteration, the value of the penalty parameter $\mu$ is fixed, and an algorithm is used to find the unconstrained minimum of $f_r(\lambda, \mu)$ over $\lambda \in R^n$. This algorithm itself may take several descent steps in this iteration. Proofs of convergence of the classical penalty method require that the unconstrained minimum of $f_r(\lambda, \mu)$ is obtained in this iteration, but in actual computation a sufficiently small positive tolerance $\varepsilon$, called the desired accuracy, is selected, and the unconstrained minimization routine is terminated whenever a point $\lambda$ satisfying (6) is obtained.

$\| g_r(\lambda, \mu) \| \le \varepsilon$   (6)
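As a concrete illustration of the Newton search direction (a sketch for $r = 4$ under our own naming, not the authors' code), the gradient and Hessian of (3) can be assembled and the Newton system solved directly:

```python
import numpy as np

def newton_direction(lam, mu, Q, q, r=4):
    """Newton search direction: solve H(f_r) y = -grad f_r^T.
    Sketch for r = 4; Q square nonsingular, so H is PD."""
    delta = np.where(lam < 0, (-1.0) ** r, 0.0)   # delta_j from (3)
    g = -2 * Q.T @ (q - Q @ lam) + (r / mu) * delta * lam ** (r - 1)
    H = 2 * Q.T @ Q + (r * (r - 1) / mu) * np.diag(delta * lam ** (r - 2))
    return np.linalg.solve(H, -g)
```

Because $Q$ is nonsingular, the Hessian here is PD, so `np.linalg.solve` always succeeds and the returned $y$ is a descent direction.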

Now the penalty parameter is reduced from its present value $\mu$ to $\bar{\mu}$ say, and the method moves to the next iteration to find the unconstrained minimum of $f_r(\lambda, \bar{\mu})$, beginning with $\lambda$ as the initial point. Let $\lambda(\mu)$ denote an unconstrained minimum of $f_r(\lambda, \mu)$ over $\lambda$, as a function of $\mu$. It has been shown [21] that it is possible to select a target value $\mu_* > 0$ for the penalty parameter, such that when $\mu$ decreases to this value or becomes smaller, then $\lambda(\mu)$ is within a specified tolerance of an optimum $\lambda$ for the original problem.

New Exterior Penalty Methods

Our methods are based on the ideas of the classical penalty method, but differ from it in one important aspect. We do not find the unconstrained minimum of $f_r(\lambda, \mu)$ for each fixed $\mu$. Instead, we carry out only one descent step of the unconstrained minimization algorithm, then reduce the value of the penalty parameter, and continue the same way. One can visualize a path in the $\lambda$-space, parametrized by $\mu$, $\{\lambda : \nabla f_r(\lambda, \mu) = 0, \mu > 0\}$, called the exterior path, and an envelope containing this path defined by $\{\lambda : \|\nabla f_r(\lambda, \mu)\| \le \varepsilon, \mu > 0\}$ for a certain $\varepsilon > 0$. We will later prove that the points obtained in this new algorithm can be interpreted as a sequence of points in this envelope, converging to the limit of the point $\lambda(\mu)$ on the exterior path as $\mu$ tends to 0.

Basic Algorithm

Initialization: Let $\varepsilon$ be the desired accuracy, and $\mu_*$ the target value for the penalty parameter. Select an initial point $\lambda^0$ satisfying $Q\lambda^0 = q$, and an initial value $\mu_0$ for the penalty parameter (this could be large). Select a value for the factor $\rho$ between 0 and 1. With $(\lambda^0, \mu_0)$ as the initial pair, go to Step 1.

General step: Let $(\lambda^k, \mu_k)$ be the pair at the end of the previous step. Let $\mu_{k+1} = \rho \mu_k$. Find the descent direction (either steepest descent or Newton's) $d^k$ at $\lambda^k$ for $f_r(\lambda, \mu_{k+1})$. Select the step length $\alpha_k$ (1 for the standard Newton method, or the optimum step length for other methods). Let $\lambda^{k+1} = \lambda^k + \alpha_k d^k$. If $\lambda_j^{k+1} \ge 0$ for all $j$, or if $\mu_{k+1} \le \mu_*$, $\lambda^{k+1}$ is a point within the specified tolerance of the optimum solution for (1); terminate. Otherwise, with the new pair $(\lambda^{k+1}, \mu_{k+1})$, go to the next step.

This is the basic algorithm. Details on how to select $\rho$, $\mu_0$, etc. are specified later on.

Convergence Results

Classical penalty methods are known to converge to an optimum solution of the original problem [3, 7, 15, 19, 21]. However, these proofs assume that an actual optimum solution of the problem of minimizing $f_r(\lambda, \mu)$ over $\lambda \in R^n$ is obtained for each fixed value of $\mu$ in a sequence converging to zero. We do not actually obtain the unconstrained minimum of $f_r(\lambda, \mu)$ over $\lambda \in R^n$ for any value of $\mu$ before we change it. Hence, standard proofs for penalty methods do not apply directly to our algorithm. We give a convergence proof for the standard Newton version of the algorithm (one Newton step following a decrease in the penalty parameter $\mu$ in every step), using the composite function $f_4(\lambda, \mu)$; this has been selected because each component of its gradient is twice continuously differentiable, and our present proofs use this property. We denote $f_4(\lambda, \mu)$ by $f(\lambda, \mu)$ for the sake of simplicity. Likewise, we denote the gradient vector and the Hessian matrix of $f_4(\lambda, \mu)$ with respect to $\lambda$ by $g(\lambda, \mu)$, $H(\lambda, \mu)$. So

$g(\lambda, \mu) = -2(q - Q\lambda)^T Q + \frac{4}{\mu}(\delta_1 \lambda_1^3, \ldots, \delta_n \lambda_n^3)$   (7)

$H(\lambda, \mu) = 2Q^T Q + \frac{12}{\mu}\,\mathrm{diag}(\delta_1 \lambda_1^2, \ldots, \delta_n \lambda_n^2)$

where $\delta = (\delta_1, \ldots, \delta_n)$ is a function of $\lambda$ defined as in (3). We denote by $g_j(\lambda, \mu)$ the $j$th component of $g(\lambda, \mu)$, and by $H_{.j}(\lambda, \mu)$ the $j$th column vector of $H(\lambda, \mu)$.

Since $Q$ is nonsingular, $Q^T Q$ is PD. Let $\sigma_1$ be the smallest eigenvalue of $Q^T Q$. Also, from (3), $(12/\mu)\,\mathrm{diag}(\delta_1 \lambda_1^2, \ldots, \delta_n \lambda_n^2)$ is PSD for every $\mu > 0$. These facts imply that $H(\lambda, \mu)$ is PD for every $\mu > 0$, and that its smallest eigenvalue is $\ge 2\sigma_1$. Hence $f(\lambda, \mu)$ is not only strictly convex, but is a strongly convex function in $\lambda$ for all $\mu > 0$. They also imply that $\|(H(\lambda, \mu))^{-1}\| \le 1/(2\sigma_1)$ for every $\mu > 0$ and $\lambda \in R^n$.

Theorem 1: For each $\mu > 0$, the system $g(\lambda, \mu) = 0$ has a unique solution in $\lambda$, $\lambda(\mu)$ say. $\lambda(\mu)$ is a continuous function of $\mu$ in $\{\mu : \mu > 0\}$, and as $\mu \to 0$ through positive values, $\|\lambda(\mu)\|$ remains bounded above by a constant which depends only on $q$ and $Q$.

Proof: Fix $\mu > 0$. Select any point $x \in \mathrm{Pos}(Q)$, and let $\psi = \|q - x\|$, $S = \{y : \|q - y\| \le \psi\}$, $\Gamma = \{\lambda : Q\lambda \in S\}$. For all $\lambda \notin \Gamma$, $\|q - Q\lambda\|^2 > \psi^2$ and hence $f(\lambda, \mu) > \psi^2$. Let $\bar{\lambda} = Q^{-1}x$; then $\bar{\lambda} \ge 0$ since $x \in \mathrm{Pos}(Q)$, and hence $f(\bar{\lambda}, \mu) = \psi^2$; and $\bar{\lambda} \in \Gamma$. $\Gamma$ is a compact set and $f(\lambda, \mu)$ is a continuous function of $\lambda$, so it attains its minimum over $\Gamma$, at a point $\hat{\lambda} \in \Gamma$ say. By the above, $f(\hat{\lambda}, \mu) \le \psi^2$, and since $f(\lambda, \mu) > \psi^2$ for all $\lambda \notin \Gamma$, $\hat{\lambda}$ is the unconstrained minimizer of $f(\lambda, \mu)$ over $\lambda \in R^n$. Since $f(\lambda, \mu)$ is strictly convex in $\lambda$, its unconstrained minimum $\hat{\lambda}$ is unique and is attained at the solution of $g(\lambda, \mu) = 0$. Hence, for any $\mu > 0$, $g(\lambda, \mu) = 0$ has a unique solution in $\lambda$, $\lambda(\mu)$ say. $\partial g(\lambda, \mu)/\partial \lambda = H(\lambda, \mu)$ is nonsingular for all $\lambda$ whenever $\mu > 0$. Hence, by the implicit function theorem, $\lambda(\mu)$ is continuous in $\mu$ in the region $\mu > 0$. If $x^*$ is the nearest point in $\mathrm{Pos}(Q)$ to $q$, and $\lambda^* = Q^{-1}x^*$, then from penalty function theory we know that $\lambda(\mu) \to \lambda^*$ as $\mu \to 0$ through positive values. This and the continuity of $\lambda(\mu)$ imply that $\|\lambda(\mu)\|$ remains bounded above by a constant as $\mu \to 0$ through positive values, where this constant depends on $\|\lambda^*\|$, which itself depends on $q$ and $Q$.
∎

From Theorem 1 we know that $\{\lambda(\mu) : \mu > 0\}$ is a path in the $\lambda$-space, which we call the exterior path. For given $\varepsilon > 0$, $\mu > 0$, define

$\Omega(\mu, \varepsilon) = \{\lambda : \|g(\lambda, \mu)\| \le \varepsilon\}$   (8)

The set $\cup_{\mu > 0}\, \Omega(\mu, \varepsilon)$ defines an envelope around the exterior path. We will show that it is possible to interpret the points obtained in our algorithm as a sequence of points contained in this envelope, converging to $\lambda^*$ as $\mu \to 0$.

Lemma 1: For given $\varepsilon > 0$, the set $\Omega(\mu, \varepsilon)$ is bounded for each $\mu > 0$.

Proof: Fix $\mu > 0$. Since $f(\lambda, \mu)$ is strongly convex in $\lambda$, for all $\lambda, \lambda^1 \in R^n$ there exists a positive constant $\gamma_0$ such that $\|g(\lambda, \mu) - g(\lambda^1, \mu)\| \ge \gamma_0 \|\lambda - \lambda^1\|$ (see [19]). Substituting $\lambda^1 = \lambda(\mu)$ in this inequality yields $\|g(\lambda, \mu)\| \ge \gamma_0 \|\lambda - \lambda(\mu)\|$ for all $\lambda$. Hence, if $\lambda \in \Omega(\mu, \varepsilon)$, we have $\|\lambda - \lambda(\mu)\| \le \varepsilon/\gamma_0$. The result in this lemma now follows by using Theorem 1. ∎

From Lemma 1, we can establish a bound for $\lambda \in \Omega(\mu, \varepsilon)$ which depends only on $\mu$, $\varepsilon$, and the input data. Let $\gamma(\mu, \varepsilon)$ denote this bound, i.e., $\|\lambda\| \le \gamma(\mu, \varepsilon)$ for all $\lambda \in \Omega(\mu, \varepsilon)$. The algorithm will be initiated with a value $\mu_0 > 0$ for $\mu$, and the value of $\mu$ will be decreased in each iteration until it reaches a target value $\mu_*$ say. The algorithm will terminate when $\mu$ reaches or decreases below $\mu_*$. Define $\gamma = \max\{\gamma(\mu, \varepsilon) : \mu_* \le \mu \le \mu_0\}$. Then $\|\lambda\| \le \gamma$ for all $\lambda \in \Omega(\mu, \varepsilon)$ and for all $\mu_* \le \mu \le \mu_0$; this interval itself is called the range for the penalty parameter. $\gamma$ depends on $\varepsilon$, and clearly it decreases with $\varepsilon$.

Theorem 2: For given $\varepsilon > 0$ and $\mu > 0$ in its range, let $\lambda \in \Omega(\mu, \varepsilon)$, i.e., $\|g(\lambda, \mu)\| \le \varepsilon$. Let $\rho$ be a number satisfying $0 < \rho < 1$ and $\bar{\mu} = \rho\mu$. Then there exists a constant $\beta$, depending only on the input data and the bound $\gamma$ defined above, such that $\|g(\lambda, \bar{\mu})\| \le \frac{(1-\rho)\beta}{\rho} + \frac{\varepsilon}{\rho}$.

Proof: Let $a = -2(q - Q\lambda)^T Q$ and $b = 4(\delta_1 \lambda_1^3, \ldots, \delta_n \lambda_n^3)$. Then $g(\lambda, \bar{\mu}) = a + \frac{b}{\rho\mu}$. So $\|g(\lambda, \bar{\mu})\| = \frac{1}{\rho}\|(\rho - 1)a + a + \frac{b}{\mu}\| \le \frac{1-\rho}{\rho}\|a\| + \frac{1}{\rho}\|a + \frac{b}{\mu}\| \le \frac{1-\rho}{\rho}\|a\| + \frac{\varepsilon}{\rho}$. Now $\|a\| = \|{-2}(q - Q\lambda)^T Q\| \le 2\|q^T Q\| + 2\|Q^T Q\| \cdot \|\lambda\| \le 2\|q^T Q\| + 2\|Q^T Q\|\,\gamma = \beta$, by the definition of $\gamma$ given above. So $\|g(\lambda, \bar{\mu})\| \le \frac{(1-\rho)\beta}{\rho} + \frac{\varepsilon}{\rho}$. ∎

Theorem 3: For given $\varepsilon > 0$ and $\mu > 0$ in its range, let $\lambda \in \Omega(\mu, \varepsilon)$. Select a $\rho$ satisfying $\frac{1}{2} \le \rho < 1$, and let $\bar{\mu} = \rho\mu$, $\bar{\lambda} = \lambda - (H(\lambda, \bar{\mu}))^{-1}(g(\lambda, \bar{\mu}))^T$. Then

$\|g(\bar{\lambda}, \bar{\mu})\| \le \bar{\alpha}\, \|g(\lambda, \bar{\mu})\|^2$

where $\bar{\alpha}$ is a constant depending on $\rho$, $\varepsilon$, $\gamma$, $\bar{\mu}$, $\beta$, and $\sigma_1$ defined earlier.

Proof: Since the function $g_j(\lambda, \bar{\mu})$ is twice continuously differentiable in $\lambda$, from Taylor's theorem we have

$g_j(\bar{\lambda}, \bar{\mu}) = g_j(\lambda, \bar{\mu}) + (\bar{\lambda} - \lambda)^T H_{.j}(\lambda, \bar{\mu}) + \frac{1}{2}(\bar{\lambda} - \lambda)^T W(\tilde{\lambda}, \bar{\mu})(\bar{\lambda} - \lambda)$   (9)

where $W(\tilde{\lambda}, \bar{\mu})$ is the square matrix $(W_{u,v}(\tilde{\lambda}, \bar{\mu}))$ of order $n$ with $W_{u,v}(\tilde{\lambda}, \bar{\mu}) = 0$ for all $(u, v) \ne (j, j)$, and $= \frac{24}{\bar{\mu}}\,\delta_j \tilde{\lambda}_j$ for $(u, v) = (j, j)$; $\tilde{\lambda} = (\tilde{\lambda}_j) = (1 - \theta)\lambda + \theta\bar{\lambda}$ for some $0 \le \theta \le 1$, and $\delta_j = \delta_j(\tilde{\lambda}_j)$ as defined in (3) for $r = 4$. From the definition of $\bar{\lambda}$ we verify that $g_j(\lambda, \bar{\mu}) + (\bar{\lambda} - \lambda)^T H_{.j}(\lambda, \bar{\mu}) = 0$. So from (9), $g_j(\bar{\lambda}, \bar{\mu}) = \frac{12}{\bar{\mu}}\,\delta_j \tilde{\lambda}_j (\bar{\lambda}_j - \lambda_j)^2$, and hence

$\|g(\bar{\lambda}, \bar{\mu})\|^2 = \sum_{j=1}^n \left(\frac{12}{\bar{\mu}}\right)^2 \delta_j^2\, \tilde{\lambda}_j^2\, (\bar{\lambda}_j - \lambda_j)^4 \le \left(\frac{12}{\bar{\mu}}\right)^2 \left(\max_j \tilde{\lambda}_j^2\right) \sum_{j=1}^n (\bar{\lambda}_j - \lambda_j)^4$   (10)

From the definition of $\bar{\lambda}$, $\|\bar{\lambda}\| \le \|\lambda\| + \|(H(\lambda, \bar{\mu}))^{-1}\| \cdot \|g(\lambda, \bar{\mu})\| \le \gamma + \eta$, where $\gamma$ is the constant defined earlier and $\eta = \frac{1}{2\sigma_1}\left(\frac{(1-\rho)\beta}{\rho} + \frac{\varepsilon}{\rho}\right)$, from Theorem 2 and the bound $\|(H(\lambda, \bar{\mu}))^{-1}\| \le 1/(2\sigma_1)$ established earlier. So $\|\bar{\lambda}\|^2 \le (\gamma + \eta)^2$. Since $\tilde{\lambda} = (\tilde{\lambda}_j) = (1 - \theta)\lambda + \theta\bar{\lambda}$ for some $0 \le \theta \le 1$, $\tilde{\lambda}_j^2 \le \max\{\lambda_j^2, \bar{\lambda}_j^2\} \le \max\{\|\lambda\|^2, \|\bar{\lambda}\|^2\} \le (\gamma + \eta)^2$. Substituting this in (10) yields

$\|g(\bar{\lambda}, \bar{\mu})\|^2 \le \left(\frac{12(\gamma + \eta)}{\bar{\mu}}\right)^2 \sum_{j=1}^n (\bar{\lambda}_j - \lambda_j)^4$   (11)

Now, $\sum_{j=1}^n (\bar{\lambda}_j - \lambda_j)^4 \le \left(\sum_{j=1}^n (\bar{\lambda}_j - \lambda_j)^2\right)^2 = \|\bar{\lambda} - \lambda\|^4 \le \left(\|(H(\lambda, \bar{\mu}))^{-1}\|^2 \|g(\lambda, \bar{\mu})\|^2\right)^2 \le \left(\frac{1}{2\sigma_1}\right)^4 \|g(\lambda, \bar{\mu})\|^4$. Substituting in (11), we get

$\|g(\bar{\lambda}, \bar{\mu})\| \le \frac{3(\gamma + \eta)}{\sigma_1^2 \bar{\mu}}\, \|g(\lambda, \bar{\mu})\|^2$   (12)

Now, $\eta = \frac{1}{2\sigma_1}\left(\frac{(1-\rho)\beta}{\rho} + \frac{\varepsilon}{\rho}\right) \le \frac{1}{2\sigma_1}(\beta + 2\varepsilon)$, since $\frac{1}{2} \le \rho < 1$. So, if we take $\bar{\alpha} = \frac{3}{\sigma_1^2 \bar{\mu}}\left(\gamma + \frac{\beta + 2\varepsilon}{2\sigma_1}\right)$, then from (12), $\|g(\bar{\lambda}, \bar{\mu})\| \le \bar{\alpha}\, \|g(\lambda, \bar{\mu})\|^2$. ∎

We will now formally describe the detailed steps in the algorithm, and show that there exists a value for the factor $\rho$ strictly less than 1 which ensures that all the points $\lambda$ obtained in the algorithm are within the envelope around the exterior path mentioned earlier, and that the algorithm terminates in a finite number of steps. Let $\varepsilon$ be the desired accuracy and $\mu_*$ the specified target value for the penalty parameter. The aim is to find a $\lambda$ which satisfies $\|g(\lambda, \mu)\| \le \varepsilon^*$ for some $\mu \le \mu_*$, where $\varepsilon^*$ is defined below. Let $\lambda^0 = Q^{-1}q$. If $\lambda^0 \ge 0$, then $q \in \mathrm{Pos}(Q)$, hence it is itself the nearest point in $\mathrm{Pos}(Q)$, and $\lambda^0$ is optimal to (1); terminate. If $\lambda^0 \not\ge 0$, define

$\bar{\varepsilon} = \|g(\lambda^0, \mu_0)\| = \frac{4}{\mu_0}\left(\sum\{(\lambda_j^0)^6 : \text{over } j \text{ such that } \lambda_j^0 < 0\}\right)^{1/2}$

(note that $Q\lambda^0 = q$, so only the penalty term contributes to $g(\lambda^0, \mu_0)$). Define $\gamma = \max\{\gamma(\mu, \varepsilon) : \mu_* \le \mu \le \mu_0\}$ as before. Define $\beta = 2\|q^T Q\| + 2\|Q^T Q\|\,\gamma$ as before, using this value for $\gamma$. Choose the constants $\varepsilon^*$ and $\rho^*$ so that

$\varepsilon^* \ge \max\{\bar{\varepsilon}, \varepsilon\}, \quad 0 < \rho^* < 1, \quad \bar{\alpha}^*\left(\frac{(1 - \rho^*)\beta + \varepsilon^*}{\rho^*}\right)^2 \le \varepsilon^*$   (13)

where $\bar{\alpha}^*$ is the constant $\bar{\alpha}$ of Theorem 3 for $\varepsilon = \varepsilon^*$ and $\bar{\mu} = \mu_*$. With these choices, $\rho^* < 1$ and $\|g(\lambda^0, \mu_0)\| = \bar{\varepsilon} \le \varepsilon^*$.

The Algorithm

Initialization: Initiate the algorithm with the pair $(\lambda^0, \mu_0)$.

General Step: Let $(\lambda^k, \mu_k)$ be the pair from the end of the previous step. Define $\mu_{k+1} = \rho^* \mu_k$. Take one Newton step from $\lambda^k$ for $f(\lambda, \mu_{k+1})$, i.e.,

$\lambda^{k+1} = \lambda^k - (H(f(\lambda^k, \mu_{k+1})))^{-1} (g(\lambda^k, \mu_{k+1}))^T$

If $\mu_{k+1} \le \mu_*$, $\lambda^{k+1}$ is a point within the desired tolerance of the optimum solution of (1); terminate. Otherwise, with $(\lambda^{k+1}, \mu_{k+1})$ as the new pair, go to the next step.

We will now show that every pair $(\lambda^k, \mu_k)$ obtained in this algorithm satisfies $\|g(\lambda^k, \mu_k)\| \le \varepsilon^*$. Suppose this inequality holds for $k = p$. By Theorem 2, $\|g(\lambda^p, \mu_{p+1})\| \le \frac{(1 - \rho^*)\beta + \varepsilon^*}{\rho^*}$, and then by Theorem 3,

$\|g(\lambda^{p+1}, \mu_{p+1})\| \le \bar{\alpha}^*\, \|g(\lambda^p, \mu_{p+1})\|^2 \le \bar{\alpha}^*\left(\frac{(1 - \rho^*)\beta + \varepsilon^*}{\rho^*}\right)^2 \le \varepsilon^*$

by the choice of $\varepsilon^*$ and $\rho^*$ in (13). Since $\|g(\lambda^0, \mu_0)\| = \bar{\varepsilon} \le \varepsilon^*$, by induction $\|g(\lambda^k, \mu_k)\| \le \varepsilon^*$ for all $k$.

Since $\rho^* < 1$, the value of the penalty parameter decreases strictly in the algorithm. So, after at most $\lceil (\log \mu_* - \log \mu_0)/\log \rho^* \rceil$ steps of the algorithm, the value of the penalty parameter decreases to $\mu_*$ or below, i.e., to at most the target value of $\mu$. We terminate with the vector $\lambda$ at that time as the point near to the optimum solution of (1) to the desired accuracy.

3 Computational Results

We have so far solved several randomly generated nearest point problems in simplicial cones of dimension $n$ ranging from 10 to 1500 with implementations of these exterior point methods. In these problems, each element of $q$ was generated from the uniform distribution between -5 and +5 (of double word length of 8 bytes), and each element of $Q$ was drawn from the uniform distribution between -20 and +20. The vector $q$ and the matrix $Q$ were fully dense in

all the problems generated. None of the matrices $Q$ was checked for singularity, but system (5) always had a solution in the methods which used it to generate descent directions. Even though our convergence results have been proved for the standard Newton method using the composite function $f_4(\lambda, \mu)$, in our computational tests we tried a variety of methods to compare their performance. The various methods tried are listed below.

VERSION 1: Using the Newton direction with step lengths 1: In this version, exactly one descent step is carried out following a decrease in the penalty parameter in each iteration. $f_3(\lambda, \mu)$ and $f_4(\lambda, \mu)$ both have second derivatives at all $\lambda$, but $f_2(\lambda, \mu)$ has second derivatives only at points $\lambda$ in which all components are nonzero. It has been observed that all the $\lambda^k$ generated in the algorithm tended to have all components nonzero, even though some of the components were clearly converging to zero (as explained in [15], this might be due to the fact that exterior methods approach the optimum from outside the feasible region). So we implemented this version also with the composite function $f_2(\lambda, \mu)$, using the formula for the Hessian given in (3), and the method never encountered any problems in thousands of runs in early experiments.

VERSION 2: Using the Newton direction with line searches: Unlike Version 1, here we implemented this version only using the composite function $f_2(\lambda, \mu)$, because of the simplicity of the line search routine for $r = 2$. Again, only one descent step was carried out per iteration.

VERSION 3: Using the steepest descent strategy: We implemented this version using the composite function $f_2(\lambda, \mu)$ and also $f_1(\lambda, \mu)$. Although $f_1(\lambda, \mu)$ is not differentiable at points $\lambda$ with some components zero, the previous discussion on second order differentiability applies here. Unfortunately, the version based on $f_1(\lambda, \mu)$ did not converge in most cases.
A close look at the gradient reveals that this divergence is expected, since the gradient does not carry much information about the negative $\lambda_j$s. In this version we also experimented with carrying out several descent steps per iteration (i.e., several descent steps between changes in the value of the penalty parameter).
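To fix ideas, here is a minimal sketch of Version 1 (one standard Newton step per reduction of the penalty parameter), using the composite function $f_r$ and terminating on near feasibility; the parameter values `rho`, `mu0`, `eps1`, and `max_iter` are illustrative choices of ours, not the settings used in the experiments reported here:

```python
import numpy as np

def version1(Q, q, r=2, rho=0.02, mu0=1e-2, eps1=1e-8, max_iter=200):
    """Version 1 sketch: one Newton step (unit step length) per reduction
    of the penalty parameter, using f_r from (2)-(3).
    Terminates on near feasibility: lam_j >= -eps1 for all j."""
    lam = np.linalg.solve(Q, q)            # lam0 = Q^{-1} q, so Q lam0 = q
    mu = mu0
    for _ in range(max_iter):
        if np.all(lam >= -eps1):           # near-feasibility criterion
            break
        mu *= rho                          # reduce the penalty parameter first
        delta = np.where(lam < 0, (-1.0) ** r, 0.0)
        g = -2 * Q.T @ (q - Q @ lam) + (r / mu) * delta * lam ** (r - 1)
        H = 2 * Q.T @ Q + (r * (r - 1) / mu) * np.diag(delta * lam ** (r - 2))
        lam = lam - np.linalg.solve(H, g)  # one standard Newton step
    return np.maximum(lam, 0.0)            # clip tiny negatives; x* = Q @ lam
```

For a point $q$ already in $\mathrm{Pos}(Q)$, $\lambda^0 = Q^{-1}q \ge 0$ and the loop exits immediately.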

Our convergence theory showed that there exists a value for the factor $\rho$ strictly less than 1 (this is $\rho^*$ defined in (13)) which ensures that Version 1 with $r = 4$ (i.e., with the composite function $f_4(\lambda, \mu)$) terminates with a desired point in a finite number of iterations. To actually implement the methods, we experimented with several trial values for $\rho$ and pursued the one which seemed to give the best performance. This is a very standard practice in implementing optimization algorithms. In the same vein, in our computational studies we experimented with all the composite functions $f_r(\lambda, \mu)$ for $r = 2, 3, 4$, even though our convergence analysis focussed on the composite function $f_4(\lambda, \mu)$, since the current proof depends on the twice continuous differentiability of the gradient. We got significantly better computational results with $r = 2$. In most mathematical programming implementations this phenomenon is typical.

For all versions, the following criteria have been tried as signals for convergence in early experimentation. Small positive tolerances $\varepsilon_1, \varepsilon_2$ have been selected, and the algorithm is terminated in iteration $k$ if the penalty parameter $\mu_k$ and the vector $\lambda^k$ satisfy both the following conditions:

(i) $\lambda_j^k \ge -\varepsilon_1$ for all $j = 1$ to $n$

(ii) $\frac{1}{\mu_k} P_r(\lambda^k) \le \varepsilon_2$

Condition (i) relates to "near feasibility" (within the tolerance $\varepsilon_1$). Both conditions have been used in early experiments, and it always turned out that when one held, the other too held in the same iteration (depending on the choice of suitable values for $\varepsilon_1$ and $\varepsilon_2$). Hence, in later tests, we used condition (i) as the sole termination criterion.

In Figure 1 we provide the plot of $\sum(|\lambda_j| : $ over $j$ such that $\lambda_j < 0)$, which is a measure of the infeasibility of the current $\lambda$-vector, in each iteration of the standard Newton version based on $f_2(\lambda, \mu)$, for a problem of dimension $n = 40$. It can be seen that this infeasibility measure drops very sharply and becomes almost zero in about 6 iterations.

Figure 1: Infeasibility versus # iterations. [plot not reproduced]

For the algorithm to converge, we observed that the penalty parameter $\mu$ must reach a sufficiently small value (like $10^{-12}$; this depends on the desired accuracy $\varepsilon_1$). A lot of experimentation has been done to determine the best value for $\mu_0$, the initial penalty parameter value. The smaller the value of $\mu_0$, the smaller the number of iterations needed to drive $\mu$ to its terminal value. However, since the initial point $\lambda^0 = Q^{-1}q$ is the unconstrained minimum of $f_r(\lambda, \mu)$ only at $\mu = \infty$, choosing a small value for $\mu_0$ puts the unconstrained minimum of $f_r(\lambda, \mu_0)$ far away from $\lambda^0$, resulting in slow convergence. In most experiments, selecting $\mu_0 = 10^{-2}$ yielded excellent results.

In Versions 1 and 2, computationally the most expensive part of each iteration is solving a system of equations ((5) here) to get the descent direction. Since many optimization algorithms require a step like this in every iteration, this has been the object of intense research activity over the years. For our computational runs we experimented with several schemes and found that iterative SOR methods gave the best results; however, the best value for the relaxation parameter $\omega^*$ in these methods seems to increase with $n$ (see Table 2). The performance of Versions 1 and 2 can be improved substantially by using better schemes to

solve (5) in each iteration. All versions were coded in FORTRAN 77 using double precision arithmetic and run on an IBM 3033, or on an APOLLO series 4000. Problems with $n > 700$ were run on the Cornell Supercomputer (IBM 3090).

In Version 1, based on Newton directions and a constant step length of 1, it can be seen from Table 1 that the number of iterations grows extremely slowly, if at all, with the problem dimension. As mentioned earlier, there is tremendous scope for improving the computer time taken for solving (5) in each iteration; hence the number of iterations is a more reliable guide to algorithm performance than computer time, and this is almost independent of problem dimension. This almost constant number of iterations was about 6 for versions based on $f_2(\lambda, \mu)$, 38 for versions based on $f_3(\lambda, \mu)$, and 70 for versions based on $f_4(\lambda, \mu)$. The main reason for this may be the fact that $f_2(\lambda, \mu)$ is very nearly quadratic in $\lambda$, thus making it well suited for the Newton method. Since $f_2(\lambda, \mu)$ is almost quadratic, one Newton step almost always leads very close to the unconstrained minimum of this function in each iteration, thus enabling a much faster reduction in the value of $\mu$. The barrier terms based on logarithmic functions employed by interior point methods do not share this nice property.

In Version 2, based on Newton's method with line searches, the step length ranged between 0.8 and 1.2, and was very close to 1 in most iterations. The total number of iterations was only marginally less than that for Version 1, based on a constant step length of 1.

We compared the performance of Version 1 with M.J.D. Powell's implementation of A. Idnani and D. Goldfarb's dual quadratic programming algorithm (available through IMSL as subroutine QPROG), K. Haskell and R. Hanson's [10] routine available through ACM algorithm 587 for linearly constrained least squares (called HH in Table 2), and with our implementation of D.R. Wilhelmsen's nearest point algorithm [22].
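For readers unfamiliar with SOR, a generic sweep for a symmetric PD system such as (5) looks as follows (a textbook sketch, not the authors' FORTRAN routine; the values of `omega` and the fixed iteration count are illustrative):

```python
import numpy as np

def sor_solve(A, b, omega=1.3, iters=50):
    """Successive over-relaxation (SOR) for a symmetric PD system A x = b;
    a sketch of the kind of iterative scheme mentioned for solving (5).
    omega is the relaxation parameter (0 < omega < 2)."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # residual of row i, excluding the diagonal term
            s = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * s / A[i, i]
    return x
```

For symmetric PD matrices, SOR converges for any $0 < \omega < 2$; the best $\omega$ is problem dependent, as the experiments reported here indicate.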
Our algorithm was superior to each of these, but we display comparative figures only for K. Haskell and R. Hanson's code and QPROG, to conserve space (comparative timings for the other algorithm can be

obtained from the authors). In Version 1 we found that the total number of iterations depends critically on the value of the factor $\rho$ used. We found that once a $\lambda_j$ becomes $> 0$, it remains $> 0$ in almost all subsequent iterations. This may explain the excellent performance.

For Version 3, based on steepest descent directions, we found that it is better to do many descent moves between consecutive updates of the penalty parameter. In each iteration we continue making descent moves as long as there is a significant change in the solution vector $\lambda$. When this change becomes small in the $L_\infty$ norm, we update $\mu$ and go to the next iteration. In comparing Version 3 with those based on Newton directions, one should bear in mind that each move in these methods is computationally cheaper, as there is no equation solving involved. Version 3 is more sensitive than the Newton based versions to the strategy for updating the penalty parameter. In general, the rate of reduction of $\mu$ has to be slower, more so as $n$ increases. The number of descent moves per iteration was almost constant at 15, independent of problem dimension. The number of iterations itself averaged around 37 in problems with $n$ up to 250. From Tables 1, 2, and 3 we see that our current implementation of Version 3 does not compare favorably with the implementations of the Newton based versions, or even with QPROG, in terms of computer time. However, there is tremendous scope to improve the coding of this version. There is also the possibility of using conjugate gradient based directions rather than steepest descent directions; this has not been tested yet.

It has already been mentioned that if a $\lambda_j$ becomes $> 0$, it tends to remain $> 0$ in subsequent iterations. In this problem, if we know the set $J = \{j : \lambda_j > 0$ in the optimum solution$\}$, it is well known that the nearest point itself can be found by orthogonally projecting $q$ onto the linear hull of the face of $K$ corresponding to $J$.
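Given an estimate $J$ of the optimal support, that projection can be sketched as a least squares solve over the columns indexed by $J$ (our illustration; the function name and the use of `numpy.linalg.lstsq` are our choices):

```python
import numpy as np

def project_on_face(Q, q, J):
    """Orthogonally project q onto the linear hull of the face of
    K = Pos(Q) indexed by J, by solving the least squares problem
    min || q - Q[:, J] lam_J ||.  A sketch; names are ours."""
    QJ = Q[:, list(J)]
    lamJ, *_ = np.linalg.lstsq(QJ, q, rcond=None)
    return QJ @ lamJ
```

If the estimated $J$ is correct and the resulting $\lambda_J$ is nonnegative, then $Q_{.J}\lambda_J$ is the nearest point itself.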
In some experiments we selected a tolerance $\varepsilon_3 > 0$ (about $10^{-3}$), and when the current solution vector $\lambda^k$ satisfied $\lambda_j^k \ge -\varepsilon_3$ for all $j$, we took $\{j : \lambda_j^k > \varepsilon_3\}$ as an estimate for $J$ and used the projection

strategy. This has cut the number of iterations by about 1/3 on average.

Table 1: Results with Version 1 (time in IBM 3033 seconds) based on $f_r(\lambda, \mu)$

                      r = 2           r = 3           r = 4
    n   # problems  iter*   time    iter*   time    iter*   time    time for QPROG
   10      200      5.80    0.65    37.40   0.710   69.50   0.77        0.65
   20      200      6.01    0.68    38.00   0.877   70.10   1.07        0.68
   30      200      6.03    0.72    38.10   1.170   70.50   1.62        0.73
   40      200      6.04    0.80    38.13   1.610   70.60   2.44        0.82
   50      200      6.04    0.89    38.40   2.240   70.79   3.59        0.96
  100      100      6.08    1.98    39.10   9.270   71.20  16.35        2.88
  700        1      7.00  339.70      +       +       +       +       652.00

Accuracy = $10^{-8}$. System (5) solved by direct factorization using IMSL routines LFCSF and LSLSF.
* Average number of iterations.
+ Not tried.
Best $\rho$ = .02, .30, .40 for $r$ = 2, 3, 4 respectively.

Table 2: Results with Version 1 based on f_2(λ, μ) on APOLLO series 4000. Time in seconds.

        #      Avg. # of             optimal          time for
  n  problems  iterations    time       ρ       ω*       HH
 10    10        6.0         6.12      .02     1.02     6.07
 50    10        6.0         4.05      .02     1.05     4.50
100    10        6.0        24.0       .02     1.25    34.0
200    10        6.1       150.0       .02     1.30   259.6
300    10        6.3       405.1       .02     1.30   890.0
400    10        6.3       902.3       .02     1.30  2123.0
700     2        6.5      4302.0       .02     1.45 11768.0

Accuracy = 10^-7. System (5) solved by SOR with relaxation parameter ω* (found best by experimentation). In all cases, √n SOR iterations gave an accurate solution. Problems with 700 < n < 1500 were run on a different computer (IBM 3090), and for these the average number of iterations was also 6.5. But we do not show those results in the table because the times on the two computers are not comparable, and the algorithm HH was not available on that system.
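The SOR scheme used to solve system (5) in the runs of Table 2 can be illustrated with a generic successive over-relaxation sweep for a symmetric positive definite system; the relaxation parameter value below is illustrative, not one of the experimentally determined ω* values from the table.

```python
import numpy as np

def sor_solve(A, b, omega=1.3, n_iters=50, x0=None):
    """Successive over-relaxation for a symmetric positive definite
    system A x = b. omega in (0, 2) is the relaxation parameter; the
    values here are illustrative, not the paper's."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for _ in range(n_iters):
        for i in range(n):
            # Gauss-Seidel value for component i (uses updated x[j] for j < i),
            # then over-relax toward it
            sigma = A[i, :] @ x - A[i, i] * x[i]
            x_gs = (b[i] - sigma) / A[i, i]
            x[i] = x[i] + omega * (x_gs - x[i])
    return x
```

For symmetric positive definite A, SOR converges for any ω in (0, 2); each sweep costs only O(n^2) work and no factorization, which is why it is attractive at the problem sizes in Table 2.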

Table 3: Results with Version 3 (time in IBM 3033 seconds) based on f_2(λ, μ)

        #      Avg. # of   descent steps            optimal  time for
  n  problems  iterations  per iteration    time       ρ      QPROG
 10    200        12          14.66        1.807      .1       .565
 20    200        12          15.16        1.170      .1       .670
 30    200        35          14.90        3.760      .5       .727
 40    200        36          15.56        6.260      .5       .815
 50    200        36          13.38        8.080      .5       .960
100    100        36          15.58       32.90       .5      2.33
250      1        37          14.90      197.30       .5     31.7

Accuracy = 10^-3.

4 Exterior Penalty Methods for Nearest Point Problems in Nonsimplicial Cones and Convex Quadratic Programs

Consider the problem of finding the nearest point in Pos(Q) to q ∈ R^n, where Q is an n x m matrix of rank n, with m > n. This problem has the same form as (1) with the present Q; however, an optimum λ in this case may not be unique. If λ* is an optimum solution, then x* = Qλ* is the nearest point in Pos(Q) to q, and x* is always unique. We use the same

composite functions f_r(λ, μ) as before, for r = 2, 3, or 4, and the formulas for g_r(λ, μ) and H(f_r(λ, μ)) are the same as before. However, in this case the Hessian H(f_r(λ, μ)), of order m x m, is PSD and may be singular, and (5) may not have a solution. In this case, we can use one of the standard modifications in the literature for defining the Newton search direction, one of which replaces (5) by

    (H(f_r(λ^k, μ)) + εI) y = -g_r(λ^k, μ)                    (14)

where ε > 0, and I is the unit matrix of order m. The matrix on the left-hand side of (14) is PD, and hence (14) again has a unique solution, which is a descent direction for f_r(λ, μ) at λ^k. The steepest descent direction, and the line search routine for f_r(λ, μ) discussed earlier, remain valid as they are in this case. So all the exterior point methods discussed earlier for the simplicial cone case can be implemented to handle this case directly, with only minor modifications. We conducted some computational experiments in which Q was rectangular, and our results were almost the same as those reported earlier.

Now consider the general convex quadratic program

    minimize    z(x) = cx + (1/2) x^T D x
    subject to  Ax ≥ b
                 x ≥ 0

where D is a symmetric PSD matrix and A is of order m x n. In the penalty approach we solve this problem through the unconstrained minimization problem of the form

    minimize over x ∈ R^n:

    w_r(x, μ) = cx + (1/2) x^T D x + (1/μ) Σ_{i=1}^m (max{0, b_i - A_i. x})^r + (1/μ) Σ_{j=1}^n (max{0, -x_j})^r

Using the function w_r(x, μ), versions of the new exterior penalty algorithms for this problem are constructed exactly as before.

5 Acknowledgements

We acknowledge the many discussions we had with R. Saigal. Our thanks to O.L. Mangasarian for bringing the result on strongly convex functions to our attention, which greatly simplified the proof of Lemma 1. Finally, we are grateful to an anonymous referee for pointing out errors in an earlier version of this paper.

6 References

1. K.S. Al-Sultan, "Nearest Point Problems: Theory and Algorithms," Ph.D. Dissertation, University of Michigan (Ann Arbor, MI, 1990).

2. K.S. Al-Sultan and K.G. Murty, "Nearest Points in Nonsimplicial Cones and LCPs with PSD Symmetric Matrices," in S. Kumar (ed.), Recent Developments in Mathematical Programming, Australian Society for O.R. Special Publication (Gordon & Breach, Camberwell, Victoria, Australia, 1991).

3. M.S. Bazaraa and C.M. Shetty, Nonlinear Programming: Theory and Algorithms (Wiley, NY, 1979).

4. S.Y. Chang and K.G. Murty, "The Steepest Descent Gravitational Method for Linear Programming," Discrete Applied Mathematics, 25 (1989) 211-239.

5. M.B. Daya and C.M. Shetty, "Polynomial Barrier Function Algorithms for Convex Quadratic Programming," Arabian Journal for Science and Engineering, 15, no. 4B (October 1990) 658-670.

6. J.P. Evans, F.J. Gould, and J.W. Tolle, "Exact Penalty Functions in Nonlinear Programming," Mathematical Programming, 4, 2 (1973) 72-97.

7. A.V. Fiacco and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques (Wiley, NY, 1968).

8. E.G. Gilbert, D.W. Johnson, and S.S. Keerthi, "A Fast Procedure for Computing the Distance Between Complex Objects in Three-Dimensional Space," IEEE Journal of Robotics and Automation, 4, 2 (1988) 193-203.

9. G.H. Golub and C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, MD, 1989).

10. K. Haskell and R. Hanson, "An Algorithm for Linear Least Squares Problems with Equality and Nonnegativity Constraints," Mathematical Programming, 21 (1981) 98-118.

11. N. Karmarkar, "A New Polynomial Algorithm for Linear Programming," Combinatorica, 4 (1984) 373-395.

12. M. Kojima, S. Mizuno, and A. Yoshise, "A Polynomial-Time Algorithm for a Class of Linear Complementarity Problems," Mathematical Programming, 44, 1 (1989) 1-26.

13. C.E. Lemke, "Bimatrix Equilibrium Points and Mathematical Programming," Management Science, 11 (1965) 681-689.

14. C.E. Lemke, "On Complementary Pivot Theory," in G.B. Dantzig and A. Veinott (eds.), Mathematics of the Decision Sciences (1968).

15. D.G. Luenberger, Linear and Nonlinear Programming, 2nd edition (Addison-Wesley, Menlo Park, CA, 1984).

16. K.G. Murty, Linear Complementarity, Linear and Nonlinear Programming (Heldermann Verlag, West Berlin, 1988).

17. K.G. Murty and Y. Fathi, "A Critical Index Algorithm for the Nearest Point Problem on Simplicial Cones," Mathematical Programming, 23 (1982) 206-215.

18. T. Pietrzykowski, "An Exact Potential Method for Constrained Maxima," SIAM Journal on Numerical Analysis, 6, 2 (1969) 299-304.

19. B.T. Polyak, Introduction to Optimization (Optimization Software, Inc., NY, 1987).

20. J. Renegar, "A Polynomial-Time Algorithm Based on Newton's Method for Linear Programming," Mathematical Programming, 40, 1 (1988) 59-95.

21. K. Truemper, "Note on Finite Convergence of Exterior Penalty Functions," Management Science, 21, 5 (1975) 600-606.

22. D.R. Wilhelmsen, "A Nearest Point Algorithm for Convex Polyhedral Cones and Applications to Positive Linear Approximations," Mathematics of Computation, 30 (1976) 48-57.

23. P. Wolfe, "Finding the Nearest Point in a Polytope," Mathematical Programming, 11 (1976) 128-149.

24. Y. Ye and E. Tse, "An Extension of Karmarkar's Projective Algorithm for Convex Quadratic Programming," Mathematical Programming, 44, 2 (1989) 157-181.