"On Relationships Between L1, L2 and Loo Minima" K.G. Murty and K.S. Al-Sultan Department of Industrial and Operations Engineering University of Michigan Ann Arbor, MI 48109-2117, USA and Systems Engineering Department King Fahd University of Petroleum and Minerals Dhahran-31261, Saudi Arabia Technical Report 89-26 March 1989

Abstract Let K be a nonempty convex polyhedron { x:Ax > b } in Rn. We investigate the problems of finding the nearest point in K to the origin by the L2 (Euclidean), L1 (Rectilinear), and the L,o (Tschebychev) distances. Problems of this type occur very often in curve fitting and parameter estimation applications, and there has been a long standing controversy on which distance measure to use in such applications, for which there is as yet no mathematically satisfactory resolution. We establish relationships between the sets of optimum solutions of the nearest point problems under these three distance measures. We provide results comparing the distances of the three minima when measured using the same measure (any one of L1, L2, or L,). Finally, we discuss ways in which an optimum solution of one of these problems might be used as an initial point in algorithms for solving the other problems. Keywords: Convex polyhedron; Euclidean, rectilinear and Tschebychev distances; nearest points.

2 1. INTRODUCTION Given a point x = (xi,..., xn)T E Rn the distance between the origin 0 and x, by three commonly used distance measures is defined to be: L2, or Euclidean distance between 0 and x, Ilxl12 = vx 2 +... + x2 = x L1, or rectilinear distance between 0 and x, Ilxl1i = lxi +... + Ixnl Loo, or Tschebychev distance between 0 and x, lxlloo1 = max {Ix1I,..., Ixnl} Given a 6 E R1, 6 > 0, we define Sr(6) = { x: lixllr < 6 }, the Lr- sphere with 0 as center and radius 8, for r = 2, 1, oo. These spheres are drawn in Figure 1 for n = 2.

3 Lo, O 0S(,- ) 8() in R I S.(8) in R (0,8 I (8, 0),) in R2 So(8) in R Figure 1: The L2-, L1- and Lo-. spheres in R2

4 In Rn, S1(6) has 2" facets (one corresponding to each orthant), but only 2n extreme points (one on each half-axis); whereas So,(6) has 2n facets (one corresponding to each half-axis), and 2" extreme points (one in each orthant). Let K = {x:Ax > b} be a given nonempty convex polyhedron in Rn. Finding the nearest point in K to the origin by the three distance measures leads to the following problems: n minimize X, xil i=1 (P1) subject to Ax' b minimize xTx subject to Ax _ b (P2) minimize (max {Ixi I,..., Ixnl}) subject to Ax > b (Poo) (P2) is a strict convex quadratic program (QP). It is well known that (P1) is equivalent to the following linear program (LP) (P'1), and similarly (P,) is equivalent to the LP (P'O,)

5 n minimize, (uj + vj) j=1 subject to Ax > b (P'1) xj - Uj + vj =, j = 1 to n uj, vj, j = 1 to n minimize e subject to Ax > b j - uj + Vj = O, j = 1 to n (P'oo) 0- uj-vj >, j = 1 to n uj, vj 0, j = 1 to n We will refer to the optimum solutions of (P1), (P2), (Po,) as the L1-, L2-, LO,- minima or nearest points respectively. It should be understood that this refers to the nearest points in K to the origin. Being the optimum solution of a strict convex QP, the L2- nearest point in K is unique. Since the LPs (P'1), (P'O,) may have alternate optimum solutions, the L1and Loo- nearest points in K may not be unique. We will use the following notation:

6 x = (xi,..., Xn)T = L2- nearest point in K r1, Fr, = sets of L1-, Loo- nearest points in K respectively 6r = Ilxllr for each xe Fr, r = 1, oo; and 62 = I1x112. Since we assumed that K 0 p, x exists, and rl and r,, are both nonempty. Also, {x) = K n S2(82), and F1 = S1(61) n K, rO = So(8so) n K. Obviously x is a boundary point of both K and S2(82). rF, r2 are both subsets of the boundary of K. Also F1, r2 are subsets of the boundaries of S1(81), SO(6o,) respectively. Nearest point models of the type (Pi), (P2), and (P,) are used very often in research studies involving statistics, science and engineering; for curve fitting, approximation of functions, etc. In a few instances, the choice of distance to use is preordained by circumstances, for example, for driving in a city with a rectangular grid of streets like Manhattan, NY, the rectilinear (Li) distance is the only logical choice. However, in general in most applications all three distance measures make sense, and in such cases a widely debated issue,(Appa and Smith [1973])is: which distance measure to use, L1, L2, or LO,. Unfortunately, no solid scientific principles have come up to prefer one measure over the other. The choice is mostly a matter of personal preference. Sometimes researchers seem to prefer obtaining nearest points by all three measures, and select the best among them based on extraneous practical or political considerations. This opens the need to develop efficient procedures to obtain nearest points by one measure given a nearest point by another. McCormick and Sposito [1976] and Sklar and Armstrong [1982] have suggested some rules that can be used to obtain an optimum Li- point

7 given the L2- minimum, in applications of constrained regression. These rules help in reducing the number of iterations required to solve the LP corresponding to the L1- regression problem. In this paper, we investigate the relationships between the sets {x4, F1, and r,, and ways in which the computation of the others can be simplified given one of them. We also derive results comparing the distances of the L1-, L2-, L,o- minima measured using the same measure (one of L1, L2, or LO). Surprisingly, we find that these could differ by as a factor of approximately fn, or n asymptotically, so in higher dimensions they could be wide apart by any measure. The mysterious'~n or n factors add fuel to the controversy but leave the preferential nature of the choice among these measures unaltered. Philosophically this is unfortunate. It lends support to the skeptical view often expressed by practitioners, "the solution to a practical problem obtained from a correct and highly rigorous mathematical analysis, may depend critically on the way mathematics is used to model the problem."

8 2. SOME ILLUSTRATIVE EXAMPLES x is a boundary point of K, and Fi, Fo are subsets of the boundary of K. Each of the sets r1, rFo is a subset of a face of K. They could be proper subsets of faces of K as illustrated in Figure 2. Each of x, Tr, r, may lie in different faces of K with no intersection, this is illustrated in Figure 3. K m I I L. -I Figure 2: An example in R2 where Fi and Fr are both proper subsets of a face of K. F1 is the thick line segment and ro, is the singleton set containing only the dotted point.

9 K Figure 3: An example in R2 with x, r1 (thick line segment), rO (dashed line) lying in different nonintersecting faces of K.

10 3. RESULTS ON THE SETS r1, {X}, ro. Remember that K = { x:Ax > b }. There may be no sign restrictions on x, and in general K may not be contained in a single orthant of Rn. However, we will show that F1 is always contained in a single orthant of Rn. THEOREM 1: F1 lies in a single orthant of Rn. PROOF: As discussed earlier r1 = S1(81) n K, and it is a subset of the boundaries of both K and S1(81). Let F be a facet of S1(81) containing a point of F1, and H the hyperplane in R" containing F. Hence H separates S1 (1) from K, this implies that S1(56) n K = r1 C F, thus all of F1 is in the orthant of Rn containing F (Figure 4). H Figure 4: The thick line segment is F1 = F n K.

11 RESULT: There may not be a single orthant containing the entire set r,. See Figure 5. r I I I' —'-' —' I I K I I I I'="= -'1I Figure 5: Example in R2. [ot is the thick line segment.

12 4. RELATIONSHIPS AMONG 61, 82, 8,. THEOREM 2: (i) Either all three of 81, 82, 800 are zero, or all three of them are strictly >0. (ii) (a) -'82 <61 ^ /n (b) 86<626 <oo, (C) 86o <1 < n (iii) 61 = 82 = 8o iff x lies on a coordinate axis of Rn. (iv) All the L1-, L2-, Lo- nearest points coincide at a single point iff x lies on a coordinate axis. PROOF: (i) Follows very directly. All three of 61i, 62, 80 are zero if the origin is in K, otherwise all three are > 0.

13 (ii) (a): Let F be a facet of S1(61) containing r1. The farthest points in F from the origin by the Euclidean distance are the points of intersection of the coordinate axes with F whose Euclidean distance is 61. Hence F1 contains points whose Euclidean distance is 681. Since x is the closest point in K D rf by the Euclidean distance, we have 82 < 81. H, the hyperplane containing F separates K from the origin, and the closest point in H to the origin by L2- distance, has distance 81/I, so 82 > 61/n, completing the proof of (a). (ii)(b,c) Let G be a facet of So,(86,) containing rF,. The Euclidean distance of points in G lies between 86, and 60o, see Figure 6. This implies (b) by an argument similar to that in (a). The rectilinear distance of points in G lies between 8, and n 8o, this implies (c) by an argument similar to that in (a). (iii) Suppose x is on a coordinate axis of Rn. Then IIxII1 and IixlILo are both equal to 82. So, from (ii)(a), 81 = 82 and x is also the L1minimum in K. Also, the tangent hyperplane to S2(82) at x is a facet of Soo(62) (as x is a point on a coordinate axis) and it separates the origin and K, hence 860 = 82 also, and x is also an Loo- minimum in K.

14 Conversely, suppose 61 = 82 = 68. The only points in S1(61) whose Euclidean distance is also 68 are the points on the coordinate axes. Hence 61 = 82 implies that the L1- minimum in K must be a point on a coordinate axis, and by the above this implies that this same point is also the L2- and LO- minimum in K. (iv) This follows from (iii) and the fact that the only points which are common to the boundaries of L2-, L1-, and Lo,- spheres of the same radius are those on the coordinate axes. K Figure 6.

15 5. COMPARISON OF THE DISTANCES OF THE THREE MINIMA, USING THE SAME MEASURE. To help choose one of the three L1-, L2- or L=,- distance measures, we need comparative figures on how far the three minima could be spread out. For example, if the distance of the L2- minimum, x, is measured in Li- distance, how much greater than 61 can it be in the worst case? If this difference is small, we could consider it fairly unimportant which among L1 or L2 is selected as the distance measure. We obtain results on such comparative questions for every pair of distance measures in this section. We have the following definitions: 2 2 J1., -oo. 1 1 J2, (,oo), T1, (r2), the L1- and LO, distances of x. the L2in F1. the L1in Fr. (Loo-) distance of the nearest L2- (Lo-) point to the origin (L2-) distance of the nearest L1- (L2-) point to the origin So, in this notation, the superscript on gp gives the type of minimum (L1, or L2, or Lo,), and the subscript gives the distance measure used to measure its distance. THEOREM 3: We have (1)

16 PROOF: Clearly 1' 61. To prove the upper bound, we look for the 2 maximum value of g2 with 61 fixed. The smallest L2- sphere containing S1(61) is S2(81), and x must lie in it. The maximum value of lxil1i over S2(61) is 81\n, and hence IxI111 -< 861X, which implies 2 < 81'F. 0 EXAMPLE 1: A POLYHEDRON ATTAINING HALF OF THE UPPER BOUND IN (1) ASYMPTOTICALLY WITH n. Let K3 be the half-space defined by n-1 xi ( —j7= +xn > 61 (2) i=1 1 + ~In where 61 > 0. The L1- nearest point in K3, is unique and it is (0,..., 0, 61). The L2nearest point in K3 is x = (X/a,..., X/a, )T where a = 1 + i n and X = 61 / (1 + (n-1) / (1 + Xn)2). So for K3, we have + 1(1 + n)2 I cn b ii i n si () s (1 It can be verified that the right hand side of (3) approachesa()~1 (1+ \fn) asymptotically (i.e., as n gets larger).

17 NOTE 1: The proof of the upper bound 81 n for P1 in Theorem 3 is extremely simple, but we believe that bound is not tight. We believe that the right hand side in (3), which approaches (f)81 (1 + n) asymptotically is an upper 2 bound for j12 (this is actually achieved in the polyhedron K3 in Example 1) but its proof is messy. THEOREM 4: We have 2< < 8 oo 8 oo oo (4) PROOF: Clearly 1, = 6,o. To prove the upper bound, we look for the 2 maximum value of Ioo with 6~ fixed. K has a nonempty intersection with the boundary of SO,(56o) but not its interior, and the smallest L2- sphere containing Soo(6o) is S2(8o, n), and 7 must lie in it. The maximum value of Ilxlloo over S2(6, oo ) is 60,0 n, and hence |II1|0| < 80. 4, which implies 82 < 6o v. O EXAMPLE 2: A POLYHEDRON ATTAINING HALF OF THE UPPER BOUND IN (4) ASYMPTOTICALLY WITH n. Let K4 be the half-space defined by n-1 xi - + xn > 6 (5) i=1 X

18 Where 8~0 > 0. The LO,- nearest point in K4 is unique and it is (,,..., 5.o)T. The L2- nearest point in K2 is x = (/in,..., XI/-n, X)T where X = oo 1 + )-(1 + n-) Its Loo- distance is 2 x, /(n - 1 + n)/ (6) I'[Soo =Xi2n - 1(6) It can be verified that the right hand side of (6) approaches (6,o/2) (1 + \ n) asymptotically. 2 NOTE 2: Again, the proof of the upper bound L; Fn for g., in Theorem 4 is very simple, but we believe that this bound is not tight. We conjecture that the right hand side of (6), which approaches (6o/2) (1 + In) asymptotically, is an upper 2 bound for Lo, this is achieved in the polyhedron K4 in Example 2. THEOREM 5: We have <22 <p82= n (7) PROOF: As in the proofs of Theorems 3, 4, this follows because the smallest Lo- sphere containing S2(62) is So,(2) and the maximum value of lIx1l2 over So(62) is 62. 0

19 THEOREM 6: We have 82 < J2 < 82 X (8) PROOF: As in the proofs of Theorems 3, 4, this follows because the smallest L1- sphere containing S2(82) is S1(82 I.) and the maximum value of lIx1l2 over S1(52 Vn) is 82 8 n. 0 THEOREM 7: We have 8o < < n 8 (9) PROOF: Again this follows because the smallest L1- sphere containing Soo(86o) is S1(n8,), and the maximum value of l|xl|l over S1(n8o) is n8oo. THEOREM 8: We have 81 < 1. < n8i (10) PROOF: This follows because the smallest Loo- sphere containing S1(81) is So,(81) and the maximum value of Ilxl1i over So(81) is n81..

20 EXAMPLES TO SHOW THAT THE UPPER BOUNDS IN THEOREMS 5 TO 8 ARE TIGHT IN THE LIMIT. For Theorem 5, take S2(82) and the tangent hyperplane, Hs, to it at a point on its boundary in the interior of Rn but very close to (0,..., 0, 62)T. See Figure 7. Let K5 be the half-space defined by H5 not containing the origin. Then i&' for K5 can be verified to be very close to 82'n. K5 H5 I N / I _. Figure 7: Construction to show that the upper bound in Theorem 5 is tight in the limit.

21 For Theorem 6, take S2(82), and the tangent hyperplane to it, H6, at a boundary point close to (62/',..., &2/Th)T. See Figure 8. Let K6 be the half-space defined by H6 not containing the origin. Then for K6, I.2 can be verified to be very close to 82 Vn. K6 6 Figure 8: Construction to show the tightness of the upper bound in Theorem 6.

22 For Theorem 7, take H7, the facetal hyperplane of Si(n60) in the nonnegative orthant, and tilt it slightly keeping the point (5.,..., 6,)T fixed. Let K7 be the half-space defined by this hyperplane not containing the origin. See Figure 1 9. It can be verified that for K7, gmo is close to nBo. K7 H7 a,_ I_/ Figure 9: Construction to show the tightness of the upper bound in Theorem 7.

23 For Theorem 8, take S,(81), and keeping the point (0, 0,..., 0, 81)T fixed, tilt its facetal hyperplane containing this point slightly, and let K8 be the half-space not containing the origin, defined by this tilted plane. See Figure 10. It can be verified that I~' for K8 is very close to n81. K8 (0,..., 81 ) I\ I_ Figure 10: Construction to show the tightness of the upper bound in Theorem 8.

24 6. HOW TO USE A SOLUTION OF ONE OF THE PROBLEMS, AS A GOOD INITIAL POINT TO SOLVE THE OTHERS. L1-, Loo- nearest point problems are LPs, while the L2- nearest point problem is a QP. A variety of methods are available to solve each of these problems. Here we show how to use a given solution to one of these problems as a good initial point in algorithms to solve the others. Suppose (P'o) is solved by a variant of the simplex algorithm, and let x = (xj) be the Loo- nearest point obtained from an optimum basic feasible solution (BFS) of (P'o). Define vj = I minimum { 0, xj}, uj = maximum { 0, xj }, for j = 1 to n, u = (uj), v = (vj). Then (x, u, v) is a BFS of (P'1) and can be used to initiate the solution of (P'1) by the simplex algorithm. Similarly, if (x, u, v) is an optimum BFS of (P'I), define 6 = Maximum {uj + v: j = 1 to n }, then (x, u, v, ) is a BFS of (P'o) and can be used to initiate the solution of (P'oo) by the simplex algorithm. Now, let w = Ax - b, the vector of slack variables, and z, the vector of Lagrange multipliers, associated with the constraints in (P2). Then the linear complementarity problem (LCP) corresponding to (P2) is w - AATz=-b w 0, z 0 (11) wTZ = 0. and if (w, z) is a solution of the LCP (11), then x = ATI is an optimum solution of (P2). qiven x1, an L1- or LO,- nearest point, let w1 = Ax1 - b, g = -b - w1. Since wl > 0, g = O0 iff b < 0, in this case OE K and hence 0 is the L1-, L2-, and Lo,- nearest point. Assuming that this is not the case, g ~ 0. Introduce an artificial variable zo

25 associated with the column vector g in the system of linear equality constraints in (11) leading to w - AATz + o g = -b (12) Then (w, z, Zo) = (wl, z1 = 0, zO = 1) is a nonnegative solution of (12). From the results in Section 1, at least one component in w1 is zero, hence (wl, z1, zo) is an almost complementary basic feasible solution of (12) and the LCP (11) can be solved by applying the complementary pivot algorithm on (12) beginning with this almost complementary BFS [Lemke [1965], Cottle and Dantzig [1968], Murty [1988]]. Given the L2- nearest point x, a BFS of (P'1) or (P',) can be constructed from it by standard techniques, and this can be used to initiate the solution of those problems by the simplex algorithm.

26 7. Acknowledgements: This research has been partially supported by NSF grant No. ECS-8521183, and by King Fahd University of Petroleum and Minerals, Dhahran-31261, Saudi Arabia. This support is acknowledged. We are grateful to John R. Birge for pointing out an error in an earlier version of this paper.

27 1. G. Appa, and C. Smith "On L1 and Chebychev Estimation," Mathematical Programming, 5 (1973) 73-87. 2. R. W. Cottle and G. B. Dantzig, "Complementary Pivot Theory of Mathematical Programming," Linear Algebra and Its Applications, 1 (1968) 103-125. 3. C. Lemke, "Bimatrix Equilibrium Points and Mathematical Programming," Management Science, 11 (1965) 681-689. 4. G. F. McCormick, and V. A. Sposito "Using the L2- Estimator in L1Estimation," SIAMJ. Numerical Analysis, 13, No. 3 (June 1976) 377-342. 5. K. G. Murty, Linear Complementarity, Linear and Nonlinear Programming (Heldermann Verlag, West Berlin, 1988). 6. M. G. Sklar, and R. D. Armstrong "Least Absolute Value and Chebychev Estimation Utilizing Least Squares Results," Mathematical Programming, 24 (1982) 346-352.