Pure Adaptive Search in Global Optimization

Zelda B. Zabinsky, Industrial Engineering Program, FU-20, University of Washington, Seattle, Washington 98195

Robert L. Smith, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117

Technical Report 89-22, June 1989

Abstract

Pure adaptive search iteratively constructs a sequence of interior points uniformly distributed within the corresponding sequence of nested improving regions of the feasible space. That is, at any iteration, the next point in the sequence is uniformly distributed over the region of feasible space containing all points that are equal or superior in value to the previous points in the sequence. The complexity of this algorithm is measured by the expected number of iterations required to achieve a given accuracy of solution. We show that for global mathematical programs satisfying the Lipschitz condition, its complexity increases at most linearly in the dimension of the problem.

Key Words: Random search, Monte Carlo optimization, global optimization.

1 Introduction

In this paper, we provide a theoretical analysis of pure adaptive search for general global optimization (see [18,19,20] for a survey of the field). The algorithm proceeds by generating a sequence of points uniformly distributed in a sequence of nested regions of the feasible space. At any iteration, the next point in the sequence is uniformly distributed over the region of feasible space containing all points that are equal or better in value than the previous points in the sequence. It has been shown [16] that for convex programs, the number of iterations required to achieve a given accuracy of solution increases at most linearly in the dimension of the problem. In this paper, we extend this linear complexity result to non-convex, or global, optimization problems satisfying the Lipschitz condition. We do this by modeling the sequence of values for points obtained by pure adaptive search as a nonhomogeneous Poisson process. A bound on the rate of this process is used to obtain an upper bound on the expected number of iterations that is a linear function of the dimension of the problem.

Although at this time there is no known efficient implementation of the pure adaptive search algorithm, the theoretical result of linear time complexity for global optimization is interesting in itself. Pure adaptive search is not practical because the principal computational effort of the algorithm lies in generating a point uniformly distributed in the improving region. At present, this is a challenging problem with no satisfactory solution. However, the linearity result suggests there is hope for an efficient random search method for global optimization. In fact, several random search algorithms have reported linearity in dimension [23,24,26], although only for convex programs.
Pure adaptive search is analogous to a randomized method of centers [12], which itself is not very practical but has been the precursor to a class of extremely efficient projective and affine scaling methods [8,10,13,17] for linear programming. Our hope is that pure adaptive search can similarly inspire better random search methods for global programming.

2 Pure Adaptive Search

Consider the following global mathematical program,

(P)   min f(x)   subject to x ∈ S,

where x is an n-dimensional vector, S is a convex, compact subset of R^n, and f is a real-valued continuous function defined over S. We will assume that f satisfies the Lipschitz condition with Lipschitz constant k_f, i.e.,

|f(x) - f(y)| ≤ k_f ||x - y||

for all x and y ∈ S, where ||·|| is the Euclidean norm on R^n. To ensure a proper sampling distribution, we further assume that f has no flat spots, i.e., that ∇f(x) ≠ 0 for almost all x ∈ S. Let the optimal solution to (P) be denoted by (x*, y_*), where

x* = argmin_{x ∈ S} f(x)

and

y_* = f(x*) = min_{x ∈ S} f(x).

It will also be convenient to define

y^* = max_{x ∈ S} f(x).

Note that we do not require that a unique minimum exists at x*. If there are multiple minima, let x* be an arbitrary fixed minimum.

The pure adaptive search (PAS) procedure for solving (P) begins by generating a point X_1 uniformly distributed within the feasible region S. The associated objective function value is labeled W_1 = f(X_1). The next point is generated from a uniform distribution over the region formed by the intersection of the feasible region with the level set of points with objective function values equal to or less than W_1. The procedure proceeds iteratively in this fashion until a stopping criterion is satisfied. More formally,

Pure Adaptive Search (PAS)

Step 0. Set k = 0, S_0 = S, and W_0 = y^*.

Step 1. Generate X_{k+1} uniformly distributed in S_{k+1} = {x: x ∈ S and f(x) ≤ W_k}.

Step 2. Set W_{k+1} = f(X_{k+1}). If a stopping criterion is met, stop. Otherwise set k = k + 1 and return to Step 1.

The difficulties of implementing pure adaptive search are discussed in Patel, Smith, and Zabinsky [16]. In this paper we are, however, interested in its computational complexity as a model for other possible algorithms that share some of the same features as pure adaptive search. For example, if another algorithm can generate random points (uniform or not) with associated objective function values stochastically less than those of uniformly distributed points, then the performance of the new algorithm will be bounded by the performance of pure adaptive search. Thus a theoretical result of linear time complexity for PAS supports research in other random search algorithms for global optimization.
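To make the procedure concrete, the PAS loop can be sketched in a few lines of Python, with Step 1 realized by naive acceptance-rejection: draw uniform points in S until one improves on W_k. The function and variable names here are ours, and the rejection loop is exactly the computational bottleneck discussed above, not a recommended implementation:

```python
import random

def pure_adaptive_search(f, sample_S, iters, seed=0):
    """Naive PAS sketch.  Step 1 is realized by acceptance-rejection:
    uniform points in S are drawn until one improves on the incumbent
    value W_k.  sample_S(rng) must return a uniform point in S."""
    rng = random.Random(seed)
    x = sample_S(rng)
    w = f(x)
    values = [w]                      # W_1, W_2, ...
    for _ in range(iters - 1):
        y = sample_S(rng)
        while f(y) >= w:              # reject until the draw improves on W_k
            y = sample_S(rng)
        x, w = y, f(y)
        values.append(w)
    return x, values

# Toy instance: minimize f(x) = sum x_i^2 over the cube S = [-1, 1]^5.
f = lambda x: sum(xi * xi for xi in x)
cube = lambda rng: [rng.uniform(-1.0, 1.0) for _ in range(5)]
best, vals = pure_adaptive_search(f, cube, iters=6)
assert len(vals) == 6
assert all(a > b for a, b in zip(vals, vals[1:]))   # strictly improving
```

Even on this toy instance the rejection loop performs on the order of e^k draws to produce k improvements, which is the behavior quantified in Theorem 3.2 below.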

3 A Comparison with Pure Random Search

It is instructive to compare pure adaptive search with pure random search [4,6,7]. Pure random search (PRS) generates a sequence of independent, uniformly distributed points in the feasible region. When a stopping criterion is met, the best point of the sequence generated thus far is used as an approximation to the optimal solution.

For the global optimization problem (P), consider the stochastic process {W_k, k = 0,1,2,...} of objective function values generated by PAS, and let {Y_k, k = 0,1,2,...} be the corresponding sequence of values for the points generated by PRS, so that Y_k = f(X_k), k = 1,2,..., where X_1, X_2,... are independent and uniformly distributed over S. For convenience, define W_0 = Y_0 = y^*.

Let p(y) be the probability that a point in the sequence generated by PRS has an objective function value less than or equal to y, that is,

p(y) = P(Y_k ≤ y)   for k = 1,2,... and y_* ≤ y ≤ y^*.

Note that p(y) is the same for all k since PRS generates identically distributed points. Also, due to the uniform distribution employed in PRS, we have

p(y) = v(S(y))/v(S),

where S(y) = {x: x ∈ S and f(x) ≤ y} and v(·) denotes Lebesgue measure. Incidentally, although a uniform distribution is used in both the PRS and PAS algorithms, much of the subsequent analysis holds for nonuniform absolutely continuous distributions. We have for pure adaptive search that

P(W_{k+1} ≤ y | W_k = z) = v(S(y))/v(S(z)) = p(y)/p(z)

for any k, where y_* ≤ y ≤ z ≤ y^*.

We now establish a fundamental relationship between the iterates of PAS and PRS. The lemma below states that the record values of pure random search are equal in distribution to the values generated by pure adaptive search.

Definition. A record is said to occur at epoch i for the sequence {Y_k, k = 0,1,2,...} if Y_i < min(Y_0, Y_1, ..., Y_{i-1}). The corresponding value Y_i is called a record value.
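The record mechanism is simple to state in code. The helper below (our naming) filters a PRS value sequence down to its record values; Lemma 3.1 asserts that this filtered subsequence is distributed exactly as the PAS values {W_k}:

```python
import random

def record_values(ys):
    """Extract the record values of a sequence: y_i is a record when it
    is strictly smaller than every earlier value in the sequence."""
    recs, best = [], float("inf")
    for y in ys:
        if y < best:
            best = y
            recs.append(y)
    return recs

rng = random.Random(1)
ys = [rng.random() for _ in range(1000)]   # PRS values on a toy problem
recs = record_values(ys)
assert recs[0] == ys[0] and recs[-1] == min(ys)
assert all(a > b for a, b in zip(recs, recs[1:]))
```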
Lemma 3.1 For the global optimization problem (P), the stochastic process {W_k, k = 0,1,2,...} is equal in distribution to the process {Y_{R(k)}, k = 0,1,2,...}, where R(k) is the epoch of the kth record of the sequence {Y_k, k = 0,1,2,...}, i.e.,

{W_k, k = 0,1,2,...} =_d {Y_{R(k)}, k = 0,1,2,...}.

In particular, P(W_k ≤ y) = P(Y_{R(k)} ≤ y) for k = 0,1,2,... and y_* ≤ y ≤ y^*.

Proof: See Appendix. ∎

An intuitive understanding of the previous lemma follows from the property that a point X uniformly distributed over a region S is conditionally uniform over a region S' ⊆ S when given that X is in S'. It follows that a simple acceptance-rejection approach to generating {W_k} would be to generate {Y_k} and select the record values {Y_{R(k)}}.

Theorem 3.2 Let k and R(k) be respectively the number of PAS and PRS iterations needed to attain an objective function value of y or better, for y_* < y < y^*. Then

R(k) = e^{k+o(k)}   with probability 1,

where lim_{k→∞} o(k)/k = 0 with probability 1.

Proof: We have by definition that W_{k-1} > y ≥ W_k. However, by the previous lemma, this holds for a given k with the same probability that Y_{R(k-1)} > y ≥ Y_{R(k)} holds. From [9, p. 298], the record epochs R(k) of a sequence of continuous independent and identically distributed random variables satisfy

lim_{k→∞} ln R(k) / k = 1   with probability 1.

This implies that ln R(k) = k + o(k) w.p. 1, and thus R(k) = e^{k+o(k)} w.p. 1. ∎

The result in Theorem 3.2 states that the number of PRS iterations needed to reach the kth record minimum, R(k), grows exponentially in the number of iterations of PAS, k, needed to reach an equivalent minimum. Thus the complexity of PRS is exponentially greater than that of PAS. Of course, each iteration of PAS may be more difficult than an iteration of PRS since the search region changes with each iteration. It is nonetheless interesting that the simple device of forcing monotone value improvement on PRS achieves an exponential improvement in the iterations required.
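A small simulation illustrates the growth rate in Theorem 3.2 read in reverse: since R(k) = e^{k+o(k)}, n iterations of PRS deliver only about ln n records. This sketch (our code, with uniform values standing in for any continuous i.i.d. sequence) counts records among n draws and compares the average with ln n:

```python
import math
import random

def count_records(n, rng):
    """Number of record minima among n i.i.d. uniform draws -- i.e. how
    many PAS-quality improvements n iterations of PRS deliver."""
    best, records = float("inf"), 0
    for _ in range(n):
        y = rng.random()
        if y < best:
            best, records = y, records + 1
    return records

# Theorem 3.2 in reverse: n PRS draws yield roughly ln(n) records.
# Here ln(100000) ~ 11.5, and the expected record count is ~12.1.
rng = random.Random(42)
n, runs = 100_000, 20
avg = sum(count_records(n, rng) for _ in range(runs)) / runs
assert 9.0 < avg < 15.0
```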

4 The Distribution of Improvement

We turn now to establishing the distribution of the values {W_k, k = 0,1,2,...} obtained by pure adaptive search.

Lemma 4.1 Let Z_1, Z_2, ... denote a sequence of independent and identically distributed nonnegative continuous random variables whose hazard rate function is given by λ(z) = f(z)/(1 - F(z)), where f and F are respectively the density and cumulative distribution function of the Z_i. Let M(z) denote the number of record values (maxima) of {Z_i, i = 1,2,...} less than or equal to z. Then {M(z), z ≥ 0} is a nonhomogeneous Poisson process with intensity function λ(z) and mean value function m(z) = ∫_0^z λ(s) ds.

Proof: See [22, p. 47]. ∎

Applying the lemma above, let Z_k be the relative improvement obtained on the kth iteration of PRS, where Z_k = (y^* - Y_k)/(Y_k - y_*). Then the assumptions of Lemma 4.1 are satisfied, since Z_0 = 0 and {Z_k, k = 1,2,...} are independent identically distributed nonnegative continuous random variables. The cumulative distribution function F of Z_k, k = 1,2,..., can be written in terms of p(y) as follows. For z ≥ 0, Z_k ≤ z if and only if Y_k ≥ (y^* + z y_*)/(1 + z), so that

F(z) = P(Z_k ≤ z) = 0 if z < 0,  and  F(z) = 1 - p((y^* + z y_*)/(1 + z)) if 0 ≤ z < ∞.

Since M(z) counts the number of records of {Z_k, k = 0,1,2,...} with values less than or equal to z, M(z) is by Lemma 3.1 equal in distribution to the number N(z) of PAS iterations achieving a relative improvement of z or less.

Theorem 4.2 Let N(z) equal the number of PAS iterations achieving a relative improvement of at most z, for z ≥ 0. Then {N(z), z ≥ 0} is a nonhomogeneous Poisson process with mean value function

m(z) = ln(1/p((y^* + z y_*)/(1 + z)))   for 0 ≤ z < ∞.

Proof: By definition,

m(z) = ∫_0^z λ(s) ds,

where λ(s) = f(s)/(1 - F(s)). Making the substitution t = 1 - F(s) yields

m(z) = -∫_1^{1-F(z)} dt/t = -ln(1 - F(z))   for z ≥ 0.

Now, 1 - F(z) = p((y^* + z y_*)/(1 + z)), and hence

m(z) = ln(1/p((y^* + z y_*)/(1 + z)))   for 0 ≤ z < ∞. ∎

It is now an easy matter to obtain the distribution of the objective function values obtained through pure adaptive search. In particular, since W_k ≤ y if and only if N((y^* - y)/(y - y_*)) < k, and W_k is a continuous random variable,

P(W_k ≤ y) = P(N((y^* - y)/(y - y_*)) < k),

where by Theorem 4.2, N(z) is a Poisson distributed random variable with mean m(z) = ln(1/p((y^* + z y_*)/(1 + z))). We therefore have

Theorem 4.3

P(W_k ≤ y) = Σ_{i=0}^{k-1} p(y) (ln(1/p(y)))^i / i!   for k = 1,2,... and y_* < y ≤ y^*.

Proof: See Appendix. ∎

There are several problem classes in the literature where the asymptotic distribution for large sample sizes of PRS and PAS has been obtained (see for example [1,5,11,15,16]). The result in Theorem 4.3 is particularly striking in that it provides the exact distribution of values generated by PAS for all sample sizes and all global optimization problems.
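Theorem 4.3 is easy to evaluate numerically. The sketch below (function name ours) computes the finite sum and checks it by direct simulation on the one-dimensional instance f(x) = x over S = [0, 1], for which p(y) = y and each PAS iterate is uniform on (0, W_k):

```python
import math
import random

def pas_value_cdf(p, k):
    """Theorem 4.3: P(W_k <= y) = sum_{i=0}^{k-1} p (ln(1/p))^i / i!,
    where p = p(y) is the probability that a uniform point of S lands
    in the level set S(y)."""
    log_term = math.log(1.0 / p)
    return sum(p * log_term**i / math.factorial(i) for i in range(k))

# Check on f(x) = x over S = [0, 1]: here p(y) = y, and PAS is easy to
# simulate exactly since W_{k+1} is uniform on (0, W_k).
rng = random.Random(0)
k, y, trials = 5, 0.05, 200_000
hits = 0
for _ in range(trials):
    w = 1.0
    for _ in range(k):
        w *= rng.random()          # W_{j+1} = W_j * U with U ~ Uniform(0,1)
    hits += w <= y
assert abs(hits / trials - pas_value_cdf(y, k)) < 0.01
```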

5 Performance Bounds

A simple measure of the performance of pure adaptive search is the number of iterations N*(y) required to achieve a value of y or better. Since an objective function value of y corresponds to a relative improvement of z = (y^* - y)/(y - y_*), we have that N*(y) = N((y^* - y)/(y - y_*)) + 1. The distribution of N*(y) then follows from Theorem 4.2.

Corollary 5.1 The cumulative distribution function of N*(y), the number of iterations of PAS needed to achieve a value of y or better, is given by

P(N*(y) ≤ k) = Σ_{i=0}^{k-1} p(y) (ln(1/p(y)))^i / i!

for k = 1,2,... and y_* < y ≤ y^*. The expected value of N*(y) is given by

E(N*(y)) = 1 + ln(1/p(y))   for y_* < y ≤ y^*.

As seen in Theorem 4.2 and Corollary 5.1, performance measures of PAS depend on the function p(y) for y_* < y ≤ y^*, where p(y) is the probability of obtaining an objective function value of y or better when selecting a feasible point at random according to a uniform distribution. We now derive a bound on p(y) for the class of global optimization problems with objective functions that satisfy the Lipschitz condition over a convex feasible region. The bound is a function of the dimension n of the problem, the Lipschitz constant k_f of the objective function, and the diameter d_S of the feasible region, where d_S = max{||w - v||: w, v ∈ S}.

Lemma 5.2 For the global optimization problem (P) over a convex feasible region S in n dimensions with diameter d_S and Lipschitz constant k_f for the objective function f,

p(y) ≥ ((y - y_*)/(k_f d_S))^n   for y_* ≤ y ≤ y^*.

Proof: See Appendix. ∎

From the above lemma together with Corollary 5.1, we get the main result of this paper.
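The quantity p(y) driving Corollary 5.1 can itself be estimated by Monte Carlo on any given instance. The sketch below (helper names ours) uses the example f(x) = |x| on S = [-2, 2] from the Appendix, for which p(y) = y/2 exactly, and then plugs the estimate into E(N*(y)) = 1 + ln(1/p(y)):

```python
import math
import random

def estimate_p(f, sample_S, y, trials=100_000, seed=7):
    """Monte Carlo estimate of p(y) = v(S(y))/v(S): the probability that
    a uniform point of the feasible region has objective value <= y."""
    rng = random.Random(seed)
    hits = sum(f(sample_S(rng)) <= y for _ in range(trials))
    return hits / trials

# f(x) = |x| on S = [-2, 2]: S(y) = [-y, y], so p(y) = y/2 exactly.
p_hat = estimate_p(abs, lambda rng: rng.uniform(-2.0, 2.0), 0.5)
assert abs(p_hat - 0.25) < 0.01

# Corollary 5.1 then estimates the expected PAS effort to reach y = 0.5.
assert abs((1 + math.log(1 / p_hat)) - (1 + math.log(4.0))) < 0.1
```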

Theorem 5.3 For all global optimization problems (P) over a convex feasible region in n dimensions with diameter at most d and with Lipschitz constant at most k,

E(N*(y)) ≤ 1 + [ln(kd/(y - y_*))] n   for y_* < y ≤ y^*.

Proof: This follows immediately from Lemma 5.2, where p(y) ≥ ((y - y_*)/(kd))^n implies that 1/p(y) ≤ (kd/(y - y_*))^n, and from Corollary 5.1,

E(N*(y)) = 1 + ln(1/p(y)) ≤ 1 + [ln(kd/(y - y_*))] n. ∎

From the above theorem, we conclude that the expected number of PAS iterations grows at most linearly in the dimension for the class of problems with finite Lipschitz constant k and feasible region diameter d. This is in dramatic contrast to PRS, where from Theorem 3.2 we know that the expected number of iterations will be an exponential function of the dimension n. The term kd/(y - y_*) can be viewed as a bound on the "length" of the graph of f expressed in units of the specified error from the optimum. Notably, an exponential increase in the Lipschitz constant or the diameter gives rise to only a linear increase in the corresponding number of iterations of PAS required to achieve the same value error.

Although several researchers have empirically reported linear behavior in dimension for a variety of other random search algorithms, including Schumer and Steiglitz [24], Schrack and Borowski [23], and Solis and Wets [26], PAS is a difficult algorithm to implement. The principal reason is that there is no known efficient procedure for generating a point uniformly distributed in a general region. Although the problem of efficiently generating many points uniformly distributed within a single bounded region has met with some success [2,3,21,25], the problem of efficiently generating a single point uniformly distributed in each of many bounded regions remains unresolved. An alternative is to design an improving algorithm that generates points whose values stochastically dominate those of the uniform distribution.
That is, if an algorithm generates random points (uniform or not) with associated objective function values stochastically better than those of uniformly distributed points, then the same linear bound on performance will apply. Natural candidates include interior point methods that have displayed similar dimensional linearity for linear programming problems, such as Karmarkar's projective scaling algorithm [13] and its affine scaling variants [8,10,17]. Several of these algorithms [17] are similar in spirit to the method of centers due to Huard [12]. PAS can in fact be viewed as a randomized method of centers. Just as the method of centers has given rise to a class of extremely practical interior point methods for linear programming, perhaps PAS can be similarly employed to inspire a class of practical methods for global programming.
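As a closing illustration, the bound of Theorem 5.3 can be tabulated directly. The small sketch below (function name ours) evaluates E(N*(y)) ≤ 1 + n ln(kd/(y - y_*)) and confirms the two scaling properties noted above: linearity in the dimension n, and only a linear increase under an exponential increase in kd:

```python
import math

def pas_iteration_bound(n, k, d, gap):
    """Theorem 5.3: E[N*(y)] <= 1 + n * ln(k d / (y - y_*)), for Lipschitz
    constant at most k, feasible-region diameter at most d, and target
    accuracy gap = y - y_*."""
    return 1.0 + n * math.log(k * d / gap)

# Linear in dimension n: doubling n doubles the bound (minus the constant 1).
b5 = pas_iteration_bound(5, 1.0, 2.0, 0.01)
b10 = pas_iteration_bound(10, 1.0, 2.0, 0.01)
assert abs((b10 - 1.0) - 2.0 * (b5 - 1.0)) < 1e-9

# Only linear in ln(kd): scaling kd by a factor e adds exactly n iterations.
assert abs(pas_iteration_bound(5, math.e, 2.0, 0.01) - b5 - 5.0) < 1e-9
```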

References

[1] F. Archetti, B. Betrò and S. Steffè, "A Theoretical Framework for Global Optimization Via Random Sampling," Working Paper, Quaderno dei gruppi di ricerca matematica del C.N.R. (University of Pisa, 1975).

[2] H.C.P. Berbee, C.G.E. Boender, A.H.G. Rinnooy Kan, C.L. Scheffer, R.L. Smith, and J. Telgen, "Hit-and-Run Algorithms for the Identification of Nonredundant Linear Inequalities," Mathematical Programming 37 (1987) 184-207.

[3] A. Boneh, "A Probabilistic Algorithm for Identifying Redundancy by a Random Feasible Point Generator (RFPG)," in: M.H. Karwan, V. Lotfi, J. Telgen and S. Zionts, eds., Redundancy in Mathematical Programming (Springer-Verlag, Berlin, 1983).

[4] S.H. Brooks, "A discussion of random methods for seeking maxima," Operations Research 6 (1958) 244-251.

[5] D.J. Clough, "An Asymptotic Extreme-Value Sampling Theory for Estimation of a Global Maximum," CORS Journal 7 (1969) 102-115.

[6] L.C.W. Dixon and G.P. Szegö, eds., Towards Global Optimization (North-Holland, Amsterdam, 1975).

[7] L.C.W. Dixon and G.P. Szegö, eds., Towards Global Optimization 2 (North-Holland, Amsterdam, 1978).

[8] R.M. Freund, "Polynomial-time algorithms for linear programming based only on primal scaling and projected gradients of a potential function," MIT Working Paper OR-182-88 (Massachusetts Institute of Technology, Cambridge, 1988).

[9] J. Galambos, The Asymptotic Theory of Extreme Order Statistics (John Wiley and Sons, New York, 1978).

[10] C.C. Gonzaga, "Polynomial affine algorithms for linear programming," Report ES-139/88 (Federal University of Rio de Janeiro, Rio de Janeiro, 1988).

[11] L. de Haan, "Estimation of the Minimum of a Function Using Order Statistics," Journal of the American Statistical Association 76 (1981) 467-469.

[12] P. Huard, "Resolution of mathematical programming with non-linear constraints by the method of centers," in: J. Abadie, ed., Non-Linear Programming (North-Holland, Amsterdam, 1967) pp. 207-219.

[13] N. Karmarkar, "A new polynomial-time algorithm for linear programming," Combinatorica 4 (1984) 373-395.

[14] M.G. Kendall, A Course in the Geometry of n Dimensions (Hafner Publishing, New York, 1961).

[15] N.R. Patel and R.L. Smith, "The Asymptotic Extreme Value Distribution of the Sample Minimum of a Concave Function under Linear Constraints," Operations Research 31 (1983) 789-794.

[16] N.R. Patel, R.L. Smith, and Z.B. Zabinsky, "Pure adaptive search in Monte Carlo optimization," Mathematical Programming 43 (1988) 317-328.

[17] J. Renegar, "A polynomial-time algorithm, based on Newton's method, for linear programming," Mathematical Programming 40 (1988) 59-93.

[18] A.H.G. Rinnooy Kan and G.T. Timmer, "Stochastic methods for global optimization," American Journal of Mathematical and Management Sciences 4 (1984) 7-40.

[19] A.H.G. Rinnooy Kan and G.T. Timmer, "Stochastic global optimization methods part I: clustering methods," Mathematical Programming 39 (1987) 27-56.

[20] A.H.G. Rinnooy Kan and G.T. Timmer, "Stochastic global optimization methods part II: multi-level methods," Mathematical Programming 39 (1987) 57-78.

[21] R.Y. Rubinstein, "Generating Random Vectors Uniformly Distributed Inside and on the Surface of Different Regions," European Journal of Operational Research 10 (1982) 205-209.

[22] S.M. Ross, Stochastic Processes (John Wiley and Sons, New York, 1983).

[23] G. Schrack and N. Borowski, "An experimental comparison of three random searches," in: F. Lootsma, ed., Numerical Methods for Nonlinear Optimization (Academic Press, London, 1972) pp. 137-147.

[24] M.A. Schumer and K. Steiglitz, "Adaptive step size random search," IEEE Transactions on Automatic Control AC-13 (1968) 270-276.

[25] R.L. Smith, "Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions," Operations Research 32 (1984) 1296-1308.

[26] F.J. Solis and R.J.-B. Wets, "Minimization by random search techniques," Mathematics of Operations Research 6 (1981) 19-30.

Appendix

Lemma 3.1 For the global optimization problem (P), the stochastic process {W_k, k = 0,1,2,...} is equal in distribution to the process {Y_{R(k)}, k = 0,1,2,...}, where R(k) is the epoch of the kth record of the sequence {Y_k, k = 0,1,2,...}, i.e.,

{W_k, k = 0,1,2,...} =_d {Y_{R(k)}, k = 0,1,2,...}.

In particular, P(W_k ≤ y) = P(Y_{R(k)} ≤ y) for k = 0,1,2,... and y_* ≤ y ≤ y^*.

Proof: First, we show that the conditional distributions are equal. Let k be any iteration, k = 1,2,..., and let x, y be such that y_* ≤ y ≤ x ≤ y^*. Now,

P(Y_{R(k+1)} ≤ y | Y_{R(k)} = x)
  = P(Y_{R(k)+1} ≤ y | Y_{R(k)} = x) + P(Y_{R(k)+2} ≤ y, Y_{R(k)+1} > x | Y_{R(k)} = x) + ···

and since the Y_k are independent and R(k) is a stopping time for all k,

  = P(Y_{R(k)+1} ≤ y) + P(Y_{R(k)+2} ≤ y) P(Y_{R(k)+1} > x) + ···

and since the Y_k are identically distributed, i.e., Y_k ~ Y_1 for k ≥ 1,

  = Σ_{i=0}^∞ P(Y_1 ≤ y) P(Y_1 > x)^i
  = P(Y_1 ≤ y) / (1 - P(Y_1 > x))
  = v(S(y))/v(S(x)),

which is the conditional distribution of improvement for pure adaptive search,

  = P(W_{k+1} ≤ y | W_k = x).

We have P(Y_{R(k+1)} ≤ y | Y_{R(k)} = x) = P(W_{k+1} ≤ y | W_k = x).

Next, induction is used to show that the unconditional distributions are equal. By definition, Y_0 = W_0 = y^*. Thus,

P(Y_{R(1)} ≤ y) = P(Y_{R(1)} ≤ y | Y_0 = y^*) = P(W_1 ≤ y | W_0 = y^*) = P(W_1 ≤ y)

for all y_* ≤ y ≤ y^*,

and hence Y_{R(1)} ~ W_1. Let k be an integer greater than 1, and suppose Y_{R(i)} ~ W_i for i = 1,2,...,k. Then,

P(Y_{R(k+1)} ≤ y) = E[P(Y_{R(k+1)} ≤ y | Y_{R(k)})] = ∫ P(Y_{R(k+1)} ≤ y | Y_{R(k)} = x) dF_{Y_{R(k)}}(x)

and, using the equality of the conditional distributions and the induction hypothesis,

  = ∫ P(W_{k+1} ≤ y | W_k = x) dF_{W_k}(x) = E[P(W_{k+1} ≤ y | W_k)] = P(W_{k+1} ≤ y)

for all y_* ≤ y ≤ y^*. Therefore Y_{R(k+1)} ~ W_{k+1}, and by induction the two sequences are equal in marginal distribution. Finally, by the equality of the conditional and marginal distributions, the two sequences are equal in joint distribution. ∎

Theorem 4.3

P(W_k ≤ y) = Σ_{i=0}^{k-1} p(y) (ln(1/p(y)))^i / i!   for k = 1,2,... and y_* < y ≤ y^*.

Proof: As noted in the text,

P(W_k ≤ y) = P(N((y^* - y)/(y - y_*)) < k) = Σ_{i=0}^{k-1} P(N((y^* - y)/(y - y_*)) = i)

and, because N(z) has a Poisson distribution with mean m(z) (Theorem 4.2),

  = Σ_{i=0}^{k-1} e^{-m((y^* - y)/(y - y_*))} m((y^* - y)/(y - y_*))^i / i!

and, since algebraically m((y^* - y)/(y - y_*)) = ln(1/p(y)) for y_* < y ≤ y^*, we have

  = Σ_{i=0}^{k-1} e^{-ln(1/p(y))} (ln(1/p(y)))^i / i! = Σ_{i=0}^{k-1} p(y) (ln(1/p(y)))^i / i!. ∎

Lemma 5.2 For the global optimization problem (P) over a convex feasible region S in n dimensions with diameter d and Lipschitz constant k,

p(y) ≥ ((y - y_*)/(kd))^n   for y_* ≤ y ≤ y^*.

Proof: To obtain a lower bound on p(y), we construct two intermediate bounds. Let

r = (y^* - y_*)/k,

B_r = the n-dimensional ball centered at x* with radius r,

B_d = the n-dimensional ball centered at x* with radius d,

g(x) = k||x - x*|| + y_* if x ∈ B_r, and g(x) = y^* otherwise; equivalently, g(x) = inf{y: (x, y) ∈ convex hull of (x*, y_*) and (B_r, y^*)} if x ∈ B_r, and g(x) = y^* otherwise,

G(y) = {x: x ∈ S and g(x) ≤ y},

S_r = B_r ∩ S,

h(x) = inf{y: (x, y) ∈ convex hull of (x*, y_*) and (S_r, y^*)} if x ∈ S_r, and h(x) = y^* otherwise,

H(y) = {x: x ∈ S and h(x) ≤ y}.

For a geometrical interpretation of these, refer to Figure 1. Notice that two expressions are given for g(x). These expressions are equivalent in defining g(x), which can easily be verified by comparing the ratio ||x - x*||/r to the ratio (y - y_*)/(y^* - y_*). The sets G(y) and H(y) are the level sets of the functions g(x) and h(x), respectively.

We first show that the level sets are nested, H(y) ⊆ G(y) ⊆ S(y), for any y_* ≤ y ≤ y^*. We begin by considering any x ∈ H(y) for any y_* ≤ y ≤ y^*. By definition of H(y), x ∈ S and h(x) ≤ y. There are two cases: either x ∈ S_r or x ∉ S_r. If x ∈ S_r, then h(x) = inf{y: (x, y) ∈ convex hull of (x*, y_*) and (S_r, y^*)}. Since S_r = B_r ∩ S, we have x ∈ B_r, so g(x) = inf{y: (x, y) ∈ convex hull of (x*, y_*) and (B_r, y^*)}. Also, S_r ⊆ B_r implies that h(x) ≥ g(x); thus x ∈ G(y). Now, because f satisfies the Lipschitz condition, we have

|f(x) - f(x*)| ≤ k||x - x*||

[Figure 1: geometrical interpretation of the construction, showing the balls B_r and B_d centered at x*, the region S_r = B_r ∩ S, and the conical functions g and h.]

for any x ∈ S, or equivalently

f(x) ≤ k||x - x*|| + y_*,

which equals g(x) by the first expression defining g(x). Thus f(x) ≤ g(x), and x ∈ S(y). If x ∉ S_r, then h(x) = y^*. Also, because x ∈ S, we have x ∉ B_r, and thus g(x) = y^*. And f(x) ≤ y^*, so again x ∈ H(y), x ∈ G(y), and x ∈ S(y). Combining both cases gives us H(y) ⊆ G(y) ⊆ S(y).

Now, using this result yields

p(y) = v(S(y))/v(S) ≥ v(G(y))/v(S) ≥ v(H(y))/v(S) = (v(H(y))/v(S_r)) (v(S_r)/v(S))

and, due to the similarity of the convex conical sets,

  = ((y - y_*)/(y^* - y_*))^n (v(S_r)/v(S)).

Thus the first intermediate bound is

p(y) ≥ ((y - y_*)/(y^* - y_*))^n (v(S_r)/v(S)).   (1)

It remains to develop a lower bound on v(S_r)/v(S). Before we continue, notice that for some mathematical programs the set S_r equals the set S, and the bound then simplifies to

p(y) ≥ ((y - y_*)/(y^* - y_*))^n.

For example, let f(x) = |x| with S = [-2, 2]. In this simple example, k = 1, y^* - y_* = 2, and r = 2. Thus we have an example with B_r = S = S_r, and the bound is tight. In fact, the bound in equation (1) is tight for the class of convex programs, even though S_r is not necessarily equal to S for all convex programs. A proof is given in Patel, Smith and Zabinsky [16].

We now show that v(S_r)/v(S) ≥ v(B_r)/v(B_d). In order to continue the proof, we need to introduce a similarity transformation. Let A: R^n → R^n be the affine function defined by

A(x) = x* + c(x - x*)   for any x ∈ S, with c = r/d = (y^* - y_*)/(kd),

which takes a point x in S and moves it toward x* by a factor of c. Also, let A(B_d) = {x': x' = A(x), x ∈ B_d} and similarly A(S) = {x': x' = A(x), x ∈ S}. The first step is to establish that

A(B_d) = B_r.   (2)

This follows directly from applying the similarity transformation A to B_d, which shrinks a ball of radius d centered at x* by a factor of c = r/d. This yields a ball of radius r centered at x*, namely B_r.

The second step is to establish that

A(S) ⊆ S_r.   (3)

To prove this, consider any x' ∈ A(S). We show that x' ∈ S and x' ∈ B_r. Now, x' ∈ S because x' = A(x) for some x ∈ S, so

x' = x* + c(x - x*) = cx + (1 - c)x*,

and since x, x* ∈ S, 0 ≤ c ≤ 1, and S is convex, we conclude x' ∈ S. Also, x' ∈ B_r because

||x' - x*|| = ||A(x) - x*|| for some x ∈ S
  = ||x* + c(x - x*) - x*||
  = ||c(x - x*)||
  = c||x - x*||

and, by the definition of c = r/d,

  = (r/d)||x - x*||

and, because x, x* ∈ S and d is the diameter of S,

  ≤ (r/d)d = r;

hence x' ∈ B_r. Therefore x' ∈ S_r, since S_r = S ∩ B_r.

Now, using equation (3) yields

v(S_r)/v(S) ≥ v(A(S))/v(S) = v(A(B_d))/v(B_d),

because the ratio of the contents of two sets is preserved under the similarity transformation, and

  = v(B_r)/v(B_d)

from equation (2). Therefore,

v(S_r)/v(S) ≥ v(B_r)/v(B_d).

Substituting the above inequality into equation (1) yields the second intermediate bound:

p(y) ≥ ((y - y_*)/(y^* - y_*))^n (v(B_r)/v(B_d)).   (4)

But the ratio of the volumes of two n-dimensional balls of radii a and b is (a/b)^n [14]; thus

v(B_r)/v(B_d) = (r/d)^n = ((y^* - y_*)/(kd))^n.

Hence the final bound on p(y) is

p(y) ≥ ((y - y_*)/(kd))^n   (5)

and the proof is concluded. ∎