IMPROVING HIT-AND-RUN FOR GLOBAL OPTIMIZATION

Zelda B. Zabinsky, Industrial Engineering Program, FU-20, University of Washington, Seattle, WA 98195
Robert L. Smith, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117
J. Fred McDonald, Department of Mathematics and Statistics, University of Windsor, Windsor, Ontario, Canada N9B 3P4
H. Edwin Romeijn, Department of Operations Research & Tinbergen Institute, Erasmus University Rotterdam, P.O. Box 1738, NL-3000 DR Rotterdam, The Netherlands
David E. Kaufman, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117

Technical Report 92-25, March 1992

This work was supported in part by NATO Collaborative Research Grant 0119/89. The work of Z.B. Zabinsky was partially supported by the Graduate School Research Fund at the University of Washington. The work of J.F. McDonald was partially supported by the Natural Sciences and Engineering Research Council of Canada under Grant A4625.

Abstract

Improving Hit-and-Run is a random search algorithm for global optimization that at each iteration generates a candidate point for improvement that is uniformly distributed along a randomly chosen direction within the feasible region. The candidate point is accepted as the next iterate if it offers an improvement over the current iterate. We show that for positive definite quadratic programs, the expected number of function evaluations needed to arbitrarily well approximate the optimal solution is at most O(n^{5/2}), where n is the dimension of the problem. Improving Hit-and-Run, when applied to global optimization problems, can therefore be expected to converge polynomially fast as it approaches the global optimum.

Key Words: Random search, Monte Carlo optimization, algorithm complexity, global optimization.

1 Introduction

Random search algorithms offer considerable promise as efficient optimization methods for a large class of problems. Recent results [16, 28] demonstrate that it is theoretically possible for a random search algorithm to achieve a computational complexity that is, on average, linear in dimension. In this paper, we introduce and implement a new sequential random search algorithm named Improving Hit-and-Run (IHR). Sequential random search procedures are designed to address a standard optimization problem,

(P)   min_{x ∈ S} f(x)

where x is an n-dimensional vector, S is a convex, compact, full-dimensional subset of R^n, and f is a real-valued continuous function defined over S. All sequential random search procedures generate a sequence of random points {X_j} which may depend on the previous point or several of the previous points. The concept underlying sequential step size algorithms [17, 18, 15, 23, 21, 13, 22, 25] is to generate the next random point X_{j+1} by taking a specified step in a random direction from the previous point X_j. These algorithms are based on the iterative formula, for j = 1, 2, ...
X_{j+1} = X_j + s_j D_j   if f(X_j + s_j D_j) < f(X_j)
X_{j+1} = X_j             otherwise

where D_j is the direction vector, s_j is the step size, and X_j is the point generated in the jth iteration. The direction vector is often, but not necessarily, obtained by sampling from a uniform distribution on a unit hypersphere. The method of choosing the step size is specific to each algorithm.
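A minimal Python sketch of this generic iteration may make the structure concrete. All names here are illustrative, and the step-size rule shown (a fixed step length) is only one of the many rules the cited algorithms use:

```python
import math
import random

def random_unit_direction(n, rng):
    """Sample a direction uniformly on the unit hypersphere by
    normalizing a vector of independent standard normal deviates."""
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(di * di for di in d))
    return [di / norm for di in d]

def step_size_search(f, x0, step, iters, rng):
    """Generic sequential step-size random search: move to
    X_j + s_j D_j only when the objective strictly improves."""
    x = list(x0)
    fx = f(x)
    for _ in range(iters):
        d = random_unit_direction(len(x), rng)
        cand = [xi + step * di for xi, di in zip(x, d)]
        fc = f(cand)
        if fc < fx:          # accept only improving moves
            x, fx = cand, fc
    return x, fx
```

For example, running `step_size_search` on the quadratic test function sum of x_i^2 from the point (1, ..., 1) steadily drives the objective toward zero, though with a fixed step it stalls once the step length exceeds the distance to the optimum.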

There is experimental evidence in the literature that suggests sequential random search algorithms are efficient for large dimensional quadratic programs. Schumer and Steiglitz [23] provide experimental evidence that the number of function evaluations increases linearly with dimension for their adaptive step size algorithm on the following three test functions: ∑_{i=1}^n x_i^2, ∑_{i=1}^n x_i^4, and ∑_{i=1}^n a_i x_i^2. They also prove that the average number of function evaluations for an optimum relative step size random search restricted to an unconstrained quadratic objective function is asymptotically linear in dimension. Schrack and Borowski [21] report experimental results on a quadratic test function, ∑_{i=1}^n x_i^2, showing that doubling the dimension doubles the number of function evaluations required for their random search algorithm. Solis and Wets [25] experimentally verified a linear correlation between the number of function evaluations and dimension for their own variation of the step size algorithm on a quadratic test function. They provided a justification of this linearity based on the tendency of these algorithms to maintain a constant probability of successful improvement. There is also theoretical justification that sequential random search algorithms are efficient for a larger class of global optimization problems. An analysis of a random search procedure called pure adaptive search [16, 28] proves that it is theoretically possible for a sequential random search procedure to achieve linear complexity (in improving iterates) for global optimization problems. Pure adaptive search (PAS) constructs a sequence of points uniformly distributed within a corresponding sequence of nested improving regions. In Zabinsky and Smith [28], complexity is measured as the expected number of iterations needed to get arbitrarily close to the solution with a specified degree of certainty. In pure adaptive search, each iteration corresponds to an improving point.
Unfortunately, as pointed out in [16, 28], pure adaptive search is difficult to implement directly due to the problem of efficiently generating a point according to a uniform distribution in a general region. However, recent research has shown that Hit-and-Run methods [4, 5, 11, 24, 2] can generate a sequence of points that asymptotically approach a uniform distribution, as shown by Smith [24]. Hit-and-Run generates a sequence of random points by generating a random direction and then a uniform random point along that direction. Since no better alternative for generating uniform points is known at this time, we use Hit-and-Run to generate an improving point in the level set at each iteration. The resulting algorithm is called Improving Hit-and-Run. We analyze the computational efficiency of Improving Hit-and-Run for a specific class of quadratic programs, and theoretically establish an O(n^{5/2}) upper bound on the expected number of function evaluations.

2 Improving Hit-and-Run

Improving Hit-and-Run (IHR) is designed to be easy to implement and at the same time to inherit the efficiency of pure adaptive search. As already noted, the difficulty of directly

implementing pure adaptive search lies in the iterative step of efficiently generating a uniform point in the improving region. This can be achieved by executing the Hit-and-Run algorithm at each iteration, restricting the sequence of Hit-and-Run points generated within an iterative step to the improving feasible region. Since the resulting sequence is only asymptotically guaranteed to be uniform within the improving region, we must decide on how long a sequence to generate. A class of algorithms can therefore be parametrically related to the length of the corresponding Hit-and-Run sequences. At one extreme, we know that when the Hit-and-Run sequences are very long and hence provide a close approximation to sampling from a uniform distribution, we have a good approximation to pure adaptive search and can expect the number of improving points to be linear in dimension. At the other extreme, the Hit-and-Run sequence is reduced to a length of one. It is the latter algorithm with Hit-and-Run sequences of length one that we effectively adopt. We call the resulting algorithm Improving Hit-and-Run. Although the sequence of points generated per iteration is insufficiently long to well approximate uniformity, the hope is that the algorithm may nonetheless inherit a polynomial complexity similar to that enjoyed by pure adaptive search. In fact, we will prove for the class of positive definite quadratic programs that the expected number of iterations is O(n^2), but more importantly the expected number of function evaluations remains polynomial, in particular O(n^{5/2}). Improving Hit-and-Run is a sequential random search algorithm. The basic structure of IHR is to generate a random direction followed by a candidate point that is a random step along that direction. A positive definite matrix H in the algorithm controls the direction distribution. If the matrix H is the identity matrix, then the direction distribution is uniform on a hypersphere.
However, in order to achieve the analytically established polynomial performance bound, H must be set to the Hessian of the quadratic objective function. In practice, for a global optimization problem, H would be locally estimated as in quasi-Newton local search procedures. (See [29] for experimental results on general global optimization problems.) Note that IHR reduces to the Hide-and-Seek continuous simulated annealing algorithm with temperature 0 (see [3]). Given a positive definite matrix H, we define the Improving Hit-and-Run algorithm as follows.

Improving Hit-and-Run (IHR)

Step 0. Initialize X_0 ∈ S, Y_0 = f(X_0), and j = 0.

Step 1. Generate a direction vector D_j from the multivariate normal distribution with mean 0 and covariance matrix H^{-1}.

Step 2. Generate a step size s_j uniformly from L_j, the set of feasible step sizes from the

current iteration point X_j in the direction D_j, where L_j = {λ ∈ R : X_j + λ D_j ∈ S}. If L_j = ∅, go to Step 1.

Step 3. Update the new point if it is improving, i.e. set

X_{j+1} = X_j + s_j D_j   if f(X_j + s_j D_j) < Y_j
X_{j+1} = X_j             otherwise

and set Y_{j+1} = f(X_{j+1}).

Step 4. If the stopping criterion is met, stop. Otherwise increment j and return to Step 1.

Improving Hit-and-Run is straightforward to implement. Generating a random direction, as defined in Step 1, can be accomplished by appropriately transforming a multivariate standard normal deviate, using a decomposition of the covariance matrix H^{-1}. When H = I, this simplifies to computing the direction vector D_j of unit length given by

D_j = (d_1, d_2, ..., d_n) (∑_{i=1}^n d_i^2)^{-1/2}

where d_i, i = 1, 2, ..., n are sampled independently from a standard normal distribution N(0, 1) (see [12]). For the general case, we first find a matrix A of rank n such that H = A'A. One possible choice for A is obtained by diagonalizing H, i.e. by writing H = ZDZ', where D is a diagonal matrix containing the eigenvalues of H, and Z is a matrix having the corresponding eigenvectors as its columns. This approach yields A = Z D^{1/2} Z'. Another possibility is to perform a Cholesky decomposition H = LL', where L is a lower triangular matrix, yielding A = L'. If D_j is distributed according to the standard normal distribution (i.e., D_j has mean zero and covariance matrix I), then A^{-1} D_j has the desired distribution, i.e. A^{-1} D_j is normally distributed with mean zero and covariance matrix H^{-1}. Generating the step size in Step 2 is straightforward as long as it is possible to find the points where the line through X_j in the direction D_j intersects S. This is particularly easy for the case where S is defined by linear constraints. The points of intersection are easily found with a modified minimum ratio test [14].
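Steps 0-4 above can be sketched in Python for the simple case where S is a box {x : lo ≤ x_i ≤ hi}, with the feasible step-size interval L_j found coordinate-wise in the spirit of the minimum ratio test. This is a minimal sketch with illustrative names, not the authors' code; the `direction` argument stands in for a draw from N(0, H^{-1}) (e.g. obtained by applying A^{-1} to a standard normal vector), and defaults to the H = I case. Note that rescaling the direction does not change the line searched, so normalization is optional:

```python
import math
import random

def ihr(f, lo, hi, x0, iters, rng, direction=None):
    """Improving Hit-and-Run on the box S = [lo, hi]^n.
    `direction(rng)` should return a draw from N(0, H^{-1});
    the default below is the H = I case."""
    n = len(x0)
    if direction is None:
        direction = lambda r: [r.gauss(0.0, 1.0) for _ in range(n)]
    x, fx = list(x0), f(x0)        # Step 0
    for _ in range(iters):
        d = direction(rng)         # Step 1
        # Step 2: L_j = {lam : x + lam*d in S} is an interval for a
        # box, found by intersecting per-coordinate ratio bounds.
        lam_lo, lam_hi = -math.inf, math.inf
        for xi, di in zip(x, d):
            if abs(di) > 1e-15:
                a, b = (lo - xi) / di, (hi - xi) / di
                lam_lo = max(lam_lo, min(a, b))
                lam_hi = min(lam_hi, max(a, b))
        lam = rng.uniform(lam_lo, lam_hi)
        # Step 3: accept the candidate only if it improves.
        cand = [xi + lam * di for xi, di in zip(x, d)]
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
    return x, fx                   # Step 4: fixed iteration budget here
```

A quadratic objective on the box [-1, 2]^4, started from an interior point, illustrates the improving behavior; the iterates always remain feasible because the step size is drawn from the feasible interval.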
Alternatively, one can enclose a general region in a box, identify the corresponding points of intersection, and use one-dimensional acceptance-rejection along the resulting line segment until a feasible point is found.

3 Polynomial Performance

3.1 Definitions and Notation

Before proceeding with the analysis of the algorithm's computational complexity, we begin with notation and definitions.

For the optimization problem (P), let (x_*, y_*) denote an optimal solution, where

y_* = min_{x ∈ S} f(x)

and f(x_*) = y_*. The value x_* need not be unique. It will also be convenient to define the maximum,

y^* = max_{x ∈ S} f(x).

Let P_y denote the level set of the problem (P) at objective function value y,

P_y = {x ∈ S : f(x) ≤ y}   for y_* ≤ y ≤ y^*.

Let {X_j}, X_j ∈ S, j = 0, 1, 2, ... be the sequence of sample points generated by Improving Hit-and-Run on (P), with the subscript j denoting iteration count. Notice that each iteration corresponds to one sample point, and one function evaluation. Let {Y_j}, y_* ≤ Y_j ≤ y^*, j = 0, 1, 2, ... be the corresponding sequence of sample values generated by the algorithm on (P), Y_j = f(X_j). Improving Hit-and-Run is defined so that the current sample point is only updated (X_{j+1} = X_j + s_j D_j) when it improves, and repeated otherwise (X_{j+1} = X_j), so that Y_{j+1} ≤ Y_j for j = 0, 1, 2, .... The sequences {X_j} and {Y_j} are random variables since the directions and step sizes are random.

The distribution of improvement is defined to be the probability that the jth objective function value Y_j is at most y. This is the probability that the jth sample point lies within the level set P_y, so that

P(Y_j ≤ y) = P(X_j ∈ P_y)   for j = 0, 1, 2, ... and y_* ≤ y ≤ y^*.

The conditional distribution of improvement, denoted P(Y_{j+1} ≤ y | Y_j = w) for j = 0, 1, 2, ... and y_* ≤ y, w ≤ y^*, is defined to be the probability of obtaining an objective function value of at most y in a single iteration, starting from sample value w.

It will be convenient for the analysis to identify the sequence of record values and their corresponding improving points. Let j_k denote the subscript corresponding to the kth improving point. Thus the point X_{j_k} is the kth record point.

The analysis will be performed on a class of mathematical programs with "elliptical" level sets. This class of programs includes positive definite quadratic programs as a special

case. We define an elliptical program to be the mathematical program (P) where f(x) can be expressed as f(x) = h(r) with r = ||x − x_*||_A, where A is an n × n matrix of full rank, and h is strictly monotonically increasing for r ≥ 0. The norm || · ||_A is defined for any n-dimensional point z as

||z||_A = ||Az||

where || · || denotes the standard Euclidean norm. Note that || · ||_I = || · ||. In this class of programs, we assume x_* is interior to S. Furthermore, for this class of programs we use the IHR algorithm with H chosen to be equal to A'A.

An elliptical program can be interpreted geometrically as a problem with level sets that are elliptical in shape, and nested about the optimum. The function h(r) can be interpreted as a means of layering the level sets. In the special case where A = I the level sets become spherical in shape, and the variable r can be geometrically interpreted as the radius of the level set. We will call the special case of A = I a spherical program. Notice that an elliptical program is convex if and only if h(r) is convex in r. The class of elliptical programs also includes cases where h(r) is nonconvex in r. Much of the analysis is developed for spherical programs and then extended to elliptical programs using the A matrix.

For spherical programs, we use the geometric interpretation of the radius of the level set and define {R_k}, k = 0, 1, 2, ... to be the sequence of radii associated with the level sets of the improving points generated by Improving Hit-and-Run, i.e.

R_k = ||X_{j_k} − x_*||,

where X_{j_k} is the kth record point. Also, R_k = h^{-1}(f(X_{j_k})) = h^{-1}(Y_{j_k}). Notice that the sequence {R_k} is defined only for record points, and thus is strictly decreasing, i.e. R_k > R_{k+1} for k = 0, 1, 2, ....

To measure computational complexity we define a sample point count, N(r), as the number of points sampled to achieve Y_j ≤ h(r) for 0 < r ≤ q.
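The elliptical-program definition can be illustrated with a small hypothetical instance in Python (the matrix A, the minimizer, and the layering function h below are all invented for illustration; any full-rank A and strictly increasing h would do):

```python
import math

# A hypothetical elliptical program in n = 2: f(x) = h(||x - x*||_A).
A = [[2.0, 0.0],
     [1.0, 1.0]]          # full rank, so H = A'A is positive definite
x_star = [0.5, -0.25]     # the (assumed interior) minimizer x*

def norm_A(z):
    """||z||_A = ||A z|| in the Euclidean norm."""
    Az = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
    return math.sqrt(sum(v * v for v in Az))

def h(r):
    """A strictly increasing layering function for r >= 0."""
    return r + r ** 3

def f(x):
    z = [x[i] - x_star[i] for i in range(2)]
    return h(norm_A(z))
```

By construction f attains its minimum h(0) = 0 at x_star, and f increases strictly along any ray leaving x_star, so the level sets {f ≤ y} are nested ellipses centered at x_star.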
The sample point count is equal to the count of iterations and equivalently the count of function evaluations, since each iteration involves exactly one additional function evaluation. There is very little overhead other than the function evaluation associated with each iteration, so this is a very accurate indication of total computing time. In the next section we prove our main result that the expected value of N(r) is at most O(n^{5/2}). For the analysis we also define an improving point count, K(r), as the number of improving points needed to achieve Y_{j_k} ≤ h(r) for 0 < r ≤ q. As an intermediate result, we prove that the expected value of K(r) is at most quadratic in dimension.

3.2 Complexity Analysis

We now turn to an analysis of the complexity of Improving Hit-and-Run. To simplify the presentation of the analysis, we assume without loss of generality that Y_0 = y^*, and S equals the level set associated with X_0. We also assume for the spherical program that R_0 = q.

We will proceed by first showing that we can restrict our attention to the case where A = I, i.e. a spherical program where the level sets are concentric spheres instead of ellipsoids. To facilitate the proof of this result, we transform an elliptical program (P) with corresponding matrix A into a spherical program (P̃) by defining a transformed point x̃ = Ax, and

S̃ = {x̃ ∈ R^n : A^{-1} x̃ ∈ S}

and f̃ : S̃ → R,

f̃(x̃) = f(A^{-1} x̃).

Denote the transformed problem of minimizing f̃ over S̃ by (P̃), and note that min_{x ∈ S} f(x) = min_{x̃ ∈ S̃} f̃(x̃).

Theorem 3.1 Performing the IHR algorithm on the elliptical problem (P)

min_{x ∈ S} f(x)

using H = A'A is equivalent (under the identification x̃ = Ax) to performing the IHR algorithm on the problem (P̃)

min_{x̃ ∈ S̃} f̃(x̃)

using H = I.

Proof: See Appendix.

It follows from Theorem 3.1 that in the remainder of the complexity analysis, we need only consider the class of spherical programs. Since the number of points generated is invariant, all complexity results obtained for this class also hold for the more general case of elliptical programs.

We next determine the conditional distribution of improvement for Improving Hit-and-Run on a spherical program. The conditional probability of making a specified improvement on a single iteration depends on the position of the current point,

P(Y_{j+1} ≤ y | Y_j = w) = E[P(Y_{j+1} ≤ y | X_j, Y_j = w)] = E[P(X_{j+1} ∈ P_y | X_j, Y_j = w)].

The probability within the last expectation can be expressed as an integral in terms of the random direction generated from the point x, and the ratio of the lengths of the improving line segment(s) in that direction,

P(X_{j+1} ∈ P_y | X_j = x, Y_j = w) = ∫_{∂D} (||L_{P_y}(d, x)|| / ||L_S(d, x)||) dF_D(d)    (1)

where ∂D is the boundary of the unit sphere, and F_D(·) is the cumulative distribution function of the normalized random direction vector. Note that the normalized direction vector has a uniform distribution on the boundary of the unit sphere (see [12]). The expression ||L_{P_y}(d, x)|| is the combined length of the line segments formed by the intersection of the level set P_y with the line in direction d originating at x. In words, the probability of achieving an objective function value of at most y, starting at point x with f(x) = w, is the probability of landing within P_y given direction d, i.e. ||L_{P_y}(d, x)|| / ||L_S(d, x)||, integrated over all feasible improving directions.

For general mathematical programs, the conditional probability of improvement depends on the exact location of X_j, which makes it difficult to derive a general expression. However, for the class of spherical programs, we can derive the conditional probability of improvement analytically.

Lemma 3.2 For any spherical program (P), the conditional probability of improvement on the next sample point for IHR is given by

P(Y_{j+1} ≤ h(s) | Y_j = h(r), Y_0 = h(q)) = p(s; r, q) = (r/q)(s/r)^n (1/n) F_1((n−1)/2, 1/2, 1/2, (n+2)/2; s²/q², s²/r²)

for 0 < s ≤ r ≤ q and j = 0, 1, 2, ..., where F_1(a, b, c, d; x, y) is a generalized hypergeometric function of two variables (see [1] or [10]).

Proof: See Appendix.

Notice that the special case where s = r, denoted p(r; r, q), is the probability that a sample point is improving given that the current point is at Y_j = h(r), and simplifies to (using [10] p. 1055, equation 10)

p(r; r, q) = (r / (q γ(n))) F(1/2, (n−1)/2; (n+1)/2; r²/q²)    (2)

where

γ(n) = √π Γ((n+1)/2) / Γ(n/2),

for 0 < r ≤ q and j = 0, 1, 2, ..., where F(a, b; c; z) is Gauss' hypergeometric function (see [1] or [10]).

We now express the expected number of sample points, and hence the expected number of function evaluations, in terms of the expected number of improving points.

Theorem 3.3 For any spherical program (P), the expected number of IHR sample points needed to achieve an objective function value of h(r) or better is bounded by

E[N(r)] ≤ E[K(r)] / p(r; r, q) ≤ (q/r) γ(n) E[K(r)]

for 0 < r ≤ q, where γ(n) = √π Γ((n+1)/2) / Γ(n/2) = O(√n).

Proof: See Appendix.

We now analyze the performance of improving points of IHR. We start by evaluating a conditional probability for improving points and then develop a bound on the expected number of improving points E[K(r)] in terms of dimension.

Lemma 3.4 For any spherical program (P), the conditional probability distribution of improving points for IHR is given by

P(R_{k+1} ≤ s | R_k = r, R_0 = q) = p(s; r, q) / p(r; r, q) ≥ (1/n)(s/r)^n

for 0 < s ≤ r ≤ q and k = 0, 1, 2, ....

Proof: See Appendix.

Theorem 3.5 For any spherical program (P), the expected number of IHR improving points needed to achieve an objective function value of h(r) or better is bounded by

E[K(r)] ≤ n E[K(r)^{PAS}] = O(n²)

for 0 < r ≤ q, where K(r)^{PAS} is the corresponding number of improving points for PAS.

Proof: See Appendix.
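The quantities appearing in these bounds can be spot-checked numerically. The sketch below evaluates p(s; r, q) for a spherical program directly as the chord-ratio integral over the direction angle (midpoint quadrature), taking γ(n) = √π Γ((n+1)/2)/Γ(n/2), the O(√n) constant appearing in (2); function names and the quadrature scheme are illustrative choices, not part of the paper:

```python
import math

def p_level(s, r, q, n, steps=40000):
    """Quadrature evaluation of p(s; r, q) for a spherical program:
    int_0^{theta0} sqrt(s^2 - r^2 sin^2 t)/sqrt(q^2 - r^2 sin^2 t)
    * sin^{n-2}(t) dt, normalized by int_0^{pi/2} sin^{n-2}(t) dt,
    where sin(theta0) = s/r.  Midpoint rule; max(..., 0) guards
    against tiny negative round-off under the square root."""
    theta0 = math.asin(s / r)
    num = 0.0
    for k in range(steps):
        t = (k + 0.5) * theta0 / steps
        st = math.sin(t)
        num += st ** (n - 2) * math.sqrt(max(s * s - r * r * st * st, 0.0)) \
               / math.sqrt(q * q - r * r * st * st)
    num *= theta0 / steps
    den = 0.0
    for k in range(steps):
        t = (k + 0.5) * (math.pi / 2) / steps
        den += math.sin(t) ** (n - 2)
    den *= (math.pi / 2) / steps
    return num / den

def gamma_n(n):
    """gamma(n) = sqrt(pi) * Gamma((n+1)/2) / Gamma(n/2) = O(sqrt(n))."""
    return math.sqrt(math.pi) * math.gamma((n + 1) / 2) / math.gamma(n / 2)
```

Here `p_level(r, r, q, n)` is the single-sample improvement probability p(r; r, q); against it one can check the lower bound of Lemma 3.4, the bound p(r; r, q) ≥ r/(q γ(n)) used in Theorem 3.3, and the limit p(r; r, q) → r/(q γ(n)) as q grows.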

The culminating result of polynomial complexity follows from combining Theorems 3.3 and 3.5 with Theorem 3.1.

Corollary 3.6 For any elliptical program (P), the expected number of IHR sample points needed to achieve an objective function value of h(r) or better is bounded by

E[N(r)] ≤ (q/r) γ(n) n E[K(r)^{PAS}] = O(n^{5/2})

for 0 < r ≤ q.

Proof: From Theorem 3.3 we have, for 0 < r ≤ q,

E[N(r)] ≤ (q/r) γ(n) E[K(r)],

and from Theorem 3.5,

E[N(r)] ≤ (q/r) γ(n) n E[K(r)^{PAS}],

which yields O(n^{5/2}), since γ(n) = O(n^{1/2}) and the expected number of improving points for pure adaptive search is linear in n.

4 Discussion

Improving Hit-and-Run has been shown to have a search effort that is polynomially bounded in dimension for the class of elliptical programs. This complexity is attainable for strictly convex quadratic programs by choosing H to be the Hessian of the objective function. Although this class is small, we can (in principle) construct an algorithm with asymptotically the same polynomial complexity if the objective function is twice continuously differentiable at the global minimum x_*, and if the Hessian H* at that point is strictly positive definite. In that case, we can, for values of x sufficiently close to x_*, approximate the objective function by

f(x) ≈ y_* + (1/2)(x − x_*)' H* (x − x_*)

using the second order Taylor approximation at the global minimum. The approximating function is now a strictly convex quadratic function, for which the complexity results in this paper apply. However, we would need to know H* in advance to implement the algorithm. Alternatively, if f is twice continuously differentiable everywhere, we can, at iteration j, approximate the objective function by

f(x) ≈ f(X_j) + ∇f(X_j)'(x − X_j) + (1/2)(x − X_j)' H(X_j)(x − X_j)

where ∇f(X_j) denotes the gradient at X_j, and H(X_j) denotes the Hessian at X_j. This suggests that, in every iteration, we should use the Hessian at the iteration point in the IHR algorithm. If this scheme is computationally too expensive, or if H(X_j) is not positive definite at some iteration point, one should turn to approximation methods for the Hessian. For example, DFP and BFGS approximation schemes are used in quasi-Newton local search algorithms (see e.g. [20]).

The original hope for Improving Hit-and-Run was that its computational complexity would be comparable to pure adaptive search. If extremely long sequences of Hit-and-Run were used to approximate PAS, then we would expect a linear bound in dimension on improving points, at a comparably high cost of obtaining an improving point. For IHR, which has extremely short sequences of Hit-and-Run (length one), the expected number of improving points has a quadratic bound which, although worse than PAS, is associated with a comparably low cost of obtaining an improving point. In fact, the total expected number of function evaluations is only O(n^{5/2}). Thus IHR can be viewed as an optimizing version of Hit-and-Run, and the complexity results provide support for the value of short Hit-and-Run sequences within an optimizing framework.
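Where the Hessian H(X_j) is not available analytically, one simple alternative to the DFP/BFGS updates mentioned above is a central-difference estimate; the sketch below (names and step size are illustrative, and this is not the authors' implementation) could supply the H used by IHR:

```python
def hessian_fd(f, x, h=1e-4):
    """Central-difference estimate of the Hessian of f at x.
    The four-point formula handles i == j as well: the two mixed
    perturbations then collapse to x +/- 2h e_i and x itself."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H
```

On a quadratic objective the formula is exact up to round-off, e.g. f(x) = 2x_1² + x_1 x_2 + 3x_2² yields the constant Hessian with entries 4, 1, 1, 6.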

References

[1] M. Abramowitz and I.A. Stegun, eds., Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables (National Bureau of Standards, Applied Mathematics Series 55, June 1964).

[2] C.J.P. Bélisle, H.E. Romeijn, and R.L. Smith, "Hit-and-Run Algorithms for Generating Multivariate Distributions," to appear in Mathematics of Operations Research.

[3] C.J.P. Bélisle, H.E. Romeijn, and R.L. Smith, "Hide-and-Seek: A Simulated Annealing Algorithm for Global Optimization," Technical Report 90-25, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, September 1990.

[4] H.C.P. Berbee, C.G.E. Boender, A.H.G. Rinnooy Kan, C.L. Scheffer, R.L. Smith, and J. Telgen, "Hit-and-Run Algorithms for the Identification of Nonredundant Linear Inequalities," Mathematical Programming 37 (1987) 184-207.

[5] A. Boneh, "A Probabilistic Algorithm for Identifying Redundancy by a Random Feasible Point Generator (RFPG)," in: M.H. Karwan, V. Lotfi, J. Telgen, and S. Zionts, eds., Redundancy in Mathematical Programming (Springer-Verlag, Berlin, 1983).

[6] H. Cramér, Mathematical Methods of Statistics (Princeton University Press, 1946).

[7] L.C.W. Dixon and G.P. Szegő, eds., Towards Global Optimization (North-Holland, Amsterdam, 1975).

[8] L.C.W. Dixon and G.P. Szegő, eds., Towards Global Optimization 2 (North-Holland, Amsterdam, 1978).

[9] W. Feller, An Introduction to Probability Theory and Its Applications, Volume 2, 2nd Edition (John Wiley and Sons, New York, 1971).

[10] I.S. Gradshteyn and I.M. Ryzhik, eds., translated by Alan Jeffrey, Table of Integrals, Series, and Products (Academic Press, New York, 1980).

[11] M.H. Karwan, V. Lotfi, J. Telgen, and S. Zionts, eds., Redundancy in Mathematical Programming (Springer-Verlag, Berlin, 1983) 108-134.

[12] D.E. Knuth, The Art of Computer Programming, Vol. 2 (Addison-Wesley, Reading, Massachusetts, 1969) 116.

[13] J.P. Lawrence III and K.
Steiglitz, "Randomized pattern search," IEEE Transactions on Computers C-21 (1972) 382-385.

[14] K.G. Murty, Linear Programming (John Wiley and Sons, New York, 1983).

[15] V.A. Mutseniyeks and L. Rastrigin, "Extremal control of continuous multi-parameter systems by the method of random search," Engineering Cybernetics 1 (1964) 82-90.

[16] N.R. Patel, R.L. Smith, and Z.B. Zabinsky, "Pure adaptive search in Monte Carlo optimization," Mathematical Programming 43 (1988) 317-328.

[17] L.A. Rastrigin, "Extremal control by the method of random scanning," Automation and Remote Control 21 (1960) 891-896.

[18] L.A. Rastrigin, "The convergence of the random method in the extremal control of a many-parameter system," Automation and Remote Control 24 (1963) 1337-1342.

[19] S.M. Ross, Stochastic Processes (John Wiley and Sons, New York, 1983).

[20] L.E. Scales, Introduction to Non-linear Optimization (Macmillan, 1985).

[21] G. Schrack and N. Borowski, "An experimental comparison of three random searches," in: F. Lootsma, ed., Numerical Methods for Nonlinear Optimization (Academic Press, London, 1972) pp. 137-147.

[22] G. Schrack and M. Choit, "Optimized relative step size random searches," Mathematical Programming 10 (1976) 270-276.

[23] M.A. Schumer and K. Steiglitz, "Adaptive step size random search," IEEE Transactions on Automatic Control AC-13 (1968) 270-276.

[24] R.L. Smith, "Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions," Operations Research 32 (1984) 1296-1308.

[25] F.J. Solis and R.J.-B. Wets, "Minimization by random search techniques," Mathematics of Operations Research 6 (1981) 19-30.

[26] A. Sommerfeld, Partial Differential Equations in Physics (Academic Press, New York, 1949).

[27] Z.B. Zabinsky, Computational Complexity of Adaptive Algorithms in Monte Carlo Optimization (Ph.D. Dissertation, The University of Michigan, Ann Arbor, MI, 1985).

[28] Z.B. Zabinsky and R.L. Smith, "Pure adaptive search in global optimization," to appear in Mathematical Programming.

[29] Z.B. Zabinsky, D.L. Graesser, M.E.
Tuttle, and G.I. Kim, "Global Optimization of Composite Laminates Using Improving Hit-and-Run," in: C.A. Floudas and P.M. Pardalos, eds., Recent Advances in Global Optimization (Princeton University Press, Princeton, 1992).

Appendix

A Proofs for the Complexity Analysis

Theorem 3.1 Performing the IHR algorithm on the elliptical problem (P)

min_{x ∈ S} f(x)

using H = A'A is equivalent to performing the IHR algorithm on the problem (P̃)

min_{x̃ ∈ S̃} f̃(x̃)

using H = I.

Proof: We show that the following two methods are equivalent. Given X_j ∈ S, the first method is to generate X_{j+1} using IHR as defined in the text on the original problem (P) with H = A'A. The second method is to transform the given X_j into X̃_j = A X_j, perform IHR on the transformed problem (P̃) with H = I, and then transform the point X̃_{j+1} back into the original space, X'_{j+1} = A^{-1} X̃_{j+1}. We now show that X_{j+1} is stochastically equivalent to X'_{j+1}.

Method 2:

Step 1'. Set X̃_j = A X_j. Generate a direction vector D̃_j from the normal distribution with mean 0 and covariance matrix I.

Step 2'. Generate a step size s̃_j uniformly from L̃_j, the set of feasible step sizes from the current iteration point X̃_j in the direction D̃_j, i.e.

L̃_j = {λ ∈ R : X̃_j + λ D̃_j ∈ S̃}.

If L̃_j = ∅, go to Step 1'.

Step 3'. Update the new point if it is improving:

X̃_{j+1} = X̃_j + s̃_j D̃_j if f̃(X̃_j + s̃_j D̃_j) < f̃(X̃_j), and X̃_{j+1} = X̃_j otherwise,

and set Ỹ_{j+1} = f̃(X̃_{j+1}). Set X'_{j+1} = A^{-1} X̃_{j+1}.

First note that A D_j, where D_j is as in Step 1 of IHR, has the same distribution as D̃_j in Step 1' of Method 2, since its covariance matrix is A H^{-1} A' = A (A'A)^{-1} A' = I. Using this, we get

L̃_j = {λ ∈ R : X̃_j + λ D̃_j ∈ S̃} =_d {λ ∈ R : X̃_j + λ A D_j ∈ S̃} = {λ ∈ R : A^{-1}(X̃_j + λ A D_j) ∈ S} = {λ ∈ R : X_j + λ D_j ∈ S} = L_j.

Thus s_j from Step 2 and s̃_j from Step 2' are realizations from the same distribution, i.e. s̃_j =_d s_j. Note that

f̃(X̃_j + s̃_j D̃_j) = f(A^{-1}(X̃_j + s̃_j D̃_j)) = f(X_j + s̃_j A^{-1} D̃_j) =_d f(X_j + s_j D_j),

and hence f̃(X̃_{j+1}) =_d f(X_{j+1}). Thus the probability of not finding an improvement is the same for both methods, and so if the new point is not improving, then X'_{j+1} = X_j = X_{j+1}. Now look at the distributions of the new iteration points given that an improvement occurs:

X'_{j+1} = A^{-1} X̃_{j+1} = A^{-1}(X̃_j + s̃_j D̃_j) = X_j + s̃_j A^{-1} D̃_j =_d X_j + s_j D_j = X_{j+1},

and again X'_{j+1} =_d X_{j+1}. Thus performing IHR on (P) with H = A'A is equivalent in distribution to the second method of performing IHR on (P̃) with H = I. ∎

Lemma 3.2 For any spherical program (P), the conditional probability of improvement on the next sample point for IHR can be expressed as

P(Y_{j+1} ≤ h(s) | Y_j = h(r), Y_0 = h(q)) = p(s; r, q) = (r/q)(s/r)^n (1/n) F_1((n−1)/2, 1/2, 1/2, (n+2)/2; s²/q², s²/r²)

for 0 < s ≤ r ≤ q and j = 0, 1, 2, ..., where F_1(a, b, c, d; x, y) is a generalized hypergeometric function (see [1] or [10]).

Proof: We derive the analytical expression for P(Y_{j+1} ≤ h(s) | Y_j = h(r), Y_0 = h(q)) in terms of the radii of the level sets. In order to take full advantage of the symmetry of the problem, we use a spherical coordinate system (ρ, θ, φ_i, i = 1, ..., n−2) to compute the integral required to evaluate P(Y_{j+1} ≤ h(s) | Y_j = h(r), Y_0 = h(q)) = p(s; r, q). We let the origin of the spherical coordinates be at X_j, where the radius of the corresponding level set is r, and let the positive x₁ axis run through the center of the level sets. Figure 1 illustrates a cross section, corresponding to fixed φ_i, i = 1, ..., n−2, of a level set of radius r within the feasible region of radius q and containing a level set of radius s. Appendix B includes a brief summary of spherical coordinates.

The integral that we need to evaluate corresponds to equation (1) in the text, only now we use the spherical coordinate system. An important term in the integral is the proportion of the line intersecting the s-sphere to the line intersecting the q-sphere, which we write as δ(r, s, θ)/δ(r, q, θ) (see Figure 1), where δ(r, s, θ) is the length of the line segment emanating in a direction θ from a point X_j on a sphere of radius r that intersects with a sphere of radius s. Let θ₀ be the angle that corresponds to the line segment that is tangent to the inner sphere of radius s; then sin θ₀ = s/r. Using spherical coordinates, the equation of the inner sphere of radius s is given by ρ² − 2rρ cos θ + r² = s², or (refer to Figure 1)

ρ± = r cos θ ± √(s² − r² sin²θ)   for 0 ≤ θ ≤ θ₀.

Now, in the range 0 ≤ θ ≤ θ₀, the length of the line segment that lies in the sphere of radius s is independent of the φ_i because of symmetry, and can be expressed as

δ(r, s, θ) = ρ₊ − ρ₋ = 2√(s² − r² sin²θ).

Similarly we get δ(r, q, θ) = 2√(q² − r² sin²θ). It now follows easily that (refer to Appendix B for details)

p(s; r, q) = [2 ∫₀^{θ₀} ∫₀^π ⋯ ∫₀^π (δ(r, s, θ)/δ(r, q, θ)) w(θ, φ₁, ..., φ_{n−2}) dθ dφ₁ ⋯ dφ_{n−2}] / [2 ∫₀^{π/2} ∫₀^π ⋯ ∫₀^π w(θ, φ₁, ..., φ_{n−2}) dθ dφ₁ ⋯ dφ_{n−2}]

where

w(θ, φ₁, ..., φ_{n−2}) = sin^{n−2}(θ) ∏_{k=1}^{n−3} sin^{n−2−k}(φ_k)

for $n \ge 4$ (as in [26], pp. 227-228), and

$$\frac{\delta(r, s, \theta)}{\delta(r, q, \theta)} = \frac{\sqrt{s^2 - r^2\sin^2\theta}}{\sqrt{q^2 - r^2\sin^2\theta}}.$$

After simplifying, we have

$$p(s; r, q) = \frac{\displaystyle\int_0^{\theta_0} \frac{\sqrt{s^2 - r^2\sin^2\theta}}{\sqrt{q^2 - r^2\sin^2\theta}}\,\sin^{n-2}(\theta)\, d\theta}{\displaystyle\int_0^{\pi/2} \sin^{n-2}(\theta)\, d\theta},$$

and since for $w, z > 0$ the Beta function can be written as ([1] pp. 258)

$$\frac{1}{2}B(z, w) = \frac{\Gamma(z)\Gamma(w)}{2\,\Gamma(z + w)} = \int_0^{\pi/2} \sin^{2z-1}(t)\cos^{2w-1}(t)\, dt,$$

we can write

$$p(s; r, q) = \frac{2}{B\!\left(\frac{n-1}{2}, \frac{1}{2}\right)} \int_0^{\theta_0} \frac{\sqrt{s^2 - r^2\sin^2\theta}}{\sqrt{q^2 - r^2\sin^2\theta}}\,\sin^{n-2}(\theta)\, d\theta.$$

It is easily shown that this expression is also valid for $n = 2$ and $3$. We now concentrate on the integral and go through a series of substitutions in order to arrive at the final expression. First we let $a = s^2/r^2$ and $b = q^2/r^2$ to get

$$\int_0^{\theta_0} \sqrt{\frac{a - \sin^2\theta}{b - \sin^2\theta}}\,\sin^{n-2}(\theta)\, d\theta,$$

and then we change variables using $\sin^2\theta = at$ to get

$$\int_0^1 \frac{a^{1/2}(1-t)^{1/2}}{(b - at)^{1/2}}\,(at)^{\frac{n-2}{2}}\,\frac{a\, dt}{2\,(at)^{1/2}(1 - at)^{1/2}} = \frac{a^{n/2}}{2}\int_0^1 \frac{t^{\frac{n-3}{2}}(1-t)^{1/2}}{(b - at)^{1/2}(1 - at)^{1/2}}\, dt = \frac{a^{n/2 - 1}}{2}\int_0^1 \frac{t^{\frac{n-3}{2}}(1-t)^{1/2}}{(b/a - t)^{1/2}(1/a - t)^{1/2}}\, dt,$$

which, using 3.211 of [10] pp. 287, reduces to

$$\frac{a^{n/2}}{2\,b^{1/2}}\, B\!\left(\frac{3}{2}, \frac{n-1}{2}\right) F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{a}{b}, a\right).$$

Noticing that

$$B\!\left(\frac{3}{2}, \frac{n-1}{2}\right) = \frac{1}{n}\, B\!\left(\frac{n-1}{2}, \frac{1}{2}\right),$$

we can write

$$p(s; r, q) = \frac{a^{n/2}}{b^{1/2}}\,\frac{1}{n}\, F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{a}{b}, a\right),$$

and returning to the original variables,

$$p(s; r, q) = \frac{r}{q}\left(\frac{s}{r}\right)^{n}\frac{1}{n}\, F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{s^2}{q^2}, \frac{s^2}{r^2}\right). \qquad \square$$

Theorem 3.3 For any spherical program (P), the expected number of IHR sample points needed to achieve an objective function value of $h(r)$ or better is bounded by

$$E[N(r)] \le \frac{E[K(r)]}{p(r; r, q)} \le \frac{q}{r}\,\gamma(n)\, E[K(r)]$$

for $0 < r \le q$, where $\gamma(n) = \sqrt{\pi}\,\Gamma\!\left(\frac{n+1}{2}\right)\big/\Gamma\!\left(\frac{n}{2}\right) = O(\sqrt{n})$.

Proof: We first express $N(r)$ as a sum,

$$N(r) = \sum_{m=1}^{K(r)} N_m,$$

where $N_m$ is defined to be the number of sample points between $R_{m-1}$ and $R_m$, for $m = 1, 2, \ldots$. Thus $N_m$ is the number of sample points between consecutive improving points (including the sample point corresponding to $R_m$). Now we take the expectation of both sides to get

$$E[N(r)] = E\!\left[\sum_{m=1}^{K(r)} N_m\right] = E\!\left[E\!\left[\sum_{m=1}^{K(r)} N_m \,\Big|\, K(r), R_0, \ldots, R_{K(r)-1}\right]\right].$$

The conditional expectation can be rewritten, for $r_0 > r_1 > \cdots > r_{k-1} > r$ and $k \ge 1$, as

$$E\!\left[\sum_{m=1}^{K(r)} N_m \,\Big|\, K(r) = k, R_0 = r_0, \ldots, R_{k-1} = r_{k-1}\right] = E\!\left[\sum_{m=1}^{k} N_m \,\Big|\, K(r) = k, R_0 = r_0, \ldots, R_{k-1} = r_{k-1}\right]$$
$$= E\!\left[\sum_{m=1}^{k} N_m \,\Big|\, R_0 = r_0, \ldots, R_{k-1} = r_{k-1}, R_k \le r\right]$$
because the $k$-th improving point is the first point inside the $r$-sphere,
$$= \sum_{m=1}^{k} E\!\left[N_m \,\Big|\, R_0 = r_0, \ldots, R_{k-1} = r_{k-1}, R_k \le r\right]$$
by linearity of conditional expectation,
$$= \sum_{m=1}^{k} E\!\left[N_m \,\Big|\, R_{m-1} = r_{m-1}\right]$$
since $N_m$ is conditionally independent of $R_j$ ($j \ne m-1$) when given $R_{m-1}$,
$$= \sum_{m=1}^{k} \frac{1}{p(r_{m-1}; r_{m-1}, q)}$$
because the number of sample points needed to achieve the $m$-th improving point, given $R_{m-1} = r_{m-1}$, has a geometric distribution with parameter $p(r_{m-1}; r_{m-1}, q)$,
$$\le \sum_{m=1}^{k} \frac{1}{p(r; r, q)}$$
since $p(r; r, q) \le p(s; s, q)$ for $s \ge r$,
$$= \frac{k}{p(r; r, q)}.$$

We can now show the first statement of the theorem:

$$E[N(r)] \le E\!\left[\frac{K(r)}{p(r; r, q)}\right] = \frac{E[K(r)]}{p(r; r, q)}.$$

The second statement follows easily from the expression for $p(r; r, q)$ as given in the text in equation 2, where $\gamma(n) = \sqrt{\pi}\,\Gamma\!\left(\frac{n+1}{2}\right)\big/\Gamma\!\left(\frac{n}{2}\right)$. Noting that $F\!\left(\frac{1}{2}, \frac{n-1}{2}; \frac{n+1}{2}; \frac{r^2}{q^2}\right) \ge 1$, we have

$$E[N(r)] \le \frac{q}{r}\,\gamma(n)\, E[K(r)].$$

Now for large $n$ we have $\Gamma\!\left(\frac{n+1}{2}\right)\big/\Gamma\!\left(\frac{n}{2}\right) \sim \sqrt{n/2}$ (see [6]), yielding $\gamma(n) = O(\sqrt{n})$. □

Lemma 3.4 For any spherical program (P), the conditional probability for improving points for IHR can be expressed as

$$P\bigl(R_{k+1} \le s \mid R_k = r, R_0 = q\bigr) = \frac{p(s; r, q)}{p(r; r, q)} \ge \frac{1}{n}\left(\frac{s}{r}\right)^{n}$$

for $0 < s \le r \le q$, and for $k = 0, 1, 2, \ldots$.

Proof:

$$P\bigl(R_{k+1} \le s \mid R_k = r, R_0 = q\bigr) = P\bigl(Y_{j+1} \le h(s) \mid Y_j = h(r), Y_0 = h(q)\bigr) + P\bigl(Y_{j+2} \le h(s), Y_{j+1} \ge h(r) \mid Y_j = h(r), Y_0 = h(q)\bigr) + \cdots$$
$$= p(s; r, q) + p(s; r, q)\bigl(1 - p(r; r, q)\bigr) + \cdots = p(s; r, q)\sum_{i=0}^{\infty}\bigl(1 - p(r; r, q)\bigr)^{i} = \frac{p(s; r, q)}{p(r; r, q)}.$$

From Lemma 3.2 and equation 2 we have

$$p(s; r, q) = \frac{r}{q}\left(\frac{s}{r}\right)^{n}\frac{1}{n}\, F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{s^2}{q^2}, \frac{s^2}{r^2}\right)$$

and

$$p(r; r, q) = \frac{r}{q}\,\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}\, F\!\left(\frac{1}{2}, \frac{n-1}{2}; \frac{n+1}{2}; \frac{r^2}{q^2}\right).$$

It is easy to see from the definitions that $F(a, b; c; z)$ is increasing in $z$ for $z \ge 0$ when $a, b, c > 0$. Also, $F_1(a, b, c, d; x, y)$ is increasing in both $x$ and $y$ for $x, y \ge 0$ when $a, b, c, d > 0$. Thus,

$$F\!\left(\frac{1}{2}, \frac{n-1}{2}; \frac{n+1}{2}; \frac{r^2}{q^2}\right) \le F\!\left(\frac{1}{2}, \frac{n-1}{2}; \frac{n+1}{2}; 1\right) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma(1)}$$

from [10] 9.122.1 pp. 1042, for $0 \le r/q \le 1$. Similarly,

$$F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{s^2}{q^2}, \frac{s^2}{r^2}\right) \ge F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; 0, 0\right) = 1.$$

Hence

$$\frac{p(s; r, q)}{p(r; r, q)} = \left(\frac{s}{r}\right)^{n}\frac{1}{n}\,\frac{\Gamma\!\left(\frac{n+1}{2}\right)\Gamma\!\left(\frac{1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}\,\frac{F_1\!\left(\frac{n-1}{2}, \frac{1}{2}, \frac{1}{2}, \frac{n+2}{2}; \frac{s^2}{q^2}, \frac{s^2}{r^2}\right)}{F\!\left(\frac{1}{2}, \frac{n-1}{2}; \frac{n+1}{2}; \frac{r^2}{q^2}\right)} \ge \frac{1}{n}\left(\frac{s}{r}\right)^{n}. \qquad \square$$

Theorem 3.5 For any spherical program (P), the expected number of IHR improving points needed to achieve an objective function value of $h(r)$ or better is bounded by

$$E\!\left[K(r)^{IHR}\right] \le n\, E\!\left[K(r)^{PAS}\right] = O(n^2)$$

for $0 < r \le q$.

Proof: From Lemma 3.4 we have a bound on the conditional probability for improving points for IHR: for $0 < s \le r \le q$,

$$P\!\left(R_{k+1}^{IHR} \le s \,\Big|\, R_k^{IHR} = r\right) \ge \frac{1}{n}\left(\frac{s}{r}\right)^{n}.$$

Similarly, for pure adaptive search (PAS) we have [28],

$$P\!\left(R_{k+1}^{PAS} \le s \,\Big|\, R_k^{PAS} = r\right) = \left(\frac{s}{r}\right)^{n}.$$

Now we define the following intermediate algorithm, called A.

Algorithm A
Step 0. Initialize $X_0 \in S$, and set $k = 0$.
Step 1. With probability $1/n$, choose $X_{k+1}$ uniformly from the set $\{x : x \in S \text{ and } f(x) < f(X_k)\}$. Otherwise, set $X_{k+1} = X_k$.

Step 2. Increment $k$, and return to Step 1.

Algorithm A performs a PAS step with probability $1/n$. Thus for $0 < s < r \le q$ we have

$$P\!\left(R_{k+1}^{A} \le s \,\Big|\, R_k^{A} = r\right) = \frac{1}{n}\left(\frac{s}{r}\right)^{n},$$

and further,

$$P\!\left(R_{k+1}^{A} \le s \,\Big|\, R_k^{A} = r\right) \le P\!\left(R_{k+1}^{IHR} \le s \,\Big|\, R_k^{IHR} = r\right).$$

We have defined $E[K(r)]$ to be the expected number of improving points needed to achieve $R_k \le r$, and we now extend the definition to the three algorithms by adding a superscript, giving $E[K(r)^{IHR}]$, $E[K(r)^{PAS}]$, and $E[K(r)^{A}]$. With the above comments, we have

$$E\!\left[K(r)^{IHR}\right] \le E\!\left[K(r)^{A}\right] = n\, E\!\left[K(r)^{PAS}\right] = O(n^2). \qquad \square$$

B Spherical Coordinates

Here we summarize spherical coordinates in $n$ dimensions as given in [26], pp. 227-228. We include only those details that are necessary for the proof of Lemma 3.2. Let $x_1, x_2, \ldots, x_n$ denote the Cartesian coordinates and let $\rho, \theta, \phi_1, \ldots, \phi_{n-2}$ denote the spherical coordinates. For $n \ge 3$, the relationships are given by

$$x_1 = \rho\cos\theta$$
$$x_2 = \rho\sin\theta\cos\phi_1$$
$$x_3 = \rho\sin\theta\sin\phi_1\cos\phi_2$$
$$\vdots$$
$$x_{n-1} = \rho\sin\theta\sin\phi_1\sin\phi_2\cdots\sin\phi_{n-3}\cos\phi_{n-2}$$
$$x_n = \rho\sin\theta\sin\phi_1\sin\phi_2\cdots\sin\phi_{n-3}\sin\phi_{n-2}.$$

Note that $\rho = \|x\|$ and that $\theta$ is the angle between $x$ and the positive $x_1$ axis. In order to cover the whole space $-\infty < x_i < +\infty$ exactly once, we need $0 \le \rho < +\infty$; $0 \le \theta \le \pi$; $0 \le \phi_i \le \pi$ for $i = 1, \ldots, n-3$; and $-\pi \le \phi_{n-2} \le \pi$. The surface element of the unit sphere is given by $d\omega = w(\theta, \phi_1, \ldots, \phi_{n-2})\, d\theta\, d\phi_1 \cdots d\phi_{n-2}$, where

$$w(\theta, \phi_1, \ldots, \phi_{n-2}) = \sin^{n-2}(\theta)\prod_{k=1}^{n-3}\sin^{n-2-k}(\phi_k)$$

for $n \ge 3$.
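The change of coordinates summarized in this appendix can be sketched directly in code. The helper below is an illustration added here, not part of the original report, and its function names are our own; it maps $(\rho, \theta, \phi_1, \ldots, \phi_{n-2})$ to $(x_1, \ldots, x_n)$ and evaluates the surface-element weight $w$. The identity $\rho = \|x\|$ gives a simple consistency check.

```python
import math

def spherical_to_cartesian(rho, theta, phis):
    """Map (rho, theta, phi_1, ..., phi_{n-2}) to (x_1, ..., x_n),
    following the relationships of Appendix B; n = len(phis) + 2."""
    angles = [theta] + list(phis)
    x = []
    prefix = rho  # accumulates rho * sin(theta) * sin(phi_1) * ...
    for a in angles:
        x.append(prefix * math.cos(a))
        prefix *= math.sin(a)
    x.append(prefix)  # last coordinate ends in sin(phi_{n-2}), not cos
    return x

def weight(theta, phis):
    """Surface-element weight w(theta, phi_1, ..., phi_{n-2})
    = sin^{n-2}(theta) * prod_{k=1}^{n-3} sin^{n-2-k}(phi_k)."""
    n = len(phis) + 2
    w = math.sin(theta) ** (n - 2)
    for k, phi in enumerate(phis, start=1):
        w *= math.sin(phi) ** (n - 2 - k)
    return w

# Example in n = 4 dimensions: the Euclidean norm of x recovers rho.
x = spherical_to_cartesian(2.0, 0.7, [1.1, 2.3])
print(x, math.sqrt(sum(c * c for c in x)))
```

For $n = 3$ this reduces to the familiar spherical coordinates with weight $\sin\theta$, matching the surface element of the unit sphere used in the proof of Lemma 3.2.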

Figure 1: Illustration of nested level sets for proof of Lemma 3.2 (cross section showing the positive $x_1$ axis, the tangent angle $\theta_0$, and spheres of radii $s$, $r$, and $q$).
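As a closing illustration (added here; it is not part of the original report), the IHR iteration analyzed above can be sketched for a spherical program: $f(x) = \|x\|^2$ over the feasible ball $S = \{x : \|x\| \le q\}$ with $H = I$. Each iteration draws a direction uniformly on the unit sphere, samples a candidate uniformly from the feasible line set $\Lambda_j$, and accepts it only if it improves the objective. All names and parameter values below are our own choices.

```python
import math
import random

def ihr_spherical(n, q, num_samples, rng):
    """One run of Improving Hit-and-Run on f(x) = ||x||^2 over the
    ball {x : ||x|| <= q}. Returns the objective values of the
    iterates, which are nonincreasing by construction."""
    x = [q / math.sqrt(n)] * n          # feasible starting point on the boundary
    values = [sum(c * c for c in x)]
    for _ in range(num_samples):
        # Step 1: direction uniform on the unit sphere (normalized Gaussian).
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # Step 2: the line set {lambda : x + lambda d in S} is the interval
        # whose endpoints solve ||x + lambda d||^2 = q^2; sample uniformly.
        xd = sum(a * b for a, b in zip(x, d))
        xx = sum(c * c for c in x)
        disc = math.sqrt(max(xd * xd - xx + q * q, 0.0))
        lam = rng.uniform(-xd - disc, -xd + disc)
        cand = [a + lam * b for a, b in zip(x, d)]
        # Accept the candidate only if it improves the objective.
        if sum(c * c for c in cand) < values[-1]:
            x = cand
        values.append(sum(c * c for c in x))
    return values

# Example run in dimension 3 with a fixed seed.
vals = ihr_spherical(n=3, q=1.0, num_samples=2000, rng=random.Random(0))
print(vals[0], vals[-1])
```

The recorded trajectory is monotone nonincreasing, consistent with the improving-point process $R_0 > R_1 > \cdots$ studied in Theorems 3.3 and 3.5.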