Sampling through Random Walks*

H. Edwin Romeijn
Department of Operations Research & Tinbergen Institute
Erasmus University Rotterdam
P.O. Box 1738, 3000 DR Rotterdam, The Netherlands

Robert L. Smith
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, Michigan 48109-2117

August 1989
Revised December 1989

*This material is based on work supported by a NATO Collaborative Research Grant, no. 0119/89.

1 Introduction

In this paper we will discuss the following problem: given a bounded set $S \subset \mathbb{R}^n$, how can we efficiently sample points $X^1, X^2, \ldots$ which are distributed according to a probability density function $f$ over $S$? That is, we require that the probability that $X \in A \subset S$ be given by $\int_A f(x)\,dx =: F(A)$.

In [Smith, 1984] a class of symmetric mixing algorithms (later called "hit-and-run algorithms", see [Berbee et al., 1987]) was introduced, which generate a sequence of points in a bounded, open region $S$, having the property that the limiting distribution of this sequence is the uniform distribution, independent of the initial point. In [Boender et al., 1989] a class of "shake-and-bake" algorithms is discussed, which generate a sequence of points on the boundary of a polytope with a uniform limiting distribution. Recently this class of algorithms has been extended to the boundary of arbitrary bounded convex bodies [McDonald, 1989]. We will generalize both classes of algorithms mentioned above to obtain a limiting distribution given by $F$. Conditions on $F$ (or $f$) will be given for this to be possible.

Possible applications of these algorithms include the estimation of volumes or surface areas. For example, the surface area of engine components is important to General Motors for estimating heat loss over time. These methods can also be used for Monte Carlo integration, which is important in Bayesian statistics when the dimension of the problem is too high for numerical integration to be efficient. A third possible application is in global optimization; we will study this application in this paper.

The outline of the paper is as follows. In section 2 we will introduce a general class of Random Walks, give some examples, and give conditions under which a Random Walk has a limiting distribution given by a density function $f$. In section 3 we will define a class of "hide-and-seek" algorithms for global optimization.

2 Random Walks

2.1 The general class

Consider a bounded set $S \subset \mathbb{R}^n$. The general random walk can be described as follows (a schematic implementation is sketched after the steps).

Step 0: Choose a starting point $x^0 \in S$, and set $i := 0$.

Step 1: Generate a direction $d^i$ from an arbitrary probability distribution over a direction set $D$. Find the set $\Lambda_i = \{\lambda \in \mathbb{R} : x^i + \lambda d^i \in S\}$.

Step 2: Generate a real number $\lambda^i$ from an arbitrary probability distribution over the set $\Lambda_i$.

Step 3: Set $x^{i+1} := x^i + \lambda^i d^i$ and set $i := i + 1$. Go to Step 1.
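The loop in Steps 0 to 3 can be rendered schematically as follows. This Python sketch is ours and not part of the paper: the direction and step-length distributions are left as user-supplied functions, and all names are illustrative.

```python
import numpy as np

def random_walk(x0, sample_direction, sample_lambda, n_steps, rng=None):
    """Generic Random Walk of section 2.1 (illustrative sketch).

    sample_direction(x, rng) -> a direction d in the direction set D.
    sample_lambda(x, d, rng) -> a step length lambda with x + lambda*d in S.
    The choice of these two distributions determines the particular walk.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        d = sample_direction(x, rng)      # Step 1: direction d^i over D
        lam = sample_lambda(x, d, rng)    # Step 2: lambda^i over Lambda_i
        x = x + lam * d                   # Step 3: x^{i+1} = x^i + lambda^i * d^i
        path.append(x.copy())
    return np.array(path)
```

Each Random Walk discussed below corresponds to a particular choice of these two distributions.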

Clearly, the random sequence of points $\{X^i\}_{i=0}^{\infty}$ constitutes a continuous-state homogeneous Markov chain. We assume that the transition density function $p(y|x)$ corresponding to this chain exists. Furthermore, we assume that there exists a continuous density function $f$ over $S$ such that
\[
p(y|x)\, f(x) = p(x|y)\, f(y) \qquad \text{for all } x, y \in S.
\]
Algorithms satisfying these assumptions will be called Random Walks.

An example is the hypersphere directions (HD) hit-and-run algorithm, discussed in [Boneh & Golan, 1979], [Smith, 1980], [Smith, 1984] and [Berbee et al., 1987]. In this algorithm the distributions in Steps 1 and 2 are chosen to be the uniform distribution, and the direction set $D$ is the unit hypersphere. The density $f$ is the uniform density over $S$.
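As a concrete, hypothetical instantiation of the HD hit-and-run choices, the following sketch (ours) draws a uniform direction on the unit hypersphere and a uniform step length over the chord, for the special case where $S$ is an open axis-aligned box so that the chord can be computed in closed form.

```python
import numpy as np

def hd_hit_and_run_step(x, lower, upper, rng):
    """One HD hit-and-run step on the open box S = (lower, upper) (sketch).

    Direction: uniform on the unit hypersphere.
    Step length: uniform on the chord Lambda = {lam : x + lam*d in S}.
    """
    x = np.asarray(x, dtype=float)
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)                   # uniform direction on the unit sphere
    with np.errstate(divide="ignore"):       # chord endpoints lam_min < 0 < lam_max
        t1, t2 = (lower - x) / d, (upper - x) / d
    lam_min = np.max(np.minimum(t1, t2))
    lam_max = np.min(np.maximum(t1, t2))
    return x + rng.uniform(lam_min, lam_max) * d
```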

2.2 Examples

In the following we will assume that a continuous density function $f$ on $S$ is given.

For Random Walk 1 the region $S$ must be open. Furthermore, the direction set $D$ is the unit hypersphere, and a direction is drawn from the uniform distribution over $D$. We choose the density of $\lambda^i$ to be the marginalization of $f$ on the line set defined by $\Lambda_i$:
\[
f_i(\lambda) = \frac{f(x^i + \lambda d^i)}{\int_{\Lambda_i} f(x^i + \mu d^i)\, d\mu} \qquad \text{for } \lambda \in \Lambda_i.
\]

Theorem 2.1 The transition density function corresponding to Random Walk 1 reads
\[
p_1(y|x) = \frac{2}{C_n \|x - y\|^{n-1}} \cdot \frac{f(y)}{\int_{\Lambda(x,y)} f\!\left(x + \lambda \frac{y - x}{\|y - x\|}\right) d\lambda} \qquad \text{for all } x \neq y \in S,
\]
where $C_n$ is the surface area of the $n$-dimensional unit hypersphere, and $\Lambda(x,y) = \{\lambda \in \mathbb{R} : x + \lambda \frac{y - x}{\|y - x\|} \in S\}$.

Proof For all $y \in S$, $y \neq x$, the transition density function can be defined as
\[
p_1(y|x) = \lim_{\delta \downarrow 0} \frac{\Pr\{X^1 \in H_\delta(y) \mid X^0 = x\}}{m_n(H_\delta(y))},
\]
where $H_\delta(y)$ is an $n$-dimensional cube of content $\delta^n$ centered at $y$ and oriented along the ray from $x$ to $y$, and where $m_n(\cdot)$ denotes the $n$-dimensional Lebesgue measure. We have
\[
p_1(y|x) = \lim_{\delta \downarrow 0} \frac{\Pr\{X^1 \in H_\delta(y) \mid X^0 = x\}}{m_n(H_\delta(y))}
= \lim_{\delta \downarrow 0} \frac{1}{\delta^n} \cdot \frac{2\delta^{n-1}}{C_n \|x - y\|^{n-1}} \cdot \frac{f(y)\,\delta}{\int_{\Lambda(x,y)} f\!\left(x + \lambda \frac{y - x}{\|y - x\|}\right) d\lambda}
= \frac{2}{C_n \|x - y\|^{n-1}} \cdot \frac{f(y)}{\int_{\Lambda(x,y)} f\!\left(x + \lambda \frac{y - x}{\|y - x\|}\right) d\lambda},
\]
which proves the theorem. ∎
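Random Walk 1 only changes the distribution of the step length: $\lambda$ is drawn with density proportional to $f$ along the chord. The sketch below (ours, not from the paper) approximates this marginal by a discretized inverse-CDF on a grid over the chord; the grid size and all names are illustrative choices.

```python
import numpy as np

def sample_lambda_marginal(x, d, f, lam_min, lam_max, rng, grid=2048):
    """Draw lambda with density proportional to f(x + lambda*d) on [lam_min, lam_max],
    a crude discretized approximation of Random Walk 1's step-length distribution."""
    lams = np.linspace(lam_min, lam_max, grid)
    w = np.array([f(x + lam * d) for lam in lams])   # unnormalized density values
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    return lams[np.searchsorted(cdf, rng.random())]  # inverse-CDF on the grid
```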

In Random Walk 2 the class of regions $S$, the direction set $D$ and the distribution of the direction are the same as for Random Walk 1. The density of $\lambda^i$ is a mixed discrete/continuous one: writing $y = x^i + \lambda d^i$,
\[
\Pr\{\lambda^i = 0\} = 1 - \beta(y|x) \qquad \text{and} \qquad f_i(\lambda) = \frac{\beta(y|x)}{d(x,y)} \quad \text{for } \lambda \in \Lambda_i,
\]
where $d(x,y)$ is the diameter of $S$ along the line through $x$ and $y$, and $\beta(y|x)$ is the so-called move probability function (see [Boender et al., 1989]), which satisfies the following conditions:
\[
0 < \beta(y|x) \le 1 \qquad \text{for all } x, y \in S,
\]
\[
\beta(y|x)\, f(x) = \beta(x|y)\, f(y) \qquad \text{for all } x, y \in S.
\]

Theorem 2.2 The transition density function corresponding to Random Walk 2 reads
\[
p_2(y|x) = \frac{2}{C_n \|x - y\|^{n-1}} \cdot \frac{\beta(y|x)}{d(x,y)} \qquad \text{for } x \neq y \in S,
\]
and
\[
p_2(x|x) := \Pr\{X^1 = x \mid X^0 = x\} = 1 - \int_S p_2(y|x)\, dy \qquad \text{for all } x \in S.
\]

Proof Analogous to the proof of Theorem 2.1. ∎
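One way to read the mixed discrete/continuous step distribution of Random Walk 2 is: draw a candidate $y$ uniformly on the chord through $x$ and accept it with probability $\beta(y|x)$, staying at $x$ otherwise. The sketch below implements one such step for a box-shaped $S$ with a user-supplied move probability function; this reading and the box are our own illustrative assumptions.

```python
import numpy as np

def random_walk_2_step(x, lower, upper, beta, rng):
    """One Random Walk 2 step on the open box S = (lower, upper) (sketch).
    beta(y, x) plays the role of the move probability function beta(y|x)."""
    x = np.asarray(x, dtype=float)
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)
    with np.errstate(divide="ignore"):
        t1, t2 = (lower - x) / d, (upper - x) / d
    lam_min = np.max(np.minimum(t1, t2))
    lam_max = np.min(np.maximum(t1, t2))
    y = x + rng.uniform(lam_min, lam_max) * d      # candidate, uniform over the chord
    return y if rng.random() < beta(y, x) else x   # accept with probability beta(y|x)
```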

In Random Walk 3, $S$ is the boundary of a full-dimensional polytope. The direction set $D$ is, once again, equal to the unit hypersphere. The density of the direction is expressed in terms of the "hitpoint" $y$, which is the intersection of $S$ with the line through a point $x \in S$ in a certain direction, and is denoted by $p(y|x)$; we assume that this density is absolutely continuous. The density of $\lambda^i$ is again a mixed discrete/continuous one:
\[
\Pr\{\lambda^i = 0\} = 1 - \beta(y|x) \qquad \text{and} \qquad \Pr\{\lambda^i = \|y - x\|\} = \beta(y|x).
\]
Note that in this case $\Lambda_i = \Lambda(x,y) = \{0, \|y - x\|\}$. Furthermore, $\beta(y|x)$ is the move probability function, satisfying the same conditions as in Random Walk 2.

Theorem 2.3 The transition density function corresponding to Random Walk 3 reads
\[
p_3(y|x) = \beta(y|x)\, p(y|x) \qquad \text{for } x \neq y \in S,
\]
and
\[
p_3(x|x) := \Pr\{X^1 = x \mid X^0 = x\} = 1 - \int_S p_3(y|x)\, dy \qquad \text{for all } x \in S.
\]

Proof See [Boender et al., 1988] and [Boender et al., 1989]. ∎

2.3 Limiting density

Lemma 2.4 If the transition density function $p$ of a Random Walk on a set $S$ satisfies
\[
p(y|x)\, f(x) = p(x|y)\, f(y) \qquad \text{for all } x \neq y \in S,
\]
where $f$ is a continuous density on $S$, then $f$ is a stationary density of the Markov chain $\{X^i\}_{i=0}^{\infty}$, i.e.
\[
\int_S \Pr\{X^{i+1} \in A \mid X^i = x\}\, f(x)\, dx = F(A) \qquad \text{for all } A \subset S.
\]

Proof
\[
\int_S \Pr\{X^{i+1} \in A \mid X^i = x\}\, f(x)\, dx
= \int_S \int_A p(y|x)\, f(x)\, dy\, dx + \int_S \Pr\{X^{i+1} = x \mid X^i = x\}\, 1_A(x)\, f(x)\, dx
\]
\[
= \int_S \int_A p(x|y)\, f(y)\, dy\, dx + \int_A \left(1 - \int_S p(y|x)\, dy\right) f(x)\, dx
\]
\[
= \int_A \int_S p(x|y)\, f(y)\, dx\, dy + \int_A f(x)\, dx - \int_A \int_S p(y|x)\, f(x)\, dy\, dx
= \int_A f(x)\, dx = F(A). \qquad ∎
\]

Theorem 2.5 If $f$ is a stationary density of a Random Walk, and the absolutely continuous component of the transition density function is uniformly bounded away from zero, then $f$ is the limiting density of the Markov chain generated by the Random Walk.

Proof Since the absolutely continuous component of the transition density function $p$ is uniformly bounded away from zero, the stationary density $f$ is unique, and the limiting density of the corresponding Markov chain is equal to $f$ (see [Doob, 1953]). ∎
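As a toy illustration of Lemma 2.4 (ours, not from the paper), the discrete analogue is easy to check numerically: a transition matrix built to satisfy detailed balance with respect to a probability vector $f$ leaves $f$ invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6
f = rng.random(m)
f /= f.sum()                                 # target distribution on m states

# Metropolis-style kernel: propose j uniformly, accept with min(1, f[j]/f[i]).
P = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            P[i, j] = min(1.0, f[j] / f[i]) / (m - 1)
    P[i, i] = 1.0 - P[i].sum()               # "stay" probability (the atom)

assert np.allclose(f[:, None] * P, (f[:, None] * P).T)  # detailed balance
assert np.allclose(f @ P, f)                             # f is stationary
```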

2.4 Examples (continued)

In this section we will derive more specific conditions on the Random Walks described in section 2.2 such that these algorithms have a limiting density given by $f$.

Theorem 2.6 If
\[
f_{\min} := \inf_{x \in S} f(x) > 0 \qquad \text{and} \qquad f_{\max} := \sup_{x \in S} f(x) < \infty,
\]
then the limiting density of the Markov chain generated by Random Walk 1 is $f$.

Proof Under the conditions mentioned in the theorem the transition density function $p_1$ is uniformly bounded away from zero:
\[
p_1(y|x) = \frac{2}{C_n \|x - y\|^{n-1}} \cdot \frac{f(y)}{\int_{\Lambda(x,y)} f\!\left(x + \lambda \frac{y - x}{\|y - x\|}\right) d\lambda}
\ge \frac{2}{C_n r_S^{n-1}} \cdot \frac{f_{\min}}{r_S\, f_{\max}} > 0,
\]
where $r_S < \infty$ is the diameter of $S$:
\[
r_S = \sup_{x, y \in S} \|x - y\|.
\]
Now use Theorem 2.5. ∎

Theorem 2.7 Suppose the conditions in Theorem 2.6 hold. Possible choices for the move probability function $\beta$, such that Random Walk 2 generates a sequence of points having a limiting density given by $f$, are:
\[
\beta(y|x) = \frac{f(y)}{f_{\max}}, \qquad
\beta(y|x) = \frac{f_{\min}}{f(x)}, \qquad
\beta(y|x) = \frac{f(y)}{f(x) + f(y)}, \qquad
\beta(y|x) = \min\{1, f(y)/f(x)\}.
\]

Proof The absolutely continuous component of the transition density function $p_2$ is uniformly bounded away from zero:
\[
p_2(y|x) = \frac{2}{C_n \|x - y\|^{n-1}} \cdot \frac{\beta(y|x)}{d(x,y)}
\ge \frac{2}{C_n r_S^{n-1}} \cdot \frac{f_{\min}/(2 f_{\max})}{r_S} > 0.
\]
Now use Theorem 2.5. ∎
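The following snippet (ours) checks numerically, at randomly drawn pairs of function values, that each of the move probability functions listed in Theorem 2.7 lies in $(0, 1]$ and satisfies the reversibility condition $\beta(y|x) f(x) = \beta(x|y) f(y)$; the test density and the sample points are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                                    # an arbitrary positive, bounded test density kernel
    return 1.0 + np.sin(x).sum() ** 2

fvals = np.array([f(x) for x in rng.uniform(-1.0, 1.0, size=(100, 3))])
f_min, f_max = fvals.min(), fvals.max()      # stand-ins for inf and sup of f over S

betas = [                                    # beta(y|x) written as a function of (f(y), f(x))
    lambda fy, fx: fy / f_max,
    lambda fy, fx: f_min / fx,
    lambda fy, fx: fy / (fx + fy),
    lambda fy, fx: min(1.0, fy / fx),
]

for beta in betas:
    for _ in range(1000):
        fx, fy = rng.choice(fvals, size=2)
        assert 0.0 < beta(fy, fx) <= 1.0
        assert np.isclose(beta(fy, fx) * fx, beta(fx, fy) * fy)   # reversibility
```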

Consider the HD algorithm (see section 2.1), and choose $S$ to be a strip of width $\epsilon$ around the boundary of a polytope. Letting $\epsilon$ go to zero gives as a "natural" choice for the acceptance probability function $\beta$ of Random Walk 3:
\[
\beta(y|x) = \frac{1/\cos\varphi_y}{1/\cos\varphi_x + 1/\cos\varphi_y},
\]
where $\varphi_x$ and $\varphi_y$ denote the angles between the direction vector and the normals to the boundary at $x$ and at the hitpoint $y$, respectively. It is possible to prove that Random Walk 3 with this acceptance probability function generates a uniform limiting distribution on the boundary of a polytope if the direction vector is uniformly distributed over the unit hypersphere (see [Boender et al., 1989]). If, instead of a strip of width $\epsilon$, we choose $S$ to be a strip of width $\epsilon f(x)$ around the boundary of a polytope, letting $\epsilon$ go to zero again gives the following choice for the acceptance probability function:
\[
\beta_f(y|x) = \frac{f(y)/\cos\varphi_y}{f(x)/\cos\varphi_x + f(y)/\cos\varphi_y}.
\]

Theorem 2.8 Suppose the conditions in Theorem 2.6 hold. Then the Random Walk 3 algorithm with the direction vector uniformly distributed on the unit hypersphere and acceptance probability function $\beta_f$ generates a limiting density on $S$ given by $f$.

Proof Using the result from [Boender et al., 1989] for the case where $f$ is the uniform density, and using Theorem 2.5, the result follows immediately. ∎

In [Boender et al., 1989] a class of so-called "SB algorithms" is discussed. These algorithms generate a sequence of points on the boundary of a full-dimensional polytope having a uniform limiting distribution. In the following theorem we consider an element of this class of algorithms, characterized by the density of the "hitpoints" $p(y|x)$ and the move probability function $\beta(y|x)$, as in Random Walk 3. We modify the move probability function in such a way that the resulting version of Random Walk 3 has a limiting density given by $f$.

Theorem 2.9 Suppose the conditions in Theorem 2.6 hold. Furthermore, suppose an SB algorithm is characterized by the density of the hitpoints $p$ and the move probability function $\beta$. Then the Random Walk 3 algorithm characterized by the same hitpoint density, and by a move probability function given by $\gamma(f(y) \mid f(x), f_{\min}, f_{\max}) \cdot \beta(y|x)$, where $\gamma$ is such that
\[
0 < \delta \le \gamma(f(y) \mid f(x), f_{\min}, f_{\max}) \le 1 \qquad \text{for all } x, y \in S,
\]
\[
\gamma(f(y) \mid f(x), f_{\min}, f_{\max})\, f(x) = \gamma(f(x) \mid f(y), f_{\min}, f_{\max})\, f(y) \qquad \text{for all } x, y \in S,
\]
has a limiting density given by $f$.

Proof Using the result from [Boender et al., 1989] for the case where $f$ is the uniform density, and using Theorem 2.5, the result follows immediately. ∎
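A rough sketch of one Random Walk 3 step with the acceptance probability $\beta_f$ of Theorem 2.8 is given below for a polytope described by $Az \le b$. The hitpoint computation, the reflection of the direction into the inward hemisphere, and the facet bookkeeping are our own conventions, ignore degenerate cases, and are meant only to show the structure of the step.

```python
import numpy as np

def random_walk_3_step(x, facet, A, b, f, rng):
    """One Random Walk 3 step on the boundary of {z : A z <= b} (sketch).
    x lies on the facet with index `facet`; returns the next point and its facet."""
    x = np.asarray(x, dtype=float)
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)                    # uniform direction on the unit sphere
    a_x = A[facet]
    if a_x @ d > 0:                           # reflect so the ray points into the polytope
        d = -d
    num, den = b - A @ x, A @ d               # distances to the facets along the ray
    lam = np.full(len(b), np.inf)
    mask = den > 1e-12
    lam[mask] = num[mask] / den[mask]
    lam[facet] = np.inf                       # exclude the facet we start on
    hit = int(np.argmin(lam))
    y = x + lam[hit] * d                      # the hitpoint on the boundary
    cos_x = abs(a_x @ d) / np.linalg.norm(a_x)
    cos_y = abs(A[hit] @ d) / np.linalg.norm(A[hit])
    beta_f = (f(y) / cos_y) / (f(x) / cos_x + f(y) / cos_y)
    if rng.random() < beta_f:                 # accept the hitpoint ...
        return y, hit
    return x, facet                           # ... or stay at x
```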

3 Hide-and-seek algorithms

Consider the following mathematical programming problem:
\[
\max_{x \in S} f(x)
\]
where

1. $S \subset \mathbb{R}^n$ is full-dimensional and compact;

2. $f$ is twice continuously differentiable.

It is well known that the number of iterations necessary to get within $\epsilon$ of the optimal solution increases exponentially in the dimension $n$ of the problem if we use Pure Random Search to approximate the optimal solution of this problem (see [Dixon & Szegö, 1978], [Rinnooy Kan & Timmer, 1984]). In this section we will discuss algorithms which generate points from a density function which is a positive, nondecreasing transformation of the objective function $f$. We will call these hide-and-seek algorithms.

Using the methods discussed in this paper we can generate points in the interior of $S$ which are asymptotically distributed according to some density function defined on $S$. [Rubinstein, 1981] suggests that points should be generated from the density
\[
g(x; \lambda) \propto e^{\lambda f(x)} \qquad (1)
\]
for large values of $\lambda$. [Pincus, 1968] showed that the expected value of a random variable defined on $S$ with this density converges (for $\lambda \to \infty$) to the point at which the global maximum of $f$ on $S$ is attained, under the condition that this global maximum is unique. We conjecture that the distribution of this random variable converges, for $\lambda \to \infty$, to a (degenerate) distribution which is uniform over the set of global maxima. Note also that simulated annealing algorithms, which have mainly been used for discrete optimization purposes, generate a sequence of points having the same limiting distribution (see e.g. [Aarts & Korst, 1988]). So, in this sense, our class of hide-and-seek algorithms subsumes simulated annealing; our parameter $\lambda$ corresponds to the reciprocal of the temperature used in simulated annealing.

For a given value of $\lambda$, the probability that a point generated according to the density (1) lies within a ball with radius $\epsilon$ around a global optimum $x^*$ (denoted by $B_\epsilon(x^*)$) is given by
\[
\Pr\{X \in B_\epsilon(x^*)\} = \frac{\int_{B_\epsilon(x^*)} e^{\lambda f(x)}\, dx}{\int_S e^{\lambda f(x)}\, dx}.
\]
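Before analyzing how $\lambda$ must grow with $n$ and $\epsilon$, here is a minimal sketch (ours) of a hide-and-seek optimizer: Random Walk 2 on a box with the Metropolis move probability of Theorem 2.7 applied to the density (1), with $\lambda$ increased geometrically. The box-shaped feasible region and the schedule for $\lambda$ are purely illustrative assumptions.

```python
import numpy as np

def hide_and_seek(f, lower, upper, n_iter, lam0=1.0, growth=1.01, rng=None):
    """Maximize f over the box [lower, upper] by hit-and-run sampling from
    g(x; lam) proportional to exp(lam*f(x)), with a Metropolis move probability
    and a geometrically increasing lam (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper)
    best_x, best_f, lam = x.copy(), f(x), lam0
    for _ in range(n_iter):
        d = rng.standard_normal(len(x))
        d /= np.linalg.norm(d)
        with np.errstate(divide="ignore"):
            t1, t2 = (lower - x) / d, (upper - x) / d
        y = x + rng.uniform(np.max(np.minimum(t1, t2)), np.min(np.maximum(t1, t2))) * d
        fx, fy = f(x), f(y)
        if fy >= fx or rng.random() < np.exp(lam * (fy - fx)):   # beta = min{1, g(y)/g(x)}
            x = y
            if fy > best_f:
                best_x, best_f = x.copy(), fy
        lam *= growth
    return best_x, best_f
```

For example, hide_and_seek(lambda x: -(x**2).sum(), -np.ones(5), np.ones(5), 20000) should return a point close to the origin.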

We are now interested in how the value of $\lambda$ changes with the dimension of the problem and the precision parameter $\epsilon$, if we want this probability to be at least $p$.

We first analyze this problem for the case $f(x) = -x'x$. The density function $g$ then corresponds to a (truncated) normal density. Now suppose that $x^* = 0$, the global optimum on $\mathbb{R}^n$, is an interior point of $S$, and define $B_\epsilon = B_\epsilon(0)$. Then we have
\[
\Pr\{X \in B_\epsilon\} \ge \frac{\int_{B_\epsilon} e^{-\lambda x'x}\, dx}{\int_{\mathbb{R}^n} e^{-\lambda x'x}\, dx}
\]
for $\epsilon$ small enough. If we use this lower bound on the probability, we are in fact assuming that the elements $X_i$ of $X$ are i.i.d. according to a normal distribution with mean 0 and variance $1/(2\lambda)$. Performing the transformation $Y_i = \sqrt{2\lambda}\, X_i$ gives
\[
\Pr\{X \in B_\epsilon\} = \Pr\{Y \in B_{\epsilon\sqrt{2\lambda}}\} = \Pr\{\|Y\|^2 \le 2\lambda\epsilon^2\}.
\]
We know that the distribution of $\|Y\|^2$ is $\chi^2(n)$, and so this probability is equal to $p \in (0,1)$ if $2\lambda\epsilon^2 = y(p; n)$ or, equivalently, if
\[
\lambda = \frac{y(p; n)}{2\epsilon^2},
\]
where $y(p; n)$ is chosen such that $\Pr\{\|Y\|^2 \le y(p; n)\} = p$.
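For concreteness, the exact requirement $\lambda = y(p; n)/(2\epsilon^2)$ can be evaluated numerically; the use of scipy.stats.chi2 below is our choice of tool and not part of the paper.

```python
from scipy.stats import chi2

def lambda_required(p, n, eps):
    """Smallest lambda with the normal lower bound on Pr{X in B_eps} equal to p,
    for f(x) = -x'x, using the chi-square quantile y(p; n) = chi2.ppf(p, n)."""
    return chi2.ppf(p, n) / (2.0 * eps**2)

# e.g. lambda_required(0.9, 100, 0.1) is on the order of 6e3
```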

Using the central limit theorem, we can for large $n$ approximate $y(p; n)$ as follows:
\[
y(p; n) \approx n + \sqrt{2n}\, z(p),
\]
where $z(p)$ is chosen such that $\Pr\{Z \le z(p)\} = p$, with $Z$ a standard normal random variable. So, for large $n$, we have the following (approximate) result: if
\[
\lambda \ge \frac{n + \sqrt{2n}\, z(p)}{2\epsilon^2},
\]
then
\[
\Pr\{X \in B_\epsilon\} = \Pr\{\|X - x^*\| \le \epsilon\} \ge p.
\]
So we have $\lambda = O(n/\epsilon^2)$.

Suppose now that, apart from the conditions mentioned above, our objective function $f$ satisfies the following conditions:

1. $-H(x^*)$ (the negative of the Hessian matrix at the optimum $x^*$) is positive definite.

2. $x^*$ is an interior point of the feasible region $S$.

3. $\epsilon$ is chosen such that $f$ can be approximated by a quadratic function in the region $B_\epsilon(x^*)$.

Condition 2 implies that the gradient at $x^*$ is zero. Using a Taylor expansion of $f$ at $x^*$ we get
\[
f(x) \approx f(x^*) + \tfrac{1}{2}(x - x^*)'\, H(x^*)\, (x - x^*)
\]
if $x$ is "close" to $x^*$. If furthermore the following inequality holds:
\[
\int_S e^{\lambda f(x)}\, dx \le \int_S e^{\lambda \left(f(x^*) + \frac{1}{2}(x - x^*)' H(x^*)(x - x^*)\right)}\, dx,
\]
then we can generalize the above result as follows: if
\[
\lambda \ge \frac{y(p; n)}{2\epsilon^2 \theta_{\min}},
\]
then
\[
\Pr\{X \in B_\epsilon(x^*)\} = \Pr\{\|X - x^*\| \le \epsilon\} \ge p,
\]
where $\theta_{\min} > 0$ is the smallest eigenvalue of $-H(x^*)$. Using the same approximation of $y(p; n)$ as above we obtain again $\lambda = O(n/\epsilon^2)$.
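A small numerical illustration (ours, again assuming scipy) compares the exact quantile $y(p; n)$ with the central limit theorem approximation $n + \sqrt{2n}\, z(p)$ and evaluates the resulting lower bound on $\lambda$ for some illustrative values of $p$, $n$, $\epsilon$ and $\theta_{\min}$.

```python
import numpy as np
from scipy.stats import chi2, norm

p, n, eps, theta_min = 0.9, 100, 0.1, 2.0   # illustrative values only

y_exact = chi2.ppf(p, n)                    # y(p; n)
y_clt = n + np.sqrt(2 * n) * norm.ppf(p)    # CLT approximation n + sqrt(2n) z(p)
print(y_exact, y_clt)                       # both are roughly 118 for these values

lam = y_exact / (2 * eps**2 * theta_min)    # the lower bound on lambda derived above
print(lam)                                  # grows linearly in n for fixed p and eps
```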

References

Aarts, E.H.L., and J.H.M. Korst. 1988. Simulated Annealing and Boltzmann Machines. Wiley, Chichester.

Berbee, H.C.P., C.G.E. Boender, A.H.G. Rinnooy Kan, C.L. Scheffer, R.L. Smith, and J. Telgen. 1987. Hit-and-run algorithms for the identification of nonredundant linear inequalities. Mathematical Programming 37, 184-207.

Boender, C.G.E., R.J. Caron, J.F. McDonald, A.H.G. Rinnooy Kan, H.E. Romeijn, R.L. Smith, J. Telgen, and A.C.F. Vorst. 1989. Shake-and-bake algorithms for generating uniform points on the boundary of bounded polyhedra. Technical Report 89-24, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, Michigan.

Boender, C.G.E., A.H.G. Rinnooy Kan, H.E. Romeijn, and A.C.F. Vorst. 1988. Shake-and-bake algorithms for generating uniform points on the boundary of bounded polyhedra. Technical Report 8826/A, Econometric Institute, Erasmus University Rotterdam, The Netherlands.

Boneh, A., and A. Golan. 1979. Constraints redundancy and feasible region boundedness by random feasible points generated. Paper presented at EURO III.

Dixon, L.C.W., and G.P. Szegö, eds. 1978. Towards Global Optimization 1 and 2. North-Holland, Amsterdam, The Netherlands.

Doob, J.L. 1953. Stochastic Processes. Wiley, New York.

McDonald, J.F. 1989. Monte Carlo procedures for generating points uniformly distributed over the surface of a bounded convex region. Technical Report 89-03, Department of Mathematics and Statistics, University of Windsor, Windsor, Ontario, Canada.

Pincus, M. 1968. A closed form solution of certain programming problems. Operations Research 16, 690-694.

Rinnooy Kan, A.H.G., and G.T. Timmer. 1984. Stochastic methods for global optimization. American Journal of Mathematical and Management Sciences 4, 7-39.

Rubinstein, R.Y. 1981. Simulation and the Monte Carlo Method. Wiley, New York.

Smith, R.L. 1980. Monte Carlo procedures for generating random feasible solutions to mathematical programs. Paper presented at the ORSA/TIMS conference, Washington, DC.

Smith, R.L. 1984. Efficient Monte Carlo procedures for generating random feasible points uniformly over bounded regions. Operations Research 32, 1296-1308.