Optimal Direction Choice for Hit-and-Run Sampling

David E. Kaufman
Robert L. Smith

Department of Industrial and Operations Engineering
University of Michigan
Ann Arbor, Michigan 48109

Technical Report 90-08
March 1990
Revised May 1991

OPTIMAL DIRECTION CHOICE FOR HIT-AND-RUN SAMPLING

David E. Kaufman
Robert L. Smith

Department of Industrial and Operations Engineering
University of Michigan
Ann Arbor, Michigan 48109

May 16, 1991

Abstract

Hit-and-Run algorithms are Monte Carlo procedures for generating points that are asymptotically distributed according to general continuous target distributions G over open bounded regions S. Applications include nonredundant constraint identification, global optimization, and Monte Carlo integration. These algorithms are reversible random walks which commonly apply uniformly distributed step directions. We investigate nonuniform direction choice and show that under minimal restrictions on the region S and target distribution G, there exists a unique direction choice distribution, characterized by necessary and sufficient conditions depending on S and G, which optimizes the rate of convergence. We provide computational results demonstrating greatly accelerated convergence for this optimizing direction choice.


1 Introduction

We consider the Monte Carlo problem of generating a sample of points according to a given probability distribution G over an open, bounded region S in R^n. After motivating the problem through several applications, this section discusses the limitations of exact sampling methods, describes the Hit-and-Run asymptotically exact method, and previews the contribution of the paper to Hit-and-Run direction choice.

1.1 Sampling Applications

Nonredundant constraint identification (cf. Karwan et al. [13]). Given a point x satisfying a system of linear inequalities and a unit direction u, an inequality is nonredundant if it is uniquely the nearest constraint to x in direction +u or -u. As a special case, consider points x sampled uniformly from the relative interior of the feasible region of a linear program and directions u sampled uniformly on the unit hypersphere, with all pairs of points and directions independent. The probability that the sample fails to identify any nonredundant constraint decreases to zero as the sample size increases to infinity. We can then more rapidly solve a reduced problem including only the identified nonredundant constraints. These may not all be found by finite sampling, but even if the optimal solution to the reduced problem is infeasible for the full problem, the optimum for the dual reduced problem remains feasible and provides a good initial point for solving the full dual problem.

Global optimization (cf. Dixon and Szego [8,9], Rubinstein [18]). Deterministic iterative optimization strategies typically choose locally improving search directions and step sizes yielding improvement in each iteration. These methods often yield solutions which are only locally optimal for nonconvex problems. A simple stochastic alternative for global optimization is Pure Random Search [7], which samples points uniformly in the feasible region and reports the sampled point with the best objective function value as the optimal solution. More practical methods commonly rely on the Multistart approach, applying local search algorithms from some or all of a group of solutions chosen randomly from the feasible region [17]. Both methods rely on efficient sampling of feasible points. The potential of stochastic global optimization methods is illustrated by Pure Adaptive Search (PAS) [16,24], in which a new iterate is uniformly distributed over the subset of the feasible region superior in objective value to the current iterate. The number of PAS iterations required to approach the global optimum arbitrarily closely is shown to grow only linearly in dimension, a result previously suggested by the computational experience of Solis and Wets [20]. For stochastic global methods in general and PAS in particular, efficient sampling from arbitrary regions is therefore of significant importance.

Monte Carlo Integration (cf. Hammersley and Handscomb [11]). We may evaluate I = \int_S f(x) dx by noting that if the random variable X is distributed according to a positive probability density function g on S, then

    I = E[ f(X)/g(X) ].

The sample-mean method of Monte Carlo integration consists of generating X_1, ..., X_N according to g and estimating I by the unbiased estimator \hat{I} = (1/N) \sum_{i=1}^N f(X_i)/g(X_i). Sample-mean estimation may be superior to deterministic numerical methods for nonsmooth integrands with multidimensional domain [18]. The potentially strong dependence of the variance of \hat{I} on the choice of g motivates interest in sampling from general distributions over S.
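As a concrete illustration of the sample-mean estimator above (our addition, not part of the original report), here is a minimal Python sketch. The integrand f, the density g, and the sampler draw_from_g are placeholders supplied by the user; we assume only that g is positive on S.

    import numpy as np

    def sample_mean_estimate(f, g, draw_from_g, N, rng):
        # Unbiased estimator I_hat = (1/N) * sum of f(X_i)/g(X_i), X_i ~ g on S.
        total = 0.0
        for _ in range(N):
            x = draw_from_g(rng)       # X_i distributed with density g over S
            total += f(x) / g(x)       # importance weight f(X_i)/g(X_i)
        return total / N

    # Example: estimate the integral of f over the unit square, g uniform (g = 1).
    rng = np.random.default_rng(0)
    f = lambda x: np.exp(-np.sum(x**2))   # a sample integrand
    g = lambda x: 1.0                     # uniform density on [0,1]^2
    draw = lambda rng: rng.random(2)      # uniform sampler on [0,1]^2
    print(sample_mean_estimate(f, g, draw, 10000, rng))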

1.2 Exact and approximate sampling methods

We begin by considering some of the shortcomings associated with the exact sampling methods of transformation, composition, and rejection (Schmeiser [19]). A vector X of n numbers drawn independently from the uniform distribution over the interval [0,1] is uniformly distributed over the unit cube in R^n. Given a bijective differentiable transformation T from the unit cube onto S, then T(X) is uniformly distributed on S if T has constant Jacobian determinant. However, such transformation techniques are known only for a small class of regions S, such as paralleletopes, hyperspheres, and simplices.

Composition techniques express the probability density to be sampled from as a mixture of densities conditioned on a random parameter, for cases when the decomposed densities can be sampled more easily than the composite [18]. These methods tend to be intractable in even moderate dimension [22].

The most generally applicable methods are rejection techniques. Sampling uniformly from a region R enclosing S and rejecting those points not in S, the remaining sample is uniformly distributed over S. However, the expected number of points in R needed in order to hit S grows rapidly in dimension, making rejection techniques inefficient in high dimension. For example, the ball inscribed in a 10-dimensional cube occupies only about 0.25% of the cube's volume, so rejection from the cube wastes roughly 400 points for each accepted point. The same problems are associated with sampling nonuniformly according to density g over S, since this is equivalent to sampling uniformly in the region under the graph of g over S.

Instead of sampling exactly according to G, Hit-and-Run algorithms generate samples of points whose distributions approach G asymptotically. With each Hit-and-Run algorithm is associated a direction probability distribution H over the unit hypersphere. The algorithm proceeds from a current point x_m by generating a direction u according to H and selecting x_{m+1} according to G conditioned on the resulting line set, i.e., the subset of S lying along the line {x_m + \lambda u : \lambda \in R}. The most widely known version is the Hypersphere Directions (HD) Hit-and-Run algorithm, proposed in 1979 by Boneh and Golan [6] and independently in 1980 by Smith [21]. HD selects directions according to a uniform distribution over the unit hypersphere and chooses iterates uniformly on the resulting line set.

In [22], Smith proves that the HD iterates approach a uniform limiting distribution independent of the starting point, and demonstrates experimentally that Hit-and-Run is potentially more efficient than rejection techniques in high dimension. Another example is Coordinate Directions (CD) Hit-and-Run, due to Telgen [23], in which the direction is chosen uniformly from among the 2n coordinate vectors (the vectors which parallel the coordinate axes in R^n) and the new iterate is chosen uniformly on the resulting line set as in HD. These two algorithms were applied in Berbee et al. [3] to identify nonredundant linear constraints, and the same application was also addressed by the related Shake-and-Bake algorithms presented in Boender et al. [5]. Related algorithms addressing global optimization include Improving Hit-and-Run (Zabinsky, Smith, and McDonald [25]) and Hide-and-Seek (Belisle, Romeijn, and Smith [1]). With the exception of the Running Shake-and-Bake variant and the heuristic direction choice methods in Solis and Wets, all of the cited algorithms employ uniform direction distributions.

Belisle, Romeijn, and Smith [2] have recently shown asymptotic convergence for general target distributions G for the generalized Hit-and-Run class. Under minimal restrictions on the direction distribution H, choosing new iterates conditionally according to G on the line set causes the distribution of iterates to converge in total variation to G. The direction distribution does not affect the asymptotic target distribution of the iterates, but as we shall see, it has a strong effect on convergence rate. By extending the HD convergence rate analysis of Smith [22], given the region S and target distribution G we will provide a bound on the rate of convergence to the target distribution as a function of H and construct a unique direction distribution H* which optimizes this rate bound. Finally, we provide examples and computational results demonstrating significantly accelerated convergence by employing this nonuniform direction choice.

2 The Hit-and-Run Algorithm

Denote the unit ball in R^n by B = {u \in R^n : ||u|| < 1} and the unit hypersphere by its topological boundary \partial B. Let S \subset R^n be an open bounded set. Let G be the target probability distribution on S, assumed absolutely continuous with respect to Lebesgue measure V, with positive density g continuous almost everywhere and bounded from above and away from zero. Let \sigma be the measure giving surface area of measurable subsets of \partial B, and let H be the probability distribution of direction choice, assumed absolutely continuous with respect to \sigma and having positive density h. We may now formally define the Hit-and-Run algorithm in S associated with the target distribution G and the direction distribution H.

Hit-and-Run Algorithm

1) Choose an arbitrary starting point x^0 \in S and set m = 0.

2) Generate a random direction U_m \in \partial B according to H.

3) Select \lambda_m from the line set \Lambda_m = {\lambda \in R : x_m + \lambda U_m \in S} according to the density

    g(x_m + \lambda U_m) / \int_{\Lambda_m} g(x_m + r U_m) dr.

4) Set x_{m+1} = x_m + \lambda_m U_m and m = m + 1, and go to step 2.

Because step 3 searches for both positive and negative step sizes \lambda in direction u, we may when convenient assume without loss of generality that h(u) = h(-u) for all u \in \partial B.

By Theorem 5 of [2] the distribution of the Hit-and-Run iterates {x_m, m = 1, 2, ...} converges in total variation to G for any initial x^0, i.e.,

    lim_{m \to \infty} Pr(x_m \in A | x^0 = x) = G(A),    A \in B_S, x \in S,

uniformly in A, where B_S represents the Borel \sigma-field on S. In fact, the cited theorem is more general, not requiring g to be bounded and allowing much more general direction distributions H. In particular, our current restrictions on H exclude the CD version of Hit-and-Run (where the direction distribution is not absolutely continuous) and other variations which have zero direction density on some direction set of positive surface measure. However, the convergence rate analysis here requires the stronger conditions.

For the HD algorithm, where G and H are uniform distributions, Smith in [22] established the following bound on convergence rate:

    | Pr(x_m \in A | x^0 = x) - V(A)/V(S) | \le (1 - \gamma/(n 2^{n-1}))^{m-1},    A \in B_S, x \in S,    (1)

where \gamma is the ratio of the volume of S to the volume of the sphere in R^n whose diameter is that of S. As an example, if S is a 10-dimensional cube, \gamma \approx 1/250 and the number m of iterations required to upper-bound the error term on the left by 0.01 is over five million.

Our goal is to generalize this bound to nonuniform target distribution G and direction distribution H and then find a direction distribution H* which optimizes the bound on rate of convergence to the distribution G over the region S. Ideally, we hope moreover to achieve improvement in experimental performance (i.e., average case behavior) as well as in worst-case behavior.
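To make steps 1 through 4 concrete, the following Python sketch (ours, not from the report) implements the iteration for a uniform target G, where step 3 reduces to choosing \lambda uniformly on \Lambda_m. We assume S is given by a membership oracle contains(x) together with a bracketing interval known to contain every feasible step size; the line set is handled by simple rejection on \lambda, which is valid whenever the bracket covers \Lambda_m.

    import numpy as np

    def hit_and_run(contains, x0, n_steps, step_bracket, rng, direction=None):
        """Hit-and-Run with uniform target G over S = {x : contains(x)}.

        direction: callable returning a unit vector (direction density h);
                   defaults to uniform directions on the hypersphere (HD).
        step_bracket: (lo, hi) interval containing every feasible step size.
        """
        x = np.asarray(x0, dtype=float)
        lo, hi = step_bracket
        samples = []
        for _ in range(n_steps):
            if direction is None:
                u = rng.normal(size=x.size)      # HD: isotropic Gaussian,
                u /= np.linalg.norm(u)           # normalized => uniform on sphere
            else:
                u = direction(rng)               # step 2
            while True:                          # step 3: uniform on the line
                lam = rng.uniform(lo, hi)        # set, by rejection in bracket
                if contains(x + lam * u):
                    break
            x = x + lam * u                      # step 4
            samples.append(x.copy())
        return np.array(samples)

    # Example: uniform sampling in the box (0,1) x (0,2).
    rng = np.random.default_rng(1)
    box = lambda y: np.all(y > 0) and np.all(y < np.array([1.0, 2.0]))
    pts = hit_and_run(box, x0=[0.5, 1.0], n_steps=1000, step_bracket=(-3, 3), rng=rng)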

3 Optimizing the Rate of Convergence

Assuming the region S and the target distribution G to be fixed and satisfying the conditions of the previous section, let P_H(A|x) be the one-step transition probability distribution for the Hit-and-Run algorithm with direction distribution H. That is,

    P_H(A|x) = Pr(x_{m+1} \in A | x_m = x),    A \in B_S, x \in S.

We introduce the notation u_{xy} = (y - x)/||y - x|| to represent the unit direction from x to y. Also, \Lambda(x,y) = {\lambda \in R : x + \lambda u_{xy} \in S} gives the set of feasible step sizes in the direction between x and y.

Lemma 1 Given direction distribution H absolutely continuous with density h, for all x \in S the transition probability distribution P_H(.|x) is absolutely continuous with density(1)

    f_H(y|x) = (h(u_{xy}) + h(-u_{xy})) g(y) / ( ||y - x||^{n-1} \int_{\Lambda(x,y)} g(x + r u_{xy}) dr ),    y \in S, y \ne x.    (2)

Proof: Fix x. For 0 < r_1 < r_2 and measurable D \subset \partial B such that D \cap (-D) = \emptyset, define C_D(r_1, r_2) = {x + ru : u \in D, r_1 < r < r_2}, and let C be the class of all such sets. Let I_S(.) be the function indicating membership in S, and let S_{x,u} be a random variable giving the Hit-and-Run step size from x in direction u. Let U be a random variable distributed according to H over \partial B. For C = C_D(r_1, r_2) \in C,

    P_H(C \cap S | x) = Pr(U \in D, S_{x,U} \in (r_1, r_2)) + Pr(U \in -D, -S_{x,U} \in (r_1, r_2))
                      = \int_D Pr(S_{x,u} \in (r_1, r_2)) (h(u) + h(-u)) \sigma(du)
                      = \int_D \int_{r_1}^{r_2} I_S(x + su) g(x + su) (h(u) + h(-u)) / ( \int_{\Lambda(x,x+su)} g(x + ru) dr ) ds \sigma(du)
                      = \int_{C \cap S} (h(u_{xy}) + h(-u_{xy})) g(y) / ( ||y - x||^{n-1} \int_{\Lambda(x,y)} g(x + r u_{xy}) dr ) dy    (3)

by change to spherical coordinates; see e.g. [12, p. 4]. Since B_S is generated by the \pi-system C restricted to S, equation (3) determines P_H(A|x) for all A \in B_S as well [4, Theorem 3.3], completing the proof.

(1) See [2] for an independently derived expression in the general direction distribution case. Note that we have not defined f_H(x|x); since V({x}) = 0, we are free to make the definition arbitrarily.

The following theorem bounds the rate of convergence to the target distribution by providing an upper bound on the deviation between the target distribution and the distribution of the mth Hit-and-Run iterate for any m, over all A \in B_S, as a function of the direction distribution H. We denote by \mathcal{H} the class of absolutely continuous direction distributions H with density h bounded away from zero.

Theorem 2 For H \in \mathcal{H},

    | Pr(x_m \in A | x^0 = x) - G(A) | \le (1 - \delta_H V(S))^{m-1},    A \in B_S, x \in S,    (4)

where \delta_H is the density bound for H, given by

    \delta_H = inf_{x,y \in S} f_H(y|x).

Proof: For H \in \mathcal{H}, we have S bounded, g bounded from above and away from zero, and h bounded away from zero. Hence from the preceding lemma f_H(y|x) is bounded away from zero, i.e., \delta_H > 0. The theorem then follows from a result of Doob [10, p. 197, case b)].

In order to minimize the error on the left of (4) regardless of iteration number m, we seek a direction distribution achieving the optimal error e*, given by

    e* = inf_{H \in \mathcal{H}} (1 - \delta_H V(S)).

Therefore we formulate the convergence rate bound optimization as:

    (P) Find \delta* = sup_{H \in \mathcal{H}} \delta_H.

Since each \delta_H is the value of a minimization, we call \delta* the maximin density. By an optimal direction distribution we will mean any distribution H* \in \mathcal{H} for which the maximin density \delta* is attained. Note that such a distribution is optimal only in the worst-case rate of convergence as expressed by Theorem 2.
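As a worked consequence of (4) (our addition, for illustration), the iteration count needed to drive the worst-case error below a tolerance \epsilon follows by taking logarithms:

    (1 - \delta_H V(S))^{m-1} \le \epsilon
        <=>  m \ge 1 + \ln(\epsilon) / \ln(1 - \delta_H V(S))
        \approx 1 + \ln(1/\epsilon) / (\delta_H V(S)),

where the approximation uses \ln(1 - t) \approx -t for small t = \delta_H V(S). With \epsilon = 0.01, roughly 4.6/(\delta_H V(S)) iterations suffice, which is consistent with the iteration counts reported in Table 1 below.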

4 The Optimal Direction Distribution

We now investigate the maximin problem P in more detail. We will demonstrate that the problem has a unique solution H* and characterize this solution by necessary and sufficient conditions on H. For each H \in \mathcal{H}, we define f_H^-(u), the infimal transition density in direction u, by

    f_H^-(u) = inf_{x,y \in S : u_{xy} = u} f_H(y|x),    u \in \partial B.

Then \delta_H = inf_{u \in \partial B} f_H^-(u). The infimal transition density decomposes into two parts, the first corresponding to the target distribution G and the geometry of S, which are considered fixed, and the second to the direction distribution H, which we wish to optimize.

Lemma 3 The infimal transition density for H is given by

    f_H^-(u) = (1/\rho(u)) (h(u) + h(-u)),    u \in \partial B,

where \rho(u) is the span of S in direction u, defined independently of H by

    \rho(u) = sup_{x,y \in S : u_{xy} = u} { ||y - x||^{n-1} \int_{\Lambda(x,y)} g(x + ru) dr / g(y) },    u \in \partial B.    (5)

Proof: We have

    inf_{x,y : u_{xy} = u} f_H(y|x) = (h(u) + h(-u)) inf_{x,y : u_{xy} = u} { g(y) / ( ||y - x||^{n-1} \int_{\Lambda(x,y)} g(x + ru) dr ) } = (1/\rho(u)) (h(u) + h(-u)).

We now state the main result.

Theorem 4 Let S \subset R^n be an open bounded set, and let G be a probability distribution on S, absolutely continuous with respect to V with density g bounded from above and away from zero. Then

i) There exists a unique optimal direction distribution H* \in \mathcal{H}.

ii) The direction distribution H \in \mathcal{H} is optimal if and only if the infimal transition density for H is constant, i.e., if f_H^- \equiv c for some c > 0.

Proof: We begin by proving sufficiency. Assume that for the direction distribution H with density h, there exists c > 0 such that f_H^-(u) = c for all u \in \partial B. Suppose that H is not optimal, i.e., there exists a direction distribution H' with density h' such that

    f_{H'}^-(u) \ge \delta_{H'} > \delta_H = f_H^-(u),    u \in \partial B.

Then by Lemma 3, h'(u) + h'(-u) > h(u) + h(-u) for all u \in \partial B. But both h and h' integrate to one over \partial B, hence a contradiction ensues. No direction distribution improves on the density bound \delta_H = c, and H is optimal.

Now we prove existence. Let H* be the direction distribution with density

    h*(u) = \rho(u) / \int_{\partial B} \rho(v) \sigma(dv),    u \in \partial B,

and corresponding infimal transition density f_{H*}^-. Observe that g bounded above and away from zero and S open and bounded imply \rho is bounded above and away from zero, and hence H* \in \mathcal{H}. For u \in \partial B,

    f_{H*}^-(u) = 2 / \int_{\partial B} \rho(v) \sigma(dv) = c.

Therefore H* satisfies the sufficient condition and is optimal, with \delta* = \delta_{H*} = c.

To prove the necessary condition, assume that direction distribution H with density h is optimal. Since H* defined above is also optimal, f_H^-(u) \ge \delta_H = \delta* = f_{H*}^-(u) for all u \in \partial B. Then by Lemma 3,

    h(u) + h(-u) \ge h*(u) + h*(-u),    u \in \partial B.    (6)

Since both sides of this inequality must integrate over \partial B to the value 2, equation (6) must be satisfied with equality almost everywhere on \partial B. Then by our earlier assumption that h(u) = h(-u), we have h = h* almost everywhere, i.e., h* is a density for H. Thus f_H^-(u) = f_{H*}^-(u) = c for all u \in \partial B and the necessary condition is established. Furthermore, we have proven that any two optimal distributions must satisfy (6) with equality almost everywhere, i.e., that the optimal distribution is unique.

This main result has intuitive appeal. To maximize the minimum of a probability density, an obvious candidate is to choose a uniform distribution. The proof above shows that although the infimal transition density is not itself a probability density, it does have a certain normalization which depends on S and G through the span \rho, and to maximize the minimum, uniformity is again the solution.

From the proof of the above theorem, we have

Corollary 5 Under the conditions of Theorem 4,

i) The optimal distribution H* is determined by the span \rho of S given in (5), with density

    h*(u) = \rho(u) / \int_{\partial B} \rho(v) \sigma(dv).

ii) The optimal error e* is given by

    e* = 1 - 2 V(S) / \int_{\partial B} \rho(v) \sigma(dv).

Note also that H* is absolutely continuous with respect to \sigma and has nonzero density on all of \partial B. Therefore, H* satisfies the conditions imposed in Section 2 to ensure that the distributions of the iterates do in fact converge to the target distribution G.

5 Optimal Direction Choice for the Uniform Target Distribution over a Class of Convex Regions

We will now examine the issue of how to generate directions according to an optimal direction distribution H* with density h*. We could proceed by a rejection technique, generating directions u uniformly on \partial B and accepting only if a uniform (0,1)-variate is less than h*(u)/\bar{h}, where \bar{h} is an upper bound on h* (recall \rho is bounded from above, and hence such a bound exists). This procedure is inefficient, particularly as n becomes large and S deviates from a spherical shape. Moreover, h*(u) may be difficult to calculate. Under certain conditions on the search region and target distribution, we can establish a potentially more efficient procedure.

Assume the target distribution G is uniform over S convex. Since g is constant, the span expression simplifies to

    \rho(u) = sup_{x,y \in S : u_{xy} = u} ||y - x||^n,    u \in \partial B.    (7)

We further assume that S has a center s such that for almost all u \in \partial B, the supremum defining the span in direction u is realized by points x, y \in \partial S (the topological boundary of S) with s = (x + y)/2. We define the radius in direction u by r(u) = sup{r > 0 : s + ru \in S} for all u \in \partial B. By (7), we have \rho(u) = (r(u) + r(-u))^n = 2^n r^n(u) for all u \in \partial B.

Theorem 6 Let S \subset R^n be open, bounded, and convex with center s, and let the target distribution G be uniform. Let Y be a random variable uniformly distributed on S and let U = (Y - s)/||Y - s||. Then U has distribution H* on \partial B.

Proof: For measurable D \subset \partial B let S_s(D) be the part of S in the directions D from s, given by

    S_s(D) = {x \in S : (x - s)/||x - s|| \in D}.

Then,

    Pr(U \in D) = Pr(Y \in S_s(D)) = \int_{S_s(D)} dy / \int_S dy
                = \int_D (r^n(u)/n) \sigma(du) / \int_{\partial B} (r^n(u)/n) \sigma(du)    (by a change to spherical coordinates)
                = \int_D \rho(u) \sigma(du) / \int_{\partial B} \rho(u) \sigma(du)
                = \int_D h*(u) \sigma(du)    (by Corollary 5).

Hence U is distributed according to H* on \partial B.
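A minimal Python sketch of the construction in Theorem 6 (ours; the report gives no code). It assumes we already have some way to draw Y uniformly from S; here, for illustration only, rejection sampling inside a bounding box, which is workable in low dimension.

    import numpy as np

    def optimal_direction(sample_uniform_S, center, rng):
        # Theorem 6: if Y ~ Uniform(S), then U = (Y - s)/||Y - s|| ~ H*.
        y = sample_uniform_S(rng)
        v = y - np.asarray(center, dtype=float)
        return v / np.linalg.norm(v)

    # Illustration: S a convex elliptical region, sampled by rejection
    # in its bounding box (feasible only in low dimension).
    def sample_uniform_S(rng):
        while True:
            y = rng.uniform([-2.0, -1.0], [2.0, 1.0])
            if (y[0] / 2.0) ** 2 + y[1] ** 2 < 1.0:   # inside the ellipse
                return y

    rng = np.random.default_rng(2)
    u = optimal_direction(sample_uniform_S, center=[0.0, 0.0], rng=rng)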

Therefore, to evaluate the efficiency of optimal direction choice on the class of centered convex regions, we may implement exact optimal direction choice by generating points uniformly distributed in the search region and normalizing the vectors from the center to these points. Of course, if we could do this efficiently we would already have solved the problem which Hit-and-Run is designed to address. However, this theoretical result motivates an easily implemented approximation to optimal directions. Since the sequence of Hit-and-Run iterates converges to uniformity, we can try choosing directions by randomly choosing one of the previous Hit-and-Run iterates and normalizing the vector from an approximate center to the iterate. A simple scheme for approximating the center would be to set the ith coordinate equal to the mean of the ith coordinates of the previous Hit-and-Run iterates. When we present computational results in the next section, we will comment on the effectiveness of this approximate method for optimal direction choice.

We now show that the class of centered convex regions is characterized by a single optimal convergence rate bound.

Theorem 7 Let S \subset R^n be open, bounded, and convex with center s, and let the target distribution G be uniform. Then the optimal error is identical to that for uniform direction choice in a spherical search region, i.e.,

    e* = 1 - 1/(n 2^{n-1}).

Proof: By Corollary 5, and since V(S) = \int_{\partial B} (r^n(u)/n) \sigma(du),

    e* = 1 - 2 V(S) / \int_{\partial B} \rho(u) \sigma(du)
       = 1 - 2 V(S) / ( 2^n \int_{\partial B} r^n(u) \sigma(du) )
       = 1 - 2 V(S) / ( 2^n n V(S) )
       = 1 - 1/(n 2^{n-1}).

For a spherical search region, \gamma = 1 in (1) and hence the same bound obtains.

Now, we examine the optimal direction distributions and convergence rate bounds for particular regions S. The first example verifies an intuitive result.

Example 1 Let S = {x \in R^n : ||x|| < b} so that S is the open ball of radius b > 0 centered at the origin in R^n. By (7), \rho \equiv (2b)^n and hence from Corollary 5 the optimal direction distribution is uniform. This is natural; S looks the same from all directions, and we have no reason to favor one direction over another.

Example 2 Now let S = {x \in R^n : 0 < x_i < b_i, i = 1, ..., n} so that S is an open rectangular paralleletope determined by the upper bounds b_1, b_2, ..., b_n. S is convex and has center s = (b_1/2, b_2/2, ..., b_n/2). For any unit direction u, a similar-triangles argument shows b_i/(2|u_i|) to be the distance in direction u from s to the ith upper bound constraint. Since r(u) is the greatest distance which satisfies all n of the constraints,

    \rho(u) = (r(u) + r(-u))^n    (8)
            = ( min_{i=1,...,n : u_i \ne 0} b_i/|u_i| )^n.    (9)

Since the optimal density is proportional to \rho, we can see either geometrically from (8) or analytically from (9) that the density favors directions along the long axes of the rectangle, and is maximized by the directions from s to the corners of S.

Let us evaluate the uniform-direction and optimal convergence rate bounds. By Theorem 7, the optimal error is 1 - 1/(n 2^{n-1}). In contrast, recalling that \gamma represents the ratio of the volume of S to the volume of a sphere with diameter equal to that of S, the uniform bound is determined by the term 1 - \gamma/(n 2^{n-1}), where

    \gamma = 2^n \prod_{i=1}^n b_i / ( ( \sum_{i=1}^n b_i^2 )^{n/2} V(B) ).

For each of the upper bound vectors b^0 = (1, 1, 1, ..., 1), b^1 = (1, 2, 3, ..., 10), and b^2 = (1, 4, 9, 16, ..., 100) in R^10, Table 1 shows the values of the uniform-direction and optimal-direction convergence bound terms, as well as the number of iterations required to upper-bound the error terms in (1) and (4) by 0.01. Clearly optimal direction choice is greatly superior with respect to worst-case behavior, reducing the number of iterations required by two to six orders of magnitude.

6 Numerical Results

We executed Hit-and-Run both with uniform and with optimal direction choice in each of the three hyperrectangles of Example 2. The testing procedure was as follows: run Hit-and-Run for 10,000 iterations, sampling every tenth point in order to reduce serial correlations. The resulting 1000 points in R^10 were then shuffled to randomize their ordering. This procedure produces ten samples of 1000 real-valued points, one sample in each coordinate direction.
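For these hyperrectangle tests, exact optimal directions are easy to generate: Theorem 6 specializes directly, since uniform points in a box are available componentwise. The following Python sketch (our illustration, not from the report) draws one optimal direction for upper bound vector b and could serve as the direction generator in a Hit-and-Run implementation such as the sketch in Section 2.

    import numpy as np

    def box_optimal_direction(b, rng):
        # Theorem 6 specialized to S = (0, b_1) x ... x (0, b_n), center s = b/2:
        # draw Y uniform in the box and normalize the vector from the center.
        b = np.asarray(b, dtype=float)
        y = rng.uniform(0.0, b)          # Y ~ Uniform(S)
        v = y - b / 2.0                  # vector from the center s = b/2
        return v / np.linalg.norm(v)     # U ~ H* by Theorem 6

    rng = np.random.default_rng(3)
    u = box_optimal_direction([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], rng)  # b^1 above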

Table 1: Convergence bound values in R^10

                           Error terms                                    # iters m required so that
Upper bound vector         Uniform direction:       Optimal direction:    |Pr(x_m \in .|x^0) - G(.)| \le 0.01
                           e = 1 - \gamma/(n 2^{n-1})   e* = 1 - 1/(n 2^{n-1})   Uniform dir.    Optimal dir.
b^0 = (1, 1, 1, ..., 1)    1 - 7.84 x 10^-7         1 - 1.95 x 10^-4      5.9 million     24 thousand
b^1 = (1, 2, 3, ..., 10)   1 - 3.36 x 10^-8         1 - 1.95 x 10^-4      137 million     24 thousand
b^2 = (1, 4, 9, ..., 100)  1 - 9.90 x 10^-11        1 - 1.95 x 10^-4      47 billion      24 thousand

Table 2: Frequency test statistics for Hit-and-Run in rectangular regions.(a)

\chi^2_9 statistics(b) (ith coordinate range (0, b_i) broken into 10 equal cells)

              Upper bound vector / Direction choice method
Coordinate         b^0                   b^1                   b^2
    i         Uniform  Optimal      Uniform  Optimal      Uniform  Optimal
    1          5.7 *    10.6 *        9.3 *   11.3 *        6.6 *     8.0 *
    2         10.9 *     6.6 *        2.5      8.0 *        8.7 *    26.4
    3          8.3 *    10.7 *       13.9 *   14.1 *       30.8      12.4 *
    4          8.1 *     7.2 *        5.6 *   12.3 *       16.8 *     4.0 *
    5         10.4 *    17.6         20.9     13.0 *       27.6      10.0 *
    6         22.0      20.8         50.4     14.3 *      498.8       5.3 *
    7         19.5       7.5 *       19.3     18.2        843.2      18.1
    8          8.6 *    15.8 *       13.0 *    4.8 *      563.0      15.5 *
    9          9.7 *    16.6 *       27.3      7.7 *      401.2       7.1 *
   10         16.0 *     9.4 *       44.0      8.5 *      773.1       7.6 *

# passing uniformity
at \alpha = 10%    8        8            4        9            3        8

(a) Statistics passing the frequency test are marked by *.
(b) Lower and upper \chi^2 values for \alpha = 10%, \nu = 9: (3.3, 16.9).

We performed two-tailed \chi^2 frequency and serial correlation tests [15, pp. 59-60] to test the hypothesis that the samples are uniformly distributed with respect to each of the ten coordinate directions. The coordinate directions are each broken into 10 equal cells so that each frequency test has 9 degrees of freedom. Smith [22] performed the same computational test for HD on a cube in R^10 (i.e., uniform directions with b^0 as the upper bound) and reported that seven of the ten coordinates passed the frequency test for uniformity at a significance level of 10% and nine of the ten coordinates passed the serial correlation test at the 10% significance level.

Tables 2 and 3 show the test statistics for the frequency and serial correlation tests respectively. They show that with upper bound vector b^0, for which the optimal direction distribution is the closest to uniform of the three upper bounds tested, the results of uniform and optimal direction choice are comparable.

Table 3: Serial correlation test statistics for Hit-and-Run in rectangular regions.(a)

\chi^2_99 statistics(b)

              Upper bound vector / Direction choice method
Coordinate         b^0                   b^1                   b^2
    i         Uniform  Optimal      Uniform  Optimal      Uniform  Optimal
    1          98.8 *    79.6 *       89.6 *  107.6 *       98.4 *    94.4 *
    2          83.6 *    92.4 *       89.2 *  102.4 *      126.0     105.2 *
    3         101.2 *    92.8 *      116.4 *  117.2 *      123.2 *   108.0 *
    4         100.4 *   101.6 *       87.6 *  124.4        101.2 *    77.2 *
    5         110.8 *    99.6 *      113.6 *  103.6 *      111.6 *    91.6 *
    6         136.8     115.6 *      140.8    101.6 *      693.2      95.2 *
    7         124.4      88.8 *      106.8 *  134.4       1223.6      96.8 *
    8          88.4 *   105.6 *      105.2 *   80.0 *      836.4     101.2 *
    9         115.2 *    88.0 *       97.2 *  102.8 *      574.0     103.6 *
   10          79.6 *   107.2 *      127.2    124.4       1064.4     106.4 *

# passing uniformity
at \alpha = 10%    8       10            8        7            4       10

(a) Statistics passing the serial correlation test are marked by *.
(b) Lower and upper \chi^2 values for \alpha = 10%, \nu = 99: (77.0, 123.2).

However, when we elongate the region somewhat by using the upper bound b^1, only four of the ten coordinate samples pass the frequency test with uniform directions, while nine of the ten coordinates pass with optimal direction choice. The very elongated region with bound b^2 yields results close to those for b^1 with respect to the number of tests passed by each direction choice method, but note that most of the coordinates which fail under uniform direction choice do so with spectacularly poor \chi^2 values. The serial correlation test statistics tell a similar tale; although the serial tests for uniform direction choice are largely passed with bound b^1, when we further elongate the search region with bound b^2, uniform direction choice badly fails half of the coordinate tests. Taken together, these results demonstrate that optimal direction choice accelerates convergence of Hit-and-Run to a uniform target distribution on regions whose geometry makes clear distinctions between search directions.

We can gain further insight by seeking to establish a connection between the test results and the convergence rate bound values which apply to the regions tested. Theorem 7 and Table 1 indicate that the convergence rate bound for Hit-and-Run with optimal direction choice is the same for any upper bound vector. That is, the convergence bound analysis suggests that Hit-and-Run with optimal direction choice should perform the same in any rectangular region. Revisiting Tables 2 and 3, we see that this predicted behavior does occur. The numbers of frequency tests passed for the three regions tested are 8, 9, and 8 respectively; the difference is not statistically significant. The differences among the numbers of serial tests passed (10, 7, and 10 respectively) might be considered significant, but two of the three tests which failed for b^1 did so just barely; a slightly lower test significance would have made the serial results as stable as the frequency results. The correspondence of stability between convergence rate bounds and experimental performance for the differing test regions provides empirical justification for the approach of accelerating convergence by optimizing the worst-case performance bound.

In a general-purpose application, it will be more difficult to exactly generate optimal directions. However, we are encouraged that comparable levels of acceleration can be achieved by implementing the approximation method outlined following Theorem 6. When we applied this method to the three rectangular regions we have been considering, we found that the resulting statistical samples were as close to uniformity as were those generated with exact optimal direction choice. The numbers of coordinates passing the frequency tests were 7, 7, and 9; the numbers of serial correlation tests passed were 8, 9, and 9 [14]. We believe that approximate direction choice methods of this type have great practical potential for acceleration of Hit-and-Run convergence.
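A Python sketch of the approximate scheme just described (ours; cf. the Artificial Centering method of [14], whose details may differ): maintain a running mean of past iterates as the approximate center, and generate each direction by normalizing the vector from that center to a randomly chosen past iterate.

    import numpy as np

    class ApproximateOptimalDirections:
        """Direction choice by approximate centering over past Hit-and-Run iterates."""

        def __init__(self, x0):
            self.history = [np.asarray(x0, dtype=float)]
            self.center = self.history[0].copy()     # running mean of iterates

        def record(self, x):
            x = np.asarray(x, dtype=float)
            self.history.append(x)
            n = len(self.history)
            self.center += (x - self.center) / n     # update coordinate-wise mean

        def direction(self, rng):
            # Normalize the vector from the approximate center to a random iterate.
            x = self.history[rng.integers(len(self.history))]
            v = x - self.center
            norm = np.linalg.norm(v)
            if norm == 0.0:                          # degenerate (e.g., only the
                v = rng.normal(size=self.center.size)  # start point): fall back to
                norm = np.linalg.norm(v)             # a uniform random direction
            return v / norm

Each new Hit-and-Run iterate is passed to record, and direction replaces the uniform direction generator; early in the run, when the center estimate is poor, the fallback to uniform directions keeps the scheme well defined.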

7 Conclusion

The problem of generating points according to a probability distribution G with density g over an open bounded region S in R^n has application to Monte Carlo methods of simulation, numerical methods, and optimization. Even for simple regions S and a uniform distribution G, the computational effort of rejection techniques for exact sampling grows rapidly in dimension. When S and G are complicated, the performance is much worse. Hit-and-Run algorithms offer an efficient method for generating points which asymptotically approach the desired probability distribution G. However, experimental and worst-case performance of HD Hit-and-Run is substantially degraded as S becomes nonspherical and G nonuniform. In this paper we have generalized the Hit-and-Run convergence rate bound, known previously for HD, to nonuniform direction distributions. We have constructed a unique bound-optimal direction distribution which significantly accelerates convergence to the target distribution, to a degree consistent with the corresponding improvement in convergence rate bound.

References

[1] Belisle, C.J.P., H.E. Romeijn, and R.L. Smith. 1990. Hide-and-Seek: A Simulated Annealing Algorithm for Global Optimization. Technical Report 90-25, Department of Industrial and Operations Engineering, The University of Michigan.

[2] Belisle, C.J.P., H.E. Romeijn, and R.L. Smith. 1990. Hit-and-Run Algorithms for Generating Multivariate Distributions. Technical Report 90-18, Department of Industrial and Operations Engineering, The University of Michigan.

[3] Berbee, H.C.P., C.G.E. Boender, A.H.G. Rinnooy Kan, C.L. Scheffer, R.L. Smith, and J. Telgen. 1987. Hit-and-Run Algorithms for the Identification of Nonredundant Linear Inequalities. Mathematical Programming 37, 184-207.

[4] Billingsley, P. 1986. Probability and Measure. John Wiley & Sons, New York.

[5] Boender, C.G.E., R.J. Caron, J.F. McDonald, A.H.G. Rinnooy Kan, H.E. Romeijn, R.L. Smith, J. Telgen, and A.C.F. Vorst. 1989. Shake-and-Bake Algorithms for Generating Uniform Points on the Boundary of Bounded Polyhedra. Technical Report 89-24, Department of Industrial and Operations Engineering, The University of Michigan. To appear in Operations Research.

[6] Boneh, A., and A. Golan. 1979. Constraints' Redundancy and Feasible Region Boundedness by Random Feasible Point Generator (RFPG). Presented at the Third European Congress on Operations Research (EURO III), Amsterdam.

[7] Brooks, S.H. 1958. A Discussion of Random Methods for Seeking Maxima. Operations Research 6, 244-251.

[8] Dixon, L.C.W. and G.P. Szego (eds.). 1975. Towards Global Optimization. North-Holland, Amsterdam.

[9] Dixon, L.C.W. and G.P. Szego (eds.). 1978. Towards Global Optimization 2. North-Holland, Amsterdam.

[10] Doob, J.L. 1953. Stochastic Processes. John Wiley & Sons, New York.

[11] Hammersley, J.M. and D.C. Handscomb. 1964. Monte Carlo Methods. Methuen & Co. Ltd., London.

[12] Helms, L.L. 1969. Introduction to Potential Theory. John Wiley & Sons, New York.

[13] Karwan, M.H., V. Lotfi, J. Telgen, and S. Zionts (eds.). 1983. Redundancy in Mathematical Programming. Springer-Verlag, Berlin.

[14] Kaufman, D.E. and R.L. Smith. 1991. Artificial Centering Hit-and-Run. Working Paper, Department of Industrial and Operations Engineering, The University of Michigan.

[15] Knuth, D.E. 1981. The Art of Computer Programming, Second Edition, Vol. 2. Addison-Wesley, Reading, Massachusetts.

[16] Patel, N.R., R.L. Smith, and Z.B. Zabinsky. 1988. Pure Adaptive Search in Monte Carlo Optimization. Mathematical Programming 43, 317-328.

[17] Rinnooy Kan, A.H.G. and G.T. Timmer. 1987. Stochastic Global Optimization Methods - Part I: Clustering Methods. Mathematical Programming 39, 27-56.

[18] Rubinstein, R.Y. 1981. Simulation and the Monte Carlo Method. John Wiley & Sons, New York.

[19] Schmeiser, B.W. 1981. Random Variate Generation: A Survey. In T.I. Oren, C.M. Shub, and P.F. Roth (eds.), Simulation with Discrete Models: A State of the Art View, 79-104. IEEE, New York.

[20] Solis, F.J. and R.J.-B. Wets. 1981. Minimization by Random Search Techniques. Mathematics of Operations Research 6, 19-30.

[21] Smith, R.L. 1980. Monte Carlo Techniques for Generating Random Feasible Solutions to Mathematical Programs. Presented at the ORSA/TIMS Conference, Washington, D.C.

[22] Smith, R.L. 1984. Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed over Bounded Regions. Operations Research 32, 1296-1308.

[23] Telgen, J. 1980. Private communication with A. Boneh.

[24] Zabinsky, Z.B. and R.L. Smith. 1989. Pure Adaptive Search in Global Optimization. Technical Report 89-22, Department of Industrial and Operations Engineering, The University of Michigan. To appear in Mathematical Programming.

[25] Zabinsky, Z.B., R.L. Smith, and J.F. McDonald. 1990. Improving Hit-and-Run for Global Optimization. Working Paper, Department of Industrial and Operations Engineering, The University of Michigan.