Hit-and-Run Algorithms for Generating Multivariate Distributions*

Claude J. P. Belisle, Department of Statistics, The University of Michigan, Ann Arbor, MI 48109-1027
H. Edwin Romeijn, Department of Operations Research and Tinbergen Institute, Erasmus University Rotterdam, 3000 DR Rotterdam, The Netherlands
Robert L. Smith, Department of Industrial and Operations Engineering, The University of Michigan, Ann Arbor, MI 48109-2117

Technical Report 90-18

ABSTRACT

We introduce a general class of Hit-and-Run algorithms for generating absolutely continuous distributions on R^d. They include the Hypersphere Directions algorithm and the Coordinate Directions algorithm that have been proposed for identifying nonredundant linear constraints and for generating uniform distributions over subsets of R^d. Given a bounded open set S in R^d, an absolutely continuous probability distribution π on S (the target distribution) and an arbitrary probability distribution ν on the boundary ∂D of the d-dimensional unit sphere D centered at the origin (the direction distribution), the (ν, π)-Hit-and-Run algorithm produces a sequence of iteration points as follows. Given the nth iteration point x, choose a direction θ according to the distribution ν and choose the (n+1)st iteration point according to the conditionalization of the distribution π along the line {x + λθ; λ ∈ R}. (The Hypersphere Directions algorithm corresponds to the case where π is the uniform distribution on S and where ν is the uniform distribution on ∂D. The Coordinate Directions algorithm corresponds to the case where π is the uniform distribution on S and where ν is the distribution that assigns equal probability 1/2d to each of the 2d coordinate directions.) The (ν, π)-Hit-and-Run algorithm defines a Markov chain on S. Our first main result is that under some mild conditions on the density of π, this (ν, π)-Hit-and-Run Markov chain is time reversible with respect to π. It then follows that π is a stationary distribution for the chain.
Our second main result is that under an appropriate condition on ν and S, the (ν, π)-Hit-and-Run Markov

*This work was supported in part by NATO Collaborative Research Grant No. 0119/89.
Key words and phrases. Hit-and-Run algorithms, Monte-Carlo methods, Markov chains, time reversibility, Harris recurrence.

chain is Harris recurrent with respect to Lebesgue measure. It then follows from a theorem of Orey that for every initial distribution the (ν, π)-Hit-and-Run Markov chain converges in total variation, and hence in distribution, to its stationary distribution π.

1. INTRODUCTION

Let S be a bounded open subset of R^d and let π be an absolutely continuous probability measure on S. Let f(x) be a probability density function for π and assume that it is bounded, almost everywhere continuous (with respect to Lebesgue measure on S), and strictly positive. Let D denote the d-dimensional unit sphere centered at the origin and let ∂D denote its topological boundary. Thus D = {x ∈ R^d : ||x|| ≤ 1} and ∂D = {x ∈ R^d : ||x|| = 1}. Finally, let ν be an arbitrary probability measure on ∂D. The Hit-and-Run algorithm with direction distribution ν and with target distribution π (for short, the (ν, π)-Hit-and-Run algorithm on S) can be described as follows:

Step 0. Choose a starting point x_0 ∈ S and set k = 0.
Step 1. Choose a direction θ_k on ∂D, with distribution ν.
Step 2. Choose λ_k ∈ Λ_k = {λ ∈ R : x_k + λθ_k ∈ S}, from the distribution with density

    f_k(λ) = f(x_k + λθ_k) / ∫_{Λ_k} f(x_k + rθ_k) dr.

Step 3. Set x_{k+1} = x_k + λ_kθ_k, set k = k + 1, and go to Step 1.

Figure 1 illustrates f_k(λ), the (reparametrized) density function of the conditionalization of the distribution π along the line Λ_k. Geometrically, f_k is a cross section of the multivariate density function f, normalized to be a density function. If ν is the uniform distribution on ∂D and if π is the uniform distribution on S, the algorithm is known as the Hypersphere Directions algorithm. This special case was first suggested by Boneh and Golan [1979] in the context of non-redundant constraint identification and later independently by Smith [1980] for generating points uniformly distributed over S.
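Steps 0-3 can be sketched in a few lines of Python. This is only an illustrative implementation, not part of the paper: ν is taken to be the uniform distribution on ∂D (the Hypersphere Directions case), and Step 2 is approximated by inverse-transform sampling on a fine grid of λ values; the set S, the density f, and the grid parameters are all assumptions of the sketch.

```python
import numpy as np

def hit_and_run(f, in_S, x0, n_steps, chord=4.0, grid=2048, rng=None):
    """One sample path of the (nu, pi)-Hit-and-Run chain of Steps 0-3.

    f     : target density on S (up to a normalizing constant)
    in_S  : indicator function of the open set S
    Here nu is the uniform distribution on the unit sphere boundary, and
    Step 2 is approximated by sampling lambda_k from a fine grid on
    [-chord, chord], which must cover every chord of S through x.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    lam = np.linspace(-chord, chord, grid)            # candidate lambda values
    for _ in range(n_steps):
        theta = rng.standard_normal(x.size)           # Step 1: a normalized
        theta /= np.linalg.norm(theta)                # Gaussian is uniform on dD
        pts = x + lam[:, None] * theta                # points x + lambda * theta
        w = np.array([f(p) if in_S(p) else 0.0 for p in pts])
        w /= w.sum()                                  # discretized density f_k
        x = pts[rng.choice(grid, p=w)]                # Step 2: draw lambda_k
        path.append(x.copy())                         # Step 3: next iterate
    return np.asarray(path)

# Example: uniform target on the open unit square in R^2.
in_square = lambda p: bool(np.all(p > 0) and np.all(p < 1))
path = hit_and_run(lambda p: 1.0, in_square, [0.5, 0.5],
                   n_steps=200, chord=2.0, rng=0)
```

Because candidate points outside S receive weight zero in Step 2, every iterate remains in S, mirroring the requirement λ_k ∈ Λ_k.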
Later Smith [1984] showed that, for S open and bounded, the sequence of iteration points of the Hypersphere Directions algorithm converges to the uniform distribution on S. If ν is the discrete distribution that assigns mass 1/2d to each of the 2d coordinate directions and if π is the uniform distribution on S, the algorithm is known as the Coordinate Directions algorithm. This special case was suggested by Telgen [1980]. Later Berbee et

Figure 1. The conditionalization of π along the line Λ_k

al. [1987] showed that if S is a convex polyhedron then the sequence of iteration points of the Coordinate Directions algorithm converges to the uniform distribution on S. Recently Boender et al. [1989] have introduced a related class of algorithms, known as Shake-and-Bake algorithms, for generating points that are (asymptotically) uniformly distributed on the boundary of a full-dimensional convex polyhedron. The Shake-and-Bake algorithms will not be investigated here. Clearly the (ν, π)-Hit-and-Run algorithm defines a discrete time Markov chain on S with stationary transition probabilities. The main purpose of this paper is to investigate the convergence properties of this Markov chain and to generalize the results of Smith [1984] and Berbee et al. [1987] in the following three directions: a) the region S will be an arbitrary bounded open subset of R^d, b) the direction distribution ν will be completely arbitrary, c) the target distribution π will be an arbitrary absolutely continuous distribution on S with a bounded, almost everywhere continuous, and strictly positive density. The paper is organized as follows. In section 2 we give a precise analytical description of the (ν, π)-Hit-and-Run Markov chain. In section 3 we show that the (ν, π)-Hit-and-Run Markov chain is time reversible with respect to π. This implies that π is a stationary distribution for the (ν, π)-Hit-and-Run Markov chain. In section 4 we show that under an appropriate communication structure, described in terms of ν and S, the stationary distribution π is unique and the (ν, π)-Hit-and-Run Markov chain is Harris recurrent (with respect to Lebesgue measure on S). Using a theorem of Orey, we then conclude that for every starting point x, the (ν, π)-Hit-and-Run Markov chain (X_n; n ≥ 0) converges in total variation to the stationary distribution π, i.e.

    lim_{n→∞} P[X_n ∈ B | X_0 = x] = π(B)   ∀x ∈ S, ∀B ∈ B_S,

where B_S denotes the Borel σ-field on S. Section 5 is a brief discussion of some generalizations and practical implications of our results.

2. HIT-AND-RUN MARKOV KERNELS

We begin by recalling some standard definitions. Let (S, B) be a measurable space, i.e. S is an arbitrary set and B is a σ-field on S. A kernel on (S, B) is a nonnegative function, say K, defined on S × B, such that
(i) ∀x ∈ S, K(x, ·) is a σ-finite measure on B;
(ii) ∀A ∈ B, K(·, A) is a measurable function on S.
A substochastic kernel is a kernel K satisfying K(x, S) ≤ 1 for every x. A stochastic (or Markov) kernel is a kernel K satisfying K(x, S) = 1 for every x. See e.g. Nummelin [1984] section 1, Orey [1971] chapter 1 section 0, Revuz [1975] chapter 1 section 1. Now let π, f and ν be as described at the beginning of section 1 and let B_S be the Borel σ-field on S. Let Θ and U be random variables, defined on a probability space (Ω, F, P), such that Θ has distribution ν, such that U is uniformly distributed over the interval (0, 1), and such that Θ and U are independent. For x, y ∈ S, with x ≠ y, let

    Λ(x, y) = {λ ∈ R : x + λ(y − x)/||y − x|| ∈ S}

and let f_(x,y) be the p.d.f. on R defined by

    f_(x,y)(λ) = f(x + λ(y − x)/||y − x||) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr

if λ ∈ Λ(x, y) and f_(x,y)(λ) = 0 if λ ∉ Λ(x, y). Let F_(x,y) denote the c.d.f. of f_(x,y), i.e.,

    F_(x,y)(λ) = ∫_{−∞}^{λ} f_(x,y)(u) du.

The function

    P(x, A) = P[x + F⁻¹_(x,x+Θ)(U)·Θ ∈ A]

defines a Markov kernel on (S, B_S). Here F⁻¹_(x,y) is the usual right continuous inverse of F_(x,y). Since Θ has distribution ν and since U and Θ are independent, the right hand side of the last equation can be written as

    ∫_{∂D} P[x + F⁻¹_(x,x+θ)(U)·θ ∈ A] ν(dθ).

Furthermore, since F⁻¹_(x,x+θ)(U) is a Λ(x, x + θ)-valued random variable with p.d.f. f_(x,x+θ), the above Markov kernel is the one-step transition probability of the Hit-and-Run algorithm described in section 1. This motivates the following formal definition:
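The construction above produces the next iterate as x + F⁻¹_(x,x+θ)(U)·θ by inverting the c.d.f. F_(x,x+θ). For a concrete feel, the following sketch computes this inverse numerically on a grid; it is not part of the paper, and the set S, the density, the bracketing interval and the grid size are all illustrative assumptions.

```python
import numpy as np

def step_along_line(f, in_S, x, theta, u, lo=-4.0, hi=4.0, grid=4096):
    """Numerical version of x + F^{-1}_{(x, x+theta)}(u) * theta.

    Tabulates the conditional density f_{(x,y)} on a lambda-grid, forms
    its c.d.f. F, and evaluates the right continuous inverse at u.
    The bracketing interval [lo, hi] must contain Lambda(x, x + theta).
    """
    lam = np.linspace(lo, hi, grid)
    pts = x + lam[:, None] * theta
    dens = np.array([f(p) if in_S(p) else 0.0 for p in pts])
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]                       # normalized c.d.f. F_{(x,y)}
    k = np.searchsorted(cdf, u)          # right continuous inverse at u
    return x + lam[k] * theta

# Uniform density on the unit square, horizontal direction through (0.3, 0.5);
# the chord is {(t, 0.5) : 0 < t < 1}, so u = 1/2 maps near its midpoint.
in_square = lambda p: bool(np.all(p > 0) and np.all(p < 1))
y = step_along_line(lambda p: 1.0, in_square, np.array([0.3, 0.5]),
                    np.array([1.0, 0.0]), u=0.5)
```

For the uniform target, F_(x,y) is linear on the chord, so u = 1/2 returns (approximately) the chord's midpoint (0.5, 0.5).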

DEFINITION 1. The Markov kernel

    P(x, A) = ∫_{∂D} P[x + F⁻¹_(x,x+θ)(U)·θ ∈ A] ν(dθ),   x ∈ S, A ∈ B_S,

will be called the (ν, π)-Hit-and-Run Markov kernel on S. The probability measure ν will be called the direction distribution. The probability measure π will be called the target distribution.

3. SYMMETRY, TIME REVERSIBILITY AND STATIONARY DISTRIBUTIONS

Let P be a Markov kernel on an arbitrary measurable space (S, B) and let μ be a probability measure on (S, B).

DEFINITION 2. The Markov kernel P is said to be time reversible with respect to the probability measure μ if

    ∫_A P(x, B) μ(dx) = ∫_B P(x, A) μ(dx)   ∀A, B ∈ B.

DEFINITION 3. The probability measure μ is said to be a stationary distribution for the Markov kernel P if

    μ(A) = ∫_S P(x, A) μ(dx)   ∀A ∈ B.

Definition 3 is standard. Definition 2 is a straightforward generalization of the concept of time reversibility for discrete state space Markov chains (see e.g. Ross [1983], page 126) and it is consistent with the concept of time reversibility for general stochastic processes as discussed e.g. in Kelly [1979], page 5. A function p(x, y) defined on S² will be called symmetric if p(x, y) = p(y, x) for every x and y in S. The following propositions are elementary consequences of the definitions.

PROPOSITION 1. Suppose that P is of the form

    P(x, A) = ∫_A p(x, y) μ(dy)   ∀x ∈ S, ∀A ∈ B

for some jointly measurable symmetric function p(x, y) on S². Then P is time reversible with respect to μ.

PROOF: Under the given assumptions, an application of Fubini's theorem yields

    ∫_A P(x, B) μ(dx) = ∫_A ( ∫_B p(x, y) μ(dy) ) μ(dx)
                      = ∫_B ( ∫_A p(x, y) μ(dx) ) μ(dy)
                      = ∫_B ( ∫_A p(y, x) μ(dx) ) μ(dy)
                      = ∫_B P(y, A) μ(dy)   ∀A, B ∈ B.

Thus P is time reversible with respect to μ. ∎

PROPOSITION 2. If P is time reversible with respect to μ then μ is a stationary distribution for P.

PROOF: Assuming time reversibility with respect to μ yields

    ∫_S P(x, A) μ(dx) = ∫_A P(x, S) μ(dx) = ∫_A μ(dx) = μ(A)   ∀A ∈ B.

Thus μ is stationary for P. ∎

We now return to the case where S is a bounded Borel subset of R^d, where ν is an arbitrary probability measure on ∂D and where π is an absolutely continuous probability measure on S with a bounded, almost everywhere continuous, and strictly positive density f(x). For every Borel set G ⊆ ∂D, and for every 0 < r_1 < r_2 < ∞, let

    G ⊗ (r_1, r_2) = {y ∈ R^d : y = λθ for some θ ∈ G and λ ∈ (r_1, r_2)}.

Figure 2 illustrates the set G ⊗ (r_1, r_2).

Figure 2. An illustration of the set G ⊗ (r_1, r_2)
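Propositions 1 and 2 have a transparent finite-state analogue that can be checked by direct computation: choose any symmetric p(x, y) whose rows integrate to 1 against μ, form P(x, {y}) = p(x, y) μ({y}), and verify detailed balance and stationarity. The construction below (a random μ and a small symmetric off-diagonal part, both purely illustrative) is such a check, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
mu = rng.random(n)
mu /= mu.sum()                                 # a probability measure mu
q = rng.random((n, n))
q = 0.1 * (q + q.T)                            # small symmetric off-diagonal part
p = q.copy()
np.fill_diagonal(p, 0.0)
# choose p(x, x) so that sum_y p(x, y) mu({y}) = 1, making P a Markov kernel
p[np.diag_indices(n)] = (1.0 - p @ mu) / mu
P = p * mu[None, :]                            # P(x, {y}) = p(x, y) mu({y})

assert np.allclose(P.sum(axis=1), 1.0)         # P is a Markov kernel
balance = mu[:, None] * P                      # mu(dx) P(x, dy)
assert np.allclose(balance, balance.T)         # time reversibility (Definition 2)
assert np.allclose(mu @ P, mu)                 # mu is stationary (Proposition 2)
```

The scaling factor 0.1 keeps the off-diagonal mass below 1, so the diagonal completion stays nonnegative; symmetry of p off the diagonal is exactly the hypothesis of Proposition 1.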

Let m(x, A) be the unique kernel on (R^d, B_{R^d}) satisfying

(1)    m(x, x + G ⊗ (r_1, r_2)) = (ν(−G) + ν(G))(r_2 − r_1).

Thus m(0, ·) is the measure with polar infinitesimal volume element (ν(−dθ) + ν(dθ)) dr and m(x, ·) is just a translation of the measure m(0, ·). The next proposition gives us an explicit analytical expression for the Hit-and-Run Markov kernel.

PROPOSITION 3.
(a) The (ν, π)-Hit-and-Run Markov kernel can be written as

    P(x, A) = ∫_A [ f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] m(x, dy).

(b) If ν is absolutely continuous with respect to the uniform probability distribution on ∂D then the (ν, π)-Hit-and-Run Markov kernel can be written as

(2)    P(x, A) = ∫_A [ f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] · [ h((y − x)/||y − x||) + h((x − y)/||x − y||) ] / ( ||y − x||^{d−1} C_d ) dy,

where h(θ) denotes the density of ν with respect to the uniform probability distribution on ∂D and where C_d = 2π^{d/2}/Γ(d/2) (i.e. C_d is the surface area of ∂D).

PROOF: Part (a) is an immediate consequence of the definitions:

    P(x, A) = ∫_{∂D} P[x + F⁻¹_(x,x+θ)(U)·θ ∈ A] ν(dθ)
            = ∫_{∂D} [ ∫_{Λ(x,x+θ)} 1_A(x + λθ) f(x + λθ) dλ / ∫_{Λ(x,x+θ)} f(x + rθ) dr ] ν(dθ)
(3)         = ∫_A [ f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] m(x, dy).

Here, and throughout the rest of this paper, 1_A denotes the indicator function of the set A. The first equality is Definition 1. The second equality follows from the fact that F⁻¹_(x,x+θ)(U) is a Λ(x, x + θ)-valued random variable with p.d.f.

    λ ↦ f(x + λθ) / ∫_{Λ(x,x+θ)} f(x + rθ) dr.

The last equality can be justified as follows. If B is a Borel subset of S of the form B = x + G ⊗ (r_1, r_2) then

    ∫_{∂D} ∫_{Λ(x,x+θ)} 1_B(x + λθ) dλ ν(dθ) = ∫_S 1_B(y) m(x, dy)

since both sides are equal to m(x, x + G ⊗ (r_1, r_2)). A standard measure theoretic argument (see e.g. Breiman [1968] Proposition 2.23) then yields

(4)    ∫_{∂D} ∫_{Λ(x,x+θ)} g(x + λθ) dλ ν(dθ) = ∫_S g(y) m(x, dy)

for every non-negative measurable function g defined on S. This holds in particular for the function

    g(y) = 1_A(y) f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr,

in which case (4) reduces to (3). Now consider part (b). If ν is of the form ν(G) = ∫_G h(θ) ν_u(dθ), where ν_u is the uniform distribution on ∂D, then the kernel m(x, A) can be written as

    m(x, A) = ∫_A [ h((y − x)/||y − x||) + h((x − y)/||x − y||) ] / ( ||y − x||^{d−1} C_d ) dy

and therefore the (ν, π)-Hit-and-Run Markov kernel can be written as

    P(x, A) = ∫_A [ f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] · [ h((y − x)/||y − x||) + h((x − y)/||x − y||) ] / ( ||y − x||^{d−1} C_d ) dy. ∎

Proposition 3 is the key step towards our first main result:

THEOREM 1. The (ν, π)-Hit-and-Run Markov kernel is time reversible with respect to π.

PROOF: In the absolutely continuous case the result follows from Proposition 1 and part (b) of Proposition 3 (since the integrand in (2) is symmetric). Now consider the general case. Let ν be an arbitrary probability measure on ∂D. Let ν_1, ν_2, ν_3, ... be a sequence of probability measures on ∂D such that ν_n converges weakly to ν as n → ∞ (for a definition of weak convergence, see e.g. Billingsley [1968]) and such that ν_n is absolutely continuous with respect to the uniform distribution on ∂D. (The fact that such a sequence exists can be seen as follows. Let Θ be a ∂D-valued random variable with distribution ν, let U_n be an R^d-valued random variable with uniform distribution over the open ball of radius 1/n centered at the origin and assume that Θ and U_n are

independent. Let ν_n be the distribution of (Θ + U_n)/||Θ + U_n||. Then ν_n has the desired properties.) Let m_n(x, A) be the kernel induced by ν_n, as in (1), and let P_n(x, A) denote the (ν_n, π)-Hit-and-Run Markov kernel. Since ν_n is absolutely continuous, P_n(x, A) is time reversible:

(5)    ∫_A P_n(x, B) π(dx) = ∫_B P_n(x, A) π(dx)   ∀A, B ∈ B_S.

Furthermore, since ν_n converges weakly to ν, the measure m_n(x, ·) converges weakly to the measure m(x, ·), for every x in S. Now from part (a) of Proposition 3 we have

(6)    P_n(x, G) = ∫_S [ 1_G(y) f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] m_n(x, dy)   ∀G ∈ B_S.

If G is a finite intersection of open balls then one can show that for almost every x ∈ S the set of discontinuity points of the function

    y ↦ 1_G(y) f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr

has m(x, ·) measure zero. This implies that the right hand side of (6) converges, as n → ∞, to

    ∫_S [ 1_G(y) f(y) / ∫_{Λ(x,y)} f(x + r(y − x)/||y − x||) dr ] m(x, dy)

(see e.g. Billingsley [1968], section 5). Thus, if G is a finite intersection of open balls then

    lim_{n→∞} P_n(x, G) = P(x, G)   for almost all x ∈ S.

Thus, letting n → ∞ in (5) and using the Lebesgue dominated convergence theorem (see e.g. Billingsley [1986] Theorem 16.4), we obtain

(7)    ∫_A P(x, B) π(dx) = ∫_B P(x, A) π(dx)

whenever A and B are finite intersections of open balls in S. To complete the proof, we need to show that (7) holds for every A and B ∈ B_S. This can be done via a standard extension argument. Fix A, a finite intersection of open balls in S, and let C_A = {B ∈ B_S : ∫_A P(x, B) π(dx) = ∫_B P(x, A) π(dx)}. Using the monotone convergence theorem (see e.g. Billingsley [1986] Theorem 16.2) one can easily verify that C_A is a λ-system (see Billingsley [1986] page 36). Since C_A contains all finite intersections of open balls, Dynkin's π-λ theorem (see Billingsley [1986] Theorem 3.2) implies that C_A = B_S. Thus (7) holds whenever A is a finite intersection of open balls in S and B is a Borel set in S. Now fix B ∈ B_S and let C_B = {A ∈ B_S : ∫_A P(x, B) π(dx) = ∫_B P(x, A) π(dx)}.
The same argument yields C_B = B_S. Thus (7) holds for every A, B ∈ B_S. ∎

In view of Proposition 2, the following result follows from Theorem 1.

THEOREM 2. The probability measure π is stationary for the (ν, π)-Hit-and-Run Markov kernel. ∎

It is easy to see that the stationary distribution of a (ν, π)-Hit-and-Run Markov kernel is not necessarily unique. The next section will present a necessary and sufficient condition for π to be the unique stationary distribution for the (ν, π)-Hit-and-Run Markov kernel.

4. THE MAIN LIMIT THEOREM

As before, let S be a bounded open set in R^d, let π be an absolutely continuous probability measure on (S, B_S), assume that π possesses a density f(x) which is bounded, almost everywhere continuous, and strictly positive on S, and let ν be a probability measure on ∂D. By Theorem 2, π is stationary for the (ν, π)-Hit-and-Run Markov kernel P = (P(x, B); x ∈ S, B ∈ B_S). The purpose of this section is to show that under an appropriate condition on ν and S, π is the unique stationary distribution for the Markov kernel P and for every initial distribution the Hit-and-Run Markov chain converges to π. This will be achieved by showing that the Hit-and-Run Markov chain is Harris recurrent (see the definition below) and by using a well known theorem of Orey. The Hit-and-Run Markov chain will be denoted (X_n; n ≥ 0) and we will write P_x to denote conditional probability given X_0 = x. We begin with some preliminary results on the communicating structure of the Hit-and-Run Markov kernel. For every non-negative integer n, let P^n = (P^n(x, B); x ∈ S, B ∈ B_S) denote the n-step Hit-and-Run Markov kernel. Thus

    P^0(x, B) = 1 if x ∈ B, and 0 if x ∉ B,

    P^1(x, B) = P(x, B),

and for n ≥ 2,

    P^n(x, B) = ∫_S P^{n−1}(y, B) P(x, dy) = ∫_S P(y, B) P^{n−1}(x, dy).

Recall that P^n(x, B) has a unique decomposition

    P^n(x, B) = P^n_sing(x, B) + P^n_abs(x, B) = P^n_sing(x, B) + ∫_B p^n(x, y) dy,

where P^n_sing(x, B) is a singular substochastic kernel, where P^n_abs(x, B) is an absolutely continuous substochastic kernel and where p^n(x, y) is jointly measurable (see Orey [1971] section 1.1). If P^n_abs(x, S) > 0 then we say that the probability measure P^n(x, ·)
has an absolutely continuous part. If the density p^n(x, y) is positive for almost all y in some Borel set B, then we say that the probability measure P^n(x, ·) spreads over B.

PROPOSITION 4. Let G be an open subset of S. Then P(y, G) > 0 ∀y ∈ G.

PROOF: From Proposition 3,

    P(y, G) = ∫_G [ f(z) / ∫_{Λ(y,z)} f(y + r(z − y)/||z − y||) dr ] m(y, dz).

The above integrand is strictly positive. Furthermore, if G is open and y ∈ G, then m(y, G) > 0. Thus the above integral is strictly positive. ∎

Recall that the support of a probability measure μ on R^d, to be denoted Supp(μ), is defined as

    Supp(μ) = {x ∈ R^d : μ(B(x, ε)) > 0, ∀ε > 0},

where B(x, ε) is the open ball of radius ε centered at x.

PROPOSITION 5. Supp(P^n_abs(x, ·)) ⊆ Supp(P^{n+1}_abs(x, ·)) ∀x ∈ S, ∀n ≥ 1.

PROOF: Fix n ≥ 1 and x ∈ S, and suppose that v ∈ Supp(P^n_abs(x, ·)). Fix ε > 0 small enough so that B = B(v, ε) ⊆ S. Then

    P^{n+1}(x, B) = ∫_S P(y, B) P^n(x, dy)
                  = ∫_S P(y, B) P^n_sing(x, dy) + ∫_S P(y, B) P^n_abs(x, dy).

By Proposition 4 we have P(y, B) > 0 for every y in B and since v ∈ Supp(P^n_abs(x, ·)), we obtain P^n_abs(x, B) > 0. Thus the second integral on the right hand side of the last equality is strictly positive. This implies that P^{n+1}_abs(x, B) > 0. Since this holds for every ε > 0, we conclude that v ∈ Supp(P^{n+1}_abs(x, ·)). Thus Supp(P^n_abs(x, ·)) ⊆ Supp(P^{n+1}_abs(x, ·)). ∎

DEFINITION 4. The probability measure ν is said to be full dimensional if Supp(ν) contains a set of vectors that span R^d (in other words, if the set {0} ∪ Supp(ν) is not contained in any linear subspace of dimension d − 1).

PROPOSITION 6. Suppose that ν is full dimensional. Then

(a) ∀n ≥ d and ∀x ∈ S, the probability measure P^n(x, ·) has an absolutely continuous part.
(b) ∀x ∈ S, lim_{n→∞} P^n_abs(x, S) = 1.
(c) There exists an r > 0, depending only on ν, such that for all x ∈ S and for all n ≥ d, the probability measure P^n_abs(x, ·) has a density which is strictly positive on B(x, rγ_x), where γ_x = inf_{y ∈ S^c} ||x − y|| (and where S^c denotes the complement of S).

PROOF: Let L_n denote the subset of (∂D)^n consisting of those points (θ_1, θ_2, ..., θ_n) for which there exist integers 1 ≤ i_1 < i_2 < ... < i_d ≤ n such that θ_{i_1}, θ_{i_2}, ..., θ_{i_d} are linearly independent in R^d. Let Θ_1, Θ_2, Θ_3, ... be the successive directions taken by the (ν, π)-Hit-and-Run Markov chain. Since Θ_1, Θ_2, Θ_3, ... are independent random variables with distribution ν and since ν is full dimensional,

    P[(Θ_1, Θ_2, ..., Θ_n) ∈ L_n] > 0   ∀n ≥ d

and

    P[(Θ_1, Θ_2, ..., Θ_n) ∈ L_n] → 1   as n → ∞.

One of the most important features of the Hit-and-Run Markov chain is that if (θ_1, θ_2, ..., θ_n) ∈ L_n then for every x in S the conditional distribution

    P_x[X_n ∈ G | (Θ_1, Θ_2, ..., Θ_n) = (θ_1, θ_2, ..., θ_n)]

is absolutely continuous with respect to Lebesgue measure on S. Thus parts (a) and (b) follow from the fact that

    P^n(x, G) = ∫_{(∂D)^n} P_x[X_n ∈ G | (Θ_1, Θ_2, ..., Θ_n) = (θ_1, θ_2, ..., θ_n)] ν(dθ_1) ν(dθ_2) ... ν(dθ_n).

Now consider linearly independent θ_1, θ_2, ..., θ_d in Supp(ν). By continuity, we can choose r > 0 and ε > 0 small enough so that the sets

    A_k = B(θ_k, ε) ∩ ∂D,   k = 1, 2, ..., d,

are linearly independent (in the sense that u_1, u_2, ..., u_d are linearly independent whenever u_k ∈ B(θ_k, ε) ∩ ∂D, k = 1, 2, ..., d) and

    B(0, r) ⊆ { Σ_{k=1}^d α_k u_k : −1 ≤ α_k ≤ 1, k = 1, 2, ..., d }

for every (u_1, u_2, ..., u_d) ∈ A_1 × A_2 × ... × A_d. Then for every (u_1, u_2, ..., u_d) ∈ A_1 × A_2 × ... × A_d, the conditional distribution

    P_x[X_d ∈ G | (Θ_1, Θ_2, ..., Θ_d) = (u_1, u_2, ..., u_d)]

is absolutely continuous and it has a density which is strictly positive on B(x, rγ_x). Since Θ_1, Θ_2, ..., Θ_d are independent random vectors with common distribution ν and since θ_1, θ_2, ..., θ_d are in the support of ν,

    P[(Θ_1, Θ_2, ..., Θ_d) ∈ A_1 × A_2 × ... × A_d] = Π_{i=1}^d P[Θ_i ∈ A_i] = Π_{i=1}^d ν(A_i) > 0.

Thus P^d_abs(x, ·) has a density which is strictly positive on B(x, rγ_x). Combined with Proposition 5, this proves part (c). ∎

Let φ be a σ-finite measure on (S, B_S). Recall that a Markov kernel P is called φ-irreducible if

    Σ_{n=1}^∞ P^n(x, B) > 0   ∀x ∈ S

whenever φ(B) > 0. In terms of the associated Markov chain (X_n; n ≥ 0), φ-irreducibility is equivalent to the statement that

    P_x[X_n ∈ B for some n ≥ 1] > 0   ∀x ∈ S

whenever φ(B) > 0. Throughout the rest of this section φ will denote the Lebesgue measure on (S, B_S). Recall that a topological space T is said to be connected if the only subsets of T which are both open and closed are the empty set and T itself. It is said to be pathwise connected if for every a, b ∈ T there exists a continuous function g : [0, 1] → T such that g(0) = a and g(1) = b. If T is an open subset of R^d, then connectedness and pathwise connectedness are equivalent. Furthermore every open subset T of R^d is a union of at most countably many disjoint open connected sets called the connected components of T.

PROPOSITION 7. Suppose that ν is full dimensional. Let A be a connected component of S. Let G be a measurable subset of A with positive Lebesgue measure. Then
(a) For all x ∈ A, there exists an integer n such that P^n(x, G) > 0.
(b) If m(·) is a measure on (S, B_S) such that m(A) > 0, then there exists an integer n such that

    ∫_S P^n(x, G) m(dx) > 0.

PROOF: Fix x ∈ A. Let G be a measurable subset of A with positive Lebesgue measure. Choose y ∈ A such that for every ε > 0 the set B(y, ε) ∩ G has positive Lebesgue measure. Let g : [0, 1] → S be such that g(0) = x, g(1) = y, and g is continuous on [0, 1]. Let

    γ = inf{||u − v|| : u ∈ g([0, 1]), v ∈ A^c}.

Observe that γ is strictly positive since it is the distance between the compact set g([0, 1]) and the closed set A^c. Let r be as in Proposition 6 and let ε* = rγ. Now consider open balls

    B_i = B(x_i, ε*),   i = 1, 2, ..., k,

where x_1 = x, x_k = y, and x_i ∈ B_{i−1} ∩ g([0, 1]) for i = 2, 3, ..., k, as shown in Figure 3.

Figure 3. A path from x to y

Such a construction is possible since the set g([0, 1]) is compact. It follows from Proposition 6 that P^{id}(x, ·) spreads over B_i. In particular, P^{kd}(x, ·) spreads over B_k. This implies that P^{kd}(x, G) > 0. This proves part (a). The argument actually shows that P^{kd}(x′, G) > 0 ∀x′ ∈ B_1. Thus if in the proof of part (a) we take x ∈ Supp(m), then we obtain

    ∫_S P^{kd}(x′, G) m(dx′) ≥ ∫_{B_1} P^{kd}(x′, G) m(dx′) > 0.

This proves part (b). ∎

DEFINITION 5. The connected components of S are said to be ν-communicating if for every pair of connected components A and B there exist an x ∈ A and an n ≥ 1 such that

(8)    P^n(x, B) > 0.

In particular, if S is connected or if Supp(ν) = ∂D then the connected components of S ν-communicate. If d = 2, S = B((0, 0), 1) ∪ B((10, 10), 1) and Supp(ν) = {(1, 0), (0, 1)} then the connected components of S do not ν-communicate. If d = 2, S = B((0, 0), 1) ∪
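The failure of ν-communication in the two-ball example can be verified by elementary geometry: from any point of B((0, 0), 1), a line in a coordinate direction stays at distance greater than 9 from the point (10, 10), so it never meets B((10, 10), 1). The following sketch (not part of the paper; the grid of starting points is an illustrative discretization) checks this, and also that a diagonal direction does reach the second ball.

```python
import numpy as np

def line_hits_ball(x, theta, center, radius=1.0):
    """Does the line {x + lam * theta : lam in R} meet the open ball B(center, radius)?

    The line meets the ball iff the squared distance from center
    to the line is smaller than radius squared.
    """
    x, theta, center = (np.asarray(a, dtype=float) for a in (x, theta, center))
    v = center - x
    dist2 = v @ v - (v @ theta) ** 2 / (theta @ theta)
    return bool(dist2 < radius ** 2)

# Coordinate directions started anywhere in B((0,0),1) never reach B((10,10),1) ...
hits = [
    line_hits_ball([x1, x2], theta, [10.0, 10.0])
    for x1 in np.linspace(-0.9, 0.9, 19)
    for x2 in np.linspace(-0.9, 0.9, 19)
    if x1 ** 2 + x2 ** 2 < 1
    for theta in ([1.0, 0.0], [0.0, 1.0])
]
# ... whereas the diagonal direction from the origin does:
diag_hit = line_hits_ball([0.0, 0.0], [1.0, 1.0], [10.0, 10.0])
```

Since the chain can only move along lines whose directions lie in Supp(ν), a chain started in the first ball never assigns positive probability to the second, which is exactly why the uniform distribution on S cannot be the unique stationary distribution in this example.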

B((10, 1), 1) ∪ B((10, 10), 1) and Supp(ν) = {(1, 0), (0, 1)} then the connected components of S do ν-communicate. Using part (a) of Proposition 3 and Theorems 5.2 and 5.5 of Billingsley [1968], one concludes that the Hit-and-Run Markov kernels and their iterates are continuous in the sense that if x_k → x then, for n ≥ 1, the probability measure P^n(x_k, ·) converges weakly, as k → ∞, to the probability measure P^n(x, ·). This implies that if B is an open set and if k → ∞ then

    liminf_{k→∞} P^n(x_k, B) ≥ P^n(x, B)

(see e.g. Billingsley [1968], Theorem 2.1). A consequence of this observation is that condition (8) is equivalent to the condition that for some n ≥ 1

(9)    φ({x ∈ A : P^n(x, B) > 0}) > 0.

PROPOSITION 8. Suppose that ν is full dimensional and that the connected components of S are ν-communicating. Then the (ν, π)-Hit-and-Run Markov kernel P is φ-irreducible.

PROOF: Fix x ∈ S. Fix G ∈ B_S with φ(G) > 0. We need to show that

(10)    P^n(x, G) > 0 for some n ≥ 1.

Let A be the open connected component of S to which x belongs. Let B be an open connected component of S such that φ(G ∩ B) > 0. If A = B, then (10) follows from part (a) of Proposition 7. If A ≠ B, then let n_2 ≥ 1 and y ∈ A be such that P^{n_2}(y, B) > 0. By the remark following Definition 5, the set C = {z ∈ A : P^{n_2}(z, B) > 0} has positive Lebesgue measure. By part (a) of Proposition 7, there exists an integer n_1 ≥ 1 such that P^{n_1}(x, C) > 0. We then obtain

    P^{n_1+n_2}(x, B) = ∫_S P^{n_2}(z, B) P^{n_1}(x, dz) ≥ ∫_C P^{n_2}(z, B) P^{n_1}(x, dz) > 0

and therefore by part (b) of Proposition 7 there exists an integer n_3 ≥ 1 such that

    ∫_B P^{n_3}(z, G ∩ B) P^{n_1+n_2}(x, dz) > 0.

Now if we take n = n_1 + n_2 + n_3, then

    P^n(x, G) ≥ P^n(x, G ∩ B) ≥ ∫_B P^{n_3}(z, G ∩ B) P^{n_1+n_2}(x, dz) > 0,

i.e. (10) holds. ∎

Recall that a Markov kernel P is said to be indecomposable if there are no disjoint nonempty sets A and B such that

    P(x, A) = 1   ∀x ∈ A   and   P(x, B) = 1   ∀x ∈ B

(see Breiman [1968]). It is easy to verify that φ-irreducibility implies indecomposability. Thus the following proposition is an immediate consequence of Proposition 8.

PROPOSITION 9. Suppose that ν is full dimensional and that the connected components of S are ν-communicating. Then the (ν, π)-Hit-and-Run Markov kernel is indecomposable. ∎

Theorem 7.16 of Breiman [1968] says that an indecomposable Markov chain possesses at most one stationary probability distribution. The result is stated for the case where the state space is a Borel subset of R but the proof that Breiman presents is also valid for the case where the state space is a Borel subset of R^d. Thus Theorem 2 combined with Proposition 9 implies that if ν is full dimensional and if the connected components of S are ν-communicating then π is the unique stationary distribution for the (ν, π)-Hit-and-Run Markov kernel. It is easy to see that these two conditions are actually necessary for uniqueness of the stationary distribution. If the connected components of S do not ν-communicate then S is a union of two connected open sets, say A and B, that do not ν-communicate, and the probability measures defined by π_A(G) = π(G ∩ A)/π(A) and π_B(G) = π(G ∩ B)/π(B) are both stationary. If ν is not full dimensional then the linear subspace of R^d spanned by Supp(ν), say H, is less than d dimensional and any conditionalization of the probability measure π to a set of the form x + H, with x ∈ S, will be stationary. Thus we have proved the following refinement of Theorem 2:

THEOREM 3. The probability distribution π is stationary for the (ν, π)-Hit-and-Run Markov kernel.
Furthermore, it is the unique stationary distribution if and only if ν is full dimensional and the connected components of S are ν-communicating. ∎

Orey [1971] introduced a notion of periodicity for general φ-irreducible Markov kernels. The next proposition states that Hit-and-Run Markov kernels are aperiodic in Orey's sense.

PROPOSITION 10. Suppose that ν is full dimensional and that the connected components of S are ν-communicating. Then the (ν, π)-Hit-and-Run Markov kernel is aperiodic.

PROOF: By Proposition 8 we have φ-irreducibility. Thus Theorem 3.1 of Orey [1971] applies. By part (c) of Proposition 6, we cannot have a cycle of length k > 1 (see Definition 3.2 of Orey [1971]). Thus the (ν, π)-Hit-and-Run Markov kernel is aperiodic. ∎

Finally, we recall that the Markov kernel P is said to be φ-recurrent (or Harris recurrent with respect to φ) if the corresponding Markov chain (X_n; n ≥ 0) satisfies

    P_x[X_n ∈ B for some n ≥ 1] = 1   ∀x ∈ S

whenever φ(B) > 0. This is equivalent to the statement that

    P_x[ Σ_{n=1}^∞ 1_B(X_n) = ∞ ] = 1   ∀x ∈ S

whenever φ(B) > 0.

THEOREM 4. Suppose that ν is full dimensional and that the connected components of S are ν-communicating. Then the (ν, π)-Hit-and-Run Markov kernel P is Harris recurrent with respect to Lebesgue measure on S.

PROOF: By Theorem 2, π is invariant for P and by Proposition 9, P is indecomposable. Thus, under P_π, the Hit-and-Run Markov chain (X_n; n ≥ 0) is a stationary ergodic sequence (see Breiman [1968], Theorem 7.16, page 136). Now fix B ∈ B_S. Then under P_π, the sequence (1_B(X_n); n ≥ 0) is also a stationary ergodic sequence (see Breiman [1968], Proposition 6.31, page 119; here again Breiman states the result for the case where the state space is a Borel subset of R but his proof is also valid for the case where the state space is a Borel subset of R^d). Thus by the ergodic theorem (see e.g. Breiman [1968], Theorem 6.28, page 118) we obtain

    P_π[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] = 1,

i.e.

    ∫_S P_x[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] f(x) dx = 1.

Since f(x) > 0 ∀x ∈ S, this implies that

(11)    P_x[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] = 1

for almost all x in S. We will now show that (11) holds for all x in S. Fix x in S and fix ε > 0. By part (b) of Proposition 6 we can choose k so that P^k_abs(x, S) > 1 − ε. Now let

    S_0 = { y ∈ S : P_y[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] = 1 }.

Then

    P_x[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ]
        = ∫_S P_y[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] P^k(x, dy)
        ≥ ∫_{S_0} P_y[ lim_{n→∞} (1/n) Σ_{i=1}^n 1_B(X_i) = π(B) ] P^k_abs(x, dy)
        = P^k_abs(x, S_0) = P^k_abs(x, S) > 1 − ε.

The last equality follows from the fact that φ(S_0) = φ(S). This holds for every ε > 0. Thus (11) holds for all x ∈ S. Now if B ∈ B_S and φ(B) > 0, then π(B) > 0 and therefore (11) yields

    P_x[ Σ_{i=1}^∞ 1_B(X_i) = ∞ ] = 1.

Thus the Markov chain is Harris recurrent with respect to Lebesgue measure on S. ∎

Recall that if π, π_1, π_2, π_3, ... are probability measures on S then, by definition, π_n is said to converge in total variation to π if

    lim_{n→∞} π_n(B) = π(B)   ∀B ∈ B_S.

(In comparison, π_n is said to converge in distribution to π (or converge weakly to π) if

    lim_{n→∞} π_n(B) = π(B)   ∀B ∈ B_S with π(∂B) = 0;

see Billingsley [1968] Theorem 2.1.) Orey ([1971], page 30, part (i) of Theorem 7.1) has shown that if an aperiodic φ-recurrent Markov chain possesses a stationary probability distribution π, then for every initial distribution the distribution of X_n converges in total variation, and hence in distribution, to the stationary distribution π. In view of Orey's theorem, the following result is a consequence of Theorem 2, Theorem 4, and Proposition 10.

THEOREM 5. Let S be a bounded open subset of R^d. Let π be an absolutely continuous probability measure on S. Assume that π has a density which is bounded, almost everywhere continuous, and strictly positive on S. Let ν be a probability measure on the boundary ∂D of the d-dimensional unit sphere centered at the origin. Assume that ν is full dimensional and that the connected components of S are ν-communicating. Then π is the unique stationary distribution for the (ν, π)-Hit-and-Run Markov kernel. Furthermore, for every initial distribution the (ν, π)-Hit-and-Run Markov chain converges in total variation, and hence in distribution, to the stationary distribution π. ∎

REMARK. We proved the convergence part of Theorem 5 via Harris recurrence and Orey's theorem. If the direction distribution ν is absolutely continuous (with respect to the uniform distribution on ∂D) then by part (b) of Proposition 3 the probability measures P(x, ·) are absolutely continuous with respect to the stationary distribution π. In this case the convergence part of Theorem 5 can be obtained via a theorem of Doob [1948] which says that if an aperiodic and indecomposable Markov chain possesses a stationary distribution π and if its transition probability distributions P(x, ·) are absolutely continuous with respect to π then, for every initial distribution, this Markov chain converges in total variation to the stationary distribution π (see Theorem 7.18 of Breiman [1968]). With this approach one does not need to prove Harris recurrence.
But if ν is not absolutely continuous (with respect to the uniform distribution on ∂D), as in the Coordinate Directions algorithm for instance, then the probability measures P(x, ·) are not absolutely continuous with respect to π, and this approach via Doob's theorem fails.

5. CONCLUSION

Theorem 5 asserts that

\[
\lim_{n\to\infty}P_x[X_n\in B]=\pi(B)\quad\forall B\in\mathcal{B}_S,\ \forall x\in S,
\]

where (Xₙ; n ≥ 0) is the (ν, π)-Hit-and-Run Markov chain on S. This result has a significant practical implication for Monte Carlo methods. It says that one can use the Hit-and-Run algorithm to generate points that are asymptotically π-distributed.
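This use of the algorithm can be illustrated with a small simulation. The sketch below is our own illustration, not part of the paper: the set S = (0,1)², the target density f(x, y) = x + y, the test set B, and the rejection-sampling step along the chord are all assumptions made for the example. It runs the (ν, π)-Hit-and-Run chain with either the hypersphere or the coordinate direction distribution; rejection sampling with a uniform proposal draws the conditionalization of π along the chord exactly, which is valid here because f is bounded on S.

```python
import math
import random

def hypersphere_direction(rng):
    """nu = uniform distribution on the unit circle (d = 2)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(theta), math.sin(theta))

def coordinate_direction(rng):
    """nu = equal mass 1/(2d) on the 2d = 4 coordinate directions."""
    axis, sign = rng.randrange(2), rng.choice((-1.0, 1.0))
    return (sign, 0.0) if axis == 0 else (0.0, sign)

def chord(x, d):
    """Parameter interval {t : x + t*d in (0,1)^2} of the chord through x."""
    lo, hi = -math.inf, math.inf
    for xi, di in zip(x, d):
        if di == 0.0:
            continue
        a, b = -xi / di, (1.0 - xi) / di
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    return lo, hi

def hit_and_run(f, f_max, direction, n_steps, rng):
    """(nu, pi)-Hit-and-Run on S = (0,1)^2: draw a direction from nu, then
    draw the next iterate from the conditionalization of pi along the chord
    (rejection sampling with a uniform proposal; requires f <= f_max on S)."""
    x = (0.5, 0.5)
    for _ in range(n_steps):
        d = direction(rng)
        lo, hi = chord(x, d)
        while True:
            t = rng.uniform(lo, hi)
            y = (x[0] + t * d[0], x[1] + t * d[1])
            if rng.random() * f_max <= f(y):
                x = y
                break
        yield x

f = lambda p: p[0] + p[1]   # density of pi on S; pi((0,1/2)^2) = 1/8
in_B = lambda p: p[0] < 0.5 and p[1] < 0.5
rng, n = random.Random(0), 40_000
freq_sphere = sum(map(in_B, hit_and_run(f, 2.0, hypersphere_direction, n, rng))) / n
freq_coord = sum(map(in_B, hit_and_run(f, 2.0, coordinate_direction, n, rng))) / n
print(freq_sphere, freq_coord)   # both should approach pi(B) = 1/8
```

Note that the coordinate-directions chain uses a direction distribution ν that is not absolutely continuous on ∂D, yet by Theorem 5 both empirical frequencies converge to π(B).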

Historically, the Coordinate Directions and the Hypersphere Directions Hit-and-Run algorithms have been used for generating uniform distributions over bounded open subsets of R^d. In that context they are known to be efficient when compared to standard methods such as the acceptance-rejection method, especially when d is large. In this paper we have generalized these traditional Hit-and-Run algorithms to allow for an arbitrary direction distribution ν and an essentially arbitrary absolutely continuous target distribution π. Although not addressed here, the boundedness conditions on the open set S and on the density function f can both be removed. Moreover, the condition that f be strictly positive on S and the condition that S be open can both be substantially relaxed.

As a final comment, the general (ν, π)-Hit-and-Run algorithms introduced in this paper show considerable promise for solving certain global optimization problems. For example, by setting the density f to be a suitable increasing function of the objective function g, the target distribution π will concentrate near the global maximum of g, and therefore the Hit-and-Run Markov chain will, with high probability, find itself near that global maximum. This idea is currently being investigated.

REFERENCES

Boneh, A., and Golan, A. [1979]. Constraints' redundancy and feasible region boundedness by random feasible point generator (RFPG). Third European Congress on Operations Research, EURO III, Amsterdam.

Berbee, H.C.P., Boender, C.G.E., Rinnooy Kan, A.H.G., Scheffer, C.L., Smith, R.L., and Telgen, J. [1987]. Hit-and-Run algorithms for the identification of nonredundant linear inequalities. Mathematical Programming, 37, 184-207.

Billingsley, P. [1968]. Convergence of probability measures, Wiley, New York.

Billingsley, P. [1986]. Probability and measure, Second Edition, Wiley, New York.

Boender, C.G.E., Caron, R.J., Rinnooy Kan, A.H.G., McDonald, J.F., Romeijn, H.E., Smith, R.L., Telgen, J., and Vorst, A.C.F. [1989].
Shake-and-Bake algorithms for generating uniform points on the boundary of bounded polyhedra. Technical Report 89-24, Department of Industrial and Operations Engineering, The University of Michigan. Submitted to Operations Research.

Breiman, L. [1968]. Probability, Addison-Wesley, Reading, Massachusetts.

Doob, J.L. [1948]. Asymptotic properties of Markov transition probabilities. Trans. Amer. Math. Soc., 63, 393-421.

Kelly, F.P. [1979]. Reversibility and stochastic networks, Wiley, New York.

Nummelin, E. [1984]. General irreducible Markov chains and non-negative operators, Cambridge University Press, Cambridge.

Orey, S. [1971]. Limit theorems for Markov chain transition probabilities, Van Nostrand Reinhold, New York.

Revuz, D. [1975]. Markov chains, North-Holland Publishing Company, Amsterdam.

Ross, S. [1983]. Stochastic processes, Wiley, New York.

Smith, R.L. [1980]. Monte Carlo procedures for generating random feasible solutions to mathematical programs. ORSA/TIMS Conference, Washington, D.C.

Smith, R.L. [1984]. Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions. Operations Research, 32, 1296-1308.

Telgen, J. [1980]. Private communication with A. Boneh.