New Monte-Carlo Procedures for Numerical Integration

John R. Birge and Robert L. Smith

Technical Report 82-8
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, Michigan 48109

May, 1982

New Monte-Carlo Procedures for Numerical Integration

John R. Birge and Robert L. Smith
Department of Industrial and Operations Engineering
The University of Michigan
Ann Arbor, Michigan 48109

Abstract

A method for generating uniformly distributed points within arbitrary bounded convex regions is used to evaluate integrals over such regions. The procedure involves approximating the region by a mesh of boxes and estimating the content medians in each direction. It is most efficient for regions that are nearly rectangular and may be quite useful in evaluating integrals over polytopes.

New Monte-Carlo Procedures for Numerical Integration
by J. R. Birge and R. L. Smith

1. Introduction

The numerical evaluation of integrals over multi-dimensional regions has been a problem of interest since Clerk-Maxwell's [2] formulas for rectangles in 1877. Approximation procedures have also been successfully applied to standard lower-dimensional regions, such as cubes, simplices and spheres (see Stroud [7]). In higher-dimensional problems, especially when the region can be represented as the Cartesian product of lower-dimensional standard regions, Monte-Carlo methods become generally more efficient. Difficulties arise, however, when the functional domain is more complicated, for reasons discussed below.

The problem we consider is to evaluate numerically

(1)    S = \int_D f(x) \, dx,

where f is a continuous function from R^n into R, x is a point of R^n, D is a contented region in R^n, and S is the value of the integral. Deterministic methods for evaluating S are based on a so-called product rule (Stroud [7]). In these methods, m points x_1, x_2, ..., x_m in D are chosen and weights w_1, w_2, ..., w_m are assigned to these points. The result is an approximation of S,

(2)    S_m = C(D) \sum_{i=1}^{m} w_i f(x_i),

where \sum_{i=1}^{m} w_i = 1 and C(D) is the n-dimensional content of D. The weights and points in (2) are generally chosen so as to ensure that S_m = S for polynomials of degree up to some integer k. The weights are found by solving a system of equations which represents relationships among the coefficients in polynomials obtained as integrals over D of all polynomial functions of degree less than or equal to k. Using symmetry properties of D, lower-dimensional results can be applied to higher-dimensional regions (Hammer and Stroud [4]). As dimensionality increases, however, these approximation techniques require prohibitively many function evaluations and prove inadequate for complicated regions.

When n is large or when D is not an elementary region, Monte-Carlo procedures are generally considered the only practical alternative. The simplest Monte-Carlo estimate (Hammersley and Handscomb [5]) is

(3)    S_k = \frac{C(D)}{k} \sum_{i=1}^{k} f(X_i),

where X_1, X_2, ..., X_k are points uniformly distributed throughout D. Two major difficulties may, however, occur in evaluating (3) because of the complexity of D: (a) X_1, X_2, ..., X_k may not be efficiently generatable in D, and (b) C(D) may be difficult to estimate. The classical techniques for solving (b) are (i) to use the "hit-or-miss" procedure of generating points in a simpler region of known content that encloses D, rejecting points that land outside of D, and estimating C(D) from the fraction of accepted points, or (ii) to decompose D into simpler regions of known content, such as simplices.
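As a concrete illustration of the crude estimate (3) combined with the hit-or-miss treatment of difficulty (b), the following sketch estimates C(D) by rejection from an enclosing box and then applies (3). The region, integrand and enclosing box here are assumptions of the example, not taken from the paper.

```python
# A minimal sketch (not the authors' code) of the crude Monte-Carlo estimate (3)
# combined with hit-or-miss estimation of C(D).  The region D, the enclosing box
# and the integrand f are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 3                                    # dimension
lo, hi = -1.0, 1.0                       # enclosing box [-1,1]^n of known content

def in_D(x):
    # Example region: unit ball intersected with a half-space (convex, bounded).
    return np.dot(x, x) <= 1.0 and x[0] + x[1] <= 0.5

def f(x):
    return np.exp(-np.dot(x, x))         # example integrand

k_total = 50_000
box_content = (hi - lo) ** n
samples = rng.uniform(lo, hi, size=(k_total, n))
hits = np.array([in_D(x) for x in samples])

# Hit-or-miss estimate of C(D): acceptance fraction times the box content.
C_D = hits.mean() * box_content

# Crude Monte-Carlo estimate (3) using only the accepted (uniform-in-D) points.
accepted = samples[hits]
S_k = C_D * np.mean([f(x) for x in accepted])
print(f"estimated C(D) = {C_D:.4f}, estimated integral S = {S_k:.4f}")
```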

The rejection procedure may prove inadequate in higher dimensions because the content of D may be quite small in comparison to the content of an enclosing region such as a sphere. Decomposing D may also be extremely difficult if D is irregular; for example, even decomposing a polytope into simplices requires the evaluation of all extreme points. In the discussion below, a method is given which avoids the difficulties inherent in these two approaches.

2. A Markov Chain Approach for Generating Uniform Sample Points X_1, X_2, ..., X_k

Boneh and Golan [1] and Smith [6] have developed procedures for generating uniformly distributed points over bounded regions. Their procedures are based on a symmetric mixing algorithm that we will adapt to our problem. We first need some basic definitions and assumptions about the region D. An n-dimensional rectangular submesh M is defined as

(4)    M = \bigcup_{j=1}^{N} B_j,

where N < \infty, B_j = {x : l_i^j \delta <= x_i <= (l_i^j + 1)\delta, i = 1, ..., n} is a box of M, \delta is a small positive number, and the integer vectors (l_1^j, ..., l_n^j), j = 1, ..., N, are distinct. A slab, or level, of M in direction i, L_i^k, is defined as

(5)    L_i^k = M \cap {x : k\delta <= x_i <= (k+1)\delta},

where k is an integer.
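One simple way to realize the discretization (4)-(5) on a computer is sketched below. The representation of M as a set of integer box indices, the function names, and the example region are assumptions of the sketch, not part of the paper.

```python
# A minimal sketch of the mesh in (4)-(5): M is stored as a set of integer box
# indices (l_1,...,l_n); box B_l covers [l_i*delta, (l_i+1)*delta] in coordinate i,
# and slab L_i^k is the subset of boxes whose i-th index equals k.
from itertools import product

def mesh_from_region(in_D, bounds, delta):
    """Collect the boxes whose centers lie in D (one simple way to build M)."""
    lows = [int(b[0] // delta) for b in bounds]
    highs = [int(b[1] // delta) + 1 for b in bounds]
    mesh = set()
    for idx in product(*[range(lo, hi) for lo, hi in zip(lows, highs)]):
        center = tuple((i + 0.5) * delta for i in idx)
        if in_D(center):
            mesh.add(idx)
    return mesh

def slab(mesh, i, k):
    """The slab L_i^k of (5): boxes of M whose i-th index equals k."""
    return {idx for idx in mesh if idx[i] == k}

# Example: a 2-D disk of radius 1, discretized with delta = 0.25.
if __name__ == "__main__":
    in_disk = lambda x: x[0] ** 2 + x[1] ** 2 <= 1.0
    M = mesh_from_region(in_disk, [(-1, 1), (-1, 1)], 0.25)
    print(len(M), "boxes;", len(slab(M, 0, 0)), "boxes in slab L_0^0")
```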

M is connected if the graph G = (V, A) is connected, where the nodes in V correspond to the centers of the boxes B_j and the arcs in A connect all nodes corresponding to adjacent (i.e., facet-sharing) boxes in M. M is strongly connected if M is connected and the slab L_i^k is connected for all i and k. We assume that D is a strongly connected n-dimensional rectangular submesh, and we note that any bounded convex region may be approximated in content arbitrarily well by such a rectangular mesh. The algorithm which follows for generating uniformly distributed points in D is a discrete version of a symmetric mixing algorithm.

Algorithm:
1) Let X be the center of some box in D; set i = 0.
2) Choose a random direction d uniformly distributed over {e_1, e_2, ..., e_n}. Let d = e_j, let Y_\lambda(X) = (X_1, ..., X_{j-1}, X_j + \lambda\delta, X_{j+1}, ..., X_n), and let

\lambda^-(X) = \min\{\lambda \in Z : Y_\lambda(X) \in D\}   and   \lambda^+(X) = \max\{\lambda \in Z : Y_\lambda(X) \in D\},

where Z is the set of all integers. Choose \lambda uniformly distributed on {\lambda^-(X), \lambda^-(X) + 1, ..., -1, 0, 1, ..., \lambda^+(X) - 1, \lambda^+(X)}. Let X = (X_1, ..., X_j + \lambda\delta, ..., X_n) and set i = i + 1.
3) If i = K, an upper bound on the number of iterations, STOP. Otherwise, return to Step 2.
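A minimal sketch of one pass of the algorithm above, using the set-of-indices mesh representation assumed in the previous sketch, might look as follows. The variable names (mesh, K, seed) are illustrative, and the scan treats the admissible set of displacements as a single interval, which is the situation the paper has in mind when D approximates a convex region.

```python
# Illustrative sketch of the discrete symmetric mixing step; not the paper's code.
import random

def mix(mesh, start, K, seed=0):
    """Run K coordinate steps; return the sequence of visited box indices."""
    rng = random.Random(seed)
    n = len(start)
    x = list(start)
    path = [tuple(x)]
    for _ in range(K):
        j = rng.randrange(n)                       # random coordinate direction e_j
        # Scan along direction j for the extreme admissible displacements
        # lambda^-(X) and lambda^+(X), assuming the admissible set is an interval.
        lo = 0
        while tuple(x[:j] + [x[j] + lo - 1] + x[j+1:]) in mesh:
            lo -= 1
        hi = 0
        while tuple(x[:j] + [x[j] + hi + 1] + x[j+1:]) in mesh:
            hi += 1
        x[j] += rng.randint(lo, hi)                # uniform on {lambda^-, ..., lambda^+}
        path.append(tuple(x))
    return path

# Example usage with the disk mesh M from the previous sketch:
# path = mix(M, start=next(iter(M)), K=1000)
```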

As will be shown, this algorithm asymptotically approaches the uniform distribution over M. We have discretized D in terms of M in order to find a confidence bound on the algorithm's rate of convergence to uniformity. We first need the following lemma on box-to-box transitions in D.

Lemma 1: There exists an integer T < \infty such that

(6)    \epsilon = \min_{1 \le i \le N,\ 1 \le j \le N} P(X_T \in B_j \mid X_0 \in B_i) > 0.

Proof: We wish to show that T is an upper bound on the maximum number of transitions necessary in passing from any box to any other box. Let h_i denote the number of distinct slabs of M in direction i. Then h_i is the maximum deviation, in boxes of M, in direction i; h_i < \infty since N, the number of boxes, is finite; and h_i is a side length, in boxes, of the smallest enclosing rectangle about D. We claim that T <= \sum_{i=1}^{n} h_i. To see this, assume that we start at X_0 in some box B_0; we construct a path leading to any other box B_J along which coordinate values are monotone. On the first transition we move, without loss of generality in direction 1, to a box B' whose slab L_1^l lies no farther from B_J than the slab containing B_0. Since D is connected and the slab L_1^l which includes B' is connected, the half-mesh D_1 = D \cap {x : x_1 >= l\delta} is connected. Therefore, we can continue the algorithm on D_1. At the next transition, we can again divide D_1 and continue. We never return to a level previously crossed by the path. Hence, the maximum total number of possible transitions is at most the sum of the number of slabs in each direction: T <= v = \sum_{i=1}^{n} h_i < \infty, where v is defined as the sum of the side lengths of the enclosing rectangle.
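The quantities appearing in this proof are easy to evaluate for a stored mesh. The helper below is an illustrative sketch on the index-set representation assumed earlier; it returns the side lengths h_i and the bound T <= v = \sum_i h_i.

```python
# Illustrative computation of the side lengths h_i of the smallest enclosing
# rectangle and the transition bound T <= v = sum_i h_i from Lemma 1.
def enclosing_side_lengths(mesh):
    n = len(next(iter(mesh)))
    return [max(idx[i] for idx in mesh) - min(idx[i] for idx in mesh) + 1
            for i in range(n)]

def transition_bound(mesh):
    """Upper bound v = sum of side lengths (in boxes) on the number of transitions T."""
    return sum(enclosing_side_lengths(mesh))

# e.g. transition_bound(M) for the disk mesh built earlier.
```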

The worst case possibility for T occurs when the region is a string of boxes, each connected to at most two other boxes in the mesh. For this case, the path from the lowest box (the box with the smallest coordinate level in every direction) to the highest box (the box with the largest coordinate level in every direction) requires a direction change at each step. A string can be viewed more generally as a collection of v - n + 1 strongly connected rectangular regions. In fact, any mesh M can be decomposed into V rectangular regions, where

(7)    M = \bigcup_{j=1}^{V} R_j,

and where R_j has sides of lengths s_1^j, s_2^j, ..., s_n^j. Now, we can reach any point in R_j in n steps, so, for general M, another upper bound on the number of required transitions is T <= nV. This bound may not be as convenient to measure as v, but the decomposition can also give us a minimum probability for passing from any box to any other. For any rectangle R_j, this minimum probability is

(8)    \epsilon_j = \prod_{i=1}^{n} \frac{1}{n\, s_i^j}.

Next let us assume that at every step of the algorithm we cannot remain in the same box. By this assumption, we in general restrict the number of possible directions we may follow. Let n_j be the maximum number of admissible directions in exiting from R_j to any other rectangle R_{j'}.
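For a single rectangle the bound (8) is a one-line computation. The following fragment (illustrative only) evaluates it for given side lengths.

```python
# Illustrative evaluation of (8): the minimum probability of reaching any
# prescribed box of a rectangle with side lengths s = (s_1,...,s_n) in n
# coordinate steps is at least prod_i 1/(n * s_i).
from math import prod

def min_rectangle_probability(s):
    n = len(s)
    return prod(1.0 / (n * s_i) for s_i in s)

# e.g. a 3 x 2 x 4 rectangle in dimension 3:
# min_rectangle_probability([3, 2, 4]) == (1/(3*3)) * (1/(3*2)) * (1/(3*4))
```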

We can now obtain the minimum transition probability \epsilon in M. It is the product of the minimum transition probabilities within each rectangle and the minimum probabilities for exiting from one rectangle to another. That is,

(9)    \epsilon \ge \prod_{j=1}^{V} \frac{1}{n_j\, s_{\min}^{j}} \prod_{i=1}^{n} \frac{1}{\max\{n_j (s_i^{j} - 1),\ 1\}},

where s_{\min}^j = \min_i\{s_i^j\}. The number of boxes in M is N = \sum_{j=1}^{V} \prod_{i=1}^{n} s_i^{j}, where s_i^j >= 1, and we define \gamma = N\epsilon.

Lemma 2: For a mesh M of N boxes with minimum transition probability \epsilon within M,

(10)    \gamma = N\epsilon \ge \frac{v - n + 1}{2^{\,v - n + 1}}.

Proof: We will proceed by induction to show that the string has the lowest possible \gamma. First, we observe that the construction of the string forces a single box at every \delta increment in each direction. Starting from the first box, this leads to v - n additional boxes; hence, for the string, N = v - n + 1. For v - n + 1 even, the string can be decomposed into V = (v - n + 1)/2 rectangles with all sides of length 1 except one side of length 2. The maximum number of possible directions in any of these rectangles is 2; hence

(11)    \gamma = N\epsilon \ge (v - n + 1) \prod_{j=1}^{(v-n+1)/2} \frac{1}{2}\cdot\frac{1}{2} = \frac{v - n + 1}{2^{\,v - n + 1}}.

For v - n + 1 odd, there are (v - n)/2 rectangles of 2 boxes and a single box. In the single box, only one direction is possible, so

(12)    \gamma = N\epsilon \ge (v - n + 1)\cdot 1 \cdot \prod_{j=1}^{(v-n)/2} \frac{1}{2}\cdot\frac{1}{2} = \frac{v - n + 1}{2^{\,v - n}}.

Every mesh of given dimensions h_1, h_2, ..., h_n summing to v must have at least v - n + 1 boxes, since that is the minimum number of boxes that can be traversed in following the algorithm from the lowest box to the highest box. We wish to show that the string has the lowest \gamma over all meshes of v - n + 1 boxes. To do this, we proceed by displacing boxes from the string while preserving the connectedness properties. The number of possible directions in the new mesh will not increase, because the path from the lowest box to the highest box must cross over every other box and, therefore, only two directions are admissible from any box. This observation shows that we may still consider the mesh as the union of one- and two-box rectangles. Hence, \gamma is still greater than or equal to (v - n + 1)/2^{v-n+1}.

Starting from a mesh of v - n + 1 boxes, any mesh enclosed by a rectangle of side lengths h_1, h_2, ..., h_n can be constructed by adding boxes. We proceed by induction on the number of boxes, N_b, in the mesh. Let \epsilon^{N_b} and \gamma^{N_b} = N_b \epsilon^{N_b} be the minimum values of \epsilon and \gamma for a mesh of N_b boxes. We know \gamma^{N_b} \ge (v - n + 1)/2^{v-n+1} for N_b = v - n + 1, and we assume that \gamma^{N_b} \ge (v - n + 1)/2^{v-n+1} for all N_b <= k, for some k >= v - n + 1.

Assume that the (k + 1)st box is added to rectangle R_j of some mesh of k boxes. If the (k + 1)st box is not adjacent to any box on a minimum transition probability path, then \gamma remains the same. If it is adjacent to such a path, then either the number of possible directions from R_j increases from n_j to n_j + 1 or the length of some side i of R_j increases to s_i^j + 1. The new box must also be connected to some other rectangle along a minimum probability path in order to maintain connectedness. Therefore, in proceeding along a path between two boxes most distant from each other, we can either follow one of the previous paths or proceed through the new box. The minimum probability then satisfies either

(13)    \gamma^{k+1} \ge \left(1 + \frac{1}{k}\right)\frac{n_j}{n_j + 1}\,\gamma^{k} \quad or \quad \gamma^{k+1} \ge \left(1 + \frac{1}{k}\right)\frac{s_{\min}^{j} + 1}{s_{\min}^{j} + 2}\,\gamma^{k}.

Since the number of boxes k + 1 > k, we have \gamma^{k+1} \ge \gamma^{k}. Hence, the induction is proved.

Given Lemmas 1 and 2, we have the following theorem.

Theorem: For X_0, X_1, ... generated by the symmetric mixing algorithm over a strongly connected region D = \bigcup_{i=1}^{N} B_i, and with the uniform distribution on D defined by \mu(B_i) = C(B_i)/C(D),

(14)    \left| P(X_m \in B_i \mid X_0 \in B_j) - \mu(B_i) \right| \le (1 - \gamma)^{\lfloor m/T \rfloor},

where T <= v - n + 1 and \gamma \ge (v - n + 1)/2^{v-n+1}.

Proof: Follows from Lemma 1, Lemma 2, and Doob [3], p. 173.

The bound for \gamma found above does not lead to quick convergence to uniformity. The bound in (14) should, however, be taken as the worst case: it bounds the deviation from uniformity within any single box, and uniformity may be achieved with some confidence on a certain proportion of the boxes much sooner. For more computational utility, we may update N, \epsilon and T as we proceed with the algorithm. We begin by assuming that T = v - n + 1 = \sum_{i=1}^{n} h_i - n + 1, \epsilon = 1/2^{v-n+1}, and N = v - n + 1.
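The worst-case statement (14) is easy to evaluate numerically. The sketch below is an illustrative calculator, not part of the paper; it uses the a-priori values T = v - n + 1 and \gamma = (v - n + 1)/2^{v-n+1} assumed above.

```python
# Illustrative calculator for the worst-case uniformity bound (14).
from math import floor

def deviation_bound(m, v, n):
    """Upper bound on |P(X_m in B_i | X_0 in B_j) - mu(B_i)| from (14)."""
    T = v - n + 1
    gamma = T / 2.0 ** T
    return (1.0 - gamma) ** floor(m / T)

# e.g. deviation_bound(m=10**6, v=20, n=3)
```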

\epsilon is updated by considering the maximum movement found by the algorithm in each direction, \lambda_i^{max}. When a new maximum movement is found, we wish to find the largest rectangle enclosing this line segment. Assume that we have found boxes B' and B'' such that the i-th coordinate of the center of B' equals \lambda_i^{max}\delta plus the i-th coordinate of the center of B''. Given B' and B'', we find the boundaries of D in every direction from B' and B''. We take the smallest common intervals in each direction from B' and B'' to find the enclosing rectangle, which is contained in D because of the connectedness property. We repeat this search procedure whenever a new maximum coordinate diameter is found for D. We also restrict the rectangles to be disjoint from any formed previously. In this way, we construct n rectangles, the i-th having side length \lambda_i^{max} in direction i. We can also determine the minimum proportion of boxes within each rectangle from which we can enter another rectangle, together with the minimum probability of entering another rectangle; we call this value p_i, where p_i \ge 1/(n \max_j\{\lambda_j^{max}\}).

The minimum number of transitions is now the sum over the constructed rectangles around the maximum deviation in each direction plus the worst-case string for the remaining portion of the enclosing rectangle. We therefore have

(15)    T \le n^2 + \sum_{i=1}^{n} (h_i - \lambda_i^{max}) + 1.

The minimum transition probability is the product of the transition probabilities within the constructed rectangles and within the remaining string. Hence,

(16)    \epsilon \ge \prod_{i=1}^{n} \frac{p_i}{n\, \lambda_i^{max}} \cdot \left(\frac{1}{2}\right)^{\,v - \sum_{i=1}^{n} \lambda_i^{max} + 1}.

Upper and lower bounds, N_L and N_U, on the number of boxes N may be obtained by keeping a list of the distinct boundary points obtained during the algorithm. For example, suppose X^k = (X_1^k, X_2^k, ..., X_n^k) is the next chosen iteration point, j is the direction chosen at this iteration, and there do not exist previously recorded boundary points w' and w'' with w_i' = w_i'' = X_i^k for all i \ne j. Then there are \lambda^+ - \lambda^- + 1 boxes between the boundary points X^k + \lambda^-\delta e_j and X^k + \lambda^+\delta e_j, where \lambda^- and \lambda^+ are the extreme displacements found in Step 2. We check the list of boundary points to find how many of these boxes have previously been counted and update the lower bound N_L by adding the number of newly found boxes. We can also update an upper bound N_U on N by considering boxes outside of D: using the other boundary points, we determine the number E of box centers on the line through X^k in direction e_j that are not in D and have already been identified, and we update N_U = N_U - h_j + (\lambda^+ - \lambda^- + 1) + E.

The convergence to uniformity accelerates quickly as V, the minimum number of rectangles required to partition D, decreases. We consider the class of meshes with equally dimensioned rectangular subregions. The worst case is the string composed of unit boxes; in this case there are v - n + 1 rectangles of a single box each. The best case is a single rectangle with sides of length v/n. The other elements of this class contain rectangles with sides of length s(v/n), where n/v <= s <= 1, so that the number of rectangles is V = n/s - n + 1. For this class, we can proceed as in the string and reduce the number of rectangles to (n/s - n + 1)/2 rectangles with sides of length s(v/n), except for one side of length 2s(v/n). (We omit the possible cases of non-integer numbers of rectangles for ease of exposition.)

For each class, the minimum number of transitions is

(17)    T \ge 2\left(\frac{n}{s} - n\right) + n,

since one transition is required to locate in the region of the rectangle from which the next rectangle is admissible, one transition is required to enter the next rectangle, and n transitions are required to obtain any box in the destination rectangle. The probability of making the first of these transitions is at least (1/2)(1/n), and the probability of the second transition is also at least 1/(2n). Hence

(18)    \epsilon \ge \prod_{j=1}^{(n/s - n + 1)/2} \frac{1}{4n^2} \left(\frac{1}{sv}\right)^{n},

where (1/(sv))^n is the minimum probability of an n-step transition in the destination rectangle. We also have N = (n/s - n + 1)(sv/n)^n, so

(19)    \gamma = N\epsilon \ge \left(\frac{n}{s} - n + 1\right)\left(\frac{sv}{n}\right)^{n}\left[\frac{1}{4n^2}\left(\frac{1}{sv}\right)^{n}\right]^{(n/s - n + 1)/2}.

We can improve further on these bounds by considering a larger T in the evaluation of \epsilon. We define

(20)    T = \left\lceil \frac{n/s - n + 1}{2} \right\rceil 2n + 2n = \frac{n^2}{s} - n^2 + 3n,

where we allow for n steps within each rectangle. Now, the conditional distribution, given that we do not exit a rectangle, is uniform in each rectangle. In a transition from rectangle R_1 to R_2, we allow n steps for uniformity in R_1 and another n steps for uniformity in R-bar, the rectangle formed by half of R_1 and half of R_2. We condition first on not exiting R_1

and second on not exiting R-bar. A minimum probability for entering R_2 is the product of the probability of staying in R_1, the probability of entering R-bar, the probability of staying in R-bar, and the probability of entering R_2, namely \left(\frac{2n-1}{2n}\right)^{n}\frac{1}{2n}\left(\frac{2n-1}{2n}\right)^{n}\frac{1}{2n}. Hence

(21)    \epsilon \ge \left(\frac{n}{sv}\right)^{n} \prod_{j=1}^{(n/s - n + 1)/2} \left(\frac{1}{2n}\right)^{2}\left(\frac{2n-1}{2n}\right)^{2n},

and

(22)    \gamma = N\epsilon \ge \left(\frac{n}{s} - n + 1\right)\left[\left(\frac{1}{2n}\right)^{2}\left(\frac{2n-1}{2n}\right)^{2n}\right]^{(n/s - n + 1)/2}.

In the string case, this bound is not tight, since we assumed that only one direction was admissible in each box. The bound in (22) also does not strictly apply to the case s = 1, since only one rectangle is present in that case. These examples for the worst case, s = n/v, and the best case, s = 1, do nevertheless provide a range for \gamma. For s = n/v,

\gamma \ge (v - n + 1)\left[\left(\frac{1}{2n}\right)^{2}\left(\frac{2n-1}{2n}\right)^{2n}\right]^{(v - n + 1)/2},

and for s = 1,

\gamma \ge (n + 1)\left[\left(\frac{1}{2n}\right)^{2}\left(\frac{2n-1}{2n}\right)^{2n}\right]^{(n + 1)/2}.
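The bound (22) depends only on n and s, so it can be evaluated directly. The short sketch below is an illustration, not the authors' code, and simply implements the displayed formula; it can be used to reproduce the order of magnitude of the figures in the example that follows.

```python
# Illustrative evaluation of the bound (22) on gamma for the class of meshes with
# equal rectangular subregions of relative side length s (n/v <= s <= 1).
def gamma_bound(n, s):
    """Lower bound (22): gamma >= V * [ (1/(2n))^2 * ((2n-1)/(2n))^(2n) ]^(V/2),
    where V = n/s - n + 1 is the number of rectangles in the decomposition."""
    V = n / s - n + 1
    per_pair = (1.0 / (2 * n)) ** 2 * ((2 * n - 1) / (2 * n)) ** (2 * n)
    return V * per_pair ** (V / 2.0)

# Example with n = 10, v = 100 as in the next paragraph:
# gamma_bound(10, 10 / 100)   # string-like worst case, s = n/v
# gamma_bound(10, 2 / 3)      # roughly 4.3e-9
```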

An example shows how the exponential dependence of \gamma on s leads to much quicker convergence to uniformity as s approaches 1. We let n = 10 and v = 100. We wish to find m such that

(23)    \left| P(X_m \in B_i \mid X_0 \in B_j) - \mu(B_i) \right| \le \alpha\, \mu(B_i).

From (14), we want m such that

(24)    (1 - \gamma)^{\lfloor m/T \rfloor} \le \alpha\, \mu(B_i) = \frac{\alpha}{N},

or

(25)    m \ge T\, \frac{\ln(\alpha/N)}{\ln(1 - \gamma)}.

If we begin with the string case, where s = n/v = 1/10, then T = 91, N = 91 and, from (11), \gamma \ge 91/2^{91}; here m > 2 x 10^{28}. For s = 2/3, T = 80, N is about 10^{9}, \gamma > 4.3 x 10^{-9}, and we find m > 4.4 x 10^{11}. We note that these cases are taken from a class that essentially represents the worst case for regions composed of n/s - n + 1 rectangles, since each rectangle is connected to at most two others; we would expect general regions to have more interconnections.

3. The Use of X_1, X_2, ..., X_k to Estimate S

The next step of the procedure is to investigate each coordinate direction and estimate the content median in that direction, that is, the point m_i such that C[D \cap {x : x_i >= m_i}] = C[D \cap {x : x_i <= m_i}]. We can define a distribution on the slabs in direction i by

(26)    g(L_i^l) = \frac{C(L_i^l)}{C(D)}.

Following the symmetric mixing algorithm can then be viewed as taking a random sample y^1, y^2, ..., y^k from the slabs, where y^j = L_i^l if X^j \in L_i^l. We order {y^j} so that y^{(1)} <= y^{(2)} <= ... <= y^{(k)} and define z_j = P{L_i <= y^{(j)}}. We also let m_{L_i} denote the slab L_i^l such that m_i \in L_i^l. Then

(27)    P\{y^{(j)} \le m_{L_i}\} = P\{z_j \le \tfrac{1}{2}\} = \sum_{q=j}^{k} \binom{k}{q} \left(\frac{1}{2}\right)^{q}\left(\frac{1}{2}\right)^{k-q}.

Hence, we can find a confidence interval on m_i from

(28)    P\{y^{(j)} \le m_{L_i} \le y^{(k-j+1)}\} = 2 \sum_{q=j}^{k} \binom{k}{q} \left(\frac{1}{2}\right)^{k} - 1.

If the two middle order statistics fall in the same slab, say y^{(\lceil k/2 \rceil)} = y^{(\lceil k/2 \rceil + 1)} = L_i^l, we estimate m_i by the center of that slab, \hat{x}_i = (l + 1/2)\delta; if they fall in different slabs, we let \hat{x}_i be the common boundary of the two slabs. We do this for each i and obtain \hat{x} = (\hat{x}_1, \hat{x}_2, ..., \hat{x}_n), which belongs to some box \hat{B} in D, an unbiased estimator of the box which contains the median m = (m_1, m_2, ..., m_n).

Using the planes x_i = \hat{x}_i, we can cut D into 2^n pieces \Omega_1, \Omega_2, ..., \Omega_{2^n} of mean content C(D)/2^n. Each \Omega_j is defined by

(29)    \Omega_j = D \cap H_1^{j_1} \cap H_2^{j_2} \cap ... \cap H_n^{j_n},

where H_i^{j_i} = {x : x_i >= \hat{x}_i} if j_i = 1 and H_i^{j_i} = {x : x_i <= \hat{x}_i} if j_i = 0. Each piece can be viewed as an n-dimensional simplex \sigma^n with a single irregular face. The vertices of this simplex are the apex \hat{x} and the points x^1, ..., x^n, where x^i agrees with \hat{x} in every coordinate except the i-th, which is taken to the boundary of D, that is, x_i^i = \min\{x_i : x \in D and x_j = \hat{x}_j for all j \ne i\} (or the corresponding maximum, depending on the piece). Letting y^1, y^2, ..., y^n be the non-apex vertices for a given piece, the irregular face is the surface corresponding to the (n-1)-dimensional simplex \sigma^{n-1} determined by y^1, ..., y^n. The content of a single piece can be estimated by first choosing M uniformly distributed i.i.d. points in \sigma^{n-1}. For each of these points z_j, a unit vector v_j normal to \sigma^{n-1} is calculated, and we let:

(30)    h(z_j) = \max\{z_j + \lambda v_j : z_j + \lambda v_j \in D\},

the maximum being taken over \lambda, and let

(31)    g(z_j) = \lambda \text{ such that } h(z_j) = z_j + \lambda v_j.

The content of a piece \Omega can then be estimated as

(32)    \hat{C}(\Omega) = C(\sigma^n) + C(\sigma^{n-1}) \sum_{j=1}^{M} \frac{g(z_j)}{M},

as in the Monte-Carlo procedure for estimating content, and where

C(\sigma^n) = \frac{1}{n!}\left|\det\left(y^1 - \hat{x},\ y^2 - \hat{x},\ ...,\ y^n - \hat{x}\right)\right|.

We estimate C(\Omega_j) for a sample of k of the 2^n pieces. For k large, we can apply the Central Limit Theorem to obtain an estimate, \bar{C}(\Omega), of the average content of a piece. An unbiased estimate of the content of D is then

(33)    \hat{C}(D) = 2^n \bar{C}(\Omega).

By letting s^2 = \sum_{j=1}^{k} (\hat{C}(\Omega_j) - \bar{C}(\Omega))^2/(k-1) be an estimate of the variance, we obtain a confidence interval on \mu = C(D)/2^n as

(34)    P\left\{\bar{C}(\Omega) - t_{\alpha/2}\frac{s}{\sqrt{k}} \le \mu \le \bar{C}(\Omega) + t_{\alpha/2}\frac{s}{\sqrt{k}}\right\} = 1 - \alpha,

where t_{\alpha/2} = \{y : P\{T \ge y\} = \alpha/2\} for T having the t-distribution with k - 1 degrees of freedom. We note that if D has a great deal of symmetry, then the sample variance should be small and the error in estimating \bar{C}(\Omega) will be reduced.
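The statistical steps of this section can be sketched in a few lines. The code below is illustrative, not the paper's implementation; it assumes the slab labels of the sample points and the per-piece content estimates are available from the earlier stages, and it assumes SciPy is available for the t quantile in (34).

```python
# Illustrative sketch of two statistical steps above: the order-statistic
# confidence interval (27)-(28) for the content-median slab, and the t-based
# interval (33)-(34) for the mean piece content mu = C(D)/2^n.
from math import comb, sqrt
from statistics import mean, stdev
from scipy.stats import t as student_t   # assumed available for the t quantile

def median_slab_interval(slab_labels, alpha=0.05):
    """Confidence interval for the median slab from sampled slab indices.

    Uses (28): pick the largest j with
    2 * sum_{q=j}^{k} C(k,q) (1/2)^k - 1 >= 1 - alpha,
    and return the order statistics y_(j), y_(k-j+1)."""
    y = sorted(slab_labels)
    k = len(y)
    j_star = 1
    for j in range(1, k // 2 + 1):
        coverage = 2.0 * sum(comb(k, q) for q in range(j, k + 1)) / 2.0 ** k - 1.0
        if coverage >= 1.0 - alpha:
            j_star = j
        else:
            break
    return y[j_star - 1], y[k - j_star]        # (y_(j*), y_(k-j*+1))

def content_interval(piece_contents, alpha=0.05):
    """t confidence interval (34) for the mean piece content mu = C(D)/2^n."""
    k = len(piece_contents)
    c_bar, s = mean(piece_contents), stdev(piece_contents)
    half = student_t.ppf(1.0 - alpha / 2.0, df=k - 1) * s / sqrt(k)
    return c_bar - half, c_bar + half

# Example usage (hypothetical numbers):
# lo, hi = median_slab_interval([3, 5, 4, 4, 6, 5, 4, 3, 5, 4])
# a, b = content_interval([0.12, 0.10, 0.14, 0.11])   # scale by 2**n to bound C(D)
```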

4. Conclusion

We have shown that the integral of a function over any bounded convex region in R^n can be estimated by efficient Monte-Carlo procedures. Our procedure involves approximating the region by a rectangular mesh of arbitrary box size and generating uniformly distributed points within this region. Uniformity is approached by the symmetric mixing algorithm at a faster rate as the fractional content of D within an enclosing rectangle increases. This property is important when D is a polytope, as in D = {x : Ax <= b}, where A is an m x n matrix and b is an m-vector; in most applications A is very sparse and this fraction is large relative to that of arbitrary regions.

Given the uniform distribution of the sample points in D, the procedure uses non-parametric estimation to find the content median in each dimension. These medians are then used to divide D into pieces whose content we can more easily approximate, and the estimates of these piece contents are in turn used to estimate C(D).

Our method is an extension of the "sample-mean" [5], or crude, Monte-Carlo procedure, which has lower variance than the hit-or-miss method. Other variance reduction techniques, such as importance sampling and stratified sampling, require information about the region which we have not assumed. They may, however, be used to refine the procedure above in specific cases.

REFERENCES

[1] Boneh, A. and A. Golan, "Constraint redundancy and feasible point generator," presented at EURO III, Amsterdam, 1979; also in Karwan, M.H., J. Telgen and S. Zionts (eds.), Redundancy in Mathematical Programming, Springer-Verlag, New York-Heidelberg-Berlin, 1981.

[2] Clerk-Maxwell, J., "On approximate multiple integration between limits of summation," Proceedings of the Cambridge Philosophical Society, Vol. 3 (1877), pp. 39-47.

[3] Doob, J.L., Stochastic Processes, John Wiley and Sons, New York, 1953.

[4] Hammer, P.C. and A.H. Stroud, "Numerical evaluation of multiple integrals I," Mathematical Tables and Other Aids to Computation, Vol. 11 (1957), pp. 59-67.

[5] Hammersley, J.M. and D.C. Handscomb, Monte Carlo Methods, John Wiley and Sons, New York, 1964.

[6] Smith, R.L., "Efficient Monte Carlo procedures for generating points uniformly distributed over bounded regions," Technical Report No. 81-1, Department of Industrial and Operations Engineering, The University of Michigan, 1981.

[7] Stroud, A.H., Approximate Calculation of Multiple Integrals, Prentice-Hall, Englewood Cliffs, New Jersey, 1971.
