University of Michigan Business School Research Support
Working Paper #9612-36, December 1996


Uniform Sampling in the Hypersphere via Latent Variables and the Gibbs Sampler

Craig Cumbus, Kimberly-Clark Corporation, Atlanta, GA, USA
Paul Damien, School of Business, University of Michigan, Ann Arbor, MI, USA
Stephen Walker, Department of Mathematics, Imperial College, London, UK

ABSTRACT

A simple and general algorithm for generating random vectors from the unit hypersphere is developed. The algorithm is shown to be efficient when compared to methods of rejection from a circumscribing hypercube, and more viable when the dimension is large. It is seen that such random vectors can be obtained purely via simple transformations of uniform random variables, or by combining such transformations with Gibbs sampling schemes obtained by appropriate introduction of latent variables.

Key words: Gibbs sampler, Unit hypersphere, Inverse transform, Latent variables.

I. Introduction

While there are at least a few efficient ways (Devroye, 1986) to generate random vectors uniformly on the surface of the unit hypersphere of dimension $d$, apparently the only known "efficient" way to generate random vectors uniformly in the solid unit hypersphere, $C_d$, is via rejection from $(-1, 1)^d$. However, this method is terribly inefficient for $d$ larger than, say, 5; for example, the acceptance rate is less than 0.0025 for $d = 10$. Additionally, methods such as the Metropolis-Hastings algorithm are difficult to implement in this situation; for an excellent discussion of the Metropolis-Hastings algorithm and related methods see, for example, Chib and Greenberg (1995). Here we describe a fairly efficient and convenient method, especially for sampling large batches, of generating random vectors from the unit hypersphere. The basic idea is that a random vector in hyperspherical coordinates can be simulated relatively easily from $d$ univariate probability densities, and this random vector can then be transformed into a random vector in rectangular coordinates that is uniformly distributed in $C_d$. It may be of interest to note that uniformity is not lost under a linear transformation (Devroye, 1986), so, in particular, it is possible to transform a sample from $C_d$ into a sample from any hyperellipsoid.

In Section II we elaborate on the efficiency of the rejection method and re-derive the formula for the volume of a hypersphere. In Section III we utilize some of the results of Section II to develop the theory behind the algorithm. Illustrative analyses are given in Section IV, followed by conclusions in Section V.

II. Sampling via Rejection Algorithms

Problems with Rejection Algorithms

If $x$ is uniformly distributed in the hypercube $(-1, 1)^d$, and the $\ell_2$ norm $\|x\|$ of $x$ is less than 1, then $x$ is distributed uniformly in the unit hypersphere, $C_d$. Furthermore, if we denote the volume of $C_d$ by $V(C_d)$, then the acceptance rate, $a(d)$, of the corresponding rejection method is given by (Devroye, 1986)

$$a(d) = \frac{V(C_d)}{2^d} = \frac{\pi^{d/2}}{2^d\,\Gamma(d/2 + 1)}.$$

For example, $a(2) \approx 0.785$, $a(3) \approx 0.524$, $a(4) \approx 0.308$, $a(5) \approx 0.164$, $a(10) \approx 0.0025$, and $a(20) \approx 2.46 \times 10^{-8}$. Alternatively, the mean number of variates, $n(d)$, required to generate one random vector in $C_d$ is given by

$$n(d) = \frac{d}{a(d)} = \frac{d\, 2^d\, \Gamma(d/2 + 1)}{\pi^{d/2}}.$$

For example, on average, about 4015 uniform variates are required to accept one random vector in $C_{10}$.
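For concreteness, here is a minimal NumPy sketch of batched rejection from the circumscribing hypercube; the function name is ours, and this is an illustration, not the authors' SAS/IML implementation.

```python
import math
import numpy as np

def rejection_sample(n, d, rng=None):
    """Draw n points uniformly from C_d by rejection from (-1, 1)^d,
    batch-generating roughly m / a(d) candidates per pass."""
    rng = np.random.default_rng() if rng is None else rng
    a = math.pi ** (d / 2) / (2 ** d * math.gamma(d / 2 + 1))  # acceptance rate a(d)
    out, filled = np.empty((n, d)), 0
    while filled < n:
        m = n - filled
        # one pass usually suffices, but the batch size grows like m / a(d),
        # which is what makes the method unviable for large d
        x = rng.uniform(-1.0, 1.0, size=(int(np.ceil(m / a)) + 1, d))
        keep = x[(x * x).sum(axis=1) < 1.0][:m]
        out[filled:filled + len(keep)] = keep
        filled += len(keep)
    return out
```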

The computational efficiency of this algorithm may be enhanced by using the Cauchy-Schwarz inequality, i.e., $\|x\| \le \sum_i |x_i|$, in a squeezing step (e.g., Devroye, 1986) to avoid some of the more computationally expensive multiplications in favor of additions and sign changes. Unfortunately, the acceptance rate remains unchanged, so this savings is proportionately small for large $d$.

Since the computationally expensive part of this algorithm lies in the rejection step, minimizing the number of rejection steps is critical for large $d$. An obvious way to accomplish this is to accept from large batches of vectors. Ideally, one would batch-generate at least $m/a(d)$ vectors from $(-1, 1)^d$, where $m$ is the number of variates from $C_d$ still needed after the previous rejection step to reach the desired sample size. For example, if a sample of size 5000 were desired from $C_d$, then, on average, $5000/a(d)$ random vectors would be required to retain the desired number after one rejection pass. If, after one pass, 100 vectors were still needed, on average another $100/a(d)$ random vectors would have to be generated before the next pass. Clearly, a limitation of this approach is the availability of computing memory. That being said, rejection methods are very inefficient and become unviable for large $d$.

A Derivation of the Volume of the Unit Hypersphere

The volume of a $d$-dimensional hypersphere can be obtained via repeated application of the Dirichlet integrals (see pages 132-133, Jeffreys, 1985). However, in order to obtain intermediate results for use in deriving the results of the next section, we re-derive the formula for $V(C_d)$ as follows. Suppose $x \in \mathbb{R}^d$ with $\|x\| \le 1$. Define the transformation $[r, \theta] \to x$, where $r \in \mathbb{R}$ and $\theta \in \mathbb{R}^{d-1}$, by

$$\begin{aligned}
x_1 &= r\cos(\theta_1)\prod_{j=2}^{d-1}\sin(\theta_j),\\
x_2 &= r\sin(\theta_1)\prod_{j=2}^{d-1}\sin(\theta_j),\\
x_3 &= r\cos(\theta_2)\prod_{j=3}^{d-1}\sin(\theta_j),\\
&\;\,\vdots\\
x_i &= r\cos(\theta_{i-1})\prod_{j=i}^{d-1}\sin(\theta_j),\\
&\;\,\vdots\\
x_d &= r\cos(\theta_{d-1}).
\end{aligned}$$

Here $\theta_i$ may be thought of as the angle that $x$ makes with the positive $(i+1)$st coordinate axis. Note that this transformation is regular on the set for which $r > 0$, $0 < \theta_1 < 2\pi$, and $0 < \theta_i < \pi$ for $i > 1$.
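By way of illustration, the forward map above can be coded directly. The following NumPy sketch (function name ours, not from the paper) computes $x$ from $(r, \theta_1, \ldots, \theta_{d-1})$.

```python
import numpy as np

def spherical_to_rectangular(r, theta):
    """Map (r, theta_1, ..., theta_{d-1}) to x in R^d, d >= 2, via
    x_i = r * cos(theta_{i-1}) * prod_{j=i}^{d-1} sin(theta_j)."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size + 1
    sines = np.sin(theta)
    # tail[k] = product of sines[k:]; tail[d-1] = 1 is the empty product
    tail = np.ones(d)
    tail[:-1] = np.cumprod(sines[::-1])[::-1]
    x = np.empty(d)
    x[0] = r * np.cos(theta[0]) * tail[1]
    x[1] = r * np.sin(theta[0]) * tail[1]
    for k in range(2, d):                     # x_{k+1} in the paper's indexing
        x[k] = r * np.cos(theta[k - 1]) * tail[k]
    return x
```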

The absolute value of the Jacobian of this transformation is

$$r^{d-1}\prod_{i=2}^{d-1}\sin^{i-1}(\theta_i).$$

Therefore, the volume inside the $d$-dimensional unit hypersphere, $C_d$, can be written as

$$V(C_d) = \int_0^1\!\int_0^{2\pi}\!\int_0^{\pi}\!\cdots\int_0^{\pi} r^{d-1}\prod_{i=2}^{d-1}\sin^{i-1}(\theta_i)\; d\theta_{d-1}\cdots d\theta_2\, d\theta_1\, dr.$$

Integrating (see, e.g., Selby, 1973), and using $\int_0^{\pi}\sin^{i-1}(\theta)\,d\theta = \sqrt{\pi}\,\Gamma(i/2)/\Gamma((i+1)/2)$, the product telescopes:

$$V(C_d) = \frac{2\pi}{d}\prod_{i=2}^{d-1}\int_0^{\pi}\sin^{i-1}(\theta_i)\,d\theta_i = \frac{2\pi}{d}\cdot\frac{\pi^{(d-2)/2}}{\Gamma(d/2)} = \frac{2\pi^{d/2}}{d\,\Gamma(d/2)} = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}.$$
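As a quick sanity check, the formula reproduces the familiar area of the unit disk and volume of the unit ball:

$$V(C_2) = \frac{\pi}{\Gamma(2)} = \pi, \qquad V(C_3) = \frac{\pi^{3/2}}{\Gamma(5/2)} = \frac{\pi^{3/2}}{\frac{3}{4}\sqrt{\pi}} = \frac{4\pi}{3}.$$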

III. Sampling via Transformations and the Gibbs Sampler

In this section, we prove that there exist $d$ univariate densities such that, under the transformation defined in the previous section, a random vector of independent variables from these densities becomes a random vector uniformly distributed in $C_d$. We then proceed to provide viable methods by which one can simulate from these $d$ univariate densities.

Theorem: Let $R, \Theta_1, \ldots, \Theta_{d-1}$ be independently distributed such that they have the following density functions:

$$\begin{aligned}
f_R(r) &= d\,r^{d-1}\, I(0 < r < 1),\\
f_{\Theta_1}(\theta_1) &= \frac{1}{2\pi}\, I(0 < \theta_1 < 2\pi),\\
f_{\Theta_2}(\theta_2) &= \frac{\sin(\theta_2)}{2}\, I(0 < \theta_2 < \pi),\\
&\;\,\vdots\\
f_{\Theta_i}(\theta_i) &= \frac{\sin^{i-1}(\theta_i)}{\int_0^{\pi}\sin^{i-1}(\theta)\,d\theta}\, I(0 < \theta_i < \pi),\\
&\;\,\vdots\\
f_{\Theta_{d-1}}(\theta_{d-1}) &= \frac{\sin^{d-2}(\theta_{d-1})}{\int_0^{\pi}\sin^{d-2}(\theta)\,d\theta}\, I(0 < \theta_{d-1} < \pi).
\end{aligned}$$

Under the transformation defined in the previous section, $X = (X_1, \ldots, X_d)$ is distributed uniformly in $C_d$.

Proof: Note that the inverse transformation is defined by

$$\begin{aligned}
r &= \|x\| = \sqrt{\textstyle\sum_{i=1}^{d} x_i^2},\\
\theta_1 &= \tan^{-1}(x_2/x_1),\\
\theta_2 &= \cos^{-1}\!\Big(x_3\Big/\sqrt{x_1^2 + x_2^2 + x_3^2}\Big),\\
&\;\,\vdots\\
\theta_i &= \cos^{-1}\!\Big(x_{i+1}\Big/\sqrt{\textstyle\sum_{j=1}^{i+1} x_j^2}\Big),\\
&\;\,\vdots\\
\theta_{d-1} &= \cos^{-1}(x_d/\|x\|).
\end{aligned}$$

This is obvious if one thinks of $x_i$ as the projection of $x$ onto the positive $i$th coordinate axis, whence $\cos(\theta_{i-1}) = x_i/\sqrt{x_1^2 + \cdots + x_i^2}$. Because the original transformation is regular on the support under consideration, from the previous section we see that the absolute value of the Jacobian of the inverse transformation is just

$$\Big[r(x)^{d-1}\prod_{i=2}^{d-1}\sin^{i-1}\big(\theta_i(x)\big)\Big]^{-1}.$$

Therefore the joint density function of $X$ is

$$f(x) = d\,r(x)^{d-1}\cdot\frac{1}{2\pi}\prod_{i=2}^{d-1}\frac{\sin^{i-1}(\theta_i(x))}{\int_0^{\pi}\sin^{i-1}(\theta)\,d\theta}\cdot\Big[r(x)^{d-1}\prod_{i=2}^{d-1}\sin^{i-1}\big(\theta_i(x)\big)\Big]^{-1} I(0 < \|x\| < 1) = \frac{d}{2\pi\prod_{i=2}^{d-1}\int_0^{\pi}\sin^{i-1}(\theta)\,d\theta}\; I(0 < \|x\| < 1),$$

so that, by the previous section,

$$f(x) = \frac{1}{V(C_d)}\, I(0 < \|x\| < 1),$$

whence $X$ is uniformly distributed in $C_d$.

Clearly, $\Theta_1 \sim U(0, 2\pi)$, i.e., $\Theta_1$ is uniformly distributed on $(0, 2\pi)$. Note that one can readily obtain and invert the cdf of $R$, whence it follows that, if $U \sim U(0, 1)$, then $U^{1/d} \sim R$. Similarly, if $U \sim U(0, 1)$, then $\cos^{-1}(1 - 2U) \sim \Theta_2$. Thus, for $d \le 3$ a uniform random vector in $C_d$ can be obtained directly from $d$ univariate uniform random variables by means of simple transformations.
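As an illustration of the direct construction, the following NumPy sketch (function name ours) samples uniformly in $C_3$ using exactly three uniforms per vector.

```python
import numpy as np

def sample_ball3(n, rng=None):
    """Uniform sample of n points in C_3 via R = U^(1/3),
    Theta_1 = 2*pi*U, and Theta_2 = arccos(1 - 2U)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=(n, 3))
    r = u[:, 0] ** (1.0 / 3.0)
    t1 = 2.0 * np.pi * u[:, 1]
    t2 = np.arccos(1.0 - 2.0 * u[:, 2])
    # x_1 = r cos(t1) sin(t2), x_2 = r sin(t1) sin(t2), x_3 = r cos(t2)
    return np.column_stack((r * np.cos(t1) * np.sin(t2),
                            r * np.sin(t1) * np.sin(t2),
                            r * np.cos(t2)))
```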

Unfortunately, for $n \ge 2$ it is not possible to invert the distribution function in closed form when $f(\theta) \propto \sin^n(\theta)\, I(0 < \theta < \pi)$. If the desired sample size is small, a viable option is to generate such random variables by, say, rejection from suitable envelope functions (Devroye, 1986), e.g., truncated normals or betas on $(0, \pi)$ with mean $\pi/2$ and suitable variance. Since $\sin^n(\theta)$ is invertible on $(0, \pi/2]$ for all $n$, it is possible to introduce a latent variable (Damien and Walker, 1996) $Y$ such that the conditional densities of the joint density of $Y$ and $\Theta$ are uniform and the marginal density of $\Theta$ is proportional to $\sin^n(\theta)$. Therefore, if the desired sample size merits the cost of convergence, we may sample quite efficiently and conveniently from densities of this form via Gibbs sampling (Smith and Roberts, 1993) as follows. Given $f(\theta) \propto \sin^n(\theta)\, I(0 < \theta < \pi)$, $n \ge 2$, introduce the latent variable $Y$ such that the joint density of $\Theta$ and $Y$ is given by

$$f(\theta, y) \propto I\big(0 < y < \sin^n(\theta)\big)\, I(0 < \theta < \pi).$$

Note that the marginal density of $\Theta$ is the target density, and the conditional densities, given by

$$[Y \mid \Theta = \theta] \sim U\big(0, \sin^n(\theta)\big),$$
$$[\Theta \mid Y = y] \sim U\big(\sin^{-1}(y^{1/n}),\; \pi - \sin^{-1}(y^{1/n})\big),$$

are just uniform densities.
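A minimal NumPy sketch of this two-step Gibbs scheme follows; the function name and starting value are ours, and the default burn-in of 50 is taken from Section IV. Each iteration consumes exactly two uniform variates, matching the accounting used there.

```python
import numpy as np

def gibbs_sin_power(n_power, size, burn_in=50, rng=None):
    """Gibbs sampler for f(theta) proportional to sin^n(theta) on (0, pi),
    alternating the two uniform full conditionals given in the text."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.pi / 2.0                    # start at the mode of the target
    out = np.empty(size)
    for t in range(burn_in + size):
        y = rng.uniform(0.0, np.sin(theta) ** n_power)   # [Y | Theta = theta]
        lo = np.arcsin(y ** (1.0 / n_power))
        theta = rng.uniform(lo, np.pi - lo)              # [Theta | Y = y]
        if t >= burn_in:
            out[t - burn_in] = theta
    return out
```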

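Putting the pieces together, a uniform sample in $C_d$ for general $d$ might be assembled as in the sketch below. It reuses the hypothetical helpers spherical_to_rectangular and gibbs_sin_power defined earlier, and is our own illustration, not the authors' SAS/IML code.

```python
import numpy as np

def sample_hypersphere(n, d, rng=None):
    """Draw n vectors uniformly from C_d (d >= 2): R, Theta_1, Theta_2 by
    inverse-cdf transforms, the remaining angles by the latent-variable
    Gibbs sampler, then map to rectangular coordinates."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform(size=n) ** (1.0 / d)                 # R = U^(1/d)
    angles = [rng.uniform(0.0, 2.0 * np.pi, size=n)]     # Theta_1 ~ U(0, 2*pi)
    if d >= 3:
        angles.append(np.arccos(1.0 - 2.0 * rng.uniform(size=n)))  # Theta_2
    for i in range(3, d):
        # Theta_i has density proportional to sin^(i-1)(theta) on (0, pi)
        angles.append(gibbs_sin_power(i - 1, n, rng=rng))
    return np.array([spherical_to_rectangular(r[k], [a[k] for a in angles])
                     for k in range(n)])
```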
IV. Illustrative Analyses

The algorithm developed in the previous section and a straight rejection algorithm were implemented in SAS/IML and run for various $d$ on an HP 9000/735 workstation. A final sample of 5000 random vectors ($5000d$ random variates) was generated in each case. In the case of the rejection method, the sample matrix was initialized, without loss of generality, as a $5000 \times d$ matrix of uniform variates from $(-1, 1)$, with one-for-one replacement upon rejection. That is, if $k$ of the 5000 vectors were rejected, then $kd$ more variates were generated from $(-1, 1)$, and so on until all 5000 vectors satisfied the acceptance criterion. In the case of the transformations/Gibbs algorithm, 5000 random variates were simulated independently for each component density. Wherever the Gibbs sampler was used, after experimenting with various simulations, convergence diagnostics not reported here showed that a burn-in of 50 would suffice. Note that for $d = 2$ and $d = 3$ this method is one-for-one, hence $5000d$ uniform random variates were required. If $d > 3$, it follows that $15{,}000 + 10{,}100(d - 3)$ uniform random variates were required, since each iteration of the Gibbs sampler requires two random variates. Thus, for large $d$ this algorithm requires approximately $2d$ uniform random variates for each desired random vector. These values are tabulated in Table 1 for various values of $d$ and contrasted with the total expected number (Section II) of variates required to obtain an equivalent sample via the rejection method. The experimentally observed run times were also obtained whenever feasible, and estimated when not.

Table 1: Comparison of run times and numbers of variates for the rejection and transformations/Gibbs methods for various values of d. Run times are in hh:mm:ss.ss unless noted.

                             Straight Rejection            Transformations/Gibbs
        variates          run time    avg. variates       run time       variates
   d    desired                       required                           required
   2    1.0 x 10^4            0.53    1.27 x 10^4            0.21     1.00 x 10^4
   3    1.5 x 10^4            1.33    2.86 x 10^4            0.42     1.50 x 10^4
   4    2.0 x 10^4            2.98    6.58 x 10^4            1.70     2.51 x 10^4
   5    2.5 x 10^4            9.02    1.52 x 10^5            3.04     3.53 x 10^4
   6    3.0 x 10^4           20.55    3.72 x 10^5            5.62     4.53 x 10^4
   7    3.5 x 10^4           44.21    9.48 x 10^5            6.92     5.54 x 10^4
   8    4.0 x 10^4         2:07.57    2.52 x 10^6            8.37     6.55 x 10^4
   9    4.5 x 10^4         6:03.66    6.98 x 10^6            9.86     7.56 x 10^4
  10    5.0 x 10^4        21:22.35    2.01 x 10^7           11.46     8.57 x 10^4
  15    7.5 x 10^4        114 hrs*    6.44 x 10^9           19.23     1.36 x 10^5
  20    1.0 x 10^5      3000 days*    4.06 x 10^12          27.23     1.87 x 10^5
  30    1.5 x 10^5  1.5 x 10^7 yrs*   7.35 x 10^18          44.72     2.88 x 10^5
  50    2.5 x 10^5  3.3 x 10^21 yrs*  1.63 x 10^33        1:18.49     4.90 x 10^5
 100    5.0 x 10^5  5.4 x 10^63 yrs*  2.67 x 10^75        2:53.45     9.95 x 10^5

* Linearly extrapolated from actual run times as a function of the expected number of required variates.

V. Conclusions

In this paper a simple algorithm to generate random variates from a hypersphere was developed. The algorithm was implemented for various $d$, the dimension of the hypersphere. It was shown that the algorithm is more efficient than rejection sampling for any $d$, and especially so for large $d$, for which rejection sampling becomes impractical.

Acknowledgements

The authors wish to thank Adrian Smith, Imperial College, London, for helpful discussions.

References

Chib, S. and Greenberg, E., "Understanding the Metropolis-Hastings algorithm," The American Statistician, 49, 327-335, 1995.

Damien, P. and Walker, S.G., "Sampling probability densities via uniform random variables and a Gibbs sampler," submitted for publication, 1996.

Devroye, L., Non-Uniform Random Variate Generation, Springer, New York, 1986.

Jeffreys, H., Theory of Probability, Oxford University Press, New York, 1985.

Selby, S.M., Standard Mathematical Tables, CRC Press, Cleveland, OH, 1973.

Smith, A.F.M. and Roberts, G.O., "Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods," Journal of the Royal Statistical Society, Series B, 55, 3-24, 1993.