AFFDL-TR-75-134

EXACT AND APPROXIMATE BAYESIAN SOLUTIONS FOR INFERENCE ABOUT VARIANCE COMPONENTS AND MULTIVARIATE INADMISSIBILITY

THE UNIVERSITY OF MICHIGAN
DEPARTMENT OF STATISTICS
ANN ARBOR, MICHIGAN 48109

FEBRUARY 1976

TECHNICAL REPORT AFFDL-TR-75-134
FINAL REPORT FOR PERIOD 1 MAY 1972 - 1 JANUARY 1976

Approved for public release; distribution unlimited

AIR FORCE FLIGHT DYNAMICS LABORATORY
AIR FORCE WRIGHT AERONAUTICAL LABORATORIES
Air Force Systems Command
Wright-Patterson Air Force Base, Ohio 45433

NOTICE

When Government drawings, specifications, or other data are used for any purpose other than in connection with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor any obligation whatsoever; and the fact that the Government may have formulated, furnished, or in any way supplied the said drawings, specifications, or other data, is not to be regarded by implication or otherwise as in any manner licensing the holder or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented invention that may in any way be related thereto.

This report has been reviewed by the Information Office (OI) and is releasable to the National Technical Information Service (NTIS). At NTIS, it will be available to the general public, including foreign nations.

This technical report has been reviewed and is approved for publication.

H. LEON HARTER
Project Engineer
Applied Mathematics Group
Aerospace Dynamics Branch

FOR THE COMMANDER

GERALD G. LEIGH, Lt. Col, USAF
Chief, Structures Division

Copies of this report should not be returned unless return is required by security considerations, contractual obligations, or notice on a specific document.

UNCLASSIFIED

REPORT DOCUMENTATION PAGE

1. Report Number: AFFDL-TR-75-134
4. Title: Exact and Approximate Bayesian Solutions for Inference about Variance Components and Multivariate Inadmissibility
5. Type of Report and Period Covered: Technical - Final, 1 May 1972 - 1 Jan 1976
7. Author: Bruce M. Hill
8. Contract or Grant Number: F33615-72-C-1213
9. Performing Organization: The University of Michigan, Department of Statistics, Ann Arbor, Michigan 48109 (DOD Element 61102F; Work Unit 70710201)
11. Controlling Office: Air Force Flight Dynamics Laboratory, Aerospace Dynamics Branch, FYS, Wright-Patterson AFB, Ohio 45433
12. Report Date: February 1976
13. Number of Pages: 38
15. Security Class: Unclassified
16. Distribution Statement: Approved for public release; distribution unlimited.
19. Key Words: Variance components, Bayesian inference, random model, admissible multivariate estimator.
20. Abstract: Exact and approximate Bayesian solutions for inference about variance components and multivariate inadmissibility are obtained, using a class of prior distributions which allows realistic forms of prior knowledge to be incorporated. Results are based upon new mathematical representations for the Appell and ordinary hypergeometric functions.

DD Form 1473 (1 Jan 73). UNCLASSIFIED

PREFACE

This report, prepared by Professor Bruce M. Hill, is one of two parts of the final report on Contract No. F33615-72-C-1213, which was funded by the Aerospace Research Laboratories and was technically monitored by Dr. David A. Harville (until 30 June 1975) and Dr. H. Leon Harter (since 1 July 1975). The other part, prepared by Professor William A. Ericson and entitled "A Bayesian Approach to Two-Stage Sampling", will appear as AFFDL-TR-75-145.


TABLE OF CONTENTS

SECTION                                                        PAGE
I     INTRODUCTION AND SUMMARY                                    1
II    MODEL AND FORM OF PRIOR AND POSTERIOR DISTRIBUTIONS         3
III   HYPERGEOMETRIC REPRESENTATIONS                              7
IV    $p''(\theta)$ AS MIXTURE OF BETA DISTRIBUTIONS              9
V     INTEGRATED GENERATING FUNCTIONS                            11
VI    THE FUNCTION $\psi(q;s,t)$                                 16
VII   $E(\theta\mid\text{data})$ FOR SMALL $p_1$, $p_2$          19
VIII  BOUNDS FOR $E(\theta\mid\text{data})$                      24
IX    THE ROOTS AND PRIOR PARAMETERS                             26
X     EXAMPLES                                                   29
XI    COMMENTS                                                   31
      BIBLIOGRAPHY                                               33

1. INTRODUCTION AND SUMMARY

In this report exact and approximate Bayesian solutions are derived for the problem of inference about the ratio of variance components in the one-way balanced random model analysis of variance, and for the closely related problem of obtaining admissible Bayes estimators for the mean of a multivariate normal distribution, using a class of prior distributions which allows reasonably realistic forms of prior knowledge to be incorporated.

In the usual balanced one-way random model analysis of variance, with $y_{ij} = \mu + a_i + \varepsilon_{ij}$, $i = 1,\ldots,I$, $j = 1,\ldots,J$ (see Section 2 for notation and assumptions), and with $\tau^2$ the ratio of between to within variance components, it was shown by Hill [1965] that inference about all other parameters in the model is quite simple, given $\tau^2$. However, the posterior distribution of $\tau^2$ is of a complicated form, and the posterior moments of $\tau^2$ had to be evaluated numerically. It was then shown by Lindley and Smith [1972] that, given $\tau^2$, the posterior expectation of $\mu_i = \mu + a_i$ is simply $\theta y_{i\cdot} + (1-\theta) y_{\cdot\cdot}$, and hence the Bayes estimate of $\mu_i$ for squared error loss is

(1.1)  $y_{i\cdot}\, E\{\theta\mid\text{data}\} + y_{\cdot\cdot}\, (1 - E\{\theta\mid\text{data}\}),$

where $\theta = J\tau^2/(1 + J\tau^2)$. See also Hill [1975]. Since the one-way balanced random model can be regarded as consisting of a sample of $J$ independent observations from an $I$-dimensional multivariate normal distribution, where the usual random model distribution of the $a_i$ is incorporated into the prior distribution for the mean vector $\mu = (\mu_1,\ldots,\mu_I)'$, it follows that (1.1) will yield admissible estimators for $\mu$ if the prior distributions employed are not too bizarre. However, just as with $\tau^2$, the posterior distribution of $\theta$ is quite complicated, and the required posterior expectation of $\theta$ had to be calculated either

numerically, or alternatively modal estimates employed, as, for example, by Lindley and Smith [1972]. Thus previous work in this area had stopped short of a full Bayesian analysis due to intrinsic mathematical difficulties. In the present report some new results concerning the Appell hypergeometric function are obtained which lead to both exact and approximate evaluations of the posterior moments of $\tau^2$ and $\theta$, and thus to Bayes estimates for these parameters under various loss functions, as well as for $\mu_i$.

Section 2 sets forth the basic model and form of prior and posterior distributions. In Section 3 the relationship to the Appell hypergeometric function is explored, while Sections 4, 5, and 6 provide probabilistic representations for this function, upon which are based simple formulas for the exact evaluation of the posterior moments of $\tau^2$ and $\theta$. In Section 7 a limit theorem is proved which yields simple approximations to these moments, while in Section 8 upper and lower bounds are derived for the posterior moments. Section 9 then studies the behavior of certain roots of a quadratic equation, which are of central importance in the formulas for the posterior moments. Finally, in Section 10 some examples illustrate the application of the results of this report, and in Section 11 some general comments are made concerning these results.

2. MODEL AND FORM OF PRIOR AND POSTERIOR DISTRIBUTIONS

The model considered in this report is the balanced one-way random model

(2.1)  $y_{ij} = \mu + a_i + \varepsilon_{ij}$, for $i = 1,\ldots,I$, $j = 1,\ldots,J$,

where $a_i \sim N(0,\sigma_a^2)$, $\varepsilon_{ij} \sim N(0,\sigma^2)$, and these random variables are mutually independent. As derived by Hill [1967], the likelihood function for $(\mu,\sigma^2,\sigma_a^2)$ is

$L(\mu,\sigma^2,\sigma_a^2) \propto (\sigma^2)^{-I(J-1)/2} \exp[-SSW/2\sigma^2] \times (\sigma^2 + J\sigma_a^2)^{-I/2} \exp[-SSB/2(\sigma^2 + J\sigma_a^2)] \times \exp[-IJ(y_{\cdot\cdot} - \mu)^2/2(\sigma^2 + J\sigma_a^2)],$

where $-\infty < \mu < \infty$, $\sigma^2 > 0$, $\sigma_a^2 > 0$, and

$y_{i\cdot} = \sum_{j=1}^{J} y_{ij}/J, \qquad y_{\cdot\cdot} = \sum_{i=1}^{I} y_{i\cdot}/I,$

$SSW = \sum_{i,j} (y_{ij} - y_{i\cdot})^2 \qquad\text{and}\qquad SSB = J \sum_{i} (y_{i\cdot} - y_{\cdot\cdot})^2.$

We consider the implications, for Bayesian inference and design, of the family of prior distributions with densities

(2.2)  $p(\mu,\sigma^2,\sigma_a^2) \propto (\sigma^2)^{-\lambda/2 - 1} \exp[-C_0/2\sigma^2] \times (\sigma_a^2)^{-\lambda_a/2 - 1} \exp[-C_a/2\sigma_a^2], \qquad \lambda, C_0, \lambda_a, C_a > 0,$

i.e., with $\sigma^2$ and $\sigma_a^2$ independent a priori, each having an inverted gamma distribution, and with $\mu$ having the Jeffreys improper uniform distribution. This family of prior distributions was introduced and studied by Hill [1965] for inference in the one-way random model, and that study is continued here both for inference and design purposes. It is believed that Bayesian analysis is particularly appropriate in the one-way random model because there is typically substantial prior knowledge about the variance components $\sigma^2$ and $\sigma_a^2$, and that knowledge can be very effectively used either in inference or in design.

The above family of prior distributions is, of course, not the only one of interest. However, the inverted gamma family is rich and flexible, and there is often reason to view the variance components as approximately independent a priori. In regard to $\mu$, our use of the Jeffreys prior distribution is motivated by the fact that neither inference nor design is very sensitive to the choice of prior distribution for $\mu$, and the flat prior may typically be regarded as a reasonable approximation via the stable estimation argument of L. J. Savage. Alternatively, it is easy to verify that use of the Jeffreys prior for $\mu$ is tantamount to basing inference about the variance components solely upon SSW and SSB, ignoring $y_{\cdot\cdot}$, and thus obtaining the posterior density

(2.3)  $p''(\sigma^2,\sigma_a^2) \propto (\sigma^2)^{-[I(J-1)+\lambda]/2 - 1} \exp[-(SSW + C_0)/2\sigma^2] \times (\sigma^2 + J\sigma_a^2)^{-I/2} \exp[-SSB/2(\sigma^2 + J\sigma_a^2)] \times (\sigma_a^2)^{-\lambda_a/2 - 1} \exp[-C_a/2\sigma_a^2].$

We are particularly interested in the implications of (2.3) with respect to the parameter $\tau^2 = \sigma_a^2/\sigma^2$, or equivalently, $\theta = J\tau^2/(1 + J\tau^2)$, since, as shown by Hill [1965] and by Lindley and Smith [1972], conditional upon either of these parameters, the analysis of the model is extremely simple. Thus the complexity and difficulty in the analysis of the one-way random model is really contained in the nature and quantity of information that the data provide about $\tau^2$ or $\theta$. It turns out that $\theta$ is a somewhat more convenient parameter, and it is easy to verify that the posterior density for $\theta$, from (2.3), is

(2.4)  $p''(\theta) \propto \theta^{N_1/2 - 1} (1-\theta)^{N_2/2 - 1} / [Q(\theta)]^{N_3/2}, \qquad 0 < \theta < 1,$

where $N_1 = IJ + \lambda - 1$, $N_2 = I + \lambda_a - 1$, $N_3 = IJ + \lambda + \lambda_a - 1$, and $Q(\theta) = (SSW + C_0)\theta + JC_a(1-\theta) + SSB\,\theta(1-\theta)$. This density was obtained by Culver [1971] in his University of Michigan doctoral dissertation. Much of the remainder of this article is concerned with the exact and approximate mathematical analysis of distributions of the form (2.4).

Since $Q(0) = JC_a > 0$, $Q(1) = SSW + C_0 > 0$, and the coefficient of $\theta^2$ is negative, it follows that the quadratic $Q(\theta)$ always has two real roots, say $x_1$ and $x_2$, with $x_1 > 1$ and $x_2 < 0$. With $q_1 = 1/x_1$, $q_2 = 1/(1 - x_2)$, the posterior density for $\theta$ can be written as

(2.5)  $p''(\theta) \propto \theta^{N_1/2 - 1} (1-\theta)^{N_2/2 - 1} / [(1 - q_1\theta)(1 - q_2(1-\theta))]^{N_3/2}.$

If we define

$f(A,B,C;q_1,q_2) = \int_0^1 \frac{\theta^A (1-\theta)^B}{\{(1 - q_1\theta)(1 - q_2(1-\theta))\}^C}\, d\theta,$

the posterior moments of $\theta$ and $\tau^2$ are then

(2.6)  $E(\theta^k \mid \text{data}) = \frac{f(N_1/2 + k - 1,\; N_2/2 - 1,\; N_3/2;\; q_1, q_2)}{f(N_1/2 - 1,\; N_2/2 - 1,\; N_3/2;\; q_1, q_2)}, \qquad E(\tau^{2k} \mid \text{data}) = J^{-k}\, \frac{f(N_1/2 + k - 1,\; N_2/2 - k - 1,\; N_3/2;\; q_1, q_2)}{f(N_1/2 - 1,\; N_2/2 - 1,\; N_3/2;\; q_1, q_2)},$

for all k such that the integrals are finite. Thus the posterior moments of $\theta$ and $\tau^2$, which are crucial for inference and design purposes, are all expressible in terms of the functions $f(A,B,C;q_1,q_2)$. The study of such integrals is the primary aim of this report. In the next section such integrals are represented in terms of Appell hypergeometric functions, of which the hypergeometric function is a special case.
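As a concrete check on the definitions above, the quantities $q_1$, $q_2$ and the posterior mean of $\theta$ can be computed directly by numerical quadrature. The following Python sketch is not part of the original report: the function names are mine, and quadrature simply stands in for the exact formulas developed in later sections.

```python
import numpy as np
from scipy.integrate import quad

def roots_and_q(SSW, SSB, C0, Ca, J):
    """Roots x1 > 1, x2 < 0 of Q(theta) = (SSW+C0)theta + J*Ca*(1-theta) + SSB*theta*(1-theta),
    returned as q1 = 1/x1 and q2 = 1/(1-x2)."""
    # Q(theta) = 0  <=>  SSB*theta^2 - (SSW + SSB + C0 - J*Ca)*theta - J*Ca = 0
    b = SSW + SSB + C0 - J * Ca
    disc = np.sqrt(b * b + 4.0 * SSB * J * Ca)
    x1 = (b + disc) / (2.0 * SSB)
    x2 = (b - disc) / (2.0 * SSB)
    return 1.0 / x1, 1.0 / (1.0 - x2)

def f(A, B, C, q1, q2):
    """f(A,B,C;q1,q2) of Section 2, by numerical quadrature."""
    g = lambda t: t**A * (1 - t)**B / ((1 - q1 * t) * (1 - q2 * (1 - t)))**C
    return quad(g, 0.0, 1.0)[0]

def posterior_mean_theta(I, J, lam, lam_a, SSW, SSB, C0, Ca):
    """E(theta | data) from (2.6) with k = 1."""
    N1, N2, N3 = I * J + lam - 1, I + lam_a - 1, I * J + lam + lam_a - 1
    q1, q2 = roots_and_q(SSW, SSB, C0, Ca, J)
    A, B, C = N1 / 2 - 1, N2 / 2 - 1, N3 / 2
    return f(A + 1, B, C, q1, q2) / f(A, B, C, q1, q2)
```

For the data of Example 1 in Section 10 (SSW = 6, SSB = 10, $\lambda = C_0 = 0$, $\lambda_a = 2$, $C_a = 4$) this reproduces $p_1 = .2387$ and $p_2 = .4774$.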

3. HYPERGEOMETRIC REPRESENTATIONS

We first observe that $f(A,B,C;q_1,q_2)$ is a generalization of the hypergeometric function

$F(a,b,c;z) = 1 + \frac{ab}{1 \times c}\, z + \frac{a(a+1)b(b+1)}{1 \times 2 \times c \times (c+1)}\, z^2 + \cdots.$

For $c > b > 0$, then [Whittaker and Watson, p. 293],

$F(a,b,c;z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)} \int_0^1 \frac{x^{b-1} (1-x)^{c-b-1}}{\{1 - xz\}^a}\, dx,$

where $\Gamma(\cdot)$ is the usual gamma function. Hence, if for example $q_2 = 0$, then

$f(A,B,C;q_1,q_2) = \frac{\Gamma(A+1)\Gamma(B+1)}{\Gamma(A+B+2)}\, F(C, A+1, A+B+2; q_1);$

while if $q_1 = 0$, then

$f(A,B,C;q_1,q_2) = \frac{\Gamma(A+1)\Gamma(B+1)}{\Gamma(A+B+2)}\, F(C, B+1, A+B+2; q_2).$

Here, as is true in almost all of our applications, it is assumed that $A + 1 > 0$ and $B + 1 > 0$. The condition $q_i = 0$ is equivalent to the root $|x_i| = \infty$, so that when one or more of the roots is sufficiently extreme, the integrals we require can be approximated by multiples of the hypergeometric function. In fact, if both $q_i = 0$, then $f(A,B,C;q_1,q_2) = \Gamma(A+1)\Gamma(B+1)/\Gamma(A+B+2)$, and the posterior distribution of $\theta$ is simply the beta distribution with parameters $N_1/2$ and $N_2/2$. Needless to say this situation rarely

arises in applications, although having one extreme root is quite common, as will be discussed later. In the general case, where neither $q_i$ can be approximated by 0, the function $f(A,B,C;q_1,q_2)$ is a generalization of the hypergeometric function studied by Appell [Bailey, p. 77]. Even in the case where one of the roots is extreme, however, so that we are dealing with the ordinary hypergeometric function, little is known about many of the situations that are pertinent for the analysis of the one-way model. For example, if $x_1$ is very large but $x_2$ is near 0, i.e. $q_1 \approx 0$, $q_2 \approx 1$, we must evaluate $F(C, B+1, A+B+2; q_2)$ for $q_2$ nearly 1. This, however, cannot be approximated by setting $q_2 = 1$, since the parameters of the hypergeometric function here are such that the series is divergent when $q_2 = 1$. In the next sections we proceed to develop new methods for the exact and approximate analysis of $f(A,B,C;q_1,q_2)$.
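The one-extreme-root reduction above is easy to exercise numerically: scipy's `hyp2f1` implements $F(a,b,c;z)$, so the hypergeometric representation of $f(A,B,C;q_1,0)$ can be compared against direct quadrature. A sketch (my own; the function names are not from the report):

```python
from scipy.integrate import quad
from scipy.special import hyp2f1, gamma

def f_quad(A, B, C, q1, q2):
    # Direct quadrature of f(A,B,C;q1,q2) as defined in Section 2.
    g = lambda t: t**A * (1 - t)**B / ((1 - q1 * t) * (1 - q2 * (1 - t)))**C
    return quad(g, 0.0, 1.0)[0]

def f_hyp(A, B, C, q1):
    # f(A,B,C;q1,0) = Gamma(A+1)Gamma(B+1)/Gamma(A+B+2) * F(C, A+1, A+B+2; q1)
    return gamma(A + 1) * gamma(B + 1) / gamma(A + B + 2) \
        * hyp2f1(C, A + 1, A + B + 2, q1)
```

The two evaluations agree to quadrature accuracy for $0 \le q_1 < 1$.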

4. $p''(\theta)$ AS MIXTURE OF BETA DISTRIBUTIONS

For a Bernoulli sequence with probability of success p, let $S_r$ be the number of failures preceding the rth success. Then $\Pr\{S_r = i\} = \binom{r+i-1}{i} p^r (1-p)^i$, $i = 0,1,2,\ldots$, defines the Pascal distribution with parameters p and r. In fact this formula defines a distribution (negative binomial) even when $r > 0$ is not integral, and we shall use the above notation $S_r$ whether or not r is integral.

Expanding the denominator in (2.5) yields

(4.1)  $p''(\theta) \propto \theta^{N_1/2-1}(1-\theta)^{N_2/2-1} \sum_{i=0}^{\infty} \binom{r+i-1}{i} q_1^i \theta^i \sum_{j=0}^{\infty} \binom{r+j-1}{j} q_2^j (1-\theta)^j \propto \theta^{N_1/2-1}(1-\theta)^{N_2/2-1}\, E\{\theta^{S_r^{(1)}} (1-\theta)^{S_r^{(2)}}\},$

for $q_1\theta < 1$ and $q_2(1-\theta) < 1$, where $S_r^{(i)}$ has the negative binomial distribution with parameters $p_i = 1 - q_i$ and $r = N_3/2$, $i = 1, 2$, and $S_r^{(1)}$ is independent of $S_r^{(2)}$. Thus $p''(\theta)$ can be viewed as a mixture of beta distributions. Note that when both $p_i = 1$ the mixture again reduces to a beta distribution with parameters $N_1/2$ and $N_2/2$.

From (4.1) it follows that evaluation of $f(A,B,C;q_1,q_2)$ is equivalent to evaluation of a sum of the form

$\sum_{i=0}^{\infty} \sum_{j=0}^{\infty} \binom{r+i-1}{i} q_1^i \binom{r+j-1}{j} q_2^j\, \frac{\Gamma(A+1+i)\Gamma(B+1+j)}{\Gamma(A+B+2+i+j)},$

and

(4.2)  $E\{\theta \mid \text{data}\} = \frac{E\{\int_0^1 \theta^{N_1/2 + S_r^{(1)}} (1-\theta)^{N_2/2 + S_r^{(2)} - 1}\, d\theta\}}{E\{\int_0^1 \theta^{N_1/2 + S_r^{(1)} - 1} (1-\theta)^{N_2/2 + S_r^{(2)} - 1}\, d\theta\}} = \frac{E\{\Gamma(N_1/2 + 1 + S_r^{(1)})\,\Gamma(N_2/2 + S_r^{(2)})/\Gamma(N_1/2 + N_2/2 + S_r^{(1)} + S_r^{(2)} + 1)\}}{E\{\Gamma(N_1/2 + S_r^{(1)})\,\Gamma(N_2/2 + S_r^{(2)})/\Gamma(N_1/2 + N_2/2 + S_r^{(1)} + S_r^{(2)})\}}.$

Although we have developed direct methods for evaluating such sums and expectations, we shall not pursue them in this report, since the methods of Section 5 based upon integrated generating functions are generally more useful for our purposes. The representation of the posterior distribution as an infinite mixture of beta distributions is, however, of interest in its own right, and aids in developing intuition into the mathematical nature of the problem.
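The mixture representation (4.2) can be exercised by simulation: draw the two independent negative binomial variables and average the beta-function weights. A Monte Carlo sketch of my own construction (numpy's `negative_binomial` counts failures before the r-th success, matching $S_r$); the quadrature version is included only as a reference for comparison.

```python
import numpy as np
from scipy.special import gammaln
from scipy.integrate import quad

def mean_theta_mixture(N1, N2, N3, q1, q2, n=200_000, seed=0):
    """Estimate E(theta | data) from the mixture representation (4.2)."""
    rng = np.random.default_rng(seed)
    r = N3 / 2.0
    S1 = rng.negative_binomial(r, 1.0 - q1, size=n)  # parameters p1 = 1-q1 and r
    S2 = rng.negative_binomial(r, 1.0 - q2, size=n)
    a, b = N1 / 2.0 + S1, N2 / 2.0 + S2
    # log Beta(a,b) = lnGamma(a) + lnGamma(b) - lnGamma(a+b); shift by a common
    # constant before exponentiating so the ratio is computed stably.
    log_den = gammaln(a) + gammaln(b) - gammaln(a + b)
    log_num = gammaln(a + 1) + gammaln(b) - gammaln(a + b + 1)
    m = log_den.max()
    return np.exp(log_num - m).mean() / np.exp(log_den - m).mean()

def mean_theta_quad(N1, N2, N3, q1, q2):
    """Reference value of E(theta | data) by quadrature of (2.5)."""
    g = lambda t, k: t**(N1/2 - 1 + k) * (1 - t)**(N2/2 - 1) \
        / ((1 - q1 * t) * (1 - q2 * (1 - t)))**(N3/2)
    return quad(lambda t: g(t, 1), 0, 1)[0] / quad(lambda t: g(t, 0), 0, 1)[0]
```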

5. INTEGRATED GENERATING FUNCTIONS

A great deal of insight into the nature of $f(A,B,C;q_1,q_2)$ and the equivalent summation is gained by a probabilistic interpretation as an integrated generating function. Let X be any random variable taking on only non-negative integral values. The generating function for X is the function $M(t) = E\{t^X\}$ for $0 \le t \le 1$. Clearly

$\int_0^1 M(t)\, dt = \int_0^1 E\{t^X\}\, dt = E\{(X+1)^{-1}\},$

where the interchanging of the order of expectation and integration is justified by Fubini's Theorem. More generally, for any $a > 0$ and integral $b \ge 0$,

(5.1)  $\int_0^1 t^{a-1} (1-t)^b M(t)\, dt = \Gamma(b+1)\, E\{[(X+a)(X+a+1) \times \cdots \times (X+a+b)]^{-1}\}.$

It will now be shown that $f(A,B,C;q_1,q_2)$ can be represented as such an integrated generating function. If, as in Section 4, $S_r$ has the negative binomial distribution with parameters p and r, then it is easily verified that the generating function of $S_r$ is $(p/(1-qt))^r$. Now let $S_r^{(1)}$ have a negative binomial distribution with parameters $p_1$ and r, and let Z have a negative binomial distribution with parameters $p_2$ and $S_r^{(1)}$, so that Z can be viewed as the number of failures preceding the $S_r^{(1)}$th success in a Bernoulli sequence with probability $p_2$ of success on each trial. By the well-known formula for compound distributions (see Feller [1950, p. 223]), the generating

function for Z is then $\left(\frac{p_1}{1 - q_1 \frac{p_2}{1-q_2 t}}\right)^r$. Now let $H = A + B + 2 - C$. Making the substitution $t = \theta/(p_2 + q_2\theta)$ in the integral defining $f(A,B,C;q_1,q_2)$ then yields

(5.2)  $f(A,B,C;q_1,q_2) = p_1^{-C}\, p_2^{-(B+1)} \int_0^1 t^A (1-t)^B \left(\frac{p_2}{1-q_2 t}\right)^H \left(\frac{p_1}{1 - q_1 \frac{p_2 t}{1-q_2 t}}\right)^C dt.$

By the previous results for $S_r^{(1)}$ and Z, the generating function of $X = Z + S_r^{(1)} + S_H^{(2)}$, with $r = C$ and $S_H^{(2)}$ independent of $(Z, S_r^{(1)})$, is

$M(t) = E\{t^X\} = \left\{\frac{p_2}{1-q_2 t}\right\}^H E\left\{\left(\frac{p_2 t}{1-q_2 t}\right)^{S_r^{(1)}}\right\} = \left\{\frac{p_2}{1-q_2 t}\right\}^H \left(\frac{p_1}{1 - q_1 \frac{p_2 t}{1-q_2 t}}\right)^C.$

From (5.1) and (5.2) it follows that when B is integral,

(5.3)  $f(A,B,C;q_1,q_2) = p_1^{-C}\, p_2^{-(B+1)}\, \Gamma(B+1)\, E\{[(X+A+1) \times \cdots \times (X+A+1+B)]^{-1}\}$

and

$E(\theta \mid \text{data}) = \frac{E\{[(X_1+A+2) \times \cdots \times (X_1+A+2+B)]^{-1}\}}{E\{[(X+A+1) \times \cdots \times (X+A+1+B)]^{-1}\}},$

where $X_1$ is defined like X but with H replaced by H + 1.
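Identity (5.1) is easy to sanity-check for a concrete X, e.g. a negative binomial variable, by truncating the expectation series once the probability mass is negligible. A sketch (my own, not from the report):

```python
import math
from scipy.integrate import quad
from scipy.stats import nbinom

def lhs(a, b, p, r):
    # integral of t^(a-1) (1-t)^b M(t) dt, with M the NB(p, r) generating function
    M = lambda t: (p / (1.0 - (1.0 - p) * t))**r
    return quad(lambda t: t**(a - 1) * (1 - t)**b * M(t), 0.0, 1.0)[0]

def rhs(a, b, p, r, tol=1e-12):
    # Gamma(b+1) * E{ 1 / [(X+a)(X+a+1)...(X+a+b)] }, truncating the series
    # once the pmf is negligible well past the mean r(1-p)/p.
    total, i = 0.0, 0
    while True:
        w = nbinom.pmf(i, r, p)  # P(X = i), X = failures before r-th success
        prod = 1.0
        for k in range(b + 1):
            prod *= (i + a + k)
        total += w / prod
        if w < tol and i > r * (1 - p) / p + 50:
            break
        i += 1
    return math.gamma(b + 1) * total
```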

By symmetry of (2.5), $f(A,B,C;q_1,q_2) = f(B,A,C;q_2,q_1)$, so that if A is an integer, then also

(5.4)  $f(A,B,C;q_1,q_2) = f(B,A,C;q_2,q_1) = p_2^{-C}\, p_1^{-(A+1)}\, \Gamma(A+1)\, E\{[(X^*+B+1) \times \cdots \times (X^*+A+B+1)]^{-1}\},$

and

$E(1-\theta \mid \text{data}) = \frac{E\{[(X_1^*+B+2) \times \cdots \times (X_1^*+A+B+2)]^{-1}\}}{E\{[(X^*+B+1) \times \cdots \times (X^*+A+B+1)]^{-1}\}},$

where $X^* = Z^* + S_r^{(1)*} + S_H^{(2)*}$, and $Z^*$ is defined like Z except that $p_1$ and $p_2$ are interchanged, and $X_1^*$ is defined like $X^*$ but with H replaced by H + 1.

Now let $q = q_1 + q_2 - q_1 q_2$ and $p = 1 - q = p_1 p_2$. From (5.2),

(5.5)  $f(A,B,C;q_1,q_2) = p_2^{A+1-C} \int_0^1 t^A (1-t)^B (1-q_2 t)^{C-H} (1-qt)^{-C}\, dt = p_2^{A+1-C} \sum_{i=0}^{\infty} (-1)^i \binom{C-H}{i} q_2^i \int_0^1 t^{A+i} (1-t)^B (1-qt)^{-C}\, dt,$

where the summation terminates at C-H if C-H is integral. Alternatively, expanding $\left(\frac{1-qt}{1-q_2 t}\right)^{-C} = \left(1 - \frac{t q_1 p_2}{1-q_2 t}\right)^{-C}$ in (5.5) yields

(5.6)  $f(A,B,C;q_1,q_2) = p_2^{A+1-C} \sum_{j=0}^{\infty} \binom{C+j-1}{j} (q_1 p_2)^j \int_0^1 t^{A+j} (1-t)^B (1-q_2 t)^{-(H+j)}\, dt.$

Both of the expansions will prove useful in studying $f(A,B,C;q_1,q_2)$. First suppose that one $q_i$ is 0. As was noted earlier, in this case $f(A,B,C;q_1,q_2)$ reduces to an ordinary hypergeometric function. Suppose

$q_1 = 0$. From (5.2), expansion of $(1-t)^B$ yields

(5.7)  $f(A,B,C;0,q_2) = p_2^{-(B+1)} \sum_{i=0}^{\infty} (-1)^i \binom{B}{i} \int_0^1 t^{A+i} \left(\frac{p_2}{1-q_2 t}\right)^H dt,$

where the summation terminates at B if B is an integer. From (5.1) and the fact that $\{p_2/(1-q_2 t)\}^H$ is the generating function of the negative binomial distribution with parameters $p_2$ and H, we have

(5.8)  $f(A,B,C;0,q_2) = p_2^{-(B+1)} \sum_{i=0}^{\infty} (-1)^i \binom{B}{i}\, E\{(S_H^{(2)} + A + i + 1)^{-1}\},$

which is a representation for the ordinary hypergeometric function in terms of reciprocal moments of a translated negative binomial random variable. Now define $\psi(q;k,r) = \int_0^q x^k (1-x)^{-r-1}\, dx$, where $0 \le q < 1$, $k > -1$, and r is arbitrary. Then if $S_r$ has a negative binomial distribution with parameters p and r,

$E(S_r + k + 1)^{-1} = \int_0^1 t^k \left(\frac{p}{1-qt}\right)^r dt = p^r q^{-(k+1)} \int_0^q x^k (1-x)^{-r}\, dx = p^r q^{-(k+1)}\, \psi(q;k,r-1).$

Applying this result in (5.8) yields

(5.9)  $f(A,B,C;0,q_2) = p_2^{H-B-1}\, q_2^{-(A+1)} \sum_{i=0}^{\infty} (-1)^i \binom{B}{i} q_2^{-i}\, \psi(q_2; A+i, H-1).$

Note also, from (5.7), that, when B is an integer,

(5.10)  $f(A,B,C;0,q_2) = p_2^{-(B+1)} \int_0^1 (1-t)^B t^A \left(\frac{p_2}{1-q_2 t}\right)^H dt = p_2^{H-B-1}\, q_2^{-(A+B+1)} \int_0^{q_2} x^A (q_2 - x)^B (1-x)^{-H}\, dx = q_2^{-(A+B+1)} \sum_{i=0}^{B} (-1)^{B-i} \binom{B}{i} p_2^{H-1-i}\, \psi(q_2; A, H-1-i),$

with

$\sum_{i=0}^{B} (-1)^i \binom{B}{i} q_2^{-i}\, \psi(q_2; A+i, H-1) = (-1)^B (p_2/q_2)^B \sum_{i=0}^{B} (-1)^i \binom{B}{i} p_2^{-i}\, \psi(q_2; A, H-1-i).$

Now return to the general case. From (5.6) and (5.10), if B is an integer,

(5.11)  $f(A,B,C;q_1,q_2) = p_2^{H-B-1} \sum_{j=0}^{\infty} \binom{C+j-1}{j} (q_1 p_2)^j \sum_{i=0}^{B} (-1)^i \binom{B}{i} q_2^{-(A+i+j+1)}\, \psi(q_2; A+i+j, H+j-1) = (-1)^B p_2^{H-1}\, q_2^{-(A+B+1)} \sum_{j=0}^{\infty} \binom{C+j-1}{j} (q_1 p_2/q_2)^j \sum_{i=0}^{B} (-1)^i \binom{B}{i} p_2^{-i}\, \psi(q_2; A+j, H+j-1-i).$

On the other hand, from (5.5) with B and C-H integers, we obtain

(5.12)  $f(A,B,C;q_1,q_2) = p_2^{A+1-C} \sum_{j=0}^{C-H} (-1)^j \binom{C-H}{j} q_2^j \sum_{i=0}^{B} (-1)^i \binom{B}{i} q^{-(A+i+j+1)}\, \psi(q; A+i+j, C-1).$

These formulas reduce $f(A,B,C;q_1,q_2)$ to a (sometimes finite) sum of terms involving the function $\psi(q;s,t)$ for various s and t. In the next section we study properties of the function $\psi(q;s,t)$.

6. THE FUNCTION $\psi(q;s,t)$

The results of the previous section show that the study of $f(A,B,C;q_1,q_2)$ can be reduced to that of the function

$\psi(q;s,t) = \int_0^q y^s (1-y)^{-t-1}\, dy,$

where $s > -1$, $0 \le q < 1$, and t is any real number. When t is negative clearly $\psi(q;s,t)$ is a multiple of the incomplete beta function; while for $t > 0$ we have an interpretation in terms of the expectation of the reciprocal of a translated negative binomial random variable, as given by (5.9). We now derive further properties of this function which will be useful either for insight or for computational purposes.

Expanding $y^s = (1+(y-1))^s$ by the binomial formula yields

(6.1)  $\psi(q;s,t) = \sum_{i=0}^{\infty} (-1)^i \binom{s}{i} \int_0^q (1-y)^{i-t-1}\, dy = \sum_{i=0}^{\infty} (-1)^i \binom{s}{i}\, \frac{1 - p^{i-t}}{i-t},$

where $p = 1-q$, and $(1 - p^{i-t})/(i-t)$ is defined as $\ln(1/p)$ if $i = t$ should occur in the sum. When t is positive it follows from (6.1) that $\psi(q;s,t) \sim p^{-t}/t$ as $q \to 1$, where, here and throughout, the symbol "$\sim$" means that the ratio tends to one. When $t = 0$, $\psi(q;s,t) \sim \ln(1/p)$ as $q \to 1$; while for negative t, $\psi(q;s,t)$ tends to $\Gamma(s+1)\Gamma(-t)/\Gamma(s+1-t)$, as follows immediately from its relationship with the incomplete beta function. We note that the summation in (6.1) is finite if s is

an integer; while, if $w = t-s-1$ is a non-negative integer,

$\psi(q;s,t) = \int_0^{q/p} y^s (1+y)^w\, dy = \sum_{i=0}^{w} \binom{w}{i}\, \frac{[q/p]^{s+i+1}}{s+i+1}$

is again a finite sum. In these cases computation of $\psi(q;s,t)$ is particularly easy.

The function $\psi(q;s,t)$ also satisfies certain recursion formulas which are of use either for computation or for insight. Integration by parts yields

(6.2)  $\psi(q;s,t) = t^{-1}\, [q^s p^{-t} - s\, \psi(q;s-1,t-1)]$, if $t \neq 0$.

Also

(6.3)  $\psi(q;s+1,t) - \psi(q;s,t) = -\psi(q;s,t-1),$

from which

(6.4)  $\psi(q;s,0) = \psi(q;s-[s],0) - \sum_{i=0}^{[s]-1} \frac{q^{s-i}}{s-i},$

where $[s]$ denotes the largest integer less than or equal to s. Other formulas of use are

(6.5)  $\psi(q;s,-1) = \frac{q^{s+1}}{s+1}$, for all $s > -1$,

(6.6)  $\psi(q;s,-\tfrac{1}{2}) = (1+2s)^{-1}\, [2s\, \psi(q;s-1,-\tfrac{1}{2}) - 2 q^s p^{1/2}]$, for $s \neq -\tfrac{1}{2}$, and

(6.7)  $\psi(q;s,t) = \sum_{i=0}^{\infty} \binom{t+i}{i}\, \frac{q^{s+i+1}}{s+i+1}$, for $t > 0$.

Some special values that can be used to start off the recursions are

(6.8)  $\psi(q;0,0) = \ln(1/p), \qquad \psi(q;0,\tfrac{1}{2}) = 2[p^{-1/2} - 1], \qquad \psi(q;-\tfrac{1}{2},-\tfrac{1}{2}) = 2 \arcsin(q^{1/2}), \qquad \psi(q;-\tfrac{1}{2},0) = 2 \sum_{j=0}^{\infty} \frac{q^{j+1/2}}{1+2j} = \ln\!\left[\frac{1+q^{1/2}}{1-q^{1/2}}\right].$

In most applications it will suffice to choose $\lambda$ and $\lambda_a$ so that the requisite s and t are either integral or half of an odd integer, in which case the starting values given by (6.8) are particularly useful.
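The recursion (6.2) and the starting values in (6.8) are straightforward to verify numerically against the defining integral. A sketch (my own, with $\psi$ evaluated by direct quadrature):

```python
import math
from scipy.integrate import quad

def psi(q, s, t):
    """psi(q; s, t) = integral_0^q y^s (1-y)^(-t-1) dy."""
    return quad(lambda y: y**s * (1.0 - y)**(-t - 1.0), 0.0, q)[0]

def psi_rec(q, s, t):
    """Right-hand side of recursion (6.2); requires t != 0."""
    p = 1.0 - q
    return (q**s * p**(-t) - s * psi(q, s - 1, t - 1)) / t
```

The $y^{-1/2}$ singularity at 0 is integrable, so quadrature also handles the half-integer starting values.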

7. $E(\theta\mid\text{data})$ FOR SMALL $p_1$, $p_2$

For applications to the random model the most interesting, and also the most difficult, cases are those in which one or both of the $p_i$ are small. We have already seen in Section 4 that when both $p_i$ are nearly 1, the mixture of beta distributions representing the posterior density of $\theta$ can be approximated by a single beta distribution, since the $S_r^{(i)}$ will then be 0 with high probability. However, when one or both of the $p_i$ are small, one or both of the $S_r^{(i)}$ will be large with high probability, and the mixture cannot be adequately approximated by a single beta distribution.

Since $N_3 > N_1$ and $N_3 > N_2$, it is apparent from (2.5) that the posterior distribution of $\theta$ becomes degenerate at 1 if $p_1 \to 0$ with $p_2$ fixed, and becomes degenerate at 0 if $p_2 \to 0$ with $p_1$ fixed. Since $0 < \theta < 1$, also $E(\theta \mid \text{data})$ goes to 1 or 0, respectively, in these cases. If both $p_i$ tend to 0, however, it is not clear from (2.5) how the posterior distribution of $\theta$ will behave. We now obtain asymptotic expressions for $E(\theta \mid \text{data})$ when one or both of the $p_i$ tend to 0. These expressions lead to simple approximations that will be suitable in most applications to the random model analysis of variance.

Let $D = C - H = 2C - A - B - 2$, and define the functions

(7.1)  $G(A,B,C,D;q_1,q_2) = \int_0^1 t^A (1-t)^B (1-qt)^{-C} (1-q_2 t)^D\, dt.$

From (5.5) and (6.1),

(7.2)  $G(A,B,C,D;q_1,q_2) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (-1)^{i+j} \binom{B}{i} \binom{D}{j} q_2^j\, q^{-(A+i+j+1)}\, \psi(q; A+i+j, C-1) = q^{-(A+1)} \sum_{\ell=0}^{\infty} (-1)^{\ell}\, \frac{1 - p^{\ell+1-C}}{\ell+1-C}\, V(A,B,D,\ell; 1/q, q_2/q),$

where $V(A,B,D,\ell;x,y)$ is formally defined as the series

$\sum_{i=0}^{\infty} \sum_{j=0}^{\infty} (-1)^{i+j} \binom{B}{i} \binom{D}{j} \binom{A+i+j}{\ell}\, x^i y^j,$

with $\ell$ a non-negative integer. It is straightforward, however, to show that in fact

(7.3)  $V(A,B,D,\ell;x,y) = (1-x)^B (1-y)^D \sum_{i=0}^{B} \sum_{j=0}^{D} (-1)^{i+j} \binom{B}{i} \binom{D}{j} \left(\frac{x}{1-x}\right)^i \left(\frac{y}{1-y}\right)^j \binom{A}{\ell-i-j},$

which is a finite sum and always convergent for $x \neq 1$, $y \neq 1$. From (2.6), we have

(7.4)  $E(\theta \mid \text{data}) = p_2\, \frac{G(A+1,B,C,D-1;q_1,q_2)}{G(A,B,C,D;q_1,q_2)} \qquad\text{and}\qquad E(1-\theta \mid \text{data}) = \frac{G(A,B+1,C,D-1;q_1,q_2)}{G(A,B,C,D;q_1,q_2)}.$

The asymptotic behavior of these posterior expectations as one or both $p_i$ tend to 0 is now given by Theorem 1. In the theorem we write $E(\theta)$ for $E(\theta \mid \text{data})$.

Theorem 1. If $p_1^{C-B-2}\, p_2^{-(C-A-1)} \to 0$, then $E(1-\theta) \sim p_1 (B+1)/(C-B-2)$; if $p_2^{C-A-2}\, p_1^{-(C-B-1)} \to 0$, then $E(\theta) \sim p_2 (A+1)/(C-A-2)$.

Proof. First note that by symmetry the assertion regarding $E(\theta)$ follows from that for $E(1-\theta)$, so we consider only the latter. There are a number of special cases to consider in proving the Theorem, depending upon which, if any, of A, B, C, D are integers, and whether one or both $p_i$ tend to 0. However, the proofs for other cases are minor variants of the proof for the case in which A, B and D are integers and both $p_i$ tend to 0. This case will now be proved. From (7.2) and (7.3),

$G(A,B,C,D;q_1,q_2) = (-1)^B p^B (q_1 p_2)^D q^{-(A+1+B+D)} \sum_{\ell=0}^{A+B+D} (-1)^{\ell}\, \frac{1 - p^{\ell+1-C}}{\ell+1-C} \sum_{i=0}^{B} \sum_{j=0}^{D} (-1)^j \binom{B}{i} \binom{D}{j} \binom{A}{\ell-i-j} p^{-i} \left(\frac{q_2}{q_1 p_2}\right)^{j}.$

Now for $\ell \le B$, the dominant term in the sum on the right-hand side is obtained by taking $i = \ell$ and $j = 0$, so that

$\sum_{\ell=0}^{B} (-1)^{\ell}\, \frac{1 - p^{\ell+1-C}}{\ell+1-C} \sum_{i=0}^{B} \sum_{j=0}^{D} (-1)^j \binom{B}{i} \binom{D}{j} \binom{A}{\ell-i-j} p^{-i} \left(\frac{q_2}{q_1 p_2}\right)^{j} \sim \sum_{\ell=0}^{B} (-1)^{\ell} \binom{B}{\ell} (C-\ell-1)^{-1}\, p^{1-C}, \qquad\text{as } p_1, p_2 \to 0.$

On the other hand, for $\ell \ge B + D$, the corresponding dominant term is obtained by taking $i = B$ and $j = D$, and

$\sum_{\ell=B+D}^{A+B+D} (-1)^{\ell}\, \frac{1 - p^{\ell+1-C}}{\ell+1-C} \sum_{i=0}^{B} \sum_{j=0}^{D} (-1)^j \binom{B}{i} \binom{D}{j} \binom{A}{\ell-i-j} p^{-i} \left(\frac{q_2}{q_1 p_2}\right)^{j} \sim \sum_{\ell=B+D}^{A+B+D} (-1)^{\ell+D}\, \frac{1 - p^{\ell+1-C}}{\ell+1-C} \binom{A}{\ell-B-D}\, p^{-B} p_2^{-D}.$

Since $p^{C-1}\, p^{-B} p_2^{-D} = p_1^{C-B-1}\, p_2^{-(C-A-1)}$ goes to 0 by assumption, it follows that the terms for $\ell \le B$ dominate those for $\ell \ge B+D$; and similarly, it can be shown that they dominate the terms with $B < \ell < B+D$. Hence

$G(A,B,C,D;q_1,q_2) \sim (-1)^B p^B (q_1 p_2)^D q^{-(A+1+B+D)}\, p^{1-C} \sum_{\ell=0}^{B} (-1)^{\ell} \binom{B}{\ell} (C-\ell-1)^{-1},$

if $p_1, p_2 \to 0$ in such a way that $p_1^{C-B-1}\, p_2^{-(C-A-1)} \to 0$. Applying the same argument to $G(A,B+1,C,D-1;q_1,q_2)$,

$G(A,B+1,C,D-1;q_1,q_2) \sim (-1)^{B+1} p^{B+1} (q_1 p_2)^{D-1} q^{-(A+1+B+D)}\, p^{1-C} \sum_{\ell=0}^{B+1} (-1)^{\ell} \binom{B+1}{\ell} (C-\ell-1)^{-1}.$

Since

$\sum_{\ell=0}^{B} (-1)^{\ell} \binom{B}{\ell} (C-\ell-1)^{-1} = \int_0^1 (t-1)^B t^{C-B-2}\, dt = (-1)^B\, \Gamma(B+1)\Gamma(C-B-1)/\Gamma(C),$

(7.4) then yields $E(1-\theta) \sim p_1 (B+1)/(C-B-2)$, as claimed.

It may be noted that in applications the case in which $p_1^{C-B-2}\, p_2^{-(C-A-1)} \to 0$ is typically the most relevant, since B is typically much smaller than A. In terms of the parameters of the prior distribution, the asymptotic values in the two cases covered by the Theorem are

$E(\theta) \sim p_2 (IJ+\lambda-1)/(\lambda_a-2), \qquad E(1-\theta) \sim p_1 (I-1+\lambda_a)/[I(J-1)+\lambda-2].$
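Theorem 1 can be illustrated numerically: with, say, $A = 3$, $B = 1$, $C = 5$ (so $C-B-2 = 2$ and $C-A-1 = 1$), take $p_1$ small with $p_1^2/p_2$ small, and compare $E(1-\theta)$ computed by quadrature with the asymptotic value $p_1(B+1)/(C-B-2)$. A sketch (my own choice of test values):

```python
from scipy.integrate import quad

def E_one_minus_theta(A, B, C, q1, q2):
    # E(1-theta | data) = f(A, B+1, C; q1, q2) / f(A, B, C; q1, q2)
    g = lambda t, b: t**A * (1 - t)**b / ((1 - q1 * t) * (1 - q2 * (1 - t)))**C
    num = quad(lambda t: g(t, B + 1), 0.0, 1.0, limit=400)[0]
    den = quad(lambda t: g(t, B), 0.0, 1.0, limit=400)[0]
    return num / den

A, B, C = 3, 1, 5
p1, p2 = 0.01, 0.1                       # p1**(C-B-2) / p2**(C-A-1) = 1e-3
approx = p1 * (B + 1) / (C - B - 2)      # asymptotic value from Theorem 1
exact = E_one_minus_theta(A, B, C, 1.0 - p1, 1.0 - p2)
```

At these values the asymptotic approximation should already be within a modest factor of the exact posterior expectation.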

An approximation to $E(\theta)$ which combines both of these asymptotic values, and is appropriate when one $p_i$ is small and the other large, is

(7.5)  $E(\theta) \approx \frac{A+1+(C-B-2)q_1/p_1}{A+B+2+(C-B-2)q_1/p_1+(C-A-2)q_2/p_2} = \frac{N_1+[I(J-1)+\lambda-2]\,q_1/p_1}{N_1+N_2+[I(J-1)+\lambda-2]\,q_1/p_1+[\lambda_a-2]\,q_2/p_2}.$

This approximation yields the exact value $N_1/(N_1+N_2)$ when $p_1 = p_2 = 1$, and has the same asymptotic values given by Theorem 1 when one of the $p_i$ goes to 0 and the other to 1. Of course (7.5) only applies when $\lambda_a > 2$. In general it yields a good approximation whenever $p_2$ is not extremely small.

Finally, we note that when A, B, C, and D are integers, from (7.2), (7.3), and (7.4) we obtain the exact formula

(7.6)  $E(1-\theta) = (-1)\, \frac{p_1}{q_1} \cdot \frac{\displaystyle\sum_{\ell=0}^{A+B+D} (-1)^{\ell}\, \frac{1-p^{\ell+1-C}}{\ell+1-C} \sum_{i=0}^{B+1} \sum_{j=0}^{D-1} (-1)^j \binom{B+1}{i} \binom{D-1}{j} \binom{A}{\ell-i-j} p^{-i} \delta^j}{\displaystyle\sum_{\ell=0}^{A+B+D} (-1)^{\ell}\, \frac{1-p^{\ell+1-C}}{\ell+1-C} \sum_{i=0}^{B} \sum_{j=0}^{D} (-1)^j \binom{B}{i} \binom{D}{j} \binom{A}{\ell-i-j} p^{-i} \delta^j},$

where $\delta = q_2/(p_2 q_1)$. This formula makes computation very simple in many applications. Using linear interpolation, we can also employ (7.6) for non-integral values of A, B, C, and D.
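Formula (7.5) is cheap to evaluate and easy to compare against direct integration. A sketch (my own; `exact_E_theta` is a quadrature reference, not one of the report's exact formulas):

```python
from scipy.integrate import quad

def approx_E_theta(A, B, C, q1, q2):
    """Approximation (7.5); requires C - A - 2 > 0, i.e. lambda_a > 2."""
    p1, p2 = 1.0 - q1, 1.0 - q2
    u = (C - B - 2) * q1 / p1
    v = (C - A - 2) * q2 / p2
    return (A + 1 + u) / (A + B + 2 + u + v)

def exact_E_theta(A, B, C, q1, q2):
    g = lambda t, a: t**a * (1 - t)**B / ((1 - q1 * t) * (1 - q2 * (1 - t)))**C
    return quad(lambda t: g(t, A + 1), 0, 1)[0] / quad(lambda t: g(t, A), 0, 1)[0]
```

At $q_1 = q_2 = 0$ the approximation is exact and equal to $(A+1)/(A+B+2) = N_1/(N_1+N_2)$; for moderate $q_i$ it tracks the quadrature value closely.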

8. BOUNDS FOR $E(\theta\mid\text{data})$

It is possible to obtain simple and useful bounds for the posterior expectation of $\theta$. These bounds are in fact closely related to the asymptotic values of Theorem 1. Throughout this section we write $E(\theta)$ for $E(\theta\mid\text{data})$.

Consider first the case $p_1 = 1$. From (2.6) and (5.2),

$E(\theta)/p_2 = W(q_2)\, [1 - q_2 W(q_2)]^{-1},$

where

$W(q_2) = \frac{\int_0^1 t^{A+1} (1-t)^B (1-q_2 t)^{-H-1}\, dt}{\int_0^1 t^{A} (1-t)^B (1-q_2 t)^{-H-1}\, dt}.$

Since $H = A+B+2-C = (I-1)/2 > 0$, it is easily seen that $W(q_2)$ is a non-decreasing function of $q_2$, for $0 \le q_2 < 1$. But $W(0) = (A+1)/(A+B+2)$, and if $B-H = C-A-2 > 0$, then $W(1) = (A+1)/(C-1)$. Thus for $p_1 = 1$ and $\lambda_a > 2$,

(8.1)  $p_2(A+1)/(A+B+2) \le E(\theta) \le p_2(A+1)/(C-A-2).$

Since $E(1-\theta) = \frac{f(A,B+1,C;q_1,q_2)}{f(A,B,C;q_1,q_2)} = \frac{f(B+1,A,C;q_2,q_1)}{f(B,A,C;q_2,q_1)}$, the above bounds in terms of $p_2$ can be converted into bounds in terms of $p_1$, yielding, for $p_2 = 1$,

(8.2)  $1 - p_1(B+1)/(C-B-2) \le E(\theta) \le 1 - p_1(B+1)/(A+B+2).$

Clearly $E(\theta)$ is an increasing function of $q_1$, for any $q_2$, and is a decreasing function of $q_2$, for any $q_1$, so we have proved

Theorem 2. For any $q_1$ and $q_2$ satisfying $0 \le q_1 < 1$, $0 \le q_2 < 1$,

$p_2(A+1)/(A+B+2) \le E(\theta) \le 1 - p_1(B+1)/(A+B+2).$

It is worth noting that the upper bound for $E(1-\theta)$ given by (8.2), and the upper bound for $E(\theta)$ given by (8.1), are precisely the asymptotic values given in Theorem 1. Sometimes the universal upper and lower bounds given in Theorem 2 are very nearly equal, in which case $E(\theta)$ is very nearly determined. The case in which the universal upper and lower bounds provide the least information is when both $p_i$ are nearly 0, since the bounds then become 0 and 1. However, in this case the asymptotic values given in Theorem 1 will typically be appropriate.
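The universal bounds of Theorem 2 are easy to exercise numerically over a grid of $(q_1, q_2)$. A sketch (my own):

```python
from itertools import product
from scipy.integrate import quad

def post_mean(A, B, C, q1, q2):
    # E(theta | data) by quadrature of the posterior density (2.5)
    g = lambda t, a: t**a * (1 - t)**B / ((1 - q1 * t) * (1 - q2 * (1 - t)))**C
    return quad(lambda t: g(t, A + 1), 0, 1)[0] / quad(lambda t: g(t, A), 0, 1)[0]

def bounds(A, B, q1, q2):
    # Universal lower and upper bounds from Theorem 2
    p1, p2 = 1.0 - q1, 1.0 - q2
    return p2 * (A + 1) / (A + B + 2), 1.0 - p1 * (B + 1) / (A + B + 2)
```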

9. THE ROOTS AND PRIOR PARAMETERS

According to (2.5), for fixed I, J, $\lambda$, $\lambda_a$, the posterior distribution of $\theta$ is completely determined by the $q_i$, or equivalently, by the roots $x_i$ of the quadratic equation $Q(\theta) = 0$. These roots are functions of the data SSW and SSB, and of the parameters $C_0$ and $C_a$ of the prior distribution, and are explicitly given as

(9.1)  $x_i = \frac{SSW + SSB + C_0 - JC_a \pm \{[SSW + SSB + C_0 - JC_a]^2 + 4\, SSB \cdot JC_a\}^{1/2}}{2\, SSB},$

where always $x_1 > 1$ and $x_2 < 0$. Since $q_1 = 1/x_1$, $q_2 = (1 - x_2)^{-1}$, the interesting and delicate cases in which the $p_i$ are small occur when $x_1$ is nearly 1 or when $x_2$ is nearly 0. Noting that the sum of the roots is $x_1 + x_2 = 1 + (SSW + C_0 - JC_a)/SSB$, and that the product of the roots is $x_1 x_2 = -JC_a/SSB$, we have

(9.2)  $p_2/q_2 = p_1/q_1 - (SSW + C_0 - JC_a)/SSB, \qquad p_2/q_2 = q_1\, JC_a/SSB, \qquad\text{and}\qquad p_1/p_2 = (SSW + C_0)/JC_a.$

From (9.2) it follows that both $p_i$ will be small if and only if both $JC_a/SSB$ and $(SSW + C_0)/SSB$ are small. Since $Q(0) = JC_a$, $Q(1) = SSW + C_0$, and $Q(1/2) = [Q(0) + Q(1)]/2 + SSB/4$, when SSB is small $Q(\theta)$ becomes nearly linear, with $p_1$ nearly 1 if $Q(1) > Q(0)$, and $p_2$ nearly 1 if $Q(0) > Q(1)$. It is important here to note the sensitivity of $p_2$ to the choice of $C_a$. Thus, $p_2 \to 0$ as $C_a \to 0$, and $p_2 \to 1$ as $C_a \to \infty$.

Let us now consider ways in which values of the parameters $\lambda$, $C_0$, $\lambda_a$, and $C_a$, of the prior distribution of the variance components, can be chosen. Clearly all these parameters must be positive in order that the prior distribution be proper. When $\lambda > 2$ the prior expectation of $\sigma^2$ is $C_0/(\lambda - 2)$, when $\lambda > 4$ the prior variance of $\sigma^2$ is $2C_0^2/(\lambda-2)^2(\lambda-4)$, and in any case the mode of the prior distribution of $\sigma^2$ is $C_0/(\lambda + 2)$. The same relationships hold for the prior distribution of $\sigma_a^2$ also, and since $\sigma^2$ and $\sigma_a^2$ are independent a priori, it follows that the prior distribution of $\sigma_a^2/\sigma^2$ is that of $[\lambda C_a/(\lambda_a C_0)]\, F_{\lambda,\lambda_a}$, where $F_{\lambda,\lambda_a}$ denotes a random variable having the (generalized) F distribution with $\lambda$ and $\lambda_a$ degrees of freedom ($\lambda$ and $\lambda_a$ are not necessarily integral). Hence the prior expectation of $\sigma_a^2/\sigma^2$ is $[\lambda C_a/(\lambda_a - 2)C_0]$, and the prior mode is $[C_a(\lambda - 2)/(\lambda_a + 2)C_0]$ if $\lambda > 2$. These relationships can be used to choose values for $\lambda$, $C_0$, $\lambda_a$ and $C_a$ which are in accord with one's prior knowledge about $\sigma^2$ and $\sigma_a^2$.

An important difference between the prior parameters $\lambda$ and $C_0$ for $\sigma^2$, and the parameters $\lambda_a$ and $C_a$ for $\sigma_a^2$, is that ordinarily the posterior distribution of $\theta$ has only a slight dependence upon the choice of $\lambda$ and $C_0$, but is very dependent upon the choice of $\lambda_a$ and $C_a$. Thus there is usually substantial robustness in regard to the choice of $\lambda$ and $C_0$, but not in regard to the choice of $\lambda_a$ and $C_a$. The insensitivity to $\lambda$ and $C_0$ may be seen from the fact that they affect the posterior distribution of $\theta$ only by virtue of the equations $Q(1) = SSW + C_0$, $N_1 = IJ + \lambda - 1$, and $N_3 = IJ + \lambda + \lambda_a - 1$. Since the prior mode of SSW is $I(J-1)C_0/(\lambda + 2)$, it follows that whenever $\lambda$ is small compared to $I(J-1)$, we anticipate that $Q(1) \approx SSW$, $N_1 \approx IJ - 1$, $N_3 \approx IJ + \lambda_a - 1$, and it is very nearly as though $\lambda = C_0 = 0$.
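The relations (9.2) follow from the sum and product of the roots and can be checked directly. A sketch (my own; the numerical inputs are arbitrary illustrative values):

```python
import math

def roots(SSW, SSB, C0, Ca, J):
    # Roots of Q(theta) = (SSW + C0)*theta + J*Ca*(1 - theta) + SSB*theta*(1 - theta)
    b = SSW + SSB + C0 - J * Ca
    d = math.sqrt(b * b + 4.0 * SSB * J * Ca)
    return (b + d) / (2.0 * SSB), (b - d) / (2.0 * SSB)

SSW, SSB, C0, Ca, J = 6.0, 10.0, 0.0, 4.0, 3
x1, x2 = roots(SSW, SSB, C0, Ca, J)
q1, q2 = 1.0 / x1, 1.0 / (1.0 - x2)
p1, p2 = 1.0 - q1, 1.0 - q2
```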

On the other hand, there is a great deal of sensitivity to the choice of λ_a and C_a since, as observed earlier, p_2 and hence also E{θ|data} go to 0 as C_a → 0, while in the equation N_2 = I + λ_a - 1 often λ_a is not small compared to I. In particular, it would be absurd to take λ_a = C_a = 0, since this would imply E{θ|data} = 0. Thus in applications it is necessary to exercise great care in the choice of λ_a and C_a.
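This sensitivity is easy to exhibit numerically: with the data held fixed, shrinking C_a collapses p_2 toward 0, while letting C_a grow drives it toward 1, as noted at the end of Section 9. A brief illustration (my own code, restating the root formula (9.1); the function name is not from the report):

```python
import math

def p2_of_ca(ssw, ssb, c0, ca, j_reps):
    """p2 = 1 - 1/(1 - x2), with x2 the negative root from (9.1),
    viewed as a function of the prior parameter Ca."""
    s = ssw + ssb + c0 - j_reps * ca
    x2 = (s - math.sqrt(4 * j_reps * ca * ssb + s * s)) / (2 * ssb)
    return 1 - 1 / (1 - x2)

# with SSW, SSB, C0 fixed, p2 rises monotonically with Ca
for ca in (0.01, 1.0, 100.0):
    print(ca, round(p2_of_ca(6.0, 10.0, 0.0, ca, 3), 4))
```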

10. EXAMPLES

Some examples illustrating how the previous results can be applied are now given.

Example 1. I = 3, J = 3, λ = C_0 = 0, λ_a = 2, C_a = 4, MSB = 5, and MSW = 1. Then p_1 = .2387, p_2 = .4774, and using (7.6) we easily obtain E(θ|data) = .84. The universal lower and upper bounds given in Theorem 2 are .32 and .92, respectively, while the pertinent asymptotic value is 1 - p_1(B+1)/(C-B-2) = 1 - p_1 = .76. Note, however, that p_1^(C-B-2)·p_2^(-(C-A-1)) = p_1^2/p_2 = .12, which is not particularly small, so that the asymptotic value could not be expected to yield a very good approximation. Formula (7.5) does not apply in this example because λ_a = 2.

Example 2. I = 20, J = 10, λ = C_0 = 0, λ_a = 8, C_a = 10, MSB = 10, and MSW = 1. Then p_1 = .4206, p_2 = .2337, and by numerical integration we obtain E(θ|data) = .91. The universal lower and upper bounds are .21 and .95, respectively. In this example p_1^(C-B-2)·p_2^(-(C-A-1)) = p_1^89/p_2^4 is negligible, so we anticipate that the asymptotic value 1 - p_1(B+1)/(C-B-2) will yield a good approximation, which it does, namely .94. Formula (7.5) does even better, giving .90.

Example 3. I = 5, J = 2, λ = C_0 = 0, λ_a = 8, C_a = 1, MSB = 10, and MSW = 1. Then p_1 = .1069, p_2 = .0427, and E(θ|data) = .08. The universal lower and upper bounds are .02 and .94, respectively. Now p_2^(C-A-2)·p_1^(-(C-B-1)) = p_2^3/p_1^(5/2) = .02 is small, so we anticipate that the asymptotic value

p_2(A+1)/(C-A-2) will yield a good approximation, which it does, namely .06. Formula (7.5) gives .19.
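The p_i quoted in these examples can be reproduced from the mean squares via the roots (9.1), assuming the usual balanced one-way identities SSW = I(J-1)·MSW and SSB = (I-1)·MSB. Those two identities are my assumption (the report's definitions of SSW and SSB appear in earlier sections not shown here), but they do recover the printed decimals for Examples 1 and 3:

```python
import math

def p_values(i_groups, j_reps, c0, ca, msb, msw):
    """p1, p2 from the data and prior parameters, via the roots (9.1)."""
    ssw = i_groups * (j_reps - 1) * msw   # within sum of squares (assumed identity)
    ssb = (i_groups - 1) * msb            # between sum of squares (assumed identity)
    s = ssw + ssb + c0 - j_reps * ca
    disc = math.sqrt(4 * j_reps * ca * ssb + s * s)
    x1, x2 = (s + disc) / (2 * ssb), (s - disc) / (2 * ssb)
    return 1 - 1 / x1, 1 - 1 / (1 - x2)

# Example 1: I=3, J=3, C0=0, Ca=4, MSB=5, MSW=1  ->  p1 = .2387, p2 = .4774
print(p_values(3, 3, 0.0, 4.0, 5.0, 1.0))
# Example 3: I=5, J=2, C0=0, Ca=1, MSB=10, MSW=1 ->  p1 = .1069, p2 = .0427
print(p_values(5, 2, 0.0, 1.0, 10.0, 1.0))
```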

11. COMMENTS

We conclude with some general comments. First, it appears that whenever either p_1^(C-B-2)·p_2^(-(C-A-1)) or p_2^(C-A-2)·p_1^(-(C-B-1)) is very small, then the appropriate asymptotic value as given by Theorem 1 yields a good approximation to E(θ|data). However, except when p_2 is very small, formula (7.5) often does as well as the asymptotic value, and has in addition the virtue of being a good approximation when both p_i are large. (Of course, as both p_i tend to 1, the universal lower and upper bounds both tend to (A+1)/(A+B+2), so that the case of large p_i is generally quite easy to deal with.) Although it is difficult to give a completely general rule as to which approximation to use, ordinarily the asymptotic value 1 - p_1(B+1)/(C-B-2) will be appropriate whenever C-B-2 = [I(J-1) + λ - 2]/2 is much larger than C-A-2 = (λ_a - 2)/2, as is usually the case. Note in this connection that Example 3 was unusual insofar as C-B-2 was 1.5 while C-A-2 was 3, and it was only in this example that 1 - p_1(B+1)/(C-B-2) = .57 did poorly. In general our attitude is that since the exact formulas derived in this article allow E(θ|data) to be calculated with only modest effort, the primary purpose of the approximations is first to provide insight into the nature and behavior of the Bayes estimates, and secondly, to use such insight to aid in the design of the experiment, i.e., the choice of I and J. Such questions are being explored by E. Bangura in his doctoral dissertation at the University of Michigan.

Our second general comment concerns the robustness of Bayesian inference to the choice of λ_a and C_a. Ordinarily one anticipates that

small changes in prior parameters such as λ_a and C_a will have only a slight effect upon Bayesian inference, particularly when there is substantial data. But this is not the case here. For example, taking C_a very small forces E(θ|data) to be nearly 0, while the behavior of the Bayes estimates depends sensitively upon whether λ_a < 2 or λ_a > 2. Indeed, as p_2 → 1 with λ_a < 2, we find an entirely different form of asymptotic behavior for E(θ|data) from that given by Theorem 1. We do not include these results here because λ_a < 2 implies that the prior expectation of σ_a² is infinite, which is hardly realistic. Nonetheless, the lack of robustness here is important in and of itself as a general warning for those of us who take a Bayesian approach.

Our final comment is that the approximation E(θ|data) = 1 - p_1(B+1)/(C-B-2) is closely related to Stein-type estimators (Stein [1966]), so that our results concerning when this approximation is not appropriate may be of some value even for non-Bayesian approaches. This situation arises, in particular, when MSW is substantially larger than MSB, in which case, from the Bayesian viewpoint presented by Hill [1965, 1967, 1975], the data carry negligible information about V, the p_i, and σ_a². This is clarified by the Theorem of Hill [1975, p. 570], where it is shown that essentially only two forms of limiting distribution are possible for extreme data, and that ordinarily as SSW grows large the posterior distribution of the above parameters converges to the prior distribution, whereas when SSB grows large, the stable estimation argument of L. J. Savage can be employed to yield (approximately) the usual least squares estimates.

BIBLIOGRAPHY

Bailey, W. N. (1935). Generalized Hypergeometric Series. Cambridge University Press, London.

Culver, D. H. (1971). A Bayesian Analysis of the Balanced One-way Variance Components Model. Ph.D. dissertation, University of Michigan.

Feller, W. (1950). An Introduction to Probability Theory and Its Applications. Wiley, New York.

Hill, B. M. (1965). Inference about variance components in the one-way model. Journal of the American Statistical Association, 60, 806-825.

Hill, B. M. (1967). Correlated errors in the random model. Journal of the American Statistical Association, 62, 1387-1400.

Hill, B. M. (1975). On coherence, inadmissibility and inference about many parameters in the theory of least squares. In S. E. Fienberg and A. Zellner, eds., Studies in Bayesian Econometrics and Statistics in Honor of L. J. Savage, 555-584. North-Holland Publishing Co., Amsterdam.

Lindley, D. and Smith, A. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society B, 34, 1-41.

Stein, C. (1966). An approach to the recovery of inter-block information in balanced incomplete block designs. In F. N. David, ed., Festschrift for J. Neyman, 351-366. Wiley, New York.

Whittaker, E. T., and Watson, G. N. (1973). A Course of Modern Analysis, Fourth Edition. Cambridge University Press.
