Bureau of Business Research
Graduate School of Business Administration
University of Michigan

BAYESIAN METHODS FOR ANALYZING NONRESPONSES WHEN ESTIMATING BINOMIAL PROPORTIONS

Working Paper No. 66

by

William J. Wrobleski
Associate Professor of Statistics
Graduate School of Business Administration
University of Michigan

FOR DISCUSSION PURPOSES ONLY

None of this material is to be quoted or reproduced without the express permission of the Bureau of Business Research.

October 1972

ABSTRACT

In this paper, Bayesian allocation procedures for handling the problem of nonresponses in surveys with fixed sample size are examined for the purpose of estimating the true proportion to be found in one of two underlying population categories. These Bayesian estimation methods use certain families of conditional, conjugate-type prior density functions. In addition, explicit use is made of an allocation parameter which is associated with assigning the nonresponses observed in the sample to the underlying population categories whose (unknown) category proportions are being estimated.

CONTENTS

Introduction 1
An Illustration 1
Likelihood Function 4
Some Point Estimators for p 5
Standardized and Nonstandardized Beta Densities 8
Family of Natural Conjugate Densities 10
Notion of Conditional-Prior Invariance 12
Joint Posterior Density 14
Other Posterior Densities 15
Marginal Density of the Observed Data 17
Posterior Mean of f(p | m1, m2, mo) 17
Structure of Bayesian Point Estimator 19
Evaluating the Risk of the Bayes Estimator 22
Asymptotic Expansion for σ²(p | m1, m2, mo) 27

Introduction

Frequently in fixed-sample-size market research surveys and other sample survey studies, the sampling data actually observed yield nonresponses such as refusals to answer, not-at-homes, unrevealed preferences, and so on. In most large probability samples taken on a nationwide basis, sample sizes range from 1,000 to 3,000 respondents. For fixed-sample-size surveys within this range, the rate of nonresponse often turns out to be a significant fraction of the original fixed sample size chosen. Serious bias may result when statistical analyses of the observed data are made treating the reduced, random sample size actually observed as the fixed sample size originally chosen and the nonresponses are ignored.

In this paper Bayesian allocation procedures for handling the problem of nonresponses in fixed-sample-size surveys are examined for the purpose of estimating one of two underlying population proportions. These Bayesian estimation methods use certain families of conditional, natural, conjugate-type prior density functions and introduce explicitly an allocation parameter which is associated with assigning the nonresponses observed in the sample to the underlying population categories whose (unknown) category proportions are being estimated.

An illustration

Suppose one is interested in estimating the proportion of households whose annual income is $10,000 or more using data obtained from a simple random sample taken from a given target population, say, the population of all U.S. households. From this population a random sample of n households would be chosen, and each household so selected would be queried about whether or not its annual income was $10,000 or more. Thus prior to sampling the population can be represented by a two-category Bernoulli model in which one parameter represents the true (unknown)

proportion of households in the U.S. population having an annual income of at least $10,000, say, p, while the parameter q = 1 - p denotes the true (unknown) proportion of households having an annual income of less than $10,000. Schematically the following representation is appropriate:

    POPULATION CATEGORIES

    Category K1: Income $10,000 or more      p
    Category K2: Income less than $10,000    q = 1 - p

To simplify the discussion, suppose that in sampling only one new category will occur, say, an "unrevealed" income category. Thus the process of obtaining information about household income through sampling can be expected to yield a certain (random) number of nonresponses, and the sampling process can be schematically represented as follows:

    SAMPLING PROCESS

    Population category K1: Income $10,000 or more      parameter p          data m1
    Population category K2: Income less than $10,000    parameter q = 1 - p  data m2
    Sampling category C:    Unrevealed income           parameter po         data mo

In other words, for a random sample of size n, m1 households indicate they belong to the population category K1 (that is, the households having an annual income of $10,000 or more), and m2 households give category K2 as their income classification (annual income less than $10,000), while mo households do not reveal their annual income classification for one

reason or another (for example, not at home, refusal to answer, do not know, and so on).

There is a simple probability model for describing the observed sample data. Denote by po the probability that a household randomly selected will not reveal its income classification. Besides this sampling parameter, po, and the population parameter, p, one additional parameter is required to specify the sampling model for the observed data. Each household selected in the sample actually belongs to one of the two population categories, K1 or K2. However, mo households selected in the sample for some reason do not reveal this information. Thus, it is natural to associate with each household which belongs to the sampling category called "unrevealed" income the conditional probability that such a household actually belongs to the population category K1, as well as the conditional probability that the given household belongs to the second population category K2. Denote the former conditional probability by λ. Then, if λ is assumed to be constant from household to household, λ is simply the conditional probability that a randomly selected household belongs to the population income category K1, given that the household is classified as a nonresponse and did not reveal its income class when interviewed. Similarly, 1 - λ denotes the conditional probability that such a household actually belongs to the second population income category K2. Thus, for example, if λ were known to be 3/4, then in a sample of 1,000 households selected at random, if 180 did not reveal their income classification, it would be expected that 135 of these 180 households would have annual incomes of $10,000 or more and belong to the population income category K1, while 45 of these households would be expected to have annual incomes below $10,000 and belong to the second population income category K2.
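The allocation arithmetic in this example reduces to the expected counts λ·mo and (1 - λ)·mo; a minimal sketch (the function name is ours, not the paper's):

```python
def allocate_nonresponses(m_o, lam):
    """Expected split of m_o nonresponses between the two population
    categories when the allocation probability lam = P(K1 | nonresponse)
    is taken as known."""
    return lam * m_o, (1.0 - lam) * m_o

# The example above: 180 nonresponses among 1,000 households, lam = 3/4.
to_k1, to_k2 = allocate_nonresponses(180, 0.75)
print(to_k1, to_k2)  # 135.0 45.0
```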

With these definitions in mind for the probabilities p, q, po, and λ, the probability that a household selected at random reveals its income classification as K1 is simply p - λpo, while the probability that a household selected at random reveals its income classification as K2 is q - (1 - λ)po. Schematically, the sampling model may be represented as follows:

    SAMPLING MODEL

    Population category K1: Income $10,000 or more      probability p - λpo          data m1
    Population category K2: Income less than $10,000    probability q - (1 - λ)po    data m2
    Sampling category C:    Unrevealed income           probability po               data mo

This model enables one to recognize explicitly that income patterns different from those among the households revealing their income classification might be associated with households not responding, and that these distinctions can and should be maintained if one were to allocate such nonresponses back to the original population income categories for purposes of statistical analysis of these sampling data. The distinctive feature of the sampling model is the explicit use of the conditional probability parameter λ as an allocation parameter for the nonresponses observed among the sample data.

Likelihood function

In summary, for a random sample of fixed sample size, n, were there to be no unclassified respondents (that is, when po = 0 and mo = 0), the number of respondents in the sample, n1, who are classified in one of the two population categories (for example, households having an annual income of $10,000 or more) and the number in the sample, n2, who are

classified in the other category (for example, households whose income is less than $10,000) are distributed by the well-known binomial probability density

    f(n1, n2 | p) = [n!/(n1! n2!)] p^n1 q^n2,  0 <= n1 <= n,  n = n1 + n2.

On the other hand, when po > 0, it is possible that some respondents in the sample who actually belong to one of the two population categories will remain unclassified (for example, households who will not reveal their income class when interviewed in the sample). The observed sampling data, namely, m1, m2, and mo, are distributed by the probability density

    f(m1, m2, mo | p, po, λ) = [n!/(m1! m2! mo!)] po^mo (p - λpo)^m1 (q - (1 - λ)po)^m2,

where 0 <= λ <= 1; 0 <= po <= 1; λpo <= p <= 1 - (1 - λ)po; q = 1 - p; and n = m1 + m2 + mo. The inequality λpo <= p <= 1 - (1 - λ)po simply expresses analytically the requirement that the two sampling probabilities p - λpo and q - (1 - λ)po be nonnegative.

Some point estimators for p

In general the marginal probability that an individual belongs to one of two mutually exclusive categories, say, category K1, can be written using the law of total probability as

    P(K1) = P(K1 & C) + P(K1 & C') = P(K1|C)P(C) + P(K1|C')P(C'),

where C denotes the event that an individual selected at random from the

given population does not reveal which category he belongs to and C' denotes the complementary event that such a randomly selected individual does reveal his classification. In the notation used for these probabilities,

    p = λpo + P(K1|C')(1 - po),

where p denotes P(K1); λ denotes P(K1|C); po denotes P(C); and 1 - po denotes P(C'). Similarly, for the other population category, say, K2,

    P(K2) = P(K2 & C) + P(K2 & C') = P(K2|C)P(C) + P(K2|C')P(C'),

or, in the notation used for these probabilities, where q = P(K2) and 1 - λ = P(K2|C),

    q = (1 - λ)po + P(K2|C')(1 - po).

From these expressions for p = P(K1) and q = P(K2) one immediately sees that whenever po is small,

    p ≈ P(K1|C')  and  q ≈ P(K2|C'),

and, consequently, even when the sample size n is not necessarily large, p and q can be estimated by the corresponding sample proportions observed among the classified responses, with the nonresponses ignored.

On the other hand, suppose that in a random sample of n = 99 households m1 = 46 revealed an annual income of $10,000 or more, m2 = 21 revealed an annual income of less than $10,000, while mo = 32 did not respond (that is, the households did not reveal their income classification). For these data, nonresponses represent 32 percent of the sample, indicating that po is not small for the population of households sampled. Thus, it would no longer be adequate here to use the above approximations for p and q (which are valid when the proportion of nonresponses is small) and to estimate p and q by the corresponding sample proportions observed

among the 67 households for which income data were obtained. How should p and q be estimated from these data?

In analyzing these data it might be decided to disregard the mo = 32 nonresponses on the assumption that these unclassified responses should be treated differently than the classified responses in order to avoid potentially serious bias, which could occur should the income distribution of unclassified households be significantly different from the income distribution of those households revealing their income classification. Thus the proportion p having an annual household income of $10,000 or more would be estimated by the corresponding sample proportion among those households in the sample who revealed their income classification, namely,

    p̂ = m1/(m1 + m2) = 46/(46 + 21) = 46/67 ≈ 0.69.

Alternately, it might be decided to divide the mo = 32 nonresponses proportionately between the underlying population categories on the assumption that respondents not revealing their actual income class really have an income distribution similar to those stating which income class they belong to among the respondents sampled. Thus, p would be estimated as

    p̂ = (m1 + 0.69mo)/n = (46 + 0.69 × 32)/99 = (46 + 22)/99 = 68/99 ≈ 0.69.

It is of interest to note that the same estimate for p is obtained whether one disregards the nonresponses or whether one allocates them to the basic population categories proportionately to those who indicate which income class they belong to, even though the underlying rationales for these two approaches are completely different.

Still another approach for analyzing these household income data would be to divide the nonresponses on a fifty-fifty basis between the two

income classes being considered, on the assumption that median annual household income is approximately $10,000 and, hence, for any household selected at random there is a fifty-fifty chance that its income will be under $10,000 or $10,000 or more. Under this assumption the proportion p would be estimated as

    p̂ = (m1 + 0.5mo)/n = (46 + 0.5 × 32)/99 = (46 + 16)/99 = 62/99 ≈ 0.63.

Estimates of p can also be obtained when certain exact a priori knowledge about the probabilities p, po, and/or λ is available. For example, if λ were known with certainty on an a priori basis, then the maximum likelihood estimator for the unknown proportion p would be (m1 + λmo)/n. For these data, this estimator as a function of λ takes the linear form 0.46 + 0.32λ, giving a minimum estimate of 0.46 if λ were 0 and a maximum estimate of 0.78 if λ were 1. This is obvious since λ = 0 suggests allocating none of the mo unclassified responses back to the $10,000-or-more income class, whereas λ = 1 suggests allocating all of the mo responses back to this income class.

In most surveys of this type certainly some knowledge of the probabilities p, po, and λ, especially in the form of a priori judgments expressed as prior probability densities, would be expected. Thus a Bayesian analysis of such data will be presented in this paper for the case when prior probability densities are selected from the family of natural conjugate densities.

Standardized and nonstandardized beta densities

The family of natural conjugate densities to be used in this Bayesian analysis includes products of certain standardized and nonstandardized beta densities. For convenience these densities and some of their moments are presented here.
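The competing point estimates discussed above can be reproduced directly from the example counts; a minimal sketch (variable names are ours):

```python
m1, m2, m_o = 46, 21, 32          # the example data
n = m1 + m2 + m_o                 # 99

# Disregard the nonresponses (equivalently, allocate them proportionately):
p_discard = m1 / (m1 + m2)        # 46/67, about 0.69

# Allocate the nonresponses fifty-fifty:
p_half = (m1 + 0.5 * m_o) / n     # 62/99, about 0.63

# Maximum-likelihood estimate when lam is known a priori:
def p_mle(lam):
    return (m1 + lam * m_o) / n   # linear in lam: about 0.46 + 0.32*lam
```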

Standardized Beta Density on the Interval (0, 1) with Parameters s > r > 0:

    g(x | r, s) = x^(r-1) (1 - x)^(s-r-1) / B(r, s),  0 < x < 1,  s > r > 0,

where

    B(r, s) = ∫_0^1 x^(r-1) (1 - x)^(s-r-1) dx = Γ(r)Γ(s - r)/Γ(s).

In turn, the mean, mode, variance, and coefficient of variation for a random variable having a standardized beta density are:

    Mean of X:      E(x | r, s) = r/s
    Mode of X:      M(x | r, s) = (r - 1)/(s - 2)
    Variance of X:  σ²(x | r, s) = r(s - r)/[s²(s + 1)]
    Coefficient of Variation of X:  CV(x | r, s) = σ(x | r, s)/E(x | r, s) = √[(s - r)/(r(s + 1))]

Nonstandardized Beta Density on the Interval (a, b), b > a >= 0, with Parameters s > r > 0:

    g(x | a, b, r, s) = (x - a)^(r-1) (b - x)^(s-r-1) / [(b - a)^(s-1) B(r, s)],  a < x < b,  s > r > 0.

The mean and variance of a random variable having a nonstandardized beta density can be shown to be, respectively,

    E(x | a, b, r, s) = [br + a(s - r)]/s

and

    σ²(x | a, b, r, s) = (b - a)² σ²(x | r, s),

where σ²(x | r, s) is given above.
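These moment formulas use the paper's (r, s) parameterization (the usual Beta(α, β) with α = r and β = s - r). They can be checked numerically; the following sketch compares the closed-form mean and variance of the nonstandardized density against a midpoint-rule integration, with illustrative parameter values of our own choosing:

```python
from math import gamma

def nonstd_beta_pdf(x, a, b, r, s):
    """Nonstandardized beta density on (a, b) in the paper's (r, s) form."""
    B = gamma(r) * gamma(s - r) / gamma(s)
    return (x - a)**(r - 1) * (b - x)**(s - r - 1) / ((b - a)**(s - 1) * B)

def nonstd_beta_moments(a, b, r, s):
    """Closed-form mean and variance given in the text."""
    mean = (b * r + a * (s - r)) / s
    var = (b - a)**2 * r * (s - r) / (s**2 * (s + 1))
    return mean, var

# Midpoint-rule check on an illustrative interval and parameter choice.
a, b, r, s = 0.2, 0.8, 2, 5
N = 50_000
grid = [a + (b - a) * (i + 0.5) / N for i in range(N)]
w = (b - a) / N
m_num = sum(x * nonstd_beta_pdf(x, a, b, r, s) for x in grid) * w
v_num = sum((x - m_num)**2 * nonstd_beta_pdf(x, a, b, r, s) for x in grid) * w
```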

Family of natural conjugate densities

As previously noted, the observed sampling data, namely, m1, m2, and mo, are distributed by the probability density

    f(m1, m2, mo | p, po, λ) = [n!/(m1! m2! mo!)] po^mo (p - λpo)^m1 (q - (1 - λ)po)^m2,

where 0 <= λ <= 1; 0 <= po <= 1; q = 1 - p; and λpo <= p <= 1 - (1 - λ)po, with n = m1 + m2 + mo. Upon inspecting the above likelihood function, it is immediately evident that its kernel and residual are simply the functions

    L(p, po, λ | m1, m2, mo) = K(p, po, λ | m1, m2, mo) R(m1, m2, mo),

where

    K(p, po, λ | m1, m2, mo) = po^mo (p - λpo)^m1 (q - (1 - λ)po)^m2

and

    R(m1, m2, mo) = n!/(m1! m2! mo!).

The kernel function can be expressed as products of standardized and nonstandardized beta densities as follows:

    K(p, po, λ | m1, m2, mo) = po^mo (p - λpo)^m1 (q - (1 - λ)po)^m2
                             = po^mo (1 - po)^(m1+m2) [(p - a)^m1 (b - p)^m2 / (1 - po)^(m1+m2)],

where a = λpo, b = 1 - (1 - λ)po, and a < p < b.
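The likelihood just factored can be written down directly; a minimal sketch (function name ours), returning zero outside the admissible region for p:

```python
from math import factorial

def likelihood(m1, m2, m_o, p, p_o, lam):
    """Sampling density f(m1, m2, m_o | p, p_o, lam) from the text."""
    q = 1.0 - p
    if not (lam * p_o <= p <= 1.0 - (1.0 - lam) * p_o):
        return 0.0   # outside the admissible region the density is zero
    n = m1 + m2 + m_o
    coef = factorial(n) // (factorial(m1) * factorial(m2) * factorial(m_o))
    return coef * p_o**m_o * (p - lam * p_o)**m1 * (q - (1.0 - lam) * p_o)**m2
```

Summing over all outcomes with n fixed returns 1, since the three cell probabilities p - λpo, q - (1 - λ)po, and po add to 1.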

Consider in turn the following beta density functions, namely,

    g(p | a, b, m1 + 1, m1 + m2 + 2) = (p - a)^m1 (b - p)^m2 / [(1 - po)^(m1+m2+1) B(m1 + 1, m1 + m2 + 2)]

for a < p < b, where a = λpo and b = 1 - (1 - λ)po;

    g(po | mo + 1, n + 3) = po^mo (1 - po)^(m1+m2+1) / B(mo + 1, n + 3),  0 < po < 1;

and g(λ | 1, 2) = 1 for 0 < λ < 1, i.e., the uniform density on (0, 1). The kernel function is seen to be proportional to the product of these three beta densities, namely,

    K(p, po, λ | m1, m2, mo) ∝ g(p, po, λ | m1, m2, mo),

where

    g(p, po, λ | m1, m2, mo) = g(p | a, b, m1 + 1, m1 + m2 + 2) g(po | mo + 1, n + 3) g(λ | 1, 2).

This suggests the following three families of beta priors.

Family I: Nonstandardized Beta Priors for p,

    f(p | po, λ, r, s) = (p - a)^(r-1) (b - p)^(s-r-1) / [(1 - po)^(s-1) B(r, s)],

where a = λpo, b = 1 - (1 - λ)po, a < p < b, s > r > 0.

Family II: Standardized Beta Priors for po,

    f(po | t, u) = po^(t-1) (1 - po)^(u-t-1) / B(t, u),  0 < po < 1,  u > t > 0.

Family III: Standardized Beta Priors for λ,

    f(λ | v, w) = λ^(v-1) (1 - λ)^(w-v-1) / B(v, w),  0 < λ < 1,  w > v > 0.

Since the kernel function is proportional to a density formed by multiplying together certain beta densities from these families of beta densities, the general form for a natural conjugate density is obtained as the density function

    f(p, po, λ | r, s, t, u, v, w) = f(p | po, λ, r, s) f(po | t, u) f(λ | v, w),

where f(p | po, λ, r, s), f(po | t, u), and f(λ | v, w) are chosen, respectively, from among the beta probability densities belonging to Families I, II, and III.

Notion of conditional-prior invariance

Natural conjugate prior densities for p, po, and λ are products of densities in which the prior densities for po and λ are independent standardized beta densities; the conditional prior density for p, given po and λ, is a nonstandardized beta density defined on the interval a < p < b, where a = λpo and b = 1 - (1 - λ)po. This inequality for p corresponds to the obvious probability statements that

    P(K1 & C) <= P(K1) <= P(K1) + P(K2 & C').

Since, however, P(K1) + P(K2) = 1, while P(K2) = P(K2 & C) + P(K2 & C'), this inequality may be written

    P(K1 & C) <= P(K1) <= 1 - P(K2 & C)

or, equivalently,

    P(K1|C)P(C) <= P(K1) <= 1 - P(K2|C)P(C),

which is the inequality a <= p <= b, where a = λpo and b = 1 - (1 - λ)po, when expressed in terms of the alternate notation used for these probabilities. Thus, since the range of p depends on po and λ, any prior probability density for p in general must be a conditional probability density. In addition, however, for a natural conjugate prior density,

the only dependence on po and λ exhibited by any conditional natural conjugate prior density for p is the analytical dependency which corresponds to the general condition that the range for p is restricted by the above probability inequalities, which involve the probabilities po and λ.

The use of a natural conjugate prior density for p, po, and λ therefore greatly simplifies the prior assessments of these probabilities in a given application. One reason for this simplification is that judgments about po and λ need only be made so that the corresponding personalistic probability densities which are used to represent these prior beliefs about po and λ are assessed so they are independent (in the probability sense) of one another. A second reason for this simplification is that whenever a natural conjugate prior density is used, an assessment of one's prior judgments about p is made conditionally for the given values of po and λ according to the principle of "conditional-prior invariance." In other words, no matter what values of po and λ are chosen for the purpose of assessing a conditional natural conjugate beta prior density for p, the values of r and s which determine such a nonstandardized beta density must be independent of the particular values of po and λ being used, and, therefore, the shape of such conditional beta natural conjugate priors for p must be invariant of po and λ, except insofar as the range of p depends on these two probabilities. Thus the assessment of a conditional natural conjugate prior density for p can be carried out under the assumption that po = 0. Consequently, when using a natural conjugate prior for p, po, and λ, the prior assessment of the population proportion p is in effect separated and made independently from the assessments of the proportions po and λ, which are probabilities associated with the particular sampling process.

Joint posterior density

Using a natural conjugate density as a prior density for p, po, and λ, the posterior density of p, po, and λ, given the observed data m1, m2, and mo, can be obtained. The posterior density, of course, is related to the likelihood and the selected prior density by Bayes' theorem, namely,

    f(p, po, λ | m1, m2, mo) = f(m1, m2, mo | p, po, λ) f(p, po, λ) / f(m1, m2, mo),

and, therefore,

    f(p, po, λ | m1, m2, mo) ∝ f(m1, m2, mo | p, po, λ) f(p, po, λ).

Using a natural conjugate density for f(p, po, λ),

    f(m1, m2, mo | p, po, λ) f(p, po, λ)
        ∝ po^(mo+t-1) (1 - po)^(m1+m2+u-t-1) (p - a)^(m1+r-1) (b - p)^(m2+s-r-1) λ^(v-1) (1 - λ)^(w-v-1) / (1 - po)^(m1+m2+s-1),

where a = λpo, b = 1 - (1 - λ)po, a < p < b, 0 < po < 1, 0 < λ < 1. Consider the following beta densities:

    f(p | po, λ, r*, s*) = (p - a)^(r*-1) (b - p)^(s*-r*-1) / [(1 - po)^(s*-1) B(r*, s*)],

where a = λpo; b = 1 - (1 - λ)po; a < p < b; r* = m1 + r; and s* = m1 + m2 + s;

    f(po | t*, u*) = po^(t*-1) (1 - po)^(u*-t*-1) / B(t*, u*),  0 < po < 1,

where t* = mo + t and u* = n + u;

and

    f(λ | v*, w*) = λ^(v*-1) (1 - λ)^(w*-v*-1) / B(v*, w*),  0 < λ < 1,

where v* = v and w* = w. Thus the posterior density of p, po, and λ, given the data m1, m2, and mo, is the density function

    f(p, po, λ | m1, m2, mo) = f(p, po, λ | r*, s*, t*, u*, v*, w*),

where

    f(p, po, λ | r*, s*, t*, u*, v*, w*) = f(p | po, λ, r*, s*) f(po | t*, u*) f(λ | v*, w*)

with r* = m1 + r, s* = m1 + m2 + s, t* = mo + t, u* = n + u, v* = v, and w* = w.

Other posterior densities

The joint posterior density of the probabilities po and λ associated with the sampling nonresponses is of interest too. This joint posterior density is obtained from the joint posterior density of p, po, and λ upon integration of f(p, po, λ | m1, m2, mo) with respect to p. Thus,

    f(po, λ | m1, m2, mo) = ∫_a^b f(p, po, λ | m1, m2, mo) dp,

where a = λpo and b = 1 - (1 - λ)po. Since

    f(p, po, λ | m1, m2, mo) = f(p | po, λ, r*, s*) f(po | t*, u*) f(λ | v*, w*),

where f(po | t*, u*) and f(λ | v*, w*) are standardized beta densities on the interval (0, 1), while f(p | po, λ, r*, s*) is a nonstandardized beta density on the interval (a, b), the joint posterior density of po and λ is obtained immediately as

    f(po, λ | m1, m2, mo) = f(po | t*, u*) f(λ | v*, w*) ∫_a^b f(p | po, λ, r*, s*) dp.

The integral appearing on the right-hand side of this expression is equal to 1, of course, since f(p | po, λ, r*, s*) is a probability density on the interval (a, b). Consequently,

    f(po, λ | m1, m2, mo) = f(po | m1, m2, mo) f(λ | m1, m2, mo),

where

    f(po | m1, m2, mo) = f(po | t*, u*) = po^(mo+t-1) (1 - po)^(m1+m2+u-t-1) / B(mo + t, n + u)

and

    f(λ | m1, m2, mo) = f(λ | v*, w*) = λ^(v-1) (1 - λ)^(w-v-1) / B(v, w).

Since the joint posterior density of po and λ is the product of their marginal posterior densities, one important observation is that, given the observed data m1, m2, and mo, po and λ continue to be stochastically independent as they were before sampling (whenever a natural conjugate prior density is used in order to represent prior judgments about p, po, and λ). Further, since f(λ | m1, m2, mo) = f(λ), a second important observation is that the sample data m1, m2, and mo do not contain any intrinsic information about the conditional probability allocation parameter λ when a natural conjugate prior is used to assess prior beliefs held about p, po, and λ jointly. Finally, the posterior mean and variance of po are given as

    E(po | m1, m2, mo) = t*/u* = (mo + t)/(n + u)

and

    σ²(po | m1, m2, mo) = t*(u* - t*)/[(u*)²(u* + 1)] = (mo + t)(m1 + m2 + u - t)/[(n + u)²(n + u + 1)],

while the posterior mean and variance of λ are identical to the prior mean and variance of λ, namely,

    E(λ | m1, m2, mo) = v*/w* = v/w = E(λ)

and

    σ²(λ | m1, m2, mo) = v*(w* - v*)/[(w*)²(w* + 1)] = v(w - v)/[w²(w + 1)].
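The conjugate updating and the posterior moments of po reduce to a few lines; a sketch under hypothetical uniform priors (r = t = v = 1, s = u = w = 2), a choice of ours rather than one made in the paper:

```python
def posterior_params(m1, m2, m_o, r, s, t, u, v, w):
    """Conjugate update of the hyperparameters, in the paper's (r, s)
    beta parameterization (Beta(r, s) here means alpha = r, beta = s - r)."""
    n = m1 + m2 + m_o
    return {"r*": m1 + r, "s*": m1 + m2 + s,   # conditional posterior of p
            "t*": m_o + t, "u*": n + u,        # posterior of p_o
            "v*": v, "w*": w}                  # posterior of lam: unchanged

def beta_mean_var(r, s):
    """Mean and variance of a standardized beta density with parameters (r, s)."""
    return r / s, r * (s - r) / (s**2 * (s + 1))

post = posterior_params(46, 21, 32, 1, 2, 1, 2, 1, 2)
e_po, var_po = beta_mean_var(post["t*"], post["u*"])   # posterior moments of p_o
```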

Besides the joint and marginal posterior densities of the parameters po and λ associated with the sampling nonresponses, the joint posterior densities of p and po and of p and λ, as well as the marginal posterior density of p, are of interest too. These posterior densities, however, are difficult to obtain in analytical form, and a Monte Carlo analysis of them may be necessary.

Marginal density of the observed data

The marginal density of the observed data m1, m2, and mo can be obtained from the relationship

    f(m1, m2, mo) = f(m1, m2, mo | p, po, λ) f(p, po, λ) / f(p, po, λ | m1, m2, mo),

and it is readily seen to be the density

    f(m1, m2, mo) = [n!/(m1! m2! mo!)] B(r*, s*) B(t*, u*) / [B(r, s) B(t, u)]

or, equivalently,

    f(m1, m2, mo) = Γ(n+1) Γ(m1+r) Γ(m2+s-r) Γ(mo+t) Γ(m1+m2+u-t) Γ(s) Γ(u)
                    / [Γ(n+u) Γ(m1+1) Γ(m2+1) Γ(mo+1) Γ(m1+m2+s) Γ(r) Γ(s-r) Γ(t) Γ(u-t)],

sometimes called a "beta-binomial" density or a "hyper-binomial" density.

Posterior mean of f(p | m1, m2, mo)

Since the Bayesian point estimator of p against quadratic loss is the posterior mean of f(p | m1, m2, mo), namely,

    E(p | m1, m2, mo) = ∫ p f(p | m1, m2, mo) dp,

the posterior density of p, given the observed data m1, m2, and mo, is required. This posterior density, however, is difficult to obtain in analytical form.
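The beta-binomial type marginal density above is convenient to compute on the log scale to avoid overflow in the gamma functions; a sketch (function names ours):

```python
from math import lgamma, exp

def log_beta(r, s):
    """log B(r, s) in the paper's parameterization: B(r, s) = Γ(r)Γ(s-r)/Γ(s)."""
    return lgamma(r) + lgamma(s - r) - lgamma(s)

def marginal_density(m1, m2, m_o, r, s, t, u):
    """Prior predictive density f(m1, m2, m_o) of the observed counts."""
    n = m1 + m2 + m_o
    log_coef = lgamma(n + 1) - lgamma(m1 + 1) - lgamma(m2 + 1) - lgamma(m_o + 1)
    return exp(log_coef
               + log_beta(m1 + r, m1 + m2 + s) - log_beta(r, s)
               + log_beta(m_o + t, n + u) - log_beta(t, u))
```

As a sanity check, the density sums to 1 over all outcomes (m1, m2, mo) with n fixed.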

The posterior mean E(p | m1, m2, mo) may be derived without explicitly obtaining the posterior density of p, since an alternate expression for the posterior mean is

    E(p | m1, m2, mo) = E(E(p | po, λ, m1, m2, mo)),

where the outermost expectation appearing on the right-hand side of this identity is understood to be formed with respect to the joint posterior density f(po, λ | m1, m2, mo) of po and λ, given the observed sample data. The conditional density of p, given po, λ, m1, m2, and mo, is a nonstandardized beta density defined on the interval (a, b), where a = λpo and b = 1 - (1 - λ)po, namely,

    f(p | po, λ, m1, m2, mo) = g(p | a, b, r*, s*),

where

    g(p | a, b, r*, s*) = (p - a)^(r*-1) (b - p)^(s*-r*-1) / [(1 - po)^(s*-1) B(r*, s*)]

and a < p < b, with r* = m1 + r and s* = m1 + m2 + s. Thus the conditional mean of p, given po, λ, m1, m2, and mo, is simply

    E(p | po, λ, m1, m2, mo) = E(p | a, b, r*, s*),

where

    E(p | a, b, r*, s*) = [br* + a(s* - r*)]/s* = a + (r*/s*)(b - a).

Substitution of a = λpo, b = 1 - (1 - λ)po, r* = m1 + r, and s* = m1 + m2 + s into this expression gives the conditional mean of p, given po, λ, m1, m2, and mo, explicitly as

    E(p | po, λ, m1, m2, mo) = λpo + [(m1 + r)/(m1 + m2 + s)](1 - po).

In turn, according to the identity previously given, the posterior mean of p given only the observed data m1, m2, and mo, namely, E(p | m1, m2, mo), can be obtained by using the right-hand side of this expression for E(p | po, λ, m1, m2, mo) and forming its conditional

expectation with respect to the joint posterior density of po and λ, given the observed sample data, namely,

    E(E(p | po, λ, m1, m2, mo)) = E(λpo + [(m1 + r)/(m1 + m2 + s)](1 - po) | m1, m2, mo).

Applying this identity when the prior density of p, po, and λ is a natural conjugate density, and noting that λ and po remain stochastically independent a posteriori so that E(λpo | m1, m2, mo) = E(λ | m1, m2, mo) E(po | m1, m2, mo), yields the expression p* = E(p | m1, m2, mo), where

    p* = (v*/w*)(t*/u*) + (r*/s*)(1 - t*/u*)

or, substituting v* = v, w* = w, t* = mo + t, u* = n + u, r* = m1 + r, and s* = m1 + m2 + s,

    p* = (v/w)[(mo + t)/(n + u)] + [(m1 + r)/(m1 + m2 + s)][(m1 + m2 + u - t)/(n + u)].

Structure of Bayesian point estimator

The Bayesian estimator for p against quadratic loss using a natural conjugate prior density has a very simple and appealing structure. As previously noted, the marginal probability that an individual belongs to one of two mutually exclusive categories, say, category K1, can be written, using the law of total probability, as

    P(K1) = P(K1 & C) + P(K1 & C') = P(K1|C)P(C) + P(K1|C')P(C'),

where C denotes the event that an individual selected at random from the given population does not reveal which category he belongs to, while C' denotes the event that such a randomly selected individual does reveal his classification.
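The closed-form posterior mean can be evaluated directly; a sketch, again under hypothetical uniform priors of our own choosing rather than the paper's:

```python
def bayes_point_estimate(m1, m2, m_o, r, s, t, u, v, w):
    """Posterior mean p* = E(lam) E(p_o | data) + (r*/s*)(1 - E(p_o | data))."""
    n = m1 + m2 + m_o
    e_lam = v / w                    # lam's posterior mean equals its prior mean
    e_po = (m_o + t) / (n + u)       # posterior mean of p_o
    return e_lam * e_po + (m1 + r) / (m1 + m2 + s) * (1.0 - e_po)

p_star = bayes_point_estimate(46, 21, 32, 1, 2, 1, 2, 1, 2)
```

For the example data this gives a value of about 0.62, pulled below the proportional-allocation estimate of 0.69 by the uniform prior's E(λ) = 1/2.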

In the notation used for these probabilities,

    p = λpo + P(K1|C')(1 - po).

An examination of the Bayesian point estimator for p reveals that p* can be expressed as

    p* = λ* po* + P*(K1|C')(1 - po*),

where λ*, po*, and P*(K1|C') are simply estimators for the probabilities λ, po, and P(K1|C') which appear in the expression connecting the probability p with these other population probabilities. The estimators λ*, po*, and P*(K1|C') can be shown to be weighted averages of the prior opinions held about these probabilities (as expressed through their expected values determined from the prior probability densities being used) and the sample estimators of those probabilities based on the observed data m1, m2, and mo.

In other words, considering λ* for example,

    λ* = E(λ | m1, m2, mo) = v*/w* = v/w = E(λ).

Thus the posterior estimate of λ, given the observed data m1, m2, and mo, is not affected or changed by the data and, therefore, it remains the same as the prior estimate of λ expressed as the mean E(λ) of the prior density assigned to λ. In other words, the sample data do not contain any intrinsic information about the conditional probability allocation parameter λ when a natural conjugate prior is used to assess prior beliefs held about p, po, and λ.

On the other hand, the posterior estimate po* of po, namely,

    po* = E(po | m1, m2, mo) = t*/u*,

can be expressed as

    po* = W1 (mo/n) + W2 E(po),

where

    W1 = n/(n + u),  W2 = u/(n + u),  and  E(po) = t/u.

Thus, po* is simply a weighted average of the sample estimate of po, namely, p̂o = mo/n, and the prior estimate of po as expressed by the mean E(po) of the prior density assigned to po.

Finally, the posterior estimate of P(K1|C'), the conditional probability that a classified observation belongs to the population category K1, is simply the posterior mean of p, given the observed data m1, m2, and mo, and given that po = 0, which, of course, simply expresses analytically the fact that P(K1|C') is a conditional probability formed among classified respondents, among whom, therefore, the probability that an unclassified response will be found (namely, po) must be 0. Thus,

    P*(K1|C') = E(p | po = 0, λ, m1, m2, mo) = r*/s*.

A prior estimate of P(K1|C'), however, is simply the mean of the prior conditional density assigned to p, given po = 0, namely,

    E(p | po = 0, λ) = r/s,

while the sample estimate of P(K1|C') is merely the proportion among the classified observations belonging to the given population category, namely,

    m1/(m1 + m2).

It can be seen immediately that the posterior estimate P*(K1|C') of P(K1|C'), given the observed data m1, m2, and mo, is just a weighted average of these two estimates (the first based only on the sample data obtained, while the second reflects prior judgments about this probability), namely,

    P*(K1|C') = W1 [m1/(m1 + m2)] + W2 E(p | po = 0, λ),

where

    W1 = (m1 + m2)/(m1 + m2 + s)  and  W2 = s/(m1 + m2 + s).

Evaluating the risk of the Bayes estimator

For quadratic loss, the risk of using the Bayes estimator p* = E(p | m1, m2, mo) when the data m1, m2, and mo are observed is proportional to the conditional variance σ²(p | m1, m2, mo) of the marginal posterior density of p. When using a particular joint prior density, ξ, for expressing prior judgments held about p, po, and λ, the Bayes risk associated with choosing p* = E(p | m1, m2, mo) is given (up to a constant of proportionality) as

    ρ*(ξ) = E(σ²(p | m1, m2, mo)),

namely, the risk of the Bayes estimator, p*, against quadratic loss averaged over the observed data m1, m2, and mo using the joint marginal density, f(m1, m2, mo), of the data.

The required conditional variance σ²(p | m1, m2, mo) can be obtained indirectly from the identity

    σ²(p | m1, m2, mo) = σ²(E(p | po, λ, m1, m2, mo)) + E(σ²(p | po, λ, m1, m2, mo)),

where again, as when p* = E(p | m1, m2, mo) was determined, the outermost expected values required for the two expressions which appear on the right-hand side of this identity are understood to be made with respect to the joint posterior density of po and λ, given the observed sample data, namely, f(po, λ | m1, m2, mo).

Previously the conditional expectation E(p | po, λ, m1, m2, mo), whose conditional variance is required as the first of the two terms of this identity, was determined as

    E(p | po, λ, m1, m2, mo) = λpo + (r*/s*)(1 - po),

where r^* = m_1 + r and s^* = m_1 + m_2 + s. To obtain its conditional variance, recall the general expression for the variance of the product of two stochastically independent random variables X and Y,

    \sigma^2(XY) = \sigma^2(X)\sigma^2(Y) + E^2(X)\sigma^2(Y) + E^2(Y)\sigma^2(X),

which, of course, can be extended should other conditioning random variables Z be used:

    \sigma^2(XY \mid Z) = \sigma^2(X \mid Z)\sigma^2(Y \mid Z) + E^2(X \mid Z)\sigma^2(Y \mid Z) + E^2(Y \mid Z)\sigma^2(X \mid Z).

This expression for \sigma^2(XY|Z) can be immediately applied to obtain the variance \sigma^2(E(p|p_0, \lambda, m_1, m_2, m_0)), conditioned only on the observed data m_1, m_2, and m_0, of the conditional expectation E(p|p_0, \lambda, m_1, m_2, m_0). Thus let X denote \lambda - c and Y denote p_0, where, given m_1, m_2, and m_0, c denotes the constant r^*/s^* = (m_1+r)/(m_1+m_2+s). Then E(p|p_0, \lambda, m_1, m_2, m_0) can be written as the product of X and Y shifted by the addition of the constant c, namely,

    E(p \mid p_0, \lambda, m_1, m_2, m_0) = \lambda p_0 + c(1 - p_0) = (\lambda - c)p_0 + c = XY + c,

where X = \lambda - c and Y = p_0, for the given m_1, m_2, and m_0, are stochastically independent random variables. Consequently, taking Z as the conditioning random variables m_1, m_2, and m_0, it is seen that

    \sigma^2(E(p \mid p_0, \lambda, m_1, m_2, m_0)) = \sigma^2(XY + c \mid Z) = \sigma^2(XY \mid Z),

where, of course, the constant c = (m_1+r)/(m_1+m_2+s) does not alter the conditional variance of XY, given Z, since for given Z (that is, for given m_1, m_2, and m_0) the linear shift c can be treated as a constant. Therefore,

    \sigma^2(XY \mid Z) = \sigma^2(X \mid Z)\sigma^2(Y \mid Z) + E^2(X \mid Z)\sigma^2(Y \mid Z) + E^2(Y \mid Z)\sigma^2(X \mid Z)

or, alternatively,

    \sigma^2(E(p \mid p_0, \lambda, m_1, m_2, m_0)) = \sigma^2(\lambda - c \mid m_1, m_2, m_0)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2(\lambda - c \mid m_1, m_2, m_0)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2(p_0 \mid m_1, m_2, m_0)\sigma^2(\lambda - c \mid m_1, m_2, m_0).
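The product-variance identity above is easy to confirm numerically. The sketch below (an illustration added here, not part of the original derivation) checks \sigma^2(XY) = \sigma^2(X)\sigma^2(Y) + E^2(X)\sigma^2(Y) + E^2(Y)\sigma^2(X) by Monte Carlo for independent random variables of the same kind used in the derivation: a shifted beta playing the role of \lambda - c, and a beta playing the role of p_0 (all parameter values invented for the check).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000

# Independent X and Y; any independent pair works. The betas mirror
# the paper's setup: X plays the role of lambda - c, Y the role of p0.
X = rng.beta(2.0, 5.0, N) - 0.3
Y = rng.beta(1.5, 4.0, N)

mc = np.var(X * Y)  # Monte Carlo variance of the product
formula = (np.var(X) * np.var(Y)
           + np.mean(X) ** 2 * np.var(Y)
           + np.mean(Y) ** 2 * np.var(X))

print(mc, formula)  # the two agree to Monte Carlo accuracy
```

The identity is exact for any pair of independent random variables; the simulation only confirms the algebra.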

Again, conditioned on m_1, m_2, and m_0, c may be treated as a constant and, therefore,

    \sigma^2(\lambda - c \mid m_1, m_2, m_0) = \sigma^2(\lambda \mid m_1, m_2, m_0),

and the conditional variance \sigma^2(E(p|p_0, \lambda, m_1, m_2, m_0)) can be expressed as

    \sigma^2(E(p \mid p_0, \lambda, m_1, m_2, m_0)) = \sigma^2(\lambda \mid m_1, m_2, m_0)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2\!\left(\lambda - \frac{r^*}{s^*} \,\Big|\, m_1, m_2, m_0\right)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2(p_0 \mid m_1, m_2, m_0)\sigma^2(\lambda \mid m_1, m_2, m_0).

The second term required to obtain the conditional variance \sigma^2(p|m_1, m_2, m_0) from the given identity involves the expectation of the conditional variance \sigma^2(p|p_0, \lambda, m_1, m_2, m_0) formed with respect to the joint marginal posterior density of p_0 and \lambda given the observed data m_1, m_2, and m_0. The conditional density of p, given p_0, \lambda, m_1, m_2, and m_0, is a nonstandardized beta density with parameters r^* = m_1 + r and s^* = m_1 + m_2 + s on the interval (a, b), where a = \lambda p_0 and b = 1 - (1-\lambda)p_0, while the variance of such a nonstandardized beta random variable X is given generally as

    \sigma^2(X \mid a, b, r^*, s^*) = (b - a)^2 \sigma^2(X \mid r^*, s^*),

with \sigma^2(X|r^*, s^*) denoting the variance of a standardized beta random variable with parameters r^* and s^*, namely,

    \sigma^2(X \mid r^*, s^*) = \frac{r^*(s^* - r^*)}{(s^*)^2 (s^* + 1)}.

Hence the conditional variance \sigma^2(p|p_0, \lambda, m_1, m_2, m_0) is given as

    \sigma^2(p \mid p_0, \lambda, m_1, m_2, m_0) = (b - a)^2 \sigma^2(p \mid r^*, s^*) = \frac{r^*(s^* - r^*)}{(s^*)^2 (s^* + 1)} (1 - p_0)^2,

since b - a = (1 - (1-\lambda)p_0) - \lambda p_0 = 1 - p_0.

Thus the conditional expectation of \sigma^2(p|p_0, \lambda, m_1, m_2, m_0), given only the observed data m_1, m_2, and m_0, is

    E(\sigma^2(p \mid p_0, \lambda, m_1, m_2, m_0) \mid m_1, m_2, m_0) = \frac{r^*(s^* - r^*)}{(s^*)^2 (s^* + 1)} E((1 - p_0)^2 \mid m_1, m_2, m_0).
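As a numerical sanity check (illustrative only, with invented values for \lambda, p_0, r^*, and s^*), the scaled-beta variance rule \sigma^2(X|a,b,r^*,s^*) = (b-a)^2 r^*(s^*-r^*)/((s^*)^2(s^*+1)) can be verified by simulating a standardized beta with parameters r^*, s^* in the paper's parametrization, which is a Beta(r^*, s^*-r^*) variable in the modern shape parametrization, and mapping it onto (a, b):

```python
import numpy as np

rng = np.random.default_rng(1)

# Paper's parametrization: mean r*/s*, variance r*(s*-r*)/((s*)^2 (s*+1)).
# In the usual (alpha, beta) shape parametrization: alpha = r*, beta = s* - r*.
r_star, s_star = 4.0, 10.0
lam, p0 = 0.3, 0.25                       # allocation parameter and nonresponse probability
a, b = lam * p0, 1.0 - (1.0 - lam) * p0   # support of the nonstandardized beta; b - a = 1 - p0

B = rng.beta(r_star, s_star - r_star, 1_000_000)  # standardized beta on (0, 1)
X = a + (b - a) * B                               # nonstandardized beta on (a, b)

closed_form = (b - a) ** 2 * r_star * (s_star - r_star) / (s_star ** 2 * (s_star + 1))
print(np.var(X), closed_form)
```

Note that b - a indeed equals 1 - p_0 for any \lambda, which is why \lambda drops out of the conditional variance.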

In turn, E((1-p_0)^2|m_1, m_2, m_0) may be expressed in terms of the conditional expectation and conditional variance of p_0, given m_1, m_2, and m_0, namely,

    E((1-p_0)^2 \mid m_1, m_2, m_0) = \sigma^2(1-p_0 \mid m_1, m_2, m_0) + E^2(1-p_0 \mid m_1, m_2, m_0)

or, equivalently,

    E((1-p_0)^2 \mid m_1, m_2, m_0) = \sigma^2(p_0 \mid m_1, m_2, m_0) + (1 - E(p_0 \mid m_1, m_2, m_0))^2.

Using this identity, the expression

    E(\sigma^2(p \mid p_0, \lambda, m_1, m_2, m_0) \mid m_1, m_2, m_0) = \frac{r^*(s^* - r^*)}{(s^*)^2 (s^* + 1)} \left( \sigma^2(p_0 \mid m_1, m_2, m_0) + (1 - E(p_0 \mid m_1, m_2, m_0))^2 \right)

is obtained for this conditional expectation.

Finally, since the risk of using the Bayes estimator p^* = E(p|m_1, m_2, m_0), when the observed sample data m_1, m_2, and m_0 are obtained from the sample survey, is given (up to a constant of proportionality for quadratic loss) by \sigma^2(p|m_1, m_2, m_0), the identity

    \sigma^2(p \mid m_1, m_2, m_0) = \sigma^2(E(p \mid p_0, \lambda, m_1, m_2, m_0)) + E(\sigma^2(p \mid p_0, \lambda, m_1, m_2, m_0))

shows that the risk of this Bayes decision takes the explicit form, up to a constant of proportionality,

    \sigma^2(p \mid m_1, m_2, m_0) = \sigma^2(\lambda \mid m_1, m_2, m_0)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2\!\left(\lambda - \frac{r^*}{s^*} \,\Big|\, m_1, m_2, m_0\right)\sigma^2(p_0 \mid m_1, m_2, m_0) + E^2(p_0 \mid m_1, m_2, m_0)\sigma^2(\lambda \mid m_1, m_2, m_0) + \frac{r^*(s^* - r^*)}{(s^*)^2 (s^* + 1)} \left( \sigma^2(p_0 \mid m_1, m_2, m_0) + (1 - E(p_0 \mid m_1, m_2, m_0))^2 \right).

The various conditional means and conditional variances required for p_0 and \lambda in this expression for \sigma^2(p|m_1, m_2, m_0) are summarized below:

Quantity                                In terms of posterior parameters      In terms of data and prior parameters

E(\lambda | m_1, m_2, m_0)              v^*/w^*                               v/w
\sigma^2(\lambda | m_1, m_2, m_0)       v^*(w^*-v^*)/((w^*)^2(w^*+1))         v(w-v)/(w^2(w+1))
E(p_0 | m_1, m_2, m_0)                  t^*/u^*                               (m_0+t)/(n+u)
\sigma^2(p_0 | m_1, m_2, m_0)           t^*(u^*-t^*)/((u^*)^2(u^*+1))         (m_0+t)(m_1+m_2+u-t)/((n+u)^2(n+u+1))
E(p | p_0=0, m_1, m_2, m_0)             r^*/s^*                               (m_1+r)/(m_1+m_2+s)
\sigma^2(p | p_0=0, m_1, m_2, m_0)      r^*(s^*-r^*)/((s^*)^2(s^*+1))         (m_1+r)(m_2+s-r)/((m_1+m_2+s)^2(m_1+m_2+s+1))

(As the table exhibits, the posterior parameters for \lambda coincide with the prior parameters, v^* = v and w^* = w.) From these expressions the following may also be obtained, namely,

    \sigma^2(p_0 \mid m_1, m_2, m_0) + (1 - E(p_0 \mid m_1, m_2, m_0))^2 = \frac{(u^*-t^*)(u^*-t^*+1)}{u^*(u^*+1)} = \frac{(m_1+m_2+u-t)(m_1+m_2+u-t+1)}{(n+u)(n+u+1)}.

Using these various expressions, \sigma^2(p|m_1, m_2, m_0) may be written in terms of the parameters of the posterior densities of p, p_0, and \lambda as

    \sigma^2(p \mid m_1, m_2, m_0) = \frac{v^*(w^*-v^*)}{(w^*)^2(w^*+1)} \cdot \frac{t^*(u^*-t^*)}{(u^*)^2(u^*+1)} + \left( \frac{v^*}{w^*} - \frac{r^*}{s^*} \right)^2 \frac{t^*(u^*-t^*)}{(u^*)^2(u^*+1)} + \left( \frac{t^*}{u^*} \right)^2 \frac{v^*(w^*-v^*)}{(w^*)^2(w^*+1)} + \frac{r^*(s^*-r^*)}{(s^*)^2(s^*+1)} \cdot \frac{(u^*-t^*)(u^*-t^*+1)}{u^*(u^*+1)}.

Alternately, \sigma^2(p|m_1, m_2, m_0) may be written in terms of the observed data m_1, m_2, and m_0 and the parameters of the prior densities of p, p_0, and \lambda as

    \sigma^2(p \mid m_1, m_2, m_0) = \frac{v(w-v)}{w^2(w+1)} \cdot \frac{(m_0+t)(m_1+m_2+u-t)}{(n+u)^2(n+u+1)} + \left( \frac{v}{w} - \frac{m_1+r}{m_1+m_2+s} \right)^2 \frac{(m_0+t)(m_1+m_2+u-t)}{(n+u)^2(n+u+1)} + \left( \frac{m_0+t}{n+u} \right)^2 \frac{v(w-v)}{w^2(w+1)} + \frac{(m_1+r)(m_2+s-r)}{(m_1+m_2+s)^2(m_1+m_2+s+1)} \cdot \frac{(m_1+m_2+u-t)(m_1+m_2+u-t+1)}{(n+u)(n+u+1)}.

From this last expression for \sigma^2(p|m_1, m_2, m_0), the Bayes risk under quadratic loss of using the estimator p^* = E(p|m_1, m_2, m_0) can be evaluated, up to a constant of proportionality, by forming the expectations of the various terms involving the observed data m_1, m_2, and m_0 with respect to their joint marginal density f(m_1, m_2, m_0). This calculation will not be undertaken here; instead, various asymptotic expressions for \sigma^2(p|m_1, m_2, m_0), which are valid when the sample size n is large, will be explored.

Asymptotic expansions for \sigma^2(p | m_1, m_2, m_0)

First, suppose the sample size n is large relative to the parameters t and u which determine the prior natural conjugate beta density for p_0. Then t/n \approx 0 and u/n \approx 0, and \sigma^2(p|m_1, m_2, m_0) may be approximated as

    \sigma^2(p \mid m_1, m_2, m_0) \approx \left( \sigma^2(\lambda) + \left( E(\lambda) - \frac{m_1+r}{m_1+m_2+s} \right)^2 \right) \frac{\hat p_0 (1 - \hat p_0)}{n} + \hat p_0^2 \, \sigma^2(\lambda) + \frac{(m_1+r)(m_2+s-r)}{(m_1+m_2+s)^2(m_1+m_2+s+1)} (1 - \hat p_0)^2,

where \hat p_0 = m_0/n is the proportion of nonresponses actually observed in the sample and 1 - \hat p_0 = (m_1+m_2)/n is the proportion of classified responses observed in the sample.

Alternately, suppose the sample size n is large with respect to the parameters r and s which determine the prior natural conjugate conditional

beta density for p. In this situation r/n \approx 0 and s/n \approx 0, and therefore an approximate expression for \sigma^2(p|m_1, m_2, m_0) is

    \sigma^2(p \mid m_1, m_2, m_0) \approx \left( \sigma^2(\lambda) + \left( E(\lambda) - \frac{p_1}{p_1+p_2} \right)^2 \right) \frac{(m_0+t)(m_1+m_2+u-t)}{(n+u)^2(n+u+1)} + \left( \frac{m_0+t}{n+u} \right)^2 \sigma^2(\lambda) + \frac{p_1 p_2}{n(p_1+p_2)^3} \cdot \frac{(m_1+m_2+u-t)(m_1+m_2+u-t+1)}{(n+u)(n+u+1)},

where p_1 = m_1/n denotes the sample proportion of classified responses belonging to the first population category, and p_2 = m_2/n denotes the sample proportion of classified responses belonging to the second population category.

As a third situation, suppose the sample size n is large with respect to each of the four parameters t, u, r, and s related to the prior natural conjugate beta densities assigned to p and p_0. In this case the two approximations previously given for \sigma^2(p|m_1, m_2, m_0) become the approximation

    \sigma^2(p \mid m_1, m_2, m_0) \approx \left( \sigma^2(\lambda) + \left( E(\lambda) - \frac{p_1}{p_1+p_2} \right)^2 \right) \frac{\hat p_0 (1 - \hat p_0)}{n} + \hat p_0^2 \, \sigma^2(\lambda) + \frac{p_1 p_2}{n(p_1+p_2)}.

Finally, when the sample size n dominates r, s, t, and u in the sense that r/n, s/n, t/n, and u/n are small, then whenever the sample size is large and p_1 + p_2 is not exceedingly small (or, equivalently, whenever the proportion of nonresponses observed in the sample, \hat p_0, is not almost 1 and the sample does not yield almost all nonresponses), the conditional variance \sigma^2(p|m_1, m_2, m_0), which gives the risk of using p^* = E(p|m_1, m_2, m_0) as an estimate for p (except, perhaps, for a multiplicative constant), is approximated simply by

    \sigma^2(p \mid m_1, m_2, m_0) \approx \sigma^2(\lambda) \, \hat p_0^2.
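As an illustrative numerical check (with prior settings and data counts invented for the example, not taken from the paper), the exact conditional variance computed from the expression in terms of the data and prior parameters can be compared against the limiting form \sigma^2(\lambda)\hat p_0^2 for a large sample with weak priors:

```python
def risk_exact(m1, m2, m0, r, s, t, u, v, w):
    """sigma^2(p | m1, m2, m0) written in terms of the observed data and the
    prior parameters (r, s), (t, u), (v, w) of the beta densities for
    p given p0 = 0, for p0, and for lambda, in the paper's parametrization."""
    n = m1 + m2 + m0
    var_lam = v * (w - v) / (w ** 2 * (w + 1))
    mean_lam = v / w
    var_p0 = (m0 + t) * (m1 + m2 + u - t) / ((n + u) ** 2 * (n + u + 1))
    mean_p0 = (m0 + t) / (n + u)
    c = (m1 + r) / (m1 + m2 + s)  # r*/s*
    var_class = (m1 + r) * (m2 + s - r) / ((m1 + m2 + s) ** 2 * (m1 + m2 + s + 1))
    e_sq = (m1 + m2 + u - t) * (m1 + m2 + u - t + 1) / ((n + u) * (n + u + 1))
    return (var_lam * var_p0
            + (mean_lam - c) ** 2 * var_p0
            + mean_p0 ** 2 * var_lam
            + var_class * e_sq)

# Large sample, sizable nonresponse, weak priors (hypothetical numbers).
m1, m2, m0 = 1200, 500, 300
r, s, t, u, v, w = 1.0, 2.0, 1.0, 2.0, 1.0, 2.0
n = m1 + m2 + m0
p0_hat = m0 / n
var_lam = v * (w - v) / (w ** 2 * (w + 1))

exact = risk_exact(m1, m2, m0, r, s, t, u, v, w)
approx = var_lam * p0_hat ** 2  # limiting form sigma^2(lambda) * p0_hat^2
print(exact, approx)
```

With n = 2000 the two values differ by only a few percent, and the dominant contribution to the exact risk is indeed the \hat p_0^2 \sigma^2(\lambda) term.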

In other words, this asymptotic expression for \sigma^2(p|m_1, m_2, m_0) indicates that whenever the sample taken is large enough that the sample data related to p and p_0 are no longer overly influenced by whatever prior judgments may have been held about them, then, provided most of the observations in the actual sample are not nonresponses, the risk of using the Bayes estimator p^* = E(p|m_1, m_2, m_0) to estimate p is determined by (1) the square of the observed proportion of nonresponses in the sample, and (2) the variance of the prior natural conjugate beta density for the allocation parameter \lambda (this prior density, of course, expressing the only information which is available about the allocation parameter).

In summary, when the sample size n is large and the observed sample information about p and p_0 outweighs the prior judgments made about them, then, should the proportion of nonresponses observed in the sample be small, the risk of using the Bayes estimator p^* = E(p|m_1, m_2, m_0) for purposes of estimating p is negligible. Alternately, if the observed proportion of nonresponses is large, but not so large that almost the entire sample consists of nonresponses, then, should the prior judgments held about the allocation parameter \lambda reflect near certainty that \lambda is a particular value, so that essentially the entire unit mass of the prior density is placed at this point, again for practical purposes the risk of using this Bayes estimator can be ignored.
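The two closing observations can be made concrete with the limiting risk \sigma^2(\lambda)\hat p_0^2 (a sketch with invented numbers, using the paper's beta parametrization with parameters v and w for the \lambda prior): few nonresponses make the risk negligible regardless of the \lambda prior, while a \lambda prior concentrated near a single value makes it negligible even under heavy nonresponse.

```python
# Limiting risk (up to a constant) of the Bayes estimator: sigma^2(lambda) * p0_hat^2,
# where the lambda prior is a beta density with parameters v, w in the
# paper's parametrization (mean v/w, variance v(w-v)/(w^2(w+1))).
def limiting_risk(p0_hat, v, w):
    var_lam = v * (w - v) / (w ** 2 * (w + 1))
    return var_lam * p0_hat ** 2

# Case 1: small observed nonresponse rate, diffuse lambda prior.
small_nonresponse = limiting_risk(p0_hat=0.02, v=1.0, w=2.0)

# Case 2: heavy nonresponse, but a near-certain prior about lambda
# (mass concentrated near v/w = 0.5, so sigma^2(lambda) is tiny).
concentrated_prior = limiting_risk(p0_hat=0.40, v=500.0, w=1000.0)

# Case 3: heavy nonresponse and a diffuse lambda prior; the risk is
# no longer negligible.
diffuse_prior = limiting_risk(p0_hat=0.40, v=1.0, w=2.0)

print(small_nonresponse, concentrated_prior, diffuse_prior)
```

Cases 1 and 2 yield risks orders of magnitude smaller than case 3, numerically matching the summary's two conditions for a negligible Bayes risk.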