Division of Research Graduate School of Business Administration The University of Michigan FINITE POPULATION SAMPLING WITH MULTIVARIATE AUXILIARY INFORMATION Working Paper No. 302 Roger L. Wright The University of Michigan FOR DISCUSSION ONLY NONE OF THIS MATERIAL IS TO BE QUOTED OR REPRODUCED WITHOUT THE EXPRESS PERMISSION OF THE DIVISION OF RESEARCH. April 1982

ABSTRACT

This paper examines finite population estimation and sample design from a robust, model-based viewpoint. The paper introduces a new class of multivariate regression estimators that integrates several model-based procedures and clarifies the role of weighted least squares and analysis of residuals in sampling. A new model-based procedure is suggested for designing an efficiently stratified sampling plan.

KEY WORDS: Balanced sampling, Regression estimators, Robustness, Stratification, Superpopulation models, Unequal probability sampling.

Author's Footnote: Roger L. Wright is Associate Professor of Statistics, Graduate School of Business Administration, The University of Michigan, Ann Arbor, MI 48109. This paper was prepared with the support of the U.S. Department of Energy, Grant No. DE-FG02-80ER10125, the Electric Power Research Institute, and Consumers Power Company. However, any opinions, findings, conclusions, or recommendations expressed herein are those of the author and do not necessarily reflect the views of the sponsors. The author wishes to thank K.R.W. Brewer, Tore Dalenius, Leslie Kish, Carl Sarndal, and Allen Spivey for valuable comments on earlier versions of this paper.

1. INTRODUCTION

Various strategies have been proposed for combining sample data with available auxiliary information. One approach is to adapt linear model prediction theory to the finite population context. This gives a highly efficient best linear unbiased (BLU) estimator if the model is correctly specified, but one that can be seriously biased if the assumed model is inaccurate [Brewer (1963), Hansen, Madow, and Tepping (1978), Royall (1970), and Smith (1976)]. An alternative is the class of generalized regression estimators, which are more robust to model misspecification while retaining much of the efficiency of the BLU estimator [Cassel, Sarndal and Wretman (1976) and Sarndal (1980 a,b)]. Recently, Brewer (1979) suggested a robust estimator that blends aspects of the BLU and generalized regression estimators but is restricted to a single auxiliary variable. Isaki and Fuller (1982) gave a closely related multivariate estimator.

This paper introduces a large class of robust estimators that utilize multivariate auxiliary information in finite population sampling. This new class includes both BLU and generalized regression estimators, as well as many more conventional sampling estimators. New results are given that suggest the pervasiveness of the generalized regression estimators within the class of robust estimators. A by-product of the analysis is added insight into the relationship between survey sampling and applied regression analysis, especially concerning the widely encountered problems of weighting and analysis of residuals.

The paper also examines model-based sample design, and a new procedure is suggested for constructing efficiently stratified sampling plans that preserve the robustness of the generalized regression estimators. This procedure is a natural extension of Neyman allocation to the model-based case, and it

provides an alternative to the balanced sampling approach of Royall and Herson (1973 a,b), to nonrandom model-based designs, and to other stratified sampling designs such as those of Anderson, Kish, and Cornell (1980), Dalenius and Hodges (1959), Singh (1975), and Rao (1977).

The paper discusses model-based estimation in Section 2, first in the ratio case involving a single auxiliary variable and then in the more general multivariate situation. Sample design is discussed in Section 3, and to simplify the presentation, the derivation of several key results is deferred to Section 4.

2. MODEL-BASED ESTIMATION

2.1 The Ratio Case

The model-based approach blends elements of finite population sampling and linear statistical models. Assume that a sample of $n$ units is to be randomly selected from a population of $N$ units labeled $I = 1,\ldots,N$. The probability of obtaining each possible sample $s$ is denoted by $p(s)$, and the probability that unit $I$ is included in the selected sample is denoted $\pi_I$. In general, the inclusion probabilities may vary from unit to unit but are known from the sampling plan.

Consider a superpopulation model $\xi$, under which the target variable of interest, say $y_I$, is related to an auxiliary variable $x_I$ following a simple zero-intercept regression equation:

$$y_I = \beta x_I + u_I. \tag{1}$$

The usual assumption that each residual $u_I$ has zero expectation under $\xi$ is employed; thus, $E_\xi(u_I) = 0$. The auxiliary variable $x_I$ in (1) is assumed to be positive and known for each unit in the population.

The model $\xi$ is generally taken to be heteroscedastic, with a residual standard deviation $\sigma_I$ that varies from unit to unit. Our analysis will assume that the $\sigma_I$ are known, but in practice we have used available data to estimate a functional relationship between $\sigma_I$ and suitable auxiliary variables, using a technique developed by Harvey (1976). In addition, it is assumed that $u_1,\ldots,u_N$ are uncorrelated. Thus, under $\xi$, we have $E_\xi(u_I) = 0$, $E_\xi(u_I^2) = \sigma_I^2$, and $E_\xi(u_I u_J) = 0$ for $I \neq J$.

There is considerable confusion about estimation when varying probabilities of inclusion and varying residual standard deviations are involved. In light of the heteroscedasticity of $\xi$, many analysts would employ a weighted least squares (WLS) procedure with weights determined by $\sigma_I$. On the other hand, most survey samplers would emphasize the sampling plan and recommend WLS with weights determined by $\pi_I$ and possibly $x_I$. Table 1 shows five different estimators of the population total $Y = \sum_{I=1}^{N} y_I$ that have been recommended in the recent statistical literature.

(Table 1 about here)

A framework for examining these alternatives is produced by embedding them within a class of estimators that is amenable to analysis. In fact, each of the estimators shown in Table 1 can be written in the form

$$\hat{Y}_{QR} = \hat\beta X + \sum_{I \in s} r_I \hat{u}_I, \tag{2}$$

where $\hat\beta = \sum_{I \in s} q_I x_I y_I \big/ \sum_{I \in s} q_I x_I^2$ and $\hat{u}_I = y_I - \hat\beta x_I$, with $q_I \ge 0$ and $r_I \ge 0$. Here $\hat\beta X$ is the naive regression-based estimator using the finite population total $X$, and the second term is a correction based on the sample residuals.

The $q_I$ are weights used to calculate the regression coefficient $\hat\beta$, and $r_I$ is a weight used to extend the observed residual. Table 2 shows how $q_I$ and $r_I$ can be chosen for each of the estimators shown in Table 1.

(Table 2 about here)

One reason why so many alternative ratio-type estimators have been proposed is that there are two conflicting bases for statistical inference in survey sampling. Under model-based inference, the sampling distribution of an estimator $\hat{Y}$ is considered to be induced by the joint distribution of the residuals $u_I$ of the model $\xi$. In terms of the expectation $E_\xi$ taken with respect to $\xi$, $\hat{Y}_{QR}$ is an unbiased predictor of the random variable $Y = \sum_{I=1}^{N} y_I$ for any choice of $q_I$ and $r_I$. However, Royall has shown that the best linear unbiased (BLU) predictor of $Y$, denoted $\hat{Y}_{BLU}$, uses the model-based WLS estimator $\hat\beta_{BLU}$ given by choosing $q_I = \sigma_I^{-2}$, and places unit weight ($r_I = 1$) on the observed sample residuals. Although $\hat{Y}_{BLU}$ completely ignores the sampling plan, it would be preferred by many statisticians if they were certain of the accuracy of the model $\xi$. The major objection to $\hat{Y}_{BLU}$ is that it can be seriously biased if $\xi$ is even moderately inaccurate.

To protect against such dependence on $\xi$, i.e., to provide robustness, survey samplers have traditionally emphasized design-based inference. Here the $y_I$ are assumed to take unknown but fixed values throughout the population, but the estimator $\hat{Y}$ is considered to have a sampling distribution induced by the sampling plan. Estimators that are design-unbiased have a desirable robustness since they are less dependent on the accuracy of a model such as $\xi$.

Design unbiasedness can be provided by linking the choice of weights used in the estimator to the inclusion probabilities of the sampling plan. Consider

an estimator $\hat{Y}_R = \sum_{I \in s} r_I y_I$, and let $E_p(\hat{Y}_R)$ denote its design-based expected value. Then

$$E_p(\hat{Y}_R) = \sum_{s} p(s) \sum_{I \in s} r_I y_I = \sum_{I=1}^{N} r_I y_I \sum_{s \in S_I} p(s) = \sum_{I=1}^{N} r_I \pi_I y_I,$$

where $S_I = \{s \mid I \in s\}$. This implies that $\hat{Y}_R$ is a design-unbiased estimator of $Y$ if and only if for all $y_1,\ldots,y_N$,

$$\sum_{I=1}^{N} r_I \pi_I y_I = \sum_{I=1}^{N} y_I,$$

or, equivalently, if and only if $\pi_I > 0$ and $r_I = \pi_I^{-1}$ for all $I$. So $\hat{Y}_R$ is a design-unbiased estimator of $Y$ if and only if it is the Horvitz-Thompson estimator $\hat{Y}_{HT}$ for a sampling plan having positive inclusion probabilities for all units in the population.

Unfortunately, design-based analysis of the estimator $\hat{Y}_{QR}$ is complicated by the nonlinearity in $\hat\beta$. However, it is useful to examine approximate design unbiasedness in large samples from even larger populations. An estimator $\hat{Y}$ is said to be an asymptotically design-unbiased (ADU) estimator of $Y$ if and only if for all $y_1,\ldots,y_N$,

$$\lim_{(n,N)\to\infty} E_p(\hat{Y}) = Y. \tag{3}$$

The sense in which this limit is taken follows Brewer (1979) and Sarndal (1980a) and is described in Section 4. Two results are shown:

Result 1. $\hat{Y}_{QR}$ is ADU for $Y$ if and only if there is some constant $\lambda$ such that

$$1 - r_I \pi_I = q_I \pi_I x_I \lambda \quad \text{for all } I = 1,\ldots,N. \tag{4}$$

Result 2. $\hat{Y}_{QR}$ is ADU for $Y$ if and only if for all $y_1,\ldots,y_N$,

$$\sum_{I \in s} r_I \hat{u}_I = \sum_{I \in s} \pi_I^{-1} \hat{u}_I. \tag{5}$$

Result 1 gives an algebraic condition on $\pi_I$, $q_I$, and $r_I$ that guarantees that $\hat{Y}_{QR}$ is ADU for $Y$. Using Table 2, Result 1 shows that $\hat{Y}_{HTR}$, $\hat{Y}_{CR}$, $\hat{Y}_{GR}$ and $\hat{Y}_{RO}$ are all ADU for $Y$, but $\hat{Y}_{BLU}$ is not. The generalized regression estimator $\hat{Y}_{GR}$ was developed to retain the use of $\hat\beta_{BLU}$ as in $\hat{Y}_{BLU}$ while offering the ADU robustness provided by the more conventional design-based estimators $\hat{Y}_{HTR}$ and $\hat{Y}_{CR}$. Brewer's $\hat{Y}_{RO}$ was apparently introduced in order to retain the $r_I = 1$ used in $\hat{Y}_{BLU}$ while offering conventional robustness. So Result 1 helps to sort out features of these alternative estimators.

Result 1 also shows how to construct a large family of robust, model-based estimators, i.e., estimators that are model-based but still ADU. Specifically, $\hat{Y}_{QR}$ is ADU for $Y$ for any choice of $q_I$ as long as $r_I = \pi_I^{-1}$. Result 2 provides a converse to the previous statement; it states that whenever $\hat{Y}_{QR}$ is ADU for $Y$, any choice of $r_I$ is equivalent to choosing $r_I = \pi_I^{-1}$. In other words, as long as two estimators in the class (2) are ADU in our sense and use identical $q_I$, they are identical estimators; there is nothing to be gained from any choice of $r_I \neq \pi_I^{-1}$ except computational simplicity.

These results are consistent with the sound practice that has developed in applied regression analysis of examining the sample residuals for information about model misspecification. In using regression analysis for finite population inference, we find that robustness can be obtained by adding a Horvitz-Thompson-like residual correction $\sum_{I \in s} \pi_I^{-1} \hat{u}_I$ to the regression-based estimate $\hat\beta X$.
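As an illustration of Result 2, the following is a minimal sketch of the ratio-case QR estimator (2) in Python. The function name `qr_estimate` and all of the data are hypothetical and chosen only for illustration. The sketch evaluates Brewer's choice of $q_I$ with $r_I = 1$ and with $r_I = \pi_I^{-1}$, which Result 2 says must agree, and the BLU choice $q_I = \sigma_I^{-2}$ with the same two residual weights, which need not agree because the BLU estimator is not ADU.

```python
def qr_estimate(sample, x_total, q, r):
    """QR estimator (2) for the ratio case.

    sample  : list of (x_I, y_I) pairs for the sampled units
    x_total : known population total X of the auxiliary variable
    q, r    : weights q_I and r_I for the sampled units, in the same order
    """
    num = sum(qi * x * y for qi, (x, y) in zip(q, sample))
    den = sum(qi * x * x for qi, (x, y) in zip(q, sample))
    beta_hat = num / den                                  # WLS slope through the origin
    residuals = [y - beta_hat * x for x, y in sample]     # sample residuals u_I
    return beta_hat * x_total + sum(ri * u for ri, u in zip(r, residuals))


# Hypothetical sample of four units (illustrative values, not from the paper).
sample = [(2.0, 5.1), (4.0, 7.6), (7.0, 15.2), (10.0, 18.9)]
pi = [0.1, 0.2, 0.35, 0.5]        # assumed inclusion probabilities
sigma = [1.0, 1.5, 2.0, 2.5]      # assumed residual standard deviations
x_total = 120.0                   # assumed population total of x

ones = [1.0] * len(sample)
inv_pi = [1.0 / p for p in pi]
q_ro = [(1.0 - p) / (p * x) for p, (x, _) in zip(pi, sample)]   # Brewer (1979)
q_blu = [s ** -2 for s in sigma]                                # model-based WLS

y_ro = qr_estimate(sample, x_total, q_ro, ones)       # Brewer's estimator, r_I = 1
y_ro_gr = qr_estimate(sample, x_total, q_ro, inv_pi)  # same q_I, r_I = 1/pi_I
y_blu = qr_estimate(sample, x_total, q_blu, ones)     # BLU predictor
y_gr = qr_estimate(sample, x_total, q_blu, inv_pi)    # generalized regression

print(y_ro, y_ro_gr)   # agree up to rounding error
print(y_blu, y_gr)     # generally differ
```

With Brewer's $q_I$, the weighted residuals $\sum_{I \in s} q_I x_I \hat{u}_I$ vanish by construction of $\hat\beta$, which is why the two residual weightings coincide for every sample.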

To summarize so far, the choice of estimator in the ratio case reduces to two questions: (1) Do we want the robustness provided by an ADU estimator, or can we place faith in the model $\xi$ and use $\hat{Y}_{BLU}$? (2) If we want an ADU estimator $\hat{Y}_{QR}$, how should we choose the weight $q_I$ to be used in the WLS regression? The answer to the first question is highly dependent on the purpose and context of the project and the credibility of the model. The second question will be addressed subsequently by examining the asymptotic variance of $\hat{Y}_{QR}$.

2.2 Estimation with Multivariate Auxiliary Information

The preceding results extend easily to the multiple regression case. For added generality assume that the population parameter of interest is $a'y = \sum_{I=1}^{N} a_I y_I$, where $a_I$ is known for each unit in the population. For example, the $a_I$ may indicate a subpopulation of interest. Under $\xi$, we now assume that $y_I$ is related to a vector $X_I' = (x_{I1},\ldots,x_{Ik})$ of $k \ge 1$ auxiliary variables following a linear regression equation of the form

$$y_I = \beta_1 x_{I1} + \cdots + \beta_k x_{Ik} + u_I \tag{6}$$

with $E_\xi(u_I) = 0$, $E_\xi(u_I^2) = \sigma_I^2$, and $E_\xi(u_I u_J) = 0$ for $I \neq J$.

For any choice of $r_I \ge 0$ and any $q_I \ge 0$ such that $\left(\sum_{I \in s} q_I X_I X_I'\right)^{-1}$ exists for any possible $s$, define a QR estimator as

$$a'\hat{y}_{QR} = \sum_{I=1}^{N} a_I X_I' \hat\beta + \sum_{I \in s} r_I a_I \hat{u}_I, \tag{7}$$

where $\hat\beta = \left(\sum_{I \in s} q_I X_I X_I'\right)^{-1} \sum_{I \in s} q_I X_I y_I$ (a WLS estimator), and $\hat{u}_I = y_I - X_I'\hat\beta$.

Royall (1976) has shown that under $\xi$, the BLU model-based predictor of $a'y$ is $a'\hat{y}_{BLU}$, given by choosing $q_I = \sigma_I^{-2}$ and $r_I = 1$. Sarndal (1980

a,b) has considered generalized regression estimators that use various $q_I$ and $r_I = \pi_I^{-1}$. Isaki and Fuller (1982) have studied the estimator given by $q_I = \pi_I^{-2}$ and $r_I = 0$.

To examine the robustness of QR estimators, define $a'\hat{y}_{QR}$ to be ADU for $a'y$ if and only if for all $y_1,\ldots,y_N$,

$$\lim_{(n,N)\to\infty} E_p(a'\hat{y}_{QR}) = a'y.$$

Then the two results given for the ratio case can be generalized as follows:

Result 3. $a'\hat{y}_{QR}$ is ADU for $a'y$ if and only if there is some vector $\lambda = (\lambda_1,\ldots,\lambda_k)'$ such that for all $I = 1,\ldots,N$,

$$(1 - r_I \pi_I)\, a_I = q_I \pi_I X_I' \lambda. \tag{8}$$

Result 4. $a'\hat{y}_{QR}$ is ADU for $a'y$ if and only if for all $y_1,\ldots,y_N$,

$$\sum_{I \in s} r_I a_I \hat{u}_I = \sum_{I \in s} \pi_I^{-1} a_I \hat{u}_I. \tag{9}$$

Result 3 shows that $a'\hat{y}_{QR}$ is ADU for all population parameters $a'y$ if and only if $r_I = \pi_I^{-1}$, as in the generalized regression estimators. The BLU model-based predictor, using $q_I = \sigma_I^{-2}$ and $r_I = 1$, is ADU only if $(\pi_I^{-1} - 1)\,\sigma_I^2 a_I$ is equal to a linear combination of the auxiliary variables in $X_I$ throughout the population. The Isaki and Fuller estimator, using $q_I = \pi_I^{-2}$ and $r_I = 0$, is ADU if and only if $a_I \pi_I$ is a linear combination of $X_I$ throughout the population.

Result 4 shows that any ADU QR estimator (7) is identical to the generalized regression estimator that uses the same $q_I$ with $r_I = \pi_I^{-1}$. Proofs of Results 3 and 4 are given in Section 4.

3. ROBUST MODEL-BASED SAMPLE DESIGN

In sample design, as in estimation, the challenge is to develop an approach that takes advantage of the model (6) but is not totally dependent on the model's accuracy, i.e., a robust model-based approach to sample design.

Suppose for a moment that (6) is indeed considered to be unquestionably accurate. Then the sample plan ought to be designed to minimize the model-based expected variance of $a'\hat{y}_{BLU}$. Royall has shown that this leads to nonrandom sampling. In particular, under fairly common circumstances, the optimal design under $\xi$ is to select the $n$ units that have the largest residual standard deviation ($\sigma_I$) in the population. This is unacceptable in many applications because the slightest inaccuracy in $\xi$ will produce substantial but almost undetectable biases.

For a robust model-based approach, consider any estimator $a'\hat{y}_{QR}$ that is ADU for the parameter of interest $a'y$. At the planning stage, it is reasonable to utilize both the model $\xi$ and the proposed sample plan to evaluate the anticipated performance of the estimator. Nonlinearity makes exact analysis difficult, but for many purposes a large-sample approximation is satisfactory. So we define the asymptotic variance of $a'\hat{y}_{QR}$ to be

$$V(a'\hat{y}_{QR}) = \lim_{(n,N)\to\infty} E_\xi E_p (a'\hat{y}_{QR} - a'y)^2.$$

Result 5. If $a'\hat{y}_{QR}$ is ADU for $a'y$, then

$$V(a'\hat{y}_{QR}) = \sum_{I=1}^{N} a_I^2 (\pi_I^{-1} - 1)\, \sigma_I^2. \tag{10}$$

Note that the asymptotic variance of $a'\hat{y}_{QR}$ does not depend on the choice of $q_I$, provided that the estimator is ADU. Alternative choices of $q_I$ give estimators that have similar sampling distributions in large samples. This implies that if a sample design is based on Result 5, it will be

applicable to the entire class of robust QR estimators based on the model $\xi$. This class includes virtually all of the standard survey sampling procedures.

Result 5 can be written more attractively. Define the asymptotic standard error, se, to be $V(a'\hat{y}_{QR})^{1/2}$, and rescale the inclusion probabilities of the sample plan as

$$w_I = \pi_I / \mathrm{mean}(\pi) = N\pi_I / n, \tag{11}$$

where $\mathrm{mean}(\pi)$ denotes the finite population mean $N^{-1}\sum_{I=1}^{N} \pi_I$. Then (10) becomes

$$\mathrm{se} = \frac{N}{\sqrt{n}} \sqrt{\mathrm{mean}(a^2\sigma^2/w) - (n/N)\,\mathrm{mean}(a^2\sigma^2)}. \tag{12}$$

Equation (12) also provides qualitative insights that are useful for planning. As is usual in sampling, equation (12) shows that the standard error increases in proportion to the total number ($N$) of population units and is inversely proportional to the square root of the sample size ($n$). The term "$(n/N)\,\mathrm{mean}(a^2\sigma^2)$" generalizes the conventional finite population correction factor and is often negligible.

The remaining term in (12), "$\mathrm{mean}(a^2\sigma^2/w)$," reflects the interaction of the parameter of interest, the residual standard deviations assumed in the model $\xi$, and the inclusion probabilities of the sample plan. As Brewer and Sarndal have both noted, an efficient sampling plan can be developed by choosing the $\pi_I$ to minimize this term. Indeed the Cauchy-Schwarz inequality implies that

$$\left(\sum_{I=1}^{N} a_I^2\sigma_I^2 / w_I\right) \left(\sum_{I=1}^{N} w_I\right) \ge \left(\sum_{I=1}^{N} |a_I|\sigma_I\right)^2,$$

or equivalently, since $\mathrm{mean}(w) = 1$, that $\mathrm{mean}(a^2\sigma^2/w) \ge \mathrm{mean}^2(|a|\sigma)$. Here the lower bound is achieved if and only if $w_I$ is proportional to $|a_I|\sigma_I$, or specifically if

$$w_I = |a_I|\sigma_I / \mathrm{mean}(|a|\sigma), \quad\text{and}\quad \pi_I = n|a_I|\sigma_I / \left(N\,\mathrm{mean}(|a|\sigma)\right). \tag{13}$$

A sampling plan is said to be best for $a'y$ under $\xi$ if and only if it satisfies (13). For a best sampling plan,

$$\mathrm{se} = \frac{N}{\sqrt{n}} \sqrt{\mathrm{mean}^2(|a|\sigma) - (n/N)\,\mathrm{mean}(a^2\sigma^2)}. \tag{14}$$

These results can be summarized in terms of the relevance of each unit in the population to the parameter of interest, where the relevance of unit $I$ is defined to be the quantity $|a_I|\sigma_I$. Then (13) implies that a best sampling plan selects each unit with probability proportional to its relevance. Moreover, (14) shows how to calculate the standard error of a best sampling plan from the distribution of relevance in the finite population.

There are sometimes good reasons to consider a sampling plan that is not best. The efficiency (eff) of any such plan can be defined to be the ratio $n_b/n$, where $n$ is the sample size required to achieve a certain standard error using the plan under consideration, and $n_b$ is the sample size required to achieve the same standard error with the best sampling plan. Suppose that $w_I$ describes the plan under consideration. Then

$$\mathrm{eff} = \mathrm{mean}^2(|a|\sigma) / \mathrm{mean}(a^2\sigma^2/w). \tag{15}$$

In particular, the efficiency of a simple random sampling plan is $(1 + CV^2)^{-1}$, where $CV$ is the finite population coefficient of variation of $|a_I|\sigma_I$.
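The planning quantities above can be computed directly from the relevances. The following is a minimal Python sketch, under the assumption of a small hypothetical population of eight units with $a_I = 1$; the function names and the data are illustrative only. It builds the best plan (13), evaluates the efficiency (15) of the best plan and of simple random sampling, and evaluates the standard error (12) for both.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def best_inclusion_probs(rel, n):
    """Equation (13): pi_I = n * relevance_I / (N * mean relevance)."""
    total = sum(rel)
    return [n * r / total for r in rel]


def efficiency(rel, pi):
    """Equation (15), with w_I = N * pi_I / n as in (11)."""
    N, n = len(pi), sum(pi)
    w = [N * p / n for p in pi]
    return mean(rel) ** 2 / mean([r * r / wi for r, wi in zip(rel, w)])


def asymptotic_se(rel, pi):
    """Equation (12); note that a_I^2 * sigma_I^2 is the squared relevance."""
    N, n = len(pi), sum(pi)
    w = [N * p / n for p in pi]
    inner = mean([r * r / wi for r, wi in zip(rel, w)]) \
        - (n / N) * mean([r * r for r in rel])
    return N / sqrt(n) * sqrt(inner)


# Hypothetical population (illustrative values): a_I = 1, sigma_I as listed.
a = [1.0] * 8
sigma = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
rel = [abs(ai) * si for ai, si in zip(a, sigma)]   # relevance |a_I| * sigma_I
n = 3

pi_best = best_inclusion_probs(rel, n)
pi_srs = [n / len(rel)] * len(rel)                 # simple random sampling

eff_best = efficiency(rel, pi_best)   # the best plan has efficiency 1
eff_srs = efficiency(rel, pi_srs)     # equals 1 / (1 + CV^2)
se_best = asymptotic_se(rel, pi_best)
se_srs = asymptotic_se(rel, pi_srs)
print(eff_best, eff_srs, se_best, se_srs)
```

In this hypothetical population the largest best-plan inclusion probability is 0.6, so (13) is feasible; when relevance is very unevenly distributed, (13) can call for probabilities above one, a case this sketch does not handle.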

Equation (13) can be regarded as a generalization of Neyman allocation for stratified sampling. Consider the special case in which the population parameter of interest is the population total, so that all $a_I = 1$; and suppose strata can be defined such that the $\sigma_I$ are constant within strata. Then (13) gives the Neyman allocation. Thus, stratification with Neyman allocation is a best sampling plan in this special situation.

More generally, stratification is a very useful technique for developing convenient sampling plans that are highly efficient, i.e., nearly best for any population parameter and any model $\xi$. In general, relevance, $|a_I|\sigma_I$, can vary almost continuously through the population. However, strata can be constructed so that the units within each stratum are nearly equally relevant. In such a case, a stratified sampling plan based on the mean within-strata relevance is highly efficient. This principle provides a direct model-based method of constructing strata.

To see this precisely, consider any specific stratification of the population into $H$ strata with $N_h$ units in stratum $h$. Let $CV_h$ be the coefficient of variation of $|a_I|\sigma_I$ within stratum $h$, so that

$$1 + CV_h^2 = \mathrm{mean}_h(a^2\sigma^2) / \mathrm{mean}_h^2(|a|\sigma). \tag{16}$$

Here $\mathrm{mean}_h$ denotes the population mean within stratum $h$. A stratification is said to be strong if $\varepsilon = \max_{1 \le h \le H}(CV_h)$ is small. For a strong stratification, the sample size $n_h$ allocated to each stratum $h$ should be

$$n_h = n N_h\, \mathrm{mean}_h(|a|\sigma) / \left(N\, \mathrm{mean}(|a|\sigma)\right). \tag{17}$$

In other words, with a strong stratification, the sampling fractions $n_h/N_h$ should be proportional to the mean relevance of the units within each stratum; or equivalently, the total sample size should be allocated to each stratum in proportion to the total relevance of the units within each stratum.
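A minimal Python sketch of allocation (17) follows. The strata here are formed by sorting units by relevance and cutting the sorted list where cumulative relevance crosses multiples of $1/H$ of the total; that rule, the function names, and the data (relevances 1 through 40) are assumptions made for illustration. Sample sizes are left unrounded, and the efficiency is evaluated from (15) using $w_I = (n_h/N_h)/(n/N)$ for units in stratum $h$.

```python
def equal_relevance_strata(rel, H):
    """Sort units by relevance and cut into H strata of roughly equal total relevance.

    Returns a list of H lists of unit indices. A stratum can come out empty
    if a single unit holds more than 1/H of the total relevance.
    """
    order = sorted(range(len(rel)), key=lambda i: rel[i])
    total = sum(rel)
    strata = [[] for _ in range(H)]
    cum = 0.0
    for i in order:
        # Stratum index from the cumulative relevance before adding unit i.
        h = min(int(H * cum / total), H - 1)
        strata[h].append(i)
        cum += rel[i]
    return strata


def allocate(rel, strata, n):
    """Equation (17): n_h proportional to the total relevance in stratum h."""
    total = sum(rel)
    return [n * sum(rel[i] for i in s) / total for s in strata]


def stratified_efficiency(rel, strata, n_h):
    """Equation (15) with w_I = (n_h / N_h) / (n / N) for units in stratum h."""
    N, n = len(rel), sum(n_h)
    mean_rel = sum(rel) / N
    mean_ratio = 0.0
    for s, nh in zip(strata, n_h):
        w = (nh / len(s)) / (n / N)
        mean_ratio += sum(rel[i] ** 2 / w for i in s) / N
    return mean_rel ** 2 / mean_ratio


# Hypothetical relevances |a_I| * sigma_I for 40 units (illustrative values).
rel = [float(i) for i in range(1, 41)]
strata = equal_relevance_strata(rel, H=4)
n_h = allocate(rel, strata, n=20)
eff_strat = stratified_efficiency(rel, strata, n_h)

# Efficiency of simple random sampling for comparison, (1 + CV^2)^(-1).
eff_srs = (sum(rel) / len(rel)) ** 2 / (sum(r * r for r in rel) / len(rel))
print([len(s) for s in strata], n_h, eff_strat, eff_srs)
```

In practice the $n_h$ would be rounded to integers and capped at $N_h$, which this sketch deliberately omits.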

Any strong stratification with allocation following (17) will be highly efficient. In fact, (17) implies that for any unit $I$ in stratum $h$,

$$w_I = \mathrm{mean}_h(|a|\sigma) / \mathrm{mean}(|a|\sigma); \tag{18}$$

then (15) and (16) give

$$\mathrm{eff} = \frac{\mathrm{mean}(|a|\sigma)}{N^{-1}\sum_{h=1}^{H} N_h (1 + CV_h^2)\, \mathrm{mean}_h(|a|\sigma)} \ge (1 + \varepsilon^2)^{-1}. \tag{19}$$

It is often convenient to construct strata with an equal number of sample units in all strata. With the allocation in (17), the $n_h$ will be equal as long as the total relevance of units is constant from stratum to stratum. So a suitable stratification can be constructed by sorting the population in order of relevance, and then dividing the population into $H$ strata, each containing $H^{-1}$ of the total relevance. For example, to form ten strata, each stratum should contain 10 percent of the total relevance. The efficiency of the design can be made as high as desired by increasing $H$, but as few as ten strata are often adequate.

4. DERIVATION OF KEY RESULTS

4.1 Notation

The purpose of this section is to provide a derivation of Results 1-5. The analysis builds on Brewer (1979) and Sarndal (1980a). Additional vector notation is useful. Define $a = [a_1 \ldots a_N]'$, $y = [y_1 \ldots y_N]'$, and $u = [u_1 \ldots u_N]'$. Let $X = [X_1 \ldots X_N]'$, the $(N \times k)$ matrix of auxiliary information. Also, define the following $(N \times N)$ diagonal matrices: $\Sigma = \mathrm{diag}(\sigma_I^2)$; $\Pi = \mathrm{diag}(\pi_I)$; $Q = \mathrm{diag}(q_I)$; $R = \mathrm{diag}(r_I)$; and $\Delta = \mathrm{diag}(\delta_I)$, where $\delta_I = 1$ if $I \in s$ and $\delta_I = 0$ if $I \notin s$. In this notation model $\xi$ is

$$y = X\beta + u, \quad E_\xi(u) = 0, \quad E_\xi(uu') = \Sigma, \ \text{assumed known.} \tag{20}$$

A sampling strategy is characterized by the triplet $(\Pi, Q, R)$. To estimate the population characteristic $a'y$, we use the estimator $a'\hat{y}_{QR}$ with

$$\hat{y}_{QR} = X\hat\beta + R\Delta\hat{u}, \quad \hat\beta = (X'Q\Delta X)^{-1} X'Q\Delta y, \quad\text{and}\quad \hat{u} = y - X\hat\beta. \tag{21}$$

It is assumed that the sample size $n = \sum_{I=1}^{N} \pi_I$ is fixed, and that $X'Q\Delta X$ is nonsingular for all $s$ with nonzero probability of occurrence.

A substantial advantage of model-based analysis lies in the strong links that are established with linear statistical inference. With the added definitions

$$C = X(X'Q\Delta X)^{-1}X'Q, \quad\text{and}\quad T = C + R - R\Delta C, \tag{22}$$

we have $X\hat\beta = C\Delta y$, $\hat{y}_{QR} = T\Delta y$, $C\Delta X = X$, and

$$I - T\Delta = (I - R\Delta)(I - C\Delta) = (I - \Delta) + (I - T)\Delta.$$

The prediction error $a'y - a'\hat{y}_{QR} = a'(I - T\Delta)y$ reduces to $a'(I - T\Delta)u$ under $\xi$, since $(I - C\Delta)X = 0$. This implies that $a'\hat{y}_{QR}$ is a $\xi$-unbiased predictor of $a'y$, with the mean squared error

$$a'(I - T\Delta)\Sigma(I - \Delta T')a = a'(I - \Delta)\Sigma(I - \Delta)a + a'(I - T)\Sigma\Delta(I - T')a. \tag{23}$$

As Royall and others have shown, this is minimized by using a BLU prediction strategy, $(\Pi, \Sigma^{-1}, I)$, which is, of course, conditional on the sample $s$.

4.2 ADU Estimators

In dealing with finite population sampling, care must be exercised in defining the context of asymptotic analysis. Various approaches can be utilized for letting the sizes of the sample and population both increase while the sampling fraction remains more or less fixed. We will follow a formulation introduced by Brewer (1979) and used by Sarndal (1980a).

For the asymptotic analysis, the population of interest is assumed to consist of $N^* = mN$ units composed of $m$ blocks of $N$ units. Each of these $m$ blocks is assumed to have an identical matrix $X$ of auxiliary information. For model-based analysis, (20) is used to generate $m$ independent realizations of the vector $y$, say $y_j$, $j = 1,\ldots,m$. However, to make the definition of ADU independent of the model, in this subsection the $y_j$ are assumed to be identical copies of some $y$. The vector $a$ is assumed to be identical across blocks so that the population parameter of interest, say $a'y^*$, can be written as $\sum_{j=1}^{m} a'y_j$, or simply $m\,a'y$ in this subsection.

Similarly it is assumed that the strategy $(\Pi, Q, R)$ is identical across blocks, and in particular that a sample of size $n^* = mn$ is selected with first-order inclusion probabilities following $\Pi$ within each block. The matrix $\Delta_j$ indicates the units in the sample from block $j$, and the estimator $a'\hat{y}^*$ is formed following (21) as

$$a'\hat{y}^* = \sum_{j=1}^{m} a'\hat{y}_j, \tag{24}$$

where

$$\hat{y}_j = X\hat\beta^* + R\Delta_j\hat{u}_j, \quad \hat\beta^* = \left(\sum_{j=1}^{m} X'Q\Delta_j X\right)^{-1} \sum_{j=1}^{m} X'Q\Delta_j y_j, \quad\text{and}\quad \hat{u}_j = y_j - X\hat\beta^*.$$

We also define

$$\Pi^* = m^{-1}\sum_{j=1}^{m} \Delta_j, \quad C^* = X(X'Q\Pi^*X)^{-1}X'Q, \quad\text{and}\quad T^* = C^* + R - R\Pi^*C^*, \tag{25}$$

so that

$$a'\hat{y}^* = \sum_{j=1}^{m} a'T^*\Delta_j y_j = m\,a'T^*\Pi^*y \quad\text{if the } y_j \text{ are identical.}$$

With this formulation, the strategy $(\Pi, Q, R)$ is said to be ADU for the characteristic $a$ if and only if for all $y$, $E_p(a'\hat{y}^*/m)$ converges to $a'y$ as $m$ increases to infinity. As $m$ increases, $\Pi^*$ converges almost surely to $\Pi$. The assumption that $X'Q\Delta X$ is nonsingular for all samples that can occur under $\Pi$ implies that $C^*$ is bounded and converges almost surely to $C = X(X'Q\Pi X)^{-1}X'Q$. So $T^*$ converges almost surely to $T = C + R - R\Pi C$ and $E_p(a'\hat{y}^*/m)$ converges to $a'T\Pi y$. Thus,

Lemma. The strategy $(\Pi, Q, R)$ is asymptotically design unbiased (ADU) for the characteristic $a$ if and only if $a'(I - T\Pi)y = 0$ for all $y \in R^N$.

It is helpful to note that in its derivation, this lemma describes each block of the population, but once the derivation is complete, the lemma can be considered to describe the entire population of interest.

An immediate consequence of the lemma is that for any strategy $(\Pi, Q, R)$ that is ADU for $a$, $\pi_I = 0$ implies $a_I = 0$. Any unit with both $\pi_I = 0$ and $a_I = 0$ is clearly irrelevant and can be eliminated from the population. Because we are primarily interested in ADU strategies, it is assumed henceforth that $\Pi > 0$.

An algebraic characterization of ADU strategies can be developed from the identity $I - T\Pi = (I - R\Pi)(I - C\Pi)$. Suppose initially that $Q > 0$, so that $Q\Pi$ defines an inner product over $R^N$. In this case $C\Pi = X(X'Q\Pi X)^{-1}X'Q\Pi$

is the orthogonal projector onto the linear manifold $M(X)$ spanned by the column vectors of $X$, and $I - C\Pi$ is the projector onto the linear manifold orthogonal to $M(X)$ with respect to the inner product $Q\Pi$. Since

$$a'(I - T\Pi)y = a'(I - R\Pi)(Q\Pi)^{-1}\, Q\Pi\, (I - C\Pi)y,$$

$(\Pi, Q, R)$ is ADU for $a$ if and only if $(Q\Pi)^{-1}(I - R\Pi)a \in M(X)$, or equivalently, $(I - R\Pi)a = Q\Pi x$ for some $x \in M(X)$. The restriction $Q > 0$ can easily be relaxed, giving

Theorem 1. A strategy $(\Pi, Q, R)$ is ADU for the characteristic $a$ if and only if $(I - R\Pi)a = Q\Pi x$ for some $x \in M(X)$.

This gives Results 1 and 3 of Section 2.

Two strategies, $(\Pi, Q_1, R_1)$ and $(\Pi, Q_2, R_2)$, are said to be equivalent for $a$ if and only if they produce identical estimates of $a'y$ for all $y$ and all samples with positive probability of occurrence. Using (21) and (22), two strategies that employ identical $Q$ are equivalent if and only if

$$a'(R_1 - R_2)\Delta\hat{u} = a'(R_1 - R_2)\Delta(I - C\Delta)y = 0$$

for all $s$ and all $y$. Using an argument similar to the proof of Theorem 1, this is true if and only if $(R_1 - R_2)a = Qx$ for some $x \in M(X)$. However, Theorem 1 shows that a strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if $(\Pi^{-1} - R)a = Qx$, $x \in M(X)$. This proves

Theorem 2. A strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if $(\Pi, Q, R)$ and the generalized regression strategy $(\Pi, Q, \Pi^{-1})$ are equivalent for $a$.

This gives Results 2 and 4 of Section 2.
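The Lemma can be checked numerically in the ratio case ($k = 1$), where $X$ is a single column $x$ and the row vector $a'(I - T\Pi) = a'(I - R\Pi)(I - C\Pi)$ has a simple closed form. The following Python sketch is illustrative only: the function name `adu_defect` and the data are hypothetical, and the five $(q_I, r_I)$ choices are those implied by the estimator definitions in Table 1. A zero vector indicates an ADU strategy.

```python
def adu_defect(x, a, pi, q, r):
    """Elements of the row vector a'(I - T Pi) for a single auxiliary variable.

    With b_I = a_I (1 - r_I pi_I) and D = x'Q Pi x, the J-th element is
    b_J - (sum_I b_I x_I) * q_J pi_J x_J / D.
    """
    D = sum(qi * p * xi * xi for qi, p, xi in zip(q, pi, x))
    b = [ai * (1.0 - ri * p) for ai, ri, p in zip(a, r, pi)]
    c = sum(bi * xi for bi, xi in zip(b, x))
    return [bi - c * qi * p * xi / D for bi, qi, p, xi in zip(b, q, pi, x)]


# Hypothetical population of five units (illustrative values).
x = [2.0, 4.0, 7.0, 10.0, 5.0]
pi = [0.1, 0.2, 0.35, 0.5, 0.25]
sigma = [1.0, 1.5, 2.0, 2.5, 1.2]
a = [1.0] * 5                          # population total

zeros, ones = [0.0] * 5, [1.0] * 5
inv_pi = [1.0 / p for p in pi]
strategies = {
    "HTR": ([1.0 / (p * xi) for p, xi in zip(pi, x)], zeros),
    "CR": (inv_pi, inv_pi),
    "BLU": ([s ** -2 for s in sigma], ones),
    "GR": ([s ** -2 for s in sigma], inv_pi),
    "RO": ([(1.0 - p) / (p * xi) for p, xi in zip(pi, x)], ones),
}

max_defect = {name: max(abs(d) for d in adu_defect(x, a, pi, q, r))
              for name, (q, r) in strategies.items()}
for name, value in max_defect.items():
    print(name, value)   # essentially zero except for BLU
```

For these data $(1 - \pi_I)$ is not proportional to $\pi_I x_I/\sigma_I^2$, so the condition of Theorem 1 fails for the BLU strategy, which the nonzero defect reflects.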

4.3 Efficiency of ADU Strategies

Within the class of ADU strategies, a useful planning criterion is the asymptotic variance of $a'\hat{y}_{QR}$, denoted $V(a'\hat{y}_{QR})$. Here $V(a'\hat{y}_{QR})$ is defined to be the asymptotic expectation, with respect to both design and model, of the mean square prediction error of $a'\hat{y}_{QR}$. To develop the asymptotic analysis we must return to the assumption that the population comprises $m$ blocks, as in the previous subsection, but with $y_j$ independently generated within each block following (20). In this case, there are $m$ independent $u_j$ with $E_\xi(u_j) = 0$ and $E_\xi(u_j u_j') = \Sigma$, $j = 1,\ldots,m$.

To examine the square error $(a'\hat{y}^* - a'y^*)^2$, use (25) to note that

$$\sum_{j=1}^{m} (y_j - \hat{y}_j) = \sum_{j=1}^{m} (I - T^*\Delta_j)u_j,$$

since

$$\sum_{j=1}^{m} (I - T^*\Delta_j)X = m(I - T^*\Pi^*)X = m(I - R\Pi^*)(I - C^*\Pi^*)X = 0.$$

A derivation similar to that of (23) implies

$$m^{-1}E_\xi(a'\hat{y}^* - a'y^*)^2 = m^{-1}E_\xi\Big[\sum_{j=1}^{m} a'(\hat{y}_j - y_j)\Big]^2 = m^{-1}\sum_{j=1}^{m} a'(I - T^*\Delta_j)\Sigma(I - \Delta_j T^{*\prime})a$$

$$= a'(I - \Pi^*)\Sigma a + a'(I - T^*)\Pi^*\Sigma(I - T^{*\prime})a.$$

Now the asymptotic design-based expectation can be evaluated as in the previous subsection, giving

$$\lim_{m\to\infty} m^{-1}E_pE_\xi(a'\hat{y}^* - a'y^*)^2 = a'(I - \Pi)\Sigma a + a'(I - T)\Pi\Sigma(I - T')a. \tag{26}$$

Given that $(\Pi, Q, R)$ is ADU for $a$, $a'T\Pi y = a'y$ for all $y \in R^N$, so that $a'T = a'\Pi^{-1}$; hence (26) simplifies to $a'(\Pi^{-1} - I)\Sigma a$.

Note that this expression represents a summation over a single block, so that the corresponding summation over the entire population is $m$ times larger. Thus, in terms of the entire population we have

Theorem 3. If $(\Pi, Q, R)$ is ADU for the characteristic $a$, then the asymptotic variance of $(\Pi, Q, R)$ for $a$ is

$$V(a'\hat{y}) = a'(\Pi^{-1} - I)\Sigma a = \sum_{I=1}^{N} a_I^2 (\pi_I^{-1} - 1)\, \sigma_I^2. \tag{27}$$

This gives Result 5 of Section 3.

Theorem 3 has been proven previously for $a = 1$ with specific choices of $q_I$ and $r_I$. Brewer (1979) considered the case $k = 1$, $q_I = (\pi_I x_I)^{-1}(1 - \pi_I)$, and $r_I = 1$, as discussed in Section 2.1. Sarndal (1981a) obtained the result for $q_I = \pi_I^{-1}$, $\sigma_I^{-2}$, and $\pi_I^{-1}\sigma_I^{-2}$. Isaki and Fuller (1982) obtained (27) for $q_I = \pi_I^{-2}$ when $\pi_I$ follows (13), using a somewhat different asymptotic argument.
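The reduction of (26) to (27) can be checked numerically for $k = 1$. The Python sketch below is an illustration under assumed data: the function names and the five-unit population are hypothetical. It evaluates the right-hand side of (26) element by element, using $T = C + R - R\Pi C$ with $C = x(x'Q\Pi x)^{-1}x'Q$, and compares it with (27) for two ADU strategies and for the BLU strategy, which is not ADU here.

```python
def limit_26(x, a, sigma, pi, q, r):
    """Right-hand side of (26) for a single auxiliary variable x."""
    D = sum(qi * p * xi * xi for qi, p, xi in zip(q, pi, x))      # x'Q Pi x
    c = sum(ai * (1.0 - ri * p) * xi for ai, ri, p, xi in zip(a, r, pi, x))
    # J-th element of the row vector a'T.
    aT = [ai * ri + c * xi * qi / D for ai, ri, xi, qi in zip(a, r, x, q)]
    first = sum(ai ** 2 * (1.0 - p) * s ** 2 for ai, p, s in zip(a, pi, sigma))
    second = sum((ai - t) ** 2 * p * s ** 2
                 for ai, t, p, s in zip(a, aT, pi, sigma))
    return first + second


def variance_27(a, sigma, pi):
    """Equation (27): sum of a_I^2 (1/pi_I - 1) sigma_I^2."""
    return sum(ai ** 2 * (1.0 / p - 1.0) * s ** 2
               for ai, p, s in zip(a, pi, sigma))


# Hypothetical population of five units (illustrative values).
x = [2.0, 4.0, 7.0, 10.0, 5.0]
pi = [0.1, 0.2, 0.35, 0.5, 0.25]
sigma = [1.0, 1.5, 2.0, 2.5, 1.2]
a = [1.0] * 5

ones = [1.0] * 5
inv_pi = [1.0 / p for p in pi]
q_blu = [s ** -2 for s in sigma]
q_ro = [(1.0 - p) / (p * xi) for p, xi in zip(pi, x)]

v27 = variance_27(a, sigma, pi)
v_gr = limit_26(x, a, sigma, pi, q_blu, inv_pi)   # generalized regression (ADU)
v_ro = limit_26(x, a, sigma, pi, q_ro, ones)      # Brewer's estimator (ADU)
v_blu = limit_26(x, a, sigma, pi, q_blu, ones)    # BLU (not ADU for these data)
print(v27, v_gr, v_ro, v_blu)
```

The two ADU strategies use different $q_I$ yet give the same value, consistent with the remark after Result 5 that the asymptotic variance does not depend on $q_I$.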

REFERENCES

Anderson, D. W., L. Kish and R. G. Cornell, (1980), "On Stratification, Grouping, and Matching," Scandinavian Journal of Statistics, 7, 61-66.

Brewer, K. R. W., (1963), "Ratio Estimation in Finite Populations: Some Results Deducible from the Assumption of an Underlying Stochastic Process," Australian Journal of Statistics, 5, 93-105.

Brewer, K. R. W., (1979), "A Class of Robust Sampling Designs for Large-Scale Surveys," Journal of the American Statistical Association, 74, 911-915.

Cassel, C. M., C. E. Sarndal and J. H. Wretman, (1976), "Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations," Biometrika, 63, 615-620.

Cassel, C. M., C. E. Sarndal and J. H. Wretman, (1977), Foundations of Inference in Survey Sampling, New York: John Wiley & Sons.

Cochran, W. G., (1961), "Comparison of Methods for Determining Stratum Boundaries," Bulletin of the International Statistical Institute, 38, 345-358.

Dalenius, T. and J. L. Hodges, Jr., (1959), "Minimum Variance Stratification," Journal of the American Statistical Association, 54, 88-101.

Hajek, J., (1971), "Comment on an Essay by D. Basu," in Foundations of Statistical Inference, eds. V. P. Godambe and D. A. Sprott. Toronto: Holt, Rinehart & Winston.

Hansen, M. H., W. G. Madow and B. J. Tepping, (1978), "On Inference and Estimation from Sample Surveys," Proceedings of the Survey Research Section, American Statistical Association, 82-107.

Harvey, A. C., (1976), "Estimating Regression Models with Multiplicative Heteroscedasticity," Econometrica, 44, 461-464.

Isaki, C. T. and W. A. Fuller, (1982), "Survey Design under a Regression Superpopulation Model," Journal of the American Statistical Association, 77, 89-96.

Rao, T. J., (1977), "Optimum Allocation of Sample Size and Prior Distributions: A Review," International Statistical Review, 45, 173-179.

Royall, R. M., (1970), "On Finite Population Sampling Theory under Certain Linear Regression Models," Biometrika, 57, 377-387.

Royall, R. M., (1971), "Linear Regression Models in Finite Population Sampling Theory," in Foundations of Statistical Inference, eds. V. P. Godambe and D. A. Sprott. Toronto: Holt, Rinehart & Winston.

Royall, R. M., (1976), "The Linear Least Squares Prediction Approach to Two-Stage Sampling," Journal of the American Statistical Association, 71, 657-664.

Royall, R. M. and J. Herson, (1973a), "Robust Estimation in Finite Populations," Journal of the American Statistical Association, 68, 880-889.

Royall, R. M. and J. Herson, (1973b), "Robust Estimation in Finite Populations, II: Stratification on a Size Variable," Journal of the American Statistical Association, 68, 891-893.

Sarndal, C. E., (1980a), "On π-Inverse Weighting versus Best Linear Unbiased Weighting in Probability Sampling," Biometrika, 67, 639-650.

Sarndal, C. E., (1980b), "A Two-way Classification of Regression Estimation Strategies in Probability Sampling," The Canadian Journal of Statistics, 8, 165-177.

Singh, R., (1975), "On Optimal Stratification for Proportional Allocation," Sankhya, 37, C, Pt. 1, 109-115.

Smith, T. M. F., (1976), "The Foundations of Survey Sampling, A Review," Journal of the Royal Statistical Society, A, 139, Part 2, 183-204.

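Table 1. Various Estimators Proposed for the Ratio Model

a. Horvitz-Thompson Ratio (Hajek, 1971)

$$\hat{Y}_{HTR} = \hat\beta_{HT} X, \quad\text{where } \hat\beta_{HT} = \hat{Y}_{HT}/\hat{X}_{HT},$$

with $\hat{Y}_{HT} = \sum_{I \in s} \pi_I^{-1} y_I$ and $X = \sum_{I=1}^{N} x_I$.

b. Combined Regression through the Origin

$$\hat{Y}_{CR} = \hat{Y}_{HT} + \hat\beta_{CR}(X - \hat{X}_{HT}),$$

with $\hat\beta_{CR} = \sum_{I \in s} \pi_I^{-1} x_I y_I \big/ \sum_{I \in s} \pi_I^{-1} x_I^2$.

c. Best Linear Unbiased (Royall, 1970)

$$\hat{Y}_{BLU} = \sum_{I \in s} y_I + \hat\beta_{BLU} \sum_{I \notin s} x_I,$$

with $\hat\beta_{BLU} = \sum_{I \in s} \sigma_I^{-2} x_I y_I \big/ \sum_{I \in s} \sigma_I^{-2} x_I^2$.

d. Generalized Regression (Cassel, Sarndal, and Wretman, 1976)

$$\hat{Y}_{GR} = \hat{Y}_{HT} + \hat\beta_{BLU}(X - \hat{X}_{HT}).$$

e. Another Robust Estimator (Brewer, 1979)

$$\hat{Y}_{RO} = \sum_{I \in s} y_I + \hat\beta_{RO} \sum_{I \notin s} x_I,$$

where $\hat\beta_{RO} = \sum_{I \in s} q_I x_I y_I \big/ \sum_{I \in s} q_I x_I^2$ with $q_I = (\pi_I x_I)^{-1}(1 - \pi_I)$.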

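Table 2. Choice of $q_I$ and $r_I$ for the Estimators in Table 1

Estimator               $q_I$                               $r_I$
a. $\hat{Y}_{HTR}$      $(\pi_I x_I)^{-1}$                  $0$
b. $\hat{Y}_{CR}$       $\pi_I^{-1}$                        $\pi_I^{-1}$
c. $\hat{Y}_{BLU}$      $\sigma_I^{-2}$                     $1$
d. $\hat{Y}_{GR}$       $\sigma_I^{-2}$                     $\pi_I^{-1}$
e. $\hat{Y}_{RO}$       $(\pi_I x_I)^{-1}(1 - \pi_I)$       $1$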