Division of Research
Graduate School of Business Administration
The University of Michigan
February, 1981

Sample Design with Multivariate Auxiliary Information
Working Paper No. 247
Roger L. Wright
The University of Michigan

FOR DISCUSSION PURPOSES ONLY
None of this material is to be quoted or reproduced without the express permission of the Division of Research.

ABSTRACT

Strategies are investigated for planning large administrative sample surveys of populations having known auxiliary variables related to the target variable through a linear superpopulation model. Both model-based linear prediction strategies and design-based generalized regression strategies are embedded within a class of strategies combining weighted least squares regression estimators and varying probability sample designs. Strategies are identified which provide asymptotically design unbiased (ADU) estimators regardless of the validity of the assumed model. The model-based asymptotic efficiency of these ADU strategies is related to the sample design. Practical stratified sampling plans are proposed which utilize inclusion probabilities related to a simple measure of the relevance of units. These plans generalize equal aggregate size rules for constructing stratified sampling plans. This methodology is illustrated in the context of utility load research and cost accounting.

KEY WORDS: Balanced sampling, Cost accounting, Load research, Regression estimators, Robustness, Stratification, Superpopulation models, Unequal probability sampling.

Author's Footnote: Roger L. Wright is Associate Professor of Statistics, Graduate School of Business Administration, The University of Michigan, Ann Arbor, MI 48109. This paper was prepared with the support of the U.S. Department of Energy, Grant No. DE-FG02-80ER10125. However, any opinions, findings, conclusions, or recommendations expressed herein are those of the author and do not necessarily reflect the views of the D.O.E. The author wishes to thank K.R.W. Brewer, Graham Kalton, and Leslie Kish for valuable comments on earlier versions of this paper.

1. INTRODUCTION

Often in management a project is undertaken to collect data on a sampling basis to augment an existing administrative database. For example, an accountant may need to estimate the current value of assets, a utility cost-of-service study may require estimates of the usage of electricity by time-of-day, or a marketing study may be undertaken to estimate the potential sales of a new product to established customers. Typically, additional relevant data, e.g. past sales of related products, are available within the administrative database for each unit in the population. This auxiliary information can be exploited to produce more reliable sample estimates. One approach is to use the sample data to estimate a regression model relating the target variable to the relevant auxiliary information, and then use the estimated regression equation to extend the target variable to the unsampled part of the population.

This use of auxiliary information can usually be anticipated when the project is planned. Past experience often indicates the character of the regression relationship to be expected. This expected regression relationship can be utilized to choose the size of the sample and to develop an efficient sampling procedure. In many cases, a single-stage sample can be selected directly from the frame provided by the administrative database, so that the sampling plan is almost completely characterized by the probability of including each unit in the sample. These inclusion probabilities can be effectively chosen in accordance with the relevance of each unit as determined by the expected regression model and the population characteristic to be estimated.

Planning these administrative sampling projects is similar in many respects to planning the establishment surveys of public agencies, e.g. the U.S. Bureau of Labor Statistics' Current Employment Survey. The differences are a

matter of degree, most importantly the greater reliance on the administrative database both as a sampling frame and as a basis for estimation through modeling. Moreover, while most of the public establishment surveys utilize rotating sample designs, these more complex designs are fairly rare in management applications.

The management sampling applications that are of interest, then, share the following characteristics:

...A single-stage, stratified sampling plan is to be used to select a sample $s$ from a finite population comprised of $N$ units labeled $I = 1,\ldots,N$.

...The purpose of the project is to estimate a finite population characteristic of the form $\sum_{I=1}^{N} a_I y_I$ with $a_I$ known. This may be the population total $\sum_{I=1}^{N} y_I$, a subclass total or mean, a difference between subclass totals or means, or even a more complex characteristic such as a finite population regression coefficient.

...Past experience suggests that the target variable $y_I$ is closely related to a vector $x_I = (x_{I1} \ldots x_{Ik})' \in R^k$ of $k$ auxiliary variables which are known throughout the population. The relationship between the target variable and the auxiliary variables is thought to be well described by a regression model $\xi$: $y_I = x_I'\beta + u_I$ with unknown $\beta \in R^k$. Under $\xi$, the $u_I$ are assumed to be random variables satisfying $E(u_I) = 0$, $E(u_I^2) = \sigma_I^2 > 0$, and $E(u_I u_J) = 0$ if $I \ne J$.

...The $\sigma_I$ can be regarded as known, at least up to a constant of proportionality. In practice the $\sigma_I$ are often assumed to be proportional to some known measure of size. More generally, past experience may suggest a particular functional relationship between $\sigma_I$ and multivariate auxiliary information (Harvey, 1976).

...A sampling strategy is to be developed which is to be efficient in the sense that it can be expected to provide a highly reliable estimator of $\sum_{I=1}^{N} a_I y_I$ if $\xi$ is accurate, but which is also robust in the sense that the estimator is not badly biased even if $\xi$ is misspecified.

An example may be helpful. Under the Public Utility Regulatory Policies Act of 1978 (PURPA), U.S. electric power companies are required to estimate the total power usage of various classes of customers during certain peak hours. These estimates are used in cost-of-service studies, to allocate the cost of maintaining generation and transmission capacity to various customer classes. Because hourly usage is not normally metered, peak usage is estimated on a sampling basis. Available sample data generally show a strong relationship between peak usage and monthly consumption, which is usually metered for the entire population. Other potential predictors of peak usage include annual consumption, local weather characteristics, and perhaps the price of electricity. The latter is especially relevant in rate experiments. In addition, household income, composition, and appliance stock may be available on a double sampling basis. Some references are Aigner (1979), Aigner and Hausman (1980), and Taylor (1977).

In these studies the marginal cost of each sample unit is several hundred dollars or more per year, so there is ample motivation to make efficient use of relevant and available auxiliary information. However, since these estimates have a substantial impact on electricity prices, they must have strong credibility with the public. This credibility seems to be related to the robustness of the sampling strategy.

In applications like these, the conventional approach is to use stratification to bring auxiliary information into the estimator and possibly to

introduce varying inclusion probabilities in the sampling plan. A common practice is to establish a stratification on one or more auxiliary variables using the Dalenius and Hodges (1959) cumulative square root rule on each variable, and then to use Neyman allocation based on within-strata variances estimated from available sample data. Often, because of limited sample data, allocation is based on the within-strata variance of an auxiliary variable that is thought to be highly correlated with the target variable. Various aspects of this procedure have been discussed by Anderson, Kish and Cornell (1980), Cochran (1961), Rao (1977), and Singh (1971, 1975).

This procedure requires considerable care and judgment. In deciding on the number of auxiliary variables to be used in the stratification and the number of cutpoints for each such variable, assumptions must be made about the joint distribution of the target variable and the auxiliary variables. Within-strata variances are usually difficult to estimate, and the use of an auxiliary variable as a proxy for the target variable conflicts with the use of the same auxiliary variable as a stratification variable. Moreover, there is little justification for using the Dalenius-Hodges rule for multivariate stratification, but the alternatives are extremely cumbersome.

In the present paper, the auxiliary information is used in a multivariate regression estimator. Multiple regression provides a familiar, easily used, and extremely flexible tool for bringing auxiliary information into the analysis. Dummy variables can be used in the model to represent categorical information; this is analogous to analysis of variance models and generalizes the technique of stratification in sampling. Other suitable variables can be included in linear or perhaps quadratic form, as in analysis of covariance.
Interaction variables offer additional flexibility that generalizes the distinction in the sampling literature between separate and combined estimators in stratification.

In part because of the great flexibility that is available, care must be taken in developing a suitable regression model (Konijn, 1973, p. 131). Attention must be given to variable selection, multicollinearity, and identifying outliers. However, techniques for handling these problems are fairly well developed and are familiar to many analysts (e.g. Belsley, Kuh and Welsch, 1980 and Hocking, 1976).

An impediment to wide use of regression estimators in sampling applications has been the apparent contradictions between model-based procedures and sampling considerations. To help reconcile several approaches, Section 2 formulates a class of multivariate regression estimators which includes the linear predictors of Royall (1970, 1971, 1976) and the generalized regression estimators of Cassel, Sarndal and Wretman (1976, 1977). Sampling strategies that integrate the choice of sampling plan and estimator are considered.

Emphasizing design-based considerations, Section 3 proposes that robustness be achieved by restricting the strategies to those that are asymptotically design unbiased (ADU), regardless of the model's validity. The subclass of ADU strategies is shown to be determined by an algebraic condition which is used in subsequent analysis and facilitates construction of specific ADU strategies.

In Section 4 the model is used to examine the asymptotic efficiency of ADU strategies. When the available auxiliary information is used in an ADU regression estimator, the main role of the sampling plan is to provide suitable inclusion probabilities. The optimal inclusion probabilities are determined by the heteroscedasticity in the model and the characteristic to be estimated, and are in fact proportional to $|a_I|\sigma_I$, called the relevance of $I$. A useful basis for evaluating varying probability sampling plans is the efficiency of an equal probability plan, e.g. a simple random or proportionately allocated sampling plan.

Section 5 shows that the ideal inclusion probabilities can be well approximated with a stratified sampling plan using a simple stratification based on relevance. The choice of strata boundaries is shown to be much less critical than in the conventional approach since only the residual variation is of concern, not the within-strata variation of the auxiliary variables. A simple rule is proposed for constructing administratively convenient strata, which generalizes the equal aggregate size recommendation of Hansen, Hurwitz and Madow (1953, pp. 215-219). Section 6 provides two numerical illustrations drawn from utility rate research and cost accounting.

2. SAMPLING STRATEGIES

As suggested in the previous section, the finite population characteristic $\sum_{I=1}^{N} a_I y_I$ is to be estimated using observed $y_I$, $I \in s$, together with auxiliary information $x_I \in R^k$ known throughout the population. The basis for planning is the superpopulation regression model $\xi$: $y_I = x_I'\beta + u_I$ with $E_\xi(u_I) = E_\xi(u_I u_J) = 0$, $I \ne J$. Here $\beta \in R^k$ is unknown but the $\sigma_I^2 = E_\xi(u_I^2)$ are regarded as known. To estimate $\beta$, we use the class of weighted least squares estimators with weights $q_I \ge 0$:

$$\hat\beta = \Big(\sum_{I \in s} q_I x_I x_I'\Big)^{-1} \sum_{I \in s} q_I x_I y_I .$$

While an obvious estimator for $y_I$ is $x_I'\hat\beta$, additional useful information may be extracted from the sample residuals. For all $I$, define $\hat u_I$ to be $y_I - x_I'\hat\beta$. For $I \in s$, $\hat u_I$ is the observed sample residual, which is usually regarded as containing information about the accuracy of $\xi$. By defining additional weights $r_I \ge 0$ associated with the $\hat u_I$ we obtain the class of estimators $\sum_{I=1}^{N} a_I \hat y_I$ with $\hat y_I = x_I'\hat\beta + r_I \delta_I \hat u_I$. Here $\delta_I$ identifies the sample, i.e. $\delta_I = 1$ if $I \in s$, 0 otherwise. We regard a sampling strategy to be determined by a sampling plan characterized by the inclusion probabilities

$\pi_I = \Pr(I \in s)$ together with an estimator determined by the choice of $q_I$ and $r_I$, $I = 1,\ldots,N$.

Additional vector notation is useful. Define $a = [a_1 \ldots a_N]'$, $y = [y_1 \ldots y_N]'$, $u = [u_1 \ldots u_N]'$ and $e = [1 \ldots 1]'$, all in $R^N$. Let $X = [x_1 \ldots x_N]'$, the $(N \times k)$ matrix of auxiliary information. Also define the following $(N \times N)$ diagonal matrices: $\Sigma = \mathrm{diag}(\sigma_I^2)$, $\Pi = \mathrm{diag}(\pi_I)$, $Q = \mathrm{diag}(q_I)$, $R = \mathrm{diag}(r_I)$, and $\Delta = \mathrm{diag}(\delta_I)$. In this notation, the model $\xi$ is

$$y = X\beta + u, \quad E_\xi(u) = 0, \quad E_\xi(uu') = \Sigma, \text{ known.} \qquad (2.1)$$

A sampling strategy is characterized by the triplet $(\Pi, Q, R)$. To estimate the population characteristic $a'y$, we use the estimator $a'\hat y$ with

$$\hat y = X\hat\beta + R\Delta\hat u, \quad \hat\beta = (X'Q\Delta X)^{-1} X'Q\Delta y, \quad \text{and} \quad \hat u = y - X\hat\beta. \qquad (2.2)$$

It is assumed that the sample size $n = \sum_{I=1}^{N} \pi_I$ is fixed, and that $X'Q\Delta X$ is nonsingular for all $s$ with nonzero probability of occurrence.

Various subclasses of strategies $(\Pi, Q, R)$ have been considered previously. Strategies with $R = I$, the identity matrix, will be called linear prediction strategies (Royall, 1970, 1976; Scott and Smith, 1969; and Smith, 1976). The class of strategies with $\Pi > 0$ and $R = \Pi^{-1}$ will be called generalized regression strategies following Cassel, Sarndal and Wretman (1976), and Sarndal (1980). Strategies with $R = 0$ will be called simple projection strategies. Strategies can be further classified by $Q$. Important cases are the BLU strategies with $Q = \Sigma^{-1}$ and the $\Pi I$ ($\Pi$-inverse) strategies with $\Pi > 0$ and $Q = \Pi^{-1}$.
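To make the estimator (2.2) concrete, the following is a minimal sketch in plain Python, not code from the paper. The function names `solve` and `qr_estimate` and the six-unit data set are hypothetical and chosen only for illustration. The weights are passed as lists `q` and `r`, and the sample as a set of unit indices.

```python
def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination."""
    k = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        # Partial pivoting: bring the largest remaining entry of column c to the diagonal.
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, k):
            f = M[i][c] / M[c][c]
            for j in range(c, k + 1):
                M[i][j] -= f * M[c][j]
    z = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, k))
        z[i] = (M[i][k] - tail) / M[i][i]
    return z


def qr_estimate(a, X, y, sample, q, r):
    """Return a'y_hat from (2.2); only y[i] for i in the sample is used."""
    N, k = len(X), len(X[0])
    # Weighted least squares on the sampled units: (X'Q Delta X) beta = X'Q Delta y.
    XtQX = [[sum(q[i] * X[i][c] * X[i][d] for i in sample) for d in range(k)]
            for c in range(k)]
    XtQy = [sum(q[i] * X[i][c] * y[i] for i in sample) for c in range(k)]
    beta = solve(XtQX, XtQy)
    fitted = [sum(X[i][c] * beta[c] for c in range(k)) for i in range(N)]
    total = 0.0
    for i in range(N):
        y_hat = fitted[i]
        if i in sample:
            # y_hat_I = x_I' beta + r_I * delta_I * (sample residual of unit I)
            y_hat += r[i] * (y[i] - fitted[i])
        total += a[i] * y_hat
    return total


# Hypothetical population: intercept plus one size variable.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0], [1.0, 7.0], [1.0, 9.0]]
y = [5.0, 8.0, 17.0, 17.0, 23.0, 29.0]  # unit 2 departs from the line 2 + 3x
a = [1.0] * 6                            # population total
q = [1.0] * 6
pi = [0.5] * 6                           # equal inclusion probabilities, n = 3
s = {0, 2, 5}

projection = qr_estimate(a, X, y, s, q, [0.0] * 6)                  # R = 0
prediction = qr_estimate(a, X, y, s, q, [1.0] * 6)                  # R = I
gen_regression = qr_estimate(a, X, y, s, q, [1.0 / p for p in pi])  # R = Pi^{-1}
print(projection, prediction, gen_regression)
```

In this particular setup the three strategies give the same estimate, because the model contains an intercept, the weights $q_I$ are constant, and $a = e$, so the weighted sample residuals sum to zero. With other choices of $a$, $Q$, or $\Pi$ the three estimates generally differ.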

As Holt and Smith (1979) note, preference for strategies depends upon a tradeoff between considerations derived from the model (2.1) and considerations derived from the sample design, $\Pi$. Under $\xi$, $a'y$ is regarded as a random variable to be predicted by $a'\hat y$. Conditional on the sample $s$, the best linear unbiased predictor is

$$\sum_{I \in s} a_I y_I + \sum_{I \notin s} a_I x_I'\hat\beta = a'\Delta y + a'(I-\Delta)X\hat\beta = a'X\hat\beta + a'\Delta\hat u,$$

as determined by the BLU linear prediction strategy $(\Pi, \Sigma^{-1}, I)$ (Royall, 1976; Smith, 1976). Here the observed sample residuals are used for the sample cases but provide no information about the unobserved residuals. The sample design $\Pi$ plays no role in this estimator.

The heavy reliance of BLU strategies on $\xi$ bothers many samplers (e.g. Hansen, Madow and Tepping, 1978). They seem to prefer $\Pi I$ strategies because $X'\Delta\Pi^{-1}X$ and $X'\Delta\Pi^{-1}y$ are design unbiased estimators of $X'X$ and $X'y$ (e.g. Fuller, 1975; Jonrup and Rennermalm, 1976; Kish and Frankel, 1974; and Konijn, 1962). Looking for another compromise, Brewer (1979) suggests a modified $\Pi I$ linear prediction strategy which uses $q_I = (\pi_I^{-1} - 1)/x_I$ and $r_I = 1$ for the ratio model (2.1), with $k = 1$ and $x_I > 0$.

A more fundamentally sampling-based position is taken by Cassel, Sarndal and Wretman (1976, 1977). They recommend the generalized regression estimator

$$\sum_{I \in s} \pi_I^{-1} a_I y_I + \Big(\sum_{I=1}^{N} a_I x_I' - \sum_{I \in s} \pi_I^{-1} a_I x_I'\Big)\hat\beta = a'\Pi^{-1}\Delta y + a'(X - \Pi^{-1}\Delta X)\hat\beta = a'X\hat\beta + a'\Pi^{-1}\Delta\hat u.$$

Here the sample residuals are thought to be informative about the unobserved

residuals. As in the Horvitz-Thompson estimator, the sample residuals are extended to the population by using the sampling design. With this approach, the choice of $\hat\beta$ seems to be less critical than with other estimators, and both BLU and $\Pi I$ estimators have been suggested (Sarndal, 1980), as well as generalized ratios (e.g. Raj, 1965).

There remains considerable confusion about effective regression strategies, and this confusion has undoubtedly deterred their use. The choice of strategy seems to be unavoidably dependent upon a subjective evaluation of the credibility of the model, the character of the application, and the nature of the available data. A universally optimal strategy cannot be prescribed, but perhaps some of the issues can be clarified.

A substantial advantage of model-based analysis is the strong links that are established with linear statistical inference (e.g. Rao, 1973). With the added definitions

$$C = X(X'Q\Delta X)^{-1}X'Q, \quad \text{and} \quad T = C + R - R\Delta C, \qquad (2.3)$$

we have $X\hat\beta = C\Delta y$, $\hat y = T\Delta y$, $C\Delta X = X$, and

$$I - T\Delta = (I - R\Delta)(I - C\Delta) = (I - \Delta) + (I - T)\Delta.$$

The prediction error $a'y - a'\hat y = a'(I - T\Delta)y$ reduces to $a'(I - T\Delta)u$ under $\xi$ since $(I - C\Delta)X = 0$. This implies that $a'\hat y$ is a $\xi$-unbiased predictor of $a'y$, with the mean squared error

$$a'(I - T\Delta)\Sigma(I - \Delta T')a = a'(I-\Delta)\Sigma(I-\Delta)a + a'(I-T)\Delta\Sigma(I-T')a.$$

Under any linear prediction strategy, $T - I = (I-\Delta)C$, so the mean squared error simplifies to

$$a'(I-\Delta)(\Sigma + C\Delta\Sigma C')(I-\Delta)a.$$

As mentioned, this is minimized by using a BLU linear prediction strategy, $(\Pi, \Sigma^{-1}, I)$. Of course this is conditional on the sample $s$. If (2.1) is believed to be accurate, the choice of strategy seems to reduce to the choice of sampling plan $\Pi$. For instance, working with the ratio model, Royall (1970) has shown that rather weak conditions on $\Sigma$ imply that the mean squared error is minimized by systematically selecting the $n$ largest units in the population.

Despite the optimal properties of such a strategy, many survey samplers find it unacceptable (e.g. Hansen, Madow and Tepping, 1978). For example, few consumer advocates would accept a utility cost-of-service study which determines prices for electricity from the consumption patterns of the largest users in various classes.

The problem with optimal model-based strategies does not seem to be the use of (2.1). Godambe's work implies that suitable strategies can only be identified by utilizing some sort of assumptions about the population (Smith, 1976, p. 187). One approach is to utilize information incorporated in a prior distribution (Ericson, 1969; Scott and Smith, 1969). A closely related approach that has practical appeal is to use the model $\xi$ to provide this information (Anderson, Kish and Cornell, 1980; Brewer, 1963; and Rao, 1970).

The real problem with optimal model-based strategies seems to be their potential bias if the assumed model is even moderately inaccurate. This concern has stimulated interest in robust strategies that provide some degree of protection against model misspecification. Royall and Herson (1973a,b) and Scott, Brewer, and Ho (1978) provide unbiasedness under a specified class of alternative models by imposing balance conditions on the sample $s$. Although these writers restrict themselves to the strategies $(\Pi, \Sigma^{-1}, I)$ for the ratio model, it is easily seen that given any strategy $(\Pi, Q, R)$, $a'\hat y$ is

unbiased under the alternative model $y = Z\gamma + v$ with $E(v) = 0$ if and only if $s$ satisfies the balance conditions $a'(I - T\Delta)Z = 0$, or equivalently

$$a'(T - I)\Delta Z = a'(I - \Delta)Z.$$

For a linear prediction strategy, the balance condition simplifies to

$$a'(I-\Delta)C\Delta Z = a'(I-\Delta)Z.$$

The balanced sampling approach raises three questions:

1. How to choose the relevant $Z$,
2. How to identify the set of samples that satisfy the balance conditions for $Z$, and
3. How to isolate the most suitable sample within the set of balanced samples.

The literature that addresses these issues shows an evolution toward a design-based viewpoint, although the model-based strategies $(\Pi, \Sigma^{-1}, I)$ are generally retained.

Just as advocates of model-based strategies have been led to recognize design considerations, survey samplers more comfortable with design-based inference acknowledge the potential importance of model-based planning, although they tend to stay with the design-based strategies $(\Pi, \Sigma^{-1}, \Pi^{-1})$ or $(\Pi, \Pi^{-1}, 0)$.

Brewer (1979) and Sarndal (1980) have begun a systematic reconciliation of these approaches. This paper attempts to extend and unify their work by studying the general class of strategies $(\Pi, Q, R)$ in a fashion that integrates model-based and design-based considerations. The concept of asymptotic design unbiasedness is used in the place of balance to provide robustness, while the asymptotic model-based mean squared error is used to analyze efficiency. This analysis is carried out in Sections 3 and 4 using the convenient context of varying probability sampling, but Section 5 shows how these strategies can be implemented using conventional stratified sampling.

3. ASYMPTOTICALLY DESIGN-UNBIASED STRATEGIES

Samplers find great comfort, and appropriately so, in strategies that yield estimators that are design-unbiased regardless of the population, e.g. $\bar y$ calculated from a simple random sample. A bit more reluctantly, they have recognized the usefulness of an estimator such as a ratio estimator that may be biased but is asymptotically design unbiased (ADU). The concept of balance has been introduced in an attempt to meet these concerns and still retain most of the advantages of model-based planning and inference. A more direct approach advanced by Brewer (1979) is to examine model-based strategies that give ADU estimators regardless of the validity of the model. This seems to side-step the problems with balanced sampling and to meet the needs of samplers.

In dealing with finite population sampling, care must be exercised in defining the context of asymptotic analysis. For our purposes, it is inadequate to simply let $n$ increase to $N$. Instead we let the population size and sample size both increase with the sampling fraction fixed. To preserve the character of the original finite population, the population size is conceptually increased by considering an aggregate population of $mN$ units comprised of $m$ copies of the original population. These $m$ copies are assumed to be identical with respect to the known auxiliary information $X$. For model-based analysis, (2.1) is used to generate $m$ independent realizations of $y$, say $y_j$, $j = 1,\ldots,m$. However, to make this section's analysis independent of the model, in this section the $y_j$ are considered to be identical copies of the original $y$.

Under any strategy $(\Pi, Q, R)$, an aggregate sample of $mn$ units is selected from the aggregate population by selecting an independent sample $s_j$ from each of the $m$ copies of the population. For each $s_j$ we construct an $(N \times N)$ indicator matrix $\Delta_j = \mathrm{diag}[\delta_I(s_j)]$.

An estimator of the aggregate population characteristic $a'\bar y_m = m^{-1}\sum_{j=1}^{m} a'y_j$ is formulated by applying the chosen strategy to the aggregate sample. The estimator is defined to be $a'\hat y_m$ where

$$\hat y_m = m^{-1}\sum_{j=1}^{m} \hat y_j, \qquad (3.1)$$

$$\hat y_j = X\hat\beta_m + R\Delta_j\hat u_j, \quad \hat\beta_m = \Big(\sum_{j=1}^{m} X'Q\Delta_j X\Big)^{-1}\sum_{j=1}^{m} X'Q\Delta_j y_j, \quad \text{and} \quad \hat u_j = y_j - X\hat\beta_m.$$

We also define

$$\Pi_m = m^{-1}\sum_{j=1}^{m}\Delta_j, \qquad (3.2)$$

$$C_m = X(X'Q\Pi_m X)^{-1}X'Q, \quad \text{and} \quad T_m = C_m + R - R\Pi_m C_m,$$

so that

$$\hat y_m = m^{-1}\sum_{j=1}^{m} T_m\Delta_j y_j.$$

Using the assumption of this section that $y_j = y$, the population characteristic of interest is $a'y$ and $\hat y_m$ becomes $T_m\Pi_m y$. Moreover, the assumption that $X'Q\Delta X$ is nonsingular for all samples with non-zero probability of occurrence implies that $C_m$ is bounded. Using this, the strong law of large numbers, and the Helly-Bray Theorem (Rao, 1963, p. 117), we have

$$\lim_{m\to\infty} E_p(\hat y_m) = \lim_{m\to\infty} E_p(T_m\Pi_m y) = T\Pi y \qquad (3.3)$$

with

$$C = \lim_{m\to\infty} E_p(C_m) = X(X'Q\Pi X)^{-1}X'Q, \quad \text{and} \quad T = \lim_{m\to\infty} E_p(T_m) = C + R - R\Pi C.$$

Here $E_p$ represents expectation with respect to the sampling distributions determined by $\Pi$. This motivates

Definition 1. The strategy $(\Pi, Q, R)$ is asymptotically design unbiased (ADU) for the characteristic $a$ if and only if $a'(I - T\Pi)y = 0$ for all $y \in R^N$.

An immediate consequence of this definition is that for any strategy $(\Pi, Q, R)$ that is ADU for $a$, $\pi_I = 0$ implies $a_I = 0$. Any unit with both $\pi_I = 0$ and $a_I = 0$ is clearly irrelevant and can be eliminated from the population. Because we are primarily interested in ADU strategies, it is assumed henceforth that $\Pi > 0$.

An equivalent characterization of ADU strategies can be developed from the identity $I - T\Pi = (I - R\Pi)(I - C\Pi)$. Suppose initially that $Q > 0$ so that $Q\Pi$ defines an inner product over $R^N$. In this case $C\Pi = X(X'Q\Pi X)^{-1}X'Q\Pi$ is the orthogonal projector onto the linear manifold $M(X)$ spanned by the column vectors of $X$, and $I - C\Pi$ is the projector onto the linear manifold orthogonal to $M(X)$ with respect to the inner product $Q\Pi$ (Rao, 1973, p. 47). Since

$$a'(I - T\Pi)y = a'(I - R\Pi)(Q\Pi)^{-1}Q\Pi(I - C\Pi)y,$$

$(\Pi, Q, R)$ is ADU for $a$ if and only if $(Q\Pi)^{-1}(I - R\Pi)a \in M(X)$, or equivalently, $(I - R\Pi)a = Q\Pi x$ for some $x \in M(X)$. The restriction $Q > 0$ can easily be relaxed, giving

Theorem 1. A strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if $(I - R\Pi)a = Q\Pi x$ for some $x \in M(X)$.

While a purely model-based viewpoint leads to the BLU linear prediction strategies $(\Pi, \Sigma^{-1}, I)$, the imposition of asymptotic design unbiasedness favors generalized regression strategies $(\Pi, Q, \Pi^{-1})$. Since $0 \in M(X)$, Theorem 1 implies that a generalized regression strategy is ADU for all $a$. In fact, any strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if it is equivalent to the generalized regression strategy $(\Pi, Q, \Pi^{-1})$ for $a$. For this purpose, two strategies $(\Pi, Q_1, R_1)$ and $(\Pi, Q_2, R_2)$ are said to be equivalent for $a$ if and only if they produce identical estimates of $a'y$ for all $y$ and all samples with positive probability of occurrence.

Given identical $Q$, two strategies are equivalent if and only if $a'(R_1 - R_2)\Delta\hat u = 0$ for all $s$ and all $y$. But, as in the proof of Theorem 1, this is true if and only if $(R_1 - R_2)a = Qx$ for some $x \in M(X)$. However, Theorem 1 shows that a strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if $(\Pi^{-1} - R)a = Qx$, $x \in M(X)$. This proves

Theorem 2. A strategy $(\Pi, Q, R)$ is ADU for $a$ if and only if $(\Pi, Q, R)$ and the generalized regression strategy $(\Pi, Q, \Pi^{-1})$ are equivalent for $a$.

Several special cases may illustrate the utility of these results.

(a) The ratio model $k = 1$ with $x_I > 0$ is of great practical importance and has been intensively studied. Theorem 1 shows that $(\Pi, Q, R)$ is ADU for $a$ if and only if

$$q_I = (\lambda\pi_I x_I)^{-1}(1 - r_I\pi_I)a_I. \qquad (3.4)$$

Here $\lambda > 0$ is an arbitrary constant of proportionality. For a linear prediction strategy $(\Pi, Q, I)$, (3.4) gives Brewer's (1979) relationship,

$$q_I = (\lambda x_I)^{-1}(\pi_I^{-1} - 1)a_I, \quad \lambda > 0.$$
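As a numerical check of condition (3.4), the sketch below evaluates the asymptotic design bias $a'(I - T\Pi)y$ for the ratio model through the identity $I - T\Pi = (I - R\Pi)(I - C\Pi)$. It is an illustration only: the function name `asymptotic_bias` and the five-unit population are hypothetical, and $\lambda = 1$ is assumed throughout.

```python
def asymptotic_bias(a, x, y, pi, q, r):
    """a'(I - T Pi) y for the ratio model (k = 1)."""
    # (C Pi y)_I = x_I * b, where b is the limiting weighted least squares slope.
    b = (sum(qi * p * xi * yi for qi, p, xi, yi in zip(q, pi, x, y))
         / sum(qi * p * xi * xi for qi, p, xi in zip(q, pi, x)))
    # a'(I - R Pi)(I - C Pi) y, written out unit by unit.
    return sum(ai * (1.0 - ri * p) * (yi - xi * b)
               for ai, ri, p, xi, yi in zip(a, r, pi, x, y))


# Hypothetical population; y is arbitrary and does not follow the ratio model.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0, 1.0, 10.0, 7.0, 40.0]
a = [1.0] * 5
pi = [0.2, 0.3, 0.5, 0.4, 0.6]   # inclusion probabilities summing to n = 2

# Linear prediction (r_I = 1) with q_I from (3.4), lambda = 1.
r_pred = [1.0] * 5
q_brewer = [(1.0 / p - 1.0) * ai / xi for p, ai, xi in zip(pi, a, x)]
bias_brewer = asymptotic_bias(a, x, y, pi, q_brewer, r_pred)

# Simple projection (r_I = 0) with q_I from (3.4), lambda = 1.
r_proj = [0.0] * 5
q_proj = [ai / (p * xi) for p, ai, xi in zip(pi, a, x)]
bias_proj = asymptotic_bias(a, x, y, pi, q_proj, r_proj)

# Linear prediction with weights q_I = 1/x_I, which do not satisfy (3.4) here.
q_other = [1.0 / xi for xi in x]
bias_other = asymptotic_bias(a, x, y, pi, q_other, r_pred)

print(bias_brewer, bias_proj, bias_other)
```

The first two biases vanish up to rounding for any choice of `y`, while the third is clearly nonzero for this population, which is the distinction that Theorem 1 draws.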

A simple projection strategy $(\Pi, Q, 0)$ is ADU if and only if $q_I = (\lambda\pi_I x_I)^{-1}a_I$, giving the estimator

$$a'\hat y = \Big(\sum_{I \in s}\pi_I^{-1}a_I y_I \Big/ \sum_{I \in s}\pi_I^{-1}a_I x_I\Big)\sum_{I=1}^{N} a_I x_I.$$

(b) (2.1) is said to include an intercept if $e = [1 \ldots 1]' \in M(X)$. In this case an ADU strategy for $a$ can be constructed using Theorem 1 with $x = \lambda e$, giving $q_I = (\lambda\pi_I)^{-1}(1 - r_I\pi_I)a_I$. An ADU linear prediction strategy is obtained using $q_I = \lambda^{-1}(\pi_I^{-1} - 1)a_I$ while, for an ADU simple projection strategy, use $q_I = (\lambda\pi_I)^{-1}a_I$.

In the previous cases $Q$ involves both $\Pi$ and $a$, but sometimes $\Pi I$ or BLU strategies may be constructed that are ADU for $a$.

(c) (2.1) is said to be directed to $a$ if $a \in M(X)$. In this case the strategies $(\Pi, \Pi^{-1}, 0)$ and $(\Pi, \Pi^{-1} - I, I)$ are ADU for $a$.

(d) A BLU strategy $(\Pi, \Sigma^{-1}, R)$ is ADU for $a$ if and only if $\Pi^{-1}\Sigma(I - R\Pi)a \in M(X)$. In particular, a BLU linear prediction strategy $(\Pi, \Sigma^{-1}, I)$ is ADU for $a$ if and only if $\Sigma(\Pi^{-1} - I)a \in M(X)$. This odd requirement seems to reflect the dissatisfaction of many samplers with these strategies. A somewhat nicer condition characterizes a BLU simple projection strategy, namely $\Pi^{-1}\Sigma a \in M(X)$.

4. EFFICIENCY OF ADU STRATEGIES

Within the class of ADU strategies, a useful planning criterion is the asymptotic variance of $a'\hat y$, denoted $v(a'\hat y)$. Here $v(a'\hat y)$ is defined to be the asymptotic expectation, with respect to both design and model, of the mean square prediction error of $a'\hat y$. The asymptotic construction is as developed in Section 3 but with $y_j$ independently generated following (2.1). In this case, there are $m$ independent $u_j$, with $E_\xi(u_j) = 0$ and $E_\xi(u_j u_j') = \Sigma$, $j = 1,\ldots,m$. To examine the square error $(a'\bar y_m - a'\hat y_m)^2$, use (3.2) to note that

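$$m(\bar y_m - \hat y_m) = \sum_{j=1}^{m}(I - T_m\Delta_j)u_j,$$

since

$$\sum_{j=1}^{m}(I - T_m\Delta_j)X = m(I - T_m\Pi_m)X = m(I - R\Pi_m)(I - C_m\Pi_m)X = 0.$$

The $\xi$-independence of the $u_j$ implies

$$m\,E_\xi(a'\bar y_m - a'\hat y_m)^2 = m^{-1}E_\xi\Big[\sum_{j=1}^{m} a'(I - T_m\Delta_j)u_j\Big]^2 = m^{-1}\sum_{j=1}^{m} a'(I - T_m\Delta_j)\Sigma(I - \Delta_j T_m')a = a'(I - \Pi_m)\Sigma a + a'(I - T_m)\Pi_m\Sigma(I - T_m')a.$$

Now the asymptotic design-based expectation can be evaluated as in Section 3, giving

$$\lim_{m\to\infty} m\,E_p E_\xi(a'\bar y_m - a'\hat y_m)^2 = a'(I - \Pi)\Sigma a + a'(I - T)\Pi\Sigma(I - T')a. \qquad (4.1)$$

Given that $(\Pi, Q, R)$ is ADU for $a$, $a'T\Pi y = a'y$ for all $y \in R^N$, so that (4.1) simplifies to $a'(\Pi^{-1} - I)\Sigma a$. This justifies

Definition 2. If $(\Pi, Q, R)$ is ADU for $a$, then the asymptotic variance of $(\Pi, Q, R)$ for $a$ is

$$v(a'\hat y) = a'(\Pi^{-1} - I)\Sigma a = \sum_{I=1}^{N} a_I^2(\pi_I^{-1} - 1)\sigma_I^2. \qquad (4.2)$$

$v(a'\hat y)$ becomes especially recognizable with an equal probability sample plan, $\pi_I = n/N$. In this case,

$$v(a'\hat y) = \frac{N^2}{n}\Big(\frac{N-n}{N}\Big)\Big(N^{-1}\sum_{I=1}^{N} a_I^2\sigma_I^2\Big). \qquad (4.3)$$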
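The reduction from (4.2) to (4.3) can be checked numerically. The sketch below is illustrative only; the function names and the six-unit variance pattern are hypothetical.

```python
def asymptotic_variance(a, sigma2, pi):
    """Formula (4.2): sum over units of a_I^2 (1/pi_I - 1) sigma_I^2."""
    return sum(ai * ai * (1.0 / p - 1.0) * s2
               for ai, s2, p in zip(a, sigma2, pi))


def equal_probability_variance(a, sigma2, n):
    """Formula (4.3), valid when pi_I = n/N for every unit."""
    N = len(a)
    mean_term = sum(ai * ai * s2 for ai, s2 in zip(a, sigma2)) / N
    return (N ** 2 / n) * ((N - n) / N) * mean_term


# Hypothetical heteroscedastic population, estimating the total (a = e).
a = [1.0] * 6
sigma2 = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
n = 3
pi_equal = [n / len(a)] * len(a)

v_general = asymptotic_variance(a, sigma2, pi_equal)
v_equal = equal_probability_variance(a, sigma2, n)
print(v_general, v_equal)
```

Both expressions evaluate to 91 for this population, since each unit contributes $\sigma_I^2(N/n - 1) = \sigma_I^2$ when $\pi_I = 1/2$.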

By defining the population variance of $ay$ to be

$$\sigma_{ay}^2 = N^{-1}\sum_{I=1}^{N} a_I^2 y_I^2 - \Big(N^{-1}\sum_{I=1}^{N} a_I y_I\Big)^2$$

and the coefficient of determination of (2.1) for $ay$ to be

$$R_{ay}^2 = \Big(\sigma_{ay}^2 - N^{-1}\sum_{I=1}^{N} a_I^2\sigma_I^2\Big)\Big/\sigma_{ay}^2,$$

then

$$v(a'\hat y) = \frac{N^2}{n}\Big(\frac{N-n}{N}\Big)(1 - R_{ay}^2)\,\sigma_{ay}^2. \qquad (4.4)$$

There are three ways of increasing the asymptotic precision of an ADU regression estimator: (1) increase $n$, (2) increase $R_{ay}^2$ by utilizing more relevant auxiliary information, and (3) choose a more efficient strategy $(\Pi, Q, R)$. We now explore the latter possibility.

(4.2) shows that $v(a'\hat y)$ depends only on $\Pi$ for any ADU strategy $(\Pi, Q, R)$. The Cauchy-Schwarz inequality implies that

$$\Big(\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2 \le \Big(\sum_{I=1}^{N}\pi_I\Big)\Big(\sum_{I=1}^{N} a_I^2\sigma_I^2\pi_I^{-1}\Big)$$

with equality if and only if $\pi_I^{1/2}$ is proportional to $|a_I|\sigma_I\pi_I^{-1/2}$. Since $\sum_{I=1}^{N}\pi_I = n$, we have

Theorem 3. Within the class of strategies $(\Pi, Q, R)$ that are ADU for $a$ and have sample size $n$, the minimum asymptotic variance is

$$v(a'\hat y) = n^{-1}\Big(\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2 - \sum_{I=1}^{N} a_I^2\sigma_I^2.$$

The minimum asymptotic variance is achieved by an ADU strategy for $a$ if and only if

$$\pi_I = n|a_I|\sigma_I \Big/ \sum_{J=1}^{N}|a_J|\sigma_J.$$

It is perhaps appropriate to call a strategy best for $a$ if it is ADU for $a$ and achieves the minimum asymptotic variance. If $|a_I|\sigma_I$ is called the relevance of $I$, then a strategy is best if and only if $\pi_I$ is proportional to the relevance of $I$.

The best strategy depends very strongly on the population characteristic $a$. For any best strategy, $\pi_I = 0$ if and only if $a_I = 0$, so units not relevant to $a$ are not sampled. A single strategy can only be best for two characteristics $a$ and $a^*$ if $|a_I|$ and $|a_I^*|$ are proportional. For example, a strategy that is best for the population total, $a = e$, is also best for all differences between complementary subclass totals.

In the previous section, Theorem 1 was used to examine conditions allowing the construction of certain types of ADU strategies. The class of BLU simple projection strategies $(\Pi, \Sigma^{-1}, 0)$ seems especially appealing when the sample size is not large and the model (2.1) is credible. Theorem 3 shows that there exists a best BLU simple projection strategy for $a$ if and only if $\mathrm{sign}(a_I)\sigma_I = x_I'\lambda$ for some $\lambda \in R^k$. In particular, a best BLU simple projection strategy exists for the population total if and only if $(\sigma_1 \ldots \sigma_N)' \in M(X)$.

Ordinarily it will not be possible to follow a strategy that is best for all $a$ of interest. So it is useful to define the asymptotic efficiency for $a$ of any strategy $(\Pi, Q, R)$ that is ADU for $a$. Let $n$ be the sample size of $(\Pi, Q, R)$ and let $v(a'\hat y)$ be its asymptotic variance. Suppose $n^*$ is the sample size of another strategy that is best for $a$ and has the same asymptotic variance $v(a'\hat y)$. Then it is natural to regard $n^*/n$ as the efficiency of $(\Pi, Q, R)$ for $a$. But Theorem 3 implies that $n^*/n$ is equal to

$$\Big(n\sum_{I=1}^{N}\pi_I^{-1}a_I^2\sigma_I^2\Big)^{-1}\Big(\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2. \tag{4.5}$$

This quantity is defined to be the asymptotic efficiency of $(\Pi,Q,R)$ for $a$.

In certain cases, (4.5) is determined by the population coefficient of variation of the relevance $|a_I|\sigma_I$, denoted $cv_{a\sigma}$. For instance, consider an ADU strategy with inclusion probabilities proportional to $a_I^2\sigma_I^2$. The efficiency of any such strategy is equal to

$$\Big(N\sum_{I=1}^{N}a_I^2\sigma_I^2\Big)^{-1}\Big(\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2,$$

which is simply $(1+cv_{a\sigma}^2)^{-1}$. An example of this is any ADU strategy for the population total with pps sampling, if $\sigma_I^2$ is proportional to size.

A second case, of greater interest, is any ADU strategy using an equal probability sampling plan, $\pi_I = n/N$. These strategies will usually be preferred in practice unless their efficiency is very poor. (4.5) shows that their efficiency is also $(1+cv_{a\sigma}^2)^{-1}$. This means that an equal probability sampling plan will be reasonably efficient if and only if all units are more or less equally relevant. Such a plan will be reasonably efficient for the population total if and only if (2.1) is reasonably homoscedastic. However, experience suggests that in many applications the relevant coefficient of variation is well in excess of unity, so that an equal probability sampling plan often has efficiency below 50% even for the population total. Such cases may call for a sampling plan providing inclusion probabilities more in line with the relevance of units.

5. STRONGLY STRATIFIED STRATEGIES

We now consider sample design in situations in which the efficiency of an equal probability ADU strategy is poor enough to justify the use of unequal $\pi_I$. In these cases, stratification can provide nearly optimal inclusion

probabilities giving strategies which are simple to execute and very compatible with common practice. By using equal $\pi_I$ within each stratum, these stratified sampling strategies side-step most of the problems that are encountered with general varying probability designs. Moreover, there is no significant loss in efficiency.

Suppose that $\{S_h : h = 1,\ldots,H\}$ is any stratification of the population, and let $cv_h$ be the coefficient of variation of $|a_I|\sigma_I$ within the $N_h$ units of stratum $h$, so that

$$1 + cv_h^2 = N_h\sum_{I\in S_h}a_I^2\sigma_I^2\Big(\sum_{I\in S_h}|a_I|\sigma_I\Big)^{-2}. \tag{5.1}$$

We are interested in stratifications satisfying

$$cv_h \le \varepsilon, \qquad h = 1,\ldots,H \tag{5.2}$$

for some specified small $\varepsilon > 0$. For any such stratification, we consider a sampling plan having sample allocation proportional to the aggregate relevance of units within each stratum, i.e. with $\pi_I = n_h/N_h$ for $I \in S_h$, where

$$n_h = n\Big(\sum_{I\in S_h}|a_I|\sigma_I\Big)\Big/\sum_{I=1}^{N}|a_I|\sigma_I, \qquad h = 1,\ldots,H. \tag{5.3}$$

Equivalently, the sampling fractions $n_h/N_h$ are proportional to the average relevance of units within each stratum.

Definition 3. A strategy $(\Pi,Q,R)$ is strongly stratified for $a$ if
a) the stratification satisfies (5.2) for a specified small $\varepsilon$,
b) the allocation follows (5.3), and
c) $Q$ and $R$ define an estimator $\hat{a}'y$ which is ADU for $a'y$.

Theorem 3 provides a lower bound on the asymptotic variance of any ADU strategy with sample size $n$. However, for any strongly stratified strategy a tight upper bound is also easily derived from (4.2), (5.1) and (5.3):

$$v(\hat{a}'y) = \sum_{h=1}^{H}(N_h/n_h - 1)\sum_{I\in S_h}a_I^2\sigma_I^2$$

$$= n^{-1}\sum_{I=1}^{N}|a_I|\sigma_I\sum_{h=1}^{H}(1+cv_h^2)\sum_{I\in S_h}|a_I|\sigma_I - \sum_{I=1}^{N}a_I^2\sigma_I^2$$

$$\le (1+\varepsilon^2)\,n^{-1}\Big(\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2 - \sum_{I=1}^{N}a_I^2\sigma_I^2. \tag{5.4}$$

Equivalently, we have

Theorem 4. The asymptotic efficiency of a strongly stratified strategy is at least $(1+\varepsilon^2)^{-1}$.

The allocation rule (5.3) is not actually optimal in terms of minimizing $v(\hat{a}'y)$. The optimal allocation is to choose $n_h$ proportional to $\big(N_h\sum_{I\in S_h}a_I^2\sigma_I^2\big)^{1/2}$, giving

$$v(\hat{a}'y) = n^{-1}\Big[\sum_{h=1}^{H}(1+cv_h^2)^{1/2}\sum_{I\in S_h}|a_I|\sigma_I\Big]^2 - \sum_{I=1}^{N}a_I^2\sigma_I^2.$$

However, as long as $\varepsilon$ is small this cannot be much better than the simpler allocation (5.3).

As long as $\varepsilon$ is small, all strongly stratified strategies are almost equivalent in terms of asymptotic efficiency, so the actual choice of stratification is almost inconsequential. The complexities of optimal stratification (discussed in Anderson, Kish and Cornell, 1980; Rao, 1977; and Singh, 1971, 1975) can be avoided simply by utilizing a regression estimator, so that the efficiency of the design depends only on the residual variation and not on the within-strata variation of the auxiliary variables. Any convenient construction of strata can be used, as long as $\varepsilon$ is small, including the Dalenius-Hodges procedure.

In practice it may be advantageous to use a design with equal $n_h$. Following (5.3), this is achieved by constructing strata to equalize the aggregate within-strata relevance of units, i.e. by equalizing $\sum_{I\in S_h}|a_I|\sigma_I$. This is a generalization of the equal aggregate size recommendation of Hansen, Hurwitz and Madow (1953, p. 219). Cochran (1961) seems to discredit this simple rule, but his findings are due to his failure to use

the available auxiliary information not only in the sampling plan but also in the estimator, i.e. in both components of the sampling strategy.

6. APPLICATIONS

The methods of sample design proposed in this paper are relevant whenever a key target variable is to be measured on a sampling basis from a frame which provides one or more relevant predictor variables. Although this situation is encountered in a variety of contexts, the two examples to be discussed involve accounting for energy usage.

In electric utility load research, the target variable $y$ is often customer consumption (i.e. "demand") of electricity during certain peak hours, and the characteristic of interest is the population total of $y$. The auxiliary information in the simplest case is the monthly usage of electricity ($x$) that is metered for billing each customer. Analysis can usually be based on a simple heteroskedastic ratio model relating peak period demand to monthly usage. The analysis illustrated by this example is applicable in most sampling situations in which the univariate ratio estimator would ordinarily be used.

In the second example the population comprises 205 buildings operated by a major university, and the target variable $y$ is the heating cost of each building, which is related to several measures of the size and usage of the building. The characteristic of interest $a'y$ is the share of total heating costs that can be allocated to sponsored research. The vector $a$ is considered to be known from space usage reports for each building, but $y$ is only available on a sampling basis. This example will illustrate the use of multivariate auxiliary information and a nontrivial characteristic of interest. This example also illustrates sample design with 100% inclusion of the most relevant units.
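The design quantities of Sections 4 and 5 that both examples draw on can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function names and the integer stratum labels are hypothetical, `a` and `sigma` stand for the vectors of $a_I$ and $\sigma_I$, and the efficiency is computed from (4.5) as stated above.

```python
import numpy as np

def relevance(a, sigma):
    """Relevance |a_I| * sigma_I of each unit."""
    return np.abs(np.asarray(a, float)) * np.asarray(sigma, float)

def best_inclusion_probabilities(a, sigma, n):
    """Theorem 3: pi_I = n * relevance_I / (sum of relevance).

    No check is made that every pi_I <= 1; Section 6.2 treats that case.
    """
    r = relevance(a, sigma)
    return n * r / r.sum()

def asymptotic_efficiency(a, sigma, pi):
    """Efficiency (4.5) of an ADU strategy with inclusion probabilities pi."""
    r = relevance(a, sigma)
    pi = np.asarray(pi, float)
    keep = r > 0                 # irrelevant units add nothing to either sum
    n = pi.sum()                 # sample size, since the pi_I sum to n
    return r.sum() ** 2 / (n * np.sum(r[keep] ** 2 / pi[keep]))

def strong_allocation(a, sigma, strata, n):
    """Allocation (5.3): n_h proportional to aggregate relevance in stratum h.

    `strata` holds integer labels 0..H-1 (a hypothetical encoding).
    """
    totals = np.bincount(np.asarray(strata), weights=relevance(a, sigma))
    return n * totals / totals.sum()
```

For four units with relevance 1, 2, 3, 4 and $n = 2$, the best probabilities are .2, .4, .6, .8, while the equal probability plan $\pi_I = .5$ has efficiency $100/120 \approx .83$, which equals $(1+cv_{a\sigma}^2)^{-1}$ for these values.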

6.1 A Load Research Example

This example is based on a dataset that Brandenburg and Higgins (1974) have previously used to illustrate sample design for load research. The dataset provides peak demand $y_I$ (in kw) and monthly usage $x_I$ (in mwh) for each of $n = 210$ commercial and industrial customers. We will use these data, called the analysis sample, to plan a new sample of a population of $N = 840$ customers with known $x$ but unknown $y$. The purpose of the new sample is to estimate (or predict) $\sum_{I=1}^{N} y_I$.

To plan the new sample, the analysis sample will be used to estimate the parameters of a superpopulation model (2.1) that is assumed to underlie both the analysis sample and the target population. It sometimes may be useful to pool the data from several available past studies and possibly to take into account trends or other changes in superpopulation parameters, but these complexities will not be introduced here. However, planning will take full account of the known distribution of $x$ in the target population.

Figure 6.1 shows a scatterplot of the analysis sample. Exploratory analysis and experience with several other load research datasets suggest the simple heteroskedastic ratio model:

$$y_I = \beta x_I + u_I, \quad \text{with} \quad \sigma_I = \sigma_0 x_I^{\gamma}. \tag{6.1}$$

It is further assumed that $u_I$ is normally distributed, although there are two observations that seem to strain this assumption. The assumed normality can be used to calculate model-based maximum likelihood estimates $\hat\beta$, $\hat\gamma$ and $\hat\sigma_0$ using an iterative algorithm (Harvey, 1976).

Figure 6.1 Scatterplot of Load Research Data

[Scatterplot: peak demand Y (kw, vertical axis, about 768 to 39,400) against monthly usage X (mwh, horizontal axis, about 242 to 15,924).]

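To describe the algorithm, consider the more general model $y_I = x_I'\beta + u_I$ with $\sigma_I = \sigma_0 z_I^{\gamma}$, $z_I > 0$. Conditional on an initial estimate $\hat\gamma_0$, weighted least squares gives

$$\hat\beta = \Big(\sum_{I\in s} z_I^{-2\hat\gamma_0}x_I x_I'\Big)^{-1}\sum_{I\in s} z_I^{-2\hat\gamma_0}x_I y_I \tag{6.2}$$

and

$$\hat\sigma_0^2 = n^{-1}\sum_{I\in s}\hat v_I^2, \quad \text{where} \quad \hat v_I = z_I^{-\hat\gamma_0}(y_I - x_I'\hat\beta).$$

A revised estimate $\hat\gamma_1 = \hat\gamma_0 + \Delta\gamma$ is obtained by calculating the ordinary least squares regression coefficient

$$\Delta\gamma = \Big(\sum_{I\in s}c_I^2\Big)^{-1}\sum_{I\in s}w_I c_I, \tag{6.3}$$

where

$$w_I = \hat v_I^2/2\hat\sigma_0^2 \quad \text{and} \quad c_I = \log(z_I) - n^{-1}\sum_{I\in s}\log(z_I).$$

This is repeated until convergence.

With the analysis dataset, this algorithm gives the estimated relationships

$$\hat y_I = 2.737\,x_I \quad \text{and} \quad \hat\sigma_I = .9223\,x_I^{.9832}.$$

Although the distribution of $x$ in the target population was not published, the following target population statistics are consistent with the analysis sample:

$N = 840$,
$N^{-1}\sum_{I=1}^{N}\hat y_I = 4353.9$ kw,
$N^{-1}\sum_{I=1}^{N}\hat\sigma_I = 1278.9$ kw, and
$\big(N^{-1}\sum_{I=1}^{N}\hat\sigma_I^2\big)^{1/2} = 2322.6$ kw.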
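The iteration (6.2)-(6.3) described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for the paper: the function name, the starting value `gamma0`, the tolerance, and the iteration cap are all hypothetical choices, and the update step is read as the least squares slope of $w_I$ on the centred $c_I$.

```python
import numpy as np

def fit_power_variance(X, y, z, gamma0=1.0, tol=1e-8, max_iter=200):
    """Iterate (6.2)-(6.3) for y_I = x_I'beta + u_I, sd(u_I) = sigma0 * z_I**gamma."""
    X = np.asarray(X, float)
    if X.ndim == 1:
        X = X[:, None]                    # single regressor, e.g. the ratio model
    y = np.asarray(y, float)
    z = np.asarray(z, float)
    logz = np.log(z)
    c = logz - logz.mean()                # centred log(z_I), as in (6.3)

    def wls(gamma):
        wts = z ** (-2.0 * gamma)         # weights z_I^(-2 gamma)
        XtW = X.T * wts
        beta = np.linalg.solve(XtW @ X, XtW @ y)   # weighted least squares (6.2)
        v = (y - X @ beta) * z ** (-gamma)         # standardized residuals v_I
        return beta, v, np.mean(v ** 2)            # sigma0^2 = n^-1 sum v_I^2

    gamma = gamma0
    for _ in range(max_iter):
        beta, v, s2 = wls(gamma)
        w = v ** 2 / (2.0 * s2)
        step = (w @ c) / (c @ c)          # OLS slope of w_I on c_I, as in (6.3)
        gamma += step
        if abs(step) < tol:
            break
    beta, v, s2 = wls(gamma)              # refresh beta and sigma0 at the final gamma
    return beta, gamma, np.sqrt(s2)
```

For the ratio model (6.1), the call would take the form `fit_power_variance(x, y, x)`, with the monthly usage serving both as the single regressor and as the size variable $z_I$.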

These statistics, together with the partial sums of $\hat\sigma_I$ and $\hat\sigma_I^2$ with cases in order of increasing $\hat\sigma_I$, are all that are needed to develop an efficient sampling plan.

The first step of analysis is to calculate the sample size that would be required to estimate $a'y = \sum_{I=1}^{N} y_I$ using an ADU estimator with an equal probability sampling plan. PURPA specifies ±10% or less relative error with 90% probability. Using this criterion, the asymptotic variance $v(\hat{a}'y)$, given by (4.3), should satisfy

$$1.645\,v(\hat{a}'y)^{1/2}/a'y = .10,$$

or equivalently,

$$1.645\Big(\frac{1}{n} - \frac{1}{N}\Big)^{1/2}cv_{au} = .10.$$

The statistic $cv_{au}$, called the residual coefficient of variation of relevance, is

$$cv_{au} = \Big(N^{-1}\sum_{I=1}^{N}a_I^2\sigma_I^2\Big)^{1/2}\Big/\Big(N^{-1}\sum_{I=1}^{N}a_I y_I\Big) \tag{6.4}$$

$$= \Big(N^{-1}\sum_{I=1}^{N}\hat\sigma_I^2\Big)^{1/2}\Big/\Big(N^{-1}\sum_{I=1}^{N}\hat y_I\Big) = .5335.$$

When the sampling fraction $n/N$ is negligible, the sample size required with equal probability sampling is

$$n_0 = (1.645\,cv_{au}/.10)^2 = 77 \text{ customers}.$$

Correcting for the finite population, the sample size required with equal probability sampling is

$$n_1 = n_0/(1 + n_0/N) = 70.53 \text{ customers}.$$

The second step of analysis is to examine the reduction in $n_1$ resulting from the best varying probability sampling plan. Using results of Section 4,

the asymptotic efficiency of an equal probability sampling plan relative to the best varying probability sampling plan is

$$\text{eff} = (1+cv_{a\sigma}^2)^{-1} = \Big(N^{-1}\sum_{I=1}^{N}|a_I|\sigma_I\Big)^2\Big/\Big(N^{-1}\sum_{I=1}^{N}a_I^2\sigma_I^2\Big)$$

$$= \Big(N^{-1}\sum_{I=1}^{N}\hat\sigma_I\Big)^2\Big/\Big(N^{-1}\sum_{I=1}^{N}\hat\sigma_I^2\Big) = .3032.$$

This means that the best varying probability sampling plan will require a sample size of

$$n_2 = \text{eff}\cdot n_1 = 21.38 \text{ customers}.$$

In practice, $n_2$ might be increased to thirty customers to increase confidence in the accuracy of the asymptotic approximations.

The best varying probability sampling plan uses inclusion probabilities proportional to $\hat\sigma_I$ or $x_I^{\hat\gamma}$. Since $\hat\gamma$ is so close to one, an alternative design would be to use inclusion probabilities proportional to $x_I$, i.e. to use a pps sampling plan. Using (4.5) and the additional target population statistic

$$N^{-1}\sum_{I=1}^{N}\hat\sigma_I^2/x_I = 1029.7,$$

the asymptotic efficiency of the pps plan is about

$$\Big(N^{-1}\sum_{I=1}^{N}\hat\sigma_I\Big)^2\Big/\Big(N^{-1}\sum_{I=1}^{N}x_I \cdot N^{-1}\sum_{I=1}^{N}\hat\sigma_I^2/x_I\Big) = .9996.$$

Another alternative is to use a strongly stratified sampling plan along the lines of Section 5. For any strong stratification, the sample is allocated among strata in proportion to the within-strata totals of $\hat\sigma_I$. These

strata can be defined in any convenient fashion provided only that the coefficient of variation of $\hat\sigma_I$ is small within all strata. In particular, a balanced design can be obtained by dividing the aggregate target population total of $\hat\sigma_I$ about equally among strata. This is easily done by examining the cumulative sum of the $\hat\sigma_I$ in increasing order.

Table 6.1 shows a stratified sampling design using six strata with five observations per stratum, designed to equalize $\sum_{I\in S_h}\hat\sigma_I$ as much as possible. The last column shows that the efficiency of this design is at least .92. In fact, this is somewhat conservative, since (4.5) gives the asymptotic efficiency as .95.

Table 6.1 Strongly Stratified Design for Load Research Example

| Stratum $h$ | Size $N_h$ | Upper boundary $x_h$ (mwh) | Sampling fraction $n_h/N_h$ (%) | $\sum_{I\in S_h}\hat\sigma_I$ ($10^4$ kw) | $\sum_{I\in S_h}\hat\sigma_I^2$ ($10^8$ kw$^2$) | $(1+cv_h^2)^{-1}$ |
|---|---|---|---|---|---|---|
| 1 | 440 | 728 | 1.1 | 17.83 | .7764 | .93 |
| 2 | 204 | 1,554 | 2.5 | 17.76 | 1.638 | .94 |
| 3 | 100 | 3,042 | 5.0 | 17.82 | 3.329 | .95 |
| 4 | 56 | 7,955 | 8.9 | 19.05 | 7.032 | .92 |
| 5 | 24 | 11,596 | 20.8 | 17.12 | 12.40 | .99 |
| 6 | 16 | 16,000 | 31.3 | 17.84 | 20.14 | .99 |

This simple stratified sampling plan sidesteps the controversy involved in estimating the achieved precision of the pps plan. With the ratio model and stratified sampling, the simple projection strategy discussed in Section 3 gives the combined ratio estimator. In this case, the expressions for the expected asymptotic variance are closely related to the traditional design-based

measure of precision for the combined ratio estimator. So, while planning is necessarily model-based, post-sampling analysis can be conventional if desired.

Table 6.2 summarizes this sort of analysis for twelve different load-study populations. In the first five of these examples, an equal probability sampling plan can be teamed with the ordinary ratio estimator to provide a rather efficient strategy. In fact, each of these is a population of residential customers. The remaining populations, in which varying probability designs will be more advantageous, are all groups of commercial, industrial or municipal customers which are characterized by high variation in $x$ and strong heteroskedasticity in the relationship between $y$ and $x$.

The wide variation of $\hat\gamma$, eff, and $n_2$ in these examples dramatizes the need to tailor a sampling strategy to the characteristics of each population. Of course some variation in these statistics is due to their sampling distributions, but simulation experiments indicate that these statistics are rather reliable. Simulation can also be used to explore the validity of the asymptotic approximations to the mean and variance of $\hat{a}'y$. These results, to be reported in a later paper, are favorable, although very small samples, say fewer than thirty, are not generally recommended.

6.2 A Cost Accounting Example

One purpose of utility load research is to allocate the indirect cost of maintaining system capacity in proportion to the peak demands of various subclasses of customers. The second example involves a related problem of cost allocation. In this example, the administration of a large university wants to allocate some of its heating costs to sponsored research. For each of its $N = 205$ buildings, the administration knows the proportion $a_I \ge 0$ of the building assignable to sponsored research. However, the cost of heating

each building ($y_I$) is unknown, although the total cost $\sum_{I=1}^{N} y_I$ is known. The characteristic of interest is $a'y = \sum_{I=1}^{N} a_I y_I$.

Table 6.2 Statistics of Other Rate Research Populations

| Example | Equal probability $n_1$ | Standard error of $\hat\gamma$ | Efficiency | Best $n_2$ | Analysis sample size |
|---|---|---|---|---|---|
| 1 | 134 | .29 | .92 | 123 | 185 |
| 2 | 12 | .23 | .88 | 10 | 29 |
| 3 | 38 | .41 | .81 | 31 | 30 |
| 4 | 17 | .18 | .77 | 13 | 51 |
| 5 | 166 | .14 | .75 | 125 | 185 |
| 6 | 44 | .14 | .57 | 25 | 32 |
| 7 | 40 | .12 | .57 | 23 | 32 |
| 8 | 92 | .06 | .44 | 41 | 73 |
| 9 | 32 | .07 | .38 | 12 | 83 |
| 10 | 684 | .09 | .29 | 198 | 90 |
| 11 | 983 | .11 | .21 | 205 | 63 |
| 12 | 5,621 | .08 | .16 | 876 | 30 |

[The $\hat\gamma$ column survives as thirteen values for twelve examples and cannot be aligned row by row: .51, .51, .62, 1.08, .86, .91, .89, .84, .77, .74, 1.05, 1.38, 1.28.]

A simple ratio model might be used to relate $y_I$ to the total size of each building $x_I$ (measured in square feet):

$$y_I = \beta x_I + u_I.$$

In this model, the expected heating cost per square foot is identical for all buildings and is the coefficient $\beta$. With this assumption, sampling is unnecessary since $\beta$ can be estimated as

$$\hat\beta = \sum_{I=1}^{N} y_I \Big/ \sum_{I=1}^{N} x_I$$

and $a'y$ can be estimated as $a'\hat y$ where $\hat y_I = \hat\beta x_I$. However, this assumption is unrealistic. For example, heavy use of fume hoods in chemical laboratories significantly increases heating costs since replacement air must be heated.

A more realistic model relates heating costs to several categories of building use. Define a vector $x_I = (x_{I1},\ldots,x_{Ik})' \in R^k$ with $x_{Ij}$ equal to the square footage of building $I$ in use category $j$. Then the model

$$y_I = x_I'\beta + u_I$$

introduces a vector $\beta \in R^k$ of coefficients associated with the distinct use categories. In this case $y_I$ can be measured for a sample of buildings and this sample data can be used to estimate $\beta$ and $a'y$. If the cost of measuring $y_I$ is high, it is worthwhile to develop an efficient sampling strategy.

In order to develop a strategy, an analysis database has been assembled which includes the known $a_I$ and $x_I$ and a preliminary estimate of $y_I$ for all 205 buildings. Three use categories have been used:

1. General, including classrooms and offices;
2. Laboratory, both class and research; and
3. Nonassignable, including out-of-use, custodial, and structural areas.

Analysis of these data using the algorithm (6.2)-(6.3) led to the estimated relationship

$$\hat y_I = 0.371\,x_{I1} + 2.359\,x_{I2} + 2.359\,x_{I3}, \quad \text{with} \quad \hat\sigma_I = 22.43\,(x_{I2} + x_{I3})^{.7594},$$

and to the following finite population statistics:

$N = 205$,
$N^{-1}\sum_{I=1}^{N}a_I\hat y_I = \$17{,}520$,
$N^{-1}\sum_{I=1}^{N}a_I\hat\sigma_I = \$10{,}614$,
$\big(N^{-1}\sum_{I=1}^{N}a_I^2\hat\sigma_I^2\big)^{1/2} = \$26{,}439$.

Initially, the analysis follows the same steps as the previous example. If an error limit of ±10% with 95% probability is adopted, the required sample size using an equal probability sampling plan, uncorrected for the finite population, is

$$n_0 = (1.96\,cv_{au}/.10)^2 = 874.8 \text{ buildings},$$

where

$$cv_{au} = \Big(N^{-1}\sum_{I=1}^{N}a_I^2\hat\sigma_I^2\Big)^{1/2}\Big/\Big(N^{-1}\sum_{I=1}^{N}a_I\hat y_I\Big) = 1.509.$$

This is corrected for the finite population size $N = 205$:

$$n_1 = n_0/(1 + n_0/N) = 166.1 \text{ buildings}.$$

The efficiency of this equal probability sampling plan is

$$\text{eff} = (1+cv_{a\sigma}^2)^{-1} = \Big(\sum_{I=1}^{N}a_I\hat\sigma_I\Big)^2\Big/\Big(N\sum_{I=1}^{N}a_I^2\hat\sigma_I^2\Big) = .1612,$$

so that this plan can be greatly improved.

An alternative plan is to select buildings with probability proportional to their size as measured by $x_{I2} + x_{I3}$. Using (4.5), the efficiency of this

plan is approximately

$$\Big(\sum_{I=1}^{N}a_I\hat\sigma_I\Big)^2\Big/\Big[\sum_{I=1}^{N}(x_{I2}+x_{I3})\sum_{I=1}^{N}a_I^2\hat\sigma_I^2\,(x_{I2}+x_{I3})^{-1}\Big] = .2616;$$

so this helps, but not much.

The best plan, in the sense of Theorem 3, is to select units with probability proportional to their relevance for the characteristic of interest, i.e. with probability proportional to $a_I\hat\sigma_I$. The sample size $n_2$ that would be required with this plan can be calculated from the size and efficiency of the equal probability plan:

$$n_2 = (\text{eff})\,n_1 = 26.78 \text{ buildings}.$$

This figure $n_2$ should be regarded as a lower bound that can only be achieved by using the optimal inclusion probabilities of Theorem 3. However, in this case $n_2|a_I|\sigma_I\big/\sum_{J=1}^{N}|a_J|\sigma_J$ exceeds one for the most relevant units, so the optimal rule is infeasible. In this situation the best feasible design is to use 100% sampling for units $M+1$ to $N$ with optimal choice of $\pi_I$ for $I \le M$. Here the units are considered to be in order of increasing relevance, and $M$ is found as follows. Let $v$ be the required value of $v(\hat{a}'y)$:

$$v = n^{-1}\Big(\sum_{I=1}^{M}|a_I|\sigma_I\Big)^2 - \sum_{I=1}^{M}a_I^2\sigma_I^2,$$

since units $M+1,\ldots,N$ contribute no variance. Moreover, for $I = 1,\ldots,M$,

$$\pi_I = n|a_I|\sigma_I\Big/\sum_{J=1}^{M}|a_J|\sigma_J = |a_I|\sigma_I\sum_{J=1}^{M}|a_J|\sigma_J\Big/\Big(\sum_{J=1}^{M}a_J^2\sigma_J^2 + v\Big).$$

In particular, $M$ is the largest unit such that

$$|a_M|\sigma_M\sum_{J=1}^{M}|a_J|\sigma_J\Big/\Big(\sum_{J=1}^{M}a_J^2\sigma_J^2 + v\Big) < 1. \tag{6.5}$$

Using $v = (.10\sum a_I\hat y_I/1.96)^2 = 3.35787\times 10^{10}$, $M$ turns out to be 193. So the best feasible design is to select the 12 most relevant units with certainty, and to select $n_3$ additional units with probability proportional to relevance as in Theorem 3. Here

$$n_3 = \Big(\sum_{I=1}^{M}|a_I|\sigma_I\Big)^2\Big/\Big(\sum_{I=1}^{M}a_I^2\sigma_I^2 + v\Big) = 16.43 \text{ buildings}.$$

This may be raised to 18 so that, together with the 12 certainty units, the sample complies with the convention of using at least 30 observations. In fact, 123 buildings are totally irrelevant in the sense that $a_I = 0$, so these 18 buildings are selected from a very small population of 70 buildings.

It may be convenient to select these eighteen buildings using a stratified sampling plan. Table 6.3 summarizes a preliminary plan with three buildings selected from each of six strata with approximately equal aggregate relevance in each stratum. The asymptotic efficiency of this plan is .87. The last column of Table 6.3 shows that the inefficiency comes mostly from stratum one. Table 6.4 shows a subdivision of stratum one into three strata with one sample building per stratum. With this refinement the asymptotic efficiency is improved to .95, so this stratified plan is almost optimal.

All of this analysis relies on asymptotic approximations which require validation in this and any other application involving small or moderate sample sizes. In specific cases, both bias and mean squared error can be effectively examined through computer simulation of both the finite population and the sample, conditional on an assumed superpopulation model. This

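Table 6.3 A Strongly Stratified Design for the Cost Accounting Example

| Stratum $h$ | Size $N_h$ | Sampling fraction $n_h/N_h$ (%) | $\sum_{I\in S_h}a_I\hat\sigma_I$ ($\times 10^4$) | $\sum_{I\in S_h}a_I^2\hat\sigma_I^2$ ($\times 10^8$) | $(1+cv_h^2)^{-1}$ |
|---|---|---|---|---|---|
| 1 | 43 | 7 | 17.285 | 13.093 | .53 |
| 2 | 9 | 33 | 17.538 | 35.457 | .96 |
| 3 | 7 | 43 | 19.775 | 56.484 | .99 |
| 4 | 4 | 75 | 15.341 | 59.567 | .99 |
| 5 | 4 | 75 | 19.943 | 99.999 | .99 |
| 6 | 3 | 100 | 18.052 | 108.727 | 1.00 |

Table 6.4 A Substratification of Stratum 1

| Stratum $h$ | Size $N_h$ | Sampling fraction $n_h/N_h$ (%) | $\sum_{I\in S_h}a_I\hat\sigma_I$ ($\times 10^4$) | $\sum_{I\in S_h}a_I^2\hat\sigma_I^2$ ($\times 10^8$) | $(1+cv_h^2)^{-1}$ |
|---|---|---|---|---|---|
| 1a | 30 | 3 | 6.085 | 1.778 | .69 |
| 1b | 9 | 11 | 5.895 | 4.264 | .91 |
| 1c | 4 | 25 | 5.305 | 7.050 | 1.00 |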

technique can be used to study both aspects of the strategy: sample design and estimator. Within the accuracy of the asymptotic approximations, a large class of estimators is unbiased and equally efficient, but simulation may reveal important differences in the performance of these estimators with small and moderate samples. This work is underway.

7. SUMMARY AND CONCLUSIONS

Most work in sampling methodology has been directed to survey research, public health, and other fields where auxiliary information is limited, where the study is multipurpose, and where most of the collected information is qualitative. The present work is directed to management applications of sampling in that the study is narrowly focused on one or just a few quantitative variables that are closely related to detailed auxiliary information readily available in an administrative database. This relationship can be exploited to plan efficient data collection and analysis, in particular to determine the required sample size and to determine the most relevant units to be included in the sample on a random basis with varying inclusion probabilities. The optimal sample design can often be well approximated by a one-way stratified sampling plan.

These sampling plans are based on an assumed superpopulation model for the relationship between the target variable and the auxiliary information. However, the proposed strategies utilize estimators which are more conventionally based on the sample design in the sense that they are asymptotically design unbiased even if the assumed model is misspecified. This provides a kind of robustness that is important in many sampling applications.

The proposed methodology for sample design is based entirely on asymptotic approximations which need to be investigated in specific applications involving

small or moderate samples. Simulation can perform this task and perhaps reveal differences in the small-sample distributions of estimators that are asymptotically equivalent in terms of their mean and variance. For larger samples, the generalized regression estimators are expected to perform well.

In many management applications, multivariate regression models are widely and effectively used for data analysis. This paper has offered an approach to data collection which ties directly into these models.
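The sample-size arithmetic of Section 6 can be retraced with a short Python sketch. It is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, the inputs are the three planning statistics reported for each example, and `certainty_cutoff` reads rule (6.5) as "take the largest $M$ for which the inequality holds after sorting units by relevance".

```python
import numpy as np

def planning_sample_sizes(mean_y, mean_rel, rms_rel, N, z, rel_error=0.10):
    """Sample sizes from the planning statistics of Section 6.

    mean_y   : N^-1 sum a_I * yhat_I
    mean_rel : N^-1 sum |a_I| * sigma_I
    rms_rel  : (N^-1 sum a_I^2 sigma_I^2)^(1/2)
    z        : normal quantile (1.645 for 90%, 1.96 for 95%)
    """
    cv_au = rms_rel / mean_y              # residual cv of relevance, (6.4)
    n0 = (z * cv_au / rel_error) ** 2     # equal probability, n/N negligible
    n1 = n0 / (1.0 + n0 / N)              # finite population correction
    eff = (mean_rel / rms_rel) ** 2       # (1 + cv^2)^-1 of the relevance
    n2 = eff * n1                         # best varying probability plan
    return {"cv_au": cv_au, "n0": n0, "n1": n1, "eff": eff, "n2": n2}

def certainty_cutoff(rel, v):
    """Rule (6.5): largest M with pi_M < 1 once units are sorted by relevance.

    Returns (M, n3): the M least relevant units are sampled with probability
    proportional to relevance, the other len(rel) - M units with certainty,
    and n3 is the expected number drawn from the first M units.
    """
    r = np.sort(np.asarray(rel, float))
    s1 = np.cumsum(r)                     # partial sums of relevance
    s2 = np.cumsum(r ** 2)                # partial sums of squared relevance
    pi_last = r * s1 / (s2 + v)           # left side of (6.5) for each candidate M
    feasible = np.nonzero(pi_last < 1.0)[0]
    if feasible.size == 0:
        return 0, 0.0                     # every unit must be taken with certainty
    M = int(feasible[-1]) + 1
    return M, s1[M - 1] ** 2 / (s2[M - 1] + v)

# Planning statistics reported in Sections 6.1 and 6.2.
load = planning_sample_sizes(4353.9, 1278.9, 2322.6, N=840, z=1.645)
cost = planning_sample_sizes(17520.0, 10614.0, 26439.0, N=205, z=1.96)
print(load)
print(cost)
```

Up to rounding, the two calls return the values quoted in the text ($n_0$, $n_1$, eff and $n_2$ for each example). The building-level relevances needed by `certainty_cutoff` are not published, but the stratum sums in Table 6.3 reproduce the reported $n_3$: $(107.934\times 10^4)^2/(3.73327\times 10^{10} + 3.35787\times 10^{10}) \approx 16.43$.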

REFERENCES

Aigner, D. J., (1979), "Bayesian Analysis of Optimal Sample Size and a Best Decision Rule for Experiments in Direct Load Control," Journal of Econometrics, 9, 209-222.

____, and J. A. Hausman, (1980), "Correcting for Truncation Bias in the Analysis of Experiments in Time-of-Day Pricing of Electricity," Bell Journal of Economics, 11, 131-142.

Anderson, D. W., L. Kish and R. G. Cornell, (1980), "On Stratification, Grouping, and Matching," Scandinavian Journal of Statistics, 7, 61-66.

Belsley, D. A., E. Kuh and R. E. Welsch, (1980), Regression Diagnostics, John Wiley & Sons, New York.

Brandenburg, L. and C. E. Higgins, Jr., (1974), "Stratified Random Sampling Methods for Class Load Surveys for Electric Utilities," Applied Statistics for Load Research, Vol. III, Association of Edison Illuminating Companies, New York.

Brewer, K. R. W., (1963), "Ratio Estimation in Finite Populations: Some Results Deducible From the Assumption of an Underlying Stochastic Process," Australian Journal of Statistics, 5, 93-105.

____, (1979), "A Class of Robust Sampling Designs for Large-Scale Surveys," Journal of the American Statistical Association, 74, 911-915.

Cassel, C. M., C. E. Sarndal and J. H. Wretman, (1976), "Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations," Biometrika, 63, 615-620.

____, (1977), Foundations of Inference in Survey Sampling, John Wiley & Sons, New York.

Cochran, W. G., (1961), "Comparison of Methods for Determining Stratum Boundaries," Bulletin of the International Statistical Institute, 38, 345-358.

Dalenius, T. and J. L. Hodges, Jr., (1959), "Minimum Variance Stratification," Journal of the American Statistical Association, 54, 88-101.

Ericson, W. A., (1969), "Subjective Bayesian Models in Sampling Finite Populations, I," Journal of the Royal Statistical Society, B, 31, 195-234.

Fuller, W., (1975), "Regression Analysis for Sample Survey," Sankhya, 37, C, Pt. 3, 117-132.

Hansen, M. H., W. N. Hurwitz and W. G. Madow, (1953), Sample Survey Methods and Theory, Vol. 1, John Wiley & Sons, New York.

____, W. G. Madow and B. J. Tepping, (1978), "On Inference and Estimation from Sample Surveys," Proceedings of the Survey Research Section, American Statistical Association, 82-107.

Harvey, A. C., (1976), "Estimating Regression Models with Multiplicative Heteroscedasticity," Econometrica, 44, 461-464.

Hocking, R. R., (1976), "The Analysis and Selection of Variables in Linear Regression," Biometrics, 32, 1-49.

Holt, D. and T. M. F. Smith, (1979), "Post Stratification," Journal of the Royal Statistical Society, A, 142, Part 1, 33-46.

Jonrup, H. and B. Rennermalm, (1976), "Regression Analysis in Samples from Finite Populations," Scandinavian Journal of Statistics, 3, 33-36.

Kish, L. and M. R. Frankel, (1974), "Inference from Complex Surveys," Journal of the Royal Statistical Society, B, 36, 1-37.

Konijn, H. S., (1962), "Regression Analysis in Sample Surveys," Journal of the American Statistical Association, 57, 590-606.

____, (1973), Statistical Theory of Sample Survey Design and Analysis, North Holland, Amsterdam and American Elsevier, New York.

Raj, D., (1965), "On a Method of Using Multi-Auxiliary Information in Sample Surveys," Journal of the American Statistical Association, 60, 270-277.

Rao, C. R., (1973), Linear Statistical Inference and Its Applications, Second Edition, John Wiley & Sons, New York.

Rao, T. J., (1977), "Optimum Allocation of Sample Size and Prior Distributions: a Review," International Statistical Review, 45, 173-179.

Royall, R. M., (1970), "On Finite Population Sampling Theory Under Certain Linear Regression Models," Biometrika, 57, 377-387.

____, (1971), "Linear Regression Models in Finite Population Sampling Theory," Foundations of Statistical Inference, V. P. Godambe and D. A. Sprott (eds.), Holt, Rinehart & Winston, Toronto.

____, (1976), "The Linear Least Squares Prediction Approach to Two-Stage Sampling," Journal of the American Statistical Association, 71, 657-664.

____, and J. Herson, (1973a), "Robust Estimation in Finite Populations," Journal of the American Statistical Association, 68, 880-889.

____, (1973b), "Robust Estimation in Finite Populations, II: Stratification on a Size Variable," Journal of the American Statistical Association, 68, 891-893.

Sarndal, C. E., (1980), "On π-Inverse Weighting Versus Best Linear Unbiased Weighting in Probability Sampling," to appear in Biometrika.

Scott, A. J., K. R. W. Brewer and E. W. H. Ho, (1978), "Finite Population Sampling and Robust Estimation," Journal of the American Statistical Association, 73, 359-361.

____, and T. M. F. Smith, (1969), "Estimation in Multistage Surveys," Journal of the American Statistical Association, 64, 830-840.

Singh, R., (1971), "Approximately Optimal Stratification on the Auxiliary Variable," Journal of the American Statistical Association, 66, 829-30.

____, (1975), "On Optimal Stratification for Proportional Allocation," Sankhya, 37, C, Pt. 1, 109-115.

Smith, T. M. F., (1976), "The Foundations of Survey Sampling, A Review," Journal of the Royal Statistical Society, A, 139, Part 2, 183-204.

Taylor, L. D., (1977), "On Modeling the Residential Demand for Electricity by Time-of-Day," in Forecasting and Modeling Time-of-Day and Seasonal Electricity Demands, Electric Power Research Institute, Palo Alto, CA.