Bureau of Business Research
Graduate School of Business Administration
University of Michigan
March, 1971

SOME NEW STATISTICAL METHODS FOR ANALYZING INCOMPLETE BRAND PREFERENCE DATA

WORKING PAPER NO. 28

by

M. J. Karson
Assistant Professor of Statistics
University of Michigan

W. J. Wrobleski
Associate Professor of Statistics
University of Michigan

FOR DISCUSSION PURPOSES ONLY

None of this material is to be quoted or reproduced without the express permission of the Bureau of Business Research.

BACKGROUND OF THIS PAPER

This paper presents several new research findings in statistics developed through nonsponsored faculty research activities in the University of Michigan Graduate School of Business Administration. The primary goal of the authors is to extend mathematical methods of statistics to data-analytic problems that occur in other functional fields of business administration. Thus the authors intend both to contribute to the theory of statistics and to provide the active business manager with more powerful statistical aids for carrying out quantitative decision analyses of practical business problems involving incomplete information and uncertainty. The authors wish to acknowledge the assistance of Mr. W. E. Miklas in preparing Figures 1-6 and with other helpful computations. Mr. Miklas' help was partially supported by the Faculty Assistance Fund of the University of Michigan. Since this paper is to be submitted for publication elsewhere, its format differs from that usually prescribed by the Bureau.

ABSTRACT

In this paper a probability model is introduced for sampling data, frequently encountered in practice, in which some observations have not been classified into existing population categories and are therefore only partially informative. In terms of this sampling model, which involves a certain conditional probability appearing as an allocation parameter, the problem of estimating various underlying population parameters is discussed with respect to identifiability, consistency, and unbiased estimation. Several estimators related to constrained maximum likelihood estimation and minimum average (weighted) bias estimation are derived and their properties discussed.

1. Introduction

This paper is concerned with statistical inference in sampling situations where a response actually belongs in one of a number of mutually exclusive population categories but, when the sampling is performed, other categories not included in the original set appear. Consider, for example, the following typical marketing problem. We want to estimate, by means of a two-brand preference study, the proportions of potential product users who prefer Brand A, who prefer Brand B, or who have no preference. A random sample of size $n$ is chosen from the population, and each sampled individual, after using the products, is asked to state whether he prefers A, prefers B, or has no preference. The philosophical question raised by Odesky [5] of permitting a "no preference" category as an easy way out for the respondent is of no concern here. What is of concern is that new categories arise upon sampling. For example, some people are not at home at the time of the interview, and others refuse to answer. The former often results in costly and frequently wasteful call-back procedures, while the latter often occurs when income classifications are needed. A survey of the brand preferences of 200 individuals might yield the following data:

Category          Number of people
Brand A                  79
Brand B                  66
No preference             7
Not at home              28
Refusal                  20
Total                   200

The problem of analyzing such data for the purpose of statistical inference, particularly for estimating the population proportions for Brand A, Brand B, and "no preference," is apparent. A probability model describing, in general, brand preference market surveys which involve $I$ population categories and $J$ new sampling categories can be developed. Consider a random sample of size $n$ drawn from a multinomial population with $I$ categories $K_1, K_2, \ldots, K_I$ and corresponding probabilities $p_1, p_2, \ldots, p_I$, with $\sum_{i=1}^{I} p_i = 1$. Suppose $m_i$ of the $n$ sample observations fall in $K_i$, $i = 1, 2, \ldots, I$, and, for various reasons, $J$ new categories $K_1^*, K_2^*, \ldots, K_J^*$ arise, with $m_j^*$ observations in $K_j^*$. These new categories are called partially informative categories, since an observation in $K_j^*$ actually belongs in one of the $K_i$ but has not been classified in its proper category. Denote by $\lambda_{ij}$ the probability that a response observed in $K_j^*$ actually belongs in $K_i$; in other words, $\lambda_{ij} = P(K_i \mid K_j^*)$. Furthermore, let $p_j^*$ be the probability that an observation falls in $K_j^*$, or $p_j^* = P(K_j^*)$.

In the brand preference example, there are $I = 3$ population categories (Brand A, Brand B, and no preference). Upon sampling, $J = 2$ new sampling categories arise ("not at home" and "refusal to answer"). Schematically this situation may be represented as follows:

Population categories (I = 3):   Brand A, Brand B, No preference; probabilities $p_1, p_2, p_3$
Sampling categories (J = 2):     Not at home, Refusal to answer; probabilities $p_1^*, p_2^*$
Allocation parameters:           $(\lambda_{11}, \lambda_{21}, \lambda_{31})$ for "not at home"; $(\lambda_{12}, \lambda_{22}, \lambda_{32})$ for "refusal"

Observed cell probabilities:
Brand A:             $p_1 - \sum_{j=1}^{2} \lambda_{1j} p_j^*$
Brand B:             $p_2 - \sum_{j=1}^{2} \lambda_{2j} p_j^*$
No preference:       $p_3 - \sum_{j=1}^{2} \lambda_{3j} p_j^*$
Not at home:         $p_1^*$
Refusal to answer:   $p_2^*$

Thus the cell probabilities for the $I + J = 5$ categories which actually occur in the market survey explicitly display the allocation parameters $\lambda_{ij}$ related to the two partially informative sampling categories. In general, if we let $\mathbf{p} = (p_1, \ldots, p_I)$, $\mathbf{p}^* = (p_1^*, \ldots, p_J^*)$, $\boldsymbol{\lambda} = (\lambda_{11}, \lambda_{12}, \ldots, \lambda_{IJ})$, and $\theta = (\mathbf{p}, \mathbf{p}^*, \boldsymbol{\lambda})$, then the sampling model is given by

$$f(\mathbf{m}, \mathbf{m}^*; \theta) = \frac{n!}{\prod_{i=1}^{I} m_i! \, \prod_{j=1}^{J} m_j^*!} \prod_{i=1}^{I} \Big(p_i - \sum_{j=1}^{J} \lambda_{ij} p_j^*\Big)^{m_i} \prod_{j=1}^{J} (p_j^*)^{m_j^*}, \tag{1.1}$$

where $\mathbf{m} = (m_1, \ldots, m_I)$ and $\mathbf{m}^* = (m_1^*, \ldots, m_J^*)$ are the data vectors of classified and partially classified observations. Moreover, the parameters in (1.1) are subject to the restrictions

$$\sum_{i=1}^{I} \lambda_{ij} = 1 \ (j = 1, \ldots, J), \qquad \sum_{j=1}^{J} \lambda_{ij} p_j^* < p_i < 1, \qquad 0 < \sum_{j=1}^{J} p_j^* < 1. \tag{1.2}$$

The model is simply a multinomial distribution for the observed sampling categories, with the usual restriction that the category probabilities for the $I + J$ cells sum to one. In other words, if $K$ is the event "partially informative," or $K = \bigcup_{j=1}^{J} K_j^*$, then $p_i - \sum_{j=1}^{J} \lambda_{ij} p_j^*$ is the probability of the event "$K_i$ and not $K$."
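The following minimal Python sketch evaluates (1.1) for the brand preference counts of the Introduction. The function names `cell_probabilities` and `log_likelihood` and all parameter values are illustrative assumptions, not quantities from the paper; the sketch only shows how the allocation parameters $\lambda_{ij}$ remove probability from the classified cells $K_i$ and how the $I + J$ cell probabilities sum to one under (1.2).

```python
from math import lgamma, log

def cell_probabilities(p, p_star, lam):
    """Observed-cell probabilities of model (1.1).

    p      -- population probabilities p_1..p_I (summing to 1)
    p_star -- probabilities p*_1..p*_J of the partially informative cells
    lam    -- lam[i][j] = P(K_i | K*_j); each column sums to 1
    """
    I, J = len(p), len(p_star)
    # P(K_i and not K) = p_i - sum_j lam_ij p*_j
    classified = [p[i] - sum(lam[i][j] * p_star[j] for j in range(J)) for i in range(I)]
    probs = classified + list(p_star)
    # Restrictions (1.2): every observed cell must have positive probability.
    if any(q <= 0 for q in probs):
        raise ValueError("parameters violate restrictions (1.2)")
    return probs

def log_likelihood(m, m_star, p, p_star, lam):
    """Log of the multinomial density (1.1) for classified counts m and counts m_star."""
    counts = list(m) + list(m_star)
    probs = cell_probabilities(p, p_star, lam)
    n = sum(counts)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return log_coef + sum(c * log(q) for c, q in zip(counts, probs))

# Survey counts: Brand A, Brand B, no preference | not at home, refusal.
m, m_star = [79, 66, 7], [28, 20]
# Hypothetical parameter values chosen only for illustration.
p = [0.5, 0.44, 0.06]
p_star = [0.14, 0.10]
lam = [[0.5, 0.4], [0.4, 0.5], [0.1, 0.1]]

probs = cell_probabilities(p, p_star, lam)
print([round(q, 3) for q in probs], round(sum(probs), 10))
print(log_likelihood(m, m_star, p, p_star, lam))
```

Maximizing this log-likelihood over all of $\theta$ is not well posed in general, which is the identifiability issue taken up in Section 3.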

This formulation of the general model is distinct from other approaches which have been proposed for analyzing related kinds of data. In particular, it differs from multinomial sampling with misclassification, as in Bryson [2], for example. It also differs from the assumption of Draper, Hunter, and Tierney [3] that the population categories themselves include the new partially informative sample categories; that approach does not explicitly introduce allocation parameters for the nonpopulation categories which arise through the sampling process. Finally, it differs from a model of Blumenthal [1] for "nested" sampling data, where the observed measurements are known to fall into certain primary categories but cannot be further subclassified into secondary categories. The model formulation in the present paper enables the marketing analyst to recognize explicitly that different patterns of preference behavior might be associated with the different partially informative categories, and that these distinctions can and should be maintained whenever one allocates these partially informative responses back to the original cells. The distinctive feature of the model is the explicit use of the conditional probability parameters $\lambda_{ij}$, which serve as basic allocation parameters from the partially informative categories back into the original population categories.

2. Binomial Model with One Partially Informative Category

In this paper we are concerned primarily with the estimation of $p_1$ in the case when (1.1) corresponds to a binomial model with two

informative categories, $K_1$ and $K_2$, and a single partially informative category, $K_1^* = K$. For example, a brand preference study might involve only two brands, lack a "no preference" category, and restrict sampling to consumers who have a preference between the two products. In the course of sampling, however, a single partially informative category occurs, such as "refusal to specify a preference," "can't be reached for interview," or some other kind of nonresponse. Consider (1.1) when there are two informative categories with one partially informative category occurring in the sample, that is, when $I = 2$, $J = 1$, $\lambda_{11} = \lambda$, $\lambda_{21} = 1 - \lambda$, $p_1^* = p_0$, and $m_1^* = m_0$. Then the sampling model for a binomial population in which one partially informative sampling category occurs is simply

$$f(m_1, m_2, m_0; \theta) = \frac{n!}{m_1! \, m_2! \, m_0!} (p_1 - \lambda p_0)^{m_1} \big(1 - p_1 - (1 - \lambda) p_0\big)^{m_2} p_0^{m_0}, \tag{2.1}$$

where the parameter $\theta = (p_1, p_0, \lambda)$ is restricted to the admissible parameter space

$$\Theta = \{(p_1, p_0, \lambda) : 0 < p_0, p_1, \lambda < 1, \ \lambda p_0 < p_1 < \lambda p_0 + (1 - p_0)\}. \tag{2.2}$$

3. Constrained Maximum Likelihood Estimation under Identifiability Conditions

The sampling model (2.1) is unidentifiable with respect to $\theta = (p_1, p_0, \lambda)$, since different values of $\theta$ may correspond to the same sampling density. In fact, of the three parameters, only $p_0$ is identifiable. One consequence is that there is no consistent estimator of $p_1$ or of $\lambda$. Furthermore, the maximum likelihood equations $\partial f/\partial p_1 = 0$, $\partial f/\partial p_0 = 0$, and $\partial f/\partial \lambda = 0$ are not linearly independent, and
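The unidentifiability of (2.1) can be checked exactly. In the Python sketch below, which uses hypothetical parameter values and hypothetical helper names, two different admissible points $\theta$ produce identical cell probabilities and hence identical densities for every data set. As an aside that follows directly from (2.1), any $\lambda$ in $(0, 1)$ paired with $p_1 = (p_1 - \lambda p_0) + \lambda p_0$ reproduces the same cells, so the data can at best locate $p_1$ between $p_1 - \lambda p_0$ and $p_1 - \lambda p_0 + p_0$.

```python
from fractions import Fraction as F

def binomial_cells(p1, p0, lam):
    """Cell probabilities of (2.1): P(K_1 and not K), P(K_2 and not K), P(K)."""
    return (p1 - lam * p0, 1 - p1 - (1 - lam) * p0, p0)

def equivalent_p1(cells, lam):
    """Value of p1 that reproduces the given cells for a chosen lambda."""
    q1, _, p0 = cells
    return q1 + lam * p0

theta_a = (F(1, 2), F(3, 10), F(1, 2))   # (p1, p0, lambda), hypothetical
theta_b = (F(2, 5), F(3, 10), F(1, 6))
cells = binomial_cells(*theta_a)
print(cells == binomial_cells(*theta_b))  # True: same density, different theta

# Every lambda in (0, 1) gives an observationally equivalent theta, so here p1
# can only be located in the interval (q1, q1 + p0) = (7/20, 13/20).
for lam in (F(1, 10), F(1, 2), F(9, 10)):
    p1 = equivalent_p1(cells, lam)
    assert binomial_cells(p1, cells[2], lam) == cells
    print(lam, p1)
```

Exact rational arithmetic is used so that the equality of the two densities is not an artifact of floating-point rounding.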

there are infinitely many solutions $\hat\theta = (\hat p_1, \hat p_0, \hat\lambda)$. In this section we assume certain a priori knowledge about the parameters of (2.1) which makes the model identifiable, and we find the constrained maximum likelihood estimators of $p_1$. The following four cases are considered:

1. When $p_0$ and $\lambda$ are known
2. When $\lambda$ alone is known
3. When $\lambda$ is unknown and $\lambda = p_1$
4. When $\lambda/(1 - \lambda) = \delta (p_1/p_2)$ with $\delta > 0$ known

When $p_0$ and $\lambda$ are known, the single maximum likelihood equation for $p_1$ has the unique solution

$$\hat p_1 = \lambda p_0 + \frac{m_1}{m_1 + m_2} (1 - p_0), \tag{3.1}$$

provided not all observations are unclassified, namely, provided $m_0 \neq n$. Noting that

$$P(K_1) = P(K_1 \mid K) P(K) + P(K_1 \mid \bar K) P(\bar K), \tag{3.2}$$

where $\bar K$ is the complement of $K$, or the union of the informative categories, the maximum likelihood estimator of $p_1$ simply estimates $P(K_1 \mid \bar K)$ by $m_1/(m_1 + m_2)$. In this case $\hat p_1$ is the uniformly minimum variance unbiased estimator of $p_1$.

When $\lambda$ alone is known, the maximum likelihood equations yield the solutions

$$\hat p_0 = \frac{m_0}{n}, \qquad \hat p_1 = \frac{m_1 + \lambda m_0}{n}. \tag{3.3}$$

For these estimators, $E(\hat p_0) = p_0$ and $E(\hat p_1) = p_1$. In addition, $(\hat p_0, \hat p_1)$ is a complete sufficient statistic for $(p_0, p_1)$, and consequently $\hat p_0$ and $\hat p_1$ are the uniformly minimum variance unbiased estimators of $p_0$ and $p_1$.
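The first two cases can be sketched in a few lines of Python with exact rational arithmetic. The helper names are hypothetical, and the known values $\lambda = 1/4$ and $p_0 = 3/10$ are assumptions chosen only for illustration; the counts are those of the survey example used later in the paper. The last helper sums $\hat p_1$ from (3.3) over every outcome of (2.1) to confirm that $E(\hat p_1) = p_1$ exactly.

```python
from fractions import Fraction as F
from math import factorial

def p1_hat_case1(m1, m2, m0, p0, lam):
    """Estimator (3.1): p0 and lambda known; requires m0 < n."""
    if m1 + m2 == 0:
        raise ValueError("all observations unclassified (m0 = n)")
    return lam * p0 + F(m1, m1 + m2) * (1 - p0)

def estimates_case2(m1, m2, m0, lam):
    """Estimators (3.3): lambda known. Returns (p1_hat, p0_hat)."""
    n = m1 + m2 + m0
    return F(m1, n) + lam * F(m0, n), F(m0, n)

def exact_mean_case2(n, p1, p0, lam):
    """E(p1_hat) for (3.3), summing over all trinomial outcomes of (2.1)."""
    q1, q2 = p1 - lam * p0, 1 - p1 - (1 - lam) * p0
    total = F(0)
    for m1 in range(n + 1):
        for m0 in range(n + 1 - m1):
            m2 = n - m1 - m0
            coef = factorial(n) // (factorial(m1) * factorial(m2) * factorial(m0))
            total += coef * q1**m1 * q2**m2 * p0**m0 * estimates_case2(m1, m2, m0, lam)[0]
    return total

# Counts 20, 48, 32 with hypothetical known lambda = 1/4 and p0 = 3/10.
print(p1_hat_case1(20, 48, 32, F(3, 10), F(1, 4)))
print(estimates_case2(20, 48, 32, F(1, 4)))
print(exact_mean_case2(5, F(1, 2), F(3, 10), F(1, 4)))  # 1/2, i.e. p1
```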

This estimator of $p_1$ weights the observed proportion of partially informative observations by $\lambda$ and adds this allocated fraction to the observed proportion of classified observations in $K_1$. The condition $\lambda = p_1$ means that whether an observation belongs to one population category or the other is independent of whether or not it is a partially classified observation. In this case the maximum likelihood estimators of $p_0$ and $p_1$ are

$$\hat p_0 = \frac{m_0}{n} \quad \text{and} \quad \hat p_1 = \frac{m_1}{m_1 + m_2}. \tag{3.4}$$

Again, these estimators are the uniformly minimum variance unbiased estimators of $p_0$ and $p_1$. In the fourth case, the ratio of the conditional category probabilities is assumed to be proportional to the ratio of the unconditional category probabilities, with a known proportionality constant $\delta$. The previous assumption, $\lambda = p_1$, is equivalent to $\delta = 1$. In general, when $\delta \neq 1$, the maximum likelihood equations are quadratic in $p_1$, and their solutions lead to the estimators

$$\hat p_0 = \frac{m_0}{n} \quad \text{and} \quad \hat p_1 = \frac{A - [A^2 - 4 n m_1 (1 - \delta)]^{1/2}}{2n(1 - \delta)}, \tag{3.5}$$

where $A = m_1 + \delta m_2 + (1 - \delta) n$. For $\delta$ between 0 and 1, the numerical values of this estimator of $p_1$ are bounded by $m_1/n$, the maximum likelihood estimator of $p_1$ in a model with no partially informative categories, and $m_1/(m_1 + m_2)$, the maximum likelihood estimator of $p_1$ when $\lambda = p_1$.

4. Unbiased Estimation of $p_1$ without Identifiability Conditions

As discussed in Section 3, no consistent estimator of $p_1$ exists in (2.1) without a priori identifiability conditions.
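As a check on (3.5), and not as the authors' own derivation: solving the fourth-case assumption for $\lambda$ gives $\lambda = \delta p_1/(p_2 + \delta p_1)$, and setting the fitted cell probability $p_1 - \lambda p_0$ equal to $m_1/n$ with $\hat p_0 = m_0/n$ yields the quadratic $n(1 - \delta)p_1^2 - A p_1 + m_1 = 0$, of which (3.5) is the root with the minus sign. The Python sketch below (hypothetical helper names; the $\delta$ values are arbitrary) evaluates cases 3 and 4 on the survey counts used later in the paper and checks the stated bounds for $0 < \delta < 1$.

```python
from math import sqrt

def estimates_case3(m1, m2, m0):
    """Estimators (3.4), lambda = p1. Returns (p1_hat, p0_hat)."""
    return m1 / (m1 + m2), m0 / (m1 + m2 + m0)

def estimates_case4(m1, m2, m0, delta):
    """Estimators (3.5): lambda/(1 - lambda) = delta * p1/p2 with delta > 0 known."""
    n = m1 + m2 + m0
    p0_hat = m0 / n
    if delta == 1:                       # delta = 1 is case 3
        return estimates_case3(m1, m2, m0)
    A = m1 + delta * m2 + (1 - delta) * n
    # root of n(1 - delta) p1^2 - A p1 + m1 = 0 with the minus sign, as in (3.5)
    p1_hat = (A - sqrt(A * A - 4 * n * m1 * (1 - delta))) / (2 * n * (1 - delta))
    return p1_hat, p0_hat

m1, m2, m0 = 20, 48, 32
n = m1 + m2 + m0
for delta in (0.001, 0.25, 0.5, 0.75, 0.999):
    p1_hat, _ = estimates_case4(m1, m2, m0, delta)
    # for 0 < delta < 1 the text bounds the estimate by m1/n and m1/(m1 + m2)
    assert m1 / n <= p1_hat <= m1 / (m1 + m2)
    print(delta, round(p1_hat, 4))
```

As $\delta$ moves from 0 toward 1, the estimate moves from $m_1/n$ toward the case 3 value $m_1/(m_1 + m_2)$.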

Besides consistency, the question of unbiased estimation of $p_1$ also arises. An unbiased estimator of $p_1$, or of any polynomial of finite degree in $p_1$, does not exist. This follows since the expected value of any function of the data contains multinomial terms of nonzero degree in $\lambda$ and $p_0$ with nonzero coefficients. Nor can the ratios $p_1/p_2$ or $p_2/p_1$ be estimated without bias. Indeed, there does not exist any unbiased estimator for any function $h(p_1, p_0, \lambda)$ of the three parameters $p_1$, $p_0$, and $\lambda$ which is partially differentiable in $p_1$ and for which $\partial h/\partial p_1 \neq 0$ when evaluated at $\lambda = 1$ and $p_0 = 1$.

Alternatively, since unbiased estimators of $p_1$ do not exist, one might consider families of estimators having given parametric forms for their bias and seek an estimator whose variance achieves the Cramér-Rao lower bound. Since the model (2.1) is unidentifiable, however, it can be shown that the information matrix appearing in the Cramér-Rao lower bound is singular, and consequently no unique lower bound of the Cramér-Rao type exists.

5. Mean Square Error Comparisons of Two Estimators

In this section we consider the general problem of estimating $p_1$ when no a priori knowledge of any form about the parameters is available. We proceed by developing certain canonical estimators of $p_1$ and comparing them in terms of their mean square errors, with particular attention to the bias component. Specifically, we compare a so-called "natural" estimator with an estimator selected, by a statistical optimality criterion, as a canonical representative of a family of potential estimators of $p_1$.
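To illustrate the singularity claim, the following Python sketch computes the Fisher information of (2.1) with respect to $\theta = (p_1, p_0, \lambda)$ from the standard multinomial formula $I(\theta) = n \sum_k \nabla q_k \nabla q_k^{T} / q_k$, where the $q_k$ are the three cell probabilities. The parameter point and helper names are hypothetical. The determinant is exactly zero, and the null direction $(p_0, 0, 1)$ moves $p_1$ and $\lambda$ together while keeping $p_1 - \lambda p_0$ fixed, which is precisely the unidentifiability noted in Section 3.

```python
from fractions import Fraction as F

def fisher_information(p1, p0, lam, n=1):
    """Fisher information of (2.1) with respect to theta = (p1, p0, lambda)."""
    # (cell probability q_k, gradient of q_k with respect to (p1, p0, lambda))
    cells = [
        (p1 - lam * p0, (1, -lam, -p0)),
        (1 - p1 - (1 - lam) * p0, (-1, -(1 - lam), p0)),
        (p0, (0, 1, 0)),
    ]
    return [[n * sum(F(g[r] * g[c]) / q for q, g in cells) for c in range(3)]
            for r in range(3)]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

p1, p0, lam = F(1, 2), F(3, 10), F(1, 2)   # hypothetical admissible point
info = fisher_information(p1, p0, lam, n=100)
null_dir = (p0, 0, 1)
print(det3(info))                                              # 0
print([sum(info[r][c] * null_dir[c] for c in range(3)) for r in range(3)])
```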

In many brand preference studies leading to (2.1) for the probability density of the data, a common procedure for estimating $p_1$ is to ignore the $m_0$ observations falling in $K$ and to estimate $p_1$ by the "natural" estimator

$$\hat p_1 = \frac{m_1}{m_1 + m_2}, \quad \text{when } m_0 < n. \tag{5.1}$$

For example, suppose that in a sample of $n = 100$ respondents, 20 expressed a preference for Brand A, 48 for Brand B, and 32 did not express their preference. For these data, the estimated proportion of buyers in the population preferring Brand A to Brand B would be $\hat p_1 = 20/68 = 0.294$ using (5.1). The estimator (5.1) is widely used since it would be the uniformly minimum variance unbiased estimator if the $m_0$ observations belonging to category $K$ were ignored and the original fixed sample size $n$ were replaced by the reduced sample size $n - m_0 = m_1 + m_2$. If the estimator (5.1) is expressed in the form

$$\hat p_1 = \frac{1}{n}\Big(m_1 + \frac{m_1}{m_1 + m_2} m_0\Big), \tag{5.2}$$

one finds that the same estimator is also obtained when the $m_0$ partially informative observations in category $K$ are allocated to the categories $K_1$ and $K_2$ in proportion to the sample proportions $m_1/(m_1 + m_2)$ and $m_2/(m_1 + m_2)$ observed in these categories. Recall that the estimator (5.1) or, equivalently, (5.2) would be the constrained maximum likelihood estimator of $p_1$ if it were assumed that $p_1 = \lambda$. In other words, when the conditional probability that an observation belongs in $K_1$, given that it is observed in $K$, is the same as the unconditional probability that it
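The arithmetic of the example, and the equivalence of forms (5.1) and (5.2), can be checked with a few lines of Python; the function names are hypothetical.

```python
def natural_estimator(m1, m2, m0):
    """'Natural' estimator (5.1): ignore the m0 partially informative responses."""
    if m1 + m2 == 0:
        raise ValueError("undefined when m0 = n")
    return m1 / (m1 + m2)

def proportional_allocation(m1, m2, m0):
    """Form (5.2): allocate the m0 responses to K_1 in proportion m1/(m1 + m2)."""
    n = m1 + m2 + m0
    return (m1 + (m1 / (m1 + m2)) * m0) / n

m1, m2, m0 = 20, 48, 32   # the example in the text (n = 100)
print(round(natural_estimator(m1, m2, m0), 3))        # 0.294
print(round(proportional_allocation(m1, m2, m0), 3))  # 0.294
```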

belongs in $K_1$, constrained maximum likelihood estimation yields the estimator (5.1). This may be an unreasonable assumption in many applications. Indeed, the marketing analyst who uses (5.1) is implicitly assuming that whether a respondent prefers Brand A to Brand B or Brand B to Brand A is independent of whether or not the respondent is willing to reveal his preference. Under this assumption, of course, allocating the responses in the undecided category according to the proportions of respondents expressing a preference is a reasonable estimation procedure.

The expected value of $\hat p_1$ in (5.1) is

$$E(\hat p_1) = \frac{p_1 - \lambda p_0}{1 - p_0}, \tag{5.3}$$

while its squared bias, denoted by $B(\hat p_1)$, is

$$B(\hat p_1) = \Big[\frac{p_0 (\lambda - p_1)}{1 - p_0}\Big]^2. \tag{5.4}$$

The variance of $\hat p_1$ can be expressed as

$$\mathrm{Var}(\hat p_1) = \Big(\frac{p_1 - \lambda p_0}{1 - p_0}\Big)\Big(1 - \frac{p_1 - \lambda p_0}{1 - p_0}\Big) E\big((m_1 + m_2)^{-1} \mid m_1 + m_2 > 0\big). \tag{5.5}$$

Since the conditional expectation in (5.5) can only be represented as a finite sum and cannot be expressed in closed form in terms of $p_1$, $p_0$, and $\lambda$, we use an approximation for inverse binomial moments suggested by Mendenhall and Lehman [4] to obtain a closed-form approximation (5.6) to $\mathrm{Var}(\hat p_1)$. We observe that for large $n$, $\mathrm{Var}(\hat p_1)$ approaches zero, but $B(\hat p_1)$, which does not depend on $n$, does not. Hence for large $n$ it becomes more important to consider the bias contribution to the mean square error of the "natural" estimator (5.1).
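Because the conditional expectation in (5.5) is a finite sum, it can simply be evaluated numerically. The Python sketch below does this directly instead of using the Mendenhall-Lehman approximation (a deliberate substitution, not the paper's method), and it cross-checks (5.3) and (5.5) against brute-force enumeration of (2.1) for a small $n$. The parameter values and helper names are hypothetical; all moments are conditional on $m_1 + m_2 > 0$, where (5.1) is defined, and the finite sum is intended for moderate $n$ (here at most 500).

```python
from math import comb, factorial

def natural_bias(p1, p0, lam):
    """E(p1_hat) - p1 for the natural estimator, from (5.3)."""
    return (p1 - lam * p0) / (1 - p0) - p1

def inverse_moment(n, q):
    """E(1/N | N > 0) for N ~ Binomial(n, q), evaluated as a finite sum."""
    s = sum(comb(n, k) * q**k * (1 - q)**(n - k) / k for k in range(1, n + 1))
    return s / (1 - (1 - q)**n)

def natural_variance(n, p1, p0, lam):
    """Var(p1_hat) from (5.5); m1 + m2 ~ Binomial(n, 1 - p0)."""
    pi = (p1 - lam * p0) / (1 - p0)          # P(K_1 | not K)
    return pi * (1 - pi) * inverse_moment(n, 1 - p0)

def brute_force(n, p1, p0, lam):
    """Mean and variance of m1/(m1 + m2) given m1 + m2 > 0, by enumerating (2.1)."""
    q1, q2 = p1 - lam * p0, 1 - p1 - (1 - lam) * p0
    w = s1 = s2 = 0.0
    for m1 in range(n + 1):
        for m2 in range(n + 1 - m1):
            if m1 + m2 == 0:
                continue
            m0 = n - m1 - m2
            pr = (factorial(n) / (factorial(m1) * factorial(m2) * factorial(m0))
                  * q1**m1 * q2**m2 * p0**m0)
            x = m1 / (m1 + m2)
            w, s1, s2 = w + pr, s1 + pr * x, s2 + pr * x * x
    mean = s1 / w
    return mean, s2 / w - mean**2

p1, p0, lam = 0.3, 0.3, 0.6                  # hypothetical values with lambda != p1
mean, var = brute_force(8, p1, p0, lam)
print(mean - p1, natural_bias(p1, p0, lam))   # agree
print(var, natural_variance(8, p1, p0, lam))  # agree
for n in (20, 200, 500):
    # the variance shrinks with n while the squared bias (5.4) stays fixed
    print(n, natural_variance(n, p1, p0, lam), natural_bias(p1, p0, lam) ** 2)
```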

Recalling that, when $\lambda$ is known, the minimum variance unbiased estimator of $p_1$ is simply $\hat p_1 = (m_1 + \lambda m_0)/n$ suggests investigating estimators of the form

$$\hat p_1(a) = \frac{m_1 + a m_0}{n}, \tag{5.7}$$

where $a$ is chosen to satisfy a statistical optimality criterion related to the variance and bias of $\hat p_1(a)$. Since $E(\hat p_1(a)) = p_1 + p_0(a - \lambda)$, the squared bias of $\hat p_1(a)$ is

$$B(\hat p_1(a)) = p_0^2 (a - \lambda)^2. \tag{5.8}$$

The variance of $\hat p_1(a)$ is given by

$$\mathrm{Var}(\hat p_1(a)) = \frac{1}{n}\big[a^2 p_0 (1 - p_0) - 2 a p_0 (p_1 - \lambda p_0) + (p_1 - \lambda p_0)(p_2 + \lambda p_0)\big], \tag{5.9}$$

while its mean square error is $\mathrm{MSE}(\hat p_1(a)) = B(\hat p_1(a)) + \mathrm{Var}(\hat p_1(a))$. One criterion for choosing $a$, which leads to a canonical representative of this family of estimators, is to minimize an average weighted mean square error over the admissible parameter space $\Theta$ given by (2.2). In other words, if $\omega(\theta)$ is some weighting function defined over the admissible parameter space, then $a$ is chosen to satisfy

$$\min_a \int_0^1 \int_0^1 \int_{\lambda p_0}^{1 - (1 - \lambda) p_0} \mathrm{MSE}(\hat p_1(a)) \, \omega(p_1, p_0, \lambda) \, dp_1 \, dp_0 \, d\lambda. \tag{5.10}$$

Assuming $\omega(\theta)$ is continuous with continuous partial derivatives over $\Theta$, one may differentiate with respect to $a$ under the integral in (5.10). Now $\mathrm{MSE}(\hat p_1(a))$ is a quadratic in $a$ of the form $\alpha a^2 + \beta a + \gamma$, whose coefficients $\alpha$, $\beta$, and $\gamma$ depend on $\theta$. Using (5.8) and (5.9), these coefficients are found to be

$$\alpha(\theta) = p_0^2 + \frac{p_0 (1 - p_0)}{n}, \qquad \beta(\theta) = -2\Big(\lambda p_0^2 + \frac{p_0 (p_1 - \lambda p_0)}{n}\Big), \qquad \gamma(\theta) = \lambda^2 p_0^2 + \frac{(p_1 - \lambda p_0)(p_2 + \lambda p_0)}{n}.$$

Thus the value $a^*$ of $a$ which minimizes this average weighted mean square error criterion is

$$a^* = -\frac{\int_\Theta \beta(\theta) \, \omega(\theta) \, d\theta}{2 \int_\Theta \alpha(\theta) \, \omega(\theta) \, d\theta}. \tag{5.11}$$

Applying the criterion (5.10) to the special case of the weighting function $\omega(\theta) = 1$ leads to the stationary value $a^* = 0.5$ as the minimizing value, and the corresponding canonical estimator in the family (5.7) is

$$\hat p_1 = \frac{m_1 + 0.5 m_0}{n} = \hat p_1(0.5). \tag{5.12}$$

This equal allocation of the responses in one partially informative sampling category to the two population categories is of special interest in brand preference analysis; it has been discussed by Odesky [5] without quantitative statistical justification. Using the data discussed above, in which 20 respondents from a sample of $n = 100$ reported a preference for Brand A, 48 for Brand B, and 32 did not reveal their preferences, the population proportion $p_1$ preferring Brand A to Brand B would be estimated by $\hat p_1 = 0.360$ using (5.12) rather than by $\hat p_1 = 0.294$ using the "natural" estimator (5.1).

6. Squared Bias Comparisons of $\hat p_1$ and $\hat p_1(a^*)$

Since for large $n$, $\mathrm{Var}(\hat p_1)$ and $\mathrm{Var}(\hat p_1(a^*))$ for $a^* = 0.5$ both approach zero while their biases do not, we compare the squared biases $B(\hat p_1)$ and $B(\hat p_1(a^*))$ of these two estimators. Thus we define $R(a, p_1, p_0, \lambda)$
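The pieces of Section 5 can be checked numerically. The Python sketch below (hypothetical function names) verifies (5.8)-(5.9) against brute-force enumeration of (2.1) for a small $n$, approximates both integrals in (5.11) with a midpoint rule (our choice of quadrature, not the authors'), and reproduces the estimate 0.360 from (5.12). With $\omega(\theta) = 1$ the computed $a^*$ agrees with 0.5; other weighting functions can be passed in as `weight`.

```python
from math import factorial

def mse_coefficients(p1, p0, lam, n):
    """alpha, beta, gamma with MSE(p1_hat(a)) = alpha*a**2 + beta*a + gamma."""
    p2 = 1 - p1
    alpha = p0**2 + p0 * (1 - p0) / n
    beta = -2 * (lam * p0**2 + p0 * (p1 - lam * p0) / n)
    gamma = lam**2 * p0**2 + (p1 - lam * p0) * (p2 + lam * p0) / n
    return alpha, beta, gamma

def exact_mse(n, a, p1, p0, lam):
    """MSE of p1_hat(a) = (m1 + a*m0)/n by enumerating all outcomes of (2.1)."""
    q1, q2 = p1 - lam * p0, 1 - p1 - (1 - lam) * p0
    total = 0.0
    for m1 in range(n + 1):
        for m0 in range(n + 1 - m1):
            m2 = n - m1 - m0
            pr = (factorial(n) / (factorial(m1) * factorial(m2) * factorial(m0))
                  * q1**m1 * q2**m2 * p0**m0)
            total += pr * ((m1 + a * m0) / n - p1) ** 2
    return total

def optimal_a(n, weight=lambda p1, p0, lam: 1.0, grid=40):
    """a* of (5.11), with both integrals over (2.2) approximated by a midpoint rule.

    Substituting p1 = lam*p0 + u*(1 - p0), u in (0, 1), turns the inner limits
    of (5.10) into fixed ones; (1 - p0) is the Jacobian of that substitution.
    The common cell volume h**3 cancels in the ratio and is omitted.
    """
    h = 1.0 / grid
    num = den = 0.0
    for i in range(grid):
        lam = (i + 0.5) * h
        for j in range(grid):
            p0 = (j + 0.5) * h
            for k in range(grid):
                p1 = lam * p0 + (k + 0.5) * h * (1 - p0)
                w = weight(p1, p0, lam) * (1 - p0)
                alpha, beta, _ = mse_coefficients(p1, p0, lam, n)
                num += beta * w
                den += alpha * w
    return -num / (2 * den)

def p1_hat_a(m1, m0, n, a=0.5):
    """Family (5.7); a = 0.5 is the canonical equal-allocation estimator (5.12)."""
    return (m1 + a * m0) / n

alpha, beta, gamma = mse_coefficients(0.5, 0.3, 0.4, n=6)
print(exact_mse(6, 0.3, 0.5, 0.3, 0.4), alpha * 0.09 + beta * 0.3 + gamma)  # agree
print(optimal_a(n=100))            # uniform weight: compare a* = 0.5 in the text
print(p1_hat_a(20, 32, 100))       # 0.36
```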

as the ratio of the squared bias of $\hat p_1(a)$ to the squared bias of $\hat p_1$, namely,

$$R(a, p_1, p_0, \lambda) = \frac{B(\hat p_1(a))}{B(\hat p_1)} = \frac{(\lambda - a)^2 (1 - p_0)^2}{(\lambda - p_1)^2}. \tag{6.1}$$

Figures 1-5 give the contour $R = 1$ of $R(a^*, p_1, p_0, \lambda)$ for $a^* = 0.5$ and $p_1 = 0.2, 0.4, 0.5, 0.6$, and $0.8$ in the admissible $(p_0, \lambda)$ parameter space. In each of these figures, the proportion of the admissible $(p_0, \lambda)$ parameter space in which the squared bias of $\hat p_1(a^*)$ for $a^* = 0.5$ is smaller than that of the "natural" estimator (namely, where $R < 1$) is greater than the proportion where the "natural" estimator $\hat p_1$ is preferred to $\hat p_1(a^*)$ in terms of squared bias (namely, where $R > 1$). More generally, for any value of $p_1$, let $A(p_1)$ denote the percentage of the area of the admissible $(p_0, \lambda)$ parameter space where $\hat p_1(a^*)$ for $a^* = 0.5$ is preferred to $\hat p_1$ on the basis of smaller squared bias. Figure 6 gives a graph of $A(p_1)$. It shows that the "canonical" linear estimator $\hat p_1(a^*)$ is preferred more frequently than the "natural" estimator, in terms of smaller squared bias, for values of $p_1$ between 0.19 and 0.81.
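The percentage $A(p_1)$ can be approximated by counting grid points. The Python sketch below (hypothetical function names; a simple midpoint grid is our choice, not necessarily how Figure 6 was computed) evaluates (6.1) over the admissible $(p_0, \lambda)$ region for fixed $p_1$ and reports the percentage of admissible grid points with $R < 1$. Points with $\lambda = p_1$, where the natural estimator is unbiased and (6.1) is undefined, are not counted as favoring $\hat p_1(a^*)$. At $p_1 = 0.5$, (6.1) reduces to $(1 - p_0)^2 < 1$, so the entire admissible region favors $\hat p_1(0.5)$.

```python
def bias_ratio(a, p1, p0, lam):
    """R(a, p1, p0, lambda) of (6.1); undefined when lambda == p1."""
    return (lam - a) ** 2 * (1 - p0) ** 2 / (lam - p1) ** 2

def preferred_area_percent(p1, a=0.5, grid=200):
    """Approximate A(p1): percent of the admissible (p0, lambda) area with R < 1."""
    inside = better = 0
    for i in range(grid):
        p0 = (i + 0.5) / grid
        for j in range(grid):
            lam = (j + 0.5) / grid
            # admissibility condition (2.2) for fixed p1
            if not (lam * p0 < p1 < 1 - (1 - lam) * p0):
                continue
            inside += 1
            if lam != p1 and bias_ratio(a, p1, p0, lam) < 1:
                better += 1
    return 100.0 * better / inside

for p1 in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(p1, round(preferred_area_percent(p1), 1))
```

The grid counts only approximate areas; a finer `grid` trades run time for accuracy near the curved boundaries of the admissible region.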

[Figure 1: $R(0.5, 0.2, p_0, \lambda)$; contour $R = 1$ in the admissible $(p_0, \lambda)$ space, separating the regions $R < 1$ and $R > 1$]

[Figure 2: $R(0.5, 0.4, p_0, \lambda)$; contour $R = 1$ in the admissible $(p_0, \lambda)$ space]

[Figure 3: $R(0.5, 0.5, p_0, \lambda)$; contour $R = 1$ in the admissible $(p_0, \lambda)$ space]

[Figure 4: $R(0.5, 0.6, p_0, \lambda)$; contour $R = 1$ in the admissible $(p_0, \lambda)$ space]

[Figure 5: $R(0.5, 0.8, p_0, \lambda)$; contour $R = 1$ in the admissible $(p_0, \lambda)$ space, with regions $R < 1$ and $R > 1$ and the boundary $1 - (1 - \lambda) p_0 = 0.8$ marked]

[Figure 6: $A(p_1)$, the per cent of the admissible $(p_0, \lambda)$ area where $R < 1$, plotted against $p_1$]

7. References

1. Blumenthal, S. "Multinomial Sampling with Partially Categorized Data." Journal of the American Statistical Association, Vol. 63, No. 322 (June, 1968).
2. Bryson, M. R. "Error of Classification in a Binomial Model." Journal of the American Statistical Association, Vol. 60, No. 309 (March, 1965).
3. Draper, N. R.; Hunter, W. G.; and Tierney, D. E. "Which Product Is Better?" Technometrics, Vol. 11, No. 2 (May, 1969).
4. Mendenhall, W., and Lehman, E. H., Jr. "An Approximation to the Negative Moments of the Positive Binomial Useful in Life Testing." Technometrics, Vol. 2, No. 2 (May, 1960).
5. Odesky, S. H. "Handling the Neutral Vote in Paired Comparison Product Testing." Journal of Marketing Research, Vol. 4, No. 2 (May, 1967).
