RESEARCH SUPPORT
UNIVERSITY OF MICHIGAN BUSINESS SCHOOL
APRIL 1997

A NOTE ON LATENT SEGMENTATION MODELS
WORKING PAPER #9712-08

BY STEPHEN WALKER, IMPERIAL COLLEGE, LONDON, UK
AND PAUL DAMIEN, UNIVERSITY OF MICHIGAN BUSINESS SCHOOL

A Note On Latent Segmentation Models

Stephen Walker (1) and Paul Damien (2)

(1) Department of Mathematics, Imperial College, 180 Queen's Gate, London SW7 2BZ.
(2) Business School, University of Michigan, Ann Arbor, 48109-1234, USA.

Summary

Developing polytomous choice models that enable the identification of market segments has been the focus of many recent papers (Ramaswamy et al., 1996, and the references therein). In this note, we show that latent segmentation models of the type developed by Ramaswamy et al. are always nonidentifiable, and hence that the parameters of interest are nonestimable.

The joint segmentation model of Ramaswamy et al. (1996) considers data from consumers on two sets of categorical variables that serve as distinct, interdependent bases for segmentation (specifically, benefits sought and services used). The data collected provide no information concerning the allocation of consumers to particular segments (indexed by $(i, j)$). It is therefore impossible to estimate the parameter $\theta_{ij}$, which is the joint probability of a consumer belonging to segment $i$ and segment $j$. Consequently, the model they construct is nonidentifiable.

To illustrate this point we consider an analogous model: let observations from $N$ individuals be categorised from the finite set $\{1, \ldots, m\}$, let $X_{nk} = 1$ if the $n$th individual is in category $k$ and zero otherwise, and let $\pi_k$ be the probability that an individual is in category $k$. Now let $\theta_i$ represent the probability of a consumer being in segment $i$ and $\pi_{ik}$ the probability of an individual in segment $i$ being in category $k$. We have $\sum_k \pi_{ik} = 1$, $\sum_i \theta_i = 1$ and $\sum_k \pi_k = 1$. The likelihood is given by

$$\prod_{n=1}^{N} \left( \sum_i \theta_i \left[ \prod_{k=1}^{m} \pi_{ik}^{X_{nk}} \right] \right).$$

This likelihood expression is essentially a special version of the type of likelihood appearing in equation (2) of Ramaswamy et al. (1996). Clearly our likelihood can be written as

$$\prod_{n=1}^{N} \left( \sum_i \theta_i \pi_{i k_n} \right),$$

where $k_n$ is the category of the $n$th individual. This becomes

$$\prod_{k=1}^{m} \left[ \sum_i \theta_i \pi_{ik} \right]^{n_k},$$

where $n_k$ is the number of individuals located in the $k$th category. Finally, since $\sum_i \theta_i \pi_{ik} = \pi_k$, our likelihood becomes

$$\prod_{k=1}^{m} \pi_k^{n_k},$$

which is precisely what we should have started with in the first place. The data enter the likelihood only through the $\pi_k$, so the $\theta_i$ and $\pi_{ik}$ are nonestimable. An argument exactly parallel to the one

above can now be applied to the likelihood appearing in Ramaswamy et al. (1996, equation (2)). The details are now given.

Let $L$ denote equation (2) from Ramaswamy et al. (1996), i.e. the likelihood function. We can write this likelihood in a different way:

$$L = \prod_{n=1}^{N} \sum_{i,j} \theta_{ij} \left( \prod_{k=1}^{K} \pi_{i k l_{kn}} \prod_{m=1}^{M} \pi_{j m l_{mn}} \right),$$

where, for example, $i k l_{kn}$ denotes that the $n$th individual is in the $l$th category ($l = 1, \ldots, L_k$) for the $k$th variable ($k = 1, \ldots, K$) in segment $i$; and so $\pi_{i k l_{kn}}$ denotes the probability of this event. Let $C$ denote the set of possible categories, and $n_c$ denote the number of individuals who are in $c \in C$. Then, we have

$$L = \prod_{c \in C} \left( \sum_{i,j} \theta_{ij} \pi_{ijc} \right)^{n_c},$$

where $\pi_{ijc}$ is the probability of being in category $c$ conditional on being in segments $i$ and $j$. This last expression is identical to

$$\prod_{c \in C} \pi_c^{n_c},$$

which, again, is the correct likelihood; so, clearly, this implies that $\theta_{ij}$ is nonidentifiable.

The models discussed above should not be confused with continuous mixture models:

$$f(y; \theta, \phi) = \sum_s \theta_s f(y; \phi_s),$$

where $\sum_s \theta_s = 1$. Each $(\theta, \phi)$ defines a unique density and hence such models are identifiable (in general). The discrete categorical model is given by:

$$p(y = k; \theta, \pi) = \sum_s \theta_s \, p(y = k \mid s).$$

Let $\pi_{sk} = p(y = k \mid s)$. Clearly $\sum_s \theta_s \pi_{sk} = \bar{\pi}_k$, where $\bar{\pi}_k = p(y = k; \bar{\pi})$. Note, then, that for all $(\theta, \pi)$, $p(y = k; \theta, \pi) = p(y = k; \bar{\pi})$.
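The identity just stated can be checked numerically. The following is a minimal sketch, not part of the original note: the function names, the category counts and both parameter sets are hypothetical values chosen for illustration. It takes two distinct choices of segment probabilities and within-segment category probabilities that share the same marginal category probabilities, and confirms that they give the same likelihood for the observed counts.

```python
import math


def marginal(theta, pi):
    """Marginal category probabilities: sum over segments s of theta[s] * pi[s][k]."""
    n_categories = len(pi[0])
    return [sum(t * row[k] for t, row in zip(theta, pi))
            for k in range(n_categories)]


def log_likelihood(theta, pi, counts):
    """Log of prod_k (sum_s theta[s] * pi[s][k]) ** counts[k]."""
    return sum(n * math.log(p)
               for n, p in zip(counts, marginal(theta, pi)))


# Hypothetical counts of individuals in each of three categories.
counts = [50, 30, 20]

# Two different two-segment parameter sets (illustrative values only).
theta_a = [0.5, 0.5]
pi_a = [[0.6, 0.3, 0.1],
        [0.2, 0.3, 0.5]]

theta_b = [0.2, 0.8]
pi_b = [[0.8, 0.1, 0.1],
        [0.3, 0.35, 0.35]]

# Both parameter sets imply the same marginal category probabilities ...
same_marginal = all(
    math.isclose(a, b)
    for a, b in zip(marginal(theta_a, pi_a), marginal(theta_b, pi_b))
)

# ... and therefore the same likelihood for any set of counts.
same_likelihood = math.isclose(
    log_likelihood(theta_a, pi_a, counts),
    log_likelihood(theta_b, pi_b, counts),
)

print("same marginal probabilities:", same_marginal)
print("same log-likelihood:", same_likelihood)
```

Because the likelihood depends on the parameters only through the marginal probabilities, no amount of data of this kind can distinguish the first parameter set from the second.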

Hence, the model is nonidentifiable for $\theta$; i.e., many $(\theta, \pi)$, in fact an infinite number, lead to the same probability model for the data.

Acknowledgement

We thank Venkat Ramaswamy for drawing our attention to the literature on latent class models, and for valuable discussions on this topic.

REFERENCE

Ramaswamy, V., Chatterjee, R. and Cohen, S.H. (1996). Joint segmentation on distinct interdependent bases with categorical data. Journal of Marketing Research, 33, 337-350.