THE UNIVERSITY OF MICHIGAN INDUSTRY PROGRAM OF THE COLLEGE OF ENGINEERING

CONTRIBUTIONS TO ESTIMATION IN A CLASS OF DISCRETE DISTRIBUTIONS

Ganapati P. Patil

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the University of Michigan, 1959

June, 1959 IP-371

Doctoral Committee: Professor Cecil C. Craig, Chairman; Professor Arthur H. Copeland, Sr.; Professor Paul S. Dwyer; Professor Edwin E. Moise; Associate Professor William J. Schull

ACKNOWLEDGMENT The author wishes to express his deep sense of gratitude and appreciation to Professor Cecil C. Craig and to Professor Paul S. Dwyer for their constant help and encouraging guidance during the preparation of this dissertation. He is also most thankful to Professor Arthur H. Copeland, Sr. for encouragement. The author wishes to take this opportunity to express his sincere thanks to Professor C. R. Rao and Doctor J. Roy for stimulating discussions and fine facilities at the Indian Statistical Institute, Calcutta, where the author started his investigations in the field of discrete distributions at the kind suggestion of Professor C. R. Rao. Finally, the author is indebted to the Industry Program of the College of Engineering for the preparation of the final manuscript and copies of this thesis.

TABLE OF CONTENTS

Page

ACKNOWLEDGMENT ... iii
LIST OF TABLES ... vii
LIST OF CHARTS ... viii
INTRODUCTION ... 1
    General Background ... 1
    Review of Previous Work in Estimation in Discrete Distributions ... 2
    Present Contributions to Estimation in Discrete Distributions ... 6

CHAPTER I
1.0 A CLASS OF DISCRETE DISTRIBUTIONS AND CERTAIN CHARACTERIZATION THEOREMS ... 8
    1.1 Introduction ... 8
    1.2 Functional Dependence of Variance and Mean of a gpsd ... 11

CHAPTER II
2.0 LIKELIHOOD ESTIMATION AND ALLIED PROBLEMS IN A CLASS OF DISCRETE DISTRIBUTIONS ... 19
    2.1 Estimation by Likelihood for a Complete gpsd ... 19
    2.2 Estimation by Likelihood for a Truncated gpsd ... 23
    2.3 Estimation by Likelihood for a Censored gpsd ... 25
    2.4 Estimation with Doubtful Observations ... 26
    2.5 Homogeneity and Combined Estimation ... 29
    2.6 Estimation for a gpsd with Two Parameters ... 32

CHAPTER III
3.0 SIMPLE METHODS OF ESTIMATION FOR A CLASS OF DISCRETE DISTRIBUTIONS ... 35
    3.1 Estimation by the Ratio Method for a gpsd ... 35
    3.2 Unbiased Estimation by the Ratio Method for a gpsd ... 40
    3.3 Estimation by the Two-Moments Method for a gpsd ... 42
    3.4 Estimation by the Two-Moments Method for a Truncated gpsd ... 44

TABLE OF CONTENTS (CONT'D)

Page

    3.5 An Upper Bound for Bias per Unit Standard Error for Ratio Estimates and Two-Moment Estimates ... 47
    3.6 Estimation for a Truncated gpsd with a Finite Range of Consecutive Integers, Maximum Unknown ... 48

CHAPTER IV
4.0 ESTIMATION PROBLEMS FOR THE BINOMIAL DISTRIBUTION ... 51
    4.1 Introduction ... 51
    4.2 Estimation from a Sample for a Singly Truncated Binomial Distribution ... 53
    4.3 Homogeneity and Combined Estimation for Singly Truncated Binomial Distributions ... 72
    4.4 Estimation from a Sample for a Doubly Truncated Binomial Distribution ... 77
    4.5 Simultaneous Estimation of Both Parameters of a Binomial Distribution ... 80

CHAPTER V
5.0 ESTIMATION PROBLEMS FOR THE POISSON DISTRIBUTION ... 89
    5.1 Introduction ... 89
    5.2 Estimation from a Sample for a Truncated Poisson Distribution ... 91
    5.3 Estimation from a Sample for a Censored Poisson Distribution ... 109
    5.4 Estimation with Doubtful Observations ... 113

CHAPTER VI
6.0 ESTIMATION PROBLEMS FOR THE NEGATIVE BINOMIAL DISTRIBUTION ... 117
    6.1 Introduction ... 117
    6.2 Estimation of Parameters of a Complete Negative Binomial Distribution ... 119
    6.3 Estimation of Parameters of a Truncated Negative Binomial Distribution ... 121
    6.4 Estimation when k is Known ... 124
    6.5 Homogeneity and Combined Estimation, k Known ... 132

TABLE OF CONTENTS (CONT'D)

Page

CHAPTER VII
7.0 ESTIMATION PROBLEMS FOR THE LOGARITHMIC SERIES DISTRIBUTION ... 136
    7.1 Introduction ... 136
    7.2 Estimation from a Sample for a Complete Logarithmic Series Distribution ... 136
    7.3 Estimation from a Sample for a Truncated Logarithmic Series Distribution ... 145

TABLES I - VII ... 146
CHARTS 1 - 4 ... 170
BIBLIOGRAPHY ... 175

LIST OF TABLES

Table    Page

I    \mu^* of Singly Truncated Binomial Distribution on the Left at c = 1 ... 147
II   \mu^* of Singly Truncated Binomial Distribution on the Left at c = 2 ... 152
III  \mu^* of Doubly Truncated Binomial Distribution at c = 1 and d = n ... 157
IV   \mu^* of Singly Truncated Poisson Distribution on the Right at d ... 162
V    \mu^* of Singly Truncated Poisson Distribution on the Left at c ... 164
VI   \mu^* of Singly Truncated Negative Binomial Distribution on the Left at c = 1 ... 168
VII  \mu of Logarithmic Series Distribution ... 169

LIST OF CHARTS

Chart    Page

1    Estimation of the Parameter of the Singly Truncated Binomial Distribution at c = 1 for n = 3(1)6 ... 171
2    Estimation of the Parameter of the Singly Truncated Binomial Distribution at c = 1 for n = 7(1)10 ... 172
3    Estimation of the Parameter of the Singly Truncated Binomial Distribution at c = 1 for n = 11(1)15 ... 173
4    Estimation of \mu of the Singly Truncated Poisson Distribution at c = 1 ... 174

INTRODUCTION General Background Different approaches are possible with regard to the basic structure of the standard discrete distributions like the binomial, Poisson, negative binomial, or logarithmic series. Under plausible assumptions, these distributions may be regarded as descriptive models of populations. For instance, under the usual genetic theory, the distribution of the number x of boys in families of a fixed number of children, say n, follows the binomial law. Or again, the number of mistakes per printed page can ordinarily be assumed to have a Poisson distribution. The same is true of the distribution of accidents met with, over a period of time, by a particular individual. However, different persons may have different degrees of accident proneness as measured by the average number of accidents to the individual. If this average has, say, a Pearson type III distribution, the distribution of the number of accidents pooled over the individuals can be shown to follow the negative binomial law. A limiting case of this is the logarithmic series distribution, which has been found useful in ecology. Again, these distributions may arise as a result of the sampling scheme adopted. In sampling with replacement n items from a lot of manufactured items, the number of defectives follows the binomial law. If the proportion of defectives in the lot is small and the number of items sampled is moderately large, one can use the Poisson approximation for the distribution of the number of defectives. On the other hand, if one uses what is known as the inverse binomial sampling procedure (that is, one goes on sampling with replacement until he gets a

fixed number such as k of defectives), the number of items sampled would then follow the negative binomial distribution. The discrete distributions described above occur sometimes in truncated or censored forms. For instance, in human genetics, to estimate the proportion of albino children produced by couples capable of producing albinos, sampling has necessarily to be restricted to families having at least one albino child. This is because there is no way of distinguishing families incapable of producing albinos from those that are capable but by chance have not produced any. The number of albino children x, if sampling is restricted to families of n children, can thus take the values 1, 2, ..., n; the value 0 being excluded. Thus, x follows what is known as the "truncated" binomial distribution — truncated on the left at 1, to be specific. We may similarly think of distributions with both extremes truncated. Again, some types of counters can record the exact number of radio-active particles (emitted by a radio-active substance in fixed intervals of time) if the number does not exceed a certain limit such as d; otherwise the counter merely records that the number has exceeded d. Data obtained from such counters are samples from what are called "censored" distributions, censored on the right at d. In such cases, individual counts are available for all observations not exceeding d; only the total count of those exceeding d is available. Review of Previous Work in Estimation in Discrete Distributions The problems of estimation of parameters in various discrete distributions and their truncated or censored forms have been considered

by various authors. Generally, the estimate for the case of complete distributions is neat and easy to compute, but complications arise for truncated and censored distributions. For the truncated binomial distribution, Fisher (1936) and Haldane (1932, 1938) gave the maximum likelihood procedure for estimating the parameter of the distribution. While studying albinism in man by sampling from families with a variable number of children (but having at least one albino), Haldane (1938) solved by an elaborate iterative process the complicated maximum likelihood equation based on data obtained simultaneously from different truncated binomial populations. Finney (1949) utilized the method of scores for solving the likelihood equation and provided some tables to facilitate the heavy computation. Because of computational difficulties in getting the maximum likelihood estimate, Moore (1944) suggested an alternative simple estimate which is a ratio of two suitably chosen linear functions of frequencies, whereas Rider (1955) equated the first two sample moments to the corresponding population moments and obtained another simple estimate. The sampling properties of these two simple estimates, however, have not been studied. Estimation problems in relation to the Poisson distribution have been investigated by many authors. For truncated Poisson distributions, the case of truncation on the left has been considered by David and Johnson (1948), who provided the maximum likelihood estimate and a small numerical table for computational facility. Plackett (1953) gave a simple unbiased and highly efficient estimate which is a ratio of two linear functions of frequencies. Rider (1953) used the first two moments and obtained another simple estimate, but did not study its sampling properties.

Truncation on the right has been discussed by Tippett (1932), Bliss (1948) and Moore (1952). Tippett derived the maximum likelihood solution, Bliss developed an approximation to it, and Moore suggested a ratio of two linear functions of frequencies as a simple alternative estimate. For estimation in the case of doubly truncated Poisson distributions, Cohen (1954) provided maximum likelihood equations, though rather unwieldy to solve, whereas Moore utilized a ratio of suitably constructed linear functions of frequencies to estimate the Poisson parameter. They also discussed the problems of estimation for samples from censored Poisson distributions: Moore (1952) gave the simple ratio estimate and Cohen (1954) derived maximum likelihood equations for both singly and doubly censored distributions. For a negative binomial distribution involving two parameters, Fisher (1941) discussed the efficiency of moment estimates and derived maximum likelihood equations for simultaneous estimation. He also gave a simple rule as to when one should proceed to obtain maximum likelihood estimates. Haldane (1941) reduced the likelihood equations to a simpler form for computational facility. Sampford (1955) gave methods to obtain moment estimates and likelihood estimates for a truncated negative binomial distribution with "zero" truncated, whereas Rider, equating the first three sample moments to the corresponding moments of the truncated negative binomial distribution, obtained simple estimates for the two parameters. Results for estimation of the parameter of a logarithmic series distribution are rather complicated, even when the distribution is complete. The estimation problems do not seem to have been thoroughly

investigated. Fisher, Corbet and Williams (1943) found the logarithmic series distribution useful in ecology and derived the maximum likelihood estimate of its parameter. We thus see that previous work on estimation in these discrete distributions can be broadly classified under two heads: (1) estimation by the method of maximum likelihood, and (2) other methods of estimation, the need for the other methods arising from the fact that frequently the method of maximum likelihood leads to complicated equations for estimation. For maximum likelihood estimation, the authors have derived the estimating equations in individual cases and have suggested the use of Fisher's iterative procedure based on "efficient scores" for the solution of the equations when these turn out to be complicated. Some numerical tables are provided here and there to help in the process of solution. The identity of the maximum likelihood and moment estimates has been noticed in a few cases. Other estimates suggested in individual cases to avoid the computational difficulties of maximum likelihood estimation are of two types. One is derived by equating the first two sample moments with the corresponding theoretical moments. The second is obtained by taking the ratio of two suitably constructed linear functions of frequencies such that the ratio of the expectation of the numerator to that of the denominator is the required parameter. These two types of estimates are easy to compute in the cases suggested, but one has to remember that they are, in general, biased and inefficient though consistent. We note that the sampling properties of these estimates have not been investigated by the previous authors.

Present Contributions to Estimation in Discrete Distributions It is first shown that the binomial, Poisson, negative binomial and logarithmic series distributions can be regarded as special cases of a general class of discrete distributions which we refer to as the generalized power series distribution (gpsd). It is then possible to examine the previous work on estimation in the case of the above discrete distributions from a general point of view. The approach in this thesis is to derive results for this general class of distributions and then apply them to the special cases of the binomial, Poisson, etc. To begin with, we present a few results which bring out some interesting properties of a gpsd. We establish an explicit functional relationship between the variance and mean of a gpsd and, based on this fundamental relation, we present some characterization theorems. To mention one, we establish that the equality of variance and mean is necessary and sufficient for a gpsd to be Poisson. Next, we investigate certain estimation problems connected with a gpsd. We show that the maximum likelihood method and the method of moments give the same estimate when the gpsd involves a single parameter. A computational method for evaluating the maximum likelihood estimate is developed which requires only a table of values of the mean of the gpsd for various values of the parameter at sufficiently close intervals. It is shown how the standard error of the estimate can be approximately evaluated by using this table. The formulae for the amount of bias in the likelihood estimate are obtained to the order of 1/N, where N is the sample size.

Large sample methods based on maximum likelihood are then derived for testing the homogeneity of several distributions and providing the estimate of the common parameter in case the distributions are homogeneous. The likelihood equation and a method for solving it are derived for the problem of estimation in censored forms of a gpsd. Methods based on the maximum likelihood principle are given for the treatment of doubtful observations. The problem of estimation when the gpsd involves two parameters has also been considered. For the gpsd, in addition to the maximum likelihood estimate, two other simple estimates are provided. One is called the "two-moments estimate" and is derived by equating the first two sample moments to the corresponding population moments. The other estimate is called the "ratio estimate," as it is obtained by taking the ratio of two suitably constructed linear functions of frequencies such that the ratio of the expected value of the numerator to that of the denominator is equal to the parameter. Expressions are derived for the bias and variance of these two estimates correct to terms of order 1/N, where N is the sample size. Lastly, the results obtained by the general approach are applied to specific distributions — namely, the binomial, Poisson, negative binomial, and logarithmic series. In each case, exhaustive numerical tables are given to facilitate computation of the maximum likelihood estimate. The bias and efficiency of the "ratio" and "two-moments" estimates are numerically evaluated for different values of the parameter, and recommendations are given on the suitability of the different methods of estimation. Illustrative examples have been worked out in detail for the methods suggested.

CHAPTER I

1.0 A CLASS OF DISCRETE DISTRIBUTIONS AND CERTAIN CHARACTERIZATION THEOREMS

1.1 Introduction

Let g(\theta) be a positive function admitting a power series expansion with non-negative coefficients for non-negative values of \theta smaller than the radius of convergence of the power series:

    g(\theta) = \sum_{z=0}^{\infty} a_z \theta^z.    (1.1.1)

Noack (1950) defined a random variable Z taking non-negative integral values z with positive probabilities

    \Pr\{Z = z\} = \frac{a_z \theta^z}{g(\theta)}    (z = 0, 1, 2, ...).    (1.1.2)

He called the discrete probability distribution given by (1.1.2) a power series distribution (psd) and derived some of its properties relating its moments, cumulants, etc. To be more general, we note that the set of values of an integral-valued random variable Z need not be the entire set of non-negative integers (0, 1, 2, ...). For, let T be an arbitrary non-null subset of the non-negative integers* and define the generating function

    f(\theta) = \sum_{x \in T} a_x \theta^x    (1.1.3)

with a_x > 0 and \theta \ge 0, so that f(\theta) > 0 is finite and differentiable.

* In fact, one can take T to be a countable subset of real numbers; for purposes of this dissertation, however, T is chosen to be a subset of the non-negative integers.

Then we can define a random variable X taking non-negative integral values in T with probabilities

    P_x = \Pr\{X = x\} = \frac{a_x \theta^x}{f(\theta)},    x \in T,    (1.1.4)

and call this distribution analogously a generalized power series distribution (gpsd). It may be noted that a gpsd reduces to a psd when T is the entire set of non-negative integers. We add here that we call the set of admissible values of the parameter \theta of the gpsd the parameter space \Theta of the gpsd. Also we refer to the set T of values of the random variable X defined by the gpsd as the range T of the gpsd.

Writing the mean \mu = E(X), the crude moments m'_r = E(X^r), the central moments \mu_r = E(X - \mu)^r, the moment generating function (mgf) M(t) = E(e^{tX}) and the cumulants K_r = [\frac{d^r}{dt^r} \log M(t)]_{t=0} (in case they exist), we obtain for a gpsd the following relations, derived on the same lines as shown by Noack (1950) for a psd:

    \mu = \theta f'(\theta)/f(\theta)    (1.1.5)

    m'_{r+1} = \theta \frac{dm'_r}{d\theta} + \mu m'_r    (1.1.6)

    \mu_{r+1} = \theta \frac{d\mu_r}{d\theta} + r \mu_2 \mu_{r-1}    (1.1.7)

    M(t) = f(\theta e^t)/f(\theta)    (1.1.8)

    m'_r = \sum_{i=1}^{r} \binom{r-1}{i-1} m'_{r-i} K_i    (1.1.9)

    K_{r+1} = \theta \sum_{i=1}^{r} \binom{r-1}{i-1} m'_{r-i} K'_i - \sum_{i=2}^{r} \binom{r-1}{i-2} m'_{r+1-i} K_i    (1.1.10)

where primes denote differentiation with respect to \theta.
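The relations above hold for any gpsd. As a quick numerical sanity check, the following sketch (Python, not part of the thesis; the range T and coefficients a_x are arbitrary values chosen here only for illustration) verifies (1.1.5) and (1.1.8) by direct summation over the probabilities (1.1.4).

```python
import math

# A hypothetical gpsd on the finite range T = {0, 1, 2, 3, 5} with
# arbitrary positive coefficients a_x; f(theta) = sum of a_x * theta^x.
T = [0, 1, 2, 3, 5]
a = {0: 1.0, 1: 2.0, 2: 0.5, 3: 1.5, 5: 0.25}

def f(theta):
    return sum(a[x] * theta**x for x in T)

def pmf(x, theta):
    return a[x] * theta**x / f(theta)

theta = 0.7
mean = sum(x * pmf(x, theta) for x in T)

# (1.1.5): mu = theta * f'(theta) / f(theta); f' taken by central difference
h = 1e-6
f_prime = (f(theta + h) - f(theta - h)) / (2 * h)
assert abs(mean - theta * f_prime / f(theta)) < 1e-6

# (1.1.8): M(t) = f(theta * e^t) / f(theta)
t = 0.3
mgf_direct = sum(math.exp(t * x) * pmf(x, theta) for x in T)
mgf_formula = f(theta * math.exp(t)) / f(theta)
assert abs(mgf_direct - mgf_formula) < 1e-12
```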

We note further that for a gpsd

    \mu = \theta \frac{d}{d\theta}[\log f(\theta)]    (1.1.11)

    \mu_2 = \theta \frac{d\mu}{d\theta}    (1.1.12)

    K_{r+1} = \theta \frac{dK_r}{d\theta}.    (1.1.13)

Also, writing the factorial moment of order r as \mu_{(r)} = E[X(X-1)\cdots(X-r+1)], we can derive for a gpsd

    \mu_{(r)} = \frac{\theta^r}{f(\theta)} \frac{d^r f(\theta)}{d\theta^r}    (1.1.14)

    \mu_{(r+1)} = (\mu - r)\,\mu_{(r)} + \theta \frac{d}{d\theta}[\mu_{(r)}].    (1.1.15)

The binomial, Poisson, negative binomial and logarithmic series distributions can be obtained as special cases of the gpsd by taking

    f(\theta) = (1 + \theta)^n, n a positive integer, for the binomial;
    f(\theta) = e^{\theta}, for the Poisson;
    f(\theta) = (1 - \theta)^{-k}, k positive, for the negative binomial;
    f(\theta) = -\log(1 - \theta), for the logarithmic series.

It is interesting to note that the Poisson and negative binomial distributions are special cases of a psd also; however, the binomial and logarithmic series are not. The relations (1.1.5) to (1.1.15) are generalizations of corresponding results obtained by various authors (Romanovsky, Frisch, Haldane) separately for the binomial, Poisson, and negative binomial distributions.
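To make the special cases concrete, a short sketch (Python, an illustration added here rather than taken from the thesis) checks that three of the four generating functions above reproduce the familiar distributions: the binomial with p = \theta/(1+\theta), the Poisson with mean \theta, and the logarithmic series with coefficients a_x = 1/x.

```python
import math

theta = 0.6

# Binomial: f(theta) = (1 + theta)^n with a_x = C(n, x); the gpsd
# probabilities equal C(n, x) p^x (1-p)^(n-x) with p = theta / (1 + theta).
n, p = 8, theta / (1 + theta)
for x in range(n + 1):
    gpsd = math.comb(n, x) * theta**x / (1 + theta)**n
    binom = math.comb(n, x) * p**x * (1 - p)**(n - x)
    assert abs(gpsd - binom) < 1e-12

# Poisson: f(theta) = e^theta with a_x = 1/x!; P_x = e^{-theta} theta^x / x!
for x in range(10):
    gpsd = (theta**x / math.factorial(x)) / math.exp(theta)
    pois = math.exp(-theta) * theta**x / math.factorial(x)
    assert abs(gpsd - pois) < 1e-15

# Logarithmic series: f(theta) = -log(1 - theta) with a_x = 1/x, x = 1, 2, ...
# The probabilities sum to one (the tail beyond x = 200 is negligible here).
total = sum((theta**x / x) / (-math.log(1 - theta)) for x in range(1, 200))
assert abs(total - 1.0) < 1e-12
```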

1.2 Functional Dependence of Variance and Mean of a gpsd

In this section, we present a few results which bring out some interesting properties of a gpsd. We establish an explicit functional relationship between the variance and the mean of a gpsd and, based on this relation, present some characterization theorems.

Theorem 1: For a gpsd,

    Variance = Mean + \theta^2 \frac{d^2}{d\theta^2}[\log f(\theta)].

Proof: For a gpsd, we have from (1.1.12) and (1.1.11),

    Variance = \mu_2(\theta) = \theta \frac{d}{d\theta}[\mu(\theta)]

and

    Mean = \mu(\theta) = \theta \frac{d}{d\theta}[\log f(\theta)].

Consider

    \mu_2(\theta) = \theta \frac{d}{d\theta}\left[\theta \frac{d}{d\theta}[\log f(\theta)]\right] = \theta \frac{d}{d\theta}[\log f(\theta)] + \theta^2 \frac{d^2}{d\theta^2}[\log f(\theta)],

i.e.,

    \mu_2(\theta) = \mu(\theta) + \theta^2 \frac{d^2}{d\theta^2}[\log f(\theta)].    (1.2.1)

Hence the statement of the theorem.

Consider now:

Lemma 1: If the parameter space \Theta of a gpsd contains zero, then the range T of the gpsd contains zero and the corresponding random variable takes the value zero with positive probability for all \theta in the parameter space; and conversely.

Proof: Since 0 \in \Theta, we must have f(0) > 0. But f(\theta) = \sum_{x \in T} a_x \theta^x, so f(0) > 0 requires 0 \in T and a_0 > 0. The converse follows by retracing the steps above. Hence Lemma 1.

Lemma 2: The logarithm of the generating function of a gpsd is a monotone non-decreasing function of \theta.

Proof: Two cases arise: (1) 0 \notin \Theta, i.e., the parameter space of the gpsd does not contain 0, and (2) 0 \in \Theta, i.e., the parameter space of the gpsd contains 0.

Case 1: Here \theta > 0 for \theta \in \Theta, and

    \mu(\theta) = \sum_{x \in T} x a_x \theta^x / f(\theta) \ge 0.

But from (1.1.11), \mu(\theta) = \theta \frac{d}{d\theta}[\log f(\theta)], so that \frac{d}{d\theta}[\log f(\theta)] \ge 0.

Case 2: By Lemma 1, we have in this case

    f(\theta) = a_0 + \sum_{x \in T - \{0\}} a_x \theta^x,    a_0 > 0,

where T - \{0\} denotes the set T without 0. Direct computation gives, therefore,

    \frac{d}{d\theta}[\log f(\theta)] = \frac{\sum_{x \in T - \{0\}} x a_x \theta^{x-1}}{f(\theta)}.

Clearly, for \theta > 0, \frac{d}{d\theta}[\log f(\theta)] \ge 0; for \theta = 0,

    \frac{d}{d\theta}[\log f(\theta)] = 0 if 1 \notin T (or if 1 \in T and a_1 = 0), and > 0 otherwise.

Thus we have always \frac{d}{d\theta}[\log f(\theta)] \ge 0. Hence Lemma 2.

Theorem 2: The necessary and sufficient condition for the variance of a gpsd to equal its mean for every \theta of its parameter space \Theta is that the generating function be of the form f(\theta) = e^{k\theta + c}, where k > 0 and c are arbitrary constants.

Proof: Sufficiency is obvious. Necessity: we have for a gpsd \mu_2(\theta) = \mu(\theta) for all \theta \in \Theta, so by (1.2.1) of Theorem 1,

    \theta^2 \frac{d^2}{d\theta^2}[\log f(\theta)] = 0.    (1.2.2)

Now two cases arise.

Case 1: 0 \notin \Theta, i.e., the parameter space of the gpsd does not contain 0. In this case, (1.2.2) reduces to \frac{d^2}{d\theta^2}[\log f(\theta)] = 0, so that

    \frac{d}{d\theta}[\log f(\theta)] = k,

where k is some positive constant by Case 1 of Lemma 2. Hence

    \log f(\theta) = k\theta + c,

where c is an arbitrary constant.

Hence, for all \theta \in \Theta, f(\theta) = e^{k\theta + c}, where k > 0 and c are arbitrary constants.

Case 2: 0 \in \Theta, i.e., the parameter space of the gpsd contains 0. For positive values of \theta, Case 1 applies and we get, for all \theta \in \Theta - \{0\}, f(\theta) = e^{k\theta + c} with k > 0 and c arbitrary constants. To verify that this form holds for \theta = 0 as well, we have by Lemma 1 that 0 \in T and a_0 > 0, so that f(0) = a_0 = e^{\log a_0} = e^{k \cdot 0 + c}, where c = \log a_0. Hence the statement of the theorem.

Theorem 3: The equality of mean and variance is necessary and sufficient for a gpsd to be Poisson. (Characterization of the Poisson distribution.)

Proof: We first prove the following lemma.

Lemma: Multiplying the generating function of a gpsd by a positive constant does not affect it; i.e., it gives rise to the same original gpsd.

Let the generating function be as given in (1.1.3) and the corresponding gpsd as given in (1.1.4).

Consider the new generating function h(\theta) = k f(\theta), where k is some positive constant; i.e.,

    h(\theta) = \sum_{x \in T} k a_x \theta^x,

and the gpsd corresponding to h(\theta) becomes

    \Pr\{X = x\} = \frac{k a_x \theta^x}{h(\theta)} = \frac{a_x \theta^x}{f(\theta)},    x \in T,

which is the original gpsd given in (1.1.4). Hence the lemma.

Now Theorem 3 follows immediately by applying the above lemma to Theorem 2. Hence the characterization of the Poisson distribution as stated by Theorem 3.

Theorem 4: The necessary and sufficient condition for the variance of a gpsd to exceed its mean for every non-zero \theta of its parameter space \Theta is that the generating function be of the form f(\theta) = e^{P(\theta) + R\theta + Q}, where Q and R are arbitrary constants and P(\theta), along with its derivative, is a positive monotone increasing function of \theta.

Proof: Necessity: Let \mu_2(\theta) - \mu(\theta) = \theta^2 p(\theta), where p(\theta) is a positive function of \theta. Then by Theorem 1 we have

    \theta^2 \frac{d^2}{d\theta^2}[\log f(\theta)] = \theta^2 p(\theta),

i.e.,

    \frac{d^2}{d\theta^2}[\log f(\theta)] = p(\theta).    (1.2.3)

Integrating (1.2.3),

    \frac{d}{d\theta}[\log f(\theta)] = P'(\theta) + R,    (1.2.4)

where R is an arbitrary constant and R + P'(\theta) = \int p(\theta)\,d\theta, a positive monotone increasing function of \theta. Integrating (1.2.4), we have

    \log f(\theta) = P(\theta) + R\theta + Q,    (1.2.5)

where Q is an arbitrary constant and Q + P(\theta) = \int P'(\theta)\,d\theta, a positive monotone increasing function of \theta. Hence, from (1.2.5), we have the required form for the generating function, namely

    f(\theta) = e^{P(\theta) + R\theta + Q},

where the symbols carry the meanings stated above.

Sufficiency: Sufficiency follows from the above by retracing the steps.

Theorem 5: The necessary and sufficient condition for the variance of a gpsd to be less than its mean for every non-zero \theta of its parameter space \Theta is that the generating function be of the form f(\theta) = e^{A(\theta) + B\theta + C}, where B and C are arbitrary constants and A(\theta) is such that its derivative is a monotone decreasing function of \theta.

Proof: On the same lines as that of Theorem 4.

Theorem 6: The mean \mu(\theta) of a gpsd is a non-negative monotone non-decreasing function of \theta.

Proof: Consider the relation (1.1.12), which states

    \mu_2(\theta) = \theta \frac{d}{d\theta}[\mu(\theta)].    (1.2.6)

We know that \mu_2(\theta) \ge 0; also \theta \ge 0. Hence from (1.2.6) it follows that \frac{d}{d\theta}[\mu(\theta)] \ge 0; also \mu(\theta) \ge 0. Thus \mu(\theta) is a non-negative monotone non-decreasing function of \theta.

Theorem 7: The graph of the mean of a gpsd with parameter space containing zero is convex, concave, or linear according as the variance of the gpsd is greater than, less than, or equal to the mean; and conversely.

Proof: Suppose \mu_2(\theta) > \mu(\theta). Then

    \theta \frac{d}{d\theta}[\mu(\theta)] > \mu(\theta),

so that

    \frac{d}{d\theta}[\mu(\theta)] > \frac{\mu(\theta)}{\theta}    when \theta \neq 0.

Also, as the gpsd is taken with parameter space containing zero, we can speak of \mu(0), which is clearly equal to 0. Hence follows the convexity of the graph of the mean when the variance exceeds the mean. On similar lines the rest of the statement can be easily established.

Theorem 8: The mean of a gpsd with parameter space containing zero is respectively a linear, convex, or concave function of \theta if and only if the generating function is respectively of the form of Theorem 2, Theorem 4, or Theorem 5.

Proof: The proof follows immediately from Theorems 2, 4, 5, and 7.
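The characterizations of this chapter are easy to check numerically. The sketch below (Python, an illustration added here, not part of the thesis) verifies Theorem 1 for the binomial case, where \log f(\theta) = n \log(1+\theta) gives \theta^2 (\log f)'' = -n\theta^2/(1+\theta)^2, and checks the Poisson case of Theorem 3, where the correction term vanishes because \log f is linear in \theta.

```python
import math

# Theorem 1 check for the binomial case f(theta) = (1 + theta)^n:
# the theorem gives Var = Mean - n theta^2 / (1 + theta)^2 = n p (1 - p).
n, theta = 10, 0.8
probs = [math.comb(n, x) * theta**x / (1 + theta)**n for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean)**2 * q for x, q in enumerate(probs))
second_log_deriv = -n / (1 + theta)**2       # d^2/dtheta^2 of n*log(1+theta)
assert abs(var - (mean + theta**2 * second_log_deriv)) < 1e-10

# Theorem 3 sanity check: for f(theta) = e^theta the second derivative of
# log f vanishes, so variance equals mean -- the Poisson case.  (The series
# is truncated at x = 59; the tail is negligible for theta = 0.8.)
probs = [math.exp(-theta) * theta**x / math.factorial(x) for x in range(60)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean)**2 * q for x, q in enumerate(probs))
assert abs(var - mean) < 1e-10
```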

CHAPTER II

2.0 LIKELIHOOD ESTIMATION AND ALLIED PROBLEMS IN A CLASS OF DISCRETE DISTRIBUTIONS

We show first in this chapter that for the gpsd (1.1.4), the maximum likelihood method and the method of moments give the same estimate of the gpsd parameter. The likelihood equation and a method for solving it are derived for the problem of estimation in truncated and censored forms of the gpsd (1.1.4). We mention here that we call the gpsd (1.1.4) a complete gpsd, as opposed to its truncated and censored forms. Large sample methods based on maximum likelihood are then derived for testing the homogeneity of several gpsd's and providing the estimate of the common parameter in case the gpsd's are homogeneous. A treatment for doubtful observations is given. Lastly, the problem of estimation when the general distribution involves two parameters is discussed.

-20where N x = Z xi/N is the sample mean and by (1.1.5) i=l = Qf'(Q)/f(Q), the mean of the gpsd (1.1.4). The likelihood equation v(o) = 0 for estimating 0 thus reduces to x= p() (_ i, say) (2.1.2) which means equating the sample mean to the population mean. The method of maximum likelihood and the method of moments thus lead to the same estimate in the case of a gpsd. Denoting this estimate by 0, the asymptotic variance is given by 11I(Q) where I(@) = -E(d) dQ -E[- () - (x-)] N d O Q N (do (2.1.3) also, = N' 2(9), because of (1.1.12). (2.1.4) Thus, 9 dg Var(O) = N/(d) (2.1.5) also, Q2 N= -/2() (2.1.6) 2.1.2 If Equation (2.1.2) does not readily give an algebraic solution, one may use an iterative process of solution (which converges; see Rao, 1952) by starting with an approximation Go. An improved approximation ~1 is then obtained from 1 = Go + V(Go)/I(~o) = Qo + [x - I(o)]/(d)o (2.1.7)

or from the equivalent formula

    \theta_1 = \theta_0 + \theta_0[\bar{x} - \mu(\theta_0)]/\mu_2(\theta_0) = \theta_0\left[1 + \frac{\bar{x} - \mu(\theta_0)}{\mu_2(\theta_0)}\right],    (2.1.8)

and the process is repeated till one gets a sufficiently accurate solution. To carry out this process by Formula (2.1.8), a table of numerical values of \mu(\theta) and \mu_2(\theta) of the gpsd under consideration for sufficiently close values of \theta would be very useful. Formula (2.1.7) would require a table of numerical values of \mu(\theta) and \frac{d\mu}{d\theta}. However, it may be observed that a table of only \mu(\theta) of the gpsd under consideration for sufficiently close values of \theta would do, because \frac{d\mu}{d\theta} can be approximated by the finite difference ratio \Delta\mu/\Delta\theta, and this approximation is expected to be good if the tabular interval is small. Illustrative examples (4.2.5; 5.2.11; 7.2.6) given later substantiate this observation.

2.1.3 To find the amount of bias in the maximum likelihood estimate \hat{\theta}, following Haldane and Smith (1956), we have the amount of bias in \hat{\theta} to order 1/N as

    b(\hat{\theta}) = -\frac{1}{2N} \frac{B_1}{A_1^2},    (2.1.9)

where

    A_1 = \sum_{x \in T} \left(\frac{dP_x}{d\theta}\right)^2 \Big/ P_x    (2.1.10)

and

    B_1 = \sum_{x \in T} \left(\frac{dP_x}{d\theta}\right)\left(\frac{d^2 P_x}{d\theta^2}\right) \Big/ P_x,    (2.1.11)
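The iteration (2.1.7) can be sketched as follows (Python, an illustration added here, not from the thesis) for the logarithmic series distribution, where \bar{x} = \mu(\theta) has no closed-form solution. In the spirit of the remark above, the derivative d\mu/d\theta is replaced by a finite-difference ratio, as one would do with a close-interval table of \mu(\theta); the clamp keeping the iterate inside (0, 1) is a safeguard added here, not part of the thesis.

```python
import math

def mu(theta):
    # Mean of the log-series gpsd: theta * f'(theta)/f(theta)
    # with f(theta) = -log(1 - theta), f'(theta) = 1/(1 - theta).
    return -theta / ((1 - theta) * math.log(1 - theta))

def mle(xbar, theta0=0.5, tol=1e-10, h=1e-6):
    theta = theta0
    for _ in range(100):
        # finite-difference stand-in for d(mu)/d(theta), cf. (2.1.7)
        dmu = (mu(theta + h) - mu(theta - h)) / (2 * h)
        step = (xbar - mu(theta)) / dmu
        # keep the iterate inside the parameter space (safeguard added here)
        theta = min(max(theta + step, 0.01), 0.99)
        if abs(step) < tol:
            break
    return theta

theta_hat = mle(2.5)        # solve xbar = mu(theta) for xbar = 2.5
assert abs(mu(theta_hat) - 2.5) < 1e-6
assert 0.7 < theta_hat < 0.9
```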

in which, as usual,

    P_x = a_x \theta^x / f(\theta),    x \in T,

    \mu = \theta f'(\theta)/f(\theta)    and    \mu_2 = \theta \frac{d\mu}{d\theta},

    \frac{dP_x}{d\theta} = \frac{P_x(x - \mu)}{\theta}

and

    \frac{d^2 P_x}{d\theta^2} = \frac{P_x}{\theta^2}\left[(x - \mu)^2 - \mu_2 - (x - \mu)\right],

so that

    A_1 = \frac{\mu_2}{\theta^2}    and    B_1 = \frac{\mu_3 - \mu_2}{\theta^3}.

Therefore, from (2.1.9), we have the amount of bias in \hat{\theta}, to order 1/N,

    b(\hat{\theta}) = -\frac{1}{N} \cdot \frac{\theta(\mu_3 - \mu_2)}{2\mu_2^2}.    (2.1.12)

2.1.4 Next, to estimate a differentiable function of \theta, say w(\theta), the method of maximum likelihood leads to the estimate \hat{w} = w(\hat{\theta}) with variance

    \mathrm{Var}(\hat{w}) = \left(\frac{dw}{d\theta}\right)^2 \frac{\theta^2}{N\mu_2},    (2.1.13)

where \hat{\theta} is the maximum likelihood estimate of \theta.
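The ingredients of the bias formula (2.1.12) can be checked by direct summation. The sketch below (Python, an illustration added here, not part of the thesis) evaluates A_1 and B_1 of (2.1.10)-(2.1.11) for a binomial gpsd, using the derivative expressions for dP_x/d\theta and d^2P_x/d\theta^2 given above, and confirms the identities A_1 = \mu_2/\theta^2 and B_1 = (\mu_3 - \mu_2)/\theta^3.

```python
import math

# Binomial gpsd, f(theta) = (1 + theta)^n, used as a concrete test case.
n, theta = 12, 0.4
P = [math.comb(n, x) * theta**x / (1 + theta)**n for x in range(n + 1)]
mu = sum(x * p for x, p in enumerate(P))
mu2 = sum((x - mu)**2 * p for x, p in enumerate(P))
mu3 = sum((x - mu)**3 * p for x, p in enumerate(P))

# dP_x/dtheta = P_x (x - mu)/theta and
# d2P_x/dtheta2 = P_x [(x - mu)^2 - mu2 - (x - mu)] / theta^2.
dP = [p * (x - mu) / theta for x, p in enumerate(P)]
d2P = [p * ((x - mu)**2 - mu2 - (x - mu)) / theta**2 for x, p in enumerate(P)]

A1 = sum(d * d / p for d, p in zip(dP, P))
B1 = sum(d * d2 / p for d, d2, p in zip(dP, d2P, P))
assert abs(A1 - mu2 / theta**2) < 1e-10       # A1 = mu2 / theta^2
assert abs(B1 - (mu3 - mu2) / theta**3) < 1e-10  # B1 = (mu3 - mu2)/theta^3
```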

To find the amount of bias in \hat{w}, following Haldane and Smith (1956), we have that the amount of bias to order 1/N in \hat{w} is given by

    b(\hat{w}) = -\frac{1}{2N} \frac{B_2}{A_2^2},    (2.1.14)

where

    A_2 = \sum_{x \in T} \left(\frac{dP_x}{dw}\right)^2 \Big/ P_x = \frac{A_1}{(dw/d\theta)^2}    (2.1.15)

and

    B_2 = \sum_{x \in T} \left(\frac{dP_x}{dw}\right)\left(\frac{d^2 P_x}{dw^2}\right) \Big/ P_x = \frac{1}{(dw/d\theta)^3}\left[B_1 - \frac{d^2w/d\theta^2}{dw/d\theta}\,A_1\right],    (2.1.16)

where A_1 and B_1 are defined by (2.1.10) and (2.1.11), respectively. Then from (2.1.14),

    b(\hat{w}) = -\frac{1}{N} \cdot \frac{\theta\,\dfrac{dw}{d\theta}\left[\mu_3 - \mu_2\left(1 + \theta\,\dfrac{d^2w/d\theta^2}{dw/d\theta}\right)\right]}{2\mu_2^2}.    (2.1.17)

2.2 Estimation by Likelihood for a Truncated gpsd

Let T* be a non-null subset of the range T of the gpsd (1.1.4), and consider the distribution (1.1.4) truncated to the subset T*. In this case, it can be easily verified that the (truncated) random variable X* has the probability distribution

    P_x^* = \Pr\{X^* = x\} = \frac{a_x \theta^x}{f^*(\theta)},    x \in T^*,    (2.2.1)

where

$$f^*(\theta) = \sum_{x\in T^*} a_x\theta^x. \qquad (2.2.2)$$

2.2.1 It is easy to see that the truncated gpsd (2.2.1) is in turn a gpsd in its own right with the generating function given by (2.2.2), and consequently, all the properties of (1.1.4) are valid for (2.2.1). To be explicit, distinguishing the characteristics of this truncated distribution (2.2.1) from those of the complete gpsd (1.1.4) by means of an asterisk (*), it immediately follows that relations analogous to those in (1.1.5) - (1.1.15) will hold for the mean $\mu^*$, crude moments $m_r^*$, central moments $\mu_r^*$, mgf $M^*(t)$, etc., such as

$$\mu^* = \theta\,\frac{d}{d\theta}\left[\log f^*(\theta)\right], \qquad \mu_2^* = \theta\,\frac{d\mu^*}{d\theta}, \quad \text{etc.}$$

2.2.2 Similarly, the maximum likelihood estimate $\hat\theta$ (which, in this case, is also equivalent to the moments estimate) for $\theta$ is to be computed from the likelihood equation

$$\bar{x}^* = \mu^*(\theta) \qquad (2.2.3)$$

where $\bar{x}^*$ is the mean of a random sample of size N from the truncated gpsd (2.2.1). The asymptotic variance of $\hat\theta$ is similarly given by

$$\operatorname{Var}(\hat\theta) = \theta^2/[N\mu_2^*(\theta)].$$

The iterative process of solving (2.2.3) can again be put down in the form

$$\theta_1 = \theta_0 + [\bar{x}^* - \mu^*(\theta_0)]\Big/\frac{d\mu^*}{d\theta}\Big|_{\theta_0}$$

or

$$\theta_1 = \theta_0\left[1 + \frac{\bar{x}^* - \mu^*(\theta_0)}{\mu_2^*(\theta_0)}\right].$$

The formulae for the amount of bias to order $\frac{1}{N}$ in $\hat\theta$ [and in $\hat w = w(\hat\theta)$] can be written down similarly.

2.3 Estimation by Likelihood for a Censored gpsd

Let T*, $T_j$ (j = 1,2,...k) be k+1 mutually exclusive and exhaustive subsets of the range T of the gpsd (1.1.4). Suppose that in a random sample of size N from the gpsd (1.1.4), we have a record of the number $n_j$ of observations in the subset $T_j$ (j = 1,2,...k) and of the n* observations $x_i$ (i = 1,2,...n*) in the subset T*, so that

$$N = n^* + \sum_{j=1}^{k} n_j,$$

and we write

$$\bar{x}^* = \sum_{i=1}^{n^*} x_i/n^*. \qquad (2.3.1)$$

2.3.1 The logarithm of the likelihood function may be written as

$$\log L = \text{constant} + \sum_{j=1}^{k} n_j\log f_j + \sum_{i=1}^{n^*} x_i\log\theta - N\log f$$

where

$$f_j = f_j(\theta) = \sum_{x\in T_j} a_x\theta^x \quad\text{and}\quad f = f(\theta) = \sum_{x\in T} a_x\theta^x. \qquad (2.3.2)$$

The "efficient score" for $\theta$ is

$$\phi(\theta) = \frac{d}{d\theta}[\log L] = \sum_{j=1}^{k} n_j f_j'/f_j + \sum_{i=1}^{n^*} x_i/\theta - Nf'/f$$
$$= \frac{1}{\theta}\Big[n^*\bar{x}^* - \big(N\mu - \sum_{j=1}^{k} n_j\nu_j\big)\Big] \qquad (2.3.3)$$

where

$$\nu_j = \nu_j(\theta) = \sum_{x\in T_j} x\,a_x\theta^x\big/f_j \qquad (2.3.4)$$

is the mean of the j-th class $T_j$ and $\mu$ is the mean of the gpsd (1.1.4). Thus, the likelihood equation for estimating $\theta$ is

$$n^*\bar{x}^* = N\mu - \sum_{j=1}^{k} n_j\nu_j \qquad (2.3.5)$$

where $\mu = \mu(\theta)$ and $\nu_j = \nu_j(\theta)$. The asymptotic variance of the estimate $\hat\theta$ derived from (2.3.5) is $1/I(\theta)$, where

$$I(\theta) = -E\left(\frac{d^2\log L}{d\theta^2}\right) = \frac{N}{\theta}\Big[\frac{d\mu}{d\theta} - \sum_{j=1}^{k} p_j\,\frac{d\nu_j}{d\theta}\Big] \qquad (2.3.6)$$

where $p_j = f_j/f$ is the probability for the j-th class $T_j$ and primes denote differentiation with respect to $\theta$. It may be noted that when the subsets $T_j$ (j = 1,2,...k) are all empty, we get our previous formula (2.1.2) for the estimate from a sample from a complete gpsd.

2.4 Estimation with Doubtful Observations

Let T* be a subset of the range T of the gpsd (1.1.4). Consider the situation where the experimenter has doubts about sample

observations not in T*, and therefore merely records the number of observations not in T*. The model considered is:

$$\operatorname{Prob}\{X = x\} = \begin{cases} \beta & \text{for the event } x\notin T^* \\[4pt] (1-\beta)\,a_x\theta^x/f^*(\theta) & \text{for } x\in T^* \end{cases} \qquad (2.4.1)$$

where $0 < \beta < 1$ and

$$f^*(\theta) = \sum_{x\in T^*} a_x\theta^x. \qquad (2.4.2)$$

2.4.1 If, in a sample of size N, the number of observations not in T* is $n_1$ and the records of the other $N - n_1 = n^*$ (say) are $x_i$ (i = 1,2,...n*), the logarithm of the likelihood function is then given by

$$\log L = \text{constant} + n_1\log\beta + n^*\log(1-\beta) + \sum_{i=1}^{n^*} x_i\log\theta - n^*\log f^*(\theta).$$

The "efficient scores" for $\beta$ and $\theta$ are then

$$\phi_1 = \frac{\partial\log L}{\partial\beta} = \frac{n_1}{\beta} - \frac{n^*}{1-\beta} \qquad (2.4.3)$$

$$\phi_2 = \frac{\partial\log L}{\partial\theta} = \frac{n^*}{\theta}\left[\bar{x}^* - \mu^*(\theta)\right] \qquad (2.4.4)$$

where

$$\bar{x}^* = \sum_{i=1}^{n^*} x_i/n^* \qquad (2.4.5)$$

and

$$\mu^*(\theta) = \sum_{x\in T^*} x\,a_x\theta^x\big/f^*(\theta) \qquad (2.4.6)$$

is the mean of the gpsd truncated to the subset T*.

The likelihood estimates of $\beta$ and $\theta$ are thus given by

$$\hat\beta = n_1/N \qquad (2.4.7)$$

$$\mu^*(\hat\theta) = \bar{x}^*, \qquad (2.4.8)$$

so that the estimate of $\theta$ is derived simply by neglecting the $n_1$ observations not in T* and treating the sample of n* as one from the gpsd (1.1.4) truncated to T*. The elements of the "information matrix" are given by:

$$I_{11} = -E\left(\frac{\partial^2\log L}{\partial\beta^2}\right) = \frac{N}{\beta(1-\beta)} \qquad (2.4.9)$$

$$I_{12} = -E\left(\frac{\partial^2\log L}{\partial\beta\,\partial\theta}\right) = 0 \qquad (2.4.10)$$

$$I_{22} = -E\left(\frac{\partial^2\log L}{\partial\theta^2}\right) = \frac{N(1-\beta)}{\theta}\,\frac{d\mu^*}{d\theta} = \frac{N(1-\beta)\,\mu_2^*(\theta)}{\theta^2}. \qquad (2.4.11)$$

2.4.2 The hypothesis $H_0$ of interest is that the proportion of doubtful observations conforms to the gpsd (1.1.4), that is,

$$H_0:\ \beta = 1 - f^*(\theta)/f(\theta).$$

If $H_0$ is true, the estimate of $\theta$ is obtained by consideration of the sample as a censored one, as in Section 2.3. The estimate of $\theta$ under $H_0$ is thus obtained from:

$$n^*\bar{x}^* = N\mu - n_1\nu_1 \qquad (2.4.12)$$

where

$$\nu_1 = \sum_{x\notin T^*} x\,a_x\theta^x\big/[f(\theta) - f^*(\theta)] \qquad (2.4.13)$$

is the mean of the subset complementary to T*. The estimate of $\theta$ under $H_0$ derived from this equation will be denoted by $\theta_0$. The estimate of $\beta$ under $H_0$ is then given by

$$\beta_0 = 1 - f^*(\theta_0)/f(\theta_0). \qquad (2.4.14)$$
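A sketch (ours, not from the thesis) of the unrestricted estimates (2.4.7)-(2.4.8): $\hat\beta$ is a simple proportion, while $\hat\theta$ solves $\mu^*(\theta) = \bar{x}^*$, here by bisection; the Poisson choice with $T^* = \{1,2,\dots\}$, for which $\mu^*(\theta) = \theta e^{\theta}/(e^{\theta}-1)$, and all names are our illustrative assumptions:

```python
import math

def doubtful_mle(n1, N, xbar_star, mu_star, lo, hi, tol=1e-12):
    """Estimates (2.4.7)-(2.4.8): beta_hat = n1/N, and theta_hat solving
    mu*(theta) = xbar* by bisection (mu* assumed increasing on [lo, hi])."""
    beta_hat = n1 / N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mu_star(mid) < xbar_star:
            lo = mid
        else:
            hi = mid
    return beta_hat, (lo + hi) / 2

# Poisson gpsd with T* = {1, 2, ...} (the zero class doubtful):
# f*(t) = e^t - 1, so mu*(t) = t e^t / (e^t - 1).
mu_star = lambda t: t * math.exp(t) / (math.exp(t) - 1)

# Feed the theoretical truncated mean at theta = 2, so the true value is recovered.
beta_hat, theta_hat = doubtful_mle(
    n1=20, N=100,
    xbar_star=2 * math.exp(2) / (math.exp(2) - 1),
    mu_star=mu_star, lo=0.05, hi=20.0)
```

As the text notes, the doubtful observations simply drop out of the estimation of $\theta$: only $\bar{x}^*$ of the undoubted part of the sample enters.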

Following the method suggested by Rao (1948) for testing $H_0$, we have the criterion

$$\chi^2 = \frac{[\phi_1(\beta_0,\theta_0)]^2}{I_{11}(\beta_0)} + \frac{[\phi_2(\beta_0,\theta_0)]^2}{I_{22}(\beta_0,\theta_0)} \qquad (2.4.15)$$

which is asymptotically distributed as a Chi-square with one degree of freedom. In the present case, (2.4.15) takes the following form:

$$\chi^2 = \frac{N(\hat\beta - \beta_0)^2}{\beta_0(1-\beta_0)} + \frac{n^{*2}\left[\bar{x}^* - \mu^*(\theta_0)\right]^2}{N(1-\beta_0)\,\mu_2^*(\theta_0)} \qquad (2.4.16)$$

where $\mu_2^*(\theta) = \theta\,\dfrac{d\mu^*}{d\theta}$.

2.5 Homogeneity and Combined Estimation

In the light of random samples from a number of gpsd's, it may be required to examine whether the distributions are homogeneous in respect of the parameter $\theta$ and, if so, to make a combined estimate of $\theta$. Let $x_{ji}$ (i = 1,2,...$N_j$) be a random sample from the j-th gpsd characterized by the probability law

$$a_x^{(j)}\theta_j^x\big/f_j(\theta_j), \quad x\in T_j, \qquad (2.5.1)$$

where

$$f_j(\theta_j) = \sum_{x\in T_j} a_x^{(j)}\theta_j^x, \qquad (2.5.2)$$

$T_j$ is the range of the j-th gpsd, and j = 1,2,...k.

2.5.1 The logarithm of the joint likelihood function is

$$\log L = \text{constant} + \sum_{j=1}^{k} N_j\bar{x}_j\log\theta_j - \sum_{j=1}^{k} N_j\log f_j(\theta_j),$$

where

$$\bar{x}_j = \sum_{i=1}^{N_j} x_{ji}/N_j$$

is the mean of the sample from the j-th gpsd. The j-th "efficient score" is then

$$\phi_j = \frac{\partial\log L}{\partial\theta_j} = \frac{N_j}{\theta_j}\left[\bar{x}_j - \mu_j(\theta_j)\right] \qquad (2.5.3)$$

where $\mu_j(\theta_j)$ is the mean of the j-th gpsd. The elements of the "information matrix" are

$$I_{jj} = \frac{N_j}{\theta_j}\,\frac{d\mu_j(\theta_j)}{d\theta_j} \qquad (2.5.4)$$

$$= N_j\,\mu_{2j}(\theta_j)/\theta_j^2 \qquad (2.5.5)$$

$$I_{jj'} = 0, \quad j\neq j', \qquad (2.5.6)$$

where $\mu_{2j}(\theta_j)$ is the variance of the j-th gpsd.

2.5.2 The hypothesis of homogeneity is

$$H_0:\ \theta_1 = \theta_2 = \dots = \theta_k.$$

If the hypothesis $H_0$ is true, the common value may be denoted by $\theta$, and the efficient score and the information with respect to $\theta$ are given by

$$\psi(\theta) = \frac{N}{\theta}\Big[\bar{x} - \sum_{j=1}^{k} N_j\mu_j(\theta)/N\Big] \qquad (2.5.7)$$

where

$$\bar{x} = \sum_{j=1}^{k} N_j\bar{x}_j/N, \qquad N = \sum_{j=1}^{k} N_j,$$

and

$$I(\theta) = \frac{1}{\theta}\sum_{j=1}^{k} N_j\,\frac{d\mu_j(\theta)}{d\theta} \qquad (2.5.8)$$

$$= \frac{1}{\theta^2}\sum_{j=1}^{k} N_j\,\mu_{2j}(\theta). \qquad (2.5.9)$$

To solve the equation $\psi(\theta) = 0$ for the maximum likelihood estimate $\hat\theta$, one starts with an approximation $\theta_0$ and derives a better approximation $\theta_1$ from the formula

$$\theta_1 = \theta_0 + \Big[N\bar{x} - \sum_{j=1}^{k} N_j\mu_j(\theta_0)\Big]\Big/\sum_{j=1}^{k} N_j\,\frac{d\mu_j}{d\theta}\Big|_{\theta_0} \qquad (2.5.10)$$

or

$$\theta_1 = \theta_0\Big[1 + \Big(N\bar{x} - \sum_{j=1}^{k} N_j\mu_j(\theta_0)\Big)\Big/\sum_{j=1}^{k} N_j\,\mu_{2j}(\theta_0)\Big]. \qquad (2.5.11)$$

2.5.3 A test of the homogeneity hypothesis $H_0$ is then given by the statistic

$$\chi^2 = \sum_{j=1}^{k}[\phi_j(\hat\theta)]^2\big/I_{jj}(\hat\theta) \qquad (2.5.12)$$

$$= \sum_{j=1}^{k}\frac{N_j^2}{\hat\theta^2}\left[\bar{x}_j - \mu_j(\hat\theta)\right]^2\Big/\frac{N_j\,\mu_{2j}(\hat\theta)}{\hat\theta^2} \qquad (2.5.13)$$

$$= \sum_{j=1}^{k} N_j\left[\bar{x}_j - \mu_j(\hat\theta)\right]^2\big/\mu_{2j}(\hat\theta), \qquad (2.5.14)$$

which is asymptotically distributed as a Chi-square with (k-1) degrees of freedom if $H_0$ is true.
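The homogeneity statistic (2.5.14) can be sketched as follows (our code, not the thesis'; the Poisson illustration, for which $\mu_j(\theta) = \mu_{2j}(\theta) = \theta$ and the pooled maximum likelihood estimate is the weighted sample mean, is our assumption):

```python
def homogeneity_chi2(sizes, means, mu_funcs, mu2_funcs, theta_hat):
    """Chi-square of (2.5.14): sum_j N_j [xbar_j - mu_j(theta_hat)]^2 / mu2_j(theta_hat),
    asymptotically chi-square with k-1 degrees of freedom under H0."""
    return sum(N * (xb - mu(theta_hat)) ** 2 / mu2(theta_hat)
               for N, xb, mu, mu2 in zip(sizes, means, mu_funcs, mu2_funcs))

# Two Poisson samples: mu_j(t) = mu2_j(t) = t, pooled MLE = weighted mean of means.
sizes, means = [50, 50], [2.0, 2.5]
theta_hat = sum(N * xb for N, xb in zip(sizes, means)) / sum(sizes)  # = 2.25
ident = lambda t: t
chi2 = homogeneity_chi2(sizes, means, [ident, ident], [ident, ident], theta_hat)
# chi2 = 50*(0.25)^2/2.25 + 50*(0.25)^2/2.25 = 25/9
```

The statistic would then be referred to the Chi-square distribution with k - 1 = 1 degree of freedom.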

2.6 Estimation for a gpsd with Two Parameters

Consider a gpsd with two parameters, taking the form

$$\operatorname{Prob}\{X = x\} = \frac{a_x(\lambda)\theta^x}{f(\theta,\lambda)}, \quad x\in T, \qquad (2.6.1)$$

where T is the range of the gpsd and the generating function

$$f(\theta,\lambda) = \sum_{x\in T} a_x(\lambda)\theta^x \qquad (2.6.2)$$

is such that $f(\theta,\lambda)$ is positive and bounded for all admissible values of the two parameters $\theta$ and $\lambda$, and the non-negative coefficients $a_x(\lambda)$ now depend on x and $\lambda$. The binomial and negative binomial distributions are special cases of (2.6.1) when they are considered to be distributions with two parameters.

2.6.1 To estimate $\theta$ and $\lambda$ on the basis of a sample $x_i$ (i = 1,2,...N) of size N from (2.6.1), the logarithm of the likelihood function is

$$\log L = \text{constant} + \sum_{i=1}^{N} x_i\log\theta + \sum_{i=1}^{N}\log a_{x_i}(\lambda) - N\log f(\theta,\lambda).$$

The "efficient score" for $\theta$ is then

$$\phi_1 = \phi_1(\theta,\lambda) = \frac{\partial}{\partial\theta}[\log L] = \frac{N}{\theta}\left[\bar{x} - \mu(\theta,\lambda)\right] \qquad (2.6.3)$$

and the likelihood equation $\phi_1(\theta,\lambda) = 0$ reduces to

$$\bar{x} = \mu(\theta,\lambda), \qquad (2.6.4)$$

which is the same as the first-moment equation.

The "efficient score" for $\lambda$ is

$$\phi_2 = \phi_2(\theta,\lambda) = \frac{\partial}{\partial\lambda}[\log L] = N\Big[\sum_{i=1}^{N}\frac{d}{d\lambda}\log a_{x_i}(\lambda)\Big/N - \frac{\partial}{\partial\lambda}\log f(\theta,\lambda)\Big] \qquad (2.6.5)$$

and the estimating equation $\phi_2(\theta,\lambda) = 0$ becomes

$$\frac{\partial}{\partial\lambda}\log f(\theta,\lambda) = \frac{1}{N}\sum_{i=1}^{N}\frac{d}{d\lambda}\log a_{x_i}(\lambda). \qquad (2.6.6)$$

This, however, is not a moment equation. The second-moment equation would be

$$s^2 = \mu_2(\theta,\lambda), \qquad (2.6.7)$$

where

$$s^2 = \sum_{i=1}^{N}(x_i - \bar{x})^2/N.$$

Thus, unlike gpsd's of the form (1.1.4) with a single parameter, gpsd's given by (2.6.1) with two parameters do not yield identical "moment" and "maximum likelihood" estimates.

2.6.2 The elements of the "information matrix"

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{12} & I_{22} \end{pmatrix} \qquad (2.6.8)$$

are given by

$$I_{11} = -E\left(\frac{\partial^2\log L}{\partial\theta^2}\right) = \frac{N}{\theta}\,\frac{\partial\mu(\theta,\lambda)}{\partial\theta} \qquad (2.6.9)$$

$$I_{12} = -E\left(\frac{\partial^2\log L}{\partial\theta\,\partial\lambda}\right) = \frac{N}{\theta}\,\frac{\partial\mu(\theta,\lambda)}{\partial\lambda} \qquad (2.6.10)$$

$$I_{22} = -E\left(\frac{\partial^2\log L}{\partial\lambda^2}\right) = N\Big[\frac{\partial^2}{\partial\lambda^2}\log f(\theta,\lambda) - h(\lambda)\Big] \qquad (2.6.11)$$

where

$$h(\lambda) = E\Big[\frac{d^2}{d\lambda^2}\log a_X(\lambda)\Big] = \sum_{x\in T}\Big[\frac{d^2}{d\lambda^2}\log a_x(\lambda)\Big]\frac{a_x(\lambda)\theta^x}{f(\theta,\lambda)}.$$

The asymptotic "dispersion matrix" of the estimates $\hat\theta,\hat\lambda$ obtained by solving (2.6.4) and (2.6.6) is then given by

$$\begin{pmatrix} \operatorname{var}(\hat\theta) & \operatorname{cov}(\hat\theta,\hat\lambda) \\ \operatorname{cov}(\hat\theta,\hat\lambda) & \operatorname{var}(\hat\lambda) \end{pmatrix} = I^{-1}. \qquad (2.6.12)$$

If, instead of $\theta$ and $\lambda$, $\mu = \mu(\theta,\lambda)$ and $\lambda$ are regarded as the parameters, the maximum likelihood estimates of $\mu$ and $\lambda$ are asymptotically uncorrelated; this follows from (2.6.10).

CHAPTER III

3.0 SIMPLE METHODS OF ESTIMATION FOR A CLASS OF DISCRETE DISTRIBUTIONS

In Sections 2.1, 2.2 and 2.3, we discussed the method of maximum likelihood for estimation on the basis of samples from gpsd's which are either complete, truncated or censored. The method, though "efficient", generally involves heavy computation. Moreover, it does not yield an unbiased estimate in several cases. In this chapter, we consider some other methods of estimation and investigate their important properties. All these are easy to compute, and it also turns out that some of them provide unbiased estimates.

3.1 Estimation by the Ratio Method for a gpsd [Range T finite and T = (c,c+1,...c+k = d) with positive probabilities]

3.1.1 Consider the gpsd (1.1.4) with range T finite and T = (c,c+1,...c+k = d) with positive probabilities, that is, with the coefficients $a_x > 0$ for all $x\in T$. To be explicit, the gpsd that we consider here is of the form

$$P_x = \operatorname{Prob}\{X = x\} = \frac{a_x\theta^x}{f(\theta)} \qquad (3.1.1)$$

where $x\in T = (c,c+1,...c+k = d)$, d finite,

$$f(\theta) = \sum_{x=c}^{d} a_x\theta^x \qquad (3.1.2)$$

and $a_x > 0$ for $x\in T$.

Let

$$g_r(x) = \frac{a_{x-r}}{a_x}, \quad x\in T, \qquad (3.1.3)$$

with r being an integer such that $x-r\in T$. Then

$$\sum_{x=u}^{v} g_r(x)P_x = \sum_{x=u}^{v} a_{x-r}\theta^x/f(\theta) = \theta^r\sum_{x=u-r}^{v-r} a_x\theta^x/f(\theta) = \theta^r\sum_{x=u-r}^{v-r} P_x \qquad (3.1.4)$$

where u and v are arbitrary with $c+r \le u \le v \le d$. From (3.1.4), we get the identity

$$\theta^r = \sum_{x=u}^{v} g_r(x)P_x\Big/\sum_{x=u-r}^{v-r} P_x, \qquad (3.1.5)$$

which can be made use of in problems of estimation. In a sample of size N, if $n_x$ is the observed frequency for x, then, since $E(n_x) = NP_x$, the statistic

$$\sum_{x=u}^{v} g_r(x)n_x\Big/\sum_{x=u-r}^{v-r} n_x \qquad (3.1.6)$$

may be taken as an estimate of $\theta^r$ for admissible values of r = 1,2, etc. Since u and v are arbitrary, the same method is applicable for estimation in truncated and censored gpsd's also, provided that their range contains a subset of consecutive integers. We call these estimates "ratio estimates." It is interesting to note that the methods given by Plackett (1953) and Moore (1952, 1954) for estimating $\theta$ in truncated binomial and Poisson

distributions come out as special cases of the method we suggest here. The method, which we call the ratio method, is applicable not merely for estimating $\theta$, but also for its integral powers, and for any gpsd of this section, truncated or censored. The ratio estimate is not generally unbiased or efficient, but is always easy to compute. In certain cases (see Section 3.2), however, unbiased estimates can be obtained by the ratio method. In other cases, such as those in this section, the bias is generally of the order $\frac{1}{N}$, as discussed below.

3.1.2 Consider the following ratio estimate of $\theta$ for the gpsd:

$$\theta' = \frac{t_1}{t_2} \qquad (3.1.7)$$

where

$$t_1 = \sum_{x=c+1}^{d}\Big(\frac{a_{x-1}}{a_x}\Big)n_x \qquad (3.1.8)$$

and

$$t_2 = \sum_{x=c}^{d-1} n_x. \qquad (3.1.9)$$

Then, writing

$$E(t_2) = N\sum_{x=c}^{d-1} P_x = N(1-P_d) = NP, \text{ say,} \qquad (3.1.10)$$

where

$$P = 1 - P_d, \qquad (3.1.11)$$

we have

$$E(t_1) = NP\theta. \qquad (3.1.12)$$

Let

$$t_1 - E(t_1) = \delta t_1 \quad\text{and}\quad t_2 - E(t_2) = \delta t_2. \qquad (3.1.13)$$

Then,

$$\theta' = \frac{t_1}{t_2} = \theta\Big(1 + \frac{\delta t_1}{NP\theta}\Big)\Big(1 + \frac{\delta t_2}{NP}\Big)^{-1}.$$

Since the deviations $\delta t_1, \delta t_2$ are stochastically of order $N^{1/2}$, we get on expansion

$$\theta' = \theta\Big[1 + \frac{\delta t_1}{NP\theta} - \frac{\delta t_2}{NP} - \frac{(\delta t_1)(\delta t_2)}{N^2P^2\theta} + \frac{(\delta t_2)^2}{N^2P^2}\Big], \qquad (3.1.14)$$

neglecting terms of order higher than $\frac{1}{N}$. Thus, to this order of approximation,

$$E(\theta') = \theta\Big[1 + \frac{E(\delta t_2)^2}{N^2P^2} - \frac{E(\delta t_1)(\delta t_2)}{N^2P^2\theta}\Big]. \qquad (3.1.15)$$

Now a little computation gives

$$E(\delta t_2)^2 = NP(1-P) \qquad (3.1.16)$$

and

$$E(\delta t_1)(\delta t_2) = N\theta\big[P(1-P) - P_{d-1}\big]. \qquad (3.1.17)$$

Thus

$$E(\theta') = \theta + \frac{\theta P_{d-1}}{NP^2}, \qquad (3.1.18)$$

from which we get the magnitude of the bias in $\theta'$, to order $\frac{1}{N}$,

$$b(\theta') = \frac{\theta P_{d-1}}{NP^2} = \theta P_{d-1}\big/N(1-P_d)^2. \qquad (3.1.19)$$

3.1.3 The variance of $\theta'$, correct to terms of order $\frac{1}{N}$, is

$$\operatorname{Var}(\theta') = \frac{1}{N^2P^2}\big[E(\delta t_1)^2 + \theta^2E(\delta t_2)^2 - 2\theta E(\delta t_1)(\delta t_2)\big]. \qquad (3.1.20)$$

Now

$$E(\delta t_1)^2 = N(D - P^2\theta^2) \qquad (3.1.21)$$

where

$$D = \sum_{x=c+1}^{d}\Big(\frac{a_{x-1}}{a_x}\Big)^2 P_x. \qquad (3.1.22)$$

Thus, to order $\frac{1}{N}$,

$$\operatorname{Var}(\theta') = \frac{1}{NP^2}\big[D - \theta^2P + 2\theta^2P_{d-1}\big]. \qquad (3.1.23)$$

3.1.4 One simple estimate suggested by the identity

$$\frac{a_x}{a_{x+1}}\cdot\frac{P_{x+1}}{P_x} = \theta \qquad (3.1.24)$$

is given by

$$m = \frac{a_x}{a_{x+1}}\cdot\frac{n_{x+1}}{n_x}. \qquad (3.1.25)$$

For this estimate, to terms of order $\frac{1}{N}$,

$$b(m) = \frac{\theta}{NP_x} \qquad (3.1.26)$$

and

$$\operatorname{Var}(m) = \frac{\theta(1 + b_x\theta)}{Nb_xP_x}, \qquad (3.1.27)$$

where

$$b_x = \frac{a_{x+1}}{a_x}.$$

It is seen from (3.1.26) that the amount of bias of m for $\theta$ is only of order $\frac{1}{N}$. Also, (3.1.26) and (3.1.27) jointly suggest that one may use the modal class for estimation with advantage.
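A sketch (ours) of the ratio estimate (3.1.6). As a check on the identity (3.1.5), if the observed frequencies are taken exactly proportional to the probabilities $P_x$, the statistic reproduces $\theta^r$ exactly; the binomial choice $a_x = \binom{4}{x}$ with $\theta = 0.5$ is our illustrative assumption:

```python
from math import comb

def ratio_estimate(freq, a, r, u, v):
    """Estimate theta^r by (3.1.6):
    sum_{x=u}^{v} (a_{x-r}/a_x) n_x / sum_{x=u-r}^{v-r} n_x."""
    num = sum((a[x - r] / a[x]) * freq.get(x, 0) for x in range(u, v + 1))
    den = sum(freq.get(x, 0) for x in range(u - r, v - r + 1))
    return num / den

# Binomial gpsd with n = 4, theta = 0.5: "ideal" frequencies prop. to a_x theta^x.
a = {x: comb(4, x) for x in range(5)}
freq = {x: a[x] * 0.5 ** x * 1600 for x in range(5)}
est = ratio_estimate(freq, a, r=1, u=1, v=4)    # estimates theta
est2 = ratio_estimate(freq, a, r=2, u=2, v=4)   # estimates theta^2
```

Because u and v are free, the same function applies unchanged to truncated or censored samples, as long as the chosen range of consecutive integers carries positive frequency.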

3.2 Unbiased Estimation by the Ratio Method for a gpsd [Range T infinite and T = (c,c+1,...∞) with positive probabilities]

It is easy to demonstrate that the ratio method discussed in Section 3.1 gives the unique unbiased estimate of $\theta$, linear in the frequencies, for a gpsd with range T infinite and T = (c,c+1,...∞) with positive probabilities. For, consider the gpsd

$$P_x = \operatorname{Prob}\{X = x\} = \frac{a_x\theta^x}{f(\theta)}, \quad x = c,c+1,...\infty, \qquad (3.2.1)$$

where

$$f(\theta) = \sum_{x=c}^{\infty} a_x\theta^x \qquad (3.2.2)$$

and $a_x > 0$ for all x = c,c+1,...∞.

3.2.1 Now, if in a sample of size N from the gpsd (3.2.1) the frequency of x is $n_x$, and we want an unbiased estimate for $\theta$ of the type linear in $n_x$, we should be able to demonstrate the existence of a function of x, t(x), such that, denoting the corresponding estimate by

$$\hat\theta = \sum_{x=c}^{\infty} t(x)n_x, \qquad (3.2.3)$$

we must have $E(\hat\theta) = \theta$ for all $\theta$ in the parameter space of (3.2.1). That is,

$$N\sum_{x=c}^{\infty} t(x)a_x\theta^x/f(\theta) = \theta,$$

or

$$N\sum_{x=c}^{\infty} t(x)a_x\theta^x = \sum_{x=c}^{\infty} a_x\theta^{x+1}.$$

Since this is an identity in $\theta$, equating coefficients of corresponding powers of $\theta$, we get

$$t(x) = \begin{cases} 0 & \text{for } x = c \\[4pt] \dfrac{1}{N}\Big(\dfrac{a_{x-1}}{a_x}\Big) & \text{for } x = c+1,c+2,...\infty. \end{cases}$$

3.2.2 Thus, the unique unbiased estimate of $\theta$ linear in the frequencies comes out to be the ratio estimate

$$\theta' = \frac{1}{N}\sum_{x=c+1}^{\infty}\Big(\frac{a_{x-1}}{a_x}\Big)n_x. \qquad (3.2.4)$$

The exact variance of this estimate is

$$\sigma^2(\theta') = \frac{1}{N}\Big[\sum_{x=c+1}^{\infty}\Big(\frac{a_{x-1}}{a_x}\Big)^2P_x - \theta^2\Big].$$

An unbiased estimate of $\sigma^2(\theta')$ is

$$\Big[\sum_{x=c+1}^{\infty}\Big(\frac{a_{x-1}}{a_x}\Big)^2n_x - N\theta'^2\Big]\Big/N(N-1), \qquad (3.2.5)$$

the proof of which is almost immediate once one recognizes that $\theta'$ is the mean of N independent identically distributed random variables $Y_i$ with probability distribution given by (for i = 1,2,...N)

$$\operatorname{Prob}\{Y_i = 0\} = P_c \quad\text{and}\quad \operatorname{Prob}\Big\{Y_i = \frac{a_{x-1}}{a_x}\Big\} = P_x \quad\text{for } x = c+1,c+2,...\infty.$$
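The estimates (3.2.4) and (3.2.5) can be sketched as follows (our code; note that for the Poisson, where $a_x = 1/x!$ and hence $a_{x-1}/a_x = x$, the unbiased ratio estimate of $\theta$ reduces to the sample mean). Taking the minimum observed value as c is an assumption made here purely to keep the illustration short:

```python
def unbiased_ratio_estimate(freq, g):
    """theta' of (3.2.4): (1/N) sum_{x >= c+1} (a_{x-1}/a_x) n_x, with g(x) = a_{x-1}/a_x.
    Also returns the unbiased variance estimate (3.2.5)."""
    N = sum(freq.values())
    c = min(freq)                       # assumes the lowest class c was observed
    theta = sum(g(x) * nx for x, nx in freq.items() if x > c) / N
    est_var = (sum(g(x) ** 2 * nx for x, nx in freq.items() if x > c)
               - N * theta ** 2) / (N * (N - 1))
    return theta, est_var

# Poisson (c = 0): a_x = 1/x!, so g(x) = a_{x-1}/a_x = x.
freq = {0: 3, 1: 4, 2: 2, 3: 1}        # N = 10 observations
theta, est_var = unbiased_ratio_estimate(freq, g=lambda x: x)
# theta = (1*4 + 2*2 + 3*1)/10 = 1.1, i.e. the sample mean, as expected.
```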

One can compare $\sigma^2(\theta')$ with the asymptotic variance $\operatorname{Var}(\hat\theta)$ of the maximum likelihood estimate of $\theta$, and the efficiency of the ratio estimate $\theta'$ can be computed. Of course, $\sigma^2(\theta') \ge \operatorname{Var}(\hat\theta)$ from the Cramer-Rao information limit to the variance; but this comparison is not quite valid, because the maximum likelihood estimate is not generally unbiased. Lastly, one may establish that

$$\theta'_r = \frac{1}{N}\sum_{x=c+r}^{\infty}\Big(\frac{a_{x-r}}{a_x}\Big)n_x \qquad (3.2.6)$$

is the only unbiased estimate of $\theta^r$ (r an integer) which is a linear function of the frequencies.

3.3 Estimation by the Two-Moments Method for a gpsd [Range T = (c,c+1,...d), d finite, or T = (c,c+1,...∞), with positive probabilities]

Consider the gpsd (1.1.4) with finite or infinite range T = (c,c+1,...d) with positive probabilities; that is, consider the gpsd

$$P_x = \operatorname{Prob}\{X = x\} = \frac{a_x\theta^x}{f(\theta)} \qquad (3.3.1)$$

where $x\in T = (c,c+1,...d)$, d finite or infinite,

$$f(\theta) = \sum_{x=c}^{d} a_x\theta^x, \quad a_x > 0. \qquad (3.3.2)$$

3.3.1 For this distribution, it is easy to see that

$$\mu = \theta G_{01} + cP_c \qquad (3.3.3)$$

and

$$m_2 = \mu + \theta G_{11} + c(c-1)P_c \qquad (3.3.4)$$

where

$$G_{ij} = \sum_{x=c}^{d-1} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j P_x. \qquad (3.3.5)$$

Further, from (3.3.3) and (3.3.4), we have

$$\frac{m_2 - \mu - \theta G_{11}}{\mu - \theta G_{01}} = c-1 \quad\text{when } c\neq 0, \qquad (3.3.6)$$

which, when solved for $\theta$, gives the identity

$$\theta = \frac{m_2 - c\mu}{G_{11} - (c-1)G_{01}} \quad\text{when } c\neq 0. \qquad (3.3.7)$$

From (3.3.3), we have the identity

$$\theta = \frac{\mu}{G_{01}} \quad\text{when } c = 0. \qquad (3.3.8)$$

3.3.2 The identities (3.3.7) and (3.3.8) can be made use of in estimating $\theta$. One has only to compute

$$S_i = \sum_{x=c}^{d} x^i n_x, \quad i = 1,2, \qquad (3.3.9)$$

and

$$g_{ij} = \sum_{x=c}^{d-1} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j n_x, \quad i = 0,1;\ j = 1, \qquad (3.3.10)$$

from the sample, and then

$$t = \frac{S_2 - cS_1}{g_{11} - (c-1)g_{01}} \quad\text{when } c\neq 0 \qquad (3.3.11)$$

or

$$t = \frac{S_1}{g_{01}} \quad\text{when } c = 0 \qquad (3.3.12)$$

can be taken as an estimate for $\theta$. Because we use the first two moments for the estimation of the single parameter, we call the estimate t the "two-moments estimate" and the method the "two-moments method."

3.3.3 Proceeding along the same lines as in Section 3.1, one gets, to terms of order $\frac{1}{N}$,

$$b(t) = E(t) - \theta = \frac{1}{NG^2}\big(\theta\sigma_{22} - \sigma_{12}\big) \qquad (3.3.13)$$

and

$$\operatorname{Var}(t) = \frac{1}{NG^2}\big[\sigma_{11} - 2\theta\sigma_{12} + \theta^2\sigma_{22}\big] \qquad (3.3.14)$$

where (i) for $c\neq 0$,

$$G = G_{11} - (c-1)G_{01}$$
$$\sigma_{11} = (m_4 - m_2^2) + c^2(m_2 - \mu^2) - 2c(m_3 - \mu m_2)$$
$$\sigma_{12} = (G_{31} - m_2G_{11}) - c(G_{21} - \mu G_{11}) - (c-1)(G_{21} - m_2G_{01}) + c(c-1)(G_{11} - \mu G_{01})$$
$$\sigma_{22} = (G_{22} - G_{11}^2) + (c-1)^2(G_{02} - G_{01}^2) - 2(c-1)(G_{12} - G_{11}G_{01})$$

and (ii) for $c = 0$,

$$G = G_{01}, \quad \sigma_{11} = m_2 - \mu^2, \quad \sigma_{12} = G_{11} - \mu G_{01}, \quad \sigma_{22} = G_{02} - G_{01}^2.$$

3.4 Estimation by the Two-Moments Method for a Truncated gpsd

Consider the gpsd (3.3.1) truncated to T* = (c',c'+1,...d'), d' ≤ d when d is finite.

The truncated gpsd can be written as

$$P_x^* = \operatorname{Prob}\{X^* = x\} = \frac{a_x\theta^x}{f^*(\theta)}, \quad x\in T^*, \qquad (3.4.1)$$

where

$$f^*(\theta) = \sum_{x=c'}^{d'} a_x\theta^x. \qquad (3.4.2)$$

3.4.1 For this distribution, it is easy to see that

$$\mu^* - \theta H_{01} = c'P_{c'}^* - (d'+1)P_{d'+1}^* \qquad (3.4.3)$$

and

$$m_2^* - \mu^* - \theta H_{11} = c'(c'-1)P_{c'}^* - d'(d'+1)P_{d'+1}^* \qquad (3.4.4)$$

where

$$H_{ij} = \sum_{x=c'}^{d'} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j P_x^* \qquad (3.4.5)$$

and $P_{d'+1}^*$ denotes $a_{d'+1}\theta^{d'+1}/f^*(\theta)$.

3.4.2 For estimation purposes, we consider the following four mutually exclusive and exhaustive cases:

Case (1) c' = 0 and d' Finite
Case (2) c' = 0 and d' Infinite
Case (3) c' ≠ 0 and d' Infinite
Case (4) c' ≠ 0 and d' Finite.

Case (1): c' = 0 and d' Finite

From (3.4.3) and (3.4.4), we have the identity

$$\theta = \frac{m_2^* - (d'+1)\mu^*}{H_{11} - d'H_{01}}, \qquad (3.4.6)$$

which we utilize to estimate $\theta$. We have only to compute

$$S_i = \sum_{x=0}^{d'} x^i n_x, \quad i = 1,2, \qquad (3.4.7)$$

and

$$h_{ij} = \sum_{x=c'}^{d'} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j n_x, \quad i = 0,1;\ j = 1, \qquad (3.4.8)$$

from the sample, and then

$$t^* = \frac{S_2 - (d'+1)S_1}{h_{11} - d'h_{01}} \qquad (3.4.9)$$

can be taken as an estimate for $\theta$. The estimate t* makes use of the (additional) information that the sample is taken from some known gpsd truncated to the range under consideration. The estimate t of Section 3.3 does not require, and hence does not make use of, this information. The formulae for the bias and variance of t* can be written down to order $\frac{1}{N}$ as

$$b(t^*) = \frac{1}{NH^2}\big(\theta\sigma_{22}^* - \sigma_{12}^*\big) \qquad (3.4.10)$$

and

$$\operatorname{Var}(t^*) = \frac{1}{NH^2}\big(\sigma_{11}^* - 2\theta\sigma_{12}^* + \theta^2\sigma_{22}^*\big) \qquad (3.4.11)$$

where

$$H = H_{11} - d'H_{01}$$
$$\sigma_{11}^* = (m_4^* - m_2^{*2}) + (d'+1)^2(m_2^* - \mu^{*2}) - 2(d'+1)(m_3^* - \mu^*m_2^*)$$
$$\sigma_{12}^* = (H_{31} - m_2^*H_{11}) - (d'+1)(H_{21} - \mu^*H_{11}) - d'(H_{21} - m_2^*H_{01}) + d'(d'+1)(H_{11} - \mu^*H_{01})$$
$$\sigma_{22}^* = (H_{22} - H_{11}^2) + d'^2(H_{02} - H_{01}^2) - 2d'(H_{12} - H_{11}H_{01}).$$
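Case (1) can be sketched as follows (our code, not the thesis'). For the Poisson gpsd truncated to {0,...,d'}, $(x+1)a_{x+1}/a_x = 1$, so $h_{01} = N$ and $h_{11} = S_1$; with frequencies exactly proportional to the truncated probabilities, t* of (3.4.9) reproduces $\theta$ exactly:

```python
def t_star(freq, w, d_prime):
    """t* of (3.4.9): (S2 - (d'+1) S1) / (h11 - d' h01),
    with w(x) = (x+1) a_{x+1} / a_x."""
    S1 = sum(x * nx for x, nx in freq.items())
    S2 = sum(x * x * nx for x, nx in freq.items())
    h01 = sum(w(x) * nx for x, nx in freq.items())
    h11 = sum(x * w(x) * nx for x, nx in freq.items())
    return (S2 - (d_prime + 1) * S1) / (h11 - d_prime * h01)

# Poisson with theta = 2 truncated to {0,...,4}: frequencies 3 * 2^x / x!.
freq = {0: 3, 1: 6, 2: 6, 3: 4, 4: 2}
t = t_star(freq, w=lambda x: 1, d_prime=4)
```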

Case (2) and Case (3): c' = 0, d' Infinite, and c' ≠ 0, d' Infinite

It can be easily verified in these cases that $H_{ij} = G_{ij}$ and, hence, $t^* = t$. Thus, we have the same treatment as in Section 3.3.3. Here we also observe that if we allow $d'\to d$ -- even when d is finite -- and use $a_{d+1} = 0$ formally, we again get $H_{ij} = G_{ij}$ and $t^* = t$. This observation is specially important in the case of the binomial distribution.

Case (4): c' ≠ 0 and d' Finite

It may be noted that the t* estimate is not available in this case. However, the t estimate still works, and the estimate of $\theta$ can thus be obtained by employing two moments.

3.5 An Upper Bound for Bias Per Unit Standard Error for Ratio Estimates and Two-Moments Estimates

We first establish a general result, true for the bias of an estimate of a certain type. Let the probability distribution from which a sample $x_1,x_2,...x_n$ is drawn be a general distribution of a random variable X with a single parameter $\theta$.

3.5.1 Let $t_1$ and $t_2$ be two statistics based on the sample such that

$$E(t_1) = E[t_1(x_1,x_2,...x_n)] = \theta\,E[t_2(x_1,x_2,...x_n)] = \theta\,E(t_2)$$

for all $\theta$ in the parameter space of the given distribution.

Consider the estimate $s = t_1/t_2$ of $\theta$. To find the bias in s per unit standard error of s, we have

$$\operatorname{Cov}(s,t_2) = E(st_2) - E(s)E(t_2) = E(t_1) - E(s)E(t_2) = [\theta - E(s)]\,E(t_2). \qquad (3.5.1)$$

Now

$$|\operatorname{Cov}(s,t_2)| \le \sigma(s)\,\sigma(t_2), \qquad (3.5.2)$$

where $\sigma$ denotes the standard error. Therefore, from (3.5.1), we have

$$\Big|\frac{E(s) - \theta}{\sigma(s)}\Big| \le \Big|\frac{\sigma(t_2)}{E(t_2)}\Big| = |\mathrm{c.v.}(t_2)|, \qquad (3.5.3)$$

where c.v.($t_2$) is the coefficient of variation of $t_2$. Thus, for the bias in s, we have

$$\Big|\frac{b(s)}{\sigma(s)}\Big| \le |\mathrm{c.v.}(t_2)|. \qquad (3.5.4)$$

In particular, when $t_2$ is a constant, we have an unbiased estimate for $\theta$. It may be noted that the ratio estimates and two-moments estimates for the parameter $\theta$ of gpsd's, which we discussed earlier, are estimates of the same type as the estimate s considered in this section. Hence, the result (3.5.4) also applies to them.

3.6 Estimation for a Truncated gpsd with a Finite Range of Consecutive Integers, Maximum Unknown

We have discussed methods of estimation in relation to gpsd's with known range. Sometimes, however, one has to estimate the parameter $\theta$ even when the range is not completely known, as well as when one is interested in estimating the maximum of a finite range. The problem

of estimation can be solved in such cases by using the "Ratio Method" and the "Two-Moments Method" simultaneously. This is taken up in this section.

3.6.1 Consider a sample of size N, with frequency $n_x$ for x ($0 \le x \le L$, $\sum n_x = N$), drawn from the truncated gpsd (3.4.1),

$$P_x^* = \operatorname{Prob}\{X^* = x\} = \frac{a_x\theta^x}{f^*(\theta)}, \quad x = 0,1,...d', \qquad (3.6.1)$$

for which

$$f^*(\theta) = \sum_{x=0}^{d'} a_x\theta^x, \quad a_x > 0, \qquad (3.6.2)$$

where d' is not known. To estimate $\theta$, we choose the ratio estimate

$$\theta' = \sum_{x=1}^{L}\Big(\frac{a_{x-1}}{a_x}\Big)n_x\Big/\sum_{x=0}^{L-1} n_x. \qquad (3.6.3)$$

The advantage of $\theta'$ is that, besides its simplicity, it does not require the knowledge of d'.

3.6.2 To estimate d', however, the identity

$$\theta = \frac{m_2^* - (d'+1)\mu^*}{H_{11} - d'H_{01}} \qquad (3.6.4)$$

gives

$$d' = \frac{m_2^* - \mu^* - \theta H_{11}}{\mu^* - \theta H_{01}}, \qquad (3.6.5)$$

where $\mu^*$ and $m_2^*$ are the first two moments about the origin of (3.6.1) and

$$H_{ij} = \sum_{x=0}^{d'} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j P_x^*. \qquad (3.6.6)$$

Therefore, the estimate of d' can be obtained as

$$\hat{d}' = \frac{S_2 - S_1 - \theta'h_{11}}{S_1 - \theta'h_{01}} \qquad (3.6.7)$$

where $\theta'$ is given by (3.6.3) and

$$S_i = \sum_{x=0}^{L} x^i n_x, \quad i = 1,2, \qquad (3.6.8)$$

and

$$h_{ij} = \sum_{x=0}^{L} x^i\Big[\frac{(x+1)a_{x+1}}{a_x}\Big]^j n_x. \qquad (3.6.9)$$
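The simultaneous use of (3.6.3) and (3.6.7) can be sketched as follows (our code). For the Poisson gpsd truncated at an unknown maximum, $a_{x-1}/a_x = x$ and $(x+1)a_{x+1}/a_x = 1$, so $h_{01} = N$ and $h_{11} = S_1$; with frequencies exactly proportional to the truncated probabilities, both $\theta$ and d' are recovered exactly:

```python
def estimate_theta_and_max(freq, g, w):
    """theta' of (3.6.3) and the d'-estimate of (3.6.7);
    g(x) = a_{x-1}/a_x and w(x) = (x+1) a_{x+1} / a_x."""
    L = max(freq)                              # largest observed value
    S1 = sum(x * nx for x, nx in freq.items())
    S2 = sum(x * x * nx for x, nx in freq.items())
    theta = (sum(g(x) * freq.get(x, 0) for x in range(1, L + 1))
             / sum(freq.get(x, 0) for x in range(0, L)))
    h01 = sum(w(x) * freq.get(x, 0) for x in range(0, L + 1))
    h11 = sum(x * w(x) * freq.get(x, 0) for x in range(0, L + 1))
    d_max = (S2 - S1 - theta * h11) / (S1 - theta * h01)
    return theta, d_max

# Poisson with theta = 2 truncated to {0,...,4}: frequencies 3 * 2^x / x!.
freq = {0: 3, 1: 6, 2: 6, 3: 4, 4: 2}
theta, d_max = estimate_theta_and_max(freq, g=lambda x: x, w=lambda x: 1)
```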

CHAPTER IV

4.0 ESTIMATION PROBLEMS FOR THE BINOMIAL DISTRIBUTION

4.1 Introduction

The gpsd defined by (1.1.4) becomes

$$\operatorname{Prob}\{X = x\} = \binom{n}{x}\theta^x\big/(1+\theta)^n, \quad x = 0,1,2,...n, \qquad (4.1.1)$$

when $f(\theta) = (1+\theta)^n$. Writing $\theta = \pi/(1-\pi)$, (4.1.1) gives the probability law for X as

$$\operatorname{Prob}\{X = x\} = b(x,\pi,n) = \binom{n}{x}\pi^x(1-\pi)^{n-x}, \quad x = 0,1,2,...n, \qquad (4.1.2)$$

the well-known form of the binomial distribution. The important properties of (4.1.2) can be summarily stated as follows:

$$M(t) = (1 - \pi + \pi e^t)^n. \qquad (4.1.3)$$

The first two central moments and the coefficients $\beta_1, \beta_2$ are of the form

$$\mu = n\pi, \qquad \mu_2 = n\pi(1-\pi),$$
$$\beta_1 = (1-2\pi)^2\big/n\pi(1-\pi), \qquad (4.1.4)$$
$$\beta_2 = 3 + \big[1 - 6\pi(1-\pi)\big]\big/n\pi(1-\pi).$$

The recurrence relations reduce to

$$\mu_{r+1} = \pi(1-\pi)\Big[\frac{d\mu_r}{d\pi} + nr\mu_{r-1}\Big], \qquad (4.1.5)$$

derived first by Romanovsky (1925), and

$$\kappa_{r+1} = \pi(1-\pi)\,\frac{d\kappa_r}{d\pi}, \qquad (4.1.6)$$

deduced by Frisch (1925) and rediscovered by Haldane (1940). The distribution function $B(r,\pi,n)$ defined by

$$B(r,\pi,n) = \sum_{x=0}^{r} b(x,\pi,n) \qquad (4.1.7)$$

can be reduced to

$$B(r,\pi,n) = I_{1-\pi}(n-r,\ r+1) \qquad (4.1.8)$$

where

$$I_x(m,n) = \frac{1}{B(m,n)}\int_0^x u^{m-1}(1-u)^{n-1}\,du,$$

for which extensive tables have been edited by K. Pearson. Romig has extensively tabulated $b(x,\pi,n)$ and $B(r,\pi,n)$ for the range of arguments $\pi$ = 0.01 (0.01) 0.50, n = 50(5)100, and Applied Mathematics Series 6, 1950, gives $b(x,\pi,n)$ for n = 2(1)49. The Ordnance Corps tables (1952) give values of $1 - B(r,\pi,n)$ for $\pi$ = 0.01 (0.01) 0.50, n = 2(1)150. For large n and small $\pi$, one can use tables of Poisson or normal probabilities, because

$$\lim b(x,\pi,n) = p(x,\mu), \quad n\to\infty,\ \pi\to 0,\ n\pi = \mu, \qquad (4.1.9)$$

where

$$p(x,\mu) = e^{-\mu}\mu^x/x!,$$

and

$$\lim_{n\to\infty} B(r,\pi,n) = \Phi(Z), \qquad (4.1.10)$$

where

$$Z = \frac{r + 1/2 - n\pi}{\sqrt{n\pi(1-\pi)}}$$

and

$$\Phi(Z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{Z} e^{-u^2/2}\,du.$$

On the basis of a single observation on X, X = x, the maximum likelihood estimate for $\pi$ is given by $\hat\pi = x/n$ (n known). One has $E(\hat\pi) = \pi$ with $\operatorname{Var}(\hat\pi) = \pi(1-\pi)/n$. On the basis of a random sample $x_i$ (i = 1,2,...N) of size N from (4.1.2), the maximum likelihood estimate for $\pi$ is given by

$$\hat\pi = \frac{\bar{x}}{n} \qquad (4.1.11)$$

where

$$\bar{x} = \sum_{i=1}^{N} x_i/N.$$

(4.1.11) provides an unbiased estimate for the parameter $\pi$ with $\operatorname{Var}(\hat\pi) = \pi(1-\pi)/nN$.

4.2 Estimation from a Sample for a Singly Truncated Binomial Distribution

Fisher (1936) and Haldane (1932, 1938) discussed uses of the truncated binomial distribution. For instance, in problems of human genetics, in estimating the proportion of albino children produced by couples capable of producing albinos, sampling has necessarily to be restricted to families having at least one albino child. Finney (1949) has cited some more applications. Fisher and Haldane derived the maximum likelihood procedure to estimate the parameter $\pi$. Moore

(1954) suggested a simple "ratio estimate" based on an identity between binomial probabilities. For a slightly different problem, Rider (1955) suggested an alternative estimation procedure which uses the first two moments. We present in this section some numerical tables to facilitate the heavy computation involved in evaluating the maximum likelihood estimate of $\pi$ from a sample from a singly truncated binomial distribution. The estimates given by Moore and Rider are derived from the general results discussed in Chapter III. The efficiency and the amount of bias of these estimates are investigated in certain special cases.

The probability law of the binomial distribution truncated at c on the left can be written as

$$b^*(x,\pi,n) = [B^*(c,\pi,n)]^{-1}\binom{n}{x}\pi^x(1-\pi)^{n-x}, \quad x = c,c+1,...n, \qquad (4.2.1)$$

where

$$B^*(r+1,\pi,n) = 1 - B(r,\pi,n). \qquad (4.2.2)$$

The first two moments about the origin of (4.2.1), then, are

$$\mu^* = \mu^*(c,\pi,n) = n\pi\,B^*(c-1,\pi,n-1)\big/B^*(c,\pi,n) \qquad (4.2.3)$$

and

$$m_2^* = m_2^*(c,\pi,n) = \mu^*(c,\pi,n)\left[1 + \mu^*(c-1,\pi,n-1)\right]. \qquad (4.2.4)$$

The case of truncation to the right can be dealt with in a similar way by replacing $\pi$ by $1-\pi$ and the truncation point c by n - c.

4.2.1 To estimate $\pi$ by likelihood on the basis of a random sample $x_i$ (i = 1,2,...N) of size N from (4.2.1), the results derived by the general approach in Section 2.2 can be written down with the proper substitutions for this particular case. The likelihood equation for $\pi$ is

$$\bar{x} = \mu^* \qquad (4.2.5)$$

where

$$\bar{x} = \sum_{i=1}^{N} x_i/N$$

and $\mu^*$ is defined by (4.2.3). Denoting this estimate by $\hat\pi$, its asymptotic variance is given by

$$\operatorname{Var}(\hat\pi) = \frac{\pi(1-\pi)}{N}\Big(\frac{d\mu^*}{d\pi}\Big)^{-1} \qquad (4.2.6)$$

$$= \frac{[\pi(1-\pi)]^2}{N\mu_2^*} \qquad (4.2.7)$$

where $\mu_2^*$ is the variance of (4.2.1). As the equation (4.2.5) does not readily give an algebraic solution, one may use an iterative process of solution. However, (4.2.5) suggests that, if tables be made available for the means $\mu^*$ for sufficiently close values of $\pi$, one can have a ready solution. The practical case of importance is c = 1 and sometimes c = 2. For the case c = 1,

$$\mu^* = n\pi/B^*(1,\pi,n) = \frac{n\pi}{1 - (1-\pi)^n},$$

so that the likelihood equation becomes

$$\bar{x} = \frac{n\pi}{1 - (1-\pi)^n} \qquad (4.2.8)$$

and the Expression (4.2.6) for the asymptotic variance reduces to

$$\operatorname{Var}(\hat\pi) = \frac{\pi(1-\pi)}{nN}\cdot\frac{\left[1-(1-\pi)^n\right]^2}{1 - (1-\pi)^n - n\pi(1-\pi)^{n-1}}, \qquad (4.2.9)$$

a result first derived by Fisher (1936). Here we present in Table I the values of $\mu^*/n$ for the binomial distribution truncated on the left at c = 1 for values of $\pi$ spaced at suitable intervals. For the case c = 2, we present Table II for values of $\pi$ at intervals of 0.01. Suitable charts based on these tables may also be of great help in facilitating the procedure of estimation. These tables can be used to compute $\operatorname{Var}(\hat\pi)$ by using either Formula (4.2.6) or Formula (4.2.7). In case (4.2.6) is used, $d\mu^*/d\pi$ can be approximated by the finite difference ratio $\Delta\mu^*/\Delta\pi$. This approximation is expected to be good since the tabular interval is small. In case Formula (4.2.7) is used, the relationship for use is

$$\mu_2^*(c,\pi,n) = \mu^*(c,\pi,n)\left[1 + \mu^*(c-1,\pi,n-1) - \mu^*(c,\pi,n)\right]. \qquad (4.2.10)$$

4.2.2 For a slightly different problem, where, in a sample from a complete binomial distribution, the frequencies in some lowest classes are missing, Rider (1955) suggested a method of estimation which uses the first two moments of the complete binomial and leads to a linear equation. The method of two moments is also applicable in the usual problem of estimation from a sample from a singly truncated binomial, and forms a particular case of the general method discussed in

Section 3.4. Proceeding on those lines, one gets in this case

$$\frac{\pi}{1-\pi} = \frac{m_2^* - c\mu^*}{H_{11} - (c-1)H_{01}} \qquad (4.2.11)$$

where $\mu^*$ and $m_2^*$ are defined by (4.2.3) and (4.2.4), respectively, and $H_{11}$ and $H_{01}$ reduce to

$$H_{11} = n\mu^* - m_2^*, \qquad H_{01} = n - \mu^*.$$

(4.2.11) then gives

$$\pi = \frac{m_2^* - c\mu^*}{(n-1)\mu^* - n(c-1)}, \qquad (4.2.12)$$

so that, on the basis of a random sample of size N with $n_x$ as the frequency of x drawn from (4.2.1), the estimate for $\pi$ can be written as

$$t = \frac{S_2 - cS_1}{(n-1)S_1 - n(c-1)N} \qquad (4.2.13)$$

where

$$S_1 = \sum x\,n_x \quad\text{and}\quad S_2 = \sum x^2 n_x.$$

It is obvious that (4.2.13) is quite simple and that a great deal of computational labour can be saved if (4.2.13) is used instead of (4.2.5). On the other hand, the estimate obtained from (4.2.13) is likely to be inefficient. It is important, therefore, to investigate the loss in efficiency due to the use of (4.2.13) instead of (4.2.5).
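Both computations for c = 1 can be sketched as follows (our code): the likelihood equation (4.2.8) solved by bisection, and the two-moments estimate (4.2.13). As a check, both are applied to "ideal" frequencies proportional to the truncated binomial probabilities, from which each recovers the true $\pi$:

```python
from math import comb

def ml_truncated_binomial(xbar, n, tol=1e-12):
    """Solve (4.2.8): xbar = n*pi / (1 - (1-pi)^n) for pi, by bisection
    (the truncated mean is increasing in pi)."""
    mu_star = lambda p: n * p / (1 - (1 - p) ** n)
    lo, hi = 1e-9, 1 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mu_star(mid) < xbar else (lo, mid)
    return (lo + hi) / 2

def tm_truncated_binomial(freq, n, c=1):
    """Two-moments estimate (4.2.13): (S2 - c S1) / ((n-1) S1 - n (c-1) N)."""
    N = sum(freq.values())
    S1 = sum(x * nx for x, nx in freq.items())
    S2 = sum(x * x * nx for x, nx in freq.items())
    return (S2 - c * S1) / ((n - 1) * S1 - n * (c - 1) * N)

n, pi = 5, 0.25
freq = {x: comb(n, x) * pi ** x * (1 - pi) ** (n - x) * 1e6 for x in range(1, n + 1)}
xbar = sum(x * nx for x, nx in freq.items()) / sum(freq.values())
pi_ml = ml_truncated_binomial(xbar, n)
pi_tm = tm_truncated_binomial(freq, n)
```

With real (sampled) frequencies the two values would of course differ, which is what the efficiency comparison below measures.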

To find the asymptotic variance of the two-moments estimate t of $\pi$, one gets, on some simplification,

$$\operatorname{Var}(t) = \frac{1}{NH^2}\big[\sigma_{11}^* - 2\pi\sigma_{12}^* + \pi^2\sigma_{22}^*\big] \qquad (4.2.14)$$

where

$$H = (n-1)\mu^* - n(c-1)$$
$$\sigma_{11}^* = (m_4^* - m_2^{*2}) + c^2(m_2^* - \mu^{*2}) - 2c(m_3^* - \mu^*m_2^*)$$
$$\sigma_{22}^* = (n-1)^2(m_2^* - \mu^{*2})$$
$$\sigma_{12}^* = (n-1)\big[(m_3^* - \mu^*m_2^*) - c(m_2^* - \mu^{*2})\big],$$

$m_r^*$ being the r-th theoretical moment of (4.2.1) about the origin. Thus,

$$\operatorname{Var}(t) = \frac{(m_4^* - m_2^{*2}) + [\pi(n-1)+c]^2(m_2^* - \mu^{*2}) - 2[\pi(n-1)+c](m_3^* - \mu^*m_2^*)}{N\left[(n-1)\mu^* - n(c-1)\right]^2}. \qquad (4.2.15)$$

The asymptotic efficiency of t is then given by

$$\operatorname{Eff}(t) = \operatorname{Var}(\hat\pi)\big/\operatorname{Var}(t). \qquad (4.2.16)$$

The special cases of some importance in genetics are c = 1 and $\pi$ = 1/4, 1/2 or 3/4. The efficiency of the Two-Moments Estimate (TM) relative to the Maximum Likelihood Estimate (ML) in these cases is tabulated on the following page.

TABLE 4.2.1

ASYMPTOTIC EFFICIENCY OF TM FOR c = 1

                 Efficiency
 n     π = 1/4     1/2     3/4
 3      .925      .875    .875
 4      .871      .818    .859
 5      .817      .795    .870
 6      .809      .789    .886
 7      .781      .794    .901
 8      .766      .803    .913
 9      .755      .814    .923
10      .749      .823    .931

Close investigation of the above table shows that the efficiency of TM in the cases $\pi$ = 1/2 and $\pi$ = 3/4 decreases in the beginning with n, reaches a minimum, and then increases with increasing values of n. For $\pi$ = 1/4, however, the efficiency decreases throughout. Let us compute, therefore, the efficiency of TM for higher values of n. The following gives the results obtained for n = 11(1)15.

Asymptotic Efficiency of TM for c = 1 and π = 1/4

 n     Efficiency
11       .746
12       .744
13       .745
14       .747
15       .750

Thus, in the case $\pi$ = 1/4 also, the efficiency reaches a minimum and then increases with increasing n. It is interesting to note that in all these cases the efficiency of TM has reached its minimum at n = 3/π.

4.2.3 Following the general approach discussed in Section 3.1, a simple estimate for $\pi$ can be obtained in the case of the singly truncated binomial distribution (4.2.1). In this case, $a_{x-1}/a_x = x/(n-x+1)$, and since $\theta = \pi/(1-\pi)$, we have the following "ratio estimate" for $\pi$:

$$\pi' = \frac{t_1}{t_1 + t_2} \qquad (4.2.17)$$

where

$$t_1 = \sum_{x=c+1}^{n}\Big(\frac{x}{n-x+1}\Big)n_x \quad\text{and}\quad t_2 = \sum_{x=c}^{n-1} n_x.$$

When c = 1, i.e., when only "zero" values are truncated, the estimate takes the form suggested by Moore (1954):

$$\pi' = \frac{t_1}{t_1 + t_2} \qquad (4.2.18)$$

where

$$t_1 = \sum_{x=2}^{n}\Big(\frac{x}{n-x+1}\Big)n_x \quad\text{and}\quad t_2 = \sum_{x=1}^{n-1} n_x.$$
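Moore's estimate (4.2.18) can be sketched as follows (our code). Applied to ideal frequencies proportional to the zero-truncated binomial probabilities, it reproduces $\pi$ exactly, since $t_1/t_2$ then equals $\theta = \pi/(1-\pi)$:

```python
from math import comb

def moore_ratio_estimate(freq, n):
    """pi' of (4.2.18): t1/(t1 + t2), with
    t1 = sum_{x>=2} [x/(n-x+1)] n_x and t2 = sum_{x=1}^{n-1} n_x."""
    t1 = sum(x / (n - x + 1) * nx for x, nx in freq.items() if x >= 2)
    t2 = sum(nx for x, nx in freq.items() if 1 <= x <= n - 1)
    return t1 / (t1 + t2)

n, pi = 5, 0.25
freq = {x: comb(n, x) * pi ** x * (1 - pi) ** (n - x) * 1e6 for x in range(1, n + 1)}
pi_r = moore_ratio_estimate(freq, n)
```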

To investigate the efficiency of $\pi'$ given by (4.2.17), its asymptotic variance can be written down as

$$\operatorname{Var}(\pi') = \frac{(1-\pi)^4}{NP^2}\big[D - \theta^2P + 2\theta^2P_{n-1}\big] \qquad (4.2.19)$$

where

$$P = \sum_{x=c}^{n-1} b^*(x,\pi,n),$$

$$D = \sum_{x=c+1}^{n}\Big(\frac{x}{n-x+1}\Big)^2 b^*(x,\pi,n)$$

and

$$P_{n-1} = b^*(n-1,\pi,n).$$

Also, the asymptotic variance of the maximum likelihood estimate $\hat\pi$ obtained from (4.2.5) is given by

$$\operatorname{Var}(\hat\pi) = \frac{[\pi(1-\pi)]^2}{N\mu_2^*}$$

where $\mu_2^*$ is the variance of (4.2.1). Therefore, the asymptotic efficiency of $\pi'$ takes the form

$$\operatorname{Eff}(\pi') = \frac{\theta^2P^2}{\mu_2^*}\big[D - \theta^2P + 2\theta^2P_{n-1}\big]^{-1}. \qquad (4.2.20)$$

In particular, when c = 1,

$$P = 1 - \frac{\pi^n}{1-(1-\pi)^n},$$

$$P_{n-1} = \frac{n\pi^{n-1}(1-\pi)}{1-(1-\pi)^n},$$

and

$$D = \sum_{x=2}^{n}\Big(\frac{x}{n-x+1}\Big)^2 b^*(x,\pi,n)$$

reduces to

$$D = \theta\big[(n+1)E - P\big]$$

where

$$E = \sum_{x=1}^{n-1}\Big(\frac{1}{n-x}\Big)\binom{n}{x}\pi^x(1-\pi)^{n-x}\Big/\big\{1-(1-\pi)^n\big\}$$

and is tabulated by Grab and Savage (1954). The special cases of some importance in genetics are c = 1 and $\pi$ = 1/4, 1/2 or 3/4. The efficiency of the Ratio Estimate (R) relative to the Maximum Likelihood Estimate (ML) in these cases is tabulated and shown on the following page.

TABLE 4.2.2

ASYMPTOTIC EFFICIENCY OF R FOR c = 1

 n     π = 1/4     1/2     3/4
 3      .924      .875    .875
 4      .909      .769    .772
 5      .919      .715    .664
 6      .933      .694    .565
 7      .947      .693    .523
 8      .952      .705    .481
 9      .956      .723    .435
10      .959      .776    .388

Close investigation of the above table shows that the efficiency of R in the cases $\pi$ = 1/4 and $\pi$ = 1/2 decreases in the beginning with n, reaches a minimum, and then increases with increasing values of n. For $\pi$ = 3/4, however, the efficiency decreases throughout for n = 3(1)10.

4.2.4 For c = 1, we have separately discussed the Two-Moments Estimate and the Ratio Estimate for $\pi$. To make a comparative study of these two equally simple estimates, let us investigate their amounts of bias and their relative efficiency. Following Sections 3.4 and 3.1, one gets, to order 1/N, the amounts of bias of t (TM) and $\pi'$ (R). Writing $b(t) = B(t)/N$, the quantity B(t) follows from the general formula (3.4.10), while for the ratio estimate one gets

$$b(\pi') = \frac{(1-\pi)^2}{NP^2}\Big[\pi\theta P(1-P) + \theta(1-2\pi)P_{n-1} - (1-\pi)(D - \theta^2P^2)\Big] = \frac{B(\pi')}{N}.$$

The table on the following page gives B(t), B(π') and also the relative efficiency of t over π' for c = 1 and π = 1/4, 1/2 and 3/4. The relative efficiency is given by Rel. Eff. = Var(π')/Var(t).

Let us also study the amount of bias relative to the standard error of the two estimates for some sample size, say 100. The following table gives the bias as a percentage of standard error (100 |b|/S.E.) for both TM and R for c = 1 and π = 1/4, 1/2 and 3/4.

TABLE 4.2.4
BIAS AS A PERCENTAGE OF STANDARD ERROR FOR c = 1 AND N = 100

         π = 1/4         π = 1/2         π = 3/4
   n    TM      R       TM      R       TM      R
   3   6.34   5.07     7.11   3.82     6.15   2.86
   4   6.94   4.05     6.21   4.05     5.40   3.31
   5   7.03   3.39     6.48   4.17     4.80   3.44
   6   7.24   2.94     5.84   4.15     4.35   4.08
   7   7.12   2.61     5.66   4.05     3.99   4.11
   8   6.97   2.37     5.31   3.90     3.71   4.20
   9   6.46   2.18     4.99   3.73     3.49   4.18
  10   6.39   2.02     4.28   3.62     3.29   4.08

Table 4.2.3 shows that both TM and R are underestimates of π. A closer investigation, however, brings out that the bias to order 1/N is in general considerably smaller for R. Also, Table 4.2.3 shows that whereas for π = 1/2 and π = 3/4, R is less efficient than TM, it is more efficient when π = 1/4. Thus, a closer study of the relative efficiency of the two estimates is necessary. However, Table 4.2.3 suggests for n = 3(1)10 that the Ratio-Estimate may be used to estimate

TABLE 4.2.3
COMPARISON BETWEEN TM AND R WHEN c = 1

        N(Amount of Bias to Order 1/N)    Var(R)/Var(TM)
   n        TM          R

Case (1) π = 1/4
   3      -.2412     -.1927                 1.000
   4      -.2152     -.1227                  .958
   5      -.1896     -.0861                  .889
   6      -.1715     -.0648                  .867
   7      -.1535     -.0510                  .824
   8      -.1379     -.0420                  .804
   9      -.1187     -.0355                  .790
  10      -.1097     -.0301                  .781

Case (2) π = 1/2
   3      -.2717     -.1458                 1.000
   4      -.1940     -.1307                 1.065
   5      -.1748     -.1184                 1.111
   6      -.1398     -.1062                 1.138
   7      -.1230     -.0943                 1.146
   8      -.1063     -.0833                 1.139
   9      -.0934     -.0736                 1.126
  10      -.0748     -.0651                 1.053

Case (3) π = 3/4
   3      -.1763     -.0820                 1.000
   4      -.1290     -.0833                 1.112
   5      -.1004     -.0893                 1.543
   6      -.0818     -.0962                 1.569
   7      -.0689     -.0931                 1.724
   8      -.0595     -.0928                 1.900
   9      -.0524     -.0915                 2.124
  10      -.0468     -.0896                 2.398

the parameter π of a binomial distribution truncated at c = 1, especially when π is near 1/4 or less, whereas the Two-Moments Estimate may be preferred when π is near 1/2 or more.

4.2.5 The detailed computational procedure for evaluating the three types of estimates discussed above will be illustrated with reference to K. Pearson's data on albinism in man. The table below gives the number of families (n_x), each of five children, having exactly x albino children in the family (x = 1, 2, 3, 4, 5).

    Number of albinos in family (x):   1    2    3    4    5
    Number of families (n_x):         25   23   10    1    1

If π is the probability for a child to be an albino, we may accept the truncated binomial model:

    C(n,x) π^x (1-π)^{n-x} / [1 - (1-π)^n],    x = 1, 2, ..., n,

for the probability of x albinos in a family of n. Here n = 5, and the problem is to estimate π on the basis of the data given in the table above.

Maximum Likelihood Estimate: From the table, we get

    N = 60,
    S1 = Σ x n_x = 110,
    x̄ = S1/N = 1.83333,

so that x̄/n = 0.366667. Referring to Table I for n = 5, we find the following:

    π       μ*/n
    0.30    0.360607
    0.31    0.367474

The maximum likelihood estimate is given by that value of π for which μ*/n = 0.366667. By linear interpolation, we thus get

    π̂ = 0.30 + [(0.366667 - 0.360607)/(0.367474 - 0.360607)] (0.31 - 0.30) = 0.3088.

The variance of this estimate is estimated from the formula

    Var(π̂) = π̂(1-π̂) / [N (dμ*/dπ)].

dμ*/dπ can be obtained approximately from the tables by taking differences instead of derivatives. Thus,

    dμ*/dπ ≈ 5 × (0.367474 - 0.360607)/(0.31 - 0.30) = 3.4335.

Hence,

    Var(π̂) ≈ (0.3088 × 0.6912)/(60 × 3.4335) = 0.0010361.

Thus the standard error of π̂ is given approximately by:

    S.E.(π̂) = √(0.0010361) = 0.03219.

This is as far as linear interpolation in the tables will go. If we want to carry the approximate solution of the likelihood equation

further, we start with π0 = 0.3088 as the first approximation and compute the theoretical mean

    μ0* = nπ0 / [1 - (1-π0)^n] = 5 × 0.3088 / [1 - (0.6912)^5] = 1.83330

and

    (dμ*/dπ)0 = μ0* (1 + 4 × 0.3088 - 1.83330) / (0.3088 × 0.6912) = 3.4520.

It is interesting to compare the exact value of dμ*/dπ at π = 0.3088, viz. 3.4520, with the approximation obtained by differencing in the tables, viz. 3.4335. The next approximation is then given by:

    π1 = π0 + (x̄ - μ0*) / (dμ*/dπ)0
       = 0.3088 + (1.83333 - 1.83330)/3.4520 = 0.308809,

which does not affect the fourth place. The variance of the estimate is

    Var(π̂) = π̂(1-π̂) / [N (dμ*/dπ)] = (0.3088 × 0.6912)/(60 × 3.4520) = 0.0010305,

so that the standard error is

    S.E.(π̂) = √(0.0010305) = 0.03210.

The agreement of this with the previous estimate obtained directly from the tables is remarkable, and these latter computations

are really unnecessary, especially in view of the somewhat large standard error. It is believed that linear interpolation in the tables would generally be adequate for all practical purposes and the second cycle of approximation would not be necessary.

Two-Moments Estimate: To compute this estimate for π, we require in addition the value of S2 = Σ x² n_x = 248. Then the estimate is

    t = [1/(n-1)] (S2/S1 - 1) = (1/4)(248/110 - 1) = 0.3136.

To compute the variance of t, we require:

    μ* = 1.95280,
    m2* = μ*[(n-1)t + 1] = 4.40239,
    m3* = μ*[(n-1)t + 1 + (n-1)t{(n-2)t + 2}] = 11.60615,
and
    m4* = μ*[(n-1)t + 1 + 3(n-1)t{(n-2)t + 2} + (n-1)t(n-2)t{(n-3)t + 3}],

all evaluated by taking 0.3136 as the estimate for π. The variance of t is estimated from the formula

    Var(t) ≈ [1/(N(n-1)²μ*²)] [(m4* - m2*²) + {(n-1)t+1}²(m2* - μ*²)
             - 2{(n-1)t+1}(m3* - μ*m2*)] = 0.0012066,

so that the standard error is S.E.(t) = 0.03474.
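The ML computation carried out above is simply Newton-Raphson on the likelihood equation x̄ = μ*(π), with the derivative taken from the relation used there; a minimal sketch (our function name), run on the albinism data:

```python
def mle_truncated_binomial(xbar, n, pi0=0.3, tol=1e-10):
    """Solve xbar = n*pi/(1 - (1-pi)^n) for pi by Newton-Raphson, using
    d(mu*)/d(pi) = mu*[1 + (n-1)pi - mu*] / [pi(1-pi)]."""
    pi = pi0
    for _ in range(100):
        mu = n * pi / (1.0 - (1.0 - pi) ** n)
        dmu = mu * (1.0 + (n - 1) * pi - mu) / (pi * (1.0 - pi))
        step = (xbar - mu) / dmu
        pi += step
        if abs(step) < tol:
            break
    return pi

pihat = mle_truncated_binomial(110 / 60, n=5)  # Pearson data: xbar = 1.83333
print(round(pihat, 4))  # 0.3088
```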

Ratio Estimate: The ratio estimate for π is given by

    π' = t1/(t1 + t2),

where

    t1 = Σ_{x=2}^{n} [x/(n-x+1)] n_x    and    t2 = Σ_{x=1}^{n-1} n_x.

Here t2 = 59, and t1 can be computed from the following table:

    x    x/(n-x+1)    n_x    [x/(n-x+1)] n_x
    2       .5         23        11.50
    3      1.0         10        10.00
    4      2.0          1         2.00
    5      5.0          1         5.00

Thus, t1 = 28.50. The ratio estimate is obtained as

    π' = 28.50/(28.50 + 59) = 0.3257.

To compute the variance of π', we require

    P = 1 - π^n/[1 - (1-π)^n] = 0.99574,
    P_{n-1} = n π^{n-1}(1-π)/[1 - (1-π)^n] = 0.04408,
    1 - (1-π)^n = 0.86052,

and

    D = {π/[(1-π)(1-(1-π)^n)]} [(n+1)(1-π^n) E(1,n,1-π) - (1-π^n) - (1-π)^n/n] = 0.56156.

The quantity E(1,n,1-π) was obtained from the table by Grab and Savage, which gives for n = 5:

    1-π     E(1,n,1-π)
    .65      0.35465
    .70      0.32183

By interpolation, E(1,n,1-π) = 0.33870, taking π' = 0.3257 as the estimate for π throughout. Then the variance of π' is estimated from the formula

    Var(π') = [π²(1-π)²/(N P²)] [((1-π)²/π²) D - P + 2P_{n-1}] = 0.0013410,

so that the standard error of π' is

    S.E.(π') = 0.03662.

The following table summarizes the results obtained:

    Estimate    Value     Variance      Standard Error
    ML         0.3088    0.0010305        0.03210
    TM         0.3136    0.0012066        0.03474
    R          0.3257    0.0013410        0.03662
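The quantity D can be computed either from its defining sum or through the Grab-Savage function E(1,n,p); the closed form coded below is our own re-derivation of the reduction, offered as a sketch, and the two routes agree to machine precision. (The value differs slightly from the 0.56156 used above, which rests on linear interpolation of E in the table.)

```python
from math import comb

def D_direct(n, p, c=1):
    """D of (4.2.19) for c = 1, summed directly over the truncated law."""
    q = 1.0 - p
    norm = 1.0 - q ** n
    return sum((x / (n - x + 1)) ** 2 * comb(n, x) * p ** x * q ** (n - x)
               for x in range(c + 1, n + 1)) / norm

def grab_savage_E(n, p):
    """E(1, n, p) = E[1/X] for a positive (zero-truncated) binomial."""
    q = 1.0 - p
    return sum(comb(n, x) * p ** x * q ** (n - x) / x
               for x in range(1, n + 1)) / (1.0 - q ** n)

def D_reduced(n, p):
    """Our closed form for D in terms of E(1, n, 1-p)."""
    q = 1.0 - p
    a = 1.0 - p ** n
    return p / (q * (1.0 - q ** n)) * (
        (n + 1) * a * grab_savage_E(n, q) - a - q ** n / n)

print(round(D_direct(5, 0.3257), 5), round(D_reduced(5, 0.3257), 5))
```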

4.3 Homogeneity and Combined Estimation for Singly Truncated Binomial Distributions

While sampling for studies in albinism, observations are available simultaneously from families of varying family-size. In such situations one may be required to examine whether the distributions (here, the families of different sizes) are homogeneous in respect of the parameter π and, if so, to make a combined estimate of π.

4.3.1 Consider a random sample of size N = Σ_{j=1}^{k} N_j from k singly truncated binomial distributions characterized by the probability law:

    [B*(c, π_j, n_j)]^{-1} C(n_j, x) π_j^x (1-π_j)^{n_j - x},        (4.3.1)
    x = c, c+1, ..., n_j;   j = 1, 2, ..., k.

Following Section 2.5, the j-th "efficient score" is

    φ_j = N_j [x̄_j - μ_j(π_j)] / [π_j(1-π_j)],                       (4.3.2)

where μ_j(π_j) is the mean of the j-th distribution. The elements of the information matrix are

    I_jj = [N_j/(π_j(1-π_j))] dμ_j/dπ_j = N_j μ_2j(π_j)/[π_j(1-π_j)]²,   (4.3.3)
    I_jj' = 0,   j ≠ j',

where μ_2j(π_j) is the variance of the j-th distribution. The hypothesis of homogeneity is H0: π1 = π2 = ... = πk.

If the hypothesis H0 is true, the common value may be denoted by π, and the efficient score and the information with respect to π are given by:

    φ = [N/(π(1-π))] [x̄ - (1/N) Σ_{j=1}^{k} N_j μ_j(π)],             (4.3.4)

where

    x̄ = Σ_{j=1}^{k} N_j x̄_j / N,                                     (4.3.5)

and

    I = [1/(π(1-π))] Σ N_j dμ_j/dπ                                   (4.3.6)
      = [1/{π(1-π)}²] Σ N_j μ_2j(π).                                 (4.3.7)

To solve the equation φ = 0 for π, we may start with an approximation π0 and derive a better approximation π1 from the formula

    π1 = π0 + [N x̄ - Σ N_j μ_j(π0)] / Σ N_j (dμ_j/dπ)0,              (4.3.8)

or

    π1 = π0 + π0(1-π0) [N x̄ - Σ N_j μ_j(π0)] / Σ N_j μ_2j(π0),       (4.3.9)

and repeat this process of iteration till sufficient accuracy is attained. This maximum likelihood estimate will be denoted by π̂. A test of the homogeneity hypothesis H0 is then given by the statistic

    χ²_{k-1} = Σ_{j=1}^{k} N_j [x̄_j - μ_j(π̂)]² / μ_2j(π̂),           (4.3.10)

which is asymptotically distributed as a Chi-square with (k-1) d.f. if H0 is true.

4.3.2 We shall illustrate the computational technique and the use of the tables with reference to the problem of estimating the proportion of albino children from K. Pearson's (1913) data quoted below:

    No. of children   No. of      Total number of albino
    in family         families    children in the families
       n_j              N_j              T_j
        2                40               49
        3                55               76
        4                50               85
        5                60              110
        6                53              116
        7                46              103
        8                27               77
        9                29               73
       10                20               52
       11                14               50
       12                 8               28
       13                 4               19
       14                 4               16
       15                 1               10
    Total             N = 411          T = 864

To get the first approximation, we compute the values of x̄_j/n_j = T_j/(N_j n_j) and, referring to the tables or to the charts, obtain for each household-size an estimate of π_j. This is done in columns (3) and (4) of Table 4.3.1. We find the estimates clustering around π0 = 0.30, which value we take as the starting point of our computations. The next step is to read off from the tables the mean values μ_j(π0) and the difference-ratios δ_j = Δμ_j/Δπ for the different values of n_j. These are shown in columns (5) and (6) of Table 4.3.1.

TABLE 4.3.1
COMPUTATIONAL PROCEDURE FOR HOMOGENEITY AND COMBINED ESTIMATION

                              (3)      (4)      (5)       (6)
 n_j  N_j  T_j    x̄_j      x̄_j/n_j   π̂_j*   μ_j(π0)  δ_j(π0)**  μ_j(π̂)   δ_j(π̂)   N_j[x̄_j-μ_j(π̂)]²/δ_j(π̂)
  2    40   49   1.22500    .613      .36    1.17647    0.699    1.18217   0.69878      0.1050
  3    55   76   1.38182    .461      .31    1.36986    1.512    1.38224   1.51804      0.0000
  4    50   85   1.70000    .425      .35    1.57916    2.430    1.59905   2.44156      0.2087
  5    60  110   1.83333    .367      .31    1.80304    3.433    1.83115   3.44952      0.0001
  6    53  116   2.18868    .365      .31    2.04001    4.500    2.07686   4.52109      0.1466
  7    46  103   2.23913    .320      .29    2.28847    5.612    2.33443   5.63611      0.0741
  8    27   77   2.85185    .356      .34    2.54682    6.750    2.60211   6.77694      0.2485
  9    29   73   2.51724    .280      .26    2.81354    7.901    2.87826   7.92879      0.4767
 10    20   52   2.60000    .260      .25    3.08721    9.052    3.16137   9.08069      0.6941
 11    14   50   3.57143    .325      .32    3.36657   10.197    3.45012  10.22481      0.0201
 12     8   28   3.50000    .291      .29    3.65053   11.331    3.74338  11.35621      0.0417
 13     4   19   4.75000    .365      .36    3.93816   12.448    4.04018  12.47266      0.1616
 14     4   16   4.00000    .286      .29    4.22867   13.552    4.33975  13.57311      0.0340
 15     1   10  10.00000    .667      .67    4.52146   14.641    4.64146  14.65803      1.9589

Total 411  864                                                                      Σ = 4.1701
Weighted
average          2.10219                     2.06462   4.57352   2.10207        χ² = 4.1701/(0.3082 × 0.6918) = 19.558

    π1 = 0.30 + (2.10219 - 2.06462)/4.57352 = 0.3082

  * From Table I.
 ** Computed directly from the formulae μ_j(π) = n_jπ/[1-(1-π)^{n_j}] and
    δ_j = dμ_j/dπ = μ_j[1+(n_j-1)π-μ_j]/[π(1-π)].

The details of the computation process are the same as in the illustrative example in Section 4.2. The next step is to compute the weighted averages

    x̄ = Σ N_j x̄_j / N = 2.10219,
    μ̄ = Σ N_j μ_j(π0) / N = 2.06462,
and
    δ̄ = Σ N_j δ_j / N = 4.57352.

The next approximation to the maximum likelihood estimate is thus:

    π1 = π0 + (x̄ - μ̄)/δ̄ = 0.3082,

the same as obtained by Haldane (1938), who wrote down the likelihood equation in the form

    Σ_j N_j n_j π / [1 - (1-π)^{n_j}] = N x̄

and solved it directly by iteration. One single computation in our case is thus sufficient. The variance of the estimate is approximately given by:

    Var(π̂) = π̂(1-π̂)/(N δ̄) = (0.3082 × 0.6918)/(411 × 4.57352) = 0.000113,

so that S.E.(π̂) = 0.0106. The value for S.E. obtained by Haldane is 0.0107, which is the same for all practical purposes. To test whether the proportion of albino children is the same in families of different sizes, we have to compute

    χ² = [1/(π̂(1-π̂))] Σ N_j [x̄_j - μ_j(π̂)]² / δ_j.
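The scoring step (4.3.8) and the statistic (4.3.10) take only a few lines of code; in the sketch below (our function names) the c = 1 formulas for μ_j and δ_j are used directly in place of the tables:

```python
def combined_scoring_step(data, pi0):
    """One Fisher-scoring step (4.3.8) for the common pi of k singly
    truncated (c = 1) binomials; data is a list of (n_j, N_j, T_j)."""
    N = sum(Nj for _, Nj, _ in data)
    xbar = sum(Tj for _, _, Tj in data) / N
    mu = {nj: nj * pi0 / (1 - (1 - pi0) ** nj) for nj, _, _ in data}
    delta = {nj: m * (1 + (nj - 1) * pi0 - m) / (pi0 * (1 - pi0))
             for nj, m in mu.items()}
    mubar = sum(Nj * mu[nj] for nj, Nj, _ in data) / N
    dbar = sum(Nj * delta[nj] for nj, Nj, _ in data) / N
    return pi0 + (xbar - mubar) / dbar

def homogeneity_chi2(data, pihat):
    """Chi-square statistic (4.3.10) with k-1 degrees of freedom."""
    chi2 = 0.0
    for nj, Nj, Tj in data:
        m = nj * pihat / (1 - (1 - pihat) ** nj)
        mu2j = m * (1 + (nj - 1) * pihat - m)
        chi2 += Nj * (Tj / Nj - m) ** 2 / mu2j
    return chi2

pearson = [(2, 40, 49), (3, 55, 76), (4, 50, 85), (5, 60, 110),
           (6, 53, 116), (7, 46, 103), (8, 27, 77), (9, 29, 73),
           (10, 20, 52), (11, 14, 50), (12, 8, 28), (13, 4, 19),
           (14, 4, 16), (15, 1, 10)]
pi1 = combined_scoring_step(pearson, 0.30)
print(round(pi1, 4))                    # 0.3083 (0.3082 in the text, via tables)
print(homogeneity_chi2(pearson, pi1))   # about 19.56, cf. 19.558 above
```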

But since the maximum likelihood estimate π̂ = 0.3082 does not differ very much from our starting approximation π0 = 0.30, we may use the difference-ratios δ_j which are already computed. Thus

    χ² = [1/(0.3082 × 0.6918)] Σ N_j (x̄_j - μ_j)²/δ_j = 4.1701/(0.3082 × 0.6918) = 19.558,

which with 14 - 1 = 13 degrees of freedom is not significant. The families of different sizes can thus be regarded as homogeneous in respect of the proportion of albino children, and the common proportion is 0.3082 ± 0.0106.

4.4 Estimation from a Sample for a Doubly Truncated Binomial Distribution

In studying albinism, sampling may be restricted to only those families which contain at least one albino child and also at least one non-albino child, Finney (1949), giving rise to samples from a doubly truncated binomial distribution. We discuss the case of general truncation here and present some numerical tables to facilitate the heavy computation involved in evaluating the maximum likelihood estimate of π from a sample from a binomial of which only the extremes are truncated. The simple "ratio-estimate" is also derived and its efficiency is investigated for this special case of practical importance.

4.4.1 The probability law of a doubly truncated binomial

with truncation points, say, at c and d can be written as:

    b*(x,π,n) = [B*(c,d,π,n)]^{-1} C(n,x) π^x (1-π)^{n-x},           (4.4.1)
    x = c, c+1, ..., d,

where

    B*(c,d,π,n) = B(d,π,n) - B(c-1,π,n).                             (4.4.2)

The first two moments about the origin of (4.4.1) are

    μ* = μ*(c,d,π,n) = nπ B*(c-1,d-1,π,n-1)/B*(c,d,π,n)              (4.4.3)

and

    m2* = m2*(c,d,π,n) = μ*(c,d,π,n)[1 + μ*(c-1,d-1,π,n-1)].         (4.4.4)

To estimate π on the basis of a random sample of size N with frequency n_x for x drawn from (4.4.1), the likelihood equation for π can be written down as:

    x̄ = μ*,                                                          (4.4.5)

and the asymptotic variance of the estimate π̂ obtained from (4.4.5) is

    Var(π̂) = π(1-π)/(N dμ*/dπ)                                       (4.4.6)
           = π²(1-π)²/(N μ2*).                                       (4.4.7)

To facilitate the solution of (4.4.5) in the special case when c = 1 and d = n-1, i.e., extreme observations truncated, we present tables at suitable intervals of π for μ*/n, which reduces in this case to:

    μ*/n = μ*(1,n-1,π,n)/n = π(1 - π^{n-1}) / [1 - π^n - (1-π)^n].   (4.4.8)
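The reduction (4.4.8) can be confirmed directly, since Σ_{x=1}^{n-1} x C(n,x)π^x(1-π)^{n-x} = nπ(1-π^{n-1}); in code (our naming):

```python
from math import comb

def mu_star_doubly(n, p):
    """Mean of a binomial(n, p) truncated to x = 1, ..., n-1, via (4.4.8)."""
    q = 1.0 - p
    return n * p * (1.0 - p ** (n - 1)) / (1.0 - p ** n - q ** n)

def mu_star_direct(n, p):
    """The same mean summed directly over the truncated support."""
    q = 1.0 - p
    norm = 1.0 - p ** n - q ** n
    return sum(x * comb(n, x) * p ** x * q ** (n - x)
               for x in range(1, n)) / norm

print(abs(mu_star_doubly(5, 0.3) - mu_star_direct(5, 0.3)) < 1e-12)  # True
```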

An approximate value of Var(π̂) may be obtained by getting dμ*/dπ from the tables of μ*; or else, to get the exact value, one can use

    μ2*(1,n-1,π,n) = μ*(1,n-1,π,n)[1 + μ*(0,n-2,π,n-1) - μ*(1,n-1,π,n)],   (4.4.9)

where values of μ*(0,n-2,π,n-1) can be obtained from Table I for single truncation.

4.4.2 Unlike the cases of single truncation, it may be of interest to note that the two-moments estimate is not available here. This follows from Section 3.4, Case 4.

4.4.3 Following the general approach discussed in Section 3.1, the simple "ratio-estimate" for π can be written down for c = 1 and d = n-1 as:

    π' = t1/(t1 + t2),                                               (4.4.10)

where

    t1 = Σ_{x=2}^{n-1} [x/(n-x+1)] n_x    and    t2 = Σ_{x=1}^{n-2} n_x.

The asymptotic variance of π' then takes the form:

    Var(π') = [π²(1-π)²/(N P²)] [((1-π)²/π²) D - P + 2P_{n-2}],

where

    P_x = C(n,x) π^x (1-π)^{n-x} / [1 - π^n - (1-π)^n],
    P = Σ_{x=1}^{n-2} P_x,

and

    D = Σ_{x=2}^{n-1} [x/(n-x+1)]² P_x,

which reduces, in this case, to

    D = {π/[(1-π)(1-π^n-(1-π)^n)]} [(n+1)(1-π^n) E(1,n,1-π) - (1-π^n) - (1-π)^n/n]
        - n²π^n/[1 - π^n - (1-π)^n],

where

    E(1,n,p) = Σ_{x=1}^{n} (1/x) C(n,x) p^x (1-p)^{n-x} / [1 - (1-p)^n].

4.5 Simultaneous Estimation of Both Parameters of a Binomial Distribution

The binomial distribution has essentially two parameters, π and n, of which n is usually known and only π has to be estimated. However, certain cases may arise in which n is unknown and both n and π have to be estimated. For instance, while experimenting with a radioactive substance, in addition to the mean number (μ = nπ) of disintegrating atoms, it may be of interest to know the number (n) of atoms capable of disintegration for the substance in fixed intervals of time for some specified solid angle, and to fit a model correspondingly.

4.5.1 To estimate π and n on the basis of a random sample of size N with observed frequency n_x for x (Σ n_x = N) drawn from (4.2.1) with n unknown, following Section 2.6, the moment-estimates are given by

    x̄ = nπ                                                           (4.5.1)
and
    s² = nπ(1-π),                                                    (4.5.2)

where x̄ = Σ x n_x/N and s² = Σ n_x(x - x̄)²/N. The likelihood equations reduce to

    x̄ = nπ                                                           (4.5.3)
and
    Σ_{r≥0} T_{r+1}/(n-r) + N log(1-π) = 0,                          (4.5.4)

where T_{r+1} = Σ_{x>r} n_x. Eliminating π from (4.5.3) and (4.5.4), we have to solve for n the equation:

    Σ_{r≥0} T_{r+1}/(n-r) + N log(1 - x̄/n) = 0.                      (4.5.5)

The elements of the information matrix are:

    I11 = nN/[π(1-π)],
    I12 = N/(1-π),                                                   (4.5.6)
    I22 = E[Σ T_{r+1}/(n-r)²] = N Σ [1 - B(r,π,n)]/(n-r)².

We note that n is a discrete parameter and also that the range of (4.2.1) depends on n. The properties of the estimates are therefore not known; in the two examples worked out below, however, fairly accurate results are obtained.

4.5.2 To estimate π and n on the basis of a random sample of size N with observed frequency n_x for x (Σ n_x = N) drawn from a truncated binomial, say given by (4.4.1), the moment-equations are:

    x̄ = μ*                                                           (4.5.7)
and
    S2 = m2*,                                                        (4.5.8)

where x̄ = Σ x n_x/N, S2 = Σ x² n_x/N, and μ* and m2* are defined by (4.4.3) and (4.4.4) respectively. The "efficient scores" for π and n reduce to:

    φ1 = N(x̄ - μ*)/[π(1-π)]                                          (4.5.9)
and
    φ2 = Σ_{r≥0} T_{r+1}/(n-r) + N log(1-π) - N (∂B*/∂n)/B*,         (4.5.10)

where the B*'s are defined by (4.4.2). The likelihood equations then become

    x̄ = μ*                                                           (4.5.11)
and
    φ2 = 0.                                                          (4.5.12)

The elements of the "information matrix" are

    I11 = N μ2*/[π(1-π)]²,
    I12 = [N/(π(1-π))] ∂μ*/∂n,
and
    I22 = N[∂²B*/∂n² / B* - (∂B*/∂n / B*)²] + E[Σ T_{r+1}/(n-r)²].

(4.5.11) and (4.5.12) may be solved for estimation, approximating ∂B(r,π,n)/∂n by differences ΔB obtained from binomial tables, where B is defined by (4.1.7).
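For the complete-binomial case, the moment equations (4.5.1)-(4.5.2) invert in closed form, which is how the examples of Section 4.5.4 obtain their starting values; a sketch:

```python
def binomial_moment_estimates(xbar, s2):
    """Invert xbar = n*pi and s2 = n*pi*(1-pi)  (4.5.1)-(4.5.2):
    pi = (xbar - s2)/xbar and n = xbar/pi = xbar**2/(xbar - s2)."""
    pi = (xbar - s2) / xbar
    n = xbar / pi
    return n, pi

# exact moments of a binomial(12, 1/2): xbar = 6, s2 = 3
n_hat, pi_hat = binomial_moment_estimates(6.0, 3.0)
print(n_hat, pi_hat)  # 12.0 0.5
```

In practice n_hat is rounded to the nearest integer, as in the worked examples below.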

However, exact values of ∂B(r,π,n)/∂n and ∂²B(r,π,n)/∂n², which we shall call the "Incomplete Dibeta and Tribeta Functions" respectively, can be obtained as follows:

4.5.3 Since B(r,π,n) = I_{1-π}(n-r, r+1), differentiation with respect to n gives, with z = 1-π,

    ∂B(r,π,n)/∂n = I_z(n-r, r+1) [E_z(n-r, r+1) - E_1(n-r, r+1)],    (4.5.14)

where the I's are incomplete beta function ratios, and

    E_z(n,m) = ∫_0^z log u · u^{n-1}(1-u)^{m-1} du / ∫_0^z u^{n-1}(1-u)^{m-1} du,

which means the expected value of log u when u follows a beta distribution truncated on the right at z, with parameters n and m. E_z(n,m) can be reduced to

    E_z(n,m) = log z - [1/I_z(n,m)] Σ_{r=0}^{m-1} I_z(n+r, m-r)/(n+r),   m an integer.   (4.5.15)

In particular,

    E_1(n,m) = -Σ_{r=0}^{m-1} 1/(n+r),                               (4.5.16)

which suggests that the values of the "Incomplete Dibeta Function" can be obtained exactly by using tables of the "Incomplete Beta Function," which is extensively tabulated. To obtain the "Incomplete Tribeta Function," we differentiate (4.5.14) once more with respect to n:

    ∂²B(r,π,n)/∂n² = ∂/∂n { I_z(n-r, r+1) [E_z(n-r, r+1) - E_1(n-r, r+1)] }.   (4.5.18)

We get, after some simplification of the R.H.S. of (4.5.18), with z = 1-π,

    ∂²B(r,π,n)/∂n² = I_z(n-r, r+1) {[E_z(n-r, r+1) - E_1(n-r, r+1)]²
                     + V_z(n-r, r+1) - V_1(n-r, r+1)},               (4.5.19)

where V_z(n,m) is the variance of log u when u follows a beta distribution with parameters n and m truncated on the right at z. V_z(n,m) can be obtained from

    V_z(n,m) = E_z²(n,m) - [E_z(n,m)]²,                              (4.5.20)

where E_z²(n,m), the expected value of (log u)², can be reduced to

    E_z²(n,m) = (log z)² - [2/I_z(n,m)] Σ_{r=0}^{m-1} [I_z(n+r, m-r)/(n+r)] E_z(n+r, m-r).   (4.5.21)

In particular,

    E_1²(n,m) = 2 Σ_{r=0}^{m-1} [1/(n+r)] Σ_{s=r}^{m-1} 1/(n+s).     (4.5.22)

4.5.4 The computation procedure for simultaneous estimation will be illustrated with reference to two examples: one on radioactive disintegrations and the second on the throwing of dice.

Example 4.5.1 The first two columns of the following table give data collected by Rutherford and Geiger, showing the number (n_x) of time intervals, each of 7.5 seconds, during which exactly x α-particles were emitted from a certain radioactive substance.

Data: Rutherford and Geiger: Radioactive Disintegration

                               T_x/(n-x+1)
    x     n_x     T_x       n = 77      n = 79
    0      57    2608          -           -
    1     203    2551       33.1299     32.2911
    2     383    2348       30.8947     30.1026
    3     525    1965       26.0000     25.5195
    4     532    1440       19.4594     18.9474
    5     408     908       12.4384     12.1067
    6     273     500        6.9444      6.7568
    7     139     227        3.1972      3.1096
    8      45      88        1.2571      1.2222
    9      27      43        0.6232      0.6056
   10      10      16        0.2353      0.2286
   11       4       6        0.0896      0.0870
   12       2       2        0.0303      0.0294
  Total  2608              134.2995    131.0065

    Σ x n_x = 10094,   x̄ = 3.870;   Σ x² n_x = 48650,   s² = 3.676.
    N[log n - log(n-x̄)] = 134.4390 (n = 77),  130.9693 (n = 79);
    ψ(77) = -0.1395,   ψ(79) = +0.0372.

Moment Estimates: Following Section 4.5, we have for these data

    N = 2608,   x̄ = Σ x n_x/N = 3.870,   s² = Σ (x - x̄)² n_x/N = 3.676,

so that the estimate for the mean number (μ) of α-particles emitted per interval is

    μ̂ = x̄ = 3.870,

and the number (n) of particles capable of disintegration for the substance during the interval of 7.5 seconds is estimated by

    n̂ = x̄²/(x̄ - s²) = 77.

Maximum likelihood estimates: The estimate for the mean number of α-particles per interval remains the same, namely μ̂ = x̄ = 3.870. To get the estimate n̂ of n, starting with the moment estimate n = 77, we solve the equation:

    ψ(n) = Σ_{r≥0} T_{r+1}/(n-r) - N[log n - log(n-x̄)] = 0.

For n = 77, we have N[log n - log(n-x̄)] = 134.4390. From column 4 of the above table, we have for n = 77

    Σ_{r≥0} T_{r+1}/(n-r) = Σ_{x≥1} T_x/(n-x+1) = 134.2995,

so that

    ψ(77) = 134.2995 - 134.4390 = -0.1395.

Let us try next n = 79, say. Now N[log n - log(n-x̄)] = 130.9693, and column 5 of the above table gives for n = 79

    Σ_{r≥0} T_{r+1}/(n-r) = 131.0065,

so that ψ(79) = +0.0372. Thus, whereas ψ(77) is negative, ψ(79) is positive, and therefore the likelihood estimate for n satisfies 77 ≤ n̂ < 79; n̂ = 78.

Example 4.5.2 The first two columns of the following table give data, due to Weldon, that show the results of throwing n dice 4096 times, a throw of 4, 5 or 6 being called a success. x denotes the number of successes and n_x the frequency of x.

                                          T_x/(n-x+1)
  Successes   Frequency
      x          n_x       T_x       n = 12       n = 13       n = 11
      0            0      4096          -            -            -
      1            7      4096      341.3333     315.0769     372.3636
      2           60      4089      371.7273     340.7500     408.9000
      3          198      4029      402.9000     366.2727     447.6667
      4          430      3831      425.6667     383.1000     478.8750
      5          731      3401      425.1250     377.8889     485.8571
      6          948      2670      381.4286     333.7500     445.0000
      7          847      1722      287.0000     246.0000     344.4000
      8          536       875      175.0000     145.8333     218.7500
      9          257       339       84.7500      67.8000     113.0000
     10           71        82       27.3333      20.5000      41.0000
     11           11        11        5.5000       3.6667      11.0000
   Total        4096               2927.7641    2600.6383    3366.8123

    Σ x n_x = 25145,   x̄ = 6.139;   Σ x² n_x = 166367,   s² = 2.930.
    N[log n - log(n-x̄)] = 2935.1030 (n = 12),  2617.7000 (n = 13),  3344.9000 (n = 11);
    ψ(12) = -7.3389,   ψ(13) = -17.0617,   ψ(11) = +21.9123.

Moment Estimates: Following Section 4.5, we have for Example 4.5.2

    N = 4096,   x̄ = 6.139,   s² = 2.930,

so that the estimate for the number of dice thrown is given by

    n̂ = x̄²/(x̄ - s²) = 12,

and the estimate of the proportion of successes (π) is

    π̂ = x̄/n̂ = 6.139/12 = 0.5116.

Maximum likelihood estimates: To get first the estimate n̂ of n, starting with the moment-estimate n = 12, we solve the equation:

    ψ(n) = Σ_{r≥0} T_{r+1}/(n-r) - N[log n - log(n-x̄)] = 0.

For n = 12, we have N[log n - log(n-x̄)] = 2935.1030. From column 4 of the above table, we have for n = 12

    Σ_{r≥0} T_{r+1}/(n-r) = Σ_{x≥1} T_x/(n-x+1) = 2927.7641,

so that

    ψ(12) = 2927.7641 - 2935.1030 = -7.3389.

Let us try next n = 13, say, to see if ψ(13) is nearer zero. Proceeding as before, we get ψ(13) = -17.0617, which is further from zero than ψ(12). Therefore we try n = 11. We have then ψ(11) = +21.9123, which indicates that n̂ = 12. The estimate of π is then obtained by

    π̂ = x̄/n̂ = 0.5116.

(Note: Weldon had actually thrown 12 dice.)
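The sign-change search for n̂ in both examples is mechanical; a sketch of ψ(n) from (4.5.5) (our naming), applied to Weldon's data:

```python
from math import log

def psi(n, freq):
    """Left side of the likelihood equation (4.5.5) for the binomial
    sample-size parameter n, with pi eliminated via pi = xbar/n."""
    N = sum(freq.values())
    xbar = sum(x * nx for x, nx in freq.items()) / N
    xmax = max(freq)
    # T_{r+1} = number of observations exceeding r
    T = [sum(nx for x, nx in freq.items() if x > r) for r in range(xmax)]
    return sum(T[r] / (n - r) for r in range(xmax)) + N * log(1 - xbar / n)

# Weldon's dice data (4096 throws; "success" = a throw of 4, 5 or 6)
weldon = {0: 0, 1: 7, 2: 60, 3: 198, 4: 430, 5: 731, 6: 948,
          7: 847, 8: 536, 9: 257, 10: 71, 11: 11}
print(psi(11, weldon) > 0, psi(12, weldon) < 0)  # True True, so n_hat = 12
```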

CHAPTER V

5.0 ESTIMATION PROBLEMS FOR THE POISSON DISTRIBUTION

5.1 Introduction

The gpsd defined by (1.1.4) becomes

    Prob{X = x} = θ^x / (x! e^θ),   x = 0, 1, 2, ...,                (5.1.1)

when f(θ) = e^θ. Writing μ = θ, (1.1.4) gives the probability law for X as:

    Prob{X = x} = p(x,μ) = e^{-μ} μ^x / x!,   x = 0, 1, 2, ...,      (5.1.2)

the well-known form of the Poisson distribution. The important properties of (5.1.2) can be summarily stated as follows:

    M(t) = e^{μ(e^t - 1)}.                                           (5.1.3)

The first two central moments and the coefficients β1, β2 are of the form:

    μ1 = μ,   μ2 = μ,   β1 = 1/μ,   β2 = 3 + 1/μ.                    (5.1.4)

The recurrence relation connecting the central moments is

    μ_{r+1} = μ (dμ_r/dμ + r μ_{r-1}),                               (5.1.5)

and all the cumulants are equal:

    κ_r = μ,   r = 1, 2, ....                                        (5.1.6)

It is well known that the equality of all cumulants is necessary and sufficient for a probability distribution to be Poisson. For

any gpsd to be Poisson, however, the equality of the first two cumulants only is necessary and sufficient (Chapter I, Theorem 3).

The distribution function P(r,μ) defined by

    P(r,μ) = Σ_{x=0}^{r} p(x,μ)                                      (5.1.7)

can be reduced to

    P(r,μ) = I_μ(r+1),                                               (5.1.8)

where

    I_x(r) = ∫_x^∞ e^{-u} u^{r-1} du / ∫_0^∞ e^{-u} u^{r-1} du

is the incomplete gamma integral tabulated by K. Pearson. Molina (1947) has extensively tabulated p(x,μ) and 1 - P(r,μ) for the range of argument μ = 0.001(0.001)0.010(0.01)0.30(0.1)15(1)100. Kitagawa (1952) has also edited "Tables of the Poisson Distribution." For large μ, we have the normal approximation given by:

    lim_{μ→∞} P(r,μ) = Φ(z),                                         (5.1.9)

where z = (r + 1/2 - μ)/√μ and

    Φ(z) = (1/√(2π)) ∫_{-∞}^{z} e^{-u²/2} du.

On the basis of a random sample x_i (i = 1, 2, ..., N) of size N from (5.1.2), both the moment and the maximum likelihood estimate for μ are given by μ̂ = x̄, where

    x̄ = Σ_{i=1}^{N} x_i / N                                          (5.1.10)

provides an unbiased estimate of μ with Var(μ̂) = μ/N.
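The moment statements in (5.1.4)-(5.1.6) are easy to confirm by direct summation (a sketch; the series is cut off where its terms are negligible):

```python
from math import exp, factorial

def poisson_central_moment(mu, r, terms=60):
    """r-th central moment of a Poisson(mu) by direct summation."""
    p = [exp(-mu) * mu ** x / factorial(x) for x in range(terms)]
    return sum((x - mu) ** r * px for x, px in enumerate(p))

mu = 2.0
m2 = poisson_central_moment(mu, 2)
m3 = poisson_central_moment(mu, 3)
m4 = poisson_central_moment(mu, 4)
# mu_2 = mu, mu_3 = mu, mu_4 = 3*mu**2 + mu, hence beta_2 = 3 + 1/mu
print(round(m2, 6), round(m3, 6), round(m4, 6))  # 2.0 2.0 14.0
```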

5.2 Estimation from a Sample for a Truncated Poisson Distribution

Problems of estimation in a truncated Poisson distribution with known truncation points have been discussed by various authors. The case of truncation on the left has been considered by David and Johnson (1948), who gave the maximum likelihood estimate; by Plackett (1953), who gave a simple and highly efficient ratio-estimate; and by Rider, who used the first two moments. Truncation on the right has been discussed by Tippett (1932), Bliss (1948), and Moore (1952). Tippett derived the maximum likelihood solution, Bliss developed an approximation to it, and Moore suggested a simple ratio estimate. Double truncation has been studied by Moore (1954) and Cohen (1954). Moore gave ratio-estimates, while Cohen provided likelihood equations.

Neat and compact equations for estimation by the method of maximum likelihood (which has been shown to be identical with the method of moments, in general, for single-parameter gpsd's) can be derived from the general approach discussed in Chapter II. We present numerical tables and some suitable charts to facilitate the solution of these equations in certain special cases. The estimates given by Rider and Moore are derived from the general results discussed in Chapter III. The efficiency and the amount of bias of these estimates are investigated in some cases. The problem of estimation has also been considered for single truncation with an unknown truncation point.

5.2.1 The probability law of the singly truncated Poisson distribution with truncation point on the right at d can be written as:

    p*(x,μ) = [P(d,μ)]^{-1} e^{-μ} μ^x / x!,   x = 0, 1, 2, ..., d,  (5.2.1)

where P(r,μ) is defined by (5.1.7).

The first four moments about the origin of (5.2.1) can be written down as:

    μ* = μ*(d,μ) = μ P(d-1,μ)/P(d,μ),                                (5.2.2)
    m2* = m2*(d,μ) = μ*(d,μ)[1 + μ*(d-1,μ)],                         (5.2.3)
    m3* = m3*(d,μ) = μ*(d,μ)[1 + 2μ*(d-1,μ) + m2*(d-1,μ)],
and
    m4* = m4*(d,μ) = μ*(d,μ)[1 + 3μ*(d-1,μ) + 3m2*(d-1,μ) + m3*(d-1,μ)].

5.2.2 To estimate μ on the basis of a random sample x_i (i = 1, 2, ..., N) of size N from (5.2.1), the results derived by the general approach in Chapter II can be written down as follows. The likelihood equation for μ is

    x̄ = μ*,                                                          (5.2.4)

where x̄ = Σ x_i/N and μ* is defined by (5.2.2). Denoting this estimate by μ̂, the asymptotic variance is given by

    Var(μ̂) = μ/(N dμ*/dμ)                                            (5.2.5)
           = μ²/(N μ2*),                                             (5.2.6)

where μ2* is the variance of (5.2.1). Equation (5.2.4) suggests that if tables be made available for the means μ* for sufficiently close values of μ, we can have a ready solution of (5.2.4). We present in

Table IV a numerical table for the arguments μ = 0.0(0.1)4.9 and d = 4(1)10. This table can be used to compute Var(μ̂) by using Formula (5.2.5) or Formula (5.2.6). In case (5.2.5) is used, dμ*/dμ can be approximated by the finite difference ratio Δμ*/Δμ. In the event Formula (5.2.6) is used, the relationship for use is

    μ2*(d,μ) = μ*(d,μ)[1 + μ*(d-1,μ) - μ*(d,μ)].                     (5.2.7)

5.2.3 The method of two moments is applicable in the usual problem of estimation from a sample from a Poisson distribution singly truncated on the right, and forms a particular case of the general method discussed in Section 3.4 of Chapter III. Proceeding on those lines, one gets in this case

    μ(μ* - d H01) = m2* - (d+1) μ* H11,                              (5.2.8)

where μ* and m2* are defined by (5.2.2) and (5.2.3) respectively, and H11 and H01 both reduce to unity. Then (5.2.8) gives

    μ = [m2* - (d+1)μ*] / (μ* - d),                                  (5.2.9)

so that, on the basis of a random sample of size N with n_x as the frequency of x drawn from (5.2.1), the estimate for μ can be written as

    t = [S2 - (d+1)S1] / (S1 - dN),                                  (5.2.10)

where S1 = Σ x n_x and S2 = Σ x² n_x.

To find the asymptotic variance of the two-moments estimate (TM) given by (5.2.10), one gets on simplification

    Var(t) ≈ [1/(N H²)] (σ22 + μ² σ11 - 2μ σ12),                     (5.2.11)

where

    H = μ* - d,
    σ11 = m2* - μ*²,
    σ22 = (m4* - m2*²) + (d+1)²(m2* - μ*²) - 2(d+1)(m3* - μ*m2*),    (5.2.12)
    σ12 = (m3* - μ*m2*) - (d+1)(m2* - μ*²),

and m_r* is the r-th theoretical moment of (5.2.1) about the origin. Thus,

    Var(t) ≈ [1/(N H²)] [(m4* - m2*²) + (μ+d+1)²(m2* - μ*²)
             - 2(μ+d+1)(m3* - μ*m2*)].                               (5.2.13)

The asymptotic efficiency of t is then given by

    Eff(t) = Var(μ̂)/Var(t).

The following table gives the asymptotic efficiency of t relative to μ̂ for values of d = 5 with μ = .25, .5(.5)2.5, and d = 10 with μ = .5(.5)5.

TABLE 5.2.1
EFFICIENCY OF TM

Case (i) d = 5
    μ      .25    .50    1.00   1.50   2.00   2.50
    Eff.  .978   .954   .904   .867   .850   .838

Case (ii) d = 10
    μ      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5    5.0
    Eff.  .989   .988   .960   .942   .920   .897   .874   .855   .835   .815

Thus, the asymptotic efficiency of TM is never less than 81 percent in the above cases, and one may therefore use TM to estimate μ in such problems.

5.2.4 Following the general approach discussed in Section 3.1 of Chapter III, a simple ratio-estimate can be obtained for μ of (5.2.1). In this case a_{x-1}/a_x = x, and since θ = μ, the ratio-estimate for μ takes the form

    μ' = Σ_{x=0}^{d} x n_x / Σ_{x=0}^{d-1} n_x,                      (5.2.14)

as first suggested by Moore (1954). The asymptotic variance of the "Ratio-estimate" (R) given by (5.2.14) can be obtained as

    Var(μ') = [1/(N P²)] (D - μ²P + 2μ²P_{d-1}),                     (5.2.15)

where, in this case,

    P = Σ_{x=0}^{d-1} p*(x,μ),
    D = Σ_{x=0}^{d} x² p*(x,μ) = m2*,
and
    P_{d-1} = p*(d-1,μ).

The asymptotic efficiency of μ' is then given by

    Eff(μ') = Var(μ̂)/Var(μ').

The following table gives the asymptotic efficiency of μ' relative to μ̂ for values of d = 5 with μ = .25, .5(.5)2.5, and d = 10 with μ = .5(.5)4.5.

TABLE 5.2.2
EFFICIENCY OF R

Case (i) d = 5
    μ      .25    .50    1.00   1.50   2.00   2.50
    Eff.  .999   .990   .979   .967   .951   .923

Case (ii) d = 10
    μ      .5    1.0    1.5    2.0    2.5    3.0    3.5    4.0    4.5
    Eff. 1.000  1.000  1.000  1.000   .999   .992   .981   .894   .817

Thus, R seems to be highly efficient on the whole, and its efficiency always exceeds 81 percent in Table 5.2.2.
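Both tables above can be spot-checked numerically. The sketch below (our helper names) evaluates Eff(t) from (5.2.6) and (5.2.13), reproducing the d = 5, μ = 1.00 entry of Table 5.2.1, and confirms that the ratio-estimate (5.2.14), fed the exact truncated-Poisson frequencies, recovers μ:

```python
from math import exp, factorial

def trunc_probs(d, mu):
    """Point probabilities of the Poisson truncated on the right at d."""
    p = [exp(-mu) * mu ** x / factorial(x) for x in range(d + 1)]
    tot = sum(p)
    return [px / tot for px in p]

def tm_efficiency(d, mu):
    """Eff(t) = Var(ML)/Var(TM) from (5.2.6) and (5.2.13)."""
    p = trunc_probs(d, mu)
    m = [sum(x ** k * px for x, px in enumerate(p)) for k in range(5)]
    mu2s = m[2] - m[1] ** 2                     # variance of (5.2.1)
    H2 = (m[1] - d) ** 2
    var_tm = ((m[4] - m[2] ** 2) + (mu + d + 1) ** 2 * mu2s
              - 2 * (mu + d + 1) * (m[3] - m[1] * m[2])) / H2
    return (mu * mu / mu2s) / var_tm

def ratio_estimate(freq, d):
    """Moore's ratio-estimate (5.2.14) from a frequency table."""
    t1 = sum(x * nx for x, nx in freq.items())
    t2 = sum(nx for x, nx in freq.items() if x <= d - 1)
    return t1 / t2

print(round(tm_efficiency(5, 1.0), 3))       # 0.904, as in Table 5.2.1
exact = {x: px for x, px in enumerate(trunc_probs(3, 1.0))}
print(round(ratio_estimate(exact, 3), 6))    # 1.0: recovers mu exactly
```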

5.2.5 So far, we have separately discussed the Two-Moments Estimate and the Ratio Estimate for μ. To make a comparative study of these two simple estimates, let us investigate their amount of bias and relative efficiency. Following Sections 3.4 and 3.1 of Chapter III, one gets, to order 1/N, the amounts of bias of t (TM) and μ' (R) as follows:

    b(t) = (μ σ11 - σ12)/(N H²) = B(t)/N,                            (5.2.16)

where H, σ11 and σ12 are defined by (5.2.12), and

    b(μ') = μ P_{d-1}/(N P²) = B(μ')/N.                              (5.2.17)

The relative efficiency of R over TM is given by Rel. Eff. = Var(t)/Var(μ'). Table 5.2.3 gives B(t), B(μ') and Rel. Eff. for values of d = 5 with μ = .25, .5(.5)2.5 and d = 10 with μ = .5(.5)5.

Table 5.2.3 shows that both TM and R are over-estimates of μ. A closer investigation, however, brings out that the bias to order 1/N is always considerably smaller for R. Also, R is more efficient than TM. Thus we conclude that one may prefer the Ratio Estimate to the Two-Moments Estimate for μ of the Poisson distribution singly truncated on the right because of its simplicity, small bias and high efficiency.

5.2.6 The probability law of the singly truncated Poisson distribution with truncation point on the left at c can be written as:

    p*(x,μ) = [P*(c,μ)]^{-1} e^{-μ} μ^x / x!,   x = c, c+1, ...,     (5.2.18)

TABLE 5.2.3
COMPARISON BETWEEN TM AND R

         N(Amount of Bias to Order 1/N)    Var(TM)/Var(R)
    μ        TM          R

Case (i) d = 5
   .25     .0526      .0003                  1.022
   .50     .1111      .0008                  1.048
  1.00     .2498      .0015                  1.134
  1.50     .4260      .0719                  1.187
  2.00     .6507      .1977                  1.199
  2.50     .9181      .4461                  1.210

Case (ii) d = 10
   .5      .0526      .0000                  1.011
  1.0      .1111      .0000                  1.012
  1.5      .1765      .0003                  1.041
  2.0      .2500      .0004                  1.062
  2.5      .3333      .0022                  1.087
  3.0      .4284      .0081                  1.115
  3.5      .5547      .0231                  1.144
  4.0      .6640      .0536                  1.170
  4.5      .8093      .1063                  1.243
  5.0      .9786      .1876                  1.226

TABLE 5.2.4
BIAS AS A PERCENTAGE OF STANDARD ERROR FOR N = 100

           (Bias/S.E.) × 100
    μ        TM          R

Case (i) d = 5
   .25     1.0412      .0063
   .50     1.5339      .0112
  1.00     2.3566      .1550
  1.50     3.9107      .5763
  2.00     3.9114     1.2576
  2.50     4.3275     2.3137

Case (ii) d = 10
   .5       .7402      .0000
  1.0      1.1043      .0000
  1.5      1.4120      .0029
  2.0      1.7150      .0057
  2.5      2.0202      .0136
  3.0      2.3352      .0467
  3.5      2.7484      .1219
  4.0      3.0121      .2607
  4.5      3.3599      .4650
  5.0      3.8477      .7380

where

    P*(c,μ) = 1 - P(c-1,μ).                                    (5.2.19)

The first two moments about the origin of (5.2.18) can be written down as

    μ* = μ*(c,μ) = μ P*(c-1,μ)/P*(c,μ)                         (5.2.20)

and

    m₂* = m₂*(c,μ) = μ*(c,μ)[1 + μ*(c-1,μ)].                   (5.2.21)

To estimate μ on the basis of a random sample xᵢ (i = 1, 2, ..., N) of size N from (5.2.18), the results derived by the general approach in Chapter II can be written down as follows. The likelihood equation for μ is

    x̄ = μ̂*                                                     (5.2.22)

where x̄ = Σᵢ xᵢ/N and μ* is defined by (5.2.20). Denoting this estimate by μ̂, its asymptotic variance is given by

    Var(μ̂) = μ/(N dμ*/dμ)                                      (5.2.23)
           = μ²/(N μ₂*)                                        (5.2.24)

where μ₂* is the variance of (5.2.18). Equation (5.2.22) suggests that if tables be made available of the means μ* for sufficiently close values of μ, we can have a ready solution of (5.2.22). For a case of special significance when c = 1, i.e., when only zero observations are truncated, (5.2.22) becomes

    x̄ = μ̂/(1 - e^(-μ̂))                                         (5.2.25)
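As a computational aside (this sketch is ours, not part of the original text), the likelihood equation (5.2.25) can also be solved by machine rather than by table look-up, since μ/(1 - e^(-μ)) is increasing in μ; simple bisection suffices:

```python
import math

def zero_truncated_poisson_mle(xbar, lo=1e-9, hi=50.0, tol=1e-12):
    """Solve xbar = mu/(1 - exp(-mu)) for mu (Eq. 5.2.25) by bisection.

    xbar must exceed 1, since the mean of a zero-truncated Poisson is > 1.
    """
    f = lambda mu: mu / (1.0 - math.exp(-mu)) - xbar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mle_variance(mu, N):
    """Asymptotic variance of mu-hat for the c = 1 case, Eq. (5.2.26)."""
    q = math.exp(-mu)
    return mu * (1.0 - q) ** 2 / (N * (1.0 - q - mu * q))

# sample mean of the gall-fly data used later in Section 5.2.11
mu_hat = zero_truncated_poisson_mle(2.2833)
```

With x̄ = 2.2833 and N = 886 this gives μ̂ ≈ 1.962 and Var(μ̂) ≈ 0.0028, agreeing with the table-based solution of the worked example.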

and the asymptotic variance given by (5.2.23) reduces to

    Var(μ̂) = μ(1 - e^(-μ))²/{N(1 - e^(-μ) - μe^(-μ))}.         (5.2.26)

Here we present, in a somewhat extensive table, Table V, values of μ*(1,μ) for the Poisson distribution truncated on the left at c = 1 for values of μ spaced at suitable intervals. A chart based on this table is also given to facilitate the procedure of estimation. The table can be used to compute Var(μ̂) by using Formula (5.2.23) or Formula (5.2.24). In case Formula (5.2.23) is used, dμ*/dμ can be approximated by the finite difference ratio Δμ*/Δμ. In case Formula (5.2.24) is used, the relationship for use is

    μ₂*(1,μ) = μ*(1,μ)[1 + μ - μ*(1,μ)].                       (5.2.27)

Tables for μ*(c,μ) of the Poisson distribution truncated on the left have also been given for various values of c and μ.

5.2.7 For a slightly different problem, in which the frequencies for some lowest "counts" are missing in a sample from a complete Poisson distribution, Rider (1953) suggested a method of estimation which uses the first two moments of the complete Poisson distribution and leads to a linear equation. The method of two moments is also applicable to the usual problem of estimation from a sample from a singly truncated Poisson distribution and forms a particular case of the general method discussed in Section 3.4 of Chapter III. Proceeding on these lines, one gets the estimate for μ in this case as

    t = (S₂ - cS₁)/(S₁ - (c-1)N)                               (5.2.28)

where S₁ = Σ x nₓ and S₂ = Σ x² nₓ. To find the asymptotic variance of the two-moments estimate t of μ, one gets on simplification

    Var(t) = [(m₄* - m₂*²) + (μ+c)²(m₂* - μ*²) - 2(μ+c)(m₃* - μ*m₂*)] / {N[μ* - (c-1)]²}.    (5.2.29)

The asymptotic efficiency of t is then given by Eff(t) = Var(μ̂)/Var(t). The case of single truncation on the left at c = 1 is of practical importance. David and Johnson (1952) studied the efficiency for this particular case. The following is the table of Eff(t) computed by them.

TABLE 5.2.5

EFFICIENCY OF TM FOR c = 1

     μ      .5    1.0    1.5    2.0    2.5    3.0    4.0
    Eff.   .87    .80    .75    .73    .71    .71    .72

Source: David and Johnson (1952)

Thus the efficiency of TM is not less than 70 percent for c = 1 with μ = .5(.5)4.0.

5.2.8 Following the general approach discussed in Section 3.1 of Chapter III, a simple ratio-estimate for μ can be obtained in the case of the Poisson distribution singly truncated on the left (5.2.18). In this case,

a_{x-1}/aₓ = x and, since θ = μ, we have the following "ratio-estimate" for μ:

    μ' = Σ_{x=c+1}^∞ x nₓ/N.                                   (5.2.30)

When c = 1, i.e., when only "zero" counts are truncated, the estimate takes the form suggested by Plackett (1953):

    μ' = Σ_{x=2}^∞ x nₓ/N.                                     (5.2.31)

The unique unbiased estimate of μ linear in the frequencies (cf. Section 3.2 of Chapter III) is provided by (5.2.30). The exact variance of this estimate is

    σ²(μ') = (1/N)[Σ_{x=c+1}^∞ x² p*(x,μ) - μ²]                (5.2.32)

and an unbiased estimate of σ²(μ') is

    [Σ_{x=c+1}^∞ x² nₓ - Nμ'²]/{N(N-1)}.                       (5.2.33)

When c = 1, (5.2.32) reduces to

    σ²(μ') = (1/N)[μ + μ²/(e^μ - 1)],                          (5.2.34)

first derived by Plackett (1953). Plackett also computed the efficiency of μ' in this special case. The following table gives the efficiencies of μ' relative to μ̂. It can be shown that the efficiency of μ' never falls below 0.9536, the minimum value being attained when μ = 1.355 (Plackett).
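To make (5.2.31) and (5.2.33) concrete, here is a small Python sketch (ours, not the author's) computing Plackett's ratio estimate and its unbiased variance estimate from a frequency table:

```python
def plackett_ratio(freqs):
    """Ratio estimate (5.2.31) and unbiased variance estimate (5.2.33), c = 1.

    freqs maps each count x (>= 1) to its observed frequency n_x.
    """
    N = sum(freqs.values())
    mu_prime = sum(x * n for x, n in freqs.items() if x >= 2) / N
    s = sum(x * x * n for x, n in freqs.items() if x >= 2)
    var_est = (s - N * mu_prime ** 2) / (N * (N - 1))
    return mu_prime, var_est

# Varley's gall-fly data from Section 5.2.11
varley = {1: 287, 2: 272, 3: 196, 4: 79, 5: 29, 6: 20, 7: 2, 8: 0, 9: 1, 10: 0}
mu_prime, var_est = plackett_ratio(varley)
```

On Varley's data this reproduces the values μ' = 1.9594 and variance 0.002982 quoted in the summary table of Section 5.2.11.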

TABLE 5.2.6

EFFICIENCY OF R FOR c = 1

     μ      0.5    1.0    1.5    2.0    2.5    3.0    3.5    4.0
    Eff.  .9693  .9559  .9539  .9586  .9662  .9743  .9815  .9872

Source: Plackett (1953)

5.2.9 So far we have discussed the Two-Moments Estimate and the Ratio Estimate for μ separately. To make a comparative study of these two simple estimates, let us investigate their amount of bias and relative efficiency. Following Sections 3.1 and 3.3, one gets to order 1/N the amount of bias of t (TM) as follows:

    b(t) = (μσ₂₂* - σ₁₂*)/(NH²)                                (5.2.35)

where H = μ* - (c-1), σ₂₂* = m₂* - μ*² and σ₁₂* = (m₃* - μ*m₂*) - c(m₂* - μ*²). For μ', however, one has

    b(μ') = 0.                                                 (5.2.36)

The relative efficiency of R over TM is given by Rel. Eff. = Var(t)/Var(μ'). The following table gives the bias and relative efficiency of TM and R for μ = .5(.5)4. Thus we conclude that one may prefer the Ratio Estimate for estimating μ of the Poisson distribution singly truncated on the left at c = 1 because of its simplicity, unbiasedness and high efficiency.

TABLE 5.2.7

BIAS AND RELATIVE EFFICIENCY OF TM AND R FOR c = 1

            N(Amount of Bias to order 1/N)
     μ          TM         R         Var(TM)/Var(R)
     .5      -.3935      .0000           1.11
    1.0      -.6321      .0000           1.19
    1.5      -.7769      .0000           1.27
    2.0      -.8647      .0000           1.31
    2.5      -.9179      .0000           1.36
    3.0      -.9502      .0000           1.37
    3.5      -.9698      .0000           1.37
    4.0      -.9817      .0000           1.37

5.2.10 Some cases are likely to arise in which one is aware of the type of truncation but does not know the point at which truncation occurs. For instance, when lots of manufactured items come to a consumer for acceptance from a producer who has earlier censored items having more than, say, d defects, the consumer has to draw samples from a singly truncated Poisson population with unknown truncation point on the right. Estimation of μ and d thus becomes essential before setting up any acceptance sampling plan. Consider a sample xᵢ (i = 1, 2, ..., N) of size N with observed frequency nₓ for x (0 ≤ x ≤ R, Σ nₓ = N) drawn from

    p*(x,μ,d) = [P(d,μ)]⁻¹ e^(-μ) μ^x/x!,    x = 0, 1, 2, ..., d.    (5.2.37)

To estimate μ, we choose the ratio-estimate

    μ' = Σ_{x=0}^R x nₓ / Σ_{x=0}^{R-1} nₓ.                    (5.2.38)

The advantage of μ' is that, besides its simplicity, it does not need knowledge of the truncation point. To estimate d, the identity m₂* = (d+1+μ)μ* - dμ gives

    d̂ = [S₂ - (μ'+1)S₁]/(S₁ - Nμ')                             (5.2.39)

where μ' is given by (5.2.38), S₁ = Σ x nₓ and S₂ = Σ x² nₓ.

5.2.11 The detailed computational procedure for evaluating the three types of estimates discussed above is illustrated with reference to data collected by Varley (1949) to study population balance in the Knapweed Gall-fly. The table below gives the number of flower-heads (nₓ) each having exactly x gall-cells (x = 1, 2, ...).

  Number of gall-cells in a flower-head (x):   1     2     3     4     5     6     7     8     9    10
  Number of flower-heads (nₓ):               287   272   196    79    29    20     2     0     1     0
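The pair of estimates (5.2.38)-(5.2.39) can be checked on idealized data: if the observed frequencies are taken proportional to the true truncated probabilities, both μ and d are recovered exactly. A Python sketch (ours):

```python
import math

def unknown_truncation_estimates(freqs):
    """mu' of (5.2.38) and d-hat of (5.2.39) from a frequency table."""
    R = max(x for x, n in freqs.items() if n > 0)
    N = sum(freqs.values())
    S1 = sum(x * n for x, n in freqs.items())
    S2 = sum(x * x * n for x, n in freqs.items())
    mu_prime = S1 / sum(n for x, n in freqs.items() if x <= R - 1)
    d_hat = (S2 - (mu_prime + 1.0) * S1) / (S1 - N * mu_prime)
    return mu_prime, d_hat

# idealized "sample": frequencies proportional to p*(x; mu=2, d=5)
mu, d, N = 2.0, 5, 1000.0
p = [math.exp(-mu) * mu ** x / math.factorial(x) for x in range(d + 1)]
total = sum(p)
freqs = {x: N * p[x] / total for x in range(d + 1)}
mu_est, d_est = unknown_truncation_estimates(freqs)
```

With real (integer) frequencies the same two lines of arithmetic give the sample estimates; the idealized input merely verifies that the estimating equations are consistent.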

Assuming the truncated Poisson model

    p*(x,μ) = μ^x/{x!(e^μ - 1)},    x = 1, 2, ...

for the probability of x gall-cells in a flower-head, the problem is to estimate μ on the basis of the given data.

Maximum Likelihood Estimate: For the data we get N = 886 and S₁ = 2023, so that x̄ = 2.2833. Referring to Table V for c = 1, we find the following:

     μ        μ*
    1.9     2.2342
    2.0     2.3130

The maximum likelihood estimate is given by that value of μ for which μ* = 2.2833. By linear interpolation we thus get μ̂ = 1.9623. The variance of this estimate is estimated from the formula

    Var(μ̂) = μ̂²/(N μ̂₂*),    where μ̂₂* = x̄(1 + μ̂ - x̄).

On computation, μ̂₂* = 2.2833(1 + 1.9623 - 2.2833) = 1.5504, and so

    Var(μ̂) = (1.9623)²/(886 × 1.5504) = 0.002803.

Thus the standard error of μ̂ is given by S.E.(μ̂) = √0.002803 = 0.0529.

Two-Moments Estimate: To compute this estimate of μ, we require in addition the value of S₂ = Σx²nₓ = 6027. Then the estimate is

    t = S₂/S₁ - 1 = 6027/2023 - 1 = 1.9792.

To compute the variance of t, taking 1.9792 as the estimate for μ, we have

    μ* = μ/(1 - e^(-μ)) = 2.2965
    m₂* = μ*(1 + μ) = 6.8417
    μ₂* = m₂* - μ*² = 1.5678
    m₃* = μ(μ* + m₂*) + (1 + μ)μ₂* = 22.7571

and, by the corresponding recurrence for the fourth moment, m₄* = 91.6896. The variance of t is then estimated from the formula

    Var(t) = (1/(Nμ*²))[(m₄* - m₂*²) + (1+μ)²μ₂* - 2(1+μ)(m₃* - μ*m₂*)] = 0.003600,

so that the standard error is S.E.(t) = 0.0600. The following table summarizes the results obtained:

    Estimate     Value     Variance     Standard Error
    ML          1.9623     0.002803        0.0529
    TM          1.9792     0.003600        0.0600
    R           1.9594     0.002982        0.0546

5.3 Estimation from a Sample for a Censored Poisson Distribution

Moore (1952) and Cohen (1954) discussed the problem of estimating μ from a censored sample of the Poisson distribution. Moore gave a simple ratio-estimate, and Cohen derived likelihood equations for both singly and doubly censored samples. In this section we provide a neat and compact likelihood equation for estimation. The amount of bias involved in estimating μ by the sample mean after pooling observations of higher counts has been investigated. A suitable chart is provided to suggest when one should resort to a finer method of estimation.

5.3.1 Suppose that in a random sample of size N from (5.1.2) we have a record of the number n₁ of observations in the right tail defined by (x ≥ c) and of the n* observations xᵢ (i = 1, 2, ..., n*; xᵢ < c), so that N = n* + n₁. Results for estimation derived by the general approach in Section 2.3 can be written down in this case as follows. The efficient score for μ is

    φ = (1/μ){n* x̄* - (Nμ - n₁ν₁)}                             (5.3.1)

where x̄* = Σᵢ xᵢ/n* and ν₁ is the mean of the Poisson distribution truncated on the left at c.
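Setting this score to zero gives the likelihood equation n* x̄* = Nμ̂ - n₁ν₁, which is easy to solve by machine. The following Python sketch (ours, not the author's) computes ν₁ from the Poisson tail and then finds μ̂ by bisection, using the fact that Nμ - n₁ν₁(μ) is increasing in μ:

```python
import math

def poisson_upper_tail(mu, a):
    """P(X >= a) for a Poisson(mu) variable."""
    term, cdf = math.exp(-mu), 0.0
    for x in range(a):
        cdf += term
        term *= mu / (x + 1)
    return 1.0 - cdf

def nu1(mu, c):
    """Mean of the Poisson distribution truncated on the left at c."""
    return mu * poisson_upper_tail(mu, c - 1) / poisson_upper_tail(mu, c)

def censored_poisson_mle(n_star, xbar_star, n1, c, lo=1e-6, hi=50.0):
    """Solve n* xbar* = N mu - n1 nu1(mu) for mu by bisection."""
    N = n_star + n1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if N * mid - n1 * nu1(mid, c) < n_star * xbar_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a consistency check, feeding in the exact expected values of n₁ and x̄* for a given μ and c returns that μ.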

Thus the likelihood equation for estimating μ is

    n* x̄* = Nμ̂ - n₁ν₁.                                         (5.3.2)

The asymptotic variance of the estimate μ̂ derived from (5.3.2) is 1/I(μ), where

    I(μ) = (N/μ)(1 - P dν₁/dμ)                                 (5.3.3)

and P is the probability of the right tail. Equation (5.3.2) does not readily give an algebraic solution, and an iterative process of solution has to be resorted to. To facilitate the process of estimation we use Table V to obtain values of the means ν₁ for values of μ and c spaced at suitable intervals.

5.3.2 A simple estimate of μ from the censored sample under consideration can be obtained as

    m = (Σ_{i=1}^{n*} xᵢ + c n₁)/N                              (5.3.4)

which, though always an underestimate, may be useful at times when the magnitude of the bias is small. The relative bias is

    b = [E(m) - μ]/μ = -(P/μ)(ν₁ - c)                          (5.3.5)

where ν₁ is the mean of the right tail as defined above. Here we present in Table 5.3.1 the values of the relative bias for various values of μ and c spaced at suitable intervals. Chart 5.3.1, based on this table, may also help to suggest when one should resort to a finer method of estimation. Also, before

TABLE 5.3.1

VALUES OF -b = (P/μ)(ν₁ - c)

   μ \ c      4         5         6         7         8         9        10
    .5    .000372   .000020   .000000   .000000   .000000   .000000   .000000
   1.0    .004348   .000688   .000096   .000000   .000000   .000000   .000000
   1.5    .016109   .003723   .000752   .000135   .000000   .000000   .000000
   2.0    .037570   .011244   .002962   .000695   .000147   .000028   .000005
   2.5    .068305   .024781   .007971   .002296   .000597   .000141   .000030
   3.0    .106454   .044873   .016901   .005731   .001762   .000494   .000127
   3.5    .149588   .071141   .030462   .011810   .004169   .001348   .000400
   4.0    .195368   .102571   .048856   .021189   .008405   .003063   .001030
   4.5    .241801   .137821   .071805   .034255   .015016   .006068   .002270
   5.0    .287375   .175461   .098660   .051093   .024418   .010798   .004431
   5.5    .331034   .214227   .128168   .071544   .036846   .017637   .007858
   6.0    .372169   .253006   .160623   .094999   .052330   .026868   .012878
   6.5    .410417   .290968   .193899   .121057   .070709   .038639   .019773
   7.0    .445659   .327507   .227610   .148995   .091659   .052955   .028736
   7.5    .477929   .362219   .261078   .178163   .114771   .069693   .039877
   8.0    .507431   .394879   .293784   .207953   .139563   .088626   .053193
   8.5    .534296   .425390   .325334   .237822   .165527   .109411   .068572
   9.0    .558749   .453743   .355472   .267357   .192186   .131684   .085909
   9.5    .581036   .480000   .384046   .296118   .219109   .155060   .104685
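The entries of Table 5.3.1 can be regenerated directly from (5.3.5); a short Python sketch (ours):

```python
import math

def pooled_relative_bias(mu, c):
    """-b = (P/mu)(nu1 - c), the relative bias of the pooled estimate (5.3.5)."""
    term, cdf_cm2, cdf_cm1 = math.exp(-mu), 0.0, 0.0
    for x in range(c):
        if x <= c - 2:
            cdf_cm2 += term          # accumulates P(X <= c-2)
        cdf_cm1 += term              # accumulates P(X <= c-1)
        term *= mu / (x + 1)
    P = 1.0 - cdf_cm1                # right-tail probability P(X >= c)
    nu1 = mu * (1.0 - cdf_cm2) / P   # mean of the right tail
    return (P / mu) * (nu1 - c)
```

For example, μ = 2.0, c = 4 gives 0.03757, the entry in the first column of the μ = 2.0 row.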

[Chart 5.3.1: Chart for the Relative Bias of the "Pooled Estimate" for c = 4(1)10. The relative bias -b (vertical axis, 0 to 0.60) is plotted against μ (horizontal axis, 0 to 9.0), one curve for each value of c.]

starting an experiment on "Poisson counts," Table 5.3.1 and Chart 5.3.1 may be of great help to an experimenter in deciding the value of the count beyond which he need record not the individual counts but only their total frequency. Equations of estimation for other types of censored samples may be written down along the same lines as above.

5.4 Estimation With Doubtful Observations

While sampling from a Poisson distribution (5.1.2), cases arise in which one is doubtful whether the observations in the "zero class" really come from the Poisson population. For instance, while recording data on counts of minute particles in an experiment, one may be doubtful whether some "zero counts" occur because of failure in the working of the counter. Before actually estimating the parameter, therefore, one has to test whether the zero counts conform to the Poisson distribution under consideration; incidentally, this also tests whether the counter worked throughout.

5.4.1 We take the model for the distribution as

    Prob{X = x} = β                               for x = 0
                = (1-β) μ^x/{(e^μ - 1) x!}        for x = 1, 2, ...    (5.4.1)

On the basis of a sample of size N with n₁ zero counts and n* non-zero counts xᵢ (i = 1, 2, ..., n*; n* = N - n₁), one gets the efficient scores for β and μ as

    φ_β = n₁/β - n*/(1-β)                                      (5.4.2)

and

    φ_μ = (n*/μ)(x̄* - μ*)                                      (5.4.3)

where x̄* = Σ_{i=1}^{n*} xᵢ/n* and μ* = μ/(1 - e^(-μ)). The estimates are therefore given by

    β̂ = n₁/N                                                   (5.4.4)

and

    x̄* = μ̂*.                                                   (5.4.5)

Now, the hypothesis H₀ of interest is

    β = e^(-μ).                                                (5.4.6)

Obviously the estimate of μ under H₀ is given by μ̂₀ = x̄, where

    x̄ = Σ xᵢ/N.                                                (5.4.7)

Therefore the estimate of β under H₀ becomes

    β̂₀ = e^(-μ̂₀)                                               (5.4.8)

and the chi-square criterion with 1 d.f. given by (2.4.16) reduces to

    χ² = χ₀²(1 + 1/ν₀)                                         (5.4.9)

where

    χ₀² = N(β̂ - β̂₀)²/{β̂₀(1 - β̂₀)}    and    ν₀ = (e^(μ̂₀) - 1)/μ̂₀ - 1.

Illustrative Example: Let us illustrate the method of estimation with doubtful observations with reference to data given by Scrase. The problem is connected with the number of dust nuclei in the air, and the data give the frequency distribution of the number of drops in a small volume of air that fall on to a stage in a chamber containing moisture and filter. The data are as follows:

  Number of dust nuclei (x):   0     1     2     3     4     5     6     7     8
  Frequency (nₓ):             23    56    88    95    73    40    17     5     3

For the full distribution we have N = 400, S₁ = 1170 and x̄ = 2.9250. Scrase is of the opinion that this mean (and hence the estimate of the Poisson parameter given by μ̂ = x̄) is slightly high, in that a number of zero counts were wrongly rejected as being due to the apparatus not working. The zero counts are doubtful, and so before writing down the estimate we shall have to test whether these zero counts conform to the full Poisson data. Under the null hypothesis H₀ that the zero counts conform to the full distribution, we have, following the results derived in Section 5.4, μ̂₀ = x̄ = 2.9250. Therefore under H₀ the estimate of β, the proportion of zero counts to the total number of counts, is given by

    β̂₀ = e^(-μ̂₀) = e^(-2.9250) = 0.053665.

To compute the chi-square criterion given by (5.4.9), we have β̂ = n₁/N = 23/400 = 0.0575, so that

    χ₀² = N(β̂ - β̂₀)²/{β̂₀(1 - β̂₀)} = 400(0.0575 - 0.053665)²/(0.053665 × 0.946335) = 0.115839

and

    ν₀ = (e^(μ̂₀) - 1)/μ̂₀ - 1 = 5.028763,

so that the chi-square criterion with 1 degree of freedom to test H₀ comes out to be

    χ² = χ₀²(1 + 1/ν₀) = 0.138874,

which is not significant, showing thereby that the retained zero counts conform to the full Poisson distribution and that the estimate of μ (obtained to be μ̂₀ = 2.925) computed from the full data is statistically quite legitimate. This conclusion brings out that the rejections of zero counts were rightly judged by the experimenter: the rejected zero counts were indeed due to failure of the counter.
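The whole test of Section 5.4 condenses into a few lines of Python (our sketch; the function name is ours):

```python
import math

def zero_count_chi_square(freqs):
    """Chi-square criterion (5.4.9) for testing that the zero counts
    conform to the complete Poisson distribution."""
    N = sum(freqs.values())
    xbar = sum(x * n for x, n in freqs.items()) / N      # mu-hat under H0
    beta_hat = freqs.get(0, 0) / N
    beta0 = math.exp(-xbar)
    chi0_sq = N * (beta_hat - beta0) ** 2 / (beta0 * (1.0 - beta0))
    nu0 = (math.exp(xbar) - 1.0) / xbar - 1.0
    return chi0_sq * (1.0 + 1.0 / nu0)

# Scrase's dust-nuclei data
scrase = {0: 23, 1: 56, 2: 88, 3: 95, 4: 73, 5: 40, 6: 17, 7: 5, 8: 3}
chi_sq = zero_count_chi_square(scrase)
```

This gives χ² ≈ 0.139, far below the 5 percent point of χ² with 1 d.f. (3.84), in agreement with the worked example.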

CHAPTER VI

6.0 ESTIMATION PROBLEMS FOR THE NEGATIVE BINOMIAL DISTRIBUTION

6.1 Introduction

The gpsd with two parameters defined by (2.6.1) becomes

    Prob{X = x} = ((λ+x-1) choose x) θ^x (1-θ)^λ,    x = 0, 1, 2, ...    (6.1.1)

when f(θ) = (1-θ)^(-λ), 0 < θ < 1. Writing k = λ and μ = kθ/(1-θ), (6.1.1) gives the probability law for X as

    Prob{X = x} = y(x;μ,k) = ((k+x-1) choose x) (μ/(μ+k))^x (k/(μ+k))^k,    x = 0, 1, 2, ...    (6.1.2)

a well-known form of the negative binomial distribution. The important properties of (6.1.2) can be summarily stated as follows:

    M(t) = [1 - (μ/k)(e^t - 1)]^(-k).                          (6.1.3)

The mean and variance are given by

    μ₁ = μ,    μ₂ = μ(μ+k)/k                                   (6.1.4)

and the coefficients β₁, β₂ take the form

    β₁ = (1 + 2μ/k)²/μ₂,    β₂ = 3 + 1/μ₂ + 6/k.               (6.1.5)

The recurrence relation connecting the central moments,

    μ_{r+1} = μ₂ (dμ_r/dμ + r μ_{r-1}),                        (6.1.6)

does not seem to have been noticed before; it follows immediately from (1.1.7). The recurrence relation connecting the cumulants is

    κ_{r+1} = κ₂ dκ_r/dμ,                                      (6.1.7)

which is derived in a slightly different form by Guldberg (1935) and Wishart (1949). The distribution function Y(r,μ,k) defined by

    Y(r,μ,k) = Σ_{x=0}^r y(x,μ,k)                              (6.1.8)

can be reduced to

    Y(r,μ,k) = I_ω(k, r+1),    where ω = k/(μ+k)               (6.1.9)

and

    I_x(m,n) = [1/B(m,n)] ∫₀^x u^(m-1) (1-u)^(n-1) du,

for which extensive tables have been edited by K. Pearson. When k is a positive integer, one can use

    Y(r,μ,k) = Σ_{x=k}^{r+k} ((r+k) choose x) (k/(μ+k))^x (μ/(μ+k))^(r+k-x)
             = 1 - B(k-1; ω, r+k)                              (6.1.10)

where ω is given by (6.1.9) and B(k-1; ω, r+k) is the cumulative binomial probability defined in (4.1.7). For large k one can use tables of Poisson probabilities, because

    lim_{k→∞} y(x,μ,k) = p(x,μ)                                (6.1.11)

where p(x,μ) = e^(-μ) μ^x/x!. For large μ and k, however, we have the normal approximation given by

    lim_{μ→∞, k→∞} Y(r,μ,k) = Φ(z)                             (6.1.12)

where z = (r + 1/2 - μ)/√μ₂ and

    Φ(z) = (1/√(2π)) ∫_{-∞}^z e^(-u²/2) du.

6.2 Estimation of Parameters of the Complete Negative Binomial Distribution

The negative binomial distribution can be viewed as a compound Poisson distribution, Greenwood and Yule (1920), when the mean of the Poisson distribution follows the gamma distribution. One finds, therefore, its applications in all fields where the data under consideration are too heterogeneous to be fitted by a Poisson distribution. For instance, Greenwood and Yule (1920) applied it to accident data, Fisher (1941) and Bliss (1953) to biological data, and Sichel (1951) to psychological data, whereas Wise (1946) found a use for it in an industrial sampling problem.
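Before turning to estimation, the binomial-tail reduction (6.1.10) is easy to verify by machine for integer k; the sketch below (ours, not the author's) compares it against direct summation of (6.1.2):

```python
from math import comb

def nb_cdf_direct(r, mu, k):
    """Y(r, mu, k) by direct summation of (6.1.2); k a positive integer."""
    w = k / (mu + k)                      # omega of (6.1.9)
    return sum(comb(k + x - 1, x) * (1.0 - w) ** x * w ** k
               for x in range(r + 1))

def nb_cdf_binomial_tail(r, mu, k):
    """Y(r, mu, k) = 1 - B(k-1; omega, r+k), Eq. (6.1.10)."""
    w = k / (mu + k)
    n = r + k
    # B(k-1; w, n) = probability of at most k-1 successes in n Bernoulli(w) trials
    return 1.0 - sum(comb(n, s) * w ** s * (1.0 - w) ** (n - s)
                     for s in range(k))
```

The two forms agree to machine precision, which is the content of the identity: "X ≤ r" for the negative binomial is the event that at least k successes occur in the first r + k trials.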

6.2.1 To estimate μ and k on the basis of a random sample of size N with observed frequency nₓ for x (Σnₓ = N) drawn from (6.1.2), following Section 2.6, the moment estimates are given by

    x̄ = μ̂                                                      (6.2.1)

and

    s² = μ̂(μ̂ + k̂)/k̂                                            (6.2.2)

where x̄ = Σxnₓ/N and s² = Σnₓ(x - x̄)²/N. The likelihood equations reduce to

    x̄ = μ̂                                                      (6.2.3)

and

    Σ_{r=0}^∞ T_{r+1}/(k̂+r) = N log((μ̂+k̂)/k̂),                  (6.2.4)

where T_{r+1} = Σ_{x>r} nₓ, which was first derived by Haldane (1941). Eliminating μ̂ from (6.2.3) and (6.2.4), we have to solve for k̂ the equation

    Σ_{r=0}^∞ T_{r+1}/(k̂+r) = N log((x̄+k̂)/k̂).                  (6.2.5)

The elements of the information matrix are:

    I₁₁ = N/μ₂,    I₁₂ = 0,
    I₂₂ = N {Σ_{r=0}^∞ [1 - Y(r,μ,k)]/(k+r)² - μ/(k(μ+k))}.    (6.2.6)
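Equation (6.2.5) has the single unknown k̂, and both sides are cheap to evaluate, so a root can be found numerically. A Python sketch (ours; it assumes an over-dispersed sample, s² > x̄, so that a finite root exists):

```python
import math

def haldane_k_mle(freqs, lo=1e-3, hi=1e4, iters=200):
    """Solve sum_r T_{r+1}/(k+r) = N log((xbar+k)/k)  (Eq. 6.2.5) for k."""
    N = sum(freqs.values())
    xbar = sum(x * n for x, n in freqs.items()) / N
    xmax = max(x for x, n in freqs.items() if n > 0)
    T = [sum(n for x, n in freqs.items() if x > r) for r in range(xmax)]

    def g(k):
        return sum(T[r] / (k + r) for r in range(xmax)) - N * math.log((xbar + k) / k)

    # g > 0 for small k and g < 0 for large k when the sample is over-dispersed
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # bisect on the log scale
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

For the illustrative frequency table {0: 30, 1: 25, 2: 20, 3: 12, 4: 8, 5: 5} (ours, not from the dissertation) the root lies near k ≈ 3.5, between the brackets g(3) > 0 and g(4) < 0.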

6.2.2 It is easy to see that the moment estimate of the mean μ is identical with the likelihood estimate and is given by the sample mean x̄. The estimates for k are, however, different. The moment estimate for k, though inefficient, is easy to compute, whereas the likelihood estimate, though efficient, is rather difficult to obtain. It is important to know, therefore, when one should proceed to obtain the likelihood estimate for k. For large μ, Sichel (1951) showed the efficiency of the moment estimate for k to be never less than 0.80. He also showed that the efficiency is minimum for large μ when k = 5.5, and recommended that for values of k > 5.5 it is not necessary to estimate k by the arduous likelihood method. Fisher (1941) examined the efficiency of the moment estimates and concluded that if μ is less than k/9 for any value of k, or if k exceeds 18 for any value of μ, high efficiency is assured; for intermediate values, however, if the product (1 + k)(k + 2) exceeds 20, the efficiency is satisfactorily high.

6.3 Estimation of Parameters of a Truncated Negative Binomial Distribution

Sampford (1955) discussed an application of the truncated negative binomial distribution. Taking k and ω = k/(μ+k) as the parameters, he gave methods to obtain "moment estimates" and "likelihood estimates" for ω and k when zero observations are truncated, and he also investigated the efficiency of the moment estimates so obtained. Rider (1955) used three moments and provided simple estimates for μ and k. Following the general approach discussed in Section 2.6, neat and compact likelihood equations can be made available.

The probability law of the negative binomial distribution with zero observations truncated can be written as

    y*(x,μ,k) = [1 - (k/(μ+k))^k]⁻¹ ((k+x-1) choose x) (μ/(μ+k))^x (k/(μ+k))^k,    x = 1, 2, 3, ...    (6.3.1)

The first two moments about the origin of (6.3.1), then, are

    μ₁* = μ/[1 - (k/(μ+k))^k]                                  (6.3.2)

and

    m₂* = μ₁*(1 + μ + μ/k).                                    (6.3.3)

6.3.1 To estimate μ and k on the basis of a random sample of size N with frequency nₓ for x (Σnₓ = N) drawn from (6.3.1), the moment equations can be written down as

    x̄ = μ₁* = μ/[1 - (k/(μ+k))^k]                              (6.3.4)

and

    S₂/N = m₂* = μ₁*(1 + μ + μ/k)                              (6.3.5)

where x̄ = Σxnₓ/N and S₂ = Σx²nₓ. Eliminating k from (6.3.4) and (6.3.5), one gets

    x̄ {1 - [k̂/(μ̂+k̂)]^k̂} = μ̂,    with k̂ = Nμ̂x̄/[S₂ - Nx̄(μ̂+1)].   (6.3.6)

An estimate of μ can be obtained by solving this single equation (6.3.6) by an iterative method. It is easy to see that the

estimate of k is also available during the process of iteration, because one has

    k̂ = Nμ̂x̄/[S₂ - Nx̄(μ̂+1)].                                   (6.3.7)

6.3.2 Following the general approach discussed in Section 2.6, the likelihood equations for μ and k in this case can be written down as

    x̄ = μ̂₁*                                                    (6.3.8)

and

    Σ_{r=0}^∞ T_{r+1}/(k̂+r) = N(μ̂₁*/μ̂) log((μ̂+k̂)/k̂)            (6.3.9)

where μ₁* is defined by (6.3.2) and T_{r+1} = Σ_{x>r} nₓ. The elements of the information matrix are, with ω = k/(μ+k) and Y*(r,μ,k) denoting the distribution function of (6.3.1),

    I₁₁ = Nμ₂*/μ₂²                                             (6.3.10)

    I₁₂ = (Nμ₁*²/(μμ₂)) ω^k [μ/(μ+k) - log((μ+k)/k)]           (6.3.11)

and

    I₂₂ = N {Σ_{r=0}^∞ [1 - Y*(r,μ,k)]/(k+r)² + (μ - μ₁*)/(μ+k)² - μ/(k(μ+k))
             - ∂/∂k [ω^k (log ω + 1 - ω)/(1 - ω^k)]}           (6.3.12)

where μ₂ and μ₂* are the variances of (6.1.2) and (6.3.1) respectively.

6.3.3 Equating the first three sample moments to the corresponding theoretical moments about the origin of (6.3.1), Rider (1955) gets simple estimates for μ and k as follows:

    μ' = [2S₂² - S₁(S₂ + S₃)]/(S₁S₂ - S₁²)                     (6.3.13)

and

    k' = [2S₂² - S₁(S₂ + S₃)]/[S₁(S₁ + S₃) - S₂(S₁ + S₂)]      (6.3.14)

where Sᵢ = Σxⁱnₓ/N, i = 1, 2, 3.

6.4 Estimation when k is Known

The negative binomial distribution is essentially a distribution with two parameters. Cases arise, however, in which the parameter k is known and only the other parameter has to be estimated. This is taken up in this section. When k is a positive integer, writing θ = k/(μ+k), the negative binomial law given by (6.1.2) becomes

    y(x,θ,k) = ((k+x-1) choose x) θ^k (1-θ)^x,    x = 0, 1, 2, ...    (6.4.1)

(6.4.1) can be looked upon as the probability law of the number X = k + x of independent successive trials required to get k successes when θ is the probability of success at each trial, and it can be used in sampling problems in which sampling is continued until a certain number k of "character-bearers" is available in the sample. It has been used by Haldane (1945) in biology and by Craig (1953), among others, in industrial problems.

6.4.1 On the basis of a single observation on X, X = k + x, the likelihood estimate of θ can be obtained as

    θ̂ = k/(k+x) = k/X.                                         (6.4.2)

It can be seen that θ̂ is biased for θ. Following Section 3.2, however, a unique unbiased ratio-estimate for θ can be obtained. In this case a_{x-1}/aₓ = x/(k+x-1) and, the series parameter being 1 - θ, we have the ratio-estimate for θ

    θ' = 1 - x/(k+x-1) = (k-1)/(k+x-1) = (k-1)/(X-1),          (6.4.3)

provided first by Haldane (1945) and shown to be the only unbiased estimate of θ by Girshick, Mosteller and Savage (1946). To estimate θ on the basis of a random sample of size N with frequency nₓ for x drawn from (6.4.1), the following estimates are available:

Maximum Likelihood Estimate: Following Section 2.1, the likelihood equation (which is the same as the moment equation) can be written down as x̄ = μ̂, where x̄ = Σxnₓ/N and μ is the mean of (6.4.1) given by

    μ = k(1-θ)/θ.                                              (6.4.4)

One gets the estimate of θ, therefore, as

    θ̂ = k/(k+x̄).                                               (6.4.5)

The asymptotic variance and the amount of bias to order 1/N of θ̂ can easily be obtained as

    Var(θ̂) = θ²(1-θ)/(Nk)                                      (6.4.6)

and

    b(θ̂) = θ(1-θ)/(Nk).                                        (6.4.7)
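The unbiasedness of the Haldane estimate (6.4.3) is easy to confirm numerically; a quick Python check (ours):

```python
from math import comb

def haldane_estimate(x, k):
    """theta' = (k-1)/(X-1) with X = k + x, Eq. (6.4.3)."""
    return (k - 1) / (k + x - 1)

# E[theta'] under the negative binomial law (6.4.1) should equal theta exactly
theta, k = 0.3, 4
pmf = lambda x: comb(k + x - 1, x) * theta ** k * (1.0 - theta) ** x
expectation = sum(haldane_estimate(x, k) * pmf(x) for x in range(2000))
```

The sum reproduces θ to within the (negligible) truncation error of the series, illustrating the Girshick-Mosteller-Savage result.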

The following table gives the amount of bias and the standard error of θ̂ (ML) for k = 1, 2, 3 with θ = .01, .05, .10, .25, .50, .75.

TABLE 6.4.1

BIAS AND STANDARD ERROR OF ML FOR N = 100

    k      θ     Bias to order 1/N    Standard Error    100 × Bias/S.E.
    1    .01         .000099             .000990             10.00
         .05         .000475             .004873              9.75
         .10         .000900             .009487              9.49
         .25         .001875             .021660              8.66
         .50         .002500             .035355              7.07
         .75         .001875             .037500              5.00
    2    .01         .000050             .000704              7.04
         .05         .000238             .003446              6.89
         .10         .000450             .006708              6.70
         .25         .000938             .015300              6.13
         .50         .001250             .025000              5.00
         .75         .000938             .026510              3.54
    3    .01         .000033             .000574              5.75
         .05         .000158             .002813              5.63
         .10         .000300             .005477              5.48
         .25         .000625             .012490              5.00
         .50         .000833             .020411              4.08
         .75         .000625             .021660              2.89

Ratio Estimate: Following Section 3.1, the unique unbiased linear estimate of θ can be written down as

    θ' = 1 - (1/N) Σ_{x=1}^∞ [x/(k+x-1)] nₓ.                   (6.4.8)

The exact variance of this ratio-estimate θ' is

    σ²(θ') = (1/N) [Σ_{x=1}^∞ (x/(k+x-1))² y(x,θ,k) - (1-θ)²]  (6.4.9)

and its unbiased estimate is given by

    [Σ_{x=1}^∞ (x/(k+x-1))² nₓ - N(1-θ')²]/{N(N-1)}.

The efficiency of θ' is then obtained as Eff(θ') = Var(θ̂)/σ²(θ'). The following table gives the efficiency of θ' for k = 1, 2 with θ = .01, .05, .10, .25, .50, .75.

TABLE 6.4.2

EFFICIENCY OF R

    k \ θ    .01    .05    .10    .25    .50    .75
     1      .010   .050   .100   .250   .500   .750
     2      .136   .221   .289   .442   .647   .830

Also, if one notices that S₁ = Σxnₓ follows the negative binomial law with parameters θ and Nk, one more unbiased ratio-estimate can be written down for θ, on the lines of the estimate given for a single observation by (6.4.3), as

    θ'' = (Nk-1)/(Nk+S₁-1) = (k - 1/N)/(k + x̄ - 1/N).          (6.4.10)

The exact variance of θ'' is then given by

    σ²(θ'') = Σ_{S₁=0}^∞ [(Nk-1)/(Nk+S₁-1)]² y(S₁,θ,Nk) - θ².  (6.4.11)

Also, the asymptotic variance of θ'' can be written down as

    Var(θ'') = [θ²(1-θ)/(Nk)](1 - 1/(Nk))² = Var(θ̂)(1 - 1/(Nk))²,    (6.4.12)

which indicates that θ'' is highly efficient. It is also unbiased and easy to compute. Hence it may be used with advantage to estimate θ, because the maximum likelihood estimate θ̂ given by (6.4.5), though efficient, has some bias, whereas the ratio-estimate θ' given by (6.4.8), though unbiased, involves a serious loss of efficiency.

6.4.2 Cases can arise in which one has to estimate θ on the basis of a sample drawn from a truncated negative binomial distribution. The negative binomial distribution (6.4.1) truncated, say, on the left at c can be written down as

    y*(x,θ,k) = [Y*(c,θ,k)]⁻¹ ((k+x-1) choose x) θ^k (1-θ)^x,    x = c, c+1, ...    (6.4.13)

where Y*(r+1,θ,k) = 1 - Y(r,θ,k). The first two moments about the origin of (6.4.13) are

    μ* = μ*(c,θ,k) = [k(1-θ)/θ] Y*(c-1,θ,k+1)/Y*(c,θ,k)        (6.4.14)

and

    m₂* = m₂*(c,θ,k) = μ*(c,θ,k)[1 + μ*(c-1,θ,k+1)].           (6.4.15)

To estimate θ on the basis of a random sample of size N with frequency nₓ for x drawn from (6.4.13), results derived by the

general approach in Section 2.2 can be written down as follows, with the proper substitutions for this particular case.

Maximum Likelihood Estimate: The likelihood equation for θ is

    x̄ = μ̂*                                                     (6.4.16)

where x̄ = Σxnₓ/N and μ* is defined by (6.4.14). Denoting this estimate by θ̂, its asymptotic variance is given by

    Var(θ̂) = -(1-θ)/(N dμ*/dθ)                                 (6.4.17)
            = (1-θ)²/(Nμ₂*)                                    (6.4.18)

where μ₂* is the variance of (6.4.13). As equation (6.4.16) does not readily give an algebraic solution, one may use an iterative process of solution. However, one can have a ready solution of (6.4.16) if tables be made available of the means μ* for sufficiently close values of θ. Here we present in Table VI values of μ* for the special case c = 1 with k = 1, 2, 3 at suitable intervals of θ. When c = 1, (6.4.16) reduces to

    x̄ = k(1-θ̂)/{θ̂(1-θ̂^k)}.                                     (6.4.19)

This table can be used to compute Var(θ̂) by using either (6.4.17) or (6.4.18). In case (6.4.17) is used, dμ*/dθ can be approximated by the finite difference ratio Δμ*/Δθ. In case Formula (6.4.18) is used, the relationship for use is

    μ₂*(c,θ,k) = μ*(c,θ,k)[1 + μ*(c-1,θ,k+1) - μ*(c,θ,k)].

Ratio Estimate: The simple ratio-estimate for θ of (6.4.13) can be written down as

    θ' = 1 - (1/N) Σ_{x=c+1}^∞ [x/(k+x-1)] nₓ.                 (6.4.20)

(6.4.20) provides the unique unbiased linear function of the frequencies for estimating θ, and its exact variance is given by

    σ²(θ') = (1/N) [Σ_{x=c+1}^∞ (x/(k+x-1))² y*(x,θ,k) - (1-θ)²].    (6.4.21)

We have already seen, however, for the complete negative binomial distribution that this type of ratio-estimate, though unbiased, involves a serious loss of efficiency. In the case of the truncated distribution also, therefore, a similar pattern of low efficiencies is to be expected.

Two-Moments Estimate: Following Section 3.4, we get for the distribution (6.4.13)

    θ = 1 - (m₂* - cμ*)/[H₁₁ - (c-1)H₀₁]                       (6.4.22)

where μ* and m₂* are defined by (6.4.14) and (6.4.15) respectively, and H₀₁ and H₁₁ reduce to

    H₀₁ = k + μ*,    H₁₁ = kμ* + m₂*.

(6.4.22) then gives

    θ = [(k+1)μ* - (c-1)k]/[m₂* + (k-c+1)μ* - (c-1)k]          (6.4.23)

so that a simple estimate of θ can be written down as

    t = [(k+1)S₁ - (c-1)kN]/[S₂ + (k-c+1)S₁ - (c-1)kN]         (6.4.24)

where Sᵢ = Σxⁱnₓ, i = 1, 2. When c = 1, (6.4.24) reduces to

    t = (k+1)S₁/(S₂ + kS₁).                                    (6.4.25)

The asymptotic variance of t given by (6.4.25) reduces to

    Var(t) = [σ₁₁* + θ²σ₂₂* - 2θσ₁₂*]/(NH²)                    (6.4.26)

where

    H = m₂* + kμ*
    σ₁₁* = (k+1)²μ₂*
    σ₂₂* = (m₄* - m₂*²) + k²μ₂* + 2k(m₃* - μ*m₂*)
    σ₁₂* = (k+1)[(m₃* - μ*m₂*) + kμ₂*]

and m_r* is the r-th theoretical moment and μ* and μ₂* are the mean and variance respectively of (6.4.13) with c = 1. Also one gets to order 1/N the amount of bias of t for c = 1 as

    b(t) = (θσ₂₂* - σ₁₂*)/(NH²)

where H, σ₂₂* and σ₁₂* are defined above.
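A quick numerical check of (6.4.25) (our sketch, not the author's): if the frequencies are taken proportional to the zero-truncated probabilities themselves, the estimate t reproduces θ exactly.

```python
from math import comb

def two_moments_theta(freqs, k):
    """t = (k+1) S1 / (S2 + k S1), Eq. (6.4.25); zero-truncated sample, k known."""
    S1 = sum(x * n for x, n in freqs.items())
    S2 = sum(x * x * n for x, n in freqs.items())
    return (k + 1) * S1 / (S2 + k * S1)

# "frequencies" proportional to the zero-truncated negative binomial law
theta, k = 0.4, 2
freqs = {x: comb(k + x - 1, x) * theta ** k * (1 - theta) ** x
         for x in range(1, 400)}
t = two_moments_theta(freqs, k)
```

The normalizing constant of the truncated law cancels in the ratio, which is why unnormalized probabilities suffice here.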

6.5 Homogeneity and Combined Estimation, k Known

Consider a situation in which m different machines in a section of a factory are producing items of some kind. One has to examine whether these different machines are homogeneous in respect of the quality (θ) of the product as judged by the proportion of defectives, and if so, to make a combined estimate of θ. On the basis of inverse binomial sampling applied to lots of a fairly large number of items from each machine, one will have, for each lot from every machine, an observation of the number of items (k + x) that had to be inspected to get k defectives. Let N_j be the number of lots inspected from the j-th machine, so that N = Σ_{j=1}^m N_j denotes the total number of lots inspected.

6.5.1 Thus we have a random sample of size N = Σ_{j=1}^m N_j from the m distributions characterized by the probability laws

    ((k+x-1) choose (k-1)) θ_j^k (1-θ_j)^x,    j = 1, 2, ..., m    (6.5.1)

where θ_j denotes the quality of the product from the j-th machine. Following Section 2.5, we have the j-th "efficient score"

    φ_j = -[N_j/(1-θ_j)](x̄_j - μ_j(θ_j))                       (6.5.2)

where μ_j(θ_j) is the mean of the j-th distribution. The elements of the information matrix are

    I_jj = N_j μ₂(θ_j)/(1-θ_j)²,    I_jj' = 0 (j ≠ j')         (6.5.3)

where μ₂(θ_j) is the variance of the j-th distribution. The hypothesis of homogeneity is H₀: θ₁ = θ₂ = ... = θ_m.

If the hypothesis H₀ is true, the common value may be denoted by θ, and the efficient score and the information with respect to θ are given by

    φ = -[N/(1-θ)](x̄ - μ(θ))                                   (6.5.4)

where x̄ = Σ_j N_j x̄_j/N and μ(θ) = Σ_j N_j μ_j(θ)/N = k(1-θ)/θ, and

    I = N μ₂(θ)/(1-θ)².                                        (6.5.5)

Solving φ = 0, we have

    θ̂ = k/(k + x̄).                                             (6.5.6)

A test of the homogeneity hypothesis H₀ is then given by the statistic

    χ² = Σ_{j=1}^m φ_j²(θ̂)/I_jj(θ̂)                             (6.5.7)
       = Σ_{j=1}^m N_j [x̄_j - μ(θ̂)]²/μ₂(θ̂)                     (6.5.8)

which is asymptotically distributed as chi-square with (m-1) degrees of freedom if H₀ is true.

6.5.2 We will illustrate the computational procedure with reference to the data in the table below, which gives, for 10 (m = 10) pairs of pages from Tippett's random sampling numbers, the observed distribution of the number (x) of even integers between two consecutive zeros (k = 1), characterized by the model

    θ(1-θ)^x,    x = 0, 1, 2, ...

For the distribution of x, we have to test whether the pairs of pages are homogeneous in respect of θ and, if so, to get a combined estimate of θ.

FREQUENCY FOR PAGES

  No. of even        j =  1    2    3    4     5     6     7     8     9    10
  integers (x)  Pair:   2&3  4&5  6&7  8&9 10&11 12&13 14&15 16&17 18&19 20&21   Total
       0                 67   71   53   65    79    71    59    70    60    66     661
       1                 44   58   50   47    59    59    51    54    52    47     521
       2                 50   32   35   43    26    44    46    44    48    44     412
       3                 42   25   23   33    27    37    29    36    25    21     298
       4                 26   23   28   26    27    30    25    29    23    30     267
       5                 19   28   21   19    24    21    24    15    22    20     213
       6                 16   17   18   17    18    15    17     5    16    17     157
       7                 14   20   12   14    20    10     6     8    21    16     141
       8                  7    9   11   13    13     8    15     9    11    11     107
       9                  5    5    9    9    11     7     9     5     5     8      73
      10                  9    4   11    5    10     5     3     7     3     9      66
      11                  6    9    7    7     4     9     2     6     7     6      63
      12                  3    3    1    4     2     1     3     2     5     3      27
      13                  1    2    2    4     5     3     3     2     3     3      28
      14                  0    2    4    3     5     2     2     4     6     4      32
      15                  4    3    2    1     2     2     6     2     3     1      26
      16                  4    2    1    3     1     3     1     3     0     0      18
      17                  1    2    2    2     3     2     0     0     1     2      15
      18                  1    2    0    0     2     1     1     1     1     0       9
      19                  2    0    2    1     2     0     2     2     1     1      13
      20                  0    1    2    0     0     1     2     0     1     1       8
      21                  0    0    1    1     0     0     0     0     0     0       2
      22                  0    0    0    0     1     0     1     0     0     1       3
      23                  0    0    0    0     0     2     1     3     0     1       7
      24                  1    0    0    0     0     0     0     1     2     0       4
      25                  0    0    1    0     0     0     0     1     0     0       2
      26                  2    1    0    0     0     0     0     0     0     0       3
      27                  0    0    0    0     0     0     0     0     0     0       0
      28                  0    0    0    0     0     0     1     0     0     0       1

  N_j                   324  319  296  317   342   333   309   309   315   313    3177
  Σ x nₓ              1242 1210 1252 1222  1373  1194  1225  1138  1247  1223   12326
  x̄_j               3.8333 3.7931 4.2297 3.8549 4.0146 3.5856 3.9644 3.6828 3.9587 3.9073

For the above data we have

    N = Σ N_j = 3177,   S = Σ N_j x̄_j = Σ x n_x = 12326,

so that x̄ = S/N = 3.8798. The maximum likelihood estimate of θ under the hypothesis of homogeneity is then given by

    θ̂₀ = 1/(1 + x̄) = 1/4.8798 = 0.2049.

To test whether θ is the same for the different pairs of pages, we have to compute

    Σ N_j (x̄_j - x̄)² = 90.974   and   μ₂(θ̂₀) = (1 - θ̂₀)/θ̂₀² = x̄(1 + x̄) = 18.935,

so that

    χ² = Σ N_j (x̄_j - x̄)² / μ₂(θ̂₀) = 90.974/18.935 = 4.8045,

which with 10 - 1 = 9 degrees of freedom is not significant. The pairs of pages of Tippett's random sampling numbers can thus be regarded as homogeneous in respect of the distribution of even integers between two consecutive zeros, as should be expected.
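As a check on the arithmetic, the homogeneity statistic can be recomputed from the N_j and x̄_j of the table. A minimal sketch in Python (our choice of language; the variable names are ours):

```python
# N_j and mean counts for the 10 pairs of pages, as read from the table above.
Nj = [324, 319, 296, 317, 342, 333, 309, 309, 315, 313]
xbarj = [3.8333, 3.7931, 4.2297, 3.8549, 4.0146,
         3.5856, 3.9644, 3.6828, 3.9587, 3.9073]

N = sum(Nj)
xbar = sum(n * m for n, m in zip(Nj, xbarj)) / N        # combined mean
theta0 = 1.0 / (1.0 + xbar)                             # (6.5.6) with k = 1
mu2 = xbar * (1.0 + xbar)                               # (1 - theta)/theta^2 at theta0
chi2 = sum(n * (m - xbar) ** 2 for n, m in zip(Nj, xbarj)) / mu2
```

The result agrees with θ̂₀ = 0.2049 and χ² ≈ 4.80 above to the rounding used there.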

CHAPTER VII

7.0 ESTIMATION PROBLEMS FOR THE LOGARITHMIC SERIES DISTRIBUTION

7.1 Introduction

The gpsd defined by (1.1.4) becomes

    Prob{X = x} = θ^x / [-x log(1 - θ)],   x = 1, 2, 3, ...,          (7.1.1)

when f(θ) = -log(1 - θ). Writing α = -1/log(1 - θ), (7.1.1) gives the probability law for X as

    Prob{X = x} = p(x, θ) = αθ^x/x,   x = 1, 2, 3, ...,               (7.1.2)

a well-known form of the logarithmic series distribution. The first two moments about the origin of (7.1.2) can be obtained as

    μ = m₁ = αθ/(1 - θ)                                               (7.1.3)

and

    m₂ = μ/(1 - θ).                                                   (7.1.4)

7.2 Estimation from a Sample for the Complete Logarithmic Series Distribution

Applications of the logarithmic series distribution have been discussed, among others, by Fisher (1943), Williams (1943, 1944), Harrison (1945) and Kendall (1948). Problems of estimation, however, do not seem to have been thoroughly investigated. Following the general approach discussed in Chapters II and III, we provide in this section different estimates for the parameter θ of the logarithmic series and investigate their efficiency and the amount of bias in

certain special cases. A table of mean values of the logarithmic series distribution is provided to obtain the maximum likelihood estimate with facility.

7.2.1 To estimate θ by likelihood on the basis of a random sample of size N, with frequency n_x for x (1 ≤ x < ∞, Σ n_x = N), drawn from (7.1.2), the results derived by the general approach in Section 2.1 can be written down as follows. The likelihood equation for θ is

    x̄ = μ,                                                           (7.2.1)

where x̄ = Σ x n_x / N and μ is defined by (7.1.3). Denoting this estimate by θ̂, its asymptotic variance is given by

    Var(θ̂) = θ²/(N μ₂),                                              (7.2.2)

where μ₂ is the variance of (7.1.2). (7.2.1) suggests that if a table is made available giving the mean μ for sufficiently close values of θ, we can have a ready solution of (7.2.1). Here we present such a numerical Table VII for the argument θ = .01(.01).99. This table can also be used to compute Var(θ̂), because

    μ₂ = μ[1/(1 - θ) - μ].                                            (7.2.3)

7.2.2 The expression (7.1.4) for m₂ gives

    θ = 1 - m₁/m₂.                                                    (7.2.4)

The two-moments estimate for θ can then be written down as

    t = 1 - S₁/S₂,                                                    (7.2.5)

where S_i = Σ x^i n_x, i = 1, 2.
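Since μ(θ) is strictly increasing in θ, the likelihood equation (7.2.1) can also be solved numerically rather than by inverse interpolation in Table VII. A minimal sketch, assuming only the mean (7.1.3); the function names are illustrative, and x̄ = 1.5508 and N = 1534 are the Williams values used later in this chapter:

```python
import math

def mean_logseries(theta):
    # mu = alpha*theta/(1 - theta) with alpha = -1/log(1 - theta), eq. (7.1.3)
    return -theta / ((1.0 - theta) * math.log(1.0 - theta))

def ml_estimate(xbar, tol=1e-10):
    # mu(theta) increases from 1 (theta -> 0) to infinity (theta -> 1),
    # so the root of xbar = mu(theta) can be bracketed and bisected.
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_logseries(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_hat = ml_estimate(1.5508)
mu2_hat = mean_logseries(theta_hat) * (1.0 / (1.0 - theta_hat)
                                       - mean_logseries(theta_hat))   # (7.2.3)
var_hat = theta_hat ** 2 / (1534 * mu2_hat)                           # (7.2.2)
```

This reproduces θ̂ ≈ 0.5602 and Var(θ̂) ≈ 0.000182 of the worked example in Section 7.2.6.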

The asymptotic variance of t is given by

    Var(t) = [a₁₁ - 2(1 - θ) a₁₂ + (1 - θ)² a₂₂] / (N m₂²),           (7.2.6)

where

    a₁₁ = m₂ - μ²,   a₁₂ = m₃ - μ m₂,   a₂₂ = m₄ - m₂²,               (7.2.7)

m_i denoting the i-th moment of (7.1.2) about the origin. The efficiency of t is then obtained as

    Eff(t) = Var(θ̂)/Var(t).

The following table gives the efficiency of t for θ = .10, .50, .90.

TABLE 7.2.1

Efficiency of t (TM)

    θ                  .10     .50     .90
    Var(θ̂)/Var(t)     .228    .449    .488

7.2.3 Following the general approach discussed in Section 3.1, a ratio estimate can be obtained for θ of (7.1.2). In this case, since a_{x-1}/a_x = x/(x - 1) = 1 + 1/(x - 1), the ratio estimate for θ can be written down as

    θ' = 1 - n₁/N + (1/N) Σ_{x=2}^∞ n_x/(x - 1).                      (7.2.8)

(7.2.8) provides the unique unbiased estimate of θ linear in the frequencies. The exact variance of this estimate is

    σ²(θ') = (1/N) [ Σ_{x=2}^∞ {x/(x - 1)}² p(x, θ) - θ² ],           (7.2.9)

and an unbiased estimate of σ²(θ') is

    [ Σ_{x=2}^∞ {x/(x - 1)}² n_x - N θ'² ] / N(N - 1).                (7.2.10)

The efficiency of θ' is then obtained as

    Eff(θ') = Var(θ̂)/σ²(θ').

The following table gives the efficiency of θ' for θ = .10, .50, .90.

Efficiency of θ' (R)

    θ                   .10     .50     .90
    Var(θ̂)/σ²(θ')      .895    .447    .057

7.2.4 One more estimate for θ can be obtained when one notices that the expression (7.1.3) for the mean μ can be written as

    θ = 1 - p₁/μ,                                                     (7.2.11)

because

    p₁ = p(1, θ) = αθ.                                                (7.2.12)

The identity (7.2.11) suggests the estimate

    θ'' = 1 - n₁/S₁,                                                  (7.2.13)

with

    Var(θ'') = [a₁₁ - 2(1 - θ) a₁₂ + (1 - θ)² a₂₂] / (N μ²),          (7.2.14)

where

    a₁₁ = p₁(1 - p₁),   a₁₂ = p₁(1 - μ),   a₂₂ = m₂ - μ².             (7.2.15)

The asymptotic efficiency of θ'' is then obtained as

    Eff(θ'') = Var(θ̂)/Var(θ'').

The following table gives the efficiency of θ'' for θ = .10, .50, .90.

Efficiency of θ''

    θ                    .10     .50     .90
    Var(θ̂)/Var(θ'')     .983    .897    .739

7.2.5 So far we have separately discussed the simple estimates denoted by t, θ' and θ''. To make a comparative study of these estimates, let us investigate their amounts of bias and their relative efficiency. It can easily be deduced that the amounts of bias, to order 1/N, of the estimates t and θ'' are

    b(t) = [(1 - θ) a₂₂ - a₁₂] / (N m₂²),                             (7.2.16)

where a₂₂ and a₁₂ are defined by (7.2.7),

and

    b(θ'') = [(1 - θ) a₂₂ - a₁₂] / (N μ²),

where a₂₂ and a₁₂ are defined by (7.2.15). As regards θ', it is known that it is unbiased. The following table gives N b(t), N b(θ''), and the efficiency of these estimates relative to the ratio estimate θ', for θ = .10, .50, .90.

TABLE 7.2.2

Comparison of the Estimates

           N x (amount of bias to order 1/N)
    θ           t         θ''       Var(θ')/Var(t)   Var(θ')/Var(θ'')
   .10       0.9128     0.0948          0.255             1.098
   .50       1.7329     0.3466          1.003             2.006
   .90       0.9279     0.2303          8.564            12.973

Thus it is easy to see that the estimate θ'' may be used with advantage to estimate the parameter θ of the logarithmic series because of its simplicity, small bias and high efficiency.

7.2.6 The detailed computation procedure for evaluating the four types of estimates discussed above is illustrated with reference to the logarithmic series data due to Williams. The following table gives the distribution of 1534 biologists according to the number of research papers to their credit in the Review of Applied Entomology, Vol. 24, 1936.

No. of papers per author (x):     1     2    3   4   5  6  7  8  9  10  11
No. of authors (n_x):          1062   263  120  50  22  7  6  2  0   1   1

Maximum Likelihood Estimate: For these data we get

    N = 1534,   S₁ = Σ x n_x = 2379,

so that x̄ = 1.5508. Referring to Table VII we find the following:

    θ        μ
   .56    1.5503
   .57    1.5706

The maximum likelihood estimate is given by that value of θ for which μ = 1.5508. By linear interpolation, we get θ̂ = 0.5602. To compute the variance of θ̂, taking 0.5602 as the estimate of θ, we require

    α = -1/log_e(1 - θ̂) = 1.21738,
    μ = 1.5507,
    m₂ = μ/(1 - θ̂) = 3.5259,
    μ₂ = m₂ - μ² = 1.1212.

Then the variance of θ̂ is estimated from

    Var(θ̂) = θ̂²/(N μ₂) = 0.000182,

so that S.E.(θ̂) = 0.0135.
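The maximum likelihood figures above can be reproduced mechanically from the frequency table, with Table VII represented only by the two entries quoted (a sketch; variable names are ours):

```python
# Williams data: number of authors with x papers each, x = 1..11.
n_x = {1: 1062, 2: 263, 3: 120, 4: 50, 5: 22, 6: 7, 7: 6, 8: 2, 9: 0, 10: 1, 11: 1}
N = sum(n_x.values())
S1 = sum(x * n for x, n in n_x.items())
xbar = S1 / N

# Linear interpolation between the Table VII entries .56 -> 1.5503, .57 -> 1.5706.
theta_hat = 0.56 + 0.01 * (xbar - 1.5503) / (1.5706 - 1.5503)
```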

Two-Moments Estimate: To compute this estimate of θ we require in addition the value of

    S₂ = Σ x² n_x = 5439.

Then the estimate is

    t = 1 - S₁/S₂ = 0.5626.

To compute the variance of t, taking 0.5626 as the estimate of θ, we have, with the usual symbols,

    α = 1.2093,   μ = 1.5555,   m₂ = 3.5562,   μ₂ = 1.1366.

Also

    m₃ = μ(1 + θ)/(1 - θ)² = 12.7045   and   m₄ = 79.0059,

so that

    a₁₁ = μ₂ = 1.1366,
    a₁₂ = m₃ - μ m₂ = 7.1728,
    a₂₂ = m₄ - m₂² = 66.3593.

Then the variance of t is estimated from

    Var(t) = [a₁₁ - 2(1 - θ) a₁₂ + (1 - θ)² a₂₂] / (N m₂²) = 0.000390,

so that S.E.(t) = 0.0197.
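The intermediate moments quoted in this computation follow from (7.1.3) and (7.1.4), together with a closed form for the third raw moment, m₃ = μ(1 + θ)/(1 - θ)², which reproduces the quoted 12.7045. A quick check (the closed form is our reconstruction; small discrepancies in the last decimal reflect the rounding used above):

```python
import math

theta = 0.5626                                  # the two-moments estimate t
alpha = -1.0 / math.log(1.0 - theta)            # normalizing constant
mu = alpha * theta / (1.0 - theta)              # mean, (7.1.3)
m2 = mu / (1.0 - theta)                         # second raw moment, (7.1.4)
mu2 = m2 - mu ** 2                              # variance
m3 = mu * (1.0 + theta) / (1.0 - theta) ** 2    # third raw moment (reconstructed)
```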

Ratio Estimate: The ratio estimate for θ is given by

    θ' = 1 - n₁/N + (1/N) Σ_{x=2}^∞ n_x/(x - 1)
       = 0.3077 + 0.2269 = 0.5346.

The variance of θ' is estimated from the formula

    Var(θ') = (1/N) [ θ(1 - θ) + αθ Σ_{x=1}^∞ θ^x/x² ].

Approximating the series by its first twenty terms, Σ_{x=1}^{20} θ^x/x² = 0.6310, we get

    Var(θ') = 0.000450,   so that S.E.(θ') = 0.0212.

Alternative Estimate: The estimate based on n₁ and S₁ is given by

    θ'' = 1 - n₁/S₁ = 1 - 1062/2379 = 0.5536.

To compute the variance of θ'', using the usual symbols, we have

    α = 1.23986,   p₁ = αθ = 0.6864,   μ = 1.5376,   m₂ = 3.4444,   μ₂ = 1.0802,

so that

    a₁₁ = p₁(1 - p₁) = 0.2153,
    a₁₂ = p₁(1 - μ) = -0.3690,
    a₂₂ = m₂ - μ² = 1.0802.

Then the variance of θ'' is estimated from

    Var(θ'') = [a₁₁ - 2(1 - θ) a₁₂ + (1 - θ)² a₂₂] / (N μ²) = 0.000279,

so that S.E.(θ'') = 0.0167.

The following table summarizes the results obtained.

    Estimate     Value      Variance     Standard Error
    ML (θ̂)      0.5602     0.000182         0.0135
    TM (t)       0.5626     0.000390         0.0197
    R (θ')       0.5346     0.000450         0.0212
    θ''          0.5536     0.000279         0.0167

7.3 Estimation from a Sample for the Truncated Logarithmic Series Distribution

Following the general approach discussed in Chapters I, II, and III, the results for the estimation of the parameter θ can be written down on the basis of a sample drawn from a truncated logarithmic series distribution.
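The three closed-form estimates of Section 7.2.6 can be recomputed in a few lines from the Williams frequencies (a sketch; variable names are ours):

```python
# Williams data: number of authors with x papers each, x = 1..11.
n_x = {1: 1062, 2: 263, 3: 120, 4: 50, 5: 22, 6: 7, 7: 6, 8: 2, 9: 0, 10: 1, 11: 1}
N = sum(n_x.values())
S1 = sum(x * n for x, n in n_x.items())
S2 = sum(x * x * n for x, n in n_x.items())

t = 1.0 - S1 / S2                                     # two-moments, (7.2.5)
theta_r = (1.0 - n_x[1] / N                           # ratio estimate, (7.2.8)
           + sum(n / (x - 1) for x, n in n_x.items() if x >= 2) / N)
theta_dd = 1.0 - n_x[1] / S1                          # alternative, (7.2.13)
```

Each value agrees with the corresponding entry of the summary table.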

TABLES I - VII
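The entries of Tables I-III can be regenerated directly: each entry appears to be the mean of the correspondingly truncated binomial distribution divided by n, printed to six decimals with the decimal point suppressed. A sketch under that assumption (`mean_fraction` is our name), spot-checked against a few printed values:

```python
from math import comb

def mean_fraction(n, p, lo, hi):
    """Mean/n of a binomial(n, p) distribution restricted to lo <= x <= hi."""
    xs = range(lo, hi + 1)
    probs = [comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in xs]
    mean = sum(x * q for x, q in zip(xs, probs)) / sum(probs)
    return mean / n

# Table I  (truncated on the left at c=1):  lo = 1, hi = n
# Table II (truncated on the left at c=2):  lo = 2, hi = n
# Table III (doubly truncated at c=1, d=n): lo = 1, hi = n - 1
```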

-147TABLE I - OF SINGLY TRUNCATED BINOMIAL DISTRIBUTION ON THE LEFT AT c=l (Zero- Observations Truncated) n = 3(1)15; i =.00(.01).99 it 00 01 02 03 04 05 06 07 08 09 n = 3 00 000000 336689 340098 343536 347029 350570 354158 357795 361481 365217 10 369004 372842 376733 380677 384675 388727 392835 396999 401220 405499 20 409836 4i4233 418690 423209 427789 432432 437139 441911 446748 451651 30 456621 461659 466766 471943 477190 482509 487900 493364 498902 504515 40 510204 515969 521812 527732 533732 539811 545971 552212 558534 564940 50 571428 578001 584658 591401 598229 605144 612145 619233 626409 633673 60 641026 648466 655996 663613 671321 679117 687002 694975 703037 711187 70 719424 727749 736160 744657 753239 761905 770653 779484 788395 797384 80 806451 815594 824810 834098 843455 852878 862366 871915 881523 891186 90 900901 910664 920471 930319 940203 950119 960061 970026 980008 990001 n= 4 o00 000000 253781 257626 261841 265510 269551 273659 277835 282080 286396 10 290782 295241 299772 304377 309056 313812 318643 323553 328540 333607 20 338753 343981 349290 354681 360156 365714 371357 377086 382900 388801 30 394789 400864 407028 413281 419622 426053 432574 439185 445885 452677 40 459559 466531 473594 480748 487991 495325 502749 510262 517864 525555 50 533333 541199 549151 557189 565311 573518 581807 590177 598627 607157 60 615763 624446 633203 642033 650933 659903 668939 678041 687206 696432 70 705716 715057 724453 733900 743397 752941 762530 772161 781831 791539 80 801282 811057 820862 830694 840551 850430 860330 870248 880182 890130 90 900090 910060 920037 930022 940012 950006 960002 970001 980000 990000 n = 5 00 000000 204040 208162 212365 216652 221025 225483 230027 234660 239382 10 244194 249098 254098 259181 264364 269641 275014 280484 286050 291715 20 297477 303339 309300 315362 321523 327785 334148 340611 347176 353841 30 360607 367474 374441 381508 388675 395941 403304 410766 418324 425978 40 433727 441569 449503 457529 465644 473848 482138 490513 
498971 507510 50 516129 524825 533596 542440 551356 560340 569390 578504 587680 596916 60 606207 615554 624952 634399 643893 653432 663012 672632 682289 691981 70 701705 711459 721241 731049 740880 750733 760605 770496 780402 790323 80 800256 810200 820155 830118 840088 850064 860046 870032 880022 890014 90go 900009 910005 920003 930001 940001 950000 960000 970000 980000 990000

TABLE I (CONT'D) it 00 01 02 03 04 05 06 07 08 09 n= 6 00 000000 170882 175196 179611' 184126 188745 193467 198295 203229 208270 10 213420 218680 224049 229530 235123 240828 246646 252567 258623 264782 20 271056 277443 283944 290559 297287 304128 311082 318146 325322 332607 30 340001 347502 355109 372820 370634 378550 386564 394676 402884 411185 40 419576 428056 436622 445271 454002 462811 471696 480653 489681 498777 50 507936 517158 526439 535775 545165 554605 564093 573626 583201 592816 60 602468 612154 621872 631620 641396 651197 661021 670866 680731 690613 70 700511 710422 720327 730283 740229 750183 760145 770114 780088 790068 80 800051 810038 820028 830020 840014 850010 860006 870004 880002 898001 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n= 7 00 000000 147200 151659 156236 160932 165748 170686 175747 180933 186243 10 191680 197243 202934 208753 214700 220776 226979 233312 239772 246359 20 253073 259914 266879 273968 281179 288512 295963 393532 311217 319015 30 326924 334941 343064 351291 359618 368042 376561 385172 393871 402654 40 411520 420464 429483 438574 447723 456957 466243 475587 484986 494437 50 503937 513482 523071 532699 542364 552063 561794 571553 581340 591151 60 600985 610838 620710 630599 640502 650418 660347 670286 680234 690190 70 700153 710122 720097 730076 740059 750046 760035 770026 780019 790014 80 800010 810007 820005 830003 840002 850001 860001 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n = 8 00 000000 129441 134015 138724 143570 148553 153676 158940 164345 169892 10 175582 181416 187393 193513 199771 206183 212731 219420 226248 233215 20 240319 247557 254928 262429 270058 277813 285689 293684 301796 310020 30 318352 326790 335330 343967'352699 361520 370427 379415 388482 397623 40 406833 416110 425448 434845 444297 453800 463350 472944 482580 492253 50 501961 511700 521469 531265 541085 550926 560788 570667 580562 590471 60 600393 610327 620270 630221 
640180 650146 660118 670094 680075 690059 70 700046 710035 720027 730020 740015 750011 760008 770006 780004 790003 80 800002 810001 820001 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

TABLE I (CONT'D):' 00 01 02 03 04 05 06 07 08 09 n=9 00 000000 115630 120299 125120 130119 135226 140513 145958 151561 157323 10 163244 169324 175561 181957 188508 195215 202075 209087 216247 223554 20 231005 238596 246324 254186 262178 270295 278534 286890 295358 303945 30 312615 321394 330267 339229 348276 357402 366604 375876 385215 394615 40 404072 413583 423143 432748 442396 452082 461803 471556 481338 491146 50 500978 510832 520704 530594 540498 550416 560346 570287 580236 590193 60 600157 610127 620102 630082 640065 650051 660040 670031 680024 690018 70 700014 710010 720007 730005 740004 750003 760002 770001 780001 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n= 10 00 000000 104583 109333 114253 119343 124607 130043 135654 141440 147400 10 153534 159841 166320 172970 179787 186770 193916 201222 208683 216297 20 224058 231963 240006 248184 256489 264981 273465 282125 290891 299758 30 308721 317773 326911 336127 345417 354776 364199 373680 383216 392802 40 402433 412106 421817 431562 441339 451143 460972 470823 480695 490584 50 500489 510407 520338 530279 540229 550187 560152 570123 580099 590079 60 600063 610050 620039 630030 640023 650018 660013 670010 680008 690006 70 700004 710003 720002 730001 740001 750001 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n= 11 00 000000 095546 100367 105375 110570 115955 121531 127297 133253 139398 10 145732 152207 158957 165843 172908 180146 187555 195129 202864 210754 20 218794 226978 235299 233751 252329 261024 269832 278745 287757 296861 30 306052 315322 324667 334080 343556 353090 362676 372310 381988 391704 40 401456 411240 421052 430889 440749 450628 460524 470436 480361 490298 50 500244 510199 520162 530131 540105 550084 560067 570053 580041 590013 60 600025 610019 620015 630011 
640008 650006 660004 670003 680002 690002 70 700001 710001 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-150TABLE I (CONT'D) 00 01 02 03 04 05 06 07 08 09 n= 12 00 000000 088016 092901 097989 103282 108781 114486 120398 126515 132837 10 139359 146080 152997 160105 167399 174874 182525 190346 198330 206469 20 214758 223159 231736 240445 249256 258178 267205 276328 285542 294838 30 304211 313653 323159 332723 342339 352002 361708 371452 381230 391038 40 400873 410731 420609 430506 440419 450345 460283 470231 480188 490152 50 500122 510098 520078 530061 540048 550038 560029 570023 580017 590005 60 600010 610007 620005 630004 640003 650002 660001 670001 680001 690000 70 700000 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000'880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n= 13 00 000000 081647 086588 091751 097135 102742 108571 114622 120892 127380 10 134082 140993 148110 155486 162935 170630 178505 186550 194943 203124 20 211635 220283 229061 237959 246970 256084 265293 274591 283968 293419 30 302935 312511 322141 331819 341540 351299 361091 370913 380762 390632 40 400523 410431 420353 430289 440234 450190 460153 470122 480097 490077 50 500061 510048 520037 530029 540022 550017 560013 570010 580007 590002 60 600004 610003 620002 630001 640001 650001 660000 670000 680000 690000 70 700000 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n = 14 00 000000 076188 081183 086415 091885 097594 103542 109725 116143 122790 10 129663 136755 144060 151572 159282 167181 175261 183513 191927 200493 20 209201 218041 227004 236080 245260 254535 263896 273336 282846 292419 30 302048 311728 321453 331217 341015.350843 360698 370575 380472 390385 40 400314 410254 420205 430164 440131 450104 460082 470065 480051 490039 50 500030 510023 520018 530013 540010 550008 560006 570004 580003 590001 60 600001 610001 620001 630000 
640000 650000 660000 670000 680000 690000 70 700000 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-151TABLE I (CONT'D) 00 01 02 03 04 05 06 07 08 09 = 15 00 000000 071458 076502 081800 087353 093160 099221 105533 112091 118892 10 125927 133191 140676 — 148371 156269 164357 172627 181066 189665 198411 20 207291 216302 225425 234654 243977 253386 262872 272427 282043 291713 30 301431 311191 320986 330814 340669 350548 360446 370362 380292 390235 40 400188 410150 420119 430094 440073 450057 460044 470034 480026 490020 50 500015 510011 520008 530006 540004 550003 560002 570002 580001 590001 60 600001 610000 620000 630000 640000 650000 660000 670000 680000 690000 70 700000 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000oooo 90 900000gooooo 920000 930000 940000 950000 960000 970000 980000 990000

-152TABLE II -OF SINGLY TRUNCATED BINOMIAL DISTRIBUTION ON THE LEFT AT c=2 n n = 3(1)15; v =.00(.01).99 itr 00 01 02 03 04 05 06 07 08 09 n = 3 00 000000 667771 668916 670066 671232 672413 673611 674825 676056 677305 10 678571 679856 681159 682482 683823 685185 686567 687970 689394 690840 20 692308 693798 695312 696850 698413 700000 701613 703252 704918 706612 30 708333 710084 711864 713675 715517 717391 719298 721239 723214 725225 40 727273 729358 731481 733643 735849 738095 740385 742718 745098 747525 50 750000 752525 755102 757732 760417 763158 765957 768817 771739 774725 60 777778 780899 784091 787356 790698 794118 797619 801205 804878 808642 70 812500 816456 820513 824675 828947 833333 837838 842466 847222 852113 80 857143 862319 867647 873134 878788 884615 890625 896825 903226 909836 90 916667 923729 931034 938596 946428 954545 962963 971698 980769 990196 n= 4 00 000000 501674 503388 505127 506895 508693 510523 512384 514277 516204 10 518164 520160 522191 524258 526363 528505 530687 522909 535171 537476 20 539823 542214 544650 547132 549663 552239 554866 557543 560272 563055 30 565891 568784 671734 574742 577810 580939 584131 587388 590711 594101 40 597561 601092 604695 608373 612128 615960 619873 623867 627946 632111 50 636364 640707 645142 649671 654298 659023 663849 668771 673814 678957 60 684210 689576 695057 700656 706373 712213 718176 724266 730485 736834 70 743314 749932 756684 763575 770606 777778 785092 792549 800151 807898 80 815789 823826 832006 840331 848797 857404 866149 875028 884040 893178 90 902439 911816 921302 930891 940573 950338 960177 970076 980023 990003 n = 5 00 000000 402014 404080 406184 408330 410520 412756 415037 417366 419743 10 422170 424648 427178 429761 432399 435093 437845 440655 443525 446457 20 449452 452511 455637 458830 462092 465426 468831 472311 475867 479500 30 483212 487006 490883 494844 498891 503027 507253 511571 515983 520491 40 525096 529801 534608 539517 544531 549653 554883 560223 565675 571242 50 576923 582721 
588638 594674 600832 607112 613515 620043 626695 633473 60 640378 647410 654569 661856 669269 676809 684475 692266 700182 708220 30 716380 724660 733057 741569 750194 758928 767769 776713 785756 794894 60' 804124 813439 822836 832310 841855 851466 861138 870866 880643 890464 9'9 900324 910217 920138 930083 940046 950022 960009 970003 980000 990000

-153TABLE II (CONT'D) 00 01 02 03 04 05 o06 07 08 09 n=6 00 000000 335577 337882 340239 342652 345121 347650 350239 352891 355606 10 358386 361233 364149 367135 370194 373326 376534 379819 383183 386629 20 390158 393772 397472 401262 405142 409115 413183 417347 421609 425973 30 430438 435009 439685 444470 449364 454370 459490 464725 470076 475546 40 481135 486846 492679 498635 504716 510922 517255 523715 530302 537017 50 543860 550830 557929 565154 572507 579985 587588 595314 603163 611132 60 619219 627423 635740 644169 652706 661348 670093 678936 687874 696903 70 706019 715218 724496 733849 743271 752759 762308 771914 781572 791277 80 801025 810813 820636 830489 840370 850274 860199 870140 880096 890064 90 900040 910024 920014 930007 940003 950001 960000 970000 980000 990000 n= 7 00 000000 288125 290605 293151 295765 298451 301209 304042 306952 309941 10 313010 316163 319401 322727 326142 329648 333241 336946 340741 344636 20 348635 352738 356948 361268 365699 370244 374904, 379682 384579 389598 30 394739 400005 405398 410918 416568 422348 428251 434303 440479 446790 40 453324 459813 466526 473374 480355 487469 494716 502094 509602 517237 50 525000 532887 540896 549024 557270 565629 574099 582677 591158 600140 60 609017 617987 627044 636186 645406 654702 664069 673501 682995 692547 70 702151 711805 721503 731241 741017 750825 760663 770527 780414 790321 80 800246 810185 820127 830100 840071 850049 860033 870022 880014 89xXoo8 90 900005 910002 920001 930000 940000 950000 960000 970000 980000 990000 n = 8 00 000000 252536 255152 257848 260624 263485 266433 269470 272599 275822 10 279142 282561 286082 289708 293440 297282 301237 305306 309492 313717 20 318225 322776 327455 332262 337199 342270 347474 352815 358294 363912 30 369670 375569 381611 387795 394122 400593 407207 413964 420863 427904 40 435085 442405 449862 457454 465180 473036 481020 489128 497359 505707 50 514170 522743 531423 530205 549085 558058 567121 576267 585493 594793 60 604164 613600 623097 632651 
642256 651909 661606 671342 681114 690918 70 700751 710609 720490 730390 704308 750240 760185 770141 780106 790078 80 800057 810041 820029 830020 840013 850008 860005 870003 880002 890001 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-154TABLE II (CONT'D) n1 00 01 02 03 o4 05 06 07 08 o09 00 000000 224857 227584 230401 233314 236324 239435 242650 245972 249403 10 252947 256607 260385 264285 268309 272461 276743 281158 285708 290396 20 295225 300196 305311 310574 315984 321544 327255 333118 339133 345303 30 351625 358101 364731 371512 378446 385529 392761 400140 407663 415327 40 423131 431070 439141 447340 455664 464108 472668 481338 490115 498994 50 507968 517034 526186 535418 544727 554106 563552 573058 582620 592234 60 601895 611598 621340 631117 640926 650762 660623 670505 680407 690325 70 700257 710202 720157 730120 740091 750068 760051 770037 780027 790019 80 800013 810009 820006 830004 840002 850001 860001 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n= 10 00 000000 202715 205533 208455 211485 214626 217882 221256 224753 228374 10 232125 236007 240025 244182 248480 252924 257515 262257 267152 272202 20 277410 282777 288305 293996 299850 305868 312051 318399 324910 331586 30 333823 345422 352579 359894 367362 374982 382750 390661 398712 406899 40 415217 423661 432226 440907 449698 458593 467588 476675 485851 495108 50 504442 513847 533317 532848 542433 552069 561751 571474 581234 591028 60 600851 610700 620572 630464 640374 650300 660238 670187 680146 690113 70 700087 710066 720049 730036 740027 750019 760014 770009 780006 790004 80 800003 810002 820001 830001 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n = 11 00 000000 184599 187496 190509 193642 196901 200289 203810 207468 211267 10 215211 219304 223549 227949 232509 237230 242117 247171 252394 257790 20 263359 269103 275022 281117 287389 293836 300458 307253 314221 321358 30 328661 336128 343756 351539 359474 367555 375779 384138 392628 401243 40 409976 418822 427773 436825 445970 455202 464514 573902 483359 492878 50 502456 512085 521763 531482 541240 551032 560854 570703 580575 590468 60 600378 610303 620241 630191 
640150 650116 660090 670069 680052 690039 70 700029 710021 720015 730011 740008 750005 760003 770002 780001 790001 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 50 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-155TABLE II (CONT'D) 7r 00 01 02 03 04 05 06 07 08 09 n = 12 00 000000 169504 172469 175562 178790 182156 185666 189324 193134 197102 10 201230 205523 209985 214620 219430 224419 229590 234944 240483 246209 20 252123 258225 264515 270991 277654 284501 291529 298737 306120 313674 30 321395 329277 337317 345507 353841 362314 370919 379648 388494 397452 40 406513 415670 424917 434247 443653 453129 462668 472265 481914 491610 50 501347 511121 520929 530765 540627 550510 560413 570332 580265 590211 60 600166 610130 620101 630078 640059 650045 660003 670025 680018 690013 70 700009 710007 720005 730003 740002 750001 760001 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 n = 13 o0 000000 156733 159757 162924 166238 169704 173329 177116 181072 185200 io0 189505 193992 198664 203525 208578 213825 219270 224913 230756 236799 20 243042 249484 256124 262959 269987 277204 284607 292189 299947 307185 30 315965 324214 332608 341145 349818 358616 367533 376561 385693 394919 40 404233 413627 423095 432629 442223 451871 461567 471306 481083 490894 50 500734 510599 520486 530392 540314 550250 560198 570156 580122 590094 60 600072 610055 620042 630031 640023 650017 660012 670009 680006 690004 70 700003 710002 720001 730001 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000oooo 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000 = 14 00 000000 145786 148865 152098 155493 159054 162787 166698 170793 175077 10 179554 184228 189104 194185 199473 204972 210682 216604 222738 229085 20 235641 242406 249375 256545 263912 271471 279214 287136 295230 303487 30 311901 320462 329162 337993 346946 356011 365181 374447 383800 393233 40 402739 412309 421938 431619 441345 451112 460915 470749 480610 490493 50 500397 510317 520252 530199 540156 550122 560094 570072 580055 590042 60 600031 610023 
620017 630012 640009 650006 660004 670003 680002 690001 70 700001 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 8900oo 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-156TABLE II (CONT'D) 00 01 02 03 04 05 06 07 08 09 n = 15 00 000000 136300 139428 142724 146194 149844 153682 157713 161943 166377 10 171021 175878 180953 186248 181766 197508 203474 209666 216081 222717 20 229571 236640 243918 251400 259080 266950 275002 283228 291618 300165 30 308858 317688 326644 335718 344900 354181 363551 373002 382527 292117 40 401764 411464 421208 430992 440810 450658 460532 470427 480341 490271 50 500214 510167 520130 530101 540077 550059 560045 570033 580025 590018 60 600013 610010 620007 630005 640003 650002 660002 670001 680001 690000 70 700000 710000 720000 730000 740000 750000 760000 770000 780000 790000 80 800000 810000 820000 830000 840000 850000 860000 870000 880000 890000 90 900000 910000 920000 930000 940000 950000 960000 970000 980000 990000

-157TABLE III L* OF DOUBLE TRUNCATED BINOMIAL DISTRIBUTION AT c=l AND d=n n n = 3(1)15; n =.oo(.01).99 n 00 01 02 03 04 05 06 07 08 09 n= 3 00 000000 336667 340000 343333 346667 350000 353333 356667 360000 363333 10 366667 370000 373333 376667 380000 383333 386667 390000 393333 396667 20 400000 403333 406667 410000 413333 416667 420000 423333 426667 430000 30 433333 436667 440000 443333 446667 450000 453333 456667 460000 463333 40 466667 470000 473333 476667 480000 483333 486667 490000 493333 496667 50 500000 503333 506667 510000 513333 516667 520000 523333 526667 530000 60 533333 536667 540000 543333 546667 550000 553333 556667 560000 563333 70 566667 570000 573333 576667 580000 583333 586667 590000 593333 596667 80 600000 603333 606667 610000 613333 616667 620000 623333 626667 630000 90o 633333 636667 640000 643333 646667 650000 653333 656667 660000 663333 n=4 00 000000 253781 247625 261530 265498 269526 273616 277766 281977 286247 10 290576 294963 299409 303911 308470 313084 317753 322476 327251 332078 20 336956 341884 346861 351884 356954 362061 367227 372428 377669 382950 30 388268 393623 399013 404435 409890 415374 420886 426425 431988 437574 40 443182 448808 454452 460112 465785 471469 477164 482866 488574 494286 50 500000 505714 511426 517134 522836 528531 534215 539888 545548 551192 60 556818 562426 568012 573575 579114 584626 590110 595565 600987 606377 70 611732 617050 622331 627572 632773 637931 643046 648116 653139 658116 80 663043 667922 672749 677524 682247 686916 691530 696089 700591 705036 90 709424 713753 718023 722234 726384 730474 734502 738470 742375 746219 n ="5 00 000000 204040 208162 212365 216652 221024 225480 230023 234653 239370 10 244176 249070 254054 259127 264289 269542 274884 280315 285836 291445 20 297143 302928 308800 314757 320798 326923 333129 339415 345780 352220 30 358734 365320 371975 378697 385483 392330 399235 406194 413206 420265 40 427368 434513 441694 448909 456153 463422 470712 478019 485339 492667 50 500000 507333 514661 
521981 529288 536578 543847 551091 558306 565487 60 572632 579735 586794 593805 600765 607670 614517 621303 628025 634680 70 641266 647780 654220 660585 666871 673077 679202 685243 691200 697072 80 702857 708555 714164 719685 725116 730458 735771 740873 745946 750930 90 755824 760630 765347 769977 774520 778976 783348 787635 791838 795960

-158TABLE III (CONT'D) 00 01 02 03 04 05 06 07 08 09 n = 6 00 000000 170882 175196 179611 184126 188745 193467 198294 203228 208269 10 213419 218677 224045 229524 235113 240814 246627 252551 258587 264734 20 270992 277361 283839 290426 297121 303922 310827 317835 324944 332151 30 339455 346852 354339 361914 369573 377312 385127'393015 400972 408992 40 417071 425205 433388 441616 449883 458184 466513 474864 483233 491614 50 500000 508386 516766 525135 533487 541816 550117 558384 566612 574795 60 582929 591008 599028 606985 614873 622688 630427 638086 645661 653148 70 660545 675056 667849 682165 689173 696078 702879 709574 716161 722639 80 729008 735266 741413 747449 753373 759186 764886 770476 775955 781323' 90 786581 791731 796772 801706 806533 811255 815874 820389 824804 829118 n= 7 00 000000 147200 151659 156236 160932 165748 170686 175747 180933 186243 10 191680 197243 202934 208752 214699 220774 226976 233307 239765 246350 20 253061 259897 266857 273938 281141 288461 295899 303450 311114 318885 30 326763 334743 342822 350996 359262 367614 376050 384564 393151 401807 40 410526 419304 428135 437013 445933 454888 463874 472884 481913 490953 50 500000 509046 518087 527115 536125 545111 554067 562987 571865 580696 60 589474 598193 606849 615436 623950 632386 640738 649004 657178 665257 70 673237 681114 688886 696549 704101 711538 718859 726062 733143 740103 80 746939 753650 760235 766693 773023 779226 785301 791248 797066 802757 go 808320 813757 819067 824253 829314 834252 839068 843764 848341 852800 00 000000 129441 134015 138724 143570 148553 153676 158940 164345 169892 10 175582 181416 187393 193513 199777 206183 212730 219419 226247 233214 20 240316 247537 254923 262423 270049 277800 285673 293663 301767 309983 30 318305 326730 335253 343871 352579 361371 370244 379192 388210 397294 40 406438 415636 424884 434177 443508 452872 462264 471679 481110 490552 50 500000 509448 518890 528321 537736 547128 556492 565823 575116 584364 60 593562 602706 611790 620808 
629756 638629 647421 656129 664746 673270 70 681695 690017 698233 706337 714327 722200 729951 737577 745077 752446 80 759683 766786 773753 780581 787269 793817 800223 806487 812607 818584 90 824418 830108 835655 841060 846324 851447 856430 861276 865985 870559

-159TABLE III (CONT'D) Or 00 01 02 03 04 05 06 07 08 09 n = 9 n=9 00 000000 115630 120299 125120 130096 135226 140513 145958 151561 157323 10 163244 169324 175561 181957 188508 195215 202075 209087 216247 223554 20 231004 238595 246323 254185 262176 270292 278513 286884 295350 203924 30 312601 321375 330243 339198 348235 357351 366539 375794 385112 394487 40 403914 413389 422340 432461 442049 451665 461304 470962 480633 490314 50 500000 509685 519366 529038 538696 548335 557951 567538 577093 586611 60 596086 605513 614888 624206 633461 642649 651765 660802 669757 678624 70 687399 696076 704650 713116 721470 729708 737824 745815 753677 761405 80 768995 776446 783753 790913 797925 804785 811492 818043 824439 830676 go 836756 842677 848439 854042 859486 864774 869904 874880 879701 884370 n = 10 00 000000 104583 109333 114253 119343 124607 130043 135654 141440 147400 10 153534 159841 166320 172970 179787 186770 193916 201222 208693 216297 20 224058 231963 240006 248183 256489 264918 273464 282183 290888 299755 30 308716 317768 326903 336117 345404 354758 364175 373650 383177 392752 40 402370 412027 421718 431439 441186 450955 460743 470544 480357 490177 50 500000 509823 519643 527901 539257 549045 558814 568561 578288 587973 60 597630 607248 616823 626350 635825 645242 654596 663883 673097 682232 70 691284 700225 709111 717877 726536 735082 743511 751817 759994 768037 80 775942 783703 791317 798778 806084 813230 820213 827030 833679 840159 go 846466 852600 858560 864346 869957 875393 880657 885747 890667 895417 n = 11 00 000000 095546 100367 105375 110570 115955 121531 127297 133253 139398 10 145732 152253 158957 165843 172908 180146 187555 195129 202864 210754 20 218794 226978 235299 243751 252329 261024 269832 278745 287756 296860 30 306050 315315 324664 334077 343551 353084 362668 372299 381973 391685 40 401431 411208 421010 430836 440681 450543 460419 470305 480199 490098 50 500000 509902 519801 529695 539581 549456 559318 569164 578990 588792 60 598569 608315 
618027 627701 637332 646916 656448 665923 675335 684679 70 693950 703140 712244 721255 730168 738976 747671 756249 764701 773022 80 781206 789245 797136 804871 812445 819854 827092 834156 841042 847747 90 854268 860602 866747 872703 878469 884045 889430 894625 899633 904454

TABLE III (CONT'D) ot 00 01 02 03 04 05 06 07 08 09 n= 12 00 000000 088016 092901 097989 103282 108781 114486 120398 126515 132837 10 139359 146080 152997 160105 167399 174874 182525 190346 198330 206469 20 214758 223189 231753 240445 249256 258178 267205 276328 285542 294838 30 304210 313652 323158 332721 342337 351994 361705 371448 - 381224 391030 40 400862 410717 420592 430484 440389 450307 460234 470169 480110 490054 50 500000 509946 519890 529395 539765 549693 559611 569516 579408 589283 60 599137 608970 618776 628552 638295 648000 657663 667279 676842 686348 70 695790 714458 705162 723672 732795 741822 750744 759555 768247 776811 80 785242 793531 801670 809654 817474 825125 832601 839895 847003 853919 90 860641 867163 873485 879602 885514 891219 896718 902011 907099 911984 n = 13 00 oooooo000000 086588 081647 091751 097135 102742 108571 114622 120892 127380 10 134082 140993 148110 155426 162935 170630 178505 186550 194760 203124 20 211635 220283 229061 237951 246970 256084 265293 274591 283968 293418 30 302935 312511 322141 331819 341539 351298 361090 370912 380759 390629 40 400519 410425 420346 430279 440221 450173 460130 470093 480060 490029 50 500000 509970 519940 529906 539869 549827 559778 569721 579654 589575 60 599481 609370 619240 629088 638910 648702 658461 668181 677859 687489 70 697065 706581 716032 725409 734707 743916 753030 762041 770939 779717 80 788365 796876 805240 793509 821495 829370 837065 844574 851890 859007 90 865918 872620 879108 885378 891429 897258 902865 908249 913412 918353 n = 14 00 000000 076188 081183 086415 091885 097594 103542 109725 116143 122790 10 129663 136755 144060 151572 159282 167181 175261 183513 191927 200493 20 209201 218041 227004 236080 245260 254535 263896 273336 282846 292419 30 302048 311728 321453 331216 341015 350843 360697 370574 380471 390384 40 400312 410252 420202 430160 440125 450097 460072 470051 480033 490016 50 500000 509984 519967 529949 539928 549903 559874 569840 579798 589748 60 599688 609616 
619529 629426 639303 649158 658985 668783 678547 688272 70 697951 707581 717154 726664 736103 745465 754740 763920 772996 781959 80 790799 799507 808073 816487 824738 832819 840718 848428 855940 863245 90 870337 877210 883857 890275 896458 902406 908115 913585 918817 923812

-161TABLE III (CONT'D) 00 01 02 03 04 05 06 07 08 09 n = 15 00 000000 071458 076502 081800 087353 093160 099221 105533 112091 118892 10 125927 133191 140676 148371 156269 164357 172627 181066 189665 198411 20 207293 216302 225425 234654 243977 253386 262872 272427 282043 291713 30 301431 311191 320986 330814 340669 350547 360446 370362 380292 390235 40 400187 410149 420117 430092 440071 450054 460040 470028 480018 490008 50 500000 509991 519982 529972 539960 549946 559929 569908 579882 589851 60 599812 609765 619708 629638 639554 649452 658317 669186 679013 688809 70 698569 708287 717957 727573 737128 746614 756023 765346 774575 783698 80 792706 801589 810335 818933 827373 835643 843731 851629 859324 866809 90 874073 881108 887909 894467 900779 906840 912647 918200 923498 928542

-162TABLE IV ~* OF SINGLY TRUNCATED POISSON DISTRIBUTION ON THE RIGHT AT d d = 4(1)10; A = o.o(o.1)4.9 0.0 0.1 0.2 0.3 0.4 0.5 o.6 0.7 0.8 0.9 d = 4 0.0 0.0000 0.1000 0.1200 0.2999 0.3997 0.4992 0.5982 0.6965 0.7939 0.8900 1.0 0.9846 1.0775 1.1685 1.2574 1.3439 1.4281 1.5097 1.5886 1.6649 1.7386 2.0 1.8095 1.8778 1.9435 2.0065 2.0671 2.1252 2.1809 2.2344 2.2856 2.3346 3.0 2.3817 2.4267 2.4699 2.5113 2.5510 2.5891 2.6255 2.6605 2.6941. 2.7263. 4.0 2.7573 2.7870 2.8156 2.8430 2.8694 2.8948 2.9192 2.9428 2.9654 2.9872 d = 5 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.4999 0.5998 0.6995 0.7990 0.8982 1.0 0.9969 1.0951 1.1925 1.2890 1.3845 1.4787 1.5716 1.6630 1.7527 1.8406 2.0 1.9266 2.0107 2.0926 2.1725 2.2502 2.3502 2.3989 2.4700 2.5388 2.6054 3.0 2.6698 2.7321 2.7923 2.8504 2.9065 2.9606 3.0128 3.0632 3.1118 3.1586 4.0 3.2037 3.2473 3.2892 3.3297 3.3688 3.4064 3.4427 3.4778 3.5116 3.5442 d=0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000.6 0.7000 0.7999 0.8997 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000 0.6000 0.7000 0.7999 0.8997 1.0 0.9995 0.0991 1.1985 1.2976 1.3964 1.4947 1.5925 1.6896 1.7859 1.8814 2.0 1.9758 2.0692 2.1613 2.2521 2.3415 2.4294 2.5157 2.6003 2.6832 2.7643 3.0 2.8435 2.9211 2.9964 3.0700 3.1416 3.2113 3.2791 3.3450 3.4090 3.4711 4.0 3.5314 3.5898 3.6465 3.7014 3.7547 3.8062 3.8562 3.9046 3.9515 3.9968 d= 7 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000 o.6000 0.7000 0.8000 0.9000 1.0 0.9999 1.0999 1.1997 1.2996 1.3993 1.4989 1.5983 1.6975 1.7964 1.8950 2.0 1.9931 2.0908 2.1879 2.2844 2.3801 2.4750 2.5691 2.6621 2.7540 2.8448 3.0 2.9344 3.0227 3.1096 3.1950 3.2790 3.3614 3.4422 3.5214 5.5990 3.6748 4.0 3.7490 3.8215 3.8922 3.9613 4.0286 4.0942 4.1582 4.2204 4.2811 4.3400

TABLE IV (CONT'D) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0o.g9 d= 8 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000 o.6ooo 0.7000 0.8000 0.9000 1.0 1.0000 1.1000 1.2000 1.2999 1.3999 1.4998 1.5997 1.6995 1.7992 1.8988 2.0 1.9983 2.0976 2.1967 2.2955 2.3940 2.4922 2.5900 2.6873 2.7841 2.8801 3.0 2.9756 3.0703 3.1642 3.2572 3.3493 3.4404 3.5304 3.6192 3.7068 3.7932 4.0 3.8783 3.9621 4.0444 4.1253 4.2048 4.2828 4.3592 4.4342 4.5076 4.5795 d=9 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000. 6000 0.7000 0.8000. 9000 1.0 1.0000 1.1000 1.2000 1.3000 1.4000 1.5000 1.5999 1.6999 1.7998 1.8997 2.0 1.9996 2.0994 2.1992 2.2989 2.3984 2.4978 2.5979 2.6962 2.7950 2.8936 3.0 2.9919 3.0898 3.1873 3.2844 3.3810 3.4770 3.5724 3.6671 3.7611 3.8543 4.0 3.9464 4.0381 4.1286 4.2181 4.3066 4.3940 4.4802 4.5652 4.6490 4.7485 d = 10 0.0 0.0000 0.1000 0.2000 0.3000 0.4000 0.5000. 6000 0.7000 0.8000. 9000 1.0 1. 0000 1. 1000 1.2000 1.3000 1.4000 1. 5000 1. 6000 1. 7000 1.8000 1.8999 2.0 1.9999 2.0999 2.1998 2.2997 2.3996 2.4995 2.5992 2.6990 2.7986 2.8981 3.0 2.9976 3.0968 3.1960 3.2949 3.3935 3.4920 3.5900 3.6879 3.7853 3.8822 4.0 3.9788 4.0748 4.1702 4.2651 4.3593 4.4528 4.5455 4.6375 4.7286 4.8188
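The entries of Table IV can be reproduced numerically: they agree with the conditional mean E(X | X ≤ d) of a Poisson variate X with parameter λ truncated on the right at d. A minimal modern check (Python; the function name is illustrative and not part of the original text):

```python
from math import exp, factorial

def mean_right_truncated_poisson(lam, d):
    """Mean of a Poisson(lam) variate conditioned on X <= d.

    The entries of Table IV agree with this conditional mean; for small
    lam the truncation is negligible and the entry is close to lam itself.
    """
    probs = [exp(-lam) * lam ** x / factorial(x) for x in range(d + 1)]
    return sum(x * p for x, p in enumerate(probs)) / sum(probs)
```

For example, at d = 4 the function gives approximately 0.9846 for λ = 1.0 and 0.4992 for λ = 0.5, agreeing with the tabled values to four decimals.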

-164TABLE V W* OF SINGLY TRUNCATED POISSON DISTRIBUTION ON THE LEFT AT c c = 1(1)10; i = 0.0(0.1)9.9 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 c=l 0.0 0.0000 1.0508 1.1033 1.1575 1.2133 1.2708 1.3298 1.3905 1.4528 1.5166 1.0 1.5820 1.6489 1.7172 1.7870 1.8582 1.9308 2.0048 2.0800 2.1565 2.2342 2.0 2.3130 2.3930 2.4741 2.5563 2.6394 2.7236 2.8086 2.8945 2.9813 3.0689 3.0 3.1572 3.2462 3.3360 3.4264 3.5174 3.6090 3.7011 3.7938 3.8870 3.9806 4.0 4.0746 4.1691 4.2639 4.3591 4.4547 4.5506 4.6467 4.7431 4.8398 4.9368 5.0 5.0339 5.1313 5.2288 5.3266 5.4245 5.5226 5.6208 5.7191 5.8176 5.9162 6.0 6.0149 6.1137 6.2126 6.3116 6.4107 6.5098 6.6090 6.7083 6.8076 6.9070 7.0 7.0064 7.1059 7.2054 7.3049 7.4045 7.5041 7.6038 7.7035 7.8032 7.9029 8.0 8.0027 8.1025 8.2023 8.3021 8.4019 8.5017 8.6016 8.7014 8.8013 8.9012 9.0 9.0011 9.1010 9.2009 9.3008 9.4008 9.5007 9.6006 9.7006 9.8005 9.9005 c=2 0.0 0.0000 2.0339 2.0689 2.1051 2.1424 2.1810 2.2208 2.2617 2.3039 2.3475 1.0 2.3922 2.4382 2.4856 2.5342 2.5842 2.6354 2.6880 2.7419 2.7970 2.8535 2.0 2.9114 2.9705 3.0309 3.0926 3.1556 3.2198 3.2853 3.3521 3.4200 3.4892 3.0 3.5595 3.6310 3.7036 3.7774 3.8522 3.9281 4.0050 4.0830 4.1611 4.2418 4.0 4.3226 4.4043 4.4869 4.5703 4.6546 4.7396 4.8254 4.9119 4.9991 5.0870 5.0 5.1755 5.2647 5.3544 5.4448 5.5356 5.6270 5.7189 5.8112 5.9040 5.9972 6.0 6.0908 6.1848 6.2792 6.3739 6.4689 6.5642 6.6599 6.7558 6.8519 6.9484 7.0 7.0450 7.1419 7.2389 7.3362 7.4336 7.5313 7.6290 7.7270 7.8250 7.9233 8.0 8.0215 8.1199 8.2185 8.3172 8.4159 8.5147 8.6136 8.7126 8.8117 8.9108 9.0 9.0100 9.1093 9.2086 9.3079 9.4073 9.5068 9.6062 9.7058 9.8053 9.9049 c =3 0.0 0.0000 3.0244 3.0515 3.0785 3.1062 3.1347 3.1642 3.1944 3.2256 3.2576 1.0 3.2906 3.3245 3.3594 3.3952 3.4320 3.4698 3.5086 3.5484 3.5893 3.6313 2.0 3.6743 3.7184 3.7636 3.8099 3.8572 3.9058 3.9554 4.0062 4.0580 4.1111 3.0 4.1652 4.2206 4.2770 4.3346 4.3933 4.4532 4.5142 4.5763 4.6395 4.7038 4.0 4.7693 4.8358 4.9034 4.9720 5.0417 5.1125 5.1842 5.2570 5.3307 
5.4054 5.0 5.4811 5.5577 5.6352 5.7136 5.7928 5.8729 5.9539 6.0356 6.1181 6.2014 6.0 6.2854 6.3711 6.4555 6.5416 6.6284 6.7157 6.8037 6.8922 6.9813 7.0710 7.0 7.1612 7.2518 7.3430 7.4346 7.5266 7.6191 7.7119 7.8052 7.8988 7.9928 8.0 8.0871 8.1817 8.2766 8.3718 8.4673 8.5631 8.6590 8.7553 8.8517 8.9484 9.0 9.0452 9.1423 9.2395 9.3369 9.4345 9.5322 9.6301 9.7280 9.8262 9.9244

TABLE V (CONT'D) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 c=4 0.0 0.0000 4.0710 4.0440 4.0626 4.0842 4.1062 4.1301 4.1542 4.1782 4.2032 1.0 4.2290 4.2554 4.2824 4.3103 4.3388 4.3681 4.3981 4.4289 4.4604 4.4927 2.0 4.5259 4.5599 4.5947 4.6304 4.6670 4.7044 4.7428 4.7802 4.8222 4.8633 3.0 4.9053 4.9483 4.9922 5.0371 5.0830 5.1299 5.1778 5.2267 5.2766 6.3275 4.0 5.3794 5.4323 5.4863 5.5413 5.5973 5.6544 5.7124 5.7714 5.8317 5.8928 5.0 5.9550 6.0181 6.0823 6.1475 6.2136 6.2807 6.3489 6.4179 6.4879 6.5589 6.0 6.6308 6.7034 6.7773 6.8518 6.9274 7.0036 7.0807 7.1587 7.2375 7.3170 7.0 7.3974 7.4785 7.5603 7.6428 7.7261 7.8099 7.8946 7.9798 8.0656 8.1521 8.0 8.2391 8.3267 8.4149 8.5036 8.5928 8.6825 8.7727 8.8633 8.9544 9.0459 9.0 9.1378 9.2302 9.3229 9.4159 9.5093 9.6031 9.6972 9.7915 9.8862 9.9812 c = 5 0.0 0.0000 -- 4.9391 5.o468 5.0738 5.0930 5.1137 5.1057 5.1481 5.1677 1.0 5.1880 5.2100 5.2314 5.2540 5.2771 5.3006 5.3248 5.3494 5.3746 5.4006 2.0 5.4271 5.4542 5.4820 5.5104 5.5395 5.5693 5.5997 5.6309 5.6628 5.6954 3.0 5.7287 5.7628 5.7977 5.8333 5.8697 5.9069 5.9450 5.9838 6.0235 6.0641 4.0 6.1054 6.1477 6.1908 6.2348 6.2797 6.3255 6.3721 6.4197 6.4682 6.5177 5.0 6.5680 6.6193 6.6716 6.7247 6.7788 6.8339 6.8899 6.9468 7.0047 7.0635 6.0 7.1233 7.1840 7.2457 7.3083 7.3718 7.4362 7.5016 7.5679 7.6351 7.7031 7.0 7.7721 7.8420 7.9127 7.9843 8.0567 8.1300 8.2042 8.2791 8.3548 8.4313 8.0 8.5086 8.5867 8.6655 8.7451 8.8253 8.9063 8.9880 9.0703 9.1533 9.2369 9.0 9.3212 9.4061 9.4915 9.5776 9.6642 9.7513 9.8390 9.9272 10.0159 10.1050 c = 6 1.0 6.1616 6.1761 6.1968 6.2134 6.2337 6.2532 6.2734 6.2940 6.3146 6.3359 2.0 6.3577 6.3700 6.4028 6.4261 6.4499 6.4742 6.4991 6.5245 6.5505 6.5771 3.0 6.6042 6.6319 6.6602 6.6891 6.7186 6.7488 6.7796 6.8119 6.8432 6.8760 4.0 6.9095 6.9437 6.9786 7.0142 7.0506 7.0877 7.1255 7.1641 7.2034 7.2435 5.0 7.2845 7.3262 7.3687 7.4120 7.4561 7.4964 7.5469 7.5936 7.6410 7.6894 6.0 7.7386 7.7886 7.8395 7.8913 7.9440 7.9975 8.0519 8.1072 8.1634 8.2204 7.0 
8.2784 8.3372 8.3969 8.4575 8.5190 8.5813 8.6445 8.7086 8.7735 8.8393 8.0 8.9060 8.9735 9.0418 9.111o 9.1810 9.2518 9.3235 9.3959 9.4691 9.5431 9.0 9.6178 9.6934 9.7696 9.8466 9.9243 10.0028 lo.O819 lo.1617 l0.2422 10.3234

TABLE V (CONT'D) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 - 7.1463 7.1713 7.1790 7.2048 7.2181 7.2335 7.2524 7.2715 7.2885 2.0 7.3067 7.3255 7.3446 7.3642 7.3842 7.4046 7.4254 7.4467 7.4684 7.4905 3.0 7.5131 7.5361 7.5596 7.5836 7.6081 7.6331 7.6586 7.6846 7.7111 7.7382 4.0 7.7658 7.7940 7.8227 7.8520 7.8819 7.9124 7.9435 7.9573 8.0076 8.0406 5.0 8.0742 8.1085 8.1434 8.1790 8.2153 8.2533 8.2900 8.3284 8.3675 8.4073 6.0 8.4478 8.4891 8.5312 8.5740 8.6175 8.6619 8.7070 8.7529 8.7995 8.8470 7.0 8.8953 8.9443 8.9942 9.0449 9.0964 9.1488 9.2019 9.2559 9.3107 9.3664 8.0 9.4229 9.4802 9.5383 9.5973 9.6571 9.7177 9.7791 9.8414 9.9045 9.9684 9.0 10.0331 10.0987 10.1651 10.2322 10.3001 10.3688 10.4384 10.5087 10.5797 10.6516 c=8 2.0 8.2677 8.2838 8.3002 8.3170 8.3344 8.3517 8.3695 8.3876 8.4061 8.4249 3.0 8.4441 8.4636 8.4836 8.5039 8.5246 8.5457 8.5672 8.5891 8.6115 8.6342 4.0 8.6575 8.6811 8.7052 8.7298 8.7548 8.7804 8.8063 8.8328 8.8598 8.8873 5.0 o 8.9154 8.9440 8.9731 9.0027 9.0330 9.0637 9.0951 9.1270 9.1596 9.1927 6.0 9.2264 9.2607 9.2957 9.3313 9.3675 9.4045 9.4420 9.4802 9.5191 9.5587 7.0 9.5989 9.6399 9.6815 9.7239 9.7670 9.8108 9.8554 9.9097 9.9467 9.9935 8.0 10.0410 10.0892 10.1383 10.1881 10.2387 10.2900 10.3421 10.3949 10.4486 10.5031 9.0 10.5583 10.6144 10.6712 10.7288 10.7871 10.8462 10.9062 1, 9670 11.0284 11.0901 c=9 2.0 9.2365 9.2499 9.2643 9.2790 9.2947 9.3095 9.3252 9.3410 9.3571 9.3734 3.0 9.3900 9.4061 9.4242 9.4417 9.4595 9.4777 9.4962 9.5150 9.5342 9.5537 4.0 9.5736 9.5937 9.6143 9.6352 9.6566 9.6783 9.7004 9.7229 9.7458 9.7691 5.0 9.7929 9.8171 9.8417 9.8667 9.8923 9.9182 9.9447 9.9716 9.9990 10.0269 6.0 10.0553 10.0841 10.1136 10.1435 10.1740 10.2050 10.2365 10.2687 10.3013 10.3345 7.0 10.3683 10.4027 10.4377 10.4733 10.5095 10.5463 10.5838 10.6219 10.6606 10.7000 8.0 10.7401 10.7807 10.8220 1o.8640 10.9067 10.9501 10.9941 11.0388 11.0841 11.1303 9.0 11.1772 11.2249 11.2732 11.3221 11.3718 11.4221 11.4733 11.5252 11.5778 11.6313

-167TABLE V (CONT'D) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 c = 10 2.0 10.2062 10.2147 10.2277 10.2423 10.2565 10.2713 10.2863 10.3006 10.3155 10.3303 3.0 10.3450 10.3596 10.3754 10.3908.10.4063 10.4226 10.4389 10.4552 10.4722 10.4892 4.0 10.5065 10.5240 10.5417 10.5599 10.5785 10.5975 10.6151 10.6360 10.6556 10.6756 5.0 10.6961 10.7170 10.7380 10.7596 10.7815 10.8037 10.8264 10.8494 10.8728 10.8965 6.0 10.9207 10.9453 10.9703 10.9958 11.0217 11.0482 11.0749 11.1022 11.1299 11.1581 7.0 11.1867 11.2159 11.2455 11.2756 11.3063 11.3376 11.3694 11.4016 11.4343 11.4677 8.0 11.5017 11.5360 11.5710 11.6066 11.6428 11.6796 11.7168 11.7546 11.7930 11.8322 9.0 11.8720 11.9125 11.9536 11.9950 12.0371 12.0798 12.1234 12.1677 12.2123 12.2581
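Table V admits the analogous check: its entries agree with the conditional mean E(X | X ≥ c) of a Poisson variate truncated on the left at c; for c = 1 this is the familiar zero-truncated mean λ/(1 − e^(−λ)). A minimal sketch (Python; names illustrative):

```python
from math import exp, factorial

def mean_left_truncated_poisson(lam, c):
    """Mean of a Poisson(lam) variate conditioned on X >= c (cf. Table V).

    Computed via the complement:
      E[X | X >= c] = (E[X] - sum_{x<c} x p(x)) / (1 - sum_{x<c} p(x)).
    """
    head = [exp(-lam) * lam ** x / factorial(x) for x in range(c)]
    return (lam - sum(x * p for x, p in enumerate(head))) / (1.0 - sum(head))
```

At c = 1 and λ = 0.1 this gives approximately 1.0508, and at c = 2 approximately 2.0339, matching the first entries of the table.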

TABLE VI 4* OF SINGLY TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION ON THE LWFT AT c=l (Zero-Observations Truncated) k = 1(1)3; - =.01(.01).99 00 01 02 03 04 05 06 07 08 og k= 1 00 - 100.0000 50.0000 33.3333 25.0000 20.0000 16.6667 14.2857 12.5000 11.1111 10 10.0000 9.0909 8.3333 7.6923 7.1429 6.6667 6.2500 5.8824 5.5556 5.2632 20 5.0000 4.7619 4.5455 4.3478 4.1667 4.0000 3.8462 3.7037 3.5714 3.4483 30 3.3333 3.2258 3.1250 3.0303 2.9412 2.8571 2.7778 2.7027 2.6316 2.5641 40 2.5000 2.4390 2.3809 2.3256 2.2727 2.2222 2.1739 2.1277 2.0833 2.0408 50 2.0000 1.9608 1.9231 1.8868 1.8519 1.8182 1.7857 1.7544 1.7241 1.6949 60 1.6667 1.6393 1.6129 1.5873 1.5623 1.5385 1.5152 1.4925 1.4706 1.4493 70 1.4286 2.4085 1.3889 1.3699 1.3514 1.3333 1.3158 1.2987 1.2821 1.2658 80 1.2500 1.2346 1.2195 1.2048 1.1905 1.1765 1.1628 1.1494 1.1364 1.1236 go 1.1111 l1.0989 1.0870 1.0753 1.0o638 1.0526 1.0417 1.0309 1.0204 1.0101 k= 2 o0 - 99.0099 49.0196 32.3624 24.0384 19.0476 15.7233 13.3511 11.5741 10.1937 10 9.0909 8.1900 7.4404 6.8073 6.2657 5.7971 5.3879 5.0277 4.7081 4.4229 20 4.1667 3.9355 3.7258 3.5348 3.3602 3.2000 3.0525 2.9163 2.7902 2.6731 30 2.5641 2.4624 2.3674 2.2784 2.1949 2.1164 2.0425 1.9728 1.9070 1.8447 40 1.7857 1.7298 1.6767 1.6263 1.5783 1.5326 1.4890 1.4474 1.4076 1.3697 50 1.3333 1.2985 1.2652 1.2332 1.2025 1.1730 1.1447 1.1175 1.0912 1.0660 60 1.0417 1.0182. 9956 0. 9738 0.9527 0.9324 0.9128 0.8937 0.8754. 8576 70 o. 
8404 0.8237 o.8075 0.7918 0.7767 0.7619 0.7476 0.7337 0.7203 0.7072 80 0.6944 0.6821 0.6701 0.6584 0.6470 0.6359 0.6252 0.6147 0.6045 0o.5945 go 0.5848 0.5753 0.5661 0.5572 0.5484 0.5398 0.5315 0.5233 0.5154 0.5076 k= 3 00 - 99.0000 49.0004 32.3342 24.0015 19.0024 15.6701 13.2903 11.5059 10.1185 10 9.0090 8.1049 7.3460 6.7070 6.1598 5.6859 5.2716 4.9065 4.5823 4.2926 20 4.0323 3.7971 3.5836 3.3890 3.2111 3.0476 2.8971 2.7580 2.6291 2.5095 30 2.3981 2.2941 2.1970 2.1060 2.0206 1.9403 1.8648 1.7935 1.7263 1.6627 40 1.6026 1.5455 1.4914 1.4401 1.3912 1.3447 1.3005 1.2583 1.2180 1.1796 50 1.1429 1.1077 1.0741 1.0419 1.0111 0.9815 0.9531 0.9259 0.8997 0.8745 60 0.8504 0.8271 o.8Q47 0.7831 0.7623 0.7423 0.7230 0.7044 0.6864 0.6691 70 o. 6523 0. 6361 0. 6205 0. 6054 0. 5907 0. 5766 0. 5629 0. 5496 0.5368 0. 5243 80 0.5123 o0.5006 0.4893 0.4783 0.4622 0.4573 0.4473 0.4375 0.4281 0.4189 90 0.4160 0.4013 0.3929 0.3847 0.3768 0.3690 0.3615 0.3542 0.3470 0.3401
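Spot checks suggest (this is an inference from the entries, not a statement in the text) that the k = 1(1)3 columns of Table VI tabulate (1 − θ)/(θ(1 − θ^k)), i.e. the mean of the zero-truncated negative binomial divided by k under the parameterization P(X = x) = C(k+x−1, x) θ^k (1 − θ)^x, x = 0, 1, 2, …. A sketch of that quantity (Python; illustrative):

```python
def table_vi_entry(theta, k):
    """Mean of the zero-truncated negative binomial, divided by k.

    Assumes the pmf P(X = x) = C(k+x-1, x) * theta**k * (1-theta)**x,
    so that E[X] = k(1-theta)/theta and P(X = 0) = theta**k; the
    zero-truncated mean over k is then (1-theta)/(theta*(1-theta**k)).
    Entries of Table VI for k = 1, 2, 3 match this expression.
    """
    return (1.0 - theta) / (theta * (1.0 - theta ** k))
```

For instance, at θ = 0.50, k = 2 this gives 1.3333 and at θ = 0.10, k = 3 it gives 9.0090, the tabled values.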

TABLE VII 4 OF LOGARITIMIC SERIES DISTRIBUTION 0 =.o1(.o1).99 0 00 01 02 03 04 05 06 07 08 09 oo0 - 1.005043 1.010172 1.015383 1.020690 1.026091 1.031587 1.037177 1.042873 1.048672 10 1.054580 1.060598 1.066730 1.072980 1.079351 1.085846 1.092472 1.099231 1.106129 1.113168 20 1.120355 1.127694 1.135191 1.142852 1.150681 1.158686 1.166872 1.175248 1.183817 1.192590 30 1.201574 1.210777 1.220207 1.229874 1.239789 1.249960 1.260400 1.271117 1.282128 1.293442 40 1.305076 1.317043 1.329358 1.342039 1.355104 1.368569 1.382458 1.396791 1.411590 1.426882 50 1.442695 1.459053 1.475991 1.493543 1.511744 1.530632 1.550251 1.570649 1.591874 1.613982 60 1.637035 1.661095 1.686240 1.712564 1.740101 1.769007 1.799368 1.831307 1.864961 1.900476 70 1.938027 1.977805 2.020029 2.064946 2.112839 2.164042 2.218924 2.277935 2.341581 2.379962 80 2.485340 2.567035 2.656609 2.755341 2.864811 2.986979 3.124362 3.280188 3.458686 3.665562 90 3.908651 4.199053 4.671580 4.996013 5.568560 6.342356 7.456020 9.220787 12.525489 21.497578
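The entries of Table VII agree with the mean of the logarithmic series distribution, P(X = x) = αθ^x/x for x = 1, 2, …, with α = −1/ln(1 − θ), namely −θ/((1 − θ) ln(1 − θ)). A one-line check (Python; illustrative):

```python
from math import log

def log_series_mean(theta):
    """Mean of the logarithmic series distribution with parameter theta:
    P(X = x) = a * theta**x / x, x = 1, 2, ..., a = -1/ln(1 - theta).
    Reproduces the entries of Table VII, e.g. 1.442695 at theta = 0.50.
    """
    return -theta / ((1.0 - theta) * log(1.0 - theta))
```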

CHARTS 1 - 4

Chart 1. Estimation of π of Singly Truncated Binomial Distribution at c = 1 for n = 3(1)6 (abscissa: w/n)

Chart 2. Estimation of π of Singly Truncated Binomial Distribution at c = 1 for n = 7(1)10 (abscissa: w/n)

Chart 3. Estimation of π of Singly Truncated Binomial Distribution at c = 1 for n = 11(1)15 (abscissa: w/n)

Chart 4. Estimation of λ of Singly Truncated Poisson Distribution at c = 1

BIBLIOGRAPHY

1. ANSCOMBE, F.J. (1949): The statistical analysis of insect counts based on the negative binomial distribution. Biometrics, 5, 165-173.

2. AYYANGAR, A.A.K. (1934): Note on the recurrence formulae for the moments of the point-binomial. Biometrika, 26, 262-264.

3. BLISS, C.I. and FISHER, R.A. (1953): Fitting the negative binomial distribution to biological data, with a note on the efficient fitting of the negative binomial distribution. Biometrics, 9, 176-200.

4. COCHRAN, W.G. (1954): Some methods for strengthening the common chi-square tests. Biometrics, 10, 417-451.

5. COHEN, A.C. (1954): Estimation of the Poisson parameter from truncated samples and from censored samples. J.A.S.A., 49, 158-168.

6. CRAIG, C.C. (1953): Note on the use of fixed numbers of defectives and variable sample sizes in sampling by attributes. Industrial Quality Control, 9, No. 6, 43-45.

7. CRAMÉR, HARALD (1945): Mathematical Methods of Statistics. Princeton University Press.

8. DAVID, F.N. and JOHNSON, N.L. (1952): The truncated Poisson. Biometrics, 8, 275-285.

9. FELLER, W. (1943): On a general class of contagious distributions. A.M.S., 14, 389-400.

10. FELLER, W. (1957): An Introduction to Probability Theory and Its Applications. John Wiley and Sons, Inc.

11. FISHER, R.A. (1936): The effect of methods of ascertainment upon the estimation of frequencies. Annals of Eugenics, 6, 13-25.

12. FISHER, R.A. (1941): The negative binomial distribution. Annals of Eugenics, 11, 182-187.

13. FINNEY, D.J. (1949): The truncated binomial distribution. Annals of Eugenics, 14, 319-328.

14. FINNEY, D.J. and VARLEY, G.C. (1955): An example of the truncated Poisson distribution. Biometrics, 11, 387-394.

15. FRISCH, R. (1925): Recurrence formulae for the moments of the point binomial. Biometrika, 17, 165-171.

16. GIRSHICK, M.A., MOSTELLER, F. and SAVAGE, L.J. (1946): Unbiased estimates for certain binomial sampling problems with applications. A.M.S., 17, 13-23.

17. GRAB, E.L. and SAVAGE, I.R. (1954): Tables of the expected value of 1/X for positive Bernoulli and Poisson variables. J.A.S.A., 49, 169-177.

18. GREENWOOD, M. and YULE, G.U. (1920): An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J.R.S.S., 83, 235-279.

19. GULDBERG, S. (1935): Recurrence formulae for the semi-invariants of some discontinuous frequency functions of n variables. Skandinavisk Aktuarietidskrift, 18, 270-278.

20. HALDANE, J.B.S. (1939): The cumulants and moments of the binomial distribution, and the cumulants of chi-square for a (n x 2)-fold table. Biometrika, 31, 392-395.

21. HALDANE, J.B.S. (1941): The fitting of binomial distributions. Annals of Eugenics, 11, 179-181.

22. HALDANE, J.B.S. (1945): On a method of estimating frequencies. Biometrika, 33, 222-225.

23. HALDANE, J.B.S. and SMITH, S.M. (1956): The sampling distribution of a maximum likelihood estimate. Biometrika, 43, 96-103.

24. HARRISON, J.L. (1945): Stored products and the insects infesting them as examples of the logarithmic series. Annals of Eugenics, 12, 280-282.

25. KENDALL, D.G. (1948): On some modes of population growth leading to R.A. Fisher's logarithmic series distribution. Biometrika, 35, 6-15.

26. KIRKHAM, W.J. (1935): Moments about the arithmetic mean of a binomial frequency distribution. A.M.S., 6, 96-101.

27. KITAGAWA, T. (1952): Tables of Poisson Distribution. Baifukan, Tokyo, Japan.

28. MOLINA, E.C. (1942): Poisson's Exponential Binomial Limit. D. Van Nostrand Co., Inc.

29. MOORE, P.G. (1952): The estimation of the Poisson parameter from a truncated distribution. Biometrika, 39, 247-251.

30. MOORE, P.G. (1954): A note on truncated Poisson distributions. Biometrics, 10, 402-406.

31. NEYMAN, J. (1939): On a new class of "contagious" distributions, applicable in entomology and bacteriology. A.M.S., 10, 35-57.

32. NOACK, A. (1950): A class of random variables with discrete distributions. A.M.S., 21, 127-132.

33. OLDS, E.G. (1940): On a method of sampling. A.M.S., 11, 355-358.

34. PEARSON, K. (1913): A Monograph on Albinism in Man. Drapers' Company Research Memoirs.

35. PLACKETT, R.L. (1953): The truncated Poisson distribution. Biometrics, 9, 485-488.

36. QUENOUILLE, M.H. (1949): A relation between the logarithmic, Poisson and negative binomial series. Biometrics, 5, 162-164.

37. RAO, C.R. (1948): Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation. Proc. Camb. Phil. Soc., 44, 50-57.

38. RAO, C.R. (1952): Advanced Statistical Methods in Biometric Research. John Wiley and Sons, Inc., p. 165.

39. RIDER, P.R. (1953): Truncated Poisson distributions. J.A.S.A., 48, 826-830.

40. RIDER, P.R. (1955): Truncated binomial and negative binomial distributions. J.A.S.A., 50, 877-883.

41. ROMANOVSKY, V. (1923): Note on the moments of a binomial about its mean. Biometrika, 15, 410-412.

42. SAMPFORD, M.R. (1955): The truncated negative binomial distribution. Biometrika, 42, 58-69.

43. SICHEL, H.S. (1951): The estimation of the parameters of a negative binomial distribution with special reference to psychological data. Psychometrika, 16, 107-127.

44. SKELLAM, J.G. (1948): A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J.R.S.S., Series B, 10, 257-264.

45. STEPHAN, F.F. (1945): The expected value and variance of the reciprocal and other negative powers of a positive Bernoullian variate. A.M.S., 16, 50-61.

46. THOMAS, M. (1949): A generalization of Poisson's binomial limit for use in ecology. Biometrika, 36, 18-25.

47. TIPPETT, L.H.C. (1932): A modified method of counting particles. Proceedings of the Royal Society, Series A, 137, 434-446.

48. WALSH, J.E. (1955): Approximate probability values for observed number of successes from statistically independent binomial events with unequal probabilities. Sankhya, 15, 281-290.

49. WILLIAMS, C.B. (1944): The numbers of publications written by biologists. Annals of Eugenics, 12, 143-146.

50. WISE, M.E. (1946): The use of the negative binomial distribution in an industrial sampling problem. Supplement to J.R.S.S., 8, 202-211.

51. WISHART, JOHN (1949): Cumulants of multivariate multinomial distributions. Biometrika, 36, 47-58.
