Division of Research
Graduate School of Business Administration
The University of Michigan

July 1980

A SEQUENTIAL PROBABILITY RATIO MODEL FOR TESTING COMPLIANCE AT MULTIPLE SITES

Working Paper No. 221

James T. Godfrey
Richard W. Andrews
The University of Michigan

FOR DISCUSSION PURPOSES ONLY: None of this material is to be quoted or reproduced without the express permission of the Division of Research.

ABSTRACT

This paper considers the situation in which an auditor desires to use statistical sampling procedures in the evaluation of compliance with an internal control procedure which is in effect at multiple sites. The objective is to reach a statistical conclusion about whether or not at least one of the sites has too large a noncompliance rate. A sequential probability ratio test is proposed which is sequential in the sites. A common density function is assumed to generate the error rates at the sites. Some of the reported results require sample sizes and numbers of sites to test that are significantly lower than those in common use.

KEY WORDS: 1. Auditing; 2. Sequential Test; 3. Hypothesis Test; 4. Error Rates; 5. Cost Minimization; 6. Likelihood Ratio

* James T. Godfrey is Associate Professor of Accounting and Richard W. Andrews is Associate Professor of Statistics in the Graduate School of Business Administration at The University of Michigan, Ann Arbor, Michigan 48109. This project was funded by a grant from the Peat, Marwick, Mitchell Foundation through its Research Opportunities in Auditing program. The views expressed herein are those of the authors and do not necessarily reflect the views of the Peat, Marwick, Mitchell Foundation.

1. INTRODUCTION

In this paper we consider the situation in which an auditor desires to use statistical sampling procedures in the evaluation of an internal control procedure which is in effect at multiple sites. The approach typically taken is to apply statistical methods that were designed for single sites. These methods include taking samples and calculating upper precision limits for an attribute at each of a subset of the sites and then reaching a nonstatistical conclusion about the attribute for all sites. Leslie [1979] examined this situation and suggested that the use of single-site methods in multiple-site situations causes two major concerns for an auditor: "1. How many locations should be visited?" and "2. What size samples should be selected at the locations visited?" Leslie addressed these questions in a dollar-unit sampling setting, with the objective being to reach a statistical conclusion about the total dollar error at all sites. Our analysis takes place in a pure error rate setting, with the objective being to reach a statistical conclusion about the maximum error rate at any one site. This conclusion is based on samples taken at a subset of the sites. For the single-site case a statistical conclusion usually takes the form of stating that there is a certain percentage of confidence that the true error rate does not exceed a prescribed value. The prescribed value is a maximum tolerable error rate,1 which will be subjectively determined by an auditor so that if it is concluded that the unknown true error rate is greater than or equal to it, the internal control procedure

represented by the attribute is judged to be out of control. Statistical methodology has been developed for a single attribute at a single site. The methodology we propose in this paper is designed for the case of a single attribute occurring at more than one site. On the basis of our model, a statistical conclusion will have the form of stating that there is a certain percentage of confidence that no site has a true error rate greater than the maximum tolerable rate. The model we propose is a sequential probability ratio test [Mood, Graybill, and Boes, 1974, pp. 468-470] which is sequential in the sites. Vance [1950] first suggested the appropriateness of a sequential probability ratio model for attribute testing in auditing. Roberts [1974] proposed a sequential probability ratio test for the case of an attribute at a single site (sequential in terms of items selected) and claimed the following advantages: "(1) the simplicity of the procedure, (2) the control of both risks, and (3) the saving of observations in comparison to the corresponding fixed sample procedure." Although the analytical development of our model is complex, the development of appropriate computer programs makes implementation simple. We also claim the other two advantages for our model: it accounts for both types of risks, and it requires a total sample size smaller than that required for a valid statistical conclusion using traditional methods. In the latter case, as we will show, the only statistically defensible traditional approach would be to take a minimum sample size at virtually every site.

2. THE PROBLEM

In this section we develop a naive sequential probability ratio model that may be compared with the primary model that we develop later, in Sections 3, 4, and 5. The model we develop in this section is naive in the sense that we simply assume random selection of sites and of items at the selected sites. It can be viewed as an extension of the traditional single-site model, except that it is a sequential probability ratio model and both types of risk are controlled. A comparison of the naive model with our later model will reveal that substantial reductions in the amount of audit work can be achieved by assuming a relationship among the error rates at the multiple sites.

Consider the situation where an internal control procedure is being carried out at K different sites. The auditor wishes to make a statistical statement about the error rate of this control. Specifically, he wants to be probabilistically assured that none of the individual sites has an error rate above a certain level. We let

K = total number of sites;
k = number of sites to audit;
n = sample size at each site audited;
p_i = the error rate at site i (i = 1, ..., K); and
x_j = the number of errors observed at site j (j = 1, ..., k).

We assume that x_j, conditioned on p_j, has a binomial distribution:

P(x_j | p_j) = C(n, x_j) p_j^{x_j} (1 - p_j)^{n - x_j};  j = 1, ..., k.

A hypothesis structure is as follows:

H_0: All p_i <= p_a;  i = 1, ..., K
H_a: All but one p_i <= p_a, and one p_i >= p_u
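For concreteness, the per-site binomial model can be evaluated directly. The sketch below is ours, not the authors' program, and the numerical values are illustrative; it computes P(x_j | p_j) and checks that the joint likelihood of observing zero errors at k sites under H_0 collapses to (1 - p_a)^(kn):

```python
import math

# Binomial pmf for the errors observed in a sample of n items at one site.
def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Under H0 every audited site has rate p_a, so with zero errors everywhere the
# joint likelihood is (1 - p_a)**(k*n). Illustrative values only.
p_a, n, k = 0.005, 64, 3
joint = math.prod(binom_pmf(0, n, p_a) for _ in range(k))
print(abs(joint - (1 - p_a) ** (k * n)) < 1e-12)  # True
```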

where

α = Type I risk;
β = Type II risk;
p_a = an acceptable error rate;
p_u = an unacceptable error rate.

This hypothesis structure describes the most difficult auditing situation, since under H_a only one site has a "bad" error rate and we want statistical assurance of finding it. By designing our test with regard for this hypothesis structure we will have even greater assurance of finding a "bad" site if more than one actually exists. To test these hypotheses a sequential probability ratio model takes the following form: first, we calculate

λ_k = L(x_1, x_2, ..., x_k | H_0) / L(x_1, x_2, ..., x_k | H_a),

where L(x_1, x_2, ..., x_k | H) is the likelihood function for the sample data observed at the k sites audited, given hypothesis H. Then a decision rule can be stated as follows: if λ_k >= (1 - α)/β, we accept H_0; or if λ_k <= α/(1 - β), we reject H_0; or if α/(1 - β) < λ_k < (1 - α)/β, we continue by sampling more sites.

For hypothesis H_0 every possible set of outcomes for the k sites audited is equally likely, and therefore

L(x_1, x_2, ..., x_k | H_0) = ∏_{j=1}^k C(n, x_j) p_a^{x_j} (1 - p_a)^{n - x_j}
                            = [∏_{j=1}^k C(n, x_j)] p_a^{Σ x_j} (1 - p_a)^{Σ (n - x_j)}.

For hypothesis H_a, in which one of the K sites is out of control (p = p_u), the k sites audited will sometimes include the bad site and at other times exclude it. In fact, if the k sites are randomly selected,

P(all k audited sites good) = C(K-1, k)/C(K, k) = (K - k)/K = 1 - k/K,

and

P(all k audited sites good except one) = k/K.

Then we can write the likelihood function for H_a:

L(x_1, x_2, ..., x_k | H_a)
  = (1 - k/K) ∏_{j=1}^k C(n, x_j) p_a^{x_j} (1 - p_a)^{n - x_j}
    + (1/K) Σ_{i=1}^k C(n, x_i) p_u^{x_i} (1 - p_u)^{n - x_i} ∏_{j≠i} C(n, x_j) p_a^{x_j} (1 - p_a)^{n - x_j}
  = [∏_{j=1}^k C(n, x_j)] p_a^{Σ x_j} (1 - p_a)^{Σ (n - x_j)} {1 - k/K + (1/K) Σ_{i=1}^k (p_u/p_a)^{x_i} [(1 - p_u)/(1 - p_a)]^{n - x_i}}.

Finally,

λ_k = K / {K - k + Σ_{j=1}^k (p_u/p_a)^{x_j} [(1 - p_u)/(1 - p_a)]^{n - x_j}}.

If we set an acceptance criterion for H_0 of observing zero errors in our total sample (i.e., all x_j = 0), then we can write our test statistic as

λ_k = K / {K - k + k [(1 - p_u)/(1 - p_a)]^n}.

The acceptance condition λ_k >= (1 - α)/β is then equivalent to

k >= F·K, where F = [1 - β/(1 - α)] / {1 - [(1 - p_u)/(1 - p_a)]^n}.

Thus, the minimum number of sites that must be audited is a factor F times K, the total number of sites. F is a function of the sample size n, and it can be seen that k and n have an inverse relationship, which satisfies our intuition; that is, the more sites we audit, the lower the sample size requirement at each site. Obviously we would prefer F < 1, since we would prefer that k < K. If we set F <= 1, then

F = [1 - β/(1 - α)] / {1 - [(1 - p_u)/(1 - p_a)]^n} <= 1

requires

[(1 - p_u)/(1 - p_a)]^n <= β/(1 - α).

Thus, given α, β, p_a, and p_u, if we set

n = ln[β/(1 - α)] / [ln(1 - p_u) - ln(1 - p_a)],

then F = 1, and we must take initial samples of size n at all k = K sites. For example, suppose we assume the following values: α = β = .05, p_a = .005, p_u = .05. Then, for H_0 to be accepted, initial samples of size n = 64 must be taken at all K sites and no errors may be observed. It is interesting to note that in using a sequential probability ratio test model for the single-site case, and with the same specifications of α, β, p_a, and p_u as above, acceptance of H_0 again requires an initial sample size of n = 64 with no errors observed.

Returning to our test, we previously observed that k and n have an inverse relationship. Thus, a reasonable question to consider next would be how much k could be reduced by increasing n. If we let n → ∞,

F → 1 - β/(1 - α),

which is the minimum value of F. For the numerical example given above,

min F = 1 - .05/.95 = .9474.

This result reveals that, even with an infinite sample size, we can reduce the number of sites to audit by only 5.26 percent. Furthermore, K must be greater than nineteen before we can select k < K. It appears, in general, that there cannot be a significant decrease in the number of sites that must be audited when the simple naive model approach is used. However, it is a fact that in the multiple-site situation it is common for auditors to test significantly fewer than all K sites.2 A common rationale given for testing fewer than K sites is that the same set of internal control procedures is prescribed at each site and the procedures are well documented. Therefore, there is an implicit belief that the propensity for errors is similar across all sites. In the context of the six essential characteristics of internal accounting control,3 we might assume that all but the "personnel" characteristic are similar across all sites. In that case we might expect similarities in the error rates but also differences caused by different personnel implementing the same well-prescribed system.
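The naive model's planning quantities can be computed directly. The sketch below is ours (function names are our own, not the paper's); it reproduces the n = 64 sample size and the .9474 limiting site fraction for the example values:

```python
import math

# Naive-model planning sketch. theta = (1-p_u)/(1-p_a); acceptance with zero
# errors requires k >= F*K with F = (1 - beta/(1-alpha)) / (1 - theta**n).
def min_sample_size(alpha, beta, p_a, p_u):
    # smallest integer n with F <= 1, i.e. theta**n <= beta/(1 - alpha)
    theta = (1 - p_u) / (1 - p_a)
    return math.ceil(math.log(beta / (1 - alpha)) / math.log(theta))

def site_fraction(n, alpha, beta, p_a, p_u):
    theta = (1 - p_u) / (1 - p_a)
    return (1 - beta / (1 - alpha)) / (1 - theta ** n)

n = min_sample_size(0.05, 0.05, 0.005, 0.05)
min_F = 1 - 0.05 / 0.95          # limit of F as n -> infinity
print(n, round(min_F, 4))        # 64 0.9474
```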

The model we develop below is a formalization of the auditing environment described above. In this development the error rates at the individual sites are assumed to be realizations of the same random variable, and therefore related to each other, but not necessarily equal in value. This structure will allow us to come to a statistical conclusion based on samples from k < K sites. We will develop a method for making a statistical statement about the largest error rate among the K sites by inspecting only a subset of the sites. The procedure developed will be sequential in the sites. That is, after auditing k sites (k < K), one of the following three decisions will be made:

1. It will be decided at a specified risk level that all sites are in control, i.e., all sites have an error rate equal to or less than a specified maximum tolerable rate;
2. It will be decided at a specified risk level that at least one of the sites has an error rate larger than a specified maximum tolerable rate;
3. It will be decided to audit at least one more site.

In Section 3 the symbology necessary for the statistical development is given and a model for the error rates is introduced. In Section 4 a sequential hypothesis test is set up in terms of the model parameters. Section 5 relates the auditing parameters to the sequential test through probability statements. In Section 6 we derive the necessary equations that will allow us to implement the test. Section 7 contains extensive computations which provide the required sample sizes and associated decision rules.

3. MODEL FOR THE ERROR RATES

Let K represent the total number of sites and p_i the proportion of errors at site i, for i = 1, 2, ..., K. We assume that p_1, p_2, ..., p_K are independent and identically distributed from the density f_P(p), where

f_P(p) = γ(1 - p)^{γ-1} if 0 < p < 1, and 0 otherwise.   (3.1)

We call f_P(p) the generating distribution; it is a special case of the beta distribution

f = [Γ(n') / (Γ(r') Γ(n' - r'))] p^{r'-1} (1 - p)^{n'-r'-1}

with r' = 1 and n' = γ + 1. The beta distribution is often suggested for the error rate p because of its ability to reflect a wide range of potential beliefs by auditors [Felix and Grimlund, 1977; Corless, 1972; Francisco, 1972; Crosby, 1979]. Our rationale for using f_P(p) is that (1) it is flexible in its ability to reflect typical auditors' beliefs and (2) it provides the link which ties the K error rates together. We test hypotheses about different values of the parameter γ, which represent different degrees of assurance that the unknown error rate is close to zero. By setting r' = 1, we restrict f_P(p) to have a mode at zero (for γ > 1); its general shape is shown in Figure A. For most auditing situations it is expected that the probability mass will be close to zero, and therefore this is a realistic shape for the generating distribution of error rates. The larger the value of γ, the more concentrated the distribution is close to zero. Therefore, if we accept a hypothesis that says that γ is very large, we will have good reason, probabilistically, to believe that all the sites have low error rates and therefore

(Insert Figure A here)

are in control. However, if we reject the hypothesis that γ is large in favor of an alternative hypothesis that γ is small, we will have reason to believe, probabilistically, that at least one of the sites has an intolerably high error rate. The next section will set up this hypothesis structure.

4. HYPOTHESIS STRUCTURE FOR THE SEQUENTIAL TEST

As stated in the previous section, we assume that p_1, p_2, ..., p_K are independent and identically distributed from f_P(p). The auditor then randomly selects k sites which will be audited. A random sample of size n is taken from each of the k sites, and the n·k items are audited with respect to the control procedure of interest. Since each item (it could be a transaction, an account, or some other system component) is either in error or not, and since there will be a finite number of items in total at each site, the number of items in error will have a hypergeometric distribution which is conditional on X_j, the total number of errors at site j. If we let x_j be the number of errors observed in a sample, then,

f(x_j | X_j, p_j) = C(X_j, x_j) C(N_j - X_j, n - x_j) / C(N_j, n),

where N_j = total population size at site j, j = 1, ..., k, and x_j = 0, 1, ..., min(n, X_j). We can relate this to our generating distribution f_P(p) by observing that, given an error rate p_j generated at site j, the total number of errors X_j will have a binomial distribution:

f(X_j | p_j) = C(N_j, X_j) p_j^{X_j} (1 - p_j)^{N_j - X_j};  X_j = 0, 1, ..., N_j.

Then the distribution of the number of errors observed in a sample, x_j, given p_j, is

f(x_j | p_j) = Σ_{X_j = x_j}^{N_j - n + x_j} f(x_j | X_j, p_j) f(X_j | p_j)
            = C(n, x_j) p_j^{x_j} (1 - p_j)^{n - x_j}.

Now we can derive the marginal distribution of x_j:

f(x_j) = ∫_0^1 f(x_j | p_j) f_P(p_j) dp_j
       = ∫_0^1 C(n, x_j) p_j^{x_j} (1 - p_j)^{n - x_j} γ (1 - p_j)^{γ-1} dp_j
       = C(n, x_j) Γ(x_j + 1) Γ(γ + n - x_j) Γ(γ + 1) / [Γ(γ + 1 + n) Γ(γ)]

for x_j = 0, 1, ..., n. The marginal distribution, f(x_j), will be used in the calculation of the likelihood functions which are required for the sequential probability ratio test we describe below. Based on a sample of k sites, from which we will obtain the sample data x_1, x_2, ..., x_k, we want to test the hypothesis structure

H_0*: γ = γ_0
H_a*: γ = γ_1 < γ_0.

The null hypothesis, by virtue of a large value of γ, will indicate that all the sites are in control. For this hypothesis structure we use the approximate sequential probability ratio test [Mood, Graybill, and Boes, 1974]. Based on the sample data x_1, x_2, ..., x_k, the likelihood ratio is given by

λ_k = ∏_{j=1}^k f(x_j) (with γ = γ_0) / ∏_{j=1}^k f(x_j) (with γ = γ_1)

    = ∏_{j=1}^k [Γ(γ_0 + n - x_j) Γ(γ_0 + 1) Γ(γ_1 + 1 + n) Γ(γ_1)] / [Γ(γ_0 + 1 + n) Γ(γ_0) Γ(γ_1 + n - x_j) Γ(γ_1 + 1)].

The decision rule, based on λ_k, is:

If λ_k >= (1 - α*)/β*, then accept H_0*;
If λ_k <= α*/(1 - β*), then reject H_0*;
If α*/(1 - β*) < λ_k < (1 - α*)/β*, then audit at least one more site.

The size of the Type I error of the hypothesis test is α* and the size of the Type II error is β*. We can use the result that Γ(X) = (X - 1)Γ(X - 1) to refine the form of λ_k given above:

λ_k = ∏_{j=1}^k (γ_0/γ_1) [(γ_1 + n)(γ_1 + n - 1) ⋯ (γ_1 + n - x_j)] / [(γ_0 + n)(γ_0 + n - 1) ⋯ (γ_0 + n - x_j)]

    = (γ_0/γ_1)^k [(γ_1 + n)/(γ_0 + n)]^{S_0} [(γ_1 + n - 1)/(γ_0 + n - 1)]^{S_1} ⋯ [(γ_1 + n - m)/(γ_0 + n - m)]^{S_m},   (4.2)

where

S_i = the number of sites with at least i errors, i = 0, ..., m;
S_0 = k, since all sites will always have at least zero errors;
m = the total number of errors observed at all sites.

The first component of the right-hand side of (4.2), (γ_0/γ_1)^k, will be a constant in any given application, since it does not depend on the sample data. Thus we can write our criterion for acceptance of H_0* as

Σ_{i=0}^m S_i ln[(γ_1 + n - i)/(γ_0 + n - i)] >= ln[(1 - α*)/β*] + k ln(γ_1/γ_0).
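The marginal pmf f(x_j) is easy to evaluate numerically. The sketch below is our code (the parameter values are illustrative); it uses log-gamma for numerical stability and checks that the probabilities sum to one and that P(x_j = 0) = γ/(γ + n), which follows directly from the formula:

```python
import math

# Marginal pmf of the observed error count x at an audited site,
# f(x) = C(n,x) * Gamma(x+1)Gamma(g+n-x)Gamma(g+1) / (Gamma(g+1+n)Gamma(g)),
# computed with lgamma to avoid overflow (our sketch, illustrative values).
def marginal_pmf(x, n, g):
    return math.comb(n, x) * math.exp(
        math.lgamma(x + 1) + math.lgamma(g + n - x) + math.lgamma(g + 1)
        - math.lgamma(g + 1 + n) - math.lgamma(g))

n, g = 64, 756.0
probs = [marginal_pmf(x, n, g) for x in range(n + 1)]
print(abs(sum(probs) - 1.0) < 1e-9)         # pmf sums to one
print(abs(probs[0] - g / (g + n)) < 1e-9)   # P(x=0) = g/(g+n)
```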

Our criterion for rejection of H_0* can be written as

Σ_{j=0}^m S_j ln[(γ_1 + n - j)/(γ_0 + n - j)] <= ln[α*/(1 - β*)] + k ln(γ_1/γ_0),

and our criterion to sample additional sites becomes the region between these two bounds. Let

Λ(m) = Σ_{j=0}^m S_j ln[(γ_1 + n - j)/(γ_0 + n - j)]   (4.3)

be our test statistic,

C_A(α*, β*) = ln[(1 - α*)/β*] + k ln(γ_1/γ_0)   (4.4)

be our acceptance criterion, and

C_R(α*, β*) = ln[α*/(1 - β*)] + k ln(γ_1/γ_0)   (4.5)

be our rejection criterion. Then our restated decision rule is:

If Λ(m) >= C_A(α*, β*), then accept H_0*;   (4.6)
If Λ(m) <= C_R(α*, β*), then reject H_0*;   (4.7)
If C_R(α*, β*) < Λ(m) < C_A(α*, β*), then audit at least one more site.   (4.8)

The value of our test statistic, Λ(m), will depend on the number of errors, m, observed at the k sites and the way in which these m errors are distributed over the k sites. For a given value of m and a specific distribution of the m errors, the S_j's (j = 0, 1, ..., m) will take on a unique set of values.
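The decision rule (4.6)-(4.8) is straightforward to implement. The sketch below is our illustration, not the authors' program; the values α* = β* = .05 and n = 100 are arbitrary choices for demonstration:

```python
import math

# Test statistic Lambda(m) of (4.3): S_i = number of audited sites with
# at least i errors; counts[j] = errors observed at audited site j.
def lambda_stat(counts, n, g0, g1):
    total = 0.0
    for i in range(max(counts) + 1):   # S_i = 0 beyond the largest count
        S_i = sum(1 for c in counts if c >= i)
        total += S_i * math.log((g1 + n - i) / (g0 + n - i))
    return total

# Decision rule (4.6)-(4.8) with thresholds C_A, C_R of (4.4)-(4.5).
def decide(counts, n, g0, g1, a_star, b_star):
    k = len(counts)
    c_a = math.log((1 - a_star) / b_star) + k * math.log(g1 / g0)
    c_r = math.log(a_star / (1 - b_star)) + k * math.log(g1 / g0)
    stat = lambda_stat(counts, n, g0, g1)
    if stat >= c_a:
        return "accept"
    if stat <= c_r:
        return "reject"
    return "continue"

# With gamma0 = 756, gamma1 = 157: zero errors at 8 sites of size 100 accepts
# H0*, while zero errors at only 7 sites is not yet conclusive.
print(decide([0] * 8, 100, 756.0, 157.0, 0.05, 0.05))    # accept
print(decide([0] * 7, 100, 756.0, 157.0, 0.05, 0.05))    # continue
print(decide([5, 5, 5], 100, 756.0, 157.0, 0.05, 0.05))  # reject
```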

In order to perform this test in a sequential manner, the following two questions must be answered: (1) What should the values of γ_0 and γ_1 be in order for this test to discriminate between a group of sites which are in control and a group which has at least one site out of control? (2) What values should be set for α* and β* in order to maintain specified overall risks? These two questions will be addressed in the next section.

5. RELATING AUDITING PARAMETERS TO THE SEQUENTIAL TEST

As previously discussed, α* and β* are the probabilities of the Type I and Type II errors, respectively, associated with the hypothesis structure

H_0*: γ = γ_0
H_a*: γ = γ_1 < γ_0.

However, it should be emphasized that the hypothesis structure of primary interest is

H_0: All p_i <= u,
H_a: At least one p_i > u,

where p_i = error rate at site i, i = 1, ..., K, and u = desired upper precision limit. For this structure we let α and β be the desired probabilities of the Type I and Type II errors, respectively. Thus, our direct statistical test of H_0* and H_a* will serve as an indirect test of H_0 and H_a. Since it is α and β that we must ultimately control in reaching a conclusion about the internal control system under evaluation, it is important that

we establish explicit relationships between the two hypothesis structures. To accomplish this, we define the following events:

C ≡ the event that we accept the null hypothesis H_0*: γ = γ_0 and therefore conclude that all K sites are in control.
C̄ ≡ the event that we reject the null hypothesis H_0*: γ = γ_0 and therefore conclude that at least one of the sites is out of control.
L ≡ the event that all K sites have error rates less than or equal to u.
L̄ ≡ the event that at least one of the K sites has an error rate greater than u.
G ≡ the event that the generating distribution has parameter γ_0.
Ḡ ≡ the event that the generating distribution has parameter γ_1.

Now we can relate the two hypothesis structures as follows:

α = P(C̄ | L) = P(C̄ ∩ L)/P(L)
  = [P(C̄ ∩ L ∩ G) + P(C̄ ∩ L ∩ Ḡ)]/P(L)
  = [P(C̄ | L ∩ G) P(L | G) P(G) + P(C̄ | L ∩ Ḡ) P(L | Ḡ) P(Ḡ)]/P(L)
  <= [P(C̄ | G) P(L | G) P(G) + P(C̄ | Ḡ) P(L | Ḡ) P(Ḡ)]/P(L).

The latter inequality holds since P(C̄ | L ∩ G) <= P(C̄ | G) and P(C̄ | L ∩ Ḡ) <= P(C̄ | Ḡ). That is, the additional information that all sites have error rates less than or equal to u would not increase the probability of rejecting the null hypothesis (H_0*). Next,

β = P(C | L̄) = P(C ∩ L̄)/P(L̄)
  = [P(C ∩ L̄ ∩ G) + P(C ∩ L̄ ∩ Ḡ)]/P(L̄)

  = [P(C | L̄ ∩ G) P(L̄ | G) P(G) + P(C | L̄ ∩ Ḡ) P(L̄ | Ḡ) P(Ḡ)]/P(L̄)
  <= [P(C | G) P(L̄ | G) P(G) + P(C | Ḡ) P(L̄ | Ḡ) P(Ḡ)]/P(L̄).

The latter inequality holds since P(C | L̄ ∩ G) <= P(C | G) and P(C | L̄ ∩ Ḡ) <= P(C | Ḡ). That is, the additional information that at least one site has an error rate greater than the desired upper precision limit (u) would not increase the probability of accepting the null hypothesis (H_0*). Next we observe that α* = P(C̄ | G) and β* = P(C | Ḡ); then the two inequalities may be rewritten as follows:

α <= [α* P(L | G) P(G) + (1 - β*) P(L | Ḡ) P(Ḡ)]/P(L)

β <= [(1 - α*) P(L̄ | G) P(G) + β* P(L̄ | Ḡ) P(Ḡ)]/P(L̄).

We now have bounds on α and β that are each functions of α* and β* and which serve as initial links between the two hypothesis structures. Our intuition leads us to conjecture that we must have α* < α and β* < β; i.e., the Type I and II risks for the direct statistical test of (H_0*, H_a*) cannot exceed those for the indirect test of primary interest (H_0, H_a). In Appendix A we show that this is indeed true when the necessary bounds on the other components of the expressions are taken into account (e.g., P(L̄) = 1 - P(L), 0 < P(L) < 1, P(L) = P(L | G)P(G) + P(L | Ḡ)P(Ḡ), etc.). When values of α and β are set by an auditor and the two expressions above are solved as equalities, we would expect that the achieved levels of α and β would be less than the nominal values initially set. Next we solve the two inequalities as equalities, and after some rearranging

of terms each of the resulting equations can be solved for the same ratio of probabilities, the odds P(G)/P(Ḡ). Equating the two gives

[P(L | Ḡ)(1 - α - β*)] / [P(L | G)(α - α*)] = [P(L̄ | Ḡ)(β - β*)] / [P(L̄ | G)(1 - β - α*)].   (5.1)

Writing the two equations in this form also demonstrates another link between the two hypothesis structures. Both P(L | G) and P(L | Ḡ) are probabilities that no error rate will be greater than the upper precision limit (u). However, P(L | G) is conditional on H_0* being true (γ = γ_0), while P(L | Ḡ) is conditional on H_a* being true (γ = γ_1 < γ_0). Thus, in the latter case, if P(L | Ḡ) > 0, the error rates may still all be under control (all p_i <= u), although the alternative state of nature under H_a* is the true state of nature.

In order to implement our statistical test we can develop expressions for P(L | G) and P(L | Ḡ) in terms of our generating function. Let Y = max(p_1, p_2, ..., p_K); i.e., we define a new random variable, Y, which is the largest error rate over all K sites. The density of Y is given by

f_Y(y) = Kγ[1 - (1 - y)^γ]^{K-1} (1 - y)^{γ-1} if 0 < y < 1, and 0 otherwise.

Event L is equivalent to the event that Y is less than or equal to u, since if the largest of all the error rates is less than or equal to u then all of them must also be. Therefore,

P[L | G] = ∫_0^u Kγ_0 [1 - (1 - y)^{γ_0}]^{K-1} (1 - y)^{γ_0 - 1} dy
        = [1 - (1 - u)^{γ_0}]^K.   (5.2)

Similarly,

P[L | Ḡ] = [1 - (1 - u)^{γ_1}]^K,   (5.3)

and therefore

P[L̄ | G] = 1 - [1 - (1 - u)^{γ_0}]^K   (5.4)

and

P[L̄ | Ḡ] = 1 - [1 - (1 - u)^{γ_1}]^K.   (5.5)

We can now rewrite the previous expression:

{[1 - (1 - u)^{γ_1}]^K (1 - α - β*)} / {[1 - (1 - u)^{γ_0}]^K (α - α*)}
  = {(1 - [1 - (1 - u)^{γ_1}]^K)(β - β*)} / {(1 - [1 - (1 - u)^{γ_0}]^K)(1 - β - α*)}.   (5.6)

It can be seen now that with expressions (5.2) and (5.3) for P(L | G) and P(L | Ḡ) written in terms of f_P(p), we can solve for γ_0 and γ_1:

γ_0 = ln{1 - [P(L | G)]^{1/K}} / ln(1 - u)   (5.7)

γ_1 = ln{1 - [P(L | Ḡ)]^{1/K}} / ln(1 - u).   (5.8)

Values for K and u will be supplied by the auditor, and values for P(L | G) and P(L | Ḡ) must be arbitrarily set by the one designing the statistical test. In a later example we set P(L | G) = .99 on the basis of the reasoning that, given that H_0* describes the true state of nature for the generating function, we would like a large probability that all sites are under control (P(L | G) = 1.0 is not admissible, since the generating function would be undefined). Likewise, in a later example we set P(L | Ḡ) = .01, since we would like a small probability of all sites being under control when H_a* describes the true state of nature for the generating function (P(L | Ḡ) = 0 is not admissible, since the generating function would be degenerate).4

Expression (5.6) contains eight unknowns, which may be subdivided into two logical groups, as follows:

Auditing Parameters    Statistical Test Parameters
α                      α*
β                      β*
u                      γ_0
K                      γ_1

Among the auditing parameters, α and β are the overall risks that the auditor desires to control, u is the desired upper precision limit, which is determined by auditor judgment, and K is the total number of sites, which should be known. We classify α*, β*, γ_0, and γ_1 as statistical test parameters, since γ_0 and γ_1 are the hypothesized values of γ (the parameter of the generating distribution) and α* and β* are the specified risk levels associated with the hypothesis structure (H_0*, H_a*). We can now perform a direct statistical test of the hypothesis structure

H_0*: γ = γ_0
H_a*: γ = γ_1 < γ_0,

which will be an indirect test of the hypothesis structure

H_0: All p_i <= u,
H_a: At least one p_i > u.

In the next section we describe a logical process for the development of values for the eight parameters.

6. DERIVATION OF PROCEDURES FOR IMPLEMENTATION

To implement our statistical test, a planning phase must be conducted in which we determine the number of sites and the sample size at each site to audit initially. We must also determine the criteria

which will lead to either accepting H_0*, rejecting H_0*, or auditing additional sites. We call this the initial planning phase; if it becomes necessary to audit additional sites, we will have a second planning phase, and so on. At the initial planning phase we must decide on a decision criterion for acceptance of H_0*. In keeping with the expectation of observing few errors and the auditing strategy of minimizing sample size, we develop equations for k and n based on accepting H_0* only when zero errors are observed at all sites audited. From (4.6), recall that the criterion for acceptance of H_0* is

Λ(m) = Σ_{i=0}^m S_i ln[(γ_1 + n - i)/(γ_0 + n - i)] >= ln[(1 - α*)/β*] + k ln(γ_1/γ_0).

With our condition of observing zero errors this results in

k ln[(γ_1 + n)/(γ_0 + n)] >= ln[(1 - α*)/β*] + k ln(γ_1/γ_0).   (6.1)

It can be seen in (6.1) that values for the statistical parameters (α*, β*, γ_0, and γ_1) must be determined before values of k and n can be determined. Even then, there will be many possible pairs of values (k, n), with a relationship such that as k increases (decreases), n will decrease (increase). A formal approach to the selection of values for k and n should include a consideration of sampling costs. Other considerations might include a subjective assessment by the auditor of the relative risk associated with various sites. However, we restrict our development to a consideration of costs only. Let

C_1 = fixed cost of auditing a site, and
C_2 = cost of auditing an item at a site.

We assume that C_1 and C_2 are the same for all sites. Thus, the objective at the initial planning phase would be to select values for k and n such that we minimize

z = C_1 k + C_2 k n = k(C_1 + C_2 n).

However, based on the acceptance criterion for our statistical test (H_0*, H_a*), we must select k and n to satisfy (6.1). We assume that solving (6.1) as an equality will be compatible with the desire to minimize costs, since we will be meeting the minimum requirement for acceptance. By rearranging terms, (6.1) becomes

n = γ_1 {[(1 - α*)/β*]^{1/k} - 1} / {1 - [(1 - α*)/β*]^{1/k} (γ_1/γ_0)}.   (6.2)

Next we substitute (6.2) into the objective function:

minimize z = k (C_1 + C_2 γ_1 {[(1 - α*)/β*]^{1/k} - 1} / {1 - [(1 - α*)/β*]^{1/k} (γ_1/γ_0)}).

This leaves the task of determining an optimal value for k, which will implicitly determine an optimal n. However, the objective function now contains the statistical parameters α*, β*, γ_0, and γ_1. Selection of values for these parameters will be restricted by expression (5.6) developed earlier. Recall that (5.6) resulted from our analysis which related the two hypothesis structures, (H_0*, H_a*) and (H_0, H_a). By rearranging terms in expression (5.6), we are able to write α* as a function of β*.

α* = (b_3 β* + b) / (b_1 β* + b_2),   (6.3)

where

b   = (1 - α)(1 - β) B (1 - A) - α β A (1 - B),   (6.4)
b_1 = A - B,   (6.5)
b_2 = (1 - α) B (1 - A) - β A (1 - B),   (6.6)
b_3 = α A (1 - B) - (1 - β) B (1 - A),   (6.7)
A = P(L | G) = [1 - (1 - u)^{γ_0}]^K,   (6.8)
B = P(L | Ḡ) = [1 - (1 - u)^{γ_1}]^K.   (6.9)

Values for α, β, u, and K will be determined by auditing judgment and knowledge. Values for α*, β*, γ_0, and γ_1 which satisfy (6.3) will guarantee that both hypothesis structures are satisfied. Values for γ_0 and γ_1 will be determined by setting P(L | G) = .99 and P(L | Ḡ) = .01, as discussed earlier. Selection of values for α* and β* will be restricted by our earlier development requiring α* < α and β* < β. Furthermore, with respect to (6.3), it is simple to show that dα*/dβ* < 0. Therefore, as β* approaches β, α* will approach its minimum value, (b_3 β + b)/(b_1 β + b_2) > 0, and for β* = 0, α* will equal its maximum value, b/b_2, which is less than α. Now, since in (6.3) we have written α* as a function of all other variables, we can rewrite the objective function:

minimize z = k (C_1 + C_2 γ_1 {[(β*(b_1 - b_3) + (b_2 - b)) / (β*(b_1 β* + b_2))]^{1/k} - 1} / {1 - [(β*(b_1 - b_3) + (b_2 - b)) / (β*(b_1 β* + b_2))]^{1/k} (γ_1/γ_0)}).   (6.10)
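The mapping (6.3) with coefficients (6.4)-(6.7) is easy to compute. The sketch below is our illustration (function names are ours); the values α = β = .05, A = .99, B = .01 are taken from the later example:

```python
# alpha* as a function of beta* per (6.3)-(6.7); A = P(L|G), B = P(L|G-bar).
def alpha_star(beta_star, alpha, beta, A, B):
    b  = (1 - alpha) * (1 - beta) * B * (1 - A) - alpha * beta * A * (1 - B)  # (6.4)
    b1 = A - B                                                                # (6.5)
    b2 = (1 - alpha) * B * (1 - A) - beta * A * (1 - B)                       # (6.6)
    b3 = alpha * A * (1 - B) - (1 - beta) * B * (1 - A)                       # (6.7)
    return (b3 * beta_star + b) / (b1 * beta_star + b2)

# alpha* is largest at beta* = 0 (where it equals b/b2), stays below alpha,
# and decreases as beta* grows.
a_max = alpha_star(0.00, 0.05, 0.05, 0.99, 0.01)
a_mid = alpha_star(0.04, 0.05, 0.05, 0.99, 0.01)
print(0 < a_mid < a_max < 0.05)  # True
```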

Constraints on the values of the four decision variables are as follows:

0 < β* < β   (6.11)
0 < α* = (b_3 β* + b)/(b_1 β* + b_2) <= b/b_2 < α   (6.12)
0 < γ_0 < ∞   (6.13)
0 < γ_1 < γ_0   (6.14)
1 <= k <= K (integer).   (6.15)

This is a nonlinear, integer (in k) programming problem. As structured in (6.10) through (6.15), it cannot be solved for a global optimum, since the constraints on α*, β*, γ_0, and γ_1 are open sets. Further, it is not immediately apparent that the objective function is of a form that would have a global optimum even if the constraints were not open sets. Our approach6 to solution of the problem is first to assign values for γ_0 and γ_1 based on the rationale that was explained earlier. Then, given values for γ_0 and γ_1, dz/dβ* < 0 for any feasible value of k (1 <= k <= K) when either

0 < β* <= {(b - b_2) - [(b - b_2)^2 + (b - b_2)(b_2/b_1)(b_1 - b_3)]^{1/2}} / (b_1 - b_3)   (6.16)

or

β* >= {(b - b_2) + [(b - b_2)^2 + (b - b_2)(b_2/b_1)(b_1 - b_3)]^{1/2}} / (b_1 - b_3)   (6.17)

(demonstrated in Appendix B). Thus, we are assured that increasing β* in the ranges indicated by (6.16) and (6.17) will decrease the total sampling cost z. However, the range indicated by (6.17) violates the restriction determined earlier, that

β* < β; i.e.,

β < {(b - b_2) + [(b - b_2)^2 + (b - b_2)(b_2/b_1)(b_1 - b_3)]^{1/2}} / (b_1 - b_3) <= β*.

Thus we direct our attention to the range for β* indicated by (6.16). This range is particularly important for the purpose of finding solutions to the cost minimization problem because, when it holds for β*, then 0 < β* < β also holds; i.e.,

0 < β* <= {(b - b_2) - [(b - b_2)^2 + (b - b_2)(b_2/b_1)(b_1 - b_3)]^{1/2}} / (b_1 - b_3) < β.   (6.18)

Also, when (6.16) holds for β*, then for α*,

0 < α* <= b/b_2 < α   (6.19)

(demonstrated in Appendix C). The importance of these findings is that we now have a closed interval for values of β*, and z will be minimized for any k by setting β* equal to its maximum value. Also, the constraint that relates the two hypothesis structures will be satisfied. Further, our initial intuition referred to earlier is fully supported, in that in an optimal sampling plan the Type I and II risks of our direct statistical test, α* and β*, should not exceed those of our indirect test, α and β, respectively. Finally, a global optimum solution can be found by simply enumerating the minimum z values for all values of k and then selecting the lowest one. A formal presentation of the cost minimization problem can now be given.

-27-

    minimize z = k[C₁ + C₂ γ₀γ₁(W^(1/k) − 1)/(γ₀ − γ₁W^(1/k))],
    where W = [β*(b₁ − b₃) + (b₂ − b)] / [β*(b₁β* + b₂)],    (6.20)

subject to

    β* = [(b − b₂) − √((b₂ − b)(b₂b₃ − bb₁)/b₁)] / (b₁ − b₃)    (6.21)

    1 ≤ k ≤ K (integer),    (6.22)

where b, b₁, b₂, b₃ are as defined in (6.4) through (6.7). It should be recognized that the cost minimization problem in (6.20) through (6.22) applies only to the initial planning phase. If it becomes necessary to audit additional sites (i.e., if neither an accept nor a reject decision can be made at the end of the initial sampling phase), the cost minimization problem can be solved again. In the second planning phase the total number of sites remaining would be K − k, and β* would have to be kept at the same value as determined in the solution of the initial-phase problem. Also, the objective function would have to be adjusted to reflect the data observed in the initial phase. Similar logic holds if more than two planning phases are required. (This will be illustrated later in a numerical example.) The solution of successive cost minimization problems as described above is, of course, an approximation of an optimal solution to an unspecified dynamic cost minimization problem. A dynamic model would be more complex in both formulation and solution and would, perhaps, be analytically and computationally infeasible. Thus we do not consider a dynamic

-28-

formulation at this time but rather restrict ourselves to the problem as described in (6.20) through (6.22).

7. COMPUTATIONAL EXAMPLE

As an example of our cost minimization model, we assume that an auditor has specified the following values for the auditing parameters:

    K = 20 sites in total
    α = .05, the Type I error
    β = .05, the Type II error
    u = .01, the desired upper precision limit.

Then the primary hypothesis structure can be stated as

    H₀: pᵢ < .01, i = 1, ..., 20
    Hₐ: at least one pᵢ ≥ .01.

Following our earlier discussion, in order to make the two hypothesis structures approximately equivalent we set

    P(L|G)  = [1 − (1 − u)^γ₀]^K = .99 and
    P(L|Ḡ) = [1 − (1 − u)^γ₁]^K = .01.

Then, inserting the values for u and K, we calculate

    γ₀ = ln(1 − .99^(1/20)) / ln(1 − .01) = 756,
    γ₁ = ln(1 − .01^(1/20)) / ln(1 − .01) = 157.

Thus the hypothesis structure tested directly is

    H₀*: γ = γ₀ = 756
    Hₐ*: γ = γ₁ = 157.
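The two calibration equations above can be checked mechanically; a minimal sketch (the function and variable names are ours):

```python
import math

def gamma_for(prob, u, K):
    # Solve [1 - (1-u)**g]**K = prob for g:
    #   (1-u)**g = 1 - prob**(1/K)  =>  g = ln(1 - prob**(1/K)) / ln(1 - u)
    return math.log(1.0 - prob ** (1.0 / K)) / math.log(1.0 - u)

u, K = 0.01, 20
gamma0 = gamma_for(0.99, u, K)   # calibrates P(L|G)     = .99
gamma1 = gamma_for(0.01, u, K)   # calibrates P(L|G-bar) = .01
```

Rounding gives γ₀ = 756 and γ₁ = 157, the values used in the hypothesis structure tested directly.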

-29-

Next we assume that the fixed cost of auditing a site is C₁ = $100 and the variable cost of auditing an item at a site is C₂ = $1. We now have all of the data inputs necessary to solve the cost minimization problem. Referring back to (6.3) through (6.9), we can calculate

    A = .99          B = .01
    b = −.00236      b₁ = .98
    b₂ = −.04891     b₃ = .04891

    α* = (.04891β* − .00236) / (.98β* − .04891).

Then the cost minimization problem ((6.20) through (6.22)) is

    minimize z = k[100 + (1)(756)(157)(W^(1/k) − 1)/(756 − 157W^(1/k))],
    where W = (.9501β* − .0475) / [β*(β* − .0499)],

subject to

    β* = [.04655 − √((.04655)(.0000794)/.98)] / .93109 = .04791

    1 ≤ k ≤ 20.

Solutions to this problem for all values of k are given in Table 1. The lack of an entry in Table 1 for k = 1 indicates that it is impossible to accept H₀* by auditing only one site. The total cost for each value of k is a minimum, since β* was set at its maximum value in the range (6.18), for which dz/dβ* < 0. The overall minimum total cost occurs for

-30-

(Insert Table 1 here)

k = 6 and n(6) = 158. Thus, the initial sampling plan would be to randomly select 158 items at each of six randomly selected sites. For comparison purposes, if we use the naive model developed earlier and set its two risks at .01 and .005, it is necessary to go to all sites and take a sample of size 585 at each site. The traditional single-site model would require a sample size of 300 for 95 percent reliability and an upper precision limit of .01. Continuing our numerical example, the initial sampling plan is to take samples of size n = 158 at k = 6 sites and, if no errors are observed, to accept H₀*. The next question to consider is how many errors it would take to reject H₀*. Recall that our test statistic is

-31-

    A(m) = Σ_{i=0}^{m} sᵢ ln[(γ₁ + n − i)/(γ₀ + n − i)],

where sᵢ is the number of sites with at least i errors (so that s₀ = k), and our rejection criterion is

    A(m) ≤ C_R(α*, β*) = ln(β*/(1 − α*)) + k ln(γ₁/γ₀).

For our numerical example, C_R(α*, β*) = −14.13. Thus we want to find the minimum value of m (the number of errors) such that

    A(m) = Σ_{i=0}^{m} sᵢ ln[(γ₁ + n − i)/(γ₀ + n − i)] ≤ −14.13.

Finding a minimum value of m is complicated by the fact that, for m > 1, A(m) takes on different values depending on how the m errors are distributed over the k sites. Thus it is possible that for a given m, some distributions will lead to rejection while others will lead to continued sampling. This is realistic, and it reflects the richness of our model, which is sensitive both to the number of errors and to how the errors are distributed throughout the k sites. The maximum value of the test statistic A(m), for a given value of m, is achieved when the m errors are distributed among the k sites as evenly as possible. We call this value A_MAX(m). The minimum value of the test statistic is achieved when the m errors all occur at the same site. We call this value A_MIN(m). In building a decision rule for the hypothesis test we will first check whether A_MIN(m) is larger than C_R(α*, β*). If it is, we will continue to sample more sites for all distributions of m errors. If A_MIN(m) is smaller than C_R(α*, β*), we will check whether A_MAX(m) is also less than C_R(α*, β*). If this is the case, we will

-32-

reject H₀* for all distributions of m errors. However, if A_MAX(m) is larger and A_MIN(m) is smaller than C_R(α*, β*), then we must check all possible distributions of the m errors among the k sites to see whether we reject or continue to sample. In Appendix D we present an integer programming model whose optimal solutions determine whether a distribution of m errors results in rejection or in additional sampling. In our numerical example we do not face the problem just described. For m = 8 errors in any distribution over the k = 6 sites we have A(8) < −14.13, while for m = 7 we have A(7) > −14.13 for every distribution. Thus, a complete decision rule for the initial sampling phase is as follows. Take samples of size n = 158 at k = 6 sites, and if

    (1) zero errors are observed, accept H₀*; or if
    (2) m ≥ 8 errors are observed, reject H₀*; or if
    (3) 0 < m ≤ 7 errors are observed, continue sampling.

Next, continuing our example, suppose m = 4 errors are observed at the k = 6 sites in the distribution (2, 0, 1, 1, 0, 0). Since the decision would be to continue sampling, we must determine how many additional sites to audit and the required sample size at those sites. We let A_k be the value of our test statistic A(m) given the results of auditing the initial k = 6 sites. We assume that the criterion for acceptance of H₀* at the second sampling stage will be that no additional errors are observed. Then, for acceptance of H₀*, we must have

    A_k + Σ_{i=0}^{m′} s′ᵢ ln[(γ₁ + n′ − i)/(γ₀ + n′ − i)] ≥ ln((1 − β*)/α*) + (k + k′) ln(γ₁/γ₀),    (1)

where

    m′ = total number of additional errors,
    s′ᵢ = number of additional sites with at least i errors,
    n′ = sample size at the additional sites,
    k′ = number of additional sites to audit.

-33-

For acceptance of H₀* with m′ = 0 we must have

    n′ ≥ [γ₀P(k′) − γ₁] / [1 − P(k′)],

where

    P(k′) = [((1 − β*)/α*) e^(−A_k)]^(1/k′) (γ₁/γ₀)^((k+k′)/k′).

Since β*, γ₀, and γ₁ must remain the same as in the initial planning stage, and we assume that the sampling costs stay the same (C₁ = $100, C₂ = $1), all that remains is to calculate the total sampling cost for every possible value of k′. The second stage problem can then be stated as

    minimize z′ = k′(100 + (1)n′) = k′[100 + (1)(γ₀P(k′) − γ₁)/(1 − P(k′))]

subject to

    1 ≤ k′ ≤ K − k = 14.

In Table 2 we give the results of solving the second stage problem. They indicate that at least three additional sites must be audited; however, minimum cost is achieved by auditing eight additional sites with a sample size of 173 at each site. The number of errors required for rejection, or for continued sampling, can be calculated in a way analogous to the procedure given for the initial planning phase. This concludes our numerical example. Of course, it may be necessary to go through more than two planning phases, especially when K is large. Specific stopping rules must be developed, since it is possible to go through many planning phases without either accepting or rejecting H₀*. An auditor may limit the number of planning phases, limit the total sample size that will be taken, or limit the total number of

-34-

(Insert Table 2 here)

sites that will be audited. The specific limit(s) an auditor decides upon will depend on cost-benefit considerations and on the potential impact that more sampling will have on his evaluation of the internal control procedure being tested. Presumably, not being able either to accept or to reject H₀* will cause an auditor to lessen his reliance on an internal control procedure. The judgments invoked by an auditor in placing special limitations on the statistical test will have an impact on the planned risks associated with the test.
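The second-stage sample sizes can be approximated as follows. This is a sketch under our reconstruction of acceptance condition (1) and of P(k′); the value of ln((1 − β*)/α*) is recovered from the cost-function kernel W with the rounded printed coefficients, so the results track Table 2 only to within a few percent (k′ = 8, for example, comes out near 171 rather than 173):

```python
import math

g0, g1 = 756.0, 157.0          # gamma_0 and gamma_1
K, k, n = 20, 6, 158           # total sites, initial sites audited, initial sample size

def t(i):
    # per-error term of the test statistic, ln[(g1+n-i)/(g0+n-i)]
    return math.log((g1 + n - i) / (g0 + n - i))

# Test-statistic value A_k for the observed pattern (2,0,1,1,0,0):
# s0 = 6 sites, s1 = 3 sites with >= 1 error, s2 = 1 site with >= 2 errors.
Ak = 6 * t(0) + 3 * t(1) + 1 * t(2)

# ln((1-beta*)/alpha*) read off the kernel W = (.9501*b - .0475)/(b*(b - .0499))
b = 0.04791                    # beta* at its maximum value
lnW = math.log((0.9501 * b - 0.0475) / (b * (b - 0.0499)))

def n_prime(kp):
    # Accept H0* with zero further errors when
    # Ak + kp*ln[(g1+n')/(g0+n')] >= lnW + (k+kp)*ln(g1/g0).
    P = math.exp((lnW - Ak + (k + kp) * math.log(g1 / g0)) / kp)
    if P >= 1.0:
        return None            # no finite sample size permits acceptance
    return (g0 * P - g1) / (1.0 - P)

plans = {kp: n_prime(kp) for kp in range(1, K - k + 1)}
```

Infeasible entries (k′ = 1, 2) correspond to P(k′) ≥ 1: no finite second-stage sample size permits acceptance, which is why Table 2 begins at k′ = 3.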

-35-

8. SUMMARY

Models for testing compliance with an internal accounting control procedure that exists at multiple sites have been the subject of this paper. Current statistical procedures are designed only for single sites, although they are sometimes used for the multiple-site problem. The first statistical model we suggested for the multiple-site problem was a naive sequential probability ratio model. That model was naive in the sense that its probabilistic nature was generated solely by the sampling design. We showed that using the naive model would generally require auditing most of the multiple sites. The model of primary interest was also a sequential probability ratio model. However, the probabilistic nature of that model was generated both by the sampling design and by an assumption about the generation of error rates at the multiple sites. The basis for the assumption was that many clients with operations at multiple sites have well-prescribed and documented procedures for internal accounting control activities. However, different personnel have responsibility for the internal control activities. Thus, while a case can be made that there are similarities in the way errors are generated at the sites, a case can also be made that there are differences caused by the different personnel. Our assumption was then manifested in a common generating function for the error rates. The error rates generated by that function were, however, independently generated and likely to differ from each other. Another important characteristic of the common generating function was that the probability mass was greater on low error rates than on high error rates. This reflects beliefs that have been expressed

-36-

by auditors and which appear to be quite typical. It also, of course, gives direction for a logical extension of the model into a full-blown Bayesian model, on which we are currently working. For implementation purposes we cast our model within a total sampling cost minimization problem and derived the conditions for finding an optimal solution. Numerical examples gave results showing the potential for substantial reductions in sampling effort (and costs) when compared with the requirements of the naive model developed earlier. For repetitive use of our model in practice it would be simple to develop an interactive computer program that would produce a sampling plan for a specific application. An auditor would input all auditing parameters, and the program would output the required sample size and number of sites to audit that minimize total cost, as well as the number of errors necessary to cause rejection or continued auditing of more sites. Because of the number of parameters that must be input by an auditor (α, β, u, K, C₁, C₂), we believe that the development of books of tables, as traditionally done, would not be feasible. Besides the development of a Bayesian model, another area for research would be to consider the robustness of our results to other assumptions about the form of the generating function.

-37-

APPENDIX A

In this appendix we develop bounds on α* and β* for use in the implementation of the sequential test. For notational convenience we let A = P(L|G), B = P(L|Ḡ), and W = P(G). Then,

    P(L) = AW + B(1 − W) and P(L̄) = (1 − A)W + (1 − B)(1 − W),

and the two inequalities developed earlier may be written more compactly as

    [α*AW + (1 − β*)B(1 − W)] / [AW + B(1 − W)] ≤ α    (A.1)

    [(1 − α*)(1 − A)W + β*(1 − B)(1 − W)] / [(1 − A)W + (1 − B)(1 − W)] ≤ β    (A.2)

If (A.1) and (A.2) are solved as equalities, the realized values of α and β should be less than the nominal values initially specified. We now write them as equalities and solve each one for W:

    W = B(1 − α − β*) / [A(α − α*) + B(1 − α − β*)]    (A.3)

    W = (1 − B)(β − β*) / [(1 − A)(1 − α* − β) + (1 − B)(β − β*)]    (A.4)

We cannot freely choose values for A, B, and W in (A.3) and (A.4), since we must have P(L) = AW + B(1 − W). Also, as discussed in an earlier section of the paper, we will set 0 < B < A < 1, and, therefore, B < P(L) < A and 0 < W < 1. Thus, with the latter condition holding for W, we are assured that P(L) will be within its bounds. We now consider bounds on α* and β* so that the bounds on W will be satisfied.

Analysis of (A.3)

    For W < 1, α* < α.
    For W > 0, β* < 1 − α.

-38-

Analysis of (A.4)

For W = 0, β* = β. For the other cases, both the numerator and the denominator of (A.4) must be positive, or both must be negative. For positive numerator and denominator of (A.4), β* < β, and

    For W < 1, α* < 1 − β.
    For W > 0, β* < β.

For negative numerator and denominator of (A.4), β* > β + [(1 − A)/(1 − B)](1 − α* − β), and

    For W < 1, α* > 1 − β.
    For W > 0, β* > β.

A summary of the analysis of (A.3) and (A.4) is below.

    Equation                                              W > 0         W < 1
    (A.3)                                                 β* < 1 − α    α* < α
    (A.4) (with β* < β)                                   β* < β        α* < 1 − β
    (A.4) (with β* > β + [(1−A)/(1−B)](1 − α* − β))       β* > β        α* > 1 − β

Equations (A.3) and (A.4) must be simultaneously satisfied; therefore, the bounds on α* and β* (assuming α < 1 − β and β < 1 − α) must be

    α* ≤ α < 1 − β,    (A.5)
    β* ≤ β < 1 − α.    (A.6)

In the implementation of our sequential probability ratio test, if we follow restrictions (A.5) and (A.6), then we are assured that W and P(L) will be within their respective bounds.

-39-

Although (A.5) and (A.6) guarantee that the bounds on W and P(L) will be satisfied, we must make one further alteration of (A.5) and (A.6). Since we assume in general that β < 1 − α and α < 1 − β, then with α* = α and β* = 1 − α in (A.3), equation (A.4) will not be satisfied. Likewise, with β* = β and α* = 1 − β in (A.4), equation (A.3) will not be satisfied. Thus, our final restrictions on α* and β* are

    α* < α and    (A.7)
    β* < β.       (A.8)
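The role of restrictions (A.7) and (A.8) can be spot-checked numerically: for any α* < α and β* < β, both solved forms of W, (A.3) and (A.4) as reconstructed above, land strictly inside (0, 1). A small sketch using the example values A = .99 and B = .01:

```python
A, B = 0.99, 0.01          # P(L|G) and P(L|G-bar)
alpha = beta = 0.05

def W3(a_s, b_s):
    # W solved from (A.1) as an equality, equation (A.3)
    return B * (1 - alpha - b_s) / (A * (alpha - a_s) + B * (1 - alpha - b_s))

def W4(a_s, b_s):
    # W solved from (A.2) as an equality, equation (A.4)
    num = (1 - B) * (beta - b_s)
    return num / ((1 - A) * (1 - a_s - beta) + num)

grid = [i / 1000 for i in range(1, 50)]        # alpha*, beta* ranging over (0, .05)
ok = all(0 < W3(a, b) < 1 and 0 < W4(a, b) < 1 for a in grid for b in grid)
```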

-40-

APPENDIX B

In this appendix we show the general conditions for dZ/dβ* < 0, where Z is the total cost function to be minimized. A general form for Z was given in expression (6.20), and for convenience we rewrite it here:

    minimize Z = k[C₁ + C₂ γ₀γ₁(P^(1/k) − 1)/(γ₀ − γ₁P^(1/k))],    (6.20)

where

    P = [β*(b₁ − b₃) + (b₂ − b)] / [β*(b₁β* + b₂)].

We set positive finite values for γ₀ and γ₁ according to criteria that are independent of the minimization procedure. Then, for a fixed value of k (1 ≤ k ≤ K), we study the behavior of Z as β* is allowed to vary:

    dZ/dβ* = k C₂ γ₀γ₁ (1/k) P^((1/k)−1) [(γ₀ − γ₁)/(γ₀ − γ₁P^(1/k))²] Q,

where

    Q = dP/dβ* = −(b₁ − b₃)b₁/(b₁β* + b₂)² − (b₂ − b)(2b₁β* + b₂)/[β*(b₁β* + b₂)]²,

and C₂, γ₀, γ₁, and k are as defined previously. We know that P > 0, since P = (1 − β*)/α* > 0. Also,

-41-

(1/k) P^((1/k)−1) (γ₀ − γ₁)/(γ₀ − γ₁P^(1/k))² > 0; C₂ > 0; and γ₀ > γ₁ > 0. Therefore, to show that dZ/dβ* < 0, we must show that Q < 0. For Q < 0, we must have

    −(b₁ − b₃)b₁/(b₁β* + b₂)² − (b₂ − b)(2b₁β* + b₂)/[β*(b₁β* + b₂)]² < 0;

i.e., since b₁ = A − B > 0 and [β*(b₁β* + b₂)]² > 0,

    (b₁ − b₃)b₁β*² + (b₂ − b)(2b₁β* + b₂) > 0.

The expression above is quadratic in β*, and by completing the square we obtain the ranges of β* that give Q < 0 and dZ/dβ* < 0:

    β* ≤ [(b − b₂) − √((b₂ − b)(b₂b₃ − bb₁)/b₁)] / (b₁ − b₃), or    (6.16)

    β* ≥ [(b − b₂) + √((b₂ − b)(b₂b₃ − bb₁)/b₁)] / (b₁ − b₃).       (6.17)

In addition, it is not difficult to show that a value of β* satisfying (6.16) will also satisfy β* < β, and that a value of β* satisfying (6.17) will also satisfy β* > β. The latter result is not admissible, however, as we showed in Appendix A. Therefore, it is (6.16) that is useful for implementation purposes.
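With the b values printed in the Section 7 example, the quadratic above can be evaluated directly; its lower root reproduces the β* = .04791 used there (a sketch; the five-decimal rounding of the b's limits precision):

```python
import math

b, b1, b2, b3 = -0.00236, 0.98, -0.04891, 0.04891

def quad(bs):
    # (b1-b3)*b1*bs^2 + (b2-b)*(2*b1*bs + b2); Q < 0 exactly when this is > 0
    return (b1 - b3) * b1 * bs**2 + (b2 - b) * (2 * b1 * bs + b2)

root = math.sqrt((b2 - b) * (b2 * b3 - b * b1) / b1)
lo = ((b - b2) - root) / (b1 - b3)    # upper end of range (6.16)
hi = ((b - b2) + root) / (b1 - b3)    # lower end of range (6.17)
```

Here lo ≈ .0479 (the β*_max of the example) and hi ≈ .0521 > β = .05, which is why the range (6.17) is inadmissible.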

-42-

APPENDIX C

In this appendix we show that, for 0 ≤ β* ≤ β*_max, we have 0 < α* < α. In (6.3) we derived the relationship between α* and β* that guarantees the two hypothesis structures are satisfied:

    α* = (b₃β* + b) / (b₁β* + b₂).    (6.3)

Next, we derive

    dα*/dβ* = (b₂b₃ − bb₁) / (b₁β* + b₂)².

For dα*/dβ* < 0 we must have b₂b₃ − bb₁ < 0. Substituting for b, b₁, b₂, b₃ (as given in (6.4) through (6.7)),

    b₂b₃ − bb₁ = −AB(1 − A)(1 − B)(α + β − 1)² < 0,

which holds because AB(1 − A)(1 − B) > 0 always. Therefore, with dα*/dβ* < 0, we know that α* attains its maximum (minimum) value when β* is at its minimum (maximum) value. Given the results of Appendix B, the bounds on β* are

    0 ≤ β* ≤ β*_max = [(b − b₂) − √((b₂ − b)(b₂b₃ − bb₁)/b₁)] / (b₁ − b₃) < β.

For β* = 0, the maximum value of α* is

    α*_max = b/b₂.

-43-

Substituting for b and b₂ (as given in (6.4) and (6.6)), it is not difficult to show that

    0 < b/b₂ < α.

If we let β* = β*_max, then the minimum value of α* is

    α*_min = (b₃β*_max + b) / (b₁β*_max + b₂).

Substituting for b, b₁, b₂, b₃ (from (6.4) through (6.7)), we find that b₃β*_max + b < 0 and b₁β*_max + b₂ < 0, so that

    0 < (b₃β*_max + b)/(b₁β*_max + b₂) < 1.

Summarizing, if β* is in the interval 0 ≤ β* ≤ β*_max < β, then α* is in the interval

    0 < (b₃β*_max + b)/(b₁β*_max + b₂) ≤ α* ≤ b/b₂ < α.
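The ordering derived above can be confirmed numerically with the printed b values. Note that α*_min is a quotient of two near-zero differences, so with five-decimal b's its digits are numerically sensitive; the point of the check is the ordering 0 < α*_min ≤ α* ≤ α*_max < α, not the exact values:

```python
import math

b, b1, b2, b3 = -0.00236, 0.98, -0.04891, 0.04891
alpha = 0.05

def a_star(bs):
    # relationship (6.3) between alpha* and beta*
    return (b3 * bs + b) / (b1 * bs + b2)

bs_max = ((b - b2) - math.sqrt((b2 - b) * (b2 * b3 - b * b1) / b1)) / (b1 - b3)

a_max = a_star(0.0)       # equals b/b2, attained at beta* = 0
a_min = a_star(bs_max)    # attained at beta* = beta*_max
```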

-44-

APPENDIX D

In developing a decision rule for the sequential model, the troublesome case occurs when the minimum value of the test statistic for a fixed value of m, A_MIN(m), is less than or equal to C_R(α*, β*), but the maximum value, A_MAX(m), is greater than C_R(α*, β*). In that case the distribution of the m errors over the k sites affects the decision of whether to reject H₀* or to audit additional sites. Even for moderate values of k and m it is cumbersome to enumerate all possible distributions of the m errors. A more systematic approach is to view the problem as one of optimization. Consider the following linear integer programming model, in which we find values of the Sⱼ so as to

    minimize Z = A(m) = Σ_{j=0}^{m} Sⱼ ln[(γ₁ + n − j)/(γ₀ + n − j)]

subject to

    A(m) ≤ ln(β*/(1 − α*)) + k ln(γ₁/γ₀) = C_R(α*, β*)    (D.1)
    S₀ = k                                                (D.2)
    Σ_{j=1}^{m} Sⱼ = m                                    (D.3)
    S₀ ≥ S₁, S₁ ≥ S₂, ..., S_{m−1} ≥ S_m                  (D.4)
    Sⱼ ≥ 0                                                (D.5)
    Sⱼ integer, j = 0, ..., m.                            (D.6)

-45-

We recognize the test statistic, A(m), as the objective of the minimization. The only unknown values are the Sⱼ (j = 1, ..., m), and constraint (D.1) guarantees that solution values of the Sⱼ correspond to error distributions that cause rejection of H₀*. We are certain that one such solution exists because we are considering the troublesome case. In fact, the solution of the model above yields the minimum possible value of A(m) (let min Z = Z₁), and the Sⱼ take the values S₁ = S₂ = ... = S_m = 1 and S₀ = k. This solution corresponds to all m errors being observed at one site. Constraints (D.2) and (D.3) guarantee that the specific values of k and m for the given situation are properly related to the Sⱼ. Constraint set (D.4) preserves the ordering of the Sⱼ as they are defined. Constraint (D.5) is the usual nonnegativity requirement, and (D.6) reflects the fact that only integer values can be observed. Our goal is to find all sets of values of the Sⱼ that yield values of A(m) between the minimum and the maximum and that cause rejection of H₀* (i.e., that yield A(m) ≤ C_R(α*, β*)). Any other set of values of the Sⱼ indicates that we should audit more sites. To accomplish our goal we adjust the optimization model by adding a new constraint,

    A(m) ≥ Z₁ + e.    (D.1.a)

Recall that Z₁ is the minimum value of the objective function found in the solution of our first model. By adding a nonnegative increment e to Z₁ we guarantee that we will not obtain the same optimal solution (if one exists) to our second model as we did to our first.

-46-

Suppose that an optimal solution to our second model exists, yielding minimum Z = Z₂. We then have another distribution of the m errors that causes rejection of H₀*, and we can proceed to search for more. The search for distributions of m errors that cause rejection of H₀* is a recursive process: after an optimal solution to the previous program is found, a new optimization program is formed by altering the right-hand side of constraint (D.1.a). These recursive models have the general form

    minimize Z_h = A(m)

subject to

    A(m) ≤ C_R(α*, β*)                      (D.1)
    A(m) ≥ Z_{h−1} + e                      (D.1.a)
    S₀ = k                                  (D.2)
    Σ_{j=1}^{m} Sⱼ = m                      (D.3)
    S₀ ≥ S₁, S₁ ≥ S₂, ..., S_{m−1} ≥ S_m    (D.4)
    Sⱼ ≥ 0                                  (D.5)
    Sⱼ integer, j = 0, ..., m,              (D.6)

where

    h = 2, ..., p,
    p = the number of different distributions of the m errors that cause rejection of H₀*,
    Z₁ = the minimum value of A(m) over all possible distributions of the m errors.

-47-

Of course, a priori we do not know the value of p. We continue to generate solutions recursively until we reach a program for which no feasible solution exists. At that point we will have generated p different solutions, representing the p different distributions of the m errors at the k sites that cause rejection of H₀*. The only remaining question is whether we have found all possible sets of values of the Sⱼ that cause rejection of H₀*; that is, having found p of them, how do we know we have not excluded any? In terms of the objective function's value, the p solutions we have found generate p values of our test statistic, A(m), with the relationships

    Z₁ < Z₂ < ... < Z_p ≤ C_R(α*, β*).

The question is, have we excluded any solutions that would yield Z values in the interval from Z₁ to C_R(α*, β*)? The answer lies in the selection of the nonnegative increment e in constraint (D.1.a). The increment e is what forces a different solution to be found and, therefore, a different value of Z. If e is too large, it is possible that we could unintentionally miss a solution that would cause rejection of H₀*. The minimum increase in Z is

    ln[(γ₁ + n − 1)(γ₀ + n − 2) / ((γ₀ + n − 1)(γ₁ + n − 2))],

which we can use to set the value of e. If the value of e is in the interval

    0 < e < ln[(γ₁ + n − 1)(γ₀ + n − 2) / ((γ₀ + n − 1)(γ₁ + n − 2))],

then constraint (D.1.a) will provide the condition necessary to guarantee

-48-

that the p solutions we obtain represent all possible distributions of the m errors that cause rejection of H₀*.
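For values of k and m as small as those in the Section 7 example, the recursive programs can be cross-checked by brute force: enumerate every distribution of m errors over the k sites, evaluate A(m), and compare with C_R(α*, β*). The sketch below confirms the decision rule of Section 7 (every distribution of m = 8 rejects; every distribution of m = 7 continues) and evaluates the bound on e:

```python
import math
from itertools import combinations_with_replacement

g0, g1, n, k, CR = 756, 157, 158, 6, -14.13   # values from the Section 7 example

def t(i):
    # per-error term of the test statistic
    return math.log((g1 + n - i) / (g0 + n - i))

def A(dist):
    # dist = errors observed at each of the k sites (order irrelevant)
    top = max(dist)
    s = [sum(1 for d in dist if d >= i) for i in range(top + 1)]  # s[0] = k
    return sum(s[i] * t(i) for i in range(top + 1))

def partitions(m, k):
    # every unordered distribution of m errors over k sites
    return [c for c in combinations_with_replacement(range(m + 1), k) if sum(c) == m]

rejects8 = [p for p in partitions(8, k) if A(p) <= CR]     # distributions rejecting H0*
continues7 = [p for p in partitions(7, k) if A(p) > CR]    # distributions that continue

# The paper's bound on the increment e in (D.1.a):
eps_bound = t(1) - t(2)
```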

-49-

FOOTNOTES

1. The maximum tolerable error rate is also referred to as the desired upper precision limit.

2. One international auditing firm has a client with over 1,000 geographically separated sites, and the auditing firm conducts attribute tests at no more than thirty sites each year.

3. AICPA, "Statement on Auditing Standards No. 1," Section 320, paragraphs .35 through .42.

4. If we could have P(L|G) = 1.0 and P(L|Ḡ) = 0, and also P(L) = P(G) = .5, then we would have α* = α and β* = β.

5. α* is not really a decision variable, since it is determined by β*.

6. The advice of Professor K. Murty on finding an optimal solution to this problem was very helpful and is appreciated.

-50-

TABLE 1. Initial Planning Phase: Alternative Sampling Plans

     k     n(k)    Total Cost
     2   10,557      $21,314
     3      642        2,226
     4      321        1,684
     5      212        1,560
     6      158        1,548
     7      126        1,582
     8      105        1,640
     9       89        1,701
    10       78        1,780
    11       69        1,859
    12       62        1,944
    13       57        2,041
    14       52        2,128
    15       48        2,220
    16       44        2,304
    17       41        2,397
    18       39        2,502
    19       36        2,584
    20       34        2,680
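Table 1 can be reproduced to a good approximation from our reading of (6.20): with W = (.9501β* − .0475)/(β*(β* − .0499)) evaluated at β* = .04791, the per-site sample size is n(k) = γ₀γ₁(W^(1/k) − 1)/(γ₀ − γ₁W^(1/k)). This is a reconstruction, and the five-decimal rounding of the printed coefficients limits its accuracy; a sketch:

```python
g0, g1 = 756.0, 157.0
bstar = 0.04791
W = (0.9501 * bstar - 0.0475) / (bstar * (bstar - 0.0499))

def n_of_k(k):
    # required per-site sample size so that zero errors at k sites accepts H0*
    u = W ** (1.0 / k)
    return g0 * g1 * (u - 1.0) / (g0 - g1 * u)

table1 = {2: 10557, 3: 642, 4: 321, 5: 212, 6: 158, 7: 126, 8: 105,
          9: 89, 10: 78, 11: 69, 12: 62, 13: 57, 14: 52, 15: 48,
          16: 44, 17: 41, 18: 39, 19: 36, 20: 34}
```

Minimizing k(100 + n(k)) over k = 2, ..., 20 with this reconstruction picks out the same optimum as the table, k = 6.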

-51-

TABLE 2. Second Planning Phase: Alternative Sampling Plans

    k'    n'(k')    Total Cost
     3     3,792      $11,676
     4       769        3,476
     5       418        2,590
     6       285        2,310
     7       215        2,205
     8       173        2,184
     9       144        2,196
    10       123        2,230
    11       108        2,288
    12        96        2,352
    13        86        2,418
    14        78        2,492

-52-

FIGURE A. Shape of Generating Distribution with γ > 1

[Figure: the density p(p) is plotted against the error rate p, declining from the point (0, γ) on the vertical axis to the point (1, 0).]

-53-

REFERENCES

Auditing Standards Committee (1972), Statement on Auditing Standards No. 1: Codification of Auditing Standards and Procedures, New York: American Institute of Certified Public Accountants.

Corless, J. C. (1972), "Assessing Prior Distributions for Applying Bayesian Statistics in Auditing," The Accounting Review, XLVII, No. 3, 556-566.

Crosby, M. A. (1979), "Bayesian Statistics in Auditing: A Comparison of Probability Elicitation Techniques," Working Paper #702, Purdue University, Krannert Graduate School of Management.

_____ (1979), "Bayesian Statistics in Auditing: An Examination of the Sample Size Decision," Working Paper #705, Purdue University, Krannert Graduate School of Management.

Felix, W. L., and Grimlund (1977), "A Sampling Model for Audit Tests of Composite Balances," Journal of Accounting Research, 15, No. 1, 23-41.

Francisco, A. K. (1972), "The Application of Bayesian Statistics to Auditing: Discrete Versus Continuous Prior Distributions," unpublished Ph.D. thesis, Michigan State University.

Leslie, D. A. (1979), "Auditing in a Multilocation Environment (Phase I)," paper presented at the American Statistical Association Meetings, Washington, D.C.

Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), Introduction to the Theory of Statistics, 3rd Edition, New York: McGraw-Hill Book Co.

-54-

REFERENCES, continued

Roberts, D. M. (1974), "A Proposed Sequential Sampling Plan for Compliance Testing," Symposium on Auditing Research I, University of Illinois at Urbana-Champaign.

Vance, L. (1950), Scientific Method for Auditing, Berkeley and Los Angeles: University of California Press.

Wald, A. (1947), Sequential Analysis, New York: John Wiley and Sons, Inc.