TRUNCATED SPRT FOR NON-STATIONARY PROCESSES: SENSITIVITY OF ASSUMPTIONS

Technical Report 83-21

Damodar Y. Golhar
Stephen M. Pollock

Industrial and Operations Engineering Department
The University of Michigan, Ann Arbor

Summary: There is little literature on the truncated SPRT when the distribution parameters of the observations change over time. We develop a truncated SPRT for observations from a Normal distribution whose parameters increase linearly over time. The truncation point, specified in advance, gives error probabilities within the desired limits. The method is developed for two different assumptions about the non-stationary observations: i) the observations are independent and ii) the differences between successive observations are independent. The sensitivity of the results to these assumptions is studied.

1. Introduction

Most of the literature on the truncated SPRT deals with the situation where the random variables $X_1, X_2, \ldots$ are independent and identically distributed (IID). In many situations, however (see Golhar (4)), the sequential observations $X_1, X_2, \ldots$ are continuous random variables whose distribution parameters change with time, and thus form a non-stationary process. These time trends should provide additional information for efficient hypothesis testing.

Anderson (1) and Armitage (2) considered IID Normal random variables when studying the behavior of the operating characteristic and ASN functions of the truncated SPRT. Madsen (5) gave approximate stopping bounds and the truncation point using numerical integration for IID observations. He noted, however, that solving those equations recursively could be hard and that, in practice, it might not be possible to obtain both the stopping bounds and the truncation point such that the resulting test gives actual error probabilities less than or equal to the desired error probabilities. Aroian and Robison (3) showed that, for a small truncation point and IID Normal observations, actual error probabilities can be computed numerically to any desired degree of accuracy. Their method, however, becomes tedious for large truncation points.

There is also some literature on the untruncated SPRT for non-stationary processes. Phatarfod (6) considered Markovian dependence among discrete random variables, assuming the same transition probability matrix at each sampling stage, and derived expressions for the operating characteristic function and the ASN function. Siegmund (8, 9) obtained an expression for the ASN function when independent random variables $X_1, X_2, \ldots$ have means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$. Phatarfod (7) also developed relationships for the ASN and operating characteristic functions for continuous Normal random variables with Markovian dependence, when testing hypotheses about the correlation between successive observations. Thus, the literature on the truncated SPRT is for IID random variables, and the literature on non-stationary processes deals only with the untruncated SPRT.

Here we consider truncation for non-stationary processes. When using a truncated SPRT it is desirable to specify (in advance) a truncation point such that the resulting test gives the minimum expected number of observations subject to a constraint on the desired error probabilities. We find such a truncation point when the mean and variance of the Normal sequential observations $X_1, X_2, \ldots$ increase linearly over time. This linear trend can be due to one of two possible underlying behaviors: i) the sequential observations are independent or ii) the differences between successive observations are independent. We find appropriate truncated tests and investigate their sensitivity to these assumptions. Let $f_i(X_i \mid W_j)$ be the density function of the random variable $X_i$ at time $i$ under hypothesis $W_j$, for $j = 0, 1$.

Then the log-likelihood ratio at time $i$ is

$$ z_i = \ln \frac{f_i(X_i \mid W_1)}{f_i(X_i \mid W_0)} \qquad (1) $$

If the $X_i$'s are independent, the log-likelihood ratio at time $n$ is

$$ Z_n = \sum_{i=1}^{n} z_i . $$

Let $\alpha_d$ and $\beta_d$ be the desired error probabilities of type I and type II respectively. Then Wald's (10) approximate lower and upper stopping bounds are

$$ a = \ln \frac{\beta_d}{1 - \alpha_d}, \qquad b = \ln \frac{1 - \beta_d}{\alpha_d} \qquad (2) $$

Wald (10) proposed the following decision rule for the SPRT truncated at time $m$: reject $W_0$ if $Z_n \ge b$ and accept $W_0$ if $Z_n \le a$, for $n = 1, 2, \ldots, m$; take one more observation if $a < Z_n < b$, for $n = 1, 2, \ldots, m-1$. If the experiment has not stopped by time $m$, then

$$ \text{reject } W_0 \text{ if } 0 < Z_m < b, \qquad \text{accept } W_0 \text{ if } a < Z_m \le 0. \qquad (3) $$

2. Truncation for the SPRT when observations are independent

Assume that the random variables $X_i$ are independent and normally distributed with unknown mean $i\mu$ and known variance $i\sigma^2$. Thus, under $W_j$, $X_i \sim N(i\mu_j, i\sigma_j^2)$ for $j = 0, 1$ and all $i = 1, 2, \ldots$. Then, from relation (1), the log-likelihood ratio $z_i$ at time $i$ is:

$$ z_i = \ln \frac{\sigma_0}{\sigma_1} - \frac{(X_i - i\mu_1)^2}{2 i \sigma_1^2} + \frac{(X_i - i\mu_0)^2}{2 i \sigma_0^2} $$

We now assume* that $\sigma_1 = \sigma_0 = \sigma$ to get

$$ z_i = \frac{\mu_1 - \mu_0}{\sigma^2} X_i - \frac{i(\mu_1^2 - \mu_0^2)}{2\sigma^2} \qquad (4) $$

Since $z_i$ is a linear function of $X_i$ alone, it is independent of $z_k$ for $k \ne i$ and is normally distributed. Taking moments of $X_i$ in (4) gives

$$ z_i \mid W_1 \sim N\!\left(\frac{i d^2}{2},\ i d^2\right), \qquad z_i \mid W_0 \sim N\!\left(-\frac{i d^2}{2},\ i d^2\right) \qquad (5) $$

where $d = (\mu_1 - \mu_0)/\sigma$. Since the $z_i$'s are independent, the log-likelihood ratio at time $n$ is

$$ Z_n = \sum_{i=1}^{n} z_i . $$

*If $\sigma_0 \ne \sigma_1$, the density function of $z_i$ becomes non-Normal and, although the details of this case can be worked out, the analysis becomes complicated and does not contribute to the general conclusions reported here.
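Relations (2) through (4) translate directly into code. The Python sketch below is a minimal illustration, not the procedure used for this report: it computes Wald's bounds (2), the single-observation statistic (4) for the independent-observations model with $\sigma_0 = \sigma_1 = \sigma$, and applies the truncated rule (3) to one observation sequence. The function names and the parameter values in the usage part ($\mu_0 = 0$, $\mu_1 = 1$, $\sigma = 1$, so $d = 1$) are hypothetical choices for illustration.

```python
import math
import random


def wald_bounds(alpha_d, beta_d):
    """Wald's approximate lower and upper stopping bounds, relation (2)."""
    a = math.log(beta_d / (1.0 - alpha_d))
    b = math.log((1.0 - beta_d) / alpha_d)
    return a, b


def z_io(x_i, i, mu0, mu1, sigma):
    """Single-observation log-likelihood ratio (4) for X_i ~ N(i*mu_j, i*sigma^2)."""
    return (mu1 - mu0) / sigma**2 * x_i - i * (mu1**2 - mu0**2) / (2.0 * sigma**2)


def truncated_sprt(xs, m, a, b, mu0, mu1, sigma):
    """Apply the SPRT truncated at m (decision rule (3)) to xs[0], xs[1], ...

    Returns (decision, n), where n is the number of observations used.
    """
    big_z = 0.0  # cumulative log-likelihood ratio Z_n
    for n in range(1, m + 1):
        big_z += z_io(xs[n - 1], n, mu0, mu1, sigma)
        if big_z >= b:
            return "reject W0", n
        if big_z <= a:
            return "accept W0", n
    # Still between the bounds at time m: decide on the sign of Z_m.
    return ("reject W0" if big_z > 0 else "accept W0"), m


if __name__ == "__main__":
    a, b = wald_bounds(0.01, 0.01)
    print(round(a, 3), round(b, 3))  # -4.595 4.595
    # Illustrative values only: mu0 = 0, mu1 = 1, sigma = 1 gives d = 1.
    rng = random.Random(1)
    xs = [rng.gauss(i * 1.0, math.sqrt(i)) for i in range(1, 8)]  # one path under W1
    print(truncated_sprt(xs, 7, a, b, 0.0, 1.0, 1.0))
```

Ties at exactly zero at time $m$ have probability zero for continuous observations, so counting $Z_m = 0$ as acceptance, as (3) does, does not affect the error probabilities.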

In order to find a truncation point that gives actual error probabilities less than or equal to the desired error probabilities, we need to calculate the operating characteristic function $L(W_j)$, the probability of accepting $W_0$ when $W_j$ is the true hypothesis. Hence we must find the probability density of $Z_n$ given that $a < Z_k < b$ for $k = 1, 2, \ldots, n-1$. Under $W_j$, denote by $P_j(z,n)$ the probability

$$ P_j(z,n) = \Pr\Big\{ (Z_n < z) \cap \bigcap_{k=1}^{n-1} (a < Z_k < b) \Big\} $$

and by $p_j(z,n)$ the derivative of $P_j(z,n)$ with respect to $z$, for $j = 0, 1$. Successive convolutions are required to calculate $p_j(z,n)$, namely

$$ p_j(z,1) = f_1(z \mid W_j), \qquad j = 0, 1 \qquad (6) $$

$$ p_j(z,n) = \int_{a}^{b} p_j(u, n-1)\, f_n(z - u \mid W_j)\, du, \qquad n > 1,\ j = 0, 1 \qquad (7) $$

where $f_n(z \mid W_j)$ is the Normal density of $z_n$, with mean $nd^2/2$ ($-nd^2/2$) and variance $nd^2$ when hypothesis $W_1$ ($W_0$) is true. Using these relationships we can calculate

$$ L(W_j, m) = \sum_{n=1}^{m} \int_{-\infty}^{a} p_j(z,n)\, dz + \int_{a}^{0} p_j(z,m)\, dz \qquad (8) $$

$$ E(N \mid W_j, m) = \sum_{n=1}^{m-1} n \left[ \int_{-\infty}^{a} p_j(z,n)\, dz + \int_{b}^{\infty} p_j(z,n)\, dz \right] + m \int_{a}^{b} p_j(z, m-1)\, dz \qquad (9) $$

Let $\alpha_a, \beta_a$ and $\alpha_d, \beta_d$ be the actual and desired error probabilities, so that $\alpha_a = 1 - L(W_0, m)$ and $\beta_a = L(W_1, m)$. Also, let $m^*$ be the non-integer truncation point such that the SPRT truncated at $m^*$ gives $\alpha_a = \alpha_d$ and $\beta_a = \beta_d$. The integer truncation point $m^{**}$ will be obtained by rounding $m^*$
up to the next higher integer.

We can now establish a relationship between $m^*$, the desired error probabilities, and the discrimination factor $d$ for the symmetric case (i.e., $\alpha_d = \beta_d$) by means of the following procedure:

i) Given $\alpha_d = \beta_d$, Wald's constant stopping bounds $a$ and $b$ are computed from equation (2).
ii) For a given value of $d$, $\alpha_a(m)$ ($= \beta_a(m)$) is computed from equations (6) through (8) by numerical integration, for different truncation points $m$.
iii) The value of $m$ for which $\alpha_a(m) = \alpha_d$ is found (by interpolation, if necessary); this is, by definition, $m^*$.

Figure 1 shows $\ln(m^*)$ vs. $\ln(d)$. An approximately linear relationship between $\ln(m^*)$ and $\ln(d)$ is immediately apparent. A linear regression gives a common slope of $-1.09$ for $.01 < \alpha < .2$. This suggests that $m^*$ and $d$ have the following relationship:

$$ \ln(m^*) \approx \ln[k(\alpha)] - 1.09 \ln(d) $$

where $k(\alpha)$ is a constant that depends on the value of $\alpha$. To obtain $k(\alpha)$, $m^*$ was plotted against $\alpha$ for $d = 1$, as shown in Figure 2. This curve is well fit by the equation

$$ k(\alpha) \approx 11.57 - 13\,\alpha^{0.2} $$

Hence the relationship between $\ln(m^*)$ and $\ln(d)$ can be rewritten as

$$ \ln(m^*) \approx \ln\!\left(11.57 - 13\,\alpha^{0.2}\right) - 1.09 \ln(d) \qquad (10) $$

The smallest integer truncation point $m^{**}$ is now found by rounding up the solution to (10). The resulting test will then give actual error probabilities not greater than the desired error probabilities, at the expense of a slight increase in the maximum, and average, sample size.

[Figure 1 - The relationship between ln(m*) and ln(d) for independent observations. Horizontal axis: ln(d).]

[Figure 2 - The relationship between m* and α when d = 1 for independent observations. Horizontal axis: α.]

3. Truncation for the SPRT when increments are independent

In this section we assume that the differences (increments) between successive observations are independent. This underlying behavior can also lead to a linear time dependence of the observation means and variances. Thus, we assume that the variables $X_1, X_2 - X_1, \ldots, X_i - X_{i-1}, \ldots$ are independent, the increments $X_i - X_{i-1}$ ($i \ge 2$) being identically distributed. Let $g_i(X_1, \ldots, X_i \mid W_j)$ denote the joint density function of the observations $X_1, \ldots, X_i$ at time $i$ when hypothesis $W_j$ is true, for $j = 0, 1$. Then the log-likelihood ratio at time $n$ is

$$ Z_n = \ln \frac{g_n(X_1, X_2, \ldots, X_n \mid W_1)}{g_n(X_1, X_2, \ldots, X_n \mid W_0)} $$

Transforming to the increments gives

$$ Z_n = \ln \frac{h_n(X_1, X_2 - X_1, \ldots, X_n - X_{n-1} \mid W_1)}{h_n(X_1, X_2 - X_1, \ldots, X_n - X_{n-1} \mid W_0)} $$

where $h_n$ is the joint density of $X_1$ and the increments. No Jacobian factor appears, since the Jacobian of the transformation is the determinant of a triangular matrix with ones along the diagonal and so equals 1. Because $X_1$ and the increments are independent, $h_n$ factors into the density $g_1$ of $X_1$ and the common density, say $q(\cdot \mid W_j)$, of each increment. Defining the log-likelihood ratios

$$ z_1 = \ln \frac{g_1(X_1 \mid W_1)}{g_1(X_1 \mid W_0)}, \qquad z_i = \ln \frac{q(X_i - X_{i-1} \mid W_1)}{q(X_i - X_{i-1} \mid W_0)} \quad \text{for all } i \ge 2, $$

we get

$$ Z_n = z_1 + \sum_{i=2}^{n} z_i . $$

If we assume that $X_1 \sim N(\gamma_j, \sigma^2)$ and $X_i - X_{i-1} \sim N(\mu_j, \sigma^2)$ for $j = 0, 1$ and $i \ge 2$, then the mean and variance of $X_i$ increase linearly over time, with slopes $\mu_j$ and $\sigma^2$ respectively. To find $m^*$, a numerical integration procedure similar to that outlined in section 2 can be carried out. Instead of $d$, there are now two parameters, $d_1 = (\gamma_1 - \gamma_0)/\sigma$ and $d_2 = (\mu_1 - \mu_0)/\sigma$. We chose $d_1 = .5 d_2$, $d_1 = d_2$ and $d_1 = 2 d_2$ to study the relationship between $m^*$ and $d_2$ (or, as it turns out, between $\ln(m^*)$ and $\ln(d_2)$). $\alpha$ was varied between .01 and .2, and $d_2$ between .25 and 2. Note that when $d_1 = d_2 = d$, the log-likelihood ratios $z_1, z_2, \ldots$ are IID Normal, so the truncated SPRT for IID Normal observations is a special case of the truncated SPRT for independent increments. In this case we get the linear relationship between $\ln(m^*)$ and $\ln(d)$ shown by Golhar (4):

$$ \ln(m^*) \approx \ln\!\left(-79 + 72\,\alpha^{-0.079}\right) - 2.09 \ln(d) \qquad (11) $$

Figures 3 and 4 show $\ln(m^*)$ vs. $\ln(d_2)$ when $d_1 = .5 d_2$ and $d_1 = 2 d_2$ respectively. The relationship between $\ln(m^*)$ and $\ln(d_2)$ is seen to be non-linear; Golhar (4) explains some reasons for this.

Thus, under two different independence assumptions about the sequence of non-stationary random variables $X_1, X_2, \ldots$, we have obtained truncation points that give actual error probabilities not greater than the desired error probabilities. One question immediately follows: how sensitive are these tests, in terms of $m^{**}$ and the resulting $\alpha_a$, $\beta_a$ and
$E(N)$, to the assumptions involved? Since it might be time consuming and/or expensive to verify which model actually governs the observations, there might exist a range of parameters for which the test derived under the assumptions of one model is superior to the test derived under the other, in the sense that it gives a smaller $\alpha_a = \beta_a$, $E(N)$ or $m^{**}$. In the next section we examine this possibility.

[Figure 3 - The relationship between ln(m*) and ln(d2) for independent increments when d1 = .5 d2. Horizontal axis: ln(d2).]

[Figure 4 - The relationship between ln(m*) and ln(d2) for independent increments when d1 = 2 d2. Horizontal axis: ln(d2).]

4. Sensitivity of independence assumptions

We have seen that when the marginal mean and variance of a sequence of random variables increase linearly with time, either of two assumptions could describe the underlying behavior: i) the observations are independent or ii) the increments are independent. We will refer to the former model as IO and to the latter as II. Thus, for II

$$ X_1 \sim N(\mu_j, \sigma^2), \qquad X_i - X_{i-1} \sim N(\mu_j, \sigma^2), \qquad j = 0, 1,\ i \ge 2 $$

and for IO

$$ X_i \sim N(i\mu_j, i\sigma^2), \qquad j = 0, 1. $$

A. Assume II but in reality IO

If II is assumed, the log-likelihood ratio is computed to be

$$ Z_n = \frac{\mu_1 - \mu_0}{\sigma^2} X_n - \frac{n(\mu_1^2 - \mu_0^2)}{2\sigma^2} \qquad (12) $$

and these values of $Z_n$ are compared with the thresholds $a$ and $b$. However, if the $X_i$'s are in reality independent and normally distributed with mean $i\mu_j$ and variance $i\sigma^2$, then, because each $Z_n$ in (12) depends on $X_n$ alone, $Z_1, Z_2, \ldots,$
$Z_n$ will also be independent Normal random variables. Taking moments of (12), we obtain

$$ Z_n \mid W_1 \sim N\!\left(\frac{n d^2}{2},\ n d^2\right), \qquad Z_n \mid W_0 \sim N\!\left(-\frac{n d^2}{2},\ n d^2\right) $$

Since we assume II, the truncation point $m^{**}$ would be obtained from relation (11). However, the values of $E(N \mid W_j, m^{**})$ and $\alpha_a(m^{**}) = \beta_a(m^{**})$ are obtained using relations (8) and (9), with $p_j(z,n)$ computed under these distributions.

B. Assume IO but in reality II

Under assumption IO, the computed log-likelihood ratio at time $i$ is given by relation (4). However, since in reality the increments are independent, we have, for $i > 1$,

$$ X_i = (X_i - X_{i-1}) + X_{i-1} . $$

Therefore $X_1 \sim N(\mu_j, \sigma^2)$ and, conditional on $X_{i-1}$, $X_i \sim N(\mu_j + X_{i-1}, \sigma^2)$ for $j = 0, 1$ and $i > 1$. Taking moments of equation (4) with the $X_i$'s thus distributed, we obtain

$$ z_1 \mid W_1 \sim N\!\left(\frac{d^2}{2},\ d^2\right), \qquad z_1 \mid W_0 \sim N\!\left(-\frac{d^2}{2},\ d^2\right) $$

and, for $i > 1$, the conditional distributions

$$ z_i \mid W_1 \sim N\!\left(\frac{d^2}{2} + z_{i-1},\ d^2\right), \qquad z_i \mid W_0 \sim N\!\left(-\frac{d^2}{2} + z_{i-1},\ d^2\right) $$

Since we assumed IO, the truncation point $m^{**}$ would be obtained from relation (10). The values of $E(N \mid W_j, m^{**})$ and $\alpha_a(m^{**}) = \beta_a(m^{**})$ are again obtained using relations (8) and (9).

C. Example results

To study the effect of wrong assumptions, three values of
the discrimination factor $d$ were chosen ($d = .75$, 1 and 1.5), and $\alpha_d$ ($= \beta_d$) was varied between .01 and .1. For fixed values of $d$ and $\alpha_d$, $m^{**}$ was obtained for each model. This $m^{**}$ was used as the truncation point for that particular assumed model, whatever the true model. With $m^{**}$ thus fixed, values of $\alpha_a$ and $E(N)$ were obtained for each $d$ and $\alpha_d$. An example of the results is shown in Table 1, where $d = 1$ and $\alpha_d = .01$. When we assume II and II holds in reality, $m^{**} = 25$, $\alpha_a = .0095$ and $E(N) = 10.31$. When we assume II but IO holds in reality, we use the same $m^{**} = 25$ but get $\alpha_a = .0035$ and $E(N) = 6.79$. Similarly, when we assume IO, $m^{**} = 7$, and if IO actually holds, $\alpha_a = .006$ and $E(N) = 4.41$. However, if the same $m^{**} = 7$ is used because we assume IO, but II holds in reality, we obtain $\alpha_a = .0353$ and $E(N) = 4.2$.

It can be seen from Table 1 that when II holds in reality, using the wrong model gives $E(N) = 4.2$, which is much less than the $E(N) = 10.31$ obtained by using the right model. However, it also gives $\alpha_a = .0353$, which is much higher than $\alpha_d = .01$. On the other hand, when the observations are in reality independent, using the wrong model gives $\alpha_a = .0035$, which is much less than .01, but it gives $E(N) = 6.79$ compared with the 4.41 obtained by using the right model. Since verifying the independence assumptions might be expensive and/or time consuming, an experimenter might prefer such a slight increase in $E(N)$, in the event that the underlying assumption is wrong, as long as the actual error probabilities stay below the desired ones.

Table 1 - Results for d = 1 and α_d = .01

Assumed model   True model   m**   α_a      E(N)
II              II           25    .0095    10.31
II              IO           25    .0035    6.79
IO              IO           7     .006     4.41
IO              II           7     .0353    4.2
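The $m^{**}$ values in Table 1 follow from the fitted relations (10) and (11), and the $\alpha_a$ and $E(N)$ entries rest on relations (6) through (9). The Python sketch below shows one way to carry out that computation for the correctly specified IO model. It assumes NumPy and discretizes the convolution (7) with a midpoint rule on a grid of step $h$ over $(a, b)$; this is our own choice and not necessarily the integration scheme behind the report's figures. The function names and the step $h = 0.01$ are hypothetical. The exit probabilities at each stage are taken from the Normal CDF rather than from the grid, so only the density inside the continuation region is approximated.

```python
import math

import numpy as np


def m_star_io(alpha, d):
    """Fitted relation (10): independent observations (IO)."""
    return (11.57 - 13.0 * alpha**0.2) * d**-1.09


def m_star_ii(alpha, d):
    """Fitted relation (11): independent increments with d1 = d2 = d (II)."""
    return (-79.0 + 72.0 * alpha**-0.079) * d**-2.09


def norm_cdf(x):
    """Standard Normal CDF, elementwise, via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(x) / math.sqrt(2.0)))


def oc_and_asn_io(m, d, a, b, j, h=0.01):
    """L(W_j, m) and E(N | W_j, m) from relations (6)-(9) for the IO model.

    z_n | W_j ~ N(+-n d^2/2, n d^2), independently over n. The restricted
    density p_j(., n) is stored as midpoint-rule weights on a grid over (a, b);
    h is the grid step (an illustrative choice).
    """
    sign = 1.0 if j == 1 else -1.0
    grid = np.arange(a + h / 2.0, b, h)
    pts, w = np.array([0.0]), np.array([1.0])  # Z_0 = 0 with probability 1
    accept, asn = 0.0, 0.0
    for n in range(1, m + 1):
        mean, sd = sign * n * d * d / 2.0, math.sqrt(n) * d
        p_low = norm_cdf((a - pts - mean) / sd)          # P(Z_n <= a | Z_{n-1})
        p_high = 1.0 - norm_cdf((b - pts - mean) / sd)   # P(Z_n >= b | Z_{n-1})
        accept += float(w @ p_low)                       # boundary terms of (8)
        if n < m:
            asn += n * float(w @ (p_low + p_high))       # sum in (9)
            # Convolution (7): p_j(z, n) = int_a^b p_j(u, n-1) f_n(z - u) du.
            diff = grid[:, None] - pts[None, :] - mean
            dens = np.exp(-diff**2 / (2.0 * sd * sd)) @ w / (sd * math.sqrt(2.0 * math.pi))
            pts, w = grid, dens * h
        else:
            # Time m: accept W0 if a < Z_m <= 0 (final term of (8)).
            accept += float(w @ (norm_cdf((0.0 - pts - mean) / sd) - p_low))
            asn += m * float(w.sum())                    # m * P(reach time m)
    return accept, asn


if __name__ == "__main__":
    alpha_d = beta_d = 0.01
    d = 1.0
    a = math.log(beta_d / (1.0 - alpha_d))
    b = math.log((1.0 - beta_d) / alpha_d)
    m_io = math.ceil(m_star_io(alpha_d, d))  # 7
    m_ii = math.ceil(m_star_ii(alpha_d, d))  # 25
    L0, en0 = oc_and_asn_io(m_io, d, a, b, j=0)
    L1, _ = oc_and_asn_io(m_io, d, a, b, j=1)
    print(m_io, m_ii, "alpha_a =", 1.0 - L0, "beta_a =", L1, "E(N) =", en0)
```

For $d = 1$ and $\alpha_d = .01$ this reproduces $m^{**} = 7$ and $m^{**} = 25$; the printed $\alpha_a$ and $E(N)$ can be compared with the IO/IO row of Table 1, with agreement limited by the grid step.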

5. Observations

For other values of $d$ and $\alpha_d$ it has been numerically confirmed (Golhar (4)) that the independent increments assumption is marginally superior to that of independent observations, in the sense that the II model gives only a slightly higher $E(N)$ than the right model (in the event that the II assumption is wrong), but still gives $\alpha_a < \alpha_d$. This behavior is due to the fact that, for the IO model, the SPRT is truncated at an early stage on the assumption that a lot of information will be available (note that for IO, $Z_n = \sum_{i=1}^{n} z_i$). But when the II assumption is true in reality, the IO model gives actual error probabilities much greater than the desired error probabilities. On the other hand, the II model makes use of only the most recent information ($Z_n$ is a function of $X_n$ only). Hence its truncation point is set high to get $\alpha_a < \alpha_d$. When IO is true in reality (the model under which all the available information is used), the independent increments model gives a slightly higher $E(N)$ than the correct model but still gives actual error probabilities less than the desired error probabilities.
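The mechanism described above can be checked by simulation. The Python sketch below swaps in Monte Carlo simulation for the numerical integration used in the report. Under $W_0$ it generates $X_n$ from the true model, computes $Z_n$ under the assumed model (the running sum of (4) for IO, relation (12) for II), applies rule (3) with Wald's bounds, and estimates $\alpha_a$ and $E(N)$. It assumes the hypothetical parameterization $\mu_0 = 0$, $\mu_1 = d$, $\sigma = 1$; the function name and the number of replications are illustrative. From (5), the IO statistic has mean $-n(n+1)d^2/4$ under $W_0$ when IO holds, growing quadratically in $n$, whereas the statistic (12) has mean $-nd^2/2$, which is consistent with (10) giving a much earlier truncation point than (11).

```python
import math
import random


def simulate_w0(assumed, true, m, d, reps=20000, alpha_d=0.01, seed=0):
    """Monte Carlo estimate of (alpha_a, E(N | W0)) for the SPRT truncated at m.

    Hypothetical parameterization: mu0 = 0, mu1 = d, sigma = 1.
    'assumed' picks the statistic, 'true' picks how X_n is generated;
    each is "IO" (independent observations) or "II" (independent increments).
    """
    rng = random.Random(seed)
    a = math.log(alpha_d / (1.0 - alpha_d))     # symmetric case, beta_d = alpha_d
    b = -a
    mu0, mu1 = 0.0, d
    c, k = mu1 - mu0, (mu1**2 - mu0**2) / 2.0   # coefficients of (4) and (12), sigma = 1
    rejects = total_n = 0
    for _ in range(reps):
        x, stat = 0.0, 0.0
        for n in range(1, m + 1):
            if true == "IO":
                x = rng.gauss(n * mu0, math.sqrt(n))  # X_n ~ N(n mu0, n), independent
            else:
                x = x + rng.gauss(mu0, 1.0)           # X_n = X_{n-1} + independent increment
            if assumed == "IO":
                stat += c * x - n * k                 # Z_n = running sum of (4)
            else:
                stat = c * x - n * k                  # Z_n from (12)
            if stat >= b or stat <= a:
                break
        rejects += stat > 0  # since a < 0 < b, this is exactly "reject W0" under (3)
        total_n += n
    return rejects / reps, total_n / reps


if __name__ == "__main__":
    d, alpha_d = 1.0, 0.01
    m_io = math.ceil(11.57 - 13.0 * alpha_d**0.2)      # relation (10) at d = 1
    m_ii = math.ceil(-79.0 + 72.0 * alpha_d**-0.079)   # relation (11) at d = 1
    for assumed, m in (("II", m_ii), ("IO", m_io)):
        for true in ("II", "IO"):
            alpha_a, en = simulate_w0(assumed, true, m, d)
            print(f"assume {assumed}, true {true}: m**={m}, alpha_a~{alpha_a:.4f}, E(N)~{en:.2f}")
```

The estimates carry simulation error of roughly $\sqrt{\alpha_a/\text{reps}}$ for $\alpha_a$, so they are best read as a qualitative check on Table 1 rather than as replacements for its entries.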

References

1. Anderson, T.W., "A Modification of the Sequential Probability Ratio Test to Reduce the Sample Size," Ann. Math. Stat., 1960, Vol. 31, pp. 165-196.
2. Armitage, P., "Restricted Sequential Procedures," Biometrika, 1957, Vol. 44, pp. 9-26.
3. Aroian, L.A. and Robison, D.E., "Direct Methods for Exact Truncated Sequential Tests of the Mean of a Normal Distribution," Technometrics, 1969, Vol. 11, no. 4, pp. 661-675.
4. Golhar, D.Y., "Sequential Analysis: Non-Stationary Processes and Truncation," Ph.D. dissertation, Industrial and Operations Engineering Department, The University of Michigan, Ann Arbor, 1983.
5. Madsen, R., "A Procedure for Truncating SPRT's," Journal of Amer. Stat. Assn., 1974, Vol. 69, no. 36, pp. 403-410.
6. Phatarfod, R.M., "Sequential Analysis of Dependent Observations I," Biometrika, 1965, Vol. 15, pp. 157-165.
7. Phatarfod, R.M., "Sequential Tests for Normal Markov Sequence," Journ. Austral. Math. Soc., 1971, Vol. 12, pp. 433-440.
8. Siegmund, D.O., "The Variance of One-Sided Stopping Rules," Ann. Math. Stat., 1969, Vol. 40, no. 3, pp. 1074-1077.
9. Siegmund, D.O., "On the Asymptotic Normality of One-Sided Stopping Rules," Ann. Math. Stat., 1968, Vol. 39, no. 5, pp. 1439-1497.
10. Wald, A., Sequential Analysis, Dover Publications Inc., New York, 1947.