ENGINEERING RESEARCH INSTITUTE
UNIVERSITY OF MICHIGAN
ANN ARBOR

SIGNAL DETECTABILITY: A UNIFIED DESCRIPTION OF STATISTICAL METHODS EMPLOYING FIXED AND SEQUENTIAL OBSERVATION PROCESSES

Technical Report No. 19
Electronic Defense Group
Department of Electrical Engineering

By: W. C. Fox
Approved by: A. B. Macnee
W. W. Peterson

Project M970
TASK ORDER NO. EDG-3
CONTRACT NO. DA-36-039 sc-15358
SIGNAL CORPS, DEPARTMENT OF THE ARMY
DEPARTMENT OF ARMY PROJECT NO. 3-99-04-042
SIGNAL CORPS PROJECT NO. 29-194B-0

December, 1953


TABLE OF CONTENTS

Page
LIST OF SPECIAL SYMBOLS  iv
ACKNOWLEDGEMENTS  vi
ABSTRACT  vii
1. INTRODUCTION  1
  1.1 Prefacing Remarks  1
  1.2 The Problem of Signal Detection Formulated for Statistical Analysis  2
2. TESTS OF FINITE SAMPLES  3
  2.1 Finite Sampling Plans  3
  2.2 The Concept of a Criterion  5
  2.3 Probabilities Associated with Criteria  5
  2.4 Likelihood Ratio and the Ratio Criteria  6
  2.5 Weighted Combination Criteria  7
  2.6 Neyman-Pearson Criteria  8
  2.7 ROC Curve  10
  2.8 Siegert's "Ideal Observer's" Criteria  12
  2.9 The Finite Ratio Test  13
3. SEQUENTIAL TESTS  13
  3.1 Infinite Samples  13
  3.2 Sequential Tests  15
  3.3 Probabilities Associated with a Sequential Test  17
  3.4 Average Sample Numbers  18
  3.5 Sequential Ratio Tests  20
  3.6 Optimum Sequential Tests  20
4. CONCLUSIONS  22
  4.1 Applicability of Finite Ratio Tests  22
  4.2 Applicability of Sequential Ratio Tests  25
APPENDIX A -- The Mathematical Theory of Sequential Tests  28
  A.1 Introduction  28
  A.2 Sequential Tests  29
  A.3 Sequential Ratio Tests  34
  A.4 Optimum Tests  35
APPENDIX B -- Sample Plans  38
  B.1 Introduction  38
  B.2 If Populations N and SN are Finite Dimensional, Then There Is an Admissible Sample Plan  39
  B.3 Sampling in Arbitrarily Short Intervals  40
APPENDIX C -- Probability Density Functions  42
BIBLIOGRAPHY  44
DISTRIBUTION LIST  47

LIST OF SPECIAL SYMBOLS AND TERMS
In Order of Appearance

Population N: Defined on page 2
Population SN: Defined on page 2
$Z_n$: A point in n-dimensional space
$f_N(Z_n)$: Probability density function (of dimension n) for population N
$f_{SN}(Z_n)$: Probability density function (of dimension n) for population SN
$P_{SN}$: Population SN's probability function, defined on page 5
$P_N$: Population N's probability function, defined on page 5
F: Conditional probability of a false alarm, defined on pages 6 and 17
M: Conditional probability of a miss, defined on pages 6 and 18
$\ell(Z_n)$: Likelihood ratio, defined on page 6
$A(\beta)$: Ratio criterion, defined on page 7
ROC curve: Receiver Operating Characteristic curve, defined on page 8
$A_n$, $B_n$, $C_n$: Sequential criteria, defined on pages 15 and 16
$S_n$: n-th stage sample space, defined on pages 15 and 16
$E_N$: N-conditional average sample number, defined on page 19
$E_{SN}$: SN-conditional average sample number, defined on page 19

The Following are from the Appendices

$E_n$: The n-dimensional Euclidean space; $E_1$ is then the system of real numbers
$f: E_n \to E_1$: A real-valued function defined on $E_n$
$A \times B$: The "Cartesian Product," or the set of all possible pairs of points (a, b) with a in A and b in B. For example, $E_2 = E_1 \times E_1$
$A \cup B$: The set of points which belong either to A, or to B, or to both; read "A union B"

LIST OF SPECIAL SYMBOLS AND TERMS (Cont'd)

$A \cap B$: The set of points which belong to both A and B; read "the intersection of A with B"
$H_0$, $H_1$: Hypotheses, defined on page 32
$\alpha$, $\beta$: The error probabilities, defined on page 32
$E_1(r)$, $E_0(r)$: The expected values of r, defined on page 33
$A \supset B$: A "includes" or "contains" B

ACKNOWLEDGEMENTS

It is impossible to single out credit for one individual in the joint work of a team. Because of this fact, the merits of this report should reflect equally upon Messrs. W. W. Peterson and T. G. Birdsall as well as upon the author, who, however, is alone responsible for all opinions and statements of fact contained herein.

In addition to acknowledging the assistance of Mr. P. C. Hayes in the calculations, the author would like to thank Geraldine L. Preston and Jenny-Lea E. Mesler for their patience and skill in preparing the text.

ABSTRACT

Signal detectability has been studied statistically from various points of view. Those involving an observation interval of fixed length are essentially equivalent, as opposed to those which involve a sequential process. Both approaches are discussed with a minimum of mathematics to provide a reasonably non-technical account of the "state of the art." Definitive comparison of the two observation processes is not possible until more general knowledge is available concerning the existence and nature of optimum sequential tests. In addition, a general mathematical formulation of sequential analysis is given in which the current theoretical obstacles in applying it to signal detectability are emphasized.

1. INTRODUCTION

1.1 Prefacing Remarks

Signal detection in this report means the detection of certain functions of time (for example, voltages) called "signals" when perturbed by the addition of some other functions called "noise." No attempt will be made to consider methods of estimation of signal parameters or in general to obtain other information about the "signals."

A mathematically detailed report [1] (hereafter referred to as Technical Report No. 13) has been made on certain statistical approaches to signal detection; that report constitutes a unified description of the subject heretofore unavailable. In addition, a number of specific applications of the resulting theory have been developed (Technical Report No. 13, Part II).

[1] Peterson, W. W., and Birdsall, T. G., Theory of Signal Detectability, Part I, "The General Theory," Part II, "Applications with Gaussian Noise," Technical Report No. 13, Electronic Defense Group, Department of Electrical Engineering, University of Michigan.

However, it is felt that much of that material is inaccessible to all but a few specialists because of its highly technical nature. Therefore, it seemed appropriate to supplement a report on the applications of sequential analysis to signal detection with a non-technical

description of the results of Technical Report No. 13. In this way a complete survey of the applications of statistical methods could be given in which the text would be accessible to those with a minimum of mathematical training.

1.2 The Problem of Signal Detection Formulated for Statistical Analysis

Because a receiver is essentially a linear device, noise generated by the receiver can usually be referred to the input. Thus the situation can be represented schematically as a (noiseless) receiver whose inputs are derived by adding the voltages from two sources: a "signal" generator and a "noise" generator. The totality of possible receiver inputs when the "noise" generator alone is in operation will be called "Population N." "Population SN" is the name given to the totality of possible receiver inputs when the "signal" generator and the "noise" generator are in operation simultaneously.

The individual observing the receiver outputs is then being presented with a "sample" of one of the two populations, but he does not know which population was in fact sampled, nor the probability that any particular one of them was sampled. All he knows with certainty is that one of the two was sampled. He must then judge which population was sampled.

In this discussion it should be kept in mind that the event of population SN being sampled corresponds to signal and noise being present at the receiver input. Likewise, the event of population N being sampled means that noise alone was present at the receiver input.

2. TESTS OF FINITE SAMPLES

2.1 Finite Sampling Plans

This part of the report is concerned with a method of statistical analysis which requires [1] for raw data a finite sample; that is, a finite sequence of numbers $Z_n = (x_1, \ldots, x_n)$. In the present context, such a sample is thought of as the result of n measurements made at the receiver input. The act of making these measurements is supposed to occupy a certain interval I in time, starting at $t_0$ and of length T. I is called the sample interval. Any particular scheme of making n measurements at the receiver input during the sample interval I is called a sample plan based on I.

[1] The statistical theory itself has been carried out for infinite samples (footnote 2, p. 23), but the application of it to specifying an optimum receiver for a particular case has been carried out so far for finite samples only.

If n were very large, a receiver which had to make the measurements called for by a sampling plan would certainly be impractical. However, the theory to be developed here is intended to specify an optimum receiver and is couched in the language of finite samples. This practical difficulty can be avoided if it be required that the sampling plan should "throw away" no information. This would mean that from each sample $Z_n$ it would be possible to reconstruct completely the function of time present at the receiver input during the sample interval. Then the specification of the optimum receiver could be translated back to the language of receiver inputs, from that of samples.

The theory to be described below was developed on the assumption that the populations N and SN are "finite dimensional." This means that they can be constructed from some finite number of functions of time $w_1(t), w_2(t), \ldots, w_n(t)$

by forming all possible combinations like

$$a_1 w_1(t) + a_2 w_2(t) + \cdots + a_n w_n(t)$$

where the coefficients $a_1, a_2, \ldots, a_n$ are any chosen numbers. The significance of this restriction will be discussed in Section 4.1, Applicability of Finite Ratio Tests.

For the purposes of the subsequent development, any sampling plan which throws away no information will be considered [1], provided enough properties are known of the associated sample variable $Z_n = (x_1, x_2, \ldots, x_n)$ so that certain probabilities may be calculated. Specifically, the probability density functions [2] $f_N(Z_n)$ and $f_{SN}(Z_n)$ of the sample variable $Z_n$ for the cases when $Z_n$ is drawn from population N and from SN respectively must be known. The two basic properties of density functions are

$$f_N(Z_n) \ge 0, \qquad \int f_N(Z_n)\,dZ_n = 1$$

and

$$f_{SN}(Z_n) \ge 0, \qquad \int f_{SN}(Z_n)\,dZ_n = 1$$

where the integration symbol represents the multiple integral taken over the entire range of the sample variable $Z_n$. A large part of Technical Report No. 13 is devoted to determining some circumstances where the derivation of the density functions can actually be carried out and the optimum receiver specified. These are listed in Section 4.1.

[1] In Appendix B it is shown that many such sampling plans are available when the populations are finite dimensional. The idea of a finite sampling plan is a device useful in performing computations for the finite dimensional case, and in approximating the infinite dimensional one. It is not essential to the theory itself.

[2] See Appendix C for a brief discussion of probability density functions, if this term is not familiar.
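The requirement that a sampling plan "throw away" no information can be illustrated by a minimal sketch. It assumes a hypothetical two-dimensional population built from the basis functions $w_1(t) = 1$ and $w_2(t) = t$; neither these functions nor the names `recover_coefficients` and `reconstruct` come from the report. Two measurements at distinct instants then determine the coefficients $a_1$, $a_2$, and hence the whole input function.

```python
def w1(t):
    """Hypothetical first basis function (an assumption for illustration)."""
    return 1.0


def w2(t):
    """Hypothetical second basis function (an assumption for illustration)."""
    return t


def recover_coefficients(t_a, t_b, x_a, x_b):
    """Solve a1*w1(t) + a2*w2(t) = x at t_a and t_b by Cramer's rule."""
    det = w1(t_a) * w2(t_b) - w2(t_a) * w1(t_b)
    if abs(det) < 1e-12:
        raise ValueError("sampling instants do not determine the coefficients")
    a1 = (x_a * w2(t_b) - w2(t_a) * x_b) / det
    a2 = (w1(t_a) * x_b - x_a * w1(t_b)) / det
    return a1, a2


def reconstruct(a1, a2, t):
    """Rebuild the receiver input at any instant t from the coefficients."""
    return a1 * w1(t) + a2 * w2(t)
```

Under these assumptions the sample $(x_1, x_2)$ carries the same information as the input function itself, which is the sense in which such a plan loses nothing.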

2.2 The Concept of a Criterion

Consider now an observer who has as available data the sample point $Z_n = (x_1, \ldots, x_n)$ given him by the receiver. The observer's job is to judge for each sample point whether or not it was taken from population SN. Although it is not possible to determine the (probably subconscious) criterion used by the observer, it is quite possible to find an external manifestation of it. Ideally all that is necessary is to submit each possible sample point $Z_n$ to the observer and to record his judgment. This will yield a tabulation of those sample points which the observer decided were drawn from population SN. If any other observer is given this tabulation and instructed to base his decisions on it, he will behave exactly as did the first observer. Thus the tabulation of these responses can be used to replace the mental criterion employed by the observer. Such a tabulation will also be called a criterion and will be denoted by the letter A, which refers to the phraseology common in statistics of "Accepting the hypothesis that a signal is present." The tabulation of the remaining sample points, those which the observer concluded were drawn from population N, will be denoted by B.

2.3 Probabilities Associated with Criteria

There are of course as many different criteria as there are observers. Among all possible criteria it is necessary to select those that are best for various purposes. To do so, certain numerical quantities must be associated with each criterion. It will be necessary to know the probability that a sample from one of the populations will be listed in a particular criterion A. According to the standard definitions, these probabilities are given by

$$P_{SN}(A) = \int_A f_{SN}(Z_n)\,dZ_n \quad \text{and} \quad P_N(A) = \int_A f_N(Z_n)\,dZ_n$$

where the multiple integral is taken over all sample points listed in the criterion A. For example, a particular sample plan might have a density function of the form

$$f_N(x_1, x_2, \ldots, x_n) = K \exp\left[-(x_1^2 + x_2^2 + \cdots + x_n^2)\right].$$

A possible criterion would consist of those sample points $Z_n = (x_1, x_2, \ldots, x_n)$ which lie outside a sphere of radius one centered at the origin. Then the integral would be taken over the exterior of this sphere.

These probabilities have a special significance. $P_N(A)$ is the conditional probability that a sample from population N will be listed in criterion A, that is, will be judged as a sample from population SN. Thus $P_N(A) = F$ is the conditional false alarm probability. Also, $P_{SN}(A)$ is the conditional probability of a certain kind of correct response called a hit (that of judging correctly that a sample is from population SN). The conditional probability of judging falsely that a sample is from population N is therefore given by $1 - P_{SN}(A) = M$, the conditional probability of a miss. The only errors which can occur are false alarms and misses; their conditional probabilities, F and M, are called briefly the error probabilities.

A reader familiar with the formal content of probability theory should note that these quantities are true conditional probabilities: F is conditional on the sample being drawn from population N; M is conditional on its being drawn from SN. This is to distinguish them from a priori probabilities (the probabilities that a certain population will be sampled, for example) which are not as yet assumed known.

2.4 Likelihood Ratio and the Ratio Criteria

It is convenient to introduce a new function called the likelihood ratio, $\ell(Z_n)$, defined as the ratio $f_{SN}(Z_n)/f_N(Z_n)$ for sample points $Z_n = (x_1, \ldots, x_n)$.
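The definition can be made concrete by a minimal sketch under an assumed model that is not taken from the report: n independent measurements with unit-variance Gaussian noise, and a hypothetical constant signal level `d` added when population SN is sampled. The function names are illustrative only. The quotient of the two densities is compared with the closed form $\exp(d \sum x_i - n d^2/2)$ that it reduces to in this model.

```python
import math


def gaussian_density(z, mean):
    """Joint density of independent unit-variance Gaussian measurements."""
    n = len(z)
    quad = sum((x - mean) ** 2 for x in z)
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-0.5 * quad)


def likelihood_ratio(z, d):
    """f_SN(Z_n) / f_N(Z_n) for the assumed model (signal level d)."""
    return gaussian_density(z, d) / gaussian_density(z, 0.0)


def likelihood_ratio_closed_form(z, d):
    """The same ratio after cancelling the common factors by hand."""
    return math.exp(d * sum(z) - 0.5 * len(z) * d * d)
```

In this assumed model the likelihood ratio depends on the sample only through the sum of the measurements, so a receiver yielding likelihood ratio as output need only form that sum.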

$\ell(Z_n)$ represents the likelihood that the sample point $Z_n$ was drawn from SN relative to the likelihood it was drawn from N. Hence, if $\ell(Z_n)$ is sufficiently large, it would be reasonable to conclude that $Z_n$ was in fact drawn from population SN, i.e., that $Z_n$ should be listed in the desired "best" criterion. Thus for each number $\beta \ge 0$ a certain criterion $A(\beta)$ will be selected; $A(\beta)$ is chosen by listing each sample point $Z_n$ for which $\ell(Z_n) \ge \beta$. The problem then reduces to that of making a wise choice of $\beta$; that is, to determine how large "sufficiently large" is. Criteria of the form $A(\beta)$ will be called ratio criteria.

A number of writers have presented varying definitions of a criterion being "optimum." It turns out that each of these optimum criteria can be expressed as a ratio criterion, so that a receiver designed to yield likelihood ratio as output could be used with any of them.

2.5 Weighted Combination Criteria

Suppose it is possible to assign a certain number $\beta$ as a weighting factor representing the importance of a false alarm relative to a hit. Since $P_{SN}(A)$ is the probability of a hit and $P_N(A)$ the probability of a false alarm, it would then be reasonable to find a criterion A which maximizes the quantity

$$P_{SN}(A) - \beta P_N(A).$$

But this quantity can be written as

$$\int_A \left[f_{SN}(Z_n) - \beta f_N(Z_n)\right] dZ_n$$

where the integration is taken over the sample points $Z_n$ listed in A. To maximize this integral, one would list in A every sample point $Z_n$ for which the integrand was not negative. Solving that inequality for $\beta$, one sees that A should contain those sample points $Z_n$ for which

$$\ell(Z_n) = \frac{f_{SN}(Z_n)}{f_N(Z_n)} \ge \beta.$$

Thus the desired criterion A is simply $A(\beta)$, and so it is a ratio criterion.

2.6 Neyman-Pearson Criteria

If it is critically important to keep the probability of a false alarm $P_N(A)$ below a certain level k, then it would be reasonable to choose, from among such criteria, that one which maximizes the probability of a hit. Thus Neyman and Pearson proposed as a type of optimum criterion any criterion $A_k$ for which

(1) $P_N(A_k) \le k$, and
(2) $P_{SN}(A_k)$ is a maximum for all the criteria A with the property $P_N(A) \le k$.

The $A_k$ type criterion can also be expressed as a ratio criterion. This can be made plausible as follows. To begin with, it is necessary to consider only those criteria A for which $P_N(A) = k$, because A will be taken as large as possible in order to meet condition (2). Now consider the curve given parametrically by the equations

$$x = x(\beta) = P_N(A(\beta)) \quad \text{and} \quad y = y(\beta) = P_{SN}(A(\beta)).$$

This curve will be called the Receiver Operating Characteristic (briefly, ROC) curve, for a receiver whose output is likelihood ratio and with which ratio criteria are being used.

The ROC curve passes through the points (0, 0) and (1, 1), the first at $\beta = \infty$, the second at $\beta = 0$. At $\beta = 0$, $\ell(Z_n) \ge \beta = 0$ for all $Z_n$, so $A(0)$

consists of all possible sample points. Thus the observer will report that every sample is drawn from SN, so he will be certain to make a false alarm and to make a hit. (This assumes that the sample points will not be drawn exclusively from one of the populations.) This can be verified, using the basic property of the density functions expressed by the following equations:

$$P_{SN}(A(0)) = \int f_{SN}(Z_n)\,dZ_n = 1 \quad \text{and} \quad P_N(A(0)) = \int f_N(Z_n)\,dZ_n = 1$$

when the integration is taken over all possible sample points $Z_n$. These equations mean that $x(0) = y(0) = 1$. Moreover, $x(\infty) = y(\infty) = 0$, because for $\beta = \infty$ there are no sample points $Z_n$ with $\ell(Z_n) \ge \infty$; i.e., $A(\infty)$ contains no sample points at all and the operator will never report a signal is present. Therefore the operator cannot possibly make a false alarm nor can he make a hit. Thus $P_{SN}(A(\infty)) = 0$ and $P_N(A(\infty)) = 0$.

These considerations, together with those of the next section, show that the ROC curve can be sketched somewhat as in Fig. 1.

FIG. 1. TYPICAL ROC CURVE. (Horizontal axis: $x(\beta) = P_N(A(\beta))$.)
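The end points of the ROC curve can be checked with a minimal sketch under an assumed model that is not taken from the report: a single measurement with unit-variance Gaussian noise and a hypothetical signal level `d > 0`. In that model $\ell(x) = \exp(dx - d^2/2)$, so the ratio criterion $A(\beta)$ lists the measurements with $x \ge (\ln \beta)/d + d/2$. The names `normal_cdf` and `roc_point` are illustrative.

```python
import math


def normal_cdf(u):
    """Distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def roc_point(beta, d):
    """Return (x, y) = (P_N(A(beta)), P_SN(A(beta))) for the assumed model."""
    if beta <= 0.0:
        return (1.0, 1.0)   # A(0) lists every sample point
    if math.isinf(beta):
        return (0.0, 0.0)   # A(infinity) lists no sample point
    cut = math.log(beta) / d + d / 2.0   # l(x) >= beta  <=>  x >= cut
    x = 1.0 - normal_cdf(cut)            # false alarm probability F
    y = 1.0 - normal_cdf(cut - d)        # hit probability
    return (x, y)
```

Evaluating `roc_point` over a range of `beta` values traces a curve of the general shape described for Fig. 1 in this assumed model.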

To determine the desired $A_k$, recall that all probabilities lie between zero and one, so that $P_N(A_k) = k$ is between zero and one. Then there is a point Q of the ROC curve which lies vertically above the point (k, 0). The coordinates (x, y) of Q are $x = P_N(A(\beta)) = k$ and $y = P_{SN}(A(\beta))$, for some $\beta$, which will be written $\beta_k$. Now $A(\beta_k)$ is a possible candidate for $A_k$ since $P_N(A(\beta_k)) = k$. Let A be any criterion with $P_N(A) = k$; it will be shown that $P_{SN}(A) \le P_{SN}(A(\beta_k))$, so that $A(\beta_k)$ meets the requirements of the Neyman-Pearson criterion. From the discussion of the weighted combination criterion, it is clear that

$$T = P_{SN}(A(\beta_k)) - \beta_k P_N(A(\beta_k)) \ge P_{SN}(A) - \beta_k P_N(A) = T^*.$$

Thus $P_{SN}(A(\beta_k)) = T + \beta_k k$ and $P_{SN}(A) = T^* + \beta_k k$. The known inequality between the T's yields the desired inequality by subtracting the last equation from the one above it. Therefore, $A_k$ should be chosen to be this particular $A(\beta_k)$; whence the optimum criterion proposed by Neyman and Pearson reduces to a ratio criterion.

2.7 ROC Curve

It will be desirable to digress for a moment to study the ROC curve more closely. Its value lies in the fact that if the type of criterion chosen for a particular application is a ratio criterion, $A(\beta)$, then a complete description of the detection system's performance can be read off the ROC curve. By the very definition of the ROC curve, the x coordinate is the conditional probability, F, of false alarm, and the y coordinate is the conditional probability of a hit. Similarly $(1 - x)$ is the conditional probability of being correct when noise alone is present, and $(1 - y) = M$ is the conditional probability of a

miss. Since most proposed kinds of criteria can be reduced to ratio criteria, the ROC curve assumes considerable importance.

In order to determine some of its geometric properties, it will be assumed that the parametric functions

$$x = x(\beta) = P_N(A(\beta)) \quad \text{and} \quad y = y(\beta) = P_{SN}(A(\beta))$$

are differentiable functions of $\beta$. The slope m of the tangent to the ROC curve is given by the quotient

$$m = \frac{dy/d\beta}{dx/d\beta}.$$

To calculate the slope at the point $(x(\beta_0), y(\beta_0))$ notice that among all criteria A, the quantity $P_{SN}(A) - \beta_0 P_N(A)$ is maximized by $A = A(\beta_0)$. Therefore, in particular, the function

$$y(\beta) - \beta_0 x(\beta) = P_{SN}(A(\beta)) - \beta_0 P_N(A(\beta))$$

has a maximum at $\beta = \beta_0$, so that its derivative must vanish there. Thus differentiating,

$$\frac{dy}{d\beta} - \beta_0 \frac{dx}{d\beta} = 0 \quad \text{at } \beta = \beta_0.$$

Solving for $\beta_0$, one obtains

$$\beta_0 = \left.\frac{dy/d\beta}{dx/d\beta}\right|_{\beta = \beta_0}.$$

This shows that the slope of the ROC curve is given by its parameter $\beta$, and so is always positive. Hence the curve rises steadily. In addition, this means that $y(\beta)$ can be written as a single valued function of $x(\beta)$, $y = y(x)$, which is monotone increasing, and where $y(0) = 0$ and $y(1) = 1$. These remarks make fully warranted the sketch of the ROC curve given in Fig. 1.
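The statement that the slope equals the parameter can be checked numerically by a minimal sketch. It assumes, purely for illustration, a single measurement with unit-variance Gaussian noise and a hypothetical signal level `d > 0`, and approximates the slope by a chord between two nearby points of the curve. The names `roc_point` and `roc_slope` and the step `h` are assumptions, not part of the report.

```python
import math


def normal_cdf(u):
    """Distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def roc_point(beta, d):
    """(P_N(A(beta)), P_SN(A(beta))) for finite beta > 0 in the assumed model."""
    cut = math.log(beta) / d + d / 2.0   # l(x) >= beta  <=>  x >= cut
    return (1.0 - normal_cdf(cut), 1.0 - normal_cdf(cut - d))


def roc_slope(beta, d, h=1e-4):
    """Chord slope of the ROC curve between parameters beta - h and beta + h."""
    x_lo, y_lo = roc_point(beta - h, d)
    x_hi, y_hi = roc_point(beta + h, d)
    return (y_hi - y_lo) / (x_hi - x_lo)
```

For this assumed model the chord slope at several values of `beta` agrees with `beta` itself to within the finite-difference error.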

2.8 Siegert's "Ideal Observer's" Criteria

Here it is necessary to know beforehand the a priori probabilities that population SN and that population N will be sampled. This is an additional assumption. These probabilities are denoted respectively by P(SN) and P(N). Moreover, $P(SN) + P(N) = 1$ because exactly one of the populations must be sampled.

The criterion associated with Siegert's Ideal Observer is usually defined as a criterion for which the a priori probability of error is minimized (or, equivalently, the a priori probability of a correct response is maximized). Frequently the only case considered is that where P(SN) and P(N) are equal, but this restriction is not necessary.

Since the conditional probability F of a false alarm is known as well as the (a priori) probability of the event (that population N was sampled) upon which F is conditional, the probability of a false alarm is given by the product $P(N)F$. In the same way the probability of a miss is given by $P(SN)M$. Because an error E can occur in exactly these two ways, the probability of error is the sum of these quantities:

$$P(E) = P(N)F + P(SN)M.$$

It has already been pointed out that $F = P_N(A)$ and $M = 1 - P_{SN}(A)$. If these are substituted into the expression for P(E), a simple algebraic manipulation gives

$$P(E) = P(SN) - P(SN)\left[P_{SN}(A) - \frac{P(N)}{P(SN)} P_N(A)\right].$$
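The two expressions for P(E) can be compared numerically by a minimal sketch. It assumes, for illustration only, a single measurement with unit-variance Gaussian noise, a hypothetical signal level `d > 0`, and ratio criteria $A(\beta)$; the grid of `beta` values and all function names are likewise assumptions. The sketch also searches the grid for the `beta` giving the smallest P(E), which can be set beside the ratio P(N)/P(SN).

```python
import math


def normal_cdf(u):
    """Distribution function of a zero-mean, unit-variance Gaussian."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def false_alarm_and_hit(beta, d):
    """(F, hit probability) of the ratio criterion A(beta), beta > 0."""
    cut = math.log(beta) / d + d / 2.0
    return (1.0 - normal_cdf(cut), 1.0 - normal_cdf(cut - d))


def error_probability(beta, d, p_sn):
    """P(E) = P(N) F + P(SN) M."""
    f, hit = false_alarm_and_hit(beta, d)
    return (1.0 - p_sn) * f + p_sn * (1.0 - hit)


def error_probability_rearranged(beta, d, p_sn):
    """P(E) = P(SN) - P(SN) [P_SN(A) - (P(N)/P(SN)) P_N(A)]."""
    f, hit = false_alarm_and_hit(beta, d)
    p_n = 1.0 - p_sn
    return p_sn - p_sn * (hit - (p_n / p_sn) * f)


def best_beta_on_grid(d, p_sn, step=0.05, count=200):
    """Grid value of beta with the smallest P(E); the grid is arbitrary."""
    betas = [step * i for i in range(1, count + 1)]
    return min(betas, key=lambda b: error_probability(b, d, p_sn))
```

With `p_sn = 0.25`, for example, the ratio P(N)/P(SN) is 3, and the grid search under these assumptions lands at or next to that value.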

It is desired to minimize P(E). But from the last equation this is equivalent to maximizing the quantity

$$P_{SN}(A) - \frac{P(N)}{P(SN)} P_N(A).$$

And, of course, this will yield a weighted combination criterion with

$$\beta = \frac{P(N)}{P(SN)}$$

which is known to be simply $A(\beta)$, a ratio criterion.

2.9 The Finite Ratio Test

Once populations N and SN have been chosen, "a finite test" of these populations means a particular choice of finite sampling method and of criterion, where the requirements made by Section 2.1 are met. If the criteria are restricted to ratio criteria, then a finite test is determined by the choice of the following parameters:

$t_0$: Starting time of sample interval
T: Length of sample interval
$\beta$: Parameter of the ROC curve, from which the two conditional error probabilities and the two correct response conditional probabilities can be read off.

Such a test will be called a finite ratio test. Note especially that the ROC curve is independent of the particular sampling plan chosen.

3. SEQUENTIAL TESTS

3.1 Infinite Samples

Among the various methods of statistical analysis which have been developed, some are designed to make use of infinite samples. This does not mean that infinitely many measurements must be made in an actual application; it

involves only the theoretical possibility of doing so. If such a theoretical eventuality is allowed, one finds that in actual applications only a finite number of samples are ever needed. In fact this number may even be smaller than the number needed by a comparable standard test. These remarks will be amplified later; at the moment they should suffice to justify consideration of infinite samples.

A plan for taking an infinite sample does not necessarily entail an infinitely long interval of time. The "time base" of such a plan can be infinite, for example, by having one measurement made every second. On the other hand, a plan could call for making one measurement at each of the instants $t_n = 1/n$, $n = 1, 2, \ldots$ These instants all lie in the time interval from zero to one, and thus such a plan would involve only one unit of time at most.

Only those sampling plans for which certain statistical information is known can be used in a test. If the sampling plan has been carried out to the point where n measurements $(x_1, x_2, \ldots, x_n)$ have been made, the variable $Z_n = (x_1, x_2, \ldots, x_n)$ is called an "n-th stage sample variable." For each stage n, the two density functions $f_N(Z_n)$ and $f_{SN}(Z_n)$ of the n-th stage sample variable $Z_n$ must be known, where the first is the density function applicable when population N is being sampled and the second applies when population SN is sampled. The density functions at different stages may very well differ, so that actually they should be written $f_N^n(Z_n)$ and $f_{SN}^n(Z_n)$. However, the n appearing in the argument $Z_n$ should always make the situation clear, so that the superscript n on the functions themselves will be dropped.

3.2 Sequential Tests

A sequential test will consist of two things:

1) An (infinite) sampling plan with density functions $f_N(Z_n)$ and $f_{SN}(Z_n)$
2) An assignment of certain criteria to each stage of the sampling plan.

The idea of a sequential test is as follows. First, make one measurement, $x_1$; if the evidence $x_1$ is sufficiently persuasive, draw a conclusion as to whether or not a signal is present. If the evidence is not so strong, make a second measurement $x_2$. Then, considering the evidence $(x_1, x_2)$, repeat the above process, and continue in a similar manner.

A particular scheme for making these decisions consists of the assignment of three criteria to each stage of the sampling process. The three criteria represent the three possible conclusions:

1) A signal is present, or
2) A signal is not present, or
3) Another measurement should be made.

At the first stage, any (real) number at all could theoretically result from the first measurement. This means that the first stage sample variable $Z_1 = (x_1)$ ranges through the entire number system, which will be written $S_1$ to stand for the first stage sample space.

Suppose the three first-stage criteria $A_1$, $B_1$, and $C_1$ have been chosen. If the sample $Z_1$ is listed in $A_1$, the conclusion that a signal is present is drawn and the test terminated. If it is listed in $B_1$, the conclusion is that noise alone is present, and again the test is terminated. If $Z_1$ should be listed in $C_1$, another measurement will be made, and the test moves on to the second stage instead of terminating.
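The stage-by-stage procedure just described can be sketched as a short loop. This is only an illustration: the function `run_sequential_test`, the labels "A", "B", "C" returned by the `classify` argument, and the example rule `running_sum_rule` with its cut-off values of plus and minus 2 are all hypothetical and are not taken from the report.

```python
from typing import Callable, Iterable, List, Tuple


def run_sequential_test(
    measurements: Iterable[float],
    classify: Callable[[Tuple[float, ...]], str],
) -> Tuple[str, int]:
    """Apply the stage criteria until the sample is listed in A or in B.

    classify(sample) returns "A" (signal present), "B" (noise alone) or
    "C" (take another measurement). The result is (conclusion, stage).
    """
    sample: List[float] = []
    for x in measurements:
        sample.append(x)
        verdict = classify(tuple(sample))
        if verdict in ("A", "B"):
            return verdict, len(sample)
    # The supplied measurements ran out before a conclusion was reached.
    return "C", len(sample)


def running_sum_rule(sample: Tuple[float, ...]) -> str:
    """Hypothetical criteria: decide on the running sum of the measurements."""
    total = sum(sample)
    if total >= 2.0:
        return "A"
    if total <= -2.0:
        return "B"
    return "C"
```

For instance, `run_sequential_test([0.2, 0.1, 3.0, 9.9], running_sum_rule)` stops at the third stage with the conclusion "A", and the fourth measurement is never used.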

When the first stage criteria have been chosen, a limitation is placed on $S_2$, the space through which the second stage sample variable $Z_2 = (x_1, x_2)$ ranges. The only way the test can proceed to the second stage is for $Z_1 = (x_1)$ to be listed in $C_1$. Therefore, $S_2$ does not contain all possible second stage samples $Z_2 = (x_1, x_2)$ but only those for which $(x_1)$ is listed in $C_1$.

Three second-stage criteria, $A_2$, $B_2$, and $C_2$, must now be chosen from those samples $Z_2$ listed in $S_2$. They must be chosen in such a way that there are no duplications in the listings and no sample in $S_2$ is omitted. These criteria carry exactly the same significance as those chosen in the first stage. That is, the three conclusions that a signal is or is not present, or that the test should be continued, are drawn when the sample $Z_2$ is listed in $A_2$, $B_2$, or $C_2$ respectively.

The selection of criteria proceeds in the same way. If n-th stage criteria $A_n$, $B_n$, and $C_n$ have been chosen, then the next stage's sample space $S_{n+1}$ consists of those samples $Z_{n+1} = (x_1, x_2, \ldots, x_n, x_{n+1})$ for which $Z_n = (x_1, x_2, \ldots, x_n)$ was listed in $C_n$. Then from $S_{n+1}$ are drawn the three (n+1)-st stage criteria $A_{n+1}$, $B_{n+1}$, and $C_{n+1}$.

When an entire sequence

$$(A_1, B_1, C_1),\ (A_2, B_2, C_2),\ \ldots,\ (A_n, B_n, C_n),\ \ldots$$

of criteria is selected, a "sequential test" has been determined. This does not mean of course that the test will necessarily be particularly useful. However,

among all the possible ways of selecting a sequence of criteria and hence a sequential test, there may be particular ones which are very useful.

3.3 Probabilities Associated with a Sequential Test

If $Q_n$ is any n-th stage criterion, then the quantities [1]

$$P_N(Q_n) = \int_{Q_n} f_N(Z_n)\,dZ_n \quad \text{and} \quad P_{SN}(Q_n) = \int_{Q_n} f_{SN}(Z_n)\,dZ_n$$

represent the (N or SN) conditional probabilities that an n-th stage sample $Z_n$ will be listed in the criterion $Q_n$.

[1] The notation $\int_Q$ indicates that the integration is to be carried out over all sample points listed in Q.

Some examples of the use of this notation are:

1) The n-th stage conditional error probabilities: If population N is sampled, then the probability that the sample variable $Z_n$ will be listed in $A_n$ is $P_N(A_n)$. This is the N-conditional probability of a false alarm. If population SN is sampled, then the probability that the sample variable $Z_n$ will be listed in $B_n$ is $P_{SN}(B_n)$. This is the SN-conditional probability of a miss.

2) The conditional error probabilities of the entire test:

$$F = \sum_{n=1}^{\infty} P_N(A_n),$$

the N-conditional probability of a false alarm, and

$$M = \sum_{n=1}^{\infty} P_{SN}(B_n),$$

the conditional probability of a miss, are merely the sums of the same error probabilities over all stages.

3) The conditional probabilities of terminating at stage n are

$$T_N^n = P_N(A_n) + P_N(B_n)$$
$$T_{SN}^n = P_{SN}(A_n) + P_{SN}(B_n)$$

These formulas can be justified by a simple argument. The only way the test can terminate at stage n is for the sample variable $Z_n$ to be listed in either $A_n$ or $B_n$. The probability of this event is the sum of the probabilities of the component events, which are mutually exclusive since $Z_n$ can be listed in at most one of $A_n$ and $B_n$.

4) The conditional probabilities that the entire test will terminate are

$$T_N = \sum_{n=1}^{\infty} T_N^n$$
$$T_{SN} = \sum_{n=1}^{\infty} T_{SN}^n$$

3.4 Average Sample Numbers

There are two other quantities which must be introduced. One feature of the sequential test is that it affords an opportunity of arriving at a decision early in the sampling process when the data happen to be unusually convincing. Thus one might expect that, on the average, the stage of termination of a well-constructed sequential test would be lower than could be achieved by an otherwise equally good standard test. It is therefore important to obtain expressions

for the average or expected value of the stage of termination. As with other probabilities, there will be two of these quantities: one conditional on population N being sampled; the other conditional on population SN being sampled. They are given by

$$E_N = \sum_{n=1}^{\infty} n\,T_N^n \quad \text{and} \quad E_{SN} = \sum_{n=1}^{\infty} n\,T_{SN}^n.$$

The letter E is used to refer to the term "expected value." The quantities $E_N$ and $E_{SN}$ are called the average sample numbers. The form these formulas take can be justified (somewhat freely) on the grounds that each value, n, which the variable "stage of termination" may take on must be weighted by the (conditional) probability that the variable will in fact take on that value.

It should be heavily emphasized that the average sample numbers are strictly average figures. In actual runs of a sequential test, the stages of termination will sometimes be less than the average sample numbers but will also be upon occasion much larger.

Any sequential test whose average sample numbers are not finite would be useless for applications. Therefore the only ones to be considered are those with finite average sample numbers. Under this assumption, it can be shown [1] that $T_N = T_{SN} = 1$, so that the test is certain to terminate (in the sense of probability).

[1] See Appendix A. This particular result should be intuitively evident.

On the other hand, if it is known that $T_N = T_{SN} = 1$, it does not always follow that the average sample numbers are finite. Such a situation would mean only that if a sequence of runs of the test

were made, each run would probably terminate, but the average stage of termination would become arbitrarily large as more runs were made.

3.5 Sequential Ratio Tests

In studying tests using finite samples it was found that the best criterion could always be expressed in terms of likelihood ratio. Therefore, it may be useful to introduce likelihood ratios at each stage of an infinite sample plan. The n-th stage likelihood ratio function $\ell(Z_n)$ is defined as the ratio $f_{SN}(Z_n) / f_N(Z_n)$.

Optimum criteria in the finite sample tests turned out to be criteria listing all samples Z for which $\ell(Z)$ is greater than or equal to a certain number. It should be possible to choose sequential criteria $(A_n, B_n, C_n)$ in the same way. For each stage two numbers $a_n$ and $b_n$ with $b_n < a_n$ could be chosen. Then the criteria $(A_n, B_n, C_n)$ determined by the numbers $a_n$ and $b_n$ would be

$A_n$ lists all samples $Z_n$ of the sample space $S_n$ for which $\ell(Z_n) \ge a_n$

$B_n$ lists all samples $Z_n$ of the sample space $S_n$ for which $\ell(Z_n) \le b_n$

$C_n$ lists all samples $Z_n$ of the sample space $S_n$ for which $b_n < \ell(Z_n) < a_n$

If criteria selected in this way meet the requirement that the average sample numbers be finite, then the resulting sequential test is called a "sequential ratio test."

3.6 Optimum Sequential Tests

Because the task of computing the various parameters (error probabilities and average sample numbers) of a sequential test is considerably more difficult than the corresponding task for the standard test, certain simplifications have been introduced. For example, each systematic study of sequential

tests has been restricted to those of the ratio type introduced in the last paragraph. For these tests there are two ways of defining an optimum which would probably occur to one immediately.

The first would say that among all ratio tests with conditional false alarm probability F, that one for which M, $E_N$, and $E_{SN}$ are minimum will be called optimum. The complexities of such an extremum problem are enormous and there are no answers known as yet.

The second natural possibility is to try to find, among all ratio tests with fixed error probabilities F and M, that one for which the average sample numbers $E_N$ and $E_{SN}$ are minimum. This is the usual sense in which the word optimum is used concerning sequential tests.

Wald has proposed a particular test as an optimum ratio test, which will be known as the Wald test in this report. A ratio test is a Wald test if each of the sequences $\{b_n\}$ and $\{a_n\}$ is constant, that is, if $b_1 = b_n$ and $a_1 = a_n$ for all n. Moreover, Wald proved under very restrictive conditions that his test is optimum. Unfortunately, his conditions are never satisfied in the case of applications to signal detectability, as is shown in Section A.4 of Appendix A. However, the absence of theoretical knowledge concerning the optimum nature of the Wald test should not be construed to ban the use of the test, but merely to temper its use with caution.

No examples of ROC curves are given for the Wald test in various cases because of the heavy computational difficulties involved. Numerical comparison of the Wald test with a finite ratio test is given in the next section.
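The mechanics of a Wald test, as defined above, can be illustrated by a short Python sketch. It is only a sketch under assumptions not made in this report: successive measurements are taken to be independent, population N is taken to be zero-mean Gaussian noise of unit variance, and population SN the same noise with a known constant mean `signal`. The names `log_lr_increment`, `wald_test`, `signal`, `a`, and `b` are hypothetical and chosen for illustration.

```python
import math

def log_lr_increment(x, signal):
    """Log of f_SN(x) / f_N(x) for one measurement x.

    Assumes (for illustration only) unit-variance Gaussian noise with
    mean 0 under N and mean `signal` under SN.
    """
    return signal * x - 0.5 * signal * signal

def wald_test(samples, signal, a, b):
    """Ratio test with constant thresholds b < a (a Wald test).

    Returns (decision, stage). The decision is "signal" when the likelihood
    ratio reaches a (sample listed in A_n), "noise" when it falls to b
    (sample listed in B_n), and None if the measurements run out while the
    sample is still listed in C_n.
    """
    log_a, log_b = math.log(a), math.log(b)
    log_lr = 0.0   # independence assumption: log ratios add from stage to stage
    stage = 0
    for stage, x in enumerate(samples, start=1):
        log_lr += log_lr_increment(x, signal)
        if log_lr >= log_a:
            return "signal", stage
        if log_lr <= log_b:
            return "noise", stage
    return None, stage
```

For instance, `wald_test([1.0] * 10, 1.0, 4.0, 0.25)` accumulates a log ratio of 0.5 per stage and stops with the decision "signal" at the third stage, without using the remaining measurements.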

4. CONCLUSIONS

4.1 Applicability of Finite Ratio Tests

From a theory of signal detection one would hope to obtain two basic results:

1) The ROC curve, i.e., performance of an optimum receiver, and

2) Specification of an optimum receiver.

When population N is taken as finite dimensional with a white Gaussian density function, actual specification of an optimum receiver has been carried out (Technical Report No. 13, Part II) for certain particular SN populations. These cases are tabulated as follows. In the table S denotes the signal population before being perturbed by the noise.

TABLE I

S                                  Application

Signal Known Exactly               Coherent radar with a target of known
                                   range and character.

Signal Known Except for Phase      Ordinary pulse radar with no integration
                                   and with a target of known range and
                                   character.

Signal a Sample of White           Detection of noise-like signals;
Gaussian Noise                     detection of speech sounds in Gaussian
                                   noise.

Output of the Detector of a        Detecting a pulse of known starting time
Broad Band Receiver                (such as a pulse from a radar beacon)
                                   with a crystal-video or other type broad
                                   band receiver.

A Radar Case (A train of pulses    Ordinary pulse radar with integration and
with incoherent phase)             with a target of known range and
                                   character.

TABLE I (cont.)

S                                  Application

Signal One of M Orthogonal         Coherent radar where the target is at one
Signals                            of a finite number of non-overlapping
                                   positions.

Signal One of M Orthogonal         Ordinary pulse radar with no integration
Signals Known Except for Phase     and with a target which may appear at one
                                   of a finite number of non-overlapping
                                   positions.

In all these cases, either population SN is finite dimensional, or a special method is used to reduce the problem to an equivalent finite dimensional one. Once such a reduction is achieved, a sampling plan which throws away no information can be found (see Appendix B), and the solution of the problem then consists of deriving an expression for likelihood ratio and specifying a receiver whose output will be that likelihood ratio.

However, this restriction to finite dimensionality is not at all essential. The theory concerning the existence of an optimum criterion depends only on the presence of a function to play the part of the likelihood ratio function (Grenander, U., "Stochastic Processes and Statistical Inference," Arkiv för Matematik, Vol. 2, 195 (1950)).

For the purposes of initial investigation and of exposition the restriction of finite dimensionality is very convenient, for then the calculations necessary can be formulated in terms of carefully chosen sampling plans, and the expression for likelihood ratio takes a closed form. With likelihood ratio in a closed form it is not difficult to specify the optimum receiver (i.e., the receiver which has

likelihood ratio as its output) in certain cases such as those tabulated above. Moreover, when the general form of the theory is used, actual calculations would be carried out by using finite dimensional approximations. It appears that the results already obtained concerning the optimum receiver will not be changed materially when the more general theory is used. Although to date actual ROC curves and optimum receivers have not been determined for cases susceptible to only the general theory, there is no essential obstacle to doing so.

In the absence of experimental verification of the accuracy of the ROC curve in predicting the performance of the optimum receiver, there is one remaining fact which could be interpreted as casting doubt on the reliability of the theory so far developed. Under the restrictions

1) that populations N and SN are finite dimensional and

2) that the functions of time in these populations be (real) analytic,

it is possible to prove (see Appendix B) that sampling plans utilizing arbitrarily small sample intervals can be found, all of which yield the same error probabilities or ROC curves.

One way to explain this anomaly is as follows: There can be little doubt that observations restricted successively to arbitrarily small intervals cannot be equally effective in detecting a signal. At the same time there can be little doubt that extremely precise measurements cannot be made of arbitrarily small intervals. It is not at all uncommon that the assumption that errorless measurements are possible should lead to physically ridiculous conclusions. The apparent anomaly cited above can certainly be thought of as a case in point.

4.2 Applicability of Sequential Ratio Tests

The current status of the theory of sequential ratio tests is marred by two essential defects so far as its possible applications to signal detection are concerned.

1) The Wald test is known to be optimum only relative to a very restricted class of sequential ratio tests (see Appendix A for a technical discussion of this matter).

2) Even if the Wald test were known to be optimum in general, conditions under which its average sample numbers are finite remain unknown.

However, there are some strong reasons to believe that sequential ratio tests would be very useful if the points cited above were cleared up. The first is the point made in Section 4.1 concerning the desirability of having a practical theory which is not restricted to finite dimensional populations. Sequential analysis might be the needed key for such a theory formulated in terms of infinite sampling plans.

Moreover, whether or not the Wald test is optimum, there are many instances where it compares very favorably with the finite ratio test. For example, suppose that both populations N and SN are finite dimensional with white Gaussian density functions. In this event, successive measurements of the amplitude of the receiver input will be independent and Wald's approximation formulas for the average sample numbers of the Wald test can be used.

First a particular sample interval I was chosen. Then a large number W was selected, and the functions of time present in the two populations were determined by taking all such functions which have a Fourier expansion on the interval I and deleting all terms in the expansion of frequency greater than W. This meant that the two populations were of dimension n = 2WT, where T is the

length of the sample interval. In this case the sampling plan, which consists of making n measurements equally spaced in the interval I, will have the property that it throws away no information.

In order to secure as fair a comparison as possible, an infinite sampling plan was chosen for the Wald test which involved making measurements at evenly spaced intervals of length $T/n$. Thus for the first n measurements this sampling plan coincides with the finite sampling plan chosen above. A particular point (F, M) was chosen on the ROC curve of the finite ratio tests, and the average sample numbers of the Wald test whose error probabilities are (F, M) were calculated. These calculations were performed for various ratios of signal energy to noise energy, and in all cases the average sample numbers came out appreciably less than the dimension n = 2WT of the finite ratio test. Thus in this case the Wald test would terminate on the average before the entire sample interval for the finite ratio test had elapsed. The quantitative results are tabulated below.

TABLE II

Power Ratio     Average Sample Numbers     Dimension of the
    S/N          E_SN          E_N         Finite Ratio Test

   .368            80            15                  100
   .9804          828           195                1,000
   .02911        8754          2015               10,000
   .00902       83221         20154              100,000
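The kind of calculation described above can be sketched in Python using Wald's approximation formulas, which neglect the amount by which the likelihood ratio overshoots a threshold at termination. The sketch is not intended to reproduce Table II: it assumes, purely for illustration, independent unit-variance Gaussian measurements whose mean is zero under N and a known constant under SN, so that the mean of one log likelihood ratio increment is plus or minus half the per-sample power ratio. The names `wald_thresholds`, `wald_average_sample_numbers`, and `snr_per_sample` are hypothetical.

```python
import math

def wald_thresholds(F, M):
    """Wald's approximate thresholds for false alarm probability F and miss probability M."""
    a = (1.0 - M) / F      # upper threshold: decide "signal" when the ratio reaches a
    b = M / (1.0 - F)      # lower threshold: decide "noise" when the ratio falls to b
    return a, b

def wald_average_sample_numbers(F, M, snr_per_sample):
    """Approximate (E_N, E_SN) for a Wald test, neglecting threshold overshoot.

    Illustrative assumption: independent unit-variance Gaussian samples with
    mean 0 under N and mean sqrt(snr_per_sample) under SN.
    """
    a, b = wald_thresholds(F, M)
    mean_z = 0.5 * snr_per_sample   # mean log-ratio increment: +mean_z under SN, -mean_z under N
    e_n = (F * math.log(a) + (1.0 - F) * math.log(b)) / (-mean_z)
    e_sn = ((1.0 - M) * math.log(a) + M * math.log(b)) / mean_z
    return e_n, e_sn
```

With F = M the two approximate average sample numbers coincide, as symmetry suggests; halving `snr_per_sample` doubles both of them.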

Although little use can be made at present of sequential analysis in signal detection, it appears that if all possibilities of obtaining the practical theory desired without the finite dimensional restriction are to be explored, then the gaps mentioned in the theory of sequential analysis should certainly receive more attention in the future.

APPENDIX A

THE MATHEMATICAL THEORY OF SEQUENTIAL TESTS

A.1 Introduction

The discussion of sequential tests given in the body of this report is somewhat novel compared to the current literature of the subject. The novelty stems primarily from a special orientation and notation. Previous work on sequential tests has been done with the chief emphasis on its application to the case of finite populations where the distribution of the n-th measurement is the same as that of the first measurement, i.e., where successive measurements are independent. Finite populations have been of special interest because of the use of sequential tests in quality control, where the assumption of independence is rarely a significant restriction. Moreover, this assumption made it possible to establish a number of formulas which are of great value in computing the various parameters of a sequential test.

However, in the field of signal detection it is easy to find quite simple cases where successive measurements are not independent. Therefore, the use of sequential tests in this direction will depend on the extension of the general theory in the absence of the hypothesis of independence.

The material of this appendix has been written with the purpose of outlining the kind of theory needed and of pointing out certain theoretical questions which will have to be answered before sequential analysis can be applied to signal detection. As a result, the orientation of this discussion differs from that of Wald (throughout the appendix, the source for references to Wald's work is A. Wald, Sequential Analysis, John Wiley and Sons, 1947), for example;

this change in orientation also requires a new notation, which has already been introduced above.

A.2 Sequential Tests

A sequential test is a particular combination of two basic kinds of mathematical objects, which will be called hypotheses and criteria. Let $E^n$ denote the n-dimensional Euclidean space and $\mu$ Lebesgue measure on $E^n$. A hypothesis is a family $\{f_n : E^n \to E^1\}$, n = 1, 2, ..., of nonnegative functions subject to the conditions that, for each n,

I. $\int_{E^n} f_n \, d\mu = 1$, and

II. If A is any set in $E^n$ for which $\int_A f_n \, d\mu$ exists, then $\int_A f_n \, d\mu = \int_{A \times E^1} f_{n+1} \, d\mu$.

II is called the "cylindrical" property, because $A \times E^1$ can be thought of as a cylinder erected on the base A.

Note that if g is a real function of one variable such that $\int_{E^1} g \, d\mu = 1$, then a hypothesis may be constructed from g by defining $f_n(x_1, x_2, \ldots, x_n)$ to be the product $\prod_{i=1}^{n} g(x_i)$. Such a hypothesis is called independent.

A criterion is a collection $\{A_n, B_n, C_n\}$, n = 1, 2, ..., of sets subject to the conditions

III. $A_n$, $B_n$, and $C_n$ are pairwise disjoint.

IV. $A_1 \cup B_1 \cup C_1 = E^1$ and $A_n \cup B_n \cup C_n = C_{n-1} \times E^1$ if n > 1.

Finally, a sequential test (of two hypotheses) consists of two hypotheses $\{f_n : E^n \to E^1\}$ and $\{g_n : E^n \to E^1\}$ together with a criterion $\{A_n, B_n, C_n\}$

for which

V. The integrals of $f_n$ and $g_n$ over the sets $A_n$, $B_n$, and $C_n$ exist.

VI. $\sum_n \int_{C_n} f_n \, d\mu$ and $\sum_n \int_{C_n} g_n \, d\mu$ converge.

The chief notational difference here from that used by Wald is in the criterion. Wald supposes that, for each n, $E^n$ has been partitioned into three mutually disjoint sets $R_1^n$, $R_2^n$, and $R_3^n$. Then he distinguishes between "effective" and "ineffective" samples (i.e., points of $E^n$) but does not assign a symbol to the "effective" samples, that is, the set $C_{n-1} \times E^1$. Because it will be necessary to compute probabilities that "effective" samples belong to, say, $R_1^n$, it is desirable to have a symbol for such a set. In the notation of this report, for example,

$A_n = R_1^n \cap (C_{n-1} \times E^1)$

It will be shown in a moment that the quantities appearing in VI are merely the average sample numbers diminished by one. Wald employs as an axiom the condition that the two conditional probabilities of terminating be unity, and shows that if the hypotheses are independent, then this axiom holds. However, it is doubtful that sequential tests for which the average sample numbers are infinite will ever be of real interest. Moreover, VI implies that Wald's axiom holds, and that the conditional error probabilities converge. Because VI is actually stronger than both these conditions, VI appears to be a natural and useful axiom.

Associated with the hypotheses are families $\{F_n\}$ and $\{G_n\}$ of measures defined by

$F_n(Q) = \int_Q f_n \, d\mu$ and

$G_n(Q) = \int_Q g_n \, d\mu$.

$F_n(Q)$ is interpreted as the conditional probability that a point x of $E^n$ is a point of Q, where the "condition" is that hypothesis $\{f_n : E^n \to E^1\}$ actually does describe the statistical properties of the sample x. A similar interpretation is made for $G_n(Q)$.

Lemma A1

$E^1 = A_1 \cup B_1 \cup C_1$ and

$E^{n+1} = [A_{n+1} \cup B_{n+1} \cup C_{n+1}] \cup \bigcup_{i=1}^{n} [(A_i \cup B_i) \times E^{n+1-i}]$

Proof: The first statement is merely IV. The second can be proved by induction. Let $Q_n$ represent the right hand side of the second equality. Then

$Q_1 = (C_1 \times E^1) \cup [(A_1 \cup B_1) \times E^1]$ by IV.

Factoring $E^1$,

$Q_1 = (C_1 \cup A_1 \cup B_1) \times E^1 = E^1 \times E^1 = E^2$

Thus the lemma is true when n = 1. Suppose the lemma is true for some particular number; i.e., suppose that $E^n = Q_{n-1}$. This will now be shown to imply that $E^{n+1} = Q_n$, which will complete the inductive argument. Using the inductive hypothesis, one obtains

$\bigcup_{i=1}^{n} [(A_i \cup B_i) \times E^{n+1-i}] = \Big[ (A_n \cup B_n) \cup \bigcup_{i=1}^{n-1} [(A_i \cup B_i) \times E^{n-i}] \Big] \times E^1 = (Q_{n-1} - C_n) \times E^1 = (E^n - C_n) \times E^1$

Therefore,

$Q_n = [C_n \times E^1] \cup [(E^n - C_n) \times E^1] = (C_n \times E^1) \cup (E^{n+1} - C_n \times E^1) = E^{n+1}$

which was to be proved.

An immediate corollary of this lemma and II is

Theorem A1

$\sum_{i=1}^{n} [F_i(A_i) + F_i(B_i)] + F_n(C_n) = 1$

$\sum_{i=1}^{n} [G_i(A_i) + G_i(B_i)] + G_n(C_n) = 1$

At this point it should be noted that $F_i(A_i) + F_i(B_i)$ and $G_i(A_i) + G_i(B_i)$ are the (conditional) probabilities that a sample $(x_1, x_2, \ldots, x_i)$ be a point of $A_i \cup B_i$, which is equivalent to the assertion that the test will terminate at exactly the i-th stage. Thus

Theorem A2

$\sum_{i=1}^{\infty} [F_i(A_i) + F_i(B_i)] = \sum_{i=1}^{\infty} [G_i(A_i) + G_i(B_i)] = 1$

means that the (conditional) probabilities of termination are unity. This theorem is proved by applying Theorem A1 and the fact that $\lim_{i \to \infty} F_i(C_i) = \lim_{i \to \infty} G_i(C_i) = 0$, which is a necessary condition that VI hold.

If the hypotheses $\{f_n\}$ and $\{g_n\}$ are denoted by $H_0$ and $H_1$ respectively, then the quantities

$\alpha = \sum_{i=1}^{\infty} F_i(A_i)$ and

$\beta = \sum_{i=1}^{\infty} G_i(B_i)$

are the conditional probabilities of a type I or type II error respectively. (This is the notation used by Wald; type I is a false alarm, type II is a miss.)

Let r be the (random) variable denoting the stage of termination of the test. Then the conditional expected values of r are

$E_1(r) = \sum_{n=1}^{\infty} n F_n(A_n \cup B_n)$ and

$E_0(r) = \sum_{n=1}^{\infty} n G_n(A_n \cup B_n)$

The question of convergence of these series is settled by the following evaluation.

Theorem A3

$E_1(r) = 1 + \sum_{i=1}^{\infty} F_i(C_i)$

$E_0(r) = 1 + \sum_{i=1}^{\infty} G_i(C_i)$

Proof: Since $C_n \times E^1 = A_{n+1} \cup B_{n+1} \cup C_{n+1}$, it is equally true that $(C_n \times E^1) - C_{n+1} = A_{n+1} \cup B_{n+1}$. Therefore

$E_1(r) = F_1(A_1 \cup B_1) + \sum_{n=2}^{\infty} n [F_n(C_{n-1} \times E^1) - F_n(C_n)]$.

Using the facts that $1 - F_1(C_1) = F_1(E^1) - F_1(C_1) = F_1(A_1 \cup B_1)$ and that $F_n(C_{n-1} \times E^1) = F_{n-1}(C_{n-1})$ (i.e., II), one obtains

$E_1(r) = 1 - F_1(C_1) + \sum_{n=2}^{\infty} n [F_{n-1}(C_{n-1}) - F_n(C_n)]$.

This series collapses and yields the desired result. A similar argument works for $E_0(r)$.
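The identity of Theorem A3 can be checked numerically with a short Python sketch. The sketch takes as given only a list of termination probabilities, one per stage, standing for the values $F_n(A_n \cup B_n)$; the continuation probabilities $F_n(C_n)$ are then recovered from Theorem A1. The function names and the sample probabilities are hypothetical, chosen only to illustrate the collapse of the series.

```python
def expected_stage_direct(term_probs):
    """Sum of n * P(terminate at stage n) for n = 1, 2, ..."""
    return sum(n * p for n, p in enumerate(term_probs, start=1))

def expected_stage_via_continuation(term_probs):
    """1 + sum of P(still continuing after stage n), the form given by Theorem A3."""
    total = 1.0
    continuing = 1.0
    for p in term_probs:
        continuing -= p      # F_n(C_n) = 1 - sum over i <= n of F_i(A_i U B_i), by Theorem A1
        total += continuing
    return total
```

For termination probabilities 0.5, 0.25, 0.25 both functions give 1.75: directly, 0.5 + 2(0.25) + 3(0.25); by Theorem A3, 1 + 0.5 + 0.25 + 0.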

Thus the condition VI is seen to mean that the average sample numbers must be finite. Moreover Theorem A3 represents the average sample numbers in a way that greatly facilitates comparison of average sample numbers yielded by two different criteria by comparing the sets $\{C_n\}$. It is possible that this particular representation will eventually lead to a general proof of the optimum character of the Wald test (see below, Section A.4).

A.3 Sequential Ratio Tests

Now that the basic properties of a sequential test have been explored, it is time to consider the problem of selecting a useful criterion for given hypotheses $H_1$ and $H_0$. Bolstered by the success of the likelihood ratio as a criterion-selecting device in the finite ratio test, one hopes it would be equally efficacious here.

The likelihood ratio is usually defined as the ratio $f_n(x)/g_n(x)$. That is, at each point $x^0$ in $E^n$ at which the limit

$\lim_{x \to x^0} f_n(x)/g_n(x) = T(x^0)$

exists, one writes $\ell_n(x^0) = T(x^0)$. Let $S_n$ denote the set of all such points in $E^n$; $S_n$ is called the ratio sample space. It is, of course, the domain of definition of $\ell_n$; usually $S_n$ will be all of $E^n$.

One would expect to construct a ratio criterion as follows from two sequences $L = \{b_n\}$ and $R = \{a_n\}$ with $0 < b_n < a_n$. Let $I_n = \{x \mid b_n < x < a_n\}$ and further let $R_n$ and $L_n$ be the rest of $E^1$ to the right and left respectively of $I_n$. (The notation $\ell_n^{-1}(Q)$ denotes all points x in $S_n$ for which $\ell_n(x)$ is in Q.)

1) Let $A_1 = \ell_1^{-1}(R_1)$, $B_1 = \ell_1^{-1}(L_1)$, and $C_1 = \ell_1^{-1}(I_1)$.

2) If $A_n$, $B_n$, and $C_n$ have been defined, let $A_{n+1} = \ell_{n+1}^{-1}(R_{n+1}) \cap (C_n \times E^1)$, $B_{n+1} = \ell_{n+1}^{-1}(L_{n+1}) \cap (C_n \times E^1)$, and $C_{n+1} = (C_n \times E^1) - A_{n+1} - B_{n+1}$.

The resulting sequence of sets $\{A_n, B_n, C_n\}$ is a criterion provided that $S_n \supset C_{n-1} \times E^1$ (except possibly for a set of $\mu$-measure zero). That is, the (n+1)-th stage likelihood ratio function must be defined at least for all points in $C_n \times E^1$.

If the pair L, R of sequences yields a criterion which satisfies V and VI, then it is said to be admissible, and the resulting criterion is written [L/R] to denote its dependence on the given sequences. Moreover, the resulting test is called a sequential ratio test.

It is perhaps moot whether there are other systematic means (of generating criteria) which cannot be rejected immediately on the grounds that the resultant computational difficulties would be excessive. At any rate, the only such systematic method known as yet is that employing likelihood ratio. For that reason consideration is usually restricted to sequential ratio tests.

When $L = \{b_n\}$ and $R = \{a_n\}$ are both constant sequences, i.e., $a_1 = a_n$ and $b_1 = b_n$ for all n, and if [L/R] is admissible, the resulting ratio test is called a Wald test.

A.4 Optimum Tests

To each sequential ratio test there are assigned four numbers or parameters: $\alpha$, $\beta$, $E_1(r)$ and $E_0(r)$. It is desirable to choose criteria [L/R] which make these numbers as small as possible. Suppose hypotheses $H_1$ and $H_0$ are given and error probabilities $\alpha$ and $\beta$ prescribed. Then Wald defines an optimum test (at the level of $\alpha$, $\beta$) as a test

1) Whose error probabilities are $\alpha$ and $\beta$,

2) Whose average sample numbers are minimum among all other tests whose error probabilities are also $\alpha$ and $\beta$.

Further, Wald conjectures that when the class of tests at the ($\alpha$, $\beta$) level contains a Wald test, then the Wald test is optimum. To support this conjecture he proves that if the hypotheses $H_0$ and $H_1$ are independent and if, in the Wald test, $b_1 = b_n \le \ell_n(x) \le a_n = a_1$ for all x in $C_{n-1} \times E^1 = A_n \cup B_n \cup C_n$, then the Wald test is indeed optimum.

The second hypothesis says that, in the Wald test's criteria, $A_n \subset \ell_n^{-1}(a_1)$ and $B_n \subset \ell_n^{-1}(b_1)$. Moreover $\sum_{i=1}^{\infty} [F_i(A_i) + F_i(B_i)] = 1$. Hence for some n at least, $\ell_n^{-1}(a_1)$ has positive $F_n$ measure, and therefore positive Lebesgue measure; i.e., necessarily some of the point inverses of the likelihood ratio function have positive measure. In applications to signal detectability, however, the likelihood ratio will be (real) analytic, so that all its point inverses have measure zero. Thus Wald's theorem is of little value for such applications. In fact, whenever the functions $f_n$ and $g_n$ are continuous (and hence induce measures $F_n$ and $G_n$ which have the property of assigning the same measure to a set as is assigned to its closure), it will follow that all point inverses of the likelihood ratio function have probability zero. (See Technical Report No. 13, Part I, Lemma 2, page 34. Having probability zero means the $F_n$ and $G_n$ measures both are equal to zero; any such set will make no contributions to the parameters of the test.) Therefore even under these much less restrictive conditions Wald's hypothesis does not hold.

This is one major gap in the theory of sequential analysis so far as applications to signal detection are concerned. The other question which also

remains to be answered is concerned with the consequences of knowing that Wald's test is always optimum. In this event, it would be desirable to know when the class of all sequential tests at a given ($\alpha$, $\beta$) level includes a Wald test. Moreover, those pairs of hypotheses $H_0$ and $H_1$ for which there is some sequential test whose average sample numbers are finite should be characterized, for these are the only hypotheses with which one would consider using a sequential test in the first place.

In connection with the question regarding the ($\alpha$, $\beta$) levels at which there is a Wald test, some information is available.

Lemma A4. For every Wald test, $\alpha + \beta < 1$.

Proof: Because the given test is a Wald test, $a_n = a_1$ and $b_n = b_1$ for all n. From the inequalities

$F_n(A_n) = \int_{A_n} f_n(x) \, d\mu(x) = \int_{A_n} \ell_n(x) \, dG_n \ge \big( \min_{A_n} \ell_n(x) \big) \int_{A_n} dG_n \ge a_1 G_n(A_n)$

$F_n(B_n) = \int_{B_n} f_n(x) \, d\mu(x) = \int_{B_n} \ell_n(x) \, dG_n \le \big( \max_{B_n} \ell_n(x) \big) \int_{B_n} dG_n \le b_1 G_n(B_n)$

one obtains

$F_n(A_n) \ge a \, G_n(A_n)$ and $F_n(B_n) \le b \, G_n(B_n)$

These expressions summed over all n yield

$\frac{\beta}{1-\alpha} = \frac{\sum F_n(B_n)}{\sum G_n(B_n)} \le b < a \le \frac{\sum F_n(A_n)}{\sum G_n(A_n)} = \frac{1-\beta}{\alpha}$

But $\frac{1-\beta}{\alpha} - \frac{\beta}{1-\alpha} = \frac{1-\alpha-\beta}{\alpha(1-\alpha)} > 0$ means that $1 - \beta - \alpha > 0$, since the denominator is known to be positive.

APPENDIX B

SAMPLE PLANS

B.1 Introduction

The theory of the finite ratio test developed in Technical Report No. 13 depends on finding a sampling plan which throws no information away. If the measurements are to be of the instantaneous amplitude of the receiver input, then such a sampling plan on the sample interval I consists first of a basis for population N containing n linearly independent functions $\{x_i(t)\}$, i = 1, 2, ..., n, and sample points $\{t_i\}$, i = 1, 2, ..., n, in I with the property that every function w(t) in populations N and SN can be expressed as

$w(t) = \sum_{i=1}^{n} w(t_i) \, x_i(t)$

By measuring values of the receiver input w(t) at the sample points $\{t_i\}$ one obtains the coefficients needed to represent w(t) in terms of the known basis

functions $\{x_i(t)\}$. Such a basis together with the sample points determines an admissible sample plan on I.

B.2 If Populations N and SN are Finite Dimensional, Then There Is an Admissible Sample Plan

Since the populations are finite dimensional, there is a basis $\{y_i(t)\}$, i = 1, 2, ..., n, for them. It will be sufficient to construct a new basis $\{x_i(t)\}$ and sample points $\{t_i\}$ in I which have the property that

$x_i(t_j) = \delta_{ij}$

First it is necessary to show that there are sample points $\{t_i\}$ for which $\det(y_i(t_j)) \ne 0$. This is certainly true if n = 1. Suppose it is true when the dimension equals n; this will imply that it is also true when the dimension equals n + 1. The proof goes as follows: By the inductive hypothesis there are n sample points $\{t_i\}$, i = 1, 2, ..., n, with the property that

$\det \begin{pmatrix} y_1(t_1) & \cdots & y_1(t_n) \\ \vdots & & \vdots \\ y_n(t_1) & \cdots & y_n(t_n) \end{pmatrix} \ne 0$

Let

$D(t) = \det \begin{pmatrix} y_1(t_1) & \cdots & y_1(t_n) & y_1(t) \\ \vdots & & \vdots & \vdots \\ y_n(t_1) & \cdots & y_n(t_n) & y_n(t) \\ y_{n+1}(t_1) & \cdots & y_{n+1}(t_n) & y_{n+1}(t) \end{pmatrix}$

Then D(t) can be expanded by minors along the last column, yielding

$D(t) = \sum_{i=1}^{n+1} a_i \, y_i(t)$,

where $a_{n+1}$ is not zero, for it is the n-by-n determinant above. If D(t) = 0 for all t in the interval I, then all the $a_i$'s must vanish because the $y_i$'s are linearly independent on I. Hence D(t) is not identically zero on I and therefore some $t_{n+1}$ can be found for which $D(t_{n+1}) \ne 0$.

Now, in order to construct the desired basis $\{x_i(t)\}$, it is necessary only to solve the $n^2$ linear equations

$\sum_{j=1}^{n} a_{ij} \, y_j(t_k) = \delta_{ik}$

in the $n^2$ unknowns $\{a_{ij}\}$; for if they can be solved, the desired $x_i$ can be chosen as $x_i(t) = \sum_{j=1}^{n} a_{ij} \, y_j(t)$. The solubility of these equations can be determined by examining the $n^2$ by $n^2$ determinant formed by their coefficients. If Q is used to denote the n x n matrix whose elements are $\{y_j(t_k)\}$, the determinant of the coefficients can be written as

$\det \begin{pmatrix} Q & & 0 \\ & \ddots & \\ 0 & & Q \end{pmatrix} = (\det Q)^n$

which has just been shown to be nonzero. Hence the equations can be solved.

B.3 Sampling in Arbitrarily Short Intervals

There are many instances where the functions of populations N and SN can be taken to be (real) analytic, as for instance when the signal is a tone modulated CW transmission. Such functions have the property that they never

vanish on any interval unless they vanish identically. This means that if $\{y_i(t)\}$, i = 1, 2, ..., n, is a basis for the populations on the interval I, then it is also linearly independent and therefore a basis on any sub-interval of I. Thus the proposition proved in B.2 by induction could be applied to determining sample points in any sub-interval of I. The rest of the demonstration in that paragraph applies to any collection of sample points as long as they are chosen in the interval I. In this way an admissible sampling plan for the interval I can be chosen with the sample points restricted to any arbitrarily small sub-interval. But for any admissible sampling plan in the given sample interval there is an optimum criterion, and so the ROC curves of any two admissible sampling plans for the given interval will be identical, since each is "optimum." If this theoretical result is interpreted literally it means that observations of the receiver input can be restricted to any small interval without impairing the effectiveness of the detection system (see Section 4.1 for discussion of this matter).
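The construction of B.2 can be made concrete with a small Python sketch, which solves the equations $\sum_j a_{ij} y_j(t_k) = \delta_{ik}$ by inverting the matrix Q with exact rational arithmetic. The polynomial basis 1, t, t^2 and the sample points 0, 1, 2 used in the example are assumptions made purely for illustration, and the function names `invert` and `sample_plan_basis` are hypothetical.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix of Fractions; raises if singular."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("det(y_j(t_k)) = 0: choose different sample points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = 1 / aug[col][col]
        aug[col] = [v * scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def sample_plan_basis(ys, ts):
    """Return functions x_i with x_i(t_k) = 1 if i == k and 0 otherwise."""
    q = [[Fraction(y(t)) for t in ts] for y in ys]   # Q[j][k] = y_j(t_k)
    a = invert(q)                                    # rows a_i satisfy sum_j a_ij y_j(t_k) = delta_ik
    def make(i):
        return lambda t: sum(a[i][j] * ys[j](t) for j in range(len(ys)))
    return [make(i) for i in range(len(ys))]
```

With `ys = [lambda t: 1, lambda t: t, lambda t: t * t]` and `ts = [0, 1, 2]`, the returned functions satisfy `xs[i](ts[k]) == (i == k)`, and any w(t) in the span of the basis is recovered from its three sampled values as the sum of `w(ts[i]) * xs[i](t)`.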

APPENDIX C

PROBABILITY DENSITY FUNCTIONS

The only technical concept used in the body of this report which is needed in understanding the material is that of density function. The purpose of this appendix is to give a simple account of the meaning of this term.

Suppose that in a particular study only a finite number of different events are possible, for example, the events that can result from rolling dice. Then the classical definition of the probability of an event E is

$P(E) = \frac{\text{number of ways E can occur}}{\text{total number of events}}$

Unfortunately when the possible number of events is not finite, then the denominator of the above expression is infinite, and the quotient is zero (unless the numerator is also infinite, which only accentuates the difficulty).

An example of such a situation can be constructed as follows. Suppose a dart is to be thrown while aimed at the center of a target, where the dart's point is idealized into a mathematical point. It is (again ideally) possible to find a probability that the thrown dart will land in a certain circle by determining the frequency with which this occurs in a large number of tries. This probability may very well depend on where the circle is located on the board. In order to be able to compare the affinity of the thrown dart for circles of unequal size, one would divide the probability of each circle by the area of the circle,

$P^*(C) = \frac{P(\text{dart landing in circle } C)}{\text{area of circle } C}$

Let $P^*(C)$ be called the normalized probability of hitting the circle. In order to assign a number to each point x of the target in such a way as to represent the affinity of the dart for landing near that point, one would take an entire sequence of circles, each centered at the point, whose radii are decreasing to zero, designated by $C_n$. Then the limit of $P^*(C_n)$ as $n \to \infty$ could be used as the number f(x) to be associated with the point x. This number may or may not be zero, which avoids the difficulty pointed out above when the classical definition of probability is applied to infinitely many events. In addition, it can be proved that when the resulting function f(x) is integrated over a circle C, the value of the integral is merely P(C) all over again.

This function f(x) is called the probability density function, or more simply, the density function. Its basic property is that by integrating it over a (geometric) figure, one obtains the originally assigned probability that an event (events are represented by points of the target) will be a point of the given figure. Thus the integral over the entire target, i.e., over all points, will be unity.
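The limiting process just described can be illustrated numerically with a few lines of Python. The sketch assumes, purely for illustration, that the dart's landing point has a standard two-dimensional Gaussian scatter about the center of the target, in which case the probability of landing within a distance r of the center is 1 - exp(-r^2/2) and the density at the center is 1/(2 pi). The function names are hypothetical.

```python
import math

def prob_in_circle(radius):
    """P(dart lands within `radius` of the center), assuming a standard 2-D Gaussian scatter."""
    # -expm1(-u) equals 1 - exp(-u) but stays accurate for very small radii.
    return -math.expm1(-0.5 * radius * radius)

def normalized_prob(radius):
    """P*(C) = P(C) / area(C) for a circle C of the given radius about the center."""
    return prob_in_circle(radius) / (math.pi * radius * radius)
```

Evaluating `normalized_prob` at radii 1, 0.1, and 0.01 gives values that approach 1/(2 pi) from below, which is the density function of the assumed scatter evaluated at the center of the target.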

BIBLIOGRAPHY

On Statistical Approaches to the Signal Detectability Problem:

1. Peterson, W. W., and Birdsall, T. G., "The Theory of Signal Detectability," Technical Report No. 13, Electronic Defense Group, Department of Electrical Engineering, University of Michigan.

2. Lawson, J. L., and Uhlenbeck, G. E., Threshold Signals, McGraw-Hill, New York, 1950.

This book is certainly the outstanding reference on threshold signals. It presents a great variety of both theoretical and experimental work. Chapter 7 presents a statistical approach of the criterion type for the signal detection problem, and the idea of a criterion which minimizes the probability of an error is introduced.

3. Davies, I. L., "On Determining the Presence of Signals in Noise," Proc. I.E.E. (London), Vol. 99, Part III, pp. 45-51, March, 1952.

4. Woodward, P. M., and Davies, I. L., "Information Theory and Inverse Probability in Telecommunication," Proc. I.E.E. (London), Vol. 99, Part III, p. 37, March, 1952.

5. Woodward, P. M., and Davies, I. L., "A Theory of Radar Information," Phil. Mag., Vol. 41, p. 1001, 1950.

6. Woodward, P. M., "Information Theory and the Design of Radar Receivers," Proc. I.R.E., Vol. 39, p. 1521.

Woodward and Davies have introduced the idea of a receiver having a posteriori probability as its output, and they point out that such a receiver gives a maximum amount of information. They have handled the case of an arbitrary signal function known exactly or known except for phase with no more difficulty than other authors have had with a sine wave signal. Their methods serve as a basis for the second part of this report.

7. Reich, E., and Swerling, P., "The Detection of a Sine Wave in Gaussian Noise," Jour. Appl. Phys., Vol. 24, p. 289, March, 1953.

This paper considers the problem of finding an optimum criterion (of the second type presented in this report) for the case of a sine wave of limited duration, known amplitude and frequency, but unknown phase in the presence of Gaussian noise of arbitrary autocorrelation.

8. Middleton, D., "Statistical Criteria for the Detection of Pulsed Carriers in Noise," Jour. Appl. Phys., Vol. 24, p. 371, April, 1953.

A thorough discussion is given of the problem of detecting pulses (of unknown phase) in Gaussian noise. Both types of optimum criteria are discussed, but not in their full generality. The sequential type of test is discussed also. Middleton's equation (6.1) does not hold for the sequential test, and as a result, his calculations for the minimum detectable signal with a sequential test are incorrect. 9. Slattery, T. G., "The Detection of a Sine Wave in Noise by the Use of a Non-Linear Filter," Proc.,I. R. E., Vol. 40, p. 1232, October, 1952. This article considers the problem of detecting a sine wave of known duration, amplitude, and frequency, but unknown phase in uniform Gaussian noise. The article contains several errors, and the results are not clearly presented. 10. Hanse, H., "The Optimization and Analysis of Systems for the Detection of Pulsed Signals in Random Noise," Doctoral Dissertation (MIT), January, 1951. 11. Schwartz, M., "A Statistical Approach to the Automatic Search Problem," Doctoral Dissertation (Harvard), June, 1951. These dissertations both consider the problem of finding the optimum receiver of the criterion type for radar type signals. 12. North, D. 0., "An Analysis of the Factors which Determine Signal-Noise Discrimination in Pulsed Carrier Systems," RCA Laboratory Report PTR-6C, 1943. The ideas of false alarm probability and probability of detection are introduced. North argues that these probabilities will be most favorable when peak signal to average noise ratio is largest. The ideal filter, which maximizes this ratio, is derived. (This commentary is based on second-hand knowledge of the report.) 13. Kaplan, S. M., and Fall, R. W., "The Statistical Properties of Noise Applied to Radar Range Performance," Proc. I.R.E., Vol. 39, p. 56, January, 1951. The ideas of false alarm probability and probability of detection are introduced and an example of their application to a radar receiver is given. 14. Marcum K. 
I., "A Statistical Theory of Target Detection by Pulsed Radar: Mathematical Appendix," Rand Corporation Report R-113, July 1, 1948. This report contains a careful, thorough study of the mathematical problem which it considers. On Statistics: 15. Neyman, J., and Pearson, E. S., "On the Problem of the Most Efficient Tests of Statistical Hypotheses, "Phil. Trans, Roy. Soc., Vol. 231, Series A p. 289, 1933, 45

16. Cramer, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, 1951. 17. A. Wald, Sequential Analysis, John Wiley and Sons, 1947. 18. Grenander, U., "Stochastic Processes and Statistical Inference," Arkiv For Mathematik, Vol. 2, p. 195 (1950). This paper presents among many things a likelihood ratio for infinite dimensional probability measure spaces which thereby relieves the NeymanPearson test of its restriction to finite dimensionality. 46

DISTRIBUTION LIST

1 copy: Director, Electronic Research Laboratory, Stanford University, Stanford, California. Attn: Dean Fred Terman

1 copy: Commanding Officer, Signal Corps Electronic Warfare Center, Fort Monmouth, New Jersey

1 copy: Chief, Engineering and Technical Division, Office of the Chief Signal Officer, Department of the Army, Washington 25, D. C. Attn: SIGGE-C

1 copy: Chief, Plans and Operations Division, Office of the Chief Signal Officer, Washington 25, D. C. Attn: SIGOP-5

1 copy: Countermeasures Laboratory, Gilfillan Brothers, Inc., 1815 Venice Blvd., Los Angeles 6, California

1 copy: Commanding Officer, White Sands Signal Corps Agency, White Sands Proving Ground, Las Cruces, New Mexico. Attn: SIGWS-CM

1 copy: Signal Corps Resident Engineer, Electronic Defense Laboratory, P. O. Box 205, Mountain View, California. Attn: F. W. Morris, Jr.

75 copies: Transportation Officer, SCEL, Evans Signal Laboratory, Building No. 42, Belmar, New Jersey. For - Signal Property Officer. Inspect at Destination. File No. 25052-PH-51-91(1443)

1 copy: W. G. Dow, Professor, Dept. of Electrical Engineering, University of Michigan, Ann Arbor, Michigan

1 copy: H. W. Welch, Jr., Engineering Research Institute, University of Michigan, Ann Arbor, Michigan

1 copy: Document Room, Willow Run Research Center, University of Michigan, Willow Run, Michigan

10 copies: Electronic Defense Group Project File, University of Michigan, Ann Arbor, Michigan

1 copy: Engineering Research Institute Project File, University of Michigan, Ann Arbor, Michigan