ENGINEERING RESEARCH INSTITUTE
UNIVERSITY OF MICHIGAN
ANN ARBOR

A THEORY OF RECOGNITION

Technical Report No. 50
Electronic Defense Group
Department of Electrical Engineering

By: W. P. Tanner, Jr.
Approved by: A. B. Macnee

Project 2262
TASK ORDER NO. EDG-3
CONTRACT NO. DA-36-039 sc-63203
SIGNAL CORPS, DEPARTMENT OF THE ARMY
DEPARTMENT OF ARMY PROJECT NO. 3-99-04-042
SIGNAL CORPS PROJECT NO. 194B

May, 1955

TABLE OF CONTENTS

LIST OF ILLUSTRATIONS
ABSTRACT
1. BACKGROUND FOR THE THEORY
   1.1 The Detection Theory
   1.2 Definitions of Optimum
   1.3 Forced-Choice Optimization
   1.4 The Application of the Theory to Human Behavior
2. RECOGNITION FOR THE CASE OF TWO SIMPLE ALTERNATIVES
   2.1 The Decision Axis for Two Signals
   2.2 Modification to Allow for Observation at Both Frequencies
   2.3 Experimental Design
   2.4 The Experiments
   2.5 Generality of the Theory
3. EXPANSION OF THE THEORY
   3.1 Requirement of a Set of Alternatives
   3.2 The Existence of Hypotheses
   3.3 The Entropy of the Alternative Ensemble
   3.4 The Equivocation
   3.5 Optimum Behavior Criteria
   3.6 Complex Alternatives
   3.7 Information Basis for a Choice between Complex Alternatives
4. CONCLUSIONS
APPENDIX I OBSERVER EFFICIENCY
APPENDIX II THURSTONE'S LAW OF COMPARATIVE JUDGMENT
REFERENCES
DISTRIBUTION LIST

LIST OF ILLUSTRATIONS

Figure
1  Block Diagram Illustrating Basic Assumption
2  The Recognition Space for a Signal Known to be One of Two Frequencies
3  Simplified Diagrams for the Orthogonal Cases
4  Detection Case for a Signal Known to be at One of Two Frequencies. Observer 1. Duration 0.1 Sec.
5  Detection Case for a Signal Known to be at One of Two Frequencies. Observer 2. Duration 0.1 Sec.
6  Detection Case for a Signal Known to be at One of Two Frequencies. Observer 1. Duration 0.3 Sec.
7  Detection Case for a Signal Known to be at One of Two Frequencies. Observer 2. Duration 0.3 Sec.
8  Observer 1. Duration = 0.05 Sec. Around 1000 cps. Recognition of One of Two Alternatives
9  Observer 2. Duration = 0.05 Sec. Around 1000 cps. Recognition of One of Two Alternatives
10 Observer 1. Duration = 0.1 Sec. Around 1000 cps. Recognition of One of Two Alternatives
11 Observer 2. Duration = 0.1 Sec. Around 1000 cps. Recognition of One of Two Alternatives
12 Observer 1. Duration = 0.5 Sec. Around 1000 cps. Recognition of One of Two Alternatives
13 Observer 2. Duration = 0.5 Sec. Around 1000 cps. Recognition of One of Two Alternatives
14 Observer 1. Duration = 1 Sec. Around 1000 cps. Recognition of One of Two Alternatives
15 Observer 2. Duration = 1 Sec. Around 1000 cps. Recognition of One of Two Alternatives

ABSTRACT

The theory of signal detection as applied to the human observer is reviewed. The theory is then extended to include the simple case of recognizing a signal as one of a set of two alternatives, and experiments relating to this case are reported. The principles upon which the theory can be extended to cover more complex alternatives are developed.

1. BACKGROUND FOR THE THEORY

The theory of statistical decision has previously been applied to the problem of the sensory detection of signals. In this paper a simple recognition problem is analyzed in the framework of statistical decision. The extension of the detection theory to include the problem of recognition is the first phase of a general expansion of the theory to encompass the field of perception. To make this paper self-contained, a brief review of the detection theory, and of its correspondence with the data presently available, is presented. This review is followed by a development of the recognition theory for the simple case studied here. Finally, certain principles are outlined for the general extension of the theory. The paper is presented largely in terms of auditory theory, although it is felt that the theory is applicable to the entire field of human perception.

1.1 The Detection Theory

The application of the theory of signal detection, or statistical decision theory, depends on three basic assumptions:
1) Sensory systems function primarily as communication channels.
2) Sensory systems are noisy channels.
3) Central mechanisms, where decisions are made, are capable of approximating optimum use of the information gathered by the peripheral sensory mechanisms.

The assumptions are illustrated by the block diagram in Figure 1. The physical environment, which in the theory discussed below is equivalent to the signal and noise generators, presents an input to the receptor organ. The function of this organ is to transform the physical energy into neural activity. The information contained in the neural activity is then transmitted along the sensory pathways. These pathways are subject to internally generated noise. The information, plus the noise added in transmission, is then presented to, or displayed at, cortical centers. The presentation is here considered as an observation, x, upon which the decision is based. The key points of the theory are that noise is added in the transmission of the information, and that the decision-making device is a perfect device. A decrement in performance (from that which would be expected if a perfect device were placed at the receptor level) is attributed to the noise added in sensory transmission.

The fundamental problem in signal detection is the fixed observation interval problem. That is, the observer is asked to observe the output of a sensory system, and is then asked to decide, on the basis of his observation, whether this output arose from noise alone or from signal plus noise. In this theory the signal is known to be from a certain ensemble of signals. This is the criterion approach. In other words, the observer chooses a set of observations (the criterion A) which he will say represents signal plus noise; all other observations are in the complement of the criterion, CA, and he will say that these represent noise alone. The notation SN denotes signal plus noise, and N denotes noise alone. If there are only a countable number of possible observations, each observation x has the probability $P_{SN}(x)$ of occurrence if signal plus noise is presented and the probability $P_N(x)$ of occurrence if noise alone is present. The likelihood ratio is defined as $l(x) = P_{SN}(x)/P_N(x)$. Usually there are uncountably many observation points (x is a continuous variable), and the probability density functions $f_{SN}(x)$ and $f_N(x)$ must be used; the likelihood ratio is then the ratio of these two quantities. A criterion is usually evaluated in terms of the integrals of the density functions over the criterion A: the integral of $f_{SN}(x)$ over A is the conditional probability of detection, $P_{SN}(A)$, and the integral of $f_N(x)$ over A is the conditional probability of a false alarm, $P_N(A)$.

1. The discussion in Sections 1.1 and 1.2 is based on an unpublished paper by T. G. Birdsall of the Electronic Defense Group, University of Michigan.

[Figure: PHYSICAL ENVIRONMENT → RECEPTION ORGAN → SENSORY PATHWAYS (INFORMATION, with NOISE added) → DISPLAY (OBSERVATION SPACE) → DECISION MAKER.]
FIG. 1. BLOCK DIAGRAM ILLUSTRATING BASIC ASSUMPTION.

1.2 Definitions of Optimum

The theory of signal detectability is essentially this: a class of criteria is defined in terms of likelihood ratio. Six slightly different definitions of "optimum" are advanced, and under each definition the optimum is found to be in this class of likelihood-ratio criteria. A criterion in this optimum class is denoted A(β), which means that the criterion contains all observations with likelihood ratio greater than β and none of those with likelihood ratio less than β (that is, β represents the boundary). Whether or not likelihood ratios equal to β fall in or out of the criterion is unimportant. The six definitions of optimum, and their solutions (the exact value of β to be used, called the operating level), are listed below:
1) The Weighted Combination Criterion. This criterion, by definition, maximizes $P_{SN}(A) - \beta P_N(A)$. Solution: A(β); that is, the observer reports that a signal is present if $l(x) > \beta$, where x is the stimulus magnitude.

2) Siegert's Ideal Observer. The observer employs a criterion that minimizes total error; this is a special case of the Weighted Combination Criterion. Solution: A(β), where β = P(N)/P(SN), the ratio of a priori probabilities.
3) Expected Value Observer. The observer employs a criterion that maximizes the total expected value, where the individual values are:
   $V_{SN\cdot A}$ = value of a detection
   $V_{N\cdot CA}$ = value of a correct "no signal present"
   $K_{SN\cdot CA}$ = cost of a miss
   $K_{N\cdot A}$ = cost of a false alarm
This is a further refinement beyond Siegert's Ideal Observer. Solution: A(β), where
$$\beta = \frac{P(N)}{P(SN)} \cdot \frac{V_{N\cdot CA} + K_{N\cdot A}}{V_{SN\cdot A} + K_{SN\cdot CA}}$$
4) The Neyman-Pearson Observer. The observer employs a criterion such that $P_N(A) = k$, with $P_{SN}(A)$ a maximum over all criteria. Solution: A(β), where $P_N[A(\beta)] = k$.
5) A Posteriori Probability Observer. Here the observer does not actually employ a criterion; he makes the best estimate of the probability that signal plus noise was the input leading to the observation x = x(t):
$$P_x(SN) = \frac{l(x)\,P(SN)}{l(x)\,P(SN) + 1 - P(SN)}$$

6) Information Observer. This criterion maximizes the reduction in uncertainty, in the Shannon sense (Ref. 2), as to whether or not a signal was sent. Solution: A(β), where β is the solution to
$$\beta = \frac{P(N)}{P(SN)} \cdot \frac{\log P_{B(\beta)}(N) - \log P_{A(\beta)}(N)}{\log P_{A(\beta)}(SN) - \log P_{B(\beta)}(SN)}$$

1.3 Forced-Choice Optimization

A somewhat different optimization is involved in the forced-choice psychophysical experiment. In the forced-choice experiment a signal is presented in one of n intervals, either in time or in space, and the observer's task is to state in which interval the signal occurred. Optimum behavior requires making an observation, x, in each interval, and choosing the interval for which the likelihood ratio, $l(x) = f_{SN}(x)/f_N(x)$, is greatest. The situation differs somewhat from the criterion approach, as does that of the a posteriori observer.

1.4 The Application of the Theory to Human Behavior

A series of earlier papers has dealt with experiments designed to test the applicability of the theory of statistical decision to signal detection by the human observer. The conclusions drawn from the experiments are the following:
1) A subject can observe well into noise. The observation variable, x, is indeed continuous (Refs. 4, 5, 6, 7, 8, 9).
2) The observer can act to optimize expected values in the fixed observation interval experiment. This is shown to be true for both vision (Refs. 4, 6) and audition (Ref. 9).
3) The observer can act as a Neyman-Pearson observer (Ref. 4).
4) The observer can act as an a posteriori observer (Ref. 4).

5) The observer can optimize in the forced-choice experiment, as indicated by the predictability from forced-choice to yes-no experiments. The ordering extends beyond the first choice, as indicated by the second-choice and fourth-choice experiments (Refs. 4, 9).
6) Within limits the observer can optimize the use of knowledge of signal parameters. He can tune to a narrow band of frequencies in auditory experiments, and can, within limits, adjust his auditory bandwidth to knowledge of signal duration.
7) At any instant in time the observer is tuned to exactly one band of frequencies. To act as a wide-band receiver is a process requiring time (Ref. 9).
8) When the observer is listening for a signal known to be at one of two frequencies, detection performance decreases as a function of the separation of the frequencies (see Section 2.2). This performance suggests a scanning-type mechanism (Refs. 5, 9).
These conclusions furnish the background for the recognition theory.

2. RECOGNITION FOR THE CASE OF TWO SIMPLE ALTERNATIVES

Recognition, by definition, is the process of classifying a signal as a particular one of a set. The fundamental problem treated here is the case of a set with two members, each a signal of a specified frequency, f1 or f2. Through the experimental design, the a priori probabilities of the two signals satisfy P(S1N) + P(S2N) = 1.00. The observation xy is now associated with two probability density distributions, $f_{S_1N}(xy)$ and $f_{S_2N}(xy)$. The decision is again based on likelihood ratio, as shown in Equation 1.

$$l(xy) = \frac{f_{S_1N}(xy)}{f_{S_2N}(xy)} \qquad (1)$$
If the decision function can be defined along an axis, the problem is similar to the detection problem, with one of the probability density functions ($f_{S_1N}(x)$ or $f_{S_2N}(x)$) substituted for the $f_N(x)$ of the detection problem.

2.1 The Decision Axis for Two Signals

The problem is illustrated in Figure 2. The axis OX represents the decision axis for the detection case where the signal is known to be at f1. The axis OY represents the decision axis for the detection case where the frequency is known to be f2. The distance OX, divided by the standard deviation of the noise distribution $f_N(x)$, is called $d_1'$, the d' for detection of frequency f1. The distance OY, divided by the standard deviation of $f_N(y)$, is $d_2'$, the d' for detection of frequency f2. The d' for recognition of frequency, when the signal is known to be either f1 or f2, is designated $d_{1,2}'$. The distributions $f_N(x)$, $f_N(y)$, $f_{S_1N}(x)$ and $f_{S_2N}(y)$ are all assumed to have equal variance.

The problem considered is again the fixed observation interval problem. An observation xy, a function of time for T seconds, is the datum upon which the decision is based. The signal is known to be either f1 or f2, but not both. Then, assuming x and y are independent, $f_{S_1N}(xy)$ is the joint probability density of $f_{S_1N}(x)$ and $f_N(y)$, while $f_{S_2N}(xy)$ is the joint probability density of $f_{S_2N}(y)$ and $f_N(x)$.

1. As the development is suggestive of Thurstone's law of comparative judgment, the similarities and differences between this theory and that law are discussed in Appendix II.

FIG. 2. THE RECOGNITION SPACE FOR A SIGNAL KNOWN TO BE ONE OF TWO FREQUENCIES.

If
$$f_{S_1N}(xy) = \frac{1}{2\pi}\, e^{-\frac{(x-d_1')^2}{2}}\, e^{-\frac{y^2}{2}} \qquad (2)$$
and
$$f_{S_2N}(xy) = \frac{1}{2\pi}\, e^{-\frac{x^2}{2}}\, e^{-\frac{(y-d_2')^2}{2}} \qquad (3)$$
then, from Eq. 1,
$$\log l(xy) = -\frac{x^2}{2} + x d_1' - \frac{(d_1')^2}{2} - \frac{y^2}{2} + \frac{x^2}{2} + \frac{y^2}{2} - y d_2' + \frac{(d_2')^2}{2} = x d_1' - y d_2' - \frac{(d_1')^2 - (d_2')^2}{2} \qquad (4)$$
If $d_1' = d_2' = d'$, then
$$y = x - \frac{\log l(xy)}{d'} \qquad (5)$$
Thus, if l(xy) is held constant, this is the equation for a straight line with slope 1.00 and intercept $-\log l(xy)/d'$. This line passes through the origin when l(xy) = 1.00. By Equation 5, each value of l(xy) is represented by a line of slope 1 which intersects at right angles the line connecting the points $x = d_1'$ (on the x axis) and $y = d_2'$ (on the y axis). From this it follows that the decision axis for the recognition problem can be mapped onto this line, with S1 normally distributed (x, 1) and S2 normally distributed (y, 1).

Part of Figure 2 has been reproduced as Figure 3(a) to illustrate this point more simply. The dotted lines in this figure are lines of constant likelihood ratio. The slope of the line xy is -1, while the slope of the lines of constant likelihood ratio is +1 (by Eq. 5); therefore these lines intersect at right angles. If the value of y is held constant, say at y = 0, then the values of x are normally distributed along the x axis, indicating the normality of the mapping along the xy axis.
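As a check on Eq. 4, and on the claim that the recognition decision reduces to a single normal axis, the Python sketch below compares Eq. 4 with the log ratio taken directly from the densities of Eqs. 2 and 3, then simulates both signals and measures how far apart they fall on log l(xy). It assumes unit variance and independent x and y, as the text does; the function names are hypothetical.

```python
import math
import random

def log_lr(x, y, d1, d2):
    """Eq. 4: log l(xy) for independent, unit-variance observations x and y."""
    return x * d1 - y * d2 - (d1 ** 2 - d2 ** 2) / 2.0

def log_lr_from_densities(x, y, d1, d2):
    """The same quantity taken directly from the densities of Eqs. 2 and 3."""
    log_f1 = -math.log(2 * math.pi) - (x - d1) ** 2 / 2 - y ** 2 / 2
    log_f2 = -math.log(2 * math.pi) - x ** 2 / 2 - (y - d2) ** 2 / 2
    return log_f1 - log_f2

def axis_separation(d1, d2, trials=100_000, seed=1):
    """Simulated separation of S1 and S2 along the decision axis, in units of
    the pooled standard deviation of log l(xy)."""
    rng = random.Random(seed)
    # S1: signal adds d1 to x; S2: signal adds d2 to y
    s1 = [log_lr(rng.gauss(d1, 1), rng.gauss(0, 1), d1, d2) for _ in range(trials)]
    s2 = [log_lr(rng.gauss(0, 1), rng.gauss(d2, 1), d1, d2) for _ in range(trials)]
    m1, m2 = sum(s1) / trials, sum(s2) / trials
    pooled_var = (sum((v - m1) ** 2 for v in s1)
                  + sum((v - m2) ** 2 for v in s2)) / (2 * trials)
    return (m1 - m2) / math.sqrt(pooled_var)

# Illustrative values: compare the simulated separation with sqrt(d1^2 + d2^2).
print(axis_separation(1.0, 1.0), math.sqrt(2.0))
```

For equal-variance normal distributions the separation works out to $\sqrt{(d_1')^2 + (d_2')^2}$, which anticipates Equation 6.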

The assumption of independence implies that the angle XOY = 90°. Therefore
$$(d_{1,2}')^2 = (d_1')^2 + (d_2')^2 \qquad (6)$$
If $d_1' \neq d_2'$, then Equation 5 becomes
$$y = \frac{d_1'}{d_2'}\,x - \frac{\log l(xy) + \frac{(d_1')^2 - (d_2')^2}{2}}{d_2'} \qquad (7)$$
and it can be shown that constant l(xy) is represented by a line which intersects the line xy at right angles, the line l(xy) = 1.00 intersecting it at the midpoint. Thus Equation 6 also applies to this case.

Again, part of Figure 2 has been reproduced, as Figure 3(b). The line connecting x and y has slope $-d_2'/d_1'$, while the lines of constant likelihood ratio have slope $+d_1'/d_2'$; since the product of the two slopes is -1, the lines again intersect at right angles. In the figure, the distance $xy = \sqrt{(d_1')^2 + (d_2')^2}$. To locate the intersection of the line l(xy) = 1.00 with the line xy, let A be the angle between xy and the x axis at the point $x = d_1'$, and let a be the distance from that point, along xy, to the intersection. From the triangle OXY,
$$\cos A = \frac{d_1'}{xy}$$
By Eq. 4, the line l(xy) = 1.00 crosses the x axis at $x = [(d_1')^2 - (d_2')^2]/2d_1'$, a distance $[(d_1')^2 + (d_2')^2]/2d_1'$ from the point $x = d_1'$; because it meets xy at right angles,
$$\cos A = \frac{2 a d_1'}{(d_1')^2 + (d_2')^2}$$
Equating the two expressions,
$$\frac{d_1'}{xy} = \frac{2 a d_1'}{(d_1')^2 + (d_2')^2}, \qquad \text{so} \qquad 2a \cdot xy = (d_1')^2 + (d_2')^2$$
Since $(xy)^2 = (d_1')^2 + (d_2')^2$, it follows that $a = xy/2$; thus, if l(xy) = 1.00, the line xy is intersected at the midpoint.

[Figure: panel (a), the equal-d' case, with the line l(x) = 1.00 through the origin; panel (b), the unequal-d' case, with axis distances d1' and d2'.]
FIG. 3. SIMPLIFIED DIAGRAMS FOR THE ORTHOGONAL CASES.

If the two signals are not independent, or, in other words, if there is a common factor in the observations x and y, then the signal spaces x and y are correlated. The degree of correlation is defined by the cosine of the angle θ. For this case
$$(d_{1,2}')^2 = (d_1')^2 + (d_2')^2 - 2\cos\theta\, d_1' d_2' \qquad (8)$$
Equation 8 is the general form. For the orthogonal or independent case, cos θ = 0 and Equation 8 is identical with Equation 6. For the perfectly correlated case, such as two signals of the same frequency differing only in d' as a result of different intensity, cos θ = 1.00 and (taking $d_1' \ge d_2'$)
$$d_{1,2}' = d_1' - d_2' \qquad (9)$$
Thus, in each case, the decision function has been defined along an axis, showing that each recognition case is essentially the same as the detection case.

This discussion furnishes the basis for the development of the theory. So far it is based on the assumption that the process of observing at one frequency does not interfere with the process of observing at other frequencies. That this is not a valid assumption is indicated by the seventh and eighth conclusions in Section 1.4.
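Once the recognition problem has been reduced to one axis, Eq. 8 gives the separation along it, and any operating level β from Section 1.2 fixes where the boundary falls. The sketch below assumes unit-variance normal distributions on the normalized axis, with S2 playing the role that N plays in detection; the function names are illustrative, not taken from the report.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def d_prime_12(d1, d2, theta_deg):
    """Eq. 8: recognition d' when the two signal spaces meet at angle theta."""
    c = math.cos(math.radians(theta_deg))
    return math.sqrt(d1 ** 2 + d2 ** 2 - 2.0 * c * d1 * d2)

def recognition_p_correct(d12, p1, beta):
    """Proportion correct when 'S1' is reported whenever l(xy) > beta.

    On the normalized decision axis z, S1 gives z ~ N(+d12/2, 1) and S2 gives
    z ~ N(-d12/2, 1), so log l = d12 * z and the boundary sits at ln(beta) / d12.
    p1 is the a priori probability of S1.
    """
    z_c = math.log(beta) / d12
    p_s1_correct = PHI(d12 / 2.0 - z_c)  # S1 presented, 'S1' reported
    p_s2_correct = PHI(d12 / 2.0 + z_c)  # S2 presented, 'S2' reported
    return p1 * p_s1_correct + (1.0 - p1) * p_s2_correct

def ideal_beta(p1):
    """Siegert's ideal observer (Section 1.2), with S2 in the role of N."""
    return (1.0 - p1) / p1

d12 = d_prime_12(1.85, 1.85, 90.0)  # orthogonal case, Eq. 6
for beta in (0.5, ideal_beta(0.7), 2.0):
    print(round(beta, 3), round(recognition_p_correct(d12, 0.7, beta), 3))
```

With β = P(S2N)/P(S1N), the analogue of Siegert's ideal observer, no other β gives a higher proportion correct, just as in the detection case.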

2.2 Modification to Allow for Observation at Both Frequencies

The experiments upon which the seventh and eighth conclusions of Section 1.4 are based suggest that, for a signal 0.1 second in duration at a frequency of either 900 cycles or 1000 cycles, the detection rate is as if both frequencies can be observed simultaneously. When the frequencies are separated by more than 100 cycles the detection performance is lower, until the separation reaches 300 cycles (f1 = 700 and f2 = 1000), at which point the performance is such that only one frequency can be observed during the duration of the signal. The results first reported by Tanner and Norman (Ref. 5) are illustrated in Figures 4 through 7, because they relate closely to the data reported below. The top curve is the forced-choice curve for a signal of known frequency. The middle curve is for a signal known to be at one of two frequencies when it is possible to observe at both frequencies simultaneously. The bottom curve is for a signal known to be at one of two frequencies when it is possible to observe at only one frequency; thus, if the observer happens to be observing at the wrong frequency, he is forced to make his choice without relevant information.

The data are placed on the graph as follows. First the d' is determined for the signals when the frequency is known exactly; then the percentage correct for the experiment in which the signal is known to be at one of two frequencies is entered at that d'. Two durations are represented, with virtually the same results. Both durations are likely within the range for matching bandwidth to duration (Section 1.4, Conclusion 6), so the results for the two durations should not be markedly different. Thus, for a signal 0.1 second in duration, the signal space is expected to show the angle of correlation, θ, increasing until the frequencies are separated by 100 cycles (900 or 1000 cycles), at which point a maximum of θ = 90° is expected.
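The three curves described above can be approximated by simulation. The sketch below is a Monte Carlo model under stated assumptions, not the procedure of Ref. 5: unit-variance, independent channels at f1 and f2; the two frequencies equally likely; a two-interval forced choice by default (n_intervals is a parameter, since the number of intervals used in Ref. 5 is not given here); the likelihood-ratio rule of Section 1.3 for the simultaneous observer; and, for the successive observer, one channel chosen at random on each trial.

```python
import math
import random
from statistics import NormalDist

def known_2afc_pc(d):
    """Analytic proportion correct for two-interval forced choice, known frequency."""
    return NormalDist().cdf(d / math.sqrt(2.0))

def forced_choice_pc(d, mode, n_intervals=2, trials=20_000, seed=7):
    """Monte Carlo proportion correct in an n-interval forced-choice task.

    Each interval yields one unit-variance observation per frequency channel
    (f1, f2); the signal adds d to one channel of one interval.
      'known'        frequency known in advance; only that channel is used
      'simultaneous' f1 or f2 equally likely; both channels are used
      'successive'   f1 or f2 equally likely; one channel, chosen at random, is used
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        target = rng.randrange(n_intervals)  # interval holding the signal
        freq = rng.randrange(2)              # channel carrying the signal
        attended = rng.randrange(2)          # channel heard in 'successive' mode
        scores = []
        for i in range(n_intervals):
            obs = [rng.gauss(d if (i == target and k == freq) else 0.0, 1.0)
                   for k in range(2)]
            if mode == "known":
                scores.append(obs[freq])
            elif mode == "simultaneous":
                # likelihood ratio for "signal in this interval", averaged over f1 and f2
                scores.append(sum(math.exp(d * o - d * d / 2.0) for o in obs) / 2.0)
            elif mode == "successive":
                scores.append(obs[attended])
            else:
                raise ValueError(f"unknown mode: {mode}")
        chosen = max(range(n_intervals), key=scores.__getitem__)
        correct += chosen == target
    return correct / trials

for mode in ("known", "simultaneous", "successive"):
    print(mode, round(forced_choice_pc(1.0, mode), 3))
```

In the successive case the observer who happens to listen at the wrong frequency can only guess, so the expected proportion correct is half the known-frequency value plus half the chance rate (0.75 at large d' for two intervals).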

[Figure: proportion correct (0 to 1.0) plotted against d' (0 to 6); curves labeled SIMULTANEOUS and SUCCESSIVE, with data points for several pairs of frequencies.]
FIG. 4. DETECTION CASE FOR A SIGNAL KNOWN TO BE AT ONE OF TWO FREQUENCIES. OBSERVER 1. DURATION 0.1 SEC.

[Figure: proportion correct (0 to 1.0) plotted against d' (0 to 6); curves labeled SIMULTANEOUS and SUCCESSIVE, with data points for several pairs of frequencies.]
FIG. 5. DETECTION CASE FOR A SIGNAL KNOWN TO BE AT ONE OF TWO FREQUENCIES. OBSERVER 2. DURATION 0.1 SEC.

[Figure: proportion correct (0 to 1.0) plotted against d' (0 to 6); curves labeled SIMULTANEOUS and SUCCESSIVE, with data points for several pairs of frequencies.]
FIG. 6. DETECTION CASE FOR A SIGNAL KNOWN TO BE AT ONE OF TWO FREQUENCIES. OBSERVER 1. DURATION 0.3 SEC.

[Figure: proportion correct (0 to 1.0) plotted against d' (0 to 5); curves labeled SIMULTANEOUS and SUCCESSIVE, with data points for several pairs of frequencies.]
FIG. 7. DETECTION CASE FOR A SIGNAL KNOWN TO BE AT ONE OF TWO FREQUENCIES. OBSERVER 2. DURATION 0.3 SEC.

With the frequencies further separated, the calculations of θ (Section 2.3) are expected to yield a decrease in θ until, at a separation of 300 cycles, θ should appear to be 60°. An apparent 60° can be achieved if the observer attends to one frequency, effectively performing a yes-no experiment: if he accepts a signal at that frequency, he states so; if he does not, he indicates that the signal was at the other frequency. After the maximum is reached, the decrease in θ does not represent correlation, but rather a loss due to the observer's inability to observe both x and y. If the signal is sufficiently long in duration, the observer should be able to observe both x and y regardless of the separation, so that once θ reaches 90° it should stay there for frequencies of wider separation.

2.3 Experimental Design

The experimental design involved first a two-choice, forced-choice experiment at each of the frequencies, until approximately equal d's are determined. Then a signal known to be at either one of the two frequencies is presented at a specified time, the observer's task being to state whether the signal is f1 or f2. The two-choice, forced-choice experiment is actually the choice of one of two signals orthogonal in time. The values $d_1'$ and $d_2'$ are determined in the following manner. The percentage correct is used as an estimate of the probability of a correct response. This figure is used to enter normal tables, and the corresponding normal deviate is determined. This value is multiplied by √2, giving the equivalent yes-no d'. For the recognition experiment the same procedure is used, except that in this case the normal deviate is multiplied by 2. From a rearrangement of Equation 8, the d's are then used to find θ:
$$\cos\theta = \frac{(d_1')^2 + (d_2')^2 - (d_{1,2}')^2}{2\, d_1' d_2'} \qquad (10)$$
Thus θ is obtained as a function of frequency separation.

TABLE I
RESULTS OF THE EXPERIMENTS¹
N0 = 52.3 db re 0.0002 dyne/cm²

Frequency
Separation   f1      f2      √(2E1/N0)   √(2E2/N0)   N1    N2    N1,2
(cps)        (cps)   (cps)

Duration 0.05 seconds
  25         975     1000    3.16        3.16        198   198   198
  50         950     1000    3.16        3.16        198   198   197
 100         900     1000    3.10        3.16        197   198   198
 200         800     1000    3.05        3.16        198   198   198
 300         700     1000    3.02        3.16        198   198   198
 400         700     1100    3.02        3.25        198   198   198
 500         700     1200    3.02        3.42        198   198   197
 600         700     1300    3.02        3.42        198   198   198

Duration 0.10 seconds
  25         975     1000    3.44        3.44        197   196   196
  50         950     1000    3.44        3.44        197   196   197
 100         900     1000    3.43        3.44        191   196   292
 200         800     1000    3.36        3.44        197   196   198
 300         700     1000    3.23        3.44        197   196   194
 400         700     1100    3.23        3.60        197   198   198
 500         700     1200    3.23        3.67        197   194   196
 600         700     1300    3.23        3.67        197   195   195

Duration 0.5 seconds
  25         975     1000    5.37        5.37        197   198   197
  50         950     1000    5.37        5.37        197   198   197
 100         900     1000    5.14        5.37        197   198   296
 200         800     1000    4.93        5.37        197   198   197
 300         700     1000    4.60        5.37        197   198
 400         700     1100    4.60        5.37        197   197   297
 500         700     1200    4.60        5.90        197   196   197
 600         700     1300    4.60        5.90        197   197   197

Duration 1.0 seconds
  25         975     1000    5.81        5.85        198   198   198
  50         950     1000    5.76        5.85        197   198   198
 100         900     1000    5.63        5.85        197   198   198
 200         800     1000    5.06        5.85        198   198   198
 300         700     1000    4.93        5.85        198   198   196
 400         700     1100    4.93        5.99        198   196   197
 500         700     1200    4.93        6.38        198   198   198
 600         700     1300    4.93        6.38        198   197   198

¹For an explanation of the term √(2E/N0), see Appendix I.

          Observer 1                       Observer 2
  d1'    d2'    d'1,2   θ (deg)    d1'    d2'    d'1,2   θ (deg)
  1.85   1.85   0.89    35         1.15   1.15   0.37    25
  1.77   1.85   1.51    50         1.00   1.15   0.85    51
  1.69   1.85   2.33    62         0.85   1.15   1.10    51
  1.97   1.85   2.18    69         1.12   1.15   1.26    67
  2.14   1.85   1.78    53         1.52   1.15   1.00    49
  2.14   1.80   1.71    40         1.52   1.23   1.13    28
  2.14   1.80   2.08    64         1.52   0.80   1.27    89
  2.14   1.80   2.05    62         1.52   1.28   1.38    63
  2.19   2.19   1.31    35         1.80   1.80   0.51    16
  2.19   2.19   1.78    48         1.80   1.80   1.37    45
  2.06   2.19   3.27    87         1.32   1.80   1.95    76
  2.91   2.19   2.80    13         1.46   1.80   1.68    60
  2.46   2.19   2.52    72         1.73   1.80   2.20    78
  2.46   2.40   2.61    65         1.73   1.69   1.89    69
  2.46   2.17   2.38    58         1.73   1.14   1.39    53
  2.62   2.62   1.49    33         2.38   2.38   0.69    16.6
  2.75   2.62   1.56    61         2.10   2.38   1.56    40.1
  2.91   2.62   3.76    86         1.84   2.38   2.51    73.6
  2.79   2.62   2.80    62         1.68   2.38   2.38    69.2
  2.67   2.62   3.38    80         1.76   2.38   2.78    82.9
  2.67   2.66   3.04    70         1.76   1.73   2.39    86.5
  2.67   2.42   3.74    86         1.76   1.65   2.42    90.7
  2.67   2.91   2.86    61         1.76   2.13   2.86    92.5
  2.65   2.56   2.15    48         1.52   1.73   0.48    16
  2.25   2.56   2.74    68         1.73   1.73   1.03    35
  2.19   2.56   3.49    93         1.23   1.73   1.86    76
  2.61   2.56   3.02    71         1.55   1.73   2.44    93
  2.25   2.56   3.38    80         1.40   1.73   2.22    90
  2.25   2.25   2.86    79         1.40   1.28   2.72    *
  2.25   2.40   2.80    76         1.40   1.73   2.15    86
  2.25   2.47   3.28    88         1.40   1.55   2.49    115

* θ indeterminate (see Section 2.4).
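Section 2.4 states that roughly 200 observations lie behind each determination of d', and several of the tabulated θ values exceed 90°. To gauge how much scatter binomial sampling alone can produce, the sketch below repeats the Section 2.3 procedure on simulated counts. The true values d1' = d2' = 2.0 and θ = 90° are hypothetical, and the clamping of 0 or n correct responses is an assumption of the sketch, not something the report describes.

```python
import math
import random
from statistics import NormalDist, median

_ND = NormalDist()

def d_from_count(k, n, scale):
    """Proportion correct -> normal deviate -> d'. Clamping keeps 0/n and n/n
    finite (an assumption of this sketch)."""
    p = min(max(k / n, 0.5 / n), 1.0 - 0.5 / n)
    return scale * _ND.inv_cdf(p)

def simulate_theta(d1, d2, theta_true, n=200, reps=1000, seed=3):
    """Sampling distribution of the estimated theta (degrees) when each of the
    three proportions correct rests on n binomial trials."""
    rng = random.Random(seed)
    d12 = math.sqrt(d1 ** 2 + d2 ** 2 - 2 * d1 * d2 * math.cos(math.radians(theta_true)))
    p1 = _ND.cdf(d1 / math.sqrt(2))  # two-choice forced choice at f1
    p2 = _ND.cdf(d2 / math.sqrt(2))  # two-choice forced choice at f2
    p12 = _ND.cdf(d12 / 2)           # recognition with an unbiased observer

    def count(p):
        return sum(rng.random() < p for _ in range(n))

    thetas = []
    for _ in range(reps):
        e1 = d_from_count(count(p1), n, math.sqrt(2))
        e2 = d_from_count(count(p2), n, math.sqrt(2))
        e12 = d_from_count(count(p12), n, 2.0)
        if e1 <= 0 or e2 <= 0:
            continue  # theta is indeterminate for this replication
        c = (e1 ** 2 + e2 ** 2 - e12 ** 2) / (2 * e1 * e2)
        thetas.append(math.degrees(math.acos(max(-1.0, min(1.0, c)))))
    return thetas

thetas = simulate_theta(2.0, 2.0, 90.0)
above = sum(t > 90 for t in thetas) / len(thetas)
print(f"median {median(thetas):.1f} deg, share above 90 deg: {above:.2f}")
```

Under these assumptions a large share of the simulated θ values land above 90° even though the true angle is exactly 90°, so an individual entry above 90° is not by itself evidence against the theory.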

2.4 The Experiments

Four durations, 0.05, 0.1, 0.5, and 1.0 seconds, were studied for frequency separations of 25, 50, 100, 200, 300, 400, 500, and 600 cycles. Two observers served for the entire set of experiments. The experiments were conducted by the N. P. Psytar programming system, described elsewhere (Refs. 2, 9). Approximately 200 observations are contained in each determination of d'. The results are tabulated in Table I and are presented graphically in Figures 8 through 15. No effort has been made to fit curves to the data, so that the reader can have an unbiased look at the data. Occasionally the results show θ's greater than 90°. In all except one case the deviation appears to lie within the range of sampling error. The one case of serious deviation can be accounted for largely by a single run of 100 in which one observer dropped appreciably in the detection experiment relative to the other run of 100 at that frequency. The result is an indeterminate θ, which explains the absence of a data point for a frequency separation of 400 cycles in Figure 15. Aside from this, the data conform roughly with the predictions.

2.5 Generality of the Theory

In the introduction it is suggested that, while the study is in audition, the application extends generally to human information-collecting systems. To illustrate the anticipated generality, the following discussion treats the problem of color vision in terms of experimental design and data interpretation. Suppose that instead of two frequencies, two monochromatic light signals are studied in an experiment. Exactly the same procedure is followed, ending with a determination of cos θ.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 8. OBSERVER 1. DURATION = 0.05 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 9. OBSERVER 2. DURATION = 0.05 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 10. OBSERVER 1. DURATION = 0.1 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 11. OBSERVER 2. DURATION = 0.1 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 12. OBSERVER 1. DURATION = 0.5 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 13. OBSERVER 2. DURATION = 0.5 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 14. OBSERVER 1. DURATION = 1 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

[Figure: θ (0° to 120°) plotted against frequency separation (0 to 600 cps).]
FIG. 15. OBSERVER 2. DURATION = 1 SEC. AROUND 1000 CPS. RECOGNITION OF 1 OF 2 ALTERNATIVES.

In Section 2.1, cos θ is represented as a correlation term. According to the theory, if there is no common variance and the signals are transmitted to the central mechanism over independent channels, then for these monochromatic light signals θ is 90° (cos θ = 0). This example is used because it does not depend on a presumed equivalence between the wavelength of light and the frequency of sound. This theoretical framework offers a method of psychological determination of the number of independent systems involved in color vision, or of the number of different types of color receptors. There are many other problems to which such a theoretical framework may be useful.

3. EXPANSION OF THE THEORY

So far, the program described in the series of papers dealing with detection and recognition problems, as treated in terms of statistical decision theory, has dealt largely with simple cases readily amenable to study through experimentation. The theory has implications for complex signal structures, such that there now appears to be a basis for a more general theory. This section attempts to present a basis for that development.

3.1 Requirement of a Set of Alternatives

A decision actually is a choice of one from a set of alternatives. Up to the present time, the theory has dealt with cases involving decisions in favor of one of a set of two alternatives. Implicit in the theory is the fact that for an alternative to have associated with it an a posteriori probability greater than zero, it must also have an a priori probability greater than zero. This is a consequence of Bayes' Theorem. Thus, whenever an observer is placed in a recognition situation, his choice is one from a set of alternative signals, each of which has an a priori probability greater than zero.
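Bayes' theorem makes the point concrete: the a posteriori probability of an alternative is proportional to its a priori probability times its likelihood, so a zero prior cannot be overcome by any observation. The sketch below uses hypothetical numbers, and the function name is illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P_x(i) = P(i) f_i(x) / sum_j P(j) f_j(x)."""
    joint = [p * f for p, f in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("observation impossible under every admitted alternative")
    return [j / total for j in joint]

# Hypothetical numbers: the third alternative fits the observation far better,
# but the observer has not admitted it (prior 0), so its posterior stays 0.
print(posterior([0.5, 0.5, 0.0], [0.2, 0.1, 50.0]))
```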

ENGINEERING RESEARCH INSTITUTE * UNIVERSITY OF MICHIGAN It is further a requirement that the sum of these a priori probabilitiE be one. This requirement, along with the requirement expressed in the previous paragraph, states that an observer, placed in a recognition situation, assumes ar ensemble of alternative stimuli, Ai (1 —---- i ------- n) which has the followir properties. For every i 0< P(i)< 1.00 (11) n and Z p(i) = 1.00 i=l where P(i) is the a priori probability of the ith alternative. It is important t note that all of the experiments so far reported in support of this theory are designed to specify the conditions of Equation (11) for the observer. The a priori probabilities are not necessarily the true a priori probabilities. It is worth repeating the statement that these are probabilities that the observer assumes. They are, in fact, the observer's beliefs. Before th observer can state an a posteriori probability of an alternative existence, he mtu first admit the possibility of its a priori occurrence. Otherwise he would never consider the occurrence. The mere fact that he considers the alternatives implie that the a priori probability is greater than zero. The a priori probabilities, assigned to the alternatives, depend on the observers past experience, immediate and distant. In an experimental situation t immediate past experience may consist of the experimental instructions and the results of the trials as the experiment progresses, while the more distant past experience may consist of his trust in the experimenter and his idea of the purpo of psychological experiments. The assumed probabilities may or may not approxima the true probabilities. If they fail to approximate the true probabilities there may be adjustments as experience accumulates. The important fact is that the 352 ---

probabilities assumed by the observer are more likely to determine his behavior than are the true probabilities of the form of the signal.

3.2 The Existence of Hypotheses

By definition, an hypothesis is a probability distribution function. By the noise assumption of detection theory, for each signal alternative there exists an hypothesis, $f_i(x)$: the probability density that, if the ith alternative exists, the observation x results. Further, by the noise assumption, $f_i(x) \neq 0$ for every i, although for many of the alternatives it may be very close to zero.

3.3 The Entropy of the Alternative Ensemble

The alternative ensemble has been defined so that it is equivalent to the message ensemble of Shannon's communication theory (Ref. 3). Thus, an entropy may be assigned to the alternative ensemble:

$$H(x) = -\sum_{i=1}^{n} P(i) \log_2 P(i) \qquad (12)$$

This entropy is the uncertainty of the set, and defines the amount of information necessary to resolve the uncertainty. It is necessary to appreciate here that Shannon deals with averages, such that no single trial can describe the process. Suppose the observer is placed in exactly the same situation a large number of times, such that each alternative actually exists according to its associated probability. Then, on the average, H(x) bits of information are required to resolve the uncertainty.

Equation (12) is based on the observer's assumed probabilities, and expresses the amount of information he requires to resolve an assumed uncertainty. How a discrepancy between the observer's assumed ensemble and the true ensemble enters is discussed in the following sections.
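Equations (11) and (12) translate directly into code. The following is a minimal Python sketch, assuming the observer's assumed ensemble is given as a list of a priori probabilities. It rejects an ensemble that violates Equation (11) and otherwise returns the entropy of Equation (12) in bits. Both function names are hypothetical.

```python
import math


def check_ensemble(priors, tol=1e-9):
    """Equation (11): every P(i) strictly between 0 and 1, and the P(i) sum to one."""
    if not all(0.0 < p < 1.0 for p in priors):
        raise ValueError("each P(i) must satisfy 0 < P(i) < 1")
    if abs(sum(priors) - 1.0) > tol:
        raise ValueError("the P(i) must sum to 1")


def ensemble_entropy(priors):
    """Equation (12): H(x) = -sum_i P(i) log2 P(i), in bits."""
    check_ensemble(priors)
    return -sum(p * math.log2(p) for p in priors)


print(ensemble_entropy([0.5, 0.5]))         # 1.0 bit
print(ensemble_entropy([0.25] * 4))         # 2.0 bits
print(ensemble_entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```

Two equally likely alternatives carry one bit of uncertainty and four carry two. An ensemble containing an alternative with P(i) = 0 is rejected, in keeping with Section 3.1.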

3.4 The Equivocation

According to Shannon, the equivocation is the uncertainty remaining after the transmission of information. Here it is the uncertainty remaining after the observation x. For each alternative in the ensemble there is an associated a posteriori probability $P_x(i)$, not equal to zero or one, with $\sum_i P_x(i) = 1.00$. The equivocation is

$$H_y(x) = -\int \sum_{i=1}^{n} P(i)\, f_i(x) \log_2 P_x(i)\, dx \qquad (13)$$

where P(i) and $f_i(x)$ do not depend on the observer's assumed probabilities, while $P_x(i)$ does depend on them. A discrepancy between the observer's assumed probabilities and the true probabilities increases $H_y(x)$.

3.5 Optimum Behavior Criteria

In Section 1.2, six different definitions of optimum criteria are advanced. The one of particular interest here is the expected value optimum. This interest is based on a simple fact: as far as reducing entropy is concerned, it is never optimum to make a decision if one can legitimately avoid making one. It is always better to store the likelihood ratio, or some monotone function of the likelihood ratio such as a posteriori probability; this has been demonstrated by Woodward and Davies (Ref. 12). It is therefore postulated that, wherever possible, the observer stores the observations, making decisions only when advisable. The decision is for the purpose of determining action, not for maximizing information.

The observer may feel that the expected value of an action based on the information available at some time t is greater than the expected value of any action based on additional information. In that case, a decision is made at time t. If he feels that the additional information is likely to increase the expected value of the action, the decision is delayed. Thus, at any time

during the information-collecting process the observer is faced with a choice of one of n decisions, one of which is to collect further information. To each decision there is attached an expected value. The decision associated with the greatest expected value is the observer's choice.

It is at this point that the observer first suffers from a discrepancy between his assumed probabilities and the true probabilities, for it is when he makes a decision to take action that he becomes aware of errors. The values and costs are realized at this point, and he finds that he does not realize his expected values. At this time he may attempt to correct his assumed ensemble.

3.6 Complex Alternatives

Up to the present time, only situations in which single observations are required have been considered. By definition, a complex alternative is a sequence of simple alternatives. The complex set defines the entropy of the ensemble. Let $A_j$ represent the jth complex alternative, consisting of the sequence $a_{j1}, a_{j2}, \ldots, a_{ji}, \ldots, a_{jn}$. A set of m complex alternatives then has the entropy

$$H(x) = -\sum_{j=1}^{m} P(A_j) \log_2 P(A_j) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(a_{ji}) \log_2 P(a_{ji}) \qquad (14)$$

The complex set may be redundant. Information concerning any simple alternative in the sequence may furnish a sufficient basis for a decision.

3.7 Information Basis for a Choice Between Complex Alternatives

In Section 3.5 it is postulated that a decision is made on the basis of expected values. Thus, for a set of complex alternatives, each simple alternative results in an observation, $x_i$. The set of observations, $x_i$, is combined into a single output x, such that for each complex alternative there is the probability density function $f_{A_j}(x)$. This function specifies the probability that, if the jth

complex alternative exists, this particular sequence of observations results. For each complex alternative, a likelihood can now be determined on the basis of the sequence of observations. The choice is then made on the basis of optimizing expected value. The significance of this statement is that the sequence of outputs can be mapped into a single output (likelihood). The problem then reduces again to the problem of simple alternatives, with the decision made on the same basis.

4. CONCLUSIONS

A simple theory of recognition is developed as an extension of the detection theory. Experimental evidence is presented supporting the theory. A framework is presented for extending the theory to more complex situations, showing how these more complex situations can be mapped into the same space that applies to the simple situation. It remains to work out cases illustrating the more complex situations in sufficient detail to permit experimental evaluation. Experimental confirmation of the theory developed in Section 3 would provide a basis for the systematization of recognition data.
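Sections 3.5 to 3.7 describe a two-step procedure. First, a sequence of observations is mapped into a single likelihood for each complex alternative. Then the observer takes the decision with the greatest expected value, where collecting further information counts as one of the possible decisions.

The following is a minimal Python sketch of that procedure. It rests on assumptions the report does not make:

- Each simple alternative produces a unit-variance Gaussian observation with a given mean.
- Successive observations are independent, so the sequence log-likelihood is a sum.
- Payoffs are +1 for a correct response and -1 for an error.
- The expected value of collecting more information is supplied as a number rather than computed.

All function names are hypothetical.

```python
import math


def log_density(x, mean, sd=1.0):
    """Log of a Gaussian density: the assumed hypothesis for one simple alternative."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def sequence_log_likelihood(observations, means):
    """Map the sequence x_1..x_n into one number, log f_Aj(x), assuming independence."""
    return sum(log_density(x, m) for x, m in zip(observations, means))


def posterior_over_complex(observations, alternatives, priors):
    """alternatives[j] lists the assumed mean of each element of complex alternative A_j."""
    logs = [math.log(p) + sequence_log_likelihood(observations, means)
            for p, means in zip(priors, alternatives)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract the largest log to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


def choose(posteriors, payoffs, value_of_waiting):
    """Return the decision with the greatest expected value.

    payoffs[k][j] is the value of responding A_k when A_j exists; value_of_waiting
    is the assumed expected value of collecting further information.
    """
    options = {f"respond A{k + 1}": sum(p * v for p, v in zip(posteriors, row))
               for k, row in enumerate(payoffs)}
    options["collect further information"] = value_of_waiting
    best = max(options, key=options.get)
    return best, options


d = 1.0                                        # hypothetical per-element d'
alternatives = [[d, 0.0, d], [0.0, d, 0.0]]    # two complex alternatives, three elements each
priors = [0.5, 0.5]
payoffs = [[1.0, -1.0], [-1.0, 1.0]]

post = posterior_over_complex([1.1, -0.2, 0.9], alternatives, priors)
decision, values = choose(post, payoffs, value_of_waiting=0.3)
print(decision, values)

ambiguous = posterior_over_complex([0.5, 0.5, 0.5], alternatives, priors)
print(choose(ambiguous, payoffs, value_of_waiting=0.3)[0])
```

The first sequence favors A1 strongly enough (a posteriori probability about 0.85) that responding beats waiting. The ambiguous sequence leaves both responses at zero expected value, so the sketch chooses to collect further information.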

APPENDIX I

OBSERVER EFFICIENCY

Table I is self-explanatory except for the columns headed $\sqrt{2E/N_0}$. These columns indicate a mathematical upper bound for expected performance. A perfect detector operating on the output would be expected to achieve this level of performance, and any detector which achieves it is using all of the available information. This quantity therefore represents a standard for evaluating an operator, a receiver, or a combination of an operator and a receiver. The significance of these columns is thus worth some discussion.

$$d'_{\text{opt}} = \sqrt{d} = \sqrt{\frac{2E}{N_0}} \qquad (A.1)$$

Here $d'_{\text{opt}}$ is the optimum d' and d is the detection index (Ref. 1). E is the signal energy, determined by the signal voltage V and the pulse duration t, and $N_0$ is the noise power per unit bandwidth. The right-hand member of the equation is the one used to calculate the columns headed $\sqrt{2E/N_0}$, with the subscript of E referring to the signal subscript.

The columns headed d' give the value of $\sqrt{2E/N_0}$ that a perfect device placed on the output of the system would require to match the observed level of performance. The observed d' is thus always equal to or less than the calculated value of $\sqrt{2E/N_0}$. The ratio of the inferred value to the calculated value can be used as an index of the efficiency of the operating device. The calculations of $\sqrt{2E/N_0}$ are based on measurements made of the output of the earphones used in the experiments. It has thus been possible to calculate efficiency ratings for the observers' performance for the different durations and

the different frequencies studied in the experiments. These are listed in Table II.

TABLE II
OBSERVER EFFICIENCY AS A FUNCTION OF PULSE DURATION AND CENTER FREQUENCY

                       Observer 1                      Observer 2
                  Pulse duration (sec)            Pulse duration (sec)
Center freq.
(cps)           .05    .10    .50   1.00        .05    .10    .50   1.00
 700           .503   .536   .383   .284       .596   .762   .580   .456
 800           .367   .435   .341   .306       .646   .866   .566   .516
 900           .274   .385   .358   .218       .545   .601   .566   .389
1000           .364   .523   .443   .296       .585   .637   .488   .438
1100           .378   .469   .322   .213       .554   .667   .495   .376
1200           .234   .310   .280   .271       .526   .591   .410   .376
1300           .376   .420   .361   .243       .526   .695   .493   .387

These tables are not intended to represent a complete study. They suggest a method of study for approaching most nearly the optimum use of signal energy in a system involving the human observer. Of the durations studied, both observers are most efficient at a duration of 0.1 second, and they tend to be more efficient at the lower frequencies. These studies involve one particular noise level, and the tables should be interpreted with this in mind. The discussion in this appendix is presented as a contribution to methodology rather than to content.
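Equation (A.1) and the efficiency index can be computed directly. The following is a minimal Python sketch under these assumptions:

- E/N0 is known as a plain power ratio. A decibel helper is included for convenience.
- The report's "ratio of the inferred value to the calculated value" is read literally, as observed d' divided by √(2E/N0). If efficiency is meant as a ratio of energies, the index would be squared.

The function names and example numbers are hypothetical, not values taken from Table I or Table II.

```python
import math


def ideal_d_prime(energy_to_noise):
    """Equation (A.1): d'_opt = sqrt(2E/N0), with E/N0 given as a plain ratio."""
    return math.sqrt(2.0 * energy_to_noise)


def efficiency_index(observed_d_prime, energy_to_noise):
    """Observed d' divided by the ideal sqrt(2E/N0); 1.0 means all available information is used."""
    return observed_d_prime / ideal_d_prime(energy_to_noise)


def db_to_ratio(db):
    """Convert a level in decibels (10 log10 E/N0) back to a power ratio."""
    return 10.0 ** (db / 10.0)


print(ideal_d_prime(8.0))          # 4.0
print(efficiency_index(2.0, 8.0))  # 0.5
```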

APPENDIX II

THURSTONE'S LAW OF COMPARATIVE JUDGMENT

In two papers, Thurstone (Refs. 10, 11) presents and develops the Law of Comparative Judgment. There are similarities between Thurstone's subject matter and that of this paper, and in particular between the form of Thurstone's equations and those of this paper. These justify a discussion of the content of this paper in terms of Thurstone's earlier work.

By the expression "comparative judgment," Thurstone describes the experimental design with which he is concerned. In an experiment of this type, the observer is presented first with a signal of frequency $f_1$ and then with a signal of frequency $f_2$. He is then asked to state whether $f_2$ is higher or lower than $f_1$. In another variation of this experiment, the observer is presented first with a signal of energy $E_1$ and then with a signal of energy $E_2$. He is then asked to state whether $E_2 > E_1$ or $E_2 < E_1$. For the case where either $E_1$ or $E_2$ is zero (noise alone), this is the two-choice, forced-choice experiment employed in determining the detection d' used in this paper. The definition of d' is

$$d' = \frac{M_{SN} - M_N}{\sigma}$$

where, in Thurstone's language, $M_{SN}$ is the modal discriminal process for signal plus noise and $M_N$ is the modal discriminal process for noise alone. σ is a measure of the discriminal dispersions $\sigma_N$ and $\sigma_{SN}$ ($\sigma_N = \sigma_{SN}$). The observations x are assumed in this paper to be a continuous variable corresponding to Thurstone's discriminal processes.

The analysis of forced-choice experiments presented in earlier papers (Refs. 4, 5, 6, 7, 8, 9) can be expressed in terms of comparative judgments. Suppose that a signal of energy E > 0 is presented in one of four intervals in time, while in the other three intervals signals of energy E = 0 are presented,

and that the observer is asked to state which interval contained the signal E > 0. Observations x are made in each of the four intervals, and three comparative judgments are required. First, a comparative judgment involving the first and second intervals is made. Whichever is judged greater is compared to the third interval, and the greater of this comparison is then compared to the fourth interval. The greater of the last comparison is then judged to be the signal E > 0. This is equivalent to the analysis presented in the previous papers.

The main interest in the theory developed in this series of papers is not in comparative judgments, however; it is in detection and recognition. These subjects are not discussed by Thurstone. Had he recognized the existence of a noise distribution such as the one postulated in the theories of detection and recognition, it seems likely that he would have developed essentially the same theory as that developed in the current set of papers, only thirty years earlier. It is essentially the noise assumption, along with the denial of the fixed threshold, which has led to this development.

The detection and recognition theories developed in these papers involve experiments in which the observer has a single observation, x, and is asked to state which of a set of alternatives existed to lead to the observation x. This is not a comparative judgment in the Thurstone or forced-choice sense. Analysis of this type of experiment led to the interest in a priori probabilities and risk functions, variables which are not immediately obvious in Thurstone's discussion of the law of comparative judgment. Thurstone has assumed that, of two stimuli ($S_1$ and $S_2$), the a priori probabilities [$P(S_1 > S_2)$ and $P(S_2 > S_1)$] are equal, and that type I and type II errors are equally costly.
Due reflection and experimentation should show that these variables (a priori probabilities and risk functions) also play a part in comparative judgments. The criterion for judgment

$S_1 > S_2$ may not contain all values $S_1 - S_2 > 0$, or only values $S_1 - S_2 > 0$, but rather values $S_1 - S_2 > \alpha$, where α is some function of β as defined in Section 1.2.

One further point requires discussion. Thurstone considers a correlation factor which he considers it safe to assume equal to zero. In view of the noise assumption, this assumption is satisfactory for comparative judgments. If signals of two frequencies are presented successively in time, the correlation is likely to be zero because of the autocorrelation function of the noise. However, suppose a single observation is made and the choice is between two frequencies close together. Then the presence of a signal at one frequency influences the observation of the components of the other frequency, and in these experiments the correlation factor must be taken into account. It is, in fact, this correlation factor which determines the "distance" Thurstone discusses. The equation of Section 2.1 relating these quantities is a general equation for Thurstone's "distance," given the distance of the signals from the noise and the correlation between the detection axes. It is not the same as Thurstone's general equation, which looks very much like Equation 9.
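Appendix II describes the four-interval forced-choice experiment as a chain of three comparative judgments. The following is a minimal Python sketch of that chain. It assumes, for illustration only, that the noise in each interval is Gaussian with unit variance and that the signal shifts the mean by d'. The function names are hypothetical. The sketch shows that three successive pairwise comparisons select the same interval as simply taking the largest observation, and it estimates the resulting proportion correct by simulation.

```python
import random


def sequential_comparisons(observations):
    """Compare interval 1 with 2, the greater with 3, the greater with 4, and so on.

    Returns the index of the interval finally judged to contain the signal E > 0.
    """
    best = 0
    for k in range(1, len(observations)):
        if observations[k] > observations[best]:
            best = k
    return best


def simulate_forced_choice(d_prime, intervals=4, trials=20000, seed=1):
    """Fraction of trials on which the chain of comparisons picks the signal interval."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        signal_at = rng.randrange(intervals)
        obs = [rng.gauss(d_prime if k == signal_at else 0.0, 1.0)
               for k in range(intervals)]
        if sequential_comparisons(obs) == signal_at:
            correct += 1
    return correct / trials


obs = [0.3, 1.7, -0.4, 1.2]
assert sequential_comparisons(obs) == max(range(len(obs)), key=obs.__getitem__)
print(simulate_forced_choice(0.0), simulate_forced_choice(1.5))
```

With d' = 0 the chain of comparisons is right about one time in four, as chance requires for four intervals. With a positive d' it does better.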

REFERENCES

1. Peterson, W. W., and Birdsall, T. G., "The Theory of Signal Detectability," Technical Report No. 13, Electronic Defense Group, University of Michigan, June 1953.
2. Roberts, G. A., "An Automatic Random Programmer," Electronic Defense Group, University of Michigan, 1955.
3. Shannon, C. E., and Weaver, W., "The Mathematical Theory of Communication," University of Illinois Press, Urbana, Illinois, 1949.
4. Swets, J. A., Tanner, W. P., Jr., and Birdsall, T. G., "The Evidence for a Decision-Making Theory of Visual Detection," Technical Report No. 40, Electronic Defense Group, University of Michigan, Ann Arbor, Michigan, April 1955.
5. Tanner, W. P., Jr., and Norman, R. Z., "The Human Use of Information. II: Signal Detection for the Case of an Unknown Signal Parameter," Transactions of the I.R.E., Professional Group on Information Theory, PGIT-4, September 1954.
6. Tanner, W. P., Jr., and Swets, J. A., "A New Theory of Visual Detection," Technical Report No. 18, Electronic Defense Group, University of Michigan, Ann Arbor, August 1953.
7. Tanner, W. P., Jr., and Swets, J. A., "The Human Use of Information. I: Signal Detection for the Case of the Signal Known Exactly," Transactions of the I.R.E., Professional Group on Information Theory, PGIT-4, September 1954.
8. Tanner, W. P., Jr., and Swets, J. A., "A Decision-Making Theory of Visual Detection," Psychological Review, 1954, 61, 6.
9. Tanner, W. P., Jr., Swets, J. A., and Green, D. M., "The General Properties of the Hearing Mechanism," Technical Report No. 30, Electronic Defense Group, University of Michigan, Ann Arbor, 1955 (in preparation).
10. Thurstone, L. L., "Psychophysical Analysis," American Journal of Psychology, 1927, 38, 368-389.
11. Thurstone, L. L., "A Law of Comparative Judgment," Psychological Review, 1927, 34, 273-286.
12. Woodward, P. M., and Davies, I. L., "Information Theory and Inverse Probability in Telecommunication," Proc. I.E.E. (London), v. 99, p. 37, March 1952.

DISTRIBUTION LIST

1 Copy: Director, Electronic Research Laboratory, Stanford University, Stanford, California. Attn: Dean Fred Terman
1 Copy: Commanding General, Army Electronic Proving Ground, Fort Huachuca, Arizona. Attn: Director, Electronic Warfare Department
1 Copy: Chief, Research and Development Division, Office of the Chief Signal Officer, Department of the Army, Washington 25, D. C. Attn: SIGEB
1 Copy: Chief, Plans and Operations Division, Office of the Chief Signal Officer, Washington 25, D. C. Attn: SIGEW
1 Copy: Countermeasures Laboratory, Gilfillan Brothers, Inc., 1815 Venice Blvd., Los Angeles 6, California
1 Copy: Commanding Officer, White Sands Signal Corps Agency, White Sands Proving Ground, Las Cruces, New Mexico. Attn: SIGWS-CM
1 Copy: Commanding Officer, Signal Corps Electronics Research Unit, 9560th TSU, Mountain View, California
75 Copies: Transportation Officer, SCEL, Evans Signal Laboratory, Building No. 42, Belmar, New Jersey. FOR - SCEL Accountable Officer. Inspect at Destination. File No. 22824-PH-54-91(1701)
1 Copy: H. W. Welch, Jr., Engineering Research Institute, University of Michigan, Ann Arbor, Michigan

1 Copy: Document Room, Willow Run Research Center, University of Michigan, Willow Run, Michigan
11 Copies: Electronic Defense Group Project File, University of Michigan, Ann Arbor, Michigan
1 Copy: Engineering Research Institute Project File, University of Michigan, Ann Arbor, Michigan