THE UNIVERSITY OF MICHIGAN RESEARCH INSTITUTE
ANN ARBOR

THE CONCEPT OF THE IDEAL OBSERVER IN PSYCHOPHYSICS

Technical Report No. 96

Electronic Defense Group
Department of Electrical Engineering

By: W. P. Tanner, Jr.
    T. G. Birdsall
    F. R. Clarke

Approved by: A. B. Macnee

AFCRC TR 59-54
Contract No. AF 19(604)-2277

OPERATIONAL APPLICATIONS LABORATORY
AIR FORCE CAMBRIDGE RESEARCH CENTER
AIR RESEARCH AND DEVELOPMENT COMMAND

April 1960

TABLE OF CONTENTS
                                                                    Page
LIST OF ILLUSTRATIONS                                                iii
ABSTRACT                                                              iv
1. INTRODUCTION                                                        1
2. GENERAL THEORY OF THE IDEAL OBSERVER                                5
3. THE CASE OF A SIGNAL KNOWN EXACTLY IN BAND-LIMITED WHITE
   GAUSSIAN NOISE                                                     13
4. THE MODEL OF THE IDEAL OBSERVER                                    19
5. BASIC DEFINITIONS OF d' AND l                                      24
6. SUMMARY                                                            34
REFERENCES                                                            41

LIST OF ILLUSTRATIONS
Figure                                                              Page
 1. Decision Axis                                                     11
 2. Receiver Operating Characteristic (ROC) for the Ideal
    Receiver when 2E/N_0 = 1.00                                       11
 3. The Ideal Observer                                                21
 4. Basic Psychophysical Experiment in Block Diagram Form             25
 5. Composite Block Diagram of Channels for Psychophysical
    Experiment                                                        25
 6. Individual Block Diagram of Channels for Psychophysical
    Experiment                                                        26
 7. Relation between d' and (2E/N_0)^(1/2)                            26
 8. Distribution of the Difference of Two Variables for the
    Two-alternative Forced-choice Experiment                          26
 9. Difference Signal for the Two-alternative Forced-choice
    Experiment                                                        26
10. Illustration of Recognition Space for Definition of               37
11. Recognition Space for Large Signals                               37

ABSTRACT

This report discusses the manner in which the concept of the ideal observer serves as a tool for the development of a descriptive model of human performance in the detection and recognition of signals.

THE CONCEPT OF THE IDEAL OBSERVER IN PSYCHOPHYSICS

1. INTRODUCTION

In this section, some of the factors which must be considered by an observer in arriving at a decision will be discussed. The concept of an ideal observer will be introduced, and this model will be contrasted with the more typical descriptive type of model. Then the model of the ideal observer will be considered as a tool in arriving at descriptive models of sensory systems. Support for many of the statements made in this introduction will be given in the body of the paper.

Consider any situation in which there is a noisy input to some observer. Following this input the observer must make some decision about the physical event which gave rise to the input. In making his decision, the observer must consider two types of information: the actual observed input and the decision function to be maximized. This latter factor is illustrated by the fact that for some particular input, the observer may be led to one decision if he is attempting to maximize the percentage of his responses which are correct, and be led to a different decision if he is attempting to maximize the expected value of his decisions. Thus, in studying any observer, one must take into account not only the inputs which are being observed, but also all external criteria affecting the observer's responses.

The concept of the ideal observer recognizes the fact that there are statistical properties of the environment which limit performance in situations similar to those involved in psychophysical experiments. Though the concept of the ideal observer is a general

one, the actual mathematical specification of the ideal observer is specific to the particular situation under study. For any particular situation, it is necessary to specify the decision function which the observer is attempting to maximize, the a priori probabilities of the various possible signal inputs, and all of the physical parameters of the signals and of the noise in the channel. Given such specification, an ideal observer may be defined for that situation. The performance of the ideal observer is then obtained by mathematical computations. On the average, no physical observer, human or man-made, may equal or surpass the performance of the ideal observer.

It is important to note that the model of the ideal observer is a concept developed without reference to data. In this way it differs from most models in psychology, which are aimed toward describing relations already observed. In dealing with descriptive models, one should keep in mind that whenever there is only a finite set of data upon which to base a description, then there are an infinite number of descriptive models capable of handling these data. Additional data reduce the class of models which yield satisfactory descriptions, but as long as the data are finite, the remaining class of models is always infinite. At best, a descriptive model can only be a mathematical statement of the relations between observable events. The mathematical parameters may have intuitive meaning permitting the model to be extended to predict previously unobserved relations between events. However, the fact that there are an infinite number of models capable of accounting for the observed data should make it clear that the mechanisms assumed by some particular descriptive model are not necessarily the mechanisms of the entity under study.
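The claim that a finite data set always admits infinitely many adequate descriptive models can be given a small numerical illustration. The sketch below is added here for illustration and is not part of the original report; the data values are hypothetical. It fits one polynomial exactly through four observations, then constructs further polynomials, one for each choice of the constant c, that reproduce the same four observations exactly.

```python
import numpy as np

# Four observed data points (hypothetical values).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])

# A degree-3 polynomial passes exactly through the 4 points.
p = np.polyfit(x, y, 3)

# Adding any multiple of a polynomial that vanishes at every observed x
# yields another model fitting the same data exactly, so the class of
# exact descriptive models is infinite.
vanishing = np.poly(x)                        # (t-0)(t-1)(t-2)(t-3)
for c in (0.5, -2.0, 7.0):
    q = np.polyadd(np.polymul([c], vanishing), p)
    assert np.allclose(np.polyval(q, x), y)   # each model reproduces the data
```

Additional observations shrink the class of surviving models, but, as noted above, any finite data set leaves that class infinite.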

Although the model of the ideal observer is not a descriptive model, it is of considerable importance to theorists who desire to construct descriptive models. The precise mathematical definition of an ideal observer for some given situation specifies all of the variables which are of importance in leading to a decision in the situation under study. Also, it completely describes the manner in which these variables affect the performance of the ideal observer. Several advantages are to be gained from this knowledge.

Since the model of the ideal observer is stated in abstract form, it is a useful framework for the stating of a wide class of problems involving sensory systems. Each specific problem statement incorporates the special features of the particular problem into the general structure. The model thus serves as a theoretical framework for organizing an experimental program investigating the behavior of sensory systems, and serves as a guide toward selecting the relevant measures necessary to describe their performance.

In any specific situation the model of the ideal observer not only specifies an upper and a lower bound on the possible performance of the human observer, but specifies all of the relevant variables which should be used in arriving at a decision and gives the manner in which they should be utilized. Insofar as the human observer fails to utilize all of these variables to their full extent, or attempts to use irrelevant variables, his performance must fall below that of the ideal observer. It is possible to study the degradation in the performance of the ideal observer as uncertainty concerning the values of the various parameters is introduced. Experimental study of the human observer as the same uncertainties are introduced, and comparison of his performance with that of the ideal, may in many instances lead us to specify exactly which aspects of the situation are being utilized by the human observer. This is one example (which will be expanded in the text) of the manner in which the concept of the ideal observer serves as a tool in the study of the real observer.

From experimental evidence it is known that the performance of the human observer in psychophysical experiments is markedly affected by variables not typically considered within the domain of the sensory system under study. Also, comparisons of human performance with that of the ideal observer suggest that human beings perform a large number of tasks with surprising efficiency. This suggests that the parameters of the sensory mechanism are adjusted in a nearly optimal way for specific observation situations. This leads to the view that human sensory mechanisms may profitably be regarded as subsystems of a larger intelligent system. The form that the sensory mechanism assumes is very likely dependent upon the particular experiment being performed and the way in which this larger intelligent system views the experiment. In order to understand the nature of the sensory system it will be necessary to understand the way in which these parameters might be fixed and the range of values they can assume. Precise specification of the parameters of the sensory system and the manner in which they are varied to meet the needs of the larger system would constitute one possible model of the sensory system.

2. GENERAL THEORY OF THE IDEAL OBSERVER

The concept of the ideal observer was first advanced by Siegert in Lawson and Uhlenbeck (Ref. 1). The concept, as Siegert advanced it, was much simpler than that which exists today, and methods of application to psychophysics were not worked out in any great detail. The theory was given greater generality with the publication of two papers in 1954. One, by Peterson, Birdsall and Fox (Ref. 2), forms the basis for the development in this section. The second paper, which was written by Van Meter and Middleton (Ref. 3), contributed an independent development.

It is instructive to examine a particular instance of an ideal observer. This specific case will deal with the detection of a known signal (some particular waveform) in a background of band-limited white Gaussian noise. Based upon the information (observed waveform) received during an observation interval of time T, it is the task of the receiver to determine whether the observed input arose from noise alone or from signal-plus-noise.

The approach followed in studying the general problem can be divided into three steps. The first is the identification of the relevant information leading to a decision based solely upon the a priori knowledge of the probabilities of the various alternatives. The next stage is the identification of the relevant information, given not only the a priori knowledge but also the knowledge contained in the observation interval. The final step is the determination of the difference between the a posteriori information and the a priori information. This difference specifies the relevant information carried in the observation interval.
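As a concrete sketch of this detection task (purely illustrative; the bandwidth, duration, noise power, and signal waveform below are arbitrary assumptions, not values from the report), each observation interval yields 2WT sample points containing either noise alone or signal-plus-noise:

```python
import numpy as np

rng = np.random.default_rng(0)

W, T, N = 100.0, 0.1, 1.0            # bandwidth (Hz), duration (s), noise power
n_pts = int(2 * W * T)               # 2WT sample points per observation interval
t = np.arange(n_pts) / (2 * W)
s = 0.5 * np.sin(2 * np.pi * 50.0 * t)   # a signal known exactly (assumed form)

def observation(signal_present: bool) -> np.ndarray:
    """One observation interval: white Gaussian noise of power N,
    with the known signal added when it is present."""
    noise = rng.normal(0.0, np.sqrt(N), n_pts)
    return s + noise if signal_present else noise
```

The receiver's task, developed below, is to decide from one such sample vector which of the two hypotheses produced it.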

The case in which the decision must be based upon a priori information alone can be looked upon as a game of chance. There are two possible strategies here: to say that the signal is present in the background of noise, or to say that the noise alone is present. As one is here considering an observer designed to maximize the expected value of his responses, one should be concerned with E_A(V), the expected value of a strategy of saying that a signal is present, and with E_CA(V), the expected value of a strategy of saying that noise alone exists. (The symbol A denotes the collection of points making up the criterion for saying that the signal exists, while CA is the collection of points making up the criterion for saying that noise alone exists; it is the complement of A.) These expected values are given by the equations:

    E_A(V) = P(SN) V_SN.A - P(N) K_N.A                               (1)

    E_CA(V) = P(N) V_N.CA - P(SN) K_SN.CA                            (2)

where P(SN) is the a priori probability that the signal exists, P(N) = 1 - P(SN) is the a priori probability that noise alone exists, V_SN.A is the value of saying that the signal exists when the signal is present, K_N.A is the cost of saying the signal exists when noise alone is present, and V_N.CA and K_SN.CA are similarly defined. For those games of chance for which the value of Eq. (1) is greater than the value of Eq. (2), one should follow the strategy of always saying that the signal exists. If the value of Eq. (2) is greater than that of Eq. (1), one would follow the strategy of always saying the noise alone exists. If the two

equations are equal, the choice of strategies is irrelevant. Thus, the condition for saying that signal-plus-noise exists is given by Eq. (3).

    P(SN) V_SN.A - P(N) K_N.A > P(N) V_N.CA - P(SN) K_SN.CA          (3)

By manipulation one obtains

    P(SN)/P(N) > (V_N.CA + K_N.A)/(V_SN.A + K_SN.CA)                 (4)

which describes the condition for saying that signal-plus-noise exists when the decision must be based solely on a priori information.

In the case in which the decision is based on observation, we will designate the input to the receiver as x. We may then define P_x(SN) as the probability that the signal exists, given the observation x, and P_x(N) as the probability that noise alone exists, given the observation x. In making the decision, these conditional probabilities now play the same role that the a priori probabilities did in the previous case. Thus, the condition for saying that signal-plus-noise exists when the decision is based on observation is given by the inequality

    P_x(SN)/P_x(N) > (V_N.CA + K_N.A)/(V_SN.A + K_SN.CA)             (5)

If one assumes that x is a discrete variable and has associated with it a probability, then the a posteriori probabilities of the ratio in the left-hand member of the inequality, Eq. (5), are defined by Bayes' theorem as follows

    P_x(SN) = P(SN) P_SN(x)/P(x)                                     (6)

    P_x(N) = P(N) P_N(x)/P(x)                                        (7)

The ratio of a posteriori probabilities appearing in the left-hand member of the inequality of Eq. (5) can now be written as

    P_x(SN)/P_x(N) = [P(SN)/P(N)] [P_SN(x)/P_N(x)]                   (8)

Examination of this equation will indicate that the left-hand member is a ratio of a posteriori probabilities, or an indication of an information state of the receiver after the observation task. The right-hand member of the equation is made up of two ratios. The first of these is the ratio of the a priori probabilities and can be looked upon as the information state of the receiver prior to the received input. The second ratio is the likelihood ratio, that is, the ratio of the likelihood of the input x given signal-plus-noise to the likelihood of the input x given noise alone. The likelihood ratio represents the change in the state of the receiver occurring as a result of the input x. The inequality of Eq. (5) can now be rewritten in the following fashion:

    P_SN(x)/P_N(x) > [P(N)/P(SN)] (V_N.CA + K_N.A)/(V_SN.A + K_SN.CA)    (9)

Thus the decision variable for the ideal receiver is likelihood ratio. When the likelihood ratio is greater than some value which is defined by the right-hand member of Eq. (9), the ideal receiver designed to maximize expected value must state that a signal is present in the background of noise. If the likelihood ratio is less than this value, then this ideal receiver must state that noise alone is present.
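The decision rule of Eq. (9) is easy to state in code. The sketch below is a minimal illustration, assuming hypothetical values for the a priori probabilities and the payoffs; it computes the criterion value (the right-hand member of Eq. (9)) and compares a likelihood ratio against it.

```python
# Hypothetical a priori probabilities and payoffs (not values from the report).
P_SN = 0.5                      # a priori probability of signal-plus-noise
P_N = 1.0 - P_SN                # a priori probability of noise alone
V_SN_A, K_N_A = 1.0, 1.0        # value of a hit, cost of a false alarm
V_N_CA, K_SN_CA = 1.0, 1.0      # value of a correct rejection, cost of a miss

# Right-hand member of Eq. (9): the criterion value.
criterion = (P_N / P_SN) * (V_N_CA + K_N_A) / (V_SN_A + K_SN_CA)

def decide(likelihood_ratio: float) -> str:
    """Ideal receiver maximizing expected value (Eq. 9)."""
    return "signal-plus-noise" if likelihood_ratio >= criterion else "noise alone"

print(criterion)       # 1.0 for these symmetric payoffs and equal priors
print(decide(1.3))     # signal-plus-noise
```

Changing the priors or the payoffs moves the criterion, but the decision variable remains the likelihood ratio itself.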

In the above development, x is considered to be a discrete variable. If x is continuous, one need only write the likelihood ratio as a ratio of probability densities, or likelihoods, and a similar development can be carried through. In this case,

    l(x) = f_SN(x)/f_N(x)

Thus far, we have made no use of the fact that this example deals with the detection of a known signal in a background of band-limited white Gaussian noise. The above development is far more general than implied by any such limitation. However, when we come to the question of how l(x) is distributed, it is necessary to specify the situation in complete detail. It will be shown in Section 3 that in the case where the ideal observer knows all of the parameters of a signal which is presented in a background of band-limited white Gaussian noise, a simple monotone transform of l(x) is normally distributed.

There are four possible outcomes to any given observation interval. The signal may either be present or not present in the background of noise, and the observer may in either instance state that it is or is not present. As the observer always makes a response, his behavior is described by the following probabilities:

    P_SN(A) + P_SN(CA) = 1.0                                         (10)

    P_N(A) + P_N(CA) = 1.0                                           (11)

Thus the observer's performance may be specified by P_SN(A) and P_N(A). A plot of P_SN(A) against P_N(A) is termed a receiver operating

characteristic (ROC curve). Let us now examine the generation of an ROC curve for the ideal observer. In the case where some monotone transformation of l(x) is normally distributed, we may represent the two hypotheses, signal-plus-noise and noise alone, as normal distributions each with unit variance and a mean difference of some value, say, d'. Thus, any value of

    [P(N)/P(SN)] (V_N.CA + K_N.A)/(V_SN.A + K_SN.CA)

may be transformed appropriately and represented by a single point, B, on the axis of Fig. 1. All points to the right of B represent the criterion A, while all to the left represent CA. Thus, P_SN(A) is given by the area which lies to the right of B under the distribution for signal-plus-noise, while P_N(A) is given by the area which lies to the right of B under the noise distribution. The ROC curve illustrated in Fig. 2 is generated by allowing B to vary from negative infinity to positive infinity.

Thus far we have been dealing solely with an ideal observer designed for a particular situation. Nonetheless, as pointed out in the introduction, this gives us some information about the human observer. For the particular signal energy leading to the ROC curve of Fig. 2, we know that the performance of the human observer cannot result in a datum point lying above this curve. Nor can the human observer perform in such a manner as to fall below the mirror image (about the diagonal) of this curve. Thus, we have specified upper and lower bounds on the performance of the human observer. Furthermore, Eq. (9) specifies all of the variables which are relevant to the performance of an ideal observer. This specification is implicit,

Figure 1. Decision Axis.

Figure 2. Receiver Operating Characteristic (ROC) for the Ideal Receiver when 2E/N_0 = 1.00.

since all of the variables affecting the likelihood ratio, P_SN(x)/P_N(x), are not detailed in this equation. These variables will be made explicit in Section 3. Knowledge of all of the relevant information-bearing variables thus may lead us to design experiments to determine

which of these variables are not being utilized, or are being utilized imperfectly, by the human observer.

It may be well at this point to add one summarizing comment. The development above is based upon probability theory. It states the results that one might expect of behavior based on decision rules in cases where there are a large number of events and the statistical properties of the environment are stable. The development has been carried on based upon the criterion which optimizes the expected value of the outcome. This is of course an arbitrary choice. As far as any real-life situation is concerned it may not be the proper one. It was chosen here because the experiments which will be based upon this development will be designed in a way which makes the optimization of the expected value a logical one. However, there may be many reasons why, in any particular case, one might want to optimize some other outcome.

Peterson, Birdsall, and Fox carried their development through merely for optimizing the number P_SN(A) - w P_N(A), where w is an abstract weighting factor. This is called the weighted combination criterion. They found that in general the decision rule which optimizes this quantity is the one which leads to a response of signal-plus-noise whenever the likelihood ratio is equal to or greater than the number w. Otherwise the response should be noise alone. They then considered several quantities upon which to base the optima. One of these was the expected value. A second was the optimization of correct decisions. A third was the optimization of correct decisions given a fixed false-alarm rate. A fourth was the optimization of information. They then showed that each of these criteria could be expressed as a weighted

combination criterion, and they specified the form that w should take for each of the cases of optima. These are discussed by Birdsall in his chapter in Quastler's book, Information Theory in Psychology. It should again be pointed out that the use of the model developed by Peterson, Birdsall and Fox is not intended to represent a theory of how the human being operates; it is, rather, used as a tool to describe experiments in a way which will permit a study of how the human being operates.

3. THE CASE OF A SIGNAL KNOWN EXACTLY IN BAND-LIMITED WHITE GAUSSIAN NOISE

The theory of signal detectability is actually a theory of the capacity of signals in noise to lead to detection. The ratio 2E/N_0 has been used to state this capacity in other papers. This ratio is the result of the development of Peterson, Birdsall, and Fox stating the difference between two statistical hypotheses when a signal is known exactly and imbedded in a white Gaussian noise. It is assumed that both the signal and the noise are Fourier-series band-limited and of finite duration. That is, on the observation interval, both the signal waveform and the noise waveform can be described by a Fourier series with a finite number of terms. These assumptions make it possible to describe any particular voltage waveform containing either noise alone or signal-plus-noise by a finite set of numbers, that is, by a single point in a finite-dimensional space. Description of a continuous waveform by a single point in space simplifies the statistical analysis of the detection problem, for it is relatively easy to specify the distributions of these points conditional upon various hypotheses. Peterson,

Birdsall, and Fox also consider a number of cases in which the signal is known statistically. More recently, Birdsall has developed a model which can be roughly referred to as a radius model. This model treats the case of a signal whose statistical uncertainty is distributed over N orthogonal dimensions. In the development in this section, only the case of the signal known exactly will be developed in detail. This analysis is made so that the reader may have a feel for the type of development which leads to the statement that 2E/N_0 is a ratio which represents the capacity of a signal to lead to detection.

Again the problem is one of a fixed observation interval from zero to T. A signal known exactly is one which, if it exists, is precisely placed within that observation interval. Its waveform can be said to be a voltage function of time, this voltage function being known exactly at every position in the observation interval. This voltage function is referred to as s(t). The receiver input will also be considered a voltage function of time, x(t). This receiver input may or may not contain the signal. It always contains the noise.

The development below will depend upon the application of a sampling theorem which says essentially that a Fourier-series band-limited waveform extending over a finite interval 0 to T can be described entirely by 2WT equally spaced independent points in time. The important conclusion from the use of this theorem is the following relation.

    (T/2WT) Σ_{i=1}^{2WT} s_i²  =  ∫_0^T [s(t)]² dt  =  E(s)        (11)

Here s_i represents the value of s(t) at the i-th sampling point. This states that the signal energy is T times the average value of the signal voltage squared at the sampling points, where i goes from 1 to 2WT. No attempt will be made here to justify the use of the sampling theorem. It will merely be accepted as satisfactory.

The development of the case of a signal known exactly will follow these steps. First it is observed that the decision rule is valid for any variable which is a monotone transformation of likelihood ratio. That is to say, if the decision rule is that the likelihood ratio should be equal to or greater than β in order for the ideal observer to accept the hypothesis that the signal exists, then this rule is exactly equivalent to a rule which states that the logarithm of the likelihood ratio should be equal to or greater than the logarithm of β in order for the ideal observer to accept the hypothesis that the signal exists. Having invoked this equivalence, the first step is to study the distribution of the likelihood ratio at the i-th point for the case that noise alone exists. It is then observed that the likelihood ratio of any input waveform is the product of the likelihood ratios for each of the 2WT points when these points are independent and when in any single input the signal actually exists at each of the points or does not exist at each of the points. When the logarithmic transformation is made, this says that the logarithm of the likelihood ratio of the input waveform is the sum of the logarithms of the likelihood ratios at each of the 2WT points. The distributions giving a sum

of these logarithms of likelihood ratios will then be studied for the case of signal-plus-noise and for the case of noise alone. It is found that the ratio (2E/N_0)^(1/2) expresses the difference between these two statistical hypotheses divided by their standard deviations. The noise variance and the signal-plus-noise variance are equal.

Let x_i be the value of the voltage of the input waveform at the i-th point. Since the case being studied is the case of ideal white Gaussian noise, the expected value of this voltage when noise alone exists is zero. The variance of this value when noise alone exists is N, the noise power. The value x_i thus can be looked at as a random value drawn from a normal distribution with zero mean and variance N. The probability density of any particular value of x_i is given by the following equation.

    f_N(x_i) = (1/(2πN)^(1/2)) e^(-x_i²/2N)                          (12)

When signal-plus-noise exists the expected value of x_i is s_i. The variance is still N, the noise power. The probability density of any value of x_i when signal-plus-noise exists is given as

    f_SN(x_i) = (1/(2πN)^(1/2)) e^(-(x_i - s_i)²/2N)                 (13)

The likelihood ratio is the ratio of Eq. (13) to Eq. (12),

    l(x_i) = e^(-(x_i - s_i)²/2N) / e^(-x_i²/2N)                     (14)

The natural logarithm of the likelihood ratio is

    ln l(x_i) = X_i = (2 x_i s_i - s_i²)/2N                          (15)

It is the distribution of this logarithm, X_i, for the case of noise alone and for the case of signal-plus-noise which is relevant to the problem at hand. First, for the case of noise alone, the mean can be written as

    M_N(X_i) = (2 s_i M_N(x_i) - s_i²)/2N                            (16)

where M_N(X_i) is the mean of ln l(x_i), given that x(t) contains noise alone. Above it was pointed out that M_N(x_i), the mean of x_i for the case of noise alone, is zero. Therefore the mean of the logarithm of the likelihood ratio of x_i is -s_i²/2N. Invoking the sampling theorem and summing over the 2WT points, the mean of the logarithm of the likelihood ratio taken over the whole waveform is obtained.

    M_N(X) = -(1/2N) Σ_{i=1}^{2WT} s_i² = -E(s)/N_0                  (17)

Here E(s) is the energy in the signal and N_0 is N/W, the noise power per unit bandwidth. The next problem is to determine the variance of X_i when noise alone is present. The only random variable in Eq. (15) is x_i; thus

    σ²(X_i) = (s_i/N)² σ²(x_i) = s_i²/N                              (18)
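These per-point statistics can be checked by simulation. The sketch below is illustrative (the noise power and signal amplitude are arbitrary): it draws noise-alone samples, applies Eq. (15), and verifies that the sample mean and variance approach -s_i²/2N and s_i²/N, as given by Eqs. (16) and (18).

```python
import numpy as np

rng = np.random.default_rng(1)
N, s_i = 1.0, 0.8                    # noise power and signal value at one point

x_i = rng.normal(0.0, np.sqrt(N), 200_000)   # noise-alone samples of x_i
X_i = (2 * x_i * s_i - s_i**2) / (2 * N)     # Eq. (15)

assert abs(X_i.mean() - (-s_i**2 / (2 * N))) < 0.01   # Eq. (16): mean -s_i^2/2N
assert abs(X_i.var() - s_i**2 / N) < 0.01             # Eq. (18): variance s_i^2/N
```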

As the 2WT sampling points are independent of one another, the variance of ln l(x) over the whole waveform is given by the sum of the variances at the 2WT points.

    σ²(X) = Σ_{i=1}^{2WT} s_i²/N = 2W E(s)/N = 2E(s)/N_0             (19)

Thus, the variance of ln l(x) when noise alone is presented is equal to twice the energy in the signal divided by the noise power per unit bandwidth. Although Eq. (17) and Eq. (19) describe distribution parameters for the case of noise alone, E(s) refers to the energy of the signal which was not presented. This is due to the fact that the distribution under study is that of ln l(x) and not simply the distribution of the input waveform. Since each of the x_i distributions is normal (because of Gaussian noise), the distribution of the sums is also normal. Therefore, the logarithm of the likelihood ratio of the input waveform, when noise alone is presented, has a normal distribution with a mean of -E/N_0 and a variance of 2E/N_0.

The next relevant question concerns the distribution of the value of Eq. (15) for the case where the signal exists. At the i-th point the expected value, or mean, of x_i is s_i. Thus the mean of the logarithm of the likelihood ratio at the i-th point is s_i²/2N for signal-plus-noise. This is the same value as that for noise alone except that it has the opposite sign. Since this is true at each of the points, the expected value of the sum of the logarithms of the likelihood ratio when the signal is present is the same as the sum of the expected values of the logarithm of the likelihood ratio when noise alone is present, except that it has the opposite sign. Thus, the logarithm of the likelihood

ratio has a distribution with a mean of +E/N_0 when signal-plus-noise exists. For the variance, again s_i is a constant and consequently contributes no variance. All the variance again comes from x_i; consequently, the variance of X when signal-plus-noise is present does not differ from the variance of X when noise alone is present. The distribution of the logarithm of the likelihood ratio of the input x(t) in the case of signal-plus-noise is thus a normal distribution with mean E/N_0 and variance 2E/N_0.

The difference between the mean of X for the case of signal-plus-noise and the mean of X for the case of noise alone is 2E/N_0. Each of these conditional distributions of X has as its standard deviation (2E/N_0)^(1/2). Therefore the difference in the means divided by the standard deviation is (2E/N_0)^(1/2).

Since the case of the signal known exactly is the one in which it is necessary to resolve the least uncertainty, this case will be that which specifies the capacity of a signal of energy E for leading to detection when imbedded in a band-limited white Gaussian noise of power N. One of the critical points in this development is that in order to calculate likelihood ratio it is necessary in this case to know the signal waveform exactly. It is further necessary in this case to know the parameters of the noise distribution. Given this knowledge it is then possible to calculate a likelihood ratio in a manner which permits a separation of the two statistical distributions as described above.

Thus, if the human observer were to perform as an ideal observer the following would be necessary: (1) He would have no source of internal noise; that is, the input signal would have to be transformed to a different type of energy by the end organ and transmitted by the nervous

system, all with perfect fidelity. (2) He would have perfect memory for the signal parameters and the noise parameters. At any time t within the observation interval he must know the exact amplitude of the signal waveform. (3) He would be capable of calculating likelihood ratio or some monotonic transformation of likelihood ratio. These are some of the requirements which must be met by the human observer if he is to perform as well as the ideal observer. Clearly, the human observer does not meet these specifications. However, it is possible to determine experimentally the manner and degree to which the human observer fails to meet these requirements, and thus obtain a better understanding of the human observer.

4. THE MODEL OF THE IDEAL OBSERVER

The discussion in the two preceding sections, that covering the general theory of signal detectability and that covering the case of the signal known exactly, is the basis for the block diagram in Fig. 3. This block diagram is a diagram of the ideal observer. Again it is pointed out that the diagram is not intended to summarize a model based upon psychophysical data. It is rather a description of a mathematical observer designed to perform in the best possible way in an experiment based upon carefully stated assumptions. These assumptions include the assumptions underlying the probability theory, that is, that the environment in which this observer is operating is one in which the statistical parameters are fixed and remain constant. It is also assumed that this observer is designed particularly to operate on the ensemble of signals being transmitted and to operate under the particular conditions of the channel. Being so designed, this observer can make use of the knowledge of the statistical properties both of the signal and of the noise.

[Figure 3. The Ideal Observer. Block diagram: a transducer, a likelihood ratio computer, and a decision computer, fed by a distribution-function computer (from the signal parameters and the parameters of the noise distribution) and by a criterion computer (from the probabilities of the alternatives and the utilities of the decisions).]

The three blocks enclosed in the dotted lines are actually the description of this observer. The transducer is included only to point out that at times we are dealing with systems in which there must be a transformation of the type of energy involved. The second box computes the likelihood ratio, and the third box matches this likelihood ratio to a point in a criterion space. It is assumed that the matching at this point specifies completely the response of the receiver. The development in Section 2 described the basis for decisions in terms of some external criterion. In this case the decision computer requires information from a criterion computer. This criterion computer in turn requires knowledge of the a priori probabilities

of the signals and a knowledge of the utilities of the various decisions. It should further be noted that the likelihood ratio computer, in order to perform its task, needs a knowledge of the statistical distributions associated with the signal and with the noise alone. For this reason a distribution computer has been included to feed this information to the likelihood ratio computer. In turn this distribution computer needs a knowledge of the signal parameters and of the noise parameters.

Both the distribution computer and the decision computer perform the function of molding the sensory system into a form which best serves the purpose of the larger intelligent system. These computers receive as instructions from the larger system estimates of the environment in the form of statements of the parameters of the signal and of the noise, as well as estimates of the a priori probabilities and the utilities of the possible decisions. In this way the sensory system is actually a subsystem of a larger system, and its form depends upon the commands of the larger system.

Study of the operations which take place in the components included in the dotted line will aid in understanding the operation of this ideal observer. Each of the components can be viewed as having an input-output relation, and this relation takes the form of a mapping. The input to the transducer is a point in a multi-dimensional space. In the case considered in the previous section this is a point in a space which has 2WT dimensions. Ideally the mapping from the input of the transducer to the output of the transducer is isomorphic; that is, for each point in the input there is a unique point in the output and vice versa. In other words, no information processing takes place in the transducer.
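The chain of computers just described can be sketched as a few small functions. This is only an illustrative skeleton of Fig. 3 in modern code, not anything given in the report: the signal, noise level, priors, and utilities are invented, and the expected-value criterion beta shown is one common form of the optimal cut-off, stated here as an assumption.

```python
import numpy as np

# Invented known signal and noise level for the signal-known-exactly case.
sigma = 1.0
s = np.array([0.6, -0.4, 0.8, 0.2])

def transducer(x):
    # Ideally isomorphic: a one-to-one mapping; here simply the identity.
    return x

def likelihood_ratio_computer(x):
    # Homomorphic: collapses the 2WT-dimensional input to one number l(x).
    ln_l = (x @ s - 0.5 * np.sum(s**2)) / sigma**2
    return np.exp(ln_l)

def criterion_computer(p_sn, v_hit, k_miss, v_cr, k_fa):
    # One common expected-value cut-off (an assumption, not the report's):
    # beta = [P(N)/P(SN)] * [(v_cr + k_fa) / (v_hit + k_miss)],
    # built from the a priori probabilities and the utilities of decisions.
    return ((1.0 - p_sn) / p_sn) * ((v_cr + k_fa) / (v_hit + k_miss))

def decision_computer(l_x, beta):
    # Homomorphic again: the continuum of l(x) values is mapped onto
    # M = 2 discrete response points.
    return "SN" if l_x >= beta else "N"

beta = criterion_computer(p_sn=0.5, v_hit=1.0, k_miss=1.0, v_cr=1.0, k_fa=1.0)
observation = transducer(s)          # a noiseless look at the signal itself
decision = decision_computer(likelihood_ratio_computer(observation), beta)
print(decision)                      # prints SN
```

With equal priors and equal utilities, beta = 1, and a noiseless observation of the signal yields l(x) > 1, so the sketch answers "SN".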

The input to the likelihood ratio computer is the output of the transducer, and this again is a point in the 2WT-dimensional space. The likelihood ratio computer transforms the information in the input to information which is relevant to the particular hypotheses of the test. In doing this it reduces the dimensionality of the space. Since in the case discussed in the previous section the likelihood ratio is a single number, this number can be mapped on a unidimensional space. Thus the mapping from the input of the likelihood ratio computer to the output of this computer is homomorphic; that is, several points at the input may be mapped into the same point at the output. If one wants to generalize the operation of this ideal receiver to the case where there are more than two possible alternatives, the output of the likelihood ratio computer has M-1 dimensions, where M is the number of possible alternatives. Three possible alternatives can be described by the output of the likelihood ratio computer in terms of two dimensions, four in terms of three, and so forth.

Again the input to the decision computer is the output of the likelihood ratio computer. The mapping again is homomorphic. All of the points in the likelihood ratio computer which match the points in any given criterion space are mapped into a single point at the output of the decision computer. In general the output of the decision computer is a space which consists of M discrete points, again where M is the number of possible alternatives. The general functions of the ideal receiver can be looked at as a processing of the receiver input. The first step is to get that information in the form of likelihood ratio, that is, the form

which contains the information in that input relevant to the particular question asked. The next step is to match that information to a point in a criterion space which determines the response, again in a way which is relevant to the particular situation. This specificity of the ideal receiver is exceedingly important to recognize when one uses the concept.

5. BASIC DEFINITIONS OF d' AND η *

The fundamental problem considered in the theory of signal detectability is illustrated in the block diagram of Fig. 4. A signal from an ensemble of signals is transmitted with a fixed probability over a channel in which noise is added. The receiver is permitted to observe during a fixed observation interval in time, at the end of which it must state whether the observation was one of noise alone or signal plus noise. The particular case upon which this discussion is based is that of an ensemble containing only one signal. That one is a signal known exactly; its voltage, point-for-point in time during the observation interval, is known to the receiver. It is not known whether the signal exists during the interval. The signal is transmitted over a channel in which band-limited white Gaussian noise is added. It was shown in Section 3 that the detectability of this signal in this channel can be described by the ratio (2E/No)^(1/2), in which E is the signal energy and No is the noise power per unit bandwidth.

* Taken with minor changes from Tanner and Birdsall, "Definitions of d' and η as Psychophysical Measures," JASA, Vol. 30, No. 10, pp. 922-928, October, 1958.
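The ratio (2E/No)^(1/2) fixes the entire receiver operating characteristic of Fig. 2. Assuming the normalized decision axis of Section 3 (noise alone normal with mean 0 and unit variance, signal plus noise with mean (2E/No)^(1/2) and unit variance), a few points of the ideal curve for 2E/No = 1.00 can be computed with the following sketch; the particular cut-off values swept are arbitrary.

```python
import math

def phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ideal_roc_point(two_E_over_No, cutoff):
    # ROC point for the ideal receiver on the normalized decision axis:
    # noise alone ~ N(0, 1), signal plus noise ~ N(d', 1), d' = (2E/No)^(1/2).
    d = math.sqrt(two_E_over_No)
    p_fa = 1.0 - phi(cutoff)          # false-alarm probability P_N(A)
    p_d = 1.0 - phi(cutoff - d)       # detection probability P_SN(A)
    return p_fa, p_d

# Sweep the cut-off for 2E/No = 1.00, the case plotted in Fig. 2.
for c in (-1.0, 0.0, 0.5, 1.0, 2.0):
    p_fa, p_d = ideal_roc_point(1.00, c)
    print("cutoff %+.1f:  P_N(A) = %.3f   P_SN(A) = %.3f" % (c, p_fa, p_d))
```

Every point computed lies above the chance diagonal, and the symmetric cut-off at d'/2 gives P_SN(A) = 1 - P_N(A), the point where the ROC crosses the negative diagonal.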

[Figure 4. Basic Psychophysical Experiment in Block-Diagram Form: a transmitter and a receiver, with band-limited white Gaussian noise added in the channel.]

[Figure 5. Composite Block Diagram of Channels for Psychophysical Experiment: switch 1 selects the transmitter (SSE or SSS), switch 2 selects the receiver (ideal or under study).]
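The experiment of Fig. 4 can be imitated numerically. The sketch below is an illustrative Monte Carlo under stated assumptions (the sample count, noise level, and 2E/No = 4.0 are invented): on each trial the receiver, here the ideal cross-correlator for the known signal, sees either noise alone or signal plus noise and answers "signal" (response A) or "no signal", from which P_N(A) and P_SN(A) are estimated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented discrete channel: n Gaussian noise samples per observation
# interval, known signal s scaled so that 2E/No = 4.0, i.e. d' = 2.
n, sigma, trials = 32, 1.0, 50_000
s = rng.normal(0.0, 1.0, n)
s *= np.sqrt(4.0 * sigma**2 / np.sum(s**2))

def says_signal(x):
    # Respond A when ln l(x) >= 0, a symmetric criterion (beta = 1).
    return x @ s >= 0.5 * np.sum(s**2)

P_N_A = says_signal(rng.normal(0.0, sigma, (trials, n))).mean()
P_SN_A = says_signal(rng.normal(0.0, sigma, (trials, n)) + s).mean()
print("P_N(A)  = %.3f" % P_N_A)      # false-alarm probability, near 0.159
print("P_SN(A) = %.3f" % P_SN_A)     # detection probability, near 0.841
```

For d' = 2 with this criterion the theory of Section 3 predicts P_N(A) = 1 - Φ(1) and P_SN(A) = Φ(1), which the simulated frequencies approach.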

[Fig. 6. Individual block diagram of channels for psychophysical experiment: (a) channel C11, signal known exactly, ideal receiver; (b) channel C12, signal known exactly, receiver under study; (c) channel C21, signal known statistically, ideal receiver; (d) channel C22, signal known statistically, receiver under study. Band-limited white Gaussian noise is added in each channel.]

[Fig. 7. Relation between d' and (2E/No)^(1/2).]

Consider now the experimental arrangement illustrated in Fig. 5. The particular experiment being performed is defined by the positions of switches 1 and 2. A channel includes the transmitter and the receiver. The block diagrams of Figs. 5 and 6 illustrate this use. In Fig. 5 there are two possible types of transmitters and two possible types of receivers. The positions of the two switches determine those which are actually in the channel. The switch positions are used as subscripts to specify the channel.

C11 is the channel in which the signal transmitted is one specified exactly and the receiver is an ideal receiver designed to operate on the particular signal specified (Fig. 6(a)).

C12 is the channel in which the signal transmitted is one specified exactly and the receiver is the one under study (Fig. 6(b)).

C21 is the channel in which the signal transmitted is one specified statistically and the receiver is an ideal receiver designed to operate on a particular ensemble of signals. The receiver is designed only with reference to a particular statistical ensemble (Fig. 6(c)).

C22 is the channel in which the signal transmitted is one specified statistically and the receiver is the one under study (Fig. 6(d)).

In each of these channels Fourier-series, band-limited white Gaussian noise is added. The following symbols are also defined:

η_r is the efficiency of the receiver in the channel C12. Since all other components in that channel are ideal, the difference between the performance of channels C12 and C11 is attributable entirely to the receiver.

η_t is the efficiency of the transmitter in channel C21, since all other components in that channel are ideal.

η_tr is the efficiency of channel C22.

Eij is the energy required of channel Cij to achieve a given level of performance. The subscript i refers to the position of the first switch and the subscript j to the position of the second switch.

First, an experiment is performed in which a signal known exactly of energy E12 is transmitted over the channel C12 with band-limited white Gaussian noise, of noise power per cycle No, added. The output is presented to the receiver under study, either a human observer or a "black box". The task of the receiver is to observe a specified waveform and to determine whether or not that waveform contains a signal. If the question is asked a large number of times, both when the signal is present and when the signal is not present, the data necessary for estimating the false alarm probability PN(A) and the detection probability PSN(A) are obtained.

The next experiment (a mathematical calculation) performed is the same, except that the ideal receiver is substituted for the receiver under study. In this experiment the energy of the signal is "attenuated" at the transmitter (No is the same as in the previous experiment) until the performance obtained in the previous experiment is matched. The energy (E11) leading to the matched performance

is then determined. The efficiency of the receiver is defined as

η_r = E11/E12,     (20)

and the measure d' is defined by the equation

(d')^2 = η_r(2E12/No) = 2E11/No.     (21)

Thus (d')^2 is that value of 2E/No which would be required to lead to the receiver's performance if an ideal receiver were employed in its place. This second experiment is not performed in the laboratory, since the problem has a mathematical solution. One takes the performance of the receiver under study and plots the point on the graph paper of Fig. 2(b). The coordinates of this point are read on the axis to the right and the axis above. The difference of the distances in standard units is (2E11/No)^(1/2). The value 2E12/No is measured physically. Since No is assumed constant:

η_r = (2E11/No)/(2E12/No) = E11/E12.     (22)

The measure d' is illustrated in Fig. 7. Figure 7(a) contains the probability distributions of Fig. 1. Figure 7(b) is a transformation of the variable to a new variable

X = [loge l(x) + (E/No)]/(2E/No)^(1/2).     (23)

This new variable is distributed normally. The mean of the probability density distribution for noise alone is 0 and the variance, unity. The

mean of the signal-plus-noise distribution is (2E/No)^(1/2) and the variance, unity. Figure 7(c) is the distribution of the variable upon which an ideal receiver would be operating if it were to match the performance of the receiver under study. The mean of the noise distribution is 0, and the mean of the signal-plus-noise distribution is d' = (η·2E/No)^(1/2). The variances are again unity.

Both of the measures d' and η are specific to a particular performance in terms of false-alarm rate and detection rate. If a different experiment were performed employing the same signal and noise conditions, and permitting a different false-alarm rate and, consequently, a different detection rate, both d' and η may assume different values. This would be the case if something happened in the receiver to upset the equal-variance condition for noise alone and signal-plus-noise. However, an examination of the specific cases studied in the theory of signal detectability suggests that there are a large number of cases in which this departure, while it exists, is not important. That is, the departure over the range likely to be investigated experimentally is not sufficient to lead to significant changes in d' and η.

Next, consider a mathematical experiment in which a signal specified statistically (SSS) is transmitted over the channel to an ideal receiver for that statistical ensemble. Again, the energy E21 is employed, and performance is measured. Furthermore, a second mathematical experiment is performed transmitting a signal specified exactly (SSE) to the ideal receiver, determining the E11 necessary

to give the same performance. This permits calculation of η_t, the efficiency of a transmitter with that statistical ensemble. Both of these experiments are mathematical calculations.

When η_t = η_r, each being referred to the case of the signal known exactly, it can be said that the amount of uncertainty represented by the statistical parameters of the transmitter ensemble SSS is reflected in the receiver when SSE is transmitted. This is the same thing as saying that knowledge which the receiver cannot use might as well not be available. If the receiver under study has no provisions built into it for the use of phase information, but all other knowledge can be utilized optimally, then the channel C12 is expected to lead to the same performance as the channel C21 when the signal is known except for phase. Actually, it is not the specific uncertainty which is matched when η_t = η_r. A signal known except for phase is one in which all phases are equally likely. Measurement is required in two orthogonal dimensions. If the uncertainty were one of frequency such that any frequency within a band were equally likely, and if this band is such that again measurement in two orthogonal dimensions is sufficient, then this leads to the same change in performance as does the uncertainty of phase. The parameter m is defined as the number of orthogonal dimensions over which the statistical uncertainty exists.

It is now possible to state a theorem leading to inference about the receiver based on the measurement of η. If η_r = η_t, then the receiver, through its inability to use knowledge contained in SSE, introduces a statistical uncertainty, m, equal to that of the transmitter, SSS. If the channel SSS to the receiver under study is

then established, and the condition η_tr = η_r = η_t holds, then the receiver operating on SSE has introduced exactly that uncertainty existing in SSS.

The first part of the theorem states that, if

η_r = E11/E12 = η_t = E11/E21,

then the receiver has introduced the same amount of uncertainty in the channel C12 as the transmitter in the channel C21 for that statistical ensemble. Essentially, this means that if the efficiency is less than one, there is uncertainty due to something other than the white Gaussian noise which was added in the channel. Since in one case the transmitter is ideal, this uncertainty must be introduced by the receiver. In the other case, the receiver is ideal, and the uncertainty must be introduced by the transmitter. The usefulness of the theorem arises from the fact that the amount of uncertainty introduced by SSS can be stated quantitatively.

The second part of the theorem states that, if

η_tr = E11/E22 = η_t = E11/E21 = η_r = E11/E12,

then the exact uncertainties are introduced in the receiver in one case, and in the transmitter in the other case. If particular information, such as that of the phase of the signal, is not used by the receiver, then the introduction of phase uncertainty into the transmitter will not degrade the receiver's performance. The concepts may be thought of as a procedure for determining what aspects of the signal the receiver cannot utilize in detection.
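As a numerical sketch of these definitions, suppose the four matched-performance energies Eij have been determined; the values below are invented purely for illustration, not measurements from the report. The efficiencies then follow directly from Eij, and the two parts of the theorem amount to simple comparisons among them.

```python
def eta(e_11, e_ij):
    # Efficiency of channel Cij: the energy E11 the all-ideal channel C11
    # needs to match the performance, divided by the energy Eij required.
    return e_11 / e_ij

# Hypothetical matched-performance energies, in arbitrary units.
E11, E12, E21, E22 = 10.0, 25.0, 25.0, 62.5

eta_r = eta(E11, E12)       # receiver efficiency, channel C12    -> 0.40
eta_t = eta(E11, E21)       # transmitter efficiency, channel C21 -> 0.40
eta_tr = eta(E11, E22)      # efficiency of channel C22           -> 0.16

# eta_r == eta_t: by the first part of the theorem, the receiver introduces
# the same AMOUNT of uncertainty with SSE as the transmitter does with SSS.
# Here eta_tr differs from eta_r, so the condition of the second part
# (eta_tr = eta_r = eta_t) fails: the two uncertainties, though equal in
# amount, are not one and the same, and the losses compound.
print(eta_r, eta_t, eta_tr)
```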

If the theory of signal detectability is applicable, then η is the variable which contains the information necessary to modify the ideal receiver to match the receiver under study.

So far the discussion of d' has been entirely in reference to the detectability of signals. The measure can also be applied to the ability of two signals to lead to recognition. First consider the two-alternative forced-choice experiment, in which a signal known exactly is presented in one of two positions in time. The receiver is asked to state in which of the two positions in time the signal did, in fact, occur. This is essentially a recognition experiment. The question asked the receiver is whether the signal is a 01 or a 10. An ideal receiver can test each position for the existence of the 1. The position more likely to contain the signal is the one which he chooses. The information upon which he bases his decision is the difference between the two measures. The distribution of the difference is illustrated in Fig. 8, a normal distribution with mean d' and standard deviation √2. If the two signals are equally likely, then the shaded area represents the probability of a correct choice.

Another way of looking at this type of experiment is to treat the task as one of recognition, as illustrated in Fig. 9. In this case, the signal shown in line 1 can be subtracted from the observed input, which contains either the signal of line 1 plus additive noise or the signal of line 2 plus additive noise. The subtraction leaves noise alone if the signal of line 1 was present, or the signal of line 3 plus noise if the signal of line 2 was present.

[Fig. 8. Distribution of the difference of two variables for the two-alternative forced-choice experiment.]

[Fig. 9. Difference signal for the two-alternative forced-choice experiment: line (1), the signal in position 1; line (2), the signal in position 2; line (3), the difference signal.]
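The picture in Fig. 8 is easy to verify numerically. Under the assumptions of the text (two unit-variance decision variables whose means are separated by d'), the difference variable is normal with mean d' and standard deviation √2, so the probability of a correct choice is the normal integral of d'/√2. The sketch below checks this by simulation; the value d' = 1.5 is an arbitrary illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Two-alternative forced choice: the receiver picks the interval with the
# larger decision variable.  d' = 1.5 is an arbitrary illustrative value.
d_prime, trials = 1.5, 200_000

signal_interval = rng.normal(d_prime, 1.0, trials)   # interval with signal
blank_interval = rng.normal(0.0, 1.0, trials)        # interval of noise alone
p_c_simulated = np.mean(signal_interval > blank_interval)

# Predicted P(c) = PHI(d'/sqrt(2)) = (1 + erf(d'/2))/2, the shaded area
# of Fig. 8 to the right of zero.
p_c_predicted = 0.5 * (1.0 + math.erf(d_prime / 2.0))

print("simulated P(c) = %.3f" % p_c_simulated)
print("predicted P(c) = %.3f" % p_c_predicted)
```

For d' = 1.5 both numbers come out near 0.856, confirming that the difference distribution has the mean and spread stated in the text.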

Now the receiver can test for the presence of the signal of line 3 in the noise. If the measure is sufficient to state that the signal of line 3 was present, he chooses the signal of line 2. This experiment is like a detection experiment with twice the energy.

A third way of looking at this experiment is illustrated in Fig. 10, taken from an earlier paper. The two signals are orthogonal; that is, the angle θ is 90°. If (2E1/No)^(1/2) and (2E2/No)^(1/2) are equal, then the recognition decision axis is (4E/No)^(1/2), consistent with the result of the previous two views. Now, in two-alternative forced-choice experiments in which the alternatives have equal energy, one could measure either (1) the distance (4ηE/No)^(1/2), (2) the recognition d'1,2, or (3) the distance (2ηE/No)^(1/2) (the detection d' for the signal which is presented in one of the two positions in time). Since in forced-choice experiments involving more than two alternatives, with each containing equal energy, a single number permitting analysis is the detection d', the authors and their colleagues have been using this measure. Thus, when a d' is presented without a subscript, or with a single subscript, it is a measure of the difference between two hypotheses, one of which is noise alone. Whenever the d' is intended to indicate the difference between two signals, each is indicated by a subscript. In Fig. 10, d'1 refers to the distance O to X, d'2 to the distance O to Y, and d'1,2 to the distance X to Y.

If a two-alternative forced-choice experiment is found to lead to a percentage of correct choices, this can be used as an estimate of the probability of a correct choice. This estimate is the data

necessary to enter the graph in Fig. 2(b). The point to be plotted projects on the ordinate at P(c) and on the abscissa at 1 - P(c). The sum of the standard units is d'1,2 = √2 d'1, if d'1 = d'2.

Now, let us consider a more general case, illustrated in Fig. 11. In this case the angle θ can assume any value, and the energies of the two signals, S1 and S2, are not necessarily equal. If an experiment is now performed in which one or the other of the two signals is presented at a fixed position in time, and the receiver is asked to state which one, again the data are furnished for entering the graph of Fig. 2(b). The estimated probabilities required are PS1(A1), the probability that if S1 is presented the receiver is correct, and 1 - PS2(A2), the probability that if S2 is presented the receiver is incorrect. The d' so estimated is d'S1,S2 = (2ηEΔ/No)^(1/2), where EΔ is the energy of the difference signal. This can be referred to a shifted point of origin O' with reference to which these signals are orthogonal. The distance from O' to each of the signals is (ηEΔ/No)^(1/2). The energy required to shift the point of origin from O to O' is redundant energy. It may be useful in phasing the receiver or bringing it on frequency. It does not, however, contribute to the capacity of the signals to lead to a decision. If S1 and S2 are now presented in random order and the receiver is asked to state the order, again the data necessary to enter the graph of Fig. 2(b) are available. In this case, the pairs can be considered orthogonal to each other. Thus, the measure is now √2 d'S1,S2.

[Fig. 10. Illustration of recognition space: the distances d'1 (O to X), d'2 (O to Y), and d'1,2 (X to Y) for two orthogonal signals.]

[Figure 11. Recognition Space for Large Signals.]
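The Euclidean picture of Figs. 10 and 11 gives a compact formula. Writing the two signals as vectors with an angle θ between them, the energy of the difference signal is EΔ = E1 + E2 - 2(E1 E2)^(1/2) cos θ, and d'S1,S2 = (2EΔ/No)^(1/2) for an ideal receiver. This sketch (with invented illustrative values) recovers the orthogonal, equal-energy result quoted above, d'1,2 = √2 d'1.

```python
import math

def d_prime_between(e1, e2, theta_deg, no):
    # d' between two signals in the Euclidean recognition space of
    # Figs. 10-11: distances go as the square root of the energy of the
    # difference signal, E_delta = E1 + E2 - 2*sqrt(E1*E2)*cos(theta).
    e_delta = (e1 + e2
               - 2.0 * math.sqrt(e1 * e2) * math.cos(math.radians(theta_deg)))
    return math.sqrt(2.0 * e_delta / no)

# Orthogonal, equal-energy case of Fig. 10: theta = 90 deg, E1 = E2 = E.
E, No = 2.0, 1.0
d1 = math.sqrt(2.0 * E / No)            # detection d' of either signal alone
d12 = d_prime_between(E, E, 90.0, No)   # recognition d' between the signals
print(d12 / d1)                         # -> sqrt(2), as stated in the text
```

At θ = 0 with equal energies the difference signal vanishes and d'S1,S2 = 0: identical signals cannot lead to recognition, however large their redundant energy.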

The theory of signal detectability deals only with signals for which the space for any set of signals in a given noise background is Euclidean. Distances in this space are linearly related to the square root of the energy of the difference signal represented by two points. The unit of measure is the square root of one-half the noise power per unit bandwidth, (No/2)^(1/2).

From the above discussion it is obvious that, in psychoacoustics at least, d' is a voltage-type variable. Ideally, d' is linearly related to the square root of the energy of the information-carrying component of the signal, not to its power. In studies in which a receiver's response to an incremental stimulus is investigated, the incremental stimulus should be stated in terms of added voltage, not added energy. If one's measure is that quantity leading to a constant d', as would be the case if one measured a "difference limen," then this constant d' would be expected to result when there is a constant voltage difference between the two signals, rather than a constant energy difference. This should be the case whenever there is enough redundant energy to remove the statistical uncertainty of the signal. In some cases, where there is an uncertainty which cannot be removed, as in the case of the signal which is a sample of white Gaussian noise, the energy of the signal is the basis for a good approximation of the detectability. On the other hand, η is an energy ratio, since efficiency is

commonly measured in terms of energy. This term is useful in inferring the properties of the receiver under study, as suggested at the beginning of this section.

6. SUMMARY

The purpose of this report was to give a general view of the ideal observer. The discussion of the opening section was designed to indicate the manner in which this model differs from the descriptive type of model typically used in psychophysics, and to indicate some of the implications of this approach for the study of the human observer. This introduction was followed by an examination of a specific instance of the ideal observer. Here it was seen that in order to evaluate the performance of an observer, it is necessary to consider the decision function which this observer is attempting to maximize. In the particular case where the observer is attempting to maximize the expected value of his decisions concerning the presence or absence of a signal, it was seen that the observer must utilize three types of information. First, he must utilize the information input which occurs in the observation interval. He must also consider the values and costs associated with his decisions, as well as the probability of a signal occurring in the observation interval. It is not possible to interpret meaningfully data reflecting an observer's ability to utilize information presented in an observation interval without considering these additional variables. That is to say, the responses of an observer in a psychophysical experiment must be regarded as responses of the total organism and not simply as outputs of the sensory system under study.

In the third section of this report, a detailed analysis was made of the case of a signal specified exactly presented in a background of white Gaussian noise. It was shown that an ideal observer could transform any input waveform to a single point on a decision axis with no loss of information relevant to a decision between the hypotheses "signal plus noise" and "noise alone". The distribution of this transformed variable was computed conditional upon each of these hypotheses. An optimal decision can be made by choosing a cut-off on the decision axis in accord with the principles outlined in Section 2.

Following this very specific case of the ideal observer, a very broad model was presented. Essentially, this consisted of a block diagram showing the major components which must be present in an ideal receiver. The role of each of these components was discussed in a general way.

The final section introduced measures which may be used in the study of any less-than-ideal observer. Introduced here was a theorem which may be used for deducing very specific properties of the receiver under study by comparing its performance with that of the ideal observer. This was followed by a generalization of the detection model to indicate the manner in which the concept of the ideal observer can be applied to the broader case of recognition of signals.

REFERENCES

1. J. F. Siegert, in Ch. 7 of J. L. Lawson and G. E. Uhlenbeck, Threshold Signals, New York, The McGraw-Hill Book Co., Inc., 1950.

2. W. W. Peterson, T. G. Birdsall, and W. C. Fox, "The Theory of Signal Detectability," Inst. Radio Engrs. Transactions on Information Theory, PGIT-4, Sept. 1954.

3. D. Van Meter and D. Middleton, "Modern Statistical Approaches to Reception in Communication Theory," Inst. Radio Engrs. Transactions on Information Theory, PGIT-4, Sept. 1954.