Technical Report No. 200 03674-22-T THE THEORY OF SIGNAL DETECTABILITY: BAYESIAN PHILOSOPHY, CLASSICAL STATISTICS, AND THE COMPOSITE HYPOTHESIS by David Jaarsma Approved by >421 D for COOLEY ELECTRONICS LABORATORY Department of Electrical Engineering The University of Michigan Ann Arbor, Michigan Contract No. Nonr-1224(36) NR187 - 200 Office of Naval Research Department of the Navy Washington, D. C. 20360 February 1970 Reproduction in whole or in part is permitted for any purpose of the U. S. Government

ABSTRACT Receiver design and performance from a Bayesian viewpoint depend upon a priori specification whenever unknown parameters are encountered in the detection situation; any available information is expressed in the form of an a priori density. A sensitivity index is developed which measures the performance loss that occurs when the receiver is designed to be optimal with respect to the given a priori density g(.) but operates in an environment in which the a priori density h(-) is considered to hold. The sensitivity index is shown to possess several desirable features. A comparison of receiver performance is made for the composite hypothesis situation. The Bayesian approach is contrasted to the classical approach. Initial indications were that classical statistics (including parameter estimation) could be closely linked to Bayesian philosophy in view of the fact that analysis according to either mode often led to the same receiver. It appeared possible that many of the classical tests could be generated from a Bayesian viewpoint by an appropriate assignment of the a priori density. Investigation revealed that this was not true in general; and the conclusion is drawn that the Bayesian approach is uniquely distinct from the classical approach. Emphasis upon unbiasedness is an admirable quality of classical statistics; yet incorporation of all prior information into the detection model by Bayesian methods is considered a superior attribute. iii

The externally sensed parameter (ESP) receiver is reviewed and its receiver operating characteristic (ROC) is evaluated for several examples not considered before. The ESP receiver is a hypothetical receiver whose performance with respect to the assigned a priori density serves as an upper bound to the performance of the optimum receiver. Receiver design via numerical integration techniques is demonstrated to be feasible for composite hypothesis situations previously considered too complex to solve. The receiver is conceived of basically as a digital computer; and its performance is nearly optimal with little increase in complexity of operation. The numerical procedure is especially adept at handling diffuse a priori densities. Receiver design via estimation techniques is considered justifiable in case optimal (Bayesian) procedures are too complex. The optimum receiver is formulated for the single composite hypothesis in such a way that when the receiver is operating in a sequential mode it appears to "estimate" the unknown parameter at each point in time. It is not an estimator in the usual sense of the word but nevertheless exhibits features and characteristics that are considered desirable of an estimator. Subject to some mild conditions the "estimator" converges asymptotically to the minimum mean square error estimator. iv

FOREWORD In practice detection equipment invariably operates in a variety of different situations. Many of the parameters considered known in the original design theory of the equipment will be unknown in operation. This lack of knowledge affects the available processing gain from a sophisticated receiver. The lack of knowledge of such parameters as source level, propagation loss, target strength (that is, those parameters affecting the received signal energy) or any parameters affecting the received noise level or spectrum, further complicate meaningful evaluation of system performance. Classical engineering and classical statistics propose a type of receiver design and evaluation which is often at odds with that developed from the more recent Statistical Theory of Signal Detectability. Basically, the classical designs are made insensitive to uncertainties or —if the uncertain parameters are too important to be ignored —the designs are based on worst case values; performance is necessarily evaluated on a conditional basis. Unsatisfied or confused, some operational analysts add means and variances and treat an "average situation." The theory of signal detectability allows the designer and the user to incorporate what they do know or can learn about the many uncertain parameters, and, in fact, makes them responsible for v

accurate specification of what they do know. Performance is evaluated by treating all possible conditions, and "optimum" means best average performance (although performance can be evaluated on a conditional basis). This present theoretical thesis introduces mathematical investigations into (1) the performance degradation due to inaccurately specified uncertainties, (2) a re-evaluation of classical statistical principles, in an attempt to salvage relevant work and to discard misleading principles, (3) the feasibility of realizing the receiver as a digital computer, taking into account inherent discrepancies and quantization error that this type of processing will introduce. This type of mathematical investigation, motivated by practical and physical considerations, is necessary to provide proper guidance to applied mathematicians and theoretical engineers in information theory and signal detection so that they may develop practical design principles and evaluation principles. This is necessary before such theoretical principles can be trusted to guide the development of advanced detection systems. vi

ACKNOWLEDGEMENTS The research reported in this dissertation was supported in part by the Office of Naval Research, Department of the Navy, under Contract No. Nonr-1224(36), "Acoustic Signal Processing. " vii

TABLE OF CONTENTS Page ABSTRACT iii FOREWORD v ACKNOWLEDGEMENTS vii LIST OF ILLUSTRATIONS xi LIST OF SYMBOLS xv LIST OF APPENDICES xx CHAPTER I: INTRODUCTION 1 1.1 Problem Formulation 1 1.2 Previous Literature 4 1.3 Procedure 6 1.4 Organization 7 1. 5 Notation 8 CHAPTER II: REVIEW OF FIXED-TIME DETECTION THEORY 11 2.1 Introduction 11 2.2 Receiver Design 14 2.2.1 Criteria 14 2.2.2 Likelihood Ratio Receiver 16 2.2.2.1 Simple Hypothesis 17 2.2.2.2 Composite Hypothesis 17 2.2.2.2.1 Single Composite Hypothesis 18 2.2.2.2.2 Double Composite Hypothesis 19 2.3 Receiver Realization 20 2.4 Receiver Evaluation 21 CHAPTER III: SENSITIVITY OF RECEIVER PERFORMANCE TO A PRIORI SPECIFICATION 26 3.1 Introduction 26 3.2 Sensitivity Study 29 3.2.1 Equality of Moments 35 3.2.2 Signal Known Exactly, Unknown Noise Level 38 viii

TABLE OF CONTENTS (Cont.) Page 3.2.2.1 Preliminaries 39 3.2.2.2 Receiver Design 42 3.2.2.3 Receiver Realization 47 3.2.2.4 Sensitivity Index 48 CHAPTER IV: COMPARISON OF RECEIVER PERFORMANCE FOR THE COMPOSITE HYPOTHESIS SITUATION 62 4.1 Introduction 63 4.2 Bayesian Approach 64 4.2.1 Evaluation Methods 64 4.2.2 Examples 67 4.2.2.1 Signal Known Except for Amplitude, Normal A Priori Density 68 4.2.2.2 Signal Known Except for Amplitude, Known Sign 78 4.2.2.3 Signal Known Exactly, Unknown Noise Level, GammaA Priori Density 82 4.3 Classical Approach 99 4.3.1 Uniformly Most Powerful Tests 100 4.3.2 Uniformly Most Powerful Unbiased Tests 101 4.3.3 A Modified Uniformly Most Powerful Unbiased Test 111 4.3.4 Estimation Techniques 112 4.4 Comparison of the Bayesian Approach and the Classical Approach 118 4.4.1 Single Composite Hypothesis 119 4.4.2 Double Composite Hypothesis 124 4. 5 Conclusions 126 CHAPTER V: RECEIVER DESIGN VIA NUMERICAL INTEGRATION TECHNIQUES 128 5.1 Motivation 128 5.2 Formulation 128 5.3 Signal Known Except for Amplitude 133 5.3.1 Receiver Design 134 5.3.2 Receiver Realization 137 ix

TABLE OF CONTENTS (Cont.) Page 5.3.3 Quality Index 137 5.3.4 Receiver Performance 145 5.4 Conclusions 153 CHAPTER VI: PSEUDO-ESTIMATION 154 6. 1 Introduction 154 6.2 Formulation 154 6.3 Signal Known Except for Amplitude in Added White Gaussian Noise 161 6.4 Receiver Operation 167 CHAPTER VII: SUMMARY 177 7.1 Conclusions 177 7.2 Contributions 180 APPENDICES 182 REFERENCES 222 DISTRIBUTION LIST 225 x

LIST OF ILLUSTRATIONS Figure Title Page 2.1 Illustration of the basic signal detection problem 12 2.2 Cost structure of the decision process 15 2.3 Normal ROC's with detectability parameter d' 24 3.1 Separation of the processing objective from the overall objective 26 3.2 Optimum processor for the composite hypothesis problem as two cascaded processors 27 3.3 Decision process based on the given a priori density g(8) 29 3.4 Comparison of optimum and suboptimum receiver performance 31 3. 5 Typical probability density functions from the gamma family 40 3.6 Extreme value rob 43 3. 7 Comparison of probability density functions 44 3.8 Receiver realization based on the given a priori density g(O) 48 3.9 Transformation z[z(xlg) h] 52 3.10 Contour map of 6(u,v), b = 1, E /c = 1, n =10, r0 =.5 54 3.11 Contour map of 6(u,v), b =1, E /c =1, n= 2, r0 =.5 55 3.12 Sensitivity index, p=3, b =, Es/c=l 56 xi

LIST OF ILLUSTRATIONS (Cont.) Figure Title Page 3.13 Standard deviation of 6(u, v), p = 3, b = 1, Es/c =1 58 3.14 Sensitivity index, Es/c =1, d =.5, p =3, n= 2 P 60 3.15 Sensitivity index, b = 1, r.5, p = 3, n= 2 61 4.1 Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKEA, d0 =1, m/v = 2 73 4.2 Average ROC's, comparison of ESP and optimum receiver performance, SKEA, dO = 1, m = 2, v = 1 74 4.3 Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKEA, d0 =1, m/v = 0 75 4.4 Average ROC's, comparison of ESP and optimum receiver performance, SKEA, d0 = 1, m = 0 76 4. 5 Decision region of the conditionally optimum receiver, SKE-UNL 85 4. 6 Decision region of the conditionally suboptimum receiver, SKE-UNL 88 4.7 Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKEUNL, n= 2 93 4.8 Average ROC's, comparison of ESP and optimum receiver performance, SKE-UNL, b = 1 94 4.9 Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKEUNL, c = 0 95 xii

LIST OF ILLUSTRATIONS (Cont.) Figure Title Page 4.10 Average ROC's, comparison of ESP and optimum receiver performance, SKE-UNL, d0/c =1 96 4.11 Decision region of the UMPU test, SKE-UNL 105 4.12 Classical solution to the detection problem SKE-UNL 106 4.13 ROC's for the UMPU test, SKE-UNL 109 4.14 Decision region of a modified UMPU test, SKE-UNL 11 4.15 ROC's for a modified UMPU test, SKE-UNL 113 5.1 Receiver realization as a digital computer incorporating numerical integration techniques 138 5.2 Plot of 6(y), p = 1 142. 3 Plot of 6(y), p = 2 143 5.4 Quality index 144 5.5 Variance of 6(y) 146 5. 6 Receiver operation and performance evaluation structure 148 5.7 Effect of numerical integration techniques on receiver performance, SKEA, p = 1, p = 2 151 5.8 Effect of numerical integration techniques on receiver performance, SKEA, a2 = 1, p = 2 152 6.1 Sketch of a posteriori probability density functions 158 xiii

LIST OF ILLUSTRATIONS (Cont.) Figure Title Page 6.2 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, diffuse information level, a = 0 169 6. 3 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, diffuse information level, a =4 170 6.4 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, precise information level, a =0 172 6. 5 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, precise information level, a = 4 173 6. 6 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, intermediate information level, a = 0 175 6.7 Simulation of receiver operation, comparison of the pseudo-estimator and the posterior mean, SKEA, intermediate information level, a = 4 176 xiv

LIST OF SYMBOLS ROC receiver operating characteristic N noise hypothesis SN signal and noise hypothesis A decision "signal present" B decision "signal absent" P(A IN) probability of false alarm P(A I SN) probability of detection x observation x(t) x observation vector x. observation sample 1 s signal s(t) s signal vector s. signal sample W receiver bandwidth T duration of observation interval r () Gamma function, (ao) = S t1 e dt 0 o2 noise level, a2 = N0/2 NO noise power per hertz 0 unknown noise process parameter; also 0 = (y2 in SKE-UNL detection problem xv

LIST OF SYMBOLS (Cont.) 0 9 parameter space 4Q ~ unknown signal parameter t2 C ( ) normal distribution function, <& (a) = 1- e dt ~2T -x W( ) (x) = O(x)/O'(x) Q ( ) likelihood ratio f*( ) approximate likelihood ratio.( I ) conditional likelihood ratio z( ) log-likelihood ratio, z( ) = 2n 2( ) z*( ) approximate log-likelihood ratio ZA I ) conditional log-likelihood ratio F( ) probability distribution function f( ) probability density function f*( ) approximate probability density function f( I ) conditional probability density function g( ) a priori probability density function, g( ) is "given" g( I ) a posteriori probability density function h( ) a priori probability density function, h( ) "holds" b parameter of the a priori gamma density b parameter of the a posteriori gamma density with respect to SKE-UNL detection problem c parameter of the a priori gamma density xvi

LIST OF SYMBOLS (Cont.) c parameter of the a posteriori gamma density with respect to SKE-UNL detection problem d' detectability parameter on normal ROC 2E d detectability parameter on normal ROC, d = d' =2 N0 E total signal energy s SKEA signal known except for amplitude SKE-UNL signal known exactly, unknown noise level c~a ~ threshold threshold on likelihood ratio A threshold on log-likelihood ratio for SKE-UNL a amplitude; also parameter in SKE-UNL detection problem r parameter in SKE-UNL detection problem (u, v) sufficient statistics of the observation for SKE-UNL ( IH) sufficient statistic given the hypothesis H -, mean vector SC ~ autocovariance matrix'I' ~ autocovariance matrix () E (x) = z(xlh)- z[z(xlg) h]; also (x) = z*(x) - z(x) 6( ) transformed version of e( ) I(glh) sensitivity index, I(glh) = E[ (x)lh, SN] - E[e(x)lh,N] J quality index, J = E[e(x)lSN] J(1: 0) divergence between hypotheses xvii

LIST OF SYMBOLS (Cont.) E expected value operator P() probability of ( ) w observation statistic, w = xt S x y observation statistic, y = s -1 x t do nominal signal-to-noise ratio, dO = s Z s m mean of a priori amplitude density v variance of a priori amplitude density p measure of diffuseness, p =a2 p upper limit on summation index 0( ) decision rule n number of observations, usually n = 2WT v number of degrees of freedom for the't' distribution a minimum variance, unbiased estimator of the unknown amplitude a a. pseudo-estimator of amplitude at t = t. 1~~ ai MMSE estimator of amplitude at t = ti a = E(alxl,x2,... xi, SN) CO conditionally optimum CS conditionally suboptimum ESP externally sensed parameter OPT optimum is distributed according to xviii

LIST OF SYMBOLS (Cont.) t is monotone with respect to A = is defined as AGC automatic gain control LHS left hand side RHS right hand side xvix

LIST OF APPENDICES Page Appendix A: Nature of the Set R0 182 Appendix B: Extreme Values of r0 185 Appendix C: A Transformation from n Dimensions to 2 Dimensions 189 Appendix D: Optimum Receiver Operation and Performance for SKE + KGN 194 Appendix E: Suboptimum Receiver Operation and Performance for SKE + KGN 197 Appendix F: Monotonicity of the Likelihood Ratio for a Special Case 202 Appendix G: A Useful Transformation of Normal Random Variables 204 Appendix H: Useful Integral Relations 208 Appendix I: Non-existence of an A Priori Density 209 Appendix J: Existence of A and Calculation of Coefficients 212 Appendix K: The Transformation z[z(xlg)(h] 216 xx

CHAPTER I INTRODUCTION The problem of detecting a signal in a background of noise is statistical in nature. The state of the environment can only be expressed in a probabilistic sense. Sometimes, even the statistics are not completely known. When the statistics are known except for one or more of the parameters the situation is termed a composite hypothesis problem. Several authors applied statistical decision theory to develop a theoretical foundation for making the best possible decisions (Refs. 1-3). The theory shows that the optimum decision rule is based on likelihood ratio whenever "good" decisions are preferred to "bad" decisions (Ref. 4). 1.1 Problem Formulation This work is conceived of as being somewhat expository in nature with one of its main goals being to explain the basic nature of detection devices and thereby reconciling Bayesian philosophy with classical statistics. The decision process is compounded whenever uncertainties exist either in the noise and/or signal plus noise process. The theory as formulated in the early 1950's (Refs. 1-3) inherently allows for the inclusion of any available information (or lack of information) by

2 assigning an a priori density to the unknown parameter(s); statistical inference is accomplished by adhering to the Bayesian philosophy. Prior information is modified by actual data (observation) according to Bayes rule, i.e., the a posteriori densities are computed from the a priori densities and the observation statistics according to Bayes rule. Bayesian philosophy is generally considered to reflect a subjective or personalistic attitude of ideally consistent people (Ref. 5). However, Bayesian philosophy (assignment of a priori densities) can equally well be interpreted to reflect the true state of the unknown parameter(s) by absorbing any environmental data. For instance, the mean of the a priori density might be chosen to be the most probable value (mode) or the mean value (ensemble average) of some function of the available data; the variance of the a priori density would be chosen to indicate the designer's degree of confidence in the data. Hence, a large variance would indicate either that no prior information is available (total ignorance) or that the confidence level in the data is virtually nil; information of this nature is described by what is referred to as a diffuse a priori density. By the same token,a small variance indicates a high level of confidence in the data; and the corresponding a priori density is highly dependent on the data. The goals of this dissertation are: a) To consider the effect of a priori specification on receiver design and performance. In particular, a study was conducted to

3 examine the effect on receiver performance of a priori densities whose means and variances were identical. b) To examine the relationship between the Bayesian approach to detection theory and that of classical statistics. The Bayesian approach and the classical approach often result in identical decision rules (receiver action); yet they are contrasting viewpoints. Emphasis must be placed on the evaluation of receiver performance conditional to the actual value of the parameter; classical statistics never assigns a priori densities, and in fact the receiver will be operating in an environment in which the parameter is constant though unknown. c) To develop a method of coping with the inherent integration difficulties that often arise in a composite hypothesis detection problem. Since a priori densities are never really known precisely, it is suggested that numerical integration techniques be employed. The receiver is conceived of as a digital computer; and it is shown that the degradation in performance can be kept negligible. d) To study the use of estimation techniques in connection with the optimal procedure from the Bayesian point of view. It appears that "estimators" can be defined which yield optimal performance from the Bayesian standpoint; yet the "estimators" are closely allied to the estimators of classical statistics and exhibit characteristics which are considered desirable.

4 1. 2 Previous Literature There are many textbooks available today which serve as basic background information to decision theory. These textbooks include topics on mathematical statistics, hypothesis testing, statistical decision theory, random processes, information theory and communications engineering, among others (Refs. 6-18). Much of the work that has been carried on in recent years has been concerned with detection problems in which only signal uncertainties exist (Refs. 19-21). The theory of signal detectability was extended to the double composite hypothesis problem (uncertainties exist in both the noise alone and the signal plus noise process) by Spooner (Ref. 22). Some of the present work is based on his work. Since this work is intimately concerned with the evaluation of receiver performance in terms of receiver operating characteristics (ROC), the work by Birdsall on ROC curves and their character (Ref. 23) is of prime importance. In addition, the article by Birdsall and Tanner on psychological measures (Ref. 24) presents a useful onenumber measure of receiver performance (in lieu of the two-number measure of the ROC) which is appropriate under certain circumstances. The article by Dempster and Schatzoff (Ref. 36) proposed to use the area under the ROC curve as a measure of performance level. This concept, although interesting, is of little value in comparing ROC curves. If neither ROC curve dominates the other (one ROC is superior or equal to the other at all threshold levels), the

5 measure is useless; even when one ROC does dominate another the proposed measure can easily lend too much significance to a particular characteristic of the ROC's (e.g., it might weight high probabilities of false alarm more than deemed desirable). Both Bayesian analysis and classical analysis of the composite hypothesis detection problem base decisions on sufficient statistics. Ferguson (Ref. 7) and Raiffa and Schlaifer (Ref. 9) present a good discussion about sufficient statistics. An excellent discussion of reproducing densities (or natural conjugate densities) is presented by both Raiffa and Schlaifer (Ref. 9) and by Spragins (Ref. 25). Spragins aptly demonstrates the relationship of Bayes rule to the existence and functional form of a reproducing density. The classical approach to testing statistical hypotheses is presented from a fairly theoretical point of view by Lehmann (Ref. 8). Estimation techniques are presented by Deutsch (Ref. 26) and by Van Trees (Ref. 16). A presentation on a measure of information provided by an observation was reported by Lindley (Ref. 27); further information theoretic concepts are discussed by Kullback (Ref. 13) and by Feinstein (Ref. 28). A thorough discussion of numerical integration techniques can be found in the book by Kopal (Ref. 29); tables of the numerical constants needed are conveniently listed in The Handbook of Mathematical Functions (Ref. 30).

6 1.3 Procedure Sensitivity of receiver performance to a priori specification is approached by specifying a nominal or assumed a priori density g(-) and then perturbing g(.) in such a manner that the desired a priori density h( ) has its first few moments equal to those of g(-). A sensitivity index is developed which exhibits several desirable features. Adherence to Bayesian philosophy inherently leads to receivers based on a monotone function of likelihood ratio. Prior information (opinions, specifications or data) is specified in the form of an a priori density and is modified according to Bayes rule by taking the actual observation into account. Classical statistics, on the other hand, never assigns an a priori density, and instead attempts to design a receiver which operates "well" regardless of the actual operating conditions. Unbiasedness (performance is at least as good as chance) is a criterion often imposed by classical statistics; estimation methods are also applied by the classical approach to detection theory. For the composite hypothesis situation numerical integration techniques can be used to design a receiver whenever integration difficulties prevent exact analysis. Once the observation statistics are specified and the a priori density assigned, the absolute observation statistics are obtained by using appropriate numerical integration techniques. The particular choice of a numerical integration method depends on the nature (region of definition, prior information available,

7 etc.) of the unknown parameter. Given the exact form of the a priori density, a quality index is defined, based on information theoretic measures, to determine the effect of the approximate numerical methods employed. Furthermore, the performance of the receiver based on numerical methods is compared to the receiver based on exact (not always possible to do) analysis. A nominal degradation in performance is not considered critical since a priori densities are usually quite subjective anyway. An attempt to justify estimation as an optimal procedure (the Bayesian approach is considered optimal) is conducted by simulating receiver operation for a particular example and comparing a pseudoestimator obtained by Bayesian methods to the minimum mean square error estimator at each increment in time.. 4 Organization The basic background material is presented in Chapter II. The composite hypothesis detection problem is presented; and the inherent inclusion of prior information in the form of an a priori density is emphasized. Chapter III conducts a study of the sensitivity of receiver performance to a priori specification. Separation of the processing objective from the overall objective is discussed. An index is defined which measures sensitivity of receiver performance to the particular choice of a priori density assigned. In Chapter IV receiver performance via Bayesian methods is compared to receiver

8 performance via classical methods. The ESP receiver is defined and its important features are discussed. Receiver design via numerical integration techniques is proposed in Chapter V for the composite hypothesis situation whenever exact analysis is difficult or impossible. A simulation of receiver operation based on a pseudo-estimator is presented in Chapter VI to justify the use of estimation methods as a "good" approach to solving signal detection problems. 1. 5 Notation The receiver, as referred to in this work, is a device which computes a test statistic (based upon the actual observation) and makes a decision as to presence or absence of a signal depending on whether or not the test statistic exceeds a pre-assigned threshold. Whenever Bayesian methods are employed,the test statistic is the likelihood ratio or a monotone function thereof. The terms "receiver" and "test statistic" will often be used interchangeably throughout the succeeding chapters. Since this work is intended to apply primarily to fixed-time detection theory, an observation will be denoted by an n-dimensional column vector unless otherwise specified, i.e., x = (X, X2,... Xn)t (A column vector will be denoted as the transpose of a row vector to save space; in this connection t denotes transpose.)

9 Since much of this work deals with or relates to the normal detection situation, it is convenient to define the normal distribution function t2 x 2 (x) = i e dt Probability density functions are denoted in symbolic form. For instance, f(x I H) is the probability density function of the observation x conditional to the hypothesis H. Likewise, g(O Ix, H) is the a posteriori probability density function of the random variable 0 given the observation x and the hypothesis H. A priori densities will generally be denoted by g(0). The symbol - denotes "is distributed according to." For instance, x - MVNE (g,) indicates that the observation x is distributed according to a multivariate normal probability distribution of dimension n with mean vector p. and autocovariance matrix Z. The symbol t denotes "is monotone with respect to." For example, Q(x) t x means that 2(x) is a monotone function of x. The likelihood ratio of an observation x given that the assigned a priori density is g(0) will be denoted by 2(xlg). The likelihood ratio will be simply denoted by 2(x) when no uncertainties exist or

10 when no ambiguities exist as to the assigned a priori density. The expectation operator will be denoted by E; and the variance- covariance operator will be denoted by Var. Hence, if,u represents the mean vector and Z the autocovariance matrix of the observation x, the following relationships exist:, = E(x) = Var(x) = E(xxt)- x _t The determinant of the autocovariance matrix Z will be denoted by i -d. A double notation is often used throughout this work for convenience. For example, N N P(AI1N) = f f(XISN) 2 (x)> means P(AIN) = f f(xIN) dx f(x)> P P(AISN) = f f(xISN) dx 2(x)> 3 Other definitions and explanations will be made as they occur in the text.

CHAPTER II REVIEW OF FIXED-TIME DETECTION THEORY The theory of signal detectability was aptly formulated by Peterson, Birdsall and Fox (Ref. 1) in 1954. This theory is now called classical fixed-time decision theory. The basic theory was extended by Nolte, Roberts and Spooner. Nolte (Ref. 21) considered the problem of detecting a recurrence phenomenon in noise for different degrees of certainty about recurrence time while Roberts (Ref. 19) expounded on composite deferred decision theory. Roberts formulated the optimum stopping rule for the sequential detection of a composite signal for which the response time is constrained to occur within the observation interval. The cost of making an observation is taken into account. Spooner (Ref. 22) developed a general optimum processor for the double composite hypothesis situation. All uncertain noise and/or signal plus noise process parameters were expressed in terms of a priori densities. Since the present work is based on the theory of signal detectability, it is appropriate that this theory be reviewed. 2.1 Introduction The basic signal detection problem is presented schematically in Fig. 2.1. The noise process is denoted by n(t) while s(t) describes the signal. The receiver is presented with an observation x(t) during 11

12 + ^ ^/\ xa) r -Decision sx(t)! Receiver De -isi^ s(t) " n(t) Figure 2.1. Illustration of the basic signal detection problem. a time interval t < t < to + T. The observation consists of either noise alone or signal plus noise. On the basis of this observation the receiver must decide whether or not a signal had been present during the observation interval. This observation-decision task of the receiver can be formulated from a decision theory viewpoint as a hypothesis test. N: x(t) = n(t) SN: x(t) = n(t) + s(t) t < t < t + T (2.1) The hypotheses are mutually exclusive. That is, either the signal was present during the entire observation interval or it was absent during the entire observation interval. In order to utilize statistical decision theory, it is customary to describe the random process x(t) by a series representation. If the observation x(t) is timelimited to an interval of length T and Fourier Series bandlimited to a band of frequencies of width W, then

13 the Shannon sampling theorem can be employed to represent the observation x(t) by x = (x1,... x)t (2.2) where n = 2WT (2.3) = x + i =1,2,... n (2.4) A more general series expansion referred to as the Karhunen-Loeve expansion is discussed in Ref. 16 and represents a random process x(t) in terms of a complete orthonormal set of functions 0i(t) so that N x(t) = lim X xi.0(t) t <t<t +T (2.5) N-3o i=l - 1 where t0+T x. x(t) 0(t)dt (2.6) to Subject to certain regularity conditions, it is shown that there exists a set of 0i(t) that leads to uncorrelated coefficients and consequently assures convergence in the mean-square sense. The second-moment characterization of the Karhunen-Loeve expansion makes it possible to represent a Gaussian process in terms of an at most countably

14 infinite set of statistically independent Gaussian random variables. 2.2 Receiver Design The receiver will be presented with an observation x(t) during a time interval t < t < t0+T and make a decision as to presence or absence of a signal at the end of the observation interval. In order to optimize receiver performance, we need to specify a particular criterion. 2.2.1 Criteria. The observation-decision task of the receiver results in either one of two possible correct decisions or one of two possible incorrect decisions. The receiver is capable of making two distinctly different responses, A is the response "signal is present" B is the response "signal is absent" to two possible hypotheses, SN the hypothesis "signal plus noise" N the hypothesis "noise alone" Associated with each response is the possibility of error since noise is present throughout the processing. The probabilities of error and correct decision are denoted by P(AIN) probability of false alarm P(AISN) probability of a correct detection

15 P(BIN) probability of a correct rejection P(B I SN) probability of a miss These probabilities are not independent since P(AIN) + P(BIN) = 1 (2.7) P(AISN) + P(BISN) = 1 (2.8) A cost may be associated with each possible outcome. The cost structure of the decision process is illustrated in Fig. 2.2. SN N A C CF B CM CQ Fig. 2.2. Cost structure of the decision process The cost of a correct detection is CD, the cost of a false alarm is CF, the cost of a miss is CM and the cost of a correct rejection is CQ Several different criteria may be appropriate depending upon the user's goals or objectives. The Bayes criterion minimizes the average cost C where

16 C - P(SN) [CD P(AISN)+ CM P(BISN)] + P(N) [CF P(AIN)+ CQ P(BIN)] (2.9) The Neyman-Pearson criterion maximizes P(AISN) subject to the constraint P(AIN) < a where a is a pre-assigned probability. The weighted combination criterion maximizes P(AISN) - w P(AIN) where w is a positive constant relating the costs involved. Other criteria exist with analogous objectives. 2.2.2 Likelihood Ratio Receiver. It has been shown (Ref. 23) that the receiver which bases its decision on likelihood ratio yields optimum performance for a wide class of criteria including those described above. In particular the decision rule is 1 f(x)> P 0 r (x) = r (x)= (2.10) 0 f(x)< P where &(x) _ f(xlSN) (2.11) f(x IN)

17 is the likelihood ratio, P is the pre-assigned threshold and 0(x) is the probability of deciding that a "signal is present" with 0 < r < 1. (Whenever 2(x) > (, the decision is made that a "signal is present.") The situation that occurs when f(x) = 3 describes a randomized decision rule. Whenever the observation densities f(xlSN) and f(xlN) are analytic, the occurrence of f(x) = ( has probability zero and hence the value of r is of no consequence whatsoever. Only the threshold depends upon the particular criterion invoked. Birdsall (Ref. 4) has shown that the likelihood ratio receiver yields optimum performance for any choice of criterion which considers incorrect decisions "bad" and correct decisions "good". This powerful result makes it possible to evaluate the merits of any other decision device by comparing its performance to the performance of the likelihood ratio receiver. 2.2.2.1 Simple Hypothesis. Whenever no uncertainties exist either in the N and/or SN process, the likelihood ratio is easily determined. It is (x) f(lSN) (2.12) f(x IN) 2.2.2.2 Composite Hypothesis. For the simple hypothesis problem the N and SN process statistics were known precisely. For the composite hypothesis problem either N and/or SN process parameters are unknown. Any available information concerning these

18 parameters is expressed in the form of an a priori density. The available information might consist of either the designer's subjective opinion or be specified by the range of parameter definition. That is, if the current state of the unknown parameter is known reasonably well (i.e., the designer possesses a high level of confidence that the value of the unknown parameter is within a given range of values), then the unknown parameter will be modeled by a proper a priori density (the normalization constant exists so that normally the mean and variance exist). On the other hand, if the state of the unknown parameter is entirely unknown (i.e., the designer possesses total ignorance), the unknown parameter might be modeled by a diffuse a priori density (the normalization constant does not exist). A diffuse a priori density in essence expresses the opinion that all admissible states of the unknown parameter are equally likely. 2.2.2.2.1 Single Composite Hypothesis. When uncertainties exist in the signal parameter(s), the situation is referred to as a single composite hypothesis problem. If we denote the unknown signal parameter(s) by 4, then the absolute observation statistics can be expressed in terms of the conditional observation statistics and the assumed a priori density g(4/) as f(xISN) = J f(x14/,SN)g(4))dv (2.13) The observation statistics conditional to N remain the same since only signal uncertainties are involved. Then

19 x _ f(xlSN) f(xlN) = f.(x14/)g(4/)d4 (2.14) where f(x I -) f(x, SN) (2.15) f(x IN) The total likelihood ratio is therefore obtained by averaging the conditional likelihood ratio over all possible states. 2.2.2.2.2 Double Composite Hypothesis. In the double composite hypothesis problem uncertainties exist in both the N and SN process. Uncertainty of a noise process parameter inherently leads to a double composite hypothesis since noise is present conditional to either hypothesis. Let 4 denote the uncertain signal parameter(s) and 0 the unknown noise process parameter(s), and assume 0 and 4 are independent. Again, the absolute observation statistics can be expressed in terms of the conditional observation statistics and the given a priori densities g(V/) and h(O) as f(xlSN) = f f f(x1/,8,SN)g(4) h(O)do d4 (2.16) f(xlN) = f f(xlO,N) h() d (2.17) The total likelihood ratio then becomes

20 f(x) = Sf (xlkh)g(4) d4 (2.18) where f f(xl,, 0,SN) h(0) dO (xl, h) = (2.19) S f(xlO,N) h(O) dO 0 It is apparent that the receiver is now constrained to operate in a dual channel mode, each channel operating on a different hypothesis. This feature is inherent for the double composite hypothesis problem and will become apparent in the work which follows. 2.3 Receiver Realization Receiver realization consists of the implementation of equipment in any fashion that realizes the likelihood ratio or a monotone function of the likelihood ratio. The receiver may be realized either sequentially in time or the entire observation may be processed. Information processing may occur in either analog or digital form. Hence, the receiver might be a matched filter or it might be a digital computer. The availability of high speed digital computers and sophisticated processing techniques has made it feasible to employ computers to do the processing for the optimum receiver in real time. Optimum processing of the observation x(t) according to Bayesian methods also permits the extraction of information useful for classification and/or estimation purposes. Prior information or

21 opinions are modified by data in terms of a posteriori densities. This learning or classification output as expressed in terms of a posteriori densities must be used only in conjunction with the detection output since learning only occurs with respect to knowledge of the true hypothesis. Both the detection output and the classification output depend upon the observation x(t) only through its sufficient statistics. Receiver realization of the optimum receiver in terms of the sufficient statistics usually reduces the dimensionality of the problem, thereby resulting in a fixed memory receiver regardless of the length of the observation interval T. 2.4 Receiver Evaluation Receiver evaluation is succinctly summarized by the receiver operating characteristic (ROC). The ROC is a convenient way of portraying the quality of detection (Ref. 23). It is merely a plot of the probability of false alarm, P(AIN), versus the probability of detection, P(A SN), for all possible threshold levels of the receiver output. For any receiver these probabilities are P(AIN) = E(0(x)lN) (2.20) P(AISN) = E ((x)ISN) (2.21) where 0(x) is the decision rule. When the receiver is the optimum likelihood ratio receiver these relations are

22 P(AIN) = f(xlN) dx (2.22) P(x)> P P(AISN) = f(x I SN)dx (2.23) f(x)> 3 By the obvious transformation - = 2(x) (2.24) these relations become P(AIN) = f(f IN) df (2.25),3 P(AISN) = f f(lISN) d2 (2.26) Further simplification can be obtained by utilizing a fundamental theorem of decision theory (Ref. 23) which states "the likelihood ratio of the likelihood ratio is the likelihood ratio." That is, () f( I N) (2. 27) f f( IN) so that the parametric equations describing the ROC curve for the likelihood ratio receiver become cP(AN) = (2.28) 3

23 P(AISN) = f f( IN) d2 (2.29) f3 Although only the density function of the likelihood ratio conditional to N needs to be determined to evaluate receiver performance, this task is often formidable. Existence of sufficient statistics and some facility with statistical transformations will usually simplify the problem, although it may still be necessary to resort to numerical integration techniques. A standard for comparing ROC curves is the normal ROC. An ROC is called normal if it can be parameterized by the normal distribution as follows: P(A1N) = <>(X) (2.30) P(AISN) = (X + d') (2.31) where t2 (X) = 1 e 2 dt (2.32) %/27r -x0 It is evident that a normal ROC curve can be characterized by the one parameter d'. The parameter d' is usually referred to as the quality of detection. For convenience normal ROC curves are usually plotted on normal-normal paper so that normal ROC curves become linear. A family of normal ROC curves with parameter d' is plotted

24 on normal-normal paper and displayed in Fig. 2.3. * 99 / / /.95.90.10.05-.01_, I I i I i I I I I! 1,, I.01.05.10.50.90.95.99 P(A IN) Figure 2.3. Normal ROC's with detectability parameter d'. The utility of normal ROC curves lies in the fact that physical significance can be attributed to d'. For the detection problem signal known exactly in known Gaussian noise (SKE + KGN), the

25 performance of the optimum receiver is described by a normal ROC with parameter d' = (2.33) 0 where E is the signal energy and NO is the noise power per hertz. The normal ROC provides a basis for the comparison of ROC curves. When ROC curves are almost normal the equivalent detection index d' as measured on the negative diagonal [P(A IN) + P(AISN) = 1] e is indicative of the quality of detection and serves as a convenient quantitative measure of performance.

CHAPTER III SENSITIVITY OF RECEIVER PERFORMANCE TO A PRIORI SPECIFICATION 3.1 Introduction One of the first contributions of classical fixed-time detection theory was the separation of the processing objective from the overall objective. The optimum receiver has the task of formulating the likelihood ratio f(x) of the entire observation interval. A decision as to presence or absence of a signal is made by thresholding the likelihood ratio. The threshold value 3 is determined independently of the likelihood ratio. It reflects the goal of the user and may incorporate such things as costs, a priori probabilities, a desired false alarm rate, etc. The separation of roles is clearly illustrated by the block diagram of Fig. 3.1. xRec Optimum ver Comparator -- - Decision Receiver Figure 3.1. Separation of the processing objective from the overall objective 26

27 Another important contribution to the theory of signal detectability was recently made by Birdsall (Ref. 31). He showed that the optimum processor for the composite hypothesis situation can be formulated as two cascaded processors as illustrated in Fig. 3.2. -(xlg) x(t, 0 ) Primary S (x I SN) Secondary I (x I h) Processor V(xN) Processor g( ) r(6 ) Figure 3.2 Optimum processor for the composite hypothesis problem as two cascaded processors. For simplicity assume that the unknown parameter is 0 Furthermore, let the primary processor be designed on the basis that the unknown parameter 0 has a given a priori density g(0), but at the time of use the a priori density h(O) holds; h(O) is related to g(0) by the Radon-Nikodym derivative r(0) such that h(0) = r(0)g(0) (3.1) Then the equation describing the secondary processor is essentially Eq. 39 of Ref. 31 and is

28 f r() g(0 (xISN),SN) dO f(xlh) = f(xlg) ~ (3.2) f r(O) g(O I (x IN), N) d This relationship can be described equivalently as x(xlh) = k(xlg) E[r(O) x,g, SN] (3.3) E[r() lx, g, N] The a priori density h(O) is the density with respect to which the user wishes to maximize performance. It may reflect his personal opinion as to the information available, or it may reflect the true state of nature with respect to the possible values 0 can assume. That is, the a priori density h(0) could either reflect a subjective point of view or conviction, or it might reflect the actual physical situation, or even a set of specifications. The primary processor forms the likelihood ratio of the observation based on the conditional observation statistics and a natural conjugate a priori density g( ) (provided that it exists). In addition it also determines the sufficient statistics 4 (xlN) and 5 (xISN) of the observation. The secondary processor utilizes the output of the primary processor along with knowledge of the a priori density h(0), in the form of a Radon-Nikodym derivative r(O), to form the likelihood ratio of the observation based on the desired a priori density h(0). This partitioning of information has made it possible to design and build a major portion of the processing equipment based on

29 mathematically tractable functions and to allow the exact goals of the receiver to be specified at a later time without any degradation of performance. This should not be construed to mean that the design and construction of the secondary processor is elementary. Quite to the contrary, it is often a very complex and difficult task. 3.2 Sensitivity Study Let 0 be an unknown noise process parameter; and consider a receiver designed to be optimum with respect to the given a priori density g(0), but whose performance is evaluated with respect to the a priori density h(0). [Mnemonically, g is "given" but h "holds".] The receiver is necessarily suboptimum with respect to the actual operating conditions. A block diagram illustrating the decision process is shown in Fig. 3.3. x so Suboptimum (xlg), | Comparator ---- Decision Receiver g(0) 3 Figure 3.3. Decision process based on the given a priori density g(O).

30 Performance via ROC The performance equations of the suboptimum receiver are N N P(A Ig, h,) = f I f(x h, ) dx (3.4) J (xlg)> 3 with N N f(xh, N) = f f(x, ) h(0) dO (3.5) 0 f(xlg, ) = f(x0, SN) g(0) dO (3.6) and (xIg) = f(xlg,N) (3.7) f(x Ig, N) The optimum receiver would have based its decision on R(xlh) where (xlh) (xh, SN) (3.8) f(x Ih, N) Its ROC will serve as an upper bound for the ROC of any receiver operating under these conditions. The performance equations of the optimum receiver are N f f( N) P(Alh,h,N) = f(xlh, N)dx (3.9) SN xhN l (x I h)> 13

31 Figure 3.4 sketches the idea of comparing the ROC of Eq. 3.4 (ROC G) to the ROC of Eq. 3.9 (ROC H). P(AI SN) ROC H/ ROC G 0 0 1 P(A IN) Figure 3.4. Comparison of optimum and suboptimum receiver performance. We are not attempting to compare g(O) to h(0); rather, we wish to compare performance of the receivers based on g(O) and h(0), under the condition that h(O) holds. Ideally, given that h(0) holds, it is desired to compare the performance of the suboptimum receiver based on g(0) to the performance of the optimum receiver based on h(O) via the ROC's of Eqs. 3.4 and 3.9; but it is often very difficult to evaluate performance for a specific example, even with the use of a high speed digital compur an ohititendhi ed numerical integration techniques.

32 Performance via index An alternative way of studying sensitivity of receiver performance to a priori specification would be to construct a procedure for ordering ROC curves, i.e., find a meaningful and consistent way of representing the two-number description of the ROC by one number. Several authors have studied this task; and they suggest the use of various indices. Several of these are reviewed by Birdsall (Ref. 23). Kullback and Leibler (Ref. 34) used the "divergence between hypotheses" J(1:0) = E[z(x)lSN] -E[z(x)IN] (3.10) where z(x) = fn P(x) (3.11) The divergence J(1: 0) was first introduced by Jeffreys (Ref. 35) in another connection. J(1: 0) is a measure of the "distance" or "divergence" between the hypotheses SN and N and is a measure of the ease of discriminating between them. The divergence J(1: 0) has all the properties of a metric as defined in topology except the triangle inequality property (Ref. 13). For normal ROC curves, J(1: 0) = d where 2E d =N (signal-to-noise ratio) (3.12) N0

33 Development Following this lead and given that h(O) holds, a performance index of the ROC based on z(xlh) can be defined as Jh(1:0) A E[z(xlh) h,SN] - E[z(xlh) h,N] (3.13) It is tempting to define an analogous performance index of the ROC based on z(xlg) [given that h(0) holds] as J'(1:0) = E[z(x g) h,SN] - E[z(xlg) h,N] (3.14) However, the relation of z(xlg) to the a priori density h(0) is most important. Picture the situation at the moment of use of the suboptimum receiver. The user knows the value of z(xlg) for the observation, say z0, and he knows h(0). To get the best possible performance he should base his decision on the likelihood ratio of what he has, with respect to what he knows. That is, he should calculate g(z1lh), or equivalently z(z0lh). We therefore give the user credit for using the suboptimum test statistic z(xlg) in the best possible manner by constructing a performance index based on z[z(x Ig) h], namely J (1:0) - E{z[z(xlg)lh] Ih,SN}- E{z[z(xlg)lh] h,N} (3.15) The transformation z[z(x g) h] can be considered as a means of reducing the test statistic z(xlg) to a "common denominator" with respect to the a priori density h(0 ).

34 The construction of an index of the ROC "difference" follows quite naturally; use as an index of the ROC "difference" I(glh) = Jh(l: 0)- J (1:0) = E[e(x)lh, SN] - E[c(x)Ih,N] (3.16) where E(x) - z(xlh) - z[z(x Ig) h] (3.17) One may also write I(glh) = f e(x)[f(xlh,SN)- f(xlh, N)] dx (3.18) Rationale The rationale behind the particular transformation z[z(xlg) h] is threefold; and it is based on the work by Birdsall (Ref. 23). First of all, if z(xlg) t z(xlh), then the ROC's are equivalent. But equivalence of ROC's implies that z[z(xlg)|h] = z[z(xlh)|h]. Since z[z(xlh) h] = z(xlh), this implies that e(x) 0; and hence I(glh) = 0 whenever z(xlg) t z(xlh). Secondly, if z(xlg) t z[z(xlg)jh], then the ROC based on z(x g) is equivalent to the ROC based on z[z(xlg) h]. Hence the a t b is read "a is strictly monotone increasing with respect to b."

35 ROC is unaltered by the transformation, and furthermore the ROC based on z(xlg) is a regular ROC. [A regular ROC is complete, convex, and interior to the unit square except at (0, 0) and (1, 1).] Finally, if z(xlg) z[z(xlg)|h], then the ROC based on z(xlg) is irregular,and it is dominated by the regular ROC based on z[z(xlg) h]. When this condition exists, the quantity J (1: 0) may not be approg priate; and hence the sensitivity index I(g I h), if used, should be used with the utmost discretion. Thus, the index I(g I h) is a measure or indicator of the "difference" between the ROC's based on z(xlg) and z(xlh), given that h(0) holds; and as such it can be considered as a sensitivity index of the quality of detection with respect to a priori specification. In other words, the index I(glh) measures the performance loss that occurs when g(O) is given but h(O) holds. 3.2.1 Equality of Moments. One way to organize a study of sensitivity to a priori densities is to study the effect of using a priori densities which differ in functional form but have their first few moments equal. It is hypothesized that performance is not significantly affected by different classes of a priori densities provided the a priori densities are reasonably smooth and have their first few moments equal. (The equality of moments of all orders of course implies equivalence of the respective a priori densities and hence performance.) In particular, equality of the means and variances appears to be of paramount importance with respect to equivalence

36 of performance. Consider the class of distributions h(O) absolutely continuous with respect to g(O) with h(0) = r(0)g(0) (3.19) Mathematical tractability will be provided by choosing p r(0) = C ri 0 (3.20) i=0 We require r(O) > 0 for all 0 e O since r(O) is a likelihood ratio. Ease of algebraic manipulation will be provided by defining ak = E(0 Ig) = f 0 g(O)d (3.21) k k k = E(0l h) = J 0 h(0) d f 0 g(e) Z re de 0 i=O p = r'iai+k (3.22) i=0 Equating moments, i.e., Ok = k' means the coefficients in the r(0) polynomial satisfy P - i= ri i+k k=0,1,... m; m<p-l (3.23) i=o

37 To avoid the trivial solution r(O) - 1, we shall assume r ^ 1. Set m = p-1 in order to obtain a unique solution of Eq. 3.23 to within a value of r0, and rewrite Eq. 3.23 as P (l-rO k = 2 ri i k= 0,,... (p-l) (3.24) To simplify succeeding manipulations, define the following matrices. = (ao, Y 1,. *,. ) (3. 25) = (rl, r2,.. r t (3.26) 21 2 A =(i+j1) j ='2 a3 p+ (3.27) p p+l C 2p- 1 Then Eq. 3.24 can be expressed equivalently as (1- r0)_ = Ar (3.28) Solving, r = (1- r) A-1 (provided A-1 exists) (3.29) The existence of A depends only upon the moments of the a priori

38 density g(0). Obviously, A1 does not exist if the moments do not exist. Even if the moments exist, the existence of A cannot be assured in general and must be verified for the particular application in mind. If we define t - a= (al, a2,... a) = A1 ( (3.30) then r = (1- r) a (3.31) and explicit dependence of the polynomial r(O) on r0 is P r(0) = r9+ (1 - r) 1 ai 0 r0 1 (3.32) Although it appears that r0 is arbitrary, the restriction r(0) > 0 for all 0 e 0 limits the permissible values of r. 3.2.2 Signal Known Exactly, Unknown Noise Level. To determine the effect of the equality of moments on performance, a study was conducted using as an example the detection problem, signal known exactly, unknown noise level (SKE-UNL). This detection problem was extensively investigated by Spooner (Ref. 22). His study used a gamma density as the a priori density of the reciprocal of the unknown noise power. The same a priori density will be used in this study. In particular he set

39 0 = 1 (3'. 33) a2(xIN) and chose b+1 c b -c0 g(0) = r b) eC b> -1, c> 0, 0< 0 < x (3.34) r(b+l) Some typical densities of this form are shown in Fig. 3. 5. The moments of g(0) are easily computed, namely = E( lg) = 0 kg(0)d 0 F (b+l+k) c r (b+1) (b+1) (b+2)... (b+k). 36) k C 3.2. 2.1 Preliminaries. It is shown in Appendix J that a. = d. c i = 1,2,... p (3.37) 1 1 with d. independent of c. Substituting Eq. 3.37 into Eq. 3.32 yields r(0) in a form which explicitly shows its dependence on r0 and c, namely P r(0) = r0+(1-r0) p d(c9) r0/1 (3.38)

40 g(~) 1.2 = 192 ** c = 3 1.0.8 c=2 b=1.6.4 c=.2 0 0 1.0 2.0 3.0 4.0 g(@) b = 1.15.2 t b =2.1 0 1 2 3 4 Figure 3.5. Typical probability density functions from the gamma family.

41 If we let f0 = r f = (l-r0)di i=1,2,... p (3.39) then the polynomial r(O) can be expressed in the form P r(0) = fi(co)i (3.40) i=0 with f. independent of c. The restriction r(O) > 0 for all 0 e [0, x) limits the permissible values of r0. It is shown in Appendix A that r [r*, 1) if d > 0 r0 e [r0, P or r e (1, r0*] if d < 0 (3.41) with r0 independent of c. In particular, it is shown in Appendix B that for p = 2 r = b+ 2 b> -1 (3.42) while for p = 3

42 2Vb+3 - b(b+3) < < -I < b < 1 2(/b+3 + 1) r < -(3.43) 0 b > 1 A plot of the extreme value r* is shown in Fig. 3. 6 for several values of p. A comparison of g(O) (r0=l) and h(O) is shown in Fig. 3.7 for p = 3. Recall that for p = 3 the means and variances match. Observe that the given a priori density g(O) and the desired a priori density h(O) differ noticeably for r = A r. varies from 0. unity to the extreme permissible value r0, h(0) changes from a unimodal to a bimodal density. (This is not obvious in the figure since the second mode occurs at 0 > 4.) The effect of matching moments will be shown in Section 3.2. 2.4; changes in relative performance level will be studied via the sensitivity index I(g I h) as developed in Section 3. 2. 3.2.2.2 Receiver Design. The receiver is a device which bases its decisions on z(xl g). It is designed on the basis of the given a priori density g(0), although in fact h(0) holds. Receiver design for the detection problem SKE-UNL was originally developed by Spooner (Ref. 22). His work is the basis for this section. Assume that the input observation x(t) is timelimited to the observation interval (0, T) and (Fourier Series) bandlimited to a band of frequencies of width W. Then the input waveform x(t) can be

43 I0 \t \ \ /.. CO *0 H \ "IV a i II II XII. \I c -a"~~~~~~~~~~~~r,-,c' s.4 r i 0 M i -( 0 ~ ~=.

44 0 oEo? 0 * -r II -p 0 ofn~ ~ f/~I -' 0 o a X o C Ii 0 0 0 0 0 0 0 0 0~0

45 represented as a vector in 2WT dimensional space by employing the sampling theorem. Thus we can characterize x(t) by x = (X1, x2..., Xn) (3.44) where n = 2WT, the dimensionality of the space W = bandwidth of input x(t) T = total duration of the observation and x x(= ) i 1,2,..., n (3.45) The conditional observation statistics are n 1 f(x l 2, N) = (1 e 2 (3.46) and n n 1 (xi- si)2 f(xla2,SN) = e 2a (3.47) 27Ta2 The noise power level a2 is assumed to be an unknown but timeinvariant parameter over the observation interval (0, T). Substituting 0 = - in Eqs. 3.46 and 3.47 yields a2

46 n 0 x 2 2 f(xlO,N) = ( e i=1 (3.48) and n 0 )2 f(xl0,SN) = () e (3.49) The total likelihood ratio depends upon the absolute observation statistics with respect to the given a priori density g(0). We obtain f(xlg,N) = j f(xl1,N)g(0)d0 n n 0 2 c e \2 2 — 2 i +1 /0 e 2 1i=l c1 b -cO o \2-7/ e rb+) 0 e dO JO I f1 r (b+l) -(b+l+ 1 ) =C c( i X.2 + 2c) (3. 50) and f(x I g,SN) = f(xl0,SN) g(O) dO F n- (b+ 1+-n) c C Z (xi-si ) + 2c] (3.51)

47 where Cn n/2 (3.52) n ^"Hb+1n/2 7/ r (b+1) Dividing, (b +1+ n x.2 + 2c (f(x I g,SN) i= f (xlg) - (3.53) f(x I g, N) n f (x.- si)2 + 2c i=l The likelihood ratio f (x I g) depends on the observation x only through the sufficient statistics n:(xlN) = L x.2 (3.54) i= 1 n _(xlSN) = E (xi-si)2 (3.55) i=1 3.22.23 Receiver Realization. An optimum processor is any device which realizes the likelihood ratio or a monotone function of it. A particular realization is presented in Fig. 3.8. The processor is essentially a dual channel device, i.e., one channel computes a sufficient statistic conditional to N while the other channel computes a sufficient statistic conditional to SN; these outputs are

48 combined to form the detection output. and Store n S(xlISN) j (xi- si)Figure 3.8. Receiver realization based on the given a priori density g(~). 3.2.2.4 Sensitivity Index. Performance of the receiver based on the given a priori density g(0) will be compared to the performance of the receiver based on the a priori density h(O) via the sensitivity index developed in Section 3.2. From Eqs. 3.17 and 3.18 I(glh) = f (x[(x f(xh, SN) - f(xh, N)] dx (3.56) where e(x) = z(xlh)- z[z(xlg)lh] (3.57) An nth order integration is required by Eq. 3. 56. The dimensionality of the problem can be reduced by the choice of an appropriate transformation. Reduction of Dimensionality. If we let

49 n E Z s.Z (3 58) u ( ( - S (3. 59) s i 2 n n 2 1 S i = 1 / v2 2 ( i i i (3. 60) then the transformation developed in Appendix C can be employed to show that -(b+ 1+ ) f(u, vlh, N) = K v (u + )2 + v2 + 2 n(u, v) (3.61) s and f(u,vlh, SN) = K v (u - v + s(u, v) nl - -o0< u< c, O<v< o (3.62) with 2(2c/E )b+ r(b+l+-) s 2 K = (3.63) n r 2 r ) r(b+l)

50 n(u, v) = 1 g2( u+) v2 + -c (3.64) n C giE (U + -2) E r(b+l+ ) i=0 s/ s and 1 2 C i2c\i 1 2 2c s(u, V) =. gu F)) + v2]2 (3. 65) r(b+l+!) i=O E s s0 where ~ = f r (b+t + i) i = 0,1,... p (3.66) gi fi 1 2 The coefficients gi are independent of c; and hence the absolute observation statistics depend on E and c only in the form E /c. s s Calculation of z(x I g). If we let a(u, v) z(xlg) (3.67) then c(u,v) = n f(u, v I, (3.68) f(u, v I g, N) (3.68) Since g(O) is a degenerate form of h(O) obtained by letting p = 0 and ro = 1, we obtain

51 (u + ) + vZ +2c 12 E n s a(u,v) = (b+l+-) fn (3.69) 1 v 2 2c (u- ) +v 2+E Calculation of z(x h). If we let (u,v) - z(xlh) (3.70) then (u, v) = n (u, v vh, SN)'f(u, vlh,N)J Calculation of z[z(x I g) I h]. It is shown in Appendix K that the transformation z[z(xlg) h] can be expressed as a ratio of integrals; evaluation can be achieved via a Gauss- Mehler integration routine. The transformation z[z(xlg) h] appears to be monotone with respect to its argument z(xlg). As a matter of fact, the transformation is nearly the identity transformation. Figure 3.9 shows how close the transformation z[z(x Ig) h] is to the identity transformation by plotting z[z(xlg)lh] - z(xlg) vs. z(xlg) for the "worst" (r0 =.01) case. (The transformation is precisely the identity transformation if h(O) = g(0), i.e., when r0 =1.)

52 z(xlg)- z[z(xlg) h].4 3 — Q'.1 I*I Ado l_ — z(xlg) 4 -3 -1 0 1 2 3 4 -.2 b = 1 E C -.3 n = 10 -,4 Figure 3.9. Transformation z[z(xlg) |h]. Calculation of E(x). If we let 6 (u, ) e(x) (3. 72) then

53 6(u,v) = z(xlh)- z[z(xlg)lh] = (u, v) - z[c(u, v) h] n (u,v) + (u, v) - z[a(u, v) h] (3.73) A contour map is used to inspect 6(u, v) in more detail. Two contour maps of the upper right half plane are shown in Figs. 3.10 and 3.11. An appreciation of the nature of 6(u, v) will assist in keeping computational errors to a minimum. It can be shown that 6(u, v) is an odd function with respect to u, i.e., 6(u,v) = -6(-u,v) (3.74) Furthermore, there appear to be p- 1 or fewer distinct contours of 6(u,v) = 0. Computation of I(glh). The sensitivity index I(glh) can be expressed as I(glh) = E[6(u,v)lh, SN] - E[6(u,v)lh,N] (3.75) fS 6(u,v) [f(u,vlh,SN)- f(u,vlh,N] dudv 0 -x Numerical Results. A Gauss-Legendre integration routine was used on an IBM 360/67 digital computer to obtain numerical results. Careful supervision of the inherent computational errors led to reasonably accurate results.

54 4- / 6 =.10 ~ 5 \ 0 6=0. = -.05.2 5 =.75 0o 2 5 4 u Figure 3.10. Contour map of 6(uv), b = 1, Es/c = 1, n = 10, r =.5. S o The effect of r0 on the sensitivity index I(glh) is shown in Fig. 3.12. The sensitivity index I(glh) = 0 for r0 =1 [ro =1 implies that r(0) = 1 and hence that h(0) = g(0)]. The sensitivity index increases monotonically from zero to its maximum value as

55 v =.025 4 — =.050 5 =.075 \ _ 1' {//~5 \100 2 -- \< \ \ & =.125 =0 5=.02. 0 1 2 4 Figure 3.11. Contour map of 6(u,v), b = 1, E /c = 1, n = 2, r =.5. S O r0 varies from unity to its extreme permissible value r0* = 0; the sensitivity index changes more rapidly as r0 - r*. Sensitivity appears to be almost independent of the number of observations n. The numerical values of the sensitivity index I(glh) are

56 To II o.CO _s T l C II II II. I By -~~~ I I< I I Il Ln ~ 0 0 0r4 1r' III I 0 D o C D ^ C O oO CD C O r.... 0 0 0 0 00O Q

57 relatively small numbers, even for r0 = r*; it can be shown that the performance index Jh(l: 0) is approximately equal to the expected value of the signal-to-noise ratio, i.e., Jh(l: 0) - E(OE lh) = E E(0E Ig) = (b+1) c. Thus, sensitivity of receiver performance to a priori specification is minimal even when only the mean and variance of h(O) are matched to the mean and variance of g(O). The validity of the sensitivity index can be ascertained by computing the standard deviation of 6(u, v). Figure 3.13 shows how the standard deviation of 6(u, v) is related to r0. The standard deviation is almost linear with respect to r0 and virtually independent of the number of observations n. The interpretation follows: the level of confidence one places on the sensitivity index should increase as r0 - i and should be independent of the number of observations n; in other words, one should have little doubt about the sensitivity index as an indicator of relative performance level whenever r0 is close to unity regardless of n. It is not possible to show how the parameter b affects sensitivity over its entire range if r0 is kept constant, since r0 is limited in its range of permissible values as indicated in Fig. 3.6. To circumvent this problem it was decided to keep the degree of perturbation between minimum and maximum permissible values of r0 constant; and hence a new parameter d was defined as P

58 0 0 To H Cao H o 0.4 ~(a c" to ~r-......~ ~ ~ ~ ~ ~ ~ ~ ~~ rII 4-/ rx, // / "m <6 a 0 O^' 0 0 o ~ ~ o o ^ /// {a - " S~~~~~~~~~~~ ^ ^//?~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~a ^ ^/^ ^ &~~~~~~~~ /// x ll ^~~~~~ ^ ______________ / ^____________________________________ c" CS1000CD'^CS C> T-~ r-( 0 0 0 0~~~~~~~~~~~~~~~~~~~e

59 r0 - r0 d r r (3.76) 1- rO* Figure 3.14 shows how the sensitivity index is related to b for d =. 5 (r was kept at a level halfway between its minimum and maximum permissible values.) The mode at b = can be accounted for by referring to Fig. 3.6 and observing the transition that occurs at b= for p=3. At that point r(0) = 0 for some 0 e (0, ); for p=3 and b> 1, r(0) = O only at 0 = 0 so that r*=0. The effect of the parameter E /c on the sensitivity index is shown in Fig. 3.15. Sensitivity is nearly linear with respect to E /c for 0 < E /c < 2. Intuitively this is very reasonable. An increase in signal energy Es for c fixed results in a higher performance level for any detection situation. The relative performance level as measured by the sensitivity index is likely to follow the same pattern. The sensitivity index reaches a maximum and decreases as E /c gets larger and larger. Detection becomes almost a certainty as the signal energy gets large in spite of the fact that the receiver is not optimum; and hence it is difficult but irrelevant to distinguish between the performance levels of the optimum and suboptimum receivers.

60 II o. II oo ~~~~~Lrn~~~~~~~~~~~~~~~rC C I PI I I1 III II 1 I cc _/ 4o I o oooa o o o o o o X-4 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~~~~~~~~~~~~~~~~~^^ -- ^O O Ln O Ln O L^ —^^^^ - Crg CV CV r( O^^ rO~~~~^ "' O K ^ O O O O O O^^-^ ^ ---- ^ --- ^ g)~~~~~~~~~~~~~~~r

61 i e1~ ~~11 II -oo II io 0l Al Hl l _ \ *'j<~~I III-r43 -H -r'-H ll b.0 O-In 0 \~~ o ~ o.,,~Cc) hi,~~~~~~~~~~~~~~~^^ ( ^^ T-I~~ cr~ ~ ~ ~~ ~~~~~~~~~~~~~~s eo~~~~~~~~~~~~~~^s "n *i-tu?o o ss oo bfi ^^^ -

CHAPTER IV COMPARISON OF RECEIVER PERFORMANCE FOR THE COMPOSITE HYPOTHESIS SITUATION Two contrasting points of view to the theory of signal detectability are presented by the Bayesian approach and the classical approa ch. The Bayesian approach requires the assignment of a priori densities to all unknown parameters and averages the conditional observation statistics with respect to the assigned a priori density. This allows the user to introduce physical or empirical knowledge about these parameters. Unfortunately, some view these a priori densities as simply a convenient technique to eliminate the nuisance of unknown parameters, mistaking "freedom to incorporate" for "license to eliminate." Classical statistics, on the other hand, seeks a test which is either uniformly most powerful (UMP) or uniformly most powerful unbiased (UMPU). Basically, it seeks a test which is either the best test (maximizes the probability of detection for a given probability of false alarm) regardless of the actual value of the unknown parameter, provided only that it is admissible; or it seeks the best unbiased test (never yields performance poorer than chance) with respect to the admissible set of values of the unknown parameter. While UMPU tests often exist, UMP tests exist only in special circumstances. 62

63 Included in the classical approach are the "estimate and plug" techniques, e.g., the "generalized likelihood ratio" method. Both the Bayesian and classical approach often lead to tests which yield identical performance in terms of ROC. This seems to be particularly true of the single composite hypothesis for situations where the parameters are "non-energy-bearing, " and only when the a priori knowledge is modeled by a maximum entropy density. Even for the double composite hypothesis, application of classical estimation techniques may result in the same test obtained by Bayesian analysis. Thus, although the basic criteria employed for hypothesis testing may be dissimilar, both Bayesian and classical tests often yield equivalent performance. As a result it is conjectured that Bayesian philosophy and classical statistics can usually be reconciled with each other for the composite hypothesis. 4.1 Introduction This chapter will consider both the Bayesian and classical approach to signal detection for the composite hypothesis and contrast performance in terms of ROC for several examples. In reference to Bayesian philosophy the ESP receiver is postulated as a means of determining the information content and usefulness of a specific a priori density for use in receiver design. In conjunction with the ESP receiver, conditional performance curves are explained along with average performance curves. Several

64 pertinent examples are worked out. From a classical point of view UMP and UMPU tests are described, and several examples are given. In addition some classical estimation techniques are employed in the task of receiver design. Indications are that "estimate-and-plug" procedures lead to the same type of receiver obtained by utilizing Bayesian techniques along with a diffuse a priori density. 4.2 Bayesian Approach Bayesian philosophy assumes there is a known a priori density for all unknown parameters of the conditional observation densities. Let 0 denote an unknown N and/or SN process parameter(s) in the subsequent development, and let g(0) denote the a priori density assigned to 0 4.2.1 Evaluation Methods. According to Bayesian analysis the absolute observation statistics are f(xlsN) = S f(xI 0, N)g()d (4.1) The total likelihood ratio is (x) f(x SN) (4.2) f(x N) and the conditional likelihood ratio is t(x ( ) (4.3) f(xi0,N)

65 Optimum ROC The optimum receiver bases its decisions on 2(x) while the conditionally optimum receiver bases its decisions on 2 (xl I). Optimum receiver performance is determined via OPT = / f NelNldx P (A I f ) = () > f(x I SN) dx P= s Jo f(x I, N)g(0)d] x dx e L((x)> f = xf F f(xl ~, SN) dx] g(0) dO f pCS(A, SN) g(0) d (4.4) where PC (Al, N) = f(x I, N)dx (4.5) S(x)> f The parametric Eq. 4.5 describes the conditionally suboptimum ROC obtained by the receiver which operates in a conditional environment but bases decisions on 2(x) instead of (xl 0). Hence, the optimum ROC is obtained by averaging the conditionally suboptimum ROC's with respect to g(0). ESP ROC The externally sensed parameter (ESP) ROC has been proposed

66 as a convenient upper bound to the optimum ROC by Spooner (Ref. 32). The name is derived from the hypothetic noiseless observation of the parameter 0 as well as the usual observation, x. If 0 contains "signal" parameters, the ESP receiver observes these whether the signal is actually present or absent. The ESP receiver is a fictitious receiver whose ROC is obtained by averaging the conditionally optimum ROC's with respect to g(0). Hence, it is independent of the unknown parameter(s) 0, but dependent on the a priori density g(0). The parametric equations describing the ESP ROC are E SP N))f pCO(AIoN ) P (AISN) = CO(A, SN) g(9)d9 (4.6) where CO(A 0,N) = f f(xl0, N)dx (4.7) Since the ESP ROC averages conditionally optimum ROC's while the optimum ROC averages conditionally suboptimum ROC's, it is readily apparent that the ESP ROC serves as an upper bound to the optimum ROC. However, equivalence of the conditional ROC's does not necessarily imply equivalence of the average ROC's. Interpretation The ESP ROC and the optimum ROC are both dependent on the a priori density g(0). The difference between these ROC's indicates

67 the degree to which the unknown parameter 0 has affected performance, subject to the a priori density g(0). One might conjecture that a small difference means either that the parameter 0 is irrelevant, or if relevant, that the optimum receiver acts "as if it had learned the value of 0 " from the observation, x. A large difference indicates that knowledge of 0 is relevant, and the optimum receiver is unable to effectively learn the value of 0 from the observation, x. 4.2.2 Examples. Several examples have been worked to demonstrate the concepts presented above. The examples to be considered are SKEA (signal known except for amplitude) with a normal a priori density, SKEA with an a priori density that assigns zero probability to the negative axis (known sign), and SKE-UNL (signal known exactly, unknown noise level) with a gamma a priori density. Definitions The following definitions will be used throughout this chapter in each of the examples: w = x Z -x (4.8) y = st x (4.9) t -1 do= s ~ s (nominal signal-to-noise ratio) (4.10) 0 = s

68 4.2.2.1 Signal Known Except for Amplitude, Normal A Priori Density. Assume that the unknown amplitude a can be modeled accurately by the normal a priori density (a- m)2 1v2v g(a) = e 2 (4.11) and that the conditional observation statistics can be modeled by the nth order multivariate Gaussian densities n 1 t -1 f(xla,N) = (27rZ I) e (4.12) n 1 t -1 n _ 9 (x-as) t (x-as) f(xla,SN) = (27rS 1) e (4.13) where s is the signal "waveshape" and Z is the noise autocovarlance matrix. In order to avoid obscuring the principles involved, the derivation of receiver operation and performance is performed in the appendices for both the conditionally optimum and suboptimum cases. Conditionally optimum receiver The decision rule of the conditionally optimum receiver (derived in Appendix D) is

69 kno ado a 2 0(x) = if a> 0 0 otherwise or no ado i y< a~+ 0 a 2 0(x) = if a< 0 (4.14) 0 otherwise (The nature of the decision rule 0(x) was discussed in Section 2.2.2.) The corresponding performance equations (derived in Appendix D) are PCO(AIaN) = 2 ( a - a ) (4.15) ~l al l/ 0 )CO(Ala,SN ) = -! nal A These parametric equations in X describe a normal ROC with d' = 0 Conditionally suboptimum receiver The decision rule of the conditionally suboptimum receiver (derived in Appendix E) is

70 1 |y +-V > 0 0(x) = < (4.17) 0 otherwise The corresponding performance equations (derived in Appendix E) are Cs P (Ala,N) = (X) + (A2) (4.18) where -A+ mv1 AX1~ -h + mv - ~~~~~(4.20) 2mv (4.21) 2 -VX 0 and A =;2(v1 + d) [n( /I +vd0) +2v (4.22) ESP ROC The ESP ROC is determined by substituting Eqs. 4.11, 4.15 and 4.16 into Eq. 4.6. We obtain

71 pSP (AIN) -= /n I t Q ^1 e l ada (4.23) -xa; lal -d / V27 3C a 1~ (a -m)2 pESP (A I SN) n 1 2v ESP (Ia2I, + 2al d) a2w e (4.24) Numerical values for these parametric equations in 3 were obtained by use of numerical integration techniques. Optimum ROC The optimum ROC can be determined by substituting Eqs. 4.11, 4.18 and 4.19 into Eq. 4.4 and making use of Appendix H. We obtain (a- m)2 PPT (AIN) = f [(X ) + <(x)] 1 e 2v da = ((X1)+ C(X2) (4.25) (a- m)2 POPT(AISN) = f [(a1 +a 2) + (X2- a%)] e 2v da -so 2rv (x +mvi +0(~ (426 = + ( 2 (4.26) \ l+ vd0 / \ +vdo /

72 Diffuseness An inspection of Eqs. 4.18 - 4.21 shows that the conditional performance of the suboptimum receiver depends on m and v only m m in the ratio. The ratio - can be related to a measure of diffuseV V ness or non-centrality. An a priori density is said to become diffuse m-1 as the variance v becomes large. Hence, (-) can be considered a measure of diffuseness. Evaluation of receiver performance Results are presented in Figs. 4.1 - 4.4 for specific parameter values of do, m and v. The parameter do represents the nominal signal-to-noise ratio and was set equal to unity. Conditional ROC's (optimum and suboptimum) are shown in Figs. 4.1 and 4.3. Average ROC's (ESP and optimum) are shown in Figs. 4.2 and 4.4 for the same parameter values as the conditional ROC's. Conditional ROC's An inspection of the conditional ROC's of Fig. 4.1 immediately leads to the conclusion that the conditionally suboptimum ROC may become biased (yield performance poorer than chance) for some particular choice of a. This situation will occur whenever the environmental conditions (actual operating conditions) and the given a priori 1 -. I From a dimensional standpoint, is a more appropriate choice, especially since (- ) is termed a measure of centrality.

73.99 905 b.90 / / / // // -.50 / / / / / --- --- —' Suboptimum.05- do =/ 1 -/ / /7-pm// // 2 P(AIN) Figure 4.1. Conditional ROC's, comparison of optimum and suboptimum receive performance, SKEA, d = 1, m/v = 2.

74.99 / _/ / /.95 - /.90 - / / E SP m= 2 v=l.05.01.05.10.50.90 P(A IN) Figure 4.2. Average ROC's, comparison of ESP and optimum receiver performance, SKEA, d = 1, m = 2, v = 1. 0

75.99 / /.90 / / / /77'/ / / 7y^ / C/ do = ^ / / / / <, / /7.0.05..50. Figure 4.3. Conditi l R, cn of o m ad uboptimum performance, SK05, m/v= O.1 m.01.05.10.50.90 P(AIN) Figure 4.5. Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKEA, d = 1, m/v = 0. 0

76 99.95 / 7o- 7, / /7 /.90 -.- 7 / _ /y~ ~ ~ ~ O / ESP / / 7 / d 7 7 m= 0.05- M=.01.05.10.50.90 P(A IN) Figure 4.4. Average ROC's, comparison of ESP and optimum receiver performance, SKEA, do = 1, m = 0. 7/A 0

77 density differ widely. Hence the receiver designer must be cautioned to be very careful about the a priori density he chooses to model the physical situation. The assumption that negative amplitudes occur rarely can lead to very poor performance (worse than chance) when the environmental conditions are such that a negative amplitude does occur. The performance index d' may be reduced by as much as 2m/(v V/). For instance, the conditionally suboptimum ROC of Fig. 4.1 obtained for a = -6 has an equivalent d' = 2 whereas precise knowledge of the amplitude would have resulted in d' = 6. By the same token, the conditionally suboptimum ROC obtained for a = 4 yields essentially the best performance possible under the circumstances. Hence it is very important to incorporate into the a priori density, in a very precise manner, any information concerning actual operating conditions. It is best to assume a diffuse a priori density - << 1) if little or no knowledge is available; use a sharp a priori density ( >> 1) whenever precise knowledge is available. The conditional ROC's of Fig. 4.3 are based on an even symmetric (m =0) or a diffuse (m 0; as v- sc, --- 0) a priori density. Note that the conditionally suboptimum ROC's are unbiased (performance is at least as good as chance) for all admissible amplitude values. However, the conditionally suboptimum ROC's of Fig. 4.1 are superior to the conditionally suboptimum ROC's of Fig. 4.3 for positive amplitudes. It should be clearly understood-that the "optimum" receiver

78 produces the best average ROC, best with respect to the given a priori density g(-). That is why it is important to carefully choose g( ) to reflect the performance balance the user wishes (and not the designer's convenience). If the user does not desire best average performance, but, for example, something like maximum worst-case conditional performance, then it would be advisable to drop likelihood ratio and use other design principles. Average ROC's The average ROC's for this detection problem are presented in Figs. 4.2 and 4.4. Since the ESP ROC is the average of the conditionally optimum ROC's and the optimum ROC is the average of the conditionally suboptimum ROC's, Fig. 4.2 is a "capsule" summary of Fig. 4.1 and Fig. 4.4 is a "capsule" summary of Fig. 4.3. Closeness of the optimum ROC to the ESP ROC tends to indicate that on the average the conditionally suboptimum ROC's are close to the conditionally optimum ROC's. In essence, comparison of the average ROC's gives some indication of the relative behavior of the conditional ROC's. 4.2.2.2 Signal Known Except for Amplitude, Known Sign. This section will determine the operation and performance of the receiver which is optimized with respect to an a priori density that tMaximum worst-case conditional performance means "go after the little signals and don't try to take advantage of the big ones."

79 assigns zero probability to the negative axis. [The sign of a is known, i.e., P(a > 0) = 1.] Restriction of the a priori density to the class of densities which assigns zero probability to the negative axis leads to some interesting results. The unusual aspects of this detection problem will be discussed. Both conditional and average ROC's will be derived. Conditionally optimum receiver Conditionally optimum receiver operation and performance do not depend on the given a priori density. The decision rule and ROC were derived earlier. The decision rule is given by Eq. 4.14 and the ROC is described by Eqs. 4.15 and 4.16. Receiver operation consists of simply computing the sufficient statistic y = st 1 x and comparing it to a threshold which depends upon the actual amplitude a. Conditionally suboptimum receiver It is shown in Appendix F, subject only to the restriction that g(a) assigns zero probability to the negative axis, that suboptimum receiver operation is identical to optimum receiver operation. Again, receiver operation consists of merely comparing the sufficient statistic y to a threshold y; the threshold y depends upon the a priori density g(a) but is independent of the actual amplitude a. Conditionally suboptimum receiver performance depends upon the conditional observation statistics of y. It was shown in Appendix D that

80 2 y 2d 1 0 f(yla,N) - e (4.27) 2id0 and (y- ad0)2 1i' 2d f(yla,SN) = e 2 (4.28) x/27rd Hence, the conditionally suboptimum ROC is described by pCS(Ala,N) = S f(yla,N)dy y> and = D (6) (4.29) CS(Ala,N f(ya,SN) dy y> = (6 + a 4d-) (4.30) where 6 = z- (4.31) These performance equations yield a normal ROC with d' = a vd. Hence, the conditionally optimum ROC's are equivalent to the conditionally suboptimum ROC's. However, the parametric equations describing the ROC's are different. Knowledge about the exact

81 operating point (threshold level) becomes significant. As a result the ESP ROC and the optimum ROC are not equivalent. ESP ROC The performance equations of the ESP ROC are pESP(AIN) = (-n a SP (AIN) = \J 0) g(a)da (4.32) 0-d0'a 2 and a PESP(AISN) -= f ( np + ) g(a)da (4.33) 0 vrad- 2 0 Optimum ROC The performance equations of the optimum ROC are POPT(AIN) = f (6) g(a) da = (6) (4.34) 0 and pOPT(A ISN) = f ( +a / ) g(a) da (4.35) 0 0 Evaluation of receiver performance An example of this type was worked by Spooner (Ref. 32). The a priori density used in his study was the truncated normal dens

82 (a- )2 1 2XZ g(a) = 1 e 22 <a<c (4.36) He presents performance curves for the ESP receiver and the optimum receiver for various values of a and X. Basically, the average ROC's displayed the same features and characteristics as those of Figs. 4.2 and 4.4. 4.2.2.3 Signal Known Exactly, Unknown Noise Level, Gamma A Priori Density. For the detection problem, signal known exactly, unknown noise level (SKE-UNL), the autocovariance matrix of the observation is known only to within a scale factor. If I denotes the autocovariance matrix, then 1~- = rZ2 S (4.37) where Z is known and a2 represents the unknown noise level. If we let 0= 1- (4.38) a2 then the conditional observation statistics become n 0 t - xt \i x f(xIO,N) 0 = e (4.39) -27 t l |

83 n 0 -)t -1 /2 — (x-s zS (x-s) f(xlO,SN) = ) e (4.40) Reduction of dimensionality The following transformation turns out to be very useful. (Details are discussed in Appendix G.) If we let _y 1 u d 2 (4.41) d0 2 w V2 = i - (4.42) d0 d0 where t w = x 1 x (4.43) t 1 y = s z1 x (4.44) dc = s Z s (nominal signal-to-noise ratio) (4.45) then 1 0 d (u +)2 f(ul0,N) = ( 2t) e - < u< 3 (4.46)

84 n-1 dv2 2( ) 2 e f(v1O,N) = r ) 0<v< c (4.47) 1 Od (u-A)2 n- f(u1O,SN) = 22) e 2 -c < u< c (4.48) f(vlO,SN) = f(vl0,N) 0<v< x (4.49) The quantity (u, v) is a sufficient statistic of the observation and has reduced an n-dimensional problem to a 2-dimensional problem. Conditionally optimum receiver In terms of (u, v) the conditional likelihood ratio is (u v IO) = f(u,vl0,SN) - f(ul,SN)f(vl0,SN) f(u, v l, N) f(ulO,N) f(vlO,N) Od u =e (4.50) Since f(u,vlI) > 3 implies that u > f d, the decision rule of the conditionally optimum receiver is (1u >. do u> d0 0(x) = (4.51) 0 otherwise

85 The decision region is pictured in Fig. 4. 5. V I an 0 do Figure 4.5. Decision region of the conditionally optimum receiver, SKE-UNL. Dependence of the decision region only on u simplifies calculation of the conditionally optimum ROC. The performance equations are PO(Al,N) =5 f(ul0,N) du = 0r - 2 ) u> un O0 (4.52) P (AIO, N)= f (ulO,SN)du= -tn, +: 0d u> \ / 0 (4. 53) Conditionally suboptimum receiver The conditionally suboptimum receiver is based on the total

86 likelihood ratio. The total likelihood ratio can be determined from the absolute observation statistics f(u,vlN) = f(u,vl0,N) g(O)dO (4.54) f(u, v ISN) = f(u, vl, SN) g(0) d0 (4.55) 0 If we model the unknown parameter 0 with a gamma a priori density b+1 g(0) = c bec c> 0, b> -1, 0< < (4.56) then performing the required integration is straightforward and yields r(b+l+2) -(b + 1 + ) n- 2 2 f(u, vIN) = K v (u2 + 2c(4.58) n d where 2c b+1 n 2 - r~-b+ l+ ) K v (4. 59) Kn rn- Ir (4 5 Hence, the total likelihood ratio is

87 (b+ l+) (u ) + 12 +2 f(u,v) - f(u, vSN) 0 (4.60) f(u, vlN) (u-2 +v2 c and the decision rule of the conditionally suboptimum receiver is I1 f(u,v) > 0(x) = (4.61) 0 otherwise Some algebraic manipulation reveals that (u, v) > 3 implies (u-a)2 + v2 > r2 if A < 1 (u-a)2 + v2 < r2 if A> 1 u > 0 if = 1 (4.62) where 1 A+1 a A- 1 (4.63) 2 h 2c r2 = _ (4.64) (A- 1)2 d0 1 n b+l+2 A = 2 (4.65) The decision region is pictured in Fig. 4. 6.

88 Figurena Figure 4.6. Decision region of the conditionally suboptimum receiver, SKE-UNL. Conditionally suboptimum receiver operation consists of determining whether the observation (u, v) lies inside or outside of the semicircular region described by Eq. 4. 62. This is in contrast to the decision region of the conditionally optimum receiver (Fig. 4. 5) which consisted of a half plane. The conditionally suboptimum ROC's are described by the parametric equations PCS(AION) = S f(u,vl0,N)dudv (u, v)> r a+ r- = f(vlO,N) f f(ul,N)dudv O a- ~rz v2 r = f(vl0,N) [c (R V) - O (S V )] dv (4.66) 0

89 pCS(AIO,SN) = f(u,vlI0,SN) dudv f (u, v) > (4. 67) r = f(v I,SN) [cO(R' ) - ~ (S'')] dv 0 where R = a+ + r2 - v2) /d (4.68) S = a+- r v) (4.69) and R'= (a- + r - v (4.70) S = (a — 2 - r ) (4.71) ESP ROC The average ROC's can be determined in terms of the conditional ROC's. The ESP ROC is described by pESP(AIN) = f PCO(AI0, N) g(0) d 0 =f - (nf3 ___ d J-. g(O)dO (4.73) 0 vOd- 2/ k- 0

90 pESP(A I SN) = pCO(A I, SN) g() d 0 f~ -c nd(~ + ) g(0)dO (4.74) Optimum ROC The optimum ROC can be determined from the conditionally suboptimum ROC's. OPT(AIN) = f PCS(AIN) g(o) do 0 (4.75) 3a r = g(O) f f(vl0,N) [>(A f0) - (B fJ)] dv dO 0 0 Reverse the order of integration, make the substitution 2 = 0 (2c+dov2) and utilize Appendix H to yield pOPT r N)=Vn- 2 2c 2 OPT r n-[T (t2)- T (tl)] dv P T(AIN) K fr vn 2 (v2 + ) [T )- T(t)] dv 0 (4.76) where v = 2b+ n+ 1 (4.77) = V c(a+ - v (4.78) v2 +

91 t2= a +2 + - v) (4.79) d 2 K = ) (b ) (4.80) r(n;) r(b+l) and I 1\ r(V) t 2 T (t) = 1 + l dx (4.81) v r() -a \ Similarly, 0r 2\ / OPT (AISN) = K r v2 (v2 + -) [T (t) - T2(t)] dv (4.82) where t= 2 a-r- -V 2) (4.83) v2 + do 2j 2 (a + r v (4.84) v2 +d d0

92 Evaluation of receiver performance Although the equations describing the various ROC's are not in a very convenient form, they are amenable to numerical integration techniques. The ROC's presented in Figs. 4.7 - 4.10 were do obtained in this manner. The conditional ROC's depend only on -, 0 do and n. The parameter b is absorbed into the threshold and does not affect the conditional ROC's. The average ROC's depend d only on —, n and b and are independent of 0 by virtue of the fact that we are looking at average performance. Conditional ROC's The conditional ROC's of Fig. 4.7 clearly show how uncertainty about the noise level affects performance. Whenever the threshold is unity (P =1), conditionally suboptimum receiver performance is equivalent to conditionally optimum receiver performance. This feature is easily explained by a closer inspection of the decision regions. The decision region of the conditionally suboptimum receiver is the semicircular region depicted in Fig. 4. 6. When the threshold is set to unity, the semicircular region degenerates into the half plane u> 0. The decision region of the conditionally optimum receiver depicted in Fig. 4. 5 is also the half plane u > 0 whenever the threshold is set to unity. Hence performance is equal at that particular threshold setting. The importance of this feature lies in the fact that many binary communication systems operate under 1) the assumption of equally

93.99.; / // ax - - -Optimum:^ // ~ B~ / ----- ~Suboptimum 10 7 \0/ ^90yn = 2.0.01,,, I I I,, I,,,I.01.05.10.50.90 P(A IN) Figure 4.7. Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKE-UNL, n = 2. performaance, SKE-LJNLj n 2.

94.99 / / / _/ / _ /.95 - // 7' / 4 / / --------- ESP4 ~/~~ ESP Optimum.10 7/ b=1.05.01 J I I I I I I I.01.05.10.50.90 P(A IN) Figure 4.8. Average ROC's, comparison of ESP and optimum receiver performance, SKE-UNL, b = 1.

95.99.95.90 -// - -- Optimum I10o^y^/ /, ------ Suboptimum. 0~ / /o/ c=O.05.01 I I I I I ] I ] ] I I.01.05.10.50.90 P(AIN) Figure 4.9. Conditional ROC's, comparison of optimum and suboptimum receiver performance, SKE-UNL, c = 0.

96.99 // 95. /.50 Optimum /ne VIUL 0 /I- = 1 // C.05.01.05.10.50.90 P(A I N) Figure 4.10. Average ROC's, comparison of ESP and optimum receiver performance, SKE-UNL, d /c 1. 0

97 likely signals and 2) the criterion, maximize a posteriori probability. Hence, decisions are based on likelihood ratio with a threshold of unity; and performance is optimized and independent of fluctuations in noise level. Conditionally optimum receiver performance depends solely upon the actual signal-to-noise ratio, 0 do, whereas conditionally suboptimum receiver performance depends primarily upon the expected value of the signal-to-noise ratio, E(0 d) = (b+l) ~ (4.85) The conditional ROC's of Fig. 4.7 indicate that conditionally suboptimum receiver performance converges to conditionally d optimum receiver performance as -- 0. The following interpretation is offered: whenever the actual signal-to-noise ratio, 0 do, is d in close agreement with the ratio - (the parameter b is absorbed c into the threshold and affects only the particular operating point, not the ROC), a close correlation exists between conditionally optimum and conditionally suboptimum receiver performance. For instance, the conditionally suboptimum receiver designed for large > 1 c c performs rather poorly whenever 0 do is small (0 d0 < 10). Cond /d0 c trariwise, the conditionally suboptimum receiver designed for small <1) performs very well for 0 do large (0do > 1). Hence, it is very important to assign an a priori density which reflects

98 as closely as possible the actual environmental conditions. When much uncertainty concerning the actual state exists, be realistic in assigning an a priori density. Average ROC's d The average ROC's of Fig. 4.8 are parameterized by - and n and reflect the conditional ROC's of Fig. 4.7. The ESP ROC and the optimum ROC average the conditionally optimum ROC's and the conditionally suboptimum ROC's respectively with respect to the given a priori density. Equality of the ESP ROC and the optimum ROC on the negative diagonal is apparent, independent of -, b or n. Performance increases as a function of. In addition, the optimum ROC converges to the ESP ROC as n- cc, although convergence is not uniform along the ROC. Diffuse a priori density The conditional ROC's of Fig. 4.9 and the average ROC's of Fig. 4.10 behave in an analogous manner. The conditional ROC's of d Fig. 4.9 were presented for c = 0 (or - = x) in order to make the effect of n obvious. A value of c=0 corresponds to a diffuse a priori density. The conditionally suboptimum ROC's nevertheless converge quite rapidly to the conditionally optimum ROC's independent of 0 d. Hence, conditionally suboptimum receiver performance can be considerably enhanced if the total observation is at least 20-dimensional. Figure 4.10 presents the ESP and optimum ROC's for - = 1 C

99 (the ROC's are singular for c=0). Convergence of the ESP ROC to the optimum ROC is apparent as n - c independent d of b and. In addition, average performance improves as b increases. Summary It has been shown that the ESP ROC serves as a useful upper bound to the optimum ROC. The ESP ROC averages the conditionally optimrum ROC's while the optimum ROC averages the conditionally suboptimum ROC's. The importance of matching a priori specification to the environmental conditions has been demonstrated. In addition, performance of the conditionally suboptimum receiver has been shown to converge asymptotically to the performance of the conditionally optimum receiver as the dimensionality of the total observation is allowed to increase. 4.3 Classical Approach The classical approach to composite hypothesis decision theory consists essentially of constructing a test (decision region) which is either uniformly most powerful (UMP) or uniformly most powerful unbiased (UMPU) or of estimating the unknown parameter(s) by some appropriate method and then proceeding as though the estimator(s) were exact, e.g., the generalized likelihood ratio method. An unknown parameter is not assigned an a priori density. Nevertheless, Bayesian and classical approaches to detection theory, though at odds,

100 often lead to tests which yield equivalent performance, especially for certain single composite hypotheses. All the tests (or receivers) in this section will be evaluated conditionally since this is the only available means of comparison. 4.3.1 Uniformly Most Powerful Tests. A uniformly most powerful (UMP) test is basically a test which yields the same performance that would have been obtained if the unknown N or SN process parameter(s) 0 were known. In addition a UMP test requires that the probability of false alarm be independent of the unknown parameter(s). In essence, the ROC corresponding to the UJMP test (UMP ROC) is equivalent to the conditionally optimum ROC for all 0 e 0. It is not particularly difficult to construct a test whose ROC is equivalent to the conditionally optimum ROC for a particular choice of 0; but to construct a test which is equivalent for all 0 e 0 is possible only in special circumstances. A more detailed discussion of the UMP test can be found in Refs. 6-8. Example A UMP test is obtained for the detection problem SKEA if it is known that the amplitude is nonnegative (a > 0, i.e., the sign of the amplitude is known). The UMP test is 1 y >o 0(x) = (4.86) 0 otherwise

101 where y is the sufficient statistic t -1 y = xt -s (4.87) Hence, the receiver consists of a device which computes the sufficient statistic y and compares it to a threshold a to make a decision as to presence or absence of a signal. The UMP ROC will be equivalent to the conditionally optimum ROC for all a e [0, oc) since the test (receiver operation) is of the same form. Comparison to Bayes test The Bayesian analysis of this problem (Section 4.2. 2.2) incorporated the knowledge a > 0 by assigning an a priori density which gave zero probability to negative values of the unknown amplitude; the corresponding receiver operation and performance are identical to that of the UMP test. It appears as though Bayesian analysis might be sufficiently general that it could conceivably encompass classical statistics as a subset. Further investigation of this aspect of the problem was deemed worthwhile and will be pursued in later sections. 4.3.2 Uniformly Most Powerful Unbiased Tests. A uniformly most powerful unbiased (UMPU) test is basically a test which never yields performance poorer than chance and is the best test among all tests satisfying this criteria. An additional restriction placed on a UMPU test is that the probability of false alarm be independent of the

102 unknown parameter(s). Naturally, such a test is highly desirable, but the existence of a UMPU test is not always assured. However, when the conditional observation statistics belong to the exponential family, a UMPU test exists (Refs. 7 and 8). Thus, a UMPU test exists if the conditional observation densities are multivariate normal densities. Example The detection problem SKEA with no restriction on the range of the unknown amplitude (i.e., -so < a < x) will yield a UMPU test. It is the familiar two-sided test (Ref. 7, p. 225). I1 lyl > a 0(x) = o y (4.88) I0 lyl < a Hence, receiver operation consists of determining the sufficient statistic y and comparing its absolute value to a threshold a. Comparison to Bayes test The receiver designed by Bayesian methods (Eq. 4.17) with a symmetric a priori density (m = 0) is based on a test which is identical to the UMPU test. Assignment of an a priori density symmetric about zero reflects the designer's knowledge (or opinion) that the sign of the amplitude is equally likely to be + or -, the maximum entropy condition for sign knowledge. Thus, Bayesian analysis was sufficiently

103 general to include the UMPU test in its repertoire and flexible enough to incorporate information ignored by classical statistics. Although Bayesian methods can lead to biased tests, this should not be considered a defect since biased ROC's will occur seldom (i.e., with small probability). Example A UMPU test can be obtained for the detection problem SKEUNL if we let the observations xi, i = 1, 2,... n be independent and specify the signal as constant, i.e., let v = I (4. 89) and = so 1 (4.90) where 1 = (1,1,, 1.)t (4.91) and s0 is a known constant. Subject to these conditions, the sufficient statistics w and y of the Bayes test (Eqs. 4.43 and 4.44) become t - t n w = x x = xx = x.2 (4.92) i=l 1

104 y = s s x sx= z x.s. = s 3 x. (4.93) i=1 i=1 and the nominal signal-to-noise ratio (Eq. 4.45) becomes -1 1On do = s S s = s s = s.2 = n s2 (4.94) 0 - - 1=l10 Furthermore, the transformed random variables u and v (Eqs. 4.41 and 4.42) become v 1 x 1 u= dy - 2 - (4.95) n 3 (x.-x) 2 w 2JI V = d (d = 1 (4.96) 0 kdl d0 where n x = lx i s (u + ) (4.97) n l1 0 The UMPU test is the one-sided test (Ref. 7, p. 229) 1 1 Y > a (x) = (4.98) 0 otherwise

105 This test can be expressed in terms of the standard t-test as /n-1 x 1 nx >, V v 0(x) = (4.99) 0 otherwise or equivalently, n-i (u + ) 1 ---- - > ot V 0(x) = (4.100) 0 otherwise The decision region turns out to be a half-cone with its center at u= -, v = 0. A typical decision region is pictured in Fig. 4.11. v slope= a, _iue41. eiinrein U U F 2 Figure 4.11. Decision region of the UMPU test, SKE-UNL.

106 Interpretation The random variable u + 2 is the signal portion of the observation, the part of the observation in the signal direction, or the part that "looks just like" the signal. The random variable v is the noise portion of the observation (the signal direction is excluded), the part of the observation that can be viewed as a r. m. s. noise level estimator. The Bayesian approach decides "signal is present" only if the observation (u, v) looks like the known signal with some modest amount of noise added to it. The classical solution is illustrated in Fig. 4.12. The classical approach bases decisions on the estimated signal-toestimated noise ratio, (u+ 2)/v; it may decide "signal is present" even though the signal portion of the observation, u+, is "far away" from the known signal, so long as the apparent signal-to-noise ratio is high. The classical solution corresponds to the common engineering solution of processing the signal portion of the observation through x -Gx u I x Gx Dimensionality Classical u+ Reducer v Processor v Fig. 4.12. Classical solution to the detection problem SKE-UNL.

107 an automatic gain control (AGC) device. The advantage of the AGC action is that the decision process is not affected by the observation level, i.e., the detection output, (u+ )/v, is independent of the variable gain G. Evaluation of receiver performance The performance equations can be obtained in terms of the central and non-central t-distribution on n- 1 degrees of freedom with noncentrality parameter 6. Specifically, P(AIN) = 1- T 0(4) (4.101) n-l1,' P(AISN) = 1- T (c (t) (4.102) n-l, where 5 = %Vdo (4.103) and VI n-1 __ T (t) =' - f xn1 3'(x) -- dx (4.104) 2 2 Note that the probability of false alarm is entirely independent of both the signal and the noise level as was to be expected. Tests of this type are often denoted as CFAR (constant false alarm rate) tests in the literature. In addition, the probability of detection only depends

108 upon the actual signal-to-noise ratio, 0 d The UMPU ROC's are shown in Fig. 4.13 for various values of V0d. and n. At P(AIN)=.5 (50% false alarms) receiver performance is independent of n and has a maximum normal index of detectability, although this operating point is unlikely to be acceptable. Whenever P(AIN) =.5, the threshold a" = 0 so that the coneshaped decision region of the UMPU test (Fig. 4. 11) degenerates into the same decision region obtained for the conditionally optimum test 0 d0 2! when = e (Fig. 4. 5), namely the half-plane u >. The UMPU ROC's asymptotically approach a normal ROC with performance index d' = 0do as n-:o. This performance bound corresponds to the ROC of the conditionally optimum receiver for 0 known. Convergence of the UMPU ROC's to the conditionally optimum performance bound appears more rapid for small values of V0 do (e d0 < 4). Comparison to Bayes test Comparing Fig. 4.6 and Fig. 4.11 gives one an appreciation of the difference in decision regions between Bayesian and classical analysis. A comparison of the performance of the UMPU test and the Bayes test (Eq. 4. 61) can not be made on an absolute scale. Neither the Bayes test nor the classical test yields the best performance that could be attained if 0 were known. Both tests have their merits. Obviously, the UMPU test is superior whenever P(AIN) =. 5.

109 0('0"a =4.99.01.05.10.50.90 90 - ~, CV CI.50.10L.05.01[.01.05.10.50.90 P(A IN) Figure 4.13. ROC's for the UMPU test, SKE-UNL.

110 By the same token, the Bayes test is superior on the negative diagonal [i.e., whenever P(AIN) + P(A ISN) = 1]. Existence or non-existence of an a priori density Since the Bayes test is not equivalent to the UMPU test for any choice of the parameters of the assigned gamma a priori density, the question of whether or not equivalence can be acquired for any other choice of a priori density is raised. In particular, does there exist an a priori density g(.) such that t -1 f(xlg) t X- - (4.105) /t -1 ax S x That is, can we find a function g(-) such that c -2 - -(w-2y+do) 2 e g(0) dO 0~ —-—. —-n- t -- (4.106) n O w J f 2 e g(0) d 0 It is shown in Appendix I that no such a priori density exists. Hence, the Bayesian approach and the UMPU test of classical statistics cannot be reconciled for the double composite hypothesis problem, SKE-UNL. The author believes that in general Bayesian and classical procedures (UMPU tests) yield diverse solutions to double composite hypothesis detection problems.

Ill 4.3.3 A Modified Uniformly Most Powerful Unbiased Test. A slight modification of the UMPU test to an altered form of the ttest is of interest, namely / n-1 u 1 n1 u > a 0(x) = (4.107) 0 otherwise The decision region of this test is shown in Fig. 4.14 and is a symmetric version of the decision region of the UMPU test (Fig. 4.11). v slope Figure 4.14. Decision region of a modified UMPU test, SKE-UNL. The performance equations for this test are P(AIN) = 1 - T(o (4.108) n-1, -- P(AISN) = - T 6(a) (4.109) n-l,2

112 Figure 4.15 displays the ROC's for various values of n. The ROC's are symmetric about the negative diagonal and equal to the performance bound on the negative diagonal. In addition the ROC's converge to the performance bound everywhere as n - oc Comparison to Bayes test The modified form of the UMPU test has the same gross characteristics as the Bayes test (Eq. 4. 61). A comparison of the conditional ROC's in Fig. 4.14 and Fig. 4.7 or Fig. 4.9 will reveal that the modified UMPU test is not uniformly superior to the Bayes test, or vice versa. Whereas the Bayes test may be superior in a certain range of parameter and threshold values, the modified UMPU test is superior elsewhere. If one is interested in conditional and not average performance, then the particular test one chooses to use in any given situation will depend upon the desired operating point and the surmised state of the unknown noise level. 4.3.4 Estimation Techniques. An easy and appealing way of attacking a composite hypothesis problem is to estimate the unknown parameter(s) and substitute into the observation densities as though the estimator(s) were exact. Minimization of mean square error is an appealing criteria while other desirable qualities in an estimator include unbiasedness, consistency and efficiency. Maximum likelihood techniques often result in estimators meeting one or more of

.99 113.95.0.90.05.01.o5.lo.5o.90 P(A IN) Figure 4.15. ROC's for a modified UMPU test, SKE-UNL.

114 these qualities. An excellent discussion of estimation techniques can be found in Ref. 26. Generalized likelihood ratio A classical method of estimating unknown parameters is the method of maximum likelihood. An estimator(s) of the unknown parameter(s) is obtained by choosing the estimator 0 to be the statistic which maximizes the likelihood function L(0) where L(0) = f(xl0) (4.110) The "generalized likelihood ratio" is obtained by treating these estimators as though they were exact, i.e., xNl, S f(xI N (4.111) IWx I 0 0 -— ^ — (4.111) f(xIO 0N' N) where 0SN is the maximum likelihood estimator with respect to f(x 0, SN) and 0N is the maximum likelihood estimator with respect to f(xl0,N). Example Consider the detection of a signal with an unknown amplitude (SKEA + KGN); and let the range of admissible values be -x < a < c. The likelihood function is

115 2 (x- as)t 1 (x-as) L(a) = f(xla,SN) = (2,rrlS) e (4.112) n l -- -'(w-2ay+a2d ) = (27r1I) 1 e (4.113) where w = xt E-x (4.114) t y = xt s (4.115) and d = s Z-1 s (4.116) Since the likelihood function L(a) is maximized whenever a =Y 0 choose - (4.11) The "generalized likelihood ratio" is f(xla =a, SN) I (xIa) - SN -~~ f(x I N) ay - d — d = e y2 2d =e (4.118)

116 Since Q(xla) t lyl, the decision rule becomes 1 lyl > ac 0(x) = (4.119) 0 otherwise Comparison to Bayes test This is precisely the decision rule obtained for the Bayes test (Eq. 4.17) when a symmetric a priori density (m =0) was assigned. Hence, application of classical estimation techniques for this single composite hypothesis problem resulted in a receiver which is identical in both operation and performance to the Bayes test which assigns a symmetric a priori density. Example Consider the double composite hypothesis problem, SKE-UNL. The likelihood function conditional to SN is n L(O1SN) = ( ) e x ( o )"2 - (w-2y + d )) ~and the lklhofuie conditional to N is and the likelihood function conditional to N is

117 n L( IN) e = l)1 e 2 ~2d1 2 e (4.121) The maximum likelihood estimators are SN = - 2y + d- (4.122) =-n w~~~~~N w (4.123) The "generalized likelihood ratio" is 0(xt~sN, i) = fSN, SN) f(xlo 0N' N) n n - w W- 2y+ d0 n rL(U:4)2~ ] (4.124)

118 where u = Y 1 (4.125) d0 2 2 W 2 v= w - (4.126) (The random variables u and v are sufficient statistics of the observation and are defined as before.) Comparison to Bayes test The "generalized likelihood ratio" is the limit of the likelihood ratio of the Bayesian approach (Eq. 4. 61) when a diffuse a priori density is approached (b — 1, c - 0). Again, classical analysis led to a solution which can be identified with a particular Bayes solution. 4.4 Comparison of the Bayesian Approach and the Classical Approach A general discussion of the relation among UMP tests, "generalized likelihood ratio, " and total likelihood ratio with respect to a diffuse a priori density is deemed worthwhile; it may shed some light on the specific examples considered earlier from both the Bayesian viewpoint and the classical viewpoint. Of necessity, this discussion has to be from the Bayesian viewpoint, i.e., we must assume a known a priori density for discussion's sake.

119 The basic equation is Bayes rule, namely f(x, 0H) = f(xl0,H)g(0IH) = g(0lx,H)f(xlg,H) (4.127) where H is either N or SN. 4.4.1 Single Composite Hypothesis. No uncertainties exist in the N process for the single composite hypothesis. Hence, the conditional likelihood ratio is (xO ) f(xlO,SN) (4.128) f(xlN) Since f(xlN) is independent of 0, choosing 0 = 0(x) to maximize (x 0) is equivalent to maximizing f(x 0, SN). Furthermore, if we rewrite Eq. 4.127 for H = SN and g(0 ISN) = g(0), then f(x0, SN) g(0) = g(O Ix,SN) f(xlg,SN) (4.129) and if g(0) is diffuse or constant [i.e., g(0) =K for all 0 O 0], then choosing 0 = 0 (x) to maximize f(x 0, SN) is equivalent to maximizing g(0 Ix, SN). Hence, maximum likelihood, maximum conditional likelihood ratio, and maximum a posteriori estimators are equal with respect to a diffuse a priori density. UMP test Theorem. If there is a uniformly most powerful statistic t(x), then t(x) is sufficient for total (or average) likelihood

120 ratio independent of the a priori density. Proof. A uniformly most powerful statistic t(x) has the property that t(x) t 2(xIO) for each 0 e O (4.130) If t(x) > t(x2), then f(xl Il) > f(x21 ) for each 0 e. Since f(xlg) = f 2(xl0)g(0)dO (4.131) then (x1 Ig) > (x2 g) independent of the a priori density g(0). Q.E.D. Thus, the relation between a UMP test (when it exists) and average likelihood ratio is quite strong, and does not depend upon a diffuse a priori density. Generalized likelihood ratio Dividing Eq. 4.129 by f(xlN) yields /P(x ) g( ) = g(0 Ix, SN) 2(xlg) (4.132) Let 0 (x) be the maximum likelihood estimator of 0, and denote the "generalized likelihood ratio" by M(x) = t(xl0=O(x)) (4.133)

121 If g(O) is restricted to be a uniform or diffuse a priori density, then K M(x) K M (x) = (xlg) (4.134) g( (x) Ix, SN) Necessary condition Denote the inverse image of M(x) by Ra, i.e., let Ra = {xlM(x) =a} (4.135) A necessary condition for the relation M(x) ft (xIg) (4.136) to hold is that the inverse image of (xlg) must be the inverse image of M(x), i.e., we must have I(xlg) = a' for all xeR (4.137) This condition requires that g( (x) x, SN) = K( for all xe R (4.138) and necessarily implies that g(0 (x)lx, SN) must be a single-valued function of M(x). That is, we must have g( (x)lx,SN) = p[M(x)] for all x (4.139) so that

122 g( (x) x,SN) = p(oa) = constant for all xe R (4.140) Example The most well known example of this is the case of signal known except for carrier phase (Ref. 4) where the uniform a priori density yields d f(xlg) = e 2 I[r(x)] (4.141) d r(x)- 2 M(x) = e (4.142) d er(x) e M(x) g(0& (x)lx, SN) e M) (4.143) 27T I[r(x)] 2r Io [an M(x) + (] where d = normal detection index if 0 is known r(x) is the suitably normalized envelope of a matched filter output I0(') is the modified Bessel function of the first kind. Further necessary condition If the first necessary condition is met, i.e., if g(&(x)lx,SN) = p[M(x)] for all x (4.144)

123 then I(xlg) t M(x) is equivalent to M(x) t M(x) (4.145) p[M(x)] If the function p(.) is differentiable, then monotonicity of the above relation corresponds to dM [pMi > 0 for all M (4.146) or equivalently, p^M) > p'(M) for all M (4. 147) Example For the example considered above d e2 M p(M) = e M (4.148) 2r 1[In M + -] and d M[p ] > 0 corresponds to Io'[r(x)] > 0 (4.149) Since r(x) > 0 for all x and since Io'() > 0 for all positive values of its argument, [(xlg) t M(x) for this example.

124 Irrelevance This discussion is somewhat beside the point, because the "generalized likelihood ratio" is usually employed when computation of the total likelihood ratio (for any a priori density) is insurmountably difficult. If g(0 Ix, SN) were known, then f(xlg) could be found simply. (xlg) = M(x) g( (x)) (4.150) g(0 (x)Ix, SN) Although we may talk of total likelihood ratio and a posteriori densities arising from diffuse a priori densities, we must recognize that they will be difficult to obtain analytically in most cases where the -'generalized likelihood ratio" is employed. Therefore, let us discuss these topics in general terms; and pictorially, not rigorously, we turn to the double composite hypothesis case for this discussion. 4.4.2 Double Composite Hypothesis. Let 0 maximize H f(x I 0, H) and consider f(x 0SN SN) M(x) = L(XI0sN N) f(X, N) (4.151) SN af(xlre N) compared to

125 f f(x I, SN) g(O I SN) d t((xlg) =- (4.152) f f(xl0,N) g(0IN) dO If f(xl0,H) is "concentrated" in 0 near 0H for both hypotheses, which we intuitively feel is necessary for 0H to be a good estimator, we would be willing to use the estimate-and-plug procedure of "generalized likelihood ratio." Following the logic found so useful elsewhere in "ellipse of concentration" and "equivalent square bandwidth," let us try a "region of minimum size" such that f f(xI0,H)g(IH) f H) g(H) d = fH)d0 (4.153) O R(x, H) Then f(x I 0SN' SN) f g(0 I SN) dO f(xg) =R -— ~_ ---- W-R(x, SN) f(xl0N, g(IN) d R(x, N) f g( I SN) dO = M(x) R(x, SN) (4.154) f g(0 IN) dO R(x, N) This immediately suggests that if the regions R(x, SN) and R(x, N) are of comparable size under either hypothesis, and the a priori densities are constant and equal, the ratio of integrals will be unity. Under such conditions we would expect the "generalized likelihood

126 ratio" M(x) and the total likelihood ratio Q(xlg) to be nearly equal. In other words, when the a priori density is uniform or diffuse, and when the conditional observation densities are very sharp with respect to 0, and when the regions of concentration R(x, N) and R(x, SN) are of about equal size under both hypotheses, then the "generalized likelihood ratio" should be approximately equal to the total likelihood ratio (with respect to this diffuse a priori density). 4.5 Conclusions Both classical tests and Bayes tests base decisions on sufficient statistics. However, the decision regions are generally of a dissimilar nature and hence receiver operation and performance are not equivalent; but whenever Bayesian analysis is based on diffuse a priori densities, the Bayes test can be identified precisely with a classical test, at least for the examples considered. Since many of the classical tests can be obtained as a degenerate form of the Bayes test, the Bayes test is judged more flexible and representative of the actual operating state. Whereas it is a straightforward procedure to incorporate available knowledge (or ignorance) into the model by use of Bayes techniques, it is often difficult and sometimes impossible to construct a classical test that does. To a large extent the classical tests ignore available information concerning the true state of the unknown parameter(s), although

127 one redeeming feature of the classical approach is the emphasis on unbiased tests, an aspect not considered from the Bayesian standpoint.

CHAPTER V RECEIVER DESIGN VIA NUMERICAL INTEGRATION TECHNIQUE S 5.1 Motivation It is often difficult to determine the likelihood ratio for a composite hypothesis situation. When the analytical form of the likelihood ratio eludes the designer, the electronic circuitry that comprises the receiver itself surely cannot be built. In this section an alternative approach is suggested which circumvents these difficulties and conceives of the receiver as a digital computer. 5.2 Formulation When the conditional observation statistics are of exponential form, it is often possible to average with respect to the given a priori density g(o) by the use of an appropriate transformation and a finite order Gauss-Hermite or Gauss-Laguerre integration routine (Ref. 29). Performance must not be affected adversely by the error introduced in this manner. The normal detection situation falls into this class. The nth order multivariate normal density is n 1 t -I -2 -2(x-g) (x- a) f(x l,I) = (2711) ) e (5.1) 128

129 where Mg is the mean vector and' is the autocovariance matrix, i.e.,. = E(x) (5.2) = E[(x- I) (x- )t] (5.3) In particular consider the situation where both p. and I are known except for a multiplicative constant, i.e., i = a s (5.4) T = oa2 (5.5) where a and a2 are unknown parameters. If we define 0 =- (5.6) a2 n' hen 2 (-as) (x-a s) f(xla,0) = ( -- - e (5.7) 27r 1 z| 1) Furthermore, if we let t -1 d = S - s (nominal signal-to-noise ratio) (5.8) s x a = -- --- (minimum variance, unbiased (5.9) do estimator of a)

130 t -1 -1 ss W = Z - (5. 10) d then the conditional observation density can be written in the form n 0 dO -\2 - 2~ (a-a)2 2- tWx f(xla,O) = r e e (5.11) Gauss- Hermite integration Consider first the detection situation SKEA (signal known except for amplitude) with known noise parameter 0 and unknown signal parameter a. Let the admissible values of the unknown amplitude a range over R where R - (-x, oo); and let g(a) denote the given a priori density of a. The observation statistics conditional to SN are 00 f(xlSN) = f f(xla,SN) g(a) da -c0 (5.12) n 0 t do.300 / 2 x Wx -T(a- a)2 - cc (2S ^ )2 e Wx - e g(a) da If we make the change of variable u = 0d0 /2 (a-a) (5.13) then

131 n 0, -2 x Wx 1 f(x I SN) e d/2) 2 e-u ga + u ) du (5.14) - X v d0do/2 Let f*(x I SN) be an approximation to f(x I SN) obtained by using a pth order Gauss-Hermite integration routine to evaluate the integral inEq. 5.14. This gives n 2 - x Wx w P f*(xISN) = (\2 1 e - (0d /2) 2 wi g(a) (5.15) where 11. a. = a + (5.16) 1 Od 0 d/2 and wi, u. are pth order Gauss-Hermite integration constants. Gauss-Hermite integration is a natural choice since f(x la, SN) is of exponential form and a E R. The error depends upon the magnitude of the (2p)th derivative of g(a) over a eR. If g(a) were a polynomial of order less than 2p, the error would be zero; one might conjecture that the error would be small provided g(a) behaves like a polynomial in the vicinity of a (a is a minimum variance unbiased

132 estimator of the unknown amplitude a. ) The degree of error will be demonstrated to be minimal for a particular problem in succeeding sections. Gauss-Laguerre integration As a second example consider the detection situation SKE-UNL (signal known exactly, unknown noise level) with known signal parameter a and unknown noise parameter 0. Let the unknown parameter 0 have a given a priori density g(0) for 0 e [0, x)). The observation statistics conditional to SN are f(xISN) = J f(xI0,SN)g(0) d 0 (5.17) n 30 (2 ) - (x-as)t (x- as) 9 2e- g(O) dO If we make the change of variable = (x-as)t S1(x-as) (5.18) then n 2 [2v21 f(xl SN) (x - a s)t -1 (x- as) 0 27(x- a )t (x-1 a) -v 2v e g g -- 2} -- dv (5.19) _(x -a)t -1(x- a s

133 Presume that g(9) is of such a nature that analytical integration is difficult or impossible. An approximation to f(x I SN) can be obtained by employing a pth order Gauss-Laguerre integration routine, namely n -- n 2 P f*(xlSN)2 ) = ( )t ( r. g(0i) (5.20) (x-as) ~ (x-as) a = s where 2 v. 0. 1 (5.21)!(x- a s)t - (x- a s) and r v. are pth order Gauss-Laguerre integration constants. 1 For this detection situation a Gauss-Laguerre integration routine is the natural choice since f(xl 0, SN) is of exponential form and 0 > 0. The error is proportional to the magnitude of the (2p)th derivative of g( ) over 0 e [0, x)). If g( ) varies slowly over its region of definition, i.e., if g(0) is a diffuse a priori density, the error will be held within tolerable limits. 5.3 Signal Known Except for Amplitude Apply the procedure outlined in the previous section to the detection situation SKEA. Without loss of generality this study can be restricted to the one-dimensional case. (The integral of Eq. 5.14 requires knowledge of the n-dimensional observation vector x only

134 in the form of the scalar random variable.) Hence, consider the one-dimensional case with conditional observation statistics - ((X - a)2 1 292 f(xla,SN) = e -22 < x< o (5.22) x2 X f(xla,N) = 1 e -x< x< (5.23) v2 a and a priori density I (a-m)2 g(a) 1 e -v - <a< x (5.24) 5.3.1 Receiver Design. It is possible to determine the observation statistics and the total likelihood ratio of the observation for the above example since g(a) is amenable to analytical integration. The availability of the exact form of the observation statistics and the total likelihood ratio makes it possible to evaluate the usefulness of numerical integration techniques for receiver design in composite hypothesis situations. Exact approach The exact observation statistics are

135 00 f(x SN) = f(x I a, SN) g(a) da -30 -_ 1- (x-m)2 = 1 e 2(v+a2) (5.25) 27r(v + 2) x2 1 2c2 f(xlN) = f(xla,N) = 1 e (5.26) V2ara2 and the likelihood ratio is 2+i (x+-) rn ^^ Ti- 2v Lp+ p ~(x~) = ~ + 2(e J m _(5.27) where = v (5.28) a2 Approximate approach If analytical integration had been impossible, a numerical integration routine could have been employed to obtain an approximate form of the observation density f(x I SN). In particular,

136 3C f(x SN) = f f(x I a, SN) g(a) da -3C - - (- a)2 = S 1 e 22 g(a) da (5.29) If we make the change of variable u = - (a-x) (5.30) A2 a then f(x SN) e 1 U g(x + Va u) du (5.31) An approximation to f(x I SN) via Gauss-Hermite integration techniques is P f*(xISN) = g(x + 2 a u.) - i i=11 7T V2v i=l where wi, u. are pth order Gauss-Hermite integration constants. For completeness define

137 1 2oxZ f*(xlN) = f(xlN) = e a (5.33) V2-v a *(x) = f*(x I SN) (5.34) f*(x IN) 5.3.2 Receiver Realization. If f(x) were unavailable because of integration difficulties, an alternative approach would be to base decisions on f*(x). The receiver would then consist essentially of a digital computer with two channels as depicted in Fig. 5.1. The one channel computes f*(xlSN) while the other computes f*(x N). The outputs of the two channels are combined to form the natural logarithm of P*(x), namely z*(x); and a decision as to presence or absence of signal is made by thresholding z*(x). 5.3.3 Quality Index. One would like some assurance that the quality of decisions would not suffer drastically when decisions are based on f*(x) instead of 2(x), or equivalently when decisions are based on z*(x) instead of z(x) where z*(x) = n * (x) (5.35) z(x) = fn (x) (5.36) Development An indication of the "closeness" of z*(x) to z(x) can be obtained via the index

138 0 Co Co CO M 0; ~ ~,U 0I.I io co 4, 0 0 ~~Cd) F:~~~~~C NC (D 0 Pi 0 05 0 X O v~s/I EI. i + * T h P^ ^?H 0~~3 IQ a,^^ — ^ c T s! ^~~o X~~~~~~~~~~~~

139 J A E[e(x)ISN] - e(x) f(xISN)dx (5.37) X where (x) = z*(x) - z(x) (5.38) This index will be referred to as the quality index and is based on information theoretic concepts analogous to those employed to obtain the sensitivity index developed in Chapter III. It is "obvious" that there is little or no degradation in performance whenever the quality index J is close to zero. Application For the detection problem SKEA as developed in the preceding sections c (x) = z*(x) - z(x) = n (x) f* (x I SN) (539) E= fn(o (5.39) n f(xlSN) Evaluation of the quality index yields

140 Oc J = f ( (x (xlSN)dx - 1 (x-m)2 f 1 2(v+a2) e(x) e v dx (5.40) -O -xC 42ir(v+ f2) If we make the transformation x-m y = (5.41) then y2 1 2 f(y ISN) e (5.42) and J = E[5(y)ISN] = 6(y) f(yISN) dy (5.43) -Do where 6(y) = e( + (5.44) Evaluating Eq. 5.44 via Eqs. 5.25, 5.28, 5.32 and 5.39 and simplifying yields (y) = n p 1 [y + 2 /2+p) uy + 2uia 56(y) = fnV L(P i=l 45) (5.45)

141 Since 6(y) depends on the parameters v, a2 and m only through p, the quality index J depends only on p. An inspection of 6(y) will be helpful in understanding the behavior of J; knowledge about the nature of 6(y) will also assist in keeping computational errors to a minimum. Figures 5.2 and 5.3 present plots of 6(y) for typical parameter values of p and p. It can readily be shown by considering the nature of the Gauss-Hermite integration constants u. that 6(y) is an even function of y, i.e., 6(y) = 6(-y) (5.46) Also, 6(y) - x as lyl - x although this doesn't affect the existence of J since f(ylSN) is of exponential form. In addition, 6(y) oscillates about the axis; the number of zero crossings is 2p. Furthermore, 6(y) stays closer and closer to the axis for a longer and longer period as p increases for fixed values of p or for p increasing for fixed values of p. Results Numerical integration techniques were employed to evaluate the quality index J for specific values of p and p. The results are presented in Fig. 5.4. For fixed values of p the quality index J approaches zero as p increases; this indicates that z*(x) is a better and better approximation to z(x) as p increases. For fixed values of p the quality index J approaches zero as p increases;

142 P==P= _.n.P-A v 4, 5l M -0... p=5.H

143 0o II iI c0 OO~ ~ ~ ~ ~ ~ ~ ~ ~ v 0 H-) LO LIi

144 04 -0 oo II^ A~~~~~~~a T-<II ~ ~ ~ ~ ~ ~ ~ - II F~

145 this shows that as the a priori density g(a) becomes more and more diffuse z*(x) becomes a better and better approximation to z(x). The fact that the best results are obtained for large values of p should not be construed to mean that the entire procedure is of little significance. On the contrary, diffuse a priori densities are used to develop receiver design whenever the designer is totally ignorant of the true state of the unknown parameter, or the physical situation dictates that the unknown parameter may range over a wide latitude of equally permissible or probable values. The validity of the results described above can be ascertained by examining the variance of 6(y), namely v. = Var[6(y)ISN] = [6(y)-J]2 f(yISN)dx (5.47) -xC This quantity is plotted in Fig. 5. 5 for the same parameter values as in Fig. 5.4. It shows the same gross characteristics as Fig. 5.4 and should be interpreted as follows: the degree of confidence one places in the quality index J should increase as both p and p increase. In other words the validity of the above- mentioned results should not be questioned for p and/or p large. 5.3.4 Receiver Performance. The recommended procedure for comparing receiver performance considers first the performance of the optimum likelihood ratio receiver [decisions are based on J(x)]

146 0 0ll r'Lo~~~~~~~~~~~~~(o C C14QC co ~c';i c~ L C' Co oo Lr\ II II II II II II Ii II a aa aaa a a ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*H cu U~~~~~~~~~( - - T-<~ ~ ~ ~ ~ ~ ~~~~e

147 and then compares it to the performance of the suboptimum receiver which bases its decision on f*(x). The only difficulty lies in the fact that the decision region of the suboptirr,um receiver for an arbitrary threshold level is difficult to determine analytically; and hence performance in terms of ROC is difficult to evaluate. An alternative procedure is to consider the optimum likelihood ratio receiver in a slightly different context. Suppose decisions are based on f(x), but evaluation of performance is done with respect to f(xlSN) and f(xlN) in addition to f*(xlSN) and f*(xlN). A schematic diagram of this procedure is given in Fig. 5.6. Receiver operation: x(t,a) Receive Comparator Decision g(a) Receiver evaluation: 1) With respect to f(xlSN) and f(xIN) 2) With respect to f*(xlSN) and f*(xlN) Figure 5.6. Receiver operation and performance evaluation structure.

148 Neither ROC is superior to the other in any sense, but closeness of ROC's will indicate closeness of f*(xlSN) and f*(x N) to f(x SN) and f(xlN). For the example at hand, closeness of ROC's will necessarily indicate closeness of f*(xlSN) to f(xlSN) since only an unknown signal parameter is involved (single composite hypothesis). Evaluation of receiver performance for either case is based on the decision region {x I(x) > 3} where 2(x) is given by Eq. 5.27. Some algebraic manipulation reveals that {xl (x)>3} - {xl x + - A p where A = p 2vn(3p +l) + )m2 (5.48) Receiver performance with respect to f(x SN) and f(x I N) Evaluation of receiver performance with respect to f(x I SN) and f(x N) yields P(AIN) f f(xIN) dx Ix + | > p m P = 1- f f(xIN) dx m = (-A' - m')+ I(-A' + m') (5.49)

149 where \.~ = — /v~ ~(5.50) m m = m (5.51) Pa.- v *O /pv Likewise, P(A I SN) = f(x I SN) dx Atm p = 1 - f f(xlSN) dx m -A _ P { in +)-~ m' + (- + m P' \ /p+l \ p+ i (5. 52) Receiver performance with respect to f*(xlSN) and f*(xlN) Evaluation of receiver performance with respect to f*(x SN) and f*(x N) yields P*(AIN) = f f*(xiN) dx X+ m4 > A xx+-!> A = (-A' - m') + (-A' + m') (5.53)

150 and P*(AISN)= f f*(x SN)dx x+m > x+~T > A m. p -- 1 P -2(x- m + au.)2 f1 -J w. e dx m 7T Vv"g i=l P p 1 - w [ W (A" - mi")- )(-A" - mi.")] (5. 54) 17; i=1 where AfAA,. = & = (5. 5) (p+l) m' - 2 u. m." = -i = 1,2,..., p (5.56) Results Results are shown in Figs. 5. 7 and 5.8. ROC's are presented only for p = 2. For p > 2 the difference in ROC's became indiscernible, and for all intents and purposes the ROC's were equivalent. Observe that even for p = 2 the difference in ROC's is negligible for p > 5. This would indicate that diffuse a priori densities and numerical integration techniques blend well together.

151.99 95valuated with respect to 2=2.01.05.10.50.80.90 ^ ^^ //.10 Evaluated with respect to fP((xSN) f(x)N) Figure 5.7. Effect of nmerical integration tEvaluated with respect to mance, Sf*(xASN), f*p =, p (x= 2N) 1 -(aa-m)2 /27F P 1 p = 2.01.05.10.50.80 90 P(A 1N) Figure 5.7. Effect of numerical integration techniques on receiver performance, SKEA, p = I, p = 2.

152.99 m = 4 \ 7 7.10 - / / > j~~ I - - ~~~~~~~Evaluated with respect to.- o o / --.05- 1 E (a-m)2 V22v a2 = 1.0 p =2.01 I I I I.01.05.10.50.80.90 P(A I N) Figure 5.8. Effect of numerical integration techniques on receiver performance, SK~EA, a2 = 1, p = 2. mance, SKEA, a2 = 1, p = 2.

153 5.4 Conclusions The quality index for the detection situation SKEA shows an almost imperceptible difference in quality of decisions for p > 1 when the receiver is designed via Gauss-Hermite integration techniques in lieu of exact analysis. Hence, it appears that use of numerical integration techniques and access to a high speed digital computer is all that is required to design a receiver for the composite hypothesis situation that performs almost as well as the optimum receiver. Likewise, comparison of receiver performance in terms of ROC indicated little discernible difference in performance for p > 2. Closeness of the ROC's indicated closeness of f*(xlSN) to f(xlSN) and hence the appropriateness of Gauss-Hermite integration techniques. Thus, it would appear reasonable to use these same techniques for other composite hypothesis detection situations with the assurance that degradation of performance is minimal or even imperceptible. In view of the fact that a priori densities are subjective in nature and never really known precisely, use of numerical integration techniques to realize the receiver is really no concession to performance at all.

CHAPTER VI PSEUDO - ESTIMATION 6. 1 Introduction In this section an attempt is made to justify the use of estimation techniques for single composite hypothesis receiver design situations by constructing an "estimator" which yields optimum performance in the Bayesian sense. It is not an estimator in the true sense of the word but nevertheless exhibits features and characteristics that are considered desirable of an estimator. The motivation for this work came from a recent paper published by Kailath (Ref. 33) which shows that the likelihood ratio for the detection of a random signal in additive white Gaussian noise has the same form as that for a known signal in white Gaussian noise. However, the correlation integral has to be interpreted in the special sense of an Ito stochastic integral. Nevertheless, it suggests the use of an estimator-correlator as an engineering approximation to the optimum receiver. 6.2 Formulation A word of explanation is in order. Because this chapter will be intimately concerned with a sequential processing of the observation, the notation used in this chapter will be different but self-explanatory. Rather than use a succinct notation which often leads to confusion, dependence of the observation statistics on the preceding observation 1 54

155 will be explicitly indicated. Development of the basic theory Assume the observation statistics are conditionally independent, and let 0 denote a signal parameter with a continuous a priori density g(0). The conditional observation statistics of the total observation are n f(xl x2.. x 10,SN) = II f(xi,SN) (6.1) i=l n f(x,x2,.. x10,N) = II f(x iN) (6.2) 1~ Z~ n ~ i=l 1 and the conditional likelihood ratio of the total observation is f(x x2,... nl, SN) n (xl x2... Xnl 0) = -= n (xi 0) 1P2O n f(x1x2,.. x IN) nf(X1X2, xn) i=l (6.3) where f(x.l0, SN) (xi f(xI N) i =1,2,... n (6.4) is the conditional likelihood ratio of a single observation. By employing Bayes law we can write the absolute observation statistics of the total observation conditional to SN as

156 f(x1,x2,... x 0, SN) g(0) f(I,x2,... x ISN) = g( x,) (6.5) g(01xl,x2,... x ) and the likelihood ratio of the total observation is f(x1 x2,.. x ISN) f (XV X 2... Xn) = (xX2 "X) f(x1x 2,... x N) (6.6) g(0) II (x.il) g( Ix1,x2,... xn) i= 1 (The a posteriori densities depend implicitly on the hypothesis SN since 0 is a signal parameter; this dependence is not expressed since it only serves to complicate the notation.) Although it appears that the RHS of Eq. 6. 6 depends on 0, it is in fact independent of 0 since the LHS of Eq. 6. 6 is independent of 0. Let's inspect the RHS of Eq. 6. 6 in more detail. Since g(0) (0) ( Ix1) g(0 Ix1, X2 X.n- ) g(0lXl, X2..Xn) g(0 Ix ) g(0 lxl,x2) g(O Ixlx2, 2 Xn) n g(1 IxX 2 1 Xi-' n n g(0 lx,x2,... xi1) (6.7) II n -- - -- ^(6.7) i=l g( Ix1,x2,... xi) then n g(0 lxx2,... x.l) (XlX2i,...) =X ) n2 1 ~(xil0) (6.8) i=l g(0 lx1,x2,... xi)

157 If we define Z(1,X2, *. xn) -= n n f(x1,, x2... x ) (6.9) then n n g(0 Ixx2,... xi1) Z(X' x2... xn) = z(xi 0) + E fn XIX 2 i) i=l i=l g(0lx1,x2,... x i) (6.10) where z(xil0) = fnf(xilO) (6.11) If it is possible to determine i. such that g(Oil1X,2'.. Xi x1) 1( i. —- 2> = i i=1,2,... n (6.12) then n z(xx2,... xn) = i z(xil0i) (6.13) i=1 Therefore the optimum (Bayesian sense) test statistic z(x1, x2,... Xn) can be obtained from the conditionally optimum test statistics z(xil 0i), i = 1, 2,... n with 0i determined in accordance with Eq. 6.12. The simplicity of this approach is enticing. Merely determine a set of values 0 satisfying Eq. 6.12 and use these values in the conditionally optimum receiver as though they were exact. For the

158 detection problem SKEA in added white Gaussian noise, the optimum receiver becomes an "estimator-correlator, "a correlator in the sense that the optimum receiver would be a correlator if the signal amplitude were known exactly, an estimator in the sense that "estimators" of signal amplitude at each increment in time are employed in the receiver as though they were exact. Observations Consider the solution(s) of Eq. 6.12 in reference to Fig. 6.1. There may be multiple solutions to Eq. 6.12 for each i; there is at least one solution since the a posteriori densities are continuous by virtue of the assumption that the a priori density was continuous. Each value of 0i depends upon the first i observations (xl, x2,... xi) and might conceivably be interpreted as a posterior estimator of 0 at time t. 1 g(e Ix, x2,... xi) t t two possible values of 6i Figure 6.1. Sketch of a posteriori probability density functions.

159 Multiple solutions to Eq. 6.12 could cause some confusion in regard to the interpretation of i. as an estimator of 0, but possibly a particular solution 0 * exists which behaves more like a true estimator than the other solutions. To explore this possibility define the posterior mean Ai =E(0 XX2,... xi) (6.14) and the posterior variance Vi = Var(0 Ix1,x2.. xi) (6.15) It is common knowledge that the posterior mean /xi is the MMSE (minimum mean square error) estimator of 0. The posterior mean Ai is an unbiased estimator of 0 if E(/iI.00) = 60 where 0 is the true value of 0 Conjecture Conditional to the hypothesis signal and noise if the a posteriori densities are unimodal and if v. < v. then it is conjectured that a 1 particular solution 0 * exists (the pseudo-estimator or "estimator") which converges to the MMSE estimator /i' i.e., 0.i* - i, or i 1 i o more precisely, given 6> 0 there exists i0 such that I1 0*. <6 for i > i. Furthermore, /Li- 00, provided P(IO -01< eIxi)> for arbitrary e > 0, regardless of whether or not /.i is biased, and hence 0 - 0; if ]Li is biased, then 0 i..i more rapidly than i- 00

160 Explanation A heuristic explanation is offered in lieu of a rigorous proof. If v. < v. the a posteriori densities become "sharper" as more observations are taken into account; the mode and the mean become almost indistinguishable for "sharp" densities and become identical in the limit since v. O. In addition, for each i the solutions 0. 1 either tend to be clustered about gli (usually around the mode) or else one solution is close to ALi (usually between posterior modes) while the others tend to be on the "tails". Hence, as soon as the posterior variance reaches a fairly low level, i.e., after a sufficient number of observations, there is always one reasonable "estimator" of 0 near the posterior mean. If we denote the solution closest to the posterior mean by i*, then it seems plausible that 0. i* - i.. Furthermore, provided P(l 0 - 0 < e xi) > 0 for arbitrary e > 0, A. - 00 regardless of whether or not ji is biased since vi - 0 and the a posteriori densities are unimodal by assumption. If p.. were biased, it appears reasonable that 0 * - L. more rapidly than *. 0; it may require many observations for the effect of the bias to "wear off, " especially if the a priori density was of a diffuse nature. In either case both J.i and 0i* converge to 0 The convergence of the pseudo-estimator 0. to the MMSE estimator J.i will be demonstrated in the following example.

161 6. 3 Signal Known Except for Amplitude in Added White Gaussian Noise Consider the detection situation SKEA in added white Gaussian noise and attempt a comparison between the optimal Bayesian approach and estimation. The sequential nature of the problem makes it convenient to use the notation = (1x2I *. xi) (6.16) = (sS,... i) i==1,2,... n (6.17) Let I denote the nth order identity matrix. n - The hypothesis test for SKEA in added white Gaussian noise consists of N: x MVN (a s, a2 I ) -n n -n n SN: xn MVN (0, a2 I ) where the unknown amplitude is denoted by a. A priori density. Let the a priori density on a be given by the truncated normal density (a- m)2 2v g(ax0, SN) = m e (6.18) (CD )O 20a< x O <a<

162 To facilitate subsequent algebraic manipulation, it is convenient to express the a priori density in the equivalent form Aa2 + Ba g(alx, SN) = K eAa +Ba O<a< x (6.19) where A = - 2V (6.20) B = m (6.21) K 1 (6.22) and w( x) (6.23) The parameters m and v can be expressed in terms of A and B as m = - 2A (6.24) 1 v 2A (60 25) Conditional log-likelihood ratio. It has been shown previously that the conditional log-likelihood ratio of the total observation for this situation is

163 a) ( a- sn z(Xn 2 -n - - 2 - nt - n 2n 2 ay n- adn d(6.26) where n 2 - n-n a2 i= 1 a a2 i=1 The sequential realization of the conditional log-likelihood ratio is n z(x la) = z z(xa) (6.29) -n i=l 1 i=l where z(x la) = a( i) -2 (1) (6.30) 1 2 2 \a2 A posteriori density The a posteriori density is proportional to the product of the conditional likelihood ratio and the a priori density, i.e., a2 d. 1 a-\~ 2 Aa2 - Ba g(alxi, SN) = C j(xila)g(a) = C' e e (6.31) O<a< xc

164 Since g(a Ixi, SN) is a proper density, we normalize and obtain A.a2 + B.a g(alx.,SN) = K. e <a< oc (6.32) where d. A.=A-1 _ 1 -1 AI = A -- (v + di) (6.33) -1 Bi = B+ y = m v +i (6.34) K. (6.35) Ki = /mi\ (635) 1 V(1 ) ti - and B. mv +y. m. 2A -1 (6.36) 1 v + d. i 1 Vi 2. -1 (6.37) 1 2A. - 1 v + d. 1 The a posteriori density can perhaps be more easily interpreted in the form! -r (a- mi)2 g(alx, SN) e < a < x l22 i (6.38)

165 In this form it is readily apparent that the a posteriori density (Eq. 6.38) is of the same functional form as the a priori density (Eq. 6.18). Hence the a priori density is a reproducing density with respect to the conditional observation statistics. MMSE estimator The MMSE estimator of a after i observations is the posterior mean a. = E(alx SN) 1 ia g(a xi, SN) da = m. 1 + (6.39) 1 1 i Pseudo- estimator The posterior variance v. is difficult to compute; but it can be shown that the posterior variance v. converges asymptotically to v., i v.e., v. v. In addition, it can readily be shown that v. < v.; Vi 1 1 1 i-I and hence it is plausible that vi < Vi. Furthermore, the a posteriori densities (Eq. 6.38) are unimodal for each i [mode = max (O, mi)]. Since P(a < Olxi) = 0, asymptotic convergence of either the MMSE estimator or the pseudo-estimator to the true amplitude a will occur only if 0 <a < x. To obtain the set of pseudo-estimators for this detection

166 situation we determine that g(a lxi, SN) - --- = 1 0<a< c (6.40) g(a x, SN) implies that A.a2+B.a A. la2 + B a 1 1 i-I K. e = K e 1 0<a< x (6.41) or equivalently, K. a2(A.-Ai )+a(Bi-Bi ) + fn Ki 0 0<a< (6.42) Since s.2 A -A = (6.43) 1 1-1 2 and x. S. 1 I B.- B = 1 (6.44) 1 i-1 z2 Equation 6.42 becomes a2- 2a + c. = 0 0<a< x (6.45) where

167 C = 2 n K.t) (6.46) 1 The pseudo-estimator(s) at time t. is the solution(s) of Eq. 6.45. If we let or. = (Xi ~ x /x2 + 2a2 n 2 - i = 1, 2,... n (6.47) 1i K1 and denote the pseudo-estimator(s) by ai, we get a. = a. 0 <. < x, i = 1, 2,... n (6.48) 1 1 1 Since the a posteriori densities are continuous, there is at least one valid pseudo-estimator. 6.4 Receiver Operation A simulation of receiver operation was conducted conditional to some actual amplitude value a. The signal waveform was chosen to be a dc signal with an energy content of unity over a total observation length of 20 time intervals. In order to observe the convergence of the pseudo-estimator described by the solution of Eq. 6.40 to some asymptotic value, the simulated run included calculation of the posterior mean (MMSE estimator) at each increment in time. These values along with the admissible pseudo-estimators are presented in the figures at each time increment as bars for improved legibility.

168 In addition the detection output z(x20) is shown on each figure to indicate the receiver's opinion as to presence or absence of a signal at the end of the observation interval. Naturally a high value is indicative of signal presence whereas a low value tends to indicate absence of a signal. The simulated runs are summarized in Figs. 6.2 - 6.7 for three radically different information levels. Figures 6.2 and 6.3 summarize the receiver operation for a diffuse a priori density while Figs. 6.4 and 6. 5 summarize receiver operation for precise knowledge. Figures 6. 6 and 6.7 present a simulation of receiver operation for an intermediate information level. Diffuse information level Receiver operation of Figs. 6.2 and 6.3 is parameterized by m = 2 and v = 100. This represents an a priori density that is rather diffuse. In other words the receiver designer did not have a great deal of information available concerning the true value of the unknown amplitude. His ignorance necessarily forced the receiver to weigh the actual observations more heavily than the prior opinion of the designer and consequently the simulated run and detection output are heavily influenced by the data. This becomes readily apparent by inspection of the figures. Figures 6. 2 and 6. 3 both show the disparity between the pseudo-estimator and the posterior mean at the beginning of the observation interval although convergence of

169 1) -rl Cd Cd 0lf~~~ ~ Cl gq ~,0~~~ 0 Cl) a) Q.4 Id O co (0 EaV4 IId o4 c 4 5 ~- c~ N rI co rd I II o I HCo I I. II X G~J G\JI T1 4 I +C) r -- I I I SI 0 I I I <c o P0 k I 0 CH |II o04 I HO I a) 0) L ioI o -I.,-I ^-ycl ------ I —--------------------- o a0

170 F= a) 0 4-' 06 ~~~~0 00 0z co a - a, ^ ^ ~ a -. o ^~~~~~~~~~~~~c 4 X I IX CM -5 S a) ^~~~~~~~I I!: Cd Cd *0 o <o Q)" ^ c| CMSo 9 cq $4 3?3 ^S M I I C2 a) C I I - P 02 0~~~~~~~~~~~~~~~~~~~~C )0 P.d 1^ Cl0 I I' ~~~~~~~~~~~II I'- 0) | I 3~~~~~~~~~~~~~~~~~~~~~~~~~4-' Ea I I r~~~~~~~~~~~~-4 0 CD 0 co a) ^1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~1 pi!q c co'IIIk C) P4i' — ~~~~~~~~~~~~~~~~~~~~~~~~~~~00 w a) cd Cd 0)H F4 4 I ~H(D 1. I4 - O *H cr3 ii~~~~~~~~~~~~~~~~~~~~~~~~~ a) I~~~~~~~~~~~~~0 I 02 a) J. t' Q) >~~~~~~~~~~~~ a*H ~~~~~~~~~~~~~oj 43 ed GV3 R r_~ ~ ~ *" (5111L I1I *H * ~~C CD Ln C~~~3 GV \O.~I c/ I <D~~~~~~ I M~~~~~~~~~~~~~~~~~~a.grl F^ CR

171 the pseudo-estimator to the posterior mean has pretty well taken place at the end of the observation interval independent of the true amplitude. Note however that the posterior mean has not yet converged to the true amplitude. In both Figs. 6.2 and 6.3 a general inclination of the posterior mean to converge to the true amplitude already appears, but in both cases convergence is not yet apparent. This tends to indicate that the pseudo-estimator is in fact closely related to the posterior mean long before the posterior mean has converged to the true amplitude. This observation lends some credence to the use of estimation techniques as nearly optimal procedures for composite hypothesis detection and/or estimation problems. Precise information level When the receiver designer has some very precise opinions and/or knowledge concerning the true amplitude, the actual physical situation had better bear him out or else his job may be in jeopardy. Precise opinions are reflected in the a priori density in terms of small variances, and heavily influence the operation of the receiver. Figures 6.4 and 6. 5 show the dominant effect of precise prior opinions and the insignificant effect of the actual data on the simulated run. The receiver is firmly convinced that it knows the actual amplitude and acts accordingly regardless of the data. Hence, precise opinions must be corroborated by the actual situation in order to yield optimal behavior.

172 Cd $- N*~~~~~ v~~~~~*0. 4' O I.I I o -rro b CCS O tie co Cf s - O (d c ( a P4 Co CCE-PJ V 0 - II C D O -P 0) C~ r1 o ~ L"- IO.S',~O,3 (

173 U) a, o 0 ^,,4 11. o0 a) 4-4, rd$ X4- _ N >I Cdo ~ 4, (1 a) c o P 0 1 0' (I I -* ~.4 10 0 c) CH 0.r4 I~ +3~~~o I ~ ~ ~ I * ~ t A.i4 ) 0 ji 0 U0) 4 I.I * OO D H0 0 a ) EQ ^ _______________ ) ________________ __________ | _______________ ( ______________________________ * ________________Qf - *''*'*~~~~~~~~~~~~~~~~d ~ = (D

174 Intermediate information level For the intermediate level of knowledge the results of Figs. 6.6 and 6.7 show a "middle of the road" policy. Receiver operation is influenced by both prior knowledge and data. The effect of prior knowledge is most noticeable at the beginning of the observation interval while the effect of the actual data outweighs a priori knowledge at the end of the observation interval. Hence the receiver has obtained a delicate balance between the two extremes of knowledge, and yet its action can be closely allied to the action of a receiver which is designed on the basis of estimation techniques.

175 03 00 0 o c~~C~~~~~~f~ ~II 0 C. l d ffi c I o $-4 0 CI 0r w-. t ) U2 I I3 I 8- * X'0 a.5 Ic x ^ I o Ii 0 I, I, CI o I< (^ I I @Io0 0 * 0 I;o 4 I H ~ o0 I I. II oH0I a) C d I. ~~~~I 1.-~~~ 4 a ) (3 0y-4 0 P^C

176 ~\ o') 0.ld Cd *0 4 O al x 11 A11 0; * II 4I (1) I k A o II I'I.I I Cl >) < 0 I. *'I I Cl) E M, r1s iI I I 4 * I I3, 11 11 11 11 I IU c o ~ a) 0NO 0 &t f I Md Ii I 00 I n-I i i ^ I 0 iI 0 tOI' 0 I ~. Q I. <y >~~~~~~~~~~~C I > <L~~~~~

CHAPTER VII SUMMARY 7. 1 Conclusions One of the major goals of this work was to examine the effect of the particular choice of a priori density on receiver performance for the composite hypothesis detection problem. This study was conducted by constructing a sensitivity index which measures the performance loss that occurs when the receiver is based on the a priori density g () but in fact h(.) holds. The sensitivity index was shown to possess several desirable features. The exact form of the a priori density is never known precisely since the a priori density reflects the subjective viewpoint of the designer. There is a lot of leeway in choosing both the functional form and the parameters of the a priori density. The sensitivity index was employed to show that performance depends primarily on the first few moments of the a priori density. Functional form was important only to the extent that the entire range of parameter definition was modeled. The design of a receiver which accurately reflects and incorporates the information at hand can be accomplished by choosing an a priori density whose mean corresponds to the designer's confidence level in his choice of the mean. 177

178 Comparison of receiver performance for the composite hypothesis situation was done in Chapter IV. The Bayesian philosophy was contrasted with the classical approach for several examples. The Bayesian approach to detection problems is more versatile than the classical approach. The Bayesian approach inherently incorporates all prior information into the model by the assignment of an a priori density. By using a degenerate form of the a priori density (a diffuse a priori density), the Bayesian approach leads to the identical receiver obtained by at least one of the methods of classical statistics for all the examples considered. In particular it appears as though the UMPU test of classical statistics (if it exists) is equivalent to the Bayes test corresponding to a diffuse a priori density for the single composite hypothesis. No such relationship exists for the double composite hypothesis. Emphasis upon unbiasedness by classical statistics is certainly desirable; Bayesian analysis can and often does result in a test which is not unbiased. The nature of the Bayes model should, however, preclude the notion that Bayes tests are "bad" since biased tests seldom (with small probability) occur. The concept of an ESP receiver was useful in comparing Bayes tests. The ESP ROC served as a convenient upper bound ROC to the optimum ROC with respect to the assigned a priori density and thereby served to indicate how much useful information was conveyed by the a priori density. The ESP ROC was shown to be the average of

179 conditionally optimum ROC's whereas the optimum ROC was shown to be the average of conditionally suboptimum ROC's. Hence the ESP and the optimum ROC can be viewed as "capsule" summaries of the performance of the conditionally optimum and suboptimum receivers respectively. Receiver design via numerical integration techniques was proposed in Chapter V for composite hypothesis situations whenever exact analysis was difficult or impossible. It was shown that by using a numerical method which was "matched" to the detection situation, a receiver design could be achieved that was almost optimal (in the Bayesian sense) and yet was readily implemented. The receiver is conceived of as an analog-to-digital converter in conjunction with a digital computer. The availability of high speed electronic components makes it possible to operate such a device in real time. In Chapter VI a study was conducted which attempted to justify estimation as a technique which is almost optimal in the Bayesian sense. The impetus for this study was provided by a recent paper published by Kailath (Ref. 33). For the single composite hypothesis a pseudo-estimator is constructed which is optimal in the Bayesian sense and compared to the minimum mean square error estimator. Simulation of receiver operation for a particular example shows a close correlation between estimation techniques and exact Bayesian analysis and lends credence to the use of estimate-and-plug techniques as "good" techniques.

180 7.2 Contributions A sensitivity index was developed to measure the performance loss that occurs when a receiver is designed to be optimum with respect to the given a priori density g (.), but in fact the a priori density h (.) is considered to hold; the important features of the sensitivity index were discussed. The sensitivity index was used to show that the functional form of the a priori density chosen to model the state of the unknown parameter of a composite hypothesis detection problem has little effect on receiver performance. Of prime importance are the first few moments of the a priori density, especially the mean and variance. The designer should choose the mean to be that value which, in his opinion, most accurately represents the state of the parameter; and the variance should be chosen to reflect the designer's confidence level (how sure is he?) in his choice of the mean. Hence, the functional form of the a priori density can be chosen to be mathematically tractable. Bayesian analysis and classical statistics were compared and reconciled. It was shown that Bayesian analysis was more versatile and that one or more of the classical tests could often be obtained by a Bayes test based on a diffuse a priori density. The Bayesian approach is judged to be superior to the classical approach primarily because the Bayesian approach incorporates all available information into the model via a priori densities; classical procedures ignore such information.

181 The externally sensed parameter (ESP) receiver was reviewed and its ROC was evaluated for several examples never considered previously. Its function as an upper bound ROC and its relationship to the conditionally optimum ROC's was pointed out. The feasibility of receiver design via numerical integration techniques was shown. Conceived of as a digital computer, the receiver is nearly optimal in performance with little increase in complexity. The numerical procedure is especially adept (little or no error) at handling a priori densities which are of a diffuse nature. Hence, receiver design for composite hypothesis situations has been demonstrated to be feasible for many problems previously considered too complex to solve. Estimation techniques for single composite hypothesis decision problems have been shown to correspond closely to optimal Bayesian analysis by the construction of a pseudo-estimator. Conditional to the hypothesis signal and noise this pseudo-estimator was shown to exhibit many features considered desirable of an estimator and to asymptotically approach the MMSE (minimum mean square error) estimator. The close correlation between the pseudo-estimator and the MMSE estimator serves to point out that receiver design via estimation techniques can surely be considered justifiable in case the optimal (Bayesian) procedure is too complex.

APPENDIX A NATURE OF THE SET R0 This appendix will determine the nature of the set R0 where RO= -{ro0r(O) > 0, 0< < o, r01l} (A.1) and P r(O) = r + (1-r0) r di(co)i (A.2) i=1 with d. independent of c Independence of c. If we let = co (A. 3) s(0) = r ) (A.4) then R = rls(0) > 0, 0<0< o, r01)} (A.5) Since s(0) is independent of c, R0 is also independent of c. Range of permissible values. Without loss of generality, let c = 1 so that 182

183 P r(o) = + (1- r) ] di 0 (A. 6) i=l 1 Manipulating Eq. A. 6 we get r(O) = (1-r0) [d0+ r(0 )] (A. 7) where r d = (A.8) 0 1-r0 P -0(0) = di 0 (A.9) i=1 Then r(0) > 0 for 0e[O,x) implies d0 + r 0() > 0 and O<r0 < 1 if d > O or d0 + r(0) > O and r0 > 1 if d < 0 (A.10) The extreme permissible value of r0 can be determined by choosing do to be an extremum. This is easily accomplished by choosing min r0(0) d > 0 0 E [0, o) d* = (A.11) max r0(0) d < 0 6 e [0, xO)

184 Then the extreme permissible value r* is d * r0* - d 0(A. 12) do +1 and R = {r1r0*<r < 1} d > 0 (A. 13) R = ro1 < rO <r d <0 0 {r011 rr0 } dp

APPENDIX B EXTREME VALUES OF r In this appendix we will calculate the extreme value r0* for p= 2 and p = 3 It was shown in Appendix A that d* r0* = ~ (B.1) 0 d t + 1 d0 +1 where m in r0(0) d> 0 p ~ 0e[o, X) d<* = / B.2) max r (0 d < 0 i| ~ E[o0, O)o P r0(O):= d. 1 (B.3) i=l The coefficients d. were derived in Appendix J and are d = (dl, d2,... dp) = r (B.4) where 185

186 Yl Y2 Yp r = (i+j-1)= |2 p2 i (B. 5) yp Wp+l w' * 2p-1 = (Y1' Y2''.. yp)t (B.6) and yi = (b+l)(b+2)... (b+i) (B. 7) p = 2. From Eq. B.4 d1 b1 (B.8) 1 bb+ 1 2 (b+1) (b+2) (B.9) Since d2 < 0, choose d * max () I0 ro(O) (B.10) e [0o, ) where ro(O) = d0 +d202 (B.11) The critical values occur at 0 = 0 and at those values of 0 for which r(6() = 0. But rO(0) = 01 implies that 0 = - with 2d2

187 d d 2 r (2dI) = 42 Hence r0 2d2 4d2 d 2 = 4d2 (B.12) d* and r = = b+2 (B.13) d +1 p =3. FromEq. B.4 d1 b+l (B.14) 3 = (b+l)(b+2) (B.15) 3 (b+l)(b+2)(b+3) (B.16) Since d3 > 0, choose d * min ( O d -e[0,oo) r (Oe (B.17) where r(06) = d16 + d202 + d303 (B.18)

188 -d - d -3d d The critical values occur at 01 = 0, = 3d 12_2 3d3 -d2+ d2 -3d d and at 0 =3d with r0(1 ) = 0, r(0( = b(b+3)+ 2 /b+3 = b(b+3)-2vb+3 0o() (b+1) (b+2) and (3) (b+1) (b+2) d = -min[ro(0l), r0(2), r(3)] 2 b+- b(b+3) -1 < b 1 (b+l) (b+2) ~~~~~= {<;~ ~(B.19) 0 b > and 2 /b-+3- b(b+3) j 2^3- b(b+3) -1 < b< I d 2(b+3 + 1) r* = = (B. 20) d * +1 0 b>1 0 0 b>l

APPENDIX C A TRANSFORMATION FROM n DIMENSIONS TO 2 DIMENSIONS This appendix will consist of two parts. The first part will make a transformation from n dimensions to 2 dimensions and determine the effect of the transformation on the conditional densities. The second part will use the same transformation but determine the effect of the transformation on the absolute densities. I) Transformation with respect to conditional densities This section will show that the transformation n u= 1- 2 x. S (C.1) s i=1 11 n -2 = E X2 - (C.2) S i=1 i= will convert the conditional density n 2 2 2 2 -- 1i f(xl0 ) = e (C.3) to the conditional densities 189

190 0 E u'2 0 E \2 e S f(u'10) = 2S e 2 -x < u' < Oc (C.4) n-1 0 E v2 2 _EsV 2( ) n-e2 ) 2 f(vl0) 2= - Ov <v<xo r (n1) ( - 2 ) (C. 5) where n E = Z s2 (C.6) i=i 1 Proof: If we let y = Ax define an orthonormal transformation (AtA =I) with n s. \ Y = i1 i x. (C.7) then n 0 2 f(yl0) -) e (C.8) If we let yi z. -i 1,2,..., n (C.9) 1 Es

191 then n 0E n QE\ - s zz f(ZIO) = ) e i (C.10) Next, let n u' = Z = E x.s (C.l1) S i=l n. n / in n2 =2, x. - X. S. (C.12) i=2 1 s i=l 1 i=l Since z1 is independent of (z2, Z3,... zn), this implies that u' is independent of v and furthermore u' ~N (O E (C.13) \ s 9E v2 X1 (C. 14) That is, _ H E u'2 (C. 15) n- 0 E v2 0 E9 2 S (2 ) v n-2 e 2 f(vlO) = -- O<v< oo r -) - (C.16) Q.E.D.

192 2) Transformation with respect to absolute densities This section will determine the effect of the transformation of Eqs. C.I1 and C. 2 on the absolute densities with respect to the a priori density h(O) = r() g(O) (C.17) where b+l1 c b -cO g(O) r(b1) 0 e b>-1, c> 0, 0<0 < (C.18) P r(0) = r. 0 (C.19) i=0 From Eq. 3. 40 an equivalent expression for the polynomial r(0) is P r(0) = L f (cO)1 (C.20) i=O with f. independent of c. Average the conditional densities derived in part 1) to obtain f(u',vlh) = f f(u',vl0) h(0) d = f f(u'l0) f(vl0) h(O) dO 0

193 E u'2 Jc 1s n-1 -K 02 e 2 2 n- 2 o E v2 s p e 2 fi(co)i b e- c d i=O -(b+ 1 +n n- 2 v 2 (u +2 +2c) m(u',v) K m v) s -o< u' < x 0<v< o (C.21) with 2(2c/E) r(b+l+n) r(b+- 1+ i=s s/ E s 1 Sr~r(b~l (C.(22) and gi f= r(b+1~+ i) i=0,1,...p (C.24) 1 2

APPENDIX D OPTIMUM RECEIVER OPERATION AND PERFORMANCE FOR SKE + KGN The operation and performance of the optimum receiver for SKE + KGN (signal known exactly in known Gaussian noise) will be derived in this appendix. From Eqs. 4.12 and 4.13 the conditional observation statistics are 1 t -1 f(xla,SN) = (27TIr1 ) e 2 (D.1) n t -1 f(xla,N) = (211) 2 (x as)t - (x-as) (D.2) Receiver operation The conditional likelihood ratio is d f(xla,SN) ay- a 2 2(xla) - = e (D. 3) - f(x a, N) where y = s 1 x (observation vector) (D. 4) t 1 do = s Z s (nominal signal-to-noise (D. 5) ratio) The random variable y is a sufficient statistic of the observation x 194

195 since 2(xla) depends upon x only through y. Subsequent discussion of receiver operation and performance will center about the sufficient statistic y. Since (xla) > 3 implies that n ad0 y > + a> 0 a 2 (D. 6) n3 + 0 ad Y < a 2 a< the optimum receiver operation consists of simply thresholding the sufficient statistic y. Receiver performance In order to determine the ROC, it is necessary to determine the statistics of y. Since y is a linear transformation of multivariate normal random variables, y is also a normal random variable. Hence, we need to determine only its mean and variance conditional to both N and SN for a complete description. We obtain E(ylSN) = E(st S x) = t 1 E(xlSN) = t S1 as = adO (D.7) t -1 t -1 t 1 t Var(yilSN) = Var(s 2 x) = s ~ Var(xlSN)(s Z ) t -1 -1 = sd (D.8) do (D. 8)

196 Likewise, E(ylN) = 0 (since a = 0) (D. 9) Var(y N) = do (D. 10) Hence y2 f(yla,N) e (D. 11) d0 (y- ad0)2 1 2d f(yla,SN) =e (D.12) /27rd0 The performance equations are pCO(A IN) N P~OAs) =' f(yla, dy (D.13) "SN f(xla)> SN1 Utilizing Eq. D. 6 and simplifying yields P (A IN) = O = - (D. 14) \ ( la I2 PCO(AISN) n= i pfn { + 0 la (D.15) \/d lal 2

APPENDIX E SUBOPTIMUM RECEIVER OPERATION AND PERFORMANCE FOR SKE + KGN The operation and performance of a suboptimum receiver for SKE + KGN (signal known exactly in known Gaussian noise) will be derived in this appendix. Receiver operation will be based on the total likelihood ratio for SKEA (signal known except for amplitude) with a normal a priori density on the unknown amplitude. Receiver performance will be evaluated conditional to the actual amplitude a. Receiver operation The conditional likelihood ratio was derived in Appendix D and is d 2 0 ay- a 2 D(xla) = e (E.1) where t 1 y = s 1 x (sufficient statistic of the (E.2) observation) t -1 do = s S s (nominal signal-to-noise (E.3) ratio) From Eq. 4.11 the a priori density on the unknown amplitude is 197

198 (a- m)2 g(a) = 2 (E. 4) Subsequent algebraic manipulation will be simplified if g(a) is expressed in the form Aa2 + Ba g(a) = Ke (E.5) where 1 A =- -(E. 6) 2v B -- (E.7) v m2 2v K = e (E.8) /27rv The parameters v and m can be expressed in terms of A and B as v = 2A (E.9) B m = 2A -(E.10) The total likelihood ratio f (x) is the average of the conditional likelihood ratio f(xla) with respect to the a priori density g(a), i.e.,

199 (x) = f (xla) g(a) da 2 do oc ay-a 2 Aa2+Ba -f e K e da -3C =- K eA'a2+B'a d K / e da -3c K= K (E. 11) where A' = A = - (V +d0) (E.12) 3' = B +y = m +y (E.13) mr'2 2v' K' = -2- (E. 14) -/27v' and 1 -1 -1 V = -2A = (v + d) (E.15) m = - =' (V- + do) (mv- + y) (E.16) 2ASubstituting, Substituting,

200 m 2 m2 (x) = e2V 2 (E.17) V Receiver operation is determined by the nature of the decision region {xl (x)> }. It is easy to verify that (x) t [y+mv. Thus the suboptimum receiver operation consists of thresholding ly + mv 1I In particular, the decision rule is 1 ly y+mv I >A 0(x) = (E. 18) 0 otherwise where A = 2(v 1 + do) [n (13 vlvd0) + l ] (E.19) Receiver performance The statistics of y were already determined in Appendix D and are y2 y 2d f(yla,N) = e (E.20) /2wd0 (y- ad0)2 2d f(yla, SN) = e ~ (E. 21) v21Tdo

201 The performance equations are Cs N N P (AIN) - f (yla,SN) dy (E.22) Iy+mv I > A Performing the integration and simplifying yields Cs p (AIN) = C(X1) + D(2) (E.23) PCS(AISN) = (X1 +a /d) + ( - a i) (E.24) 1 V 2 0 where -X1 - z+mv' -(E.25) 0 v-1 X - m(E.26) 2 Ax

APPENDIX F MONOTONICITY OF THE LIKELIHOOD RATIO FOR A SPECIAL CASE This appendix will show that the total likelihood ratio for SKEA is strictly monotone increasing in the sufficient statistic t 1 t y = s 1 x (F.1) whenever the assumed a priori density of the unknown amplitude has zero probability on the negative axis. From Appendix D the conditional observation statistics of y are y2 _Y - 2d f(yla,N) e (F.2) 0 (y- adO)Z 1 2d0 f(yla,SN) = e (F.3) and the conditional likelihood ratio of y is do 2(yla) = f(yia,sN) - e (F.4) 202

203 The total likelihood ratio f(y) for a single composite hypothesis is the average of the conditional likelihood ratio P(yla) with respect to the assigned a priori density g(a), i.e., d 2 0 3c ay-a 2 L(y) = J e g(a) da (F. 5) -3C To determine whether or not f(y) is a monotone function of its argument, check its derivative a 2 a 0 0 ay - a 2 r(y) = ae g(a) da -30 30 -0 f(yla, SN) -f a (ylSN) g(a) da 30 f(y IN) 1 ff N a f(ylSN) g(aly)da f(y I N) -a = (y)E(aly) (F. 6) Since B(y) is a non-negative function of its argument, the monotonicity of Q(y) depends only upon the posterior mean E (a ly). Furthermore, the posterior mean will be positive provided only that g(a) has zero probability on the negative axis. Therefore, the monotonicity of f (y) is assured whenever g(a) has zero probability on the negative axis.

APPENDIX G A USEFUL TRANSFORMATION OF NORMAL RANDOM VARIABLES This appendix will show that the transformation u'= Y (G.1) v2 = _ (G. 2) where y = s x (G.3) w = x x (G.4) t -1 d = st s (G.5) will convert the conditional density n t -1 f(x 0) = e I - (G. 6) to the conditional densities 204

205 1 -0d u'2 f(u'10) = ( ) e 2 - < u< (G. 7) in-1 1 0 d 2 2(o0) vn-2 e 2 f(vIO) = / n- 0 <v< o (G.8) r n- 1 Proof: If we make the transformation r = Ax (G.9) with A determined such that A Z At = I (G.10) or equivalently At A = 1 (G.11) then n 0 t 2 -rr f(rI0) = ( e - (G.12) Apply the transformation of Appendix C, namely t u' =_-r (G.13) Es

206 2 E t ) (G.14) s s with v = As (G.15) E = t7 7 (G.16) to obtain 0 E u'2 /0 E \2 S f(u'l0) = 2(<) e - <u< 2,ff-^ rr~~/ e (G. 17) n- 1 n-1 9E v2 2 Es) vn- 2 e 2 2 ) f(vl0) = - 0<v< r(2) (G. 18) Since t t t t t -I r = (As) (Ax) = s A Ax = s 1 x= y (G.19) t t t t t -1I r r = (Ax) (Ax) x A A x = x = w (G.20) 7r _r = (As) (As) = s A s = s s = d (G.21) the transformation

207 u' = Y (G.22) d 0 \0 v = d o y (G.23) results in 68 d0 u'2 /0^^ do U22 f(u' 10) 02 ) e 2 -(x < U'< x 2("7r - (G.24) n- 1ED Od 2 0 ~0 n.-n- 2 rF n ) (G. 25) Q.E.D.

APPENDIX H USEFUL INTEGRAL RELATIONS The following integral relations turn out to be useful in analytically evaluating receiver performance. U-3 o v^1 v where T (t) = + — du (H.2) V' N /vI' r ) -o \ V - Coil 0 b \ 2):(ax+b)'(x)dx= (I b (H.3) 208

APPENDIX I NON-EXISTENCE OF AN A PRIORI DENSITY* This appendix will show that there exists no a priori density g(.) such that the solution to the double composite hypothesis detection problem SKE-UNL (signal known exactly, unknown noise level) results in a likelihood ratio which is monotone in the test statistic of the UMPU test. Specifically, it will be shown that there exists no function g () such that n & c - -2 (w-2y+d ) e2 e 2 g(O) d where m (-) is any monotone function of its argument. Proof: Restate the hypothesis in terms of the function h(X) where 0n 0w 00(- -m( 2 n 2 h(w) = 2 e 2 g(0)d (. d2) 0 wheredit i s any monoto meyer for the techniqon of this proof. ~Proof:~209 Restate the hypothesis in terms of the function h(.) where n Ow h(w) - fO~e g(O) dO (I. 2) 0 209

210 Then an equivalent hypothesis is: there exists no function h(.) such that h(w - 2y+do) (.3) ------ v = m h~r-) (I. 3) h (w) m-_w) If we define = Wy (1.4) Vw the equivalent hypothesis becomes: there exists no function h(.) such that h(w- 2t]V+ do) -— _ _ _ _ - ~= m (r) (I. 5) h (w) Thus, it remains to show that the LHS of Eq. I. 5 is, in fact, dependent on both w and. Proof by contradiction: Suppose the LHS of Eq. I. 5 were dependent only on r7. Then, consider the particular case do 1) 1 = w = 1 so that

211 h(w-2r F+ d) h(1) — ~ = h (1) (I. 6) h(w) () and do 2) 2 w = 4 so that h (w - 27 /Vw+d0) h(4 - d ) h (w) h(4) If the LHS of Eq. I. 5 were dependent only on r, then h(4-d0) D -h (4) (I. 8) h (4) But this implies that n / d f02 e20 1 - e ~ g(O)dO = 0 (I.9) 0 Since the integrand is non-negative for all 0 e [0, cc) and not identically zero, this implies that no function h () exists satisfying Eq. I. 5 and hence no function g () exists which reconciles the Bayes test to the UMPU test for the double composite hypothesis problem SKE-UNL.

APPENDIX J EXISTENCE OF A1 AND CALCULATION OF COEFFICIENTS This appendix will show that it is plausible to assume the existence of the matrix A where 1 a2 p A = ( i+ - 2 3 * (J. 1) a a... ap p p+l' 2p-1 and - =b (b+l)(b+2)... (b+k) (J.2) k =k Furthermore, if a (aa2...a) = A a (J. 3) a= (aoa21.. a1) (J 4) -- - ( 0'1'''' **p-1) ( ) then the coefficients a. can be written in the form i a. d. c i= 1, 2,.. p (J. 5) 1 1 with d. independent of c 212

213 Existence of A. Write the moments r. in the form i ei. = ci C (J.6) with yi = (b+1) (b+2)... (b+i) (J.7) Define -1 0 l =l c-2 1(J. 8) 0 c-p Yl Y2 **' Yp Y2 Y3 "' Yp+l r = = 2 3 + (J.9) Yp Yp+l * 2p-l so that the matrix A can be expressed in the form A =c rc (J. 10) and a( can be written as

214 a = c Cy (J.11) where Y = (yl'Y2..p) y(J.12) Since c > 0, the diagonal matrix C is obviously invertible. Hence the matrix A-1 exists provided only that the matrix r-1 -1 exists. The existence of r will be assured provided the determinant of the matrix r is unequal to zero. For p = 2 I r = (b+1) (b+2) 2 0 (J.13) while for p = 3 rl = 2(b+1)3 (b+2)2 (b+3) 0 (J. 14) since b > -1. Computer evaluation of I rI for p > 3 and for specific values of b appeared to indicate the existence of r1 in general and hence the existence of A in general. Calculation of coefficients. If we assume that r is in fact invertible, then a= A 1 c= (cCrC)1 c Cy -1 -1 r C-1 C = (c1 C1) c Cy -1 -1 -1 =C d

215 where d = (d,d2... dp)t = r (J.16) The vector d is independent of c since both r and y are independent of c. In addition 0 c 0 -1 C = (J. 17) 0 cP so that a= C d implies a. = d. c i=1,2,... p (J.18) th d independent of c with d. independent of c 1

APPENDIX K THE TRANSFORMATION z [z(x I g) I h] This appendix will construct a procedure for calculating z [z(x I g) I h]. The transformation of Eqs. 3. 59 and 3.60 has simplified the task considerably; we need only determine z [a(u, v) I h] where (u, v) - z(x I g) + 1\2 2 2c U+ +V + —(b+1+ n in- s (K.1) ( 21) 2 E2c -u +V + 2with f(u, vIh, SN) and f(u,v fh, N) given by Eqs. 3. 61 and 3. 62. Distribution Function Let H denote either the N or SN hypothesis. The distribution function of a(u, v) can be determined via F(t I h, H) P [ao(u, v) < t l h, H] (K. 2) It can readily be shown that 216

217 0 t < -t F(t Ih, H) = P[(u, v) e Rt Ih, H] -t0 < t < t0 (K. 3) 1 t>t0 where E E /E to (b1 + n 1 - 2) (K.4) 0 2 L c 4c\4c (" 2 2 2 Rt = -(u,v) Iv > and (u-a) + v > r if 0 < t < t; v > 0 and 2 2 2 (u-a) +v < r if -t < t < 0; or v > O and (K. 5) u < if t= 0 and 1 1 (K. 6) 2 A 2c r = (K. 7) (A-2 E (a - 1) s t n b+l+j A = e (K.8) For I t < to, the distribution function is

218 r u2(A, v) F(t h,H) = f f(u, v h,H)dudv -t0 < t < 0 o u1(A,v) oc 0 F(tlh, H) = f f f(u,vlh,H)dudv t=0 (K.9) 0 -0 r u2(, v) F(t h, H) = 1 - S2( f(u, v h, H) dudv 0 < t < to 0 u1(A,v) where u(A, v) = a - r2 - v (K. 10) u (A,v) = a + jr2 v2 (K. 11) Density Function The density function can be obtained by differentiating the distribution function. f(t h,H) = - F(tlh,H) (K. 12) We can direct our attention to the interval 0 < t < t; the interval -to < t< 0 can be dealt with in a similar manner. Using the chain rule for differentiation

219 dA d f(t I h, H) = dt * d F(tIh, H) (K. 13) dt' For 0 < t < to, employ Leibnitz's rule to obtain r f(th H dA a Simplifying, r f(t I h,H) = /ff(u2(A, v),vlhH) u(A,v) 0\ + f(ul(A,v),vlh,H) ul(Av) <t <t -~T;2 dv 2 0 < t < to 1 p p1''1 r -v (K. 15) If we make the transformation w = 2( - 1 (K.16) then 1 f(w I H) f(tl h, H) = f dw 0<t< t (K.17) -1 1_ where

220 f0(wI H) = - [f(u (Av),v Ih,H) ~ u (A v) 2(A-1)2(b-+ 2) + f(u2(A,v), vh,H) u2(A,v)] (K. 18) v=r 2 A Gauss-Mehler quadrature routine is ideal for evaluating Eq. K. 17 for specific values of t. An mth order Gauss-Mehler quadrature routine evaluates Eq. K. 17 via f m (2j- <\1 0 <H t < t f(tlh, H) -Jf f0 os 2- H =1 L- -2m (K. 19) This method is exact if f0(w I H) is a polynomial in w of degree < 2m - 1 (Ref. 29, p. 381). An entirely analogous procedure for -tO < t < 0 leads to (thH)m F / \ ] -t < t < 0 f(tlh. H) - _T 2 fos- 1T IoH m j=1i \ 2"m/ ] (K.20) The Transformation z[a(u, v) I h] If we let t = a(u,v) (K. 21) then z(t l h) = n (tl0 <It < to (K. 22)

221 It can readily be shown from symmetry considerations that the transformation z(tlh) is an odd function, i.e., z(t I h) = -z(-t h) 0 < Itl < to (K. 23) and hence from continuity that z(tlh) = 0 t=O (K. 24)

RE FERENCE S 1. W. W. Peterson, T. G. Birdsall and W. C. Fox, "The theory of signal detectability, " IRE Trans. on Information Theory, IT-4, 1954. 2. D. Middleton and D. Van Meter, "Modern statistical approaches to reception in communication theory, " IRE Trans. on Information Theory, IT-4, 1954. 3. U. Grenander, "Stochastic processes and statistical inference," Arkiv det Mat., 1, 1950. 4. T. G. Birdsall, "Application of detection theory in bandlimited Gaussian noise, " Random Processes, Linear Systems and Radar, The University of Michigan Summer Conferences, 1964. 5. L. J. Savage, The Foundations of Statistics, John Wiley and Sons, Inc., New York, 1950. 6. H. Cramer, Mathematical Statistics, Princeton University Press, Princeton, 1951. 7. T. S. Ferguson, Mathematical Statistics, A Decision Theoretic Approach, Academic Press, New York, 1967. 8. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1959. 9. H. Raiffa and R. O. Schlaifer, Applied Statistical Decision Theory, Division of Research, Graduate School of Business Administration, Harvard University, Boston, 1961. 10. D. Blackwell and M. A. Gershick, Theory of Games and Statistical Decisions, Wiley, New York, 1954. 11. C. W. Helstrom, Statistical Theory of Signal Detection, Pergamon Press, New York, 1960. 12. J. C. Hancock, Signal Detection Theory, McGraw-Hill, New York, 1966. 222

223 REFERENCES (Cont.) 13. S. Kullback, Information Theory and Statistics, Wiley, New York, 1959. 14. A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1965. 15. W. G. Davenport and W. L. Root, An Introduction to the Theory of Random Signals and Noise, McGraw-Hill, New York, 1958. 16. H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, John Wiley and Sons, Inc., New York, 1968. 17. N. M. Blachman, Noise and Its Effect on Communication, McGraw-Hill, New York, 1966. 18. J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering, Wiley, New York, 1967. 19. R. A. Roberts, Theory of Signal Detectability: Composite Deferred Decision Theory, Cooley Electronics Laboratory Technical Report No. 161, The University of Michigan, Ann Arbor, March 1965. 20. 0. Ye Antonov, "Optimum detection of signals in non-Gaussian noise —detection of a signal of unknown amplitude and phase, " Radio Engineering and Electronic Physics, vol. 12, May 1967. 21. L. W. Nolte, Adaptive Realizations of Optimum Detectors for Synchronous and Sporadic Recurrent Signals in Noise, Cooley Electronics Laboratory Technical Report No. 163, The University of Michigan, Ann Arbor, Michigan, March 1965. 22. R. L. Spooner, Theory of Signal Detectability: Extension to the Double Composite Hypothesis Situation, Cooley Electronics Laboratory Technical Report No. 192, The University of Michigan, Ann Arbor, Michigan, April 1968. 23. T. G. Birdsall, The Theory of Signal Detectability: ROC Curves and Their Character, Ph.D. dissertation at The University of Michigan, Ann Arbor, Michigan, 1966.

224 REFERENCES (Cont.) 24. W. P. Tanner and T. G. Birdsall, "Definitions of d' and 7 as psychological measures, " The Journal of the Acoustical Society of America, vol. 30, October 1958. 25. J. Spragins, "A note on the iterative application of Bayes' rule," IEEE Trans. on Information Theory, IT-11, October 1965. 26. R. Deutsch, Theory of Estimation, Prentice Hall, Englewood Cliffs, New Jersey, 1965. 27. D. V. Lindley, "On a measure of information provided by an expectation," Annals of Mathematical Statistics, December 1956. 28. A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1966. 29. Z. Kopal, Numerical Analysis, John Wiley and Sons, Inc., New York, 1961o 30o M. Abramowitz and L. A. Stegun (Eds.), Handbook of Mathematical Functions, National Bureau of Standards, AMS 55, Washington, D. C., 1964. 31. T. G. Birdsall, Adaptive Detection Receivers and Reproducing Densities, Cooley Electronics Laboratory Technical Report No. 194, The University of Michigan, Ann Arbor, Michigan, July 1968. 32. R. L. Spooner, "Comparison of receiver performance: the ESP receiver," The Journal of the Acoustical Society of America, vol. 45, no. 1, January 1969. 33. T. Kailath, "A general likelihood-ratio formula for random signals in Gaussian noise," IEEE Trans..on Information Theory, vol. IT-15, no. 3, May 1969. 34. S. Kullback and R. A. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, no. 1, 1951, pp. 7986. 35. H. Jeffreys, Theory of Probability (2nd ed.), Oxford Univ. Press, 1948. 36. A. P. Dempster and M. Schatzoff, "Expected significance level as a sensitivity index for test statistics, " Journal of the American Statistical Association, Vol. 60, June 1965, pp. 420-437.

DISTRIBUTION LIST No. of Copies Office of Naval Research (Code 468) 2 Navy Department Washington, D.C. 20360 Director, Naval Research Laboratory 6 Technical Information Division Washington, D.C. 20360 Director 1 Office of Naval Research Branch Office 1030 East Green Street Pasadena, California 91101 Office of Naval Research 1 San Francisco Annex 1076 Mission Street San Francisco, California 94103 Office of Naval Research 1 New York Annex 207 West 24th Street New York, New York 10011 Director 1 Office of Naval Research Branch Office 219 South Dearborn Street Chicago, Illinois 60604 Commanding Officer 8 Office of Naval Research Branch Office Box 39 FPO New York 09510 Commander, Naval Ordnance Laboratory 1 Acoustics Division White Oak, Silver Spring, Maryland 20910 225

226 DISTRIBUTION LIST (Cont.) No. of Copies Commanding Officer and Director 1 Naval Electronics Laboratory San Diego, California 92152 Commanding Officer and Director 1 Navy Underwater Sound Laboratory Fort Trumball New London, Connecticut 06321 Commanding Officer Naval Air Development Center Johnsville, Warminister, Pennsylvania Commanding Officer and Director 1 David Taylor Model Basin Washington, D.C. 20370 Superintendent 1 Naval Postgraduate School Monterey, California 93940 Attn: Professor L. E. Kinsler Commanding Officer 1 Navy Mine Defense Laboratory Panama City, Florida 32402 Superintendent 1 Naval Academy Annapolis, Maryland 21402 Commander 1 Naval Ordnance Systems Command Code ORD-0302 Navy Department Washington, D.C. 20360

227 DISTRIBUTION LIST (Cont.) No. of Copies Commander 1 Naval Ship Systems Command Code SHIPS-03043 Navy Department Washington, D.C. 20360 Commander 1 Naval Ship Systems Command Code SHIPS-1630 Navy Department Washington, D.C. 20360 Chief Scientist 1 Navy Underwater Sound Reference Division Post Office Box 8337 Orlando, Florida 38200 Defense Documentation Center 20 Cameron Station Alexandria, Virginia Dr. Melvin J. Jacobson 1 Rensselaer Polytechnic Institute Troy, New York 12181 Dr. Charles Stutt 1 General Electric Company P.O. Box 1088 Schenectady, New York 12301 Dr. J. V. Bouyoucos 1 General Dynamics/Electronics 1400 N. Goodman Street P.O. Box 226 Rochester, New York 14609 Mr. J. Bernstein 1 EDO Corporation College Point, New York 11356

228 DISTRIBUTION LIST (Cont.) No. of Copies Dr. T. G. Birdsall 1 Cooley Electronics Laboratory The University of Michigan Ann Arbor, Michigan 48105 Dr. John Steinberg Institute of Marine Science The University of Miami Miami, Florida 33149 Dr. R. A. Roberts 1 Department of Electrical Engineering University of Colorado Boulder, Colorado Commander 1 Naval Ordnance Test Station Pasadena Annex 3203 E. Foothill Boulevard Pasadena, California 91107 Dr. Stephen Wolff 1 John Hopkins University Baltimore, Maryland 21218 Dr. M. A. Basin Litton Industries 8000 Woodley Avenue Van Nuys, California 91409 Dr. Albert Nuttall 1 Litton Systems, Inc. 335 Bear Hill Road Waltham, Massachusetts 02154 Dr. Philip Stocklin 1 Box 360 Raytheon Company Newport, Rhode Island 02841

229 DISTRIBUTION LIST (Cont.) No. of Copies Dr. H. W. Marsh 1 Raytheon Company P.O. Box 128 New London, Connecticut 06321 Mr. Ken Preston 1 Perkin-Elmer Corporation Electro-Optical Division Norwalk, Connecticut 06852 Mr. Tom Barnard 1 Texas Instruments Incorporated 100 Exchange Park North Dallas, Texas 75222 Dr. John Swets 1 Bolt, Beranek and Newman 50 Moulton Street Cambridge 38, Massachusetts Dr. H. S. Hayre 1 The University of Houston Cullen Boulevard Houston, Texas 77004 Dr. Robert R. Brockhurst 1 Woods Hole Oceanographic Institute Woods Hole, Massachusetts Cooley Electronics Laboratory 50 The University of Michigan Ann Arbor, Michigan 48105 Director 1 Office of Naval Research Branch Office 495 Summer Street Boston, Massachusetts 02210

230 DISTRIBUTION LIST (Cont.) No. of Copies Dr. L. W. Nolte 2 Department of Electrical Engineering Duke University Durham, North Carolina Mr. F. Briggson 1 Office of Naval Research Representative 121 Cooley Building The University of Michigan Ann Arbor, Michigan 48105 Dr. Ronald Spooner 1 Bolt, Beranek and Newman, Inc. 1501 Wilson Boulevard Arlington, Virginia 22209 Computer Science Library University of Pittsburgh Pittsburgh, Pennsylvania 15213

Security Classification DOCUMENT CONTROL DATA - R & D (Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified) 1. ORIGINA TING ACTIVITY (Corporate author) 2a. REPORT SECURITY CLASSIFICATION Cooley Electronics Laboratory Unclassified The University of Michigan 2b. GROUP Ann Arbor, Michigan 3. REPORT TITLE The Theory of Signal Detectability: Bayesian Philosophy, Classical Statistics, and the Composite Hypothesis 4. DESCRIPTIVE NOTES (Type of report and.inclusive dates) Technical Report No. 200 03674-22-T February 1970 5. AUTHOR(S) (First name, middle initial, last name) Jaarsma, David 6. REPORT DATE 7a. TOTAL NO. OF PAGES 7b. NO. OF REFS February 1970 252 36 8a. CONTRACT OR GRANT NO. 9a. ORIGINATOR'S REPORT NUMBER(S) Nonr-1224(36) b. PROJECT NO. 03674-22-T NR187-200 c. 9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report),_________d. X___ TR No. 200 10. DISTRIBUTION STATEMENT Reproduction in whole or in part is permitted for any purpose of the U. S. Government. 11. SUPPLEMENTARY NOTES 12. SPONSORING MILITARY ACTIVITY Office of Naval Research Department of the Navy W__ashington, D. C. 20360 13. ASTRACT Receiver design and performance from a Bayesian viewpoint depend upon a priori specification whenever unknown parameters are encountered in the detection situation; any available information is expressed in the form of an a priori density. A sensitivity index is developed which measures the performance loss that occurs when the receiver is designed to be optimal with respect to the given a priori density g(.) but operates in an environment in which the a priori density h(-) is considered to hold. A comparison of receiver performance is made for the composite hypothesi situation. The Bayesian approach is contrasted to the classical approach. Initial indications were that classical statistics could be closely linked to Bayesian philosophy since analysis according to either mode often led to the same receiver. It appeared that many of the classical tests could be generated from a Bayesian viewpoint by an appropriate assignment of the a priori density. Investigation revealed that this was not true in general; and the conclusion is drawn that the Bayesian approach is uniquely distinct from the classical approach. The externally sensed parameter receiver is reviewed and its receiver operating characteristic is evaluated for several examples not considered before. Receiver design via numerical integration techniques is demonstrated to be feasible for composite hypothesis situations previously considered too complex to solve. Receiver design via estimation techniques is considered justifiable in case optimal procedures are too complex. DD,FORM 1473 (PAGE 1 ) S/N 01 01 -807-681 1 Security Classification.-308 A- 31408

Security Classification 14 LINK A LINK B LINK C KEY WORDS ROLE WT ROLE WT ROLE WT Signal detection Sensitivity Composite hypothesis Receiver A priori density, iDD FORM. 1473 ( BACK)_ S/N 0101-807-6821 Security Cla'ssification A-31409