Technical Report No. 177
036040-16-T

THE THEORY OF SIGNAL DETECTABILITY: ROC CURVES AND THEIR CHARACTER

by Theodore G. Birdsall

COOLEY ELECTRONICS LABORATORY
Department of Electrical and Computer Engineering
The University of Michigan
Ann Arbor, Michigan

for Contract No. N00014-67-A-0181-0032
Office of Naval Research
Department of the Navy
Arlington, Virginia 22217

January 1973

Approved for public release; distribution unlimited

© Theodore Gerald Birdsall. All Rights Reserved

ABSTRACT

The first problem in the theory of signal detectability deals with the decision between two alternative responses, corresponding to two possible classes of causes of an observation. When the goal of a decision process is to achieve the highest quality of terminal decision, the Receiver Operating Characteristic curve (ROC curve) contains all of the information necessary for the evaluation of the decision process. This present work introduces the ROC character, which is isomorphic to the ROC curve. The formal development is based on two key facts. The first is the fundamental theorem: if $\ell(x)$ is the likelihood ratio of an observation $x$, then the likelihood ratio of $\ell$ is $\ell$ itself. The second is the main theorem on ROC characters: each ROC character is isomorphic to a univariate probability distribution that possesses a moment generating function. The character convolution theorem and the character addition theorem follow directly from these. Families of ROC curves are developed from the main theorem on ROC characters. The normal, binormal, Q-table, power, and several discrete families of ROC curves have appeared in the literature. The new families include the Pearson type III, Fisher-Tippett doubly exponential, H-type, Poisson, and the regular conics.

Additional families are generated from these by use of the metastatic relation, and the convolution and addition theorems. ROC curves contain information about the performance of other two-cause decisions besides the two-response decision. Several are considered that are used in the testing of human perception; namely, the symmetric forced choice decision, type II decisions, the rating scale procedure, and the analysis of a decision based on the reports of multiple observers.

FOREWORD This report contains foundational work in signal detection theory. The primary purpose of the work was to provide a technique for cataloging performance curves of detection-decision devices, and to expound upon properties of specific performance curves. Its usefulness lies in the exhaustive detailing, which the author hopes will provide detection theory researchers with both insight and detailed information about performance curves they encounter.

ACKNOWLEDGMENTS

The author wishes to thank the members of his committee: Professor Hok, who was especially helpful in the initial development of the work; Professor Macnee and Professor Peterson, who have aided and encouraged this and similar work over a number of years, and who patiently worked with the several drafts of this dissertation; and Professor Tanner and Professor Pollack, who have given guidance and example in psychophysical research extending over a number of years. Special thanks are also due to Professors R. V. Churchill, H. H. Goode, and A. H. Copeland, Sr., for encouragement in continuing study and research in applied mathematics. This present study is based principally on the work of W. W. Peterson, W. P. Tanner, Jr., J. A. Swets, D. M. Green, and J. P. Egan, and the experience gained while working with these men. The author is indebted to the U. S. Army Signal Corps, Countermeasures Branch, for support of research in signal detectability from 1952 to 1957, and to the U. S. Navy Office of Naval Research, Acoustics Programs, for continuing support of research in extending the theory of signal detectability since 1960, including the research and writing of this thesis.

TABLE OF CONTENTS

ABSTRACT
FOREWORD
ACKNOWLEDGMENTS
LIST OF ILLUSTRATIONS
LIST OF TABLES
LIST OF SYMBOLS
LIST OF APPENDIXES

CHAPTER I: INTRODUCTION
  1.1 Review of Detection Theory
  1.2 Fundamental Theorem
    1.2.1 Likelihood Ratio
    1.2.2 Induced Measures
    1.2.3 Section Summary
  1.3 Initiation of Present Study

CHAPTER II: ROC CHARACTER
  2.1 ROC Curve
  2.2 ROC Models
    2.2.1 One ROC Curve, Many Models
    2.2.2 A Unique Model for Each ROC Curve
  2.3 ROC Character, $\pi(z)$
  2.4 Remarks on the Distribution of z

CHAPTER III: ROC FAMILIES
  3.1 Philosophical Orientation
  3.2 Fundamental Theorem on Regular ROC Characters
  3.3 Classical ROC Characters
    3.3.1 Normal, Gaussian, Error Function
    3.3.2 Exponential, Pearson X
    3.3.3 Laplace, Double Exponential
    3.3.4 Pearson Type III, Gamma, Chi-Square
    3.3.5 Fisher-Tippett Type I
    3.3.6 Beta, Pearson I and XII
    3.3.7 Rectangular
    3.3.8 Geometric
    3.3.9 Binomial
    3.3.10 Specific Binomial
    3.3.11 Negative Binomial, Pascal, Pólya
    3.3.12 Poisson
  3.4 Truncated ROC Characters and the Metastatic Transformation
    3.4.1 The Transformation
    3.4.2 Self-Metastatic Families
  3.5 Nonsingular, Nonregular ROC
  3.6 Summary of Chapters I, II and III

CHAPTER IV: TRADITIONAL ROC FAMILIES
  4.1 Normal ROC
  4.2 "Case II", Detection of a Sine Wave in Added Normal Noise
  4.3 Noise-in-Noise, Same Spectrum
  4.4 Signal One of M Orthogonal Signals

CHAPTER V: ALGEBRAIC ROC
  5.1 Power ROC Curve: $X = Y^A$
  5.2 Conic ROC
    5.2.1 From Curve to Character
    5.2.2 Conic ROC Curves
  5.3 Lor-Lor Graph Paper
  5.4 Choosing a Fitting Conic

CHAPTER VI: CHARACTER FITTING BY TRUNCATED NORMAL CHARACTER
  6.1 Metastatic Normal
  6.2 First Procedure
  6.3 Second Procedure
  6.4 Third Procedure
  6.5 Summary

CHAPTER VII: BINORMAL ROC
  7.1 General Development
  7.2 Formal Development
  7.3 External Rectification
  7.4 Internal Rectification
  7.5 Discussion

CHAPTER VIII: ROC FOR MULTIPLE OBSERVATIONS
  8.1 Convolution Theorem
  8.2 Discrete Characters
  8.3 Fisher-Tippett, Doubly Exponential Character

CHAPTER IX: ADDITION OF ROC CHARACTERS
  9.1 Character Addition Theorem
  9.2 Pearson III ROC Character
  9.3 H-Type ROC Character
  9.4 Discussion on Approximations

CHAPTER X: SELECTED TOPICS
  10.1 Character-Free Measures of ROC Quality
    10.1.1 D. M. Green's Theorem
    10.1.2 W. W. Peterson's Theorem
    10.1.3 Comparison of Quality Measures
  10.2 A Special Case of Type-II ROC
  10.3 The Multiple Observer
  10.4 Rating Scale Pay-Off Matrices
  10.5 Summary of Chapters IV-X

REFERENCES
DISTRIBUTION LIST

LIST OF ILLUSTRATIONS

Figure  Title
1.1   Blank ROC
1.2   The "chance diagonal"
1.3   The "convex closure" of two points
1.4   Existence of midpoint symmetry
1.5   Convex closure of a point, its opposite, and chance
1.6   Sketch of spaces and transformations
2.1   Illustration of complex completion of ROC
2.2   Illustration of tangent cone at one point of an ROC curve
2.3   Illustrating singularity for complete convex ROC curves
2.4   Arbitrary decision axis, c
2.5   Sketch of arc of ROC corresponding to a z value [3 of positive probability
3.1   A particular ROC curve
3.2   Example of metastatic transformation
3.3   Replot of Fig. 3.2
3.4   Second example of metastatic transformation
3.5   A complete convex nonsingular ROC
3.6   Traditional threshold ROC
3.7   Green ROC
4.1   Normal ROC
4.2   Normal ROC on normal-normal paper
4.3   Normal receiver operating characteristics
4.4   Four normal ROC characters
4.5   ROC based on Q tables
4.6   Slope and quality values relating Q-table ROC to normal and binormal
4.7   Q-table ROC characters
4.8   A particular ROC curve
4.9   ROC curves for M orthogonal normal signals
4.10  ROC characters for M orthogonal normal signals
4.11  M orthogonal normal signals (log scale)
4.12  ROC character
4.13  M orthogonal normal signals, ROC for d1 = 4
5.1   Power ROC curves on linear graph paper
5.2   Power ROC curves on log-log paper
5.3   Power ROC curves on normal-normal paper
5.4   Pure power A vs. negative diagonal coordinates (X, Y)
5.5   Pure power A vs. negative diagonal normal $\sqrt{d_e}$
5.6   Symmetric conics, $.5 < \ell < 2$, $|z| < .693$
5.7   Symmetric conics, $.2 < \ell < 5$, $|z| < 1.61$
5.8   Symmetric conics, $.1 < \ell < 10$, $|z| < 2.30$
5.9   Symmetric conics, $.05 < \ell < 20$, $|z| < 3.00$
5.10  Symmetric conics, $.02 < \ell < 50$, $|z| < 3.91$
5.11  Symmetric conics, $.0101 < \ell < 99$, $|z| < 4.59$
5.12  Symmetric conics, unbounded $\ell$, $|z| < \infty$
5.13  ROC and character for three hyperbolas
5.14  Three hyperbolas compared to normal
5.15  Comparison of curves and characters for a conic and a normal ROC
5.16  Comparison of lor and log scales
5.17  Comparison of lor, normal and log scales
5.18  Lor-lor paper with lines of equal M'
5.19  Normal ROC curves on lor-lor paper
5.20  Five ROC curves with common terminal slope
5.21  ROC characters, $-2.3 < z < +2.3$
5.22  Four ROC curves with common terminal slope
5.23  Five asymmetric ROC curves with common terminal slope
5.24  ROC characters, $-2.3 < z < +4.6$
5.25  Five asymmetric ROC with common terminal slope on lor-lor paper
5.26  Power ROC on lor-lor paper
6.1   Sketch of two z axes
6.2   ROC curves showing Normal ROC and its metastatic image
6.3   Sketch of probabilities on original axis
6.4   Comparison of third and fourth examples with true curve
6.5   Comparison of characters of third and fourth examples
6.6   Sketch of probabilities on original axis
7.1   A fan of binomial ROC curves
7.2   Swets' first curves
7.3   Swets' curves on normal-normal paper
7.4   A binormal ROC curve
7.5   Binormal ROC curve, q = 1, s = .5
7.6   Binormal ROC, q = 2, s = .5
7.7   Five binormal characters, external rectification
7.8   Binormal ROC with slope s = .8, q = 1
7.9   Externally rectified binormal ROC curves
7.10  ROC character for internally rectified binormal, q = 1, s = .8
7.11  Detail of ROC character, q = 2, s = .8
7.12  Binormal ROC curves, internally rectified
7.13  Example of internal rectification of binormal ROC curves
7.14  A fan of binormal ROC curves
7.15  Binormal ROC, s = .5, showing two types of rectification
8.1   (a) A Luce ROC, (b) A Green double threshold ROC
8.2   Initial point power law
8.3   Sketch for Luce ROC character
8.4   Fisher-Tippett ROC curves
8.5   ROC curves of Eq. 8.5
9.1   ROC curves, Pearson III, p = -0.5, z0 = -.5 ln B
9.2   Pearson III, p = -.5, ROC curves on erf-erf paper
9.3   H-Type ROC character, Z scales
9.4   H-Type ROC, n = 1
10.1  ROC characters for four multiple observers
10.2  Multiple observer ROC curve
10.3  Two multiple observer ROC characters
10.4  Multiple observer ROC curve
10.5  Fixed pay-off and C-cut matrix
10.6  Pay-off matrix for (-rating scale
10.7  Base pay-off matrix for general rating scale
10.8  Formation of complete pay-off matrix from base matrix
10.9  Rating forcing functions
C.1   Comparison of intermediate calculation functions
C.2   ROC character, Case II, a = 2
C.3   Graphic determination of a power law relation
E.1   Simplified flow diagram for a conic ROC program
F.1   Flow diagram for external rectification

LIST OF TABLES

Table  Title
4.1   Numbers relating Q-Table ROC to normal
5.1   Canonical conic equations
5.2   Symmetric conics, unbounded z
5.3   Symmetric conics, bounded z
5.4   Collected equations
5.5   Table of parameters
5.6   Table of coefficients
6.1   Comparison of true and approximate ROC curves
7.1   Special values of d'
8.1   Limiting binomial examples
8.2   Parameters for X = Q(X VN)' y = Q(X 21SN)
9.1   Special integrals
10.1  Exact formulae
10.2  Low detectability limiting formulae
10.3  High detectability limiting formulae
10.4  Luce zA for selected points
C.1   Calculated numbers

LIST OF SYMBOLS The following letters are used repeatedly to denote constants or parameters: A, B, C, D, E, K, a, b, c, k, a,, Y, The following is a list of special symbols: Symbol Page Definition "A" 5 the "alarrri'response, correct for condition SN "A", "Ah 15 "A" due to decision devices g, h, et - h A0 13 a designated set of N-measure zero A a set: index of pure power curves; "A" 44 19 a a-ring on the space X, Chapter I AD 20 a difference set a, e, 46 read "almost everywhere" "B" 5 the "background" response, correct for condition N B a set; "B"; an index b a special constant (or variable) in Chapter III J6 22 a au ring on the space Y, Chapter I C a set: the event "correct"; a chance point a a-ring on the space W, Chapter I D(z) Eq. 6. 22 a function ised only in (;7Tapter VI d(x) 13 a J't aSl ua Ua (flli)(S...;:$ 0. ~iaclcc ioVa a(a Xt-: (1(0 s i0l;k:- ii-n

LIST OF SYMBOLS (Cont.) Symbol Page Definition d 72 quality index, normal ROC da 288 a parameter approximately equal to d A d a parameter approximately equal to d dv Eq. 5. 68 a quality index for an ROC point, based on normal scale dwwp Eq. 10. 29 W. W. Peterson's approximate quality index V d 143 a quality index for an ROC negative diagonal point E(u Iv) expected value of u, conditional to v F(u Iv) probability distribution function of u, conditional to v f(ulv) probability density function of u, conditional to v f (u) 56 root product density function of u g(x, (3, r) Eq. 1. 6 a decision function based on likelihood ratio h(x) a designated decision function i(u) 25 the identity map i(u) = u iff read "if and only if' J Jacobian of transformation J(1:0) Eqo 10.27 the "divergence between hypotheses" K Eq. 5. 43 a specific constant for conic ROC characters k Eq. 10. 61 a specific scale factor

LIST OF SYMBOLS (Cont. ) Symbol Page Definition K1' K2 Eq. 10. 80 arbitrary scale factors KN KSN Eq. 6. 1 specific constants in the metastatic normal ROC W(x) Eq. 1. 15 likelihood ratio of the observation x a likelihood ratio L(u) Eq. 5. 66 the log-odds-ratio of u, the LOR scale M the integer multiplicity of the number of signals; the maximum of f M'v Eq. 5. 70 a quality index for an ROC point, based on LOR scale MF(O) Eq. 3. 15 the moment generating function of the distribution F m the minimum of f; a parameter N a cause class; the noise power P(u'v) the probability of u, conditionalto v Pr, Prob probability p("A"'x) Eq. 1. 1 abstract expression of a decision device PM( ) Eq. 5. 13 principal choice probability in a forced choice experiment pQ Eq. 1.7 a special formula q Eq. 7. 2 quality index, binormal ROC R the reals; index for rectangular ROC character; a scale factor for pay-off matrices; the condition "right" for Type II ROC curves xixJ

LIST OF SYMBOLS (Cont. ) Symbol Page Definition r a randomizing decision value r(u), ro(u) rating scale forcing function, Chapter X r r 215 ra' rb constants related to metastaticnormal 208 S signal power s Eq. 7. 2 binormal ROC slope parameter SM( ) Eq4 5. 26 second choice probability in a forced choice experiment SN a cause class S.N 122 the j-th subset of SN J (snr)2 Eq. 10. 24 signal-to-noise ratio based on f (snr) Eq. 10. 28 signal-to-noise ratio based on z T 234 point of tangency of ROC curve and tangent inferior t a real random variable tN, tSN 319 values of t for the indicated cause to Eq. 7. 10 value of t at z0 tT 239 value of t at the point of tangency u, u. random variables u Eq. 10. 61 observer's optimum rating scale report conditional to indicated decision V V Fig. 10. 5 say-off matrix values VM VCR M' ~ ~ ~ ~ o CR

LIST OF SYMBOLS (Cont. ) Symbol Page Definition VA({), VBM() expected values conditional to designated V value V Eq. 8. 18 ROC vertex n, k W a real subspace; the condition "wrong" for Type II ROC curves Wi an ROC point X, ~ an observation space X, x an observation X, x the ROC abscissa, the false alarm probability P("A" IN) (x(f3, r), y(3, r)) 49 ROC point corresponding to g(x, A, r) Y 22 a space Y, y the ROC ordinate, the probability of detection P("A" ISN) Y(X), y(x) formula for an ROC curve Z, z natural logarithm of the likelihood ratio, i.e.,In l Z0, z0 minimum z; mode z z, Zb Eq. 6. 2 maximum and minimum values of z, Chapter VI Zc' d Eq. 6.45 z values at f(z N) modes ZA, ZB Eq. 10.45 z conditional to indicated response ZT Eq. 7.14 z values at point of tangency T z 381 a cut value of z 1g 23 the characteristic function of the set B B~~~~~~~X

LIST OF SYMBOLS (Cont0 ) Symbol Page Definition 2WT 117 the number of independent variables a cut in a decision axis, especially the P-axis r Eq. 5. 43 a conic ROC constant Eq. 5. 43 the conic ROC discriminant h( I ) 27 an induced pair of measures 1u( I ) Eq. 1. 2 the observation statistics; a pair of measures v( I ) 23 an induced pair of measures 5, ~ (x) 39 a decision axis pi (z) 55 the integrated ROC character mg(z) 53 the ROC character X T(z) the exponential ROC character >MN(Z) Eq. 6.8 the metastatic normal ROC character IN(Z) the normal ROC character 7T R(Z) the rectangular ROC character p 75 a Pearson III parameter r2 variance 407 (T) Eq. 10. 14 the autocorrelation function of the ROC character 7T7T c( I ) 46 a discrete probability pair; jumps in a pair of distribution functions co (Z) 54 the ROC character jump 7 A (xii

LIST OF SYMBOLS (Cont.)

The following functions are treated as well-known.

$\ln x$    natural logarithm
$e^x$    exponential
$\sinh x$, $\cosh x$, $\tanh x$, $\operatorname{sech} x$    hyperbolic trig
$I_0(x)$, $I_k(x)$    modified Bessel functions of the first kind
$P(a, t)$    incomplete gamma function
$\Gamma(x)$    complete gamma function
$\Phi(x)$    normal distribution function
$\phi(x)$    normal density function
$P(\chi^2 \mid \nu)$    chi-square distribution function with $\nu$ degrees of freedom
$Q(\chi^2 \mid \nu)$    $1 - P(\chi^2 \mid \nu)$
$Q(\alpha, \beta)$    Marcum's Q-function

LIST OF APPENDICES

APPENDIX A: TWO THEOREMS ON ROC CURVES AND LIKELIHOOD RATIO
APPENDIX B: A LOWER TRUNCATED SIMPLE EXPONENTIAL IS LOWER METASTATIC INVARIANT
APPENDIX C: APPROXIMATION TO A CASE II CHARACTER
APPENDIX D: DETAILS OF AN APPROXIMATION TECHNIQUE FOR SECTION 4.3
APPENDIX E: A CONIC ROC COMPUTER PROGRAM
APPENDIX F: BINORMAL COMPUTER PROGRAMS
APPENDIX G: MANIPULATION OF BINOMIAL SUMS

CHAPTER I

INTRODUCTION

The theory of signal detectability has grown up in an interdisciplinary way, combining parts of the traditional fields of mathematics, physics, engineering, physiology, and psychology. The skeleton on which the theory is built is the mathematics of decision theory. The analytic description of the observation and perception, the interferences, and the cause-effect relationships are drawn chiefly from physics and physiology. The subsequent realization of decision mechanisms is part of engineering when the realization is in hardware, part of psychology when one is dealing with a human or animal decision mechanism, and falls to other specialties when the realization is a computer program or a finite automaton. The evaluation of the performance of the decision mechanism has similarly fallen to various fields, depending upon the application.

This present work concentrates on models for the third aspect, evaluation of detectability. Consideration is restricted to those situations where decision quality is the sole goal of the decision device and the sole consideration in the evaluation. In the past, evaluation models have come principally from problems in electrical engineering, yet the theory is applicable to any information process involving both observation and subsequent (binary) decision. Today's applications involve such diverse areas as radar, sonar, and digital communications;

the psychophysics of human and animal perception; information retrieval and medical diagnoses by machines.

One major purpose of this present work is to furnish a variety of evaluation models. A second major purpose is to develop sufficient mathematical structure to form a basis for classification of these models, and to form a basis for the generation of additional models. It is hoped that this work will be useful to those making mathematical analyses of detection devices by specifying the parameters that must be determined. It is hoped that it will be useful to experimenters with performance data, helping them classify the data and indicating appropriate models. For the mathematically inclined this paper develops a special L2 function called the ROC character, which is one-to-one with the graphical presentation, the ROC curve. I hope those in communications sciences with a broad interest in observation-decision procedures will be able to learn what information an ROC curve furnishes, and what information it omits.

We shall concentrate on the simplest of all observation-decision problems -- the two-by-two problem. This work is directly applicable to the two-by-E (rating) problem; however, no generalization to finite higher order decision problems will be made. In the two-by-two problem, it is assumed that two alternative decisions may be made, corresponding to two mutually exhaustive and exclusive classes of causes. In order to make any sensible decision a given cause (such as a specific signal) must belong to one class or the other, and not transfer back and forth. This is the reason for the word "exclusive." The assumption "exhaustive" means that no other cause than those listed in the two classes can occur.

This present work will deal principally with three subjects: (1) the ROC curve, (2) the likelihood ratio and related functions, and (3) a new function, $\pi(z)$, called the ROC character. In this first chapter the basic relations between the ROC curve and the likelihood ratio will be reviewed and the proof given for the fundamental theorem: "The likelihood ratio of the likelihood ratio is the likelihood ratio." This theorem has been known and used by several specialists in the field, but apparently has not previously been rigorously established. In Chapter II the ROC character will be formally introduced and it will be shown that it contains information that is unavailable in the likelihood ratio itself. Further, it will be shown that the likelihood ratio and the ROC character contain the two complementary and separate aspects of detection; namely, that likelihood ratio is relevant to the design of the decision device and that the ROC character is relevant to its performance. In Chapter III the ROC character will be used to establish a basis of classification of types of ROC curves. The primary analytic result in Chapter III is that the class of all ROC characters corresponds to the class of probability functions that are sufficiently smooth and concentrated to have moment generating functions. This correspondence yields the set of classical ROC curves. Most of the ROC models available heretofore have come from radar detection problems. In Chapter IV these few are collected and analyzed in terms of their ROC character. Chapters V, VI, and VII

each treat at length one broad class of ROC: the algebraic, the truncated normal, and the binormal. Chapter VIII develops the Convolution Theorem for ROC characters, and the Fisher-Tippett ROC character, both arising in cases of multiple observations. Chapter IX presents the ROC character Addition Theorem, and two double parameter characters, the Pearson Type III and the H-type. Chapter X is devoted to discussion of selected topics that have arisen in the literature.

1.1 Review of Detection Theory

The procedure that will be followed in this section has two gross steps. First, the graph called the receiver operating characteristic (ROC) will be introduced and used to evaluate detection performance. Secondly, a specific function of the observation called the "likelihood ratio" will be introduced and it will be explained why optimum detection is based on likelihood ratio. There are many proofs that the likelihood ratio for detection problems is optimum. The most common proofs are based on specific definitions of optimum, such as "maximum expected value," "minimum loss," or "Neyman-Pearson test." The proof used here has been chosen to emphasize a point that is often stated but seldom appreciated -- that the optimality of likelihood ratio does not require that any specific quantity be maximized or minimized. All that is required is that decision quality is the goal and that the definition of decision quality reflects a preference for correct decisions over mistakes.

At this point, we must begin to introduce formal notation in order to be specific, and to be able to manipulate mathematically certain rather simple quantities. Many types of notation might be used, all of which appear in decision theory and detection theory. One might choose to emphasize the generality of the work by using abstract notation. I have chosen to utilize the notation common in radar and sonar signal detection, in the hope that the mnemonic benefits gained will overshadow the apparent loss of generality. Specifically, the two classes of causes will be signified by N (noise alone) and SN (there is a signal in the noise). The two alternative decisions that the decision device can make are to sound the alarm ("A", the decision that a signal was present) or to conclude that only background was present, the decision "B". When the cause is noise alone, N, the decisions correspond to a false alarm and a correct response, respectively; when the cause is SN the decisions correspond to a detection and a miss, respectively.

The formal description of a decision device is a probability function. For each possible observation the description is the probability that the device will respond "A". One might feel that a description of a decision device would correspond to a listing of those observations for which the decision device would respond "A". (On the remainder of the observations the decision device would respond "B".) Indeed, this has been the writer's standard approach in electronic problems (Ref. 1, p. 174). Although this description is quite adequate for a great many purposes, it does not cover those devices which may base their decision in part on extraneous quantities: for example, devices whose decisions are

affected by internal noise or devices with some nonstationary behavior in the decision process (such as a jittering decision threshold). The decision probability function is written $p(\text{"A"} \mid x)$, where x is a general notation for the observation on which the device bases its decision.

In Fig. 1.1 the coordinate systems for the ROC are drawn on a blank unit square. The horizontal axis indicated on the bottom is called the probability of false alarm. It represents the probability of a mistake, the probability of making a decision "A" when the correct decision would have been "B." The probability of the corresponding correct decision is shown on the horizontal axis above the ROC and is simply one minus the probability of a false alarm. (The device is forced to make either an "A" or "B" decision.) The left vertical scale is the probability of making the "A" decision when it is correct, and is called the probability of detection. Had the alternative decision "B" been made, the action would have been labeled a miss, and its probability is simply one minus the probability of detection. This probability is indicated on the right-hand side of the graph on the vertical axis.

Fig. 1.1. Blank ROC (bottom axis: $P(\text{"A"} \mid N)$ from 0 to 1; top axis: $P(\text{"B"} \mid N)$ from 1 to 0)

Traditionally, one suppresses the upper and right-hand coordinates, listing only the probabilities of false alarm and detection, since the other two probabilities can be obtained readily from these. The term ROC first appeared in the paper by Peterson, Birdsall, and Fox (Ref. 1). Curves drawn on the ROC grid have also been referred to as contours of iso-effectiveness, or contours of equivalent performance. The traditional form of the curve that appears in statistics and some work in signal detection plots the probability of a miss on the vertical axis so that the ROC is the mirror image of the form used here. In statistics such a curve is called the power curve, where the probability of false alarm is called the critical level and the power is one minus the probability of a miss, that is, the probability of detection.

Our formalism for a decision device is the probability function describing the probability of deciding "A" for each observation x on which a decision is based.

Decision Device:
$$p(\text{"A"} \mid x) \tag{1.1}$$

Our formalism for the observation statistics, describing a specific problem, is the pair of measures of the probability of occurrence of the observation x, one for each of the cause classes.

Observation Statistics:
$$\mu(x \mid SN), \quad \mu(x \mid N) \tag{1.2}$$
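The two formalisms of Eqs. 1.1 and 1.2 can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the observation x is taken to be a single real number, the threshold jitter is assumed normal, and the observation statistics are assumed to be unit-variance normal densities. None of these choices, nor the function names and parameter values, come from the report.

```python
import math

def normal_cdf(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Decision devices, Eq. 1.1: each maps an observation x to p("A" | x).
def p_alarm_fixed(x, c=0.5):
    """Fixed threshold c (assumed value): respond "A" exactly when x > c."""
    return 1.0 if x > c else 0.0

def p_alarm_jitter(x, c=0.5, sigma=0.25):
    """Threshold c plus zero-mean normal jitter of standard deviation sigma.

    The device responds "A" when x exceeds the jittered threshold, so
    p("A" | x) = Pr(threshold < x) = Phi((x - c) / sigma).
    """
    return normal_cdf((x - c) / sigma)

# Observation statistics, Eq. 1.2, given here as densities of the two measures.
def density_n(x):
    """Assumed density of x under N: standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def density_sn(x, shift=1.0):
    """Assumed density of x under SN: unit-variance normal with mean `shift`."""
    return density_n(x - shift)

# Tabulate both decision devices at a few observations.
for x in (0.0, 0.5, 1.0):
    print(x, p_alarm_fixed(x), round(p_alarm_jitter(x), 4))
```

The fixed-threshold device is the "listing of observations" description, since its probability function takes only the values 0 and 1. The jittering device cannot be described by such a listing, which is why the probability-function formalism is needed.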

These observation statistics refer specifically to those observations on which a decision will be based. This is more restrictive than the general use of the same term, which might mean the statistics of the entire random process of which the observation is just one portion. A specific observation situation is defined by specifying the observations on which a decision may be based, and the observation statistics, Eq. 1.2. The evaluation of a specific decision device in a specific observation situation is the ROC point given by the probability of a false alarm and the probability of a detection.

ROC Evaluation Point:
$$P(\text{"A"} \mid N) = \int p(\text{"A"} \mid x)\, d\mu(x \mid N)$$
$$P(\text{"A"} \mid SN) = \int p(\text{"A"} \mid x)\, d\mu(x \mid SN) \tag{1.3}$$

In Appendix A, taken from course notes for the Summer Intensive Course on Random Process, 1961-62-63, it is shown that the ROC plot of all possible decision devices in any specific decision situation is convex, contains the chance diagonal, and is symmetric about the center-point. The proofs are simple but tedious. The results are shown in the following figures. In Fig. 1.2 the chance diagonal has been added to the blank ROC. This is the locus of performance where the detection probability is equal to the false alarm probability. Such performance may be obtained with a decision device that reacts independently of the observation, and hence is described as chance. Performance

falling below this diagonal is considered poorer than chance and performance above the diagonal is considered better than chance.

Fig. 1.2. The "chance diagonal"

We have made no specific definition of optimum. However, let us consider some specific point below the chance diagonal and compare it to the point directly above it on the chance diagonal. Both have the same performance under the condition N. The point on the chance diagonal has a higher detection probability than the point under discussion below the chance diagonal. We did specify that correct decisions are to be preferred to incorrect decisions, that is, that higher correct-decision probabilities and correspondingly lower error probabilities will be considered desirable. Therefore the chance diagonal is better than any point plotted below it.

The convex closure of two points plotted on the ROC grid consists of the straight line joining the two points. The implication is that if two decision devices are possible which plot at points w1 and w2, then it is known that a continuum of decision devices

exist which will perform as indicated by the line between the points w1 and w2.

Fig. 1.3. The "convex closure" of two points (w1 and w2)

Consider two decision devices which disagree on each individual decision. The probability of "A" of the second device will be the probability of "B" of the first device.

$$P(\text{"A"}_2 \mid N) = P(\text{"B"}_1 \mid N)$$
$$P(\text{"A"}_2 \mid SN) = P(\text{"B"}_1 \mid SN) \tag{1.4}$$

Equation 1.4 is the reason the ROC region exhibits the midpoint symmetry that is indicated geometrically in Fig. 1.4.

Fig. 1.4. Existence of midpoint symmetry (w1 and w2)
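The ROC evaluation point of Eq. 1.3, the convex closure of two points, and the midpoint symmetry of Eq. 1.4 can all be checked numerically. The following Python sketch assumes a hypothetical observation space of only four observations, so that the integrals of Eq. 1.3 reduce to sums. The probabilities, the two devices, and the helper names `roc_point`, `mix`, and `opposite` are illustrative assumptions and do not appear in the report.

```python
# Hypothetical four-point observation space; all numbers are illustrative.
MU_N = [0.4, 0.3, 0.2, 0.1]    # mu(x | N)
MU_SN = [0.1, 0.2, 0.3, 0.4]   # mu(x | SN)

def roc_point(p_alarm):
    """Eq. 1.3 on a discrete space: return (P("A" | N), P("A" | SN))."""
    false_alarm = sum(p * m for p, m in zip(p_alarm, MU_N))
    detection = sum(p * m for p, m in zip(p_alarm, MU_SN))
    return false_alarm, detection

def mix(p1, p2, weight):
    """Device that acts as device 1 with probability `weight`, else as device 2."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p1, p2)]

def opposite(p_alarm):
    """Device that disagrees with the given device on every decision (Eq. 1.4)."""
    return [1.0 - p for p in p_alarm]

# Each device is listed as p("A" | x) for the four observations.
device_1 = [0.0, 0.0, 1.0, 1.0]   # "A" on the two most signal-like observations
device_2 = [0.0, 1.0, 1.0, 1.0]   # "A" on all but the most noise-like observation

w1 = roc_point(device_1)
w2 = roc_point(device_2)
w_mid = roc_point(mix(device_1, device_2, 0.5))   # lies on the line from w1 to w2
w1_opp = roc_point(opposite(device_1))            # reflection of w1 through (0.5, 0.5)

print("w1 =", w1)
print("w2 =", w2)
print("mixture =", w_mid)
print("opposite of w1 =", w1_opp)
```

Because Eq. 1.3 is linear in the decision probability function, randomizing between two devices moves the ROC point along the straight line joining them. Replacing every p("A" | x) by its complement maps the point (X, Y) to (1 - X, 1 - Y).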

The purpose of this detailed geometric examination was to determine the implications of the existence of a single nonchance point. If there is a particular decision device with nonchance performance, plotted at $w_1$, then it follows directly that the region of obtainable performance with the given observation statistics contains at least the parallelogram indicated in Fig. 1.5. The boundary of this parallelogram is the convex closure of the points (0, 0), $w_1$, (1, 1), and the opposite of $w_1$. Every point in the interior lies on many straight lines between boundary points.

Fig. 1.5. Convex closure of a point, its opposite, and chance

The plot of all possible ROC performance points for a specific observation situation is called the ROC region. The ROC region is a function of the observation situation, not of any particular

decision device. Different observation situations may lead to different ROC regions. If two observation situations have the same ROC region, they are said to be equally detectable situations. Two observation situations may differ in all respects: the observations on which decisions are based, and the observation statistics. If two observation situations have the same set of observations, but differ in one of the measures in the observation statistics, they are considered as different situations. For the remainder of this section (1.1) we shall consider the observation situation fixed, and the ROC region known.

If the ROC region is as shown in Fig. 1.5, then one should desire to operate somewhere along the upper boundary. This is because each point p in the interior represents a higher miss probability than the upper boundary point directly above p, and a higher false alarm probability than the upper boundary point directly to the left of p. The upper boundary of the plot of the ROC region is called the optimum ROC curve. If one restricts himself to only the upper boundary instead of the entire region, he can still maximize any specific goal; that is, he can maximize every specific definition of performance.

The second step in this section is to establish that the ROC region is closed, meaning that the upper boundary actually exists and is obtainable, and that the widely acclaimed likelihood ratio will always lead one to performance on the upper boundary. Of special

interest are those decision devices which are based on the real quantity likelihood ratio in a special way. A decision is said to be based on a real decision axis $d(x)$ and a cut value $\beta$ if the decision "A" occurs with probability one whenever $d(x)$ is greater than $\beta$ and occurs with probability zero whenever $d(x)$ is less than $\beta$. Whenever $d(x)$ equals $\beta$, the decision probability is some value, $r$, between zero and one.

Because this present treatment is slightly more general than the usual likelihood ratio treatment, such as is given in Appendix A, we must go back to the foundations of the likelihood ratio principle. The likelihood ratio of an observation is the connecting link between two possible measures of occurrence of that observation. The Radon-Nikodym Theorem (Ref. 2) establishes this connecting link. Consider two observation statistics $\mu(x \mid N)$ and $\mu(x \mid SN)$. (These are assumed to be measures on the same completely additive field of sets.) The Radon-Nikodym Theorem states that there is a function, $\ell(x)$, and a set of N-measure zero, $A_0$, such that Eq. 1.5 holds for any mutually measurable set $C$.

$$
\begin{aligned}
\mu(C \mid N) &= \int_{C} d\mu(x \mid N), \qquad \mu(A_0 \mid N) = 0 \\
\mu(C \mid SN) &= \int_{C - A_0} \ell(x)\, d\mu(x \mid N) + \mu(C \cap A_0 \mid SN)
\end{aligned}
\tag{1.5}
$$

To obtain the N-measure of some collection of observations, $C$, integrate

the number one with respect to the N-measure. To obtain the SN-measure of the same collection of observations, integrate the weight $\ell(x)$ with respect to the N-measure and add on the SN-measure of $C \cap A_0$. This allows one to obtain both measures by integration with respect to $\mu(x \mid N)$, except for sets of N-measure zero. The function $\ell(x)$ is never really utilized when $x$ is in the set of N-measure zero, $A_0$. We may define $\ell(x)$ to be infinity on this set as a notational convenience. Such a function $\ell(x)$ is called "the Radon-Nikodym derivative of the SN-measure with respect to the N-measure," and is the most general form of a likelihood ratio.

The vast majority of the effort in the physical theory of signal detectability is devoted to obtaining explicit equations for $\ell(x)$, and to realizing practical equipment which will compute its value. In contrast, this present work is devoted to evaluating performance and not to obtaining performance.

The meaning of the terminology "a decision is based on likelihood ratio with cut value $\beta$ and boundary value $r$" is given by Eq. 1.6.

$$
0 \le \beta \le \infty, \quad 0 \le r \le 1, \qquad
p(\text{"A"} \mid x) = g(x, \beta, r) =
\begin{cases}
1 & \ell(x) > \beta \\
r & \ell(x) = \beta \\
0 & \ell(x) < \beta
\end{cases}
\tag{1.6}
$$

To establish that such decision functions fall on the upper boundary

of the ROC region, compare its performance to the performance of any other possible decision function, which is labeled $h(x)$. The proof is shortest if we consider the contrived function $Q_\beta$:

$$
Q_\beta = \left[ P(\text{"}A_g\text{"} \mid SN) - P(\text{"}A_h\text{"} \mid SN) \right] - \beta \left[ P(\text{"}A_g\text{"} \mid N) - P(\text{"}A_h\text{"} \mid N) \right]
\tag{1.7}
$$

$Q_\beta$ is a comparison of the probabilities of detection for the $g$ and $h$ decision devices, together with a $\beta$-weighted comparison of the probabilities of false alarm of the two devices. The probabilities of false alarm are given by direct integration with respect to the N-measure,

$$
\begin{aligned}
P(\text{"}A_g\text{"} \mid N) &= \int_{X - A_0} g(x, \beta, r)\, d\mu(x \mid N) \\
P(\text{"}A_h\text{"} \mid N) &= \int_{X - A_0} h(x)\, d\mu(x \mid N)
\end{aligned}
\tag{1.8}
$$

where $X - A_0$ means those observations that are not in the set of N-measure zero, $A_0$. The probabilities of detection could be obtained by similar integration with respect to the SN-measure. However, utilizing the Radon-Nikodym Theorem and likelihood ratio we may evaluate the probabilities of detection by again using the N-measure.

$$
\begin{aligned}
P(\text{"}A_g\text{"} \mid SN) &= \int_{X - A_0} g(x, \beta, r)\, \ell(x)\, d\mu(x \mid N) + \int_{A_0} g(x, \beta, r)\, d\mu(x \mid SN) \\
P(\text{"}A_h\text{"} \mid SN) &= \int_{X - A_0} h(x)\, \ell(x)\, d\mu(x \mid N) + \int_{A_0} h(x)\, d\mu(x \mid SN)
\end{aligned}
\tag{1.9}
$$

The six integrals in Eqs. 1.8 and 1.9 can be collected into two integrals for determining the value of the quantity $Q_\beta$.

$$
Q_\beta = \int_{X - A_0} \left[ g(x, \beta, r) - h(x) \right] \left[ \ell(x) - \beta \right] d\mu(x \mid N) + \int_{A_0} \left[ g(x, \beta, r) - h(x) \right] d\mu(x \mid SN)
\tag{1.10}
$$

The first integrand is a product of two factors. When the likelihood ratio is greater than $\beta$, the second factor will be positive. Since $g$ is one for these observations, the first factor will be zero or positive. For those observations with likelihood ratio less than $\beta$, the second factor will be negative. Since $g$ is zero for these observations, the first factor will be zero or negative.

$$
\begin{aligned}
\ell(x) > \beta:&\quad [g - h][\ell - \beta] = [1 - h][\ell - \beta] \ge 0 \\
\ell(x) = \beta:&\quad [g - h][\ell - \beta] = [r - h][0] = 0 \\
\ell(x) < \beta:&\quad [g - h][\ell - \beta] = [0 - h][\ell - \beta] \ge 0
\end{aligned}
\tag{1.11}
$$

Since the integrand of the first integral is never negative, the integral will also be nonnegative. We now turn our attention to the second integral, over the set of N-measure zero. The likelihood ratio is infinite for these observations. Therefore, $g$ is one and the integrand is nonnegative.

$$
x \in A_0: \quad [g - h] = [1 - h] \ge 0
\tag{1.12}
$$
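The sign argument of Eqs. 1.10 through 1.12 can be checked on a small point-mass example. The sketch below is not taken from the report: the five-point observation space, its probabilities, and the function names are hypothetical, and the arbitrary competing devices `h` are drawn at random. It evaluates $Q_\beta$ of Eq. 1.7 directly from the two ROC points and records the smallest value found.

```python
import random

# Hypothetical five-point observation space (illustrative numbers only).
# The last point has N-probability zero, so it plays the role of the set A0.
p_n = [0.4, 0.3, 0.2, 0.1, 0.0]
p_sn = [0.1, 0.2, 0.2, 0.3, 0.2]


def likelihood_ratio(i):
    """Point-mass likelihood ratio, taken as infinity where the N-probability is zero."""
    return float("inf") if p_n[i] == 0.0 else p_sn[i] / p_n[i]


def g(i, beta, r):
    """Decision probability of Eq. 1.6: 1 above the cut, r at the cut, 0 below."""
    ratio = likelihood_ratio(i)
    if ratio > beta:
        return 1.0
    if ratio < beta:
        return 0.0
    return r


def roc_point(decision):
    """Return (false alarm probability, detection probability) for a decision function."""
    x = sum(decision(i) * p_n[i] for i in range(len(p_n)))
    y = sum(decision(i) * p_sn[i] for i in range(len(p_sn)))
    return x, y


def q_beta(beta, r, h):
    """Q of Eq. 1.7, comparing the likelihood-ratio device g with an arbitrary device h."""
    xg, yg = roc_point(lambda i: g(i, beta, r))
    xh, yh = roc_point(lambda i: h[i])
    return (yg - yh) - beta * (xg - xh)


rng = random.Random(0)
beta, r = 1.0, 0.5
smallest_q = min(
    q_beta(beta, r, [rng.random() for _ in p_n]) for _ in range(1000)
)
print(smallest_q >= -1e-12)  # small tolerance for floating-point rounding
```

A random search of this kind is only a spot check of the inequality; the proof in the text covers every decision function $h$.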

We have therefore determined that the quantity $Q_\beta$ is nonnegative for the decision device based on likelihood ratio whose cut level is the same number $\beta$. How does this establish that the decision based on likelihood ratio is better? Pick any particular decision device. Compare it with a decision device based on likelihood ratio which has the same probability of false alarm. Proof that we can find this decision device based on likelihood ratio with matching false alarm probability is given in Appendix A. Granting that we can find such a matching device, it will have some value of $\beta$ and $r$. The quantity $Q_\beta$ evaluated for that specific value of $\beta$ will be nonnegative. Since the false alarm probabilities are equal, the quantity $Q_\beta$ is simply the difference between the two detection probabilities. $Q_\beta$ being nonnegative means that the detection probability for the decision device based on likelihood ratio is greater than or equal to the detection probability for the decision device $h$.

$$
P(\text{"}A_g\text{"} \mid N) = P(\text{"}A_h\text{"} \mid N), \quad Q_\beta \ge 0 \;\Rightarrow\; P(\text{"}A_g\text{"} \mid SN) \ge P(\text{"}A_h\text{"} \mid SN)
\tag{1.13}
$$

This establishes that the performance of the decision based on likelihood ratio falls on the optimum ROC curve for this ROC region. The "onto" part of the proof establishes that for every possible probability of false alarm there is at least one decision

device based on likelihood ratio which yields this probability of false alarm. This part of the proof has been omitted here and given in Appendix A because it is tedious and not illuminating. The first part of the proof has been displayed to emphasize that the optimum ROC curve, the upper boundary, is directly related to likelihood ratio.

1.2 Fundamental Theorem

1.2.1 Likelihood Ratio. A decision axis can be considered as an observable. Whatever the physical nature of the original observation and the relationship of the decision axis to these original observations, there will be certain probabilities of occurrence of the various decision values under each of the two classes of causes. The question immediately arises, "If we have two measures induced on this decision axis, what is the Radon-Nikodym relationship between these two measures?" That is, "What is the likelihood ratio of the decision value?" We will consider the special decision axis which is the likelihood ratio of the observation, and will establish that the likelihood ratio of this decision axis is numerically the same as the value of the decision axis. That is, the likelihood ratio of the likelihood ratio is the likelihood ratio.

For example, assume an observation space and a device which numerically evaluates the likelihood ratio of each observation. Let us further assume that for a given observation the value calculated by this device is the number 2.74. We will establish in this section that in this case the likelihood ratio of the number 2.74

is 2.74. That is, the numerical value 2.74 occurs under the cause SN with probability 2.74 times the probability it will occur under the cause N.

Let us consider an observation space, $X$, and a collection of measurable sets $A$ which form a $\sigma$-ring $\mathcal{A}$ on $X$. A probability function is any nonnegative completely additive set function on $\mathcal{A}$ such that the probability of the entire observation space is unity. To introduce likelihood ratio we consider two such probability functions defined on the same $\sigma$-ring $\mathcal{A}$, and compare probabilities. Repeating the Radon-Nikodym Theorem (Ref. 2), if $\mu(A \mid SN)$ is a completely additive set function on a $\sigma$-ring of sets which are measurable with respect to $\mu(A \mid N)$, then there exists a unique decomposition consisting of the following:

$A_0$, a set of N-measure zero, $\mu(A_0 \mid N) = 0$;
$f$, a function integrable with respect to $\mu(x \mid N)$, unique almost everywhere;
$Q$, a set function defined on $\mathcal{A} \cap A_0$;

such that

$$
\mu(A \mid SN) = \int_{A - A_0} f(x)\, d\mu(x \mid N) + Q(A \cap A_0)
\tag{1.14}
$$

Here the Radon-Nikodym derivative relates $\mu(\cdot \mid SN)$ to $\mu(\cdot \mid N)$ except for a set of N-measure zero. Using the above formulation define a likelihood ratio $\ell(x)$:

$$
\ell(x) =
\begin{cases}
f(x) & x \in X - A_0 \\
\infty & x \in A_0
\end{cases}
\tag{1.15}
$$

The definition of likelihood ratio is unique, up to a set of combined measure zero. That is, many functions and sets $A_0$ may satisfy the definition; however, for any two likelihood ratios let $A_D$ be any measurable set on which they differ. Then

$$
\mu(A_D \mid SN) + \mu(A_D \mid N) = 0
\tag{1.16}
$$

Corollary

$$
\begin{aligned}
P(A \mid SN) &= \int_{A - A_0} \ell(x)\, d\mu(x \mid N) + Q(A \cap A_0) \\
P(A \mid N) &= \int_{A - A_0} d\mu(x \mid N)
\end{aligned}
\tag{1.17}
$$

Let us consider the implications of this definition to the familiar situations of probability density functions and point-mass probabilities. For example, consider an n-dimensional space. Let $x = (x_1, x_2, \ldots, x_n)$ and let there be density functions. Then for any Borel set $A$

$$
P(A \mid SN) = \int_A f(x \mid SN)\, dx, \qquad P(A \mid N) = \int_A f(x \mid N)\, dx
\tag{1.18}
$$

Let $A_0$ be those points for which the N density vanishes. Then for any Borel set

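$$
P(A \mid SN) = P(A - A_0 \mid SN) + P(A \cap A_0 \mid SN)
\tag{1.19}
$$

$$
P(A \mid SN) = \int_{A - A_0} f(x \mid SN)\, dx + P(A_0 \cap A \mid SN)
\tag{1.20}
$$

Multiplying and dividing by $f(x \mid N)$,

$$
\begin{aligned}
P(A \mid SN) &= \int_{A - A_0} \frac{f(x \mid SN)}{f(x \mid N)}\, f(x \mid N)\, dx + P(A_0 \cap A \mid SN) \\
\text{while} \quad P(A \mid N) &= \int_{A - A_0} f(x \mid N)\, dx
\end{aligned}
\tag{1.21}
$$

Matching terms with Eq. 1.17,

$$
\ell(x) =
\begin{cases}
\dfrac{f(x \mid SN)}{f(x \mid N)} & \text{if } f(x \mid N) \ne 0 \\
\infty & \text{if } f(x \mid N) = 0
\end{cases}
\tag{1.22}
$$

Next consider a point space $\{x_i\}$, and let $A$ be any subset of these points. Then

$$
P(A \mid SN) = \sum_{x_i \in A} P(x_i \mid SN), \qquad P(A \mid N) = \sum_{x_i \in A} P(x_i \mid N)
\tag{1.23}
$$

Let $A_0$ be those points having N-probability zero; then

$$
P(A \mid SN) = \sum_{x_i \in A - A_0} P(x_i \mid SN) + P(A \cap A_0 \mid SN)
\tag{1.24}
$$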

Multiplying and dividing each term by $P(x_i \mid N)$,

$$
\begin{aligned}
P(A \mid SN) &= \sum_{x_i \in A - A_0} \frac{P(x_i \mid SN)}{P(x_i \mid N)}\, P(x_i \mid N) + P(A \cap A_0 \mid SN) \\
\text{while} \quad P(A \mid N) &= \sum_{x_i \in A - A_0} P(x_i \mid N)
\end{aligned}
\tag{1.25}
$$

Matching terms with Eq. 1.17,

$$
\ell(x_i) =
\begin{cases}
\dfrac{P(x_i \mid SN)}{P(x_i \mid N)} & \text{if } P(x_i \mid N) \ne 0 \\
\infty & \text{if } P(x_i \mid N) = 0
\end{cases}
\tag{1.26}
$$

We shall not continue with the special cases by treating spaces with distribution functions, leading to a mixture of densities and point-mass probabilities. The purely mechanical difficulties involved in notation make the general Radon-Nikodym formulation attractive. Before proving the basic theorem that "the likelihood ratio of the likelihood ratio is the likelihood ratio," let us set up the proper framework of "induced measures."

1.2.2 Induced Measures. Paraphrasing from Pitt (Ref. 2, pp. 25-27), let $\alpha$ be any function mapping $X$ onto $Y$, not necessarily one-to-one, $\mathcal{A}$ a $\sigma$-ring of sets in $X$, and $\mu$ a measure on it. We will say a set $B$ in $Y$ is a simple set if $\alpha^{-1}(B)$ is in the $\sigma$-ring $\mathcal{A}$. Obviously, the collection of all such sets $B$ forms a $\sigma$-ring $\mathcal{B}$. The function $\nu$ is

called "the measure induced in $Y$ by $\mu$ and $\alpha$" if

$$
B \in \mathcal{B}, \qquad \nu(B) = \mu\!\left(\alpha^{-1}(B)\right)
\tag{1.27}
$$

Integration of a function in $Y$ can be reflected back to the original space $X$. Quoting Pitt, Theorem 31:

Theorem 31: Suppose $\mu$ is a measure in $X$ and $\nu$ is the measure induced in $Y$ by $y = \alpha(x)$. Then for functions of $y$

$$
\int_Y f(y)\, d\nu = \int_X f(\alpha(x))\, d\mu
\tag{1.28}
$$

in the sense that if one integral exists then so does the other and the two are equal.

To say that "$B$ is a $\nu$-measurable set" is equivalent to saying that the characteristic function $1_B(y)$ is integrable with respect to $\nu$.

$$
1_B(y) =
\begin{cases}
1 & y \in B \\
0 & y \notin B
\end{cases}
\tag{1.29}
$$

$$
\nu(B) = \int_Y 1_B(y)\, d\nu(y) = \int_B d\nu
\tag{1.30}
$$

Since the product of integrable functions is also integrable, for any integrable $f$ of Theorem 31, $1_B$ times $f$ is also integrable. We therefore have the corollary:

Corollary to Pitt's Theorem 31: Suppose $\mu$ is a measure in $X$ and $\nu$ is the measure induced in $Y$ by $y = \alpha(x)$; then for any measurable set

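$$
\int_B f(y)\, d\nu = \int_{\alpha^{-1}(B)} f(\alpha(x))\, d\mu
\tag{1.31}
$$

if either $\int_Y f(y)\, d\nu$ or $\int_X f(\alpha(x))\, d\mu$ exists.

Let us diagram our current status. We have assumed an observation space $X$, a $\sigma$-ring $\mathcal{A}$ of measurable sets, and two measures $\mu(x \mid N)$ and $\mu(x \mid SN)$. For any map we can consider the induced measures $\nu(r \mid N)$ and $\nu(r \mid SN)$. If the map is the likelihood ratio $\ell(x)$, the image of $X$ is some portion of the reals, together with $\infty$.

$$
\begin{aligned}
&\ell: X \text{ onto } R \subset E_1 \cup \{\infty\} \\
&\ell^{-1}(B) \text{ measurable} \;\leftrightarrow\; B \text{ measurable} \\
&\ell: \mu(x \mid N) \rightarrow \nu(r \mid N) \\
&\ell: \mu(x \mid SN) \rightarrow \nu(r \mid SN)
\end{aligned}
$$

Right away let us single out $r = \infty$ and the special set $A_0$ and list their properties.

$$
\begin{aligned}
&\ell: A_0 \rightarrow \infty, \qquad A_0 = \ell^{-1}(\{\infty\}) \\
&\mu(A_0 \mid N) = 0 \;\rightarrow\; \nu(\infty \mid N) = 0 \\
&\mu(A_0 \mid SN) \text{ exists} \;\rightarrow\; \nu(\infty \mid SN) \text{ is some number.}
\end{aligned}
$$

For any measurable $B$ of reals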

$$
\ell^{-1}(B) \cap A_0 =
\begin{cases}
\varnothing & \text{if } \infty \notin B \\
A_0 & \text{if } \infty \in B
\end{cases}
$$

We now have a space $R$ with $\sigma$-ring $\mathcal{B}$ of measurable sets, and two measures $\nu(r \mid N)$ and $\nu(r \mid SN)$. The question arises: What is the Radon-Nikodym derivative of $\nu(r \mid SN)$ with respect to $\nu(r \mid N)$? What is the likelihood ratio on $R$ space? To answer this we consider the identity map $i(r)$, which maps each point into itself.

$$
X \xrightarrow{\;\ell\;} R \xrightarrow{\;i\;} R
$$

By definition the space $X$ has probability one under both $\mu$ measures.

$$
1 = \mu(X \mid SN) = \int_{X - A_0} \ell(x)\, d\mu(x \mid N) + \mu(A_0 \mid SN)
\tag{1.32}
$$

Because $\ell$ is integrable with respect to $\mu$, for any set $B$ measurable in $R$ space

$$
\int_{\ell^{-1}(B) - A_0} \ell(x)\, d\mu(x \mid N)
\tag{1.33}
$$

exists. If we now consider the identity map of $R$ onto itself, for any measurable $B$

$$
\int_{B - \{\infty\}} i(r)\, d\nu(r \mid N) = \int_{\ell^{-1}(B - \{\infty\})} i(\ell(x))\, d\mu(x \mid N)
\tag{1.34}
$$

Because $i(\ell) = \ell$,

$$
\int_{B - \{\infty\}} i(r)\, d\nu(r \mid N) = \int_{\ell^{-1}(B) - A_0} \ell(x)\, d\mu(x \mid N), \quad \text{which exists}
\tag{1.35}
$$

$$
= \mu\!\left(\ell^{-1}(B) - A_0 \mid SN\right) \quad \text{by definition of } \ell
\tag{1.36}
$$

$$
= \nu\!\left(B - \{\infty\} \mid SN\right) \quad \text{by definition of } \nu \text{ being an induced measure}
\tag{1.37}
$$

Therefore we may write

$$
\nu(B \mid SN) = \int_{B - \{\infty\}} i(r)\, d\nu(r \mid N) + \nu(B \cap \{\infty\} \mid SN)
\tag{1.38}
$$

By direct identification $i(r)$ is a likelihood ratio. We have therefore shown the following theorem:

Theorem: A likelihood ratio of $r$ (which is a likelihood ratio of $x$) is $r$ itself.

We complete this formal abstract work with one further consideration. Consider any one-to-one transformation of $R$ which maps infinity onto infinity. This means take any function

$$
w = u(r)
\tag{1.39}
$$

and its inverse

$$
r = v(w)
\tag{1.40}
$$

such that

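$$
u(\infty) = \infty, \qquad v(\infty) = \infty
\tag{1.41}
$$

Of course

$$
W = u(R)
\tag{1.42}
$$

Let the induced measures on $W$ be $\lambda(w \mid N)$ and $\lambda(w \mid SN)$. The $\lambda$-measurable sets $\mathcal{C}$ are the image of $\mathcal{B}$:

$$
\mathcal{C} = u(\mathcal{B})
\tag{1.43}
$$

For any measurable $C \in \mathcal{C}$, let $B = v(C)$. Then

$$
\lambda(C \mid SN) = \lambda(C - \{\infty\} \mid SN) + \lambda(C \cap \{\infty\} \mid SN)
\tag{1.44}
$$

$$
\lambda(C \mid SN) = \nu(B - \{\infty\} \mid SN) + \lambda(C \cap \{\infty\} \mid SN)
\tag{1.45}
$$

$$
\lambda(C \mid SN) = \int_{B - \{\infty\}} r\, d\nu(r \mid N) + \lambda(C \cap \{\infty\} \mid SN)
\tag{1.46}
$$

By the corollary to Pitt's Theorem 31

$$
\int_{B - \{\infty\}} r\, d\nu(r \mid N) = \int_{C - \{\infty\}} v(w)\, d\lambda(w \mid N)
\tag{1.47}
$$

So

$$
\lambda(C \mid SN) = \int_{C - \{\infty\}} v(w)\, d\lambda(w \mid N) + \lambda(C \cap \{\infty\} \mid SN)
\tag{1.48}
$$

We see immediately that $v(w)$ is a likelihood ratio.

Example: If $w = \ln r$, then $r = e^w$. If $r$ is a likelihood ratio, then $e^w$ is a likelihood ratio of $w = \ln r$.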
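The theorem and the logarithm example can be illustrated on a finite point space, where the induced measures are obtained by pooling every observation that shares the same likelihood ratio. The sketch below is an assumption-laden illustration rather than part of the report: the six-point space and its probabilities are hypothetical, every N-probability is positive (so the set $A_0$ is empty), and exact rational arithmetic is used so that the equality can be tested exactly.

```python
import math
from collections import defaultdict
from fractions import Fraction

# Hypothetical six-point observation space; probabilities are illustrative only.
p_n = [Fraction(k, 10) for k in (1, 2, 1, 3, 2, 1)]
p_sn = [Fraction(k, 10) for k in (2, 1, 2, 3, 1, 1)]

# Induced measures on the likelihood-ratio axis: pool every x with the same r.
nu_n = defaultdict(Fraction)
nu_sn = defaultdict(Fraction)
for pn, psn in zip(p_n, p_sn):
    r = psn / pn  # point-mass likelihood ratio of the observation, as in Eq. 1.26
    nu_n[r] += pn
    nu_sn[r] += psn

# Likelihood ratio of each value r, computed from the induced measures alone.
lr_of_r = {r: nu_sn[r] / nu_n[r] for r in nu_n}

# The same ratios indexed by w = ln r; the example in the text predicts e**w.
lr_of_w = {math.log(float(r)): float(ratio) for r, ratio in lr_of_r.items()}

for r in sorted(lr_of_r):
    print(r, lr_of_r[r])
```

Each printed pair shows a value of $r$ next to the likelihood ratio computed for that value from the induced measures.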

1.2.3 Section Summary. What has been considered is diagrammed in Fig. 1.6. From any observation space $X$ with measures $\mu$ and special set $A_0$ there is a likelihood ratio $\ell(x)$ such that

$$
\mu(A \mid SN) = \int_{A - A_0} \ell(x)\, d\mu(x \mid N) + \mu(A \cap A_0 \mid SN)
\tag{1.49}
$$

The induced measures on $R = \ell(X)$ are $\nu$, so related that their likelihood ratio is simply the value of $r$ itself.

$$
\nu(B \mid SN) = \int_{B - \{\infty\}} r\, d\nu(r \mid N) + \nu(B \cap \{\infty\} \mid SN)
\tag{1.50}
$$

For any one-to-one transformation which leaves $\infty$ fixed, say $r = v(w)$,

$$
\lambda(C \mid SN) = \int_{C - \{\infty\}} v(w)\, d\lambda(w \mid N) + \lambda(C \cap \{\infty\} \mid SN)
\tag{1.51}
$$

The likelihood ratio of $w$ is simply the "substitution value" $v(w)$. In order to establish the proofs, care has been taken to avoid notation which is "suggestive." With the proofs complete we may now make use of more suggestive notation to summarize the two conclusions.

$$
\ell(\ell) = \ell
\tag{1.52}
$$

$$
\ell(w(\ell)) = w^{-1}(w)
\tag{1.53}
$$

and in particular if $z = \ln \ell$

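[Figure: the spaces $X$, $R$, and $W$ linked by the maps $r = \ell(x)$, $w = w(r)$, and $r = v(w)$. $X$ carries $\mu(x \mid N)$, $\mu(x \mid SN)$, $\ell(x)$; $R$ carries $\nu(r \mid N)$, $\nu(r \mid SN)$, $i(r)$; $W$ carries $\lambda(w \mid N)$, $\lambda(w \mid SN)$, $v(w)$.]

Fig. 1.6. Sketch of spaces and transformations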

$$
\ell(z) = e^{z}
\tag{1.54}
$$

When the measures $\nu(\ell \mid N)$, $\nu(\ell \mid SN)$ are given by distribution functions, we may write

$$
F(\beta \mid SN) = \int_0^{\beta} \ell\, dF(\ell \mid N)
\tag{1.55}
$$

This special case was given in Ref. 1 as Theorem 8, and was written

$$
dF(\beta \mid SN) = \beta\, dF(\beta \mid N)
\tag{1.56}
$$

1.3 Initiation of Present Study

In 1954 W. W. Peterson (Ref. 1, pp. 205-206) showed the rather amazing theorem that if the logarithm of the likelihood ratio is normally distributed under condition N, then it is also normally distributed under SN. Specifically, he started with the assumption that $z = \ln \ell$ has a normal probability density function under N.

$$
f(z \mid N) = \frac{1}{\sqrt{2\pi d}}\, e^{-\frac{(z - m)^2}{2d}}
\tag{1.57}
$$

He was first able to show that the mean is related to the variance,

$$
f(z \mid N) = \frac{1}{\sqrt{2\pi d}}\, e^{-\frac{(z + .5d)^2}{2d}}
\tag{1.58}
$$

and finally that the SN probability density of the logarithm of the likelihood ratio is

$$
f(z \mid SN) = \frac{1}{\sqrt{2\pi d}}\, e^{-\frac{(z - .5d)^2}{2d}}
\tag{1.59}
$$

Equations 1.58 and 1.59 may be expanded to show that the likelihood ratio of $z$ is indeed $e^z$.

$$
\ell(z) = \frac{f(z \mid SN)}{f(z \mid N)} = e^{z}
\tag{1.60}
$$

This theorem of Peterson established the basis for the one-parameter family of ROC curves, the normal ROC curves. The curves are indexed by the single real parameter, $d$, or as it is done in the psychophysical literature by the single parameter $d' = \sqrt{d}$. This one-parameter family of ROC curves has been used extensively in both the electrical engineering and psychophysical work of detection.

The doubly truncated Halsted distribution is a five-parameter class of probability density functions. The untruncated distribution has been used in the study of rapidly fading signals (Ref. 3). It arose again in a study of signal detection and learning in which the amplitude of the signal was initially unknown (Ref. 4). When it is used as the N-probability density function for the logarithm of the likelihood ratio, it is

$$
f(z \mid N) = K z^{B} e^{-Cz}\, e^{-\frac{z^2}{2D}}, \qquad A < z < E
\tag{1.61}
$$

It was discovered that the SN-probability density function for the logarithm of the likelihood ratio fell in the same class

$$
f(z \mid SN) = K z^{B} e^{-(C - 1)z}\, e^{-\frac{z^2}{2D}}, \qquad A < z < E
\tag{1.62}
$$

Since this class of functions contains as subclasses the exponential, the normal, the Rayleigh, the chi-square, and the chi distributions, it was evident that many theorems like Peterson's normal theorem could be obtained from this relation. The key to all these theorems is Eq. 1.54, that the likelihood ratio of the logarithm of the likelihood ratio is $e^z$.
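Peterson's normal theorem is easy to check numerically from Eq. 1.54. The sketch below is illustrative only: the variance `d = 1.5` and the evaluation grid are arbitrary choices, and the helper names are hypothetical. It uses the standard moment generating function of a normal variable, $E[e^{z}] = e^{m + d/2}$, to show why the N mean must equal $-d/2$ for $e^{z} f(z \mid N)$ to be a probability density, and then compares the ratio of the two normal densities with $e^{z}$.

```python
import math


def normal_pdf(z, mean, variance):
    """Normal probability density with the given mean and variance."""
    return math.exp(-((z - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def sn_total_probability(m, d):
    """Integral of e**z times a normal(m, d) density, from the normal moment generating function."""
    return math.exp(m + 0.5 * d)


d = 1.5  # hypothetical variance of z = ln(likelihood ratio)

# Only the mean m = -d/2 makes e**z * f(z|N) integrate to one.
mass_with_related_mean = sn_total_probability(-0.5 * d, d)
mass_with_zero_mean = sn_total_probability(0.0, d)

# Ratio of the SN density (mean +d/2) to the N density (mean -d/2), compared with e**z.
grid = [-3.0 + 0.5 * k for k in range(13)]
max_relative_error = max(
    abs(normal_pdf(z, 0.5 * d, d) / normal_pdf(z, -0.5 * d, d) / math.exp(z) - 1.0)
    for z in grid
)
print(mass_with_related_mean, mass_with_zero_mean, max_relative_error)
```

The same check applies to any member of the class in Eqs. 1.61 and 1.62, since multiplying the N density by $e^{z}$ only changes the coefficient of $z$ in the exponent.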

CHAPTER II

ROC CHARACTER

2.1 ROC Curve

In the introduction a simple decision process was characterized as one which assigns a probability $p(\text{"A"} \mid x)$ to each observation $x$, denoting the probability that this decision mechanism will give the response "A" whenever $x$ is observed. The performance of such a decision device in a specific observation situation plots as a single point on the ROC plot. The mechanism for decisions based upon likelihood ratio considered a probability function $g(x, \beta, r)$ which yielded an entire ROC curve. The parameters $\beta$ and $r$ added sufficient dimensionality to the simple decision function to produce this complete curve.

In general, decision functions are parameterized by some index which allows more than one performance point. If the dimensionality of the index set is too small, the decision mechanism may generate only a set of points, such as shown in Fig. 2.1(a), or it may generate a continuum of points leaving some gaps in the curves, or possibly multiple values for $y$ as a function of $x$, as shown in Fig. 2.1(c). These differences between the types of ROC curve given by decisions based upon likelihood ratio and decisions which give incomplete or multiple-valued plots on the ROC curve may be removed by a process of convex completion. The graphic form of convex completion

[Figure: four ROC panels, each with horizontal axis $x = P(\text{"A"} \mid N)$ running from 0 to 1: (a) Original arcs; (b) Convex completion of (a); (c) Original arcs, with the false alarm probabilities $x_1$ and $x_2$ marked; (d) Convex completion of (c).]

Fig. 2.1. Illustration of convex completion of ROC

[Figure: an ROC curve with horizontal axis $x = P(\text{"A"} \mid N)$ running from 0 to 1.]

Fig. 2.2. Illustration of tangent cone at one point of an ROC curve

was to add to the ROC region the line segment joining any two possible points available from a given decision mechanism. By such a process, a single point not on the chance diagonal yields a whole parallelogram of points surrounding the chance diagonal. Figure 2.1 illustrates two cases of convex completion. The first, illustrated in Figs. 2.1(a) and 2.1(b), shows the results of convex completion for a finite set of points. In Fig. 2.1(b), only the upper bound of the ROC region has been filled in. This is sufficient, since the lower bound of the ROC region lies below the chance diagonal and could be obtained by symmetry through the midpoint if desired. This upper bound lies on or above all of the internal secants forming the convex completion between any two possible points.

In Fig. 2.1(c) a somewhat different case is involved. Between the points with false alarm probabilities $x_1$ and $x_2$, there is a gap which is filled in with the convex completion of those two points. Since the result is better (that is, above and to the left) than the short segment of arc in the interior of the region, this short segment disappears from consideration. Secondly, the left-hand side of the original arc shows a doubling back of the arc, making the function $y(x)$ triple valued on a region of false alarms. A straight line can be drawn from the point (0, 0) tangent to the original curve in the triple-valued section, that exceeds all of the original curve up to the point of tangency. The resultant region shown in Fig. 2.1(d) exhibits a smooth upper bound, the ROC curve. The ROC curve lies on or above any internal secants

since it is the upper bound of all convex completions. A convex ROC curve also lies on or below any tangents or tangent cones drawn to points on the ROC curve. For a tangent to one point on the ROC curve to contact another point of the ROC curve would require that the convex completion of those two points would fall above the ROC curve. Since the ROC curve is on or above all internal secants, this situation cannot occur.

Figure 2.2 illustrates a tangent cone drawn at one point of an ROC curve. At that point the curve is continuous but has a break in slope, so that there is no unique tangent line to that point. The tangent cone consists of all of those lines contacting the ROC curve at only that one point, otherwise lying everywhere above it.

The upper bound of the convex completion of given ROC points is the ROC curve for those points or segments. Such an ROC curve is characterized by being complete and convex. The mathematical adjective "complete" means that for each value of the horizontal axis there is one value for the vertical axis. That is, the function $y(x)$ is a well-defined single-valued function for $x$ between zero and one.

Definition Complete: $y(x)$ has a single value for each $x \in [0, 1]$ (2.1)

Convex means that the values of the curve lie on or above the internal secants.
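For a finite set of ROC points, the upper bound of the convex completion described above can be computed as the upper convex hull of the points together with the chance points (0, 0) and (1, 1). The sketch below is one way to do this, assuming a standard left-to-right hull scan in place of the graphic construction of the text; the function names and the sample points are hypothetical.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o, a, b; negative when the path o -> a -> b turns clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_completion(points):
    """Vertices of the upper boundary of the convex completion, ordered by false alarm probability."""
    candidates = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    upper = []
    for p in candidates:
        # Drop the last vertex while it lies on or below the secant from upper[-2] to p.
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    return upper


# Hypothetical performance points (false alarm probability, detection probability).
observed = [(0.1, 0.4), (0.3, 0.5), (0.4, 0.8), (0.7, 0.85)]
roc_vertices = convex_completion(observed)
print(roc_vertices)
```

Points that fall below a secant between two other obtainable points, such as (0.3, 0.5) here, disappear from the boundary, which mirrors the treatment of the short interior arc in Fig. 2.1(c). Points below the chance diagonal could first be reflected through the midpoint, as the text notes, before the scan is applied.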

Definition Convex: for $x_1 < x_2 < x_3$ and $0 \le a \le 1$ with $x_2 = a x_1 + (1 - a) x_3$,

$$
y(x_2) \ge a\, y(x_1) + (1 - a)\, y(x_3)
\tag{2.2}
$$

The term "singular" has been used in the detection literature to mean perfect performance. In terms of the ROC curve this means that the probability of detection is unity for all probabilities of false alarm. This is illustrated in Fig. 2.3(a).

[Figure: three ROC panels: (a) Singular; (b) Nonsingular; (c) Regular.]

Fig. 2.3. Illustrating singularity for complete convex ROC curves

Definition Singular: $y(x) = 1$ for all $x \in [0, 1]$ (2.3)

Once it has been established that a given situation leads to singular detection there is no need for further description of its performance. A curve may be nonsingular and still reflect a certain degree of perfect detection. That is, there may be some observations yielding a positive probability of detection and a zero probability of false

alarm. Correspondingly there may be a set of observations for which one can unequivocally respond "B," for which the probability of a miss is zero. The ROC for the hypothetical situation containing both of these possibilities is shown in Fig. 2.3(b). Such a curve will be considered complete, convex, nonsingular, and nonregular. A regular ROC curve is complete, convex, and interior to the unit square except at the chance points (0, 0) and (1, 1).

At this point the precise statement of the type of ROC curves that will be considered can be made. This work will deal only with complete, convex, nonsingular ROC curves, or with nonsingular ROC curves that may be made complete and convex by the process of convex completion.

Definition Regular: complete, convex, and

$$
x = 0 \Rightarrow y = 0, \qquad y = 1 \Rightarrow x = 1
\tag{2.4}
$$

(The symbol $\Rightarrow$ is read "implies.")

2.2 ROC Models

In this section two propositions are proved. The first proposition is that an ROC curve contains insufficient information to specify the observation statistics that led to it, or even to specify the statistics on a real decision axis that lead to it. The second proposition is that an ROC curve does contain sufficient information to specify the statistics on the logarithm of the likelihood ratio.

2.2.1 One ROC Curve, Many Models. The purpose of this section is to demonstrate that a single ROC curve contains insufficient information to specify the observation statistics or decision axis from which it was obtained. Rather, there are many decision axes and observation statistics leading to the same ROC curve.

To begin the demonstration, start with a real variable, $\xi$, and choose a distribution function, $F_1(\xi)$, fairly arbitrarily. We shall also assume that a regular ROC curve, $Y(X)$, has been selected. The random variable $\xi$ will be used as a decision axis, and $F_1(\xi)$ will be used as the N distribution function. The SN distribution function for $\xi$ will be chosen to obtain an ROC curve identical to the given ROC curve.

[Figure: the decision axis $\xi$ with the cut level $\xi'$ and the point $\xi_1$ marked; the response "B" is indicated below the cut.]

Fig. 2.4. Arbitrary decision axis, $\xi$

To say that the $\xi$-axis is a decision axis means that whenever a particular value of $\xi$ occurs which falls above a cut level, say $\xi'$, the decision will be "A." In contrast, when it falls below $\xi'$, the decision will be "B." When the observation falls right at the value

$\xi'$, a second (randomizing) random variable is introduced to determine how often the decision "A" is elicited.

Since the purpose of this development is to show that there are many possible ways to set up observation statistics on $\xi$, let us introduce some convenient restrictions on these observation statistics. To avoid limiting considerations, it is assumed that the probable values of $\xi$ fall in the open interval between two numbers $\xi_0$ and $\xi_1$. Therefore, the distribution function is zero below $\xi_0$ and is equal to one at and above $\xi_1$. It is also convenient to assume that the distribution function is strictly monotone increasing in the interval between $\xi_0$ and $\xi_1$. This guarantees that if a particular value of $F_1$ between zero and one is assumed for some $\xi$ value, we can uniquely solve for this $\xi$ value.

$$
\begin{aligned}
&F_1(\xi) = 0, && \xi \le \xi_0 \\
&F_1(\xi) \text{ is strictly monotone increasing}, && \xi_0 < \xi < \xi_1 \\
&F_1(\xi) = 1, && \xi \ge \xi_1
\end{aligned}
\tag{2.5}
$$

With these minor restrictions established, let us set the N distribution equal to $F_1$.

$$
F(\xi \mid N) = F_1(\xi)
\tag{2.6}
$$

It will be convenient to have a special symbol for one minus the distribution.

$$
X(\xi) = 1 - F_1(\xi)
\tag{2.7}
$$

$X$ at the largest value of $\xi$, $\xi_1$, is zero. $X$ at the smallest value of $\xi$, $\xi_0$, is one. $X$ is strictly monotone decreasing as a function of $\xi$ on the interval where $X$ is between zero and one.

$$
X(\xi_1) = 0, \qquad X(\xi_0) = 1, \qquad X \downarrow\downarrow \text{ on } 0 < X < 1
\tag{2.8}
$$

(The notation $X \downarrow\downarrow$ means "strictly monotone decreasing.")

The decision axis and the distribution on $\xi$ under N were arbitrary. We now choose the observation statistics for $\xi$ under the condition SN, using the specified regular ROC curve $Y(X)$.

$$
F(\xi \mid SN) =
\begin{cases}
1 - Y[X(\xi)] & \xi < \xi_1 \\
1 & \xi \ge \xi_1
\end{cases}
\tag{2.9}
$$

The N distribution function is certainly a legitimate distribution. Is the SN function as given a legitimate distribution function? For values of the variable less than the minimum, $\xi_0$, the $X$ value is one and hence the $Y$ value is one, since $Y(1) = 1$. Therefore, the distribution function, under the condition SN, is zero for all small values of the argument.

$$
\xi < \xi_0: \quad X(\xi) = 1, \quad Y = 1, \quad F(\xi \mid SN) = 0
\tag{2.10}
$$

For intermediate values of the argument $\xi$, $X$ is a strictly monotone decreasing function of $\xi$. $Y$ is a monotone increasing (not necessarily strictly monotone increasing) function of $X$. The distribution function under the condition SN is a strictly monotone decreasing function of $Y$. Putting these all together we see that the distribution function for $\xi$ under the condition SN is a monotone increasing function of its argument, $\xi$.

$$
\xi_0 < \xi < \xi_1: \quad X \text{ strictly decreasing in } \xi, \quad Y \text{ increasing in } X, \quad F(\xi \mid SN) \text{ strictly decreasing in } Y, \quad F(\xi \mid SN) \text{ increasing in } \xi
\tag{2.11}
$$

Now we have a decision axis, $\xi$, and two distribution functions on the decision axis for observation statistics. Let us determine the ROC curve. We will use $x_2$ to denote the values of the probability of false alarm plotted along the horizontal axis, and $y_2$ to denote the values of the probability of detection plotted along the vertical axis. The subscript two is used to indicate that this is the second ROC curve; the first one is the original or given ROC curve $Y(X)$. The probability of false alarm is simply the N-probability that the decision variable will exceed the threshold value $\xi'$, which is simply one minus the distribution function for $\xi'$ under the condition N.

$$
x_2(\xi') = \Pr(\xi > \xi' \mid N) = 1 - F(\xi' \mid N)
\tag{2.12}
$$

The distribution function of $\xi'$ under the N condition may be written in terms of $X(\xi')$ by combining Eqs. 2.6 and 2.7. When these are

substituted into Eq. 2.12 we obtain

$$
\begin{aligned}
x_2(\xi') &= 1 - [1 - X(\xi')] \\
x_2(\xi') &= X(\xi')
\end{aligned}
\tag{2.13}
$$

The probability of detection is derived in a completely analogous manner.

$$
y_2(\xi') = \Pr(\xi > \xi' \mid SN)
\tag{2.14}
$$

$$
y_2(\xi') = 1 - [1 - Y(X(\xi'))] = Y[X(\xi')]
\tag{2.15}
$$

Equation 2.13 can be used to justify replacing $X$ by $x_2$ to obtain

$$
y_2(\xi') = Y[x_2(\xi')]
\tag{2.16}
$$

The relationship between the vertical and horizontal axes in this second ROC is exactly the same as it was in the original ROC curve; that is, the two ROC curves are identical.

$$
y_2(x_2) = Y(X)
\tag{2.17}
$$

Let us review the above argument. Start with any complete convex ROC curve for which a model is desired. A model is a decision mechanism and observation statistics which yield that given ROC curve. The decision axis must meet only the minor restriction of boundedness. The distribution function for the variable under the condition N

must meet only the minor restriction of monotonicity. (Both restrictions were for convenience and are not strictly necessary.) Once a specific choice of these two has been made, a distribution function under the other condition, SN, can be specified to obtain the given ROC curve.

This result has two implications. First, a regular ROC curve and a fixed decision axis together contain insufficient information to specify the observation statistics that led to the given ROC curve. Second, given a regular ROC curve for which a model is desired, one may exercise great freedom in assuming the nature of the decision axis and one distribution on it.

The above demonstration required that the ROC curve be regular. This could have been relaxed from "regular" to "complete, convex, nonsingular." The additional complexity did not seem warranted. In Section 3.5 a functional relation will be established between a complete, convex, nonsingular ROC curve and an associated regular ROC curve. After that section, one may conclude that each model for the associated regular ROC curve implies a model for the complete, convex, nonsingular ROC curve, thereby obtaining many models for the nonregular ROC curve.

A Notation Introduced. Many of the equations in this work will occur in pairs, one for the condition N and the other for the condition SN. It will be convenient to have a notation that signifies both equations when they are nearly alike. The notation used is a slight modification of the $\pm$ type of notation commonly used. Its use is shown in the following two equations. Consider two exponential density functions.

$$f(t \mid SN) = e^{t},\quad |t| < .48125 \qquad\qquad f(t \mid N) = e^{-t},\quad |t| < .48125 \tag{2.18}$$

In the double notation these would be written

$$f(t \mid \substack{SN \\ N}) = e^{\pm t},\quad |t| < .48125 \tag{2.19}$$

When the upper condition is used, the upper sign is used. Consider the following equation in double notation.

$$\mathrm{Prob}(z < z_0 \mid \substack{SN \\ N}) = \int_{-\infty}^{z_0} \ldots \tag{2.20}$$

Either the upper condition is used throughout the whole equation, or the lower condition is used throughout the whole equation.

2.2.2 A Unique Model for Each ROC Curve. In the previous section it was shown that there are many models for each regular ROC curve. In this present section the relation between the decision axis and the likelihood ratio will be specified. Under this condition, the specification of the ROC curve will uniquely determine the distributions on the decision axis. The demonstration is complicated enough for regular ROC curves; the inclusion of the nonregular but complete, convex, nonsingular ROC curves adds no further complexity. The specific decision axis considered is the z-axis, where $z$ is the logarithm of the likelihood ratio of the observation.

$$z = \ln \ell(X) \tag{2.21}$$

We shall first analyze an ROC curve derived from distributions on the z-axis, then discuss the synthesis of z-axis distributions from an ROC curve.

Any distribution function on a real axis consists of two parts: the jumps in the distribution function, corresponding to values that have probability, and the smooth continuous parts of the distribution, corresponding to those values that have zero probability individually but which have a probability density function. For notation use $F$ for distribution function, $f$ for probability density function, and $z(i)$ for a point on the decision axis at which the distribution function under N or SN has a discontinuity. The magnitude of these jumps will be denoted by $\omega(z)$.

$$F(z_0 \mid \substack{SN \\ N}) = \int_{-\infty}^{z_0} f(z \mid \substack{SN \\ N})\,dz + \sum_{z(i) \le z_0} \omega(z(i) \mid \substack{SN \\ N}) \quad \text{a.e.} \tag{2.22}$$

Strictly speaking, this may fail to be true at some points. However, the total set of points at which it fails is a set of zero probability. Thus, the statement is said to hold "almost everywhere," written a.e.

The above equation represents the distribution function for the condition N if the lower conditions are read throughout, and the distribution function for the condition SN if the upper conditions are used throughout. As is true in all general distribution work, we can guarantee the existence of the probability density function almost everywhere, but not everywhere. This is because there may be a finite or countable number

of points at which the probability density is not specified without affecting the value of the distribution function.

Concentrate on those values of the decision axis, $z$, that have probability under either condition N or SN. If the probability under N is positive but the point is a set of SN-measure zero, the likelihood ratio will be zero.

$$\omega[z(i) \mid N] > 0,\quad \omega[z(i) \mid SN] = 0,\quad \ell[z(i)] = 0 \tag{2.23}$$

If the converse situation holds, with the point $z(i)$ having N-measure zero, then the likelihood ratio of that value is infinite.

$$\omega[z(i) \mid N] = 0,\quad \omega[z(i) \mid SN] > 0,\quad \ell[z(i)] = \infty \tag{2.24}$$

In the situation where the point is of positive probability under both conditions N and SN, the likelihood ratio is given by the ratio of these two probabilities.

$$\omega[z(i) \mid N] > 0,\quad \omega[z(i) \mid SN] > 0,\quad \ell[z(i)] = \frac{\omega[z(i) \mid SN]}{\omega[z(i) \mid N]} \tag{2.25}$$

As proved in Section 1.2, the likelihood ratio of the logarithm of the likelihood ratio is the exponential of the log of the likelihood ratio.

$$\ell(z) = e^{z} \tag{1.54}$$

Equation 1.54 is a direct result of the fundamental theorem. It is not obtained by inverting Eq. 2.21. The likelihood ratio in Eq. 2.21 is a

function which maps the observation space into the reals. The likelihood ratio given in Eq. 1.54 maps the reals into the reals. This distinction is the reason the fundamental theorem had to be established.

From Eq. 1.54, under the three conditions of Eqs. 2.23 to 2.25, the values of $z(i)$ at which the likelihood ratio takes on the values zero and infinity can be determined exactly. When the likelihood ratio is zero, the decision axis value must be minus infinity, and similarly when the likelihood ratio is infinite the decision axis value must be plus infinity. All other jumps in the distribution functions must be common to both of the distribution functions, and the ratio between the two magnitudes of the jump will be given by Eq. 1.54.

$$\ell[z(i)] = 0 \;\Rightarrow\; z(i) = -\infty, \qquad \ell[z(i)] = \infty \;\Rightarrow\; z(i) = +\infty, \qquad |z(i)| < \infty:\ \ \omega[z(i) \mid SN] = e^{z(i)}\,\omega[z(i) \mid N] \tag{2.26}$$

We will retain the form of Eq. 2.22 for the N distribution function but will utilize Eqs. 1.54 and 2.26 to write the distribution function under SN.

$$F[z_0 \mid SN] = \int_{-\infty}^{z_0} e^{z} f(z \mid N)\,dz + \sum_{z(i) \le z_0} e^{z(i)}\,\omega[z(i) \mid N] \quad \text{a.e.} \tag{2.27}$$

In Section 2.2.1 the N distribution function could be chosen quite arbitrarily and the SN distribution function chosen to obtain the desired ROC curve. In contrast, the SN distribution function of $z$ is determined once the N distribution function is chosen.
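The way Eq. 2.27 ties the SN distribution to the N distribution can be checked numerically. The sketch below is illustrative only: it assumes a jump-free N density that is normal with mean $-d/2$ and variance $d$ (the value `D = 2.0` and all function names are assumptions introduced here, not taken from the report), multiplies it by $e^{z}$, and confirms that the result integrates to one and coincides with a normal density of mean $+d/2$.

```python
import math

D = 2.0  # assumed separation parameter, chosen only for illustration


def normal_pdf(z, mean, var):
    """Normal probability density with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def f_n(z):
    """Assumed N density of z: normal, mean -D/2, variance D."""
    return normal_pdf(z, -0.5 * D, D)


def f_sn(z):
    """SN density forced by the N density (density form of Eq. 2.27)."""
    return math.exp(z) * f_n(z)


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


# Total SN probability, and the gap between e^z f(z|N) and a normal(+D/2, D) density.
total_sn = integrate(f_sn, -30.0, 30.0)
max_gap = max(abs(f_sn(z) - normal_pdf(z, 0.5 * D, D)) for z in (-3.0, -1.0, 0.0, 1.0, 3.0))
print(total_sn, max_gap)
```

Any other N density could be substituted for `f_n`, provided that $e^{z} f(z \mid N)$ integrates to one; that requirement is what distinguishes the z-axis from the arbitrary decision axis of Section 2.2.1.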

Let us analyze the ROC curve for a given N distribution function for $z$. Let $\beta$ stand for a particular cut value along the z-axis. Whenever the value $\beta$ is one at which the distributions jump, representing a z-value with positive probability under both conditions, let $r$ be a number between zero and one inclusive. The simple decision based on the z decision axis with parameters $\beta$ and $r$ will plot as the point $(x, y)$

$$x(\beta, r) = \int_{\beta}^{\infty} f(z \mid N)\,dz + \sum_{z(i) > \beta} \omega(z(i) \mid N) + r\,\omega(\beta \mid N) \quad \text{a.e.} \tag{2.28}$$

$$y(\beta, r) = \int_{\beta}^{\infty} e^{z} f(z \mid N)\,dz + \sum_{z(i) > \beta} e^{z(i)}\,\omega(z(i) \mid N) + r\,e^{\beta}\,\omega(\beta \mid N) \quad \text{a.e.} \tag{2.29}$$

The point moves up along the ROC curve with increasing $r$ and moves down the ROC curve with decreasing $r$, in the range of $r$ from zero to one.

Fig. 2.5 Sketch of arc of ROC corresponding to a z value $\beta$ of positive probability (the arc has horizontal extent $\omega(\beta \mid N)$ and vertical extent $\omega(\beta \mid SN)$)

The derivative of the probability of false alarm (Eq. 2.28) with respect to the parameter $r$ is the magnitude of the jump under N. The derivative of the probability of detection (Eq. 2.29) is the magnitude of the jump under SN.

$$\beta \in \{z(i)\},\ 0 < r < 1:\quad \frac{dx(\beta, r)}{dr} = \omega(\beta \mid N), \qquad \frac{dy(\beta, r)}{dr} = e^{\beta}\,\omega(\beta \mid N) \tag{2.30}$$

The ratio of these two derivatives is the slope of the ROC curve

$$e^{\beta} = \frac{dy}{dx}, \qquad \beta = \ln y'(x) \tag{2.31}$$

where the prime denotes differentiation with respect to the argument.

For any portion of an ROC curve where the slope is constant, the natural logarithm of this slope corresponds to the cut value on the z-axis. At the end points of the linear segment, corresponding to the values $r = 0$ and $r = 1$, the value of the cut on the z-axis can be determined from the appropriate one-sided derivatives. The $r$ value can also be determined for each point along a linear segment of the ROC curve, since $r$ will increase linearly from the value zero at the left-hand point of the segment to one at the right-hand point of the segment.

$$r = 0:\ \ \beta = \ln y'(x + 0), \qquad r = 1:\ \ \beta = \ln y'(x - 0) \tag{2.32}$$

For a cut at a z value which occurs with probability

zero under both conditions N and SN, the parameter $r$ is irrelevant. A change in ROC point corresponds to a change in the cut value $\beta$. If there are no jumps in a z-neighborhood of a cut $\beta$, one may differentiate the coordinates of the ROC curve with respect to $\beta$. From Eqs. 2.28 and 2.29, it follows directly that

$$\beta \notin \{z(i)\}:\quad \frac{dx(\beta, r)}{d\beta} = -f(z = \beta \mid N) \ \ \text{a.e.}, \qquad \frac{dy(\beta, r)}{d\beta} = -e^{\beta} f(z = \beta \mid N) \ \ \text{a.e.} \tag{2.33}$$

As before, the ratio of these two derivatives relates the value of the decision cut on the z-decision axis to the slope of the ROC curve.

$$e^{\beta} = \frac{dy}{dx}, \qquad \beta = \ln y'(x) \quad \text{a.e.} \tag{2.34}$$

We have shown that when the ROC curve is obtained from the decision axis that is the logarithm of the likelihood ratio of the observation, the curve is differentiable almost everywhere, and the logarithm of this derivative is numerically equal to the cut value on the decision axis.

Synthesis: It was really unnecessary to hypothesize that the decision axis was the logarithm of the likelihood ratio of some (physical) observation space. It was merely necessary for the variable $z$ to be the logarithm of the likelihood ratio of something and, since it is the logarithm of the likelihood ratio of itself, the internal consistency would

have been sufficient to specify the problem. That particular circle was avoided because the logic may have been questioned. Since the logic has now been established by examples here and by the fundamental theorem, Section 1.2, we can adopt this seemingly circular logic from now on.

If we are given an ROC curve which is complete and convex, we can hypothesize the existence of a decision axis which is the logarithm of its own likelihood ratio and determine for each point on the ROC the corresponding cut value on this decision axis and, if necessary, the value of the randomizing parameter $r$. If the value of the derivative is constant over some portion of the arc of the ROC, the horizontal extent of this arc indicates the probability $\omega(z \mid N)$; if a value of the derivative has been taken on only once, then Eq. 2.34 relates the slope to the corresponding cut value and Eq. 2.33 gives the probability density function, $f(z \mid N)$.

It has been shown that every complete, convex, nonsingular ROC curve corresponds one-to-one with a particular distribution function for $z$ under the condition N. It was also shown that the distribution for $z$ under N completely specifies the distribution for $z$ under SN. Even more can be shown, since Eq. 2.26 can be inverted to read

$$\omega[z(i) \mid N] = e^{-z(i)}\,\omega[z(i) \mid SN] \tag{2.35}$$

and the similar relation for probability density functions is

$$f(z \mid N) = e^{-z} f(z \mid SN) \tag{2.36}$$
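The link between slope and cut value (Eqs. 2.28, 2.29 and 2.34) can be illustrated with a short numerical sketch. It again assumes, purely for illustration, a jump-free N density that is normal with mean $-d/2$ and variance $d$; the names `roc_point` and `chord_slope` are hypothetical helpers. The ROC point for a cut $\beta$ is obtained by direct integration, and the slope of a short chord about $\beta$ is compared with $e^{\beta}$.

```python
import math

D = 2.0  # assumed separation parameter, chosen only for illustration


def f_n(z):
    """Assumed N density of z: normal, mean -D/2, variance D."""
    return math.exp(-(z + 0.5 * D) ** 2 / (2.0 * D)) / math.sqrt(2.0 * math.pi * D)


def integrate(func, lo, hi, steps):
    """Midpoint-rule integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


def roc_point(beta, upper=40.0, steps=40000):
    """(x, y) from Eqs. 2.28 and 2.29 when the distributions have no jumps."""
    x = integrate(f_n, beta, upper, steps)
    y = integrate(lambda z: math.exp(z) * f_n(z), beta, upper, steps)
    return x, y


def chord_slope(beta, h=1e-3):
    """Slope of the ROC chord between the cuts beta + h and beta - h."""
    dx = integrate(f_n, beta - h, beta + h, 200)
    dy = integrate(lambda z: math.exp(z) * f_n(z), beta - h, beta + h, 200)
    return dy / dx


x0, y0 = roc_point(0.0)
log_slopes = {beta: math.log(chord_slope(beta)) for beta in (-1.0, 0.0, 1.0)}
print(x0, y0, log_slopes)
```

In this sketch the logarithm of each chord slope agrees with the cut value to within the chord length, which is the content of Eq. 2.34.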

Therefore the distribution of $z$ under SN completely specifies the distribution of $z$ under N. It has therefore been shown that every complete, convex, nonsingular ROC curve corresponds one-to-one with a particular distribution function for $z$ under the condition SN. Both correspondences relate to the same model; they are different ways of describing a sufficient part of the observation statistics, from which all of the remainder of the model may be obtained.

Another type of generalization is possible. Any class of functions that is in one-to-one correspondence with the distribution of $z$ under N will be in one-to-one correspondence with the set of complete, convex, and nonsingular ROC curves.

2.3 ROC Character, $\pi(z)$

The ROC character $\pi(z)$ introduced in this section is the principal function that will be used in the organization of ROC curves into families. The only property of the ROC character and its several related functions discussed in this section is the one-to-one relation with the distribution functions for $z$.

Definition: ROC character $\pi(z)$. Whenever the logarithm of the likelihood ratio, $z$, possesses probability density functions, the ROC character is defined as

$$\pi(z) = \sqrt{f(z \mid SN)\, f(z \mid N)}, \qquad |z| < \infty \tag{2.37}$$

The ROC character can be used to obtain the two probability densities used in its definition. Since the likelihood ratio of $z$ is $e^{z}$, that is

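$$f(z \mid SN) = e^{z} f(z \mid N) \tag{2.38}$$

it follows that

$$e^{+.5z}\,\pi(z) = \sqrt{e^{z} f(z \mid SN)\, f(z \mid N)} = \sqrt{f(z \mid SN)\, f(z \mid SN)} = f(z \mid SN) \tag{2.39}$$

In a similar manner

$$e^{-.5z}\,\pi(z) = \sqrt{e^{-z} f(z \mid SN)\, f(z \mid N)} = \sqrt{f(z \mid N)\, f(z \mid N)} = f(z \mid N) \tag{2.40}$$

These are summarized in the double notation equation.

$$e^{\pm .5z}\,\pi(z) = f(z \mid \substack{SN \\ N}) \tag{2.41}$$

When the distribution functions for $z$ contain discontinuities, the jump functions (discrete probabilities) are also needed to specify the distributions.

Definition: ROC character jump $\omega_\pi(z)$. Let $\omega(z \mid SN)$ and $\omega(z \mid N)$ be the jump functions for the distribution of $z$. The ROC character jump is defined as

$$\omega_\pi(z) = \sqrt{\omega(z \mid SN)\,\omega(z \mid N)}, \qquad |z| < \infty \tag{2.42}$$

Utilizing Eq. 2.26, it follows that

$$e^{\pm .5z}\,\omega_\pi(z) = \omega(z \mid \substack{SN \\ N}) \tag{2.43}$$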
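As a concrete instance of Eqs. 2.37 to 2.41, the following sketch forms the ROC character from a pair of densities and then recovers both densities from it. The normal pair used here (means $\mp d/2$, common variance $d$, with `D = 2.0`) is an assumption made for illustration; for that pair the character works out to $e^{-d/8}$ times a zero-mean normal density of variance $d$.

```python
import math

D = 2.0  # assumed separation parameter, chosen only for illustration


def normal_pdf(z, mean, var):
    """Normal probability density with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def f_n(z):
    """Assumed density of z under N."""
    return normal_pdf(z, -0.5 * D, D)


def f_sn(z):
    """Assumed density of z under SN; equals exp(z) * f_n(z) for this pair."""
    return normal_pdf(z, 0.5 * D, D)


def character(z):
    """ROC character of Eq. 2.37: root of the product of the two densities."""
    return math.sqrt(f_sn(z) * f_n(z))


points = (-2.5, -1.0, 0.0, 0.7, 3.0)
# Eq. 2.41: exp(+.5z) * pi(z) returns the SN density, exp(-.5z) * pi(z) the N density.
gap_sn = max(abs(math.exp(0.5 * z) * character(z) - f_sn(z)) for z in points)
gap_n = max(abs(math.exp(-0.5 * z) * character(z) - f_n(z)) for z in points)
# Closed form of the character for this particular normal pair.
gap_closed = max(abs(character(z) - math.exp(-D / 8.0) * normal_pdf(z, 0.0, D)) for z in points)
print(gap_sn, gap_n, gap_closed)
```

Because the character integrates to $e^{-d/8}$ rather than to one, it is not itself a probability density; only its exponentially weighted forms are.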

The ROC character is obtained from probability densities and, as will be formally established in Section 3.2, has many of the properties of a probability density. The function that is analogous to a probability distribution function is the integrated ROC character $\Pi(z)$.

Definition: Integrated ROC character $\Pi(z)$

$$\Pi(z_0) = \int_{-\infty}^{z_0} \pi(z)\,dz + \sum_{z \le z_0} \omega_\pi(z) \tag{2.44}$$

The integrated ROC character will be useful in Stieltjes integration, where it will appear written $d\Pi(z)$. As a memory aid, one may think of Eq. 2.44 as reading

$$\text{memory aid:}\quad d\Pi(z) = \pi(z)\,dz + \omega_\pi(z) \tag{2.45}$$

The distribution functions for $z$ can be expanded in density and jump function form.

$$F(z_0 \mid \substack{SN \\ N}) = \int_{-\infty}^{z_0} f(z \mid \substack{SN \\ N})\,dz + \sum_{z \le z_0} \omega(z \mid \substack{SN \\ N}) \tag{2.46}$$

Using Eqs. 2.41 and 2.43,

$$F(z_0 \mid \substack{SN \\ N}) = \int_{-\infty}^{z_0} e^{\pm .5z}\,\pi(z)\,dz + \sum_{z \le z_0} e^{\pm .5z}\,\omega_\pi(z) \tag{2.47}$$

The precise relation of these distribution functions to the integrated ROC character is given by Eq. 2.48

$$F(z_0 \mid \substack{SN \\ N}) = \int_{-\infty}^{z_0} e^{\pm .5z}\,d\Pi(z) \tag{2.48}$$

Although almost all of this work will be devoted to the distributions of $z$ (meaning, as always herein, the logarithm of the likelihood ratio), it will be convenient to have a function for arbitrary probability density functions that parallels the ROC character.

Definition: Root product density function $f_\pi(t)$. If $t$ is a random variable with probability density functions $f(t \mid SN)$ and $f(t \mid N)$, then the root product density function for $t$ is defined as

$$f_\pi(t) = \sqrt{f(t \mid SN)\, f(t \mid N)} \tag{2.49}$$

2.4 Remarks on the Distribution of z

This section contains four short remarks about the distribution of $z$ and their correspondingly short proofs. The first three have to do with the expected value of $z$ under the two conditions N and SN.

Remark: $E(z \mid N) \le 0$

Proof: (1)

$$E(\ell \mid N) = \int \ell\,dF(\ell \mid N) = \int \ell(X)\,dF(X \mid N) = \int dF(X \mid SN) = 1 \tag{2.50}$$

(2) $\ln \ell$ is a convex function, and hence lies on or below a

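tangent.

$$\therefore\ z \le z(1) + z'(1)(\ell - 1) = \ell - 1$$

$$E(z \mid N) \le E(\ell - 1 \mid N) = 0$$

Remark:

$$E(z \mid SN) \le \ln E(\ell \mid SN) \tag{2.51}$$

Proof: $\ln \ell$ is a convex function, lying on or below each of its tangents, so the expected value of the logarithm cannot exceed the logarithm of the expected value.

Remark:

$$E(z \mid SN) \ge 0 \tag{2.52}$$

Proof: Consider $g(\ell) = \ell \ln \ell$.

$$g(1) = 0; \qquad g'(\ell) = 1 + \ln \ell,\ \ g'(1) = 1; \qquad g''(\ell) = \ell^{-1},\ \ g'' > 0$$

$g(\ell)$ is "convex upward" and lies on or above any tangent.

$$\therefore\ g(\ell) \ge g(1) + g'(1)(\ell - 1), \qquad \ell \ln \ell \ge \ell - 1$$

$$E(\ell \ln \ell \mid N) \ge E(\ell - 1 \mid N) = 0$$

$$E(\ell \ln \ell \mid N) = E(\ln \ell \mid SN)$$

$$\therefore\ E(\ln \ell \mid SN) \ge 0 \qquad \text{Q.E.D.}$$

Whenever there are no jumps in the probability distribution functions for $z$, the ROC character contains all of the information, and there is no need to use the integrated ROC character. In that case, we can say something about the symmetry of the ROC character.

Remark: $\pi(z)$ is symmetric iff it is symmetric about $z = 0$

Proof: Assume $\pi(z) = \pi(c - z)$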

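$$1 = \int_{-\infty}^{\infty} f(z \mid N)\,dz = \int_{-\infty}^{\infty} e^{-.5z}\,\pi(z)\,dz = \int_{-\infty}^{\infty} e^{-.5z}\,\pi(c - z)\,dz = e^{-.5c} \int_{-\infty}^{\infty} e^{.5(c - z)}\,\pi(c - z)\,dz$$

$$= e^{-.5c} \int_{-\infty}^{\infty} e^{.5u}\,\pi(u)\,du = e^{-.5c} \int_{-\infty}^{\infty} f(z \mid SN)\,dz = e^{-.5c}$$

Since the last integral is unity, $e^{-.5c} = 1$, and therefore $c = 0$.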
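The first and third remarks of Section 2.4 are easy to confirm on a small discrete case. The sketch below uses a hypothetical three-outcome observation whose probabilities under N and SN are assumptions chosen for illustration; it evaluates the mean likelihood ratio under N (Eq. 2.50) and the mean of $z$ under each condition.

```python
import math

# Hypothetical three-outcome observation; both probability lists are assumptions.
P_N = [0.5, 0.3, 0.2]
P_SN = [0.2, 0.3, 0.5]

# Likelihood ratio and its logarithm z for each outcome.
ratios = [s / n for s, n in zip(P_SN, P_N)]
z_values = [math.log(r) for r in ratios]

mean_ratio_n = sum(n * r for n, r in zip(P_N, ratios))   # Eq. 2.50: equals 1
mean_z_n = sum(n * z for n, z in zip(P_N, z_values))     # first remark: not positive
mean_z_sn = sum(s * z for s, z in zip(P_SN, z_values))   # Eq. 2.52: not negative
print(mean_ratio_n, mean_z_n, mean_z_sn)
```

The two means of $z$ are equal in magnitude here only because the assumed SN probabilities are the N probabilities in reverse order; in general they differ in magnitude while keeping the signs stated in the remarks.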

CHAPTER III

ROC FAMILIES

3.1 Philosophical Orientation

The objective of the present work is to organize ROC curves into families. To do this, some restriction has to be made on the system of ROC curves to be considered. The first intent is to treat as large a system of ROC curves as possible. The second intent is to develop ROC families which have some generality while still requiring that each curve in the family can be specified by only one or two parameters. In this way, the determination of an ROC curve can be broken into a two-step process of determining the family, and then determining the one or two parameters necessary to specify the particular curve.

In this work I have been influenced by Karl Pearson, who wished to unify and codify the treatment of probability density functions of one real variable. Restricting himself to the class of unimodal functions, he worked with those probability densities described by the range of the random variable and the first four moments. His results were strongly related to the function which has come to be known as the Pearson ratio. For a probability density function $f(t)$, the Pearson ratio is

$$\text{Pearson ratio} = \frac{d}{dt}\bigl(\ln f(t)\bigr) = \frac{f'(t)}{f(t)} \tag{3.1}$$

Whenever the Pearson ratio is a rational function of degree no higher than a monomial over a quadratic, the density function is within the Pearson system. The specific type is based on the roots of the quadratic. The root of the numerator, the monomial, is the position of the mode of the density function. Pearson later expanded this work by the consideration of a quadratic over a cubic, which allowed inclusion of more complicated unimodal functions as well as bimodal functions.

The Pearson ratio plays two roles of special interest to this present work. First, the Pearson ratio defines the area of applicability of the Pearson classification system. That is, the mathematician dealing with a given probability density function could decide definitely whether that function falls within the Pearson system or not. Secondly, within the system the ratio provides a classification into families.

In attempting to codify ROC curves the present work is restricted to complete, convex, nonsingular ROC curves. The formal function that is proposed as a basis for classification is the ROC character. The classification of families of ROC curves would be a huge undertaking if it were not true that a link can be found that relates this classification to our heritage in probability theory and statistics. In a moment the relation of the ROC character to the Pearson ratio will be determined to see if the Pearson system can be used directly for classifying ROC characters. In the next section,

3.2, the relationship of the ROC character to all univariate probability density functions is obtained. This second link provides the basis for the classification of ROC families by ROC characters.

The relationship of the ROC character, $\pi(z)$, to the Pearson classes is obtained from the relationship of the SN and N density functions to the ROC character.

$$f(z \mid \substack{SN \\ N}) = e^{\pm .5z}\,\pi(z) \tag{2.41}$$

Divide through by the exponential

$$\pi(z) = e^{\mp .5z} f(z \mid \substack{SN \\ N}) \tag{3.2}$$

and take natural logarithms.

$$\ln \pi(z) = \ln f(z \mid \substack{SN \\ N}) \mp .5z \tag{3.3}$$

The functions are in convenient form to take the derivative with respect to the variable, $z$.

$$\frac{\pi'(z)}{\pi(z)} = \frac{f'(z \mid \substack{SN \\ N})}{f(z \mid \substack{SN \\ N})} \mp .5 \tag{3.4}$$

The Pearson ratio for the ROC character differs from the Pearson ratio for either density function by either plus or minus one-half. It is not enough, however, that each of these Pearson ratios be a rational

function. If the denominator is of higher order than the numerator, then the inclusion of the added constant in obtaining a new rational function will generally lead to numerator and denominator having the same order. It is quite common to find that the distribution for the logarithm of the likelihood ratio under one of the conditions does fall in a Pearson class, while the distribution of the logarithm of the likelihood ratio under the other condition falls in either a different Pearson class or outside the Pearson system. Only when the denominator is of equal or lower order than the numerator will the two distributions of likelihood ratio and the ROC character stay within the Pearson system. In this special case, all three will be in the same Pearson class.

The Pearson ratio for a normal probability density is

$$\text{Normal Pearson ratio} = \frac{\text{mean} - z}{\text{variance}} \tag{3.5}$$

Section 1.3 contains W. W. Peterson's result, "When the logarithm of the likelihood ratio is normally distributed under one condition, it is also normally distributed under the other condition." We can add to this at this time, by adjoining "and the ROC character will be proportional to a normal probability density function and centered halfway between the two conditional density functions."

3.2 Fundamental Theorem on Regular ROC Characters

In this section it will be shown that ROC characters for regular ROC curves correspond to those probability distributions on the real line which possess moment generating functions. Such distributions have received considerable study in the history of probability theory. All probability distributions on the real line possess characteristic functions (Fourier transforms), but not all possess moment generating functions (double-sided Laplace transforms); see, for example, Ref. 30, p. 12. Those distributions that possess both will have finite moments of all orders, and their transforms will be analytic in some neighborhood of the origin.

In order to consider both discrete probability functions and probability density functions, we work with probability distribution functions, denoted by $F$, and the integrated ROC character, $\Pi(z)$. For discrete random variables the individual probabilities are the jumps in the distribution. For continuous random variables the probability density function is the derivative of the distribution, and correspondingly the ROC character $\pi(z)$ is the derivative of $\Pi(z)$.

From Character to Distribution: As is customary in probability notation, the limiting values for large negative and large positive values

are indicated by the arguments $-\infty$ and $+\infty$. The four requirements on the integrated ROC character for regular ROC curves are:

(1) the lower limit is zero

$$\Pi(-\infty) = 0 \tag{3.6}$$

(2) monotone growth

$$d\Pi(z) \ge 0 \tag{3.7}$$

(3) unit "N" value

$$\int_{-\infty}^{\infty} e^{-.5z}\,d\Pi(z) = 1 \tag{3.8}$$

(4) unit "SN" value

$$\int_{-\infty}^{\infty} e^{+.5z}\,d\Pi(z) = 1 \tag{3.9}$$

Add the equalities for the third and fourth conditions to obtain

$$\int_{-\infty}^{\infty} \left[e^{-.5z} + e^{+.5z}\right] d\Pi(z) = 2 \tag{3.10}$$

Since the sum $(e^{-.5z} + e^{+.5z})$ is greater than or equal to 2 for all $z$ values,

$$\Pi(+\infty) = \int_{-\infty}^{\infty} d\Pi(z) \le 1 \tag{3.11}$$

Let $c$ be the reciprocal of $\Pi(+\infty)$.

$$c\,\Pi(+\infty) = 1 \tag{3.12}$$

This has provided the basis for the distribution function that is proportional to the integrated ROC character, and for the proof that the corresponding moment generating function exists in some neighborhood of the origin. Consider the function

$$F(z) = c\,\Pi(z) \tag{3.13}$$

From (3.6), (3.7) and (3.13) it follows that

$$F(-\infty) = 0, \qquad F(+\infty) = 1, \qquad F \text{ is monotone nondecreasing},\ dF(z) \ge 0 \tag{3.14}$$

Therefore $F$ defined by (3.13) is a distribution function.

The definition of the moment generating function is usually given using the expected value operator, $E(\,)$.

$$M_F(\xi) = E(e^{\xi t}) \tag{3.15}$$

This may be evaluated using the distribution function

$$M_F(\xi) = \int_{-\infty}^{\infty} e^{\xi t}\,dF(t) \tag{3.16}$$

If the distribution function contains no jumps, the density function may be used.

$$M_F(\xi) = \int_{-\infty}^{\infty} e^{\xi t} f(t)\,dt \tag{3.17}$$

$M_F(\xi)$ is called the moment generating function because

$$\left.\frac{d^{n} M_F(\xi)}{d\xi^{n}}\right|_{\xi = 0} = E(t^{n}) \tag{3.18}$$
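Equation 3.18 can be illustrated with finite differences. The sketch below assumes the exponential density $f(t) = e^{-t}$, $t > 0$, whose moment generating function is $(1 - \xi)^{-1}$ for $\xi < 1$ and whose first two moments are 1 and 2; the step size `H` is an arbitrary choice made for this illustration.

```python
def mgf_exponential(xi):
    """Moment generating function of f(t) = exp(-t), t > 0; valid for xi < 1."""
    return 1.0 / (1.0 - xi)


H = 1e-3  # assumed finite-difference step

# Central differences approximating the first and second derivatives at xi = 0.
first_moment = (mgf_exponential(H) - mgf_exponential(-H)) / (2.0 * H)
second_moment = (mgf_exponential(H) - 2.0 * mgf_exponential(0.0) + mgf_exponential(-H)) / H ** 2
print(first_moment, second_moment)
```

The differences approximate $E(t)$ and $E(t^{2})$ to within terms of order `H` squared, which is adequate for a check of this kind.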

Only those random variables which are sufficiently "concentrated" will possess a moment generating function and finite moments of all orders. A simple dominance method can be used to show that the moment generating function exists near the origin, $\xi = 0$. For any real $\xi$ with absolute value not greater than one-half,

$$M_F(\xi) = \int_{-\infty}^{\infty} e^{\xi z}\,dF(z) \tag{3.19}$$

$$= c \int_{-\infty}^{\infty} e^{\xi z}\,d\Pi(z) \tag{3.20}$$

$$\le c \int_{-\infty}^{\infty} \left[e^{\xi z} + e^{-\xi z}\right] d\Pi(z) \tag{3.21}$$

$$\le c \int_{-\infty}^{\infty} \left[e^{.5z} + e^{-.5z}\right] d\Pi(z) = 2c \tag{3.22}$$

From Distribution to Character. To satisfy the requirements on the integrated ROC character, specifically the unit N and unit SN values listed in Eqs. 3.8 and 3.9, $\Pi(z)$ may differ from a parent distribution by a shift and a scale factor. Start with any probability distribution function, $F(t)$, which possesses a moment generating function, $M_F(\xi)$. Later, values will be specified for the constants $a$, $b$, and $c$, the latter two positive. For the moment consider

$$z = a + bt \tag{3.23}$$

The moment generating function for the random variable $z$ is

$$E[e^{\xi z}] = E\left[e^{\xi(a + bt)}\right] = e^{a\xi} M_F(b\xi) \tag{3.24}$$

that is

$$\int e^{\xi z}\,dF\!\left(\frac{z - a}{b}\right) = e^{a\xi} M_F(b\xi) \tag{3.25}$$

The specific values of interest are

$$\int e^{\pm .5z}\,dF\!\left(\frac{z - a}{b}\right) = e^{\pm .5a} M_F(\pm .5b) \tag{3.26}$$

For any positive value of $b$ which is sufficiently small so that the function $M_F(\pm .5b)$ exists, set

$$a = \ln M_F(-.5b) - \ln M_F(+.5b) \tag{3.27}$$

$$c = \left[M_F(-.5b)\, M_F(+.5b)\right]^{.5} \tag{3.28}$$

The decimal exponent .5, the fractional exponent 1/2, and the square root symbol all mean the same thing. The decimal exponent is convenient for the type of manipulations in this present work.

If the constants have been chosen correctly, then the following is an integrated ROC character

$$\Pi(z) = c^{-1} F\!\left(\frac{z - a}{b}\right) \tag{3.29}$$
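The choice of constants in Eqs. 3.27 and 3.28 can be tried out numerically. The sketch below is an illustration under stated assumptions: the parent distribution is taken to be the exponential $F(t) = 1 - e^{-t}$ with moment generating function $(1 - \xi)^{-1}$, the scale is set to $b = 1$, and the helper names are hypothetical. It differentiates Eq. 3.29 to get the density of $\Pi(z)$ and integrates it against $e^{\mp .5z}$ to confirm the unit N and unit SN values of Eqs. 3.8 and 3.9.

```python
import math


def mgf_exponential(s):
    """Moment generating function of f(t) = exp(-t), t > 0; valid for s < 1."""
    return 1.0 / (1.0 - s)


def character_constants(mgf, b):
    """Shift a (Eq. 3.27) and normalizer c (Eq. 3.28) for a scale factor b."""
    a = math.log(mgf(-0.5 * b)) - math.log(mgf(0.5 * b))
    c = math.sqrt(mgf(-0.5 * b) * mgf(0.5 * b))
    return a, c


def unit_value(sign, a, b, c, span=80.0, steps=200000):
    """Midpoint-rule integral of exp(sign * .5 * z) dPi(z) over a < z < a + span."""
    h = span / steps
    total = 0.0
    for k in range(steps):
        z = a + (k + 0.5) * h
        density = math.exp(-(z - a) / b) / (b * c)  # derivative of c^-1 F((z - a) / b)
        total += math.exp(sign * 0.5 * z) * density * h
    return total


B_SCALE = 1.0  # assumed scale; any 0 < b < 2 keeps both MGF values finite
A_SHIFT, C_NORM = character_constants(mgf_exponential, B_SCALE)
n_value = unit_value(-1.0, A_SHIFT, B_SCALE, C_NORM)   # Eq. 3.8
sn_value = unit_value(+1.0, A_SHIFT, B_SCALE, C_NORM)  # Eq. 3.9
print(A_SHIFT, C_NORM, n_value, sn_value)
```

The shift $a$ balances the two exponentially weighted integrals against each other, and the factor $c$ then brings their common value to one; neither constant alone would satisfy both requirements.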

If the distribution function $F$ is the integral of a probability density function, $f$, then

$$\pi(z) = (bc)^{-1} f\!\left(\frac{z - a}{b}\right) \tag{3.30}$$

The final step is to determine that Eqs. 3.8 and 3.9 hold. From (3.26) and (3.27),

$$\int e^{\pm .5z}\,dF\!\left(\frac{z - a}{b}\right) = \left[M_F(-.5b)\right]^{\pm .5} \left[M_F(+.5b)\right]^{\mp .5} M_F(\pm .5b) \tag{3.31}$$

$$\int e^{\pm .5z}\,dF\!\left(\frac{z - a}{b}\right) = \left[M_F(-.5b)\, M_F(+.5b)\right]^{.5} = c \tag{3.32}$$

From (3.29),

$$\int e^{\pm .5z}\,d\Pi(z) = c^{-1} \int e^{\pm .5z}\,dF\!\left(\frac{z - a}{b}\right) \tag{3.33}$$

From (3.32) and (3.33),

$$\int e^{\pm .5z}\,d\Pi(z) = c^{-1} c = 1 \tag{3.34}$$

There is a continuous range of $b$ values. Corresponding to each $b$ value are appropriate constants $a$ and $c$. Using these values, the simple scaling, Eq. 3.23, generates a family of (integrated) ROC characters.

The key relations are repeated for later reference. Given a distribution function $F(t)$, or a moment generating function $M_F(\xi)$, for any positive value of $b$ sufficiently small

that $M_F(\pm .5b)$ exists, let

$$a = \ln M_F(-.5b) - \ln M_F(+.5b) \tag{3.27}$$

$$c = \left[M_F(-.5b)\, M_F(+.5b)\right]^{.5} \tag{3.28}$$

Then

$$\Pi(z) = c^{-1} F\!\left(\frac{z - a}{b}\right) \tag{3.29}$$

$$\pi(z) = (bc)^{-1} f\!\left(\frac{z - a}{b}\right) \tag{3.30}$$

are a valid integrated ROC character and ROC character, respectively.

3.3 Classical ROC Characters

The development of a system of ROC families will begin with probability density functions of a real variable which have moment generating functions. Each such probability density function corresponds to some ROC character. If the probability density function is well known and has a universally accepted name, that same name will be used for both the ROC character and the resultant ROC family of curves. Certain classes of probability density function are known by several different names because they were independently developed in several fields of application. An example is the normal, Gaussian, or error function. An attempt has been made to relate this present work to each of the names used in the literature.

Armed with a table of probability density functions and

their moment generating functions, one may mechanically generate classes of ROC characters. The ROC curves are found by direct integration. The basic equations for the ROC curve in terms of the z-axis cut value $\beta$ and the randomizing parameter $r$ are given by Eqs. 2.28 and 2.29. When $\beta$ is a point of continuity, or otherwise when $r = 0$, the ROC curve equations can be written compactly using the form of Eq. 2.48.

$$\substack{y(\beta) \\ x(\beta)} = \int_{\beta}^{\infty} e^{\pm .5z}\,d\Pi(z) \tag{3.35}$$

The following sections list the results for the major classical probability density functions. Some mechanical comments are in order before proceeding with these.

(1) The fundamental theorem of ROC characters allows for a translation and scaling of the variable.

$$z = a + bt \tag{3.23}$$

An entire probability density function class may usually be considered by using the normalized form for the density function, having zero mean and unit variance, since Eq. 3.23 rescales the mean and the variance.

(2) Bounded probability density functions on a bounded range will always have moment generating functions. Many of these correspond

to sections of probability density functions with infinite or semi-infinite range. Section 3.4 will treat these separately. The beta and rectangular density functions are not sections of other probability density functions, and are included in the following sections.

(3) The table upon which the following sections are based is Table 26.1 of Ref. 6.

3.3.1 Normal, Gaussian, Error Function. For $-\infty < t < \infty$

$$f(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}, \qquad M_F(\xi) = e^{\xi^{2}/2}$$

$$M_F(\pm .5b) = e^{b^{2}/8} \quad \text{exist for all } b > 0$$

$$a = 0, \qquad c = e^{b^{2}/8}$$

so

$$\Pi(z) = e^{-b^{2}/8}\,\Phi\!\left(\frac{z}{b}\right) = e^{-b^{2}/8} \int_{-\infty}^{z/b} \frac{1}{\sqrt{2\pi}}\, e^{-u^{2}/2}\,du$$

and

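$$\pi(z) = e^{-b^{2}/8}\, \frac{1}{b\sqrt{2\pi}}\, e^{-z^{2}/(2b^{2})}$$

For standard notation for the normal ROC we let $b^{2} = d$

$$\pi(z) = e^{-d/8}\, \frac{1}{\sqrt{2\pi d}}\, e^{-z^{2}/(2d)}$$

The ROC curve is $y = \Phi(t + \sqrt{d})$ when $x = \Phi(t)$. $\Phi(\,)$ is the normal distribution function.

3.3.2 Exponential, Pearson X. For $t > 0$

$$f(t) = e^{-t}, \qquad F(t) = 1 - e^{-t}, \qquad M_F(\xi) = (1 - \xi)^{-1}$$

$$M_F(\pm .5b) = (1 \mp .5b)^{-1} \quad \text{exists for } 0 < b < 2$$

$$a = \ln \frac{1 - .5b}{1 + .5b}, \qquad c = (1 - .25b^{2})^{-.5}$$

$$\Pi(z) = c^{-1}\left[1 - e^{-(z - a)/b}\right], \qquad z > a$$

$$\pi(z) = (bc)^{-1}\, e^{a/b}\, e^{-z/b}$$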

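$$\pi(z) = b^{-1} (1 - .25b^{2})^{.5} \left(\frac{1 - .5b}{1 + .5b}\right)^{1/b} e^{-z/b}, \qquad z > \ln \frac{1 - .5b}{1 + .5b}$$

If we let

$$b = 2\,\frac{A - 1}{A + 1}, \quad \text{that is, let} \quad A = \frac{1 + .5b}{1 - .5b}, \quad A > 1$$

then

$$\pi(z) = (A - 1)^{-1} A^{-1/(A - 1)} \exp\!\left[-\frac{A + 1}{2(A - 1)}\, z\right], \qquad z > -\ln A$$

or

$$\pi(z) = \frac{A^{.5}}{A - 1} \exp\!\left[-\frac{A + 1}{2(A - 1)}\,(z - z_0)\right], \qquad z > z_0 = -\ln A$$

The ROC curve is

$$X = Y^{A}$$

3.3.3 Laplace, Double Exponential. For $-\infty < t < \infty$

$$f(t) = .5\, e^{-|t|}, \qquad M_F(\xi) = (1 - \xi^{2})^{-1}$$

$$M_F(\pm .5b) = (1 - .25b^{2})^{-1} \quad \text{exists for } 0 < b < 2$$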

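$$a = 0, \qquad c = (1 - .25b^{2})^{-1}$$

$$\pi(z) = (2bc)^{-1}\, e^{-|z|/b} = \frac{1 - .25b^{2}}{2b}\, e^{-|z|/b}$$

If we let

$$b = 2\,\frac{B - 1}{B + 1}, \quad \text{that is, let} \quad B = \frac{2 + b}{2 - b}, \quad B > 1$$

then

$$\pi(z) = \frac{B}{B^{2} - 1} \exp\!\left[-\frac{B + 1}{2(B - 1)}\, |z|\right]$$

The ROC curve is

$$X = (B + 1)^{B - 1} B^{-B}\, Y^{B}$$

from $(0, 0)$ to the negative diagonal, and, by the symmetry of the character,

$$1 - Y = (B + 1)^{B - 1} B^{-B}\, (1 - X)^{B}$$

from the negative diagonal to $(1, 1)$. The negative diagonal point is

$$X = \frac{1}{B + 1}, \qquad Y = \frac{B}{B + 1}$$

3.3.4 Pearson Type III, Gamma, Chi-Square. For $t > 0$, $p > -1$

$$f(t) = \frac{t^{p} e^{-t}}{\Gamma(p + 1)}, \qquad M_F(\xi) = (1 - \xi)^{-(p + 1)}$$

$$M_F(\pm .5b) = (1 \mp .5b)^{-(p + 1)} \quad \text{exists for } 0 < b < 2$$

$$a = (p + 1) \ln \frac{1 - .5b}{1 + .5b}, \qquad c = (1 - .25b^{2})^{-.5(p + 1)}$$

$$\pi(z) = \frac{(1 - .25b^{2})^{.5(p + 1)}}{b\,\Gamma(p + 1)} \left(\frac{z - a}{b}\right)^{p} e^{-(z - a)/b}, \qquad z > a$$

As before, let $b = 2(B - 1)/(B + 1)$

$$\pi(z) = \frac{B^{-(p + 1)/(B - 1)}}{(B - 1)^{p + 1}\,\Gamma(p + 1)} \left[z + (p + 1) \ln B\right]^{p} \exp\!\left[-\frac{B + 1}{2(B - 1)}\, z\right], \qquad z > -(p + 1) \ln B = a$$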

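or

$$\pi(z) = \frac{B^{.5(p + 1)}}{(B - 1)^{p + 1}\,\Gamma(p + 1)}\, (z - z_0)^{p} \exp\!\left[-\frac{B + 1}{2(B - 1)}\,(z - z_0)\right], \qquad z > z_0 = -(p + 1) \ln B$$

3.3.5 Fisher-Tippett Type I. For $-\infty < t < \infty$

$$F(t) = e^{-e^{-t}}, \qquad M_F(\xi) = \Gamma(1 - \xi)$$

and

$$f(t) = e^{-t}\, e^{-e^{-t}}$$

$$M_F(\pm .5b) = \Gamma(1 \mp .5b) \quad \text{exists for } 0 < b < 2$$

$$a = \ln \Gamma(1 + .5b) - \ln \Gamma(1 - .5b), \qquad c = \left[\Gamma(1 + .5b)\,\Gamma(1 - .5b)\right]^{.5}$$

$$\pi(z) = (bc)^{-1} \exp\!\left[-\frac{z - a}{b} - e^{-(z - a)/b}\right]$$

3.3.6 Beta, Pearson I and XII. For $0 < t < 1$; $m, n \ge 1$

$$f(t) = \frac{t^{m - 1} (1 - t)^{n - 1}}{B(m, n)}, \qquad M_F(\xi) = M(m, m + n, \xi)$$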

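This is Kummer's confluent hypergeometric function, and yields a power series form for all $b$ for positive $n$, $m$

$$M_F(\pm .5b) = \sum_{k=0}^{\infty} (\pm 1)^{k}\, \frac{(m + k - 1)!\,(m + n - 1)!}{(m + n + k - 1)!\,(m - 1)!}\, \frac{b^{k}}{2^{k}\, k!}$$

This is so unwieldy that the general form is retained

$$\pi(z) = \left(bc\,B(m, n)\right)^{-1} \left(\frac{z - a}{b}\right)^{m - 1} \left(1 - \frac{z - a}{b}\right)^{n - 1}, \qquad a < z < a + b$$

where

$$a = \ln M_F(-.5b) - \ln M_F(+.5b), \qquad c = \left[M_F(-.5b)\, M_F(+.5b)\right]^{.5}$$

For $m = 1$ or $n = 1$ the infinite series becomes a finite sum, but no real simplification results. ROC curve formulae have not been obtained.

3.3.7 Rectangular. For $0 < t < 1$

$$f(t) = 1, \qquad M_F(\xi) = \xi^{-1}(e^{\xi} - 1)$$

$$M_F(\pm .5b) = \pm 2b^{-1}\left(e^{\pm .5b} - 1\right)$$

$$a = -.5b$$

$$c = 2b^{-1}\left(e^{.25b} - e^{-.25b}\right), \qquad bc = 2\left(e^{.25b} - e^{-.25b}\right)$$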

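If we let

$$b = 2 \ln(1 + R), \qquad R > 1$$

then

$$bc = 2R\,(1 + R)^{-.5}$$

$$\pi(z) = \frac{(1 + R)^{.5}}{2R}, \qquad |z| < \ln(1 + R)$$

The ROC curve is a symmetric rectangular hyperbola

$$(1 + Rx)(1 + R - Ry) = 1 + R, \qquad R > 1$$

The asymptotes lie outside the ROC unit square at a distance $R^{-1}$.

3.3.8 Geometric. For integer $t \ge 0$; $0 < q < 1$

$$w(t) = (1 - q)\, q^{t}, \qquad M(\xi) = \frac{1 - q}{1 - q e^{\xi}}$$

$$M(\pm .5b) = \frac{1 - q}{1 - q e^{\pm .5b}}, \qquad 0 < b < 2\,|\ln q|$$

$$a = \ln \frac{1 - q e^{+.5b}}{1 - q e^{-.5b}}$$

$$c = (1 - q)\left[1 + q^{2} - q\left(e^{.5b} + e^{-.5b}\right)\right]^{-.5}$$

$$\omega_\pi(z) = c^{-1} (1 - q)\, q^{(z - a)/b}, \qquad z = a,\ a + b,\ a + 2b,\ \ldots$$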

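$$\omega_\pi(z) = \left(1 + q^{2} - q e^{.5b} - q e^{-.5b}\right)^{.5} \exp\!\left[-\,|\ln q|\,\frac{z - a}{b}\right]$$

Write

$$z_0 = \ln \frac{1 - q e^{+.5b}}{1 - q e^{-.5b}} \quad \text{and} \quad z_k = z_0 + kb, \qquad k = 0, 1, 2, \ldots$$

Let

$$A = \frac{\ln q - .5b}{\ln q + .5b}$$

The ROC curve subscribes $X = Y^{A}$ and touches it at

$$X_k = \left(q e^{-.5b}\right)^{k}, \qquad Y_k = \left(q e^{+.5b}\right)^{k}, \qquad k = 0, 1, 2, \ldots$$

3.3.9 Binomial. For integer $0 \le t \le n$; $0 < p < 1$

$$w(t) = C_t^{n}\, p^{t} (1 - p)^{n - t}$$

where

$$C_t^{n} = \frac{n!}{t!\,(n - t)!} \quad \text{(binomial coefficients)}$$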

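Let

$$u = \ln \frac{p}{1 - p}$$

$$w(t) = (1 + e^{u})^{-n}\, C_t^{n}\, e^{ut}$$

$$M(\xi) = (1 + e^{u})^{-n} \left(e^{u + \xi} + 1\right)^{n}$$

$$a = n \ln \frac{e^{u - .5b} + 1}{e^{u + .5b} + 1}$$

$$c = (1 + e^{u})^{-n} \left[\left(e^{u + .5b} + 1\right)\left(e^{u - .5b} + 1\right)\right]^{.5n} = (1 + e^{u})^{-n} \left(e^{2u} + e^{u + .5b} + e^{u - .5b} + 1\right)^{.5n}$$

$$c = (1 + e^{u})^{-n}\, 2^{n}\, e^{.5un} \left(.5 \cosh u + .5 \cosh .5b\right)^{.5n}$$

$$\omega_\pi(z) = c^{-1} w(t) = \left[e^{u}\left(.5 \cosh u + .5 \cosh .5b\right)\right]^{-.5n}\, 2^{-n}\, C_t^{n}\, e^{ut}, \qquad z = a + bt$$

The term $2^{-n} C_t^{n}$ has a "bell-shaped" graph and sums over $t$ to unity. The symmetric case of $p = .5$, or $u = 0$, yields $a = -.5bn$ and

$$\omega_\pi(z) = \left[.5 + .5 \cosh .5b\right]^{-.5n}\, 2^{-n}\, C_t^{n}, \qquad z = b(t - .5n)$$

The ROC curve consists of $n + 1$ straight segments, with the slope changing by the factor $e^{b}$ at each vertex. The equation of the ROC must be obtained from the two-parameter table of binomial sums.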
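Although the binomial ROC has no compact equation, its vertices are simple to compute. The following sketch builds the binomial ROC character jumps from Eqs. 3.27 to 3.29 and accumulates the N and SN point probabilities from the largest $z$ downward; the parameter values and the function names `binomial_character` and `roc_vertices` are assumptions introduced for illustration.

```python
import math


def binomial_character(n, p, b):
    """Points z = a + b*t and ROC character jumps c^-1 w(t) for a binomial parent."""
    w = [math.comb(n, t) * p ** t * (1.0 - p) ** (n - t) for t in range(n + 1)]

    def mgf(s):
        # Binomial moment generating function, equal to (1+e^u)^-n (e^(u+s)+1)^n.
        return (1.0 - p + p * math.exp(s)) ** n

    a = math.log(mgf(-0.5 * b)) - math.log(mgf(0.5 * b))  # Eq. 3.27
    c = math.sqrt(mgf(-0.5 * b) * mgf(0.5 * b))           # Eq. 3.28
    return [a + b * t for t in range(n + 1)], [wt / c for wt in w]


def roc_vertices(z, jumps):
    """ROC vertices from (0, 0) to (1, 1), adding the largest z values first."""
    w_n = [math.exp(-0.5 * zi) * ji for zi, ji in zip(z, jumps)]   # N probabilities
    w_sn = [math.exp(0.5 * zi) * ji for zi, ji in zip(z, jumps)]   # SN probabilities
    x, y, points = 0.0, 0.0, [(0.0, 0.0)]
    for t in reversed(range(len(z))):
        x += w_n[t]
        y += w_sn[t]
        points.append((x, y))
    return points


N_TRIALS, P, B = 4, 0.3, 0.8  # assumed illustrative parameters
z_pts, jumps = binomial_character(N_TRIALS, P, B)
vertices = roc_vertices(z_pts, jumps)
slopes = [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(vertices, vertices[1:])]
print(vertices)
print(slopes)
```

Each segment slope equals $e^{z}$ at the corresponding point of the character, so successive slopes differ by the constant factor set by the spacing $b$ of the $z$ values.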

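3.3.10 Specific Binomial.

$n = 0$, one point: from the general binomial $a = 0$, $c = 1$, and $t = 0$

$$\therefore\ \pi(z) = c^{-1}\,w(t) = 1 \quad \text{and} \quad z = a = 0$$

This is the chance diagonal, $Y = X$.

$n = 1$, two points:

$$c = \frac{\bigl[e^{2u} + e^{u+.5b} + e^{u-.5b} + 1\bigr]^{.5}}{1 + e^{u}}, \qquad z_1 = \ln\frac{e^{b} + e^{u+.5b}}{1 + e^{u+.5b}}, \qquad \pi(z_1) = e^{u}\,\pi(z_0)$$

The ROC curve is called a "Luce-ROC" after Duncan Luce (Ref. 35).

$$Y = e^{z_1} X \quad \text{from } (0,0) \text{ to } \left(\frac{e^{u}}{e^{u} + e^{.5b}},\ \frac{e^{u}}{e^{u} + e^{-.5b}}\right)$$

$$Y = 1 - e^{z_0}\,(1 - X) \quad \text{from the vertex to } (1,1)$$

For this special case it is easier to write the ROC curve as the line segment from $(0,0)$ to $(X_0, Y_0)$ and the line segment from $(X_0, Y_0)$ to $(1,1)$, and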

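$$z_0 = \ln\frac{1 - Y_0}{1 - X_0}, \qquad \pi(z_0) = \bigl[(1 - Y_0)(1 - X_0)\bigr]^{.5}$$

$$z_1 = \ln\frac{Y_0}{X_0}, \qquad \pi(z_1) = \bigl[X_0\,Y_0\bigr]^{.5}$$

3.3.11 Negative-Binomial, Pascal, Pólya. For integer $t \ge 0$; $0 < p < 1$, and $q = 1 - p$

$$w(t) = C_t^{\,t+r-1}\, p^{r}\,(1-p)^{t}, \qquad M(\theta) = p^{r}\,(1 - qe^{\theta})^{-r}$$

$$M(\pm .5b) = p^{r}\,(1 - qe^{\pm .5b})^{-r}, \qquad 0 < b < 2\,|\ln q|$$

$$a = r \ln\frac{1 - qe^{.5b}}{1 - qe^{-.5b}}, \qquad c = p^{r}\,\bigl[1 + q^2 - qe^{.5b} - qe^{-.5b}\bigr]^{-.5r}$$

$$\pi(z) = \bigl[1 + q^2 - qe^{.5b} - qe^{-.5b}\bigr]^{.5r}\; C_t^{\,t+r-1}\,(1-p)^{t}, \qquad t = 0, 1, 2, \ldots \ \text{ and } \ z = a + bt$$

The extensive use of tables is necessary to plot the ROC curves.

3.3.12 Poisson. For integer $t \ge 0$; $m > 0$

$$w(t) = \frac{e^{-m}\, m^{t}}{t!}$$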

$$M(\theta) = e^{m(e^{\theta} - 1)}, \qquad M(\pm .5b) = e^{m(e^{\pm .5b} - 1)}$$

$$a = -2m \sinh .5b, \qquad c = e^{2m[\sinh .25b]^{2}}$$

$$\pi(z) = e^{-m\cosh .5b}\; \frac{m^{t}}{t!}, \qquad t = 0, 1, 2, \ldots, \qquad z = bt - 2m \sinh .5b$$

The ROC curves can be obtained from chi-square tables, using the identities

$$\sum_{t=k}^{\infty} e^{-me^{\pm .5b}}\, \frac{(me^{\pm .5b})^{t}}{t!} = P\bigl(\chi^2 \le 2me^{\pm .5b} \,\big|\, \nu = 2k\bigr)$$

Many of the classical distribution functions for continuous and discrete variables do not possess moment generating functions. Among these are the Cauchy distribution and generalizations of the Cauchy distribution, the Pearson Types IV, VI, and XI, the F distribution and the Student's t distribution. These cover a semi-infinite or an infinite range, and have "tails" which go to zero as some power of the random variable.
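The Poisson relations of Section 3.3.12 can be checked numerically. The sketch below is only an illustration: the function names and the values of m and b are assumptions introduced here, and the Poisson tail is summed directly rather than read from chi-square tables as the text suggests. It uses the fact that multiplying the character by $e^{\mp .5z}$ gives Poisson probabilities with means $me^{\mp .5b}$.

```python
import math

def poisson_character(t, m, b):
    """ROC character jump at z = b*t - 2*m*sinh(b/2) (Section 3.3.12)."""
    return math.exp(-m * math.cosh(0.5 * b)) * m ** t / math.factorial(t)

def log_likelihood_ratio(t, m, b):
    """z as a function of the integer observation t."""
    return b * t - 2.0 * m * math.sinh(0.5 * b)

def poisson_tail(mu, k):
    """P(T >= k) for T ~ Poisson(mu): one minus the sum of the first k terms."""
    term = math.exp(-mu)      # P(T = 0)
    below = 0.0
    for t in range(k):
        below += term
        term *= mu / (t + 1)  # P(T = t + 1) from P(T = t)
    return 1.0 - below

def poisson_roc(m, b, kmax):
    """ROC vertices (X_k, Y_k) for the rule 'respond A when t >= k'."""
    mu_n = m * math.exp(-0.5 * b)    # Poisson mean under N
    mu_sn = m * math.exp(+0.5 * b)   # Poisson mean under SN
    return [(poisson_tail(mu_n, k), poisson_tail(mu_sn, k))
            for k in range(kmax + 1)]

# Hypothetical parameter values, chosen only for illustration.
points = poisson_roc(m=2.0, b=1.0, kmax=8)
for x, y in points:
    print(f"X = {x:.4f}   Y = {y:.4f}")
```

Joining successive vertices with straight segments gives the ROC curve for this discrete character.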

3.4 Truncated ROC Characters and the Metastatic Transformation

ROC curves which approach the origin at some slope other than infinity or approach the point (1, 1) at some slope greater than zero correspond to ROC characters whose z-ranges are bounded away from $+\infty$ or $-\infty$, respectively. There is a functional relationship between many of these characters and those corresponding to ROC curves covering the complete range of slope from infinity to zero. There is similarly a geometric relation between the two types of ROC curves. This correspondence shall be called the metastatic relation. The word comes from the medical term metastasis, meaning a portion of something which has broken off and grows in a new place, yet retains its original character.

The metastatic transformation will be used in Section 3.5 to relate each nonregular but complete and convex ROC curve and character to a regular ROC curve and character. This will extend the classification system based on associated distributions to these nonregular ROC.

3.4.1 The Transformation. The algebra of the transformation and the geometric interpretation will be discussed at each step in the following development. Figure 3.1 displays a particular regular ROC curve; Eq. 3.36 is a formal representation of the corresponding functional relationship.

First ROC curve:

$$Y_1 = Y_1(X_1) \tag{3.36}$$

Fig. 3.1. A particular ROC curve

Also sketched in Fig. 3.1 are the coordinate lines at $X_1 = a$, $X_1 = b$ and $Y_1 = Y_1(a)$, $Y_1 = Y_1(b)$. These form a rectangle, with the first ROC curve passing through the lower left and upper right corners. Between these two corners the arc of the first ROC curve is interior to the rectangle, because the first ROC curve is convex.

The metastatic transformation is the mapping of this rectangle onto a unit ROC square. Geometrically, the rectangle is removed, and the coordinate axes uniformly stretched to extend from zero to one. The transformed arc of the first ROC curve is the second ROC curve, the metastatic image. Is the second ROC regular? Yes, it is complete, convex, and interior to the unit ROC square

except at (0, 0) and (1, 1). An equation for the new ROC curve, $(X_2, Y_2)$, is

$$X_2 = \frac{X_1 - a}{b - a}, \qquad Y_2 = \frac{Y_1(X_1) - Y_1(a)}{Y_1(b) - Y_1(a)}, \qquad a \le X_1 \le b \tag{3.37}$$

When a point $(X_1, Y_1)$ with slope $e^{z_1}$ maps onto a point $(X_2, Y_2)$ the slope is multiplied by the ratio of the Y to X axis expansion. Formally

$$e^{z_2} = \frac{dY_2}{dX_2} = e^{z_1}\, \frac{b - a}{Y_1(b) - Y_1(a)} \tag{3.38}$$

Therefore

$$z_2 = z_1 + \ln\frac{b - a}{Y_1(b) - Y_1(a)} \tag{3.39}$$

The second z-axis is simply a translation or horizontal shift of the first z-axis.

The ROC character for the second ROC curve can be obtained as follows. The formal relationship between the ROC character and the probability density function of the log likelihood ratio, conditional to the condition SN, is

$$\pi_2(z_2) = e^{-.5z_2}\, f(z_2 \mid SN) \tag{3.40}$$

This probability density function can be found by differentiating the corresponding distribution function, $1 - Y_2$.

$$f(z_2 \mid SN) = \frac{d(1 - Y_2)}{dz_2} = \frac{d(1 - Y_2)}{d(1 - Y_1)}\; \frac{d(1 - Y_1)}{dz_1}\; \frac{dz_1}{dz_2} = \frac{f(z_1 \mid SN)}{Y_1(b) - Y_1(a)} \tag{3.41}$$

The probability density function of the first log likelihood ratio in terms of its ROC character is

$$f(z_1 \mid SN) = e^{.5z_1}\, \pi_1(z_1) \tag{3.42}$$

From Eq. 3.39, which relates the two z-axes,

$$e^{.5z_2} = e^{.5z_1} \left[\frac{b - a}{Y_1(b) - Y_1(a)}\right]^{.5} \tag{3.43}$$

Equations 3.40 through 3.43 lead to the equation for the second ROC character

$$\pi_2(z_2) = \frac{\pi_1\!\left(z_2 + \ln\dfrac{Y_1(b) - Y_1(a)}{b - a}\right)}{\bigl[(b - a)\,\bigl(Y_1(b) - Y_1(a)\bigr)\bigr]^{.5}} \tag{3.44}$$
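As a numerical illustration of Eq. 3.44, the sketch below applies the transformation to the arc of a normal ROC curve running from (0, 0) to the negative diagonal, the case discussed with Fig. 3.2. It assumes the normal ROC character of Chapter IV with index d = 2.828; the function names are introduced here for illustration and are not from the report. The vertical scale it prints should come out close to the factor of 2.5 quoted for that example.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def normal_character(z, d):
    """Normal ROC character with index d (Eq. 4.1)."""
    return math.exp(-d / 8.0 - z * z / (2.0 * d)) / math.sqrt(2.0 * math.pi * d)

def metastatic_constants(a, b, ya, yb):
    """Return (z shift, vertical scale) for the arc between X1 = a and X1 = b."""
    shift = math.log((yb - ya) / (b - a))         # z1 = z2 + shift (Eq. 3.39)
    scale = 1.0 / math.sqrt((b - a) * (yb - ya))  # denominator of Eq. 3.44
    return shift, scale

def metastatic_character(z2, d, shift, scale):
    """Second ROC character, Eq. 3.44, for a normal first character."""
    return scale * normal_character(z2 + shift, d)

d = 2.828
xb = PHI.cdf(-0.5 * math.sqrt(d))  # X1 on the negative diagonal (z1 = 0)
yb = PHI.cdf(+0.5 * math.sqrt(d))  # Y1 on the negative diagonal (z1 = 0)
shift, scale = metastatic_constants(0.0, xb, 0.0, yb)
print("vertical scale:", round(scale, 3))
print("z-axis shift:  ", round(shift, 3))
```

The second character is defined only for $z_2 \ge -\text{shift}$, which is the image of $z_1 \ge 0$.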

Let us draw a parallel between the geometric transformation of the ROC curves and the algebraic changes in the ROC character. Both a horizontal and vertical scaling convert the rectangle to the unit square. In Eq. 3.44 there is a vertical scaling given by the denominator, and a horizontal translation, not scaling, given by the added constant in the argument of the numerator. The second parallel has not been explicit. Only an arc of the original ROC curve was transformed, not the entire curve. The bounds on the $z_2$ axis correspond to the upper and lower bounds on the $z_1$ variable at the two ends of the arc by Eq. 3.39. In summary, the ROC curve was cut, and a small portion selected. This parallels the condition that only a portion of the original character is used by the metastatic character. The ROC curve was re-scaled to fit the unit square; the ROC character has been re-scaled according to its necessary properties.

In Fig. 3.2 are two specific ROC curves and their corresponding characters. The first is a normal ROC curve with index d of 2.828. Above it is the normal ROC character, a curve proportional to the normal density function, centered symmetrically about zero and extending from $-\infty$ to $+\infty$. The metastatic transformation chosen to illustrate the process was the part of the arc from (0, 0) to the negative diagonal. This corresponds to the $z_1$ axis from zero to infinity. A metastatic transformation maps the dotted rectangle to the unit square. The transformation on the ROC character truncates the

Fig. 3.2. Example of metastatic transformation. Left: normal ROC character $\pi_1(z)$ above the normal ROC (d = 2.82), with the dotted zone marking the arc used to obtain the metastasis. Right: metastatic character $\pi_2(z)$ above the half-normal ROC (D = 2.82).

ROC character on the left, re-scales it vertically (the scaling is a factor of 2.5 for this specific example), and translates the axis horizontally the appropriate amount. The resulting ROC curve and character are shown on the right in Fig. 3.2. The ROC curves are repeated in Fig. 3.3 on normal-normal probability paper.

Figure 3.4 shows ROC curves before and after a metastatic transformation from normal with d = 6.55. The metastatic ROC curve is the image of the original ROC curve from the (0, 0) point to the negative diagonal. Both of these metastatic "half-normal" ROC curves appear to be binormal ROC curves when plotted on normal-normal paper.

3.4.2 Self-Metastatic Families. Certain families of ROC curves are self-metastatic; that is, a metastatic transformation of one character will yield a second character which is in the same class. The most startling of these cases is the pure power ROC curve with exponential ROC character (see Section 3.3.2). This is the type corresponding to the ROC curve

$$X = Y^{A} \tag{3.45}$$

A metastatic transformation of the arc of the power ROC curve from the point (0, 0) to any point along the curve yields the same curve as the original. The formal proof is in Appendix B.

The rectangular hyperbola ROC curve has a rectangular

Fig. 3.3. Replot of Fig. 3.2 on normal-normal probability paper. Labeled points: (.02, .357), (.01, .26), (.002, .115).

Fig. 3.4. Second example of metastatic transformation

(flat) ROC character. Such a character, being a simple rectangle, is symmetric about its midpoint. The midpoint must be z = 0 since this is the symmetry point for all symmetric ROC characters (see Section 2.4). If a flat character is truncated either on the right or on the left, or both, and then the translation and rescaling dictated by the metastatic transform rules are performed, the character will again be flat and symmetrically placed about z = 0. Since the character will not span as much of the z axis as the original character did, the new ROC curve will be different from the first; it will still be a rectangular hyperbola. The more that is removed from the original character, the poorer the resulting right hyperbola ROC curve. As a family, rectangular ROC are self-metastatic.

It will be shown in Chapter V that the ROC character for the conic class is

$$\pi(z) = K\,\bigl(Ae^{z} + 2B + Ce^{-z}\bigr)^{-1.5} \tag{3.46}$$

where the range of the log likelihood ratio, z, depends on the specific curve being considered. The coefficients A, B, and C are the first three coefficients in the classical form of a conic section. The translation of the z axis, Eq. 3.39, will be absorbed into the new coefficients A and C, magnifying one and diminishing the other proportionately. The discriminant for a conic section in classic form is $B^2 - AC$. Any translation of z, although changing the coefficients A

and C, will leave the value of the discriminant unchanged. Not only will the individual subtypes of conic section (hyperbola, parabola, ellipse) be self-metastatic families, but each value of the discriminant will correspond to a self-metastatic subfamily.

3.5 Nonsingular, Nonregular ROC

The only difference between the complete convex nonsingular ROC curve and a regular ROC curve is that the initial lefthand point of the ROC curve may begin at some nonzero value and/or attain unity probability of detection at some value of false alarm probability less than one. Such a complete convex nonsingular ROC curve is illustrated in Fig. 3.5. It should be emphasized that both conditions of starting above zero and terminating before one are not necessary; either one is sufficient to make the curve nonregular. The singular ROC curve is simply the upper edge. Nonsingular means that the initial point is not as high as one, nor is the value of false alarm at which it reaches unity detection as small as zero.

$$\text{Nonsingular} \implies Y(0) \ne 1, \quad X(1) \ne 0 \tag{3.47}$$

On Fig. 3.5, a metastatic transformation of the rectangle from Y(0) to one, and from zero to X(1), would produce a regular metastatic image. For this metastatic image the ROC character would be

Fig. 3.5. A complete, convex, nonsingular ROC

Fig. 3.6. Traditional threshold ROC

Fig. 3.7. Green ROC

$$\pi_2(z) = \frac{\pi_1\!\left(z + \ln\dfrac{1 - Y(0)}{X(1)}\right)}{\bigl[X(1)\,\bigl(1 - Y(0)\bigr)\bigr]^{.5}} \tag{3.48}$$

Consider any ROC character, $\pi_2(z)$, with either bounded or unbounded range denoted by $z_2'$ to $z_2''$.

$$\text{Given:} \quad \pi_2(z), \qquad z_2' < z < z_2'' \tag{3.49}$$

If this were the metastatic image of some arc, such as in Fig. 3.5, the ROC character for the regular part of the original arc would be

$$\pi_1(z) = \bigl[X(1)\,\bigl(1 - Y(0)\bigr)\bigr]^{.5}\; \pi_2\!\left(z - \ln\frac{1 - Y(0)}{X(1)}\right), \qquad z_2' - \ln\frac{X(1)}{1 - Y(0)} < z < z_2'' - \ln\frac{X(1)}{1 - Y(0)} \tag{3.50}$$

The complete description of the nonregular ROC curve contains the boundary conditions.

$$P(z_1 = -\infty \mid N) = 1 - X(1), \qquad P(z_1 = +\infty \mid SN) = Y(0) \tag{3.51}$$

Such boundary conditions cannot be included in the ROC character since probabilities are obtained by multiplying the character value by $e^{\pm .5z}$. This multiplier is either zero or infinite at the extreme boundary values. If the values for the boundary were unknown, they could be obtained by integrating the nonregular ROC character over the

range between $-\infty$ and $+\infty$.

$$\int_{-\infty}^{\infty} e^{+.5z}\, d\Pi(z) = 1 - Y(0) \tag{3.52}$$

$$\int_{-\infty}^{\infty} e^{-.5z}\, d\Pi(z) = X(1) \tag{3.53}$$

The above equations are the essential formal difference between the nonregular and regular ROC characters. For the nonregular ROC the value of one or both of these integrals will be less than one, indicating that portion of the decision which is error free. The regular ROC character always has a value of unity for the complete integral (Eqs. 3.8, 3.9).

Two special ROC curves with discrete ROC character have been of importance in the psychophysical literature. Both are related to nonregular ROC characters with just one value of log likelihood ratio other than $\pm\infty$. The pure-threshold or traditional-threshold ROC has only two values of log likelihood ratio that play any role, $+\infty$ and some negative value $Z_0$. The ROC character jump is

$$\pi(Z_0) = e^{.5Z_0} \tag{3.54}$$

Equations 3.52 and 3.53 are applied to this nonregular ROC character to determine that X(1) is one, meaning that the ROC curve first

contacts the upper edge at the corner point (1, 1). The ROC curve is given by Eq. 3.55, and shown in Fig. 3.6.

$$Y = 1 - e^{Z_0} + e^{Z_0} X \tag{3.55}$$

The more common form of this equation may be given by relabeling Y(0) as $p_0$, sometimes called the "true probability of perception."

$$Y = p_0 + (1 - p_0)\,X \tag{3.56}$$

The value $Z_0$ and its ROC character jump are

$$Z_0 = \ln(1 - p_0), \qquad \pi(Z_0) = (1 - p_0)^{.5} \tag{3.57}$$

A second nonregular ROC curve related to a single point on the log likelihood ratio axis is the Green ROC. This is the nonregular limit of D. M. Green's double threshold ROC curve. It is defined for any point $z_0$ and any ROC character jump value small enough that

$$\pi(z_0) < e^{-.5|z_0|} \tag{3.58}$$

The formal expressions for the Green ROC curve are

$$\begin{aligned}
Y(0) &= 1 - e^{.5z_0}\,\pi(z_0), & X &= 0 \\
Y &= Y(0) + e^{z_0} X, & 0 < X &< X(1) = e^{-.5z_0}\,\pi(z_0) \\
Y &= 1, & X(1) < X &< 1
\end{aligned} \tag{3.59}$$
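The two single-point ROC curves can be written out directly from Eqs. 3.56 through 3.59. The following is a minimal sketch; the function names and the example numbers are assumptions made for illustration only.

```python
import math

def threshold_roc(x, p0):
    """Traditional threshold ROC, Eq. 3.56."""
    return p0 + (1.0 - p0) * x

def threshold_character(p0):
    """Return (Z0, character jump) for the threshold ROC, Eq. 3.57."""
    z0 = math.log(1.0 - p0)
    return z0, math.sqrt(1.0 - p0)

def green_roc(x, z0, jump):
    """Green ROC, Eq. 3.59, for a character jump satisfying Eq. 3.58."""
    if not jump < math.exp(-0.5 * abs(z0)):
        raise ValueError("character jump too large for this z0 (Eq. 3.58)")
    y0 = 1.0 - math.exp(0.5 * z0) * jump  # intercept Y(0)
    x1 = math.exp(-0.5 * z0) * jump       # X(1), where Y first reaches one
    if x < x1:
        return y0 + math.exp(z0) * x
    return 1.0

# Hypothetical values: a slightly negative z0 and a moderate jump.
z0, jump = -0.2, 0.5
for x in (0.0, 0.2, 0.4, 0.6, 1.0):
    print(x, round(green_roc(x, z0, jump), 4))
```

The sloping segment meets Y = 1 exactly at X(1), since $Y(0) + e^{z_0} X(1) = 1$ for any admissible jump.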

The ROC curve is sketched in Fig. 3.7 for a slightly negative value of $z_0$.

3.6 Summary of Chapters I, II, and III

The purpose of this research was to furnish a variety of ROC curves, to develop sufficient analytic structure to classify ROC curves into families, and to provide a means of generating new families. In Chapter II it was shown that the distribution of a decision variable may be quite arbitrary under one cause condition; therefore decision models may be developed with considerable freedom. The ROC curve contains information sufficient to specify a decision model iff the decision axis bears a specific functional relation to the likelihood ratio $\ell$. This research has concentrated on z, the logarithm of the likelihood ratio, because the distributions of z are necessarily sufficiently concentrated to possess moment generating functions. It has been demonstrated that the ROC character can be used to structure ROC curves into families, and to provide both interfamily relationships and a basis for generating new ROC families.

This completes the development of the research into the structure of ROC curves. The following chapters deal with specific ROC families, in order to determine properties peculiar to each, and to demonstrate the techniques for working with ROC curves and ROC families.

CHAPTER IV

TRADITIONAL ROC FAMILIES

This chapter will deal with those ROC families that appeared in Ref. 1.

4.1 Normal ROC

The normal ROC has been used extensively in both the electronic and psychophysical literature (Refs. 1, 4, 7 through 14). This type of ROC curve was used so extensively in the original psychophysical work of Tanner, Swets, and Green that many thought it was a necessary part of their perception theory. Tables of the normal ROC curve are available (Refs. 17 and 18). The ROC character for the normal ROC curve is

$$\pi(z) = \frac{1}{\sqrt{2\pi d}}\; e^{-d/8}\; e^{-z^2/2d}, \qquad d > 0 \tag{4.1}$$

Multiply the character by $e^{\mp .5z}$ to obtain the probability density functions for the logarithm of the likelihood ratio under the two conditions N and SN. Both of these are normal,

$$f(z \mid N) = \frac{1}{\sqrt{2\pi d}}\, \exp\!\left[-\frac{(z + .5d)^2}{2d}\right] \tag{4.2}$$

$$f(z \mid SN) = \frac{1}{\sqrt{2\pi d}}\, \exp\!\left[-\frac{(z - .5d)^2}{2d}\right] \tag{4.3}$$

with equal variance and with mean values shifted plus and minus $.5d$ from the character mode of zero. The ROC curve is obtained by direct integration of Eqs. 4.2 and 4.3.

$$P(\text{"A"} \mid N) = x = \Phi\!\left(\frac{-.5d - z}{\sqrt{d}}\right), \qquad P(\text{"A"} \mid SN) = y = \Phi\!\left(\frac{.5d - z}{\sqrt{d}}\right) \tag{4.4}$$

The symbol $\Phi$ stands for the normal distribution function. From this point on, the probability notation for the ROC curve will usually be omitted and only the description of the ROC curve as a real function, y, of a real variable, x, retained. While Eq. 4.4 explicitly indicates the relation between the coordinate values and the logarithm of the likelihood ratio, this relation is not always desired. z may be suppressed, and the equation for the ROC curve written as

$$y = \Phi(\lambda + \sqrt{d}) \quad \text{when} \quad x = \Phi(\lambda) \tag{4.5}$$

where the variable $\lambda$ is the dummy parameter along the ROC curve. In

this form the natural parameterization is by $\sqrt{d}$, which is normally called d' in the psychophysical literature.

Graphs of the normal ROC are displayed in Figs. 4.1 to 4.3. In Fig. 4.1 they are displayed on ordinary linear coordinate paper, with curves given for d values of zero (the chance diagonal) and d = 1, d = 4, and d = 9. This reflects the natural stepping shown in Eq. 4.5.

The coordinate paper for Fig. 4.2 has been called "normal-normal paper," indicating that it is related to the normal distribution function on both axes, and is also called "double-probability paper" and "z-scale paper." The linear distance in each coordinate direction on this paper is the argument of the normal distribution function. Referring to Eq. 4.5, on such paper a normal ROC point plots with the vertical coordinate $\lambda + \sqrt{d}$ and horizontal coordinate $\lambda$. These points lie along a straight line with slope one, the two coordinates differing by $\sqrt{d}$. Transformation to normal coordinates spreads the unit square over the entire infinite two-dimensional plane. The region displayed in Fig. 4.2 is that used for medium probabilities, between one percent and ninety-nine percent. The region displayed in Fig. 4.3 is more common in machine application where very small probabilities can be measured, and extends from $10^{-6}$ to $1 - 10^{-6}$.

The main diagonal with slope plus one is the chance diagonal. The minor diagonal, with slope minus one, is usually referred to as the "negative diagonal" and corresponds to those points where the probability of miss, 1 - y, is equal to the probability of false alarm, x.
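Equations 4.4 and 4.5 translate directly into a few lines of code. The sketch below uses Python's standard-library normal distribution in place of tables; the function names are illustrative assumptions, not notation from the report.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution function

def normal_roc(x, d):
    """y on the normal ROC with index d at false alarm probability x (Eq. 4.5)."""
    lam = PHI.inv_cdf(x)                 # dummy parameter, x = Phi(lam)
    return PHI.cdf(lam + math.sqrt(d))

def normal_roc_from_z(z, d):
    """(x, y) at log likelihood ratio z (Eq. 4.4)."""
    s = math.sqrt(d)
    return PHI.cdf((-0.5 * d - z) / s), PHI.cdf((0.5 * d - z) / s)

for d in (1.0, 4.0, 9.0):
    x, y = normal_roc_from_z(0.0, d)     # z = 0 lies on the negative diagonal
    print(f"d = {d:g}: x = {x:.4f}, y = {y:.4f}, 1 - y = {1 - y:.4f}")
```

On normal-normal paper the plotted coordinates are simply `PHI.inv_cdf(x)` and `PHI.inv_cdf(y)`, which differ by $\sqrt{d}$ at every point of the curve.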

Fig. 4.1. Normal ROC (linear coordinates; abscissa x = P("A"|N))

Fig. 4.2. Normal ROC on "normal-normal" paper

Fig. 4.3. Normal receiver operating characteristics

Four normal ROC characters are plotted in Fig. 4.4 on the equivalent of semi-log paper; that is, $\ln \pi(z)$ is plotted against z. These are simple parabolas with center at z = 0 and opening downward. As the detectability increases (d increases), the ROC characters broaden out, giving more weight to large magnitude z values. The loci of the one percent false alarm probability and the $10^{-6}$ false alarm probability have been indicated.

Since the ROC curves and ROC characters are symmetric, the negative diagonal corresponds to z = 0. The regions of interest for the psychologist will normally lie in the neighborhood of the negative diagonal, with possible extension to high positive z values out to the one percent point, x = .01. The region of interest for most radar and sonar applications is between the one percent and $10^{-6}$ false alarm probabilities. (In the past, some have been interested in false alarm probabilities as low as $10^{-14}$.) These correspond to regions of high positive z value.

4.2 "Case II", Detection of a Sine Wave in Added Normal Noise

A problem occurring frequently in the electronic literature is the detection of a stable sine wave with uniformly uncertain phase in the presence of added white Gaussian noise. This has been treated by Rice (Ref. 19), by Marcum (Ref. 16), by Middleton (Ref. 7), by Helstrom (Ref. 15), was included in Ref. 1 as Case II, and has been recently investigated extensively for its usefulness in hearing by Lloyd Jeffries (Ref. 20).

Fig. 4.4. Four normal ROC characters ($\ln \pi(z)$ plotted against z)

The normal ROC case has been used so easily by so many. Yet its equation, Eq. 4.5, is really quite complicated. The normal distribution function $\Phi(\,\cdot\,)$ cannot be determined in terms of polynomials or the elementary transcendental functions. It owes whatever simplicity it possesses to the ease with which tables can be obtained, and to familiarity with this specific function. In the case at hand, the coordinates of the ROC curve cannot be simply written in terms of each other, and the parametric form of the curve again involves a tabulated function. However, this particular function is one with which few people have great familiarity and for which the tables are not as readily available as one would wish. Tables have been calculated by Marcum (Ref. 16), specifically for the evaluation of this ROC curve. These are called Q Tables since that is the symbol that he uses.

Q Tables:

$$\begin{aligned}
x &= e^{-\beta^2/2} = Q(0, \beta) = \int_{\beta}^{\infty} t\, e^{-t^2/2}\, dt \\
y &= Q(\alpha, \beta) = \int_{\beta}^{\infty} t\, e^{-t^2/2}\, e^{-\alpha^2/2}\, I_0(\alpha t)\, dt
\end{aligned} \tag{4.6}$$

The false alarm probability is a very simple function of the parameter of the curve, $\beta$, but the detection probability requires integration of an integrand that contains the modified Bessel function of order zero, itself a non-simple tabulated function. The parameter $\alpha$, a

positive real number, indicates the quality of detection. As $\alpha$ increases, the ROC curves become better. The Q tables were used to plot the ROC family of Fig. 4.5.

Table 4.1. Numbers relating Q-Table ROC to normal

            sqrt(d) values                    Slope     Formulae
  alpha   At Y=.50   At X=1-Y   At X=.50      s        alpha-.5   sqrt(ln I0(alpha^2))
           (q)
  0        0          0          0            1         -          -
  0.5      -          -          0.108        0.970     0          -
  1.0      0.412      0.40       0.381        0.925     0.5        0.486
  1.5      0.869      -          0.761        0.877     1.0        -
  2.0      1.397      1.30       1.188        0.851     1.5        1.56
  2.5      1.940      -          1.640        0.845     2.0        -
  3.0      2.476      2.30       2.106        0.852     2.5        2.65
  3.5      3.003      -          2.581        0.861     3.0        -
  4.0      3.512      3.35       3.544        0.872     3.5        3.70
  4.5      4.001      -          4.036        0.885     4.0        -
  5.0      -          4.40       -            -         4.5        4.75
  5.5      -          -          -            -         5.0        -
  6.0      -          5.45       -            -         5.5        5.78
  7.0      -          6.47       -            -         6.5        6.80
  8.0      -          7.51       -            -         7.5        7.82
  9.0      -          8.55       -            -         8.5        8.84

Note: q and s are binormal approximation numbers. See Chapter VII. Entries are marked in the source as read from a graph or from tables.

The $\sqrt{d}$ and slope readings for a range of $\alpha$ are given in Table 4.1 and plotted in Fig. 4.6.

Consider a random variable t which ranges from zero to infinity with probability density functions given by the integrand in Eq. 4.6. It is immediately evident that the likelihood ratio of t is the product of the third and fourth terms in the integrand of the equation

Fig. 4.5. ROC based on Q tables (y in percent plotted against x in percent)

Fig. 4.6. Slope and quality values relating Q-Table ROC to normal and binormal (curves labeled at Y = .50, where d' = q, and at X = .50)

for y (Eq. 4.6). Therefore, the logarithm of the likelihood ratio of t is

$$z = \ln I_0(\alpha t) - .5\alpha^2 \tag{4.7}$$

and is strictly monotone increasing with the variable t.

Those familiar with the chi-square distribution will recognize that under the N condition t is the square root of chi-square with two degrees of freedom, while under the SN condition t is the square root of a noncentral chi-square with two degrees of freedom.

$$t = \sqrt{\chi^2_{\,2\ \mathrm{d.f.}}} \quad \text{vs.} \quad t = \sqrt{\chi'^{\,2}_{\,2\ \mathrm{d.f.}}} \tag{4.8}$$

The general problem of detecting a shift from central to noncentral chi-square distribution with a known number of degrees of freedom has been treated in Ref. 21. The variable t rather than chi-square has been chosen for two reasons; the first is that Marcum's tables use t instead of $t^2$, the second is that the function $\ln I_0(\alpha t)$ is nearly linear with its argument, $\alpha t$, when the argument is large. This means that the density function for z should be of the same type as the density function for t if large values of $\alpha t$ are the relevant values.

The ROC character for z can be obtained from the root likelihood product of t multiplied by the Jacobian of transformation from t to z. The root likelihood product of t is simply the N density function multiplied by $e^{.5z}$.

$$\pi(z) = f(t \mid N)\, e^{.5z}\, \frac{dt}{dz} \tag{4.9}$$

It would be desirable to write t in terms of z and determine the form of the ROC character as a function of its argument, z. We cannot invert Eq. 4.7 in any practical fashion. Therefore, the best procedure is to write the ROC character in terms of the parameter t, and to attempt to graph or approximate the ROC character.

$$\pi(z) = t\, e^{-t^2/2}\, e^{-\alpha^2/4}\; \frac{\bigl[I_0(\alpha t)\bigr]^{1.5}}{\alpha\, I_1(\alpha t)} \tag{4.10}$$

Table 9.8 of the NBS Handbook (Ref. 6) gives not the modified Bessel function, but the more slowly varying quantity

$$T_n(X) = e^{-X}\, I_n(X) \tag{4.11}$$

(T has been used to indicate "tabulated function.") In order to use the tables, Eq. 4.10 is rewritten in terms of the table entries. The logarithm of the Bessel functions can be written as

$$\ln I_n(X) = \ln T_n(X) + X \tag{4.12}$$

and inserted in Eqs. 4.7 and 4.10.

$$z = \bigl[\ln T_0(\alpha t) + \alpha t\bigr] - \bigl[.5\alpha^2\bigr] \tag{4.13}$$
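Where tables are not at hand, the Q-table ROC of Eq. 4.6 and the log likelihood ratio of Eq. 4.7 can be evaluated numerically. The sketch below is one possible approach, not the method used in the report: it sums the power series for $I_0$ and integrates with Simpson's rule over a finite range. The function names, the truncation of the upper limit, and the step count are all assumptions chosen for illustration.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0(x) from its power series."""
    q = 0.25 * x * x
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def log_likelihood_ratio(t, alpha):
    """z as a function of t, Eq. 4.7."""
    return math.log(bessel_i0(alpha * t)) - 0.5 * alpha * alpha

def marcum_q(alpha, beta, steps=4000):
    """Q(alpha, beta) of Eq. 4.6 by Simpson's rule; 'steps' must be even."""
    upper = max(alpha, beta) + 12.0   # assumed cutoff; the tail beyond is negligible
    h = (upper - beta) / steps

    def integrand(t):
        return t * math.exp(-0.5 * (t * t + alpha * alpha)) * bessel_i0(alpha * t)

    total = integrand(beta) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(beta + i * h)
    return total * h / 3.0

def q_table_roc_point(alpha, beta):
    """(x, y) on the Q-table ROC with quality alpha at threshold beta."""
    return math.exp(-0.5 * beta * beta), marcum_q(alpha, beta)

for beta in (0.5, 1.0, 2.0, 3.0):
    x, y = q_table_roc_point(2.0, beta)
    print(f"beta = {beta}: x = {x:.4f}, y = {y:.4f}")
```

Setting `alpha = 0` reduces the integrand to that of the false alarm probability, which gives a simple check of the integration against $e^{-\beta^2/2}$.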

$$\ln \pi(z) = \bigl[.5(\alpha t) + \ln(\alpha t) + 1.5 \ln T_0(\alpha t) - \ln T_1(\alpha t)\bigr] - .5t^2 - \bigl[.25\alpha^2 + 2\ln\alpha\bigr] \tag{4.14}$$

The argument of the Bessel function is $\alpha t$, and therefore the equations have been written in terms of a quantity which is a function of $\alpha t$ alone, and then modifiers depending upon t or $\alpha$. Since $\alpha$ is the parameter of an ROC curve, it will be constant over any computations of the ROC or its character. These equations were utilized to calculate ROC characters with table values and slide rule. The resultant ROC character for three values of the parameter is shown in Fig. 4.7. This is shown on the same coordinate system that was used for the normal ROC character, Fig. 4.4.

The range of the random variable z is bounded below by $-.5\alpha^2$, as can be seen in Eq. 4.13, and hence the ROC curve will have a nonzero minimum slope. Since the ROC character is bounded below but unbounded above, it cannot possibly be symmetric. The mode of the ROC character for the cases shown is slightly negative. In the case $\alpha = 1$ the mode is at the extreme negative value, z = -.5. For the cases $\alpha = 2$ and $\alpha = \sqrt{10}$, the mode values are approximately z = -.73 and z = -.79, respectively. As detectability increases ($\alpha$ increases) the logarithm of the ROC character covers a wider and wider range and begins to appear similar to those for the normal case. It is therefore very tempting to attempt to fit the ROC curve with a normal ROC curve, to fit the ROC character with a normal ROC character. In Ref. 22, the approximation

Fig. 4.7. Q-table ROC characters ($\ln \pi(z)$ plotted against z for $\alpha = 1$, $\alpha = 2$, and $\alpha = \sqrt{10}$)

$d \approx (\alpha - .5)^2$ has been given. Some have used the approximation $d \approx \ln I_0(\alpha^2)$. Both of these have been useful in engineering application to indicate that performance is very poor for $\alpha$ less than .5, and essentially normal for $\alpha$ greater than $\sqrt{10}$. For those interested in the ROC curve between false alarm probabilities of $10^{-6}$ and $10^{-2}$ the picture is simpler, since it is evident from Fig. 4.7 that over this region the arcs of the logarithm of the ROC character could fit quite well with a parabola.

Appendix C contains the numerical work in an attempt to fit the entire $\alpha = 2$, $\ln \pi(z)$ curve over the portion shown in Fig. 4.7. This means that the same expression must fit near the mode as well as along the skirt. The result is summarized in Eq. 4.15.

$$\alpha = 2: \qquad \pi(z) \approx .25\, \exp\!\left[-\frac{|z + .725|^{1.885}}{2\,(1.625)}\right] \tag{4.15}$$

The numbers .725 and .25 come directly from the graphically determined mode. The power 1.885 is close to 2.000. This accounts for the normal appearance of the ROC curve and ROC character over short segments. The difference between 2.000 and 1.885 accounts for the failure of attempts to match the ROC curve or its character over major portions.

4.3 Noise-in-Noise, Same Spectrum

There are two basically different problems called noise-in-noise in the detection literature. The first, and the one that we shall

consider here, is the case which is an increase in the power level without any other change in the statistics of the observation. The contrasting case, not considered here, is when the two noise processes differ in spectrum and in autocorrelation function. This latter case has caused considerable controversy in the literature because one can obtain singular detection at low power levels from seemingly innocent assumptions.

The standard problem assumes 2WT independent samples of a Gaussian noise process are observed. Each sample has the probability density function given by Eq. 4.16.

$$f(u_i \mid N) = \frac{1}{\sqrt{2\pi N}}\; e^{-u_i^2/2N} \tag{4.16}$$

(I apologize for the confusing notation which uses N to indicate the cause on the left-hand side of the equation, and to represent the noise power or variance on the right-hand side of the equation; however, this is the standard notation in the field.) The signal to be detected, if it were observed noise-free, would be of the same nature

$$f(u_i \mid S) = \frac{1}{\sqrt{2\pi S}}\; e^{-u_i^2/2S} \tag{4.17}$$

except that it has a power S. When the signal is added to the noise, the resultant observations will have the probability density function given

by Eq. 4.18.

$$f(u_i \mid SN) = \frac{1}{\sqrt{2\pi (N+S)}}\; e^{-u_i^2/2(N+S)} \tag{4.18}$$

The logarithm of the likelihood ratio of an individual point observation is obtained by dividing and taking logarithms.

$$z(u_i) = \frac{u_i^2}{2} \left(\frac{1}{N} - \frac{1}{N+S}\right) + .5 \ln\!\left(\frac{N}{N+S}\right) \tag{4.19}$$

If a number of independent observations are to be used to reach a single decision, then the likelihood ratio of the total observation is the product of the likelihood ratios of the individual parts. This means that the logarithm of the likelihood ratio of the total observation will be the sum of the logarithms of the likelihood ratios of the individual observations. It follows directly from Eq. 4.19 that the sum of the z values corresponds to the sum of the squares of the individual observations.

$$u = \sum_{i=1}^{2WT} u_i^2 \tag{4.20}$$

Thus, u is a sufficient statistic. It follows directly that the log of the likelihood ratio of u is the sum of the logarithms of the likelihood ratios of the individual observations.

$$z(u) = \sum_{i=1}^{2WT} z(u_i) = \frac{u}{2} \left(\frac{1}{N} - \frac{1}{N+S}\right) + WT \ln\!\left(\frac{N}{N+S}\right)$$

$$z = \frac{uS}{2N(N+S)} + WT \ln\!\left(\frac{N}{N+S}\right) \tag{4.21}$$

Since u is the sum of the squares of Gaussian random variables, its distribution is closely related to the chi-square distribution with 2WT degrees of freedom. Specifically, the variable u/N will have a chi-square distribution with 2WT degrees of freedom. This can be used to obtain the distribution for u.

$$f(u \mid N) = \frac{u^{WT-1}\, e^{-u/2N}}{2^{WT}\, \Gamma(WT)\, N^{WT}}, \qquad u \ge 0 \tag{4.22}$$

In a similar manner, the distribution for u under condition SN is obtained from u/(S+N) being a chi-square random variable. The equation is exactly like Eq. 4.22, except that N is replaced by (N+S).

The present objective is not to obtain the individual density functions to take their ratio, since we already know the likelihood ratio. The objective is to obtain the individual density functions in order to obtain their product. Since, by Eq. 4.21, u is a linear function of z, the ROC character is obtained from the root likelihood product for u by multiplying it by the Jacobian of transformation (the derivative of u with respect to z).

$$\pi(z) = \bigl[f(u \mid N)\, f(u \mid SN)\bigr]^{.5}\; \frac{du}{dz} \tag{4.23}$$
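The sufficiency of u can be checked numerically: the sum of the per-sample log likelihood ratios of Eq. 4.19 should equal the single expression of Eq. 4.21 evaluated at the sum of squares. The sketch below does this for one simulated noise record; the function names, the random seed, and the values of N, S, and WT are assumptions for illustration.

```python
import math
import random

def sample_llr(u_i, N, S):
    """Log likelihood ratio of one sample, Eq. 4.19."""
    return 0.5 * u_i ** 2 * (1.0 / N - 1.0 / (N + S)) + 0.5 * math.log(N / (N + S))

def total_llr(u, WT, N, S):
    """Log likelihood ratio of the sum of squares u, Eq. 4.21."""
    return u * S / (2.0 * N * (N + S)) + WT * math.log(N / (N + S))

random.seed(0)
N, S, WT = 1.0, 0.5, 4                   # hypothetical noise power, signal power, WT
samples = [random.gauss(0.0, math.sqrt(N)) for _ in range(2 * WT)]
u = sum(x * x for x in samples)          # sufficient statistic, Eq. 4.20

z_sum = sum(sample_llr(x, N, S) for x in samples)
z_u = total_llr(u, WT, N, S)
print(z_sum, z_u)                        # the two values should agree
```

Evaluating `total_llr` at u = 0 gives the smallest attainable value of z for the chosen N, S, and WT.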

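From Eq. 4.21, z is bounded below by a quantity we shall call $Z_0$ because u is bounded below by zero.

$$Z_0 = WT \ln\!\left(\frac{N}{N+S}\right) = -WT \ln\!\left(1 + \frac{S}{N}\right) \tag{4.24}$$

From Eqs. 4.22 and 4.23

$$\pi(z) = \frac{1}{\Gamma(WT)} \left[\frac{N(N+S)}{S^2}\right]^{.5WT} \left[\frac{uS}{2N(N+S)}\right]^{WT-1} \exp\!\left[-\frac{uS}{2N(N+S)} \cdot \frac{2N+S}{2S}\right] \tag{4.25}$$

The next step is simply the change of variables from u to z.

$$\pi(z) = \frac{1}{\Gamma(WT)} \left[\frac{N(N+S)}{S^2}\right]^{.5WT} (z - Z_0)^{WT-1}\; e^{-\left(\frac{N}{S} + .5\right)(z - Z_0)}, \qquad z > Z_0 \tag{4.26}$$

This ROC character is a Pearson III character, a two-parameter class.

When WT is small, or S/N is large, the ROC curve can be obtained by direct integration. Tables of the chi-square distribution may be used to plot the ROC curves.

$$X = 1 - P\!\left(\chi^2 \le \frac{u}{N} \,\Big|\, \nu = 2WT\right), \qquad Y = 1 - P\!\left(\chi^2 \le \frac{u}{S+N} \,\Big|\, \nu = 2WT\right) \tag{4.27}$$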
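For integer WT the chi-square distribution with 2WT degrees of freedom has a finite-sum tail, so Eq. 4.27 can be evaluated without tables. The following is a sketch under that assumption (integer WT only); the function names and example values are illustrative and not part of the report.

```python
import math

def chi2_sf_even(x, k):
    """P(chi-square > x) for 2k degrees of freedom, k a positive integer."""
    half = 0.5 * x
    term, total = 1.0, 0.0
    for j in range(k):
        total += term             # adds (x/2)^j / j!
        term *= half / (j + 1)
    return math.exp(-half) * total

def noise_in_noise_roc_point(u0, WT, N, S):
    """(X, Y) of Eq. 4.27 for the rule 'respond A when u > u0'; WT an integer."""
    x = chi2_sf_even(u0 / N, WT)
    y = chi2_sf_even(u0 / (N + S), WT)
    return x, y

# Hypothetical example: WT = 3, unit noise power, S/N = 1.
for u0 in (2.0, 6.0, 12.0, 20.0):
    x, y = noise_in_noise_roc_point(u0, WT=3, N=1.0, S=1.0)
    print(f"u0 = {u0}: X = {x:.4f}, Y = {y:.4f}")
```

With WT = 1 both tails are pure exponentials and the points satisfy $Y = X^{N/(N+S)}$, a power ROC curve.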

When S/N is very small, but WT is sufficiently large so that the resulting performance corresponds to normal indices of the order of 0.5 to 50, a normal approximation to the ROC curves may be made. The formula for the matching normal index

$$d = (WT - 1)\left(\frac{S}{N}\right)^2 \tag{4.28}$$

is derived in Appendix D, using the ROC character. This approximation is not new; the use of the ROC character to obtain it is new.

4.4 Signal One of M Orthogonal Signals

Chapter IV on ROC models derived from electronics cases concludes with a review of the M orthogonal signals case. The situation is this. An observation

$$X = (X_1, X_2, \ldots, X_M) \tag{4.29}$$

consists of M similar parts. If the condition is noise alone, N, the individual parts are similarly distributed and are statistically independent.

$$f_M(X \mid N) = \prod_{i=1}^{M} f_1(X_i \mid N) \tag{4.30}$$

Therefore, the probability density for the occurrence of the total observation is given by a product of the individual N probability density functions. These M parts of the observation correspond to

M different subcauses, possibly M specific signals. Each possible subcause affects only one part of the observation. Thus, if the j-th signal is present, the probability density for the total observation is given by

$$f_M(X \mid S_jN) = f_1(X_j \mid SN) \prod_{i \ne j} f_1(X_i \mid N) \tag{4.31}$$

The j-th signal affects only the j-th part of the observation and the rest of the observation is distributed the same way it was under the condition N. Rewrite Eq. 4.31 in terms of Eq. 4.30 by multiplying and dividing by the missing factor in the large product and obtain

$$f_M(X \mid S_jN) = \ell_1(X_j)\, f_M(X \mid N) \tag{4.32}$$

for the probability density under the special subcondition of the j-th SN cause, where $\ell_1(X_j) = f_1(X_j \mid SN) / f_1(X_j \mid N)$ is the likelihood ratio of the j-th part. The actual SN cause is an ensemble of these subcauses, each occurring with known probability $P(S_jN)$. The probability density function for the total observation under the condition SN is therefore the average probability density function (averaged over the various subcauses).

$$f_M(X \mid SN) = \sum_j P(S_jN)\, f_M(X \mid S_jN) \tag{4.33}$$

Using Eq. 4.32

$$f_M(X \mid SN) = \sum_j P(S_jN)\, \ell_1(X_j)\, f_M(X \mid N) \tag{4.34}$$

Since the term $f_M(X \mid N)$ is independent of the summation, divide through by it to obtain the likelihood ratio for the total observation.

$$\ell(X) = \sum_j P(S_jN)\, \ell_1(X_j) \tag{4.35}$$

The logarithm of the likelihood ratio is simply the logarithm of Eq. 4.35.

$$z = \ln \sum_j P(S_jN)\, e^{z_j} \tag{4.36}$$

The physical picture of the receiver for such a situation is one of M parallel branches, each of which computes a log likelihood ratio on part of the observation. Each of the log likelihood ratios is passed through an exponential nonlinearity which greatly emphasizes large positive values. The outputs of these nonlinearities are averaged together to form the total likelihood ratio of the observation. If this total likelihood ratio is passed through a logarithmic nonlinearity, the output will again be on a z axis. Although the value of z on any observation will, in general, be less than the maximum individual log likelihood ratio, the effect of the peaking function before the averaging means that the output z will, in general, be greater than the average value over all of the M parallel channels.

$$\sum_j P(S_jN)\, z_j < z < \max_j z_j \tag{4.37}$$
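The receiver structure of Eq. 4.36 can be sketched in a few lines. The function name and the sample branch values below are illustrative assumptions; the maximum is subtracted before exponentiating purely as a numerical safeguard and does not change the result.

```python
import math

def total_log_likelihood_ratio(z_parts, probs):
    """Eq. 4.36: z = ln( sum_j P(S_j N) * exp(z_j) ), computed stably."""
    m = max(z_parts)
    return m + math.log(sum(p * math.exp(z - m) for z, p in zip(z_parts, probs)))

# Three branches with equally likely subcauses (illustrative values).
z_parts = [0.3, -1.2, 2.0]
probs = [1.0 / 3.0] * 3

z_total = total_log_likelihood_ratio(z_parts, probs)
z_average = sum(p * z for z, p in zip(z_parts, probs))
```

For these values `z_total` lies between `z_average` and `max(z_parts)`, as Eq. 4.37 describes.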

The evaluation of reasonable parameter cases for the M orthogonal signal problem has taxed many people. The difficulty is the presence of the nonlinearities before the averaging process. Some authors have felt that the presence of the averaging process would tend to make the output normally distributed. However, since $\ell$ ranges over positive values and has an expected value of one under condition N, a good normal fit could be obtained only if the standard deviation of the fit is small compared to one. The direct route to the distribution of z, or $\ell$, is to carry through the M-fold convolution. This could be done in the transform plane by finding the characteristic function for $\ell$, raising it to a power, and transforming back. Since the likelihood ratio often fails to possess a moment generating function, one suspects that a great deal of care will be necessary if any approximation or numerical work is done in obtaining the characteristic function or its inverse.

An Example: Let us consider a particular example of ROC character for the individual parts of the observation that simplifies the process of obtaining the ROC character for the total observation. Consider the ROC equation given in Eq. 4.38

$$y_1 = x_1 - x_1 \ln x_1 \tag{4.38}$$

and graphed on normal-normal paper in Fig. 4.8. This does not represent particularly good observation quality, or any physical case of interest, but has been chosen because of the specific form of its

[Figure: normal-normal paper, X (%) versus Y (%); curve Y = X - X ln X.]
Fig. 4.8. A particular ROC curve

ROC character.

$$\pi_1(z_i) = e^{1.5 z_i}\, e^{-e^{z_i}}, \qquad -\infty < z_i < \infty \tag{4.39}$$

Multiply the character by $e^{-.5 z_i}$ to obtain the probability density function for $z_i$ under the condition N,

$$f(z_i \mid N) = e^{z_i}\, e^{-e^{z_i}} \tag{4.40}$$

and by substitution, obtain the probability density function for the individual part-of-the-observation's likelihood ratio.

$$f_1(\ell_i \mid N) = e^{-\ell_i}, \qquad 0 < \ell_i < \infty \tag{4.41}$$

Let

$$u = \sum_{i=1}^{M} \ell_i \tag{4.42}$$

The distribution of u is obtained by simple use of the characteristic function. The resulting density function is

$$f(u \mid N) = \frac{u^{M-1}\, e^{-u}}{(M-1)!} \tag{4.43}$$

The likelihood ratio desired is not the sum of the individual likelihood ratios, but the average likelihood ratio. Assume that the subcauses are equally likely. From Eq. 4.43 obtain

$$f(\ell \mid N) = M f(u = M\ell \mid N) = \frac{M^M}{(M-1)!}\, \ell^{M-1}\, e^{-M\ell} \tag{4.44}$$

From the distribution for the likelihood ratio, transform to the log likelihood ratio axis

$$f(z \mid N) = \frac{M^M}{(M-1)!}\, e^{Mz}\, e^{-M e^{z}} \tag{4.45}$$

and multiply by $e^{.5z}$ to obtain the ROC character for the total observation procedure.

$$\pi_M(z) = \frac{M^M}{(M-1)!}\, e^{(M+.5)z}\, e^{-M e^{z}}, \qquad -\infty < z < \infty \tag{4.46}$$

This ROC character is functionally similar to the single observation character.

$$\pi_1(z_i) = e^{1.5 z_i}\, e^{-e^{z_i}}, \qquad -\infty < z_i < \infty \tag{4.47}$$

Normal Case: To obtain an accurate estimate of the ROC curve when the individual observations are normally distributed, Jaarsma (Ref. 24) programmed the IBM 7090 to obtain the distributions using the characteristic function. Reproduced here are some of his ROC curves and the ROC characters for the particular case when the original observation was normally distributed with parameter d = 4. The ROC character for this individual observation is given by Eq. 4.48.

$$\pi_1(z_i) = \frac{1}{\sqrt{8\pi}}\, e^{-1/2}\, e^{-z_i^2/8}, \qquad -\infty < z_i < \infty \tag{4.48}$$

ROC curves are for 5, 20, and 100 parallel channels of observation. The resulting performance is displayed on normal-normal paper in Fig. 4.9. The ROC characters for these three cases are drawn on linear paper in Fig. 4.10. The three characters all have the same general form. They are unimodal, with the mode at a slightly negative value of z. The decrease from the mode is faster in the negative direction than in the positive direction. As the number of parallel channels, M, increases and the detection performance falls off, the ROC character is essentially confined to a narrower range of z values and becomes correspondingly more peaked.

To compare these ROC characters with those shown for Cases I and II, the logarithm of the ROC character has been plotted in Fig. 4.11 to the same scale as for Fig. 4.4 and Fig. 4.7. Because of the specific mechanics of the computer program, the computation of the character was terminated in the neighborhood of the one percent false alarm values; that is, x = .01. In order to look at one of these in more detail, the axes were changed and the ROC character replotted for the case of M = 20 in Fig. 4.12. In Fig. 4.12, the z-range shown extends only from -2 to +2. To the left of the mode, the appearance of the ROC character is quite parabolic (or at least power law). The fall-off in the positive z direction is shallower, as has already been noted, and

[Figure: normal-normal paper, X (%) versus Y (%).]
Fig. 4.9. ROC curves for M orthogonal normal signals

[Figure: linear paper, π(z) versus z; curves for M = 100 and M = 20 are labeled.]
Fig. 4.10. ROC characters for M orthogonal normal signals

Fig. 4.11. ROC characters for M orthogonal normal signals (log scale)

[Figure: π(z) on a logarithmic scale versus z; M Orthogonal Normal Signals, M = 20, d = 4.]
Fig. 4.12. ROC character

appears to be more linear. In order to investigate this nearly linear fall-off of the log of the ROC character, consider what the effect would be if the ROC character behaved exponentially for large positive values. Assume

$$\pi(z) = C_1\, e^{-C_2 z}, \qquad C_1 > 0,\quad C_2 > .5,\quad z > C_3 \tag{4.49}$$

holds for z greater than some value, say $C_3$, and with coefficients $C_1$ and $C_2$. For this simple ROC character, the ROC curve can be obtained by direct integration.

$$X = \int_z^\infty e^{-.5t}\, \pi(t)\, dt = \frac{C_1}{C_2 + .5}\, e^{-(C_2 + .5)z}, \qquad Y = \int_z^\infty e^{.5t}\, \pi(t)\, dt = \frac{C_1}{C_2 - .5}\, e^{-(C_2 - .5)z} \tag{4.50}$$

Taking logarithms of both sides

$$\ln X = \ln C_1 - \ln(C_2 + .5) - (C_2 + .5)z, \qquad \ln Y = \ln C_1 - \ln(C_2 - .5) - (C_2 - .5)z \tag{4.51}$$

Eliminate the parameter z to obtain a nonparametric form for the ROC curve.

$$\ln X = \frac{C_2 + .5}{C_2 - .5} \ln Y - \frac{1}{C_2 - .5} \ln C_1 + \frac{C_2 + .5}{C_2 - .5} \ln(C_2 - .5) - \ln(C_2 + .5) \tag{4.52}$$

This shows that the ROC curve would plot as a straight line on log-log paper for that range for which the ROC character is essentially a simple exponential. The general form for such a curve is given by

$$\ln X = A \ln Y + \ln B, \qquad X = B Y^A \tag{4.53}$$

The ROC curves of Fig. 4.9 are replotted on log-log paper in Fig. 4.13. The straight-line portions of the ROC continue well beyond the initial part of the curve, up to the general region of Y = .55. There must be a departure from this straight-line form as the ROC curve approaches the point (1, 1), since the ROC curve is regular and the ROC character is definitely not exponential over its complete extent. Figure 4.13 does indicate that the exponential portion of the ROC character extends amazingly close to the mode.

A detection situation where the likelihood ratio is an average likelihood ratio, corresponding to the signal affecting only one of many parts of the observation, leads to great difficulty in treating the ROC analytically. This is in contrast to the happier situation when the logarithm of the likelihood ratio is the sum of the logarithms of the likelihood ratios for various parts of the observation, corresponding to the situation where the signal affects all of the individual parts of the observation. In this latter case, one will be dealing with the distribution for z, a variable which has a concentrated density function, possesses moment generating functions, and tends to be more amenable to transform methods than $\ell$, which has wildly skewed probability distributions.
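The exponential-tail argument of Eqs. 4.49 to 4.53 can be checked with a short computation. This is a sketch under the assumption that the exponential form holds for every t above the cut z; the function names and the coefficient values are illustrative, not taken from the report.

```python
import math

def roc_point_exponential_tail(z, c1, c2):
    """(X, Y) from Eq. 4.50 for the character c1 * exp(-c2 * t), assumed valid for all t >= z."""
    x = c1 / (c2 + 0.5) * math.exp(-(c2 + 0.5) * z)
    y = c1 / (c2 - 0.5) * math.exp(-(c2 - 0.5) * z)
    return x, y

def power_law_parameters(c1, c2):
    """Slope A and intercept B of the log-log line X = B * Y**A (Eqs. 4.52 and 4.53)."""
    a = (c2 + 0.5) / (c2 - 0.5)
    ln_b = math.log(c1) - math.log(c2 + 0.5) - a * (math.log(c1) - math.log(c2 - 0.5))
    return a, math.exp(ln_b)

c1, c2 = 0.5, 1.5                        # illustrative coefficients (assumed)
a, b = power_law_parameters(c1, c2)
points = [roc_point_exponential_tail(z, c1, c2) for z in (0.0, 0.5, 1.0, 2.0)]
residuals = [x - b * y ** a for x, y in points]
```

Every computed point satisfies X = B Y^A to rounding error, so the points fall on one straight line of slope A on log-log paper.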

[Figure: log-log paper, X = P("A"|N) on the horizontal axis versus Y.]
Fig. 4.13. M orthogonal normal signals, ROC for d = 4

CHAPTER V
ALGEBRAIC ROC

This chapter is dedicated to Professor James Egan. His work with the type of ROC known as the Power ROC (Ref. 23) suggested the use of simple algebraic formulas in ROC models. Such models can be expected to be used to fit data, or to approximate the behavior of a decision mechanism, and do not necessarily arise from physical phenomena.

Egan noted the nonnormal nature of ROC curves that he obtained for human observers. The region of the ROC near the (0, 0) point appeared to be fairly normal, sharply rising and curved. However, the region of the ROC near the upper corner (1, 1) was more like the pure threshold ROC curve of traditional psychophysics, with the curve approaching the upper corner at some nonzero slope. ROC curves of this nature are shown in Fig. 5.1. These curves are Egan's pure power ROC curves, detailed in Section 5.1.

The description of the ROC curve as an algebraic formula means that most of the mathematics will be simpler than that necessary when the ROC is described by higher transcendental equations. The calculus that is involved is generally elementary. The algebraic ROC curves are recommended as classroom examples since little higher mathematics is required of the student.

In this chapter Egan's pure power ROC is examined in Section 5.1, obtaining a one-parameter family of ROC curves. In 5.2

[Figure: linear graph paper, X versus Y from 0 to 1.0. Extra curves: locus of ℓ = 1, z = 0; negative diagonal X = 1 - Y.]
Fig. 5.1. Power ROC curves on linear graph paper; X = Y^A

the general conic ROC curve is considered. These ROC curves are arcs of parabolas, circles, ellipses, or hyperbolas. The Luce ROC curve will be considered a conic section, it being the two straight-line asymptotes of a hyperbola. The regular conic ROC is a three-parameter family. Section 5.3 is devoted to a nonlinear scaling which can be used to produce a special type of probability paper, called lor-lor graph paper. The chapter concludes with Section 5.4 on curve fitting using the conic ROC curves. While the three parameters of the conic ROC family are too cumbersome to yield a simple set of ROC, they become three degrees of freedom in fitting curves to data.

5.1 Power ROC: X = Y^A

The ROC curve of the pure power type, $X = Y^A$, where A is greater than one, has already been shown in Fig. 5.1 on linear graph paper. In this graphical form it is similar to many ROC curves that have been measured. This type of ROC has one property (Ref. 23) which is extremely convenient in situations where the probabilities, X or Y, can be determined except for a fixed, unknown constant of proportionality. For example, in an experimental situation in which the number of false alarms is measured, but the number of opportunities for false alarm can only be estimated, the false alarm probability, X, cannot be determined exactly. If two similar experimental runs are made, or if an experimental technique is used so that two points on

the ROC can be determined simultaneously, then one can obtain the value for A without determining the exact number of opportunities for false alarm. If each of these two operating points falls on the power ROC, then so does their ratio.

$$X = Y^A, \qquad \frac{X_1}{X_2} = \left(\frac{Y_1}{Y_2}\right)^A \tag{5.1}$$

Once it is known that the ROC curve is a power law ROC curve, and the parameter A has been determined from Eq. 5.1 using two operating points, the number of opportunities for false alarm can be obtained from either one of the original ROC data points.

ROC Curves. Equation 5.1 suggests the use of log-log paper, since the equation will plot as a straight line on such paper. Power ROC curves with A parameter values of 1, 2, 3, 4, 8, 16, 32, and 64 are shown in Fig. 5.2, on log-log paper. In addition to the ROC curves, the locus of the proper operating point for a symmetric bet, z = 0, and the position of the negative diagonal points are shown. Log-log paper may be useful when A is less than eight or ten, but the crowding on the graph will negate its usefulness for higher values of A. If extremely low false alarm rates are being investigated, then larger A values will be feasible. Probability of detection values less than .80 can be plotted well on this paper. Pure power ROC curves have been drawn on Fig. 5.3

[Figure: log-log paper, X versus Y; curves labeled by A up to A = 64, with the negative diagonal marked.]
Fig. 5.2. Power ROC curves on log-log paper; X = Y^A

[Figure: normal-normal paper, X (%) versus Y (%).]
Fig. 5.3. Power ROC curves on normal-normal paper; X = Y^A

on normal-normal paper. These curves are "amazingly straight." The crowding as the value of A is doubled is not as pronounced as on log-log paper. If one had experimental data lying along one of these pure power ROC curves plotted on normal-normal paper in the range shown, one might well come to the conclusion that the "true" ROC curve was a straight line on normal-normal paper, but with slope less than one. The strict interpretation of such truly straight lines on normal-normal paper will be considered in Chapter VI on binormal ROC curves.

The negative diagonal on linear paper is also the negative diagonal on normal-normal paper. The locus of the unity slope values, where z = 0, plots along a line almost straight and parallel to the negative diagonal. If a human observer had a pure power ROC curve, and operated optimally in a fair bet yes-no detection experiment, he would operate somewhere along this z = 0 line and the experimenter would conclude that the observer "had a bias toward the 'B' decision."

A common procedure in both psychophysical work and in binary communication work is to relate performance along the negative diagonal to the index of the curve, to the signal-to-noise ratio of the presentation, or to some other indicator of goodness of performance. Along the negative diagonal the probability of a miss, 1 - Y, is the same as the probability of a false alarm, X. For the pure power ROC curve with index A, the probability of detection along the negative diagonal will satisfy Eq. 5.2.

$$1 - Y = Y^A \tag{5.2}$$

Equation 5.2 relates the index A to the ROC coordinates on the negative diagonal. This was solved numerically and the results plotted in Fig. 5.4.

Another common practice is to convert the negative diagonal point to a normal index of detectability. This index is labeled $d_e$ because it is related to the point of equal error. Figure 5.5 is a plot of the power ROC index A vs. $\sqrt{d_e}$. Although there is some curvature in this graph, a first approximation is that the logarithm of A is roughly proportional to the square root of $d_e$. For low index of detection, one might use the approximation $\sqrt{d_e} = 2 \log_{10} A$.

Although the practice of reporting a normal index of detection for nonnormal ROC curves may be commendable because it increases general communication among experimenters using different models, it carries with it the risk that some uninformed people will assume that the ROC curve is truly normal. The index $d_e$ should always be accompanied by the statement that $d_e$ applies only to the single point on the negative diagonal. This is the practice of most users of the index $d_e$.

ROC Character. Two basic properties of any ROC curve are (1) the ROC curve gives the complement of the distribution functions of the

[Figure: index A on a logarithmic scale versus the ROC coordinates (X, Y) on the negative diagonal.]
Fig. 5.4. Pure power A vs. negative diagonal coordinates (X, Y)

[Figure: index A on a logarithmic scale versus √d_e from 0 to 4.]
Fig. 5.5. Pure power A vs. negative diagonal normal √d_e

$$F(z \mid N) = 1 - X, \qquad F(z \mid SN) = 1 - Y \tag{5.3}$$

(2) the slope of a differentiable ROC curve is equal to the likelihood ratio cut value corresponding to that point.

$$\frac{dY}{dX} = \ell = e^{z} \tag{5.4}$$

The equation for the pure power ROC is

$$X = Y^A, \qquad A > 1 \tag{5.5}$$

To determine the ROC character, differentiate Eq. 5.5 with respect to X to obtain

$$1 = A\, Y^{A-1}\, e^{z} \tag{5.6}$$

Since the slope of the ROC curve decreases along the ROC from (0, 0) to (1, 1), Eq. 5.6 can be used to determine the minimum and maximum values of z. The maximum value is infinity when Y is zero, and the minimum value is -ln A when Y is one. Next, solve Eq. 5.6 for Y

$$Y = \left(\frac{1}{A}\right)^{\frac{1}{A-1}} e^{-\frac{z}{A-1}} \tag{5.7}$$

Differentiate the result with respect to z, to obtain the probability density function for z under the condition SN.

$$f(z \mid SN) = \frac{1}{A-1}\, A^{-\frac{1}{A-1}}\, e^{-\frac{z}{A-1}} \tag{5.8}$$

The final step is to convert the probability density function to the ROC character

$$\pi(z) = e^{-.5z} f(z \mid SN) = (A-1)^{-1} A^{-1/(A-1)} \exp\left[-\frac{A+1}{2(A-1)}\, z\right], \qquad z > -\ln A \tag{5.9}$$

The probability density function for z under the condition N is

$$f(z \mid N) = e^{-.5z}\, \pi(z) = (A-1)^{-1} A^{-1/(A-1)} \exp\left[-\frac{A}{A-1}\, z\right] \tag{5.10}$$

The ROC character is a simple exponential. The pure power ROC curves may also be called the simple exponential ROC curves. If z is replaced by -z in Eq. 5.9 to yield

$$\pi(z) = (A-1)^{-1} A^{-1/(A-1)} \exp\left[+\frac{A+1}{2(A-1)}\, z\right], \qquad z < \ln A \tag{5.11}$$

another simple exponential ROC curve is obtained. It is

$$1 - Y = (1 - X)^A \tag{5.12}$$

Forced Choice and Second Choice Equations. The distributions of the decision-axis variable serve as a model for many performance measures besides the ROC curve. The following situation is used in

experimental psychophysics (Ref. 8, pp. 30-36), and corresponds also to the digital communication case known as M-ary coding using orthogonal waveforms. The description will use the psychophysical terminology.

An observation consists of M similar portions. The SN condition holds for one and only one of these; the other M - 1 are statistically independent and due to N. When the SN condition is equally probable in each portion of the observation and response utility is uniform, the experiment is considered "symmetric". In a symmetric forced-choice experiment, the observer is asked to indicate his principal choice as to the portion of the observation due to SN. In a symmetric second-choice experiment he is asked to indicate his principal and second choices. Equations for the probability of these choices being correct, for an observer with a pure power ROC curve with index A, can be obtained in closed form.

Let $P_M(A)$ denote the probability that the principal choice is correct. This probability will be a function of the index A. Let $S_M(A)$ denote the probability that the second choice is correct, when the first choice is wrong. If the principal choice is correct, then the part of the observation due to condition SN yielded a higher decision value than all of the M - 1 values due to condition N.

$$P_M(A) = \Pr\left[\text{all } (M-1) \text{ values of } z_N \text{ fall below } z_{SN}\right] \tag{5.13}$$

The distribution function for the decision value under condition N is 1 - X, and the probability density for the decision value under condition SN is the derivative of 1 - Y with respect to the decision axis.

$$P_M(A) = \int_0^1 [1 - X]^{M-1}\, dY \tag{5.14}$$

When the number of presentations is two, Eq. 5.14 takes on a very simple form

$$P_2 = \int_0^1 (1 - X)\, dY = \text{Area under the ROC} \tag{5.15}$$

Since for each value of Y, the length of the horizontal line under the ROC curve at that value of Y is 1 - X, the integral yields the area under the ROC curve. Equation 5.15 is known as Green's theorem, and will be discussed in Section 10.1.1. For the power ROC the two-choice probability, the Green's theorem value, is simply evaluated

$$P_2(A) = \int_0^1 \left[1 - Y^A\right] dY = 1 - \frac{1}{A+1} = \frac{A}{A+1} \tag{5.16}$$

To obtain an expression for M > 2, factor one of the 1 - X terms out of the integrand in Eq. 5.14

$$P_M(A) = \int_0^1 \left[1 - Y^A\right]^{M-2} \left[1 - Y^A\right] dY \tag{5.17}$$
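Equation 5.14 can also be evaluated numerically for the power ROC, which gives an independent check on any closed form. The sketch below uses a plain midpoint rule; the function name and the step count are illustrative choices, not part of the report.

```python
def forced_choice_probability(A, M, steps=20000):
    """Midpoint-rule estimate of P_M(A) = integral over Y in [0, 1] of (1 - Y**A)**(M - 1)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h                 # midpoint of the k-th subinterval
        total += (1.0 - y ** A) ** (M - 1)
    return total * h

p2 = forced_choice_probability(3.0, 2)    # Eq. 5.16 gives A / (A + 1) = 0.75 for A = 3
p4_chance = forced_choice_probability(1.0, 4)   # A = 1 is the chance ROC, so 1/4 is expected
```

For M = 2 the estimate reproduces the area under the ROC, A/(A + 1), and for A = 1 it returns the chance value 1/M.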

$$P_M(A) = \int_0^1 \left[1 - Y^A\right]^{M-2} dY - \int_0^1 Y^A \left[1 - Y^A\right]^{M-2} dY \tag{5.18}$$

The first integral is simply the probability that the principal choice is correct had there been one fewer alternative. The second integral is evaluated using integration by parts. The general formula for integration by parts is

$$\int u\, dv = uv\,\Big| - \int v\, du$$

where the bar after the term uv indicates that this product is evaluated at the lower and upper limits, as are the integrals. The appropriate choices of u and v for the second integral in Eq. 5.18 are given in Eq. 5.19.

$$u = Y, \qquad dv = Y^{A-1} \left[1 - Y^A\right]^{M-2} dY$$

$$du = dY, \qquad v = -\frac{\left[1 - Y^A\right]^{M-1}}{A(M-1)} \tag{5.19}$$

This choice is particularly appropriate because the value of u at the lower limit, zero, is zero; the value of v at the upper limit, one, is zero. Therefore the term uv contributes zero to the integral.

$$\int_0^1 Y^A \left[1 - Y^A\right]^{M-2} dY = -\int_0^1 v\, du = \frac{1}{A(M-1)} \int_0^1 \left[1 - Y^A\right]^{M-1} dY \tag{5.20}$$

Equation 5.20 substituted in (5.18) yields a recursion relation between

probabilities of the principal choice being correct for M and M - 1.

$$P_M(A) = P_{M-1}(A) - \frac{P_M(A)}{A(M-1)} \tag{5.21}$$

Collect the two coefficients of $P_M$ and rewrite

$$P_M(A) = \frac{A}{A + 1/(M-1)}\, P_{M-1}(A) \tag{5.23}$$

Repeating Eq. 5.16

$$P_2(A) = \frac{A}{A+1} \tag{5.16}$$

Applying (5.23) once

$$P_3(A) = \frac{A^2}{(A+1)(A + 1/2)} \tag{5.24}$$

and in general

$$P_{M+1}(A) = \frac{A^M}{(A+1)(A + 1/2) \cdots (A + 1/M)} \tag{5.25}$$

Each of these expressions $P_M(A)$ is a simple algebraic formula which takes on the value 1/M for A = 1 (chance) and grows monotonically to one as

A approaches infinity.

The second choice measure is

$$S_M(A) = \Pr\left[\text{Second choice is correct} \mid \text{First choice is wrong}\right] \tag{5.26}$$

A conditional probability is the ratio of the joint probability to the probability of the condition. The probability that the first choice is wrong is simply one minus the probability that the first choice is right.

$$S_M(A) = \Pr\left[\text{Second choice is correct AND First choice is wrong}\right] / \left[1 - P_M(A)\right] \tag{5.27}$$

The event "second choice is correct and first choice is wrong" is the event that one, but only one, of the N based decision values exceeds the SN based decision value.

$$S_M(A) = \Pr\left[(M-2) \text{ values of } z_N \text{ fall below } z_{SN} \text{ and one } z_N \text{ exceeds } z_{SN}\right] / \left[1 - P_M(A)\right] \tag{5.28}$$

$$S_M(A) = (M-1) \int_0^1 [1 - X]^{M-2}\, X\, dY \,/\, \left[1 - P_M(A)\right] \tag{5.29}$$

where the multiplying term (M - 1) is the number of ways of choosing which N based decision value exceeds the SN based decision value. For the power ROC curve

$$S_M(A) = (M-1) \int_0^1 \left[1 - Y^A\right]^{M-2} Y^A\, dY \,/\, \left[1 - P_M(A)\right] \tag{5.30}$$

This integral was evaluated previously, Eq. 5.20. Therefore

$$S_M(A) = \frac{1}{A}\, P_M(A) \,/\, \left[1 - P_M(A)\right] \tag{5.31}$$

Using Eq. 5.25

$$S_M(A) = \frac{1}{A} \cdot \frac{A^{M-1}}{(A+1)\left(A + \frac{1}{2}\right) \cdots \left(A + \frac{1}{M-1}\right) - A^{M-1}} \tag{5.32}$$

It is somewhat simpler to evaluate this expression for small values of M if both numerator and denominator are multiplied by (M - 1) factorial.

$$S_M(A) = \frac{(M-1)!\, A^{M-2}}{(A+1)(2A+1) \cdots [(M-1)A + 1] - (M-1)!\, A^{M-1}} \tag{5.33}$$

If there were an experiment with only two presentations and the signal had to be in one of them, then if the first choice is incorrect, the second choice must be correct. Equation 5.33 for M = 2 is

$$S_2(A) = \frac{1}{[A+1] - A} = 1 \tag{5.34}$$

In a three-choice experiment

$$S_3(A) = \frac{2A}{3A+1}, \qquad \frac{1}{2} \le S_3 < \frac{2}{3} \tag{5.35}$$

As one would expect, when the value of A is one (chance) the second choice probability is one-half. (Since the observer knows his first choice was wrong, he is guessing between two intervals with no basis for decision.) The nonintuitive result is that as the ROC curve becomes better and better (A approaches infinity) the second choice probability does not approach unity; instead, its limiting value is two-thirds. The probability of the principal choice being right does approach one when the ROC index approaches infinity; the second choice probability does not.

In a similar manner, the second choice in a four-choice experiment is correct with probability

$$S_4(A) = \frac{6A^2}{11A^2 + 6A + 1}, \qquad \frac{1}{3} \le S_4 < \frac{6}{11} \tag{5.36}$$

which takes on the chance value of one-third when the ROC is the chance ROC, but takes on the nonunity limiting value of 6/11 when the ROC index approaches infinity.

Although a great many equations have been displayed here, they are all quite simple and easily obtained. This is the nice feature of an algebraic ROC curve. Such computations could proceed ad infinitum, since once the ROC character has been specified, performance of the decision device under many different circumstances

can be obtained.

5.2 Conic ROC

This section considers a somewhat more complicated algebraic ROC curve. Since the formula for the ROC will be an algebraic formula, the same general procedure for determining the ROC character from the equation for the ROC curve shall be used. Although this will be considerably more difficult than was the case for the pure power ROC, the work is generally much easier than when the equation for the ROC is not algebraic. The purpose of this section is twofold: (1) to demonstrate again the relation between the ROC curve and the ROC character by obtaining the character from the curve, and (2) to develop the three-parameter class of conic ROC curves in their own right.

5.2.1 From Curve to Character. The general technique for obtaining the ROC character from the equation for the ROC curve has basically five steps.

(1) Obtain an explicit formula for y(x).
(2) Differentiate this expression with respect to x, substituting $e^z$ for the derivative of y with respect to x.
(3) Invert the equation to find x(z); the distribution function for z under condition N is 1 - x(z).
(4) Differentiate this expression with respect to z to obtain the probability density function for z under condition N.

(5) Multiply this density function by $e^{.5z}$ to obtain the ROC character.

The conic section in the plane is the locus of those points which can be described by a second order algebraic equation.

$$Ay^2 + 2Bxy + Cx^2 + 2Dy + 2Ex + F = 0 \tag{5.37}$$

A regular conic section ROC curve must go through the point (0, 0) and the point (1, 1). This establishes two conditions on the coefficients in Eq. 5.37

$$(0, 0):\ F = 0, \qquad (1, 1):\ A + 2B + C + 2D + 2E = 0 \tag{5.38}$$

It is convenient to use the first condition explicitly but to reserve the use of the second condition until the end of the development. In order to solve for y explicitly in terms of x, rewrite Eq. 5.37 as a quadratic equation in y

$$Ay^2 + 2(Bx + D)\,y + (Cx^2 + 2Ex) = 0 \tag{5.39}$$

A form of the quadratic equation solution, which may be unfamiliar to many, is

$$a \ne 0, \qquad a\alpha^2 + 2b\alpha + c = 0, \qquad a\alpha = -b \pm \sqrt{b^2 - ac} \tag{5.40}$$

This is used in the present development because it eliminates the factors of two and four involved in the more common equation for the solution of a quadratic equation. For the moment assume that the

coefficient A is nonzero and apply the standard solution.

$$A \ne 0: \qquad Ay = -Bx - D \pm \left[B^2x^2 + 2BDx + D^2 - ACx^2 - 2AEx\right]^{.5} \tag{5.41}$$

$$Ay = -Bx - D \pm \left[(B^2 - AC)\,x^2 + 2(BD - AE)\,x + D^2\right]^{.5} \tag{5.42}$$

Two collections of terms appear in Eq. 5.42. The first is the geometric discriminant $B^2 - AC$; the second is the coefficient of x. Another collection of terms which will appear later in the development is called simply K. Let

$$\Delta = B^2 - AC, \qquad \Gamma = BD - AE, \qquad K = CD^2 + AE^2 - 2BDE \tag{5.43}$$

Using these new symbols Eq. 5.42 becomes

$$Ay = -Bx - D \pm \left[\Delta x^2 + 2\Gamma x + D^2\right]^{.5} \tag{5.44}$$

The first basic step is complete: an expression for y explicitly in terms of x. Differentiate both sides of the equation with respect to x and set dy/dx equal to $e^z$

$$Ae^{z} = -B \pm .5\left[\Delta x^2 + 2\Gamma x + D^2\right]^{-.5} \left[2\Delta x + 2\Gamma\right] \tag{5.45}$$

The third basic step is to obtain an equation for x explicitly in terms of z. Cancel some two's and move B to the other side of the equation.

$$(Ae^{z} + B) = \pm\left[\Delta x^2 + 2\Gamma x + D^2\right]^{-.5} \left[\Delta x + \Gamma\right] \tag{5.46}$$

Square both sides and multiply through by the denominator of the right-hand side

$$(Ae^{z} + B)^2 \left[\Delta x^2 + 2\Gamma x + D^2\right] = \Delta^2 x^2 + 2\Delta\Gamma x + \Gamma^2 \tag{5.47}$$

The resultant expression is a quadratic equation in x. Collect the coefficients of the various powers of x,

$$\Delta x^2 \left[(Ae^{z} + B)^2 - \Delta\right] + 2\Gamma x \left[(Ae^{z} + B)^2 - \Delta\right] + D^2 (Ae^{z} + B)^2 - \Gamma^2 = 0 \tag{5.48}$$

The large factor in the coefficients of both $x^2$ and x is common. This factor may be zero; however, it is a function of the variable z and hence will be zero for only one (or two) values of the argument. We shall exclude such values and later obtain the results for those values by a continuity argument. The result of division by $[(Ae^{z} + B)^2 - \Delta]$ is

$$\Delta x^2 + 2\Gamma x + \frac{D^2 (Ae^{z} + B)^2 - \Gamma^2}{(Ae^{z} + B)^2 - \Delta} = 0 \tag{5.49}$$

The third term in Eq. 5.49 can be rewritten to obtain

$$\Delta x^2 + 2\Gamma x + D^2 + \frac{D^2\Delta - \Gamma^2}{(Ae^{z} + B)^2 - \Delta} = 0 \tag{5.50}$$

The new term can be simplified by utilizing the definitions of Eq. 5.43.

$$D^2\Delta - \Gamma^2 = D^2(B^2 - AC) - (B^2D^2 - 2ABDE + A^2E^2)$$

$$D^2\Delta - \Gamma^2 = -D^2AC + 2ABDE - A^2E^2 \tag{5.51}$$

$$D^2\Delta - \Gamma^2 = -A(CD^2 - 2BDE + AE^2) = -AK$$

The numerator is simply -AK. Next examine the denominator,

$$(Ae^{z} + B)^2 - \Delta = A^2e^{2z} + 2ABe^{z} + B^2 - (B^2 - AC) = A\,(Ae^{2z} + 2Be^{z} + C) \tag{5.52}$$

The denominator also contains a factor A. Utilizing Eqs. 5.51 and 5.52, and canceling out the common factor of A, Eq. 5.50 becomes

$$\Delta x^2 + 2\Gamma x + \left[D^2 - K\,(Ae^{2z} + 2Be^{z} + C)^{-1}\right] = 0 \tag{5.53}$$

This is a sufficiently simple form to apply the standard quadratic equation, (5.40), and obtain

$$\Delta x = -\Gamma \pm \left[\Gamma^2 - \Delta D^2 + \Delta K\,(Ae^{2z} + 2Be^{z} + C)^{-1}\right]^{.5} \tag{5.54}$$

Use Eq. 5.51 again to simplify the term $\Gamma^2 - \Delta D^2$

$$\Delta x = -\Gamma \pm \left[AK + \Delta K\,(Ae^{2z} + 2Be^{z} + C)^{-1}\right]^{.5} \tag{5.55}$$

This completes the third basic step, an explicit expression for x in terms of the logarithm of the likelihood ratio z. Differentiate with respect to z to obtain the probability density function for z under the condition N.

$$-\Delta\, f(z \mid N) = \pm .5\left[AK + \Delta K\,(Ae^{2z} + 2Be^{z} + C)^{-1}\right]^{-.5} \left[-\Delta K\,(Ae^{2z} + 2Be^{z} + C)^{-2}\right] (2Ae^{2z} + 2Be^{z}) \tag{5.56}$$

The differentiation has been accomplished by the standard chain rule so that the manipulations may be traced. Cleaning up the minus signs, (5.56) becomes

$$f(z \mid N) = \sqrt{K}\,(Ae^{2z} + Be^{z})\,(Ae^{2z} + 2Be^{z} + C)^{-2} \left[A + \Delta\,(Ae^{2z} + 2Be^{z} + C)^{-1}\right]^{-.5} \tag{5.57}$$

This expression must be simplified if it is going to be of much use. The first maneuver is to take a square root of the factor $(Ae^{2z} + \ldots)$ from the third factor into the fourth factor. This eliminates the awkward expression containing the reciprocal of the square root of a factor containing another reciprocal.

$$f(z \mid N) = \sqrt{K}\, e^{z}\,(Ae^{z} + B)\,(Ae^{2z} + 2Be^{z} + C)^{-1.5} \left[A\,(Ae^{2z} + 2Be^{z} + C) + \Delta\right]^{-.5} \tag{5.58}$$

The last factor, under the square root, is

$$A\,(Ae^{2z} + 2Be^{z} + C) + \Delta = A^2e^{2z} + 2ABe^{z} + AC + B^2 - AC = (Ae^{z} + B)^2 \tag{5.59}$$
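Before simplifying further, Eqs. 5.55 and 5.57 can be spot-checked on a concrete conic. The sketch below assumes the circular arc x² - 2x + y² = 0 (that is, A = 1, B = 0, C = 1, D = 0, E = -1), which passes through (0, 0) and (1, 1); this particular conic, the function names, and the choice of the root sign are illustrative assumptions, not taken from the report.

```python
import math

# Hypothetical example conic: circular arc x^2 - 2x + y^2 = 0.
A, B, C, D, E = 1.0, 0.0, 1.0, 0.0, -1.0
DELTA = B * B - A * C                            # geometric discriminant
GAMMA = B * D - A * E                            # coefficient of x in Eq. 5.42
K = C * D * D + A * E * E - 2.0 * B * D * E

def q(z):
    """The recurring factor A e^{2z} + 2B e^{z} + C."""
    return A * math.exp(2 * z) + 2 * B * math.exp(z) + C

def x_of_z(z):
    """Eq. 5.55 solved for x; the root sign is chosen so that 0 <= x <= 1 on this arc."""
    root = math.sqrt(A * K + DELTA * K / q(z))
    return (-GAMMA + root) / DELTA

def density_n(z):
    """Eq. 5.57 evaluated directly."""
    return (math.sqrt(K) * (A * math.exp(2 * z) + B * math.exp(z))
            * q(z) ** -2 * (A + DELTA / q(z)) ** -0.5)

def density_n_numeric(z, h=1e-5):
    """f(z|N) = -dx/dz, estimated by a central difference."""
    return -(x_of_z(z + h) - x_of_z(z - h)) / (2 * h)

errors = [abs(density_n(z) - density_n_numeric(z)) for z in (-2.0, 0.0, 1.5)]
```

For this arc the analytic density and the finite-difference derivative of x(z) agree to within the accuracy of the difference quotient.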

Eliminating the discriminant $\Delta$ yields a perfect square, and nothing could be nicer under a square root than a perfect square. The density function for z then becomes simply

$$f(z|N) = \sqrt{K}\,e^{z}\,(Ae^{2z} + 2Be^{z} + C)^{-1.5} \tag{5.60}$$

The fifth basic step is to convert to the ROC character by multiplying by $e^{.5z}$.

$$\pi(z) = K^{.5}\,e^{1.5z}\,(Ae^{2z} + 2Be^{z} + C)^{-1.5} \tag{5.61}$$

Since both terms are raised to the 1.5 power, absorb the simple exponential into the more complicated term.

$$\pi(z) = K^{.5}\,(Ae^{z} + 2B + Ce^{-z})^{-1.5} \tag{5.62}$$

Let us summarize what has been done. A regular ROC curve was formed by passing a general conic through the point (0, 0) and the point (1, 1). The form of the equation was

$$Ay^{2} + 2Bxy + Cx^{2} + 2Dy + 2Ex = 0 \tag{5.37}$$

with the side condition

$$A + 2B + C + 2D + 2E = 0 \tag{5.38}$$

We defined two particular constants, the geometric discriminant and a coefficient K

$$\Delta = B^{2} - AC, \qquad K = CD^{2} + AE^{2} - 2BDE \tag{5.43}$$

After straightforward manipulation the formula for the ROC character was obtained.

$$\pi(z) = K^{.5}\,(Ae^{z} + 2B + Ce^{-z})^{-1.5} \tag{5.63}$$

In order to obtain the solution the assumption was made that A was nonzero. One could return to the original equation, set A equal to zero, and repeat the process. Indeed, the author has done just that; the work was as involved as that above, and it did obtain the same solution. Why? A continuity argument gives the answer. We assume that if A is zero either B or D is nonzero; otherwise y would not appear in (5.37). Consider two curves, one for A zero and another with the same coefficients but with A nonzero. These can be made to lie arbitrarily close to each other. The character, Eq. 5.63, is also a continuous function of A when either B or C is nonzero. We may therefore omit the special case A = 0 from our derivation and fill in from the solution by this continuity argument. The same argument holds for those values of z for which the expression in the denominator of Eq. 5.49 is zero.

5.2.2 Conic ROC Curves.

A conic has six original parameters. Two are eliminated by the regularity conditions, and one more could be eliminated by dividing Eq. 5.37 through by any one of the constants known to be nonzero. The conics are essentially a

three-parameter system of ROC curves, and (basically) three parameters appear in the ROC character, A, B, C. The coefficients A, B, C are not necessarily the most appropriate coefficients to manipulate in detection work. A more appropriate set of parameters might be based on the range of the logarithm of the likelihood ratio. Let M be the slope of the ROC curve as it approaches the point (0, 0), and let m be the slope of the ROC curve as it approaches (1, 1). The range of z is

$$\ln m < z < \ln M \tag{5.64}$$

$M = +\infty$ and/or $m = 0$ will be allowed. Since the slope of the chance diagonal is one, the value for m is less than one, and the value for M is greater than one, for nonchance regular ROC curves. Given an ROC curve in an expression such as Eq. 5.37, these terminal values for ROC slope can be obtained from Eq. 5.45. The work is trivial but tedious; the results are given below.

$$M = -E/D, \qquad m = -(B + C + E)/(A + B + D) \tag{5.65}$$

Table 5.1 displays parameters for various conditions of terminal slope. Table 5.1 also contains equations for the ROC curves and characters. The fifth case in Table 5.1 is the symmetric case, because the selection of symmetric bounds on this type of ROC character means that the ROC curves will be symmetric.
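The constants of Eq. 5.43, the terminal slopes of Eq. 5.65, and the character of Eq. 5.63 can be evaluated numerically. The following is a minimal Python sketch, not part of the original report; the function names are hypothetical, and the coefficient values used in the usage note are those of case (1) of Table 5.4.

```python
import math

def conic_constants(A, B, C, D, E):
    """Return (Delta, K) of Eq. 5.43 for Ay^2 + 2Bxy + Cx^2 + 2Dy + 2Ex = 0."""
    delta = B * B - A * C                      # geometric discriminant
    K = C * D * D + A * E * E - 2 * B * D * E  # coefficient K
    return delta, K

def is_regular(A, B, C, D, E, tol=1e-12):
    """Side condition of Eq. 5.38: the conic passes through (1, 1)."""
    return abs(A + 2 * B + C + 2 * D + 2 * E) <= tol

def terminal_slopes(A, B, C, D, E):
    """Return (M, m) of Eq. 5.65; D = 0 is treated as an infinite slope M."""
    M = math.inf if D == 0 else -E / D
    m = -(B + C + E) / (A + B + D)
    return M, m

def character(z, A, B, C, D, E):
    """Conic ROC character of Eq. 5.63, valid for ln m < z < ln M."""
    _, K = conic_constants(A, B, C, D, E)
    return math.sqrt(K) * (A * math.exp(z) + 2 * B + C * math.exp(-z)) ** -1.5
```

For the coefficients A = C = 1, B = 14, D = 1, E = -16, `terminal_slopes` returns slopes of 16 and 1/16, and `conic_constants` returns the pair (195, 705). Because the character is the density f(z|N) multiplied by $e^{.5z}$, integrating $e^{-.5z}\pi(z)$ over the full z range should give one, which is a convenient sanity check on any set of coefficients.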

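Table 5.1. Canonical conic equations

(1) $M = \infty$: $D = 0$, $E = -1$, $A + 2B + C = 2$
    ROC: $Ay^{2} + 2Bxy + Cx^{2} - 2x = 0$
    $\pi(z) = \sqrt{A}\,(Ae^{z} + 2B + Ce^{-z})^{-1.5}$
    $K = A$, $\Delta = B^{2} - AC$, $m = 1 - (A + B)^{-1}$

(2) $M = \infty$, $m = 0$: $B = 1 - A$, $C = A$, $D = 0$, $E = -1$, $\Delta = 1 - 2A$
    $\pi(z) = \sqrt{A/8}\,(1 + 2A\sinh^{2} .5z)^{-1.5}$
    ROC: $A(y - x)^{2} - 2x(1 - y) = 0$

(3) $M < \infty$, $m = 0$: $B = -(1 + A)$, $C = 2M + A$, $D = 1$, $E = -M$, $\Delta = 1 - 2A(M - 1)$
    ROC: $Ay^{2} - 2(1 + A)xy + (A + 2M)x^{2} + 2y - 2Mx = 0$

(4) $M \neq \infty$, $m \neq 0$: $D = 1$, $E = -M$, $A + 2B + C = 2(M - 1)$
    ROC: $Ay^{2} + 2Bxy + Cx^{2} + 2y - 2Mx = 0$
    $K = AM^{2} + 2MB + C$, $m = 1 - (M - 1)(A + B + 1)^{-1}$

(5) $M < \infty$, $mM = 1$: $B = M - 1 - A$, $C = A$, $D = 1$, $E = -M$; $A > -2M/(M - 1)$
    $K = A(M - 1)^{2} + 2M(M - 1)$, $\Delta = (M - 1)(M - 1 - 2A)$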

Relevant equations and parameter values are collected in Table 5.2 for the case of zero and infinite terminal slopes.

Table 5.2. Symmetric conics, unbounded z

  Geometric type          ROC curve         Equation                    A = C   Δ     B     y on neg. diag.
                          Chance diagonal   y = x                       ∞       -∞    -∞    .50
  Ellipse (long edge)
                          Circle            y^2 + (1-x)^2 = 1           1       -1    0     .707
  Ellipse (pointed end)
                          Parabola          2(y-x) = 1 - (x+y-1)^2      .5      0     .5    .75
  Hyperbola
                          Singular          y = 1                       0       1     1     1.0

When the ROC is symmetric but with bounded range, the geometric names and the corresponding values for A, B, C, D and Δ are displayed in Table 5.3. Figures 5.6 through 5.12 display on linear paper the ROC curves and the ROC characters for bounded symmetric conics. Each figure displays the complete range of geometric type from a chance diagonal to the Luce ROC curve, all with the same terminal slopes.

Table 5.3. Symmetric conics, bounded z

  Geometric type          ROC curve         A = C          Δ            B                y on neg. diag.
                          Chance diagonal   ∞              -∞           -∞               .50
  Ellipse (long edge)
                          Circle            M-1            -(M-1)^2     0                [((M^2+1)/2)^.5 - 1]/(M-1)
  Ellipse (pointed end)
                          Parabola          .5(M-1)        0            .5(M-1)          (3M+1)/[4(M+1)]
  Hyperbola
                          Rt. hyperbola     0              (M-1)^2      M-1              M^.5/(M^.5 + 1)
  Hyperbola
                          Luce              -2M/(M-1)      (M+1)^2      (M^2+1)/(M-1)    M/(M+1)

[Figures 5.6 through 5.12 each show two panels: the ROC curves, x = P(A|N) against y = P(A|SN), and the ROC characters π(z) against |z|. Curves are labeled (1) chance diagonal, (2) circle, (3) parabola, (4) rt. hyperbola, (5) Luce.]

Fig. 5.6. Symmetric conics, $.5 < \ell < 2$, $|z| < .693$

Fig. 5.7. Symmetric conics, $.2 < \ell < 5$, $|z| < 1.61$

Fig. 5.8. Symmetric conics, $.1 < \ell < 10$, $|z| < 2.30$

Fig. 5.9. Symmetric conics, $.05 < \ell < 20$, $|z| < 3.00$

Fig. 5.10. Symmetric conics, $.02 < \ell < 50$, $|z| < 3.91$

Fig. 5.11. Symmetric conics, $.0101 < \ell < 99$, $|z| < 4.59$

Fig. 5.12. Symmetric conics, unbounded $\ell$, $|z| < \infty$

Two ROC characters, one concave and the other convex, are compared to a rectangular character. The specific cases considered are shown in Table 5.4.

Table 5.4. Collected equations

(1) A = C = 1, B = 14, D = 1, E = -16, K = 705, Δ = 195, Γ = 30
    ROC: $y^{2} + 28xy + x^{2} + 2y - 32x = 0$
    $\pi(z) = (705)^{.5}\,(28 + e^{z} + e^{-z})^{-1.5}$, $|z| < 2.77$

(2) A = C = 0, B = 15, D = 1, E = -16, K = 480, Δ = 225, Γ = 15
    ROC: $30xy + 2y - 32x = 0$
    $\pi(z) = (480)^{.5}\,(30)^{-1.5}$, $|z| < 2.77$

(3) A = C = -1, B = 16, D = 1, E = -16, K = 255, Δ = 255, Γ = 0
    ROC: $-y^{2} + 32xy - x^{2} + 2y - 32x = 0$
    $\pi(z) = (255)^{.5}\,(32 - e^{z} - e^{-z})^{-1.5}$, $|z| < 2.77$

For the middle case the coefficients A and C are zero. Therefore the ROC character is a constant over its full range. The range has been chosen to correspond to a maximum slope of 16 and a minimum slope of 1/16. For Curves 1 and 3 the same z range is utilized and therefore the minimum and maximum slopes are the same. However, for Curve 1, A and C are plus one, while for Curve 3, A and C are minus one. This means that as the parameter z increases in absolute value from zero, the denominator of the ROC character

for Curve 1 will increase, and the ROC character will decrease. In just the opposite manner, as z increases in absolute value, the denominator of the ROC character for Curve 3 will decrease in magnitude and the ROC character will increase for large z. The ROC characters and the corresponding ROC curves are plotted on linear paper in Fig. 5.13. The cusp nature of the third ROC character concentrates more probability at extreme values of z, and therefore places more emphasis on those arcs of the ROC with extreme slope. This means that it spends more "time" rising sharply from the origin than does Curve 2 with a flat ROC character; the ROC curve is superior to the rectangular hyperbola with the same limiting slopes. In contrasting manner, the first ROC character has the more normal convex appearance with mode at z = 0. This emphasizes slope in the neighborhood of one (z = 0). If one were to use an approximation procedure for data by using a rectangular hyperbola for the first-order approximation, it seems fairly evident how one might adjust the ROC character compared to the flat character to obtain better fits in the neighborhood of the negative diagonal.

The three ROC characters of Table 5.4, and their ROC curves, can be compared to normal by plotting the curves on normal-normal paper. This is done in Fig. 5.14. These hyperbolas differ from normal ROC curves by indicating higher detection indices, d', along the negative diagonal and lower values off the diagonal. That is, they have sharper "elbows" along the negative diagonal. For

[Fig. 5.13. ROC and character for three hyperbolas. Upper panel: ROC curves (1), (2), (3) on linear paper. Lower panel: ROC characters against z, from -4 to 4.]

[Fig. 5.14. Three hyperbolas compared to normal. Curves (1), (2), (3) on normal-normal paper, x and y in percent.]

comparison, a normal curve with d = 2.3104 has been chosen to approximate Curve 1 in the .05 to .50 false alarm probability region. The character and ROC curve for the first hyperbola have been repeated in Fig. 5.15 on linear paper, along with the approximating normal curve and character. The normal ROC character extends to plus infinity; therefore we would expect it to have far superior performance at very low false alarm rates. Indeed, for x = .01 the normal ROC has y = .21 while the hyperbolic ROC shown has only y = .132. This discrepancy might be exceedingly important in some situations. If the data for which an approximating ROC curve is desired are based on fewer than 500 observations, then determination of an accurate false alarm rate of the order of .01 would not be possible, and this region of the ROC would be of little interest. A feature of the chosen fit is that the one percent false alarm rates occur at roughly the same z value. As can be seen from Fig. 5.15, from the maximum z value of the conic ROC down to about z = 1 the ROC character for the conic is larger than the character of the normal; the conic ROC rises proportionately faster. By the time the false alarm rate is ten percent the two curves cross. If one's definition of "close" means "as points appear when plotted on linear paper," then these two ROC characters yield ROC curves which are quite close.

5.3 Lor-Lor Paper

The previous section developed the three-parameter

[Fig. 5.15. Comparison of curves and characters for a conic and a normal ROC. Upper panel: normal curve (N) and conic Curve (1) on linear paper, with marked points. Lower panel: characters π(z) against z, from -4 to 4, with x = .10 and x = .01 indicated.]

class of regular conic ROC curves. If one wishes to use this form of ROC curve and character to fit data, one would appreciate some graphical aid to obtaining the fit. Those who have worked with normal ROC approximations are familiar with the usefulness of normal-normal paper. In some experimental work the data have considerable scatter, and the availability of various types of special paper, a straight-edge, and the experimenter's eye can be as powerful as a digital computer. This section will develop the formulae for the coordinates of special paper, called lor-lor paper, on which the rectangular hyperbolas plot as straight lines with slope one. This is followed by the general equation for the symmetric conic ROC curves as they would plot on this paper. Finally, graphs are presented of the hyperbolas, other conics, and normal ROC curves on lor-lor paper.

The coordinate transformation which shall be used was used by R. A. Fisher in his study of the correlation coefficient and is sometimes known as the Fisher-z transformation. The coordinate transformation has also been used in the study of sequential observation (Ref. 25). I am unaware of the commercial availability of such paper.

The ratio of the probability of an event occurring to the probability of that same event not occurring is sometimes called the "odds ratio." Quite naturally, the logarithm of this ratio is referred to as the logarithm of the odds ratio, or simply log-odds-ratio. We shall reserve the symbol L for the log-odds-ratio transformation (simply, the lor transformation).
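The lor transformation just described can be sketched numerically. The following Python fragment is illustrative only and is not part of the original report; the names `lor` and `inverse_lor` are hypothetical, and the inverse mapping is included as a convenience for laying out graph-paper tick marks.

```python
import math

def lor(p):
    """Log-odds-ratio: the logarithm of p / (1 - p), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_lor(L):
    """Probability whose log-odds-ratio is L."""
    return 1.0 / (1.0 + math.exp(-L))

# Tick positions for a lor scale: the scale variable L is laid out linearly,
# so a probability p is drawn at distance lor(p) from the 50 percent point.
ticks = {p: lor(p) for p in (0.01, 0.05, 0.20, 0.50, 0.80, 0.95, 0.99)}
```

The tick positions are antisymmetric about the 50 percent point, so the .01 and .99 marks lie at equal distances on opposite sides of it. For small p the value of `lor(p)` differs from `math.log(p)` by only about p itself.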

[Fig. 5.16. Comparison of lor and log scales. L(p) = ln[p/(1-p)] plotted against p on semilog paper, p from .001 to .50, L from 0 to -7.]

Definition of log-odds-ratio, L( ):

$$0 < P < 1, \qquad L(P) = \ln\frac{P}{1 - P} \tag{5.66}$$

For very small values of its argument, the log-odds-ratio is very close to being a simple logarithm. To demonstrate how close, the log-odds-ratio has been plotted against probability on semilog paper in Fig. 5.16. For values of probability below something like three percent to ten percent, L(P) is graphically equivalent to ln P. L is an odd symmetric function about 50 percent probability. This may be seen by substituting (1-P) for P in Eq. 5.66. Therefore when P becomes very close to unity, L(P) will again become nearly logarithmic, but with the logarithm of (1-P).

$$L(1 - P) = -L(P); \qquad P \to 0,\; L(P) \to \ln P; \qquad P \to 1,\; L(P) \to -\ln(1 - P) \tag{5.67}$$

This symmetry property makes the lor scale more useful than the ordinary log scale for plotting ROC curves. In Fig. 5.17 the log, lor, and normal scales are displayed. The placement was chosen to align the one percent and fifty percent points on the scales. The major contrast between the lor and log scales is in the symmetry. The log scale expands the region below .50, but greatly compresses the region between .50 and 1.00, while the lor scale expands both regions

[Fig. 5.17. Comparison of lor, normal and log scales. Three probability scales with the .01 and .50 points aligned: (a) logarithmic scale, $s = \ln p$, $p(s) = e^{s}$; (b) lor scale, $L(p) = \ln[p/(1-p)]$, $p(L) = e^{L}/(1 + e^{L})$; (c) normal (probability) scale, $p(t) = (2\pi)^{-.5}\int_{-\infty}^{t} e^{-u^{2}/2}\,du$.]

symmetrically. The normal scale is also a symmetric scale about 50 percent probability. On all three scales the marked positions for probability are not uniformly spaced. The scale variables, L, t, and s, are uniformly spaced on the axes.

The quantity d' advanced by Tanner, Swets, and Green as an index of detectability is directly related to the normal scale when the experimental procedure is a YES-NO test (Ref. 13, 10). For each point (x, y) on the ROC curve the detection index is given by Eq. 5.68.

$$d' = \Phi^{-1}(y) - \Phi^{-1}(x) \tag{5.68}$$

The function $\Phi$ is the normal distribution function and the superscript $-1$ indicates the inverse function. The inverse of the normal probability distribution function is the linear distance scale on normal-normal paper. Therefore the value of d' for any ROC point is its vertical scale value minus the horizontal scale value on normal-normal graph paper. The normal ROC curves plot as straight lines with a slope of one on normal-normal paper. The difference between the vertical and horizontal scale values is constant over the normal ROC. The numerical value of the constant is the square root of the index d appearing in the ROC character. Therefore, for the normal ROC the value of d' is the square root of d over the whole ROC curve.

$$\text{Normal ROC:} \quad d' = \sqrt{d} \tag{5.69}$$
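Equations 5.68 and 5.69 can be checked numerically. The following is a minimal Python sketch using the standard library's `statistics.NormalDist`; it is not part of the original report, the function names are hypothetical, and the normal ROC is assumed to be the equal-variance form $y = \Phi(\Phi^{-1}(x) + \sqrt{d})$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def d_prime(x, y):
    """Detection index of an ROC point (x, y), Eq. 5.68."""
    return STD_NORMAL.inv_cdf(y) - STD_NORMAL.inv_cdf(x)

def normal_roc(x, d):
    """Hit probability on the normal ROC with index d (assumed equal-variance form)."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(x) + math.sqrt(d))

# d' should equal sqrt(d) at every point of a normal ROC (Eq. 5.69).
indices = [d_prime(x, normal_roc(x, 2.3104)) for x in (0.01, 0.10, 0.50, 0.90)]
```

With d = 2.3104, the value used elsewhere in this chapter for the normal approximation to Curve 1, every entry of `indices` should be close to 1.52, the square root of d.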

In an analogous manner we shall define a lor index of detectability of a point as the difference between the vertical scale coordinate and the horizontal scale coordinate on lor-lor graph paper.

$$\text{Definition:} \quad M' = L(y) - L(x) \tag{5.70}$$

The coefficients of the equation for the regular rectangular hyperbola are repeated in Eq. 5.71.

$$\text{Regular Rt. Hyperbola:} \quad A = 0,\; B = M - 1,\; C = 0,\; D = 1,\; E = -M \tag{5.71}$$

M is the maximum slope of the ROC curve, and the reciprocal of the minimum slope of the ROC curve. Substitute these values in the general equation, and divide through by a common factor of two.

$$(M - 1)xy + y - Mx = 0 \tag{5.72}$$

Such an equation can be explicitly solved for y. The solution is

$$y = \frac{Mx}{Mx + 1 - x} \tag{5.73}$$

Subtract y from one

$$1 - y = \frac{1 - x}{Mx + 1 - x} \tag{5.74}$$

and take the ratio of y to (1-y).

$$\frac{y}{1 - y} = M\,\frac{x}{1 - x} \tag{5.75}$$

In logarithmic form,

$$\ln\frac{y}{1 - y} = \ln M + \ln\frac{x}{1 - x} \tag{5.76}$$

Equation 5.76 can be simply rewritten in terms of the lor scale.

$$L(y) = L(x) + \ln M \tag{5.77}$$

If graph paper is used so that the true linear scale along the two axes is the lor of the probability marked on those axes, then a regular rectangular hyperbola will plot as a straight line with slope of one. Along the entire ROC curve the lor index of detectability, M', will be constant and equal to the logarithm of M.

$$\text{Rt. Hyp. ROC:} \quad M' = \ln M \tag{5.78}$$

In Fig. 5.18 curves of constant M' are plotted on lor-lor paper. A subtle but very necessary distinction is the difference between M and M'. The same distinction must be made between d and d' (Ref. 10). The index d applies only to the normal ROC curve, and the normal ROC character. The index M applies only to the regular rectangular hyperbola ROC curve, the rectangular ROC character. The indices d' and M' apply to any point on the ROC graph. In general, the indices d' and M' will vary along a given ROC curve. Only when the ROC curves are the special cases of normal (for d') and rectangular hyperbola (for M') will these indices

[Fig. 5.18. Lor-lor paper with lines of equal M'. Both axes are lor scales marked from .01 to .99.]

be constant over the entire ROC curve, and directly related to the parameter for the entire curve.

In Fig. 5.19, a family of normal ROC curves is plotted on lor-lor paper. The index M' varies along each curve, although not radically. Along the negative diagonal, the value for M' is a minimum, and M' increases along a given curve as the point moves away from the negative diagonal. This indicates that the normal character concentrates more probability at large magnitude z values than would a rectangular character which yielded the same negative diagonal point.

When working with ROC curves with nonzero minimum and noninfinite maximum slope, one may obtain information about the ROC character from the ROC curve plotted on lor-lor paper. In order to demonstrate this interpretation procedure, a set of conic ROC curves with the same initial and final slope values were selected. The parameters for these are given in Table 5.5.

Table 5.5. Table of parameters

  Neg. diag. point    A           B            C           D    E      Δ
  (.30, .70)          5 1/8       3 7/8        5 1/8       1    -10    -11.25
  (.2405, .7595)      0           9            0           1    -10    81
  (.20, .80)          -1 1/3      10 1/3       -1 1/3      1    -10    105
  (.10, .90)          -2.21875    11.21875     -2.21875    1    -10    120.94
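The behavior of the lor index M' described above, constant along a rectangular hyperbola and smallest at the negative diagonal of a normal ROC, can be illustrated with a short Python sketch. It is not part of the original report; the function names are hypothetical, and the normal ROC is again assumed to be the equal-variance form with detection index d'.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def lor(p):
    """Log-odds-ratio L(p) = ln[p / (1 - p)], Eq. 5.66."""
    return math.log(p / (1.0 - p))

def lor_index(x, y):
    """Lor index of detectability M' = L(y) - L(x), Eq. 5.70."""
    return lor(y) - lor(x)

def hyperbola_roc(x, M):
    """Regular rectangular hyperbola, Eq. 5.73."""
    return M * x / (M * x + 1.0 - x)

def normal_roc(x, d_prime):
    """Equal-variance normal ROC (assumed form): y = Phi(Phi^-1(x) + d')."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(x) + d_prime)

def normal_neg_diag_x(d_prime):
    """False alarm probability where the symmetric normal ROC meets y = 1 - x."""
    return STD_NORMAL.cdf(-0.5 * d_prime)
```

For M = 10, `lor_index(x, hyperbola_roc(x, 10))` equals ln 10 for every x, which is the common asymptotic value M' = 2.30 of the curves in Table 5.5. For a normal ROC, evaluating `lor_index` at `normal_neg_diag_x(d_prime)` and at points farther along the curve shows the index rising away from the negative diagonal.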

[Fig. 5.19. Normal ROC curves on lor-lor paper. Both axes are lor scales marked from .01 to .99.]

Each of these curves has maximum slope of ten and minimum slope of .1. A rectangular hyperbola with index M' = ln 10 = 2.30 also has these same terminal slope values. These ROC curves are shown on linear paper in Fig. 5.20. The Luce ROC curve with slopes 10 and .1 is also drawn. This is the limiting value for conic ROC curves with these slopes. The hyperbola going through the negative diagonal point (.10, .90) lies very close to the Luce ROC curve. The hyperbola going through the negative diagonal point (.20, .80) is superior to the rectangular hyperbola, and the ellipse going through the negative diagonal point (.30, .70) is inferior to the rectangular hyperbola ROC curve.

Figure 5.21 displays the ROC characters for the same curves on semi-log paper. Figure 5.22 displays the ROC curves on lor-lor paper. The rectangular hyperbola has a flat character, and plots as a straight line on Fig. 5.22. Since all of the curves have the same minimum and maximum slope values, the z range is common. This is shown as a common range in Fig. 5.21 from -2.3 to +2.3. This appears on the ROC curve in Fig. 5.22 as the common asymptotic M' value. All of the curves are asymptotic to M' = 2.30. The curve going through the negative diagonal point (.30, .70) was inferior to the rectangular hyperbola, while the other two were superior. The same relationship holds for large magnitude z values on the ROC character. The (.30, .70) curve has lower character value than the flat character while the other two have higher character value at

[Fig. 5.20. Five ROC curves with common terminal slope. Linear paper, x against y; curves labeled by negative diagonal point.]

[Fig. 5.21. ROC characters, -2.3 < z < +2.3. Semi-log paper; curves labeled by negative diagonal point, with the flat character M' = 2.30 and peak values of 112.4 marked. A small circle indicates X = .01 for z > 0 and Y = .99 for z < 0.]

[Fig. 5.22. Four ROC curves with common terminal slope. Lor-lor paper; curves labeled (.10, .90), (.20, .80), (.30, .70).]

large magnitude z. Of course, just the opposite relation holds when z is close to zero. Since the ROC curve is obtained by multiplying the ROC character by $e^{\pm .5z}$ and integrating, the effect of the ROC character for large magnitudes of z overshadows the effect at small magnitudes of z. The ROC character for the (.10, .90) case rises from very small values to the value .16 at z = ±2.2, to the value 112.4 at z = ±2.3. If the peak character value of 112 were maintained constant over a z range of only .0026, it would accumulate an area of .288. The integrated ROC character for the Luce ROC has jumps of .288 at z = ±M'; that is, the character has delta functions of magnitude .288 at ±M'. The two ROC curves and characters are close.

The above example compared symmetric conic ROC curves with the same limiting slope values. The next example compares nonsymmetric conic ROC curves with the same limiting slope values. The limiting slopes chosen are a maximum slope of 100 and a minimum slope of .1. This means that the z range will be from -2.3 to +4.6. The coefficients for the five cases considered are given in Table 5.6. The ROC curves are displayed on linear coordinates in Fig. 5.23. Since the initial and final slope values are not reciprocal values, there is no rectangular hyperbola with constant M' value to compare with these. In Fig. 5.23 are shown portions of two constant M' curves. In the high slope region, to the left of the negative diagonal, is drawn the rectangular hyperbola with index M' = 4.6. To the right of the

[Fig. 5.23. Five asymmetric ROC curves with common terminal slope. Linear paper; curves labeled by negative diagonal point.]

Table 5.6. Table of coefficients

  Neg. diag. point    A           B             C             D    E       Δ
  (.4, .6)            742         -633          722           1    -100    -135,035
  (.3, .7)            91 3/8      17 5/8        71 3/8        1    -100    -6211.25
  (.2, .8)            12          97            -8            1    -100    +9505
  (.1, .9)            -1 29/32    110 29/32     -21 29/32     1    -100    +12,258
  (1/11, 10/11)       Luce ROC: y - 100x = 0, x ≤ 1/111; 10y - x = 9, x ≥ 1/111

negative diagonal, in the low slope region, is drawn the rectangular hyperbola with M' = 2.3. In the neighborhood of the corner points of the ROC all of the curves will come together because they approach the corners with the terminal slope values.

The corresponding ROC characters are displayed on semi-log paper in Fig. 5.24. The ROC curves on lor-lor paper are displayed in Fig. 5.25. The two curves which are inferior to both of the constant M' curves display the same general peak character as in the previous example. The curve marked (.1, .9) hugs the Luce ROC for the low slope values, but not for high slope values. Therefore the ROC character displays a nearly delta-like behavior at the minimum z value, -2.3, but not nearly so sharp a cusp at the maximum z value, 4.6. The curve marked (.2, .8) has a monotone decreasing ROC character. The value of this character falls between that of the two limiting constant M' values between z = -1.4 and z = 2.6. This

[Fig. 5.24. ROC characters, -2.3 < z < +4.6. Semi-log paper, π(z) against z; curves labeled by negative diagonal point. A small circle indicates X = .01 for z > 0 and Y = .99 for z < 0.]

[Fig. 5.25. Five asymmetric ROC curves with common terminal slope on lor-lor paper.]

corresponds to the range of slopes from .25 to 13.5. Since the ROC character for (.2, .8) falls below the limiting M' character at z = 4.6, the ROC curve will be inferior to the M' = 4.6 level near the origin. Since the ROC character for (.2, .8) is larger than that for M' = 2.3, the ROC curve will fall above the M' = 2.3 value as the curve approaches (1, 1).

In Fig. 5.26, a pure power law ROC is plotted on lor-lor paper. Two asymptotes are drawn to this curve. The left-hand asymptote is obtained from the logarithmic nature of the lor for small values of argument. Since the pure power ROC plots as a straight line on log-log paper, with slope A (in this case, 2), the pure power ROC will plot as a straight line with slope A on lor-lor paper when both x and y are small. Since the pure power ROC curves approach the (1, 1) point with slope 1/A, the ROC curve will become asymptotic to the rectangular hyperbola with index M' = ln A.

5.4 Choosing a Fitting Conic

The conic equation contains six coefficients, A, B, C, D, E, and F. However, restriction to regular ROC curves places three side conditions on these coefficients, leaving three degrees of freedom. It would be desirable to have a procedure to obtain the "best" fitting ROC to any amount of data. Such a procedure has not been developed in this present work. If three representative points are selected, the procedure

[Fig. 5.26. Power ROC on lor-lor paper. Both axes are lor scales marked from .01 to .99.]

to fit the regular conic through these three points is fairly simple and straightforward. One must first check that the broken line that goes from the point (0, 0) to the point with lowest false alarm rate, then to the point with the next false alarm rate, to the point with the highest false alarm rate, and then to the upper corner (1, 1), is convex and that no three points are collinear. This last means that there are really four distinct straight line segments going from (0, 0) to (1, 1). The only problem is to determine the coefficients so that Eq. 5.79 is satisfied for these three points.

$$Ay_{i}^{2} + 2Bx_{i}y_{i} + Cx_{i}^{2} + 2Dy_{i} + 2Ex_{i} = 0, \qquad i = 1, 2, 3 \tag{5.79}$$

Utilizing the equation that makes the ROC go through (1, 1), and momentarily making the assumption that E is nonzero, the set of equations (5.79) becomes Eq. 5.80.

$$\begin{aligned} &A\,y_{i}(1 - y_{i}) + 2B\,y_{i}(1 - x_{i}) + C\,(y_{i} - x_{i}^{2}) = 2(y_{i} - x_{i}), \qquad i = 1, 2, 3 \\ &E = -1, \qquad D = 1 - B - .5(A + C) \end{aligned} \tag{5.80}$$

If the determinant of the three equations in three variables is not zero, then the assumption that E was nonzero is valid and the ROC has been determined. If the determinant is zero, we may assume that E = 0 and then re-solve the equations in the first part of Eq. 5.80 after setting the right-hand side to zero.

At a meeting of the Acoustical Society of America in

1965, students of Lloyd Jeffries presented some ROC curves with very distinct and measurable slopes from the origin and the (1, 1) point. If one wishes to match a conic ROC to such data, these two slopes and any one point are sufficient to define the conic. The only restriction is that the point must be chosen to lie above the chance diagonal and interior to the Luce ROC curve with the given slopes. The Luce ROC curve with these given slopes is given by

$$0 < m < 1 < M: \qquad y = Mx, \quad 0 \le x \le \frac{1 - m}{M - m}; \qquad y = 1 - m + mx, \quad \frac{1 - m}{M - m} \le x \le 1 \tag{5.81}$$

A program was written to do this matching and is contained in Appendix E. The input data for such a program are the two slopes and the additional point. The program is designed to check that the point lies above the chance diagonal and below the Luce ROC curve, to determine the coefficients of the equations, and in addition to print out the ROC curve in linear and lor coordinates and determine the ROC character for the resultant conic.
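The matching of a conic to two terminal slopes and one point can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Appendix E program, whose listing is not reproduced here. It assumes finite M and the normalization D = 1, E = -M, so that Eq. 5.38, Eq. 5.65, and the conic equation evaluated at the chosen point give three linear equations in A, B, C. All function names are hypothetical.

```python
def luce_roc(x, M, m):
    """Luce ROC with terminal slopes M and m, Eq. 5.81."""
    x_break = (1.0 - m) / (M - m)
    return M * x if x <= x_break else 1.0 - m + m * x

def det3(rows):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_conic(M, m, x0, y0):
    """Return (A, B, C, D, E) for the regular conic with slopes M, m through (x0, y0)."""
    if not (x0 < y0 < luce_roc(x0, M, m)):
        raise ValueError("point must lie above the chance diagonal and below the Luce ROC")
    rows = [
        [1.0, 2.0, 1.0],                     # A + 2B + C = 2(M - 1), from Eq. 5.38
        [m, m + 1.0, 1.0],                   # mA + (m + 1)B + C = M - m, from Eq. 5.65
        [y0 * y0, 2.0 * x0 * y0, x0 * x0],   # conic passes through (x0, y0)
    ]
    rhs = [2.0 * (M - 1.0), M - m, 2.0 * M * x0 - 2.0 * y0]
    det = det3(rows)
    if det == 0.0:
        raise ValueError("singular system; the D = 1 normalization does not apply")
    solution = []
    for col in range(3):                     # Cramer's rule, one unknown per column
        replaced = [row[:] for row in rows]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / det)
    A, B, C = solution
    return A, B, C, 1.0, -M
```

As a check, `fit_conic(100, .1, .2, .8)` reproduces the (.2, .8) row of Table 5.6 (A = 12, B = 97, C = -8), and `fit_conic(10, .1, .2, .8)` reproduces the (.20, .80) row of Table 5.5. Cramer's rule is used only because the system is 3 by 3; any linear solver would serve.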

CHAPTER VI

CHARACTER FITTING BY TRUNCATED NORMAL CHARACTER

6.1 Metastatic Normal

The purpose of this chapter is to present various methods of approximating ROC characters by sections of normal character, in order to yield approximate ROC curves through the use of normal tables. Each technique assumes a certain limited amount of knowledge about the true ROC character, and based on this information achieves some sort of fit in terms of a truncated normal character.

In Section 3.4, the metastatic relation for general ROC curves was given. The two key equations are Eq. 3.39, relating the ROC characters, and Eq. 3.37, relating the ROC curves.

$$\pi_{2}(z) = \frac{\pi_{1}\!\left(z + \ln\dfrac{Y_{1}(b) - Y_{1}(a)}{b - a}\right)}{[(b - a)(Y_{1}(b) - Y_{1}(a))]^{.5}} \tag{3.39}$$

$$X_{2} = \frac{X_{1} - a}{b - a}, \qquad Y_{2} = \frac{Y_{1} - Y_{1}(a)}{Y_{1}(b) - Y_{1}(a)} \tag{3.37}$$

The original ROC curve is subscripted by a one and the metastatic derivative is subscripted by a two. Two expressions that repeat in these equations are the range of the X axis that is mapped onto the metastatic derivative, and the portion of the Y axis that is mapped onto the metastatic derivative. It will be convenient to have single

letters for these positive quantities. Let

$$K_{N} = b - a, \qquad K_{SN} = Y_{1}(b) - Y_{1}(a) \tag{6.1}$$

Two of the salient characteristics of an ROC character are the minimum and maximum values of z. Let the maximum value be $z_{a}$ and the minimum value be $z_{b}$. The subscript was chosen to correspond to the $X_{1}$ values on the original ROC curve.

$$z_{a} \text{ is the value of } z_{2} \text{ when } X_{1} = a,\; Y_{1} = Y_{1}(a) \tag{6.2}$$

$$z_{b} \text{ is the value of } z_{2} \text{ when } X_{1} = b,\; Y_{1} = Y_{1}(b)$$

It will also be convenient to have a notation for the shift between the z values on the first (the original) and the second (the metastatic) decision axes. This is the logarithm of the ratio of $K_{N}$ and $K_{SN}$, herein denoted by $z_{0}$.

$$z_{0} = \ln(K_{N}/K_{SN}) \tag{6.4}$$

$$z_{1} = z_{2} - z_{0} \tag{6.5}$$

This chapter is devoted to the metastatic normal ROC curve. Therefore $\pi_{1}(z)$ will always be assumed to be a normal ROC character. The general form of the ROC character with index d is

$$\pi_{N}(z) = (2\pi d)^{-.5}\,e^{-d/8}\,e^{-z^{2}/2d} \tag{6.6}$$

In order to avoid confusion between the normal detection index d, and

the parameter in an original ROC character which carries over as a parameter of the metastatic character, write $m^{2}$ for d. The notation that $\phi$ is the normal probability density function in standardized form and $\Phi$ is the normal distribution function is as used throughout this work. Rewriting the normal ROC character with these changes,

$$\pi_{N}(z) = \frac{1}{m}\,e^{-m^{2}/8}\,\phi\!\left(\frac{z}{m}\right) \tag{6.7}$$

The ROC character for a metastatic normal ROC curve is

$$\pi_{MN}(z) = \frac{1}{m\,(K_{N}K_{SN})^{.5}}\,e^{-m^{2}/8}\,\phi\!\left(\frac{z - z_{0}}{m}\right), \qquad z_{b} \le z \le z_{a} \tag{6.8}$$

In Fig. 6.1 are sketched two complete normal density functions on the original $z_{1}$ axis, and underneath this the $z_{2}$ axis.

[Fig. 6.1. Sketch of two z axes. The N and SN densities are drawn on the $z_{1}$ axis, centered at $-.5m^{2}$ and $+.5m^{2}$; the $z_{2}$ axis below marks $z_{b}$, $z_{0}$, and $z_{a}$.]

The

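three key $z_{2}$ values are the minimum value, $z_{b}$, the mode value, $z_{0}$, and the maximum value, $z_{a}$. Since the mean of the N distribution is at $-.5m^{2}$, the values for the false alarm probabilities, $X_{1}$, can be written in terms of the normal distribution function as given below.

$$X_{1}(z_{1}) = 1 - \Phi\!\left(\frac{z_{1} + .5m^{2}}{m}\right) = 1 - \Phi\!\left(\frac{z_{2} - z_{0}}{m} + .5m\right) \tag{6.9}$$

At the extremes of the $z_{2}$ range,

$$a = 1 - \Phi\!\left(\frac{z_{a} - z_{0}}{m} + .5m\right), \qquad b = 1 - \Phi\!\left(\frac{z_{b} - z_{0}}{m} + .5m\right) \tag{6.10}$$

Because the mean of the SN distribution is $+.5m^{2}$, the formula for the $Y_{1}$ probabilities will be quite similar to Eq. 6.9.

$$Y_{1}(z_{1}) = 1 - \Phi\!\left(\frac{z_{1} - .5m^{2}}{m}\right) = 1 - \Phi\!\left(\frac{z_{2} - z_{0}}{m} - .5m\right) \tag{6.11}$$

The areas remaining under the normal density curves inside the abrupt truncation points are $K_{N}$ and $K_{SN}$.

$$K_{N} = b - a = \Phi\!\left(\frac{z_{a} - z_{0}}{m} + .5m\right) - \Phi\!\left(\frac{z_{b} - z_{0}}{m} + .5m\right) \tag{6.12}$$

$$K_{SN} = \Phi\!\left(\frac{z_{a} - z_{0}}{m} - .5m\right) - \Phi\!\left(\frac{z_{b} - z_{0}}{m} - .5m\right) \tag{6.13}$$

The general equation for the metastatic ROC, Eq. 3.37, becomes the following for the metastatic normal ROC.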

$$X_{MN}(z) = \left[\Phi\!\left(\frac{z_a - z_0}{m} + .5m\right) - \Phi\!\left(\frac{z - z_0}{m} + .5m\right)\right] \Big/ K_N \tag{6.14}$$

$$Y_{MN}(z) = \left[\Phi\!\left(\frac{z_a - z_0}{m} - .5m\right) - \Phi\!\left(\frac{z - z_0}{m} - .5m\right)\right] \Big/ K_{SN} \tag{6.15}$$

In the following sections, the problem of fitting a given ROC character with the metastatic normal ROC character of Eq. 6.8 is attacked. The metastatic normal is given by three parameters. Given a specific ROC character, three salient characteristics must be selected to determine the matching metastatic normal. It is assumed that the ROC character to be fit contains a simple mode and has the general appearance of being normal about this mode. This can be most easily seen by looking at the logarithm of the ROC character, since the logarithm of the normal character is a simple parabola. It is also assumed that the character to be fit covers a bounded z-range.

6.2 First Procedure

In this section the equations necessary to fit a metastatic normal ROC character to a given ROC character are based on the minimum value and mode value of that given character, and the ratio of the character values at the mode and the minimum. It will be assumed that the maximum value of z is above the mode, but no other

use will be made of the maximum value. This information is summarized in Eq. 6.16.

$$\text{Known: } z_b,\ z_0 \text{ such that } z_a > z_0, \qquad \pi(z_0)/\pi(z_b) = r_b > 0 \tag{6.16}$$

This procedure is based on only the information given in Eq. 6.16. The metastatic normal ROC character, Eq. 6.8, evaluated at the mode, is

$$\pi_{MN}(z_0) = \frac{1}{m\sqrt{2\pi K_N K_{SN}}}\; e^{-m^2/8} \tag{6.17}$$

The metastatic normal ROC character differs from this value by only an exponential term. Equation 6.17 can be used to simplify the writing of the off-mode values of $\pi$. The known information relates the character values at the mode and at the minimum value,

$$\pi_{MN}(z_b) = \pi_{MN}(z_0)\; \exp\!\left[-\frac{(z_b - z_0)^2}{2m^2}\right] \tag{6.18}$$

Use this equation to determine the value for the parameter m

$$m = \frac{|z_b - z_0|}{\sqrt{2 \ln r_b}} \tag{6.19}$$

The detection index, $m^2$, of the original normal ROC curve, has been determined. The lower $z_1$ cut point $z_b - z_0$ was contained in the original information. To determine the upper $z_1$ cut point, $z_a - z_0$,

requires a devious route which is not always successful. The only clue to the value of the upper cut point lies in the normalizing constants $K_N$ and $K_{SN}$. Rewriting Eq. 6.4 in an equivalent form

$$e^{z_0} K_{SN} = K_N \tag{6.20}$$

the task of obtaining $z_a$ begins. In the equations for the K values, Eqs. 6.12 and 6.13, $z_a$ and $z_b$ appear in both equations. Expand Eq. 6.20 using Eqs. 6.12 and 6.13. All the terms containing $z_a$ can be moved to one side of the equation and all the terms containing $z_b$ to the other side of the equation.

$$\Phi\!\left(\frac{z_a - z_0}{m} + .5m\right) - e^{z_0}\,\Phi\!\left(\frac{z_a - z_0}{m} - .5m\right) = \Phi\!\left(\frac{z_b - z_0}{m} + .5m\right) - e^{z_0}\,\Phi\!\left(\frac{z_b - z_0}{m} - .5m\right) \tag{6.21}$$

The right-hand side of this equation is identical with the left-hand side, except that $z_b$ appears in place of $z_a$. Let

$$D(z) = \Phi\!\left(\frac{z - z_0}{m} + .5m\right) - e^{z_0}\,\Phi\!\left(\frac{z - z_0}{m} - .5m\right) \tag{6.22}$$

Equation 6.21 is simply

$$D(z_a) = D(z_b) \tag{6.23}$$

The value of $D(z_b)$ is known. If the function D(z) takes on each of its possible values exactly twice, then there would always be a solution

for $z_a$. For very small arguments nearing $-\infty$, the normal distribution function approaches zero. Therefore, for very large negative values of z, D will be nearly zero.

$$D(-\infty) = 0 \tag{6.24}$$

The sign of the derivative of D(z) is positive for z values less than $.5z_0$ and negative for z values greater than $.5z_0$.

$$\operatorname{sign} D'(z) = \operatorname{sign}(.5z_0 - z) \tag{6.25}$$

This means that D takes on its values at most twice. For large arguments D(z) approaches $1 - e^{z_0}$.

$$D(+\infty) = 1 - e^{z_0} \tag{6.26}$$

If the mode value, $z_0$, is positive the D function takes on negative as well as positive values to the right of $.5z_0$, while it takes on only positive values to the left of $.5z_0$. Therefore, if the known value $z_b$, which is below the mode, is also below half the mode, then we are guaranteed that for a positive $z_0$ a solution exists for Eq. 6.23. If $z_0$ is negative, the range of values to the right of the peak of D is less than the range of values to the left of the peak. Therefore if $z_b$ is too far below the mode, $z_0$, there will be no value $z_a$ that satisfies Eq. 6.23. This is summarized below.

$$z_0 < 0,\quad D(z_b) < 1 - e^{z_0}: \quad \text{no solution for } z_a \tag{6.27}$$
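The existence test described above can be tried numerically. The following is a minimal sketch, not part of the original report: it assumes D(z) has the form of Eq. 6.22, uses Python's standard-library normal distribution, and the helper name `other_cut_exists` and the numerical inputs are illustrative choices.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def D(z, z0, m):
    """D(z) of Eq. 6.22, with z measured on the metastatic (z2) axis."""
    t = (z - z0) / m
    return PHI(t + 0.5 * m) - math.exp(z0) * PHI(t - 0.5 * m)


def other_cut_exists(z_known, z0, m, known_is_lower=True):
    """Compare D at the known cut with the limit of D on the opposite side.

    Assumes the known cut lies on its own side of the peak of D.
    Lower cut known: the opposite limit is D(+inf) = 1 - exp(z0) (Eq. 6.26).
    Upper cut known: the opposite limit is D(-inf) = 0 (Eq. 6.24).
    """
    limit = 1.0 - math.exp(z0) if known_is_lower else 0.0
    return D(z_known, z0, m) > limit


# Illustrative inputs (hypothetical choices for this sketch).
print(other_cut_exists(-2.3, -0.2, 1.04))                        # lower cut known
print(other_cut_exists(4.6, -0.2, 1.426, known_is_lower=False))  # upper cut known
```

The check only says whether a second cut can exist; finding it still requires solving Eq. 6.23 by iteration.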

Given the information in Eq. 6.16, to use Procedure I for determining the fitting ROC character, one must proceed partly through the solution before knowing whether the procedure will be successful or not. The solution consists essentially of Eq. 6.19 for the parameter m describing the strength of the original normal ROC and Eq. 6.23 from which one determines the upper cut value.

First Example: Consider the ROC character of the conic section which is Curve 1 of Table 5.4 and Fig. 5.13. This ROC curve and character were matched to an extent by a complete normal curve, in Fig. 5.15. The equation for this character is

$$\pi(z) = (705)^{.5}\,\left(28 + e^{-z} + e^{z}\right)^{-1.5}, \qquad |z| < 2.77 \tag{6.28}$$

Although the complete ROC character is known, pick out simply the lowest value of the range, the mode value, and the ratio of the mode character to the character at the minimum value.

$$\text{Given: } z_b = -2.77, \qquad z_0 = 0, \qquad \pi(z_0)/\pi(z_b) = 1.772 \tag{6.29}$$

Based on this information alone, find the fitting metastatic ROC character. The parameter m is found from using Eq. 6.19. Numerically this results in

$$m = (2.77)/\sqrt{2 \ln 1.772} = 2.60; \qquad m^2 = 6.76 \tag{6.30}$$

(The work leading to Fig. 5.15 utilized a normal ROC with parameter d = 2.31 as an approximation.) The present procedure will use a normal ROC

with strength 6.76, but will not utilize the entire curve. Since the mode is zero, the equation for the function D is symmetric and the solution for the upper cut value is immediate.

$$D(z_a) = D(z_b), \qquad z_a = +2.77 \tag{6.31}$$

All that remains is to evaluate the K parameters using Eqs. 6.12 and 6.13, and to write out the expressions for the metastatic normal ROC using Eqs. 6.14 and 6.15. This results in

$$K_N = K_{SN} = .400152 \tag{6.32}$$

$$X = [\,.991106 - \Phi(t + 1.3)\,]/.400152 \tag{6.33}$$

$$Y = [\,.409046 - \Phi(t - 1.3)\,]/.400152$$

A character with known ROC curves has been used in order that we might evaluate this procedure of fitting an ROC curve by fitting three parameters of the ROC character. The true ROC curve for this case has been computed with great accuracy on the IBM 7090 computer for use in Chapter V. In order to test the usefulness of the procedure for quick hand calculations, all of the parameters, and the ROC curves of Eq. 6.33, were calculated using a handbook of normal tables and an ordinary ten-inch slide rule. The result of the ROC curve evaluation is given in Table 6.1, along with the true value of the ROC curve as obtained by the computer.

X         .0125   .0224   .0496   .0890   .1450   .2200
Y_MN      .160    .249    .416    .560    .685    .782
Y_TRUE    .161    .251    .418    .563    .683    .781

Table 6.1. Comparison of true and approximate ROC curves

The combination of errors in the procedure and in the slide rule technique resulted in a discrepancy that cannot be observed when the ROC is plotted in the one percent to 99 percent range, as it is in Fig. 6.2. In Fig. 6.2 the hyperbola and the fitting metanormal plot on top of each other. Also on that graph is plotted the original normal ROC from which the metastatic normal was cut out. The points on the original normal leading to the metastatic normal are indicated.

To summarize this example, the first fitting procedure has been used on a shallow symmetric ROC character. The resultant fit utilized only forty percent of an original normal ROC curve. The resultant fit to the ROC curve was excellent.

Second Example: For a second example consider the nonsymmetric ROC character that appeared in Fig. 5.24, and whose ROC passes through the negative diagonal point (.3, .7). We shall use only the mode value, the minimum value, and the ratio of the ROC character values at the mode and the minimum.

$$\text{Given: } z_0 = -0.2, \qquad z_b = -2.3, \qquad \pi(z_0)/\pi(z_b) = 7.5 \tag{6.34}$$

The value of the constant m is found by Eq. 6.19.

$$m = (2.1)/\sqrt{2 \ln 7.5} = 1.04 \tag{6.35}$$
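The two evaluations of Eq. 6.19 in the first and second examples can be repeated in a few lines. This is a small sketch rather than anything from the report; the function name `fit_m` is hypothetical, and full floating-point arithmetic should be expected to differ from the slide-rule values in the last quoted digit.

```python
import math


def fit_m(z_cut, z0, ratio):
    """Eq. 6.19: m from the cut-to-mode distance and the ratio pi(z0)/pi(z_cut)."""
    if ratio <= 1.0:
        raise ValueError("the character at the mode must exceed its value at the cut")
    return abs(z_cut - z0) / math.sqrt(2.0 * math.log(ratio))


m_first = fit_m(-2.77, 0.0, 1.772)   # inputs of Eq. 6.29
m_second = fit_m(-2.3, -0.2, 7.5)    # inputs of Eq. 6.34
print(f"first example:  m = {m_first:.3f}, m^2 = {m_first ** 2:.2f}")
print(f"second example: m = {m_second:.3f}")
```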

Fig. 6.2. ROC curves showing normal ROC and its metastatic image
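The metastatic normal curve of the first example can be re-evaluated at the false alarm probabilities listed in Table 6.1. The sketch below is an illustration under stated assumptions: it takes the constants of Eqs. 6.32 and 6.33 as printed, eliminates t by inverting the X equation, and relies on the standard-library `NormalDist`; the helper name `y_from_x` is hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()

# Constants of Eqs. 6.32 and 6.33 (first example), taken as printed.
A_N, A_SN, K, HALF_M = 0.991106, 0.409046, 0.400152, 1.3


def y_from_x(x):
    """Detection probability Y on the curve of Eq. 6.33 at false alarm probability X."""
    t = N01.inv_cdf(A_N - K * x) - HALF_M      # invert the X equation for t
    return (A_SN - N01.cdf(t - HALF_M)) / K    # substitute t into the Y equation


for x in (0.0125, 0.0224, 0.0496, 0.0890, 0.1450, 0.2200):
    print(f"X = {x:.4f}   Y_MN = {y_from_x(x):.3f}")
```

Values computed this way should agree with the Y_MN row of Table 6.1 to within a few units in the third decimal place, which is about the accuracy expected of slide-rule work.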

In order to utilize the D function of Eq. 6.22, $e^{z_0}$ is determined and then the D function evaluated at the lower limit. To check for the existence of a solution, $D(+\infty)$ is calculated by Eq. 6.26. The results are

$$e^{z_0} = .81873, \qquad D(z_b) = .06345, \qquad D(+\infty) = .18127 \tag{6.36}$$

The value .06345 is taken on only once by D, and Procedure I yields no solution. There are many ways to explain the existence of this nonsolution. The most apparent is that for this value of mode, the function D takes on more values to the left of its peak than to the right of its peak.

6.3 Second Procedure

The second procedure is similar to the first, and is utilized when the first fails. The first procedure's failure meant the D function took on more values to the left of its peak than to the right of its peak. The second procedure also uses the mode $z_0$, and hence the same function D. However, if the known value is $z_a$, a solution is guaranteed. As in the first procedure the ratio of the character values at the mode and the known cut are given. This is formalized below.

$$\text{Known: } z_a,\ z_0 \text{ such that } z_b < z_0, \qquad \pi(z_0)/\pi(z_a) = r_a \tag{6.37}$$

As in the first procedure this amount of knowledge is sufficient to determine the parameter m. The expression is

$$m = \frac{|z_a - z_0|}{\sqrt{2 \ln r_a}} \tag{6.38}$$

The second part of the solution is to utilize the D function defined by Eq. 6.22 and to solve Eq. 6.23. In contrast to Procedure I, the value of the upper cut $z_a$ is known, and Eq. 6.23 is utilized to determine $z_b$.

Third Example: Let us return to the example for which the first procedure failed. This was the ROC character given in Fig. 5.24 with the negative diagonal ROC point (.3, .7). The information necessary for the second procedure is the mode value of -0.2 and the upper cut value of +4.6, and the ratio of the ROC character at the upper cut and the mode. The values are given in Eq. 6.39.

$$\text{Given: } z_0 = -0.2, \qquad z_a = +4.6, \qquad \pi(z_0)/\pi(z_a) = 288 \tag{6.39}$$

The value of the parameter m, this time using Eq. 6.38, is

$$m = (4.8)/\sqrt{2 \ln 288} = 1.426 \tag{6.40}$$

In this second procedure m is 1.426, whereas it was 1.04 in the first unsuccessful attempt. First step numerical values are shown in Eq. 6.41.

$$e^{z_0} = .81873, \qquad D(z_a) = .18452, \qquad D(-\infty) = 0 \tag{6.41}$$

$D(-\infty) < D(z_a)$ means a solution to Eq. 6.23 exists. The numerical values were obtained, and the trial-and-error solution for $z_b$ was carried out, using a slide rule. The solution only took five iterations,

and the accuracy is good to the second decimal place in $z_b$.

$$z_b = -2.65 \tag{6.42}$$

$$K_N = 1 - .000023 - .193522 = .806455 \tag{6.43}$$

$$K_{SN} = 1 - .004001 - .010983 = .985016$$

$$X = [\,.999977 - \Phi(t + .713)\,]/.806455 \tag{6.44}$$

$$Y = [\,.995999 - \Phi(t - .713)\,]/.985016$$

An intuitive feel for the determination of the metastatic ROC curve is obtained from the probabilities of occurrence on the original $z_1$ axis. This is sketched in Fig. 6.3.

Fig. 6.3. Sketch of probabilities on original axis (N: .193522 below the lower cut, .806455 between the cuts, .000023 above the upper cut; SN: .010983, .985016, .004001)

The constant $K_N$ = .806455 means that this amount of N-probability falls between the two cut values used in obtaining the metastatic ROC. Almost all of the remaining N-probability fell below the lower cut, with only a small amount being chopped off above the upper cut. The fact that the mode value is negative means the value of $K_{SN}$ is bigger than $K_N$; more of the SN-probability was retained between the two cut values than was N-probability. Although more of the SN-probability

fell below the lower cut than fell above the upper cut, the proportion is nowhere near as large as for the N case.

The metastatic normal ROC curve of Eq. 6.44 and the true ROC curve cross in the neighborhood of .001 false alarm and .037 detection probability. By the time the curves get into the ROC region usually displayed on normal-normal paper, the ROC of Eq. 6.44 is about .1 in d' above the desired ROC curve. The second procedure is based on establishing a metastatic ROC character that has the same maximum z value, the same mode value, and the same ratio of ROC character at mode and maximum z values. The match should be relatively good in the region of the maximum. If the match were perfect at the maximum it would be perfect at the mode value.

The ROC character of the second example is shown in Fig. 6.5. The fit falls somewhat below the true value at the maximum, z = 4.6. This is relatively unimportant because of the extremely small value of the character here. However, it is reflected in a corresponding discrepancy in the neighborhood of the mode, where more probability is concentrated. The original ROC character was hyperbolic, and the limiting slope on semi-log paper behaves like $e^{-1.5z}$. A parabola cannot match such a straight line for much of a z-range. Therefore, the approximation developed in Procedure II is first below, then above, and then below this straight line section. This accounts for the ROC curve falling above the desired locus.
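The trial-and-error step of the second procedure can also be carried out by bisection. The sketch below is one possible arrangement, not the slide-rule iteration used in the report: it assumes D(z) as in Eq. 6.22, locates the peak of D by a coarse scan rather than from a closed form, and then solves Eq. 6.23 on the rising branch. The function names and the scan limits are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def D(z, z0, m):
    """D(z) of Eq. 6.22."""
    t = (z - z0) / m
    return PHI(t + 0.5 * m) - math.exp(z0) * PHI(t - 0.5 * m)


def solve_lower_cut(za, z0, m, far_left=-40.0, tol=1e-10):
    """Find zb on the rising side of D with D(zb) = D(za) (Eq. 6.23)."""
    target = D(za, z0, m)
    # Locate the peak of D by a coarse scan instead of assuming its position.
    n = 4000
    grid = [far_left + i * (za - far_left) / n for i in range(n + 1)]
    peak = max(grid, key=lambda z: D(z, z0, m))
    lo, hi = far_left, peak
    if not (D(lo, z0, m) < target < D(hi, z0, m)):
        raise ValueError("Eq. 6.23 has no solution below the peak of D")
    while hi - lo > tol:              # bisection on the rising branch
        mid = 0.5 * (lo + hi)
        if D(mid, z0, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_constants(za, zb, z0, m):
    """K_N and K_SN from Eqs. 6.12 and 6.13."""
    ta, tb = (za - z0) / m, (zb - z0) / m
    kn = PHI(ta + 0.5 * m) - PHI(tb + 0.5 * m)
    ksn = PHI(ta - 0.5 * m) - PHI(tb - 0.5 * m)
    return kn, ksn


z0, za, m = -0.2, 4.6, 1.426          # inputs of the third example
zb = solve_lower_cut(za, z0, m)
kn, ksn = k_constants(za, zb, z0, m)
print(f"K_N = {kn:.4f}, K_SN = {ksn:.4f}, ln(K_N/K_SN) = {math.log(kn / ksn):.4f}")
```

A useful consistency check is the last printed quantity: by Eq. 6.4 the logarithm of $K_N/K_{SN}$ should return the mode value $z_0$ that was supplied.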

Fig. 6.4. Comparison of third and fourth examples with true curve

Fig. 6.5. Comparison of characters of third and fourth examples (π(z) on a logarithmic scale versus z; curves shown: True Character; M. N. Approx. Based on Mode, Max. and π/π Ratio; M. N. Approx. Based on Min. and Location of f(z) Modes)

6.4 Third Procedure

The first two procedures utilized the ratio of the ROC character at the mode and at an extreme point. The matching metastatic normal ROC character covered the same relative range of $\pi(z)$ values. For the medium probability range between one and 99 percent, the behavior of the ROC character when its magnitude is one percent of the mode magnitude is of little importance. Another characteristic of any ROC character which appears somewhat parabolic on semi-log paper is the "width" of the hump. Sharply peaked characters correspond to poor detection, broad characters correspond to better detection.

Other items of interest about the ROC character are the points on the character which correspond to the mode of the z distribution under condition N and condition SN. When the slope of the logarithm of the character is +.5, the derivative of the N density function for z will be zero. Correspondingly, the slope -.5 at a somewhat larger z value identifies the mode of the SN density function.

The third procedure shall assume knowledge of one of the extreme values of the z-range, and the position of the two modes of the z density functions. If these two modal values are widely separated, then the logarithm of $\pi(z)$ must be changing slope very slowly, and be very broad. Conversely, if the two points are close together, the ROC curve corresponds to poor detection. Formalizing what is known for Procedure III we write

$$\text{Known: } z_a \text{ or } z_b; \qquad \left.\frac{d \ln \pi(z)}{dz}\right|_{z_c} = +0.5, \qquad \left.\frac{d \ln \pi(z)}{dz}\right|_{z_d} = -0.5; \qquad z_b < z_c < z_d < z_a \tag{6.45}$$

The solution for this procedure is fairly straightforward. The mode of the matching metastatic normal ROC character will lie midway between the two points with slope +1/2 and -1/2. These two points also determine parameter m.

$$z_0 = .5(z_c + z_d), \qquad m = \sqrt{z_d - z_c} \tag{6.46}$$

The third step is the same as the third step in the first two procedures, namely, to solve Eq. 6.23 for the unknown cut value, once $z_0$ and one cut value are known.

Fourth Example: As an example of this third procedure consider the same character for which the first procedure failed, and for which the second procedure was overly generous. From the graph of the character the points with required slope were estimated to be -0.6 for the N mode and +0.7 for the SN mode.

$$\text{Given: } z_b = -2.3, \qquad z_c = -0.6, \qquad z_d = +0.7 \tag{6.47}$$

In rapid order solve for $z_0$ and m.

$$z_0 = +0.05, \qquad e^{z_0} = 1.05127, \qquad m = \sqrt{1.3} = 1.14 \tag{6.48}$$

Check to see if a solution for the D equation exists,

$$D(z_b) = +.06362, \qquad D(+\infty) = -.05127 \tag{6.49}$$

A solution of the D equation is possible, and in a few iterations was determined to be

$$z_a = 2.044 \tag{6.50}$$

The next steps are to determine the K constants and to obtain the detailed equation for the ROC curves.

$$K_N = 1 - .010170 - .068112 = .921718 \tag{6.51}$$

$$K_{SN} = 1 - .119000 - .004269 = .876731$$

$$X = [\,.989830 - \Phi(t + .57)\,]/.921718 \tag{6.52}$$

$$Y = [\,.881000 - \Phi(t - .57)\,]/.876731$$

A sketch very much like Fig. 6.3 has been drawn for this case and is shown in Fig. 6.6.

Fig. 6.6. Sketch of probabilities on original axis (cuts at -2.06 and 1.75; N: .068112 below the lower cut, .921718 between the cuts, .010170 above the upper cut; SN: .004269, .876731, .119000)

This approximate solution leaves less of the SN-probability in the middle than N-probability, and far more of the high z values are chopped off than with the second procedure.
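The third procedure lends itself to a compact numerical treatment. The following sketch assumes Eqs. 6.22, 6.26 and 6.46 as written above and that the known lower cut lies below the peak of D; the function name `third_procedure`, the scan limit and the tolerance are illustrative choices, not the author's.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def D(z, z0, m):
    """D(z) of Eq. 6.22."""
    t = (z - z0) / m
    return PHI(t + 0.5 * m) - math.exp(z0) * PHI(t - 0.5 * m)


def third_procedure(zb, zc, zd, far_right=40.0, tol=1e-10):
    """Return (z0, m, za) from the lower cut zb and the slope points zc, zd."""
    z0 = 0.5 * (zc + zd)              # Eq. 6.46: mode midway between the points
    m = math.sqrt(zd - zc)            # Eq. 6.46: the points are m**2 apart
    target = D(zb, z0, m)
    if target <= 1.0 - math.exp(z0):  # compare with D(+inf) of Eq. 6.26
        raise ValueError("Eq. 6.23 has no solution for za")
    # Scan for the peak of D, then bisect on the falling branch to its right.
    n = 4000
    grid = [zb + i * (far_right - zb) / n for i in range(n + 1)]
    lo, hi = max(grid, key=lambda z: D(z, z0, m)), far_right
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if D(mid, z0, m) > target:
            lo = mid
        else:
            hi = mid
    return z0, m, 0.5 * (lo + hi)


z0, m, za = third_procedure(zb=-2.3, zc=-0.6, zd=0.7)   # inputs of Eq. 6.47
print(f"z0 = {z0:.2f}, m = {m:.3f}, za = {za:.3f}")
```

With the inputs of the fourth example the upper cut found this way should lie close to the value quoted in Eq. 6.50; small differences reflect the rounding of m to 1.14 in the hand calculation.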

The ROC curve and the ROC character were plotted in Figs. 6.4 and 6.5 respectively. In Fig. 6.5 the match is very good all along the ROC character from the minimum point up to the neighborhood of the mode. The corresponding segment of the ROC curve is that dealing with the high probabilities. However, the maximum value of z of the metastatic normal is only a little over two compared to the value of 4.6 for the given ROC. On the ROC curve, as a point proceeds from high probability to low probability this metastatic normal drops away from the desired ROC curve.

6.5 Summary

The purpose of these procedures is to determine the parameters of a metastatic normal ROC curve whose ROC character matches three salient distinguishing items of a given character. The corresponding ROC is obtained with the aid of normal tables and simple calculation.

CHAPTER VII

BINORMAL ROC

7.1 General Development

The binormal ROC curves have appeared in the literature as a by-product of the popular use of normal ROC curves, and use of normal-normal graph paper for plotting empirical ROC data. Normal ROC curves plot as straight lines on normal-normal graph paper, with a slope of one, that is, parallel to the chance diagonal. A binormal ROC curve plots as a straight line on normal-normal paper but with slope different from one. The use of binormal ROC curves adds a second parameter to the class of ROC curves based on normal distribution functions. They have been very useful, and investigators have learned to interpret the slope as indicative of additional variance under one of the causes, N or SN, compared to the other.

However, a strict interpretation of the binormal ROC curve is as a nonregular ROC. The equations for the relation of the logarithm of the likelihood ratio to the points along the ROC curve will be developed in Section 7.2. The ROC curve can be made regular by either one of two (usually slight) modifications. The first of these is treated in Section 7.3, and is called external rectification. External rectification corresponds to adjoining the convex closure to the regular part of the

binormal ROC. The second change, made in Section 7.4, is called internal rectification. Internal rectification corresponds to deriving an ROC based on the likelihood ratio of the decision axis that led to the original nonregular binormal ROC curve. The effects of these modifications on data fitting procedures, and upon the ROC character, will be discussed in Section 7.5.

In an article in Science (Ref. 27), J. A. Swets proposed the ROC curve as a basis for measuring effectiveness of information retrieval systems. Figure 7 of that article was a particular type of binormal ROC family; it is redrawn on normal-normal paper in Fig. 7.1. The letter E is Swets' index for effectiveness. Although the curve with effectiveness E = 0.5 is nearly parallel to the chance diagonal, E = 0.0, as the effectiveness increases the slope is decreasing slightly with each successive step. These curves are binormal ROC curves. Figure 7.2 is Swets' curve comparing empirical data to the binormal ROC curves for two information retrieval techniques. Figure 7.3 is the same data drawn on normal-normal paper, with best-fitting straight lines through the data.¹ The values for s marked in this figure, 0.79 and 0.86, are the slopes of the best-fitting straight line on this paper. The binormal ROC curves fit the data very much better than the best normal fit.

¹ By permission of J. A. Swets. These appeared in an unpublished work.

Fig. 7.1. A fan of binormal ROC curves

The empirical operating characteristics for two retrieval techniques and various theoretical operating characteristics based on the assumption of Gaussian probability distributions with a variance proportional to the mean, specifically, with a ratio of mean to standard deviation of 4.0.

Fig. 7.2. Swets' first curves

The data reproduced on probability scales, with the best-fitting theoretical curves assuming Gaussian probability distributions. The slopes of the curves are the best-fitting slopes.

Fig. 7.3. Swets' curves on normal-normal paper

This example has two features typical of applications of the binormal ROC. The first is that the slope is less than one. The second is that only a portion of a binormal ROC curve has been utilized to fit the data. In Fig. 7.2 the data points lie near or to the left of the negative diagonal.

7.2 Formal Development

One form of the equation for the normal ROC curve is

$$X = \Phi(-t), \qquad Y = \Phi(\sqrt{d} - t) \tag{7.1}$$

The function $\Phi(\ )$ is the normal distribution function. When a normal ROC curve is plotted on normal-normal paper, the linear horizontal coordinate of a point is $-t$ when the linear vertical coordinate is $\sqrt{d} - t$. This is a straight line, with slope one, displaced vertically above the chance diagonal by a distance $\sqrt{d}$.

The letter s is often used to denote the slope on normal-normal paper. E is used so often in probability theory for expected value that the letter q (for quality) shall be used as the index of effectiveness. The binormal ROC curves can be written in a manner analogous to the normal ROC as

$$X = \Phi(-t), \qquad Y = \Phi(s(q - t)), \qquad 0 < s \le 1 \tag{7.2}$$

The geometric interpretation of s is the slope of the ROC curve on normal-normal graph paper. The geometric interpretation of the quality index, q, is the distance between the binormal and the chance diagonal at the fifty percent detection point. This distance may be

measured either horizontally or vertically. Table 7.1 gives the separation from the chance diagonal, the d' value (Ref. 10), at several locations along the binormal ROC curve.

Location      Value of d' = $\Phi^{-1}(Y) - \Phi^{-1}(X)$      dY/dX
Y = .50       $q$                                              $s\,e^{.5q^2}$
Neg. diag.    $2sq/(1+s)$                                      $s$
X = .50       $sq$                                             $s\,e^{-.5s^2q^2}$

Table 7.1. Special values of d'

Let us first look at a particular binormal graphically, to see what is objectionable about the binormal ROC. Figure 7.4 shows a binormal ROC with quality 1 and the rather small slope of 1/2 drawn on normal-normal paper. In this form, there seems to be very little that is obviously wrong with the ROC curve. The same ROC curve is displayed on linear coordinates in Fig. 7.5(a). On this scale it is evident that the curve is an S-shaped curve, and not convex. From the point (0, 0) toward the point (1, 1) the curve is at first convex, as is a regular ROC, then has a nearly linear portion where the slope is a minimum, and then the slope increases again, in fact to infinity, as the ROC goes to the point (1, 1).

This same ROC has been reproduced in Fig. 7.5(b), together with the chance diagonal and one other straight line. The point marked C in Fig. 7.5(b) is the point at which the binormal ROC curve crosses the chance diagonal. From this point on to the right

Fig. 7.4. A binormal ROC curve

Fig. 7.5. Binormal ROC curve, q = 1, s = .5 (panels (a) and (b), linear coordinates, Y versus X)

the binormal ROC curve falls below the chance diagonal. Does this always happen? The binormal ROC curve is a straight line on normal-normal paper, and not parallel to the chance diagonal; it must intersect the chance diagonal. For s < 1 the part of the binormal below chance is near (1, 1); for s > 1 the part of the binormal below chance is near (0, 0).

The point on the ROC curve marked with an arrow and zero indicates the point of minimum slope of the ROC curve. The point of minimum slope of a regular ROC curve always occurs at the upper right-hand corner (1, 1). Whenever the minimum slope occurs before the upper right-hand corner, the ROC curve must be nonregular.

The additional line drawn on this figure is a tangent to the ROC curve that goes through the upper right-hand corner. The point of tangency is indicated with the arrow marked T. The ROC curve formed by piecing together that part of the binormal ROC curve from the origin to the point of tangency (marked T) and the straight line from the point T to the upper right-hand corner, is a regular ROC curve. That part of the binormal ROC that is used, between the origin and the point of tangency, is called "the regular part" of the binormal ROC. The straight line between the point of tangency and the upper right-hand corner is called the "tangent inferior." (The tangent line to any regular ROC curve that has the smallest slope of any tangent is the "tangent inferior"; the tangent line to a regular ROC curve that has maximum slope is the "tangent superior.") In Fig. 7.6 the binormal ROC curve with quality q = 2 and slope s = .5 is drawn on normal-normal paper together with the tangent inferior. The point of tangency, the

Fig. 7.6. Binormal ROC, q = 2, s = .5 (normal-normal paper, X in percent). Annotations on the figure: Regular arc: P(A|N) ≤ .5, P(A|SN) ≤ .84, z > -1.14. Irregular arc: slope decreasing for -1.443 < z < -1.14, .50 < P(A|N) < .75; slope increasing for .75 < P(A|N) < 1.00. On the regular arc, for z > -1.14: P(A|SN) = $\Phi(1.33 - .816\sqrt{z + 1.443})$, P(A|N) = $\Phi(.667 - 1.633\sqrt{z + 1.443})$.

point of minimum slope, and the chance diagonal point are indicated.

In the formal development that follows, the restriction that the slope falls between zero and one is made. An alternative restriction would be that the slope is greater than one. The two cases must be treated separately, but the results are similar since the two cases correspond to reflections about the negative diagonal. The equation for the binormal ROC curve is

$$X = \Phi(-t), \qquad Y = \Phi(s(q - t)), \qquad 0 < s < 1 \tag{7.3}$$

Since X and Y are one minus the distribution functions for the dummy random variable t, the probability density function for t is obtained by differentiation.

$$f(t|N) = \frac{d(1 - X)}{dt} = \phi(-t) \tag{7.4}$$

$$f(t|SN) = \frac{d(1 - Y)}{dt} = s\,\phi[s(q - t)] \tag{7.5}$$

The function $\phi(\ )$ is the normal probability density function. In order to determine the logarithm of the likelihood ratio of t, divide Eq. 7.5 by Eq. 7.4 and take logarithms.

$$z(t) = \ln s - .5s^2(q - t)^2 + .5(-t)^2 \tag{7.6}$$

When Eq. 7.6 is expanded in terms of the variable t,

$$z(t) = \ln s + .5(1 - s^2)t^2 + s^2 q t - .5 s^2 q^2 \tag{7.7}$$
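The step from the densities to Eq. 7.7 is easy to spot-check numerically. The sketch below is an illustration only: it forms the logarithm of the likelihood ratio directly from the densities of Eqs. 7.4 and 7.5 and compares it with the expanded quadratic; the function names and the sample values q = 2, s = .5 are hypothetical choices.

```python
import math
from statistics import NormalDist

phi = NormalDist().pdf  # standard normal probability density function


def z_from_densities(t, q, s):
    """ln[f(t|SN) / f(t|N)] formed directly from Eqs. 7.4 and 7.5."""
    return math.log(s * phi(s * (q - t)) / phi(-t))


def z_expanded(t, q, s):
    """The quadratic in t of Eq. 7.7."""
    return math.log(s) + 0.5 * (1 - s * s) * t * t + s * s * q * t - 0.5 * s * s * q * q


for t in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(t, z_from_densities(t, 2.0, 0.5), z_expanded(t, 2.0, 0.5))
```

The two columns should agree to rounding error for any t, which confirms the signs of the linear and constant terms in the expansion.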

Up to this point no use has been made of the restriction that s falls between zero and one. In fact all of the equations above hold for all values of s greater than zero, including one. Were s equal to one the quadratic term $.5(1 - s^2)t^2$ would drop out of Eq. 7.7, and z would be linear with t. When s is not one, z is quadratic in t. It is basically this quadratic term which causes all the trouble for the binormal ROC; this quadratic will always appear when the curve is binormal and not strictly normal. Use of the assumption that s falls between zero and one is made in order to factor out the term $1 - s^2$, complete the square, and obtain the final expression for z.

$$z(t) = .5(1 - s^2)\left(t + \frac{s^2 q}{1 - s^2}\right)^2 + \ln s - .5\,\frac{s^2 q^2}{1 - s^2} \tag{7.8}$$

The minimum value of the quadratic term is zero. Therefore the minimum value for z is the second term of Eq. 7.8. Call this value $z_0$.

$$z_0 = \ln s - .5\,\frac{s^2 q^2}{1 - s^2} \tag{7.9}$$

The corresponding value for the dummy variable t shall be called $t_0$.

$$t_0 = -\frac{s^2 q}{1 - s^2} \tag{7.10}$$
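The location of the minimum-slope point follows directly from Eqs. 7.9 and 7.10. The sketch below evaluates them and cross-checks the result against a brute-force minimization of Eq. 7.7 on a grid; the function names and the sample parameters q = 2, s = .5 (those of Fig. 7.6) are choices made for this illustration, with the point on the curve obtained by substituting $t_0$ into Eq. 7.3.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def z_of_t(t, q, s):
    """Logarithm of the likelihood ratio as a function of t (Eq. 7.7)."""
    return math.log(s) + 0.5 * (1 - s * s) * t * t + s * s * q * t - 0.5 * s * s * q * q


def min_slope_point(q, s):
    """t0, z0 from Eqs. 7.10 and 7.9, and the matching point (X0, Y0) from Eq. 7.3."""
    t0 = -s * s * q / (1 - s * s)
    z0 = math.log(s) - 0.5 * s * s * q * q / (1 - s * s)
    return t0, z0, PHI(-t0), PHI(s * (q - t0))


q, s = 2.0, 0.5
t0, z0, x0, y0 = min_slope_point(q, s)

# Brute-force cross-check: minimize Eq. 7.7 over a fine grid of t values.
grid = [i / 1000.0 for i in range(-3000, 3001)]
t_grid = min(grid, key=lambda t: z_of_t(t, q, s))

print(f"t0 = {t0:.4f} (grid search {t_grid:.3f}), z0 = {z0:.4f}")
print(f"X0 = {x0:.4f}, Y0 = {y0:.4f}")
```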

In Figures 7.5 and 7.6 this point on the ROC was indicated with the index zero. The coordinates of the point on the ROC are given by Eqs. 7.11 and 7.12.

$$X_0 = \Phi(-t_0) = \Phi\!\left(\frac{s^2 q}{1 - s^2}\right) \tag{7.11}$$

$$Y_0 = \Phi[s(q - t_0)] = \Phi\!\left(\frac{s q}{1 - s^2}\right) \tag{7.12}$$

From this point of the binormal ROC curve to the point (1, 1) the slope will be increasing. The slope of the ROC curve is the likelihood ratio, $e^{z}$. In Eq. 7.8, as the quadratic term increases, either because t is becoming greater than $t_0$ or because t is becoming less than $t_0$, the value of z will increase. Since the slope along the arc from $(X_0, Y_0)$ to (1, 1) is increasing it is greater than $e^{z_0}$. This formal proof of the concave nature of the binormal ROC curve near (1, 1) is summarized in Eq. 7.13.

$$X_0 < X < 1: \qquad \frac{dY}{dX} > e^{z_0} = \left.\frac{dY}{dX}\right|_{X_0} \tag{7.13}$$

It would be very nice to be able to solve for the point of tangency of the tangent inferior as simply as for the point of minimum slope. In order to do this consider the logarithm of the slope of the

tangent inferior.

$$z(t_T) = \ln\frac{1 - Y_T}{1 - X_T} = \ln\frac{\Phi(s(t_T - q))}{\Phi(t_T)} \tag{7.14}$$

The value of $t_T$ is the value of t at the point of tangency. An explicit solution by equating the right-hand sides of Eq. 7.8 and Eq. 7.14 is not possible. The graphic solution used in Fig. 7.5 when the ROC is drawn on linear coordinate paper is straightforward although not extremely accurate. A similar procedure can be used on normal-normal paper by drawing a set of curves on that paper which would be straight lines from the point (1, 1) on linear paper. The point of tangency between the tangent inferior and the binormal can again be judged rapidly but not extremely accurately by this graphical method. The only guaranteed accurate method known to the writer is an iterative solution for $t_T$, by tables and hand computation, or by computer. The values obtained in the present paper were obtained using an IBM 7090 computer.

Before proceeding to various methods of rectifying this irregular ROC curve, the ROC character for the regular part of the arc is obtained. The equations also hold for the irregular part of the arc, but the intended application is to the regular part from the origin to the point of tangency with the tangent inferior. Equation 7.8 is simplified by using Eqs. 7.9 and 7.10.

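$$z = .5(1 - s^2)(t - t_0)^2 + z_0 \tag{7.15}$$

Solve Eq. 7.15 for t

$$t = t_0 + \left[\frac{2(z - z_0)}{1 - s^2}\right]^{.5} \tag{7.16}$$

The equation for the ROC curve, in terms of the logarithm of the likelihood ratio, is therefore

$$X = \Phi\!\left(\frac{s^2 q}{1 - s^2} - \left[\frac{2(z - z_0)}{1 - s^2}\right]^{.5}\right) \tag{7.17}$$

$$Y = \Phi\!\left(\frac{s q}{1 - s^2} - s\left[\frac{2(z - z_0)}{1 - s^2}\right]^{.5}\right)$$

The Jacobian of transformation from t to z is the derivative of t with respect to z.

$$\frac{dt}{dz} = \left[2(1 - s^2)(z - z_0)\right]^{-.5} \tag{7.18}$$

The density function for z under condition N is obtained by the substitution of Eq. 7.16 into Eq. 7.4, and then multiplying by Eq. 7.18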

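$$f(z|N) = \frac{1}{\sqrt{2\pi}}\left[2(1 - s^2)(z - z_0)\right]^{-.5} \exp\left\{-.5\left[t_0^2 + 2t_0\left(\frac{2(z - z_0)}{1 - s^2}\right)^{.5} + \frac{2(z - z_0)}{1 - s^2}\right]\right\} \tag{7.19}$$

$$\pi(z) = e^{.5(z - z_0)}\, e^{.5z_0}\, f(z|N) \tag{7.20}$$

Equation 7.9 yields

$$e^{.5z_0} = s^{.5}\, e^{.25 q t_0} \tag{7.21}$$

This leads to the following form of the ROC character

$$\pi(z) = s^{.5}\, e^{.25 q t_0 - .5 t_0^2}\left[4\pi(1 - s^2)(z - z_0)\right]^{-.5} \exp\left\{-t_0\left[\frac{2(z - z_0)}{1 - s^2}\right]^{.5} - .5\left(\frac{1 + s^2}{1 - s^2}\right)(z - z_0)\right\} \tag{7.22}$$

The final step is to eliminate the constant $t_0$ by using its definition,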

$$\pi(z) = s^{.5}\left[4\pi(1 - s^2)(z - z_0)\right]^{-.5} \exp\left\{-\frac{s^2(1 + s^2)q^2}{4(1 - s^2)^2} + \frac{\sqrt{2}\, s^2 q}{(1 - s^2)^{1.5}}(z - z_0)^{.5} - .5\left(\frac{1 + s^2}{1 - s^2}\right)(z - z_0)\right\} \tag{7.23}$$

Equation 7.23 is so complicated with clusters of coefficients that it is difficult to envision its behavior. It is rewritten to focus on those places where the variable $z - z_0$ appears.

$$\pi(z) = A\,(z - z_0)^{-.5}\, \exp\left\{B(z - z_0)^{.5} - C(z - z_0)\right\} \tag{7.24}$$

If the quality is positive, as we may assume it to be in most interesting cases, the coefficients A, B and C will be positive. For a very large magnitude z the final exponential term will dominate, and the ROC curve will behave similarly to one with an exponential character, a pure power ROC. If the constant B were zero (it is not), then the character would be that of a Pearson III with p = -.5. Such a Pearson character is monotone decreasing in z. The term $B(z - z_0)^{.5}$ tends to make the character increase as z increases. This means that the character may exhibit a mode, or may be monotone decreasing in z, depending on the specific values of the parameters.

In this section the ROC character for the regular part of the binormal ROC curve, Eq. 7.23, was obtained. The equation

for the ROC curve itself in terms of the slope and quality parameters, and the logarithm of the likelihood ratio z, was also determined. This is Eq. 7.17.

7.3 External Rectification

The process of external rectification consists of replacing the irregular part of the ROC curve with the tangent inferior. The corresponding action on the ROC character is to limit the applicability of Eq. 7.23 to those z which fall above $z_T$ (the logarithm of the slope of the tangent inferior) and to add a jump in the integrated ROC character at $z_T$ of appropriate magnitude.

No analytic solution for the tangent inferior has been found. Therefore a computer program was written that did the following search. The binormal ROC curve was traced starting at the origin. At each point along the ROC curve, the value of z was compared with the logarithm of the slope of the line from the point on the ROC to the point (1, 1). Near the origin the line connecting the ROC point with the point (1, 1) is a secant, interior to the ROC. As soon as the two slopes agree, the secant has become a tangent, and the program halts. This is a straightforward, albeit brute force, method for determining the tangent inferior and $z_T$. This program is listed in Appendix F.

Figure 7.7 presents five ROC characters which were plotted as part of one run of this program. For this run, the slope parameter s was fixed at 0.8. The qualities examined were q = 1, 2, 3, 4, 5.
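The search described above can be sketched in a few lines. This is not the Appendix F program; it is a minimal illustration that replaces the point-by-point trace with a bisection on the same condition (z equal to the log slope of the line to (1, 1)). It assumes the decision axis is N(0, 1) under N and N(q, 1/s^2) under SN, which is consistent with Eq. 7.15 but is not quoted from Eqs. 7.4 and 7.5. The function names `phi`, `log_lr`, `gap` and `tangent_inferior` are hypothetical.

```python
import math

def phi(u):
    """Standard normal CDF; erfc keeps precision in the far lower tail."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def log_lr(t, q, s):
    """z(t) for N: t ~ N(0, 1) against SN: t ~ N(q, 1/s**2) (assumed densities)."""
    return math.log(s) - 0.5 * s * s * (t - q) ** 2 + 0.5 * t * t

def gap(t, q, s):
    """z(t) minus the log slope of the line from the ROC point to (1, 1)."""
    chord = phi(s * (t - q)) / phi(t)          # (1 - Y) / (1 - X)
    return log_lr(t, q, s) - math.log(chord)

def tangent_inferior(q, s, tol=1e-12):
    """Return (t_T, z_T) for quality q > 0 and slope 0 < s < 1."""
    t0 = -s * s * q / (1.0 - s * s)            # cut with the smallest likelihood ratio
    lo, hi = t0, t0 + 1.0                      # at t0 the chord is steeper than the curve
    while gap(hi, q, s) <= 0.0:                # move toward the origin of the ROC
        hi += 1.0
    while hi - lo > tol:                       # bisection on the sign change of gap
        mid = 0.5 * (lo + hi)
        if gap(mid, q, s) > 0.0:
            hi = mid
        else:
            lo = mid
    t_T = 0.5 * (lo + hi)
    return t_T, log_lr(t_T, q, s)

for q in (1, 2, 3, 4, 5):
    t_T, z_T = tangent_inferior(float(q), 0.8)
    print(q, round(t_T, 4), round(z_T, 4))
```

Bisection is used here only because it is short and robust; a trace from the origin that stops at the first agreement of the two slopes should reach the same point.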

Fig. 7.7. Five binormal characters, external rectification

In Fig. 7.7 the points on the ROC character with positive z marked with a large open dot indicate the point at which the ROC curve passed through .01 false alarm probability. The points marked with the solid dot indicate the points at which the ROC curve passed through the .99 detection probability. When the quality parameter is one, the ROC character is strictly monotone decreasing. In the neighborhood of $z_0$ the ROC character is very steep. The other four ROC characters present a more normal appearance. Each is terminated at $z_T$ with an $\omega$ term. For q = 2 and greater, $z_T$ was sufficiently to the right of $z_0$ to prevent the appearance of a sharp rise in the ROC character due to the term $(z - z_0)^{-.5}$.

The corresponding ROC curves are plotted on normal-normal paper in Fig. 7.8. The effect of the external rectification is evident only in the q = 1 curve. As was seen on the ROC character, all of the other curves contain the one percent to 99 percent region in their regular portion. Although it is important in theory to make sure that the ROC curves are regular, it may very well turn out in practice that this careful consideration has been unnecessary.

In order to exhibit binormal ROC curves which evidenced need for rectification on the usual plot on normal-normal paper, a number of cases with slope equal to one-half were considered. These are presented in Fig. 7.9. The points where these would cross the chance diagonal are marked with C and the points where these would take on their minimum slope

Fig. 7.8. Binormal ROC with slope s = .8, for q = 1 through 5; axes X (%) and Y (%)

Fig. 7.9. Externally rectified binormal ROC curves

are marked with 0. The point at which the tangent inferior is adjoined to the curve to make it regular is marked with T. The curves shown in Fig. 7.9 are regular ROC curves, derived from binormal ROC curves by external rectification. The corresponding integrated character has a jump at the value $z_T$, and is differentiable from $z_T$ to infinity.

7.4 Internal Rectification

The binormal ROC curves are irregular because they are based on a decision axis, t, which is not monotone with its own likelihood ratio. External rectification removed part of the axis from consideration in order to maintain a monotone relationship between the remaining range of t and the likelihood ratio. A decision mechanism that allows access to the t axis can undergo a rectification superior to the simple external rectification. Without access to the decision axis there is no other choice but to stay with external rectification.

Consider a decision device with a decision axis, t, which led to a specific binormal ROC curve with quality q and slope s between zero and one. The probability density functions for this decision axis are both normal, and are given in Eq. 7.4 and 7.5. Since the logarithm of the likelihood ratio is quadratic in t, the response "A" corresponding to z being greater than some level will be equivalent to a two-tailed test on the t axis.
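The two-tailed test mentioned above can be made concrete with a short sketch. It assumes, as an illustration only, that t is N(0, 1) under N and N(q, 1/s^2) under SN, so that z = .5(1 - s^2)(t - t0)^2 + z0 as in Eq. 7.15. The function name `internal_roc_point` is hypothetical. The rule "respond A when z > c" then becomes "respond A when t lies outside an interval centered on t0".

```python
import math

def phi(u):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def internal_roc_point(c, q, s):
    """(X, Y) for the rule 'respond A when z > c', with c >= z0 and 0 < s < 1."""
    t0 = -s * s * q / (1.0 - s * s)
    z0 = math.log(s) - s * s * q * q / (2.0 * (1.0 - s * s))
    u = math.sqrt(2.0 * (c - z0) / (1.0 - s * s))       # half-width of the 'B' interval
    x = phi(-(t0 + u)) + phi(t0 - u)                    # upper plus lower tail under N
    y = phi(s * (q - t0 - u)) + phi(s * (t0 - u - q))   # upper plus lower tail under SN
    return x, y

q, s = 1.0, 0.5
z0 = math.log(s) - s * s * q * q / (2.0 * (1.0 - s * s))
for c in (z0, -0.5, 0.0, 0.5, 1.0):
    print(round(c, 3), internal_roc_point(c, q, s))
```

A useful check on such a sketch is that the slope of the resulting curve at level c equals e^c, since z is then its own log likelihood ratio.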

The probability density function for z under both of the causes, N and SN, is based on the sum of the probability densities for each of two appropriate t values (the solutions of Eq. 7.15). The ROC character may be determined as the probability density for z under one of the causes, multiplied by $e^{.5z}$ (under N) or $e^{-.5z}$ (under SN). Another approach is to find the root likelihood product, and to multiply this by the Jacobian of transformation, dt/dz. The root likelihood product, using Eq. 7.3 and 7.4, and taking the square root, is

$$ \left[f(t|N)\, f(t|SN)\right]^{.5} = \left[\frac{s}{2\pi}\right]^{.5}\exp\left\{-\tfrac{1}{4}\left[t^2 + s^2(q - t)^2\right]\right\} \tag{7.25} $$

In the previous section the relation between z and t was derived

$$ t = t_0 \pm \left[\frac{2}{1 - s^2}(z - z_0)\right]^{.5} \tag{7.26} $$

This differs from the previous equation, Eq. 7.16, in the range of application. Previously t had been limited to those values of t greater than $t_0$, in fact, those values of t greater than $t_T$. However, direct use of the likelihood ratio allows use of the entire t axis. A transformation of variables affects the probability density functions by the absolute value of the Jacobian of transformation and not the signed value. Therefore, Eq. 7.18 holds independently of whether the plus or the minus sign in Eq. 7.26 is used.

$$ \left|\frac{dt}{dz}\right| = \left[2(1 - s^2)\right]^{-.5}(z - z_0)^{-.5} \tag{7.18} $$

The only remaining step is to substitute t into the expression in the exponent of Eq. 7.25 and do some careful algebra. Expand the exponent of Eq. 7.25 and use Eq. 7.26 to obtain

$$ t^2 + s^2(q - t)^2 = \frac{2(1 + s^2)}{1 - s^2}(z - z_0) \mp \frac{4 s^2 q}{1 - s^2}\left[\frac{2(z - z_0)}{1 - s^2}\right]^{.5} + \frac{s^2 q^2 (1 + s^2)}{(1 - s^2)^2} \tag{7.27} $$

The ROC character is obtained by adding together the two values for the root likelihood product that correspond to the same value of z, and multiplying the result by Eq. 7.18. Two of the terms in Eq. 7.27 are common, that is, independent of whether the plus sign or the minus sign in Eq. 7.26 is used. The Jacobian of transformation is also independent of which sign has been used. Therefore, the only effect of the summation is on the middle term in Eq. 7.27. Since this middle term appears in an exponent, the adding of the two terms will yield a hyperbolic cosine term. The result of the transformation is shown below.

$$ \pi(z) = \left[\frac{s}{\pi(1 - s^2)}\right]^{.5}(z - z_0)^{-.5}\exp\left\{-\frac{s^2 q^2 (1 + s^2)}{4(1 - s^2)^2}\right\}\exp\left\{-\frac{(1 + s^2)(z - z_0)}{2(1 - s^2)}\right\}\cosh\left\{\frac{s^2 q\left[2(z - z_0)\right]^{.5}}{(1 - s^2)^{1.5}}\right\}, \qquad z \ge z_0 = \ln s - \frac{s^2 q^2}{2(1 - s^2)} \tag{7.28} $$

The main change between Eq. 7.28 and Eq. 7.23 appears to be that the single exponential term involving a square root of z has become a hyperbolic cosine term involving the square root of z. Another basic difference which is not so evident is that Eq. 7.28 is valid for z right down to $z_0$, whereas Eq. 7.23 is valid only for z as low as $z_T$, which is greater than $z_0$. This means that the minimum slope of the internally rectified binormal ROC curve will be the same as the minimum slope of the original irregular binormal ROC curve. The integrated ROC character for the internally rectified case contains no jump, that is, there is no $\omega$ term at the lower end of the ROC character. However, there is an action in the ROC character which behaves very similarly to this "delta-function." Since z is no longer limited to values above $z_T$, the ROC character near $z_0$ will be dominated by the term $(z - z_0)^{-.5}$. As z gets

smaller and smaller the character will suddenly become strongly influenced by this term and will approach infinity.

$$ \pi(z) \to \left[\frac{s}{\pi(1 - s^2)}\right]^{.5}\exp\left\{-\frac{s^2 q^2 (1 + s^2)}{4(1 - s^2)^2}\right\}(z - z_0)^{-.5} \to \infty \quad \text{as } z \to z_0 \tag{7.29} $$

An example of this behavior in the neighborhood of $z_0$ is shown in Fig. 7.10, a plot of the logarithm of the ROC character for the case q = 1, s = .8. The ROC character for the externally rectified case was shown in Fig. 7.7. The characters for both the internal and external rectification appear very similar. The only difference is that the ROC character for the internally rectified case is continuous and has no $\omega$ term adjoined to it. However, the appearance is very much the same for both cases.

A detailed comparison of the behavior near $z_0$ for internally and externally rectified ROC characters is shown in Fig. 7.11. The case considered is q = 2, s = .8. The value of $z_T$ is about .01 greater than the value $z_0$. For the externally rectified case the ROC character is monotone decreasing as it approaches $z_T$ from higher values, and then $\omega(z_T)$ is abruptly adjoined. For the internally rectified case the ROC character decreases, passes through a minimum, and rises abruptly to infinity as it approaches $z_0$ from above.
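The two characters can be compared numerically with a short sketch. It uses the coefficient form of Eq. 7.24, with A, B, C and z0 written in terms of q and s under the assumption that t is N(0, 1) under N and N(q, 1/s^2) under SN; the function names are hypothetical. As a sanity check, the internally rectified character multiplied by e^{.5z} is the density of z under SN, so it should integrate to one over z > z0.

```python
import math

def coefficients(q, s):
    """Return (A, B, C, z0) of Eq. 7.24 for quality q and slope 0 < s < 1."""
    r = 1.0 - s * s
    z0 = math.log(s) - s * s * q * q / (2.0 * r)
    a = math.sqrt(s / (4.0 * math.pi * r)) * math.exp(
        -s * s * q * q * (1.0 + s * s) / (4.0 * r * r))
    b = s * s * q * math.sqrt(2.0) / r ** 1.5
    c = (1.0 + s * s) / (2.0 * r)
    return a, b, c, z0

def pi_external(z, q, s):
    """Single-branch character; in the external case it applies only above z_T."""
    a, b, c, z0 = coefficients(q, s)
    u = math.sqrt(z - z0)
    return a / u * math.exp(b * u - c * (z - z0))

def pi_internal(z, q, s):
    """Two-branch character with the hyperbolic cosine, for every z > z0."""
    a, b, c, z0 = coefficients(q, s)
    u = math.sqrt(z - z0)
    return 2.0 * a / u * math.cosh(b * u) * math.exp(-c * (z - z0))

def mass_under_sn(q, s, u_max=15.0, steps=20000):
    """Midpoint-rule integral of exp(.5 z) * pi_internal(z), using z = z0 + u**2."""
    a, b, c, z0 = coefficients(q, s)
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        z = z0 + u * u
        # dz = 2 u du cancels the 1/u singularity of the character at z0
        total += math.exp(0.5 * z) * 4.0 * a * math.cosh(b * u) * math.exp(-c * u * u)
    return total * h

q, s = 1.0, 0.8
z0 = coefficients(q, s)[3]
for dz in (1e-4, 1e-2, 1.0):
    print(dz, pi_external(z0 + dz, q, s), pi_internal(z0 + dz, q, s))
print(mass_under_sn(q, s))
```

The substitution z = z0 + u^2 is a design choice made only to keep the integrand finite at z0; any quadrature that handles the inverse square root singularity would serve.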

Fig. 7.10. ROC character for internally rectified binormal, q = 1, s = .8

Fig. 7.11. Detail of ROC character, q = 2, s = .8

Fig. 7.12. Binormal ROC curves, internally rectified

The computer program that accomplishes internal rectification is also in Appendix F. The case s = .5 that was analyzed for external rectification was also analyzed for internal rectification. Figure 7.12 shows the results of internal rectification when the original quality was q = 1, 2, 3, and 4. These may be compared with the externally rectified solution given in Fig. 7.9. Curves for internal rectification are smoother, and somewhat better than for the externally rectified case. The appearance of the q = 4 curve is virtually unchanged, however, the additional increases being in the third or fourth decimal place.

7.5 Discussion

This chapter considered binormal ROC curves, which plot as straight lines at slope less than one on normal-normal paper. These are irregular ROC curves, but we can associate with any binormal curve a regular ROC curve by either external or internal rectification. Such rectification changes the shape of the curve, and changes the ROC character. A very important question is the actual magnitude of the changes in the ROC curve, especially if one is interested in a limited region of the ROC curve.

In the first section, ROC curves were taken from some work of J. A. Swets. For the family of curves used, the slope is related to the quality by the equation

$$ s = \frac{1}{1 + .25q} \tag{7.30} $$

It can be shown that all of the ROC curves in this family pass through a common point on the chance diagonal. This is the point (.999968, .999968). The data (Figs. 7.2, 7.3) fell in the medium percentage levels, in fact, to the left of the negative diagonal, and were quite distant from probabilities such as .999968. The program developed for this work internally rectified six curves from this family. The result is shown on extended normal-normal paper in Fig. 7.13. The resulting rectified ROC curves do not cross the chance diagonal, and do not cross each other. These curves and their corrections on the usual range of normal-normal paper are shown in Fig. 7.14. In the region above .97 detection the rectified ROC curve is slightly better than the original binormal curve.

One can certainly conclude that Swets is in no trouble from having chosen to operate with a nonregular ROC curve. The rectification is formally called for if a rigorous treatment is desired for the ROC curves. As a descriptive model of empirical data, a way of fitting and describing data, the curves are perfectly valid. In fact, one may undoubtedly manipulate the model mathematically so long as the operations stay concentrated on the region far from $z_0$.

Binormal ROC curves have arisen in detection problems with slopes as drastic as s = .5. When such curves are to be used in a region anywhere close to the chance diagonal, one or the other type of rectification must be considered to render the ROC curves regular. The final figure of this section, Fig. 7.15, shows two

Fig. 7.13. Example of internal rectification of binormal ROC curves (legend: a fan of binormal ROC; the corresponding internally rectified ROC)

Fig. 7.14. A fan of binormal ROC curves; axes X (%) and Y (%). Annotations: all go through the point (.999968, .999968), $\Phi(4) = .999968$; s = 4/(4 + q); the sections evidencing internal rectification.

Fig. 7.15. Binormal ROC, s = .5, showing two types of rectification (external rectification labeled on the plot)

binormal ROC curves with slope equal to .5. Both the internal and external rectification have been shown on the same figure. If one is dealing with slopes as drastic as this, he may very well wish to incorporate rectification into his model.
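The common point of the family of Eq. 7.30 can be checked numerically. The sketch below assumes the binormal curve is the straight line u_Y = s(q + u_X) in normal-deviate coordinates, with X = Φ(u_X) and Y = Φ(u_Y); this form is inferred from the quality and slope parameters rather than quoted from Eq. 7.8. The function name `chance_crossing` and the six q values are illustrative choices, not the six curves used in the report.

```python
import math

def phi(u):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def chance_crossing(q):
    """Return (s, u) where u solves u = s (q + u) for the slope of Eq. 7.30."""
    s = 1.0 / (1.0 + 0.25 * q)          # Eq. 7.30, the same as 4 / (4 + q)
    u = s * q / (1.0 - s)               # crossing of the chance diagonal
    return s, u

for q in (0.5, 1.0, 2.0, 3.0, 4.0, 6.0):   # illustrative qualities only
    s, u = chance_crossing(q)
    print(q, round(s, 4), round(u, 6), round(phi(u), 6))
```

Under this assumption the crossing deviate is 4 for every q, which reproduces the probability .999968 quoted in the text.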

CHAPTER VIII

ROC FOR MULTIPLE OBSERVATIONS

8.1 Convolution Theorem

The distribution for the sum of a number of independent random variables can be determined by convolution of the distributions for the individual random variables. In this section, the convolution theorem for the ROC characters is developed for characters without jump functions.

The convolution concept is important in detection situations in which the observation upon which a single decision is based consists of several independent parts. The logarithm of the likelihood ratio for such an observation is the sum of the logarithms of the likelihood ratios of the independent parts. Let $z_i$ denote the logarithm of the likelihood ratio for the i-th part of the observation, and $\pi_i(z_i)$ denote its ROC character. Let z be the logarithm of the likelihood ratio for the complete observation.

$$ z = \sum_{i=1}^{n} z_i \tag{8.1} $$

Theorem: If $\pi_i(z_i)$ are the ROC characters for statistically independent observations, then

$$ \pi(z) = \pi_1(z_1) * \pi_2(z_2) * \cdots * \pi_n(z_n) \tag{8.2} $$

where the star indicates convolution.

Proof: The proof is inductive, and depends on the convolution theorem for probability density functions.

Initial Step: n = 2

$$ z = z_1 + z_2 $$

$$ f(z|SN) = \int f_1(z_1|SN)\, f_2(z_2 = z - z_1|SN)\, dz_1 = \int e^{.5 z_1}\pi_1(z_1)\, e^{.5(z - z_1)}\pi_2(z - z_1)\, dz_1 = e^{.5z}\int \pi_1(z_1)\,\pi_2(z - z_1)\, dz_1 $$

$$ \pi(z) = e^{-.5z} f(z|SN) = \int \pi_1(z_1)\,\pi_2(z - z_1)\, dz_1 $$

Inductive Step: Assume that the theorem is true for $n = r \ge 2$. Let

$$ z' = \sum_{i=1}^{r} z_i, \qquad z = z' + z_{r+1} = \sum_{i=1}^{r+1} z_i $$

Proof (Cont.) By the initial step, and the inductive hypothesis,

$$ \pi(z) = \left[\pi_1(z_1) * \pi_2(z_2) * \cdots * \pi_r(z_r)\right] * \pi_{r+1}(z_{r+1}) $$

Convolution is associative, so we may omit the brackets. Q.E.D.

The convolutions of the classical distributions are well known. Brief comments are made here about the normal, the rectangular, and the exponential, in order to call attention to certain specific ideas.

The convolution of a finite number of normal density functions is again normal, with the mean being the sum of the means, and the variance the sum of the variances. The normal character is proportional to a zero mean normal density, with variance d. Therefore, the convolution of a number of normal characters will yield a normal character, symmetric about zero, and with detection index d being the sum of the detection indices for the individual $z_i$.

$$ d = \sum_{i=1}^{n} d_i \tag{8.3} $$

The rectangular ROC character is

$$ \pi_i(z_i) = \frac{(1 + R)^{.5}}{2R}, \qquad |z_i| < \ln(1 + R) \tag{8.4} $$
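The addition of detection indices in Eq. 8.3 can be checked by direct numerical convolution. The sketch below assumes a specific normalization for the normal character, namely the one implied by f(z|SN) = e^{.5z} π(z) with z distributed N(d/2, d) under SN; the text states only that the character is proportional to a zero mean normal density with variance d. The names `normal_character` and `convolve_at` are hypothetical.

```python
import math

def normal_character(z, d):
    """Normal ROC character with detection index d (assumed normalization)."""
    return math.exp(-d / 8.0 - z * z / (2.0 * d)) / math.sqrt(2.0 * math.pi * d)

def convolve_at(z, d1, d2, lo=-40.0, hi=40.0, steps=8000):
    """Midpoint-rule value of (pi_1 * pi_2)(z) for two normal characters."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z1 = lo + (i + 0.5) * h
        total += normal_character(z1, d1) * normal_character(z - z1, d2)
    return total * h

d1, d2 = 1.0, 2.0
for z in (-2.0, 0.0, 1.5):
    print(z, convolve_at(z, d1, d2), normal_character(z, d1 + d2))
```

The two printed columns should agree, which is Eq. 8.3 for n = 2.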

The convolution of two identical such rectangular characters is given in Eq. 8.5,

$$ \pi(z = z_1 + z_2) = \frac{(1 + R)\left[2\ln(1 + R) - |z|\right]}{4R^2}, \qquad |z| \le 2\ln(1 + R) \tag{8.5} $$

This is a triangular ROC character. The convolution of n identical rectangular characters will be expressed in n-th degree polynomials in z. The expression will consist of n + 1 sections along the z axis, and become very cumbersome. The result has a central tendency to the normal character.

It would be useful to have a measure of the normal parameter, d, which could be used when the number of individual rectangular characters is large. The usual procedure is to equate variances. In order to do this, we would have to re-normalize the ROC character to have unit area. An equivalent calculation for symmetric characters is to define a quantity such as

$$ \hat{d} = \frac{\int z^2\, \pi(z)\, dz}{\int \pi(z)\, dz} \tag{8.6} $$

For the rectangular character of Eq. 8.4

$$ \hat{d}_1 = \tfrac{1}{3}\left(\ln(1 + R)\right)^2 \tag{8.7} $$

Eq. 8.6 evaluated for the character resulting from n convolutions of similar ROC characters will be n times as large as for the single

ROC character.

$$ \hat{d}_n = \frac{n}{3}\left(\ln(1 + R)\right)^2 \tag{8.8} $$

The gamma, Pearson III, and chi-square distributions all correspond to the same ROC character. It is well known that the sum of chi-square random variables has a chi-square distribution, and similarly for the other distributions. These, in turn, are strongly related to the simple exponential. Consider the simple exponential ROC character given in Eq. 8.9.

$$ \pi_i(z_i) = \frac{A^{.5}}{A - 1}\exp\left\{-\frac{A + 1}{2(A - 1)}(z_i - z_0)\right\}, \qquad z_i \ge z_0 = -\ln A \tag{8.9} $$

If n such characters, all with the same index A, are convolved, the result is the Pearson III character with the same index.

$$ \pi(z) = \left[\frac{A^{.5}}{A - 1}\right]^n \frac{(z - n z_0)^{n-1}}{(n - 1)!}\exp\left\{-\frac{A + 1}{2(A - 1)}(z - n z_0)\right\}, \qquad z \ge n z_0 \tag{8.10} $$

This result does demand that the indices A be identical for all of the individual characters.

8.2 Discrete Characters

The proof used in 8.1 for the convolution of ROC characters was given in terms of probability density functions. A similar

result holds for discrete ROC characters. It will be assumed without proof.

The simplest discrete character is the Luce character with only two points. The ROC curve, such as shown in Fig. 8.1(a), consists of two straight lines meeting at the point $(x_1, y_1)$, also marked as the vertex $V_{1,0}$. In the general binomial ROC curve, let the vertices be marked $V_{n,k}$, indicating that n observations were made, and k of these or fewer were indicative that the condition was N. As with any polygon, it is easiest to write out the total polygon by naming the vertices. The Luce ROC curve is specified by

$$ (0, 0), \quad V_{1,0} = (x_1, y_1), \quad V_{1,1} = (1, 1) \tag{8.11} $$

If two independent observations are made, and each has the same Luce ROC character, then three likelihood ratios are possible. Because of the statistical independence and similarity of the observations, order is unimportant. The ROC curve resulting from Fig. 8.1(a) is shown in Fig. 8.1(b). This is a Green double threshold ROC. What has previously been referred to as the Green ROC is the limiting form of the Green double threshold ROC, where the vertices $V_{2,0}$ and $V_{2,1}$ lie arbitrarily close to the edges of the graph. The vertex $V_{2,0}$ corresponds to the response "A" made whenever neither observation was indicative of condition N. The vertex $V_{2,1}$ corresponds to the response "A" whenever one or fewer observations were indicative of condition N. The description of this ROC by its vertices, together with the relation of these vertices to the Luce vertex, is given in Eq. 8.12.

Fig. 8.1. (a) A Luce ROC, (b) a Green double threshold ROC

$$ (0, 0), \quad V_{2,0} = (x_1^2,\ y_1^2), \quad V_{2,1} = (2x_1 - x_1^2,\ 2y_1 - y_1^2), \quad V_{2,2} = (1, 1) \tag{8.12} $$

In the particular case graphed in Fig. 8.1, the Luce vertex $V_{1,0}$ falls on the Green double threshold ROC curve. It falls exactly in the middle of the line segment between the two middle vertices. This will always happen. The notation that half of one vertex plus half of another vertex means the point lying halfway between the two vertices indicates both the geometric relation, and the algebra that must be performed on the coordinates to obtain this geometric relation. Equations 8.12 for the Green double threshold ROC and 8.11 for the Luce ROC can be combined to obtain:

$$ \tfrac{1}{2} V_{2,0} + \tfrac{1}{2} V_{2,1} = (x_1, y_1) = V_{1,0} \tag{8.13} $$

If three independent observations are taken, and each has the same Luce ROC, the resultant ROC will have three nonchance vertices. These are listed below

$$ (0, 0), \quad V_{3,0}, \quad V_{3,1}, \quad V_{3,2}, \quad V_{3,3} = (1, 1) \tag{8.14} $$

and the equations for the nonchance vertices in terms of the Luce vertex are

$$ V_{3,0} = (x_1^3,\ y_1^3), \qquad V_{3,1} = (3x_1^2 - 2x_1^3,\ 3y_1^2 - 2y_1^3), \qquad V_{3,2} = (3x_1 - 3x_1^2 + x_1^3,\ 3y_1 - 3y_1^2 + y_1^3) \tag{8.15} $$

Examination of Eq. 8.15 and the equations for the Green double threshold ROC vertices, Eq. 8.12, reveals that the Green double threshold vertices lie along the segments of the three-observation ROC.

$$ \tfrac{2}{3} V_{3,0} + \tfrac{1}{3} V_{3,1} = (x_1^2,\ y_1^2) = V_{2,0} \tag{8.16} $$

$$ \tfrac{1}{3} V_{3,1} + \tfrac{2}{3} V_{3,2} = (2x_1 - x_1^2,\ 2y_1 - y_1^2) = V_{2,1} \tag{8.17} $$

It appears that each successive ROC curve, due to one additional observation, is better than the previous ROC, but not strictly better. Each vertex of the prior ROC touches a segment on the new ROC. This is indeed the case. The theorem will be stated in terms of the vertex notation; however, the proof itself will deal with the coordinates of the vertex. The meaning of the vertex notation is written out explicitly

$$ V_{n,k} = (X_{n,k},\ Y_{n,k}): \quad n \text{ observations}, \ \le k \text{ "B"} \tag{8.18} $$

Theorem:

$$ \left(\frac{k + 1}{n}\right) V_{n,k+1} + \left(1 - \frac{k + 1}{n}\right) V_{n,k} = V_{n-1,k}, \qquad 0 \le k < n $$
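The vertex relation stated in the theorem can be spot-checked numerically before reading the proof. The sketch below computes each vertex coordinate as the partial binomial sum over "k or fewer indications of N", where a single observation yields the response "A" with probability x1 under N and y1 under SN. The function name `vertex` and the sample Luce point are illustrative assumptions.

```python
import math

def vertex(n, k, x1, y1):
    """V_{n,k}: respond 'A' when k or fewer of n observations indicate N."""
    def coord(p):
        # p is the single-observation probability of 'A'; 1 - p indicates N
        return sum(math.comb(n, j) * (1.0 - p) ** j * p ** (n - j)
                   for j in range(k + 1))
    return coord(x1), coord(y1)

x1, y1 = 0.2, 0.7                      # an arbitrary Luce point for illustration
for n in range(2, 6):
    for k in range(n):
        w = (k + 1) / n
        upper = vertex(n, k + 1, x1, y1)
        lower = vertex(n, k, x1, y1)
        mixed = tuple(w * u + (1.0 - w) * l for u, l in zip(upper, lower))
        print(n, k, mixed, vertex(n - 1, k, x1, y1))
```

Each printed pair should coincide, showing every vertex of the (n - 1)-observation polygon lying on a segment of the n-observation polygon.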

Proof: Assume that n observations have been taken, and that each has a Luce ROC character associated with the Luce point $(x_1, y_1)$. The k-th ROC point corresponds to responding "A" whenever there have been k or fewer observations indicative of condition N. The probability of a false alarm is therefore given by a partial sum over the binomial distribution

$$ X_{n,k} = \sum_{j=0}^{k} C_j^n\, (1 - x_1)^j\, x_1^{n-j} \tag{8.19} $$

The probability of a detection is given by a similar sum

$$ Y_{n,k} = \sum_{j=0}^{k} C_j^n\, (1 - y_1)^j\, y_1^{n-j} \tag{8.20} $$

The only difference between Eqs. 8.19 and 8.20 is the use of $x_1$ for the condition N and $y_1$ for the condition SN. The following manipulations will deal with the N condition. Similar results for the SN condition may be obtained by interchanging $x_1$ and $y_1$. Most of the specific manipulations have been placed in Appendix G and only the key results are displayed here. Equation 8.19 can be expanded and rewritten as a polynomial in $x_1$

Proof (Cont.)

$$ X_{n,k} = (n - k)\, C_k^n \sum_{m=0}^{k} (-1)^{k-m}\, \frac{1}{n - m}\, C_m^k\, x_1^{n-m} \tag{8.21} $$

If the "A" decision is relaxed slightly to admit k + 1 indications of condition N, the additional probability of false alarm is the probability of exactly k + 1 indications of condition N.

$$ X_{n,k+1} - X_{n,k} = C_{k+1}^n\, (1 - x_1)^{k+1}\, x_1^{n-k-1} \tag{8.22} $$

This can also be written in terms of a polynomial in $x_1$

$$ X_{n,k+1} - X_{n,k} = -(n - k)\, C_k^n \sum_{m=0}^{k} (-1)^{k-m}\, \frac{1}{k + 1 - m}\, C_m^k\, x_1^{n-m} + C_{k+1}^n\, x_1^{n-k-1} \tag{8.23} $$

The exact form in Eq. 8.23 has been chosen to make it as similar as possible to Eq. 8.21. In order to establish the theorem, the highest power of $x_1$ in Eq. 8.21 must be eliminated by canceling it with the highest power in Eq. 8.23. This highest power corresponds to the summation index m = 0.

Proof (Cont.) This dictates that we try adding (k + 1)/n of Eq. 8.23 to Eq. 8.21.

$$ X_{n,k} + \left(\frac{k + 1}{n}\right)\left(X_{n,k+1} - X_{n,k}\right) = (n - k - 1)\, C_k^{n-1} \sum_{m=0}^{k} (-1)^{k-m}\, \frac{1}{n - 1 - m}\, C_m^k\, x_1^{n-1-m} \tag{8.24} $$

The right-hand side of Eq. 8.24 has the form of Eq. 8.21 when n is replaced by n - 1, so the X part of the vertex equation is proven. Since the manipulation on $Y_{n,k}$ is identical to that for $X_{n,k}$, the theorem has been established.

One additional fact about the binomial ROC curves that develop from repeated Luce observation may be noted. Pass a pure power ROC through the Luce vertex. The parameter value for this pure power ROC is

$$ x_1 = y_1^A, \qquad A = (\ln x_1)/(\ln y_1) \tag{8.25} $$

The first vertex above the point (0, 0) on each of the binomial ROC curves corresponds to saying "A" if and only if there have been no indications of condition N. The coordinates of this vertex are

therefore the n-th power of the coordinates for the Luce vertex.

$$ V_{n,0} = (x_1^n,\ y_1^n) \tag{8.26} $$

Therefore, this vertex also falls on the pure power ROC curve with index A.

$$ X_{n,0} = Y_{n,0}^A \tag{8.27} $$

Consider an observer whose complete one-observation ROC curve is a pure power ROC. If this observer records decisions made on a number of independent observations, and forgets everything else about the observation, then his individual decisions act as if they came from a Luce ROC. If he then uses these several decisions to reach a terminal decision, his terminal decision will be no better than one he could obtain from a single observation, if his final criterion demands confirmation of the condition SN on each observation. In order to use the multiple decisions to reach a better terminal decision, he must employ not-too-lax a criterion for the individual decisions, and not-too-strict a criterion for the terminal decision.

This behavior of the initial point on the polygon ROC curves is shown in Fig. 8.2 for the same Luce and Green double threshold ROC used in Fig. 8.1. The first part of the polygon ROC falls below the pure power ROC, while the remainder falls above the pure power ROC.

The work above has been very detailed, and is the type of analysis one would use when the number of observations leading to a

Fig. 8.2. Initial point power law

terminal decision is very small. When the number of observations leading to a terminal decision is large, a description of a continuous ROC character that approximates the binomial ROC character would be convenient. The two possible log likelihood ratios that occur in the Luce ROC character, and the corresponding jumps in the integrated ROC character, are sketched in Fig. 8.3.

Fig. 8.3. Sketch for Luce ROC character

The equations for these quantities in terms of the Luce ROC point $(x_1, y_1)$ are given in Eq. 8.28.

$$ z_1 = \ln\left((1 - y_1)/(1 - x_1)\right), \quad z_2 = \ln(y_1/x_1), \quad \omega_1 = \left[(1 - y_1)(1 - x_1)\right]^{.5}, \quad \omega_2 = (x_1 y_1)^{.5} \tag{8.28} $$

The binomial ROC character for n observations is

$$ \pi_n(z) = C_k^n\, \omega_1^{n-k}\, \omega_2^{k} \quad \text{at } z = (n - k) z_1 + k z_2 \tag{8.29} $$

Equation 8.29 is not the familiar binomial probability equation unless the sum of $\omega_1$ and $\omega_2$ is one. Since an expression that is the binomial

probability function would be useful, multiply and divide by the sum of $\omega_1$ plus $\omega_2$ to obtain Eq. 8.30.

$$ \pi_n(z) = (\omega_1 + \omega_2)^n\, C_k^n \left(\frac{\omega_1}{\omega_1 + \omega_2}\right)^{n-k}\left(\frac{\omega_2}{\omega_1 + \omega_2}\right)^{k} \tag{8.30} $$

The standard approximation for the binomial probability function is a normal probability density with mean value and variance given by familiar equations. Since we want to use this approximation for z, and not for the integer value k, we must multiply by the Jacobian of transformation. This is indicated here

$$ \pi_n(z) \approx (\omega_1 + \omega_2)^n\, (2\pi\sigma^2)^{-.5} \exp\left\{-\frac{(k - m)^2}{2\sigma^2}\right\}\left|\frac{dk}{dz}\right| \tag{8.31} $$

where the values for the parameters m and $\sigma^2$ are given in Eq. 8.32.

$$ m = n\,\frac{\omega_2}{\omega_1 + \omega_2}, \qquad \sigma^2 = n\,\frac{\omega_1 \omega_2}{(\omega_1 + \omega_2)^2} \tag{8.32} $$

This is the continuous ROC character which will be used to approximate the discrete character of the actual binomial ROC. To obtain this equation in terms of the variable z, use Eq. 8.29 for z in terms of n and k to do the transformation. After the very straightforward algebra, the resulting form for the approximating ROC character is given in Eq. 8.33.

$$ \pi_a(z) = (\omega_1 + \omega_2)^n\, (2\pi n D)^{-.5} \exp\left\{-\frac{(z - n\beta)^2}{2nD}\right\}, \qquad n z_1 < z < n z_2 $$

$$ D = (z_2 - z_1)^2\, \frac{\omega_1 \omega_2}{(\omega_1 + \omega_2)^2}, \qquad \beta = \frac{z_1 \omega_1 + z_2 \omega_2}{\omega_1 + \omega_2} \tag{8.33} $$

The parameter D has the role of a variance term, while the parameter $\beta$ will be interpreted as a skewness parameter. If the range of z were infinite, the normal ROC character would have to have a mode of zero, which would preclude the use of Eq. 8.33 except for $\beta$ values of zero. However, the approximating ROC character of Eq. 8.33 applies only to the bounded range of z values that can be obtained from the original Luce character in n observations. If the original Luce character is nonsymmetric, this resulting range will also be nonsymmetric. Therefore, Eq. 8.33 is interpreted as a metastatic normal ROC character.

If the original Luce ROC character is symmetric, the skewness parameter $\beta$ will be zero, and the range of the approximating normal character will also be symmetric. This is shown in Eq. 8.34.

$$ x_1 = 1 - y_1: \qquad \beta = 0, \qquad D = \left(\ln\frac{y_1}{1 - y_1}\right)^2 = z_2^2 \tag{8.34} $$
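The parameters of Eqs. 8.28 and 8.33 are simple to evaluate for any Luce point. The sketch below is an illustration under the formulas as written above; the function name `luce_parameters` is hypothetical, and the sample point is the symmetric one on the negative diagonal of a normal ROC with d = 1.

```python
import math

def luce_parameters(x1, y1):
    """Return (z1, z2, w1, w2, beta, D) for a Luce point with 0 < x1 < y1 < 1."""
    z1 = math.log((1.0 - y1) / (1.0 - x1))     # log likelihood ratio of response 'B'
    z2 = math.log(y1 / x1)                     # log likelihood ratio of response 'A'
    w1 = math.sqrt((1.0 - y1) * (1.0 - x1))    # jump at z1
    w2 = math.sqrt(x1 * y1)                    # jump at z2
    beta = (z1 * w1 + z2 * w2) / (w1 + w2)     # shift of the mode per observation
    big_d = (z2 - z1) ** 2 * w1 * w2 / (w1 + w2) ** 2   # variance-like term
    return z1, z2, w1, w2, beta, big_d

z1, z2, w1, w2, beta, big_d = luce_parameters(0.3085, 0.6915)
print(round(z1, 3), round(z2, 3), round(w1, 3), round(w2, 3), round(beta, 3), round(big_d, 3))
```

For a symmetric point the skewness parameter vanishes and D reduces to the square of z2, as in Eq. 8.34.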

The parameters D and $\beta$ apply to the individual distribution; the factor n appears separately in Eq. 8.33. These quantities D and $\beta$ are the parameters of the original Luce ROC curve that are relevant in studying the limiting form of the binomial ROC resulting from many similar observations. Table 8.1 presents three examples of Luce ROC curves and the values of D and $\beta$ evaluated for them.

    x1      y1      z1       z2      ω1     ω2     β       β/√D    D
    .0100   .0918   -.0862   2.215   .949   .030   -.164   -.042   .157
    .1586   .5000   -.5200   1.50    .649   .273   -.026   -.034   .580
    .3085   .6915   -.806    .806    .462   .462   0       0       .650

Table 8.1. Limiting binomial examples

The three points chosen for the Luce ROC all fall on a normal ROC with index d of unity. The first was picked at $x_1 = .01$, a very strict criterion. The second was chosen at $y_1 = .50$. The third was chosen at the negative diagonal. A column for the ratio of $\beta$ to the square root of D has been included. The parameter $\beta$ is the skewness parameter, giving the amount that the z mode shifts for each observation. The ratio of $\beta$ to the square root of D indicates the mode shift in standard deviation units.

The highest value of the parameter D results from the symmetric Luce ROC. The fact that this is only 65 percent (roughly $2/\pi$) of the original d is familiar to those who have worked

with detectors based upon clipped correlation devices, sometimes known as polarity coincidence comparators (Ref. 36). The performance is fairly insensitive to the departure from the negative diagonal until the departure becomes extreme. For very strict criteria, the parameter D is very much lower than the original normal ROC parameter d, and the resultant binomial ROC will be poorer than if a more symmetric original position had been used.

8.3 Fisher-Tippett, Doubly Exponential Character

In Section 3.3.5, the Fisher-Tippett distribution was used to obtain an ROC character. The form of this character is repeated here.

$$ \pi(z) = A\, e^{\alpha z - \beta e^{\alpha z}}, \qquad \alpha > .5, \quad \beta > 0 \tag{8.35} $$

The probability distribution associated with this character arises in studies of the distribution of the largest of a number of independent random variables. In that application, the two multipliers of z are identical. This section will treat a slight generalization of this character, where the multipliers of z need not be identical. This character is shown in Eq. 8.36.

$$ \pi(z) = A\, e^{\alpha z - \beta e^{\gamma z}}, \qquad \alpha\gamma > 0, \quad |\alpha| > .5 \tag{8.36} $$

The coefficients $\alpha$ and $\gamma$ may be either positive or negative, but they must both have the same sign. Otherwise, the ROC character would not approach zero for large magnitude z. The coefficient $\beta$ is always positive.

Examination of the first two derivatives of the exponent shows that this ROC character is unimodal. Let $\xi(z)$ denote the exponent in Eq. 8.36.

$$ \xi(z) = \alpha z - \beta e^{\gamma z}, \qquad \xi'(z) = \alpha - \beta\gamma e^{\gamma z}, \qquad \xi^{(n)}(z) = -\beta\gamma^n e^{\gamma z}, \quad n \ge 2 \tag{8.37} $$

The mode of the ROC character is at the z value for which the first derivative is zero.

$$ \text{mode } z = \frac{1}{\gamma}\ln\frac{\alpha}{\beta\gamma} \tag{8.38} $$

The second derivative of the exponent is everywhere negative, confirming that Eq. 8.38 yields a mode, and not a relative minimum.

$$ \xi''(z) = -\beta\gamma^2 e^{\gamma z} < 0 \tag{8.39} $$

From ROC Curve to Character.

The easiest demonstration of the ROC curves corresponding to the characters of Eq. 8.36 is to exhibit the ROC curves and then to show that they have this character. The incomplete gamma function is defined in Eq. 8.40.

$$ P(a, t) = \frac{1}{\Gamma(a)}\int_0^t x^{a-1} e^{-x}\, dx, \qquad a > 0 \tag{8.40} $$

The ROC curve will be a comparison of two incomplete gamma functions

at different parameter values a, but with the same cut t. The ROC curves are given parametrically in Eq. 8.41. The notation used gives the two parameter values in terms of their mean and difference.

$$ X(t) = 1 - P(m - \delta, t), \qquad Y(t) = 1 - P(m + \delta, t), \qquad 0 < \delta < m \tag{8.41} $$

To obtain the ROC character, first find the probability density functions for the dummy variable t, and then transform to the log likelihood ratio. Since the ROC curve is given in terms of integrals, the probability density functions are simply the integrands.

$$ f(t|N) = \frac{1}{\Gamma(m - \delta)}\, t^{m - \delta - 1} e^{-t} \tag{8.42} $$

$$ f(t|SN) = \frac{1}{\Gamma(m + \delta)}\, t^{m + \delta - 1} e^{-t} \tag{8.43} $$

These are Pearson III probability density functions in the variable t. The likelihood ratio is formed by dividing Eq. 8.43 by 8.42.

$$ e^z = t^{2\delta}\, \Gamma(m - \delta)/\Gamma(m + \delta) \tag{8.44} $$

The likelihood ratio has a Pearson III probability density function, but z will not. To obtain the probability density functions for z, solve Eq. 8.44 for t.

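\[ t = \left[ \frac{\Gamma(m+\delta)}{\Gamma(m-\delta)} \right]^{1/(2\delta)} e^{z/(2\delta)} \tag{8.45} \]

It will be convenient to have a symbol for the constant factor in Eq. 8.45. Call this constant $\beta$.

\[ \beta = \left[ \frac{\Gamma(m+\delta)}{\Gamma(m-\delta)} \right]^{1/(2\delta)} > 0 \tag{8.46} \]

Rewrite Eq. 8.45 as

\[ t = \beta\, e^{z/(2\delta)} \tag{8.47} \]

to obtain the Jacobian of transformation.

\[ \frac{dt}{dz} = \frac{\beta}{2\delta}\, e^{z/(2\delta)} = \frac{t}{2\delta} \tag{8.48} \]

The root product density function for t is

\[ [ f(t \mid N)\, f(t \mid SN) ]^{.5} = [ \Gamma(m-\delta)\, \Gamma(m+\delta) ]^{-.5}\, t^{m-1} e^{-t} \tag{8.49} \]

Since the Jacobian of transformation, Eq. 8.48, contains a single power of t, absorb this into the root product density before substituting for t according to Eq. 8.47. The result is

\[ \pi(z) = [ \Gamma(m-\delta)\, \Gamma(m+\delta) ]^{-.5}\, \frac{\beta^{m}}{2\delta}\, \exp\!\left[ \frac{m z}{2\delta} - \beta e^{z/(2\delta)} \right] \tag{8.50} \]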
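The passage from the pair (m, δ) to the character of Eq. 8.50 can be checked numerically. The following is a minimal sketch, not part of the original report: the function names, the integration limits, and the step count are illustrative assumptions. It builds A, α, γ, and β from m and δ and confirms by trapezoidal integration that the character has unit "N" and unit "SN" values, that is, that $e^{-.5z}\pi(z)$ and $e^{+.5z}\pi(z)$ each integrate to one.

```python
import math


def ft_character_params(m, delta):
    """Return (A, alpha, gamma, beta) for pi(z) = A*exp(alpha*z - beta*exp(gamma*z)).

    Uses Eq. 8.46 for beta and reads A, alpha, gamma off Eq. 8.50.
    """
    if not 0 < delta < m:
        raise ValueError("need 0 < delta < m")
    lg_hi = math.lgamma(m + delta)
    lg_lo = math.lgamma(m - delta)
    beta = math.exp((lg_hi - lg_lo) / (2 * delta))
    A = math.exp(-0.5 * (lg_hi + lg_lo)) * beta ** m / (2 * delta)
    return A, m / (2 * delta), 1 / (2 * delta), beta


def unit_value(m, delta, sign, steps=20000):
    """Trapezoid estimate of the integral of exp(sign*0.5*z) * pi(z) over z.

    sign = -1 gives the "N" value, sign = +1 the "SN" value.
    The limits are an assumption suited to moderate m and delta: they
    correspond to t = beta*exp(-40) and t = m + delta + 60 in Eq. 8.47.
    """
    A, alpha, gamma, beta = ft_character_params(m, delta)
    lo = -40.0 / gamma
    hi = math.log((m + delta + 60.0) / beta) / gamma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        exponent = sign * 0.5 * z + alpha * z - beta * math.exp(gamma * z)
        total += weight * math.exp(exponent)
    return A * h * total
```

For example, `ft_character_params(2.0, 0.5)` gives α = 2, γ = 1, and β = Γ(2.5)/Γ(1.5) = 1.5, and both `unit_value(2.0, 0.5, -1)` and `unit_value(2.0, 0.5, +1)` should come out close to one.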

To simplify this equation to the general form of Eq. 8.36, merely identify the constant factor, A, and the two multipliers of z, $\alpha$ and $\gamma$.

\[ A = [ \Gamma(m-\delta)\, \Gamma(m+\delta) ]^{-.5}\, \frac{\beta^{m}}{2\delta}, \qquad \alpha = \frac{m}{2\delta}, \qquad \gamma = \frac{1}{2\delta}, \qquad \alpha > .5,\ \gamma > 0 \tag{8.51} \]

The restriction that the parameter $\delta$ be between zero and m converts to the restriction that the coefficient $\alpha$ be greater than one-half and that $\gamma$ be positive.

Whenever an ROC character is nonsymmetric, a second ROC character and ROC curve are obtained by reversing the z-axis. Specifically, replace z by -z everywhere in the first ROC character, and replace X by 1 - Y and Y by 1 - X in the equation for the first ROC curve. When this is done for the doubly exponential ROC character of Eq. 8.36, another doubly exponential ROC character is obtained.

\[ z \to -z, \qquad X \to 1 - Y, \qquad Y \to 1 - X \tag{8.52} \]

\[ \pi(z) = A\, e^{-\alpha z - \beta e^{-\gamma z}}, \qquad \alpha > .5,\ \beta > 0,\ \gamma > 0 \tag{8.53} \]

The coefficients are the same as those given in Eq. 8.51, and are positive. The two minus signs can be absorbed into the terms $\alpha$ and $\gamma$, giving an ROC character for which the new $\alpha$ and $\gamma$ parameters are both negative, but the coefficient $\beta$ is still positive. The ROC curve is given parametrically in terms of the incomplete gamma function.

\[ X(t) = P(m+\delta, t), \qquad Y(t) = P(m-\delta, t), \qquad 0 < \delta < m \tag{8.54} \]

From ROC Character to ROC Curve. Whenever a doubly exponential ROC character is encountered, the parameters necessary for the incomplete gamma functions used in the ROC curve formulae are easily obtained. The m parameter is the ratio of the two multipliers of z, $\alpha$ and $\gamma$. These coefficients will either be both positive or both negative, and hence their ratio is positive. The difference parameter, $\delta$, is given solely in terms of the $\gamma$ parameter. Although there are only two degrees of freedom in the Fisher-Tippett Doubly Exponential Character, there are three coefficients in the exponential. One must therefore make sure that the $\beta$ parameter has the appropriate value. This is summarized in Eq. 8.55.

\[ m = \frac{\alpha}{\gamma}, \qquad \delta = \frac{1}{2|\gamma|}, \qquad \text{Check: } \beta = \left[ \frac{\Gamma(m+\delta)}{\Gamma(m-\delta)} \right]^{1/(2\delta)} \tag{8.55} \]

Tables of the incomplete gamma function are not always available. The incomplete gamma function is related to the chi-square distribution whenever the parameter values $m \pm \delta$ are integers or half integers. The relation is that the chi-square value is 2t and the

degrees of freedom are twice the incomplete gamma parameter.

\[ P(m \pm \delta, t) = P(\chi^2 = 2t \mid \nu = 2m \pm 2\delta) \tag{8.56} \]

(m and $\delta$ both odd quarter-integers, both half-integers, or both positive integers)

Whenever the degrees of freedom of the chi-square distribution are even integers, the probability may be written in terms of a finite power series, a polynomial, times a simple exponential.

\[ 1 - P(m \pm \delta, t) = e^{-t} \sum_{n=0}^{m-1\pm\delta} \frac{t^{n}}{n!} \tag{8.57} \]

(m and $\delta$ both half-integers or both positive integers)

Tables for the incomplete gamma function and the chi-square are normally available for only a low number of degrees of freedom. Typically, the largest tables available will terminate at thirty degrees of freedom.

Approach to Normal. As the number of degrees of freedom becomes large, the chi-square distribution approaches a normal distribution. The following approximation is usually suggested.

\[ 1 - P(\chi^2 \mid \nu) \approx \Phi\!\left( -\sqrt{2\chi^2} + \sqrt{2\nu - 1} \right), \qquad \nu \text{ large} \tag{8.58} \]

The table in Cramer (Ref. 5) suggests $\nu$ of thirty is sufficiently large to use this approximation, while the NBS Handbook (Ref. 6, 26.4.13)

suggests $\nu$ larger than 100. ROC curves deal with a comparison of two distribution functions. Our experience has been that such approximations may be used at much lower parameter values in a comparison than would be allowed were one evaluating a single distribution. The approximation of Eq. 8.58 used in evaluating the ROC curves of Eq. 8.41 will necessarily yield a normal ROC curve. The parameter d will be given by the square of the difference of the two terms involving the two degrees of freedom. When both of the degrees of freedom are large, and their difference is small compared to their average value, some standard approximations make a further simplification in determining the normal ROC parameter. The manipulations are omitted but the result is given below.

\[ d = \left( \sqrt{4m - 1 + 4\delta} - \sqrt{4m - 1 - 4\delta} \right)^{2} \approx 4\delta^{2} m^{-1} = (\alpha\gamma)^{-1} \tag{8.59} \]

Equation 8.59 strongly suggests that no single parameter in the doubly exponential character can be interpreted as the quality parameter for this distribution. Rather, detectability will be inversely proportional to the product of the two multipliers of z, $\alpha$ and $\gamma$.

Since the ROC character is nonsymmetric, the exact ROC curve will be nonsymmetric. When the ROC curve is plotted on normal-normal paper, the nonsymmetry should be manifested as a slope different from one. It would be advantageous to have a value obtainable from the character which would be simply related to this slope. In Carver's tables (Ref. 28) for the Pearson III distribution, the skewness of the incomplete gamma function is given in terms of the parameter a.

\[ \alpha_3 = \text{skewness of } P(a,t) = 2 a^{-.5} \tag{8.60} \]

Although the ROC equations use two incomplete gamma distributions with different skewness, under the condition that m is much larger than the difference, $\delta$, $\alpha_3$ is of the order of $2/\sqrt{m}$. No quantitative measure is implied. It is implied that the skewness should disappear as m is increased.

The approximation given in Eq. 8.59 groups together those ROC curves with the same ratio of $\delta^2$ to m. If the role of $\delta$ is replaced by a new parameter, $d_a$, using the substitution $\delta = .5\sqrt{m d_a}$, the ROC character is rewritten as

\[ \pi(z) = A \exp\!\left[ \sqrt{\frac{m}{d_a}}\, z - \beta\, e^{z/\sqrt{m d_a}} \right] \tag{8.61} \]

Collecting Eqs. 8.41 and 8.56, the ROC curve for the above ROC character in terms of the chi-square distribution is

\[ X = 1 - P(\chi^2 = 2t \mid \nu = 2m - \sqrt{m d_a}), \qquad Y = 1 - P(\chi^2 = 2t \mid \nu = 2m + \sqrt{m d_a}) \tag{8.62} \]

For tables which give the upper tail area, Q, which is one minus the chi-square probability,

\[ X = Q(\chi^2 = 2t \mid \nu = 2m - \sqrt{m d_a}), \qquad Y = Q(\chi^2 = 2t \mid \nu = 2m + \sqrt{m d_a}) \tag{8.63} \]

Holding $d_a$ constant, it can be shown that $\beta$ behaves like m minus one-eighth $d_a$ as m approaches infinity.

\[ m \to \infty: \qquad \beta \to m - \frac{d_a}{8} \tag{8.64} \]

When this is used in the expression for the parameter t, Eq. 8.47, together with the limiting normal form of the chi-square distribution, Eq. 8.58, the locus of the z = 0 point is the negative diagonal.

\[ \text{at } z = 0,\ t = \beta: \qquad Y, X \to \Phi\!\left( \pm .5 \sqrt{d_a} \right) \tag{8.65} \]

For m values much larger than the detection index $d_a$, $\beta$ may be considered equal to m. In this limiting form, the double exponential ROC character approaches

\[ m \to \infty: \qquad \pi(z) \to A \exp\!\left[ \sqrt{\frac{m}{d_a}}\, z - m\, e^{z/\sqrt{m d_a}} \right] \tag{8.66} \]

Examination of the derivatives of the exponent for this limiting form shows that the mode has shifted to zero and that the derivatives of order three and larger rapidly approach zero. The second derivative of the exponent is the only derivative that does not approach zero.

\[ \zeta'(0) = \alpha - m\gamma = \sqrt{\frac{m}{d_a}} - \frac{m}{\sqrt{m d_a}} = 0, \qquad \zeta''(0) = -m\gamma^{2} = -\frac{1}{d_a}, \qquad \zeta'''(0) = -m\gamma^{3} = -\frac{1}{d_a \sqrt{m d_a}} \to 0 \tag{8.67} \]

The power series approximation to the exponent is dominated by the second-order term.

\[ \zeta(z) - \zeta(0) \approx .5\, \zeta''(0)\, z^{2} = -\frac{z^{2}}{2 d_a} \tag{8.68} \]

This is the exponent for the normal ROC character.

Using Eq. 8.63, ROC curves were evaluated for $\sqrt{d_a}$ = 1, 2, and 3, and all possible small m values yielding integer degrees of freedom. These are shown in Fig. 8.4. As can be seen from the plots, the grouping of the parameters m and $\delta$ provided by Eq. 8.59 was fairly effective. That grouping was based on an approximation to the chi-square distribution normally considered valid for degrees of freedom of the order of 30 to 100 or greater. The actual values for the degrees of freedom used in plotting Fig. 8.4 are given in Table 8.2.

  d_a     m       delta    nu_SN   nu_N
  ---     ----    -----    -----   ----
  1       1       0.50      3       1
  1       2.25    0.75      6       3
  1       4       1.00     10       6
  1       6.25    1.25     15      10
  1       9       1.50     21      15
  4       4       2        12       4
  4       9       3        24      12
  9       4       3        14       2
  9       9       4.5      27       9

Table 8.2. Parameters for $X = Q(\chi^2 = 2t \mid \nu_N)$, $Y = Q(\chi^2 = 2t \mid \nu_{SN})$
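The rows of Table 8.2 follow from the substitution $\delta = .5\sqrt{m d_a}$ and the degrees of freedom $\nu = 2m \pm 2\delta$ of Eq. 8.56. A short sketch of that bookkeeping is given here; the function name `table_row` is a hypothetical helper introduced only for illustration.

```python
import math


def table_row(d_a, m):
    """Return (delta, nu_SN, nu_N) for detection index d_a and parameter m.

    delta = 0.5*sqrt(m*d_a); the degrees of freedom are 2m + 2*delta under SN
    and 2m - 2*delta under N. The restriction 0 < delta < m of Eq. 8.41 is
    enforced, so parameter pairs with nu_N <= 0 are rejected.
    """
    delta = 0.5 * math.sqrt(m * d_a)
    if not 0 < delta < m:
        raise ValueError("need 0 < delta < m")
    return delta, 2 * m + 2 * delta, 2 * m - 2 * delta


if __name__ == "__main__":
    for d_a, m in [(1, 1), (1, 2.25), (1, 4), (4, 4), (4, 9), (9, 4), (9, 9)]:
        print(d_a, m, table_row(d_a, m))
```

The check `delta < m` explains why the smallest usable m grows with $d_a$: for $d_a = 4$ the value m = 1 would give zero degrees of freedom under N.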

[Figure: ROC curves plotted as Y (%) against X (%), with $Y = Q(\chi^2 \mid \nu = 2m + \sqrt{d_a m})$, $X = Q(\chi^2 \mid \nu = 2m - \sqrt{d_a m})$, and $\pi(z) = A\, e^{\alpha z - \beta e^{\gamma z}}$.]

Fig. 8.4. Fisher-Tippett ROC curves

The normal ROC curves with parameter $\sqrt{d}$ = 1, 2, and 3 lie slightly below the double exponential character ROC curves, crossing them in the neighborhood of x = .02.

The development in this subsection was suggested by the standard approximation to the chi-square distribution for large degrees of freedom. This approximation indicated that the proper way to consider the ROC character

\[ \pi(z) = A\, e^{\alpha z - \beta e^{\gamma z}} \tag{8.36} \]

is to consider the product of the two z multipliers, $\alpha\gamma$, as the reciprocal of the approximating normal index, and to consider the ratio of these two z multipliers, $\alpha/\gamma$, as a skewness index. By a sequence of approximations, it was established that in the limit, as $m = \alpha/\gamma$ approaches infinity, the ROC curves do become symmetrical (indicated by the z = 0 point falling on the negative diagonal), and the ROC character does approach normality with the index equal to the reciprocal of $\alpha\gamma$. ROC curves based on available tables indicated that this same viewpoint of the parameters in the ROC character is valid for small integer m, as well as when m approaches infinity.

Other Cases. The cases considered numerically showed a slope greater than one on normal-normal paper, contrary to most of the examples that we have considered. The reverse double exponential ROC character of Eq. 8.53 would show the corresponding slope less than one, since these ROC curves would be the mirror image of those shown in Fig. 8.4.
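The chi-square approximation of Eq. 8.58, on which the grouping of this subsection rests, can be compared directly with the exact finite series of Eq. 8.57 when the degrees of freedom are even. The sketch below is illustrative only; the function names are assumptions, and the normal distribution function is built from the standard library's `math.erfc`.

```python
import math


def chi2_upper_exact_even(chi2, nu):
    """Upper tail Q(chi2 | nu) for even nu, via the finite series of Eq. 8.57.

    With t = chi2/2 and incomplete gamma parameter nu/2, the tail is
    exp(-t) * sum_{n=0}^{nu/2 - 1} t**n / n!.
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("nu must be a positive even integer")
    t = chi2 / 2.0
    term = 1.0
    total = 1.0
    for n in range(1, nu // 2):
        term *= t / n
        total += term
    return math.exp(-t) * total


def normal_cdf(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def chi2_upper_normal_approx(chi2, nu):
    """Upper tail from the large-nu approximation of Eq. 8.58."""
    return normal_cdf(-math.sqrt(2.0 * chi2) + math.sqrt(2.0 * nu - 1.0))


if __name__ == "__main__":
    for nu in (4, 10, 30):
        exact = chi2_upper_exact_even(float(nu), nu)
        approx = chi2_upper_normal_approx(float(nu), nu)
        print(nu, round(exact, 4), round(approx, 4))
```

Evaluating both functions at the same cut for the two degrees of freedom of an ROC point gives a feel for how much of the single-distribution error cancels in the comparison.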

In the above subsection, the concern was with cases where both parameters representing degrees of freedom became large. A particular example that appeared earlier, in Section 4.4, was the ROC curve

\[ Y = X\,(1 + |\ln X|) \tag{8.69} \]

This is a particular case of the double exponential character; namely, when $\alpha$ is $\gamma + .5$. This fixes $m - \delta$ at unity, and the basic equations for the ROC curves are

\[ X(t) = 1 - P(1, t) \tag{8.70} \]

\[ Y(t) = 1 - P(2m - 1, t) \tag{8.71} \]

This is a case where Eq. 8.57 can be used effectively, if 2m - 1 is an integer.

\[ X(t) = e^{-t}, \qquad Y(t) = e^{-t} \sum_{n=0}^{2m-2} \frac{t^{n}}{n!} \tag{8.72} \]

The dummy variable t can be removed by solving the simple X(t) equation, and substituting into the second equation. The result is

\[ Y = X \left( 1 + |\ln X| + \frac{|\ln X|^{2}}{2!} + \cdots + \frac{|\ln X|^{2m-2}}{(2m-2)!} \right) \tag{8.73} \]

The first nine such ROC curves are plotted in Fig. 8.5.
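Equation 8.73 is simple enough to evaluate directly. The following sketch (the function name `roc_y` is an illustrative assumption) computes Y for a given X and a given m with 2m - 1 an integer; for m = 1.5 it reduces to the curve of Eq. 8.69.

```python
import math


def roc_y(x, m):
    """Evaluate Eq. 8.73: Y = X * sum_{n=0}^{2m-2} |ln X|**n / n!.

    Requires 0 <= x <= 1 and 2m - 2 a nonnegative integer.
    """
    order = round(2 * m - 2)
    if order < 0 or abs(order - (2 * m - 2)) > 1e-9:
        raise ValueError("2m - 2 must be a nonnegative integer")
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0:
        return 0.0
    a = abs(math.log(x))
    term = 1.0
    total = 1.0
    for n in range(1, order + 1):
        term *= a / n  # builds a**n / n! incrementally
        total += term
    return x * total


if __name__ == "__main__":
    for m in (1.5, 2.0, 2.5, 3.0):
        print(m, [round(roc_y(x, m), 4) for x in (0.01, 0.1, 0.5)])
```

Because the sum is a truncated series for $e^{|\ln X|} = 1/X$, each added term raises the curve toward Y = 1, which is the ordering of the curves one would expect as m grows.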

[Figure: ROC curves plotted as Y against X.]

Fig. 8.5. ROC curves of Eq. 8.73

CHAPTER IX

ADDITION OF ROC CHARACTERS

This chapter deals with the ROC character addition theorem, and with two families of two-parameter ROC characters. The character addition theorem says that if a number of regular ROC characters are combined in a weighted sum, then the resultant ROC curve is formed by using the same type of weighted sum over the original regular ROC curves.

\[ \pi(z) = \sum_k c_k\, \pi_k(z) \quad \Rightarrow \quad X(z) = \sum_k c_k X_k(z), \qquad Y(z) = \sum_k c_k Y_k(z) \tag{9.1} \]

The ROC characters treated in this chapter are the type containing a power of z, and an exponential term in z. The two parameters in each family are the power of z, and the coefficient in the exponential. The first family, considered in Section 9.2, is the Pearson III.

\[ \text{Pearson III Type:} \qquad \pi(z) = C\, z^{p}\, e^{-z/k} \tag{9.2} \]

The power parameter, p, may be any real value greater than -1. The exponential is a simple exponential, that is, it is linear in z. The second family, considered in Section 9.3, is a form of the Halsted family mentioned in Chapter I. These latter ROC characters are related to the Hermite polynomials, and derivatives of the normal density function.

\[ \text{H Type:} \qquad \pi(z) = C\, z^{n}\, e^{-(z - z_0)^{2}/2k} \tag{9.3} \]

This type of ROC character is just being developed at this time, and the power of z is restricted to being an integer greater than, or equal to, one. The H-type ROC character differs from the Pearson III character in that the exponential term is quadratic in z.

Both types of ROC character are bounded below, and unbounded above; that is, the z-range is from some negative value up to plus infinity. The addition theorem allows the expansion of either or both of these types to ROC characters which consist of polynomials in z, times either a linear or quadratic exponential term. When the equations for the ROC curves in terms of the variable z have been developed, one may immediately obtain the ROC curve for such a generalized character. The character addition theorem is not restricted to combining characters of the same general family. Any types of character may be added together.

9.1 Character Addition Theorem

This section states and proves the character addition theorem. The statement and the proof are in terms of the integrated ROC character, to obtain the maximum generality. It will therefore apply to ROC curves that are differentiable, and have ordinary ROC characters; it will also apply to polygon type ROC curves, with straight lines joining the vertices, as well as mixtures. The key mathematical point in the proof is the ability to interchange a finite summation with

integration.

ROC Character Addition Theorem: Let $\Pi_k(z)$ be the integrated ROC characters for the regular ROC curves $(X_k(z), Y_k(z))$, k = 1, 2, ..., n. Let $c_k$ be any positive numbers whose sum is one. Then

\[ \Pi(z) = \sum_{k=1}^{n} c_k\, \Pi_k(z) \tag{9.4} \]

is an integrated ROC character, the ROC curve is regular and is given by

\[ X(z) = \sum_{k=1}^{n} c_k X_k(z), \qquad Y(z) = \sum_{k=1}^{n} c_k Y_k(z) \tag{9.5} \]

Proof: The four requirements for a nonnegative function to be an integrated ROC character of a regular ROC curve were given in Chapter III, and are

(1) the lower limit is zero

\[ \Pi(-\infty) = 0 \tag{3.6} \]

(2) monotone growth

\[ d\Pi(z) \ge 0 \tag{3.7} \]

(3) unit "N" value

\[ \int_{-\infty}^{\infty} e^{-.5z}\, d\Pi(z) = 1 \tag{3.8} \]

(4) unit "SN" value

\[ \int_{-\infty}^{\infty} e^{.5z}\, d\Pi(z) = 1 \tag{3.9} \]

Each of the given individual integrated ROC characters, $\Pi_k$, has these four properties. Does the newly formed sum of Eq. 9.4 also have these properties? If the individual integrated ROC characters are all zero at $z = -\infty$, then the sum will also be zero. Similarly, if each of the

individual integrated ROC characters exhibits monotone growth (not necessarily strict monotone growth), then a finite sum does also. To verify Eqs. 3.8 and 3.9 for the newly formed sum one need merely note that the definite Stieltjes integral is a linear operator with respect to both functions. Therefore the integral with respect to the sum is the sum of the integrals. Since the coefficients $c_k$ add to one, the sum of Eq. 9.4 will satisfy the properties 3.8 and 3.9.

The notation for the ROC curve in terms of the single parameter z implies the dual parameter of a decision based on likelihood ratio. Specifically, whenever the integrated ROC character has a jump, the associated straight segment of the ROC curve is given in terms of a randomizing parameter, r. The complete equation for the ROC curve in terms of the parameter z is actually given in terms of a cut along the z axis, $\beta$, and the randomizing parameter, r, whenever it is needed. With the upper signs for Y and the lower signs for X,

\[ \left. \begin{matrix} Y(\beta, r) \\ X(\beta, r) \end{matrix} \right\} = \int_{\beta}^{\infty} e^{\pm .5z}\, \pi(z)\, dz + r\, e^{\pm .5\beta}\, w(\beta) + \sum_{z > \beta} e^{\pm .5z}\, w(z) \tag{9.6} \]

Most of this present work has dealt with the first integral alone, since for most ROC characters the jump function

w is zero everywhere. Each of the original n ROC curves is given by an equation such as Eq. 9.6. To determine the ROC curve for the integrated ROC character of Eq. 9.4, substitute the equation for the new integrated character into Eq. 9.6.

\[ \left. \begin{matrix} Y(\beta, r) \\ X(\beta, r) \end{matrix} \right\} = \int_{\beta}^{\infty} e^{\pm .5z} \sum_k c_k\, \pi_k(z)\, dz + r\, e^{\pm .5\beta} \sum_k c_k\, w_k(\beta) + \sum_{z > \beta} e^{\pm .5z} \sum_k c_k\, w_k(z) \tag{9.7} \]

In the first term, the interchange of the order of summation and integration is allowed because the sum is finite. The second term, dealing with the jump at the point $\beta$, contains only one summation and, therefore, any reordering of the terms is valid. The third summation deals with one finite sum and one possibly infinite sum. However, since all of the terms involved are positive, if the sums converge in any order they will converge in any other order. Therefore, the summation over k may be taken outside of the expression to obtain Eq. 9.8.

\[ \left. \begin{matrix} Y(\beta, r) \\ X(\beta, r) \end{matrix} \right\} = \sum_k c_k \left[ \int_{\beta}^{\infty} e^{\pm .5z}\, \pi_k(z)\, dz + r\, e^{\pm .5\beta}\, w_k(\beta) + \sum_{z > \beta} e^{\pm .5z}\, w_k(z) \right] \tag{9.8} \]

The term inside each bracket is the pair of symmetric expressions for the vertical and horizontal coordinates of the ROC curve, $Y_k$ and $X_k$. The proof is complete.

9.2 Pearson III ROC Characters

In Section 3.3.4 the Pearson III character was obtained from the Pearson III probability distribution. The form of the ROC character is

\[ \pi(z) = \frac{1}{\Gamma(p+1)} \left( \frac{\sqrt{B}}{B-1} \right)^{p+1} (z - z_0)^{p} \exp\!\left[ -\frac{B+1}{2(B-1)}\,(z - z_0) \right], \qquad z > z_0 \tag{9.9} \]

\[ B > 1, \qquad p > -1, \qquad z_0 = -(p+1) \ln B \]

In this section explicit expressions for the ROC curve are obtained when the parameter p is either an integer or when twice p is an integer. This latter is usually referred to as half-integer order. In each case the steps will be to display an ROC curve in terms of a dummy parameter t, obtain the relation between t and z, and check to be sure the ROC character is truly Pearson III.

Integer Order. Consider any nonnegative integer n, and the following ROC curve.

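\[ p = n \ge 0: \qquad X = e^{-Bt} \sum_{k=0}^{n} \frac{(Bt)^{k}}{k!}, \qquad Y = e^{-t} \sum_{k=0}^{n} \frac{t^{k}}{k!}, \qquad t \ge 0 \tag{9.10} \]

Both X and Y are unity when t is zero, and decrease toward zero as t approaches infinity. Therefore, X and Y, as usual, are treated as one minus the distribution function for t. The probability density functions for t are the negative of the derivative of X and Y with respect to t. Proceeding formally, the derivative of X with respect to t is

\[ \frac{dX}{dt} = -B e^{-Bt} \sum_{k=0}^{n} \frac{B^{k} t^{k}}{k!} + e^{-Bt} \sum_{k=1}^{n} \frac{B^{k} t^{k-1}}{(k-1)!} \tag{9.11} \]

Collect the two sums, first changing indices so that the coefficients of a single power may be determined, to obtain

\[ \frac{dX}{dt} = -e^{-Bt} \left[ \frac{B^{n+1} t^{n}}{n!} + \sum_{k=0}^{n-1} \left( \frac{B^{k+1} t^{k}}{k!} - \frac{B^{k+1} t^{k}}{k!} \right) \right] = -\frac{B^{n+1}}{n!}\, t^{n} e^{-Bt} \tag{9.12} \]

Therefore, the density function for t under condition N is

\[ f(t \mid N) = \frac{B^{n+1}}{n!}\, t^{n} e^{-Bt} \tag{9.13} \]

Since the equation for Y is similar to that for X with B set equal to one,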

\[ f(t \mid SN) = \frac{1}{n!}\, t^{n} e^{-t} \tag{9.14} \]

The ratio of these two equations gives the likelihood ratio, $e^{z}$.

\[ e^{z} = B^{-(n+1)}\, e^{(B-1)t} \tag{9.15} \]

Solving for z,

\[ z = (B-1)\, t - (n+1) \ln B \tag{9.16} \]

By Eqs. 9.13 and 9.14, the variable t has a Pearson III density function. Since z is a linear transformation of t, it too will have Pearson III density functions under both condition N and SN. Explicitly, t is

\[ t = \frac{z - z_0}{B - 1}, \qquad z_0 = -(n+1) \ln B \tag{9.17} \]

Any of the several possible routes to obtain the ROC character will reobtain Eq. 9.9. It is therefore established that Eq. 9.10 is the ROC curve for a Pearson III ROC character of integer order. Eq. 9.16 gives the explicit relation between z and the variable t.

Half-Integer Order. Consider any value for p which is one-half larger than an integer. The claim is that Eq. 9.18 is the corresponding ROC curve.

\[ p = n + .5 \ge -.5: \qquad X = 2 - 2\Phi\!\left( \sqrt{2Bt} \right) + \sum_{k=0}^{n} \frac{(Bt)^{k+.5}}{\Gamma(k+1.5)}\, e^{-Bt}, \qquad Y = 2 - 2\Phi\!\left( \sqrt{2t} \right) + \sum_{k=0}^{n} \frac{t^{k+.5}}{\Gamma(k+1.5)}\, e^{-t} \tag{9.18} \]

If the value for p is -.5, so that n would take on the value -1, the summation is omitted. The function $\Phi(\cdot)$ is the normal distribution function, and the range of t is from zero to infinity.

Differentiate X with respect to t in two steps. First consider the term involving the normal distribution function. The detailing of differentiation is given in Eq. 9.19.

\[ \frac{d}{dt} \left[ -2\Phi\!\left( \sqrt{2Bt} \right) \right] = -2\,\phi\!\left( \sqrt{2Bt} \right) \left( \frac{B}{2} \right)^{.5} t^{-.5} = -\left( \frac{B}{\pi} \right)^{.5} t^{-.5}\, e^{-Bt} \tag{9.19} \]

The square root is squared in the normal density function, leaving a linear exponential term multiplied by t to the power minus one-half. The differentiation of the sum is subject to the same collapsing, or telescoping, that occurred for integer order, with the exception that two terms are left after the cancellation instead of one.

\[ \frac{d}{dt} \sum_{k=0}^{n} \frac{(Bt)^{k+.5}}{\Gamma(k+1.5)}\, e^{-Bt} = e^{-Bt} \sum_{k=0}^{n} \left[ \frac{B^{k+.5}\, t^{k-.5}}{\Gamma(k+.5)} - \frac{B^{k+1.5}\, t^{k+.5}}{\Gamma(k+1.5)} \right] = -e^{-Bt} \left[ \frac{B^{n+1.5}\, t^{n+.5}}{\Gamma(n+1.5)} - \frac{B^{.5}\, t^{-.5}}{\Gamma(.5)} \right] \tag{9.20} \]

The two remaining terms contain the desired large power of t, and a term looking very much like that in the last line in Eq. 9.19. Indeed, if the gamma function is evaluated at one-half, it is simply the square root of $\pi$.

\[ \Gamma(.5) = \sqrt{\pi} \tag{9.21} \]

In the sum of Eqs. 9.19 and 9.20, these similar terms will cancel.

\[ f(t \mid N) = \frac{d}{dt}(1 - X) = \frac{B^{n+1.5}}{\Gamma(n+1.5)}\, t^{n+.5}\, e^{-Bt} \tag{9.22} \]

The density function for t under the condition SN is obtained by setting B equal to one.

\[ f(t \mid SN) = \frac{1}{\Gamma(n+1.5)}\, t^{n+.5}\, e^{-t} \tag{9.23} \]

Divide Eq. 9.23 by 9.22 to obtain the likelihood ratio.

\[ e^{z} = B^{-(n+1.5)}\, e^{(B-1)t} \tag{9.24} \]

The logarithm of the likelihood ratio, z, is linear with the variable, t

and, therefore, will have the same type of density function as t. Since these are both Pearson III, there is no need to detail the familiar route and reobtain the ROC character, Eq. 9.9, for half-integer order. This has established the equations for the ROC curves for Pearson III character for integer and half-integer order.

Other forms of these equations may be obtained in terms of the chi-square distribution, and the incomplete gamma function. The ROC curve is given in terms of a comparison of two points along either of these distributions, using the same number of degrees of freedom. This is contrasted to the case of the Fisher-Tippett Doubly Exponential Character, which was obtained from the incomplete gamma function, or chi-square distribution, by reading at the same argument but for two different degrees of freedom.

The most skewed of the characters considered in this section is that with p = -.5. The ROC curves for several B values are shown on normal-normal paper in Fig. 9.1. There is a special coordinate system that makes these curves plot as straight lines. It is based on the error function, erf t, and is used to display the same ROC curves in Fig. 9.2.

9.3 H-Type ROC Character

The character under study in this section consists of a power of z, times a quadratic term in z in the exponential. The general form is

Fig. 9.1. Pearson III, p = -.5, ROC curves on normal-normal paper

[Figure: ROC curves plotted as Y against X for several B values, for the character $\pi(z) = \dfrac{B^{.25}}{\sqrt{\pi (B-1)}}\,(z - z_0)^{-.5} \exp\!\left[ -\dfrac{B+1}{2(B-1)}(z - z_0) \right]$, Pearson III, p = -.5, $z_0 = -.5 \ln B$.]

Fig. 9.2. Pearson III, p = -.5, ROC curves on erf-erf paper

\[ \pi(z) = C\,(z - z_0)^{n}\, e^{-(z - z_0)^{2}/8\sigma^{2}}, \qquad z \ge z_0,\ n > 0 \tag{9.25} \]

The term $z_0$ is the minimum value for z and will be negative. If the quadratic in the exponential were expanded it would contain both a term in $z^2$ and a term in z. The constant value, C, is a function of both the power n and the detection index $\sigma$. No general expression for this constant, C, or the value of $z_0$ in terms of n and $\sigma$ is known at this writing.

The exponential with the quadratic term can be expressed as a normal density function, and the form of the constant C changed to make subsequent manipulation simpler. The following form of the ROC character will be used.

\[ \pi(z) = \frac{A}{2\sigma}\, e^{-.5\sigma^{2}} \left( \frac{z - z_0}{2\sigma} \right)^{n} \phi\!\left( \frac{z - z_0}{2\sigma} \right) \tag{9.26} \]

To obtain the equations for the ROC curve, multiply the character by $e^{\pm .5z}$, and integrate. The integration will be simplest with the change of variable

\[ t = \frac{z - z_0}{2\sigma}, \qquad \frac{dt}{dz} = \frac{1}{2\sigma}, \qquad t \ge 0 \tag{9.27} \]

The multiplier necessary for the integration can be expressed in terms of $\sigma$, t, and $z_0$.

\[ e^{\pm .5z} = e^{\pm \sigma t}\, e^{\pm .5 z_0} \tag{9.28} \]

Reversing the usual route, the density functions for t are obtained from the corresponding density functions for z, by multiplying Eq. 9.28 by 9.26, and by the Jacobian of transformation. With the upper signs for SN and the lower signs for N,

\[ \left. \begin{matrix} f(t \mid SN) \\ f(t \mid N) \end{matrix} \right\} = A\, t^{n}\, e^{\pm .5 z_0 \pm \sigma t - .5 t^{2} - .5 \sigma^{2}} / \sqrt{2\pi} \tag{9.29} \]

Of the four terms in the exponential, the three not involving $z_0$ form a perfect square. When this is done, the densities can again be expressed in terms of a normal density function.

\[ \left. \begin{matrix} f(t \mid SN) \\ f(t \mid N) \end{matrix} \right\} = A\, e^{\pm .5 z_0}\, t^{n}\, \phi(t \mp \sigma) \tag{9.30} \]

The equations for the ROC curve can then be formally expressed by integrating the density functions.

\[ Y = A\, e^{.5 z_0} \int_{T}^{\infty} t^{n}\, \phi(t - \sigma)\, dt, \qquad X = A\, e^{-.5 z_0} \int_{T}^{\infty} t^{n}\, \phi(t + \sigma)\, dt \tag{9.31} \]

At this point the symmetric form for these equations has to be abandoned in order to obtain simpler integrals. Make a change of variables in each of the integrals of Eq. 9.31 to obtain a single letter argument for the normal density function.

\[ Y = A\, e^{.5 z_0} \int_{T - \sigma}^{\infty} (t + \sigma)^{n}\, \phi(t)\, dt, \qquad X = A\, e^{-.5 z_0} \int_{T + \sigma}^{\infty} (t - \sigma)^{n}\, \phi(t)\, dt \tag{9.32} \]

The derivatives of the normal density function can be expressed in terms of a polynomial times the normal density function. These polynomials are known as Hermite polynomials. Using tables of Hermite polynomials one may develop formulae for the integral of a single power times the normal density function. Such a table is given here.

\[ \int \phi(t)\, dt = \Phi(t) \]
\[ \int t\, \phi(t)\, dt = -\phi(t) \]
\[ \int t^{2} \phi(t)\, dt = \Phi(t) - t\, \phi(t) \]
\[ \int t^{3} \phi(t)\, dt = -(t^{2} + 2)\, \phi(t) \]
\[ \int t^{4} \phi(t)\, dt = 3\Phi(t) - (t^{3} + 3t)\, \phi(t) \]
\[ \int t^{5} \phi(t)\, dt = -(t^{4} + 4t^{2} + 8)\, \phi(t) \]
\[ \int t^{6} \phi(t)\, dt = 15\Phi(t) - (t^{5} + 5t^{3} + 15t)\, \phi(t) \]

Table 9.1. Special Integrals

The equations of Eq. 9.32 may be integrated by expanding the term $(t \pm \sigma)^{n}$ as a polynomial in t. The coefficients will depend upon $\sigma$, and whether the sign is plus or minus. By using the table of special integrals, the equations for each ROC curve may be reduced to a term involving the

normal distribution function, and a polynomial times the normal density function. The value of the polynomial times the normal density function at the upper limit, plus infinity, is zero. The value for the normal distribution at the upper limit is unity. Therefore, the upper limit will contribute at most a constant term. The rest of the equation for the ROC curve will depend upon the evaluation of the indefinite integral at the lower limits, $t \mp \sigma$ (writing t for the cut T). The resulting form for the ROC curve will be

\[ Y = C_1 [ 1 - \Phi(t - \sigma) ] + P_n(t - \sigma)\, \phi(t - \sigma), \qquad X = C_0 [ 1 - \Phi(t + \sigma) ] + Q_n(t + \sigma)\, \phi(t + \sigma), \qquad \sigma > 0,\ t \ge 0 \tag{9.33} \]

The ROC curve and ROC character have been graphed for the simplest case, when n is one. The ROC character is shown in Fig. 9.3. The lower scale is the dummy parameter t. Five z scales are shown. As the parameter $\sigma$ increases, the ROC character will be effectively spread over much more of the z axis. The mode of the ROC character is negative, and effectively moves farther to the left as detectability increases. The equations for the ROC curve in terms of the normal distribution and density for n = 1 are given below.

\[ n = 1: \qquad Y = \frac{\sigma [ 1 - \Phi(t - \sigma) ] + \phi(t - \sigma)}{\sigma\, \Phi(\sigma) + \phi(\sigma)}, \qquad X = \frac{-\sigma [ 1 - \Phi(t + \sigma) ] + \phi(t + \sigma)}{-\sigma\, \Phi(-\sigma) + \phi(\sigma)} \tag{9.34} \]

The ROC curves for $\sigma$ = 1 and 2 are displayed on normal-normal

[Figure: H-type ROC character for n = 1 plotted against the dummy parameter t, with five z scales, where $z_0 = -\ln \{ [\phi(\sigma) + \sigma \Phi(\sigma)] / [\phi(-\sigma) - \sigma \Phi(-\sigma)] \}$.]

Fig. 9.3. H-Type ROC character: Z scales and shape of ROC character

[Figure: ROC curves plotted as Y (%) against X (%).]

Fig. 9.4. H-type ROC, n = 1

paper in Fig. 9.4. They appear to be nearly straight lines, with slope less than one. For $\sigma$ = 1, the slope is s = .85 and for $\sigma$ = 2, the slope is .74.

9.4 Discussion on Approximations

The addition theorem may be used to advantage to obtain an ROC curve for an ROC character by fitting the given character with a weighted sum of characters with known ROC curves. If the nature of the tail portion of the given character is known precisely, it may be matched with ROC characters with similar tail behavior. The Pearson III class, which includes the simple exponential of the power ROC, all exhibit a linear exponential tail. The conic ROC class also has a linear exponential tail, with coefficient fixed at -1.5. The H-type ROC character, as well as the normal and metastatic normal, have a quadratic exponential tail behavior.

The field of approximations has been barely studied in the present work. It has been the purpose of this present work to provide the relation between the ROC curve and an analytic form, the ROC character, and to present a variety of families of ROC curves. The author intends to expand the ideas presented in this chapter to further enrich the collection of ROC characters with known ROC curve equations. The area of obtaining ROC curves and characters from experimental data is worthy of research. Especially useful would be the development of techniques for successive approximations, whereby

limited data could be used to obtain a first ROC curve and character, and subsequent data could be used to refine the first approximation.
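The character addition theorem of Section 9.1 lends itself to a numerical check. The sketch below is an illustration under stated assumptions rather than a method taken from this report: it uses two normal ROC curves with indices d, for which the log likelihood ratio z is normal with mean -d/2 under N and +d/2 under SN and variance d, so that the character is the root product of the two densities. The weighted sum of the characters is integrated against $e^{\pm .5z}$ above a cut, and the result is compared with the same weighted sum of the individual ROC coordinates. All function names, the upper integration limit, and the step count are assumptions.

```python
import math


def normal_cdf(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_character(z, d):
    """Root product of the N and SN densities of z for a normal ROC of index d."""
    return math.exp(-z * z / (2.0 * d) - d / 8.0) / math.sqrt(2.0 * math.pi * d)


def normal_roc(z, d):
    """(X, Y) of the normal ROC curve of index d at likelihood-ratio cut z."""
    s = math.sqrt(d)
    return 1.0 - normal_cdf((z + d / 2.0) / s), 1.0 - normal_cdf((z - d / 2.0) / s)


def summed_roc(z, weights, indices):
    """Weighted sum of the individual ROC coordinates, as in Eq. 9.5."""
    points = [normal_roc(z, d) for d in indices]
    x = sum(c * p[0] for c, p in zip(weights, points))
    y = sum(c * p[1] for c, p in zip(weights, points))
    return x, y


def roc_from_character(z, weights, indices, hi=40.0, steps=20000):
    """Integrate exp(-/+ .5u) times the summed character from the cut z to hi."""
    h = (hi - z) / steps
    x = y = 0.0
    for i in range(steps + 1):
        u = z + i * h
        w = 0.5 if i in (0, steps) else 1.0
        pi_u = sum(c * normal_character(u, d) for c, d in zip(weights, indices))
        x += w * math.exp(-0.5 * u) * pi_u
        y += w * math.exp(0.5 * u) * pi_u
    return x * h, y * h


if __name__ == "__main__":
    w, ds = (0.3, 0.7), (1.0, 4.0)
    print(summed_roc(0.5, w, ds))
    print(roc_from_character(0.5, w, ds))
```

The two printed pairs should agree to within the accuracy of the trapezoidal rule, which is the content of Eq. 9.5 for characters without jumps.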

CHAPTER X

SELECTED TOPICS

10.1 Character-Free Measures of ROC Quality

When the description and evaluation of an ROC curve is just one part of an experiment, or one part of a mathematical analysis, there is a strong practical demand to reduce the description of detection performance from the single function ROC to a single number. When the purpose of the detection is well defined, this can be done by stating the degree to which the detection performance achieved its purpose. For example, in binary communications operating with a symmetric channel, the likelihood ratio cut is uniquely established at one, and the probability of error is the single numerical quantity of interest.

One of the major purposes of this present work was to reduce the description of the ROC curve to a two-step description. The first step is the determination of the family to which the ROC curve belongs. The second step is the evaluation of the one or two parameters necessary to describe the particular curve. The specific parameters to be evaluated depend on the nature of the character for the particular ROC family. In this section various measures of ROC quality are considered that have been proposed for use independent of the particular character of the ROC curve. We call such measures

"character-free." The definition of the measure is independent of the character of the ROC. However, the values obtained for such measures will depend on the particular character of the ROC.

10.1.1 D. M. Green's Theorem. A particular type of psychophysical experiment is known as the multiple presentation, multiple alternative, forced choice experiment, or simply, "forced choice" experiment. The description which will be given here is the forced choice in time experiment. There are other forms of experiment with equivalent analysis. In the two-alternative forced choice experiment, two intervals in time are marked for the observer's attention. A signal will appear in one but not both of these time intervals. The observer is required to decide which interval contains the signal. In the general M-alternative forced choice situation, M intervals are marked off, and the signal will appear in one and only one of these intervals. A completely symmetric forced choice experiment presents the signals in each interval with equal probability, and the subject is asked to obtain the maximum number of correct identifications. The single number score for his performance is the fraction of identifications that are correct. The corresponding analytical measure for a theoretical situation is the maximum probability of correct identification.

Green's Theorem (Ref. 29) relates the score for a two-alternative symmetric forced choice experiment to the ROC curve.

Experimentally, the ROC curve is obtained in a single presentation two-alternative situation. Green's Theorem is that the two-alternative forced choice score will be the area under the ROC curve plotted on linear paper.

\[ P_2(C) = \text{Area under the ROC curve} \tag{10.1} \]

A general formulation for the probability of correct identification for the M-alternative symmetric forced choice situation in terms of the ROC curve can be readily obtained. Consider an observer with a given ROC curve which measures his ability to distinguish signal in noise, SN, from a background of noise alone, N. If this ROC curve is regular, then it corresponds to any one of a number of decision axes. Consider any such decision axis, and label it t. When the cause is N, denote the decision axis value by $t_N$; when the cause is SN, denote the random variable by $t_{SN}$.

Consider the observation upon which the subject bases his identification. It consists of M - 1 values of the decision axis due to N, and only one value resulting from SN. To maximize the number of correct identifications, the subject should choose the largest value of t as corresponding to the interval with the signal in it. The probability that he will be correct is the probability that the value of t due to SN, $t_{SN}$, is greater than all of the other observed decision axis values.

\[ P_M(C) = \text{Prob}\,(\text{all } M - 1 \text{ values } t_N < t_{SN}) \tag{10.2} \]
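Equation 10.2 can be illustrated by direct simulation. The sketch below is an assumption-laden example, not a procedure from this report: it takes a normal decision axis with unit variance, mean zero under N, and mean $\sqrt{d}$ under SN, draws M - 1 noise values and one signal value per trial, and scores a trial as correct when the signal value is the largest. The function name, trial count, and seed are illustrative. For M = 2 the estimate can be compared with the closed form $\Phi(\sqrt{d/2})$ for the normal ROC curve.

```python
import math
import random


def forced_choice_score(M, d, trials=100000, seed=1):
    """Monte Carlo estimate of P_M(C) = Prob(all M-1 values t_N < t_SN).

    Assumes a unit-variance normal decision axis with mean 0 under N and
    mean sqrt(d) under SN. A fixed seed keeps the estimate reproducible.
    """
    rng = random.Random(seed)
    shift = math.sqrt(d)
    correct = 0
    for _ in range(trials):
        t_sn = rng.gauss(shift, 1.0)
        # The identification is correct when every noise value falls below t_SN.
        if all(rng.gauss(0.0, 1.0) < t_sn for _ in range(M - 1)):
            correct += 1
    return correct / trials


def normal_cdf(x):
    """Normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


if __name__ == "__main__":
    d = 1.0
    print("simulated P2(C):", forced_choice_score(2, d))
    print("Phi(sqrt(d/2)): ", normal_cdf(math.sqrt(d / 2.0)))
    print("simulated P4(C):", forced_choice_score(4, d))
```

The simulated score falls as M increases at fixed d, since the signal value must then exceed more noise values.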

Assume for convenience that t has a probability density function under the condition SN; then $P_M(C)$ can be written in terms of an integral as follows. The probability of correct identification conditional to any specific value of $t_{SN}$ is the product of the probabilities that each of the $t_N$ is below the $t_{SN}$ value. This individual probability is the distribution function for t under condition N. For the symmetric presentation case, all $t_N$ variables have the same distribution. $P_M(C)$ is obtained by averaging over all possible $t_{SN}$ values.

$$P_M(C) = \int_{-\infty}^{\infty} F^{M-1}(t|N)\, f(t|SN)\, dt \tag{10.3}$$

It is convenient to convert this integral to the more general Stieltjes integral.

$$P_M(C) = \int_{-\infty}^{\infty} F^{M-1}(t|N)\, dF(t|SN) \tag{10.4}$$

No matter which of the decision axes we hypothesized for the observer, the distribution function under N is simply 1-X; the distribution function under SN is 1-Y.

$$P_M(C) = \int_0^1 (1-X)^{M-1}\, d(1-Y) \tag{10.5}$$

A change of variables from 1-Y to Y involves a minus sign which

is absorbed in reversing the limits of integration.

$$P_M(C) = \int_0^1 (1-X)^{M-1}\, dY \tag{10.6}$$

Setting M equal to two in Eq. 10.6 yields the two expressions

$$P_2(C) = \int_0^1 (1-X)\, dY = 1 - \int_0^1 X\, dY \tag{10.7}$$

Interpreting the integral as the area under the curve of the integrand,

$$P_2(C) = \text{Area under ROC curve} = 1 - \text{Area above ROC curve} \tag{10.8}$$

Another integral which is equivalent, since it also computes the area under the ROC curve, is

$$P_2(C) = \int_0^1 Y\, dX \tag{10.9}$$

Green's Theorem gives a strong, easily understood geometric relation between two psychophysical experiments. It also emphasizes how insensitive the forced choice score is to the detailed structure of the ROC curve. The two-alternative measures applied to the Luce ROC curve with vertex $(x_1, y_1)$, to the normal ROC curve with index d,

and to the exponential ROC curve with index A, are given in Eqs. 10.10, 10.11, and 10.13. An excellent approximation to the M-alternative probability for the normal ROC curve is given in Eq. 10.12 (Ref. 17).

$$\text{Luce, } (x_1, y_1): \quad P_2(C) = .5 + .5(y_1 - x_1) \tag{10.10}$$

$$\text{Normal, } d: \quad P_2(C) = \Phi\left(\sqrt{d/2}\right) \tag{10.11}$$

$$\text{Normal, } d: \quad P_M(C) \approx \Phi\left(a_M \sqrt{d} - b_M\right), \qquad \Phi(-b_M) = 1/M \tag{10.12}$$

$$\text{Exponential (Power), } A: \quad P_2(C) = \frac{A}{A+1} \tag{10.13}$$

Green's Theorem relates the probability of correct identification in the two-alternative test to the ROC curve. Every quantity related to the ROC curve is also related to the ROC character. The following theorem establishes the corresponding relationship between $P_2(C)$ and the ROC character.

Any real $L_2$ function has an autocorrelation function. The form chosen for the (unnormalized) autocorrelation function of the ROC character is

$$\varphi_\pi(\tau) = \int_{-\infty}^{\infty} \pi(z + .5\tau)\, \pi(z - .5\tau)\, dz \tag{10.14}$$

It will be shown that the two-alternative probability of correct identification is the area under the autocorrelation on the positive side of

zero, weighted with an exponential factor which emphasizes large values.

Theorem:

$$P_2(C) = \int_0^{\infty} e^{.5\tau}\, \varphi_\pi(\tau)\, d\tau \tag{10.15}$$

Proof: Equation 10.9 is the starting point. In order to carry out the integration, the explicit relation between corresponding values of X and Y is needed.

$$Y(\beta) = \int_{\beta}^{\infty} e^{.5z}\, \pi(z)\, dz, \qquad -dX(\beta) = e^{-.5\beta}\, \pi(\beta)\, d\beta \tag{10.16}$$

When these are formally utilized in Eq. 10.9, the minus sign on the differential for X disappears in the Jacobian of transformation.

$$P_2(C) = \int_{-\infty}^{\infty} \left[ \int_{\beta}^{\infty} e^{.5z}\, \pi(z)\, dz \right] e^{-.5\beta}\, \pi(\beta)\, d\beta \tag{10.17}$$

$$P_2(C) = \iint_{-\infty < \beta < z < \infty} e^{.5(z-\beta)}\, \pi(z)\, \pi(\beta)\, dz\, d\beta \tag{10.18}$$

The region of integration is over the half-plane of those values of z which are greater than $\beta$. Any alternative

description of the same half-plane can be used as well. A change of variables which singles out $z - \beta$ has been selected.

$$z = u + .5\tau, \quad \beta = u - .5\tau; \qquad u = .5(z + \beta), \quad \tau = z - \beta; \qquad |J| = 1 \tag{10.19}$$

The half-plane of integration is now described as the half-plane with positive $\tau$, and u ranging from $-\infty$ to $\infty$.

$$P_2(C) = \int_0^{\infty} e^{.5\tau} \left[ \int_{-\infty}^{\infty} \pi(u + .5\tau)\, \pi(u - .5\tau)\, du \right] d\tau \tag{10.20}$$

The integral over u is the autocorrelation function of the character.

$$P_2(C) = \int_0^{\infty} e^{.5\tau}\, \varphi_\pi(\tau)\, d\tau \tag{10.15}$$

The proof of the theorem is complete.

10.1.2 W. W. Peterson's Theorem. The basic relation between the moments of the likelihood ratio under the two distributions, N and SN, was first pointed out to this writer by W. W. Peterson.

Theorem:

$$E(\ell^n | N) = E(\ell^{n-1} | SN), \qquad n \ge 1 \tag{10.21}$$

This is a direct result of the fundamental theorem that the likelihood ratio of the likelihood ratio is the likelihood ratio. The proof of Eq. 10.21 can be accomplished in one line by writing down the integral definition of the n-th moment. Factoring out one $\ell$ term and associating it with the differential yields the desired (n-1)st moment under condition SN.

$$\int \ell^n\, dF(\ell|N) = \int \ell^{n-1}\, \ell\, dF(\ell|N) = \int \ell^{n-1}\, dF(\ell|SN)$$

This theorem leads to a basic understanding of some calculations on likelihood ratio used to obtain measures of ROC quality. The average value of likelihood ratio under noise is unity because the average value of the constant one under condition SN is unity. The average value of likelihood ratio under condition SN is the second moment under condition N.

$$E(\ell|N) = 1, \qquad E(\ell^2|N) = E(\ell|SN) \tag{10.22}$$

This means that the variance of the likelihood ratio under condition N is equal to the difference of the means under the two conditions.

$$\sigma^2(\ell|N) = E(\ell|SN) - 1 = E(\ell|SN) - E(\ell|N) \tag{10.23}$$
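The moment relation can be checked on the simplest possible case, an observer who reports only a binary decision, so that the likelihood ratio takes just two values. The sketch below is illustrative only: the operating point (x1, y1) and the function name are hypothetical choices made here.

```python
def luce_moment(n, x1, y1, condition):
    """n-th moment of the likelihood ratio for a binary-report observer.

    x1 = P("A" | N), y1 = P("A" | SN); condition is "N" or "SN".
    """
    l_a = y1 / x1                     # likelihood ratio of the report "A"
    l_b = (1.0 - y1) / (1.0 - x1)     # likelihood ratio of the report "B"
    if condition == "N":
        p_a = x1
    elif condition == "SN":
        p_a = y1
    else:
        raise ValueError("condition must be 'N' or 'SN'")
    return p_a * l_a ** n + (1.0 - p_a) * l_b ** n

def variance_under_noise(x1, y1):
    """Variance of the likelihood ratio under N, from its first two moments."""
    mean = luce_moment(1, x1, y1, "N")
    return luce_moment(2, x1, y1, "N") - mean ** 2
```

For any operating point with 0 < x1 < 1, `luce_moment(n, x1, y1, "N")` should equal `luce_moment(n - 1, x1, y1, "SN")`, and `variance_under_noise(x1, y1)` should equal the SN mean minus one, as in Eq. 10.23.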

The signal-to-noise ratio, snr, for a linear filter is defined as the square of the peak output shift caused by the presence of signal, compared to the average noise power. In similar fashion, many authors, including the present one, define the snr for any kind of device as the square of the shift in the expected value, divided by the variance. When applied to linear filters, for situations in which a signal is added to stationary noise, the definitions agree. In general detection theory, any receiver whose output is a fixed order-preserving transformation of likelihood ratio is considered a likelihood ratio receiver. Consider for the moment a receiver whose output is precisely the likelihood ratio of the input. The signal-to-noise ratio for this output is

$$(\text{snr})_\ell = \frac{[E(\ell|SN) - E(\ell|N)]^2}{\sigma^2(\ell|N)} \tag{10.24}$$

By virtue of Eq. 10.23, this signal-to-noise ratio is also equal to the quantities in Eq. 10.23. If this signal-to-noise ratio could be interpreted in the usual engineering fashion, and if this likelihood ratio is the one that leads to a terminal decision, then an order of magnitude of the quantities involved can be inferred. In most engineering applications, the signal-to-noise ratio desired is of the order of ten, or a hundred, or a thousand, or greater. If the variance of the likelihood ratio under N is any of these numbers, while the expected value is

fixed at unity, then the distribution of $\ell$ must be skewed to the right with a tremendous upper tail. A description based on only the first two moments is hardly an adequate description for such a statistic. Therefore internal consistency dictates that Eq. 10.24 be considered a valid description of ROC quality only when its value is smaller than one.

In an attempt to compensate for the highly skewed nature of the distribution of likelihood ratio for cases with large detection quality, W. W. Peterson presented the following argument. For a normal ROC curve, the measures of Eqs. 10.23 and 10.24 are all related to the index d in exponential manner.

$$\text{Normal ROC:} \quad E(\ell|SN) = e^d \tag{10.25}$$

The index d is therefore

$$d = \ln\left[1 + \sigma^2(\ell|N)\right] \tag{10.26}$$

When a signal is added to white Gaussian noise, and applied to the matched linear filter, the distribution of likelihood ratio is log normal, and d agrees with the accepted value of signal-to-noise ratio. He therefore suggested that if one wishes to base a quality statement on the first two moments of likelihood ratio, one should do so by computing the variance of $\ell$ under N and applying Eq. 10.26.

10.1.3 Comparison of Quality Measures. In this section four measures of detectability are applied to four types of ROC:

normal, rectangular character, two-line symmetric Luce character, and the exponential character. Kullback, Leibler, and Jeffreys (Ref. 31) have used a measure which they call "the divergence between hypotheses," or "the difference in mean information for decision per observation available from the hypothesis space." It is the difference between the means of the logarithm of the likelihood ratio.

$$J(1:0) = E(z|SN) - E(z|N) \tag{10.27}$$

The second measure of performance is used in many papers in the physical theory of detectability, where it is called the output signal-to-noise ratio of a decision device. It is the square of the difference of the means of the logarithm of the likelihood ratio, divided by the variance of the log likelihood ratio.

$$(\text{snr})_z = J^2(1:0) / \sigma^2(z|N) \tag{10.28}$$

This form is used when dealing with a decision axis which is known to be a linear transformation of the z variable, but in which the constants have been lost during the analysis manipulations. It is justified because the difference of the means removes any translational constant, and the division by the variance removes any multiplicative factor. A third measure of detectability was suggested by W. W. Peterson.

$$d_{WWP} = \ln\left(1 + \sigma^2(\ell|N)\right) = \ln E(\ell^2|N) \tag{10.29}$$

The fourth measure of detectability is the variance of the logarithm of the likelihood ratio taken under either hypothesis.

$$\sigma^2(z|N) \quad \text{or} \quad \sigma^2(z|SN) \tag{10.30}$$

The four ROC characters to be considered are:

$$\text{Normal, } d > 0: \quad \pi_N(z) = e^{-d/8}\, \frac{1}{\sqrt{2\pi d}}\, e^{-z^2/2d}$$

$$\text{Rectangular, } R > 1: \quad \pi_R(z) = \frac{\sqrt{R}}{2(R-1)}, \quad |z| < \ln R$$

$$\text{Symmetric Luce, } a > 0: \quad \pi_L(z = \pm 2a) = \left(e^{a} + e^{-a}\right)^{-1}$$

$$\text{Exponential, } A > 1: \quad \pi_E(z) = \frac{\sqrt{A}}{A-1}\, \exp\left[-\frac{A+1}{A-1} \cdot \frac{z + \ln A}{2}\right], \quad z > -\ln A$$

Table 10.1 lists expected values and exact equations for the four measures of detectability. All four measures of detectability are the same for the normal ROC curve. The four measures of detectability are not the same for the other types of ROC curves. Let us examine the approximate value of these measures at the two extremes: when the detectability is very low, and when the detectability is high.

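Item | Normal | Rectangular | Sym Luce | Exponential
E(z|N) | $-.5d$ | $-\left[\frac{R+1}{R-1}\ln R - 2\right]$ | $-2a \tanh a$ | $\frac{A-1}{A} - \ln A$
E(z^2|SN) | $d + .25d^2$ | $\ln^2 R + 8 - 4\frac{R+1}{R-1}\ln R$ | $4a^2$ | $(A-1)^2 + (A - 1 - \ln A)^2$
E(\ell^2|N) | $e^d$ | $\frac{1}{3R}\left[R^2 + R + 1\right]$ | $2\cosh 2a - 1$ | $\frac{1}{A(2-A)}$, $A < 2$
J(1:0) | $d$ | $2\frac{R+1}{R-1}\ln R - 4$ | $4a \tanh a$ | $(A-1)^2/A$
\sigma^2(z|SN) | $d$ | $4 - \frac{4R \ln^2 R}{(R-1)^2}$ | $4a^2\, \text{sech}^2 a$ | $(A-1)^2$
(snr)_z | $d$ | $\frac{[(R+1)\ln R - 2(R-1)]^2}{(R-1)^2 - R\ln^2 R}$ | $4\sinh^2 a$ | $(A-1)^2$
d_WWP | $d$ | $\ln[1 + R + R^2] - \ln 3R$ | $\ln[2\cosh 2a - 1]$ | $-\ln[A(2-A)]$, $A < 2$

Table 10.1. Exact formulae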
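The symmetric Luce entries can be cross-checked from first principles, since the log likelihood ratio takes only the two values +2a and -2a. The sketch below is an illustration under that assumption; the function name and the returned dictionary keys are hypothetical. It computes the four measures directly from the two-point distributions under N and SN, so the results can be compared with the closed forms 4a tanh a, 4 sinh^2 a, ln(2 cosh 2a - 1), and 4a^2 sech^2 a.

```python
import math

def luce_measures(a):
    """Four detectability measures for a symmetric Luce observer (z = +2a or -2a)."""
    y1 = math.exp(a) / (math.exp(a) + math.exp(-a))   # P(z = +2a | SN)
    x1 = 1.0 - y1                                     # P(z = +2a | N), by symmetry
    z_values = (2.0 * a, -2.0 * a)
    p_sn = (y1, 1.0 - y1)
    p_n = (x1, 1.0 - x1)

    mean_sn = sum(p * z for p, z in zip(p_sn, z_values))
    mean_n = sum(p * z for p, z in zip(p_n, z_values))
    var_sn = sum(p * (z - mean_sn) ** 2 for p, z in zip(p_sn, z_values))
    var_n = sum(p * (z - mean_n) ** 2 for p, z in zip(p_n, z_values))

    divergence = mean_sn - mean_n                     # J(1:0), Eq. 10.27
    snr = divergence ** 2 / var_n                     # Eq. 10.28
    # Second moment of the likelihood ratio l = exp(z) under N, Eq. 10.29
    second_moment = sum(p * math.exp(2.0 * z) for p, z in zip(p_n, z_values))
    d_wwp = math.log(second_moment)
    return {"J": divergence, "snr": snr, "d_wwp": d_wwp, "var": var_sn}
```

Evaluating `luce_measures(a)` for a small value such as a = 0.05 and a large value such as a = 3 shows the four numbers nearly coinciding in the first case and separating widely in the second.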

Character | Index | Limiting Performance
Rectangular | $R \to 1$ | $\frac{1}{3}(\ln R)^2$
Sym Luce | $a \to 0$ | $4a^2$
Exponential | $A \to 1$ | $(A-1)^2$

Table 10.2. Low detectability limiting formulae

All four measures differ in the way the limits are approached. However, for small signals all four measures agree. If one's interest in detectability is limited to a problem in which repeated, independent and similar observations will be taken, then any one of these four measures of individual detectability will be appropriate for determining the ultimate detectability. In predetermined observation procedures or in sequential observation procedures, the logarithm of the likelihood ratio is the sum of the individual z values. The resultant ROC curve for fixed size decision procedures will be approximately normal, with index d equal to the sum of the individual detection "d" values. For example, one may validly conclude that if a final decision is based on n individual decisions, each decision yielding an observable output with log likelihood ratio either +2a or -2a, then the final ROC will be nearly normal and the detectability index will be $d = 4na^2$.

We have demonstrated that each of these four measures applies to the normal ROC curve independent of the level of detectability,

and to other types of ROC curve (at least, for a small class of ROC curves) at low levels of detectability. If one could similarly conclude that for asymptotically large levels of detectability all four measures were nearly the same, one would have great confidence in using any of them. However, for each character these measures approach different limits and approach them at different rates.

Item | Rectangular | Sym Luce | Exponential
J(1:0) | $2\ln R - 4$ | $4a$ | $A$
\sigma^2(z|SN) | $4 - \frac{4\ln^2 R}{R}$ | $16a^2 e^{-2a}$ | $A^2$
(snr)_z | $\ln^2 R$ | $e^{2a}$ | $A^2$
d_WWP | $\ln((1+R)/3)$ | $2a$ | $\infty$ for $A \ge 2$

Table 10.3. High detectability limiting formulae

In summary, several measures of detectability have been examined, and have been shown to lead to common numerical evaluations when the detectability is low. The normal ROC curve is unique in that all four measures of detectability yield the same value, d, at all levels of detectability. When the ROC curves near perfection, the four measures of detectability become drastically different. Not only do the measures disagree for the same type of character, but the behavior of each measure is different for different characters.
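The agreement at low detectability and the divergence at high detectability can be seen numerically for the exponential character. The sketch below is a hedged illustration: it assumes the power-law ROC X = Y^A, for which z minus its lower limit is exponentially distributed with mean A-1 under SN and mean (A-1)/A under N. The function name and dictionary keys are hypothetical, and the range is restricted to 1 < A < 2 so that all four measures are finite.

```python
import math

def exponential_measures(a_index):
    """Four detectability measures for the exponential (power) ROC, 1 < A < 2."""
    if not 1.0 < a_index < 2.0:
        raise ValueError("all four measures are finite only for 1 < A < 2")
    mean_sn = a_index - 1.0               # E(z - z0 | SN), exponential law
    mean_n = (a_index - 1.0) / a_index    # E(z - z0 | N), exponential law
    var_sn = mean_sn ** 2                 # variance of an exponential = mean squared
    var_n = mean_n ** 2
    divergence = mean_sn - mean_n         # J(1:0); the offset z0 cancels
    snr = divergence ** 2 / var_n         # (snr)_z of Eq. 10.28
    d_wwp = -math.log(a_index * (2.0 - a_index))   # ln E(l^2 | N)
    return {"J": divergence, "snr": snr, "d_wwp": d_wwp, "var_sn": var_sn}
```

For A = 1.02 every entry of `exponential_measures(1.02)` is close to (A-1)^2 = 0.0004, while for A = 1.9 the value of d_WWP is already several times the divergence J.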

Anyone using a single number characterization for the ROC curve level without mention of the form of the ROC character must be very careful to know the use that will be made of his one number in order to have an appropriate measure of detectability.

10.2 A Special Case of Type-II ROC

In certain psychophysical experiments two quality measures have been obtained simultaneously, one of which is an ROC curve (Ref. 14). The first such experiment known to this writer is an experiment of Egan's for articulation score testing (Ref. 32). A word was randomly selected from a known list of fifty words, mixed with noise, and presented to a subject. The subject responded to each utterance by selecting one of the words from the known set as the most likely to have been transmitted. The novel part of the experiment was that the subject was asked to score himself; that is, to indicate which responses he felt were right and which responses he felt were likely to be wrong. An ROC curve was then obtained by plotting on the horizontal axis the proportion of words accepted as correct which were in fact wrong, and on the vertical axis the proportion of words accepted as correct which were indeed correct. This was a measure of the listener's ability to detect his own correct responses in a background of incorrect responses. This ROC curve is a "type-two" ROC curve, the distinction being made that the subject was indicating something about his first response. The first response was measured with a percent

correct or articulation score. In this section a somewhat simpler experiment, leading to two responses from the device or subject, is analyzed. Both responses will be evaluated on ROC curves.

Consider any two-alternative single-presentation experiment as the primary experiment. As always, it is not important in ROC analysis that one causal hypothesis be background noise and the other a signal introduced into the noise; any two causes may be used as long as there is a correct response "A" corresponding to the cause SN, and an alternative response "B" corresponding to the second cause, N. Quantities for the first detection experiment will carry the subscript one throughout; that is, the ROC curve will be $Y_1$ as a function of $X_1$, with log likelihood ratio $z_1$, and ROC character $\pi_1(z)$. The second response, leading to a type-two ROC curve, will be a judgment on the part of the observer as to whether the first response was correct or incorrect. That is, the observer is asked to place his responses into two categories, the high category indicating those quite likely to be right, and the low category being those which are less likely to be right.

What is the observation on which the subject bases this second response? It is exactly the same observation on which he based his first response. The observation itself is not different; the type of response is different. The relevant probability densities will also be different. The only feature of the observation that one need consider is the log likelihood ratio $z_1$. Although this is the logarithm of the

likelihood ratio for the first decision, it may help to de-emphasize this nature of the observation; therefore replace $z_1$ by the simple letter "t" as the observation for the second decision. The type-two ROC curve conditions are right, R, or wrong, W, corresponding to whether the first decision was right or wrong. The two ways the subject can be right are detection, A·SN, and correct rejection, B·N. Two type-two ROC curves are to be developed, the first conditional to the response A, and the second conditional to the response B. This is done for three reasons: one, the subject knows the response; two, the experimenter or grader knows the response; and three, these conditions may be significantly different.

In order to develop type-two ROC curves a specific point must be picked as the operating point on the type-one ROC curve. Let $\beta$ designate a cut in the $z_1$ axis, and let the corresponding ROC point be $(X_1, Y_1)$. The type-two ROC curve conditional to the response "A" is relevant whenever the observation t is greater than the cut value $\beta$. The density function for t, conditional to being right, is the probability density of t, given the joint condition A·SN. Care is taken here to indicate the conditions, so that the proper normalization will be included.

$$t > \beta: \quad f(t|R) = f(t|A, SN) = f(t, A|SN)/P(A|SN) = e^{+.5t}\, \pi_1(t)/Y_1 \tag{10.31}$$

In similar fashion, the probability density for t, given that the response was wrong, is the probability density for t, conditional to the joint condition A·N.

$$t > \beta: \quad f(t|W) = f(t|A, N) = f(t, A|N)/P(A|N) = e^{-.5t}\, \pi_1(t)/X_1 \tag{10.32}$$

The likelihood ratio relevant for the second type of response is

$$\ell_2(t) = f(t|R)/f(t|W) \tag{10.33}$$

Using (10.31) and (10.32), Eq. 10.33 becomes

$$t > \beta: \quad z_2 = t + \ln(X_1/Y_1) \tag{10.34}$$

The ROC character is

$$\pi_2(z|A) = \frac{1}{\sqrt{X_1 Y_1}}\, \pi_1\left(z + \ln\frac{Y_1}{X_1}\right), \qquad \beta + \ln\frac{X_1}{Y_1} < z < \max z_1 + \ln\frac{X_1}{Y_1} \tag{10.35}$$
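The shift of the log likelihood ratio by ln(X1/Y1) can be illustrated with a normal type-one observer. This is a minimal sketch under stated assumptions: the type-one log likelihood ratio t is taken as normal with variance d and means -d/2 under N and +d/2 under SN, and the function names are hypothetical. The code forms the two conditional densities for response "A" directly and returns the logarithm of their ratio.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type_two_log_lr(t, beta, d):
    """Log likelihood ratio (right vs. wrong) of t, given first response "A".

    Returns (z2, X1, Y1), where (X1, Y1) is the type-one operating point
    produced by the cut beta.
    """
    if t <= beta:
        raise ValueError('response "A" requires t > beta')
    s = math.sqrt(d)
    x1 = 1.0 - norm_cdf((beta + 0.5 * d) / s)   # P(A | N)
    y1 = 1.0 - norm_cdf((beta - 0.5 * d) / s)   # P(A | SN)
    # Log densities of t up to a common additive constant, which cancels
    log_f_sn = -((t - 0.5 * d) ** 2) / (2.0 * d)
    log_f_n = -((t + 0.5 * d) ** 2) / (2.0 * d)
    # ln f(t|R) - ln f(t|W), with the normalizations P(A|SN) and P(A|N)
    z2 = (log_f_sn - math.log(y1)) - (log_f_n - math.log(x1))
    return z2, x1, y1
```

For any t above the cut, the returned z2 should equal t + ln(X1/Y1); because X1 < Y1 for a useful observer, the type-two log likelihood ratio is always smaller than t.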

The type-two ROC curve conditional to the response "A" is the metastatic ROC curve derived from the type-one ROC considered from the point (0, 0) to the point $(X_1, Y_1)$.

The type-two ROC curve conditional to the response "B" is similarly derived by considering t values less than $\beta$. The probability density for t conditional to a wrong response is the probability density for t, conditional to the joint occurrence of the response B and the condition SN.

$$t < \beta: \quad f(t|W) = f(t|B, SN) = f(t, B|SN)/P(B|SN) = e^{+.5t}\, \pi_1(t)/(1 - Y_1) \tag{10.36}$$

Similarly, the probability density of t given the right response is

$$t < \beta: \quad f(t|R) = f(t|B, N) = f(t, B|N)/P(B|N) = e^{-.5t}\, \pi_1(t)/(1 - X_1) \tag{10.37}$$

When the observable t is less than $\beta$ the relevant likelihood ratio is

$$\ell_2(t) = f(t|R)/f(t|W) \tag{10.33}$$

The value of $z_2$ and character are

$$t < \beta: \quad z_2 = -t + \ln\frac{1 - Y_1}{1 - X_1} \tag{10.38}$$

$$\pi_2(z|B) = \frac{1}{\sqrt{(1 - X_1)(1 - Y_1)}}\, \pi_1\left(-z - \ln\frac{1 - X_1}{1 - Y_1}\right), \qquad -\beta + \ln\frac{1 - Y_1}{1 - X_1} < z < -\min z_1 + \ln\frac{1 - Y_1}{1 - X_1} \tag{10.39}$$

This type-two ROC curve is the metastatic ROC curve obtained from the type-one ROC by considering the arc from $(X_1, Y_1)$ to the extreme (1, 1) point.

These two type-two ROC characters are both obtained from the type-one ROC character. Graphically, the type-one character is cut apart at the point corresponding to the operating point $\beta$, and the left portion is reflected about the cut point; that is, the $-\infty$ tail is swung around toward $+\infty$. The appropriate horizontal translations and vertical magnifications are applied to yield a legitimate pair of ROC characters.

It appears that three conditions are both necessary and sufficient to make the two resulting type-two ROC characters equal. First, the magnifications will have to be equal, so the original operating point must fall on the negative diagonal. Second, the domains of z must be the same, so (utilizing the first point that the operating point is on the negative diagonal) the cut point $\beta$ must be zero. Third, and most

important, the original ROC character must be symmetric in order that the mirror image $\pi_2(z|B)$ be equal to $\pi_2(z|A)$.

For many reasons an analyst may wish to lump the two conditional type-two ROC curves together as a single curve. This may be done formally by simply averaging the type-two ROC characters. These two characters may be averaged if and only if the corresponding cuts in the type-two responses are not conditional to the type-one response. That is, in reaching a type-two decision the observer must cut the z-axis at a single point, and not use one cut conditional to the A response and a second cut conditional to the B response. The ROC character is zero outside of the ranges displayed in Eqs. 10.35 and 10.39. Using p and q to indicate p = P(SN) and q = P(N), the a priori probabilities for the type-one conditions, the lumped data type-two ROC curve will have character

$$\pi_2(z) = P(A)\, \pi_2(z|A) + P(B)\, \pi_2(z|B) = \frac{pY_1 + qX_1}{\sqrt{X_1 Y_1}}\, \pi_1\left(z + \ln\frac{Y_1}{X_1}\right) + \frac{1 - pY_1 - qX_1}{\sqrt{(1 - Y_1)(1 - X_1)}}\, \pi_1\left(-z + \ln\frac{1 - Y_1}{1 - X_1}\right) \tag{10.40}$$

For a symmetric case, where the type-one character is symmetric and the observer has selected the symmetric cut, $\beta = 0$, falling on the negative diagonal, the type-two ROC curve is

$$\pi_2(z) = \frac{1}{\sqrt{X_1 Y_1}}\, \pi_1\left(z + \ln\frac{Y_1}{X_1}\right), \qquad z > -\ln\frac{Y_1}{X_1} \tag{10.41}$$

The experimental procedure to determine whether a specific decision device operates according to this optimum theory would require the responding device (subject) to operate at a fixed operating point for the type-one decision. Either a number of fixed operating points, or a rating scale, may be used to obtain the type-two ROC curves.

10.3 The Multiple Observer

The title, "Multiple Observer," comes from psychophysics. It refers to the situation where the final decision maker is aided by several subordinate decision makers. In the context used in this present work, the multiple observer has been generalized to include situations in which the subordinate processors may or may not reach decisions themselves. There is no essential restriction to situations involving human observers. An electronic system with inputs from several types of sensors is a multiple observer.

Formally, the multiple observer is considered as a terminal decision maker whose input observation consists of a vector V containing the outputs of the n subordinate observers. The emphasis in multiple observer studies is usually on those situations for which the outputs of the subordinate observers are of different nature. However, all outputs are influenced simultaneously by the presence or absence of signal. The terminal decision maker may also be one

of his own subordinate observers. Such situations are prevalent in our everyday life. A person who must reach a decision as to whether it will rain this afternoon or not may listen to a weather forecast, talk to his friends, and observe the sky himself. However, he must reach the final decision "A", to act as if there will be no rain, or "B", to act as if there will be bad weather. He takes his own observation into account as well as those of his friends and weather specialists. He must process his own visual information in the appropriate manner, and combine that processed output appropriately with the information from his other sources.

We shall make two assumptions: first, that all subordinate observers are operating under the same condition, N or SN; second, that inputs to the final decision maker are statistically independent. Likelihood ratio theory yields what the proper processing by the final decision maker should be, and what his ROC character is. He should compute the logarithm of the likelihood ratio for each component of his input, and add these to obtain the logarithm of the likelihood ratio of his total observation. His ROC character is the convolution of the characters of the subordinate observers.

$$V = (t_1, t_2, \ldots, t_n), \qquad z(V) = \sum_{k=1}^{n} z_k(t_k), \qquad \pi = \pi_1 * \pi_2 * \cdots * \pi_n \tag{10.42}$$
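For subordinate observers who each report only a binary decision, the convolution reduces to a finite sum and can be sketched in a few lines. The following is an illustration under stated assumptions: each discrete character is stored as a mapping from z value to weight, with the weight of a report taken as the geometric mean of its probabilities under N and SN (so that multiplying by exp(+.5z) or exp(-.5z) recovers the SN or N probability, as in Eqs. 10.31 and 10.32). The function names and operating points are hypothetical.

```python
import math
from collections import defaultdict

def luce_character(x1, y1):
    """Discrete ROC character {z: weight} of a binary-report observer."""
    character = defaultdict(float)
    character[math.log(y1 / x1)] += math.sqrt(x1 * y1)                          # report "A"
    character[math.log((1 - y1) / (1 - x1))] += math.sqrt((1 - x1) * (1 - y1))  # report "B"
    return dict(character)

def convolve(char_a, char_b):
    """Character of two independent observers: z values add, weights multiply."""
    out = defaultdict(float)
    for za, wa in char_a.items():
        for zb, wb in char_b.items():
            out[round(za + zb, 12)] += wa * wb   # rounding merges equal sums
    return dict(out)

def roc_point(character, cut):
    """ROC point (X, Y) of the terminal decision 'respond A when z > cut'."""
    x = sum(w * math.exp(-0.5 * z) for z, w in character.items() if z > cut)
    y = sum(w * math.exp(+0.5 * z) for z, w in character.items() if z > cut)
    return x, y
```

A call such as `roc_point(convolve(luce_character(0.19, 0.81), luce_character(0.31, 0.69)), cut)` steps through the vertices of the resulting polygon ROC as `cut` is lowered from the largest z value to minus infinity.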

The likelihood ratio of a decision can be considered as easily as the likelihood ratio of any other kind of observable. If the k-th subordinate observer makes the decision "A", and his ROC operating point $(X_k, Y_k)$ is known, then the likelihood ratio of the observation is the ratio of $Y_k$ to $X_k$. This is the ratio of the likelihood of the decision "A" under the condition SN to the likelihood of the decision "A" under the condition N. Therefore, each subordinate observer who is a decision maker will behave as if his ROC is a Luce ROC.

$$t_k \in \{\text{"A"}, \text{"B"}\}: \qquad z_k(\text{"A"}) = \ln(Y_k/X_k), \qquad z_k(\text{"B"}) = \ln\left((1 - Y_k)/(1 - X_k)\right) \tag{10.43}$$

No matter how different the types of observations the subordinate observers make, and no matter how different their ROC curves and styles of reporting are, the formal description of the multiple observer treats each of the subordinate observers similarly. That is, the terminal decision maker calculates the log likelihood ratio of the subordinate's report to know how to process his information, and considers the ROC character of the subordinate in order to determine his own ROC character.

While the problem of the final decision maker is fairly straightforward, the problem of the subordinate observer who must report a decision is much more difficult. The overall optimum behavior for the subordinate observer is to report his decision value,

his z value, to his superior. (This is obviously optimum since the superior could do no worse than process this decision value in the same way that the subordinate observer does.) The problem of selecting the optimum operating point, the cut value on the z axis, for the subordinate observer forced to make a decision is unsolved except for a few special cases. Several workers have suggested that the cut be chosen to maximize the Shannon information measure. Certainly further research is necessary to arrive at a completely satisfactory technique. It is hoped that the description of the final decision maker's performance in terms of his ROC character may provide a new approach; the subordinate observer may set his goal as the choice of the Luce ROC character which, convolved with the other ROC characters, will yield the best final character.

An observer who reports a continuous decision axis is called a continuous observer, and one who reports a discrete number of decision values is called a discrete observer. The phrase "Luce observer" is reserved for a subordinate observer who reports a binary decision. The resulting ROC character for situations involving continuous observers, or discrete observers who report many distinct values, is best handled by direct application of convolution.

In this section two examples will be presented. The first is a combination of one normal and one Luce observer; this is treated numerically. The second example is of one exponential observer and one Luce observer; it is possible to treat this situation analytically.
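For the first of these examples, the terminal ROC curve follows directly from adding log likelihood ratios (Eq. 10.42) with the Luce values of Eq. 10.43. The sketch below is illustrative and rests on stated assumptions: the normal observer's log likelihood ratio is taken as normal with variance d and means -d/2 under N and +d/2 under SN, and the function names are hypothetical. Conditional on the Luce report, the terminal decision maker responds "A" when the normal observer's value exceeds the cut minus the Luce contribution.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_plus_luce_roc(z, d, x1, y1):
    """ROC point (X, Y) of the terminal decision maker at cut z.

    d is the index of the normal observer; (x1, y1) is the operating
    point of the Luce (binary-report) observer.
    """
    s = math.sqrt(d)
    z_a = math.log(y1 / x1)                      # Luce report "A"
    z_b = math.log((1.0 - y1) / (1.0 - x1))      # Luce report "B"

    def x_normal(cut):
        return norm_cdf(-0.5 * s - cut / s)      # P(z_N > cut | N)

    def y_normal(cut):
        return norm_cdf(+0.5 * s - cut / s)      # P(z_N > cut | SN)

    # Mix over the Luce report, whose probabilities depend on the cause
    x = x1 * x_normal(z - z_a) + (1.0 - x1) * x_normal(z - z_b)
    y = y1 * y_normal(z - z_a) + (1.0 - y1) * y_normal(z - z_b)
    return x, y
```

With d = 1 and the Luce observer at (.19, .81), evaluating the cut z = 0 gives a point on the negative diagonal that lies only slightly above the Luce operating point; sweeping z traces the rest of the combined curve.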

Example: Normal Plus Luce. The normal ROC character, and the ROC curve parameterized by the variable z, are given in Eq. 10.44.

$$\pi_N(z) = e^{-d/8}\, \frac{1}{\sqrt{2\pi d}}\, e^{-z^2/2d}, \qquad X_N(z) = \Phi\left(-.5\sqrt{d} - z/\sqrt{d}\right), \qquad Y_N(z) = \Phi\left(+.5\sqrt{d} - z/\sqrt{d}\right) \tag{10.44}$$

The possible z values and the corresponding discrete character values for a Luce observer are

$$z_A = \ln(y_1/x_1), \quad w(z_A) = \sqrt{x_1 y_1}; \qquad z_B = \ln\left((1 - y_1)/(1 - x_1)\right), \quad w(z_B) = \sqrt{(1 - x_1)(1 - y_1)} \tag{10.45}$$

The convolution of these ROC characters is simple because only two possible z values occur with the Luce observer. The convolution is therefore the sum of two products.

$$\pi(z) = w(z_A)\, \pi_N(z - z_A) + w(z_B)\, \pi_N(z - z_B) \tag{10.46}$$

One can obtain the ROC curve by one of two routes. The formal method is to multiply the ROC character by $e^{\pm .5z}$ and to integrate from some point to infinity. The other is to verbalize the equation as follows: whenever the Luce observer reports "A" and the terminal

observer's cut value is z, the decision maker will respond "A" whenever the normal observer gives a $z_N$ value greater than $z - z_A$. Conversely, whenever the Luce observer reports "B", the normal observer must give a much stronger indication, namely a value greater than $z - z_B$, to generate a terminal "A" decision. Either route yields precisely the same equations.

$$Y(z) = y_1\, \Phi\left(+.5\sqrt{d} + z_A/\sqrt{d} - z/\sqrt{d}\right) + (1 - y_1)\, \Phi\left(+.5\sqrt{d} + z_B/\sqrt{d} - z/\sqrt{d}\right)$$

$$X(z) = x_1\, \Phi\left(-.5\sqrt{d} + z_A/\sqrt{d} - z/\sqrt{d}\right) + (1 - x_1)\, \Phi\left(-.5\sqrt{d} + z_B/\sqrt{d} - z/\sqrt{d}\right) \tag{10.47}$$

When the Luce observer is operating on the negative diagonal, the ROC curve and the ROC character are symmetric. This allows a slight simplification of the ROC equations and some further analytic work.

$$\text{Symmetric Luce:} \quad x_1 = 1 - y_1, \qquad z_B = -z_A \tag{10.48}$$

Whenever the normal observer is sufficiently superior to the Luce observer, the ROC character of Eq. 10.46 will have a single maximum at zero. Under the converse situation, when the Luce observer is sufficiently superior to the normal observer, the ROC character will be bimodal. The precise definition of "superior" depends on whether $z_A$ is smaller or larger than the square root of d. When the Luce observer is superior, the twin modes of the ROC character

are given by the solution of Eq. 10.49.

$$z_A > \sqrt{d}: \qquad z = z_A \tanh\left(\frac{z\, z_A}{d}\right) \tag{10.49}$$

This is demonstrated in Fig. 10.1, where the ROC characters for the final decision maker are plotted on semi-log paper. The continuous normal observer had an index of d = 1. Four Luce observers were considered. The characteristics of these four are listed in Table 10.4.

z_A | 0 | 1.00 | 1.45 | 2.95
X_1 | 0.50 | 0.31 | 0.19 | 0.05
Y_1 | 0.50 | 0.69 | 0.81 | 0.95

Table 10.4. Luce $z_A$ for selected points

When the Luce observer operates at the (.19, .81) point, the ROC character is bimodal, but on semi-log paper it appears to be fairly flat topped. The ROC curves for this situation are plotted in Fig. 10.2. The Luce ROC is dotted, since the only point used is the single operating point. The ROC curve for this observer before he was forced to take a fixed position may have been anything going through the point (.19, .81). In the neighborhood of the negative diagonal the combined Luce and normal observer is only slightly better than the Luce operating point. However, the combination is definitely better than either away from the Luce operating point. For

[Fig. 10.1. ROC characters for four multiple observers. Continuous observer: normal, d = 1.00. Discrete observer: symmetric Luce, $z = \pm z_A$. Semi-log plot of $\pi(z)$ against z from -6 to 6.]

[Fig. 10.2. Multiple observer ROC curve. Axes: x (%) horizontal, y (%) vertical.]

comparison the normal character for the curve that would pass through the Luce operating point is drawn together with the multiple observer ROC character in Fig. 10.3(a). That part of the character associated with the ROC region appearing on one percent to ninety-nine percent normal-normal paper is also indicated in Fig. 10.3(a).

When the Luce observer is considerably better than his co-worker, the normal observer, the ROC character is strongly bimodal. This is best seen in Fig. 10.3(b), plotted on linear paper. The shape strongly suggests its origin as the sum of two displaced normal density functions. When the ROC curve is calculated, the region of the character affecting the part displayed on usual normal-normal paper is basically the part between the two modes. That is the concave section of the character, and it is much more like the extreme hyperbolas considered in Section 5.2 than like the normal character. The corresponding ROC curve is plotted in Fig. 10.4. This ROC displays much more of a "corner" than has been displayed before, except for the Luce ROC. This occurs because of the relative minimum of the ROC character, which places very little probability on those z values near zero. As a result the slope drops from five (z = 1.6) to one (z = 0) with only a 0.01 change in false alarm probability.

If this Luce observer represents the operating point on a normal ROC curve, the index would be $\sqrt{d} = 3.29$. If a continuous normal character with this index is convolved with the d = 1 continuous normal character, the result would be a normal ROC with index

[Fig. 10.3. Two multiple observer ROC characters. Panels (a) and (b); the points X = .01 and Y = .99 are marked on the characters; horizontal axis z from -7 to 7.]

[Fig. 10.4. ROC curve plot with the Luce (.05, .95) operating point marked; axes in percent.]

$\sqrt{d} = 3.44$. The d = 1 observer is completely overshadowed by the better observer. However, when the better observer reports only a decision and not a continuous measure, the normal observer with d = 1 becomes important in improving the ROC curve away from the better observer's operating point.

Example: Exponential Plus Luce. The following work is restricted to considering a symmetric Luce character. Similar analysis can be performed with a nonsymmetric Luce character, but the generality seems to obscure the nature of the result. The exponential character and corresponding ROC curve are

$$\pi_E(z) = \frac{\sqrt{A}}{A-1}\, \exp\left[-\frac{A+1}{A-1} \cdot \frac{z - z_0}{2}\right], \qquad z > z_0 = -\ln A$$

$$Y_E(z) = \exp\left[-\frac{z - z_0}{A-1}\right], \qquad X_E(z) = \exp\left[-\frac{A(z - z_0)}{A-1}\right] \tag{10.50}$$

The terminal observer's character is formed by convolution.

$$\pi(z) = w(z_A)\left[\pi_E(z - z_A) + \pi_E(z + z_A)\right] \tag{10.51}$$

Each of the exponential terms in this convolution can be simplified. Using the double-sign notation, these are considered simultaneously. Each is written explicitly in the exponential form of Eq. 10.50, and a common exponential is factored from each.

$$\pi_E(z \mp z_A) = \begin{cases} e^{\pm\frac{A+1}{A-1}\,\frac{z_A}{2}}\;\pi_E(z), & z \ge z_0 \pm z_A \\ 0, & z < z_0 \pm z_A \end{cases} \tag{10.52}$$

Care must be taken to keep track of the lower limit of each of these equations, since they differ. Rewrite the ROC character as Eq. 10.53.

$$\pi(z) = \begin{cases} \pi_L(z_A)\left[e^{\frac{A+1}{A-1}\,\frac{z_A}{2}} + e^{-\frac{A+1}{A-1}\,\frac{z_A}{2}}\right]\pi_E(z), & z \ge z_0 + z_A \\[4pt] \pi_L(z_A)\; e^{-\frac{A+1}{A-1}\,\frac{z_A}{2}}\;\pi_E(z), & z_0 - z_A \le z < z_0 + z_A \\[4pt] 0, & z < z_0 - z_A \end{cases} \tag{10.53}$$

The important feature of this ROC character is that in each of the two active regions, the character is proportional to the original exponential character. The coefficients of proportionality depend on the quality of the Luce observer and the index for the exponential observer, as do the ranges. However, they are constants as far as the ROC character is concerned. Let us emphasize this constant nature by using the two symbols, B and C.
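The proportionality just described can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes the symmetric Luce character is a pair of equal atoms at $\pm z_A$ with weight $.5/\cosh(.5 z_A)$, and the function names and the values A = 2.5, z_A = 2.95 are chosen here only for illustration.

```python
import math

def pi_E(z, A):
    """Exponential (power-law) ROC character; zero below z0 = -ln A."""
    z0 = -math.log(A)
    if z < z0:
        return 0.0
    return math.sqrt(A) / (A - 1.0) * math.exp(-(A + 1.0) / (A - 1.0) * (z - z0) / 2.0)

def luce_weight(zA):
    """Assumed weight of each atom of a symmetric Luce character at +/- zA."""
    return 0.5 / math.cosh(0.5 * zA)

def pi_terminal(z, A, zA):
    """Two-atom Luce character convolved with the exponential character."""
    return luce_weight(zA) * (pi_E(z - zA, A) + pi_E(z + zA, A))

A, zA = 2.5, 2.95          # illustrative values
z0 = -math.log(A)

# Upper region z >= z0 + zA: ratio to pi_E(z) should be one constant (B).
upper = [pi_terminal(z, A, zA) / pi_E(z, A)
         for z in (z0 + zA + 0.1, z0 + zA + 1.0, z0 + zA + 4.0)]

# Middle region |z - z0| < zA: ratio to pi_E(z + zA) should be one constant (C).
middle = [pi_terminal(z, A, zA) / pi_E(z + zA, A)
          for z in (z0 - zA + 0.1, z0, z0 + zA - 0.1)]

# Closed form for B as a ratio of hyperbolic cosines.
B_closed = math.cosh((A + 1.0) / (A - 1.0) * zA / 2.0) / math.cosh(zA / 2.0)

print("upper ratios :", upper)
print("middle ratios:", middle)
print("B (closed form):", B_closed, " false alarm factor B**(1-A):", B_closed ** (1.0 - A))
```

Under these assumptions the upper-region ratios coincide with the hyperbolic cosine ratio and the middle-region ratios coincide with the Luce weight, which is the sense in which the two coefficients are constants.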

$$\pi(z) = \begin{cases} B\,\pi_E(z), & z - z_0 \ge z_A \\ C\,\pi_E(z + z_A), & |z - z_0| < z_A \end{cases} \tag{10.54}$$

Integrate Eq. 10.54 formally.

$$X(z) = B\,X_E(z), \qquad Y(z) = B\,Y_E(z), \qquad z \ge z_0 + z_A \tag{10.55}$$

Use Eq. 10.55, and the power law relation on the original exponential ROC curve, to obtain a power law relation for this ROC also.

$$X(z) = B\,Y_E^A(z) = B^{1-A}\,Y^A(z), \qquad z \ge z_0 + z_A \tag{10.56}$$

This is not a pure power ROC. This equation holds only for the initial part of the ROC curve. To obtain the remainder of the ROC curve, integrate the second part of Eq. 10.54. This is most simply done by integrating in the direction opposite the usual, that is, from the lower limit up to some value of z, obtaining the probability of miss and the probability of correct rejection. The shift of the argument by $z_A$ introduces the factors $e^{\pm z_A/2}$.

$$1 - X(z) = C\,e^{z_A/2}\,\bigl[1 - X_E(z + z_A)\bigr], \qquad 1 - Y(z) = C\,e^{-z_A/2}\,\bigl[1 - Y_E(z + z_A)\bigr], \qquad |z - z_0| < z_A \tag{10.57}$$

The expression for the original power law is used to simplify (10.57)

$$1 - \frac{e^{-z_A/2}}{C}\,\bigl[1 - X(z)\bigr] = \left\{1 - \frac{e^{z_A/2}}{C}\,\bigl[1 - Y(z)\bigr]\right\}^A, \qquad |z - z_0| < z_A \tag{10.58}$$

Evaluate B and C by comparing Eqs. 10.53 and 10.54.

$$B = \frac{\cosh\!\left(\frac{A+1}{A-1}\,\frac{z_A}{2}\right)}{\cosh\!\left(\frac{z_A}{2}\right)}, \qquad B > 1 \tag{10.59}$$

$$C = \pi_L(z_A) = \frac{.5}{\cosh(.5\,z_A)}, \qquad C < .5 \tag{10.60}$$

The constant B is always greater than one, and is much greater than one when the Luce observer is far superior to the original exponential observer. The portion of the ROC curve beginning at the origin is a modified power curve. It has the same power as the original exponential. However, the false alarm rate will be less. To obtain a numerical idea of how much less, consider the power A = 2.5, which is in the neighborhood of the normal $\sqrt{d} = 1$. For $z_A = 1.45$, B = 2.22 and the false alarm ratio between original exponential and final modified power is .302. When $z_A = 2.95$, B = 6.79, and the false alarm factor is .063.

The primary purpose of these examples has been to show how the multiple observer can be treated by using the ROC character. Examples could similarly be given to show how a completely discrete set of subordinate observers could be quickly convolved, and the resulting polygon ROC curve calculated. A second purpose has

also been to show the role of a relatively inferior continuous observer operating in conjunction with a superior discrete observer.

10.4 Rating Scale Pay-Off Matrices

This section deals with experimental techniques used to obtain ROC curves from subjects who can be motivated by reward and punishment. In the basic procedure for determining the observer's ROC curve (the yes-no experiment), a single decision is obtained from the observer on each observation. An operating point is determined by making a sufficiently large number of observations to obtain the accuracy desired. In order to obtain the shape of the ROC curve, the observer must repeat this observation-decision experiment, operating at a number of different operating points along the ROC curve. Instructions by the experimenter can be used to motivate the observer to shift his decision mechanism to obtain these different points.

Another technique that may be used is to establish a bonus system, or pay-off matrix, for the experiment. The subject is given a four-entry table which shows him exactly how much he will be rewarded for each correct decision, and how much he will be fined (or rewarded less) for each incorrect decision. The observer also knows the a priori probability of each of the conditions N and SN. He is assumed to operate in a manner which will tend to maximize his total bonus. The true motivating utility to the observer may not be precisely what the pay-off matrix describes, since he may

take into account many other factors in choosing his operating point. However, when the pay-off matrix is changed, almost all observers will respond with a change in the operating point that is in the same general direction as the change for an observer motivated solely by the pay-off matrix. The bonus system also gives a form of feedback to the observer, who can try to adjust his own internal description of a cut on a decision axis so as to maximize the bonus. This type of feedback may tend to stabilize the experimental situation and aid in the repeatability of particular operating points.

This procedure for determining an ROC curve point by point can take considerable time. A procedure which might establish four or six or even fifty operating points on an ROC curve in a single experimental session is a tremendous boon. Such a procedure is the rating technique. In the fixed category rating technique, the observer may make one of a fixed number of responses. One end of the response scale indicates "A" with a great deal of confidence, and the other extreme represents "B" with a great deal of confidence, while intermediate ratings represent intermediate preferences for the "A" and "B" decisions. This experimental technique, a discussion of its advantages, and experiments verifying that it yields essentially the same results as the longer yes-no procedure are given by Egan, Schulman, and Greenberg (Ref. 34, and Ref. 8, p. 172). They used a four-category technique. Watson and Rilling (Ref. 33) have used an essentially continuous rating technique to obtain ROC curves.
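The way a fixed category rating session yields several operating points at once can be sketched as follows. This is a generic illustration rather than the procedure of any of the cited studies: each boundary between adjacent categories is treated as a separate "A"/"B" cut, and the response counts are hypothetical.

```python
def rating_roc_points(counts_n, counts_sn):
    """Operating points from rating-category counts.

    counts_n[i], counts_sn[i]: number of responses in category i under N and
    under SN, with categories ordered from most confident "A" to most
    confident "B".  Each boundary between adjacent categories gives one
    point (P("A"|N), P("A"|SN)); the trivial points (0,0) and (1,1) are
    not returned.
    """
    total_n, total_sn = sum(counts_n), sum(counts_sn)
    points = []
    cum_n = cum_sn = 0
    for c_n, c_sn in zip(counts_n[:-1], counts_sn[:-1]):
        cum_n += c_n
        cum_sn += c_sn
        points.append((cum_n / total_n, cum_sn / total_sn))
    return points

# Hypothetical four-category data, 100 N trials and 100 SN trials.
example_points = rating_roc_points([5, 15, 30, 50], [40, 30, 20, 10])
print(example_points)
```

With k categories this gives k - 1 interior points, which is why a single rating session can replace several yes-no sessions run at different criteria.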

In a study by Swets, Tanner and Birdsall (Ref. 26, also Ref. 8, page 130) a six-category rating scale was used to obtain an ROC curve. However, the intention of that experiment was to determine more than just the ROC curve. The observers were instructed as to the meaning of each category. Specifically, they were informed that the highest category should mean an a posteriori probability of signal presence of roughly ninety-five percent or better, that the next lower category must indicate between eighty and ninety-five percent a posteriori probability, etc. The technique was satisfactory in obtaining a six point ROC curve, but the experimenters were basically dissatisfied with the relation of the responses to the assigned a posteriori categories.

In personal communications, Tanner has argued that the pay-off matrix, with its monetary entries, is a simple, clearly understood, and efficient means of explaining the experimenter's desires so that the observer can understand them. This is not a reflection on the observer's intelligence, but on his experience. The pay-off matrix deals with quantities more familiar to the observer than such quantities as a posteriori probability, likelihood, "strict" versus "medium" criterion, or "nine percent false alarm rate." A bonus system also affords the observer a specific measure that he can check after an experimental run, to see to what extent he is holding a particular criterion.

The purpose of this section is to present a type of pay-off matrix to use in conjunction with a continuous rating scale report by the observer. It also presents a formula for tailoring this rating scale pay-off so that the ideal observer, "motivated solely by the pay-off matrix," would report a rating proportional to any one of a number of functions which the experimenter specifies.

The pay-off matrix for a standard yes-no experiment, and the appropriate cut value on the likelihood ratio decision axis, is shown in Fig. 10.5.

$$\begin{array}{c|cc} R \backslash C & SN & N \\ \hline A & V_D & V_{FA} \\ B & V_M & V_{CR} \end{array} \qquad\qquad \beta = \frac{P(N)}{P(SN)}\;\frac{V_{CR} - V_{FA}}{V_D - V_M}$$

Fig. 10.5. Fixed pay-off matrix and $\beta$ cut

The only restriction on the values in the table is that the value of a correct response must be greater than the value of the corresponding incorrect response. Thus, when noise alone is present the value of a correct rejection must be greater than the value of a false alarm; when signal plus noise is present, the value of a detection must be greater than the value of a miss.

This section will start with an example of one type of rating scale pay-off matrix and then generalize in two steps. The observer's reporting procedure will be as follows.

He makes the customary binary decision, "A" or "B", and reports a magnitude u. If he is correct, he will be paid proportional to u. If he is incorrect, he will be penalized proportional to $u^2$. Thus, he will wish to make the value of his rating response, u, large in order to obtain a large reward, but this is done at the risk of a very much larger penalty. In addition to the description in terms of the observer's rating, u, a constant k is fixed by the experimenter to scale the positive pay-offs (to keep the experimenter from financial ruin) and a factor R is used to give a relative weight of the N pay-off to the SN pay-off.

$$\begin{aligned} k &> 0 && \text{a fixed scale factor for positive pay-off} \\ R &> 0 && \text{a fixed factor for N pay-off over SN pay-off} \\ u &\ge 0 && \text{magnitude of observer's rating response} \end{aligned} \tag{10.61}$$

Using these constants the pay-off matrix for what will be called the $u^2$-rating scale is given in Fig. 10.6.

$$\begin{array}{c|cc} R \backslash C & SN & N \\ \hline A & V_D = +uk & V_{FA} = -.5u^2R \\ B & V_M = -.5u^2 & V_{CR} = +ukR \end{array}$$

Fig. 10.6. Pay-off matrix for $u^2$-rating scale
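The trade between a reward linear in u and a penalty quadratic in u can be illustrated numerically. The sketch below is only an illustration under assumed values: for an "A" response and a fixed a posteriori probability of SN, it searches a grid of ratings for the one with the largest expected pay-off under the matrix of Fig. 10.6, and compares it with the value at which the derivative of the expected pay-off vanishes. The numbers k = 2, R = 1.5 and P(SN|x) = .8 are hypothetical.

```python
def expected_payoff_A(u, p_sn, k, R):
    """Expected pay-off of an "A" response with rating u:
    +u*k when SN is true, -0.5*u**2*R when N is true."""
    return p_sn * u * k - 0.5 * u * u * R * (1.0 - p_sn)

def best_rating_A(p_sn, k, R, u_max=50.0, steps=200_000):
    """Brute-force search over an evenly spaced grid of ratings in [0, u_max]."""
    grid = (u_max * i / steps for i in range(steps + 1))
    return max(grid, key=lambda u: expected_payoff_A(u, p_sn, k, R))

k, R, p_sn = 2.0, 1.5, 0.8                   # hypothetical experiment constants
u_star = best_rating_A(p_sn, k, R)           # numerical optimum
closed = (k / R) * p_sn / (1.0 - p_sn)       # zero of the first derivative
print(u_star, closed)
```

The grid search is deliberately crude; it is meant only to show that the expected pay-off has a single interior maximum and that its location depends on the a posteriori odds.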

Consider for a moment any fixed value for u. The proper cut-off on the likelihood ratio axis can be obtained by using the pay-off matrix of Fig. 10.6 in the optimum cut equation of Fig. 10.5.

$$\beta = \frac{P(N)}{P(SN)}\;\frac{ukR + .5u^2R}{uk + .5u^2} = R\,\frac{P(N)}{P(SN)} \tag{10.62}$$

The cut on the likelihood ratio axis is independent of the value of the rating, u. Therefore, before the observer has chosen the magnitude of rating for a particular response, he may determine whether the "A" or "B" decision is appropriate. In the analysis the sole motivation of the observer is to maximize his average pay-off. Whenever his response is "A" he will wish to choose the rating value u to maximize the expected pay-off for that particular single response. This value is given in Eq. 10.63.

$$\ell(x) \ge \beta: \qquad V_A(x) = P(SN \mid x)\,uk - .5u^2R\,P(N \mid x) \tag{10.63}$$

The a posteriori probability appears in this equation because the choice of the rating will be made after the observation x. The rating u is the only variable in this equation. The proper u value is obtained by differentiating the value V. The first and second derivatives of the value with respect to u are

$$\frac{\partial V_A(x)}{\partial u} = P(SN \mid x)\,k - uR\,P(N \mid x) \tag{10.64}$$

$$\frac{\partial^2 V_A(x)}{\partial u^2} = -R\,P(N \mid x) \tag{10.65}$$

Since the second derivative is everywhere negative, the value has a single maximum at the zero of the first derivative. From Eq. 10.64, the derivative is zero whenever

$$u = \frac{k}{R}\,\frac{P(SN \mid x)}{P(N \mid x)} = \frac{k}{R}\,\frac{P(SN)}{P(N)}\,\ell(x) = k\,\frac{\ell(x)}{\beta} \tag{10.66}$$

The constants $\beta$ and R are related to the a priori probability ratio, and are used to eliminate this a priori probability from the final expression in Eq. 10.66. The same type of analysis is used to determine the proper magnitude of the rating response when the decision response has been "B".

$$\ell(x) < \beta: \qquad V_B(x) = P(N \mid x)\,ukR - .5u^2\,P(SN \mid x) \tag{10.67}$$

$$\frac{\partial V_B(x)}{\partial u} = P(N \mid x)\,kR - u\,P(SN \mid x) \tag{10.68}$$

$$\frac{\partial^2 V_B(x)}{\partial u^2} = -P(SN \mid x) \tag{10.69}$$

$$u = kR\,\frac{P(N \mid x)}{P(SN \mid x)} = kR\,\frac{P(N)}{P(SN)}\,\frac{1}{\ell(x)} = k\,\frac{\beta}{\ell(x)} \tag{10.70}$$

Let us summarize the effect of this type of pay-off matrix on the observer who is motivated solely by maximizing his

expected return. His binary decision is the same as with a fixed pay-off matrix with value ratio equal to R. Whenever his response has been "A", his rating is directly proportional to the likelihood ratio. Whenever his decision has been "B", his rating response is inversely proportional to the likelihood ratio. The constants of proportionality are established by the experimenter.

What can be learned from such an experimental technique? As long as the observer's rating response is order-preserving with likelihood ratio, a valid ROC curve can be obtained with this method as with any rating scale method. If enough observations taken under suitably stable conditions have been used to obtain this ROC curve, an analytic fit can be used to approximate the curve. From the analytic fit the distribution function can be obtained for the likelihood ratio, as in any of the analytic models in this present work. These distributions can be compared with the distributions of the observer's u values. The experimenter can thus obtain information about the degree to which the observer has been able to match his u-scale with the $\ell$-scale.

The first step in generalization of this type of pay-off matrix is to determine if an ideal observer can be forced to report a rating of any other nature than one proportional to likelihood ratio. In this generalization it may be necessary to restrict the range of the observer's available rating responses. A function r(u), a rating forcing function, will play the same role as $u^2$ in the $u^2$-rating pay-off

matrix.

$$\begin{aligned} u &\ge u_0 && \text{magnitude of observer's rating response} \\ R &> 0 && \text{a fixed factor for N pay-off over SN pay-off} \\ r_0(u) &> 0 && \text{a smooth convex upward function for negative pay-offs} \end{aligned} \tag{10.71}$$

The four-fold pay-off matrix is shown in Fig. 10.7.

$$\begin{array}{c|cc} R \backslash C & SN & N \\ \hline A & V_D = +u & V_{FA} = -R\,r_0(u) \\ B & V_M = -r_0(u) & V_{CR} = +R\,u \end{array}$$

Fig. 10.7. Base pay-off matrix for general rating scale

At this stage in the development the rating forcing function will be given some additional properties to make the solution unique. After this has been completed and a single solution obtained for each situation, the rating forcing function may be generalized to make it more appropriate for particular practical situations. These restrictions are given in Eq. 10.72.

$$r_0(u_0) = u_0, \qquad r_0'(u_0) = 1, \qquad r_0''(u) > 0, \quad u \ge u_0 \tag{10.72}$$

The first restriction is used so that the pay-off matrix will be "balanced" when the observer chooses the minimum rating value.

That is, the value for detection will be as much positive as the value for miss is negative, and the value for correct rejection will be as much positive as the value for false alarm is negative. The second restriction will insure that a solution exists. The third must always hold: the function must be convex upward; this is the characteristic that restrains the observer from using extremely large scale values.

The first step is to determine a proper operating point by using the $\beta$ equation of Fig. 10.5 for the pay-off matrix of Fig. 10.7.

$$\beta = \frac{P(N)}{P(SN)}\;\frac{Ru + R\,r_0(u)}{u + r_0(u)} = R\,\frac{P(N)}{P(SN)} \tag{10.73}$$

As expected, the observer may determine the proper response threshold to cut between the "A" and "B" decisions. As before, the derivative of the value function conditional to the binary decision response yields the appropriate u value. The second derivatives are indeed negative and are not displayed here. The values, their first derivatives, and the equations controlling the optimum rating response are given in Eqs. 10.74 to 10.79.

$$\ell(x) \ge \beta: \qquad V_A(x) = P(SN \mid x)\,u - R\,r_0(u)\,P(N \mid x) \tag{10.74}$$

$$\frac{\partial V_A(x)}{\partial u} = P(SN \mid x) - R\,r_0'(u)\,P(N \mid x) \tag{10.75}$$

$$r_0'(u_A) = \frac{1}{R}\,\frac{P(SN \mid x)}{P(N \mid x)} = \frac{\ell(x)}{\beta} \tag{10.76}$$

$$\ell(x) < \beta: \qquad V_B(x) = P(N \mid x)\,R\,u - r_0(u)\,P(SN \mid x) \tag{10.77}$$

$$\frac{\partial V_B(x)}{\partial u} = P(N \mid x)\,R - r_0'(u)\,P(SN \mid x) \tag{10.78}$$

$$r_0'(u_B) = R\,\frac{P(N \mid x)}{P(SN \mid x)} = \frac{\beta}{\ell(x)} \tag{10.79}$$

Equations 10.76 and 10.79 appear to repeat virtually the same situation as (10.66) and (10.70), namely, that some function of the rating response be directly proportional to the $\ell$ over $\beta$ ratio under condition A and inversely proportional to the same ratio under condition B. This is the basic restriction on this present technique. The rating may be forced to follow some particular function of the experimenter's choice only if the experimenter's function is in turn a function of the $\ell$ to $\beta$ ratio. However, in the study of decision processes based on likelihood ratio there are a number of functions of this type of interest.

One more step of generalization is called for before displaying particular cases of rating scale forcing functions. The pay-off matrix of Fig. 10.7 can be changed by the addition of a properly selected set of four numbers which will change the pay-off to the observer but will not change any of the equations. That is, the decisions and the rating scale reports will be exactly the same, but

the monetary return to the observer will be different. Assume that the experimenter has been successful in solving Eqs. 10.76 and 10.79 for his particular rating function. Once he knows the value of the minimum rating, $u_0$, he may choose any three constants, $K_0$, $K_1$, and $K_2$, so long as $K_2$ is less than $2u_0$.

$$K_0,\ K_1 \ \text{arbitrary}, \qquad K_2 < 2u_0 \tag{10.80}$$

The role of these three arbitrary constants is shown in Fig. 10.8.

$$\begin{array}{c|cc} R \backslash C & SN & N \\ \hline A & u & -R\,r_0(u) \\ B & -r_0(u) & R\,u \end{array} \;+\; \begin{array}{cc} K_1 & K_0 + R K_2 \\ K_1 + K_2 & K_0 \end{array} \;=\; \begin{array}{cc} K_1 + u & K_0 + R K_2 - R\,r_0(u) \\ K_1 + K_2 - r_0(u) & K_0 + R\,u \end{array}$$

Fig. 10.8. Formation of complete pay-off matrix from base matrix

The resultant pay-off matrix must be a valid pay-off matrix, and must not affect the optimization equations. To be a valid pay-off matrix, the value paid for detection must be greater than the value paid for a miss.

$$V_D > V_M: \qquad K_1 + u > K_1 + K_2 - r_0(u) \tag{10.81}$$

The constant $K_1$ appears on both sides of the equation and therefore

disappears. This leaves the restriction

$$r_0(u) + u > K_2 \tag{10.82}$$

Because the forcing function $r_0(u)$ is monotone increasing, it takes on its smallest value at $u_0$.

$$r_0(u) + u \ge r_0(u_0) + u_0 = 2u_0 \tag{10.83}$$

By definition $K_2$ is smaller than this value, and the value of detection is indeed greater than the value of a miss. Similarly the value for a correct rejection is greater than the value paid for false alarm.

$$V_{CR} > V_{FA}: \qquad K_0 + R\,u > K_0 + R K_2 - R\,r_0(u) \tag{10.84}$$

Subtract $K_0$ from both sides of this equation, divide by R, and obtain Eq. 10.82 again. The final pay-off matrix in the $\beta$ equation is

$$\beta = \frac{P(N)}{P(SN)}\;\frac{K_0 + R\,u - [K_0 + R K_2 - R\,r_0(u)]}{K_1 + u - [K_1 + K_2 - r_0(u)]} \tag{10.85}$$

$$\beta = \frac{P(N)}{P(SN)}\;\frac{R\,u - R K_2 + R\,r_0(u)}{u - K_2 + r_0(u)} = R\,\frac{P(N)}{P(SN)} \tag{10.86}$$

the same $\beta$ equation as always. Although $K_0$ and $K_1$ disappeared from the $\beta$ equation, they will appear in the value equation under the conditions A and B.

$$\ell(x) \ge \beta: \qquad V_A(x) = P(SN \mid x)\,[K_1 + u] + P(N \mid x)\,[K_0 + R K_2 - R\,r_0(u)] \tag{10.87}$$

However, when one differentiates to find the maximum, these constants disappear and the same equation as before appears.

$$\frac{\partial V_A(x)}{\partial u} = P(SN \mid x) - R\,r_0'(u)\,P(N \mid x) \tag{10.75}$$

Similarly, the constants appear in the equation

$$\ell(x) < \beta: \qquad V_B(x) = P(N \mid x)\,[K_0 + R\,u] + P(SN \mid x)\,[K_1 + K_2 - r_0(u)] \tag{10.88}$$

but disappear upon differentiation,

$$\frac{\partial V_B(x)}{\partial u} = P(N \mid x)\,R - r_0'(u)\,P(SN \mid x) \tag{10.78}$$

The experimenter therefore has a great deal of freedom in choosing the actual pay-off matrix that he will use in his practical experimental situation. To determine the rating scale forcing function we may consider the simpler base pay-off matrix of Fig. 10.7.

In the remainder of this section, three rating scale forcing functions will be developed, both for their own sake and to demonstrate the technique for determining such a forcing function.
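That the added constants leave the optimum rating unchanged can be illustrated numerically. The following is a sketch under assumptions made here only for illustration: the forcing function $r_0(u) = k(e^{u/k} - 1)$ with $u_0 = 0$ is taken as an assumed convex function satisfying the side conditions of Eq. 10.72, and the values of R, P(SN|x), $K_0$, $K_1$ and $K_2$ are hypothetical.

```python
import math

def value_A(u, p_sn, R, r0, K0=0.0, K1=0.0, K2=0.0):
    """Expected pay-off of an "A" response for the complete matrix;
    with all K's equal to zero this is the base matrix."""
    return p_sn * (K1 + u) + (1.0 - p_sn) * (K0 + R * K2 - R * r0(u))

def argmax_on_grid(f, lo, hi, steps=100_000):
    """Location of the largest value of f on an evenly spaced grid."""
    return max((lo + (hi - lo) * i / steps for i in range(steps + 1)), key=f)

k = 1.0
r0 = lambda u: k * (math.exp(u / k) - 1.0)   # assumed: r0(0) = 0, r0'(0) = 1, convex
R, p_sn = 1.0, 0.75                          # hypothetical values

u_base = argmax_on_grid(lambda u: value_A(u, p_sn, R, r0), 0.0, 5.0)
u_full = argmax_on_grid(lambda u: value_A(u, p_sn, R, r0, K0=5.0, K1=-2.0, K2=-1.0), 0.0, 5.0)

# Condition r0'(u) = (1/R) P(SN|x)/P(N|x), solved for this particular r0.
u_condition = k * math.log(p_sn / ((1.0 - p_sn) * R))
print(u_base, u_full, u_condition)
```

The constants shift the expected pay-off up or down by an amount that does not depend on u, so both searches settle on the same rating; only the observer's monetary return differs.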

Example: Desire $u_A = k(\ell/\beta)$ and $u_B = k(\beta/\ell)$

The basic tools are Eqs. 10.76 and 10.79. These are symmetric equations. Equation 10.76 shall be solved first in each case, and then Eq. 10.79 checked to see if it is satisfied. Equation 10.76 says that the first derivative of $r_0$ evaluated at the proper value of u will be equal to the $\ell$ to $\beta$ ratio. The desired result is to have the proper value of u equal to $k\ell/\beta$. This requires that $r_0'$ evaluated at $k\ell/\beta$ equals $\ell/\beta$.

$$r_0'\!\left(k\,\frac{\ell}{\beta}\right) = \frac{\ell}{\beta}, \qquad r_0'(u) = \frac{u}{k}$$

This determines the equation for the derivative of $r_0$. The side conditions of Eq. 10.72 set the value of $r_0'$ at the minimum. Since the minimum of the $\ell$ to $\beta$ ratio is one, $u_0$ is equal to k.

$$r_0'(u_0) = 1 = \frac{u_0}{k}$$

The function $r_0(u)$ is obtained by integration of $r_0'(u)$. Integration introduces an arbitrary additive constant, normally suppressed but displayed here

$$r_0(u) = \int r_0'(u)\,du = \int \frac{u}{k}\,du = \frac{u^2}{2k} + c$$

The first side condition of Eq. 10.72 determines this arbitrary constant.

$$r_0(u_0) = u_0: \qquad \frac{k}{2} + c = k, \qquad c = \frac{k}{2}$$

Gathering these all together, the final form of the basic rating forcing function which will make the response proportional to the likelihood ratio under the A condition is

$$r_0(u) = \frac{k}{2} + \frac{u^2}{2k}, \qquad u \ge k$$

Although not displayed here, this does check with Eq. 10.79. This function differs somewhat from that given in the original example. The difference is that the original example had a constant added to $r_0(u)$ to form the simpler appearing function

$$r(u) = r_0(u) - \frac{k}{2} = \frac{u^2}{2k}$$

This forcing function is displayed in Fig. 10.9(a).

Example: Desire $u = k\,|z - \ln\beta|$

Writing z as $\ln \ell$, we insert the desired form of u in the equation for the derivative of $r_0$.

$$A: \quad r_0'\bigl(k(\ln\ell - \ln\beta)\bigr) = \frac{\ell}{\beta}, \qquad r_0'(u) = e^{u/k}$$

The side condition on the derivative of $r_0$ at the minimum value determines that the minimum value is zero.

[Fig. 10.9. Rating forcing functions, plotted against u with both axes in units of K. Panel (a): optimum $u_A = K\ell/\beta$, $u_B = K\beta/\ell$; $u_0 = K$. Panel (b): $r_0(u) = K[e^{u/K} - 1]$; optimum $u = K|z - \ln\beta|$; $u_0 = 0$. Panel (c): $R = \beta = 1$; $r_0(u) = K[1 - \ln 2 - u/K - \ln(1 - u/K)]$; optimum $u_A = K\,P(SN \mid x)$, $u_B = K\,P(N \mid x)$; $u_0 = K/2$.]

$$r_0'(u_0) = 1 = e^{u_0/k}, \qquad u_0 = 0$$

Integrating the derivative to obtain the function

$$r_0(u) = k\,e^{u/k} + c$$

and using the side condition to determine the constant of integration

$$r_0(0) = 0 = k + c, \qquad c = -k$$

the final form of the forcing function is determined.

$$r_0(u) = k\,[e^{u/k} - 1], \qquad u \ge 0$$

This forcing function is shown in Fig. 10.9(b).

Example: Desire $u_A = k\,P(SN \mid x)$ and $u_B = k\,P(N \mid x)$

The goal is to choose the forcing function, $r_0(u)$, so that the observer will report u values that are proportional to a posteriori probability, $P(SN \mid x)$. The "A" condition has a formal solution, but it differs from the "B" solution unless R and P(SN)/P(N) are both one.

Restriction: R = 1, P(SN) = .5 = P(N)

If this restriction is made, the solution is

$$r_0(u) = k\left[1 - \ln 2 - \frac{u}{k} - \ln\!\left(1 - \frac{u}{k}\right)\right], \qquad \frac{k}{2} \le u < k$$

This is drawn in Fig. 10.9(c).

10.5 Summary of Chapters IV-X

Many different features of ROC curves have been developed in the above chapters. The purpose of these developments has been twofold: (1) to obtain specific useful results, (2) to demonstrate techniques for working with ROC curves and ROC characters. The author hopes that others working with simple decision situations may find some useful results among these developments, and techniques that will be helpful for their own needs. If the radar analyst can determine the appropriate ROC family for his device, he will find which parameters he needs to evaluate to determine a specific ROC curve. Those in psychophysics will find a variety of curves and characters that may be more like their experimental data than the previously available ROC curves.

Every result obtained suggests further research. Especially needed is research in fitting ROC curves (and characters) to experimental data so that one may extract what information is truly present, by placing appropriate weighting on each datum.

This work has no conclusion, only a continuation in the development of our understanding of the evaluation of decision devices.

APPENDIX A

TWO THEOREMS ON ROC CURVES AND LIKELIHOOD RATIO

This appendix is part of the notes for one lecture issued to the students enrolled in "Random Processes," a University of Michigan Engineering Summer Conference, in the years 1959 through 1963. It is included in this present work because it contains the proofs of theorems herein called one and two. A more general proof of one-half of the second theorem is given in Chapter I.

Theorem 1: Given two simple hypotheses, the ROC image of all randomized decisions is convex, and contains the chance diagonal P("A"|SN) = P("A"|N). It is symmetric through the center (.5, .5); that is, if (x, y) is in the image, then so is (1 - x, 1 - y).

Proof: (1) The chance diagonal: for any real number r on the closed unit interval, consider the randomized decision probability g(x) = r. The expected value of any constant is that constant.

$$P(\text{"A"} \mid N) = E(r \mid N) = r \quad\text{and}\quad P(\text{"A"} \mid SN) = E(r \mid SN) = r$$

(2) Convex: Consider any two distinct randomized decisions, say, $g_1(x)$ and $g_2(x)$. The ROC images are $[P(\text{"A}_i\text{"} \mid N),\ P(\text{"A}_i\text{"} \mid SN)]$, i = 1, 2. We wish to show that for any real number a

on the closed unit interval there is a randomized decision probability $g_3(x)$ whose image is

$$\bigl[a\,P(\text{"A}_1\text{"} \mid N) + (1-a)\,P(\text{"A}_2\text{"} \mid N),\ \ a\,P(\text{"A}_1\text{"} \mid SN) + (1-a)\,P(\text{"A}_2\text{"} \mid SN)\bigr]$$

The obvious choice is to try

$$g_3(x) = a\,g_1(x) + (1-a)\,g_2(x)$$

Since $g_1$ and $g_2$ range from 0 to 1, as does a, $g_3$ will also; the sum of integrable functions is integrable. Therefore the above $g_3(x)$ is admissible as a randomized decision probability, and the linearity of integration assures that the image of $g_3(x)$ is the desired point (on a straight line between the images of $g_1$ and $g_2$).

(3) Symmetry: If the image of some $g_1(x)$ is the ROC point (x, y), then consider $g_2(x) = 1 - g_1(x)$. This has the range 0 to 1, is integrable, and has the desired image at (1 - x, 1 - y). With respect to the ROC, $g_2$ is called "the opposite of $g_1$." Q.E.D.

Theorem 2: The collection of all criteria based on likelihood ratio maps onto the upper boundary of the ROC region.

Proof: The two steps in this proof are the "into" and "onto all"

parts, i.e., to show that any criterion based on likelihood ratio maps into some point on the upper boundary, and then that any point on the upper boundary is the image of some criterion based on likelihood ratio.

(1) Consider any decision based on likelihood ratio $g(x; \beta, r)$ and any other randomized decision probability, h(x), with not greater expected value under N, i.e.,

$$E(h(x) \mid N) \le E(g(x; \beta, r) \mid N)$$

Whenever $f(x \mid SN) - \beta f(x \mid N) > 0$, $\ell(x) > \beta$, $g(x; \beta, r) = 1$ and $g(x; \beta, r) - h(x) \ge 0$; whenever $f(x \mid SN) - \beta f(x \mid N) < 0$, $\ell(x) < \beta$, $g(x; \beta, r) = 0$ and $g(x; \beta, r) - h(x) \le 0$. That is

$$[f(x \mid SN) - \beta f(x \mid N)]\,[g(x; \beta, r) - h(x)] \ge 0$$

Therefore

$$\int [f(x \mid SN) - \beta f(x \mid N)]\,[g(x; \beta, r) - h(x)]\,dx \ge 0$$

Expanding

$$E(g(x; \beta, r) \mid SN) - E(h(x) \mid SN) - \beta\,[E(g(x; \beta, r) \mid N) - E(h(x) \mid N)] \ge 0$$

Since the last two expected values have a nonnegative difference by original assumption and $\beta \ge 0$,

$$E(g(x; \beta, r) \mid SN) \ge E(h(x) \mid SN)$$

Thus one concludes that, given any criterion based on likelihood ratio, its P("A"|SN) value will be greater than or equal to the P("A"|SN) value for any randomized decision device with equal or smaller P("A"|N) value. Considering only equal values of P("A"|N) is all that is needed to show that any criterion based on likelihood ratio maps into the upper boundary of the ROC, concluding step one.

(2) One wishes first to show that for any k, 0 < k < 1, there is a nonnegative $\beta_k$ such that

$$E(g(x; \beta_k, 0) \mid N) \le k \le E(g(x; \beta_k, 1) \mid N)$$

since if one of the equalities holds, that strict criterion will have the appropriate P("A"|N) value, and if neither equality holds, the value

$$r_k = \frac{k - E(g(x; \beta_k, 0) \mid N)}{E(g(x; \beta_k, 1) \mid N) - E(g(x; \beta_k, 0) \mid N)}$$

will yield the appropriate probability

$$E(g(x; \beta_k, r_k) \mid N) = k$$

Although it may be obvious to many that $E(g(x; \beta, 1) \mid N)$ is continuous on the left as a function of $\beta$, and that $E(g(x; \beta, 0) \mid N)$ is continuous on the right, and that these functions having common points of continuity and discontinuity by their very definitions establish the proof, the following simple steps may help others to that conclusion. At the extreme left of $\beta = 0$,

$$E(g(x; 0, 1) \mid N) = P(\ell \ge 0 \mid N) = 1$$

Toward the right extreme of arbitrarily large $\beta$ note that

$$E(g(x; \beta, r) \mid SN) = \int g(x; \beta, r)\,\ell(x)\,f(x \mid N)\,dx \ge \beta \int g(x; \beta, r)\,f(x \mid N)\,dx$$

that is

$$E(g(x; \beta, r) \mid N) \le \frac{1}{\beta}\,E(g(x; \beta, r) \mid SN) \le \frac{1}{\beta}$$

so that

$$\lim_{\beta \to \infty} E(g(x; \beta, r) \mid N) = 0$$

For any point $\beta_0$ not zero or infinity, if $\beta < \beta_0$

$$P(\ell(x) \ge \beta \mid N) = P(\ell(x) \ge \beta_0 \mid N) + P(\beta \le \ell(x) < \beta_0 \mid N)$$

Since as $\beta$ approaches $\beta_0$, it exceeds all values less than $\beta_0$,

$$\lim_{\beta \to \beta_0^-} P(\beta \le \ell(x) < \beta_0 \mid N) = 0$$

and so

$$\lim_{\beta \to \beta_0^-} E(g(x; \beta, 1) \mid N) = E(g(x; \beta_0, 1) \mid N)$$

If $\beta > \beta_0$

$$P(\ell(x) \ge \beta \mid N) + P(\beta > \ell(x) > \beta_0 \mid N) = P(\ell > \beta_0 \mid N)$$

Since as $\beta$ approaches $\beta_0$ it excludes all values greater than $\beta_0$,

$$\lim_{\beta \to \beta_0^+} P(\ell(x) \ge \beta \mid N) = P(\ell > \beta_0 \mid N)$$

and so

$$\lim_{\beta \to \beta_0^+} E(g(x; \beta, 1) \mid N) = E(g(x; \beta_0, 0) \mid N)$$

Thus the monotone decreasing (nonincreasing) function $E(g(x; \beta, 1) \mid N)$ has extreme limiting values of 1 and 0, and at any $\beta_0$ between zero and infinity has a left-hand limit of $E(g(x; \beta_0, 1) \mid N)$ and a right-hand limit of $E(g(x; \beta_0, 0) \mid N)$. The extreme value of 1 corresponds to the strict criterion that is the whole observation space (which is based on likelihood ratio in a trivial sense), and the extreme value of 0 corresponds to the strict criterion of all observations

with infinite likelihood ratio ($f(x \mid N) = 0$ and $f(x \mid SN) \ne 0$). The extreme limiting values serve one other purpose; for any value of k between but not equal to 0 or 1, there will be a set of $\beta$'s yielding $E(g(x; \beta, 1) \mid N)$ values between k and 0, and a set of $\beta$'s yielding $E(g(x; \beta, 1) \mid N)$ values between k and 1. If the value k is not taken on by the function, then these two sets of $\beta$'s are disjoint, and the cut value is such that the left-hand limit is greater than k, and the right-hand limit is less than or equal to k. This is the appropriate $\beta_k$, and the value of $r_k$ was given at the beginning of step 2. Q.E.D.
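Both steps of the proof can be illustrated on a small discrete observation space. The following is a sketch, not part of the lecture notes: the two probability tables are hypothetical, every observation is assumed to have nonzero probability under N, and the comparison decisions h(x) are drawn at random and scaled so that their false alarm probability does not exceed k.

```python
import random

f_n = [0.4, 0.3, 0.2, 0.1]    # hypothetical P(x | N), all nonzero
f_sn = [0.1, 0.2, 0.3, 0.4]   # hypothetical P(x | SN)

def p_a(g, f):
    """P("A") under the distribution f for the randomized decision g."""
    return sum(gi * fi for gi, fi in zip(g, f))

def lr_rule(k):
    """Find beta_k and r_k so that the likelihood ratio decision has E(g|N) = k.
    g is 1 where l(x) > beta, r where l(x) == beta, and 0 where l(x) < beta."""
    lr = [s / n for s, n in zip(f_sn, f_n)]
    for beta in sorted(set(lr), reverse=True):
        p_gt = sum(n for l, n in zip(lr, f_n) if l > beta)    # E(g(x; beta, 0) | N)
        p_ge = sum(n for l, n in zip(lr, f_n) if l >= beta)   # E(g(x; beta, 1) | N)
        if p_gt <= k <= p_ge:
            r = 0.0 if p_ge == p_gt else (k - p_gt) / (p_ge - p_gt)
            g = [1.0 if l > beta else (r if l == beta else 0.0) for l in lr]
            return beta, r, g
    raise ValueError("k must lie strictly between 0 and 1")

k = 0.25
beta_k, r_k, g = lr_rule(k)

# Step (1): no competing decision with E(h|N) <= k has a larger P("A"|SN).
rng = random.Random(0)
worst_margin = float("inf")
for _ in range(2000):
    h = [rng.random() for _ in f_n]
    scale = min(1.0, k / p_a(h, f_n))        # shrink h so that E(h | N) <= k
    h = [scale * hi for hi in h]
    worst_margin = min(worst_margin, p_a(g, f_sn) - p_a(h, f_sn))

print(beta_k, r_k, p_a(g, f_n), p_a(g, f_sn), worst_margin)
```

Because the observation space is discrete, the chosen k falls in a jump of the false alarm function, so the randomization value $r_k$ is strictly between 0 and 1; this is the case handled at the beginning of step 2.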

APPENDIX B

A LOWER TRUNCATED SIMPLE EXPONENTIAL IS LOWER METASTATIC INVARIANT

In Section 3.4.2 the statement was made that the metastatic image of a pure power ROC curve would be the same as the original if the point (0, 0) was included in the transformation. This statement is proved in this appendix. Denote the original ROC as

$$X_1 = Y_1^A, \qquad A > 1$$

and its character is

$$\pi_1(z) = (A-1)^{-1}\,A^{-\frac{1}{A-1}}\,e^{-\frac{A+1}{A-1}\,\frac{z}{2}}, \qquad z \ge z_0 = -\ln A$$

The equations for the coordinates, as functions of z, are

$$X_1(z) = e^{-A\,\frac{z - z_0}{A-1}}, \qquad Y_1(z) = e^{-\frac{z - z_0}{A-1}}$$

Choose any cut, $z_c$, greater than $z_0$. The metastatic transformation will be to use only that part of $\pi_1(z)$ for which z is greater than $z_c$.

$$\text{Let } a = 0, \qquad Y_1(a) = 0$$

$$\text{Let } b = e^{-A\,\frac{z_c - z_0}{A-1}}, \qquad Y_1(b) = e^{-\frac{z_c - z_0}{A-1}}$$

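The second ROC will be the image of the rectangle

$$0 \le X \le b, \qquad 0 \le Y \le Y_1(b)$$

$$z_2 = z_1 + \ln\frac{b - a}{Y_1(b) - Y_1(a)} = z_1 - z_c + z_0, \qquad z_1 \ge z_c$$

The minimum $z_2$ value occurs for $z_1$ at the cut value $z_c$:

$$\min z_2 = z_0$$

The new ROC character is given by applying Eq. 3.34,

$$\pi_2(z_2) = \frac{\pi_1(z_1(z_2))}{\sqrt{(b - a)\,(Y_1(b) - Y_1(a))}} = \pi_1(z_2 + z_c - z_0)\;e^{\frac{A+1}{A-1}\,\frac{z_c - z_0}{2}}, \qquad z_2 \ge z_0$$

$$= e^{\frac{A+1}{A-1}\,\frac{z_c - z_0}{2}}\,(A-1)^{-1}\,A^{-\frac{1}{A-1}}\,e^{-\frac{A+1}{A-1}\,\frac{z_2 + z_c - z_0}{2}} = (A-1)^{-1}\,A^{-\frac{1}{A-1}}\,e^{-\frac{A+1}{A-1}\,\frac{z_2}{2}}$$

$$\pi_2(z_2) = \pi_1(z_2), \qquad z_2 \ge z_0$$

Q.E.D.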
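The invariance can also be checked numerically. The sketch below is an illustration only: it takes the transformed character to be the original character at the shifted argument divided by the square root of the area of the rectangle, as used in the proof, and the values A = 2.5 and $z_c$ = 0.4 are chosen arbitrarily.

```python
import math

def pi1(z, A):
    """Character of the pure power ROC X = Y**A; zero below z0 = -ln A."""
    z0 = -math.log(A)
    if z < z0:
        return 0.0
    return A ** (-1.0 / (A - 1.0)) / (A - 1.0) * math.exp(-(A + 1.0) / (A - 1.0) * z / 2.0)

def X1(z, A):
    return math.exp(-A * (z + math.log(A)) / (A - 1.0))

def Y1(z, A):
    return math.exp(-(z + math.log(A)) / (A - 1.0))

def pi2(z2, A, zc):
    """Character of the ROC restricted to the rectangle 0 <= X <= b, 0 <= Y <= Y1(b)
    and rescaled to the unit square (a = 0, b = X1(zc))."""
    b, yb = X1(zc, A), Y1(zc, A)
    z1 = z2 - math.log(b / yb)          # inverts z2 = z1 + ln[(b - a)/(Y1(b) - Y1(a))]
    if z1 < zc:
        return 0.0
    return pi1(z1, A) / math.sqrt(b * yb)

A, zc = 2.5, 0.4                        # illustrative values, zc > z0
z0 = -math.log(A)
checks = [(pi2(z2, A, zc), pi1(z2, A)) for z2 in (z0 + 0.01, z0 + 0.5, z0 + 3.0)]
print(checks)
```

Each pair printed should agree to rounding error, whatever cut $z_c$ above $z_0$ is chosen.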

APPENDIX C

APPROXIMATION TO A CASE II CHARACTER

The purpose of this appendix is to obtain Eq. 4.15, an almost normal appearing character to approximate the $\alpha = 2$ Case II character. The equations for the ROC character for Case II are written in terms of a parameter of quality, $\alpha$, and a dummy, t. The tabulated function

$$T_\nu(x) = e^{-x}\,I_\nu(x) \tag{4.11}$$

is used in computation. The computational equations are

$$z = [\ln T_0(\alpha t) + \alpha t] - .5\alpha^2 \tag{4.13}$$

$$\ln \pi(z) = [.5\alpha t + \ln(\alpha t) + 1.5 \ln T_0(\alpha t) - \ln T_1(\alpha t)] - .5t^2 - [.25\alpha^2 + 2\ln\alpha] \tag{4.14}$$

Let

$$B_1(\alpha t) = \ln T_0(\alpha t) + \alpha t \tag{C.1}$$

$$B_2(\alpha t) = .5\alpha t + \ln(\alpha t) + 1.5 \ln T_0(\alpha t) - \ln T_1(\alpha t) \tag{C.2}$$

These were calculated over the range of $\alpha t$ from zero to twenty. The results appear in Table C.1, and $B_2$ is plotted against $B_1$ with integer $\alpha t$ points indicated in Fig. C.1.

These equations were used to calculate the log of the ROC character for $\alpha = 2$. This is tabulated in the fourth and fifth

  αt     B1(αt)   B2(αt)      z      -ln π(z)    X(t)            z + .725   -(ln π(z) + 1.385)
  0       0        .6935    -2.000    1.6935     1.000           -1.275       .3085
   .2     .013     .709     -1.987    1.683       .995           -1.262       .298
   .4     .039     .734     -1.941    1.673       .980           -1.216       .288
   .6     .088     .792     -1.912    1.640       .956           -1.187       .255
   .8     .155     .848     -1.845    1.619       .923           -1.120       .234
  1.0     .235     .923     -1.765    1.589       .883           -1.040       .204
  1.2     .334    1.021     -1.666    1.546       .835           -0.941       .161
  1.4     .441    1.118     -1.559    1.514       .781           -0.834       .129
  1.6     .560    1.230     -1.440    1.477       .726           -0.715       .092
  1.8     .687    1.344     -1.313    1.448       .667           -0.588       .063
  2.0     .823    1.466     -1.177    1.421       .606           -0.453       .036
  2.2     .965    1.586     -1.035    1.406       .546           -0.310       .021
  2.4    1.113    1.712     -0.887    1.395       .487           -0.162       .010
  2.6    1.269    1.846     -0.731    1.386       .430           -0.006       .001
  2.8    1.426    1.974     -0.574    1.393       .375           +0.151       .008
  3.0    1.585    2.105     -0.415    1.407       .325           +0.310       .022
  4.0    2.425    2.744     +0.425    1.643       .135           +1.150       .258
  5.0    3.304    3.374      1.304    2.138       .048           +2.029       .753
  6.0    4.209    3.987      2.209    2.900       .0111           2.934      1.515
  7.0    5.126    4.508      3.126    4.004       .00218          3.851      2.619
  8.0    6.058    5.177      4.058    5.210       .000335         4.783      3.825
  9.0    7.000    5.760      5.000    6.752       .00004          5.725      5.367
 10.0    7.942    6.323      5.942    8.564       .00000373       6.667      7.179
 12.0    9.850    7.452      7.850   12.935       .0000000152     8.572     11.550
 15.0   12.739    9.117     10.739   21.395      6. x 10^-13     11.464     20.010
 18.0   15.641   10.731     13.641   32.156                      14.366     30.771
 20.0   17.59    11.795     15.59    40.592      1.93 x 10^-22   16.315     39.207

Table C.1 Calculated numbers

Fig. C.1. Comparison of intermediate calculation functions: $B_2(at)$ plotted against $B_1(at) = z + .5a^2$.

columns of Table C.1, and plotted in Fig. C.2 for the $z$ range of -2 to +5. This range of $z$ corresponds to the range of false alarm probabilities from $x = .00004$ to $x = 1.00$. This arc appears so parabolic that the calculations were continued to obtain a power-law fit to the arc. From Table C.1, columns four and five, it was estimated that the mode of $\ln\pi(z)$ occurred at $z = -.725$, at which point the value of $\ln\pi(z)$ was $-1.385$. If the character were truly parabolic, then it should obey the law

$$\ln\pi(z) = -1.385 - A(z + .725)^2 \qquad (C.3)$$

If so, then

$$-(\ln\pi(z) + 1.385) = A(z + .725)^2 \qquad (C.4)$$

A plot of $-(\ln\pi(z) + 1.385)$ against $|z + .725|$ should then be a straight line of slope 2 on log-log paper. These last two quantities were calculated; they are listed in columns seven and eight of Table C.1 and plotted in Fig. C.3. The plottable ranges for Fig. C.3 are -2 to -.887 for $z$ below the mode of $z = -.725$, and -.415 to 15.59 for $z$ above the mode. Slight differences in the fit near the mode are inconsequential. The character above the mode is the more important arc, because it covers the false alarm range below $x = .325$. All plotted points above the mode are well fit by a single straight line. However, the slope is not two. The end points of this straight line are given in Table C.2.

Fig. C.2. ROC character, Case II, $a = 2$.

Fig. C.3. Graphic determination of a power law relation: $-[\ln\pi(z) + 1.385]$ plotted against $|z + .725|$ on log-log paper for $-.415 < z < +15.59$, with a line of slope 2 shown for comparison.

  z + .725    -(ln π(z) + 1.385)    x(z)
   0.310        0.022               0.325
  16.315       39.207               1.93 × 10^-22

Table C.2. End points of approximation

This straight line has the equation

$$-\ln\pi(z) - 1.385 = .20\,|z + .725|^{1.885} \qquad (C.5)$$

from which it follows that

$$\pi(z) = e^{-1.385}\, e^{-.2|z + .725|^{1.885}} \qquad (C.6)$$

Equation C.6 may also be written

$$\pi(z) = .25\, e^{-.2|z + .725|^{1.885}} \qquad (C.7)$$

This may reasonably be considered an approximation to the $a = 2$ Case II ROC character, since it also fits the mode region well. The conclusion is Eq. 4.15:

$$\pi(z) \approx .25\, e^{-.2|z + .725|^{1.885}}, \qquad z > -2 \qquad (C.8)$$
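The constants in Eq. C.5 can be checked from the two end points of Table C.2, since a straight line on log-log paper is a power law. The sketch below is illustrative only; the function names `power_law_through` and `character_approx` are hypothetical, and the fit is taken through the two tabulated end points rather than read graphically as in the text.

```python
import math

# End points of the straight line, read from Table C.2:
# (|z + .725|, -(ln pi(z) + 1.385))
U1, V1 = 0.310, 0.022
U2, V2 = 16.315, 39.207

def power_law_through(u1, v1, u2, v2):
    """Return (coef, slope) of v = coef * u**slope passing through both points."""
    slope = math.log(v2 / v1) / math.log(u2 / u1)    # slope on log-log paper
    coef = v1 / u1 ** slope
    return coef, slope

def character_approx(z, coef=0.20, slope=1.885, mode=-0.725, ln_peak=-1.385):
    """Eq. C.6: pi(z) = exp(ln_peak) * exp(-coef * |z - mode|**slope)."""
    return math.exp(ln_peak - coef * abs(z - mode) ** slope)

coef, slope = power_law_through(U1, V1, U2, V2)

if __name__ == "__main__":
    print(f"slope = {slope:.3f}, coefficient = {coef:.3f}")
    print(f"peak value pi(-.725) = {character_approx(-0.725):.4f}")
```

The slope obtained this way is close to 1.885 and the coefficient close to .20, and $e^{-1.385}$ rounds to the factor .25 of Eq. C.7.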

APPENDIX D

DETAILS OF AN APPROXIMATION TECHNIQUE FOR SECTION 4.3

This appendix contains a demonstration of how the ROC character can be used to obtain an approximation for the "Case 3: Noise in Noise" ROC curve. The special restrictions are that $WT$ is large, and the signal-to-noise ratio, $S/N$, is small. Some idea of the magnitude of the various terms involved in the ROC character must be obtained. The first of these is the lower bound, $z_0$. The first several terms of the standard approximation for the logarithm yield

$$z_0 = -WT\ln\left(1 + \frac{S}{N}\right) \approx -WT\,\frac{S}{N}\left(1 - \frac{S}{2N} + \cdots\right) \qquad (D.1)$$

The lower bound $z_0$ is negative, and its magnitude is the product of the large quantity $WT$ times a small quantity $S/N$. For detectability to be reasonably good the range of the nonzero part of the ROC character must extend to large values of $z$. Although large positive values might appear to be sufficient to have a good ROC curve for low values of false alarm, one cannot have large positive values for the ROC character without having large negative values also. (The expected value of $z$ is negative under the condition N.) Make the further assumption that $z_0$ has large magnitude. The logarithm of the ROC character

$$\ln\pi(z) = (WT - 1)\ln(z - z_0) - \left(\frac{N}{S} + .5\right)(z - z_0) + WT\ln\left[\frac{N}{S}\left(1 + \frac{S}{N}\right)^{.5}\right] - \ln\Gamma(WT) \qquad (D.2)$$

and its derivative

$$\frac{d}{dz}\ln\pi(z) = \frac{WT - 1}{z - z_0} - \left(\frac{N}{S} + .5\right) = -\left(\frac{N}{S} + .5\right)\frac{z - z_m}{z - z_0} \qquad (D.3)$$

are going to be the tools in determining the nature of approximation to make for this ROC curve. The quantity $z_m$ in Eq. D.3 is the mode of the ROC character.

$$z_m = z_0 + \frac{WT - 1}{\frac{N}{S} + .5} = z_0 + (WT - 1)\frac{S}{N}\left(1 - \frac{S}{2N} + \frac{S^2}{4N^2} - \cdots\right) \qquad (D.4)$$

Since $z_0$ is roughly $-WT\,S/N$ and the mode $z_m$ is $(WT - 1)\,S/N$ above it, $z_m$ must be in the neighborhood of $S/N$, that is, very close to zero. Equation D.3, with the mode set at zero, is very similar to a normal ROC character if the denominator changes very little. To formalize this, expand Eq. D.3 in a power series in $z$ about the mode, $z_m$:

$$\frac{z - z_m}{z - z_0} = \frac{z - z_m}{(z_m - z_0) + (z - z_m)} = \frac{z - z_m}{z_m - z_0}\sum_{k=0}^{\infty}(-1)^k\left(\frac{z - z_m}{z_m - z_0}\right)^k, \qquad |z - z_m| < z_m - z_0 \qquad (D.5)$$

For $z$ very close to $z_m$, neglect all except the first term:

$$\frac{d}{dz}\ln\pi(z) \approx -\left(\frac{N}{S} + .5\right)\frac{z - z_m}{z_m - z_0} = -\left(\frac{N}{S} + .5\right)^2\frac{z - z_m}{WT - 1} \qquad (D.6)$$

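Since $S \ll N$,

$$z_0 \approx -WT\,\frac{S}{N} \qquad (D.7)$$

$$z_m - z_0 \approx (WT - 1)\frac{S}{N} \qquad (D.8)$$

$$\frac{d}{dz}\ln\pi(z) \approx -\left(\frac{N}{S}\right)^2\frac{z - z_m}{WT - 1} \qquad (D.9)$$

For a normal ROC character

$$\frac{d}{dz}\ln\pi_N(z) = -\frac{z}{d} \qquad (D.10)$$

Can we treat the small $S/N$ case as a normal ROC? The matching index is

$$d = (WT - 1)\left(\frac{S}{N}\right)^2 \qquad (D.11)$$

To answer this we must determine how well the entire ROC matches; specifically, how the values at the mode match for the same index. Calculate $\ln\pi(z_m)$.

$$\ln\pi(z_m) = (WT - 1)\ln(z_m - z_0) - \left(\frac{N}{S} + .5\right)(z_m - z_0) + WT\ln\left[\frac{N}{S}\left(1 + \frac{S}{N}\right)^{.5}\right] - \ln\Gamma(WT) \qquad (D.2)$$

$$(WT - 1)\ln(z_m - z_0) = (WT - 1)\ln\left[\frac{WT - 1}{\frac{N}{S} + .5}\right] = (WT - 1)\ln(WT - 1) + (WT - 1)\left[\ln\frac{S}{N} - \ln\left(1 + \frac{S}{2N}\right)\right] \qquad (D.12)$$

Now $(WT - 1)\ln(WT - 1)$ is closely related to $\ln\Gamma$. Specifically,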

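Eq. 6.1.40 of Ref. 6 gives

$$\ln\Gamma(k) \approx (k - .5)\ln k - k + \ln\sqrt{2\pi} + \cdots \qquad (D.13)$$

hence

$$(WT - 1)\ln(WT - 1) = .5\ln(WT - 1) + (WT - 1.5)\ln(WT - 1) \approx .5\ln(WT - 1) + \ln\Gamma(WT - 1) + (WT - 1) - \ln\sqrt{2\pi} \qquad (D.14)$$

Collecting,

$$(WT - 1)\ln(z_m - z_0) \approx .5\ln(WT - 1) + \ln\Gamma(WT - 1) + (WT - 1) - \ln\sqrt{2\pi} + (WT - 1)\left[\ln\frac{S}{N} - \ln\left(1 + \frac{S}{2N}\right)\right] \qquad (D.15)$$

Next,

$$-\left(\frac{N}{S} + .5\right)(z_m - z_0) = -(WT - 1) \qquad (D.16)$$

Hence

$$\ln\pi(z_m) \approx .5\ln(WT - 1) + \ln\Gamma(WT - 1) - \ln\Gamma(WT) - \ln\sqrt{2\pi} + (WT - 1)\left[\ln\frac{S}{N} - \ln\left(1 + \frac{S}{2N}\right)\right] + WT\left[\ln\frac{N}{S} + .5\ln\left(1 + \frac{S}{N}\right)\right] \qquad (D.17)$$

Now

$$\Gamma(WT) = (WT - 1)\,\Gamma(WT - 1) \qquad (D.18)$$

so

$$-\ln\Gamma(WT) = -\ln(WT - 1) - \ln\Gamma(WT - 1) \qquad (D.19)$$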

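Therefore

$$\ln\pi(z_m) \approx -.5\ln(WT - 1) - \ln\sqrt{2\pi} + \ln\frac{N}{S} - (WT - 1)\ln\left(1 + \frac{S}{2N}\right) + .5\,WT\ln\left(1 + \frac{S}{N}\right) \qquad (D.20)$$

The first three terms may be written as

$$-\ln\left[\sqrt{2\pi(WT - 1)}\;\frac{S}{N}\right] = -\ln\sqrt{2\pi d} \qquad (D.21)$$

Expand the last two terms in power series:

$$-(WT - 1)\ln\left(1 + \frac{S}{2N}\right) = -(WT - 1)\left[\frac{S}{2N} - \frac{S^2}{8N^2} + \frac{S^3}{24N^3} - \frac{S^4}{64N^4} + \cdots\right] \qquad (D.22)$$

$$.5\,WT\ln\left(1 + \frac{S}{N}\right) = .5\,WT\left[\frac{S}{N} - \frac{S^2}{2N^2} + \frac{S^3}{3N^3} - \frac{S^4}{4N^4} + \cdots\right] \qquad (D.23)$$

Adding, and collecting in powers of $S/N$:

$$\frac{S}{2N}\left[-(WT - 1) + WT\right] = \frac{S}{2N}$$

$$\frac{S^2}{8N^2}\left[(WT - 1) - 2WT\right] = -(WT + 1)\frac{S^2}{8N^2}$$

$$\frac{S^3}{24N^3}\left[-(WT - 1) + 4WT\right] = (3WT + 1)\frac{S^3}{24N^3}$$

$$\frac{S^4}{64N^4}\left[(WT - 1) - 8WT\right] = -(7WT + 1)\frac{S^4}{64N^4} \qquad (D.24)$$

Carefully review the assumed order of magnitude of the variables.

(1) $S/N$ is quite small, $S/N \ll 1$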

If $d$ is to be the detectability index, it should range from .5 to 50. Hence

(2) $(WT - 1)(S/N)^2$ is between .5 and 50

(1) and (2) mean

(3) $(WT - 1)\,S/N$ is very large

Putting these all together means the second term in the above collection from the two power series is outstandingly large compared to the rest. It is

$$-(WT + 1)\frac{S^2}{8N^2} = -(WT - 1)\frac{S^2}{8N^2} - \frac{S^2}{4N^2} \approx -\frac{d}{8} \qquad (D.25)$$

Gathering all together,

$$\ln\pi(z_m) \approx -\ln\sqrt{2\pi d} - \frac{d}{8} \qquad (D.26)$$

To summarize the approximation technique: the Pearson ratio, the derivative of the logarithm of the ROC character, under the assumption that the range of the variable was substantial, suggested a match with a normal ROC curve. By matching values for the derivative of the logarithm of the ROC character, a relation between what would have to be the matching normal index and the parameters of this problem was obtained (Eq. 4.38). Since a metastatic transformation of a normal ROC curve would also give this same matching value, it was necessary to calculate an actual value on the height of the ROC character, or correspondingly, its logarithm. Using the standard power series and approximation-piled-on-approximation technique common in applied

mathematics, such a calculation of the magnitude of the ROC character at its mode value was obtained. This is Eq. D.26. This maximum value of the ROC character matched that for the normal. Since the mode is extremely close to zero, the logarithm of the ROC character is parabolic in the neighborhood of zero and the magnitude of the ROC character corresponds to the width parameter in the parabola in exactly the same way that a normal character behaves. We conclude that the Pearson III characters are essentially normal in the limiting case studied.
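The quality of the match in Eq. D.26 can be examined numerically. The sketch below evaluates the logarithm of the Pearson III character at its mode from Eq. D.2 and compares it with the normal value $-\ln\sqrt{2\pi d} - d/8$. It is an illustration under stated assumptions: the additive constant of Eq. D.2 is taken in the form $WT[\ln(N/S) + .5\ln(1 + S/N)]$ used in Eq. D.17, the values $WT = 10001$ and $S/N = .03$ are hypothetical, and the function names are not from the report.

```python
import math

def ln_char_at_mode(wt, snr):
    """ln pi(z_m) from Eq. D.2 evaluated at the mode of Eq. D.4; snr is S/N."""
    rate = 1.0 / snr + 0.5                         # N/S + .5
    y_m = (wt - 1.0) / rate                        # z_m - z_0, Eq. D.4
    ln_const = wt * (math.log(1.0 / snr) + 0.5 * math.log1p(snr))
    return (wt - 1.0) * math.log(y_m) - rate * y_m + ln_const - math.lgamma(wt)

def ln_normal_at_mode(d):
    """Right-hand side of Eq. D.26 for a normal ROC character of index d."""
    return -0.5 * math.log(2.0 * math.pi * d) - d / 8.0

WT, SNR = 10001.0, 0.03                            # hypothetical: WT large, S/N small
D_INDEX = (WT - 1.0) * SNR ** 2                    # matching index, Eq. D.11

if __name__ == "__main__":
    exact = ln_char_at_mode(WT, SNR)
    approx = ln_normal_at_mode(D_INDEX)
    print(f"d = {D_INDEX:.3f}, exact = {exact:.4f}, normal = {approx:.4f}")
```

For these values the index is $d = 9$, and the two logarithms differ by a few hundredths, which is the size of the neglected terms in Eq. D.24.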

APPENDIX E

A CONIC ROC COMPUTER PROGRAM

This appendix contains the details of a computer program mentioned in Section 5.4. A regular conic ROC is specified by its minimum slope (called LLO), maximum slope (called LHI), and one super-chance point (X0, Y0). These three items of information are the input information to the program. A simplified computer flow diagram is shown in Fig. E.1. The equations for each computation block constitute the remainder of this appendix.

SETUP

Certain quantities are used repeatedly and do not depend on the input data. These are computed first.

$Z(I) = .1(I - 100)$, $1 \le I \le 200$
$X(1) = 10^{-4}$
$X(2) = 10^{(\text{exponent illegible})}$
$X(J) = .02J - .05$, $3 \le J \le 52$
$H(J) = \ln\dfrac{X(J)}{1 - X(J)}$, $1 \le J \le 52$

CHECK

The point (X0, Y0) must lie between the chance diagonal and the Luce ROC. The program is written for the ROC curve slope range (.00005, 22000). If any of the following inequalities is TRUE, a transfer to "GOOF" occurs.

Fig. E.1. Simplified flow diagram for a conic ROC program.

[illegible comparison of X0 with Y0]
$Y0 > (LHI)\,X0$
$(1 - Y0) < (LLO)(1 - X0)$
$LHI > 22000$
$LLO < .00005$

CONSTS

The constants in the algebraic equation for the ROC and several auxiliary constants are computed.

$ZLO = \ln(LLO)$
$ZHI = \ln(LHI)$
$ILO$ = (integer part of) $10(ZLO) + 101$; whenever this is less than 1, set $ILO = 1$
$IHI$ = (integer part of) $10(ZHI) + 100$; whenever this is greater than 200, set $IHI = 200$
$C1 = (LHI + LLO - 2)/(1 - LLO)$
$C2 = (1 - (LLO)(LHI))/(1 - LLO)$
$C3 = -2/(Y0 - X0)^2$
$A = C3\,(C1 \cdot X0 \cdot Y0 + C2 \cdot X0^2 + Y0 - (LHI)X0)$
$B = C1 - A$
$C = 2\,C2 + A$
$D = 1$
$E = -(LHI)$
$\Delta = B^2 - AC$
$K = \sqrt{AE^2 - 2BDE + CD^2}$
$C5 = 2(B + A(LHI))$

LINLIN

The ROC curve appropriate for plotting on linear-linear graph paper is computed, for the x points precomputed in SETUP. The slope of the ROC curve, the likelihood ratio, is also computed.

$X = X(J)$, $1 \le J \le 52$
$Y = (\sqrt{\Delta X^2 + C5 \cdot X + 1} - BX - 1)/A$
$Y(J) = Y$
$L = (-E - CX - BY)/(D + BX + AY)$

The horizontal coordinate X is called from memory to compute the vertical coordinate Y and the slope L. X, Y and L are then printed out and Y is stored for later use. L is not stored, nor does it correspond to the stored list L(I).

PIOFZ

This computes the ROC character. The general equation for the conic ROC character is

$$\pi(z) = k\,(Ae^{z} + 2B + Ce^{-z})^{-1.5}$$

The computational equations are

$T = AL + 2B + C/L$
$CHAR = K/(T\sqrt{T})$

for $L = LLO$; $L(I)$, $ILO \le I \le IHI$; $LHI$. After each character value is computed, L, Z, and the character value are printed.

LORLOR

The log-odds-ratio coordinate for any probability $p$ is $\ln(p/(1 - p))$. This was computed for the x coordinate in SETUP. The y values were calculated in LINLIN.

$H = H(J)$
$V = \ln\dfrac{Y(J)}{1 - Y(J)}$, $1 \le J \le 52$
$MPRIME = V - H$

The program terminates (on the University of Michigan machine) when the READ DATA statement fails to encounter new data.
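The CONSTS, LINLIN and PIOFZ blocks can be sketched in a few lines. This is an illustration, not the original MAD program: it assumes the conic is $Ay^2 + 2Bxy + Cx^2 + 2Dy + 2Ex = 0$ with the constants as reconstructed above, it assumes $A \ne 0$, and the function names and the sample input (LLO = .25, LHI = 4, point (.2, .6)) are hypothetical.

```python
import math

def conic_constants(llo, lhi, x0, y0):
    """CONSTS block: constants of the conic through (0,0), (x0,y0), (1,1)."""
    c1 = (lhi + llo - 2.0) / (1.0 - llo)
    c2 = (1.0 - llo * lhi) / (1.0 - llo)
    c3 = -2.0 / (y0 - x0) ** 2
    a = c3 * (c1 * x0 * y0 + c2 * x0 ** 2 + y0 - lhi * x0)   # assumed nonzero
    b = c1 - a
    c = 2.0 * c2 + a
    d, e = 1.0, -lhi
    return {
        "A": a, "B": b, "C": c, "D": d, "E": e,
        "DELTA": b * b - a * c,
        "K": math.sqrt(a * e * e - 2.0 * b * d * e + c * d * d),
        "C5": 2.0 * (b + a * lhi),
    }

def roc_point(k, x):
    """LINLIN block: ROC ordinate y and slope L (likelihood ratio) at abscissa x."""
    y = (math.sqrt(k["DELTA"] * x * x + k["C5"] * x + 1.0) - k["B"] * x - 1.0) / k["A"]
    slope = (-k["E"] - k["C"] * x - k["B"] * y) / (k["D"] + k["B"] * x + k["A"] * y)
    return y, slope

def character(k, slope):
    """PIOFZ block: ROC character at likelihood ratio L = exp(z)."""
    t = k["A"] * slope + 2.0 * k["B"] + k["C"] / slope
    return k["K"] / (t * math.sqrt(t))

if __name__ == "__main__":
    consts = conic_constants(0.25, 4.0, 0.2, 0.6)
    for x in (0.0, 0.2, 0.5, 1.0):
        y, slope = roc_point(consts, x)
        print(f"x={x:4.2f}  y={y:6.4f}  L={slope:6.4f}  char={character(consts, slope):6.4f}")
```

With this input the curve starts at the origin with slope LHI, passes through (X0, Y0), and ends at (1, 1) with slope LLO, which gives a quick check on the constants.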

APPENDIX F

BINORMAL COMPUTER PROGRAMS

This appendix contains two programs for finding regular ROC curves related to an irregular binormal ROC curve. Both assume that the slope, S, is between zero and one and that the quality, Q (the d' value at Y = .50), is positive. The first program searches for the point of tangency of the ROC curve and the tangent inferior. It then performs "external rectification" by adjoining the tangent inferior to the regular part of the ROC curve. The printout for the regular part of the ROC curve consists of the ROC linear coordinates, the slope L, its natural logarithm z, the slopes of the secants from (0, 0) and (1, 1) to the ROC point, and the ROC character. The second program operates on the decision axis inferred by the original ROC. The likelihood ratio decision is essentially a two-tail test whereas the original ROC is a one-tail test. The program computes the original ROC, the lower tail contributions, the total two-tail ROC curve, and the ROC character. Figure F.1 is the computer flow diagram for the external rectification program. The details of the printed comments and equations are as follows:

Fig. F.1. Flow diagram for external rectification.

START1

The program title reads: REGULARIZED BINORMAL ROC PROGRAM NUMBER ONE. BINORMAL ROCS PLOT AS STRAIGHT LINES WITH SLOPE LESS THAN ONE ON NORMAL-NORMAL PAPER. IF NOT REGULARIZED THEY WOULD STOP SHORT OF (1, 1) AT Z = Z0. TO BE REGULARIZED THEY ARE CONTINUED ALONG THE TANGENT INFERIOR TO (1, 1). THE INPUT DATA ARE IN TERMS OF THE ROC ON NORMAL-NORMAL PAPER. Q IS THE VALUE OF DPRIME AT .50 DETECTION, AND S IS THE SLOPE.

SETUP

$TN(I) = .1(I - 25)$, $1 \le I \le 63$
$X(I) = \Phi(TN(I))$

This computes the normal argument from -2.4 to +3.8, and the associated false alarm probability range is .0082 to .999928. These points are uniformly spaced on normal-normal paper. The University of Michigan IBM 7090 MAD Language does not have a $\Phi(\;)$ subroutine. The actual equation used is

$\Phi(t) = .5 + .5*\mathrm{ERF.}(t/\sqrt{2})$

ENDPT

The EPT comment reads: THE UNREGULARIZED BINORMAL ROC WITH SLOPE S AND .50 DETECTION DPRIME Q WOULD RUN FROM (0, 0) TO THE POINT OF MINIMUM SLOPE. Some constants, and the coordinates of the point of minimum slope, are then calculated.

Whenever $S > 1$, $S = 1/S$
$A = SQ/(1 - S^2)$
$B = (1 - S^2)/2S^2$
$S1 = \sqrt{S}$
$S2 = 1/S1$
$Z0 = \ln S - QSA/2$
$L0 = e^{Z0}$
$X0 = \Phi(SA)$
$Y0 = \Phi(A)$

REGROC

The slope of the ROC curve, L, will usually fall below the secant inferior before $x = .999928$. This program assumes that 0.1 steps in normal argument is sufficiently fine quantization that the tangent inferior corresponds to the secant inferior which first falls above L. ROC curve points are computed until this occurs, or until I = 63.

$TN = TN(I)$
$TSN = S(TN + Q)$
$C = .39894228\;\mathrm{EXP.}(-(TN^2 + TSN^2)/4)$
$PI = C/(S1 \cdot TSN - S2 \cdot TN)$
$X = X(I)$
$Y = \Phi(TSN)$
$Z = Z0 + B(A - TSN)^2$
$L = e^{Z}$
$SECSUP = Y/X$
$SECINF = (1 - Y)/(1 - X)$

If the slope of the ROC curve is still above the secant inferior by $x = .999928$, then the computation terminates and the following OUT comment is printed: IN THE ABOVE COMPUTATION, NO REGULARIZATION WAS USED.
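The REGROC search, together with the tangent inferior continuation that the program applies afterward ($Y = 1 - T + TX$, with T the secant inferior at the point of tangency), can be sketched as follows. This is a minimal illustration assuming $0 < S < 1$ and the 0.1 step in normal argument described above; the function names and the sample input S = .5, Q = 1 are hypothetical.

```python
import math

def phi(t):
    """Normal distribution function, written with erf as in the program."""
    return 0.5 + 0.5 * math.erf(t / math.sqrt(2.0))

def external_rectification(s, q):
    """Return (points, tangent): the 63 (x, y) points of the regularized ROC and
    the tangent inferior slope (None if no regularization was needed)."""
    if s > 1.0:
        s = 1.0 / s
    a = s * q / (1.0 - s * s)
    b = (1.0 - s * s) / (2.0 * s * s)
    z0 = math.log(s) - q * s * a / 2.0
    points, tangent = [], None
    for i in range(1, 64):
        tn = 0.1 * (i - 25)                       # normal argument, -2.4 to +3.8
        x = phi(tn)
        if tangent is None:
            tsn = s * (tn + q)
            y = phi(tsn)
            slope = math.exp(z0 + b * (a - tsn) ** 2)   # L = exp(Z)
            secinf = (1.0 - y) / (1.0 - x)
            if slope < secinf:                    # ROC slope fell below secant inferior
                tangent = secinf
        if tangent is not None:
            y = 1.0 - tangent + tangent * x       # straight line into (1, 1)
        points.append((x, y))
    return points, tangent

if __name__ == "__main__":
    pts, t = external_rectification(0.5, 1.0)
    print("tangent inferior slope:", t)
    print("last point:", pts[-1])
```

As in the program, the point at which the switch occurs is recomputed on the tangent line, which leaves its ordinate unchanged.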

TANINF

The TAN comments read: BINORMAL ROC IS IRREGULAR FROM HERE ON. TANGENT INFERIOR PORTION FOLLOWS. The loop index, J, for this computation is initiated at the last used value for the REGROC loop index, I. This repeats the computation for that one point, and then continues until $x = .999928$.

$X = X(J)$
$Y = 1 - T + TX$
$SECSUP = Y/X$

The program terminates (on the University of Michigan machine) when the READ DATA statement fails to encounter new data. The flow diagram for the internal rectification program is given in Fig. F.2.

FRONT

The front page of each program bears the following program title: REGULARIZED BINORMAL ROC PROGRAM NUMBER TWO. BINORMAL ROCS PLOT AS STRAIGHT LINES WITH SLOPE LESS THAN ONE ON NORMAL-NORMAL PAPER. IF NOT REGULARIZED THEY HAVE A POINT OF MINIMUM SLOPE SHORT OF (1, 1). TO BE REGULARIZED A NEW DECISION AXIS BASED ON LIKELIHOOD RATIO IS USED. THE INPUT DATA ARE IN TERMS OF THE ROC ON NORMAL-NORMAL PAPER. Q IS THE VALUE OF DPRIME AT .50 DETECTION, AND S IS THE SLOPE. TANGENT INFERIOR P(A/SN) IS GIVEN VS P(A/N) OF THE FINAL ROC.

Fig. F.2. Flow diagram for internal rectification.

START

In addition to the data Q and S, the input data contains an integer, No., and a word, type. To obtain one curve, set No. = 1. To obtain N parallel curves, set No. = N, and type = SAME. To obtain N ROC curves that have a common chance diagonal point, set No. = N, type = FAN. (Such a "fan" is a family of ROC curves modeled as a zero-mean unit-variance noise, with SN distribution normal with mean Q and standard deviation $1 + k_{JAS}Q$.) If a number of curves are calculated, Q will be used both as the initial value and as the incremental Q value.

TYPING

The SAME comment reads: "I SHALL NOW COMPUTE A NUMBER OF CURVES WITH IDENTICAL NORMAL-NORMAL SLOPE." It is followed by the values of S, A, No. If the type is FAN,

$T_{Pt} = SQ/(1 - S)$
$X_{Pt} = \Phi(T_{Pt})$
$k_{JAS} = 1/T_{Pt}$

The FAN comment reads: "I SHALL NOW COMPUTE A FAN OF CURVES THROUGH THE COMMON CHANCE DIAGONAL POINT." It is followed by the values of $X_{Pt}$, A, No., $k_{JAS}$.

NLOOP

The equations in the "compute constants" block are

$B = 1 - S^2$
$Z0 = \ln S - S^2Q^2/2B$
$L0 = e^{Z0}$
$DPDET = Q$
$DPSUBE = 2SQ/(1 + S)$

$DPFA = QS$
$T0 = S^2Q/B$
$C = 2\,T0$

BANNER

The page heading reads: "THE SLOPE ON NORMAL-NORMAL PAPER, AND THE DPRIME VALUES OF THE UNREGULARIZED ROC AT .50 DETECTION, NEGATIVE DIAGONAL, AND .50 FALSE ALARM ARE." This is followed by the values of S, DPDET, DPSUBE, DPFA, then followed by Z0 and L0.

MAIN

The MAIN equations are

$T2 = -T1 - C$
$T3 = S(T1 - Q)$
$T4 = S(T2 - Q)$
$T5 = T0 + T1$
$XORIG = \Phi(-T1)$
$YORIG = \Phi(-T3)$
$XADD = \Phi(T2)$
$YADD = \Phi(T4)$
$X = XORIG + XADD$
$Y = YORIG + YADD$
$Z = Z0 + .5\,B \cdot T5^2$
$PI = .39894228\, e^{.5Z}\left[e^{-.5\,T1^2} + e^{-.5\,T2^2}\right]/(B \cdot T5)$
$YTAN = 1 - L0 + L0 \cdot X$

These programs were used to obtain regularized binormal examples for Chapter VIII.
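The MAIN equations amount to a two-tail test on the original decision axis: the response is "signal" when the observation is above T1 or below T2, the two criteria being placed symmetrically about the point of minimum likelihood ratio, -T0. A minimal sketch follows, under the assumptions that noise is normal with zero mean and unit variance, that $0 < S < 1$, and that $T1 > -T0$; the function names and the sample values are hypothetical.

```python
import math

def phi(t):
    """Normal distribution function."""
    return 0.5 + 0.5 * math.erf(t / math.sqrt(2.0))

def two_tail_point(s, q, t1):
    """Return (x, y, z, pi) of the likelihood-ratio ROC for upper criterion t1."""
    b = 1.0 - s * s
    z0 = math.log(s) - s * s * q * q / (2.0 * b)
    t0 = s * s * q / b
    t2 = -t1 - 2.0 * t0                 # lower criterion, C = 2*T0
    t3 = s * (t1 - q)
    t4 = s * (t2 - q)
    t5 = t0 + t1                        # distance of either criterion from -T0
    x = phi(-t1) + phi(t2)              # XORIG + XADD
    y = phi(-t3) + phi(t4)              # YORIG + YADD
    z = z0 + 0.5 * b * t5 * t5          # log likelihood ratio at the criteria
    pi = (0.39894228 * math.exp(0.5 * z)
          * (math.exp(-0.5 * t1 * t1) + math.exp(-0.5 * t2 * t2)) / (b * t5))
    return x, y, z, pi

if __name__ == "__main__":
    for t1 in (0.0, 0.5, 1.0, 2.0):
        x, y, z, pi = two_tail_point(0.5, 1.0, t1)
        print(f"T1={t1:4.1f}  X={x:.4f}  Y={y:.4f}  Z={z:+.4f}  PI={pi:.4f}")
```

Because both criteria sit at the same likelihood ratio, the slope of the resulting ROC at each point equals $e^{Z}$, which gives a direct numerical check on the equations.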

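APPENDIX G

MANIPULATION OF BINOMIAL SUMS

Remark 1

$$X_{n,k} = \sum_{j=0}^{k} C^n_j\, x^{n-j}(1 - x)^j = (-1)^k (n - k)\, C^n_k \sum_{m=0}^{k} (-1)^m C^k_m\, \frac{x^{n-m}}{n - m} \qquad (G.1)$$

Proof:

$$X_{n,k} = \sum_{j=0}^{k} C^n_j\, x^{n-j} \sum_{m=0}^{j} C^j_m (-x)^{j-m} = \sum_{j=0}^{k} \sum_{m=0}^{j} (-1)^{j-m}\, C^n_j C^j_m\, x^{n-m}$$

Now

$$C^n_j C^j_m = \frac{n!}{j!\,(n-j)!} \cdot \frac{j!}{(j-m)!\,m!} = \frac{n!}{m!\,(n-m)!} \cdot \frac{(n-m)!}{(j-m)!\,((n-m)-(j-m))!} = C^n_m C^{n-m}_{j-m} \qquad (G.2)$$

Let $r = j - m$ replace $j$. Then $0 \le m \le j \le k$ becomes $0 \le m \le k$ and $0 \le r \le k - m$, so

$$X_{n,k} = \sum_{m=0}^{k} C^n_m\, x^{n-m} \sum_{r=0}^{k-m} (-1)^r C^{n-m}_r$$

But Eq. 24.4 of Ref. 6 is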

$$\sum_{r=0}^{k-m} (-1)^r C^{n-m}_r = (-1)^{k-m}\, C^{n-m-1}_{k-m}$$

We may use Eq. G.2 in the following:

$$C^n_m\, C^{n-m-1}_{k-m} = \frac{n-k}{n-m}\, C^n_m\, C^{n-m}_{k-m} = \frac{n-k}{n-m}\, C^n_k\, C^k_m$$

Hence

$$X_{n,k} = (-1)^k (n - k)\, C^n_k \sum_{m=0}^{k} (-1)^m C^k_m\, \frac{x^{n-m}}{n - m}$$

[The remaining equations of this appendix, through Eq. G.4 and the closing Q.E.D., are illegible in the source scan.]
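The binomial identities used in this appendix lend themselves to an exact numerical check. The sketch below verifies Eq. G.2, the alternating-sum identity quoted from Ref. 6, and a closed form for a partial binomial sum. It rests on an assumption: $X_{n,k}$ is taken to be $\sum_{j=0}^{k} C^n_j x^{n-j}(1-x)^j$, which is a reading of the damaged source rather than a confirmed definition, and the function names are hypothetical. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from math import comb

def partial_sum(n, k, x):
    """Assumed definition: X_{n,k} = sum_{j=0}^{k} C(n,j) x^(n-j) (1-x)^j."""
    return sum(comb(n, j) * x ** (n - j) * (1 - x) ** j for j in range(k + 1))

def closed_form(n, k, x):
    """(-1)^k (n-k) C(n,k) sum_{m=0}^{k} (-1)^m C(k,m) x^(n-m)/(n-m), for k < n."""
    s = sum((-1) ** m * comb(k, m) * x ** (n - m) / (n - m) for m in range(k + 1))
    return (-1) ** k * (n - k) * comb(n, k) * s

def check_g2(n, j, m):
    """Eq. G.2: C(n,j) C(j,m) = C(n,m) C(n-m, j-m)."""
    return comb(n, j) * comb(j, m) == comb(n, m) * comb(n - m, j - m)

def check_alternating(big_n, big_k):
    """sum_{r=0}^{K} (-1)^r C(N,r) = (-1)^K C(N-1,K), for N >= 1."""
    lhs = sum((-1) ** r * comb(big_n, r) for r in range(big_k + 1))
    return lhs == (-1) ** big_k * comb(big_n - 1, big_k)

if __name__ == "__main__":
    x = Fraction(1, 3)
    for n in range(1, 6):
        for k in range(n):
            print(n, k, partial_sum(n, k, x), closed_form(n, k, x))
```

The closed form is restricted to $k < n$, since the factor $n - k$ and the term with $m = n$ would otherwise meet at a zero denominator.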

REFERENCES

1. Peterson, W. W., Birdsall, T. G., and Fox, W. C., The theory of signal detectability. Trans. IRE Professional Group on Information Theory, 1954, PGIT-4, pp. 171-211. Also Peterson, W. W. and Birdsall, T. G., The theory of signal detectability. Electronic Defense Group Technical Report No. 13. Ann Arbor: Electronic Defense Group, The University of Michigan, 1953.
2. Pitt, H. R., Integration, measure and probability. Edinburgh and London: Oliver and Boyd, 1963.
3. Halsted, L. R., Birdsall, T. G., and Nolte, L. W., On the detection of a randomly distorted signal in Gaussian noise. Cooley Electronics Technical Report No. 129. Ann Arbor: Cooley Electronics Laboratory, The University of Michigan, 1962.
4. Roberts, R. A., Theory of signal detectability: composite deferred decision theory. Cooley Electronics Technical Report No. 161. Ann Arbor: Cooley Electronics Laboratory, The University of Michigan, 1965.
5. Cramer, H., Mathematical methods of statistics. Princeton University Press, 1946. First published in Sweden, Almquist and Wiksells, Uppsala, 1945.
6. Abramowitz, M. and Stegun, I. A. (Ed.), Handbook of mathematical functions with formulas, graphs, and mathematical tables. National Bureau of Standards, Department of Commerce, 1964.
7. Middleton, D., An introduction to statistical communication theory. New York, Toronto, London: McGraw-Hill, 1960.
8. Swets, J. A. (Ed.), Signal detection and recognition by human observers. Contemporary readings. New York, London, Sydney: John Wiley and Sons, 1964.
9. Quastler, H. (Ed.), Information theory in psychology, Problems and methods. Glencoe, Illinois: The Free Press, 1955.
10. Tanner, W. P., Jr., and Birdsall, T. G., Definitions of d' and η as psychophysical measures. JASA, 1958, Vol. 30, pp. 922-928. Also Ref. 8, p. 147.

11. Tanner, W. P., Jr., and Swets, J. A., The human use of information I. Trans. IRE Professional Group on Information Theory, 1954, PGIT-4, pp. 213-221.
12. Green, D. M., Psychoacoustics and detection theory. JASA, 1960, Vol. 32, pp. 1189-1203.
13. Tanner, W. P., Jr., Swets, J. A., and Green, D. M., Some general properties of the hearing mechanism. Electronic Defense Group Technical Report No. 30. Ann Arbor: Electronic Defense Group, The University of Michigan, 1956.
14. Pollack, I., On indices of signal and response discriminability. JASA, 1959, Vol. 31, pp. 1031.
15. Helstrom, C. W., Statistical theory of signal detection. New York: Pergamon Press, 1960.
16. Marcum, J. I., A statistical theory of target detection by pulsed radar. RAND Research Memo RM-754. Santa Barbara, California: The RAND Corporation, 1947. Also reprinted in IRE Trans. on Information Theory, 1960, Vol. IT-6, pp. 59-144.
17. Elliot, P. B., Tables of d'. Electronic Defense Group Technical Report No. 97. Ann Arbor: Electronic Defense Group, The University of Michigan, 1959. Also Ref. 8, pp. 651-684.
18. Blosser, A. B., A performance-oriented approach to detection: tables for detection, discrimination, and decision theory. Tracor Inc. Report No. 65-267-U. Austin, Texas: Tracor, Inc., 1965.
19. Rice, S. O., Mathematical analysis of random noise. BSTJ, 1945-1946, Vol. 23, pp. 282-332, Vol. 24, pp. 46-156.
20. Jeffress, L. A., Stimulus-oriented approach to detection. JASA, 1964, Vol. 36, pp. 766-774.
21. Birdsall, T. G. and Lamphiear, D. E., Approximations to the non-central chi-square distribution with applications to signal detection models. Electronic Defense Group Technical Report No. 101. Ann Arbor: Electronic Defense Group, The University of Michigan, 1960.
22. Ristenbatt, M. P. and Birdsall, T. G., ROC curves. Notes for EE 534. Ann Arbor: The University of Michigan, 1958.

23. Egan, J. P., Greenberg, G. Z., and Schulman, A. I., Operating characteristics, signal detectability, and the method of free response. JASA, 1961, Vol. 33, pp. 993-1007. Also Ref. 8, pp. 316-347.
24. Nolte, L. W., and Jaarsma, D., Detectability of recurrence phenomena. Cooley Electronics Laboratory Technical Report No. 179. Ann Arbor: Cooley Electronics Laboratory, The University of Michigan, 1966.
25. Birdsall, T. G. and Roberts, R. A., Theory of signal detectability: observation-decision procedures. Cooley Electronics Laboratory Technical Report No. 136. Ann Arbor: Cooley Electronics Laboratory, The University of Michigan, 1964.
26. Swets, J. A., Tanner, W. P., Jr., and Birdsall, T. G., The evidence for a decision-making theory of visual detection. Electronic Defense Group Technical Report No. 40. Ann Arbor: Electronic Defense Group, The University of Michigan, 1955.
27. Swets, J. A., Information retrieval systems. Science, 1963, Vol. 141, No. 3577, pp. 245-250.
28. Carver, H. C. (Ed.), Mathematical statistical tables. Ann Arbor: Edwards Brothers, Inc., 1950.
29. Green, D. M., General prediction relating yes-no and forced choice results. JASA, 1964, Vol. 36, p. 1042(A).
30. Parzen, E., Stochastic processes. San Francisco: Holden Day, Inc., 1962, p. 12.
31. Kullback, S. and Leibler, R. A., On information and sufficiency. Annals of Mathematical Statistics, 1951, Vol. 22, No. 1, pp. 79-86.
32. Egan, J. P., Message reception, operating characteristics, and confusion matrices in speech communication. Hearing and Communications Laboratory AFCRC TR-57-50. Bloomington: Hearing and Communications Laboratory, Indiana University, 1957. (AD-110064).
33. Watson, C. S., Rilling, M. E., and Bourbon, W. T., Receiver operating characteristics determined by a mechanical analog to the rating scale. JASA, 1964, Vol. 36, pp. 283-288.

34. Egan, J. P., Schulman, A. I., and Greenberg, G. Z., Operating characteristics determined by binary decisions and by ratings. JASA, 1959, Vol. 31, pp. 768-773.
35. Luce, R. D., A threshold theory for simple detection experiments. Psychol. Review, 1963, Vol. 70, pp. 61-79.
36. Faran, J. J., Jr. and Hills, R., Jr., Correlators for signal reception. Harvard Acoustical Laboratory Technical Memo No. 27. Cambridge: Harvard Acoustical Laboratory, Harvard University, 1952, pp. 58-65.


DOCUMENT CONTROL DATA - R & D

1. Originating Activity: Cooley Electronics Laboratory, The University of Michigan, Ann Arbor, Michigan 48105
2a. Report Security Classification: Unclassified
3. Report Title: The Theory of Signal Detectability: ROC Curves and Their Character
4. Descriptive Notes: Technical Report No. 177 - January 1973
5. Author: Theodore G. Birdsall
6. Report Date: January 1973
7a. Total No. of Pages: 450
7b. No. of Refs: 36
8a. Contract or Grant No.: N00014-67-A-0181-0032
9a. Originator's Report Number: 036040-16-T
9b. Other Report No.: TR177
12. Sponsoring Military Activity: Office of Naval Research, Department of the Navy, Arlington, Va. 22217

13. Abstract

The first problem in the theory of signal detectability deals with the decision between two alternative responses, corresponding to two possible classes of causes of an observation. When the goal of a decision process is to achieve the highest quality of terminal decision, the Receiver Operating Characteristic curve (ROC curve) contains all of the information necessary for the evaluation of the decision process. This present work introduces the ROC character, which is isomorphic to the ROC curve. The formal development is based on two key facts. The first is the fundamental theorem: if ℓ(X) is the likelihood ratio of an observation, then the likelihood ratio of ℓ is ℓ itself. The second is the main theorem on ROC characters: each ROC character is isomorphic to a univariate probability distribution that possesses a moment generating function. The character convolution theorem and the character addition theorem follow directly from these.

Families of ROC curves are developed from the main theorem on ROC characters. The normal, binormal, Q-table, power, and several discrete families of ROC curve have appeared in the literature. The new families include the Pearson type III, Fisher-Tippett doubly exponential, H-type, Poisson, and the regular conics. Additional families are generated from these by use of the metastatic relation, and the convolution and addition theorems. ROC curves contain information about the performance of other two-cause decisions besides the two-response decision. Several are considered that are used in the testing of human perception; namely, the symmetric forced choice decision, type II decisions, the rating scale procedure, and the analysis of a decision based on the reports of multiple observers.

14. Key Words: Receiver operating characteristic curve; Likelihood ratio; Signal detection theory; ROC characters; Detection-decision devices
