THE UNIVERSITY OF MICHIGAN
COLLEGE OF ENGINEERING
Department of Aeronautical and Astronautical Engineering

Technical Report

SINGULAR PROBLEMS IN THE DETECTION OF SIGNALS IN GAUSSIAN NOISE

William L. Root

ORA Project 02905

under contract with:
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
RESEARCH GRANT NsG-2-59
WASHINGTON, D.C.

Presented at the Symposium on Time Series, Brown University, June 11-14, 1962

administered through:
OFFICE OF RESEARCH ADMINISTRATION
ANN ARBOR

July 1962

ACKNOWLEDGMENT

I should like to acknowledge a debt to Dr. T. S. Pitcher for the use of some of his unpublished work. I have also benefitted from discussions of the mathematical material with Dr. Pitcher and Professor J. G. Wendel.

ABSTRACT

A statistical inference problem is called singular if the correct inference can be made with probability one. It has been observed by Grenander, Slepian, and others that the mathematical models used to describe the detection of radio signals in Gaussian noise sometimes appear to lead to singular inference problems. A large class of signal detection and extraction problems are examined here in the light of recent mathematical results, with the conclusion (which is necessarily a matter of opinion) that natural constraints prevent singular cases from arising. The mathematical background is the theory of equivalence or singularity of (Gaussian) measures on function spaces. A unified treatment of some of the work in this area is given, based largely on work of Kakutani, Grenander, Baxter, Slepian, Pitcher, and Feldman. In particular, a very slightly modified version of Pitcher's unpublished proof of the fundamental result (due to Feldman and Hájek) on pairs of Gaussian measures, and a drastically modified version of Feldman's theorem for the special case of rational spectral densities, are given in detail.

I. INTRODUCTION

In the statistical theory of signal detection one is concerned with problems occurring in electrical communication engineering involving statistical inference from stochastic processes. Most of the work in this area has been concerned with the theory of detecting or characterizing information-bearing signals immersed in noise with Gaussian statistics. The discussion here is concerned with (1) singular cases arising in this special class of problems, and (2) implications that these singular cases carry concerning the suitability of the formulation.

We start from the model

    y(t) = s(t) + n(t)     (1)

where t is a real variable, n(t) is a sample function from a real-valued Gaussian stochastic process {n(t)} which represents the noise, s(t) is a real-valued function representing the signal, and y(t) represents the observed waveform. We assume that y(t) is known to the observer, that s(t) is not precisely known, and that n(t) is not known but has certain known statistical properties. We want to make specified inferences about s(t) from the observation y(t).

The signal s(t) may be of the form f(t; α₁,...,αₙ), where the function f is known to the observer but the parameters α₁,...,αₙ are not. For example, in the simplest detection problem, s(t) = αf(t) where α = 0 or 1; the problem is then one of testing between two simple hypotheses concerning the

mean of a Gaussian process. If the parameters α₁,...,αₙ are real-valued, the problem may be one of point or interval estimation. All such problems in which f is known and the parameters are unknown we say are of the sure-signal-in-noise type. On the other hand, s(t) may itself be a sample function from a stochastic process, of which only certain statistics are known to the observer. If this is so we say the problem is of the noise-in-noise type. It is worth noting that there is also a sort of in-between case which occurs when s(t) = f(t; α₁,...,αₙ), where f is known and the αᵢ are random variables with known joint distribution. Properly, then, the signal is a sample function from a stochastic process {s(t)}; however, since the structure of {s(t)} is much better known than that of a process specified in the usual way through its family of joint distributions, it may be more appropriate to think of the resulting problem as sure-signal-in-noise than as noise-in-noise. As in any analysis of a physical problem, the choice of an appropriate mathematical model is somewhat arbitrary, and in particular there are situations described usefully by either a sure-signal or noise-in-noise model. Usually, in fact, such is true if the mechanism whereby the channel distorts the signal is very complicated (see, for example, the article by R. Price listed in the Bibliography).

In any event, whatever inferences are to be made from the observed waveform must be made after a finite time. If we except sequential testing procedures, we can usually fix a basic time interval, say of duration T, during which all the data are collected on which one decision or set of inferences

is made. This interval of duration T is called the observation interval; we shall be concerned here with problems for which there is a fixed observation interval, so that y(t) in Eq. (1) will be qualified by the statement 0 ≤ t ≤ T (or a ≤ t ≤ a + T). Note that s(t) or n(t) may be defined for other values of t, and we may want to see what happens when T is varied.

In any electrical system whatever there is a background of thermally generated noise (Johnson noise, shot noise, etc.) which is generally assumed to be representable by a stationary Gaussian stochastic process, both because it is a macroscopic manifestation of a great many tiny unrelated motions, and because of experimental evidence. It is this background noise which is represented by n(t) in Eq. (1). This noise is always present, although it may not be the chief source of uncertainty about the received waveform. Usually one assumes the autocorrelation of the process {n(t)} to be known (although it seems almost impossible that it could be known precisely), and the mean to be zero (which in the model of Eq. (1) is equivalent to assuming it known). Thus the entire family of finite-dimensional distributions for the {n(t)} process is taken to be available.

For convenience we shall call the class of detection theory problems characterized somewhat loosely above the Gaussian model. This term is to include both sure-signal-in-noise and noise-in-noise cases, and is to imply that {n(t)}, −∞ < t < ∞, is a stationary Gaussian process with known autocorrelation and that the observation interval is finite.

Various results obtained in the past few years show that there are classes of decision problems involving a model of the kind described for

which a correct decision, or correct inference, can be made with probability one. Such problems will here be called singular. Slepian pointed out in 1958 that the problem of testing between the two simple hypotheses that a waveform observed for a finite time be a sample function from a Gaussian process {x(t)} or from a different Gaussian process {x′(t)}, both of which are stationary and have known rational spectral density, is always singular except in a special case. From this he raised the question whether much of the noise-in-noise detection theory being developed was based on an adequate model; for it seems contrary to common sense that perfect detection of signals can be accomplished in a real-life situation. In 1950 Grenander had shown that a test between two possible mean-value functions of a Gaussian process with known statistics could be singular, even when the mean-value functions have finite "energy" (are of integrable square) and the observation period is finite. He also showed that the estimation of the "power level" of a Gaussian process with autocorrelation known except for scale is singular, again even with a finite observation interval. These results, which are quite simple, seem not to have been known, or at least not appreciated, by engineers working on noise-theory problems for some time after 1950. In an application of Grenander's work, however, Davis in 1955 gave a rationalization for excluding the singular cases in the problem of testing for the mean (a sure-signal-in-noise problem), and in 1958 Davenport and Root gave a different one (see Problem 146 in their book). Since Slepian's paper of 1958 there has been considerable interest in the appropriateness of the Gaussian model as it has been used in detection problems (see in particular the paper by Good).

I agree with the point of view that a well-posed detection theory problem should not yield a singular answer. With this as a sort of working principle, the aptness of the kind of model described above will be discussed in Section IV, where an argument is given that the Gaussian model is usually acceptable.

The detection problems deal with probability measures on infinite product spaces or on function spaces. They are singular, as the term is defined here, when the measures are relatively singular. Thus one is led to the subject of relatively singular measures on function spaces, and in particular to singular Gaussian measures. In Section II a few basic results in this area are collected, and in Section III some more specialized results applicable to detection theory are given. Proofs are given for some of the propositions. It is likely that singular measures on function spaces are of interest to some who have no interest in detection theory; for them the following material will perhaps be useful as an introductory survey.

II. EQUIVALENT AND SINGULAR GAUSSIAN MEASURES

Since the eventual interest here is in continuous-parameter random processes, while many of the techniques involved use representations of these processes in terms of denumerably many random variables, one sometimes needs to carry relationships between pairs of measures on a Borel field to their induced measures on a Borel subfield, and vice versa. What is required usually turns out to be trivial, or nearly so, but it seems worthwhile to establish a procedure once and for all. For this purpose two simple lemmas are stated first.

Let Ω be a set, 𝔅 a Borel field of subsets of Ω, and μ and ν probability measures on 𝔅. The probability measures μ and ν are mutually singular (or simply singular) if and only if there is a set A ∈ 𝔅 for which μ(A) = 0 and ν(Aᶜ) = 0. The condition μ, ν singular is denoted by μ ⊥ ν.

Consider a collection of Borel fields, each with base space Ω, and measures on these fields related to each other as follows. 𝔅 is a Borel field on which there are two probability measures μ, ν. The completion of μ we denote by μ̄, the completion of ν by ν̄, and the Borel fields of sets measurable with respect to μ̄ and ν̄ we denote by 𝔅_μ and 𝔅_ν, respectively. Let 𝔅₀ be a Borel field contained in both 𝔅_μ and 𝔅_ν, and let μ₀ and ν₀ be the measures induced on 𝔅₀ by μ̄ and ν̄, respectively. It follows directly from the definitions that:

1. If μ₀ ⊥ ν₀ then μ ⊥ ν.

Let 𝔅, μ, ν, μ̄, ν̄, 𝔅_μ, 𝔅_ν, 𝔅₀, μ₀ and ν₀ be defined as above. Suppose now,

however, that μ₀ is equivalent to ν₀ (μ₀ ~ ν₀). Let μ̄₀, ν̄₀ be the completions of μ₀, ν₀, respectively, and denote the Borel field of sets measurable with respect to either μ̄₀ or ν̄₀ by 𝔅₀′. Suppose further that 𝔅 ⊂ 𝔅₀′, and write μ′, ν′ for the measures induced on 𝔅 by μ̄₀, ν̄₀, respectively. Then one can readily verify that:

2. Under the hypotheses of the preceding paragraph, μ = μ′, ν = ν′, μ ~ ν, and 𝔅_μ = 𝔅_ν = 𝔅₀′.

The application of these lemmas is to situations such as the following. Suppose there are two real-valued random processes {x_t(ω)}, {y_t(ω)}, t ∈ T (a linear parameter set), ω ∈ Ω (an abstract set), such that the smallest Borel field containing all sets of the form {ω : x(t,ω) ∈ A}, A a Borel set, is the same as the corresponding Borel field containing all sets of the form {ω : y(t,ω) ∈ A}; call this common Borel field 𝔅. The probability measure on 𝔅 for the x-process is μ and for the y-process is ν. Suppose also there is a denumerable collection of random variables {x_k}, each of which is equal almost everywhere with respect to both μ and ν to a function measurable with respect to 𝔅, and representations for both {x_t} and {y_t} in terms of the x_k such that for every t, x_t and y_t are equal almost everywhere, dμ and dν respectively, to functions measurable with respect to the Borel field 𝔅₀ generated by the x_k. Then if it can be shown that the measures μ₀ and ν₀ induced on 𝔅₀ are equivalent, one has that the measures μ and ν are equivalent by Lemma 2. If the measures μ₀ and ν₀ are singular, then μ and ν are singular by Lemma 1.

Singularity and Equivalence of Product Measures

In the development to be sketched here we take as starting point a theorem of Kakutani on the equivalence or singularity of two probability measures, each of which is an infinite direct product of probability measures, pair by pair equivalent. Suppose μ and ν are equivalent measures defined on the same Borel field 𝔅 of sets from Ω; then we define

    ρ(μ,ν) = ∫ (dμ/dν)^{1/2} dν .

The function ρ(μ,ν) thus defined has the immediately verifiable properties: 0 < ρ(μ,ν) ≤ 1; ρ(μ,ν) = 1 if and only if μ = ν; ρ(μ,ν) = ρ(ν,μ).

Let 𝔐(𝔅) be the class of all probability measures on 𝔅. The definition of ρ(μ,μ′) may be extended so that ρ(μ,μ′) is defined for all μ, μ′ ∈ 𝔐(𝔅), as follows. Let ν ∈ 𝔐(𝔅) dominate μ and μ′ (i.e., μ << ν and μ′ << ν), and define

    ψ = (dμ/dν)^{1/2},  ψ′ = (dμ′/dν)^{1/2} .

Then ψ and ψ′ belong to the space L₂(ν), and

    ρ(μ,μ′) = (ψ, ψ′)     (2)

where the inner product indicated is the inner product for L₂(ν). One verifies easily that for arbitrary μ and μ′, (ψ, ψ′) has the same value irrespective of the dominating measure ν used in its definition. Hence, Eq. (2) may be used to define ρ(μ,μ′) for all μ, μ′ ∈ 𝔐(𝔅). With this extended definition it is clear that ρ(μ,μ′) = 0 if and only if μ ⊥ μ′. The basic theorem is then:

Theorem 1. (Kakutani) Let {mₙ} and {mₙ′} be two sequences of probability measures, where mₙ and mₙ′ are defined on a Borel field 𝔅ₙ of sets from a space Ωₙ, and mₙ ~ mₙ′. Then the infinite direct product measures m = ⊗_{n=1}^∞ mₙ and m′ = ⊗_{n=1}^∞ mₙ′ are either equivalent, m ~ m′, or mutually singular, m ⊥ m′, according as the infinite product ∏_{n=1}^∞ ρ(mₙ, mₙ′) is greater than zero or equal to zero. Moreover,

    ρ(m, m′) = ∏_{n=1}^∞ ρ(mₙ, mₙ′) .

The theorem is proved by imbedding 𝔐(𝔅) in a Hilbert space in which ordinary strong convergence is equivalent to a kind of convergence of the products of the derivatives dmₙ′/dmₙ. The completeness of the Hilbert space guarantees the existence of a limit element which corresponds to the derivative of the infinite product measures, in the case of convergence. The imbedding is accomplished by defining a metric with the aid of Eq. (2) by

    d(μ, μ′) = ‖ψ − ψ′‖ = [(ψ − ψ′, ψ − ψ′)]^{1/2} = [2(1 − ρ(μ,μ′))]^{1/2} .

It can then be shown that ∏_{k=1}^n (dmₖ′/dmₖ)^{1/2} converges in L₂(m) to (dm′/dm)^{1/2} if the product of the ρ(mₙ, mₙ′) converges, the case of equivalence. Thus one has as a subsidiary result that a subsequence of the partial products ∏ dmₖ′/dmₖ converges with probability one (dm) to dm′/dm if the latter exists. This last statement can be improved, of course, by application of the martingale convergence theorem, which shows that the original sequence of partial products converges to dm′/dm with probability one (dm).
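When mₙ and mₙ′ are one-dimensional Gaussian measures N(aₙ, σₙ²) and N(bₙ, σₙ²), the affinity works out to ρ(mₙ, mₙ′) = exp[−(aₙ − bₙ)²/(8σₙ²)], so Theorem 1 makes the product measures equivalent exactly when Σₙ(aₙ − bₙ)²/σₙ² < ∞. The following is a small numerical sketch of this dichotomy (Python with NumPy; the helper name hellinger_affinity is ours, not the report's):

```python
import numpy as np

def hellinger_affinity(a, b, sigma):
    """rho(mu, nu) = integral of sqrt((dmu/dx)(dnu/dx)) dx for the one-dimensional
    Gaussian measures N(a, sigma^2) and N(b, sigma^2), by direct quadrature."""
    x = np.linspace(-40.0, 40.0, 400001)
    dx = x[1] - x[0]
    p = np.exp(-(x - a) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    q = np.exp(-(x - b) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(np.sqrt(p * q)) * dx)

# Closed form in this case: rho = exp(-(a - b)^2 / (8 sigma^2)).
print(hellinger_affinity(0.0, 1.0, 1.0), np.exp(-1.0 / 8.0))

# Kakutani's criterion with sigma_n = 1: the products below equal
# exp(-sum_n (a_n - b_n)^2 / 8); a convergent sum gives equivalence,
# a divergent sum gives singularity.
rho_equiv = np.exp(-sum((1.0 / n) ** 2 for n in range(1, 2000)) / 8.0)  # shifts 1/n
rho_sing = np.exp(-sum(1.0 for n in range(1, 2000)) / 8.0)              # shifts 1
print(rho_equiv, rho_sing)
```

The quadrature reproduces the closed form, and the two products illustrate the dichotomy: square-summable mean shifts leave ∏ρₙ bounded away from zero, while a constant shift drives it to zero.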

Gaussian Process with Shifted Mean

Let {x_t}, t ∈ I, I an interval in E₁, be a real separable (with respect to closed sets), measurable Gaussian random process, continuous in mean square, and with mean zero. We take I = [0,1] for convenience; and we let 𝔅 be the smallest Borel field containing all ω-sets of the form {ω : x(t,ω) ∈ A}, t ∈ I, where A is a Borel set. Then R(t,s) = E x(t)x(s) is a symmetric, non-negative definite, continuous function on [0,1] × [0,1]; and the integral operator R on L₂[0,1] defined by

    Rf(t) = ∫₀¹ R(t,s) f(s) ds,  t ∈ [0,1]

is Hermitian, non-negative definite and Hilbert-Schmidt. We assume in addition that R is (strictly) positive definite. Then an orthonormalized sequence of eigenfunctions of R corresponding to all of its non-zero eigenvalues is a c.o.n.s. (complete orthonormal set) in L₂[0,1]. We denote the eigenvalues of R by λₙ, λₙ > 0, and the corresponding eigenfunctions by φₙ(t), i.e.,

    Rφₙ = λₙφₙ,  (φₙ, φₘ) = δₙₘ .

The condition that R be strictly positive definite is not necessary for what is to follow, but its presence simplifies the statements a little. It will be satisfied in the case that is of real interest to us, as will be pointed out in the last section.

We now let a(t) and b(t) be continuous functions defined for t ∈ [0,1] and consider the random processes

    y(t) = a(t) + x(t),  0 ≤ t ≤ 1     (3)
    z(t) = b(t) + x(t),  0 ≤ t ≤ 1

These processes are measurable, separable and have the same Borel field of measurable ω-sets as x(t). By the well-known representation of Karhunen and Loève,

    x(t) = Σₙ xₙφₙ(t),  t ∈ [0,1]

where the convergence is in mean square with respect to the probability measure for each t, and where the random variables xₙ are given by

    xₙ = ∫₀¹ x(t)φₙ(t) dt

and satisfy

    E xₙxₘ = λₙδₙₘ,  E xₙ = 0 .

Since x(t) is Gaussian, the xₙ are jointly Gaussian random variables. If we let

    aₙ = ∫₀¹ a(t)φₙ(t) dt,  bₙ = ∫₀¹ b(t)φₙ(t) dt

then the random variables yₙ = xₙ + aₙ are Gaussian and independent, as are the zₙ = xₙ + bₙ. The measures μₙ and νₙ induced on E₁ by yₙ and zₙ respectively are equivalent, so the theorem of Kakutani quoted above may be

applied to yield that the product measures, which we denote by μ₀ and ν₀ respectively, are either equivalent or totally singular. The probability measures μ₀ and ν₀ are the measures induced on the Borel field 𝔅₀ ⊂ 𝔅 generated by the xₙ. Then by Lemmas 1 and 2 the processes y(t) and z(t) are either equivalent or mutually singular. According to the theorem, μ₀ and ν₀ are equivalent if and only if ∏ ρₙ converges. One has, since yₙ and zₙ are Gaussian,

    dμₙ/dνₙ (ξ) = exp[ ((ξ − bₙ)² − (ξ − aₙ)²) / (2λₙ) ]

and hence

    ρₙ = ∫ [dμₙ/dνₙ(ξ)]^{1/2} dνₙ(ξ)
       = (2πλₙ)^{-1/2} ∫_{-∞}^{∞} exp[ −((ξ − aₙ)² + (ξ − bₙ)²) / (4λₙ) ] dξ
       = exp[ −(aₙ − bₙ)² / (8λₙ) ] .

Thus one has the result due to Grenander:

Theorem 2. (Grenander) The Gaussian random processes y(t) and z(t) defined by Eq. (3) are either equivalent or mutually singular. They are equivalent if the series

    Σₙ (aₙ − bₙ)² / λₙ

converges, and singular if the series diverges to +∞.

Two Gaussian Processes with Different Autocorrelations

It has just been noted that two Gaussian processes defined on a finite interval and identical except for different mean-value functions have the "zero-one" property of being either equivalent or singular. The same result has been demonstrated for arbitrary Gaussian processes on a finite interval independently by Hájek and Feldman (1958 and 1959), who used entirely different methods of proof and obtained different kinds of criteria for equivalence. Here we shall sketch a third proof, given by T. S. Pitcher in an unpublished memorandum, which yields a criterion for equivalence somewhat similar to that first obtained by Feldman.

Suppose two real-valued Gaussian processes are defined on the interval 0 ≤ t ≤ 1, each with mean zero, and with autocorrelation functions R(t,s) and S(t,s) continuous in the pair t,s in [0,1] × [0,1]. We shall denote sample functions by x(t) and the respective probability measures on the space of sample functions for the two processes by μ₀ and μ₁.* Thus

    E_i x(t) = ∫ x(t) dμ_i(x) = 0,  i = 0, 1

*Note that the same symbol is used for sample functions of both processes.

and

    E₀ x(t)x(s) = ∫ x(t)x(s) dμ₀(x) = R(t,s)
    E₁ x(t)x(s) = ∫ x(t)x(s) dμ₁(x) = S(t,s) .

The integral operators on L₂[0,1] with the autocorrelations as kernels are written:

    Rf(s) = ∫₀¹ R(s,t) f(t) dt
    Sf(s) = ∫₀¹ S(s,t) f(t) dt

where f(t) is any element of L₂[0,1]. We proceed with a series of lemmas:

3. If R and S have different zero spaces, then μ₀ ⊥ μ₁.

If Rf = 0, then

    E_i ∫₀¹ x(t)f(t) dt = 0,  i = 0, 1
    E₀ [ ∫₀¹ x(t)f(t) dt ]² = (Rf, f) = 0
    E₁ [ ∫₀¹ x(t)f(t) dt ]² = (Sf, f) .

Now, since S is a non-negative definite operator, either Sf = 0 or (Sf, f) > 0. In the latter case the Gaussian random variable

    θ(x) = ∫₀¹ x(t)f(t) dt

has positive variance with respect to μ₁ measure. Hence,

    μ₁[x : θ(x) ≠ 0] = μ₀[x : θ(x) = 0] = 1

and hence μ₀ ⊥ μ₁.

Henceforth we assume, without any real loss of generality, that both R and S carry only the zero element of L₂[0,1] into zero. Then R⁻¹, S⁻¹, (R^{1/2})⁻¹, (S^{1/2})⁻¹ are densely defined, symmetric, unbounded operators. In particular, if Rφₙ = λₙφₙ, (φₙ, φₘ) = δₙₘ, then for any f ∈ L₂[0,1] one has

    f = Σₙ aₙφₙ,  Σₙ aₙ² < ∞ .

If f_N = Σ_{n=1}^N aₙφₙ, then f_N → f and

    R⁻¹f_N = Σ_{n=1}^N (aₙ/λₙ)φₙ .

Analogous formulas can be written for S in terms of its spectral decomposition. We shall write (R^{1/2})⁻¹ = R^{−1/2}, (S^{1/2})⁻¹ = S^{−1/2}.

4. If S^{1/2}R^{−1/2} or R^{1/2}S^{−1/2} is unbounded, then μ₀ ⊥ μ₁.

Suppose there exists a sequence of elements fₖ in the domain of R^{−1/2} satisfying ‖fₖ‖ = 1 and ‖S^{1/2}R^{−1/2}fₖ‖ ≥ k³. Let

    θₖ(x) = (1/k) ∫₀¹ x(t)(R^{−1/2}fₖ)(t) dt .

Each θₖ(x) is Gaussian with mean zero, and

    E₀ θₖ² = (1/k²)(R R^{−1/2}fₖ, R^{−1/2}fₖ) = ‖fₖ‖²/k² = 1/k²
    E₁ θₖ² = (1/k²)(S R^{−1/2}fₖ, R^{−1/2}fₖ) = ‖S^{1/2}R^{−1/2}fₖ‖²/k² ≥ k⁴ .

Now, by the Tchebycheff inequality,

    μ₀[x : |θₖ(x)| ≥ ε] ≤ 1/(ε²k²)

so by the Borel-Cantelli lemma

    μ₀[x : |θₖ(x)| ≥ ε for infinitely many k] = 0

for every ε > 0. Also, since each θₖ(x) is Gaussian with variance at least k⁴ under μ₁,

    μ₁[x : |θₖ(x)| ≤ n] ≤ 2n/(k²(2π)^{1/2})

and again by the Borel-Cantelli lemma,

    μ₁[x : |θₖ(x)| ≤ n for infinitely many k] = 0

for every n > 0. That is,

    μ₀[x : lim θₖ(x) = 0] = 1,  μ₁[x : lim |θₖ(x)| = ∞] = 1

so that μ₀ ⊥ μ₁.

5. Let {θⱼ(x)} be any sequence of real-valued measurable functions on the space of sample functions which are independent Gaussian random variables with respect to both μ₀ and μ₁, and which satisfy

    E₀θⱼ = E₁θⱼ = 0,  E₀θⱼ² = αⱼ,  E₁θⱼ² = βⱼ ,

αⱼ and βⱼ arbitrary positive numbers. Then the measures μ₀′ and μ₁′ induced by μ₀ and μ₁ on the Borel field generated by the {θⱼ} are either mutually singular or equivalent. They are equivalent if and only if

    Σⱼ (1 − βⱼ/αⱼ)² < ∞ .

Both statements follow from Kakutani's theorem. The first is immediate. For the second we need to calculate the product of the ρⱼ defined in that theorem. Let ℓⱼ be the likelihood ratio for θⱼ with respect to μ₁ and μ₀:

    ℓⱼ = exp[ −(θⱼ²/2)(1/βⱼ − 1/αⱼ) + (1/2) log(αⱼ/βⱼ) ] .

Then

    ρⱼ = ∫ ℓⱼ^{1/2} dμ₀ = [ 2(αⱼβⱼ)^{1/2} / (αⱼ + βⱼ) ]^{1/2}

and the convergence of the product ∏ⱼ ρⱼ is equivalent to the convergence of the series

    Σⱼ [ 1 − 2(αⱼβⱼ)^{1/2} / (αⱼ + βⱼ) ] .

The convergence of this series is equivalent to the convergence of

    Σⱼ (1 − βⱼ/αⱼ)² < ∞ ,

since μ₀′ ~ μ₁′ implies βⱼ/αⱼ → 1.

6. If Σⱼ (1 − βⱼ/αⱼ)² < ∞, the Radon-Nikodym derivative of μ₁ with respect to μ₀ on the Borel field generated by the θⱼ(x) is

    dμ₁/dμ₀ (x) = exp Σⱼ [ −(θⱼ²(x)/2)(1/βⱼ − 1/αⱼ) + (1/2) log(αⱼ/βⱼ) ] .

This formula follows from Kakutani's theorem and the expression for ℓⱼ above.

7. Suppose S^{1/2}R^{−1/2} is bounded, and let X denote its bounded extension to L₂[0,1]. Then for each orthonormal sequence {fᵢ} in L₂[0,1] there exists a sequence {θᵢ(x)} of random variables, jointly Gaussian with mean zero with respect to both μ₀ and μ₁, satisfying

    E₀ θᵢθⱼ = (fᵢ, fⱼ),  E₁ θᵢθⱼ = (X*Xfᵢ, fⱼ) .

Since R^{−1/2} is densely defined, for each i, i = 1,2,..., there exists a sequence {f_ij}ⱼ such that limⱼ f_ij = fᵢ and such that h_ij = R^{−1/2}f_ij is defined. Let

    θ_ij(x) = ∫₀¹ h_ij(t) x(t) dt .

Then

    lim_{k,j→∞} E₀ θ_ij θ_ik = lim_{k,j→∞} (R h_ij, h_ik) = ‖fᵢ‖²

and

    lim_{k,j→∞} E₁ θ_ij θ_ik = lim_{k,j→∞} (S h_ij, h_ik) = ‖Xfᵢ‖² .

The existence of these limits implies that the sequences {θ_ij}ⱼ have mean-square limits θ_i0 and θ_i1 with respect to μ₀ and μ₁, and that θ_i0 and θ_i1 are measurable 𝔅_{μ₀} and 𝔅_{μ₁}, respectively. It also follows that the {θ_ij}ⱼ converge in mean square with respect to μ₀ + μ₁ to elements θᵢ in L₂(μ₀ + μ₁), and that θ_i0 = θᵢ [μ₀], θ_i1 = θᵢ [μ₁]. Since θ_i0 and θ_i1 satisfy the second-moment requirements, the θᵢ do also. The θᵢ are measurable with respect to 𝔅_{μ₀} and 𝔅_{μ₁}.

We now state the main result:

Theorem 3. (Modified version of Feldman's theorem) Either μ₀ ~ μ₁ or μ₀ ⊥ μ₁. A necessary and sufficient condition that μ₀ ~ μ₁ is that X*X = Σᵢ λᵢPᵢ, where each Pᵢ is the projection on the one-dimensional subspace of L₂[0,1] spanned by some fᵢ from an orthonormal sequence {fᵢ}, and

    Σᵢ (1 − λᵢ)² < ∞ .

If μ₀ ~ μ₁ and random variables θᵢ are formed from the fᵢ as in Lemma 7, then

    x(t) = Σᵢ (R^{1/2}fᵢ)(t) θᵢ(x)     (4)

almost everywhere dt dμ₀ and dt dμ₁, and

    dμ₁/dμ₀ (x) = exp Σⱼ [ −(θⱼ²(x)/2)(1/λⱼ − 1) + (1/2) log(1/λⱼ) ] .     (5)

We show first that if μ₀ and μ₁ are not totally singular then X*X = Σᵢ λᵢPᵢ, Pᵢ one-dimensional, and Σᵢ(1 − λᵢ)² < ∞. For by Lemma 4, X is bounded, so X*X has a spectral decomposition, X*X = ∫ λ dP_λ. Let I be the identity operator, and suppose that for some ε > 0, I − P_{1+ε} is infinite-dimensional. Then there exist numbers 1 + ε ≤ λ′₁ < λ′₂ < ⋯ and normalized fₖ's in L₂[0,1] such that

    (P_{λ′_{k+1}} − P_{λ′_k}) fₖ = fₖ .

Hence by Lemma 7 there exist Gaussian random variables θₖ satisfying

    E₀ θⱼ(x)θₖ(x) = δⱼₖ

and

    E₁ θⱼ(x)θₖ(x) = (X*Xfⱼ, fₖ) = δⱼₖ ∫ λ d(P_λfₖ, fₖ) ≥ δⱼₖ (1 + ε) .

But then by Lemma 5, μ₀ and μ₁ would have to be totally singular on the Borel field generated by the θ's, which is a contradiction. Hence I − P_{1+ε} must be finite-dimensional for every ε > 0. A similar argument shows that P_{1−ε} must

be finite-dimensional for every ε > 0. Hence X*X has a discrete spectrum and X*X = Σᵢ λᵢPᵢ, where the Pᵢ are projections on the one-dimensional subspaces spanned by the fᵢ. If {θⱼ(x)} is a sequence of Gaussian random variables corresponding to {fⱼ} as in Lemma 7, then by Lemma 5, μ₀ and μ₁ are equivalent when restricted to the Borel field 𝔅(θᵢ) generated by the θⱼ's, and Σⱼ(1 − λⱼ)² < ∞; Eq. (5) holds for the restriction of μ₀ and μ₁ to 𝔅(θᵢ) by Lemma 6.

It remains to prove the expansion of Eq. (4), for then by Lemmas 1 and 2 the equivalence of the restrictions of μ₀ and μ₁ to 𝔅(θᵢ) will imply the equivalence of μ₀ and μ₁. For the dt dμ₁ case it is sufficient to show that

    E₁ ∫₀¹ [ x(t) − Σ_{i=1}^N (R^{1/2}fᵢ)(t)θᵢ(x) ]² dt     (6)

converges to zero as N → ∞. Now,

    E₁ x(t)θᵢ(x) = limⱼ E₁ x(t)θ_ij(x) = limⱼ E₁ x(t) ∫₀¹ h_ij(u)x(u) du = limⱼ Sh_ij(t) = limⱼ SR^{−1/2}f_ij(t) .

Hence,

    E₁ ∫₀¹ x(t)(R^{1/2}fᵢ)(t)θᵢ(x) dt

    = limⱼ (R^{1/2}fᵢ, SR^{−1/2}f_ij) = limⱼ (XRfᵢ, Xf_ij)
    = (Rfᵢ, X*Xfᵢ) = λᵢ ‖R^{1/2}fᵢ‖² .

A similar verification shows that

    E₁ ∫₀¹ (R^{1/2}fᵢ)(t)θᵢ(x) · (R^{1/2}fⱼ)(t)θⱼ(x) dt = δᵢⱼ λᵢ ‖R^{1/2}fᵢ‖² .

Therefore Expression (6) above can be written

    ∫₀¹ S(t,t) dt − Σ_{i=1}^N λᵢ ‖R^{1/2}fᵢ‖² .

We now show that this expression converges to zero. In fact, since S = R^{1/2}X*XR^{1/2},

    ∫₀¹ S(t,t) dt = Σᵢ (Sfᵢ, fᵢ) = Σᵢ (X*XR^{1/2}fᵢ, R^{1/2}fᵢ)
    = Σᵢ ( Σⱼ λⱼ(R^{1/2}fᵢ, fⱼ)fⱼ , R^{1/2}fᵢ )
    = Σᵢ Σⱼ λⱼ (R^{1/2}fᵢ, fⱼ)² = Σⱼ λⱼ Σᵢ (R^{1/2}fⱼ, fᵢ)²
    = Σⱼ λⱼ ‖R^{1/2}fⱼ‖² .

An analogous calculation shows that Eq. (4) holds almost everywhere dt dμ₀, which completes the proof of the theorem.

One will observe that the proof just given is based on an infinite-dimensional analog of the simultaneous diagonalization of two covariance matrices.
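The finite-dimensional analog can be made concrete. Given two positive-definite covariance matrices R and S, the matrix R^{−1/2}SR^{−1/2} plays the role of X*X; its orthonormal eigenvector basis gives coordinates in which R becomes the identity and S becomes diag(λᵢ). A sketch (Python with NumPy; the matrices and names are illustrative, not from the report):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two symmetric positive-definite covariance matrices standing in for R and S.
N = 5
M0 = rng.standard_normal((N, N))
M1 = rng.standard_normal((N, N))
R = M0 @ M0.T + N * np.eye(N)
S = M1 @ M1.T + N * np.eye(N)

def sqrt_spd(C):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V * np.sqrt(w)) @ V.T

R_half_inv = np.linalg.inv(sqrt_spd(R))

# Finite-dimensional stand-in for X*X = R^{-1/2} S R^{-1/2}: symmetric, so it
# has a spectral decomposition sum_i lam_i P_i with orthonormal eigenvectors.
XstarX = R_half_inv @ S @ R_half_inv
lam, F = np.linalg.eigh(XstarX)

# In the coordinates theta = F^T R^{-1/2} x both covariances are diagonal:
# R becomes the identity and S becomes diag(lam), the analog of
# E0 theta_i theta_j = delta_ij and E1 theta_i theta_j = lam_i delta_ij.
T = F.T @ R_half_inv
print(np.allclose(T @ R @ T.T, np.eye(N)), np.allclose(T @ S @ T.T, np.diag(lam)))
# prints: True True
```

In infinite dimensions the same construction goes through, but Theorem 3 adds the genuinely infinite-dimensional requirement Σ(1 − λᵢ)² < ∞, which is vacuous for matrices.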

The representation that results, and in terms of which the derivative is written, is perhaps interesting, but it is of limited usefulness because the θᵢ are not given explicitly. The restriction to processes with mean zero is not essential; neither Feldman nor Hájek required it, and it can be removed in the above. The proof given here is somewhat similar to Feldman's. Hájek's proof is different, and is in fact essentially information-theoretic. Let x₁,...,x_N be measurable functions on Ω which are Gaussian random variables with respect to two different measures; and suppose they have probability densities p(x₁,...,x_N), q(x₁,...,x_N). The J-divergence (see Kullback and Leibler) of these two densities is defined as

    J = E_p log(p/q) − E_q log(p/q)     (7)

where E_p, E_q denote expectation with respect to p and q measures. The first term of Eq. (7) can be interpreted as the information in p relative to q; hence, J can be interpreted as the sum of the information in p relative to q and the information in q relative to p. Now if {x_t, t ∈ T} is a real-valued Gaussian process with respect to two different probability measures on Ω, the J-divergence of the processes is

    J = sup J_{t₁,...,tₙ},  t₁,...,tₙ ∈ T

the supremum being taken over all finite sets of parameter values. Hájek's theorem states that the processes are singular if and only if J is infinite, intuitively a highly satisfying conclusion.
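For zero-mean Gaussian densities with covariance matrices Σ_p and Σ_q in N dimensions, the J-divergence of Eq. (7) reduces to J = ½ tr(Σ_q⁻¹Σ_p + Σ_p⁻¹Σ_q) − N, the log-determinant terms of the two directed informations cancelling. The sketch below (Python with NumPy; the function and matrix names are ours) also mimics Grenander's power-level example: taking Σ_p = σ²I, Σ_q = I gives J = N(σ − 1/σ)²/2, unbounded as the dimension grows, in agreement with Hájek's criterion for singularity:

```python
import numpy as np

def j_divergence(Sp, Sq):
    """J-divergence of Eq. (7) for zero-mean Gaussians N(0, Sp), N(0, Sq):
    J = E_p log(p/q) - E_q log(p/q) = (1/2) tr(Sq^{-1} Sp + Sp^{-1} Sq) - N."""
    N = Sp.shape[0]
    return 0.5 * np.trace(np.linalg.solve(Sq, Sp) + np.linalg.solve(Sp, Sq)) - N

# J vanishes when the measures coincide, and is symmetric in its arguments.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
print(j_divergence(A, A), j_divergence(A, B), j_divergence(B, A))

# Power-level example: Sp = 2I, Sq = I gives J = N/4, unbounded in the
# dimension N -- the singular estimation of an unknown scale factor.
for N in (10, 100, 1000):
    print(N, j_divergence(2.0 * np.eye(N), np.eye(N)))
```

Growing J along finite-dimensional restrictions is exactly how Hájek's sup over t₁,...,tₙ becomes infinite for a pair of singular processes.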

In addition to those already mentioned, there are papers by Middleton and Rozanov containing results similar or related to Theorem 3.*

*Other interesting results, not used here, on the differentiability and derivatives of measures corresponding to random processes are contained in Prokhorov (Appendix 2), Skorokhod, and Pitcher. It should be noted that some of the material discussed can be regarded as a development of earlier work of Cameron and Martin (not included in the Bibliography). Also, it would appear to be closely related to parts of the extensive work on functional integration by, e.g., Segal, Friedrichs, and Gelfand (not included in the Bibliography).

III. SPECIAL RESULTS

An interesting consequence of Theorem 3 is:

Theorem 4. (Feldman) If Aⱼ and Bⱼ are polynomials, with degrees respectively aⱼ and bⱼ, j = 1,2, and bⱼ > aⱼ, then the Gaussian processes (restricted to a finite parameter interval) whose spectral densities are |Aⱼ(x)/Bⱼ(x)|² have equivalent measures on path space if and only if

(a) b₁ − a₁ = b₂ − a₂;
(b) the ratio of the leading coefficients of A₁ and B₁ has the same absolute value as the ratio of the leading coefficients of A₂ and B₂.

The necessity of these conditions was first shown by Slepian, using a theorem of Baxter. Baxter's theorem applied to stationary processes states that if x(t) is Gaussian, real-valued, with continuous covariance function possessing a bounded second derivative except at the origin, and with mean-value function possessing a bounded derivative in [0,1], then

    Σ_{n=1}^{2^N} [ x(n/2^N) − x((n−1)/2^N) ]²

converges with probability one, as N → ∞, to the difference between the right-hand and left-hand derivatives of the covariance function at the origin. Suppose two processes have rational spectral densities which violate condition (a) of Theorem 4. Then if both processes are differentiated k times, k = min_{j=1,2}(bⱼ − aⱼ) − 1, the sum of squared differences will converge to zero for samples drawn from

one differentiated process and to a number different from zero for the other, with probability one. If condition (b) is violated and (a) is satisfied, the sums will converge to different numbers, neither equal to zero. Slepian showed further that by using higher-order differences an equivalent test for singularity can be obtained directly, without first differentiating the processes.

The sufficiency (and a different proof of necessity) of the conditions of Theorem 4 was demonstrated by Feldman (1960). Feldman stated Theorem 4 as a corollary to a somewhat more general theorem in which only one of the processes involved need have a rational spectral density. This result was made to follow from his basic theorem referred to earlier, by techniques depending largely on certain properties of entire functions. Here we shall give a proof of the sufficiency of the conditions of Theorem 4 using Pitcher's conditions as stated in Theorem 3. The proof is an adaptation of Feldman's, modified to fit the different equivalence condition we are using. In particular, we shall use Feldman's lemmas on entire functions without proof.

We assume to start with that both processes have mean value zero. The autocorrelation functions R(t,s) and S(t,s) are stationary and (with a slight abuse of notation) we write them as R(t − s) and S(t − s). They are defined for all real s, t, are integrable and of integrable square, and have rational Fourier transforms. The operators R and S on L₂[−1,1] are defined as before. We also need now, however, to define operators R₀ and S₀ on L₂(−∞,∞) by

    (R₀f)(t) = ∫_{−∞}^{∞} R(t − s)f(s) ds,  −∞ < t < ∞

    (S₀f)(t) = ∫_{−∞}^{∞} S(t − s)f(s) ds,  −∞ < t < ∞ .

Inner products and norms on L₂[−1,1] will be denoted by (·,·), ‖·‖, and on L₂(−∞,∞) (which will be written just L₂) by (·,·)₀, ‖·‖₀, respectively. The Fourier transform 𝔉(f) (in whatever sense it may be defined) of a function f will be denoted by f̂. We now proceed with a series of lemmas.

1. If f, g ∈ L₂ and are supported on [−1,1], then

    (Rf, g) = (R₀f, g)₀,  (Sf, g) = (S₀f, g)₀ .

2. If f, g ∈ L₂,

    (R₀f, g)₀ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R(t − s)f(s)g(t) ds dt = ∫_{−∞}^{∞} R̂(x)f̂(x)ĝ(x) dx

and analogous formulas hold for (S₀f, g)₀.

3. The operator R₀ is Hermitian, positive-definite, and has a positive-definite square root R₀^{1/2} which satisfies

    (R₀^{1/2}f, g)₀ = ∫_{−∞}^{∞} [R̂(x)]^{1/2} f̂(x)ĝ(x) dx .

We now further specialize the autocorrelation function R(t). In particular, let

    R̂(x) = 1/(1 + x²)^u,  u an integer ≥ 1 .

Let p(x) = (i + x)^u; then

    R̂(x) = 1/|p(x)|²

and

    (R₀f, g)₀ = ∫_{−∞}^{∞} f̂(x)ĝ(x)/|p(x)|² dx .

The operator R₀ has an inverse R₀⁻¹ which is unbounded but densely defined on L₂. Where defined,

    R₀⁻¹f = 𝔉⁻¹{ |p(x)|² f̂(x) } .

Let us now define operators R₀^{−1/2}, Q by

    R₀^{−1/2}f = 𝔉⁻¹{ |p(x)| f̂(x) },  Qf = 𝔉⁻¹{ p(x) f̂(x) }

for all f for which the expressions in brackets belong to L₂. Here 𝔉⁻¹ is the inverse Fourier transform in the sense of the Plancherel theory. One notes immediately that

    (Qf, Qg)₀ = (R₀⁻¹f, g)₀

when either side exists.

By the conditions on S, we can write

    Ŝ(x) = |A(x)/B(x)|²

where A(x), B(x) are polynomials, deg(B) − deg(A) ≥ 1, and Ŝ has no poles on the real axis.

4. Let deg(B) − deg(A) = u. Then |p(x)|²[R̂(x) − Ŝ(x)] has an 𝔉⁻¹-transform ψ(t) in L₂, and

$$\int_{-1}^{1}\int_{-1}^{1} |\phi(t-s)|^2\, dt\, ds = a^2 < \infty$$

Proof: The inverse transform exists in the Plancherel sense, since

$$|p(x)|^2\,\big[\hat{S}(x) - \hat{R}(x)\big] = \frac{|p(x)|^2\, |A(x)|^2}{|B(x)|^2} - 1 = \frac{P(x)}{|B(x)|^2}$$

where $P(x)/|B(x)|^2 \in L_2$. The second assertion is a trivial consequence.

Now let $\mathscr{D}$ denote the class of functions belonging to $C^\infty$ for which the closure of their supports is contained in $(-1,1)$.

5. Let $f \in \mathscr{D}$. Then $p\!\left(-i\tfrac{d}{dt}\right) f \in \mathscr{D}$, and

$$\mathcal{F}\!\left[ p\!\left(-i\tfrac{d}{dt}\right) f \right](u) = p(u)\,\hat{f}(u)$$

Furthermore, $p(u)\hat{f}(u) \in L_2$ and is of exponential type.

6. Let $\{f_n\}$ be a complete orthonormal sequence (c.o.n.s.) for $L_2[-1,1]$, $f_n \in \mathscr{D}$. Let $g_n = Qf_n$. Then

$$\sum_{n,m=1}^{\infty} \big[(R_o g_n, g_m)_o - (S_o g_n, g_m)_o\big]^2 = a^2$$

Proof:

$$(S_o g_n, g_m)_o - (R_o g_n, g_m)_o = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_n(t)\, f_m(s)\, \phi(t-s)\, ds\, dt = \int_{-1}^{1}\int_{-1}^{1} f_n(t)\, f_m(s)\, \phi(t-s)\, dt\, ds$$

But $\{f_n(t) f_m(s)\}$ is a c.o.n.s. in $L_2([-1,1]\times[-1,1])$, hence

$$\sum_{n,m=1}^{\infty} \big[(R_o g_n, g_m)_o - (S_o g_n, g_m)_o\big]^2 = \int_{-1}^{1}\int_{-1}^{1} |\phi(t-s)|^2\, dt\, ds = a^2$$

7. Let $A = S_o^{1/2} Q$. Then

$$\sum_{n,m=1}^{\infty} \big[\big((I - A^*A)f_n, f_m\big)\big]^2 = a^2$$

Proof: This follows from Lemma 6, since

$$\big((I - A^*A)f_n, f_m\big) = (f_n, f_m) - (Af_n, Af_m)_o = (R_o g_n, g_m)_o - (S_o g_n, g_m)_o$$

8. The sequence $\{z_n\}$, $z_n = R^{1/2} Q f_n$, is an o.n.s. in $L_2[-1,1]$.

Proof: $Qf_n$ is defined and has its support contained in $(-1,1)$; hence $R^{1/2} Q f_n$ is defined. Then

$$(z_n, z_m) = (R^{1/2} Q f_n,\, R^{1/2} Q f_m) = (R\, Q f_n,\, Q f_m) = (R_o\, Q f_n,\, Q f_m)_o = (f_n, f_m)$$

by Lemmas 1 and 3.

9. If $E$ is the closed subspace of $L_2[-1,1]$ spanned by the $z_n$, then $L_2[-1,1] \ominus E$ is finite-dimensional.

Proof: Let $Y = L_2[-1,1] \ominus E$. Then $y \in Y$ if and only if

$$(z_n, y) = (R^{1/2} Q f_n,\, y) = (Q f_n,\, R^{1/2} y) = 0, \qquad n = 1,2,\ldots$$

We know that the orthogonal complement of the closed subspace spanned by $\{Qf_n\}$ is finite-dimensional, say of dimension $N$, by Feldman (1960), Lemma 5. So now suppose that $Y$ is of dimension greater than $N$. Then there are

$y_k \in Y$, $k = 1,2,\ldots,N+1$, such that for any choice of numbers $c_k$, not all zero, $\sum_{k=1}^{N+1} c_k y_k \neq 0$. Hence

$$R^{1/2}\!\left[\sum_{k=1}^{N+1} c_k y_k\right] = \sum_{k=1}^{N+1} c_k\, \big(R^{1/2} y_k\big) \neq 0$$

by the strict definiteness of $R$ and hence of $R^{1/2}$. Since each $R^{1/2} y_k \neq 0$, this contradicts the fact just stated that the orthogonal complement of the subspace spanned by $\{Qf_n\}$ has dimension $N$. Hence $Y$ is of dimension at most $N$.

10. The operator $S^{1/2} R^{-1/2}$ is defined and bounded on a dense subset of $L_2[-1,1]$ and hence has a bounded extension $X$ with $\mathscr{D}(X) = L_2[-1,1]$. The bounded self-adjoint operator $I - X^*X$ is Hilbert-Schmidt on $L_2[-1,1]$.

Proof: From Lemma 7 it follows routinely that $A$ is bounded. Since

$$(S^{1/2} R^{-1/2} z_i,\, S^{1/2} R^{-1/2} z_j) = (S^{1/2} Q f_i,\, S^{1/2} Q f_j) = (S\, Q f_i,\, Q f_j) = (S_o\, Q f_i,\, Q f_j)_o = (S_o^{1/2} Q f_i,\, S_o^{1/2} Q f_j)_o = (A f_i, A f_j)_o$$

one has $\|X z_n\| = \|A f_n\|_o \le B$. Hence $X$ is densely defined and bounded on the closed linear manifold $E$ spanned by the $z_n$, and can be extended to a bounded operator on $E$. Furthermore, $S^{1/2} R^{-1/2}$ is densely defined on the finite-dimensional subspace $L_2[-1,1] \ominus E$. Hence $S^{1/2} R^{-1/2}$ has a unique bounded extension $X$ with domain $L_2[-1,1]$.

In order to prove the second assertion we augment the o.n. sequence $\{z_n\}$, $n = 1,2,\ldots$, with elements $z_{-N+1}, z_{-N+2}, \ldots, z_0$, so that $\{z_n\}$, $n = -N+1, -N+2, \ldots$, is a c.o.n.s. for $L_2[-1,1]$. Then

$$\sum_{i,j=-N+1}^{\infty} \big[\big((I - X^*X)z_i, z_j\big)\big]^2 = \sum_{\substack{i \ge 1 \\ j \ge 1}} + \sum_{\substack{i \ge 1 \\ -N+1 \le j \le 0}} + \sum_{\substack{-N+1 \le i \le 0 \\ j \ge 1}} + \sum_{\substack{-N+1 \le i \le 0 \\ -N+1 \le j \le 0}} \big[\big((I - X^*X)z_i, z_j\big)\big]^2$$

By the preceding calculation, the first sum on the right is equal to

$$\sum_{i,j=1}^{\infty} \big[\big((I - A^*A)f_i, f_j\big)\big]^2 = a^2.$$

The second and third sums are finite, since

$$\sum_{j} \big[\big((I - X^*X)z_k, z_j\big)\big]^2 = \|(I - X^*X)z_k\|^2 \le \|I - X^*X\|^2,$$

and the fourth sum is obviously finite. Thus $I - X^*X$ is Hilbert-Schmidt.

The sufficiency part of Theorem 4 now follows directly from Theorem 3. Although there are various criteria for the equivalence of Gaussian measures, Theorem 4 is particularly apt for noise-in-noise detection theory problems because it states a criterion for equivalence that is fairly general and is explicitly in terms of properties of the autocorrelation functions. Results of this kind for wider classes of processes would be useful.

For discussing singularity and equivalence in sure-signal in noise problems, the following theorem can be used in connection with Theorem 2.

Theorem 5. (Kelly, Reed, and Root) Let $R(t)$ be a stationary, continuous autocorrelation function with the properties

(1) $\displaystyle\int_{-\infty}^{\infty} |R(t)|\, dt < \infty$;

(2) the integral operator defined by

$$R_T f(t) = \int_{-T}^{T} R(t-u)\, f(u)\, du$$

is strictly positive definite for every $T$. Let $\{\varphi_{n,T}\}$, $\{\lambda_n(T)\}$ be respectively a c.o.n.s. of eigenfunctions and the set of associated eigenvalues of $R_T$. Then if $s(t) \in L_2$, $s_n(T) = (s, \varphi_{n,T})$, $\hat{s}(\xi)$ is an $L_2$-Fourier transform of $s(t)$, and $\hat{R}(\xi)$ is the Fourier transform of $R(t)$,

$$\sum_{n=1}^{\infty} \frac{|s_n(T)|^2}{\lambda_n(T)} \;\longrightarrow\; \int_{-\infty}^{\infty} \frac{|\hat{s}(\xi)|^2}{\hat{R}(\xi)}\, d\xi, \qquad \text{as } T \to \infty,$$

in the sense that the left-hand side converges monotonically if the right-hand side exists, and diverges monotonically to $+\infty$ otherwise.

One can show by example that the sum on the left side above may be finite for fixed $T$ while the integral on the right diverges, even with the support of $s(t)$ contained in $(-T,T)$.

A recurring hypothesis in the preceding discussion has been that if $\{x_t\}$ is a stationary random process with autocorrelation function $R(t)$, the integral operator $R_T$ as defined above is strictly definite, or, what is equivalent, $R_T f = 0$ implies $f = 0$. For a large class of processes this is true; an essentially well-known sufficient condition, useful for our purposes, is the following theorem.

Theorem 6. Let the random process $\{x_t\}$, $-\infty < t < \infty$, be defined by the stochastic integral

$$x(t) = \int_{-\infty}^{\infty} h(t-u)\, d\beta(u)$$

where $\{\beta_t\}$ is a Brownian motion and $h$ is a real-valued function in $L_2$. Then if $R(t) = E\{x_u x_{u+t}\}$, the operator $R_T$, $T > 0$, is strictly positive definite.

The proof follows easily from inspection of $(R_T f, f)$ written in terms of the Fourier transforms of $R(t)$ and $f(t)$.
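Theorems 5 and 6 can be illustrated numerically. The following sketch (a hypothetical example, not from the report) uses $R(t) = e^{-|t|}$, which arises from passing white noise through the filter $h(t) = \sqrt{2}\,e^{-t}$, $t > 0$, and the finite-energy signal $s(t) = e^{-t^2}$: the weighted eigenvalue sum of Theorem 5 grows monotonically with the observation interval $[-T,T]$ while remaining finite, and the discretized quadratic form is strictly positive definite as Theorem 6 asserts.

```python
import numpy as np

def eigen_sum(T, n=400):
    """Discretize R_T on [-T, T]; return sum_n |s_n(T)|^2 / lambda_n(T)
    and the smallest (operator) eigenvalue of R_T."""
    t, dt = np.linspace(-T, T, n, retstep=True)
    K = np.exp(-np.abs(t[:, None] - t[None, :]))   # kernel R(t - u) = exp(-|t - u|)
    mu, V = np.linalg.eigh(K)                      # operator eigenvalues are ~ mu * dt
    s = np.exp(-t ** 2)
    c = V.T @ s                                    # s_n ~ sqrt(dt) * c_n, so dt cancels below
    keep = mu > 1e-8 * mu.max()                    # guard roundoff in the tiniest eigenvalues
    return float(np.sum(c[keep] ** 2 / mu[keep])), float(mu.min() * dt)

J1, lam1 = eigen_sum(1.0)
J3, lam3 = eigen_sum(3.0)
print(J1, J3, lam1, lam3)   # J1 <= J3, both finite; smallest eigenvalues positive
```

The monotone growth of J with T, and its boundedness by the integral $\int |\hat{s}|^2/\hat{R}$, is exactly the content of Theorem 5 for this example; the strictly positive smallest eigenvalue reflects Theorem 6.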

IV. SUITABILITY OF THE STATIONARY GAUSSIAN MODEL

As remarked earlier, it seems unreasonable to expect that arbitrarily small error probabilities can be achieved in a radio communication or radio measurement system, which is what Theorems 2 and 4 might appear to show if the Gaussian model is to be believed. The two most commonly offered explanations of why these results do not really violate intuition are, first, that the measurements are always inaccurate, and second, that the a priori data are always imperfect; in particular, autocorrelation functions and spectra are not completely or precisely known. Both explanations are obviously true statements, but I feel they do not meet the objection raised. Neither shows the existence of an absolute lower bound on error probabilities. With enough care and elaboration in obtaining a priori data and in making and processing the measurements, it would seem that arbitrarily good performance could still be achieved in some instances. So, although these points are important, I shall try to explain away the paradox of the singular cases in a different way, in fact in the simplest way possible, by showing the existence of constraints that prevent their occurrence. The essence of the explanation is that in all cases we know about, singularity occurs only if the spectral densities of the two signal-plus-noise processes differ at infinity, but a reasonable model of the problem indicates that the spectral densities at infinity are always determined by the residual noise, and hence are the same for both.*

*This idea appears in Davenport and Root, in Middleton, and is developed at some length in Wainstein and Zubakov, Appendix III.
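The tail condition can be made concrete with the rational-density machinery of Section III. In the following sketch the two spectral densities are hypothetical examples, not taken from the text: with $\hat{R}(\omega) = 1/(1+\omega^2)^2$, so that $p(\omega) = (1+i\omega)^2$, the quantity $|p(\omega)|^2[\hat{S}(\omega) - \hat{R}(\omega)]$ of Lemma 4 is square-integrable when $\hat{S}$ has the same behavior at infinity as $\hat{R}$, and fails to be when the tails differ.

```python
import numpy as np

w = np.linspace(0.0, 500.0, 200001)      # frequency grid on the half-line
dw = w[1] - w[0]
p2 = (1.0 + w**2) ** 2                   # |p(w)|^2
R = 1.0 / p2                             # R^(w) = 1/(1+w^2)^2, tail ~ 1/w^4

S_same = (2.0 + w**2) / (1.0 + w**2)**3  # rational density with the same 1/w^4 tail
S_diff = 1.0 / (1.0 + w**2)**3           # rational density with a lighter 1/w^6 tail

d_same = p2 * (S_same - R)               # algebraically equals 1/(1+w^2)
d_diff = p2 * (S_diff - R)               # tends to -1 at infinity

I_same = np.sum(d_same**2) * dw          # converges (to pi/4 on the half-line)
I_diff = np.sum(d_diff**2) * dw          # grows linearly with the cutoff
print(I_same, I_diff)
```

The bounded integral corresponds, through Lemma 4 and Theorem 4, to the equivalent (non-singular) case; the divergent one corresponds to densities that differ at infinity, which is precisely the situation the physical constraints below are meant to exclude.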

To fix the domain of the argument, consider the class of systems that may be represented as in Fig. 1. A signal $s'''(t)$ is generated, processed at the transmitter, sent through the channel, received, and processed at the receiver. Gaussian thermal noise is added everywhere,

[Fig. 1. System block diagram: generated signal $s'''(t)$ → transmitter processor → $s''(t)$ → channel → $s'(t;\alpha) + n'(t)$, with thermal noise entering at the receiver input → receiver processor → observed waveform $y(t) = s(t;\alpha) + n(t)$.]

but presumably the most important increment of noise is added at the point where the signal power level is lowest, at the input to the receiver, as indicated in the figure. The generated signal, $s'''(t)$, has finite energy, that is, $\int |s'''(t)|^2\, dt < \infty$, and begins and ends in a finite time interval. It is arbitrary, but once chosen is fixed, even though we may let the observation interval, $T$, change. The processing at the transmitter and at the receiver must preserve the finite-energy constraint and must be realizable in the usual sense that the present does not depend on the future. The channel must meet these same conditions; it may, however, perturb the signal into any one of a parametrized family of functions. The output of the receiver processor is the observed waveform, which is available for decision making. In different contexts the receiver processor might be taken to be a whole radio receiver

in the usual sense; it might be only the antenna system at the receiver, or anything between these two extremes. In fact, in a particular instance there can be a good deal of arbitrariness about the breakdown into transmitter, channel, and receiver. Always, however, the noise has one property: there is at least a part, generated by thermal mechanisms, which can be thought of as entering the system as white noise, or as white up to frequencies at which quantum effects become important.

Let us look first at sure signals in noise. For one of the simplest situations the observed waveform is

$$y(t) = a\, s(t) + n(t), \qquad 0 \le t \le T$$

where $n(t)$ is stationary, Gaussian, of mean zero and with a known continuous autocorrelation function $R(t)$, as prescribed for the Gaussian model; where $s(t)$ is known and of integrable square on $[0,T]$; and $a$ is unknown but either zero or one. A statistical decision is to be made as to whether $a$ is zero or one. As Grenander observed in 1950, this problem, with no further constraints imposed, can be singular in two ways. First, the integral operator $R_T$ with the noise autocorrelation as kernel may have a non-zero null space while $s(t)$ has a non-zero projection in this null space. Then there is an element $\varphi \in L_2[0,T]$ such that $(\varphi, \varphi_n) = 0$, $n = 1,2,\ldots$, $\{\varphi_n\}$ a complete set of eigenfunctions for $R_T$, but $(\varphi, s) \neq 0$. Obviously, then, the statistic $(\varphi, y)$ will distinguish between the two hypotheses with probability one. Second, the series

$$\sum_n \frac{s_n^2}{\lambda_n}$$

may diverge, so that again, from Theorem 2, there is a test to distinguish between the two hypotheses with probability one.

Suppose now, however, that the receiver processor C is linear as well as realizable and in fact can be represented by an integral operator with $L_2$ kernel $h(t)$. Then from Theorem 6, $R_T$ has a zero null space, and the first kind of singularity mentioned above cannot happen. Let $\hat{h}(\omega)$ be the Fourier transform of $h(t)$ (i.e., $\hat{h}(\omega)$ is the so-called transfer function of C); then

$$\int_{-\infty}^{\infty} \frac{|\hat{s}(\omega)|^2}{\hat{R}(\omega)}\, d\omega = \int_{-\infty}^{\infty} \frac{|\hat{s}'(\omega)|^2\, |\hat{h}(\omega)|^2}{\hat{R}(\omega)}\, d\omega < \infty \tag{8}$$

the last integral being finite because the noise, entering white ahead of C, has spectral density $\hat{R}(\omega)$ proportional to $|\hat{h}(\omega)|^2$. So by Theorem 5 the second kind of singularity mentioned cannot happen either. Indeed, for any observation interval $T$,

$$\sum_n \frac{|s_n|^2}{\lambda_n} \le \int_{-\infty}^{\infty} \frac{|\hat{s}(\omega)|^2}{\hat{R}(\omega)}\, d\omega \tag{9}$$

and for a maximum-likelihood test (non-zero) error probabilities may be calculated depending only on the quantity on the left side of the inequality, which plays the role of a signal-to-noise ratio.

Now suppose the channel perturbs the signal by delaying it, shifting its frequency spectrum, changing its amplitude, etc. As long as it does not amplify the signal to give it infinite energy, a bound of the kind in Inequality (8) still exists, and the detection problem is non-singular. The situation is a little different if a radio measurement is to be made. The signal will be known to exist and a statistical estimate is to be made of the parameter $\alpha$ in $s(t;\alpha)$. Let $\alpha_1$, $\alpha_2$ be any two possible values of $\alpha$ (which may be vector-valued). Then the two Gaussian processes

$$y_t = s(t;\alpha_1) + n_t, \qquad 0 \le t \le T$$
$$y_t = s(t;\alpha_2) + n_t, \qquad 0 \le t \le T$$

are mutually singular if and only if

$$\sum_n \frac{|s_n(\alpha_1) - s_n(\alpha_2)|^2}{\lambda_n} = +\infty$$

Again, by an application of the Schwarz inequality, and with the conditions on the noise imposed above, this series cannot diverge if

$$\int_{-\infty}^{\infty} |s(t;\alpha_i)|^2\, dt < \infty, \qquad i = 1,2,$$

as we have assumed. The conclusion does not depend on whether $\alpha$ is considered to be an unknown or a random variable.

Two weaknesses in the above argument are the assumptions that the receiver processing is linear and that the noise enters the system as pure white noise. Let us try to patch these up. First, the point of observation at which $y(t)$ is available after the noise has been introduced (actually noise is introduced everywhere, as mentioned) is arbitrary for purposes of discussion. Thus if it is possible to observe the processed received waveform at some point past the point of noise entry where the waveform is a linear functional of $s'(t;\alpha) + n'(t)$, then $y(t)$ can be taken as the waveform at that point and the above arguments apply. No further processing of the sample functions can reduce the problem to a singular one. Second, I suggest that there is no mechanism for generating the signal $s'''(t)$ so that the square of its Fourier transform falls off faster at infinity than thermally generated noise, and that the filtering action of the

transmitter and channel is such as to attenuate the Fourier transform of the signal at high frequencies by more than the reciprocal of the frequency (the effect of a simple R-C filter). If this be true, then obvious modifications of Eq. (8) will restore the argument for non-singularity.

The discussion for noise in noise is similar to the foregoing, and can therefore be shortened. Consider the simple detection problem:

$$y(t) = \beta\, s_i(t) + n(t), \qquad 0 \le t \le T, \quad i = 0,1$$

where $s_0(t) \equiv 0$, $s_1(t)$ is a section of a sample function from a stationary Gaussian process with mean zero, and $\beta$ is a constant. We assume $\{s_{1t}\}$ and $\{n_t\}$ are mutually independent, so that $\{y_t\}$ is again a Gaussian process under either hypothesis. The only readily applicable criterion available for the singularity of two stationary Gaussian processes is that of Theorem 4; so we require the processors and channel as shown in Fig. 1 to be linear with rational transfer functions. Then, if $\{n_t\}$ is white noise and $\{s''_{it}\}$, $i = 0,1$, has rational spectral density, $\{y_t\}$ has rational spectral density under either hypothesis. If the transmitter and channel have an over-all transfer function which vanishes at least as the reciprocal of the frequency at infinity, then the behavior of the spectral density of $\{y_t\}$ at infinity is determined entirely by the noise, $\{n_t\}$, under either hypothesis. Thus by Theorem 4 the non-singular case obtains, for any observation interval $T$. Obviously, operations on the transmitted signal of translation (time delay) or

amplification, or linear combinations of these, do not affect this conclusion.*

The aim here has not been to try to "prove" the faithfulness to reality of the Gaussian model, which would be foolish, but merely to try to rescue it from one rather important apparent difficulty. This seems to me to be important if the Gaussian model is to be used with confidence as a basis for more sophisticated analyses.

*The concept of band-limited noise, which is common in engineering literature, does not appear here. Actually, band-limited noise is a special case of the class of analytic Gaussian processes, which has been completely characterized by Belyaev. It is redundant to our argument, but perhaps of interest, to note that neither received signal nor noise can be analytic with the constraints adopted here. See Belyaev, Theorems 2 and 3.
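The kind of almost-sure discrimination that singularity permits, and that the constraints above rule out, can be seen in a small simulation in the spirit of the Baxter and Slepian limit theorems cited in Section III. The statistic is the sum of squared increments of the observed path; for a Wiener-type process it converges with probability one to the variance parameter, so two processes with different high-frequency behavior are told apart perfectly, while adding a smooth finite-energy signal leaves the limit unchanged. (This simulation is a hypothetical sketch, not part of the original report.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
t = np.linspace(0.0, 1.0, n + 1)

def quad_var(path):
    """Sum of squared increments -- a Baxter/Slepian-type statistic."""
    return float(np.sum(np.diff(path) ** 2))

def wiener(sigma):
    """Wiener path on [0, 1] with variance parameter sigma^2."""
    steps = rng.normal(0.0, sigma * np.sqrt(1.0 / n), size=n)
    return np.concatenate(([0.0], np.cumsum(steps)))

q_a = quad_var(wiener(1.0))                       # concentrates near 1.0
q_b = quad_var(wiener(1.2))                       # concentrates near 1.44
q_c = quad_var(wiener(1.0) + np.sin(2 * np.pi * t))  # smooth signal: limit unchanged
print(q_a, q_b, q_c)
```

With unequal variance parameters the statistic separates the two hypotheses with error probability tending to zero as the mesh shrinks, which is the singular situation; a smooth added signal, like the filtered finite-energy signals required above, contributes only $O(1/n)$ to the sum and cannot create such a separation.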

BIBLIOGRAPHY

Baxter, G., "A Strong Limit Theorem for Gaussian Processes," Proc. A.M.S., v. 7, n. 3 (1956), pp. 522-528.

Belyaev, Yu. K., "Analytic Random Processes," Theory of Prob. and its Applications, v. IV, n. 4 (1959), pp. 402-409. (Translation.)

Davenport, W. and Root, W., "Introduction to the Theory of Random Signals and Noise," McGraw-Hill (1958), New York.

Davis, R., "On the Detection of Sure Signals in Noise," J. Appl. Phys., v. 25 (1954), p. 76.

Feldman, J., "Equivalence and Perpendicularity of Gaussian Processes," Pac. J. of Math., v. 8, n. 4 (1958), pp. 699-708.

Feldman, J., "Correction to Equivalence and Perpendicularity of Gaussian Processes," Pac. J. of Math., v. 9, n. 4 (1959), pp. 1295-1296.

Feldman, J., "Some Classes of Equivalent Gaussian Processes on an Interval," Pac. J. of Math., v. 10, n. 4 (1960), pp. 1211-1220.

Good, I. J., "Effective Sampling Rates for Signal Detection: or Can the Gaussian Model be Salvaged?," Information and Control, v. 3 (1960), pp. 116-140.

Grenander, U., "Stochastic Processes and Statistical Inference," Ark. f. Mat., v. 1 (1950), pp. 195-277.

Hajek, J., "On a Property of Normal Distribution of any Stochastic Process," Czech. Math. J., v. 8 (1958), pp. 610-617. (Also: Selected Translations in Math. Stat. and Prob., v. 1, pp. 245-253.)

Kakutani, S., "On Equivalence of Infinite Product Measures," Ann. of Math., v. 49 (1948).

Kelly, Reed, and Root, "The Detection of Radar Echoes in Noise, I," J. Soc. Indust. Appl. Math., v. 8, n. 2 (1960), pp. 309-341.

Kullback, S., and Leibler, R. A., "On Information and Sufficiency," Ann. Math. Stat., v. 22 (1951), pp. 79-86.

Middleton, D., "On Singular and Nonsingular Optimum (Bayes) Tests for the Detection of Normal Stochastic Signals in Normal Noise," I.R.E. Trans. IT-7, n. 2 (1961), pp. 105-113.

Pitcher, T., "Likelihood Ratios for Gaussian Processes," unpublished M.I.T. Lincoln Laboratory memorandum.

Pitcher, T., "Likelihood Ratios of Gaussian Processes," Ark. f. Mat., v. 4, n. 5 (1959), pp. 35-44.

Pitcher, T., "Likelihood Ratios for Diffusion Processes with Shifted Mean Values," Trans. A.M.S., v. 101, n. 1 (1961), pp. 168-176.

Price, R., "Optimum Detection of Random Signals in Noise, with Application to Scatter-Multipath Communication, I," I.R.E. Trans. IT-2, n. 4 (1956), pp. 125-135.

Prokhorov, Yu. V., "Convergence of Random Processes and Limit Theorems in Probability Theory," Theory of Prob. and its Appl., v. 1, n. 2 (1956), pp. 157-214. (Translation.)

Rozanov, Yu. A., "On a Density of One Gaussian Distribution with Respect to Another," Teor. Veroj. i Prim., v. VII (1962), pp. 84-89.

Skorokhod, A. V., "On the Differentiability of Measures which Correspond to Stochastic Processes, I," Theory of Prob. and its Appl., v. II, n. 4 (1957), pp. 407-432. (Translation.)

Slepian, D., "Some Comments on the Detection of Gaussian Signals in Gaussian Noise," I.R.E. Trans. IT-4, n. 2 (1958), pp. 65-68.

Wainstein, L. A. and Zubakov, V. D., "Extraction of Signals from Noise" (translation from the Russian), Prentice-Hall (1962), Englewood Cliffs.
