TECHNICAL REPORT ECOM-01870-14-T
October 1967

OPTIMAL SPACE-TIME SIGNAL PROCESSING AND PARAMETER ESTIMATION WITH EMPHASIS ON ITS APPLICATION TO DIRECTION FINDING

SEL Technical Report No. 16
Contract No. DA 28-043-AMC-01870(E)
DA Project No. 1P021101 A042.01.01.02

Prepared by Donald G. Mueller
The University of Michigan
SYSTEMS ENGINEERING LABORATORY
Department of Electrical Engineering
Ann Arbor, Michigan

for
UNITED STATES ARMY ELECTRONICS COMMAND, FORT MONMOUTH, N. J.

Each transmittal of this document outside the Department of Defense must have prior approval of CG, U. S. Army Electronics Command, Fort Monmouth, N.J. ATTN: AMSEL-WL-S


ERRATA

Page 11, line 15: Change "where" to "were".
Page 14, line 4: Insert the word "finding" between "direction" and "systems".
Page 38, line 7: Delete "s" on "proceedings".
Page 67, line 24: Insert the word "a" between "any" and "priori".
Page 71, line 11: Insert the words "necessary and" between "is" and "sufficient".
Page 72, line 4: Place quotation marks around the word "nondecreasing".
Page 76, line 7: Change "of" to "or".
Page 77, line 18: Insert the word "of" between "class" and "processes".
Page 81, line 4: Change "zero mean" to "zero-mean".
Page 82, line 3: Change "classify" to "clarify".
Page 85, line 20: Insert the words 'the "equivalence classes" of' between "includes" and "all".
Page 102, line 18: Change ",-" to "=".
Page 106, line 14: Change "Let Po and Pi" to "Let P̂o and P̂i".
Page 107, line 7: Change "Conversely, it can" to "But, by symmetry, it can also".
Page 108, line 11: Change "on" to "to".
Page 109, line 2: Change "Bj" to "𝓑j".
Page 115: Change "defined" to "define".
Page 145, line 18: Change "ki" to "i".
Page 200, line 20: Change "j" to "k∈K'" and "[Zi]j" to "[Zi]k".

TABLE OF CONTENTS

LIST OF FIGURES    v
LIST OF SYMBOLS    vii
ABSTRACT    xvi

CHAPTER

1. INTRODUCTION    1
   1.1 Brief Statement of the Problem    1
       1.1.1 History of Direction Finding    2
       1.1.2 Basic Elements of a Direction Finding System    5
   1.2 Approaches to the Problem    8
       1.2.1 Stochastic Optimal Control Theory    9
       1.2.2 The Theory of Estimation    10

2. SYSTEM MODEL FOR CONTROL THEORY SOLUTION    13
   2.1 Introduction    13
   2.2 System Model    14

3. THE OPEN-LOOP SOLUTION    21
   3.1 Introduction    21
   3.2 Assumptions    21
   3.3 Results    23
   3.4 Solution    24

4. THE CLOSED-LOOP SOLUTION    33
   4.1 Introduction    33
   4.2 Assumptions    33
   4.3 Results    34
   4.4 Solution    35

5. DISCUSSIONS OF CONTROL THEORY SOLUTIONS    42
   5.1 Discussion    42

6. STATEMENT OF THE ESTIMATION PROBLEM    54
   6.1 Introduction    54
   6.2 Statement of the Problem    54
   6.3 Assumptions (General Case)    58
   6.4 Assumptions (Special Case)    63
       6.4.1 Independent, Identically Distributed Noise    66
       6.4.2 Independent, Identically Distributed (Except for an Amplitude Factor) Noise    66

TABLE OF CONTENTS (Concluded)

CHAPTER

7. GENERAL SOLUTION OF THE ESTIMATION PROBLEM    67
   7.1 Introduction    67
   7.2 Mathematical Preliminaries    68
       7.2.1 Probability Theory and Stochastic (Random) Processes    68
       7.2.2 Functional Analysis    84
   7.3 Solution    96
   7.4 Summary and Discussion of Results    138

8. SPECIAL CASES    142
   8.1 Introduction    142
   8.2 Independent, Identically Distributed Noise    143
       8.2.1 Example    166
   8.3 Independent, Identically Distributed (Except for an Amplitude Factor) Noise    171

9. ERROR ANALYSIS    173
   9.1 Introduction    173
   9.2 Error Analysis (General Case)    173
   9.3 Error Analysis (Special Case Solution Presented in Section 8.2)    175
       9.3.1 Example    183

10. SUMMARY AND CONCLUSIONS    190

APPENDIX

I. A RELATIONSHIP BETWEEN σ-FIELDS    194
II. PROPERTIES OF THE OPERATOR R_i    196
III. A PROPERTY OF THE PROCESSES [Z_i] AND [Z_0]    200
IV. A CONVERGENCE THEOREM    203
V. COMPARISON OF "MINIMUM PROBABILITY OF ERROR" AND "MAXIMUM A POSTERIORI PROBABILITY" PARTITIONS    205

REFERENCES    209

LIST OF FIGURES

1.1. The Bellini-Tosi goniometer for sequentially sampling a pair of the Bellini-Tosi array.    4
1.2. A generalized direction finding system.    6
2.1. A phased array DF system.    15
2.2. Simplification of the DF system illustrated in Fig. 2.1.    17
2.3. System model to be analyzed in Chapters 3-5.    19
3.1. Plot of q_j vs. (m_α - u₁ʲ).    29
5.1. A plot of the deviation of the sample y_j from ȳ_j as a function of α in the vicinity of m_α when u₁ = m_α + 1/√(2u₂), σ_β² = 0, and σ_n² = 0.    45
5.2. A plot of the deviation of the sample y_j from ȳ_j as a function of α in the vicinity of m_α when u₁ = m_α + 1/√(2u₂), σ_β² = σ_n² = 0, and Δ represents the noise in the sample.    47
5.3. Plot of the deviation of the sample y_j from ȳ_j as a function of α in the vicinity of m_α when u₁ = m_α + 1/√(2u₂), σ_β² ≠ 0, and σ_n² = 0.    48
5.4. Plot of the deviation of the sample y_j from ȳ_j as a function of α in the vicinity of m_α when u₁ = m_α - 1/√(2u₂), σ_β² ≠ 0, and σ_n² = 0.    49
5.5. A plot of (C₁+C₂)/(1+C₁+C₂) vs. C₂ for various values of C₁.    51
6.1. Geometry of the receiving array relative to the direction of arrival of the signal.    56
6.2. Processing system to be used to provide an estimate of the direction of arrival of the signal.    57
6.3. Typical Fourier transform of [r_c(τ)]_jk which represents the cross-covariance function of the processes driving the jth and kth antennas of the receiving array.    64
8.1. Operations performed by H_i.    149
8.2. Regions of integration for Eq. (8.23).    156

LIST OF FIGURES (Concluded)

8.3. Device for computing [Y₁]_jk.    156
8.4. Device for computing [Y₂]_jk.    157
8.5. Alternate device for computing [Y₁]_jk.    159
8.6. Alternate device for computing [Y₂]_jk.    161
8.7. Geometry of the receiving array relative to the incident wave.    167
8.8. Noise power spectrum.    167
8.9. Low-frequency noise power spectrum.    169

LIST OF SYMBOLS

General notes: Vectors in real Euclidean spaces will be underlined, while vectors and matrices whose components are real-valued functions will be enclosed in brackets. The components of a vector v in J-dimensional Euclidean space will be denoted by v1, v2, ..., vJ, and the components of a K-dimensional vector of real-valued functions [v] will be denoted by [v]1, [v]2, ..., [v]K. The components of a KxK matrix of real-valued functions [V] will be denoted by [V]11, [V]12, ..., [V]KK. Symbols which are used with several indexes in the text will be represented below with the indexes i, j, and k. Also, symbols which depend on other variables will be written both with and without the dependence indicated. Finally, the symbols 𝓑 and P with a variety of subscripts and superscripts will be reserved to denote σ-fields and probability measures.

Symbol    Description    First Reference

α    the true angle of arrival    14
α_e    an estimate of α    22
α_i    a possible angle of arrival    61
αʲ    the true angle of arrival relative to the jth reference direction    55
α₀    a reference direction    14
α̂    an optimal estimate of α    16
β    the signal strength at the array site    16
γ, γ_j    the filter and control functions    37
γ₄    the fourth moment of α about its mean    26
Δγ, Δη, Δy    small variations in the variables γ, η, and y

LIST OF SYMBOLS (Continued)

Δ    a change in an estimate of α due to the presence of noise    46
Δ_β1, Δ_β2    changes in an estimate of α due to imprecise knowledge of β, and their components    46
δ    a positive constant    59
δ( )    the Dirac delta function    39
ε, ε', ε₁, ε₂    small positive constants    22
ε(α,u₁,u₂)    the difference between C(α,u₁,u₂) and the first three terms of the power series expansion of C(α,u₁,u₂)    22
e    the error in a direction of arrival estimate    174
η(t), η_j, η    the noise in the DF model, its jth sample, and a vector of such samples    18
{n_k}, {n'_k}    sequences of random variables which allow a "decomposition" of the process [Z_i]    120
θ    an unknown phase angle    16
Λ_i    the ith element of a partition of the observation space    61
Λ*_i, Λ̂_i    the ith elements of optimal partitions of the observation space    110
λ_k    the kth eigenvalue of the operator R_i    122
μ_k    the kth eigenvalue of the operator I - X*X    128
Σ₁, Σ₂    summations which allow the derivation of a bound on E{e²}    183
σ_α², σ_β², σ_n²    the variances of the random variables α, β, and η_j    20
τ    time variable    66

LIST OF SYMBOLS (Continued)

τ_j    the time of propagation of a signal from the jth antenna site to the origin    59
φ    an eigenvector of the operator I - X*X and an element of a basis for the null space of I - X*X    120
[φ_j(t)]    the jth eigenvector of the operator R_i    122
Ω( )    the region of variation of the variables listed in the parentheses    34
Ω    the observation space    96
ω    the "frequency" variable of a Fourier transform    63
ω₀    the center frequency of a bandlimited signal    14
[A₁(α_i)], [A₂(α_i)]    matrices which are deterministic functions of α_i    153
a    a constant related to the bandwidth of the noise    166
a_k    a positive number    66
B(α_i)    a bias of the statistic ℓ(α_i)    164
B_j    represents a device which delays and amplifies signals received by the jth array element    14
𝓑    a σ-field of subsets of Ω    99
b₀, b̂₀, b̃₀    the filter bias, its optimal and suboptimal values    22
b_j, b    the "weight" applied to the jth sample by the filter and a vector of such "weights"    22
b̂, b̃, b̃₁, b̃₂    the optimal and several suboptimal forms of the vector b    22
b    a constant related to the bandwidth of the signal    166

LIST OF SYMBOLS (Continued)

C'(α,β,u₁,u₂)    a function which summarizes the effects of the variables α, β, u₁, and u₂ on the signal s'(t)    16
C″(α,u₁,u₂)    the amplitude antenna pattern    16
C(α,u₁,u₂)    an approximation for C″(α,u₁,u₂)    18
C₁, C₂    variables which depend on the design parameters of the DF system    50
C    a positive constant    59
c₀, c, c'    constant vectors    23
c    velocity of propagation of signals    55
D, D_ε    the covariance function of the random vector y and the error that results when it is approximated    25
D₀    a constant necessary for the evaluation of the Radon-Nikodym derivative dP_i/dP₀    133
D    the separation of array elements in the example    167
det{ }    the determinant of the quantity within the brackets    163
dP_i/dP₀    the Radon-Nikodym derivative of P_i with respect to P₀    110
d, d_ε    the expectation of a particular vector and the error that results when it is approximated    25
E{ }    the expectation of the quantity within the brackets    23
E_i{ }    the expectation of the quantity within the brackets when P_i is the probability measure    108
E_S, E_N    constants which indicate signal and noise levels present    166
[F(ω)]    form of signal and noise power spectrums in special cases    63

LIST OF SYMBOLS (Continued)

F_N(ω), F_S(ω), F_h(ω)    the Fourier transforms of r_N(τ), r_S(τ), and h(τ)    148
F_S⁺(ω), F_N⁺(ω)    positive portions of the spectrums F_S(ω) and F_N(ω)    150
F_S0(ω), F_N0(ω)    the Fourier transforms of r_S0(τ) and r_N0(τ)    151
F_h1(ω), F_h2(ω)    functions of ω which allow the determination of F_h(ω)    151
𝓕{ }    the Fourier transform of the quantity within the brackets    151
G(α_i), G_j(u_j)    expressions which are minimized by an optimal estimate and the optimal jth control    35
H_i    a bounded self-adjoint operator which allows the evaluation of the Radon-Nikodym derivative dP_i/dP₀    133
H_iᵀ, H_i'    integral operators whose kernels are h_iᵀ(t,s) and h_i'(t-s)    145
H    an integral operator whose kernel is h(τ)    164
[hᵀ(t,s)], [h(t-s)]    kernels of linear filters which allow the evaluation of the statistic ℓ(α)    65
h(τ)    an impulse response from which [hᵀ(t,s)] can be constructed    147
h₁(τ)    an impulse response related to h(τ)    152
h_1r(τ), h_1i(τ)    the real and imaginary parts of h₁(τ)    152
ĥ₁(τ), ĥ₂(τ)    realizable impulse responses    155
I    the identity matrix and the identity operator    26
I₀    the value of the Radon-Nikodym derivative dP_i/dP₀ at the observation [y]    139

LIST OF SYMBOLS (Continued)

Im( )    the imaginary part of the quantity within the parentheses    152
J    the total number of samples of the observation interval    18
K    the total number of antennas in the array    14
K'    the value of (K-1)/2    183
L    a loss function    35
L₂[T]    the space of all K-dimensional square integrable functions on [0,T_f]    87
ℓ'(α, )    a quantity related to the array antenna pattern    178
ℓ_i, ℓ(α)    the value of (H_i[y],[y]) which allows the evaluation of the Radon-Nikodym derivative dP_i/dP₀    142
ℓ_k    a vector drawn from the origin to the kth array element    55
M    a real integer and the set of integers {1,2,...,M}    61
M̄    the set {0,1,2,...,M}    97
m_α, m_β    the means of the variables α and β    21
[N(t)]    the corrupting noise process    59
[N₁], [N₂]    the noisy components of [Y₁] and [Y₂]    175
N_i    the noise term corrupting the statistic ℓ(α_i)    180
N̄, N̄²    quantities which are related to the noise energy in the system    182
N( )    the Hilbert-Schmidt norm of the operator within the parentheses    95

LIST OF SYMBOLS (Continued)

P₀    a complete probability measure on the σ-field 𝓑    99
P_e    the probability of error    61
Pr( )    the a priori probability of the event within the parentheses    61
Pr_i( )    the probability of the event within the parentheses given α_i is the true direction of arrival    61
p_j, p    a term in the power series expansion of C(α,u₁,u₂) and a vector of such terms    22
p( )    the probability density of the variables listed within the parentheses    22
p̂, p̃    the values of p for different values of the control u₁    23
p_η( )    the probability density of the noise samples    34
p(x₁/x₂)    the conditional probability density of x₁ given x₂    36
q_j, q, q̃₁, q̃₂    a term in the power series expansion of C(α,u₁,u₂), a vector of such terms, and the values of q for two forms of u₁    22
q_max    the maximum length of the vector q    23
Rᵀ    the space of all real-valued functions on the interval [0,T_f]    96
R_i, R_i*    an integral operator on L₂[T] and its adjoint    105
R_i^{1/2}, R_i^{-1/2}    the square root of R_i and its inverse    105
Re( )    the real part of the quantity within the parentheses    150
r_j, r, r̃₁    a term in the power series expansion of C(α,u₁,u₂), a vector of such terms, and the value of r for a particular u₁    22

LIST OF SYMBOLS (Continued)

[r₀(t,v)], [r^(α_i)(t,v)], [r(t,v)]    the covariance functions of the processes [N(t)], [S_α(t)], and [y_α(t)]    59
r_N(τ)    the covariance function of the noise in a special case    66
r_S(t,v)    the covariance function of the noise-free process driving the array element located at the origin    59
r_S0(τ), r_N0(τ)    the covariance functions of processes related to the "slowly varying" portions of r_S(τ) and r_N(τ)    150
S    a quantity related to the signal energy received by the array    178
[S₁], [S₂]    the pure signal portions of [Y₁] and [Y₂]    176
S_f', S_f″    "shift" operators    163
[S_α(t)], [s(t)]    the signal process driving the array and its sample function    55
s₁(t), s₂(t)    the quadrature components of a received sample    176
s'(t)    a noise-free signal in the DF system model    16
T_f    the end of the observation interval    16
Tr{ }    the trace of the indicated matrix    121
tr{ }    the trace of the indicated operator    95
t, t_j    the time variable and its jth sample    16
U    a Hilbert-Schmidt operator    132
u₁(t), u₁ʲ, u₁    the pointing angle control, its jth sample, and a vector of such samples    16

LIST OF SYMBOLS (Concluded)

u(t), u_j, u    another representation for u₁(t), u₁ʲ, u₁    36
u₂(t), u₂ʲ, u₂    the beam width control, its jth sample, and a vector of such samples    16
û₁, ũ₁, ũ    the optimal and two suboptimal forms of u₁    22
ū_j    a vector of the controls u₁, u₂, ..., u_j    36
u    a unit vector drawn from the origin toward the signal source    55
X, X*    the operator R_i^{1/2} R₀^{-1/2} and its adjoint    117
[Y₁], [Y₂]    sufficient statistics which allow the evaluation of ℓ_i    153
[y₀(t)], [y_i(t)], [y(t)]    the process driving the array when α₀ is the true direction of arrival, the process driving the array when α_i is the true direction of arrival, and a sample function of these processes    60
y(t), y_j, y    a noisy signal in the DF system model, its jth sample, and a vector of such samples    18
y'(t)    a noisy signal in the DF system model    16
ȳ_j    a vector of the samples y₁, y₂, ..., y_j    36
[Z_i], [Z_i']    special random processes defined on Ω    103
{ }ᵀ    the transpose of the quantity within the brackets    24
{ }⁻¹    the inverse of the quantity within the brackets    26
"‾"    denotes complex conjugate    85
"*"    denotes convolution    151

ABSTRACT

The purpose of this study is to develop techniques for processing signals received by a spatial array of K omnidirectional antennas so as to produce optimal direction-of-arrival estimates. Such estimates are useful for navigational purposes, in which case the signals are electromagnetic in nature, as well as for seismic investigations where the signals are mechanical vibrations transmitted through the earth. Another possible application is an underwater direction finding system which utilizes an array of hydrophones for its receiving antennas.

Two distinct approaches to the problem are pursued which apply differing mathematical disciplines. The first approach models the array as an element of a phased array direction finding system and attempts to apply "Stochastic Optimal Control Theory" to produce optimal control laws for directing the pointing angle and specifying the beam width of the array. Optimal filtering of the signals is also considered. The controls and filter which result in "Minimum Error Variance" are considered optimal. The second approach applies "Estimation Theory" and considers the array only as an information gathering device whose signals are to be processed directly without the "prefiltering" present in phased arrays. The optimality criterion of this approach requires the direction-of-arrival estimate to result in a "Minimum Probability of Error" or, what is shown to be equivalent, to have "Maximum a Posteriori Probability."

In the Control Theory approach, the model is investigated under both open-loop and closed-loop operating conditions. The closed-loop analysis applies "Dynamic Programming" techniques and produces an integral equation whose solution represents the optimal controls. Unfortunately, the nonlinearities in the system prevent an explicit solution of this equation. The open-loop analysis assumes the a priori probability density of possible angles of arrival is concentrated in a "narrow" region about the a priori expected direction of arrival. An optimal filter and optimal controls are obtained for this case.

The "Estimation Theory" approach assumes the received signals are samples from a Gaussian random process which are corrupted by additive Gaussian noise. Extensive use is made of the "Theory of Random Processes" and "Functional Analysis." The Radon-Nikodym derivative of a particular Gaussian probability measure with respect to another Gaussian measure plays a key role in the optimal estimation technique derived. Several special cases are considered in which the signal and noise processes are restricted to being "narrow-band." For these cases, an easily implemented technique is obtained for evaluating the required Radon-Nikodym derivative. Finally, a "narrow-band" numerical example is included along with an analysis of the error that results from the use of this estimation technique.

CHAPTER 1

INTRODUCTION

1.1 BRIEF STATEMENT OF THE PROBLEM

The problem to be considered here is concerned with the surveillance of a region of space in which multiple sources of electromagnetic radiation are present. These sources, which emit random signals and are embedded in a noisy environment, operate in an intermittent mode and have the possibility of varying their relative locations within the region of surveillance. The system which is to perform the surveillance receives information concerning the source locations in the form of a sequence of time signals of finite duration, each of which is the output of an element of an array of omnidirectional antennas. The solution which we desire, then, specifies the operations to be performed by a signal processing system in order that optimal, in some sense, estimates of the angular locations of the sources relative to the receiving array can be provided. These angular locations will, henceforth, be called the "angles of arrival" or "directions of arrival" of the signals at the site of the receiving array. In the literature, this problem is sometimes referred to as the "Direction Finding Problem." It includes, as a special case, the "Radar Problem" in which the angular location of a source, whose emitted waveform is known to the surveillance system, is required.

A direction finding system that is capable of operating according to the above specifications has an obvious military application for determining an enemy's strategy. For example, activity in certain geographical areas may indicate an elaborate preparation is underway; activity of a single source at sea may indicate the location of a submarine; and the resolution of a number of signal sources, synchronized to a common signal, may indicate a new air-defense system.

To demonstrate a nonmilitary application, suppose the antenna array is replaced by an array of seismic sensors. It is then evident that the above signal processing system can be used, equally well, to determine locations of earth disturbances as well as to map the earth's strata for oil exploration.

1.1.1 History of Direction Finding

In the past, a variety of techniques, depending on the exact application, have been utilized for determining the direction of arrival of signals at a particular point in space. Almost all of these techniques, however, pursue a common method of attack which requires antennas with directional properties to be sequentially scanned through the region of surveillance. The direction of maximum or minimum received signal strength is then used as an indication of the true angle of arrival of the signal.

In the period around 1890, Hertz,1 operating in the 200 Mc range, utilized cylindrical parabolic mirrors to focus and concentrate energy from a transmitter and hence produce an antenna system with directional properties. Directivity was also used by Marconi2 before 1900 to increase the range of his transmission by the use of copper parabolic mirrors. When radio communication, because of the ranges possible, progressed rapidly to the hf range around 1900, the mirror techniques were no longer practical, and J. Zenneck,3 at about that time, introduced simple "director" and "reflector" wires to increase the range and to separate interfering signals. A similar approach was pursued by S. G. Brown,4 who connected together two vertical antennas separated by a half wavelength. Stone,5 in 1902, proposed to physically rotate this array to determine the direction of arrival of the wave.

The next major advance was due to Bellini and Tosi,6 who provided a second array at right angles to the first. Then, by means of quadrature coils placed outside of a single rotating coil, they were able to sequentially sample the signal being picked up by each antenna pair. This technique avoids the necessity for rotating the entire array. The device for sampling the antennas sequentially is called a "goniometer" and has its counterpart in many present day direction finding systems. Such a device is shown schematically in Fig. 1.1, where the moving coil is connected to a receiver. An additional improvement was provided by Zenneck, who placed an omnidirectional element in the center of the array and connected it so as to add phasewise to the output of the moving coil of a Bellini-Tosi goniometer. This arrangement produced an antenna pattern which was a rotating cardioid rather than a figure-eight, thus eliminating the 180° ambiguity which existed with the former configuration.

World War I and the development of the vacuum-tube amplifier with its sensitivity stimulated refinement of the above method of direction finding. Of particular interest was the design of the Adcock array and numerous indicators, both mechanical and cathode-ray types. World War II produced the 8-element Adcock array, H collectors, and crossed loops, all utilizing the spinning goniometer technique. Since about 1950, the University of Illinois has been conducting experiments7 with the Wullenweber facility, which utilizes a goniometer and a 120-element circular array. Furthermore, the recent papers of W. J. Lindsay and D. S. Heim8 and C. E. Lindahl and B. F. Barton,9 which are concerned with goniometer systems, testify to the fact that these techniques are still of importance today.

Fig. 1.1. The Bellini-Tosi goniometer for sequentially sampling a pair of the Bellini-Tosi array.

The methods described above employing spinning goniometers are not, however, the only practical techniques for determining the direction of arrival of a wave. World War I also saw the development of several rotating loop direction finding systems which applied the directivity of the loop to determine the direction of minimum received signal intensity. Various other techniques for obtaining directivity employing reflectors and lenses were also utilized during World War II.

Two recent developments have produced the possibility for a tremendous improvement in the art of direction finding. A new interest,10 with emphasis on arrays of large numbers of elements, has provided the impetus for advances in our knowledge of the principles of arrays and the invention of new arraying techniques. These advances, together with the present availability of compact digital computers with their inherent signal processing capabilities, hold the promise of a vastly increased capacity for classifying and keeping under surveillance a large number of emitters. (These possibilities will be explored in the ensuing chapters of this thesis.)

For the sake of completeness, it is worth mentioning that still other techniques have been devised for solving the direction finding problem. In situations where a limited number of sources are present whose signal strength at the site of the surveillance system is strong, time delay and phase comparison techniques are applicable.

1.1.2 Basic Elements of a Direction Finding System

Before examining any direction finding (sometimes abbreviated DF) system for its individual characteristics, it will be useful to consider the elements basic to any system and see how they can affect its performance. A block diagram of a generalized DF system is shown in Fig. 1.2, with a discussion of its individual blocks being presented in the following six sections.

Fig. 1.2. A generalized direction finding system.

1.1.2.1 Signal Source.-The signal source has four properties which affect the performance of a direction finding system. These are its transmission frequency, its power level, its modulation characteristics, and its location relative to that of the DF system's antennas which intercept the emitted signals. The transmission frequency and modulation characteristics influence the choice of the antennas and receiving equipment employed in the system. The source power level, together with its location, which can be given by specifying the source-receiving antennas separation as well as its angular bearing, influences the DF system only as they affect the signal strength of the emitted signal at the site of the receiving antennas.

1.1.2.2 Propagation Medium.-The propagation medium affects a direction finding system by attenuating the signal in varying degrees depending upon the frequency of the carrier, the time of day, the distance between the source and the receiving antennas, and many other factors. In addition, it is also possible for the propagation medium to introduce various other distortions such as random phase errors which are independent of the source being used. Finally, under certain conditions in long-range direction finding, multipath propagation usually cannot be avoided.

1.1.2.3 Antennas.-All direction finding systems require some device for sensing the electromagnetic waves that have been emitted by the sources. The exact type utilized is a function of the desired antenna pattern together with the carrier frequency and bandwidth of the emitted signals as well as various economic considerations. The type that will be of interest to us employs an array of omnidirectional antennas.

1.1.2.4 Signal Processor.-In most direction finding systems, the processor has two distinct functions to perform, as illustrated in Fig. 1.2.

These functions are discussed in the two following sections.

1.1.2.5 Prefilter.-The basic purpose of this unit is to combine the signals received by the antennas in a manner such that its output can be interpreted, by the following filter, as a direction of arrival of the signal. In systems employing arrays of omnidirectional antennas, the outputs from the individual antenna elements are usually combined so that the "array-prefilter" pair possesses a highly directional antenna pattern* and, as a result, the prefilter can be thought of as a "beam forming" network. Then, by varying this method of combination with time, the system can be caused to scan the region of surveillance. Systems employing dish antennas also permit an analysis in a manner similar to that of the above system of omnidirectional antennas. In this case, however, the prefilter is restricted by mechanical connections, with scanning being performed by a physical rotation of the antenna. In any event, this prefilter is normally preset and independent of past received signals. In the systems to be considered in this presentation, the above restriction will be relaxed, with the operations to be performed by the processor being specified so as to optimize the performance of the total system.

1.1.2.6 Filter.-The filter takes the output of the signal prefilter and provides an estimate of the direction of arrival of the signal according to some established criterion of optimality. As an example of such a criterion, we could require the filter to provide the estimate which has the "maximum a posteriori probability" based on the received signals and any a priori information that might be available.

1.2 APPROACHES TO THE PROBLEM

In this dissertation, two different approaches for determining an optimal estimate of the direction of arrival will be explored.

*See Refs. 11 and 12 for a discussion of phased arrays.

Chapters 2-5 present a solution which attempts to apply the existing theory of "Stochastic Optimal Control" to this problem. In particular, control laws for the optimum beam forming network and the optimum filter are derived. In Chapters 6-9, an alternate approach is pursued in which the separation of the beam forming and filtering operations is absent. Instead, the operations to be performed by the optimal processor are specified by an equation whose solution represents the optimal estimate. This equation, whose implementation with existing hardware is also considered, is derived by the application of several techniques from the "Theory of Estimation."

1.2.1 Stochastic Optimal Control Theory

Historically, the classical theory of control of deterministic, linear, time-invariant systems was developed during the 1930's and 1940's. Design of control systems according to this theory usually entailed an analysis of their frequency domain characteristics. During the 1940's and 1950's, the theory was extended to linear, time-invariant systems involving stationary random signals by use of Wiener's theory of prediction and filtering. During the last 10 years, the fundamental direction of research in control theory has turned toward the modern theory of optimal control. This theory requires the determination of a control law which governs the action of a controllable variable so as to optimize (i.e., maximize or minimize) some performance or loss functional. By the use of state-space techniques, the calculus of variations, dynamic programming, and Pontryagin's maximum principle, a very substantial and satisfactory theory now exists for deterministic systems.

Just as the classical theory was extended to include the effects of random disturbances, attempts have been made to extend modern control theory into this area. Indeed, some successful results have already been obtained, particularly in the case of linear, discrete-time systems.13 The results which will most interest us are due to the Russian A. A. Fel'dbaum,14 who has established a general procedure for handling both linear and nonlinear discrete-time systems. Applying his techniques, solutions can be obtained which, in general, require the use of a digital computer in a manner similar to that required by many solutions employing dynamic programming. As with dynamic programming, however, these solutions demand computation rates which, in many cases, are unrealistic in light of the capabilities of present day computer systems. Other authors, E. P. Maslov,15 V. P. Zhivoglyadov,16 and R. L. Stratonovich,17 to name a few, have attempted to apply Fel'dbaum's "dual control" techniques to specific situations with varying degrees of success. Results18 have also been obtained for the continuous-time optimal control problem with noisy observations, but, in the nonlinear plant case to which our particular problem reduces, the theory is still in a rather preliminary stage of development. For additional background on the stochastic optimal control problem, see the paper by Wonham,19 where further references are given.

1.2.2 The Theory of Estimation

"Estimation Theory" was established as a mathematical technique in 1806 with the first publication on "least squares" estimation by Legendre.20 This concept is generally credited to Gauss,21 however, since his method published in 1809 was derived from fundamental principles. At that time, it was used mainly by astronomers as a means of reducing observations to obtain the orbital parameters of minor planets and comets.

After the introduction of least squares, the next major advance was the "method of moments" formulated by K. Pearson22 around 1900. The main disadvantage of the method of moments is that estimates found with this technique are not the "best" possible from the viewpoint of "efficiency." Estimation theory was put on a firm foundation by R. A. Fisher with a series of fundamental papers.23-25 Fisher demonstrated that the method of "maximum likelihood" was usually superior to the method of moments and that estimates derived by this technique could not be improved "essentially."

A renewed interest in this area was stimulated by the rapid development of communication theory. Similar but independent theories were developed by N. Wiener26 and A. Kolmogorov27 for separating desired signals from undesired noise signals. Wiener was interested in obtaining optimum estimates of signals by the use of linear, physically realizable electronic filters and assumed the observed signals were sample functions of "stochastic processes." A very complete presentation of the theory of stochastic processes, which we shall reference from time to time, is available in the books by M. Loève,28 J. L. Doob,29 and P. R. Halmos.30

The application of maximum likelihood techniques to the problem of optimum demodulation of communication signals was first considered by F. W. Lehan and R. J. Parks.31 Further investigations of this problem were carried on by D. C. Youla32 and D. Slepian.33 Youla was concerned with amplitude demodulation while Slepian was interested in the estimation of a finite number of parameters of which the signal was assumed to be a function. J. B. Thomas and E. Wong34 obtained the a posteriori most probable estimate of the modulation rather than the maximum likelihood estimate. Recently, H. L. Van Trees35 has attempted to generalize the work of these authors to handle arbitrary modulation as well as diversity communication problems.

These notes by Van Trees have stimulated much of the work that is presented in the last four chapters of this dissertation. However, rather than directly apply the techniques of Van Trees, a more fundamental approach has been pursued which extends the work of T. T. Kadota.36-39 These papers by Kadota are concerned with deciding which of a finite number of Gaussian signals of differing covariance functions, in addition to Gaussian noise, is being received. Finally, there is also a great body of literature concerned with the application of maximum likelihood techniques to the analysis of radar systems, with the papers by E. J. Kelley, I. S. Reed, and W. L. Root40,41 being examples.

CHAPTER 2

SYSTEM MODEL FOR CONTROL THEORY SOLUTION

2.1 INTRODUCTION

In the next four chapters, a solution of the "Direction Finding Problem" by the use of "Stochastic Optimal Control Theory" will be sought. As an initial attempt at a solution, this approach appears to have merit in light of the fact that most existing systems have some control variables which affect the signals received. For example, with a dish-type antenna, the pointing angle affects the received signal strength. In the majority of these systems, however, the scanning modes and filtering techniques are independent of past received signals and often established without any real regard for each other. As a result, we might logically ask the following questions. "Is it possible for the control laws and filtering operations to be coordinated, depending on the past received information, so as to obtain a total system optimization?" "Can we, by the use of optimal control functions and possibly in conjunction with a digital computer, increase the performance of existing systems?" In the following chapters we shall attempt to answer these questions by obtaining an optimal "open-loop" solution and an optimal "closed-loop" solution, with emphasis on both the control laws for orienting and shaping the antenna pattern and the filter operations which process the received data. For the closed-loop solution, the control law will be a function of past received signals, past controls, and any a priori information that might be available concerning the true angle of arrival of the signals. In the open-loop solution, the control law will be independent of past received signals and past controls but still a function of the a priori information.

2.2 SYSTEM MODEL

The DF system which will be presented in this section for analysis in the next three chapters has been selected because of its simplicity and because it exhibits most of the features of existing direction finding systems which are of importance when considering their system optimization. Before introducing this system model, however, let us first consider the direction finding system illustrated in Fig. 2.1. This system will be employed to provide us with some justification for our particular choice of system model.

In Fig. 2.1, an incident wave at an angle α relative to some reference direction α₀ is impinging upon a group of K omnidirectional antennas. This wave has been emitted by a source sufficiently far removed that the wave appears to be a plane wave when viewed at the site of the receiving array. Furthermore, the source waveform is a pure sinusoid of frequency ω₀ while the source location is in a plane which contains all the elements of the antenna array, so that the estimation of a single angle is required. (If and when a tractable solution is found for this type of signal waveform and this particular source-array geometry, we may then consider the more general (and difficult) case where the source is emitting a random signal and the estimate of two angles is required.)

Recall,* now, a group of K omnidirectional antennas can be connected in such a manner that the group has the characteristics (antenna pattern) of a single directional antenna. By the introduction of proper delays and amplifications of the individual received signals before summing, the pointing angle and, to some extent, the beam width can be controlled. Applying this fact, in Fig. 2.1 the individual received signals are shown as inputs to the devices B_j, j ∈ {1,2,...,K}, which delay and amplify as required by the coder.

*See Ref. 11, Vol. 2, Chapter 1.

Fig. 2.1. A phased array DF system.

The coder, in turn, specifies the delays and amplifications which are necessary to comply with the controls u₁(t) and u₂(t), the desired pointing angle and beam width, respectively, specified by the controller. Moreover, since we are interested in optimal DF systems, the controller operates so as to optimize the performance of the filter whose output α̂ is a "best" estimate of the angle of arrival of the signals based on the data y'(t) received during the finite observation interval [0,T_f]. Finally, in this figure, an additive noise source is shown which is assumed to be independent of the controls.

Let us now consider a model which, as we shall see, possesses the essential features of the preceding direction finding system. Since the coder performs known operations, as far as the control-estimation problem is concerned, the model shown in Fig. 2.2 is equivalent to the system of Fig. 2.1. In this figure α, β, and θ are unknown parameters, α representing the unknown direction of arrival, β representing the unknown signal amplitude, and θ the unknown phase angle. The signal s'(t) is given by

s'(t) = C'(α,β,u₁,u₂) sin(ω₀t + θ)

where C' is the deterministic function of the variables α, β, u₁, and u₂ which summarizes the effects on the signal s'(t) produced by the unknowns α and β, the antennas, the coder, the summer, and the devices B_j, j ∈ {1,2,...,K}, of Fig. 2.1. (Note, if we eliminate the control variable u₂, this model could equally well serve as a model for a system which employs a dish-type antenna where the pointing angle u₁ is to be controlled and whose beam width is fixed.) The actual function C'(α,β,u₁,u₂) is linear in β so that it may be written as β times C″(α,u₁,u₂), where C″(α,u₁,u₂), as a function of the variables α, u₁, and u₂, is most easily obtained by experimental techniques.

Fig. 2.2. Simplification of the DF system illustrated in Fig. 2.1.

However, an approximation for the function C″(α,u₁,u₂) (usually referred to as the amplitude "antenna pattern") which retains its essential properties is given by

C(α,u₁,u₂) = exp{-u₂(α - u₁)²}.

As with C″(α,u₁,u₂), this function has a maximum when u₁ equals α and a beam width dependent on u₂.

We are now in a position to present the system model which will be the subject of the ensuing analysis. This model is illustrated in Fig. 2.3, where it can be seen to have a structure similar to that given in Fig. 2.2. In this model, however, additional simplifications have been introduced by substituting βC for C' and assuming θ is known so y'(t) can be synchronously detected to eliminate the sin(ω₀t + θ) term which contains no information about the angle α. (A phase-locked loop can be employed to provide the variable θ.) The signal y(t) is then given by

y(t) = βC(α,u₁(t),u₂(t)) + η(t)

where η(t) represents the noise in the system and is assumed to be independent of the controls. In addition, since we are interested in possibly utilizing a digital computer which demands discrete-time data and since such problems are usually easier to solve, the observation interval [0,T_f] has been divided into J subintervals with the signal y(t) being sampled after each subinterval to obtain the sequence {y_j}, j ∈ {1,2,...,J}. The controls u₁(t) and u₂(t) can then also be restricted to a discrete form with u₁ʲ and u₂ʲ being the respective controls applied during the jth subinterval. For convenience, moreover, we shall assume that the noise component of y(t) at each sample time t_j, η_j, is an independent random variable of zero mean and variance σ_n².

Fig. 2.3. System model to be analyzed in Chapters 3-5.

Finally, in many problems it is reasonable to assume that the direction finding system has available some a priori information concerning the true values of the unknown parameters. For this model, we shall exhibit this information in the form of a known probability distribution of these parameters.

It is now possible to be more specific as to what we shall mean by a closed-loop and an open-loop solution for the direction finding problem using control theory techniques. A solution which specifies the operations to be performed by the controller and the filter for the model of Fig. 2.3 will be called a closed-loop solution, while a solution for this model when the controller is required to operate without knowledge of the past samples of y will be called an open-loop solution.

To summarize, in hopes of obtaining a tractable solution, we have developed a very simplified model of a direction finding system while, at the same time, retaining many of the features important to the control-estimation problem. The significant point to remember is, if no solution can be found for this model, there appears to be little hope for solving the more realistic problem in this control-estimation framework.
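The discrete-time model just described is easy to exercise numerically. The following is a minimal sketch of the model of Fig. 2.3; it is not part of the original report, and the parameter values, the Gaussian form chosen for the noise, and all function names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def pattern(alpha, u1, u2):
    """Approximate amplitude antenna pattern C(alpha,u1,u2) = exp{-u2*(alpha-u1)^2};
    maximum where alpha = u1, beam width governed by u2."""
    return np.exp(-u2 * (alpha - u1) ** 2)

def observe(alpha, beta, u1, u2, sigma_n, J=10):
    """Generate the J samples y_j = beta*C(alpha,u1_j,u2) + eta_j of the model,
    with eta_j independent, zero-mean, variance sigma_n^2 (Gaussian assumed here)."""
    u1 = np.broadcast_to(np.asarray(u1, dtype=float), (J,))
    eta = rng.normal(0.0, sigma_n, size=J)
    return beta * pattern(alpha, u1, u2) + eta

# Example: true angle 0.3 rad, unit signal strength, beam pointed at 0.25 rad.
y = observe(alpha=0.3, beta=1.0, u1=0.25, u2=4.0, sigma_n=0.1)
```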

CHAPTER 3

THE OPEN-LOOP SOLUTION

3.1 INTRODUCTION

Although we are interested primarily in the closed-loop solution, we shall begin with an investigation of the open-loop system from which insight (and experience in obtaining a solution) can be gained. The results of this chapter are presented in Section 3.3. A discussion of these results, however, will be deferred until Chapter 5, at which time the closed-loop results will also be discussed.

3.2 ASSUMPTIONS

The solution presented in Section 3.3 is optimal when the following assumptions together with the system model of Fig. 2.3 represent a valid description of the environment of the direction finding system.

A3.2.1 The control u₂ is fixed, which implies the beam width does not change during the observation interval [0,T_f]. Also, the controls u₁ʲ, j ∈ {1,2,...,J}, are chosen a priori since we are considering the open-loop system.

A3.2.2 The unknowns α and β are independent random variables having means of m_α and m_β, respectively, and variances σ_α² and σ_β², respectively. In addition, the probability density of α is an even function about m_α.

A3.2.3 The probability density function of α, p(α), is "narrow" compared with the beam width so that, over the range of α where p(α) is different from zero, the function C(α,u₁ʲ,u₂) can be written as

C(α,u₁ʲ,u₂) = p_j + q_j(α - m_α) + r_j(α - m_α)² + ε(α,u₁ʲ,u₂)   (3.1)

where

p_j = exp{-u₂(m_α - u₁ʲ)²},   (3.2)

q_j = -2u₂(m_α - u₁ʲ) exp{-u₂(m_α - u₁ʲ)²},   (3.3)

r_j = 2u₂²(m_α - u₁ʲ)² exp{-u₂(m_α - u₁ʲ)²} - u₂ exp{-u₂(m_α - u₁ʲ)²},   (3.4)

and the absolute value of ε(α,u₁ʲ,u₂) is less than the real positive constant ε₀. (The first three terms on the right of Eq. (3.1) are, of course, the first three terms of the power series expansion of C about the point m_α when C is considered to be a function of α only. As a result, for any ε₀ > 0, there is a width for the density p(α) below which Eq. (3.1) is valid.)

A3.2.4 The class of filters from which an optimal filter is sought operates on the received signals to produce an estimate α_e according to the equation

α_e = b₀ + Σ_{j=1}^J b_j y_j   (3.5)

where y_j is the value of y at the jth sample time, b₀ represents a bias of the sum Σ_{j=1}^J b_j y_j, and {b_j} is a sequence of weights applied to the respective samples by the filter.

A3.2.5 The optimal controls û₁ʲ, j ∈ {1,2,...,J}, and the "weights" b̂_j, j ∈ {1,2,...,J}, together with the bias b̂₀ which characterizes the optimal filter, minimize the expected value of the squared error, i.e., they minimize E{(α_e - α)²}.

According to Assumption A3.2.3, the solution that will be obtained in this chapter corresponds to the limiting case in which a great deal of information is available concerning the true value of α and only minor refinements are required. The restriction of the optimal filter to the class specified in Assumption A3.2.4 has been made because of the ease of implementation of such filters.

3.3 RESULTS

The following results were obtained when the assumptions of Section 3.2 were applied to the system model illustrated in Fig. 2.3. These results, together with those derived by the closed-loop analysis of the next chapter, will be discussed in Chapter 5.

R3.3.1 Suppose J, the total number of samples upon which our estimate is to be based, is an even number while

ũ₁ = m_α c₀ + (1/√(2u₂)) c,   (3.6)

b̃ = [(1/√J) q_max σ_α² m_β / (σ_α²(σ_β² + m_β²) q_max² + σ_n²)] c,   (3.7)

and

b̃₀ = m_α - m_β Σ_{j=1}^J b̃_j p̃_j   (3.8)

where c₀ denotes the J-dimensional column vector whose elements are plus one, c denotes the J-dimensional column vector for which J/2 arbitrary elements are plus one and its other J/2 elements are minus one, q_max = √(2Ju₂) exp{-1/2}, b̃_j is the jth element of b̃, and p̃_j is the value of p_j when the jth element of ũ₁ is the control ũ₁ʲ. Then, as the beam width becomes large compared with the width of the probability density function p(α), the use of b̃₀ as the filter bias and the jth elements of ũ₁ and b̃ as the jth control and jth filter weight, respectively, results in an expected value of the squared error which approaches arbitrarily close to the expected value of the squared error resulting from the use of the optimal values of these quantities.

R3.3.2 If the controls, the filter weights, and the bias specified in R3.3.1 are applied, the expected value of the squared error becomes

E{(α_e - α)²} = σ_α² σ_n² / (σ_α²(σ_β² + m_β²) q_max² + σ_n²) + ε'   (3.9)

where ε' is a term which goes to zero as the beam width becomes large compared with the width of p(α).

It will be shown that the controls specified in R3.3.1 direct the antenna beam so that, for equal time intervals, each of the two angles at which the antenna pattern (the function C(α,u₁,u₂)) has maximum slope as a function of α is oriented in the a priori expected direction m_α.

3.4 SOLUTION

Let us begin by defining y and b to be the J-dimensional column vectors whose jth components are given by y_j and b_j, respectively. Then, using a superscript T to denote the transpose of a vector, we find that Eq. (3.5) can be rewritten as

α_e = b₀ + bᵀy   (3.10)

where bᵀy is understood to represent the matrix product of bᵀ and y. Our problem, then, according to the optimality criterion of Assumption A3.2.5, reduces to the determination of b₀, b, and the control vector u₁ (u₁ being the J-dimensional column vector whose jth element is u₁ʲ) that minimize the expected value of the squared error which results from the use of Eq. (3.10) to estimate α.

Forming E{(α_e - α)²}, we find

E{(α_e - α)²} = E{(bᵀy - α)²} - [E{(bᵀy - α)}]² + [b₀ + E{(bᵀy - α)}]².   (3.11)

Then, since b₀ appears only in the last term of Eq. (3.11), it is easily seen that

b̂₀ = -E{(bᵀy - α)}   (3.12)

minimizes E{(α_e - α)²} with respect to b₀ and, therefore, b̂₀ as defined in Eq. (3.12) is the optimal bias. Moreover, when b₀ = b̂₀, the expected value of the squared error reduces to

E{(α_e - α)²} = E{(bᵀy - α)²} - [E{(bᵀy - α)}]².   (3.13)

In order to obtain the optimal "weights" {b̂_j}, let us first simplify Eq. (3.13) by squaring the indicated quantities and noting that only the random vector y and the random variable α are affected by the expectation. Then, applying Assumption A3.2.2, Eq. (3.13) becomes

E{(α_e - α)²} = σ_α² + bᵀDb - 2bᵀd   (3.14)

where

D = E{(y - E{y})(y - E{y})ᵀ}   (3.15)

which is the covariance function of the random vector y, and

d = E{(α - m_α) y}.   (3.16)
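Equations (3.12)-(3.16) translate directly into a numerical procedure. The sketch below is a Monte Carlo illustration and is not from the report: the parameter values, the Gaussian densities chosen for α and β, and all names are assumptions. The weights minimizing Eq. (3.14) are obtained by solving Db = d, which is derived as Eq. (3.17) below.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters (assumed).
J, u2, sigma_n = 10, 4.0, 0.1
m_a, s_a = 0.0, 0.05        # mean and standard deviation of alpha
m_b, s_b = 1.0, 0.1         # mean and standard deviation of beta
u1 = m_a + np.where(np.arange(J) % 2 == 0, 1.0, -1.0) / np.sqrt(2 * u2)

def pattern(alpha, u1):
    return np.exp(-u2 * (alpha - u1) ** 2)

# Monte Carlo estimates of D = cov(y) (Eq. (3.15)) and d = E{(alpha - m_a) y} (Eq. (3.16)).
N = 200_000
alpha = rng.normal(m_a, s_a, N)
beta = rng.normal(m_b, s_b, N)
y = beta[:, None] * pattern(alpha[:, None], u1) + rng.normal(0, sigma_n, (N, J))
D = np.cov(y, rowvar=False)
d = np.mean((alpha - m_a)[:, None] * y, axis=0)

b = np.linalg.solve(D, d)          # optimal "weights", Eq. (3.17) below
b0 = m_a - b @ y.mean(axis=0)      # optimal bias b0 = E{alpha - b^T y}, Eq. (3.12)
```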

We now have the expected value of the squared error in a form, as given in Eq. (3.14), which will allow us to obtain the optimal "weights." Taking the gradient of Eq. (3.14) with respect to b and setting it equal to zero, we find the vector of optimal "weights" b̂ must satisfy 2Db̂ - 2d = 0 or, equivalently,

b̂ = D⁻¹d   (3.17)

where D⁻¹ denotes the inverse of the matrix D, which we shall assume exists, at least for the time being. The jth component of b̂ is then the weight applied to the jth sample by the optimal filter. Finally, substituting b̂ for b in Eq. (3.14), E{(α_e - α)²} becomes

E{(α_e - α)²} = σ_α² - dᵀD⁻¹d   (3.18)

which now must be minimized with respect to the control vector u₁. Applying Assumptions A3.2.2 and A3.2.3 and denoting the J-dimensional column vectors whose jth elements are p_j, q_j, and r_j by p, q, and r, respectively, the matrix D becomes

D = σ_β² ppᵀ + (σ_β² + m_β²)σ_α² qqᵀ + [(σ_β² + m_β²)γ₄ - m_β²(σ_α²)²] rrᵀ + σ_α²σ_β² [prᵀ + rpᵀ] + σ_n² I + D_ε(ε,p,q,r)   (3.19)

where γ₄ = E{(α - m_α)⁴}, I is the identity matrix, and D_ε(ε,p,q,r) is a matrix whose elements go to zero as ε₀ goes to zero and, hence, as the beam width becomes large compared with the width of the density function p(α). (Note, to obtain D, we have used the fact that E{(α - m_α)³} = 0, which follows from Assumption A3.2.2 where p(α) was assumed to be an even function about m_α.) The vector d, on the other hand, reduces to

d = m_β σ_α² q + d_ε   (3.20)

where d_ε is a J-dimensional column vector whose elements also go to zero with ε₀. It can now be seen, even if D_ε and d_ε are neglected in Eqs. (3.19) and (3.20), that the minimization of Eq. (3.18) with respect to the control vector u₁ is a problem which is not readily solvable, since the functional relationships between p, q, r, and the control vector u₁ are nontrivial. It appears that only a computer solution can produce the desired minimizing control vector û₁. Thus, we find that the method of solution which calls for the successive minimization of E{(α_e - α)²} with respect to b₀, b, and u₁, although relatively straightforward as far as the b₀ and b minimizations are concerned, becomes analytically unmanageable when the u₁ minimization is considered.

Let us now back up to Eq. (3.13) and take a slightly different approach. Rather than derive the optimal vector b̂ at this point, let us write Eq. (3.13) in the equivalent form

E{(α_e - α)²} = E{[(bᵀy - α) - E{(bᵀy - α)}]²}   (3.21)

and apply Assumptions A3.2.2 and A3.2.3 which, after some rearrangement of terms, gives us

E{(α_e - α)²} = σ_α²((σ_β² + m_β²)^{1/2} bᵀq - m_β(σ_β² + m_β²)^{-1/2})² + σ_n² bᵀb + E{[(β - m_β) bᵀp + (β(α - m_α)² - m_β σ_α²) bᵀr]²} + σ_α² σ_β²/(σ_β² + m_β²) + ε₁   (3.22)

where ε₁ is a term which goes to zero with ε₀. Then, since the fourth term of Eq. (3.22) is independent of b and u₁, the use of the J-dimensional column vector of filter "weights" b̃ and the J-dimensional column vector of controls ũ₁ which minimize the sum of the first three terms of Eq. (3.22) results in a loss, E{(α_e - α)²}, which approaches arbitrarily close to the loss resulting from the use of the optimal "weights" and optimal controls as ε₁ goes to zero.

But this implies that the performance of a system, as measured by E{(α_e - α)²}, which employs b̃ and ũ₁ instead of b̂ and û₁ approaches the optimal performance as the beam width becomes large compared with the width of p(α).

To obtain b̃ and ũ₁, note that the vector of "weights" b̃₁ and the vector of controls ũ₁ which minimize the sum of the first two terms of Eq. (3.22) must be such that b̃₁ and q̃₁ (the vector q when ũ₁ is the control vector) are collinear, while q̃₁ must be of maximal length, i.e., q̃₁ must achieve the maximum value of qᵀq. If b̃₁ and q̃₁ did not satisfy these conditions, we could always find another pair, call them b̃₂ and q̃₂, for which b̃₂ᵀq̃₂ = b̃₁ᵀq̃₁ and b̃₂ᵀb̃₂ < b̃₁ᵀb̃₁ and, as a result, the sum of the first two terms of Eq. (3.22) could always be decreased. Moreover, if b̃₁ and ũ₁ are such that b̃₁ᵀp̃₁ and b̃₁ᵀr̃₁ are equal to zero, b̃₁ and ũ₁ will also minimize the sum of the first three terms of Eq. (3.22), which implies these vectors are suitable choices for b̃ and ũ₁, respectively. (The vectors p̃₁ and r̃₁ are the vectors p and r when ũ₁ is the control vector.)

Looking now at Fig. 3.1, where q_j is plotted, we see that qᵀq is maximum when the u₁ʲ, j ∈ {1,2,...,J}, satisfy

(m_α - u₁ʲ) = ±1/√(2u₂)   (3.23)

or, equivalently, u₁ʲ = m_α ± 1/√(2u₂).

Fig. 3.1. Plot of q_j vs. (m_α - u₁ʲ). [Figure not reproduced; the recoverable annotations show that q_j ranges between the extreme values ±√(2u₂) exp{-1/2}, attained at (m_α - u₁ʲ) = ∓1/√(2u₂).]
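Since the curve of Fig. 3.1 is not reproduced, the condition quoted in Eq. (3.23) can be checked directly by differentiation; the following short derivation, supplied here with x denoting m_α - u₁ʲ, recovers it from Eq. (3.3):

```latex
\[
  q_j(x) = -2u_2\,x\,e^{-u_2 x^2}, \qquad
  \frac{dq_j}{dx} = -2u_2\bigl(1 - 2u_2 x^2\bigr)e^{-u_2 x^2} = 0
  \;\Longrightarrow\;
  x = m_\alpha - u_1^j = \pm\frac{1}{\sqrt{2u_2}},
\]
\[
  \text{at which}\quad |q_j| = \sqrt{2u_2}\,e^{-1/2}.
\]
```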

As a result, q̃₁ must be of the form

q̃₁ = √(2u₂) exp{-1/2} c'   (3.24)

where c' is a J-dimensional column vector whose elements are either plus one or minus one. Furthermore, if q̃₁ has this form,

q̃₁ᵀ r̃₁ = 0   (3.25)

and

q̃₁ᵀ p̃₁ = √(2u₂) exp{-1} Σ_{j=1}^J c'_j   (3.26)

where c'_j is the jth element of c'. Then, if J is an even number and half of the c'_j's are plus one while the other half are minus one, q̃₁ᵀ p̃₁ is also zero. But this, together with Eq. (3.25), implies b̃₁ᵀ p̃₁ and b̃₁ᵀ r̃₁ are zero, since b̃₁ and q̃₁ are collinear. Thus, if c' = c, the control vector and the filter "weights" vector which minimize the sum of the first two terms in Eq. (3.22) also minimize the sum of the first three terms in Eq. (3.22), so that b̃ = b̃₁ and ũ₁ is given by

ũ₁ = m_α c₀ + (1/√(2u₂)) c.   (3.27)

(The vectors c₀ and c are defined in Section 3.3.) Let us now determine the vector b̃₁. Since b̃₁ and q̃₁ are collinear, we can write

b̃₁ = a q̃₁   (3.28)

where a is a real number. Then, the first two terms of Eq. (3.22) can be written as

σ_α²(a(σ_β² + m_β²)^{1/2} q̃₁ᵀq̃₁ - m_β(σ_β² + m_β²)^{-1/2})² + σ_n² a² q̃₁ᵀq̃₁   (3.29)

which is easily shown to be minimized when

a = σ_α² m_β / (σ_α²(σ_β² + m_β²) q_max² + σ_n²)   (3.30)

where q_max² = q̃₁ᵀq̃₁ = 2Ju₂ exp{-1}. Finally, substituting Eq. (3.30) and Eq. (3.24) with c' = c into Eq. (3.28), we find

b̃₁ = [(1/√J) q_max σ_α² m_β / (σ_α²(σ_β² + m_β²) q_max² + σ_n²)] c   (3.31)

which, together with Eq. (3.27), reduces Eq. (3.22) to

E{(α_e - α)²} = σ_α² σ_n² / (σ_α²(σ_β² + m_β²) q_max² + σ_n²) + ε₁.   (3.32)

Finally, let us now investigate a suboptimal bias that is a function of known quantities. In particular, if we return to Eq. (3.11) and substitute

b̃₀ = m_α - m_β b̃₁ᵀ p̃₁   (3.33)

for b₀ instead of b̂₀, E{(α_e - α)²} will increase by ε₂, which also goes to zero with ε₀. Then, if we let ε' = ε₁ + ε₂, the remainder of the results presented in Section 3.3 follow.

Several attempts were made to provide, under less stringent conditions than above, the control vector, the filter "weights" vector, and the filter bias whose application results in a loss which approaches that obtained by the use of the optimal forms of these vectors. In particular, additional terms were added to the power series expansion in Assumption A3.2.3, with a search similar to that pursued above being conducted for approximations to the optimal control vector, the optimal filter "weights" vector, and the optimal bias. A solution was not found, however, and it appears that only a computer solution of the type discussed in the paragraph containing Eq. (3.20) will produce a solution.

32 taining Eq. (3.20) will produce a solution. Moreover, since we are really interested in the closed-loop solution, a computer solution was not attempted and the investigation of the open-loop system was terminated at this point. The solution presented in this chapter does, however, bring out some interesting points which are discussed in Chapter 5.

CHAPTER 4

THE CLOSED-LOOP SOLUTION

4.1 INTRODUCTION

In this chapter, optimal filter and control functions for the closed-loop system illustrated in Fig. 2.3 are obtained. This is accomplished by making use of techniques which were developed by A. A. Fel'dbaum14 and which resemble the dynamic programming approach of R. Bellman.42

4.2 ASSUMPTIONS

The system model discussed in Chapter 2 and illustrated in Fig. 2.3 together with the following assumptions specify the environment in which the solution presented in this chapter is optimal.

A4.2.1 The unknowns α and β are random variables with a joint probability density function given by p(α,β).

A4.2.2 The control u² is fixed during the interval $[0,T_f]$, i.e., we are again considering the case where the beam width does not vary.

A4.2.3 A control function (a function which specifies the operation of the controller) and a filter function (a function which specifies the operation of the filter) will be considered optimal when they jointly result in a system which possesses a minimum expected value of the squared error, i.e., if a system employs these functions, the expected value of the squared error will be minimized.

4.3 RESULTS

The following results were obtained when the assumptions of Section 4.2 were applied to the system model illustrated in Fig. 2.3. As stated in the previous chapter, these results together with those obtained in Chapter 3 will be discussed in Chapter 5.

R4.3.1 The optimal filter operates on the received samples $\{y_j\}$ according to the equation

$$\hat{\alpha} = \frac{\int_{Q(\alpha,\beta)} \alpha\,p(\alpha,\beta)\prod_{j=1}^{J} p(y_j/u_j^1,\alpha,\beta)\,d\alpha\,d\beta}{\int_{Q(\alpha,\beta)} p(\alpha,\beta)\prod_{j=1}^{J} p(y_j/u_j^1,\alpha,\beta)\,d\alpha\,d\beta} \qquad (4.1)$$

i.e., the estimate $\hat{\alpha}$ which is provided by an optimal filter satisfies Eq. (4.1). In this equation, $Q(\alpha,\beta)$ denotes the region of variation of the variables α and β over which they must be integrated, while the function $p(y_j/u_j^1,\alpha,\beta)$ is given by

$$p(y_j/u_j^1,\alpha,\beta) = p_0\!\left(y_j - \beta C(\alpha,u_j^1,u^2)\right) \qquad (4.2)$$

where $p_0(n_j)$ is the probability density function of the noise random variable $n_j$. (The function $\beta C(\alpha,u^1,u^2)$ was discussed in Chapter 2 and represents the value $y_j$ would achieve in the absence of noise when α, β, u¹, and u² are the angle of arrival, the signal level, the pointing angle, and the beam width, respectively.) If, for example, the variables $\{n_j\}$ are Gaussian, $p(y_j/u_j^1,\alpha,\beta)$ takes the form

$$p(y_j/u_j^1,\alpha,\beta) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left\{-\frac{1}{2\sigma_n^2}\left[y_j - \beta C(\alpha,u_j^1,u^2)\right]^2\right\}$$

where $\sigma_n^2$ is the variance of $n_j$.

R4.3.2 The optimal control $\hat{u}_j^1$ applied during the jth subinterval of the observation interval $[0,T_f]$ minimizes the function

$$G_j(u_j^1) = \int_{Q(y_j)}\min_{u_{j+1}^1}\int_{Q(y_{j+1})}\cdots\min_{u_J^1}\int_{Q(y_J)}\int_{Q(\alpha,\beta)} (\alpha-\hat{\alpha})^2 p(\alpha,\beta)\prod_{i=1}^{J} p(y_i/u_i^1,\alpha,\beta)\,d\alpha\,d\beta\,dy_J\cdots dy_{j+1}\,dy_j \qquad (4.3)$$

which is, in turn, a succession of minimizations with respect to the future controls and integrations with respect to the future samples $y_i$, $i\in\{j,j+1,\ldots,J\}$, as well as the variables α and β over their respective regions of variation. (The variable $\hat{\alpha}$ in this equation has the functional form illustrated by Eq. (4.1).) We shall find, in the next chapter, that the evaluation of Eq. (4.1) and the minimization of Eq. (4.3) with respect to $u_j^1$ require the use of numerical integration which, in turn, necessitates the use of a digital computer.

4.4 SOLUTION

Let us begin by denoting with L the loss function which equals the expected value of the squared error. We can then write

$$L = \int_{Q(\alpha,\alpha_e)} (\alpha-\alpha_e)^2\,p(\alpha,\alpha_e)\,dQ(\alpha,\alpha_e) \qquad (4.4)$$

where $\alpha_e$ represents the output of the filter, $Q(\alpha,\alpha_e)$ is the region of variation of the variables α and $\alpha_e$, $dQ(\alpha,\alpha_e)$ is an infinitely small element of $Q(\alpha,\alpha_e)$ which will also be written as $d\alpha\,d\alpha_e$, and $p(\alpha,\alpha_e)$ is the joint probability density of α and $\alpha_e$. (Throughout this chapter, we shall use $p(x_1,x_2,\ldots,x_n)$ to denote the joint probability density of the arbitrary variables $x_1,x_2,\ldots,x_n$ listed in parentheses.) Therefore, to satisfy the optimality criterion of Assumption A4.2.3, a filter function and control function must jointly minimize this loss function.

To simplify the notation in the following discussion, we shall now discard the superscript on the control u¹. No ambiguity will result since u² is assumed fixed and will not enter the derivation. Then, applying the fact* that the integral of $p(\alpha,\alpha_e,\beta,\bar{y}_J,\bar{u}_J)$ over the region of variation of β, $\bar{y}_J$ and $\bar{u}_J$ is equal to $p(\alpha,\alpha_e)$, i.e.,

$$\int p(\alpha,\alpha_e,\beta,\bar{y}_J,\bar{u}_J)\,d\beta\,d\bar{y}_J\,d\bar{u}_J = p(\alpha,\alpha_e), \qquad (4.5)$$

we can write the loss function L of Eq. (4.4) in the alternate form

$$L = \int (\alpha-\alpha_e)^2\,p(\alpha,\alpha_e,\beta,\bar{y}_J,\bar{u}_J)\,d\alpha\,d\alpha_e\,d\beta\,d\bar{y}_J\,d\bar{u}_J \qquad (4.6)$$

where $\bar{y}_J = (y_1,y_2,\ldots,y_J)$, $\bar{u}_J = (u_1,u_2,\ldots,u_J)$, $d\bar{y}_J = dy_1\,dy_2\cdots dy_J$ and $d\bar{u}_J = du_1\,du_2\cdots du_J$. (Remember $y_j$ is the jth sample of the received signal y.) Moreover, if we make use of the relation*

$$p(x_1,x_2) = p(x_1/x_2)\,p(x_2)$$

which defines the conditional probability density of $x_1$ given $x_2$, we can write

*See Ref. 43, pp. 29-30.

$$p(\alpha,\alpha_e,\beta,\bar{y}_J,\bar{u}_J) = p(\alpha,\beta)\,p(\alpha_e/\bar{y}_J,\bar{u}_J,\alpha,\beta)\,p(\bar{y}_J,\bar{u}_J/\alpha,\beta) \qquad (4.7)$$

and

$$p(\bar{y}_J,\bar{u}_J/\alpha,\beta) = p(u_J/\bar{y}_{J-1},\bar{u}_{J-1},\alpha,\beta)\,p(y_J/\bar{y}_{J-1},\bar{u}_J,\alpha,\beta)\,p(\bar{y}_{J-1},\bar{u}_{J-1}/\alpha,\beta). \qquad (4.8)$$

In this last equation, the second term on the right can also be written as

$$p(y_J/\bar{y}_{J-1},\bar{u}_J,\alpha,\beta) = p(y_J/u_J,\alpha,\beta) \qquad (4.9)$$

since the noise random variables $\{n_j\}$ are assumed to be independent and since the function βC in our model of Fig. 2.3 has no memory, which implies the probability of $y_J$ given $\bar{y}_{J-1}$, $\bar{u}_J$, α, and β is just the probability of $y_J$ given $u_J$, α, and β alone.

Let us now assume the filter function exhibits a random strategy, i.e., the filter generates an estimate $\alpha_e$ based on the data $\bar{y}_J$ and $\bar{u}_J$ according to some probability density function which we shall denote by $\Gamma(\alpha_e/\bar{y}_J,\bar{u}_J)$. (By allowing a random strategy for the filter function rather than requiring it to be a deterministic function, we have increased the number of elements in the class of possible filters from which an optimal element is sought. We shall find, however, the optimal element in this class exhibits a deterministic strategy rather than a random strategy, as might be expected.) It is then obvious that the conditional probability density $p(\alpha_e/\bar{y}_J,\bar{u}_J,\alpha,\beta)$ on the right of Eq. (4.7) can be written as

$$p(\alpha_e/\bar{y}_J,\bar{u}_J,\alpha,\beta) = p(\alpha_e/\bar{y}_J,\bar{u}_J) = \Gamma(\alpha_e/\bar{y}_J,\bar{u}_J) \qquad (4.10)$$

since the output of the filter depends only on the values of $\bar{y}_J$ and $\bar{u}_J$. If we also assume the control function for the Jth subinterval exhibits a random strategy, denote it $\Gamma_J(u_J/\bar{y}_{J-1},\bar{u}_{J-1})$, we find the first term on the right of Eq. (4.8) can similarly be written as

$$p(u_J/\bar{y}_{J-1},\bar{u}_{J-1},\alpha,\beta) = p(u_J/\bar{y}_{J-1},\bar{u}_{J-1}) = \Gamma_J(u_J/\bar{y}_{J-1},\bar{u}_{J-1}) \qquad (4.11)$$

since the control during the Jth subinterval depends only on $\bar{y}_{J-1}$ and $\bar{u}_{J-1}$. Then, substituting (4.9) and (4.11) into (4.8) and proceeding by induction, we obtain

$$p(\bar{y}_J,\bar{u}_J/\alpha,\beta) = \prod_{j=1}^{J}\Gamma_j\,p(y_j/u_j,\alpha,\beta) \qquad (4.12)$$

where $\Gamma_j = p(u_j/\bar{y}_{j-1},\bar{u}_{j-1})$ is the control function for the jth subinterval and $\Gamma_1 = p(u_1)$. If we now define the function $G(\alpha_e)$ by

$$G(\alpha_e) = \int_{Q(\alpha,\beta)} (\alpha-\alpha_e)^2\,p(\alpha,\beta)\prod_{j=1}^{J} p(y_j/u_j,\alpha,\beta)\,d\alpha\,d\beta \qquad (4.13)$$

and substitute (4.7), (4.10), (4.12), and (4.13) into (4.6), the loss function becomes

$$L = \int_{Q(\bar{y}_J,\bar{u}_J)}\left[\int_{Q(\alpha_e)} G(\alpha_e)\,\Gamma\,d\alpha_e\right]\prod_{j=1}^{J}\Gamma_j\,d\bar{y}_J\,d\bar{u}_J . \qquad (4.14)$$

Moreover, since the filter makes its estimate after the Jth subinterval, for purposes of determining the optimal filter function we can assume the $\Gamma_j$'s are known. As a result, it can be seen from Eq. (4.14) that L is minimized if

$$\int_{Q(\alpha_e)} G(\alpha_e)\,\Gamma\,d\alpha_e \qquad (4.15)$$

achieves a minimum value for each possible combination of the vectors $\bar{y}_J$ and $\bar{u}_J$. It can also be seen that (4.15) achieves this minimum value if

$$\Gamma = \delta(\alpha_e - \hat{\alpha}) \qquad (4.16)$$

where δ is the Dirac delta function and $\hat{\alpha}$ minimizes $G(\alpha_e)$ for any set of vectors $\bar{y}_J$ and $\bar{u}_J$. Therefore, minimizing $G(\alpha_e)$ with respect to $\alpha_e$ by taking the partial derivative of $G(\alpha_e)$ with respect to $\alpha_e$ and setting it equal to zero, we obtain

$$\hat{\alpha} = \frac{\int_{Q(\alpha,\beta)} \alpha\,p(\alpha,\beta)\prod_{j=1}^{J} p(y_j/u_j,\alpha,\beta)\,d\alpha\,d\beta}{\int_{Q(\alpha,\beta)} p(\alpha,\beta)\prod_{j=1}^{J} p(y_j/u_j,\alpha,\beta)\,d\alpha\,d\beta} \qquad (4.17)$$

which, together with (4.16), represents the optimal filter function. (The second derivative of $G(\alpha_e)$ with respect to $\alpha_e$ is easily found to be positive for all $\alpha_e$ so that $\hat{\alpha}$ as defined in Eq. (4.17) does, indeed, minimize $G(\alpha_e)$.) Note that this filter is optimal regardless of the control mode, so, when no restriction to a particular class of filters is made, it is also an optimal open-loop filter.

Turning now to the problem of obtaining the optimal control function, let us define the function $G_J(u_J)$ by

$$G_J(u_J) = \int_{Q(y_J)} G(\hat{\alpha})\,dy_J . \qquad (4.18)$$

Then, substituting (4.18) into (4.14), the loss function L becomes

$$L = \int_{Q(\bar{y}_{J-1},\bar{u}_{J-1})}\left[\int_{Q(u_J)} G_J(u_J)\,\Gamma_J\,du_J\right]\prod_{j=1}^{J-1}\Gamma_j\,d\bar{y}_{J-1}\,d\bar{u}_{J-1} . \qquad (4.19)$$

It can now be seen, as in the previous paragraph, L is minimized when the term within the bracket is a minimum for any combination of the vectors $\bar{y}_{J-1}$ and $\bar{u}_{J-1}$. But this term achieves its minimum value when

$$\Gamma_J = \delta(u_J - \hat{u}_J) \qquad (4.20)$$

and $\hat{u}_J$ minimizes $G_J(u_J)$. Thus, $\Gamma_J$ as given in Eq. (4.20) is the optimal control function during the Jth subinterval. If the above procedure is repeated, the remaining optimal control functions can be found to have the similar form

$$\Gamma_j = \delta(u_j - \hat{u}_j) \qquad (4.21)$$

where $\hat{u}_j$ minimizes the quantity $G_j(u_j)$ defined by

$$G_j(u_j) = \int_{Q(y_j)}\min_{u_{j+1}}\int_{Q(y_{j+1})}\cdots\min_{u_J}\int_{Q(y_J)}\int_{Q(\alpha,\beta)} (\alpha-\hat{\alpha})^2\,p(\alpha,\beta)\prod_{i=1}^{J} p(y_i/u_i,\alpha,\beta)\,d\alpha\,d\beta\,dy_J\cdots dy_{j+1}\,dy_j . \qquad (4.22)$$

The problems associated with the implementation of this equation in a real situation will be discussed in the next chapter.

This completes the derivation of the optimal filter function, (4.16) together with (4.17), and the optimal control functions, (4.21) together with (4.22), which jointly minimize the loss function L. By reinstituting the proper superscript on the control variable $u_j$, these optimal functions

can be seen to coincide with those presented in Section 4.3. The expression for $p(y_j/u_j^1,\alpha,\beta)$ given in Eq. (4.2) is obtained by replacing the noise variable $n_j$ by $[y_j - \beta C(\alpha,u_j^1,u^2)]$ in the probability density function for $n_j$. This follows from the fact that the probability of $y_j$ being in a particular set A given $u_j^1$, α, and β is just the probability that $n_j$ is in the set specified by the collection of points $y_j - \beta C(\alpha,u_j^1,u^2)$ as $y_j$ ranges over the set A. (Remember, u² is also known due to Assumption A3.2.1.)
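Since Chapter 5 will note that Eq. (4.1), unlike Eq. (4.3), can be implemented by numerical integration, a brief sketch may be helpful here. The following is only an illustrative outline, not part of the original derivation: it assumes the Gaussian noise density of Eq. (4.2), borrows the pattern form C(α,u¹,u²) = exp{−u²(α−u¹)²} that is tried in Chapter 5, and all grids, names, and parameter values are hypothetical.

```python
import numpy as np

# Illustrative sketch of the optimal filter of Eq. (4.1): the estimate is a
# ratio of integrals over Q(alpha, beta), evaluated here on a rectangular
# grid.  Gaussian noise (Eq. (4.2)) is assumed; u2 is the fixed beam-width
# control of Assumption A4.2.2.

def pattern(alpha, u1, u2):
    # assumed pattern form C(alpha, u1, u2) = exp(-u2*(alpha - u1)**2)
    return np.exp(-u2 * (alpha - u1) ** 2)

def alpha_hat(y, u1, u2, alpha_grid, beta_grid, p_ab, sigma_n):
    # p_ab: prior density p(alpha, beta) sampled on the same grid
    A, B = np.meshgrid(alpha_grid, beta_grid, indexing="ij")
    log_like = np.zeros_like(A)
    for y_j, u1_j in zip(y, u1):        # product over samples in Eq. (4.1)
        resid = y_j - B * pattern(A, u1_j, u2)
        log_like += -0.5 * (resid / sigma_n) ** 2
    w = p_ab * np.exp(log_like - log_like.max())  # unnormalized posterior
    return (A * w).sum() / w.sum()      # ratio of integrals in Eq. (4.1)
```

Because the same normalizing constants and grid cell areas appear in both the numerator and the denominator of Eq. (4.1), they cancel and are omitted; the accuracy of the estimate is limited only by the fineness of the chosen grids.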

CHAPTER 5

DISCUSSIONS OF CONTROL THEORY SOLUTIONS

5.1 DISCUSSION

The goal of the three previous chapters has been to determine if "Stochastic Optimal Control Theory" can be applied to the direction finding problem. In many respects, this approach seems the most natural since most existing direction finders operate by controlling the pointing angle of a directional receiving antenna. This control mode is normally independent of past received signals or, at least, it is chosen without any real regard to what is optimal. Therefore, if "Stochastic Optimal Control Theory" can be applied, possibly in conjunction with a digital computer, many existing systems could be easily converted to take advantage of these techniques.

Rather than investigate a complicated system as to its optimal performance, a much simplified model was constructed which still retained the essential features of a typical direction finding system. By using such a model, it was hoped that techniques could be established for solving the more general problem. Certainly, if no reasonable solution can be found for this simple model, it is doubtful that the more complicated models could be optimized to any real extent.

With the above considerations in mind, an investigation of the model under open-loop operation was initiated. This investigation was concerned with determining optimal filter and control functions which depend only on a priori information, i.e., filter and control functions which are not allowed to be altered during the observation interval as more information is received. These constraints are typical of those usually

imposed on existing direction finding systems. In addition, because of their ease of implementation, the class of possible filters from which an optimal element was sought was restricted to those whose outputs are a linear combination of the received signals plus, possibly, a known bias level.

The filter "weights," the filter bias, and the controls presented in Section 3.3 were shown to result in a system whose performance approached arbitrarily close to that of an optimal open-loop system as the beam width became large compared with the width of the probability density function p(α). (The system performance was measured by the expected value of the squared error which, for the above filter and controls, has the form illustrated in Eq. (3.9).) This particular relationship between the beam width and the width of p(α) corresponds to the limiting case in which a great deal of a priori information is available concerning the true direction of arrival and only minor refinements are required.

Observing now the functional form of the filter "weights," the filter bias, and the controls specified in Section 3.3, we find that they exhibit the following properties:

P5.1 The filter "weights" are dependent on the applied controls.

P5.2 The controls are such that the antenna is always directed so the angle of maximum slope of its pattern is oriented in the a priori expected direction $m_\alpha$.

P5.3 The controls are such that each of the two angles for which the antenna pattern slope is maximum are utilized for equal time intervals.

To understand this behavior, let us substitute Eq. (3.8) into Eq. (3.5) and assume the filter "weights" vector is b as defined in Eq. (3.7). We can then write

$$\alpha_e = m_\alpha + \sum_{j=1}^{J} b_j(y_j - \bar{y}_j) \qquad (5.1)$$

where $\bar{y}_j = m_\beta\rho_j$, which approaches arbitrarily close to the expected value of $y_j$ as the beam width becomes large compared with the width of p(α). It can now be seen, the estimate $\alpha_e$ is the sum of the a priori expected value of α plus a weighted sum of the deviations of the samples $\{y_j\}$ from the values $\{\bar{y}_j\}$.

Let us now consider the jth sample and approximate the antenna pattern as a function of α in the vicinity of $m_\alpha$ by the first three terms of its power series expansion, assuming the control is $m_\alpha + 1/\sqrt{2u^2}$. (Note for this control, the antenna pattern slope is maximum in the direction $\alpha = m_\alpha$, which implies the third term of the above power series expansion is zero.) This approximation to the pattern times $m_\beta$ is then illustrated in Fig. 5.1 where the deviation of the sample $y_j$ from $\bar{y}_j$ is plotted against α for the case where $\sigma_\beta^2 = 0$ and $\sigma_n^2 = 0$, i.e., for the case where β is known a priori and the sample $y_j$ is noise free. (Remember, $y_j = \beta C(\alpha,u_j^1,u^2) + n_j$ in the general case so, when $n_j = 0$ and $\beta = m_\beta$, the sample $y_j$ becomes $m_\beta C(\alpha,u_j^1,u^2)$.) Suppose, now, $\Delta\alpha$ is the deviation of the true angle of arrival and $\Delta y$ is the corresponding deviation of the sample $y_j$ from $\bar{y}_j$. Then, for this situation, it is obvious from Fig. 5.1 that $\Delta\alpha$ should be calculated by dividing $\Delta y$ by the slope of $m_\beta C(\alpha,u_j^1,u^2)$ at $m_\alpha$. Observing Eq. (3.7), we find this is exactly the operation performed by the filter specified by Eqs. (3.7) and (3.8) when $\sigma_\beta^2$ and $\sigma_n^2$ are zero. (The quantity $m_\beta q_{max}/\sqrt{J}$ is the slope of $m_\beta C(\alpha,u_j^1,u^2)$ at $\alpha = m_\alpha$ when the control $u_j^1$ is $m_\alpha + 1/\sqrt{2u^2}$.) Moreover, since the antenna pattern slope at $\alpha = m_\alpha$ depends on the control, it is clear that the filter should also depend on the controls as observed in Property P5.1.

Fig. 5.1. A plot of the deviation of the sample $y_j$ from $\bar{y}_j$ as a function of α in the vicinity of $m_\alpha$ when $u^1 = m_\alpha + 1/\sqrt{2u^2}$, $\sigma_\beta^2 = 0$ and $\sigma_n^2 = 0$.
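The slope-division operation just described is easily verified numerically. The sketch below is illustrative only: it assumes the Gaussian-shaped pattern exp{−u²(α−u¹)²} mentioned in this chapter, and all parameter values are hypothetical.

```python
import numpy as np

# Noise-free check of the slope-division estimate pictured in Fig. 5.1:
# divide the sample deviation by the pattern slope at m_alpha.
m_alpha, m_beta = 0.0, 1.0              # a priori means of alpha and beta
u2 = 2.0                                # fixed beam-width control (assumed)
u1 = m_alpha + 1.0 / np.sqrt(2 * u2)    # angle of maximum slope, Eq. (3.23)

def mean_sample(alpha):                 # m_beta * C(alpha, u1, u2)
    return m_beta * np.exp(-u2 * (alpha - u1) ** 2)

eps = 1e-6                              # numerical slope at alpha = m_alpha
slope = (mean_sample(m_alpha + eps) - mean_sample(m_alpha - eps)) / (2 * eps)

alpha_true = 0.05                       # small deviation from m_alpha
dy = mean_sample(alpha_true) - mean_sample(m_alpha)
print(m_alpha + dy / slope)             # ~0.05, recovering the true angle
```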

If we now let $y_j$ be a noisy sample so its value is displaced by an amount $\tilde{\Delta}$ from the value $\Delta y$ illustrated in Fig. 5.1, dividing the deviation $\Delta y + \tilde{\Delta}$ by the slope of $m_\beta C(\alpha,u_j^1,u^2)$ will result in the error $\Delta\hat{\alpha}$ shown in Fig. 5.2. Moreover, it can be seen that $\Delta\hat{\alpha}$ will be least when the slope is maximum, which accounts for the second property listed above. Note, however, in the noisy case, the filter weights as specified in Eq. (3.7) are dependent on the variance of the noise as well as the slope of $m_\beta C(\alpha,u_j^1,u^2)$.

To understand Property P5.3, let us again assume $y_j$ is a noise free sample but, now, allow β to be different from $m_\beta$. For this case, a plot of $\beta C(\alpha,u_j^1,u^2)$ in the vicinity of $m_\alpha$ when $u_j^1$ is equal to $m_\alpha + 1/\sqrt{2u^2}$ differs from a similar plot of $m_\beta C(\alpha,u_j^1,u^2)$ as shown in Fig. 5.3, where the respective curves are labeled "Curve a" and "Curve b." "Curve a" is displaced from "Curve b" by an amount which depends on $\beta - m_\beta$ and possesses a correspondingly greater slope. Also in this figure, "Curve c" is drawn parallel to "Curve b" but intersecting "Curve a" and the line $\alpha = m_\alpha$ at a common point. Now, if $\Delta\alpha$ is again the deviation of the angle from $m_\alpha$, dividing the corresponding sample deviation $\Delta y$ by the slope of "Curve b" results in the estimation error $\Delta\hat{\alpha}_1$. (We are forced to divide by the slope of "Curve b" instead of the slope of "Curve a" because β is unknown.) This error $\Delta\hat{\alpha}_1$ is equal to the sum of $\Delta\alpha_1$ and $\tilde{\Delta}_1$ as shown in Fig. 5.3.

Let us now look at Fig. 5.4 where the previous figure is redrawn but with $u_j^1$ equal to $m_\alpha - 1/\sqrt{2u^2}$. For this case, the error obtained by dividing the deviation $\Delta y$ by the slope of "Curve b" is $\Delta\hat{\alpha}_2$, which equals $\Delta\alpha_2$ minus $\tilde{\Delta}_2$. Furthermore, the errors $\Delta\alpha_1$ and $\Delta\alpha_2$ are equal, as are $\tilde{\Delta}_1$ and $\tilde{\Delta}_2$. It can now be seen, the use of the controls $m_\alpha + 1/\sqrt{2u^2}$ and $m_\alpha - 1/\sqrt{2u^2}$ for equal time intervals as stated in Property P5.3 eliminates the errors $\tilde{\Delta}_1$ and $\tilde{\Delta}_2$ since they cancel each other when the estimates of $\Delta\alpha$ are averaged.

Fig. 5.2. A plot of the deviation of the sample $y_j$ from $\bar{y}_j$ as a function of α in the vicinity of $m_\alpha$ when $u^1 = m_\alpha + 1/\sqrt{2u^2}$, $\sigma_\beta^2 = 0$, and $\tilde{\Delta}$ represents the noise in the sample.

Fig. 5.3. Plot of the deviation of the sample $y_j$ from $\bar{y}_j$ as a function of α in the vicinity of $m_\alpha$ when $u^1 = m_\alpha + 1/\sqrt{2u^2}$, $\sigma_\beta^2 \neq 0$ and $\sigma_n^2 = 0$.

Fig. 5.4. Plot of the deviation of the sample $y_j$ from $\bar{y}_j$ as a function of α in the vicinity of $m_\alpha$ when $u^1 = m_\alpha - 1/\sqrt{2u^2}$, $\sigma_\beta^2 \neq 0$ and $\sigma_n^2 = 0$.

The errors $\Delta\alpha_1$ and $\Delta\alpha_2$, on the other hand, cannot be eliminated since they add when the average is taken.

One additional property could be added to the above list which simplifies the implementation of the filter specified by Eqs. (3.7) and (3.8). It is easily shown that $\sum_{j=1}^{J} b_j\rho_j$ is zero, so Eq. (5.1) reduces to

$$\alpha_e = m_\alpha + \sum_{j=1}^{J} b_j y_j \qquad (5.2)$$

and, as a result, it is unnecessary to subtract $\bar{y}_j$ from $y_j$ for each sample as suggested by Eq. (5.1). This follows from the fact that the slopes of $\beta C(\alpha,u_j^1,u^2)$ at $m_\alpha$ when $u_j^1 = m_\alpha + 1/\sqrt{2u^2}$ and $u_j^1 = m_\alpha - 1/\sqrt{2u^2}$ are the negatives of each other while $\bar{y}_j$ remains fixed for all samples.

Finally, to obtain an indication of how the error variance is affected when Eqs. (3.6)-(3.8) are utilized, let us write Eq. (3.9) in the equivalent form

$$E\{(\alpha_e-\alpha)^2\} = \sigma_\alpha^2\,\frac{C_1+C_2}{1+C_1+C_2} + \epsilon' \qquad (5.3)$$

where $C_1 = \sigma_\beta^2/m_\beta^2$ and $C_2 = \sigma_n^2/(q_{max}^2\sigma_\alpha^2 m_\beta^2)$. A plot of the function $(C_1+C_2)/(1+C_1+C_2)$ against $C_2$ for various values of $C_1$ is then illustrated in Fig. 5.5, which, together with Eq. (5.3), can be used to evaluate the performance of the estimation technique presented in Chapter 3. (Remember, the term ε′ goes to zero as the beam width becomes large compared with the width of p(α).) The constant $q_{max}^2$ is equal to J times the square of the antenna pattern slope at $m_\alpha$.

The question can now be asked, "Are the above stated properties independent of the particular antenna pattern utilized or do they depend in a special way on the assumed form that was employed?"

Fig. 5.5. A plot of $(C_1+C_2)/(1+C_1+C_2)$ vs. $C_2$ for various values of $C_1$.
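For readers without access to the figure, the family of curves in Fig. 5.5 is easily regenerated. The following fragment, with purely illustrative values of C₁, tabulates the factor multiplying $\sigma_\alpha^2$ in Eq. (5.3).

```python
import numpy as np

# Tabulate the error-variance factor (C1 + C2)/(1 + C1 + C2) of Eq. (5.3)
# as a function of C2 for several illustrative values of C1 (cf. Fig. 5.5).
C2 = np.linspace(0.0, 6.0, 13)
for C1 in (0.0, 0.5, 1.0, 2.0):
    factor = (C1 + C2) / (1.0 + C1 + C2)
    print(C1, np.round(factor, 3))
```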

Returning to the derivation of Chapter 3, we find the same solution will be obtained regardless of the antenna pattern as long as it is symmetrical about its pointing angle. If the pattern is not symmetrical, it is not clear what form the optimal control should take, although the resulting error can be calculated easily from Eq. (3.22).

Turning now to the closed-loop results presented in Section 4.3, we see immediately the optimal filter is a nonlinear function of the samples $\{y_j\}$. In addition, we see the present optimal control depends not only on past samples $\{y_j\}$ but also on future controls. This is typical of many solutions obtained by the use of dynamic programming techniques. If we now take a closer look at Eq. (4.3), a major difficulty is directly noticed. Rather than provide an explicit expression for the jth optimal control, Eq. (4.3) represents an identity that the optimal control $\hat{u}_j$ must satisfy. In order to implement this result in a realistic problem, a method for solving this equation must be available. Several attempts were made to perform the indicated integrations analytically when various probability densities were substituted for $p_0$ and various antenna patterns were assumed present. In one instance, the integration was attempted when $p_0$ had the form of a Gaussian probability density and $C(\alpha,u^1,u^2)$ had the form $\exp\{-u^2(\alpha-u^1)^2\}$. These attempts did not meet with any success, however, so the possibility of performing the integrations numerically with a digital computer was explored. Although useful for implementing the optimal filter of Eq. (4.1), numerical integration was found to be an impractical approach to a solution of Eq. (4.3) since the variables $\{y_j\}$ range from $-\infty$ to $+\infty$. There are simply too many arithmetical operations to be performed, even with a digital computer, when a reasonable number of samples J is considered. Finally, attempts were made to approximate the optimal controls by assuming the probability density p(α,β) was a "narrow" function about $m_\alpha$ and $m_\beta$ as done in Chapter 3. Some preliminary results were obtained but, because of their marginal applicability, have not been included in this presentation.

In conclusion, it was found that "Stochastic Optimal Control Theory," at least that applied by the author, did not give a satisfactory answer to the question of how an optimal direction finding system should operate. As a result, it was decided to alter the above approach and attempt to apply, instead, "Estimation Theory" to this problem. This altered approach, utilizing "Estimation Theory," is the subject of the remaining chapters.
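The operation-count difficulty mentioned above can be made concrete with a rough calculation. Under the purely illustrative assumption that each sample and each control is discretized to N points, the nested integrations and minimizations of Eq. (4.3) touch on the order of N^(2J) grid points:

```python
# Rough, illustrative count of the work implied by Eq. (4.3): with each
# sample y_i and each control u_i discretized to N points, the nested
# integrations and minimizations over J stages visit roughly N**(2*J)
# grid points.  The numbers below are illustrative only.
N = 100                                  # grid points per variable (assumed)
for J in (2, 4, 8):
    print(J, float(N) ** (2 * J))        # 1e8, 1e16, 1e32 evaluations
```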

CHAPTER 6

STATEMENT OF THE ESTIMATION PROBLEM

6.1 INTRODUCTION

In the previous chapters, a direction finding system was considered which employed a beam forming network. This network delayed and combined the outputs from the various elements of an antenna array in a manner specified by a controller so as to produce a system which had the characteristics of a single antenna of variable directivity and whose direction of maximum gain could be varied. These directional properties of the system were then exploited for the purpose of determining the direction of arrival of the impinging signal, with a "control function" and "filter function" being derived which resulted in optimal performance of the system.

In the remaining chapters, a direction finding system will be considered in which the beam forming network is absent and the outputs of the antenna elements are processed directly. This method of attack results in what appears to be a more general solution in the sense that the system is, initially, less constrained. This statement must be qualified, however, since a different criterion of optimality is applied to the model. In any event, this technique results in a system which lends itself to a more complete mathematical analysis and which can be implemented by hardware that is available today.

6.2 STATEMENT OF THE PROBLEM

The objective of the remaining chapters, as in the previous chapters, will be to derive a technique for determining a "best* estimate" of the direction of arrival of a random signal which has been emitted by a point source. This estimate is to be based on the time signals observed at the outputs of K antennas which form a receiving antenna array. In addition to the signal originating from the point source, we shall assume the received signals include additive noise terms. We shall also assume the source is far enough removed that, at the site of the receiving array, the random signal takes the form of a plane wave. The geometry of a portion of the array relative to the plane wave is illustrated in Fig. 6.1 while the system which is to provide the estimate is illustrated in Fig. 6.2.

In Fig. 6.1, a coordinate system has been drawn with an arbitrary element of the array positioned at its origin. A portion of the wavefront assumed to be impinging upon the array together with a unit vector u originating at the origin and perpendicular to this wavefront have been indicated. The angles between u and the x and y axes are denoted by $\alpha_1$ and $\alpha_2$, respectively, and jointly represent the angle of arrival of the wave. Also in this figure, the vector $\ell_k$ has been drawn from the origin to the site of the kth element of the array. We shall now assume, without loss of generality, the vectors u and $\ell_k$ are column vectors whose components are the projections of these vectors on the coordinate axes. It can now be seen, if s(t) is the time signal received at the origin due exclusively to the point source, the corresponding signal $[s(t)]_k$ received by the kth element of the array is given by

$$[s(t)]_k = s\!\left(t + \frac{\ell_k^T u}{c}\right) \qquad (6.1)$$

where c is the velocity of propagation of the wave, $\ell_k^T$ is the transpose of $\ell_k$ and $\ell_k^T u$ is the matrix product of $\ell_k^T$ and u.

*The optimality criterion is discussed in Assumption A6.3.3 of Section 6.3.
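A short numerical sketch may make Eq. (6.1) concrete. The element positions, angles, and names below are illustrative assumptions only; the computation is just the inner product $\ell_k^T u$ divided by c.

```python
import numpy as np

# Illustrative evaluation of the delays in Eq. (6.1): the signal at the kth
# element is the origin signal shifted by l_k^T u / c.
c = 3.0e8                                  # propagation velocity (m/s)
alpha1, alpha2 = np.deg2rad(60.0), np.deg2rad(45.0)
# u has direction cosines cos(alpha1), cos(alpha2) along the x and y axes
u = np.array([np.cos(alpha1), np.cos(alpha2),
              np.sqrt(max(0.0, 1 - np.cos(alpha1)**2 - np.cos(alpha2)**2))])
l = np.array([[0.0, 0.0, 0.0],             # assumed element positions l_k (m)
              [10.0, 0.0, 0.0],
              [0.0, 10.0, 0.0]])
tau = l @ u / c                            # tau_k = l_k^T u / c, in seconds
print(tau)
```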

Fig. 6.1. Geometry of the receiving array relative to the direction of arrival of the signal. (× denotes antenna locations.)

Fig. 6.2. Processing system to be used to provide an estimate of the direction of arrival of the signal.

(In the following presentation, brackets will be used to distinguish K-dimensional column vectors whose individual elements are either time functions or random processes. Moreover, the K×K matrix covariance functions corresponding to these random processes will also be characterized with brackets, while individual elements of these matrices as well as the above K-dimensional vectors will be denoted by subscripts. Thus, we shall use [s(t)] to represent the column vector of signals received by the antenna array due to the point source and $[s(t)]_k$, as in Eq. (6.1), to denote the kth element of [s(t)], which is the signal received by the kth array element due to the point source.)

In Fig. 6.2, the outputs from the various antenna elements are shown as inputs to a processor whose output is represented by $(\hat{\alpha}_1,\hat{\alpha}_2) = \hat{\alpha}$, which is a "best estimate" of $(\alpha_1,\alpha_2) = \alpha$. The problem now reduces to the determination of the operations to be performed on the input data by the processor so as to produce an optimal estimate. We will say these operations have been specified when an equation has been provided whose solution is readily obtainable and represents the optimal estimate. In addition to providing this equation, consideration will be given to the problem of obtaining an operational method for providing a solution to the equation by the use of existing hardware.

6.3 ASSUMPTIONS (General Case)

In order to obtain a solution to this problem it will be necessary to be more specific concerning the type of noise that will be present, the type of signals that might be emitted by a typical point source and the criterion of optimality that must be satisfied. We shall begin by specifying the most general conditions under which we are able to obtain a solution. In the next section, we shall present conditions which are

more restrictive but which, at the same time, allow us to obtain a more tractable solution. The general solution, which is derived in Chapter 7, is optimal when the following assumptions are valid.

A6.3.1 The time signal $[y(t)]_k$ received by the kth antenna of the array during the observation interval $[0,T_f]$ is the sum of noise and a signal which is due to the point source alone whose angular location is desired. Each of these terms represents the kth component of a vector sample function from a zero-mean, K-dimensional, vector Gaussian process. (See Section 7.2.1 for a discussion of random processes.) If $r_s(t,v)$ is the covariance function of the signal (point source) process driving the antenna located at the origin in Fig. 6.1 and if α is the true direction of arrival, the matrix covariance function of the signal process $[S_\alpha(t)]$ driving the entire array is given by $[r^{(\alpha)}(t,v)]$ and its jkth element takes the form

$$[r^{(\alpha)}(t,v)]_{jk} = r_s(t+\tau_j,\,v+\tau_k) \qquad (6.2)$$

where

$$\tau_n = \frac{\ell_n^T u(\alpha)}{c},\qquad n\in\{j,k\},$$

and c is the velocity of propagation of the wave (see Eq. (6.1)). Furthermore, the elements of this matrix are continuous and bounded on $[0,T_f]\times[0,T_f]$ where $[0,T_f]$ represents the observation interval. The noise process [N(t)] driving the array is independent of the signal process, and the elements of its covariance function $[r^0(t,v)]$ are bounded and continuous on $[0,T_f]\times[0,T_f]$ also. In addition, the noise process has the property of "separability" (see Ref. 29, p. 52) and there exist constants C > 0 and δ > 0 such that

$$[r^0(t,t)]_{kk} + [r^0(v,v)]_{kk} - [r^0(v,t)]_{kk} - [r^0(t,v)]_{kk} \le C|t-v|^\delta,\qquad k\in K,$$

when $t,v\in[0,T_f]$. (These last two conditions will be used to guarantee the continuity of "nearly all" sample functions of the noise process [N(t)]. Continuity of the sample functions is a reasonable requirement since they are the observed outputs of physical devices which possess finite response times.) Thus, the total process $[Y_\alpha(t)]$ driving the array is given by

$$[Y_\alpha(t)] = [S_\alpha(t)] + [N(t)]$$

and is Gaussian, of zero mean, and possesses a covariance function of the form

$$[r^\alpha(t,v)] = [r^{(\alpha)}(t,v)] + [r^0(t,v)]. \qquad (6.3)$$

Finally, the covariance function of the noise process [N(t)] satisfies

$$\int_0^{T_f}\!\!\int_0^{T_f} [f(t)]^T[r^0(t,v)][f(v)]\,dt\,dv > 0 \qquad (6.4)$$

for every K-dimensional, column vector valued function [f(t)] which satisfies

$$0 < \int_0^{T_f} [f(t)]^T[f(t)]\,dt < \infty .$$

(The superscript T on the vector [f(t)] denotes the transpose.)
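The structure of Eq. (6.2) is easy to exercise numerically. The sketch below is illustrative only: $r_s$ is an assumed stationary covariance (Assumption A6.3.1 does not require stationarity), and the delays $\tau_k$ are hypothetical values of $\ell_k^T u(\alpha)/c$.

```python
import numpy as np

# Illustrative construction of the jk-th element of [r^(alpha)(t, v)] per
# Eq. (6.2): r_s(t + tau_j, v + tau_k).  The covariance r_s and the delays
# tau are assumed, narrow-band choices.

def r_s(t, v, w0=2 * np.pi * 1.0e6, bw=2 * np.pi * 1.0e4):
    # stationary example: damped cosine in the lag t - v
    return np.cos(w0 * (t - v)) * np.exp(-bw * abs(t - v))

def signal_cov(t, v, tau):
    K = len(tau)
    return np.array([[r_s(t + tau[j], v + tau[k]) for k in range(K)]
                     for j in range(K)])

tau = np.array([0.0, 1.0e-8, 2.0e-8])      # hypothetical delays (seconds)
print(signal_cov(0.0, 1.0e-6, tau))        # [r^(alpha)(t, v)] at one (t, v)
```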

A6.3.2 The signal processor has available certain a priori information concerning the possible values of the angle of arrival. In particular, this information takes the form of a known probability distribution function for the possible angles of arrival.

A6.3.3 The "criterion of optimality" to be satisfied is that of "Minimum Probability of Error (MPE)." If the direction of arrival has one of M possible values, this criterion requires the space of possible received signals Ω to be partitioned into M disjoint subsets $(A_1,A_2,\ldots,A_M)$ which have the property that

$$P_e = \sum_{i=1}^{M} \Pr(\alpha_i)\,\Pr\nolimits_i(\Omega - A_i) \qquad (6.5)$$

is minimum among the set of possible partitions. In Eq. (6.5), $\Pr(\alpha_i)$ represents the a priori probability that $\alpha_i$ is the true direction of arrival and $\Pr_i(\Omega - A_i)$ is the probability of the received signal not being an element of $A_i$ when the true direction of arrival is $\alpha_i$. (Another term can be added to the sum in Eq. (6.5) if the possibility of noise alone being received is present. See Section 7.3 for additional discussion.) Thus, if a decision scheme is used which specifies $\alpha_i$ as the true direction of arrival when the received signal is contained in $A_i$, it can be seen that $P_e$ is truly the probability of error. A similar expression is obtained in the more realistic case where M→∞. (In Appendix V, this optimality criterion is shown to result in a partition that is "equivalent" to that obtained by the use of the "Maximum A Posteriori Probability (MAP)" criterion. This latter criterion places a particular received signal in the subset $A_i$ if $\alpha_i$ has the greatest probability of being the true angle of arrival based on the received signal and the a priori information.)
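A finite-dimensional caricature may clarify Assumption A6.3.3. In the sketch below, the M hypotheses and the Gaussian form of the conditional densities are illustrative assumptions only; the rule simply assigns the observation to the subset $A_i$ of greatest a posteriori probability, which, as Appendix V shows for the actual problem, yields a partition equivalent to the one minimizing $P_e$ of Eq. (6.5).

```python
import numpy as np

# Toy MAP decision among M hypothetical directions alpha_i: pick the i
# maximizing log Pr(alpha_i) + log p(y | alpha_i).  The Gaussian densities
# and all numbers are illustrative.

def map_decision(y, priors, means, sigma):
    scores = [np.log(p) - 0.5 * np.sum((y - m) ** 2) / sigma ** 2
              for p, m in zip(priors, means)]
    return int(np.argmax(scores))          # index of the subset A_i chosen

priors = [0.5, 0.3, 0.2]
means = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(map_decision(np.array([0.9, 0.1]), priors, means, sigma=0.5))  # -> 1
```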

Assumption A6.3.1 implies (1) the dimensions of the array are small compared with the distance separating the source and the array so that the wavefront of the source signal is essentially a plane wave when viewed at the site of the array, (2) the individual elements of the array are separated by a distance which is sufficient to produce negligible interaction of elements (this requires* the interelement spacing to be somewhat greater than the largest dimension of the elements), and (3) the antenna elements have omnidirectional receiving patterns over the region of surveillance. In an attempt to justify the assumption of known covariance functions, let us remember that, in the case of stationary sources, knowledge of the covariance functions is equivalent to knowledge of the power spectra, which is a reasonable assumption in many direction finding problems. The assumption that the signal and noise processes are Gaussian is typical of the restrictions normally placed on communications problems. As always, the main justification for this assumption lies in the fact that it simplifies the calculations and results in reasonable solutions to the problem. Finally, we shall find the property presented in Eq. (6.4) easily verified in the cases which will be of particular interest to us, namely, those in which the noise process driving each array element is stationary and independent of all others.

Assumption A6.3.2 is particularly appropriate in the case where continuous surveillance of a mobile source is not possible, e.g., either the source does not emit continuously or the direction finding system cannot devote its efforts to this task continuously. Between each period of surveillance, the source is allowed to change its position with the result that, before each period, there exists an uncertainty as to

*See Ref. 10, p. 122.

its true position which is reflected in the given probability distributions.

Many different optimality criteria could have been used instead of that specified in Assumption A6.3.3. Intuitively, the probability of error appears to be a reasonable measure of performance of the system. The preponderant reason for choosing this criterion, however, is the fact that the use of it results in a solution that can be easily implemented with existing hardware.

6.4 ASSUMPTIONS (Special Cases)

By placing further restrictions on the functional form of the noise and signal covariance functions, we are able to produce several special cases which are often reasonable descriptions of the environment of a direction finding system. Moreover, the solutions obtained for these cases represent a tremendous simplification over the general case solution obtained in Chapter 7 as far as implementation is concerned. In each of these special cases, which are analyzed in Chapter 8, the covariance function of the noise process will have a different assumed form. (See Sections 6.4.1 and 6.4.2 for characterizations of these covariance functions.) Before specifying these forms, however, let us introduce the following assumption which, also, will be employed in attaining each of the special case solutions.

A6.4.1 The noise and signal processes are stationary. Furthermore, each of these processes has the "band-limited" property that the Fourier transform of the jkth element of its covariance function, $[F(\omega)]_{jk}$, is representable in the form illustrated by Fig. 6.3, where "most" of the transform is concentrated near a frequency $\omega_0$ and where the reciprocal of the difference between the lowest and highest frequency (of significant energy) is large compared with the propagation time of signals across the array.

Fig. 6.3. Typical Fourier transform of $[r^\alpha(\tau)]_{jk}$, which represents the cross-covariance function of the processes driving the jth and kth antennas of the receiving array (the transform is concentrated about the frequencies $\pm\omega_0$).

Also, there exists a symmetric kernel $[h_T(t,s)]$ which is square integrable on T×T and which satisfies

$$\int_0^{T_f}\!\!\int_0^{T_f} [r^0(u-t)][h_T(t,s)][r^\alpha(s-v)]\,dt\,ds = [r^\alpha(u-v)] - [r^0(u-v)]. \qquad (6.6)$$

Finally, $T_f$ is large compared with the reciprocal of the difference between the lowest and highest frequency (of significant energy) of $[F(\omega)]_{jk}$ so that $[h_T(t,s)]$, for $t,s\in[0,T_f]$, can be "approximated" by the kernel $[h^s(t-s)]$ which satisfies

$$\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} [r^0(u-t)][h^s(t-s)][r^\alpha(s-v)]\,dt\,ds = [r^\alpha(u-v)] - [r^0(u-v)]. \qquad (6.7)$$

(The sense in which $[h_T(t,s)]$ is to be approximated by $[h^s(t-s)]$ will be discussed in Chapter 8.) The solution of Eq. (6.7) is easily obtained by the use of Fourier transforms, and conditions on $[r^0(t,s)]$ and $[r^\alpha(t,s)]$ are easily found in order that this solution is symmetric and square integrable on T×T. Moreover, it seems reasonable that the solution of Eq. (6.6) should "approach" the solution of Eq. (6.7) as $T_f$ becomes large. We shall find, however, the existence of a bounded symmetric solution to Eq. (6.6) and the fact that it is "approximated" by $[h^s(t-s)]$ is very difficult to verify when specific covariance functions are considered.

The following two sections characterize the forms that the noise covariance functions will possess in the special cases analyzed.
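The remark that Eq. (6.7) is easily solved by Fourier transforms can be sketched as follows. Transforming both sides turns the double convolution into the matrix product $F^0(\omega)H(\omega)F^\alpha(\omega) = F^\alpha(\omega) - F^0(\omega)$ of K×K spectral matrices, so $H(\omega) = [F^0(\omega)]^{-1}[F^\alpha(\omega) - F^0(\omega)][F^\alpha(\omega)]^{-1}$ frequency by frequency, and $[h^s(t-s)]$ is recovered by inverse transformation. The code below is an illustrative fragment only; the spectral matrices F0 and Fa are assumed to be given on a frequency grid.

```python
import numpy as np

# Frequency-domain solution of Eq. (6.7):
#   F0(w) H(w) Fa(w) = Fa(w) - F0(w)
# solved at each frequency bin for the transform H(w) of [h^s].

def solve_h(F0, Fa):
    # F0, Fa: arrays of shape (n_freq, K, K); returns H on the same grid
    H = np.empty_like(Fa)
    for i in range(Fa.shape[0]):
        H[i] = np.linalg.solve(F0[i], Fa[i] - F0[i]) @ np.linalg.inv(Fa[i])
    return H
```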

6.4.1 Independent, Identically Distributed Noise

This special case is a reasonable description of the situation where the noise is due primarily to the thermal noise introduced by the receiving elements of the antenna array.

A6.4.1.1 The noise processes driving each of the receiving antenna elements are independent and their covariance functions are identical and denoted by $r_N(\tau)$.

6.4.2 Independent, Identically Distributed (Except for an Amplitude Factor) Noise

This case is similar to that of Section 6.4.1 except now we are allowing the noise level to vary between antenna elements.

A6.4.2.1 The noise processes driving each of the receiving antenna elements are independent and their covariance functions are identical except for an amplitude factor, with the kth process covariance function denoted by $a_k r_N(\tau)$.

CHAPTER 7

GENERAL SOLUTION OF THE ESTIMATION PROBLEM

7.1 INTRODUCTION

The method of solution that will be employed in this chapter will rely heavily on the theories of probability, stochastic processes and functional analysis. Such books as Probability Theory by M. Loeve,28 Stochastic Processes by J. Doob,29 Measure Theory by P. Halmos,30 and Functional Analysis by A. Taylor44 are excellent references for this presentation. However, an attempt will be made to make this treatment self-contained in the sense that all necessary definitions will be included, with theorems being quoted if they are needed. In general, the proofs of the theorems will be omitted if they are readily available in the literature.

In this chapter, a technique will be developed for detecting a signal and determining its angle of arrival at the site of a receiving antenna array. The application of this technique will result in the specification of either (1) "no signal is present" or (2) "an estimate of the true angle of arrival of the unknown signal is $\alpha_i$" where $\alpha_i$ is contained in the set of possible angles of arrival. This specification will be optimal in the sense that it minimizes the probability of error. (Remember, the optimality criterion chosen in Section 6.3 was "Minimum Probability of Error.") Moreover, it specifies the hypothesis of "Maximum a Posteriori Probability," i.e., it specifies the hypothesis that is "most probable" based on the signals received by the array and any a priori information that is available concerning the true angle of arrival and the random processes present. The form of these signals and the nature of the a priori information is discussed in Section 6.3.

The approach pursued in this chapter is an extension of that used by T. T. Kadota,36-39 T. S. Pitcher,45 and B. H. Bharucha46 to solve the problem of distinguishing between possible transmitted signals of differing covariance functions by an observation of a single received signal. In the direction finding problem, we are interested in distinguishing between possible directions of arrival by the observation of a vector of received signals which is a sample function of a vector valued process whose covariance function varies in a known manner with the direction of arrival of the signals.

7.2 MATHEMATICAL PRELIMINARIES

Concepts and theorems that will be useful in the remainder of this chapter will be briefly reviewed in the next two sections.

7.2.1 Probability Theory and Stochastic (Random) Processes

Elementary probability theory suffices to handle problems of the type considered in the four preceding chapters of this thesis in which only a finite number of random variables are present. However, when continuous parameter processes are present, questions concerning existence and uniqueness become important and necessitate a more fundamental approach in which very precise definitions are required. As a result, we will begin by stating these definitions and proceeding from this point.

In order to consider random processes, we must first recall what is meant by a probability space and a random variable.

Definition 7.1: A probability (measure) space (Ω,ℬ,P) is the triple of the sure event Ω, the (nonempty) σ-field ℬ of events (or measurable sets), and the probability (measure) P defined on ℬ. (The symbols ℬ and P with various subscripts will be reserved for the above quantities

while ω will be used to denote the individual elements of Ω.) The terms measure,* probability and probability measure will be used interchangeably throughout this treatment since a probability can be defined as a measure for which P(Ω) = 1.

A special type of probability measure that will be of particular interest is called a complete probability measure. This measure has the property that subsets of sets of measure zero are measurable, i.e., if a set A is contained in a set B∈ℬ and if P(B) = 0, then A∈ℬ and P(A) = 0. Actually, this is a very reasonable property for a probability space to possess since, intuitively, we would expect a subset of a set of probability measure zero to have zero probability also. It can be shown** that a given probability measure can always be completed in a unique way by a slight enlargement of the σ-field ℬ. This enlargement, ℬ̄, consists of all sets of the form B∪N where B∈ℬ and N is a subset of a set in ℬ of measure zero. (The σ-field ℬ̄ is also given by sets of the form BΔN where B and N are as above and BΔN denotes the symmetric difference of B and N, which equals (B−N)∪(N−B).) The measure P̄ on ℬ̄ is then defined by

$$\bar{P}(B\cup N) = \bar{P}(B\Delta N) = P(B). \qquad (7.1)$$

Definition 7.2: A real random variable X on (Ω,ℬ,P) is a mapping X of Ω into the real line $\Omega_R = (-\infty,\infty)$ which is measurable relative to ℬ and the σ-field ℬ_R of Borel sets*** in $\Omega_R$, i.e., for**** every B∈ℬ_R, we have {ω; X(ω)∈B}∈ℬ.

*See Ref. 30, p. 30, for the definition of a measure.
**See Ref. 30, p. 55.
***See Ref. 30, p. 62, for the definition of Borel sets.
****It is shown in Ref. 30, p. 79, that this definition of measurability of a random variable is equivalent to requiring, for all real a, {ω; X(ω) ≤ a}∈ℬ.

(The concept of an expectation of a random variable X, E{X}, will also prove to be very useful and is defined as the integral of the random variable X over the probability space Ω, i.e., $E\{X\} = \int_\Omega X\,dP$.)

A random variable as defined above can be seen to induce a probability measure on the Borel sets of the real line. In particular, there exists a distribution function

$$F(\lambda) = P\{\omega; X(\omega) \le \lambda\} \qquad (7.2)$$

which is defined for all real λ. This distribution function then defines a probability measure $P_R$,

$$P_R(B) = \int_B dF(\lambda) \qquad (7.3)$$

for all B where B is a Borel set and the integral is of the Lebesgue-Stieltjes type. Actually, we have performed a measure preserving transformation from (Ω,ℬ,P) to $(\Omega_R,ℬ_R,P_R)$ as discussed by Doob.* The utility of this transformation becomes apparent if we consider, for example, the problem of determining the expectation of a random variable of the form Φ(X) where Φ is a Baire** function of the real random variable X. It can be shown that

$$\int_\Omega \Phi[X(\omega)]\,dP = \int_{-\infty}^{\infty}\Phi(\lambda)\,dF(\lambda) \qquad (7.4)$$

and, as a result, it is unnecessary to revert to the original probability space to calculate the expectation. Moreover,*** any problem involving random variables defined on (Ω,ℬ,P) which are measurable with respect to ℬ(X) (the smallest σ-field on which X is measurable) can be expressed

**Any function defined on the real line which is measurable is called a Baire function.
***See Ref. 29, p. 621.

as the corresponding problem in terms of random variables defined on the induced probability space $(\Omega_R,ℬ_R,P_R)$. (The σ-field ℬ should not be confused with the σ-field ℬ(·) ⊂ ℬ which is dependent on the quantity contained within the parentheses.)

In practice, the inverse of the above operation arises. That is, what properties must a given function F(λ) possess in order that there exist a probability space and a random variable with F(λ) as its distribution function? Loeve* shows that requiring F(λ) to be monotone nondecreasing, continuous on the right, and

$$\lim_{\lambda\to-\infty} F(\lambda) = 0,\qquad \lim_{\lambda\to+\infty} F(\lambda) = 1 \qquad (7.5)$$

is necessary and sufficient for the existence of such a probability space and random variable. If, in addition, F(λ) can be written as the Lebesgue integral

$$F(\lambda) = \int_{-\infty}^{\lambda} p(\lambda')\,d\lambda', \qquad (7.6)$$

we say that p(λ) is the density function corresponding to F(λ). We will be mainly interested in the case where X is a Gaussian random variable. In this case, p(λ) exists and takes the form

$$p(\lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp\left\{-\frac{1}{2\sigma_X^2}(\lambda-M_X)^2\right\} \qquad (7.7)$$

where $M_X$ and $\sigma_X^2$ are the mean and variance, respectively, of the random variable X, i.e., $M_X = E\{X\}$ and $\sigma_X^2 = E\{(X-M_X)^2\}$.

Since we will be interested in families of random variables, we must define what is known as a multivariate distribution function. In particular, if $X_1,\ldots,X_n$ are real random variables, the function defined by

$$F(\lambda_1,\ldots,\lambda_n) = P\{\omega; X_j(\omega) \le \lambda_j,\ j=1,2,\ldots,n\} \qquad (7.8)$$

*See Ref. 28, p. 167.

is called their multivariate distribution function. Conditions that a function must satisfy in order that there exist a probability space and random variables with such a multivariate distribution are given by Loeve.* These conditions require $F(\lambda_1,\ldots,\lambda_n)$ to be "nondecreasing" and continuous on the right in each of the variables $\{\lambda_j\}$ with $F(\lambda_1,\ldots,\lambda_n) = 0$ if any $\lambda_j = -\infty$ and $F(\lambda_1,\ldots,\lambda_n) = 1$ when each $\lambda_j = +\infty$.

In the case where the family of random variables has more than a finite number of elements, Kolmogorov** has proved that, if for every finite set $t_1,\ldots,t_n$ the multivariate distribution function $F_{t_1,\ldots,t_n}$ of $X_{t_1},\ldots,X_{t_n}$ is prescribed and if

(1)
$$F_{t_1,\ldots,t_n}(\lambda_1,\ldots,\lambda_n) = F_{t_{\pi_1},\ldots,t_{\pi_n}}(\lambda_{\pi_1},\ldots,\lambda_{\pi_n}) \qquad (7.9)$$

where $\pi_1,\ldots,\pi_n$ is a permutation of $1,\ldots,n$, and

(2)
$$F_{t_1,\ldots,t_m}(\lambda_1,\ldots,\lambda_m) = \lim_{\lambda_j\to\infty,\ j=m+1,\ldots,n} F_{t_1,\ldots,t_n}(\lambda_1,\ldots,\lambda_n) \qquad (7.10)$$

for m < n, then there exists a probability space and a family of random variables with the above finite multivariate distribution functions.

When a distribution function of n random variables is available, it can also be shown to induce a probability measure $P_R$ on the n-dimensional Euclidean space

$$\Omega_R^n = \prod_{j=1}^{n}\Omega_{R_j}$$

by

*See Ref. 28, p. 169.
**See Ref. 29, p. 10.

$$P_R(B) = \int\cdots\int_B dF(\lambda_1,\ldots,\lambda_n) \qquad (7.11)$$

where $\Omega_{R_j} = \Omega_R$ for all j and B is any Borel set in $\Omega_R^n$. The extension of this result to a nonfinite family of random variables can be accomplished and is discussed by Halmos.* As in the single variable case, any problem involving random variables measurable with respect to ℬ($X_t$; t∈T), the minimum σ-field on which $X_t$ is measurable when t is contained in the set T, can be expressed as the corresponding problem in terms of random variables defined on the induced probability space.

The concept of a measure (probability) as discussed above allows us now to define several types of convergence which will be useful in the following development.

Definition 7.3: A sequence of random variables $\{X_n\}$ is said to converge almost everywhere (a.e.) to a random variable X, with respect to a measure P, if there exists a set N for which P(N) = 0 and $X_n(\omega)\to X(\omega)$ when ω∉N.

Definition 7.4: A sequence of random variables $\{X_n\}$ is said to converge in the mean to a random variable X, with respect to a measure P (denoted $\text{l.i.m.}_{n\to\infty}\,X_n = X$), if, for every ε > 0, there exists a real number $n_0$ such that, for $n > n_0$,

$$\int_\Omega |X_n - X|^2\,dP < \epsilon .$$

Definition 7.5: A sequence of random variables $\{X_n\}$ is said to converge in measure (probability) to a random variable X, with respect to a measure P, if, for every $\epsilon_1,\epsilon_2 > 0$, there exists a real number $n_0$ such that, for $n > n_0$,

*See Ref. 30, p. 154.

$$P\{\omega; |X_n - X| > \epsilon_1\} < \epsilon_2 . \qquad (7.12)$$

Definition 7.6: A sequence of random variables $\{X_n\}$ is said to converge in distribution to a random variable X if the distribution functions $\{F_n\}$ of $\{X_n\}$ converge to the distribution function F of X at every continuity point of the latter.

Any standard text* on measure theory proves that convergence almost everywhere implies convergence in measure and in distribution. Likewise, convergence in the mean implies convergence in measure and in distribution. And finally, convergence in measure implies convergence in distribution.

We now list several theorems which will be necessary to the following treatment. The proofs of these theorems can be found in the references noted in each theorem.

Theorem 7.1: If the sequence of random variables $\{X_n\}$ converges in the mean to X, then some subsequence $\{X_{n_k}\}$ converges almost everywhere** to X.

Theorem 7.2: If $\{X_n\}$ is a sequence of random variables which converges in measure to two different random variables, X and Y, then X = Y almost everywhere.***

Theorem 7.3: If $\{X_n\}$ is a mean fundamental sequence of square integrable random variables, i.e., if for every ε > 0 there exists a real number $n_0$ such that

$$\int_\Omega |X_n - X_m|^2\,dP < \epsilon \qquad (7.13)$$

when $m,n > n_0$, then there exists a square integrable random variable X

*See Ref. 30, pp. 92 and 103 and Ref. 47, p. 77.
**See Ref. 28, p. 119.
***See Ref. 30, p. 92.

such that*

$$\underset{n\to\infty}{\text{l.i.m.}}\ X_n = X .$$

Theorem 7.4: A random variable is integrable with respect to a measure P if and only if its absolute value is integrable with respect to that measure.**

Theorem 7.5: If $\{X_n\}$ is an increasing sequence of nonnegative random variables which are integrable with respect to a measure P, then***

$$\int_\Omega \lim_{n\to\infty} X_n\,dP = \lim_{n\to\infty}\int_\Omega X_n\,dP . \qquad (7.14)$$

This theorem specifies conditions under which the order of the limit and integration operations may be interchanged.

Theorem 7.6: (Schwartz inequality). If $X_1$ and $X_2$ are square integrable functions defined on the finite measure**** space (Ω,ℬ,P), then*****

$$\int_\Omega |X_1 X_2|\,dP \le \left(\int_\Omega X_1^2\,dP\right)^{1/2}\left(\int_\Omega X_2^2\,dP\right)^{1/2} . \qquad (7.15)$$

Schwartz's inequality gives us an upper bound on the expectation of the product of two random variables in terms of the second moments of the individual random variables.

Theorem 7.7: (Tchebycheff's inequality). If X is a random variable such that E{X} = 0 and $E\{X^2\} = \sigma_X^2 < \infty$, then, for every****** positive ε,

$$P\{\omega; |X| \ge \epsilon\} \le \frac{\sigma_X^2}{\epsilon^2} . \qquad (7.16)$$

*See Ref. 49, p. 146.
**See Ref. 30, p. 113.
***See Ref. 30, p. 112.
****A measure space (Ω,ℬ,P) is finite if P(Ω) < ∞.
*****See Ref. 30, p. 175.
******See Ref. 30, p. 200.
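Tchebycheff's inequality is easily checked by simulation. The fragment below, with illustrative sample sizes and distribution, compares the empirical tail probability of a zero-mean, unit-variance variable with the bound $\sigma_X^2/\epsilon^2$ of Eq. (7.16).

```python
import numpy as np

# Monte Carlo check of Eq. (7.16): empirical P{|X| >= eps} never exceeds
# sigma_X**2 / eps**2.  Sample size and distribution are illustrative.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)           # zero mean, sigma_X = 1
for eps in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(x) >= eps)
    print(eps, tail, 1.0 / eps ** 2)       # tail <= 1/eps**2 in every case
```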

We are now in a position to give the definition of a random process.

Definition 7.7: A random process is any family of random variables $\{X_t; t\in T\}$ where T is some index set. It may also be denoted with $\{X(t,\omega); \omega\in\Omega, t\in T\}$ to emphasize the fact that two variables (t,ω) are now involved rather than one as in the case of a random variable.

The index set T is normally associated with the time of observation, with the process being either a discrete parameter or continuous parameter process depending on whether the set T is an integer set or an interval of the real line. The function of t∈T obtained by fixing ω∈Ω and letting t vary is called a sample function or trajectory of the process.

In this presentation, we will be concerned chiefly with Gaussian random processes which are defined by specifying the form of the multivariate distribution functions. In particular, if, for every finite subset $t_1,\ldots,t_n$ of T,

$$F(\lambda_{t_1},\ldots,\lambda_{t_n}) = \int_{-\infty}^{\lambda_{t_1}}\cdots\int_{-\infty}^{\lambda_{t_n}} \frac{1}{(2\pi)^{n/2}|V_n|^{1/2}}\exp\left\{-\frac{1}{2}(\lambda-M_X)^T V_n^{-1}(\lambda-M_X)\right\} d\lambda_{t_1}\cdots d\lambda_{t_n} \qquad (7.17)$$

where

$$\lambda = (\lambda_{t_1},\ldots,\lambda_{t_n})^T ,\qquad M_X = E\{(X_{t_1},\ldots,X_{t_n})^T\}$$

and

$$V_n = E\{(X-M_X)(X-M_X)^T\},\qquad X = (X_{t_1},\ldots,X_{t_n})^T$$

(note that $(X-M_X)^T$ denotes the transpose of $(X-M_X)$ while $V_n^{-1}$ and $|V_n|$ denote the inverse and determinant, respectively, of the matrix $V_n$), then

the process will be said to be a Gaussian process. It is easily shown that such a system of distribution functions satisfies the conditions of Loeve and Kolmogorov discussed above and, as a result, there exists a probability space and a family of random variables with these multivariate distribution functions. Note also that knowledge of the first two moments* of a Gaussian process is sufficient to characterize the process. We shall, henceforth, refer to the first moment of a random process $\{X_t; t\in T\}$, namely $\{E\{X_t\}; t\in T\}$, as its mean value function and to the joint second moment, $E\{(X_t - E\{X_t\})(X_v - E\{X_v\})\}$ for $t,v\in T$, as its covariance function.

The specification of the multivariate distribution functions of a random process may not, however, provide sufficient information to answer all questions that might be of interest. For example, we may want to consider the ω-function defined by

$$\int_T X(t,\omega)\,dt$$

where $\{X_t; t\in T\}$ is a random process and T is the interval $[0,T_f]$. This operation is not necessarily defined, however, since X need not be a function measurable with respect to t. Therefore, in some cases, it is necessary to restrict the class of processes considered. This will be done with the help of the following definition.

Definition 7.8: The random process $X = \{X_t; t\in T\}$ is measurable on the interval T if $X_t(\omega) = X(t,\omega)$ defines a function measurable on the product measure space** (Ω×T, ℬ×ℬ_T, P×t) where ℬ_T = ℬ_R[0,T_f] represents the Borel sets on the interval $[0,T_f]$ and P×t is the corresponding product measure.

*See Ref. 43, p. 49, for a definition of moments of a random variable.
**See Ref. 30, p. 137, for a discussion of product measure spaces.

Thus, if a process is measurable, it is reasonable to consider the integral of the process with respect to t while ω is held fixed since such a function is then measurable.*

Although not every random process on the interval T is measurable, in many cases they are "nearly" measurable. To make this statement more precise, we will need the following definition and theorem.

Definition 7.9: Two random processes X and Y defined on the same probability space (Ω,ℬ,P) and the same interval T are said to be equivalent if $X_t = Y_t$ a.e.(P) for every t∈T.

Theorem 7.8: (Borel-Cantelli lemma). For any sequence of events $\{A_n\}$ from the σ-field ℬ associated** with an arbitrary probability space (Ω,ℬ,P), one has***

$$\sum_{n=1}^{\infty} P(A_n) < \infty \implies \overline{\lim_{n\to\infty}}\,A_n = \emptyset\ \text{a.e.}(P) \qquad (7.18)$$

where ∅ denotes the empty set.

We can now present the following theorem which specifies conditions on a given random process that guarantee the existence of an equivalent process which is measurable.

Theorem 7.9: Let $X = \{X_t; t\in T\}$ be a random process on the probability space (Ω,ℬ,P₁) for which the second moment $E_1\{X_t X_{t'}\}$ is continuous when $t,t'\in T = [0,T_f]$. Then, there exists a random process X′ defined on (Ω,ℬ,P₁) which is measurable and equivalent to X. In addition, if X is also a random process on the probability space (Ω,ℬ,P₂) with $E_2\{X_t X_{t'}\}$ continuous for $t,t'\in T$, there exists a process X″ which is measurable and

*See Ref. 30, p. 142.
**See Ref. 50, p. 128.
***$\overline{\lim}_n A_n = \bigcap_n \bigcup_{m\ge n} A_m$.

equivalent to X regardless of which of the spaces, (Ω,ℬ,P₁) or (Ω,ℬ,P₂), is the underlying probability space.*

Proof: Applying Theorem 7.6, we can write

$$\int_\Omega |X_t - X_{t'}|\,dP_1 \le \left\{\int_\Omega (X_t - X_{t'})^2\,dP_1\right\}^{1/2} = \left\{E_1\{(X_t)^2\} + E_1\{(X_{t'})^2\} - 2E_1\{X_t X_{t'}\}\right\}^{1/2} . \qquad (7.19)$$

But, since $E_1\{X_t X_{t'}\}$ is continuous for $t,t'\in T$, for every ε > 0 there exists a δ such that, when |t−t′| < δ, the expression on the right of Eq. (7.19) is less than ε². Then, if we let $B_\epsilon = \{\omega; |X_t - X_{t'}| > \epsilon\}$, we see that $\int_\Omega |X_t - X_{t'}|\,dP_1 \ge \epsilon P_1(B_\epsilon)$ which, together with Eq. (7.19), implies

$$P_1\{\omega; |X_t - X_{t'}| > \epsilon\} < \epsilon \qquad (7.20)$$

when |t−t′| < δ, i.e., $P_1\{\omega; |X_t - X_{t'}| > \epsilon\}$ goes to zero as t′ → t. Now, for every integer n > 0, let us choose a finite increasing sequence from T, say $0 < t_1^n < \cdots < t_{k_n}^n < T_f$, such that $P_1\{\omega; |X_u - X_v| > n^{-1}\} < 2^{-n}$ if u and v belong to the same interval $[t_{i-1}^n, t_i^n]$. (Without loss of generality, we may suppose that $\{t_i^n\}_i \subset \{t_i^{n+1}\}_i$ for every n and that the countable set T′ consisting of all the $t_i^n$ is dense in T.) Then, let us define a sequence $\{X^n; n > 0\}$ of mappings of Ω×T into the real line by $X^n(t,\omega) = X(t_i^n,\omega)$ if $t_i^n \le t < t_{i+1}^n$. These mappings are measurable on (Ω×T, ℬ×ℬ_T). Moreover, the inequality

$$\sum_n P_1\{\omega; |X_t - X_t^n| > n^{-1}\} \le \sum_n 2^{-n} < \infty$$

*This theorem is an extension of a theorem presented in Ref. 50, p. 91.

holds for every t∈T and, due to Theorem 7.8,

$$\overline{\lim_{n\to\infty}}\,\{\omega; |X_t - X_t^n| > n^{-1}\} = \emptyset\ \text{a.e.}(P_1).$$

But this implies that the jointly measurable function $\overline{\lim}_n X^n(t,\omega)$ equals X(t,ω) a.e.(P₁) for every t∈T. To prove the second part of this theorem, we need only choose the sequences $\{t_i^n\}_i$ such that the relation $P_2\{\omega; |X_u - X_v| > n^{-1}\} < 2^{-n}$ is also satisfied. The remainder of the proof then follows the construction presented above. Q.E.D.

We can now state the following theorems which illustrate several properties of measurable processes.

Theorem 7.10: (Fubini). If X is a nonnegative measurable random process on Ω×T, then

$$\int_{\Omega\times T} X\,d(P\times t) = \int_\Omega\int_T X\,dt\,dP = \int_T\int_\Omega X\,dP\,dt \qquad (7.21)$$

where P, t and P×t are the measures on Ω, T and Ω×T, respectively. Also, if A is a measurable subset of Ω×T of measure zero, then almost every section has measure zero.* (The set $A_t = \{\omega; (\omega,t)\in A\}$ is called the section of A determined by t.)

Theorem 7.11: (Fubini). If X is an integrable process on Ω×T, then (7.21) is valid.**

*See Ref. 30, p. 147.
**See Ref. 30, p. 148.

81 The above two theorems will be used to justify the interchange of the orders of integration with respect to time and with respect to the probability measure of the sample space. Theorem 7.12: Let {Xt;tET} be a zero mean measurable Gaussian process defined on the probability space (Q, 6,P) and let T be a finite closed interval of the real line. If E{XtXs} is continuous for t, seT, and {gj} is any sequence of continuous real valued functions on T, then the sequence of random variables {8.} defined by Gj(w) = X(t,W)g.(t)dt (7.22) T is jointly Gaussian.* Families of random variables which have the property of being "independent" will also play a key role in this presentation. As a result, we state the following definition. Definition 7.10: A family {Xj;jcJ} of random variables where J denotes an arbitrary index set is said to be independent if P[ {w;X. < a.}] = H P{w;Xj < a.} (7.23) for every finite subset J' of J and every choice of real aj,j EJ'. Note that by this definition, the distribution function of a finite subset of an independent family of random variables is the product of the distribution functions of the individual random variables of the finite subset. This implies, in the case of an independent family of Gaussian random variables which eventually enters our discussion, the density function** of any finite subset of the family if the product of the individual density functions of the finite subset. *This Theorem is a trivial extension of the discussion presented in Ref. 43, p. 155. **See Ref. 28, p.227.

82 The concept of a "conditional expectation" of a random variable relative to a a-field 3' will also be useful. The following defintion will serve to classify this operation. Definition 7.11: If X is a random variable on (M,?,P), the conditional expectation E V X of X with respect to' & C is a random variable on (Q,~',P,) such that XdP = E XdP (Be (7.24) where P:3 is the restriction of P to 3'. (E X is unique* a.e.(P l.) This definition allows us to write the following theorem. Theorem 7.13: Let X be a random variable of finite mean on (Q,3,P) and let i-31 C+32C C..C ~ be a-fields of measurable sets. Let f' be the minimum a-field containing l,e2,...,i.e., let ~' =(91,1'~2', )' then** rim E nX = E' X a.e.(P') (7.25) n-oo where P' is the restriction of P to iV'. We shall conclude this section with two additional definitions which are concerned with the mutual relationships that exist when two different measures are defined on the same a-field and three theorems which are consequences of these definitions. Definition 7.12: If (Q,,P1) and (1,3,P2) are two probability spaces,we say that P2 is absolutely continuous with respect to P1, in symbols P2 << P1, if P2(B) = 0 for every measurable set B for which *See Ref. 50, p. 121. **See Ref. 29, p. 331.

83 P1(B) = 0. If P1 << P2 and P2 << P1I then P1 and P2 are said to be equivalent, in symbols P1 - P2. Definition 7.13: If (nQ,~,pPl) and (Q.,",P2) are two probability spaces, we say that P1 and P2 are mutually singular, or more simply, that P1 and P2 are singular, in symbols Pll1P2 if there exists two disjoint measurable sets A and B whose union is Q and P1(A) = P2(B) = 0 Theorem 7.14: If (Q, Qt,P1) is a probability space and if P2 is a probability measure that is absolutely continuous with respect to P1, then there exists a finite-valued measurable function dP2 dP1 on Q2 such that dP2 P2(B) = o 2 dP1 (7.26) B dP1 for every measurable set B. The function dP2/dP1 is unique in the sense that if, also, P2(E) f= f dP B for all Be;, then dP f = 2- a.e.(P1)' dP 1 The function dP2/dP1 is called the Radon-Nikodym derivative of the measure P2 with respect* to the measure P1. *See Ref. 30, p. 129.

84 The above Radon-Nikodym derivative will prove to be an indispensable tool in the solution of the estimation problem. Theorem 7.15: (Kakutani). Let {P ln} and {P2n be a sequences of probability measures where, for all n, Pln and P2n are defined on a ofield in of sets from a space Q and Pn P2n Then the infinite direct product measures P and P are either equivalent, product measures P 11 P= and P' = P are either equivalent, 1 ln 2 2n n=l n=l 1P P2 or mutually singular, PilP, according as the infinite product nHo S JPnd? no dP2n (7.27) n=l n dP2n n is greater than zero or equal to zero.* This theorem will be applied to the case where {(8 is a sequence of random variables such that 6j is defined on the two probability spaces, (,3,Poj.) and (Q jPi ). Then, due to the above theorem, the pro00 00 00 00 duct measures P* = P oj, and P* = II P defined on ( I Q., HI ( ) 0 j=l i j=l 1j j=l J j=l J are either equivalent or mutual singular. 7.2.2 Functional Analysis Functional analysis is the study of functions in which the individual elements of a class are considered to be points of an appropriate infinitedimensional space. In this way, many theories of these functions can be derived as simple extensions of theories that are true for finite-dimensional Euclidean spaces. The function spaces of interest to us can be characterized by the following definitions. Definition 7.14: A complex (real) linear** space W is called a *See Ref. 51, p. 295. **See Ref. 44, p. 9 for the definition of a complex (real) linear space.

85 normed linear space if there exists a real-valued function on W, whose value at weW we denote by I |wl|, with the properties: (1) IIw+w211 _ Iwll1 + 1Iw211 (2) I lawl I = lal I w I (3) lIWIl > 0 (4) 11 Iw + o if w $ o when w, wl, w2EW and a is an arbitrary complex (real) scalar. Definition 7.15: A complex (real) linear space W is called a complex (real) inner product space if there exists a complex (real)-valued function on WxW, whose value we denote by (w 1,w2 ) when w, w2W, with the properties: (1) (Wl+W2,w3) = (wl,W3) + (w2,w3) (2) (wl,w2) = (w2,w1) (the bar denoting complex conjugate) (3) (awl,w2) = a(wl1,w2) (4) (w,w) > 0 and (w,w) + O if w + O when w, wl, w2, w3cW and a is an arbitrary complex (real) scalar. Note that if we define Iwli = (w,w)l/2, an inner product space is also a normed linear space. One class of functions with which we will be concerned is denoted by L2[T] and includes all real-valued functions which are square integrable on the interval T, i.eo, all functions f for which the Lebesgue integral / f2dt, (7.28) T where t represents the Lebesgue measure on the interval T, is finite. For this class of functions, we can define an inner product by (f,g) = fgdt (7.29) T

86 and it can be shown* that L2[T], with this inner product, satisfies the axioms of a real inner product space. It is also a normed linear space if we define I f2 = f2dt. (7.30) T Furthermore, this space** is complete which means that all Cauchy*** sequences in L2[T] converge in norm to an element in L2[T], i.e., if {fn) is a Cauchy sequence in L2[T], then there exists an element fEL2[T] such that lim IIf-fnl = 0. n-,oo A complete complex (real) inner product space is also called a complex (real) Hilbert space. Henceforth, we shall refer to a complex (real) Hilbert space as a Hilbert space since the theorems that will be required are true in both cases. In addition, L2[T] is separable which implies that there exists a sequence of functions {gi} such that (gj,gk) = 1 if j=k = 0 if jik (7.31) and such that, for any gEL2[T], n lim Ig- I (g,gj )g 2dt = 0. n- T j=l J The sequence {gn} is known as an orthonormal basis for the Hilbert space. (If a sequence of functions satisfies Eq. (7.31) only, it is called an orthonormal sequence.) *See Ref. 44, p. 108. **See Ref. 44, p. 377. ***If (fn) is a Cauchy sequence, for every E > O, there exists a number no such that, for n, m > no, If-ff| < s.

87 If we now consider the set of functions K k 1 K L2[T] = I L2[T] L2[T]x... xL2[T] (7.32) k=l where Lk[T] = L2[T] for all k so that a point in L2[T] is a vector valued function with each component of the vector being an L2[T] function, we find that L2[T] is also a separable Hilbert space when the inner product is defined by K ([f],[g]) = I f [f]k[g]k dto (7.33) k=l T In this equation, [f], [g]EL2- [T] and f]k and [g]k are the kth components of the vectors [f] and [g], respectively. The space L2[T] will be the space of interest in the remainder of this chapter. The next topic to be reviewed is concerned with transformations defined on the space L2[T]. As an example, consider the integral operator R defined on L2[T] by R([f]) = f [r(t,v)] [f(v)]dv (7.34) T where [r(t,v)] is a KxK matrix of functions of two variables and [r(t,v)][f(v)] is the matrix product of [r(t,v)] and [f(v)]. The properties of such a transformation when the kernel [r(t,v)] is of a particular type will be discussed later. In the meantime, we shall classify the transformation to be used in the next section. Definition 7.16: If W1 and W2 are linear spaces, then an operator R on W1 into W2 is called linear if the following two conditions are sati s fied: (1) R(W11+Wl2) = R(W11l) + R(W12) (2) R(awl) = aR(wl) (7.35)

88 where a is an arbitrary scalar and wl, wll, w12cW1. Definition 7.17: If R is a linear operator, then the inverse of R, denoted R-1, is such that R(R-l(w )) = w and R-1(R(w )) = w (7.36) for all w. (Note that the inverse does not always exist.) Definition 7.18: If W1 and W2 are normed linear spaces, a linear operator R on W1 into W2 is bounded if IIsup IR(W1) II < O. We define the norm of R, denoted I RI I, by IIRII = Su IIR(w )11 (7 37) if it exists. (It can be shown* that the norm of R is also given by IIR(wl)l[ sup R(w) I I w1+O 1w I11 Definition 7.19: If W1 and W2 are normed linear spaces and R is a bounded linear operator on W1 into W2, it is said to be completely continuous if it has the following property: If {wln} is any bounded sequence of elements from W1, i.e., I Iwln I < M for some finite M and all n, then, the sequence {R(Wln)} contains a convergent subsequence. Definition 7.20: If W is an inner product space, an operator R on W into W is said to be positive if *See Ref. 44, p. 86.

89 (R(x),w) > 0 (7.38) for every wcW. It is positive definite if (R(w),w) > 0 (7.39) for every nonzero wEW. Definition 7.21: If W1 and W 2are inner product spaces and R is a linear operator on W1 into W2, then the adjoint R* is defined on W2 into W1 by (R(wl),w2) = (wl,R*(w2)) (7.40) for wlsW1 and w2EW2. If W1=W2 and R* = R,we say that R is self-adjoint. Definition 7.22: Let R be a positive self-adjoint operator on W into W. A square root of R, denoted R1/2, satisfies R1/2(R1/2(w)) = R(w) (7.41) for every w6W. (Every positive self-adjoint bounded operator possesses a unique positive self-adjoint bounded square root.*) Having made the above definitions, we will now state several theorems which will be useful in the following development. Theorem 7.16: If R is a bounded linear operator defined on a dense subset of a Hilbert space W into W, then there exists a unique bounded linear extension R' of R to all of W, i.e., R' is bounded and R'(w) = R(w) (7.42) if we,R' the domain of R. Proof: If wzW, there exists a sequence {wn} such that WneR for ea.ch n. and *See Ref. 52, p. 62.

90 I Wn-Wl I 0 since oR is dense in W. Define R'(w) = lim R(wn) = z n-o which exists because R is bounded and W is complete. Then I|R'(w)I |IR(wn)ll - = im < IIRI I1 I Il n- l tW Il since the norm is a continuous* function. Now assume that there exists another bounded extension R" such that R"(w) = z2~ Then lIZl-Z2I = II R'(w)-R"(w)fl = lim IIR(wn) R(wm)I = m,nltoo m and we see that R' is unique. Q.E.D. Theorem 7.17: If R is a bounded self-adjoint linear operator defined on a Hilbert space W into W with its null space** consisting of the zero element only, then the inverse R-1 is densely defined.*** Theorem 7.18: Let Rjkj,k{l1,2,...,K}, be a finite set of completely continuous operators, each defined on the Hilbert space W into W. Then, K the operator R defined on the product space WK = 1 Wk,Wk = W for all k, k=l into WK by wI = R(w2) (7.43) *See Ref. 44, p. 84. **The null space of an operator R defined on W consists of all woW such that R(w) = 0. ***See Ref. 44, p. 226.

91 where w1 = (ll,w12,...,wlK)CWK, w2 = (w21,w22...'W2K)EWK, and K wli X Rik(W 2k) (7.44) k=l 1 is completely continuous.* Theorem 7.19: If R is the operator defined on L2[T] into L2[T] by R(f) = /r(t,v)f(v)dv, (7.45) T then the condition f Ir(t,v) 2dt dv < o (7.46) T T is sufficient to ensure that R is completely continuous.** Thus, if the transformation defined by Eq. (7.34) has a kernel which satisfies K j [r(tv)]jk dt dv < (7.47) j,k=l T T where [r(t,v)]jk is the jkth element of [r(t,v)], it is completely continuous by the above two theorems. The next two theorems are known as the Spectral Theorems for completely continuous self-adjoint operators and bounded self-adjoint operators, respectively. In order to write these theorems, we must recall that a closed linear subspace of a Hilbert space is a linear subspace in which all Cauchy sequences converge to a limit which is in the subspace. Also, if Wj is a closed linear subspace of a Hilbert space W, every element of W can be written as the unique sum of two elements, one in WJ and one in the orthogonal complement of Wj which is the set of all wEW which satisfy (wj,w) = O for all wjcWj. *See Ref. 53, p. 316. **See Ref. 53, p. 320.

92 Theorem 7.20: (Completely continuous self-adjoint case). Let R be a completely continuous self-adjoint transformation defined on a Hilbert space W. Then, the null space (a closed linear subspace) of R is orthogonal to its range, i.e., if wl is contained in the null space of R and w2 in its range, then (w1,w2) = O. Furthermore, there exist a sequence of distinct real numbers p1',2'. and an associated sequence of closed linear subspaces W1,W2,... each having nonzero dimensionality with the following properties: (1) On Wj, R = pjI, i.e., the vectors of Wj are the eigenvectors of R corresponding to the eigenvalues vij. (2) If jSk, then WjIWk, i.e., (wj,wk) = 0 for any wjEWj and wkEWk. (3) The closed linear subspaces {W j} span the range of R in the sense that any w in the range of R can be written as the sum w = wI + w2 +... (7.48) where wj~Wj for each j, and with the series converging in norm to w. (4) The subspaces {W } each have finite dimensionality. (5) The sequence {fpj} converges to zero. The numbers {pj} and the subspaces {Wj} are uniquely determined by R.* Note that the above theorem implies the existence of a sequence of functions which are eigenfunctions of R and which form an orthonormal basis for the range of R. Theorem 7.21: (Bounded self-adjoint case). If R is a bounded selfadjoint linear transformation on a Hilbert space W and if m(R) = inf (R(w),w) (7.49) II 1=1 *See Ref. 54, p. 115 and Ref. 44, p. 336.

93 and M(R) = sup (R(w),w), (7-50) I Iw 1=1 then there exists a family of projections {E ) defined for each real u having the following properties: (1) EP1E2 = EE = E = E if p1 < ~2, (2) lim +E2 (W) = EP1(w), (3) Ep = 0 if P < m(R), Ep = I if M(R) < -, (4) E R = RED. (7.51) For each w, and w2 in W, (Ej(w),w2) is of bounded variation as a function of p, and L2 (R(Wl),W2) = P pd(E (W1l),2). (7.52) The integral is an ordinary Stieltjes integral over any interval [1,B2] such that 31 < m(R), M(R) < 82. Also, the formula B2 R = U (7.-53) holds, the integral on the right being defined as the limit of RiemannStieltjes sums with convergence in the norm of W.* Corollary 7.1: If R is a bounded self —adjoint operator on the Hilbert space W whose spectrum is discrete, i.e., if there exists a sequence of real numbers {-j} for which the family of projections {El} discussed in the previous theorem has the property, for every j, E (w) = El (w) *See Ref. 44, p. 345.

94 when wEW and pj < p a pjl, and if lim M = O (7054) with the range of E.-E._ being finite for all 4t, then R is completely continuous. * The Spectral Theorem for completely continuous operators can be applied to prove the following theorem. Theorem 7.22: (Mercer's Theorem). Let the kernel [r(t,s)] of the operator R defined in Eq. (7.34) be the covariance function of a K-dimensional vector valued random process with each element of [r(t,s)] being continuous and bounded on TxT. Then, there exist an orthonormal sequence of functions ([gn ], [gn]EL2[T] for each n, and a sequence of real numbers [(n) that satisfy R([gn]) = n[gn] (7.55) for all n and such that [r(t,s)] = (n[gn(t)] [gn(s)]T (7.56) n=l where the series converges uniformly for all t, seT.** This theorem gives us a "decomposition" of the kernel [r(t,s)] which separates its variation with t and s into functions which are dependent on a single variable. One additional type of operator will be of interest in the following treatment, the so-called Hilbert-Schmidt operator. The integral operator R defined on L2[T] by (7.34) and such that *See Ref. 55, p. 234. **See Ref. 56, p. 215.

95 K [)[r(t,s)].klI2dtds (7.57) j,k=l T T is an example of such an operator (see Lemma 7.1). To characterize these operations, we make the following definitions. Definition 7.23: If W is a separable Hilbert space, the trace of an operator R on this space is defined by tr(R) = I (R(w.),w ) (7.58) j=l j, where {wj} is an orthonormal basis for the Hilbert space. Definition 7.24: The Hilbert-Schmidt norm N of an operator R on a Hilbert space W is defined by N(R) = (tr(R*R))1/ = ( I IR(wj) (7.59) where {w.} is an orthonormal basis for the space. A Hilbert-Schmidt operator, then, is any operator R for which N(R) < o. The following theorems for the Hilbert-Schmidt operators will be necessary for the solution of the estimation problem. Theorem 7.23: The Hilbert-Schmidt norm has the following properties: (1) N(R) is independent of the basis used in its definition. (2) Itr(R1R2)I < N(R1)N(R2), (3) N(R1R2) < I IRII IN(R2) N(R1R2) < I IR2 IN(R1), (4) IRH < N(R), (7.60) when R,R1,R2 are Hilbert-Schmidt operators.* Theorem 7~24: Every Hilbert-Schmidt operator is also a completely continuous operator. ** *See Ref. 49, p. 1012. **See Ref. 49, p. 1010.

96 It will be shown later* that the operator defined by Eq. (7.34) and satisfying (7.57) has a Hilbert-Schmidt norm equal to (7.57) and, as a result, is a Hilbert-Schridt operator. 7.3 SOLUTION Having stated the above mathematical preliminaries, we are now in a position to apply these ideas to the solution of the direction finding problem as discussed in Chapter 6. Let us begin by defining an observation space to be the space Q = & kT where, for each k, k is the space k=l of all real-valued functions on the observation interval T = [O,Tf] and K is the number of receiving elements in the antenna array. Due to its frequency of occurrence in the remaining chapters, we are denote the interval [O,Tf] with the symbol T. Then, an observation, a point in the observation space, takes the form of the vector valued function [y(t)] = (y(t),y2(t),..,y(tK))T tET, with yk(t)ERk and representing a possible signal received by the kth antenna of the array. Initially, instead of allowing the angle of arrival to have any value within a specified sector, let us consider the case in which this unknown angle is limited to a finite number of specific values, {a1, 2,'''',aM In this case, a solution takes form of a disjoint partition (A0,Al,..,AM) of observation space Q with the understanding that if yA,i{1,2,....,M }, we will make the decision that the true angle of arrival is given by ai and if yEA0, we will make the decision noise alone is being received. An optimal solution is then defined to be the disjoint partition of Q which satisfies the "MPE" optimality criterion discussed on Section 6.3. (It will be shown later that this solution is also optimal with respect to the "MAP" optimality criterion mentioned in Sec*See Lemma 7.1.

97 tion 6.3.) This criterion requires the minimization of the probability of error P, whil,3 in this case, is defined by e Pr(ai.)Pr (Q-A.) + Pr(ac )Pr (Q-A ) (7.61) where Pr(ao) is the a priori probability of receiving noise alone, Pro(Q-AO) is the probability of an incorrect decision being made when noise alone is present, and, for each i{(1,2,... M), Pr(ai) is the a priori probability that ai is the true angle of arrival while Pr.i(S-Ai) is the probability of an incorrect decision being made when ai is the true value of the parameter. Thus, the optimal partition of Q is that partition of the observation space which results in a minimum probability of error. At this point, due to their repeated occurrence in the following discussion, let us agree to denote the sets ({,2,...,M) and {o,l1,2,...,M} with the shorthand notation M and Mo, respectively. The fact that we are using the symbol M to represent both the integer M and the set of integers {1,2,...,M} will not result in confusion since it will be obvious from the particular application to which quantity we are referring. There is one obvious difficulty with the above formulation, however, and it is concerned with the interpretation of the probability Pri(Q-Ai). For example, does this probability exist for all partitions of Q and when it does exist, how do we evaluate it? To answer these questions9 let us recall that [y(t)], tsT, is a sample function of one of the vector valued random processes [yi] = {[Yi];tcT}, iFsMoas discussed in Section 6.3. (The process [yi] should actually be denoted as [yai] according to the notation of Section 6.3 but, for convenience, we have replaced the superscript am by i.) Then, as a result of the discussion of Section 7.2.1, 1

98 we know that there exists a probability space (i) on which [y] is defined. Note that the probability measure Pi and the a-field ei are dependent on the true angle of arrival since the covariance function which determines this measure and a-field is dependent on ai. We then see that, by Pri(Q-Ai), one really means Ti(; [Yi(w)]Ek-Ai}. It is now obvious that all possible partitions of Q are not necessarily possible since all sets of the form {w;[Yi(w)]cQ-Ai} need not be measurable. Therefore, let us now determine for which subsets of Q2, the probabilities of their inverse images under [yi] are defined, i.e., let us determine the subsets of Q which are possible candidates to be members of the optimal partition of Q. The knowledge that a particular process, say [Y i], is Gaussian with a specified covariance and mean value function provides the probabilities of the sets in the minimum a-field with respect to which each random variable [yi] is measurable when teT. Let us denote this minimum a-field t t and the measure defined on it by Hi. According to the discussion of Section 7.2.1, let us also make the standard construction which completes this measure so the a-field of sets of known probabilities can be enlarged slightly, by completion, to i with the associated measure being Pi. Note that if the inverse image of a subset of Q under [yi]- is contained in i., it need not be in Q. under [YJ]- when ifj. This detail will be considered in the next paragraph where we shall construct probability spaces which contain all the information that is necessary for obtaining an optimal partition of Q and whose sample points are elements of 2.

99 Let us now define Vto be the o-field of subsets of f2 generated by the class of sets A of the form D = {[fleQ; [f(t)1 < a1 [f(t2)] < a2'5,[f(tn)] < an) (7.62) where n is any arbitrary finite positive integer, t.eT for jE{1,2,...,n}, and a,jE{1,2,...,n}, is any real vector in K-dimensional Euclidean space. In Appendix I, it is shown that [yi]l (() = <i where i is any isMo. Then, if we define a measure Pi such that P.(B) P= (bi) (7.63) when BE4G and Bi = [Y]-l(B), we see that (M,e,Pi) is a probability space with a measure defined on e that is consistent with the measure on i. Furthermore, if we complete Pi to tF on Hi' we find (Q, 1i'Pi) possesses the desired property that every subset of Q, whose inverse image under [yi] is measurable with respect to -i j is an element of Vki. This follows from the fact that [yi]-l(B1L)B2) = [yi]-l(B1)kJ [Yi]-l(B2) for B1, B2E R. The measures of the corresponding elements of these two a-fields are easily seen to be equal. In a similar manner, the probability spaces (Q., i,Pi), iEMo, can be constructed. To summarize the above discussion, the probability spaces (n,l i,Pi), icMo, can be constructed so as to have the property that any subset of, whose inverse image under [yi] is contained in 6i is an element of ti and the probabilities of the subset and its preimage are equal. Also, any element in Hi has a preimage in i of equal measure. As a result the sets that make up the optimal partition of 2 must be elements of the appropriate o-fields i, ieMo.

100 It will be shown later that, in the case of most practical interest, the above measures have the property P - Pol, icM, i.e., these measures are equivalent. For this case, the following theorem with O= gives us a clue to the construction of an optimal partition of the space Q. Note that, if P. P, then Pi << P ~ Pi and i. = 1 1 0 o 1 0 Theorem 7.24:* Let (S,),Pi), iEMo, be probability spaces such that Pi' iCMo is absolutely continuous with respect to P, i.e., Pi << Po, iEMo * (7.64) Let ai, iEMo, be such that 0 < ai < 1 and I ai = 1 and let (Ao,A1,...,AM) be a partition of Q, i.e., /) Ai = Q, Ai for each ikMo and Ain = if i j. Let dPi/dPoL, ikMo, be the Radon-Nikodym derivatives of Pi with respect to Po, and let (A*, A* A..). A be the partition of Q whose elements are defined by dP. dP. -Po, dP n {w;ai (w) > a. (w),j > i} (7.65) dPo dP for all iM o. (Note, in writing Eq. (7.65), we have treated the subscript o as the number 0 so that j < i and j > i are meaningful.) Then, for any partition (A 0A..,A A M) of Q, we have a.Pi (Q-A*) < I a. (Q-Ai). (7.66) iMo0 1 10 Proof: Since the Radon-Nikodym derivatives are measurable functions, it can be seen that the sets A., ieMo, are measurable. Moreover, we have *The remaining theorems and lemmas of this chapter are xtensions to the vector ca~ of results originally obtained by Kadota, ~-9 Pitcher,5 Bharucha, and Root. 51X

101 P aipiA) - a. —(A) =[a.P(A — A^n Aj)-aj.P(AA A.)] icMo iEMo i,j Mo dPi dP _ i -()-aL j _ ( dP > 0 (7.67) where the first equality follows from the fact that both (A*,A*,.,A*) o 1M and (Ao,A1,..,AM) are partitions of Q and the inequality follows from the definition of the sets A* isM Then, because of the relation i o'7 aiPi(Q-Ai) = 1 - aiP i(Ai), (7.68) iEMo isMo we see that (7.67) is equivalent to a iPi -Ai) < aiPi(Q-Ai). (7.69) isM0 ieMo Q.E.D. As a result, the above theorem tells us that if the measures Pi, isMo, are each absolutely continuous with respect to Po which is implied by the statement Pi -Po, iEMo0 then knowledge of the Radon-Nikodym derivatives dPi/dPO, iEMo, is sufficient to provide an optimal partition of the observation space Q. Thus, the problem reduces to that of obtaining these derivatives. The following theorem will allow us to extend these results to the general direction finding problem in which ai's are not limited to a finite set of values. Theorem 7.25: Let Pi and A*, ieMo, be as in the previous theorem but with M no longer finite and define TIHE UNtVERoTY OF MH',lGAN ENGINEERING LIBRARY

102 dP dP j N = {w; sup a. W() ai (W), ieMo} (7.70) i Mo dP dPoP where M' = {0,1,2,...}. Also, define 0 A = AgN 0 0 and lT A, ie{1,2,...}. (7.71) i Then {AL*}is a partition of Q and iia..(Q- _) < a.P i(Q-Ai) (7.72) for all partitions ( Ao 1 A2,.. ) of Q. Proof: Clearly, the sets A*, isMo are disjoint and measurable. However, their union ( Ai is not necessarily equal to Q, i.e., for every isM' wSQ, sup aj (dPjY@Fo0 (w) may not be achieved for any jeMo so that w need not belong to any A.. The set N is this set of exceptional w points. Since dP./dP > 0 a.e.(Po), Theorem 7.5 tells us ne dP/ a dP 1. jFM J dP 0 jeMI Q dP. Thus, dF. I aa - a.e. (PO) j sMo dPo and lim ao _l- + O a.e.(P ), i-,~ dP

103 As a result, P (N) = 0 and the remainder of the proof follows that of the previous theorem. Q.E.D. Thus, even when the true angle of arrival can achieve any one of a countable set of values, the Radon-Nikodym derivative is the key to the optimal partition of the observation space Q if Pi - PO for all isMO. In order to provide a proof of the next theorem which specifies, for any arbitrary index i EMo the functional form of the Radon-Nikodym derivative dPi/dPo as well as the conditions under which the measure Pi is equivalent to Po, we will need the results of several lemmas. These lemmas will require the use of two auxiliary random processes, [Zi] and [Z~], defined on the probability spaces (Q,e,Pi) and (Q, e,Po), respectively, where i is any index itMl_ for which the particular Radon-Nikodym derivative dPi/dPo is desired. Before specifying these processes, however, suppose [Z'] = {[Z_];tsT} is defined on Q by [z.([f])] = [f(t)] (7. 73) for all [f] E. It can be seen that [Z'] is a random process on (Q,4,Pi) for each iEM' and that - is the minimum a-field for which this is true. It can also be seen, for any finite subset {tl,t2,..,tn}CT, the distribution function of [Ztl ] [Z72 ]I.. * [z n] when [Z'] is assumed defined on ti o2 tn (Q,'Pi) is equal to the distribution function of [Yt ], [Y't ].. t It then follows, [Z' ] is a Gaussian process with a mean value and covariance function identical to that of the process [yi] when [Z'] is defined on (Q,6,Pi). (These common mean value and covariance functions were discussed in Section 6.3.) Moreover, according to Theorem 7.9, there exist measurable processes defined on [ ((,,Pi);ieM~) which are equivalent

104 to [Z'] when defined on {(Q,~,Pi);icM }, respectively. The equivalent processes defined on (Q,-,Pi) and (Q,t3,P0) are the auxiliary processes referred to above and denoted by [Zi] and [Z~ ], respectively. Furthermore, due again to Theorem 7.9, we can assume without loss of generality [Zi(t,[f])] = [Z~(t,[f])] for all tET and [f]E2. (Even though [Zi] and [Z0] have identical values on the probability spaces, the separate superscripts will be retained to distinguish the underlying probability measure.) The equivalences mentioned above can then be applied to show [Zi] and [yi] as well as [Z~] and [YO] have identical mean value and covariance functions. Finally, let us define 93' to be the minimum a-field with respect to which [Zi] and [Zt] are measurable for all teT. It then folt t lows, again due to the above equivalences, the elements generating the a-field 6' (and, hence, the elements of j3') differ from the elements generating 6(and, hence, the elements of 13) by sets of measure zero with respect to Pi and Po i.e., for any element in the generator of' (and, hence in -I'), there exists an element in the generator of'3(and, hence, in \) for which the symmetric difference of these two sets is a subset of a set in 6whose Pi measure is zero and similarly for PO. Thus, for any element BE6, there exists sets B, Ee, B'e' NCNi E for 1 2 1 1 which P1(Nl) = 0, and N2CN~c19 for which Po(NO) = 0 such that B = B'AN1 A and B = B2AN2. But this implies, defining I5' to be the class of sets of the form B'AN where B'E<6' and NCNie~, NCNo0 3, with Pi(Ni) = Po(No) = 0, W~C~~ k'.~~ *(7.74) In the following lemmas, we will also have need of the integral operators Ri, iM', defined on L2[T] by 1O0

105 Ri([f]) = f [ri(t,v)][f(v)]dv (7.75) T where [ri(t,v)] is the matrix covariance function common to the processes [yi] and [Zi] and [f(v)] is any vector valued function from L2[T] which was discussed in Section 7.2.2. (Note that [ri(t,v)][f(v)] denotes the matrix multiplication of [ri(t,v)] and [f(v)].) The properties of these operators which will be of interest to us are summarized in the next lemmao Lemma 7.1: The integral operator Ri defined in Eq. (7.75) takes elements of L2[T] into elements of L2[T] and is self-adjoint, positive definite and Hilbert-Schmidt. Proof: See Appendix II for a proof of this lemma. Note that since Ri is positive definite, self-adjoint and completely continuous, there exists a unique, positive, self-adjoint, bounded, square root operator R1/2 defined on T2[T] (see Definition 7.22). In addition, (Ri/2([f]),R/2([if])) = (Ri([f]),[f]) > o (7.76) for all [f] = 0 so that R1/2 has a zero null space. Then, due to Theorem 7.17, R-1/2 = (R1/2 is densely defined. The next three lemmas will be concerned with the relationships that exist between different probability measures defined on identical cr-fields. In particular, we shall be concerned with the o-field ~ defined in Eq. (7.62) and the two probability measures, Po and Pi, induced on Qby the random processes [YO] and [yi] as defined in Eq. (7.63). Lemma 7.2: Let be a o-field of subsets of 2 such that ~f~<~~~~ ~C (7.77)

lo6' /~~~106 and let Po and Pi be the restrictions of Po and Pi to'1. Then, olPi PolPi * (7.78) Proof: If PoIPi', Definition 7.12 says that there exist disjoint sets A,BFs such that AUB = 2 and to(A) = Pi(B) = 0. But, due to (7.77), A,BE 4J and, since Po and ~i are restrictions of P and Pig P0(A) = P.(B) = 0 which implies PolPi. Q.E.D. Lemma 7.3: Let 4 be a a-field of subsets of Q such that C A, and define to be the class of subsets of Q of the form LAN where Be, NCNO E, NC NiEs and Po(No) = Pi(Ni) = 0. (Note that X is contained in %o and.. ) Let Po and Pi again be the restrictions of Po and Pi to and assume to -Pi (7.79) and (7.80) Then, (1) P _ P. (2) o = 0 i = 0(3) d i / (3) d21/dP0 = dP~/d~o a.e.(P0).

107 Proof: It follows from (7.80) and the definition of 68that, for any BE@1, there exist sets B and N such that Ic, NE 3o, NE3i, PO(N) = Pi(N) = 0 and B = 9AN Then if Po(B) = 0, we have P (B)= P(B) P(B) = P= ) = ( =.Pi(B) = Pi(B) _ 0 0 0 1 because of (7.79) which implies Po << Pi. Conversely, it can be shown that Pi << Po so the assertion of (1) follows. To prove (2), note that (1) implies o3 ='3i. Moreover, it follows from (1) and (7.80) that But, from the definition of I, we have C so that To prove (3), let Bgso and write B dPO d ~ I ) dP = P= ( B P (BAN) A Awhere B = BAN with BFer and N-O?, NE Hi with P (N) = Pi(N) = O. It then follows from Theorem 7.14 that A - doi dPi 1 o dPo a.e.(PO) Q.E.D.

108 Lemma 7.4: Let {j;jEM' }, M' = {1,2,...}, be a sequence of jointly Gaussian random variables defined on the probability spaces (Q,5,Po) and (e,",Pi) such that Eof{6} = Ei{Oj} = 0 Eo{0j ek} = Y js6jk Ei {j ek} = S6jjk where Eo and Ei denote the expectations with respect to Po and Pi. respectively, 6jk is the Kronecker delta, and yj,Sj,jcM', are arbitrary positive numbers. Let X be the minimum a-field with respect to which all the 0's are measurable, and let Po and Pi be the restrictions of Po and Pi on. Then, (1) either Po - Pi or PoPi r A A (2) P0 Pi iff -1 - <co j=l j/ 2 z7 A (3) if Po - Pi9 = exp ~'1 1___ 1 e2'. 1 log Yj a.e.(P dP jl 2 y= j Bj ) 2 (PO) Proof: Let each 8j, jcM' = {1,2,...,}, be the minimum a-field with respect to which the random variable e. is measurable. Then, define & to be the class of sets of the form Bj fn Bj )...fBj where B sjkE k 1 2 nJ2 Jn k kc{1,2,...,n}, n is an arbitrary finite positive integer, and jkEM! (Note that 3 ( ) = i so Po and Pi are probability measures on -( ~).) Now consider the product probability* spaces (Q*,6*,P*) and (*,~ 3*, P*) *See Ref. 30, p. 157.

109 oo Too oo where Q* = H Q,Q. = Q I* = H and P* = H Pmj,m o,i) with j=l j=l j= Pmj being the restriction of Pm to Bj. Remember, the a-field i* is generated by the class of sets i * of the form (BlxB2x...) where BjE i jcM', and all but a finite number of the Bj equal Q. Also, the product measure Pm is uniquely defined by specifying, for each set (BlXB2x...) in qe*, Pm(B 1 xB2 x..) = m(Bjl )Pm(Bj ). m(Bj ) where Bj,Bj,..,B are the sets Bjk tJk' jksM', which are not equal to Q We can now see, since the random variables {ej} are independent, there exists a natural mapping C from ~I* onto i C(BlxB2x...) = Bj B.n o..(Bj with (BlxB2x..o) as described above, which preserves measures, i.e., Pm(BlXB2x...) = Pm(Bj Bj n...Bj ) M M ii i2 in for Me{o,i}. Then, since the probability* measure of the limit of an increasing sequence of sets is equal to the limit of the probabilities of the sets and since all sets** in 2(~) and 8(i)*) are disjoint unions of sets in sand ~*, respectively, it follows that o = Pi if * = P* and PofPi if Po*PiP. Assertion (1) is now an immediate consequence of Theorem 7.15. Since Poj and Pij are Gaussian, we can write dP = ) exp 2 () *See Ref. 30, p. 38. **See Ref. 57, p. 224.

110 Then N00 dpD\1/2 00 (4y.B.j1/4 = I j=l dP oj j=l (j+j)l/2 which converges to a positive number if 00 EL j.L (7.81) j=l (yj+Bj)2 converges to a positive number. But (7.81) converges* to a positive number if and only if r % 4YJ- <J c0 (7.82) j=l (Y+ )2 and, since 4Yj Sj (1_yj/Bj)2 1- - (Yj+Bj)2 (l+y j/j)2 the inequality of (7.82) is satisfied if and only if D00 I ( -rj /j)2 < j=l Assertion (2) then also follows from Theorem 7.15. To prove (3), let 9n = (,) and note, for any Bnen, there exists a B' which is a Borel set in n-dimensional Euclidean space such that Bi(Bn) = Pn (Bn) = p pi(01e2,...,n)de1d2... den (7.83) Bn *See Re f. 58, p. 381.

111 where _nn noA where Pn is the restriction of Pi to in and pi(1,02,...,en) is the joint density, with respect to the probability measure Pi, of the Gaussian random variables1 1,02,.. en (See the discussion concerning Eqs. (7.3) A and (7.11).) Since a similar expression can be written for Po(Bn), (7.83) is also given by ~.(B ) = pi' n n pi (Bn) n= f p(e1,e2, en d..., n)de d2... den B' pn( e1,e2,-. ).,n) B pn(e1,02,...' n) dpn where,n( l2.,oOn) is the joint density, with respect to the probability measure Po, of e01,2,.,0 and pn = n/pn. However, due to n Pin Theorem 7.14 and Definition 7.11, we can also write y1~2nn 0A where EoQ (dPi/d ) is uniquely defined a.e. (Pn) so that a.e. ( )o dP (The subscript of Eo? denotes the fact that the underlying probability measure is PO.) Then, by Theorem 7.13, we have lim pn = im E di a.e (P (7.84) 0 A o n->om n0o dP0 dP since dP.i/dP is measurable with respect to+. Assertion (3) now follows from (7.84) and the fact that

112 = exp ) + g4 pn(1 1,2,...,n ) 3j Q.E.D. The following lemma shows that the "inner product" of the measurable random processes [Zi] and [Z~] discussed previously and any L2[T] function results in a Gaussian random variable. Lemma 7.5: If [Zi] and [Z~] are the K-dimensional, zero mean, measurable Gaussian processes discussed following Theorem 7.25 and {[gj]} is any sequence of elements from L2[T], then the sequences of random variables {ei)} and {e?} defined for all [f]e2 by j 3 oi( [f]) = j [zi(t, [f])]T[g(t)]dt and e~.(.[f]) = f [Z~(t,[f])]T[gj(t)]dt j T are jointly Gaussian on (Q,W,Pi) and (Q2,,Po), respectively. (Note that ei([f]) = e~([f]) for all [f]EQ since [Zi(t,[f])] = [Z~(t,[f])] for J j all toT and [f]sQ.) Proof: See Appendix III for a proof of this lemma. The next lemma specifies a condition, in terms of the integral operators defined in Eq. (7.75), under which the measures Po and Pi are mutually singular, i.e., PoLPi. Lemma 7.6: Let Ro and Ri be integral operators of the type defined by Eq. (7.75) and let [Z~] and [Zi] be the random processes referred to above. If either R1/2R-1/2 or R1/2 R-1/2 is unbounded, then PolPi. 1 0 Ro 1

113 Proof: Note that Ri/2R/2 and Ro R are densely defined opi /2 l/2 ti1eeit erators on L2[T] and suppose R.i/2Ro-/2 is unbounded. Then, there exists -1/2 a sequence of L2[T] functions {[fj]} in the domain of Ro satisfying I I[f j] = 1 and IRiR/2R-11/2([fj])II > j3. Let i([f]) = 5 [zi(t[f]) T[-1/2( [fj ])](t)dt 1 f [zi(t,[f])]T[ Rolf] 3 J and 0o([f]) = 1 ( [Z~(t,[f])]T[Rol/2([f ])](t)dt J J T be random variables defined (2, 3,Pi) and (Q2,,P0)., respectively. (Note that these random variables are Gaussian due to the previous lemma.) Then, since e ([f]) = e.([f]) for every [f]s~, it can be seen that the funcJ 3 tion defined on 2 by ei([f]) = e~([f]) = ei([f]) is a Gaussian random variable on both (Q,,Pi) and (Q,3,Po). Moreover, Eo{e2} = 1 (R(R1/2 [f ]) Rl/[fj]) 1= (7.85) 0 i.2 ])0 j2 and Ei{} = 1 (Ri(Rl/2[f])Ro1/2[f]) > (786) where Eo and Ei denote expectations with respect to the measures Po and Pi' (To obtain (7.85) and (7.86), the order of integration with respect to the probability measure and the time variable was interchanged. This interchange can be justified by an argument similar to that used to justify the interchange in Eq. (II.4).) But, by Theorem 7.7, for any ~ > O, we have

114 P0o[f]; e ([f])l > )} < 1 so that,* by Theorem 7.8, P {[f]; lim le.([f]) = O which implies P0{[f]; lim 8ej([r])I = 0} = 1. 3 -*co Also, since each 6. is Gaussian with respect to Pi by Lemnma 7.5, for any finite positive no, we have n0 2n o j2V and, again by Theorem 7.8, Pi{[f]; lim lej([f])l > n,} = which implies P1{[f]; li lej([f]) = } = 1. Thus, if we let A = {[f]; lim ej([f])'l = 0) j we obtain Po(A) = Pi(Q-A) = *lim I8.([f])I = inf sup.8j([f])l. joo 0 k j>k

115 and PolPi. A similar argument can be used to prove PojPi in the case where R1/2R-l/2 is assumed to be unbounded. 0 i Q.E.D. The following lemma will be necessary to the proof of Lemma 7.8. Lemma 7.7: Let {[f ]} be a sequence of functions from L2[T] and, for every [fj ] in this sequence, suppose {[fjk] } is another sequence of functions from the domain of Ro1/2 such that lim lI[fj] - [fk 0. (7.87) k-*w Then, if Ril/2R l/2 is bounded and if we defined a sequence of random variables { jk} on Q by;jk = [zi(t,[f])]T[R-l/2([f k])](t)dt =. [zO(t,[f])]T[R-1/2([fjk])](t)dt, =k T jk T there exists a sequence of random variables {ej} which are measurable with respect to 6, jointly Gaussian with respect to Po and Pi, possess finite variances, and such that j = l.i.m. j (Po0Pi) k- jk Proof: According to Lemma 7.5, the random variables {0jk;j=1,2,... } are jointly Gaussian on both (Q,',Pi) and (Q,'?,Po). In addition, we have Eo{ejkejn} = (Rol/2 [ fjk ],Ro(Rol /2 [ fJn ] ) ) = ( /[fk ],[fkjn]). Again, the order of integration has been interchanged which is justifiable by an argument similar to that used to justify the interchange in Eq.

116 (II.4). Hence, we find lim Eo{lIjk-Gjnl2} = im {11 [fjk]112 + II[fjn ]112 - 2([fjk],[fjn]) = O k,n- k, n(7.88) since lm I [fjk] I12 = I [fj ]112 and lim ([fjk][fj]) = II [f ]1 12 as k-n-ooo jkn jn a result of (7.87). Also, Ei {ejkejn} = (Rol/2 [ fjk ],RiRl/2 [ fn ] ) = 1 2ko }jk = /2Rk/2 jnD ) = (Ri/2ol/2[fj ] R/2RO/2 ) and, because Ri/2 Rl /2 is bounded and therefore continuous, 1 0 lim Ei{lejk- e jnl2} = 0. (7.89) k,n-oo Then, combining (7.88) and (7.89), we see that {Ojk}k, for each j, is a mean fundamental sequence with respect to both Po and Pi and, hence, with respect to the measure Po+Pi. If Theorem 7.3 is now applied, we see that there also exists a sequence of random variables {Oj}, each of which is measurable with respect to and square integrable on (0,6,Po+Pi), such that Gj = l.i.m. ejk (Po+Pi) k->w which implies Gj = l.i.m. 8jk (PoPi). k-coo Then, since the integral of a positive function over (Q,6,Po+Pi) is equal to or greater than its integral over (Q,qPo) and (Q,3,Pi), we see that each 8j is square integrable over (Q, 3,Po) and (, 3,Pi ). Moreover, since Eo{Qj} = 0 and Ei{.j} = 0, 8. has a finite variance with respect to both Po and Pi.

117 Finally, since the sequence (Gjklk converges in the mean to Gj with respect to both measures, it converges in measure to Gj with respect to both measures. This implies,* the characteristic function, with respect to either measure, of jl, )j2...,Gjn is the limit of the corresponding characteristic function of the jointly Gaussian variables [(jok' j2k'.,Gjnk} when jme(l12,... for me(l,2,...,n) and n is an arbitrary positive integer. Then, since the mean values and covariances of these random variables approach finite limits as k + oo, it follows from the general form of Gaussian characteristic functions that the limiting characteristic function is also Gaussian. Q.E.D. Lemma 7.8: If Ri/2Rol/2 is bounded, then, for any sequence of functions {[fj]} from L2[T], there exists a corresponding sequence of jointly Gaussian variables { }), defined over Q and measurable with respect to A, such that Eo{e. = Ei{ej} = 0 Eo{ejek} = ([fj][fk ]) Ei {e ek = ([fj],X*X[fk]) (790) where X is the bounded extension of R/2Rol/2 to the whole of L2[T] and X* is the adjoint of X. Proof: Since Ro!/2 is densely defined on L2[T], there exists a sequence of L2[T] functions {[fjk ]}k, for each j, such that *See Ref. 28, p. 169.

118 lim II[fj] - [fjk] = koo This, in turn, implies im ([fjm],[fkn]) = ([fj],[fk]) (7.91) m,n o since* the inner product is a continuous function on L2[T]xL2[T]. Moreover, we can define, for each j, the sequence of functions {(jk) on o which satisfy Gik = J [zi(t,[f])]T[Rol/2([fjk])](t )dt r [ZO(t,[f])]T[Rol/2([fjk])](t)dt. Then, by Lemma 7.7, there exists a sequence of random variables {G )} defined on Q, each of which is measurable with respect to ~:, jointly Gaussian with respect to Po and Pi, square integrable on (2,~,Po) and (Q,,Pi), and such that = l.i.m jk (Po, Pi) J k->co Let us now show that the sequence {eJ} satisfies (7.90). Since [Z0] is a zero mean process and I (Oj-ejk)dPo < flej- ekIdPO I, / k 1/2 where the last inequality follows from Theorem 7.6, we have Eo{ej} = limEo{ejk} = O. k-oo *See Ref. 44, p. 108.

119 Similarly, Ei{e4} = 0. Also, because E o{82} < c and Eo{2kn} < o for all j, k, and n, (8ejok-ejmkn )dPoI < {1 Jlj(k- kn) I+lekn(0j-j) I }Po < [E {e2}]1/2 L e kkn2dP /2 -- j I ek-ekn I2 dP + [Eo{ekn}]/2 5 ejo ejm 2dp which implies Eo{Ojk } = lim Eo{ejmekn}. (7.92) m,n~ Similarly, it can be shown Ei{0j0k } = lim Ei{0jmOkn}. (7.93) m,n~o Then, applying Eqs. (7.91), (7.92), and (7.93), we find Eo{ejek} = lim ([fjm],[fkn]) m,n-o ([fj],[fk]) and Ei{6jek} = lim (R1/2R-ol/2[fjm],R1/2R-l/2[fkn]) m,n-o (X[f ],X[fkl) where X is the bounded (continuous) extension of Ri/2Ro1/ 2.E.D.

120 In the next lemma, we shall obtain an expansion for the processes rZO] and [zl] in terms of a countable set of Gaussian random variables, each of which is measurable with respect to 3. Lemma 7.9: Assume Ri/2Rol/2 is bounded and denote its bounded extension to the whole of L2[T] by X. Then, if the identity* is denoted by I and if the self-adjoint operator I-X*X is completely continuous, l[Zo] = lo1 1/2 [/ n + [R/ []] (txPo) (7.94) n-` L= [ j=l and** eigenvalues of I-X*X, {[4Q ]} is an orthonormal basis for the null space of I-X*X, {nj} and {nj} are square integrable, measurable with respect to ie, jaintly Gaussian random variables that satisfy j 1.i.m. f [zi(t,[f])]T[R-l/2([qjk])](t)dt j= l.i.m. 5 [Zi(t,[f])]T[R, l/2(( jk])](t)dt with {[cjk]}k and {[ojk]}k being elements of L2[T] chosen such that they are elements of the domain of R-1/2 and such that limI I[j] s [ jk]l = *The identity operator maps every element of its domain into itself. **If [Z] and [Zn] are vectors, n-~o n-co kxT

121 and (txPo) and (txPi) are the product measures on (QxT). Proof: Note that X is uniquely defined since Ri /2R/2 is densely defined and bounded (Theorem 7.16). Also {nj} and {nj} exist as a result of Lemma 7.7 and Eo{njnk} = ( [j ] [k]) Eo{njnk} = ([Hj],[$k]) Ei{njnk} = ([j],X*X[$k]) Ei{njnk} = (.[j],X*X[k]) (7.96) by Lemma 7.8. In addition, {[pj]} and {[j]} exist since I-X*X is completely continuous and self-adjoint (Theorem 7.20). Now consider n n ~= -E j= - z]- [R/2[~ ] - ~ 1/2 K[Zo - [Rl/2 - [R ]]n dP0dt. j=l o J iJ j=l (In this case, the interated integral Jf is equal to / due to Theorem T Q Tx2 7.10.) Then, performing the Q integration and substituting (7.96), we find In Tr [rO(t,t)]dt -2 ([Ro12[]][ [ZO]n}]) 2 1 ([RO1/[qj]] [E {[Z~]n }]) j=l n n + R ([Rl/2 / 2[j],]]) + ([R1/2[ j],[Rl/2[cp]]) (7 97) j= j=l 0 where the symbol Tr{ } has been used to denote the trace of the indicated matrix. Furthermore, since

122 [Z2{Ifr212 dpo1 2 t q o we have [Eo{ [Z]nj}] = lim 5 [Z ][Z~]T[R-1/2 [jk]](s)dsdPo lim [R/2 [ k ] ] k-oO = [Rl/2[j]] (7.98) when the order of Q and T integration is interchanged. ([Zt]q denotes the qth element of the vector [Zt].) This interchange can be justified by an argument similar to that used in the previous lemmas. Similarly, [E{ZIT 1= [Ro/2 ]. (7.99) Now, since R is completely continuous and has a zero null space, there exists an othonormal basis for L2[T], {[rQ ]}, which satisfies [Ro[]] = k ] and n [r~(t,s)] = lim [ XIk[ k(t))] [+k(s)]T n-o k=l1 with the sum converging uniformly (Theorem 7.22). As a result, we have Tr [r~(t,t)dt = Tr lim nX [7(t) ][L (t)]T dt T n->o k=l T n lim C Xk (7.100) n-o k=l To justify the last statement, see the discussion concerning Eq. (II.11) and note that

123 k = ([p1k],[Ro[ip~]]) > 0. This last summation of Eq. (7.100), however, can also be written as 00 00 1/2 o R00] ([Ro[]], ]) ([Ro [k]],[ k=l k=l which becomes 00 0o Z > [([Rl/2 [Lp]][~ ])2 + ([Rl/2E1~]][Y]21 (7.101) k=l j=l when [R1/2[Lk]] is expanded in terms of the orthonormal basis {{'[j]}, Q{[c]}}. Moreover, since Ro/ is self-adjoint and the order of summation can* be interchanged, (7.101) becomes o00 {) + ([R1/2[2 ]], [R12 [j]]]) + (7.102) J-1 Then, substituting (7.98), (7.99), and (7.102) into (7.97), we find co 00 Io = I I[R/2 [j]]H I 2 + I I I[R/2[ j]]I 2 j =n j=n an d lim I = 0 n n-> since (7o100) which equals (7.102) is finite. In a similar manner, it can be shown that oo n i ([R /2 ]][R1/2[]])([x*x[j ]] [ ]) n = k=l j=l - [j= /2 [- ],[Ro/2[~j]])(xjxL] ]) j=l *See Ref. 59, p. 161.

124 where Xk satisfies [Ri[J]] = Xki[Ji] and {[~k } is an orthonormal basis for L2[T]. Also, 00 00 k = C ([Ri[k i k=l k=l 00 ([/2X*XR]] (7.103) Then, expanding [Ro/2[4]] in terms of the basis {{[] {[ } (7.103) becomes 00 00 I' ([R1/2[ik]] ][ j ])2([ X*X[~j]j] ][ j]) k=l j=l 00 00 + i ([ROl [X*X[j i], [ji). k=l k=l Finally since R1/2 is self-adjoint, interchanging the order of summation* 0 results in 00 0co = - I 2 ([Rl/2[j]],[Ro/2[j]])([X*X[j]] [pj]) + ( [R1/2 [j f ], j] ]) ( [X*x[j.]] [ j ]), and we see again that lim I = 0. n-oo n Q.E.D. The final lemma of this section proves the o-field~is contained in a slight enlargment of the a-field generated by the random var1*See Ref. 59, p. 161.

125 ables {nj } and {r}j defined in Lemma 7.9. Lemma 7.10: Under the hypothesis of Lemma 7.9, TI3c 3 where < is the a-field of sets of the form AAN, AWs, NcNosE, Po(No) = 0, NCNiEc, Pi(Ni) = 0, and4 is the minimum a-field with respect to which the random variables {nrj and {n r} are measurable. Proof: Let us first prove 63'CJ_ (Remember,-63 is the minimum a-field with respect to which [Z~] and [Zt] are measurable for all toT.) To prove this proposition, is suffices to prove [Z(t,[f])] = [Z~(t,[f])] = [Zi(t,[f])] is measurable for every tET. De fine [Sn(t,[f])] = E {[R1/2[j ]](t)n j([f]) + [R 1/2[j ]](t)-n j([fl)}. n(tf])] =j=l Then, since {[Sn(t,[f])]} converges to [Z(t,[f])] in the mean with respect to (txPo) and (txPi) by the previous lemma, [Sn(t,[f])] also converges in the mean to [Z(t,[f])] with respect to the measure (tx(Po+Pi)). As a result, there exists a subsequence {[Snk(t,[f])]} which converges to [Z(t,[f])] a.e. (tx(Po+Pi)) and, hence, a.e. (txPo) and a.e.(txPi). (See Theorem 7.1.) Thus, if N = {(t,[f]);[Z(t,[f])] + lim [Snk(t,[f])]}, nk we have (tXI0)( (txPo)(,) = (t.Pi) ( Then, due to Theorem 7.10, there exists a set T''CT such that, for teT',

126 Po(Nt) = Pi(Nt) = (7104) when Nt is the section of N determined by t and t(T') = Tf, i.e., the measure of the set T' equals the measure of the interval [0,Tf]. Now define Bt = {[f]; Z(t,[f])sA} and Bt {= [f]; lim Sn (t,[f]) A} nk0 k for toT' and A an arbitrary Borel set in K-dimensional Euclidean space. (Note* that Btc _8 and Btc~). Then, since Bt = (Bt Nt) U(Bt( t) Bt = (Ot t) Y(Btn C) and Bt h Nt =T if we denote the complement of Nt by NC, and since Po(Btfl N P(Nt Pi(Bt) Nt) = Pi(BtNt) Nt) = 0 due to (7.104), when we form BtABt, we find o ( ABt Pi(BtABt) = 0. It can now be seen, since Bt = BtA(BtABt), Btc1for tT', i.e., [Z(t,[f])] ism-measurable for almost every tOT. *See Ref. 30, p. 84.

127 Let us now prove [Z(t,[f])] is O-measurable for every tET. Since [r0(t,s)] and [ri(t,s)] are continuous by Assumption A6.3.1 of Section 6.3 and T' is dense in T, there exists, for every tET, a sequence {tn}, tnET', converging to t such that lim Eo{ |I[Z(t,[f])] - [Z(tn,[f])]l 12 = n-o and lim E{i(l[Z(t,[f])] - [Z(tn,[f])] 12} = O. n-o Hence, there exists a subsequence {tnk} such that [Z(tnk,[f])] converges to [Z(t,[f])] a.e. (Po) and a.e. (Pi). Then, since each [Z(tnk,[f])] is s-measurable for every tnk, the same argument as used above shows [Z(t,[f])] is p-measurable for every toT. If we now return to Eq. (7.75), we see A and, due to the definition of' and the above discussion,'s C t3 which together imply Q.E.D. We now have all the lemmas that are necessary to prove the principal theorem of this chapter. This theorem specifies the conditions under which P - Pi, isMo, and the form of the Radon-Nikodym derivatives, dPi/dPo, iSMo.

128 Theorem 7.26: The following three properties relating Po and Pi are true: (1) Either Po- Pi or PoLPir (2) Po - Pi if and only if R1/2Rol/2 is bounded and I-X*X is a Hilbert-Schmidt operator where X is the bounded extension of RV/2R-l/2 to l 0 all of L2[T]. (3) If P - Pi' = lim exp - log J dPno jJ n+~o j=l Pjl where pj, j{l1,2,...}, are the eigenvalues of I-X*X and nj, jE{l,2,...}, are defined as in Lemma 7.9 with [fj] being the eigenfunctions of I-X*X Proof: Let us first prove (2). Necessity: Assume P - Pi.. Then, from Lemma 7.6 and Theorem 7.165 R/2R-1/2 is bounded and possesses a unique bounded extension to all of L2[T] which we are denoting by X. Furthermore, it is easily seen that X*X is bounded, self-adjoint and positive. Theorem 7.21 then guarantees the existence of a spectral decomposition of X*X which we can write as 5O vdEV where Ev represents a resolution of the identity. We shall now show, by contradiction, X*X has a purely discrete spectrum. Suppose for some e > O, I-E1+c is infinite dimensional. Then, there exists a sequence of real numbers {vj} such that 1+E < v2 <... and a sequence of orthonormal functions {[fj]), from L2[T], such that (E"-E') [fj] = [fj] where v' < v; < v" and

129 (EV,,-E,)[fj] = O when vj < v' < v" or v' < v' < v.. It then follows 00 ([fj],X*X[fk]) = vd(Ev[f][fk]) 0 jkj > 6jk(l+s). (7.105) But, due to Lemma 7.8, there also exists a sequence of random variables {ej } which are measurable with respect to e, jointly Gaussian with respect to both Po and Pi and such that Eo{ej} = Ei{j} = Eo{ejek} = 6jk Ei{ejek} = ([fj],X*X[fk]). (7.106) T! Now let M3 be the minimum a-field with respect to which the sequence {lj} is measurable, and let Po and P" be the restrictions of Po and Pi to 3 TT. Then, from Lemma 7.4 and Eqs. (7.105) and (7.106), P"LP' since [0 2 _ ___ =co0. (7.107) j=l Ei {:e2 But, by Lemma 7.2, we also have PoJPi which is a contradiction. Therefore, I-E1+c is finite dimensional for every E > 0. Similarly, it can be shown that E1 l is finite dimensional. Hence, X*X has a purely discrete spectrum with 1 being the only possible limit point of the spectrum. This implies I-X*X also has a discrete spectrum and its only possible limit point is zero. Therefore, by Corollary 7.1, I-X*X is com

130 pletely continuous~ If we now look at the equation (I —x*x)[] = p[5], we see that the eigenfunctions of I-X*X are also eigenfunctions of X*X with the eigenvalues of X*X being one minus the corresponding eigenvalue of I-X*Xo As a result, the {nj} and {nrj} defined in Lemma 7.9 can be shown to have the following properties: Eo{nj) = Eo{nj) = Ei{nj) = Ei{n-j} = 0, Eo{njnk} = ([4j]5[$k]) = jk, EO{njnk} ( [j ]'X [0k ]) Eo{njnk} = ( [4j],[k]) = 0, Ei{n jnk} = ( [j ],XX[k]) = 0, Ei{njnk} = ([4j],X*X[Xk] = (l-p.j)jk, Ei{nrj-k} = ( [j ]X*X[k]) = 6jk (7.108) To verify these properties, an argument analogous to that used to prove Eq. (7.90) of Lemma 7.8 suffices. Now let4 be the minimum a-field with respect to which {nj} and {n.} are measurable and let Po and Pi be the restrictions of Po and Pi to. Then, it follows from Lemma 7.4 that either Po Pi or %PciP and Po - Pi if and only if *' - 1-p.) <. * (7.109)

131 Furthermore, from Lemma 7.2, A Pi. But since Po P from the A1 hypothesis, we must have P - Pi. Hence, (7o109) is satisfied, or equivalent ly, P < 00 (7.110) j=l 0 which implies, by Definition 7.24, that I-X*X is a Hilbert-Schmidt operator. Sufficiency: Assume I-X*X is a Hilbert-Schmidt operator on L2[T]. We can then establish {nj}, {nj} and (7.108) as previously done. Moreover, since I-X*X is Hilbert-Schmidt, (7.110) and hence (7.109) is satisfied. A It then follows from Lemma 7.4 that Po Pi which, together with Lemmas 7.3 and 7.10, imply P - Pi Lew us now prove (1). Assume Po and Pi are not equivalent. Then, according to (2) proved above, one of the following three cases must hold: (a) R1/2R-1/2 is unbounded, (b) I-X*X is bounded but not completely continuous, (c) I-X*X is completely continuous but not Hilbert-Schmidt. In case (a), PoLPi by Lemma 7.6. In case (b), X*X has a spectral representation and either I-E1+ or E1+c must be infinite dimensional for some E > 0. Then, PoLPi, as shown in the paragraph which contains Eq. (7.107). In case (c), I-X*X has the eigenvalues and eigenfunctions {pj} and {[.j]}. Also, the random variables {nj} and {n j}, as described previously, are well defined. But, since I-X*X is not Hilbert-Schmidt, (7.110), and, A A hence, (7.109) do not hold. Then, according to Lemma 7.4, PolPi and, from Lemma 7.2, PoLPi. Therefore, we conclude that if Po and Pi are not equivalent, then, they must be singular. This trivially implies that if Po and Pi are not singular, then, they must be equivalent.

Let us now prove (3). This assertion, however, is an immediate consequence of Lemma 7.4 (3) and Lemma 7.3 (3). Q.E.D.

The next two theorems provide, under certain conditions, an alternate functional form for the Radon-Nikodym derivatives dP_i/dP_o, i ∈ M. This alternate form will allow us to determine, by the use of linear filters and correlators, in which set of the optimal partition of the observation space a particular sample function belongs.

Theorem 7.27: The measures P_o and P_i are equivalent if and only if there exists a Hilbert-Schmidt operator U defined on L_2[T] which satisfies

R_o^{1/2} U R_o^{1/2} = R_o − R_i.   (7.111)

Proof: If P_o and P_i are equivalent, let U′ = I − X*X where X is the bounded extension of R_i^{1/2}R_o^{-1/2}. (R_i^{1/2}R_o^{-1/2} is bounded due to Lemma 7.6.) Then, since U′ can be seen to satisfy (7.111) and, by Theorem 7.26, U′ is Hilbert-Schmidt, we see that the conditions of the theorem are necessary.

To prove these conditions are sufficient, assume U is a Hilbert-Schmidt operator that satisfies (7.111). Then U also satisfies R_i = R_o^{1/2}(I − U)R_o^{1/2} and, as a result, on the dense subset of L_2[T] where R_o^{-1/2}R_iR_o^{-1/2} is defined, R_o^{-1/2}R_iR_o^{-1/2} is equal to I − U. Furthermore, since U is completely continuous by Theorem 7.24, U is bounded, as is I − U, so that R_o^{-1/2}R_iR_o^{-1/2} is also bounded. Finally, since the bounded extension of a densely de-

fined bounded operator is unique by Theorem 7.16, we see that X*X = I − U. Thus, I − X*X is Hilbert-Schmidt and the application of Theorem 7.26 completes the proof. Q.E.D.

Theorem 7.28: If there exists a bounded self-adjoint operator H_i on L_2[T] satisfying

R_o H_i R_i = R_i − R_o,   (7.112)

then P_o and P_i are equivalent. Moreover,

D_i = lim_{n→∞} exp{ (1/2) Σ_{j=1}^n log(1 − ρ_j) }

exists and is finite, and

dP_i/dP_o = D_i^{-1} exp{ (1/2)(H_i([f]),[f]) }   a.e. (P_o)

where {ρ_j} are the eigenvalues of I − X*X and [f] is any observation from the observation space Ω. (If [f] ∉ L_2[T], (H_i([f]),[f]) is defined to be zero.)

Proof: If (7.112) is satisfied, we can write R_i = R_o(I + H_iR_i) = (I + R_iH_i)R_o which shows that R_o^{-1/2}R_iR_o^{-1/2} is bounded and densely defined on the subset of L_2[T] where R_o^{-1/2} is defined. As a result, we can define an operator U to be the uniquely defined bounded extension of

I − R_o^{-1/2}R_iR_o^{-1/2} = −R_o^{1/2}H_iR_iR_o^{-1/2} = −R_o^{-1/2}R_iH_iR_o^{1/2}.

Furthermore, since R_o is positive definite and completely continuous, we can assume the sequence of eigenfunctions {[φ_j^o]} of R_o forms an orthonormal basis for L_2[T]. Then, if we let {λ_j^o} be the sequence of eigenvalues that correspond to {[φ_j^o]}, we find

[N(U)]² = Σ_{j=1}^∞ ||U([φ_j^o])||²
= Σ_{j=1}^∞ ( R_o^{1/2}H_iR_iR_o^{-1/2}([φ_j^o]), R_o^{-1/2}R_iH_iR_o^{1/2}([φ_j^o]) )
= Σ_{j=1}^∞ (λ_j^o)^{-1/2}(λ_j^o)^{1/2} ( R_o^{1/2}H_iR_i([φ_j^o]), R_o^{-1/2}R_iH_i([φ_j^o]) )
= Σ_{j=1}^∞ ( H_iR_i([φ_j^o]), R_iH_i([φ_j^o]) )
= tr(R_iH_iR_iH_i) ≤ N(R_i)N(H_iR_iH_i) ≤ ||H_i||²[N(R_i)]² < ∞

when the results* of Theorem 7.23 are applied. Furthermore, U satisfies (7.111) and, from Theorem 7.27, we see that P_o and P_i are equivalent.

Now let X be the unique bounded extension of R_i^{1/2}R_o^{-1/2}. (Again, R_i^{1/2}R_o^{-1/2} is bounded due to Lemma 7.6 and the fact that P_o ~ P_i.) It can be seen that U = I − X*X since U is the unique bounded extension of I − R_o^{-1/2}R_iR_o^{-1/2}. Furthermore, if A ⊂ L_2[T] denotes the dense subset of L_2[T] on which R_o^{-1/2} is defined, we have

(X*X)(R_o^{1/2}H_iR_o^{1/2}) = (R_o^{-1/2}R_iR_o^{-1/2})(R_o^{1/2}H_iR_o^{1/2})
= R_o^{-1/2}R_iH_iR_o^{1/2}
= R_o^{-1/2}R_iH_iR_oR_o^{-1/2}   on A
= R_o^{-1/2}R_iR_o^{-1/2} − I   on A
= X*X − I   (7.113)

*The symbol N(U) denotes the Hilbert-Schmidt norm of the operator U.

since (X*X)(R_o^{1/2}H_iR_o^{1/2}) is bounded and the bounded extension of R_o^{-1/2}R_iR_o^{-1/2} − I is unique. Similarly, we find

(R_o^{1/2}H_iR_o^{1/2})(X*X) = X*X − I

which, together with (7.113), implies

I − R_o^{1/2}H_iR_o^{1/2} = (X*X)^{-1}, or R_o^{1/2}H_iR_o^{1/2} = I − (X*X)^{-1}.   (7.114)

Then, if {[φ_j]} and {ρ_j} are the eigenfunctions and eigenvalues, respectively, of the Hilbert-Schmidt operator I − X*X and {[φ̄_j]} is a basis for its null space, we have

(X*X)[φ_j] = (1 − ρ_j)[φ_j]   and   (X*X)[φ̄_j] = [φ̄_j].

Finally, this implies, because of (7.114),

(H_iR_o^{1/2}[φ_j], R_o^{1/2}[φ_k]) = [ρ_j/(ρ_j − 1)] δ_jk,   (7.115)
(H_iR_o^{1/2}[φ_j], R_o^{1/2}[φ̄_k]) = 0,   (7.116)
(H_iR_o^{1/2}[φ̄_j], R_o^{1/2}[φ̄_k]) = 0.   (7.117)

Returning now to Lemma 7.9, we see that

[Z°] = l.i.m._{n→∞} [Z_n°]   (t×P_o)   (7.118)

where

[Z_n°] = Σ_{j=1}^n { R_o^{1/2}([φ_j]) η_j + R_o^{1/2}([φ̄_j]) η̄_j }.   (7.119)

Then, if we define [Z̃_n°] by [Z̃_n°] = [Z°] − [Z_n°] and let H_i be zero off L_2[T], we find

lim_{n→∞} ∫_{T×Ω} [H_i([Z̃_n°])]ᵀ[H_i([Z̃_n°])] d(t×P_o)
= lim_{n→∞} ∫_Ω ∫_T [H_i([Z̃_n°])]ᵀ[H_i([Z̃_n°])] dt dP_o
≤ lim_{n→∞} ||H_i||² ∫_Ω ∫_T [Z̃_n°]ᵀ[Z̃_n°] dt dP_o
= lim_{n→∞} ||H_i||² ∫_{T×Ω} [Z̃_n°]ᵀ[Z̃_n°] d(t×P_o)
= 0

by the application of Theorem 7.10, the definition of the norm of H_i, and Eq. (7.118). But this implies

H_i([Z°]) = l.i.m._{n→∞} H_i([Z_n°])   (t×P_o).   (7.120)

Furthermore, as shown in Appendix IV, we can write*

*Note that m(P_o) denotes convergence in measure.

∫_T [Z°(t,[f])]ᵀ[H_i([Z°])](t) dt = lim_{n→∞} ∫_T [Z_n°(t,[f])]ᵀ[H_i([Z_n°])](t) dt   m(P_o)
= lim_{n→∞} Σ_{j=1}^n [ρ_j/(ρ_j − 1)] η_j²   m(P_o)   (7.121)

when we make use of Eqs. (7.115)-(7.120). Let us now obtain the relation that exists between ∫_T [Z°(t,[f])]ᵀ[H_i([Z°])](t)dt and dP_i/dP_o. In Theorem 7.26, it was shown that

lim_{n→∞} (λ_n − log D_{i,n}) = log dP_i/dP_o   a.e. (P_o)

where

λ_n = (1/2) Σ_{j=1}^n [ρ_j/(ρ_j − 1)] (η_j)²   and   D_{i,n} = Π_{j=1}^n (1 − ρ_j)^{1/2}.

Let us now assume the sequence {log D_{i,n}} does not converge to a finite number. Then, since {λ_n − log D_{i,n}} converges a.e. (P_o) to the finite-valued function log dP_i/dP_o, {λ_n} must converge only on a set of measure zero. Thus, for every n_0 and ε > 0,

P_o{[f]: |λ_n − λ_m| > ε} = 1   (7.122)

for some n, m > n_0. But, from (7.121) we see that, for every ε > 0, there exists an n_0 such that if n, m > n_0, we have P_o{[f]: |λ_n − λ_m| > ε} < ε, which is a contradiction when we look at (7.122). As a result, {log D_{i,n}}

must converge to a finite limit, call it log D_i. Furthermore, since {λ_n} converges in measure to both (1/2)∫_T [Z°(t,[f])]ᵀ[H_i([Z°])](t)dt and log dP_i/dP_o + log D_i, we see from Theorem 7.2 that

dP_i/dP_o = D_i^{-1} exp{ (1/2) ∫_T [Z°(t,[f])]ᵀ[H_i([Z°])](t) dt }   a.e. (P_o).

But, as can be seen from the proof of Theorem 7.9, we can write [Z°(t,[f])] = [f(t)] for all [f] ∈ Ω which are continuous on [0,T_f]. Moreover, since the noise process [N(t)] is separable and there exist two constants C > 0 and δ > 0 such that

Tr{[r°(t,t)] + [r°(t′,t′)] − [r°(t,t′)] − [r°(t′,t)]} ≤ C|t − t′|^δ

for all t, t′ ∈ T, as stated in Assumption A6.3.1, there also exists* a set N ⊂ Ω for which P_o(N) = 0 and such that all [f] ∈ Ω−N are continuous. Finally, this implies

dP_i/dP_o = D_i^{-1} exp{ (1/2)(H_i([f]),[f]) }   a.e. (P_o)

when (H_i([f]),[f]) is defined to be zero if [f] ∉ L_2[T]. Q.E.D.

*See Ref. 50, p. 98.

7.4 SUMMARY AND DISCUSSION OF RESULTS

In this chapter, we have obtained a decision scheme for providing an estimate of the direction of arrival of a stochastic signal which is received by an array of receiving elements in the presence of noise. This scheme makes use of the a priori probabilities of the possible angles of

arrival and requires knowledge of the covariance functions of the signal and noise processes. Moreover, it is an optimal decision scheme in the sense that it results in a minimum probability of error as discussed in Assumption A6.3.1 of Section 6.3. (This scheme is also optimal with respect to the "Maximum A Posteriori Probability" optimality criterion mentioned in Assumption A6.3.1, so that it results in the specification of the direction of arrival which is "most probable" based on the received signals and the available a priori information. Optimality with respect to the "MAP" optimality criterion is discussed in Appendix V.)

The decision scheme is as follows:

(1) Determine, for each possible direction of arrival α_i, a bounded self-adjoint operator H_i which satisfies the operator equation

R_o H_i R_i = R_i − R_o   (7.123)

where R_o and R_i are the integral operators defined in Eq. (7.75). Determine also, for each i,

D_i = lim_{n→∞} Π_{j=1}^n (1 − ρ_j)^{1/2}

where {ρ_j} is the sequence of eigenvalues of the bounded extension of the operator I − R_o^{-1/2}R_iR_o^{-1/2}.

(2) Evaluate, for each i, the expression

I_i = {Pr{α_i}} D_i^{-1} exp{ (1/2)(H_i([y]),[y]) }   (7.124)

where [y] denotes the vector of signals received by the array on the interval [0,T_f] and Pr{α_i} is the a priori probability of α_i being the true angle of arrival. In addition, let I_o = Pr{α_o} where Pr{α_o} is the a priori probability of receiving noise alone.

(3) Finally, make the decision α_i is the true direction of arrival if i is the smallest index of I_i for which I_i ≥ I_j for all j. (If I_o ≥ I_j for all j, make the decision noise alone is present.)

This decision scheme possesses certain anomalies when the "no signal present" hypothesis has a nonzero a priori probability, however. For instance, when the number of possible angles of arrival becomes large, each of the a priori probabilities Pr{α_i} approaches zero. This implies the decision "no signal present" will be made for "nearly all" received samples [y] if Pr{α_o} ≠ 0. This difficulty has arisen because we are comparing the hypothesis "no signal present" with the individual hypotheses "signal at an angle α_i" whose a priori probabilities are going to zero. An alternate, but more reasonable, decision procedure first makes a choice between the hypothesis "no signal present" and the composite hypothesis "signal at one of the angles α_i." Then, if the latter is chosen as the true hypothesis, it makes a decision as to which from the sequence of hypotheses "signal at an angle α_i" is the true hypothesis. If this alternate decision procedure is adopted and if each of these decisions is made so as to minimize the probability of error, it follows easily that step (3) of the above decision scheme should be replaced by the following:

(3′) Make the decision "no signal present" if

Pr{α_o} ≥ Σ_i {Pr{α_i}} { D_i^{-1} exp{ (1/2)(H_i([y]),[y]) } }.

Otherwise, make the decision α_i is the true direction of arrival if i is the smallest index of I_i for which I_i ≥ I_j for all j.

It should be added that, if no solution H_i of Eq. (7.123) exists, it is not clear what procedure is optimal for obtaining an estimate of

the direction of arrival of the signal. However, in most practical problems it appears that (7.123) has the required solution.

Assuming the sequence of operators {H_i} and constants {D_i} can be found, attempts to apply the above procedure in a practical direction finding problem will still meet several difficulties. In the first place, the received signals normally can have an infinite number of possible directions of arrival. Even if we assume this number is finite, say n_o, it is still necessary to have either n_o processing systems operating in parallel to evaluate (7.124), or the signals must be recorded and then processed at a later time. In either case, the optimal procedure appears to be an impractical approach to the problem.

This procedure does, however, suggest suboptimal techniques. For instance, the interval of observation [0,T_f] can be divided into n′ disjoint subintervals so that Eq. (7.124), for various values of i, can be sequentially evaluated over these subintervals. Although this does not use all available information, it still appears to be a plausible technique when it is impractical to record the signals or to have a great redundancy of equipment. It must be realized, though, that in most practical problems n_o > n′, so this technique requires a policy which selects the particular I_i, i ∈ {1,2,...,n_o}, to be calculated during each of the n′ subintervals of [0,T_f]. A search for policies that are optimal would probably follow very closely the discussion presented in the first five chapters of this thesis.

In at least two special cases of tremendous practical interest, however, the above difficulties do not exist. These cases, which restrict the class of possible noise and signal covariance functions to those which are stationary and "band-limited," are the subject of the next two chapters.
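As an illustration only (and not part of the development above), the mechanization of steps (1)-(3′) on a digital computer can be sketched as follows, assuming the operators H_i and constants D_i have already been obtained and discretized on a time grid; all names and data formats are assumptions of the sketch.

    import numpy as np

    # Minimal sketch of the decision scheme of Section 7.4 on a discrete time
    # grid.  H_list[i] is a matrix standing in for the operator H_i, D_list[i]
    # is the constant D_i, priors[i] is Pr{alpha_i}, and prior_noise is
    # Pr{alpha_o}.  The observation y is the stacked, sampled array output.
    def decide(y, H_list, D_list, priors, prior_noise):
        # I_i = Pr{alpha_i} * D_i^{-1} * exp{(1/2)(H_i y, y)}   -- Eq. (7.124)
        I = np.array([p / D * np.exp(0.5 * y @ (H @ y))
                      for H, D, p in zip(H_list, D_list, priors)])
        # Step (3'): first test "no signal present" against the composite
        # hypothesis "signal at one of the angles alpha_i".
        if prior_noise >= I.sum():
            return None                  # decide noise alone is present
        return int(np.argmax(I))         # smallest maximizing index, step (3)

Note that np.argmax returns the smallest index attaining the maximum, which matches the tie-breaking rule of step (3).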

CHAPTER 8

SPECIAL CASES

8.1 INTRODUCTION

In the previous chapter, we derived an optimal decision scheme for choosing between the hypothesis of "no signal present" and the sequence of hypotheses {"signal at an angle α_i"}. This decision scheme required us first to obtain the operators {H_i} and then to calculate, for each i, the value of

ℓ_i = (H_i([y]),[y])   (8.1)

where [y] represents the observation. Unfortunately, even if we assume the sequence of operators {H_i} is known, this requires either recording the observation [y] so that ℓ_i, for all i, can be calculated at a later time, or a bank of filters, one for each possible α_i, so that the entire sequence {ℓ_i} can be calculated simultaneously.

In this chapter, we shall consider the special cases discussed in Section 6.4 and Sections 6.4.1-6.4.2. The solutions that will be obtained, while being more restrictive in their application, are, nevertheless, of great practical importance since the restricting assumptions typify, in many ways, the environment of realistic direction finding problems and since a greatly simplified implementation is possible. We shall find, in these special cases, that a single set of 2K² numbers provides all the information given by the sequence {ℓ_i}. (Remember, K denotes the number of elements in the antenna array.) Each of these 2K² numbers can be obtained as the output of a realizable filter operating on the received data. We will

be able, for these special cases, to eliminate not only the need for recording the received signals or having available a large bank of parallel processing systems but also the need for solving individually for each of the operators in the sequence {H_i}.

In addition to the sequence {ℓ_i}, the sequence of constants {D_i} must be available before an optimal estimate can be made. The problems associated with the calculation of these constants will also be considered in this chapter.

8.2 INDEPENDENT, IDENTICALLY DISTRIBUTED NOISE

Let us consider, first, the problem in which Assumptions A6.4.1 and A6.4.1.1 together with A6.3.1-A6.3.3 are valid descriptions of the environment of the direction finding system. Before obtaining a solution for this problem, however, let us investigate the consistency of these assumptions. In particular, let us show that the existence of constants C > 0 and δ > 0 such that

Σ_{k∈K} { [r°(t,t)]_kk + [r°(v,v)]_kk − [r°(t,v)]_kk − [r°(v,t)]_kk } ≤ C|t − v|^δ   (8.2)

when t, v ∈ T is consistent with the "narrow-band" assumption. Since the noise process driving each array element is now stationary, independent of all the others, and possesses a covariance function r_N(τ), if we let κ = 1/(2K), Eq. (8.2) reduces to

|r_N(0) − r_N(τ)| ≤ κ C |τ|^δ.   (8.3)

Furthermore, since r_N(τ) is continuous and bounded, a question as to the existence of constants C > 0 and δ > 0 such that Eq. (8.3) is satisfied arises only when τ → 0. Moreover, if Eq. (8.3) is satisfied for δ > 1

and any τ ≤ 1, it is automatically satisfied for δ = 1 and any τ ≤ 1. Thus, we need only investigate the implications of Eq. (8.3) for C > 0 and 0 < δ ≤ 1. Suppose, now,

lim_{ω→∞} ω² F_N(ω) < ∞   (8.4)

where F_N(ω) is the Fourier transform of r_N(τ). Then, applying the Initial Value Theorem* from the theory of Fourier transforms, we find that Eq. (8.4) implies

lim_{τ→0} [r_N(0) − r_N(τ)]/τ < ∞

which implies |r_N(0) − r_N(τ)| ≤ κC|τ|^δ for 0 < δ ≤ 1 and τ sufficiently close to zero. Thus, we see the conditions under which there exist constants C > 0 and δ > 0 such that Eq. (8.3) is satisfied are related to the manner in which the power spectrum falls off for large ω and, therefore, are consistent with the narrow-band assumption.

Turning now to the problem of specifying an optimal estimation procedure when the above stated assumptions are valid, we see, according to the discussion of the previous chapter, that an optimal estimate requires the use of a bounded self-adjoint operator H_i which satisfies R_oH_iR_i = R_i − R_o. But, referring to Assumption A6.4.1, we find there exists a symmetric kernel [h_i^T(t,s)] which satisfies Eq. (6.6) and which is square integrable

*See Ref. 60, p. 267.

on T×T. Then, since [h_i^T(t,s)] is symmetric and square integrable, the integral operator H_i^T with [h_i^T(t,s)] as its kernel is self-adjoint and Hilbert-Schmidt which, in turn, implies H_i^T is self-adjoint and bounded. The proof establishing these properties follows exactly the proof presented in Appendix II where R_i was shown to be self-adjoint and Hilbert-Schmidt. Moreover, since [h_i^T(t,s)] satisfies Eq. (6.6), it is easily shown that H_i^T satisfies

R_o H_i^T R_i = R_i − R_o

so H_i^T represents a realization of the desired operator H_i.

At this point, it is instructive to note that there actually are covariance functions [r^i(t,s)] and [r°(t,s)] for which we can find symmetric square integrable kernels satisfying Eq. (6.6). For example, suppose the orthonormal sequences of eigenfunctions for the operators R_o and R_i are identical and given by {[φ_j]} while the corresponding eigenvalues are {λ_j^o} and {λ_j^i}, respectively. Then, it is easily shown that

Σ_{j=1}^∞ [(λ_j^i − λ_j^o)/(λ_j^i λ_j^o)] [φ_j(t)][φ_j(s)]ᵀ

is a solution of Eq. (6.6), is symmetric, and, if

Σ_{j=1}^∞ [(λ_j^i − λ_j^o)/(λ_j^i λ_j^o)]² < ∞,

is square integrable on T×T.

Returning now to Eq. (6.6), the fact that a kernel with the desired properties is known to exist does not provide us with a method for obtaining this kernel. On the other hand, Eq. (6.7), which has a similar form, can be solved easily for [h_i(t−s)] by the use of Fourier transforms

when the signal and noise covariance functions are known to be stationary. The assumption which states that [h_i^T(t,s)] can be approximated by [h_i(t−s)] for t, s ∈ T when T_f becomes large is then intended to replace the necessity for solving a difficult integral equation, Eq. (6.6), with the necessity for solving the much simpler integral equation, Eq. (6.7). To justify this assumption on a basis other than its expedience, though, let us refer to Eq. (7.124) where it can be seen that the usefulness of the operator H_i stems from the fact that it allows us to obtain the statistic

ℓ_i = (H_i^T[y],[y])

since H_i^T is a realization of the desired H_i. Thus, if the integral operator H_i^∞ whose kernel is [h_i(t−s)] for t, s ∈ T can be shown to provide "substantially" the same statistic when substituted for H_i^T, it would be unnecessary to make this assumption and H_i^∞ could be used directly when making an angle of arrival estimate. Unfortunately, attempts to produce conditions on the observation interval [0,T_f] and the covariance functions of the signal and noise processes which guarantee this property have not met with any degree of success. Intuitively, however, this assumption is not unreasonable when we note that H_i^T[y] can be obtained as the output of a linear filter whose impulse response is [h_i^T(t,s)] and if we remember that the optimal Wiener impulse response for finite data,* whose defining integral equation is very similar to Eq. (6.6) if white noise is present, approaches the solution for the infinite interval with only minor variations near the endpoints of the interval.**

Turning now to the problem of solving Eq. (6.7), the kernel

*See Ref. 43, p. 240.
**See Ref. 61, p. 1087.

[h_i(t−s)], due to Eqs. (6.3), (7.75) and Assumption A6.4.1, satisfies

∫_{−∞}^∞ ∫_{−∞}^∞ [r_N°(u−t)][h_i(t−s)]{[r_S^{(i)}(s−v)] + [r_N°(s−v)]} dt ds = [r_S^{(i)}(u−v)]

or, equivalently,

Σ_{j,k∈K} ∫_{−∞}^∞ ∫_{−∞}^∞ [r_N°(u−t)]_mj [h_i(t−s)]_jk {[r_S^{(i)}(s−v)]_kn + [r_N°(s−v)]_kn} dt ds = [r_S^{(i)}(u−v)]_mn

for all m, n ∈ K. Furthermore, if we let u′ = u + τ_m(α_i), v′ = v + τ_n(α_i), t′ = t + τ_j(α_i), and s′ = s + τ_k(α_i) and assume

[h_i(t−s)]_jk = h(t − s + τ_j(α_i) − τ_k(α_i)),   (8.5)

the above equation reduces to

∫_{−∞}^∞ ∫_{−∞}^∞ r_N(u′−t′) h(t′−s′) {K r_S(s′−v′) + r_N(s′−v′)} dt′ ds′ = r_S(u′−v′)   (8.6)

when we apply Eq. (6.2) and Assumption A6.4.1. (The scalar K denotes the total number of antenna elements in the array while τ_j(α_i) − τ_k(α_i), as defined in Assumption A6.3.1, is the propagation time of a signal between the jth and kth array elements if α_i is its true direction of arrival.) Thus, by anticipating the form of the kernel [h_i(t−s)], the requirement for solving K simultaneous integral equations can be replaced by the need for solving only one integral equation. By introducing the proper delays as indicated in Eq. (8.5), the kernel [h_i(t−s)] can be constructed from the solution h(t′−s′) of Eq. (8.6). Moreover, taking the Fourier transform of both sides of Eq. (8.6), the Fourier transform of h(τ) is found to be

F_h(ω) = F_S(ω) / { F_N(ω)[F_N(ω) + K F_S(ω)] }   (8.7)

where F_h(ω), F_S(ω), and F_N(ω) are the Fourier transforms of h(τ), r_S(τ), and r_N(τ), respectively.

To obtain a physical interpretation of the operator H_i^∞, let us observe Eqs. (8.5) and (8.7) where it can be seen to perform two distinct operations. First of all, H_i^∞ delays the individual components of [y]. These delays, as illustrated in Fig. 8.1, are such that the various components of [y] are in phase if α_i is the true direction of arrival. (Remember, T_f is large compared with the propagation time of signals across the array, so the fact that the delayed signals are not "quite" defined over the same interval can be neglected.) These delayed signals are then added and filtered by a nonrealizable filter whose impulse response is h(τ). Looking at Eq. (8.7), it can be seen that this filter has an impulse response which is very similar to the impulse response it would have if its output were required to be a best (minimum mean square) estimate of the signal received by each array element when no delays are present. (The Fourier transform of the impulse response of such a Wiener filter would have the form

F_S(ω) / [F_N(ω) + K F_S(ω)].)

Finally, the filtered signal is again delayed to produce the vector [H_i^∞[y]] whose kth component is in phase with the signal received by the kth array element when α_i is the true direction of arrival. As a result, we see that (H_i^∞[y],[y]) is the sum over j, k ∈ K of the correlations of the signals received by the jth and kth antennas, the jth antenna sig-

Fig. 8.1. Operations performed by H_i^∞.

nal being filtered and delayed so that it is in phase with the kth antenna signal when α_i is the true direction of arrival. Thus, when we calculate the sequence {(H_i^∞[y],[y])}, which approximates the sequence {ℓ_i}, we are actually "scanning" the array and obtaining a measure of the signal energy received by the array from each possible direction of arrival α_i.

Let us now see how the band-limited condition of Assumption A6.4.1 affects Eq. (8.7). As shown by Kelley, Reed, and Root, when the Fourier transform of the signal covariance function has the form illustrated in Fig. 6.3, it can be written as

r_S(τ) = Re( r_{S_o}(τ) e^{jω_oτ} )   (8.8)

where r_{S_o}(τ) is the inverse Fourier transform of F_S^+(ω + ω_o), F_S^+(ω) being defined by

F_S^+(ω) = F_S(ω), ω ≥ 0; = 0, ω < 0,

and where Re( ) denotes the "real part" of the indicated quantity. It can be seen that F_S^+(ω + ω_o) is a "low-pass" function, so r_{S_o}(τ), when compared with cos ω_oτ, is a slowly varying function of τ. Kelley, Reed, and Root have also shown

r_{S_o}(τ) = r̄_{S_o}(−τ)   (8.9)

where the bar indicates the complex conjugate. Then, since*

*See Ref. 43, p. 365.

(1/2π) ∫_{−∞}^∞ e^{j(ω_o−ω)τ} dτ = δ(ω − ω_o),

with δ representing the Dirac delta function, we can write

F_S(ω) = 𝔉{ Re( r_{S_o}(τ) e^{jω_oτ} ) }
= 𝔉{ (1/2) r_{S_o}(τ) e^{jω_oτ} + (1/2) r̄_{S_o}(τ) e^{−jω_oτ} }
= (1/2) { F_{S_o}(ω) ∗ δ(ω − ω_o) + F̄_{S_o}(−ω) ∗ δ(ω + ω_o) }   (8.10)

by applying Eq. (8.9) and letting F_{S_o}(ω) = 𝔉{r_{S_o}(τ)}. (The symbol 𝔉{ } denotes the Fourier transform of the indicated quantity while "∗" denotes convolution.) Furthermore, substituting (8.10) and a similar expression for F_N(ω) into (8.7), we find

F_h(ω) = F_{h_1}(ω) ∗ δ(ω − ω_o) + F_{h_2}(ω) ∗ δ(ω + ω_o)   (8.11)

where

F_{h_1}(ω) = F_{S_o}(ω) / { F_{N_o}(ω)[F_{N_o}(ω) + K F_{S_o}(ω)] },   (8.12)

F_{h_2}(ω) = F̄_{S_o}(−ω) / { F̄_{N_o}(−ω)[F̄_{N_o}(−ω) + K F̄_{S_o}(−ω)] },   (8.13)

and F_{N_o}(ω) = F_N^+(ω + ω_o), F_N^+(ω) being defined by

F_N^+(ω) = F_N(ω), ω ≥ 0; = 0, ω < 0.
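As a purely illustrative numerical sketch of Eq. (8.12), the low-pass filter transform can be evaluated on a frequency grid; the Lorentzian signal spectrum and flat noise spectrum below are assumed stand-ins for F_{S_o} and F_{N_o}, and the parameter values are not from the text.

    import numpy as np

    # Sketch: evaluate F_h1 of Eq. (8.12) on a grid, for an assumed Lorentzian
    # low-pass signal spectrum and a flat low-pass noise spectrum.
    K = 7                                    # number of array elements (assumed)
    E_S, E_N, b_o = 1.0, 0.1, 2 * np.pi * 50.0

    w = np.linspace(-2.0e4, 2.0e4, 4001)     # rad/s grid about omega = 0
    F_So = E_S / (w**2 + b_o**2)             # low-pass signal spectrum
    F_No = np.full_like(w, E_N)              # flat low-pass noise spectrum

    # Eq. (8.12): F_h1 = F_So / { F_No [ F_No + K F_So ] }
    F_h1 = F_So / (F_No * (F_No + K * F_So))

With the flat noise spectrum this expression reduces algebraically to (E_S/E_N²)/(ω² + b_o² + K E_S/E_N), the closed form that reappears in the example of Section 8.2.1.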

Finally, since F_{h_2}(ω) = F̄_{h_1}(−ω), the inverse Fourier transform of F_h(ω), h(τ), is given by

h(τ) = Re( h_1(τ) e^{jω_oτ} ) = Re(h_1(τ)) cos ω_oτ − Im(h_1(τ)) sin ω_oτ   (8.14)

where h_1(τ) = 𝔉^{−1}{F_{h_1}(ω)} is the inverse Fourier transform of F_{h_1}(ω) and Im( ) denotes the "imaginary part" of the indicated quantity. Thus, according to Eq. (8.14), the kernel of the desired integral operator can be separated into high frequency and low frequency terms, h_1(τ) being slowly varying when compared with cos ω_oτ and sin ω_oτ. Note that h_1(τ) is real if the signal and noise power spectrums F_{S_o}(ω) and F_{N_o}(ω) are symmetric about the point ω = 0. This will be true when F_S(ω) and F_N(ω) are symmetric about ω = ω_o and fall off sufficiently fast so that F_S(0) ≈ 0 and F_N(0) ≈ 0. As a result, we can interpret Im(h_1(τ)) as a term which compensates for any lack of symmetry that might exist in these spectrums.

Let us now investigate what happens to the sequence {(H_i^∞[y],[y])} when H_i^∞ is the integral operator whose kernel is obtained by substituting (8.14) into (8.5). Since h_1(τ) is slowly varying compared with cos ω_oτ and the propagation time of signals across the array is small compared with the inverse of the bandwidth (Assumption A6.4.1), Eq. (8.5) becomes

[h_i(τ)]_jk ≈ h_{1r}(τ) cos(ω_oτ + ω_o(τ_j(α_i) − τ_k(α_i))) − h_{1i}(τ) sin(ω_oτ + ω_o(τ_j(α_i) − τ_k(α_i)))   (8.15)

where "≈" denotes an approximation, h_{1r}(τ) = Re(h_1(τ)), and h_{1i}(τ) = Im(h_1(τ)). Then, applying the identities*

cos(ψ_1 + ψ_2) = cos ψ_1 cos ψ_2 − sin ψ_1 sin ψ_2,   (8.16)
sin(ψ_1 + ψ_2) = sin ψ_1 cos ψ_2 + cos ψ_1 sin ψ_2,   (8.17)

we find

ℓ_i ≈ Tr{[A_1(α_i)]ᵀ[Y_1]} − Tr{[A_2(α_i)]ᵀ[Y_2]}   (8.18)

where Tr{ } denotes the trace of the indicated quantity while [A_1(α_i)], [A_2(α_i)], [Y_1], and [Y_2] are K×K matrices whose jkth elements are given by

[A_1(α_i)]_jk = cos ω_o(τ_j(α_i) − τ_k(α_i)),   (8.19)
[A_2(α_i)]_jk = sin ω_o(τ_j(α_i) − τ_k(α_i)),   (8.20)
[Y_1]_jk = ∫_T ∫_T [y(t)]_j [y(s)]_k { h_{1r}(t−s) cos ω_o(t−s) − h_{1i}(t−s) sin ω_o(t−s) } dt ds,   (8.21)

and

[Y_2]_jk = ∫_T ∫_T [y(t)]_j [y(s)]_k { h_{1r}(t−s) sin ω_o(t−s) + h_{1i}(t−s) cos ω_o(t−s) } dt ds,   (8.22)

respectively.

Observe, now, the fact that the matrices [Y_1] and [Y_2] are independent

*See Ref. 62, p. 344.

of the parameter α_i. The utility of this fact becomes apparent when we realize that, if [Y_1] and [Y_2] are known, the calculation of our approximation to the sequence of coefficients {ℓ_i} reduces to a simple matrix operation on known matrices which is easily handled by a digital computer. To approximate ℓ_i, for example, we need only form the matrices [A_1(α_i)] and [A_2(α_i)], which are known functions of the parameter α_i, and substitute into Eq. (8.18). Therefore, if we know [Y_1] and [Y_2], it is unnecessary, as in the previous chapter, to record the received time signals or to have a large number of parallel systems, each calculating ℓ_i for a particular α_i. The matrices [Y_1] and [Y_2], each with K² elements, contain all the information necessary to construct an approximation of the sequence {ℓ_i}. This means we are able, at any later time, to "scan" the array and obtain a measure of the signal energy received by the array from any particular direction of arrival during the time interval [0,T_f]. The advantage of this method of operation becomes apparent when we compare it with the normal mode of operation of existing direction finding systems where scanning takes place sequentially and, therefore, only a single direction is under surveillance during a particular time interval. (Similar results have been observed by Ksienski63 for the "sure signal" case where the signal is representable as a sinusoid of known frequency but whose amplitude and phase, while being constant, are unknown in magnitude.)

Let us now investigate the matrices [Y_1] and [Y_2] further and determine just how they may be evaluated. In Eq. (8.21), let us split the integral over the two-dimensional subspace (t,s) ∈ T×T into the sum of two integrals, one over each of the regions I and II indicated in Fig. 8.2. We can then write (a similar expression can be found for [Y_2]_jk)

[Y_1]_jk = ∫_0^{T_f} ∫_0^t [y(t)]_j [y(s)]_k { h_{1r}(t−s) cos ω_o(t−s) − h_{1i}(t−s) sin ω_o(t−s) } ds dt
+ ∫_0^{T_f} ∫_0^s [y(t)]_j [y(s)]_k { h_{1r}(s−t) cos ω_o(s−t) − h_{1i}(s−t) sin ω_o(s−t) } dt ds   (8.23)

since h(τ) = h(−τ), which can be realized by the use of the hardware illustrated in Fig. 8.3. In this figure, the jth and kth received signals are fed into filters with the realizable impulse response

ĥ_1(τ) = h_{1r}(τ) cos ω_oτ − h_{1i}(τ) sin ω_oτ, τ ≥ 0; = 0, τ < 0,   (8.24)

and then correlated with the kth and jth received signals, respectively. The outputs of these correlators are then added to obtain the matrix element [Y_1]_jk. Thus, by the use of K² devices of the type illustrated in Fig. 8.3, we are able to produce the K² elements of the matrix [Y_1]. Moreover, a similar procedure shows the jkth element of the matrix [Y_2] can be realized by the use of the device illustrated in Fig. 8.4 where the block containing ĥ_2 is a realizable linear filter whose impulse response is given by

ĥ_2(τ) = h_{1r}(τ) sin ω_oτ + h_{1i}(τ) cos ω_oτ, τ ≥ 0; = 0, τ < 0.   (8.25)

Note that the outputs of the correlators are subtracted in this case.

The elements of the matrices [Y_1] and [Y_2] can also be produced by using mixers and local oscillators rather than the band-pass filters

Fig. 8.2. Regions of integration for Eq. (8.23).

Fig. 8.3. Device for computing [Y_1]_jk.

Fig. 8.4. Device for computing [Y_2]_jk.

suggested in the previous paragraph. To show this, substitute (8.16) and (8.17) into (8.21) and (8.22) to obtain

[Y_1]_jk = ∫_0^{T_f} ∫_0^t { [y(t)]_j^c [y(s)]_k^c + [y(t)]_j^s [y(s)]_k^s } ĥ_{1r}(t−s) ds dt
+ ∫_0^{T_f} ∫_0^t { [y(t)]_j^c [y(s)]_k^s − [y(t)]_j^s [y(s)]_k^c } ĥ_{1i}(t−s) ds dt
+ ∫_0^{T_f} ∫_0^s { [y(t)]_j^c [y(s)]_k^c + [y(t)]_j^s [y(s)]_k^s } ĥ_{1r}(s−t) dt ds
+ ∫_0^{T_f} ∫_0^s { [y(t)]_j^s [y(s)]_k^c − [y(t)]_j^c [y(s)]_k^s } ĥ_{1i}(s−t) dt ds   (8.26)

where

[y(t)]_k^c = [y(t)]_k cos ω_ot   (8.27)

and

[y(t)]_k^s = [y(t)]_k sin ω_ot.   (8.28)

In Fig. 8.5, the hardware and circuitry necessary to implement Eq. (8.26) is illustrated. The blocks containing either ĥ_{1r} or ĥ_{1i} represent realizable low-pass filters with impulse responses of either

ĥ_{1r}(τ) = h_{1r}(τ), τ ≥ 0; = 0 otherwise,   (8.29)

or

ĥ_{1i}(τ) = h_{1i}(τ), τ ≥ 0; = 0 otherwise.   (8.30)
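For illustration, a discrete-time stand-in for the filter-and-correlate hardware of Figs. 8.3-8.6 can be sketched as follows; the direct double sum below evaluates Eqs. (8.21)-(8.22) on a sample grid and is an assumed numerical surrogate, not the analog mechanization itself.

    import numpy as np

    # Sketch: discrete approximation of Eqs. (8.21)-(8.22).  y has shape
    # (K, n), t is the length-n time grid with spacing dt, and h1r, h1i are
    # callables evaluating h_1r(tau), h_1i(tau) elementwise.
    def Y_matrices(y, t, w0, h1r, h1i, dt):
        tau = t[:, None] - t[None, :]                       # (t - s) lag matrix
        G1 = h1r(tau) * np.cos(w0 * tau) - h1i(tau) * np.sin(w0 * tau)
        G2 = h1r(tau) * np.sin(w0 * tau) + h1i(tau) * np.cos(w0 * tau)
        # [Y1]_jk = sum_t sum_s y_j(t) G1(t,s) y_k(s) dt^2, and similarly [Y2]
        Y1 = np.einsum('jt,ts,ks->jk', y, G1, y) * dt * dt
        Y2 = np.einsum('jt,ts,ks->jk', y, G2, y) * dt * dt
        return Y1, Y2

The K² filter-correlator devices described above compute these same 2K² numbers in analog form, one matrix element per device.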

Fig. 8.5. Alternate device for computing [Y_1]_jk.

Even though Fig. 8.5 is more complicated than Fig. 8.3, from a practical standpoint this latter approach appears to have an advantage over the one indicated in Fig. 8.3 (at least for microwave signals) in that mixers and local oscillators are usually easier to build and regulate than are band-pass filters.

A similar technique can be used to realize the elements of the matrix [Y_2]. Proceeding as above, we find

[Y_2]_jk = ∫_0^{T_f} ∫_0^t { [y(t)]_j^s [y(s)]_k^c − [y(t)]_j^c [y(s)]_k^s } ĥ_{1r}(t−s) ds dt
+ ∫_0^{T_f} ∫_0^t { [y(t)]_j^c [y(s)]_k^c + [y(t)]_j^s [y(s)]_k^s } ĥ_{1i}(t−s) ds dt
+ ∫_0^{T_f} ∫_0^s { [y(t)]_j^c [y(s)]_k^s − [y(t)]_j^s [y(s)]_k^c } ĥ_{1r}(s−t) dt ds
+ ∫_0^{T_f} ∫_0^s { [y(t)]_j^c [y(s)]_k^c + [y(t)]_j^s [y(s)]_k^s } ĥ_{1i}(s−t) dt ds   (8.31)

which can be implemented as illustrated in Fig. 8.6.

Let us now turn our attention to the problem of calculating the sequence {D_i} which is necessary to the decision scheme presented in Section 7.4. Although techniques are available for estimating each D_i when the sequence of eigenvalues {ρ_j} corresponding to each particular α_i is known, this is an unsatisfactory approach since, normally, we do not know {ρ_j}. The ideal situation would be one in which D(α_i), D(α_i) = D_i for all i, is some simple function of the kernels of the operators R_i and R_o since these functions are known. Although a search for such a function has not met with success, it is possible, for the special cases being considered in this chapter, to determine properties of this function which provide

Fig. 8.6. Alternate device for computing [Y_2]_jk.

162 "essentially" the same information as would be available if D(ai) was known exactly. In particular, we can show it is reasonable to assume D(mo ) is independent of a. so that only the detection problem requires 1 1 knowledge of the exact form of D(ai). But, since the a priori probability of noise alone is chosen somewhat arbitrarily anyway, it can be seen that, even in the detection problem, little is gained by knowing the exact numerical value of D(ai). Thus, an experimental approach which sets the level D(cai) so as to obtain desired "false alarm" and "detection" probabilities appear most tractable for a real direction finding situation. To demonstrate the relationship that exists between D(ci) and a X, return to Theorem 7.28 where Di = D(ai) is defined. According to this theorem, D = (1-pj) 1 j=l where {pj. is the sequence of eigenvalues corresponding to the operator I-X*X. But, since X-l is the unique bounded extension of R1/2 R-1/2 by o 1 Lemma 7.6 and Theorem 7.16, it follows that -2 co D2 = I p' (8.32) i j=l j where pt 1 1-po and {pJ' is the sequence of eigenvalues corresponding to (X*X)-1 which -1/2 -1/2 is unique bounded extension of R / R R/. The infinite product of the 1 o 1 eigenvalues of an operator as specified in Eq. (8.32) is sometimes referred* to as its "determinant" so that *See Ref. 49, p. 1029.

D_i^{-2} = det{(X*X)^{-1}}

where det{ } denotes the determinant of the indicated operator. Moreover, since

R_o^{1/2}R_i^{-1}R_o^{1/2} = R_o^{-1/2}(R_oR_i^{-1})R_o^{1/2} = R_o^{-1/2}(I − R_oH_i)R_o^{1/2}

when we apply Eq. (7.112), it can be seen that the eigenvalues of (X*X)^{-1}, R_oR_i^{-1}, and I − R_oH_i are identical so

det{(X*X)^{-1}} = det{I − R_oH_i}.

Let us now investigate the effect of substituting H_i^∞ for H_i, which has been assumed to approximate H_i throughout this chapter. Suppose the operator Sf_i^+ is defined on L_2[T] so that, for each k ∈ K and [f] ∈ L_2[T],

[Sf_i^+([f])]_k = [f(t − τ_k(α_i))]_k, t ∈ [τ_k(α_i), T_f];
= [f(t + T_f − τ_k(α_i))]_k, t ∈ [0, τ_k(α_i)),

where the subscript k denotes the kth component of the indicated element of L_2[T]. It can be seen that Sf_i^+ maps L_2[T] into L_2[T] by shifting the individual components of the L_2[T] elements forward in time with a slight distortion near the endpoints to ensure an L_2[T] range. In a similar manner, let us define the operator Sf_i^− by

[Sf_i^−([f])]_k = [f(t + τ_k(α_i))]_k, t ∈ [0, T_f − τ_k(α_i)];
= [f(t − T_f + τ_k(α_i))]_k, t ∈ (T_f − τ_k(α_i), T_f],

for each k ∈ K and [f] ∈ L_2[T]. Referring now to the discussion of Fig. 8.1, we see, since T_f >> max_{k∈K} τ_k(α_i), that the operator H_i^∞ can be "approximated" by

Sf_i^+ H Sf_i^−

where H is the integral operator on L_2[T] into L_2[T] for which the jkth element of its matrix kernel is given by h(t−s). This approximation for H_i^∞ appears valid and we shall not pursue further the manner in which it converges to H_i^∞ since H_i^∞ is, itself, an approximation to H_i. If we now form I − R_o Sf_i^+ H Sf_i^−, we find

I − R_o Sf_i^+ H Sf_i^− = Sf_i^+ (I − R_o H) Sf_i^−

since R_o has a diagonal kernel. As a result, it can be seen that I − R_o Sf_i^+ H Sf_i^− and I − R_o H have identical eigenvalues and eigenfunctions since Sf_i^+ Sf_i^− = I. But this implies

det{I − R_o Sf_i^+ H Sf_i^−} = det{I − R_o H}

which is easily seen to be independent of α_i. Finally, since Sf_i^+ H Sf_i^− "approximates" H_i, we see that it is reasonable to assume I − R_o H_i and, hence, D_i is also independent of α_i.

An alternate and possibly more intuitive argument for assuming D(α_i) is independent of α_i can be given if we rewrite Eq. (7.124) in the form

I_i = Pr{α_i} exp{ (1/2)[(H_i[y],[y]) − B(α_i)] }   (8.33)

where B(α_i) = 2 log D_i. In Eq. (8.33), it can be seen that B(α_i) serves to introduce a bias of the statistic (H_i[y],[y]). Moreover, the approximation to this statistic developed above, as shown in the next chapter, is the sum of two statistics which depend on α and α_i, the true and assumed

directions of arrival, respectively. One of these statistics achieves its maximum value when α_i = α while the other is a noise term which tends to distort the sum, causing the maximum of our approximation of the sequence {(H_i[y],[y])} to be achieved when α_i is different from α. Suppose, rather than requiring B(α_i) to equal 2 log D_i, we substitute for B(α_i) the function of α_i which equals the expected value of the distorting noise term so that the term in the exponent of Eq. (8.33) is unbiased by the noise, i.e., the expected value of this term is maximum at the true direction of arrival. This approach, though possibly suboptimal, has merit since we would expect, if our original estimation technique was any good, that approximately the same answers should be obtained using either bias. The advantage of using the latter bias is that it can be obtained easily. Moreover, as we shall find in the next chapter, the expected value of this noise term is independent of α_i, which tends to reinforce the statements made in the previous paragraph concerning the independence of D(α_i) and α_i.

To summarize, the discussion of this chapter has produced a technique for approximating the sequence {ℓ_i} which is the crucial element of the optimal decision scheme presented in Section 7.4. This technique requires us first to obtain the K×K matrices [Y_1] and [Y_2] whose elements can be realized by the use of the devices illustrated in Figs. 8.3 and 8.4 or Figs. 8.5 and 8.6. Then, substituting into Eq. (8.18), the sequence {ℓ_i} can be easily approximated for any desired set of possible angles of arrival {α_i}. Finally, it is a simple matter to substitute this approximation to {ℓ_i} into Eq. (7.124), letting D(α_i) be a preassigned constant level to be determined as discussed above, and obtain the sequence {I_i}.
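A minimal sketch of the "scanning" step of Eq. (8.18), assuming [Y_1] and [Y_2] have already been produced by the devices above, might read as follows; the delay function tau and the trial angles are inputs supplied by the user of the sketch.

    import numpy as np

    # Sketch: scan the array via Eq. (8.18).  tau(alpha) returns the length-K
    # vector of element delays tau_j(alpha); Y1, Y2 are the K x K matrices.
    def scan(Y1, Y2, w0, tau, alphas):
        out = []
        for a in alphas:
            d = w0 * tau(a)                          # w0 * tau_j(alpha)
            A1 = np.cos(d[:, None] - d[None, :])     # Eq. (8.19)
            A2 = np.sin(d[:, None] - d[None, :])     # Eq. (8.20)
            out.append(np.sum(A1 * Y1) - np.sum(A2 * Y2))   # Eq. (8.18)
        return np.array(out)

    # With a bias level B chosen as discussed above, the estimate is
    # alphas[np.argmax(scan(Y1, Y2, w0, tau, alphas) - B)].

Note that Tr{AᵀY} = Σ_{j,k} A_jk Y_jk, which is the elementwise sum used above; the scan may be repeated at any later time for any desired set of angles without reprocessing the received signals.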

8.2.1 Example

It is instructive at this point to consider an example which applies the above techniques to the design of a signal processor. Suppose K elements of a receiving array are equally spaced along the y-axis as illustrated in Fig. 8.7. Suppose, also, there is a Gaussian signal source located in the x-y plane at an angle α relative to the y-axis and possessing a power spectrum

F_S(ω) = E_S / [(ω − ω_o)² + b_o²]   (8.34)

where E_S is a constant which depends on the signal strength and b_o << ω_o. (This last inequality will allow us to apply the "band-limited" condition discussed in Section 6.4.) A wavefront due to this signal is shown in Fig. 8.7 together with a unit vector u which is drawn orthogonal to it. Finally, suppose the observation interval [0,T_f] is long compared with 1/b_o (b_o being a measure of the bandwidth of the signal) while the noise corrupting each received signal is Gaussian, independent, and identically distributed with a power spectrum which is much wider than the power spectrum of the signal. The noise spectrum can then be approximated as shown in Fig. 8.8 where E_N is the height of the spectrum and a >> b_o.

It can now be seen that this model satisfies all the hypotheses of the Special Case Solution presented in Section 8.2 except possibly the conditions relating to the existence of a symmetric square integrable solution to Eq. (6.6) which can be approximated by the solution to Eq. (6.7). However, since these conditions are very difficult to verify, let us design a signal processor according to the techniques presented in the previous section even though it may be suboptimal. We can then evaluate its performance later by calculating the expected value of the square of the

Fig. 8.7. Geometry of the receiving array relative to the incident wave. (X denotes antenna locations.)

Fig. 8.8. Noise power spectrum.

resulting error.

Observing the forms of F_S(ω) and F_N(ω), it can be seen that

F_{S_o}(ω) = E_S / (ω² + b_o²)   (8.35)

while F_{N_o}(ω) has the form illustrated in Fig. 8.9. Then, substituting these relations into (8.12), we obtain

F_{h_1}(ω) = (E_S/E_N²) / (ω² + b_o² + K E_S/E_N).   (8.36)

If we now take the inverse Fourier transform of (8.36), we find

h_{1i}(τ) = 0   (8.37)

and

h_{1r}(τ) = h_1(τ) = [E_S / (2E_N² √(b_o² + K E_S/E_N))] exp{−|τ| √(b_o² + K E_S/E_N)}.   (8.38)

Finally, substituting (8.37) and (8.38) into (8.29) and (8.30), we obtain

ĥ_{1i}(τ) = 0   (8.39)

and

ĥ_{1r}(τ) = [E_S / (2E_N² √(b_o² + K E_S/E_N))] exp{−τ √(b_o² + K E_S/E_N)}, τ ≥ 0; = 0 otherwise,   (8.40)

which are the impulse responses of the low-pass filters required in Figs. 8.5 and 8.6 to provide the elements of the matrices [Y_1] and [Y_2].
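The causal impulse response of Eq. (8.40) is simple enough to sketch directly; the parameter values below are assumed for illustration and are not part of the example.

    import numpy as np

    # Sketch: the realizable low-pass impulse response of Eq. (8.40).
    E_S, E_N, b_o, K = 1.0, 0.1, 2 * np.pi * 50.0, 7
    c_ = np.sqrt(b_o**2 + K * E_S / E_N)      # sqrt(b_o^2 + K E_S/E_N)

    def h1r_hat(tau):
        tau = np.asarray(tau, dtype=float)
        amp = E_S / (2.0 * E_N**2 * c_)
        # exponential decay for tau >= 0, identically zero for tau < 0
        return np.where(tau >= 0.0,
                        amp * np.exp(-c_ * np.clip(tau, 0.0, None)),
                        0.0)

This is the single low-pass filter (together with ĥ_{1i} = 0) that every branch of Figs. 8.5 and 8.6 uses in this example.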

Fig. 8.9. Low-frequency noise power spectrum.

All that remains now, assuming we have the devices for realizing [Y_1] and [Y_2], is to provide a method for approximating the sequence {ℓ_i} by the use of Eq. (8.18) where, for this case,

[A_1(α_i)]_jk = cos[(k − j)(ω_o D/c) cos α_i]   (8.41)

and

[A_2(α_i)]_jk = sin[(k − j)(ω_o D/c) cos α_i]   (8.42)

since

τ_j(α_i) = −j(D/c) cos α_i.   (8.43)

(In Eqs. (8.41) and (8.42) we have indexed the antennas from −(K−1)/2 to (K−1)/2 as shown in Fig. 8.7 with the element at the origin being the 0th antenna.) Then, if all possible angles of arrival have equal a priori probability, our estimate of the true angle of arrival becomes the α_i corresponding to the ℓ_i possessing the largest approximation. (The modification of the above statement to handle the case where not all possible angles of arrival are equally probable is obvious.) But, assuming a digital computer is available, it can be seen that no additional hardware is necessary to produce this estimate since all the operations mentioned in this paragraph can be performed easily by such a computer. (If the detection problem is also of interest, the experimental technique mentioned above can be applied to obtain a level to be substituted for

B(α_i).)

An evaluation of the performance of the above estimation technique will be discussed in Section 9.3.1.

8.3 INDEPENDENT, IDENTICALLY DISTRIBUTED (EXCEPT FOR AN AMPLITUDE FACTOR) NOISE

Let us now consider the problem in which Assumptions A6.4.1 and A6.4.2.1 together with A6.3.1-A6.3.3 are valid descriptions of the environment of the direction finding system. As noted in Section 6.4.2, this problem is very similar to the problem discussed in Section 8.2. In this case, however, we shall allow the noise levels to vary between antenna elements. Following the procedure of the previous section, if we let

[h_i′(t−s)]_jk = h′(t − s + τ_j(α_i) − τ_k(α_i)) / (a_j a_k),   (8.44)

we find the Fourier transform of h′(τ), F_{h′}(ω), must satisfy

F_{h′}(ω) = F_S(ω) / { F_N(ω)[F_N(ω) + K′ F_S(ω)] }   (8.45)

where K′ = Σ_{j∈K} (1/a_j). Observe, the only difference between Eq. (8.45) and Eq. (8.7) is the replacement of the scalar K by the scalar K′. If we then solve for h_{1r}′(τ) and h_{1i}′(τ) as in Section 8.2, we also find

ℓ_i ≈ Tr{[A_1′(α_i)]ᵀ[Y_1′]} − Tr{[A_2′(α_i)]ᵀ[Y_2′]}

where [Y_1′] and [Y_2′] are defined as [Y_1] and [Y_2] in Eq. (8.21) and Eq. (8.22) with h_{1r}′(τ) and h_{1i}′(τ) replacing h_{1r}(τ) and h_{1i}(τ), respectively,

[A_1′(α_i)]_jk = [A_1(α_i)]_jk / (a_j a_k),   (8.46)

and

[A_2′(α_i)]_jk = [A_2(α_i)]_jk / (a_j a_k).   (8.47)

It can now be seen that the entire solution obtained in Section 8.2 carries over to this problem with only minor variations. As a result, we shall not pursue this problem further.
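Since the only changes from Section 8.2 are the scalar K′ and the 1/(a_j a_k) weighting of Eqs. (8.46)-(8.47), the modification can be sketched in a few lines; the interpretation of a_j as the per-element noise level factor of Assumption A6.4.2.1 is assumed here.

    import numpy as np

    # Sketch: the Section 8.3 variant.  a[j] is the noise-level factor of the
    # jth element (assumed per A6.4.2.1); everything else is as in Section 8.2.
    a = np.array([1.0, 1.2, 0.9, 1.0, 1.1])      # illustrative factors
    K_prime = np.sum(1.0 / a)                    # K' replaces K in Eq. (8.45)

    def weighted_A(A1, A2, a):
        W = 1.0 / np.outer(a, a)                 # 1/(a_j a_k)
        return A1 * W, A2 * W                    # Eqs. (8.46)-(8.47)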

CHAPTER 9

ERROR ANALYSIS

9.1 INTRODUCTION

In this chapter we shall consider the variance of the error resulting from the use of the estimation techniques discussed in the two previous chapters. Expressions relating this variance to the number of elements in the antenna array and the "signal-to-noise" ratio will be presented for the example of Section 8.2.1 in which the array elements are collinear and uniformly spaced.

9.2 ERROR ANALYSIS (General Case)

In Section 7.4, it was found that the optimal estimate of the angle of arrival corresponded to the α_i which maximized the sequence {I_i} where

I_i = {Pr{α_i}} exp{ (1/2)[(H_i[y],[y]) − 2 log D_i] }   (9.1)

with H_i being dependent on α_i and [y] denoting the observation. Let us now consider the error that results from the use of this technique when the a priori probability distribution of the angles of arrival is uniform, i.e., Pr{α_i} = Pr{α_j} for all i and j. This choice of distribution has been made as a matter of convenience and it is possible to extend our discussion, without difficulty, to cases where this distribution is other than uniform. Moreover, since the exponential function is a monotone function of its argument, the α_i that maximizes {I_i} also maximizes the sequence {ℓ(α_i) − B(α_i)} where ℓ(α_i) = (H_i[y],[y]) and B(α_i) = 2 log D_i. As a result, our optimality criterion reduces to that of choosing the α_i which maximizes the sequence {ℓ(α_i) − B(α_i)}.

Now, to obtain an estimate of the resulting error, let us assume there is a continuous range of possible angles of arrival. Furthermore, let us assume the noise is "small" so that the optimal estimate α̂ is approximately equal to the true value α. If we then expand ℓ(α_i) − B(α_i) in a Taylor series about α, retaining only the first three terms, we find

ℓ(α_i) − B(α_i) ≈ [(H_i[y],[y]) − B(α_i)]|_{α_i=α}
+ (∂/∂α_i)[(H_i[y],[y]) − B(α_i)]|_{α_i=α} (α_i − α)
+ (∂²/∂α_i²)[(H_i[y],[y]) − B(α_i)]|_{α_i=α} (α_i − α)²/2   (9.2)

so that the optimal estimate is given by

α̂ = α + ε   (9.3)

where

ε = − (∂/∂α_i)[(H_i[y],[y]) − B(α_i)]|_{α_i=α} / (∂²/∂α_i²)[(H_i[y],[y]) − B(α_i)]|_{α_i=α}.   (9.4)

Finally, if we replace the sample function [y] in Eq. (9.4) by the random process [Y_α], the various statistics of this approximation for the error can be considered. In particular, the second moment of this approximation can be used as a measure of the performance of the system for the limiting case where the received signal energy is "large" compared with the received noise energy.

The above procedure for obtaining a measure of the system performance, although intuitively meaningful, leaves several questions un-

answered, however. For instance, over what range of values of α_i and α does Eq. (9.4) represent a "good" estimate of the error? The more fundamental question concerning the validity of the power series expansion also exists since the functional relationship between ℓ(α_i) and α_i is not known at this point. Finally, the problem of calculating the second moment of this approximation is not trivial since it is the ratio of two random variables. In the next section, we shall examine the error resulting from the use of the estimation technique developed in Section 8.2. To obtain a measure of the system performance, the procedure discussed in the previous paragraphs will be modified slightly due to its above stated difficulties.

9.3 ERROR ANALYSIS (Special Case Solution Presented in Section 8.2)

Let us now restrict our attention to the Special Case Solution of Section 8.2. Recall, in this case, Assumptions A6.4.1 and A6.4.1.1 together with A6.3.1-A6.3.3 characterize the operating conditions of the direction finding system. Let us further assume the Fourier transforms of the signal and noise covariance functions are symmetric about the frequency ω_o, i.e., F_S(ω_o + ω) = F_S(ω_o − ω) and F_N(ω_o + ω) = F_N(ω_o − ω) for ω ≥ 0. Because of this last assumption, it follows from Eq. (8.12) that h_1(τ) is real and, hence, h_{1i}(τ) = 0. (Actually, allowing h_{1i}(τ) to be nonzero does not significantly alter the error calculation other than causing an increase in the necessary bookkeeping. It might also be added that this assumption is valid in most realistic problems.) But this implies the matrix elements [Y_1]_jk defined by Eq. (8.21) can be written as

[Y_1]_jk = [S_1]_jk + [N_1]_jk   (9.5)

where

[S_1]_jk = ∫_T ∫_T [s(t)]_j [s(s)]_k { h_{1r}(t−s) cos ω_o(t−s) } dt ds   (9.6)

and

[N_1]_jk = ∫_T ∫_T { [s(t)]_j [n(s)]_k + [n(t)]_j [s(s)]_k + [n(t)]_j [n(s)]_k } { h_{1r}(t−s) cos ω_o(t−s) } dt ds.   (9.7)

(A similar expression can be written for the matrix element [Y_2]_jk defined in Eq. (8.22).) In Eq. (9.5), [S_1]_jk represents the pure signal component of [Y_1]_jk while [N_1]_jk represents its distortion due to the presence of noise.

Suppose, now, the sample function of the signal process during the observation interval [0,T_f] is given by

s(t) = s_1(t) cos ω_ot + s_2(t) sin ω_ot   (9.8)

where s_1(t) ≈ s_1(t+τ) and s_2(t) ≈ s_2(t+τ) if τ ≤ max_{i,j} τ_j(α_i), i.e., the signals s_1(t) and s_2(t) do not vary significantly in the time it takes a wavefront to propagate across the array. This is a very realistic form for a sample function since, in most communication problems, it can be written as

s(t) = A(t) cos(ω_ot + φ(t)) = [A(t) cos φ(t)] cos ω_ot − [A(t) sin φ(t)] sin ω_ot

where the "widths" of

(1/T_f) ∫_0^{T_f} A(t) e^{−jωt} dt

and

(1/T_f) ∫_0^{T_f} φ(t) e^{−jωt} dt,

as functions of ω, indicate the maximum rate at which A(t) and φ(t) can vary with time. Moreover, if A(t) and φ(t) are samples from ergodic* processes (an assumption often made in such problems) and T_f is large compared to the reciprocal of these "widths," it is reasonable to approximate these "widths" by the "width" of F_S(ω) which implies (Assumption A6.4.1) that A(t) and φ(t), as well as A(t) cos φ(t) and A(t) sin φ(t), do not vary significantly in the time it takes a wavefront to propagate across the array.

Let us now investigate the error that results when the sample function of the signal process during the interval [0,T_f] is given by Eq. (9.8). If Eq. (9.8) represents the sample of the signal process during [0,T_f], the sample received by the jth array element during this interval is given by

[y(t)]_j = s_1(t) cos ω_o(t + τ_j(α)) + s_2(t) sin ω_o(t + τ_j(α)) + [n(t)]_j   (9.9)

where α represents the true direction of arrival of the signal. This follows from the above stated fact that s_1(t) ≈ s_1(t+τ) and s_2(t) ≈ s_2(t+τ) if τ ≤ max_{i,j} τ_j(α_i). Then, substituting Eq. (9.9) into Eq. (9.6), we find

[S_1]_jk ≈ (1/4) S [A_1(α)]_jk   (9.10)

*See Ref. 43, p. 67.

where

S = ∫_T ∫_T { s_1(t) s_1(s) + s_2(t) s_2(s) } h_{1r}(t−s) dt ds   (9.11)

and [A_1(α)]_jk is defined in Eq. (8.19). (To obtain this approximation, we have used Eqs. (8.16) and (8.17) and the fact that s_1(τ), s_2(τ) and h_{1r}(τ) are slowly varying when compared with cos ω_oτ.) From Eq. (9.11), it can be seen that S is dependent on the received signal energy while being independent of the true direction of arrival of the signal. Moreover, if we write

[Y_2]_jk = [S_2]_jk + [N_2]_jk   (9.12)

where [S_2]_jk and [N_2]_jk are defined as [S_1]_jk and [N_1]_jk with sin ω_o(t−s) replacing cos ω_o(t−s) in Eqs. (9.6) and (9.7), respectively, we find also

[S_2]_jk ≈ −(1/4) S [A_2(α)]_jk.   (9.13)

(The matrix element [A_2(α)]_jk is defined in Eq. (8.20).) As a result, Eq. (8.18) reduces to

ℓ(α_i) ≈ (1/4) S [ Tr{[A_1(α_i)]ᵀ[A_1(α)]} + Tr{[A_2(α_i)]ᵀ[A_2(α)]} ]
+ Tr{[A_1(α_i)]ᵀ[N_1]} − Tr{[A_2(α_i)]ᵀ[N_2]}.   (9.14)

Let us now investigate the variation of the terms on the right of Eq. (9.14) as a function of α_i. If we let ℓ′(α_i) be the function enclosed in brackets which multiplies S, i.e., if we let

ℓ′(α_i) = Tr{ [A_1(α_i)]ᵀ[A_1(α)] + [A_2(α_i)]ᵀ[A_2(α)] },   (9.15)

the application of Eq. (8.16) reduces this expression to

ℓ′(α_i) = Σ_{j,k∈K} cos[ ω_o( τ_j(α_i) − τ_k(α_i) − τ_j(α) + τ_k(α) ) ]   (9.16)

which is easily seen to be maximum when α_i equals the true angle of arrival α. It then follows that the first term in Eq. (9.14) is the product of a term which depends on the signal energy and a term which depends on the geometry of the array in such a manner that it achieves a maximum value when α_i equals α. The second term of Eq. (9.14), on the other hand, distorts this approximation of ℓ(α_i) so the α_i which maximizes ℓ′(α_i), i.e., the true direction of arrival α, is not necessarily the α_i which maximizes the approximation. Thus, in order to compute the error that results from the use of the estimation techniques derived in Chapter 8, it is necessary to investigate this distortion since our estimate of the true direction of arrival is the α_i which maximizes ℓ(α_i) − B(α_i).

At this point, let us again consider the functional form of the "bias" B(α_i). In Chapter 8, two arguments were given to demonstrate the independence of B(α_i) and α_i in the Special Case Solutions of that chapter. One of these arguments made reference to the results we have just derived. At that time, it was stated that an approximation could be written for ℓ(α_i) which was the sum of two statistics; the first statistic was maximum when α_i and the true angle of arrival α were equal while the other statistic tended to distort the sum so the maximum of the approximation was not necessarily at α. This property has now been verified. In addition, it was stated that a reasonable (though possibly suboptimal) form for B(α_i) is the expected value of the distorting term in the approximation of ℓ(α_i) since an unbiased estimate then results,

i.e., if B(α_i) has this dependence on α_i, our estimate equals the true angle of arrival α when the distorting term is equal to its expected value. If we now refer to Eqs. (9.7) and (9.12) where [N_1]_jk and [N_2]_jk are defined and replace the sample functions [n]_j and [n]_k by the processes [N]_j and [N]_k, respectively, it can be seen that the expected values of these matrix elements are zero except when j equals k. Moreover, if this result is substituted into Eq. (9.14), it is easily seen that the expected value of the second term of this equation (the distorting term) is independent of α_i, which then justifies the statements made in Chapter 8 concerning this independence.

It is now possible to estimate, for a limiting case of special interest, the error that results from the application of the estimation technique derived in the previous chapter. In particular, if we consider the case where the received signal energy is much greater than the received noise energy, it follows from the above discussion that the first term in Eq. (9.14) dominates and the maximum of this sum is achieved when α_i is "near" α since ℓ′(α_i) is a continuous function of α_i which achieves its maximum value at α_i = α. Moreover, because of the form of ℓ′(α_i) as a function of α_i, we can approximate ℓ′(α_i) in the vicinity of α by the first three terms of its power series expansion about α. (In the next section, where a specific array geometry is considered, it will be possible to determine the range of values of α_i over which the approximation is valid.) As a result, Eq. (9.14), in the vicinity of α, becomes

ℓ(α_i) ≈ (1/4) S ℓ′(α) + (1/4) S (∂²/∂α_i²)[ℓ′(α_i)]|_{α_i=α} (α_i − α)²/2 + N_i   (9.17)

where

N_i = Tr{ [A_1(α_i)]ᵀ[N_1] − [A_2(α_i)]ᵀ[N_2] }.   (9.18)
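For illustration, the small-error estimate of Eq. (9.4), with B(α_i) constant as argued above, can be evaluated numerically by central finite differences once the statistic ℓ(α_i) is available as a computable function; this is a sketch under that assumption, not part of the analysis itself.

    import numpy as np

    # Sketch: Eq. (9.4) by central differences.  g(alpha_i) is any callable
    # returning ell(alpha_i) - B(alpha_i) for a trial angle, and alpha is the
    # point about which the Taylor expansion is taken.
    def error_estimate(g, alpha, d=1e-4):
        g1 = (g(alpha + d) - g(alpha - d)) / (2.0 * d)            # first derivative
        g2 = (g(alpha + d) - 2.0 * g(alpha) + g(alpha - d)) / d**2  # second derivative
        eps = -g1 / g2           # epsilon of Eq. (9.4)
        return alpha + eps       # refined estimate, Eq. (9.3)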

(Note that the second term of the power series expansion of ℓ′(α_i) about α is not present in Eq. (9.17) because the slope of ℓ′(α_i) is zero at α_i = α.) Then, taking the partial derivative of ℓ(α_i) − B(α_i) with respect to α_i and setting it equal to zero to obtain a necessary condition for ℓ(α_i) − B(α_i) to achieve its maximum, we find

α̂ = α + ε

where

ε ≈ − (∂N_i/∂α_i)|_{α_i=α} / { (S/4)(∂²/∂α_i²)[ℓ′(α_i)]|_{α_i=α} }   (9.19)

and α̂ represents the optimal estimate. This approximation for the error, as stated above, remains valid as long as ε is not greater than the maximum value of |α_i − α| for which the first three terms of the power series expansion of ℓ′(α_i) remain a valid approximation of ℓ′(α_i).

To obtain a measure of the error in the above case, let us now evaluate the expected value of the square of the error approximation given by Eq. (9.19). Applying the identities of Eqs. (8.16)-(8.17) and the fact that s_1(τ), s_2(τ), h_{1r}(τ), and r_{N_o}(τ) are slowly varying when compared with sin ω_oτ and cos ω_oτ, we find

E{ε²} ≈ [ N̄_1 Σ_{j,k,m∈K} cos{ω_o(τ_j(α) − τ_m(α))} ω_o² {(∂/∂α)[τ_j(α) − τ_k(α)]} {(∂/∂α)[τ_m(α) − τ_k(α)]} + 2 N̄_2 Σ_{j,k∈K} ω_o² {(∂/∂α)[τ_j(α) − τ_k(α)]}² ] / { (S/4)(∂²/∂α_i²)[ℓ′(α_i)]|_{α_i=α} }²   (9.20)

where*

*See Ref. 64, p. 71 for a discussion of the expectation of a product of Gaussian random variables.

N̄_1 = ∫_T ∫_T ∫_T ∫_T { s_1(t)s_1(u) + s_2(t)s_2(u) } r_{N_o}(s−v) h_{1r}(t−s) h_{1r}(u−v) dt ds du dv   (9.21)

and

N̄_2 = ∫_T ∫_T ∫_T ∫_T r_{N_o}(t−u) r_{N_o}(s−v) h_{1r}(t−s) h_{1r}(u−v) dt ds du dv.   (9.22)

(Remember, this expectation has been calculated assuming s_1(t) and s_2(t) are known functions of time with only the noise processes being allowed a random nature.) Thus, the above measure of system performance is seen to be a function of terms which are independent of the geometry of the array, i.e., N̄_1, N̄_2, and S, and terms which depend only on the geometry, i.e., (∂²/∂α_i²)[ℓ′(α_i)]|_{α_i=α} and the two indicated summations.

Before evaluating a specific system by the use of Eq. (9.20), let us derive an additional property of the function ℓ′(α_i) which appears in this equation. Applying the identity of Eq. (8.16) to Eq. (9.16), we find

ℓ′(α_i) = [ Σ_{k∈K} cos ω_o(τ_k(α) − τ_k(α_i)) ]² + [ Σ_{k∈K} sin ω_o(τ_k(α) − τ_k(α_i)) ]².   (9.23)

Let us now assume the receiving antenna array possesses the geometrical symmetry that, when a vector is drawn from a particular point (a point of symmetry) to any element of the array, there exists another element of the array for which a vector drawn from the point of symmetry to this element is the negative of the first vector. The position occupied by the center element of a linear array of an odd number of equally spaced elements is an example of such a point of symmetry. If this symmetry exists, with a little reflection we can see the second term in Eq. (9.23)

is zero. For every element in this sum, there exists another element also in this sum whose value is the negative of the first. Thus, for this case, we have

ℓ′(α_i) = [ Σ_{k∈K} cos ω_o(τ_k(α_i) − τ_k(α)) ]².   (9.24)

9.3.1 Example

In this section we shall evaluate the performance of the system which was designed in Section 8.2.1. This evaluation will use Eq. (9.20) to demonstrate the variation of the expected value of the squared error with various parameters such as the "signal-to-noise" ratio and the total number of antennas in the receiving array.

Let us begin by simplifying the expression for ℓ′(α_i) which, in this case, has the form presented in Eq. (9.24). Substituting Eq. (8.43) into Eq. (9.24), we find

ℓ′(α_i) = [ Σ_{k=−K′}^{K′} cos kF(α_i,α) ]²   (9.25)

where K′ = (K−1)/2 and

F(α_i,α) = (ω_o D/c)(cos α − cos α_i).   (9.26)

But Eq. (9.25) can also be written as

ℓ′(α_i) = [ Re( exp{−jK′F(α_i,α)} Σ_{k=0}^{2K′} exp{jkF(α_i,α)} ) ]²   (9.27)

where j is the complex number √−1 and Re[ ] denotes the "real part" of the indicated quantity. Then, since the sum on the right side of Eq. (9.27) is a geometric progression,* it reduces to

*See Ref. 61, p. 317.

$$\frac{1-\exp\{j(2K'+1)F(a_i,a)\}}{1-\exp\{jF(a_i,a)\}}$$

so that $\ell'(a_i)$ becomes

$$\ell'(a_i) = \frac{1}{K^2}\left(\mathrm{Re}\left[\frac{\exp\{-j(K'+\tfrac{1}{2})F(a_i,a)\}-\exp\{j(K'+\tfrac{1}{2})F(a_i,a)\}}{\exp\{-j(1/2)F(a_i,a)\}-\exp\{j(1/2)F(a_i,a)\}}\right]\right)^2 = \frac{1}{K^2}\left\{\frac{\sin[(K'+\tfrac{1}{2})F(a_i,a)]}{\sin[(1/2)F(a_i,a)]}\right\}^2 \tag{9.28}$$

Finally, taking the second partial derivative of the expression in Eq. (9.28) with respect to $a_i$ and evaluating it at $a_i = a$, we obtain

$$\left.\frac{\partial^2}{\partial a_i^2}\ell'(a_i)\right|_{a_i=a} = -(D'\sin a)^2\,\frac{(2K'+1)^2-1}{6} \approx -(D'\sin a)^2\,\frac{K^2}{6} \tag{9.29}$$

since $(2K'+1) = K \gg 1$, where $D' = \omega_0 D/c$.

It is now possible to answer the question that was posed earlier concerning the range of values of $a_i$ and $a$ for which the first three terms of the power series expansion of $\ell'(a_i)$ remain a valid approximation of $\ell'(a_i)$. Comparing the value of $\ell'(a_i)$ as specified in Eq. (9.28) with its series approximation, we find the approximation to be "good" as long as

$$|\cos a_i - \cos a| < \frac{2\pi c}{\omega_0 D K} \tag{9.30}$$

and

$$1 - \cos a > \frac{2\pi c}{\omega_0 D K}. \tag{9.31}$$
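Both the closed form of Eq. (9.28) and the curvature of Eq. (9.29) can be verified directly. The sketch below (array spacing, carrier, and arrival angle are assumed toy values; the $1/K^2$ normalization follows the reconstruction above) compares the cosine sum of Eq. (9.25) with the Dirichlet-kernel closed form and checks the second derivative at $a_i = a$ by central differences.

```python
import numpy as np

# Sketch (assumed normalization and toy parameters): numerical check of the
# geometric-progression closed form, Eq. (9.28), and its curvature, Eq. (9.29).
K = 11                        # odd number of elements
Kp = (K - 1) // 2             # K' = (K-1)/2
D_over_c, w0 = 0.5, 2.0       # toy spacing/propagation speed and carrier
a = 1.0                       # true arrival angle (radians)
Dp = w0 * D_over_c            # D' = w0 * D / c

def F(ai):
    return Dp * (np.cos(ai) - np.cos(a))

def ell_sum(ai):              # Eq. (9.25): normalized squared cosine sum
    k = np.arange(-Kp, Kp + 1)
    return (np.cos(k * F(ai)).sum()) ** 2 / K**2

def ell_closed(ai):           # Eq. (9.28): Dirichlet-kernel closed form
    f = F(ai)
    if abs(f) < 1e-12:
        return 1.0
    return (np.sin((Kp + 0.5) * f) / np.sin(0.5 * f)) ** 2 / K**2

print(ell_sum(a + 0.07), ell_closed(a + 0.07))   # the two forms agree

h = 1e-4                      # curvature at ai = a by central differences
num = (ell_sum(a + h) - 2 * ell_sum(a) + ell_sum(a - h)) / h**2
pred = -(Dp * np.sin(a)) ** 2 * (K**2 - 1) / 6.0
print(num, pred)              # matches -(D' sin a)^2 [(2K'+1)^2 - 1]/6
```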

The region specified by Eq. (9.30) is approximated by the region between half-power points of the antenna pattern of the array when operating as a phased array and oriented in the $a$ direction. However, $a$ must be greater than a "beam-width" from the $a = 0$ direction as stated by Eq. (9.31). (If $a = 0$, the third term of the power series expansion is zero as can be seen from Eq. (9.29).)

Turning now to the summation multiplying the $N_1$ term in Eq. (9.20), we find

$$\left|\sum_{j,k,m=-K'}^{K'}\cos\{\omega_0[\tau_j(a)-\tau_m(a)]\}\,\omega_0^2\!\left[\frac{\partial}{\partial a_i}(\tau_j(a_i)-\tau_k(a_i))\right]_{a_i=a}\!\left[\frac{\partial}{\partial a_i}(\tau_m(a_i)-\tau_k(a_i))\right]_{a_i=a}\right| \le (D'\sin a)^2\,E_1 \tag{9.32}$$

where

$$E_1 = \sum_{j,k,m=-K'}^{K'} |j-k|\,|m-k|. \tag{9.33}$$

Similarly, for the summation multiplying the $N_2$ term, we find

$$\sum_{j,k=-K'}^{K'}\omega_0^2\!\left[\frac{\partial}{\partial a_i}(\tau_j(a_i)-\tau_k(a_i))\right]_{a_i=a}^2 = (D'\sin a)^2\,E_2 \tag{9.34}$$

where

$$E_2 = \sum_{j,k=-K'}^{K'} (j-k)^2. \tag{9.35}$$

We are now in a position to present an explicit expression which bounds the expected value of the squared error in this example.
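The array sums $E_1$ and $E_2$ depend only on the number of elements and can be checked exhaustively. The sketch below compares the brute-force sums against closed forms built from the power-sum identities quoted before Eqs. (9.45)-(9.46); the closed forms shown are reconstructions consistent with the asymptotes $K^5/4$ and $K^4/6$ used there.

```python
import numpy as np

# Sketch: brute-force check of E1 (Eq. 9.33) and E2 (Eq. 9.35) against
# closed forms assembled from sum-of-powers identities (reconstructed).
K = 21
Kp = (K - 1) // 2
idx = np.arange(-Kp, Kp + 1)

j, k, m = np.meshgrid(idx, idx, idx, indexing='ij')
E1 = (np.abs(j - k) * np.abs(m - k)).sum()

jj, kk = np.meshgrid(idx, idx, indexing='ij')
E2 = ((jj - kk) ** 2).sum()

# Since sum_j |j-k| = K'(K'+1) + k^2, E1 = sum_k [K'(K'+1) + k^2]^2,
# and E2 = 2K * sum_k k^2, using the standard power-sum identities.
s2 = Kp * (Kp + 1) * (2 * Kp + 1) // 3                       # sum k^2
s4 = Kp * (Kp + 1) * (2 * Kp + 1) * (3 * Kp**2 + 3 * Kp - 1) // 15   # sum k^4
E1_closed = (2 * Kp + 1) * (Kp * (Kp + 1)) ** 2 + 2 * Kp * (Kp + 1) * s2 + s4
E2_closed = 2 * K * s2

print(E1, E1_closed, E1 / K**5)    # E1/K^5 stays below 1/4
print(E2, E2_closed, E2 / K**4)    # E2/K^4 approaches 1/6
```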

Substituting Eqs. (9.29), (9.32), and (9.34) into Eq. (9.20), we obtain the relation

$$E\{\varepsilon^2\} \le \left(\frac{6}{D'\sin a}\right)^2\left[\frac{N_1}{S^2}\left(\frac{E_1}{K^8}\right) + \frac{2N_2}{S^2}\left(\frac{E_2}{K^8}\right)\right] \tag{9.36}$$

which shows clearly how an upper bound on $E\{\varepsilon^2\}$ varies with the amount of signal and noise present as well as the number of elements in the antenna array. (The quantities $N_1$ and $N_2$ will be shown later to be positive.) The terms $(N_1/S^2)$ and $(N_2/S^2)$ represent generalized "signal-to-noise" ratios while the terms $(E_1/K^8)$ and $(E_2/K^8)$ depend only on the size of the array.

In order to give Eq. (9.36) a numerical interpretation, let us further assume $s_1(t)$ and $s_2(t)$ are samples from independent Gaussian processes whose power spectrums are each equal to $F_S(\omega)$. If this is the case, the sample function $s(t)$ defined in Eq. (9.8) is a sample function from a Gaussian process whose power spectrum is $F_S(\omega)$. Let us also assume $s_1(t)$ and $s_2(t)$ have Fourier transforms whose squared magnitudes equal the expectation of the squared magnitude of these transforms over the probability space. We can then write*

$$\lim_{T_f\to\infty}\frac{1}{T_f}\left|\mathscr{F}\{s_1(t')\}\right|^2 = \lim_{T_f\to\infty}\frac{1}{T_f}\left|\mathscr{F}\{s_2(t')\}\right|^2 = F_S(\omega)$$

where $\mathscr{F}\{\ \}$ is the Fourier transform of the indicated quantities and $t' = t - T_f/2$ with $s_1(t')$ and $s_2(t')$ defined to be zero outside the interval $[-T_f/2, T_f/2]$. Moreover, since $T_f$ has been assumed "large" compared with the reciprocal of the "bandwidth" of $F_S(\omega)$, it is reasonable to approximate $|\mathscr{F}\{s_1(t)\}|^2$ and $|\mathscr{F}\{s_2(t)\}|^2$ by $T_fF_S(\omega)$.

*See Ref. 43, p. 108.
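The approximation $|\mathscr{F}\{s\}|^2 \approx T_f F_S(\omega)$ is the classical statement that the averaged periodogram of a long sample approaches the power spectrum. The sketch below (the Lorentzian-shaped spectrum and all numerical values are assumptions for illustration) synthesizes a Gaussian process with a known spectrum by filtering white noise and compares the averaged periodogram with the target.

```python
import numpy as np

# Sketch (assumed toy spectrum): Monte Carlo check that |F{s}|^2 / Tf
# approaches F_S(w) for finite-time samples of a Gaussian process.
rng = np.random.default_rng(0)
n, dt, trials = 4096, 0.01, 400
Tf = n * dt
w = 2 * np.pi * np.fft.rfftfreq(n, dt)
FS = 1.0 / (1.0 + (w / 5.0) ** 2)        # target power spectrum F_S(w)
gain = np.sqrt(FS)                       # filter gain: |gain|^2 = FS

acc = np.zeros_like(FS)
for _ in range(trials):
    white = rng.standard_normal(n) / np.sqrt(dt)   # unit-PSD white noise
    s = np.fft.irfft(np.fft.rfft(white) * gain, n) # process with PSD = FS
    acc += np.abs(np.fft.rfft(s)) ** 2 * dt**2     # |F{s}|^2, continuous scaling
acc /= trials

# Deviation of the averaged periodogram from Tf * FS (low-frequency bins):
print(np.max(np.abs(acc / Tf - FS)[: n // 4]))     # small (Monte Carlo noise)
```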

Substituting the above approximation for $|\mathscr{F}\{s_1(t)\}|^2$ and $|\mathscr{F}\{s_2(t)\}|^2$ together with Eq. (8.36) into Eq. (9.11), we find

$$S \approx \frac{T_f}{\pi}\int_0^\infty F_S(\omega)\,F_{h_{1r}}(\omega)\,d\omega \approx \frac{\pi T_f}{b}\,\frac{K E_S/E_N}{\sqrt{b^2 + K E_S/E_N}}. \tag{9.37}$$

To evaluate the quantities $N_1$ and $N_2$ which are defined in Eqs. (9.21) and (9.22), let us first realize that $h_{1r}(\tau)$ has "approximately" the same form as $r_S(\tau)$. Then, since $r_N(\tau)$ has a Fourier transform which remains flat until $\omega = a$ and since $a \gg b$ (see Fig. 8.9), it follows that $r_N(\tau)$ is very "narrow" compared with $r_S(\tau)$ and $h_{1r}(\tau)$. Applying this fact, Eqs. (9.21) and (9.22) reduce to

$$N_1 \approx E_N\int_T\!\!\int_T\!\!\int_T \{s_1(t)s_1(u)+s_2(t)s_2(u)\}\,h_{1r}(t-v)\,h_{1r}(u-v)\,dt\,du\,dv \tag{9.38}$$

and

$$N_2 \approx (E_N)^2\int_T\!\!\int_T \{h_{1r}(t-s)\}^2\,dt\,ds, \tag{9.39}$$

respectively. Then, substituting the approximation for $|\mathscr{F}\{s_1(t)\}|^2$ and $|\mathscr{F}\{s_2(t)\}|^2$ discussed above together with Eq. (8.38) into Eq. (9.38), we find

$$N_1 \approx \pi E_N T_f\,\frac{K E_S/E_N}{(b^2 + K E_S/E_N)^{3/2}}. \tag{9.40}$$

(It can be seen from Eqs. (9.39) and (9.40) that $N_1$ and $N_2$ are positive quantities as stated earlier.)

Moreover, if we let $\tau = t-s$, Eq. (9.39) becomes

$$N_2 \approx (E_N)^2\int_0^{T_f}\!\!\int \{h_{1r}(\tau)\}^2\,d\tau\,dt = \frac{\sqrt{\pi}}{2}\,\frac{T_f\,E_N^2}{(b^2+KE_S/E_N)^{3/2}}\left[1 - \frac{1-\exp\{-2T_f\sqrt{b^2+KE_S/E_N}\}}{2T_f\sqrt{b^2+KE_S/E_N}}\right] \approx \frac{\sqrt{\pi}}{2}\,\frac{T_f\,E_N^2}{(b^2+KE_S/E_N)^{3/2}}. \tag{9.41}$$

To obtain this last result, we have substituted Eq. (8.38) and performed the indicated integration. Finally, since we are considering the large signal case, Eqs. (9.37), (9.40), and (9.41) reduce to

$$S \approx \frac{\pi T_f}{b}\sqrt{\frac{KE_S}{E_N}}, \tag{9.42}$$

$$N_1 \approx \frac{\pi T_f E_N}{\sqrt{KE_S/E_N}}, \tag{9.43}$$

and

$$N_2 \approx \frac{\sqrt{\pi}}{2}\,\frac{T_f\,E_N^2}{(KE_S/E_N)^{3/2}}, \tag{9.44}$$

respectively. The only quantities in Eq. (9.36) which still require further simplification are the summations $E_1$ and $E_2$.
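The double integral reduced in Eq. (9.41) has a simple closed form when the filter is exponential. The sketch below (the exponential shape for $h_{1r}$ and the parameter values are assumptions, consistent with the $\exp\{-2T_f\sqrt{\cdot}\}$ term visible above) checks that closed form by quadrature.

```python
import numpy as np
from scipy import integrate

# Sketch (assumed exponential filter): the double integral of h1r(t-s)^2
# over [0,Tf]^2 from Eq. (9.39), for h1r(tau) = exp(-beta*|tau|), versus
# the closed form used in the reduction to Eq. (9.41).
beta, Tf = 3.0, 2.0

val, _ = integrate.dblquad(
    lambda s, t: np.exp(-2.0 * beta * abs(t - s)),   # h1r(t-s)^2
    0.0, Tf,                                         # outer (t) limits
    0.0, Tf)                                         # inner (s) limits

closed = Tf / beta - (1.0 - np.exp(-2.0 * beta * Tf)) / (2.0 * beta**2)
print(val, closed)   # agree; for beta*Tf >> 1 this tends to Tf/beta
```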

But, since*

$$\sum_{k=0}^{K'} k = \frac{K'(K'+1)}{2} \qquad\text{and}\qquad \sum_{k=-K'}^{K'} k^2 = \frac{K'(K'+1)(2K'+1)}{3}$$

and $K \gg 1$, it follows easily that

$$E_1 = \frac{5}{3}(2K'+1)(K')^2(K'+1)^2 + \frac{K'(K'+1)(2K'+1)(3K'^2+3K'-1)}{15} \le \frac{K^5}{4} \tag{9.45}$$

and

$$E_2 = \frac{2K'(K'+1)(2K'+1)^2}{3} \approx \frac{K^4}{6}. \tag{9.46}$$

Finally, substituting Eqs. (9.42)-(9.46) into Eq. (9.36), we obtain the desired relation

$$E\{\varepsilon^2\} \le \frac{9\,b^2 E_N}{\pi(D'\sin a)^2\,T_f}\,\frac{1}{K^3\,(KE_S/E_N)^{3/2}} + \frac{6\,b^2 E_N^2}{\pi^{3/2}(D'\sin a)^2\,T_f}\,\frac{1}{K^4\,(KE_S/E_N)^{5/2}} \tag{9.47}$$

which illustrates an upper bound on the expected value of the squared error as a function of the parameters $E_N$, $E_S$, $K$, $b$, $T_f$, $D'$, and $a$. Remember, however, this expression is valid only in the "large" signal case where $a$ is greater than a "beamwidth" from the $a = 0$ direction.

*See Ref. 62, p. 317.
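The qualitative content of the bound is its scaling: the attainable squared angular error shrinks rapidly with array size and signal strength. The sketch below tabulates the bound for a few assumed parameter sets; note that it is built on the reconstructed form of Eq. (9.47), whose exact numerical constants are uncertain, so only the $K$ and $E_S/E_N$ trends should be read from it.

```python
import numpy as np

# Sketch built on the reconstructed bound of Eq. (9.47) (constants are
# assumptions; the K^(-9/2) and (Es/En)^(-3/2) scalings are the point).
def err_bound(K, EsEn, EN=1.0, b=1.0, Tf=100.0, Dp=np.pi, a=np.pi / 3):
    g = (Dp * np.sin(a)) ** 2
    t1 = 9 * b**2 * EN / (np.pi * g * Tf) / (K**3 * (K * EsEn) ** 1.5)
    t2 = 6 * b**2 * EN**2 / (np.pi**1.5 * g * Tf) / (K**4 * (K * EsEn) ** 2.5)
    return t1 + t2

for K in (5, 11, 21, 41):
    for snr in (1.0, 10.0, 100.0):
        print(K, snr, err_bound(K, snr))
# Larger arrays and stronger signals shrink the bound roughly as
# K^(-9/2) and (Es/En)^(-3/2), respectively.
```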

CHAPTER 10 SUMMARY AND CONCLUSIONS

The research described in this dissertation is concerned with the determination of optimal procedures for estimating directions of arrival of signals emitted by stochastic sources. To make these estimates, it is assumed that an array of K omnidirectional antennas is available for gathering data. Two distinct approaches to the problem are pursued which are described in Chapter 1 along with a general discussion of existing Direction Finding techniques. The first approach, which is investigated in Chapters 2-5, applies "Stochastic Optimal Control Theory" and assumes the array is operating as a phased array whose control laws for directing the pointing angle and specifying the beam width of the array are desired. In addition, the forms of optimal filters for processing the received signals of the phased array are investigated. In Chapters 6-9, the array functions only as an information gathering device and "Estimation Theory" is applied to determine the processing that is necessary to produce optimal direction-of-arrival estimates.

In Chapter 2, a greatly simplified model of a phased array direction finding system is constructed in hopes of obtaining a model which approximates the operating conditions of a realistic system and which allows a meaningful analysis. In Chapter 3, this model is investigated under open-loop operating conditions where the control laws are required to be independent of past controls and past signals received by the array and, thus, are chosen a priori. Results are obtained for this case with the added assumption that the a priori probability density of the possible angles of arrival is concentrated in a "narrow" region

about the a priori expected value of this angle. These results indicate the system should operate with the array pointing angle directed so the maximum slope of the antenna pattern is oriented in the a priori expected direction of arrival. In addition, if the signal level is unknown and the antenna pattern is symmetric, the two orientations which satisfy this condition should be used for equal time intervals. In Chapter 4, this model is investigated under the closed-loop conditions where the control laws are allowed to be dependent on past controls and past signals received by the array. An optimal filter of the signals together with an integral equation whose solution represents the optimal controls are derived. In Chapter 5, the results of Chapters 2-4 are discussed more fully.

In Chapter 6, a new model for a direction finding system is constructed which employs the antenna array only to gather information. Various special cases which restrict the possible forms of the signal and noise processes of this model are also discussed. These special cases all require the signal and noise processes to be narrow-band and the observation interval to be long compared with the reciprocal of the bandwidth. Chapter 7 contains a derivation of an optimal estimation technique for this model without the narrow-band and observation interval constraints. It is found that the Radon-Nikodym derivative of a particular Gaussian probability measure with respect to another Gaussian probability measure plays a key role in this technique. Forms for this derivative are found which allow an evaluation by the use of physical devices. In Chapter 8, the narrow-band and observation interval constraints are introduced and result in a greatly simplified solution for cases which are of engineering importance. Finally, Chapter 9 contains

an analysis of the errors that result when the estimation techniques of Chapter 7 and Chapter 8 are applied.

Comparing the two approaches, we find the application of "Estimation Theory" techniques results in a solution which allows implementation with existing hardware while the application of "Stochastic Optimal Control Theory" techniques produces easily implemented results only in very restricted situations. The difficulty associated with the control theory approach stems from the infancy of the existing theory of "Stochastic Optimal Control" when nonlinearities are present. "Estimation Theory," on the other hand, possesses many powerful tools which are applicable to the Direction Finding Problem.

To summarize, the contributions of this dissertation are the following:

1. Control Theory and Estimation Theory techniques have been applied to the Direction Finding Problem with the result that only the latter has been found to produce a solution which is easily implemented.

2. The work of T. T. Kadota on the "Optimal Reception of M-ary Gaussian Signals in Gaussian Noise" has been refined and extended to the vector-valued signal case so that it can be applied in the design of DF systems employing arrays of omnidirectional receiving antennas.

3. In two narrow-band signal cases, a relatively easily implemented technique has been found for evaluating the Radon-Nikodym derivatives which are called for by the estimation procedure following from the above extension.

Numerous other special cases exist which restrict the applicability of the general solution presented in Chapter 7 while remaining of great practical interest. For example, the introduction of multiple

narrow-band signal sources produces a problem whose analysis is different from that required when a single signal source is present. Also, the presence of noise sources which possess directional properties typifies many realistic situations. Both of these cases, conceptually at least, fit easily into the framework of the analysis presented in Chapter 7. However, it is not clear at this time that simplified solutions of the form derived in Chapter 8 are possible. Finally, there are several unanswered theoretical questions which are discussed in the text of this report and which deserve further investigation.

APPENDIX I A RELATIONSHIP BETWEEN σ-FIELDS

Let $[Y] = \{[Y_t];\,t\in T\}$ be a K-dimensional vector valued random process on the probability space $(S,\mathscr{S},P)$ and let $\Omega$ be the space of all real, K-dimensional, vector valued functions on the interval $[0,T_f]$. Furthermore, let $\mathscr{P}$ be the class of sets of the form

$$\mathscr{D} = \{[f]\in\Omega;\;[f(t_1)] \le a_1,\,[f(t_2)] \le a_2,\ldots,[f(t_n)] \le a_n\}$$

where $n$ is any arbitrary finite positive integer, $t_i\in T$ for $i\in\{1,2,\ldots,n\}$, and $a_j$, $j\in\{1,2,\ldots,n\}$, is any real vector in K-dimensional Euclidean space. Define $\mathscr{C}$ to be the class of sets of the form $D = [Y]^{-1}(\mathscr{D})$ where $\mathscr{D}\in\mathscr{P}$. Then

$$\sigma(\mathscr{C}) = [Y]^{-1}(\sigma(\mathscr{P})).$$

Proof: Let $\mathscr{A} = [Y]^{-1}(\sigma(\mathscr{P}))$. Since

$$[Y]^{-1}\Big(\bigcup_{j\in J} B_j\Big) = \bigcup_{j\in J}[Y]^{-1}(B_j), \qquad [Y]^{-1}(\Omega - B) = [Y]^{-1}(\Omega) - [Y]^{-1}(B), \qquad [Y]^{-1}(\Omega) = S,$$

for $J$ a countable set and $B, B_j \in \sigma(\mathscr{P})$, we see that $\mathscr{A}$ is a σ-field. Also, $\mathscr{C}$ is easily seen to be contained in $\mathscr{A}$. Therefore

$$\sigma(\mathscr{C}) \subseteq [Y]^{-1}(\sigma(\mathscr{P})).$$

Now let $\mathscr{M}$ be the class of sets $M\subseteq\Omega$ such that

$$[Y]^{-1}(M) \in \sigma(\mathscr{C}).$$

It can be seen that $\mathscr{M}$ is also a σ-field with $\mathscr{P}\subseteq\mathscr{M}$ so that

$$\sigma(\mathscr{P}) \subseteq \mathscr{M}. \tag{I.1}$$

Thus, we have

$$[Y]^{-1}(\sigma(\mathscr{P})) \subseteq [Y]^{-1}(\mathscr{M}) \subseteq \sigma(\mathscr{C}),$$

or

$$\sigma(\mathscr{C}) = [Y]^{-1}(\sigma(\mathscr{P})).$$

Q.E.D.
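On finite spaces the identity proved above can be verified by direct enumeration, since σ-fields reduce to finite set algebras. The sketch below (a toy four-point sample space and two-point function space, all assumed for illustration) generates both sides of $\sigma([Y]^{-1}(\mathscr{P})) = [Y]^{-1}(\sigma(\mathscr{P}))$ and compares them.

```python
from itertools import combinations

# Sketch (toy finite model): exhaustive check of
# sigma([Y]^-1(P)) == [Y]^-1(sigma(P)) on small sets.
S = {0, 1, 2, 3}                        # sample space
Omega = {'a', 'b'}                      # "function space" stand-in
Y = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}    # the map [Y]: S -> Omega
P = [frozenset({'a'})]                  # generating class on Omega

def preimage(A):
    return frozenset(s for s in S if Y[s] in A)

def generated_field(base, space):
    # Smallest collection containing base, closed under complement and union.
    field = set(base) | {frozenset(), frozenset(space)}
    changed = True
    while changed:
        changed = False
        for A in list(field):
            c = frozenset(space) - A
            if c not in field:
                field.add(c); changed = True
        for A, B in combinations(list(field), 2):
            u = A | B
            if u not in field:
                field.add(u); changed = True
    return field

lhs = generated_field({preimage(A) for A in P}, S)            # sigma([Y]^-1(P))
rhs = {preimage(A) for A in generated_field(set(P), Omega)}   # [Y]^-1(sigma(P))
print(lhs == rhs)    # True
```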

APPENDIX II PROPERTIES OF THE OPERATOR $R_i$

The integral operator $R_i$ defined in Eq. (7.75) takes elements of $L_2[T]$ into elements of $L_2[T]$ and is self-adjoint, positive definite, and Hilbert-Schmidt.

Proof: Let us first show, if $[g] = R_i([f])$ where $[f]\in L_2[T]$, then $[g]\in L_2[T]$. If $[f]_k$ denotes the kth component of $[f]$, the jth component of $[g]$ is given by

$$[g(t)]_j = \sum_{k\in K}\int_T [r_i(t,s)]_{jk}\,[f(s)]_k\,ds.$$

Then, due to Theorem 7.6, we have

$$[g(t)]_j^2 \le \left\{\sum_{k\in K}\int_T [r_i(t,s)]_{jk}^2\,ds\right\}\left\{\sum_{k\in K}\int_T [f(s)]_k^2\,ds\right\}$$

and

$$\int_T [g(t)]_j^2\,dt \le \left\{\sum_{k\in K}\int_T\!\!\int_T [r_i(t,s)]_{jk}^2\,ds\,dt\right\}\left\{\sum_{k\in K}\int_T [f(s)]_k^2\,ds\right\} < \infty \tag{II.1}$$

where the final inequality follows from Assumption A6.3.1 and the fact that $[f]_k\in L_2[T]$. It now follows from the definition of $L_2[T]$ that $[g]\in L_2[T]$.

Let us now prove that $R_i$ is self-adjoint. If $[f],[g]\in L_2[T]$, then

$$(R_i([f]),[g]) = \sum_{j\in K}\int_T [g(t)]_j\,[R_i([f])(t)]_j\,dt = \sum_{j,k\in K}\int_T\!\!\int_T [r_i(t,s)]_{jk}\,[g(t)]_j\,[f(s)]_k\,dt\,ds = ([f],R_i([g])). \tag{II.2}$$

To prove that $R_i$ is positive definite, note that $R_0$ automatically has this property due to Assumption A6.3.1 and that, for $i\in\{1,2,\ldots\}$,

$$[r_i(t,s)] = [r^0(t,s)] + [r^{(i)}(t,s)] \tag{II.3}$$

where $[r^0(t,s)]$ is the noise covariance function and $[r^{(i)}(t,s)]$ is the covariance function of the signal process $[S_i(t)]$ whose angular location is $m_i$ (see Assumption A6.3.1). As a result, if $R^{(i)}$ is the integral operator defined on $L_2[T]$ with kernel $[r^{(i)}(t,s)]$, it is sufficient to prove $R^{(i)}$, $i\in\{1,2,\ldots\}$, is positive since $R_i = R_0 + R^{(i)}$. For any $[f]\in L_2[T]$, we have

$$(R^{(i)}([f]),[f]) = \int_T\!\!\int_T\left[\int_\Omega U(t)\,U(s)\,dP_{(i)}\right]dt\,ds \tag{II.4}$$

where $U(t) = [S_i(t)]^T[f(t)]$ and $[S_i]$ is a measurable, zero-mean Gaussian process with $[r^{(i)}(t,s)]$ as its covariance function. (The process $[S_i]$ exists due to Theorem 7.9.) But Eq. (II.4) reduces to

$$\int_\Omega\left\{\int_T U(t)\,dt\right\}^2 dP_{(i)} \ge 0 \tag{II.5}$$

if the order of integration can be reversed. This order can be reversed, by Theorem 7.11, if $U(t)U(s)$ or, equivalently, by Theorem 7.4, if $|U(t)U(s)|$ is integrable on the product space $\Omega\times T\times T$. Due to Theorem 7.6,

$$\left[\int_{\Omega\times T\times T}|U(t)U(s)|\,d\lambda\right]^2 \le \left[\int_{\Omega\times T\times T}(U(t))^2\,d\lambda\right]\left[\int_{\Omega\times T\times T}(U(s))^2\,d\lambda\right] = \left[\int_{\Omega\times T\times T}(U(t))^2\,d\lambda\right]^2$$

where $\lambda$ is the product measure induced on $\Omega\times T\times T$. But, by Theorem 7.10,

$$\int_{\Omega\times T\times T}(U(t))^2\,d\lambda = \int_T\!\!\int_T\left(\int_\Omega (U(t))^2\,dP_{(i)}\right)dt\,ds = T_f\sum_{j,k\in K}\int_T [r^{(i)}(t,t)]_{jk}\,[f(t)]_j\,[f(t)]_k\,dt. \tag{II.6}$$

Moreover, the right hand integral of (II.6) is finite since

$$\sum_{j,k\in K}\left|\int_T [r^{(i)}(t,t)]_{jk}\,[f(t)]_j\,[f(t)]_k\,dt\right| \le \sum_{j,k\in K}\left\{\int_T\left[[r^{(i)}(t,t)]_{jk}[f(t)]_j\right]^2 dt\right\}^{1/2}\left\{\int_T [f(t)]_k^2\,dt\right\}^{1/2} \le b_i\sum_{j,k\in K}\left\{\int_T [f(t)]_j^2\,dt\right\}^{1/2}\left\{\int_T [f(t)]_k^2\,dt\right\}^{1/2} < \infty$$

again by Theorem 7.6, where $b_i$ is the bound on the function $[r^{(i)}(t,s)]_{jk}$ discussed in Assumption A6.3.1. Thus $R^{(i)}$ is positive and $R_i$, $i\in M_0$, is positive definite.

To prove that $R_i$ is Hilbert-Schmidt, we shall first show that it is completely continuous. But Assumption A6.3.1 together with Theorem 7.18 and Theorem 7.19 imply this statement. Then, since $R_i$ is positive definite and completely continuous, Theorem 7.20 says that there exists an orthonormal basis, $\{[\varphi_n^i]\}$, for $L_2[T]$ whose elements are eigenfunctions of $R_i$ and an associated sequence of eigenvalues $\{\lambda_n^i\}$. Let us now consider

$$\int_T\!\!\int_T\sum_{j,k\in K}[r_i(t,s)]_{jk}^2\,ds\,dt \tag{II.7}$$

which equals

$$\int_T\!\!\int_T \mathrm{Tr}\left\{[r_i(t,s)]\cdot[r_i(t,s)]^T\right\}ds\,dt \tag{II.8}$$

where $\mathrm{Tr}\{\ \}$ denotes the trace of the given matrix. Then, applying Theorem 7.22 and Assumption A6.3.1 which say that

$$[r_i(t,s)] = \sum_{n=1}^\infty \lambda_n^i\,[\varphi_n^i(t)]\,[\varphi_n^i(s)]^T, \tag{II.9}$$

(II.8) becomes

$$\mathrm{Tr}\left\{\sum_{m=1}^\infty\sum_{n=1}^\infty \lambda_m^i\,\lambda_n^i\int_T\!\!\int_T [\varphi_m^i(t)]\,[\varphi_m^i(s)]^T[\varphi_n^i(s)]\,[\varphi_n^i(t)]^T\,ds\,dt\right\} \tag{II.10}$$

when the order of summation and integration is interchanged, which is possible due to the uniform convergence of the series. Performing the $s$ integration, (II.10) reduces to

$$\sum_{m=1}^\infty (\lambda_m^i)^2\int_T [\varphi_m^i(t)]^T[\varphi_m^i(t)]\,dt$$

which equals

$$\sum_{j\in K}\sum_{m=1}^\infty (\lambda_m^i)^2\,a_{mj} \tag{II.11}$$

when we let

$$a_{mj} = \int_T [\varphi_m^i(t)]_j^2\,dt. \tag{II.12}$$

(Note that $a_{mj}$ is positive.) Then, interchanging the order* of summation in (II.11), this sum reduces to

$$\sum_{m=1}^\infty (\lambda_m^i)^2 \tag{II.13}$$

since

$$\sum_{j\in K} a_{mj} = \int_T [\varphi_m^i(t)]^T[\varphi_m^i(t)]\,dt = 1.$$

But (II.13), which equals (II.7), is also equal to the square of the Hilbert-Schmidt norm of $R_i$, i.e.,

$$N^2(R_i) = \sum_{m=1}^\infty \left\|R_i[\varphi_m^i]\right\|^2 = \sum_{m=1}^\infty (\lambda_m^i)^2 \tag{II.14}$$

and is finite as a result of Assumption A6.3.1. Q.E.D.

*See Ref. 59, p. 161.
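All three operator properties have transparent finite-dimensional analogues once the kernel is sampled on a grid. The sketch below (a scalar exponential kernel and toy grid are assumptions for illustration) builds the matrix approximation of the integral operator and checks symmetry, positive definiteness, and the equality of the kernel-based and eigenvalue-based Hilbert-Schmidt norms asserted in Eq. (II.14).

```python
import numpy as np

# Sketch (toy discretization): the integral operator with kernel
# r(t,s) = exp(-|t-s|) becomes the matrix R[i,j] = r(t_i,t_j)*dt.
n, Tf = 200, 1.0
t = np.linspace(0.0, Tf, n)
dt = t[1] - t[0]
R = np.exp(-np.abs(t[:, None] - t[None, :])) * dt

print(np.allclose(R, R.T))              # self-adjoint
eig = np.linalg.eigvalsh(R)
print(eig.min() > 0)                    # positive definite
hs_from_kernel = np.sqrt((R**2).sum())  # discrete analogue of (II.7)
hs_from_eigs = np.sqrt((eig**2).sum())  # discrete analogue of (II.14)
print(hs_from_kernel, hs_from_eigs)     # equal, as Eq. (II.14) asserts
```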

APPENDIX III A PROPERTY OF THE PROCESSES $[Z^i]$ AND $[Z^0]$

If $[Z^i]$ and $[Z^0]$ are the K-dimensional, zero-mean, measurable Gaussian processes discussed following Theorem 7.25 and $\{[g_j]\}$ is any sequence of elements from $L_2[T]$, then the sequences of random variables $\{\theta_j^i\}$ and $\{\theta_j^0\}$ defined for all $[f]\in\Omega$ by

$$\theta_j^i([f]) = \int_T [Z^i(t,[f])]^T[g_j(t)]\,dt$$

and

$$\theta_j^0([f]) = \int_T [Z^0(t,[f])]^T[g_j(t)]\,dt$$

are jointly Gaussian on $(\Omega,\mathscr{B},P_i)$ and $(\Omega,\mathscr{B},P_0)$, respectively.

Proof: We shall only prove the random variables $\theta_1^i,\theta_2^i,\ldots$ are jointly Gaussian since the proof that the random variables $\theta_1^0,\theta_2^0,\ldots$ are jointly Gaussian is identical. But, to prove $\theta_1^i,\theta_2^i,\ldots$ are jointly Gaussian, it suffices to prove $\theta_1^i,\theta_2^i,\ldots,\theta_n^i$ are jointly Gaussian when $n$ is an arbitrary integer since the sequence $\{\theta_j^i\}$ can be reordered without loss of generality.

Let us first verify the fact that $\theta_j^i$, $j\in\{1,2,\ldots\}$, is a random variable. Since $[Z^i]$ and $[g_j]$ are measurable on $\Omega\times T$, $[Z^i]^T[g_j]$ is measurable on $\Omega\times T$ and

$$\int_{\Omega\times T}\left|[Z^i]^T[g_j]\right|\,d(P_i\times t) \le \sum_{k\in K}\int_{\Omega\times T}\left|[Z^i]_k\,[g_j]_k\right|\,d(P_i\times t) \tag{III.1}$$

where $[Z^i]_k$ and $[g_j]_k$ are the kth components of $[Z^i]$ and $[g_j]$, respectively, and $P_i\times t$ is the product measure on the product space $\Omega\times T$. Then,

due to Theorem 7.6 and Theorem 7.10, the right hand side of Eq. (III.1) is less than

$$\sum_{k\in K}\left\{\int_T [g_j]_k^2\,dt\right\}^{1/2}\left\{\int_T\!\!\int_\Omega [Z^i]_k^2\,dP_i\,dt\right\}^{1/2}.$$

But

$$\int_T\!\!\int_\Omega [Z^i]_k^2\,dP_i\,dt = \int_T [r_i(t,t)]_{kk}\,dt \le b_i\,T_f$$

where $[r_i(t,s)]_{kk}$ is the kkth element of the matrix covariance function of $[Z^i]$ and $b_i$ is the bound on this function as discussed in Assumption A6.3.1. Therefore, the left hand side of (III.1) is finite and $[Z^i]^T[g_j]$ is integrable on $\Omega\times T$ by Theorem 7.4. Finally,

$$\int_T [Z^i(t,[f])]^T[g_j(t)]\,dt$$

is measurable and integrable on $\Omega$ by Theorem 7.10.

Let us now show $\theta_1^i,\theta_2^i,\ldots,\theta_n^i$ are jointly Gaussian. Since the continuous real valued functions* are dense in $L_2[T]$, for every $[g_j]$, $j\in\{1,2,\ldots,n\}$, we can find a sequence of continuous functions $\{[g_{jk}]\}_k$ for which

$$\lim_{k\to\infty}\left\|[g_{jk}] - [g_j]\right\| = 0. \tag{III.2}$$

Moreover, for any $k$, the random variables $\{\theta_{jk}^i;\,j=1,2,\ldots,n\}$ defined by

$$\theta_{jk}^i = \int_T [Z^i(t,[f])]^T[g_{jk}(t)]\,dt$$

are jointly Gaussian by Theorem 7.12. Also, we find

*See Ref. 39, p. 251.

$$\lim_{k\to\infty} E\{\theta_{jk}^i\,\theta_{j'k}^i\} = E\{\theta_j^i\,\theta_{j'}^i\}$$

when the order of integration is interchanged and Eq. (III.2) is applied. This implies, for each $j\in\{1,2,\ldots,n\}$, the sequence $\{\theta_{jk}^i\}_k$ converges in the mean and, hence, in probability to $\theta_j^i$. As a result, the characteristic function of the Gaussian random variables $\{\theta_{jk}^i;\,j=1,2,\ldots,n\}$, call it $\phi_k$, converges* to the characteristic function of $\{\theta_j^i;\,j=1,2,\ldots,n\}$, call it $\phi$. The general form of Gaussian characteristic functions then shows that the limiting characteristic function is also Gaussian. Q.E.D.

*See Ref. 28, p. 169.
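The content of this appendix, that linear functionals of a Gaussian process are jointly Gaussian with the expected covariances, is easy to see in simulation. The sketch below (an exponential covariance kernel and sinusoidal test functions are assumed toy choices) samples a Gaussian process on a grid and compares the empirical covariance of two functionals with the prediction $E\{\theta_j\theta_m\} = \int\!\!\int g_j(t)^T r(t,s)\,g_m(s)\,dt\,ds$.

```python
import numpy as np

# Sketch (toy Monte Carlo check): linear functionals of a Gaussian process.
rng = np.random.default_rng(1)
n, trials = 100, 20000
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
r = np.exp(-np.abs(t[:, None] - t[None, :]))   # covariance kernel r(t,s)
L = np.linalg.cholesky(r + 1e-12 * np.eye(n))  # sampler for the process

g1 = np.sin(2 * np.pi * t)                     # two test elements of L2[T]
g2 = np.cos(2 * np.pi * t)

Z = rng.standard_normal((trials, n)) @ L.T     # rows: sample paths of Z
theta = np.stack([Z @ g1 * dt, Z @ g2 * dt], axis=1)

emp = np.cov(theta.T)                          # empirical 2x2 covariance
pred = np.array([[g1 @ r @ g1, g1 @ r @ g2],
                 [g2 @ r @ g1, g2 @ r @ g2]]) * dt**2
print(emp)
print(pred)    # agree to Monte Carlo accuracy; marginals are Gaussian
```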

APPENDIX IV A CONVERGENCE THEOREM

Let $[Z^0]$ and $\{[Z_n^0]\}$ be as in Theorem 7.28 and define $H_i([Z^0]) = [W^0]$ and $H_i([Z_n^0]) = [W_n^0]$ with $H_i$ also as in Theorem 7.28. Then,

$$\int_T [Z^0(t,[f])]^T[W^0(t,[f])]\,dt = \lim_{n\to\infty}\int_T [Z_n^0(t,[f])]^T[W_n^0(t,[f])]\,dt \quad m(P_0)$$

where $m(P_0)$ denotes convergence in measure with respect to $P_0$.

Proof: With the help of Theorem 7.6 and Theorem 7.10, we can write

$$\int_\Omega\left|\int_T\{[Z^0]^T[W^0] - [Z_n^0]^T[W_n^0]\}\,dt\right|dP_0 \le \int_T\!\!\int_\Omega\left|[Z^0]^T\{[W^0]-[W_n^0]\}\right|dP_0\,dt + \int_T\!\!\int_\Omega\left|[W_n^0]^T\{[Z^0]-[Z_n^0]\}\right|dP_0\,dt$$
$$\le \left\{\int_T\!\!\int_\Omega [Z^0]^T[Z^0]\,dP_0\,dt\right\}^{1/2}\varepsilon_{1n}^{1/2} + \left\{\int_T\!\!\int_\Omega [W_n^0]^T[W_n^0]\,dP_0\,dt\right\}^{1/2}\varepsilon_{2n}^{1/2} \tag{IV.1}$$

where

$$\varepsilon_{1n} = \int_T\!\!\int_\Omega\{[W^0]-[W_n^0]\}^T\{[W^0]-[W_n^0]\}\,dP_0\,dt$$

and

$$\varepsilon_{2n} = \int_T\!\!\int_\Omega\{[Z^0]-[Z_n^0]\}^T\{[Z^0]-[Z_n^0]\}\,dP_0\,dt.$$

Then, from Eqs. (7.118) and (7.120), $[Z^0] = \mathrm{l.i.m.}_{n\to\infty}[Z_n^0]\;(t\times P_0)$ and $[W^0] = \mathrm{l.i.m.}_{n\to\infty}[W_n^0]\;(t\times P_0)$ so that $\varepsilon_{1n}$ and $\varepsilon_{2n}$ approach zero as $n\to\infty$. Furthermore,

$$\int_T\!\!\int_\Omega [W^0]^T[W^0]\,dP_0\,dt = \int_\Omega\int_T [W^0]^T[W^0]\,dt\,dP_0$$

and is, therefore, finite. Also, from the definition of $[Z_n^0]$, we obtain

$$\int_T\!\!\int_\Omega [Z_n^0]^T[Z_n^0]\,dP_0\,dt = \sum_{j=1}^n\left\{(R_0^{1/2}[\varphi_j],R_0^{1/2}[\varphi_j]) + (R_0^{1/2}[\psi_j],R_0^{1/2}[\psi_j])\right\} \tag{IV.2}$$

with $\{[\varphi_j]\}$ and $\{[\psi_j]\}$ as in Lemma 7.9. But, as shown in Lemma 7.9, the summation on the right of (IV.2) converges to $\mathrm{Tr}\left\{\int_T [r^0(t,t)]\,dt\right\}$ which is finite by Assumption A6.3.1. It can now be seen that the right hand side of Eq. (IV.1) approaches zero as $n\to\infty$.

Now let

$$A_n = \left\{[f]:\left|\int_T\{[Z_n^0]^T[W_n^0] - [Z^0]^T[W^0]\}\,dt\right| > \varepsilon\right\}$$

where $\varepsilon$ is any real number greater than zero. Then, if $n_0$ is chosen such that, for $n > n_0$, the left hand side of (IV.1) is less than $\varepsilon^2$, it follows that $P_0\{A_n\} < \varepsilon$ when $n > n_0$, which implies

$$\int_T [Z^0(t,[f])]^T[W^0(t,[f])]\,dt = \lim_{n\to\infty}\int_T [Z_n^0(t,[f])]^T[W_n^0(t,[f])]\,dt \quad m(P_0).$$

Q.E.D.
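The final step of the proof is the Markov inequality argument: if $E|X_n|\to 0$ then $P\{|X_n|>\varepsilon\} \le E|X_n|/\varepsilon \to 0$, so convergence in mean forces convergence in measure. The toy sketch below (the $1/\sqrt{n}$-scaled Gaussian variables are an assumption chosen only to make the decay visible) illustrates both the bound and the vanishing tail probability.

```python
import numpy as np

# Sketch: convergence in mean implies convergence in probability, via
# Markov's inequality P{|X_n| > eps} <= E|X_n| / eps.
rng = np.random.default_rng(2)
eps = 0.1
for n in (10, 100, 1000, 10000):
    x = rng.standard_normal(100000) / np.sqrt(n)   # E|X_n| ~ 1/sqrt(n)
    mean_abs = np.abs(x).mean()
    tail = (np.abs(x) > eps).mean()                # empirical P{|X_n| > eps}
    print(n, mean_abs, tail, mean_abs / eps)       # tail <= bound; both -> 0
```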

APPENDIX V COMPARISON OF "MINIMUM PROBABILITY OF ERROR" AND "MAXIMUM A POSTERIORI PROBABILITY" PARTITIONS

In this appendix we shall discuss the relationship that exists between the "Minimum Probability of Error" and the "Maximum A Posteriori Probability" partitions of the observation space. In particular, we shall find that, defining the "A Posteriori Probability" in a special way, these two partitions are identical. It should be pointed out, however, this discussion is not intended to be a rigorous proof of this relationship but rather an additional argument illustrating the "optimality" of this partition.

In Theorem 7.24, it was found that the partition of the observation space that minimizes the "Probability of Error" classifies the observation $[y]$ in the subset $A_i$ if $i$ is the least index for which

$$a_i\,\frac{dP_i}{dP_0}([y]) \ge a_j\,\frac{dP_j}{dP_0}([y]) \tag{V.1}$$

for all $j\ne i$. (The set for which $a_i\frac{dP_i}{dP_0}([y])$ achieves $\max_i a_i\frac{dP_i}{dP_0}([y])$ for more than one index $i$ has zero probability as shown in Theorem 7.25 and, therefore, need not be considered.) The constants $\{a_i\}$ represent the a priori probabilities of the various angles of arrival.

The question now arises, "To what are we referring when we specify the a posteriori probability of the angle of arrival given a received observation $[y]$?" To answer this question, let us first restrict ourselves to the discrete case where the noise process $[N(t)]$ and the possible signal processes $\{[S_i(t)]\}$ take on only a finite set of possible

values. Then, if $[y]$ is an observation of one of the processes

$$[y_i(t)] = [N(t)] + [S_i(t)], \tag{V.2}$$

Bayes' rule* gives us

$$\mathrm{Pr}(i/[y]) = \frac{a_i\,\mathrm{Pr}_i([y])}{\sum_i a_i\,\mathrm{Pr}_i([y])} \tag{V.3}$$

where $\{a_i\}$ represents the a priori probabilities of the measures on the processes $\{[S_i(t)]\}$, $\mathrm{Pr}_i([y])$ is the probability of $[y]$ given $[S_i(t)]$ is the transmitted process, and $\mathrm{Pr}(i/[y])$ is the conditional or "a posteriori" probability of $i$ being the index of the transmitted process given $[y]$ is the received observation. The "Maximum A Posteriori Probability" choice of the index is then the least index $i$ for which

$$\mathrm{Pr}(i/[y]) \ge \mathrm{Pr}(j/[y]) \tag{V.4}$$

for all $j\ne i$. (In the general case where the noise and signal processes are not limited to a discrete set, the measure of the set for which more than one index can achieve this value has zero measure.) Moreover, if we divide the numerator and denominator of (V.3) by $\mathrm{Pr}_0([y])$ which denotes the probability of $[y]$ given noise alone is present, we find the index $i$ which satisfies (V.4) also satisfies

$$a_i\,\frac{\mathrm{Pr}_i([y])}{\mathrm{Pr}_0([y])} \ge a_j\,\frac{\mathrm{Pr}_j([y])}{\mathrm{Pr}_0([y])} \tag{V.5}$$

for all $j\ne i$ since $\mathrm{Pr}_0([y])$ and $\sum_i a_i\,\mathrm{Pr}_i([y])$ are independent of the index $i$.

*See Ref. 43, p. 18.

Suppose, now, the noise and signal processes are no longer limited to a finite set of possible values. In this case, we see that both $\mathrm{Pr}_i([y])$ and $\mathrm{Pr}_0([y])$ go to zero so, to generalize (V.5), it is reasonable to consider some type of limit operation of the form

$$\lim_{\xi\to 0}\frac{\mathrm{Pr}_i([y]\in A_\xi)}{\mathrm{Pr}_0([y]\in A_\xi)} \tag{V.6}$$

where $\{A_\xi\}$ is a monotone decreasing sequence of measurable sets containing $[y]$ which, in some sense, approaches $[y]$. To obtain an expression for this limit, let us denote by $P_0$ and $P_i$ the measures induced on the observation space by $\mathrm{Pr}_0$ and $\mathrm{Pr}_i$, respectively. Then, if $P_0$ and $P_i$ are equivalent, due to Theorem 7.14 we can write

$$P_i(A_\xi) = \int_{A_\xi}\frac{dP_i}{dP_0}\,dP_0 \tag{V.7}$$

where $dP_i/dP_0$ is the Radon-Nikodym derivative of $P_i$ with respect to $P_0$. It can now be seen, if $dP_i/dP_0$ approaches a constant value on $A_\xi$ as $\xi\to 0$, the limit expression of (V.6) becomes

$$\lim_{\xi\to 0}\frac{P_i([y]\in A_\xi)}{P_0([y]\in A_\xi)} = \frac{dP_i}{dP_0}. \tag{V.8}$$

But, according to Theorem 7.28, $P_i$ and $P_0$ are equivalent if there exists a bounded self-adjoint operator $H_i$ which satisfies Eq. (7.112) and, if such an operator exists, then

$$\frac{dP_i}{dP_0} = D_i^{-1}\exp\left\{\frac{1}{2}(H_i([y]),[y])\right\}. \tag{V.9}$$

Thus, if such an $H_i$ exists and if we define

$$A_\xi = \{[f]\in\Omega : \|[f]-[y]\| < \xi\}, \tag{V.10}$$

we see that, indeed, $dP_i/dP_0$ approaches a constant value on $A_\xi$ as $\xi\to 0$ since $H_i$ is a bounded operator.

Returning now to (V.5) and replacing $\mathrm{Pr}_i([y])/\mathrm{Pr}_0([y])$ by the limit expression of (V.6), which equals $dP_i/dP_0$ due to Eq. (V.8), we see that a reasonable "Maximum A Posteriori Probability" partition of the observation space places $[y]$ in $A_i$ if

$$a_i\,\frac{dP_i}{dP_0} \ge a_j\,\frac{dP_j}{dP_0}$$

for all $j\ne i$. But this is exactly the "Minimum Probability of Error" partition as shown in Eq. (V.1).

This completes the discussion of this appendix which, as stated in the opening paragraph, has been somewhat less than rigorous. Hopefully, however, it has given some additional insight into the degree of "optimality" enjoyed by this particular partition.
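A scalar toy version of the appendix's claim can be checked by Monte Carlo. The sketch below (three Gaussian hypotheses with assumed means and priors, and a standard-normal "noise-only" reference playing the role of $P_0$) shows that classifying by $\max_i a_i\,\mathrm{Pr}_i(y)$ and by $\max_i a_i\,\mathrm{Pr}_i(y)/\mathrm{Pr}_0(y)$ produce identical partitions, and that the resulting error rate is the MAP (minimum) one.

```python
import numpy as np
from scipy.stats import norm

# Sketch (toy scalar hypotheses): MAP partition == likelihood-ratio
# partition against a noise-only reference, as in (V.1) and (V.5).
rng = np.random.default_rng(3)
means, priors = np.array([-1.0, 0.0, 2.0]), np.array([0.2, 0.3, 0.5])
n = 200000

idx = rng.choice(3, size=n, p=priors)           # true hypothesis indices
y = rng.standard_normal(n) + means[idx]         # observations

post = priors * norm.pdf(y[:, None], loc=means, scale=1.0)  # a_i * Pr_i(y)
ratio = post / norm.pdf(y)[:, None]             # a_i * Pr_i(y) / Pr_0(y)
map_dec = post.argmax(axis=1)
ratio_dec = ratio.argmax(axis=1)

print((map_dec == ratio_dec).all())             # identical partitions
print((map_dec != idx).mean())                  # ~ minimum error probability
```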

REFERENCES 1. H. Hertz, Electric Waves, Macmillan Co., 1893. 2. G. Marconi, "Wireless Telegraphy," Journal of the Institute of Electrical Engineers, 28 (1899). 3. J. Zenneck, Wireless Telegraphy, McGraw-Hill Book Co., 1915. 4. S. G. Brown, "Directed Wireless Telegraphy," Electrician, 57 (1906). 5. R. Keen, Wireless Direction Finding, Iliffe and Sons Limited (1938). 6. E. Bellini and A. Tosi, "The Range and Advantages of Directive Aerials and the Bellini-Tosi Radiogoniometer," La Lumiere Electriue, 5 (1909). 7. R. S. Smith, "Electronic Observers for Radio Direction Finding," RRL Publ. No. 273, Radiolocation Research Laboratory, Univ. of Ill., Urbana, Ill., Feb. 1965. 8. W. J. Lindsay and D. S. Heim, "Design for Spinning Goniometer Automatic Direction Finding," J. of Res. of the National Bureau of Standards-D. Radio Propagation, 65D, No. 3, 237-243(May-June 1961). 9. C. E. Lindahl and B. F. Barton, "Analysis of a Detection Technique for a Spinning-Goniometer Direction-Finding System," IEEE Trans. on Aerospace and Navigational Electronics, 124-127 (June 1963). 10. J. L. Allen, "Array Antennas: New Applications for an Old Technique," IEEE Spectrum, 115-130 (Nov. 1964). 11. R. C. Hansen, editor of Microwave Scanning Antennas, Vols. 2-3, Academic Press, 1966. 12. S. A. Schelkunoff and H. T. Friis, Antennas, Theory and Practice, John Wiley and Sons, Inc., 1952. 13. K. J. Astrom, R. W. Koepcke, and F. Tung, "On the Control of Linear Discrete Dynamic Systems with Quadratic Loss," Res. Report RJ-222, IBM, San Jose Research Lab., San Jose, Calif., Sept. 1962. 14. A. A. Fel'dbaum, "The Theory of Dual Control, I-IV," Avtomatika i Telemekhanika, 21, Nos. 9, 11 (1960); 22, Nos. 1,2 (1961). 15. E. P. Maslov, "Markov Object Parameter Estimation," Avtomatika i Telemekhanika, 25, No. 1 (1964). 16. V. P. Zhivoglyadov, "Optimal Dual Control of Plants with Pure Delay," Avtomatika i Telemekhanika, 25, No. 1 (1964). 209

REFERENCES (Continued)

17. R. L. Stratonovich, "Certain Extremal Problems in Mathematical Statistics and Conditional Markov Processes," Teoriya Veroyatnostei i ee Primeneniya, 7, No. 2, 1962.
18. R. E. Mortensen, "Optimal Control of Continuous-Time Stochastic Systems," Ph.D. Thesis, Univ. of Calif., Berkeley, 1965.
19. W. M. Wonham, "Stochastic Problems in Optimal Control," 1963 IEEE International Convention Record, Part 2, pp. 114-124.
20. A. M. Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, Paris, 1806.
21. K. F. Gauss, Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, Dover Pub. Inc., 1963 (reprint).
22. K. Pearson, "Contributions to the Mathematical Theory of Evolution," Philos. Trans. Roy. Soc. London, 185 (1894).
23. R. A. Fisher, "On an Absolute Criterion for Fitting Frequency Curves," Mess. of Math., 41 (1912).
24. R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philos. Trans. Roy. Soc. London, 222 (1922).
25. R. A. Fisher, "Theory of Statistical Estimation," Proc. Cambridge Philos. Soc., 22 (1925).
26. N. Wiener, The Extrapolation, Interpolation, and Smoothing of Stationary Time Series, John Wiley and Sons, Inc., 1949.
27. A. N. Kolmogorov, "Interpolation und Extrapolation von stationären zufälligen Folgen," Bull. Acad. Sci. (USSR), Ser. Math., 5 (1941).
28. M. Loeve, Probability Theory, D. Van Nostrand Co., Inc., 1963.
29. J. L. Doob, Stochastic Processes, John Wiley and Sons, Inc., 1953.
30. P. R. Halmos, Measure Theory, D. Van Nostrand Co., Inc., 1950.
31. F. W. Lehan and R. J. Parks, "Optimum Demodulation," 1953 IRE National Convention Record, pp. 101-103.
32. D. C. Youla, "The Use of Maximum Likelihood in Estimating Continuously Modulated Intelligence which has been Corrupted by Noise," IRE Trans. on Information Theory, IT-3, 90-105 (March 1954).
33. D. Slepian, "Estimation of Signal Parameters in the Presence of Noise," IRE Trans. on Information Theory, IT-3, 68-89 (March 1954).

REFERENCES (Continued)

34. J. B. Thomas and E. Wong, "On the Statistical Theory of Optimum Demodulation," IRE Trans. on Information Theory, IT-6, 420-425 (Sept. 1960).
35. H. L. Van Trees, unpublished notes for an Electrical Engineering course offered by the Mass. Inst. of Tech., Cambridge, Mass.
36. T. T. Kadota, "Optimum Reception of Binary Gaussian Signals," The Bell System Technical Journal, 43, No. 6 (Nov. 1964).
37. T. T. Kadota, "Optimum Reception of Binary Sure and Gaussian Signals," The Bell System Technical Journal, 44, No. 8 (Oct. 1965).
38. T. T. Kadota, "Optimum Reception of M-ary Gaussian Signals in Gaussian Noise," The Bell System Technical Journal, 44, No. 9 (Nov. 1965).
39. T. T. Kadota, "Generalized Maximum Likelihood Test and Minimum Error Probability," IEEE Trans. on Information Theory, IT-12, 65-67 (Jan. 1966).
40. E. J. Kelley, I. S. Reed, and W. L. Root, "The Detection of Radar Echoes in Noise, I," J. Soc. Indust. Appl. Math., 8, No. 2, 309-341 (June 1960).
41. E. J. Kelley, I. S. Reed, and W. L. Root, "The Detection of Radar Echoes in Noise, II," J. Soc. Indust. Appl. Math., 8, No. 3, 481-507 (Sept. 1960).
42. R. E. Bellman, Dynamic Programming, Princeton Univ. Press, 1957.
43. W. B. Davenport and W. L. Root, Random Signals and Noise, McGraw-Hill Book Co., Inc., 1958.
44. A. E. Taylor, Functional Analysis, John Wiley and Sons, Inc., 1958.
45. T. S. Pitcher, "An Integral Expression for the Log Likelihood Ratio of Two Gaussian Processes," J. SIAM, Appl. Math., 14, No. 2, 228-233 (March 1966).
46. B. H. Bharucha, "Minimum Error Probability Detection for Infinitely Many Hypotheses," IEEE Trans. on Information Theory, IT-12, 65 (Jan. 1966).
47. H. R. Pitt, Integration, Measure and Probability, Oliver and Boyd, 1963.
48. C. Goffman and G. Pedrick, First Course in Functional Analysis, Prentice-Hall, Inc., 1965.

REFERENCES (Concluded)

49. N. Dunford and J. T. Schwartz, Linear Operators, Vol. 2, Interscience, New York, 1958.
50. J. Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, Inc., 1965.
51. W. L. Root, "Singular Gaussian Measures in Detection Theory," Proc. of the Sym. on Time Series Analysis, John Wiley and Sons, Inc., 1963.
52. W. Schmeidler, Linear Operators in Hilbert Space, Academic Press, 1965.
53. A. C. Zaanen, Linear Analysis, North-Holland Pub. Co., Amsterdam, 1953.
54. R. C. Buck, Studies in Modern Analysis, Vol. 1, Prentice-Hall, Inc., 1962.
55. F. Riesz and B. Sz.-Nagy, Functional Analysis, Frederick Ungar Pub. Co., New York, 1955.
56. E. J. Kelley and W. L. Root, "A Representation of Vector-Valued Random Processes," J. of Math. and Phys., 39, No. 3, 211 (Oct. 1960).
57. H. L. Royden, Real Analysis, Macmillan Co., 1963.
58. T. M. Apostol, Mathematical Analysis, Addison-Wesley Pub. Co., Inc., 1957.
59. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Book Co., 1964.
60. M. F. Gardner and J. L. Barnes, Transients in Linear Systems, John Wiley and Sons, Inc., 1958.
61. D. Middleton, Statistical Communication Theory, McGraw-Hill Book Co., Inc., 1960.
62. C.R.C. Standard Mathematical Tables, Chemical Rubber Pub. Co., 1955.
63. A. A. Ksienski, "Very High Resolution Techniques," Report No. 10, Contract No. DA-36-(039)-SC-90772, Hughes Aircraft Co., Culver City, Calif., Dec. 1964.
64. K. S. Miller, Multidimensional Gaussian Distributions, John Wiley and Sons, Inc., 1964.

DOCUMENT CONTROL DATA - R&D (Unclassified)

Originating activity: Systems Engineering Laboratory, The University of Michigan, Ann Arbor, Michigan.

Report title: Optimal Space-Time Signal Processing and Parameter Estimation with Emphasis on Its Application to Direction Finding (SEL Technical Report No. 16).

Author: Mueller, Donald G. Report date: October 1967. Total no. of pages: 225. No. of refs: 64. Contract no.: DA 28-043-AMC-01870(E). Project no.: 1P021101 A042. Originator's report number: ECOM-01870-14-T. Other report no.: 7695-14.

Availability/limitation notices: Each transmittal of this document outside the Department of Defense must have prior approval of CG, U. S. Army Electronics Command, Fort Monmouth, N.J. Attn: AMSEL-WL-S.

Sponsoring military activity: U. S. Army Electronics Command, Fort Monmouth, New Jersey 07703. Attn: AMSEL-WL-S.

Abstract: The purpose of this study is to develop techniques for processing signals received by a special array of K omnidirectional antennas so as to produce optimal direction-of-arrival estimates. Such estimates are useful for navigational purposes, in which case the signals are electromagnetic in nature, as well as for seismic investigations where the signals are mechanical vibrations transmitted through the earth. Another possible application is an underwater direction finding system which utilizes an array of hydrophones for its receiving antennas. Two distinct approaches to the problem are pursued which apply differing mathematical disciplines. The first approach models the array as an element of a phased array direction finding system and attempts to apply "Stochastic Optimal Control Theory" to produce optimal control laws for directing the pointing angle and specifying the beam width of the array. Optimal filtering of the signals is also considered. The controls and filter which result in "Minimum Error Variance" are considered optimal. The second approach applies "Estimation Theory" and considers the array only as an information gathering device whose signals are to be processed directly without the "prefiltering" present in phased arrays. The optimality criterion of this approach requires the direction-of-arrival estimate to result in a "Minimum Probability of Error" or, what is shown to be equivalent, to have "Maximum A Posteriori Probability."

Key words: Optimal control laws; Minimum error variance; Estimation theory; Functional analysis; Radon-Nikodym; Dynamic programming; Open-loop analysis.
