Technical Report No. 204 (036040-2-T)

MODELS FOR NONSTATIONARY STOCHASTIC PROCESSES WITH APPLICATIONS TO UNDERWATER ACOUSTICS

by P. P. Nuspl

Approved by: Theodore G. Birdsall

for COOLEY ELECTRONICS LABORATORY, Department of Electrical and Computer Engineering, The University of Michigan, Ann Arbor, Michigan

Contract No. N00014-67-A-0181-0032, Office of Naval Research, Department of the Navy, Arlington, Virginia 22217

July 1971

Approved for public release; distribution unlimited.

ABSTRACT

Some models for nonstationary stochastic processes are presented and applied to real data processes. The nonstationary processes studied are those with time-varying autocorrelation matrices. Since probability distributions are complete statistical descriptions of a random variable, the fundamental problems are to find conditional distributions for the process x given observations y in a time interval. The Markov condition and the Gaussian distribution are assumed for the process x, so it is called Gauss-Markov; the model is a set of linear, discrete-time difference equations, conveniently expressed in state-space notation. Presuming knowledge of the model parameters as well, the solutions are the Kalman-Bucy equations: the conditional mean and conditional covariance are computed recursively from past observations. The model parameters consist of the state-transition matrix, the driving noise covariance matrix and the observation noise covariance matrix. They are often not known, so a set of consistent estimators is given and then modified to become "tracking" estimators. A class of linear filters and digital processors is presented which have simple computer implementations. The data acquisition consists of simultaneous processing of hydrophone outputs to produce

signal power and noise power measurements of underwater acoustic receptions. The study shows that the signal power process is apparently second-order stochastic and that the second-order model is adequate as a smoothing and prediction filter. The noise process has shipping and biological noise which may be of interest in themselves, but which make effective modeling difficult; by selective elimination of such observations a study of ambient ocean noise is possible. We conclude that the models are useful in studying these nonstationary stochastic processes.

FOREWORD A thorough understanding of the fluctuating bandpass noise and signal levels encountered in complicated underwater acoustic propagation is necessary before intelligent signal processing can be developed to reduce the damaging effects of these fluctuations on sonar and communications. Closely coupled theoretical and experimental investigations are the scientific technique for obtaining such an understanding. This report is one attempt at developing a model of such fluctuating power levels. It uses the approach of Kalman-Bucy filters, investigates techniques to extend the models to fluctuating levels, and uses measurements made during MIMI propagation experiments to aid in the development and validate the methods. Although promising, the results clearly indicate that the complexity of the model must be extended to better match and predict these fluctuations. At another level, the technique indicates advances in measurement processing that should be incorporated in propagation and noise measurement research.

ACKNOWLEDGEMENTS

The research reported in this dissertation was supported by the Office of Naval Research, Department of the Navy, under Contract No. Nonr-1224(36), titled "Acoustic Signal Processing."

TABLE OF CONTENTS

ABSTRACT
FOREWORD
ACKNOWLEDGEMENTS
LIST OF ILLUSTRATIONS
LIST OF SYMBOLS

CHAPTER 1: INTRODUCTION
  1.1 Nature of the Problem
    1.1.1 Stochastic Processes
    1.1.2 Forms of Stationarity and Nonstationarity
    1.1.3 Models
    1.1.4 Objectives
    1.1.5 Acoustic Signal and Noise Processes
  1.2 Historical Background
  1.3 Plan of the Thesis
  1.4 Notation and Format

CHAPTER 2: STOCHASTIC PROCESSES AND MODEL EQUATIONS
  2.1 Introduction
  2.2 Stochastic Processes
    2.2.1 Markov Property
    2.2.2 Gaussian Process
    2.2.3 Ergodicity
  2.3 State-Variables and Matrix Notation
    2.3.1 Linear Vector Equation
    2.3.2 State Transition Matrix
    2.3.3 An Example
  2.4 Model Equations

CHAPTER 3: KALMAN-BUCY FILTER
  3.1 Introduction
  3.2 Kalman Filter
    3.2.1 Bayes Estimation
    3.2.2 Recursive Equations for x̂₁, x̂₂
    3.2.3 The Solutions
  3.3 Continuous-Time Filter
  3.4 Initial Conditions
  3.5 Performance Indices

CHAPTER 4: MODEL PARAMETER ESTIMATION
  4.1 Introduction
  4.2 Consistent Estimates for the Stationary Model
  4.3 Tracking Estimates for the Nonstationary Model
  4.4 Filtering of Field Data
  4.5 Initial Conditions on Statistics
  4.6 Summary of Recursive Equations of Model C

CHAPTER 5: A CLASS OF LINEAR FILTERS
  5.1 Introduction
  5.2 A Standard Form of a Class of Analog Filters
    5.2.1 Impulse Response
    5.2.2 Transfer Function
    5.2.3 The T-Diagram
    5.2.4 Special Cases: r = 1, r = ij
    5.2.5 General Case: Complex r
  5.3 A Standard Form of Digital Processors
    5.3.1 Sampling
    5.3.2 Equivalent Digital Processor
    5.3.3 MPR Block Diagram
    5.3.4 Examples
  5.4 Combined Operation
  5.5 Time-Domain Viewpoint

CHAPTER 6: DATA PROCESSING
  6.1 Introduction
  6.2 Computer Program Flow Diagram
  6.3 Monte-Carlo Tests
    6.3.1 Gauss-Markov Sample Functions
    6.3.2 Testing a Stationary Model
  6.4 Source of Field Data
    6.4.1 Underwater Acoustics
    6.4.2 Dual Comb Data Processing
    6.4.3 Processes T, N
  6.5 Experimental Procedure

CHAPTER 7: MODEL APPLIED TO THE T PROCESS
  7.1 Introduction
  7.2 T Process Sample Functions
  7.3 One-State Model, n = 1
    7.3.1 Coefficients in the Estimators
    7.3.2 Initial Conditions
    7.3.3 Filtered Process
    7.3.4 Performance Results
  7.4 Modified Two-State Model, n = 2
    7.4.1 Minimum Mean-Square Fit to Obtain Vector y
    7.4.2 Coefficients in the Estimators
    7.4.3 Filtered Process
    7.4.4 Performance Results
  7.5 Comments on the T Process

CHAPTER 8: MODEL APPLIED TO THE N PROCESS
  8.1 Introduction
  8.2 N Process Sample Functions
  8.3 One-State Model, n = 1
    8.3.1 Coefficients in the Estimators
    8.3.2 Initial Conditions
    8.3.3 Filtered Process
    8.3.4 Performance Results
  8.4 Comments on the N Process

CHAPTER 9: CONCLUSION
  9.1 Summary of Results
    9.1.1 Theory and Methods
    9.1.2 Data Processing
    9.1.3 Achieved Objectives
    9.1.4 Contributions
  9.2 Future Work

APPENDIX A: A TECHNICAL GLOSSARY
APPENDIX B: EXPECTATIONS OF PRODUCTS WITH Φ_{k,k-1} AS FACTORS
APPENDIX C: EXPECTATIONS OF QUARTIC TERMS FOR THE VECTOR CASE
APPENDIX D: EXPRESSIONS FOR THE OBSERVATION VECTOR
APPENDIX E: THE STEADY-STATE COVARIANCE Λ
REFERENCES
DISTRIBUTION LIST

LIST OF ILLUSTRATIONS

Figure  Title

4.1  Model C
4.2  a) Estimation of Φ̂_k, Q̂_k, R̂_k; b) Kalman filter block diagram, H = I, n×n
5.1  A delay line of T units
5.2  A tapped delay line filter
5.3  T-diagrams
5.4  Amplitude spectra |H(f)| for r = 1
5.5  a) Power spectra |H(f)|² for r = 0.90; b) Power spectra |H(f)|² for r = 1.0; c) Power spectra |H(f)|² for r = 0.5
5.6  |H(0)|² versus r
5.7  Sampled filter
5.8  Sampled circulating average
5.9  Simple circulating average
5.10  P-position circulating average
5.11  MPR processor
5.12  The complex block average
5.13  A cascade of block averages
5.14  A cascade of circulating average and block average
6.1  Preprocessing of data
6.2  Program flow chart
6.3  Flow chart for ICIN
6.4  Flow chart for OUTPUT
6.5  Gauss-Markov sequence sample function
6.6  Data-acquisition processing
7.1  a)-c) T process sample functions
7.2  T process sample function, one of every 5 data points
7.3  T process: estimate x̂_{k|k} for n = 1
7.4  T process: forecast x̂_{k+j|k} for n = 1
7.5  Estimates Φ̂_k, Q̂_k, R̂_k for n = 1
7.6  P_k, J_{0,k} for n = 1
7.7  Line fit for vector y_k
7.8  T process: estimate x̂_{k|k} for n = 2
7.9  T process: estimate x̂_{k|k} for n = 2
7.10  T process: estimate x̂_{k|k} for n = 2
7.11  Estimates Φ̂_k, Q̂_k, R̂_k for n = 2
7.12  Error covariance P_k and performance index J_{0,k}
7.13  T process: forecasts x̂_{k+j|k} for n = 2
8.1  N process sample function
8.2  Output of the nonlinear processor
8.3  N process: estimate x̂_{k|k} for n = 1, time-invariant model
8.4  N process: estimate x̂_{k|k} for n = 1, time-varying model
8.5  Model parameters Φ̂_k, Q̂_k, R̂_k
8.6  Model parameters Φ̂_k, Q̂_k with R̂_k = 1.0
8.7  N process: forecast x̂_{k+j|k} for n = 1
A.1  A limiter

LIST OF SYMBOLS

Ω  abstract space
ω  element in Ω
∈  indicates "member in"
r(·), x(·)  random variables
𝒯  indexing set
s, t, t_k  time
r(ω, t), r(t); x(ω, t), x(t), x_k; y(·, t), y(t), y_k  stochastic processes
E  expectation
R(τ), R(τ, t)  autocorrelation function
ℱ  σ-field of sets in Ω
A, B  sets in Ω (in context)
P(·)  probability measure
A|B  A given condition B holds
p(·)  probability density function
k  discrete time
x_k, y_k  column vector random variables at time k
Λ_x, Λ_k  covariance matrix of x
n  dimension of x_k
m  dimension of y_k
α, β  density parameters
≡  is identically equal to
≜  is defined to be
T  a transformation, in Theorem 2.1
T  the process studied in Chapter 7
𝒥  σ-field of invariant sets
M_a  time average over (−a, a)
R_a(λ)  time average over (−a, a)
F(t)  coefficient matrix
w(t)  vector driving function
Φ(t, t_0)  state transition matrix
I  identity matrix
Φ_k  Φ(k, k−1)
u_k  process equation driving term
Q_k  covariance matrix of u_k
H_k  observation matrix
v_k  observation equation driving term
R_k  covariance matrix of v_k
δ_pk  Kronecker delta: 1 if p = k, 0 otherwise
ℛ, ℛ_ms  risk defined in Chapter 3
x̂(·)  estimate of x
x̂_{k|k}  optimal estimate given past observations
x̃_{k|k}  error in the estimate
P_k  prediction error covariance matrix
P̂_k  filter error covariance matrix
A_k  optimal gain in the Kalman filter
x̂_{k+j|k}  prediction of x_{k+j} given observations up to time k
J_{N,k}  performance index, Eq. 3.54
Φ̂_k, Q̂_k, R̂_k  estimators for model parameters
c_{k,j}, d_{k,j}, e_{k,j}  coefficients in statistics
r, r_0, r_1  radices of coefficients in statistics
ρ(Φ)  spectral radius of Φ
A_0, Â_{0,k}; A_1, Â_{1,k}  statistics in Model A
B_0, B̂_{0,k}; B_1, B̂_{1,k}  statistics in Model B
S⁻¹  inverse of matrix S
S⁺  pseudo-inverse of matrix S
S_ε  the ε-sphere about r
h(τ)  impulse response of a filter
H(f)  Fourier transform of h(τ)
T, Δ  discrete time interval lengths
M  number of observations averaged
P  number of positions in MPR filter
r  radix of coefficients in MPR filter
BAV  block average
CIRAV  circulating average
MPR  multi-positional recirculator
δ_ε(t)  a sequence of delta functions, ε spaced
N  the process studied in Chapter 8
σ²  scalar variance
χ²  chi-squared statistic

CHAPTER 1: INTRODUCTION

Synopsis

1.1 Nature of the Problem: stochastic processes; forms of stationarity and nonstationarity; models; objectives; acoustic signal and noise processes
1.2 Historical Background
1.3 Plan of the Thesis: introduction; model development; data processing; conclusion
1.4 Notation and Format

1.1 Nature of the Problem

This thesis is a study of models for nonstationary stochastic processes. The key words in the title are introduced and the nature of the research problem is described in this section. The theory and the methods of dealing with this problem, and the data processing and results, are the main part of this study.

1.1.1 Stochastic Processes. Beginning with the fundamental concepts in probability theory, some common terms in stochastic process theory are defined. A random variable* is a measurable function defined on the abstract space Ω = {ω}. A notation often used is r(·), and r(ω) is a particular value called a sample value. Where there is no possibility of confusion, the notation is often abbreviated to r for a random variable, and the underlying space and probability structure are understood. A stochastic process is a family of random variables indexed by a set 𝒯. The usual notation is r(·, t) for t ∈ 𝒯, and r(ω, t) is an indexed set of values called a sample function; there is such a sample function for each ω ∈ Ω. When there is no confusion, the notation r(t) is commonly used. Likewise, for each fixed t, r(·, t) is a random variable. The use of the letter t in the indexing set

*Underlined terms are also defined in Appendix A.

suggests that "time" is the most common indexing parameter. Furthermore, in later chapters, the indexing set 𝒯 will be the positive integers, which corresponds to discrete time. A discussion of the necessary topics in stochastic process theory is given in Chapter 2.

1.1.2 Forms of Stationarity and Nonstationarity. Nonstationarity is a negative concept, so it is defined as various forms of negations of stationarity. A strict-sense stationary process (SSSP) has all its statistics unaffected by a shift in the time origin. A wide-sense stationary process (WSSP) has an expected value which is constant and an autocorrelation which depends only on the time difference:

$$E\{r(\omega, t)\} = \mu = \text{constant} \qquad (1.1)$$

$$E\{r(\omega, t+\tau)\, r(\omega, t)\} = R(\tau) \qquad (1.2)$$

In later chapters, only Gaussian processes will be used, so an important remark should be made. For a Gaussian process, since the mean and autocorrelation uniquely determine all statistics, strict-sense and wide-sense properties are the same. Nonstationary processes are studied in this thesis. The particular type of nonstationarity allowed is that described by the phrase not wide-sense stationary. Another statement of this condition is that the mean and autocorrelation are allowed to be time-varying:

$$E\{r(\omega, t)\} = \mu(t) \qquad (1.3)$$

$$E\{r(\omega, t+\tau)\, r(\omega, t)\} = R(\tau, t). \qquad (1.4)$$

Without loss of generality, a further stipulation is usually made that the (unconditional) expectation be zero for all t. Thus, there remains to specify only the second-order statistic R(τ, t).

1.1.3 Models. The first and most significant word in the title is the word "models"; in mind, of course, are mathematical models. A natural question which comes up in such work is, "Why are models developed?" A brief discussion of this fundamental point seems apropos, and several reasons for modeling are given. Often, a model is the only tractable description of the physical process being studied; indeed, the real world is very complex and many factors are involved, so that judgement and simplification are necessary to describe the process. In many human endeavors, there is often an ignorance (used in the dictionary sense, and not as derogatory) of the physical laws, or at best we have sparse information on governing laws. Modeling, then, can be a technique to describe our observations. A third reason for modeling might be an inability to solve physical laws even though they are readily described. Such solutions might be unavailable due to the high dimensions involved or due to non-computability, or they might not be economically feasible.

These are some of the reasons for modeling. The reader will easily find examples of these points in his own field of study. The application of such models in this work is to acoustic signal processes.

1.1.4 Objectives. The main objective is to study a class of stochastic processes, and a stated purpose is to develop models for such processes. To be more specific, two classes of general problems will be treated: the first is called the filtering problem and the second is called the prediction problem.

The Filtering Problem: Given the observed values of a stochastic process over some interval of time, find the conditional probabilities of all values of another related stochastic process.

The Prediction Problem: Given the observed values of a stochastic process over some interval of time, find the conditional probabilities of all future values of the stochastic process or some related stochastic process.

There is another objective. Since nonstationary stochastic processes are being studied and since the models will have time-varying parameters, a purpose of the experimental part of this thesis is to accumulate information on such statistics and parameters. Such information may lead to further understanding of the physical laws or

to more sophisticated models.

1.1.5 Acoustic Signal and Noise Processes. Oceanographic acoustic signals and noise processes have already been mentioned as the specific area of application of the theory and models. While a deterministic model has many advantages and may be practical under a controlled set of circumstances, a model with stochastic inputs seems entirely appropriate for the solutions sought. In the opinion of the writer, a stochastic model, in which the parameters are themselves unknown and stochastic parts of such a model, is a reasonable approach to the problems. This necessity for a stochastic study is elaborated upon below.

Consider the physical system under study: an ocean or a large part of it. It is a large body of water which is under constant change; it is affected by daily, lunar and seasonal tides, and by weather conditions. Ambient ocean noise is affected by the temperature and salinity of the water, and is contributed to by earth tremors, by biological noise and by man-made sounds from inside and outside. Propagation conditions can bring sound from several hundred miles and yet "mask out" nearby sources. How can we account for the vagaries of a school of fish, the path of a rain storm or a ship? Of course, this study is not of such detail. The interest is in a gross description of the net effect of all these causes and not the causes themselves. The approach is to observe the process, model it and then process or filter.

This leads to a philosophy of stochastic models which is briefly described. Under strictly ergodic assumptions, a "measurement of the system" can be made, and if this can be done often enough, perhaps the result will be a smooth picture of the actual process. However, such an approach is not acceptable due to the inherent "steps" in the model or the need for much processing. Rather, the philosophy adopted by a model-builder is that reasonable and available structure must be built into the model at the outset. This admits a time-varying model and a recursive solution which is smoothed and economical when the required conditions are met. This is a personal viewpoint and is subject to critical review.

1.2 Historical Background

A brief historical background will include the major contributions to the topics mentioned above. The foundations of probability theory were laid down by Kolmogorov (1933), and within the last 25 years stochastic process theory has become a branch of mathematics (Ref. 5). Probability had early application in such areas as detection theory and communication theory, but stochastic process theory has had engineering application only within the last 15 years. A celebrated stochastic process is the Gauss-Markov process, which found application in the development of the now-classical Kalman-Bucy filter (Ref. 11). A survey of the literature indicates extensive use of these concepts (Ref. 4), though publications in the open literature

are still sparse.

There are other areas of background interest to this report. Parameter estimation is well treated in writings on statistics, and yet it is an area of present-day research, with conditions such as nonstationarity and dimensionality to investigate. At the foundation of the processes to be modeled sit extensive studies on acoustics and oceanography (Refs. 18 and 21). These disciplines have traditionally taken a physical approach and they seek laws and cause-effect relations. The stochastic modeling approach, on the other hand, depends on physics for guidance and direction but attempts to obtain statistical descriptions directly from observations. Through this sketch of the background of the problem, the interested reader will find additional material to complete the picture according to his needs. This introduction to the problem is intended to be general and indicative of the methods used in the following chapters. A plan of the thesis follows.

1.3 Plan of the Thesis

This chapter is introductory and states the problems and methods of solution. The body of the thesis can be thought of in two parts: one is model development, Chapters 2, 3, 4, and the other is data processing, Chapters 5, 6, 7, and 8. Chapter 2 states results of stochastic process theory which are essential to the development and presents the state vector and

matrix representation of the model to be used. Chapter 3 contains a derivation of the recursive equations and states the formal theorem which is the result known as the Kalman-Bucy filter. In Chapter 4, two classes of parameter estimation schemes are discussed. For a stationary process, and under other stated conditions, a consistent set of estimators of the model parameters has been found. For a nonstationary process with certain other properties, a set of estimators is proposed and evaluated. Chapter 5 is a self-contained discussion of a class of linear filters and digital processors which are used in the data processing of Chapters 6, 7 and 8. Chapter 6 is a general description of the required data processing, while Chapter 7 is an application to a process which is a signal power measurement and Chapter 8 is an application of the model to a process which is a noise power measurement. The last chapter summarizes some general observations and concludes the thesis.

1.4 Notation and Format

A special comment on notation seems desirable. The notation in the literature on such diverse topics is, understandably, diverse. Wherever possible, the notation in this thesis has been made the same as that in the literature on Kalman-Bucy filters; in particular, the notation in Bucy and Joseph (Ref. 4) is followed with but a few exceptions. Other notation is consistent with this policy and

the present practice of journals. To assist the reader, a List of Symbols with explanations is included. The format of the thesis is unusual in two respects. A synopsis of each chapter is given at the beginning of the chapter; hopefully, this synoptic presentation will assist the reader. Also, definitions of many technical terms have been collected in the first appendix, which is a technical glossary. Readers from different backgrounds will find this an assistance.

CHAPTER 2: STOCHASTIC PROCESSES AND MODEL EQUATIONS

Synopsis

2.1 Introduction
2.2 Stochastic Processes: Markov property; Gaussian process; ergodicity
2.3 State Variables and Matrix Notation: definitions; linear vector model; state transition matrix; example
2.4 Model Equations: conditions of the model; process equation; observation equation; initial conditions

2.1 Introduction

This chapter presents the elements of stochastic process theory which are needed. A review of conditional probability and conditional expectation is given, followed by a discussion of some properties of a Markov process. The processes studied will be shown to be Gaussian, so that they are usually called Gauss-Markov (GMP). A brief discussion of ergodicity is relevant at that point. Another section contains some features of state variable theory, including the matrix differential and difference equation representations of systems of equations. Some of the advantages of this theory are stated and an example will illustrate these concepts. With the preliminary technical background completed, the model is presented, accompanied by general discussions on conditions and validity of such a model.

2.2 Stochastic Processes

Stochastic process theory is a modern and growing branch of probability theory. A presentation such as this can give only the essential elements needed in this development. Fundamental concepts in probability are conditional probability and conditional expectation. Their general definitions are given in modern terminology in Appendix A, and their more usual definitions are given here.

Conditional probability: If the random variable y takes values in a set B and if P(y ∈ B) > 0, then the conditional probability of A

given y ∈ B is defined by

$$P(A \mid y \in B) = \frac{P(A,\; y \in B)}{P(y \in B)} \qquad (2.1)$$

Conditional expectation: When the conditional probability P(A | y ∈ B) exists, the conditional expectation of x given y ∈ B is defined by

$$E\{x \mid y \in B\} = \int_A X \, dP(x \in A \mid y \in B) \qquad (2.2)$$

It is significant to note that both of these quantities are random variables and that they each enjoy the properties of probabilities and expectations, respectively. The modern writings on probability put considerable emphasis on the generalizations of these concepts and their results; for an understandable treatment, the reader should consult a text such as that by Breiman (Ref. 3). When the probability characterization can be expressed as probability density functions, as is the case in this thesis, the conditional probability density and conditional expectation take the following forms:

$$p(x \mid y_1, y_2, \ldots, y_k) = \frac{p(x, y_1, y_2, \ldots, y_k)}{p(y_1, y_2, \ldots, y_k)} \qquad (2.3)$$

$$E(x \mid y_1, y_2, \ldots, y_k) = \frac{\int x\, p(x, y_1, y_2, \ldots, y_k)\, dx}{p(y_1, y_2, \ldots, y_k)} \qquad (2.4)$$
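Equations 2.3 and 2.4 can be exercised numerically. The sketch below is an illustration only; the grid, the example joint density, and all names are assumptions, not part of the report. It discretizes a joint density p(x, y), forms the conditional density of Eq. 2.3, and integrates per Eq. 2.4 to get a conditional mean.

```python
import numpy as np

# Hypothetical illustration of Eqs. 2.3-2.4: conditional mean of x given y,
# computed from a discretized joint density p(x, y) on a grid.
xs = np.linspace(-4.0, 4.0, 401)
ys = np.linspace(-4.0, 4.0, 401)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Example joint density: zero-mean bivariate Gaussian, correlation 0.8.
rho = 0.8
p_xy = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*(1 - rho**2)))
p_xy /= np.trapz(np.trapz(p_xy, ys, axis=1), xs)   # normalize numerically

j = np.searchsorted(ys, 1.0)                        # condition on y = 1.0
p_x_given_y = p_xy[:, j] / np.trapz(p_xy[:, j], xs) # Eq. 2.3
cond_mean = np.trapz(xs * p_x_given_y, xs)          # Eq. 2.4
print(cond_mean)
```

For the bivariate Gaussian used here the exact answer is E{x | y} = ρy = 0.8, which the grid computation reproduces to a few decimal places.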

2.2.1 Markov Property. The definitions of a random variable and a stochastic process were given in Chapter 1. A special class of processes is now defined: a Markov process is a stochastic process with the following Markov property.

$$P(x(t_{k+1}) \le X \mid x(t_k), \ldots, x(t_1)) = P(x(t_{k+1}) \le X \mid x(t_k)) \quad \text{a.s.} \qquad (2.5)$$

for any t_{k+1} > t_k > … > t_0 > 0 and for any X. (Almost surely, a.s., or "with probability 1" are technical terms in probability.) This condition is more than adequate for this presentation, yet a more general definition is given in Appendix A. Strictly speaking, the Markov property is a statement about the conditional probability at the one instant t_{k+1} in the future. However, this extends to a general statement about the future, given the present and past.

Proposition: Let A be a set of future values of the Markov process x(t); then

$$P(A \mid x(s),\, s \le t) = P(A \mid x(t)) \quad \text{a.s.} \qquad (2.6)$$

Speaking intuitively and imprecisely, the statement is often made that, for a Markov process, the future is independent of the past when the present is given. So far, the random variable x(t) (or x_k) has been considered to be a scalar, and such a process is often called Markov-1.

The extension to the vector case is straightforward. Henceforth, let x denote a column vector of n elements,

$$x = \begin{bmatrix} x^{(1)} \\ \vdots \\ x^{(n)} \end{bmatrix} \qquad (2.7)$$

Its m'th component is x^{(m)} and will seldom be needed as notation, and x_k denotes the vector random variable at the (discrete) time k. Similar Markov properties are given to the vector x, and the process is then called Markov-n or vector Markov-1.

2.2.2 Gaussian Process. A further condition placed on the process is that every finite set of the x_k's must be jointly Gaussian; it is a Gaussian process. This is a very important practical constraint, because only for this case have extensive results been found. The Gaussian probability density function is familiar to the reader in the form

$$p_x(X) = \frac{1}{(2\pi)^{n/2}\, |\Lambda_x|^{1/2}} \exp\left[-\tfrac{1}{2}(X - \mu_x)'\, \Lambda_x^{-1}\, (X - \mu_x)\right] \qquad (2.8)$$

where μ_x is the vector mean, μ_x = E{x}, and Λ_x is the covariance matrix, Λ_x = E{(x − μ_x)(x − μ_x)'}. Familiarity with the properties of this distribution is presumed.
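As a concrete companion to Eq. 2.8, the following sketch (illustrative; the mean, covariance, and function names are mine, not the report's) evaluates the n-variate Gaussian density directly from the definition.

```python
import numpy as np

def gaussian_pdf(X, mu, Lam):
    """Evaluate Eq. 2.8: the n-variate Gaussian density at X with mean
    vector mu and covariance matrix Lam (symmetric positive definite)."""
    n = mu.size
    d = X - mu
    # Compute the quadratic form d' Lam^{-1} d via a linear solve
    # rather than an explicit matrix inverse.
    quad = d @ np.linalg.solve(Lam, d)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Lam))
    return np.exp(-0.5 * quad) / norm

mu = np.zeros(2)
Lam = np.array([[2.0, 0.5],
                [0.5, 1.0]])
print(gaussian_pdf(np.array([0.3, -0.1]), mu, Lam))
```

The linear solve is a standard numerical precaution, not something the report prescribes.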

The actual processes to be studied in later chapters are acoustic signal power measurements expressed in dB (decibels). A brief note (Ref. 12) in reference to a recent article (Ref. 15) shows that such processes are Gaussian; that proof is adequate.

Proposition: Let z(t) be a stationary Gauss-Markov process with the continuous covariance function R(t, u) = exp[−α|t − u|]. Define the statistic

$$y = \log \int_{-a}^{a} z^2(t)\, dt \qquad (2.9)$$

and let β be the corresponding density parameter. Then the probability density for y is approximately

$$p_y(Y) = \frac{1}{(2\pi/\beta)^{1/2}} \exp\left[\beta - \beta \cosh Y - Y/2\right] \qquad (2.10)$$

The significance of this result is that it is the only definite assurance that the processes under study are Gaussian, which is an essential ingredient of the recipe. The other necessary condition is the Markov property, so that Gauss-Markov theory can be applied.

2.2.3 Ergodicity. Another fundamental concept is ergodicity, which concerns the statistics of a stochastic process. The intuitive

expression of the ergodic property is that time averages equal ensemble averages, and this presumes that relevant statistics of a process can be determined from a single sample function x(ω, t). The formal definitions of an ergodic process and an ergodic transformation require several concepts from measure theory, so these definitions are given in Appendix A. For a discussion of ergodicity, the reader should consult the book by Breiman (Ref. 3). The ergodic property is usually stated as the following ergodic theorem.

Theorem 2.1. Let T be measure-preserving on (Ω, ℱ, P). Then for x any random variable such that E|x| < ∞,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} x(T^k \omega) = E\{x \mid \mathcal{J}\} \quad \text{a.s.} \qquad (2.11)$$

where 𝒥 is the σ-field of invariant sets. Such ergodic properties have considerable significance in the experimental study of physical systems. Whenever a measurement is made, an average of observations is taken or any other filtering is accomplished, some form of ergodicity is being used. Specifically, local ergodicity is often required; the ergodic property is assumed to hold over an interval (−a, a) but nothing is said outside this interval. Consider the time average

$$M_a = \frac{1}{2a} \int_{-a}^{a} x(t)\, dt. \qquad (2.12)$$

Since x is a random variable, so is M_a, but its expectation

$$E\{M_a\} = E\{x(t)\} \qquad (2.13)$$

is a constant if x is stationary, or a known function of t if x is nonstationary. Furthermore, if the variance of M_a tends to zero as the interval becomes large, the process x is said to be ergodic in the mean. Conditions for this and similar properties are discussed by Papoulis (Ref. 13) and some examples are given there. A form of ergodicity used in Chapter 4 is called ergodicity of the autocorrelation; this means that the time average

$$R_a(\lambda) = \frac{1}{2a} \int_{-a}^{a} x(t+\lambda)\, x(t)\, dt \qquad (2.14)$$

has a limiting value lim_{a→∞} R_a(λ) = R(λ), which is the autocorrelation of x, and this occurs if and only if

$$\lim_{a \to \infty} \frac{1}{2a} \int_0^{2a} \left(1 - \frac{\tau}{2a}\right) \left(R_\phi(\tau) - R^2(\lambda)\right) d\tau = 0 \qquad (2.15)$$

where

$$\phi(t) = x(t+\lambda)\, x(t) \qquad (2.16)$$

$$R_\phi(\tau) = E\{x(t+\lambda+\tau)\, x(t+\tau)\, x(t+\lambda)\, x(t)\}. \qquad (2.17)$$
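A short Monte Carlo sketch makes the ergodic properties above tangible. Everything in it is an assumed illustration (a first-order Gauss-Markov sequence with chosen coefficients), not data or parameters from the report: the time average of one long sample function settles toward the ensemble mean, as ergodicity in the mean requires.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, q = 0.9, 1.0                  # AR(1) coefficient and driving variance
n = 200_000
x = np.empty(n)
x[0] = rng.normal(scale=np.sqrt(q / (1 - phi**2)))  # start in steady state
for k in range(1, n):
    x[k] = phi * x[k - 1] + rng.normal(scale=np.sqrt(q))

# The average over an ever-longer window plays the role of M_a in Eq. 2.12;
# its variance shrinks as the window grows, and it tends to E{x(t)} = 0.
for a in (100, 10_000, 200_000):
    print(a, x[:a].mean())
```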

Such an ergodic property for second-order statistics seems necessary, so it is assumed henceforth and used by the estimators of Chapter 4.

2.3 State-Variables and Matrix Notation

In the study of dynamical systems, the classical technique used to describe such systems is the formulation of differential and difference equations. Such theory is then concerned with the existence and uniqueness of solutions under various boundary conditions. A modern formulation of the last 15 years, called state-variable theory, casts these same equations in matrix differential-difference form. The result is a time-domain formulation which reverses the trend to time-invariant transform theory. Some of the major ideas in this theory will be presented, for the model will be defined in this manner.

2.3.1 Linear Vector Equation. This presentation is limited to the linear case, though the theory is not. Linearity is very important in many applications and makes possible the pleasing results of the next chapter. A linear set of differential equations is often an easy and direct way to describe a physical system; the equations model the physical system, sometimes adequately but usually for only general characteristics. In general, a system of order n is studied, but all the examples below are for n = 1 or n = 2.

$$\frac{d^2 z(t)}{dt^2} + a(t)\frac{dz(t)}{dt} + b(t)\, z(t) = c(t) \qquad (2.18)$$

$$z(0) = z_0 \qquad (2.19)$$

$$\frac{dz(0)}{dt} = v_0 \qquad (2.20)$$

This equation is a second-order linear differential equation in its familiar form; in general, the coefficients a and b are known functions of time and c(t) is a known "driving function." To specify the solution of an n'th-order equation, the values z(t), …, d^{n−1}z(t)/dt^{n−1} at t_0 are needed, and n variables are required to define the solution at any time. For n = 2, define

$$x^{(1)}(t) = z(t), \quad x^{(2)}(t) = \frac{dz(t)}{dt}, \quad \text{and} \quad x(t) = \begin{bmatrix} x^{(1)}(t) \\ x^{(2)}(t) \end{bmatrix} \qquad (2.21)$$

x(t) is called the state vector of the system because it describes the state of the system, which is the minimal amount of information at t_0 about the effects of past inputs necessary to describe completely the output for t > t_0. The set of all possible states is the state space. A simple derivation of the state equation for Eq. 2.18 is given. Observe that

$$\frac{dx^{(1)}(t)}{dt} = x^{(2)}(t) \qquad (2.22)$$

$$\frac{dx^{(2)}(t)}{dt} = \frac{d^2 z(t)}{dt^2} = c(t) - a(t)\, x^{(2)}(t) - b(t)\, x^{(1)}(t) \qquad (2.23)$$

so that the state equation is the vector-matrix first-order differential equation

$$\frac{dx(t)}{dt} = F(t)\, x(t) + w(t) \qquad (2.24)$$

where

$$F(t) = \begin{bmatrix} 0 & 1 \\ -b(t) & -a(t) \end{bmatrix}, \qquad (2.25)$$

$$w(t) = \begin{bmatrix} 0 \\ c(t) \end{bmatrix} \quad \text{and} \quad x(0) = \begin{bmatrix} z_0 \\ v_0 \end{bmatrix} \qquad (2.26)$$

The matrix F is called the coefficient matrix and may be time-varying, and w(t) is the vector driving term. The solution of Eq. 2.24 and conditions for existence and uniqueness are well known. The solution consists of a homogeneous part and a particular part in the form

$$x(t) = \Phi(t, t_0)\, x(t_0) + \int_{t_0}^{t} \Phi(t, \tau)\, w(\tau)\, d\tau \qquad (2.27)$$

where Φ(t, t_0) is discussed next.

2.3.2 State Transition Matrix. Φ(t, t_0) is the state transition matrix and is defined to be the solution of the differential equation

$$\frac{d\Phi(t, t_0)}{dt} = F(t)\, \Phi(t, t_0) \qquad (2.28)$$

with initial condition Φ(t_0, t_0) = I. This matrix has many interesting properties, two of which will be needed in following chapters. It is positive definite and always has an inverse Φ^{-1}(t, t_0). For the time-invariant case, F is constant and Φ is a function of the difference t − t_0 only. A simple analytical solution for Φ is then always possible. But for the time-varying case, no such easy solution has been found, and it is usually most expedient to integrate the differential equation (Eq. 2.28). There lies the crux of the problem in dealing with time-varying systems or nonstationary processes. For small dimension, such computations are feasible, and practical application has been made using n = 9; but large dimensionality precludes the use of the full theory. For a comprehensive treatment of linear system theory, the book by Zadeh and Desoer (Ref. 23) is an excellent source. There are short sections on this topic in many of the references cited in later chapters. Some of the advantages of the state-variable formulation should

be pointed out. Stochastic, nonstationary processes and finite time intervals are easily included. Second, the solution is in a form such that it can be implemented on a computer. Also, there is a strong duality between estimation and control theory which this theory can exploit. A fourth advantage is that nonlinear systems can be studied by this approach. A list of advantages should be tempered by some disadvantages. Closed-form or intuitively pleasing results are often difficult to obtain, and also, some problems are not so readily solved by this approach.

2.3.3 An Example. A simple example will illustrate the theory and prepare for a later application. Consider the time-invariant equation (Eq. 2.24) with n = 2 and

$$F = \begin{bmatrix} 0 & 1 \\ -\lambda^2 & 0 \end{bmatrix} \qquad (2.29)$$

This is recognized as having a homogeneous solution which is undamped oscillatory. The solution for Φ from Eq. 2.28 is readily obtained as

$$\Phi(t, t_0) = \begin{bmatrix} \cos \lambda(t-t_0) & \frac{1}{\lambda} \sin \lambda(t-t_0) \\ -\lambda \sin \lambda(t-t_0) & \cos \lambda(t-t_0) \end{bmatrix} \qquad (2.30)$$

and so from Eq. 2.27

$$x(t) = \begin{bmatrix} x_0 \cos \lambda(t-t_0) + \frac{v_0}{\lambda} \sin \lambda(t-t_0) \\ -x_0 \lambda \sin \lambda(t-t_0) + v_0 \cos \lambda(t-t_0) \end{bmatrix} + u(t) \qquad (2.31)$$

The first is the homogeneous part and the second is the particular part,

$$u(t) = \int_{t_0}^{t} \Phi(t, \tau)\, w(\tau)\, d\tau \qquad (2.32)$$

The change to discrete time and difference equations will now be made. Let t take on only integer values t = k and t_0 = 0. Then

$$\Phi(t, t_0) = \Phi(k, 0) \qquad (2.33)$$

and by the time-invariance of this solution for Φ,

$$\Phi(k, k-1) = \Phi(1, 0) \quad \text{for all } k. \qquad (2.34)$$

In general, for the time-varying case, Φ is a function of both variables and not just their difference. Because of the iterative nature of the following formulation of the model and its subsequent use, there is interest only in Φ(k, k−1), which will be denoted by the abbreviation

$$\Phi_k \triangleq \Phi(k, k-1) \qquad (2.35)$$

and called the one-step state transition matrix. Similarly, the

particular part will have the notation

$$u_k \triangleq \int_{k-1}^{k} \Phi(k, \tau)\, w(\tau)\, d\tau \qquad (2.36)$$

and {u_k} is a sequence of inputs. The discrete form of the state equation is the vector equation

$$x_k = \Phi_k\, x_{k-1} + u_k \qquad k \ge 1. \qquad (2.37)$$

An approximation will be used in Chapter 7, which requires the following development. The one-step transition matrix Φ_k of the example is

$$\Phi_k = \begin{bmatrix} \cos \lambda & \frac{1}{\lambda} \sin \lambda \\ -\lambda \sin \lambda & \cos \lambda \end{bmatrix} \qquad (2.38)$$

In the applications, the system natural frequency λ will be small, λ << 1, so a reasonable approximation is

$$\Phi_k = \begin{bmatrix} \phi_k & 1 - \frac{\lambda^2}{6} \\ -\lambda^2 \left(1 - \frac{\lambda^2}{6}\right) & \phi_k \end{bmatrix} \qquad (2.39)$$

where φ_k = 1 − λ²/2 and could be time-varying.
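The approximation of Eq. 2.39 is easy to check numerically. The sketch below compares the exact one-step transition matrix of Eq. 2.38 with the truncated form, using the equations as reconstructed above; the value of λ is an arbitrary illustration.

```python
import numpy as np

lam = 0.05                          # system natural frequency, lam << 1

# Exact one-step transition matrix, Eq. 2.38.
Phi_exact = np.array([[np.cos(lam),        np.sin(lam) / lam],
                      [-lam * np.sin(lam), np.cos(lam)]])

# Truncated-series approximation, Eq. 2.39 (as reconstructed here).
phi_k = 1 - lam**2 / 2
Phi_approx = np.array([[phi_k,                      1 - lam**2 / 6],
                       [-lam**2 * (1 - lam**2 / 6), phi_k]])

print(np.max(np.abs(Phi_exact - Phi_approx)))
```

The maximum entrywise error is of order λ⁴, negligible for λ << 1.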

2.4 Model Equations

The stochastic sequence will be modeled by a set of two vector equations; one is called the process equation and the other is called the observation equation; together, they will be called the model equations.

$$\text{Process:} \quad x_k = \Phi_k\, x_{k-1} + u_k \qquad (2.40)$$

$$\text{Observation:} \quad y_k = H_k\, x_k + v_k \qquad (2.41)$$

For the vector case, x_k and u_k are n-dimensional vectors and y_k and v_k are m-dimensional vectors. The sequences {u_k} and {v_k} are assumed to be Gaussian white noise (GWN) sequences with zero mean; their properties are completely characterized by their covariances. The covariance Q_k for u_k and R_k for v_k are defined as shown:

$$E\{u_p\, u_k'\} = \delta_{pk}\, Q_k \qquad (2.42)$$

$$E\{v_p\, v_k'\} = \delta_{pk}\, R_k \qquad (2.43)$$

where δ_pk is the Kronecker δ. The n×n matrix Φ_k has already been discussed as the state transition matrix for the process x_k. The observation matrix H_k is m×n and is the link between the

observation and the process. v_k is the observation noise and is part of the reason why the observation does not contain full information about x. The other part of the reason is that H is m×n dimensional, m ≤ n, and so for m < n not all of x is observable. {u_k} is called the driving stochastic sequence (by analogy to the case when u_k is deterministic) and is the reason why {x_k} is a random sequence. Since the driving variables u_k and v_k are Gaussian and the equations are linear, the process and observation random variables, x_k and y_k, are Gaussian. Furthermore, by the zero-mean properties of u_k and v_k, x_k has zero mean a priori and so has y_k. But given observations, the conditional means will not be zero. The conditional mean of the process x_k given all past observations will be shown to be the required solution for the filtering problem. The Markov property of x_k is evident from the process equation. The recursive relation establishes this, and any Markov sequence can be put into this form. The dimension of x is n and determines the order of the Markov process. The model has considerable intuitive appeal. An n-th order linear dynamical system is represented by the matrices Φ and H. This system has the input u_k, a stochastic sequence which is well described, and observation "noise" v_k, also a well-defined stochastic process. We ask for the probability characterization of the output process x, given the observations y.
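The model equations are directly simulatable. The following sketch generates a sample function of the process and its observations under Eqs. 2.40-2.43; the particular parameter values (a time-invariant Φ, Q, R with H = I) are assumptions chosen for illustration, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = np.array([[0.95, 0.10],
                [0.00, 0.90]])      # one-step transition matrix (assumed)
H   = np.eye(2)                     # completely observable case, H = I
Q   = 0.01 * np.eye(2)              # driving noise covariance (assumed)
R   = 0.10 * np.eye(2)              # observation noise covariance (assumed)

x = np.zeros(2)                     # zero-mean initial state
states, observations = [], []
for k in range(500):
    u = rng.multivariate_normal(np.zeros(2), Q)   # u_k ~ N(0, Q), Eq. 2.42
    v = rng.multivariate_normal(np.zeros(2), R)   # v_k ~ N(0, R), Eq. 2.43
    x = Phi @ x + u                               # process, Eq. 2.40
    states.append(x)
    observations.append(H @ x + v)                # observation, Eq. 2.41
```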

CHAPTER 3: KALMAN-BUCY FILTER

Synopsis

3.1 Introduction
3.2 Kalman Filter: Bayes estimation; recursive equations for x̂₁, x̂₂; the solutions (filtering, prediction)
3.3 Continuous-Time Filter: estimator equation; gain equation; variance equation
3.4 Initial Conditions: transient conditions; steady-state conditions
3.5 Performance Indices

3.1 Introduction

The previous chapter has presented the necessary topics in stochastic process theory, which have been used to build a general model for stochastic processes. The concepts and notations of state-variable theory and matrix representations of differential equations have been introduced. These are very useful in the precise statements and solutions of the filtering and prediction problems. The stochastic sequence we are studying is Gauss-Markov. These conditions of Gaussian statistics and the Markov property are central to the tractable solution of the problems. Their implications will be expressly stated in what follows. The formulation and solution of the filtering and prediction problems for the discrete case were first presented by Kalman (Refs. 9 and 10). Subsequent work by Kalman and Bucy developed corresponding equations for the continuous case (Ref. 11). The whole topic became known as "Kalman-Bucy filters," although not all the equations represent filters. Recent treatments of the subject are to be found in the books by Van Trees (Section 6.3.2) (Ref. 22) and by Bucy and Joseph (Ref. 4). This theory is well documented, so the purpose is merely to present the highlights of the development and establish the notation of the solutions.

3.2 Kalman Filter

The prediction and filtering problems have already been discussed in general terms. Under the condition that x and y are Gauss-Markov, the solutions are quite simple in principle. The conditional probability distribution of a Gaussian process is completely described by its conditional mean and conditional covariance. From the Markov property, the mean and variance need to be known only for the latest discrete value of time.

3.2.1 Bayes Estimation. In estimation problems of the Bayes type, an a priori density p_x(x) is assumed known. The cost function C(x̃) may be taken to be a function of a single variable x̃ = x − x̂, where x̂ = x̂(y) is the estimate, x is the true value and x̃ is the error. y is the observed variable. Then the risk is defined as

$$\mathcal{R} = \int\!\!\int C[X - \hat{x}(Y)]\; p_{x,y}(X, Y)\, dX\, dY \qquad (3.1)$$

and the Bayes estimate minimizes this risk. When C is quadratic in x̃,

$$C(\tilde{x}) = \tilde{x}'\, \tilde{x}, \qquad (3.2)$$

the solution is called the minimum mean square error (MMSE) estimate. Since p_{x,y} = p_y p_{x|y}, the risk becomes

$$\mathcal{R}_{ms} = \int_{-\infty}^{\infty} p_y(Y) \int_{-\infty}^{\infty} [X - \hat{x}(Y)]'\, [X - \hat{x}(Y)]\; p_{x|y}(X \mid Y)\, dX\, dY \qquad (3.3)$$

These are m- and n-fold integrals. ℛ_ms is minimized by minimizing the inner integral; differentiating with respect to x̂(Y) and setting the result to zero produces the unique minimum

$$\hat{x}(Y) = \int X\, p_{x|y}(X \mid Y)\, dX \qquad (3.4)$$

which is the conditional mean (the mean of the a posteriori density). Actually, this estimate is optimum for a broader class of problems. Under Gaussian conditions, it is optimum even for other cost functions. These additional features will not be needed in what follows.

3.2.2 Recursive Equations for x̂₁, x̂₂. Recursive equations for the conditional mean will now be developed; some of the details are included. Let the initial condition on x be x_0; x_0 may be a random variable. Let it have the known variance matrix E{x_0 x_0'} = Λ_0 and zero mean. The estimates x̂₁ and x̂₂ will be found, then the general solution for the filtering and prediction problems will be given. The conditional probability p_{x₁|y₁} is needed.

$$p_{x_1|y_1}(X \mid Y) = \frac{p_{y_1|x_1}(Y \mid X)\; p_{x_1}(X)}{p_{y_1}(Y)} \qquad (3.5)$$

$$p_{y_1|x_1}(Y \mid X) = \text{constant} \cdot \exp\left[-\tfrac{1}{2}(Y - HX)'\, R^{-1}\, (Y - HX)\right] \qquad (3.6)$$

$$p_{x_1}(X) = \text{constant} \cdot \exp\left[-\tfrac{1}{2}(X - \Phi x_0)'\, \Lambda_0^{-1}\, (X - \Phi x_0)\right] \qquad (3.7)$$

Therefore,

$$p_{x_1|y_1}(X \mid Y) = \text{constant} \cdot \exp\left[-\tfrac{1}{2}(Y - HX)'\, R^{-1}\, (Y - HX) - \tfrac{1}{2}(X - \Phi x_0)'\, \Lambda_0^{-1}\, (X - \Phi x_0)\right] \qquad (3.8)$$

This conditional distribution of x₁ given y₁ is Gaussian. Thus there are x̂(Y) and Λ₁ such that

$$p_{x_1|y_1}(X \mid Y) = \text{constant} \cdot \exp\left[-\tfrac{1}{2}(X - \hat{x}(Y))'\, \Lambda_1^{-1}\, (X - \hat{x}(Y))\right] \qquad (3.9)$$

The conditional mean x̂(Y) and conditional covariance Λ₁ can be found by completing the quadratic form of Eq. 3.8. The quadratic term, X'[Λ_0^{-1} + H'R^{-1}H]X, identifies

$$\Lambda_1^{-1} = \Lambda_0^{-1} + H'\, R^{-1}\, H. \qquad (3.10)$$

The linear terms in X eventually yield

$$\Lambda_1^{-1}\, \hat{x}_1(y_1) = \Lambda_0^{-1}\, \Phi x_0 + H'\, R^{-1}\, y_1 \qquad (3.11)$$

$$\hat{x}_1(y_1) = \Phi x_0 + \Lambda_1\, H'\, R^{-1}\, (y_1 - H \Phi x_0). \qquad (3.12)$$

Now the filtering error covariance matrix P̂₁ is defined by

$$\hat{P}_k = E\{\tilde{x}_{k|k}\, \tilde{x}_{k|k}'\} \qquad k = 1, 2, \ldots \qquad (3.13)$$

where x̃_{k|k} = x_k − x̂_{k|k}. This is the error in the estimate for x_k given observations up to and including time k. Since the estimate is the conditional mean, this is a convenient notation for the conditional expectation;

$$\hat{x}_{k|k} = E\{x_k \mid y_k, y_{k-1}, \ldots, y_1\} \qquad k \ge 1 \qquad (3.14)$$

x̃_{k|k} has zero mean, so P̂₁ = Λ₁ = [Λ_0^{-1} + H'R^{-1}H]^{-1}. The estimate for x₂ is found as follows. The observation equation y_k = H_k x_k + v_k with independent v_k implies that x₂ dominates y₁ in regard to the distribution of y₂; in simple words, given x₂, y₁ does not provide any more information about y₂. This is expressed as

$$p_{y_2|x_2, y_1} = p_{y_2|x_2} \qquad (3.15)$$

and is where the Markov condition is used.

Consider the joint probability of x₂ and y₂ conditional to y₁:

$$p_{x_2, y_2|y_1} = p_{y_2|x_2, y_1}\; p_{x_2|y_1} \qquad (3.16)$$

Substituting Eq. 3.15 in Eq. 3.16,

$$p_{x_2, y_2|y_1} = p_{y_2|x_2}\; p_{x_2|y_1} \qquad (3.17)$$

or

$$p_{x_2|y_2, y_1} = \frac{p_{y_2|x_2}\; p_{x_2|y_1}}{p_{y_2|y_1}} \qquad (3.18)$$

Let

$$\hat{x}_{k|k-1} = E\{x_k \mid y_{k-1}, \ldots, y_1\} \qquad (3.19)$$

and

$$\tilde{x}_{k|k-1} = x_k - \hat{x}_{k|k-1} \qquad (3.20)$$

$$P_1 = \Lambda_0 \qquad (3.21)$$

The prediction error covariance matrix P_k is defined by

$$P_k = E\{\tilde{x}_{k|k-1}\, \tilde{x}_{k|k-1}'\} \qquad k = 2, 3, \ldots \qquad (3.22)$$

In Eq. 3.18, p_{y₂|x₂} is normal with mean Hx₂ and variance R, by virtue of the observation equation

$$y_2 = H x_2 + v_2. \qquad (3.23)$$

The density p_{x₂|y₁} is obtained from the process equation

$$x_2 = \Phi x_1 + u_2. \qquad (3.24)$$

For given y₁, x̂₁(y₁) is known from Eq. 3.12, and

$$x_1 = \hat{x}_1(y_1) + \tilde{x}_1 \qquad (3.25)$$

where x̃₁ is normal with mean 0 and variance P̂₁. So p_{x₂|y₁} is normal with mean Φx̂₁ and variance

$$P_2 = \Phi \hat{P}_1 \Phi' + Q \qquad (3.26)$$

It has just been established that the two factors in the numerator of Eq. 3.18 are quadratic exponentials in (y₂ − Hx₂) and (x₂ − Φx̂₁(y₁)) respectively. Since the denominator of Eq. 3.18 does not depend on x₂, p_{x₂|y₂,y₁} depends on x₂ only through these quadratic exponentials and hence must be normal. The following results are obtained by completing the quadratic form as before.

$$\hat{P}_2^{-1} = P_2^{-1} + H'\, R^{-1}\, H \qquad (3.27)$$

$$\hat{x}_2(y_2) = \Phi \hat{x}_1 + P_2 H' (H P_2 H' + R)^{-1} (y_2 - H \Phi \hat{x}_1) \qquad (3.28)$$

The following theorem summarizes these results for the recursion from time k−1 to time k. The matrices Φ, Q, R are

presumed known, and they may be time-varying. In the above notation, x̂₁(y₁) = x̂_{1|1} and x̂₂(y₂) = x̂_{2|2}; x̂_{1|0} = Φx_0.

3.2.3 The Solutions. The filtering problem has the following solution; it is called the Kalman filter or the Kalman-Bucy equations.

Theorem. The model with known matrices Φ_k and H_k and known noise covariances Q_k and R_k has the following linear form:

$$x_k = \Phi_k\, x_{k-1} + u_k \qquad (3.29)$$

x_0 = random variable, initial condition on x

$$y_k = H_k\, x_k + v_k \qquad (3.30)$$

$$E\{u_j\, u_k'\} = \delta_{jk}\, Q_k \qquad (3.31)$$

$$E\{v_j\, v_k'\} = \delta_{jk}\, R_k \qquad (3.32)$$

where δ_jk is the Kronecker δ. Independence between u_k and v_k for all k is assumed, and u, v are independent of x_0. Suppose R_k ≥ aI for a > 0; then x̂_{k|k} is the solution of the difference equations

$$\hat{x}_{0|0} = x_0$$

$$\hat{x}_{k|k-1} = \Phi_k\, \hat{x}_{k-1|k-1} \qquad (3.33)$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + A_k\, (y_k - H_k\, \hat{x}_{k|k-1})$$

for k = 1, 2, …, where

$$A_k = P_k\, H_k' (H_k P_k H_k' + R_k)^{-1} \qquad k \ge 1 \qquad (3.34)$$

$$P_k = \Phi_k\, \hat{P}_{k-1}\, \Phi_k' + Q_k \qquad k > 1; \quad P_1 = E\{x_0 x_0'\} \qquad (3.35)$$

$$\hat{P}_k = (I - A_k H_k)\, P_k\, (I - A_k H_k)' + A_k R_k A_k' \qquad (3.36)$$

Observe that P_k must be computed first, then A_k; P̂_k is then computed for the next iteration. The formal statement of this theorem and its proof are given by Bucy and Joseph (Ref. 4). Their method of proof is different from the recursive derivation of Section 3.2.2. Some general comments can be made. The equation for x̂_{k|k} is a linear combination of the previous estimate and the latest observation. Since it minimizes the risk, it is called the optimal estimate. In the form given, x̂_{k|k−1} is computed from the estimate x̂_{k−1|k−1}; H_k x̂_{k|k−1} can also be computed before y_k is observed.

Only one matrix multiplication is required after y_k is observed. The term A_k is the optimal "gain" and has been directly related to the information that the observation y_k has about the process x_k. Letting H_k = I, an intuitive interpretation is readily available: when the observation covariance R_k is much larger than the prediction error covariance P_k, the gain A_k is very small and the correction term is not weighted heavily; on the other hand, whenever R_k is small, A_k is nearly the identity matrix I, so that the past estimate is not nearly as important as the present observation. The error covariance matrices P_k and P̂_k have already been identified; P_k is the one-step prediction error covariance; P̂_k is the filter error covariance. Together their equations are the discrete form of the Riccati equation and limit to this equation as the time increment tends to zero. The significant observation should be made that P_k, P̂_k and A_k do not depend on the values of the conditioning variables, the observations y_k, y_{k−1}, …, y₁. This is a special property of the multivariate Gaussian distribution and has the important practical result that the structure of the optimal filter can be determined before and independent of the observations; it will not be "adaptive". In special cases, the gain A_k might even be computed well in advance and then used by the optimal estimate equation as needed. Usually, memory restrictions preclude such massive storage, and the method is to solve the Riccati equation as needed.
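The theorem translates almost line for line into code. The sketch below is a minimal, illustrative implementation of Eqs. 3.33-3.36 for time-invariant parameters (function and variable names are mine, not the report's), together with the j-step forecast of Eq. 3.39 from the corollary stated next.

```python
import numpy as np

def kalman_filter(ys, Phi, H, Q, R, x0, P1):
    """Discrete Kalman-Bucy recursion, Eqs. 3.33-3.36 (time-invariant
    parameters assumed).  x0 is the initial estimate and P1 the initial
    prediction error covariance; returns the estimates x̂_{k|k}."""
    n = len(x0)
    x_hat, P = x0.copy(), P1.copy()
    estimates = []
    for y in ys:
        x_pred = Phi @ x_hat                                 # Eq. 3.33
        A = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # gain, Eq. 3.34
        x_hat = x_pred + A @ (y - H @ x_pred)                # Eq. 3.33
        I_AH = np.eye(n) - A @ H
        P_filt = I_AH @ P @ I_AH.T + A @ R @ A.T             # Eq. 3.36
        P = Phi @ P_filt @ Phi.T + Q                         # Eq. 3.35
        estimates.append(x_hat)
    return np.array(estimates)

def forecast(x_kk, Phi, j):
    """j-step prediction x̂_{k+j|k} of Eq. 3.39, time-invariant Phi."""
    return np.linalg.matrix_power(Phi, j) @ x_kk
```

Note the computation order the text prescribes: the prediction covariance P_k first, then the gain A_k, then the filter covariance P̂_k for the next iteration.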

The structure and memory requirements are clearly evident from this formulation. They have the advantageous property that they are fixed. The prediction problem has a solution which is a corollary to the theorem.

Corollary. Under the conditions of the theorem, and since u_k has zero mean, the prediction x̂_{k+j|k} is given by

$$\hat{x}_{k+1|k} = \Phi_{k+1}\, \hat{x}_{k|k} \qquad (3.37)$$

$$\hat{x}_{k+2|k} = \Phi_{k+2}\, \hat{x}_{k+1|k} = \Phi_{k+2}\, \Phi_{k+1}\, \hat{x}_{k|k} \qquad (3.38)$$

$$\hat{x}_{k+j|k} = \Phi_{k+j} \cdots \Phi_{k+1}\, \hat{x}_{k|k} \qquad (3.39)$$

This result follows by noting that the conditional estimate is still optimum. With no more observations, the one-step prediction is just iterated. This has the physical meaning of running the autonomous (no input) system starting from the initial condition of x̂_{k|k} at time k to get state values at future times k + j.

3.3 Continuous-Time Filter

For comparative purposes, the equations of the continuous-time filter will be stated.

Estimator equation:

$$\frac{d\hat{x}(t)}{dt} = F(t)\, \hat{x}(t) + h_0(t, t)\, [y(t) - H(t)\, \hat{x}(t)] \qquad (3.40)$$

Gain equation:

$$h_0(t, t) = P(t)\, H'(t)\, R^{-1}(t) \qquad (3.41)$$

Variance equation:

$$\frac{dP(t)}{dt} = F(t)\, P(t) + P(t)\, F'(t) - P(t)\, H'(t)\, R^{-1}(t)\, H(t)\, P(t) + Q(t) \qquad (3.42)$$

F(t) is the system coefficient matrix, as defined in Chapter 2. These two forms are quite similar, with the usual difference equation and differential equation distinctions and parallelism. Many of the properties of the discrete equations carry over to the continuous equations. It is possible to derive one set from the other. These continuous equations are presented by Van Trees (Ref. 22). Since our applications in later chapters are for the discrete case and the computations will be on a digital computer, these equations will not be discussed further.

3.4 Initial Conditions

As in any differential equation or difference equation problem,

initial conditions are required to completely specify the solution; furthermore, the state-variable method is in the time domain, so actual values of the initial conditions are needed. Much of the material written on Markov processes deals with the transient condition, where some initial distribution is given and the problem is to find suitable transition probabilities and conditional probabilities. Such a study is not the purpose of this section; rather, initial conditions are sought which satisfy steady-state constraints (temporally constant). Furthermore, such initial conditions should be expressed in terms of the model parameters: the state transition matrix Φ, the observation matrix H, the driving noise covariance Q and the observation noise covariance R. The covariance of the process x is denoted by

$$\Lambda_k \triangleq E\{x_k\, x_k'\} \qquad (3.43)$$

and its initial condition Λ_0 will be found. The reader will recall that, in the derivation of the recursive equations for x̂_{1|1} and x̂_{2|2}, Λ_0 was presumed known. The emphasis here is to give it the value determined by using Φ and Q; from the process equation, Λ_k satisfies

$$\Lambda_k = \Phi_k\, \Lambda_{k-1}\, \Phi_k' + Q_k \qquad (3.44)$$

since x_{k−1} and u_k are uncorrelated. So for a steady condition,

$$\Lambda_0 = \Phi_0\, \Lambda_0\, \Phi_0' + Q_0 \qquad (3.45)$$

Appendix E shows the conditions for a solution to this equation and that a reasonable approximation can be readily computed. This matrix is needed in the next chapter to establish initial values for some statistics. The recursive formulation of the Kalman-Bucy filter assumes P₁ = Λ_0, which is in keeping with the condition that no observations have been taken. Observe that this value of P₁ is independent of R and that it represents a situation of the transient type. This condition is not admitted, for a steady-state value for P₁ is needed, as though the recursion had been running for some time. P₁ = P_0 must satisfy the conditions obtained from Eqs. 3.34, 3.35, 3.36:

$$P_0 = \Phi_0\, \hat{P}_0\, \Phi_0' + Q_0 \qquad (3.46)$$

$$\hat{P}_0 = (I - A_0 H_0)\, P_0\, (I - A_0 H_0)' + A_0 R_0 A_0' \qquad (3.47)$$

$$A_0 = P_0\, H_0' (H_0 P_0 H_0' + R_0)^{-1} \qquad (3.48)$$

These equations can be simplified somewhat, because H_0 = H = I, and another form of P̂_0 can be written:

$$\hat{P}_0 = (I - A_0 H_0)\, P_0 \qquad (3.49)$$

However, no closed-form solution for P_0 is evident, for this is the

matrix Riccati equation. This difficulty is not insurmountable; these equations are part of the Kalman filter, and a solution can be obtained by iterating the equations with Φ_0, Q_0 and R_0 as given. A_0 = 0.9 I or some previous value will start the iterations, and often only a few are needed.

3.5 Performance Indices

The Kalman-Bucy filter is a filter in the usual sense of the word; this one has an input vector y_k and an output vector x̂_{k|k}, which is the minimum mean-square estimate of the process x_k. Also, the "filter" is capable of producing predictions or forecasts of the state variable, x̂_{k+j|k}. It is natural to define indices of performance, in many studies and in this case particularly, as sums of squared differences. P_k and P̂_k are computed by the filter itself. P_k is the one-step prediction error covariance matrix for the process x_k, as defined in Eq. 3.22. P̂_k is the closely related filter error covariance matrix, defined in Eq. 3.13. Some additional performance indices will be defined. Such statistics are time averages of observable processes; since x_k is not observable, but y_k is, define ŷ_{k|k} analogously to x̂_{k|k} and find that

$$\hat{y}_{k|k} = H_k\, \hat{x}_{k|k} \qquad (3.50)$$

$$\hat{y}_{k+j|k} = H_{k+j}\, \hat{x}_{k+j|k} \qquad (3.51)$$

The performance indices are therefore defined as time averages of the mean-square error of the N-th forecast:

$$J_{0,k} = \frac{1}{M} \sum_{j=k-1-M}^{k-2} (y_j - \hat{y}_{j|j})^2 \qquad (3.52)$$

$$J_{1,k} = \frac{1}{M} \sum_{j=k-1-M}^{k-2} (y_{j+1} - \hat{y}_{j+1|j})^2 \qquad (3.53)$$

$$J_{N,k} = \frac{1}{M} \sum_{j=k-1-M}^{k-2} (y_{j+N} - \hat{y}_{j+N|j})^2 \qquad (3.54)$$

N should be kept small, where the forecasts have meaning. The point should also be made that these statistics are not used in the model, so they are viable as comparisons of different models and initial conditions.
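Because the indices are plain windowed averages of squared forecast errors, they are a few lines of code. The sketch below follows the windowing convention reconstructed in Eqs. 3.52-3.54 (the exact summation limits are uncertain in the source, so this convention is an assumption), with scalar observations:

```python
import numpy as np

def J_index(y, y_fcst, k, M, N):
    """Windowed mean-square N-step forecast error, per Eqs. 3.52-3.54
    as reconstructed here: the M terms j = k-1-M, ..., k-2 are averaged.
    y[j] is the observation at time j; y_fcst[j] holds the forecast
    ŷ_{j+N|j} made at time j (N = 0 gives J_0, N = 1 gives J_1, ...)."""
    js = np.arange(k - 1 - M, k - 1)
    return np.mean((y[js + N] - y_fcst[js]) ** 2)
```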

CHAPTER 4: MODEL PARAMETER ESTIMATION

Synopsis

4.1 Introduction
4.2 Consistent Estimates for the Stationary Model: Model A (Φ̂, Q̂, R̂)
4.3 Tracking Estimates for the Nonstationary Model: Model B (Φ̂, Q̂, R̂); coefficients
4.4 Filtering of Field Data: Model C; preprocessing
4.5 Initial Conditions on Statistics: E{y₃y₁'}, E{y₂y₁'}, B̂_{0,0} and B̂_{1,0}
4.6 Summary of Recursive Equations of Model C: block diagram

4.1 Introduction

This chapter contains a derivation of the equations to estimate the parameters of the model. These estimates are then used in the model to determine the conditional mean and conditional covariance of the process. Consistent estimates in the stationary case are modified to become tracking estimates of the slowly-varying parameters. In addition, some initial "nonlinear filtering" and block-averaging are introduced to smooth the data. A complete block diagram is given for the recursive computation of the model parameters. The model equations, given in Chapter 2, are summarized here for convenience.

$$\text{Process:} \quad x_k = \Phi_k\, x_{k-1} + u_k \qquad (4.1)$$

$$\text{Observation:} \quad y_k = H_k\, x_k + v_k \qquad (4.2)$$

where u_k is distributed as N(0, Q_k), the u_k independent, and v_k is distributed as N(0, R_k), the v_k independent. Both the u, v statistics may be time-varying, and it is assumed that u and v are statistically independent. The parameters Φ_k, Q_k and R_k are unknown and will be estimated. This is a good place to discuss the "estimate-and-plug"

technique. Generally speaking, a large school of thought contends that when a parameter is unknown, it should be estimated in some reasonable manner and then used (with caution) as though the parameter were known precisely. On the other hand, the Bayesian school holds that parameters are either precisely known, or, if unknown, the parameter is a random variable or random process. Hence the parameter has a distribution which a Bayesian must formulate and use. The parameter is "learned" or "estimated" in the sense that observations concerning the parameter are used to sharpen the knowledge through probability functions of smaller and smaller variance. Prior information in the a priori distribution is updated with a new observation to obtain a new distribution called the a posteriori distribution. Such studies of recent origin include the works of Spooner (Ref. 16), Birdsall (Ref. 2) and Jaarsma (Ref. 8). The Bayesian philosophy is a tough task-master. It demands a complete statistical description of the a priori information (though the results may not be very sensitive to the a priori distribution) and it requires the use of "Bayes' Law"* to obtain the a posteriori pdf. Such manipulations are often not mathematically tractable and, in the author's opinion, usually not practically feasible. Often, the required observations cannot be made or the computations are too complex or

*Bayes' Law is the historical name for the accepted definition of conditional probability, given in Chapter 2.

too time-consuming. The required computations have not been developed for lack of sufficiently well-defined stochastic models. For these and other reasons, which will be given in this chapter, the approach adopted here is to estimate the parameters in a simple and direct manner. Some properties of these estimators will be studied. Later chapters will demonstrate their use with field data and randomly-generated data as input.

4.2 Consistent Estimates for the Stationary Model

For the special case of a stationary model (in which Φ_k = Φ, Q_k = Q and R_k = R do not depend on time k), the work of Anderson, Kleindorfer, Kleindorfer and Woodroffe (Ref. 1) (referred to as AKKW in the following text) has produced some interesting estimators. They are consistent* estimators when the system is stable and stationary. The algorithms are easily implemented as a special purpose computer or on a programmed computer. Their development follows.

*See Appendix A for definitions of underlined terms.

Model A

Consider the proposition:

$E\{y_k y'_{k-2}\} = \Phi\, E\{y_{k-1} y'_{k-2}\}, \quad k \geq 3$    (4.3)

From the model equations, for the completely observable case H = I,

$E\{y_k y'_{k-2}\} = E\{(x_k + v_k)(x_{k-2} + v_{k-2})'\}$    (4.4)
$= E\{x_k x'_{k-2}\} + E\{x_k v'_{k-2}\} + E\{v_k x'_{k-2}\} + E\{v_k v'_{k-2}\}$

Under the conditions of the model, x_i and v_j are orthogonal for every pair i, j, so the second and third terms are 0. Also, v_k and v_{k-2} are orthogonal.

$E\{y_k y'_{k-2}\} = E\{x_k x'_{k-2}\}$    (4.5)

$E\{y_{k-1} y'_{k-2}\} = E\{(x_{k-1} + v_{k-1})(x_{k-2} + v_{k-2})'\}$    (4.6)

$= E\{(\Phi^{-1}(x_k - u_k) + v_{k-1})(x_{k-2} + v_{k-2})'\}$    (4.7)

$= E\{(\Phi^{-1} x_k - \Phi^{-1} u_k + v_{k-1})(x_{k-2} + v_{k-2})'\}$    (4.8)

$= \Phi^{-1} E\{x_k x'_{k-2}\} + \Phi^{-1} E\{x_k v'_{k-2}\} - \Phi^{-1} E\{u_k x'_{k-2}\} - \Phi^{-1} E\{u_k v'_{k-2}\} + E\{v_{k-1} x'_{k-2}\} + E\{v_{k-1} v'_{k-2}\}$

All terms but the first are 0 by orthogonality.
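Before continuing, note that the proposition of Eq. 4.3 is easy to check numerically. The following sketch, which is an illustration and not from the report, simulates the scalar stationary model (H = 1) with the parameter values used for the Monte-Carlo tests of Chapter 6 and compares the two sides of Eq. 4.3.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, Q, R, T = 0.90, 1.0, 0.40, 200_000   # scalar stationary model, H = 1

x = np.zeros(T)
for k in range(1, T):
    x[k] = phi * x[k - 1] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), size=T)

lhs = np.mean(y[2:] * y[:-2])             # sample estimate of E{y_k y_{k-2}}
rhs = phi * np.mean(y[1:-1] * y[:-2])     # phi * E{y_{k-1} y_{k-2}}
print(lhs, rhs)                           # the two averages nearly agree
```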

$E\{y_{k-1} y'_{k-2}\} = \Phi^{-1} E\{x_k x'_{k-2}\}$    (4.9)

At this point the AKKW study estimated the terms in Eq. 4.3 as follows.

$\hat{E}\{y_k y'_{k-2}\} = g_k \sum_{j=3}^{k} y_j y'_{j-2}$    (4.10)

$\hat{E}\{y_{k-1} y'_{k-2}\} = g_k \sum_{j=3}^{k} y_{j-1} y'_{j-2}$    (4.11)

(The scalar function g_k is not relevant at this point.) These sums are the statistics S₀₂,k and S₁₂,k, respectively. Then the estimate for Φ after k observations is denoted by Φ̂_k and is given by

$\hat{\Phi}_k = \left( \sum_{j=3}^{k} y_j y'_{j-2} \right) \left( \sum_{j=3}^{k} y_{j-1} y'_{j-2} \right)^{+}, \quad k \geq 3$    (4.12)

where + indicates the pseudo-inverse.* Let the spectral radius of Φ be denoted by ρ(Φ). The following theorem is given without proof.

Theorem 4.1. If ρ(Φ) < 1 and if Φ and Q are nonsingular, then Φ̂_k is a strongly consistent estimate of Φ.

*See Appendix A for definitions of underlined terms.

To develop estimates for Q and R for the same model, consider

$A_0 = E\{(y_k - \Phi y_{k-1})(y_k - \Phi y_{k-1})'\}$    (4.13)

and

$A_1 = E\{(y_k - \Phi y_{k-1})(y_{k-1} - \Phi y_{k-2})'\}$    (4.14)

In the manner illustrated above, it can be shown that

$A_0 = Q + R + \Phi R \Phi'$    (4.15)

$A_1 = -\Phi R$    (4.16)

Since Φ is non-singular, R and Q are uniquely determined.

$R = -\Phi^{-1} A_1$    (4.17)

$Q = A_0 - R - \Phi R \Phi'$    (4.18a)
$= A_0 + \Phi^{-1} A_1 + A_1 \Phi'$    (4.18b)

Observe that A₀ is positive definite from Eq. 4.13 and that A₁ should be negative definite from Eq. 4.17.

Theorem 4.2. If Φ and Q are nonsingular and if Φ̂_k is given by Eq. 4.12, then

$\hat{A}_{0,k} = g_k \sum_{j=3}^{k} (y_j - \hat{\Phi}_j y_{j-1})(y_j - \hat{\Phi}_j y_{j-1})'$    (4.19)

$\hat{A}_{1,k} = g_k \sum_{j=3}^{k} (y_j - \hat{\Phi}_j y_{j-1})(y_{j-1} - \hat{\Phi}_{j-1} y_{j-2})'$    (4.20)

are strongly consistent estimates of A₀, A₁ respectively. Observe that in Eqs. 4.19 and 4.20 only known values of Φ̂_j are required at time k. No proof will be given here; a sufficiently detailed proof is given in AKKW (Ref. 1). From these two theorems, these estimates for Φ, Q, R are strongly consistent.

$\hat{\Phi}_k = \left( \sum_{j=3}^{k} y_j y'_{j-2} \right) \left( \sum_{j=3}^{k} y_{j-1} y'_{j-2} \right)^{+}$    (4.12)

$\hat{R}_k = -\hat{\Phi}_k^{-1} \hat{A}_{1,k}$    (4.21)

$\hat{Q}_k = \hat{A}_{0,k} - \hat{R}_k - \hat{\Phi}_k \hat{R}_k \hat{\Phi}_k'$    (4.22a)
$= \hat{A}_{0,k} + \hat{\Phi}_k^{-1} \hat{A}_{1,k} + \hat{A}_{1,k} \hat{\Phi}_k'$    (4.22b)

4.3 Tracking Estimates for the Nonstationary Model

The preceding section shows the general approach to the problem, but in this section the condition of stationarity is removed. However, a general principle adopted throughout the remainder of the chapter is that time-variations of the model parameters Φ, Q, R are slow relative to the observation rate of the process. Hence, estimators similar to the stationary case will have some applicability. The forms of these estimators and the reasoning behind them will be given below.

Model B

Consider the propositions:

a)  $E\{y_k y'_{k-2}\} = E\{\Phi_k y_{k-1} y'_{k-2}\}$    (4.23)

b)  $E\{\Phi_k y_{k-1} y'_{k-2}\} \approx \Phi_k\, E\{y_{k-1} y'_{k-2}\}$    (4.24)

For the stationary model, these equations reduce to Eq. 4.3. To prove Eq. 4.23, the previous proof of Eq. 4.3 with Φ_k for Φ is valid. To justify Eq. 4.24: if Φ_k is a known function, the result is evident and exact. When Φ_k is a random process, the approximation is "good" if Φ_k and y_{k-1} y'_{k-2} are independent and an unbiased estimate of Φ_k is used. Both of these conditions will be satisfied by the "tracking" estimator for Φ_k.

The following modifications to Eqs. 4.10 and 4.11 seem justified in the case when the expectations are time-dependent.

$\hat{E}\{y_k y'_{k-2}\} = g_k \sum_{j=3}^{k} c_{k-j}\, y_j y'_{j-2}$    (4.25)

$\hat{E}\{y_{k-1} y'_{k-2}\} = g_k \sum_{j=3}^{k} c_{k-j}\, y_{j-1} y'_{j-2}$    (4.26)

where  $g_k^{-1} = \sum_{j=3}^{k} c_{k-j}$

The sum is required to converge for k → ∞, so c_{k-j} → 0 as j → 3. For example, let r < 1 and c_{k-j} = r^{k-j}. With a proper choice of r, the estimates in Eqs. 4.25 and 4.26 will follow time-variations in the statistics. Therefore,

$\hat{\Phi}_k = \left( \sum_{j=3}^{k} c_{k-j}\, y_j y'_{j-2} \right) \left( \sum_{j=3}^{k} c_{k-j}\, y_{j-1} y'_{j-2} \right)^{+}$    (4.27)

will be called a tracking estimator; compare to the estimate in Eq. 4.12.

Similar modifications of A₀, A₁ in Eqs. 4.13 and 4.14 and their estimators will also be made. Since Φ is hypothesized as being a slowly-varying random process, consider

$B_0 = E\{(y_k - \hat{\Phi}_k y_{k-1})(y_k - \hat{\Phi}_k y_{k-1})'\}$    (4.28)

$B_1 = E\{(y_k - \hat{\Phi}_k y_{k-1})(y_{k-1} - \hat{\Phi}_{k-1} y_{k-2})'\}$    (4.29)

Using the model Eqs. 4.1 and 4.2 with H = I,

$y_k - \hat{\Phi}_k y_{k-1} \approx u_k + v_k - \hat{\Phi}_k v_{k-1}$    (4.30)

$y_{k-1} - \hat{\Phi}_{k-1} y_{k-2} \approx u_{k-1} + v_{k-1} - \hat{\Phi}_{k-1} v_{k-2}$    (4.31)

Consequently, because all cross-terms are approximately 0 (see Appendix B for the terms involving Φ̂),

$B_0 = Q_k + R_k + E\{\hat{\Phi}_k v_{k-1} v'_{k-1} \hat{\Phi}_k'\} \approx Q_k + R_k + \Phi_k R_{k-1} \Phi_k'$    (4.32)

where the approximate equality holds when Φ̂_k is a good estimate: unbiased and independent of v_k. Similarly,

$B_1 = E\{-\hat{\Phi}_k v_{k-1} v'_{k-1}\} \approx -\Phi_k R_{k-1}$    (4.33)

The dependence of B₀, B₁ on time is explicitly displayed to contrast with Eqs. 4.15 and 4.16 for A₀ and A₁. Tracking estimates for B₀, B₁ are now given.

$\hat{B}_{0,k} = b_{0,k} \sum_{j=3}^{k} d_{k-j}\, (y_j - \hat{\Phi}_j y_{j-1})(y_j - \hat{\Phi}_j y_{j-1})'$    (4.34)

$\hat{B}_{1,k} = b_{1,k} \sum_{j=3}^{k} e_{k-j}\, (y_j - \hat{\Phi}_j y_{j-1})(y_{j-1} - \hat{\Phi}_{j-1} y_{j-2})'$    (4.35)

where b_{0,k}, b_{1,k}, d_{k-j} and e_{k-j} are scalars, and where $b_{0,k}^{-1} = \sum_{j=3}^{k} d_{k-j}$ and $b_{1,k}^{-1} = \sum_{j=3}^{k} e_{k-j}$ are required to converge. Required conditions are that d_{k-j} → 0 and e_{k-j} → 0 as j → 3. For example, as above, let r₀ < 1 with d_{k-j} = r₀^{k-j}, and r₁ < 1 with e_{k-j} = r₁^{k-j}.

Since Φ̂_k is an approximation to a state transition matrix, it should be nonsingular, and R_{k-1} is uniquely determined.

$\hat{R}_{k-1} = -\hat{\Phi}_k^{-1} \hat{B}_{1,k}$    (4.36)

One more assumption is necessary to get Q̂_k. Suppose the matrix R_k varies slowly with k; as a first approximation, let R_k = R_{k-1}. Then

$\hat{Q}_k = \hat{B}_{0,k} - \hat{R}_{k-1} - \hat{\Phi}_k \hat{R}_{k-1} \hat{\Phi}_k'$    (4.37a)
$= \hat{B}_{0,k} + \hat{\Phi}_k^{-1} \hat{B}_{1,k} + \hat{B}_{1,k} \hat{\Phi}_k'$    (4.37b)

To conclude this section, a brief discussion of these tracking estimators seems in order. No rigorous proofs of the following properties are offered, but their plausibility will be illustrated. For the stationary random process, the estimators given by AKKW (Ref. 1) are strongly consistent; in general terms, when enough observations are taken, the estimates get close to their true values. This desirable feature occurs at a practical number of observations; several hundred observations were enough in the AKKW study. Chapter 7 will have results for the time-varying case. When the model is allowed to be nonstationary through time-varying Φ, Q and R, the above changes in the estimators make it possible for these parameters to be tracked. The effect of the coefficients {c_{k-j}}, {d_{k-j}} and {e_{k-j}} is to deemphasize past data by giving it less weight in the sums. These tracking estimators were tested by using field data. Since such field data will not satisfy all the conditions of the model all the time (discussion below), additional processing is introduced.
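With geometric coefficients, the weighted sums in Eqs. 4.27 and 4.34-4.37 reduce to one-line recursions, as the scalar sketch below illustrates. The class name, the simplified initialization, and the use of the current Φ̂ in both residuals are assumptions of this sketch, not the report's exact scheme.

```python
import numpy as np

class TrackingEstimator:
    """Exponentially weighted ("tracking") estimates of phi, Q, R for
    the scalar model x_k = phi_k x_{k-1} + u_k, y_k = x_k + v_k (H = 1);
    a sketch of Eqs. 4.27 and 4.34-4.37.
    """
    def __init__(self, r_phi=0.95, r0=0.989, r1=0.9978, phi0=0.95):
        self.r_phi, self.r0, self.r1 = r_phi, r0, r1
        self.s02 = self.s12 = 0.0   # weighted sums for the phi estimate
        self.b0 = self.b1 = 0.0     # weighted averages for B0, B1
        self.phi = phi0
        self.y = []                 # last three observations

    def update(self, yk):
        self.y = (self.y + [yk])[-3:]
        if len(self.y) < 3:
            return None
        y2, y1, y0 = self.y                     # y_{k-2}, y_{k-1}, y_k
        # Eq. 4.27: tracking estimate of phi
        self.s02 = self.r_phi * self.s02 + y0 * y2
        self.s12 = self.r_phi * self.s12 + y1 * y2
        if self.s12 != 0.0:
            self.phi = self.s02 / self.s12
        # Eqs. 4.34, 4.35 with gains b0 = 1 - r0 and b1 = 1 - r1
        e0 = y0 - self.phi * y1
        e1 = y1 - self.phi * y2
        self.b0 = self.r0 * self.b0 + (1 - self.r0) * e0 * e0
        self.b1 = self.r1 * self.b1 + (1 - self.r1) * e0 * e1
        R = -self.b1 / self.phi                 # Eq. 4.36
        Q = self.b0 - R - self.phi * R * self.phi   # Eq. 4.37a
        return self.phi, Q, R
```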

4.4 Filtering of Field Data

The process y_k as observed from a physical source perhaps does not satisfy the conditions of the models. For example, for some short observation interval, the following conditions might prevail.

a) Multiplicative noise is present.

$x_{k+1} = \Phi_{k+1}\, m_{k+1}\, x_k$    (4.38)

where m_k is hypothesized to have a mean of 1, small variance, but unknown distribution.

b) Additive impulse noise is present.

$x_{k+1} = \Phi_{k+1} x_k + p_{k+1}$    (4.39)

where p_{k+1} is impulse noise which is unknown but significant. The effect may be an apparent rapid change in Q, which the models cannot follow.

Under such conditions, the models developed above will not be true. However, by their nature, the models should follow gross characteristics of the process and should be expected to smooth out such transients. Such transient conditions were not intended to be modeled here; to help eliminate them, pre-processing is introduced in the form of a fixed linear filter and a limiting-type, (possibly) time-varying nonlinear processor.

A known, fixed linear filter is acceptable at this point in the development of the model. The output process of such a filter will be modeled by the models of this chapter; by linearity, the statistics of the input process can be determined from the statistics of the output. The form of the filter is chosen for ease of implementation on a digital computer. The parameters of the filter will be selected by experiment with the field data and then fixed. The particular filter is the so-called Block Average filter (BAV).

$y_k = \frac{1}{M} \sum_{j=k+1-M}^{k} r_j, \quad k \geq M$

where r_j is the input process (received, observed). The output process y_k is usually sampled at some rate 1/S, or one output value for every S input values. Hence

$y_k = \frac{1}{M} \sum_{j=kS+1-M}^{kS} r_j, \quad kS \geq M, \quad k = 1, 2, \ldots$    (4.40)

where

j = discrete time parameter of the observed process r
k = discrete time parameter of the model
y_k = observed process for the model.

The frequency-domain characterization of the BAV filter is

given in Chapter 5. The important facts are that it is a low-pass filter with a 3-db bandwidth proportional to 1/M. The selection of the parameters M and S is left for Chapter 5.

The need for a nonlinear processor of the limiting type is pointed out by an analysis of real data, which is done in Chapter 7. The general form of the processor is given here to include it with the development of the model. The form is dictated by condition b) above: additive impulse noise. To reduce the effect of such observations, a limiter was first considered. However, preliminary tests showed that the model then depended strongly on the choice of the limiting value, which is undesirable. The following processor was considered next.

$\tilde{r}_j = \begin{cases} r_j, & r_j \in \Omega \\ 0, & \text{otherwise} \end{cases}$    (4.41)

where

r_j = observed process
r̄ = mean of the observed process
Φ_L = fixed state transition matrix
L = limiting value.

(Φ_L and L must be selected jointly.) The region Ω is the L-sphere about r̄; by selecting L properly and fixing it, "bad" observations

can be eliminated. If the process can be modeled by Model B, all but a few observations will be in Ω; the probability measure of the set Ω is very close to 1. The parameters r̄, Φ_L and L of the fixed nonlinear processor are selected by experiment in Chapter 7. The effect of the processor on data is observed by counting the instances that r_j ∉ Ω, to verify that the postulated conditions are satisfied.

A modification of this processor was also studied. x̂_{k|k} is the MMSE (minimum mean square estimate) of x_k, which is the output of the Kalman-Bucy filter. Take

$\tilde{r}_j = \hat{x}_{k|k} \quad \text{for } r_j \notin \Omega, \quad k \leq j < k+1$

The combined processing of the nonlinear processor, the BAV filter and Model B will be called Model C. The above notations suggest the following order of operations.

Nonlinear Processor → BAV → Model B

Fig. 4.1. Model C
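A minimal sketch of this front end follows, assuming scalar data and assuming that rejected samples are replaced by the process mean (the report instead substitutes filter estimates or eliminates the points); the function name is hypothetical.

```python
import numpy as np

def model_c_preprocess(r, M, S, r_bar, L):
    """L-sphere gate (Eq. 4.41) followed by the sampled Block Average
    of Eq. 4.40: y_k = (1/M) * sum_{j=kS+1-M}^{kS} r_j.
    """
    r = np.asarray(r, dtype=float)
    bad = np.abs(r - r_bar) > L              # observations outside Omega
    clean = np.where(bad, r_bar, r)          # eliminate "bad" points
    ks = [k for k in range(1, len(r) // S + 1) if k * S >= M]
    y = np.array([clean[k * S - M : k * S].mean() for k in ks])
    return y, int(bad.sum())                 # also count rejections
```

Counting the rejections, as in the last line, corresponds to the check that the probability measure of Ω is close to 1.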

4.5 Initial Conditions on Statistics

Initial estimators for Φ, Q and R should reflect any prior information about the parameters at the starting time of the observations. Since the subsequent estimates are unbiased, it seems reasonable to start with (a priori) expected values for initial conditions on the four summations used by the estimators. These initial conditions can be calculated by using the stationary condition. From Eq. 4.5,

$E\{y_k y'_{k-2}\} = E\{x_k x'_{k-2}\} = \Phi_k \Phi_{k-1} E\{x_{k-2} x'_{k-2}\} + E\{(\Phi_k u_{k-1} + u_k)\, x'_{k-2}\} = \Phi_k \Phi_{k-1} \Lambda_{k-2}$    (4.42)

where Λ is the variance of the process x. This result follows from the condition that past values of x_k are not correlated with future values of u_k. Then, initially (at k = 2 in this notation) and with Φ₂ = Φ₁ = Φ₀ and Λ₂ = Λ₁ = Λ₀,

$E\{y_2 y_0'\} = \Phi_0 \Phi_0 \Lambda_0$    (4.43)

Repeating Eq. 4.9,

$E\{y_{k-1} y'_{k-2}\} = \Phi^{-1} E\{x_k x'_{k-2}\}$    (4.9)

Therefore, initially

$E\{y_1 y_0'\} = \Phi_0 \Lambda_0$    (4.44)

These are the initial conditions for the sums in the Φ̂ estimate. The initial values for B₀ and B₁ are given by Eqs. 4.32 and 4.33

with k = 2 and R₁ = R₂:

$B_{0,2} = Q_2 + R_2 + \Phi_2 R_2 \Phi_2'$    (4.45)

$B_{1,2} = -\Phi_2 R_2$    (4.46)

These are the initial conditions for the sums in the estimation of Q and R.

Altogether, the following initial values are needed: Λ₀ and Φ₂, Q₂, R₂. The initial value Λ₀ for the process covariance was discussed in Chapter 3, Section 3.4. The initial conditions on the other statistics have been expressed in terms of Φ₂, Q₂, R₂, which represent the initial model. These three matrices must be given starting values by using reasonable physical conditions. Again, similar comments apply to these initial conditions as were made regarding the model initial conditions. As k becomes large, the sensitivity to these initial values is decreased. For this reason, in AKKW no such initial conditions were assumed in their Monte Carlo studies, yet convergence was moderately rapid; they assumed values of 0 for all the statistics at k = 2.

4.6 Summary of Recursive Equations of Model C

This chapter is summarized by a block diagram of the parameter estimation for the model. Model C consists of the

preprocessor, which is nonlinear, the linear Block Average, and the estimators of Model B. The nonlinear processor has the purpose of eliminating abnormal data and is characterized by Eq. 4.41. The block averaging performs a slight amount of smoothing of the data and is expressed by Eq. 4.40. Recall that H = I in this model. Φ̂_k is a tracking estimator for Φ_k, the state transition matrix of the model; it is estimated by Eq. 4.27. The driving noise covariance Q_k is estimated by Q̂_k in Eqs. 4.37a and 4.37b. The observation noise covariance R_k is estimated by R̂_{k-1} in Eq. 4.36. The estimators and the Kalman-Bucy equations are shown in block diagram form in Fig. 4.2.

[Figure] Fig. 4.2. a) Estimation of Φ̂_k, Q̂_k, R̂_k. b) Kalman filter block diagram, H = I, n×n.

CHAPTER 5
A CLASS OF LINEAR FILTERS

Synopsis
5.1 Introduction
    Analog and digital filters
5.2 A Standard Form of a Class of Analog Filters
    Impulse response
    Transfer function
    τ-diagram
    Special cases
    General case: complex coefficients
5.3 A Standard Form of a Class of Digital Processors
    Sampling
    Equivalent digital processor
    Block diagram
    Examples
5.4 Combined Operation
5.5 Time-Domain Viewpoint

5.1 Introduction

This chapter will discuss and illustrate a standard form of filtering or processing of data. Generally speaking, we talk of filtering when analog signals are being used and we say processing when digital signals are involved; however, there is no universally accepted demarcation. There is a common nature in the several forms of digital processors in use, so that a single analysis will include them all. The procedure will be to study a general form of the analog filter by describing its impulse response and amplitude spectrum. We will demonstrate that this form has a simple digital processor equivalent. Some specific types of processors will be illustrated, and a cascade of such processors will have meaning. A time-domain analysis will be offered for the standard form of the digital processor. We comment on the distinguishing and complementary features of the time-domain and frequency-domain analyses.

Historically speaking, such filters have been applied recently to the processing of acoustic signals (Refs. 7, 19 and 20). The derivations of the filter characteristics have taken many forms, often repeated and with unnecessary assumptions. A goal of the following material is a unified presentation.

5.2 A Standard Form of a Class of Analog Filters

5.2.1 Impulse Response. A particular class of analog filters

will be studied, namely that having the impulse response

$h(\tau) = \frac{1}{M} \sum_{k=0}^{M-1} r^{M-1-k}\, \delta(\tau - kP\Delta)$    (5.1)

A brief discussion of the origin of this function will be useful. The concept and physical implementation of a delay line* are quite familiar to an engineer.

*Underlined terms are defined in Appendix A.

In the continuous time-domain, such pure delay is often modeled by the linear filter with the impulse response

$h_1(\tau) = \delta(\tau - T)$    (5.2a)

where T is the delay. For any input signal s(t), the output is the convolution

$s(t) * \delta(t - T) = s(t - T)$    (5.2b)

The convolution operator * is well known in the literature. The frequency-domain description of the pure delay is

$H_1(f) = e^{-j2\pi fT}$

which has modulus 1 and a linear phase of −2πfT radians. Figure 5.1 summarizes these properties.

The next step is to use delay lines of lengths which are multiples of T. Obviously, we can consider this to be a tapped delay line, and let the number of sections be M−1. It is important to recognize that the final result may not be in this form, but that the analysis is linear and is simplified by this approach.

[Figure] Fig. 5.1. A delay line of T units: h₁(τ) = δ(τ − T), H₁(f) = e^{−j2πfT}.

[Figure] Fig. 5.2. A tapped delay line filter.

The model is illustrated in Fig. 5.2. The input point is one of the taps, so that one of the sequence of inputs has zero delay.

Two aspects of this filter should be noted. It is input-output stable, because the latest input (at the left) enters as the earliest input exits and is forgotten (at the right). Secondly, it is linear, since the delay operator and summation are linear. The impulse response is evident.

$h_2(\tau) = \frac{1}{M} \sum_{k=0}^{M-1} \delta(\tau - kT)$    (5.3)

The generalization to Eq. 5.1 results by allowing the coefficients r^{M−1−k} and letting T = PΔ; r will determine the particular filter, and the discussion of PΔ will be deferred until later.

5.2.2 Transfer Function. The Fourier transform of h(τ) is readily computed.

$H(f) = \frac{1}{M} \sum_{k=0}^{M-1} r^{M-1-k}\, \mathcal{F}\{\delta(\tau - kP\Delta)\} = \frac{1}{M} \sum_{k=0}^{M-1} r^{M-1-k}\, e^{-j2\pi kP\Delta f}$    (5.4)

A geometric progression to M terms has the sum

$1 + X + X^2 + \cdots + X^{M-1} = \frac{1 - X^M}{1 - X}, \quad X \neq 1$    (5.5)

X may be complex; here X = r^{−1} e^{−j2πfT}.

$H(f) = \frac{1}{M}\, r^{M-1}\, \frac{1 - r^{-M} e^{-j2\pi MfT}}{1 - r^{-1} e^{-j2\pi fT}}$    (5.6)

r = ±1 is the simplest case, for which the results are well known. When these four cases (r = ±1, ±j) have been treated, an extension to the whole complex r plane will be evident and presented.

5.2.3 The τ-Diagram. The impulse response and its Fourier transform are the usual representations of linear filters. For the class of Eq. 5.1, a third useful representation has been introduced. Here it will be called the τ-diagram; it is the impulse-train representation, on the τ-time axis, of the impulse response, and it is the time-reverse of the operation of the filter on incoming signals. The general form for any r is shown in Fig. 5.3 with the label r. Several cases for specific values of r are also shown. A merit of the τ-diagram is its pictorial display of the filter; h(τ) is just the sum of these components. The diagram also shows that when r is complex, a complex filter results; it should be regarded as a 2-channel device producing two real outputs for one real input.

[Figure] Fig. 5.3. τ-diagrams for r = 1, r = −1, r = j and r = −j (M = 8).

The input is the real signal s(t); the output is the complex signal

$z(t) = x(t) + j\,y(t) = s(t) * h_R(t) + j\, s(t) * h_I(t)$    (5.7)

Another point made by the τ-diagram is that for r = −1, M must be even, and for imaginary r, M must be divisible by 4. Otherwise, the results are apparently not useful.*

*This is an oversimplification; it will be clarified below.

5.2.4 Special Cases: r = ±1, r = ±j. For these cases, a general result can be maintained because r = e^{j2πa} for a = 0, 1/2, 1/4, 3/4.

$H(f) = \frac{1}{M}\, e^{j2\pi(M-1)a}\, \frac{1 - e^{-j2\pi Ma}\, e^{-j2\pi MfT}}{1 - e^{-j2\pi a}\, e^{-j2\pi fT}} = e^{j\theta}\, \frac{\sin(\pi M fT + \pi M a)}{M \sin(\pi fT + \pi a)} = e^{j\theta}\, \mathrm{sinc}\!\left(M\pi T\left(f + \frac{a}{T}\right); M\right)$    (5.8)

Some of the steps should be explained: first factor out the terms in a, then take out each of the second factors; the sine terms result. The sinc function is defined by

$\mathrm{sinc}\, x = \frac{\sin x}{x}$    (5.9)

A function which has been called the periodic analog of sinc x has also been defined:

$\mathrm{sinc}(x; M) = \frac{\sin(x)}{M \sin(x/M)}$    (5.10)

We identify the phase shift term in H(f), which corresponds to an average delay of (M−1)T/2 and a phase shift of (M−1)πa. The sinc function with parameter M is real and is the gain term.

$|H(f)| = \left| \mathrm{sinc}\!\left(M\pi T\left(f + \frac{a}{T}\right); M\right) \right|$    (5.11)

In this form, the frequency shift is evident for different a. The amplitude spectrum will be found for the case a = 0; then the four possible filters will be identified.

Amplitude Spectrum

The amplitude spectrum is the plot of |H(f)| against frequency f. The properties of the periodic sinc function follow from the zeros of the two sine functions involved. A value of 1 occurs at f values for which both terms have zeros: at f = 0, ±1/(PΔ), ±2/(PΔ), .... Zeros occur at f = 1/(MPΔ), 2/(MPΔ), and in general at m/(MPΔ) for m not a multiple of M. With this general approach, a single plot of |H(f)| with the parameters M, P, r and Δ will suffice for all the filters considered below. Figure 5.4 illustrates this function for different scales on the f-axis; recall that this is the standard form with r = 1 or a = 0.

[Figure] Fig. 5.4. Amplitude spectra |H(f)| = |sin(MπfT)/(M sin(πfT))| for r = 1 (M = 8), shown on several frequency scales.

Observe that the maximum between the zeros at 1/(MPΔ) and 2/(MPΔ) is much smaller than 1. Just for the record, this maximum occurs at approximately f = 1.43/(MPΔ) for all M > 4, and has the value |sin(1.43π)/(1.43π)| ≈ .218, approximately, for all M > 4. These results follow directly from

$\lim_{M \to \infty} \mathrm{sinc}(x; M) = \mathrm{sinc}\, x$    (5.12)

and are significant in that they demonstrate that the relative shape of the response is about the same for all M > 4 for this case |r| = 1. The power spectrum |H(f)|² has the same peak and zero locations, of course, but the shape of the peaks is that of sinc²(x; M). The first maximum is down 13 db relative to the peak.

Comb Filter or CIRAV

The case r = 1 or a = 0 is studied above. From Fig. 5.4 c) observe that the spectrum looks like a comb extended infinitely in both directions; this filter has been given the name comb filter. Continuing the analogy, the peaks will be called teeth. Its spectrum can be summarized in the following list.

Comb filter, r = 1
  Tooth center frequencies: f_n = n/(PΔ); n = 0, ±1, ...
  Tooth bandwidth: 1/(MPΔ)
  Quality: Q = (f_{n+1} − f_n)/bandwidth = M

The case r = −1 or a = 1/2 is readily available from the above results in Eq. 5.11; there is just a frequency shift of 1/(2PΔ). Since 1/(PΔ) is the separation of the teeth, the case r = −1 is the filter with teeth shifted half this distance.

Comb filter, r = −1
  Tooth center frequencies: f_n = n/(PΔ) + 1/(2PΔ); n = 0, ±1, ...
  Tooth bandwidth: 1/(MPΔ)
  Quality: Q = M

Complex Comb Filter

When r = ±j, complex filters result. Setting a = 1/4 or 3/4 in the last of Eqs. 5.8 produces H(f). Removing the phase shift term as identified above, there still remains a complex function. Let r = +j or a = 1/4.

$H(f) = e^{j\theta}\, \mathrm{sinc}\!\left(M\pi P\Delta\left(f + \frac{1}{4P\Delta}\right); M\right) = H_R(f) + j\, H_I(f)$    (5.13)

which defines the real and imaginary parts; carrying the phase factors through shows that each part carries a factor (1 ± j)/2 whose signs depend on whether M/4 is odd or even. To make the gain on each part equal, M should be divisible by 4, as stated above. Note also the frequency shift of 1/(4PΔ) for the case r = +j.

There is a more instructive procedure.

$h(\tau) = \frac{1}{M} \sum_{k=0}^{M-1} j^{M-1-k}\, \delta(\tau - kP\Delta)$    (5.14)

Separating the even-k taps from the odd-k taps (M divisible by 4),

$h(\tau) = \frac{j}{M} \sum_{l=0}^{M/2-1} (-1)^{l+1}\, \delta(\tau - l \cdot 2P\Delta) + \frac{1}{M} \sum_{m=0}^{M/2-1} (-1)^{m+1}\, \delta(\tau - P\Delta - m \cdot 2P\Delta)$

We recognize two (real) comb filters with parameters M/2, P, 2Δ, r = −1, and the second has an additional delay of PΔ. Therefore,

$H_c(f) = H_R(f) + j\, H_I(f)$    (5.15)
$= e^{j\theta} \left[ e^{-j2\pi f P\Delta}\, \mathrm{sinc}\!\left(M\pi P\Delta\left(f + \frac{1}{4P\Delta}\right); \frac{M}{2}\right) + j\, \mathrm{sinc}\!\left(M\pi P\Delta\left(f + \frac{1}{4P\Delta}\right); \frac{M}{2}\right) \right]$

Of course, we claim the filters of Eq. 5.13 and Eq. 5.15 are the same. Note the additional delay in the real filter, which synchronizes its output with the imaginary filter; indeed the time-domain operation adds to the real output first, then to the imaginary output. (See the τ-diagram.)

Complex comb filter, r = j; each part
  Tooth center frequencies: f_n = n/(2PΔ) + 1/(4PΔ); n = 0, ±1, ...
  Tooth bandwidth: 1/(MPΔ)
  Quality: Q = M/2

Note that the teeth are half as far apart compared to r = ±1, the bandwidth is the same, and the quality is half. Similar results are

obtained for r = −j.

Complex comb filter, r = −j; each part
  Tooth center frequencies: f_n = n/(2PΔ) + 3/(4PΔ); n = 0, ±1, ...
  Tooth bandwidth: 1/(MPΔ)
  Quality: Q = M/2

5.2.5 General Case: Complex r. The general case has for r any complex number. It is convenient to represent r in the form

$r = \alpha\, e^{j2\pi a} = e^{\ln \alpha + j2\pi a}; \quad \alpha > 0 \text{ real}$    (5.16)

Also, define

$u = \tfrac{1}{2} \ln \alpha$    (5.17)

$v = f + \frac{a}{T}$    (5.18)

From Eq. 5.6,

$H(f) = \frac{1}{M}\, e^{2(M-1)u}\, e^{j2\pi(M-1)a}\, \frac{1 - e^{-2Mu}\, e^{-j2\pi MTv}}{1 - e^{-2u}\, e^{-j2\pi Tv}}$    (5.19)

$= e^{j2\pi(M-1)a}\, e^{-j(M-1)\pi Tv}\, \frac{e^{(M-1)u}\, \sinh(Mu + j\pi MTv)}{M\, \sinh(u + j\pi Tv)}$    (5.20)

The variable v is f, frequency-shifted by the amount a/T. The parameters are M, T, u. The similarity to the result in Eq. 5.8 is evident; as a check, for r = 1 we have α = 1 and a = 0, so u = 0 and v = f; then use sinh(jX) = j sin X.

The power spectrum will be computed.

$|H(v)|^2 = e^{2(M-1)u}\, \frac{\sinh^2 Mu + \sin^2 \pi MTv}{M^2\, (\sinh^2 u + \sin^2 \pi Tv)}$    (5.21)

All arguments are real. With u = 0, the zeros and peaks are determined by the zeros of sin πMTv and sin πTv, as already discussed. With the additional u terms, |H(v)|² will be changed, but its characteristics will remain fixed. For example, |H(v)|² is still periodic in v with period 1/T, and it will exhibit peaks and ripple between them, but their positions will be changed. For no value of v is H(v) = 0. We claim that |H(v)|² is "smooth and concave" near its peaks. Near the peak at v = 0, approximate sin X by X and note that sinh² Mu ≫ sinh² u; the peak then behaves as the function 1/(v² + γ²).

Several examples are given for illustration. Consider the filter with M = 4, T = 1 and r = .90. Then α = .90, a = 0, u = −.05268 and v = f, there being no frequency shift.

$|H(f)|^2 = \frac{.729\, (\sinh^2 .21072 + \sin^2 4\pi f)}{16\, (\sinh^2 .05268 + \sin^2 \pi f)}$    (5.22)

This function is illustrated in Fig. 5.5 a). The function is also plotted for several other values of the parameter M. Figure 5.5 b) illustrates curves for r = 1.0 and several values of M. Figure 5.5 c) has the coefficient r = .5. Of interest is the value of the peak for various r; Fig. 5.6 shows |H(0)|² and its dependence on r.

$|H(0)|^2 = e^{2(M-1)u}\, \frac{\sinh^2 Mu}{M^2 \sinh^2 u}; \quad u = \tfrac{1}{2} \ln |r|$    (5.23)

To summarize this section, a standard form has been found for the filter with impulse response h(τ). The filter has three parameters: M, T, r. M is the number of delays, counting one of zero delay; this is the quality Q of the filter and determines the number of ripples in the spectra. T is the amount of each delay, and 1/T is the spacing between peaks in the spectra. The complex coefficient r determines the type of filter; in the form r = α e^{j2πa}, the product Mu = (M/2) ln α determines the shape of the spectra and a fixes the position of the peaks.

To this point, there has been deliberate avoidance of any mention of the properties of the input signal, such as low-pass, bandpass or sampled data. The above analysis is for a class of linear filters with the specified impulse response of Eq. 5.1. The results are quite general.

[Figure] Fig. 5.5. a) Power spectra |H(f)|² for r = .90. b) Power spectra |H(f)|² for r = 1.0.

[Figure] Fig. 5.5. c) Power spectra |H(f)|² for r = .5.

[Figure] Fig. 5.6. |H(0)|² versus r.
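Equation 5.21 is easy to evaluate numerically; the sketch below reproduces curves of the kind shown in Figs. 5.5 and 5.6. The function name is hypothetical, and the formula assumes |r| ≠ 1 (for |r| = 1, u = 0 and Eq. 5.11 applies directly).

```python
import numpy as np

def power_spectrum(f, M, T, alpha, a):
    """|H(f)|^2 of the filter of Eq. 5.1 via Eqs. 5.16-5.21,
    with r = alpha*exp(j*2*pi*a), u = 0.5*ln(alpha), v = f + a/T.
    Valid for alpha != 1; alpha = 1 makes sinh(u) = 0 at the teeth.
    """
    u = 0.5 * np.log(alpha)
    v = np.asarray(f, dtype=float) + a / T
    num = np.sinh(M * u) ** 2 + np.sin(np.pi * M * T * v) ** 2
    den = np.sinh(u) ** 2 + np.sin(np.pi * T * v) ** 2
    return np.exp(2 * (M - 1) * u) * num / (M ** 2 * den)

# The example of Eq. 5.22: M = 4, T = 1, r = 0.90
f = np.linspace(0.0, 0.5, 6)
print(power_spectrum(f, M=4, T=1.0, alpha=0.90, a=0.0))
```

At f = 0 this returns approximately .739, the squared dc gain ((1 + .9 + .81 + .729)/4)² of the M = 4, r = .90 filter.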

5.3 A Standard Form of Digital Processors

The impulse function δ(τ − kT) is in the class of generalized functions. We have already used it with other members of this class through the convolution operator *. By definition, s(t) * δ(t − kT) = s(t − kT), and the net operation is a delay (time-displacement) of the function s(t).

5.3.1 Sampling. The operation of sampling a function can also be modeled using the impulse function. The operator · (multiply) is used, and sampling is modeled as (by definition)

$s(t) \cdot \delta(t - \epsilon) = s(\epsilon)$    (5.24)

An infinite sequence of such delta functions may be represented as

$\delta_\epsilon(t) = \{\delta(t - k\epsilon)\}_{k=-\infty}^{\infty}$    (5.25)

and is called an impulse train. The sequence of samples of the function is

$\{s(k\epsilon)\}_{k=-\infty}^{\infty} = \{s(t) \cdot \delta(t - k\epsilon)\}_{k=-\infty}^{\infty} = s(t) \cdot \delta_\epsilon(t)$    (5.26)

With this model of the sampling operation, the digital processor (discrete time) equivalent to h(τ) can be derived. The big point is

that its spectra are already completely known. The output of a linear filter may obviously be sampled; this means we are looking for the results only at discrete instants in time. Clearly, the input may also be a sequence of samples (from some other filter), but usually the input and output samples are synchronized. The model for the filter h(τ) is illustrated in Fig. 5.7.

[Figure] Fig. 5.7. Sampled filter: input sampled by δ_T(t), output sampled by δ_MT(t).

5.3.2 Equivalent Digital Processor. Now that only samples at discrete times are involved, we recognize that an exact equivalent digital processor (algorithm) can be found for the filters of the class being studied. Since the output is sampled, the tapped delay line realization is not needed. A single delay unit with impulse response h(τ − T; r) is used, as shown in Fig. 5.8. Such a linear filter is often called a circulating average. In digital devices, the delay operation is just the storing of results for the next recursion.

[Figure] Fig. 5.8. Sampled circulating average: h(τ; M, T, r).

For the simplest situation, where only one number needs to be stored, the digital processor is shown in Fig. 5.9.

[Figure] Fig. 5.9. Simple circulating average (one word of memory).

The input sampling corresponds to accepting data at discrete times, and the output sampling takes the result when the processing is done; these sampling operations are implied in digital processing and will be incorporated with the digital processor. When a sequence of multiple inputs is provided in one duration T, a multi-position memory is required. Let the number of positions be P and T = PΔ, where Δ is the interval between input samples. See Fig. 5.10.

[Figure] Fig. 5.10. P-position circulating average.

5.3.3 MPR Block Diagram. A convenient block diagram description of this standard digital processor is shown in Fig. 5.11. From the parameters involved, this processor will be called an MPR filter; its acronym might have been taken from the apt descriptive phrase, "multi-positional recirculator."

[Figure] Fig. 5.11. MPR processor.

5.3.4 Examples. For the case P = 1 and r = 1, the processing consists of averaging over M values,

$\frac{1}{M} \sum_{k=0}^{M-1} s(k\Delta)$

so that this is often called Block Averaging (BAV). For P = 1 and r < 1 with M very large, a weighted block average results which approximates RC filtering. For P > 1, P words of memory are required and P words are contained in the output; we may regard this array of P numbers as the discrete-time description of the analog result.

The significance of a complex r has been noted. For M = 8, P = 1 and r = j, a complex filter results with a single complex output. It is equivalent to two parallel real filters; the block diagram is presented in Fig. 5.12. An additional delay of Δ is needed in the model to synchronize the x and y outputs.

[Figure] Fig. 5.12. The complex block average.

The MPR standard form cannot realize a pure delay, due to the choice of the form of h(τ).

5.4 Combined Operation

Just as in the continuous case, these digital processor building blocks may be combined to form more elegant processing. The topics of input-output matching and cascading of filters will be discussed. A pair of digital processor blocks can be successfully cascaded in real time under certain conditions; their output and input sampling must be coincident. For example, the cascade of two BAV blocks in Fig. 5.13 is allowed if Δ₂ = M₁Δ₁.

[Figure] Fig. 5.13. A cascade of block averages (BAV M₁, Δ₁ into BAV M₂, Δ₂; requires Δ₂ = M₁Δ₁).

Likewise, in Fig. 5.14, a CIRAV followed by a BAV requires Δ₄ = M₃Δ₃ and M₄ = P₃.

[Figure] Fig. 5.14. A cascade of circulating average and block average (requires Δ₄ = M₃Δ₃ and M₄ = P₃).

Furthermore, the memory of the devices can be exploited so that non-real-time processing can be performed. It is sensible to perform a CIRAV in real time and then to process the P-dimensional array at a faster, slower or later time. This analysis has shown that the resultant spectra will not be affected by such implementation convenience.

5.5 Time-Domain Viewpoint

In the time domain, the MPR processor is viewed as performing a weighted average of the set of M inputs. For example, if r = +1, an average is obtained and a signal is presumably present. If the signal is constant (but unknown), a single accumulation (P = 1) is needed. If the signal is periodic, a multi-position accumulator of P words is necessary. In either case, the purpose is to enhance the signal portion and reduce the effect of noise; the signal portion is correlated but the noise is uncorrelated, so the variance of the result is reduced.

On the other hand, if the noise by itself is of interest, the signal portion must be cancelled; this is accomplished with even M and with r = −1, so that signs alternate. This type of processing eliminates the signal and noise mean, so that the resultant can be studied for other properties of the noise, such as its covariance and average noise power. Both of these cases, r = +1 and −1, and others are used in the data processing described in the next chapter.

Given the statistics of the input process to such processors, the precise description of the output statistics is not simple. In general, an M-fold convolution of probability densities is required. Only for some densities can the integral be evaluated directly, and tabulations of difficult integrals are limited. Fortunately, for the Gaussian case, by linearity, the resultant output is Gaussian and the mean

and variance can be calculated. A detailed statistical analysis of the MPR processor is not required for this study; the processes to be studied are modeled as Gauss-Markov.

There are advantages to the combined viewpoints of the frequency domain and time domain. The frequency-domain viewpoint keeps the spectral picture of the signal in the foreground and clarifies where the signal and noise measurements come from. If the signal is periodic and has a known line spectrum, one MPR "comb filter" can be used to "look for" these lines while another measures the noise properties between the spectral lines of the signal. The time-domain viewpoint keeps clear the temporal operation of the MPR filter and is the springboard for a detailed statistical analysis.

To conclude this chapter, the MPR processor is reviewed. It is a linear, sampled-data processor which has a continuous analog with the impulse response of Eq. 5.1. The parameter M is the number of samples used in the averaging. P is the number of positions or words in the result. The parameter r is the radix in the coefficients of the average; r may be complex, in which case one real input sequence produces two real output sequences.
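In discrete time, the whole class reduces to one recursion per input sample. The following is a minimal sketch of such a processor; the class name and interface are assumptions here, not the report's implementation.

```python
import numpy as np

class MPR:
    """Sketch of the MPR ("multi-positional recirculator") processor of
    Section 5.3: P memory positions, M recirculations, complex radix r.
    One P-word output is produced for every M*P input samples.
    """
    def __init__(self, M, P, r):
        self.M, self.P, self.r = M, P, r
        self.mem = np.zeros(P, dtype=complex)
        self.pos = 0        # current memory position
        self.count = 0      # input samples seen in the current cycle

    def push(self, sample):
        """Feed one input sample; return the P-word result when ready."""
        self.mem[self.pos] = self.r * self.mem[self.pos] + sample
        self.pos = (self.pos + 1) % self.P
        self.count += 1
        if self.count == self.M * self.P:
            out = self.mem / self.M
            self.mem = np.zeros(self.P, dtype=complex)
            self.count = 0
            return out
        return None

# P = 1, r = 1 reduces to the Block Average (BAV); r = 1j with M divisible
# by 4 gives the complex comb whose real and imaginary outputs are X and Y.
bav = MPR(M=8, P=1, r=1.0)
for k in range(8):
    out = bav.push(float(k))
print(out)   # prints [3.5+0.j], the average of 0..7
```

Cascades such as the CIRAV-into-BAV arrangement of Fig. 5.14 are obtained by feeding the P-word outputs of one instance to another, subject to the rate-matching conditions of Section 5.4.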

CHAPTER 6
DATA PROCESSING

Synopsis
6.1 Introduction
    Model equations
    Kalman-Bucy equations
    Parameter estimation equations
6.2 Computer Program Flow Diagram
6.3 Monte Carlo Tests
    Gauss-Markov sequence sample functions
    Testing a stationary model
6.4 Source of Field Data
    Underwater acoustics
    Dual-comb data processing
    Processes T, N
6.5 Experimental Procedure
    Inspection of sample functions
    Checking initial conditions
    Inspection of filtered process
    Performance
    Time-varying model

6.1 Introduction

In this and following chapters, the Kalman-Bucy equations and the parameter estimation equations will be applied. For convenience, the relevant equations developed in Chapters 2, 3, and 4 are repeated here.

Model Equations

Process:  $x_k = \Phi_k x_{k-1} + u_k$    (6.1)

Observation:  $y_k = H_k x_k + v_k$    (6.2)

Initial condition:  $x_0$    (6.3)

Recall that u_k and v_k are Gaussian white noise sequences, which are defined in Eqs. 3.23 and 3.24.

Kalman-Bucy Equations

Process estimates:

$\hat{x}_{0|0} = \bar{x}_0$    (6.4)

$\hat{x}_{k|k-1} = \Phi_k \hat{x}_{k-1|k-1}$    (6.5)

$\hat{x}_{k|k} = \hat{x}_{k|k-1} + A_k (y_k - H_k \hat{x}_{k|k-1})$    (6.6)

for k = 1, 2, ..., where the optimal gain A_k is defined by the computations

$P_0 = E\{x_0 x_0'\}$    (6.7)

$\bar{P}_k = \Phi_k P_{k-1} \Phi_k' + Q_k$    (6.8)

$A_k = \bar{P}_k H_k' (H_k \bar{P}_k H_k' + R_k)^{-1}$    (6.9)

$P_k = (I - A_k H_k)\, \bar{P}_k\, (I - A_k H_k)' + A_k R_k A_k'$    (6.10)

P̄_k and P_k are the error covariances defined in Eqs. 3.19 and 3.13; the required order of computation is Eqs. 6.8, 6.9, 6.10, 6.5 and 6.6.

Parameter Estimation Equations

$\hat{\Phi}_k = \left( \sum_{j=3}^{k} c_{k-j}\, y_j y'_{j-2} \right) \left( \sum_{j=3}^{k} c_{k-j}\, y_{j-1} y'_{j-2} \right)^{+}$    (6.11)

$\hat{R}_{k-1} = -\hat{\Phi}_k^{-1} \hat{B}_{1,k}$    (6.12)

$\hat{Q}_k = \hat{B}_{0,k} - \hat{R}_{k-1} - \hat{\Phi}_k \hat{R}_{k-1} \hat{\Phi}_k'$    (6.13)

The statistics B̂_{0,k} and B̂_{1,k} are defined in Eqs. 4.34 and 4.35. Figure 6.1 shows the processing of the data before its use in the model; some of the operations are described in the next chapter.
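One recursion of Eqs. 6.5-6.10 fits in a few lines; a sketch follows, with the function name and interface assumed here rather than taken from the report's Fortran program.

```python
import numpy as np

def kalman_step(x_prev, P_prev, y, Phi, H, Q, R):
    """One recursion of Eqs. 6.5-6.10, computed in the required order
    6.8, 6.9, 6.10, 6.5, 6.6, for given (possibly time-varying)
    Phi, H, Q, R.
    """
    P_bar = Phi @ P_prev @ Phi.T + Q                       # Eq. 6.8
    S = H @ P_bar @ H.T + R
    A = P_bar @ H.T @ np.linalg.inv(S)                     # Eq. 6.9 (gain)
    I = np.eye(len(x_prev))
    P = (I - A @ H) @ P_bar @ (I - A @ H).T + A @ R @ A.T  # Eq. 6.10
    x_pred = Phi @ x_prev                                  # Eq. 6.5
    x_filt = x_pred + A @ (y - H @ x_pred)                 # Eq. 6.6
    return x_filt, P, x_pred, P_bar
```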

[Figure] Fig. 6.1. Preprocessing of data: magnetic tape → Dual processing → running-average memory (r_{k-4}, ..., r_k) → MMSE linear fit (y-intercept and slope) → y_k.

The application of these equations is toward a study of two acoustic processes, which are defined in this chapter and related to an on-going acoustic signal processing program. A computer flow diagram is presented to show the recursive computations in yet another form. A Monte Carlo test of the parameter estimation equations is included to study the model under controlled and known statistics.

6.2 Computer Program Flow Diagram

The heart of this program is the set of equations (Eqs. 6.4-6.13) described above. However, certain ancillary features make the program very flexible in its use. These features include the calculation of initial conditions, a separate evaluation of performance, a minimum mean-square error fit of a line to the observed data, the selection of several modes of parameter estimation, and the selection of several output formats. The last two items are briefly discussed here.

Due to the experimental nature of this work, it is deemed desirable to estimate the parameters Φ, Q and R simultaneously and also separately in various combinations. The following modes of estimation are allowed.

1. Φ, Q, R, as in the equations
2. Φ, Q, R fixed at initial values
3. Φ, with Q, R fixed at initial values
4. Φ, Q, with R fixed at initial value

There may be physical reasons for fixing Φ, Q or R. Furthermore, during early phases of modeling, such procedures often help to get a ball-park estimate of the parameters.

Also desirable is a selection of output formats for such a program. Of obvious interest are the series of computed estimates and the filter output vector. The list of options allowed is given below.

1. Φ̂_k, Q̂_k, R̂_k, x̂_{k|k}
2. Performance indices J⁰_k, J¹_k, J²_k, J³_k, J⁴_k
3. Forecasts of x
4. Statistics S₀₀, S₁₂, S₀₂, B₀, B₁
5. Program variables
6. Observed data

These options were found useful in particular when field data was used. Item 3 will be explained in the next chapter. The flow diagram is given in Fig. 6.2. The choice of options is made by the "mode". The program was written in Fortran, for an 8K PDP-8 system and for the Level-G Fortran system in MTS (Michigan Terminal System). Figure 6.3 contains the flow chart for the initial condition input subroutine ICIN; it consists of reading the control and run parameters and the initial conditions for Φ, Q and R, from which the initial conditions of the statistics are computed. We found this particular flow of action very convenient.

[Figure] Fig. 6.2. Program flow chart (initialization, observation processing, PHIEST, QEST, REST, PERF and OUTPUT blocks, selected by mode).

[Figure] Fig. 6.3. Flow chart for ICIN: control and run parameters, processing coefficients, and initial conditions for Φ, Q, R.

[Figure] Fig. 6.4. Flow chart for OUTPUT: estimates, performance indices, forecasts, statistics, program variables and observed data, selected by "kode".

Figure 6.4 is the flow chart for the output subroutine OUTPUT; the selection of output quantities is called the "kode".

6.3 Monte Carlo Tests

6.3.1 Gauss-Markov Sample Functions. A few sample functions of a discrete-time Gauss-Markov process (called a Gauss-Markov sequence) are illustrated in Fig. 6.5. All vectors are simply scalars (n = 1) and

$x_k = \Phi x_{k-1} + u_k$

where {u_k} is a sequence of independent Gaussian variables with zero mean and variance Q. A standard random number generator routine was used. The sample functions are typical for a Markov sequence of order n = 1. Observe that no particular pattern and no consistent oscillations are apparent. The values of x are correlated with correlation coefficient Φ. The x process has zero mean and variance Q/(1 − Φ²).

6.3.2 Testing a Stationary Model. The parameter estimation schemes were tested using a time-invariant model, n = 1. The values are Φ = .90, Q = 1.0, R = .40; this is a moderately noisy observation system. The results of 38 runs are summarized in Table 6.1; they agree with previous experiments of AKKW (Ref. 1).
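Such a test is easy to repeat. The sketch below (an illustration, not the report's program) generates a Gauss-Markov sequence with the values above and applies the stationary Model A estimates of Eqs. 4.12 and 4.19-4.22 in the scalar case; using the final Φ̂ in both residuals, rather than the running Φ̂_j, is a simplification of this sketch.

```python
import numpy as np

def model_a_estimates(y):
    """Scalar stationary-model estimates (Eqs. 4.12, 4.19-4.22), H = 1."""
    y0, y1, y2 = y[2:], y[1:-1], y[:-2]       # y_j, y_{j-1}, y_{j-2}
    phi = np.sum(y0 * y2) / np.sum(y1 * y2)   # Eq. 4.12
    e0 = y0 - phi * y1
    e1 = y1 - phi * y2
    A0 = np.mean(e0 * e0)                     # Eq. 4.19
    A1 = np.mean(e0 * e1)                     # Eq. 4.20
    R = -A1 / phi                             # Eq. 4.21
    Q = A0 - R - phi * R * phi                # Eq. 4.22a
    return phi, Q, R

rng = np.random.default_rng(1)
phi, Q, R, T = 0.90, 1.0, 0.40, 2000
x = np.zeros(T)
for k in range(1, T):
    x[k] = phi * x[k - 1] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), T)
print(model_a_estimates(y))    # compare with the averages in Table 6.1
```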

[Figure] Fig. 6.5. Gauss-Markov sequence sample function (Q = 1.0), plotted against discrete time k.

Note that Φ̂ is slightly lower than its "true" value. Q is nominally estimated correctly. The biggest problem is in estimating R; it is under-estimated in all cases.

k      AVG Φ̂    VAR Φ̂    AVG Q̂    VAR Q̂    AVG R̂    VAR R̂
500    .8848     .0014     1.0133    .0523     .1329     .0152
1000   .8837     .0015     1.0082    .0258     .1409     .0076
1500   .8960     .0008     .9920     .0357     .1440     .0089
2000   .8887     .0023     .9571     .0297     .1547     .0108

Table 6.1. Φ = .90, Q = 1.0, R = .40

6.4 Source of Field Data

6.4.1 Underwater Acoustics. The following chapters are applications of this theory to a study of certain underwater acoustic processes. The data is part of a long-term and continuing study of underwater acoustics and oceanographic data. The experimental site and the laboratory facilities are described in a report by Steinberg and Birdsall (Ref. 17). Previous data acquisition is described in the reports by Under and Veenkant (Refs. 19 and 20) and Hatter (Ref. 7).

6.4.2 Dual Comb Data Processing. The term "Dual Comb" will be clarified presently. The previous chapter has presented a general treatise on a class of digital processors; there it was pointed out that these digital "filters" are readily implemented on a digital computer; a member of this class has been given the name MPR

processor. A dual comb filter is set up as follows: one comb is made to have its peaks or teeth at a set of frequencies, and then a second comb is placed exactly mid-way between these teeth; the analogy is quite figurative and the name is aptly applied.

It is common practice at this site to use a band with center frequency at 420 Hz. The sampling of data is done at four times this frequency, 1680 Hz. The data-acquisition processing of the dual comb filter is illustrated in Fig. 6.6. The bandlimiting filter with transfer function H(f) is of paramount importance. All subsequent processing by processors of the MPR class is broadband; it is this front-end filter which prevents aliasing of all these frequencies. The filter has a nominal bandwidth of 100 Hz, center frequency of 420 Hz, and |H(f)| has slopes of 18 db/octave.

The first MPR filter is complex and performs what in the past has been called demodulation. Its filtering action is clear: it has a peak at f = 420 Hz and a bandwidth of about 210 Hz. It produces a complex output X + jY at intervals of 1/210 sec. Its smoothing is very slight, but it produces the orthogonal results X and Y. X and Y are then subject to identical processing. The four MPR processors with M = 2 accomplish the interleaving of the peaks of the comb filters.

[Figure] Fig. 6.6. Data-acquisition processing: hydrophone, amplifier and cable; bandlimiting filter; complex BAV (sampling at 1680 Hz); parallel CIRAV, square, BAV and log stages producing the S and N measurements over 35-1/7 sec.

One chain has output S, the other N. The sums of the squares of the x and y components are averaged in the final BAV processors; the logarithm of both of these numbers is calculated and subsequently stored on magnetic tape. The data is averaged over an interval of 35-1/7 sec. The "postprocessing" to the right of the dashed line in Fig. 6.6 is done in about 6 sec, and a CRT display is run for about 10 sec. Hence, the pair of observations arrives approximately every 52 sec.

6.4.3 Processes T, N. The T process is defined* as the log of the average of the quadratic content of the S filters. The N process is defined as the log of the average of the quadratic content of the N filters.

$T = \log\left(\tfrac{1}{2}(S_x^2 + S_y^2)\right)$

$N = \log\left(\tfrac{1}{2}(N_x^2 + N_y^2)\right)$

To this point, no mention has been made concerning the nature of the process at the hydrophone. The processing is meaningful for any input. In the event that no signal is being transmitted, these two processes T and N should be about the same.

*In the experimental development of Dual Comb processing at the Cooley Electronics Laboratory, there have been several variations on this definition, differing only in how the carrier power is treated.
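In code the two measurements are one line each; the base-10 logarithm and the factor 1/2 (averaging the two quadrature channels) are assumptions of this sketch.

```python
import numpy as np

def t_and_n(Sx, Sy, Nx, Ny):
    """T and N processes of Section 6.4.3: log of the average quadratic
    content of the S (signal) and N (noise) comb outputs."""
    T = np.log10(0.5 * (Sx ** 2 + Sy ** 2))
    N = np.log10(0.5 * (Nx ** 2 + Ny ** 2))
    return T, N
```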

The primary purpose of the dual comb processing is to study the received signal power and noise power when a known signal of known period is being transmitted. The signal used in these experiments is a 420 Hz carrier, amplitude modulated (switched ON-OFF) by a binary pseudo-random sequence of period 15 bits. The details of such signals are adequately discussed in previous articles. The essential feature is that the binary sequence is clocked at 1680/32 = 52.5 Hz; that is, the bit duration is Δ_s = 32Δ, with Δ = 1/1680 sec as above. The envelope of the signal power spectrum is

$|H_s(f)|^2 = \left(\frac{\sin \pi f \Delta_s}{\pi f \Delta_s}\right)^2$

This envelope has zero values at f = m/Δ_s, m = ±1, ±2, .... The spectral lines are located at f_p = p/(15Δ_s), p = 0, ±1, ±2, ....

6.5 Experimental Procedure

Considerable experimentation has pointed to the need for an experimental procedure in establishing the model parameters. The following steps are useful.

1. Inspection of sample functions. An inspection of a sufficient amount of data can give information on the model parameters Φ, Q, R and their initial conditions. For example, the T process exhibits sinusoidal behavior

which is attributable to a second-order system with a stochastic process input. Periodicity of 50 to 150 sample intervals is observed. Also, the sample functions indicate that, for the T process, Q is small, being less than .50 and more likely .050. Likewise, R is small, due to the long averaging which produces each data point. However, it is significant to keep in mind that the initial values of Q and R are not very critical, since the estimation equations will reduce their effect as time progresses.

2. Checking initial conditions. Having selected a nominal set of initial conditions for Φ, Q and R, it is often worthwhile to use the time-varying model to check these initial conditions. For example, if the results show a decreasing driving noise covariance Q but the data does not support such behavior, then the initial value of Q was too high. Iteration of this procedure easily determines reasonable values for initial conditions.

3. Inspection of filtered process. The filtered process is the output x̂_{k|k} of the optimal filter. For a fixed model with the nominal values for Φ, Q, R, the filtered process should be a smoothed version of the observed data. If it is not very smooth but rather seems to follow the observed points, then the gain A is too high. If the filtered process is smooth but seems to lag the observed points, the gain A is too low. These

points are clarified by rewriting the filter equations (Eqs. 6.5 and 6.6) to get

$\hat{x}_{k|k} = (I - A_k H_k)\, \Phi_k\, \hat{x}_{k-1|k-1} + A_k y_k$

4. Performance. This step will indicate the quality of the model by evaluating the performance indices J⁰, J¹, ..., J^N defined in Chapter 3. Ideally, x̃_k = x_k − x̂_k should be used in this evaluation, but of course x_k is not available; y_k = x_k + v_k is observed. Therefore ỹ_k = y_k − ŷ_k = y_k − x̂_k is used, and consequently a lower bound for J⁰ is P̄ + R ≥ Q + R: the estimation error variance cannot be any less than the sum of the forcing process variance and the observation process variance. This performance index might be evaluated for several sets of values for Φ, Q, R. An improved fixed model (time-invariant) will often be evident. However, little effort is warranted to accomplish this, because the time-varying model should exhibit considerable improvement.

5. Time-varying model. Having established a set of initial conditions for Φ, Q, R, inspected the filtered process and evaluated the performance, the time-varying model can be applied to improve the performance. The parameters Φ, Q, R will be "tracked" by the estimation scheme

and better performance at all times k should be expected. The reasons for not getting better performance at a particular time k will be discussed for each process under study.
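Step 3 can be illustrated with the rewritten filter equation in the scalar case; the toy data and parameter values below are assumptions for illustration only.

```python
import numpy as np

def run_filter(y, phi, A):
    """x_hat_k = (1 - A) * phi * x_hat_{k-1} + A * y_k  (scalar, H = 1)."""
    x_hat = np.zeros(len(y))
    for k in range(1, len(y)):
        x_hat[k] = (1 - A) * phi * x_hat[k - 1] + A * y[k]
    return x_hat

rng = np.random.default_rng(2)
y = np.sin(np.linspace(0, 6 * np.pi, 300)) + 0.3 * rng.normal(size=300)
smooth_but_lagging = run_filter(y, phi=0.95, A=0.1)   # low gain: smooth, lags
follows_the_data = run_filter(y, phi=0.95, A=0.9)     # high gain: little smoothing
```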

CHAPTER 7
MODEL APPLIED TO THE T PROCESS

Synopsis
7.1 Introduction
7.2 T Process Sample Functions
7.3 One-State Model, n = 1
    Coefficients in the estimators
    Initial conditions
    Filtered process
    Performance results
7.4 Modified Two-State Model, n = 2
    Coefficients in the estimators
    Initial conditions
    Filtered process
    Performance results
7.5 Comments on the T Process

7.1 Introduction

The T process, as defined in Section 6.4.3, is a broadband signal power measurement expressed in db. For the reasons given in Chapter 1, the Gauss-Markov conditions are reasonable to assume for this process. The questions to answer in this chapter are the following:

1) What minimum n-state model is adequate for the process?
2) Is the time-varying model better?
3) What are the performance indices?

7.2 T Process Sample Functions

To illustrate the T process, a few sample functions have been plotted. Figures 7.1 a) and b) each represent about two hours of observations, and Fig. 7.1 c) is about four hours of observations. These sample functions are exactly those produced by the data acquisition program called "Dual". Observe that the process is considerably over-sampled. By taking only 1 of every 5 data points, the sample function of Fig. 7.2 results. This is about 18 hours of data, with samples spaced about 4.3 minutes apart.

7.3 One-State Model, n = 1

The first study is for the scalar case n = 1, and all equations are scalar. This is a strong simplification, but it is justifiable on three points. It is the elementary starting place to study the models under field test.

[Figure] Fig. 7.1. a) T process sample function, plotted against discrete time k.

[Figure] Fig. 7.1. b) T process sample function.

[Figure] Fig. 7.1. c) T process sample function.

[Figure] Fig. 7.2. T process sample function, one of every 5 data points.

The observations y_k are often available only in scalar form; other state variables are often difficult or impossible to observe, or very noisy. Thirdly, the scalar models can be expected to have acceptable performance for some processes; if not, they will indicate which higher-order models to consider. This has been the case in this study, as the sequel will show.

7.3.1 Coefficients in the Estimators. To use the estimation scheme C, the coefficients c_{k-j}, d_{k-j}, e_{k-j} must be chosen. To this point, a wide selection of statistics is possible by choice of these coefficients. However, the coefficients are now constrained to make the algorithm simple (for ease of programming, to conserve computer core space, and to yield economy in run time). The constraint is that they be powers of a fixed number less than 1; see Chapter 5.

$c_{k-j} = r_\Phi^{\,k-j}, \quad r_\Phi < 1$    (7.1)

$d_{k-j} = r_0^{\,k-j}, \quad r_0 < 1$    (7.2)

$e_{k-j} = r_1^{\,k-j}, \quad r_1 < 1$    (7.3)

r_Φ is associated with the Φ̂ estimate, r₀ with B₀ and r₁ with B₁. Then for each k, k−j measures time into the past, and

$c_0 = 1, \quad c_1 = r_\Phi, \quad c_2 = r_\Phi^2, \ldots$    (7.4)

$d_0 = 1, \quad d_1 = r_0, \quad d_2 = r_0^2, \ldots$    (7.5)

$e_0 = 1, \quad e_1 = r_1, \quad e_2 = r_1^2, \ldots$    (7.6)

and the computations of the statistics are greatly simplified.

$S_{12,k} = r_\Phi\, S_{12,k-1} + y_{k-1} y_{k-2}$    (7.7)

$S_{02,k} = r_\Phi\, S_{02,k-1} + y_k y_{k-2}$    (7.8)

$B_{0,k}/b_0 = r_0\, B_{0,k-1}/b_0 + (y_k - \hat{\Phi}_k y_{k-1})^2$    (7.9)

$B_{1,k}/b_1 = r_1\, B_{1,k-1}/b_1 + (y_k - \hat{\Phi}_k y_{k-1})(y_{k-1} - \hat{\Phi}_{k-1} y_{k-2})$    (7.10)

The gain terms b₀ and b₁ are determined for large k:

$b_0^{-1} = \lim_{k \to \infty} \sum_{j=3}^{k} r_0^{\,k-j} = \sum_{i=0}^{\infty} r_0^{\,i} = \frac{1}{1 - r_0}$    (7.11)

$b_1^{-1} = \frac{1}{1 - r_1}$    (7.12)

Nominal values of r_Φ, r₀ and r₁ can be selected by inspection of the sample functions. These coefficients will be refined during testing and will remain fixed when found.

A qualitative discussion may help to introduce these calculations. If the statistic were nearly constant, selecting r just below

1, at 1 − .001 for example, would yield a filter coefficient of .10 for data at time k − 2300. For r = 1 − .01, a coefficient of .10 is applied to data at k − 230. In either case, the statistic would be insensitive to observation noise, and it would not follow local changes. On the other hand, if the statistic has significant time-variation, then r should not be close to 1. This would permit tracking of the variations, but the estimates would be more in error (noisy). There must be a compromise between these two extremes.

The following discussion yields a nominal value for r_Φ. The sample function in Fig. 7.2 is observed to have considerable "up-down" motion. For this model, any regularity in this behavior should be ignored, for the one-state model (n = 1) can make use of only the observation, and not its derivative. Assuming that Φ_k is almost constant (steady), the value of r_Φ should be very close to 1.0, so take a nominal value of

$r_\Phi = 1.0 - .001 = .999$    (7.13)

If Φ_k should be allowed to be time-varying, then a smaller value of r_Φ is needed. The shortest "ramp" in the data is of the order of 10 observation intervals. Take

$r_\Phi = 1.0 - .05 = .95$    (7.14)

The calculations below give a nominal value for r₀ used in

calculating B₀. This statistic is based on difference terms of the form

$y_k - \hat{\Phi}_k y_{k-1} = u_k + v_k - \hat{\Phi}_k v_{k-1}$    (7.15)

Due to the dependence on v_k and v_{k-1}, these terms are not independent. However, since v is postulated as having smaller variance than u, approximate independence exists. Furthermore, this condition is used only in this calculation of r₀, and an upper bound will result.

A digression to recall the chi-squared distribution, denoted by χ², and its properties is necessary.

Theorem. Let x₁, x₂, ..., x_n be a random sample from a normal distribution with mean 0 and variance σ², often denoted as N(0, σ²). Then

$\sum_{i=1}^{n} \left(\frac{x_i}{\sigma}\right)^2$    (7.16)

has a chi-squared distribution with n degrees of freedom.

The estimator Â₀,k for A₀ is of this form, and although B̂₀,k is not exactly χ², its behavior will be grossly similar to χ². Based on chi-squared properties, a value of n will be chosen from which r₀ will be evident.

Given a random sample of size n from a normal population with mean 0, the maximum-likelihood estimator for σ² is

$\hat{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{\sigma^2}{n}\, \chi_n^2$    (7.17)

The density for the estimator is therefore

$f(\hat{\sigma}^2) = \frac{(n/2)^{n/2}}{\Gamma(n/2)\, \sigma^n}\, (\hat{\sigma}^2)^{(n/2)-1}\, e^{-n \hat{\sigma}^2 / 2\sigma^2} \quad \text{for } \hat{\sigma}^2 > 0$    (7.18)

A confidence interval with level .80 means that the true value of the parameter being estimated will be in this interval with probability .80. We arbitrarily seek n to make the confidence interval shorter than .4 σ². Using the Handbook (Ref. 6) and the condition that n is large, the results are tabulated in Table 7.1. On the basis of the length of the confidence interval, n = 90 will suffice. To get the corresponding r₀, gains are equated:

$\frac{1}{n} = b_0 = 1 - r_0$    (7.19)

$r_0 = 1 - \frac{1}{90} = .989$    (7.20)

This is the nominal value for r₀, but it should not be larger than this.

  n    χ²_{.10}   χ²_{.90}   (n/χ²_{.90}, n/χ²_{.10})    80% Confidence Interval Length
  40   29.0505    51.8050    (.772, 1.375) σ̂²            .603 σ̂²
  50   37.6886    63.1671    (.792, 1.325) σ̂²            .533 σ̂²
  60   46.4589    74.3970    (.807, 1.290) σ̂²            .483 σ̂²
  70   55.3290    85.5271    (.818, 1.265) σ̂²            .447 σ̂²
  80   64.2778    96.5782    (.829, 1.242) σ̂²            .413 σ̂²
  90   73.2912   107.565     (.836, 1.228) σ̂²            .392 σ̂²
 100   82.3581   118.498     (.844, 1.215) σ̂²            .371 σ̂²

Table 7.1. Confidence intervals

By reducing r_0, better tracking of the parameter may result, but a larger confidence interval results for the same level of .80. Such a simple procedure is not possible for r_1, because B_1 is a sum of product pairs, not a sum of squares. Hence B_{1,k} has increments of about the same magnitude as B_{0,k}, but clearly they may be positive or negative. So B_{1,k} is smaller than B_{0,k} and has larger variance; a larger sample size n (or r_1 closer to 1) is required. A development of the density for B_1 and a calculation of the confidence interval have not been fruitful. A nominal value is needed, so a sample size 5 times that for r_0 is used:

r_1 = 1 - 1/(5 × 90) = 1 - .0022 = .9978   (7.21)
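The entries of Table 7.1 can be checked with any chi-squared percentile routine. The short sketch below, which assumes the SciPy statistics library is available, reproduces the tabulated percentiles and interval lengths to within the rounding of the original table.

```python
from scipy.stats import chi2

for n in (40, 50, 60, 70, 80, 90, 100):
    lo = chi2.ppf(0.10, df=n)          # 10th percentile of chi-squared_n
    hi = chi2.ppf(0.90, df=n)          # 90th percentile
    lower, upper = n / hi, n / lo      # 80% interval for sigma^2, in units
    length = upper - lower             # of the estimate sigma-hat^2
    print(f"{n:4d} {lo:9.4f} {hi:9.4f} ({lower:.3f}, {upper:.3f}) {length:.3f}")
```

For n = 90 the printed length is about .392, the first value below the .4 criterion, which is how n = 90 and hence r_0 = 1 - 1/90 = .989 were selected.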

7.3.2 Initial Conditions. In Chapter 3, initial conditions for the Kalman-Bucy equations have been given in terms of Φ, Q, R; P_2 was determined from a steady-state condition and x_0 = 0 was assumed. In Chapter 4, initial conditions on the estimators for Φ, Q, R have been expressed in terms of Φ, Q, R. Thus only initial conditions for Φ, Q, R are needed, and this is the scalar case. Viewed as a first-order Markov process, the sample functions of the T process in Fig. 7.1 are not unlike those of the computer-generated sample functions in Fig. 6.5. No marked trends are apparent, and the up-down motion might be attributed to the random nature of the received signal. By comparison with the sample functions of the Gauss-Markov processes in Section 6.3, the following nominal values are assigned to the initial conditions. Initial time is at k = 0, 1, 2, according to the notation of Chapter 4:

Φ_0 = Φ_1 = Φ_2 = .95
Q_0 = Q_1 = Q_2 = .50   (7.22)
R_0 = R_1 = R_2 = .05

where the last choice, R_2 = .10 Q_2, is quite arbitrary. These values may be changed, as explained in Section 6.5 on experimental procedure.

7.3.3 Filtered Process. The filtered process is the output x_{k|k} produced by the Kalman filter. Figure 7.3 shows the result for the model being studied.

Fig. 7.3. T process: estimate x_{k|k} for n = 1 (discrete time k on the abscissa; 1-r_φ = .05, 1-r_0 = .011, 1-r_1 = .0022, Φ_0 = .95, Q_0 = .50, R_0 = .05)

Observe that x_{k|k} follows the observed process fairly closely, but that there is some smoothing. This is due to the high values of A_k as computed by the gain equation. For comparison to other models below, the forecasts x_{k+j|k} for j = 1, ..., 20 and for several k are given in Fig. 7.4. For n = 1, and for a stable system with Φ < 1, the prediction is an exponential, decaying toward the mean of the process. The corresponding series of Φ̂_k, Q̂_k and R̂_k are given in Fig. 7.5; these are for initial conditions of .97, .120 and .006 respectively. A_k is very close to 1.0, and so P_k is approximately Q_k + R_k, which is just slightly larger than Q_k, and is not plotted.

7.3.4 Performance Results. This one-state model has poor performance, as measured by several indicators. The filtered process seems to continually lag the observed process, for the simple model cannot follow rapid changes. The prediction error covariance P_k is small, being very close to its lower bound Q_k + R_k, but the performance index J0_k is much larger than P_k. P_k and J0_k are plotted in Fig. 7.6. Consequently, the one-state model of the T process may be rejected. This model is not adequate as a smoothing filter, nor as a prediction filter (forecasting). The resultant sequences {Φ̂_k}, {Q̂_k}, {R̂_k} are therefore not very reliable. These conclusions are in part based on the significant improvements made by extending the model to a two-state model.
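For readers who wish to reproduce this one-state filter, the sketch below implements the scalar Kalman recursion under the usual assumptions (H = 1, and Φ, Q, R held fixed). The function name and interface are illustrative, not those of the processing programs used to produce Fig. 7.3.

```python
import numpy as np

def scalar_kalman(y, phi, Q, R):
    """One-state model x_k = phi x_{k-1} + u_k,  y_k = x_k + v_k."""
    x = 0.0                       # x_0 = 0 assumed, as in Chapter 3
    P = Q / (1.0 - phi**2)        # steady-state initial variance (Appendix E)
    x_filt, gains = [], []
    for yk in np.asarray(y, dtype=float):
        xp = phi * x              # prediction x_{k|k-1}
        Pp = phi**2 * P + Q       # prediction error covariance
        A = Pp / (Pp + R)         # gain A_k
        x = xp + A * (yk - xp)    # filtered estimate x_{k|k}
        P = (1.0 - A) * Pp
        x_filt.append(x)
        gains.append(A)
    return np.array(x_filt), np.array(gains)
```

With Q/R = 10, as in the nominal initial conditions of Eq. 7.22, the gain settles near 1 and the filter follows the observations closely, which is the behavior reported above. The j-step forecast for this model is simply x_{k+j|k} = phi**j * x_{k|k}, the decaying exponential of Fig. 7.4.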

Fig. 7.4. T process: forecasts x_{k+j|k} for n = 1 (mode = 1; 1-r_φ = .05, 1-r_0 = .011, 1-r_1 = .0022, Φ_0 = .95, Q_0 = .50, R_0 = .05)

Fig. 7.5. Estimates Φ̂_k, Q̂_k, R̂_k for n = 1 (discrete time k on the abscissa; 1-r_0 = .011, 1-r_1 = .0022)

Fig. 7.6. P_k, J0_k for n = 1 (1-r_φ = .05, 1-r_0 = .011, 1-r_1 = .0022, Φ_0 = .97, Q_0 = .120, R_0 = .006)

7.4 Modified Two-State Model, n = 2

The model to be considered in this section is a two-state model from the optimal-estimate vantage point, but is a one-state model from the parameter-estimation viewpoint; hence the term modified two-state model. To clarify this modification, a table of variables and dimensions is given.

             Kalman Filter    Parameter Estimation
 y_k         2 x 1            scalar y_k^(1)
 x_{k|k}     2 x 1            -
 Φ_k         2 x 2            scalar φ_k
 A_k         2 x 2            scalar A
 P_k         scalar           scalar
 Q_k         scalar           scalar
 R_k         scalar           scalar

Table 7.2. Dimensions in modified model

The estimation equation for x_{k|k} is 2-dimensional, but the other calculations are scalar. The first component of y_k is y_k^(1) and is used to obtain the scalars φ_k, Q_k and R_k.

The matrices Φ_k, H_k and A_k are obtained by the following extension, which uses the approximation discussed in Section 2.3.3:

Φ_k = [φ_11  φ_12; φ_21  φ_22],  with φ_11 = 1 - (1 - φ_k)/3,  φ_21 = -2(1 - φ_k),  φ_22 = φ_k   (7.23)

H_k = I (2 x 2)   (7.24)

A_k = [A_11  0; 0  A_22]   (7.25)

where A_11 = A_22 is the gain from the scalar computation. These matrices are required in the filter equations (Eqs. 6.5 and 6.6).

7.4.1 Minimum Mean-Square Fit to Obtain Vector y. This section contains a general discussion of a technique which can be used to get multi-dimensional observations of a process, and a particular scheme to get a 2-dimensional observation. In general, the observation matrix H_k is m x n, where m is the dimension of the observation y and n is the dimension of the process x. Usually m < n, and it is desirable to have m = n. But if the observations are restricted to m = 1, the processing can still be achieved; there are two possibilities.

i) Accept H_k as 1 x n. The Kalman equations are formulated for the general case, given Φ_k, Q_k and R_k.

ii) Obtain the full observation vector y, where m = n and H_k is n x n.

As already pointed out in Chapter 4, the estimation schemes require an n-dimensional observation. Furthermore, to account for the oscillating tendencies of the process, higher-order models and observations are required. For these reasons, item i) is rejected, and an algorithm is presented to obtain the two-dimensional y. The algorithm is based on a minimum mean-square fit of a line to the scalar observations. It will be successful if the original observations can be taken at a much higher rate than the minimum sampling rate; the result will be a higher-order observation taken at a slower rate. The details for the case n = 2 follow.

A minimum mean-square (MMS) fit of a straight line to the data points is sought. Let the line be

ŷ_k = s k + y_0   (7.26)

Error = Σ_{k ≤ 0} w_k (y_k - s k - y_0)²   (7.27)

where w_k is a weighting coefficient and y_k is the data point. The partial derivatives of this mean square error with respect to s and y_0 are set to 0 to obtain two equations in s and y_0.

[Σ w_k k²   Σ w_k k] [ s ]   [Σ w_k k y_k]
[Σ w_k k    Σ w_k  ] [y_0] = [Σ w_k y_k  ]   (7.28)

Select the number of data points as 5 and the weights w_k = 10 + k for -4 ≤ k ≤ 0 (chosen rather arbitrarily); this corresponds to having observed the data point at k = 0, the last 5 points being fitted by a straight line. The required solution for s and y_0 is readily accomplished:

s = .226 y_0 + .087 y_{-1} - .026 y_{-2} - .113 y_{-3} - .174 y_{-4}   (7.29)

y_0 = .645 y_0 + .378 y_{-1} + .155 y_{-2} - .023 y_{-3} - .155 y_{-4}   (7.30)

As a check, observe that if the y_k sequence is a constant, then s = 0 and y_0 = that constant. If the y_k sequence is a straight line, then indeed s = the slope of that line and y_0 is the ordinate intercept. To summarize, this processing of the field data can be diagrammed as shown; henceforth, the observation vector is the output of such an algorithm.

y ---> [MMS Fit] ---> y_k = (y, dy)

Fig. 7.7. Line fit for vector y_k
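The coefficients in Eqs. 7.29 and 7.30 follow directly from Eq. 7.28; the sketch below solves the weighted normal equations numerically and recovers them (the names are illustrative).

```python
import numpy as np

k = np.arange(-4, 1)                      # five sample times k = -4, ..., 0
w = 10.0 + k                              # weights w_k = 10 + k
# Normal equations (Eq. 7.28) for the line y-hat = s k + y0:
N = np.array([[np.sum(w * k * k), np.sum(w * k)],
              [np.sum(w * k),     np.sum(w)]])
Ninv = np.linalg.inv(N)
# The coefficient applied to the data point y_k is Ninv (w_k k, w_k)'.
for kk, wk in zip(k, w):
    s_coef, y0_coef = Ninv @ np.array([wk * kk, wk])
    print(f"y_{kk:+d}:  s: {s_coef:+.3f}   y0: {y0_coef:+.3f}")
```

The printed coefficients agree with Eqs. 7.29 and 7.30, e.g. +.226 and +.645 for y_0, and -.174 and -.155 for y_{-4}.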

As a parenthetical note, numerical differentiation followed by averaging has also been investigated to achieve the higher-order observation vector, but the results were not satisfactory. The methods of numerical differentiation assume that the data is almost noise-free and that the process is a polynomial. However, the approach cannot be disregarded categorically and should be studied at greater length.

7.4.2 Coefficients in the Estimators. The coefficients r_φ, r_0, r_1 are needed for the estimators. The nominal values of r_0 and r_1 are as determined for n = 1:

r_0 = .989   (7.31)

r_1 = .9978   (7.32)

A nominal value for r_φ when n = 2 will be obtained. From the sample function in Fig. 7.2, a marked sinusoidal character is observed; a minimum period occurs at k = 1050.

sampling interval = Δ = 52 sec
minimum period = 40 Δ
maximum frequency = f_m = 1/(40 Δ) = .00048 Hz
reciprocal time constant = 1/τ = 2ω_m = 4π f_m = .00603

r_φ = 1 - 1/τ = .99397   (7.33)

Thus we have chosen the digital equivalent of a filter which has a single pole at -1/τ, so that the gain of the double-frequency term will be 3 db down. Returning again to the sinusoidal behavior, the following table lists some of the observed periods and shows φ.

 Period a   ω = 2π/a   φ = cos 5ω
  50        .1255      .80920
  75        .0838      .91349
 150        .0418      .97824

Table 7.3. Observed periods

S is discussed in Section 2.3.3, and S = 5 is the number of data samples used to get an observation vector y_k. These are trial values for φ in the modified second-order model.
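The trial values of φ in Table 7.3 come directly from φ = cos Sω with S = 5; a brief numerical check (the values agree with the table to within its rounding):

```python
import math

S = 5                                  # samples per observation vector
for a in (50, 75, 150):                # observed periods, in units of the
    omega = 2.0 * math.pi / a          # sampling interval
    print(a, round(omega, 4), round(math.cos(S * omega), 5))
```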

Fig. 7.8. T process: estimate x_{k|k} for n = 2 (mode = 1; 1-r_φ = .006, 1-r_0 = .011, 1-r_1 = .0022, Φ_0 = .95, R_0 = .05)

Fig. 7.9. T process: estimate x_{k|k} for n = 2 (mode = 4; 1-r_φ = .06, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .93, Q_0 = .15, R_0 = R_k = .05)

Fig. 7.10. T process: estimate x_{k|k} for n = 2 (mode = 3; 1-r_φ = .06, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .90, R_0 = R_k = .10)

Fig. 7.11. Estimates Φ̂_k, Q̂_k, R̂_k for n = 2 (1-r_φ = .01, 1-r_0 = .01, 1-r_1 = .005)

7.4.3 Filtered Process. The filtered process x_{k|k} for this model is illustrated in Fig. 7.8. It is readily apparent that x_{k|k} is a smoothed version of the observed process. Figures 7.9, 7.10 and 7.11 are detailed illustrations of this filtering action of the Kalman filter.

7.4.4 Performance Results. The performance of this modified two-state model is substantially better than that of the one-state model; the illustrations of x_{k|k} have already made this clear in a qualitative way. In this section, the performance will be put to a quantitative test by inspection of the error covariance sequence {P_k} and the performance index sequence {J0_k}. The error covariance P_k is shown in Fig. 7.12 along with the performance index J0_k. P_k is the solution of the variance equation and depends on Φ_k, Q_k and R_k, while J0_k, defined in Eq. 3.52, is another measure of the mean square error between the observed and filtered process. The two statistics are not the same, but they both indicate mean quadratic properties, so it is not surprising to find them nearly the same.

A significant result which the model should produce is a reasonable forecast of the future values x_{k+j|k} of the process, based on the present state (as estimated by x_{k|k}) and the knowledge of the system parameters (as estimated by Φ̂_k, Q̂_k and R̂_k). That such forecasting, or prediction, is possible and sensible is a fact which will be demonstrated now. Given that x_{k|k} is the state of the process at time k, then by the very definition of a state variable, subsequent states can be obtained from the system if the input is known. In the problem being studied, the input is a Gaussian white noise sequence with zero mean. A direct consequence is that the best forecast, in the minimum mean square sense, is the response of the system with initial state x_{k|k} and no further input.

Fig. 7.12. Error performance P_k and performance index J0_k (mode = 1; 1-r_φ = .001, 1-r_1 = .0022, Φ_0 = .95, Q_0 = .50, R_0 = .05)

In control terminology, the free response of the system from any time k with state x_{k|k} is the best prediction of future values of the stochastic process x. There are two general sources of error in these forecasts. First, the state of the process is not known exactly, for x_{k|k} is an estimate, being the conditional mean using all past observations. Second, as hypothesized in this work, the underlying system Φ_k and the noise covariances Q_k and R_k are not known exactly. Such imprecise knowledge of Φ_k and its variation in time makes forecasting far into the future very difficult. However, for short-range forecasting, such as 20 observation intervals, sensible results occur. Figure 7.13 is a composite of several such forecasts; the filter has been operating as a Kalman-Bucy estimator of x_k up to the left-most point of each curve, and then a 20-point forecast is drawn. Observe that specific values are not predicted very accurately, but general trends are often correctly indicated. Unfortunately, not all such trends are predicted, for the model is only a second-order, undamped system.
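The forecasts of Fig. 7.13 are free responses of the estimated system. A minimal sketch of the computation follows, assuming the current estimates Phi (2 x 2) and the filtered state x_{k|k} are available; the names are illustrative.

```python
import numpy as np

def free_response_forecast(Phi, x_filt, horizon=20):
    """x_{k+j|k} = Phi^j x_{k|k}, j = 1, ..., horizon.
    This is the minimum-mean-square forecast when the driving input
    is zero-mean white noise, since its expected future value is zero."""
    x = np.asarray(x_filt, dtype=float)
    forecasts = []
    for _ in range(horizon):
        x = np.asarray(Phi) @ x        # propagate the state, no input
        forecasts.append(x)
    return np.array(forecasts)
```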

Fig. 7.13. T process: forecasts x_{k+j|k} (mode = 4; 1-r_φ = .06, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .93, Q_0 = .15, R_0 = R_k = .05)

7.5 Comments on the T Process

The T process is average acoustic signal power, measured in a fixed frequency band by a comb filter and expressed in db, as defined in Section 6.4. A periodic signal with known frequency spectrum is presumably being transmitted. At the hydrophone, the received signal has impressed upon it the effects of the ocean, including propagation variations, surface effects and noise. In modeling this T process, the conditions assumed were that the process is Gauss-Markov and that a linear, stochastic model of order n is adequate. The results for n = 1 were not appealing, but the results for n = 2, with the model modified for the estimation equations, showed some interesting features of the process. The amount of data was insufficient for a definite conclusion, but indications are that the process is second-order stochastic. The time-varying model has been shown to be feasible; however, the resultant sequences {Φ̂_k}, {Q̂_k} and {R̂_k} of the model parameters have not been studied for physical significance. The capability of the model to forecast the future values of the process is noteworthy.

CHAPTER 8

MODEL APPLIED TO THE N PROCESS

Synopsis

8.1 Introduction
8.2 N Process Sample Functions
    Shipping noise
    Biological noise
8.3 One-State Model, n = 1
    Coefficients in the estimators
    Initial conditions
    Filtered process
    Performance results
8.4 Comments on the N Process

8.1 Introduction

The N process, as described in Section 6.4.2, is a broadband noise power measurement expressed in db. It is the output of the N portion of the Dual Comb Processing in Fig. 6.6. Unlike for the T process, no ready-made demonstration is known to the author to show that the N process is Gauss-Markov. The argument used for the T process is not applicable, because the process represented by i N_x + j N_y is not Gauss-Markov. However, the N process is assumed Gaussian on the grounds that it is a time-average power measurement of the process i N_x + j N_y, which is itself the effect of a multitude of macroscopic causes. Concerning the Markov condition, a plausibility argument is that the N process samples represent a physical phenomenon and that these samples are close together in time. The physical phenomenon is called ambient noise and has some finite bandwidth; the Markov-n condition is satisfied for some n. The model for n = 1 is used below and evaluated.

8.2 N Process Sample Functions

The N process is illustrated by the sample functions in Fig. 8.1; they are over the same time intervals as in Fig. 7.1. Observe that the process is somewhat over-sampled and that it has a much larger variance in comparison to the T process. Very interesting aspects of this process are the extremely large excursions from the mean value.

Fig. 8.1. N process sample function (discrete time k = 700 to 1000 on the abscissa)

These high noise power "pulses" are attributed to shipping noise and biological noise; their duration is typically 15 to 20 minutes.

Shipping noise is a very strong contributing factor to the ambient noise in the Straits of Florida. When a ship is close to the (bottom-mounted) receiving hydrophone, or when propagation conditions are such as to allow distant shipping to be heard, the noise level rises sharply, only to fall as the condition fades. The Dual Comb Processing has an analog prefilter with a bandwidth of about 100 Hz centered at 420 Hz, and this is in the high, attenuated portion of the shipping noise spectrum (Ref. 21). Low frequencies are much more predominant, so by listening to the total band, shipping can be identified. Positive, visual identifications were not made.

Biological noise is also a strong contributor. The swimming action of schools of fish, the feeding noises of masses of aquatic animals and the sounds emitted by some species are examples of noise sources. Usually, such sources just contribute to the ambient noise in the ocean, but occasionally a source is near the sensitive hydrophone, or propagation conditions enhance a far-field source, so that a large increase in noise level results.

There are several other factors involved. Weather conditions affect the noise level directly, and propagation conditions change due to surface effects. Strong correlations between wave heights and noise level have been reported under various conditions by many

researchers, as documented by Urick (Ref. 21). There are also micro-tremors in the ocean floor, and major and minor earthquakes. This list is not intended as a catalog of noise sources, but rather as an indication of the ambient and transient sources contributing to ocean noise, which the N process measures in db.

8.3 One-State Model, n = 1

Only the simple one-state model, for which all equations are scalar, will be studied. Some of the comments made in Section 7.3 about this model for the T process apply for the N process as well.

8.3.1 Coefficients in the Estimators. The model uses the estimators of Model C, in Fig. 4.1, so that parameters for the nonlinear processor and the coefficients c_{k-j}, d_{k-j}, e_{k-j} must be chosen. The limiting values in the nonlinear processor are chosen as ±4.0 db about the running average of the filtered data; inspection of the sample functions shows that this processor will eliminate most of the large sample values attributed to shipping or biological noise. It is not a purpose of this model to include such effects, and their elimination is essential for the proper functioning of the estimators. The sample functions at the output of the nonlinear processor are illustrated in Fig. 8.2.
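A sketch of such a nonlinear processor is given below, treating it as the limiter defined in Appendix A with its limits set ±4.0 db about a running average. The exact processor used to produce Fig. 8.2 may differ in detail, and the names here are illustrative.

```python
import numpy as np

def limit_observations(y, running_avg, band=4.0):
    """Clamp each observation to within `band` db of the running
    average, suppressing shipping and biological noise 'pulses'."""
    y = np.asarray(y, dtype=float)
    avg = np.asarray(running_avg, dtype=float)
    return np.clip(y, avg - band, avg + band)
```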

Fig. 8.2. Output of the nonlinear processor (discrete time k on the abscissa)

The coefficients are selected as in Section 7.3.1, as powers of r < 1. The nominal choices of r_φ, r_0, r_1 are made intuitively and by inspection of the sample functions. For this study, Φ_k is assumed steady, and a high value of r_φ = .99 is taken. In Section 7.3.1, r_0 was chosen to put the noise statistic B_{0,k} in a confidence interval with level .80; r_0 = .989 was the result. Here, to allow for some time-variation in B_{0,k}, but at the expense of a poorer estimate, r_0 = .95 is the selected value. As previously, the ad hoc choice for r_1 is to have a sample size of 5 times that for r_0; r_1 = .99 is the result.

8.3.2 Initial Conditions. As discussed in Section 7.3.2, initial values for only Φ, Q, R are needed. The state transition matrix Φ is just a scalar and has the meaning of a correlation coefficient. An initial value of

Φ_0 = Φ_1 = Φ_2 = .80

is assumed, because the N process is not as correlated as the T process. The observations y_k of the N process in Fig. 8.2 have an approximate variance of 4 db². If the observation noise variance is given an initial value of R_0 = R_1 = R_2 = 1.0, this allows a variance of 3 db² for the random variable x_0: Λ_0 = 3.0. Then from Eq. 3.45,

Q_1 = Q_2 = (1 - Φ_0²) Λ_0 = 1.08

8.3.3 Filtered Process. The filtered process x_{k|k} is the output of the Kalman filter, and is illustrated in Fig. 8.3 for these coefficients and initial conditions; the parameters Φ, Q and R were held fixed. Consequently, the filter gain A is constant and has a value A = .593 computed by the equations. It is apparent that this filter does much smoothing. Figure 8.4 shows the results if the same initial conditions are applied, but in this instance the parameters Φ_k, Q_k and R_k are estimated and time-varying. There is not so much smoothing, because A_k has an average value of about .85. The plots of Φ̂_k, Q̂_k and R̂_k are given in Fig. 8.5. Once again, the estimate for R_k seems to be low; selecting a constant value R_k = 1.0 and repeating produces the Φ̂_k and Q̂_k in Fig. 8.6.

8.3.4 Performance Results. The prediction filter produces the forecasts x_{k+j|k} for the process x given observations up to time k. The noise process N is being modeled with a model of order one (n = 1), so that the forecasts are exponentials decaying to the running average, as for the T process n = 1 model. The results are illustrated in Fig. 8.7. Due to frequent shipping and biological noises, the performance indices P_k and JN_k apparently convey little information about model performance.
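The constant gain A = .593 quoted above follows from the time-invariant parameters alone; iterating the scalar variance and gain equations from the steady-state initial variance Λ_0 = 3.0 reproduces it. (A sketch under the standard scalar Kalman conventions with H = 1.)

```python
phi, Q, R = 0.80, 1.08, 1.0        # the fixed N-process parameters
P = Q / (1.0 - phi**2)             # = 3.0, the initial variance Lambda_0
for _ in range(50):                # iterate to steady state
    M = phi**2 * P + Q             # prediction covariance
    A = M / (M + R)                # filter gain
    P = (1.0 - A) * M
print(round(A, 3))                 # -> 0.593
```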

Fig. 8.3. N process: estimate x_{k|k} for n = 1, time-invariant model (1-r_φ = .01, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = Φ_k = .80, Q_0 = Q_k = 1.08, R_0 = R_k = 1.0)

Fig. 8.4. N process: estimate x_{k|k} for n = 1, time-varying model (1-r_φ = .01, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .80, Q_0 = 1.08, R_0 = 1.0)

Fig. 8.5. Model parameters Φ̂_k, Q̂_k, R̂_k (1-r_φ = .01, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .80, Q_0 = 1.08, R_0 = 1.0)

Fig. 8.6. Model parameters Φ̂_k, Q̂_k with R_k = 1.0 (mode = 1; 1-r_φ = .01, 1-r_0 = .05, 1-r_1 = .01, Φ_0 = .80, Q_0 = 1.08, R_0 = R_k = 1.0)

Fig. 8.7. N process: forecasts x_{k+j|k} for n = 1 (1-r_φ = .01, 1-r_0 = .05, Φ_0 = .80, Q_0 = 1.08, R_0 = 1.0)

8.4 Comments on the N Process

Having applied the model to the N process and seeing the results, it is apparent that the model is valid. There is obviously correlation in the observations and, furthermore, the estimator for Φ shows that it is time-varying. The shipping and biological noise "pulses" detract from the effectiveness of the modeling of the actual ambient noise. The model does not, and could not have been intended to, model such "pulses". The absolute values of Q̂_k, R̂_k are therefore in doubt, but their relative values and their time-variations can be regarded as correct. The smoothing of the data by the filter and the short-range forecasting of the process are acceptable.

CHAPTER 9

CONCLUSION

Synopsis

9.1 Summary of Results
    Theory and methods
    Data processing
    Achieved objectives
    Contributions
9.2 Future Work

9.1 Summary of Results

The major results and contributions of this thesis are summarized. The major purpose has been to study nonstationary stochastic processes. A general linear model has been given and, under the Gauss-Markov condition, viable solutions to the filtering and prediction problems are well known. Since such a linear model has parameters which are usually unknown, a set of estimators was devised to obtain these parameters from the data sequence. The data processing parts of the thesis include a general discussion of a class of linear filters and processors and a descriptive flow-diagram of the computer processing. The model was tested briefly with randomly generated data and applied to two real acoustics processes. The highlights of each of these items are reiterated below.

9.1.1 Theory and Methods. In this study of nonstationary stochastic processes, strong emphasis has been placed on the nonstationary properties of the physical processes which need to be investigated. Such time-variations of statistics are often exhibited by natural phenomena, and it is only the exceptional process which is adequately described as stationary. The need for a stochastic study has been amply demonstrated by the two acoustics processes: acoustic signal power, the T process, and acoustic noise power, the N process. Both are affected to varying degrees by winds, tides and

weather, and by natural and man-made noises. Fortunately, this multitude of factors does not need to be described one by one; rather, their gross result is of interest to researchers in detection theory and communication theory. This state of affairs has led many to model-building, and this work has been no exception. For compelling reasons listed in Chapter 1, models for nonstationary stochastic processes were developed and studied. For mathematical tractability, the models are linear and the processes are constrained to be Gauss-Markov. Due to the stochastic nature of the model and the time-varying parameters, the state-space approach is particularly suited to these problems, and the n'th order model is readily treated. The model equations are given in Eqs. 6.1, 6.2 and 6.3.

The filtering and prediction problems, stated in general form in Chapter 1, have viable solutions due to the three key conditions: linear, Gaussian and Markov. The driving process u_k and the observation noise v_k are assumed Gaussian white noise sequences. By linearity, the output process x_k and observation process y_k are also Gaussian, so that the conditional probability density of the process x_k is determined by the conditional mean and conditional covariance. Coupled with the Markov condition, the solutions take the form known as the Kalman or Kalman-Bucy equations, as given in Eqs. 6.4 to 6.10.

Two significant observations have been made concerning these solutions. The process estimate x_{k|k} is a weighted sum of the previous estimate x_{k-1|k-1} and the present observation y_k; the optimal weighting is determined by the system matrices Φ_k and H_k and the optimal gain A_k. This has the practical implication that the filter is quite simple and does not have a growing memory. Secondly, the prediction error covariance P_k, and hence the optimal gain A_k, are not dependent on the observations; in some applications, advance computations are possible.

An important part of modeling is the determination of the system parameter matrices, such as the state transition matrix Φ and the noise covariances Q and R for this model. For the processes studied, little information exists to determine the parameters from physical laws. Parameter estimation was adopted, and a set of consistent estimators was discussed. Simple modifications to these estimators were made in Chapter 4 to permit "tracking" of the time-varying parameters. The parameter estimation equations are given in Eqs. 6.11, 6.12 and 6.13.

9.1.2 Data Processing. Actual data was used in the experimental part of this thesis. Data acquisition, the orderly collection of considerable amounts of observations of the real world, is a very important prerequisite to a study such as this. The previous work of others (Refs. 7, 19 and 20) and the contemporary development of

the Dual Processing by T. G. Birdsall were the foundations of the acquisition of data used in Chapters 7 and 8. The Dual Processing of Fig. 6.6 uses a class of processors which were given the name MPR processors and treated at length in Chapter 5.

Two important features in the data processing should be emphasized. It was necessary to use a nonlinear processor to eliminate certain observations, such as those due to shipping in the N process study. The other feature is the use of curve fitting to obtain the vector observations y_k, as presented in Section 7.4.1. Full-dimensional observations are needed by the parameter estimators, yet the Dual Processing provides only scalar observations; the MMSE Linear Fit shown in Fig. 6.1 bridges this gap.

The model was applied to an acoustic signal process called the T process. A periodic pseudo-random signal at constant power level was transmitted from Fowey Rocks, near Miami. The received signal at the hydrophone, 7 miles east in the Straits of Florida, was cabled back to the laboratory for processing by Dual. T is the log [time-average quadratic of a processor output]. The model assumes the T process is Gauss-Markov, and the parameter estimation and filtering equations were used in the computations. The T process observations exhibit cyclic behavior, so that a second-order model is appropriate. The filtered output x_{k|k} of the Kalman filter of this model is a smoothed version of the observation

vector y_k, as was to be expected. This estimate x_{k|k} is the conditional mean of the process x_k, given all observations y_k in the past, and is optimum for any convex criterion. Furthermore, the Kalman prediction filter produces x_{k+j|k}, which is a forecast of the expected value of the process x_{k+j} given all observations y_k in the past. This forecasting ability of the model is a natural result of the state-space formulation. For example, as shown in Section 7.4.4, short-range forecasting over 20 observation intervals is apparently reasonable, and is a technique with potential applications.

The estimates of the model parameters, the state transition matrix Φ and the noise covariance matrices Q and R, are apparently adequate. No absolute criterion of judgment is available, because the true values are unknown to the experimenter. However, Monte-Carlo tests indicated acceptability of Φ̂_k and Q̂_k, although R̂_k was troublesome and needs more work.

Chapter 8 applied the scalar model to the N process, which is the noise power process produced by the Dual processing. The process contains shipping and biological noise "pulses" far greater than the ambient ocean noise. This fact and insufficient data for the process really prevent sweeping conclusions. The estimators and Kalman filter for the scalar case were operable and produced sensible results.

9.1.3 Achieved Objectives. The objectives stated in Chapter 1

have been achieved, albeit for only a special class of processes, the Gauss-Markov. The density function is a complete statistical description of a random variable or process. Since the conditional probability densities of x_k are Gaussian, knowledge of the conditional mean and conditional covariances completely solves the filtering and prediction problems. The conditional means are denoted by x_{k|k} and x_{k+j|k}, and the conditional covariances can be determined from the prediction error covariance P_k. The solutions have the intuitive appeal of being cast as filters and predictors.

An objective in this study of models for nonstationary stochastic processes is to provide information on the behavior of the model parameters. These results have been limited to the sequences of the parameters Φ̂_k, Q̂_k, R̂_k as estimated from the data. No further studies of this data have been done.

9.1.4 Contributions. Though much of the work reported here is that of others, as documented, some of the small contributions to the practical aspects of stochastic modeling are briefly described.

1. In Chapter 3, a set of steady-state initial conditions was found for the Kalman-Bucy equations. This was found useful in working with real data.

2. In Chapter 4, estimators were given which are called tracking estimators, in the hope that a proper choice of coefficients by the engineer will permit time variations of the parameters to be

followed. Also, a reasonable set of initial conditions was found for the statistics used by the estimators.

3. Several forms of digital processors were discussed. Chapter 5 is a new and unique treatment of the class of processors which have been labelled MPR processors; the phrase "multipositional recirculators" is descriptive of these processors. The nonlinear processor to eliminate shipping noise and the curve fit to extract the vector observation are examples of ad hoc designs which are very important.

4. To the author's knowledge, these applications of stochastic models are the first made in the field of ocean acoustics. Indeed, it should be pointed out that the Dual Processing data represents the first large quantity of data taken uniformly in time and made available in digital form for post-processing. The data reported on the T process and N process is but an initial step and part of more such data in a continuing study.

9.2 Future Work

One of the interesting characteristics of research is that it usually points to the need for a wider theory or the possibility of another application, or opens up new avenues to existing problems. Exceptional is the report which completely documents a field of research and leaves nothing to be done. Future work on this and related research problems is abundant.

Some of these areas have been mentioned in the body of the text, and some are offered for the reader's consideration. If some of these suggestions seem vague, I caution that they are not yet well-defined, and that is often part of the problem.

1. Concerning the oscillations in the T process in particular, a propagation model is suggested which may account for the fluctuations. Consider distant noise sources and propagation effects which cause constructive and destructive interference at the hydrophone. It is not clear that such a propagation model, incorporated in the stochastic model, could account for such changes.

2. More sophistication in observation techniques is certain to have good results for some processes. For example, if it were possible to observe the y_k vector rather than just a scalar sequence, better forecasts could be expected. Also, wind speed and wave height have been shown to correlate with noise level (Ref. 22); by increasing the observation vector dimension and the model order, better estimators of the noise levels should result.

3. This obviously points to the need to study other processes (such as wind speed data), both by themselves and jointly. The theory exists but the application is lacking.

4. Significant improvements are expected in modeling by finding better estimators. The shortcomings of the estimator R̂_k have been discussed. Perhaps a higher order "filter" in the statistics

is required.

5. Asking for more dimensions and better estimators increases the computational needs. For a study of computational aspects, in particular for large-dimensional models, the researcher should consult the thesis by Pentecost (Ref. 14).

6. For sensitivity studies, concerning the effect of estimates on the Kalman filter, see the work reported by Bucy and Joseph (Ref. 4).

7. Having obtained better models and sufficient data on the process, extensive studies on long-range forecasting and its reliability seem necessary before this concept can be correctly applied.

As for other areas of application, there come to mind weather system studies and forecasting, ionospheric propagation studies, and the field of economic systems. Considerable data is currently being gathered on these topics and some modeling has been done. For other applications, see the text by Bucy and Joseph for examples.

APPENDIX A

A TECHNICAL GLOSSARY

This appendix collects the definitions of all the technical terms which may be unfamiliar or new to the reader. All underlined terms in the main body of the text are defined here. The starting point for these definitions is a knowledge of terms in probability theory. The probability space is denoted by (Ω, F, P), where Ω is an abstract set, called the sample space, F is a σ-field of subsets of Ω, and P is a probability measure on F. Let G be a sub-σ-field of F, let B be the σ-field on the real line, and let B(S) be the σ-field generated by the space S.

Almost surely
A condition is said to hold almost surely if the set on which it does not hold has probability zero.

Chi-squared distribution
The chi-squared (chi-square) distribution is a particular form of a gamma distribution with α = n/2 - 1 and β = 2,

f(x; n) = x^{(n/2)-1} e^{-x/2} / ([(n/2) - 1]! 2^{n/2})

where n is called the number of degrees of freedom.

Conditional expectation
The conditional expectation E{x|G} is any random variable measurable (Ω, G) such that

∫_D E{x|G} dP = ∫_D x dP,   all D ∈ G.

Any two versions differ on a set of probability zero. If G = σ(y_1, y_2, ...), denote E{x|G} = E{x|y_1, y_2, ...}. Let y be a random variable on (R, B); then for λ ∈ R, the conditional expectation φ(λ) = E{x|y = λ} is any random variable on (R, B), where P_y(B) = P(y ∈ B), satisfying

∫_B E{x|y = λ} dP_y = ∫_{y^{-1}(B)} x dP,   all B ∈ B.

Conditional probability
Let G be a sub-σ-field of F. The conditional probability of A ∈ F given G is a random variable P(A|G) on (Ω, G) satisfying

∫_D P(A|G) dP = P(A ∩ D),   all D ∈ G.

Consistent estimator
An estimator is said to be consistent if the sequence of estimates converges in probability:

ê_n → e in probability.

Convergence in mean-square
The sequence x_n tends to x in the mean-square sense if

E{|x_n - x|²} → 0   for n → ∞.

This is also called l.i.m. convergence and written l.i.m. x_n = x.

Convergence in probability
The sequence x_n tends to x in probability if for all λ > 0,

P{|x_n - x| > λ} → 0   for n → ∞.

This is also called stochastic convergence or convergence in measure.

Convergence with probability 1
The sequence x_n is said to converge to x with probability 1 if

P{x_n → x} = 1   for n → ∞.

This is also called convergence almost everywhere (a.e.).

Convolution
The convolution of functions f_1(x) and f_2(x) is denoted by f(x) = f_1(x) * f_2(x) and is defined by

f(x) = ∫_{-∞}^{∞} f_1(y) f_2(x - y) dy = ∫_{-∞}^{∞} f_1(x - y) f_2(y) dy.

Delay line
A delay line is a physical device which accepts input (voltage waveforms or numbers) and delivers as output, at a fixed time later, the value of the input.

Ergodicity
Let T be measure-preserving on (Ω, F, P). T is called ergodic if for every invariant set A, P(A) = 0 or 1.

Gamma distribution
A random variable x is distributed as the gamma distribution if its density is

f(x; α, β) = x^α e^{-x/β} / (α! β^{α+1})   for 0 < x < ∞
f(x; α, β) = 0   elsewhere

where β > 0 and α > -1.

Gaussian process
A process x_k (with random variables x_0, x_1, x_2, ...) is called Gaussian if every finite subset of variables has a joint Gaussian distribution.

Generalized inverse
A matrix A# is called the generalized inverse of a rectangular matrix A if it satisfies

i) A A# A = A
ii) A# A A# = A#
iii) (A A#)' = A A#
iv) (A# A)' = A# A

Input-output stable
A system is said to be input-output stable if bounded inputs cause only bounded outputs.

Joint Gaussian distribution
The array Y = (y_1, ..., y_n), E{y_i} = 0, has a joint Gaussian (or joint normal) distribution with zero means if there are m independent random variables X = (x_1, ..., x_m), each with the Gaussian distribution with zero mean and unit variance (called N(0, 1)), and if an m x n matrix A exists such that Y = XA.

Limiter
A limiter is a transformation of a process such that the (output) range is bounded above and below:

y = x_a   for x ≥ x_a
y = x     for x_b < x < x_a
y = x_b   for x ≤ x_b

It is often illustrated by the diagram shown and is sometimes called a clipper.

Fig. A.1. A limiter

Linear function
A function f(.) is linear if

f(ax + by) = a f(x) + b f(y)

for all a, b, x and y in its domain.

Markov process
A process x(ω, t), ω ∈ Ω, t ≥ 0, is called Markov with state space S if x(t) ∈ S, t ≥ 0, and for any B ∈ B(S), t and

τ > 0,

P(x(t+τ) ∈ B | x(s), s ≤ t) = P(x(t+τ) ∈ B | x(t))   a.s.

Matrix inverse
A square matrix A has an inverse, denoted by A^{-1}, if its left and right inverses exist, so that

A^{-1} A = I = A A^{-1}.

Matrix Riccati equation
The matrix Riccati equation for continuous time is the matrix differential equation

d P(t)/dt = F(t) P(t) + P(t) F'(t) + Q(t),   P(t_0) = Λ_0.

Measure-preserving
A measurable transformation T on Ω → Ω is called measure-preserving if P(T^{-1}A) = P(A), all A ∈ F. A set A ∈ F is invariant if T^{-1}A = A for such T; the class of invariant sets is a σ-field.

Norm of a matrix
Let R^{p×p} be the space of p x p matrices with real entries. The Euclidean norm of G ∈ R^{p×p} is denoted and defined by

||G|| = [tr(GG')]^{1/2}

Orthogonal random variable
The random variables x_1, x_2, ..., x_n are orthogonal if

E{x_i x_j} = 0   for all i ≠ j.

Periodic analog of sinc x
The function with parameter M

sinc(x; M) = sin(πx) / (M sin(πx/M))

is called the periodic analog of sinc x.

Positive, nonnegative definite
A square matrix A is called positive definite if x'Ax > 0 for all nonzero column vectors x. It is called nonnegative definite if x'Ax ≥ 0 (where the distinction is the allowed 0 value).

Pseudo-inverse
A matrix A+ is called a pseudo-inverse of a rectangular (not necessarily square) matrix A if it satisfies the relation

A A+ A = A.

Random sample
Let the variables x_1, x_2, ..., x_n be independent, identically distributed with density f(x). Then x_1, ..., x_n is a random sample of size n from the population with density f(x).

Random variable
A random variable r(ω) is a measurable function defined on the space (Ω, F).

Sinc x
The sinc function is defined by

sinc x = sin(πx) / (πx).

Spectral radius of A
Given the square real matrix A, the spectral radius of A is denoted by ρ(A) and is defined as

ρ(A) = lim_{n→∞} ||A^n||^{1/n}.

State equation
The state equation is the vector-matrix first-order differential equation

d x(t)/dt = F(t) x(t) + w(t)

where x is called the state vector and F is the coefficient matrix.

State of a system
The state of a system is the minimal amount of information at t_0 about the effects of past inputs necessary to describe completely the output for t ≥ t_0. This does not include the description of the input, but when the input is included, the result is an extended state vector.

State transition matrix
The state transition matrix is usually denoted by Φ(t, t_0) and is formally defined as the solution to the matrix differential equation

d Φ(t, t_0)/dt = F(t) Φ(t, t_0),   Φ(t_0, t_0) = I.

Strict-sense stationary process
A strict-sense stationary process has all its statistics unaffected by a shift in the time origin.

Stochastic process
A stochastic process is a family of random variables r(ω, t) indexed by a set, t ∈ T. (t is often time.)

Strongly consistent estimator
An estimator is said to be strongly consistent if the sequence

of estimates converges with probability 1:

ê_n → e almost everywhere (a.e.).

Symmetric matrix
A square matrix A is called symmetric if a_{ij} = a_{ji}.

Tapped delay line
A sequence of delay lines connected in series is a tapped delay line.

Tracking estimator
The name tracking estimator has been given in this study to the statistic

Φ̂_k = [Σ_{j=3}^k c_{k-j} y_j y_{j-2}'] [Σ_{j=3}^k c_{k-j} y_{j-1} y_{j-2}']^+

because the coefficients c_{k-j} are chosen such that slow variations in the parameter Φ_k can be followed.

Uncorrelated random variable
The random variables x_1, x_2, ..., x_n are uncorrelated if

E{x_i x_j} = E{x_i} E{x_j}   for all i ≠ j.

Wide-sense stationary process
A process with constant expected value and an autocorrelation which depends only on the time difference is called wide-sense stationary.

APPENDIX B

EXPECTATIONS OF PRODUCTS WITH Φ̂ AS FACTORS

B.1 Required Expectations

In Chapter 4, the formulation of the estimators for Φ, R and Q requires the expectations of terms involving Φ̂_k and Φ̂_{k-1}, as shown in Table B.1.

 E{Φ̂_k v_{k-1} v_{k-1}'}      Φ_k R_{k-1}                    see Eq. B.20   (B.1)
 E{Φ̂_k v_{k-1} v_{k-2}'}      Φ_{k-1} R_{k-2} S_{12,k-1}^+                  (B.2)
 E{Φ̂_k v_{k-1} u_{k-1}'}      0                                             (B.3)
 E{Φ̂_k v_{k-1} u_{k-2}'}      0                                             (B.4)
 E{Φ̂_k u_{k-1} u_{k-2}'}      0                                             (B.5)
 E{Φ̂_{k-1} v_{k-2} u_k'}      0                                             (B.6)
 E{Φ̂_{k-1} v_{k-2} v_k'}      0                                             (B.7)
 E{Φ̂_k v_{k-1} v_{k-2}' v_k}  0                                             (B.8)

Table B.1. Required expectations

A first approximation is given in the second column, which is based on the assumption that Φ̂_k is independent of the indicated

products. This should be a good approximation, because Φ̂_k → Φ and Φ is the system transition matrix, which is assumed independent of x, u and v in the linear model. Also, E{Φ̂_k} = Φ_k has been used. These are the conditions used in formulating Model B. However, the estimate Φ̂_k is certainly dependent on v_j and u_j for j < k. To display this slight dependence, the following recursion relations for Φ̂_k are required; then the above expectations are found and recorded in column three.

B.2 Analysis of the Estimator for Φ

Recall that the estimator for Φ_k is

Φ̂_k = [Σ_{j=3}^k y_j y_{j-2}'] [Σ_{j=3}^k y_{j-1} y_{j-2}']^+,   k ≥ 3   (4.12)

For notational simplicity, let

Δ_j = y_j y_{j-2}'  and  S_{02,k} = Σ_{j=3}^k Δ_j   (B.9)

δ_j = y_{j-1} y_{j-2}'  and  S_{12,k} = Σ_{j=3}^k δ_j   (B.10)

These are n x n matrices, and A^+ is the pseudo-inverse of A. Then

Φ̂_k = S_{02,k} S_{12,k}^+

Φ̂_k = S_{02,k} S_{12,k}^+ = (S_{02,k-1} + Δ_k)(S_{12,k-1} + δ_k)^+   (B.11)

Φ̂_k (I + δ_k S_{12,k-1}^+) = Φ̂_{k-1} + Δ_k S_{12,k-1}^+   (B.12)

Φ̂_k = (Φ̂_{k-1} + Δ_k S_{12,k-1}^+)(I + δ_k S_{12,k-1}^+)^{-1}   (B.13)

Φ̂_k ≈ (Φ̂_{k-1} + Δ_k S_{12,k-1}^+)(I - δ_k S_{12,k-1}^+)   (B.14)

Φ̂_k ≈ Φ̂_{k-1} + (Δ_k - Φ̂_{k-1} δ_k) S_{12,k-1}^+   (B.15)

In Eq. B.14 the approximation (I + b)^{-1} ≈ I - b is valid for "small" b; this is the matrix analog of 1/(1 + b) ≈ 1 - b. Equation B.15 results by retaining only first-order terms in Δ and δ. By letting k → k - 1 in Eq. B.15,

Φ̂_{k-1} ≈ Φ̂_{k-2} + (Δ_{k-1} - Φ̂_{k-2} δ_{k-1}) S_{12,k-2}^+   (B.16)

so that

Φ̂_k ≈ Φ̂_{k-2} + (Δ_{k-1} - Φ̂_{k-2} δ_{k-1}) S_{12,k-2}^+ + (Δ_k - Φ̂_{k-1} δ_k) S_{12,k-1}^+   (B.17)

B.3 Computations

A sample computation of the result in (B.1) will be shown. Φ̂_{k-2} and S_{12,k-2} are independent of v_{k-1}. Hence,

E{Φ̂_k v_{k-1} v_{k-1}'} = Φ_{k-2} R_{k-1} + (Φ_{k-1} Φ_{k-2} Σ_{x,k-3} - Φ_{k-2} Φ_{k-2} Σ_{x,k-3}) S_{12,k-2}^+ R_{k-1} + (Φ_k Φ_{k-1} Σ_{x,k-2} - Φ_{k-1} Φ_{k-1} Σ_{x,k-2}) S_{12,k-1}^+ R_{k-1}   (B.18)

These results are obtained by assuming the differences in Eq. B.17 are independent of v_{k-1}. Any error is reduced by the matrix S_{12}^+, which is small. Also, E{Φ̂_k} = Φ_k and E{S_{12,k-2}} have been used. Now note that

E{S_{12,k}} = Σ_{j=3}^k Φ_{j-1} Σ_{x,j-2}

Therefore,

S_{12,k}^+ ≈ Σ_{x,k-2}^{-1} Φ_{k-1}^{-1}   (B.19)

Finally,

E{Φ̂_k v_{k-1} v_{k-1}'} = [Φ_{k-2} + (Φ_{k-1} - Φ_{k-2}) Φ_{k-2} Σ_{x,k-3} Σ_{x,k-4}^{-1} Φ_{k-3}^{-1} + (Φ_k - Φ_{k-1}) Φ_{k-1} Σ_{x,k-2} Σ_{x,k-3}^{-1} Φ_{k-2}^{-1}] R_{k-1}   (B.20)

A brief discussion of this result seems in order. The value in column one (B.1) is Φ_k R_{k-1}; here Φ_k is replaced by the square

bracket in (B.20). If Φ is stationary, the difference terms are 0 and Φ_k = Φ_{k-2}. If Φ is increasing (Φ_{k-2} < Φ_{k-1} < Φ_k) or decreasing (Φ_{k-2} > Φ_{k-1} > Φ_k), the new result is between Φ_{k-2} and Φ_k.

The result in (B.20) depends on Σ_x, which is undesirable from the modeling point of view. For Σ_x time-invariant,

Σ_{x,j} = Σ_{x,j-1}   for all j   (B.21)

so that

E{Φ̂_k v_{k-1} v_{k-1}'} = [Φ_{k-2} + (Φ_{k-1} - Φ_{k-2}) Φ_{k-2} Φ_{k-3}^{-1} + (Φ_k - Φ_{k-1}) Φ_{k-1} Φ_{k-2}^{-1}] R_{k-1}

and Σ_x is not involved. When Σ_x is time-varying, this dependence on Σ_x remains, but only past values are required. Furthermore, inspection of (B.20) shows that large changes in Σ_x are needed before the effect begins to show. Other values in the table (B.1 to B.8) are obtained in the same manner.

APPENDIX C

EXPECTATIONS OF QUARTIC TERMS FOR THE VECTOR CASE

The need arises for expectations of quartic terms. For the scalar case, n = 1,

E{x_1 x_2 x_3 x_4} = E{x_1 x_2} E{x_3 x_4} + E{x_1 x_3} E{x_2 x_4} + E{x_1 x_4} E{x_2 x_3}   (C.1)

if the x_i are jointly Gaussian and have mean 0. The extension to the vector case is given here:

E{x_1 x_2' x_3 x_4'} = E{x_1 x_2'} E{x_3 x_4'} + E{x_1 x_3'} E{x_2 x_4'} + E{x_1 x_4'} E{x_2 x_3'}

if the x_i are n-vectors (column), have mean 0, and are jointly Gaussian.

Proof: The proof will use the following general result.

∂^r φ(0, ..., 0) / (∂w_1^{k_1} ∂w_2^{k_2} ... ∂w_m^{k_m}) = j^r E{x_1^{k_1} x_2^{k_2} ... x_m^{k_m}}

where r = k_1 + k_2 + ... + k_m and φ(w_1, w_2, ..., w_m) is the characteristic function.†

For the jointly Gaussian x_1, x_2, x_3, x_4 vectors with mean 0, the characteristic function is

φ(w_1, w_2, w_3, w_4) = exp{ -(1/2) (w_1', w_2', w_3', w_4') R (w_1; w_2; w_3; w_4) }

and R = [R_ij], R_ij = E{x_i x_j'}; R_ij is n x n, and R is 4n x 4n in dimension.

∂φ/∂w_4 |_{w_4=0} = -(R_41 w_1 + R_42 w_2 + R_43 w_3) φ(w_1, w_2, w_3, 0)

∂²φ/∂w_3 ∂w_4 |_{w_3=w_4=0} = [(R_41 w_1 + R_42 w_2)(R_31 w_1 + R_32 w_2)' - R_43] φ(w_1, w_2, 0, 0)

† Papoulis, page 244.

∂³φ/∂w_2 ∂w_3 ∂w_4 |_{w_2=w_3=w_4=0} = [(R_41 w_1 w_1' R_13 - R_43) R_21 w_1 + R_42 R_31 w_1 + R_32 R_41 w_1] φ(w_1, 0, 0, 0)

∂⁴φ/∂w_1 ∂w_2 ∂w_3 ∂w_4 |_{w=0} = (R_43 R_21 + R_42 R_31 + R_32 R_41)' = R_12 R_34 + R_13 R_24 + R_14 R_23

which is the result claimed.
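The scalar identity (C.1) is easy to spot-check numerically. The sketch below draws a zero-mean Gaussian quadruple with an arbitrarily chosen covariance and compares the sample fourth moment with the sum of pair products.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[1.00, 0.40, 0.25, 0.15],      # an arbitrary positive
              [0.40, 1.00, 0.30, 0.20],      # definite covariance
              [0.25, 0.30, 1.00, 0.10],
              [0.15, 0.20, 0.10, 1.00]])
x = rng.multivariate_normal(np.zeros(4), C, size=1_000_000)
lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
rhs = C[0, 1]*C[2, 3] + C[0, 2]*C[1, 3] + C[0, 3]*C[1, 2]   # Eq. C.1
print(lhs, rhs)   # agree to Monte-Carlo accuracy (rhs = 0.135)
```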

APPENDIX D

EXPRESSIONS FOR THE OBSERVATION VECTOR

In the development of the model equations and estimators for the model parameters Φ, Q and R, expressions for y_k in terms of previous observations are needed. For ready reference, these results are listed below. They are merely repeated applications of the model equations:

x_{k+1} = Φ_{k+1} x_k + u_{k+1}   (4.1)

y_k = x_k + v_k   (4.2)

y_{k-4} = x_{k-4} + v_{k-4}

y_{k-3} = x_{k-3} + v_{k-3}
        = Φ_{k-3} x_{k-4} + u_{k-3} + v_{k-3}

y_{k-2} = x_{k-2} + v_{k-2}
        = Φ_{k-2} x_{k-3} + u_{k-2} + v_{k-2}
        = Φ_{k-2} Φ_{k-3} x_{k-4} + Φ_{k-2} u_{k-3} + u_{k-2} + v_{k-2}

y_{k-1} = x_{k-1} + v_{k-1}
        = Φ_{k-1} x_{k-2} + u_{k-1} + v_{k-1}
        = Φ_{k-1} Φ_{k-2} x_{k-3} + Φ_{k-1} u_{k-2} + u_{k-1} + v_{k-1}
        = Φ_{k-1} Φ_{k-2} Φ_{k-3} x_{k-4} + Φ_{k-1} Φ_{k-2} u_{k-3} + Φ_{k-1} u_{k-2} + u_{k-1} + v_{k-1}

y_k = x_k + v_k
    = Φ_k x_{k-1} + u_k + v_k
    = Φ_k Φ_{k-1} x_{k-2} + Φ_k u_{k-1} + u_k + v_k
    = Φ_k Φ_{k-1} Φ_{k-2} x_{k-3} + Φ_k Φ_{k-1} u_{k-2} + Φ_k u_{k-1} + u_k + v_k
    = Φ_k Φ_{k-1} Φ_{k-2} Φ_{k-3} x_{k-4} + Φ_k Φ_{k-1} Φ_{k-2} u_{k-3} + Φ_k Φ_{k-1} u_{k-2} + Φ_k u_{k-1} + u_k + v_k
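These expansions can be verified mechanically; the sketch below generates one realization of the scalar model and checks the last expansion of y_k against the recursion (all names illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
phi = rng.uniform(0.5, 1.0, size=5)    # phi at times k-4, ..., k (index 0..4)
u = rng.normal(size=5)
v = rng.normal(size=5)

x = np.empty(5)
x[0] = rng.normal()                    # x_{k-4}
for j in range(1, 5):                  # model equation (4.1)
    x[j] = phi[j] * x[j-1] + u[j]
y = x + v                              # observation equation (4.2)

# Expansion of y_k back to x_{k-4} (final line above):
yk = (phi[4]*phi[3]*phi[2]*phi[1]*x[0]
      + phi[4]*phi[3]*phi[2]*u[1]
      + phi[4]*phi[3]*u[2]
      + phi[4]*u[3] + u[4] + v[4])
print(np.isclose(yk, y[4]))            # True
```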

APPENDIX E

THE STEADY-STATE COVARIANCE Λ

The need exists for a symmetric positive definite solution Λ of the matrix equation

Λ = Φ Λ Φ' + Q

Φ is a state-transition matrix and Q is a covariance matrix. For the scalar case, the solution is evidently

Λ = Q / (1 - Φ²)

which is positive if Φ² < 1. For the matrix case, a similar condition is required, but the solution is not so readily expressed.

Proposition: Let Φ have spectral radius† ρ(Φ) < 1 and let Q be symmetric positive definite. Then the equation Λ = Φ Λ Φ' + Q has the solution

Λ = Σ_{i=0}^∞ Φ^(i) Q Φ^(i)'

† See Appendix A for the definition.

where Φ^(i) = Φ Φ ... Φ is the i-fold product. The proof is by substitution of the finite sums

Λ^(k) = Σ_{i=0}^k Φ^(i) Q Φ^(i)'

and then letting k → ∞. Existence and uniqueness have been shown. Often, the actual computation of Λ can be accomplished with a small number of iterations k, because of the condition on Φ. Furthermore, convergence of the solution is accelerated by actually "solving" the equation recursively:

Λ^(k) = Φ Λ^(k-1) Φ' + Q,   k ≥ 1
Λ^(0) = Q

A good initial approximation greatly reduces the required iterations.
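A direct implementation of this recursion (a sketch; the stopping rule is left to the reader):

```python
import numpy as np

def steady_state_cov(Phi, Q, iters=200):
    """Iterate Lambda^(k) = Phi Lambda^(k-1) Phi' + Q with Lambda^(0) = Q."""
    L = np.array(Q, dtype=float)
    for _ in range(iters):
        L = Phi @ L @ Phi.T + Q
    return L

# Scalar check against Lambda = Q / (1 - Phi^2):
Phi = np.array([[0.80]])
Q = np.array([[1.08]])
print(steady_state_cov(Phi, Q))   # -> [[3.0]], i.e., 1.08 / (1 - 0.64)
```

With Φ = .80 and Q = 1.08 this returns Λ = 3.0, the value used for Λ_0 in Chapter 8.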

REFERENCES

1. W. N. Anderson, Jr., G. B. Kleindorfer, P. R. Kleindorfer and M. B. Woodroofe, Consistent Estimates of the Parameters of a Linear System, Department of Statistics, Carnegie-Mellon University, Pittsburgh, Pennsylvania, August 1968.

2. T. G. Birdsall, Adaptive Detection Receivers and Reproducing Densities, Cooley Electronics Laboratory Technical Report No. 194, The University of Michigan, Ann Arbor, Michigan, July 1968.

3. L. Breiman, Probability, Addison-Wesley Publishing Company, Menlo Park, California, January 1968.

4. R. S. Bucy and P. D. Joseph, Filtering for Stochastic Processes with Applications to Guidance, Interscience Publishers, New York, May 1968.

5. J. L. Doob, Stochastic Processes, John Wiley and Sons, New York, 1953.

6. M. Abramowitz and I. A. Stegun (Eds.), Handbook of Mathematical Functions, National Bureau of Standards, AMS 55, Washington, D. C., 1964.

7. N. Hatter, Triband: The Three-Band Filter for the Continuing MIMI Experiment, Cooley Electronics Laboratory Technical Report No. 201, The University of Michigan, Ann Arbor, Michigan, February 1970.

8. D. Jaarsma, The Theory of Signal Detectability: Bayesian Philosophy, Classical Statistics, and the Composite Hypothesis, Cooley Electronics Laboratory Technical Report No. 200, The University of Michigan, Ann Arbor, Michigan, February 1970.

9. R. E. Kalman, "New Methods in Wiener Filtering Theory," Proc. 1st Symp. on Engrg. Applications of Random Function Theory and Probability, J. L. Bogdanoff and F. Kozin, Eds., New York: Wiley, 1963.

10. R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," J. Basic Eng., A.S.M.E., 82 (1960), pp. 35-45.

11. R. E. Kalman and R. S. Bucy, "New Results in Linear Filtering and Prediction Theory," J. Basic Eng., A.S.M.E., 83 (1961), pp. 95-108.

12. P. P. Nuspl, "Probability Density Function of Time-Average Power and Log (Power)," to appear as a Letter-to-the-Editor, J. Acoustical Society of America, July 1970.

13. A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1965.

14. E. E. Pentecost, "Synthesis of Computationally Efficient Sequential Linear Estimators," Thesis, University of California at Los Angeles, 1965.

15. M. I. Schwartz, "Distribution of the Time-Average Power of a Gaussian Process," IEEE Trans. on Information Theory, IT-16, No. 1 (1970), pp. 17-26.

16. R. L. Spooner, Theory of Signal Detectability: Extension to the Double Composite Hypothesis Situation, Cooley Electronics Laboratory Technical Report No. 192, The University of Michigan, Ann Arbor, Michigan, April 1968.

17. J. C. Steinberg and T. G. Birdsall, "Underwater Sound Propagation in the Straits of Florida," J. Acoustical Society of America, Vol. 39, No. 2, February 1966, pp. 301-315.

18. I. Tolstoy and C. S. Clay, Ocean Acoustics: Theory and Experiment in Underwater Sound, McGraw-Hill, New York, 1966.

19. R. Unger and R. Veenkant, Underwater Sound Propagation in the Straits of Florida: The MIMI Experiment of 3 and 4 February 1965, Cooley Electronics Laboratory Technical Report No. 183, The University of Michigan, Ann Arbor, Michigan, May 1967.

20. R. Unger and R. Veenkant, Underwater Sound Propagation in the Straits of Florida: The MIMI Continuous and Sampled Receptions of 11, 12, and 13 August 1966, Cooley Electronics Laboratory Technical Report No. 186, The University of Michigan, Ann Arbor, Michigan, June 1967.

21. R. J. Urick, Principles of Underwater Sound for Engineers, McGraw-Hill, New York, 1967.

22. H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, John Wiley and Sons, Inc., New York, 1968.

23. L. A. Zadeh and C. A. Desoer, Linear System Theory: The State Space Approach, McGraw-Hill, New York, 1963.

DISTRIBUTION LIST
(Number of copies, where specified, is shown in parentheses.)

Office of Naval Research, Codes 468, 102-OS, and 480 (1 copy each), Navy Department, Washington, D. C. 20360
Director, Naval Research Laboratory, Technical Information Division, Washington, D. C. 20390 (6)
Director, Office of Naval Research Branch Office, 1030 East Green Street, Pasadena, California 91101 (1)
Office of Naval Research, San Francisco Area Office, 1076 Mission Street, San Francisco, California 94103
Director, Office of Naval Research Branch Office, 495 Summer Street, Boston, Massachusetts 02210 (1)
Office of Naval Research, New York Area Office, 207 West 24th Street, New York, New York 10011
Director, Office of Naval Research Branch Office, 219 South Dearborn Street, Chicago, Illinois 60604 (1)
Commanding Officer, Office of Naval Research Branch Office, Box 38, FPO New York 09510
Commander, Naval Ordnance Laboratory, Acoustics Division, White Oak, Silver Spring, Maryland 20907, Attn: Dr. Zaka Slawsky
Commanding Officer, Naval Ship Research & Development Center, Annapolis, Maryland 21401 (1)
Commander, Naval Undersea Research & Development Center, San Diego, California 92132, Attn: Dr. Dan Andrews, Mr. Henry Aurand (2)
Chief Scientist, Navy Underwater Sound Reference Division, P. O. Box 8337, Orlando, Florida 32800
Commanding Officer and Director, Navy Underwater Systems Center, Fort Trumbull, New London, Connecticut 06321 (1)
Commander, Naval Air Development Center, Johnsville, Warminster, Pennsylvania 18974
Commanding Officer and Director, Naval Ship Research and Development Center, Washington, D. C. 20007
Superintendent, Naval Postgraduate School, Monterey, California 93940
Commanding Officer & Director, Naval Ship Research & Development Center (formerly Mine Defense Lab.), Panama City, Florida 32402
Naval Underwater Weapons Research & Engineering Station, Newport, Rhode Island 02840
Superintendent, Naval Academy, Annapolis, Maryland 21401
Scientific and Technical Information Center, 4301 Suitland Road, Washington, D. C. 20390, Attn: Dr. T. Williams, Mr. E. Bissett (2)
Commander, Naval Ordnance Systems Command, Code ORD-03C, Navy Department, Washington, D. C. 20360
Commander, Naval Ship Systems Command, Code SHIPS 037, Navy Department, Washington, D. C. 20360
Commander, Naval Ship Systems Command, Code SHIPS 00V1, Washington, D. C. 20360, Attn: CDR Bruce Gilchrist, Mr. Carey D. Smith (2)
Commander, Naval Undersea Research & Development Center, 3202 E. Foothill Boulevard, Pasadena, California 91107
Commanding Officer, Fleet Numerical Weather Facility, Monterey, California 93940
Defense Documentation Center, Cameron Station, Alexandria, Virginia 22314 (20)
Dr. James Probus, Office of the Assistant Secretary of the Navy (R&D), Room 4E741, The Pentagon, Washington, D. C. 20350
Mr. Allan D. Simon, Office of the Secretary of Defense, DDR&E, Room 3E1040, The Pentagon, Washington, D. C. 20301
CAPT J. Kelly, Naval Electronics Systems Command, Code EPO-3, Washington, D. C. 20360
Chief of Naval Operations, Room 5B718, The Pentagon, Washington, D. C. 20350, Attn: Mr. Benjamin Rosenberg (1)
Chief of Naval Operations, Room 4C559, The Pentagon, Washington, D. C. 20350, Attn: CDR J. M. Van Metre
Chief of Naval Operations, 801 No. Randolph St., Arlington, Virginia 22203 (1)
Dr. Melvin J. Jacobson, Rensselaer Polytechnic Institute, Troy, New York 12181
Dr. Charles Stutt, General Electric Co., P. O. Box 1088, Schenectady, New York 12301 (1)
Dr. Alan Winder, EDO Corporation, College Point, New York 11356
Dr. T. G. Birdsall, Cooley Electronics Lab., University of Michigan, Ann Arbor, Michigan 48105 (1)
Dr. John Steinberg, University of Miami, Institute of Marine & Atmospheric Sciences, Miami, Florida 33149
Mr. Robert Cunningham, Bendix Corporation, 11600 Sherman Way, North Hollywood, California 91606
Dr. H. S. Hayre, University of Houston, Cullen Boulevard, Houston, Texas 77004 (1)
Dr. Robert R. Brockhurst, Woods Hole Oceanographic Institute, Woods Hole, Massachusetts 02543 (1)
Dr. Stephen Wolff, Johns Hopkins University, Baltimore, Maryland 21218 (1)
Dr. M. A. Basin, Litton Industries, 8000 Woodley Avenue, Van Nuys, California 91409 (1)
Dr. Albert Nuttall, Navy Underwater Systems Center, Fort Trumbull, New London, Connecticut 06320 (1)
Dr. Philip Stocklin, Raytheon Company, P. O. Box 360, Newport, Rhode Island 02841
Dr. H. W. Marsh, Navy Underwater Systems Center, Fort Trumbull, New London, Connecticut 06320
Dr. David Middleton, 35 Concord Ave., Apt. #1, Cambridge, Massachusetts 02138
Mr. Richard Vesper, Perkin-Elmer Corporation, Electro-Optical Division, Norwalk, Connecticut 06852
Dr. Donald W. Tufts, University of Rhode Island, Kingston, Rhode Island 02881
Dr. Loren W. Nolte, Dept. of Electrical Engineering, Duke University, Durham, North Carolina 27706
Mr. S. W. Autrey, Hughes Aircraft Co., P. O. Box 3310, Fullerton, California 92634
Dr. Thomas W. Ellis, Texas Instruments, Inc., 13500 North Central Expressway, Dallas, Texas 75231 (1)
Mr. Robert Swarts, Honeywell, Inc., Marine Systems Center, 5303 Shilshole Ave., N.W., Seattle, Washington 98107
Mr. Gordon R. Hamilton, Palisades Geophysical Sofar Station, APO New York 09856
Mr. Charles Loda, Institute for Defense Analyses, 400 Army-Navy Drive, Arlington, Virginia 22202
Mr. Beaumont Buck, General Motors Corporation, Defense Research Division, 6767 Hollister Ave., Goleta, California 93017
Dr. M. Weinstein, Underwater Systems, Inc., 8121 Georgia Avenue, Silver Spring, Maryland 20910
Dr. Harold Saxton, TRACOR, Inc., 1601 Research Blvd., Rockville, Maryland 20850
Dr. Thomas G. Kincaid, General Electric Company, P. O. Box 1088, Schenectady, New York
Applied Research Laboratories, The University of Texas at Austin, Austin, Texas 78712, Attn: Dr. Loyd Hampton, Dr. Charles Wood (3)
Dr. Paul McElroy, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543
Dr. John Bouyoucos, General Dynamics/Electronics, 1400 N. Goodman Street, P. O. Box 226, Rochester, New York 14603
Hydrospace Research Corporation, 5541 Nicholson Lane, Rockville, Maryland 20852, Attn: CDR Craig Olson
Cooley Electronics Laboratory, University of Michigan, Ann Arbor, Michigan 48105 (25)

DOCUMENT CONTROL DATA - R & D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified.)

1. ORIGINATING ACTIVITY (Corporate author): Cooley Electronics Laboratory, University of Michigan, Ann Arbor, Michigan
2a. REPORT SECURITY CLASSIFICATION: Unclassified
3. REPORT TITLE: MODELS FOR NONSTATIONARY STOCHASTIC PROCESSES WITH APPLICATIONS TO UNDERWATER ACOUSTICS
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report No. 204 - 036040-2-T
5. AUTHOR(S): Peter P. Nuspl
6. REPORT DATE: July 1971    7a. TOTAL NO. OF PAGES: 220    7b. NO. OF REFS: 23
8a. CONTRACT OR GRANT NO.: N00014-67-A-0181-0032
9a. ORIGINATOR'S REPORT NUMBER(S): 036040-2-T
9b. OTHER REPORT NO(S): TR 204
10. DISTRIBUTION STATEMENT: Approved for public release; distribution unlimited.
12. SPONSORING MILITARY ACTIVITY: Office of Naval Research, Department of the Navy, Arlington, Virginia 22217
13. ABSTRACT: Some models for nonstationary stochastic processes with time-varying autocorrelation matrices are presented and applied to real data processes. Since probability distributions are complete statistical descriptions of a random variable, the fundamental problems are to find conditional distributions for the process x given observations y in a time interval. The Markov condition and the Gaussian distribution are assumed for the process x. Presuming knowledge of the model parameters, the solutions have been stated as the Kalman-Bucy equations. The model parameters consist of the state-transition matrix, the driving noise covariance matrix and the observation noise covariance matrix. A class of linear filters and digital processors, which have simple computer implementation, is presented. The data acquisition consists of simultaneous processing of hydrophone outputs to produce signal power and noise power measurements of underwater acoustic receptions. The study shows that the signal power process is apparently second-order stochastic and that the second-order model is adequate as a smoothing and prediction filter. The noise process has shipping and biological noise which make effective modeling difficult; by selective elimination of such observations a study of ambient ocean noise is possible. The models are useful in studying these nonstationary stochastic processes.

DD FORM 1473 (PAGE 1)    S/N 0101-807-6811    Unclassified

14. KEY WORDS: Nonstationary stochastic processes; Models; Kalman-Bucy; Underwater acoustics

DD FORM 1473 (BACK)    Unclassified
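For the reader's convenience, the discrete-time Kalman-Bucy recursion named in the abstract and key words above may be sketched as follows. This is the standard form of the filter equations, not a transcription from the body of the report: $\Phi_k$, $Q_k$, and $R_k$ are the state-transition, driving-noise covariance, and observation-noise covariance matrices listed in item 13, while the observation matrix $H_k$ and the estimate symbols $\hat{x}$, $P$ are notational assumptions of this sketch.

% A minimal sketch of the discrete-time Kalman-Bucy recursion.
% Assumed notation: \Phi_k (state transition), Q_k (driving-noise
% covariance), R_k (observation-noise covariance); H_k (observation
% matrix) is introduced here for the sketch only.
\[
\begin{aligned}
\text{Model:}\quad   & x_{k+1} = \Phi_k x_k + w_k, \qquad w_k \sim N(0, Q_k),\\
                     & y_k = H_k x_k + v_k, \qquad\;\; v_k \sim N(0, R_k),\\[4pt]
\text{Predict:}\quad & \hat{x}_{k\mid k-1} = \Phi_{k-1}\,\hat{x}_{k-1\mid k-1},\\
                     & P_{k\mid k-1} = \Phi_{k-1} P_{k-1\mid k-1}\Phi_{k-1}^{T} + Q_{k-1},\\[4pt]
\text{Update:}\quad  & K_k = P_{k\mid k-1} H_k^{T}\bigl(H_k P_{k\mid k-1} H_k^{T} + R_k\bigr)^{-1},\\
                     & \hat{x}_{k\mid k} = \hat{x}_{k\mid k-1} + K_k\bigl(y_k - H_k\,\hat{x}_{k\mid k-1}\bigr),\\
                     & P_{k\mid k} = (I - K_k H_k)\, P_{k\mid k-1}.
\end{aligned}
\]

The recursion alternates prediction through the state model with a measurement update, so that $\hat{x}_{k\mid k}$ and $P_{k\mid k}$, the conditional mean and covariance of the state given the observations through time $k$, are obtained from the previous estimate and the newest observation alone.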
