ROBUST RECURSIVE ALGORITHM FOR NONLINEAR STATE ESTIMATION

ALEXEI R. PANKOV
Department of Applied Mathematics, Moscow Aviation Institute, Moscow, 127080, Russia

ALEXEI V. BOSOV
Department of Applied Mathematics, Moscow Aviation Institute, Moscow, 127080, Russia

ANDREI V. BORISOV
Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117

Technical Report 92-27
April, 1992

ROBUST RECURSIVE ALGORITHM FOR NONLINEAR STATE ESTIMATION

ALEXEI R. PANKOV,1 ALEXEI V. BOSOV,2 ANDREI V. BORISOV3

Abstract. A nonlinear recursive filtering algorithm for discrete-time stochastic systems with observation outliers is considered. The results of numerical tests are given.

Key Words. Conditionally-minimax nonlinear filter, robust nonlinear recursive filter, optimal robust filtering.

1 Introduction.

It is well known that the Kalman filter (KF) gives the mean-square optimal estimate of a linear system state only if the system and observation noises and the initial conditions are gaussian (Liptser, and Shiryayev, 1977). If, for example, the observation errors contain outliers, the KF may perform very poorly, i.e., the estimation accuracy is low. Many papers are devoted to the problem of robustifying the KF (Masreliez, and Martin, 1977), (Ershov, 1978), (Subba Rao, and Yar, 1984), (Barton, and Poor, 1990). This fact shows the practical importance of the problem. The problem of optimal robust filtering is a nonlinear filtering problem (Liptser, and Shiryayev, 1977), the solution of which leads to very complicated numerical algorithms.

In this paper we construct a robust nonlinear recursive filter (RNRF) based on the conditionally-minimax nonlinear filter (CMNF) (Pankov, 1990). The latter method makes it possible to solve the general nonlinear filtering problem. It is based on the idea of the conditionally-optimal nonlinear filter (Pugachev, 1979), (Pugachev, and Sinitsyn, 1990), (Raol, and Sinha, 1987). By taking into account the specific character of the robust filtering problem and using the CMNF method we obtain a rather effective RNRF algorithm, which is not a robust KF modification.

In this paper we consider the RNRF structure for a linear difference stochastic observation system, the corresponding estimate, its statistical properties, and numerical results which allow us to compare the accuracy of the RNRF and the robust KF (Ershov, 1978). We also briefly consider the RNRF for nonlinear difference stochastic systems.

1 Department of Applied Mathematics, Moscow Aviation Institute, Volokolamskoye shosse, 4, Moscow, 127080, Russia
2 Department of Applied Mathematics, Moscow Aviation Institute, Volokolamskoye shosse, 4, Moscow, 127080, Russia
3 Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117

2 Problem formulation.

We shall use the following notation: $\|x\| = (x^T W x)^{1/2}$ for some known weight matrix $W$: $W = W^T$, $W > 0$; $p(x|m, S)$ is the multidimensional gaussian probability density with mean $m$ and covariance matrix $S$; $P(m, S)$ is the set of all random vectors $x$ with $E\{x\} = m$ and $\mathrm{cov}(x, x) = S$.

Let us consider the following stochastic observation model:

$y_n = a_n y_{n-1} + w_n, \quad n = 1, 2, \ldots; \quad y_0 = \eta,$  (1)
$z_n = c_n y_n + v_n,$

where $y_n \in R^p$ is the state vector; $\eta \in R^p$ is a vector of initial conditions with the density $p(x|m_\eta, R_\eta)$; $z_n \in R^q$ is the observation vector; $a_n$, $c_n$ are known $(p \times p)$ and $(q \times p)$ matrices respectively; $\{w_n\}$ and $\{v_n\}$ are independent random white noises with the one-dimensional densities $f_w(x, n)$ and $f_v(x, n)$ of the following form:

$f_w(x, n) = p(x|m_w, R_w),$  (2)
$f_v(x, n) = (1 - \delta) p(x|m_1, Q_1) + \delta p(x|m_2, Q_2),$  (3)

where $m_w$, $R_w$, $m_i$, $Q_i > 0$ ($i = 1, 2$) are known multidimensional parameters; $\delta \in [0, 1]$ is the probability of outliers in the observations (the density of the outliers is $p(x|m_2, Q_2)$). From (1),(2) it follows that $\{y_n\}$ is a gaussian process. Equations (2),(3) define the standard observation model with outliers (Kassam, and Poor, 1985). It should be mentioned that in (2),(3) all distribution parameters may depend on $n$, but we omit this for simplicity.

Consider the problem of robust recursive estimation of the process $\{y_n\}$ given the vector of observations $Z^n = \mathrm{col}(z_n, \ldots, z_1)$ in the context of conditionally-minimax filtering theory (Pankov, 1990). Let $\hat y_{n-1}$ be the CMNF estimate of $y_{n-1}$ given $Z^{n-1}$. The estimate $\hat y_n$ of $y_n$ takes the form

$\tilde y_n = a_n \hat y_{n-1} + m_w,$
$\hat y_n = \tilde y_n + \phi_n^*(\xi_n(\tilde y_n, z_n)),$  (4)

where $\xi_n(\tilde y_n, z_n)$ is some known basic correction function (BCF) (the choice of the BCF is considered in section 3); $\tilde y_n$ is the prediction of $y_n$ given $Z^{n-1}$, which is based on equation (1); $\phi_n^*(\cdot)$ is the vector-function obtained from the optimality condition

$\phi_n^*(v_n) = \arg\min_{\phi \in \Phi} \max_{x \in P(m, S)} E\{\|y_n - \tilde y_n - \phi(v_n)\|^2\},$  (5)

where $\Phi$ is the set of all measurable square-integrable functions; $x = \mathrm{col}(\Delta y_n, v_n) \in P(m, S)$ with known $m$, $S$; $\Delta y_n = y_n - \tilde y_n$ is the prediction error.
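Before discussing the structure of (4),(5), it may help to see how trajectories of model (1)-(3) are generated. Below is a minimal Python sketch for the scalar case; the function and variable names are ours, and the parameter values anticipate the numerical example of section 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T, a, c, m_w, R_w, m1, Q1, m2, Q2, delta, m_eta, R_eta):
    """Simulate the scalar case of model (1)-(3): a gaussian state
    recursion observed through additive noise drawn from a two-component
    gaussian mixture, where the Q2-component models the outliers."""
    y = np.empty(T + 1)
    z = np.empty(T + 1)
    y[0] = m_eta + np.sqrt(R_eta) * rng.standard_normal()
    z[0] = np.nan                      # no observation at n = 0
    for n in range(1, T + 1):
        y[n] = a * y[n - 1] + m_w + np.sqrt(R_w) * rng.standard_normal()
        if rng.random() < delta:       # outlier component of (3)
            v = m2 + np.sqrt(Q2) * rng.standard_normal()
        else:                          # nominal component of (3)
            v = m1 + np.sqrt(Q1) * rng.standard_normal()
        z[n] = c * y[n] + v
    return y, z

# Parameter values taken from the example of section 5.
y, z = simulate(T=10, a=0.9, c=2.0, m_w=0.0, R_w=0.49, m1=0.0, Q1=1.0,
                m2=0.0, Q2=100.0, delta=0.2, m_eta=0.0, R_eta=1.0)
```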

Equation (5) shows that the CMNF belongs to the class of algorithms with a "predictor-corrector" structure, and $\phi_n^*(\cdot)$ provides the mean-square optimal correction of $\tilde y_n$ under the additional condition that the correcting term is an arbitrary function of the BCF $\xi_n$. The probability distribution of the random vector $x$ is unknown, but the first- and second-order moments $m$, $S$ are known (or can be calculated in some way). Hence, $\phi_n^*(\cdot)$ is a conditionally-minimax correction on the set of distributions $P(m, S)$. If the distribution of $x$ is known, we can obtain the optimal correction $\phi(v_n) = E\{\Delta y_n | v_n\}$, but its numerical evaluation has the same level of complexity as the original nonlinear filtering problem. If the solution of (5) exists, equation (4) defines the recursive CMNF.

To obtain the robust modification of the CMNF we need to choose an appropriate $\xi_n(y, z)$ and determine the function $\phi_n^*(\cdot)$ from (5). The corresponding results are given in the next section.

3 RNRF for the linear model.

In order to choose $v_n = \xi_n(\tilde y_n, z_n)$ we consider the auxiliary problem of estimating a random vector $X$ given the observation vector $Y = CX + V$, where $X$ has the density $p(x|m_x, R_x)$ (i.e., $X$ is gaussian); $V$ is a vector of observation errors which is gaussian with outliers:

$f_V(x) = (1 - \delta) p(x|m_1, Q_1) + \delta p(x|m_2, Q_2).$

Let us obtain the mean-square optimal estimate $\hat X$ of $X$ given $Y$: $\hat X = \gamma^*(Y)$, where

$\gamma^*(Y) = \arg\min_{\gamma} E\{\|X - \gamma(Y)\|^2\}.$

LEMMA 1. Let $X$ and $V$ be independent. Then

$\gamma^*(Y) = (1 - \chi(Y)) X_1(Y) + \chi(Y) X_2(Y),$  (6)

$\chi(Y) = \delta f_2(Y) [(1 - \delta) f_1(Y) + \delta f_2(Y)]^{-1};$
$f_i(Y) = p(Y | C m_x + m_i, C R_x C^T + Q_i), \quad i = 1, 2;$
$X_i(Y) = m_x + R_x C^T (C R_x C^T + Q_i)^{-1} (Y - C m_x - m_i).$  (7)

Proof of Lemma 1: see Appendix A.

Denote $m_x = E\{\Delta y_n\}$, $R_x = \mathrm{cov}(\Delta y_n, \Delta y_n)$ and $C = c_n$. Let us take the BCF $\xi_n(\tilde y_n, z_n)$ as follows:

$\xi_n(\tilde y_n, z_n) = \gamma^*(z_n - c_n \tilde y_n),$  (8)

where $\gamma^*(\cdot)$ is defined in (6),(7).
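A direct Python sketch of the estimator (6),(7) is given below; the function name and the call-by-value interface are ours, and only standard numpy/scipy calls are used.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def gamma_star(Y, C, m_x, R_x, m1, Q1, m2, Q2, delta):
    """Mean-square optimal estimate of gaussian X from Y = C X + V,
    where V is a two-component gaussian mixture; implements (6),(7)."""
    Y, m_x, m1, m2 = map(np.atleast_1d, (Y, m_x, m1, m2))
    C, R_x, Q1, Q2 = map(np.atleast_2d, (C, R_x, Q1, Q2))
    S1 = C @ R_x @ C.T + Q1                      # cov of Y under component 1
    S2 = C @ R_x @ C.T + Q2                      # cov of Y under component 2
    f1 = mvn.pdf(Y, mean=C @ m_x + m1, cov=S1)   # marginal densities of (7)
    f2 = mvn.pdf(Y, mean=C @ m_x + m2, cov=S2)
    chi = delta * f2 / ((1.0 - delta) * f1 + delta * f2)   # weight chi(Y)
    X1 = m_x + R_x @ C.T @ np.linalg.solve(S1, Y - C @ m_x - m1)
    X2 = m_x + R_x @ C.T @ np.linalg.solve(S2, Y - C @ m_x - m2)
    return (1.0 - chi) * X1 + chi * X2           # estimate (6)

# Example: an outlier-sized observation is strongly discounted.
print(gamma_star(Y=15.0, C=2.0, m_x=0.0, R_x=1.0,
                 m1=0.0, Q1=1.0, m2=0.0, Q2=100.0, delta=0.2))
```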

Taking into account $z_n - c_n \tilde y_n = c_n \Delta y_n + v_n$, we obtain from Lemma 1 that the BCF (8) provides the optimal correction of $\tilde y_n$ under the condition of gaussianity of $\Delta y_n$. Hence, $\xi_n(\tilde y_n, z_n)$ as in (8) provides the robustness of the estimation algorithm. The real distribution law of $\Delta y_n$ differs slightly from the gaussian one, hence the filter accuracy may be raised by the nonparametric filter optimization in conformity with (5).

First we state a preliminary result concerning the minimax problem (5). Let $z = \mathrm{col}(x, y) \in P(m, S)$. Denote $m_x = E\{x\}$, $m_y = E\{y\}$, $S_x = \mathrm{cov}(x, x)$, $S_y = \mathrm{cov}(y, y)$, $S_{xy} = \mathrm{cov}(x, y) = S_{yx}^T$.

LEMMA 2. Let $J(\phi, F) = E\{\|x - \phi(y)\|^2\}$. If $z$ has the distribution $F \in P(m, S)$ then

$J(\phi^*, F) \le J(\phi^*, F^*) \le J(\phi, F^*),$

where $\phi^*(y) = S_{xy} S_y^+ y + (m_x - S_{xy} S_y^+ m_y)$, and $F^*$ is the gaussian distribution with the expectation $m$ and covariance $S$.

Proof of Lemma 2: see (Pankov, Bosov, and Borisov, 1992).

THEOREM 1. Let $\hat y_0 = E\{\eta\}$ and let $v_n = \xi_n(\tilde y_n, z_n)$ satisfy (6)-(8). Then:

i) the solution of (5) exists for all $n \ge 1$ and

$\phi_n^*(v_n) = H_n v_n + h_n, \quad H_n = \mathrm{cov}(\Delta y_n, v_n)\,\mathrm{cov}^+(v_n, v_n); \quad h_n = -H_n E\{v_n\};$  (9)

ii) $\hat y_n$ is an unbiased estimate of $y_n$ with the covariance matrix $K_n$ of the estimation error $\Delta \hat y_n = y_n - \hat y_n$, where

$K_n = a_n K_{n-1} a_n^T + R_w - H_n \mathrm{cov}(v_n, \Delta y_n), \quad n \ge 1; \quad K_0 = R_\eta.$  (10)

Proof of Theorem 1: see Appendix B.

Corollary. Let $E\{\|y_n - E\{y_n\}\|^2\} \le L < \infty$, $n \ge 0$. Then the RNRF (4) is nondivergent, i.e., $E\{\|\Delta \hat y_n\|^2\} \le L < \infty$.

Proof of the statement follows from (10). Let $W = I$ and $K_{n-1} \le \mathrm{cov}(y_{n-1}, y_{n-1})$. Then

$E\{\|\Delta y_n\|^2\} = \mathrm{tr}\{\tilde K_n\} = \mathrm{tr}\{a_n K_{n-1} a_n^T + R_w\} \le \mathrm{tr}\{a_n \mathrm{cov}(y_{n-1}, y_{n-1}) a_n^T + R_w\} = \mathrm{tr}\{\mathrm{cov}(y_n, y_n)\} = E\{\|y_n - E\{y_n\}\|^2\} \le L.$

Since $H_n \mathrm{cov}(v_n, \Delta y_n) = \mathrm{cov}(\Delta y_n, v_n)\,\mathrm{cov}^+(v_n, v_n)\,\mathrm{cov}(v_n, \Delta y_n) \ge 0$, from (10) we also have $K_n \le \tilde K_n$, and hence $E\{\|\Delta \hat y_n\|^2\} = \mathrm{tr}\{K_n\} \le L$. Hence, the estimation error variance of an arbitrary component of $\hat y_n$ is less than or equal to the variance of this component of $y_n$ for any $n \ge 0$. From Theorem 1 it also follows that $\tilde y_n$ is an unbiased estimate of $y_n$, since $\Delta y_n = a_n \Delta \hat y_{n-1} + (w_n - m_w)$ and $E\{\Delta \hat y_{n-1}\} = 0$.
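The moments entering (9) are rarely available in closed form; following the simulation-based route discussed in section 5, they can be estimated from Monte Carlo samples. A minimal sketch follows (the naming is ours; `dy` and `v` are assumed to hold matched samples of $\Delta y_n$ and $v_n$ obtained by jointly simulating the system and the filter).

```python
import numpy as np

def minimax_coefficients(dy, v):
    """Sample versions of the filter coefficients H_n, h_n of (9).
    dy: N samples of the prediction error Delta y_n, shape (N, p) or (N,);
    v:  the matching N samples of the BCF value v_n, shape (N, k) or (N,)."""
    dy = np.asarray(dy, dtype=float).reshape(len(dy), -1)   # (N, p)
    v = np.asarray(v, dtype=float).reshape(len(v), -1)      # (N, k)
    N = v.shape[0]
    dy_c = dy - dy.mean(axis=0)
    v_c = v - v.mean(axis=0)
    cov_dyv = dy_c.T @ v_c / (N - 1)          # sample cov(Delta y_n, v_n)
    cov_vv = v_c.T @ v_c / (N - 1)            # sample cov(v_n, v_n)
    H = cov_dyv @ np.linalg.pinv(cov_vv)      # pseudoinverse, as in (9)
    # In theory E{Delta y_n} = 0, so this reduces to h_n = -H_n E{v_n}.
    h = dy.mean(axis=0) - H @ v.mean(axis=0)
    return H, h
```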

The accuracy of $\hat y_n$ is determined by its error covariance matrix $K_n$. Consider the final system of equations which defines the RNRF for system (1)-(3):

$\tilde y_n = a_n \hat y_{n-1} + m_w, \quad n \ge 1; \quad \hat y_0 = m_\eta;$  (11)
$\hat y_n = \tilde y_n + H_n v_n + h_n,$

$v_n = (1 - \chi(\varepsilon_n)) X_1(\varepsilon_n) + \chi(\varepsilon_n) X_2(\varepsilon_n); \quad \varepsilon_n = z_n - c_n \tilde y_n;$  (12)

$X_i(\varepsilon_n) = \tilde K_n c_n^T (c_n \tilde K_n c_n^T + Q_i)^{-1} (\varepsilon_n - m_i); \quad \tilde K_n = a_n K_{n-1} a_n^T + R_w;$
$\chi(\varepsilon_n) = \delta f_n^2(\varepsilon_n) [(1 - \delta) f_n^1(\varepsilon_n) + \delta f_n^2(\varepsilon_n)]^{-1};$  (13)
$f_n^i(\varepsilon_n) = p(\varepsilon_n | m_i, c_n \tilde K_n c_n^T + Q_i), \quad i = 1, 2;$

$H_n = \mathrm{cov}(\Delta y_n, v_n)\,\mathrm{cov}^+(v_n, v_n); \quad h_n = -H_n E\{v_n\};$  (14)
$K_n = \tilde K_n - H_n \mathrm{cov}(v_n, \Delta y_n), \quad n \ge 1; \quad K_0 = R_\eta.$

Formula (11) defines the prediction, (12) defines the correction procedure, equations (13) define the robust BCF, and (14) allows one to calculate the optimal filter coefficients $H_n$, $h_n$ and the error covariance $K_n$.

It should be mentioned that in the case of gaussian observation errors without any outliers (i.e., $\delta = 0$) algorithm (11)-(14) coincides with the KF algorithm, i.e.,

$\hat y_n = \tilde y_n + \tilde K_n c_n^T (c_n \tilde K_n c_n^T + Q_1)^{-1} \tilde\varepsilon_n,$

where $\tilde\varepsilon_n = \varepsilon_n - m_1$ is the innovation sequence.

There is another interesting case, which leads to a minimax interpretation of the KF algorithm. From Lemma 2 it follows that if we use the linear KF equations (Sage, and Melsa, 1972) together with the model (1)-(3), but treat the observation error $\{v_n\}$ as a gaussian white noise with the expectation $(1 - \delta) m_1 + \delta m_2$ and the covariance matrix $(1 - \delta) Q_1 + \delta Q_2$, we obtain an estimate $\hat y_n^k$ which possesses the minimax property, and the accuracy of $\hat y_n^k$ is guaranteed for the given values of $\delta$, $m_1$, $m_2$, $Q_1$ and $Q_2$.
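For the scalar case, one filtering step of (11)-(13) might be sketched as follows; $H_n$ and $h_n$ are assumed to have been precomputed via (14), e.g. by the sampling approach above, and the naming is ours.

```python
import numpy as np
from scipy.stats import norm

def rnrf_step(y_hat_prev, K_prev, z_n,
              a, c, m_w, R_w, m1, Q1, m2, Q2, delta, H_n, h_n):
    """One scalar RNRF step per (11)-(13), with precomputed H_n, h_n."""
    y_tilde = a * y_hat_prev + m_w                    # prediction (11)
    K_tilde = a * K_prev * a + R_w                    # \tilde K_n from (13)
    eps = z_n - c * y_tilde                           # residual of (12)
    S1 = c * K_tilde * c + Q1
    S2 = c * K_tilde * c + Q2
    f1 = norm.pdf(eps, loc=m1, scale=np.sqrt(S1))     # densities of (13)
    f2 = norm.pdf(eps, loc=m2, scale=np.sqrt(S2))
    chi = delta * f2 / ((1 - delta) * f1 + delta * f2)
    X1 = K_tilde * c / S1 * (eps - m1)
    X2 = K_tilde * c / S2 * (eps - m2)
    v_n = (1 - chi) * X1 + chi * X2                   # robust BCF (12)
    y_hat = y_tilde + H_n * v_n + h_n                 # correction (12)
    return y_hat, y_tilde
```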

4 RNRF for the nonlinear system.

Consider the following nonlinear observation model:

$y_n = a_n(y_{n-1}) + b_n(y_{n-1}) w_n, \quad n \ge 1; \quad y_0 = \eta;$  (15)
$z_n = g_n(y_n) + v_n,$  (16)

where $a_n(y)$, $b_n(y)$, $g_n(y)$ are known nonlinear vector-functions. All properties of $\eta$, $\{w_n\}$, $\{v_n\}$ considered in section 2 hold.

To derive the robust CMNF let us introduce a second structural function $\tilde\xi_n(y)$, the basic prediction function (BPF). Denote $\zeta_n = \tilde\xi_n(\hat y_{n-1})$ and consider a conditionally-minimax prediction $\tilde y_n$ of the following type:

$\tilde y_n = \psi_n^*(\zeta_n),$  (17)

where $\psi_n^*(\cdot)$ is defined by the condition

$\psi_n^*(\zeta_n) = \arg\min_{\psi \in \Psi} \max_{x_0 \in P(m_0, S_0)} E\{\|y_n - \psi(\zeta_n)\|^2\},$  (18)

where $m_0 = E\{x_0\}$; $S_0 = \mathrm{cov}(x_0, x_0)$ and $x_0 = \mathrm{col}(y_n, \zeta_n)$. From (17),(18) it follows that $\tilde y_n$ is a function of the BPF $\zeta_n$ and provides the mean-square optimal prediction of $y_n$ for the most unfavourable joint distribution of $y_n$ and $\zeta_n$ from the set $P(m_0, S_0)$.

Consider the sufficient conditions for the RNRF for (15),(16) to exist and the structure of the corresponding filtering algorithm.

THEOREM 2. Let for $a_n(y)$, $b_n(y)$, $g_n(y)$ from (15),(16) and $\tilde\xi_n(y)$ there exist $\alpha_n, \beta_n < \infty$ such that for all $y \in R^p$ and $n \ge 1$ the following inequalities hold:

a) $\|a_n(y)\| + \|b_n(y)\| + \|g_n(y)\| + \|\tilde\xi_n(y)\| \le \alpha_n (1 + \|y\|)$;

b) there exists $d_n(y) = \partial g_n(y)/\partial y$, and $\|d_n(y)\| \le \beta_n (1 + \|y\|)$.

Then the RNRF exists and is defined by the equations

$\tilde y_n = F_n \zeta_n + f_n, \quad n \ge 1; \quad \hat y_0 = m_\eta,$  (19)

where $F_n = \mathrm{cov}(y_n, \zeta_n)\,\mathrm{cov}^+(\zeta_n, \zeta_n)$, $f_n = E\{y_n\} - F_n E\{\zeta_n\}$, and by formulae (12)-(14), where $\varepsilon_n = z_n - g_n(\tilde y_n)$, $c_n = d_n(\tilde y_n)$ and $\tilde K_n = \mathrm{cov}(y_n, y_n) - F_n \mathrm{cov}(\zeta_n, y_n)$. The estimates $\tilde y_n$ and $\hat y_n$ are unbiased and have the error covariance matrices $\tilde K_n$ and $K_n$ respectively.

The proof of Theorem 2 is based on the fact that under its conditions $y_n$ and $\tilde y_n$ have finite moments of arbitrary order for any $n \ge 0$. It then proceeds along the lines of the proof of Theorem 1, hence we omit the details.

Consider some natural types of the BPF $\tilde\xi_n(y)$ (a simple sketch of these choices follows the list):

a) $\tilde\xi_n(\hat y_{n-1}) = \hat y_{n-1}$ - the linear BPF;

b) $\tilde\xi_n(\hat y_{n-1}) = a_n(\hat y_{n-1}) + b_n(\hat y_{n-1}) m_w$ - the BPF based on the dynamic system equations (15);

c) $\tilde\xi_n(\hat y_{n-1}) = \mathrm{col}(e_1(\hat y_{n-1}), \ldots, e_N(\hat y_{n-1}))$, where $\{e_i(y)\}$ is a system of linearly independent functions on $R^p$ - the BPF of the general type.
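The three BPF variants can be written down directly; a sketch for a scalar state follows (the function names, and the polynomial basis shown for variant c), are hypothetical choices of ours).

```python
import numpy as np

def bpf_linear(y_prev):
    return y_prev                                    # variant a)

def bpf_dynamics(y_prev, a_fun, b_fun, m_w):
    return a_fun(y_prev) + b_fun(y_prev) * m_w       # variant b)

def bpf_general(y_prev, basis):
    return np.array([e(y_prev) for e in basis])      # variant c)

# A hypothetical polynomial basis for variant c).
basis = [lambda y: y, lambda y: y ** 2, lambda y: y ** 3]
print(bpf_general(0.5, basis))
```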

5 Numerical examples.

From equations (14) and (19) it follows that the RNRF parameters are independent of the particular measurement trajectory $\{z_n\}$ and depend only on the moment characteristics of the vectors $x = \mathrm{col}(\Delta y_n, v_n)$ and $x_0 = \mathrm{col}(y_n, \zeta_n)$. Consequently, they can be calculated in advance and stored in computer memory. There are several methods for determining $F_n$, $f_n$, $H_n$ and $h_n$ (Raol, and Sinha, 1987), (Pugachev, and Sinitsyn, 1990), developed for conditionally-optimal filtering and based on the evaluation of the joint characteristic function of $y_n$ and $\tilde y_n$. These methods are very complicated and require many a priori computations. We believe that a much more effective method is the one based on joint recursive statistical computer modelling of $y_n$ and $\hat y_n$ and the statistical treatment of the simulation results. The details of this method are given in (Pankov, 1990). Using these methods we obtained several estimation results which allow us to compare the RNRF method with other finite-dimensional filtering algorithms and with the optimal one.

Consider the following observation model:

$y_n = 0.9 y_{n-1} + w_n; \quad y_0 = \eta;$  (20)
$z_n = 2 y_n + v_n.$

1. The distributions of $w_n$, $v_n$ and $\eta$ are given in section 2. We use the following concrete distribution parameters:

$m_w = m_\eta = m_1 = m_2 = 0.0, \quad R_w = 0.49, \quad R_\eta = 1.0, \quad Q_1 = 1.0, \quad Q_2 = 100.0, \quad \delta = 0.2.$  (21)

We simulated $N = 5000$ trajectories of system (20) on a computer and obtained the following estimates:

$\hat y_n^o$ - the mean-square optimal estimate, determined by a special numerical algorithm based on the spline-approximation of the conditional density of the system state (Pankov, Bosov, and Borisov, 1992);

$\hat y_n^k$ - the minimax linear estimate given by the KF algorithm (see section 3);

$\hat y_n^{rk}$ - the estimate of the robust KF (Ershov, 1978);

$\hat y_n$ - the RNRF estimate.

A sketch of the modelling-based computation of the RNRF estimate for this example is given below.
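The following Python sketch jointly simulates $N$ trajectories of (20) with parameters (21) and the filter, estimating $H_n$, $h_n$ and $K_n$ from the samples at each step, in the spirit of the modelling method cited above; the naming is ours, and the printed values can be compared with the $K_n$ column of Table 1.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
T, N = 10, 5000                    # horizon, number of modelled trajectories
a, c = 0.9, 2.0                    # system (20)
m_w = m1 = m2 = m_eta = 0.0        # parameters (21)
R_w, R_eta, Q1, Q2, delta = 0.49, 1.0, 1.0, 100.0, 0.2

y = m_eta + np.sqrt(R_eta) * rng.standard_normal(N)   # N copies of y_0
y_hat = np.full(N, m_eta)                              # \hat y_0 = m_eta
K = R_eta                                              # K_0 = R_eta
for n in range(1, T + 1):
    y = a * y + m_w + np.sqrt(R_w) * rng.standard_normal(N)
    outlier = rng.random(N) < delta                    # mixture noise (3)
    z = c * y + np.where(outlier, np.sqrt(Q2), np.sqrt(Q1)) \
        * rng.standard_normal(N)
    y_tilde = a * y_hat + m_w                          # prediction (11)
    K_tilde = a * K * a + R_w                          # \tilde K_n of (13)
    eps = z - c * y_tilde
    S1, S2 = c * K_tilde * c + Q1, c * K_tilde * c + Q2
    f1 = norm.pdf(eps, m1, np.sqrt(S1))                # densities of (13)
    f2 = norm.pdf(eps, m2, np.sqrt(S2))
    chi = delta * f2 / ((1 - delta) * f1 + delta * f2)
    v = ((1 - chi) * K_tilde * c / S1 * (eps - m1)
         + chi * K_tilde * c / S2 * (eps - m2))        # BCF values (12)
    dy = y - y_tilde                                   # prediction errors
    Cm = np.cov(dy, v)                                 # joint sample covariance
    H = Cm[0, 1] / Cm[1, 1]                            # sample form of (14)
    h = dy.mean() - H * v.mean()
    y_hat = y_tilde + H * v + h                        # correction (12)
    K = np.mean((y - y_hat) ** 2)                      # sample K_n
    print(n, round(float(K), 3))
```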

The corresponding numerical results are given in Table 1.

Table 1.

 n    K_n^o   K_n     K_n^rk   K_n^k
 1    0.592   0.618   0.712    1.040
 2    0.490   0.539   0.605    1.061
 3    0.444   0.500   0.600    1.071
 4    0.449   0.481   0.638    1.077
 5    0.429   0.460   0.632    1.079
 6    0.385   0.473   0.618    1.081
 7    0.392   0.478   0.650    1.081
 8    0.392   0.471   0.665    1.082
 9    0.420   0.441   0.669    1.082
10    0.409   0.441   0.655    1.082

We see that $\hat y_n$ is much more accurate than $\hat y_n^{rk}$ and $\hat y_n^k$, and is close to $\hat y_n^o$. The computation time of $\hat y_n$ is practically the same as that of $\hat y_n^k$ and substantially smaller than that of $\hat y_n^o$.

2. In order to check the importance of the normality condition for $\Delta y_n$ (see Lemma 1) we considered system (20) with two types of nongaussian disturbances $\{w_n\}$ and initial condition $\eta$: the uniform distribution and the Laplace distribution. All parameters of these distributions were chosen as in (21). We calculated the RNRF estimate $\hat y_n$ and the optimal one $\hat y_n^o$; the results are given in Table 2.

Table 2.

From the obtained results it follows that the influence of deviations of the $\Delta y_n$ distribution from the gaussian one is negligible.
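The moment matching used in experiment 2 can be sketched as follows; it is an assumption on our part that the uniform and Laplace laws are taken with zero mean and the variance $R_w$ of (21).

```python
import numpy as np

rng = np.random.default_rng(2)
R_w = 0.49   # variance from (21), matched by both nongaussian laws

def w_uniform(size):
    """Uniform disturbance with mean 0 and variance R_w."""
    half_width = np.sqrt(3.0 * R_w)        # Var(U(-h, h)) = h**2 / 3
    return rng.uniform(-half_width, half_width, size)

def w_laplace(size):
    """Laplace disturbance with mean 0 and variance R_w."""
    scale = np.sqrt(R_w / 2.0)             # Var(Laplace(0, b)) = 2 b**2
    return rng.laplace(0.0, scale, size)
```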

6 Appendix A.

Proof of Lemma 1. Define $f_x(x) = p(x|m_x, R_x)$, $f_i(x) = p(x|m_i, Q_i)$, $i = 1, 2$. Then the probability density of $V$ is $f_V(x) = (1 - \delta) f_1(x) + \delta f_2(x)$. It can easily be checked that in this case $Y$ has the probability density

$f_Y(y) = (1 - \delta) \tilde f_1(y) + \delta \tilde f_2(y),$  (22)

where $\tilde f_i(y) = p(y | C m_x + m_i, C R_x C^T + Q_i)$, $i = 1, 2$, coincide with the $f_i(Y)$ of Lemma 1. The conditional density $f(x|y)$ of $X$ given $Y$ may be expressed in the form $f(x|y) = f(y|x) f_x(x) [f_Y(y)]^{-1}$, hence

$\gamma^*(y) = \int_{R^p} x f(x|y)\,dx = \int_{R^p} x f(y|x) f_x(x)\,dx\,[f_Y(y)]^{-1}.$

Obviously $f(y|x) = (1 - \delta) f_1(y - Cx) + \delta f_2(y - Cx)$, then

$f(y|x) f_x(x) = (1 - \delta) \tilde f_1(y) f^1(x|y) + \delta \tilde f_2(y) f^2(x|y),$

where $f^i(x|y)$ is the conditional density of $X$ given $Y$ under the assumption that $V$ has the density $f_i(x)$, $i = 1, 2$. Then

$\gamma^*(y) = [f_Y(y)]^{-1} [(1 - \delta) \tilde f_1(y) \int_{R^p} x f^1(x|y)\,dx + \delta \tilde f_2(y) \int_{R^p} x f^2(x|y)\,dx] = [f_Y(y)]^{-1} [(1 - \delta) \tilde f_1(y) \gamma_1^*(y) + \delta \tilde f_2(y) \gamma_2^*(y)],$

where $\gamma_i^*(y)$ is the optimal estimator of $X$ given $Y$ if $V$ has the density $f_i(x)$, $i = 1, 2$. From the normal correlation theorem (Liptser, and Shiryayev, 1977) it follows that

$\gamma_i^*(y) = m_x + R_x C^T (C R_x C^T + Q_i)^{-1} (y - C m_x - m_i), \quad i = 1, 2.$  (23)

Denote $\chi(y) = \delta \tilde f_2(y) [f_Y(y)]^{-1}$, $X_i(y) = \gamma_i^*(y)$; then for the optimal estimate $\hat X = \gamma^*(Y)$ from (22),(23) we obtain

$\hat X = (1 - \chi(Y)) X_1(Y) + \chi(Y) X_2(Y).$
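As a quick Monte Carlo sanity check of Lemma 1 (ours, not part of the original derivation), one can compare the mean-square error of the estimator (6),(7) with that of the best linear estimate, which treats $V$ as gaussian with the mixture's moments; the parameter values below are taken from the example of section 5.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
N = 200_000
m_x, R_x, C = 0.0, 1.0, 2.0
m1, Q1, m2, Q2, delta = 0.0, 1.0, 0.0, 100.0, 0.2

X = m_x + np.sqrt(R_x) * rng.standard_normal(N)
outlier = rng.random(N) < delta
V = np.where(outlier, m2 + np.sqrt(Q2) * rng.standard_normal(N),
             m1 + np.sqrt(Q1) * rng.standard_normal(N))
Y = C * X + V

S1, S2 = C * R_x * C + Q1, C * R_x * C + Q2
f1 = norm.pdf(Y, C * m_x + m1, np.sqrt(S1))        # densities of (7)
f2 = norm.pdf(Y, C * m_x + m2, np.sqrt(S2))
chi = delta * f2 / ((1 - delta) * f1 + delta * f2)
X1 = m_x + R_x * C / S1 * (Y - C * m_x - m1)
X2 = m_x + R_x * C / S2 * (Y - C * m_x - m2)
X_opt = (1 - chi) * X1 + chi * X2                  # estimator (6),(7)

# Best linear estimator: V treated as gaussian with the mixture moments
# (the component means are equal here, so only the variances mix).
Q_bar = (1 - delta) * Q1 + delta * Q2
X_lin = m_x + R_x * C / (C * R_x * C + Q_bar) * (Y - C * m_x)

print("MSE of (6),(7):", np.mean((X - X_opt) ** 2))
print("MSE linear    :", np.mean((X - X_lin) ** 2))
```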

7 Appendix B.

Proof of Theorem 1. From the conditions of Theorem 1 it follows that $E\{\|y_0\|^2\} = \mathrm{tr}\{W R_\eta\} + \|m_\eta\|^2 < \infty$, and then for all $n \ge 1$

$E\{\|y_n\|^2\} \le 2(\|a_n\|^2 E\{\|y_{n-1}\|^2\} + E\{\|w_n\|^2\}) < \infty.$

Let for some $n \ge 1$ we have $E\{\|\hat y_{n-1}\|^2\} < \infty$ and $E\{y_{n-1} - \hat y_{n-1}\} = 0$. For $\Delta y_n = y_n - \tilde y_n$ we obtain

$E\{\|\Delta y_n\|^2\} \le 2 E\{\|y_n\|^2 + \|\tilde y_n\|^2\} < \infty.$

Let us show that in this case $E\{\|\xi_n(\tilde y_n, z_n)\|^2\} = E\{\|\gamma^*(\varepsilon_n)\|^2\} < \infty$, where $\varepsilon_n = z_n - c_n \tilde y_n = c_n \Delta y_n + v_n$. Obviously, $E\{\|\varepsilon_n\|^2\} < \infty$ since $E\{\|\Delta y_n\|^2\} < \infty$ and $E\{\|v_n\|^2\} = \mathrm{tr}\{W[(1 - \delta) Q_1 + \delta Q_2]\} + \|(1 - \delta) m_1 + \delta m_2\|^2 < \infty$. Further, $\gamma^*(\varepsilon_n) = (1 - \chi(\varepsilon_n)) X_1(\varepsilon_n) + \chi(\varepsilon_n) X_2(\varepsilon_n)$, and $\chi(\varepsilon_n) \in [0, 1]$ with probability 1. Hence, it is sufficient to show that $E\{\|X_i(\varepsilon_n)\|^2\} < \infty$, $i = 1, 2$. Since $X_i(\varepsilon_n) = \tilde K_n c_n^T (c_n \tilde K_n c_n^T + Q_i)^{-1} (\varepsilon_n - m_i)$, there exists $D_i < \infty$ such that $\|X_i(\varepsilon_n)\|^2 \le D_i (1 + \|\varepsilon_n\|^2)$. So $E\{\|X_i(\varepsilon_n)\|^2\} \le D_i (1 + E\{\|\varepsilon_n\|^2\}) < \infty$.

Then for $x = \Delta y_n$, $y = \gamma^*(\varepsilon_n)$ the random vector $z = \mathrm{col}(x, y) \in P(m, S)$, where $m = E\{z\}$, $S = \mathrm{cov}(z, z)$ with $\mathrm{tr}\{WS\} + \|m\|^2 < \infty$. Then from Lemma 2 it follows that

$\hat y_n = \tilde y_n + \phi_n^*(\xi_n(\tilde y_n, z_n)) = \tilde y_n + H_n \gamma^*(\varepsilon_n) + h_n,$

where $H_n = \mathrm{cov}(x, y)\,\mathrm{cov}^+(y, y)$, $h_n = -H_n E\{y\}$. It should be mentioned that in this special case $m_x = E\{\Delta y_n\} = a_n E\{y_{n-1} - \hat y_{n-1}\} = 0$.

Now let $n = 1$ and $\hat y_0 = m_\eta$. Then $E\{\|\hat y_0\|^2\} = \|m_\eta\|^2 < \infty$ and $E\{y_0 - \hat y_0\} = 0$, and the result follows from the mathematical induction principle.

References

Barton, R.J., and Poor, H.V., 1990. Robust estimation and signal processing. IEEE Trans. on Inform. Theory, IT-36, 485-501.

Ershov, A.A., 1978. Stable methods of estimating parameters (survey). Automat. and Remote Contr., 39, Number 3.

Kassam, S.A., and Poor, H.V., 1985. Robust techniques for signal processing: a survey. Proceedings of the IEEE, 73, Number 3, 433-481.

Liptser, R.S., and Shiryayev, A.N., 1977. Statistics of Random Processes. New York, Springer-Verlag.

Masreliez, C., and Martin, R., 1977. Robust bayesian estimation for the linear model and robustifying the Kalman filter. IEEE Trans. on Autom. Control, AC-22, 361-371.

Pankov, A.R., 1990. Synthesis of conditionally-optimal filters by computer simulation. Analysis and Synthesis of Dynamic Systems under Uncertainty, Moscow, MAI Press, 69-75 (in Russian).

Pankov, A.R., 1990. Recursive dynamic system trajectory estimation by means of regression nonlinear filter. Statistical Methods in Flight Control, Moscow, MAI Press, 117-118 (in Russian).

Pankov, A.R., 1990. Synthesis of conditionally-minimax filters for nonlinear stochastic systems by the method of modelling. The All-Union Conference on Dynamics, Control and Operation Research, Moscow, MAI Press, 117-118 (in Russian).

Pugachev, V.S., 1979. Estimation of variables and parameters in discrete-time nonlinear systems. Automat. and Remote Control, 40, 512-521.

Pugachev, V.S., and Sinitsyn, I.N., 1990. Stochastic Differential Systems. London, J. Wiley & Sons.

Raol, J.R., and Sinha, N.K., 1987. Conditionally optimal state estimation for systems governed by difference equations. Canadian Electr. Engin. J., 12, 71-77.

Sage, A.P., and Melsa, J.L., 1972. Estimation Theory with Application to Communication and Control. New York, McGraw-Hill.

Subba Rao, T., and Yar, M., 1984. Linear and nonlinear filters for linear but nongaussian processes. Int. J. Contr., 39, Number 1, 235-246.

Pankov, A.R., Bosov, A.V., and Borisov, A.V., 1992. Finite-dimensional algorithms of nonlinear system state estimation. University of Michigan, Technical Report 92-13.