ESTIMATION IN GENERALIZED UNCERTAIN-STOCHASTIC LINEAR REGRESSION

ANDREI V. BORISOV

Department of Industrial & Operations Engineering
University of Michigan
Ann Arbor, MI 48109-2117

Technical Report 91-39

December 1991

ESTIMATION IN GENERALIZED UNCERTAIN-STOCHASTIC LINEAR REGRESSION

ANDREI V. BORISOV¹

(Revised December 17, 1991)

Abstract. The problem under consideration is to optimally estimate a finite-dimensional vector induced by linear operators defined on a special class of stochastic and uncertain processes. Optimality of the estimate means minimization of a minimax-stochastic criterion. We assume a linear model of observations which contains random disturbances and has a discrete-continuous structure. The optimal estimate is sought as a linear operator of the observations. Necessary and sufficient conditions for vector identifiability and estimate optimality are proved. An optimal filtering algorithm for uncertain-stochastic differential systems is obtained as an application of this estimation theory.

Key Words. Optimal estimation, generalized linear regression, minimax-stochastic criterion, identifiability of parameters, filtering problem for uncertain-stochastic systems.

AMS(MOS) subject classifications. 93E11, 93E12.

1. Introduction. Parameter estimation in linear regression arises often in stochastic control and signal processing theory. Different approaches may be used to solve this problem; the choice depends both on the a priori information about the regression parameters and noises and on the optimality criterion for the estimates. If both parameters and noises in the model are Gaussian, then the estimate minimizing the mean square estimation error, i.e., the conditional expectation, is also equal to the best linear estimate (BLE) [1]. When parameters and noises are random but non-Gaussian with given moment characteristics, the BLE may be found by the same formula as in the Gaussian case. There also exists a sufficient condition for a linear estimate to be the BLE: the well-known Wiener-Hopf condition [2]. The estimation problem for purely stochastic regression is solved in [3] and [4] for parameters and noises with singular variance matrices. In [2] an RKHS approach is also considered for parameters and noises that are random but contain uncertainties in their moment characteristics. When the parameters and/or noises are assumed uncertain and bounded, a minimax optimality criterion and minimax [5], [6] or ellipsoidal bounding [7], [8] estimation methods are usually used. If the parameters are uncertain and unbounded,

¹Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor, MI 48109-2117

then the least squares estimate (LSE) [9] is the BLE. In practical situations it is necessary to estimate a vector of parameters in generalized uncertain-stochastic regression [10]. This means the estimated vector is a linear combination of unbounded uncertain and random parts induced by linear operators acting from the spaces of uncertain and random processes, respectively, into a finite-dimensional space. We assume that the observations contain random noises and have a discrete-continuous structure. Similar problems in the purely random case arise in filtering or smoothing of the state in linear difference, differential and integral dynamic systems [11], [12], [13], [14].

In this article, necessary and sufficient conditions of state identifiability and linear estimate optimality for generalized uncertain-stochastic linear regression are given. A problem similar to the identifiability problem is considered in [15] for purely uncertain differential inclusions; here, identifiability conditions are obtained for arbitrary linear bounded operators. The estimate optimality conditions presented are generalizations of the Wiener-Hopf condition from purely stochastic to uncertain-stochastic regression. The filtering problem for linear uncertain-stochastic differential systems, given complex observations of both the system state and the input signals, is solved. This algorithm was presented briefly in [16] for the special case of separate observations of the state and inputs. In this article, optimality of this algorithm is proved for the general case as an application of the optimal estimation theory in generalized regression.

2. Preliminaries. Consider a probability space $(\Omega, F, P)$, where $\Omega$ is a sample space, $F$ is a $\sigma$-algebra on $\Omega$, and $P$ is a probability measure on $F$. Further, we consider a finite time interval $[0, T]$.

DEFINITION 2.1. The $n$-dimensional stochastic process $\xi_t = (\xi_t^1, \ldots, \xi_t^n)^T$ belongs to the class $S^n$ iff the following conditions hold:
i) each component $\xi_t^i = \xi^i(\omega, t)$, $\omega \in \Omega$, is $F$-measurable, $i = 1, \ldots, n$;
ii) all paths of $\xi_t^i$, $i = 1, \ldots, n$, belong to the space $L^2$ of square integrable functions on $[0, T]$;
iii) $\xi_t$ is a second-order zero-mean process with known covariance matrix-valued function $\mathrm{cov}(\xi_\tau, \xi_\sigma) = R(\tau, \sigma)$, $0 \le \tau \le T$, $0 \le \sigma \le T$.

The class $S^n$ is quite broad. For example, all $n$-dimensional zero-mean second-order diffusion processes belong to this class [1]. We will also denote the $k$-dimensional space of square integrable functions on $[0, T]$ by $L_2^k$; $\Phi^+$ denotes the left pseudoinverse of a matrix or operator $\Phi$, i.e., the matrix or operator which satisfies $\Phi\Phi^+\Phi = \Phi$.
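To make the pseudoinverse notation concrete, here is a minimal numerical sketch (Python with NumPy; the toy matrix is invented for illustration, not taken from the paper) checking that the Moore-Penrose pseudoinverse satisfies the defining identity $\Phi\Phi^+\Phi = \Phi$ even for a rank-deficient $\Phi$:

```python
import numpy as np

# Toy rank-deficient matrix (rank 1), invented purely for illustration.
Phi = np.array([[1.0, 2.0, 3.0],
                [2.0, 4.0, 6.0]])

Phi_plus = np.linalg.pinv(Phi)  # Moore-Penrose pseudoinverse

# Defining identity of Section 2: Phi Phi+ Phi = Phi.
print(np.allclose(Phi @ Phi_plus @ Phi, Phi))  # True
```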

We will consider the square of the following norm of a vector $a$: $\|a\|^2 = a^T \Sigma a$, where $\Sigma$ is a known symmetric matrix, $\Sigma \ge 0$.

3. Problem Formulation. Consider the following model of observations:

$$y_0 = \Phi_0 u_0 + \Phi_1 u + H_0 \xi_0 + H_1 \xi + Q_0 \theta_0, \qquad y = \Phi_2 u_0 + \Phi u + H_2 \xi_0 + H \xi + Q \theta. \qquad (1)$$

Here $u_0 \in R^{p_0}$ is an uncertain vector; $\xi_0 \in R^{q_0}$ and $\theta_0 \in R^{r_0}$ are zero-mean random vectors with known variance matrices $\mathrm{cov}(\xi_0, \xi_0) = C_0$ and $\mathrm{cov}(\theta_0, \theta_0) = P_0$, respectively. The matrices $\Phi_0$, $H_0$, $Q_0$ of appropriate dimensions are given. $\{u_\tau\} \in L_2^p$ is an uncertain process; $\{\xi_\tau\} \in S^q$ and $\{\theta_\tau\} \in S^r$ are $q$- and $r$-dimensional stochastic processes with known covariance matrix-valued functions $\mathrm{cov}(\xi_\tau, \xi_\sigma) = C(\tau, \sigma)$ and $\mathrm{cov}(\theta_\tau, \theta_\sigma) = P(\tau, \sigma)$. The linear bounded operators $\Phi_1: L_2^p \to R^{m_0}$, $H_1: L_2^q \to R^{m_0}$, $\Phi: L_2^p \to L_2^m$, $\Phi_2: R^{p_0} \to L_2^m$, $H: L_2^q \to L_2^m$, $H_2: R^{q_0} \to L_2^m$ and $Q: L_2^r \to L_2^m$ are given. We suppose for simplicity that $\xi_0$, $\{\xi_\tau\}$, $\theta_0$, $\{\theta_\tau\}$ are uncorrelated.

Consider also the $n$-dimensional vector

$$x = A_0 u_0 + B_0 \xi_0 + A u + B \xi, \qquad (2)$$

where the matrices $A_0$, $B_0$ are given, and $A$ and $B$ are known linear operators $L_2^p \to R^n$ and $L_2^q \to R^n$, respectively. The problem under consideration is to calculate a linear estimate $\hat{x} = M_0 y_0 + M y$ of $x$ which minimizes the following criterion:

$$J = \sup_{u_0 \in R^{p_0},\, \{u_\tau\} \in L_2^p} E\{\|\Delta\|^2\}, \qquad (3)$$

where $\Delta = x - \hat{x}$ is the estimation error, and $M_0$ and $M$ are an appropriate matrix and linear operator, respectively. Such an estimate will be called the best linear minimax estimate (BLME).

4. Necessary and Sufficient Conditions of Vector Identifiability and Estimate Optimality.

DEFINITION 4.1. The vector $x$ given by (2) is identifiable by the observations (1) iff there exists some measurable operator $K$ (possibly nonlinear) such that $\hat{x} = K(y_0, y)$, $J(\hat{x}) < \infty$.

LEMMA 4.1. If $x$ is identifiable, then $(\ker \Phi \cap \ker \Phi_1) \subset \ker \Sigma A$ and $(\ker \Phi_0 \cap \ker \Phi_2) \subset \ker \Sigma A_0$.

Proof of Lemma 4.1: see Appendix A.
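In finite dimensions the kernel-inclusion conditions of Lemma 4.1 can be checked numerically. The sketch below (Python/NumPy/SciPy; all matrices and names are invented toy data, not the paper's) computes a basis of $\ker\Phi \cap \ker\Phi_1$ as the null space of the stacked matrix and tests whether $\Sigma A$ annihilates it:

```python
import numpy as np
from scipy.linalg import null_space

def kernel_condition_holds(Phi, Phi1, Sigma, A, tol=1e-10):
    """Check (ker Phi) ∩ (ker Phi1) ⊆ ker(Sigma A) for matrices.

    The intersection of the two kernels is the null space of the
    stacked matrix [Phi; Phi1]."""
    W = null_space(np.vstack([Phi, Phi1]))  # columns span the intersection
    if W.size == 0:                         # trivial kernel: condition holds
        return True
    return np.allclose(Sigma @ A @ W, 0.0, atol=tol)

# Toy data: A vanishes on the unobserved direction e3, so the
# necessary condition of Lemma 4.1 is satisfied.
Phi   = np.array([[1.0, 0.0, 0.0]])
Phi1  = np.array([[0.0, 1.0, 0.0]])
A     = np.array([[2.0, -1.0, 0.0]])
Sigma = np.eye(1)
print(kernel_condition_holds(Phi, Phi1, Sigma, A))  # True
```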

The following theorem gives a generalized necessary condition for the optimality of $\hat{x}$.

THEOREM 4.1. If $\hat{x} = M_0 y_0 + M y$ is the BLME of $x$, then

$$\Sigma(A_0 - M_0 \Phi_0 - M \Phi_2) = 0, \qquad \Sigma(A - M_0 \Phi_1 - M \Phi) = 0. \qquad (4)$$

Proof of Theorem 4.1: see Appendix B.

Conditions (4) are equivalent to unbiasedness of the estimate $\hat{x}$: $\Sigma E\{\Delta\} = 0$. This theorem defines joint restrictions on the matrices $A_0$, $\Phi_0$ and the operators $A$, $\Phi_1$, $\Phi_2$, $\Phi$ for the existence of the optimal estimate $\hat{x}$.

THEOREM 4.2. Let $x$ be identifiable. Then $\hat{x}$ is the BLME of $x$ iff the following conditions hold:

$$\Sigma\,\mathrm{cov}\big(\Delta,\, (I - \Phi_0\Phi_0^+)[y_0 - \Phi_1\Psi_1^+(y - \Phi_2\Phi_0^+ y_0)]\big) = 0,$$
$$\Sigma\,\mathrm{cov}\big(\Delta,\, [(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0)](\tau)\big) = 0 \quad \text{for all } \tau \in [0, T], \qquad (5)$$

where $\Psi_1 = \Phi - \Phi_2\Phi_0^+\Phi_1$.

Proof of Theorem 4.2: see Appendix C.

Conditions (5) are analogues of the Wiener-Hopf conditions of optimal parameter estimation in the purely stochastic linear model [2]. Now let us consider the estimation problem for the pure discrete-time regression

$$y_0 = \Phi_0 u_0 + H_0 \xi_0 + Q_0 \theta_0, \qquad (6)$$
$$x_0 = A_0 u_0 + B_0 \xi_0, \qquad (7)$$
$$J_0 = \sup_{u_0 \in R^{p_0}} E\{\|x_0 - \hat{x}_0\|^2\}, \qquad (8)$$

to obtain the necessary and sufficient condition for the optimality of $\hat{x}_0$.

COROLLARY 4.1. Let $x_0$ be identifiable and $Q_0 P_0 Q_0^T > 0$. Then $\hat{x}_0$ is the BLME iff

$$\Sigma\,\mathrm{cov}(\Delta_0, y_0 - \Phi_0 u_0^*) = 0, \qquad (9)$$

where $u_0^* = L_0 y_0$ is any linear estimate of $u_0$ with the property

$$\Sigma A_0 E\{u_0 - u_0^*\} = 0, \qquad (10)$$

and the estimate is

$$\hat{x}_0 = \{A_0 \Phi_0^+ + B_0 C_0 H_0^T R_0^{-1}(I - \Phi_0 \Phi_0^+)\}\, y_0, \qquad (11)$$

where $R_0 = H_0 C_0 H_0^T + Q_0 P_0 Q_0^T$ and $\Phi_0^+ = (\Phi_0^T R_0^{-1} \Phi_0)^+ \Phi_0^T R_0^{-1}$.

Formula (9) is a special case of (5), and any estimate $u_0^*$ which satisfies (10) may be obtained as $u_0^* = \Phi_0^+ y_0 + (I - \Phi_0^+ \Phi_0)V_0$ with an arbitrary vector $V_0$ of appropriate dimension. In the case of uncertain parameters ($H_0 = 0$) we obtain $\hat{x}_0 = A_0 \Phi_0^+ y_0$, i.e., the LSE. In the case of stochastic parameters ($\Phi_0 = 0$), we obtain $\hat{x}_0 = B_0 C_0 H_0^T R_0^{-1} y_0$, i.e., the BLE for a random vector (or, in the Gaussian case, the corollary of the normal correlation theorem [1]).
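For the discrete model (6)-(7), formula (11) is directly computable. The following Python/NumPy sketch implements it under the corollary's assumption $Q_0 P_0 Q_0^T > 0$ (so that $R_0$ is invertible); the function name and the way the inputs are packaged are mine, not the paper's:

```python
import numpy as np

def blme_discrete(A0, B0, Phi0, H0, Q0, C0, P0, y0):
    """BLME (11) for the discrete regression (6)-(7):

        x0_hat = [A0 Phi0+ + B0 C0 H0' R0^{-1} (I - Phi0 Phi0+)] y0,

    with R0 = H0 C0 H0' + Q0 P0 Q0' (assumed invertible) and the
    weighted left pseudoinverse Phi0+ = (Phi0' R0^{-1} Phi0)+ Phi0' R0^{-1}.
    """
    R0_inv = np.linalg.inv(H0 @ C0 @ H0.T + Q0 @ P0 @ Q0.T)
    Phi0_plus = np.linalg.pinv(Phi0.T @ R0_inv @ Phi0) @ Phi0.T @ R0_inv
    I = np.eye(Phi0.shape[0])
    M0 = A0 @ Phi0_plus + B0 @ C0 @ H0.T @ R0_inv @ (I - Phi0 @ Phi0_plus)
    return M0 @ y0
```

Setting H0 to zero reduces the gain to A0 Phi0+ (the LSE branch), and setting Phi0 to zero reduces it to B0 C0 H0' R0^{-1} (the BLE branch), matching the two special cases noted above.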

5. Filtering Problem in Linear Uncertain-Stochastic Systems. In this section a filtering algorithm is derived using the necessary and sufficient conditions of estimate optimality in generalized linear regression. Consider the following dynamic system:

$$dx_t = a(t) x_t\,dt + b(t) u_t\,dt + d\xi_t, \quad t > 0, \qquad x_0 = v, \qquad (12)$$

where $x_t \in R^n$ is the state, $v \in R^n$ is an uncertain initial condition, $\{u_\tau\} \in L_2^q$ is a $q$-dimensional uncertain process, and $\{\xi_\tau\}$ is a zero-mean Wiener process with differential covariance matrix-valued function $C(t)$. Information concerning $\{x_\tau\}$, $\{u_\tau\}$, $\{\xi_\tau\}$ and $v$ is given by the observations

$$z_0 = \phi_0 v + w_0, \qquad dy_t = \psi(t) x_t\,dt + \phi(t) u_t\,dt + \eta(t)\,d\xi_t + dw_t, \quad t > 0, \quad y_0 = 0, \qquad (13)$$

where $w_0$ is a zero-mean random vector with known covariance matrix $Q_0$, and $\{w_\tau\}$ is an $r$-dimensional zero-mean Wiener process with differential covariance matrix-valued function $Q(t)$. The matrix $\phi_0$ is known. The matrix-valued functions $a(t)$, $b(t)$, $C(t)$, $\psi(t)$, $\phi(t)$, $\eta(t)$, $Q(t)$ of appropriate dimensions are given and contain only piecewise continuous elements. The filtering problem under consideration is to calculate the linear estimate $\hat{x}_t$ of $x_t$ which minimizes the criterion

$$J_t = \sup_{v \in R^n,\, u_\tau \in L_2^q} E\{\|x_t - \hat{x}_t\|^2\}, \qquad (14)$$

using all observations $\{z_0, y_\tau, 0 \le \tau \le t\}$.

THEOREM 5.1. Let the system and the observations be given by (12) and (13). Let the following conditions also hold:
i) $Q(\tau) > 0$ for all $\tau \in (0, t]$, and $Q_0 > 0$;
ii) $b(\tau)\phi^+(\tau)\phi(\tau) = b(\tau)$ for all $\tau \in (0, t]$.

Then the BLME $\hat{x}_t$ is unbiased, and $\hat{x}_t$ and its error covariance matrix $k(t)$ are given by the following equations:

$$d\hat{x}_t = a(t)\hat{x}_t\,dt + [b(t)\phi^+(t) + M(t)(I - \phi(t)\phi^+(t))]\,[dy_t - \psi(t)\hat{x}_t\,dt],$$
$$\dot{k}(t) = a(t)k(t) + k(t)a^T(t) + 2b(t)[\phi^T(t)P^{-1}(t)\phi(t)]^+ b^T(t) + M(t)\phi(t)[\phi^T(t)P^{-1}(t)\phi(t)]^+\phi^T(t)M^T(t) + C(t) - (b(t)\phi^+(t) + M(t))\,P(t)\,(b(t)\phi^+(t) + M(t))^T, \qquad (15)$$

with initial conditions

$$\hat{x}_0 = k(0)\phi_0^T Q_0^{-1} z_0, \qquad k(0) = [\phi_0^T Q_0^{-1}\phi_0]^{-1}, \qquad (16)$$

where

$$P(t) = \eta(t)C(t)\eta^T(t) + Q(t) > 0, \qquad M(t) = [k(t)\psi^T(t) + C(t)\eta^T(t)]\,P^{-1}(t). \qquad (17)$$

Proof of Theorem 5.1: see Appendix D.

The given algorithm is an extension of linear filtering theory from the class of purely stochastic systems to the class of uncertain-stochastic systems, where the classical Kalman filter is not applicable.
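As a numerical illustration of (15)-(17), here is a minimal forward-Euler discretization of the filter (Python/NumPy). The uniform time grid, the callable coefficient interface, and all identifiers are my own assumptions, and the equations are used as reconstructed above; a careful implementation would also have to respect the stochastic-integral structure of $dy_t$:

```python
import numpy as np

def uncertain_stochastic_filter(a, b, psi, phi, eta, C, Q,
                                phi0, Q0, z0, dy, dt):
    """Forward-Euler sketch of the filter (15)-(17).

    a, b, psi, phi, eta, C, Q : callables t -> ndarray, the (piecewise
        continuous) coefficients of the system (12)-(13);
    phi0, Q0, z0 : discrete observation model of (13);
    dy : iterable of observation increments on a uniform grid of step dt.
    """
    Q0_inv = np.linalg.inv(Q0)
    k = np.linalg.inv(phi0.T @ Q0_inv @ phi0)   # k(0), eq. (16)
    x = k @ phi0.T @ Q0_inv @ z0                # x_hat(0), eq. (16)
    for i, dyi in enumerate(dy):
        t = i * dt
        P = eta(t) @ C(t) @ eta(t).T + Q(t)                      # (17)
        M = (k @ psi(t).T + C(t) @ eta(t).T) @ np.linalg.inv(P)  # (17)
        phi_p = np.linalg.pinv(phi(t))
        Ir = np.eye(P.shape[0])
        G = b(t) @ phi_p + M @ (Ir - phi(t) @ phi_p)   # gain in (15)
        S = np.linalg.pinv(phi(t).T @ np.linalg.inv(P) @ phi(t))
        BM = b(t) @ phi_p + M
        # State equation of (15).
        x = x + a(t) @ x * dt + G @ (dyi - psi(t) @ x * dt)
        # Covariance equation of (15).
        k = k + dt * (a(t) @ k + k @ a(t).T
                      + 2.0 * b(t) @ S @ b(t).T
                      + M @ phi(t) @ S @ phi(t).T @ M.T
                      + C(t)
                      - BM @ P @ BM.T)
    return x, k
```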

6. Conclusion.
i) In this article, the optimal estimation problem in generalized uncertain-stochastic linear regression has been formulated.
ii) The class of identifiable vectors has been determined and a necessary condition for vector identifiability has been proved. It is a special joint restriction on the estimated vector and the observation model.
iii) Necessary and sufficient conditions for estimate optimality have been derived. They are generalizations of the Wiener-Hopf conditions to the class of uncertain-stochastic regression. We have also obtained a formula for the parameter estimate in the discrete model as a corollary of the general case. The LSE and the BLE in the purely stochastic model are special cases of this estimate.
iv) The filtering problem for linear differential uncertain-stochastic dynamic systems has been solved under general assumptions on the linear model of observations, as an illustration of the presented estimation theory in generalized linear regression.

Appendix A. Proof of Lemma 4.1. First we decompose the estimate $\hat{x} = K(y_0, y)$ in the form

$$\hat{x} = K(y_0, y) = \varphi(v_0, v_1, v_2, v) + \zeta(\xi_0, \xi, \theta_0, \theta), \qquad \zeta(0, 0, 0, 0) = 0,$$

where $v_0 = \Phi_0 u_0$, $v_1 = \Phi_1 u$, $v_2 = \Phi_2 u_0$, $v = \Phi u$, and then fix some values $v_0^*, v_1^*, v_2^*, v^*, u_0^*, u^*$:

$$v_0^* = \Phi_0 u_0^*, \quad v_1^* = \Phi_1 u^*, \quad v_2^* = \Phi_2 u_0^*, \quad v^* = \Phi u^*.$$

Let us also denote

$$\gamma_1 = E\{\|B\xi + B_0\xi_0 - \zeta(\xi_0, \xi, \theta_0, \theta)\|^2\}, \qquad \pi = A_0 u_0^* + A u^* - \varphi(v_0^*, v_1^*, v_2^*, v^*),$$
$$U_0 = \{u_0:\ v_0^* = \Phi_0 u_0,\ v_2^* = \Phi_2 u_0\}, \qquad U = \{u:\ v_1^* = \Phi_1 u,\ v^* = \Phi u\}.$$

Then

$$J = \gamma_1 + \sup_{u_0 \in R^{p_0},\, u \in L_2^p} \|A_0 u_0 + A u - \varphi(v_0, v_1, v_2, v)\|^2 \ge \gamma_1 + \sup_{u_0 \in U_0,\, u \in U} \|A_0 u_0 + A u - \varphi(v_0^*, v_1^*, v_2^*, v^*)\|^2$$
$$= \gamma_1 + \|\pi\|^2 + \sup_{w_0 \in \ker\Phi_0 \cap \ker\Phi_2,\; w \in \ker\Phi \cap \ker\Phi_1} \{2\pi^T\Sigma(A_0 w_0 + A w) + \|A_0 w_0 + A w\|^2\}.$$

The form $J_1 = 2\pi^T\Sigma(A_0 w_0 + A w)$ is linear with respect to $w_0$ and $w$, hence

$$J \ge \gamma_1 + \|\pi\|^2 + \sup_{w_0 \in \ker\Phi_0 \cap \ker\Phi_2,\; w \in \ker\Phi \cap \ker\Phi_1,\; J_1 \ge 0} \|A_0 w_0 + A w\|^2.$$

The supremum in this sum is finite only if $\Sigma(A_0 w_0 + A w) = 0$ for all $w_0 \in \ker\Phi_0 \cap \ker\Phi_2$ and $w \in \ker\Phi \cap \ker\Phi_1$, i.e.,

$$(\ker\Phi \cap \ker\Phi_1) \subset \ker(\Sigma A), \qquad (\ker\Phi_0 \cap \ker\Phi_2) \subset \ker(\Sigma A_0).$$

Lemma 4.1 is proved.

Appendix B. Proof of Theorem 4.1. Let $\hat{x} = M_0 y_0 + M y$ be the BLME of the vector $x$; then $J(\hat{x}) < \infty$ and

$$J(\hat{x}) = E\|B_0\xi_0 + B\xi - M_0(H_0\xi_0 + H_1\xi + Q_0\theta_0) - M(H_2\xi_0 + H\xi + Q\theta)\|^2 + \sup_{u_0 \in R^{p_0},\, u \in L_2^p} \|(A_0 - M_0\Phi_0 - M\Phi_2)u_0 + (A - M_0\Phi_1 - M\Phi)u\|^2.$$

The supremum in this sum is finite only if

$$\Sigma(A_0 - M_0\Phi_0 - M\Phi_2)u_0 = 0 \quad \text{for all } u_0 \in R^{p_0}, \qquad \Sigma(A - M_0\Phi_1 - M\Phi)u = 0 \quad \text{for all } u \in L_2^p,$$

which is equivalent to (4). Theorem 4.1 is proved.

Appendix C. Proof of Theorem 4.2. We will find all $M_0$ and $M$ which satisfy condition (4) and then select the matrix $M_0$ and operator $M$ optimally, using criterion (3). First we consider the equation

$$\Sigma(A_0 - M_0\Phi_0 - M\Phi_2) = 0. \qquad (18)$$

It is necessary to mention that the composition $\tilde{\Phi}_2 = M\Phi_2: R^{p_0} \to R^n$ of the operators $M$ and $\Phi_2$ is a matrix of appropriate dimensions. Using Lemma 4.1 and the property $\Phi_0\Phi_0^+\Phi_0 = \Phi_0$, it may be shown that $\Sigma(A_0 - \tilde{\Phi}_2)\Phi_0^+\Phi_0 = \Sigma(A_0 - \tilde{\Phi}_2)$. Then (18) is a system of linear equations with solution [3]

$$M_0 = \Sigma^+\Sigma\{[A_0 - \tilde{\Phi}_2]\Phi_0^+ + Z_0(I - \Phi_0\Phi_0^+)\} + (I - \Sigma^+\Sigma)N_0,$$

where $Z_0$ and $N_0$ are arbitrary matrices of appropriate dimensions. It is easy to verify that criterion (3) does not depend on $N_0$, hence we can select $M_0$ in the form

$$M_0 = [A_0 - \tilde{\Phi}_2]\Phi_0^+ + Z_0(I - \Phi_0\Phi_0^+). \qquad (19)$$

Substituting $M_0$ into the second equation of (4) we obtain

$$\Sigma[A - A_0\Phi_0^+\Phi_1 - Z_0(I - \Phi_0\Phi_0^+)\Phi_1] = \Sigma M\Psi_1, \qquad (20)$$

where $\Psi_1 = \Phi - \Phi_2\Phi_0^+\Phi_1$. Without loss of generality we suppose that $\Psi_1^+$ maps $\mathrm{Im}(\Psi_1)$ onto $(\ker\Psi_1)^\perp$ in a one-to-one manner and $\Psi_1^+[(\mathrm{Im}\,\Psi_1)^\perp] = 0$; otherwise, we can transform $\Psi_1^+$ to such a form. Every $u \in L_2^p$ can be decomposed uniquely as $u = u_k + u_\perp$, where $u_k \in \ker\Psi_1$, $u_\perp \in (\ker\Psi_1)^\perp$. Then, using Lemma 4.1 and the property

$$\Psi_1\Psi_1^+\Psi_1 = \Psi_1, \qquad (21)$$

we can show that $\Sigma\Pi = \Sigma\Pi\Psi_1^+\Psi_1$, where $\Pi = A - A_0\Phi_0^+\Phi_1 - Z_0(I - \Phi_0\Phi_0^+)\Phi_1$. The operator $M$ may be rewritten in the form $M = \Pi\Psi_1^+ + \tilde{M}$, where $\tilde{M}$ must be found. From (20) it follows that $\tilde{M}$ satisfies the equation

$$\Sigma\tilde{M}\Psi_1 = 0, \qquad (22)$$

which is equivalent to $\mathrm{Im}\,\Psi_1 \subset \ker(\Sigma\tilde{M})$. Using (21) and (22), we can obtain the following property of $\tilde{M}$ and $\Psi_1$:

$$\Sigma\tilde{M}v = \Sigma\tilde{M}(I - \Psi_1\Psi_1^+)v,$$

where $v \in L_2^m$, $v = v_I + v_\perp$, $v_I \in \mathrm{Im}\,\Psi_1$, $v_\perp \in (\mathrm{Im}\,\Psi_1)^\perp$.

Similarly to the general solution of (18) we obtain

$$M = \Sigma^+\Sigma\{\Pi\Psi_1^+ + Z(I - \Psi_1\Psi_1^+)\} + (I - \Sigma^+\Sigma)N,$$

where $Z, N: L_2^m \to R^n$ are arbitrary operators. Again, criterion (3) does not depend on the choice of $N$; hence we can consider the operator $M$ in the form $M = \Pi\Psi_1^+ + Z(I - \Psi_1\Psi_1^+)$. Then the general solution of (4) is

$$M = (A - A_0\Phi_0^+\Phi_1)\Psi_1^+ - Z_0(I - \Phi_0\Phi_0^+)\Phi_1\Psi_1^+ + Z(I - \Psi_1\Psi_1^+),$$
$$M_0 = A_0\Phi_0^+ - (A - A_0\Phi_0^+\Phi_1)\Psi_1^+\Phi_2\Phi_0^+ + Z_0(I - \Phi_0\Phi_0^+)[I + \Phi_1\Psi_1^+\Phi_2\Phi_0^+] - Z(I - \Psi_1\Psi_1^+)\Phi_2\Phi_0^+. \qquad (23)$$

Suppose that $\hat{Z}_0$ and $\hat{Z}$ are the optimal matrix and operator, respectively, i.e., $\hat{x} = M_0(\hat{Z}_0, \hat{Z})y_0 + M(\hat{Z}_0, \hat{Z})y$ is the BLME. For arbitrary $Z_0 = \hat{Z}_0 + \delta Z_0$ and $Z = \hat{Z} + \delta Z$ we have the corresponding estimate and criterion value

$$\tilde{x} = \hat{x} + \delta Z_0(I - \Phi_0\Phi_0^+)[y_0 - \Phi_1\Psi_1^+(y - \Phi_2\Phi_0^+ y_0)] + \delta Z(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0),$$
$$J(\tilde{x}) = J_1(\hat{Z}_0, \hat{Z}) + J_2(\delta Z_0, \delta Z) - J_3(\hat{Z}_0, \delta Z_0) - J_4(\hat{Z}, \delta Z),$$

where

$$J_1(\hat{Z}_0, \hat{Z}) = E\{\|\Delta\|^2\} = J(\hat{x}),$$
$$J_2(\delta Z_0, \delta Z) = E\{\|\delta Z_0(I - \Phi_0\Phi_0^+)[y_0 - \Phi_1\Psi_1^+(y - \Phi_2\Phi_0^+ y_0)] + \delta Z(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0)\|^2\} \ge 0, \qquad (24)$$
$$J_3(\hat{Z}_0, \delta Z_0) = 2\,\mathrm{tr}\{\mathrm{cov}(\Delta,\, \delta Z_0(I - \Phi_0\Phi_0^+)[y_0 - \Phi_1\Psi_1^+(y - \Phi_2\Phi_0^+ y_0)])\}, \qquad (25)$$
$$J_4(\hat{Z}, \delta Z) = 2\,\mathrm{tr}\{\mathrm{cov}(\Delta,\, \delta Z(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0))\}. \qquad (26)$$

If the conditions

$$J_3(\hat{Z}_0, \delta Z_0) = 0, \qquad J_4(\hat{Z}, \delta Z) = 0 \qquad (27)$$

do not hold for some $\delta Z_0$ and $\delta Z$, then from (24)-(26) it follows that there exists the estimate $x^* = M_0(Z_0^*, Z^*)y_0 + M(Z_0^*, Z^*)y$ with

$$Z_0^* = 0.5\,[(J_3(\hat{Z}_0, \delta Z_0) + J_4(\hat{Z}, \delta Z))/J_2(\delta Z_0, \delta Z)]\,\delta Z_0 + \hat{Z}_0,$$
$$Z^* = 0.5\,[(J_3(\hat{Z}_0, \delta Z_0) + J_4(\hat{Z}, \delta Z))/J_2(\delta Z_0, \delta Z)]\,\delta Z + \hat{Z},$$

and

$$J(x^*) - J(\hat{x}) = -0.25\,(J_3(\hat{Z}_0, \delta Z_0) + J_4(\hat{Z}, \delta Z))^2 / J_2(\delta Z_0, \delta Z) < 0.$$

Hence $J(x^*) < J(\hat{x})$, i.e., $\hat{x}$ is not optimal. So (27) is necessary for the optimality of $\hat{x}$. Now let us suppose that (27) holds; then for all $\tilde{x}$

$$J(\tilde{x}) = J_1(\hat{Z}_0, \hat{Z}) + J_2(\delta Z_0, \delta Z) \ge J(\hat{x}),$$

hence $\hat{x}$ is the BLME. Furthermore, it is easy to verify that (27) is equivalent to (5). Theorem 4.2 is proved.
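The algebra behind (19) is easy to spot-check numerically in finite dimensions. In the sketch below (Python/NumPy, invented toy matrices), $\Phi_0$ has full column rank, so $\Phi_0^+\Phi_0 = I$ and the first condition of (4) holds for every choice of $Z_0$; in the rank-deficient case it is the identifiability condition of Lemma 4.1 that makes the residual $(A_0 - M\Phi_2)(I - \Phi_0^+\Phi_0)$ vanish under $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy dimensions and data, for illustration only.
n, p0, m0, m = 2, 3, 4, 5
A0   = rng.standard_normal((n, p0))
Phi0 = rng.standard_normal((m0, p0))   # full column rank (generic)
Phi2 = rng.standard_normal((m, p0))
M    = rng.standard_normal((n, m))     # stands in for the operator M
Z0   = rng.standard_normal((n, m0))    # arbitrary, per (19)
Sigma = np.eye(n)

Phi0_plus = np.linalg.pinv(Phi0)
# (19): M0 = [A0 - M Phi2] Phi0+ + Z0 (I - Phi0 Phi0+).
M0 = (A0 - M @ Phi2) @ Phi0_plus + Z0 @ (np.eye(m0) - Phi0 @ Phi0_plus)

# First condition of (4): Sigma (A0 - M0 Phi0 - M Phi2) = 0.
print(np.allclose(Sigma @ (A0 - M0 @ Phi0 - M @ Phi2), 0.0))  # True
```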

Appendix D. Proof of Theorem 5.1. First we prove that $\hat{x}_t$ is unbiased. Denote $m_t = E\{x_t\}$, $\hat{m}_t = E\{\hat{x}_t\}$, $\mu_t = m_t - \hat{m}_t = E\{\Delta_t\}$. The estimate $\hat{x}_0$ in (16) is the LSE, hence it is unbiased: $\mu_0 = 0$. Furthermore,

$$dy_t - \psi(t)\hat{x}_t\,dt = \psi(t)\Delta_t\,dt + \phi(t)u_t\,dt + \eta(t)\,d\xi_t + dw_t.$$

Using the first equation of (15) and condition ii) of Theorem 5.1 we can obtain the differential equation for $\hat{m}_t$:

$$d\hat{m}_t = a(t)\hat{m}_t\,dt + b(t)u_t\,dt + [b(t)\phi^+(t) + M(t)(I - \phi(t)\phi^+(t))]\,\psi(t)\mu_t\,dt.$$

Using (12) we can also obtain the differential equation for $m_t$:

$$dm_t = a(t)m_t\,dt + b(t)u_t\,dt.$$

Then

$$d\mu_t = \{a(t) - [b(t)\phi^+(t) + M(t)(I - \phi(t)\phi^+(t))]\,\psi(t)\}\mu_t\,dt. \qquad (28)$$

Equation (28) with $\mu_0 = 0$ has the unique solution $\mu_t \equiv 0$, hence $\hat{x}_t$ is an unbiased estimate of $x_t$.

Equations (12), (13) may be rewritten in the equivalent integral form [1]:

$$x_t = \Phi(t, 0)v + \int_0^t \Phi(t, \tau)b(\tau)u_\tau\,d\tau + \int_0^t \Phi(t, \tau)\,d\xi_\tau,$$
$$y_t = \int_0^t [\psi(\tau)x_\tau + \phi(\tau)u_\tau]\,d\tau + \int_0^t \eta(\tau)\,d\xi_\tau + \int_0^t dw_\tau, \qquad z_0 = \phi_0 v + w_0,$$

where $\Phi(t, \tau)$ is the solution of the differential equation

$$\frac{\partial}{\partial t}\Phi(t, \tau) = a(t)\Phi(t, \tau), \quad t \ge \tau, \qquad \Phi(\tau, \tau) = I.$$

The processes $\{x_\tau\}$, $\{y_\tau\}$ have a.s. continuous paths, so these paths are square integrable on the time interval $[0, t]$. It is necessary also to mention that all operators which transform $v$, $\{u_\tau\}$, $\{\xi_\tau\}$, $\{w_\tau\}$ and $w_0$ are linear and bounded. Let us state the correspondence between the generalized regression (1)-(2) and the system (12)-(13):

$$\Phi_0 u_0 = \phi_0 v, \qquad \Phi_1 u = 0, \qquad (\Phi_2 u_0)(\mu) = \int_0^\mu \psi(\tau)\Phi(\tau, 0)v\,d\tau,$$
$$(\Phi u)(\mu) = \int_0^\mu \psi(\tau)\int_0^\tau \Phi(\tau, \sigma)b(\sigma)u_\sigma\,d\sigma\,d\tau + \int_0^\mu \phi(\tau)u_\tau\,d\tau,$$

$$H_0\xi_0 = 0, \qquad H_1\xi = 0, \qquad (H_2\xi_0)(\mu) = 0,$$
$$(H\xi)(\mu) = \int_0^\mu \psi(\tau)\int_0^\tau \Phi(\tau, \sigma)\,d\xi_\sigma\,d\tau + \int_0^\mu \eta(\tau)\,d\xi_\tau,$$
$$Q_0\theta_0 = w_0, \qquad (Q\theta)(\mu) = \int_0^\mu dw_\tau,$$
$$A_0 u_0 = \Phi(t, 0)v, \qquad Au = \int_0^t \Phi(t, \tau)b(\tau)u_\tau\,d\tau, \qquad B\xi = \int_0^t \Phi(t, \tau)\,d\xi_\tau, \qquad B_0\xi_0 = 0.$$

So the filtering problem (12)-(14) is a special case of the estimation problem (1)-(3), and conditions (5) become

$$\Sigma\,\mathrm{cov}(\Delta_t, (I - \Phi_0\Phi_0^+)y_0) = 0, \qquad \Sigma\,\mathrm{cov}(\Delta_t, [(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0)](\tau)) = 0, \quad \tau \in [0, t]. \qquad (29)$$

Using (15) and (16), we obtain

$$\Delta_t = \Lambda(t, 0)\Delta_0 + \int_0^t \Lambda(t, \tau)\{d\xi_\tau - [b(\tau)\phi^+(\tau) + M(\tau)(I - \phi(\tau)\phi^+(\tau))][\eta(\tau)\,d\xi_\tau + dw_\tau]\},$$
$$\Delta_0 = x_0 - \hat{x}_0 = -k(0)\phi_0^T Q_0^{-1} w_0, \qquad (I - \Phi_0\Phi_0^+)y_0 = (I - \phi_0\phi_0^+)w_0,$$

where $\Lambda(t, \tau)$ is the solution of the following equation:

$$\frac{\partial}{\partial t}\Lambda(t, \tau) = \{a(t) - [b(t)\phi^+(t) + M(t)(I - \phi(t)\phi^+(t))]\,\psi(t)\}\Lambda(t, \tau), \quad t \ge \tau, \qquad \Lambda(\tau, \tau) = I.$$

Then

$$\Sigma\,\mathrm{cov}(\Delta_t, (I - \phi_0\phi_0^+)y_0) = -\Sigma\Lambda(t, 0)k(0)\phi_0^T Q_0^{-1} Q_0 (I - \phi_0\phi_0^+)^T = 0,$$

i.e., the first condition of (29) holds. Furthermore,

$$\Delta_t = \Lambda(t, \tau)\Delta_\tau + \int_\tau^t \Lambda(t, \sigma)\{d\xi_\sigma - [b(\sigma)\phi^+(\sigma) + M(\sigma)(I - \phi(\sigma)\phi^+(\sigma))][\eta(\sigma)\,d\xi_\sigma + dw_\sigma]\},$$

and

$$\rho_\tau = [(I - \Psi_1\Psi_1^+)(y - \Phi_2\Phi_0^+ y_0)](\tau) = \int_0^\tau (I - \phi(\sigma)\phi^+(\sigma))[\psi(\sigma)(x_\sigma - \hat{x}_\sigma)\,d\sigma + \eta(\sigma)\,d\xi_\sigma + dw_\sigma].$$

Note that $\mathrm{cov}(\Delta_t, \Delta_s) = \Lambda(t, s)\,\mathrm{cov}(\Delta_s, \Delta_s)$ for all $t \ge s \ge 0$. Then

$$\mathrm{cov}(\Delta_t, \rho_\tau) = \int_0^\tau \mathrm{cov}(\Delta_t, d\rho_\sigma) = \int_0^\tau \Lambda(t, \sigma)\{k(\sigma)\psi^T(\sigma) + C(\sigma)\eta^T(\sigma) - M(\sigma)P(\sigma)\}\{I - \phi(\sigma)\phi^+(\sigma)\}^T\,d\sigma = 0$$

for all $\tau \in [0, t]$, by (17). The second condition of (29) holds. Theorem 5.1 is proved.

REFERENCES

[1] Liptser, R.S., and Shiryayev, A.N., Statistics of Random Processes, Springer-Verlag, New York, 1977.

[2] Barton, R.J., and Poor, H.V., An RKHS Approach to Robust L² Estimation and Signal Detection, IEEE Transactions on Information Theory, 36, (1990), pp. 485-501.
[3] Albert, A.E., Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, 1972.
[4] Catlin, D.E., Estimation of Random States in General Linear Models, IEEE Transactions on Automatic Control, 36, (1991), pp. 248-252.
[5] Fomin, V.N., Recursive Estimation and Adaptive Filtering, Nauka, Moscow, 1984.
[6] Kurzhanskii, A.B., Control and Observation under Uncertainty, Nauka, Moscow, 1977.
[7] Chernous'ko, F.L., Estimation of the Phase State of Dynamic Systems. Method of Ellipsoids, Nauka, Moscow, 1988.
[8] Rao, A.K., and Huang, Y.-F., Recent Developments in Optimal Bounding Ellipsoidal Parameter Estimation, Mathematics and Computers in Simulation, 32, (1990), pp. 515-526.
[9] Anderson, T.W., Statistical Analysis of Time Series, Wiley, New York, 1971.
[10] Katz, I.Ya., Minimax-Stochastic Problems of Estimation in Multistage Systems, in Estimation under Uncertainty, Ural Scientific Center, Sverdlovsk, 1982, pp. 43-59.
[11] Kalman, R., and Bucy, R.S., New Results in Linear Filtering and Prediction Theory, Transactions of ASME, Journal of Basic Engineering, 82, (1960), pp. 35-45.
[12] Mayne, D.Q., A Solution of the Smoothing Problem for Linear Dynamic Systems, Automatica, 6, (1966), pp. 73-92.
[13] Astrom, K.J., Introduction to Stochastic Control Theory, Academic Press, New York, 1970.
[14] Kleptsina, M.L., and Veretennikov, A.Yu., On Filtering and Properties of Conditional Laws of Ito-Volterra Processes, in Statistics and Control of Stochastic Processes, Steklov Seminar, N.V. Krylov et al., eds., Optimization Software Inc., 1985.
[15] Aubin, J.-P., and Frankowska, H., Observability of Systems Under Uncertainty, SIAM Journal on Control and Optimization, 27, (1989), pp. 949-975.
[16] Borisov, A.V., and Pankov, A.R., Optimal Signal Processing for Uncertain-Stochastic Systems, Proceedings of the 30th Conference on Decision and Control, Brighton, 3, (1991), pp. 3082-3083.