THE UNIVERSITY OF MICHIGAN
COLLEGE OF ENGINEERING

LEAST SQUARES AND LINEAR UNBIASED MINIMUM VARIANCE ESTIMATION
IN EUCLIDEAN SPACE AND HILBERT SPACE

Philip H. Piske
William L. Root

ORA Project 10110

supported by:
UNITED STATES AIR FORCE
AIR FORCE OFFICE OF SCIENTIFIC RESEARCH
AIR FORCE SYSTEMS COMMAND
GRANT NO. 72-2328A
WASHINGTON, D.C.

administered through:
OFFICE OF RESEARCH ADMINISTRATION
ANN ARBOR

January 1974


1. INTRODUCTION

The problem to be discussed is that of estimating the unknown quantity h in the linear model

    y = Xh + n,                                                    (1.1)

where y represents an observed signal, X is a known linear transformation, and n is a noise or error signal. The unknown "vector parameter" h is an element of a separable Hilbert space H_1; y is an element of a separable Hilbert space H_2; X is a bounded linear transformation from H_1 into H_2; and n is either an H_2-valued random variable or an unknown error term in H_2. The spaces H_1 and H_2 may be real or complex Hilbert spaces, and either or both may be finite-dimensional, although clearly only a finite-dimensional projection of h is estimable when the range of X is finite-dimensional.

The following cases are considered: (i) h is an unknown, and n is an unknown error; (ii) h is an unknown, and n is a random variable in H_2 with mean zero and a known covariance operator; (iii) h is unknown but known to belong to a specified subset of H_1, and n is again a random variable as in (ii).

Case (i) is handled, as is conventional when there is no statistical information, by finding a least-squares estimator for h. This estimator is also put in recursive form for a linear dynamical systems model. For case (ii) a linear unbiased minimum-variance (LUMV) estimator is obtained. For the special case of finite-dimensional parameter and observation spaces in a dynamical systems model, this estimator is also put in recursive

form. For case (iii) a modification of the LUMV estimator is obtained which is no longer unbiased but which yields smaller error, under the stated conditions, than the LUMV estimator.

The least-squares solution for case (i) is well known (see, e.g., [B-1]). We are not aware that the recursive form has been given for Hilbert-space-valued parameters, but it follows readily by standard methods. The LUMV estimator for case (ii) is developed in [B-1] and is stated in [R-2] without proof. The recursive form in finite-dimensional spaces is well known (see, e.g., [S-1]). The modified LUMV estimator for case (iii) is given in [R-2] and [R-3]. Thus, most of the material contained herein appears elsewhere, and some is quite standard; however, we know of no place where it has been put together in a reasonably self-contained, unified treatment.

The proof of the Gauss-Markov theorem, that is, the theorem yielding an LUMV estimator, is somewhat complicated in the Hilbert space context, and has been put in an Appendix. The proof given is basically the same as that in [B-1], but it is superficially different because we do not use the theory of pseudo-inverses explicitly.

The recursive solutions obtained for the least-squares and LUMV estimators are of the same form as the Kalman filtering equations, but they are obtained as solutions to a different minimization problem. Kalman filtering requires that h be a random variable, and then the expected value of the squared error in the parameter estimate is minimized.

There are, of course, a great many applications of the theory of estimating parameters in a linear model. However, we have in mind application to system identification, considered from a rather general point of view according to which there is no initial parametrization. This application leads to the model (1.1) and the estimation problems based on (1.1) as follows.

If we momentarily neglect statistical considerations, the systems to be considered can be characterized by the equation y = H(x), where x (the input signal) is an element of a specified set S, y (the resulting output signal) is an element of a specified set Y, and H (the system transformation) is an element of a specified class A of functions from S into Y. Briefly, the basic identification problem is to find an H in A that adequately represents an unknown system from the results of one or a series of experiments involving the introduction of known input signals x and measurement of the corresponding output signals y.

One can regard each x in S as determining a function from A into Y. To emphasize this point of view, we can write

    y = H(x) = X(H),    H in A,

i.e., X is the mapping that assigns output y to system H when x is the input signal and y = H(x). Then, obviously, the problem of identifying H when the input signal is x is essentially the problem of inverting the mapping X. If the output space Y is a linear space this can always be interpreted as a linear inversion problem as follows. Define addition and scalar multiplication in A in the way they are conventionally defined for spaces of functions,

i.e., for H1, H2 in A and all real numbers a define H1 + H2 and aH1 by

    (H1 + H2)(x) = H1(x) + H2(x),    x in S,
    (aH1)(x) = a[H1(x)],             x in S.

Extend A if necessary so that it is closed under linear combinations; then A becomes a linear space. This extension is permissible because it only enlarges the basic class of systems being considered. The notation A will be retained, but it is henceforth assumed that A is a full linear space. Then X is a linear transformation on A, because

    X(aH1 + bH2) = (aH1 + bH2)(x) = aH1(x) + bH2(x)
                 = aX(H1) + bX(H2).

The first and third equalities follow from the definition of X, the second from the definitions of addition and scalar multiplication in A. Hence, with the linear structure prescribed for A, the mapping X from A into Y defined by y = X(H) = H(x) is linear.

This simple observation is important because it says that the input-output identification problem can always be studied by linear analysis. When output observation noise is added, the problem can be treated as a linear regression problem. The observation takes on practical as well as theoretical importance if, when parameters are introduced, as they must be to permit actual computation, they are introduced in such fashion as to determine linearly the transformations

H. Then the actual computation algorithm will be linear, or, in the usual terminology, the problem will be "linear in the parameters." The only condition necessary for this linearity of X is that A be a linear space. The problems applicable to this paper are those in which A and Y are separable Hilbert spaces (denoted, respectively, by H_1 and H_2) and X is also a bounded transformation. The assumption that X is bounded is chiefly for convenience, but it is realistic for most applications.
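The "linear in the parameters" observation above can be sketched numerically. The following is a minimal, purely illustrative example (not from the report): the unknown system H is assumed to be a linear combination of three known basis systems, so the map from coefficient vector to output is linear and identification reduces to a linear inversion.

```python
import numpy as np

# Hypothetical finite-dimensional illustration: the unknown system H is
# assumed to lie in the linear span of three known basis systems (simple
# FIR filters), so y = H(x) is linear in the coefficient vector h and
# identification is a linear inversion problem. All names are illustrative.
rng = np.random.default_rng(0)

def basis_responses(x):
    # Each column is the output of one basis system driven by input x.
    b1 = x                                        # identity system
    b2 = np.convolve(x, [0.5, 0.5])[: len(x)]     # 2-tap smoother
    b3 = np.convolve(x, [1.0, 0.0, -1.0])[: len(x)]  # lag-2 differencer
    return np.column_stack([b1, b2, b3])

h_true = np.array([1.0, -0.3, 0.7])   # unknown system coefficients
x = rng.standard_normal(200)          # known input signal
X = basis_responses(x)                # the linear map h -> H(x)
y = X @ h_true                        # noise-free observed output

# Identifying H amounts to inverting the linear map X.
h_est = np.linalg.lstsq(X, y, rcond=None)[0]
```

With noise-free data and independent basis responses, the coefficients are recovered exactly; with output noise the same computation becomes the linear regression problem treated in the following sections.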

2. NOTATIONS AND CERTAIN ELEMENTARY FACTS ABOUT LINEAR OPERATORS IN HILBERT SPACE

To establish notations, let H_1 and H_2 be separable Hilbert spaces, which may be the same space. If M is a subset of H_1 we denote its closure in H_1 by cl M and its orthogonal complement in H_1 by M^perp. Let X be a linear mapping from (part of) H_1 into H_2. D(X) = {h in H_1 : Xh is defined} is the domain of X; N(X) = {h in D(X) : Xh = 0} is the null space of X; and Ran(X) = {y in H_2 : Xh = y for some h} is the range of X. All are linear sets. We call X an operator from H_1 to H_2; the term bounded operator will imply not only that X is bounded in the usual operator norm, but also that D(X) = H_1, unless an exception is explicitly stated.

Let X be a bounded operator. Its adjoint, X*, is the bounded operator from H_2 to H_1 defined by (Xh, z) = (h, X*z) for all h in H_1, z in H_2. Note that (Xh, z) is an inner product in H_2 whereas (h, X*z) is an inner product in H_1. There is no real ambiguity in using the same notation for inner products (or norms) in both spaces.

The following facts are standard, and for the most part can be proven easily. (However, see [B-1], Appendix A.) Assume X is a bounded operator.

1) N(X) and N(X*) are closed subspaces of H_1 and H_2, respectively.

2) Ran(X) is closed if it is finite-dimensional, which is necessarily true if H_1 or H_2 is finite-dimensional. In general, a necessary and sufficient condition for Ran(X) to be

closed is that X restricted to N(X)^perp, which is a closed subspace of H_1, be bounded from below. That is, there must exist a constant c > 0 such that ||Xh|| >= c||h|| for all h in N(X)^perp. We denote the restriction of X to N(X)^perp by X_r.

3) N(X) = [Ran(X*)]^perp, and N(X*) = [Ran(X)]^perp. If we denote direct sum by (+), then

    H_1 = N(X) (+) cl Ran(X*),    H_2 = N(X*) (+) cl Ran(X).

4) N(X*X) = N(X), and N(XX*) = N(X*).

5) Ran(X*X) is contained in Ran(X*), which is contained in cl Ran(X*X); and Ran(XX*) is contained in Ran(X), which is contained in cl Ran(XX*).

6) If any one of the sets Ran(X), Ran(X*), Ran(X*X), Ran(XX*) is closed, so are all of the others, and then Ran(X*) = Ran(X*X) = N(X)^perp, and Ran(X) = Ran(XX*) = N(X*)^perp.

Consider A = X*X, which is a bounded, self-adjoint, nonnegative operator from H_1 into H_1. Since A = A*, N(A)^perp = cl Ran(A), and we denote the restriction of A to cl Ran(A) by A_r. Since N(A_r) = {0}, A_r is 1:1 from cl Ran(A) onto Ran(A_r) = Ran(A), which is contained in cl Ran(A). Consequently, A_r^{-1} exists as an operator (not in general bounded) from Ran(A) to cl Ran(A), with D(A_r^{-1}) = Ran(A). It is thus possible to write (X*X)^{-1} g for all g in Ran(X*X), where (X*X)^{-1} means A_r^{-1} (the subscript is dropped to simplify the notation).

If X has closed range, then it follows from (6) that A_r is 1:1 from cl Ran(A) = N(X)^perp onto Ran(X*) = Ran(A) = cl Ran(A), and A_r^{-1} is then a bounded operator on cl Ran(A).

These observations, not all of which are usually stated explicitly in standard works on Hilbert space (see, e.g., [R-1]), coupled with some basic results that are given in almost any reference, should suffice for the material in the body of the report. If one wishes to restrict consideration to finite-dimensional spaces only, the formal calculations in what follows can be read as just matrix calculations. In Appendix A it is necessary to deal with unbounded closed operators, and considerations similar to those above, but somewhat more intricate, are necessary.
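The role of the restricted inverse A_r^{-1} can be seen concretely in finite dimensions, where it coincides with the Moore-Penrose pseudo-inverse restricted to the range of A. A small sketch (NumPy, data invented for illustration):

```python
import numpy as np

# Finite-dimensional sketch of the facts above: for a rank-deficient X,
# A = X^T X is singular, but its restriction A_r to cl Ran(A) = N(X)^perp
# is invertible. The pseudo-inverse pinv(A) inverts A on Ran(A) and is
# zero on N(A), i.e., it realizes A_r^{-1} (extended by zero).
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4)) @ np.diag([1.0, 1.0, 1.0, 0.0])  # rank 3
y = rng.standard_normal(6)

A = X.T @ X                       # singular: N(A) = N(X) is nontrivial
h = np.linalg.pinv(A) @ X.T @ y   # (X*X)^{-1} X* y in the restricted sense
```

Because pinv(A) vanishes on N(A), the vector h lies in N(X)^perp; it agrees with the minimum-norm least-squares solution `np.linalg.lstsq(X, y)` via the standard identity pinv(X) = pinv(X^T X) X^T.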

3. LEAST-SQUARES ESTIMATION

In the model (1.1) we take n simply to be an unknown error; n is not assumed to have any statistical properties. It is desired to find a best estimate h^ of h according to the criterion that the corresponding error n^ = y - Xh^ be of minimum norm. This is the classical criterion of least-squares estimation, and is usually stated: find h^ such that

    ||Xh^ - y|| <= ||Xh - y||    for all h in H_1.

To see what h^ should be we first do a formal calculation that consists essentially of completing a square. Precise statements and proofs are given subsequently, but it may be noted that the following calculation already yields a proof for the finite-dimensional case. The expansion of ||Xh - y||^2 is

    ||Xh - y||^2 = (Xh - y, Xh - y) = (X*Xh, h) - 2 Re(h, X*y) + ||y||^2.   (3.1)

Let A = X*X. Since A is a nonnegative self-adjoint operator, it has a nonnegative self-adjoint square root A^{1/2} ([R-1], p. 265). (A_r^{1/2})^{-1} exists as a (perhaps unbounded) operator with domain Ran(A^{1/2}); denote (A_r^{1/2})^{-1} by A^{-1/2}. We note that A^{-1/2} is also a nonnegative self-adjoint operator. Suppose that A^{-1/2} X*y is defined, and put g = A^{-1/2} X*y. Then

    ||A^{1/2}h - g||^2 = (A^{1/2}h, A^{1/2}h) - 2 Re(A^{1/2}h, g) + ||g||^2
                       = (Ah, h) - 2 Re(h, X*y) + ||g||^2                  (3.2)

since (A^{1/2}h, g) = (h, A^{1/2}g) = (h, A^{1/2} A^{-1/2} X*y) = (h, X*y). But (3.1) can be rewritten

    ||Xh - y||^2 = (Ah, h) - 2 Re(h, X*y) + ||g||^2 + (||y||^2 - ||g||^2)
                 = ||A^{1/2}h - g||^2 + (||y||^2 - ||g||^2).               (3.3)

Obviously (3.3) is minimized by h = h^ such that A^{1/2} h^ = g, if such an h^ exists. That is, the minimum is given formally by

    h^ = A^{-1/2} g = A^{-1/2} A^{-1/2} X*y = (X*X)^{-1} X*y.              (3.4)

The expression on the right side of (3.4) is a classical formula for the least-squares estimate in finite-dimensional spaces. We now develop rigorously the rather simple basic facts about least-squares estimates.

Proposition 3.1: Put y = y1 + y2, where y1 is in cl Ran(X) and y2 is in [Ran(X)]^perp. Then a necessary and sufficient condition for there to exist an h^ such that ||Xh^ - y|| is a minimum is that y1 be in Ran(X). If y1 is in Ran(X), then any h^ such that Xh^ = y1 provides a minimum.

Proof: Since (y1, y2) = 0,

    ||Xh - y||^2 = ||Xh - y1 - y2||^2 = ||Xh - y1||^2 + ||y2||^2.          (3.5)

If y1 is in Ran(X), min ||Xh - y|| exists and is equal to ||y2||. Clearly any h^ such that Xh^ = y1 provides a minimum. Conversely, if min ||Xh - y|| exists it must be equal to ||y2||, which implies that

||Xh^ - y1|| = 0 for the minimizing h^, i.e., y1 is in Ran(X). ||

If X has closed range, i.e., Ran(X) = cl Ran(X), min ||Xh - y|| always exists. As mentioned in the previous Section, a necessary and sufficient condition that X have closed range is that X restricted to N(X)^perp be bounded from below. If the range of X is finite-dimensional it is always closed, and hence a minimum always exists.

Finally, it is worth noting that since the estimate is based on an observation in the output space, if X is not 1:1, the minimizing h^ is not unique. In this case a further criterion must be introduced to fix h^; usually h^ is taken to be itself of minimum norm, which amounts to requiring that h^ be in N(X)^perp.

Proposition 3.2: If y1 is in Ran(X), then (X*X)^{-1} X*y is defined and provides a minimum.

Proof:

    (X*X)^{-1} X*y = (X*X)^{-1} X*(y1 + y2) = (X*X)^{-1} X*y1.

Since y1 is in Ran(X), there exists h in N(X)^perp such that Xh = y1. Hence

    (X*X)^{-1} X*y = (X*X)^{-1} X*(Xh) = h,

which implies that (X*X)^{-1} X*y is defined. From Proposition 3.1 and the fact that

    X[(X*X)^{-1} X*y] = Xh = y1,

we have that (X*X)^{-1} X*y provides a minimum. ||

Proposition 3.3: If (X*X)^{-1} X*y is defined, then a minimum exists (i.e., y1 is in Ran(X)) and is provided by (X*X)^{-1} X*y.

Proof: Since (X*X)^{-1} X*y is defined we can write

    (X*X)^{-1} X*y = (X*X)^{-1} X*(y1 + y2) = (X*X)^{-1} X*y1.

Hence X*y1 is in D[(X*X)^{-1}], so in fact X*y1 is in Ran(X*X). Thus there exists h in H_1 (in fact, we can choose h in [N(X*X)]^perp) such that

    X*Xh = X*y1.

Then X*(Xh - y1) = 0, so Xh - y1 is in N(X*) = [Ran(X)]^perp; but Xh - y1 is also in cl Ran(X), so it follows that Xh = y1 (i.e., y1 is in Ran(X)). Since y1 is in Ran(X), we know that a minimum exists from Proposition 3.1. Proposition 3.2 yields the fact that (X*X)^{-1} X*y provides a minimum. ||

It is perhaps worth noting that if y1 is in Ran(X), then the steps in the heuristic derivation of the least-squares estimation formula are justified. Recall that this amounts to showing that (X*X)^{-1/2} X*y and (X*X)^{-1/2} (X*X)^{-1/2} X*y are defined, where (X*X)^{-1/2} means the inverse of the restriction of (X*X)^{1/2}. From Proposition 3.2 we have that (X*X)^{-1} X*y is defined, so X*y is in Ran(X*X) and we can write

    z = X*y = X*Xh1,    h1 in [N(X*X)]^perp

(note that h1 is in N[(X*X)^{1/2}]^perp, since N[(X*X)^{1/2}] = N(X*X)). Then

    (X*X)^{1/2} h1 = (X*X)^{-1/2} (X*X) h1 = (X*X)^{-1/2} z = (X*X)^{-1/2} X*y,

so (X*X)^{-1/2} X*y is defined. Also

    h1 = [(X*X)^{1/2}]^{-1} (X*X)^{1/2} h1 = (X*X)^{-1/2} (X*X)^{-1/2} X*y

is defined. The last expression can be shown to be just (X*X)^{-1} X*y, of course.

We comment briefly on the situation when no minimum exists (i.e., y1 not in Ran(X)). First, one can always find elements h that provide an approximation to a minimum. In fact, given e > 0, one can find y' in Ran(X) such that ||y1 - y'|| < e. Write y' = Xh; then

    ||Xh - y||^2 = ||Xh - y1||^2 + ||y2||^2 < e^2 + ||y2||^2.

However, this fact is of limited usefulness because of the result of the next Proposition.

Proposition 3.4: If y1 is not in Ran(X), then any sequence {h_n} such that lim Xh_n = y1 as n -> infinity is unbounded.

Proof: Suppose that there exists a constant C such that ||h_n|| <= C for all n. Since every bounded point set in a Hilbert space is weakly compact [A-1, p. 46], there is a subsequence of {h_n}, which we denote by {h_{n_i}}, such that (h_{n_i}, f) -> (h, f) for all f in H_1. Pick an arbitrary z in H_2 (without loss of generality we can choose z such that ||z|| = 1) and consider

    (Xh - y1, z) = ((Xh - Xh_{n_i}) + (Xh_{n_i} - y1), z)
                 = (X(h - h_{n_i}), z) + (Xh_{n_i} - y1, z)
                 = (h - h_{n_i}, X*z) + (Xh_{n_i} - y1, z).

Since ||z|| = 1,

    |(Xh - y1, z)| <= |(h - h_{n_i}, X*z)| + ||Xh_{n_i} - y1||.

Given any e > 0, there exists n0 such that for n_i > n0, ||Xh_{n_i} - y1|| < e/2 and |(h - h_{n_i}, X*z)| < e/2. Hence |(Xh - y1, z)| < e; but e is arbitrary, so (Xh - y1, z) = 0. Since this relation holds for all z in H_2, we must have Xh = y1. But y1 is not in Ran(X), so a contradiction results, and thus {h_n} must be unbounded. ||

In the finite-dimensional case, where H_1 and H_2 are Euclidean spaces, Ran(X*X) = Ran(X*), X*X is a 1:1 mapping from Ran(X*) onto Ran(X*), and (X*X)^{-1} is well defined as an operator from Ran(X*) to Ran(X*). Also, of course, Ran(X*) = Ran(A). The formula

    h^ = (X*X)^{-1} X*y                                                (3.6)

which gives the least-squares estimate of h is always defined. In what might be regarded as the conventional situation, at least from the statistical point of view, N(X) = 0, and Ran(X) = [N(X*)]^perp is a proper subspace of H_2, usually of much lower dimension. In other words, there is redundancy in the observations, and the least-squares formula uses this redundancy to minimize the errors. At the other extreme, the dimension of H_1 exceeds the dimension of H_2, and Ran(X) = H_2. There is then not only no redundancy in the observations, but part of h cannot be estimated, because N(X) is necessarily non-zero. In this case h minus the projection of h on N(X)^perp is treated as error, and its norm is set to zero by the estimator. The formula for the least-squares estimator thus gives a pseudo-inverse. In general, neither N(X) nor [Ran(X)]^perp is zero, so part of h cannot be estimated, while on the other hand there is redundancy

available for reducing the error in estimating the projection of h on N(X)^perp.

In the infinite-dimensional case the situations regarding redundancy in the observations and a non-estimable part of h are essentially the same. However, Ran(X) need not be closed, and in case it is not, least-squares estimation does not make much sense practically, because the method applies only for some observations (those that project into the range of X). If Ran(X) is not closed, it is always possible to introduce a new Hilbert space H_2' which contains the elements of Ran(X) and in which Ran(X) is closed. In many instances H_2' will give a satisfactory model for least-squares estimation, but we do not discuss this matter here (see [B-1, Section 3]).

For the remainder of this Section we consider the problem of obtaining a recursive solution for the least-squares estimation problem in the case where H_1 and H_2 are Hilbert spaces. The procedure followed for converting the basic solution into a recursive one is standard for the finite-dimensional problem, and, as will be seen, there is no difficulty in extending it to Hilbert spaces. We consider the system

    y_1 = B_1 h_1 + n_1
    y_2 = B_2 h_2 + n_2 = B_2 F_{2,1} h_1 + n_2
    y_3 = B_3 h_3 + n_3 = B_3 F_{3,2} F_{2,1} h_1 + n_3 = B_3 F_{3,1} h_1 + n_3
    . . .
    y_n = B_n h_n + n_n = B_n F_{n,n-1} ... F_{2,1} h_1 + n_n = B_n F_{n,1} h_1 + n_n

which equivalently can be written in the state variable form

    h_{i+1} = F_{i+1,i} h_i,
    y_i = B_i h_i + n_i,    i = 1, ..., n,                             (3.7)

with h defined to be h_1. We assume that the B_i and the F_{i+1,i}, i = 1, ..., n, are all bounded operators with zero null space and closed range, and in fact that the F_{i+1,i} are 1:1 from H_1 onto H_1. We can then define X_1 = B_1, X_2 = B_2 F_{2,1}, ..., X_n = B_n F_{n,1}, so that

    y_1 = X_1 h + n_1
    y_2 = X_2 h + n_2
    . . .                                                              (3.8)
    y_n = X_n h + n_n

where the X_i are bounded operators from H_1 to H_2 with N(X_i) = 0 and Ran(X_i) closed in H_2. The n_i are taken to be unknown errors lying in H_2. The system of equations in (3.8) can also be written in the form

    y_(n) = X_(n) h + n_(n)                                            (3.9)

where

    y_(n) = (y_1, ..., y_n),    n_(n) = (n_1, ..., n_n),    X_(n) h = (X_1 h, ..., X_n h),

i.e., y_(n) and n_(n) are elements of H_2^n, the n-fold direct sum of copies of H_2, and X_(n) is a bounded operator from H_1 into H_2^n. The "vector-matrix" notation is self-evident.

From Proposition 3.3 it follows that the least-squares estimator of h is given by

    h^_n = (X_(n)* X_(n))^{-1} X_(n)* y_(n)                            (3.10)

if the expression on the right is defined. Put P_n = (X_(n)* X_(n))^{-1} and b_n = X_(n)* y_(n). Note that since Ran(X_i) is closed, so is Ran(X_i* X_i). Thus

    Ran(X_i* X_i) = Ran(X_i*) = [N(X_i)]^perp = H_1,
    N(X_i* X_i) = N(X_i) = 0.

So X_i* X_i is a 1:1 map with Ran(X_i* X_i) = H_1.

Consider Ran(P_n^{-1}) = Ran(X_(n)* X_(n)). Suppose z is orthogonal to Ran(X_(n)* X_(n)). Then

    ((X_(n)* X_(n)) y, z) = 0    for every y in H_1.

But

    ((X_(n)* X_(n)) y, z) = ((X_1* X_1) y, z) + ... + ((X_n* X_n) y, z)
                          = (X_1 y, X_1 z) + ... + (X_n y, X_n z).

Since the above is true for all y in H_1, it must be true for y = z. Hence we have

    ((X_(n)* X_(n)) z, z) = ||X_1 z||^2 + ... + ||X_n z||^2 = 0.

This implies that X_1 z = ... = X_n z = 0, which in turn implies that z = 0, since N(X_i) = 0, i = 1, ..., n. Hence cl Ran(X_(n)* X_(n)) = H_1. Moreover, since X_1 has zero null space and closed range, X_1 is bounded from below, so X_(n)* X_(n) >= X_1* X_1 >= cI for some c > 0; hence Ran(X_(n)* X_(n)) is closed, and therefore equals H_1. Since X_(n)* X_(n) = X_1* X_1 + ... + X_n* X_n is self-adjoint, N(X_(n)* X_(n)) = [cl Ran(X_(n)* X_(n))]^perp = 0.

We have just established that D(P_n) = Ran(X_(n)* X_(n)) = H_1. The operator (X_(n)* X_(n))^{-1} is self-adjoint, hence it is also closed. However, a closed operator defined everywhere is bounded. So P_n = (X_(n)* X_(n))^{-1} is bounded and self-adjoint. Of course, P_n^{-1} = X_(n)* X_(n) is also bounded and self-adjoint.

We can now write

    P_n^{-1} = X_(n)* X_(n) = X_1* X_1 + ... + X_n* X_n
             = X_(n-1)* X_(n-1) + X_n* X_n
             = P_{n-1}^{-1} + X_n* X_n.                                (3.11)

Also,

    b_n = X_(n)* y_(n) = X_1* y_1 + X_2* y_2 + ... + X_n* y_n
        = b_{n-1} + X_n* y_n.                                          (3.12)

From equation (3.11),

    P_{n-1} = P_n P_n^{-1} P_{n-1} = P_n [P_{n-1}^{-1} + X_n* X_n] P_{n-1}.

Hence,

    P_{n-1} = P_n + P_n X_n* X_n P_{n-1}                               (3.13)

and

    P_{n-1} X_n* = P_n X_n* + P_n X_n* X_n P_{n-1} X_n*
                 = P_n X_n* [I + X_n P_{n-1} X_n*].

We now note that [I + X_n P_{n-1} X_n*]^{-1} exists, since I + X_n P_{n-1} X_n* = I + (X_n P_{n-1}^{1/2})(X_n P_{n-1}^{1/2})*, which is known to be invertible [R-1, p. 307]. Furthermore, the norm of this inverse is bounded by one. Therefore, we can now write

    P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} X_n P_{n-1} = P_n X_n* X_n P_{n-1}.   (3.14)

Using equations (3.13) and (3.14) we obtain

    P_n = P_{n-1} - P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} X_n P_{n-1}.          (3.15)

Using equations (3.12) and (3.15) in the expression for h^_n yields

    h^_n = {P_{n-1} - P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} X_n P_{n-1}} {b_{n-1} + X_n* y_n}
         = P_{n-1} b_{n-1}
           - P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} X_n P_{n-1} b_{n-1}
           + P_{n-1} X_n* y_n
           - P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} X_n P_{n-1} X_n* y_n.        (3.16)

We now observe that if we have an operator A such that (I + A)^{-1} exists and is bounded, then

    I - (I + A)^{-1} A = (I + A)^{-1} [I + A - A] = (I + A)^{-1}.

Thus we can now write

    h^_n = h^_{n-1} - P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1} [X_n h^_{n-1} - y_n],

or

    h^_n = h^_{n-1} + K_n [y_n - X_n h^_{n-1}]                                     (3.17)

where

    K_n = P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1}                                 (3.18)

and

    P_n = P_{n-1} - K_n X_n P_{n-1}.                                               (3.19)

Hence, to obtain the recursive least-squares estimator, we start with

    h^_1 = (X_1* X_1)^{-1} X_1* y_1,    P_1 = (X_1* X_1)^{-1},

and apply the recursive relations for n = 2, 3, ....
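As a numerical sanity check on the recursion (3.17)-(3.19), the following sketch (NumPy, all data invented for illustration) processes observation blocks one at a time, under the section's assumptions that each X_i has zero null space, and compares the result with the batch least-squares solution.

```python
import numpy as np

# Numerical check of the recursion (3.17)-(3.19): process blocks
# (X_i, y_i) sequentially and compare with the batch least-squares
# solution over all observations. Sizes and data are illustrative.
rng = np.random.default_rng(2)
m, k, nblocks = 3, 4, 5                 # k >= m so each X_i is 1:1
h_true = rng.standard_normal(m)
Xs = [rng.standard_normal((k, m)) for _ in range(nblocks)]
ys = [Xi @ h_true + 0.1 * rng.standard_normal(k) for Xi in Xs]

# Start: h_1 = (X_1* X_1)^{-1} X_1* y_1,  P_1 = (X_1* X_1)^{-1}.
P = np.linalg.inv(Xs[0].T @ Xs[0])
h = P @ Xs[0].T @ ys[0]
for Xn, yn in zip(Xs[1:], ys[1:]):
    # K_n = P_{n-1} X_n* [I + X_n P_{n-1} X_n*]^{-1}            (3.18)
    K = P @ Xn.T @ np.linalg.inv(np.eye(k) + Xn @ P @ Xn.T)
    h = h + K @ (yn - Xn @ h)                                 # (3.17)
    P = P - K @ Xn @ P                                        # (3.19)

# Batch solution over all n blocks for comparison.
Xfull, yfull = np.vstack(Xs), np.concatenate(ys)
h_batch = np.linalg.lstsq(Xfull, yfull, rcond=None)[0]
```

The recursive and batch estimates agree to machine precision, reflecting that (3.17)-(3.19) are an algebraic rearrangement of (3.10), not an approximation.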

4. LUMV ESTIMATION

Consider again the linear model

    y = Xh + n                                                         (4.1)

where, as before, h belongs to a separable Hilbert space H_1 (which may be finite-dimensional) and is the vector of parameters to be estimated, and X is a bounded linear transformation from H_1 to a separable Hilbert space H_2 (which may also be finite-dimensional), but where n is now a random variable taking values in H_2. Again y is a vector of observations in H_2, but now, for each h, it is a random variable taking values in H_2. We assume that n has mean zero, that its covariance operator R exists and is strictly positive definite, and that R^{1/2} is Hilbert-Schmidt.* The covariance operator R is the unique bounded, self-adjoint operator, when it exists, satisfying E(f, n)(n, g) = (Rf, g). If R^{1/2} is Hilbert-Schmidt, then R has finite trace. If H_2 is finite-dimensional all these conditions are satisfied automatically, of course, except the one that R (which is just the usual covariance matrix) be strictly positive definite. (For further discussion of covariance operators for Hilbert-space-valued random variables, see [B-1, Appendix B].)

We use the notation

    h = h_1 + h_2,    h_1 in N(X)^perp,    h_2 in N(X),

----------
* We need this condition for our proof. In the finite-dimensional case there are proofs that do not require it; see [A-2].

so that y = Xh_1 + n. The problem is to find an unbiased linear estimator for h_1 that is of minimum "variance," if such exists. That is, find (if possible) a bounded linear operator C from H_2 to N(X)^perp, defined on all of H_2, such that the estimate h^_1 = Cy has the properties

    (1) E h^_1 = h_1;
    (2) E ||h^_1 - E h^_1||^2 = E ||h^_1 - h_1||^2 exists and is a minimum.

We ask only for an estimator of h_1 because clearly the observation y tells us nothing about h_2. Before proceeding to this problem we discuss first the possibility of unbiased finite-variance estimators.

The unbiasedness condition (1) can be rewritten

    E h^_1 = E(CXh_1 + Cn) = CXh_1 = h_1,                              (4.2)

or CX_r h_1 = h_1, where X_r is the restriction of X to N(X)^perp. Consequently, the restriction of C to Ran(X) must be X_r^{-1}. Now we note that

    ||C|| = sup_{||y||=1} ||Cy|| >= sup_{||y||=1, y in Ran(X)} ||Cy||
          = sup_{||y||=1, y in Ran(X)} ||X_r^{-1} y|| = ||X_r^{-1}||.

Thus C cannot be bounded unless X_r^{-1} is bounded, or, equivalently, X has closed range. Hence we must have the condition that X has closed range.

The mean-squared error, or "variance," is given by

    E ||h^_1 - h_1||^2 = E ||CXh_1 + Cn - h_1||^2 = E ||Cn||^2          (4.3)

from the unbiasedness condition. Let {phi_i} be any c.o.n.s. for H_1. Then

    ||Cn||^2 = sum_i |(Cn, phi_i)|^2 = sum_i (C* phi_i, n)(n, C* phi_i).   (4.4)

Formally we have

    E ||Cn||^2 = sum_i E (C* phi_i, n)(n, C* phi_i) = sum_i (R C* phi_i, C* phi_i)
               = sum_i (C R C* phi_i, phi_i) = Trace (C R C*).             (4.5)

In fact, if C is such that Trace (C R C*) exists, the interchange of expectation and summation is justified (by the Beppo Levi theorem), and equation (4.5) holds. Now if R^{1/2} is Hilbert-Schmidt, then C R^{1/2} and R^{1/2} C* are Hilbert-Schmidt, since C is bounded, and C R^{1/2} R^{1/2} C* = C R C* has finite trace [G-1]. Thus if we assume that R^{1/2} is Hilbert-Schmidt, C can be any bounded operator that satisfies the unbiasedness condition, and a finite-variance estimator results.

In this Section we shall derive the LUMV estimator only for the classical case where H_1 and H_2 are finite-dimensional, with the statement and proof for the Hilbert-space case relegated to the Appendix because of some fussy details required. The structure of the proof remains the same in the Hilbert space case. However, there are two preparatory lemmas, which we now state and prove; because they will be used in the Appendix, and because they are of some interest in themselves, they are given in Hilbert space form. In the following two propositions A need be only a closed, densely defined

linear operator, but if one does not need the results for so general an operator, A can be taken as bounded and everywhere defined.

Proposition 4.1: Let K_1 and K_2 be Hilbert spaces and let A be a bounded (or merely closed and densely defined) linear operator from K_1 to K_2. If z is in K_1 and b is in K_2, then a necessary and sufficient condition that a minimum exist for the problem

    min ||z||^2    subject to Az = b

is that b be in Ran(A). If a minimum exists it is unique and belongs to N(A)^perp. Also, if z is in N(A)^perp and Az = b, then z is a minimum.

Proof: Clearly, if b is not in Ran(A) there exists no z in K_1 satisfying Az = b. Suppose b is in Ran(A) and let z be any solution of Az = b. Put

    z = z_1 + z_2,    z_1 in N(A)^perp,    z_2 in N(A).

If z' is any other solution, put

    z' = z_1' + z_2',    z_1' in N(A)^perp,    z_2' in N(A).

Then Az - Az' = A(z - z') = 0, which implies that (z - z') is in N(A). Since

    z - z' = (z_1 - z_1') + (z_2 - z_2')

with (z_1 - z_1') in N(A)^perp, we must have z_1 = z_1'. That is, z_1 is uniquely determined. Since Az_1 = b and ||z_1|| <= ||z||, z_1 gives the desired minimum. Now, suppose z is in N(A)^perp and Az = b. Then z must be the same as the z_1 just determined. ||
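Proposition 4.1 can be seen concretely in finite dimensions, where the minimum-norm solution of an underdetermined system is exactly the solution component in N(A)^perp. A small sketch (NumPy, data invented for illustration):

```python
import numpy as np

# Finite-dimensional illustration of Proposition 4.1: the minimum-norm
# solution of Az = b is unique and lies in N(A)^perp. Sizes illustrative.
rng = np.random.default_rng(3)
A = rng.standard_normal((2, 4))     # wide matrix: N(A) is 2-dimensional
b = A @ rng.standard_normal(4)      # guarantees b is in Ran(A)

z1 = np.linalg.lstsq(A, b, rcond=None)[0]   # lstsq returns the min-norm solution

# z1 solves the system and is orthogonal to the null space of A.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[2:]                 # rows spanning N(A)

# Any other solution z = z1 + (a component in N(A)) is strictly longer,
# since ||z||^2 = ||z1||^2 + ||z2||^2.
z_other = z1 + null_basis[0]
```

Here `np.linalg.lstsq` computes the minimum-norm solution via the SVD; the decomposition z = z_1 + z_2 of the proof corresponds to splitting z along the row space and null space of A.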

Proposition 4.2: If A is as defined in Proposition 4.1, then a sufficient condition that the problem

    min ||z||^2    subject to Az = b

have a solution is that b be in Ran(AA*). The minimum is then given by

    z^ = A* (AA*)^{-1} b.

Proof: Recall that by (AA*)^{-1} we really mean (AA*)_r^{-1}, where (AA*)_r is the restriction of AA* to cl Ran(AA*). Since Ran(AA*) is contained in Ran(A), the condition is sufficient by Proposition 4.1. Since (AA*)^{-1} b lies in D(AA*), which is contained in D(A*), the expression for z^ is meaningful. Note that

    A z^ = A[A* (AA*)^{-1} b] = b.

From Proposition 4.1, z^ provides a minimum if and only if z^ is in N(A)^perp = cl Ran(A*). But Ran(A*) is contained in cl Ran(A*), so z^ must provide a minimum. ||

For the remainder of this Section H_1 and H_2 are to be finite-dimensional Euclidean spaces. For now let H_1 and H_2 be N- and M-dimensional, respectively, so that y and n are random M-vectors and h is an N-vector. Let {phi_i}, i = 1, ..., N, be a c.o.n.s. for H_1 with the special property that {phi_1, ..., phi_p} spans N(X)^perp, where p (p <= M, N) is the rank of X. Then

    E ||Cn||^2 = sum_{i=1}^{N} (R C* phi_i, C* phi_i).

Put c_i = C* phi_i, so that

    E ||Cn||^2 = sum_{i=1}^{N} (R c_i, c_i) = sum_{i=1}^{N} ||R^{1/2} c_i||^2.   (4.6)

Now the condition (4.2), that CXh_1 = h_1, implies that CX(h_1 + h_2) = h_1, so CX is the orthogonal projection on N(X)^perp. But then CX = (CX)* = X*C*, so that

    X* C* phi_i = CX phi_i = phi_i,    i = 1, ..., p,
                = 0,                   i = p+1, ..., N,

or,

    X* c_i = phi_i,    i = 1, ..., p,
           = 0,        i = p+1, ..., N.                                          (4.7)

The problem of finding the LUMV estimator C is thus equivalent to finding c_i, i = 1, ..., N, to solve the problem:

    minimize sum_{i=1}^{N} ||R^{1/2} c_i||^2
    subject to X* c_i = phi_i, i = 1, ..., p,
               X* c_i = 0,     i = p+1, ..., N.

We can individually minimize each term ||R^{1/2} c_i||^2 subject to X* c_i = phi_i, i = 1, ..., p, as we shall show immediately, and we can certainly minimize each term ||R^{1/2} c_i||^2, i = p+1, ..., N, by setting c_i = 0. The c_i that yield these minima will then determine the estimator C.

To minimize ||R^{1/2} c_i||^2 subject to X* c_i = phi_i, we first put z_i = R^{1/2} c_i, so that c_i = R^{-1/2} z_i (which is defined since R is strictly positive definite). Then the problem is to minimize ||z_i||^2 subject to X* R^{-1/2} z_i = phi_i. In the finite-dimensional case which we are

considering, Ran(X* R^{-1/2}) = Ran[(X* R^{-1/2})(X* R^{-1/2})*] = Ran(X* R^{-1} X), so Proposition 4.2 applies with A = X* R^{-1/2} and AA* = X* R^{-1} X:

    z^_i = (X* R^{-1/2})* (X* R^{-1} X)^{-1} phi_i
         = R^{-1/2} X (X* R^{-1} X)^{-1} phi_i,    i = 1, ..., p.       (4.8)

Thus, the minimizing c_i are

    c^_i = R^{-1/2} z^_i = R^{-1} X (X* R^{-1} X)^{-1} phi_i,    i = 1, ..., p,
         = 0,                                                    i = p+1, ..., N.   (4.9)

Since c^_i = C^* phi_i,

    C^* = R^{-1} X (X* R^{-1} X)^{-1} P*,

where P is the orthogonal projection on N(X)^perp = Ran(X*) = Ran(X* R^{-1} X). Then

    C^ = P (X* R^{-1} X)^{-1} X* R^{-1} = (X* R^{-1} X)^{-1} X* R^{-1}.   (4.10)

This development is summarized in the following proposition.

Proposition 4.3: If H_1 and H_2 are finite-dimensional and R is strictly positive definite, there always exists a unique LUMV estimate for h_1, given by

    h^_1 = (X* R^{-1} X)^{-1} X* R^{-1} y.                              (4.11)

The error variance is

    E ||h^_1 - h_1||^2 = Trace[(X* R^{-1} X)^{-1}].                     (4.12)

Proof: That h^_1 as given by (4.11) is LUMV has already been proved. The minimizing elements z^_i given by (4.8) are unique by Proposition 4.1, and the uniqueness of C^ and hence h^_1 follows obviously. Finally,

from (4.5),

    E ||h^_1 - h_1||^2 = E ||C^ n||^2 = Trace (C^ R C^*)
                       = Trace[(X* R^{-1} X)^{-1} X* R^{-1} R R^{-1} X (X* R^{-1} X)^{-1}]
                       = Trace[(X* R^{-1} X)^{-1}],

where by (X* R^{-1} X)^{-1} we mean, as always, the inverse of the restriction to Ran(X* R^{-1} X). ||

Corollary 1: The proposition still holds if only H_2 is finite-dimensional.

Proof: In this case N(X)^perp is necessarily finite-dimensional, and the proof is unchanged except that there are now an infinite number of c^_i to be set equal to zero, corresponding to those phi_i that lie in N(X). ||

The case where only H_1 is finite-dimensional is not so simple, since R^{-1} becomes an unbounded operator. For this case it is necessary to go to the more general theorem of the Appendix.

Corollary 2: Not only is

    Trace (C^ R C^*) <= Trace (C R C*)                                  (4.13)

where C is any unbiased estimator, but also

    C^ R C^* <= C R C*                                                  (4.14)

in the usual ordering of nonnegative self-adjoint operators.

Proof: The inequality (4.13) is just one way of saying that C^ is of minimum variance. The inequality (4.14) says that

    (C^ R C^* f, f) <= (C R C* f, f)

for any f in H_1. But we actually proved that (R c^_i, c^_i) = (C^ R C^* phi_i, phi_i) was a minimum for each phi_i amongst the class of unbiased estimators C. Since {phi_i} is a c.o.n.s., this establishes (4.14). ||

Corollary 2 says that we actually proved a little more than we set out to show. It has the following useful implication.

Corollary 3: Let B be a linear transformation from H_1 into H_3. The unbiased linear estimate h' of h_1 that minimizes E ||Bh' - Bh_1||^2 is given by h' = h^_1, the LUMV estimate. In particular, h^_1 minimizes E ||Xh' - Xh_1||^2 over the class of linear unbiased estimates.

Proof: Let h' be an unbiased linear estimate, h' = Cy. Then CXh_1 = h_1 and

    E ||Bh_1 - Bh'||^2 = E ||Bh_1 - BCXh_1 - BCn||^2 = E ||BCn||^2
                       = Trace (B C R C* B*).                           (4.15)

Now, for any f in H_3, by Corollary 2,

    (B C R C* B* f, f) = (C R C* B* f, B* f)
                       >= (C^ R C^* B* f, B* f)
                       = (B C^ R C^* B* f, f).

Hence B C R C* B* >= B C^ R C^* B*, which implies that Trace (B C R C* B*) >= Trace (B C^ R C^* B*). ||

Remark: The formula (4.11) for h^_1 involves (X* R^{-1} X)^{-1}, which is meaningful as the inverse of the restriction of a transformation. However, it cannot be directly interpreted as a matrix inverse unless N(X* R^{-1} X) = 0; and N(X* R^{-1} X) = N(R^{-1/2} X) = N(X). If N(X) is not 0, an orthogonal transformation of coordinates in H_1 can be used to put the matrix representing X* R^{-1} X in diagonal form, from which a matrix representation of (4.11) is obtained easily.
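The estimator (4.11) is, in matrix form, the classical generalized least-squares (Gauss-Markov) formula. The following sketch (NumPy, all data invented for illustration) checks it against an equivalent whitening argument: writing R = LL* and multiplying the model by L^{-1} reduces LUMV estimation to ordinary least squares on the whitened data.

```python
import numpy as np

# The LUMV formula (4.11) in matrix form, checked against whitening:
# with R = L L^T, premultiplying y = Xh + n by L^{-1} gives a model with
# identity noise covariance, so ordinary least squares on the whitened
# data must reproduce (4.11). Sizes and data are illustrative.
rng = np.random.default_rng(4)
M, N = 7, 3
X = rng.standard_normal((M, N))          # full column rank (so N(X) = 0)
S = rng.standard_normal((M, M))
R = S @ S.T + np.eye(M)                  # strictly positive definite covariance
L = np.linalg.cholesky(R)
h_true = rng.standard_normal(N)
y = X @ h_true + L @ rng.standard_normal(M)

Rinv = np.linalg.inv(R)
h_lumv = np.linalg.inv(X.T @ Rinv @ X) @ X.T @ Rinv @ y   # (4.11)

# Whitened ordinary least squares gives the same estimate.
Xw = np.linalg.solve(L, X)
yw = np.linalg.solve(L, y)
h_ols = np.linalg.lstsq(Xw, yw, rcond=None)[0]

# Error variance (4.12): Trace[(X* R^{-1} X)^{-1}].
var = np.trace(np.linalg.inv(X.T @ Rinv @ X))
```

Since N(X) = 0 here, the restricted inverse in (4.11) coincides with the ordinary matrix inverse; when N(X) is nontrivial, a pseudo-inverse would replace it, as the Remark above indicates.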

Let us now consider the problem of obtaining a recursive solution for the LUMV estimation problem in the finite-dimensional case. We consider the system

$$\begin{aligned}
y_1 &= B_1 h_1 + n_1\\
y_2 &= B_2\Phi_{2,1} h_1 + n_2\\
y_3 &= B_3\Phi_{3,2}\Phi_{2,1} h_1 + n_3\\
&\ \,\vdots\\
y_n &= B_n\Phi_{n,n-1}\cdots\Phi_{2,1} h_1 + n_n,
\end{aligned}$$

which equivalently can be written in the state-variable form

$$h_{i+1} = \Phi_{i+1,i}h_i, \qquad y_i = B_i h_i + n_i, \qquad i = 1, \ldots, n, \qquad h \triangleq h_1.$$

(Note that the subscripts no longer refer to projections.) We assume that each $h_i$ is an $m$-vector, each $B_i$ is a $k \times m$ matrix, each $\Phi_{i+1,i}$ is an $m \times m$ matrix, and each $n_i$ is a random $k$-vector. The noise vectors $n_i$ are taken to be uncorrelated, to have mean zero, and to have the same covariance operator (or matrix) $R$, where $R$ is strictly positive definite. We can define $X_1 \triangleq B_1$, $X_2 \triangleq B_2\Phi_{2,1}, \ldots, X_n \triangleq B_n\Phi_{n,n-1}\cdots\Phi_{2,1}$, so that

$$\begin{aligned}
y_1 &= X_1 h + n_1\\
y_2 &= X_2 h + n_2\\
&\ \,\vdots \qquad\qquad (4.16)\\
y_n &= X_n h + n_n.
\end{aligned}$$

The system of equations (4.16) can also be written in the form

$$\underline{y}_n = \underline{X}_n h + \underline{n}_n \qquad (4.17)$$

where(1)

$$\underline{y}_n = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad \underline{n}_n = \begin{bmatrix} n_1 \\ \vdots \\ n_n \end{bmatrix}, \qquad \underline{X}_n = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}, \qquad \underline{R}_n = \begin{bmatrix} R & & 0 \\ & \ddots & \\ 0 & & R \end{bmatrix}.$$

(1) The "partitioned matrix" notation is to be understood as a shorthand for defining $\underline{y}_n$, $\underline{n}_n$, $\underline{X}_n$, $\underline{R}_n$, as in Section 3. Thus $\underline{X}_n$ is a linear transformation from $m$-dimensional space into $kn$-dimensional space, defined in an obvious way. With this interpretation, $(\underline{X}_n^*\underline{R}_n^{-1}\underline{X}_n)^{-1}$ is meaningful, as before, as the inverse of the restriction of an operator.
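The partitioned notation can be made concrete in finite dimensions. A minimal sketch (dimensions and matrices are illustrative, not from the text) checks that the stacked information operator $\underline{X}_n^*\underline{R}_n^{-1}\underline{X}_n$ equals the sum of the per-observation terms $\sum_i X_i^*R^{-1}X_i$, the identity that drives the recursion developed next.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 2, 3, 4
Xs = [rng.standard_normal((k, m)) for _ in range(n)]   # the X_i = B_i Phi_{i,1}
R = np.diag([2.0, 3.0, 4.0])                           # common noise covariance (SPD)

Xn = np.vstack(Xs)                    # stacked X_n, a (k*n) x m matrix
Rn = np.kron(np.eye(n), R)            # block-diagonal R_n, (k*n) x (k*n)

# information operator of the stacked model = sum of per-block information terms
lhs = Xn.T @ np.linalg.inv(Rn) @ Xn
rhs = sum(Xi.T @ np.linalg.inv(R) @ Xi for Xi in Xs)
assert np.allclose(lhs, rhs)
```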

Put $P_n = (\underline{X}_n^*\underline{R}_n^{-1}\underline{X}_n)^{-1}$ and $b_n = \underline{X}_n^*\underline{R}_n^{-1}\underline{y}_n$. Since

$$\underline{X}_n^*\underline{R}_n^{-1}\underline{X}_n = \sum_{i=1}^{n} X_i^*R^{-1}X_i,$$

we have that

$$P_n^{-1} = P_{n-1}^{-1} + X_n^*R^{-1}X_n \qquad (4.18)$$

and

$$b_n = b_{n-1} + X_n^*R^{-1}y_n. \qquad (4.19)$$

From equation (4.18),

$$P_{n-1} = P_nP_n^{-1}P_{n-1} = P_n[P_{n-1}^{-1} + X_n^*R^{-1}X_n]P_{n-1}.$$

Hence

$$P_{n-1} = P_n + P_nX_n^*R^{-1}X_nP_{n-1} \qquad (4.20)$$

and

$$P_{n-1}X_n^*R^{-1/2} = P_nX_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}].$$

The inverse of the bracketed expression exists (see Section 3), and

$$P_nX_n^*R^{-1/2} = P_{n-1}X_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]^{-1}. \qquad (4.21)$$

Then from equations (4.20) and (4.21) one can obtain

$$P_n = P_{n-1} - P_{n-1}X_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]^{-1}R^{-1/2}X_nP_{n-1}. \qquad (4.22)$$

Using equations (4.19) and (4.22) in the expression $\hat h_n = P_nb_n$ yields

$$\hat h_n = \{P_{n-1} - P_{n-1}X_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]^{-1}R^{-1/2}X_nP_{n-1}\}\{b_{n-1} + X_n^*R^{-1}y_n\}.$$

Expanding, and writing $[\,\cdot\,]$ for $[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]$,

$$\hat h_n = P_{n-1}b_{n-1} - P_{n-1}X_n^*R^{-1/2}[\,\cdot\,]^{-1}R^{-1/2}X_nP_{n-1}b_{n-1} + P_{n-1}X_n^*R^{-1}y_n - P_{n-1}X_n^*R^{-1/2}[\,\cdot\,]^{-1}R^{-1/2}X_nP_{n-1}X_n^*R^{-1}y_n$$
$$= \hat h_{n-1} - P_{n-1}X_n^*R^{-1/2}[\,\cdot\,]^{-1}R^{-1/2}X_n\hat h_{n-1} + P_{n-1}X_n^*R^{-1/2}\{I - [\,\cdot\,]^{-1}R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}\}R^{-1/2}y_n.$$

For operators $A$ such that $[I+A]^{-1}$ exists, we have

$$I - [I+A]^{-1}A = [I+A]^{-1}[I+A-A] = [I+A]^{-1}.$$

Therefore,

$$\hat h_n = \hat h_{n-1} - P_{n-1}X_n^*R^{-1/2}[\,\cdot\,]^{-1}R^{-1/2}X_n\hat h_{n-1} + P_{n-1}X_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]^{-1}R^{-1/2}y_n$$
$$= \hat h_{n-1} + P_{n-1}X_n^*R^{-1/2}[I + R^{-1/2}X_nP_{n-1}X_n^*R^{-1/2}]^{-1}[R^{-1/2}y_n - R^{-1/2}X_n\hat h_{n-1}],$$

or

$$\hat h_n = \hat h_{n-1} + K_n[y_n - X_n\hat h_{n-1}] \qquad (4.23)$$

where

$$K_n = P_{n-1}X_n^*[R + X_nP_{n-1}X_n^*]^{-1} \qquad (4.24)$$

and, from (4.22),

$$P_n = P_{n-1} - K_nX_nP_{n-1}. \qquad (4.25)$$

Hence, to obtain the recursive LUMV estimator we start with

$$\hat h_1 = (X_1^*R^{-1}X_1)^{-1}X_1^*R^{-1}y_1 \qquad (4.26)$$

and

$$P_1 = (X_1^*R^{-1}X_1)^{-1} \qquad (4.27)$$

and apply the recursive relations for $n = 2, 3, \ldots$. The error variance at the $n$-th step is

$$E\|h - \hat h_n\|^2 = \operatorname{Tr}[(\underline{X}_n^*\underline{R}_n^{-1}\underline{X}_n)^{-1}] = \operatorname{Tr}[(X_1^*R^{-1}X_1 + X_2^*R^{-1}X_2 + \cdots + X_n^*R^{-1}X_n)^{-1}] = \operatorname{Tr}(P_n). \qquad (4.28)$$
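The recursion (4.23)-(4.27) can be exercised numerically and compared with the batch solution (4.28). The sketch below is illustrative (dimensions, covariance, and simulated data are not from the text); it assumes $X_1$ has full column rank so that $P_1$ exists.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, N = 2, 3, 5
Xs = [rng.standard_normal((k, m)) for _ in range(N)]
R = np.diag([1.0, 2.0, 0.5])
Ri = np.linalg.inv(R)
h = rng.standard_normal(m)                 # true parameter (unknown to the estimator)
ys = [Xi @ h + rng.multivariate_normal(np.zeros(k), R) for Xi in Xs]

# initialization (4.26)-(4.27)
P = np.linalg.inv(Xs[0].T @ Ri @ Xs[0])
hhat = P @ Xs[0].T @ Ri @ ys[0]

# recursion (4.23)-(4.25)
for Xi, yi in zip(Xs[1:], ys[1:]):
    K = P @ Xi.T @ np.linalg.inv(R + Xi @ P @ Xi.T)   # gain (4.24)
    hhat = hhat + K @ (yi - Xi @ hhat)                # update (4.23)
    P = P - K @ Xi @ P                                # covariance update (4.25)

# batch LUMV solution for comparison
Xn = np.vstack(Xs)
yn = np.concatenate(ys)
Rn_i = np.kron(np.eye(N), Ri)
Pb = np.linalg.inv(Xn.T @ Rn_i @ Xn)
hb = Pb @ Xn.T @ Rn_i @ yn

assert np.allclose(hhat, hb)               # recursive == batch estimate
assert np.allclose(P, Pb)                  # Tr(P) is the error variance (4.28)
```

The recursion has the form of the Kalman filter equations, as noted in the Introduction, but here it solves the deterministic-parameter LUMV problem rather than the random-parameter filtering problem.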

5. A MODIFIED LUMV ESTIMATOR

Each term in the model

$$y = Xh + n \qquad (5.1)$$

is to have the same interpretation as in Section 4. It is now supposed, however, that there is prior information that the unknown $h$ belongs to a known bounded subset of $\mathcal{H}_1$. This information is used to modify the LUMV estimate $\hat h$ so as to reduce the mean-squared error. The new estimate, $\tilde h$, is biased, so that its mean-squared error becomes a function of the true value of $h$, but its mean-squared error will be seen to be not greater than that of $\hat h$ for any $h$ in the given subset.

It will be convenient to use the term rectangular parallelepiped (r.p.), defined as follows. In a separable Hilbert space $\mathcal{H}$ let $\{\phi_i\}$ be a c.o.n.s. Any subset $\mathcal{B}$ of $\mathcal{H}$ of the form

$$\mathcal{B} = \{h \in \mathcal{H} : |(h, \phi_i)| \le b_i,\ i = 1, 2, \ldots\}, \qquad (5.2)$$

where $\{b_i\}$ is a bounded sequence of nonnegative numbers, will be called an r.p. with respect to the $\phi_i$.

Suppose now that $\hat h = \hat Cy$ is an LUMV estimate for $h_1$; i.e., either the conditions of Proposition A.4 are satisfied so that $\hat C$ is the $S$ given by that proposition, or, in the finite-dimensional case, $\hat C$ is simply the $\hat C$ of equation (4.10). The operator $\hat C R \hat C^*$ is self-adjoint and compact (since $\hat C$ is bounded and $R^{1/2}$ is Hilbert-Schmidt), so by the spectral theorem for such operators one has

$$\hat C R \hat C^*\phi_i = \sigma_i^2\phi_i, \qquad i = 1, 2, \ldots \qquad (5.3)$$

where the eigenvectors $\phi_i$ form a c.o.n.s. and the eigenvalues $\sigma_i^2$ are real and

nonnegative (zero eigenvalues are allowed if necessary so that the set $\{\phi_i\}$ can be complete). Furthermore, since $\hat C R \hat C^*$ has finite trace,

$$\operatorname{Trace}(\hat C R \hat C^*) = \sum_i \sigma_i^2 < \infty. \qquad (5.4)$$

The LUMV estimate can be written

$$\hat h(y) = \hat Cy = \hat CXh_1 + \hat Cn = h_1 + \hat Cn, \qquad (5.5)$$

and also, of course, it can be expanded in the $\phi_i$,

$$\hat h(y) = \sum_i (\hat h(y), \phi_i)\phi_i. \qquad (5.6)$$

Equation (5.5) can be interpreted as describing a trivial case of the linear model for which the linear transformation is $I$, the "observation" is $\hat h(y)$, and the "noise" is $\hat Cn$. Note that $\mathcal{H}_1$ and $\mathcal{H}_2$ have both been replaced by $\mathcal{N}^\perp(X)$. It can easily be shown that, even in the infinite-dimensional case, the LUMV estimate for $h_1$ in the model (5.5) is just the observation. Thus there is no ambiguity in writing $\hat h(y)$ for the observation.

The covariance $\Gamma$ of the new noise $\hat Cn$ is the operator $\hat C R \hat C^*$; in fact,

$$(\Gamma u, v) = E(u, \hat Cn)(\hat Cn, v) = E(\hat C^*u, n)(n, \hat C^*v) = (R\hat C^*u, \hat C^*v) = (\hat C R \hat C^*u, v). \qquad (5.7)$$

It follows that

$$E(\phi_i, \hat Cn)(\hat Cn, \phi_j) = (\hat C R \hat C^*\phi_i, \phi_j) = \sigma_i^2\delta_{ij}. \qquad (5.8)$$

Now, starting from the model (5.5), we consider a completely arbitrary linear estimate $h(y)$ of $h_1$ and expand it in terms of the $\phi_i$. Thus,

$$h(y) = \sum_{i,j} a_{ij}(\hat h(y), \phi_j)\phi_i, \qquad (5.9)$$

where the $a_{ij}$ are real numbers. The formula (5.6) is of course a special case of (5.9). The error in $h(y)$ is

$$h(y) - h_1 = \sum_i\Big[(a_{ii} - 1)(h_1, \phi_i) + \sum_{j \ne i} a_{ij}(h_1, \phi_j) + \sum_j a_{ij}(\hat Cn, \phi_j)\Big]\phi_i. \qquad (5.10)$$

From (5.8) and the fact that $\hat Cn$ has mean zero, it follows that the mean-squared error is given by

$$E\|h(y) - h_1\|^2 = \sum_i\Big|(a_{ii} - 1)(h_1, \phi_i) + \sum_{j \ne i} a_{ij}(h_1, \phi_j)\Big|^2 + \sum_{i,j} a_{ij}^2\sigma_j^2. \qquad (5.11)$$

The equations (5.9), (5.10), (5.11) still apply to an arbitrary unknown $h_1$ and to an arbitrary linear estimate. Now we impose the condition that $h_1$ belong to a known bounded subset of $\mathcal{N}^\perp(X)$. Then there is a set $\mathcal{B}$ which is an r.p. with respect to the $\phi_i$ such that $h_1$ must belong to $\mathcal{B}$ (of course, the "fit" may or may not be very good), and we have

$$|(h_1, \phi_i)| \le b_i, \qquad i = 1, 2, \ldots.$$

From (5.11) it then follows that

$$E\|h(y) - h_1\|^2 \le \sum_i\Big[|a_{ii} - 1|\,b_i + \sum_{j \ne i} |a_{ij}|\,b_j\Big]^2 + \sum_{i,j} a_{ij}^2\sigma_j^2 \qquad (5.12)$$

for all $h_1 \in \mathcal{B}$. The upper bound given by (5.12) is minimized by putting $a_{ij} = 0$, $i \ne j$, and $a_{ii} = b_i^2/(b_i^2 + \sigma_i^2)$. The linear estimate that results when these values for $a_{ij}$ are used is

$$\tilde h(y) = \sum_i \frac{b_i^2}{b_i^2 + \sigma_i^2}\,(\hat Cy, \phi_i)\phi_i, \qquad (5.13)$$

and from (5.12),

$$E\|\tilde h(y) - h_1\|^2 \le \sum_i\Big[\Big(\frac{\sigma_i^2}{b_i^2 + \sigma_i^2}\Big)^2 b_i^2 + \Big(\frac{b_i^2}{b_i^2 + \sigma_i^2}\Big)^2\sigma_i^2\Big] = \sum_i \frac{b_i^2\sigma_i^2}{b_i^2 + \sigma_i^2}. \qquad (5.14)$$

From (5.4) we have that

$$E\|\hat h(y) - h_1\|^2 = \operatorname{Tr}(\hat C R \hat C^*) = \sum_i \sigma_i^2, \qquad (5.15)$$

and comparison with (5.14) then gives

$$E\|\tilde h(y) - h_1\|^2 \le E\|\hat h(y) - h_1\|^2$$

for all $h_1 \in \mathcal{B}$. It cannot be claimed from what has been done that $\tilde h(y)$ is optimum in any very meaningful sense, because the upper bound on mean-squared error given by (5.12) may not be tight, and because the elementary minimization carried out was on an upper bound, not on the error itself. Still, $\tilde h(y)$ would appear to be a good estimate.

Since $\tilde h(y)$ is a linear estimate, it necessarily will yield values, for some observations $y$, that do not belong to $\mathcal{B}$. It can be improved, as can $\hat h(y)$, by being truncated to $\mathcal{B}$, but this of course makes it nonlinear. A truncated estimate $\tilde h_b$ that has uniformly smaller error variance than $\tilde h$ is defined by

$$\tilde h_b(y) = \sum_i a_i\phi_i, \qquad (5.16)$$

where,

$$a_i = \begin{cases} b_i & \text{if } \dfrac{b_i^2}{b_i^2 + \sigma_i^2}(\hat Cy, \phi_i) > b_i,\\[1ex] -b_i & \text{if } \dfrac{b_i^2}{b_i^2 + \sigma_i^2}(\hat Cy, \phi_i) < -b_i,\\[1ex] \dfrac{b_i^2}{b_i^2 + \sigma_i^2}(\hat Cy, \phi_i) & \text{otherwise.} \end{cases}$$

The estimate $\tilde h_b$ is discussed in [R-2] and [R-3], where an application is made to identification.
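A small numerical sketch of the modified estimate (5.13) and the truncated estimate (5.16); the half-widths $b_i$, eigenvalues $\sigma_i^2$, and coordinates of $\hat Cy$ used here are illustrative, not from the text.

```python
import numpy as np

b = np.array([1.0, 0.5, 2.0])       # r.p. half-widths b_i
sig2 = np.array([0.3, 1.5, 0.1])    # eigenvalues sigma_i^2 of C_hat R C_hat*
coef = np.array([1.4, -0.9, 0.2])   # coordinates (C_hat y, phi_i) of the LUMV estimate

# modified (shrunken) estimate (5.13): coordinatewise factor b^2 / (b^2 + sigma^2)
shrunk = (b**2 / (b**2 + sig2)) * coef

# truncated estimate (5.16): clip each coordinate to the r.p. interval [-b_i, b_i]
truncated = np.clip(shrunk, -b, b)

# the bound (5.14) is never worse than the LUMV variance (5.15)
bound_tilde = np.sum(b**2 * sig2 / (b**2 + sig2))
var_lumv = np.sum(sig2)
assert bound_tilde <= var_lumv
assert np.all(np.abs(truncated) <= b)          # truncated estimate lies in the r.p.
assert np.all(np.abs(shrunk) <= np.abs(coef))  # shrinkage never increases magnitude
```

Coordinatewise, each factor $b_i^2/(b_i^2+\sigma_i^2)$ shrinks coordinates whose noise variance $\sigma_i^2$ is large relative to the prior bound $b_i$, which is exactly how the prior information reduces the error bound from (5.15) to (5.14).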

APPENDIX

In this Appendix we obtain a solution to an infinite-dimensional LUMV estimation problem in a Hilbert space setting. A proof that is fundamentally the same but different in detail is given in [B-1], where the problem is attached to a systematic study of pseudoinverses.

In what follows the operator $A$ is required only to be closed and densely defined, instead of closed and bounded as in most of what precedes. If $A$ is closed and densely defined, then it is readily verified that $\mathcal{N}(A)$ is a closed linear subspace, and also that $A$ is densely defined in $\mathcal{N}^\perp(A)$. We do not give specific references, but the operator-theoretic background for all that follows is in [R-1].

To begin with, we again note that if $A$ is only closed and densely defined, the conclusions of Propositions 4.1 and 4.2 are still valid, since the respective proofs do not require that $A$ be bounded, but merely that $\mathcal{N}(A)$ be closed.

Before going further, we need a preparatory lemma.

Proposition A.1: If $A$ is a closed, densely defined operator from $\mathcal{H}_1$ to $\mathcal{H}_2$, then $AA^*$ restricted to $\overline{\mathcal{R}}(AA^*)$ is self-adjoint and 1:1, so its inverse exists and is self-adjoint.

Proof: We have the fact that $AA^* = (A^*)^*A^*$ is self-adjoint, and hence closed and densely defined in $\mathcal{H}_2$, so we can write

$$\mathcal{H}_2 = \overline{\mathcal{R}}(AA^*) \oplus \mathcal{N}(AA^*).$$

Let $f \in \mathcal{D}(AA^*)$ and write $f = f_1 + f_2$, $f_1 \in \overline{\mathcal{R}}(AA^*)$, $f_2 \in \mathcal{N}(AA^*)$. Then $f_1 \in \mathcal{D}(AA^*)$, and $g = AA^*f_1$ belongs to $\mathcal{R}(AA^*) \subset \overline{\mathcal{R}}(AA^*)$. Hence $AA^*$ provides a 1:1 mapping from $\overline{\mathcal{R}}(AA^*) \cap \mathcal{D}(AA^*)$ onto

$\mathcal{R}(AA^*) \subset \overline{\mathcal{R}}(AA^*)$. Thus $(AA^*)^{-1}$ is defined on $\mathcal{R}(AA^*)$, which is dense in $\overline{\mathcal{R}}(AA^*)$, and $\mathcal{R}[(AA^*)^{-1}] = \overline{\mathcal{R}}(AA^*) \cap \mathcal{D}(AA^*)$. $(AA^*)^{-1}$ will always mean the inverse of $AA^*$ restricted to $\overline{\mathcal{R}}(AA^*)$, as indicated.

Let $T$ be the restriction of $AA^*$ to $\overline{\mathcal{R}}(AA^*) \cap \mathcal{D}(AA^*)$, regarded as an operator in the Hilbert space $\overline{\mathcal{R}}(AA^*)$; then $T^{-1} = (AA^*)^{-1}$ as defined above. $T$ is also closed and densely defined. Let $f$ and $g \in \mathcal{D}(T) = \overline{\mathcal{R}}(AA^*) \cap \mathcal{D}(AA^*)$. $T$ is symmetric, since $(Tf, g) = (f, Tg)$. Thus $T \subset T^*$. We now want to show that $T^* \subset T$. We can start by observing that $(Tf, g) = (f, T^*g)$ for all $f \in \mathcal{D}(T)$, $g \in \mathcal{D}(T^*)$. Also, if $f \in \mathcal{D}(T)$, then $(Tf, g) = (AA^*f, g)$. Let $f'$ be an arbitrary element of $\mathcal{D}(AA^*)$ and write $f' = f'_1 + f'_2$, $f'_1 \in \mathcal{D}(T)$, $f'_2 \in \mathcal{N}(AA^*)$. Then $(AA^*f', g) = (AA^*f'_1, g) = (Tf'_1, g)$. Suppose $(Tf, g) = (f, g^*)$ holds for all $f \in \mathcal{D}(T)$, for some $g \notin \mathcal{D}(T)$ and $g^* \in \overline{\mathcal{R}}(AA^*)$. Then, for every $f' \in \mathcal{D}(AA^*)$,

$$(AA^*f', g) = (Tf'_1, g) = (f'_1, g^*) = (f' - f'_2, g^*) = (f', g^*) - (f'_2, g^*).$$

Since $g^* \in \overline{\mathcal{R}}(AA^*)$ and $\mathcal{N}(AA^*) = \mathcal{R}^\perp(AA^*)$, we must have $(f'_2, g^*) = 0$. Thus

$$(AA^*f', g) = (f', g^*)$$

for all $f' \in \mathcal{D}(AA^*)$, where $g \notin \mathcal{D}(T)$ but $g^* \in \overline{\mathcal{R}}(AA^*) = \mathcal{N}^\perp(AA^*)$. Since $AA^*$ is self-adjoint, this implies $g \in \mathcal{D}(AA^*)$; and since we are working in the Hilbert space $\overline{\mathcal{R}}(AA^*)$, also $g \in \overline{\mathcal{R}}(AA^*)$. But $\mathcal{D}(T) = \overline{\mathcal{R}}(AA^*) \cap \mathcal{D}(AA^*)$, so $g \in \mathcal{D}(T)$, which is a contradiction. |||

The covariance operator $R$ for the Hilbert-space-valued noise $n$ in equation (4.1) is a bounded, self-adjoint, nonnegative definite operator on $\mathcal{H}_2$ which, as mentioned before, is assumed in addition to be strictly positive definite. For further discussion see [B-1, Appendix B]. We note that $R^{1/2}$ exists and is self-adjoint, and that $(R^{1/2})^{-1}$ also exists and is self-adjoint (hence closed and densely defined). $(R^{1/2})^{-1}$ may be unbounded. In fact, we eventually make one more requirement on $R$, that $R^{1/2}$ be Hilbert-Schmidt (so that $R$ is trace class), which ensures that $(R^{1/2})^{-1}$ is unbounded. Since $(R^{1/2})^{-1} = (R^{-1})^{1/2}$, we can write $R^{-1/2} = (R^{1/2})^{-1}$.

In the material to follow we refer to three related minimization problems, listed below as (A), (B) and (C). Solving problem (B) is the essential step in getting the LUMV estimator. The problems are:

(A) minimize $\|z\|^2$ subject to $Az = b$;

(B) minimize $(R\xi, \xi)$ subject to $D\xi = b$;

(C) minimize $\|z\|^2$ subject to $DR^{-1/2}z = b$;

where $R$ is a bounded, self-adjoint, strictly positive definite operator and $D$ is a bounded operator from $\mathcal{H}_2$ to $\mathcal{H}_1$. Note that (B) is equivalent to the problem

minimize $\|R^{1/2}\xi\|^2$ subject to $D\xi = b$.

Proposition A.2: Problems (B) and (C) are equivalent in the sense that if either has a solution, so does the other. If $\xi_0$ and $z_0$ are the respective solutions, they are related by

$$z_0 = R^{1/2}\xi_0, \qquad \xi_0 = R^{-1/2}z_0.$$

Proof: Suppose (B) has a solution $\xi_0$. Put $z_0 = R^{1/2}\xi_0$. Then

$$DR^{-1/2}z_0 = DR^{-1/2}R^{1/2}\xi_0 = D\xi_0 = b.$$

Suppose $z$ also satisfies $DR^{-1/2}z = b$. Put $\xi = R^{-1/2}z$, which is certainly defined if $DR^{-1/2}z$ is, and note that $D\xi = b$. Then

$$\|z_0\|^2 = \|R^{1/2}\xi_0\|^2 \le \|R^{1/2}\xi\|^2 = \|z\|^2.$$

So $z_0 = R^{1/2}\xi_0$ provides a solution for (C).

Now suppose (C) has a solution $z_0$. Since $R^{-1/2}z_0$ is defined, we may put $\xi_0 = R^{-1/2}z_0$. Then $D\xi_0 = DR^{-1/2}z_0 = b$. Suppose $\xi$ also satisfies $D\xi = b$. Put $z = R^{1/2}\xi$. Then

$$DR^{-1/2}z = DR^{-1/2}R^{1/2}\xi = D\xi = b$$

and

$$\|R^{1/2}\xi_0\|^2 = \|z_0\|^2 \le \|z\|^2 = \|R^{1/2}\xi\|^2,$$

so $\xi_0 = R^{-1/2}z_0$ provides a solution for (B). |||

Let us now try to relate problem (C) to problem (A). From the facts that $R^{-1/2}$ is densely defined and $\mathcal{D}(D) = \mathcal{H}_2$, we have that $DR^{-1/2}$ is densely defined; however, it need not be closed. On the other hand, $R^{-1/2}D^*$ need not be densely defined, but it is closed, since $R^{-1/2}$ is closed and $D^*$ is bounded. We shall have to introduce some additional hypotheses, and we choose to require that $R^{-1/2}D^*$ be densely defined (this is restrictive, but not too much so; see [B-1, Section 2]). Then $R^{-1/2}D^*$ is closed and densely

defined, so $A \triangleq (R^{-1/2}D^*)^*$ exists and is closed and densely defined, and $A^* = R^{-1/2}D^*$. $A$ is a closed linear extension of $DR^{-1/2}$ (the condition has effectively guaranteed that $DR^{-1/2}$ has a closed extension).

Suppose now we consider problem (A) with the operator $A$ as just defined; this problem can be regarded as a modification of (C). From Proposition 4.1 we know that (A) has a unique solution if $b \in \mathcal{R}(A)$. Call this solution $z_0$ a weak solution for (C). Problem (C) may not actually have a solution, because $DR^{-1/2}z_0$ may not be defined. If a weak solution $z_0$ happens to be contained in $\mathcal{D}(DR^{-1/2})$, then problems (C) and (B) do have proper solutions, with $\xi_0 = R^{-1/2}z_0$.

From Proposition 4.2 we know that if $b \in \mathcal{R}(AA^*)$, then a solution exists to (A) with

$$z_0 = A^*(AA^*)^{-1}b.$$

Furthermore, if $R^{-1/2}z_0$ is defined, then (B) has a solution

$$\xi_0 = R^{-1/2}z_0 = R^{-1/2}A^*(AA^*)^{-1}b.$$

In a sense this is the answer for problem (B) when the only condition imposed is that $R^{-1/2}D^*$ be densely defined. It is not satisfactory, however, because a weak solution is not guaranteed to exist, or to be a proper solution when it does exist, and even a weak solution may not be given by the formula if $\mathcal{R}(AA^*)$ is not closed. Thus, in order to be sure of getting a solution for problem (B) that holds for all $b \in \mathcal{R}(D)$, we need to introduce further conditions that will make $\mathcal{R}(A)$ closed (and hence $\mathcal{R}(AA^*)$ closed) and will guarantee that $z_0 \in \mathcal{D}(R^{-1/2})$. We have:

Proposition A.3: Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces. Assume that:

(a) $R$ is a bounded, self-adjoint, strictly positive definite operator on $\mathcal{H}_2$;

(b) $D$ is a bounded linear operator from $\mathcal{H}_2$ to $\mathcal{H}_1$ with $\mathcal{D}(D) = \mathcal{H}_2$ and $\mathcal{R}(D) = \overline{\mathcal{R}}(D)$;

(c) $R^{-1/2}D^*$ is densely defined on $\mathcal{H}_1$;

(d) with $A = (R^{-1/2}D^*)^*$, the operator $(AA^*)^{-1}AR^{-1/2}$ is bounded.

Then $(AA^*)^{-1}AR^{-1/2}$ has a continuous extension $S$ to all of $\mathcal{H}_2$, and problem (B) is solved by $\xi_0 = S^*b$. This solution is unique. Furthermore, $\mathcal{R}(S) = \mathcal{R}(D)$.

Proof: We already have, by Proposition A.1, that $(AA^*)^{-1}$ (interpreted as the inverse of the restriction to $\overline{\mathcal{R}}(AA^*)$) is self-adjoint. From (a) and (b) it follows that $(AA^*)^{-1}$ is a bounded operator on the subspace $\mathcal{R}(AA^*)$. In fact, since $D$ has closed range, so also does $D^*$. Then $D^*$ restricted to $\mathcal{N}^\perp(D^*)$ is bounded from below, and $R^{-1/2}D^*$ is bounded from below on $\mathcal{N}^\perp(D^*) \cap \mathcal{D}(R^{-1/2}D^*)$. But $\mathcal{N}(R^{-1/2}D^*) = \mathcal{N}(D^*)$, since $R^{-1/2}D^*y = 0$ implies $D^*y = 0$. Thus $A^* = R^{-1/2}D^*$ restricted to $\mathcal{N}^\perp(R^{-1/2}D^*)$ is bounded from below and hence has closed range. This implies that $A$ and $AA^*$ have closed range (in fact $\mathcal{R}(A) = \mathcal{R}(AA^*) = \overline{\mathcal{R}}(A)$), which in turn implies that $(AA^*)^{-1}$ is bounded. Since it is closed and has domain dense in $\mathcal{R}(AA^*)$, it is defined everywhere on $\mathcal{R}(AA^*)$. Now,

$$(AA^*)^{-1}AR^{-1/2} \supset (AA^*)^{-1}DR^{-1/2}R^{-1/2} = (AA^*)^{-1}DR^{-1}.$$

But $\mathcal{D}(DR^{-1})$ is dense in $\mathcal{H}_2$, and $\mathcal{R}(DR^{-1}) \subset \mathcal{R}(DR^{-1/2}) \subset \mathcal{R}(A)$, so $(AA^*)^{-1}AR^{-1/2}$ is densely defined. By (d), it then has a continuous extension $S$ to all of $\mathcal{H}_2$.

We show that $SD^*$, which is a bounded operator on $\mathcal{H}_1$, is in fact the orthogonal projection on $\overline{\mathcal{R}}(AA^*)$. We have that

$$SD^* \supset (AA^*)^{-1}AR^{-1/2}D^* \supset (AA^*)^{-1}DR^{-1/2}R^{-1/2}D^* = (AA^*)^{-1}DR^{-1}D^*,$$

and the domain of the operator on the right is $\mathcal{D}(R^{-1}D^*)$, which is dense in $\mathcal{H}_1$ by (c). Let $g \in \mathcal{D}(R^{-1}D^*) \subset \mathcal{D}(R^{-1/2}D^*)$; then

$$SD^*g = (AA^*)^{-1}AR^{-1/2}D^*g = (AA^*)^{-1}AA^*g = g_0,$$

where $g_0$ is the projection of $g$ on $\overline{\mathcal{R}}(AA^*)$. Since $SD^*$ agrees with the orthogonal projection in question on a dense subset, $SD^*$ is equal to it. Then $SD^*$ is self-adjoint, so $DS^*$ is also the orthogonal projection on $\overline{\mathcal{R}}(AA^*)$.

To show that $\xi_0 = S^*b$ solves (B), we show that $z_0 = R^{1/2}S^*b$ solves (C), and invoke Proposition A.2. Clearly,

$$DR^{-1/2}z_0 = DR^{-1/2}R^{1/2}S^*b = DS^*b = b,$$

so it remains to show that $z_0$ provides a minimum. This will be established if it can be shown that $z_0 \in \mathcal{N}^\perp(A)$. For then, by Proposition 4.1, $z_0$ will minimize $\|z\|^2$ subject to $Az = b$, and a fortiori $z_0$ will minimize $\|z\|^2$ subject to $DR^{-1/2}z = b$. So let $y \in \mathcal{N}(A)$; then, since $(AA^*)^{-1}Ay = 0$ and $(AA^*)^{-1}AR^{-1/2}(R^{1/2}y) = SR^{1/2}y$, it follows that $SR^{1/2}y = 0$. Hence

$$(z_0, y) = (R^{1/2}S^*b, y) = (b, SR^{1/2}y) = 0,$$

so that $z_0 \in \mathcal{N}^\perp(A)$. The uniqueness of $z_0$ follows also, a fortiori, from the uniqueness guaranteed by Proposition 4.1.

Finally, we verify that $\mathcal{R}(S) = \mathcal{R}(D)$. The bounded, densely defined operator $(AA^*)^{-1}AR^{-1/2}$ is a mapping from $\mathcal{H}_2$ into $\overline{\mathcal{R}}(AA^*)$, since $\mathcal{R}[(AA^*)^{-1}] = \mathcal{D}(AA^*) \cap \overline{\mathcal{R}}(AA^*) \subset \overline{\mathcal{R}}(AA^*)$. But we have seen earlier that $\mathcal{R}(AA^*) = \mathcal{R}(A) = \overline{\mathcal{R}}(A)$, and also $\overline{\mathcal{R}}(A) = \mathcal{N}^\perp(A^*) = \mathcal{N}^\perp(D^*) = \mathcal{R}(D)$. Consequently, $\overline{\mathcal{R}}(AA^*) = \mathcal{R}(A) = \mathcal{R}(D)$. Since $S$ is the continuous extension of $(AA^*)^{-1}AR^{-1/2}$, it follows that $\mathcal{R}(S) \subset \overline{\mathcal{R}}(AA^*) = \mathcal{R}(D)$. On the other hand, $\mathcal{R}(S) \supset \mathcal{R}(SD^*) = \overline{\mathcal{R}}(AA^*) = \mathcal{R}(D)$, since $SD^*$ is the orthogonal projection on $\overline{\mathcal{R}}(AA^*)$. Thus $\mathcal{R}(S) = \mathcal{R}(D)$, which establishes the equality. |||

From the solution to problem (B) as just provided, the LUMV estimator is readily obtained, almost exactly as in the finite-dimensional case discussed in Section 4.

Proposition A.4: Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be separable Hilbert spaces, and let

$$y = Xh + n,$$

where $h \in \mathcal{H}_1$, $X$ is a bounded operator from $\mathcal{H}_1$ into $\mathcal{H}_2$, and $n$ is an $\mathcal{H}_2$-valued random variable with mean zero and covariance operator $R$. With the correspondences $R \leftrightarrow R$, $X^* \leftrightarrow D$, assume that the conditions (a), (b), (c), (d) of Proposition A.3 are in force, and assume also that $R^{1/2}$ is Hilbert-Schmidt. Then a unique LUMV estimate for $h_1$ exists and is given by

$$\hat h(y) = Sy,$$

where $S$ is the operator defined in Proposition A.3. The variance of $\hat h(y)$, i.e. $E\|\hat h(y) - h_1\|^2$, is equal to $\operatorname{Trace}(SRS^*)$.

Proof: Let $\{\psi_i\}$, $i = 1, 2, \ldots$, be a c.o.n.s. for $\mathcal{H}_1$ with the special property that a subsequence of the $\psi_i$ spans $\mathcal{N}^\perp(X)$ and the remaining $\psi_i$, of course, span $\mathcal{N}(X)$. Let $C$ be an unbiased estimator ($C$ must be a bounded linear

operator, as discussed in Section 4). Then, with $h(y) = Cy$, we have, as in (4.5),

$$E\|h(y) - h_1\|^2 = E\|Cn\|^2 = \sum_i (CRC^*\psi_i, \psi_i) < \infty,$$

where $C$ must satisfy $CXh_1 = h_1$ for all $h_1 \in \mathcal{N}^\perp(X)$. We wish to find $C = \hat C$ such that $E\|\hat Cn\|^2$ is a minimum. Again we put $c_i = C^*\psi_i$ and deduce, as in Section 4, that the problem of finding the LUMV estimator is equivalent to finding the $c_i = \hat c_i$ that solve the problem:

minimize $\sum_i \|R^{1/2}c_i\|^2$

subject to $X^*c_i = \psi_i$ for $\psi_i \in \mathcal{N}^\perp(X)$, and $X^*c_i = 0$ for $\psi_i \in \mathcal{N}(X)$.

As in the finite-dimensional case, this problem can be solved by minimizing individually each term $\|R^{1/2}c_i\|^2$, subject to either $X^*c_i = \psi_i$ or $X^*c_i = 0$, according as $\psi_i \in \mathcal{N}^\perp(X)$ or $\psi_i \in \mathcal{N}(X)$. In fact, for the latter case we take $\hat c_i = 0$, and for the former we apply Proposition A.3. Thus the minimizing $\hat c_i$ are

$$\hat c_i = S^*\psi_i, \quad \psi_i \in \mathcal{N}^\perp(X); \qquad \hat c_i = 0, \quad \psi_i \in \mathcal{N}(X).$$

But from Proposition A.3 we know also that $\mathcal{R}(S) = \mathcal{R}(X^*) = \mathcal{N}^\perp(X)$, so that $\mathcal{N}(S^*) = \mathcal{N}(X)$; hence we can write simply $\hat c_i = S^*\psi_i$ for all $i = 1, 2, \ldots$. Since $\hat c_i = \hat C^*\psi_i$, $\hat C = S$ as claimed, and

$$E\|\hat h(y) - h_1\|^2 = \sum_i (SRS^*\psi_i, \psi_i) = \operatorname{Trace}(SRS^*). \qquad |||$$
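In the finite-dimensional case with trivial null space, the operator $S$ reduces to the Section 4 estimator $(X^*R^{-1}X)^{-1}X^*R^{-1}$, and the claims of Proposition A.4 can be checked directly. A sketch (matrices and dimensions here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 3))               # bounded X: H1 -> H2, N(X) = {0} a.s.
A0 = rng.standard_normal((5, 5))
R = A0 @ A0.T + np.eye(5)                     # strictly positive definite covariance
Ri = np.linalg.inv(R)

# finite-dimensional S = (X* R^-1 X)^-1 X* R^-1
S = np.linalg.solve(X.T @ Ri @ X, X.T @ Ri)

# unbiasedness: S X = I on H1 (here N(X) = {0}, so N_perp(X) = H1)
assert np.allclose(S @ X, np.eye(3))

# each c_i = S* psi_i satisfies the constraint X* c_i = psi_i
for i in range(3):
    psi = np.eye(3)[i]
    assert np.allclose(X.T @ (S.T @ psi), psi)

# variance: Trace(S R S*) = Trace((X* R^-1 X)^-1)
assert np.allclose(np.trace(S @ R @ S.T),
                   np.trace(np.linalg.inv(X.T @ Ri @ X)))
```

The last assertion reflects the identity $SRS^* = (X^*R^{-1}X)^{-1}$ when $\mathcal{N}(X) = \{0\}$, matching the trace formula obtained at the start of this section.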

Remark: $SRS^* = (AA^*)^{-1}P$, where $P$ is the orthogonal projection on $\overline{\mathcal{R}}(AA^*)$. In fact,

$$SRS^* \supset (AA^*)^{-1}AR^{-1/2}RS^* = (AA^*)^{-1}AR^{1/2}S^* \supset (AA^*)^{-1}X^*R^{-1/2}R^{1/2}S^* = (AA^*)^{-1}X^*S^*.$$

But $(AA^*)^{-1}X^*S^*$ is everywhere defined, so $SRS^* = (AA^*)^{-1}X^*S^*$. Since $X^*S^* = P$, the assertion follows.

Remark: It should be noted that Corollaries 2 and 3 to Proposition 4.3 still hold in the infinite-dimensional case.
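The relation $z_0 = R^{1/2}\xi_0$ of Proposition A.2, linking problems (B) and (C), can also be verified in a finite-dimensional instance. In the sketch below, $D$, $R$, and $b$ are illustrative, and the Lagrange-multiplier formula $\xi_0 = R^{-1}D^*(DR^{-1}D^*)^{-1}b$ for problem (B) is a standard finite-dimensional fact, not a formula from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.standard_normal((2, 5))               # bounded D: H2 -> H1, full row rank a.s.
A0 = rng.standard_normal((5, 5))
R = A0 @ A0.T + np.eye(5)                     # strictly positive definite
w, V = np.linalg.eigh(R)
Rh = V @ np.diag(np.sqrt(w)) @ V.T            # symmetric square root R^{1/2}
b = rng.standard_normal(2)

# (B): minimize (R xi, xi) subject to D xi = b
#   -> xi0 = R^-1 D* (D R^-1 D*)^-1 b   (Lagrange multipliers, finite dimensions)
Ri = np.linalg.inv(R)
xi0 = Ri @ D.T @ np.linalg.solve(D @ Ri @ D.T, b)

# (C): minimize ||z|| subject to (D R^{-1/2}) z = b
#   -> minimum-norm solution via the pseudoinverse
M = D @ np.linalg.inv(Rh)
z0 = np.linalg.pinv(M) @ b

assert np.allclose(D @ xi0, b)                # xi0 is feasible for (B)
assert np.allclose(M @ z0, b)                 # z0 is feasible for (C)
assert np.allclose(z0, Rh @ xi0)              # the relation of Proposition A.2
```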

REFERENCES

[A-1] N. I. Akhiezer & I. M. Glazman, Theory of Linear Operators in Hilbert Space, vols. I and II (tr. by M. Nestell), Frederick Ungar Publishing Co., New York, 1961 and 1963.

[A-2] A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, 1972.

[B-1] F. J. Beutler & W. L. Root, "The operator pseudoinverse in control and systems identification," Proc. of the Advanced Seminar on Generalized Inverses and Applications, Mathematics Research Center, University of Wisconsin, 1973.

[G-1] I. M. Gelfand & N. Ya. Vilenkin, Generalized Functions, vol. 4: Applications of Harmonic Analysis (tr. by A. Feinstein), Academic Press, New York, 1964.

[R-1] F. Riesz & B. Sz.-Nagy, Functional Analysis (tr. by L. Boron), Frederick Ungar Publishing Co., New York, 1955.

[R-2] W. L. Root, "On the modelling and estimation of communication channels," Multivariate Analysis-III (ed. P. R. Krishnaiah), Academic Press, New York, 1973.

[R-3] W. L. Root, "Estimation in identification theory," Proc. Ninth Annual Allerton Conference on Circuit and System Theory, pp. 19, 1971.
