THE OPERATOR PSEUDOINVERSE IN CONTROL AND SYSTEMS IDENTIFICATION

Frederick J. Beutler
William L. Root

Computer, Information & Control Engineering Program
The University of Michigan
September, 1973

Research sponsored by the Air Force Office of Scientific Research, AFSC, USAF, under Grant Nos. AFOSR-70-1920C and AFOSR-72-2328A, and the National Science Foundation under Grant No. GK-20385. To be presented at the Advanced Seminar on Generalized Inverses and Applications (Mathematics Research Center, University of Wisconsin, October, 1973).


TABLE OF CONTENTS

0  INTRODUCTION
1  PSEUDOINVERSE OPERATORS IN HILBERT SPACE
2  A GAUSS-MARKOV THEOREM FOR HILBERT SPACE
3  AN APPLICATION TO SYSTEM IDENTIFICATION
4  ON THE QUADRATIC REGULATOR PROBLEM
5  PSEUDOINVERSE OPERATOR APPROXIMATIONS

Appendix A.  SOME PROPERTIES OF OPERATORS IN HILBERT SPACES
Appendix B.  HILBERT-SPACE-VALUED RANDOM VARIABLES

REFERENCES

PART 0

INTRODUCTION

Extension of the concept of the pseudoinverse of a linear mapping in finite dimensional linear space to a linear operator in Hilbert and even Banach space has been of interest for some years [B-3] [B-4] [D-1] [N-1] [P-3]. Our concern in this paper is with the (Moore-Penrose) pseudoinverse of a linear transformation between Hilbert spaces. Part of our work deals with basic theory, which we then use for applications to systems identification and the quadratic regulator problem. Since the finite dimensional (matrix) pseudoinverse can easily be interpreted geometrically in terms of certain orthogonal projections on Euclidean spaces, generalization to Hilbert spaces is completely natural; indeed, if one limits consideration to bounded operators with closed range, much of the theory of matrix pseudoinverses generalizes directly to Hilbert space [D-1] [B-3]. However, widening the class of transformations to encompass densely defined closed linear operators introduces substantive complications; domains of definition must be carefully treated, and account taken of the fact that neither the sum nor the product of densely defined closed operators necessarily enjoys the same property. Nevertheless, a good part of the theory carries over in satisfying fashion.

There are genuine difficulties, however, when an operator has non-closed range, for then the pseudoinverse is unbounded and only densely defined; moreover, many of the usual pseudoinverse characterizations become invalid. In case the range of the operator is not closed, the pseudoinverse usually fails to provide an answer to real problems (e.g., in statistical estimation), since one cannot tolerate a "solution" which is only sometimes meaningful. To circumvent this impediment, we consider (1) shrinking the image Hilbert space and endowing it with a new topology which insures that the range of the operator is closed, and (2) replacing the original operator or its pseudoinverse by a better behaved approximation. Either procedure can be applied as desired, but since some change is necessarily engendered thereby in the figure of merit which is optimized, the changed formulation may no longer be legitimate for its intended application. Roughly speaking, changing the Hilbert space or modifying the original operator is related to proper mathematical modelling of a physical situation, while approximations to the pseudoinverse are more closely connected to pseudoinverse theory as such. There is good reason to analyze pseudoinverses of arbitrary densely defined closed linear operators, rather than just bounded operators and/or those whose ranges are closed. Our examples (and doubtless many others) will indicate this. On the other hand, it seems evident that we cannot easily drop the requirements that an operator be closed and densely defined.

Part I of this work is devoted to a self-contained exposition of the basic theory of the pseudoinverse for densely defined closed linear operators with arbitrary range. It is self-contained in the sense that no results are borrowed from the literature on pseudoinverses, finite-dimensional or otherwise. Much of the material covered in Part I is available elsewhere in some form, but we believe some of our results for unbounded operators represent extensions. The theory we develop includes the case of operators that do not have closed range, but any questions regarding approximations and changes in topologies are deferred until later. Appendix A consists of a collection of lemmas ("exercises for the reader") relevant to unbounded operators; they are intended to facilitate the reading of Part I and of the later sections. In Part II, a Gauss-Markov theorem on statistical estimation is proved under the hypothesis that both the quantity to be estimated and the observations are elements of Hilbert spaces. This theorem is an improved version of a theorem stated earlier in [R-5], where the proof was not given; the proof given here suffices for the previously stated result. Our theorem applies only to non-singular covariance operators, and reduces to the classical Gauss-Markov theorem when the spaces are finite dimensional. To obtain our result (in Hilbert spaces) it becomes necessary to treat the inversion of an unbounded non-closed operator, which carries us beyond the subject matter of Part I. Consequently, an additional hypothesis and some further argument are required.

Appendix B (needed for Part II) is a brief development of the more elementary aspects of the theory of Hilbert-space-valued random variables. The use of Bochner integrals enables us to establish the needed facts very quickly, thus sparing the reader the chore of studying the literature on the subject. Two applications of engineering interest are discussed; the first is a problem in unknown system identification (Part III), while the second (Part IV) is an optimal control problem known to workers in the field as the quadratic regulator problem. In Part III, Volterra-Fréchet integral polynomials are taken to represent a class of unknown systems in input-output form, and the Gauss-Markov theorem of Part II is applied to the estimation of their kernels. There is also in this section some discussion of the applicability of the pseudoinverse to identification problems in general. In one classical form of the quadratic regulator problem, it is required to find the minimum energy input which will move a system from some initial state to the origin at a designated time. Part IV offers a reformulation which generalizes this problem to admit a greater variety of linear constraints, possibly including some which are incompatible and/or unattainable by the system. The solution always exists as a pseudoinverse (even if the system is described by an unbounded operator), and reduces to the classical result if the system is capable of meeting the constraint. Another related quadratic regulator problem, usually called the free endpoint problem, is also treated in Part IV. Here, a quadratic loss function involving both input

and output is to be minimized by appropriate choice of input. We show this loss function to be conveniently represented if we change the topology on the image Hilbert space of the input-output operator; moreover, the new topology guarantees the range of this operator to be closed. The solution to a generalization of the free endpoint quadratic regulator problem can then be directly obtained by applying the pseudoinverse. In short, the pseudoinverse is a unifying influence on extensions of some quadratic regulator problems, and as such, provides an insight not generally obtained through the more classical approaches. In both Parts III and IV we encounter as a recurrent theme the necessity of inverting, in some acceptable sense, operators with non-closed range. The technique of changing the topology on a reduced image space is employed in both places, but not in identical fashion. In particular, Part III describes a general approach which may also be useful in dealing with problems other than those considered here. Approximations to the pseudoinverse constitute the subject matter of Part V. Some known matrix approximations are examined in the context of unbounded operators whose range may not be closed. Properties of these approximations, and of other techniques suggested for operators of non-closed range, are investigated. Systematic use is made of the polar decomposition of operators, and of the functional calculus applicable to the resulting positive operators expressed as a spectral representation. We believe the material of this section to be largely original, especially in its coverage of unbounded operators.

Part V also contains some results on the limits to be expected from any choice of approximations. It may be useful to the reader to mention that the entire paper is dependent on Part I, but that Parts IV and V are independent of one another, and of Parts II and III.

PART I

PSEUDOINVERSE OPERATORS IN HILBERT SPACE

The pseudoinverse of a matrix has a rich literature, and has become sufficiently well recognized to constitute the subject of two recent books [A-2] [R-1]. Much of the underlying theory is phrased in matrix-theoretic concepts, even though some of the principal optimization applications are more clearly motivated by the "best approximation" property, which the pseudoinverse matrix possesses with respect to Euclidean norms. In particular, it has been observed that application of the pseudoinverse matrix solves certain optimal control (quadratic regulator [K-1]) and minimum mean square estimation [A-1] problems. The emphasis on norm minimization suggests a function-analytic rather than an algebraic approach to the pseudoinverse. Indeed, it seems natural to attempt to extend pseudoinverses to a Hilbert space context [D-1], since Hilbert space itself is no more than a generalization of finite dimensional linear vector space with Euclidean norm. The necessary extension is in fact easily accomplished for bounded linear operators of closed range [D-1] [B-2]. This class of course includes all bounded linear operators of finite dimensional range, and, a fortiori, operators on finite dimensional spaces. Thus, the Hilbert space pseudoinverse theory is not only a legitimate extension of the matrix theory, but also represents an approach eminently suitable to optimization questions.

If we abandon the assumption that an operator is necessarily bounded and/or equipped with closed range, new complexities are encountered. These must be faced squarely if applications to control or estimation theory are to be contemplated in any degree of generality. One may recall, for instance, that the differentiation operator in an L² space is unbounded [R-2], so that the input-output relation of a dynamical system described by differential equations is represented by an operator whose range is not a closed set. On the other hand, we shall see that the Gauss-Markov estimation problem in separable Hilbert space is equivalently formulated in terms of unbounded operators. In order to avoid a complete chaos of pathological behaviors, we shall suppose that we seek the pseudoinverse of a linear operator A: H₁ → H₂, where A is a closed operator ([R-2], Section 115) densely defined on the Hilbert space H₁, with Hilbert space H₂ as range space; such an operator will be called DDC (densely defined closed). Roughly speaking, the pseudoinverse A⁺ of A will solve the following minimization problem: given z ∈ H₂, A⁺ delivers x₀ = A⁺z, where x₀ ∈ H₁ minimizes the norm of z − Ax over x ∈ H₁, and where x₀ is the element of least norm accomplishing the minimization. These notions will be made precise, and we shall also attempt to determine the conditions under which A⁺ exists, what its properties are, and how it may be characterized. We begin by defining the symbols we shall need for our analysis. D(A) is the domain of A, and R(A) its range; an overline denotes closure, so R̄(A) is the least subspace containing the range of A. By a

subspace we shall consistently mean a closed linear manifold in the appropriate Hilbert space. Thus, the null space N(A) of A is a subspace because A is DDC. An orthogonal complement is indicated by the standard symbol ⊥; for example, N(A)⊥ is the complement of the null space of A. We use P to stand for an orthogonal projection, particularly P_R for the projection on R̄(A) and P_M for the projection on N(A)⊥. In addition, standard symbols are employed for norm, adjoint, inverse and the like. For the minimization problem, it is convenient to define some special symbols which are used recurrently. Thus, let

d_z = inf ‖z − Ax‖ over x ∈ D(A),  z ∈ H₂  (1.1)

K_z = {x: Ax = P_R z}  (preimage of P_R z under A)  (1.2)

F_A = {z: P_R z ∈ R(A)}  (1.3)

Finally, the subscript r indicates the restriction of any operator on H₁ to the subspace N(A)⊥, or the restriction to R̄(A) of an operator with domain space H₂. Specifically, A_r: N(A)⊥ → R(A), and A*_r: R̄(A) → N(A)⊥. Properties of restrictions of operators and their combinations are summarized in Appendix A to the extent that they are required in what follows. As we have noted, our applications of the pseudoinverse rest on its "best approximation" property. In intuitive terms, the pseudoinverse, when applied to z ∈ H₂, must give us the best approximate solution x₀ = A⁺z to the functional equation

z = Ax  (1.4)
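In the finite-dimensional (matrix) case the minimization just described is exactly what the Moore-Penrose pseudoinverse computes. The following NumPy sketch (ours, for illustration only; the matrix and data are arbitrary examples) checks both halves of the property for a rank-deficient matrix:

```python
import numpy as np

# Rank-deficient A, so that N(A) is nontrivial and the minimizer of
# ||z - Ax|| is not unique.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
z = np.array([1.0, 0.0, 2.0])

x0 = np.linalg.pinv(A) @ z    # candidate best approximate solution x0 = A+ z

# (i) x0 attains the infimum of ||z - Ax|| over x.
x_ls, *_ = np.linalg.lstsq(A, z, rcond=None)
assert np.isclose(np.linalg.norm(z - A @ x0), np.linalg.norm(z - A @ x_ls))

# (ii) Among all minimizers, x0 has least norm: it lies in N(A)-perp.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:]        # rows spanning N(A)
assert np.allclose(null_basis @ x0, 0.0)
```

Adding any nonzero element of N(A) to x0 leaves the residual unchanged but strictly increases the norm, which is the content of the "least norm" clause.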

We use BAS as the acronym for "best approximate solution," the term being more precisely described by

Definition 1.1: x₀ ∈ H₁ is a BAS of (1.4) if

‖z − Ax₀‖ = d_z  (1.5)

and

‖x₀‖ < ‖x‖  (1.6)

for any other x which also attains the infimum (1.5).

Remark: Uniqueness of the BAS constitutes part of its definition, being an immediate consequence of the strict inequality (1.6). The definition does not assert the existence of the BAS; indeed, there is always a sequence {x_n} ⊂ D(A) approaching the infimum (1.1), but possibly no element attaining it as required by Definition 1.1. However, the question of existence of the BAS is quickly settled by

Theorem 1.1: A BAS exists iff z ∈ F_A [cf. (1.3)]; whenever a BAS x₀ exists, it is unique and satisfies

Ax₀ = P_R z  (1.7)

and

x₀ ∈ N(A)⊥.  (1.8)

Conversely, assume x₀ satisfies (1.7) and (1.8). Then x₀ is the BAS, and is the only element of H₁ satisfying these two equations.

Proof: We know that the infimum of ‖z − y‖ over y ∈ R̄(A) is uniquely attained by z₁ = P_R z. It follows that (1.5) is met iff x is such that Ax = P_R z. But x of this type exists only if z ∈ F_A, in which case K_z [see (1.2)] is non-empty, and (1.5) and (1.7) are both equivalent to x ∈ K_z.

Suppose now x′, x″ ∈ K_z. Then Ax′ = Ax″, or (x′ − x″) ∈ N(A). Hence any x ∈ K_z has the orthogonal decomposition

x = x₀ + x₁  (1.9)

where x₀ ∈ N(A)⊥ is the same for every x ∈ K_z, and x₁ ∈ N(A). To verify x₀ ∈ K_z, we use Lemma A.1 to argue x₀ ∈ D(A), and obtain Ax₀ = Ax from (1.9) and x₁ ∈ N(A). Now apply the Pythagorean theorem to (1.9); the strict inequality (1.6) follows unless x₁ = 0 and consequently x = x₀. Thus x₀ is a BAS satisfying (1.8). For the converse, note that (1.7) means z ∈ F_A, whence a BAS x′₀ exists and satisfies (1.7) and (1.8). But (x₀ − x′₀) ∈ N(A) from (1.7), whereas (x₀ − x′₀) ∈ N(A)⊥ by (1.8). Therefore x₀ = x′₀, or x₀ is seen to be the BAS as claimed. The same argument shows that at most one vector can satisfy (1.7) and (1.8). |||

Theorem 1.1 is readily specialized to operators whose ranges are closed sets, viz.

Corollary 1.1: If R(A) is a closed set, the BAS always exists.

Proof: For any z ∈ H₂, P_R z ∈ R̄(A) = R(A), so F_A = H₂. |||

We have seen that every z ∈ F_A has associated with it a unique BAS x₀ ∈ H₁, thereby suggesting an operator which transforms elements of H₂ to elements of H₁. More formally:

Definition 1.2: An operator A⁺: H₂ → H₁ is called a pseudoinverse operator (henceforth abbreviated PI) if D(A⁺) = F_A, and if, for each z ∈ F_A,

x₀ = A⁺z  (1.10)

is the BAS.
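Property (1.7) says that Ax₀ is the point of R(A) nearest to z. In finite dimensions this is easy to observe numerically; the sketch below (an illustrative example of ours) builds P_R from an orthonormal basis of R(A) and checks (1.7):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 2.0]])    # R(A) is a two-dimensional subspace of R^3
z = np.array([1.0, 2.0, 3.0])

x0 = np.linalg.pinv(A) @ z    # the BAS for this z

# Build the orthogonal projection P_R on R(A) from an orthonormal basis (SVD).
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
P_R = U[:, :r] @ U[:, :r].T

# (1.7): A x0 = P_R z, i.e. A x0 is the point of R(A) closest to z.
assert np.allclose(A @ x0, P_R @ z)
```

Here N(A) = {0}, so condition (1.8) holds trivially; a rank-deficient A would make (1.8) a genuine constraint.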

How can we recognize an operator as a PI? The following criterion is often helpful.

Lemma 1.1: The linear operator A⁺: H₂ → H₁ is a PI iff D(A⁺) = F_A and

N(A⁺) = R(A)⊥  (1.11)

R̄(A⁺) = N(A)⊥  (1.12)

and

AA⁺y = y for all y ∈ R(A).  (1.13)

Proof: (Sufficiency) For z ∈ F_A, again call P_R z = z₁. Then z₁ ∈ R(A), and (1.13) asserts A⁺z₁ ∈ K_z, i.e. Ax₀ = P_R z with x₀ = A⁺z₁. Second, this x₀ ∈ N(A)⊥ by (1.12), so we conclude x₀ is the BAS [compare (1.7) and (1.8) in Theorem 1.1]. To complete the proof, we need only show A⁺z = A⁺z₁. To this end, we decompose z = z₁ + z₂, where z₁ is as before and z₂ ∈ R(A)⊥ = N(A⁺). Now z₂ ∈ D(A⁺) = F_A, and A⁺z₂ = 0, whence A⁺z = A⁺z₁ = x₀ as required by (1.10).

(Necessity) Clearly, R(A⁺) ⊂ N(A)⊥, for otherwise (1.8) fails. To prove R(A⁺) dense in N(A)⊥, observe D(A) ∩ N(A)⊥ to be dense in N(A)⊥ (cf. Lemma A.4), and each x ∈ [D(A) ∩ N(A)⊥] is a BAS for y = Ax (Theorem 1.1). Since then x = A⁺y, [D(A) ∩ N(A)⊥] ⊂ R(A⁺). Thus (1.12) is necessary. If (1.13) is not satisfied, A⁺z ∉ K_z for some z ∈ R(A) ⊂ F_A, so (1.7) does not hold. Respecting (1.11), consider that the BAS for z ∈ R(A)⊥ is perforce the null vector from (1.7) and (1.8), requiring that N(A⁺) ⊃ R(A)⊥ if A⁺ is to be a PI. On the other hand, (1.13) demands N(A⁺) ∩ R(A) = {0}, which then leads to the validity of (1.11). |||
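For matrices the three conditions of Lemma 1.1 can be observed concretely. In the sketch below (an illustrative finite-dimensional example of ours), R(A) is a plane in R³ and N(A) = {0}, so (1.12) is trivial while (1.11) and (1.13) are checked explicitly:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])    # full column rank; R(A) is a plane in R^3
Ap = np.linalg.pinv(A)

# (1.11): N(A+) = R(A)-perp. A vector orthogonal to both columns of A
# spans R(A)-perp here, and A+ annihilates it.
y_perp = np.cross(A[:, 0], A[:, 1])
assert np.allclose(Ap @ y_perp, 0.0)

# (1.13): A A+ y = y for every y in R(A).
y = A @ np.array([3.0, -1.0])
assert np.allclose(A @ (Ap @ y), y)

# (1.12): R(A+) lies in N(A)-perp; since N(A) = {0} this holds trivially.
```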

Remark: The conditions (1.11) and (1.13) of Lemma 1.1 already insure that R(A)⊥ and R(A) are in the domain of A⁺, wherefrom F_A ⊂ D(A⁺). To see this, observe that A⁺ is linear by hypothesis, and that

F_A = R(A) ⊕ R(A)⊥  (1.14)

is then in D(A⁺). We note F_A is dense in H₂, which implies A⁺ to be densely defined because F_A ⊂ D(A⁺). However, if D(A⁺) is larger than F_A we cannot claim A⁺ to be DDC, whereas (as we shall prove below) A⁺ must be a closed operator if D(A⁺) = F_A.

There are many alternative forms for a matrix PI [R-1], but these fail to carry over to unbounded operators in direct fashion. The source of difficulty is that combinations of unbounded operators need not be DDC, and may in fact be defined only on the null vector; hence, manipulation of such operators is anything but routine. Nevertheless, we can demonstrate some of the properties of the PI by an explicit construction which (inter alia) exhibits the PI as a linear DDC operator.

Theorem 1.2: The PI exists as the uniquely defined linear DDC operator

A⁺ = A_r⁻¹ P_R  (1.15)

Proof: Since a unique BAS corresponds to each z ∈ F_A, the PI is properly and uniquely described on F_A. It then suffices to prove that the right side of (1.15) constitutes a linear DDC operator which meets the conditions set forth in Lemma 1.1. In the first place, Lemma A.6 asserts A_r⁻¹ to be defined and DDC. Then, since P_R is bounded, A_r⁻¹P_R is likewise a closed operator. A_r⁻¹P_R is also clearly linear. To show A_r⁻¹P_R densely defined, we

note its domain includes both its null space R(A)⊥ and the domain of A_r⁻¹, namely R(A) [see Lemma A.2]. By linearity, we therefore have D(A_r⁻¹P_R) ⊃ F_A, the latter having the form (1.14); since F_A is dense in H₂, so is D(A_r⁻¹P_R). We have proved that A⁺ [as given by (1.15)] is linear DDC, with D(A⁺) ⊃ F_A. To show D(A⁺) = F_A, consider z ∉ F_A. Then P_R z ∉ R(A) = D(A_r⁻¹), which means z ∉ D(A_r⁻¹P_R). Finally, we must verify (1.11), (1.12) and (1.13) of Lemma 1.1. From (1.15), N(A⁺) ⊃ R(A)⊥, but also A⁺y ≠ 0 for non-null y ∈ R(A); in fact, A_r⁻¹: R(A) → N(A)⊥ has domain R(A) and possesses an inverse A_r. Consequently, the A⁺ of (1.15) satisfies (1.11). Next, we see from Lemma A.4 that the closure of R(A_r⁻¹) is N(A)⊥. Moreover, R(A⁺) = R(A_r⁻¹), since each range depends only on preimages in R(A), and for any y ∈ R(A), A⁺y = A_r⁻¹y. Thus, (1.12) has been proven. For any y ∈ R(A), we now obtain

AA⁺y = AA_r⁻¹y = A_rA_r⁻¹y = y  (1.16)

where the middle equality results from the definition of A_r as the restriction of A to N(A)⊥, which is precisely the range space of A_r⁻¹. Therefore, (1.13) is validated also, and the proof of the theorem is complete. |||

Corollary 1.2: A⁺ is bounded iff R(A) is closed.

Proof: A⁺ is bounded iff A_r⁻¹ is bounded, and the latter is equivalent to R(A) closed by Lemma A.8. |||
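For a matrix, the construction (1.15) can be carried out explicitly from the singular value decomposition: P_R is assembled from an orthonormal basis of R(A), and A_r⁻¹ inverts A between N(A)⊥ and R(A). A finite-dimensional sketch (ours, for illustration only):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])   # rank 2

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))
Ur, Vr = U[:, :r], Vt[:r].T       # orthonormal bases of R(A) and N(A)-perp

P_R = Ur @ Ur.T                            # projection on the closure of R(A)
Ar_inv = Vr @ np.diag(1.0 / s[:r]) @ Ur.T  # inverts A_r: N(A)-perp -> R(A)

A_plus = Ar_inv @ P_R              # the construction (1.15)
assert np.allclose(A_plus, np.linalg.pinv(A))
```

Boundedness is automatic here (Corollary 1.2), since every subspace of R³ is closed; the unbounded case has no matrix analogue.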

The matrix pseudoinverse is often defined not by its best approximation property (Definition 1.2), but rather as the (unique) matrix satisfying the identities [G-2]

AA⁺A = A,  A⁺AA⁺ = A⁺,  AA⁺ = P_R,  A⁺A = P_M.  (1.17)

These same identities, suitably modified for DDC operators, are relevant to the PI as per Definition 1.2. Indeed, the modified identities (1.17) follow from our definition. Conversely, a DDC operator for which a weaker form of (1.17) is valid must be a PI. We now proceed to state precisely and prove these relationships.

Theorem 1.3: Let A⁺ be the PI for the DDC operator A. Then

AA⁺A = A  (1.18)

A⁺AA⁺ = A⁺  (1.19)

AA⁺ ⊂ P_R, with D(AA⁺) = F_A  (1.20)

and

A⁺A ⊂ P_M, with D(A⁺A) = D(A).  (1.21)

Proof: (1.20) is a direct consequence of (1.11) and (1.13) in Lemma 1.1. In view of (1.11), both sides of (1.19) yield the null vector when applied to R(A)⊥, so we need consider only y ∈ R(A), and demonstrate A⁺AA⁺y = A⁺y for (1.19). But AA⁺y = y by (1.13), and so (1.19) follows. To prove (1.21), decompose x ∈ D(A) as

x = x₁ + x₂,  x₁ ∈ N(A)⊥,  x₂ ∈ N(A).  (1.22)

Then x₁ ∈ D(A) by Lemma A.1, and

A⁺Ax = A⁺Ax₁ = A_r⁻¹P_R Ax₁ = A_r⁻¹A_r x₁ = x₁  (1.23)

which shows A⁺A acts like P_M for every x ∈ D(A). With this result it is easy to prove (1.18), since from (1.23) and (1.22) above

AA⁺Ax = Ax₁ = Ax  (1.24)

for all x ∈ D(A). |||

Although A⁺A and AA⁺ are DD according to (1.21) and (1.20), one cannot assume these to be closed operators. In fact, A⁺A (respectively AA⁺) closed corresponds precisely to the boundedness of A (respectively A⁺), as is seen from

Corollary 1.3: Let A⁺ be the PI of the DDC operator A. Then A is bounded iff A⁺A is a closed operator; A⁺ is bounded [or equivalently, R(A) is a closed set] iff AA⁺ is a closed operator.

Proof: Because of the duality evident in Theorem 1.3, we need only consider the first assertion. By (1.21), A⁺A is bounded and DD, so A⁺A is closed iff it is defined on all H₁ as a bounded operator. Of course, A (a DDC operator now defined on all H₁) is then bounded also. On the other hand, A bounded and A⁺ DDC implies A⁺A to be a closed operator. |||

We could state a direct converse to Theorem 1.3, but it is noteworthy that less restrictive requirements can be substituted for (1.20) and (1.21). More specifically, we have

Theorem 1.4: Suppose A⁺: H₂ → H₁ is a linear closed operator with D(A⁺) ⊂ F_A. Let this operator satisfy (1.18) and (1.19), and assume further

A⁺A is a symmetric operator  (1.25)

and

AA⁺ is a symmetric operator.  (1.26)

Then A⁺ is the PI of the (linear DDC) operator A.

Proof: We again turn to the sufficiency conditions of Lemma 1.1, starting with (1.13). If y ∈ R(A), we may take x ∈ D(A) such that Ax = y,

whence (1.18) becomes AA⁺y = y for each y ∈ R(A). This disposes of (1.13). Our next task is to prove (1.11), or N(A⁺) = R(A)⊥. Actually, N(AA⁺) = R(A)⊥ suffices, because N(A⁺) = N(AA⁺) by the chain

N(A⁺) ⊂ N(AA⁺) ⊂ N(A⁺AA⁺) = N(A⁺)  (1.27)

The usual argument on ranges and null spaces produces

R(AA⁺)⊥ = N([AA⁺]*) ⊃ N(AA⁺),  (1.28)

the right hand inclusion following from (1.26). However, the containment in (1.28) is even an equality, because D(AA⁺) is dense in D([AA⁺]*), so that N(AA⁺) is not only dense in N([AA⁺]*), but is also closed by (1.27). We then have

R(AA⁺)⊥ = N(AA⁺)  (1.29)

Finally, R̄(AA⁺) = R̄(A) from

R(A) ⊃ R(AA⁺) ⊃ R(AA⁺A) = R(A).  (1.30)

The combination of these equalities therefore yields N(AA⁺) = R(A)⊥, as was to be shown. The remaining equality (1.12) of Lemma 1.1 can be written as R(A⁺)⊥ = N(A), which is like (1.11) except that the symbols A and A⁺ are interchanged. But the hypotheses on A and A⁺ are symmetrical, so that the desired identity can be demonstrated in the same fashion as R(A)⊥ = N(A⁺). We must still show A⁺ to have the correct domain. The domain clearly includes R(A), since AA⁺y = y for all y ∈ R(A). Also, the null space of A⁺ is R(A)⊥, so by the linearity of A⁺

D(A⁺) ⊃ R(A) ⊕ R(A)⊥ = F_A;  (1.31)

in view of the original hypothesis D(A⁺) ⊂ F_A, we then have D(A⁺) = F_A. |||

Remark: In the literature on the matrix PI, (1.25) and (1.26) are replaced by the stronger hypotheses that A⁺A and AA⁺ are self-adjoint. Such assumptions are inappropriate here, since (by Corollary 1.3) they would limit the applicability of Theorem 1.4 to bounded operators A and A⁺. Moreover, even the (apparently) weaker assumptions A⁺A and AA⁺ closed symmetric already imply A⁺A = P_M and AA⁺ = P_R, with both A and A⁺ bounded.

Many identities have been developed for the matrix pseudoinverse [R-1] [A-2]. Most of these equalities are retained in a weaker version when applied to unbounded operators. The two presented below, however, are generalized without change in their statement, although the proofs necessarily become sophisticated beyond mere modifications of the usual matrix arguments.

Theorem 1.5: (A⁺)⁺ = A.  (1.32)

Proof: We construct (A⁺)⁺ by the method of Theorem 1.2. In the first place, (A⁺)_r = A_r⁻¹ from (1.15). Secondly, R̄(A⁺) = N(A)⊥ by (1.12). But then the projection on the closure of the range of A⁺ is P_M, so that the application of (1.15) to A⁺ yields

(A⁺)⁺ = A_r P_M = A P_M = A.  (1.33)

The right hand equality is here justified by Lemma A.1 and the orthogonal decomposition (A.1). |||
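In finite dimensions, where A and A⁺ are bounded and AA⁺, A⁺A are themselves the orthogonal projections of (1.17), the identities (1.18), (1.19), (1.21) and Theorem 1.5 can all be confirmed numerically. The sketch below (ours; the random matrix is deliberately made rank-deficient) also checks the adjoint identity (A*)⁺ = (A⁺)*, which for real matrices is transposition:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5x4, rank <= 3
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)          # (1.18)
assert np.allclose(Ap @ A @ Ap, Ap)        # (1.19)
assert np.allclose((A @ Ap).T, A @ Ap)     # AA+ = P_R is self-adjoint here
assert np.allclose((Ap @ A).T, Ap @ A)     # A+A = P_M is self-adjoint here

assert np.allclose(np.linalg.pinv(Ap), A)      # (1.32): (A+)+ = A
assert np.allclose(np.linalg.pinv(A.T), Ap.T)  # (A*)+ = (A+)* for real matrices
```

The self-adjointness observed in the third and fourth lines is exactly what fails to survive, in general, for unbounded operators: there only symmetry (1.25)-(1.26) may be assumed.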

Theorem 1.6: (A*)⁺ = (A⁺)*.  (1.34)

Proof: The construction of Theorem 1.2 exhibits the PI of A* to be

(A*)⁺ = (A*)_r⁻¹ P_M;  (1.35)

by Lemma 1.1, this PI has null space N[(A*)⁺] = R(A*)⊥ = N(A), with the right hand equality being obtained from (1.12). Since both operators in (1.34) have the same null space, it suffices to compare their restrictions [(A*)⁺]_r and [(A⁺)*]_r to N(A)⊥. Using successively Lemma A.9 and the representation of Theorem 1.2, we obtain

[(A⁺)*]_r = (A⁺_r)* = (A_r⁻¹)*  (1.36)

But also

[(A*)⁺]_r = [(A*)_r]⁻¹ = [(A_r)*]⁻¹ = (A_r⁻¹)*;  (1.37)

here the equalities follow respectively from (1.35), Lemma A.9 and the interchange of inversion and the adjoint operation (see [R-2], Section 117). |||

It is also characteristic of the matrix pseudoinverse that it may be expressed in various forms involving combinations of other matrices related to A [R-1] [A-2]. For example, among the alternative formulations of the matrix pseudoinverse we have A⁺ = A*(AA*)_r⁻¹ and A⁺ = (A*A)_r⁻¹A*. Since such forms also occur in applications to infinite dimensional problems (e.g., Gauss-Markov estimation), it would be desirable if they were also valid in the more general case of DDC operators. But again, operators A which are unbounded and/or have non-closed range lead to complications which must be taken into account. To this end, we need

Definition 1.3: Aˣ: H₂ → H₁ is a restricted pseudoinverse operator (acronym: RPI) for the linear DDC operator A if Aˣ ⊂ A⁺.

Remark: Aˣ need be neither densely defined nor bounded to be a RPI.

Theorem 1.7:

A′ = A*(AA*)_r⁻¹ P_R  (1.38)

is a RPI with domain

D(A′) = R(AA*) ⊕ R(A)⊥.  (1.39)

Proof: By Lemma A.14, (AA*)_r is invertible, and (AA*)_r⁻¹ = (A*_r)⁻¹A_r⁻¹. Thus (1.38) makes sense and

A′ = A*[(A*_r)⁻¹A_r⁻¹]P_R.  (1.40)

Now A*(A*_r)⁻¹ is a restriction of the identity. In other words, (1.40) implies

A′ ⊂ A_r⁻¹P_R = A⁺  (1.41)

Let us determine D(A′). It is clear A′ is linear and well defined on R(A)⊥. In R̄(A), A′ is at best defined on R(A) because of (1.41) and the extent of D(A⁺); we therefore limit consideration to y ∈ R(A). Now D[(AA*)_r⁻¹] = R[(AA*)_r] = R(AA*), the equality on the right following from Lemmas A.2 and A.10 and Corollary A.2. Hence y ∈ R(A) is in the domain of (AA*)_r⁻¹ iff y ∈ R(AA*). Further, such a y leads to

(AA*)_r⁻¹y ∈ R[(AA*)_r⁻¹] = D[(AA*)_r] = D(A*_r) ⊂ D(A*).  (1.42)

Here we have again used Lemma A.12 for the equality on the right. Since (AA*)_r⁻¹y automatically falls in D(A*) as shown by (1.42), and since R(A) ⊃ R(AA*), D(A′) is correctly given by (1.39) and A′ is an RPI. |||

Corollary 1.4: A′ = A⁺ iff R(AA*) = R(A); in particular, A′ is the PI if R(A) is closed (or equivalently, any of the conditions of Lemmas A.8 or A.15 are met).

Proof: From (1.39) and (1.14),

D(A′) = D(A⁺) iff R(AA*) = R(A).  (1.43)

Should R(A) be closed (which is true if any of the conditions of Lemmas A.8 or A.15 are satisfied), Corollary A.4 states (1.43) to be valid. |||

Remark: A′ is densely defined in any event, because R̄(AA*) = R̄(A) as indicated by Corollary A.3. Also, (1.41) verifies the existence of a closed extension for A′; it is plausible that A⁺ is actually the minimal closed extension of A′, although neither proof nor counterexample has been found to elucidate the question.

Analogous to A′, but differing in its domain, is

A″ = (A*A)_r⁻¹A*.  (1.44)

If A is a matrix, or even if A is bounded and R(A) closed, A′ and A″ are both bounded and coincide with A⁺, and there is little advantage of one with respect to the other. But when A is an unbounded operator of closed range, we can guarantee only A′ = A⁺ (see Corollary 1.4 above) and conclude nothing further on A″, whereas A bounded (with arbitrary range) gives rise to A″ = A⁺ without any simplification of A′.

Theorem 1.8: A″ is a RPI with domain

D(A″) = [D(A*) ∩ R(A)] ⊕ R(A)⊥.  (1.45)

Proof: The proof is much like that of Theorem 1.7, so we shall omit some of the details. Since R(A)⊥ = N(A*), we write

A″ = (A*A)_r⁻¹A*_r P_R  (1.46)

   = A_r⁻¹[(A*_r)⁻¹A*_r]P_R.  (1.47)

The term [(A*_r)⁻¹A*_r] is a restriction of the identity on R̄(A) to D(A*_r), so that

A″ ⊂ A_r⁻¹P_R = A⁺.  (1.48)

The domain of A″ evidently contains R(A)⊥. On R̄(A), this domain is limited to vectors in R(A), and [by (1.46)] even to y ∈ [R(A) ∩ D(A*)]. For such y, A*_r y ∈ R(A*_r) = D[(A*_r)⁻¹], and [(A*_r)⁻¹A*_r]y = y. But y ∈ R(A) = D(A_r⁻¹), whence A_r⁻¹[(A*_r)⁻¹A*_r]y is defined also. Thus y ∈ [R(A) ∩ D(A*)] implies y ∈ D(A″), and D(A″) is described by (1.45) by the linearity of A″. |||

Corollary 1.5: A″ = A⁺ iff R(A) ⊂ D(A*); in particular, A″ is the PI if A is bounded (or equivalently, if any of A*, A*A or AA* are bounded).

Proof: If R(A) ⊂ D(A*), the D(A″) of (1.45) coincides with D(A⁺). The second assertion of the corollary is then immediate, since any of the hypotheses imply D(A*) = H₂. |||

Remark: Attention is called to a limited duality between A′ and A″. If any of A_r⁻¹, (A*_r)⁻¹, (AA*)_r⁻¹ or (A*A)_r⁻¹ is bounded, A′ is the PI A⁺, and is thus a bounded operator defined on all H₂. On the other hand, if any of A, A*, AA* or A*A [i.e., A_r, A*_r, (AA*)_r or (A*A)_r] is bounded, A″ is the same as the PI A⁺, and is thus a DDC operator on H₂.
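For matrices the restricted inverses can be replaced by pseudoinverses, and the alternative forms reduce to A⁺ = A*(AA*)⁺ = (A*A)⁺A*. A brief numerical check (ours, with an arbitrary full-column-rank example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))            # full column rank (almost surely)
Ap = np.linalg.pinv(A)

A_prime  = A.T @ np.linalg.pinv(A @ A.T)   # matrix analogue of A' in (1.38)
A_dprime = np.linalg.pinv(A.T @ A) @ A.T   # matrix analogue of A'' in (1.44)

assert np.allclose(A_prime, Ap)
assert np.allclose(A_dprime, Ap)
```

In this bounded, closed-range setting the limited duality of the Remark is invisible: A′, A″ and A⁺ all coincide, as the text asserts.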

PART II

A GAUSS-MARKOV THEOREM FOR HILBERT SPACE

A natural extension of the classical theory of linear unbiased minimum-variance estimators (LUMV estimators) is to the case that both the vector of unknown parameters and the vector of observations are infinite-dimensional in the sense that both are elements of separable Hilbert spaces. Although one may well argue that in any actual data processing scheme only a finite set of parameters will be estimated and only a finite set of data will be used, a Hilbert space version of the theory is of interest as providing a characterization of limit cases. Indeed, there are many estimation problems which are inherently infinite-dimensional, and for which reduction to finite sets of observations and parameters represents an approximation. The subject of identification of unknown systems, to be discussed briefly in Part III, provides examples of such problems. In this Part we state and prove a Gauss-Markov theorem for Hilbert space, but we leave comments about its application to Part III. Necessary and sufficient conditions for the existence of an LUMV estimator in Hilbert space, and a characterization of the estimator in terms of reproducing-kernel Hilbert space concepts, are given in [P-2]. Our objective here is somewhat different; we wish to exhibit reasonable sufficient conditions for the existence of an LUMV estimator that also guarantee an extension of the standard classical formula for the

estimator.

Let H₁ and H₂ be separable Hilbert spaces. Consider the linear model

z = Bx + w   (2.1)

where x ∈ H₁, where B is a continuous linear transformation from all of H₁ into H₂, and where w is an H₂-valued random variable. Hilbert-space-valued random variables are discussed below and in Appendix B, where definitions are given for the mathematical expectation of an H₂-valued random variable w and for the covariance operator of w, as well as conditions for the existence of the covariance. We assume the expected value of w, Ew, exists and equals zero, and that a covariance operator, K, for w exists. The element x represents the vector of unknown parameters to be estimated, and z represents the vector of observations.

A linear estimator C is defined to be a continuous linear transformation from all of H₂ into H₁; Cz is then an estimate of x. Since z is an H₂-valued random variable, Cz is an H₁-valued random variable (see Appendix B). We call E‖Cz − E(Cz)‖² = E‖Cw‖² the variance of the estimator C; it will be seen below that if K has finite trace the variance is always finite. We say C is an unbiased estimator for x if E[Cz] exists and equals x. If C is unbiased, then the variance equals the mean-squared error, E‖Cz − x‖². C₀ is LUMV if it is linear, unbiased and of finite variance, and E‖C₀z − x‖² ≤ E‖Cz − x‖² for all linear unbiased estimators C.

In the classical case H₁ and H₂ are finite-dimensional Euclidean spaces. Then, if K is strictly positive definite, and if N(B) = 0, an LUMV estimator always exists and

is given, e.g., by the formula¹

C = (B*K⁻¹B)⁻¹B*K⁻¹.   (2.2)

It is this formula we extend to the case H₁ and H₂ are Hilbert spaces. If one puts A = K^(-1/2)B, then (2.2) can be written

x̂ = A⁺(K^(-1/2)z).   (2.3)

The standard proof of (2.2) is carried out by first proving that C is in fact the PI when K = I, then transforming z in the model (2.1) by K^(-1/2), that is, by re-norming the observation space H₂. Essentially, this is what we do below, although it is not quite literally what we do, because of certain technical considerations.

Before getting to the theorem, some further elucidation of the conditions already imposed and their implications is warranted. The discussion to follow also points out the reasonableness, and sometimes the necessity, of certain additional conditions.

First, there are various ways of establishing the existence of a quantity w that can reasonably be called a Hilbert-space-valued random variable, that satisfies E(w) = 0, and that has associated with it a bounded, self-adjoint, nonnegative-definite operator K such that

¹The notation is consistent with that of Part I. It would be interesting to extend a Gauss-Markov theorem to Hilbert space with the condition that K be nonsingular removed, but we have not done this. The classical method and formula we use does not appear to be suitable. It would appear more promising, for example, to try to extend the method and formula in [A-2].
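In finite dimensions, formulas (2.2) and (2.3) can be checked directly. The following sketch (our illustration, not part of the report; all matrices and names are made up) verifies with NumPy that the classical Gauss-Markov formula agrees with the pseudoinverse form obtained by re-norming the observations with K^(-1/2):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3                      # dim H2, dim H1
B = rng.standard_normal((m, n))  # full column rank a.s., so N(B) = 0
M = rng.standard_normal((m, m))
K = M @ M.T + np.eye(m)          # strictly positive definite covariance

# classical Gauss-Markov formula (2.2)
Kinv = np.linalg.inv(K)
C = np.linalg.inv(B.T @ Kinv @ B) @ B.T @ Kinv

# pseudoinverse form (2.3): A = K^(-1/2) B, x_hat = A^+ (K^(-1/2) z)
w_eig, V = np.linalg.eigh(K)
K_half_inv = V @ np.diag(w_eig ** -0.5) @ V.T
A = K_half_inv @ B
C_pi = np.linalg.pinv(A) @ K_half_inv

assert np.allclose(C, C_pi)           # (2.2) and (2.3) give the same estimator
assert np.allclose(C @ B, np.eye(n))  # unbiasedness: CB = I when N(B) = 0
```

Since A has full column rank here, pinv(A) = (AᵀA)⁻¹Aᵀ, and multiplying out reproduces (2.2) exactly.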

(Ky, z) = E[(y, w)(w, z)].   (2.4)

K is called the covariance operator (since the matrix form of (2.4) characterizes the covariance matrix of a finite-dimensional random vector). An outline of a simple way of defining w is relegated to Appendix B, since the material involved is largely foreign to the rest of the paper. It will be observed that the construction given in Appendix B automatically implies that K is nuclear (i.e., is compact with finite trace). This condition is also reasonable as a requirement imposed on the estimation problem; indeed, if E‖w‖² < ∞ and the probability measure is countably additive, K must be nuclear (see Appendix B), and Tr K = E‖w‖². The condition that K be strictly definite is necessary for the forthcoming formula to be meaningful; see the footnote on p. 25.

The definition of an LUMV estimator is lifted from the classical one for the finite-dimensional case. The additional restriction that the estimator be continuous as well as linear is almost necessary for the estimation problem to make sense. In fact, the estimator must be everywhere-defined or it is useless; for technical reasons we want it to be a closed operator. Hence it must be bounded. The condition that B be continuous is certainly a technical convenience. As far as the modelling of real problems is concerned, it seems natural to require B to be continuous.

Now if the unknown element x in (2.1) has a non-zero component in the nullspace of B, that component in no way affects the observation z, and it cannot possibly be estimated. Hence we can only estimate the

orthogonal projection of x on N(B)⊥. In a notation consistent with that used in Part I, we let P_M be the projection on N(B)⊥, B_r be the restriction of B to N(B)⊥, and put x₁ = P_M x. Henceforth we are concerned only with estimates of x₁.

Since w and z are H₂-valued random variables, Cw and Cz are H₁-valued random variables for any (bounded linear) estimator C (see Appendix B). Further, the mathematical expectations and covariance operators exist, and we have

E[Cz] = E[CBx + Cw] = CBx = CBx₁.   (2.5)

The unbiasedness condition (for x₁) then becomes CBx₁ = x₁, or

CB_r x₁ = x₁.   (2.6)

Thus, the restriction of C to R(B) is necessarily B_r⁻¹, which is another way of stating the unbiasedness condition. From this observation we can make the simple but important inference that B must have closed range. In fact,

‖C‖ = sup_{‖y‖=1} ‖Cy‖ ≥ sup_{‖y‖=1, y ∈ R(B)} ‖Cy‖ = sup_{‖y‖=1, y ∈ R(B)} ‖B_r⁻¹y‖ = ‖B_r⁻¹‖.

Thus C cannot be bounded unless B_r⁻¹ is bounded, but this is equivalent to B having closed range (Lemma A.8). The condition that B have closed range becomes an essential hypothesis.

The variance of a finite variance linear unbiased estimator C for x₁ is

E‖Cz − x₁‖² = E‖CBx + Cw − x₁‖² = E‖Cw‖².   (2.7)

Let {φₙ}, n = 1, 2, ..., be a c.o.n.s. for H₁, chosen so that a subsequence of the φₙ's exactly spans N(B)⊥. Since K, the covariance of w (see Appendix B), is required to be a nuclear operator, we have for C unbiased:

E‖Cz − x₁‖² = E‖Cw‖² = Σₙ E|(Cw, φₙ)|² = Σₙ E(C*φₙ, w)(w, C*φₙ)
= Σₙ (KC*φₙ, C*φₙ) = Σₙ (CKC*φₙ, φₙ) = Tr(CKC*).   (2.8)

In fact, CKC* is nuclear since K is nuclear and C is bounded, and so Tr(CKC*) exists (cf. [G-1], Chapter 1). We see that the conditions on K and C guarantee that the estimator is of finite variance. We summarize these comments and make an obvious addition in

Lemma 2.1: Let the B of (2.1) be a DDC linear operator and let w be an H₂-valued random variable with a nuclear covariance operator K and with Ew = 0. Then a necessary condition that a linear unbiased estimator C for x₁ exist is that B have closed range. If B has closed range a finite variance linear unbiased estimator does exist; in particular C = B⁺ is one. For these assertions to hold K does not have to be strictly definite and B does not have to be bounded.

Proof: The first assertion has been proved above. By Corollary 1.2, B⁺ is a bounded operator. It meets the condition for unbiasedness,
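The trace computation (2.8) can be illustrated in finite dimensions. In the sketch below (our own example; C, K and the grid sizes are made up), the basis sum Σₙ(KC*φₙ, C*φₙ) is checked against Tr(CKC*) exactly, and against a seeded Monte Carlo estimate of E‖Cw‖² within a statistical tolerance:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
C = rng.standard_normal((n, m))   # a bounded estimator H2 -> H1
L = rng.standard_normal((m, m))
K = L @ L.T                       # covariance operator of the noise w

# (2.8): sum over an orthonormal basis {phi_k} of H1 of (K C* phi_k, C* phi_k);
# with phi_k = e_k, C* phi_k is the k-th row of C.
basis_sum = sum(C[k] @ K @ C[k] for k in range(n))
assert np.isclose(basis_sum, np.trace(C @ K @ C.T))

# Monte Carlo check that E||Cw||^2 = Tr(C K C*); the 10% tolerance is statistical
w = rng.multivariate_normal(np.zeros(m), K, size=200_000)
mc = np.mean(np.sum((w @ C.T) ** 2, axis=1))
assert abs(mc - basis_sum) / basis_sum < 0.1
```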

(2.6), and the error variance is Tr(B⁺K(B⁺)*), which is finite as has been pointed out above. ‖

Remark: By Corollaries 1.3 and 1.5 it follows that B⁺ can be expressed in the form B′, and, if B is bounded, in the form B″. B⁺ provides an estimator whose range is N(B)⊥. However, it is fairly obvious that if R(B) ≠ H₂ and N(B) ≠ 0, finite variance unbiased estimators C exist such that R(C) ∩ N(B) ≠ 0.

We proceed now to a consideration of LUMV estimation. The key step in getting the LUMV estimator is solving a minimization problem, a slightly complicated variant of the basic minimization problem that led to the pseudoinverse. We treat this problem first out of context.

Let H₁ and H₂ be Hilbert spaces (not necessarily separable). Let K be a bounded, self-adjoint, strictly positive-definite linear operator on H₂. Let B be a bounded linear operator with D(B) = H₁ and closed range R(B) ⊂ H₂. Let φ be an arbitrary element of N(B)⊥. We call the following minimization problem, problem I:

find c ∈ H₂ to minimize (Kc, c) subject to B*c = φ.   (I)

It is convenient to change the form of this problem. Since K is self-adjoint and strictly positive-definite, it has a self-adjoint, strictly positive-definite square root K^(1/2). Put ψ = K^(1/2)c, from which we have c = K^(-1/2)ψ. Then problem II is defined:

find ψ ∈ H₂ to minimize ‖ψ‖² subject to B*K^(-1/2)ψ = φ.   (II)

I and II are exactly equivalent in the following sense: if a ψ₀ exists that solves II, then K^(-1/2)ψ₀ is defined and c₀ = K^(-1/2)ψ₀ solves I; if a c₀ exists that solves I, then ψ₀ = K^(1/2)c₀ is such that B*K^(-1/2)ψ₀ is defined, and ψ₀ solves II. Consequently, we can restrict our attention to II.

Except for one thing we could apply the results on BAS directly to problem II; the difficulty is that the (in general unbounded) operator B*K^(-1/2) need not be closed, although it is densely defined. We can introduce hypotheses to guarantee that it is closable, but then just replacing it with its closure is not good enough, because a minimizing element ψ in the domain of the closure would not need to belong to D(B*K^(-1/2)). The difficulty is circumvented by working with the adjoint.

Theorem 2.1: Problem II has a unique solution ψ̂ if

(1) K^(-1/2)B is densely defined on H₁;

(2) B_r⁻¹K^(1/2)P_R K^(-1/2) is a bounded operator, where P_R is the projection on the closure of R(K^(-1/2)B). (This operator is well-defined, as will be shown.)

The solution ψ̂ is given by ψ̂ = K^(1/2)S*φ, where S is the bounded continuous extension of B_r⁻¹K^(1/2)P_R K^(-1/2) to all of H₂ (it will be seen that B_r⁻¹K^(1/2)P_R K^(-1/2) is densely defined and such an extension exists). Further, S has closed range, R(S) = N(B)⊥.

Proof: K^(-1/2)B is closed, since K^(-1/2) is closed and B is bounded; it is densely defined by hypothesis (1). Thus, if we put A = K^(-1/2)B, A is DDC and admits a PI, A⁺. By Theorem 1.2, A⁺ = A_r⁻¹P_R, where P_R

is the projection on R̄(A) and A_r is the natural restriction of A to N(A)⊥.

We note certain preliminary facts about the transformation A: first,

N(A) = N(B),   (2.9)

or, equivalently,

R̄(A*) = N(A)⊥ = N(B)⊥;   (2.10)

second,

A_r = K^(-1/2)B_r,   (2.11)

where B_r is as before the restriction of B to N(B)⊥; third, A_r⁻¹ is bounded, or, equivalently, A has closed range. The equality (2.9) is valid because N(A) = N(K^(-1/2)B) and the only element K^(-1/2) carries into the null element is the null element. Then (2.11) follows immediately from (2.10). To show that A_r⁻¹ is bounded one can verify directly from (2.11) that A_r⁻¹ = B_r⁻¹K^(1/2), and note that B_r⁻¹ is bounded.

To show that B_r⁻¹K^(1/2)P_R K^(-1/2) is well-defined we must show that D(B_r⁻¹) ⊃ R(K^(1/2)P_R K^(-1/2)). In fact, R(K^(1/2)P_R K^(-1/2)) ⊂ R(K^(1/2)P_R) = R̄(K^(1/2)K^(-1/2)B) = R̄(B) = R(B) = D(B_r⁻¹). The one perhaps non-obvious step in this chain is the first equality, which one can verify straightforwardly using the fact that K^(1/2) is bounded.

Now consider the transformation A⁺K^(-1/2) = A_r⁻¹P_R K^(-1/2) = B_r⁻¹K^(1/2)P_R K^(-1/2). Since A_r⁻¹ is closed and bounded, D(A⁺K^(-1/2)) = D(K^(-1/2)), which is dense in H₂. By hypothesis (2), A⁺K^(-1/2) is bounded. Consequently, A⁺K^(-1/2) can be extended by continuity to a closed bounded linear transformation S defined on all of H₂.

We show that SB is the orthogonal projection P_M on N(B)⊥. Let x ∈ D(K^(-1/2)B) and put x = x₁ + x₂, x₁ ∈ N(B)⊥, x₂ ∈ N(B). Then,

SBx = A⁺K^(-1/2)B(x₁ + x₂) = A⁺A(x₁ + x₂) = x₁

by Lemma A.1 and Theorem 1.3. Since SB is continuous and everywhere defined and agrees with P_M on a dense subset of H₁, SB = P_M. But then SB is self-adjoint, and so B*S* = P_M.

We can now verify that ψ̂ = K^(1/2)S*φ solves problem II. First, since φ ∈ N(B)⊥,

B*K^(-1/2)ψ̂ = B*K^(-1/2)K^(1/2)S*φ = B*S*φ = P_M φ = φ,

so the linear constraint is satisfied. Now, A* is closed and A* ⊃ B*K^(-1/2), since for all x ∈ D(A) and y ∈ D(K^(-1/2)),

(Ax, y) = (K^(-1/2)Bx, y) = (x, B*K^(-1/2)y).

Consequently, any ψ satisfying B*K^(-1/2)ψ = φ satisfies A*ψ = φ, and A*(ψ − ψ̂) = 0. Thus, ψ − ψ̂ ∈ N(A*). It will follow that ‖ψ̂‖² is a minimum if ψ̂ ∈ N(A*)⊥, for then

‖ψ‖² = ‖ψ̂‖² + ‖ψ − ψ̂‖² ≥ ‖ψ̂‖².

But ψ̂ does belong to N(A*)⊥. For SK^(1/2) ⊃ (A⁺K^(-1/2))K^(1/2) = A⁺, and since A⁺ is everywhere defined, SK^(1/2) = A⁺. Then K^(1/2)S* ⊂ (A⁺)*, so ψ̂ ∈ R((A⁺)*). But by Theorem 1.6 and Lemma 1.1, R((A⁺)*) = R((A*)⁺) ⊂ N(A*)⊥.

To establish uniqueness, let us note first that φ ∈ R(A*), and hence by Theorem 1.1 the problem of minimizing ‖ψ‖² subject to A*ψ = φ has a unique solution ψ′. ψ′ is characterized as that one element which satisfies A*ψ′ = φ and ψ′ ∈ N(A*)⊥. Since ψ̂ has been shown to satisfy both these conditions, ψ′ = ψ̂. Then, a fortiori, ψ̂ is a unique solution to problem II.
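The identities SB = P_M and B*S* = P_M from the proof can be illustrated numerically when H₁ and H₂ are finite-dimensional, taking S = A⁺K^(-1/2) directly (in finite dimensions no continuous extension is needed). The matrices below are our own illustrative choices; B is deliberately rank-deficient so that N(B) is nontrivial:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
# B with a nontrivial null space: rank 2
B = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))
L = rng.standard_normal((m, m))
K = L @ L.T + np.eye(m)          # strictly positive definite covariance

w_eig, V = np.linalg.eigh(K)
K_half_inv = V @ np.diag(w_eig ** -0.5) @ V.T
A = K_half_inv @ B
S = np.linalg.pinv(A) @ K_half_inv    # finite-dimensional stand-in for S

P_M = np.linalg.pinv(B) @ B           # projection on N(B)^perp = R(B*)
assert np.allclose(S @ B, P_M)        # SB = P_M
assert np.allclose(B.T @ S.T, P_M)    # B*S* = P_M (self-adjointness of SB)
```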

Finally, we observe that R(S) ⊃ R(SB) = R(P_M) = N(B)⊥, while on the other hand, R(S) ⊂ R̄(A⁺K^(-1/2)) ⊂ R̄(A_r⁻¹) = N(A)⊥ = N(B)⊥. Hence R(S) = N(B)⊥. ‖

Corollary 2.1: The conclusion of Theorem 2.1 holds if hypothesis (2) is replaced by any of the following:

(2′) A⁺K^(-1/2) is bounded;

(2″) A*(AA*)_r⁻¹P_R K^(-1/2) is bounded;

(2‴) (A*A)_r⁻¹A*K^(-1/2) is bounded.

Correspondingly, S may be defined as the continuous extension of any of the operators defined in (2′), (2″) or (2‴).

Proof: From the proof of Theorem 2.1 we have A⁺ = B_r⁻¹K^(1/2)P_R, and from Corollary 1.4, A⁺ = A*(AA*)_r⁻¹P_R, so the assertions involving (2′) and (2″) follow. For (2‴), we note that (A*A)_r⁻¹A* ⊂ A⁺ by Theorem 1.8. Furthermore,

(A*A)_r⁻¹A*K^(-1/2) ⊃ (A*A)_r⁻¹B*K^(-1/2)K^(-1/2) = (A*A)_r⁻¹B*K⁻¹.

But R(B*K⁻¹) ⊂ R(A*), and by Corollary A.4, D[(A*A)_r⁻¹] = R(A*A) = R(A*), so D[(A*A)_r⁻¹A*K^(-1/2)] ⊃ D(K⁻¹), which is dense in H₂. Thus the continuous extension of the operator of (2‴) is the same S as before. ‖

Theorem 2.1, or equivalently Corollary 2.1, can now be applied "coordinate-wise" to give a Gauss-Markov theorem. For this, we need again that the Hilbert spaces be separable. We retain the notation of Theorem 2.1.

Theorem 2.2: In the linear model given by (2.1), let H₁ and H₂ be separable Hilbert spaces, and let w be an H₂-valued random variable with mean zero and strictly positive definite, nuclear covariance operator K. Assume:

(1) B is a bounded linear operator with D(B) = H₁, R(B) ⊂ H₂, and R(B) = R̄(B).

(2) K^(-1/2)B is densely defined on H₁.

(3) A⁺K^(-1/2) is a bounded operator (or, equivalently, the operators defined by (2), (2″) or (2‴) of Theorem 2.1 and Corollary 2.1 are bounded).

Then C = S is an LUMV estimator for x₁, where x₁ = P_M x. The solution C = S is unique.¹

Proof: For C to be an LUMV estimator of x₁ it must be bounded, satisfy CBx₁ = x₁, and minimize Σᵢ(CKC*φᵢ, φᵢ) over the class of all C satisfying the other conditions. From Theorem 2.1, S is bounded and SB = P_M, so S = C satisfies the first two conditions.

Now the condition CBx₁ = x₁ implies CB = P_M. Hence (CB)* = B*C* = P_M, and the condition CBx₁ = x₁ can be rewritten as B*C*x₁ = x₁. Recall that the c.o.n.s. {φᵢ} was chosen so that a subsequence exactly spans N(B)⊥. Put cᵢ = C*φᵢ, i = 1, 2, .... Then the unbiasedness condition becomes

B*cᵢ = φᵢ for those φᵢ ∈ N(B)⊥   (2.12)

and the expression to be minimized is

¹See also [R-5].

Σᵢ (Kcᵢ, cᵢ).   (2.13)

If we can find bounded C so that the corresponding cᵢ satisfy (2.12), so that all cᵢ corresponding to φᵢ ∈ N(B) are zero, and so that each non-zero term of the sum in (2.13) is minimized, then C is LUMV, provided the sum in (2.13) is finite. However, the finiteness of the sum is automatic since C is bounded (see (2.8) and the succeeding comments). But cᵢ minimizes (Kcᵢ, cᵢ) subject to B*cᵢ = φᵢ if cᵢ = K^(-1/2)(K^(1/2)S*φᵢ) = S*φᵢ, by Theorem 2.1 and the equivalence of problems I and II. Furthermore, S*φᵢ = 0 for all φᵢ ∈ N(B), because N(S*) = R(S)⊥ = N(B). Thus C* = S* provides a minimizing set of cᵢ's, and C = S is LUMV. The uniqueness follows from Theorem 2.1. ‖

Corollary 2.1: If the subspace R(B) is invariant under K, then conditions (2) and (3) of Theorem 2.2 are automatically satisfied, and B⁺ is the unique LUMV estimator.

Proof: Since K is self-adjoint, R(B)⊥ as well as R(B) is invariant under K; that is, K[R(B)] ⊂ R(B) and K[R(B)⊥] ⊂ R(B)⊥. Then since K is nonsingular on H₂, its restriction K_B to R(B) is a 1:1 mapping of R(B) into R(B). K_B retains the properties of self-adjointness, strict positivity and nuclearity; K_B⁻¹ is defined, and D(K_B⁻¹) is dense in R(B). By Lemma A.8, B_r is bounded from below, hence D(K_B⁻¹B_r) is dense in N(B)⊥, whence by Lemma A.1, D(K_B⁻¹B) is dense in H₁. This implies condition (2) of Theorem 2.2.

We now consider B_r⁻¹K^(1/2)P_R K^(-1/2), where P_R, it will be recalled, is the projection on the closure of R(K^(-1/2)B). Let P_B be the projection on R(B). It

is immediate from the hypothesis that P_B commutes with K, from which it follows that P_B commutes with K^(1/2) (cf. [R-2], p. 113). Then, since B has closed range,

R̄(K^(-1/2)B) = R̄(K^(-1/2)P_B) = R̄(P_B K^(-1/2)) = R(B).

Thus P_R = P_B, and

B_r⁻¹K^(1/2)P_R K^(-1/2) = B_r⁻¹K^(1/2)P_B K^(-1/2) = B_r⁻¹P_B K^(1/2)K^(-1/2) = B_r⁻¹P_B.

This operator is bounded, so condition (3) is satisfied, and its continuous extension is B⁺. ‖

Remarks: (1) Although in the case of Corollary 2.1 the LUMV estimator does not depend on K, the error variance, as given by (2.8), does. The result of Corollary 2.1 is to be expected, of course, from the classical case. Intuitively, Corollary 2.1 is saying that if the noise in R(B)⊥ is uncorrelated with the noise in R(B), one may as well first project the observations on R(B).

(2) The hypotheses of Theorem 2.2 are presumably somewhat awkward to verify in many instances. An example where they are satisfied, involving conditions similar to but somewhat less restrictive than the commutativity condition of Corollary 2.1, can be constructed as follows. The verifications are routine and will not be given. Let B be written in the form DU, where D is a self-adjoint operator on H₂ and U is partially isometric from H₁ into H₂. Suppose D has discrete spectrum, so Dηᵢ = λᵢηᵢ, where {ηᵢ} is a c.o.n.s. for R̄(B) and λ₁ ≥ λ₂ ≥ ... > 0. Let {ξᵢ} be a c.o.n.s. for R(B)⊥. Let K = K₁P₁ + K₂P₂ + K₃P₃, where: K₁ is a nonsingular covariance operator on M₁ = V{η₁, ..., η_N; ξ₁, ..., ξ_M},¹ K₂ is a nonsingular covariance operator on

¹V{·} denotes the subspace spanned by the vectors {·}.
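A finite-dimensional sketch of the corollary: when K leaves R(B) invariant (arranged below by making K diagonal in a basis adapted to R(B)), the Gauss-Markov estimator (2.2) collapses to the pseudoinverse B⁺. The matrices and spectrum are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2
B = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(B, mode='complete')   # Q[:, :n] spans R(B), the rest R(B)^perp
# covariance leaving R(B) invariant: diagonal in the adapted basis
K = Q @ np.diag([3.0, 2.0, 1.0, 0.5, 0.25]) @ Q.T

Kinv = np.linalg.inv(K)
C = np.linalg.inv(B.T @ Kinv @ B) @ B.T @ Kinv   # LUMV formula (2.2)
assert np.allclose(C, np.linalg.pinv(B))         # reduces to B^+
```

The estimator no longer depends on K, though (as Remark (1) notes) the error variance Tr(CKC*) still does.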

M₂ = V{η_{N+1}, η_{N+2}, ...}, K₃ is a nonsingular covariance operator on M₃ = V{ξ_{M+1}, ξ_{M+2}, ...}, and Pᵢ is the orthogonal projection on Mᵢ. Then conditions (2) and (3) of Theorem 2.2 are satisfied.

(3) The condition that B have closed range is, as we have seen, essential, but it is often not satisfied in the kinds of applications one would like to consider. In some instances, one can replace the "natural" H₂, which is typically l₂ or L₂, by a smaller Hilbert space H̃₂ which still contains all elements y ∈ R(B) ⊂ H₂, but in which the range of B becomes a closed set. This cannot be done satisfactorily unless the noise is small enough in an appropriate sense. We shall describe this procedure in connection with the example of Part III.

PART III

AN APPLICATION TO SYSTEM IDENTIFICATION

To identify an unknown input-output system means to determine a suitable mathematical model for the unknown system from incomplete prior knowledge of the system, using data obtained by measurement of outputs and either measurement or prior knowledge of corresponding inputs. A model is suitable if (i) it will reproduce the behavior of the system well enough, according to some set criterion, when the system is stimulated by any one of the class of inputs of interest, and (ii) it is in a useful form. Both the criterion of fit and the usefulness of the model will depend to some degree on what the identification is to be used for, e.g., to permit control of the system, to allow transmission of information through the system, to yield predictions of future behavior, etc. There is usually no reason why there should be only one acceptable model.

A definition as general as the one above encompasses a tremendous variety of problems; any two of them may have very little in common with each other, and acceptable solutions may involve disparate mathematical methods. However, at bottom, identification problems are inversion problems, as will be pointed out specifically below, so it is not surprising that certain examples involve generalized inverses. Often the measurements are noisy enough that the basic inversion required becomes a problem of statistical estimation.

To fix ideas, let u denote an input to a system, and y the corresponding output. We suppose u ∈ 𝒰, the class of inputs of interest, and y ∈ 𝒴, any fixed class of outputs containing all y corresponding to u ∈ 𝒰. The sets 𝒰 and 𝒴 will be assigned mathematical structures as seem appropriate (of course, in modelling a problem, these structures are not unique). We assume there is a functional relationship from u to y, that is, for each input u ∈ 𝒰 there is one corresponding output y;¹ then we have

y = F(u), u ∈ 𝒰.   (3.1)

The function F characterizes the system in question completely, of course, as far as input and output data are concerned (as long as we think of the system as a "black box"), and we shall refer to the mapping F as the "system." This simple terminology carries with it the implication that what one is perhaps accustomed to calling one dynamical system, but with different initial states, here becomes a collection of systems.

Now suppose the system F is unknown and we want to identify it. First, either from prior knowledge, or purely as a working hypothesis, we postulate a class of systems to which (we hope) the unknown system must belong. In different language, we postulate a set 𝒮 of mappings

¹There has to be some relation between inputs and outputs or identification makes no sense. There are more general situations, of course; e.g., an input u might determine a probability distribution on outputs.

from 𝒰 into 𝒴, which we assume contains the unknown F. Then we carry out, if possible, a set of experiments to yield data to fix F closely enough within the class 𝒮. Temporarily we can deal with the mapping F as an abstract entity, but eventually it must be represented concretely. The final representation of the estimate of F is the identification. Often an unknown dynamical system is represented in terms of a differential equation, the parameters of which are to be determined by the identification. This really amounts to representing F by a finite set of parameters. Another type of representation is directly in terms of integral operators, and this is the kind of representation we work with here.

Suppose now that 𝒰, 𝒴 and 𝒮 are given as sets, and that in addition 𝒴 is a linear space. We do not yet need to require any mathematical structure for 𝒰. There is then a linear structure imposed on 𝒮 in the ordinary way; i.e., one defines aF, a a scalar, and F₁ + F₂, F₁, F₂ ∈ 𝒮, by the equations

[aF](u) = aF(u)   (3.2)

[F₁ + F₂](u) = F₁(u) + F₂(u).   (3.3)

Let 𝒮̄ be the linear space of mappings from 𝒰 into 𝒴 generated by 𝒮.

Consider first the problem of noise-free identification of F, in which the outputs y = F(u), u ∈ 𝒰, can be known exactly. We interchange the roles of u and F, and regard u as determining a mapping from 𝒮, and even from 𝒮̄, into 𝒴. So we can write

y = U(F) = F(u), F ∈ 𝒮̄   (3.4)

for each u ∈ 𝒰, where U is the mapping corresponding to the input u.

The problem of finding F is now the problem of inverting U. U is always a linear mapping. In fact:

U(aF₁ + F₂) = [aF₁](u) + [F₂](u) = aF₁(u) + F₂(u) = aU(F₁) + U(F₂).

Thus, in a basic sense, noise-free identification is always a linear problem. If the output of the system can be observed only in the presence of noise w, then the model (3.1) is replaced by

z = F(u) + w   (3.5)

and (3.4) is replaced by

z = U(F) + w.   (3.6)

Equation (3.6) is an abstract version of the usual linear model for statistical estimation.

This simple observation, that identification is basically a linear problem under fairly general conditions, is often not appreciated. It indicates the potential applicability of generalized inverses to identification, quite apart from the particular example to follow. We note, obviously, that the linearity can be lost by using representations of the unknown systems which do not preserve it, and such representations are often used, sometimes for good reasons.

In the generic example to follow, a situation is to be considered in which 𝒴 is a Hilbert space, so that w and z are Hilbert-space-valued random variables. 𝒮, the space of systems, will also be a Hilbert space, and U, which is necessarily linear, will be a bounded operator, sometimes with closed range. The model used in the example has been chosen so that we can illustrate the application of theorems

in Parts I and II, and hopefully provide some insight into certain aspects of identification theory. However, the model is not typically used in practice, partly because of the computational complexity it introduces. A number of comments need to be made to place what we are doing in better perspective, but these are postponed till after the example.

We model the class of systems to which the unknown system is to belong with a class of Volterra-Frechet polynomials. In particular, consider transformations of the form

y(t) = [H(u)](t) = Σₙ₌₁ᴺ ∫₀ᵀ ... ∫₀ᵀ kₙ(v₁, ..., vₙ)u(t−v₁) ... u(t−vₙ) dv₁ ... dvₙ, t ∈ 𝒯   (3.7)

where 𝒯 is a finite interval in R¹, T is a positive number or +∞, N is a fixed positive integer, and the functions u, k₁, ..., k_N, y are real or complex-valued. Let u ∈ L₂(R¹) and kₙ ∈ L₂([0, T]ⁿ), n = 1, ..., N; that is, let kₙ satisfy

∫₀ᵀ ... ∫₀ᵀ |kₙ(v₁, ..., vₙ)|² dv₁ ... dvₙ < ∞.   (3.8)

Clearly, if one permutes the arguments v₁, ..., vₙ of the kₙ's, the integrals in (3.7) are unchanged. Consequently, one can symmetrize each kₙ, i.e., replace kₙ(v₁, ..., vₙ) by (1/n!) Σ_π kₙ(v_{π(1)}, ..., v_{π(n)}), where the sum is over all permutations π of n integers, and we suppose this done. The symmetric kernels of n arguments form a closed linear subspace of L₂([0, T]ⁿ) which we denote 𝒦ₙ.

The transformation defined by (3.7) represents a system which, in systems engineering terminology, is causal with finite memory. Clearly, by changing the interval of integration, the causality condition

can be removed. If 𝒯 = [0, T] and if u(t) = 0 for t < 0, then the integrals can be rewritten with the variable upper limit t without changing the transformation.

It is convenient to introduce the notations: Fₙ for the integral operator with kernel kₙ, and yₙ for the nth integral in (3.7). Then (3.7) can be written

y = Σₙ₌₁ᴺ Fₙ(u) = Σₙ₌₁ᴺ yₙ.   (3.9)

We identify the operators Fₙ with their symmetric kernels kₙ, and define ‖Fₙ‖ to be the L₂ norm of kₙ. Thus, the Fₙ form a Hilbert space that is isometrically isomorphic to 𝒦ₙ under the correspondence Fₙ ↔ kₙ, and to cut down the verbiage we say simply that Fₙ ∈ 𝒦ₙ. We define a norm for F by ‖F‖² = Σₙ₌₁ᴺ ‖Fₙ‖² = Σₙ₌₁ᴺ ‖kₙ‖², and, again with an abuse of notation, regard F as an element of 𝒦₁ ⊕ 𝒦₂ ⊕ ... ⊕ 𝒦_N, which we denote by 𝒦.

An application of the Schwarz inequality shows that

|yₙ(t)|² ≤ ‖kₙ‖² ‖u‖²ⁿ   (3.10)

and hence that yₙ ∈ L₂(𝒯), with

‖yₙ‖² ≤ m ‖kₙ‖² ‖u‖²ⁿ   (3.11)

where m is the length of the interval 𝒯. Thus F is a mapping from L₂(R¹) into L₂(𝒯), and it is bounded on bounded sets. It can be shown without much difficulty that if yₙ = Fₙ(u), y′ₙ = Fₙ(u′), then

‖yₙ − y′ₙ‖² ≤ C(n) ‖u − u′‖² [max(‖u‖, ‖u′‖)]^(2(n−1))   (3.12)

(cf. [R-4]), where C(n) is a constant for fixed n. From (3.12) it follows that H is a continuous mapping from L₂(R¹) into L₂(𝒯). We do not need this fact in what follows, so we do not bother to prove (3.12).
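A discrete-time sketch of (3.7) with N = 2 may clarify the linearity observation behind (3.4): the output is nonlinear in u but linear in the kernels (k₁, k₂), so noise-free identification reduces to a linear least-squares problem. The grid, the sizes and all names below are our own illustrative choices, not from the report; np.linalg.lstsq returns the minimum-norm solution, i.e. a finite-dimensional U⁺y:

```python
import numpy as np

T_len, Tmem = 12, 4   # observation window and kernel memory, in samples

def volterra2(k1, k2, u):
    # discrete analogue of (3.7): y(t) = sum_v k1(v)u(t-v)
    #                                  + sum_{v1,v2} k2(v1,v2)u(t-v1)u(t-v2)
    y = np.zeros(T_len)
    for t in range(T_len):
        past = u[t + Tmem - 1 - np.arange(Tmem)]   # u(t-v), v = 0..Tmem-1
        y[t] = k1 @ past + past @ k2 @ past
    return y

rng = np.random.default_rng(3)
k1 = rng.standard_normal(Tmem)
k2 = rng.standard_normal((Tmem, Tmem))
k2 = (k2 + k2.T) / 2      # symmetrize: only the symmetric part of k2 matters

u = rng.standard_normal(T_len + Tmem - 1)
y = volterra2(k1, k2, u)

# Identification: y = U(F) is LINEAR in F = (k1, k2); stack the monomials
# past and outer(past, past) as a regression matrix and solve by least squares.
rows = []
for t in range(T_len):
    past = u[t + Tmem - 1 - np.arange(Tmem)]
    rows.append(np.concatenate([past, np.outer(past, past).ravel()]))
U = np.array(rows)
F_hat, *_ = np.linalg.lstsq(U, y, rcond=None)     # minimum-norm kernels U^+ y
assert np.allclose(U @ F_hat, y, atol=1e-8)       # exact fit in the noise-free case
```

With only 12 observations and 20 unknowns the problem is underdetermined, so F_hat is the minimum-norm kernel set reproducing the data, not necessarily (k₁, k₂); more input-output data would pin F down further.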

We are now ready to discuss identification of systems modeled by (3.7) in accordance with the ideas leading to (3.6). Let 𝒮, the class of systems to which the unknown system is to belong, be 𝒦. Reinterpret (3.7) as y = U(F), where F = {k₁, ..., k_N} and U is the operator determined by the right side of (3.7) with fixed u. U is a linear operator from 𝒦 into L₂(𝒯). It follows easily from (3.11) that U is a bounded linear operator.

Actually, U is compact. To see this, let us again introduce notation for each term in the sum in (3.7). Let Uₙ be the nth integral operator in the sum; Uₙ has kernel [u(t−v₁) ... u(t−vₙ)]. We have y = Σₙ₌₁ᴺ Uₙ(kₙ) = Σₙ₌₁ᴺ Uₙ(Fₙ), with an obvious abuse of notation. Further, if we regard Uₙ as the operator on 𝒦 which carries Fₙ into yₙ and all F_j, j ≠ n, into zero, we have

y = Σₙ₌₁ᴺ Uₙ(F).   (3.13)

Now each Uₙ is Hilbert-Schmidt; in fact

∫_𝒯 ∫₋∞^∞ ... ∫₋∞^∞ |u(t−v₁) ... u(t−vₙ)|² dv₁ ... dvₙ dt ≤ m ‖u‖²ⁿ.   (3.14)

Thus, each Uₙ is compact, and from (3.13) it follows that U is compact.

Suppose that observations of the output are made in the presence of additive noise w, where w is to be represented as an L₂(𝒯)-valued random variable with mean zero and nuclear covariance operator K (see Appendix B). The model is then of the form of (3.6), which we repeat,

z = U(F) + w,   (3.6)

and we wish to estimate F ∈ 𝒮 = 𝒦. Consider first (non-statistical) least-squares estimation of F; for this, of course, we disregard the statistical properties of w; w is just an error. U is DDC, so by Theorem 1.1 a BAS exists iff z ∈ F_U, the set of z's for which the projection of z on

R̄(U) belongs to R(U). The BAS is given by U⁺z, where D(U⁺) = F_U. Since U is bounded, U⁺ = U″ by Corollary 1.5. These results, though they describe the situation that obtains, are of little practical interest if R(U) is not closed, because an unbounded estimator that is not always defined is of little value. Unfortunately, R(U) is not closed unless U is a degenerate operator, since U is compact.

If we consider the estimation of F by linear unbiased estimators from a statistical point of view, the same difficulty obviously arises. In particular, Theorem 2.2 does not guarantee an LUMV estimator unless (aside from other hypotheses) R(U) is closed. Fortunately, this difficulty can be circumvented mathematically by replacing the observation space L₂(𝒯) with a smaller Hilbert space in which the range of U is closed; we do this below. This replacement changes the model of the problem in such fashion that, speaking loosely, some output information is lost. This loss of information may or may not be of consequence, depending on the noise, as we shall see.

We digress from the example temporarily to construct the new Hilbert space that replaces the observation space H₂. This can be done in more generality than is needed for the example.

Lemma 3.1: Let T be a DDC operator on a Hilbert space H. Then the linear space D(T) is itself a (complete) Hilbert space with the inner product

(x, y)′_T = (x, y) + (Tx, Ty), x, y ∈ D(T).   (3.15)

If T is bounded from below, D(T) is also a Hilbert space with the inner product

(x, y)_T = (Tx, Ty), x, y ∈ D(T).   (3.16)

Proof: Since T is closed, the graph of T, G(T), is a Hilbert space with the norm ‖{x, Tx}‖² = ‖x‖² + ‖Tx‖² ([D-2], p. 1186). That is, (3.15) defines an inner product for G(T), so D(T) is a Hilbert space. If T is bounded from below there is an a > 0 such that ‖Tx‖ ≥ a‖x‖, x ∈ D(T). Then

‖Tx‖² ≤ ‖x‖² + ‖Tx‖² ≤ (a⁻² + 1)‖Tx‖²,

so the norms given by (3.15) and (3.16) are topologically equivalent. ‖

Theorem 3.1: Let B be a DDC linear transformation from the Hilbert space H₁ into the Hilbert space H₂. Then there is a Hilbert space H̃₂ formed on the linear set F_B ⊂ H₂, but with a different norm, such that

(1) R(B) ⊂ H̃₂ and R(B) is closed in H̃₂.

(2) The orthogonal complement of R(B) in H̃₂ is the same set as the orthogonal complement of R(B) in H₂, and furthermore the H̃₂-norm of y ∈ R(B)⊥ is the same as the H₂-norm.

(3) B̃, the mapping B reinterpreted as a mapping from H₁ into H̃₂, has closed range, and is bounded iff B is bounded.

Proof: B*, being DDC, has a polar decomposition, B* = JT, where T = (BB*)^(1/2) is self-adjoint and J is partially isometric ([D-2], XII 7.7). Then B = B** = TJ*, since J is bounded ([D-2], XII 1.6), and J* is also partially isometric ([D-2], XII 7.6) with initial domain R̄(B*) = N(B)⊥ and final domain R̄(B).

To avoid some confusion of notation later we put M = R(B) and N = R(B)⊥, so that H₂ = M̄ ⊕ N. Now R(T) = R(B) = M, and the restriction of T to M̄, T_r, has a self-adjoint inverse T_r⁻¹ by Lemma A.14. We note further that D(T_r⁻¹) = R(T) = R(B). In fact, from

B = TJ* it follows that R(T) ⊃ R(B). To show that R(B) ⊃ R(T), suppose y ∈ R(T); then y = Tz for some z ∈ N(T)⊥, the closure of R(T) = R(B). But such a z is given by z = J*x for some x ∈ N(B)⊥, since J* restricted to N(B)⊥ is a unitary mapping onto the closure of R(B). Thus y = TJ*x = Bx, so y ∈ R(B).

We can now apply Lemma 3.1 to R(B) = D(T_r⁻¹), with T_r⁻¹ replacing T, and call the resulting Hilbert space M̃. Then we define H̃2 = M̃ ⊕ N. The assertion (1) is now immediate, since M̃ is itself a Hilbert space. (2) is obvious from the definition of H̃2. That R(B̃) is closed is just a restatement of (1), since R(B̃) and R(B) are the same set. We have left to show that B̃ is bounded iff B is bounded. Let P1 and P2 be the projections on M and N respectively. For x ∈ H1, let x1 be the projection of x on N(B)⊥ and x2 the projection on N(B). Now, using the norm induced by (3.15) for M̃, and denoting norms in H̃2 by ||·||~, we have for x ∈ D(B)

    ||B̃x||~² = ||Bx||² + ||T_r⁻¹Bx||² = ||Bx||² + ||T_r⁻¹TJ*x||² = ||Bx||² + ||x1||²,

which proves the assertion. |||

Remark 1. If B is bounded, the norm established by (3.16) can be used for M̃, and then B̃ is partially isometric.

Remark 2. If B is compact, T² = BB* is compact as well as self-adjoint and positive, so that T²e_n = λ_n e_n, n = 1, 2, ..., where the e_n are orthonormal and span the closure of R(B), and where the λ_n > 0 and converge

monotonically to zero. The norm induced by (3.16) is applicable in this case, and we have for y ∈ H̃2

    ||y||~² = Σ_n λ_n⁻¹ |(y, e_n)|² + ||P2 y||²,    (3.17)

where the norm in the last term is the norm of H2.

Let us return to the identification problem for which the model is given by (3.6), or by (3.4) in the noise-free case, and for which the class of Volterra-Fréchet polynomials has been introduced. The operator U now plays the role of the operator B in the preceding theorem; otherwise we shall use the notation of that theorem and of Remark 2. For the noise-free case, since R(U) ⊂ H̃2 (as a set), we can clearly replace the observation space H2 and the equation y = U(F) with the space H̃2 and the equation y = Ũ(F), where H̃2 and Ũ are as given by Theorem 3.1 with B = U. When there is additive observation noise, however, so that the model is given by (3.6), this replacement may mean taking unjustified liberties with the model. For if L2(T) is the "actual" observation space, then for some values of w(ω), z will not lie in H̃2; changing the mathematical model would in this case actually mean changing the original problem. The only case where no harm is done in changing to H̃2 is when w(ω) belongs to the set H̃2 with probability one. A condition which guarantees this is given in the theorem to follow. Furthermore, the condition will prove to be necessary in a sense to be specified.

Since w(ω) is an H2-valued random variable, in the sense defined in Appendix B, {(w, e_n)}, with the e_n chosen as in Remark 2, is a sequence of scalar-valued random variables. We suppose w has

mean zero and nuclear covariance operator K; then E[(w, e_n)] = 0 and E|(w, e_n)|² = (K e_n, e_n) = variance [(w, e_n)]. From Remark 2 it follows that

    ||w(ω)||~² = Σ_{n=1}^∞ λ_n⁻¹ |(w(ω), e_n)|² + ||P2 w(ω)||²    (3.18)

for all ω for which the right side is finite.

Theorem 3.2: If

    Σ_n λ_n⁻¹ (K e_n, e_n) < ∞,    (3.19)

then w(ω) ∈ H̃2 (as a subset of H2) with probability one. If w̃(ω) ∈ H̃2 is defined to equal w(ω) for all ω for which ||w(ω)||~ is finite, and zero otherwise, then w̃(ω) is an H̃2-valued random variable with mean zero and with a nuclear covariance operator K̃. Conversely, if w(ω) ∈ H̃2 with probability one and w̃ is defined as above, then the existence of a nuclear covariance operator K̃ for w̃ implies that (3.19) is satisfied.

Proof: From (3.18) it follows that

    E||w̃(ω)||~² = Σ_{n=1}^∞ λ_n⁻¹ (K e_n, e_n) + E||P2 w(ω)||².    (3.20)

Since the second term of the right-hand side of (3.20) is finite by the properties of w, the condition (3.19) guarantees the finiteness of (3.20), which in turn implies that w(ω) ∈ H̃2 with probability one.

With the notation used in Theorem 3.1, H̃2 = M̃ ⊕ N. We establish a c.o.n.s. for H̃2 by separately specifying c.o.n. systems for M̃ and N. The c.o.n.s. for N can be taken arbitrarily; we denote it by {f_m}. For M̃ it can immediately be verified, using the fact from

Remark 2 that T_r⁻¹ e_n = λ_n^(-1/2) e_n, that {ẽ_n} is a c.o.n.s. in M̃ if ẽ_n = λ_n^(1/2) e_n. Then, with w̃ defined as in the statement of the theorem, we have with probability one

    (w̃, f_n)~ = (w, f_n)    (3.21)

and

    (w̃, ẽ_n)~ = (T_r⁻¹ w̃, T_r⁻¹ ẽ_n) = (w̃, T_r⁻² ẽ_n) = λ_n^(-1/2) (w, e_n).    (3.22)

From (3.21) and (3.22) and the fact that w is weakly measurable, it follows that (w̃, f_n)~ and (w̃, ẽ_n)~ are measurable scalar-valued functions. Then a limit argument implies that w̃ is weakly measurable, and hence is an H̃2-valued random variable (see Appendix B). It now follows from the fact that E||w̃||~² < ∞, which has already been established, that w̃ has a bounded (and hence self-adjoint) covariance operator K̃. The proof is the same as the corresponding proof in Appendix B.

To show that K̃ is nuclear, we show that K̃^(1/2) is Hilbert-Schmidt and use the fact ([G-1], p. 39) that the square of a Hilbert-Schmidt operator is nuclear. In fact

    Σ_{n=1}^∞ ||K̃^(1/2) ẽ_n||~² = Σ_{n=1}^∞ (K̃ ẽ_n, ẽ_n)~ = Σ_{n=1}^∞ E|(ẽ_n, w̃)~|²
                                = Σ_{n=1}^∞ λ_n⁻¹ E|(e_n, w)|²
                                = Σ_{n=1}^∞ λ_n⁻¹ (K e_n, e_n) < ∞    (3.23)

and

    Σ_{n=1}^∞ ||K̃^(1/2) f_n||~² = Σ_{n=1}^∞ E|(f_n, w̃)~|² = Σ_{n=1}^∞ (K f_n, f_n) < ∞,    (3.24)

where we have used (3.21) and (3.22). The finiteness of the left sides of (3.23) and (3.24) demonstrates that K̃^(1/2) is Hilbert-Schmidt ([G-1], p. 34). The existence of the mean of w̃ follows as in Appendix B. Equations (3.21) and (3.22), together with the fact that Ew = 0, imply that Ew̃ = 0.

To prove the converse, we suppose K̃ exists and is nuclear. Then K̃ has finite trace, so a fortiori the sum Σ_{n=1}^∞ (K̃ ẽ_n, ẽ_n)~ must be finite. But this sum is the same as Σ_{n=1}^∞ ||K̃^(1/2) ẽ_n||~², and the conclusion follows by (3.23). |||

Thus, in the identification example, if the condition (3.19) is satisfied, we can replace H2, U and w by H̃2, Ũ and w̃, respectively, and then apply Theorem 2.2 to the new model with the assurance that the new model is just as faithful a representation of the real-life problem as was the original one. Theorem 2.2 will apply, of course, only if the covariance operator of the noise is such that conditions (2) and (3) of that theorem hold. These conditions do not conflict, at least in general, with (3.19), as consideration of Corollary 2.1 and the remark (2) following it will show.

As indicated earlier, there are a number of comments to be made, both regarding the example as such and its place in identification theory.

Remarks

(1) The condition (3.19) can be interpreted as specifying that a "noise-to-signal ratio" is finite, or more properly that the sum of noise-to-signal ratios in each orthogonal component is finite. (K e_n, e_n) is the variance of the noise component (w, e_n), and λ_n is the square of the "signal" component along e_n.

(2) There is no difficulty in extending the system model of (3.7) to the finite-dimensional vector case where u and y are vector-valued (cf. [R-3]). The same results follow.

(3) The restriction that F be time-invariant can be relaxed in the following way, with no essential changes resulting. Let the kernels k_n be of the form

    k_n(t, v_1, ..., v_n) = Σ_{i=1}^M a_i(t) k_{ni}(v_1, ..., v_n).    (3.25)

With M finite and suitable restrictions on the a_i(t), the function spaces are still Hilbert spaces, and U is still compact (cf. [R-4]). Certain classes of time-varying systems can be modeled this way.

(4) The estimate one obtains is, of course, only for F1, the projection of F on N(U)⊥. How "large" N(U)⊥ is depends, of course, on U. If T is finite, one can choose a sequence of inputs u1, u2, ..., of duration m − τ, say (recall that m is the length of T); then, allowing τ units of time between inputs (dead time), make repeated measurements. If p measurements are made, the output space becomes the p-fold product of L2(T), the input space becomes the p-fold product of L2(T − τ), and U is suitably determined by u1, ..., u_p; F, however, remains effectively the same. Clearly, the N(U)⊥ form a monotone nondecreasing sequence of

subspaces with increasing p. One would expect to do a better job of identifying F as more measurements are made, if the u_i are suitably chosen (see Remark (5)). But making the p repeated measurements is just one special way of making one measurement where the input has duration pm − τ and the output has duration pm.

(5) From the point of view of identification theory as reasonably constrained by practical considerations, we feel a better approach than what has been done here is the following. Assume a priori that the class of admissible systems is a compact subset of the kernel space, and that the class of admissible inputs is a compact subset of L2(R1). Then, given ε > 0, the model (3.4) can be replaced by a finite set of linear equations with finitely many unknown parameters, with the property that the solutions for the parameters in the finite set of equations determine the true system uniformly to within the specified ε > 0. With the addition of observation noise w, the problem becomes that of estimating parameters in a finite-dimensional version of (3.6). Then the classical theory of pseudoinverses and of LUMV estimation applies (cf. [R-3], for example). One can at least argue that the compactness assumptions required are reasonable for most problems.

When identification is studied from the point of view of finite-dimensional approximations, the question raised in the previous remark about the number of measurements necessary (or the required duration of one measurement) is easier to phrase satisfactorily, and is often answerable, in principle at least (cf. [R-3]).
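The finite-dimensional program of remark (5) is, in modern numerical terms, ordinary least-squares parameter estimation. A minimal sketch (NumPy; the regression matrix Phi, the parameter vector theta, and the noise level are hypothetical stand-ins for the truncated version of (3.6)):

```python
import numpy as np

rng = np.random.default_rng(1)

# After truncation, the identification model reads y = Phi theta + w, with
# finitely many unknown kernel parameters theta (names are illustrative).
n_obs, n_par = 50, 4
Phi = rng.standard_normal((n_obs, n_par))
theta = np.array([1.0, -0.5, 0.25, 2.0])
y = Phi @ theta + 0.01 * rng.standard_normal(n_obs)

# Classical pseudoinverse / least-squares estimate of the parameters.
theta_hat = np.linalg.pinv(Phi) @ y
print(theta_hat)
```

With well-conditioned Phi and small noise, theta_hat recovers theta closely; this is the setting in which the classical matrix PI and LUMV theory apply directly.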

(6) In fairness to the reader unfamiliar with the area, it should be pointed out that the application of Volterra polynomials to problems in system identification has been considered for some time (cf. [B-2]) but, as has been indicated, has not been widely used in practice. A much more common approach is to estimate parameters directly in a state-variable model, using techniques from the Kalman-Bucy recursive filtering theory.

PART IV

ON THE QUADRATIC REGULATOR PROBLEM¹

In optimal control theory, no problem has received more attention or attained a greater degree of maturity than the one indicated by the title of this section (see [I-1] for detailed summaries and bibliographical information). Various versions of the quadratic regulator problem (hereafter designated QRP) have appeared in standard texts [A-3] [B-6], perhaps because minimization of a quadratic loss functional applied to a linear dynamical system with linearly constrained inputs leads to a simple and easily understood solution. Here we shall exhibit the PI as unifying apparently disparate aspects of the QRP, providing generalizations beyond its customary form, and having broad capabilities to solve QRPs specified on infinite dimensional spaces and by unbounded operators.²

The optimal control literature generally relates the QRP to the notion of a linear dynamical system, as described by

    ẋ = C(t)x + B(t)u,   x(0) = x⁰.    (4.1)

Here u(·) and x(·), usually referred to as input and output, respectively, are finite dimensional vector functions of time over [0, T], and C(·) and

¹The authors wish to thank Professor Elmer G. Gilbert for his helpful comments on this section.

²The PI was first applied to a QRP for a linear dynamical system in [K-1].

B(·) are matrix functions of consistent dimension, assuring that a unique solution x ∈ L2^n corresponds to every input u ∈ L2^p. By L2^m we mean a Hilbert function space; v ∈ L2^m implies that v(t) is an m-dimensional vector for each t ∈ [0, T], with v having measurable components v_i, and being possessed of the L2^m norm

    ||v||² = Σ_{i=1}^m ∫₀^T v_i²(t) dt.    (4.2)

We can now state one form of the QRP, sometimes called the fixed endpoint QRP. It is desired to choose u ∈ L2^p which moves the system from x(0) = x⁰ to a designated x(T) = x¹, and among the u satisfying this constraint to find the u having minimum L2^p norm (see [B-6], p. 137). As usually phrased and solved, this QRP presupposes conditions insuring that any x¹ can actually be attained by the system (4.1) at time T; the solution is then said to exist. But actually, this is an unnecessarily narrow interpretation of the fixed endpoint problem, due to the limitation of the techniques sometimes used. Application of the PI places this QRP in a natural setting admitting easy generalizations.

Let us begin our analysis by describing a classical approach to the fixed endpoint QRP, and thus exhibiting the limitations resulting therefrom. Since

    x(T) = ∫₀^T W(T, t)B(t)u(t) dt + W(T, 0)x⁰    (4.3)

in terms of the state transition matrix W(·,·), the constraint x(T) = x¹ can be expressed for an n-dimensional x¹ by the simultaneous linear equations

    (u, h_i) = a_i,   i = 1, 2, ..., n,    (4.4)

in which a_i is the i-th component of x¹ − W(T, 0)x⁰, and h_i(·) is the p-component row vector function with

    h_i(t) = [δ_{i1} δ_{i2} ... δ_{in}] W(T, t)B(t),    (4.5)

that is, the i-th row of W(T, t)B(t). To complete the classical solution, let M be the subspace of L2^p spanned by h_1, h_2, ..., h_n, and write u in terms of the orthogonal decomposition

    u = Σ_{i=1}^n α_i h_i + u_1    (4.6)

with u_1 ∈ M⊥. If (4.6) is substituted in (4.4), we see that u meets the required constraint x(T) = x¹ iff

    Σ_{j=1}^n α_j (h_j, h_i) = a_i,   i = 1, 2, ..., n.    (4.7)

The second term u_1 in (4.6) plays no role in meeting the constraint, and (by the Pythagorean theorem for orthogonal elements in Hilbert space) only serves to increase ||u||; consequently, ||u|| is minimized uniquely by choosing

    u_1 = 0.    (4.8)

Evidently, the system can satisfy the final state constraint x(T) = x¹ for arbitrary x¹ iff the matrix with elements {(h_i, h_j)} is nonsingular. This matrix is a Gramian [A-1], which means that its invertibility is equivalent to the linear independence of the h_i, i = 1, 2, ..., n, in L2^p. If we say, following standard terminology in

linear systems theory, that the system is controllable over [0, T] if, for every x⁰ and x¹ there exists an input u ∈ L2^p taking the system from x(0) = x⁰ to x(T) = x¹, we have attained the following result: the system (4.1) is controllable over [0, T] iff the n rows of the matrix W(T, ·)B(·) are linearly independent in L2^p. This criterion is well known in systems theory [C-1]; our arguments constitute a short and elegant proof of its validity.

We now turn to a PI formulation generalizing the fixed endpoint QRP in several directions. First, the QRP can be solved even if the system is not controllable. In fact, the optimal input u₀ to (4.1) minimizes ||x(T) − x¹|| over all u ∈ L2^p, regardless of controllability; if the system (4.1) should be controllable, this norm is zero, and the u₀ obtained via the PI model agrees with the optimum resulting from the classical problem statement. Second, we shall be able to consider constraints not only on the endpoint x(T), but also on x and u. These three classes of constraints can all be expressed in terms of linear functionals on u, so that they can be viewed in unified form. It is also noteworthy that the PI yields valid optima even when the constraints are incompatible with one another. Finally, the PI model permits us to consider dynamical systems (4.1) in which u(t) and x(t) are Hilbert-space-valued (i.e., infinite dimensional) for each t ∈ [0, T], although we will continue to be restricted to a finite set of constraints.

To apply the PI to the fixed endpoint QRP, we need to express the input-output system relation by a linear transformation, that is,

    Lu = x,    (4.9)

where u is any element of a Hilbert space H1, and L: H1 → H2 is a linear bounded operator defined for every such u. In general, a dynamical system such as (4.1) fails to satisfy (4.9), but we may write (substituting x̂ for x)

    x̂(t) = (Lu)(t) + S(t)x⁰;    (4.10)

then, taking x = x̂ − Sx⁰, we see (4.9) to be applicable without loss of generality.

The linear constraints are considered next. They represent desired system behavior, but do not preclude the possibility that they cannot be attained; in the latter event, the PI automatically chooses a u which causes the constraint values to be approached as closely as possible. As we have mentioned, there are three types of constraints, all of which ultimately lead to constraint equations having the form (u, f_i) = a_i.

A linear functional on u is already in the stated form. As for a linear functional on x, we may write G_i(x) = a_i which, by the Riesz representation theorem and the definition f_i = L*g_i, gives rise to

    G_i(x) = (x, g_i) = (Lu, g_i) = (u, L*g_i) = (u, f_i) = a_i;    (4.11)

the constraint is then in the proper form. Lastly, choosing f_i = h_i as in (4.5), and defining a_i as in the equation above, (4.5) renders (u, f_i) = [x(T)]_i and a_i = x¹_i, whence (u, f_i) = a_i, i = 1, 2, ..., n, is equivalent to x(T) = x¹.

Application of the PI involves an operator equation to which the PI gives the BAS. Thus, the constraints expressed as (u, f_i) = a_i must

be translated into a more appropriate form for this purpose. Accordingly, let H3 be a separable Hilbert space with a maximal linearly independent set {y_i}, and define A: L2^p → H3 by

    Au = Σ_i (u, f_i) y_i.    (4.12)

Then, with

    z = Σ_i a_i y_i,    (4.13)

we obtain

    Au = z    (4.14)

as the operator equation embodying all the desired constraints.

If a finite constraint set is set forth, the range of A is a finite dimensional set in H3 and perforce closed. By Theorem 1.2 and Corollary 1.2, A possesses a PI A⁺ which is bounded and delivers a BAS (cf. Definition 1.1)

    u₀ = A⁺z,    (4.15)

which means u₀ ∈ L2^p is the unique input to the dynamical system satisfying

    inf_{u∈L2^p} ||z − Au||² = Σ_{i,j} [(u₀, f_i) − a_i][(u₀, f_j) − a_j](y_i, y_j)    (4.16)

and having smallest L2^p norm (i.e., least energy) among the inputs attaining the infimum (4.16). By choosing suitable families {y_i} we are thus able to prove existence and uniqueness of optimal controls for a variety of quadratic

loss functions. We remark parenthetically that if there exists a u ∈ L2^p exactly satisfying all the constraints (u, f_i) = a_i, then (4.16) becomes zero, and u₀ is the element of smallest norm which meets the constraints. In particular, if the constraints represent x(T) = x¹ and the system (4.1) is controllable, the u₀ furnished by the PI (4.15) is the same as the optimum obtained as the solution of the classical fixed endpoint QRP.

In applying the PI to the QRP in the manner described, we need not be concerned with the consistency of the constraints; if the constraints are incompatible, the PI mediates among them to produce a compromise BAS. To illustrate this property, we give an example involving a simple constraint. Consider therefore the system (4.4) with n = 2, h_1 = h_2 = h and a_1 ≠ a_2, so that (4.7) is clearly impossible to satisfy. We may, however, apply (4.15) to determine the BAS u₀. The latter naturally depends on the choice of y_1 and y_2, as can be seen from (4.16). If we choose y_1 and y_2 orthonormal (for convenience), we have N(A)⊥ = V{h}, R(A) = V{y_1 + y_2}, Ah = ||h||²(y_1 + y_2), and z = a_1 y_1 + a_2 y_2. In view of P_R z = ½(a_1 + a_2)(y_1 + y_2), the description of A⁺ by Theorem 1.2 leads to

    u₀ = A⁺z = (a_1 + a_2) h / (2||h||²).

Thus far, reference in (4.1) has been to dynamical systems in which x(t) and u(t) are finite dimensional vectors of fixed dimension for each t. Since we have made no use of the finite dimensionality, we may as well generalize (4.1) as follows: u(t) ∈ K1 for almost every

t ∈ [0, T], K1 being a separable Hilbert space. Assume u(·) strongly (Bochner) measurable [H-2], and u ∈ H(K1), that is,

    ||u||² = ∫₀^T ||u(t)||² dt < ∞.    (4.17)

Further, in (4.1), let B(·) be an essentially uniformly bounded measurable operator-valued function which for each t carries K1 into a new Hilbert space K2, and let C(·) be locally Bochner integrable, with C(t) an endomorphism on K2 for almost every t. Then, if (4.1) is interpreted in integral equation form, there is a unique solution x(·) which is uniformly bounded (cf. [M-1]) and hence belongs to H(K2); indeed, one obtains (4.9) and (4.10) as before, with L: H(K1) → H(K2) and S(T): K2 → K2 both bounded operators. The remainder of the analysis is unchanged, provided that there is a finite constraint set, which then implies that R(A) in (4.14) is finite dimensional.

The finite dimensionality of R(A) [with A specified by (4.12)] becomes inconsistent with a constraint of the type x(T) = x¹ whenever K2 is infinite dimensional. At this point we could confine ourselves to a finite constraint set as represented by the projection of x¹ onto a hyperplane of finite dimension. Alternatively, we might attempt to deal more directly with infinite dimensional x(T) (or any infinite constraint set) by choosing {f_i} and {y_i} in Au = Σ_i (u, f_i) y_i so that A [cf. (4.12)] is DDC with closed range. Only then is the PI A⁺ defined for arbitrary z (Corollary 1.2), and hence for every choice

of x¹. Moreover, z makes sense only if the sum Σ_i a_i y_i appearing in (4.13) is convergent in H3. The requirements on A and z are formidable, and cannot be met in general. For instance, the obvious terminal constraint choice of {y_i} as a complete orthonormal set in H3 = K2, with the a_i the projections of x¹ on the y_i, leads to a convergent sum for z, but in combination with a DDC operator A whose range is not closed.

The limitation of the above PI technique to a finite constraint set is avoided in a somewhat different QRP model, which is termed the free endpoint QRP. As usually described in the optimal control literature, the free endpoint QRP is concerned with moving the system (4.1) from x(0) = x⁰ toward the origin at time T with minimum expenditure of energy, simultaneously maintaining x as small as possible. More precisely ([B-6], Section 21), one wishes to attain the infimum of the loss function

    J(u) = ∫₀^T [||u(t)||² + ||Q2 x(t)||²] dt + ||Q3 x(T)||²,    (4.18)

where the norms are Euclidean norms on the relevant finite dimensional vector spaces. Analysis of the free endpoint QRP revolves around the existence and uniqueness of the optimal control [i.e., the u attaining the infimum of (4.18)], and the computation of the optimum whenever it does exist.

In what follows, we shall generalize the free endpoint QRP to infinite dimensional spaces, and demonstrate that the optimal input is (uniquely) provided by the PI even for some models involving unbounded operators. Moreover, our loss function

    I(u) = ||Q1(u − z1)||² + ||Q2(x − z2)||² + ||Q3[x(T) − z3]||²    (4.19)

will be more general than the J(·) of (4.18). The z_i in (4.19) may be thought of as target values of the input, output and terminal state sought in the system design; they also reflect the change of variable (4.10), so that (4.19) becomes equivalent to (4.18) if we specify in particular z1 = 0, z2(t) = −S(t)x⁰ and z3 = −S(T)x⁰. We should mention, however, that any interpretation of (4.19) in terms of an optimal control problem is merely an intuitive convenience, since we shall make no use of any special properties of the dynamical system (4.1).

For treating the minimization problem incompletely posed by (4.19), we establish the following structure. Let u ∈ H1 (an arbitrary Hilbert space), and take L: H1 → H2 with

    Lu = x.    (4.20)

It will be supposed that L, like all other operators mentioned here, is linear and DDC. We shall assume further that H2 is a Hilbert function space, so that we are writing x for x(·), and that there is a linear DDC operator S: H1 → H3 such that

    x(T) = Su.    (4.21)

We introduce yet another Hilbert space, K = H1 × H2 × H3; its norm is the standard

    ||{u, v, w}||² = ||u||² + ||v||² + ||w||².    (4.22)

The operator A: H1 → K is defined by

    Au = {u, Lu, Su},    (4.23)

and we have

    D(A) = D(L) ∩ D(S).    (4.24)
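In finite dimensions the structure (4.20)-(4.24), combined with the loss (4.19), is a stacked least-squares problem, and the optimal input is produced by a matrix pseudoinverse, as Theorem 4.1 asserts in general. A sketch (NumPy; all matrices and targets are randomly chosen, hypothetical stand-ins for the operators of the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-dimensional stand-ins: H1 = R^4, H2 = R^6, H3 = R^2.
L = rng.standard_normal((6, 4))     # u -> x, cf. (4.20)
S = rng.standard_normal((2, 4))     # u -> x(T), cf. (4.21)
Q1, Q2, Q3 = 2.0 * np.eye(4), np.eye(6), 0.5 * np.eye(2)
z1, z2, z3 = rng.standard_normal(4), rng.standard_normal(6), rng.standard_normal(2)

# A u = {u, Lu, Su} stacked as one matrix; Q acts blockwise on K.
QA = np.vstack([Q1, Q2 @ L, Q3 @ S])
Qz = np.concatenate([Q1 @ z1, Q2 @ z2, Q3 @ z3])

u0 = np.linalg.pinv(QA) @ Qz        # the minimizer (QA)+ Qz

def loss(u):
    # the loss (4.19)
    return (np.linalg.norm(Q1 @ (u - z1)) ** 2
            + np.linalg.norm(Q2 @ (L @ u - z2)) ** 2
            + np.linalg.norm(Q3 @ (S @ u - z3)) ** 2)

# u0 satisfies the normal equations (QA)*(QA) u = (QA)* Qz ...
assert np.allclose(QA.T @ QA @ u0, QA.T @ Qz)
# ... and random perturbations can only increase the loss.
assert all(loss(u0) <= loss(u0 + 0.1 * rng.standard_normal(4)) for _ in range(5))
print(loss(u0))
```

The identity block contributed by Q1 makes the stacked matrix injective (the finite-dimensional analogue of the bound (4.29) in the proof that follows), which is why the minimizer is unique.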

Since L and S are both closed, the graph of A is closed, and A itself is a closed operator. But D(A) is not necessarily dense in H1, so A must be assumed densely defined by hypothesis. Lastly, let us define Q: K → K by the relation

    Q{u, v, w} = {Q1 u, Q2 v, Q3 w},    (4.25)

where each Q_i is not only bounded, but also bounded from below. The latter condition is expressed by the existence of an a > 0 such that

    ||Q_i y_i|| ≥ a ||y_i||   for all y_i ∈ H_i.    (4.26)

We can now state

Theorem 4.1: Let L and S be DDC operators as defined above, with D(L) ∩ D(S) dense in H1. Assume that each Q_i is bounded and bounded below. Then, for each z = {z1, z2, z3} ∈ K there exists a unique u₀ ∈ H1 satisfying

    I(u₀) = inf_{u∈H1} I(u),    (4.27)

with I as in (4.19). This u₀ is given by

    u₀ = (QA)⁺ Qz.    (4.28)

Proof: It is clear from (4.25) and the properties of the Q_i that Q is an endomorphism on K, and is bounded below; from this (and the fact that A is DDC) one shows QA to be a closed operator. Furthermore, A is densely defined, so the same is true for QA, and in fact QA is DDC. The range of QA is closed, because [see Lemma A.8(e)] for all u ∈ D(A)

    ||QAu|| ≥ a||Au|| ≥ a||u||,    (4.29)

and the inequality also shows that QA is one-to-one. In the light of

these properties, we can now consider the BAS for the equation

    (QA)u = Qz;    (4.30)

since QA is DDC with R(QA) closed, the BAS is furnished for every Qz ∈ K (Corollaries 1.1 and 1.2) by the PI. This means the u₀ of (4.28) satisfies

    ||Qz − (QA)u₀|| = inf_{u∈H1} ||Qz − (QA)u||.    (4.31)

If we compare the norm of Qz − (QA)u with the expression (4.19) for I(u), we see at once that they are identical, so that (4.31) is equivalent to (4.27). To prove that u₀ is the unique element attaining the infimum of (4.31), note (from the proof of Theorem 1.1) that (QA)u₀ must be the projection of Qz on R(QA). But QA is one-to-one, so u₀ is the only element of H1 such that (QA)u₀ = P_R(Qz). |||

Various other combinations of assumptions also yield free endpoint QRPs having the same solution (4.28). Most of these are of little interest, and can be reproduced by the reader as needed. However, we may want to consider bounded L and S, especially since these correspond to the dynamical system (4.1) with infinite dimensional vector-valued functions, as per our earlier discussion. We find that the assumptions on Q2 and Q3 can then be relaxed, as is indicated by

Corollary 4.1: If L and S are bounded operators, if each Q_i is bounded, and if Q1 is bounded from below, the conclusions of Theorem 4.1 continue to hold.

Proof: QA is now a bounded operator which is everywhere defined. The remainder of the proof of Theorem 4.1 remains unchanged, the same assertions being valid throughout. |||
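The compromise furnished by the PI for the incompatible pair of constraints considered earlier in this section (h_1 = h_2 = h with a_1 ≠ a_2) can be reproduced numerically. In the sketch below (NumPy), the vector h and the targets a_1, a_2 are arbitrary illustrative choices.

```python
import numpy as np

# Two identical constraint functionals with different target values:
# A u = z has no solution, and the PI delivers the compromise BAS.
h = np.array([1.0, 2.0])
a1, a2 = 1.0, 3.0

A = np.vstack([h, h])            # (Au)_i = (u, h), the matrix analogue of (4.12)
z = np.array([a1, a2])

u0 = np.linalg.pinv(A) @ z       # BAS, as in (4.15)

# Predicted compromise: u0 = (a1 + a2) h / (2 ||h||^2)
u_pred = (a1 + a2) * h / (2 * h @ h)
assert np.allclose(u0, u_pred)
print(u0, u0 @ h)                # the constraints are met "on average": (u0, h) = 2
```

The PI splits the difference between the two inconsistent target values, exactly as the closed-form expression for u₀ in the example predicts.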

PART V

PSEUDOINVERSE OPERATOR APPROXIMATIONS

In the literature on the matrix PI, it is sometimes proposed that the PI be obtained by a sequence of approximations involving AA* or A*A (see [A-2], Theorem 3.4 and [L-2], p. 167). Such forms exhibit computational simplicity, since they substitute the inversion of a strictly positive (Hermitian) matrix for more complicated combinations of inverses.

When approximations of the same type are applied to DDC operators, the objectives are necessarily different. Computational simplicity is no longer at issue; we are now concerned with analytical tractability, particularly for unbounded operators whose range may not be a closed set. One goal is to circumvent the problems that inevitably arise when R(A) is not closed, and A⁺ is unbounded and not defined everywhere; we have already sought solutions in Parts III and IV by changing the topology, but now we also consider approximating sequences of operators.

The description of A⁺ and approximations thereto in terms of the positive operators AA* or A*A not only generalizes matrix approximations of the same type, but also suggests use of the spectral representation and the associated functional calculus. We shall find that the PI can be expressed in terms of positive operators (and the functional calculus) as a consequence of the use of the polar decomposition ([D-2], XII.7.7); indeed, the polar decomposition proves to be an extremely

powerful tool in the analysis of the PI, and will be used often in this section. Finally, we remark that the approximations considered below are closely related to the operators A' and A'' defined by (1.38) and (1.44), respectively; these operators are studied extensively in Part I, and find application in Parts II and III.

Since we are dealing with DDC operators, we shall face questions that fail to arise in the context of the matrix PI. In addition to the boundedness and domain properties, as in Part I, we shall need to investigate the mode of convergence for each approximation to the PI. In particular, we shall show that certain approximation sequences always converge to the PI strongly, but in norm iff R(A) is a closed set.

In what follows, we shall freely use the notation and results of Part I and Appendix A, although we shall assist the reader with specific references whenever possible. For future reference, we also call the reader's attention to the well known content of

Lemma 5.1: Every DDC operator A has a polar decomposition

    A = VS,    (5.1)

where V: H1 → H2 is a partial isometry from N(A)⊥ onto the closure of R(A), and S is the positive operator (A*A)^(1/2) with spectral representation

    S = (A*A)^(1/2) = ∫₀^∞ λ dG_λ.    (5.2)

In (5.2), G₀ is the projection on N(A). Alternatively, A may be decomposed as

    A = TU*,    (5.3)

U: H2 → H1 being a partial isometry from the closure of R(A) onto N(A)⊥ [and null on

R(A)⊥], and T: H2 → H2 the positive operator

    T = (AA*)^(1/2) = ∫₀^∞ λ dE_λ,    (5.4)

where E₀ is the projection on R(A)⊥.

Proof: The first decomposition is stated and proved in [D-2], XII.7.7, and the spectral representation is discussed in XII.2 of the same reference. Now apply this decomposition to A*, i.e., A* = UT. On taking the adjoint of A*, we obtain (5.3); the equality A = TU* is a consequence of U bounded ([R-2], Section 115). From the equality ([D-2], XII.2.6)

    ||Ty||² = ∫₀^∞ λ² d||E_λ y||²

and the properties of the resolution of the identity, we see that E₀ is the projection on N(T) = R(T)⊥ = R(AA*)⊥ = R(A)⊥ (compare Corollary A.3). The proof that G₀ is the projection on N(A) is similar. |||

The representations (5.1) and (5.3) lead to a natural way of expressing approximations to the PI when these involve

    AA* = T²  or  A*A = S².    (5.6)

To compare the approximations conveniently with A⁺, we then write the latter in terms of T (or S) also. Furthermore, the formulation of A⁺ in terms of the polar decomposition may be of some independent interest. Motivated by these considerations, we state and prove

Theorem 5.1: The PI A⁺ of A can be represented as

    A⁺ = U_r T_r⁻¹ P_R,    (5.7)

or alternatively

    A⁺ = S_r⁻¹ V_r⁻¹ P_R.    (5.8)

Proof: We apply the characterization A⁺ = A_r⁻¹ P_R of Theorem 1.2. From the definition of U* as a partial isometry, N(U*) = N(A), so the results of Appendix A on restrictions of operators are applicable. By Lemma A.9, we may write U*_r to denote either (U*)_r or (U_r)* interchangeably; hence, for x ∈ N(A)⊥, A_r x = Ax = TU*_r x. Furthermore, U*_r is a unitary operator on the restricted spaces, with R(U*_r) the closure of R(A), so that we even have

    A_r = T_r U*_r.    (5.9)

To compute A_r⁻¹ from (5.9) [this inverse exists by Lemma A.6] we first argue that T_r is invertible; this follows because (5.9) implies A_r (U*_r)⁻¹ = T_r, with both operators on the left side of the equality having inverses. With the existence of T_r⁻¹, inversion of (5.9) yields A_r⁻¹ = U_r T_r⁻¹ (cf. [R-2], Section 114), and this proves (5.7).

The argument for (5.8) is analogous, and even easier. The closure of R(S) is N(A)⊥, so that A_r = V_r S_r, with V_r a unitary operator from N(A)⊥ to the closure of R(A); hence A_r⁻¹ = S_r⁻¹ V_r⁻¹, and A⁺ is given by (5.8) as claimed. |||

To avoid possible confusion, we again remind the reader that the restrictions applied to any operator are to the nullspace or closure of the range of A. The identities and relationships given in Appendix A apply to U_r, T_r, etc. only because the null and/or range spaces coincide with N(A) and/or the closure of R(A).
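In finite dimensions the representation (5.7) can be assembled from the singular value decomposition, which supplies both factors of the polar decomposition. A sketch (NumPy; the matrix A is an arbitrary rank-deficient stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2

# The SVD gives the ingredients of the decomposition A = T U*, with
# T = (AA*)^{1/2} and U* a partial isometry from N(A)-perp onto R(A).
W, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-8 * s[0]))
Wr, sr, Vr = W[:, :r], s[:r], Vt[:r, :].T

T = Wr @ np.diag(sr) @ Wr.T          # (AA*)^{1/2}, zero on R(A)-perp
Ustar = Wr @ Vr.T                    # partial isometry; check A = T Ustar
assert np.allclose(T @ Ustar, A)

# Theorem 5.1: A+ = U_r T_r^{-1} P_R, assembled from the restricted pieces.
P_R = Wr @ Wr.T                      # projection on R(A)
Tr_inv = Wr @ np.diag(1.0 / sr) @ Wr.T
Ur = Vr @ Wr.T                       # the unitary part, restricted

A_plus = Ur @ Tr_inv @ P_R
assert np.allclose(A_plus, np.linalg.pinv(A))
print(np.round(A_plus, 3))
```

Agreement with the library pseudoinverse confirms that (5.7) is simply the familiar SVD formula for the matrix PI, read through the polar decomposition.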

The restrictions of S and T are easily identified in terms of the spectral representations (5.2) and (5.4). Since the spectral representations will appear again later in this section, we need

Lemma 5.2: Let

    H_λ = G_λ − G₀  and  F_λ = E_λ − E₀.    (5.10)

Then S_r and T_r have the respective spectral representations

    S_r = ∫₀^∞ λ dH_λ  and  T_r = ∫₀^∞ λ dF_λ.    (5.11)

Proof: The verifications are identical for S_r and T_r, so we need prove the spectral representation only for the first of these. From Lemma 5.1, G₀ is the projection on N(A), whence G_λ x = H_λ x for any x ∈ N(A)⊥; this means the spectral integrals (5.2) and (5.11) are the same when applied to any such x. It remains to show that H_λ is a resolution of the identity on the Hilbert space N(A)⊥. Evidently, H_λ is a right continuous increasing family of projections with H₀ = 0 and lim_{λ→∞} H_λ = I − G₀. The latter is the projection on N(A)⊥, which is the identity in the restricted space. Thus, H_λ satisfies all the conditions required of a resolution of the identity. |||

We are now ready to define the approximations to the PI which constitute the principal objects of our study. For any positive number σ, let

    A'_σ = A*[σI + (AA*)]⁻¹.    (5.12)

In the finite dimensional case ([A-2], Theorem 3.4 and [L-2], p. 167) the limit of A'_σ exists as σ → 0, and in fact

    lim_{σ→0} A'_σ = A⁺.    (5.13)
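The finite dimensional limit (5.13) is readily observed numerically. In the sketch below (NumPy), A is built with known singular values 3, 2, 1, so that the error of A'_σ, which behaves like σ/(s_i(σ + s_i²)) in each singular direction, is predictable; the construction is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

# A rank-3 operator from R^5 to R^6 with singular values 3, 2, 1.
W, _ = np.linalg.qr(rng.standard_normal((6, 3)))
V, _ = np.linalg.qr(rng.standard_normal((5, 3)))
A = W @ np.diag([3.0, 2.0, 1.0]) @ V.T

A_plus = np.linalg.pinv(A)

# A'_sigma = A* [sigma I + A A*]^{-1}, eq. (5.12): only a strictly positive
# Hermitian matrix need be inverted.
def A_prime(sigma):
    return A.T @ np.linalg.inv(sigma * np.eye(6) + A @ A.T)

# Closed range (finite rank), so the convergence (5.13) holds in operator norm.
errs = [np.linalg.norm(A_prime(s) - A_plus, 2) for s in (1e-1, 1e-3, 1e-6)]
print(errs)
assert errs[0] > errs[1] > errs[2]
```

The errors shrink in proportion to σ, the closed-range behavior asserted by Theorem 5.2 below; for an operator with non-closed range the corresponding norm errors would not tend to zero.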

Since one is only required to invert a positive matrix to implement $A'_\sigma$, (5.12) appears as an attractive formula for approximating the PI. Some matrix manipulations on (5.12), followed by a passage to the limit, yield
$$\lim_{\sigma \to 0} A'_\sigma = A' \tag{5.14}$$
where we have previously defined
$$A' = A^*(AA^*)_r^{-1} P_R. \tag{5.15}$$
Clearly, $A'$ is another form of the PI in finite dimensional spaces; this also is obtained as a special case of our Corollary 1.4. As we shall see, the simple and straightforward results pertaining to finite dimensional spaces continue to hold in the more general context of operators whose range is closed. In the event $R(A)$ is not closed, $A'$ is an RPI as shown by Theorem 1.7, whereas $A^+$ behaves quite differently from $A'$, and moreover $A'_\sigma$ fails to converge to $A^+$ in norm as $\sigma \to 0$. The comparison of $A^+$, $A'$ and $A'_\sigma$ for operators of non-closed range is of particular interest in Part II, where the PI of a Hilbert-Schmidt operator makes its appearance.

The essential facts regarding $A'_\sigma$ are summarized in

Theorem 5.2: The operator $A'_\sigma$ given by (5.12) is bounded and defined on all $H_2$, and the same is true for
$$AA'_\sigma = (AA^*)[\sigma I + (AA^*)]^{-1}. \tag{5.16}$$
The restriction of $A'_\sigma$ to $D(A^+)$ converges strongly to $A^+$. The convergence
$$\lim_{\sigma \to 0} A'_\sigma = A^+ \tag{5.17}$$
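The finite-dimensional limit (5.13) is easy to observe numerically. A minimal sketch (the rank-deficient matrix and the $\sigma$ values are arbitrary illustrative choices): $A'_\sigma = A^*(\sigma I + AA^*)^{-1}$ requires only the inversion of a positive definite matrix, and its distance to the pseudoinverse shrinks with $\sigma$.

```python
import numpy as np

# Arbitrary rank-deficient 5x4 example, singular values 3, 1, 0.5, 0.
A = np.zeros((5, 4))
A[0, 0], A[1, 1], A[2, 2] = 3.0, 1.0, 0.5

def approx_pi(A, sigma):
    """A'_sigma of (5.12); sigma*I + A A^T is positive definite."""
    return A.T @ np.linalg.inv(sigma * np.eye(A.shape[0]) + A @ A.T)

A_plus = np.linalg.pinv(A)
for sigma in (1e-1, 1e-3, 1e-6):
    err = np.linalg.norm(approx_pi(A, sigma) - A_plus)
    print(f"sigma={sigma:.0e}  ||A'_sigma - A+|| = {err:.2e}")
```

Because $R(A)$ is automatically closed in finite dimensions, the convergence seen here is in norm, consistent with the closed-range case of Theorem 5.2.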

is uniform (i.e., in norm) iff $R(A)$ is a closed set.

Remark: Since $AA'_\sigma$ is defined for all $z \in H_2$, one might conjecture that $A'_\sigma$ accomplishes what $A^+$ cannot. As we know, the BAS fails to exist and $z \notin D(A^+)$ if $P_R z \notin R(A)$ (see Theorem 1.1), but it is reasonable to expect that $A'_\sigma$ delivers something close to the optimum as $\sigma$ becomes small. These possibilities will be explored later in this section.

Proof: It is known that $[\sigma I + (AA^*)]^{-1}$ is bounded and everywhere defined ([R-2], Section 118), and since $A^*$ is DDC, the operator $A'_\sigma$ is closed. We also have
$$R\{[\sigma I + (AA^*)]^{-1}\} = D[\sigma I + (AA^*)] = D(AA^*) \subset D(A^*) \tag{5.18}$$
so $A'_\sigma$ is everywhere defined and, being closed, must be bounded according to the closed graph theorem. The same reasoning shows $AA'_\sigma$ bounded and everywhere defined also.

For a comparison of $A^+$ and $A'_\sigma$ it suffices to consider the application of these operators to $z \in R(A)$. In fact, $N(A^+) = R(A)^\perp$ as shown in Lemma 1.1, and we now prove $N(A'_\sigma) \supset R(A)^\perp$. Suppose $z \in R(A)^\perp$, which implies $[\sigma I + (AA^*)]^{-1} z = \sigma^{-1} z \in R(A)^\perp$; in other words, $R(A)^\perp$ reduces $[\sigma I + (AA^*)]^{-1}$. But $N(A^*) = R(A)^\perp$, so we obtain $A'_\sigma z = 0$.

To demonstrate the asserted strong convergence of $A'_\sigma$ to $A^+$, we apply $(A'_\sigma - A^+)$ to $z \in R(A)$. Since $z = Ax$ for some $x \in N(A)^\perp$, and since $A^+Ax = x$ for such an element (Theorem 1.3), we have
$$\|(A'_\sigma - A^+)z\|^2 = \|\{A^*[\sigma I + (AA^*)]^{-1}A - I\}x\|^2. \tag{5.19}$$
All operations on the right side of (5.19) take place in the restricted

Hilbert spaces $N(A)^\perp$ and $\overline{R(A)}$, because $[\sigma I + (AA^*)]^{-1}$ reduces $\overline{R(A)}$ as we have shown. Thus we obtain
$$\{A^*[\sigma I + (AA^*)]^{-1}A\}_r = U_r^{-1} T_r[\sigma I + T_r^2]^{-1} T_r U_r. \tag{5.20}$$
Now $U_r$ is unitary over the restricted Hilbert spaces, so that (5.19) becomes
$$\|\{T_r[\sigma I + T_r^2]^{-1}T_r - I\}y\|^2 = \int_0^\infty \left(\frac{\sigma}{\sigma + \lambda^2}\right)^2 d\|F_\lambda y\|^2 \tag{5.21}$$
for $U_r x = y$; the rules applicable to such calculations may be found in [R-2], Section 128. On putting these last three equations together we find
$$\lim_{\sigma \to 0} \|(A'_\sigma - A^+)z\|^2 = \lim_{\sigma \to 0} \int_0^\infty \left(\frac{\sigma}{\sigma + \lambda^2}\right)^2 d\|F_\lambda y\|^2. \tag{5.22}$$
The measure generated by $\|F_\lambda y\|^2$ is finite, and the integrand in (5.22) is bounded by unity. Consequently, the Lebesgue dominated convergence theorem insures that the right side of (5.22) is zero, thus yielding the strong convergence asserted by the theorem.

Suppose now $R(A)$ is not closed. Then $A'_\sigma$ remains bounded but $A^+$ is unbounded (Corollary 1.2), so $A'_\sigma$ cannot converge in norm to $A^+$. On the other hand, for $y \in R(A)$ it is always true that
$$\|(A'_\sigma - A^+)y\|^2 = \|U_r^{-1}[T_r(\sigma I + T_r^2)^{-1} - T_r^{-1}]y\|^2 = \|[T_r(\sigma I + T_r^2)^{-1} - T_r^{-1}]y\|^2 \tag{5.23}$$
by the characterization (5.7) of $A^+$, and the fact that $U_r$ is unitary. The norm of (5.23) can once more be written in terms of the spectral representation, viz.

$$\|(A'_\sigma - A^+)y\|^2 = \int_0^\infty \left(\frac{\lambda}{\sigma + \lambda^2} - \frac{1}{\lambda}\right)^2 d\|F_\lambda y\|^2 = \int_0^\infty \left(\frac{\sigma}{\lambda(\sigma + \lambda^2)}\right)^2 d\|F_\lambda y\|^2. \tag{5.24}$$
If $R(A)$ is closed, $R[(AA^*)_r]$ is likewise closed (Corollary A.3), whence $T_r^2$ and a fortiori $T_r$ have bounded inverses. But then ([R-2], Section 128) there exists an $a > 0$ such that $F_\lambda = 0$ whenever $\lambda < a$. The lower limit of integration in (5.24) can then be changed from zero to $a$, with the result that the integrand is bounded by $a^{-6}\sigma^2$ over the entire interval of integration. Consequently,
$$\|(A'_\sigma - A^+)y\|^2 \le a^{-6}\sigma^2\|y\|^2 \tag{5.25}$$
and hence $A'_\sigma$ converges to $A^+$ in norm as $\sigma$ tends toward zero. |||

In Part I, we introduced not only $A' = A^*(AA^*)_r^{-1} P_R$ but also another RPI which enjoyed a limited duality with $A'$. The latter operator was $A'' = (A^*A)_r^{-1} A^*$, whose domain is quite complicated (see Theorem 1.8) in general; nevertheless, $A''$ proved useful in connection with the applications of Part II. The relation between $A'$ and $A''$ is mirrored by a corresponding similarity of $A'_\sigma$ to $A''_\sigma$, where
$$A''_\sigma = [\sigma I + (A^*A)]^{-1} A^*, \qquad \sigma > 0. \tag{5.26}$$
The principal properties of $A''_\sigma$ are stated in

Theorem 5.3: $A''_\sigma$ is bounded and is possessed of the domain $D(A''_\sigma) = D(A^*)$. The (closed minimal) extension of $A''_\sigma$ to an operator defined on all $H_2$ converges strongly (as $\sigma \to 0$) to $A^+$ on $D(A^+)$, and converges to $A^+$ in norm iff $R(A)$ is a closed set.¹

¹Strong convergence of $A''_\sigma$ for bounded $A$ is already asserted in [B-1], p. 60.

Remark: If $A$ (and hence $A^*$) is unbounded, $A''_\sigma$ cannot be closed. To avoid the ensuing pathologies, we therefore state the convergence parts of the above theorem in terms of the closed extension of $A''_\sigma$.

Proof: The domain of $A''_\sigma$ is indeed $D(A^*)$ because $[\sigma I + (A^*A)]^{-1}$ is bounded and everywhere defined on $H_1$. To show $A''_\sigma$ bounded, observe that $(A^*)'_\sigma = A[\sigma I + (A^*A)]^{-1}$ is bounded and everywhere defined, as may be seen by applying the last theorem to $A^*$ in place of $A$. Then its adjoint $[(A^*)'_\sigma]^*$ is likewise bounded and defined everywhere on $H_2$. The adjoint satisfies (see [R-2], Section 115c)
$$[(A^*)'_\sigma]^* \supset A''_\sigma. \tag{5.27}$$
Thus $A''_\sigma$ is bounded on $D(A^*)$, and since the latter is dense in $H_2$, the minimal closed extension of $A''_\sigma$ is the bounded operator $[(A^*)'_\sigma]^*$. We call $[(A^*)'_\sigma]^* = A^\times_\sigma$.

When $R(A)$ is not closed, $A^+$ is unbounded, and $A^\times_\sigma$ cannot converge to $A^+$ in norm. On the other hand, $A^+$ is bounded whenever $R(A)$ is a closed set, and then
$$\|A^\times_\sigma - A^+\| = \|(A^\times_\sigma - A^+)^*\| = \|(A^*)'_\sigma - (A^*)^+\| \to 0. \tag{5.28}$$
These relations are justified by: (1) a bounded operator and its adjoint have the same norm ([H-], Theorem 22.2), (2) $(A^+)^* = (A^*)^+$ by Theorem 1.6, and (3) since $R(A^*)$ is closed also (Corollary A.1), $(A^*)'_\sigma$ converges to $(A^*)^+$ by Theorem 5.2 above.

To complete the proof, we verify the strong convergence of $A^\times_\sigma$ to $A^+$ on $D(A^+)$. We note at once that $N(A^\times_\sigma) = N(A^*) = R(A)^\perp = N(A^+)$, so (as in the proof of Theorem 5.2) we confine ourselves to $z \in R(A)$, which we will represent as $z = Ax$, $x \in N(A)^\perp$. Consequently, it is

sufficient to show
$$\|A^\times_\sigma Ax - A^+Ax\| \to 0. \tag{5.29}$$
Here again $A^+Ax = x$, and $A^\times_\sigma A$ may be replaced by its minimal closed extension $S^2(\sigma I + S^2)^{-1}$. That the latter is in fact the closed minimal extension follows from the definition of $A^\times_\sigma$ and
$$(A''_\sigma A)^* \supset A^*(A^*)'_\sigma = S^2(\sigma I + S^2)^{-1}, \tag{5.30}$$
in which we have used (5.16), and adopted once more the notation $S^2 = A^*A$. The minimal extension is obtained from (5.30) by taking the adjoint of both sides of the equation, and noting that
$$S^2(\sigma I + S^2)^{-1} = \int_0^\infty \frac{\lambda^2}{\sigma + \lambda^2} \, dG_\lambda, \tag{5.31}$$
wherefrom $S^2(\sigma I + S^2)^{-1}$ is self-adjoint. A repetition of the argument employed in Theorem 5.2 indicates that the restriction of $S^2(\sigma I + S^2)^{-1}$ to $N(A)^\perp$ is simply $S_r^2(\sigma I + S_r^2)^{-1}$. The left side of (5.29) is then
$$\|(A^\times_\sigma A - A^+A)x\|^2 = \|[S_r^2(\sigma I + S_r^2)^{-1} - I]x\|^2 = \int_0^\infty \left(\frac{\sigma}{\sigma + \lambda^2}\right)^2 d\|H_\lambda x\|^2. \tag{5.32}$$
The desired convergence now follows, because the integral in (5.32) tends toward zero with $\sigma$, as is easily shown through use of the dominated convergence theorem. |||

Of course, $A'_\sigma$ and $A''_\sigma$ are merely two of the many possible approximations to $A^+$. Except for the convenience of dealing with positive operators, we might as well have chosen
$$A_\sigma = U_r^{-1}(T_\sigma)_r^{-1} P_R \tag{5.33}$$
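The duality between $A'_\sigma$ of (5.12) and $A''_\sigma$ of (5.26) is transparent in finite dimensions, where the two coincide by the push-through identity $(\sigma I + A^*A)A^* = A^*(\sigma I + AA^*)$. A minimal sketch (matrix and $\sigma$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))    # arbitrary illustrative matrix
sigma = 1e-2
m, n = A.shape

prime = A.T @ np.linalg.inv(sigma * np.eye(m) + A @ A.T)    # A'_sigma (5.12)
dprime = np.linalg.inv(sigma * np.eye(n) + A.T @ A) @ A.T   # A''_sigma (5.26)
print(np.allclose(prime, dprime))   # the two forms agree
```

Note that (5.12) inverts an $m \times m$ matrix while (5.26) inverts an $n \times n$ one, so whichever of $H_1$, $H_2$ has smaller dimension dictates the cheaper form.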

with
$$(T_\sigma)_r = \int_0^\infty (\lambda + \sigma) \, dF_\lambda. \tag{5.34}$$
When so defined, $A_\sigma$ is bounded, and converges to $A^+$ in the same modes as $A'_\sigma$ (vide Theorem 5.2); proofs of these claims follow the same lines of argument already used in Theorem 5.2. Alternatively, one might have taken the $T_\sigma$ as
$$T_\sigma = \int_\sigma^\infty \lambda \, dE_\lambda, \tag{5.35}$$
or perhaps based an approximation on the decomposition (5.8) as a starting point. If we apply (5.34), or use either of the above approximations in the context of the other form of the polar decomposition (5.8), we again obtain strong convergence to $A^+$ in general, and convergence in norm whenever $R(A)$ is closed. From the practical viewpoint, little is gained by applying the approximations mentioned earlier in this paragraph; no computational advantage is evident. We do, however, assure that a solution exists for every $z \in H_2$, for the above approximations have the effect of closing the range of $A$.

When $R(A)$ is not closed, the introduction of a new topology can be used to modify the minimization problem (perhaps in undesirable fashion) to insure that $R(A)$ is closed in the new topology. One such technique has been analyzed in detail in Part III, and its validity demonstrated by Theorem 3.1. Another appeared in disguised form as part of Theorem 4.1. Here we describe an approximation similar to the latter, but in more explicit form. Let us take $K$ as the Hilbert space
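The spectral-shift variant (5.33)-(5.34) also has a simple matrix realization: in the SVD $A = U\Sigma V^T$, invert $s_i + \sigma$ in place of each nonzero singular value $s_i$. The sketch below (arbitrary rank-deficient matrix; the zero-singular-value tolerance is an implementation choice) shows the same mode of convergence to $A^+$:

```python
import numpy as np

# Arbitrary 5x4 example with singular values 3, 1, 0.5, 0.
A = np.zeros((5, 4))
A[0, 0], A[1, 1], A[2, 2] = 3.0, 1.0, 0.5

def shifted_pi(A, sigma, tol=1e-12):
    """Matrix analogue of (5.33): invert s + sigma on the range of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    inv = np.where(s > tol, 1.0 / (s + sigma), 0.0)   # (T_sigma)_r^{-1} P_R
    return Vt.T @ np.diag(inv) @ U.T

A_plus = np.linalg.pinv(A)
for sigma in (1e-1, 1e-3, 1e-6):
    err = np.linalg.norm(shifted_pi(A, sigma) - A_plus)
    print(f"sigma={sigma:.0e}  ||A_sigma - A+|| = {err:.2e}")
```

As the text observes, this buys no computational advantage over (5.12) (an SVD is more work than one positive-definite solve), but every $z$ yields a well-defined approximate solution.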

$K = H_1 \times H_2$, with norm specified by
$$\|\{x, y\}\|^2 = \sigma\|x\|^2 + \|y\|^2, \qquad \sigma > 0. \tag{5.36}$$
Given the DDC operator $A: H_1 \to H_2$, we define the new operator $A_1: H_1 \to K$ as follows:
$$A_1 x = \{x, Ax\}. \tag{5.37}$$
Because $A$ is closed, $A_1$ has closed range ([H-2], Definition 2.11.2). If we now let $z_1 = \{0, z\}$, there always exists a BAS (Definition 1.1) for the functional equation $A_1 x = z_1$, and the BAS is furnished by the PI of $A_1$, i.e., $x_\sigma = A_1^+ z_1$. In this instance, the BAS $x_\sigma$ is the unique element attaining the infimum of
$$\|A_1 x - z_1\|^2 = \|\{x, Ax\} - \{0, z\}\|^2 = \sigma\|x\|^2 + \|Ax - z\|^2. \tag{5.38}$$
Formally speaking, one hopes that for small $\sigma$ the infimum of (5.38) is close to the infimum of $\|Ax - z\|$, and that this is accomplished without excessive growth of $\|x_\sigma\|$.

Although the above touches on the question, we have been unable to solve the following: given $z \in H_2$ and $\sigma > 0$, is there a (unique?) $x \in H_1$ of smallest norm satisfying $\|Ax - z\| < \mu + \sigma$, where $\mu$ denotes the infimum of $\|Ax - z\|$? If the answer is affirmative, how is the optimum $x$ related to $z$? A partial conjecture suggests a construction of such $x$ on a "piece-by-piece" basis.¹

¹The answer to a related problem is better understood: If $A$ is a bounded operator and $C$ a closed convex bounded subset of $H_1$, there exists $x_0 \in C$ satisfying $\inf_{x \in C} \|Ax - z\| = \|Ax_0 - z\|$. To prove the assertion, consider $\{x_n\} \in C$ such that $\|Ax_n - z\|$ tends to the infimum. Now $C$ is weakly compact, so a subsequence of $\{x_n\}$ converges weakly to $x_0$, say. This subsequence has, in turn, another subsequence whose Cesàro sums $\{w_m\}$ converge (in norm) to $x_0$. Lastly, $\|Aw_m - z\|$ tends to the infimum, and $Aw_m \to Ax_0$ in norm. This result is stated as an exercise for compact operators in [B-1], p. 59.
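In finite dimensions, minimizing $\sigma\|x\|^2 + \|Ax - z\|^2$ as in (5.38) is an ordinary stacked least-squares problem, and its solution coincides with the regularized form $(\sigma I + A^*A)^{-1}A^*z$. The sketch below checks this; matrix, data and $\sigma$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))   # arbitrary illustrative operator
z = rng.standard_normal(6)
sigma = 0.05
n = A.shape[1]

# The augmented operator A_1 x = {x, Ax} with the weighted norm (5.36)
# corresponds to the stacked system [sqrt(sigma) I; A] x ~ [0; z].
stacked = np.vstack([np.sqrt(sigma) * np.eye(n), A])
rhs = np.concatenate([np.zeros(n), z])
x_stack, *_ = np.linalg.lstsq(stacked, rhs, rcond=None)

# Normal equations of the same problem: (sigma I + A^T A) x = A^T z.
x_reg = np.linalg.solve(sigma * np.eye(n) + A.T @ A, A.T @ z)

print(np.allclose(x_stack, x_reg))
```

This makes concrete why $A_1$ always admits a BAS: the stacked matrix has full column rank for every $\sigma > 0$, the finite-dimensional shadow of $A_1$ having closed range.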

In the first place, this problem (in fact, any BAS linear problem) can be reduced without loss of generality to an equivalent formulation in which $x \in N(A)^\perp$, $z \in N(A)^\perp$, $H_1 = H_2$, and $A$ is a positive operator on $N(A)^\perp$ (we omit the proof). In the new form, the optimization problem can be viewed in terms of the spectral representation of $A$, whose associated resolution of the identity now divides $N(A)^\perp$ into mutually orthogonal subspaces with projections $(E_{\lambda_{j+1}} - E_{\lambda_j})$. Now if we take
$$\Delta_j x = \lambda_j^{-1}(E_{\lambda_{j+1}} - E_{\lambda_j})z,$$
we diminish $\|Ax - z\|^2$ by $\|(E_{\lambda_{j+1}} - E_{\lambda_j})z\|^2$ while enlarging $\|x\|^2$ the amount $\lambda_j^{-2}\|(E_{\lambda_{j+1}} - E_{\lambda_j})z\|^2$; these are estimates tending to true values as $|\lambda_{j+1} - \lambda_j| \to 0$. To obtain $\|Ax - z\| < \mu + \sigma$ ($\mu$ the infimum of $\|Ax - z\|$), we take as many increments $\Delta_j x$ as are needed to bring this norm down to the desired level. For each increment $\Delta_j x$, we choose for the index $j$ the one (not already used) corresponding to the largest $\lambda_j$, and hence the smallest possible $\Delta_j x$. We remark that the orthogonality and reducing property of the incremental operators on $z$ imply
$$\|Ax - z\| = \|E_{\lambda_{n+1}} z\| \quad \text{and} \quad \|x\|^2 = \sum_{j=1}^n \lambda_j^{-2}\|(E_{\lambda_{j+1}} - E_{\lambda_j})z\|^2, \tag{5.39}$$
where $x = \sum_j \Delta_j x$, and there are a total of $n$ terms with the $\lambda_j$ arranged in decreasing order. If we formally pass to the limit, we find that
$$x = \int_\beta^\infty \lambda^{-1} \, dE_\lambda z \tag{5.40}$$
in which $\beta$ is chosen as the largest number consistent with $\|Ax - z\| < \mu + \sigma$. Since $\beta$ is dependent on $z$, (5.40) does not define a linear operator, so we must assume that the solution of the optimization problem (if this is indeed the solution) represents a nonlinear operator.

To gain further insight into the behavior of PI approximations when $P_R z \notin R(A)$, let us study an example involving $A'_\sigma$. Let us call
$$x_\sigma = A'_\sigma z, \qquad P_R z \notin R(A), \tag{5.41}$$
and evaluate $\|Ax_\sigma - z\|$ and $\|x_\sigma\|$. It will become apparent that $\|x_\sigma\| \to \infty$ as $\|Ax_\sigma - z\|$ approaches its infimum; as we shall show later in Theorem 5.4, this undesirable behavior of $x_\sigma$ is inevitable. However, by analyzing the specific case (5.41) we will obtain sharper results on the variation of $\|Ax_\sigma - z\|$ and $\|x_\sigma\|$ as a function of $\sigma$.

As we saw in the proof of Theorem 5.2, $N(A'_\sigma) \supset R(A)^\perp$, and not only $A'_\sigma$ but also $AA'_\sigma$ is defined on all $H_2$. Therefore, the first norm becomes
$$\|Ax_\sigma - z\|^2 = \|AA'_\sigma z - P_R z\|^2 + \|(I - P_R)z\|^2 = \int_0^\infty \left(\frac{\sigma}{\sigma + \lambda^2}\right)^2 d\|F_\lambda P_R z\|^2 + \|(I - P_R)z\|^2, \tag{5.42}$$
in which we have used the fact that $AA'_\sigma = T^2[\sigma I + T^2]^{-1}$, thus giving rise to a spectral representation and corresponding form for norms. From the right hand integral in (5.42), one infers that $\|Ax_\sigma - z\|$ decreases monotonically with $\sigma$, possessing the limit¹
$$\lim_{\sigma \to 0} \|Ax_\sigma - z\| = \|(I - P_R)z\|. \tag{5.43}$$

¹Equation (5.43) is actually a simple corollary to the strong convergence of $A'_\sigma$ asserted in Theorem 5.2. Obviously, $A''_\sigma$ then produces convergence in the sense of (5.43) also.

Thus, we can approach the infimum (1.1) as closely as desired by choosing $\sigma$ sufficiently small. The convergence does not require $P_R z \in R(A)$; in fact, $A'_\sigma$ can be applied to $z \in H_2$ which leaves the PI $A^+$ undefined, and retains the desirable property of minimizing $\|Ax_\sigma - z\|$ even for such $z$. But our analysis remains incomplete without consideration of the norm of $x_\sigma$. Since $A'_\sigma = U_r^{-1} T_r(\sigma I + T_r^2)^{-1} P_R$, a direct computation shows
$$\|x_\sigma\|^2 = \int_0^\infty \left(\frac{\lambda}{\sigma + \lambda^2}\right)^2 d\|F_\lambda P_R z\|^2. \tag{5.44}$$
Since the integrand is monotonically increasing as $\sigma$ tends toward zero, the same is true of $\|x_\sigma\|$. In fact, if $P_R z \in R(A)$, $P_R z$ is in the domain of $T_r^{-1}$ [see (5.9) and the argument following], and
$$\|x_\sigma\| \uparrow \|T_r^{-1} P_R z\|. \tag{5.45}$$
From (5.7), we recognize the right side of (5.45) as the norm of the BAS $x_0$. Accordingly, as $\sigma$ tends toward zero,
$$\|Ax_\sigma - z\| \downarrow \|Ax_0 - z\| \quad \text{and} \quad \|x_\sigma\| \uparrow \|x_0\| \tag{5.46}$$
for all $z$ such that $P_R z \in R(A)$; the norms are strictly decreasing and increasing, respectively. Whenever $P_R z \notin R(A)$, the right side of (5.44) tends toward infinity (see [D-2], Theorem XII.2.6), because then $P_R z \notin D(T_r^{-1})$ from (5.9). We may therefore conclude that
$$\|x_\sigma\| \to \infty \quad \text{for } P_R z \notin R(A). \tag{5.47}$$
The rate at which this norm tends to infinity can be bounded from above by majorizing the right side of (5.44). Indeed, we always have

$$\|x_\sigma\| \le \frac{\|z\|}{2\sqrt{\sigma}} \tag{5.48}$$
whether or not $P_R z$ belongs to the range of $A$, since the integrand of (5.44) never exceeds $1/(4\sigma)$.

That $\|Ax_\sigma - z\|$ tends to its infimum while $\|x_\sigma\| \to \infty$ is not merely a coincidence attributable to a poor choice of operator or sequence $\{x_n\}$. Rather, the comparative behavior of the two norms is immutable, as is indicated by

Theorem 5.4: Suppose $P_R z \notin R(A)$, and $\{x_n\} \in D(A)$ such that
$$\|Ax_n - z\| \to \|(I - P_R)z\|. \tag{5.49}$$
Then
$$\|x_n\| \to \infty. \tag{5.50}$$

Proof: By an argument identical with that leading to the left hand equality in (5.42), we can replace (5.49) by
$$\|Ax_n - y\| \to 0 \tag{5.51}$$
if we take $y = P_R z$. Let us then suppose $\{x_n\}$ bounded and reason by contradiction. If $\{x_n\}$ is bounded, it is weakly compact, and we may assume without loss of generality that the entire sequence converges weakly, say
$$x_n \rightharpoonup x_0. \tag{5.52}$$
Then $\{x_n\}$ has another subsequence (which we again denote by $\{x_n\}$ for brevity in notation) such that ([R-2], Section 38)
$$w_m = m^{-1} \sum_{j=1}^m x_j \tag{5.53}$$
and
$$w_m \to x_0 \text{ in norm.} \tag{5.54}$$
But also, the triangle inequality yields

$$\|Aw_m - y\| \le m^{-1} \sum_{j=1}^m \|Ax_j - y\| \to 0, \tag{5.55}$$
with the right hand convergence following from (5.51). But $A$ is a closed operator, and $\|Aw_m - y\| \to 0$, together with (5.54), implies $x_0 \in D(A)$ and $Ax_0 = y$. This contradicts the original hypothesis $y = P_R z \notin R(A)$. |||
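The trade-off of Theorem 5.4 can be mimicked numerically. A non-closed range cannot occur in finite dimensions, so the sketch below imitates one with rapidly decaying singular values $s_k = 2^{-k}$ and data whose coefficients decay too slowly to lie in the "numerical range" as the dimension grows; the residual decreases while $\|x_\sigma\|$ blows up, always within the bound $\|x_\sigma\| \le \|z\|/(2\sqrt{\sigma})$. All numerical choices are illustrative.

```python
import numpy as np

k = np.arange(12)
s = 2.0 ** -k                  # singular values of a diagonal operator A
A = np.diag(s)
z = 2.0 ** (-k / 2)            # sum (z_k / s_k)^2 diverges with dimension

def x_sigma(sigma):
    """x_sigma = A'_sigma z; for diagonal A, componentwise s z / (sigma + s^2)."""
    return s * z / (sigma + s ** 2)

for sigma in (1e-2, 1e-4, 1e-6):
    x = x_sigma(sigma)
    print(f"sigma={sigma:.0e}  ||Ax-z|| = {np.linalg.norm(A @ x - z):.3e}  "
          f"||x|| = {np.linalg.norm(x):.3e}  "
          f"upper bound ||z||/(2 sqrt(sigma)) = "
          f"{np.linalg.norm(z) / (2 * np.sqrt(sigma)):.3e}")
```

In a truly infinite-dimensional version, neither norm stabilizes: driving the residual to its infimum forces the solution norm through every bound, exactly the dilemma the theorem formalizes.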

APPENDIX A
SOME PROPERTIES OF OPERATORS IN HILBERT SPACES

In the classical literature on linear unbounded operators in Hilbert spaces [D-2] [R-2] [A-1], one is generally confined to densely defined closed (DDC) operators; operators which are not DDC fail to yield significant useful results, and are therefore seldom considered. But even the DDC assumption leads to a much more complex structure than is encountered for bounded operators (which can be trivially extended to DDC operators). For instance, if one of $A$ and $B$ is bounded and the other DDC, $AB$ need not be DDC. Thus, application to pseudoinverses requires some knowledge of the characteristics of certain compositions of DDC operators and their restrictions to subspaces. Consequently, we offer here a series of technical lemmas, to which we shall refer in the body of the work from time to time. Although each lemma is quite easy, and perhaps not interesting in itself, we have not found these results conveniently accessible in the standard literature.

Our notation is consistent throughout the paper. $A$ is a linear DDC operator from a Hilbert space $H_1$ to another, $H_2$. The operator $A$ has domain $D(A)$, and is endowed with range $R(A)$ and nullspace $N(A)$; its restriction to $N(A)^\perp$ will be called $A_r$. Projections on $N(A)$ and $N(A)^\perp$ are denoted by $P_N$ and $P_M$, respectively. Analogously, $B: H_2 \to H_1$ will have a restriction $B_r$ to $\overline{R(A)} = N(A^*)^\perp$, and the corresponding projection on this subspace is $P_R$.

Lemma A.1: $x \in D(A)$ iff $P_M x \in D(A)$.

Proof: Consider the orthogonal decomposition
$$x = P_M x + x_1, \tag{A.1}$$
where $x_1 \in N(A) \subset D(A)$. It is clear from the linearity of $A$ that $P_M x \in D(A)$ implies $x \in D(A)$; conversely, if $x \in D(A)$, so is $P_M x = x - x_1$. |||

Lemma A.2: $R(A_r) = R(A)$.

Proof: Since $R(A_r) \subset R(A)$ is obvious, we need only prove the set inclusion in the opposite direction. To this end, take any $z \in R(A)$, so that there exists an $x \in D(A)$ such that $Ax = z$. The decomposition (A.1) then yields $A_r(P_M x) = A(P_M x) = Ax = z$, because $Ax_1 = 0$. |||

The next lemma applies to any restriction of $A$, and we take $A_r$ to be such. Of course, our interest here is the specialization of the result below to $K = N(A)^\perp$.

Lemma A.3: Any linear operator $A_r \subset A$ has a closed extension. If, in particular, $D(A_r) = D(A) \cap K$, where $K$ is a subspace, $A_r$ is a closed operator.

Proof: The graph $G(A)$ of $A$ is a Hilbert space under the usual norm, and $G(A_r) \subset G(A)$. Clearly, $A_r$ and hence $G(A_r)$ can be extended to a linear manifold dense in some subspace of $G(A)$. We may therefore suppose $G(A_r)$ already extended in this fashion. Since $\{0, y\} \notin \overline{G(A_r)}$ for any $y \ne 0$, $A_r$ can be defined uniquely in terms of $\overline{G(A_r)}$ ([H-2], Section 2.11), and $A_r$ is a closed operator because its graph is closed.

Assume now $K$ is a subspace; we show $G(A_r)$ is then complete in $G(A)$. In fact, if $\{\{x_n, A_r x_n\}\}$ is a Cauchy sequence in $G(A_r)$, $\{x_n\} \in D(A_r)$,

and since $K$ is closed, $x_n \to x_0 \in K$. Moreover, $\{x_n, A_r x_n\} \to \{x_0, Ax_0\} \in G(A)$. Thus $x_0 \in D(A) \cap K = D(A_r)$, which means $\{x_0, Ax_0\} \in G(A_r)$. This completes the proof. |||

We return to the notation of $A_r$ as the restriction of $A$ to the subspace $N(A)^\perp$. By virtue of the lemmas already proven, we see $A_r$ is a closed linear operator from the Hilbert space $N(A)^\perp \subset H_1$ into the Hilbert space $\overline{R(A)} \subset H_2$. Its domain $D(A_r)$ consists precisely of the set $D(A_r) = \{u: u = P_M x,\ x \in D(A)\}$, and its range coincides with that of $A$. That $A_r$ is in fact a DDC operator now follows from

Lemma A.4: $A_r: N(A)^\perp \to \overline{R(A)}$ is densely defined.

Proof: Since $A$ is densely defined, there is for each $x \in N(A)^\perp$ a sequence $\{x_n\} \in D(A)$ such that $x_n \to x$. But $\|x - P_M x_n\| \le \|x - x_n\| \to 0$, and $\{P_M x_n\} \in D(A_r)$. |||

It sometimes becomes necessary to deduce the properties of an operator from those of its restriction, or to extend an operator from $N(A)^\perp$ to $H_1$ [$\overline{R(A)}$ to $H_2$]. That the extended operator is well behaved is assured by

Lemma A.5: The linear operator $A: H_1 \to H_2$ is DDC if
(a) $N(A)$ is a closed set, (A.2)
and
(b) its restriction $A_r$ is DDC on $N(A)^\perp$.

Proof: Since $A$ is linear, $N(A)$ is a subspace. Let $A_q$ be the restriction of $A$ to $N(A)$, and take $G(A_q)$ to be the corresponding graph, consisting of elements $\{x, 0\}$ with $x \in N(A)$. Then $G(A_q)$ is a subspace in $H_1 \times H_2$, and is orthogonal to $G(A_r)$. We observe further [cf. Lemma A.1 and the linearity of $A$] that $G(A) = G(A_r) \oplus G(A_q)$, which exhibits

$G(A)$ to be closed. To demonstrate that $A$ is densely defined, first note that $D(A) = D(A_r) \oplus N(A)$. If now $x \in H_1$, there exists $\{x_n\} \in D(A_r)$ such that $x_n \to P_M x$, whence $x_n + P_N x \to x$. As $\{x_n + P_N x\} \in D(A)$, this shows $D(A)$ dense in $H_1$. |||

We next exhibit some results on inverses of operators related to $A$. If $A$ is invertible with range dense in $H_2$, there is little content to the theory of the pseudoinverse. We therefore do not suppose $A$ has an inverse, but rather subsume it under the more general results that follow. The first of these is

Lemma A.6: $A_r$ has a DDC inverse $A_r^{-1}$.

Proof: The inverse exists if $A_r$ is one-to-one. For $x_1, x_2 \in N(A)^\perp$, $A_r x_1 = A_r x_2$ is equivalent to $Ax_1 = Ax_2$, which shows $x_1 - x_2 \in N(A)$. But $x_1 - x_2 \in N(A)^\perp$ also, which means $x_1 = x_2$; thus, $A_r: N(A)^\perp \to \overline{R(A)}$ is one-to-one. That $A_r^{-1}$ is closed follows from the fact that $A_r$ is closed (cf. Lemma A.3 and [H-2], Theorem 2.11.15). Finally, $D(A_r^{-1}) = R(A_r)$, and the latter is dense in $\overline{R(A)}$ from Lemma A.2. |||

Lemma A.7: $A_r^*$ has a DDC inverse, and
$$(A_r^*)^{-1} = (A_r^{-1})^*. \tag{A.3}$$

Proof: $A_r^{-1}$ and $(A_r^{-1})^*$ are themselves DDC, whereupon the desired conclusions follow from a well known argument based on $G(A_r)$; see [R-2], Section 117 for details. |||

Lemma A.8: The following conditions are equivalent:
(a) $A_r^{-1}$ is bounded.
(b) $(A_r^*)^{-1}$ is bounded.
(c) $R(A)$ is a closed set.
(d) $A_r A_r^{-1}$ is a closed operator.
(e) There exists $m > 0$ such that $m\|x\| \le \|A_r x\|$ for all $x \in D(A_r)$.
(f) There exists $k > 0$ such that $k\|y\| \le \|A_r^* y\|$ for all $y \in D(A_r^*)$.

Proof: Since $D(A_r^{-1}) = R(A)$ and $A_r^{-1}$ is a closed operator (cf. Lemma A.6), condition (c) implies (a) by the closed graph theorem ([R-2], Section 117). Conversely, the domain of a closed bounded operator is a closed set, whence (c) follows from (a) and Lemma A.2.

To prove (b) from (a), consider (A.3) and the fact that (a) implies $(A_r^{-1})^*$ bounded. On the other hand, (b) requires $(A_r^{-1})^*$ bounded, and (a) follows.

Respecting (e), observe $D(A_r^{-1}) = R(A_r)$, so we may put $y = A_r x$, with $y$ uniquely defined and ranging over $D(A_r^{-1})$ as $x$ ranges over $D(A_r)$. Then (e) reads $\|A_r^{-1} y\| \le m^{-1}\|y\|$ for all $y \in D(A_r^{-1})$, i.e., $\|A_r^{-1}\| \le m^{-1}$, which is (a). For the converse, one proves (e) from (a) by retracing the above steps in reverse order. The equivalence of (b) and (f) is similarly demonstrated.

We turn our attention to the relation between (c) and (d). $A_r A_r^{-1}: \overline{R(A)} \to \overline{R(A)}$ is the restriction in this space of the identity operator to its domain $D(A_r A_r^{-1}) = R(A)$. Thus $A_r A_r^{-1}$ is bounded and densely defined. Consequently, $A_r A_r^{-1}$ is a closed operator iff its domain $R(A)$ is a closed set. |||

It is tempting to assert that (b) is equivalent to yet another condition, namely
(g) $R(A^*)$ is a closed set. (A.4)
However, the proof given in the first paragraph (of the proof of the lemma) fails because the connection between $(A^*)_r$ and $(A_r)^*$ is not clear. We recall here that by $B_r$ for $B: H_2 \to H_1$ we mean the restriction of $B$ to $\overline{R(A)}$.

Lemma A.9: $(A^*)_r = (A_r)^*$. (A.5)

Proof: For any $x \in D(A)$, $y \in D(A^*)$ we have
$$(Ax, y) = (x, A^*y) = (x_1, A^*y) = (x_1, A^*y_1) = (x_1, (A^*)_r y_1), \tag{A.6}$$
where $x_1 = P_M x$ and $y_1 = P_R y$. At the same time,
$$(Ax, y) = (Ax, y_1) = (A_r x_1, y_1). \tag{A.7}$$
The right sides of (A.6) and (A.7) are valid for all $x_1 \in D(A_r)$ from Lemma A.1. Therefore, $(A_r)^* \supset (A^*)_r$, the latter being defined at least on $D(A^*) \cap N(A^*)^\perp$. To demonstrate equality, consider any $y_1 \in D[(A_r)^*]$. For such $y_1$ and all $x_1 \in D(A_r)$,
$$(A_r x_1, y_1) = (x_1, (A_r)^* y_1) = (x_1, y^*), \tag{A.8}$$
in which $y^* = (A_r)^* y_1 \in R[(A_r)^*] \subset N(A)^\perp$. But any $x \in D(A)$ has the orthogonal decomposition $x = x_1 + x_2$ with $x_1 \in D(A_r)$ and $x_2 \in N(A)$ (cf. again Lemma A.1). Hence in (A.8) $A_r x_1 = Ax_1 = Ax$, while $(x_1, y^*) = (x, y^*)$ because $y^*$ is orthogonal to $N(A)$. In other words, $(Ax, y_1) = (x, y^*)$, or $y_1 \in D(A^*)$. The proof is completed by noting that the inclusion $D[(A_r)^*] \subset D(A^*)$ just obtained yields

$$D[(A_r)^*] = D[(A_r)^*] \cap \overline{R(A)} \subset D(A^*) \cap \overline{R(A)} = D[(A^*)_r]. \quad |||$$

The validity of Lemma A.9 permits us to use the notation $A_r^*$ to replace both $(A^*)_r$ and $(A_r)^*$. This lemma may also be applied to strengthen the results of Lemma A.8, viz.

Corollary A.1: Lemma A.8 remains true if condition (g) of (A.4) is added; i.e., conditions (a) through (g) are equivalent.

Proof: $R(A^*) = R(A_r^*)$, an assertion whose proof we defer for the moment. Since $R(A_r^*)$ is closed or not according as $(A_r^*)^{-1}$ is bounded or unbounded, (b) and (g) hold simultaneously. By Lemma A.8, (b) is equivalent to the other properties (a) to (f), so the proof is finished. |||

Lemma A.10: (Duality of $A$ and $A^*$) Lemmas A.1, A.2, A.4, A.5 and A.8 continue to hold if symbols are consistently interchanged as follows:
$$A \text{ with } A^*, \quad A_r \text{ with } A_r^*, \quad P_M \text{ with } P_R, \quad H_1 \text{ with } H_2. \tag{A.9}$$

Proof: $A^*: H_2 \to H_1$ is DDC, and $A_r^*$ is its restriction to $\overline{R(A)}$, with $A_r^*: \overline{R(A)} \to N(A)^\perp$. The arguments of Lemmas A.1, A.2, A.4 and A.5 therefore lead to the desired results in terms of the symbol substitution (A.9). No proof is required for Lemma A.8, in view of its inherent symmetry with the addition of Corollary A.1. |||

Our next objects of study are the restrictions of $A^*A$ and $AA^*$. As we have pointed out earlier, compositions of DDC operators need be neither closed nor densely defined. However, it is known that $A^*A$ and $AA^*$ are not only DDC, but even self-adjoint positive operators

([R-2], Section 119). The restriction $(A^*A)_r: N(A)^\perp \to N(A)^\perp$ is then closed by Lemma A.3, and similarly for $(AA^*)_r: \overline{R(A)} \to \overline{R(A)}$. It is less obvious that $(A^*A)_r$ and $(AA^*)_r$ are densely defined and self-adjoint. We now prove these facts, together with other properties of $(A^*A)_r$ and $(AA^*)_r$ useful elsewhere in this work.

Lemma A.11: $N(A) = N(A^*A)$; $N(A^*) = N(AA^*)$. (A.10)

Proof: We prove only the first set equality. Clearly, $N(A^*A) \supset N(A)$. Consider then $x \in N(A^*A)$; we compute
$$0 = (A^*Ax, x) = \|Ax\|^2, \tag{A.11}$$
which demonstrates that $x \in N(A)$. |||

Lemma A.11 above enables us to write $(A^*A)_r: N(A^*A)^\perp \to N(A^*A)^\perp$, showing through use of identical arguments that

Corollary A.2: Lemmas A.1, A.2, A.4 and A.6 remain valid if $A$ is replaced by $A^*A$ and $A_r$ by $(A^*A)_r$. In particular, $(A^*A)_r$ is densely defined and possesses a DDC inverse.

Corollary A.3: $R[(A^*A)_r] = R(A^*A)$ and $\overline{R(A^*A)} = \overline{R(A^*)}$; $R[(AA^*)_r] = R(AA^*)$ and $\overline{R(AA^*)} = \overline{R(A)}$. (A.12)

Proof: Lemma A.2 yields $R[(A^*A)_r] = R(A^*A)$, while $\overline{R(A^*A)} = \overline{R(A^*)}$ follows from the standard result $N(A) = R(A^*)^\perp$ and (A.10). The second set of equalities is similarly obtained. |||

To verify the self-adjointness of $(A^*A)_r$ [as well as $(AA^*)_r$], and to provide a useful alternative form, we state first

Lemma A.12: $(A^*A)_r = A_r^* A_r$ and $(AA^*)_r = A_r A_r^*$. (A.13)
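Lemma A.11 is easy to illustrate in finite dimensions, where the argument $0 = (A^*Ax, x) = \|Ax\|^2$ says $A$ and $A^*A$ annihilate exactly the same vectors. A small check (the rank-deficient matrix and SVD tolerance are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))  # rank 3

def nullspace(M, tol=1e-10):
    """Orthonormal basis of N(M) from the rows of V^T below the rank cutoff."""
    _, s, Vt = np.linalg.svd(M)
    return Vt[np.sum(s > tol):].T

N_A = nullspace(A)
N_AtA = nullspace(A.T @ A)
print(N_A.shape[1], N_AtA.shape[1])   # same nullity
print(np.linalg.norm(A @ N_AtA))      # each space sits inside the other: ~0
```

The same computation with `A.T` in place of `A` illustrates the companion equality $N(A^*) = N(AA^*)$.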

Proof: The left side of the first equality satisfies $(A^*A)_r x = A^*Ax$ for any $x \in D(A^*A) \cap N(A)^\perp$. For such $x$,
$$A_r^* A_r x = A_r^*(Ax) = A^*Ax; \tag{A.14}$$
the latter equality is true because $Ax \in R(A)$, and $A_r^*$ is the restriction of $A^*$ to $\overline{R(A)}$. We have thus shown $(A^*A)_r \subset A_r^* A_r$. To complete the proof, we demonstrate that $D(A_r^* A_r) \subset D[(A^*A)_r]$. In fact, $A_r^* A_r x$ is defined only if $x \in D(A) \cap N(A)^\perp$ and $Ax \in D(A^*) \cap \overline{R(A)}$. But $Ax \in R(A)$ in any case, so the second condition becomes merely $Ax \in D(A^*)$. That is, $x \in D(A^*A) \cap N(A)^\perp = D[(A^*A)_r]$. |||

Remark: The proof makes implicit use of Lemma A.9, since we deduce the domain of $A_r^*$ as that of $(A^*)_r$, whereas the applications of (A.13) interpret $A_r^*$ as $(A_r)^*$.

Lemma A.13: $(A^*A)_r$ and $(AA^*)_r$ are self-adjoint operators on $N(A)^\perp$ and $\overline{R(A)}$, respectively.

Proof: Although it was shown earlier that $(A^*A)_r$ is DDC, this follows again from Lemma A.12 and the fact that $A_r^* A_r: N(A)^\perp \to N(A)^\perp$ is DDC. Indeed, $A_r^* A_r$ is self-adjoint on this space ([R-2], Section 119), so the same is true for $(A^*A)_r$ by (A.13). As usual, the assertion regarding $(AA^*)_r$ is analogously proven. |||

Corollary A.2 already claims that $(A^*A)_r$ and $(AA^*)_r$ possess DDC inverses. We now repeat and sharpen this result via

Lemma A.14: $(A^*A)_r: N(A)^\perp \to N(A)^\perp$ and $(AA^*)_r: \overline{R(A)} \to \overline{R(A)}$ have self-adjoint inverses given respectively by
$$(A^*A)_r^{-1} = A_r^{-1} A_r^{*-1} \quad \text{and} \quad (AA^*)_r^{-1} = A_r^{*-1} A_r^{-1}. \tag{A.15}$$

Proof: We recall that $(A^*A)_r = A_r^* A_r$ by (A.13). Then, since $A_r$ and $A_r^*$ have inverses from Lemmas A.6 and A.7, respectively, $(A^*A)_r$ has the inverse stated in (A.15) (see [R-2], Section 114). The inverse of a self-adjoint operator is likewise self-adjoint ([R-2], Section 119), and so the self-adjointness of $(A^*A)_r^{-1}$ is a consequence of Lemma A.13. |||

Finally, we give the conditions under which the inverses of $(A^*A)_r$ and $(AA^*)_r$ are bounded, and their ranges closed.

Lemma A.15: Suppose
(a) any one of the ranges of $A$, $A^*$, $(A^*A)$ or $(AA^*)$ is closed, or
(b) any of the inverses $A_r^{-1}$, $A_r^{*-1}$, $(A^*A)_r^{-1}$, or $(AA^*)_r^{-1}$ is bounded.
Then all the ranges in (a) are closed, and all the inverses in (b) bounded.

Proof: By virtue of Lemma A.2, and its application to adjoint and composite operators via Lemma A.10 and Corollary A.2, we may consider the ranges of the restrictions of the operators appearing in (a). Then the arguments of Lemma A.8 demonstrate the desired result for $A$ and $A^*$, and also yield the relation between range and inverse for $(A^*A)$ and $(AA^*)$. If now $A_r^{-1}$ and $A_r^{*-1}$ are bounded, formula (A.15) verifies the boundedness of $(A^*A)_r^{-1}$ and $(AA^*)_r^{-1}$. Conversely, suppose $(A^*A)_r^{-1}$ bounded. Then $R(A^*A)$ is a closed set, and we have from (A.12)
$$R(A^*A) = \overline{R(A^*A)} = \overline{R(A^*)} \supset R(A^*) \supset R(A^*A); \tag{A.16}$$
this means $R(A^*) = R(A^*A)$, or $R(A^*)$ is closed. The latter enables us to show that all the other conditions of (a) and (b) are met. Likewise, $R(AA^*)$ closed or $(AA^*)_r^{-1}$ bounded leads to $R(A)$ closed, and hence the truth of each condition of (a) and (b). |||

In the course of the above proof, we have additionally deduced

Corollary A.1.1: If any of the conditions (a) or (b) of Lemma A.15 are satisfied,
$$R(A^*A) = R(A^*) \quad \text{and} \quad R(AA^*) = R(A), \tag{A.17}$$
each of these ranges being a closed set. |||

It is well known that if $A_1$ and $A_2$ are DDC, $(A_1A_2)^* \supset A_2^* A_1^*$, but this sheds little light on the closed extension of the right side. However, if either $A_1$ or $A_2$ is bounded, at least partial results can be secured ([R-2], Section 115). Our interest lies in taking $A_2$ bounded, since this occurs in Gauss-Markov estimation. More specifically, we have

Lemma A.16: Suppose
a. $A_1$ is densely defined and closable, with closure $\bar{A}_1$.
b. $A_2$ is bounded and everywhere defined.
c. $A_2^* A_1^*$ is densely defined.
Then the closed extension of $A_2^* A_1^*$ is $(\bar{A}_1 A_2)^*$.

Proof: Since $A_2$ is bounded and $A_1$ is densely defined, $A_2^* A_1^*$ is densely defined and hence has an adjoint; in fact, $A_2$ being bounded implies (see again [R-2], Section 115)
$$(A_2^* A_1^*)^* = \bar{A}_1 A_2. \tag{A.18}$$
Taking adjoints once more then yields the conclusion, viz.,
$$\overline{A_2^* A_1^*} = (A_2^* A_1^*)^{**} = (\bar{A}_1 A_2)^*. \tag{A.19}$$

Remark: It does not follow that $A_2^* A_1^* = (\bar{A}_1 A_2)^*$, because the left side need not be a closed operator. This can be demonstrated via a counterexample in which $A_1$ and $A_2$ are each self-adjoint, and $A_2$ has finite dimensional range.

APPENDIX B
HILBERT-SPACE-VALUED RANDOM VARIABLES

In discussing statistical estimation in Hilbert space we deal with random quantities taking values in a Hilbert space. There is an extensive literature concerning Hilbert-space-valued random variables, or what is essentially the same thing, probability measures in a Hilbert space (see [P-1]). However, what we need is relatively simple and can be presented briefly in a form that directly fits our requirements.

Let $(\Omega, \mathcal{A}, P)$ be a complete probability space, where $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$ and $P$ is a probability measure on $\mathcal{A}$. Let $\{w_i\}$, $i = 1, 2, \ldots$, be a sequence of (real or complex-valued) random variables on $(\Omega, \mathcal{A})$ such that
$$\sum_{i=1}^\infty E|w_i|^2 = \sum_{i=1}^\infty \int_\Omega |w_i(\omega)|^2 \, dP(\omega) < \infty. \tag{B.1}$$
We note that, by the Fubini theorem, the inequality (B.1) implies
$$\int_\Omega \sum_{i=1}^\infty |w_i(\omega)|^2 \, dP(\omega) = \sum_{i=1}^\infty E|w_i|^2 < \infty. \tag{B.2}$$
Now let $\{\varphi_n\}$, $n = 1, 2, \ldots$, be a complete orthonormal system (c.o.n.s.) in $H_2$ and define $w(\omega)$ by
$$w(\omega) = \sum_{i=1}^\infty w_i(\omega)\varphi_i, \quad \omega \in \Lambda; \qquad w(\omega) = 0, \quad \omega \notin \Lambda, \tag{B.3}$$
where $\Lambda \subset \Omega$ is the set of all $\omega$ for which $\sum |w_i(\omega)|^2 < \infty$. By (B.2)

and the assumption of completeness of $P$, $A^c \in \mathcal{A}$ and $P(A^c) = 0$. Thus, for all $\omega$, $w(\omega)$ is an element of $H_2$, and for a.e. $\omega$

$$\|w(\omega)\|^2 = \sum_{i=1}^{\infty} |w_i(\omega)|^2. \tag{B.4}$$

We can now define $\tilde{w}_n(\omega) = (w(\omega), \varphi_n)$ for all $\omega$. The $\tilde{w}_n$ are measurable $(\mathcal{A})$ and are equal to the $w_n$ except on a fixed set of probability zero. It should cause no confusion if we drop the tilde and henceforth simply denote $(w(\omega), \varphi_n)$ as $w_n$. Then (B.4) holds for all $\omega$. Since the $(w, \varphi_n)$ are all measurable, it is easy to establish that $w$ is weakly measurable, that is, that $(w, y)$ is a measurable scalar-valued function for each $y \in H_2$. Then, since $H_2$ is separable, $w(\cdot)$ is also strongly measurable (see [H-2], Section 3.5 for both definition and proof). We call $w$ a Hilbert-space-valued random variable.

The condition (B.1) is sufficient to establish both a mathematical expectation of $w$ and a covariance operator. Since, by (B.2) and (B.4),

$$E\|w\|^2 = \int_\Omega \|w(\omega)\|^2 \, dP(\omega) < \infty, \tag{B.5}$$

it follows that

$$E\|w\| = \int_\Omega \|w(\omega)\| \, dP(\omega) < \infty. \tag{B.6}$$

The strong measurability of $w$ and (B.6) imply that $w(\omega)$ is Bochner-integrable on $(\Omega, \mathcal{A}, P)$ (see [H-2], Section 3.7). We define

$$Ew = \int_\Omega w(\omega) \, dP(\omega), \tag{B.7}$$

where the integral is a Bochner integral. An application of a basic theorem of Hille ([H-2], Theorem 3.7.12) gives $(y, Ew) = \int_\Omega (y, w) \, dP$.

Hence, if the $w_i$ have mean zero it follows that $Ew = 0$. Also, we have from (B.5) that

$$\bigl|E\{(y, w)\overline{(z, w)}\}\bigr| \le \|y\| \, \|z\| \, E\|w\|^2 < \infty. \tag{B.8}$$

Since $\phi(y, z) = E\{(y, w)\overline{(z, w)}\}$ is a Hermitian symmetric conjugate bilinear form, which by (B.8) is bounded, there exists a bounded symmetric operator $K$ defined on all of $H_2$ such that

$$(Ky, z) = \phi(y, z). \tag{B.9}$$

$K$ is obviously nonnegative definite; it is called the covariance operator of $w$. The assumption (B.1) further implies that $K$ is nuclear, that is, that it is compact and has finite trace.¹ One can verify directly from (B.1) that the bounded operator $K$ is Hilbert-Schmidt, and hence compact, and then, using the compactness, verify that it has finite trace.

¹We refer the reader to [G-1], pp. 26-47. A nonnegative self-adjoint compact operator $T$ has an o.n.s. of eigenvectors $\{\eta_n\}$ that span $\overline{R(T)}$, and nonnegative eigenvalues $\lambda_n$; $T\eta_n = \lambda_n \eta_n$. By definition, $T$ is nuclear if

$$\sum_{n=1}^{\infty} (T\eta_n, \eta_n) = \sum_{n=1}^{\infty} \lambda_n < \infty.$$

If $T$ is nuclear,

$$\sum_{n=1}^{\infty} (T\varphi_n, \varphi_n) = \sum_{n=1}^{\infty} (T\eta_n, \eta_n) = \operatorname{Tr}(T),$$

where $\{\varphi_n\}$ is any c.o.n.s.

However, a shorter proof is as follows. Since $K$ is self-adjoint and nonnegative, it has a nonnegative square root $K^{1/2}$. Then

$$\sum_n \|K^{1/2}\varphi_n\|^2 = \sum_n (K\varphi_n, \varphi_n) = \sum_n E|(w, \varphi_n)|^2 = \sum_n E|w_n|^2 < \infty. \tag{B.10}$$

The convergence of $\sum_n \|K^{1/2}\varphi_n\|^2$ ensures that $K^{1/2}$ is Hilbert-Schmidt (cf. [G-1], p. 34). Then $K = K^{1/2}K^{1/2}$ is nuclear (cf. [G-1], p. 39). Finally, we observe that $K$ is strictly positive definite if and only if $(K\varphi_n, \varphi_n) > 0$ for all $n$, that is, if and only if $E|w_n|^2 > 0$ for all $n$.

The preceding remarks may be summarized as a theorem:

Theorem B.1: Let $(\Omega, \mathcal{A}, P)$ be a complete probability space and $\{w_n\}$, $n = 1, 2, \ldots$, a sequence of scalar-valued random variables satisfying (B.1). Let $\{\varphi_n\}$ be a c.o.n.s. in the separable Hilbert space $H$. Then $w$ as defined by (B.3) is an $H$-valued random variable, i.e., a strongly measurable $H$-valued function on $(\Omega, \mathcal{A})$. The expectation of $w$ as defined by the Bochner integral in (B.7) exists, and a unique bounded, self-adjoint, nonnegative-definite covariance operator $K$ exists that satisfies $(Ky, z) = E\{(y, w)\overline{(z, w)}\}$, $y, z \in H$. Further, $K$ is nuclear. $K$ is strictly positive definite if and only if $E|w_n|^2 > 0$ for all $n$, and $Ew = 0$ if and only if $Ew_n = 0$ for all $n$.
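Theorem B.1 can be illustrated with a deliberately truncated numerical model. Everything concrete below — the choice of $N = 8$ retained coefficients, the variances $E|w_n|^2 = 2^{-n}$, and the Gaussian coefficients — is an assumption of the sketch, not part of the theorem. In the chosen c.o.n.s. the covariance operator is then diagonal, and its trace equals $\sum_n E|w_n|^2$, as in (B.10):

```python
import numpy as np

# Truncated model of Theorem B.1: N independent coefficients w_n with
# E w_n = 0 and E|w_n|^2 = 2**(-n), so condition (B.1) holds and, in the
# chosen c.o.n.s., K is the diagonal operator diag(2**-1, ..., 2**-N).
N, samples = 8, 200_000
rng = np.random.default_rng(1)
sigma2 = 0.5 ** np.arange(1, N + 1)     # the variances E|w_n|^2
W = rng.standard_normal((samples, N)) * np.sqrt(sigma2)

K_hat = W.T @ W / samples               # empirical covariance operator

# Nuclearity in this model: Tr(K) = sum_n E|w_n|^2, as in (B.10).
assert abs(np.trace(K_hat) - sigma2.sum()) < 2e-2
# K is strictly positive definite here, since every E|w_n|^2 > 0.
assert np.all(np.linalg.eigvalsh(K_hat) > 0)
```

The geometric decay of the $E|w_n|^2$ is what makes the trace finite; any summable choice of variances would serve equally well.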

Remark: The nuclearity of $K$ is essential in the following sense. Suppose $v$ is an $H$-valued random variable satisfying $E\|v\|^2 < \infty$. Then a covariance operator $K$ for $v$ exists and is nuclear. In fact, the existence of $K$ follows exactly as above, and $K$ is bounded, self-adjoint and nonnegative definite. Then for any c.o.n.s. $\{\varphi_n\}$ the first two equalities in (B.10) hold. But

$$\sum_n E|(\varphi_n, v)|^2 = E\|v\|^2,$$

which is finite. The rest of the argument follows as before.

One standard situation is for the observations to be functions in an $L_2$ space. This leads us to discuss stochastic processes whose sample functions belong almost surely to $L_2$. Let $w(t, \omega)$ be a measurable, separable, real- or complex-valued stochastic process with $t$ belonging to the parameter set $T \subset R$, and with probability space $(\Omega, \mathcal{A}, P)$ (cf. [D-3], Chapter 2). $T$ can be an interval, or all of $R$, or indeed any measurable subset of $R$ with positive measure. $w(t, \cdot)$ is a random variable on $(\Omega, \mathcal{A}, P)$. As is conventional, we usually suppress the probability variable and write $w(t)$ instead of $w(t, \omega)$. Let $w(t)$ satisfy

$$\int_T E|w(t)|^2 \, dt < \infty. \tag{B.11}$$

Then $R(t, s) = E\,w(t)\overline{w(s)}$ exists for a.e. $t$ and a.e. $s$, and

$$\int_T |w(t, \omega)|^2 \, dt < \infty$$

for all $\omega$ in some $A$, $P(A) = 1$. We put $R(t, s) = 0$ for those values of

$t$, $s$ for which $E\,w(t)\overline{w(s)}$ is not defined. We define $w(\omega) \in L_2(T)$ to be the Lebesgue equivalence class of functions $w(\cdot, \omega)$ for each $\omega \in A$, and to be the zero element otherwise. Then we have

Theorem B.2: $w(\omega)$ as just defined is an $H$-valued random variable ($H = L_2(T)$). It has a nuclear covariance operator $K$ given by

$$[Ky](t) = \int_T R(t, s)\,y(s) \, ds \tag{B.12}$$

for $y \in L_2(T)$. The expected value of $w(\omega)$ exists and is characterized by

$$(Ew, y) = E(w, y) = E\int_T w(t)\overline{y(t)} \, dt \tag{B.13}$$

for all $y \in L_2(T)$.

Proof: The assertion that $w$ is an $H$-valued random variable means that it is strongly measurable. Since $L_2(T)$ is separable, this is equivalent to weak measurability. We have for any $y \in L_2(T)$,

$$(w(\omega), y) = \int_T w(t, \omega)\overline{y(t)} \, dt, \quad \omega \in A; \qquad (w(\omega), y) = 0, \quad \omega \notin A. \tag{B.14}$$

The integral in (B.14) is a measurable function of $\omega$ (cf. [D-3], Theorem 2.7); hence $(w, y)$ is measurable. Thus $w$ is an $L_2(T)$-valued random variable. From (B.11) we have $E\|w\|^2 < \infty$. Hence, as in the preceding theorem and the remark following, $w$ has a nuclear covariance operator $K$.

We compute

$$(Ky, z) = E\{(y, w)\overline{(z, w)}\} = E\Bigl\{\int_T y(t)\overline{w(t)} \, dt \int_T \overline{z(s)}\,w(s) \, ds\Bigr\} = \int_T \int_T R(s, t)\,y(t)\overline{z(s)} \, dt \, ds, \tag{B.15}$$

where the interchange of integrals is justified by the Fubini theorem after the following calculation:

$$\int_T \int_T E\bigl|w(s)\overline{w(t)}\bigr| \, |y(t)| \, |z(s)| \, dt \, ds \le \int_T \int_T [R(s, s)]^{1/2} [R(t, t)]^{1/2} |y(t)| \, |z(s)| \, dt \, ds \le \Bigl(\int_T R(s, s) \, ds\Bigr) \|y\| \, \|z\| < \infty$$

by the assumption (B.11). Since (B.15) holds for all $y, z \in L_2(T)$, (B.12) is proved. The expected value of $w$ is given by the Bochner integral

$$Ew = \int_\Omega w(\omega) \, dP(\omega), \tag{B.16}$$

which exists because

$$\int_\Omega \|w(\omega)\| \, dP(\omega) < \infty, \tag{B.17}$$

which in turn follows from the square integrability guaranteed by (B.11). (B.13) follows from (B.16) and a theorem used previously ([H-2], Theorem 3.7.12). ∎

Now let $C$ be a bounded linear transformation from all of $H_2$ into $H_1$, and let $w$ be an $H_2$-valued random variable satisfying $E\|w\|^2 < \infty$.

Then $Cw$ is an $H_1$-valued random variable. Clearly $Cw(\omega)$ is defined for all $\omega$. Also, $Cw$ is weakly (and hence strongly) measurable, since $(Cw, x) = (w, C^*x)$, $x \in H_1$, and $w$ is weakly measurable. Finally,

$$\int_\Omega \|Cw(\omega)\|^2 \, dP(\omega) \le \|C\|^2 \int_\Omega \|w(\omega)\|^2 \, dP(\omega) < \infty.$$

Thus $E[Cw]$ is defined, and a covariance operator $K_2$ for $Cw$ is defined. We have, furthermore, again by the Hille theorem,

$$E[Cw] = \int_\Omega Cw(\omega) \, dP(\omega) = C \int_\Omega w(\omega) \, dP(\omega) = C(Ew)$$

and

$$(K_2 u, x) = E\{(u, Cw)\overline{(x, Cw)}\} = E\{(C^*u, w)\overline{(C^*x, w)}\} = (KC^*u, C^*x),$$

from which $K_2 = CKC^*$.
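The transformation rule $K_2 = CKC^*$ is easy to see numerically in a finite-dimensional model. The dimensions, the particular $K$, and the Gaussian sampling below are all assumptions of the sketch; in the real case $C^*$ is simply the transpose:

```python
import numpy as np

# Finite-dimensional sketch: push a random vector w with covariance K
# through a bounded map C : H2 -> H1 and compare the empirical
# covariance of Cw with C K C* (here C* = C transpose, real case).
rng = np.random.default_rng(3)
n2, n1, samples = 6, 4, 300_000

A = rng.standard_normal((n2, n2)) / np.sqrt(n2)
K = A @ A.T                                   # a nonnegative-definite covariance on H2
C = rng.standard_normal((n1, n2)) / np.sqrt(n2)   # bounded map H2 -> H1

W = rng.multivariate_normal(np.zeros(n2), K, size=samples)  # samples of w, Ew = 0
CW = W @ C.T                                  # samples of Cw
K2_hat = CW.T @ CW / samples                  # empirical covariance of Cw

assert np.allclose(K2_hat, C @ K @ C.T, rtol=0.05, atol=0.05)  # K2 = C K C*
assert np.allclose(CW.mean(axis=0), 0.0, atol=0.05)            # E[Cw] = C(Ew) = 0
```

The same identity read backwards is what makes covariance operators of filtered observations computable from the covariance of the underlying noise.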

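A discretized sketch of Theorem B.2 may also be helpful. Everything concrete here is an assumption of the illustration: we take $T = [0, 1]$ and a Wiener-type process with kernel $R(t, s) = \min(t, s)$, approximate the integral operator (B.12) by a Riemann sum on an $n$-point grid, and compare $(Ky, z)$ with a Monte Carlo estimate of $E\{(y, w)\overline{(z, w)}\}$ (everything is real-valued, so the conjugate is harmless):

```python
import numpy as np

# Discretize [Ky](t) = \int_T R(t, s) y(s) ds (equation (B.12)) on a grid
# and check (Ky, z) against E[(y, w)(z, w)] estimated from sample paths.
# Assumed model: Wiener-type process on T = [0, 1], R(t, s) = min(t, s).
n, paths = 100, 50_000
rng = np.random.default_rng(2)
t = (np.arange(n) + 1) / n                   # grid points t_1, ..., t_n
dt = 1.0 / n

R = np.minimum.outer(t, t)                   # covariance kernel on the grid
# Sample paths: cumulative sums of independent N(0, dt) increments, so
# Cov(w(t_i), w(t_j)) = min(t_i, t_j) holds exactly on the grid.
W = np.cumsum(rng.standard_normal((paths, n)) * np.sqrt(dt), axis=1)

y = np.sin(2 * np.pi * t)
z = np.ones(n)

Ky = (R * dt) @ y                            # Riemann-sum version of (B.12)
lhs = (Ky * z).sum() * dt                    # (Ky, z) in L2(T)

inner_wy = (W * y).sum(axis=1) * dt          # samples of (y, w)
inner_wz = (W * z).sum(axis=1) * dt          # samples of (z, w)
rhs = (inner_wy * inner_wz).mean()           # estimate of E[(y, w)(z, w)]

assert abs(lhs - rhs) < 5e-3
```

Because the paths and the kernel are discretized consistently, the two sides agree up to Monte Carlo error alone; the Riemann sum plays the role of the Fubini interchange in the proof of (B.15).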
REFERENCES

[A-1] N. I. Akhiezer & I. M. Glazman, Theory of Linear Operators in Hilbert Space, vols. I and II (tr. by M. Nestell), Frederick Ungar Publishing Co., New York, 1961 and 1963.

[A-2] A. Albert, Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, 1972.

[A-3] M. Athans & P. L. Falb, Optimal Control, McGraw-Hill, New York, 1966.

[B-1] A. V. Balakrishnan, Introduction to Optimization Theory in a Hilbert Space, Lecture Notes in Operations Research and Mathematical Systems, vol. 42, Springer-Verlag, New York, 1970.

[B-2] A. V. Balakrishnan, "Determination of nonlinear systems from input-output data," Proc. of the Princeton University Conference on Identification Problems in Communication and Control Systems (1963), 31-49.

[B-3] F. J. Beutler, "The operator theory of the pseudo-inverse I. Bounded operators," J. Math. Anal. and Appl., 10 (1965), 451-470.

[B-4] F. J. Beutler, "The operator theory of the pseudo-inverse II. Unbounded operators with arbitrary range," J. Math. Anal. and Appl., 10 (1965), 472-493.

[B-5] R. Bouldin, "The pseudo-inverse of a product," SIAM J. Appl. Math., 24 (1973), 489-495.

[B-6] R. W. Brockett, Finite Dimensional Linear Systems, Wiley, New York, 1970.

[C-1] C. T. Chen, Introduction to Linear System Theory, Holt, Rinehart & Winston, New York, 1970.

[D-1] C. Desoer & B. Whalen, "A note on pseudoinverses," J. SIAM, 11 (1963), 442-447.

[D-2] N. Dunford & J. T. Schwartz, Linear Operators, vols. I and II, Interscience Publishers, New York, 1958 and 1963.

[D-3] J. L. Doob, Stochastic Processes, Wiley, New York, 1953.

[G-1] I. M. Gelfand & N. Ya. Vilenkin, Generalized Functions, vol. 4: Applications of Harmonic Analysis (tr. by A. Feinstein), Academic Press, New York, 1964.

[G-2] T. N. Greville, "The pseudoinverse of a rectangular or singular matrix and its application to the solution of systems of linear equations," SIAM Rev., 1 (1959), 38-43.

[H-1] P. R. Halmos, Introduction to Hilbert Space and the Theory of Spectral Multiplicity, Chelsea, New York, 1951.

[H-2] E. Hille & R. S. Phillips, Functional Analysis and Semi-Groups (rev. ed.), Am. Math. Soc. Colloq. Pub., Am. Math. Soc., Providence, R.I., 1958.

[I-1] IEEE Trans. Automatic Control, Special Issue on the Linear-Quadratic-Gaussian Problem, AC-16 (1971).

[K-1] R. Kalman, Y. Ho & K. Narendra, "Controllability of linear dynamical systems," in Contributions to Differential Equations, vol. 1, Wiley, New York, 1962, pp. 189-213.

[L-1] W. S. Loud, "Generalized inverses and generalized Green's functions," SIAM J. Appl. Math., 12 (1970).

[L-2] D. G. Luenberger, Optimization by Vector Space Methods, Wiley, New York, 1969.

[M-1] J. L. Massera & J. J. Schaffer, "Linear differential equations and functional analysis," Ann. of Math., 67 (1958), 517-573.

[M-2] N. Minamide & K. Nakamura, "A restricted pseudoinverse and its application to constrained minima," SIAM J. Appl. Math., 19 (1970), 167-177.

[N-1] M. Z. Nashed, A Retrospective and Prospective Survey of Generalized Inverses of Operators, Mathematics Research Center Report No. 1125, University of Wisconsin, Madison, 1971.

[P-1] K. R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press, New York, 1967.

[P-2] E. Parzen, "An approach to time series analysis," Ann. Math. Stat., 32 (1961), 951-989.

[P-3] W. A. Porter, Modern Foundations of System Engineering, Macmillan, New York, 1966.

[P-4] W. A. Porter & J. P. Williams, "Extensions of the minimum effort control problem," J. Math. Anal. and Appl., 10 (1966), 536-549.

[R-1] C. R. Rao & S. K. Mitra, Generalized Inverse of Matrices and Its Applications, Wiley, New York, 1971.

[R-2] F. Riesz & B. Sz.-Nagy, Functional Analysis (tr. by L. Boron), Frederick Ungar Publishing Co., New York, 1955.

[R-3] W. L. Root, On the Modelling of Systems for Identification, Report, University of Michigan, Ann Arbor, 1972.

[R-4] W. L. Root, "On the structure of a class of system identification problems," Automatica, 7 (1971), 219-231.

[R-5] W. L. Root, "On the modelling and estimation of communication channels," Multivariate Analysis-III (ed. P. R. Krishnaiah), Academic Press, New York, 1973.
