THE UNIVERSITY OF MICHIGAN
INDUSTRY PROGRAM OF THE COLLEGE OF ENGINEERING

SIGNAL EXTRACTION OF SYMMETRIC FUNCTIONS FROM NOISE

Richard S. Norman

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the University of Michigan, Department of Communication Sciences, 1967.

December, 1967
IP-795

ACKNOWLEDGEMENTS

I am very grateful for all the help I have received from the members of my doctoral committee, from the staff of the Radar and Optics Laboratory, and from the Engineering Industry Program. Certain individuals have lent such great assistance that they deserve special mention. Dr. Edward Eckert, of the Department of Epidemiology, provided me with advice and electron micrographs of viruses in the early stages of this work. Mr. Art Hercz, of the Radar and Optics Laboratory, instructed me in the use of their optical, photographic, and electronic equipment which provided the basis for this work (and which resulted in the illustrations in Figure 1.4). Mr. Donald Danford, of the Industry Program, has guided and propelled me through the agonizing period of producing printed copy from a manuscript. It is impossible for me to thank Dr. Henry Swain adequately for the help and guidance he has given me during my graduate career. This work has been possible only through his patience and perseverance. During the course of research for this thesis, I held a Graduate Fellowship from the National Science Foundation; in the final stages I was supported under grant NIH-NB-0617-02 to Dr. D. M. Maynard, of the Department of Zoology. Dr. A. F. Howatson, of the Ontario Cancer Institute, kindly supplied Figure 1.2.

TABLE OF CONTENTS

                                                               Page
ACKNOWLEDGEMENTS                                                 ii
LIST OF FIGURES                                                  iv
I    INTRODUCTION                                                 1
II   THE PROCESSING PROBLEM                                      25
     A. The Structure of the Data                                25
     B. The Noise Process                                        33
     C. Data Processing                                          37
     D. The Optimization                                         55
III  FINITE DATA SPACES                                          72
     A. The Processing Problem                                   72
     B. The Processing Solution                                  82
     C. Special Covariance Matrices                              88
IV   REPLICATED DATA SPACES                                      94
     A. The Processing Problem                                   94
     B. Wiener Multivariate Smoothing                           101
V    TOPOLOGICAL GROUP DATA SPACES                              108
     A. Data Processing on Groups                               108
     B. Special Processing Solutions                            113
BIBLIOGRAPHY                                                    123

LIST OF FIGURES

Figure                                                         Page
1.1  Paradigm for Data Collection                                 3
1.2  Electron Micrograph of Viruses                               7
1.3  Virus Models                                                11
1.4  Optical Transforms of Virus Models                       12-15
2.1  Data Space Corresponding to the Replicated Experiment       28
2.2  Two Possible Building Block Configurations for the
     Data Space of Fig. 2.1                                      31
2.3  Periodic Function Data Spaces                               34
2.4  Resolving Signals in the Presence of Noise                  40
2.5  Representation of the Various Spaces                        51
2.6  Three Ways of Representing the Noise in Terms of
     Components                                                  64
2.7  Relationship of the Noise Components in Two
     Decompositions                                              66
3.1  Two-dimensional Function Spaces                             76
4.1  The Symmetric Function Subspace and the Noise Line         105
5.1  The Weighting Functions in the Data Processing Example     122

CHAPTER I

INTRODUCTION

Signals, Noise, and Prior Knowledge

In the collection and processing of data about any real system, the desired information is always perturbed or disrupted by irregularities or uncontrolled variation in the data. If some prior knowledge is available concerning the expected properties of the information sought, it may be possible to analyze the collected data to determine more from it than is obvious by direct inspection. In short, if you know what to look for, you will have an easier time finding it.

The data considered in this thesis are obtained by photographing symmetrical structures. The prior knowledge concerning the data is the knowledge of the symmetry of the object photographed. It will be helpful to keep in mind the idea that the data is a photograph, even though the notion of the observed data will be generalized considerably. In particular, the data are given over a region of space (the area of the photograph) and the prior knowledge is a knowledge of certain relations between the images at different spots on the photograph.

The development of information theory and the analysis of information handling systems has led to very powerful techniques for processing the data that arise in communications. Very general data handling problems can also be formulated within the framework of information theory, independent of the context of the application or the source of the data. Described in communications terms, the data processing problem becomes one of detecting and analyzing a "signal" in the

presence of obscuring "noise." To make clear the connection between analyzing the photograph and processing communications data, consider the following paradigm for the observation of data (Fig. 1.1). The system under study produces (or emits or outputs) information, the signal, which is transmitted to the observer through a channel. The signal is that aspect of the system in which we are currently interested. Simultaneously the system may be producing extraneous information which is included as a noise source to the channel. In addition, other sources may contribute to the channel; these are all lumped together as noise. The result is the channel output, the observed data. Of primary importance to us is the prior knowledge we may have concerning the system under study, the noise, and the channel. This prior knowledge may have been obtained previously through the channel, through a different channel, or may even be a hypothesis concerning the system. In the language of information theory, the presence of this prior knowledge means that the signal is redundant. This redundancy allows the recovery of part of the signal from the noise.

The type of prior knowledge usually considered in communications theory is the knowledge of the probability distributions of the signal parameters. In information theory, the signal may consist of discrete symbols and the prior knowledge is given in terms of joint probability distributions of groups or words of these symbols. In signal theory, the signal is considered a function of time and the prior knowledge is knowledge of certain correlation or power spectral density functions. In signal detection theory, the signal is of a

Figure 1.1. Paradigm for Data Collection.

known form but with an unknown, random parameter such as its time of occurrence.

In the problem considered here, neither the statistical parameters nor the form of the signal is known a priori. Instead the signal is assumed to have a kind of internal structure, a symmetry, so that certain relations are known between one part of the signal and another. There are two significant differences between the signals I will consider and those usually considered in communications theory. First, the prior knowledge concerning the signals in this thesis is given by geometric structure relations between different parts of the signal, rather than by assumptions on the statistical parameters of the signal. And second, the signals are defined in terms of space, rather than in terms of time. It is not meaningful to speak of one observation as occurring before another -- of data that have been observed in the past and data that have not yet been collected.

As a consequence of the non-temporal nature of the problem, two considerations of great importance in communications theory do not arise -- prediction and realizability. The prediction problem is the estimation of a future value of the signal on the basis of knowledge of past values of the data. The realizability criterion expresses the metaphysical idea that the processing cannot involve data not yet collected. Since the data are all obtained simultaneously, there is no prediction problem, and all the data may be used in the estimation of any value.

The idea of processing this type of data grew from the problem of analyzing electron micrographs of virus particles. In order to

provide motivation for the concepts to be developed, the structure of viruses is briefly reviewed. The form of the present problem is derived by considering the problems I encountered in trying to process micrographs of viruses.

Virus Structure

Viruses are believed to be constructed from a core of nucleic acid, containing the genetic information, and a protective coat of protein, the capsid. This nucleo-capsid is sometimes surrounded by a membrane. (See Horne, 1963 and 1964; Horne and Wildy, 1963.) Of particular interest in the processing problem are the so-called simple viruses, which almost invariably appear as rods or spheres in the electron microscope. These may have a diameter from 150 Å to 1300 Å. Early electron microscopic observation of these virus particles did not disclose any internal structure, although a growing body of evidence indicated that the protein coat was composed of a number of subelements. X-ray diffraction studies, though, did show the shell to have internal structure, leading Crick and Watson (1956, 1957) to suggest that the shell is composed of identical units packed together in an orderly manner. The spherical viruses would have their subunits arranged in a pattern similar to one of the regular polyhedra; the rod viruses, in a helical pattern. Two observations supported this hypothesis. First, there was not enough genetic material to code a large number of different proteins. Second, a regular arrangement of identical units would be a minimum energy configuration, aiding in self-construction. Further X-ray diffraction studies showed the internal structure of the spherical

viruses to be that of the icosahedron, one of the regular polyhedra. Soon after, direct evidence for icosahedral symmetry was observed in the electron microscope, when a technique of double shadowing on the largest "spherical" virus showed it to have angular corners arranged in a pattern consistent with an icosahedral configuration.

With the development of the "negative staining" technique in 1959, surface structure became visible (Fig. 1.2). Since then, many viruses have been shown to be icosahedral, and all viruses examined show that the surface is composed of an orderly array of small subunits, or capsomeres. The capsomeres are about 50 Å or 60 Å in diameter and may have a central hole of 30 Å or 40 Å. The number of capsomeres making up the capsid of spherical viruses varies from 12 to 812 (see Almeida, 1963; Hosaka, 1965). The bewildering diversity in number of capsomeres was resolved by Caspar and Klug (1962), who found a rational basis for categorizing the capsomere counts and arrangements. Only certain arrangements and numbers are possible, and these can be enumerated in advance.

The best resolution so far possible in electron micrographs of viruses is on the order of 20 Å, so that capsomere structure cannot be determined by observation. Indeed it is frequently not possible even to determine the number or arrangement of the capsomeres in the capsid (Howatson, 1965; Klug and Finch, 1965). The limit on resolution is not set by the electron microscope, which has a theoretical resolution on the order of an Angstrom (Heidenreich, 1964). Rather it is due to random noise, which may be caused by shot noise in the electron beam, photographic losses, and

Figure 1.2. Electron Micrograph of Viruses. Upper, portion of an electron micrograph field of human wart viruses, negatively stained with phosphotungstic acid. Scale: 0.1 μ. Lower left, single virus particle at high magnification. Scale: 50 mμ. Lower right, model in same orientation as particle. (Figure 3 of Williams, et al. (1961), reproduced with the kind permission of the authors and Nature.)

random irregularities due to the staining technique and the supporting film. Valentine and Wrigley (1964) discuss the first two noise sources, and some analysis has been made of the statistical properties of photographic granularity (Klein and Langner, 1963, and Thiry, 1963), but no data are yet available concerning the statistical properties of the staining and support film irregularities.

Processing Virus Micrographs

Since the object being examined has such a high degree of structure, the micrograph has a large amount of redundancy. It should thus be possible to recover some of the signal structure from the noise by suitable processing techniques. Two processing problems appear: signal classification, to determine the arrangement or the number of capsomeres, and signal extraction, to determine the structure of a single capsomere.

A processing scheme has been proposed by Markham, et al. (1963) to increase the resolution by utilizing part of this redundancy. This is a superposition technique which averages sections of the micrograph that correspond to each other. This technique was found to have several pitfalls (Agrawal, et al., 1965) which could lead to misinterpretation of the data. My interest in the virus micrograph processing problem began with the recognition that this technique can be described in precise mathematical terms. The pitfalls in the technique could then be anticipated, hence avoided (Norman, 1966). The analysis led me to investigate the possibility of using more sophisticated signal processing methods on these photographic data.
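Markham's superposition operation is simple to state algorithmically: assuming an n-fold rotational symmetry about a chosen center, rotate the image through each multiple of 360/n degrees and average the results. The sketch below is a modern illustration, not the authors' procedure; the library calls, interpolation order, and assumed center are assumptions of the illustration. It also exhibits the central pitfall: applied to pure noise, the averaging manufactures an apparently n-fold symmetric picture.

    # Superposition (rotational averaging) in the spirit of Markham et al. (1963).
    # A minimal sketch; library names, interpolation order, and the assumed
    # center of symmetry are illustrative, not taken from the text.
    import numpy as np
    from scipy.ndimage import rotate

    def superpose(image, n):
        """Average the image over rotations by multiples of 360/n degrees
        about the array center, enforcing an assumed n-fold symmetry."""
        acc = np.zeros_like(image, dtype=float)
        for k in range(n):
            # reshape=False keeps the frame fixed so corresponding points align
            acc += rotate(image, angle=360.0 * k / n, reshape=False, order=1)
        return acc / n

    # Pitfall: averaging pure noise still yields an n-fold symmetric image.
    noise = np.random.default_rng(0).normal(size=(128, 128))
    pseudo_structure = superpose(noise, n=6)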

The power of signal theory in describing data processing rests largely on the idea of the Fourier transform, or spectrum, of a function. The data is represented as a sum of known standard functions with simple properties. The process of finding this representation is the transform; the coefficients in the representation are the spectrum. An especially important class of data manipulations, the linear operations, have a particularly simple form when expressed in terms of the transform; the data processing involves multiplying the data spectrum by the "system transfer function" of the manipulation.

A simple lens system can be used to find the transform of photographic data: the spatial distribution of light in the back focal plane of a lens is the transform of the light distribution in the front focal plane. Multiplication by a fixed function can be implemented by passing the light through a suitable photographic transparency. Through certain tricks (which are themselves derived from communications ideas) using coherent light (see Cutrona, et al., 1960), it is possible to multiply the light amplitude by negative or even complex values. These coherent optical techniques seemed to be particularly appropriate for processing the micrograph data.

If the original data has translational symmetry, corresponding to the periodic functions of signal theory, the transform is simple. The light energy in the transform plane is concentrated into discrete points, with no light energy appearing in between. The helical symmetry of the rod shaped viruses gives rise to a more complex spectrum, but one which has been determined for X-ray diffraction purposes (Cochran, et al., 1952). Klug and Berger (1964) took advantage of this fact and

used the optical transform of micrographs of helical viruses to make certain measurements, although they did not attempt any actual processing. Berger, et al. (1966) proposed using coherent light in the optical system, but only for practical convenience, again not utilizing the processing capabilities. Besides this optical transform technique and Markham's superposition technique, no processing method has been proposed to utilize the redundancy in the virus micrograph.

One of the difficulties in applying transform methods is that if the original symmetry is rotational, the transform also has rotational symmetry and no simplification is effected. The superposition technique succeeds because, in polar coordinates, rotational symmetry is reduced to translational symmetry in the angular coordinate. If the original symmetry is more complex, e.g., icosahedral, no general expression for the structure of the transform has yet been obtained. Moreover, no simple coordinate change will reduce the problem to translational symmetry.

My original intention was to perform both the signal classification and the signal extraction problems using the coherent light techniques. The classification problem was to be handled using the results of signal detectability theory -- the matched filter (Middleton, 1960; Kozma and Kelly, 1965). Filters were made from models of the appropriate arrangements so that the correlation coefficient between the data and each arrangement could be determined. These models were based on structureless capsomeres (Fig. 1.3). The transforms of some of the virus models are shown in Fig. 1.4. Although the patterns are complex, it can be seen that different arrangements have their energy concentrated

Figure 1.3. Virus Models. Constructed from flat capsomeres placed on a spherical surface.

Figure 1.4a. Optical Transforms of Virus Models. Upper, central region of model. Lower, optical transform of the model. (a) Arrangement with 42 capsomeres. Three-fold rotation axis central. Model based on spherical surface.

Figure 1.4b. Optical Transforms of Virus Models. (b) Arrangement with 92 capsomeres. Three-fold rotation axis central. Model based on icosahedral surface.

Figure 1.4c. Optical Transforms of Virus Models. (c) Arrangement with [number illegible] capsomeres. Five-fold rotation axis central. Model based on icosahedral surface.

in different places. A working hypothesis was assumed: the structure of the capsomere does not influence the basic pattern of the spectrum, only the relative intensities of the different parts of the pattern. On this basis, a filtering operation suggested itself. Since the transform of the model shows that no light should be expected in certain areas, any light actually appearing at those places should be due to noise. The appropriate filter should then block light from passing where it is not expected, allowing it to pass where it is expected. This filtering process is very similar to an "optimal" filter derived by Wiener (1949), the so-called Wiener smoothing filter. I expected to be able to justify my procedures, first by proving the working hypothesis, then by demonstrating the applicability of the matched and the Wiener smoothing filters in this application.

Two mathematical problems had to be solved. First, the spectrum or transform of icosahedrally symmetric structures had to be determined. And second, the assumptions on which the Wiener filter is derived had to be reconciled with the knowledge of the symmetry of the virus data.

In considering the first problem, I came to realize that the icosahedral symmetry of the data was not consistent with the concept of the Fourier transform. The representation in terms of the exponential function is intimately linked with translational symmetry, as seen by the very simple structure of the transform of periodic functions. Although each data function has a transform, two functions almost identical under icosahedral symmetry may have vastly different transforms. In fact, the regularities that exist between regions of icosahedrally symmetric data lack some of the mathematical prerequisites for performing

the Fourier transformation; the domain of the data is not a topological group.

Considering the second problem, I recognized that the difference between the formulation of the virus processing and the Wiener processing problems is in the nature of the prior knowledge. The Wiener filter is derived using the statistical structure of the signal and the noise as the prior knowledge. In the virus problem, the prior knowledge is the symmetry relation in the underlying structure. There is no a priori reason to believe that a Wiener filter, derived under such different considerations, would be at all appropriate for my purposes.

In the process of solving these two mathematical problems, I came to realize that the optical processing techniques were not appropriate for the virus problem. The full analysis of the micrographs would require digitizing the data and performing the analysis by computer. The results would probably not justify the tremendous effort required. However, the concepts required in the solution represented a distinct generalization of signal theoretic ideas. The present thesis is a consideration of this generalization; the specific problem, the virus micrograph, will no longer be considered.

The General Processing Problem

Two general problems have to be answered: (1) What kind of signal processing can be performed on data which is defined on a mathematical structure weaker than a topological group? (2) What is the relation between such processing and the Wiener smoothing filter? The development of these questions from the virus problem has led to the notion of symmetric functions and to a generalization

of the types of spaces these functions are defined on. Since the data are spatial, not temporal, the ideas of prediction and of realizability do not arise. Because these latter concepts create some of the most difficult aspects of the general processing problem in signal theory, the generalization in the statement of the problem has been obtained at the expense of the demands on the solution.

The idea of a symmetric function in the present context is novel, so that techniques of processing had to be developed. These techniques are essentially parallels of those involved in classical signal theory, particularly the introduction of abstract function spaces. It is thus natural to sketch some of the history of signal processing, following the development of the concepts which I will be using. The concepts themselves will be defined at the time they are used.

The ideas of processing random signals and noise begin with Wiener's work (1949) on stationary time signals, where he applies his development of random harmonic analysis (Fourier analysis of random signals) to the engineering problems of prediction and smoothing. The engineering or communication theory aspect of processing was generalized to include certain kinds of deterministic signal functions (Zadeh and Ragazzini, 1950) until now it is a standard part of signal theory (Middleton, 1960; Davenport and Root, 1958). Particular specializations of interest include multivariate analysis, or simultaneous processing of several signals (Masani and Wiener, 1957, 1958), and "distortionless filtering" (Bendat, 1957; Stewart and Parks, 1957).

The mathematical development of similar ideas is intimately bound up in the representation of random or stochastic processes in an

abstract mathematical space, a Hilbert space (Karhunen, 1947). Hilbert space notions had been developed to handle linear operations in classical analysis as well as for the development of Fourier analysis in a generalized and abstract manner (see Murray, 1941; Halmos, 1957). Although Wiener did not stress Hilbert space methods in his 1949 work, it was implicit in the relation between abstract harmonic analysis, generalized Fourier analysis, and the Hilbert space representation of the noise process and the linear manipulations on the data. The explicit nature of abstract space ideas in prediction and filtering was brought out by Doob (1953), Grenander and Rosenblatt (1957), and Loeve (1960).

Meanwhile, development of Hilbert space representations of random processes continued (Cramer, 1951; Doob, 1953; Loeve, 1960; Kawata, 1965), particularly on weakening the requirements on the stationarity of the process; the parameters of the random process could change with time. In addition, the random process could be defined as arbitrary mathematical functions, not only as functions of time. The introduction of the reproducing kernel Hilbert space (Aronszajn, 1950) led to new applications of the abstract space ideas to signal theory (Parzen, 1961 and 1962). These developments were brought into the engineering literature just recently (Capon, 1965).

The "classical" approach of Wiener, followed by the engineering texts, derives integral equations for the processing operations. The processing is optimized using techniques of calculus of variations. In the abstract space approach, the data functions are represented as points in a space and the optimization is done geometrically, by minimizing

the distance between two points. The latter approach is the one that will be used here.

The Solution

The first step in this thesis is the exact formulation of the notion of symmetry and of symmetric functions. The data will be considered to be a function defined on a mathematical space, the data space. The nature of this space is left unspecified, but it must have some mathematical structure to allow data manipulation. The notion of symmetry is expressed in terms of geometric relationships in this data space. The observed data is represented as a point in an abstract function space. The data is assumed to be the sum of some specific, but unknown, symmetric function and a random noise function. The processing problem can thus be expressed in geometric terms: given the observed data point, find the signal estimate that lies closest to the actual signal point.

In order to carry out the geometric interpretation, a notion of distance is needed in the function space. This distance notion is to correspond to our intuitive ideas of how difficult it is to distinguish between two functions. Two functions that are close together are easily confused, while functions distant from each other are easily resolved. Because the difficulty in resolving two functions is caused by the noise, which may be different in different parts of the space, the distance function must depend on the noise process. Optimizing the solution by selecting the signal estimate nearest the actual signal point is interpreted as minimizing the mean square error between the estimate and the signal.
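In later notation, such a noise-dependent distance can be made concrete. One standard choice, consistent with the mean-square-error criterion (offered as an interpretive gloss, not a formula quoted from the text), is the covariance-weighted metric

    d(f, g)^2 = <f - g, R^{-1}(f - g)>,

where R is the covariance operator of the noise. Directions along which the noise variance is large contribute little to this distance, so two functions differing mainly along noisy directions are "close," that is, hard to resolve.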

Finding the estimate depends on the notion of the signal, or symmetric function, subspace. This subspace consists of those functions which satisfy the symmetry requirement. The actual signal point must lie in this subspace. The processing proceeds by recognizing that the signal subspace can be segregated from the non-symmetric functions in a natural way. The problem of finding the optimum signal estimate is then shown to be simply the problem of finding the point in this signal subspace which lies closest, according to the distance measure, to the data point. The estimate is found by the projection operation, whereby the data point is projected perpendicularly down into the signal subspace, the foot of the perpendicular being the estimate. Integral equations are then derived which completely determine the processing procedure.

Results and Conclusions

The optimal processing procedure is completely specified by three conditions: symmetry, idempotency on the symmetric function subspace, and self-adjointness. The first condition states that the signal estimate is itself a symmetric function -- if P is the projection operator and d is the data function, then the signal estimate s = P[d] is symmetric. The second condition states that if s is any symmetric function, then P[s] = s. The term "idempotent" is used to describe the fact that P^2 = P. The third condition expresses the perpendicularity of the projection. If R is the noise covariance function, interpreted as a linear operator, then P R = R P^ad, where P^ad is the adjoint of P.
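For a finite data space -- the case developed in Chapter III -- the three conditions can be checked against an explicit operator. The numerical sketch below assumes the symmetric-function subspace is spanned by the columns of a matrix S and the noise covariance R is nonsingular; the closed form used, P = S(S^T R^{-1} S)^{-1} S^T R^{-1}, is the standard generalized-least-squares projection and is offered only as an operator satisfying the three conditions, not as the thesis's own derivation.

    # A finite-dimensional operator satisfying the three optimality conditions:
    # P maps into the symmetric subspace, fixes it, and satisfies P R = R P^ad.
    # The GLS closed form is an assumption of this sketch.
    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 6, 2
    S = rng.normal(size=(n, k))           # columns span the symmetric subspace
    A = rng.normal(size=(n, n))
    R = A @ A.T + n * np.eye(n)           # a nonsingular noise covariance

    Ri = np.linalg.inv(R)
    P = S @ np.linalg.inv(S.T @ Ri @ S) @ S.T @ Ri

    assert np.allclose(P @ S, S)          # P[s] = s on the subspace
    assert np.allclose(P @ P, P)          # idempotency: P^2 = P
    assert np.allclose(P @ R, R @ P.T)    # self-adjointness: P R = R P^ad

    d = rng.normal(size=n)
    s_hat = P @ d                         # the signal estimate is symmetric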

A special case of the general processing problem arises if the data are a finite set of numbers. The three processing conditions can then be expressed as a matrix equation, the function space being a finite-dimensional vector space. The optimal processing operator may be found explicitly by solving a finite set of simultaneous linear algebraic equations.

In another special case, the data consist of a finite set of functions. This case may arise in considering a replicated experiment, each trial producing a data function. The processing operator again may be expressed as the solution to a matrix equation, with the matrix entries now being functions, rather than numbers. The operator may be found by solving a finite set of simultaneous linear integral equations. In particular, the Wiener multivariate smoothing problem may be expressed readily in this context and the smoothing filter obtained as the solution to a symmetric function smoothing problem.

Another special case arises when the data space is a topological group, for then Fourier analysis is possible. However, the power of Fourier analysis is lost unless the symmetry structure is compatible with the group structure of the data space. The only type of symmetry compatible with topological group structure is translational symmetry. Although many important types of symmetry are translational -- periodic functions of time, for example -- the virus symmetry is not, so that Fourier analysis is not appropriate for processing the virus micrographs.

The idea of a symmetric signal may be expressed, loosely, by saying the signal is repeated over the domain of the data. A naïve approach to estimating the signal -- here called the "trivial" processing

technique -- is simply to average over all the repetitions. In general, the optimal processing solution weights the repetitions differently, depending on the relative amplitude of the noise over the various repetitions. Moreover, the optimal estimate of the signal at one point may depend on all the data, even data values which appear at first to be irrelevant to the estimate since they occur at points which are not related to the estimated point by the symmetry.

Under many important special cases, the optimal processing procedure reduces to this "trivial" operation. These cases include white noise and noise for which the covariance function is symmetric. If the data space is a topological group, the concept of stationary or wide-sense stationary noise can be defined; for these cases the "trivial" processing is again optimal. Furthermore, if the processing technique is restricted to be stationary, or translation invariant, the "trivial" processing is the best that can be done. This latter result has important practical consequences, since the optical technique, as well as ordinary electronic devices, has the stationarity property.

The processing procedure is solved for an example in which a periodic signal function is observed over three periods. The noise is assumed to be uncorrelated between points, but varies in amplitude over the observation period, reaching a minimum of zero at one point. The estimate of the signal at this point is, of course, the data value. The data with the least noise is weighted most heavily in computing the signal estimate but, interestingly, the data with the most noise is not weighted least.

The results of the study indicate that the optical processing technique is inappropriate for the processing of virus micrographs on

two counts. First, the optical technique is based on the Fourier transform, and the virus symmetry is not compatible with the transform. Second, the processing operation is restricted to the "trivial" one, since the optical processing technique is translation invariant.

The idea of a symmetric signal covers more than just photographs or diffraction patterns of geometrically symmetric structures. It includes statistical sampling, multivariate or replicated data, and periodic time functions. Under the unifying mathematical development presented here, it may be possible to apply results from one to another, seemingly unrelated area. In particular, the solved example derived a processing filter for time-varying noise in a relatively easy way. This approach may prove to be useful in the analysis of other time-varying problems concerned with periodic functions.
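The flavor of the solved example can be suggested by a simpler special case. If the noise is taken to be independent at every point, with known variance at each point of each repetition, the minimum-variance unbiased combination of the corresponding data values is the familiar inverse-variance weighted average sketched below; the variance profiles are invented for illustration. The thesis's general solution is richer than this -- as noted above, the noisiest data need not be weighted least -- but the sketch already shows how unequal noise breaks the equal-weight "trivial" average.

    # Inverse-variance weighting of three noisy repetitions of a periodic signal.
    # A simplified sketch: noise independent at every point is assumed, which
    # is only a special case of the problem solved in the text.
    import numpy as np

    rng = np.random.default_rng(2)
    T = 100                                  # samples per period
    x = np.arange(T)
    signal = np.sin(2 * np.pi * x / T)

    # Noise standard deviation over each of three periods (illustrative profiles).
    sigma = np.stack([0.1 + 0.4 * x / T,
                      0.3 * np.ones(T),
                      0.6 - 0.5 * x / T])
    data = signal + sigma * rng.normal(size=(3, T))

    w = 1.0 / sigma**2                       # point-by-point inverse-variance weights
    estimate = (w * data).sum(axis=0) / w.sum(axis=0)
    trivial = data.mean(axis=0)              # the equal-weight "trivial" average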

CHAPTER II

THE PROCESSING PROBLEM

A. The Structure of the Data

The Data Space

The signal, the noise, and the observed data are the functions which will be considered in this thesis. A function, f(x), is a rule which assigns a real or complex value to each point x in a data space X. In the fullest generality, the space X is left undefined, although it must have some mathematical structure in order to perform the desired manipulations on the data.

Definition 2.1. The data space, X, is a countable union of disjoint connected Hausdorff spaces with a Borel measure.

It is convenient to take the measure of the whole space as one. For bounded data spaces this poses no problem. In unbounded cases this may be accomplished by a procedure illustrated by the distinction between functions of finite energy, $\int |f(t)|^2 \, dt < \infty$, and those of finite power, $\lim_{T \to \infty} (1/2T) \int_{-T}^{+T} |f(t)|^2 \, dt < \infty$. The latter type of integral corresponds to making the measure of the entire real line equal to one.

An important type of data space arises when an experiment is replicated several times. Consider the data obtained on the first trial. This data will be defined over a space, X1. Similarly, the data from the i-th trial is defined over the space Xi. Assuming that no experimental conditions change between trials, all the Xi spaces should have the same structure. The data space X corresponding to

the totality of the data is the union of all the spaces from the separate replications, $X = \bigcup_i X_i$.

In communications theory the functions are usually functions of time, functions of a single real variable. Sampled data are defined only at discrete points in time, the data space consisting of the set of these discrete points. If each trial of a replicated experiment produces a time function as data, the data space consists of a union of lines. The same data space occurs if a single signal is monitored simultaneously on separate channels, the multivariate signal case.

In optical data processing, the functions are defined on two-dimensional spaces, usually the surfaces of photographs. Three-dimensional data spaces may also enter into optical data processing by a consideration of emulsion thickness (Van Heerden, 1963), by photography of three-dimensional objects as in holography (Upatnieks and Leith, 1964), or by considering time-varying two-dimensional figures. Three-dimensional data spaces also occur in X-ray diffraction problems (Buerger, 1956). The functions involved in the structure of viruses are defined on a sphere or a cylinder. The virus micrographs, being photographs, are functions defined over the face of the micrograph plate.

Symmetrical Functions

The structure to be assumed for the signals is a generalization of the notion of periodicity. A periodic function of a real variable, say sin(x), is left unaltered if the data space, the real line, is moved a certain distance to the left or right: sin(x + 2π) = sin(x). Similarly, a cube is left unaltered if it is rotated 120° about a body diagonal. These are rigid body motions of Euclidean data spaces.
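As a small illustration (a sketch in Python; the sampling grid and period are hypothetical), a periodic function sampled on a discretized circle is invariant under the automorphism that shifts the data space by one period:

```python
import numpy as np

# Discretize several periods of the real line into sample points; the
# automorphism a: x -> x + (one period) acts on samples as an index shift.
n_per_period, n_periods = 16, 4
x = np.arange(n_per_period * n_periods)
f = np.sin(2 * np.pi * x / n_per_period)      # periodic function f(x)

f_shifted = np.roll(f, -n_per_period)         # f'(x) = f(xa)
print(np.allclose(f, f_shifted))              # True: f is invariant under a
```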

The concept of motion is now generalized to include all transformations that do not involve bending or tearing.

Definition 2.2. An automorphism, a: X → X, is a 1:1 continuous mapping of X onto itself. If x' is the image of x under a, then a: x → x', or x' = xa.

The automorphism a: x → x' acts on a function f(x) defined on X to form a new function, f'(x) = f(x'). It may happen, as was the case with f(x) = sin(x) and x' = x + 2π, that f'(x) = f(x).

Definition 2.3. A function f(x) on X is invariant with respect to a, an automorphism of X, if f(x) = f(xa) for all x in X.

Two automorphisms can be applied in succession to form a third, the product. If a function is invariant with respect to each of the two, it is also invariant with respect to the product. A function invariant with respect to an automorphism is also invariant with respect to the inverse automorphism. All functions are invariant with respect to the identity. Thus a function invariant with respect to a set of automorphisms is invariant with respect to the group generated by that set. Conversely, the set of all automorphisms which leave a function invariant is a group.

Definition 2.4. A symmetry group, G, is a discrete group of automorphisms of a space, X. The Borel measure of X is assumed to be invariant with respect to the elements of the symmetry group.

By this definition, the idea of symmetry is expressed in terms of the algebraic structure of the data space. The relation between this imposed structure and any pre-existing algebraic structure of the

Figure 2.1. Data Space Corresponding to the Replicated Experiment. (The whole data space is the union of these subregions.)

data space will be explored in a later chapter. The types of groups that are to be studied are all discrete, so that no processing capability is lost by restricting the symmetry group to be discrete at this stage. By making the Borel measure invariant under the symmetry group, the automorphisms are, in effect, rigid motions that do not distort distance.

Definition 2.5. A function is symmetric with respect to a symmetry group, or more simply is symmetric, if it is invariant with respect to all the automorphisms of the group.

This concept of symmetry as invariance under a group of automorphisms is important in virtually all branches of mathematics. The concepts of physical law and prior knowledge, the foundations of modern physics, are based on just this concept of symmetry (Weyl, 1952). It is not surprising, then, to find symmetry arising from such diverse situations as periodic functions or replicated experiments, as well as measurements made on regular structures such as crystals and viruses.

The Unit Cell

Symmetry has been defined as an algebraic concept. In this section, symmetry will be expressed in terms of the geometry of the data space. The goal is to find a way of expressing the data space in terms of "building blocks", each block being identical to every other block. The data space in the case of a replicated experiment has already been described as the union of subspaces, Xi, each resulting from a single trial. Let xi and xj be corresponding points in the subspaces Xi and Xj (see Fig. 2.1). The signal function s(x)

must have the same value on xi and xj, s(xi) = s(xj), so that the automorphisms of the symmetry group must be of the form a: xi → xj. Since all of the spaces Xi are equivalent, the automorphism group consists of all the ways of shuffling these subspaces, leaving the structure of the subspaces unchanged. If there are m replications, the symmetry group is the full symmetric group1 on m items, the group of all permutations of the subscripts.

Any one of these subspaces can be taken as a building block for the whole space, in that X is composed of a series of these blocks which are completely equivalent by the symmetry. Any symmetric function on X is completely specified merely by defining its values on this building block, since the function must repeat these values on all the other blocks.

In the replicated experiment situation there is a natural way of choosing the building block. In general, though, there is no reason why the natural block, shown in Fig. 2.2a, should be chosen rather than the chopped-up block of Fig. 2.2b. Either selection is adequate for describing the symmetry and structure of the data space. There is no a priori way of choosing a single building block description out of all the possibilities.

In addition to not being able to select a unique representation, two further difficulties arise in the building block concept. The first arises if the data space contains "singular" points under the symmetry group.

1 The word "symmetric" is used here in a technical sense (Hall, 1959). In this case, the symmetry group is the symmetric group.
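The following sketch (hypothetical indices, Python) spells out the replicated-experiment case: the group of all permutations of the trial subscripts carries a point (trial i, cell k) to every (trial j, cell k), so its orbit, the equivalence class that becomes one point of the unit cell, is determined by k alone.

```python
from itertools import permutations

m, v = 3, 2                 # m replications, v points per trial
trials = range(m)

def orbit(i, k):
    """Equivalence class of the point (trial i, cell k) under the full
    symmetric group on the m trial subscripts."""
    return {(p[i], k) for p in permutations(trials)}

print(orbit(0, 1))          # {(0, 1), (1, 1), (2, 1)}: one unit cell point
```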

Figure 2.2. Two Possible Building Block Configurations for the Data Space of Figure 2.1. (a) The natural block. (b) A chopped-up block.

Definition 2.6. A point x of X is singular with respect to the symmetry group G if there exists in G an automorphism, a, not the identity, such that a: x → x.

All the points on a rotation axis or on a plane of reflection are singular points. The difficulty is caused by the fact that two building blocks must overlap at singular points. The topological structure of the building block at these singular points becomes very complex. The second difficulty is caused by the fact that the overall geometry of the building block can be rather different from that of the data space. This situation occurs when two different points on one building block are equivalent, and will be illustrated later.

To avoid these complications, the building block idea is dropped in favor of the mathematical idea of a quotient space, the unit cell space. This space may be considered to represent all the equivalent building blocks superimposed; it has all the properties, and only those properties, that all the building blocks share.

Definition 2.7. Two points, x and x', of X are equivalent under the symmetry group G, or simply equivalent, if there exists an automorphism in the symmetry group such that a: x → x'.

Definition 2.8. The unit cell space, Y, of X under the symmetry group G is the set of equivalence classes of X under G. The topology of Y is that induced from the topology of X.

In topological terms, the space X forms a covering space for Y. Several topological structure results are of interest. If X is connected, then Y is not simply connected. Conversely, if Y is simply connected, then X is composed of disjoint subregions and

the building block type of structure of X is valid. Further topological structural relations will be considered in a later chapter.

The unit cell space for the replicated experiment situation looks superficially the same as the "natural building block" of Fig. 2.2a. A point y in Y, though, represents an equivalence class of points of X, rather than a single point of X.

The unit cell space corresponding to the periodic functions of a real variable is a circle whose circumference is the period. The data space is the real line. According to the building block concept, the data space is composed of a sequence of segments, each of length equal to the period, joined end to end (see Fig. 2.3a). But the end points of any segment are equivalent, so that topologically they may be considered joined, forming the circle. In this sense, the geometry of the building block differs from that of the data space. This case also illustrates the point that a connected data space gives rise to a non-simply connected unit cell space.

A more accurate description of the formation of the unit cell in the periodic function case is illustrated in Fig. 2.3b. The data space is drawn as a helix so that the distance around one turn is equal to the period, and equivalent points lie on a vertical line. The unit cell is the set of these vertical lines, a cylinder. Since the vertical dimension has no significance, it is dropped and the set of lines becomes the set of points shown, the circle.

B. The Noise Process

The noise is assumed to be a sample function from a set of random variables, one for each point of the data set. At any particular

Figure 2.3. Periodic Function Data Spaces. (a) "Building block" representation of periodic function data space. (b) Quotient space representation of periodic function data space and unit cell space.

point x of the data space, the noise value n(x) is a random variable taken from a probability distribution which depends on x and possibly also depends on the values of the noise at other points. It is assumed that the expected value of n(x), that is the mean, is zero at all points and that the probability distribution is independent of the signal function. The noise must have finite second moment at all points, but it is not assumed to be stationary, either in the full or the wide sense, nor are ergodic properties assumed.

The assumption concerning the independence of signal and noise is necessary to develop the processing procedure without undue complications. It is recognized that non-linearities in the channel may cause larger noise amplitudes at small signal amplitudes than at large, or vice versa. In a photographic system the noise may well depend on the intensity of the signal, in that two different noise mechanisms may apply: at small signal values the noise will be a background of dark granules, hence the mean is positive, while at high signal levels it is "drop-out", with a negative mean. This situation will apply to all data which are restricted in range -- at extreme values of signal, the noise will have a non-zero mean tending to bring the data back to the center of the range. In electron microscopy, a large noise component is the shot effect of individual electrons in the electron beam, so the noise amplitude again depends on the strength of the beam, hence on the signal.

The prior knowledge that is assumed about the noise, in addition to the assumptions of independence from the signal and zero mean, is the covariance function, $R(x_1, x_2) = E[n(x_1) n^*(x_2)]$, where

the asterisk denotes complex conjugation. The expected value is taken across the probability distribution, not over the data space. The covariance function is assumed, for convenience, to be strictly positive definite. This means that no linear combination of noise samples will cancel to zero, not only in the mean but for every sample.

Assumptions on the noise. The noise is a sample function from a set of random variables with zero mean and finite second moment, independent of the signal. The covariance function is assumed known and strictly positive definite.

In signal theory, great emphasis is placed on stationary or wide-sense stationary noise processes. If the noise is stationary, the probability distribution underlying the noise is independent of location in the data space -- the noise has the same structure at all points. For wide-sense stationarity, only the first two moments need be stationary; other noise parameters may vary at different locations in the data space. In particular, either type of stationarity implies that $R(x_1, x_1) = R(x_2, x_2)$ for any $x_1$ and $x_2$ in X. It is not possible to express completely the notion of either type of stationarity in the data space without some additional algebraic structure. Either condition implies that $R(x_1, x_2) = R(x_3, x_4)$ if $x_2 - x_1 = x_4 - x_3$. In the general data spaces so far considered, it is not possible to subtract two points of the data space. It is possible, though, to define the notion of symmetric noise:

Definition 2.9. The noise process is symmetric with respect to the symmetry group G, or more simply symmetric, if $R(x_1, x_2) = R(x_1 a, x_2 a)$ for all a in G.
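A finite sketch of Definition 2.9 (hypothetical six-point data space, Python): if A is the permutation matrix of an automorphism, the condition $R(x_1 a, x_2 a) = R(x_1, x_2)$ reads $A^T R A = R$, and a covariance built only from within- and between-replication terms satisfies it.

```python
import numpy as np

N = 6
A = np.roll(np.eye(N), 3, axis=0)     # swap the two replications of a 3-point cell

# Covariance with the same structure at equivalent points: unit variance
# everywhere plus a fixed correlation between corresponding points.
R = np.eye(N) + 0.4 * A
print(np.allclose(A.T @ R @ A, R))    # True: the noise process is symmetric
```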

The noise process is symmetric if it has the same structure at all equivalent points. Note that this is not the same as saying that each sample function is symmetric. It will be shown later that symmetry is a weaker condition than wide-sense stationarity, in that if the noise is wide-sense stationary it must necessarily be symmetric.

C. Data Processing

The Function Space

The basis for the data processing is the construction of an abstract space, the function space. The possible data functions, the signal functions, and the noise sample functions are all included in this space as separate points. The space must also include all linear combinations of points within it: if f(x) and g(x) are represented as points in the function space, then the function αf(x) + βg(x) must also be a point, where α and β are complex numbers. It is also necessary that limits of Cauchy sequences of functions exist in the function space. Integration is then possible, so that the function

$$f'(x) = \int h(x, x') f(x') \, dx'$$

is a point in the space. The functions in the space are assumed to have the properties necessary for convergence of all integrals and limits. This assumption is usually expressed by limiting the functions to those that are square integrable, or L2. No great emphasis is placed on these mathematical niceties, since the data in practice must satisfy the assumptions.
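For instance (a sketch with a hypothetical smoothing kernel; Python), on a finite grid the integral $f'(x) = \int h(x, x') f(x') \, dx'$ becomes a matrix-vector product carrying the quadrature weight of the normalized measure:

```python
import numpy as np

n = 100
dx = 1.0 / n                                   # data space of total measure one
x = np.linspace(0.0, 1.0, n, endpoint=False)

h = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)   # hypothetical kernel h(x, x')
f = np.sin(2 * np.pi * x)

f_prime = (h * dx) @ f                         # discretized integral operator on f
```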

The data is bounded, of bounded variation, and at least piecewise continuous, being derived from physically occurring variables. The data space is actually compact in any real situation, since it must be a bounded region of a finite-dimensional space. The resolution of observation is limited, so that data occurring at one point is related to the data occurring at nearby points. It is possible to make only a finite number of independent observations on the data. The actual number of these "degrees of freedom" in the data is not important; it is sufficient that the number be countable for all the ensuing mathematical operations to be valid without further restrictions.

Some geometric structure is needed in this function space. First, the notion of distance is required, so that it is possible to say how similar or dissimilar two functions are. Furthermore, the notion of an inner product is needed. With an inner product it is possible to say, not only that two functions are so far apart, but whether the dissimilarity is due to differences in the nature of the functions or in their amplitudes. Mathematically, the inner product provides for the notion of direction in the space: two points f and g are orthogonal if their inner product is zero.

The inner product between two functions f(x) and g(x), written2 (f(x), g(x)), is a complex-valued, bilinear, conjugate-symmetric function on the function space. That is, it must satisfy the conditions $(\alpha f_1 + \beta f_2, g) = \alpha (f_1, g) + \beta (f_2, g)$, where α and β are complex numbers, and $(f, g) = (g, f)^*$. A "norm", written2 $\|f(x)\|$, can be defined from the inner

2 When several different spaces are being considered, a subscript is used to identify which inner product or norm is meant; i.e., $(f, g)_W$ or $\|f\|_W$.

product, for all functions in the space: $\|f(x)\|^2 = (f(x), f(x))$. The norm must be a positive definite quadratic form: $\|f(x)\| \geq 0$, with equality holding if and only if f(x) = 0. The distance between two functions can now be defined as the norm of their difference: $\|f(x) - g(x)\|$.

Definition 2.10. The function space, 𝓗, is a complete linear inner product space: a Hilbert space. The points in 𝓗 are functions defined on the data space.

In the processing problem, we are given an observed data function which is the sum of a particular, but unknown, signal and noise. The signal is a symmetric function; the noise, a sample function from the random process. We are to find the "best" estimate of the actual signal. To optimize the estimate, a quantitative error value is assigned to every possible estimate, the best being the one with the smallest mean square error value. The idea now is to choose the function space so that the distance between the signal function and the signal estimate is the desired quantitative error value. The problem then becomes a geometric one: given a data function point, find that point of the function space which lies closest to the signal point and which can be found by knowing only the data.

The notion of the distance between two functions is interpreted to mean the ease with which the two deterministic functions can be resolved in the presence of the noise. In the situation illustrated in Fig. 2.4, the difference between f and g1 appears to be smaller than the difference between f and g2. Thus f should be more easily confused with g1 than with g2. However, the discrepancy,

Figure 2.4. Resolving Signals in the Presence of Noise.

or error, between f and g1 occurs at a place where the noise is small, while the error between f and g2 occurs where the noise is large. Thus f and g1 are more easily resolved than f and g2. The notion of distance, hence of inner product, must depend on the magnitude of the noise at each point.

If the noise has the same magnitude at each point, and if the noise at one point is uncorrelated with the noise at any other point, then the noise is said to be white. The noise amplitude may be normalized to have unit variance at each point, R(x, x) = 1, without loss of generality. With white noise, the distance between two functions depends only on the difference between them, not on where in the data space the difference occurs. The square of the distance can be taken to be

$$\|f(x) - g(x)\|_W^2 = \int E\{|f(x) - g(x)|^2\} \, dx,$$

corresponding to the inner product:

(2.1)  $(f, g)_W = \int E[g^*(x) f(x)] \, dx.$

The subscript W indicates that this inner product is derived from the white noise process. The function space consisting of the functions on X with this inner product will be denoted 𝒲. Note that this inner product definition, without the expected value operation, is the usual one for L2 function spaces.

Two mathematical concepts will be useful for the further development of the function space: the notion of a coordinate system or a "basis", and the "associated linear operator" of an inner product. A set of linearly independent points, bi, in a space is said

to form a basis for that space if any point, p, in the space can be written as a linear combination of the basis elements:

(2.2)  $p = \sum_i a_i b_i.$

The vector from the origin to bi can be considered to be a coordinate axis, with ai the coordinate of p along this axis. The data space X can be thought of informally as forming a "span" for 𝒲. Any point f in 𝒲 is a function f(x), the coordinate in the xi direction being the value of the function, f(xi). The actual span element corresponding to xi is the function:

(2.3)  $f_i(x) = \begin{cases} 1, & x = x_i \\ 0, & x \neq x_i. \end{cases}$

The span is reduced to a basis by considering only linearly independent points of the form given in Eq. 2.3. A basis is called orthonormal if:

(2.4)  $(b_i, b_j) = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$

Interpreting Eq. 2.4 geometrically, a basis is orthonormal if two different coordinate axes are perpendicular (orthogonal) to each other and if the unit length along each axis (the norm of the basis element) is one. If the coordinate system is orthonormal, the inner product is that of Eq. 2.1. The white noise space, 𝒲, has an orthonormal coordinate system.

The second concept is that of the associated linear operator corresponding to an inner product (cf. Friedrichs, 1960). The functions f(x) and g(x) are points, f and g, in the white noise space 𝒲 as well as the function space 𝓗. The idea of the associated operator

states that the inner product of 𝓗 can be expressed in terms of a bounded, self-adjoint, linear operator in 𝒲:

(2.5)  $(f, g)_{\mathcal{H}} = (f, Hg)_W = (Hf, g)_W.$

In terms of integral operators, H becomes H(x, x'):

(2.6)  $(f, g)_{\mathcal{H}} = \iint E[g^*(x) H(x, x') f(x')] \, dx \, dx'.$

In terms of a coordinate system, the inner product of two basis elements is $(b_i, b_j) = H(x_j, x_i)$. The coordinate system is orthonormal if and only if H is the identity operator.3

The remainder of this section is concerned with deriving the form of the inner product for 𝓗. It will be shown that, if the covariance function, R(x, x'), is considered to be a linear operator, then the operator H is the inverse of R: $H = R^{-1}$. This derivation is easily provided if a formulation for the space 𝓗 can be found in which the noise becomes white. This formulation will be given by a transformation of variables, a change to an orthonormal coordinate system. Assume for the moment that such a transformation, T, is found, so that n(x) is transformed into a white noise process N(t):

$$N(t) = \int T(t, x) n(x) \, dx \quad \text{and} \quad E[N(t) N^*(t')] = \begin{cases} 1, & t = t' \\ 0, & t \neq t'. \end{cases}$$

3 This result is valid even for spaces of uncountable dimension. The delta function is really the integral operator representation of the identity operator.

This same transformation is applied to the deterministic functions:

$$F(t) = \int T(t, x) f(x) \, dx.$$

The functions F(t) form a new function space, 𝓗'. Since the noise in 𝓗' is white, the inner product is given by Eq. 2.1:

$$(F, G)_{\mathcal{H}'} = (F, G)_W = \int E[G^*(t) F(t)] \, dt.$$

The inner product in 𝓗 is now defined to make it consistent with that in 𝓗':

(2.7)  $(f, g)_{\mathcal{H}} = (F, G)_{\mathcal{H}'}.$

Writing $T^{ad}$ for the adjoint of T:

$$(f, g)_{\mathcal{H}} = (F, G)_W = (Tf, Tg)_W = (f, T^{ad} T g)_W.$$

But the associated operator notation gives $(f, g)_{\mathcal{H}} = (f, Hg)_W$, so that the desired inner product is obtained by setting:

(2.8)  $H = T^{ad} T, \quad \text{or} \quad H(x, x') = \int T^*(t, x) T(t, x') \, dt.$

Since the noise is assumed to be strictly positive definite, R(x, x) is bounded away from zero, and the operator T is bounded and has an inverse:

$$n(x) = \int T^{-1}(x, t) N(t) \, dt, \qquad n(x') = \int T^{-1}(x', t) N(t) \, dt,$$

so that

(2.9)  $R(x, x') = E[n(x) n^*(x')] = \int T^{-1}(x, t) T^{*-1}(x', t) \, dt.$

Interpreting Eq. 2.9 as an equation in linear operators, $R = T^{-1} (T^{ad})^{-1}$. Thus

(2.10)  $R^{-1} = T^{ad} T = H.$  Q.E.D.

Finding the transformation to represent the noise in terms of a white noise process is a complex problem. Cramer (1951) derives a condition very similar to Eq. 2.9 using the assumption that R(x, x') be of bounded variation in every finite domain. It should be noted that a white noise representation is always possible if the data is defined on a countable set of values (Loeve, 1960). The assumption of strict positive definiteness on R guarantees the existence of $R^{-1}$ as an operator, hence as an inner product measure (Friedrichs, 1960; Thm. 31.1). For the general function space, the inner product of Eq. 2.5 using $H = R^{-1}$ can be simply posited, the formulation being justified by the actual construction of R on any subspace of countable dimension.

The function space 𝓗 with inner product measure given by $R^{-1}$ is very closely related to the idea of the reproducing kernel Hilbert space introduced by Aronszajn (1950) and applied to signal processing by Parzen (1961, 1962; cf. Capon, 1965). The noise process forms a Hilbert space (Karhunen, 1947) in which

$$(f, g) = (f, Rg)_W = \iint g^*(x) R(x, x') f(x') \, dx \, dx'.$$

An operator B = B(x, x') is a reproducing kernel if, when considered as a function of its first variable, it satisfies:

$$(f(x), B(x, x')) = f(x'); \quad \text{or} \quad (f, B \cdot 1) = f.$$

Using the associated linear operator, R:

$$(f, B \cdot 1) = (f, R B \cdot 1)_W = f.$$

Thus RB = I, the identity operator, and the reproducing kernel of this space is $R^{-1}$. Conversely, the space for which R is the reproducing kernel must have $R^{-1}$ for the inner product operator. The space 𝓗 that we shall use is this space with reproducing kernel R. Using this viewpoint, the process of changing inner products replaces the process of finding a transformation between two spaces.

Still a different approach to finding the inner product on 𝓗 is the "classical" technique: finding the spectral representation of the operator R. Friedrichs (1960) shows that every bounded Hermitian operator, even on a space of uncountable dimension, has a spectral representation. Since R is positive definite, all its eigenvalues are positive. The spectral representation is worked out in detail when X is a finite line segment by Rosenblatt (1962, Ch. VIII). In general it involves finding the eigenvalues and the eigenfunctions of the covariance function; that is, solving:

$$\lambda \phi(x) = \int R(x, x') \phi(x') \, dx'.$$

The expression of the noise as a white noise process can be found from the eigenfunction expansion (Loeve, 1960). The transformation T corresponds to the operator whose eigenvalues are the reciprocals of the square roots of the eigenvalues of R. The existence of a reproducing kernel means that the eigenfunctions span the whole space 𝓗. In finite-dimensional terms, it means the rank of the

covariance matrix R is equal to the dimensionality of the function space. In this case, R is non-singular and has an inverse, $R^{-1}$.

It has thus been shown that $R^{-1}$ exists and that the inner product using $R^{-1}$ as the associated operator is the appropriate one for 𝓗. The transformation T need not actually be found. Yet the idea of the space 𝓗', in which the inner product is the simple one of Eq. 2.1, in which the noise is white, and in which the coordinate system is orthonormal, has sufficient intuitive appeal that it will be retained in the following discussions.

The Signal Subspace

The ability to process the data is based on the fact that the symmetric functions lie in a subspace of the space of all functions. Furthermore, a coordinate system can be given to 𝓗 in which the symmetric functions are isolated from the non-symmetric ones. Naturally, the distance notion in 𝓗 must be preserved in this new coordinate system.

The symmetric functions clearly form a subspace, since linear combinations and limits of sequences of symmetric functions are also symmetric. More precisely, the symmetric function subspace, 𝒮, is the function space of functions defined on Y, the unit cell space. The inner product on 𝒮 is defined as:

(2.11)  $(s_1, s_2)_{\mathcal{S}} = \iint s_2^*(y) R_s^{-1}(y, y') s_1(y') \, dy \, dy',$

where

$$R_s^{-1}(y, y') = \sum_i \sum_j R^{-1}(x a_i, x' a_j),$$

where y is the equivalence class of x, and y' of x'. The summations in the definition of $R_s^{-1}$ are taken over all the automorphisms in the

symmetry group.

Definition 2.11. The symmetric function subspace, 𝒮, is the function space of functions defined on Y, with inner product given by Eq. 2.11.

The symmetric function space is defined as a space separate from 𝓗. A symmetric function can be considered to be either a function on Y or a special type of function on X. In geometric terms, the space 𝒮 may be considered to be a separate space or a subspace of 𝓗: the space 𝒮 may be embedded in 𝓗. The inner product for 𝒮 has been defined to make it consistent with this embedding:

$$(s_1(x), s_2(x))_{\mathcal{H}} = (s_1(y), s_2(y))_{\mathcal{S}},$$

or

$$\iint s_2^*(x) R^{-1}(x, x') s_1(x') \, dx \, dx' = \iint s_2^*(y) R_s^{-1}(y, y') s_1(y') \, dy \, dy'.$$

This equality is easily verified, since the integration measure, as well as s1 and s2, are invariant under the automorphisms of the symmetry group.

A method is now needed to determine which points of 𝓗 are in the subspace 𝒮. This determination is made by extending the notion of an automorphism of X to a transformation of 𝓗. It is recalled that, given a function f(x), the automorphism, a, produces a new function f'(x) = f(xa). The automorphism, a, can thus be considered to be a linear operator which transforms the point f in 𝓗 to the new point f' = a f. This operator is an automorphism of 𝓗 in the strict mathematical sense, being a 1:1 continuous mapping of 𝓗 onto itself which preserves all its mathematical structure. A function s(x) is symmetric if and only if it is invariant under all the automorphisms of the

symmetry group. Thus a point s in 𝓗 is in the symmetric function subspace if and only if it is invariant under all the operators, a, in the symmetry group. The symmetric function subspace, 𝒮, considered as a subspace of 𝓗, is the set of all points of 𝓗 which are singular points under all the operators, a, in the symmetry group.

In order to simplify the processing procedure, it is convenient to have a coordinatization of the function space in which the symmetric functions are expressed very simply. As an example, consider a two-dimensional function space4 -- the x-y plane. A function in this space will be described by its coordinates5 (α, β). Suppose the symmetric function subspace is the set of all points with equal coordinates, (α, α). The subspace is thus the line described by the equation y = x. If the plane is rotated through 45°, a new coordinate system x'-y' is found, where the x' axis coincides with the symmetric function subspace. The symmetric function now has coordinates (α', 0). In the new coordinate system, a point is in the symmetric function subspace if and only if its second coordinate is zero.

Returning to the general case, a new coordinate system is to be found in which the function f(x) becomes F(z). The set of points, z, forms the new basis for this space 𝓗'' of the functions F. The idea is to choose the coordinates z in such a way that F(z) is in the signal subspace if and only if F(z) = 0 for certain values of z. That is, there is a set, Zs, such that S(z) is a symmetric function if and only if S(z) = 0 for z ∉ Zs.

4 This space will be considered in detail in Chapter III.

5 This notation should not be confused with the inner product.
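The two-dimensional example can be written out directly (a sketch in Python; the numbers are hypothetical): rotating the plane by 45° carries the line of equal coordinates onto the first axis, so a point is symmetric exactly when its second new coordinate vanishes.

```python
import numpy as np

c = 1.0 / np.sqrt(2.0)
U = np.array([[c, c],            # rotation by 45 degrees: the new x' axis lies
              [-c, c]])          # along the symmetric subspace (alpha, alpha)

s = np.array([3.0, 3.0])         # a symmetric point (alpha, alpha)
f = np.array([3.0, 1.0])         # a non-symmetric point

print(U @ s)                     # [4.24..., 0.]: second coordinate is zero
print(U @ f)                     # second coordinate is nonzero
```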

Such a coordinate system does exist, since the points in Y form a basis for the subspace 𝒮. The orthogonal complement of 𝒮 in 𝓗 is the space 𝒯 of all points t that satisfy $(s, t)_{\mathcal{H}} = 0$ for all s in 𝒮. A basis for 𝒯 combined with the basis for 𝒮 forms a basis for 𝓗 with the desired property.

The basis can be constructed by a spectral representation technique. Each transformation, a, is a permutation of the points of X, the basis elements of 𝒲. Therefore the transformations are unitary, with all their eigenvalues having absolute value one. The invariant functions for each transformation are those eigenfunctions with eigenvalue one. The basis for the symmetric function subspace is taken from those functions that have eigenvalue one for all the transformations. The basis for the orthogonal complement is taken from those functions that do not have eigenvalue one for any transformation except the identity.

Summary of Function Spaces

Several types of spaces have so far been introduced. It will be useful to survey the spaces and their relations in order to provide a fuller understanding and motivation for the processing procedures.

The data is defined on the data space X. The data is composed of a structured signal and a sample function of a noise process. The signal is a function defined on the unit cell space Y, which may be considered to be a representative structural unit of X. The space X is composed of equivalent building blocks, each identical to Y (Fig. 2.5a), though these blocks may themselves have some interrelations.
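In finite dimensions this spectral construction can be carried out directly (a sketch; Python): the permutation matrices are unitary, and a basis for the symmetric subspace comes from the eigenvectors common to all of them with eigenvalue one. For a single swap of two replicated points, the invariant eigenvector is (1, 1)/√2 and its complement is (1, -1)/√2.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # permutation swapping x1 and x2

eigvals, eigvecs = np.linalg.eigh(A)     # A is real symmetric and unitary
sym = eigvecs[:, np.isclose(eigvals, 1.0)]    # eigenvalue-one eigenvectors
comp = eigvecs[:, np.isclose(eigvals, -1.0)]  # the orthogonal complement

print(sym.ravel())    # +-(1, 1)/sqrt(2): spans the symmetric function subspace
print(comp.ravel())   # +-(1, -1)/sqrt(2): spans the complement
```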

Finally, equations are derived which involve only the automorphisms of the symmetry group and the covariance function R, and which have as their unique solution the optimum processing operation. These equations do not involve either the intermediate transformations or the inverse covariance function $R^{-1}$.

The space 𝓗. -- A point in 𝓗, written f, represents the function f(x). The covariance function is $R(x, x') = E[n(x) n^*(x')]$. The inner product operator is the inverse of the covariance operator:

$$(f, g)_{\mathcal{H}} = (f, R^{-1} g)_W = \iint E[g^*(x) R^{-1}(x, x') f(x')] \, dx \, dx'.$$

Symmetric functions satisfy: s = a s, or s(x) = s(xa), for all a in G.

The space 𝓗'. -- A point in 𝓗', written F, represents the function F(t):

$$F = T f, \quad \text{or} \quad F(t) = \int T(t, x) f(x) \, dx.$$

The covariance in 𝓗' is $R'(t, t') = E[N(t) N^*(t')]$. In operator terms, the covariance is transformed as $R' = T R T^{ad}$. T is chosen to make R' = I, giving the inner product operator $R'^{-1} = I$:

$$(F, G)_{\mathcal{H}'} = \int E[G^*(t) F(t)] \, dt.$$

Symmetric functions satisfy $S = T a T^{-1} S$:

$$S = A S, \quad \text{where } A = T a T^{-1}, \quad \text{for all } A \text{ in } G,$$

or

$$S(t) = \int A(t, t') S(t') \, dt', \quad \text{where } A(t, t') = \int T(t, x) \, T^{-1}(xa, t') \, dx.$$

The space 𝓗''. -- A point in 𝓗'', written $\bar{F}$, represents the function $\bar{F}(z)$:

$$\bar{F} = U F, \quad \text{or} \quad \bar{F}(z) = \int U(z, t) F(t) \, dt.$$

U is a unitary transformation: $U^{ad} = U^{-1}$. The covariance in 𝓗'' is $R'' = U R' U^{ad} = I$, since R' = I. The inner product operator is $R''^{-1} = I$:

$$(\bar{F}, \bar{G})_{\mathcal{H}''} = \int E[\bar{G}^*(z) \bar{F}(z)] \, dz.$$

Symmetric functions satisfy $\bar{S} = U A U^{-1} \bar{S}$:

$$\bar{S} = \bar{A} \bar{S}, \quad \text{where } \bar{A} = U A U^{-1} = U T a T^{-1} U^{-1}, \quad \text{for all } \bar{A} \text{ in } G.$$

U is chosen so that the $\bar{A}$ operator takes the form:

$$\bar{A} \bar{F} = \bar{A}(z) \bar{F}(z).$$

Since the operator $\bar{A}$ is unitary, $|\bar{A}(z)| = 1$. Let $Z_i$ be the set of z such that $\bar{A}_i(z)$ is actually 1:

$$Z_i = \{ z \mid \bar{A}_i(z) = 1 \}.$$

Define $Z_s$ as the set of z such that all the $\bar{A}_i$ are 1:

$$Z_s = \bigcap_i Z_i,$$

taking the intersection over all the elements of the symmetry group. Then the symmetric functions are exactly those functions that satisfy:

$$\bar{S}(z) = 0 \quad \text{for } z \notin Z_s.$$

D. The Optimization

Data Processing

There is now enough mathematical structure to consider the data processing operation itself. The manipulations of the data function that are considered here are restricted to linear, continuous operations. If the data is d(x), the signal estimate is the result of applying a linear operation, L, on the data: L[d(x)]. The linearity and continuity conditions provide that:

$$L[\alpha d_1 + \beta d_2] = \alpha L[d_1] + \beta L[d_2], \quad \text{where } \alpha \text{ and } \beta \text{ are complex numbers;}$$

$$L\left[\int h(x, x') \, d(x') \, dx'\right] = \int h(x, x') \, L[d(x')] \, dx'.$$

Since the transformations between the various function spaces are linear, the processing manipulation can be expressed equally well in all of them: L' operates in 𝓗' on D(t), L'' operates in 𝓗'' on $\bar{D}(z)$. Specifying the manipulation in any one space completely determines it in all of them. It must be remembered that, in the most general processing problem, the transformation T may not be available. The processing solution must be expressed solely in terms of the 𝓗 space, and any results derived using concepts from the other spaces must also be valid completely within 𝓗.

The error function between the signal, s, and the signal estimate, L[d], is s - L[d]. The error magnitude is the norm of the error function, $\|s - L[d]\|_{\mathcal{H}}$. The optimal processing operation Lopt must minimize the magnitude of this error over the class of all linear operations.

Definition 2.12. The optimum data processing manipulation, Lopt, is the linear, continuous operator that satisfies:

$$\|s - L_{opt}[d]\|_{\mathcal{H}} = \inf_L \|s - L[d]\|_{\mathcal{H}},$$

taking the infimum over all linear, continuous operations, L.

Write $\hat{s} = L_{opt}[d]$, and $\Delta = \|s - \hat{s}\|$. It is apparent that the signal estimate $\hat{s}$ must be in the signal subspace. For suppose the contrary. Then $\hat{s}$ can be separated into two components, one in 𝒮 and one in 𝒯, the orthogonal complement of 𝒮: $\hat{s} = s_1 + t$, where $s_1 \in \mathcal{S}$ and (t, f) = 0 for all f in 𝒮. Then:

$$\Delta^2 = \|s - \hat{s}\|^2 = \|s - s_1 - t\|^2 = \|s - s_1\|^2 + \|t\|^2,$$

since $(s - s_1, t) = 0$. The error may be decreased by setting t = 0, contrary to the assumption that $\hat{s}$ is the optimal estimate.

This condition on the optimal processing operator is called the symmetry condition, since $\hat{s}$ must be a symmetric function.

Processing condition (1): The optimal processing operator Lopt is symmetric: $L_{opt}[f] \in \mathcal{S}$ for all f in 𝓗.

This condition can also be expressed in the other spaces: $L'_{opt}[F] \in \mathcal{S}'$, $L''_{opt}[\bar{F}] \in \mathcal{S}''$.

The only prior knowledge that we have concerning the signal is that it is located in the symmetric function subspace. The optimization

must therefore proceed for all possible s in 𝒮, and the optimum operator must be independent of the signal. As a consequence, the signal subspace points are all invariant under the processing operator: $L_{opt}[s] = s$ for all s in 𝒮.

To prove this statement, first note that, since Lopt is linear, $L_{opt}[d] = L_{opt}[s] + L_{opt}[n]$. Thus:

(2.12)  $\Delta^2 = \|s - \hat{s}\|^2 = \|s - L_{opt}[s] - L_{opt}[n]\|^2 = \|s - L_{opt}[s]\|^2 + \|L_{opt}[n]\|^2 + (s - L_{opt}[s], L_{opt}[n]) + (L_{opt}[n], s - L_{opt}[s]).$

Now consider the first inner product term:

$$(s - L_{opt}[s], L_{opt}[n]) = \iint E\left[ L_{opt}^*[n(x)] \, R^{-1}(x, x') \, \{s(x') - L_{opt}[s(x')]\} \right] dx \, dx'.$$

The signal is a deterministic function, as is Lopt[s]. Since the noise is assumed uncorrelated with the signal, it is also uncorrelated with any linear transformation of the signal. Thus the expected value operation applies only to the noise term. But the noise has been assumed to have zero mean. Thus the whole inner product is zero. The other inner product is the complex conjugate of this one, so it, too, is zero. Equation 2.12 simplifies to:

(2.13)  $\Delta^2 = \|s - L_{opt}[s]\|^2 + \|L_{opt}[n]\|^2.$

To make the minimization of error independent of the signal, set:

(2.14)  $L_{opt}[s] = s, \quad \text{for all } s \text{ in } \mathcal{S}.$  Q.E.D.

Since the result of processing any data is a symmetric function, no further improvement can be obtained by passing the data through the

processing scheme twice. The second processing manipulation will give the same result as the first:

$$L_{opt}[L_{opt}[f]] = L_{opt}[f], \quad \text{or} \quad L_{opt}^2 = L_{opt}.$$

Any operator that satisfies $L^2 = L$ is called idempotent, and for every idempotent operator there is an invariant subspace. To show that the invariant subspace for the optimal processing operator is the symmetric function subspace, the condition given by Eq. 2.14 will be called idempotency on 𝒮.

Processing condition (2): The optimal processing operator Lopt is idempotent on 𝒮: $L_{opt}[s] = s$, for all s in 𝒮.

Similar idempotency conditions apply to the other function spaces: $L'_{opt}[S] = S$, for all S in 𝒮'; $L''_{opt}[\bar{S}] = \bar{S}$, for all $\bar{S}$ in 𝒮''.

Two consequences of idempotency are of interest. First, Eq. 2.13 reduces to:

(2.15)  $\Delta = \|L_{opt}[n]\|.$

The processing error is simply the mean amplitude of the result of processing pure noise. The final processing condition will be derived from this result. As a second consequence, the signal estimate is unbiased:

(2.16)  $E[\hat{s}] = E[L_{opt}[s] + L_{opt}[n]] = L_{opt}[s] + E[L_{opt}[n]] = s.$

The condition of idempotency requires Lopt to have only 1 and 0 as eigenvalues. Idempotency on 𝒮 requires the eigenfunctions of eigenvalue 1 to be the symmetric functions. The eigenfunctions of

eigenvalue 0 form a subspace, 𝓛, which is annihilated by Lopt.

Definition 2.13. The space 𝓛 is the subspace of 𝓗 which is annihilated by Lopt: $L_{opt}[f] = 0$ for all f in 𝓛. The orthogonal complement of 𝓛 in 𝓗 is denoted 𝓛⊥.

The subspaces 𝓛' and 𝓛'', which are annihilated by L'opt and L''opt, respectively, are the analogs of 𝓛 in 𝓗' and 𝓗''.

Two facts about 𝓛 will be of use later: (1) the spaces 𝒮 and 𝓛 span 𝓗, and (2) 𝒮 and 𝓛 are disjoint. The first follows since any point f in 𝓗 can be written as a sum of a point, $L_{opt}[f]$, in 𝒮 and a point, $f - L_{opt}[f]$, in 𝓛. The second follows since any point f simultaneously in 𝒮 and 𝓛 must satisfy $L_{opt}[f] = f$ and $L_{opt}[f] = 0$, hence f = 0.

Noise Components in Subspaces

The first two processing conditions are independent of the noise process. The noise covariance enters by means of Eq. 2.15, which states that the processing error amplitude is the mean processed noise amplitude. Since the norm of any function is preserved by the transformations:

(2.15')  $\Delta = \|L_{opt}[n]\|_{\mathcal{H}} = \|L'_{opt}[N]\|_{\mathcal{H}'} = \|L''_{opt}[\bar{N}]\|_{\mathcal{H}''}.$

The optimal processing operation has been shown to reduce the whole function space to a subspace. Some results concerning noise amplitudes in subspaces will be derived in this section. The main idea to be developed is that the mean noise amplitude in any subspace depends only on the dimension of the subspace in comparison with the

dimension of the whole function space.

The mean amplitude of the pure noise is one, in any function space:

(2.17)  $\|n\|_{\mathcal{H}} = \|N\|_{\mathcal{H}'} = \|\bar{N}\|_{\mathcal{H}''} = 1.$

This need only be demonstrated in one of the spaces, since the norm is preserved by the transformations:

$$\|n\|_{\mathcal{H}}^2 = \iint E[n^*(x) R^{-1}(x, x') n(x')] \, dx \, dx' = \iint R(x', x) R^{-1}(x, x') \, dx \, dx' = \int 1 \, dx = 1,$$

since the measure of the data space is 1. If Z is the "data space" for the 𝓗'' function space, that is, the set of points z,

$$\|\bar{N}\|_{\mathcal{H}''}^2 = \int_Z E[\bar{N}^*(z) \bar{N}(z)] \, dz = \int_Z R''(z, z) \, dz = \int_Z 1 \, dz = 1,$$

since the covariance function in 𝓗'' is the identity. The measure of Z is 1.

Now suppose that 𝓗'' has finite dimension, k. Then Z is a finite set and the integrals become finite summations:

$$\|\bar{N}\|_{\mathcal{H}''}^2 = (1/k) \sum_i R''(z_i, z_i) = 1.$$

The weighting factor (1/k) normalizes the measure of Z to one. Now consider the subspace of dimension 3 formed by z2, z5, and z7, say. The noise on this subspace will have three components:

$\bar{N}(z_2)$, $\bar{N}(z_5)$, and $\bar{N}(z_7)$; and the mean square noise amplitude will be:

$$\|\bar{N}_3\|_{\mathcal{H}''}^2 = (1/k) [R''(z_2, z_2) + R''(z_5, z_5) + R''(z_7, z_7)] = 3/k,$$

since the covariance R''(z, z) = 1 for all z. It is clear that the mean square noise amplitude on any subspace of dimension 3 will also be 3/k.

The noise on any subspace of arbitrary dimension will be composed of several components, one for each dimension. The mean square amplitude of each of these components is equal to 1/k, since the noise is uniform in every direction. Taking these components to be mutually orthogonal, the Pythagorean theorem says that the total mean square noise amplitude is the sum of the mean square amplitudes of the components; thus, for any subspace of dimension j,

$$\|\bar{N}_j\|_{\mathcal{H}''}^2 = j/k.$$

For spaces of infinite dimension, a simple counting procedure will not work, and this result must be modified. Still referring to the space 𝓗'', consider the noise $\bar{N}_{Z_1}$ in the subspace of functions which vanish outside a set Z1: $\bar{F}(z) = 0$ for $z \notin Z_1$. The set Z1, which we have been calling a basis for the subspace, is the support for the subspace.

$$\|\bar{N}_{Z_1}\|_{\mathcal{H}''}^2 = \int_{Z_1} R''(z, z) \, dz = \int_{Z_1} 1 \, dz = \mu(Z_1),$$

where μ(Z1) is the measure of Z1. The mean square noise is given by measuring the support of the subspace, rather than by counting basis elements. It should be recalled that the measure of the support of the whole function space is one. The concept of the measure of the

support of a subspace replaces the idea of the number of basis elements for infinite-dimensional spaces. Thus, if two subspaces are supported by sets of equal measure, the mean noise amplitude in each subspace will be equal.

The key to finding the mean square noise amplitude in a subspace is finding an orthonormal basis for the subspace, then counting the basis or measuring its support. Orthonormality is required in order for each component to contribute equally and independently to the total. The inner product function in 𝓗 has been defined to make the noise in orthogonal directions uncorrelated and of equal amplitude. By fiat, the notion of dimension in 𝓗 has been equated with mean square noise amplitude. Two subspaces are of equal dimension if they have equal noise amplitude. If an orthonormal basis for a subspace is found, so that the support of the functions in it can be expressed, the dimensionality of the subspace equals the measure of the support.

For our purposes, we need not actually count or measure the dimension of any subspace in 𝓗. We need only compare certain subspaces to verify that they are of equal dimension and thus have equal mean noise. As a convenient notation, write μ(𝒮) for this "dimension" or measure of 𝒮. The first result we need follows from the additivity of measure or dimension:

Lemma 2.1. If 𝒮1 and 𝒮2 are disjoint subspaces that span 𝓗, then $\mu(\mathcal{S}_1) + \mu(\mathcal{S}_2) = \mu(\mathcal{H}) = 1.$
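A numerical sketch of this bookkeeping (Python; the dimensions are hypothetical): for white noise in a k-dimensional space with the measure of the whole space normalized to one, the mean square noise amplitude restricted to a j-dimensional coordinate subspace comes out to j/k.

```python
import numpy as np

rng = np.random.default_rng(1)
k, j, trials = 12, 3, 200_000

# White noise normalized so the whole space has mean square amplitude one:
# each of the k components carries mean square amplitude 1/k.
n = rng.normal(0.0, np.sqrt(1.0 / k), size=(trials, k))

whole = (n ** 2).sum(axis=1).mean()             # ~ 1
subspace = (n[:, :j] ** 2).sum(axis=1).mean()   # ~ j/k
print(whole, subspace, j / k)
```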

The second result follows from the generalized Pythagorean theorem: orthogonal components of noise combine by adding the mean square amplitudes.

Lemma 2.2. Suppose 𝓗 has two orthogonal decompositions:

$$\mathcal{H} = \mathcal{S}_1 + \mathcal{T}_1 = \mathcal{S}_2 + \mathcal{T}_2, \quad \text{where } \mathcal{S}_1 \perp \mathcal{T}_1 \text{ and } \mathcal{S}_2 \perp \mathcal{T}_2.$$

Then write

$$n = n_{\mathcal{S}_1} + n_{\mathcal{T}_1} = n_{\mathcal{S}_2} + n_{\mathcal{T}_2},$$

where the subscript indicates to which space that noise component belongs. Suppose further that $\mu(\mathcal{S}_1) = \mu(\mathcal{S}_2)$. Then $\|n_{\mathcal{S}_1}\| = \|n_{\mathcal{S}_2}\|$.

Minimizing the Error

The space 𝓗 can be decomposed into disjoint subspaces in three ways. We have already used the decomposition into the symmetric function subspace 𝒮 and its orthogonal complement 𝒯. The discussion following Definition 2.13 gives a decomposition into the spaces 𝒮 and 𝓛. Finally, 𝓗 can also be decomposed into 𝓛 and its orthogonal complement 𝓛⊥. These three decompositions are illustrated in Fig. 2.6, using the 𝓗'' space, in which orthogonal axes can be drawn geometrically perpendicular. The algebraic decompositions can be written:

(2.18)
$$(1) \quad n = n_{\mathcal{S}o} + n_{\mathcal{T}}, \qquad \bar{N} = \bar{N}_{\mathcal{S}o} + \bar{N}_{\mathcal{T}};$$
$$(2) \quad n = n_{\mathcal{S}L} + n_{\mathcal{L}}, \qquad \bar{N} = \bar{N}_{\mathcal{S}L} + \bar{N}_{\mathcal{L}};$$
$$(3) \quad n = n_{\mathcal{L}^\perp} + n_{\mathcal{L}}, \qquad \bar{N} = \bar{N}_{\mathcal{L}^\perp} + \bar{N}_{\mathcal{L}}.$$

The subscript indicates to which subspace that component of the noise belongs. The subscript 𝒮L indicates that the component is in the space 𝒮 and that it is obtained using the Lopt operator; the subscript 𝒮o

Figure 2.6. Three Ways of Representing the Noise in Terms of Components.

indicates that the decomposition is orthogonal. Apply Lemma 2.1:

(2.19)  $\mu(\mathcal{S}) + \mu(\mathcal{T}) = \mu(\mathcal{S}) + \mu(\mathcal{L}) = \mu(\mathcal{L}^\perp) + \mu(\mathcal{L}) = \mu(\mathcal{S}'') + \mu(\mathcal{T}'') = \cdots = 1.$

(2.20)  $\mu(\mathcal{T}) = \mu(\mathcal{L}); \qquad \mu(\mathcal{S}) = \mu(\mathcal{L}^\perp).$

It then follows from Lemma 2.2:

(2.21)  $\|n_{\mathcal{S}o}\| = \|n_{\mathcal{L}^\perp}\|; \qquad \|\bar{N}_{\mathcal{S}o}\|_{\mathcal{H}''} = \|\bar{N}_{\mathcal{L}^\perp}\|_{\mathcal{H}''}.$

Furthermore, applying the operator Lopt to the noise as expressed in the second decomposition of Eq. 2.18:

$$L_{opt}[n] = L_{opt}[n_{\mathcal{S}L}] + L_{opt}[n_{\mathcal{L}}] = n_{\mathcal{S}L},$$

since Lopt annihilates 𝓛 and leaves 𝒮 invariant. Thus:

(2.22)  $\Delta = \|n_{\mathcal{S}L}\|_{\mathcal{H}} = \|\bar{N}_{\mathcal{S}L}\|_{\mathcal{H}''}.$

Now, referring to Fig. 2.7, it is obvious that $\|\bar{N}_{\mathcal{L}^\perp}\| \leq \|\bar{N}_{\mathcal{S}L}\|$, since a leg of a right triangle cannot be longer than the hypotenuse. Applying Eqs. 2.21 and 2.22:

$$\|\bar{N}_{\mathcal{S}o}\| = \|\bar{N}_{\mathcal{L}^\perp}\| \leq \|\bar{N}_{\mathcal{S}L}\| = \Delta.$$

Figure 2.7. Relationship of the Noise Components in Two Decompositions.

But Δ is required to be the smallest error, so the equality must hold and $\|\bar{N}_{\mathcal{S}L}\| = \|\bar{N}_{\mathcal{S}o}\|$. This can be accomplished by setting 𝓛'' = 𝒯''. The optimal processing operator must project the data point orthogonally onto the symmetric function subspace.

Essentially the same argument can be carried out algebraically in the 𝓗 space by noting:

$$n_{\mathcal{S}o} = n - n_{\mathcal{T}}, \qquad n_{\mathcal{S}L} = n - n_{\mathcal{L}}$$

from Eq. 2.18. Since $(n - n_{\mathcal{T}}) \in \mathcal{S}$ and $(n_{\mathcal{T}} - n_{\mathcal{L}}) \in \mathcal{S}$, $(n - n_{\mathcal{T}}, n_{\mathcal{T}} - n_{\mathcal{L}}) = 0$. Thus

$$\Delta^2 = \|n - n_{\mathcal{L}}\|^2 = \|n_{\mathcal{S}o}\|^2 + \|n_{\mathcal{T}} - n_{\mathcal{L}}\|^2.$$

Since $\|n_{\mathcal{T}} - n_{\mathcal{L}}\|^2 \geq 0$ and, from Eq. 2.21, $\|n_{\mathcal{S}o}\| = \|n_{\mathcal{L}^\perp}\|$:

$$\|n_{\mathcal{S}o}\| \leq \Delta = \|n_{\mathcal{S}L}\|.$$

Again, Δ is not the minimum error unless $\|n_{\mathcal{S}L}\| = \|n_{\mathcal{S}o}\|$, which is accomplished by setting 𝓛 = 𝒯.

Processing condition (3): The optimal processing operator Lopt is obtained only if 𝓛 = 𝒯.

The Projection Operation

It has been shown in the previous section that the optimal

operator Lopt transforms the data orthogonally onto the signal subspace. The best estimate of the signal is that point in 𝒮 that lies at the foot of the perpendicular from the data point to 𝒮. Of course, in the space 𝓗 this "perpendicular" is not geometrically perpendicular, but is orthogonal in the sense of the inner product. Thus the processing operation depends on the noise covariance function.

The processing problem has been expressed so that the signal estimate is that point in the symmetric function subspace which lies closest to the actual signal point. It turns out that the signal estimate is also that point in the symmetric function subspace which lies closest to the observed data point.

The optimal processing operator is the operator that gives the 𝒮 component in the orthogonal decomposition of the data into the 𝒮 and 𝒯 subspaces. This operator is called the "projector" or "projection operator" from 𝓗 to 𝒮. The optimal processing operator shall henceforth be written P (or P' or P'') to indicate that it is this projection operator. The existence and uniqueness of this operator is guaranteed by the completeness of the function space (see Murray, 1941). This projection operator satisfies the conditions of symmetry, since it projects onto the symmetric function subspace, and idempotency, since it is idempotent on its range.

The optimal processing solution. The optimal processing operator is the projection operator, P, which projects the function space 𝓗 onto the symmetric function subspace 𝒮.

This operator may be explicitly constructed if the transformations T and U are found. In the coordinate system of 𝓗'', the

symmetric function subspace has the simple representation: $\bar{S} \in \mathcal{S}''$ if and only if $\bar{S}(z) = 0$ for $z \notin Z_s$. The projection operator in 𝓗'' is found by inspection: it sets to zero all values of the data function $\bar{D}(z)$ which correspond to $z \notin Z_s$. In this coordinate system, P'' has its spectral representation and is represented by a multiplicative operation, not the usual integral:

(2.23)  $\hat{\bar{S}}(z) = P''[\bar{D}(z)] = P(z) \bar{D}(z), \quad \text{where } P(z) = \begin{cases} 0, & z \notin Z_s \\ 1, & z \in Z_s. \end{cases}$

The manipulation may be expressed on the original data function d(x) by transforming through 𝓗' to 𝓗'', obtaining $\bar{D}(z)$; performing the projection using P'' to get $\hat{\bar{S}}(z)$; then transforming back through 𝓗' to 𝓗, obtaining $\hat{s}(x)$:

(2.24)  $\hat{s} = P[d], \quad \text{where } P = T^{-1} U^{-1} P'' U T;$

$$\hat{s}(x) = \int P(x, x') \, d(x') \, dx', \quad \text{where } P(x, x') = \iiint T^{-1}(x, t) \, U^{-1}(t, z) \, P(z) \, U(z, t') \, T(t', x') \, dt \, dt' \, dz.$$

An alternate procedure uses some properties of the projection operator. The projection operator to 𝒮 is idempotent and self-adjoint:

$$P^2 = P \quad \text{and} \quad (f, Pg)_{\mathcal{H}} = (Pf, g)_{\mathcal{H}}, \quad \text{for all } f, g \text{ in } \mathcal{H}.$$

Conversely, any idempotent, self-adjoint operator is a projection. Furthermore, any projection that has values in 𝒮 (is symmetric) and holds 𝒮 invariant (is idempotent on 𝒮) must be the desired optimal operator.
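Equation 2.24's recipe -- transform, zero the coordinates outside Z_s, transform back -- can be sketched in finite dimensions (Python; the covariance, group, and numbers are hypothetical). Here T is built from the eigendecomposition of R, U diagonalizes the transformed permutation, and P(z) is the indicator of the invariant coordinate.

```python
import numpy as np

rho = 0.4
R = np.array([[1.0, rho],
              [rho, 1.0]])                  # hypothetical covariance, symmetric
A = np.array([[0.0, 1.0], [1.0, 0.0]])      # the swap automorphism

# T whitens the noise: T R T* = I.  Here T = R^(-1/2) via eigendecomposition.
d, V = np.linalg.eigh(R)
T = V @ np.diag(d ** -0.5) @ V.T

c = 1.0 / np.sqrt(2.0)
U = np.array([[c, c], [-c, c]])             # diagonalizes A' = T A T^(-1) (= A here)

Pz = np.diag([1.0, 0.0])                    # P(z): keep z in Zs, zero otherwise
P = np.linalg.inv(T) @ U.T @ Pz @ U @ T     # Eq. 2.24: back through the spaces

print(P)                                    # 0.5 * ones: averaging over the orbit
```

Because this covariance is symmetric under the group, the projector collapses to plain averaging over the equivalence class, consistent with the special cases noted in Chapter I.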

Lemma 2.3. Any self-adjoint, symmetric operator that is idempotent on 𝒮 is the optimal processing operator, P, the projection from 𝓗 to 𝒮.

To prove this statement, let L be a self-adjoint, symmetric operator, idempotent on 𝒮. Any point f in 𝓗 has the unique decomposition, f = s + t, with s ∈ 𝒮, t ∈ 𝒯. Then for any s' in 𝒮:

$$(s', Lt) = (Ls', t) \quad \text{by self-adjointness of } L$$
$$= (s', t) \quad \text{by idempotency on } \mathcal{S}$$
$$= 0 \quad \text{since } s' \in \mathcal{S} \text{ and } t \in \mathcal{T}.$$

But $Lt \in \mathcal{S}$ by symmetry of L. Thus Lt = 0, and:

$$Lf = Ls + Lt = Ls = s.$$

But the projection operator also satisfies Pf = s, so that Pf = Lf for all f in 𝓗. Thus L = P. Q.E.D.

The concept of self-adjointness is expressed in terms of the inner product in 𝓗, hence depends on the inner product associated operator $R^{-1}$:

$$(f, Pg)_{\mathcal{H}} = (f, R^{-1} P g)_W, \qquad (Pf, g)_{\mathcal{H}} = (Pf, R^{-1} g)_W = (f, P^{ad} R^{-1} g)_W.$$

In operator terminology, self-adjointness can be expressed:

(2.25)  $P^{ad} R^{-1} = R^{-1} P.$

Applying the R operator to the left and right of Eq. 2.25 gives:

(2.26)  $R P^{ad} = P R.$

The three conditions of symmetry, idempotency on 𝒮, and self-adjointness (as expressed in Eq. 2.26) uniquely determine the operator P.

The processing operator P which minimizes the mean square error between signal and signal estimate is given uniquely by the three conditions:

(1) symmetry: $Pf \in \mathcal{S}$, for all f in 𝓗;
(2) idempotency on 𝒮: $Ps = s$, for all s in 𝒮;
(3) self-adjointness: $PR = RP^{ad}$.

If the noise process can be represented by a transformation of a white noise process, the optimal processing operator is given explicitly by Eq. 2.24.
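In finite dimensions the three conditions can be checked against an explicit candidate. A standard construction (a sketch, not taken from the thesis: S is a hypothetical basis matrix whose columns span the symmetric subspace) is the generalized least squares projector $P = S (S^* R^{-1} S)^{-1} S^* R^{-1}$, which is symmetric, idempotent on the subspace, and satisfies $P R = R P^{ad}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 4-point data space, two replications of a 2-point unit cell:
# the symmetric vectors are spanned by the columns of S.
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

M = rng.normal(size=(4, 4))
R = M @ M.T + 4 * np.eye(4)                # a strictly positive definite covariance

Ri = np.linalg.inv(R)
P = S @ np.linalg.inv(S.T @ Ri @ S) @ S.T @ Ri

print(np.allclose(P @ S, S))               # idempotent on the subspace
print(np.allclose(P @ P, P))               # idempotent
print(np.allclose(P @ R, R @ P.T))         # self-adjointness: P R = R P*
```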

CHAPTER III

FINITE DATA SPACES

A. The Processing Problem

The Data Space and Symmetry

In this chapter, the data is assumed to consist of a finite set of N numbers. The data space is a set of points, xi, i = 1, ..., N, with the discrete topology. The i-th data value is the value of the data function evaluated at xi: di = d(xi). Since all the function spaces and all the processing operations are linear, matrix algebra notation and the ideas of finite-dimensional vector spaces are natural. Functions defined on X will be written as column vectors:

(3.1)  $f = (f_i) = (f(x_i)) = \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_N) \end{pmatrix}.$

The automorphisms of the symmetry are shufflings, or permutations, of the points xi and can be represented as permutation matrices. Each row and each column of a permutation matrix contains exactly one entry with value 1; all other entries are 0. The permutation matrix A corresponding to the automorphism a: xi → xj has a 1 in the j-i position. The automorphism, a, transforms the function f(x) to the new function f'(x) = f(xa). In matrix notation, the permutation matrix A transforms the vector f to the new vector f' = A f, the operation being performed by matrix multiplication. A vector, s, is symmetric if and only if:1

1 This concept is meaningful since there always exist symmetric functions, namely those functions all of whose components are equal.

(3.2)  s = A s, for all A in G.

The following notation is adopted for convenience. If xi is equivalent to xj under the symmetry group G, write i ~ j. The equivalence class of xj is written yj and contains μj members. Let v be the number of points in the unit cell space. Then the total number of points in the X space is $N = \sum_{i=1}^{v} \mu_i$.

This type of data space can arise from a situation in which an experiment is replicated μ times, each replication producing v data values. The signal v-tuple obtained at each trial should remain the same from trial to trial; only the noise should vary. The unit cell space has v points and each equivalence class has the same number, μ, of points, there being μv data points in all. An example of such a space is the statistical problem of estimating the mean of a multivariate, v-dimensional probability distribution given μ samples. The first moments, the means, correspond to the unknown signal, while the central second moments, the covariances, are known. Of course, when the samples are uncorrelated, the best estimate of the population mean must be the sample mean. The present formulation includes not only the case of correlated samples, but also the case in which the data are derived by correlated samples from different probability distributions.

Of particular interest is the case in which all the points are equivalent, the case v = 1. The data space has μ points and the symmetry group is the "full symmetric group" on μ items, the group of all μ × μ permutation matrices.
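A short sketch of this bookkeeping (Python; the numbers are hypothetical): with μ = 3 replications of a v = 2 point unit cell, the permutation matrices act on blocks of the data vector, and Eq. 3.2 holds exactly for vectors that repeat the same v-tuple in every block.

```python
import numpy as np

mu, v = 3, 2
N = mu * v

def perm_matrix(p):
    """N x N permutation matrix shuffling the mu replication blocks by p."""
    A = np.zeros((N, N))
    for i, j in enumerate(p):              # trial i is sent to trial j
        A[j * v:(j + 1) * v, i * v:(i + 1) * v] = np.eye(v)
    return A

s = np.tile([1.7, -0.3], mu)               # the same 2-tuple in every trial
A = perm_matrix([1, 2, 0])                 # a cyclic shuffle of the trials
print(np.allclose(A @ s, s))               # True: s = A s (Eq. 3.2)
```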

The Function Space and the Noise

The function space, 𝓗, will be a vector space with as many dimensions as there are points in X. Points in 𝓗 are column vectors and will be denoted by lower-case letters: f represents the vector (fi) with components fi = f(xi). Linear transformations can be written as finite sums:

(3.3)  $f'(x_i) = \sum_j h_{ij} f(x_j).$

Using matrix notation, let H = (hij):

(3.4)  $f' = H f.$

The prior knowledge of the noise is given by the covariance function $R(x_i, x_j) = E[n(x_i) n^*(x_j)]$. This, too, can be written in matrix notation, setting R = (rij), where $r_{ij} = R(x_i, x_j)$. If n is the noise sample column vector:2

(3.5)  $R = E[n n^*].$

The noise process is known only by the first moments, which are assumed to be zero, and the second moments, which are given by R. The noise probability distribution thus may be approximated by the "ellipse of concentration" (Cramer, 1946). A probability distribution which is uniform inside this ellipse and zero outside has the same first and second moments as the given noise process. The boundary of the ellipse is the locus of points f that satisfy $f^* R^{-1} f = c$,

2 In this chapter, the superscript asterisk denotes the operation of conjugate transposition; n* is a row vector whose entries are the complex conjugates of the elements of n.

where R^{-1} is the inverse of the covariance matrix and c is a real number whose value depends only on the number of dimensions in ℱ. Furthermore, the mean square noise amplitude in the direction u, where u is a vector of unit geometric³ length, is E[n_u* n_u] = 1/(u* R^{-1} u).

The desired notion of distance in the ℱ space is the geometric distance divided by the mean noise amplitude in the direction the distance is measured. The unit of length in any direction is thus set equal to the mean noise amplitude in that direction. Let ℓ be the geometric length of the vector f, so that the unit vector in the f direction is v = f/ℓ. The desired distance notion is:

‖f‖² = ℓ² / (v* R^{-1} v)^{-1} = (ℓv)* R^{-1} (ℓv) = f* R^{-1} f.

The inner product in the ℱ space is thus defined to be:

(3.6)    (f,g) = E[g* R^{-1} f].

The Transformations

The optimal processing operator can be derived directly from the three conditions given at the end of the previous chapter. The transformations from one function space to another arise so naturally, though, and are so easily visualized in the finite-dimensional case that it is worthwhile considering this approach first.

In two dimensions, the function space may be illustrated by the plane, as shown in Fig. 3.1a. The coordinate axes for ℱ are x_1

³ The geometric norm is |u|² = u*u; the geometric inner product is u·v = v*u, the vector scalar product.

[Figure 3.1. Two-dimensional Function Spaces.]

and x_2; the data space forms the basis for the function space. The ellipse of concentration represents the noise process: the noise shown is positively correlated between the two data space points. The points p and q are geometrically farther apart than the points q and r. In the ℱ-space distance, though, the two distances should be equal, because the trend in the noise makes them equally hard to resolve. The two points x_1 and x_2 are equivalent under the symmetry group. The permutations are:

A_1 = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},    A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},

with the multiplication rules: A_2² = I, A_2^{-1} = A_2. A symmetric function must have equal components, s(x_1) = s(x_2), so the symmetric vector s must lie on the line x_2 = x_1, the symmetric function subspace, 𝒮.

Returning to the general finite-dimensional case, apply the non-singular matrix T to the function space as a coordinate transformation, obtaining the space ℱ′. The point f in ℱ transforms to the point f′ in ℱ′ by the rule f′ = T f. The covariance in ℱ′ is given by R′ = E[n′(n′)*] = E[(Tn)(Tn)*] = E[T(nn*)T*] = T R T*. The symmetric functions in ℱ′ are transforms of the symmetric functions in ℱ, so that s′ is symmetric if and only if T^{-1} s′ = A T^{-1} s′, or s′ = T A T^{-1} s′ = A′ s′ for all A in G, where A′ is the transformed permutation matrix. These rules can be summarized:

(3.7)    (1) f′ = T f    (2) R′ = T R T*    (3) A′ = T A T^{-1}
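(Illustrative sketch, not part of the original argument: the rules of Eq. 3.7 make any non-singular T a legitimate change of coordinates because the inner product of Eq. 3.6 computed in ℱ′ with R′ agrees with that computed in ℱ with R. Python with NumPy assumed; the covariance and the vectors are hypothetical, and the random T is non-singular with probability one.)

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 2-point example: positively correlated noise, as in Fig. 3.1a.
    R = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

    def inner(f, g, R):
        """The F-space inner product (f, g) = g* R^{-1} f of Eq. 3.6."""
        return np.conj(g) @ np.linalg.solve(R, f)

    # Any non-singular T acts as a coordinate transformation (Eq. 3.7):
    T = rng.standard_normal((2, 2))
    f, g = rng.standard_normal(2), rng.standard_normal(2)
    f1, g1, R1 = T @ f, T @ g, T @ R @ T.conj().T

    # The inner product computed in F' with R' agrees with that in F with R.
    print(np.isclose(inner(f, g, R), inner(f1, g1, R1)))   # True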

Since R is a positive definite matrix, there is a unitary⁴ matrix V such that V R V* = D, where D is a real diagonal matrix with positive entries. Thus V can be considered to be a coordinate transformation, taking the ℱ space to the new function space ℱ′, which diagonalizes the covariance. A unitary transformation preserves the distance and inner product notions, (Vf, Vg) = (f, g). It is therefore a "rigid body motion," the algebraic representation of a rotation or reflection. Returning to the two-dimensional example, the transformation V rotates the coordinates until the ellipse of concentration has its axes aligned with the coordinate axes (Fig. 3.1b). The covariance matrix corresponding to such an ellipse is diagonal. Since distance is preserved, the relations between p′, q′, and r′ are the same as in ℱ. The symmetric function subspace is no longer the line with slope 1, but must now be found from the transformed permutation matrices.

Since the covariance matrix R is assumed to be strictly positive definite, the matrix D will have no zeroes along the diagonal -- in no direction is the noise amplitude zero. Geometrically, the area inside the ellipse is not zero. By expanding the ellipse along some coordinate axes and shrinking it along others, all the semi-axes can be brought to have unit length; the ellipse becomes a circle, and the noise becomes white (Fig. 3.1c). In the general case, this change in scale factors along the axes is provided by a diagonal transformation matrix, specifically the matrix D^{-1/2}. Applying D^{-1/2} as a transformation to ℱ′ gives the space ℱ″. The covariance in ℱ″ is

⁴ A matrix V is unitary if V* = V^{-1}.

R″ = D^{-1/2} R′ (D^{-1/2})*. But D is diagonal and positive, so D^{-1/2} is diagonal and real; (D^{-1/2})* = D^{-1/2}. Thus R″ = D^{-1/2} D D^{-1/2} = I. The noise has been transformed to a white noise process. The inner product in ℱ″ thus has the white noise, or geometric, form:

(3.8)    (f″, g″) = E[(g″)* f″].

In Fig. 3.1c, the distance between p″ and q″ is now equal to the distance between q″ and r″.

The successive action of the transformations V and D^{-1/2} is, in the notation of the previous chapter, the transformation T. In the ℱ″ space:

(3.9)    f″ = T f,    R″ = T R T* = I,    A″ = T A T^{-1}.

Defining the inner product in ℱ in terms of that in ℱ″:

(f,g) = (Tf, Tg)″ = E[(Tg)*(Tf)] = E[g*(T*T)f] = E[g* H f], where H = T*T.

Thus the associated linear operator for the inner product is H = T*T. This result is simply a restatement of Eq. 2.8, recognizing that the representation of the adjoint of an operator is the conjugate transpose of a matrix. From Eq. 3.9, T R T* = I. Therefore R = T^{-1}(T*)^{-1} and R^{-1} = T*T = H:

(3.10)    (f,g) = E[g* R^{-1} f].

The optimal signal processing operation in ℱ″ is found by

projecting the data point d″ orthogonally onto the symmetric function subspace 𝒮″, as shown in Fig. 3.1c. This projection is most easily performed by rotating the coordinate axes once more, so that the symmetric function subspace lies oriented with the coordinate axes. In Fig. 3.1d, the rotation brings 𝒮″ to lie on the horizontal axis. Rotating the covariance circle leaves it a circle -- the noise remains white.

In the general case, suppose the unit cell space consists of v points. The symmetric function subspace is then v-dimensional. A unitary transformation U will rotate the coordinates until 𝒮″ lies along the first v coordinate axes. The symmetric functions are then those vectors whose coordinates are all zero after the first v. The projection operation in ℱ″ consists of retaining the first v coordinates of the data vector, setting the remainder to zero:

(3.11)    s″ = P″ d″, where P″ = (P″_ij) is given by P″_ij = 1 if i = j ≤ v, and 0 otherwise.

It is easily verified that P″ is idempotent and self-adjoint. The existence of the transformation U is guaranteed by a theorem in van der Waerden (1955; v. II, p. 134), which can be rephrased in the present notation: Given an inner product operator (f,g) = g* R^{-1} f and a self-adjoint operator P, (f, Pg) = (Pf, g), there exists a coordinate transformation U T such that if f″ = U T f, g″ = U T g, then (f″, g″)″ = g″* f″ and (f, Pg) = g″* P″ f″, where P″ is a diagonal matrix whose diagonal entries are the eigenvalues of P.
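(A minimal numerical sketch of the whole chain V, D^{-1/2}, U, anticipating the retransformation formula stated below as Eq. 3.12. Python with NumPy; the covariance is hypothetical, and the two-dimensional construction of U is ad hoc to this small example.)

    import numpy as np

    # Hypothetical 2-point example: correlated noise, x1 ~ x2 (full symmetry, v = 1).
    R = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

    # Step 1 (V): unitary diagonalization V R V* = D.
    evals, evecs = np.linalg.eigh(R)     # columns of evecs are eigenvectors
    V = evecs.conj().T

    # Step 2 (D^{-1/2}): rescale so the covariance becomes the identity.
    T = np.diag(evals ** -0.5) @ V       # T = D^{-1/2} V, so T R T* = I
    print(np.allclose(T @ R @ T.conj().T, np.eye(2)))   # True: noise is white

    # Step 3 (U): rotate the image of the symmetric direction onto the first axis.
    s = T @ np.ones(2)                   # image of the symmetric direction (1, 1)
    u1 = s / np.linalg.norm(s)
    u2 = np.array([-u1[1], u1[0]])       # orthogonal complement in 2-D
    U = np.vstack([u1, u2])

    # Projection keeps the first v = 1 coordinates (Eq. 3.11);
    # Eq. 3.12 pulls it back: P = T^{-1} U^{-1} P'' U T.
    P2 = np.diag([1.0, 0.0])
    P = np.linalg.inv(T) @ U.conj().T @ P2 @ U @ T

    d = np.array([2.0, 1.0])             # a data vector
    print(P @ d)                         # symmetric estimate: both components equal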

The eigenvalues of a projection operator must be either zero or one, by the idempotency condition, and the eigenvalue one has multiplicity equal to the dimension of the symmetric function subspace, by the idempotency-on-𝒮 condition. The operator P″ of the theorem is the desired projection operator in the ℱ″ plane. For simplicity, in our formulation of the ℱ″ space, the non-zero entries on the diagonal of P″ precede the zero entries.

This theorem proves the existence of the solution. The uniqueness proof follows that in Lemma 2.3. The projection operation in the ℱ space is found by transforming the data to ℱ″, performing the projection with P″, then retransforming the solution back to ℱ:

(3.12)    ŝ = P d, where P = T^{-1} U^{-1} P″ U T.

The P″ operator in ℱ″ is symmetric, idempotent on 𝒮″, and self-adjoint by construction. It is now verified that the P operator in ℱ also satisfies these conditions. Note that P″ = U T P T^{-1} U^{-1} from Eq. 3.12.

Symmetry. -- A″ = U A′ U^{-1} = U T A T^{-1} U^{-1}. By the symmetry of P″, P″ = A″ P″ for all A″ in G. Then U T P T^{-1} U^{-1} = (U T A T^{-1} U^{-1})(U T P T^{-1} U^{-1}), or P = A P for all A in G.

Idempotency on 𝒮. -- P s = T^{-1} U^{-1} P″ U T s = T^{-1} U^{-1} P″ s″ = T^{-1} U^{-1} s″ = s.

Self-adjointness. -- Note, first, that R^{-1} = T*T, U^{-1} = U*, (T*)^{-1} = (T^{-1})*, and P″* = P″. Then P* R^{-1} = T* U* P″* (U^{-1})* (T^{-1})* T* T = T* U* P″ U T. Also R^{-1} P = T* T T^{-1} U^{-1} P″ U T = T* U* P″ U T. Thus P* R^{-1} = R^{-1} P, and

(f, Pg) = E[g* P* R^{-1} f] = E[g* R^{-1} P f] = (Pf, g).

B. The Processing Solution

Obtaining the Solution

In practice, it is easier to find a matrix that is simultaneously symmetric,⁵ idempotent on 𝒮, and self-adjoint than to find the transformations U and T. The equations which must be solved can be expressed in matrix form:

(3.13)    symmetry:           A P = P, for all A in G
          idempotency on 𝒮:   P s = s, for all s in 𝒮
          self-adjointness:   P R = (P R)* = R P*

Since A is a permutation matrix, it is unitary. Thus if A is in G, A^{-1} = A* is also in G and A* P = P. By self-adjointness, then:

P R = R P* = R (A* P)* = R P* A = P R A, for all A in G.

By symmetry, P R = A P R, for all A in G, proving:

(3.14)    If C = P R, then A C = C = C A, for all A in G.

The elements of C are therefore symmetric in both subscripts:

(3.15)    C = (c_ij), where c_ij = c_i′j′ for i ~ i′, j ~ j′,

and there are only v² different coefficients to be determined. Since the projection operator P can be obtained from C:

(3.16)    P = C R^{-1}

⁵ In the sense that A P = P for all A in G.

it, too, has only v² linearly independent entries.

Define a series of "standard" symmetric functions, s_i, such that the function s_i has value 1 on all points equivalent to x_i and value 0 on all other points. These v standard functions form a basis for 𝒮:

(3.17)    s_i = (s_i(x_j)), where s_i(x_j) = 1 if i ~ j and 0 otherwise, for i = 1,2,...,v.

The idempotency condition for P requires s_i = P s_i, for all i. Applying Eq. 3.16 expresses this condition in terms of the C matrix:

(3.18)    s_i = C R^{-1} s_i, for i = 1,...,v.

Allowing for the symmetry requirement, Eq. 3.15, for the coefficients of C, each value of i in Eq. 3.18 yields v linearly independent equations. In all, then, Eq. 3.18 expresses the v² equations needed to determine the v² different elements of C. The matrix P is then found by applying Eq. 3.16. In addition to solving these v² simultaneous linear equations, this solution requires inverting the N x N covariance matrix.

A shorter procedure involves expressing idempotency in terms of P directly:

(3.19)    Σ_{j~k} P_ij = 1 if i ~ k, and 0 if i ≁ k, for i = 1,...,v,

the sum being taken over the equivalence class of x_k. For each value of i, Eq. 3.19 expresses v conditions on the P_ij, so each row of P has only N-v independent elements. Because of the symmetry condition, P_ij = P_i′j for i ~ i′, P has only v distinct rows. There are v(N-v) coefficients left to be determined for P.
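(Sketch: one compact way to produce a matrix satisfying all three conditions of Eq. 3.13 at once is the closed generalized-least-squares form P = S(S* R^{-1} S)^{-1} S* R^{-1}, with S the matrix whose columns are the standard symmetric functions of Eq. 3.17. This is equivalent to solving Eqs. 3.15-3.18, but it is not the elimination procedure of the text. Python with NumPy; the covariance is hypothetical, and the space is the four-point example x_1 ~ x_3, x_2 ~ x_4 developed below.)

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4))
    R = B @ B.T + 4 * np.eye(4)        # some positive definite covariance

    classes = [[0, 2], [1, 3]]         # equivalence classes (0-indexed)

    # Standard symmetric functions s_i (Eq. 3.17): indicators of the classes.
    S = np.zeros((4, 2))
    for k, cls in enumerate(classes):
        S[cls, k] = 1.0

    # C = S G S* has c_ij = c_i'j' for i ~ i', j ~ j' (Eq. 3.15); imposing
    # C R^{-1} s_i = s_i (Eq. 3.18) gives G = (S* R^{-1} S)^{-1}.
    G = np.linalg.inv(S.T @ np.linalg.solve(R, S))
    C = S @ G @ S.T
    P = C @ np.linalg.solve(R, np.eye(4))      # P = C R^{-1} (Eq. 3.16)

    # Verify the three conditions of Eq. 3.13 on the swap A: (x1 x3)(x2 x4).
    A = np.eye(4)[[2, 3, 0, 1]]
    print(np.allclose(A @ P, P),               # symmetry
          np.allclose(P @ S, S),               # idempotency on the subspace
          np.allclose(P @ R, (P @ R).conj().T))  # self-adjointness: PR = (PR)*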

These may be found by applying Eqs. 3.14 and 3.15. In fact, the solution decomposes into the solution of v separate sets of N-v equations, which can be chosen to have the same (N-v) x (N-v) kernel. The entire solution is then reduced to inverting one (N-v) x (N-v) matrix, a significant saving in computational effort.

As an example, let x_1 ~ x_3, x_2 ~ x_4. Applying the symmetry and idempotency requirements to P reduces it to only four coefficients:

(3.20)    P = \begin{pmatrix} α & β & 1-α & -β \\ γ & δ & -γ & 1-δ \\ α & β & 1-α & -β \\ γ & δ & -γ & 1-δ \end{pmatrix}

Obtaining C from Eq. 3.14 and applying Eq. 3.15 reduces the problem of finding these coefficients to the solution of the two sets of equations:

(3.21)    M \begin{pmatrix} α \\ β \end{pmatrix} = \begin{pmatrix} R_33 - R_31 \\ R_34 - R_32 \end{pmatrix},    M \begin{pmatrix} γ \\ δ \end{pmatrix} = \begin{pmatrix} R_43 - R_41 \\ R_44 - R_42 \end{pmatrix},

where

M = \begin{pmatrix} R_11 - R_13 - R_31 + R_33 & R_21 - R_23 - R_41 + R_43 \\ R_12 - R_14 - R_32 + R_34 & R_22 - R_24 - R_42 + R_44 \end{pmatrix}

All four coefficients in P may be found by inverting the single 2x2 matrix M. It is interesting to note that M is the covariance matrix of the noise differences:

(3.22)    M = \begin{pmatrix} E[|n_1 - n_3|²] & E[(n_2 - n_4)(n_1 - n_3)*] \\ E[(n_1 - n_3)(n_2 - n_4)*] & E[|n_2 - n_4|²] \end{pmatrix}

Full Symmetry

If all the N data points are equivalent, the symmetry is the full symmetry group on N elements. The unit cell space has only one

point and the symmetric function subspace is one-dimensional. There is little computational difference between the short and the long solutions, but the latter can be expressed rather simply, involving the computation of a single constant.

The matrix C = P R must have all its entries equal, by Eq. 3.15. Writing J for the NxN matrix with all elements equal to one: C = cJ. Once R^{-1} is determined, the constant c is the only parameter left to compute. Equation 3.18 then becomes:

s_1 = c J R^{-1} s_1.

This equation can be solved to give

c = (s_1* R^{-1} s_1)^{-1} = (Σ_{i,j} r^{ij})^{-1},

where s_1 is the vector with all N components equal to one and (r^{ij}) denotes the entries of R^{-1}. The reciprocal of c is the sum of all the entries in the R^{-1} matrix. Applying Eq. 3.16:

(3.23)    P = c J R^{-1} = s_1 (s_1* R^{-1} s_1)^{-1} s_1* R^{-1},

since J = s_1 s_1* (cf. Grenander and Rosenblatt, 1957). The coefficients of P can be written:

(3.24)    P_ij = Σ_k r^{kj} / Σ_{k,l} r^{kl}.

The contribution of the j-th data value to the signal estimate is the

ratio of the sum of the elements in the j-th column of the inverse covariance matrix to the sum of all the elements in the inverse covariance matrix. In the two-dimensional case, the solution becomes:

(3.25)    ŝ = [(r_22 - r_21) d_1 + (r_11 - r_12) d_2] / (r_11 - r_12 - r_21 + r_22).

Estimating the Noise

Some interesting properties of the solution can be derived from Eq. 3.19 without explicitly solving for P. The estimate of the i-th signal value can be written, in general:

(3.26)    ŝ(x_i) = ŝ_i = Σ_j P_ij d(x_j).

The estimate of the signal at x_i depends on all the data values. The coefficient P_ij is the weight of the contribution of d(x_j) to the estimate of s(x_i). Grouping the data into equivalence classes, Eq. 3.26 becomes:

(3.27)    ŝ_i = Σ_j P_ij d_j + Σ_k P_ik d_k + ...

The first term on the right is summed over the μ_i points in the equivalence class of x_i. The first summation is the contribution of the data values as measured at points equivalent to the estimate point: the contribution of "relevant" data values. The remaining summations are each over a separate equivalence class. They represent the contribution of points not equivalent to the estimate point: the contribution of "irrelevant" data values.
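(A short numerical check of Eqs. 3.23-3.25, in Python with NumPy; the covariance and data values are hypothetical.)

    import numpy as np

    # Full symmetry: the weight of d_j is the j-th column sum of R^{-1}
    # divided by the total sum of the entries of R^{-1} (Eq. 3.24).
    R = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    d = np.array([3.0, 5.0])

    Rinv = np.linalg.inv(R)
    weights = Rinv.sum(axis=0) / Rinv.sum()
    print(weights, weights.sum())             # weights sum to 1 (idempotency)
    print(weights @ d)                        # the scalar estimate

    # Eq. 3.25, written out for the two-point case, gives the same number:
    r11, r12, r21, r22 = R.ravel()
    print(((r22 - r21) * d[0] + (r11 - r12) * d[1]) / (r11 - r12 - r21 + r22))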

Equation 3.19 states that the sum of the weights P_ij in the first summation is 1, and the sum of the weights in each other summation is 0. In particular, if x_0 is an isolated singular point of the data space, being invariant under all the automorphisms, the equivalence class y_0 contains only the one element x_0. This datum value makes no contribution to the estimate of any signal value other than s(x_0).

The question arises: why should any irrelevant data value contribute at all to the estimate? The answer, of course, is that the irrelevant signal does not. The noise at the irrelevant point is correlated with the noise on the desired signal, though, and knowing this noise value aids in removing the noise obscuring the signal. But the noise values, "uncontaminated" by the irrelevant signals, are not known. If the data at two equivalent points are subtracted, the signal will cancel out, leaving the noise difference. These noise differences are used in making the estimate. The datum at an isolated point cannot contribute to the estimate because the noise cannot be separated from the signal.

The contribution of these noise differences is illustrated by rewriting Eq. 3.27 in terms of the signal and noise separately. Because of the constraints of Eq. 3.19, the signal term in all the summations after the first disappears:

ŝ_i = s_i + Σ_j P_ij n_j + Σ_k P_ik n_k + ...

The dependence of the estimate on the noise differences is tacitly contained in the constraints on the weights. This dependence may be emphasized:

ŝ_i = s_i + n_{j_0} + Σ_j P_ij (n_j - n_{j_0}) + Σ_k P_ik (n_k - n_{k_0}) + ...,

where x_{j_0} is any point in the equivalence class y_i, x_{k_0} any point in y_k, etc. Since s_i + n_{j_0} = d(x_{j_0}), the problem can be restated: The signal s_i is estimated by the data value d(x_{j_0}). The coefficients P_ij are then chosen, under the constraints of Eq. 3.19, to best cancel out the noise n_{j_0} with the remaining noise differences.

The presence of the irrelevant data does not automatically lead to an improvement in the estimate. Even though the noise at points x_k and x_k′ may be correlated with the noise at x_{j_0}, the noise difference (n_k - n_k′) may be uncorrelated with n_{j_0}. In this case, the knowledge of the data at irrelevant points is, in fact, irrelevant to the signal estimate.

In the four-point example (Eq. 3.20), the estimate ŝ_1 depends on all four data values. If the noise difference is uncorrelated between equivalence classes, the matrix M, the covariance of the noise differences, becomes diagonal, as seen from Eq. 3.22. The solution then reduces to

ŝ_1 = α d_1 + (1-α) d_3,

which is the solution to the two-point problem, given by Eq. 3.25.

C. Special Covariance Matrices

The "trivial" Solution

In this section, two special types of noise processes are considered. In these cases, not only are the irrelevant data values

indeed irrelevant, but all the relevant data are weighted equally. The estimate of s_i is just the arithmetic mean of the data at the relevant points:

(3.28)    ŝ_i = (1/μ_i) Σ_{i′~i} d_{i′},

where μ_i is the number of points in the equivalence class of x_i. This solution will be called the "trivial" solution, since it seems to ignore all the information about the inter-relationships between the noise samples.

Symmetric Noise

The noise process is symmetric if the properties of the noise process are invariant under all the automorphisms of the symmetry group. For our purposes, only the first two moments of the noise process are considered, so a weaker condition is sufficient. The noise process is "wide sense" symmetric if the covariance function satisfies R(x^a, (x′)^a) = R(x, x′) for all a in G. In matrix terms, R = E[n n*], so the symmetry requirement becomes:

R(x^a, (x′)^a) = E[(An)(An)*] = E[A n n* A*] = A R A*.

The noise process is wide sense symmetric if and only if:⁶

(3.29)    R = A R A*, for all A in G.

The self-adjoint property can then be expanded:

⁶ Since the permutation matrices, A, are unitary, the white noise process, R = I, is wide sense symmetric.

P R = R P*.

Substituting R = A R A* from Eq. 3.29 shows that A and R commute: A R = R A. Then, since A* is also in G, so that A* P = P:

(PA) R = P R A = R P* A = R (A* P)* = R P*,
R (PA)* = R A* P* = A* R P* = A* P R = P R = R P*.

Thus (PA) R = R (PA)*, and the matrix PA is self-adjoint. PA is symmetric, since A′(PA) = (A′P)A = PA for all A′ in G. PA is also idempotent on 𝒮, since (PA)s = P(As) = Ps = s. Therefore PA is a solution to the processing problem. Since P is the unique solution, PA = P for all A in G, so that P_ij = P_ij′ for j ~ j′. Thus all the summands in the sum of Eq. 3.19 are equal:

P_ij′ = 1/μ_i for j′ ~ i;    P_ij = 0 for j ≁ i.

The solution is therefore the trivial one of Eq. 3.28.

A special case of symmetric noise is the statistical problem arising by taking uncorrelated samples from a single multivariate probability distribution. Let each sample produce v data values. The v x v covariance matrix R_v describes the relationships between the noise in any one sample. Since different samples are assumed uncorrelated, the total covariance matrix is simply:

R = \begin{pmatrix} R_v & 0 & \cdots & 0 \\ 0 & R_v & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_v \end{pmatrix}

where 0 represents a v x v block of zeroes. Since this covariance matrix R is symmetric, the solution is the "trivial" one: given uncorrelated samples from a multivariate probability distribution, the sample mean vector is the best estimate of the distribution mean vector in the sense of minimizing mean square error. This estimate is unbiased.

Stationary Noise

The concept of stationarity is meaningless unless the data space is a topological group. It will thus be considered in full detail in a later chapter. Because this type of noise does lead to the "trivial" processing, and because it can be discussed with the mathematical structure that has already been developed, it will be considered briefly.

The finite data space, x_i for i = 0,1,...,N-1, is now assumed to form a group by defining addition: x_i + x_j = x_{i+j}, where the summation on the subscripts is taken modulo N. In essence, this means that x_0 is "adjacent" to x_{N-1} just as x_0 is adjacent to x_1. The noise process is stationary if the joint probability distribution of n_i and n_j depends only on the difference i-j. The condition of "wide sense" stationarity is sufficient for our purposes: the noise is wide sense stationary if the covariance function R(x_i, x_j) depends only on the difference i-j. In matrix terms, R = (r_ij) = (r_{i-j}):

R = \begin{pmatrix} r_0 & r_1 & r_2 & \cdots & r_{N-1} \\ r_1^* & r_0 & r_1 & \cdots & r_{N-2} \\ r_2^* & r_1^* & r_0 & \cdots & r_{N-3} \\ \vdots & & & \ddots & \vdots \end{pmatrix}

Each row of R is identical to the row above, but shifted one unit to the right. The topological group structure means that an entry shifted off the right end re-appears on the left. Such a matrix is called a "circulant," with eigenvalues λ_j and eigenfunctions φ_j given by (Marcus, 1960):

φ_j R = λ_j φ_j,

where

λ_j = Σ_k r_k exp(2πijk/N), i = √(-1),

and

φ_j = (1, exp(2πij/N), exp(4πij/N), ..., exp[2π(N-1)ij/N]).

Since the eigenfunctions and eigenvalues are known, the transformations V and D are easily found. The matrix V is the matrix whose j-th row is the j-th eigenfunction. The matrix D is the diagonal matrix whose j-th diagonal entry is the j-th eigenvalue. V is given by:

V = (φ_{j,k}), where φ_{j,k} = exp(2πijk/N).

The transformation g = V f becomes:

g_j = Σ_k f_k exp(2πijk/N).

The j-th coefficient of g is exactly the coefficient of the j-th term in the Fourier expansion of f. In fact, the eigenvalue λ_j is the j-th coefficient in the power spectral density function of the noise. The mathematical manipulations we have created reduce to the Fourier transform when the problem is specialized enough for Fourier analysis to apply!

Continuing with the solution, assume the symmetry to be full, so that all the points are equivalent. A function is symmetric if and

only if it is a constant. Therefore its spectrum consists only of the "DC term." The signal subspace in either ℱ or in ℱ′ is already aligned along the j = 0 axis. The processing therefore consists of computing the transform of the data, ignoring all terms except the zero-frequency term, then retransforming. This procedure, of course, is exactly the "trivial" processing of Eq. 3.28.
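(A numerical sketch of these two facts, in Python with NumPy; the covariance coefficients r_k are hypothetical.)

    import numpy as np

    # For a circulant (wide-sense stationary) covariance on the cyclic group,
    # the DFT diagonalizes R, and full-symmetry processing keeps only the
    # zero-frequency term -- i.e., it reduces to the plain average (Eq. 3.28).
    N = 8
    r = np.array([1.0, 0.5, 0.2, 0.0, 0.0, 0.0, 0.2, 0.5])  # hypothetical r_{i-j}
    R = np.array([[r[(i - j) % N] for j in range(N)] for i in range(N)])

    # Eigenvalues of the circulant are the DFT of its generating sequence.
    lam = np.fft.fft(r)
    print(np.allclose(np.sort(np.linalg.eigvalsh(R)), np.sort(lam.real)))  # True

    d = np.random.default_rng(2).standard_normal(N) + 3.0    # data around signal 3
    D = np.fft.fft(d)
    D[1:] = 0.0                       # keep only the DC term
    s_hat = np.fft.ifft(D).real
    print(np.allclose(s_hat, d.mean()))                      # True: trivial average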

CHAPTER IV

REPLICATED DATA SPACES

A. The Processing Problem

Reduction to the Finite Case

In this chapter, the data are assumed to be derived from a series of replicated experiments. Each trial produces a data function, rather than the single datum value of Chapter III. The symmetry is assumed to be full -- the signal function is identical from trial to trial. The first part of this chapter shows how this replicated data problem reduces to the problem of the finite data space, so that the solution involves the inversion of a finite matrix or the solution of a finite number of simultaneous linear equations. In contrast with the previous chapter, the matrices will have functions, rather than numbers, as entries. The simultaneous equations will be linear integral, rather than algebraic, equations.

The data space for this type of data is illustrated in Fig. 2.1. The data obtained from the i-th trial are a function defined on the space X_i. This space is a "building block" for X: X = ∪_i X_i. Under the symmetry group, each X_i is equivalent to every other X_j and also to the unit cell space Y. Any point x in X must be located in one of the building blocks, x = y_i for some y in Y, i ≤ N. Any function f(x) defined over X can thus be represented as a series of functions, each defined over a separate block: f(x) = f(y_i) = f_i(y), y ∈ Y, i = 1,2,...,N. This series of functions can be written as a column vector:

f = (f_i(y)) = \begin{pmatrix} f_1(y) \\ f_2(y) \\ \vdots \\ f_N(y) \end{pmatrix}

Two points x and x′ in X are equivalent if and only if x = y_i and x′ = y_j: equivalent points can differ only in their subscripts. The automorphisms of the symmetry group permute the subscripts. In the vector notation, the permuted function f′(x) = f(x^a) has the same components as f(x), but in different order. Thus f′ = A f, where A is an NxN permutation matrix as in Chapter III. The symmetric functions are those with equal components:

(4.1)    f ∈ 𝒮 if and only if f_i(y) = f_j(y) for all y ∈ Y; i,j = 1,...,N.

In order to proceed with the machinery of Chapter III, it is necessary to represent transformations of coordinates and linear operators as matrices with functions as entries. The operators that will be needed include the transformations V, T, and U; the projection operators P and P″; and the linear operators associated with the inner product notion, R and R^{-1}. The permutation matrix A is also an example of such an operator matrix.

Consider a linear transformation, represented by the integral operator h(x,x′) as in Chapter II:

(4.2)    f′(x) = ∫_X h(x,x′) f(x′) dx′.

Writing x as y_i and noting that the integral over X can be broken down into a sum of integrals, each over a subspace X_j:

f′(y_i) = Σ_{j=1}^{N} ∫_{X_j} h(y_i, x′) f(x′) dx′.

The variable of integration, x′, in each integral is limited to one subspace, so the integral can be rewritten:

f′(y_i) = Σ_{j=1}^{N} ∫_Y h(y_i, y′_j) f(y′_j) dy′.

The kernel function h(y_i, y′_j) is now broken up into a series of functions:

(4.3)    h(y_i, y′_j) = h_ij(y, y′),

so that the transformation becomes:

f′(y_i) = Σ_{j=1}^{N} ∫_Y h_ij(y, y′) f(y′_j) dy′.

This result is formally similar to Eq. 3.3, but rather than multiply the datum value f(x_j) by the constant h_ij as in Eq. 3.3, the data function f(y′_j) is composed with the function h_ij(y, y′):

(4.4)    h_ij * f_j = ∫_Y h_ij(y, y′) f_j(y′) dy′.

The operator matrix H can now be defined as the matrix of functions h_ij(y, y′) as given in Eq. 4.3. The transformation of Eq. 4.2 becomes:

f′ = H f,

where the multiplication of elements is given by the composition of Eq. 4.4.
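(Illustrative sketch of the composition of Eq. 4.4 in discretized form, in Python with NumPy; the kernel and data function are hypothetical. With y sampled at m points with spacing dy, each operator function h_ij(y, y′) becomes an m x m array and composition becomes a matrix product times dy.)

    import numpy as np

    m = 100
    y, dy = np.linspace(0.0, 1.0, m, retstep=True)

    def compose(h, f):
        """Discretized h * f: the integral of h(y, y') f(y') dy' (Eq. 4.4)."""
        return h @ f * dy

    # A hypothetical smoothing kernel and data function, for illustration only.
    h = np.exp(-((y[:, None] - y[None, :]) ** 2) / 0.02)
    h /= h.sum(axis=1, keepdims=True) * dy      # each row integrates to one
    f = np.sin(2 * np.pi * y)

    f_smooth = compose(h, f)
    print(f_smooth.shape, float(abs(f_smooth).max()) <= 1.0 + 1e-9)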

In general, the multiplication of two matrices is given by the rule: If U = (u_ij) and V = (v_ij), the product U V = W = (w_ij) is given by:

(4.5)    w_ij = Σ_k u_ik * v_kj, where u_ik * v_kj = ∫_Y u_ik(y, y″) v_kj(y″, y′) dy″.

In order to extend this definition to include the product of a matrix with a vector or the scalar product of two vectors, the definition of composition is extended to include:

f_i * h_ij = ∫_Y f_i(y′) h_ij(y′, y) dy′,
f_i * g_i = ∫_Y f_i(y) g_i(y) dy.

As an illustration of these composition methods using the matrix notation, consider the desired inner product notation: (f,g) = E[g* R^{-1} f]. This can be written out in full as:

(f,g) = Σ_{i,j} E[g_i* * R^{-1}_ij * f_j]
      = Σ_{i,j} ∫_Y ∫_Y E[g_i*(y) R^{-1}_ij(y, y′) f_j(y′)] dy dy′
      = ∫_X ∫_X E[g*(x) R^{-1}(x, x′) f(x′)] dx dx′.

Thus the matrix notation for the inner product corresponds exactly with the integral formulation of Chapter II. Note that the asterisk has been used in three different senses in these equations: as a superscript on a constant or a function it denotes complex conjugation; as a superscript on a vector it denotes conjugate transposition; and as an operation between two functions it denotes composition. These usages are always distinguished by the context.

In order to utilize fully the matrix methods of Chapter III, the matrix entries, the operator functions, must form a field of characteristic not equal to 2 (van der Waerden, 1955; v. II, p. 128). Three conditions must therefore be satisfied by the operator functions: (1) There must be an identity function, I_ij, with the property that I_ij * f_j = f_i and f_i * I_ij = f_j for all functions f_i, f_j. (2) Each operator h must have an inverse, h^{-1}, such that h * h^{-1} = I. And (3) it must not be the case that h + h = 0 for all operator functions h; that is, the field must not have characteristic 2.

Condition (3) is automatically satisfied if the data values can be numbers other than zero and one -- the processing techniques do not necessarily apply to binary data. Condition (2), the existence of inverses, need not apply if we are satisfied merely with writing down the set of simultaneous linear equations that the processing operator must satisfy. The techniques involved in actually solving the equations will not be discussed here.

Condition (1), the existence of the identity function, cannot be dismissed so easily. In fact, the entries of the permutation matrix A are the operators 0_jk and I_jk, not the numbers 0 and 1. The

issue of the identity function was avoided by defining symmetric functions in Eq. 4.1 without reference to the permutation matrices. The required identity is the delta function, widely used in signal theory even though mathematically it is not a function. Rigor may be maintained, for the present purposes, by including the identity operator as a possible entry in the operator matrices. Thus the operator matrix H has, as entries, the functions of Eq. 4.3 or the identity operator. Matrix multiplication is given by the rule in Eq. 4.5 except in the case of the identity operator, when the properties given above in condition (1) are used.

The identity operator need occur only in the operator matrices. The function vectors (the signal, the noise, the observed data, and the estimated signal) never contain the identity as an entry. Although the idempotency condition in Chapter III was expressed with the aid of the "standard" symmetric functions (Eq. 3.17), which contain these identity operators as elements, the idempotency condition in this chapter can be expressed in other terms.

The Solution

The optimal processing operator is an operator matrix, P, which must satisfy the three conditions:

(4.6)    symmetry:           A P = P, for all A in G
         idempotency on 𝒮:   P s = s, for all s in 𝒮
         self-adjointness:   P R = R P*

The symmetry requirement is satisfied by setting P_ij = P_i′j for all i, i′. Since full symmetry is assumed, only one estimate is needed and the first subscript can be dropped, the processing operator matrix reducing to a single row:

(4.7)    ŝ = Σ_j p_j * d_j,    ŝ(y) = Σ_j ∫_Y p_j(y, y′) d_j(y′) dy′.

In order to avoid inverting the R matrix, the "short" solution is used, following Eq. 3.19. The solution is thus expressed in terms of N-1 simultaneous linear equations. Since these equations involve the composition operator, they are integral equations. The idempotency condition, Eq. 3.19, becomes:

(4.8)    Σ_j p_j = I; that is, Σ_j p_j * f = I f = f, for all f in 𝒮.

By Eq. 3.14, the matrix C = P R has all its entries equal:

(4.9)    Σ_j p_j * r_jk = c = c(y, y′), k = 1,...,N.

Since it is not necessary to compute the function c, set

c = Σ_j p_j * r_j1,

so that Eq. 4.9 reduces to:

(4.10)    Σ_{j=1}^{N} p_j * (r_jk - r_j1) = 0, k = 2,...,N.

Equation 4.8 can be rewritten: p_1 = I - Σ_{j=2}^{N} p_j, so that Eq. 4.10 becomes:

Σ_{j=2}^{N} p_j * (r_jk - r_j1) = - p_1 * (r_1k - r_11)
                               = - (r_1k - r_11) + Σ_{j=2}^{N} p_j * (r_1k - r_11)

(4.11)    Σ_{j=2}^{N} p_j * (r_11 - r_1k - r_j1 + r_jk) = (r_11 - r_1k), k = 2,...,N.
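(A discretized sketch of Eq. 4.11 for N = 2 trials, in Python with NumPy. All covariances are hypothetical; the identity operator becomes the matrix I/dy under the composition convention of the earlier sketch, and the integral equation becomes a finite linear system.)

    import numpy as np

    m = 60
    y, dy = np.linspace(0.0, 1.0, m, retstep=True)
    # Within-trial covariance: smooth plus a white component for conditioning.
    K = np.exp(-((y[:, None] - y[None, :]) ** 2) / 0.05) + 0.5 * np.eye(m)
    r11, r22 = K, 0.25 * K                  # trial 2 is less noisy
    r12 = r21 = np.zeros((m, m))            # trials uncorrelated

    # Eq. 4.11 with k = 2:  p2 * (r11 - r12 - r21 + r22) = r11 - r12,
    # i.e., in discretized form  p2 @ M * dy = r11 - r12.
    M = r11 - r12 - r21 + r22
    p2 = (r11 - r12) @ np.linalg.inv(M) / dy
    p1 = np.eye(m) / dy - p2                # Eq. 4.8: p1 + p2 = I

    d1 = np.sin(2 * np.pi * y) + 1.0        # two hypothetical data functions
    d2 = np.sin(2 * np.pi * y) + 0.5
    s_hat = (p1 @ d1 + p2 @ d2) * dy        # Eq. 4.7
    # Here r22 = 0.25 r11, so the weights reduce to 0.2 and 0.8 exactly:
    print(np.allclose(s_hat, 0.2 * d1 + 0.8 * d2))   # True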

Equations 4.8 and 4.11 give the N equations necessary to determine the N entries of the processing vector. Since all rows of the matrix P are identical, the processing operator P is completely determined.

B. Wiener Multivariate Smoothing

The Dummy Signal

The Wiener filtering problem is derived under the assumption that the signal, as well as the noise, is a sample function from a random process. The data space is assumed to be the real line, all functions being functions of time, t. The signal is assumed to have zero mean, E[s(t)] = 0, for all t, and the signal covariance is known: R_s(t,t′) = E[s(t)s*(t′)]. The covariance between the signal and each noise term is known, R_{sn_j}(t,t′) = E[s(t)n_j*(t′)], as well as the noise covariances: R_{n_i n_j}(t,t′) = E[n_i(t)n_j*(t′)]. Finally, it is assumed that the N data functions, d_i, i = 1,...,N, are given and that each is the sum of a particular signal sample function and the corresponding noise sample: d_i(t) = s(t) + n_i(t). The problem is the same as the symmetric function problem: find a processing operator that produces, from the data, the best estimate of the signal in the sense of minimizing the mean square error.

The solution can be obtained as the solution to an N+1 dimensional symmetric function problem. Define the "noise" function n_0 to be the negative of the random signal: n_0(t) = -s(t). The remaining N noise terms, n_i(t), in the symmetric function problem are equal to the corresponding N noise terms of the Wiener problem. Define a "dummy" signal function s_d(t) and N+1 dummy data functions: f_i(t) = s_d(t) + n_i(t), i = 0,1,...,N. The noise covariance matrix will be

(N+1)-dimensional, R = (r_ij), i,j = 0,1,...,N:

(4.12)    r_00(t,t′) = R_s(t,t′)
          r_0j(t,t′) = -R_{sn_j}(t,t′)
          r_ij(t,t′) = R_{n_i n_j}(t,t′)

The negative sign in the definition of r_0j is due to the fact that the noise n_0 is the negative of the random signal. The solution to this N+1 dimensional symmetric function problem is the solution of the N equations given in Eq. 4.11, where now the summation index runs from j = 1 to N, and the special role of subscript 1 is replaced by subscript 0:

(4.13)    Σ_{j=1}^{N} p_j * (r_00 - r_0k - r_j0 + r_jk) = (r_00 - r_0k), k = 1,...,N.

Assume that the solution has been obtained, so that:

ŝ_d = Σ_{j=0}^{N} p_j * f_j.

Since n_0 = f_0 - s_d, the best estimate of n_0 is n̂_0 = f_0 - ŝ_d = n_0 + s_d - ŝ_d:

n̂_0 = n_0 + s_d - Σ_{j=0}^{N} p_j * (s_d + n_j).

By Eq. 4.8, Σ_j p_j * s_d = s_d, so that the dummy signal cancels out, leaving:

n̂_0 = n_0 - Σ_{j=0}^{N} p_j * n_j.

Again using Eq. 4.8, write p_0 = I - Σ_{j=1}^{N} p_j, so that:

n̂_0 = n_0 - p_0 * n_0 - Σ_{j=1}^{N} p_j * n_j
    = n_0 - n_0 + Σ_{j=1}^{N} p_j * (n_0 - n_j)

But n_0 = -s, so that n̂_0 = -ŝ and n_0 - n_j = -(s + n_j) = -d_j:

(4.14)    ŝ = Σ_{j=1}^{N} p_j * d_j.

Therefore the solution for the Wiener multivariate smoothing problem, given in Eq. 4.14, is obtained by solving the symmetric function problem, Eq. 4.13.

Define two new covariance functions, the data covariance R_{d_j d_k}(t,t′) = E[d_j(t) d_k*(t′)] and the signal-to-data covariance R_{sd_j}(t,t′) = E[s(t) d_j*(t′)]. Then

R_{d_j d_k} = E[(s+n_j)(s+n_k)*] = E[ss* + s n_k* + n_j s* + n_j n_k*] = R_s + R_{sn_k} + R_{n_j s} + R_{n_j n_k},

and

R_{sd_j} = E[s(s+n_j)*] = E[ss* + s n_j*] = R_s + R_{sn_j}.

Using the covariance functions defined in Eq. 4.12, Equations 4.13 reduce to:

(4.15)    Σ_{j=1}^{N} p_j * R_{d_j d_k} = R_{sd_k}, k = 1,...,N,

as the equations determining the Wiener smoothing filter.

The Augmented Function Space

The Wiener multivariate smoothing problem may also be expressed in terms of the function space of the symmetric function problem, without the introduction of a dummy deterministic signal. In order to develop the relation between the two problems, they must both be expressed in the same language.
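(A discrete-time sketch of Eq. 4.15, in Python with NumPy. With time sampled at m points, each covariance function becomes an m x m matrix and the N Wiener equations assemble into one block linear system; all covariances below are hypothetical, with the signal uncorrelated with the noises and the noises uncorrelated between channels.)

    import numpy as np

    m, N = 40, 2
    t = np.arange(m)
    Rs = 4.0 * np.exp(-np.abs(t[:, None] - t[None, :]) / 8.0)   # signal covariance
    Rn = [1.0 * np.eye(m), 0.5 * np.eye(m)]                     # white noise, 2 channels

    # Under these assumptions R_{d_j d_k} = Rs + Rn_j (j = k) or Rs (j != k),
    # and R_{s d_k} = Rs.  Solve  [p_1 p_2] @ Rdd = [Rsd_1 Rsd_2].
    Rdd = np.block([[Rs + (Rn[j] if j == k else 0) for k in range(N)]
                    for j in range(N)])
    Rsd = np.hstack([Rs, Rs])
    P = Rsd @ np.linalg.inv(Rdd)
    p = [P[:, j * m:(j + 1) * m] for j in range(N)]

    rng = np.random.default_rng(3)
    s = np.linalg.cholesky(Rs + 1e-9 * np.eye(m)) @ rng.standard_normal(m)
    d = [s + np.linalg.cholesky(Rn[j]) @ rng.standard_normal(m) for j in range(N)]
    s_hat = sum(pj @ dj for pj, dj in zip(p, d))
    # The filtered estimate should beat the raw first channel in mean square:
    print(float(np.mean((s_hat - s) ** 2)), float(np.mean((d[0] - s) ** 2)))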

In the symmetric function problem, the function space ℱ is an N-dimensional vector space.¹ The N data functions determine the data point d in this space. The assumption of full symmetry determines the symmetric function subspace 𝒮 -- the line passing through the origin with the equation:

(4.16)    x_i = x_j, i,j = 1,2,...,N.

The actual noise sample vector n has coefficients n_i = d_i - s. Since n_i - d_i = n_j - d_j = -s, this noise vector must lie on the noise line ℒ_n:

(4.17)    x_i - d_i = x_j - d_j, i,j = 1,2,...,N.

This noise line is the line that passes through the data point and is parallel to 𝒮 (Fig. 4.1). By Eq. 2.15, the problem of finding the point ŝ in 𝒮 closest to d -- that is, minimizing ‖s-d‖ -- is equivalent to finding the point in ℒ_n closest to the origin -- that is, minimizing ‖n‖ = ‖d-s‖. This point is found by projecting the origin orthogonally onto the line ℒ_n (Fig. 4.1), according to the inner product notion in ℱ.

The Wiener problem involves the random signal process as well as the N random noise processes. The space ℱ is thus augmented by the signal process, obtaining the N+1 dimensional vector space ℱ_a. The inner product and distance notions in ℱ_a are given by the augmented (N+1)x(N+1) matrix R_a, defined in Eq. 4.12. A point in ℱ_a

¹ Over the ring of functions, f(y).

[Figure 4.1. The Symmetric Function Subspace and the Noise Line.]

is determined only when the signal value s, as well as the N noise values n_i, are known -- N+1 coordinates in all. Since the observed data provide only N values, d_i = s + n_i, the data are not sufficient to fix a data point in ℱ_a. They do determine a line, though, the data line ℒ_d:

(4.18)    x_0 + x_i = d_i, i = 1,2,...,N,

where x_0 is the signal axis. The signal point s_a in ℱ_a, given by the actual signal value and the N noise samples, must lie on this data line. The Wiener problem is to find the point ŝ_a in ℱ_a that lies closest to the actual signal point s_a, using the notion of distance in ℱ_a. This problem is equivalent, by an argument similar to that in Chapter II, to finding the point in ℒ_d which lies closest to the origin. Thus the signal estimate ŝ_a is found by projecting the origin orthogonally onto the data line.

The above discussion may be summarized as follows: The symmetric function smoothing problem is equivalent to projecting the origin orthogonally onto the noise line ℒ_n in ℱ, the notion of orthogonality being derived from the noise covariance R. The Wiener smoothing problem is equivalent to projecting the origin orthogonally onto the data line ℒ_d in ℱ_a, the notion of orthogonality being derived from the augmented covariance R_a. These two problems are related by the fact that: The noise line ℒ_n is the geometric projection of the data line ℒ_d in ℱ_a onto the subspace ℱ.

The proof of this statement is simple. A point on ℒ_d has coordinates, by Equation 4.18: (x_0, d_1 - x_0, ..., d_N - x_0). The geometric projection of

this point to the ℱ space is found merely by dropping the x_0 coordinate: (d_1 - x_0, d_2 - x_0, ..., d_N - x_0), where x_0 is a constant. The coordinates of this point obviously satisfy Eq. 4.17, the equation of the noise line ℒ_n.

CHAPTER V

TOPOLOGICAL GROUP DATA SPACES

A. Data Processing on Groups

Topological Groups

If the data space forms a topological group, the powerful techniques of Fourier analysis can be used to describe the data processing operations. In this chapter, the data space will be assumed to be a locally compact, Abelian group. For simplicity, the data functions are assumed to be real.

A topological group is a topological space whose elements form a group. Thus, two points x_1 and x_2 in the space can be added: x_3 = x_1 + x_2. This addition must be continuous -- loosely speaking, if one of the addends is changed slightly, the sum is changed slightly. There must be a zero element, 0, so that x + 0 = 0 + x = x, for all x in X. Finally, there must be negatives, so that x + (-x) = (-x) + x = 0. The group is Abelian, or commutative, if x + x′ = x′ + x, for all x, x′ in X. The condition of local compactness is a type of "finiteness" condition -- finite-dimensional spaces are locally compact.

The data space usually considered is the real line, the data being a function of time. In practice, the data cannot extend over an infinite duration, so the data space is really a finite line segment, which is not a group. In order to maintain group structure, the ends of the segment are joined to make a circle. The data are thus, in effect, defined over the whole real line as a periodic function, one

period being the interval of observation. The data space considered in optical data processing is usually the two-dimensional plane, again a topological group. Here, too, the data do not actually extend over an infinite region. The actual data function is extended over the whole plane by defining it to be zero outside the region of observation.

When the data space is a locally compact Abelian group, the function space ℱ can be transformed -- using the Fourier transform -- to the new space ℱ_F. The function f(x) is transformed into the function F(w), the Fourier transform of f(x). The transformed variable w forms a topological group W, the dual group of X.¹

The unit cell space Y is the quotient space X/G (cf. Definition 2.8). If the symmetry group G is a normal subgroup of X, the unit cell space is also a topological group and the functions on Y -- the symmetric functions -- will have Fourier transforms. The transformed variable will form the topological group Q, the dual group of Y. The transformed symmetric function space can be embedded in the transformed function space ℱ_F, since the group Q is a subgroup, as well as a subspace, of W. Specifically, Q is the "annihilator" of G in W. The symmetric functions in ℱ_F are precisely those functions² F(w) such that F(w) = 0 for w ∉ Q. Thus the coordinate system of ℱ_F is aligned with the symmetric function subspace.

The data processing operator can be represented as the integral:

¹ Pontrjagin, 1939; Theorem 34.
² Rudin, 1962; Theorems 2.7.2, 2.7.4.

(5.1)    ŝ(x) = ∫_X p(x,x′) d(x′) dx′.

In the notation of the previous chapter (Eq. 4.4), the estimated data function is the composition of the processing operator with the data: ŝ = p * d. Since the data are assumed to be real, Parseval's theorem provides that:

(5.2)    S(w) = ∫_W P(w,w′) D(w′) dw′,

where P(w,w′) is the "two-dimensional" transform of p(x,x′), the result of transforming each variable separately. The processing operation in ℱ_F is again a composition.

Under two special conditions -- first, if the data space X is compact, and second, if the processing operator is translation invariant -- Eq. 5.2 represents a significant simplification over Eq. 5.1. If the data space is a compact topological group, its dual group is discrete.³ The integral in Eq. 5.2 becomes a summation:

(5.3)    S_j = Σ_k P_jk D_k,

where the P_jk and D_k are complex numbers. The transformed space ℱ_F now has a well-defined identity function:

(5.4)    I_jk = 1 if j = k, and 0 otherwise,

that satisfies f_j = Σ_k I_jk f_k. This condition is explored further in the final part of this chapter.

³ Pontrjagin, 1939; Theorem 31.
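(Illustrative sketch of Eq. 5.3, in Python with NumPy: on a compact group, here the circle sampled at m points, the dual group is discrete and the transformed processing operation is a plain matrix product on Fourier coefficients. The operator below is hypothetical; it keeps every third harmonic, the transform-domain form of projecting onto functions of period one-third of the interval.)

    import numpy as np

    m = 12
    x = 2 * np.pi * np.arange(m) / m
    d = np.cos(3 * x) + 0.3 * np.cos(x) + 0.1 * np.sin(2 * x)   # data on the circle

    D = np.fft.fft(d)                        # coefficients D_k on the dual group
    P = np.diag([1.0 if k % 3 == 0 else 0.0 for k in range(m)]) # P_jk, diagonal here
    S = P @ D                                # Eq. 5.3
    s_hat = np.fft.ifft(S).real
    print(np.allclose(s_hat, np.cos(3 * x)))                    # True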

An important subclass of processing operations has the property of being translation invariant. If P is a translation invariant operator and ŝ(x) = P[d(x)], then ŝ(x + x_0) = P[d(x + x_0)]. In signal theory, the corresponding condition is time-invariance -- the processing operation is independent of the time of occurrence of the signal. The optical data processing technique of Cutrona, et al. (1960) is translation invariant.

A translation invariant operator has the property that

(5.5)    p(x,x′) = p(x - x′),

and the composition integral of Eq. 5.1 becomes the convolution integral:⁴

(5.6)    ŝ(x) = ∫_X p(x - x′) d(x′) dx′.

In the transform plane, the processing operator is not the composition of Eq. 5.2, but has a multiplicative form:⁵

(5.7)    S(w) = P(w) D(w),

where P(w) is the transform of p(x). The identity function is simply

(5.8)    I(w) = 1, for all w in W,

and the inverse of F(w) is 1/F(w).

Data Space Structure and Symmetry

In order to utilize fully the advantages of the Fourier transform, both the data space and the unit cell space have to be topological

⁴ Rudin, 1962; Theorem 3.8.4.
⁵ Rudin, 1962; Theorem 3.8.3.

groups. The symmetry group must therefore be compatible with the group structure of the data space. In this section, it will be proved that the only symmetry compatible with any topological group is pure translational symmetry. In a pure translation group, all the automorphisms have the form: x^a = x + 0^a.

The points x and x′ are equivalent under the symmetry group (written x ~ x′) if there is an automorphism a in G such that x′ = x^a (cf. Def. 2.7). In order for the symmetry group to be compatible with the group structure of the data space, the automorphisms of G must preserve addition:

(5.9)    (x + x′)^a = x^a + (x′)^a,    (-x)^a = -(x^a).

It follows that the equivalence class of zero, the set of all points of the form 0^a, is a subgroup of X, the "kernel" of the symmetry group. The unit cell space Y is the set of equivalence classes of X. Algebraically, Y is a homomorphic image of X. Thus Y can be written Y = X/K, where K, the kernel of the homomorphism, is the kernel of the symmetry group.

In order for Y to be a group, equivalent points can differ only by an element in the kernel: x ~ x′ implies x - x′ ~ 0. If Y is to be a group, the addition of equivalence classes must be well-defined. Thus if x_1 ~ x_1′ and x_2 ~ x_2′, then x_1 + x_2 ~ x_1′ + x_2′. These two conditions, which must hold if Y is to be a topological group, are each equivalent to the condition that the symmetry group be purely translational.

Theorem 5.1: The following statements are equivalent:
(1) x_1 ~ x_1′ and x_2 ~ x_2′ implies x_1 + x_2 ~ x_1′ + x_2′.
(2) If x ~ x′, then x - x′ ~ 0.
(3) The symmetry group is a pure translation group.

Proof: (1) implies (2): Let x_1 = x, x_2 = x_2′ = -x′, x_1′ = x′. If x ~ x′, then x_1 ~ x_1′, so by (1), x_1 + x_2 = x - x′ ~ x_1′ + x_2′ = x′ - x′ = 0.

(2) implies (3): The function f(x) = x^a - x is a continuous function of x, since addition and subtraction, as well as the automorphisms of the symmetry group, are continuous. But x^a ~ x, so by (2), f(x) = x^a - x ~ 0. Since G is a discrete group, the kernel K is a discrete set. Thus f(x) is a continuous function with discrete values and is therefore a constant. Since f(0) = 0^a - 0, f(x) = 0^a for all x, or x^a = x + 0^a. The same argument applies for each a, so that the symmetry group is pure translational.

(3) implies (1): x_1 ~ x_1′ implies x_1′ = x_1^a for some a in G. Hence by (3), x_1′ = x_1 + 0^a. Similarly, x_2 ~ x_2′ implies x_2′ = x_2 + 0^b for some b in G. Then x_1′ + x_2′ = x_1 + x_2 + (0^a + 0^b) ~ x_1 + x_2.

B. Special Processing Solutions

The "trivial" Solution

In the case of a finite data space, the processing solution became a "trivial" averaging operator when the covariance function was either symmetric or stationary. These results are now extended to more general data spaces.

Assume the symmetry group is finite, of order μ. The trivial

processing technique is defined to be the operation:

(5.10)    ŝ(x) = (1/μ) Σ_i d(x^{a_i}),

where the summation is over the whole symmetry group. In operator terminology, the automorphism a: x → x^a becomes the operator A: f(x) → f(x^a).

Definition 5.1. The trivial processing operator is P_t = (1/μ) Σ_i A_i, summing over the symmetry group.

Note that P_t is symmetric and idempotent on 𝒮, and that P_t* = P_t. In particular, if R = I, then P_t is the optimal processing operator. If P is the optimal processing operator, then P P_t[f] = P_t[f], since P_t[f] is a symmetric function. That is, P P_t = P_t.

Lemma 5.1. If P A = P for all A in G, then P is the trivial processing operator P_t.

Proof: P P_t = P (1/μ) Σ_i A_i = (1/μ) Σ_i P A_i = (1/μ) Σ_i P = P, by the hypothesis. Thus P P_t = P. But P P_t = P_t, proving that P = P_t.

In terms of integral operators, the condition of the lemma is that p(x, x′^a) = p(x, x′), for all a in G.

The noise process is defined to be wide-sense symmetric if the covariance function is invariant under all automorphisms in G: R(x^a, x′^a) = R(x, x′). In operator terminology:

Definition 5.2. The noise process is wide-sense symmetric if A R A* = R, for all A in G.

As in the finite case, the optimal processing operator for symmetric noise is the trivial one.
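(Illustrative sketch of Definition 5.1, in Python with NumPy, on a hypothetical sampled circle whose symmetry group is the three cyclic shifts by one-third of the interval: P_t averages a function over its orbit.)

    import numpy as np

    m, mu = 12, 3

    def trivial(f):
        """P_t f = (1/mu) times the sum over the mu translates of f (Eq. 5.10)."""
        return sum(np.roll(f, i * m // mu) for i in range(mu)) / mu

    x = 2 * np.pi * np.arange(m) / m
    f = np.cos(3 * x) + 0.3 * np.cos(x)     # symmetric part plus non-symmetric part

    g = trivial(f)
    print(np.allclose(g, np.cos(3 * x)))    # True: only the symmetric part survives
    print(np.allclose(trivial(g), g))       # True: P_t is idempotent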

Theorem 5.2. If the noise process is wide-sense symmetric, the optimal processing operator is P_t.

Proof: P_t is symmetric and idempotent on 𝒮, as noted above. Also:

P_t R = (1/μ) Σ_i A_i R = (1/μ) Σ_i (A_i R A_i*) A_i = (1/μ) Σ_i R A_i = R P_t.

Since P_t* = P_t, P_t R = R P_t*, so P_t is also self-adjoint. Thus it is the solution to the processing equations, and is the optimal operator.

The above results are valid for general data spaces -- they do not require topological group structure. Now suppose that the data space and the unit cell space are both topological groups. If the processing operator is translation invariant, then p(x,x′) = p(x-x′).

Theorem 5.3. If both X and Y are topological groups and the processing operator is translation invariant, then the optimal processing operator is P_t.

Proof: If P is translation invariant, then p(x, x′^a) = p(x - x′^a). But if Y is a group, then by Theorem 5.1, x′^a = x′ + 0^a, so p(x, x′^a) = p(x - x′ - 0^a). But -0^a is in the kernel, so -0^a = 0^b for some b in G and x - 0^a = x + 0^b = x^b. Thus p(x, x′^a) = p(x^b - x′) = p(x^b, x′). Since P is symmetric, p(x^b, x′) = p(x, x′). Therefore p(x, x′^a) = p(x, x′). By Lemma 5.1, P = P_t.

The condition of wide-sense stationarity for the noise, given in Chapter III, is simply the condition that the covariance function be translation invariant.

Definition 5.3. The noise process is wide-sense stationary if the covariance function is translation invariant.

Theorem 5.4. If both X and Y are topological groups

and the noise process is wide-sense stationary, then the optimal processing operator is P_t.

Proof: By hypothesis, R(x,x′) = R(x-x′). Then R(x^a, x′^a) = R(x^a - x′^a). But if both X and Y are topological groups, x^a = x + 0^a and x′^a = x′ + 0^a, so x^a - x′^a = x - x′. Thus the covariance function is symmetric and, by Theorem 5.2, the processing operator is P_t.

Replicated Data

Suppose now that the data space is obtained through a replicated experiment, as in Chapter IV, and that the building block, the data space of each trial, is a topological group. An example is the case when each trial produces a function of time as data. The entries in the matrices and vectors of Chapter IV are all functions defined over Y, so that they have Fourier transforms. If the noise process is wide-sense stationary within each trial, the processing operator within a trial is translation invariant: p_j(y,y′) = p_j(y-y′) (see Eq. 4.7). The composition operation involved in matrix multiplication of the transforms (Eq. 4.5) can be replaced by simple multiplication of the transforms (Eq. 5.7), so that the matrix entries form a field. The matrices can thus be inverted and the processing solution can be obtained in closed form.

Processing on Compact Groups

Assume that the data space X is a compact topological group, that the unit cell space is a group, and that the symmetry group is of finite order, μ. The dual space W of X is discrete. The space W can be thought of as a transformed data space, the transformed functions

being functions over W. The transformed symmetric functions are defined over the dual group Q of Y, which is a subgroup of index μ in W. The space Q = Q_0 and its cosets Q_i, i = 1,...,μ-1, form a type of "building block" structure for W: W = ∪_i Q_i. Since W is discrete, the Q_i are disjoint. Furthermore, each Q_i has the same algebraic and geometric structure as Q_0. Following a procedure analogous to that in the previous chapter, write the function F(w) as a column vector:

(5.11)    F(w) = (F_j(w)) = \begin{pmatrix} F_0(w) \\ F_1(w) \\ \vdots \\ F_{μ-1}(w) \end{pmatrix}

where each F_j is a function over Q_j. The symmetric functions are those for which F_j(w) = 0, j ≠ 0.

The linear transformations and operators are defined as matrices whose entries are functions of two variables, just as in Chapter IV: H = (H_jk(w,w′)). Matrix multiplication involves the composition of H_ij with F_j, but because Q is discrete, the composition integral becomes the summation:

(5.12)    H_ij * F_j = Σ_{w′} H_ij(w,w′) F_j(w′) = F′_i(w).

The identity operator is well-defined:

(5.13)    I_jk(w,w′) = 1 if w = w′, and 0 otherwise.

If F′ = P F, then in order for F′ to be symmetric, F′_i must be zero for i ≠ 0: P_ij = 0 for i ≠ 0. Only the top row of the P matrix is non-zero. Idempotency on 𝒮_F requires that

Σ_j P_0j * S_j = P_00 * S_0 = S_0, so that P_00 = I_00. Finally, self-adjointness requires that P R = C be symmetric in both variables (cf. Eq. 3.14), so that

Σ_i P_0i * R_ij = C_0j = 0, for j ≠ 0.

The optimal processing operator can then be written:

(5.14)    Ŝ_0 = D_0 + Σ_{j=1}^{μ-1} P_0j * D_j,

where the P_0j must satisfy the equations:

(5.15)    Σ_{i=1}^{μ-1} P_0i * R_ij = -R_0j, j = 1,...,μ-1.

If the matrix entries form a field, so that functions have inverses, the solution may be obtained very simply: P = C R^{-1}. Since C_ij = 0 unless i = j = 0, P_0j = C_00 * (R^{-1})_0j. But P_00 = C_00 * (R^{-1})_00 = I_00, so that C_00 = ((R^{-1})_00)^{-1} and:

(5.16)    P_0j = ((R^{-1})_00)^{-1} * (R^{-1})_0j.

Processing Example

Let the signal be a periodic function of the real line, with period x_0. The data is observed for, say, three cycles -- over an interval of length X_0 = 3x_0. Any function defined on an interval of length X_0 can be written as the Fourier series

(5.17)    f(x) = Σ_w F(w) exp(iwW_0 x), where W_0 = 2π/X_0.

The function F(w), defined for integral w, is the transform of f(x). F(w) can be written as the vector (F_j(w)), j = 0,1,2, where

(5.18)    F_j(w) = F(3w + j).

The symmetric functions have period x_0, so they have the series form:

(5.19)    s(x) = Σ_ω S(ω) exp(iωw_0 x), where w_0 = 2π/x_0.

Since w_0 = 3W_0, the variable ω can be written as 3w:

(5.20)    s(x) = Σ_w S(3w) exp(i3wW_0 x).

Writing S(w) as a vector gives S_0(w) = S(3w), and S_1(w) = S_2(w) = 0, as expected.

Let the noise be white, for simplicity, but let it be time-varying, so that its amplitude will fade in and out:

(5.21)    R(x,x′) = (1 + cos W_0 x) δ(x - x′).

The covariance function R has the Fourier series representation:

(5.22)    R(x,x′) = Σ_{w,w′} R(w,w′) exp(iwW_0 x) exp(-iw′W_0 x′),

where R(w,w′) = 1 if w′ - w = 0; 1/2 if |w′ - w| = 1; and 0 if |w′ - w| > 1.

Writing R in matrix form, R = (R_jk), where R_jk(w,w′) = R(3w + j, 3w′ + k), gives:

(5.23)    R_00 = R_11 = R_22 = I;
          R_01 = R_10 = R_12 = R_21 = (1/2) I;
          R_02(w,w′) = 1/2 if w = 1 + w′, and 0 otherwise;
          R_20(w,w′) = 1/2 if w′ = 1 + w, and 0 otherwise.

To solve the processing equations, it may be noted that the entries in the covariance matrix are all translation invariant: R_jk(w,w′) = R_jk(w - w′). Thus they have Fourier transforms which combine multiplicatively, and inverses can be found. The matrix R can be inverted and the solution found from Eq. 5.16. Writing L for the R_02 function interpreted as an operator, and L′ for the R_20 function, so that R_02 = L/2 with L the unit shift, L F(w) = F(w-1), and R_20 = L′/2 with L′ F(w) = F(w+1):

R = \begin{pmatrix} I & I/2 & L/2 \\ I/2 & I & I/2 \\ L′/2 & I/2 & I \end{pmatrix},

R^{-1} = [2I + (1/2)(L + L′) - LL′]^{-1} \begin{pmatrix} 3I & L-2I & I-2L \\ L′-2I & 4I-LL′ & L-2I \\ I-2L′ & L′-2I & 3I \end{pmatrix},

so that P_01 = (1/3)(L - 2I) and P_02 = (1/3)(I - 2L), giving the processing formula:

(5.24)    Ŝ_0 = D_0 + (1/3)(L - 2I) * D_1 + (1/3)(I - 2L) * D_2.

The same result is obtained by solving the two simultaneous equations given in Eq. 5.15, without inverting R. Writing out the compositions indicated in Eq. 5.24, using the rule of Eq. 5.12 with the values of R_02 in Eq. 5.23, gives:

Ŝ_0(w) = D_0(w) - (2/3) D_1(w) + (1/3) D_1(w-1) + (1/3) D_2(w) - (2/3) D_2(w-1).

Rewriting according to Eq. 5.18 gives:

(5.25)    Ŝ(3w) = D(3w) - (2/3)[D(3w+1) + D(3w-1)] + (1/3)[D(3w+2) + D(3w-2)].

Using standard Fourier transform techniques, it is easily verified that:

(5.26)    ŝ(x) = w(x) d(x) + w(x + x_0) d(x + x_0) + w(x + 2x_0) d(x + 2x_0),

where w(x) is a weighting function given by

(5.27)    w(x) = (2/9)(cos 2W_0 x - 2 cos W_0 x + 3/2).

The value of the estimate at any point x is the weighted average of the data at the three equivalent points x, x + x_0, and x + 2x_0. Since the noise is uncorrelated between points, "irrelevant" data values do not enter into the estimate. The sum of the weights adds to one, as is required of an idempotent estimate.

The trivial processing technique is given by Eq. 5.26, using the weighting function w_t(x) = 1/3. Both the trivial weighting and the optimal weighting functions are shown in Fig. 5.1. The optimal weighting function weights the data heavily when the noise amplitude (given by Eq. 5.21) is small. Note that the noise is zero for x = 3x_0/2, so the data at that point is taken as the signal estimate. Thus the equivalent points x = x_0/2 and x = 5x_0/2 are assigned weight zero. The smallest weight is not necessarily assigned to the points with the greatest noise: x = 0 and x = 3x_0.
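(A numerical check of this example, in Python with NumPy: the weight of Eq. 5.27 is the inverse-variance weight over the three equivalent points, the weights sum to one, and the weight vanishes where an equivalent point is noise-free.)

    import numpy as np

    x0 = 1.0
    X0 = 3 * x0
    W0 = 2 * np.pi / X0

    def w(x):
        return (2 / 9) * (np.cos(2 * W0 * x) - 2 * np.cos(W0 * x) + 1.5)  # Eq. 5.27

    x = np.linspace(0.0, X0, 7, endpoint=False)
    total = w(x) + w(x + x0) + w(x + 2 * x0)
    print(np.allclose(total, 1.0))        # True: weights sum to 1 (idempotency)
    print(np.isclose(w(x0 / 2), 0.0))     # True: zero weight, since the equivalent
                                          # point 3*x0/2 carries no noise at all

    # Inverse-variance weighting over each equivalence class (noise variance
    # 1 + cos W0 x, Eq. 5.21) reproduces the same weighting function:
    var = lambda x: 1.0 + np.cos(W0 * x)
    w_iv = var(x) ** -1 / sum(var(x + j * x0) ** -1 for j in range(3))
    print(np.allclose(w_iv, w(x)))        # True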

[Figure 5.1. The Weighting Functions in the Data Processing Example: the optimal weight w(x) and the trivial weight w_t(x) = 1/3, plotted over 0 ≤ x ≤ 3x_0.]

BIBLIOGRAPHY

Agrawal, Hari Om, J. W. Kent, and D. M. MacKay. 1965. Rotation Technique in Electron Microscopy of Viruses. Science 148:638-640.

Almeida, June D. 1963. A Classification of Virus Particles Based on Morphology. Can. Med. Ass. J. 89:787-798.

Aronszajn, N. 1950. Theory of Reproducing Kernels. Amer. Math. Soc., Trans. 68:337-404.

Bendat, Julius S. 1957. Optimum Filters for Independent Measurements of Two Related Perturbed Messages. Inst. Radio Eng., Trans. Circuit Theory CT-4:14-19.

Berger, J. E., C. Richard Zobel, and P. E. Engler. 1966. Laser as Light Source for Optical Diffractometers; Fourier Analysis of Electron Micrographs. Science 153:168-169.

Buerger, M. J. 1956. Elementary Crystallography. Wiley, N. Y.

Capon, Jack. 1965. Hilbert Space Methods for Detection Theory and Pattern Recognition. Inst. Electrical Electronics Eng., Trans. Information Theory 11:247-259.

Caspar, D. L. D. and A. Klug. 1962. Physical Principles in the Construction of Regular Viruses. Cold Spring Harbor Symp. Quant. Biol. 27:1-24.

Cochran, W., F. H. C. Crick, and V. Vand. 1952. The Structure of Synthetic Polypeptides. I. The Transform of Atoms on a Helix. Acta Cryst. 5:581-586.

Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton University Press, Princeton.

Cramér, Harald. 1951. A Contribution to the Theory of Stochastic Processes, p. 329 to 339. In Jerzy Neyman ed. Proc. Second Berkeley Symp. Math. Stat. and Prob.

Crick, F. H. C. and J. D. Watson. 1956. Structure of Small Viruses. Nature 177:473-475.

Crick, F. H. C. and J. D. Watson. 1957. Virus Structure: General Principles, p. 5 to 13. In G. E. W. Wolstenholme and E. C. P. Millar eds. The Nature of Viruses. Churchill, London.

Cutrona, L. J., E. N. Leith, C. J. Palermo, and L. J. Porcello. 1960. Optical Data Processing and Filtering Systems. Inst. Radio Eng., Trans. Information Theory IT-6:386-400.

Davenport, Wilbur B., Jr. and William L. Root. 1958. An Introduction to the Theory of Random Signals and Noise. McGraw-Hill, N. Y.

Doob, J. L. 1953. Stochastic Processes. Wiley, N. Y.

Friedrichs, K. O. 1960. Spectral Theory of Operators in Hilbert Space. New York University, Institute of Mathematical Sciences. (copyright, K. O. Friedrichs, 1961).

Grenander, Ulf, and Murray Rosenblatt. 1957. Statistical Analysis of Stationary Time Series. Wiley, N. Y.

Hall, Marshall, Jr. 1959. Theory of Groups. Macmillan, N. Y.

Halmos, Paul R. 1957. Introduction to Hilbert Space and the Theory of Spectral Multiplicity. 2d ed. Chelsea Publishing Company, N. Y.

Heerden, P. J. van. 1963. Theory of Optical Information Storage in Solids. Appl. Optics 2:393-400.

Heidenreich, Robert D. 1964. Fundamentals of Transmission Electron Microscopy. Interscience, Wiley, N. Y.

Horne, R. W. 1963. The Structure of Viruses. Sci. Amer. 208:48-56.

Horne, R. W. 1964. Electron Microscopy of Viruses. Sci. Progress 52:525-542.

Horne, R. W. and P. Wildy. 1963. Virus Structure Revealed by Negative Staining. Advances Virus Res. 10:101-170.

Hosaka, Yasuhiro. 1965. A Criterion for Evaluating the Number of Capsomeres of Icosahedral Capsids. Biochim. Biophys. Acta 104:261-273.

Howatson, A. F. 1965. Structure of Viruses of the Papilloma-Polyoma Type. J. Mol. Biol. 13:959-960.

Karhunen, Kari. 1947. Über Lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann. Acad. Sci. Fennicae (Suomalaisen Tiedeakatemian Toimituksia) I. Math.-Phys. 37:79 pp.

Kawata, Tatsuo. 1965. Fourier Analysis of Nonstationary Stochastic Processes. Amer. Math. Soc., Trans. 118:276-302.

Klein, E. and G. Langner. 1963. Relations Between Granularity, Graininess and the Wiener-Spectrum of the Density Deviations. J. Phot. Sci. 11:177-185.

Klug, A. and J. E. Berger. 1964. An Optical Method for the Analysis of Periodicities in Electron Micrographs, and Some Observations on the Mechanism of Negative Staining. J. Mol. Biol. 10:565-569.

Klug, A. and J. T. Finch. 1965. Structure of Viruses of the Papilloma-Polyoma Type. J. Mol. Biol. 13:961-962.

Kozma, Adam and David Lee Kelly. 1965. Spatial Filtering for Detection of Signals Submerged in Noise. Appl. Optics 4:387-392.

Loève, Michel. 1960. Probability Theory. Van Nostrand, Princeton.

Marcus, Marvin. 1960. Basic Theorems in Matrix Theory. National Bureau of Standards Applied Mathematics Series 57. U. S. Government Printing Office, Washington, D. C.

Markham, Roy, Simon Frey, and G. J. Hills. 1963. Methods for the Enhancement of Image Detail and Accentuation of Structure in Electron Microscopy. Virology 20:88-102.

Middleton, David. 1960. An Introduction to Statistical Communication Theory. McGraw-Hill, N. Y.

Murray, F. J. 1941. An Introduction to Linear Transformations in Hilbert Space. Princeton University Press, Princeton.

Norman, Richard S. 1966. Rotation Technique in Radially Symmetric Electron Micrographs: Mathematical Analysis. Science 152:1238-1239.

Parzen, Emanuel. 1961. An Approach to Time Series Analysis. Ann. Math. Stat. 32:951-989.

Parzen, Emanuel. 1962. Extraction and Detection Problems and Reproducing Kernel Hilbert Spaces. J. Soc. Ind. Appl. Math., Control Series A 1:35-62.

Pontrjagin, L. 1939. Topological Groups. (trans. Emma Lehmer) Princeton University Press, Princeton.

Rosenblatt, Murray. 1962. Random Processes. Oxford University Press, N. Y.

Rudin, Walter. 1962. Fourier Analysis on Groups. Interscience, Wiley, N. Y.

Stewart, R. M. and R. J. Parks. 1957. Degenerate Solutions and an Algebraic Approach to the Multiple-Input Linear Filter Design Problem. Inst. Radio Eng., Trans. Circuit Theory CT-4:10-14.

Thiry, H. 1963. Power Spectrum of Granularity as Determined by Diffraction. J. Phot. Sci. 11:69-77.

Upatnieks, Juris and Emmet N. Leith. 1964. Lensless, Three-dimensional Photography by Wavefront Reconstruction. J. Opt. Soc. Amer. 54:579-580.

Valentine, R. C. and N. G. Wrigley. 1964. Graininess in the Photographic Recording of Electron Microscope Images. Nature 203:713-715.

Waerden, B. L. van der. 1955. Algebra. 3rd ed. of Moderne Algebra. Springer-Verlag, Berlin. 2 vols.

Weyl, Hermann. 1952. Symmetry. Princeton University Press, Princeton.

Wiener, N. 1949. The Interpolation, Extrapolation, and Smoothing of Stationary Time Signals. Wiley, N. Y.

Wiener, N. and P. Masani. 1957. Multivariate Stochastic Processes. I. The Regularity Condition. Acta Math. 98:111-150.

Wiener, N. and P. Masani. 1958. Multivariate Stochastic Processes. II. The Linear Predictor. Acta Math. 99:93-137.

Williams, M. G., A. F. Howatson, and J. D. Almeida. 1961. Morphological Characterization of the Virus of the Human Common Wart (verruca vulgaris). Nature 139:895-897.

Zadeh, Lotfi A., and John R. Ragazzini. 1950. An Extension of Wiener's Theory of Prediction. J. Appl. Phys. 21:645-655.