INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN ESTIMATES OF e-CAPACITY FOR CERTAIN LINEAR COMMUNICATION CHANNELS by W. L. Root Professor of Aerospace Engineering

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN ESTIMATES OF c-CAPACITY FOR CERTAIN LINEAR COMMUNICATION CHANNELS W. L. Root This paper is a preliminary attempt to study the questions, "How much information can be communicated in a given situation?" and, if meaningful, "What is the largest possible rate of transmission?" when it is not appropriate to characterize the distortions and perturbations of the signal probabilistically, at least in more than a rudimentary way, so that the Shannon theory cannot be applied. The use of the concept of c-capacity is not new in the study of such questions (see [ 1], Appendix, and [ 2]), but it appears not to be widespread, and most of the emphasis has been on purely mathematical questions, rather than engineering ones. In this paper, estimates are obtained for c-capacity when e is not small; they are not asymptotic estimates. It was desired to represent situations where there is appreciable additive disturbance or a significant lack of knowledge about the linear channel. The upper and lower estimates are not nearly as tight as could be desired, but this seems to reflect not only the naivete of the author, but general difficulties in the mathematical theory of packings and coverings. There is certainly room, however, for the use of ingenuity to improve the results; it would appear (see the Appendix) that a straightforward application of the classical results on packings and coverings is not very fruitful. For time-invariant, or nearly time-invariant channels the notion of rate of transmission makes sense. Estimates have been computed using the well-known Kac, Murdock, Szego theorem (see [3]) on asymptotic behavior of the eigenvalue of a translation kernel. These calculations bear considerable resemblance to Gallager's (see [7]) on the (Shannon) capacity of This work was supported in part under NASA Grant 02905.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -2 -Gaussian channels (they were not, however, motivated by Gallager's work), and it is of interest to compare results, because the problems are not the same. I SIGNAL SETS AND TRANSFORMATIONS We use an abstract but very simple model which is suitable for a large class of communications problems: there is a transmitted-signal set, which is a specified subset of a suitable function space, a receivedsignal set, which is the same sort of thing but which must contain the images of the transmitted signals under whatever transformation they undergo, and a channel (or perhaps a class of channels), which is a mapping from the transmitted-signal set into the received-signal set. In this paper the channel will always be either a linear transformation or an affine transformation: if x belongs to the transmitted signal set, the element of the received signal set that corresponds to it is z = Hx + n (1) where H is a linear transformation and n is an element of the received signal space. Conceptually, n is added "noise", and H is the operation performed by the transmission medium on the emitted waveform; for example, in HF radio communication H could represent the transformation of the radiated signal effected by scattering in the ionosphere, and n could represent thermal noise, antenna noise, shot effect, etc., plus, perhaps, distortion in the receiving equipment.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -3 -Signal Sets More or less arbitrarily we take all the signal sets to be considered to be subsets of the space of all square-integrable, real-valued functions defined on a specified finite interval. We denote by L2(a, b) the space of such functions with the usual inner product, i. e., x e L2(a, b) if x(t) is real and b 2 x (t) dt < oc a Then, b (x, y) = K x(t)y(t) dt a 2 and |IxIl = (x, x). Two functions x(t), x(t) are equivalent if |Ix - x|| 0; then, as elements of the real Hilbert space L2(a, b), they cannot be distinguished. There are other function spaces which could serve and which might in certain circumstances be more appropriate. However there are some obvious reasons for the use of L2: functions of integrable square on a finite interval correspond to physical quantities of finite energy; the L 2-norm corresponds to a root-mean-square error criterion, which is often appropriate - the distance, lix - y||, between two functions is the square root of the energy of the difference; and, not least, separable Hilbert space is more tractable for computation than most function spaces. Because of the identification of the L2 norm with the square root of energy, it seems appropriate to confine the transmitted signal set to lie

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -4 -within a ball in L2(a, b) centered at zero, where by a ball of radius r centered at x is meant the (closed) set of points x satisfying x - x < r. (We say, specifically, open ball to designate the set of points x for which x - x < r, and sphere to denote the set for which x - x || = r. The same terminology is in force when we talk about finite-dimensional Euclidean spaces. ) Suppose for the moment a = 0, b = 1. One can take as a complete orthonormal basis for L2(0, 1) the set of functions { 1, B/~2 sin 27rnt, / 2 cos 27rnt}, n = 1, 2,... Then the unit ball (centered at zero) consists of all x(t) = a + 2 /2an cos 27rnt + S /B2 b sin 27rnt for which 2 2 20 n n a + Z (a + b ) < 1. This means that there are functions x(t) in the 0 n n unit ball with an arbitrarily large fraction of their "energy" concentrated in arbitrarily high frequency components. Consequently, the entire unit ball in L2(a, b) appears at first to be unsatisfactory as a model for a signal set, because, at least in any one given physical situation, signals cannot be generated with arbitrarily high frequency components containing non-negligible energy. However there is an artifice for getting around this difficulty; we return to this point presently. In an asymptotic sense, any set of functions { o (t) } which comprise a basis for L2(a, b) can be thought of as giving a decomposition of a function in L2(a, b) in frequency. This is a loose statement, but it has at least the following simple but precise interpretation. Let { O (t) } and { Qk(t) } be two complete sets of orthonormal functions in L2(a, b). Let x(t) be any function satisfying 11 x1 i< 1 N

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -5 -for some arbitrary e > 0 and some fixed arbitrary positive integer N. Then, as is easily shown, there exists a positive integer M, depending on N, {n } and {n } but not on x, such that M x - 2 (x, 'k)/|k < 2E If now one takes the 0 's to be the sines and cosines of the ordinary Fourier n series, one can say that any function which has all but a negligible part of its energy concentrated in the frequencies less than or equal to that of O (t), has all but a negligible part of its energy concentrated in the first M k-components. Thus, the condition that the high-frequency components of a signal must necessarily tail-off, can just as well be stated in terms of the k- components, where {Sk} is an arbitrary orthonormal set. If the previous arguments are accepted, there is then the question of exactly how to restrict the "high-frequency" behavior of the signals which are to be considered admissible for a particular problem, and of how to state this restriction in terms of an arbitrary orthonormal basis. This is still a question of mathematical model-making, and not a question of solving a well-posed mathematical problem. It appears that an appropriate and convenient restriction is to require that a class of admissible signals should be a subset of a compact subset of L2(a, b). To see why, let us first note the characterization of compact subsets of a separable Hilbert space H in terms of an arbitrary orthonormal basis. Let K be a subset of H, and let { n} be an arbitrary complete orthonormal set (c. o. n. set) in H. Then each element x e K has an expansion

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -6 -00 X= x X _Z n n where x = (x, 4 ). The set K is compact if and only if it is closed and bounded and 00 Urn Z [x 2 = 0 (2) n-N uniformly for all x c K. Thus a subset of a compact set must have the uniform tailing-off at higher frequencies which we require. Furthermore, we shall argue presently that the linear operators of interest in our formulation of communications problems are compact operators; as such they have the property that they carry any bounded set into a subset of a compact set. So if the signals available come by means of any such operation acting on a finite energy source, they lie in a compact set. Finally, compact sets are appropriate from a computational point of view, because, as is evident from Eq. (2), they are "nearly finite-dimensional", so that finite computational algorithms have a chance of yielding good results. Two particular examples of compact sets should be noted. First, let {a } be an arbitrary sequence of non-negative numbers such that S a < co. Then, the set of all x such that x < a is a compact set, a nn n called a compact parallelopiped. Second, the set of all x such that 00 X lZ2. l1 a n is compact; it is called a compact ellipsoid with semi-axes al, a,... -....

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -7 -In modeling communications problems it is convenient to take as a normalized transmitted signal space the unit ball in L2(a, b), but in the face of the above objections this would appear to be a poor choice. However, if one accepts a compact ellipsoid as being a suitable transmitted signal space, one can conceive of it as the image space resulting from the operation of a hypothetical linear operator on the unit ball (this statement is justified later). Then the compact ellipsoid may be replaced by the ball if the actual channel operator is replaced by a new operator which is the composition of the actual one and the hypothetical one, as shown in Fig. 1. Signal Space I Channel Image in received (compact ellipsoidc) (linear operator) signal space Unit ball Hypothetical oper-j ator (pre-filter) | Unit ball taken as signal space FIGURE 1

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -8 -In other words, one can think of the signals as being pre-filtered before they are transmitted, and hence justify a system model which uses the unit ball as transmitted-signal space. Linear Channel Operations The linear operator H indicated in Eq. (1) we assume to be representable as an integral operator with a real kernel h(t, s); so that if x e L2(a, b) and Hx = y, we have b y(t) = \ h(t, s)x(s) dx (3) a We assume further: (a) H is realizable, i. e., h(t, s) = 0 for t < s. (b) H has bounded memory T, i. e., h(t, s) = 0 for t > s + T b+T b 2 (c) + h (t, s) ds dt < oo. a a Conditions (a) and (b) imply that y(t) as defined by Eq. (3) vanishes for all t outside the interval [a, b + T]. We put c = b + T; then condition (c) implies that Eq. (3) defines a compact (actually Hilbert-Schmidt) linear transformation H from L2(a, b) to L2(a, c). The adjoint transformation, H*, carries elements from L2(a, c) into L2(a, b), and is defined by ac x'(s) - h(t, s)y'(t) dt, a < s < b (4) a where y' e L2(a, c).

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -9 -The transformation H H maps L2(a, b) into itself, is compact and self-adjoint. By the spectral theorem for compact self-adjoint operators, there is then known to exist a (non-empty) orthonormal sequence {f } of elements from L2(a, b) such that " -X k n= 1, 2, * (5) H' n nXn' ' 'n where X > 0, where the X have finite multiplicity, and ('if there are infinitely many of the X ) X -0. We can thereby order the X so that n n+1l Put 1/2 Hor (6) 8/n 1/2 H~n' n so that Ho- X 1 l/2 (7) n n n for all n. Then _= 1~ H HHo n 1/2 n n or - = 1/2~ (8) Also, HH' '= X / Hn = X & (9) n n n n n

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -10 -Since H is defined on L (a, b), it is defined on the transmitted-signal set. The image of the transmitted signal set under H is called the receivedsignal set for H; i. e., if x belongs to the transmitted-signal set, y = Hx belongs to the received-signal set. If we are talking about a class of channel operators H, then the term received-signal set will denote the union of the received signal sets for each H in the class. Now if the transmitted-signal set is the unit ball in L2(a, b), the received-signal set is a compact ellipsoid in L2(a, c). In fact, take for an orthonormal basis of L2(a, b) the set {n (t)i (augmented if necessary by an orthonormal basis for the null space of H' H). Then the transmitted signal set is the set of all x = A x f such that x < 1. The receivedZ X n n 1n 00 1/2 00 signal set is the set of all y = X n xnn = Z Y such that oo 00 i, 2 — < 1 (10) n Since either X — iO monotonically or there are only finitely many X O0, the inequality (10) defines a compact ellipsoid with semi-axes a 0, k k where ak = k. A special case of particular interest to us is that of a time-invariant channel. We say a channel is time-invariant if the kernel h(t, s) characterizing the channel is defined for all real values of t, s and is a function of their difference alone. Thus we can write h(t, s) = k(t - s). We still

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -11 -use the kernel to define an integral operator from L2(a, b) to L2(a, c), and we retain the requirements of realizability and bounded memory. Thus, because of these conditions, (a) and (b), b y(t) = \ k(t - s)x(s) ds, a < t < c, a or 00 y(t) = k(t - s)x(s) ds, -oo < t < oo, (11) -00 where x(s)= x(s), a s< b = 0, all other real s and y(t) = y(t), a < t < c = 0, all other real t Eq. (11) still defines a linear transformation from L 2(a, b) into L2(a, c) r~,~~~~~~~~ 2 through x —x — y B —y. We still require condition (c), so the transformation is Hilbert-Schmidt. It should cause no confusion if henceforth we drop the tildes and identify x, x and y, y. Obviously we can write Eq. (11) (with tildes removed) in the form y(t) = 0 k(t - s)Iab(s)x(s) ds, -00 where

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -12 -I (S) 1 a<s <b = 0, otherwise Then, c 0 oo (Hx1, Hx2) =- ( k(t - S)Iab(S)Xl(S) ds a -oC k(t - s')I a(s')x(s') ds' dt -oO =. k(t - s)Iab(s)xl(s) ds k(t - s')Iab( s)x(s') ds' dt -00 -o00 -00 Hence, since (Hx1, Hx2) = (H Hx1, x2) (note that the inner products are respectively on L2(a, c) and L2(a, b)) we have, (H'Hx1, = x2 (s) I (s) k(t ) - k(t - s) dt I(s) x(s)ds ds' -o0 -00 -00 Thus, still with the identifications x, x, and y, y, the operator H"'H is given by [H Hx (s')= Ibs')g(s, s)I (s) x(s) ds -00oO where -00 The kernel g(s', s), which can be readily shown to define a bounded linear operator on L2(-oo, oo), is symmetric and time invariant, i.e.,

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -13 -g(s' + u, s + u) = g(s', s). Thus we can write g(s' - s) instead of g(s', s). These facts will enable us later to obtain certain asymptotic results for the case that the linear channel is time-invariant. II COVERINGS, PACKINGS, c-ENTROPY AND c-CAPACITY Definitions, notation, and one or two elementary results are given in this section. A finite or infinite system of sets {Ki} covers the set A if AC U K.. Let R be either a finite-dimensional real Euclidean space with the usual norm, or a real, separable Hilbert space. Let Ac R be a non-void set with compact closure; i. e., let A be totally bounded (see [ 1]) (or just bounded if R is finite-dimensional). Let the sets K. of the covering be translates of a ball, B(E), of radius e/2, i. e., K. Bi.() = B(c) + x., x. R, i= 1, 2... 1 1 1 1 Since A is totally bounded there are coverings of A by finite systems of translates of B(e). We denote by N (A) the smallest number of sets in such a covering, and call its logarithm H (A) = log2N (A) the e-entropy of A.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -14 -A system of sets {K.} is a packing of a set A if K. n K. =, i / j, U K.c A. Again, we are concerned with packings by translates of a ball of given diameter. In an oo-dimensional Hilbert space a compact set cannot contain a ball of non-zero diameter. However, we are really interested in the centers of the balls, which form an c-separated set (to be defined), and the concept of c-separated set is useful in the oo-dimensional case. As before, let R be either a finite-dimensional real Euclidean space, or a real, separable Hilbert space. A' set S of elements s, s2,... belonging to R is an c-separated set if jsi - sj > e, for all i, j, i / j. It is clear that if, given A c R, we let A' be the union of A with all points x that satisfy |i - a|]<E/2 for some a c A, and if S is an c-separated subset of A, then the system of balls of diameter c with the points of S as centers is a packing of A'. Conversely, for any packing of A' with balls of diameter c, the centers of the balls provide an E-separated subset of A. It is well known that an c-separated subset of a totally bounded set A can contain only finitely many elements. Let Ac R be totally bounded. We denote by M (A) the largest number of points in an e-separated subset of A and call its logarithm C (A) = log2M (A) the e-capacity of A. There is a simple fundamental relation between e-entropy and ecapacity. For any totally bounded set A N2(A) < Mc(A) < N(A) or, equivalently,

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -15 -H2~(A) < C(A) < H(A) To prove the first inequality, note that M (A) is the number of points in a maximal c-separated set S. Hence if one centers spheres of diameter 2E about each point of S, every point of A must be covered; for any point not covered could have been adjointed to S, thus contradicting the maximality of S. To prove the second inequality, suppose M (A) > N (A). C C Then there would be an c-separated set contained in A with more points than the number of balls in some covering of A by balls of diameter C. Hence at least two elements of the C-separated set would fall in one c-ball, which is a contradiction. III BOUNDS ON c-CAPACITY WITH KNOWN LINEAR CHANNEL Suppose the transmitted-signal set is the unit ball in L2[0, 1]. Suppose the known channel operation H is linear and Hilbert-Schmidt, and is realizable and has finite memory T, so that it maps L [0, 1] into L2[0, 1 + T]. Thus, as pointed out in Section I, the received-signal set is a compact ellipsoid with each semi-axis equal in length to the squareroot of an eigenvalue of H H. We would like to know how many different signals can be transmitted, subject to the condition that the received signals are at least e apart in the L2 norm. Thus, we are asking how many elements can there be in an c-separated set contained in the received-signal set, or, equivalently, what is the e-capacity of the received signal set. Upper and lower bounds for C-capacity are obtained, and, for the special

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -16 -case of a time-invariant channel where the concept of rate makes sense, an estimate is obtained for an asymptotic rate. Let us denote the ellipsoid in L [ 0, 1 + T] that constitutes the received-signal set as E. We denote its semi-axes by a1, a2, a3,... and assume they have been ordered so that a1 > a2 a3 >... '. The unit elements along principal axes of the ellipsoid, b1', 2,...., enumerated to correspond to a1, a2,..., are taken as coordinates in the receivedsignal space. The projection, E, of the ellipsoid of possible received signals on the linear space spanned by 1, V' I..., ' is itself an ellipsoid n in n-dimensional real Euclidean space, R. The volume of the unit ball n/2. in R is denoted by k k - = k/ ); the volume of the ellipsoid with semi-axes a1, a2..., a is then k a a.a 2 ' n n 1 2 n It is convenient to discuss ellipsoids in R slightly larger and slightly smaller than E. We denote by E (n) the ellipsoid concentric with E, with the same principal axes as E, and semi-axes, in order of decreasing magnitude, a (1 + r/an), a2(1 + n/a),... an +. Obviously, if n is negative it must be less than a in absolute value for this notation to be n meaningful. Then, if r7 > 0, E () c E, if -a <.) < 0, E () DEn. In either case the minimum distance between the surface of E and En (r) is n. This statement is readily proved by considering concentric spheres of radius a and a + r7, respectively, with a small sphere of radius ri centered at an arbitrary point of the inner one. The conclusion is obvious if one transforms the whole configuration by the non-singular linear transn formation that carries the sphere of radius a into the surface of E. n

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -17 -Upper Bounds First let us consider a simple upper bound that results from the obvious volume argument. Let e > 0 be fixed, and temporarily let n > 1 be fixed. Any c-separated set in E determines a packing of e-balls in in E (e/2) if an c-ball is centered at each point of the E-separated set. The number of balls in such a packing is less than the ratio of the volume of E (e/2) to the volume of the balls, hence n k X a.(1 + E/2a ) n in M (E ) - n n = (1 + /2an)n [ (2a.i/) (13) i-1 For fixed e this bound is valid for all n, but it is useless for large n because it diverges to +00 (as can be easily seen, since a — 0), whereas M (E ) converges monotonically as n —oo to M (E), which is finite. It E ( appears that this upper bound starts to become very loose when n is large enough so that a is smaller than e. We can modify (13) so as to obtain an upper bound that applies for arbitrarily large n, and hence provides an upper bound for M (E), as follows. Choose a value a, 0 < a < 1, and pick n to be the smallest integer such that a n —. Suppose S is an c-separated set in E consisting of points s., s,..., s. Consider the m ellipsoid E, and note that any point in E can be written as s = s1 + s" where s' E, (s1, s") = 0, and |ls"ll< a < Now, project S on the subspace spanned by Al b,2..., i/ to give S', a set of points s1, s...', s

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -18 -Since 1is s < 2a < ace while s. - s. > e, it follows that H' J f i J |s - s. > / -, i, j = 1, 2,..., m; i. e., S'c: E is an J2 ( '/I - a )-separated set. Consequently, M (E) is not greater than 2 n the number of points in some (c 1 - a )-separated set in E, i. e., 1 C eIl - a / a \ c~ where n is the smallest integer for which a <- a < 1. If a <1 is n 2 adjusted so that a = - a < 1, for some n, then we can write a little n 2 more simply, 1 1 M(E)s.+ (15) \ h 1 The bound given by (13) is not satisfactory as a bound for M (E); the one given just above in (15) is satisfactory, but can fairly readily be improved upon. The reason for including (13) is that it is very simple and gives a sort of bench mark with which to compare a bound about to be derived. The truncation procedure that carries (13) into (15) can be used again on the better estimate without change. To improve (13) we apply the method of the well-known paper of Blichfeldt [4] to an ellipsoid (instead of to a cube, as he did; this is a trivial change). Blichfeldt's argument depends on the following inequality (see [4]; the proof is rather simple, however): if there are m nonintersecting balls of radius e/2 in Rn, and if r. is the distance from an 1

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -19 -arbitrary fixed point to the center of the i'th ball, then m 2 2 2 \, r. > 2(m - 1)( (16) i=1 Now, one considers balls of radius - /2 with a varying "mass density" 2 a 2 equal to 2 - r at distance r from the center. Each of the original m non-intersecting balls is replaced by one of the "new" balls centered at the same point; the new balls may, of course, intersect. Then, from (16), since the r. are unchanged m m 2 2 2 e mE \ 2 \ / 2 D > - ] r.: r r - 22 2 i 2 i/ i=l i-1 The expression on the right is just the total density of the new balls at an arbitrary point, which has been taken as the origin of coordinates. If now we consider a packing of E n(/2) with balls of radius c/2, then replace each ball with one of the larger, variable density balls centered at the same point, the new balls will be contained entirely in the ellipsoid En A( 2 T2, and the total mass of En ( ) will be less than c 2 2 - times its volume. Hence 2 2 2.T a. 7l2 ' 2 n 2 k Ij ai 1+ 2a ) Mk r dr 2 n - e1 o 2a n 2 2 where M is the number of balls. Thus

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -20 -n+2 n/2 2 E>- 2 E E M 4 < a. 1+ 2 n+ 2 i 2a.,a 1 + - ' or, since this must hold for any packing, M (En) < n.2. 1 + (17) ns+2E 2a n 1 The ratio of the right side of (13) to the right side of (17) is /2 2a +e 2 \ 2 / N'e n n+2 2 + 2a +e El n which exceeds one for sufficiently large n. The improvement, however, for non-realizable c is not as dramatic as might be expected if one compares Blichfeldt's classical upper bound on packing densities for all space with the simple upper bound given by the volume argument. The reason, of course, is the appreciable "edge effect" that shows itself when c is not very small compared to a n Again, the right side of (17) diverges to +oo as n- oo with e fixed. However, by exactly the same argument used to obtain (14) from (13), it may be truncated to provide an upper bound for M (E): n 2 n 2 a. _n+2 2( l + -i) 1+ 2a /K2a ) n C

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -21 -where n is the smallest integer for which a < _E a < 1. Again if a < 1 n 2 is adjusted so that a = this becomes Vi n 2' M (E). 2 + - - (19) Taking logarithms gives the bound on c-capacity C (E) log + n log +i=l n + 2 \ + log; 2 ) (20) where a and n are as given above. The precise value of a to give the tightest bound depends, of course, on the way the a. decrease; no attempt is made here to investigate this (see [2]). Lower Bounds If S is a maximal c-separated set in E, then the union of the system of 2c-balls centered at points of S is a covering of E, as pointed out in Section II. The number of sets in a covering of E must exceed the ratio of the volume of E to the volume of the 2E-balls, hence n k a. M (E) > M (En) > ~ c n k e n IT1 02N) (21)

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -22 -This inequality holds for all n, but when it is used to bound M (E) n should not be taken larger than the maximum value for which a > e, n for then the right side of (21) starts to decrease, and actually diverges to zero as n-woo. Despite the triviality of the argument leading to (21) it appears difficult to improve on it substantially. Some justification for this statement can be found in the Appendix, where a lower bound is obtained for M (E ) which is somewhat sharper than (21), but which is valid only for small c, and is not suitable for the analysis to follow. This sharper bound is obtained using a "best known" result on packing densities, and improves (21) roughly by the factor n. The estimate of capacity corresponding to (21) is, of course, rn ~/ 2a. \ C (E)> j log - ) -n log 2 (22) i=1 where n is the largest integer such that a > E. n Asymptotic Rates with Time-Invariant Channels Suppose there is given a kernel h(t, s) defined for all real t, s which satisfies the conditions for realizability and bounded memory (< T) and is square integrable in any bounded domain of t, 's. Then for any finite time interval [-T, T], h(t, s) determines a linear integral transformation with domain L2[ -T, T] and range L2[ -T, T + T]. We define the c-rate of transmission, if it exists, to be

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -23 -C (ET) CE(ET) lim r = lim r 2T+ 2T T — oo T —oo where ET is the compact ellipsoid which is the image of the unit ball in L2[ -T, T] under the (Hilbert-Schmidt) channel operator. We shall not attempt here to study the general question of when this rate exists; we limit the problem to the special case that h(t, s) = k(t - s) is a time-invariant kernel, and even in that case we do not prove the existence of the c-rate; but simply obtain asymptotic upper and lower C (ET) bounds for 2T- that are valid as T- coo. We use the facts established at the end of Section I concerning timeinvariant operators, and a well-known theorem of Kac, Murdock, and Szego [ 3], which reads as follows for the special case in which we are interested: Consider the integral equation p(x - y)o(y) dy = X4(x), -T < x < T (23) where p(t), -oo <t <oo, is an even function which is absolutely integrable, and whose Fourier transform F(g) =, c itx F([) - \ p(x)e -oo is also absolutely integrable. Then, if NT(a, f3) is the number of eigenvalues of (23) falling within (a, 3) and if (a, 3) does not contain zero,

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -24 -T-*oc L where 2T NT( ' ) < F()< 1] where Au Y]: a < F(~) < ( means the Lebesgue measure of the set of I's for which a < F($) < /3. Note that the conditions on p(x) and its transform are sufficient to guarantee that both p(x) and F(g) are bounded and continuous, and that p(x) = 2 B F(~)e t d -o00 Now consider the situation that the channel operation is determined by a kernel h(t, s) = k(t - s), when k(t) is bounded and k(t) = 0 for t < 0 and t> T. Since k(t) is bounded, the condition that jh(t, s) 2 be integrable in both variables in any bounded domain is automatically satisfied. We put K(f) = k(t)e dt = k(t)e dt Since k(t) is bounded and non-zero only on a finite interval, it belongs to L1 and L2. Hence K(f) exists, is bounded, and belongs to L2 by the 1 2 2 Plancherel theorem. Consider the iterated kernel g as given by Eq. (12), p 00 g(u) = k(t)k(t - u) dt (24) -00 g(u) is an even function; it belongs to L because it is the convolution of two functions in L1, and its Fourier transform is K(f)K(f) - iK(f)| which also belongs to L because K(f) belongs to L2. Thus g(u) meets all the conditions required of the kernel p in the Kac, Murdock, Szego theorem just quoted.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -25 -We now estimate finite products of eigenvalues of the self-adjoint, non-negative operator on L2 [-T, T] given by Eq. (24), T [H'Hx](s) = \ g(s - s')x(s') ds', -T < s < T Let a = IK(0) | and suppose that a > 1. Let a1, a..., aM be any finite sequence of real numbers satisfying a >a1 > a2 >.. > aM > 0. By the Kac, Murdock, Szego theorem the number, n., of eigenvalues A of H 'H satisfying a. < X < a. is T 2 L J n.=I! < IK(I) '+n where n. —0 as T —oo. Put m. - H i E:a. < IK (~)| < aj 1 j J2 then n. = T/7r(m + n.). The product, P., of eigenvalues lying between a. and a_ is then bounded by J J-1 n. n. o P. <p a. J J j-1 or T/ 7r(m+n.) T/ r(m +n ) ac. 3 P. <_a (25) J J j-~ Now since |K(f) j is the restriction to the real axis of the Fourier transform in the complex domain of g(t), and since g(t) (by Eq. (24))

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -26 -2 vanishes outside a finite interval and is bounded, it follows that iK(f) I cannot take on any one value on an infinite set of points contained in any finite interval; for, by a well-known theorem of Paley and Wiener, the Fourier transform of g(t) is analytic and not equal to a constant. Consequently, a fortiori, |K(f) | cannot take on any one value on a set of positive measure. Thus, given an arbitrary e > 0, it follows that the number v. of eigenvalues exactly equal to a. is less than or equal to TE/7r for sufficiently large T, j = 0, 1,..., M. We can now write estimates for the product of all eigenvalues greater than a. Suppose the A's are ordered so that > 2 X >..., and M 1 2 3 let XN, N = N(aM, T), be the smallest of the X's strictly greater than aM. (This is possible because the set of X's contains no limit points other than zero.) Then T/7r(m1+n1) T / 7r(m2+n2) T / (mM+nM) a2.M 1 2 v.. M 1 2 v M v0 o 1 M-1 1 2...aM-1 - X < a2 ak 1 2 N o 1 M-l T/7r(m1+n1) T/7r(m2-n2) T/7r(mM+nM) a a1 aM-1 (26) Now, given e > 0, take T large enough so that n.i <- m. -,and so that v so ' aj 2 C C 0 0 v. <T/ir m -and also v. T/t7r m - for all j j j 2 j+l 2'

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -27 -Then M T M N - \ m. log a. m. log a. < log X. i=l i=l i=l M e T M < - -- m. log ai (27) 7r i -1 ir i " 1 i=l i=l We also have upper and lower bounds on the number n of eigenvalues exceeding a: m M M-1 M T M n= > n.+ < v - m. - m. it " J / J J j 7rj j=l j=l j=l j=l M e T M n>_ - r m. -- 2m. (28) 7r j J ' 2 j=l j=l We can now apply (27) and (28) to (19) to obtain an upper bound on the e-capacity C (E). Since 1/2 2a. 2X./ 1 1 1 4 log -= log - -=log-2 and since (27) remains valid when all the X.'s and a.'s are multiplied by 1 1

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -28 -a fixed constant (this follows from (25), or from (26)), we have M C (E) < (1l+E )- m log 2ai-1 i=l E M j I 2 i -a 0 M7 + log (1 + j) E m+2 j=l -A M M aM 2 A mi log E i- m log i=l i=l is an upper approximation to S log 2/E |K(f)| df IK(f) I >EM 2 2 by simple functions. Since aM can be taken equal to 4, and since the al 2 inequality is valid for any finite sequence a > al >... > og <a = | K(O) | aM u, we have for any positive a < 1 and any e > 0, log 2JE O~f) df

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -29 -2K(f)|> C (E) (1 + df log, _ + df K( >|K(f)| e (E) (1 +e_) _r _ log _K | df + 2.5(1 + e ) T dflog |K(f)|>| — +l og (1 + ) df+1 (30) 2 I K(f)> 2 9) gog df r C I K(f)1 > + 2. 5(1l+ ) T df IK( f) >.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -30 -for all sufficiently large T. Thus we have finally for an asymptotic upper bound on the rate: C (ET) 1 2K(f) 25 5 lim 2T log K ---) df +- df (31) T — oc i |K(f)|>i 2I K(f)l>2 These inequalities are valid, of course, for all e > 0. In analogous fashion we now obtain an asymptotic lower bound. From (27), (28) and (22) we have for sufficiently large T T 4 T M (E ) m log (1 - e ) 27 lo m (32) i=l i=l 2 where aM is equal to e /4. The first sum on the right side of (32) is a lower approximation to log- |K(f)| df IK(f)l > M by simple functions. Hence C (ET) > (1 - e ) log | -) df- (1 ) - df (33) |K(f)| >2 I K(f) >,for all T sufficiently large. Thus,

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -31 -C(E) 1 2K(f) 1 r E T >21r > log df - df (34) T oc-^ o 2T -27 - 2"r |K(f)j > |I K(f)J > These inequalities are valid for all e > 0. The definition of rate is perhaps worth a comment. Implicit in the definition is that the maximum permissible signal energy is normalized to be 1 as T- coo, and the 'noise energy", E, is also held constant. This is equivalent, of course, to letting both signal energy and e grow proportionately as T increases. It seems reasonable in the case of timeinvariant channels that a rate exist (that is that C (ET) should increase asymptotically proportionately to T) if one thinks of sending long duration signals by chopping them into equal duration pieces which are sent one after the other, but with a pause between transmissions long enough for the channel to quit ringing. IV APPLICATIONS AND COMMENTS The most obvious application of the estimates of Section III is to the situation that there is unwanted noise added to the signal before it is available for processing at the receiver, where the L -norm of the noise is less than E, and where there is no (or very little) statistical information about the behavior of the noise so that the Shannon channel capacity cannot be computed. It has been pointed out by Kolmogorov and Tihomirov [ 1]

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -32 -that if it is known only that the probability is zero that the received signal lies outside the e-ball centered at the image of the transmitted signal, then the c-capacity is less than or equal to the Shannon channel capacity. If the statistical structure of the noise is known then the Shannon channel capacity is obviously a more satisfactory measure of the informationtransmitting capability than is the c-capacity, which usually is sort of a crude approximation to the channel capacity. In connection with this last remark, let it be noted that if the norm of the noise is less than e with probability (1 - r), then the e-capacity is the logarithm of the largest choice of signals that could be sent with an individual probability of correct reception of not less than (1 - r)). The location of possible signals in the transmitted signal set has not been discussed. However, for the case we have been considering in which the transmitted signal set is the unit ball in L2 and the channel operator is compact, it is easy to see that the e-capacity is not reduced if the signals are all constrained to lie just outside the surface of the unit ball instead of anywhere within the ball. The argument to show this has been used already in Section III. In fact, let S be an arbitrary e-separated set in E consisting of points s, s,..., s. Then, we have already observed that if n is chosen large enough so that a < -, the projection S' of S n 2 n 2 onto E is an (c 1 - a )-separated set with m elements. Each of the m elements of S' can now be "lifted" to the surface of E; i. e., we can replace s' by s" = s' + u. where u. is orthogonal to the subspace spanned by Ena 1? i 1 1 n+ E and s'.' lies on the boundary of E and hence of E. The set S" of 1 i -a2. points s" is a fortiori (e ^1 - a )-separated and contains m elements. Since a > 0 is arbitrary, we can achieve an ( + $ )-separated set of m elements lying on the surface of E, for arbitrary r > 0. Hence,

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -33 -if the unit ball is replaced by a ball of radius 1 + rA, for any nr > 0, there is a set of m points lying on its surface whose image under the channel operation is an c-separated set in E scaled up in all its linear dimensions by the factor (1 + rA). This observation justifies the application of the c-capacity estimates to one kind of situation where the channel operator itself is unknown. Suppose the true channel operator H is unknown, but is known to lie within c of a Hilbert-Schmidt operator H, when the measure of distance is provided by the Hilbert-Schmidt norm. Then if lIx|| = 1, |Hx - H x|l - e; and, 0 in fact, the set of elements of the form y = Hx, where H is any such operator, is exactly the set of elements in the ball of radius e about the point H x. This last statement can be proved by establishing an orthonormal basis in the transmitted signal space which has x as its first element; establishing an arbitrary orthonormal basis in the received signal space, and representing any Hilbert-Schmidt operator H as an infinite matrix with respect to these bases. Then it follows that the number of possible signals of norm one which could be transmitted without error is less than or equal to M (E) and greater than or equal to M (E) for arbitrary Y > 0. The estimates of c1+7 Section III apply directly. In particular, if H is a "time-invariant" channel operator of the kind specified in Section I, the estimates for rate obtained in Section III apply. Since the upper and lower estimates are continuous in c, we have from (31) and (34),

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -34 -r -2r log e dfe + df (35) I K(f)| >| JK(f)| > 2 and r > 2 log i 2(fi df - df (36) J|K(f)| > |K(f)| > where ~r - -im log [number of signals] -TV s~2T r = lim log [number of signals] T — co 2T Note that H does not need to be time-invariant; it simply must be known to lie within E of a time-invariant operator. These estimates appear to be of some interest, because they give some notion of communication rate for "slowly time-varying channels. " It is reasonable to suppose that, by virtue of intermittent measurements and suitable calculations such channels can be approximated for finite periods by time-invariant channels (see [ 5]). It also seems reasonable not to try to describe the lack of precise knowledge of such a channel in probabilistic terms, as would have to be done to permit a computation of channel capacity.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -35 -The restriction imposed above that the signals must be of equal energy is not a very happy one; it was imposed solely to make the problem simple. Unlike the situation for additive noise, where essentially it does not matter, the restriction matters a great deal when the distortion is "multiplicative" as it is in the case of the unknown channel. When the uncertainty is in the operator H, the size of the ball of possible received signals corresponding to a transmitted signal x is proportional to the norm of x, so there is advantage to using signals in the interior of the unit ball. Indeed, if there is no additive noise, there is no bound to the number of distinguishable signals lying in the unit ball. A meaningful problem involving additive noise, unknown channel, and free choice of signals in the transmitted signal set leads to such a tremendously complicated packing problem, that no effort has been made here to attack it. APPENDIX The purpose of this Appendix is essentially to justify the lower bound obtained in (22). The argument leading to (22) is so simple and crude that one feels that just a little care should give markedly sharper results. Since e-capacity has to do essentially with packings, and since there exist lower bounds on packing densities in R (real Euclidean nspace) which have been arrived at only after rather difficult arguments it would seem such bounds could be used profitably. Undoubtedly (22) can be improved. However, using best known results on packing densities ("best known results" means the strongest results quoted by C. A. Rogers in his recent monograph [6]) in a straightforward (and perhaps naive) fashion gives roughly an improvement over (21) only by the factor n.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -36 -Moreover, the estimates are suitable only for small E, and are unsatisfactory as n- co, as will appear in the development to follow. First let us give the definition of packing density used by Rogers in the reference cited. Let K R where K is a Lebesgue measurable set, 0 < p(K) < oo, u(K) = Lebesgue measure of K. Let K be a system {K + a.}, a. c R, of translates of K; let C be a cube in R whose sides are of length s. Define p + (K, C) () (K a (K+a.)n C where the sum is over those K + a. which intersect C. Then define the upper density of the system K to be p + (K) = lim p+ (K, C) s — oo and the packing density of K, 6(K), to be 6(K) = sup p + (K) K where the supremum is taken over all systems K of translates of K that form a packing of R. If K is bounded, 6(K) < 1. We are concerned with packings in bounded sets, not all of space, so some further considerations are necessary. If a set Ac R can be packed with at most a finite number of translates of a set K, we write M(K, A) for the largest number of sets that can belong to a packing. If both A

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -37 -and K have finite, non-zero Lebesgue measure we define the packing density of K in A to be 6(K, A)- M(K, A)p (K) p(A) If K is bounded we may assume in the foregoing that it has diameter 1. Then for e > 0 let K(e) be the set of diameter e which is geometrically similar to K; i. e., K(c) is K = K(l) scaled down in its linear dimension by the factor c. One can then show with a little fussing but no real difficulty that for any fixed cube C lim 6(K(E), C) > 6(K) C — 0 where, obviously, lim 6(K(E), C) = lim 6(K, C(s)) (Al) e - 0 S — oo C being fixed on the left side, K fixed on the right. It is equally obvious that (Al) holds if lim replaces lim on both sides. But now it is easy to see that 6(K, C(s)) actually has a limit as s-woo if K is bounded. In fact, for any e > 0 there is an s such that o o 6(K, C(s )) > lim 6(K, C(s)) - c~ s -W o0 Then for any integer m, 6(K, C(ms )) > 6(K, C(s )) 0 0

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -38 -because m cubes C(s ) can be fitted exactly to give C(ms ). But, if ms < s < (m + l)s 0 0 n 6(K, C(s)) > 6(K, C(ms )) m -o!m + \ / since a packing of C(ms ) is a packing of C(s). From these inequalities it follows that lim 6(K, C(s)) > lim 6(K, C(s)) s -- o 00 — 0oo For the special case that K is a ball of diameter E, C is a cube of side s and the space is Rn we write M (e, s) for M(K(E), C), 6 (e, s) for 6(K(E), C), and 6 for 6(K). Then we have (see [6] page 36, and page n 98 for the definition of T ) lim 6 (e, s) > 6 >_ (A2) n n n -3/2 where T - ne as n- -o. n We now apply this to obtain a lower bound for M (E ). A cube of side s in Rn has diagonal s A^fT. For arbitrary rn > O, r < a, put s = - and partition Rn into cubes of side s, oriented so that their edges are parallel to the axes of E. Let B C Rn be the union of the smallest collection of cubes such that B D E n(-_). Then volume B > volume E n(-) - k 1 - 7 a aa...a. Furthermore, B is entirely contained in E An

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -39 -that at least one c-ball can be packed in each cube. Then M (e, s)(c/2) kn n 6 s)-: S)- - M (e, s)kn t -) n Since at least as many E-balls can be packed into E as into B, and since the number of elements in an c-separated set in E is not less than the number of e-balls that can be packed into E, volume M (c s) e n volume of cube n volume E (-n) _> -- M (e, s) / / -n n 6= (c, c)(2/e)n(l - n/a ) aa2..a (A3) n n 12 n This inequality holds for arbitrary n, r7, e as long as the conditions nc < r < a (A4) n are satisfied. Then, for arbitrary fixed n and arbitrary rY < a, since lim 6 (e, s) > 6, lim Me()/) 6 (E1 n) 1 2 ** Thus elim M(En)(c/2)n> 6 aa...a (A5)

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -40 -for any positive integer n; or, stated fifferently, given any n and any e > 0, for all sufficiently small e (depending on n and c ), M (E ) >n (1- ) T I-) i-l The lower bound on 6 given by (A2) yields n n.., M (E n (1e ) i=l for all n and all sufficiently small c. The asymptotic lower bound on 6 yields n n 3/2 K 2) (A6) 2'~ e i=l for arbitrary e >0, then for sufficiently large fixed n, and then all sufficiently small c < E(n, e ). Since M (E) > M (E ), (A6) gives a lower bound of sorts, but it is nearly useless for our purpose because it is meaningful only for small e, and furthermore, there is no criterion available as to how small e must be.

INSTITUTE OF SCIENCE AND TECHNOLOGY THE UNIVERSITY OF MICHIGAN -41 -BIBLIOGRAPHY 1. "ec-Entropy and c-Capacity of Sets in Functional Spaces, " A. M. Kolmogorov and V. M. Tihomirov, Uspeki Mat Nauk, 14 (1959) 3-86. 2. "The c-Entropy and c-Capacity of Certain Time-Varying Channels, " R. T. Prosser, submitted for publication. 3. "On the Eigen-values of Certain Hermitian Forms, " M. Kac, W. L. Murdock, G. Szego, Journ. of Rat. Mech. and Anal., 2(1953) 767-800. 4. "The Minimum Value of Quadratic Forms, and the Closest Packing of Spheres, " H. F. Blichfeldt, Math. Ann., 101 (1929) 605-608. 5. "On the Measurement and Use of Time-Varying Communication Channels, " W. L. Root, Info. and Control, 8 (1965) 390-422. 6. "Packing and Covering," C. A. Rogers, Cambridge Tracts in Mathematics and Mathematical Physics, No. 54 (1964). 7. "Continuous Time Channels," R. G. Gallager, Draft of a chapter of a revision of "The Transmission of Information" by R. M. Fano.