THE UNIVERSITY OF MICHIGAN
OFFICE OF RESEARCH ADMINISTRATION, ANN ARBOR

DIGITAL COMMUNICATION STUDIES
PART I: COMPARATIVE PROBABILITY OF ERROR AND CHANNEL CAPACITY
PART II: ASYNCHRONOUS TIME-MULTIPLEXING OF MULTICHANNEL SPEECH SOURCES

Technical Report No. 133
4251-2-T

Cooley Electronics Laboratory
Department of Electrical Engineering

By: M. P. Ristenbatt        Approved by: T. G. Birdsall
H. W. Farris
T. Felisky
S. B. Weinstein

ORA Project 4251
Contract No. DA-36-039 sc-87172
U. S. Army Signal Supply Agency
U. S. Army Signal Research and Development Laboratory
Fort Monmouth, New Jersey

March 19-GMl

ACKNOWLEDGMENTS

The authors wish to acknowledge the contributions of Mr. R. A. Carlsen, who wrote Appendix B and was responsible for the queuing theory approach of Section 3.1. Thanks are due Mr. A. J. Viterbi of Jet Propulsion Laboratory for supplying a tabulated form of computed error probabilities for coherent block codes. Credit is due to the members of the Signal Corps Multichannel Security Division for calling our attention to pertinent reports and work.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... ii
LIST OF ILLUSTRATIONS ... v
LIST OF SYMBOLS (PART I) ... vii
LIST OF SYMBOLS (PART II) ... ix
ABSTRACT ... xi

PART I: COMPARATIVE PROBABILITY OF ERROR AND CHANNEL CAPACITY OF DIGITAL COMMUNICATIONS SYSTEMS ... 1
1. GENERAL CONSIDERATIONS ... 2
   1.1 General Block Diagram ... 2
   1.2 Source Coding ... 3
   1.3 Error Correcting and Detecting ... 4
   1.4 General Detection Considerations ... 5
2. DETECTION CONSIDERATIONS ... 6
   2.1 Optimum Receivers ... 6
       2.1.1 Optimum Coherent Receivers ... 6
       2.1.2 Optimum Incoherent Receivers ... 7
   2.2 Signal Choice ... 7
       2.2.1 Coherent Operation ... 9
       2.2.2 Incoherent Operation ... 11
   2.3 Example of Signal Selection in Binary Case ... 12
3. PROBABILITY OF ERROR CALCULATIONS AND RESULTS FOR DIGITAL SYSTEMS ... 14
   3.1 Binary Systems ... 15
       3.1.1 Binary Simplex ... 18
       3.1.2 Coherent Orthogonal ... 21
       3.1.3 Incoherent Orthogonal ... 23
   3.2 Nonbinary Systems ... 25
       3.2.1 Error-Coded Binary Signals ... 25
       3.2.2 M-Symbol Alphabet ... 27
           3.2.2.1 General Pch Calculations ... 27
           3.2.2.2 Basic M-ary Symbols ... 34
           3.2.2.3 Block-Coded Binary Signals ... 34
           3.2.2.4 M-Phases ... 35
   3.3 Comparative Results and Conclusions ... 38
       3.3.1 Bit-Error Comparison ... 40
       3.3.2 Word-Error Comparisons ... 44
       3.3.3 Information-Rate Comparison ... 47
       3.3.4 Comparison Conclusions ... 50
4. CHANNEL CAPACITIES AND MAXIMUM RATES ... 55
   4.1 Capacity Relations for Digital Systems ... 55
       4.1.1 Continuous Channels ... 56
       4.1.2 Discrete Channels ... 57
   4.2 Maximum Information Rates if Data Rate is Variable ... 62
       4.2.1 The Small S/N Region ... 64
       4.2.2 Large Signal-to-Noise Ratio ... 64
   4.3 Conclusions ... 66

TABLE OF CONTENTS (Cont.)

PART II: ASYNCHRONOUS TIME-MULTIPLEXING OF MULTICHANNEL SPEECH SOURCES ... 69
1. INTRODUCTION ... 69
   1.1 Background and Previous Work ... 70
2. DESCRIPTION OF ASYNCHRONOUS TIME-MULTIPLEXING SCHEME ... 73
3. PERFORMANCE OF THE BUFFER ... 76
   3.1 Queueing-Theory Approach ... 77
   3.2 Specialized Approach ... 79
   3.3 Freeze Out Fraction (FOF) ... 81
   3.4 Freeze Out Runs ... 81
   3.5 Starting Transients ... 84
   3.6 Channel Idle Time ... 84
   3.7 Wait Time Distribution ... 85
4. RECONSTRUCTION OF WAVEFORMS ... 90
   4.1 Theoretical Sampling Relations ... 90
   4.2 Reconstruction Methods ... 91
5. PERFORMANCE OF SYSTEM ... 93
   5.1 Choice of Parameters ... 93
   5.2 Digit Rate ... 94
   5.3 Comparisons ... 95
6. CONCLUSIONS ... 98
APPENDIX A: DETERMINATION OF THE MAXIMUM NUMBER OF BINARY DIGITS PER SECOND THAT CAN BE TRANSMITTED OVER A CHANNEL OF BANDWIDTH W ... 100
APPENDIX B: THE EQUIVALENT SOURCE ... 108
APPENDIX C: ANALYSIS OF FREEZE OUT RUNS ... 119
LIST OF REFERENCES ... 124
DISTRIBUTION LIST ... 127

LIST OF ILLUSTRATIONS

1. General block diagram of digital communication system ... 2
2. Optimum coherent receivers for signals known exactly in white Gaussian noise ... 6
3. Two types of optimum incoherent receivers ... 8
4. Probability of error vs. Eb/N0 for binary signals ... 13
5. (This figure omitted)
6. General depiction of probability of error calculation for binary, symmetric decisions in terms of probability densities ... 18
7. Two basic receivers for binary coherent operation ... 19
8. Distribution diagram for simplex binary symmetric situation ... 20
9. Distribution diagram for the coherent orthogonal, binary symmetric situation ... 22
10. Sketch of probability distributions for the incoherent, orthogonal AM case ... 24
11. Pw versus Eb/N0 for coherent simplex and incoherent-orthogonal error-coded binary signals ... 28
12. Pch versus E/N0 for M-phase CPSK and DCPSK ... 39
13. PB(M) versus Eb/N0 for coherent, bi-orthogonal signals, comparing binary with M-ary ... 42
14. PB(M) versus Eb/N0 for incoherent, orthogonal signals, comparing binary versus M-ary ... 43
15. PB versus Eb/N0 for M-phase (CPSK and DCPSK) ... 45
16. Pw versus Eb/N0 of 12-bit word for three types of binary transmission ... 46
17. Pw versus Eb/N0 for 12-bit word with bi-orthogonal signals ... 48
18. Pw versus Eb/N0 for 12-bit word with incoherent orthogonal signals ... 49
19. Pw versus Eb/N0 for 12-bit word with CPSK and DCPSK ... 51
20. Information rate/bandwidth for M-ary bi-orthogonal signals (coherent) ... 52
21. Information rate/bandwidth for M-ary incoherent orthogonal signals ... 53
22. (This figure omitted)

LIST OF ILLUSTRATIONS (Cont.)

23. C/W versus Eb/N0 for binary simplex, coherent orthogonal, and incoherent orthogonal signals ... 62
24. Maximum information rate per bandwidth for binary signals ... 65
25. Maximum information rate per bandwidth for M-ary coherent orthogonal signals ... 66
26. Maximum information rate per bandwidth for M-ary incoherent orthogonal signals ... 67
27. Depiction of sampling for extremal coding ... 73
28. Block diagram of transmitter for asynchronous multiplexing of multichannel speech using extremal coding ... 74
29. Depiction of the probabilities αk and βk along a time axis ... 79
30. Freeze out fraction as function of input rate u and storage capacity M ... 82
31. Channel idle time versus buffer length ... 85
32. Probability density curves of waiting time in buffer ... 88
33. Probability density curves of deviation of waiting time in buffer from its mean value ... 89
34. Portion of (sin x)/x curve suitable for maximum rate transmission ... 104
35. More practical curve (2TW = 4) for maximum rate transmission ... 105
36. Buffering in an asynchronous multiplexed communication system ... 109
37. Notation for f(x|δ) ... 110
38. Notation for f(δ) ... 110
39. Notation for P'(k, T|δ) ... 112
40. Notation for P(k, T|δ1, δ2, ..., δN) ... 112

LIST OF SYMBOLS FOR PART I

In general:
x = input
y = output
(1), (0) = binary states
C = channel capacity in bits/sec, both continuous and discrete; the maximum rate at which one can send information over a channel and still reduce the Pe to 0 by external coding means
D = vector space dimensions per symbol time T, for a given signalling method; also called the minimum "signal dimensions"
D' = vector space dimensions of a space corresponding to an ensemble of successive symbols which occur in time T'
E = energy per symbol
Eb = energy per bit
I = the information rate of digital systems in bits/sec
I' = the information rate of digital systems in bits/symbol
IM = the maximum information rate for digital systems, when taking both Pe and bandwidth into account
M = number of alternative signals or symbols = alphabet size
N = number of successive symbols which can be placed in a time T' and theoretically have no intersymbol influence
N0 = single-sided noise power per unit bandwidth = N/W
Pe = general term referring to the probability of "making an error," or the probability of confusing one symbol for another
PB = probability of bit error
Pch(M) = probability of error for an M-ary symbol or character
Pw = probability of word error, where a "word" refers to a group of bits; word length is determined by the source data
PeT = total probability of error after corrective coding; occurs only when the data are in bits

LIST OF SYMBOLS FOR PART I (Cont.)

Pi = transition probabilities of signals, between input and output of channel, due to noise
pi(yj) = probability density of yj (output from the jth filter) given that signal i is sent
P(1|0) = conditional probability of 1, conditional on 0
s(t) = a signal time waveform
S = a signal's average (or rms) power; thus, the variance of a signal
S(f) = Fourier transform of s(t)
T = length of baud in digital (alternative) transmission = length of any alphabet symbol
Tb = length of bit or equivalent bit length
y, yM or yM/2 = output of desired matched filter in Pe calculations
y* = normalized y
ρij = time correlation between si(t) and sj(t)
Φ(x) = cumulative Gaussian (error) function used in the Pe expressions, \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt

LIST OF SYMBOLS FOR PART II

αk = probability that exactly k buffer stages are occupied just after a buffer clock pulse
βk = probability that exactly k buffer stages are occupied just before a buffer clock pulse
FOF = freeze out fraction = the percent of total samples from all of the sources which are rejected because the buffer is full when they arrive
M = capacity of the buffer in "number of samples"
N = the total number of sources for the multichannel situation
p(k, u) = the probability of k samples among the N sources in a time interval T
P(0) = a column vector of initial state probabilities
Pn(sT) = the probability of n items in the buffer (nth state) at time sT
P = state probabilities just before the occurrence of a buffer clock pulse
P(c) = the probability of a run of exactly c clock intervals, during which one or more samples from some source are frozen out during each interval
P(d|f ≥ 1) = probability that exactly d units are frozen out, in a given interval, provided that one or more was frozen out in a previous interval
Pθ(k) = probability that k samples are in the buffer immediately before the arrival of the (k+1)th sample at time θT
Pθ(k, gets in) = the probability of a sample encountering k samples already in the buffer when it enters at time θT, if it is not rejected as freeze out
= the number of quantizing levels for the speech samples
R = the digit rate in bits/sec
R' = the average rate of occurrence of samples from an active speech source

LIST OF SYMBOLS FOR PART II (Cont.)

R" = the average rate of samples from a speech source when one takes into account the percent of "off" time of the source
ΔtM = the maximum amount of time displacement of samples which may occur before undesirable properties appear in the reconstructed speech
[T] = the matrix of transition probabilities between "states," which can be found using a state diagram
T = the time interval at which the samples are regularly removed from the buffer, if available
u = the average rate (taken across sources) of occurrence of samples in a time T
w = the wait time for the (k+1)th sample, which arrives at time θT

ABSTRACT

Part I is an organized presentation of the theoretical advantages and disadvantages of alternative modulation and coding methods for digital communications. White Gaussian noise with optimum receivers is assumed. The signals considered are: simplex, bi-orthogonal, coherent-orthogonal, incoherent-orthogonal, and M-phase. The probability of error and bandwidths for existing and contemplated signals of these types are compared. Both a constant and a variable data rate comparison are made. The data for these comparisons were obtained from the literature and from calculations. Graphs and tables are used to depict the comparisons. M-ary bi-orthogonal signals are deemed best for low signal-to-noise ratios if the probability-of-error requirement is stringent. Binary and M-phase signals appear best for medium and large signal-to-noise ratios. This treatment should help structure the selection of modulation and coding methods for proposed communication systems. Also, since many signalling methods are considered, some fundamental principles, terminology, and methods of comparison are suggested.

Part II is an analytical evaluation of asynchronous time-multiplexing for multichannel speech signals. This asynchronous system applies where a number of speech sources share a common channel or trunk, and it is proposed that the source-to-channel sample assignment be asynchronous. The analysis here includes the performance of the buffer and a predicted digit rate for the asynchronous operation. The buffer analysis was accomplished by using a specialized probabilistic approach, which may be regarded as a special case of the more general queuing theory. The digit rate analysis assumes the use of extremal coding on the speech source. Based on the parameters assumed for the proposed asynchronous system, it is predicted that a multichannel situation consisting of 100 channels will use only one third of the digit rate of synchronously multiplexed PCM, and about a 5 db loss in quantization signal-to-noise ratio can be expected. Successful operation of this system would mean a more efficient use of the channel and the possibility of accommodating temporary overloads. Based on the analysis here it is felt that the theoretical advantages of asynchronous time-multiplexing are sufficiently profitable to warrant a serious feasibility demonstration of this technique.

PART I: COMPARATIVE PROBABILITY OF ERROR AND CHANNEL CAPACITY OF DIGITAL COMMUNICATIONS SYSTEMS

In order to choose or design a communication system sensibly it is necessary to know the advantages and disadvantages of the alternatives which are available. The objective of this first section is to present both existing knowledge and proposed comparative methods for selecting the modulation and coding methods of digital communication systems. Attention has been paid to organizing the material sensibly, and it is presented as briefly as possible. The intention is that, with the knowledge summarized below, one might sensibly proceed toward selecting methods for fulfilling a given communication system's requirements.

It should be stressed that the emphasis in this report is overwhelmingly on theory. The state of the art, and the practical considerations which may or may not make certain systems possible, will receive only cursory mention, since these were not seriously studied under this work. Because new comparative calculations are published almost weekly in this communications area, it serves a very useful purpose to take cognizance of these results and to attempt to assay their significance. Hence some of the material in this section will naturally be reporting on published work, while other material will be original with the authors. In many cases, statements will be made without proof, since the derivations may be followed in the pertinent references. This section is undertaken only because we feel that it serves a very necessary function before getting into more detailed and original contributions.

After discussing the general considerations for digital communication systems, the detection picture will be presented. The two major topics then are: (1) probability of error calculations for various conditions, and (2) channel capacity relations. A good deal will be said about channel capacity for digital systems.

1. GENERAL CONSIDERATIONS

Under "General Considerations" we will depict the general block diagram for digital communication systems and discuss generally the possibilities (and trades) for improving the performance in each of the "box" areas. In addition to quantitative analysis, there will be a collection of comparisons and qualitative conclusions.

1.1 General Block Diagram

The block diagram for a digital communication system can be depicted as shown in Fig. 1. By "source coding" we refer to those operations which convert the actual message source into a sequence of symbols. A common example is the sampling, quantizing, and subsequent encoding (PCM) of an analog voltage. "Error coding" refers to putting some constraints between the symbols so that certain combinations of errors can be detected (and possibly corrected) in the receiver. This is a form of using intentional redundancy to protect against channel noise. "Modulation" is concerned with the actual choice of waveform for a particular symbol, and the RF transmission of the symbol. In the receiver, of course, the mirror image of each of the above functions occurs.

Fig. 1. General block diagram of digital communication system (message source, source coding, error coding, modulation, channel, demodulation, error decoding, source decoding, message).

In this report we will regard the modulation-channel-demodulation complex as the detection situation. This is sensible since it is impossible to separate the effects of signal selection, power, etc., on the channel probability of error. We wish, then, to consider the alternatives in digital communication system design as being separable into three areas: (1) message (or source) coding; (2) error-correcting (detecting) coding; and (3) detection improvement. In any system design, concentration on one or another of these areas will probably be most profitable for improving the system. The "trades" involved and the costs of each improvement are the subject of the ensuing material.

1.2 Source Coding

The standard method of source coding (for digital communications) is to sample an analog signal at the Nyquist rate (2W samples per second), quantize the resultant samples, and digitally encode them in some fashion. With such a method it is clear that one is attempting to reproduce (in the receiver) nearly the exact original waveform. For many analog signals (in particular, speech) one can reduce the apparent information content of the source by removing redundancies. For example, it is well known that good quality speech should require only about 50 bits/sec, but the sampling method above results in 30,000 bits/sec. The discrepancy may be interpreted as due to the redundancy of the (source) speech signal, and to the bandspreading involved with digital encoding. Clearly, if the redundancies can be reduced, a speech source will require less communication facility. Using the Vocoder principle, one can reduce the requirement to 2,000 bits/sec.

An approach different from the Vocoder method is to operate on Shannon samples in some fashion. "Delta modulation" schemes do this by sending only the "difference between present and integrated past samples." An extension of this is to use a number of past samples and predict the next sample value; then the difference between the actual and the predicted sample value is sent (a sketch of this idea follows below). Experimental results using such predictions are reported in a companion report, "Some Computer Experiments Using Predictive Coding On Speech," TR-132.

Another source coding situation arises when the source information occurs in bursts. If delays are tolerable, one can try to smooth out the data so that a transmitter can operate closer to the average rate of the source. This aspect will appear in Part II with considerations of "asynchronous time-multiplexing," where a number of sources must be multiplexed.
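As an illustration of the difference-transmission idea mentioned above, the following is a minimal sketch of a first-order predictive (delta-modulation-like) encoder. It is not taken from TR-132; the sampling rate, quantizer step, and test waveform are illustrative assumptions. The transmitter quantizes only the difference between the current sample and the value reconstructed from past transmissions, so the receiver, forming the same prediction, can track the waveform.

```python
import numpy as np

def dpcm_encode(samples, step=0.05):
    """Send only the quantized difference between each sample and the value
    predicted from the past (here simply the previously reconstructed sample)."""
    prediction = 0.0
    codes, reconstructed = [], []
    for x in samples:
        q = int(round((x - prediction) / step))   # quantized prediction error (what is sent)
        prediction += q * step                    # the decoder forms the same reconstruction
        codes.append(q)
        reconstructed.append(prediction)
    return codes, np.array(reconstructed)

t = np.arange(0, 0.01, 1.0 / 8000)                # 10 ms of a toy 300-Hz "speech" tone at 8 kHz
speech = 0.5 * np.sin(2 * np.pi * 300 * t)
codes, recon = dpcm_encode(speech)
print("largest reconstruction error:", float(np.max(np.abs(recon - speech))))
```

The reconstruction error stays within half a quantizer step, while the transmitted integers are small; this is the sense in which removing sample-to-sample redundancy reduces the required digit rate.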

1.3 Error Correcting and Detecting

In error coding the general objective is to introduce constraints between the symbols at the transmitter so that channel errors may be detected, and possibly corrected, in the receiver. It is useful to consider that error codes include both error-correcting and error-detecting (without correcting) situations. Probably the earliest work on error codes was by Hamming (Ref. 1), who dealt with three codes: the parity check code, a single error correcting code (which later became known as the Hamming code), and a single error correcting-double error detecting code. Such early codes were devised under the assumption that the occurrence of a single error was most likely and that the errors occurred at random and were independent of one another. For the more realistic situation where errors occur in bursts (in adjacent digits) a new series of codes, called "cyclic codes," was devised. This includes the Fire codes and the Bose-Chaudhuri codes. The name "cyclic" comes from the fact that the code words can be generated in a cyclic fashion by feedback shift registers.

The present status of error codes is that a number of codes have been mathematically demonstrated, but the application art lags somewhat behind this. Recently there have been profitable applications using shift-register encoding (Ref. 2) to implement error detecting. By the use of such modern codes one can obtain efficient codes (high ratio of information-to-check digits for a given error rate) while at the same time providing for practical operating equipment (shift-register methods). A summary of probability of error calculations with error codes will be given in Section 3.2.1. The most authoritative (recent) description of codes is Ref. 8.

One firm conclusion about error codes is that they are definitely profitable in situations where a feedback facility exists, that is, where one can ask for repeats. Then one can use a maximum of error detecting and a minimum of correcting. A large body of work has been published investigating this situation (Ref. 9). Such feedback situations are clearly applicable to the transmission of non-real-time digital source data. However, no methods have been found to exploit the feedback error-coding situation when the digital data stem from real-time sampled analog data (such as speech). The difficulty lies in dealing with the delays.
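The single-error-correcting idea credited to Hamming above can be made concrete with the familiar (7,4) code. The particular generator and parity-check matrices below are one common systematic form, chosen only for illustration and not taken from Ref. 1; the helper names are likewise illustrative.

```python
import numpy as np

# One common systematic form of the (7,4) Hamming code: G = [I | P], H = [P' | I].
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg4):
    """Append three parity-check digits to four information digits."""
    return (np.array(msg4) @ G) % 2

def decode(word7):
    """Compute the syndrome; a nonzero syndrome points at the single flipped digit."""
    word7 = np.array(word7).copy()
    syndrome = (H @ word7) % 2
    if syndrome.any():
        for pos in range(7):
            if np.array_equal(H[:, pos], syndrome):
                word7[pos] ^= 1          # correct the single error
                break
    return word7[:4]                     # the first four positions carry the information bits

msg = [1, 0, 1, 1]
cw = encode(msg)
cw[5] ^= 1                               # introduce one channel error
print(decode(cw))                        # recovers [1 0 1 1]
```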

1.4 General Detection Considerations

Referring to Fig. 1, the "detection situation" is characterized by the modulation technique, the channel, and the demodulation. Each of these three items is concerned with identifying the right signal after reception. Since the effect of channel noise on probability of error depends on the signal selection and the receiver decision method, one should not separate these three items. In general, the two things that can be done to improve error-rate performance in the detection area are:

1) Use proper detectors for the chosen signal whenever possible. (When the channel disturbance is additive Gaussian noise these will be optimum.)

2) Use as much "energy per decision" as necessary. Since the probability of error (with white Gaussian noise, using optimum detection) depends only on signal energy and channel noise (and not on signal time characteristics), one should integrate as much "energy per decision" as necessary. In other words, the greater the energy per decision, the lower the probability of error (Pe).

There are basically two ways to increase energy per decision in a digital system: (1) narrow-banding, and (2) increasing alphabet size. In narrow-banding, one simply spends more time per symbol to obtain the increased energy. This of course means that the data rate is decreased accordingly. One can increase energy per decision without decreasing the data rate by increasing alphabet size. Two ways of increasing the alphabet size are: (1) use M-ary basic symbols instead of binary, or (2) use "grouped" binary symbols. In either case the energy per decision is increased. Although it will be found in Section 3.3.4 that such increasing of alphabet size may not be efficient, it may nevertheless be necessary if one has a given S/N, Pe, and required data rate. When increasing the alphabet this way, one is in effect trading S/N against increased complexity. The detection situation will be discussed in more detail in Section 2.
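A one-line check of the alphabet-size statement may help. Assume, purely for illustration, a fixed average signal power S and a fixed bit rate Rb, so that Eb = S/Rb. Grouping log2 M bits into one M-ary character stretches the decision interval to T = (log2 M)/Rb, and therefore

E_{char} = S\,T = \frac{S \log_2 M}{R_b} = E_b \log_2 M

so the energy per decision grows with the alphabet size while the data rate is unchanged. Narrow-banding, by contrast, buys the same energy increase only by lowering Rb.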

2. DETECTION CONSIDERATIONS

Although the general conclusions about detection were summarized in Section 1, we now treat this in more detail. First, optimum receivers are discussed, then signal selection, and finally the very important binary case is used as an example.

2.1 Optimum Receivers

In a digital communication system various alternative signals are sent at the transmitter in intervals of T seconds. Hence, for each interval, the receiver detector must decide which signal was sent. Receiver detectors will be either coherent or incoherent, depending upon whether propagation and equipment conditions are stable enough to use RF phase in the decision process.

2.1.1 Optimum Coherent Receivers. It has been shown (Ref. 38, etc.) that, if the channel noise is white Gaussian noise, the optimum way to perform the digital detection process is the crosscorrelator or matched-filter detector. If one does not choose to incorporate a priori knowledge of the occurrence of the symbols, then it can be shown that this type of detector is the best that one can do, either on a signal-to-noise ratio criterion or an a posteriori probability criterion (Ref. 3). Thus in general the receiver will take the form shown in Fig. 2.

Fig. 2. Optimum coherent receivers for signals known exactly in white Gaussian noise: (a) correlator (multiplier and integrator) form; (b) matched-filter-and-sampler form.
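As a concrete, discrete-time illustration of the correlator form of Fig. 2(a), the sketch below multiplies the received waveform by each locally stored replica, integrates over the symbol interval ("integrate and dump"), and chooses the largest output. The sampling rate, tone frequencies, and noise level are illustrative assumptions, not parameters from this report.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_receiver(x, replicas, dt):
    """Correlate the received waveform with each stored replica over the symbol
    interval (integrate and dump) and choose the index of the greatest output."""
    outputs = [np.sum(x * s) * dt for s in replicas]
    return int(np.argmax(outputs)), outputs

T, fs = 1.0e-3, 1.0e6                          # symbol length and sampling rate (illustrative)
t = np.arange(0.0, T, 1.0 / fs)
replicas = [np.sin(2 * np.pi * 5.0e3 * t),     # two FSK tones spaced 1/T apart (orthogonal)
            np.sin(2 * np.pi * 6.0e3 * t)]

sent = 1
received = replicas[sent] + rng.normal(0.0, 2.0, t.size)   # add white Gaussian noise
decision, _ = correlation_receiver(received, replicas, 1.0 / fs)
print("sent symbol", sent, "-> decided symbol", decision)
```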

Since these are coherent receivers, it is implied that one knows the phase and frequency of the RF carrier, and one is truly operating with the exact s(t). In Fig. 2(a) the incoming signal is first multiplied by the coherent local replicas. Then each product goes through an "integrate-and-dump" operation. In Fig. 2(b) the matched filter and sampler perform the identical function. After the integration, the decision device (called the "comparer") specifies the output on the basis of which signal is greatest at t = T (the end of each symbol). Such receivers are called "maximum likelihood receivers." It should be emphasized that the receivers of Fig. 2 are optimum regardless of the choice of signals. However, one affects the detection situation by the choice of signals (considered below in Section 2.2).

2.1.2 Optimum Incoherent Receivers. The incoherent case refers to the situation where one does not know the RF phase sufficiently, and hence can work only with the video. Two types of optimum incoherent receivers are shown in Fig. 3, both involving matched filters. In (a) the carrier-band matched filter is followed by an envelope detector and sampler. The outputs of the samplers are then compared. This receiver is "optimum" for incoherent operation in white Gaussian noise (Ref. 18). An alternative (but equivalent) incoherent method, shown in Fig. 3(b), is to multiply each alternative signal by both a sine and a cosine function of the exact carrier frequency, and put both of these through a baseband matched filter (or correlator). Then one takes the square root of the sum of the squares, and compares these quantities for the different signals. This method requires knowledge of the exact carrier frequency, and is equivalent to the carrier-band matched-filter receiver (Ref. 22). Both of these receivers work on the envelope of the received signal, and hence all considerations depend on energy alone, since phase cannot be utilized. Also, the decision device (called the comparer) decides on the basis of which signal is greatest at t = T. Such a receiver is also a maximum likelihood receiver. It should be emphasized that the receivers of Fig. 3, like those of Fig. 2, can be shown to be optimum regardless of the choice of signals. However, one can improve the detection by choosing the signals properly.

2.2 Signal Choice

We now deal with the question of what signals are best for use in conjunction with the receivers of Figs. 2 and 3.

Fig. 3. Two types of optimum incoherent receivers: (a) carrier-band matched-filter receiver (matched filter, envelope detector, sampler); (b) base-band matched-filter receiver (sine and cosine multipliers followed by baseband matched filters).

First the coherent case is treated.

2.2.1 Coherent Operation. "Coherent operation" implies that propagation and equipment conditions are stable enough to use RF phase in the receiver "decision," even at low signal-to-noise ratios. For such coherent operation, it is sensible to regard the possible signals as belonging to one of two classes: (1) "far-apart" or "distinguishable" signals, or (2) "close-packed" signals. Signals such as "orthogonal" signals are considered "far-apart" because the major objective is to have the symbols as distinguishable as possible in the presence of noise. These signals are of interest when probability of error is a significant problem. In "close-packed" signals (such as M-phase with M > 8, or "phase-plus-amplitude" modulation) the major objective is to obtain efficient information transmission; here it is implied that S/N is high enough so that Pe is within requirements.

In this report we are concerned mainly with the far-apart signals. Hence we are concerned with those cases where probability of error is a significant problem. The following far-apart signals will be considered: (1) simplex, (2) bi-orthogonal, and (3) orthogonal.

The set of signals which are as "far apart" as possible is usually termed "simplex" because of the analogy with the vertices of a geometrical simplex. Simplex signals (Ref. 3), viewed as points in signal space, correspond to a geometrical regular simplex where every vertex is equally distant from every other vertex. If there are M signals, the geometrical simplex appears in an (M-1)-dimensional space. Two simple examples of such simplex signals are: (1) two binary signals whose vectors are 180° out of phase, s1(t) = -s2(t), and (2) a ternary signal, where the vectors are at 120° phase with each other. Another simplex (baseband) signal set is a maximal (pseudo-random) sequence and its time translates (by one clock pulse). Since simplex signals are the "farthest apart," such signals are optimum for coherent operation.

Although simplex signals appear optimum, orthogonal signals are near-optimum [and approach the optimum (simplex) for large alphabet M]. Signals are orthogonal, over an interval T, if:

\int_0^T s_i(t)\, s_j(t)\, dt = 0, \quad i \neq j; \qquad = E_i, \quad i = j   (1)

where Ei = total "symbol" energy of signal si(t).

If the signals all have equal energy (a common assumption) then:

\int_0^T s_i^2(t)\, dt = E \quad \text{for any } i   (2)

Signals may also be orthogonal in the frequency sense, and this would be given by

\int_{-\infty}^{\infty} S_i(f)\, S_j^*(f)\, df = 0, \quad i \neq j; \qquad = E, \quad i = j   (3)

where S_i(f) = \int_0^T s_i(t)\, e^{-j2\pi f t}\, dt.

Thus it is seen that signals can be orthogonal in two senses: time or frequency. A common example of "signals orthogonal in time" is simply time waveforms which do not overlap. Signals which do overlap in time can also be made orthogonal. The most common examples of "signals orthogonal in frequency" are FSK signals. Here the frequencies of the different signals are chosen to be effectively nonoverlapping.

Consider for a moment the role that orthogonality plays in affecting the probability of error for a digital system. Equation 1 says that, when signals are orthogonal, the time correlation or pairwise correlation coefficient between signals is zero. In general one writes the correlation coefficient as simply:

\rho_{ij} = \frac{1}{E} \int_0^T s_i(t)\, s_j(t)\, dt   (4)

One can see why the simplex signals mentioned above are superior to orthogonal signals for small M. Consider the binary case: if s1(t) is orthogonal to s2(t), then ρ12 = 0; but if s1(t) = -s2(t), then ρ12 = -1. Clearly a "pairwise" correlation coefficient of -1 is better than 0. Consider now an M-signal simplex alphabet. To calculate the total Pe is exceedingly complex, and has not been done directly. However, one can find the simplex Pe from the coherent orthogonal Pe by considering an energy increase (see Section 3.2.2.1). Of course, the more negative the pairwise correlation, the better. It can be shown (Ref. 3) that the most negative value that can be obtained for the pairwise correlation coefficient between members of an M-signal alphabet (where all ρij's are equal) is given by:

\rho_{ij} = -\frac{1}{M-1}   (5)

Here is the reason, then, that as M grows large, the simplex signals approach the orthogonal ones in detection effectiveness: as M increases, ρij approaches 0.

Concerning the comparison between simplex and orthogonal signals, another point should be made. From an engineering viewpoint a very effective set of signals consists of using orthogonal signals and their negatives, called "bi-orthogonal" signals. For an M-signal alphabet one needs only M/2 correlators or matched filters. Also, the bandwidth is reduced, since now one has only M/2 "different" signal constructions.

The above signals have been based on the idea of being quite distinct, or far apart, for purposes of distinguishing them in noise. Such signals are necessary when the S/N ratio is such that a significant probability of error (Pe) occurs. If the S/N ratio is high enough so that Pe requirements are easily met, then signals other than far-apart signals may be desirable. One common signal which is partially correlated (for M > 4) is the coherent M-phase signal. Because of the increasing correlation between signals as M increases, M = 8 is probably the maximum which one would consider (Ref. 19) for intermediate S/N. In addition to coherent M-phase signals, one can operate "differentially coherent," where one requires coherence only over a symbol length. These signals will be discussed further in Section 3.2.2.

In conclusion, for coherent operation where Pe is significant, simplex signals are best, bi-orthogonal next, and finally orthogonal signals. When Pe is within requirements, signals other than the above far-apart signals become valuable; i.e., M-phase signals and others.

2.2.2 Incoherent Operation. For incoherent operation there is no possibility of using phase information (by definition). Hence neither simplex nor bi-orthogonal (nor M-phase) signals are possible. This leaves orthogonal signals as the optimum signals for incoherent operation.
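Before turning to the binary example, Eq. 5 can be checked numerically. The sketch below uses a standard construction, offered only as an illustration: subtracting the average waveform from each of M orthonormal waveforms yields a simplex set, and the resulting pairwise correlations equal -1/(M-1). The dimensionality and random seed are arbitrary choices.

```python
import numpy as np

def simplex_from_orthonormal(M, dim=64, seed=1):
    """Build M simplex signals by removing the mean from M orthonormal signals."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, M)))   # random orthonormal columns
    phi = q.T                                        # rows are orthonormal "waveforms"
    return phi - phi.mean(axis=0)                    # subtract the average signal from each

def pairwise_correlation(signals):
    e = np.sum(signals[0] ** 2)                      # all members have equal energy
    return np.array([[np.dot(si, sj) / e for sj in signals] for si in signals])

for M in (2, 3, 8):
    rho = pairwise_correlation(simplex_from_orthonormal(M))
    off_diag = rho[~np.eye(M, dtype=bool)]
    print(M, round(float(off_diag.mean()), 4), -1 / (M - 1))   # agrees with Eq. 5
```

As M grows the printed value approaches 0, which is the numerical counterpart of the statement that simplex and orthogonal signals become nearly equivalent for large alphabets.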

2.3 Example of Signal Selection in Binary Case

Since the binary case is the most frequently used case, and its Pe curves are well known, we note the results here to serve as an example of the above discussion. For the coherent binary case, it is well known that the probability of error (in white Gaussian noise with optimum receivers) is given (Ref. 17) by:

P_e = P_B = \Phi\!\left(\sqrt{(1 - \rho_{12})\, E / N_0}\right)   (6)

where:

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt

E = signal energy
N0 = single-sided white noise power per unit bandwidth
ρ12 = correlation coefficient between s1(t) and s2(t)
PB = bit error rate

Note that for binary signals the simplex signal [s1(t) = -s2(t)] is identical to the bi-orthogonal signal. Curve (a) of Fig. 4 shows PB versus Eb/N0 for the simplex signal s1(t) = -s2(t), where ρ12 = -1. If one were to use orthogonal signals in this binary coherent case (ρ12 = 0), one would obtain curve (b) of Fig. 4. Finally, in the incoherent case, where one should use orthogonal signals, the PB curve is shown in curve (c). The basis for these curves will be exhibited in Section 3.1.

In conclusion, we have noted that, for optimum detection situations, the Pe depends only on the energy (E) and the noise power density (N0). It does not depend on other characteristics of the signal, such as bandwidth. For coherent signals it was noted that (for a given M) simplex signals have the best Pe behavior, bi-orthogonal next, and then orthogonal. For incoherent operation orthogonal signals are best. Note that, for the coherent case, orthogonal signals become nearly as effective as simplex ones as M increases.
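The three curves of Fig. 4 can be regenerated directly from Eq. 6, together with the incoherent result quoted later as Eq. 25. The short script below is a convenience sketch, not part of the original report; the sample Eb/N0 values are arbitrary.

```python
from math import erfc, exp, sqrt

def phi_tail(x):
    """Phi(x) of Eq. 6: area under the unit Gaussian from x to infinity."""
    return 0.5 * erfc(x / sqrt(2.0))

def binary_pe(eb_no, rho12):
    """Eq. 6 for coherent binary signals with pairwise correlation rho12."""
    return phi_tail(sqrt((1.0 - rho12) * eb_no))

for eb_no_db in (0, 4, 8, 12):
    eb_no = 10 ** (eb_no_db / 10)
    pe_simplex = binary_pe(eb_no, -1.0)        # curve (a): PSK, rho12 = -1
    pe_coh_orth = binary_pe(eb_no, 0.0)        # curve (b): coherent orthogonal, rho12 = 0
    pe_incoh = 0.5 * exp(-eb_no / 2.0)         # curve (c): incoherent orthogonal, Eq. 25
    print(f"{eb_no_db:2d} dB   {pe_simplex:.3e}   {pe_coh_orth:.3e}   {pe_incoh:.3e}")
```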

Fig. 4. Probability of error vs. Eb/N0 for binary signals: (a) binary simplex (coherent PSK); (b) coherent orthogonal; (c) incoherent orthogonal, PB = 0.5 e^{-Eb/2N0}.

3. PROBABILITY OF ERROR CALCULATIONS AND RESULTS FOR DIGITAL SYSTEMS

For digital systems, the probability of error is the measure of effectiveness with respect to channel noise. Of course there may be other sources of noise, such as quantization noise, timing errors, etc. However, to begin any realistic comparison, the channel-noise probability of error is most basic. In this section we will exhibit briefly the mathematical calculation of the probability of error and compare the results for various conditions of operation. The first section will be devoted to binary systems and the second section to nonbinary systems.

There are three types of error probabilities in digital systems, depending upon the original data and the size of the alphabet: bit error probabilities, word error probabilities, and character error probabilities. If the original data are bits (such as a series of binary numbers) then the bit error rate (PB) is the pertinent criterion. If the information is sent by means of characters (M-ary basic symbol or block binary), the character error probability (Pch) should be converted back to equivalent bit error rate (PB). On the other hand, if the original data are words (such as teletype or sampled analog signals) then the word error probability (Pw) is the pertinent criterion. If the information is sent by bits, the equivalent Pw should be calculated. If M-ary characters are used to transmit the words, the Pch may or may not be equal to Pw, depending on the value of M. In this report the symbol "Pe" will be a general term (which includes all the others) to denote "probability of error." Depending upon the case, it will be either a bit, a word, or a character error rate. It will always be marked accordingly.

A system will be called "binary" or "nonbinary" according to the "order of the decision process." If a decision is made at the end of every bit, and is not later revised (based on an error-correcting code, for example), the system is considered "binary." With this definition, the nonbinary systems can be of two types: (1) error-(correcting-)coded binary signals, where intersymbol constraints are evaluated after the first binary decisions; and (2) M-ary symbol alphabet, where basic symbols (characters) other than binary (ternary, quaternary, etc.) are used, and a decision is made on each of the M-ary characters. In turn there are two ways to realize an M-ary symbol alphabet: basic M-ary symbols and block-coded binary. We feel this categorization is sensible, and has the most consensus in this field. (See Table I.)

The above items can be regarded as alternatives available to a system designer. Of course, within these alternatives there are still a number of further choices. For example, if binary, one can operate either coherently or noncoherently. If coherent, one can use PSK or coherent FSK; if noncoherent, one can use noncoherent FSK or AM.¹ Because of all these permutations it is difficult to set up a consistent and sensible categorization. However, the chart in Table I attempts to do this, along with providing information to be discussed later.

In general the "practical" signals which are optimum in some sense are (see Section 2.2): (1) simplex, (2) bi-orthogonal, (3) orthogonal, and (4) M-phase. We will consider the Pe relations for digital systems in terms of these categories. One can reach a general Pe equation for both the bi-orthogonal and the orthogonal sets of signals. However, for the simplex set, no such equation has been found so far, but the Pe can be related to the coherent orthogonal case. For the binary signal, however, the PSK signal is the simplex signal, and this Pe can of course be calculated. For this reason, we will first depict the Pe calculations for binary, treating it as a case by itself. Remember that, for the bi-orthogonal and orthogonal cases, the binary Pe could be obtained as a special case of the M-symbol general relations of Section 3.2.2. Another justification for proceeding in this way is that the binary case is the most common in practice, and therefore deserves special attention.

3.1 Binary Systems

As mentioned, binary systems are taken to include all those systems in which a binary decision is made at the end of each received bit, and is not changed thereafter (i.e., by a correcting code). In the following, the symbols 1 and 0 will represent the two binary states (a +1 and -1 would be better for coherent situations, but this is clumsy to write). For binary cases there are three different signals: (1) simplex (PSK, where s1(t) = -s2(t)); (2) coherent-orthogonal (coherent FSK); and (3) incoherent-orthogonal (incoherent FSK and AM). To depict the Pe calculation, we will consider each of these three cases.

¹Incoherent AM will mean carrier "off-on" operation in this report.

Table I. Depiction of various digital signals and their parameters. Although the items appearing in this table will be dealt with in the following material, it is profitable to explain the column terms here. For each signalling method the table gives the signal type, the equation number for its probability of error, the theoretical minimum bandwidths (controlled and uncontrolled environment), the constant-digit-rate bandwidth (in terms of Tb), the signal dimensions D, and the maximum C in terms of W.

Binary decision level:
  CPSK* -- simplex -- Eqs. (6) and (13)
  DCPSK** -- simplex
  Coherent FSK -- coherent-orthogonal -- Eq. (16)
  Incoherent FSK -- incoherent-orthogonal -- Eq. (25)
  Incoherent AM -- incoherent-orthogonal -- Eq. (25)

Nonbinary decision level:
  Error-coded binary:
    Incoherent AM -- incoherent-orthogonal
  M-symbol alphabet, basic M-ary symbols:
    CPSK -- M-phase -- Eq. (52)
    DCPSK -- M-phase
    Kinoplex -- bi-orthogonal -- Eq. (45)
    Coherent FSK -- coherent-orthogonal -- Eq. (35)
    Incoherent FSK -- incoherent-orthogonal -- Eq. (40)
    Shift-register encoded -- simplex
  M-symbol alphabet, block-coded binary:
    Reed-Muller -- bi-orthogonal -- Eq. (45)
    Block-coded -- coherent-orthogonal -- Eq. (35)
    Block-coded -- incoherent-orthogonal -- Eq. (40)

*CPSK: coherent phase-shift keying.  **DCPSK: differentially coherent phase-shift keying.

Controlled Environment Bandwidth -- The bandwidths here refer to the bandwidth occupancy of a signal if the adjacent communicators use the same signal and if their frequencies are controlled (and integrally related) with respect to the given signal. (Note that phases need not be integral.) The bandwidths here state how close signals can be placed and not (theoretically) interfere with each other, under the stated conditions. The ideal signals take more bandwidth, but nevertheless they do not interfere.

Uncontrolled Environment Bandwidth -- The bandwidths here assume that the spectrum of the ideal signal will be cut off at some point. With a "square-wave" signal pulse, the numbers assume that only the main lobe of the (sin x)/x spectrum is passed. Such spectrum limiting will of course alter the square-wave signal pulse. This is called "uncontrolled environment" because now there need be no relation between communicators' signals. Considering the spectrum of M-ary coherent cases, the bandwidth here includes the main lobe of the outermost (in frequency) signals. For coherent FSK there are M lobes; for block-coded binary there is only one major lobe.

Constant Digit Rate Bandwidth -- This bandwidth is simply the controlled-environment W written in terms of Tb. For constant rate, Tb is constant. Hence, these bandwidths are convenient for constant-rate comparisons.

Signal Dimensions (D) -- D specifies the minimum dimensions (in a signal space representation) per symbol time for a given signalling method. Section 4.1.2 relates this minimum dimension to a minimum TW product (per symbol time) for a given signalling method. This in turn is used to find a capacity associated with each signalling method.

Maximum C -- Since the C for a given signalling method varies with Eb/N0, the maximum C here is the highest (or asymptotic) value of C. For these values the Pe's of the symbols approach zero.

For binary signalling the following equation for the probability of error can always be written:

P_e = P_B = P(1|0)\, P(0) + P(0|1)\, P(1)   (7)

where:
PB = probability of a bit error
P(1|0) = probability of specifying a 1 when a 0 is present.

If one assumes that the a priori probability of a 1 equals the probability of a 0, then P(0) = P(1) = 1/2. Furthermore it will be assumed that the penalty for either wrong decision is equal, so that P(1|0) = P(0|1). Then the probability of error is given by

P_e = \tfrac{1}{2}\left[P(1|0) + P(0|1)\right], \qquad P(1|0) = P(0|1) \quad \text{for the symmetric binary situation}   (8)

The major issue then, for all such symmetric binary systems, is to evaluate either P(1|0) or P(0|1). In general, this can be done by considering the a posteriori probability distributions. Consider that one is using two correlators or matched filters, as shown in Fig. 2. Then let:

p_0(y_1 - y_0) = probability density of the output (y_1 - y_0) if symbol 0 is sent
p_1(y_1 - y_0) = probability density of the output (y_1 - y_0) if symbol 1 is sent   (9)

These a posteriori distributions, for a symmetric case, are sketched in Fig. 6. For all symmetric cases (and if the two distributions are of the same shape), the cut-point "x" of Fig. 6 will occur where the two curves are equal. The cut-point is the threshold above which one always decides 1, and below which one decides 0. Thus the "area of confusion," given by Eq. 8, is shown by the two marked areas of Fig. 6. One can evaluate either of these areas in the symmetric case. From these curves it can be seen that the PB of Eq. 8 can be evaluated by [let y = (y_1 - y_0)]:

P_B = P(1|0) = P(0|1) = \int_{-\infty}^{x} p_1(y)\, dy = \int_{x}^{\infty} p_0(y)\, dy   (10)

Fig. 6. General depiction of probability of error calculation for binary, symmetric decisions in terms of the probability densities p0(y1 - y0) and p1(y1 - y0), with cut-point x.

In general, for binary situations in white Gaussian noise, the Pe calculation involves the procedure just described. For each special situation the cut-point may vary, and the distributions may vary. With this general procedure we now show the relations for simplex, coherent orthogonal, and incoherent orthogonal.

3.1.1 Binary Simplex. In binary simplex, s1(t) = -s2(t). Two basic receivers for such signals are shown in Fig. 2. In addition, for coherent binary cases, one can reduce the number of correlators (or matched filters) by a factor of two. One does this by correlating (or match-filtering) to the "difference of signals." Figure 7 shows the correlator version: (a) is a repeat of Fig. 2(a), and (b) shows multiplication by s1(t) - s2(t). Note that the decision device in Fig. 7(a) is "choose greatest," while in Fig. 7(b) it is "check polarity." Although correlators are shown here, matched-filter versions may also be used.

For purposes of calculating Pe we will use the one-correlator receiver of Fig. 7(b), although one could use either. The objective is to obtain p1(y) and p2(y) (see Fig. 6). First note that if s2(t) is sent:

y(T) = \int_0^T \left[s_2(t) + n(t)\right]\left[s_2(t) - s_1(t)\right] dt = \int_0^T s_2^2(t)\, dt - \int_0^T s_1(t)\, s_2(t)\, dt + \int_0^T n(t)\left[s_2(t) - s_1(t)\right] dt

Since, in binary simplex, s1(t) = -s2(t),

y(T) = 2E + \int_0^T n(t)\left[s_2(t) - s_1(t)\right] dt   (11)

Fig. 7. Two basic receivers for binary coherent operation: (a) two correlators followed by a "choose greatest" decision; (b) a single correlator against the difference signal followed by a "check polarity" decision.

Looking at the "statistical" effect of y(T), it is seen that the first term contributes a mean (2E) but no variance (since it is a constant). The second term will have a variance and a zero mean [assuming n(t) has zero mean]. If n(t) is Gaussian, the second integral will be Gaussian, since the integral of a Gaussian process is Gaussian (Ref. 10). By a variety of means, one can show that the σ² of this second integral is 2EN0 (Ref. 20). If s1(t) were sent, the mean would be -2E with the same variance (2EN0).

Before depicting these curves, it is convenient to normalize, for purposes of comparison and ease of writing equations. Consider that the mean is normalized to ±Eb. Then the variance will be EbN0/2. Since the cut-point of this binary symmetric situation will be at zero, the normalized distribution diagram will appear as in Fig. 8 (where y* is the normalized value of y2 - y1).

Fig. 8. Distribution diagram for the simplex binary symmetric situation: densities p1(y*) and p2(y*) centered at ±Eb with variance σ² = EbN0/2, plotted against y* (the normalized y2 - y1).

The equation for either of the curves will be:

p(y^*) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(y^* \pm E_b)^2}{2\sigma^2}\right], \qquad \sigma^2 = \frac{E_b N_0}{2}   (12)

Then, using Eq. 10 (with the cut-point at zero):

P_B = \int_{-\infty}^{0} p_1(y^*)\, dy^* = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{2E_b/N_0}}^{\infty} e^{-t^2/2}\, dt   (13)

This is the equation that was plotted in curve (a) of Fig. 4 (Section 2.3).
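Equation 13 is easy to spot-check by direct simulation of the Fig. 7(b) receiver. The sketch below (block size, waveform shape, and random seed are arbitrary illustrative choices) transmits antipodal waveforms in white Gaussian noise, correlates, checks polarity, and compares the empirical bit error rate with Eq. 13.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(2)

def simulate_simplex(eb_no, n_bits=200_000, samples_per_bit=20):
    """Monte Carlo of the Fig. 7(b) receiver: correlate with the difference
    signal and check the polarity of the output (antipodal signalling)."""
    s = np.ones(samples_per_bit) / sqrt(samples_per_bit)   # unit-energy bit waveform (Eb = 1)
    n0 = 1.0 / eb_no                                       # noise density so that Eb/N0 = eb_no
    bits = rng.choice([-1.0, 1.0], size=n_bits)
    noise = rng.normal(0.0, sqrt(n0 / 2.0), size=(n_bits, samples_per_bit))
    y = (bits[:, None] * s + noise) @ s                    # integrate-and-dump output
    return float(np.mean(np.sign(y) != bits))              # empirical bit error rate

for eb_no_db in (2, 4, 6):
    eb_no = 10 ** (eb_no_db / 10)
    theory = 0.5 * erfc(sqrt(eb_no))                       # Eq. 13: Gaussian tail beyond sqrt(2Eb/N0)
    print(eb_no_db, "dB   simulated:", simulate_simplex(eb_no), "  Eq. 13:", round(theory, 5))
```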

PSK. A direct way to obtain s1(t) = -s2(t) is to use phase-shift keying (PSK). For this case, the received signals can be written:

s_1(t) = +\sqrt{2}\, A \sin \omega t, \qquad s_2(t) = -\sqrt{2}\, A \sin \omega t   (14)

Then the following are true:

E_b = A^2 T_b, \qquad \frac{E_b}{N_0} = \frac{A^2 T_b}{N_0}   (15)

This elementary calculation for energy per symbol (and the ones to follow) must be regarded with caution. Such square-wave calculations would be precise only if an infinite bandwidth were available (although signals may be relatively closely spaced in frequency; see Section 3.3 for a discussion of bandwidth). The amplitude of the waveforms over a symbol will usually be an A(t) rather than a constant A. The energy calculation would then be based on this A(t). We will plot most curves of this report in terms of Eb/N0, where Eb/N0 is the signal energy per bit divided by the noise power per cycle.

3.1.2 Coherent Orthogonal. For the orthogonal case the relations of either Eq. 1 or Eq. 3 must be fulfilled; i.e., the correlation between signals is zero. For this case, one can again use the two basic receivers shown in Fig. 7, or their matched-filter versions. Because the signals are now orthogonal (instead of having a negative correlation, as in the above simplex case), the p(y)'s have the same shape but are now "closer together." Using the receiver of Fig. 7(b), the mean of the output distributions p(y) will be ±Eb and the variance EbN0. The p(y) diagrams for this case are shown in Fig. 9. Comparing with Fig. 8, it is seen that here the variance is twice that of the simplex case. Based on this p(y), and using Eq. 10, the Pe is given by:

P_e = P_B = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{E_b/N_0}}^{\infty} e^{-y^2/2}\, dy   (16)

This equation was plotted in curve (b) of Fig. 4 (Section 2.3).

Fig. 9. Distribution diagram for the coherent-orthogonal, binary symmetric situation: densities centered at -Eb and +Eb, plotted against y2 - y1.

Coherent FSK. A way to obtain coherent-orthogonal signals is to use coherent FSK. The signals for this case can be written:

s_1(t) = \sqrt{2}\, A \sin \omega_1 t, \qquad s_2(t) = \sqrt{2}\, A \sin(\omega_2 t + \theta) = \sqrt{2}\, A \sin\!\left[(\omega_1 + \Delta\omega) t + \theta\right]   (17)

where θ is kept constant. In order for these waveforms to be orthogonal it is necessary that the following equation be fulfilled:

\int_0^T s_1(t)\, s_2(t)\, dt = 2A^2 \int_0^T \sin \omega_1 t \, \sin\!\left[(\omega_1 + \Delta\omega) t + \theta\right] dt = 0   (18)

This equation is zero if:

\Delta\omega\, T = 2\pi M \quad \text{and} \quad (2\omega_1 + \Delta\omega)\, T = 2\pi N, \qquad M, N \text{ integers}   (19)

These conditions are independent of θ. If 2ω1 >> Δω and 2ω1T = 2πK, only the first condition of Eq. 19 need be considered. For such usual cases, then:

\omega_2 = \omega_1 + \frac{2\pi M}{T}, \qquad M = 1, 2, 3, \ldots   (20)

Thus the coherent frequencies can be spaced 1/T apart, under the usual conditions, for orthogonality. The E for this case would be the same as that of Eq. 15.
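A quick numerical check of Eqs. 18-20 is given below; the tone frequency, symbol length, and phase offset are arbitrary illustrative values. With f1 a multiple of 1/(2T), tones spaced by integer multiples of 1/T integrate to essentially zero against each other for an arbitrary fixed phase θ, while a spacing of 0.5/T does not.

```python
import numpy as np

def fsk_correlation(f1, delta_f, T=1.0e-3, theta=0.7, n=200_000):
    """Numerically evaluate (2/T) * integral_0^T sin(2 pi f1 t) sin(2 pi (f1 + delta_f) t + theta) dt."""
    t = np.arange(n) * (T / n)
    product = np.sin(2 * np.pi * f1 * t) * np.sin(2 * np.pi * (f1 + delta_f) * t + theta)
    return (2.0 / T) * np.sum(product) * (T / n)

T = 1.0e-3
f1 = 10.0 / (2.0 * T)                     # f1 a multiple of 1/(2T), as assumed below Eq. 19
for k in (0.5, 1.0, 2.0):                 # tone spacing in units of 1/T
    print(k, round(fsk_correlation(f1, k / T, T), 6))
# spacings of 1/T and 2/T give essentially zero (orthogonal); the 0.5/T spacing does not
```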

3.1.3 Incoherent Orthogonal. For the incoherent-orthogonal case the signals are orthogonal, but incoherent operation of the receiver is used. The two possible optimum receivers are shown in Fig. 3.

Above, the Pe was investigated by finding the p(y)'s, and (using symmetry) one needed only to evaluate one "tail" of a Gaussian distribution, since there was symmetry about the cut-point. For the incoherent case the distributions are not symmetrical about the cut-point, as will be seen.

For incoherent orthogonal signals the two distributions of interest concern the envelopes. The distribution of the output of the filter which is not matched to the signal is given by the distribution of the envelope of narrowband white Gaussian noise (i.e., the Rayleigh distribution). Thus (Ref. 5):

p_0(y_1) = y_1\, e^{-y_1^2/2}   (21)

where p0(y1) = probability density of output y1 if symbol 0 is sent. The distribution of the output whose filter is matched to the signal is given by the distribution for the envelope of a sine wave plus noise, sometimes referred to as the Rice distribution. Again using Ref. 5, this is:

p_1(y_1) = y_1\, e^{-(y_1^2 + A^2)/2}\, I_0(y_1 A)   (22)

where:
p1(y1) = probability density of output y1 if symbol 1 is sent
A = peak-signal-to-rms-noise voltage ratio
I_0(z) = Bessel function with imaginary argument = \sum_{n=0}^{\infty} \frac{z^{2n}}{2^{2n}\, n!\, n!}

Equations 21 and 22 are the densities of interest for both incoherent AM and FSK. However, the two cases are different because of cut-point considerations.

Incoherent AM. For incoherent AM (or "off-on" operation) there is only one matched filter [a single one of Fig. 3(a), or a single pair of Fig. 3(b)], and the distributions appear as in Fig. 10. Note that the abscissa y1 in this diagram is the output of the single matched filter, and not the difference (y2 - y1) as in Figs. 8 and 9. If symmetric errors are desired, one sets the point x so that the two areas shown are equal. If one chooses symmetric operation, the proper x can be found by the use of tables and the following procedure. Find x such that:

1 - Q(0, x) = 1 - e^{-x^2/2} = Q(A, x)   (23)

where:

Q(A, x) = \int_x^{\infty} y\, e^{-(y^2 + A^2)/2}\, I_0(Ay)\, dy

Then:

P_e = P_B = e^{-x^2/2}   (24)

Fig. 10. Sketch of the probability distributions for the incoherent, orthogonal AM case: the noise-alone density p0(y1) and the signal-plus-noise density p1(y1), with cut-point x.

Incoherent FSK. For incoherent FSK the distributions of Eqs. 21 and 22 hold, but it is difficult to sketch the distributions in the fashion of Figs. 8 and 9. The Pe calculation will be shown in Section 3.2.2.1; we simply note the result here. For symmetric errors, incoherent (orthogonal) FSK results in:

P_e = P_B = 0.5\, e^{-E_b/2N_0}   (25)

This was the graph plotted in curve (c) of Fig. 4 (Section 2.3). One will obtain this Pe if one uses incoherent FSK, which is an incoherent-orthogonal case.

3.2 Nonbinary Systems

By the earlier definition a system is nonbinary if the decision process involves more than one "bit." Thus, there are two nonbinary situations: (1) intersymbol constraints are imposed on the original binary decisions, called "error-coded binary signals," and (2) M-ary signals are used (see Table I). Suppose that, with a given S/N ratio, the Pe is unacceptable. One can then consider remaining binary and increasing the symbol time (which decreases the data rate), or going to a nonbinary situation. Suppose it is necessary to keep the information rate constant; then it is necessary to go to the nonbinary systems. Of the two possibilities, error-coded binary and M-symbol alphabet, we consider first the error-coded binary.

3.2.1 Error-Coded Binary Signals. Since error-coded binary signals involve groups of binary digits, the first probability calculation that can be made will be a word probability (Pw), that is, the probability that an error-coded word is incorrect. This result is immediately applicable if the original data are words of the same length as the "number of information digits" in the error-coded signal. If the original data are independent bits, one would like to find the equivalent information-bit error rate from the Pw. Unfortunately this is exceedingly difficult, and we are not aware of any such calculations. Consequently we will be restricted here to a calculation of the Pw in an error-coded situation.

In error-coded binary, a message word of m information bits is encoded into an n-bit code word containing k = n-m redundant digits. Clearly, with this increase in the number of transmitted bits, the net information rate must decrease; or, conversely, the bit transmission rate must be increased, with a consequent increase in the bandwidth needed, if the information rate must be held constant. After the coding, the binary code words are then transmitted by one of the schemes already discussed in Section 3.1. Thus, the bit-error probabilities PB are determined by the modulation-detection method, while the word-error probabilities Pw are determined by the error-encoding and decoding scheme used.

Therefore, the relations below are given in terms of the bit-error probability, which has already been described.

As a comparison, the word-error probability for an n-bit uncoded word is simply:

P_w(n) = 1 - (1 - P_B)^n   (26)

where PB is the bit-error probability.

The word-error probability (Pw) for a general error-coded word (where the errors are assumed independent) has been worked out (Ref. 11). The assumption is made that the bit-error probability PB is known and is independent for each digit. This is the case for a channel perturbed by white Gaussian noise, but does not necessarily hold for cases when the channel is disturbed by impulse noise. The word-error probability is given for a code word of n bits, containing m information bits and k check digits, denoted as (n, m). Thus, there is a total of 2^m information words and 2^n message words, leaving r redundant words, where

2^n - 2^m = r \qquad \text{or} \qquad 2^{n-m} - 1 = \frac{r}{2^m} \;\text{redundant words per information word}   (27)

Therefore, the code should be able to correct at most 2^{n-m} - 1 error patterns, since each information word can have a total of 2^{n-m} message words associated with it. The best code will correct all combinations of errors up to s-tuple, plus some j_{s+1} of the (s+1)-tuple errors, where

\sum_{i=0}^{s} \binom{n}{i} + j_{s+1} = 2^{n-m}   (28)

The total probability of correctly receiving the word, or receiving a word with correctable errors, is:

P_c = \sum_{i=0}^{s} \binom{n}{i} P_B^{\,i}\, Q_B^{\,n-i} + j_{s+1}\, P_B^{\,s+1}\, Q_B^{\,n-s-1}   (29)

where QB = 1 - PB.

So the probability of a word received in error is:

P_w(n, m) = 1 - \left[\sum_{i=0}^{s} \binom{n}{i} P_B^{\,i}\, Q_B^{\,n-i} + j_{s+1}\, P_B^{\,s+1}\, Q_B^{\,n-s-1}\right]   (30)

This expression can be evaluated for a particular code and modulation scheme. Figure 11 shows the word probability of error (Pw) for a (16, 5) and a (32, 5) code. Both incoherent (orthogonal) and coherent (simplex) binary detection are shown. These curves were calculated using Eq. 30 with the PB's given in Fig. 4. Also, the information data rate and the average power of the transmitter are kept the same as for the uncoded transmission. This is necessary to make a sensible constant-data-rate comparison. It means that, as the number of transmitted bits n is increased, the energy Eb per bit decreases. As seen in Fig. 11, there is relatively little advantage in this type of error code (for constant data rate and average power) for m = 5.

It must be emphasized that Eq. 30 is for codes where the errors are assumed independent. For more modern "burst correcting codes," we know of no equivalent relation, and no basic work was done on this in this contract. The reason for this, as mentioned in Section 1.3, is that the real power of error codes lies where feedback may be utilized. Since speech channels do not usually adapt themselves to such cases, error-coded signals are not of greatest importance to us.

3.2.2 M-Symbol Alphabet. The other general way to obtain better Pe's, in contrast to the error-coding schemes described above, is to use communication signals which are M-ary rather than binary. Again referring to Table I, there are two cases: (1) M-ary basic symbols and (2) block-coded binary symbols. Although from the theoretical error point of view these two cases are the same, there is good reason, from the equipment point of view, to separate them. The essential profit of going to M-ary symbols lies in achieving the "integration of more energy per decision" (see Section 1.4). In this section we will consider the probability-of-error calculations for M-ary signals in general; then the different ways to implement the M-ary basic symbol and the block-coded binary signal will be noted.

3.2.2.1 General Pch Calculations. If one calculates the error probability for an M-ary signal, the result is a character error rate rather than a bit error rate (PB); hence the use of the symbol Pch.

Fig. 11. P_w versus E_b/N_0 for coherent simplex and incoherent orthogonal error-coded binary signals, for (16, 5) and (32, 5) codes and the uncoded cases.

This is reasonable since an M-ary symbol can be viewed as equivalent to a "group of bits." If the original data are words, one should convert the P_ch to a P_w, if they are not already equal. If the data are bits, one should convert the P_ch to an equivalent bit-error rate (P_B). In this section, then, we will be dealing with the P_ch. For the comparisons of Section 3. 3. 1, the P_ch's will be converted to equivalent P_B and P_w.

For M-ary symbols, the 4 types of signals to be studied are:
1. Orthogonal --- coherent and incoherent.
2. Bi-orthogonal signals (require coherence).
3. M-phases --- coherent or differentially coherent.
4. Simplex signals (see Section 2. 2).
We now discuss P_ch calculations for these four cases.

Orthogonal Signals. The various types of orthogonal signals were described in Sections 3. 1. 2 and 3. 1. 3. We will first note the P_ch calculation for the coherent orthogonal case. The receiver used for all of the signals below will be the "optimum" receiver, as discussed in Section 2. 1. For M-ary signals, optimum receivers use the "maximum likelihood detector;" that is, one chooses the greatest correlator output out of the M possible outputs.

The P_ch calculation then proceeds as follows. Consider that signal M is sent. Then a certain voltage y_M (determined partly by noise) occurs at the output of the Mth filter. The probability of a correct decision P(c) for that y_M is then determined by the probability that all other y_j's are less than y_M. Then, if one integrates over all y_M, the result is P(c). Let:

P_M(y_M) = the probability density of the output of the correct Mth filter when signal M is sent
P(y_1, y_2, ... , y_{M-1} < y_M) = the probability that the outputs of all the other filters are less than y_M.    (31)

Then P(c) can be written:

P(c) = \int_{-\infty}^{\infty} P_M(y_M) \, P(y_1, y_2, ... , y_{M-1} < y_M) \, dy_M    (32)

where: P(c) = probability of correct decision = 1 - P_ch

Now, since the signals s_j are orthogonal, the corresponding noise outputs y_j are independent, and hence:

P(y_1, y_2, ... , y_{M-1} < y_M) = \prod_{j=1}^{M-1} P(y_j < y_M) = \left[ P(y_j < y_M) \right]^{M-1}    (33)

If white Gaussian noise is assumed,

P(y_j < y_M) = \int_{-\infty}^{y_M} P_M(y_j) \, dy_j = \int_{-\infty}^{y_M} \frac{1}{\sqrt{\pi N_0}} e^{-y_j^2/N_0} \, dy_j    (34)

where: P_M(y_j) is the probability distribution of the output of filter j, which is not matched to the incoming signal, when signal M is sent.

Now substituting the Gaussian expression for P_M(y_M) and P(y_j < y_M), and writing P_ch = 1 - P(c), the P_ch for orthogonal-coherent cases is given by the following (for simplification let y = y_M = output of the desired matched filter):

P_ch = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} e^{-(y - \sqrt{E})^2/N_0} \left[ \int_{-\infty}^{y} \frac{1}{\sqrt{\pi N_0}} e^{-y_j^2/N_0} \, dy_j \right]^{M-1} dy    (35)

where: E = energy per character or symbol
       N_0 = single-sided noise power density

In general this integral cannot be evaluated in closed form. However, computer calculations have been published (Refs. 36, 37 and 12). It should also be noted, for M=2, that this equation provides the same result as the binary coherent, orthogonal case treated in Sections 2. 3 and 3. 1. 2 (for M=2, P_ch = P_B). In Section 3. 1. 2 a slightly different P_e derivation is referred to. To find the appropriate E for the sine wave signal, Eq. 15 of Section 3. 1. 2 may be used. Note that this E is now energy per character (as opposed to energy per bit E_b).
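Equation 35 is readily evaluated by numerical integration. The following sketch uses the equivalent normalized form P_ch = 1 - \int \phi(u) [\Phi(u + \sqrt{2E/N_0})]^{M-1} du, where \phi and \Phi are the standard Gaussian density and distribution; the use of scipy here is an implementation choice, not something specified in this report.

import numpy as np
from scipy import integrate
from scipy.stats import norm

def pch_coherent_orthogonal(e_over_n0, M):
    """Character error probability for coherent M-ary orthogonal signals
    (Eq. 35, written in normalized variables)."""
    a = np.sqrt(2.0 * e_over_n0)
    integrand = lambda u: norm.pdf(u) * norm.cdf(u + a) ** (M - 1)
    p_correct, _ = integrate.quad(integrand, -10.0, 10.0 + a)
    return 1.0 - p_correct

# For M = 2 this reduces to the binary coherent orthogonal result Q(sqrt(E/N0)).
for M in (2, 4, 16):
    print(M, pch_coherent_orthogonal(e_over_n0=4.0, M=M))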

Incoherent Orthogonal. For the M-signal incoherent system (for example, a multiple-FSK system using M incoherent frequencies), filters matched to the carrier pulse shape may be used (see Fig. 3), followed by envelope detectors and a decision device which chooses the filter having the greatest output at the end of the transmission interval. Because of the envelope detectors, the probability distributions at the outputs will be those of either a noise envelope, or a signal-and-noise envelope.

Assuming maximum likelihood detection, the general equation (32) still holds. But now, using the distribution for the envelope of "noise alone" (Ref. 5), the probability of y_M exceeding any y_j is given by:

P(y_j < y_M) = \int_0^{y_M} y_j e^{-y_j^2/2} \, dy_j = 1 - e^{-y_M^2/2}    (36)

Also, the distribution P_M(y_M) for "envelope of signal plus noise" is given by (Ref. 5):

p(y) = P_M(y) = y \, e^{-(y^2 + 2E/N_0)/2} \, I_0\!\left( y \sqrt{2E/N_0} \right)    (37)

where: E = energy per character or symbol
       N_0 = single-sided noise power density

Using these relations in Eq. 32, the P_ch is given by (Ref. 11):

P_ch = 1 - \int_0^{\infty} y \, e^{-(y^2 + 2E/N_0)/2} \, I_0\!\left( y \sqrt{2E/N_0} \right) \left[ 1 - e^{-y^2/2} \right]^{M-1} dy    (38)

The term inside the bracket may be expanded into the binomial series:

\left[ 1 - e^{-y^2/2} \right]^{M-1} = \sum_{r=0}^{M-1} (-1)^r \binom{M-1}{r} e^{-r y^2/2}    (39)

Then one can complete the second integration, and the result is:

Incoh-Orthog:  P_ch = \frac{1}{M} e^{-E/N_0} \sum_{r=2}^{M} (-1)^r \binom{M}{r} e^{E/(r N_0)}    (40)

where: E = energy per character or symbol
       N_0 = single-sided noise power density

For M=2, ordinary FSK, this reduces to P_B = (1/2) e^{-E_b/2N_0}, the expression previously derived for uncoded binary signalling. The above expression (40) holds for any orthogonal system employing envelope detectors, since the only assumptions needed are that the M signals are orthogonal, envelope detected, and perturbed by white Gaussian noise. Again, the proper expression for the E of the sine wave signal is as given in Eq. 15.

Bi-Orthogonal Signals. A set of signals which is better than coherent orthogonal signals (for P_e purposes) is a set of M-ary orthogonal signals and their negatives, called "bi-orthogonal signals." For example, Kineplex can be considered as using such a set, along with differentially-coherent detection. Such signals require coherence since one must be able to distinguish between a signal and its negative. Two "natural" examples of bi-orthogonal signals are:
1. The Kineplex signal, where the cosine and sine of a single frequency are orthogonal, and two phases of each wave are available (a signal and its negative) --- natural in frequency.
2. A block-coded binary signal (see below, and Ref. 12), where coded blocks of binary digits are made orthogonal to each other (natural in time), and the two phases are available so that a negative for each signal is available.

We now show the calculation of P_ch for this case. The basic idea is similar to that of the orthogonal case (Eq. 32), except that now one must specify that y_{M/2} have the proper sign (since there will be a signal which is the negative of the M/2th signal that produced y_{M/2}). Thus the basic equation for P_ch is the following (for simplification let y = y_{M/2} = output of the desired matched filter):

for j = 1, 2, ... , (M/2 - 1):

P_ch = 1 - Prob \left[ \{ |y_j| < |y| \} \text{ and } y > 0 \right]    (41)

= 1 - \int_0^{\infty} P_{M/2}(y) \left[ P(|y_j| < |y|) \right]^{M/2 - 1} dy    (42)

Now:

P(|y_j| < |y|) = \int_{-y}^{y} P_{M/2}(y_j) \, dy_j    (43)

Using this, the P_ch can be shown to be (Ref. 12):

P_ch = 1 - \int_0^{\infty} \frac{1}{\sqrt{\pi N_0}} e^{-(y - \sqrt{E})^2/N_0} \left[ \int_{-y}^{+y} \frac{1}{\sqrt{\pi N_0}} e^{-y_j^2/N_0} \, dy_j \right]^{M/2 - 1} dy    (44)

where: E = energy per character or symbol
       N_0 = single-sided noise power density

This expression also has to be evaluated on a computer; curves have been published for block-coded binary cases, and these are immediately convertible into the general cases. It may be noted that the binary simplex signal [s_1(t) = -s_2(t)] is also a bi-orthogonal case. Hence, when M=2, the above expression is equivalent to:

P_ch = P_B = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{2E/N_0}}^{\infty} e^{-y^2/2} \, dy    (46)

which is the equation used in Sections 2. 3 and 3. 1. 1.

Bi-orthogonal signalling requires only half the bandwidth and half the correlators (or matched filters) required by orthogonal coherent signalling. For these reasons, such signals have great attractiveness from the engineering standpoint. Comparative curves will be given later.
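For the incoherent orthogonal case, by contrast, Eq. 40 is a finite sum and can be computed directly. A minimal sketch follows; it simply evaluates the alternating sum, so it is best kept to moderate values of M.

from math import comb, exp

def pch_incoherent_orthogonal(e_over_n0, M):
    """Character error probability for M-ary incoherent orthogonal signals,
    Eq. 40: Pch = (1/M) exp(-E/N0) * sum_{r=2}^{M} (-1)^r C(M,r) exp(E/(r N0))."""
    total = sum((-1) ** r * comb(M, r) * exp(e_over_n0 / r)
                for r in range(2, M + 1))
    return exp(-e_over_n0) * total / M

# Sanity check against the M = 2 (ordinary FSK) result quoted in the text,
# PB = (1/2) exp(-E/(2 N0)).
print(pch_incoherent_orthogonal(4.0, 2), 0.5 * exp(-2.0))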

Simplex Signals. From the purely theoretical standpoint, the best set of signals when considering P_e are the simplex signals (Section 2. 2). It is remembered that (Ref. 3) simplex signals, viewed as points in signal space, correspond to a geometrical regular simplex where every vertex is equally distant from every other vertex. If there are M signals, the geometrical simplex appears in an (M-1)-dimensional space. In one-dimensional space the signal is a binary PSK wave; in two-dimensional space, a 3-phase wave at 120°; in 3-dimensional space (M=4) no signal is easily describable, but the simplex is easily constructed (Ref. 21).

However, the P_e calculation for M-ary simplex signals is difficult to achieve in closed form. This is because the noise outputs of the unmatched filters (see Fig. 2) are not independent. However, one can obtain the P_ch's from the coherent orthogonal case, since one can show (Ref. 16) that use of a simplex set instead of an orthogonal set is equivalent to an increase in energy of M/(M-1). Hence, if one has the curves for coherent orthogonal, one can immediately plot them for simplex signals. Remember that, for the binary case, s_1(t) = -s_2(t) is a simplex signal.

3. 2. 2. 2 Basic M-ary Symbols. The general P_ch relations for M-ary signals were given above. As mentioned before, two ways of obtaining the M-ary signals are: (1) basic M-ary symbols, and (2) block-coded binary. First we consider basic M-ary symbols. For M-ary basic symbols one can have:
1. Simplex signals, constructed generally so that the M points correspond to a geometrical simplex.
2. Bi-orthogonal signals, the sine and cosine of a given frequency being orthogonal, plus the negative of each.
3. Orthogonal coherent (M-ary, coherent FSK).
4. Orthogonal incoherent (M-ary, incoherent FSK).
5. M-phases, coherent or differentially coherent (see below).
The M-phase signal is included here because we will later use this signal as an example of a close-packed signal (see Section 2. 2). For the other signals, the P_ch equations given above can be used directly, by determining the proper E for each situation.

3. 2. 2. 3 Block-Coded Binary Signals. Block-coded binary signals represent a very practical way to implement "more energy per decision." The practicality comes from the fact that binary operations are involved, both in the transmitter and the receiver.

Also, it is possible to get the "best" signals rather easily this way, as we now note. M-ary simplex signals can be obtained by using linear shift register sequences (Ref. 14) to form the binary block. For this, one uses (with an L = 2^n - 1 long sequence) the entire length of the sequence, and the L signals are obtained by using the sequences one clock step delayed from each other. It is well known that the correlation between such signals is -1/L = -1/M (Ref. 14). Orthogonal block codes can be generated in a number of ways, and shift-register implementations are also possible (Ref. 12). Bi-orthogonal signals (discovered by Muller and Reed, Ref. 15) also can be readily generated by shift-register means (Ref. 12). The Digilock communications system uses such bi-orthogonal codes. It is a telemetry data link where 5-bit words are shift-register encoded into (16, 5) bi-orthogonal codes (32 possible words).

To obtain the P_ch for any binary block-coded symbol one uses the appropriate equation, 35, 40, or 45. Again it is necessary to find the appropriate E. It must be stressed that the E in each of these equations refers to the energy per complete character. Hence the energy per bit E_b must be multiplied by the "number of bits per character."

3. 2. 2. 4 M-Phases. A set of M-ary signals, somewhat different from those considered above, is the use of M phases in coherent situations. For our purposes, these signals represent one example of a "close-packed signal" (if M > 8). These signals are unique in that they only require 2 dimensions, no matter what the level of M (see Table I). One significance of this lies in the ultimate capacity that one can achieve when S/N is very high.

A comment on M-phase signals, relative to the "simplex, bi-orthogonal, and orthogonal" category, is pertinent here. Each of these three signals (treated above) is essentially based on the idea of being relatively far apart, so that a certain degree of immunity to noise is achieved by sacrificing bandwidth. The basic worth of these signals is thus intuitively obvious, when conditions are such that one must resort to an M-ary signalling situation. M-phase signals, on the other hand (for M > 2), are based primarily on the idea of conserving bandwidth at the cost of some S/N ratio (i. e., poorer P_ch); also, when E/N_0 is high (so that P_ch requirements can be met), the M-phase signal is a better signal

(BW-wise) than uncoded binary.

Two methods of operation are possible with M-phase signals: coherent (CPSK) and differentially coherent (DCPSK). In coherent operation the receiver must have a phase-coherent reference (phase coherent with the transmitted signal). The receiver then appears exactly as in Fig. 2, where the s_j(t) are gated sinusoids at the various phases. For differentially coherent (DCPSK) operation, the information is extracted from "phase now relative to phase in prior symbol." Here it is only necessary that phase coherence be held over one symbol length. This operation (used by Kineplex) is more easily realized in practice than strictly coherent receivers. One result of this operation is that errors tend to occur in pairs.

For calculating P_ch of M-phase signals a different procedure from those above must be considered, since now the noise outputs of the "unmatched" filters are not independent (the "simplex" outputs are also not quite independent). For this reason it is of interest to note the P_ch calculation for M-phase. A brief sketch of a P_ch calculation for M-phases (taken from Ref. 13) will now be given.

Coherent M-Phases. Again we are interested in the P_ch for a "maximum likelihood receiver." Such a receiver would be as shown in Fig. 2, where the correlator or matched filter is matched to each phase. Each signal of an M-phase system can be written as

s(t) = \sqrt{2S} \cos(\omega_0 t + \theta)    (47)

where S is the received signal power, \omega_0 is the angular center frequency, and \theta is chosen from the discrete set 2\pi k/M, for 0 \le k \le M-1. The detection problem amounts to deciding which of the M possible transmitted phases is the received phase. Assuming the M phases to be equiprobable, the receiver chooses the \theta which has the maximum a posteriori probability at the detector. This can be calculated (Ref. 13) by determining the probability that the received signal phase falls within the sector bounded by the decision levels ±π/M.

Assuming that the received signal is narrowband, perturbed by additive Gaussian noise, the interfering noise can be written:

n(t) = I_c \cos \omega_0 t + I_s \sin \omega_0 t    (47a)

where: I_c and I_s are low-frequency random variables with zero mean. The total received signal is then

x(t) = s(t) + n(t) = (\sqrt{2S} + I_c) \cos \omega_0 t + I_s \sin \omega_0 t    (48)

assuming that the desired phase \theta is zero. The probability density of the phase of x(t) can then be written, using the method of Rice (see also Ref. 10, p. 166):

p(\theta) = \frac{1}{2\pi} e^{-S/N} \left[ 1 + \sqrt{4\pi S/N} \cos\theta \, e^{(S/N)\cos^2\theta} \, \Phi\!\left( \sqrt{2S/N} \cos\theta \right) \right]    (49)

where \Phi(x) is the cumulative Gaussian function:

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x^2/2} \, dx

Because the S/N ratio here is an input S/N, the conversion to E/N_0 is given by:

S/N = E/N_0    (50)

Then p(\theta) appears as:

p(\theta) = \frac{1}{2\pi} e^{-E/N_0} \left[ 1 + \sqrt{4\pi E/N_0} \cos\theta \, e^{(E/N_0)\cos^2\theta} \, \Phi\!\left( \sqrt{2E/N_0} \cos\theta \right) \right]    (51)

which is the probability density of the phase of the received signal. Integrating this density over -π/M ≤ θ ≤ π/M gives the probability that the received signal phase falls within the sector bounded by the decision levels ±π/M; i. e., the probability of error for an M-phase system is

P_ch = 1 - \int_{-\pi/M}^{\pi/M} p(\theta) \, d\theta    (52)

This can be integrated by numerical or graphical means to give curves of error probability versus E/N_0 for given M (see Ref. 13). For reasonably large x, the integral \Phi(x) can be approximated by the first terms of an asymptotic expansion; thus

\Phi(x) \approx 1 - \frac{1}{\sqrt{2\pi}\, x} e^{-x^2/2}    (53)

Using this in the expression for p(\theta), the result is

p(\theta) \approx \sqrt{\frac{E}{\pi N_0}} \cos\theta \; e^{(E/N_0)(\cos^2\theta - 1)}    (54)

Since 1 - \cos^2\theta = \sin^2\theta, this can be written as

p(\theta) \approx \sqrt{\frac{E}{\pi N_0}} \cos\theta \; e^{-(E/N_0)\sin^2\theta}    (55)

Curves based on previous integrations of Eq. 52 (Ref. 24), which show P_ch vs. E/N_0, are given by dashed lines in Fig. 12 (taken from Fig. 3, Ref. 13). Note that the ordinate is P_ch; if bit error rate is desired, the P_ch must be converted to P_B. Also, the abscissa is in terms of E, where E = (log_2 M) E_b.

Differentially Coherent M-Phase (DCPSK). For differentially coherent systems, the P_e calculation is quite involved. Here we will present only the results (Ref. 21, p. II-23) for comparison with the CPSK case above. The P_ch vs. E/N_0 curves are shown by the solid lines in Fig. 12. Comparing DCPSK with CPSK, it is seen that, for a P_ch of 10^-4, the db loss is 3. 6 db for M = 16, and drops to 1 db for M = 2.

3. 3 Comparative Results and Conclusions

Since the various signals and operation methods dealt with above represent alternatives available to a system designer, we wish here to draw sensible comparative graphs showing the relative advantage of the alternatives wherever possible.

Fig. 12. P_ch versus E/N_0 for M-phase CPSK and DCPSK.
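The CPSK curves of Fig. 12 can be reproduced approximately by integrating Eq. 52 with the phase density of Eq. 51. The sketch below does this numerically; scipy is assumed for the quadrature and the cumulative Gaussian, which is an implementation choice rather than anything prescribed by the report.

import numpy as np
from scipy import integrate
from scipy.stats import norm

def phase_density(theta, rho):
    """Eq. 51: density of the received phase, with rho = E/N0."""
    c = np.cos(theta)
    return (np.exp(-rho) / (2.0 * np.pi)) * (
        1.0 + np.sqrt(4.0 * np.pi * rho) * c * np.exp(rho * c ** 2)
        * norm.cdf(np.sqrt(2.0 * rho) * c))

def pch_cpsk(e_over_n0, M):
    """Eq. 52: character error probability for coherent M-phase signalling."""
    p_correct, _ = integrate.quad(phase_density, -np.pi / M, np.pi / M,
                                  args=(e_over_n0,))
    return 1.0 - p_correct

# For M = 2 (binary PSK) the result should equal Q(sqrt(2 E/N0)).
print(pch_cpsk(4.0, 2), norm.sf(np.sqrt(8.0)))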

Note that in some cases this is too laborious, and so desired comparisons cannot be made without further research.

The objective in the following graphs is to show the P_e versus E_b/N_0 for the various types of signalling. Also, the "theoretical" bandwidth for each case will be noted. The viewpoint will be that there is a certain power, S, available, and that this cannot be increased. The other three variables then are P_e, data rate, and bandwidth. In all cases we will make essentially "constant data rate" comparisons; this is occasioned by plotting all curves in terms of "energy per bit/noise power per unit bandwidth." If the energy per bit is constant (a vertical line on the graphs), and power is constant, then a constant data rate will result.

The comparisons that one makes depend somewhat on the priority of P_e versus bandwidth, assuming a constant data rate. In Sections 3. 3. 1 and 3. 3. 2 it is implied that P_e is more critical than BW, and the P_e's of the signals are compared. Section 3. 3. 3 compares signals in terms of conserving BW at the expense of P_e.

When comparing the advantages of the previous alternative signals, where P_e is deemed essential, it is sensible to consider two separate cases: (1) bit-error comparisons, and (2) word-error comparisons. If the basic information is a sequence of independent binary bits (or if one is comparing different types of systems), the proper criterion is the probability of error of the bits, and one is interested in how the various alternatives described above affect this bit probability of error. On the other hand, if the basic information consists of such things as teletype or any sampled analog data, where the basic information is already groups of binary bits, then the important criterion (and the criterion which the user will want to talk about) is the P_w of the "word." Consequently, the following comparisons will be made on these two separate bases. We begin with the bit-error comparison.

3. 3. 1 Bit-Error Comparison. The first item of interest is to consider the alternative types of signals when one is transmitting straight binary. The possible signals are: simplex, coherent orthogonal, and incoherent orthogonal. A graph of P_B vs. E_b/N_0 for each of these three types of signals was shown in Fig. 4 of Section 2. 3. One can find, for a given N_0, what P_B can be obtained for a given energy if one uses any of the three types of signals.

Error-Coded vs. Binary Signals. We simply wish to note here that the probability of bit-error comparison between error-coded and binary signals is not yet possible. Quite laborious calculations would be involved in attaining this. It will be remembered that in Eq. 30 of Section 3. 2. 1 the item discussed was the probability of a word error --- that is,

the probability that the number of digits in error is greater than the correcting capability of the error code. In order to convert this P_w to a bit P_e it is necessary to know the "average number of information digit errors per error-coded word error." This is quite a laborious calculation and requires effort devoted singly to it. This calculation was not deemed of sufficient worth to the goals of the contract to warrant the necessary effort.

M-Symbol Alphabet vs. Binary. The first graphical comparison will compare the bit P_e for the M-symbol alphabet vs. binary. As mentioned previously, the chief occasion for considering M-ary signalling as opposed to binary is when the P_e is not good enough (for binary) with the given E_b/N_0 and data rate. If one uses M-ary symbols (M > 2) one is in effect increasing the energy per decision. If one keeps the data rate and the energy per bit constant, one can obtain improved P_e's at the sacrifice of a slight delay (until decision on the M-ary symbol is possible), some equipment complexity, and increased bandwidth.

Figure 13 shows the comparison of binary versus M-ary for coherent, bi-orthogonal signalling. The basic data were obtained from the computer calculations of Eq. 45 as reported in Ref. 12. Since the basic data are character probabilities, these data were converted to bit errors by multiplying by 2^{n-1}/(2^n - 1) (where n = log_2 M). This factor follows from "the average number of bit errors per word error." The reasons for choosing bi-orthogonal cases as the coherent comparison (instead of orthogonal or simplex) are: (1) if one can use coherence, the bi-orthogonal signalling gives better P_e's than orthogonal signals; and (2) the bi-orthogonal signalling is "more practical" than either orthogonal or simplex, since only M/2 correlators and half the bandwidth are required.

Figure 14 shows the binary-M-ary comparison for incoherent orthogonal signalling. These data were obtained from Eq. 40, again multiplying the character error rate by 2^{n-1}/(2^n - 1) to obtain P_B. Figure 15 compares binary-M-ary for coherent M-phases. The data here were obtained by multiplying the P_ch of Fig. 12 by 2^{n-1}/(2^n - 1) to obtain the approximate P_B, and converting the E to E_b by using E = (log_2 M) E_b.

The "theoretical BW's" shown on each of the above curves are taken from the values in Table I, and deserve comment here. These BW's would be realistic under the following conditions: if a large number of communicators all used the same signalling method,

Fig. 13. P_B(M) versus E_b/N_0 for coherent, bi-orthogonal signals, comparing binary with M-ary.

Fig. 14. P_B(M) versus E_b/N_0 for incoherent, orthogonal signals, comparing binary versus M-ary.

and controlled their frequencies with respect to each other, each channel would consume the BW shown. We might call this the "controlled-environment" BW. These BW's are established by writing an orthogonality integral, to make "crosstalk" between channels theoretically zero. Note that, while adjacent channels may be noninterfering with this BW, each channel actually uses a much larger spectrum (although the major energy is in the central lobe), so that, for example, square-wave video is attainable. Although the practical use of these BW's is limited, they do serve well for comparison purposes.

A more practical BW for each signal is also given in Table I. Here the criterion is that the main lobe of each signal is used, and everything outside is filtered severely. Note that, if one uses these BW's in practice, previous theoretical calculations must be altered. For example, orthogonal signals will not be quite orthogonal; video pulses will be of the (sin x)/x type, rather than square-wave; and simple energy calculations for signals (Eq. 15) will have to be altered. The actual practical BW's of the various signals in an "uncontrolled environment" require substantial work, and must be tailored to the situation since results depend upon the assumptions.

3. 3. 2 Word-Error Comparisons. If the original data are words, it is the word-error probability which is of interest. We wish to compare the P_w's of the M-symbol alphabet versus binary. In order to compare word errors sensibly, one needs "equivalent words." That is, it makes little sense to compare the error rate of an n-bit word with that of an m-bit one. Therefore all the comparisons below are made in terms of a 12-bit word. As suggested in Ref. 22, the 12-bit word is good for M-ary comparisons because 2^12 = 4^6 = 8^4 = 16^3 = 64^2 = 4096.

In the following curves the P_w for a 12-bit word is plotted vs. E_b/N_0 --- the "energy per bit" over N_0. If the data rate is assumed constant, and a maximum power S assumed, then the energy per bit remains constant. Figure 16 shows the 12-bit P_w for the three types of binary operation. The P_w is related to the P_B by:

P_w = 1 - (1 - P_B)^{12}    (55a)

The P_B's for these curves were taken from Fig. 4.
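The conversions used in these comparisons are simple enough to state as code. The sketch below converts a character error rate to an approximate bit error rate with the factor 2^{n-1}/(2^n - 1) used for Figs. 13-15, and then applies Eq. 55a for the 12-bit word; the numerical values printed are illustrative only.

def bit_error_from_char_error(pch, M):
    """Approximate PB from Pch using the factor 2^(n-1)/(2^n - 1), n = log2 M."""
    n = M.bit_length() - 1            # valid when M is a power of two
    return pch * 2 ** (n - 1) / (2 ** n - 1)

def word_error_from_bit_error(pb, bits_per_word=12):
    """Eq. 55a: Pw = 1 - (1 - PB)^12 for a 12-bit word."""
    return 1.0 - (1.0 - pb) ** bits_per_word

pb = bit_error_from_char_error(pch=1e-3, M=16)
print(pb, word_error_from_bit_error(pb))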

Fig. 15. P_B versus E_b/N_0 for M-phase (CPSK and DCPSK).

Fig. 16. P_w versus E_b/N_0 of 12-bit word for three types of binary transmission.

Figure 17 shows the binary-M-ary comparisons for a 12-bit word using bi-orthogonal signalling. This would be the coherent comparison. For these curves the P_ch is calculated from Eq. 45, and P_w obtained by using:

P_w = 1 - [1 - P_ch(M)]^N    (56)

where: 2^{12} = M^N

The curves of Fig. 17 were taken from Ref. 22, Fig. 3. Figure 18 is the incoherent (orthogonal) comparison. The P_ch are calculated using Eq. 40, and then Eq. 56 is used to obtain P_w. The data plotted in Fig. 18 were not recalculated, but were taken from Ref. 22, Fig. 4 after due checking. Figure 19 shows the 12-bit P_w vs. E_b/N_0 for M-phase signals. This curve was found by using Eq. 56 and obtaining the P_ch from Fig. 15. Since the M-phase signal requires coherence, it is sensible to compare Fig. 19 with Fig. 17.

A remaining comparison is error-coded versus binary. Since a calculation for a 12-bit word was too laborious, only a 5-bit word was used. Curves for a (16, 5) and a (32, 5) error code, both coherent and incoherent, were shown in Fig. 11. Note that, in this comparison, the bandwidths are not the same. The bandwidth expansion for error-coding at the same data rate is equal to the digit expansion (16/5 and 32/5) of the code.

3. 3. 3 Information-Rate Comparison. The curves of the above two sections are most directly useful where P_e is the essential consideration, and where BW is more or less flexible. Those curves suggest what one must do if a given E_b/N_0 and given data rate do not provide a suitable P_e; also, the curves depict the cost in bandwidth as M increases. If the major consideration is conservation of bandwidth, and P_e is somewhat flexible, then the quantity of interest is the "information rate per BW" vs. E_b/N_0. Such curves would depict how to get a given information rate through a specified BW, but the high information rate may have a great many errors.

Figures 20 and 21 show "information rate/BW" vs. E_b/N_0 for the two cases shown before (bi-orthogonal, incoherent orthogonal). The information rate (I), as opposed to data rate (R), is a measure of the average number of correct symbols (or bits) per unit time in a noisy situation. If there is no noise, the information rate equals the data rate. For the orthogonal cases, the information rate is given by (Ref. 6):

Fig. 17. P_w versus E_b/N_0 for 12-bit word with bi-orthogonal signals.

Fig. 18. P_w versus E_b/N_0 for 12-bit word with incoherent orthogonal signals.

I = \frac{1}{T_b \log_2 M} \left[ \log_2 M + P_{ch} \log_2 \frac{P_{ch}}{M-1} + (1 - P_{ch}) \log_2 (1 - P_{ch}) \right]  bits/sec    (57)

where: I = information rate = average number of correct symbols (or bits) per unit time
       (log_2 M) x T_b = T = length of general M-ary symbol
       T_b = length of equivalent bit
       P_ch = probability of an M-ary character error.

For the binary (orthogonal) case, this reduces to the well-known equation:

I = \frac{1}{T_b} \left[ 1 + P_B \log_2 P_B + (1 - P_B) \log_2 (1 - P_B) \right]  bits/sec    (58)

The value of I was divided by the "theoretical controlled-environment BW's" shown in Table I (and on the previous curves) to obtain the data for Fig. 21, the incoherent orthogonal case. The P_ch(M) for each curve was obtained from Eq. 40 of Section 3. 2. 2. For the bi-orthogonal case, available data in tabular form from Ref. 12 were used to obtain the I. Again the BW's shown in Table I were used.

3. 3. 4 Comparison Conclusions. To review, curves have been plotted of P_B, P_w, and I/BW vs. E_b/N_0 for bi-orthogonal, incoherent orthogonal, and M-phase cases. Since the abscissa is always E_b/N_0, it is implied that one is making a constant data rate comparison. This follows if one assumes that the power S is constant; then, for a constant data rate (constant T_b), the E_b will be constant. We will be making variable rate comparisons in Section 4. 2, when discussing "capacities."

For a constant rate comparison, it is sensible to think of two regions, low E_b/N_0 and high E_b/N_0. If the E_b/N_0 is high, the problems are fewer; that is, one can meet the P_e requirements with a bandwidth-efficient signal (such as M-phase). If the E_b/N_0 is low (but greater than 1), then one must emphasize either P_e or W. Hence we will consider two cases for low E_b/N_0: (1) P_e is most important, and W relatively expendable; and (2) W most important, and P_e relatively expendable.

If P_e is most important (for low E_b/N_0, where 1 < E_b/N_0 < 20), and one is restricted to a given power S, then it is clear that one must use an M-ary orthogonal signal sufficient to give the desired P_e. From Figs. 15 and 19 it is seen that M-phase signals where M > 8 do not provide suitable P_e's. From Figs. 13, 14, 17 and 18 it is seen that,

Fig. 19. P_w versus E_b/N_0 for 12-bit word with CPSK and DCPSK.

Fig. 20. Information rate/bandwidth for M-ary bi-orthogonal signals (coherent).

for a constant E_b/N_0, increasing M with orthogonal signals improves P_e substantially. However, Figs. 20 and 21 give a measure of the cost in bandwidth of increasing M. If one can consider increases in S (for a constant data rate), the steepness of the curves in the figures shows that this is more profitable than increasing M. For example, in the coherent case (Fig. 13), going from M = 2, 4 to M = 4096 is equivalent to a 3. 5 db increase in power (S) for P_e = 10^-5. In the incoherent case (Fig. 14), going from M=2 to M=64 is about equivalent to increasing S by 6. 8 db. Thus, to improve P_e for a constant data rate, increases in S (if possible) are more profitable than increasing M; and this is more true for coherent than for incoherent operation.

If the bandwidth W is most important (for low E_b/N_0), then one must emphasize Figs. 20 and 21. Here the cost of using large M probably becomes prohibitive. From the figures, it appears that an M of 2 or 4 is suitable for coherent operation, and an M of 4 or 8 for incoherent operation. If W is crucial, one should probably avoid going beyond M = 8, even for the incoherent case.

Fig. 21. Information rate/bandwidth for M-ary incoherent orthogonal signals.

Finally, if E_b/N_0 > 20, so that P_e of the order of 10^-5 can be obtained with "correlated signals," it appears clear that one should use signals that stress "high information/W" rather than "distinctness" (simplex, orthogonal, etc.). The signals of interest then are "close-packed," of which M-phase with M > 8 is one example. Figures 15 and 19 show the P_e's for M-phase signals. Considering that M-phase signals become increasingly correlated with M > 8, the M = 8 signal appears profitable for "intermediate" E_b/N_0 (20 to 50). If E_b/N_0 > 50, one should consider, in addition to increasing M of M-phase, the signals which encode both amplitude and phase. Also, other close-packed signals may be considered.

These are the conclusions, then, for a constant data rate comparison of some common (and feasible) signalling methods. Many design situations will not be clearly separable into the regions treated above; for all such cases one must balance the advantages and disadvantages.

4. CHANNEL CAPACITIES AND MAXIMUM RATES

Although the P_e calculations treated before are probably the most basic evaluator for a digital communications system, there may be cases where bandwidth conservation is more essential; for such cases the curves of Figs. 20 and 21, showing "information-rate per W," are of interest. Any consideration of information-rate per W is related to "channel capacity," which is now familiar to all communications engineers. This section might well then be considered an extension of the ideas in Figs. 20 and 21.

Channel capacity (C) is the maximum rate at which one can send information over a channel, and still (theoretically) reduce the P_e to zero (by external coding means). Although there is general agreement on this concept, there is much disagreement on the way in which C is calculated, as we shall note below. The practical significance of C is that it should provide a sensible upper bound for comparing various practical signalling alternatives, especially with respect to utilization of W. In a sense, P_e considerations are outside the sphere of C, since the theory holds that "it is possible" to reduce the error towards zero at the rate C by external means. In Section 4. 1 a channel capacity for digital systems is portrayed. In Section 4. 2 we suggest a "practical" method of viewing "maximum rates" (related to channel capacity) in digital systems.

4. 1 Capacity Relations for Digital Systems

In general, when beginning consideration of channel capacity, one considers various types of possible channels (Ref. 6). Thus there may be discrete channels or continuous channels. In a discrete channel it is assumed that the input and the output can be represented by some discrete number of events. For a continuous channel it is assumed that the output and input can take on a continuum of values. There also may be mixed systems such as discrete-to-continuous or continuous-to-discrete. There is also the element of discrete time or continuous time. Although most physical channels may be time-continuous, they can often be satisfactorily represented by means of discrete time models (which is our case).

The basic calculation used to find the channel capacity for a given assumed model consists of evaluating the "average mutual information" between input and output sequences of events. This average mutual information depends on the probability distribution of the input sequence of events as well as the conditional probability distribution which characterizes the noise properties of the channel. For any given channel, then, the channel capacity (given the channel probability characteristics) is defined as the "maximum average mutual information which would result if one maximized over the possible probability distributions of the input" (Ref. 6).

4. 1. 1 Continuous Channels. One of the most familiar capacity formulas deals with a bandlimited continuous channel in which there is additive white Gaussian noise. The restrictions are that the input be a bandlimited time function and that its average power not exceed a given value S. For this situation, the channel capacity is given by the familiar relation:

C = W \log_2 (1 + S/N)  bits/sec    (57)

where: C = channel capacity, when the input is a bandlimited time function with average power S
       W = signal (and hence channel) bandwidth.

This equation means that if one has a given average signal power S, if the channel noise is flat, white, and Gaussian of power N_0 per unit bandwidth, if a bandwidth W is available, and if one agrees to use as input a time function limited to bandwidth W, then the maximum rate at which one can possibly send information with practically zero error is given by the C of Eq. 57. This equation can be applied to a given channel if the channel gain is reasonably controllable or constant (or slowly varying), and if one is going to use continuous input waveforms. One of the bounding conclusions concerned with the proof of this equation is that only with the use of signals having the statistics of white noise could one possibly hope to achieve this channel rate; this is how the probability distribution of the input enters the capacity relation.

Even if the actual channel being dealt with is discrete, Eq. 57 serves as the ultimate capacity, since the capacity for a given continuous channel will always be equal to or greater than the capacity for a corresponding discrete channel (where bandwidth, etc., are

considered equal). Consequently it is sensible to use Eq. 57 as the ultimate comparison even though discrete signals are used.

4. 1. 2 Discrete Channels. For the discrete case, the channel capacity is determined by the particular situation. A common case is "uniform" channels. A channel is uniform if:

P_{ij} all equal (for i \ne j);  P_{ii} all equal    (58)

where: P_{ij} = transition probabilities of signals between input and output of the channel, due to noise.

It may be noted that orthogonal signals and simplex signals in a discrete channel result in a uniform channel, but bi-orthogonal signals do not strictly do so. The channel capacity, in bits/symbol, for such uniform channels (Ref. 6) is given by:

I' = \log_2 M + P_{ch} \log_2 \frac{P_{ch}}{M-1} + (1 - P_{ch}) \log_2 (1 - P_{ch})  bits/symbol    (59)

where: P_ch = uniform probability of error for each of the M-ary decisions
       M = alphabet size.

It is very important to note that Eq. 59 lists the capacity in bits/symbol, whereas Eq. 57, for the continuous channel, lists the capacity in bits/second. To convert the "bits/symbol" capacity of Eq. 59 to a "bits/second" capacity comparable to Eq. 57 (and this appears desirable), it is necessary to incorporate W and symbol time in a sensible fashion. Since we are interested in a capacity relation, we desire to use a minimum TW product in any calculation. Immediately below, a minimum TW product for various signals is used which is related to the "dimensions" of the signals (see Table I). The capacity of discrete (digital) systems in bits/second is then found, using the minimum TW product (in terms of dimensions). The final objective will be to compare various signals in terms of their capacity/bandwidth. Thus, we will plot C/W vs. E_b/N_0 for the various signals.
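To make the bits/symbol versus bits/second distinction concrete, the sketch below evaluates both quantities: the continuous-channel capacity of Eq. 57 and the per-symbol capacity of the uniform discrete channel, Eq. 59. The numerical arguments are illustrative assumptions.

import math

def capacity_continuous(W, snr):
    """Eq. 57: C = W log2(1 + S/N) for the bandlimited Gaussian channel, bits/sec."""
    return W * math.log2(1.0 + snr)

def capacity_uniform_per_symbol(pch, M):
    """Eq. 59: I' in bits/symbol for a uniform M-ary channel with error rate Pch."""
    ip = math.log2(M)
    if pch > 0.0:
        ip += pch * math.log2(pch / (M - 1))
    if pch < 1.0:
        ip += (1.0 - pch) * math.log2(1.0 - pch)
    return ip

print(capacity_continuous(W=3000.0, snr=10.0))       # bits/second
print(capacity_uniform_per_symbol(pch=1e-3, M=16))   # bits/symbol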

This is the practical interpretation of the theoretical statement that "an infinite number of dimensions is necessary to specify a signal at a 'time' density of 2W per second." This gives a measure of the "best one can do" in terms of utilizing bandwidth, given a type of signalling.

Although we feel this is a sensible way to use a capacity relation, there are certainly other ways, and the general disagreement has led to some confusion about "capacity." One common capacity comparison treats the capacity as a single number consisting of the value of C when W approaches infinity (C_infinity). This usage has value for ultimate comparisons and concepts, but it is difficult to derive conclusions about any "real world" choices. Another capacity comparison (for discrete systems) uses theoretical W's (as in Table I) of signals in the continuous capacity relation of Eq. 57. We feel somewhat dubious about this, since the theoretical BW's specify "how close one can put controlled signals," and not "how much W is used by each signal." However, this usage does provide comparisons for controlled environments (each communicator using the same type of signal, and coherent with respect to each other), even though this is somewhat unrealistic.

As mentioned, we propose here a digital capacity (C) in bits per second which is arrived at by using minimum TW products. Although time-bandwidth products of general signals are somewhat arbitrary (see Ref. 23), there is good reason for relating minimum TW products to the dimensions of any given signalling method. By "dimensions," of course, we refer to the signal space representation of the signals. The signal space representation of signals is based on the Shannon Sampling Theorem, which says that, in an interval T, there are 2TW numbers required to specify the signal (if the signal is limited to W). These numbers can be considered as coordinates in a D-dimensional space (where D = 2TW). Consider now the vector space for a large ensemble of successive symbols; this space will have dimension D', and D' is given by:

D' = 2WT' = 2WT x N    (60)

where: T' = NT = time length of ensemble of successive (possibly overlapping) symbols
       T = time length of one symbol
       N = number of symbols in T'
       D' = dimensions of total space corresponding to T'.

If one uses the criterion that succeeding symbols have no intersymbol interference, the

vector space representation for a given symbol time is an orthogonal subspace within the total space of D' dimensions. The number of signals which can possibly be sent in time T' is then given by:

\frac{D'}{D} = \frac{2WT'}{D} = \frac{2WNT}{D}    (61)

where: D = dimensions per symbol time T, called the "signal dimensions."

The minimum TW for a given signalling method is then given by:

\min\{TW\} = \frac{D}{2}    (62)

It now remains to find the signal dimensions D for various signalling methods. An M-ary coherent-orthogonal signal requires M dimensions per subspace, since this operation can be viewed as orthogonal subspaces within the symbol subspaces of dimension D. Therefore, D = M for coherent-orthogonal. For incoherent-orthogonal operation, each symbol subspace requires 2M dimensions, since the fact that phase information is not useful means that each "element" requires 2 orthogonal dimensions; with 2 dimensions per element, there are 2M dimensions per subspace. Hence D = 2M. Simplex operation has a signal dimension of D = M-1. For M-phase signalling (M > 2) the dimensions D remain 2, no matter what M is. For convenience, Table II shows the signal dimensions for the various methods (also listed in Table I).

    M-ary Signal Type          Dimensions D
    M-phase (M > 2)            2
    Bi-orthogonal (Coh.)       M/2
    Simplex (Coh.)             M-1
    Coherent-Orthogonal        M
    Incoherent-Orthogonal      2M

    Table II. Minimum signal dimensions.
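A small sketch of Eq. 62 and Table II follows, mapping each signalling method to its dimension D and hence its minimum TW product. The dictionary keys are our own labels for the rows of Table II, not terminology the report defines.

SIGNAL_DIMENSIONS = {                      # Table II
    "m-phase (M>2)":          lambda M: 2,
    "bi-orthogonal (coh.)":   lambda M: M // 2,
    "simplex (coh.)":         lambda M: M - 1,
    "coherent orthogonal":    lambda M: M,
    "incoherent orthogonal":  lambda M: 2 * M,
}

def min_tw(signal_type, M):
    """Eq. 62: min{TW} = D/2 for the given signalling method and alphabet size M."""
    return SIGNAL_DIMENSIONS[signal_type](M) / 2.0

for kind in SIGNAL_DIMENSIONS:
    print(kind, min_tw(kind, M=16))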

It should be noted that using minimum TW products based on signal dimensions, as in Eq. 62, is a theoretical "capacity" type of calculation. If one is precise, one can never put absolute bounds on W and also on T. However, if one puts "practical" bounds on W (say -50 db on a filter response), one can in principle approach Eq. 62 if one allows overlapping signals (which are nevertheless practically non-interfering). The above material is more fully explained in Appendix A, where practical realization problems are discussed. There are the following three implications as one tries to achieve the TW products of Eq. 62 and Table II (see Appendix A): (1) The signals must overlap. (2) The signals have an envelope which approaches a (sin x)/x curve. (3) If min{TW} were to be realized, a correlating receiver is no longer necessary; if M = 2, a precise sampler is required, and if M > 2 a combination (or block-coded) sampler is sufficient.

It is of interest to compare the minimum TW products here with the theoretical TW products of existing signals as given in Table I. For this purpose the uncontrolled-environment bandwidths of Table I should be used, since these are the "limited" bandwidths. For coherent bi-orthogonal and orthogonal, the signal TW's are about twice the capacity TW's. For incoherent orthogonal, the ratio of signal TW to capacity TW is (M+1)/M. Consequently, the "capacity" proposed here is a relatively realistic bound for a given signalling method.

We now use the minimum TW product of Eq. 62, along with the signal dimensions of Table I or II, to obtain the capacity of discrete systems in bits/second, given the signalling method. Using Eq. 59, and the minimum T for a given W, the result for coherent simplex is:

Coherent Simplex:  C = \frac{2W}{M-1} \left[ \log_2 M + P_{ch} \log_2 \frac{P_{ch}}{M-1} + (1 - P_{ch}) \log_2 (1 - P_{ch}) \right]  bits/sec    (63)

where: W = bandwidth as determined by the response being down about -50 db

M = alphabet level or number of symbols
C = capacity in bits/second.

For general M-ary coherent orthogonal:

C = \frac{2W}{M} \left[ \log_2 M + P_{ch} \log_2 \frac{P_{ch}}{M-1} + (1 - P_{ch}) \log_2 (1 - P_{ch}) \right]  bits/sec    (64)

For M-ary incoherent orthogonal:

C = \frac{W}{M} \left[ \log_2 M + P_{ch} \log_2 \frac{P_{ch}}{M-1} + (1 - P_{ch}) \log_2 (1 - P_{ch}) \right]  bits/sec    (65)

It should be emphasized that, since we have used the limiting TW product, the equations above should truly represent capacities for the discrete cases. Using Eqs. 63, 64, and 65, the capacities of the various alternative signals, starting with binary simplex and going up to the M-ary signals, have been plotted in Fig. 23. Since the ordinate is chosen to be C/W, the plot actually shows capacity per bandwidth for each type of signalling.

An important difference between the C/W curves here and the I/W curves of Figs. 20 and 21 should be noted; in Figs. 20 and 21 the W was based on occupancy --- that is, on how closely similar signals can be placed. Here the W is strictly limited (-50 db). Hence the W here would be comparable to the uncontrolled-environment W's of Table I. Also, Figs. 20 and 21 refer to actual signals, while here a bound is being used with no particular signal being specified. To emphasize, the curves here deal with the capacity per limited bandwidth if one is operating a discrete system with the given signal type and E_b/N_0 ratio. The cases of Figs. 20 and 21 essentially show "what the information rate is if one agrees to operate discretely and has a bandwidth occupancy and an E_b/N_0 ratio." Although there is about a factor of 2 between information rate/W and capacity/W for coherent signalling methods, we feel the term "capacity" is justified because of the minimum {TW} concept. Thus this capacity may be regarded as a relatively practical bound. There are two important features to C as used here: (1) it is a bound based on a minimum TW product; (2) it is a single-channel concept, so it does not depend on the environment of other channels (as do the controlled W values in Table I).
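Equations 63-65 all have the form C = (2W/D) I', with I' the per-symbol capacity of Eq. 59 and D the dimension of Table II. A sketch of the C/W computation behind Fig. 23 follows; the error probabilities passed in would in practice come from the P_ch relations of Section 3. 2. 2, and the values used here are illustrative.

import math

def i_prime(pch, M):
    """Eq. 59: per-symbol capacity of the uniform M-ary channel."""
    ip = math.log2(M)
    if pch > 0.0:
        ip += pch * math.log2(pch / (M - 1))
    if pch < 1.0:
        ip += (1.0 - pch) * math.log2(1.0 - pch)
    return ip

def capacity_per_bandwidth(pch, M, D):
    """C/W = (2/D) I': Eq. 63 with D = M-1, Eq. 64 with D = M, Eq. 65 with D = 2M."""
    return 2.0 * i_prime(pch, M) / D

print(capacity_per_bandwidth(1e-3, M=2, D=1))   # binary simplex; limiting value 2
print(capacity_per_bandwidth(1e-3, M=4, D=4))   # coherent orthogonal
print(capacity_per_bandwidth(1e-3, M=4, D=8))   # incoherent orthogonal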

Fig. 23. C/W versus E_b/N_0 for binary simplex, coherent orthogonal, and incoherent orthogonal signals.

From the figures, it is seen that binary simplex affords the largest C/W --- the limiting value is 2. For coherent orthogonal the limiting value is 1. 0, and the curve for M=4 stays above all others for the entire range of E_b/N_0. It is seen that, as M increases, C/W becomes very restricted. For the incoherent case, M=8 is as good as M=4 in the low range. Again, as M continues increasing, the C/W becomes restricted.

4. 2 Maximum Information Rates if Data Rate is Variable

All the preceding material plotted either P_e or I/W vs. E_b/N_0. The various curves can be used to find either the P_e or the information rate per W for the various conditions.

Under each signal type, one specific objective was to compare binary versus M-ary operation. Constant rate comparisons can be made by reading vertically at any E_b/N_0 point. For constant E_b/N_0, the data rate R is constant if the power S is constant, since R = 1/T_b. Consequently, assuming that S is always fixed, all the preceding curves make "comparison sense" for constant data rates.

We now consider the case where S is still fixed, but R is variable. The objective is to sensibly compare the various signal types for R variable. The first significant change is that we now plot curves versus S/N, rather than E_b/N_0, since S is fixed but E_b is not. For the variable R case, the general aspects are as follows. Improved P_e's can be obtained by slowing the data rate (longer T_b), and consequently reducing bandwidth. The cost, of course, is the slower data rate. The alternative is to use M-ary signals. The choice between these two alternatives depends on the "economics" involved. Here we will adopt the criterion that "one should use as fast a data rate (low T_b) as possible, commensurate with P_e and W." Thus we assume that one is given P_e and W, and adjusts the R to as high a value as possible. We would then plot maximum information rate per W versus S/N.

In order to do this, it is necessary to break the S/N axis, in any plot, into two regions: (1) the low signal-to-noise ratio case, and (2) the high signal-to-noise ratio case. In the low signal-to-noise ratio case the minimum T_b chosen, under a given S limitation, is determined by the probability of error allowable. For the high signal-to-noise ratio case, the T_b is limited and determined by the minimum TW product (Section 4. 1. 2). In other words, the P_e is constricting for low S/N, and the TW product is constricting for large S/N. For low signal-to-noise ratio cases, the probability of error as depicted by the capacities in Fig. 23 above becomes so high as to be of little practical worth; hence it is sensible to put a probability-of-error limitation on the low signal-to-noise ratio cases. For the high signal-to-noise ratio cases the probability of error becomes increasingly better, and hence the minimum T_b (for maximum rate) will be restricted by the TW product (discussed in Section 4. 1. 2).

The objective here will be first to present maximum rates for binary simplex and incoherent orthogonal under an average power restriction. M-ary coherent and incoherent orthogonal will then be treated. In the curves of this section we plot "maximum rate" per bandwidth versus signal-to-noise ratio; the maximum rate is given in bits/second. The direct

import of this is that one can evaluate how efficiently one utilizes the bandwidth under the given P_e and W restrictions.

4. 2. 1 The Small S/N Region. As mentioned above, when the signal-to-noise ratio is small, the duration T_b must be made long enough to meet the acceptable error criterion, or probability of error. Thus in this region the error rate will be constant with energy-to-noise ratio, and hence the energy-to-noise ratio, and the I' of Eq. 59, are constants. This can be written:

I_M = Max Rate = \frac{I' \, W \, (S/N)}{(E/N_0)_{min}}    (66)

where: I_M = information rate in bits/second when taking both P_e and W into account.

Curves of I_M/W can be found as follows: using previous curves, choose a P_ch; find the I' corresponding to that P_ch; then evaluate the corresponding (E/N_0)_{min} from the previous graphs. With this equation, and plotting I_M/W vs. S/N, one obtains the constant-P_e portions of the curves shown in Figs. 24, 25, and 26.

4. 2. 2 Large Signal-to-Noise Ratio. As the signal-to-noise ratio is increased, one arrives at the place where the acceptable probability of error is met, and now the maximum rate is determined by Eqs. 63, 64, and 65 given before. Hence the right sides of the curves in Figs. 24, 25, and 26 correspond to the same information plotted in Fig. 23, except that now we plot I_M/W vs. S/N rather than C/W vs. E_b/N_0. Note that C and I_M differ only in that I_M is constrained by P_e until I_M = C. From there on, I_M = C. To plot I_M/W, one first assumes a given value of S/N. Convert this to E/N_0 by multiplying by min{TW} = D/2. With the resulting E/N_0, find the P_ch from the previous graphs, using E_b/N_0 = E/(N_0 log_2 M). Then one can use Eqs. 63, 64, and 65 to find I'/T.

It should be noted that a low signal-to-noise region for one P_e criterion is different from that for another P_e criterion. Also, for M-ary, one sees that a low signal-to-noise ratio for binary may be a high signal-to-noise ratio for M-ary. To emphasize, we have divided the maximum-rate situation into two areas: the small signal-to-noise region and the large signal-to-noise region. These regions are dependent upon the probability of error which must be met. The status or comparison between the small and large signal-to-noise regions is depicted in Table III. We contend that this depiction of maximum rates is sensible in terms of practical meaning for "maximum rate."
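The two-region construction can be sketched as follows. Below the S/N at which the allowed error rate is just met, the rate follows Eq. 66; above it, the rate is the capacity of Eqs. 63-65. The error curve pch_of_eno and the threshold e_n0_min are inputs here (in the report they are read from the earlier graphs); the binary-simplex example at the end, with P_ch = Q(\sqrt{2E/N_0}) and an assumed threshold near P_e = 10^-3, is illustrative only.

import math
from scipy.stats import norm

def i_prime(pch, M):
    """Eq. 59: per-symbol capacity of the uniform M-ary channel."""
    ip = math.log2(M)
    if pch > 0.0:
        ip += pch * math.log2(pch / (M - 1))
    if pch < 1.0:
        ip += (1.0 - pch) * math.log2(1.0 - pch)
    return ip

def max_rate_per_bandwidth(snr, M, D, e_n0_min, pch_of_eno):
    """IM/W versus S/N for one signalling method (dimension D, Table II)."""
    e_n0 = snr * D / 2.0                   # E/N0 = (S/N) * min{TW}
    if e_n0 < e_n0_min:
        # Small S/N region, Eq. 66: Tb stretched so that Pch stays at its limit.
        return i_prime(pch_of_eno(e_n0_min), M) * snr / e_n0_min
    # Large S/N region: limited by the TW product (Eqs. 63-65).
    return 2.0 * i_prime(pch_of_eno(e_n0), M) / D

# Illustrative binary-simplex example.
pch_simplex = lambda eno: float(norm.sf(math.sqrt(2.0 * eno)))
for snr in (0.5, 2.0, 10.0, 20.0):
    print(snr, max_rate_per_bandwidth(snr, M=2, D=1, e_n0_min=4.77,
                                      pch_of_eno=pch_simplex))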

Table III. Conclusions about the small and large S/N regions (S/N increasing from left to right).

    Small S/N                                   Large S/N
    Error rate constant                         Error rate begins to drop
    T just long enough to give error rate       I' (no. of correct) increases as S/N increases
    Little penalty for restricting to binary    Maximum rate limited by TW product of signal.
                                                Can do much better by going to M-ary
                                                (especially close-packed signals)

Fig. 24. Maximum information rate per bandwidth for binary signals.

Fig. 25. Maximum information rate per bandwidth for M-ary coherent orthogonal signals.

In comparing these results to the P_e calculations in the previous section (Section 3), one should remember that there P_e was plotted vs. E_b/N_0. Here we are interested in maximum rates, and we ask the question, "How well are we doing in terms of what one could do?" Also, in this comparison we are assuming a power limitation.

The variable rate conclusions from Figs. 25 and 26 are similar to the constant rate conclusions. If S/N for a given P_e is low (S/N < 1), M-ary operation becomes desirable (practically necessary). However, if S/N > 10, either M = 2 or a more closely-packed signal becomes desirable. These figures should serve the purpose of depicting "how well one is doing" with respect to "how well one could do," taking both P_e and W restrictions into account.

4. 3 Conclusions

In Section 4. 1, capacity relations for digital systems were considered, where a capacity for a given signalling method was studied. The viewpoint taken was that a certain

effective W was available, and the capacity was determined by using the minimum T for that W. The minimum {TW} was related to the signal dimensions. The resulting capacity relations are a measure of the bandwidth utilization, irrespective of the channel P_e. However, the capacity concept implies that the P_e can be reduced towards zero (by external means), and the rate still maintained. It was seen, in Fig. 23, that binary simplex offers the highest capacity (for far-apart signals). For coherent orthogonal signals, M=4 is slightly better than M=2 for low E_b/N_0, but both approach the same value at high E_b/N_0. For incoherent operation, M=4, 8, or 16 is preferable to M=2 for low E_b/N_0, but M=2 or 4 is best for high E_b/N_0 with incoherent orthogonal signals.

Fig. 26. Maximum information rate per bandwidth for M-ary incoherent orthogonal signals.

For a variable rate comparison, the curves of Section 4. 2 are applicable. Since a communicator is always faced with both a P_e and a W restriction, it was proposed that

the maximum-rate plot be divided into two regions: (1) the low S/N region, where P_e is constraining; and (2) the high S/N region, where the minimum {TW} is constraining. Such curves appear in Fig. 24 for binary simplex, and in Figs. 25 and 26 for M-ary coherent and incoherent orthogonal. Given a W, a P_e, and S/N, and the signalling method, one can use these curves to establish the maximum rate (I_M) for that set of conditions. We feel this approach is a sensible way to view maximum rates; furthermore, the concept of maximum rate implies a variable data rate. In other words, if one chooses the highest I_M/W for a given P_e, W, and S/N constraint, one also chooses the "best" data rate.

To summarize, then, one can make two comparisons, depending on the design situation: (1) a constant rate comparison, and (2) a variable rate comparison. A constant rate comparison is implied if both S and the data rate (R) are specified. If so, the next step is to decide whether P_e or W is most important. The conclusions for these conditions were given in Section 3. 3. 4. In general, "far-apart" M-ary signals are required if E_b/N_0 is low for a given necessary P_e. Since increasing M increases the W, it appears desirable (considering both P_e and W) to use an M of 2 or 4 for coherent operation, and an M of 4 or 8 for incoherent operation. Also, the benefits of any increase in S should be emphasized, due to the steepness of all the P_e curves. If E_b/N_0 is high for a given P_e requirement, one should use M-phase (M ≥ 8) or some other close-packed signal.

A variable data rate comparison is implied if one is free to choose the data rate. In this case, one can stay within both the P_e and W requirements. Here it is sensible to choose that data rate which gives the highest "information rate per bandwidth" for the given P_e. The proper M is determined by the S/N, P_e, and W conditions, as depicted in Figs. 25 and 26. P_e's of about 10^-5 and S/N < 1 specify an M > 2. If S/N > 10, either M = 2 or a more closely-packed signal is desirable.

PART II: ASYNCHRONOUS TIME-MULTIPLEXING OF MULTICHANNEL SPEECH SOURCES 1. INTRODUCTION The material of Part I essentially concerned the evaluation and comparison of different signalling methods in digital channel transmissions. In Section 1. 2 of Part I, it was noted that source coding is a potential area for improving the transmission of speech signals. The problem of interest here is how can one improve the transmission of multichannel speech signals, assuming that the information is to be transmitted digitally. Of course all the previous material still applies to the various possible methods of sending the information across the channel, so that this material is complementary to the previous material, and not an alternative to it. If the information to be communicated is speech, then an area of immediate interest is the "source coding" of such speech to improve the information efficiency. For example, speech compression methods can be interpreted as a source coding method. (The relation of this to our topic will be discussed below. ) Whenever the speech information comes from telephone communicators, source coding can effect improvements in two respects: (1) by reducing the redundancy in the speech signal itself, and (2) by altering the fact that, since each talker is part of a two-way conversation, each communicator is using this allotted part of the total facility only one half the time (for radio and cable links). Since various speech codings for reducing redundancy have been proposed and investigated by others, our main interest here will be in the second aspect mentioned —-improving the "channel utilization" when speech sources are part of a two-way conversation. This aspect can realistically be improved only when the facility is handling a number of speech sources so that one can take advantage of a "statistical effect" among them. In the material to follow we will assume use of an existing proposed speech encoding method, and then analytically evaluate the buffer situation and the use of this system in a multichannel 69

speech situation. The pertinent aspect is that each speech source will seek access to the channel only when it is "on" and when it has information (a sample) to transmit. Such operation will be referred to as asynchronous time-multiplexing. 2 The objective of this material will be to provide analytic evidence of the fact that it is both profitable and (at least analytically) feasible to perform the coding and multiplexing function in an asynchronous manner —-that is, by adopting the principle of allowing each source to seek access to the channel only when it has information to transmit. Although this idea is not new, it is only relatively recently that codings have been tested which make it possible to run a system asynchronously. 1. 1 Background and Previous Work For cases where multichannel speech is to be sent digitally, a familiar method is to use time-multiplexed PCM. Considering only the multiplexing function, we regard this as synchronous multiplexing since a given "slot" in the channel-time assignment is always given to a particular speech source (once it is connected by the "central"). In asynchronous time multiplexing the "source-channel time" relation is not fixed, but depends on the instantaneous conditions. A good example of an operational, asynchronously multiplexed system is the "Time Assignment Speech Interpolation" (TASI) system, which is used by the Bell Telephone System on the Atlantic Oceanic cable (Refs. 26 and 27). In this system the asynchronous multiplexing is effected by a "fast-switching" of the source-to-line connection; during a given connection, conventional, regularly-spaced samples are sent. Consider that a talker's time axis is divided into "talk bursts. " Whenever a talker is silent in excess of a stated time, an end of talk burst is assumed and the talker's line is switched to another (momentarily) waiting party. The original talker will get another line when he begins talking again. Thus the asynchronous multiplexing is effected by this fast-switching means, and the effective capacity of the oceanic cable has been about doubled. Although some "freezeout" inevitably occurs, this sytem is working successfully and the tolerable freezeout fraction designed 2The reader is cautioned that the term "asynchronous multiplexing" is used by some authors to denote "co-channel net operation." There the objective is to allow a "net" of communicators to contact one another at will. The "asynchronous" aspect, then, lies in enabling any communicator to contact any other over a common channel. In the problem here, a number of sources (after going through a "central") share a common (trunk) facility. The asynchronous aspect here lies in the assignment of channel time slots to any given source; hence it is called "asynchronous time-multiplexing." 70

into the system is about 0. 5 percent. It is found that the speakers are not aware of the freeze out when it is kept to this fraction. The speech coding method to be used in our analysis of the feasibility of asynchronous multiplexing is the "extremal coding" method (Ref. 28), also called "selective amplitude sampling" (SAS, Ref. 29). Both extremal coding and selective amplitude sampling refer to a sampling procedure wherein one samples irregularly, at the "extremum" of the speech signal. Thus one sends only the values of the relative maxima and minima of the waveform. It is necessary then both to send the value of this extreme and in some way note the time of occurrence of the extreme. In the receiver either an interpolation function, or a "boxcar" detector followed by a filter, may be used to reconstruct a good approximation of the actual speech signal. Before evaluating this coding procedure in the multichannel situation, it will be necessary to analyze the action of the buffer which is a required part of any practical asynchronous system. It is also of interest to note that the ideas behind asynchronous multiplexing treated here are very much related to the ideas of "demand matching" which have been espoused in concept a number of times (Refs. 30, 31, and 32). The general idea in demand matching is that a source have access to the channel only when it has information to transmit. This is meant to incorporate removal of both redundancy and the nonuniform flow of information from the sources. In an ideal multichannel case, demand matching would result in all the information from all of the sources being handled by the channel; further, all the information from the multiple sources would be statistically stable enough that the channel could operate at a uniform rate (without a buffer) even though the individual sources were creating information at a sporadic random rate. In practice one cannot realize this ideal, and hence two difficulties arise: first, not all of the information can be transmitted (a situation usually called "freezeout" for each individual source), and second, the information rate of the total ensemble of sources will not be uniform, so that some buffering will be required. Nevertheless, the advantages to be gained with a tolerable freezeout fraction and a somewhat varying channel rate are sufficient to allow serious exploration of various implementations of demand matching. Finally, it is of interest to note the comparison between speech compression methods and the extremal coding or selective amplitude sampling method to be used here. Speech compression techniques are a very profitable way to encode speech signals, but do 71

not derive any additional profit by being multichannel. In speech compression techniques one studies seriously the mechanism which creates speech and than attempts to encode efficiently. In our case, we consider a multichannel case of some time waveform with a given frequency spectrum. One resulting difference is that the more general coding techniques offer at least a possibility of being applicable to analog waveforms in general (which have the same frequency spectrum, etc. ), whereas the speech compression method is rather strictly limited to speech itself. This is not to say that the coding methods studied here will necessarily work for signals other than speech, but they nevertheless have the possibility of doing so, and this should be kept in mind. Another aspect is that speech compression methods may be considered as an ideal realization which may still require a great deal of equipment before being used on a multichannel system. On the other hand, the techniques of interest here may be considered as intermediate coding where the encoding is not as efficient but has more hope of being realized practically. In the material that follows we will first describe qualitatively the asynchronous multiplexing scheme with the use of a block diagram. Then the performance of the buffer will be mathematically analyzed, since this is the most crucial aspect for the multichannel case. It should be noted that the performance of the buffer is relatively independent of the actual coding scheme, except that the coding scheme must sensibly fulfill the "model" used in the analyzing of the buffer action. We conclude with comments on reconstruction of the waveforms and the performance of system comparisons. 72

2. DESCRIPTION OF ASYNCHRONOUS TIME-MULTIPLEXING SCHEME

Although one can describe asynchronous time-multiplexing schemes in general terms of demand matching, here we will (for concreteness) portray a system which involves extremal coding (also called SAS). It should be remembered, however, that the buffer analysis of Section 3 applies to any encoding which reasonably fits the model used in that analysis. Extremal coding is a suitable choice since it reasonably fits that model. Figure 27 depicts extremal coding, where a time waveform is sampled at the occurrence of relative maxima or minima. It may be noted that these maxima and minima occur quite irregularly.

Fig. 27. Depiction of sampling for extremal coding (speech wave).

For a multichannel system, it will be seen that the following information would have to be transmitted: (1) amplitude of the extremum, (2) time of occurrence (in some fashion), and (3) source identification number. If one wishes to encode a number of multichannel speech sources with extremal coding, and then send them asynchronously — that is, let the sources seek access to the channel only when they have an extremum to send — it is seen that a buffer will be required to smooth out the inevitable variations in arrival of information. If a buffer were not used, a great many extrema would be rejected or "frozen out." A block diagram of a system which can be allowed to run asynchronously is shown in Fig. 28.

Fig. 28. Block diagram of transmitter for asynchronous multiplexing of multichannel speech using extremal coding. (N sources; amplitude information and source number encoded; samples arrive at the buffer randomly at average total rate R; outputs taken every T seconds.)

In Fig. 28 it is seen that only the amplitude at an extremum is allowed to pass through the gate. This amplitude is then quantized and digitally encoded. Source number should be added at this point. Since the information (extremum) occurs at random, the essential idea is to allow the buffer to accept extrema from the individual sources whenever they occur with no prior ordering. Outputs will be taken from the buffer (if any are present) at a regular clocked interval — every T seconds. (Hence, the multiplexing is asynchronous but the channel transmission is synchronous.) It is seen that, for some fraction of time, the channel will have nothing to transmit (buffer empty). The percent of time in which this happens is called "channel idle time." On the other hand, the M-long buffer may be full when "a sample" arrives from a source. In this case the sample is rejected, and the total percent of such samples (from all sources) is called the "freeze out fraction" (FOF). Clearly the FOF will depend upon both the buffer length (M) and the operating parameter u (the ratio of average number of buffer input

samples to average number of output samples per unit time). The evaluation of FOF as a function of these two parameters will be the most important objective of Section 3. For acceptable operation, the FOF must be kept to a very low level. Since each "sink" at the receiver end must know the time of occurrence of the sample, it is necessary to insert the "buffer delay time" onto the other information. This can be done at the buffer, as shown in Fig. 28. The accuracy of the "time" information must be commensurate with the encoding method used (see Section 5). For extremal coding it should be sufficient to send the buffer delay time in integers of T (for a sufficiently large number of channels). As mentioned, the essential problem in dealing with the feasibility of any such system lies in analyzing the action of the buffer. It is quite necessary to assure that only an acceptably small fraction of the information samples be frozen out. In addition, a number of other properties of such asynchronous operation can be analyzed by considering the performance of the buffer. 75

3. PERFORMANCE OF THE BUFFER

As noted above, the buffer will accept samples from any of the sources, so long as it is not "full." Samples are removed from the buffer at regular intervals — every T seconds. In order to do any analysis of the buffer operation it is necessary to have available the probability distribution for the number of samples going into the buffer. A sample is a quantized (and digitized) amplitude of the extremum with source number added (time information must be added later; see Fig. 28). Since the output time axis of the buffer is divided into intervals of T seconds, it is useful to consider the input time axis as also divided into T intervals. Then, the question of interest is, "What is the probability of k units occurring (from all of N sources) within an interval T?"

The analysis of the buffering situation below will assume that the probability distribution for k events of the equivalent source (of all the N sources) occurring in a time interval T is given by the Poisson relation:

    p(k,u) = e^{-u} u^k / k!                                    (67)

where:
    p(k,u) = prob. of k extrema among the N sources in a time interval T
    u = average rate (taken across sources) of occurrence of extrema in time T.

It may be noted that u can be interpreted as the ratio of the average buffer input rate to the average output rate (since the output clock rate is one, in terms of T). This probability distribution results if one assumes a Bernoulli model for the group of N sources divided into T intervals. That is, assume that (1) the occurrence of a sample in one source is independent of all other sources, and (2) the probability of occurrence of a sample in a time interval T (for any given source) is constant; under these conditions a Poisson distribution, as in Eq. 67, will result for a number of sources taken together.
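The Bernoulli-to-Poisson step is easy to check numerically. The following sketch (not part of the original report) compares the binomial distribution for N independent sources, each with per-interval probability p = u/N, against the Poisson form of Eq. 67; the values N = 100 and u = 0.8 are simply the illustrative figures used later in Section 5.

```python
import math

def poisson(k, u):
    """Eq. 67: probability of k extrema from the equivalent source in one interval T."""
    return math.exp(-u) * u**k / math.factorial(k)

def binomial(k, N, p):
    """Bernoulli model: k samples among N independent sources, each with probability p per interval."""
    return math.comb(N, k) * p**k * (1.0 - p)**(N - k)

N, u = 100, 0.8          # illustrative values (Section 5.1)
p = u / N                # per-source probability of an extremum in one interval T

for k in range(6):
    print(f"k={k}:  binomial={binomial(k, N, p):.5f}   poisson={poisson(k, u):.5f}")
```

For N of the order of 100 the two columns agree to a few parts in a thousand, which is the sense in which Eq. 67 is taken as the input distribution below.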

Certainly the first assumption is justified, as is the second one if T is sufficiently small. (T is inversely proportional to N, the number of sources.) Nevertheless, it should be remembered that all results of the following analysis apply so long as the probability distribution of Eq. 67 holds.

A study of the range of applicability of Eq. 67 for cases relative to this problem was made, and reported in Appendix B. The study was made in terms of the "distribution of intervals between zero-crossings" of the individual sources. It was found that, if the interval distribution is exponential [f(u) = λe^{-λu}], the Poisson relation of Eq. 67 holds for N multiple sources. If the interval distribution is assumed to be Erlang type-1 [f(u) = λ²ue^{-λu}], it was necessary to allow N → ∞ to show that the N multiple sources have the distribution of Eq. 67. It will be the goal of further work to continue the study of the relation of N to the Poisson distribution assumed in Eq. 67. Although experimental evidence would be very helpful here, it is exceedingly difficult to obtain. In any event the distribution of events for the equivalent source which feeds the buffer will be taken to be Eq. 67.

3. 1 Queueing-Theory Approach

The analysis of any buffering situation can usually be interpreted as a basic queueing-theory problem. We will present briefly the basic ideas associated with this. However, it will be found that in this case results are found much more easily by considering a special case of a queueing problem, which neglects transients and is relatively simple. Therefore the bulk of the material will be devoted to this special analysis.

Consider for a moment the buffering problem as a queueing-theory problem; then one speaks in terms of "state probabilities." That is, the buffer is said to be in the nth state when n samples are in storage. Thus the nth state probability (Pn) is the probability of exactly n samples in the buffer. If one evaluates PM (where M = stages of buffer), one can find the average freezeout fraction — that is, the average number of samples from the equivalent source which will be turned away. The probability P0, which is the probability that the buffer is empty, is related to the channel idle time, that is, the fraction of the total time that the transmitter is idle. If one is interested only in steady-state conditions, then the following equations must be solved to get any Pn (written in matrix form):

    P_n(sT)] = [T] P_n((s-1)T)]                                  (68)

where:
    P_n(sT)] = probability of n items in buffer at time sT (column matrix)
    P_n((s-1)T)] = probability of n items in buffer at time (s-1)T (column matrix)
    [T] = matrix of transitional probabilities, which can be found using a state diagram and Eq. 67.

For any steady-state buffer operation of the type here, the state probabilities (Pn) are cyclo-stationary in time; that is, the P's are cyclic with the time intervals T. Therefore:

    P_n(sT)] = P_n((s-1)T)] = P_n((s-m)T)]    for m = 0, 1, 2, ...      (68a)

Consequently Eq. 68 appears as:

    P_n] = [T] P_n]                                              (68b)

In addition, for an M-long buffer, one has the equation:

    Σ_{n=0}^{M} P_n = 1                                          (69)

Equations 68b and 69 result in a set of M+2 equations in M+1 unknowns. Selecting any M equations from the matrix of (68b) and using Eq. 69, one can solve directly for the state probabilities as indicated in the appendix. This is the queueing-theory approach, then, to the steady-state buffer situation.

It should also be noted that use of the queueing-theory approach above allows one to solve for transients in the buffering situation, i. e., the probabilities of given states after s intervals, assuming that one started with a buffer of given state probabilities. In this case the more general equation is:

    P(sT)] = [T]^s P(0)]                                         (70)

where:
    P(sT)] = a column vector of state probabilities after s intervals
    P(0)] = a column vector of initial state probabilities
    [T] = the transition matrix.
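As a concrete illustration of Eqs. 68 through 70, the sketch below (not from the report) builds a transition matrix for the model assumed here — one sample removed at each clock pulse if available, Poisson arrivals per Eq. 67, and arrivals in excess of the capacity M frozen out — and then solves for the steady-state and transient state probabilities. The buffer length and rate are the illustrative values M = 16 and u = 0.8; the construction of [T] is an assumption consistent with the text, not a formula quoted from it.

```python
import math
import numpy as np

def poisson(k, u):
    return math.exp(-u) * u**k / math.factorial(k)

def transition_matrix(M, u):
    """[T] of Eq. 68: entry (i, j) is the probability of i samples just before the
    next clock pulse, given j samples just before the present pulse."""
    T = np.zeros((M + 1, M + 1))
    for j in range(M + 1):
        after_removal = max(j - 1, 0)              # one sample dumped, if available
        for i in range(M + 1):
            k = i - after_removal                  # arrivals needed to reach state i
            if k < 0:
                continue
            if i < M:
                T[i, j] = poisson(k, u)
            else:                                  # state M absorbs all larger arrival counts
                T[i, j] = 1.0 - sum(poisson(n, u) for n in range(k))
    return T

M, u = 16, 0.8
T = transition_matrix(M, u)

# Steady state (Eqs. 68b and 69): P = [T] P with the probabilities summing to one.
A = np.vstack([T - np.eye(M + 1), np.ones(M + 1)])
b = np.zeros(M + 2); b[-1] = 1.0
P = np.linalg.lstsq(A, b, rcond=None)[0]
print("P0 (buffer empty just before a clock pulse):", round(float(P[0]), 4))

# Transient from an empty buffer (Eq. 70): P(sT)] = [T]^s P(0)].
P0 = np.zeros(M + 1); P0[0] = 1.0
print("state distribution after 5 intervals:", np.round(np.linalg.matrix_power(T, 5) @ P0, 4))
```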

3. 2 Specialized Approach

Although the queueing theory just described is more general, it is simpler to obtain the steady-state buffer conditions (which are of interest here) by a more specialized approach. Consider that a clock pulse occurs at every T interval, at which time an output is taken from the buffer. The technique used will be to write a recursion relation between the state probabilities just after a clock pulse and the state probabilities just before a clock pulse. The quantities of interest can then be related to these state probabilities. We will now portray this more specialized analysis of the buffering situation. The quantities of interest will be the freeze out fraction, analysis of freeze out runs, channel idle time, and the wait time distribution of the input samples.

The state probabilities describing the occupancy of the storage stages cannot approach stationarity after long operation, but will approach "cyclo-stationarity" (mentioned earlier). That is, the probability of a given state will be a function of the time that has elapsed since the last clock pulse. A way to analyze this queueing problem is to single out two sets of state probabilities. Let:

    α_k = probability that exactly k buffer stages are occupied just after a clock pulse; in other words, after one sample has been removed, if it was available;
    β_k = probability that exactly k buffer stages are occupied just before a clock pulse (β_k = P_k of Eqs. 68 and 69).

The definitions of these two particular distributions are illustrated in Fig. 29.

Fig. 29. Depiction of the probabilities α_k and β_k along a time axis. (The distributions of queue length immediately after each clock pulse all equal α_k; those immediately before each clock pulse all equal β_k; a sample, if available, is removed from the buffer at each clock pulse.)

Note that cyclo-stationarity means that the probability distribution of queue lengths for any instant of time within a clock interval will be the same from clock interval to clock interval (since there is no reason to favor a particular clock interval). Since one sample is removed from the buffer at each clock pulse, the following relations are valid: a = f3 +/3 ao 0o 1 ak k= k+1' k > 1 (71) With the help of the Poisson distribution of Eq. 67, it can be shown (see Appendix B) that: -u o =e a -U -U f = ue a +e a 1 U2 -U -U -U 02 2 e ao +ue a1 +e a2 etc. (72) Then 1 = a o- 0 = (1-e ) a a1 = e[ - ue uaO] eu[a - e ua ue U] = ao[e-u - 1] 2 -u -u -u 2 = - e a2 +ue a1 +e U2 U. U2 -U -u [eu] U2 a2 e — e a - ue a a1 — a (73) In general, by continuing this iteration, k n 1 < _ - ak = [eu-u- 1 k] ak-1 k2 n! = k-n (74) = 0 where: 5 = the Kronecker delta. 1,k Thus, for a given input rate, u samples per clock interval, one can find the probability of any queue length (or state) at the beginning of a sampling interval in terms of the probability ao (of zero length). If the queue length at the beginning of a clock interval is not permitted to exceed M-1 (implying that the buffer has a capacity of M samples), and since all ak s can be re80

related to α_0 by Eq. 74, α_0 can be determined from the following equation:

    Σ_{k=0}^{M-1} α_k = 1                                        (75)

since the sum of the α_k's encompasses all possibilities for queue length at that time. Each α_k is a simple multiple of α_0, and is found by multiplying α_0 by the appropriate factor. The distribution for queue lengths at the beginning of a clock interval is then determined.

3. 3 Freeze Out Fraction (FOF)

The values of α_0 found through use of the above procedures are substituted into a simple equation for freeze out fraction. As previously explained, freeze out occurs when there are M pieces in storage and more pieces arrive before one is "dumped" at the next clock pulse. The freeze out fraction is written as:

    FOF = 1 - (av. no. of samples emerging from buffer each clock interval)/(av. no. of samples entering buffer each clock interval)
        = 1 - (1 - e^{-u} α_0)/u = (u - 1 + e^{-u} α_0)/u        (76)

This expression leads to relations with FOF as a function of u and buffer capacity M, plotted as the family of curves of Fig. 30 and evaluated in the sketch below. In the design of a nonsynchronous multiplexed system the curves can be used to find the best compromise among the three parameters u (input rate), M (buffer capacity), and FOF (freeze out fraction). It is seen that the FOF drops appreciably as M is increased or u is decreased. A u of 0.8 is chosen as a suitable calculation point for the system evaluation (Section 5). Also, in that section a tolerable FOF of .05 percent (.0005) will be selected.
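The recursion of Eq. 74 and the normalization of Eq. 75 are straightforward to evaluate. The following sketch (not part of the report) computes α_0 and the freeze out fraction of Eq. 76 for u = 0.8 and several buffer lengths; for M = 16 it gives an FOF below the .05 percent figure quoted in Section 5.1.

```python
import math

def alpha_coefficients(M, u):
    """Coefficients c_k with alpha_k = c_k * alpha_0, from the recursion of Eq. 74."""
    c = [1.0]                                       # c_0 = 1
    for k in range(1, M):
        delta = 1.0 if k == 1 else 0.0              # Kronecker delta of Eq. 74
        value = (math.exp(u) - u - delta) * c[k - 1]
        value -= sum(u**n / math.factorial(n) * c[k - n] for n in range(2, k + 1))
        c.append(value)
    return c

def freeze_out_fraction(M, u):
    c = alpha_coefficients(M, u)
    alpha0 = 1.0 / sum(c)                           # normalization, Eq. 75
    return (u - 1.0 + math.exp(-u) * alpha0) / u    # Eq. 76

for M in (4, 8, 16):
    print(f"M={M:2d}, u=0.8:  FOF = {freeze_out_fraction(M, 0.8):.5f}")
```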

Fig. 30. Freeze out fraction versus number of buffer stages M, for input rates u from 0.5 to 0.9.

3. 4 Freeze Out Runs

Figure 30 evaluates the expected fraction of total samples (from all the sources) which will be frozen out. In addition to this number it is of interest to have some information on the distribution of rejected samples from a given (single) source. This is a difficult problem and has not been solved completely. However, the following probabilities have been solved, and bear on this problem. First, consider the distribution of freeze out "runs:"

let:
    P_r(c) = probability of a run of exactly c clock intervals (during which one or more samples from some source are frozen out during each interval).

It can be shown (see Appendix C) that:

    P_r(c) = (1 + u) e^{-u} [1 - e^{-u}(1 + u)]^{c-1}            (77)

Next, consider the probability of a given number of units frozen out, provided that "one or more" was frozen out in the previous interval. Let:

    P(d | f ≥ 1) = probability that exactly d units are frozen out, in a given interval, provided that one or more was frozen out in the previous interval; i. e., the interval of interest is within a run.

Then it is shown in Appendix C that:

    P(d | f ≥ 1) = e^{-u} u^{d+1}/(d+1)!                         (77a)

Table IV presents evaluations of P_r(c) and P(d | f ≥ 1) for c = 1 through 5 and d = 1 through 5, for u = 0.8.

    c or d    P_r(c), percent    P(d | f ≥ 1), percent
      1            80.9               80.88
      2            15.4               14.38
      3             3.0                3.53
      4             0.6                0.71
      5             0.1                0.50

    Table IV. Evaluations of P_r(c) and P(d | f ≥ 1).

The P_r(c) results indicate that samples arriving at least 5 intervals apart will be very unlikely to be members of the same freezeout run. With 10 or more channels participating, the probability of two consecutive freeze outs from the same channel becomes very small.
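The run-length distribution of Eq. 77 is geometric in form, so its evaluation and its mean are immediate. The sketch below (not from the report) computes P_r(c) at u = 0.8; the values closely match the P_r(c) column of Table IV.

```python
import math

def run_length_prob(c, u):
    """Eq. 77: probability that a freeze out run lasts exactly c clock intervals."""
    a = (1.0 + u) * math.exp(-u)        # probability that a run ends after a given interval
    return a * (1.0 - a) ** (c - 1)

u = 0.8
for c in range(1, 6):
    print(f"c={c}:  Pr(c) = {100 * run_length_prob(c, u):.2f} percent")

# Mean run length follows from the geometric form of Eq. 77.
print("mean run length:", round(1.0 / ((1.0 + u) * math.exp(-u)), 3), "clock intervals")
```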

There are other possibilities for consecutive freeze out besides a single freeze out run — for example, two freeze out runs separated by a moderate number of "no-freeze out" intervals. However, the rapid fall in P_r(c) for increasing c is indicative of the trend for other possible realizations of two or more consecutive freeze outs from a given channel, so that for u = 0.8 and M > 10, little deviation from the average FOF rate is expected.

The significance of the P(d) results is that the number of lost samples during a freeze out run can be expected to be somewhat more than the number of clock intervals constituting the run, as is expected. One may expect that the ratio of frozen out pieces to clock intervals in the freeze out run will be less than 1.5.

3. 5 Starting Transients

When the system begins operation, the buffer will be empty and thus a starting transient can be expected. The most significant aspect of this transient would be the lack of any freeze out for a period of time. No detailed analysis has been made of starting transients because such transients would enhance rather than degrade the freeze out situation. Transients can be calculated using the matrix equation (70).

3. 6 Channel Idle Time

The objective of "channel idle time" is to estimate the fraction of clock intervals during which no samples are transmitted. Since the normalized average rate of entry of samples into the buffer is usually less than 1 (u = 0.8 is suggested), and since some samples are lost through freeze out, there will be a number of clock pulses occurring when the buffer is empty and nothing can be transmitted. The fractional quantity of such pulses is just β_0, the probability of the buffer being empty at the end of a clock interval. Thus:

    Fraction of unused channel slots = β_0 = e^{-u} α_0          (78)

Channel idle time is plotted in Fig. 31 as the fraction β_0 vs. M. The plot shows that minimization of channel idle time requires a high input rate, large buffer capacity, and consequently longer delays. These results are not unexpected; information theory has always recognized, particularly in the method of coding by large blocks, that the attainment of a smooth flow of information from a system whose input is irregular requires considerable delay and storage.
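The idle-time fraction of Eq. 78 can be evaluated with the same α_0 recursion used for the freeze out fraction. The sketch below (not from the report) tabulates it for a few rates and buffer lengths; at u = 0.8 it reproduces the roughly 20 percent idle time read from Fig. 31 for buffer lengths of 10 or more.

```python
import math

def alpha0(M, u):
    """alpha_0 from Eqs. 74 and 75 (probability of an empty buffer just after a clock pulse)."""
    c = [1.0]
    for k in range(1, M):
        delta = 1.0 if k == 1 else 0.0
        value = (math.exp(u) - u - delta) * c[k - 1]
        value -= sum(u**n / math.factorial(n) * c[k - n] for n in range(2, k + 1))
        c.append(value)
    return 1.0 / sum(c)

# Eq. 78: fraction of unused channel slots (channel idle time)
for u in (0.5, 0.8, 0.9):
    for M in (5, 10, 16):
        idle = math.exp(-u) * alpha0(M, u)
        print(f"u={u}, M={M:2d}:  idle fraction = {idle:.3f}")
```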

Fig. 31. Channel idle time versus buffer length, for u = 0.5 and u = 0.8.

In Fig. 31 it is seen that, for a u of 0.8, the idle time is about 20 percent for buffer lengths ≥ 10. Although this may seem inefficient, it must be remembered that in synchronous multiplexing the idle time is greater than 50 percent for telephone communicators (over radio and cable links). It will be seen, in Section 5, that asynchronous multiplexing combined with a speech coding (such as extremal coding) provides a more efficient multichannel system. Also, consideration should be given to the possibility of sending low-priority nonreal time data during such idle times.

3. 7 Wait Time Distribution

For the buffered system it is of interest to examine the distribution of waiting times for samples entering the buffer. Because of the cyclo-stationary pattern of buffer occupancy, this distribution will not be a smooth function. A figure is also provided in which the variation of waiting times from the expected values is plotted. The most important consideration, as far as time placement of samples is concerned, is not the average delay of a

message, but rather the unpredictable perturbation of individual samples from their correct relative positions in time. The noise thus generated must not be excessive. Figure 32 clarifies the measurement of waiting time for a sample entering at a fraction θ of a time interval T and encountering k pieces already in the buffer.

Derivation of Distribution

Consider the following notation:
    P(A, B) = joint probability of the events A and B.
    P(A | B) = probability of A conditional on B.

In the discussion of freeze out fraction, the following distribution for samples entering the buffer was used:

    p(k,u) = e^{-u} u^k / k!                                     (79)

where:
    p(k,u) = probability of k samples arriving during the interval T, when the average rate of arrival is u.

When considering "waiting times," it is necessary to break the time axis into intervals smaller than T. Assume θ is the fraction of T at which a particular sample enters, so that θ varies from 0 to 1. Assume further that the distribution of samples entering during a period θT remains Poisson. Then the following equation is applicable:

    p(k, θu) = e^{-θu} (θu)^k / k!                               (80)

where:
    p(k, θu) = probability of k samples arriving during the interval θT.

It is desired to find the waiting time probability density vs. waiting time, with the rate u and the buffer capacity M as parameters. To do this it is first necessary to find the (discrete) probability of waiting time with θ regarded as a constant. Then θ will be allowed to vary continuously (from 0 to 1) and a probability density will be defined. The technique used is to evaluate the probability of waiting time for the (k+1)th sample, assuming that it arrives at a time θT. First,

    P_θ(k) = Σ_{n=0}^{k} α_{k-n} (uθ)^n e^{-uθ} / n!             (81)

where:
    P_θ(k) = the probability that k samples are in the buffer immediately before the arrival of the (k+1)th sample.
    α_{k-n} = the probability that (k-n) samples are in the buffer at the very beginning of the T interval.

It may be noted that the wait time w for the (k+1)th sample, arriving at time θ, will be:

    w = k + (1 - θ)  clock intervals (T's)                       (82)

In order to find the probability of waiting time, it only remains to note that k may not exceed M, since additional arrivals (k ≥ M) will be frozen out. Thus one needs to evaluate a conditional probability — the probability of waiting time conditional on "no freeze out," i. e., conditional on our test sample getting into the buffer. This is given by the usual equation:

    P_θ(k | get in) = P_θ(k, get in) / P_θ(get in)               (83)

where:
    P_θ(k | get in) = probability of a sample encountering k samples already in the buffer, if it enters at time θ.
    P_θ(k, get in) = the joint probability of getting into the buffer and encountering k samples already there, at time θ; it is equal to P_θ(k), since only those values of k permitting entry into the buffer by a new arrival will be considered (k ≤ M-1).

    P_θ(get in) = Σ_{k=0}^{M-1} P_θ(k) = sum of probabilities of all possible values of k (at time θ) that do not prevent entry of a new sample.      (84)

Since P_θ(k) has been defined in Eq. 81, a simple substitution of Eqs. 81, 82 and 84 into Eq. 83 yields the desired "wait time" distribution:

    P_θ(w - 1 + θ | get in) = P_θ(k | get in) = [Σ_{n=0}^{k} α_{k-n} (uθ)^n e^{-uθ}/n!] / [Σ_{k=0}^{M-1} Σ_{n=0}^{k} α_{k-n} (uθ)^n e^{-uθ}/n!]      (85)

where:
    k = w - 1 + θ = an integer.
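Equation 85 is straightforward to evaluate numerically. The sketch below (not from the report) computes the conditional wait-time distribution for a sample entering at θ = 0.5, with u = 0.8 and M = 5; the α_k's come from the recursion of Eq. 74.

```python
import math

def alphas(M, u):
    """alpha_k, k = 0 ... M-1, from Eqs. 74 and 75."""
    c = [1.0]
    for k in range(1, M):
        delta = 1.0 if k == 1 else 0.0
        value = (math.exp(u) - u - delta) * c[k - 1]
        value -= sum(u**n / math.factorial(n) * c[k - n] for n in range(2, k + 1))
        c.append(value)
    a0 = 1.0 / sum(c)
    return [a0 * ck for ck in c]

def wait_time_distribution(theta, M, u):
    """Eq. 85: probability that a sample entering at fraction theta of a clock interval
    finds k samples ahead of it (waiting time w = k + 1 - theta), given that it gets in."""
    a = alphas(M, u)
    def p_theta(k):                                  # Eq. 81
        return sum(a[k - n] * (u * theta)**n * math.exp(-u * theta) / math.factorial(n)
                   for n in range(k + 1))
    get_in = sum(p_theta(k) for k in range(M))       # Eq. 84
    return [(k + 1 - theta, p_theta(k) / get_in) for k in range(M)]

for w, p in wait_time_distribution(theta=0.5, M=5, u=0.8):
    print(f"wait = {w:.1f} T:  probability = {p:.3f}")
```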

Fig. 32. Probability density curves of waiting time in buffer (input rate 0.8 arrivals/clock interval; M = number of storage slots in buffer; waiting time in multiples of the clock interval).

The distribution can be evaluated for one θ and all k's ≤ M-1, yielding the probability of M discrete waiting times. A probability density for continuous θ is, however, the physical problem. If the interval T in which θ is defined is considered to be of unit length (i. e., we measure time in T's), the formula (85) will give the probability density for the value of θ specified. If several sets of calculations for different values of θ are made, smooth curves may be drawn through the resulting points to give the smooth probability density curves. Figure 32 is a plot of these curves for M = 1, 2, 3, 5, and 10, and input rate 0.8 pieces/clock interval.

Figure 33 presents the same curves, but with each shifted by an amount equal to its mean waiting time. Means were graphically computed from Fig. 32. Since perturbation of waiting time from the mean value is of principal importance, Fig. 33 will be the more valuable. For buffers with capacity greater than 1, the most likely range of waiting times remains between 0 and 1, with emphasis toward 1. This situation is strongly implied by the

physical picture, in which a sample arriving just after a clock pulse is likely to encounter no samples at all in the buffer and can thus expect to wait almost 1 clock interval.

Fig. 33. Probability density curves of deviation of waiting time in buffer from its mean value (input rate 0.8 arrivals/clock interval; M = number of storage slots in buffer; the legend lists the mean waiting time, in T intervals, for each M).

The above graphs depicting the buffer performance will be referred to when evaluating system performance in Section 5.

4. RECONSTRUCTION OF WAVEFORMS

Before proceeding to evaluate the asynchronous system performance, using the above buffer analysis, we consider here the reconstruction of waveforms when the samples are irregular. It is noted that, with the extremal coding being considered here, one must reconstruct the waveform from extremely irregular samples for each source. First the theory for regular and irregular sampling is reviewed, and then comments about practical reconstructions will be given.

4.1 Theoretical Sampling Relations

The theory for uniform sampling is based on the familiar Shannon Sampling Theorem for bandlimited waveforms, where the waveform can be theoretically reconstructed using the relation:

    f(t) = Σ_{n=-∞}^{∞} f(n/2W) φ_n(t)                           (86)

where:
    φ_n(t) = (sin x)/x type composing function
    W = bandwidth of the bandlimited waveform.

Here the uniform samples are taken at times t = n/2W (n = 0, ±1, ±2, ...). The composing function is given by:

    φ_n(t) = sin 2πW(t - n/2W) / [2πW(t - n/2W)]                 (87)

For uniform sampling, the same composing function (except for a delay time) multiplies every sample. An ideal low-pass filter (with an appropriate delay) would reconstruct f(t) from the uniform samples.

There is similar theory available for nonuniform sampling (Ref. 33). One important difference for this nonuniform sampling theory is that one considers "energy-limited" signals, rather than "time-limited" signals (as is done in the Sampling Theorem). Noting this difference, the following theorem is stated in Ref. 33: "If the sample values at a finite

set of arbitrarily distributed sample points t = τ_p (p = 1, 2, ..., N), are given, a signal f(t) with no frequency component above W cps is defined uniquely under the condition that the 'energy' of the signal, ∫_{-∞}^{∞} f²(t) dt, is a minimum." Note that the exact duration of the waveform is not required. The reconstruction of the signal is given by:

    f(t) = Σ_{p=1}^{N} f(τ_p) ψ_p(t)                             (88)

where:
    ψ_p(t) = Σ_{q=1}^{N} a_{qp} sin 2πW(t - τ_q) / [2πW(t - τ_q)]

The coefficients a_{qp} are obtained from the inverse of a matrix, where the matrix has elements given by:

    sin 2πW(τ_p - τ_q) / [2πW(τ_p - τ_q)],    p, q = 1, 2, ..., N.      (89)

Although such nonuniform sampling theory is of potential interest for codings like extremal coding, it is not clear how it may be applied to extremal coding. We have no conclusions as to the application of this nonuniform theory (for our purposes), but simply note its existence here for future study.

4. 2 Reconstruction Methods

Here we wish to note two methods for reconstructing speech waveforms from the nonuniform samples of extremal coding. They are: (1) insert a standard interpolation function between samples, and (2) low-pass filter the "boxcarred" samples. Both of these have been experimentally tested, and shown to yield acceptable quality.

Interpolation Function Method

In general, the waveform between two arbitrary samples (a_i and a_{i+1}), using an interpolation function F(X), can be written:

    S(X) = a_i + (a_{i+1} - a_i) F(X),    X_{a_i} ≤ X ≤ X_{a_{i+1}}      (90)

where:
    F(X) = interpolation function
    F(X_{a_i}) = 0;  F(X_{a_{i+1}}) = 1.

Writing in terms of time,

    S(t) = a_i + (a_{i+1} - a_i) F[(t - t_i)/(t_{i+1} - t_i)],    t_i ≤ t ≤ t_{i+1}      (91)

where:
    F(t_i) = 0;  F(t_{i+1}) = 1.

A particular interpolation function experimented with extensively (Ref. 28) for extremal coding is of the type:

    F(X) = X²(3 - 2X)                                            (92)

This function provides a continuous first derivative at the sample points (extrema), and is found to produce good speech. In the work of Ref. 28 the experiments were done with a computer; in practice, one may have to devise an analog "inserter" to use the interpolation function approach.

Filtered Boxcar Method

A second method of reconstruction shown to give acceptable speech (Ref. 29) is to low-pass filter the "boxcarred" samples. This is certainly a relatively simple method. Its significance for our purposes is that the reconstruction of the waveform should not be a serious obstacle to implementing an asynchronous system with extremal coding.
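As an illustration of Eqs. 91 and 92, the sketch below (not from the report) reconstructs a waveform segment between extremum samples with the F(X) = X²(3 - 2X) interpolation. The sample times and amplitudes are made-up values for illustration, not data from Refs. 28 or 29.

```python
def smooth_interp(t, knots):
    """Reconstruct S(t) between successive extremum samples (t_i, a_i) using
    Eqs. 91 and 92: S(t) = a_i + (a_{i+1} - a_i) * F(X), with F(X) = X^2 (3 - 2X)."""
    for (t0, a0), (t1, a1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            x = (t - t0) / (t1 - t0)          # normalized position within the interval
            return a0 + (a1 - a0) * x * x * (3.0 - 2.0 * x)
    raise ValueError("t lies outside the span of the samples")

# illustrative extremum samples (times in milliseconds, amplitudes in quantizing units)
knots = [(0.0, 0.0), (0.7, 5.0), (1.6, -3.0), (2.4, 2.0)]
for t in (0.0, 0.35, 0.7, 1.15, 1.6, 2.0, 2.4):
    print(f"t = {t:4.2f} ms:  S(t) = {smooth_interp(t, knots):6.3f}")
```

Because F has zero slope at both ends of each interval, the reconstructed curve has a continuous first derivative at the extrema, which is the property noted above for Eq. 92.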

5. PERFORMANCE OF SYSTEM

It will be the objective here to choose sensible parameters and then evaluate the digit rate for asynchronous multiplexing when extremal coding is used on the individual speech sources. This rate will then be compared both to standard companded PCM, and to another efficient multichannel method. It should be remembered that the qualitative system operation was described in Section 2, and that Fig. 28 shows a block diagram of the transmitter side.

5. 1 Choice of Parameters

In the following calculations we will assume that the number of sources (N) to be handled by the multichannel system is 100. This number is chosen for the following reasons:

(1) With N of the order of 100 we are certain that the Poisson distribution (Eq. 67) is a good approximation for the "equivalent source" provided by the 100 sources (even considering that sources are active only half the time).

(2) With N ≥ 100, the T interval is sufficiently small that one need only send "time" information specifying the "number of buffer intervals (T) delayed." Also, the fixed delay, MT, improves as T is made smaller.

(3) An N of 100 is physically reasonable, since present Signal Corps capabilities go up to 96.

From the buffer analysis of Section 3, a u of 0.8 is chosen. This was chosen so as to incur a reasonable idle time (see Fig. 31) while at the same time getting very small FOF for reasonable sized buffers. For buffer length, an M of 16 was chosen. This requires 4 binary digits to specify the buffer delay time, and the corresponding FOF is less than .05 percent. Considering that the freeze out runs are distributed among the 100 sources, we feel that this parameter choice (M = 16, u = 0.8) is conservative.

The remaining parameters are selected with reference to the experimental tests

of extremal coding in Refs. 28 and 29. The parameters and explanatory comments are listed as follows:

(1) Q = 64 = quantizing levels for the extremum samples. It was found that, with this number of levels, the decoded speech was definitely comparable to 40,000 bits/second companded PCM.

(2) R' = 1500 samples/second = average rate of occurrence of extrema from an "on" source. Although Ref. 28 finds an R' of 1400, Ref. 29 uses 1600. We have chosen the compromise.

(3) R" = 750 samples/second = average rate of extrema, considering that each source is "off" 50 percent of the time. This is the rate we shall use in the following calculations. This factor should still allow us to envisage a Bernoulli model so that Eq. 67 is valid.

(4) Δt_M = .125 x 10^-4 seconds = necessary accuracy of location (in time) of the extremum. It was found that, when this was exceeded, a quaver occurred in the decoded speech (Ref. 28).

These parameters, and the preceding ones, will be used to calculate the digit rate.

5. 2 Digit Rate

Before calculating the digit rate it is necessary to determine the extent of time information required to stay within Δt_M (above). This is done by evaluating T, the length of the buffer output intervals (see Section 2). The T can be found using:

    T = u/(R"N) = 0.8/(7.50 x 10² x 1 x 10²) = .107 x 10^-4 seconds      (93)

where the parameter values described above have been used. Thus T is less than Δt_M for the given parameters. Consequently, one need only identify the number of T intervals by

which the sample was delayed (in the buffer). Thus, the time information requires only M levels (log2 M bits). Note also, with an M of 16, that the steady-state delay (MT) is only 171 microseconds. In a two-way speech conversation, this delay can be as great as half a second without discomfort.

We can now evaluate the digit rate for the proposed parameters. First, the following bits are required per sample:

    Quantity                        Bits Required
    Amplitude of Extremum                 6
    Source Number (N=100)                 7
    Time information                      4
                                         17 bits/sample          (94)

To convert this to digit rate in bits/second, one divides Eq. 94 by Eq. 93. The result is:

    R = digit rate = 17/(1.07 x 10^-5) = 1.59 x 10^6 bits/second      (95)

It may appear that 17 bits/sample is large since a synchronous multiplex (PCM with uniform sampling) requires only 6. However, it must be remembered that the extremum samples have an average rate of only one fifth of uniform sampling, and that the extrema occur only when a source is active. It will be seen that the balance of these items lies in favor of the asynchronous method.

5. 3 Comparisons

It is now of interest to estimate the quality associated with the asynchronous operation, and to compare the digit rate with existing and proposed alternative methods. Table V depicts the comparison of extremal coding for N=100 with time-multiplexed, companded PCM (and with log-differential PCM). We now discuss this comparison.

Based on the extremal coding listening tests of Ref. 28, it is concluded that the quality obtained with the digit rate of Eq. 95 is comparable to 40,000 bits/second, companded PCM. Using some curves in Ref. 34 (also reported in Ref. 35), 40,000 bits/second companded PCM results in about 21 db of quantization signal-to-noise ratio.3 (Note that this

3This quantization signal-to-noise ratio [(S/N)q] should not be confused with the channel S/N ratio of Part I, Section 4.2. The channel S/N (related to E/N0) is a measure of the channel disturbances, whereas (S/N)q is a measure of the disturbance due to digital modulation (quantization).

value is an estimated average over a 25-db signal level range.) The (S/N)q for a present military system (6 bits at 8 kc) is about 26 db (from the same curves). Consequently, it appears that the reduced digit rate of asynchronous multiplexing (Eq. 95) may incur about a 5 db reduction in (S/N)q; we feel this is quite acceptable. The digit rate is reduced by a factor of 3.

In Table V, the time-multiplexed, companded PCM (at a 48,000 bit/second rate per source) is used as the reference. It may be suggested that one might use "time assignment" (such as in TASI — see Section 1.1) on the time-multiplexed PCM signal and decrease the multichannel digit rate by a factor of 2. On the basis of a brief study of this, we concluded that the equipment complexity of this is much too great for radio or cable links (i. e., economic factors for radio links are not comparable to those for oceanic cables).

Another possibility for handling multichannel speech digitally consists of using a speech coding of the "delta-modulation" type (with uniform sampling). It is of interest to compare the asynchronous case with this type. One of the most efficient methods tested thus far (Ref. 35) is "Log-Differential PCM." Although this log-differential scheme was not tried on speech itself, experimental results using sine waves are reported in Ref. 35, in which it is essentially concluded that four-digit log-differential PCM sampled at 9600 cycles per second is about equivalent to six-digit companded PCM. Thus one can regard the four-digit PCM as effecting a saving of two digits per sample. Thus, for N=100, the digit rate would be 3.84 x 10^6 bits/second. This would provide an estimated (S/N)q of 26 db, as in the six-bit PCM case. Another case would be to use three-bit log-differential PCM; here the estimated (S/N)q is about 17 db. Each of these cases is noted in Table V.
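As a quick arithmetic check of the rates and ratios entered in Table V, the sketch below (not part of the report) recomputes the multichannel digit rates from the parameters of Sections 5.1 and 5.2; all of the numbers used are those stated in the text.

```python
# Parameters from Sections 5.1 and 5.2
N = 100                       # number of speech sources
u = 0.8                       # buffer input rate, samples per clock interval
R2 = 750.0                    # R'' = average extremum rate per source, samples/second
bits_per_sample = 6 + 7 + 4   # amplitude + source number + buffer-delay time (Eq. 94)

T = u / (R2 * N)              # clock interval, Eq. 93
R_async = bits_per_sample / T # asynchronous digit rate, Eq. 95

R_pcm = 100 * 6 * 8000        # 6-digit companded PCM, each source sampled at 8 kc
R_ld4 = 100 * 4 * 9600        # 4-bit log-differential PCM at 9600 samples/second
R_ld3 = 100 * 3 * 9600        # 3-bit log-differential PCM at 9600 samples/second

for name, rate in [("asynchronous extremal coding", R_async),
                   ("6-digit companded PCM", R_pcm),
                   ("4-bit log-differential PCM", R_ld4),
                   ("3-bit log-differential PCM", R_ld3)]:
    print(f"{name:30s}: {rate/1e6:5.2f} Mbit/s   PCM rate / this rate = {R_pcm/rate:.2f}")
```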

Table V. Comparison of three methods of digitally sending multichannel speech.

6-Digit Companded PCM (Time-Multiplexed)
    Bit rate for N=100: 4.8 x 10^6 bits/second
    Bit rate ratio (to PCM): 1
    Assumptions: Time-multiplexed PCM; each source sampled at 8 kc; companded.
    Quality estimate: Good military telephone quality; (S/N)q about 26 db.
    Remarks: This is our standard of comparison for the other two systems. Multiplex PCM is fairly complex in equipment.

Asynchronous Time-Multiplexing with Extremal Coding (Buffer Time Encoded)
    Bit rate for N=100: 1.59 x 10^6 bits/second
    Bit rate ratio (to PCM): 3
    Assumptions: 1500 samples/sec from each active source; each source "on" about one half the time; N = 100 = total number of channels; four digits used to encode time information, in the form of identifying the buffer delay for that sample; time need only be specified within a time interval T (T < .125 x 10^-4 sec); a freeze out rate of .05 percent is acceptable for extremal coding.
    Quality estimate: Quality is comparable to 4-digit, companded PCM sampled at 10 kc (based on listening tests in Ref. 28); (S/N)q about 21 db.
    Remarks: Automatic use is made of the fact that the channels are active about one half the time; thus, one obtains an automatic "time assignment." Such an asynchronous system permits an "elastic" overload, if required. Non-real time data may be put in the channel "idle time." Receiver extrapolation fitting can be expected to be complex in equipment, if a "boxcar" detection is not satisfactory. A device to sense extremes of a waveform would have to be developed. In total, this appears to be a good system to pursue.

4-Bit Log-Differential PCM (Time-Multiplexed)
    Bit rate for N=100: 3.84 x 10^6 bits/second
    Bit rate ratio (to PCM): 1.25
    Assumptions: Sine wave experimental tests of Ref. 35 are a good indicator for actual speech transmission; some method would be provided to prevent channel noise from seriously affecting the performance of the delta-modulation.
    Quality estimate: (S/N)q about 26 db, based on the sine wave experiments of Ref. 35.
    Remarks: Each channel requires a "differential encoder." The system offers the possibility of acceptable speech at "standard" transmission rates.

3-Bit Log-Differential PCM (Time-Multiplexed)
    Bit rate for N=100: 2.88 x 10^6 bits/second
    Bit rate ratio (to PCM): 1.67
    Assumptions: Same as for the 4-bit case.
    Quality estimate: (S/N)q about 17 db, based on the sine wave experiments of Ref. 35.
    Remarks: Same as for the 4-bit case.

6. CONCLUSIONS A comparison for the three systems —-time-multiplex PCM, asynchronously time — multiplexed extremal coding, and time-multiplexed log-differential PCM —-is shown in Table V. It is seen that the asynchronous system offers a substantial bit-rate improvement over that of time-multiplex PCM and log-differential PCM. The asynchronous system would consume only one third of the bandwidth (or corresponding improvement in channel S/N) of that of PCM. One cost of this improvement is about a 5-db loss in quantizing S/N ratio. It should be remembered that conservative figures were used throughout to obtain the estimated values. Further, extremal coding has been investigated by two groups and found to produce good quality speech. For a number of reasons we conclude that asynchronous multiplexing has great potential. We note the following: (1) Automatic use is made of the fact that the speech sources are "off" half the time. In other words, since the channel is only being used when an extreme occurs, it naturally follows that one automatically obtains the advantages of time assignment. (2) The extremal coding should not be as seriously disrupted by channel noise as the delta-modulation methods. (3) Nonreal time data may be conveniently sent during the idle time of the asynchronous system. Thus, with the system parameters used for Table V, one obtains a potential data capability of 20 percent of the total channel capability. Thus one obtains not only a more efficient multichannel speech system, but also the capability of sending irregular data. (4) The asynchronous method allows one to temporarily add more sources than designed for. The degradation in performance would be gradual (or "elastic"). This is not possible for any synchronous multiplexing system. 98

Probably the main obstacles to actual implementation of this asynchronous method are: (1) guaranteeing that the designed freeze out fraction is not noticeable; (2) assuring that time jitter of the extremal samples is kept within tolerance; and (3) developing suitable coding detectors if boxcarring-plus-filtering does not give sufficient quality. We feel that each of these obstacles can be handled with reasonable effort. In conclusion, it is believed that the advantage of (3) above over PCM, which appears from conservative calculations, and the other advantages listed above, are sufficient to justify serious feasibility studies of the asynchronous method. The next step could be either a simulation investigation, or an actual experimental set-up. Each of these appears to be quite extensive. The main objective of the above work has been to establish that a sufficient gain is available, so as to warrant extensive further efforts. 99

APPENDIX A: DETERMINATION OF THE MAXIMUM NUMBER OF BINARY DIGITS PER SECOND THAT CAN BE TRANSMITTED OVER A CHANNEL OF BANDWIDTH W (T. G. Birdsall) A. 1 Introduction There are several studies in the literature of minimum time bandwidth products which can be achieved for various waveforms. This discussion considers what appears to be the same problem of minimum time bandwidth product, but it differs from other studies sufficiently to justify performing the detailed work here explicitly, without reference to other material. Previous studies typically take two forms: (1) determining the minimum time bandwidth product for the duration of the signal when the bandwidth of the signal is measured on specific points of the waveform and spectrum (such as the 3-db points or the 6-db points of the spectrum); (2) placing very stringent conditions on one or the other. That is, the duration may be measured from the very start to the very end of the waveform but the bandwidth measured at some moderate point such as the 6-db point, or conversely, the spectrum may be limited stringently and the duration of the waveform measured at some moderate point. This is a reflection of the relationship between time and bandwidth for a single waveform. The spectrum of a definitely time-limited waveform is itself unlimited, and conversely a strictly bandlimited waveform cannot be strictly time-limited. The purpose of this section is to determine the limitations on the time rate of signalling for block codes such that there is no interblock interference in the demodulation process. The logic of the analysis is based on the assumption that the bandwidth limitation placed upon the communication system is very stringent, like that laid down by the FCC in an assignment of the channel. It is assumed that the communicator does not have control of the adjacent channel transmission; therefore, it is assumed that the bandwidth limitation is of the type that assumes that the energy spilled out of the band, if any, is some low figure like 40 db or 60 db. For this reason the calculation will be based on the mathematical assumption that the band is absolutely limited, and in practice the only energy spilled out of the band will be the result of the practical generation of signals which approximate to a high degree 100

the mathematically bandlimited signals (but this approximation is only good to 1 percent or 1/10 percent in the rms spectrum). The second assumption is that the reception will be sufficiently well-controlled that, in determination of the rate of signalling, we may consider the effective duration of a signal to be the quotient of the number of signals that are sent over some long time, divided by that long time; that is, the duration of the signal is the average amount of time that is assigned to any single signal. Specifically, this is shown in Eq. 96.

    T = lim_{N→∞} (duration of N symbols)/N                      (96)

The following three sections will deal with three specific cases of signalling. All of these assume that the transmission and reception are time synchronous. The first two will deal with RF phase coherent reception for two-polarity transmission, and orthogonal transmissions, and a third section will deal with phase incoherent reception and orthogonal transmission. In each section a maximum signalling rate, the maximum number of digits per second, will be determined under the given transmission and reception limitations, using the criterion that there will be no interblock interference caused by spillover of energy of one block of symbols to the next. This is important because the basic assumption in computations of information rate and error probabilities made in the analysis of block codes is that each block can be considered as an entity, without regard to previous or subsequent transmissions.

A. 2 Time Synchronous Block Codes, Phase Coherent Reception, Two-Polarity Transmission

The analysis in this section is the direct application of the Shannon sampling theorems, Ref. 10, for strictly bandlimited waveforms. If we consider any waveform, x(t), which is strictly bandlimited, this theorem tells us that, if the values of the waveform are specified at instants of time separated by equal intervals of 1/2W seconds apart, then the entire waveform has been specified. This is shown in Eq. 97.

    x(t) = Σ_{k=-∞}^{∞} x(k/2W) sin π(2Wt - k) / [π(2Wt - k)]    (97)

    I(t) = Σ_{k=1}^{N} (±v) sin π(2Wt - k) / [π(2Wt - k)]        (98)

The sampling functions, or "reconstruction functions" as they are often called, are commonly

known familiarly as (sin x)/x. Our interest here is in binary transmissions which can take on either polarity, plus or minus. Let us assume that the amplitude has been specified at the fixed level v. We could consider transmitting a waveform I(t) such as shown in Eq. 98, composed of N information digits, with no transmission preceding or following it. Since each of the reconstruction functions is itself strictly bandlimited to the band W, and in fact, uniformly occupying the entire bandwidth, then the transmitted waveform I(t) is also strictly bandlimited. In a strict sense the duration of this information waveform is infinite; however, if we specify some small amplitude and measure the duration out to the occurrence of this small amplitude, we will find that it occupies a duration somewhat longer than N/2W. As we increase the number of information digits that we consider to be transmitted, then the fraction of the duration that exceeds N/2W becomes vanishingly small, and the time duration that can be assigned to each single binary symbol is

    lim_{N→∞} (effective duration of I(t))/N = 1/2W              (99)

We have thus demonstrated theoretically a form of signalling for which the 2WT product is unity. We must check to see that there is no interblock interference in the demodulation period. If one were to sample the received waveform composed of the attenuated transmitted waveform, plus added noise, at the synchronous times corresponding to the sampling points as they were impressed on the signal, one would recover the information digit. The preceding and subsequent information digits would have no effect on this sampled value, since their reconstruction functions are going through zeros at this instant of time. In addition, one would have added to the information digit the instantaneous value of noise occurring at that time. Since the 2WT product of the information waveform is unity, no advantage is available by the use of crosscorrelators or matched filters, since the signal-to-noise ratio improvement factor for such devices is simply the 2WT product, here unity.

It is obvious that such a signalling system could not, in practice, use the (sin x)/x functions, but as a limitation it is still valid, since one could use actual functions close to (sin x)/x which would spill energy outside of the bandwidth to as small a degree as necessary. The price that would be paid is complexity of the modulation equipment and the delay incurred (which typically might be 10 or 20 signal durations). In most communication applications such delays are insignificant. This unity 2WT product might also be accomplished by other techniques than using (sin x)/x transmissions and encodement in time.
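The no-interference property at the synchronous sampling instants is easy to demonstrate numerically. The sketch below (not from the report) builds the waveform of Eq. 98 from a short string of two-polarity digits and samples it at t = k/2W; the bandwidth and the digit sequence are illustrative choices, not values from the text.

```python
import numpy as np

W = 1000.0                                             # band limit in cps (illustrative)
digits = np.array([+1, -1, -1, +1, +1, -1, +1, -1])    # binary digits, amplitude +/- v with v = 1
N = len(digits)

def waveform(t):
    """I(t) of Eq. 98: one (sin x)/x pulse per digit, pulses spaced 1/2W seconds apart."""
    # np.sinc(x) = sin(pi x)/(pi x), so sinc(2Wt - k) is the k-th reconstruction function
    k = np.arange(1, N + 1)
    return np.sum(digits * np.sinc(2.0 * W * np.asarray(t).reshape(-1, 1) - k), axis=1)

# Sampling at the synchronous instants t = k/2W recovers each digit essentially exactly:
# every other pulse passes through zero there, so there is no intersymbol interference.
sample_times = np.arange(1, N + 1) / (2.0 * W)
recovered = waveform(sample_times)
print("recovered samples:", np.round(recovered, 6))
print("all digits recovered:", np.array_equal(np.sign(recovered), digits))
```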

This unity 2WT product might also be accomplished by techniques other than (sin x)/x transmissions and encodement in time. However, if one attempts to get a smaller 2WT product than unity, an analysis of the sampling theorem demonstrates that intersymbol interference must occur, and no filtering technique will be available to remove such interference, since the waveform can contain at most 2W independent bits of information per second if the information is to be impressed on it by binary techniques.

A more general but more abstract consideration of this limitation can be obtained by viewing the vector space representation of a transmitted waveform. Let us consider a sufficiently long duration, T, such that 2WT is much larger than one. This is the signal space which is available for the transmission of information. We would like to find subspaces in this space, each subspace corresponding to a specific bit of information, such that these subspaces are mutually orthogonal, i.e., every vector in a subspace is orthogonal to any vector outside of the subspace. When we have the freedom to use two-polarity transmissions, since the information is binary, we can look for orthogonal co-ordinate systems in this large dimensional space and use each axis of the co-ordinate system for one bit of information. The value of the co-ordinate on each individual axis is chosen either plus or minus a specific quantity. There is an infinite variety of co-ordinate systems that one could lay down in the space, but by pure counting of the dimensionality of the space, any given orthogonal co-ordinate system contains exactly 2WT vectors. Thus the amount of information that can be transmitted is 2WT bits in a total duration of T seconds, and therefore one can transmit 2W binary symbols per second, with an average duration of 1/2W for each symbol.

Figure 34 shows the central portion of a (sin x)/x reconstruction function which might be used for maximum rate transmission.

Fig. 34. Portion of (sin x)/x curve suitable for maximum rate transmission (maximum ripple 21 percent of peak, -13.46 db).

Figure 35 shows a pulse function which is four times as long, that is, a 2WT product of 4, with the energy more than 50 db down outside the 2/W duration of the pulse. These are included to show that the prime engineering problems in obtaining short, individually time-concentrated signals are encountered only in the last factor of about four of effective symbol duration.
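The dimension count above rests on the mutual orthogonality of the shifted reconstruction functions. The sketch below (with an assumed bandwidth and integration grid chosen only for illustration) checks numerically that the pulses of Eq. 97 are orthogonal, so that each one furnishes an independent co-ordinate axis of the 2WT-dimensional signal space.

```python
import numpy as np

# Sketch: the shifted (sin x)/x reconstruction functions of Eq. 97 are mutually
# orthogonal; W and the integration grid are assumed, illustrative values.
W = 1.0
t = np.linspace(-200.0, 200.0, 400001)          # long, dense grid approximating the whole axis
dt = t[1] - t[0]

def phi(k, t):
    """k-th reconstruction function sin(pi(2Wt - k)) / (pi(2Wt - k))."""
    return np.sinc(2.0 * W * t - k)

# Gram matrix of the first few pulses; it should be close to (1/2W) times the identity.
K = 6
gram = np.array([[np.sum(phi(i, t) * phi(j, t)) * dt for j in range(K)]
                 for i in range(K)])
print(np.round(gram * 2.0 * W, 3))              # approximately the identity matrix
```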

A. 3 Time Synchronous Blocks, Phase Coherent Reception with Orthogonal Transmission

We can attack the study of orthogonal transmissions in two ways: first, by demonstrating a possible transmission method very similar to that of Section A. 2, and second, by establishing a definite proof of the maximum signalling rate by using the vector space representation. It is obvious that if one considers a form of binary pulse position modulation wherein the time axis is considered in pairs of points, all points separated 1/2W seconds apart and each pair therefore occupying 1/W seconds, the orthogonal binary encodement can be accomplished by sending a (sin x)/x pulse centered either at the first time of the pair or at the second time of the pair. These two possible waveforms are orthogonal to each other, as are all (sin x)/x waveforms separated by integral multiples of the reciprocal of twice their bandwidth. Thus it is seen that it is possible to transmit orthogonal signals at a rate of W digits per second. In order to see that this is the maximum rate possible, we appeal to the vector space representation. Here a long duration, T, of transmission is considered, which can be effectively represented in a 2WT-dimensional vector space. There are exactly 2WT vectors in every orthogonal complete set, and if we divide this space into orthogonal subspaces and require that each subspace contain two orthogonal vectors, then there will be at most 2WT divided by 2, or WT, subspaces. Therefore the maximum rate of signalling corresponds to the number of subspaces possible over a long time, divided by that time: W subspaces per second.
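The following sketch makes the pulse-position argument concrete under assumed values of W and of the bit pattern: a (sin x)/x pulse is placed at the first or second point of each 1/W-second pair, and a coherent correlation receiver separates the two positions without interference, giving W binary decisions per second.

```python
import numpy as np

# Sketch of the binary pulse-position scheme of Section A.3.  The bandwidth,
# the bit pattern, and the time grid are illustrative assumptions only.
W = 1.0
bits = np.array([0, 1, 1, 0, 1])
t = np.linspace(-50.0, 60.0, 220001)
dt = t[1] - t[0]

def pulse(center, t):
    return np.sinc(2.0 * W * (t - center))

# Pair m occupies [m/W, (m+1)/W); bit 0 -> pulse at the first point, bit 1 -> the second.
centers = np.array([m / W + b / (2.0 * W) for m, b in enumerate(bits)])
signal = sum(pulse(c, t) for c in centers)

decoded = []
for m in range(len(bits)):
    c0 = np.sum(signal * pulse(m / W, t)) * dt                   # correlate with position 0
    c1 = np.sum(signal * pulse(m / W + 1 / (2 * W), t)) * dt     # correlate with position 1
    decoded.append(0 if c0 > c1 else 1)
print(decoded == bits.tolist())   # True: W binary decisions per second, no interference
```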

This means a maximum of W binary orthogonal signals per second.

Fig. 35. More practical curve (2TW = 4) for maximum rate transmission (maximum ripple 0.308 percent of peak, a little over -50 db).

Although the above reasoning may appear quite geometric or pictorial, it is based on the rigorous logic of the requirement that the total transmitted waveform be composed so that the demodulation of a given symbol is not affected by the information bit or modulation of previous and subsequent symbols. In the vector space representation this corresponds to a requirement that every possible component in one subspace be orthogonal to every other possible component in the other subspaces.

A. 4 Time Synchronous Blocks, RF Phase Incoherent Reception with Orthogonal Transmissions

In dealing with RF incoherent demodulation one does not have the freedom to draw a direct analogy between an RF waveform and a low-pass waveform, as was done in the previous two sections. Therefore this discussion will be limited entirely to the vector space representation of the transmission and demodulation process.

With RF coherent demodulation one can represent each single waveform, and in particular each pure sinusoidal waveform, as occupying a one-dimensional subspace, that is, a single vector in the space. If one considers incoherent demodulation, then the demodulator will be affected similarly by the reception of any phase of a pure sine wave component, and indeed its response can be shown to be a function of the "radius" in a two-dimensional subspace. This is customarily shown in studies of narrowband transmissions and noise (where "narrowband" means that the total bandwidth of the transmission is small compared to the center frequency, a condition always met in conventional communications). If the part of the waveform carrying one bit of information is decomposed into two phase coherent components, the first coherent with a given phase of the center frequency and the second at the orthogonal phase of the center frequency, as shown in Eq. 100,

$$x(t) = x_1(t) \cos \omega t + x_2(t) \sin \omega t \qquad (100)$$

then one can transform to variables similar to polar co-ordinates, by use of the formula for the cosine of the sum of two angles, to obtain

$$x(t) = \sqrt{x_1^2(t) + x_2^2(t)} \; \cos\left[\omega t + \theta(t)\right] \qquad (101)$$

This is often simply written as

$$x(t) = R(t) \cos\left[\omega t + \theta(t)\right] \qquad (102)$$

where the radius R, or envelope R, is the square root of the sum of the squares of the two phase coherent components. A time synchronous, but RF phase incoherent, demodulator is one which is sensitive to the value of R and insensitive to the individual phase components x_1 and x_2; that is, it is insensitive to the phase angle θ. If one wants to signal using this type of reception, one has to choose two waveforms which will be distinguishable at the receiver. The usual means of accomplishing this is to choose two different frequencies. In the vector space this is represented by assigning one two-dimensional hyperplane to one of the signals and a second, orthogonal hyperplane to the other signal. Thus each pair of binary signals consumes four dimensions, and the total number of such signalling subspaces that can be utilized in a long transmission of duration T seconds is 2WT divided by four, yielding an average rate of W/2 signalling subspaces per second.
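A minimal sketch of the envelope operation described above, using assumed frequencies, symbol duration, and sample rate: for each candidate frequency the receiver forms the two phase coherent components x1 and x2 and takes R = sqrt(x1^2 + x2^2), which is insensitive to the unknown RF phase, so the decision between the two frequencies does not require phase coherence.

```python
import numpy as np

# Sketch of the incoherent decision of Section A.4.  The frequencies, duration,
# and sample rate are assumed, illustrative values.
fs, T = 10000.0, 0.1                       # sample rate and symbol duration
f0, f1 = 1000.0, 1010.0                    # the two orthogonal signalling frequencies
t = np.arange(0.0, T, 1.0 / fs)

def envelope(sig, f):
    x1 = np.mean(sig * np.cos(2 * np.pi * f * t))   # in-phase component
    x2 = np.mean(sig * np.sin(2 * np.pi * f * t))   # quadrature component
    return np.hypot(x1, x2)                         # the "radius" R

rng = np.random.default_rng(1)
for _ in range(5):
    phase = rng.uniform(0, 2 * np.pi)               # unknown RF phase
    sent = rng.integers(2)                          # 0 -> f0, 1 -> f1
    sig = np.cos(2 * np.pi * (f1 if sent else f0) * t + phase)
    decided = int(envelope(sig, f1) > envelope(sig, f0))
    print(sent, decided, decided == sent)           # the envelopes decide correctly
```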

A. 5 Summary

This section has attempted to demonstrate limitations on the maximum signalling rate which can be used for binary transmissions where (1) the channel assignment is strict; (2) the communicator has no control over adjacent transmissions; and (3) the signalling waveforms need not be individually limited to short times, but merely concentrated in time, and must cause no intersymbol interference despite this lack of strict time limitation. The results are a direct consequence of the strict band limitation, as expressed in the Shannon sampling theorem. The method of deriving the results has been to consider the average rate of formation of allowable independent orthogonal subspaces over long durations of transmission. The ultimate limitations that are derived are in terms of the familiar 2WT product; for the three cases, the 2WT values take on the minimum values of 1, 2, and 4. In actual practice one will find that the difficulty of obtaining improvement increases steeply as one gets within a factor of about 4 of these limiting values. In these cases very stringent specifications must be applied to the precision of the time synchronization and the precision of formation of the actual signalling waveforms. Thus these limitations are seen to be valid; although they are physically approachable, they may not be viewed as practical goals by many users.

APPENDIX B: THE EQUIVALENT SOURCE (R. A. Carlsen)

INTRODUCTION

In an asynchronous time-multiplexed communication system, N sources feed samples into a clocked buffer (Fig. 36). The purpose of this appendix is to show the relationship between the individual sources and the equivalent source indicated in the figure. It turns out that the equivalent source is Bernoulli (i.e., that the number of samples "generated" in an interval τ is Poisson-distributed) when a large number of independent sources is assumed.

In the first section, we define a model for the single source. Then, on the basis of this model, the probability of exactly k samples in an interval τ is determined for the single source. This result is then used to determine, in Section B. 2, the probability of k samples in the interval τ, given a large number of independent single sources, which becomes, in the limit, the Bernoulli equivalent source. The discussion is summarized, in Section B. 3, by considering an alternative approach which leads to the same result; that is, the N sources are modeled as trials in a Bernoulli experiment.

The approach presented in the first two sections is instructive for at least two reasons. First, the end result is obtained by progressing from the single source, whose characteristics are known, to the equivalent source; second, some insight is gained into the problems encountered when N, the number of sources, is restricted to small values.

B. 1 The Single Source

The distribution of the intervals between extremes has been experimentally determined by Mathews.4 However, this distribution alone is not an adequate description of the statistics of the speech waveform. For example, we have no knowledge of the correlation between successive intervals. It is reasonable, therefore, to fill in the missing data by assumption (the procedure which is always followed in the construction of an analytical model).

4 See Ref. 28 and Fig. 37.

Perhaps one might question the wisdom of defining a model for the speech process. At least a partial answer to this objection is the advantage which can be gained by knowing the precise characterization of the process which has been substituted for the original source. For example, the speech source is modeled here as a stationary, simple Markoff process.

As a first step in defining a model for the speech process, stationarity is assumed. The individual speech source is obviously nonstationary, as the speaker is silent approximately half the time. Perhaps we can think of a single source as being equivalent to two system users, one occupying the silent periods of the other. Second, the distribution of the intervals between extremes will be assumed to be of the form

$$f(u) = \lambda^2 u \, e^{-\lambda u} \qquad (103)$$

where λ is the average sample generation rate for a single source and 1/λ is the mean interval between extremes. This function is a reasonable approximation of the experimental data and is mathematically convenient. The function of Eq. 103 is called the Erlang type-1 distribution and is obtained by the convolution of two exponential density functions. An exponential source, f(u) = λ e^{-λu}, means simply that the intervals between samples are exponentially distributed.
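A brief sketch of the assumed interval model: an Erlang type-1 interval is generated as the sum of two independent exponential intervals, and a histogram of simulated intervals is compared with the density of Eq. 103. The rate and sample size are illustrative assumptions only.

```python
import numpy as np

# Illustrative sketch of Eq. 103: an Erlang type-1 interval is the sum of two
# independent exponential intervals, f(u) = lambda^2 * u * exp(-lambda * u).
# The rate lam and the sample size are assumed values for this illustration.
lam = 2500.0
n = 200_000
rng = np.random.default_rng(0)
u = rng.exponential(1.0 / lam, n) + rng.exponential(1.0 / lam, n)

edges = np.linspace(0.0, 10.0 / lam, 101)
hist, _ = np.histogram(u, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
model = lam**2 * centers * np.exp(-lam * centers)
print(round(float(np.max(np.abs(hist - model)) / np.max(model)), 3))   # close to zero
```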

Fig. 37. Notation for f1(x/ξ).

Fig. 38. Notation for f(ξ).

For example, an Erlang type-1 source can be obtained by removing every other sample from an exponential source. It is clear that the extremes of the speech waveform are the zero-crossings of the derivative of the original waveform. Therefore the distribution of the intervals between the extremes will be referred to as the "zero-crossing distribution," with the understanding that the derivative of the speech signal is being discussed.

In addition to stationarity, it is assumed that the intervals between zero-crossings are independent. Assuming independence between intervals, together with the zero-crossing distribution of Eq. 103, suggests that one might model the speech source as a simple Markoff process.5 Although this additional assumption is not utilized in this appendix, it does establish a firm basis for additional analytical work. For example, Faureau6 has shown experimentally that the output of a three-pole low-pass filter with a white Gaussian input has a zero-crossing distribution of the form assumed here for the speech process.7 Thus, once the zero-crossing distribution for bandlimited Gaussian noise is solved, an analytical solution based upon a Markoff model would be an approximate solution for the current problem and an exact solution for a modified application of the system (e.g., the transmission of pseudo-noise coded signals).

5 Middleton, Ref. 39, pp. 45-48.
6 Ref. 40, and Fig. 38.
7 In fact, McFadden (Ref. 41) has shown that the intervals between zero-crossings are nearly independent for the three-pole low-pass filter.

When the simple Markoff model and an Erlang type-1 zero-crossing distribution have been specified, the next step is to determine the statistical properties of the source. If the interval between zero-crossings, u, is normalized with respect to the mean interval between zeros (1/λ), the assumed zero-crossing density function becomes

$$f(u) = u \, e^{-u} \qquad (104)$$

where u is the normalized interval between zeros. Given a point of observation ξ relative to the last zero-crossing, the probability of the next zero-crossing in the range x, x + dx (where x is the elapsed time since the point of observation) is given by

$$f_1(x/\xi) = \frac{f(\xi + x)}{\int_\xi^\infty f(u)\, du} \qquad (105)$$

where f1(x/ξ) is the probability density of a zero-crossing, given ξ since the last zero (see Fig. 37). This is a conditional zero-crossing density which, for the zero-crossing density assumed in Eq. 104, is given by

$$f_1(x/\xi) = \frac{(\xi + x)\, e^{-x}}{\xi + 1} \qquad (106)$$

One can easily show that for an exponential source, f(u) = e^{-u}, the above conditional density function f1(x/ξ) is still exponential. That is to say, for an exponential source the distribution of the intervals between an arbitrary point of observation and the next zero-crossing is equal to the distribution of the intervals between consecutive zero-crossings. This is equivalent to saying that the probability of a zero-crossing from an exponential source in an incremental interval du is the same for every interval and independent from interval to interval. Since the zero-crossing density function is independent of the point of observation for the exponential density function, the probability of k zeros in an interval τ is Poisson-distributed (i.e., the single source is Bernoulli). Furthermore, N independent exponential sources in parallel are equivalent to a single exponential source with a generation rate N times that for a single source. As a result, an equivalent Bernoulli source is obtained when a finite number of independent exponential sources is assumed.
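The closed form of Eq. 106 can be checked directly against the defining ratio of Eq. 105, as in the following sketch (the values of ξ and the integration grid are assumed for illustration).

```python
import numpy as np

# Sketch checking Eqs. 104-106: with the normalized density f(u) = u e^{-u},
# the conditional next-crossing density given elapsed time xi since the last
# zero is f1(x/xi) = (xi + x) e^{-x} / (xi + 1).  The xi values are illustrative.
def f(u):
    return u * np.exp(-u)            # Eq. 104

x = np.linspace(0.0, 40.0, 400001)
dx = x[1] - x[0]

for xi in (0.0, 0.5, 2.0):
    tail = np.sum(f(xi + x)) * dx                     # integral of f from xi to infinity
    eq105 = f(xi + x) / tail                          # Eq. 105: renormalized tail density
    eq106 = (xi + x) * np.exp(-x) / (xi + 1.0)        # Eq. 106: closed form
    print(xi,
          round(float(np.max(np.abs(eq105 - eq106))), 6),   # the two forms agree
          round(float(np.sum(eq106) * dx), 6))              # and integrate to 1
```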

Fig. 39. Notation for P'(k, τ/ξ).

Fig. 40. Notation for P(k, τ/ξ1, ξ2, ..., ξN).

However, the assumption that the zero-crossing density function is exponential is a poor approximation of the experimental data (Ref. 28). The objective is to obtain the probability, P'(k, τ/ξ), of k zero-crossings in an arbitrary interval τ, given ξ. It turns out, as indicated in Section B. 2, that the expected value of P'(k, τ/ξ) is of interest. However, it is first necessary to determine the probability density function, f(ξ), for the intervals ξ. Suppose (as in Fig. 38) that the interval between two zeros is known to be u. The point of observation will always fall between two such zero-crossings. Now, the point of observation ξ is selected at random (i.e., each interval dξ between the indicated zeros is equally likely); the probability of starting the observation between ξ and ξ + dξ, given u, is

$$f(\xi/u)\, d\xi = \frac{1}{u}\, d\xi \qquad (107)$$

as the point of observation is uniformly distributed between 0 and u. However, the interval u is a random variable having the density function given by Eq. 104. Therefore,

$$f(\xi)\, d\xi = \int_\xi^\infty f(\xi/u)\, f(u)\, du \; d\xi, \qquad f(\xi) = e^{-\xi} \qquad (108)$$

Now, one finds P'(k, τ/ξ) by finding the probability of k zeros in the interval τ (see Fig. 39).
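As a numerical check of Eq. 108 under the model's assumptions, the integral of f(ξ/u) f(u) over u from ξ to infinity can be evaluated on a grid and compared with e^{-ξ}; the sketch below uses illustrative ξ values.

```python
import numpy as np

# Sketch of the integral in Eq. 108: the elapsed-time density is
# f(xi) = integral from xi to infinity of (1/u) * u e^{-u} du = e^{-xi}.
# The xi values and grid are illustrative.
for xi in (0.25, 0.5, 1.0, 2.0):
    u = np.linspace(xi, xi + 60.0, 600001)
    du = u[1] - u[0]
    integrand = (1.0 / u) * (u * np.exp(-u))          # f(xi/u) * f(u)
    val = np.sum(integrand) * du
    print(xi, round(float(val), 4), round(float(np.exp(-xi)), 4))   # the two agree
```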

This is determined by knowing the probability density for the intervals between every second, third, ... zero-crossing. Since the intervals between zeros are independent, the desired density functions are obtained by convolution:

$$f_2(x/\xi) = f_1(x/\xi) * f_1(x/0), \quad f_3(x/\xi) = f_1(x/\xi) * f_2(x/0), \quad \ldots, \quad f_n(x/\xi) = f_1(x/\xi) * f_{n-1}(x/0) \qquad (108a)$$

where f_n(x/ξ) dx is the probability of the nth zero-crossing in the interval x, x + dx, given ξ since the last zero, and f_n(x/0) dx is the probability of the nth zero-crossing in the interval x, x + dx, given an initial zero-crossing. The probability that fewer than n zero-crossings occur is obtained by integrating these density functions for x greater than τ:

$$P'(k < n, \tau/\xi) = \int_\tau^\infty f_n(x/\xi)\, dx \qquad (108b)$$

That is, we obtain the probability that the nth zero has not occurred. For n = 1 in the above equation we have the probability of no zero in the interval τ. The probability of n or more zeros is therefore

$$P'(k \geq n, \tau/\xi) = \int_0^\tau f_n(x/\xi)\, dx \qquad (108c)$$

Thus, the probability of exactly k = n zeros in the interval is given by

$$P'(k = n, \tau/\xi) = P'(k < n+1, \tau/\xi) - P'(k < n, \tau/\xi) \qquad (108d)$$

These results agree with McFadden's equivalent expression for P(k = n, τ/0).9

9 J. McFadden, Ref. 41, Appendix I.

These computations can be simplified by taking advantage of the following relationship:

$$P'(k, \tau/\xi) = \int_0^\tau f_1(x/\xi)\, P'(k-1, \tau - x/0)\, dx = \int_0^\tau f_1(\tau - x/\xi)\, P'(k-1, x/0)\, dx \qquad (109)$$

The prime identifies the result with a single source. The probability of no zeros in the interval τ, given ξ, is

$$P'(0, \tau/\xi) = \int_\tau^\infty \frac{(\xi + x)\, e^{-x}}{\xi + 1}\, dx \;\; \text{(see Eq. 106)} \; = \; \frac{\xi + \tau + 1}{\xi + 1}\, e^{-\tau} \qquad (110)$$

P'(k, τ/0) is of particular interest because, in Eq. 109, the point of observation for the remaining crossings is always at a zero-crossing. One can show, by convolution, that

$$P'(k, \tau/0) = e^{-\tau}\left[\frac{\tau^{2k}}{(2k)!} + \frac{\tau^{2k+1}}{(2k+1)!}\right] \qquad (111)$$

The expected value of P'(k, τ/ξ) is

$$P'(k, \tau) = \int_0^\infty P'(k, \tau/\xi)\, f(\xi)\, d\xi \qquad (112)$$

or

$$P'(k, \tau) = \int_0^\tau \int_0^\infty f_1(\tau - x/\xi)\, f(\xi)\, d\xi \;\, P'(k-1, x/0)\, dx \qquad (113)$$

P'(k, τ) is the probability obtained by measuring across an ensemble of sources or from a long observation of a single source (an ergodic source is assumed). The computations are summarized by

$$P'(0, \tau) = e^{-\tau}\left[1 + \gamma\tau\right]$$

$$P'(1, \tau) = e^{-\tau}\left[\tau + \frac{\tau^2}{2!}\right] + \gamma e^{-\tau}\left[\frac{\tau^3}{3!} - \tau\right]$$

$$P'(2, \tau) = e^{-\tau}\left[\frac{\tau^3}{3!} + \frac{\tau^4}{4!}\right] + \gamma e^{-\tau}\left[\frac{\tau^5}{5!} - \frac{\tau^3}{3!}\right]$$

$$P'(k, \tau) = e^{-\tau}\left[\frac{\tau^{2k-1}}{(2k-1)!} + \frac{\tau^{2k}}{(2k)!}\right] + \gamma e^{-\tau}\left[\frac{\tau^{2k+1}}{(2k+1)!} - \frac{\tau^{2k-1}}{(2k-1)!}\right] \qquad (114)$$

where

$$\gamma = \int_0^\infty \frac{e^{-\xi}}{\xi + 1}\, d\xi \approx 0.62$$
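The following sketch evaluates γ by numerical integration and the probabilities of Eq. 114 for a few illustrative values of τ; it checks that the P'(k, τ) sum to one and that, for small τ, the mean count approaches (1 - γ)τ, anticipating Eq. 115.

```python
import numpy as np
from math import factorial, exp

# Sketch of Eq. 114: single-source probabilities P'(k, tau).  The tau values
# and integration grid are illustrative.
xi = np.linspace(0.0, 80.0, 800001)
gamma = float(np.sum(np.exp(-xi) / (xi + 1.0)) * (xi[1] - xi[0]))   # the constant of Eq. 114

def p_single(k, tau):
    """P'(k, tau) of Eq. 114 (k = 0 handled separately)."""
    if k == 0:
        return exp(-tau) * (1.0 + gamma * tau)
    a = tau**(2*k - 1) / factorial(2*k - 1) + tau**(2*k) / factorial(2*k)
    b = tau**(2*k + 1) / factorial(2*k + 1) - tau**(2*k - 1) / factorial(2*k - 1)
    return exp(-tau) * (a + gamma * b)

for tau in (0.4, 0.8, 2.0):
    total = sum(p_single(k, tau) for k in range(60))
    mean = sum(k * p_single(k, tau) for k in range(60))
    print(tau, round(total, 6), round(mean, 4), round((1 - gamma) * tau, 4))
    # the probabilities sum to 1; for small tau the mean is close to (1 - gamma) * tau
```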

The above probabilities satisfy

$$\sum_{k=0}^{\infty} P'(k, \tau) = 1, \qquad P'(k, \tau) \geq 0 \;\text{ for all } k \text{ and } \tau$$

$$\lim_{\tau \to \infty} P'(k, \tau) = 0 \;\text{ for all } k, \qquad \lim_{\tau \to 0} P'(k, \tau) = \begin{cases} 1, & k = 0 \\ 0, & k \neq 0 \end{cases}$$

The expected number of zero-crossings, k̄, in an interval τ is

$$u' = \sum_{k=0}^{\infty} k\, P'(k, \tau) = e^{-\tau}\left[\tau + \frac{\tau^2}{2!} + 2\left(\frac{\tau^3}{3!} + \frac{\tau^4}{4!}\right) + 3\left(\frac{\tau^5}{5!} + \frac{\tau^6}{6!}\right) + \cdots\right] - \gamma e^{-\tau}\left[\tau + \frac{\tau^3}{3!} + \frac{\tau^5}{5!} + \cdots\right]$$

which for small τ becomes

$$u' = (1 - \gamma)\,\tau \qquad (115)$$

All the above results have been obtained using the normalized zero-crossing distribution. Therefore, τ is replaced by λτ in all the above equations for the actual source having a generation rate of λ samples per unit time.

B. 2 The Multiple Source

Since it is the multiple source which is of interest, consider the N identical (but independent) sources shown in Fig. 40. We want to determine the probability of k samples from the equivalent source in the interval τ, given the intervals ξ1, ξ2, ..., ξN. The point of observation is just prior to a clock pulse, and the interval τ ends just prior to the next clock pulse.

The probability of having no zero-crossings from the equivalent source during the interval τ is

$$P(0, \tau/\xi_1, \xi_2, \ldots, \xi_N) = P'(0, \tau/\xi_1)\, P'(0, \tau/\xi_2) \cdots P'(0, \tau/\xi_N) \qquad (116)$$

In the same way, the probability of just one zero-crossing from the equivalent source is

$$P(1, \tau/\xi_1, \xi_2, \ldots, \xi_N) = P'(1, \tau/\xi_1)\, P'(0, \tau/\xi_2) \cdots P'(0, \tau/\xi_N) + P'(0, \tau/\xi_1)\, P'(1, \tau/\xi_2) \cdots P'(0, \tau/\xi_N) + \cdots + P'(0, \tau/\xi_1)\, P'(0, \tau/\xi_2) \cdots P'(1, \tau/\xi_N) \qquad (117)$$

For larger values of k, similar expressions are obtained by considering all the possible combinations of N channel outputs which will produce k zero-crossings. If the number of sources, N, is large enough to justify an ensemble average for P(k, τ/ξ1, ξ2, ..., ξN), one can take advantage of the results of the previous section, Eq. 114. For example, the expected value of P(0, τ/ξ1, ξ2, ..., ξN) is

$$P(0, \tau) = \int_0^\infty \int_0^\infty \cdots \int_0^\infty P'(0, \tau/\xi_1)\, P'(0, \tau/\xi_2) \cdots P'(0, \tau/\xi_N)\; f(\xi_1, \xi_2, \ldots, \xi_N)\; d\xi_1\, d\xi_2 \cdots d\xi_N \qquad (118)$$

Since the sources are assumed to be independent, the joint distribution for the intervals ξ1, ξ2, ..., ξN can be written as

$$f(\xi_1, \xi_2, \ldots, \xi_N) = f(\xi_1)\, f(\xi_2) \cdots f(\xi_N) \qquad (119)$$

and, as a result,

$$P(0, \tau) = \int_0^\infty P'(0, \tau/\xi_1)\, f(\xi_1)\, d\xi_1 \cdots \int_0^\infty P'(0, \tau/\xi_N)\, f(\xi_N)\, d\xi_N = \left[P'(0, \tau)\right]^N \;\; \text{(see Eq. 114)} \qquad (120)$$

In the same way, the probability of k = 1, 2, 3, 4, ... can be shown to be given by

$$P(1, \tau) = \binom{N}{1} P'(1, \tau)\, P'(0, \tau)^{N-1}$$

$$P(2, \tau) = \binom{N}{1} P'(2, \tau)\, P'(0, \tau)^{N-1} + \binom{N}{2} P'(1, \tau)^2\, P'(0, \tau)^{N-2}$$

$$P(3, \tau) = \binom{N}{1} P'(3, \tau)\, P'(0, \tau)^{N-1} + 2\binom{N}{2} P'(2, \tau)\, P'(1, \tau)\, P'(0, \tau)^{N-2} + \binom{N}{3} P'(1, \tau)^3\, P'(0, \tau)^{N-3}$$

$$P(4, \tau) = \binom{N}{1} P'(4, \tau)\, P'(0, \tau)^{N-1} + \binom{N}{2}\left[2 P'(3, \tau)\, P'(1, \tau) + P'(2, \tau)^2\right] P'(0, \tau)^{N-2} + 3\binom{N}{3} P'(2, \tau)\, P'(1, \tau)^2\, P'(0, \tau)^{N-3} + \binom{N}{4} P'(1, \tau)^4\, P'(0, \tau)^{N-4}$$

That is to say, one can obtain P'(k, τ) for a single source and use this result together with a knowledge of the combinations of sources which will produce a total of k zeros in a clock interval τ. The taking of the ensemble average (i.e., a large number of sources) is the assumption which must be avoided in order to study the advantages of asynchronous multiplexing for a small number of sources.

It is instructive to consider the probability of k samples in the interval τ as the number of sources, N, approaches infinity. Since the N sources are independent, the expected number of zero-crossings per clock interval from the equivalent source is

$$u = N u' = N(1 - \gamma)\,\tau \;\; \text{(see Eq. 115)}$$

Therefore,

$$\tau = \frac{u}{N(1 - \gamma)} \qquad (121)$$

When this substitution is made in Eq. 120, we have

$$P(0, u) = \left[\exp\!\left(-\frac{u}{N(1-\gamma)}\right)\left(1 + \frac{\gamma u}{N(1-\gamma)}\right)\right]^N$$

It is easily shown that

$$\lim_{N \to \infty} P(0, u) = e^{-u} \qquad (122)$$

Likewise,

$$\lim_{N \to \infty} P(1, u) = u\, e^{-u}, \qquad \lim_{N \to \infty} P(2, u) = \frac{u^2}{2!}\, e^{-u}, \qquad \lim_{N \to \infty} P(k, u) = \frac{u^k}{k!}\, e^{-u}$$

That is, the number of samples generated by the equivalent source is Poisson-distributed. Thus, the equivalent source is Bernoulli when a large number of independent, simple Markoff, Erlang type-1 sources is assumed.

B. 3 The Bernoulli Experiment

The same result can be obtained by considering the transmitter as performing a Bernoulli experiment. In essence, the transmitter is looking at N channels in the interval τ which, for large N, is small enough that the probability of more than one sample from a single source is negligible. That is, a look at each source during an interval τ constitutes a Bernoulli trial (i.e., only a binary outcome is possible). If, in addition, it is assumed that the N sources are independent and that the probability of a sample is constant from source to source in each interval τ (i.e., that the expected number of samples per clock period is constant), the assumptions underlying the Poisson approximation for a series of Bernoulli trials are satisfied. Thus, an equivalent Bernoulli source is obtained when (1) the assumptions governing a Bernoulli experiment are satisfied (from interval to interval), (2) a finite number of independent exponential sources is combined in parallel, or (3) the assumptions and model of Section B. 2 are applicable.
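A sketch of the limit argument of this section, under assumed values of u and N and with γ taken as the value 0.62 quoted for Eq. 114: the equivalent-source count distribution for N independent sources is formed as the N-fold convolution of the single-source distribution P'(k, τ) (which reproduces the combinatorial expressions above), with τ fixed by Eq. 121, and its distance from the Poisson distribution of mean u shrinks as N grows.

```python
import numpy as np
from math import factorial, exp

# Sketch of Sections B.2-B.3: convergence of the equivalent source to a Poisson
# distribution.  gamma, u, N, and the truncation K are assumed, illustrative values.
gamma = 0.62                     # value quoted for the integral in Eq. 114
u = 0.8                          # expected samples per clock interval
K = 40                           # truncation of the count distribution

def p_single(tau, kmax=K):
    p = [exp(-tau) * (1.0 + gamma * tau)]
    for k in range(1, kmax + 1):
        a = tau**(2*k - 1) / factorial(2*k - 1) + tau**(2*k) / factorial(2*k)
        b = tau**(2*k + 1) / factorial(2*k + 1) - tau**(2*k - 1) / factorial(2*k - 1)
        p.append(exp(-tau) * (a + gamma * b))
    return np.array(p)

for N in (4, 16, 64):
    tau = u / (N * (1.0 - gamma))              # Eq. 121
    pk = p_single(tau)
    total = np.array([1.0])
    for _ in range(N):                          # N-fold convolution of independent sources
        total = np.convolve(total, pk)[:K + 1]
    poisson = np.array([exp(-u) * u**k / factorial(k) for k in range(K + 1)])
    print(N, round(float(np.max(np.abs(total - poisson))), 4))   # shrinks as N grows
```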

APPENDIX C: ANALYSIS OF FREEZE OUT RUNS (T. G. Birdsall)

Summary of Freeze Out Runs

Equations and definitions:

P(c) = probability of a run of exactly c clock intervals containing some (total) source inputs frozen out:

$$P(c) = (1 + u)\, e^{-u} \left[1 - e^{-u}(1 + u)\right]^{c-1}, \qquad E(c) = \frac{1}{(1 + u)\, e^{-u}}$$

P(d) = probability of d source inputs being frozen out in a clock interval, during the second and third, etc., clock intervals of a run of freeze outs:

$$P(d) = p(d+1, u) = e^{-u}\, \frac{u^{d+1}}{(d+1)!}$$

Numbers for u = 0.8:

Runs of length c:            Blocks of size d in a given interval:
P(1) = 80.9 %                P(1) = 80.88 %
P(2) = 15.4 %                P(2) = 14.38 %
P(3) = 3.0 %                 P(3) = 3.53 %
P(4) = 0.6 %                 P(4) = 0.71 %
P(5) = 0.1 %                 P(5) = 0.50 %
E(c) = 1.24

Note: These conditional equations are independent of M, the buffer size.

Detail

An attempt on the problem of freeze out follows (assume N large):

I) The first step is to consider a different problem, that of the distribution of consecutive freeze outs from all sources combined, and then to tackle the problem of a single source (voice channel) and its individual freeze out.

We know the distribution of "blocks" of freeze outs, that is, the number of inputs dropped between successive clock pulses. This is Prob(d digits dropped | some dropped). Since there is only a finite number of nonzero α_k terms, we can write each α_{M+d} as

$$\alpha_{M+1} = e^{-u}\left[\alpha_{M-1}\frac{u^2}{2!} + \alpha_{M-2}\frac{u^3}{3!} + \cdots + \alpha_0\frac{u^{M+1}}{(M+1)!}\right]$$

$$\alpha_{M+2} = e^{-u}\left[\alpha_{M-1}\frac{u^3}{3!} + \alpha_{M-2}\frac{u^4}{4!} + \cdots + \alpha_0\frac{u^{M+2}}{(M+2)!}\right]$$

$$\alpha_{M+d} = e^{-u}\left[\alpha_{M-1}\frac{u^{d+1}}{(d+1)!} + \alpha_{M-2}\frac{u^{d+2}}{(d+2)!} + \cdots + \alpha_0\frac{u^{M+d}}{(M+d)!}\right] = e^{-u}\sum_{i=1}^{M} \alpha_{M-i}\,\frac{u^{d+i}}{(d+i)!}$$

$$= \frac{e^{-u} u^d}{d!}\sum_{i=1}^{M} \alpha_{M-i}\, u^i\,\frac{d!}{(d+i)!} = p(d, u)\sum_{i=1}^{M} \frac{\alpha_{M-i}\, u^i}{(d+1)(d+2)\cdots(d+i)}$$

Note: α_{M+d} is strictly monotone decreasing in d, since each coefficient of α_k is progressively smaller for u < 3 (so it certainly is for u < 1). This makes the last equation look like a poor choice; thus the following form is preferable:

$$\alpha_{M+d} = e^{-u}\, u^d \sum_{i=1}^{M} \alpha_{M-i}\,\frac{u^i}{(d+i)!}$$

These can be readily calculated for u = 0.8, for which we already know α_k and r(M). Thus

$$P(d \text{ digits dropped at once} \mid \text{there is freeze out}) = \frac{u^d e^{-u} \displaystyle\sum_{i=1}^{M} \alpha_{M-i}\,\frac{u^i}{(d+i)!}}{\displaystyle\sum_{j=1}^{\infty} u^j e^{-u} \sum_{i=1}^{M} \alpha_{M-i}\,\frac{u^i}{(i+j)!}}$$
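A sketch of the α_{M+d} relation above. The buffer-occupancy probabilities α_0, ..., α_{M-1} come from the queueing analysis of Section 3 and are not reproduced here, so the vector used below is purely hypothetical; the code only illustrates how the overflow terms and the conditional block-size distribution would be computed from them.

```python
import numpy as np
from math import exp, factorial

# Illustrative sketch of the alpha_{M+d} relation.  The alpha vector below is
# hypothetical (made up for the illustration); the true values come from the
# buffer analysis of Section 3.
u = 0.8
M = 4
alpha = np.array([0.05, 0.10, 0.25, 0.60])      # hypothetical alpha_0 ... alpha_{M-1}
alpha = alpha / alpha.sum()                      # normalized for the illustration

def alpha_overflow(d):
    """alpha_{M+d} = e^{-u} * sum_i alpha_{M-i} * u^{d+i} / (d+i)!"""
    return exp(-u) * sum(alpha[M - i] * u**(d + i) / factorial(d + i)
                         for i in range(1, M + 1))

drop = np.array([alpha_overflow(d) for d in range(1, 12)])
cond = drop / drop.sum()                         # P(d dropped | some dropped)
print(np.round(cond[:5], 4))
```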

A slightly different method is to calculate the probability of d2 simultaneous freeze outs, assuming at least one freeze out in the previous clock interval. In this case we know that there is one buffer cell empty at the beginning of the input interval, so that

$$P(d_2 \mid \text{previous freeze out}) = p(d_2 + 1,\, u) = e^{-u}\, \frac{u^{d_2+1}}{(d_2+1)!}$$

while the probability of no subsequent freeze out is the probability of 0 or 1 arrival:

$$p(\text{no FO} \mid \text{previous interval FO}) = e^{-u}(1 + u)$$

These are independent of buffer size. For example, for u = 0.80, we have the following:

p(no further FO | FO in previous interval) = 80.88 %
p(1 FO in block | FO in previous interval) = 14.38 %
p(2 FO in block | FO in previous interval) = 3.53 %
p(3 FO in block | FO in previous interval) = 0.71 %
p(4 or more FO in block | FO in previous interval) = 0.50 %

This implies that there will be definite runs of freeze out once a single freeze out occurs. If we further manipulate this, we get

$$P(\text{1 interval contains FO's} \mid \text{previous interval contained FO's}) = 1 - e^{-u}(1 + u) = e^{-u}(e^u - u - 1)$$

Iterating this, we get

$$P(c \text{ subsequent intervals all contain FO's} \mid \text{initial interval contains FO's}) = \left[e^{-u}(e^u - u - 1)\right]^c$$

This is a geometric progression. Summarizing formally:

Def. A freeze out run of length c occurs when there are source inputs frozen out in exactly c consecutive clocking intervals.

Cor. During a freeze out run of length c, only the first source input arriving after a clock pulse will be accepted by the buffer.

Equations:

$$P(c = 1) = e^{-u}(1 + u)$$

$$P(c \geq 2) = 1 - e^{-u}(1 + u) = e^{-u}(e^u - u - 1)$$

$$P(c \geq 3) = e^{-2u}(e^u - u - 1)^2$$

$$P(c \geq c^*) = e^{-(c^*-1)u}(e^u - u - 1)^{c^*-1}$$

Therefore

$$P(\text{exactly } c) = P(\geq c) - P(\geq c+1) = \left[e^{-u}(e^u - u - 1)\right]^{c-1} - \left[e^{-u}(e^u - u - 1)\right]^{c}$$

$$= \left[e^{-u}(e^u - u - 1)\right]^{c-1}\left[1 - e^{-u}(e^u - u - 1)\right] = \left[e^{-u}(e^u - u - 1)\right]^{c-1}(u + 1)\, e^{-u}$$

$$P(\text{freeze out run of length } c) = (u + 1)\, e^{-u}\left[1 - e^{-u}(1 + u)\right]^{c-1}$$

Detailing:

$$P(c = 1) = e^{-u}(1 + u)$$

$$P(c = 2) = e^{-u}(1 + u)\left[e^{-u}(e^u - u - 1)\right]$$

$$P(c = 3) = e^{-u}(1 + u)\left[e^{-u}(e^u - u - 1)\right]^2$$

To check: these should add to 1.0000.

$$\sum_{c=1}^{\infty} P(c) = e^{-u}(1 + u)\sum_{i=0}^{\infty}\left[e^{-u}(e^u - u - 1)\right]^i = \frac{e^{-u}(1 + u)}{1 - \left[e^{-u}(e^u - u - 1)\right]} = \frac{e^{-u}(1 + u)}{e^{-u}(1 + u)} = 1$$

The average value of a geometric probability density is 1/(1 - ratio), so

$$E(c) = \frac{1}{1 - \left[e^{-u}(e^u - u - 1)\right]} = \frac{1}{e^{-u}(u + 1)} = \frac{1}{P(c=1)}$$

Example: u = 0.8

P(c=1) = 0.8088, E(c) = 1.24

P(c=1) = 0.8088                80.9 %
P(c=2) = 0.1542 (slide rule)   15.4 %
P(c=3) = 0.0285                 3.0 %
P(c=4) = 0.0056                 0.6 %
P(c=5) = 0.0011                 0.1 %
P(c=6) = 0.0002

It is difficult to say directly how many inputs are frozen out during the initial interval of an FO run, but we have established above that the probability distribution in the remaining intervals is P(d FO's) = p(d+1, u).
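The run-length figures above follow from the geometric form just derived; the sketch below recomputes P(c) and E(c) for u = 0.8 (the text notes that some tabulated values were obtained by slide rule).

```python
from math import exp

# Sketch of the freeze-out run-length distribution for u = 0.8:
# P(c) = (1+u) e^{-u} [1 - e^{-u}(1+u)]^{c-1}, E(c) = 1 / [(1+u) e^{-u}].
u = 0.8
p1 = (1.0 + u) * exp(-u)                 # P(c = 1)
ratio = 1.0 - p1                         # probability that a run continues one more interval

for c in range(1, 7):
    print(c, round(p1 * ratio**(c - 1), 4))
print("E(c) =", round(1.0 / p1, 2))
print("sum  =", round(sum(p1 * ratio**(c - 1) for c in range(1, 200)), 4))  # close to 1.0
```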

LIST OF REFERENCES

1. R. W. Sanders, "Digilock Telemetry System," National Symposium on Space Electronics and Telemetry, Section 6.3, September 28-30, 1959, San Francisco, California, pp. 1-10.
2. R. G. Gallager, "Low-Density Parity-Check Codes," PGIT, Vol. IT-8, No. 1, January 1962, and private communication.
3. G. L. Turin, "An Introduction to Matched Filters," PGIT, June 1960, pp. 311-329.
4. Davenport and Root, Random Signals and Noise, McGraw-Hill, 1958, p. 156.
5. S. O. Rice, "Mathematical Analysis of Random Noise," BSTJ, Vol. 23, July 1944, pp. 282-332.
6. R. M. Fano, Transmission of Information, MIT Press and John Wiley, 1961.
7. A. Viterbi, "Classification and Evaluation of Coherent Synchronous Sample Data Telemetry Systems," 1961 Western Electronic Show and Convention, Session 14-3, August 22-25, 1961.
8. W. W. Peterson, Error Correcting Codes, MIT Press and John Wiley, 1961.
9. J. J. Metzner and K. C. Morgan, "Coded Feedback Communication Systems," 1960 National Electronics Conference, pp. 250-256 and bibliography.
10. Davenport and Root, Random Signals and Noise, McGraw-Hill, 1958.
11. Siegfried Rieger, "Error Rates in Data Transmission," Letter to the IRE Proc., May 1958, pp. 919-920.
12. A. J. Viterbi, "On Coded Phase Coherent Communications," JPL Technical Report No. 32-25, Jet Propulsion Laboratory, California Institute of Technology, August 15, 1960.
13. Charles R. Cahn, "Performance of Digital Phase Modulation Communications Systems," PGCS, May 1959, pp. 4-6.
14. T. G. Birdsall and M. P. Ristenbatt, "Introduction to Linear Shift-Register Generated Sequences," Cooley Electronics Laboratory Technical Report No. 90, The University of Michigan, Ann Arbor, Michigan, October 1958.
15. I. S. Reed, "A Class of Multiple Error Correcting Codes and the Decoding Scheme," PGIT-4, September 1954, p. 38.
16. V. A. Kotel'nikov (translation by R. A. Silverman), The Theory of Optimum Noise Immunity, McGraw-Hill Book Co., 1959.
17. R. M. Lerner, "Modulation and Signal Selection for Digital Data Systems," National Electronics Conference, 1960, pp. 2-14.
18. G. L. Turin, "The Asymptotic Behavior of Ideal M-ary Systems," Letter to the Editor, Proc. IRE, January 1959, p. 93.

LIST OF REFERENCES (Cont.)

19. J. M. Brumbaugh, L. C. Morris, et al., "Future Astro-Communications Techniques," Technical Report RADC-TN-60-142, August 1960, Applied Research, Defense Electronic Products, RCA, ASTIA No. 243698.
20. G. A. Roberts, D. R. Rothschild, R. A. Carlsen, J. L. Daws, Jr., T. G. Birdsall, and M. P. Ristenbatt, "An Introduction to Pseudo-Random Systems, Vol. I: Basic Concepts and Techniques," Cooley Electronics Laboratory Technical Report No. 104-I, The University of Michigan, Ann Arbor, Michigan, May 1960 (S).
21. John G. Lawton, "Investigation of Digital Data Communications Systems," Technical Report RADC-TR-161-58, Cornell Aeronautical Laboratory, January 3, 1961, ASTIA No. 256584.
22. K. H. Powers, Lectures in Communications Modulation and Detection Techniques, University of Pennsylvania Summer Course, 1961 (notes to be published).
23. R. M. Lerner, Lectures on Communication System Theory, Chapter 11, "Design of Signals," McGraw-Hill Book Co., 1961.
24. A. Norton, E. L. Shultz, and H. Yarbrough, "The Probability Distribution of the Phase of the Resultant Vector Sum of a Constant Vector Plus a Rayleigh Distributed Vector," Journal of Applied Physics, January 1952, pp. 137-141.
25. R. W. Sanders, "Communication Efficiency Comparison of Several Communication Systems," Proc. IRE, April 1960, pp. 575-588.
26. Internal memo to Cooley Electronics Laboratory Project 4251 file, E. P. Gould, subject: TASI System.
27. BTL Record, 1960, and AIEE Transactions on Communication and Electronics, July 1959.
28. M. V. Mathews, "Extremal Coding for Speech Transmission," PGIT, Vol. IT-5, No. 3, September 1959, p. 129.
29. L. R. Spogen, et al., "Speech Processing by the Selective Amplitude Sampling System," Journal of the Acoustical Society of America, Vol. 32, No. 12, December 1960, p. 1621.
30. R. Filipowsky, "Ein Mehrkanal-Impulsübertragungsverfahren mit automatisch veränderlicher Impulszahl," Österreichische Zeitschrift für Telegraphen-, Telephon-, Funk- und Fernsehtechnik, No. 9-10, 11-12 (Sept.-Oct., Nov.-Dec.), 1955.
31. R. J. Filipowsky, "Digital Data Transmission Systems of the Future," PGCS, March 1961, p. 88.
32. Monthly Letter No. 10 on Project 04251, Signal Corps Contract No. DA 36-039 sc-87172.
33. J. L. Yen, "On Nonuniform Sampling of Bandwidth Limited Signals," IRE Trans. on Circuit Theory, December 1956.
34. Bernard Smith, "Instantaneous Companding of Quantized Signals," BSTJ, Vol. 36, 1957, p. 653.
35. R. L. Miller, "The Possible Use of Log Differential PCM for Speech Transmission in Unicom," Globecom-5, Chicago, Illinois, May 22-24, 1961.
36. A. H. Nuttall, "Error Probabilities for Non-Orthogonal M-ary Signals under Phase-Coherent and Phase-Incoherent Reception," Technical Report TR-61-1-BF, Litton Systems, Inc., June 15, 1961.

LIST OF REFERENCES (Cont.)

37. D. M. Green and T. G. Birdsall, "The Effect of Vocabulary Size on Articulation Score," Cooley Electronics Laboratory Technical Report No. 81, The University of Michigan, Ann Arbor, Michigan, January 1958.
38. W. W. Peterson and T. G. Birdsall, "The Theory of Signal Detectability" (U): Part I, "The General Theory," June 1953; Part II, "Applications with Gaussian Noise," July 1953.
39. D. Middleton, An Introduction to Statistical Communication Theory, McGraw-Hill, 1960.
40. R. P. Faureau, "Evaluation of Complex Statistical Functions by an Analog Computer," IRE Convention Record, Part IV, pp. 31-37, 1956.
41. J. A. McFadden, "The Axis Crossing Intervals of Random Functions, II," PGIT, pp. 14-24, March 1958.

DISTRIBUTION LIST
(1 copy unless noted otherwise)

OASD (R&E), Rm 3E1065, ATTN: Technical Library, The Pentagon, Washington 25, D. C.
Chief of Research and Development, OCS, Department of the Army, Washington 25, D. C.
Chief Signal Officer, ATTN: SIGRD-6, Department of the Army, Washington 25, D. C.
Chief Signal Officer, ATTN: SIGOP-5, Department of the Army, Washington 25, D. C.
Chief Signal Officer, ATTN: SIGAC, Department of the Army, Washington 25, D. C.
Chief Signal Officer, ATTN: SIGPL, Department of the Army, Washington 25, D. C.
Director, U. S. Naval Research Laboratory, ATTN: Code 2027, Washington 25, D. C.
Commanding Officer and Director, U. S. Navy Electronics Laboratory, San Diego 52, California
U. S. National Bureau of Standards, Boulder Laboratories, ATTN: Library, Boulder, Colorado
Commander, Aeronautical Systems Division, ATTN: ASAPRL, Wright-Patterson Air Force Base, Ohio
Commander, Air Force Cambridge Research Laboratories, ATTN: CRO, L. G. Hanscom Field, Bedford, Massachusetts
Commander, Air Force Command and Control Development Division, ATTN: CCRR & CCSD, L. G. Hanscom Field, Bedford, Mass.
Commander, Rome Air Development Center, ATTN: RAALD, Griffiss Air Force Base, N. Y.
Commanding General, U. S. Army Electronic Proving Ground, ATTN: Technical Library, Fort Huachuca, Arizona
Commander, Armed Services Technical Information Agency, ATTN: TIPCR, Arlington Hall Station, Arlington 12, Va. (10 copies)
Chief, U. S. Army Security Agency, Arlington Hall Station, Arlington 12, Va.
Deputy President, U. S. Army Security Agency Board, Arlington Hall Station, Arlington 12, Va.
Commanding Officer, U. S. Army Signal Equipment Support Agency, ATTN: SIGMS-ADJ, Fort Monmouth, N. J.
U. S. Continental Army Command LnO, USASRDL, Fort Monmouth, N. J. (3 copies)
Corps of Engineers Liaison Office, USASRDL, Fort Monmouth, N. J.
AFSC Liaison Office, Naval Air R&D Activities Command, Johnsville, Pa.
Marine Corps Liaison Office, USASRDL, Fort Monmouth, N. J.

DISTRIBUTION LIST (Cont.)
(1 copy unless noted otherwise)

Commander, Air Force Command and Control Development Division, ATTN: CRZC, L. G. Hanscom Field, Bedford, Mass.
Commanding Officer, USASRDL, ATTN: Dir of Research/Engineering, Fort Monmouth, N. J.
Commanding Officer, USASRDL, ATTN: Tech Documents Center, Fort Monmouth, N. J.
Commanding Officer, U. S. Army Signal R&D Laboratory, ATTN: SIGRA/SL-NDM, Fort Monmouth, N. J.
Commanding Officer, USASRDL, ATTN: Logistics Division, For: B. J. Keigher, Fort Monmouth, N. J. (9 copies)
Commanding Officer, U. S. Army Signal R&D Laboratory, ATTN: Tech Info Div, Fort Monmouth, N. J. (3 copies)
Chief of Naval Operations, U. S. Navy Department, ATTN: E. H. Stuermer (OP-07T1C), Washington 25, D. C.