CELL DELAY IN ATM MULTIPLEXERS FOR CONSTANT BIT RATE CELL STREAMS CA. Cooper, Bell Communications Research 445 South Street P.O. Box 1910 Morristown, NJ 07960-1910 C. Teresa Lam Department of Industrial and Operations Engineering The University of Michigan Ann Arbor, MI 48109-2117 U. Madhow, T.J. Ott Bell Coimunications Research 445 South Street P.O. Box 1910 Morristown, NJ 07960-1910 Technical Report 93-4 January 1993

ITM-TSV-022371 December 4, 1992 Cell Delay in ATM Multiplexers for Constant Bit Rate Cell Streams C. A. Cooper T. Lam U. Madhow T. J. Ott 1. INTRODUCTION This paper quantifies the congestion-related delay which may occur when a number of Constant Bit Rate (CBR) cell streams are multiplexed into a single transmission link, or into a pool of fixed transmission capacity. The intended application of this work is the engineering of Asynchronous Transfer Mode (ATM) multiplexers and switches. The congestion we study inl this paper is the congestion due to random clustering of the cells from different cell streams arriving at an ATM multiplexer. Quantitative treatment of this problem is of considerable importance in the formulation of requirements for Broadband Integrated Services Digital Network (B-ISDN) equipmentls for several reasons: a B-ISDN switch's buffers should be large enough to handle the anticipated level of cell clustering without appreciable loss, the amount of Cell Delay Variation121 experienced on an ATM connection because of cell clustering must be acceptable from network performance and service perspectives, and a B-ISDN switch's policing function1131 must have a certain level of tolerance for cell clustering. Current standards141 define the Virtual Channel Connection (VCC) and the Virtual Path Connection (VPC) as the basis for transporting information through a B-ISDN. A VCC or VPC is identified on any B-ISDN link by means of an address field contained in the header of each cell, and a VPC may be considered as a set of VCCs. A specified peak cell rate or transmission capacity can be associated with each VCC and each VPC1112131. A peak cell rate can also be associated with a B-ISDN link and with a fixed-capacity pool contained within a B-ISDN link that serves, for example, as an administrative entity used in B-ISDN resource allocations51. An ATM multiplexer generally delivers its output to such a fixed-capacity pool. This study makes use of available results161 171 181 to quantify the distribution of the delay due to cell clustering experienced by CBR traffic in a single stage ATM multiplexer for relevant configurations of B-ISDN equipment and service applications. Numerical results are obtained which underline the recognized need131 for the B-ISDN policing function to tolerate a certain degree of cell clustering. These results suggest engineering guidelines for multiplexer design by relating B-ISDN performance parameters121 specifically Cell Loss Ratio and Cell Delay Variation, to the size of the output buffer in the multiplexer. This memorandum is organized in the following manner. Section 2 describes the cell clustering phenomenon and the resulting system engineering problem. Section 3 describes some specific situations which are likely to arise in the context of anticipated B-ISDN equipment and service applications, and gives some numerical results illustrating the tradeoffs in engineering the system. A more comprehensive set of numerical results is given in the appendix. Section 4 gives a brief overview of the 1

TM-TSV-022371 December 4, 1992 theory, and gives the formulas for generating most of the computations presented in this paper. While this information suffices for a good system design, more detail can be obtained from the open literature16171181. Further refinements of the interpretation of our results are presented in Section 5. Section 6 describes possible applications and extensions of our results for the delay analysis of multi-stage multiplexers. Section 7 contains our conclusions. 2. PROBLEM FORMULATION This paper studies congestion-related delay in an ATM multiplexer or switch which carries only Constant Bit Rate (CBR) Virtual Channel Connections (VCCs), in the situation where all those VCCs have the same bandwidth. Since this is the situation where the ATM network is strictly used for circuit emulation, it is at first glance surprising that congestion occurs at all. We will now explain the system, and explain why congestion can indeed occur. The output channel of the multiplexer is a slotted channel which exactly once every A time units transmits a cell (if one is waiting for service); that is, the output rate is I/A cells per timeunit. For instance, A for an STS-3c equals 2.831 microseconds. Each of the VCCs using the multiplexers is truly CBR, in the sense that their peak rate equals their average rate, and all these VCCs have the same rate. In fact, each VCC satisfies an even stronger condition: it sends a cell exactly once every A time units, where of course 1/A = peak rate = average rate. For packetized 64 Kbits/sec voice channels, A would probably be 6 milliseconds. Further details and other examples are given in Section 3. Suppose that at some point in time t exactly K(t) such virtual circuits are actively utilizing the multiplexer. Of course, K(t) varies over time as new virtual circuits are added or become active, or as existing virtual circuits become inactive or are taken down. In this paper we restrict ourselves to the situation where call set up control ensures that K(t) < A/A for all t. (2.1) Thus, the maximum number of VCCs which can be multiplexed is at most N = A /A], where [ J denotes integer part. In different systems, where for some reason (2.1) is occasionally and temporarily allowed to be violated, a completely different approach to congestion analysis than utilized in this paper is needed (such as the so called fluid flow models91 ). Since all VCCs are CBR, it might seem plausible that maintaining (2.1) is sufficient to prevent congestion in the multiplexer. This is not the case: while (2.1) and the fact that each source sends a cell exactly once every A time units ensure that the multiplexer receives exactly K(t) cells in any interval of length A, and serving K(t) cells takes K(t) A < A time units, random clustering of the cells can lead to time intervals shorter than A during which the multiplexer receives more cells than it can handle. This results in a build-up of cells in the buffer of the multiplexer, and if enough buffer capacity is not available, it even causes buffer overflow and cell loss. In the most extreme situation, once every A time units exactly K(t) cells arrive essentially simultaneously, resulting in either cell loss or delay by up to (K(t)- 1) A time units. Luckily, this extreme pattern has negligible probability. Less extreme and more likely arrival patterns can still cause congestion, however, and it is the probabilistic behavior of this phenomenon which is 2

TM-TSV-022371 December 4, 1992 the subject of our study. The situation studied in this paper is that of Figure 1, where the different VCCs submit cells to the multiplexer. In this paper, we assume that there is no input buffering (Bi, = 0). Incoming cells are moved instantaneously to the output buffer, where they are served in a slotted fashion, one cell exactly every A time units. There is no congestion before the output buffer. In this respect the results in this paper apply to an ATM switch with output port congestion only, and "blocking-free" behavior inside the switch. This model suffices for our purpose, which is to restrict attention to the delay due solely to the phenomenon of random clustering. The output port has a buffer capacity of B cells. There is an option to set a maximum number of VCCs which can be multiplexed, Kmax where Kmax < N < A/A, (2.2) and require that call set-up control always keeps K(t) < Kmax < N < A/A. (2.3) Our goal is to engineer this system given A and A. In this context, engineering the system is to choose B and Kmax in such a way that we have an appropriate trade-off between cell-loss probability, probability distribution of delays of those cells that are not lost, utilization of the system, cost, etc. VCCl Bi |'- |B }.. R (pool) VCCK -Bi Iin iout Cells enter the multiplexer through boundary iin and exit the multiplexer through boundary Iout. Figure 1. ATM multiplexing of VCCs into a fixed-capacity pool Fortunately a queueing system very much resembling the congestion problem of interest in this paper has been studied previously in the open literature by Bhargava, Humblet and Hluchyj (1989)161, Ott and Shanthikumar (1991)[71, and Roberts and Virtamo (1991)81. The main thrust of this paper is to provide tables and graphs of expected delays, delay distributions, and cell loss probabilities, and explanations of how to use these tables and graphs in engineering the multiplexer in Figure 1. All this focusses on the six systems described in the next section. The tables and graphs are based on the three papers from the open literature just cited. We provide in Section 4 most of the information required to generate our results, together with an interpretation of the numerical results given in Section 3 and the appendix. 3

TM-TSV-022371 December 4, 1992 3. ANTICIPATED B-ISDN APPLICATIONS: SOME EXAMPLES This section lists the system parameters corresponding to some important B-ISDN equipment and service applications. We give some numerical results for one of these applications in order to provide a flavor of the system tradeoffs explored in this study. The corresponding results for other applications are given in the appendix, and a more precise interpretation of our results is given in Sections 4 and 5. We illustrate the calculation of the parameters A and A for one of the applications. The SONET STS-3c provides an effective capacity of 149.760 Mbits/secI101 for the transport of ATM cells, each of which requires 53 octets. Therefore an STS-3c carries (149.760 x 106)/(53 x 8), which equals 353,207.5 cells/sec. Thus, A equals 1/353207.5 seconds or 2.831 microseconds. If each of the VCCs being multiplexed is a 64 Kbits/sec voice channel, and if each cell has 48 information octets, each voice channel is generating 64000/(48 x 8) = 166.67 cells/sec. This means that each such VCC generates one cell every 1/166.67 seconds, so that A equals 6 milliseconds. The maximum number of VCCs which can be multiplexed in this application is thus N = LA/A] = 2119. Parameter listings for some important applications are provided in the following. Case I (N = 2119): Multiplexing 64 kbps voice channels carried by cells having 48 octets of user information into an STS-3c. The output link can transmit a cell every A = 2.831!psec, i.e., it transmits at a rate of 353,207.5 cells/sec. Each VCC generates a cell every A = 6 msec, corresponding to a rate of 166.67 cells/sec. Case 2 (N = 86): Multiplexing DS1 circuit emulation channels carried by cells having 47 octets of user information into an STS-3c. The output link is as in Case 1, and each VCC generates a cell every A = 243.5 lsec, corresponding to a rate of 4106.4 cells/sec. Case 3 (N = 344): Multiplexing DS1 circuit emulation channels carried by cells having 47 octets of user information into an STS-12c. The output link is four times as fast as in Case 1, so that A = 0.708 jusec, i.e., the output rate is 1,412,830 cells/sec. The VCCs are as in Case 2. Case 4a (N = 2): Multiplexing DS3 circuit emulation channels carried by cells having 47 octets of user information into an STS-3c. The output link is as in Case 1, and each VCC generates a cell every A = 8.405 psec, corresponding to a rate of 118,978.7 cells/sec. Case 4b (N = 3): Multiplexing DS3 circuit emulation channels carried by cells having 48 octets of user information into an STS-3c. The output link is as in Case 1, and each VCC generates a cell every A = 8.584 usec, corresponding to a rate of 116,500.0 cells/sec. 4

TM-TSV-022371 December 4, 1992 Case 5(N = 11): Multiplexing DS3 circuit emulation channels carried by cells having 47 octets of user information into an STS-12c. Each VCC is as in Case 4a, and the output link is as in Case 3. One possible system design is to use a buffer size equal to the maximum number Kmax < N of VCCs being multiplexed. This results in zero cell loss, and the delay suffered by a VCC is at most Kmax A < N A. This maximum delay works out to be less than 10 microseconds for Cases 4a, 4b, and 5. Since this is well within the allowable values in typical applications, engineering the system for these cases is very easy: use a buffer B of size N cells and set Kmax = N. For Cases 1 through 3, however, the maximum number N of VCCs and the worst-case delay NA is relatively large. In order to keep the maximum delay under control, we may choose to utilize only a fraction of the capacity of the output link (i.e., let Kmax be stricly less than N), and choose a buffer size strictly less than Kmax. Thus, we may be willing to tolerate some cell loss in order to reduce the buffer size B, and hence the maximum delay B A. The precise relationship between Cell Loss Probability (CLP) and buffer size is developed in Section 4, and numerical results are given in the appendix. We give the reader a preview of this material by means of the following tables (Tables 1 and 2) for Case 3. Table 1 Buffer size versus cell loss probability for a typical VCC at selected loads for emulated DSls into an STS-12c Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10-4 CLP = 10-7 CLP = 10- (cells/psec) (cells/psec) (cells/pisec) 275 0.8 18/13 27/20 36/26 292 0.85 21/15 32/23 41/30 309 0.9 26/19 37/27 47/34 326 0.95 32/23 44/32 54/39 344 1.0 40/29 53/38 63/45 Let us examine this table by considering what a particular entry means. The entries of buffer size in cells are, of course, integer values. The entries in microseconds are rounded up to the nearest integer. For the above application, the maximum number N of VCCs which can be multiplexed is 344. If the number of active VCCs is K, the utilization p = KIN. The entries of Table 1 can be interpreted in two ways. The first interpretation is for a situation in which a K cell buffer is available, so that even under the worst arrival pattern every arriving cell always finds a free buffer. The entry for K =275 and a CLP of 10-4 (18 cells / 13 p sec) means that if there are 275 active VCCs and if we choose at random one of these 275 sources, then the probability that the next cell from this source finds at least 18 cells in front of it in the buffer (and therefore has a delay of at least 13 lsec, assuming a first-in first-out discipline) is at most 10-4. The second interpretation is that if the buffer capacity equals 18 cells and the number of active sources equals 275 then the probability that the next cell of a randomly 5

rTM-TSV-022371 December 4, 1992 chosen source finds no free buffer location is at most 10-4. Hence the interpretation: "If N = 344 and K=275 then, for a Cell Loss Probability of l0-4 we need an 18 cell buffer". The second interpretation must be qualified as follows. In most situations in telephone traffic theory, if a source has a Cell Loss Probability of (say) 10-4, that means that in the long run it loses a fraction of 10-4 of its cells. In the case of CBR sources being multiplexed into an ATM multiplexer this interpretation as a cell loss ratio need not hold. Depending on the circumstances (see Sections 4 and 5) it might be necessary to interpret a CLP of 10-4 as follows: a randomly chosen source has a probability 10-4 of losing, in the near future, all its cells, and a probability (1 - 10-4) of losing no cells at all in the near future. "Near future" could mean the next 10 or 100 or even many more intervals A. A proper interpretation of the CLP is very important for good engineering decisions. Section 5 provides further discussion regarding this issue. The other entries in Table 1 have similar interpretations. For example, the entry for K = 344 and a CLP of 10-10 shows that if the buffer is large then the probability that a randomly chosen source incurs a delay greater than 45 pi sec is about 10- 10, while if the buffer capacity is only 63 cells, then a randomly chosen source has a CLP of about 10-10. Table 2 Buffer size versus cell loss probability for the most unfortunate VCC at selected loads for emulated DSls into an STS-12c Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = l0- CLP - 0-7 CLP = l0-=~ (cells/pisec) (cells/psec) (cells/jsec) 275 0.8 23/17 32/23 40/29 292 0.85 27/20 37/27 46/33 309 0.9 32/23 43/31 52/37 326 0.95 39/28 50/36 59/42 344 1.0 47/34 59/42 68/49 As just explained, Table 1 contains information about delay and cell loss for a "randomly chosen" or "typical" VCC. Table 2 addresses a different question. As long as exactly K sources are active we know at any point in time that in the next A time units exactly K cells will arrive (one from every source). The entry for K = 275, CLP = 10-4 again has two interpretations. If the buffer capacity is large (at least equal to K cells) it means that the probability that at least one of those next K=275 cells finds at least 23 cells in front of it (i.e., suffers a delay of 17 psec) is about 104. On the other hand, if the actual buffer capacity is 23 cells, the probability that at least one of those K = 275 cells arrives while the buffer is full also is approximately 10 4. For K = 344 and CLP = 10-10 we see that for a 68 cell buffer, the probability that at least one source will have, in the near future, cells arriving at a full buffer is about 10 10. The probability that not a single source will lose any cells in the near future therefore is close to I - 10-10. It is convenient to think of Table 2 as describing delay distribution and cell loss probability not of a "typical" source, but of the "most unfortunate" source. 6

1'T-TSV-022371 December 4, 1992 Comparing Table 2 with Table 1, we note that, for a given utilization and cell loss probability, a larger buffer size is required according to Table 2. Thus, the maximum allowable delay increases slightly when we attempt to provide performance assurances for the most unfortunate VCC rather than for an arbitrarily selected VCC. In our opinion, if buffer sizes less than the number of sources are employed, they should be based on assurances of cell loss probability for the most unfortunate VCC, so that Table 2 is more appropriate for deciding buffer sizes than Table 1. Thus, a buffer size of 23 cells or 17 microseconds is needed for K = 275 sources and a CLP of 104, whereas a buffer size of 68 cells or 49 microseconds is needed for K = 344 sources and a CLP of 10-10. 4. OVERVIEW OF THE THEORY The papers by Bhargava, Humblet and Hluchyj (1989)161, Roberts and Virtamo (1991)181 and Ott and Shanthikumar (1991)[71 all require that A/A be an integer. Henceforth we choose N = [A/ A, replace A by A* =NA < A. (4.1) In the model we assume that each source sends a cell exactly once very A* time units. For the same value of K(t), the model therefore has slightly worse congestion than the original system. In this paper we study the congestion in the multiplexer for the situation in which exactly K sources are active, by assuming that "for a long time" exactly K sources have been active (i.e., K(t) has been constant, equal to K), each sending in a cell exactly once every A* time units. Actually, this is a very mild assumption, since'long time" in this case means a single interval of A* time units. More precisely, it is easily shown that if K(t) < N = A*/A (4.2) for all past values of t, and if the last change in K(t) happened more than A* time units ago, then for an analysis of the current congestion we might as well assume that K(t) has always been equal to the current value K. Throughout this paper we therefore assume that each of the K currently active sources independently chooses a time slot Sk (1 ~ k ~K) uniformly from {1,2, -.. N} and then sends a cell at all time slots Sk + jN, j = 0,1,2,..., where each time slot spans A time units. Clearly, if each of the K active sources chooses a different time slot Sk, then there is no congestion and no cell encounters delay. Due to random clustering, multiple cells may arrive in some slots, which results in the congestion that we analyze here. Let Y(t) be the number of cells arriving during time slot t. The preceding construction shows that 7

TM-TSV-022371 December 4, 1992 Y(t) = Y(t+jN) for j = 0,1,2,.... (4.3) and that given K, for every t 2 1, Y(t), Y(t+l), -, Y(t+N-I) follow a multinomial distribution: P{Y(t) =, Y(t +l)=yl...., Y(t+N -) =YN-1} = (4.4) K! (I YO!Y1!..., YN-I! N as long as yo +y 1 + * +YN -I = K. Quantities of interest are Qt, the number of cells in the buffer at the end of time slot t, and Dkj, the number of cells an arriving cell of source k finds in the buffer in front of it when it arrives in time slot Sk + jN,j = 1,.... Clearly: Qt+ = (Q, + Yt+1 -1)+ (4.5) where (...)+ denotes positive part. Ott and Shanthikumar171 showed that all Qt,t > K, have the same distribution and that also Qt = Qt +jN if t > K,j =0,1,.... (4.6) It is easily seen that also Dkj = Dk, 1 for all j > 1 (4.7) It is useful to think of the number Sk as the "phase" of the source k. The model described above assumes that the phase of each source (and, more important, the phase differences between that source and other sources) is perfectly constant. In reality there will usually be a slow, random, drift in phase differences. Equations (4.6) and (4.7) show that as long as the phases are indeed constant there can be a pronounced source effect: at least one source ko is lucky in its phase differences, and all its cells arrive into an empty system (Dkj = 0 for all j ) but some other source kl may be unlucky and has the same high value Dk,,j, for all j > 1. Thus, for system design purposes, therefore, it may be prudent to work with the entity Dmax, the delay for the most unlucky source, defined in (4.11) below. First we consider the distributions of Qt and Dk,j. If the buffer has sufficiently large capacity (B 2 K) the distribution of Q. for t > K is known: 8

TM-T'SV-022371 December 4, 1992 K-b N-K+b K v(b+j (k-b-v p,{Q > b + } N-b b - (4.8A) v=l N-b b +j N1 N P{Qt = b}=P{Q, > b} - P{Qt > b + 1} (4.8B) Bhargava et. al.[61 apply a well-known level-crossing argument to show that if Q has the distribution of Qt for t > K, then for all j > 1: N P{Dk,j = d} =.- P{Q = d + 1}, (4.9A) P{Dk,j 2 d} = N P{Q > d + 1}. (4.9B) Because of the periodicity of the queue length and delay (see equations (4.6) and (4.7)), we can define the largest value of queue length and the largest delay encountered by a cell as Q max = max Qt = max Qt (4.10) K < t: K + N- I t K and Dmax = max Dkj (independent of j forj > 1), (4.11) respectively. Thus, Dma is the delay seen by the most unfortunate source. It is easily seen that Dmax = Qmax- (4.12) where Dmax is expressed in units of time slots and Qmax is expressed in units of cells. Ott and Shantikumar[71 have an algorithm for the exact computation of the distribution of Qmax which is quite computationally intensive. A good approximation which is easy to compute is given by the inequality P{Qma >b} < N P{Qt = b + } (t > k), (4.13) which follows from the observation that if Qmax > b then necessarily Qt = b + 1 for (at least one) t among N,.N + 1,...., 2N- 1. Letting D be a random variable representing the delay of a typical VCC, the above formulas give information about the distribution of Q,D, and of Qm,, D,,, under the assumption that B > K, which corresponds to a lossless system. These distributions depend on N and K only. As we explain below, however, these formulas can be used to obtain very good approximations to the cell loss probability for a system with buffer size B < K as well. Consider first a system design based on the delay seen by a typical VCC. If B < K, the probability P{D 2 B} as obtained from (4.8A) and (4.9B) is a good approximation for the cell loss probability. This approximation becomes very accurate if the cells loss probability is small. It has the added advantage of being an upper bound on the cell loss 9

11TM-TSV-022371 December 4, 1992 probability, so that using it results in a conservative design. The entries in Table I of Section 3, and in Tables Al, A3, and A5 of the appendix are the smallest value of B for which P{D > B} is less than or equal to the desired cell loss probability (we consider the values 10-4, 10-7, and 10-0). The corresponding buffer size in time units equals B A. Numerical results based on the delay seen by the most unfortunate VCC are obtained similarly. If the buffer size B is less than K, the probability P{Dmax > B} = P{Qmax > B + 1 for a lossless system is again an accurate approximation (in fact, an upper bound) for the probability that there is at least one source (among k = 1,2,....,K) which loses cells due to buffer overflow. The smallest value of B for which this probability is less than a desired value of CLP can now be obtained by a numerical search. The exact distribution of Qman for a lossless system has been obtained by Ott and Shanthikumar[71, but is fairly time-consuming to compute. The upper bound (4.13) for P{Qma, > B + 1 thus becomes especially useful. Firstly, as shown by our numerical results in the appendix (Tables A7 through A9), it is a very good approximation to the exact distribution from the point of view of buffer sizing. Indeed, it is probably preferable to use this upper bound for system design purposes, since it is very easy to compute and leads to a conservative design. Secondly, even if the exact results of Ott and Shanthikumar are employed, the search for the cut-off values of B for a given value of CLP is aided by using the upper bound. An initial estimate, say B, of the cut-off value is obtained by searching over B using the upper bound (4.13). The exact computation is then done for the values B - 1, B - 2,... until the desired CLP is exceeded. This procedure is used to obtain the results in Table 2 of Section 3 (Table A6 of the appendix), as well as in other tables of the appendix (Tables A2 and A4). As seen in the appendix, the cut-off values of B obtained using the bound and the exact result differ by at most 10 cells even for the largest values of N (the discrepancies for smaller N are even less). The equations (4.8A), (4.9B) and (4.13) thus give accurate approximations to the cell loss probability for typical and most unfortunate VCCs, respectively. A desirable feature of these approximations is that, since they are also upper bounds, they result in a slightly conservative design. The above formulas also yield other performance measures of interest. Since the random variable D represents the number of cells found by a "typical" cell in front of it in the buffer upon its arrival, AD represents the time it takes to serve those cells, and therefore represents the waiting time of the arriving cell. Therefore, AD is a very good approximation for the cell delay in the buffer. The appendix also contains a figure displaying the complementary delay distribution P{ A D 2 x} = P{D > x} for a typical VCC in Case 3. Such figures can be used to predict the likely range of delays in the multiplexer and give information about the likely range of Cell Delay Variation encountered by the individual traffic streams, which under certain circumstances may be relevant for engineering of Kmax. The appendix also gives the expected value E{A D} of the delay for the applications listed in Section 3. 10

TMI-TISV-022371 December 4, 1992 5. PERIODICITY AND THE SOURCE EFFECT In (4.6) and (4.7) we saw that the theoretical model studied in section 4 exhibits periodicity: the state of the system at time t is identical to be the state of the system at time t + A* (as long as t > A*). In this section we first further analyze what this means for cell loss and cell delay in the theoretical model, and then explain what this is likely to mean for the actual system. In the theoretical model, this periodicity means that for each source, all its cells (after the first cell) see exactly the same number of cells in front of them in the buffer. This also means that if the policy is to drop a cell whenever it arrives at a full buffer, then every source is either in the position of losing no cells at all, and having identical delays for all its cells, or is in the position of losing all its cells! This asymmetry between different sources is called a "source-effect" and has been studied in Ramaswami and Willinger (1990)[1l It is worth noting that as long as the time slots Sk (see Section 4) in which the cells of different sources arrive are different, no source sees any loss or delay. It is the random clustering (many of the Sk being the same) that causes the delay and blocking studied in this paper. For CBR sources, the real system and the theoretical model share the property that successive cells of one source almost always have identical delays, so that over the short term, variation in delay is small compared with total delay in the multiplexer. However, there are differences between the real system and the model which imply that the conclusions of the previous paragraph, which hold exactly for the model, hold only approximately for the original problem. A minor difference is that the real system is periodic with period A instead of A*. A somewhat more important difference may be that the incoming streams are not quite as perfectly periodic as assumed before, due to jitter before arriving at the multiplexer being studied. The most important difference probably is that in the real system K(t) changes because some VCCs become inactive and some new ones are started. Also, for calls that remain active the phase difference changes (slowly) because in all likelihood their original clocks are less than perfect. The result is that while in the model a source either loses no cells at all or loses all of them, in the actual ATM multiplexer a source usually sees no loss at all, but it may occasionally (rarely) get in the position of losing, for a significant amount of time, all or many of its cells. This problem can be alleviated by buffer management strategies as described by Ramaswami and Willinger11ll, or of course by providing a buffer capacity of Kmax cells. 6. MULTI-STAGE MULTIPLEXERS An interesting extension of the work reported in this TM is when a number of CBR streams are multiplexed in several stages, where the output of a number of stage 1 multiplexers is the input into a stage 2 multiplexer, etc. In this situation, on the one hand, each cell goes through a number of multiplexers, each potentially contributing to that cell's delay. On the other hand, each multiplexer smooths the cell stream it handles, thus decreasing delay in later stages. It has been shown that if all multiplexers have the same output linespeed, equal to the linespeed of the final stage, the expected value of the total delay of a cell is equal to that in a single giant (one-stage) multiplexer with the 11

TM-TSV-022371 December 4, 1992 same output linespeed. However, since each stage of the multiplexer smooths the cell streams, the variation in the delay is conjectured to be less than for a single-stage multiplexer. Examining these issues is part of ongoing research. A different, perhaps more realistic, example is the situation where first each of four streams of up to 86 DS-1 streams is multiplexed into an STS-3c (case 2 in section 3), and then the 4 STS-3c pipes are multiplexed into an STS-12c. This two-stage multiplexer emulates case 3 in section 3. In this situation the output linespceds of the four stage-I multiplexers are one fourth of the linespeed of the final (stage-2) multiplexer, so that the delay in the aggregate aggregate multiplexer is not close to that in a single stage multiplexer as case 3, section 3. Information on the cell delay variation due to the first stage can be obtained from Tables A3 and A4. For instance, at a utilization of 1.0, with probability at least (1- 10-10), the delay for the most unfortunate VCC is at most 34 cells or 97 microseconds. It is easily seen that in this situation the maximum delay of any cell in the second stage is at most 4 cells or 4 x 0.708 = 2.83 microseconds. Since the second stage has a high output linespeed and is used to multiplex a very small number of streams, the dominant component of the delay is seen to be due to the first stage. It is easy to see, then, that for the two-stage multiplexer, at a utilization of 1.0, with probability at least (1 - 10-10), the total delay for the most unfortunate VCC is at most 34 + 4 = 38 cells, or 97 + 2.831 = 99.83 microseconds. The corresponding delay for a single-stage multiplexing of 344 DS-1 streams into an STS-12c is 68 cells, or 49 microseconds (see Table 2 of Section 3). Thus, while multiplexing in two stages reduces the delay in terms of cells, the actual delay in microseconds is less for a single-stage multiplexer in this example. 7. CONCLUSIONS We have considered the congestion-related delay that can occur when a number of CBR cell streams are multiplexed into a single transmission link or a pool of fixed transmission capacity. The information provided here quantifies this congestion-related delay for a number of significant applications. For all of the applications considered here, this delay is substantially less than 250 microseconds with probability I - 10-' as long as the utilization factor is 0.85 or less. This is the limit proposed in a recent Bellcore Technical Advisoryl for the cell delay variation generated at a single Broadband Switching System. However, it is important to note that in most of the applications considered here (all except Case I of Section 3), with probability I - 10-10, the delay is less than 100 microseconds even at a utilization of 1.0. It is also worth repeating the observation in Section 5 that the variation in delay for cells in a given CBR stream is likely to be much less than the actual value of the delay incurred by these cells. Our results quantify the variation in delay between cell streams. Clearly, the results in this paper can also be used to obtain (possibly overly conservative) engineering rules for ATM multiplexers carrying peak rate allocated Variable Bit Rate (VBR) cell streams. As noted in the introduction, the delay analysis for VBR streams which are not peak rate allocated requires techniques very different from those used in this paper. 12

lTM-TSV-022371 December 4, 1992 8. ACKNOWLEDGEMENTS The authors thank Gerr McTigue for his programming assistance in generating some of the numerical results presented here. The authors also acknowledge valuable discussions with Alan Tedesco and Steve Walters. 13

TM-TSV-022371 December 4, 1992 APPENDIX Tables Al through A6 list buffer sizes corresponding to various values of cell loss probability for Cases I through 3, based on the delay for a typical VCC as well as for the most unfortunate VCC. Tables Al and A2 correspond to Case 1, Tables A3 and A4 to Case 2, and Tables A5 and A6 to Case 3 (Tables A5 and A6 are identical to Tables 1 and 2 in Section 3, and are reproduced here for the reader's convenience). These tables are obtained in the manner described in Section 4. Note that the buffer size in microseconds is rounded up to the nearest integer in these tables. Next, we provide in Tables A7 through A9 a comparison of the buffer sizes obtained using the upper bound (4.13) and using the exact distribution71. We conclude that the easily computable bound is adequate for most system design purposes. We then compute the expected values of the delays incurred by a typical VCC for each of the six applications of interest. The values are of the order of one microsecond or less for Cases 4a, 4b, and 5, and are therefore not listed. The average delays for Cases 1 through 3 are listed, for various utilizations, in Table A10. Finally, we plot the complementary distribution of the delay seen by a typical VCC in one of the applications (Case 3) in Figure Al, which gives an idea of the likely range of delays encountered in the multiplexer. Table Al Buffer size versus cell loss probability for a typical VCC at selected loads for 64 kbps voice channels into an STS-3c (N = 2119) Number of VCCs Utilization uffer size fo Buffer size for Buffer size for K | p CLP = 10-4 CLP = 10- CLP = 10[_ [ _(cells/lsec) (cells/psec) (cells/' sec) 1695 0.8 21/60 35/100 49/139 1801 0.85 27/77 45/128 61/173 1907 0.9 38/108 61/173 81/230 2013 0.95 59/168 87/247 111/315 2119 1.0 99/281 131/371 156/442 14

'I'M-TSV-022371 December 4, 1992 Table A2 Buffer size versus cell loss probability for the most unfortunate VCC at selected loads for 64 kbps voice channels into an STS-3c (N = 2119) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10-4 CLP = 10-' CLP =10(cells/psec) (cells/psec) (cells /psec) 1695 0.8 30/85 45/128 58/165 1801 0.85 40/114 57/162 73/207 1907 0.9 53/151 74/210 94/267 2013 0.95 76/216 102/289 125/354 2119 1.0 118/335 147/417 171/485 Table A3 Buffer size versus cell loss probability for a typical VCC at selected loads for emulated DSIs into an STS-3c (N = 86) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10-4 CLP = 10- CLP = 10-1 (cells/psec) (cells/psec) (cells/psec) 68 0.8 13/37 18/51 23/66 73 0.85 14/40 20/57 25/71 77 0.9 16/46 22/63 27/77 81 0.95 18/51 24/68 29/83 86 1.0 20/57 26/74 31/88 Table A4 Buffer size versus cell loss probability for the most unfortunate VCC at selected loads for emulated DS s into an STS-3c (N = 86) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10-4 CLP = 10- CLP = 10-"1 (cells/psec) (cells/psec) (cells/psec) 68 0.8 15/43 20/57 25/71 73 0.85 17/49 23/66 27/77 77 0.9 19/54 24/68 29/83 81 0.95 21/60 26/74 31/88 [86 1.0 23/66 29/83 34/97 15

TM-TSV-022371 December 4, 1992 Table A5 Buffer size versus cell loss probability for a typical VCC at selected loads for emulated DSls into an STS-12c (N = 344) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for [K | p | CLP = 10-4 CLP 10 -7 CLP I=0(cellsi/psec) (cells/ Usec) (cells psec) 275 0.8 18/13 27/20 36/26 292 0.85 21/15 32/23 41/30 309 0.9 26/19 37/27 47/34 326 0.95 32/23 44/32 54/39 344 1.0 40/29 53/38 63/45 Table A6 Buffer size versus cell loss probability for the most unfortunate VCC at selected loads for emulated DSls into an STS-12c (N = 344) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10-4 CLP = 10- CLP 10(cells/psec) (cells/lsec) (cells/i sec) 275 0.8 23/17 32/23 40/29 292 0.85 27/20 37/27 46/33 309 0.9 32/23 43/31 52/37 326 0.95 39/28 50/36 59/42 344 1.0 47/34 59/42 68/49 The following tables, Tables A7 through A9, compare the buffer sizes (expressed in cells) obtained for given values of CLP for the most unfortunate VCC using the bound (4.13) and the exact algorithm of Ott and Shanthikumar71. The first item in each entry corresponds to the exact result and is identical to the first item in the entries of Tables A2, A4, and A6, respectively. The second item corresponds to the bound (4.13). For a given value of N, the agreement is seen to improve for smaller CLP and smaller K. The agreement gets somewhat worse as N increases, but is excellent for all the cases considered, even for N = 2119. Thus, directly using the bound for system design is a very attractive option. Table A10 lists the expected value of the delay E{D} for a typical VCC at different utilizations for Cases 1 through 3. The first item in each entry is the average delay in cells, and the second is the average delay in microseconds (both rounded up to the first decimal place). The average delay in all these applications is seen to be small, as expected for CBR sources. In particular, it is much smaller than the buffer sizes calculated earlier (Tables Al through A9), which correspond to very small tail probabilities. Finally, for some applications, more detailed information about the distribution of the typical delay D may be desired. As an example, we consider Case 3 (N = 344) at a utilization of 0.85 (K = 292), and plot in Figure Al P[D ~ d] versus d (expressed in both cells and microseconds) for a typical VCC. Since P[D 2 B] approximates the loss probability, we can read off from the figure the smallest buffer size such that this probability is at most equal to some desired values. We indicate in the 16

TM-TSV-022371 December 4, 1992 Table A7 Comparison of buffer size (in cells) obtained using bound and exact result for the CLP of the most unfortunate VCC for 64 kbps voice channels into an STS-3c (N = 2119) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP = 10- CLP = 107 CLP = 10(exact/bound) (exact/bound) (exact bound) 1695 0.8 30/34 45/48 58/61 1801 0.85 40/44 57/60 73/76 1907 0.9 53/59 74/79 94 98 2013 0.95 76/85 102/109 125/130 2119 1.0 118/128 147/154 171/177 Table A8 Comparison of buffer size (in cells) obtained using bound and exact result for the CLP of the most unfortunate VCC for emulated DSls into an STS-3c (N = 86) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K CLP = 104 CLP = 10-7 CLP = 10-~" (exact/bound) (exact/bound) (exact'bound) 68 0.8 15/16 20/21 25/ 25 73 0.85 17/18 23/23 27/27 77 0.9 19/20 24/25 29/29 81 0.95 21/21 26/27 31/31 86 1.0 23/24 29/30 34/34 Table A9 Comparison of buffer size (in cells) obtained using bound and exact result for the CLP of the most unfortunate VCC for emulated DSIs into an STS-12c (N = 344) Number of VCCs Utilization Buffer size for Buffer size for Buffer size for K p CLP 10-4 CLP = 10- CLP = 10- (exact/bound) (exact/bound) (exact/bound) 275 0.8 23/25 32/33 40/41 292 0.85 27/29 37/39 46/47 309 0.9 32/34 43/45 52/53 326 0.95 39/41 50/52 59/61 344 1.0 47/50 59/61 68/70 figure the buffer sizes corresponding to the desired probabilities considered in Table A5. For instance, the probability 10-4 corresponds to a buffer size of just over 20 cells (which gives rise to the integer entry 21 cells in Table A5). Similarly, the entries corresponding to 10-7 and 10-10 are 32 cells and 41 cells, respectively. 17

'I'M-'TSV-022371 December 4, 1992 Table A10 Average delay for various cases at selected utilizations Utilization Average Delay (cells/ psec) p Case I (N = 2119) Case 2 (N = 86) Case 3 (N = 344) 0.8 2.0/5.6 1.5/4.3 1.9/1.4 0.85 2.8/7.9 2.0/5.7 2.5/1.8 0.9 4.3/12.2 2.6/7.3 3.6/2.6 0.95 8.2/23.3 3.5/9.7 5.7/4.0 1.0 28.2/79.8 5.2/14.6- 11.0/7.8 1 OeO 10e-210e-4................ 10e-6 a. 10 e-8 - 10e-1 - 10e- 124- 1 Oe-1 4 (cells) 0 10 20 30 40 50 (microseconds) 0 7.1 14.2 21.2 28.3 35.4 Delay d Figure A I Complementary distribution P [D 2d ] of the delay for a typical VCC at utilization 0.85 for emulated DSls into an STS-12c (Case 3. N = 344) 18

TM-'ISV-022371 December 4, 1992 REFERENCES 1. Bellcore, "Broadband Switching System Generic Requirements," Document Number TA-NWT-001110, Issue 1, August 1992. 2. CCITT Draft Recommendation I.35B, "B-ISDN ATM Layer Cell Transfer Performance," available in Report of Working Party XVIII' 6 (Performance - Part I, Temporary Document Number TD 74 (XVIII), Geneva, June 1992. 3. CCITT Recommendation 1.371,'Traffic Control and Congestion Control in BISDN," Temporary Document Number TD 69 (XVIII), Geneva, June 1992. 4. CCITT Study Group XVIII, "Recommendations Drafted by Working Party XVIII/8 (General B-ISDN Aspects) to be Approved in 1990," Document Number COM XVIII-R 34-E, Geneva, June 1990. 5. Wai Chen and C. Anthony Cooper, "An Approach for B-ISDN Resource Allocation and Traffic Measurement," Proceedings of the Second Bellcore Symposium on Performance Modeling, Document Number SR-TSV-002424, November 1992. 6. Amit Bhargava, Pierre Humblet and Michael G. Hluchyj, "Queueing Analysis of Continuous Bit-Stream Transport Packet Networks," Proc. IEEE Globecom'89, pp. 25.6.1-25.6.5. 7. Teunis J. Ott and J. George Shanthikumar, "On a Buffer Problem for Packctized Voice with an N-Periodic Strongly Interchangeable Input Process," Journal of Applied Probability, Volume 28, pp 630-646, 1991. 8. James W. Roberts and Jorma T. Virtamo, "The Superposition of Periodic Cell Arrival Streams in an ATM Multiplexer," IEEE Trans. on Communications, Volume 39, Number 2, pp. 298-303, February 1991. 9. D. Anick, D. Mitra and M. M. Sondhi, "Stochastic Theory of a Data-Handling System with Multiple Sources," Bell System Tech. Journal, Volume 61, No. 8, pp. 1871-1894, October 1982. 10. Draft American National Standard for Telecommunications, "Broadband ISDN User-Network Interfaces: Rates and Formats Specifications," American National Standards Institute, Document Number TIS1/92-310, 1992. 11. V. Ramaswami and W. Willinger, "Efficient Traffic Performance Strategies for Packet Multiplexers," Computer Networks and ISDN Systems, Vol. 20, pp. 401-407, 1990. 19

TM-TSV-022371 December 4, 1992 20