THE UNIVERSITY OF MICHIGAN
COMPUTING RESEARCH LABORATORY¹

MEMORY INTERFERENCE MODELS WITH VARIABLE CONNECTION TIME

T. N. Mudge and Humoud B. Al-Sadoun

CRL-TR-16-84
MARCH 1984

Room 1079, East Engineering Building
Ann Arbor, Michigan 48109 USA
Tel: (313) 763-8000

¹ This work was supported in part by the National Science Foundation under Grant MCS-8009315. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

MEMORY INTERFERENCE MODELS WITH VARIABLE CONNECTION TIME

by T. N. Mudge and Humoud B. Al-Sadoun

Computing Research Lab
Department of Electrical and Computer Engineering
The University of Michigan
Ann Arbor, MI 48109
313/764-0203

Abstract

This paper develops two discrete time models of the interference that occurs during memory access in multiprocessor systems. These models, the equivalent rate model and the Markov chain model, provide for variable connection times between processors and memories if these times can be characterized by a discrete random variable, X, with a probability mass function f(i). Neither model requires a complete description of f(i). The equivalent rate model, which is the simpler, requires only the first moment, while the Markov chain model requires the first and second moments. The models yield estimates of the bandwidth, BW, and related measures such as the probability that a memory request is accepted, Pa, and processor utilization, Up. A brief summary of earlier discrete time models is included, and it is shown that one of them is a proper special case of the Markov chain model. Comparisons with simulations show that both models give good estimates of BW when the coefficient of variation, Cv, of X is small. When Cv reaches 2.0 the Markov chain model still shows an error of less than 4%, while the equivalent rate model exhibits a 50% error that, unlike the error of the Markov chain model, continues to increase as Cv increases. Finally, it is shown that BW drops significantly as Cv increases, suggesting that processor-memory transfers should use a fixed block size if memory conflict is to be minimized.

Index terms: Memory Interference, Multiprocessors, Memory Bandwidth, Performance Evaluation, Markov Chains.

1. Introduction

This paper develops two discrete time models of the memory interference that occurs during memory access in a multiprocessor system. These models, termed the equivalent rate (ER) model and the Markov chain (MC) model, are based on a set of assumptions that characterize the memory access behavior of a multiprocessor system as a stochastic process. Figure 1 illustrates the type of system dealt with by the models. It shows a synchronous multiprocessor having N processors and M memory modules. The processors share the memory modules through an N × M crossbar interconnection network. The whole system is synchronized with a system clock whose period is referred to as "the system cycle."

Previous discrete time memory interference models assume that once a connection is established between a processor and a memory module it is maintained for a fixed number of system cycles (usually one). The models developed in this paper consider the case where the connection may be maintained for a variable number of system cycles. In particular, the connection time is represented by a discrete random variable, X, with a probability mass function f(i)¹. This extension allows the modeling of processor-memory transactions that consist of variable length packets. Those cases where the processor-memory connection time is a fixed number of system cycles will be referred to as fixed connection time (FCT) systems, while those that allow variations in the number of system cycles will be referred to as variable connection time (VCT) systems.

The literature contains a number of memory interference models for FCT systems (see [1]-[11]). In these studies system operation is approximated by a stochastic process as follows. At the beginning of the system cycle a processor selects a memory module at random and makes a request to access that module with probability r (≤ 1).
If more than one request is made to the same memory module, the module will choose one at random; the other processors will retry in the next cycle. A processor has at most one pending request waiting for access at any time. The behavior of the processors is considered to be independent but statistically identical. A processor that obtains a connection to a memory module at the beginning of the system cycle will release that module at the end of the cycle.

[Figure 1. Block diagram of a multiprocessor system: N processors connected to M memory modules through a crossbar (X-BAR) connection.]

A model for the FCT system described above results in a Markov chain having an unmanageably large state space, see [1] and [4]. One of the main themes of the work in [1]-[11] is to develop models which avoid this complexity while maintaining reasonable agreement with simulation results. This is done by further simplifying the assumptions about the system behavior. The models develop equations for the memory bandwidth, BW, the probability that a memory request is accepted, Pa, and in some cases processor utilization, Up. In [10] a classification has been proposed for these models according to the approach used in their formulation. The classes are probabilistic models, rate-adjusted probabilistic models, queueing system models and steady state flow models. Since we make use of some of these models, and to provide further background, the classification is summarized briefly below.

The first class, probabilistic models, simplifies the analysis by assuming that a memory request from a processor will be discarded if it is denied memory access as a result of memory interference. At the next system cycle a new and unrelated request will be made by the processor with probability r. Examples are reported in [2], [3], [7]-[9], [11] and [12]. In this class of models the bandwidth is given by:

$$BW = M\left[1 - (1 - r/M)^N\right] \tag{1}$$

The second class, rate-adjusted probabilistic models, is proposed in [5] and [6]. This class simplifies the analysis by assuming that a processor which is denied memory access will make a new request, not necessarily to the same memory module, with probability 1.0 in the next cycle; this assumption tends to give an upper bound for the memory bandwidth. The bandwidth is obtained by calculating the adjusted rate, a, from the following two equations using iteration:

$$P_a = \frac{M}{N a}\left[1 - (1 - a/M)^N\right] \tag{2a}$$

And:

$$a = \frac{1}{1 + P_a (1/r - 1)} \tag{2b}$$

The bandwidth is then obtained from:

$$BW = M\left[1 - (1 - a/M)^N\right] \tag{2c}$$

Notice that the "rate," r, in equation (1) is replaced by the "adjusted rate," a. This takes into account the fact that, in general, more than one request is made before the connection is established, i.e., 0 < r ≤ a ≤ 1.0.

The third class, queueing system models, views the memory modules as service stations and processor requests as customers. In [4] a binomial approximation is used to solve these queueing systems. The assumption in this class is that arrivals to the queue are binomially distributed. The resulting bandwidth is given by equation (3a), in which L is the approximate mean queue length:

[Equation (3a) for BW and the accompanying expression for L are illegible in the source.]

The expression for L is obtained by approximating the length of the queue seen by an arriving customer by (1 − 1/N)L. This linear approximation usually gives good results for highly utilized systems but works poorly for systems with low r. An improvement on this approximation is given in [7] for the case r = 1.0. The improvement relies on a decomposition approximation suggested in [4].

A fourth class, steady state flow models, is proposed in [10]. The idea behind this proposal is that the flow of active processors' requests to the memory modules will equal the flow of satisfied requests from the memory modules. An active processor is a processor that does not have a pending request. The bandwidth obtained from the flow model is given by the following two equations:

$$BW = N\,U_p\,r \tag{4a}$$

And:

[Equation (4b), which expresses BW in terms of M, N, r and Up, is illegible in the source.]

The above equations can be solved by iteration. In [10] it is shown that the steady state flow model has a smaller maximum error and a smaller average mean-square error than the other models over the whole range of r, i.e., 0 < r ≤ 1.

In addition to the above discrete time models, continuous time models have been proposed. These are typified by the models in [13] and [14], in which the memory-processor connection time is denoted by an exponentially distributed random variable and the processor interrequest time is also denoted by an exponentially distributed random variable. Therefore, unlike the discrete time FCT models, these models can accommodate variable connection times provided the times can be approximated by an exponentially distributed random variable². This is an important step in modeling VCT systems. However, continuous time models are usually less accurate than discrete time models when discrete time events are being modeled [16], [17]. (The trade-off is that the continuous time models use the memoryless property of the exponential distribution to simplify model development.) Furthermore, the restriction of approximating connection times as exponentially distributed random variables does not allow one to gauge accurately, or to gain insight into, the effect of situations where this approximation does not hold.

As stated earlier, this paper develops two discrete time VCT models. The connection time is modeled as a discrete random variable, X, having a probability mass function f(i). There are no restrictions on f(i); for example, f(i) need not be a geometric distribution (the discrete time analogue of an exponential distribution). This freedom to specify f(i) is important because, as we will show, the memory interference behavior is highly dependent on Cv, the coefficient of variation³ of the random variable X. In particular, it is shown that increasing Cv reduces BW, Pa and Up. Thus, for example, caching schemes that employ variable block size transfers will experience greater memory conflict than schemes employing fixed size blocks, all other things being equal.

The paper is organized as follows: Section 2 describes the assumptions that characterize system operation; Section 3 defines the performance measures that can be obtained from the models; Section 4 develops the two analytic models, i.e., the ER and MC models, which are used to approximate the performance measures of VCT systems; Section 5 presents the simulation results and compares them to the models' results; Section 6 presents some concluding remarks.

¹ In other words, f(i) = Pr[X = i].

² These models also considered the case, not considered here, of multiple-bus interconnections rather than a crossbar. Until recently no simple discrete time model existed for this [15].

³ $C_v = \dfrac{\text{standard deviation of } X}{\text{expected value of } X} = \dfrac{\sqrt{\overline{X^2} - \overline{X}^2}}{\overline{X}}$, where $\overline{X}$ and $\overline{X^2}$ are the first and second moments of f(i) respectively.

TC: Mudge & Al-Sadoun, March 1984
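As a rough illustration of the two simplest FCT models summarized in the introduction, the following Python sketch evaluates equation (1) and solves the rate-adjusted model by fixed-point iteration. It assumes the reading of equations (2a) and (2b) as Pa = (M/(N a))[1 − (1 − a/M)^N] and a = 1/(1 + Pa(1/r − 1)); the function names, tolerance and iteration cap are illustrative choices and do not come from the paper.

```python
def bw_probabilistic(n, m, r):
    """Equation (1): BW = M[1 - (1 - r/M)^N]."""
    return m * (1.0 - (1.0 - r / m) ** n)


def bw_rate_adjusted(n, m, r, tol=1e-12, max_iter=10_000):
    """Iterate (2a)-(2b) for the adjusted rate a; return (BW, Pa, a).

    Assumed forms (our reading of the garbled equations):
      (2a) Pa = (M / (N a)) [1 - (1 - a/M)^N]
      (2b) a  = 1 / (1 + Pa (1/r - 1))
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    a = r  # start from the unadjusted rate
    for _ in range(max_iter):
        pa = m / (n * a) * (1.0 - (1.0 - a / m) ** n)
        a_next = 1.0 / (1.0 + pa * (1.0 / r - 1.0))
        converged = abs(a_next - a) < tol
        a = a_next
        if converged:
            break
    # Recompute Pa at the final a so the returned values are consistent.
    pa = m / (n * a) * (1.0 - (1.0 - a / m) ** n)
    bw = m * (1.0 - (1.0 - a / m) ** n)  # equation (2c)
    return bw, pa, a
```

For r = 1.0 the adjusted rate stays at 1.0 and the two functions return the same bandwidth; for r < 1 the adjusted rate lies between r and 1.0, so the rate-adjusted bandwidth is never below the value from equation (1).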

2. The System Operation Assumptions

A processor in the system depicted in Figure 1 can be in any of three states: thinking, when it has no outstanding request to memory (it might be performing local processing); accessing, when it is connected to a memory module; and waiting, when it has a pending memory request waiting to be serviced. The memory module can be in one of two states: busy, when it is being accessed by a processor, and idle, when it is not being accessed. The following assumptions further characterize the operation of the system:

I. Processors' requests for memory form independent, statistically identical stochastic processes.

II. At the beginning of a system cycle a processor that is in the thinking state, or that has just completed a memory access, makes a request to access a memory module with probability r.

III. If more than one processor issues a request to a particular memory module, that memory will choose one at random to get the connection. The other processors will retry to the same memory module in the next cycle.

IV. The requests originating from the same processor are independent of each other. In each case a memory module will be chosen at random from the M memory modules with equal probability, i.e., with probability 1/M.

V. The connection time, in units of system cycles, between a processor and a memory module is determined by a discrete random variable X, which has a probability mass function f(i).

Assumptions I through IV are common to all the discrete FCT models mentioned earlier. Empirical evidence to support their adoption can be found in references [3]-[5] among others. Assumption V is the key difference between the FCT and VCT models; thus, the models developed in this paper depend on the extent to which connection times can be approximated by a random variable, X, having a probability mass function f(i). From assumption V it can be seen that the FCT case is just a special case of the VCT case where X = 1 with probability one.

In order to derive numerical information from the memory interference models developed later, the values of M, N, r and the first two moments of f(i) must be obtained through measurement or, if it is considered satisfactory, by hypothesis. These quantities can be regarded as the inputs to the models.
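The distribution-dependent inputs named above, the first two moments of f(i), together with the coefficient of variation Cv used throughout the paper, can be computed from a tabulated probability mass function. The sketch below is a minimal illustration; the dictionary representation of f(i) and the function name are assumptions made for this example only.

```python
import math


def connection_time_stats(pmf):
    """Return (first moment, second moment, Cv) of a pmf given as {i: f(i)}.

    Cv is the standard deviation of X divided by its expected value.
    """
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("f(i) must sum to one")
    if any(i < 1 for i in pmf):
        raise ValueError("connection times are at least one system cycle")
    mean = sum(i * p for i, p in pmf.items())
    second = sum(i * i * p for i, p in pmf.items())
    variance = max(second - mean * mean, 0.0)  # guard against rounding
    return mean, second, math.sqrt(variance) / mean
```

For instance, a fixed connection time of four cycles, `{4: 1.0}`, has Cv = 0, while the two-point distribution `{2: 16/17, 36: 1/17}` has the same mean of four cycles but a second moment of 80 and Cv = 2.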

3. Performance Measures

The performance measures that will be derived from our VCT models are: memory bandwidth, BW, the probability that a memory request is accepted, Pa, and processor utilization, Up. These quantities can be regarded as the outputs of the models.

The memory bandwidth is defined as the expected number of busy memory modules seen by a random arrival after the system reaches its steady state. Equivalently, it can be defined as the expected number of accessing processors seen by a random arrival after the system reaches its steady state. It can be shown that the Markov chain which describes the behavior of a system defined by assumptions I through V is ergodic. This implies that the limiting probability of a state equals the probability that a random arrival sees the system in that state, see [18] and [19]. Thus, the memory bandwidth can be expressed as follows:

$$BW = \lim_{t \to \infty} \sum_{i=1}^{N} \Pr[\text{processor } i \text{ is accessing at time } t]$$

Since the processors are independent and statistically identical (assumption I), the above equation can be restated as follows:

$$BW = N \lim_{t \to \infty} \Pr[\text{a processor is accessing at time } t]$$

The probability that a memory request is accepted, Pa, is defined as follows:

$$P_a = \Pr[\text{a processor obtains access to a memory} \mid \text{the processor requests that memory}]$$

Finally, processor utilization, Up, can be defined as the fraction of time a processor spends thinking or accessing a memory after the system has reached steady state. System ergodicity also implies that the limiting probability of a state equals the fraction of time the system spends in that state once it has reached steady state. Thus, Up can be expressed as follows:

$$U_p = 1 - \lim_{t \to \infty} \Pr[\text{a processor is waiting at time } t]$$

4. Memory Interference Models

4.1. Equivalent Rate Model (ER Model)

This model is a modification of an approach first presented in [20] to study the behavior of multiprocessor systems in which each processor has a private cache memory. The processor-memory connections were assumed to last a fixed number of system cycles, namely the time needed to transfer one line. An FCT probabilistic model (equation (1)) was used to compute BW, and an equivalent value for r (the equivalent rate) was derived from the fixed number of system cycles needed to transfer a line and the probability of requesting a line transfer. In our ER model the foregoing approach is retained; the main differences are, firstly, that the equivalent rate is modified to take into account the fact that processor-memory transfers take a variable number of system cycles and, secondly, that the probabilistic model of equation (1) is replaced by the steady state flow model of equation (4) because it produces a smaller error. The ER model can be regarded as a technique for mapping a VCT system into an equivalent FCT model by appropriately defining an equivalent value for r in the FCT model.

The derivation of the equivalent rate, $r_{eq}$, proceeds as follows. The average connection time in units of system cycles can be expressed by:

$$\overline{X} = \sum_{i=1}^{S} i\, f(i)$$

i.e., the first moment of the random variable X (S is the maximum connection time). From assumption II the thinking time is geometrically distributed. Therefore, if $\overline{T}$ is the average thinking time, it follows that:

$$\overline{T} = \frac{1 - r}{r}$$

The equivalent rate can now be defined by:

$$r_{eq} = \frac{\overline{X}}{\overline{X} + \overline{T}}$$

This is the fraction of time a processor in a VCT system is accessing memory assuming no interference. Thus, it can be viewed as the request rate of an equivalent FCT system (the request rate is obtained from an interference free trace of processor activity [5]). As will be shown, the ER model captures the average behavior of VCT systems only if Cv is zero. It introduces inaccuracies in those cases where there is randomness in the connection time, because it does not take into account the second moment of the connection time. However, it cannot be improved upon if the only information about f(i) is its first moment.

4.2. Markov Chain Model (MC Model)

The Markov chain which models a VCT system according to the assumptions outlined in Section 2 has an unmanageably large state space, see [1] and [4]. The MC model dramatically reduces the size of this state space by making a further simplifying assumption. The Markov chain in the MC model describes the behavior of one processor. Within the framework of the assumptions this is sufficient to describe the complete system behavior because all processors are assumed independent and statistically identical (assumption I).

The essence of the simplification is similar to that used in the rate-adjusted probabilistic models (see [5] and [6]): when a processor is blocked in an attempt to access a memory module after placing a request, it enters a series of waiting states for the residual service time of that module. The residual service time (RST) of a memory module is the time remaining before the currently accessing processor releases the memory module. The model assumes that the processor waits for the RST before placing a new and independent request with a probability of 1.0. The request is directed to any memory module, independent of the particular memory at which it was previously blocked. This assumption, IIIa, is a relaxation of assumption III, which requires that resubmission be to the same memory. It simplifies the Markov chain but usually causes an overestimate of the memory bandwidth of the VCT system being modeled.
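The equivalent rate of Section 4.1 is a one-line computation once the mean connection time is known. The sketch below assumes the reading of the two garbled expressions as mean thinking time (1 − r)/r and equivalent rate equal to the mean connection time divided by the sum of the mean connection time and the mean thinking time; the function name is hypothetical.

```python
def equivalent_rate(mean_connection_time, r):
    """Equivalent FCT request rate for a VCT system (ER model).

    Assumes mean thinking time T = (1 - r) / r (geometric thinking time)
    and r_eq = X / (X + T), where X is the mean connection time in cycles.
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    if mean_connection_time < 1.0:
        raise ValueError("mean connection time is at least one cycle")
    mean_thinking_time = (1.0 - r) / r
    return mean_connection_time / (mean_connection_time + mean_thinking_time)
```

With a mean connection time of one cycle the function returns r itself, which is the FCT case; with a mean connection time of four cycles and r = 0.5 it returns 0.8. The result would then be used as the request rate in an FCT model, which is the steady state flow model in the paper.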
The Markov chain, shown in Figure 2, defines a processor's behavior if assumption IIIa is used. When a processor is in state i it is accessing a memory module and it needs i more cycles before releasing the connection. After one cycle in state i the processor always moves to state i−1, indicating it needs i−1 more cycles. Thus, there is a single transition from state i (1 < i ≤ S) to state i−1 with a probability of 1.0. The set of states {i} are accessing states. When a processor is in state 0 it is in the thinking state and it may be performing local processing. When a processor is in state $\bar{i}$ it had a memory request blocked and must wait i cycles before it can resubmit the request. After one cycle in state $\bar{i}$ the processor always moves to state $\overline{i-1}$, indicating it needs i−1 more cycles. Thus, there is a single transition from state $\bar{i}$ (1 < i ≤ S) to state $\overline{i-1}$ with a probability of 1.0. The set of states {$\bar{i}$} are waiting states.

[Figure 2. The Markov chain for a VCT processor (assumption IIIa).]

The transition probabilities $\alpha_i$, $\beta_i$, $\alpha'_i$ and $\beta'_i$ are defined by the following equations:

$$\alpha_i = \begin{cases} 1 - r & i = 0 \\ \Pr[\text{a request is made and accepted and needs } i \text{ cycles}] & 1 \le i \le S \end{cases}$$

$$\beta_i = \Pr[\text{a request is made to a memory that has an RST of } i \text{ cycles}] \qquad 1 \le i \le S$$

$$\alpha'_i = \Pr[\text{a request is accepted and needs } i \text{ cycles}] = \frac{\alpha_i}{r} \qquad 1 \le i \le S$$

$$\beta'_i = \Pr[\text{a memory has an RST of } i \text{ cycles} \mid \text{a request was made to it}] = \frac{\beta_i}{r} \qquad 1 \le i \le S$$

If the processor is in the thinking state, i.e., state 0, one of three possibilities can occur at the beginning of the next system cycle: the processor continues in the thinking state with probability $\alpha_0$ (= 1 − r); the processor accesses a memory module for the next i cycles with probability $\alpha_i$, i.e., the processor enters state i; or the processor is blocked for the next i cycles with probability $\beta_i$, i.e., the processor enters state $\bar{i}$. The same three possibilities can occur if the processor is about to end its connection period at the beginning of the next system cycle, i.e., when the processor is in state 1. On the other hand, if the processor is at the end of its waiting period, i.e., it is in state $\bar{1}$, then, since assumption IIIa requires the resubmission of blocked requests with a probability of one, only one of two possibilities can occur at the beginning of the next system cycle: the processor gains access to a memory module for the next i cycles with probability $\alpha'_i$, i.e., the processor enters state i; or the processor is again blocked and must wait for the next i cycles with probability $\beta'_i$, i.e., the processor enters state $\bar{i}$. There is no transition from state $\bar{1}$ to the thinking state, i.e., state 0, because of assumption IIIa.

The following definition will be useful in the remainder of this section:

$$R \triangleq \Pr[\text{a request is made at the beginning of any system cycle}] = r(P_1 + P_0) + P_{\bar{1}}$$

where $P_i$ is the limiting probability for state i. One important distinction should be noted: R is the probability that a processor makes a request at the beginning of any system cycle, while r (≤ R) is the probability that a processor makes a request at the beginning of any system cycle given that the processor is thinking or has just finished accessing (assumption II).

4.2.1. FCT Models as a Special Case of the MC Model

If the MC model is to have validity it should agree closely with an FCT model when X is deterministic with probability mass function defined as:

$$f(i) = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{otherwise} \end{cases}$$

In this subsection it is shown that an FCT model, the rate-adjusted probabilistic model, is a proper special case of the MC model. The Markov chain for the case where the probability mass function is given above is shown in Figure 3.
In order to develop expressions for the transition probabilities it is first necessary to develop an expression for Pa, the probability that a request is accepted (see the definition in Section 3). The development proceeds as follows. The probability that a request is made at the beginning of a system cycle is given by R, and the probability that this request is directed to a specific memory is R/M (assumption IV). It follows that the probability that the request is not directed to the memory is (1 − R/M), and further, that the probability that no request is directed to the memory from any processor is $(1 - R/M)^N$. Thus, the probability that at least one processor makes a request for the memory is given by $1 - (1 - R/M)^N$. The expected number of busy memories is then given by $M[1 - (1 - R/M)^N]$. However, the expected number of requests made by all of the processors is NR; therefore the fraction that are accepted, i.e., Pa, is given by:

$$P_a = \frac{M}{N R}\left[1 - (1 - R/M)^N\right] \tag{5}$$

[Figure 3. The Markov chain for an FCT processor (assumption IIIa).]

The transition probabilities can now be expressed as follows:

$$\alpha_0 = 1 - r, \qquad \alpha_1 = r P_a f(1) = r P_a, \qquad \beta_1 = r - \alpha_1 = r(1 - P_a)$$

$$\alpha'_1 = \frac{\alpha_1}{r} = P_a, \qquad \beta'_1 = \frac{\beta_1}{r} = 1 - P_a$$

The limiting probabilities for the states of Figure 3 satisfy the following equations:

$$P_1 = \alpha_1 P_0 + \alpha_1 P_1 + \alpha'_1 P_{\bar{1}}$$

$$P_0 = \alpha_0 P_0 + \alpha_0 P_1$$

$$P_{\bar{1}} = \beta_1 P_0 + \beta_1 P_1 + \beta'_1 P_{\bar{1}}$$

And:

$$P_1 + P_0 + P_{\bar{1}} = 1$$

These simultaneous equations together with the earlier definition for R yield the following equation:

$$R = \frac{1}{1 + P_a (1/r - 1)} \tag{6}$$

Equations (5) and (6) can be solved using iteration. From the definition of BW in Section 3 it can be shown that:

$$BW = N P_1 = M\left[1 - (1 - R/M)^N\right] \tag{7}$$

Similarly, it can be shown that:

$$U_p = 1 - P_{\bar{1}} = 1 - (1 - P_a) R$$

Equations (5), (6) and (7) are isomorphic to equations (2a, b, c) with a replaced by R as the adjusted rate. Thus, the Markov chain shown in Figure 3 is equivalent to the rate-adjusted probabilistic model of [5], and the rate-adjusted probabilistic model is therefore a proper special case of the MC model.

Before concluding this subsection, it is worth noting that a Markov chain model, similar to the MC model of Figure 2, can be developed which assumes that the processor discards the unaccepted request altogether and, after waiting for the RST of the memory module, makes a new request with a probability of r. This assumption, IIIb, is another relaxation of assumption III that also simplifies the model, but is less realistic than IIIa. The Markov chain describing the processor's behavior under assumption IIIb is shown in Figure 4. Unlike the MC model of Figure 2, there is a transition from state $\bar{1}$ to the thinking state. A solution to the Markov chain of Figure 4 can be found in a similar fashion to that presented later for the MC model. The similarity between the two models becomes apparent as r → 1. They are identical when r = 1.0. Assumption IIIb is used by the probabilistic models developed in [2], [3], [7]-[9] and [11].

[Figure 4. The Markov chain for a VCT processor (assumption IIIb).]

For the case where the probability mass function is again given by:

$$f(i) = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{otherwise} \end{cases}$$

Figure 5 results. Under assumption IIIb the expression for R changes, since resubmission in state $\bar{1}$ is now made with probability r rather than 1.0. Therefore:

$$R = r(P_1 + P_0 + P_{\bar{1}}) = r$$

And, from equation (5) with R = r:

$$P_a = \frac{M}{N r}\left[1 - (1 - r/M)^N\right]$$

The branching probabilities for the Markov chain shown in Figure 5 are as follows:

$$\alpha_0 = 1 - r; \qquad \alpha_1 = r P_a; \qquad \beta_1 = r - \alpha_1$$

The limiting probabilities are:

$$P_1 = \alpha_1; \qquad P_0 = \alpha_0; \qquad P_{\bar{1}} = \beta_1$$

And the memory bandwidth for the system can be expressed as:

$$BW = N P_1 = M\left[1 - (1 - r/M)^N\right]$$

This equation agrees with the bandwidth equation of the simplest memory interference models, the probabilistic models of equation (1).

[Figure 5. The Markov chain for an FCT processor (assumption IIIb).]
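The claim that the three-state chain under assumption IIIa reproduces the rate-adjusted model can be checked numerically. The sketch below assumes the forms Pa = (M/(N R))[1 − (1 − R/M)^N] for equation (5) and R = 1/(1 + Pa(1/r − 1)) for equation (6), solves them by fixed-point iteration, and then finds the limiting probabilities of the chain by repeated multiplication with its transition matrix. The state ordering, function names and iteration counts are choices made for this example.

```python
def solve_fct_rate(n, m, r, tol=1e-13, max_iter=10_000):
    """Fixed-point iteration on R for equations (5) and (6); returns (R, Pa)."""
    big_r = r
    for _ in range(max_iter):
        pa = m / (n * big_r) * (1.0 - (1.0 - big_r / m) ** n)
        r_next = 1.0 / (1.0 + pa * (1.0 / r - 1.0))
        converged = abs(r_next - big_r) < tol
        big_r = r_next
        if converged:
            break
    pa = m / (n * big_r) * (1.0 - (1.0 - big_r / m) ** n)
    return big_r, pa


def stationary_fct_chain(r, pa, sweeps=500):
    """Limiting probabilities (P1, P0, P1bar) of the FCT chain, assumption IIIa."""
    a0, a1, b1 = 1.0 - r, r * pa, r * (1.0 - pa)
    # Rows are the current state, columns the next state,
    # both ordered as (accessing 1, thinking 0, waiting 1bar).
    rows = [
        [a1, a0, b1],          # from accessing state 1
        [a1, a0, b1],          # from thinking state 0
        [pa, 0.0, 1.0 - pa],   # from waiting state 1bar: always resubmits
    ]
    p = [1.0 / 3.0] * 3
    for _ in range(sweeps):
        p = [sum(p[i] * rows[i][j] for i in range(3)) for j in range(3)]
    return p
```

A usage check for an 8x8 system with r = 0.5: after `R, pa = solve_fct_rate(8, 8, 0.5)` and `p1, p0, pw = stationary_fct_chain(0.5, pa)`, the quantity `8 * p1` should agree with `8 * (1 - (1 - R / 8) ** 8)`, and `0.5 * (p1 + p0) + pw` should reproduce R.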

4.2.2. General MC Model

In this subsection the equations for the general MC model are developed. The following definitions will be used:

$$B \triangleq \Pr[\text{a memory is busy at the end of a system cycle}] = \Pr[\text{a processor is still accessing at the end of a system cycle}] = \sum_{i=2}^{S} P_i$$

$$B'_i \triangleq \Pr[\text{processor } i \text{ requests a memory that is busy}] = \frac{1}{M}\sum_{\substack{j=1 \\ j \ne i}}^{N} \Pr[\text{processor } j \text{ is still accessing a memory at the end of a system cycle}] = \frac{(N-1)B}{M}$$

It follows from assumption I that $B'_i$ is independent of i, thus we can define:

$$B' = B'_i \qquad 1 \le i \le N$$

From these definitions the following terms can be developed. These terms are used to derive the transition probabilities of the MC model. They are presented here by way of explanation:

$(1 - P_a) f(i)$ = Pr[a memory request is blocked and the processor which obtains the memory and blocks the request needs i cycles]

$1 - B'$ = Pr[a processor requests a memory that is idle]

$B''_i$ = Pr[a memory seen by a processor has an RST of i cycles | the memory is busy]

The transition probabilities for the Markov chain of Figure 2 can now be expressed as follows:

$$\begin{aligned}
\alpha_0 &= 1 - r \\
\alpha_i &= r P_a (1 - B') f(i) & 1 \le i \le S \\
\beta_i &= r\left[B' B''_i + (1 - B')(1 - P_a) f(i)\right] & 1 \le i \le S-1 \\
\beta_S &= r (1 - B')(1 - P_a) f(S) \\
\alpha'_i &= \alpha_i / r, \qquad \beta'_i = \beta_i / r & 1 \le i \le S
\end{aligned} \tag{8}$$

The limiting probabilities for the states of Figure 2 are:

$$P_0 = \frac{\alpha_0}{1 - \alpha_0} P_1, \qquad P_i = R \sum_{j=i}^{S} \alpha'_j, \qquad P_{\bar{i}} = R \sum_{j=i}^{S} \beta'_j \qquad 1 \le i \le S \tag{9}$$

Since the above 2S+1 equations add to one, it follows that:

$$\frac{\alpha_0}{1 - \alpha_0} \sum_{i=1}^{S} \alpha'_i + \sum_{i=1}^{S} i\,(\alpha'_i + \beta'_i) = \frac{1}{R} \tag{10}$$

From the definitions of B', B and equation (9), B can be expressed as follows:

$$B = \sum_{i=2}^{S} P_i = R \sum_{i=2}^{S} (i - 1)\,\alpha'_i$$

Substituting in the above equation for $\alpha'_i$ from equation (8) allows B to be expressed as the following function of R:

$$B = \frac{(\overline{X} - 1) P_a R}{1 + \dfrac{N-1}{M} (\overline{X} - 1) P_a R} \tag{11}$$

Substituting equations (8) and (9) into (10) results in the following equation:

$$\frac{1}{R} = \left(1 - \frac{N-1}{M} B\right)\left[\overline{X} + (1/r - 1) P_a + \frac{N-1}{M} P_a R \sum_{i=1}^{S} \frac{i(i-1)}{2} f(i)\right]$$

By defining the second moment of the connection time distribution in the normal way, i.e., $\overline{X^2} = \sum_{i=1}^{S} i^2 f(i)$, the above equation can be written:

$$\frac{1}{R} = \left(1 - \frac{N-1}{M} B\right)\left[\overline{X} + (1/r - 1) P_a + \frac{N-1}{2M} P_a R \left(\overline{X^2} - \overline{X}\right)\right] \tag{12}$$

The equations for Pa (equation (5)), for B (equation (11)) and the above equation form the MC model. They can be solved by iteration. In the experiments reported in the next section a fixed-point iteration on the value of R was used. Solution typically required 4 iterations (with a maximum of 8) when an initial value of R = r was used. Higher order iteration schemes could be used but were found unnecessary within the scope of our work.

The value of Pa falls out directly from the above solution method. The values of memory bandwidth, BW, and processor utilization, Up, can be calculated from the following:

$$BW = N (P_1 + B)$$

$$U_p = 1 - \sum_{i=1}^{S} P_{\bar{i}}$$

These equations follow from the definitions of Section 3. It can be seen from equation (12) that R depends, among other things, on the inverse of $\overline{X^2}$. Therefore, it follows from the above equations that both BW and Up depend on the inverse of $\overline{X^2}$. Furthermore, it can also be seen from equation (12) that R is independent of S. Thus the underlying Markov chain of the MC model need not be finite.
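The iterative solution of the general MC model can be sketched in a few lines of Python. The forms used here are assumptions: they are our reading of equations (5), (11) and (12), rederived from the definitions of B, B' and the transition probabilities, namely Pa = (M/(N R))[1 − (1 − R/M)^N], B = (X̄ − 1)PaR / (1 + ((N − 1)/M)(X̄ − 1)PaR), and 1/R = (1 − (N − 1)B/M)[X̄ + (1/r − 1)Pa + ((N − 1)/(2M))PaR(X̄² − X̄)], where X̄² denotes the second moment. The damping of the iteration, the function name and the dictionary representation of f(i) are choices made for this example and are not taken from the paper.

```python
def solve_mc_model(n, m, r, pmf, tol=1e-12, max_iter=100_000):
    """Damped fixed-point iteration on R for the general MC model.

    pmf maps connection time i (in cycles) to f(i). Only its first two
    moments are used. Returns a dict with R, Pa, B, BW and Up.
    """
    mean = sum(i * p for i, p in pmf.items())        # first moment
    second = sum(i * i * p for i, p in pmf.items())  # second moment
    k = (n - 1) / m

    def pa_and_busy(big_r):
        pa = m / (n * big_r) * (1.0 - (1.0 - big_r / m) ** n)  # assumed (5)
        c = (mean - 1.0) * pa * big_r
        return pa, c / (1.0 + k * c)                            # assumed (11)

    big_r = r  # initial value R = r, as in the text
    for _ in range(max_iter):
        pa, busy = pa_and_busy(big_r)
        inv_r = (1.0 - k * busy) * (
            mean + (1.0 / r - 1.0) * pa
            + 0.5 * k * pa * big_r * (second - mean)
        )                                                       # assumed (12)
        r_next = 0.5 * (big_r + 1.0 / inv_r)  # damping is our own choice
        converged = abs(r_next - big_r) < tol
        big_r = r_next
        if converged:
            break

    pa, busy = pa_and_busy(big_r)
    p1 = big_r * pa * (1.0 - k * busy)     # limiting probability of state 1
    bw = n * (p1 + busy)                   # BW = N (P1 + B)
    up = p1 * (mean + 1.0 / r - 1.0)       # P0 plus all accessing states
    return {"R": big_r, "Pa": pa, "B": busy, "BW": bw, "Up": up}
```

Under these assumed forms, a fixed connection time of one cycle gives B = 0 and reduces the iteration to equations (5) and (6). Comparing `{4: 1.0}` with `{2: 16/17, 36: 1/17}` for a 32x32 system at r = 1.0 (same mean, Cv = 0 versus Cv = 2) gives a lower bandwidth for the second distribution, in line with the dependence on the second moment noted above.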

5. Simulation Results

A simulation, written in SIMSCRIPT II.5, of VCT systems operating according to the assumptions given in Section 2 was run for different probability mass functions, f(i), and different values of r. The results were compared to results calculated from the ER model and the MC model. The comparisons, which are shown in Figures 6, 7 and 8, were made for three systems sized 8×8, 16×16 and 32×32 respectively. The figures compare BW. The %Error shown in the figures was defined as:

$$\%\text{Error} = \frac{\text{Model } BW - \text{Simulated } BW}{\text{Simulated } BW} \times 100$$

Six different distributions for the connection time were used. All the distributions had the same expected value, $\overline{X}$, but their coefficient of variation, Cv, ranged from 0.0 (i.e., fixed connection time) to 2.0. As can be seen from the figures, both models gave results within 4% of the simulation for small Cv. The MC model remained within this error bound, but the ER model showed a monotonic increase in error with increase in Cv. In the case of Cv = 2.0 the error was as large as 50%. Similar results are obtained if Pa or Up are compared. The poor performance of the ER model, which continues to worsen as Cv increases beyond Cv = 2.0, confirms the importance of using the second moment of the connection time distribution in calculating BW, see equation (12).

The error in the MC model is due to assumption IIIa being used in place of assumption III. As noted, the simulation works according to the assumptions of Section 2, in particular assumption III. The key difference between assumptions IIIa and III is that in IIIa blocked requests for memory need not be resubmitted to the memory from which they were previously blocked. This is clearly unrealistic, but our experimental evidence indicates that the effect on the quantities BW, Pa and Up is quite small: less than 4% error in all cases. Furthermore, as mentioned earlier, the relaxation of assumption III to that of IIIa makes possible a manageable model.

Finally, the error is comparable with the empirical evidence reported in earlier work [3]-[5] that led to the assumptions of Section 2 as a phenomenological basis for the behavior of a large class of multiprocessors of the type shown in Figure 1.
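To make the simulated system concrete, the following Python sketch simulates a crossbar system cycle by cycle under assumptions I through V and computes the %Error measure defined above. It is not the SIMSCRIPT II.5 program used for the reported results; the function names, the warm-up period, the run length and the random seed are all assumptions of this example.

```python
import random


def percent_error(model_bw, simulated_bw):
    """%Error = (Model BW - Simulated BW) / Simulated BW * 100."""
    return 100.0 * (model_bw - simulated_bw) / simulated_bw


def simulate_bw(n, m, r, pmf, cycles=20_000, warmup=2_000, seed=0):
    """Estimate BW as the mean number of busy modules per system cycle."""
    rng = random.Random(seed)
    values, weights = list(pmf.keys()), list(pmf.values())
    remaining = [0] * n     # cycles left in each processor's connection
    pending = [None] * n    # module a waiting processor keeps retrying
    module_left = [0] * m   # cycles left before each module becomes idle
    busy_total = 0
    for t in range(cycles):
        requests = {}
        for p in range(n):
            if remaining[p] > 0:
                continue  # still accessing, cannot request
            if pending[p] is None and rng.random() < r:
                pending[p] = rng.randrange(m)  # assumptions II and IV
            if pending[p] is not None:
                requests.setdefault(pending[p], []).append(p)
        for module, procs in requests.items():
            if module_left[module] == 0:       # idle module picks a winner
                winner = rng.choice(procs)     # assumption III
                x = rng.choices(values, weights)[0]  # assumption V
                remaining[winner] = x
                module_left[module] = x
                pending[winner] = None
            # losers, and requesters of busy modules, retry the same module
        if t >= warmup:
            busy_total += sum(1 for left in module_left if left > 0)
        remaining = [max(c - 1, 0) for c in remaining]
        module_left = [max(c - 1, 0) for c in module_left]
    return busy_total / (cycles - warmup)
```

A call such as `simulate_bw(8, 8, 1.0, {1: 1.0})` estimates the FCT bandwidth of an 8×8 system at r = 1, and `percent_error(model_value, simulated_value)` then gives the quantity plotted in the comparison figures for any model estimate.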

Figure 6. Comparisons with simulation results for an 8x8 system. Panels (a) to (f) plot % Error against r for the MC model and the ER model, with $C_v$ = 0.0, 0.2, 0.4, 0.8, 1.0 and 2.0 respectively.

Figure 7. Comparisons with simulation results for a 16x16 system. Panels (a) to (f) plot % Error against r for the MC model and the ER model, with $C_v$ = 0.0, 0.2, 0.4, 0.8, 1.0 and 2.0 respectively.

Figure 8. Comparisons with simulation results for a 32x32 system. Panels (a) to (f) plot % Error against r for the MC model and the ER model, with $C_v$ = 0.0, 0.2, 0.4, 0.8, 1.0 and 2.0 respectively.

Figure 9 shows explicitly how BW varies with $C_v$ while $\bar{X}$ is held constant. As can be seen, just varying $C_v$ from 0 to 2.0 can reduce BW by as much as 40% at high request rates (r close to 1). In fact, in the case of the 32x32 system with $C_v = 2.0$, BW drops to the point where only 13 of the 32 memory modules are busy even when r = 1. This agrees with the interpretation, based on equation (12), made at the end of the last section, where it was concluded that BW would decrease if the second moment $\overline{X^2}$ (or $C_v$) increased. The most obvious consequence of BW depending on $C_v$ in this way is that transfers between processors and memories should be restricted to fixed blocks, or nearly fixed blocks in which variations in size are rare, if maximum BW is to be achieved.
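To make the relation between the mean, the second moment and $C_v$ concrete: by the definition of the coefficient of variation, the second moment of X equals $\bar{X}^2 (1 + C_v^2)$, so holding the mean fixed while raising $C_v$ necessarily raises the second moment. The sketch below builds three connection-time distributions with the same mean and different $C_v$. The two-point distributions and the mean of 5 are illustrative assumptions and are not the six distributions used in the paper. The helper names are hypothetical.

```python
import math


def moments(pmf):
    """Return (mean, second moment) of a pmf given as {value: probability}."""
    mean = sum(i * p for i, p in pmf.items())
    second = sum(i * i * p for i, p in pmf.items())
    return mean, second


def coefficient_of_variation(pmf):
    """C_v = standard deviation / mean."""
    mean, second = moments(pmf)
    variance = max(second - mean * mean, 0.0)  # guard against rounding below zero
    return math.sqrt(variance) / mean


def two_point_pmf(low, high, mean):
    """Two-point distribution on {low, high} with the requested mean."""
    p_high = (mean - low) / (high - low)
    return {low: 1.0 - p_high, high: p_high}


# Three distributions with mean 5 (an assumed value) and increasing spread.
fixed = {5: 1.0}                    # fixed connection time, C_v = 0
moderate = two_point_pmf(4, 30, 5)  # variance 25, so C_v = 1
wide = two_point_pmf(1, 30, 5)      # variance 100, so C_v = 2

for name, pmf in [("fixed", fixed), ("moderate", moderate), ("wide", wide)]:
    mean, second = moments(pmf)
    print(name, round(mean, 6), round(second, 6),
          round(coefficient_of_variation(pmf), 6))
```

For a two-point distribution on {low, high} with mean m, the variance is (m - low)(high - m), which is how the support points above were chosen. Going from $C_v$ = 0 to $C_v$ = 2 at a fixed mean multiplies the second moment by 5.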

Figure 9. The results of varying $C_v$ on BW. Each panel plots BW against r for $C_v$ = 0.0, 0.4, 1.0 and 2.0: (a) 8x8 system, (b) 16x16 system, (c) 32x32 system.

6. Conclusion

This paper has developed two discrete time models of the memory interference that occurs during memory access in a multiprocessor system when that access can have a variable duration. The first of these models, the ER model, is the simpler and, according to comparisons with simulations, provides accurate estimates of $P_a$, BW and $U_p$ if $C_v$ is small. The second, the MC model, is more complex but, according to comparisons with simulations, provides accurate estimates of $P_a$, BW and $U_p$ for a wide range of $C_v$. The ER model requires the values of M, N, r and $\bar{X}$ as inputs. The MC model requires, in addition, the value of the second moment, $\overline{X^2}$. The explicit dependence of the MC model on $\overline{X^2}$ (and hence $C_v$) can be observed in equation (12). This was confirmed empirically; specifically, it was shown that BW decreases with increase in $C_v$. The fact that the second moment is an important feature of memory interference should not be completely unexpected, as the behavior of similar systems, e.g., networks of queues, also depends on the variance of underlying stochastic processes (in the case of queues, the variances of the inter-arrival time and service time).

There are a number of possibilities for future research, including the following two fairly straightforward ones. The first would study the effect of relaxing assumption IV so that systems in which each processor has a preferred memory module, or in which a subset of the memory modules are used as common memory (perhaps as mailboxes), can be studied in a manner analogous to earlier FCT studies in [5], [9] and [11]. The second would attempt a synthesis of the techniques presented here with the discrete time models of multiple-bus systems developed in [15]. Analytical comparisons of circuit switched versus packet switched buses would then be possible.

Acknowledgments: The authors gratefully acknowledge comments and suggestions made by B. A. Makrucki.

7. References

[1] C. E. Skinner and J. R. Asher, "Effects of Storage Contention on System Performance," IBM Systems Journal, vol. 8, no. 4, pp. 319-333, 1969.
[2] W. D. Strecker, Analysis of the Instruction Execution Rate in Certain Computer Structures, Ph.D. Thesis, Carnegie-Mellon Univ., 1970.
[3] D. P. Bhandarkar, "Analysis of Memory Interference in Multiprocessors," IEEE Trans. on Computers, vol. C-24, no. 9, pp. 897-908, September 1975.
[4] F. Baskett and A. J. Smith, "Interference in Multiprocessor Computer Systems with Interleaved Memory," Comm. of ACM, vol. 19, no. 6, pp. 327-334, June 1976.
[5] C. H. Hoogendoorn, "A General Model for Memory Interference in Multiprocessors," IEEE Trans. on Computers, vol. C-26, no. 10, pp. 998-1005, October 1977.
[6] J. S. Emer and E. S. Davidson, "Control Store Organization for Multiple Stream Pipelined Processors," in Proc. 1978 Int'l Conf. on Parallel Processing, pp. 43-48, August 1978.
[7] B. R. Rau, "Interleaved Memory Bandwidth in a Model of a Multiprocessor Computer System," IEEE Trans. on Computers, vol. C-28, no. 9, pp. 678-681, September 1979.
[8] J. H. Patel, "Processor-Memory Interconnections for Multiprocessors," IEEE Trans. on Computers, vol. C-30, no. 10, pp. 771-780, October 1981.
[9] T. N. Mudge and B. A. Makrucki, "Probabilistic Analysis of a Crossbar Switch," in Proc. IEEE 9th Ann. Symp. on Computer Architecture, pp. 311-319, April 1982.
[10] D. W. L. Yen, J. H. Patel, and E. S. Davidson, "Memory Interference in Synchronous Multiprocessor Systems," IEEE Trans. on Computers, vol. C-31, no. 11, pp. 1116-1121, November 1982.
[11] L. N. Bhuyan and C. W. Lee, "An Interference Analysis of Interconnection Networks," in Proc. 1983 Int'l Conf. on Parallel Processing, pp. 2-9, August 1983.
[12] F. A. Briggs and E. S. Davidson, "Organization of Semiconductor Memories for Parallel-Pipelined Processors," IEEE Trans. on Computers, vol. C-26, no. 2, pp. 162-169, February 1977.
[13] M. A. Marsan and M. Gerla, "Markov Models for Multiple Bus Multiprocessor Systems," IEEE Trans. on Computers, vol. C-31, no. 3, pp. 239-248, March 1982.
[14] I. H. Onyuksel and K. B. Irani, "A Markov Queueing Network Model for Performance Evaluation of Bus-Deficient Multiprocessor Systems," in Proc. 1983 Int'l Conf. on Parallel Processing, pp. 437-439, August 1983.

[15] T. N. Mudge, J. P. Hayes, G. D. Buzzard, and D. C. Winsor, "Analysis of Multiple-Bus Interconnection Networks," Computing Research Lab Report # CRL-TR-12-84, February 1984.
[16] S. H. Fuller, "Performance Evaluation," in Introduction to Computer Architecture, H. S. Stone, Ed., S.R.A., Inc., pp. 474-546, 1975.
[17] D. P. Bhandarkar and S. H. Fuller, "Markov Chain Models for Analyzing Memory Interference in Multiprocessor Computer Systems," in Proc. 1st Ann. Symp. Computer Architecture, pp. 1-6, December 1973.
[18] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research, vol. 1, McGraw-Hill Book Co., 1982.
[19] E. Cinlar, Introduction to Stochastic Processes, Englewood Cliffs, N.J.: Prentice-Hall Inc., 1975.
[20] J. H. Patel, "Analysis of Multiprocessors with Private Cache Memories," IEEE Trans. on Computers, vol. C-31, no. 4, pp. 296-304, April 1982.