A STUDY OF INFORMATION FLOW IN MULTIPLE-COMPUTER AND MULTIPLE-CONSOLE DATA PROCESSING SYSTEMS

K. B. Irani, A. W. Naylor, J. D. Foley, et al.
The University of Michigan

This document is subject to special export controls and each transmittal to foreign governments or foreign nationals may be made only with prior approval of RADC (EMIRA), GAFB, N.Y. 13440.

FOREWORD

This final technical report was prepared by K. B. Irani, A. W. Naylor, J. D. Foley, J. W. Boyse, D. M. Coleman, D. R. Doll, V. L. Wallace, and A. M. Woolf of the University of Michigan, Systems Engineering Laboratory, Ann Arbor, Michigan 48104. This is annual report No. 3 under Contract AF 30(602)-3953, Project 5581, Task 02. The Rome Air Development Center Project Engineer was Lawrence W. Odell, EMIRA. The distribution of this report is restricted by the U.S. Mutual Security Acts.

This report has been reviewed and is approved.

Approved: LAWRENCE W. ODELL, Project Engineer
Approved: A. E. STOLL, Colonel, USAF, Chief, Intel & Info Processing Division
FOR THE COMMANDER: IRVING J. GABELMAN, Chief, Advanced Studies Group

ABSTRACT

This report documents the achievements from January 1968 to March 1969 of continuing research into the application of Queueing Theory and Markov Decision Processes to the design and investigation of multiple-computer, multiple-user systems. A summary of the theoretical investigation conducted, the major conclusions reached, and some typical applications are included.

TABLE OF CONTENTS

1. INTRODUCTION  1
   1.1 Contract Objective  1
   1.2 Contract Requirements  1
   1.3 Progress Toward Contract Objectives  3
2. NUMERICAL QUEUEING THEORY AND THE ANALYSIS OF SYSTEM MODELS  5
   2.1 Recursive Queue Analyzer  5
   2.2 The Numerical Solution of Infinite Markov Chains  6
   2.3 Relative Performance Analysis of the Selected Computers and Associated Bulk Storage Devices  23
   2.4 Analysis of Proposed On-Line System  36
   2.5 Analysis of a Candidate CRT Support System  57
3. OPTIMUM DESIGN TECHNIQUES  63
   3.1 Design of Optimum Telecommunications Networks  63
   3.2 Design of Optimum Display Systems  77
   3.3 Design of Optimum Memory Systems  94
4. MEASUREMENTS  105
   4.1 General MTS Data  105
   4.2 MTS Display Console Data  143
APPENDIX A. THE DEGREE OF MULTIPROGRAMMING IN PAGE-ON-DEMAND SYSTEMS  163

LIST OF FIGURES

Figure  Page
2-1  Illustrating a Model of a Multiple-Access System  8
2-2  State Space  13
2-3  Graphs of Boundary Leading QBD-processes  19
2-4  Storage Configurations  27
2-5  Percent of Time Storage Waits for CPU in Beta  31
2-6  Available Execution Cycles  32
2-7  Performance Relative to Beta with Disk Only Storage System: Disk-Drum  33
2-8  Performance Relative to Beta with Disk Only Storage System: 2 Mbs Dual Channel Drum  34
2-9  Performance Relative to Beta with Disk Only Storage System: 6 Mbs Dual Channel Drum  35
2-10  Proposed System Diagram  37
2-11  Model of Proposed System  38
2-12  RQA Analysis of Proposed System  46
2-13  RQA Analysis of Proposed System  47
2-14  Response Time  50
2-15  CPU Idle Fraction from RQA and GPSS  51
3-1  Point to Point Configuration  66
3-2  Multipoint Configuration  66
3-3  Single Stage Message Switching Center  68
3-4  Queueing Model for Single Stage Message Switching Center  69
3-5  Equivalent Structural Model for Multiplexor Concentrator  71

LIST OF FIGURES (Continued)

Figure  Page
3-6  Comparison of MSC and MX Concentrators  73
3-7  Comparison of MSC and MX Concentrators  74
3-8  Display System  79
3-9  Minimum Response Times, Text Editing  88
3-10  Minimum Response Times, Two-Dimensional Drawing  89
3-11  Minimum Response Times, Three-Dimensional Drawing  90
3-12  Minimum Response Times, Network Analysis  91
3-13  Best and Worst Average Response Times  93
3-14  Storage System Cost and Access Time  97
4-1  MTS Data  109
4-2  MTS CPU Intervals  111
4-3  Requested CPU Intervals  113
4-4  Page Request Delays  115
4-5  Overall I/O Wait Distribution  118
4-6  MTS Ready Intervals  119
4-7  Disk I/O Delays  121
4-8  Disk Pack Queueing Delays  123
4-9  Terminal I/O Waits  124
4-10  Basic Simulation Data  132
4-11  Basic Simulation Data  133
4-12  Advanced Simulation Data  134

LIST OF FIGURES (Continued)

Figure  Page
4-13  Advanced Simulation Data  135
4-14  Modified Simulation Data  136
4-15  Modified Simulation Data  137
4-16  Modified Simulation Data  138
4-17  Modified Simulation Data  139
4-18  Modified Simulation Data  140
4-19  Modified Simulation Data  141
4-20  Modified Simulation Data  142
4-21  Summary Data for Remote Display Terminal  152
4-22  Think Time Distribution  153
4-23  Response Time Distribution  154
4-24  Response Time Distribution  155
4-25  Distribution of Total CPU Time per Response Period  156
4-26  Distribution of Total CPU Time per Response Period  157
4-27  Distribution of CPU Processing Interval Times during Response Periods  158
4-28  Distribution of Output Line Lengths  159
4-29  Distribution of Number of CPU Intervals per Response Period  160
1  The Simplified System  166
2  Schematic of the Model  167
3  Analysis Results  173

LIST OF FIGURES (Continued)

Figure  Page
4  Analysis Results  174
5  The Effect of Multiprogramming when Np = 2N  178
6  Optimum Degree of Multiprogramming  179
7  Optimum Degree of Multiprogramming  180

LIST OF TABLES

Table  Page
2.1  States of Figure 2-1, when m=2 and m'=1  10
2.2  The Transition Intensity Matrix of the Model of Figure 2-1  11
2.3  Computer Specifications  25
2.4  Bulk Storage Device Specifications  26
2.5  Flow of Control by Job Type  42
2.6  GPSS and RQA Parameter Values  52
4.1  Data Gathered for IBM 2250 Display Console Used for Graphics  146
4.2  Data Gathered for IBM 2250 Display Console Used as a Teletype  147
4.3  Data Gathered for Random Teletype Users  148
4.4  Data Gathered for Remote Display Terminal  150
4.5  Comparison of Statistics  151

EVALUATION

This final report documents the last fourteen months of progress on a three-year effort. Earlier progress was documented in RADC TR-67-345, dated July 1967, and RADC TR-68-57, dated March 1968. The work was performed under contract AF30(602)-3953 between the Rome Air Development Center, Griffiss AFB NY, and The University of Michigan, Systems Engineering Laboratory, Ann Arbor, Michigan. The contract formed a portion of overall efforts to develop advanced data processing techniques and concepts under Air Force Project Number 5581.

This effort was undertaken to explore and develop better techniques for analyzing the performance, control, and structuring of complex data processing systems. Emphasis has been placed on developing and testing improved theories and modeling tools permitting rapid and accurate analysis of system designs. These developments have then been applied to various system design aspects in order to verify the accuracy of the models and to predict system performance. Of particular significance in this area has been the refinement and extension of the earlier Recursive Queue Analyzer (RQA) program, a computational tool based on Queueing and Markovian Modeling theories. This program has recently been shown to provide acceptable analytical solutions at rates significantly faster than standard "simulation" techniques. Additional work has been completed toward extending the program's capability for processing extremely complex models which are beyond the range of present analysis techniques. (See also RADC TR 69-134.)

Major progress has also been made toward the formulation of basic design guidelines and principles which can be used by designers of large-scale, multiple-user computer systems. These principles are being developed through the analysis of models of various system aspects in order to determine optimum or near-optimum configuration and performance algorithms. Initial results in this area have been published in RADC TR-69-132.
This work shows significant promise for improving the efficiency and effectiveness of future large-scale system design efforts, and takes steps toward moving system design from the realm of "art" to a refined "science." Of equal value is the promise of more cost-effective initial system implementations by applying these analytical techniques to proposed designs prior to actual development or production commitments. By these means, many alternate approaches and design trade-offs can be evaluated far more rapidly and cheaply than with present "build-it and try-it" or "full-scale simulation" techniques. Additional effort to continue research in these areas is being supported by RADC under Contract F30602-69-C-0214.

LAWRENCE W. ODELL
Project Engineer
Reconnaissance Engineering Section (EMIRA)
Rome Air Development Center, Griffiss AFB NY

1. INTRODUCTION

1.1 CONTRACT OBJECTIVE

The objective of this contract is to develop analytical techniques to assist system planners, designers, and evaluators in their determinations of the gross structure, control rules, and performance characteristics of multiple-computer, multiple-user systems, and to apply these techniques to large computer systems shared on a real-time basis by a large number of users through consoles.

1.2 CONTRACT REQUIREMENTS

a. Continued exploration of Queueing Theory and Markov Decision Theory to improve their capability in determining scheduling and performance parameters for general on-line computer system problems.

b. Development of improved computer programs for carrying out analysis and optimization of system models, based on procedures resulting from the study of Markov Decision Theory and Queueing Theory.

c. Utilization of the computer programs and theoretical investigations to develop new conclusions and rules which can be used by persons performing initial design of real-time computer systems having a large number of user consoles. Such rules will allow system designers to more rapidly choose the type of hardware/software system needed to fit a particular organization or problem.

d. Utilization of the computer programs to complete a queueing theory model of actual or proposed systems which will be used to project system performance and operating characteristics. The Markov Decision Technique will be applied to determine optimum scheduling and priority structures for such systems, and to determine any anticipated system operating "bottlenecks."

e. Final analysis of The University of Michigan Computing Center data, collected under contract AF 30(602)-2661, and use of the final conclusions of that analysis to analyze data from the other systems (when it is available). These data and conclusions will be used to provide parameters and validity checks for other system models.

f. Statistical data will be gathered, where possible, from a number of running systems to gain a better understanding of user demand and processing structures and to confirm theoretical conclusions. The data from item "d" above will also be used for this purpose.

g. A search for other theoretical approaches to the analysis, control, and design of multiple-computer systems will be pursued.

1.3 PROGRESS TOWARD CONTRACT OBJECTIVES

Progress has continued in both the theoretical and empirical areas. Of particular interest is the completion of three long-range projects concerning systems analysis and optimization. The first of these projects has successfully laid the theoretical basis for extending the concepts of numerical queueing analysis to infinite-state Markov chains. This work is reported briefly here in Section 2.2, and in detail in a separate report [3]. The second project has resulted in a telecommunications network design procedure which is less restrictive and yields more nearly optimum results than previous methods. This work is described in Section 3.1 and in a report [1]. The third project concerns the optimum design of computer-driven display systems. This is described briefly in Section 3.2 and in detail in a report [2].

Other shorter-term projects have been completed: Chapter 2 reports on conversion of our numerical queueing analysis program to a new computer, and on the analysis of several proposed information storage and retrieval systems. Chapter 4 reports on data collected from The University of Michigan's time-sharing system. The data is analyzed, and some useful conclusions are drawn.

In addition to the completed long-range projects, there is one which is not yet completed. This concerns the optimal design of large memory systems, and is discussed in Section 3.3.

Finally, Appendix A presents a soon-to-be-published paper which discusses multiprogramming in page-on-demand systems.

REFERENCES, CHAPTER 1

1. Doll, Dixon R., "The Efficient Utilization of Resources in Centralized Computer-Communication Network Design," SEL Technical Report No. 36, Systems Engineering Laboratory, The University of Michigan, April 1969.

2. Foley, James D., "Optimum Design of Computer Driven Display Systems," SEL Technical Report No. 34, Systems Engineering Laboratory, The University of Michigan, March 1969.

3. Wallace, Victor L., "The Solution of Quasi Birth and Death Processes Arising from Multiple Access Computer Systems," SEL Technical Report No. 35, Systems Engineering Laboratory, The University of Michigan, March 1969.

2. NUMERICAL QUEUEING THEORY AND THE ANALYSIS OF SYSTEM MODELS

Our past work has led to the development of numerical techniques for the analysis of complex queueing systems. The programs implementing these techniques have been converted to the University's new IBM 360/67 computer, as described in Section 2.1. Unfortunately, these techniques are not adequate to study certain types of large computer systems. Section 2.2 reports on the extension of numerical analysis techniques to more (but not all) large computer systems. The remaining three sections report on the analysis of several proposed systems and system models.

2.1 RECURSIVE QUEUE ANALYZER

The Recursive Queue Analyzer, or RQA [4], has been reprogrammed to run on the University's IBM 360/67 computer. Because of the greater computational capabilities of this new computer (as contrasted with the old IBM 7090), the new RQA can do more in less time than the old version. The most important consideration here is that the 360 version of RQA can handle system models having up to 32,000 states, as opposed to 5,000 states in the old version. Interestingly enough, the limit of 32,000 states can easily be increased even further.

2.2 THE NUMERICAL SOLUTION OF INFINITE MARKOV CHAINS

2.2.1 Introduction

Markovian models have found considerable application in the art of computer system design for two significant reasons. First, they have been applied because there is a sizable theory upon which to draw, and many powerful theoretical results are available. Second, computer systems have come to a stage of evolution where they are predominantly "high-traffic" systems and, as a consequence, the modeling and estimation of statistical performance has become a crucial aspect of their design.

However, there are limitations to the models of this type which are capable of analysis. The models which must be solved are often too complex for classical queueing theory, and often have too many states for modern numerical techniques [4]. The precision required over a wide range of parameter variations often rules out simulation as too costly. The study of multiple-access computer systems and the so-called "computer utilities" is particularly difficult in this regard. Furthermore, as these systems grow larger and more complex (through servicing more user circuits, the introduction of multiprocessing, the use of satellite computers and data concentrators, etc.) this situation can be expected to become worse. Complexity and largeness of state spaces will be the rule. Yet these are the very systems for which analysis of Markov chain models can be of greatest use to the system architect. Thus, new techniques are urgently needed.

A typical example of such a model will serve to make this need more concrete. Consider a multiple-access computer system with two thousand common-carrier trunk lines leading to a bank of fast processors operating on a large, multiprogrammed, paged main memory. Suppose that the memory is demand paged with a large bulk-core "virtual" memory. Suppose, finally, that we wish to study the relationship between the expected number of queued unserviced requests for computation and, say, the size of main core memory.

A queueing model which could be used for this study is illustrated diagrammatically in Figure 2-1, where circles represent queues and squares represent service-holding. This model anticipates that congestion in the channel to the bulk core will be a major problem (see Appendix A), and so it details the paging operations in a manner similar to the model of Appendix A. However, arriving tasks in this case are queued in a "wait queue" which can, potentially, assume very large values. (Typically, for a system of this size it could contain an average of several hundred requests.) Because of this, the state space is very large and has been modeled as infinite, a consequence of assuming that the wait queue can assume any nonnegative value.

A detailed description of this model is not vital to this discussion. The important thing is the form of the state space and the transition intensity matrix from which the equilibrium probabilities and expectations are to be calculated.

[Figure 2-1. Illustrating a Model of a Multiple-Access System — diagram not recoverable from the scanned copy.]

To make things as simple as possible for this illustration, we let m = 2 and m' = 1.

Table 2.1 States of Figure 2-1, when m = 2 and m' = 1

      n0 n1 n2 n3         n0 n1 n2 n3         n0 n1 n2 n3
 1.    0  0  0  2    10.   1  0  0  2    16.   2  0  0  2
 2.    0  0  1  1    11.   1  0  1  1    17.   2  0  1  1
 3.    0  0  2  0    12.   1  0  2  0    18.   2  0  2  0
 4.    0  1  0  0    13.   1  1  0  0    19.   2  1  0  0
 5.    0  1  0  1    14.   1  1  0  1    20.   2  1  0  1
 6.    0  1  1  0    15.   1  1  1  0    21.   2  1  1  0
 7.    0  0  0  0                        22.   etc.
 8.    0  0  0  1

Table 2.2 The Transition Intensity Matrix of the Model of Figure 2-1

[The matrix itself is not recoverable from the scanned copy.]

2.2.2 The QBD Process

Let m0 and m be positive integers, with

    m0 ≥ m.                                                    (2.1)

Let N = {N(t), t ≥ 0} be a continuous-parameter Markov chain with stationary transition probability functions continuous at t = 0, and with state space Ω, where

    Ω = {<0,1>, <0,2>, ..., <0,m0>, <1,1>, <1,2>, ..., <1,m>, <2,1>, ..., <2,m>, ...}.    (2.2)

The process N is called a QBD process if its transition intensities {u_jk; j, k ∈ Ω} satisfy the properties:

(a) If j = <j1, j2> and k = <k1, k2> are two states for which |j1 − k1| ≥ 2, then u_jk = 0.

(b) If j = <j1, j2> and k = <k1, k2> are two states for which |j1 − k1| ≤ 1, if (j1 + k1) ≥ 3, and if j' = <j1−1, j2> and k' = <k1−1, k2>, then u_jk = u_j'k'.

A continuous-parameter Markov chain will also be called a QBD process if a mere one-one mapping of its states results in a QBD process. However, no loss in generality will result from consideration of only state spaces of the form (2.2), represented by the set Ω. The state space Ω is more graphically described as a two-dimensional integer space whose first dimension is countably infinite, whose second dimension is finite, and which has the configuration of Figure 2-2.
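As a concrete check of this definition, the sketch below (not from the report; the rates, state encoding, and function name are assumed for illustration) encodes the transition intensities of an M/M/1 queue — a QBD process with m0 = m = 1 — and verifies properties (a) and (b) on a finite sample of states:

```python
# Illustrative rates (assumed): arrivals at rate lam, services at rate mu.
lam, mu = 2.0, 5.0

def u(j, k):
    """Transition intensity between distinct states j = (j1, j2), k = (k1, k2)."""
    (j1, j2), (k1, k2) = j, k
    if k1 == j1 + 1 and k2 == j2:
        return lam                 # arrival: up one level
    if k1 == j1 - 1 and k2 == j2:
        return mu                  # service completion: down one level
    return 0.0

states = [(n, 1) for n in range(8)]
for (j1, j2) in states:
    for (k1, k2) in states:
        if abs(j1 - k1) >= 2:                      # property (a): no long jumps
            assert u((j1, j2), (k1, k2)) == 0.0
        if abs(j1 - k1) <= 1 and j1 + k1 >= 3:     # property (b): repetitiveness
            assert u((j1, j2), (k1, k2)) == u((j1 - 1, j2), (k1 - 1, k2))
print("properties (a) and (b) hold on the sampled states")
```

With larger m, each level would carry a "phase" component j2 ∈ {1, ..., m}; the mechanical check itself is unchanged.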

States in the set Ω0 = {<0,1>, <0,2>, ..., <0,m0>} will be called boundary states, and the set Ω0 will be called the boundary of N.

[Figure 2-2. State Space — diagram not recoverable from the scanned copy.]

If the lexicographic ordering of the states used in equation (2.2) is preserved, then the conditions (a) and (b) are equivalent to a specification that the transition intensity matrix U = {u_jk; j, k ∈ Ω} is an infinite matrix which can nevertheless be displayed in the partitioned form

        | E  Â           |
        | Ĉ  B  A        |
    U = |    C  B  A     |                                     (2.3)
        |       C  B  A  |
        |          ...   |

where the submatrix E is an m0 × m0 matrix; where A, B, and C are m × m matrices; and where Â and Ĉ are m0 × m and m × m0 matrices, respectively, and are further partitioned into the forms

    Â = | A |                                                  (2.4)
        | · |

and

    Ĉ = | C  · |,                                              (2.5)

whose upper m × m block and left m × m block, respectively, coincide with the A and C of (2.3). Many QBD processes whose transition intensity matrices do not have the appearance of Equation (2.3) can nevertheless be identified as QBD processes by simply regrouping the states and reidentifying them.

When m0 = m = 1, the QBD process is readily recognized to be a simple birth and death process. The "QBD" in the name "QBD process" is an acronym for the somewhat barbaric, though apt, descriptor "quasi birth and death," in recognition of this fact. (The word "quasi" is frequently used in matrix theory in describing matrix forms created by replacing scalar elements by submatrices, as in "quasi-diagonal" and "quasi-triangular" matrices.)

In order to calculate an equilibrium distribution of a QBD process N, reasoning reminiscent of that commonly followed in solving birth and death processes can be followed. This reasoning, and the calculation technique that results, is essentially the same in outline as that proposed by Evans [1].

An equilibrium distribution is a sequence π = {π_k, k ∈ Ω} of nonnegative real numbers which satisfies the equation πU = 0 and whose components have unit sum. If we let π = [φ0, φ1, φ2, ...] be a partitioning of that sequence for which φ0 is an m0-component row vector, and φ1, φ2, ... are m-component row vectors, then the equation πU = 0 is equivalent to the equations

    φ0 E + φ1 Ĉ = 0                                            (2.6a)
    φ0 Â + φ1 B + φ2 C = 0                                     (2.6b)
    φ(n−1) A + φn B + φ(n+1) C = 0,    n = 2, 3, ....          (2.6c)

Now suppose that there is a solution [φ0, φ1, φ2, ...] to Equations (2.6) of the form

    φ1 = φ0 R̂                                                 (2.7a)
    φn = φ1 R^(n−1),    n = 2, 3, ...,                         (2.7b)

where R is an m × m matrix, where R̂ is an m0 × m matrix partitioned to the form

    R̂ = | R |                                                 (2.8)
         | · |

and where φ0 is some row vector whose components are not all zero. Then it develops that, by substitution of Equations (2.7) in Equations (2.6), the matrix R and the vector φ0 must satisfy the equations

    φ0 (E + R̂Ĉ) = 0                                           (2.9a)
    φ0 (Â + R̂B + R̂RC) = 0                                    (2.9b)
    φ0 R̂ R^(n−2) (A + RB + R²C) = 0,    n = 2, 3, ....        (2.9c)

Furthermore, since a solution of Equations (2.6) in which φ0, φ1, φ2, ... have only nonnegative components is sought, φ0 and R must guarantee that nonnegativity as well. If we can show that such φ0 and R exist, and yield a solution π which is an equilibrium distribution, then the assumption of Equations (2.7) will be vindicated.

Equations (2.9) suggest a sufficient condition for such a solution to be found. This condition is that a matrix R exists having the properties:

(a) All entries of R are nonnegative.

(b) E + R̂Ĉ is singular, and has a nonnegative, nonzero vector φ0 in its null space (so that Equation (2.9a) is satisfied).

(c) All entries of A + RB + R²C are zero.

(The latter property is reminiscent of the classical characteristic roots of a difference equation, while the first two are reminiscent of the application of boundary conditions to difference-equation solutions.)

It is our purpose to show sufficient conditions that a matrix R does exist having the properties (a), (b), and (c). Remarkably, those conditions are very general, and include almost all interesting QBD processes. Under those conditions, an equilibrium distribution can be calculated by the four steps

(1) Determine a matrix R having properties (a), (b), and (c).

(2) Determine a vector φ0.

(3) Normalize φ0, if possible, so that π will have unit sum.

(4) Determine as many of the φn as desired, using Equations (2.7a) and (2.7b).

It will be shown also that such a matrix R can be determined by an iterative process in every interesting case, and that φ0 can also be calculated by known techniques. Thus, the problem of determining an equilibrium distribution of a QBD process (which has an infinite state space) will be shown to reduce to several routine finite problems, each of which does not tax the memory or computing capabilities of a digital computer of moderate-to-large size. In addition, results are given providing (1) necessary and sufficient conditions on R for an irreducible QBD process to be positive recurrent, and (2) sufficient conditions for R and π to be unique, with π equal to the limiting distribution.

2.2.3 Boundary Leading QBD Processes

The sufficient conditions for the existence of the quadratic root R are satisfied by a very general type of QBD process, the "boundary leading" process.

Definition. A QBD process N is boundary leading if every nonboundary state leads to a boundary state.

This condition is a very weak one, and encompasses all irreducible QBD processes as well as a considerable variety of reducible ones. Transition graphs of a number of simple boundary leading QBD processes are shown in Figure 2-3, demonstrating this variety. These graphs represent every state by a vertex, and every nonzero, non-diagonal transition intensity u_jk (j ≠ k) by a directed branch from state j to state k. The states are arranged as in Figure 2-2. Two-way paths in Figure 2-3 are shown by heavy lines, and boundary states by square vertices. These graphs are useful in visualizing the communication properties of particular QBD processes, since a state j "leads to" a state k if and only if there is a directed path in the graph which can be followed from j to k. Note that only the graphs (e) and (f) describe an irreducible process. The rest are reducible.

It should be particularly noted that the definition of boundary leading processes makes no mention of paths between boundary states. Thus, branches between square vertices in Figure 2-3 can be removed or added at will without altering the boundary leading character of the examples. This exercise offers still other interesting examples. The most interesting aspect of all the examples, however, is the way that the horizontal repetitiveness, characteristic of QBD processes, forces certain communication properties.

The assumption that N is boundary leading is so general that it automatically includes every QBD process which has a finite number of

[Figure 2-3. Graphs of Boundary Leading QBD-processes — diagrams not recoverable from the scanned copy.]

communicating classes. The converse is not true, however, since Figure 2-3(d) describes a boundary leading process which (degenerately) has an infinite number of communicating classes. The boundary leading QBD process has been singled out for study here because it represents one of the most general processes for which a quadratic root matrix R exists having the required properties.

2.2.4 Conclusions

It has been shown that, under common and predictable circumstances, QBD processes lend themselves to expeditious numerical solution, and are therefore a useful class of stochastic models for computer system analysis (or other queueing applications). The equilibrium distributions and other related measures are determined by a procedure which involves operations with only finite matrices, and which is analogous to procedures used in the solution of the difference equation of simple birth and death processes. This conclusion has been demonstrated by a comprehensive algebraic theory whose two main theoretical conclusions can be summarized as:

Conclusion 1

If a QBD process is boundary leading, then the matrix quadratic A + XB + X²C is, indeed, analogous to the characteristic equation for the birth and death process, having a root matrix R which generates the characteristic solution

    φn = φ1 R^(n−1),    n = 2, 3, ....

The m-vector φ1 satisfies the boundary conditions

    φ0 (E + R̂Ĉ) = 0,    φ1 = φ0 R̂,

and the equilibrium solution is given by

    π = [φ0, φ1, φ2, ...].

The root matrix is in a known set. If π is convergent, the equilibrium distribution is merely a scalar multiple of π.

Conclusion 2

If the QBD process is irreducible and positive recurrent, then the root R can be found by a well-defined iteration process

    S(n+1) = T(S(n)),    R = −A(B + S)^(−1),

where S is the limit of the S(n), and the resulting solution π will be convergent. Conversely, if the process is irreducible and a root R is found for which the resulting π is convergent, then the process is positive recurrent.

Much more has been said, of course, but these results are the most significant and general. The theory presented is broad enough so that there are numerous specific questions about QBD processes which can be

answered by intelligent use of the theorems, corollaries, and lemmas explicitly supplied. Thus, this theory provides a good base on which to build even greater usefulness.
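Conclusion 2 can be exercised numerically. The sketch below is illustrative only: the rates, the choice m0 = m = 2, and the concrete operator T(S) = −A(B + S)⁻¹C are assumptions, not taken verbatim from the report. That operator is consistent with property (c): its fixed point S = RC gives R = −A(B + S)⁻¹ and hence A + RB + R²C = 0.

```python
import numpy as np

# Assumed example: arrivals at rate 1; service at rate 3 or 4 depending
# on a two-state phase that flips at rate 0.5; here m0 = m = 2.
A = np.diag([1.0, 1.0])                      # up-transitions (arrivals)
C = np.diag([3.0, 4.0])                      # down-transitions (services)
B = np.array([[-4.5, 0.5],
              [0.5, -5.5]])                  # within-level; rows of A+B+C sum to 0
E = B + C                                    # boundary block: no departures below level 0

# Conclusion 2: iterate S <- T(S) = -A(B + S)^(-1) C, then R = -A(B + S)^(-1).
S = np.zeros((2, 2))
for _ in range(200):
    S = -A @ np.linalg.inv(B + S) @ C
R = -A @ np.linalg.inv(B + S)

# Property (c): all entries of A + RB + R^2 C vanish at the fixed point.
assert np.abs(A + R @ B + R @ R @ C).max() < 1e-10

# Boundary condition: phi0 is a nonnegative left null vector of E + RC
# (with m0 = m, the hatted matrices reduce to R and C themselves).
w, v = np.linalg.eig((E + R @ C).T)
phi0 = np.abs(np.real(v[:, np.argmin(np.abs(w))]))

# Normalize so that the total mass sum_n phi0 R^n is one.
phi0 /= (phi0 @ np.linalg.inv(np.eye(2) - R)).sum()
print(np.round(phi0, 4), np.round(phi0 @ R, 4))   # phi0 and phi1
```

Only 2 × 2 matrices are ever formed, even though the chain itself has infinitely many states — which is the point of the theory.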

2.3 RELATIVE PERFORMANCE ANALYSIS OF SELECTED COMPUTERS AND ASSOCIATED BULK STORAGE DEVICES

A very important use of system analysis techniques is the selection of hardware to be included in new or modified computer systems, and the determination of how that hardware is best used. Thus this and the next two sections discuss specific systems which have been analyzed. In this section, we discuss the computer and bulk storage needed in an information retrieval system. In a system of this type, two things are of concern: (1) data and programs must be moved from peripheral bulk storage devices into and out of the core memory; and (2) while in the core memory, data must be manipulated by the central processing unit (CPU) by executing programs. We shall use speed of response of the system as the performance criterion.

The system performance is related to both the speed of the CPU and the speed at which data can be moved into and out of the core memory. If the CPU is in use 100% of the time and its speed is doubled, then, if the CPU can still be kept busy 100% of the time, the system speed of response doubles. On the other hand, suppose the CPU is only in use 50% of the time because there are not enough programs and data available in core memory to keep it busy. In this case doubling the CPU speed will have little effect on system response. It will merely mean that the CPU will now be idle 75% rather than 50% of the time.
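The idle-time arithmetic above can be stated as a one-line model. This is a deliberate simplification assuming a fixed amount of demanded work and no queueing effects:

```python
# If demand is fixed, a CPU busy a fraction u of the time is busy u/k of
# the time after a k-fold speedup; only the idle fraction grows.
def idle_after_speedup(busy_fraction, speedup):
    return 1.0 - busy_fraction / speedup

print(idle_after_speedup(0.50, 2))   # 0.75: idle grows from 50% to 75%
```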

Thus two kinds of imbalance may occur in the system. If there is excess CPU capacity, increasing this capacity further will have little (if any) effect on system performance. If data can be transferred from bulk storage to core memory faster than the CPU can process it, then increasing this data transfer rate will have little effect on system performance. Of course, the relative CPU and bulk storage transfer rates required to maintain a balance vary depending on the type of problem the computer system is solving. In many scientific calculations, for example, there is much manipulation of a small quantity of data. Here the speed of the CPU should be high relative to the data transfer rate. In the system we are concerned with, however, the reverse is true. Large quantities of tactical intelligence data must be scanned for parameters of interest. This involves transferring a lot of data into the core memory but involves little computation on each record read in. Thus for a balanced system the relative CPU to data transfer rate can be much lower in this case.

2.3.1 System Specifications

The computer specifications we considered in this analysis are shown in Table 2-3. Note that the Execution Speed is the ratio of word size to core cycle time. The specifications of the bulk storage devices are shown in Table 2-4. Two storage systems are considered. One consists of 200 million bits of drum storage and is called the dual channel drum. From Figure 2-4 it can be seen that transfers can take place from the disk and

Table 2-3 Computer Specifications

                          Beta*                     Epsilon
Word Size                 18 bits                   30 bits
Core Cycle Time           2 µsec                    1.8 µsec
Execution Speed           9 bits/µsec               16.7 bits/µsec
Max Data Transfer Rate    7.2 mega-bits/sec         4.8 mega-bits/sec
                          (400,000 words/sec)       (160,000 words/sec)
                          also (100,000 words/sec)  also (55,500 words/sec)

* "Beta" and "Epsilon" are designations chosen for ease of reference and to avoid use of any commercial designations.

Table 2-4 Bulk Storage Device Specifications

                    Disk                     Drum
Avg. Access Time    184 msec                 9 msec
Transfer Rate       .730 mega-bits/sec       2 - 6 mega-bits/sec

[Figure 2-4 Storage Configurations. (a) Disk-Drum Storage: the disk and the drum each feed the computer through a buffer. (b) Dual Channel Drum Storage: each half of the drum feeds the computer through a buffer.]
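Using the figures of Table 2-4, the dominance of access time over transfer time is easy to verify. A small sketch (units and helper name are ours):

```python
def transfer_ms(words, bits_per_word, rate_megabits_per_sec):
    """Time in msec to read one record once the drum has rotated
    into position."""
    return words * bits_per_word / (rate_megabits_per_sec * 1000.0)

# A 512-word, 18-bit record on the 6 mega-bits/sec dual channel drum:
# about 1.5 msec of transfer time against 9 msec of average access time.
transfer = transfer_ms(512, 18, 6.0)
assert round(transfer, 1) == 1.5
assert transfer < 9.0   # access time dominates
```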

drum storage simultaneously in the disk-drum storage system, and from both halves of the dual channel drum storage system simultaneously. It should be noted that the access time is much more significant in determining bulk storage speed than the transfer rate. Average access time is the average time required for the disk or drum to rotate to a point where the desired data are located. Once this point is reached, the transfer rate is the rate at which data are read from the disk or drum. Thus for the dual channel 6 mega-bits/sec drum, 9 msec of average access time is required, but only 1.5 msec is required to read a 512-word, 18-bit record.

2.3.2 Results

We have mentioned the concept of balance between the CPU rate and the rate of data transfer. For the proposed use of the computer systems under consideration we estimate conservatively that only rarely will more than five execution cycles be required per data word transferred. This estimate is made because the system is basically a data storage and retrieval system (file system) in which little manipulation of data by the CPU will be required outside of the executive control program. For example, a target record may be searched on a few key words, but much of the record is text. Well under one execution cycle/word transferred would probably be needed here. Using the estimate in the preceding paragraph, the disk-drum storage system may be rejected immediately for both computers as being highly

inefficient. This system is so slow (average access time 184 msec) that approximately 90 execution cycles/word transferred would be available in Beta and about 160 cycles/word transferred in Epsilon. (See triangles in Figure 2-6.) This means that the Beta processor would be idle 95% of the time and the Epsilon processor would be idle 97% of the time. At the other end of the scale, consider the six mega-bit/second dual channel drum with these two computers. At its maximum rate (average access time 9 msec), approximately 6.5 execution cycles/word transferred are available in Beta and about 13.5 execution cycles/word transferred are available in Epsilon. Thus the performance of these two computers should be roughly equivalent even when the fastest available bulk storage system is attached to them. Both these systems are much faster and make much more effective use of the CPU capabilities than did the systems using disk-drum storage. In fact there will be roughly an order of magnitude improvement in the performance of either computer with the dual channel drum over that same computer with the disk-drum. With the fast dual channel drum storage (6 mega-bits/sec) the computer system will be fairly well balanced. Only in this case will the execution speed have any appreciable effect on performance, because now the storage system may occasionally have to wait for the CPU. Figure 2-5 plots the performances of the two computers as a function of the percentage of time the storage system is held up by the CPU in Beta. For most programs the percent of time storage must wait for the CPU will probably be 1% or below. In this range the

performance of the two computers is essentially the same. Note that for a given storage configuration the relative performance of Epsilon to Beta will never rise above about 1.85, since this is the ratio of their execution speeds. We also studied the relative performance of the systems as the fraction of time varied for which the storage is idle waiting for the CPU to generate an I/O request. This is plotted in Figures 2-7 through 2-9. It is the ratios between these curves that are important and not the absolute values. The system should usually be operating at or below the one percent level because the system is seldom compute bound. The curves are useful in comparing the relative merits of the various storage systems for different (small) degrees of compute-boundedness.

[Figure 2-5: plot of Epsilon performance versus the fraction of time the storage system waits for the CPU in Beta; graphic not recoverable from the text.]

[Figure 2-6. Available Execution Cycles. Available execution cycles versus core and bulk storage size; curves for Beta and Epsilon; graphic not recoverable from the text.]

[Figure 2-7 Performance Relative to Beta With Disk Only. Storage System: Disk-Drum. Abscissa: fraction of time storage waits for CPU (0 to .10); curves: Beta block transfer, Beta dual channel, Beta single channel, Epsilon.]

[Figure 2-8 Performance Relative to Beta With Disk Only. Storage System: 2 Mbs Dual Channel Drum. Abscissa: fraction of time storage waits for CPU (0 to .10); curves: Beta block transfer, Beta single channel, Epsilon.]

[Figure 2-9 Performance Relative to Beta With Disk Only. Storage System: 6 Mbs Dual Channel Drum. Abscissa: fraction of time storage waits for CPU (0 to .10); curves: Beta block transfer, Beta single channel, Epsilon.]

2.4 ANALYSIS OF PROPOSED ON-LINE SYSTEM

A proposed on-line storage and retrieval system has been modeled to study response time and CPU utilization. Analysis of the model has been completed using both numerical queueing analysis and simulation. Figure 2-10 shows the physical configuration of the system being modeled as it is presently envisioned. Figure 2-11 is the general queueing model which was developed for the purpose of analysis by both the Recursive Queue Analyzer (RQA) [4] and the General Purpose Simulation System (GPSS) [2]. The specific descriptions of the respective models, with the pertinent assumptions made for each, are presented in Sections 2.4.2 and 2.4.3.

2.4.1 Description of the Physical System Being Modeled

The console-computer system being studied is assumed to have 14 active consoles at which operators generate job requests requiring processing by the main computer. These requests originate at the consoles and are for terminal support functions of the following types:

a) Copy from one terminal to another
b) Column tab
c) Scrolling
d) Page turning

Requests generate jobs which are entered in a queue for execution at the CPU. In both models it has been assumed that the computer (CPU) is

[Figure 2-10: physical configuration of the proposed console-computer system; graphic not recoverable from the text.]

[Figure 2-11: general queueing model of the proposed system; graphic not recoverable from the text.]

dedicated to the four tasks associated with terminal support. Jobs may also require access to a drum and/or a parallel arm disc unit for input-output activity before being completed. If there is a request for such a data transfer from the disc, jobs queue at the disc data channel awaiting service by the disc, which processes the incoming requests on a FCFS (first-come, first-served) basis. There is a second, independent data channel for the drum, and its operation is considered to be independent of the disc. Again, requests for this data channel are allowed to queue, and the FCFS processing discipline is used. Basic performance features of a typical drum unit have been used in both models.

2.4.2 Queueing Analysis

In the queueing model it is necessary that the various service time distributions be derived from the negative exponential distribution. For simplicity, the negative exponential distribution itself has been used. The same negative exponential random variable is used for the CPU service time distribution for each of the four job types. In the actual system a deterministic service time (in general different for each type of job) would be the most realistic manner of describing the CPU service mechanism. It is also assumed that upon completion of the CPU execution phase, jobs are either completed with probability q1 or request some type of input-output activity with probability q2 + q3. In the actual computer system, certain job types request a deterministic number of drum and disc accesses

and in a specific sequence; other job types are completed without any input or output. Hence if the type of job departing the CPU is known, it can be determined with certainty whether a drum access, a disc access, or neither will follow. It has also been assumed that in the queueing model the console interaction time (the time elapsing from the completion of a job at the CPU to the time when the next request emanates from that console) is an exponentially distributed random variable with mean 24 sec. A further assumption is that the service times for the drum and disc are exponentially distributed with means of 12 and 120 millisec, respectively. In the physical system the console interaction time would be more nearly uniformly distributed; certainly the service time distributions for the rotational storage devices would be more nearly uniformly distributed between the minimum and maximum positioning times than exponential. In summary, exponential service times have been assumed for all service mechanisms in the queueing model; in reality the service distributions range from deterministic at the CPU to uniformly distributed for the drum and disc service times. The actual process which is being treated as Markovian is thus significantly devoid of the necessary exponential characteristics.

2.4.3 The Simulation Model

The simulation model has been formulated to simulate the behavior of the actual computer system in every element of detail. Each of fourteen

facilities corresponding to the consoles initially submits one job (chosen from a mix of job types) to the computer system following system startup, after a think time interval determined by sampling from a uniformly distributed random variable. Associated with each job submitted is a parameter which indicates which type of job (Page Turn - Type 1, Scroll - Type 2, Tab - Type 3, Copy - Type 4) has been submitted. Job types 1 and 2 always require 50 ms, job type 3 always requires 75 ms, and job type 4 always requires 100 ms of CPU service time. Different types of jobs "flow" through the system facilities in differing sequences, since each type of job requires a unique sequence of services from the various system facilities. Table 2-5 indicates the sequential flow through the system for jobs as a function of type. Upon the termination of one job at the CPU, a new job is chosen from the mix of job types. A free console facility is seized by the job, and after a think time interval of 24 sec has elapsed the new job enters the CPU queue for service. For example, a type 2 job generated at a console will in sequence request service by the CPU, drum, CPU, disc, and CPU facilities before being terminated. A type 4 job does not request any input-output activity and is completed following a single CPU service increment. The job mix is specified in the simulation model on a probabilistic basis through use of a discrete-valued random variable for the job mix distribution. Table 2-6 indicates the job mixes that were used in various

Table 2-5 Flow of Control by Job Type

Type 1        Type 2        Type 3        Type 4
CPU           CPU           CPU           CPU
DRUM          DRUM          DRUM          TERMINATE
CPU           CPU           CPU
DISC          DISC          TERMINATE
CPU           CPU
DRUM          TERMINATE
CPU
DISC
CPU
DRUM
CPU
DISC
CPU
TERMINATE

runs. When a particular type of job does enter the queue for service by the drum, the service time is uniformly distributed between 4 and 20 ms; for the disc the service time is uniformly distributed between 10 and 230 ms. Response time is defined to be the elapsed time between job submission at the console and the job termination following its final CPU processing operation. In determining the length of the simulation, runs of 1000, 5000, 10,000, and 25,000 jobs were made. The difference in response times for the 10,000 and 25,000 job runs was less than two percent for each of two runs made with different job mixes. Since the average IBM 360/67 execution time for one 10,000-job GPSS run was approximately five minutes, the significantly increased expenditure of 360/67 processing time on the longer 25,000-job run did not produce a proportionate improvement in the results. Hence 10,000 jobs were used for all the simulation runs made. A critically important question arises as to how one determines the set of parameters for the queueing analysis which correspond to those for a particular simulation. Since the queueing analysis parameters T1, T2, T3, T4, q1, q2, and q3 are all expected values, averaged over all four job types, it is obvious that different sets of parameter values in the simulation model might correspond to one particular set of parameter values in the queueing model. In the comparisons which have been made, only the job mix distribution function in the simulation model was changed in order to vary q1.
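The logic of the GPSS model described above can be approximated with a small discrete-event simulation in standard Python. This is our sketch, not the original GPSS code: it treats the parallel-arm disc as a single FCFS server, and all names are ours.

```python
import heapq
import random
from collections import deque

# Flow of control by job type (Table 2-5); the final TERMINATE is implicit.
FLOW = {
    1: ["CPU", "DRUM", "CPU", "DISC", "CPU", "DRUM", "CPU",
        "DISC", "CPU", "DRUM", "CPU", "DISC", "CPU"],
    2: ["CPU", "DRUM", "CPU", "DISC", "CPU"],
    3: ["CPU", "DRUM", "CPU"],
    4: ["CPU"],
}
CPU_MS = {1: 50.0, 2: 50.0, 3: 75.0, 4: 100.0}  # deterministic CPU times
THINK_MS = 24_000.0                              # 24 sec console think time

def service_ms(station, jtype):
    if station == "CPU":
        return CPU_MS[jtype]
    if station == "DRUM":
        return random.uniform(4.0, 20.0)
    return random.uniform(10.0, 230.0)           # DISC

def simulate(mix, n_jobs=2000, n_consoles=14, seed=1):
    """Mean response time (ms) of the closed console-CPU-drum-disc
    network; `mix` gives the relative weights of job types 1-4."""
    random.seed(seed)
    queues = {s: deque() for s in ("CPU", "DRUM", "DISC")}
    busy = {s: False for s in queues}
    events, seq = [], 0                          # (time, seq, kind, payload)

    def push(t, kind, payload):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, payload))
        seq += 1

    def begin_service(station, t):
        job = queues[station].popleft()          # FCFS
        busy[station] = True
        push(t + service_ms(station, job["type"]), "depart", (station, job))

    for _ in range(n_consoles):                  # staggered startup
        push(random.uniform(0.0, THINK_MS), "submit", None)

    done, total_resp = 0, 0.0
    while done < n_jobs:
        t, _, kind, payload = heapq.heappop(events)
        if kind == "submit":                     # console issues a job
            jtype = random.choices((1, 2, 3, 4), weights=mix)[0]
            queues["CPU"].append({"type": jtype, "step": 0, "t0": t})
            if not busy["CPU"]:
                begin_service("CPU", t)
        else:                                    # departure from a station
            station, job = payload
            busy[station] = False
            if queues[station]:
                begin_service(station, t)
            job["step"] += 1
            flow = FLOW[job["type"]]
            if job["step"] < len(flow):          # route to the next station
                nxt = flow[job["step"]]
                queues[nxt].append(job)
                if not busy[nxt]:
                    begin_service(nxt, t)
            else:                                # TERMINATE; console thinks
                total_resp += t - job["t0"]
                done += 1
                push(t + THINK_MS, "submit", None)
    return total_resp / done
```

With an all-type-4 mix the CPU is lightly loaded and the mean response time sits just above the 100 ms CPU service time; richer mixes add drum and disc queueing delay.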

Define the following parameters, taken from the GPSS model:

d_i = the total number of departures from the CPU in one complete sequence of processing operations for a type i job.
m_i = fraction of the d_i departures which are followed immediately by a drum processing operation.
c_i = fraction of the d_i departures which are followed immediately by a disc processing operation.
x_i = mix fraction of job type i (an input to the GPSS model).

        i=1     i=2     i=3     i=4
d_i      7       3       2       1
m_i     3/7     1/3     1/2      0
c_i     3/7     1/3      0       0

The following expressions are used to compute the parameters q1, q2, and q3 for the RQA model, given the GPSS model's values for c_i, d_i, m_i, and x_i:

q1 = ( sum_{i=1}^{4} x_i ) / ( sum_{i=1}^{4} x_i d_i )

q2 = ( sum_{i=1}^{4} x_i d_i c_i ) / ( sum_{i=1}^{4} x_i d_i )

q3 = ( sum_{i=1}^{4} x_i d_i m_i ) / ( sum_{i=1}^{4} x_i d_i )

To compute the average CPU service time for an RQA run, the following expression was used:

Avg CPU Service Time, T1 = ( sum_{i=1}^{4} T_i x_i d_i ) / ( sum_{i=1}^{4} x_i d_i )

where T_i is the CPU processing time for a type i job, and x_i and d_i were previously defined. Since the RQA output consisted of steady state probabilities for the various states in the model, the following expression relating the various queue lengths, service times, etc., was used to compute the system response time.
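The mapping above can be checked numerically. The following sketch (variable names ours) reproduces the report's q1 values for the job mixes quoted later in the comparison:

```python
from fractions import Fraction as F

d = [7, 3, 2, 1]                        # CPU departures per sequence, by type
m = [F(3, 7), F(1, 3), F(1, 2), F(0)]   # fraction followed by a drum access
c = [F(3, 7), F(1, 3), F(0), F(0)]      # fraction followed by a disc access
T = [50, 50, 75, 100]                   # CPU ms per visit, by type

def rqa_params(x):
    """Map a GPSS job mix x = (x1, x2, x3, x4) to RQA's q1, q2, q3, T1."""
    denom = sum(xi * di for xi, di in zip(x, d))
    q1 = sum(x) / denom
    q2 = sum(xi * di * ci for xi, di, ci in zip(x, d, c)) / denom
    q3 = sum(xi * di * mi for xi, di, mi in zip(x, d, m)) / denom
    t1 = sum(ti * xi * di for ti, xi, di in zip(T, x, d)) / denom
    return q1, q2, q3, t1

# An all-type-1 mix gives q1 = 1/7 = .1428; the mix (10%, 20%, 30%, 40%)
# gives q1 = .43478, matching the values quoted in the text.
assert float(rqa_params([F(1), F(0), F(0), F(0)])[0]) == 1 / 7
assert round(float(rqa_params([F(1, 10), F(2, 10), F(3, 10), F(4, 10)])[0]), 5) == 0.43478
```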

[Figure 2-12 RQA Analysis of Proposed System. Response time (ms, roughly 1000 to 3000) vs. q1 (.1 to .6), for average think time T2 = 24 sec, with q2 = q3 and q2 + q3 = 1 - q1; curves for average CPU execution times T1 = .096 sec and T1 = .048 sec.]

[Figure 2-13 RQA Analysis of Proposed System. Plotted vs. q1 (.1 to .6), for think time = 24 sec, with q2 = q3 and q2 + q3 = 1 - q1; curves for average CPU execution times T1 = .048 sec and T1 = .096 sec; ordinate label not recoverable from the text.]

System Response Time = (1/q1) { [QCPU x T1] / [1 - Pr{CPU is idle}] + [q3/(q2 + q3)] x [QDRUM x T4] / [1 - Pr{Drum is idle}] + [q2/(q2 + q3)] x [T3 x (PDSC1 + .5(QDISC - PDSC1))] / [1 - Pr{Disc is idle}] }

where QCPU is the average number of jobs in the queue for CPU service, QDRUM and QDISC are similarly defined, and PDSC1 is the joint probability that one disc mechanism is busy and there are no jobs in the disc queue. All the Q's and probabilities were obtained as output from the RQA program.

2.4.4 Discussion of Queueing Analysis Results

Figures 2-12 and 2-13 depict the results of RQA runs. The following parameters were used in the calculation of the system response time with the model of Figure 2-11:

N = number of consoles in operation = 14
T1 = 1/µ1, mean job execution time
T2 = 1/µ2, mean operator response time

T3 = 1/µ3, mean block transfer time for disc = 120 ms
T4 = 1/µ4, mean block transfer time for drum = 12 ms
q1 = % of jobs departing the CPU which do not require I/O accesses
q2 = % of jobs departing the CPU which require a disc access
q3 = % of jobs departing the CPU which require a drum access

As indicated on the plots, it is assumed that q2 and q3 are identical for the particular RQA runs of Figures 2-12 and 2-13.

2.4.5 Discussion of Results of Comparison

Figures 2-14 and 2-15 contain plots of response time and CPU utilization as functions of q1, the probability that a job departing the CPU does not require further input-output activity. The RQA curves for these figures were obtained using the same model as for Figures 2-12 and 2-13, with different parameter values. Table 2-6 indicates the parameter values which were used in making the runs. The lower bound curve of Figure 2-14 was computed by assuming no queueing delay at each server; hence response time was simply the summation of the average service times of the various servers which a job sequences through in the system between input and termination. As would be expected, the results of GPSS and RQA compare most favorably in the middle and upper ranges of q1 values. This interval corresponds to the area in which the CPU service time distribution used in the

[Figure 2-14 Response Time from RQA and GPSS. Response time (ms, 100 to 820) vs. q1 (.1 to .5), for T2 = 24 sec; GPSS and RQA curves plus a lower bound curve.]

[Figure 2-15 CPU Idle Fraction from RQA and GPSS. CPU idle fraction vs. q1 (.1 to .5); GPSS and RQA curves.]

[Table 2-6: parameter values used for the comparison runs; entries not recoverable from the text.]

GPSS model most closely resembles the exponential distribution assumed in the RQA model (even though the resemblance is in fact quite poor). For example, for the job mix (100%, 0%, 0%, 0%), which corresponds to q1 = .1428, the weighted CPU service time distribution Pr{CPU service time ≤ x} is a single step from 0 to 1 at x = 50 ms. When the job mix is changed to (30%, 30%, 30%, 10%), corresponding to q1 = .27778, the weighted CPU service time distribution becomes a step function with jumps at x = 50, 75, and 100 ms. At the upper end of the q1 range, with a job mix of (10%, 20%, 30%, 40%) corresponding to q1 = .43478, the weighted CPU service time distribution is given by

a step function with jumps at 50, 75, and 100 ms, with still more of its weight spread above the mean. Thus in the middle and upper portions of the q1 range, the weighted CPU service time distribution for the GPSS model most resembles the negative exponential distribution function used in the corresponding RQA model. As would be expected, the percentage of error in using RQA is large where the disparity between corresponding service time distributions is relatively large. Even though in the worst case a constant rate server in the GPSS model was assumed to be exponential in the corresponding RQA model, the percentage error in system response time and CPU idle time was less than 8%. Furthermore, in all cases, as would be expected, the response time computed using RQA exceeded that obtained using GPSS. This is so because the variance (normalized to the mean) of the respective service times in GPSS is always less than the normalized variance which arises as a result of the negative exponential service time distribution assumed for the same server in RQA.
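The normalized-variance argument can be made concrete. For any mean, the exponential distribution has coefficient of variation (standard deviation over mean) equal to 1, while the GPSS model's deterministic and uniform servers fall well below that. A sketch (helper names ours):

```python
import math

def cv_uniform(a, b):
    """Coefficient of variation (std. dev. / mean) of Uniform(a, b)."""
    return ((b - a) / math.sqrt(12.0)) / ((a + b) / 2.0)

CV_EXPONENTIAL = 1.0      # for any mean
CV_DETERMINISTIC = 0.0    # constant service time

# Drum: uniform on [4, 20] ms; disc: uniform on [10, 230] ms. Both have
# less normalized variability than the exponential assumed in RQA.
assert cv_uniform(4.0, 20.0) < CV_EXPONENTIAL
assert cv_uniform(10.0, 230.0) < CV_EXPONENTIAL
```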

2.4.6 Conclusions

The results of this analysis (Figures 2-12 to 2-15) show that the system response time as well as the CPU idle time is critically affected by the intensity of I/O demand generated by the consoles. The response time (defined as the average time for a request originating at the consoles to be serviced) is seen to grow monotonically with an increase in the fraction of jobs requiring I/O. This is to be expected, since the disc and drum I/O times enter additively into the computation of response time, and if on the average more jobs require I/O, then the average response time will increase. It suffices to say that the q1 parameter should be made as large as possible, corresponding to minimum I/O. The natural question which arises is: what can be done in the actual system to force changes in the q1 parameter of the model? Certainly the job mix would be fixed once the operating environment has been established. However, it is possible that the disc and drum units can be organized so that multiple accesses can be made without interrupting the CPU. If this is done, then the resulting q1 value is effectively increased and system operation improved.

2.4.7 Comparison of Analysis Times

Since the Recursive Queue Analyzer program is implemented on the IBM 7090 and GPSS is implemented on the IBM 360/67, it is most meaningful to normalize execution times before comparisons are made.
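The normalization reported in the next paragraph (18.67 RQA minutes on the 7090 versus 49.48 GPSS minutes on the 360/67, with the 360/67 taken as three times faster than the 7090) works out as follows:

```python
rqa_7090_min = 18.67      # RQA CPU time on the IBM 7090, nine points
gpss_360_min = 49.48      # GPSS CPU time on the IBM 360/67, nine points
speed_ratio = 3.0         # 360/67 taken as ~3x faster than the 7090

rqa_360_equiv = rqa_7090_min / speed_ratio
assert round(rqa_360_equiv, 2) == 6.22               # 360/67-equivalent minutes
assert round(gpss_360_min / rqa_360_equiv, 2) == 7.95  # GPSS/RQA time ratio
```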

The computations for all nine points on the RQA curves of Figures 2-14 and 2-15 required approximately 18.67 minutes of CPU time on the 7090. The corresponding GPSS computations required 49.48 minutes of CPU time on the 360/67. Since the 360/67 is approximately three times faster than the 7090, the RQA time normalized to a 360/67-equivalent machine would be reduced to approximately 6.22 minutes. In conclusion, the GPSS model effectively required 7.95 times more CPU execution time than did the RQA model to compute nine points. The maximum error in system response time introduced by using RQA was 7.7%, while the maximum error in CPU idle time introduced by using RQA was 2.88%. It is noteworthy, therefore, that the Markovian modeling tool, RQA, which is substantially more efficient than GPSS from a computational standpoint, can yield such satisfactory results in modeling a system whose behavior is significantly devoid of the characteristics theoretically necessary for a valid Markovian model.

2.5 ANALYSIS OF A CANDIDATE CRT SUPPORT SYSTEM

This section presents the results of an initial analysis of a candidate CRT support system. The purpose of the analysis is to determine the additional utility gained by implementation of the system on an IBM 360 Model 40; the basic capabilities are presently being implemented on an IBM 360 Model 30.

The basic on-line capability will provide a CRT user with the ability to retrieve, by specifying a record number, a record from a single file, and to display that record on a CRT in one of two formats. The system will be so partitioned as to allow batch processing to proceed in the background. The CRT's will be locally operated; that is, they will be cable-connected to the computer. The essential system components considered in determining how well the system will perform the initial functions are:

1) IBM 2311 Disk
2) IBM 2848 Display Control
3) IBM 2260 Displays (8)
4) IBM Multiplexor Channel
5) IBM 2030 Processing Unit

The multiplexor channels available have data transfer rates well in excess of the devices to which they will be attached, i.e., the disk and display control. Therefore, the rates at which blocks of data are transferred will be determined by the devices themselves. Assuming a block of data consists of eight 80-column lines (640 characters), the average transfer time for the devices can be obtained from the following information. The 2311 disk has an average positioning time of 75 milliseconds; the average latency time is 12.5 milliseconds; the transfer rate is 156 kilobytes/second. Therefore for a block of data the total time required is

75 + 12.5 + 640 x (1/156) ≈ 91.75 milliseconds.

Normally a request will require six blocks of data from the disk; this increases the total disk time to 113 milliseconds (assuming all blocks of data are contiguously located). Assuming that a full screen display is 12 lines (960 characters), we compute the time for the 2848 as follows. The time required for synchronization of the channel and the 2848, prior to data transfer, averages 8.4 milliseconds. After synchronization, the transfer of data requires approximately .4 ms/character for the characters on a given display line. During the transfer of data from the channel to the 2848, data transfer is halted for a period of 16.7 ms before the start of each display line except the first. During this pause the 2848 services the 2260 display station keyboards, i.e., it stores the data from a single 2260 keyboard in the associated 2260 buffer or performs the specified control function. Therefore the 2848 time required for the transfer of 12 lines, each consisting of eighty characters, is

8.4 + (80 x .4 x 12) + (11 x 16.7) = 576.1 ms.

Therefore, for the average request, the time required to display the first block of data will be approximately 576 + 113 = 690 milliseconds, plus the required processing time (generally less than a second). These times are independent of the 360 model used. With the Model 30 there is suspension of the CPU while the data transfer is in progress. The Model 40 allows multiplexor mode transfers and the CPU to operate concurrently once the data transfer has been initiated.
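The 2848 timing arithmetic above can be parameterized and checked directly (the function name is ours):

```python
def display_time_ms(lines=12, chars_per_line=80):
    """2848 time to paint a screen: 8.4 ms channel synchronization,
    .4 ms per character, and a 16.7 ms keyboard-service pause before
    every display line except the first."""
    return 8.4 + 0.4 * chars_per_line * lines + 16.7 * (lines - 1)

assert round(display_time_ms(), 1) == 576.1
# First block of an average request: 2848 time plus ~113 ms of disk time
# (the text rounds the total to 690 ms).
assert round(display_time_ms() + 113.0) == 689
```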

To get an estimate of the response time for the system we made the following assumptions:

1) The average request requires six blocks of data (3840 bytes)
2) Requests are handled sequentially
3) Buffer space is provided for the complete record (all six blocks)
4) Core space is not released when data are transferred to the display
5) The CPU is suspended during data transfer in the channel (as is done with the Model 30)

If response time is defined as the time interval between the system first receiving a target number request and the first screen display associated with the request, the minimum response time is .69 + 1 = 1.69 seconds. This response time is composed of the one second of processing time associated with a request, plus the disk and 2848 time. In the unlikely event that all eight CRT's required service simultaneously, the maximum response time will be 13.6 seconds. On the average, the response time will be 6.8 seconds, with half the CRT's in the service queue. If core buffer space must be provided for each of the consoles, and a 65K-core Model 30 is used, up to 45% of core could be tied up for buffering. The decision to use a given 360 model should, of course, be based on the ultimate uses of the system; however, we can make some qualitative judgments. If an average response time of almost seven seconds is not unreasonable, and if the size of the supervisor and the associated on-line software package will allow up to 45% of core to be tied up for the

information retrieval function, then the Model 30 is adequate. If this is not the case, then it might be necessary to change to the Model 40, since:

1) With the Model 40, overlapping of CPU execution with the I/O channel will allow more efficient operation. If the system is multiprogrammed, response time can be decreased considerably.
2) The added complexity and the size of the software package might make multiprogramming infeasible with the smaller Model 30 core.
3) The larger core of the Model 40 potentially gives a larger buffer area, thereby offering the possibility of fewer disk accesses and hence better response time.

Before we can make more quantitative estimates of system performance, additional information is needed:

1) How much core will be used for buffering?
2) When a record of six blocks is accessed, will only that part of the record which can be displayed be brought into core, or will all six blocks of the record be stored in core?
3) Will the system be multiprogrammed, or will the supervisor queue up requests from the CRT's and handle them on a first-come, first-served basis?
4) Is core released when information is transferred to the display buffer?

5) Under what conditions are data swapped within each core partition?
6) How is core partitioned between batch and on-line jobs?
7) How much of core will be occupied by the supervisor and the resident on-line software package?

REFERENCES, CHAPTER 2

1. Evans, R. V., "Geometric Distributions in Some Two-Dimensional Queueing Systems," Operations Research, 15, 5 (October-November 1967), pp. 830-846.
2. International Business Machines Corporation, General Purpose Simulation System/360 User's Manual, IBM Publication H20-0326, White Plains, New York.
3. Wallace, V., "The Solution of Quasi Birth and Death Processes Arising from Multiple Access Computer Systems," SEL Technical Report No. 35, Systems Engineering Laboratory, The University of Michigan, March 1969.
4. Wallace, V. and Rosenberg, R., "RQA-1, The Recursive Queue Analyzer," SEL Technical Report No. 2, Systems Engineering Laboratory, The University of Michigan, February 1966.

3. OPTIMUM DESIGN TECHNIQUES

As computer system designers, our goal must be not just to find a system adequate for the job at hand, but to find the best, or optimum, system design. In virtually all cases, the optimum design must meet certain constraints, usually with respect to performance or cost. The three sections in this chapter discuss in turn the optimum design of telecommunications networks, graphical display terminals, and large storage systems.

3.1 OPTIMUM DESIGN OF TELECOMMUNICATIONS NETWORKS

3.1.1 Introduction

In this section we will be concerned with the problems of designing the portions of remote access systems which lie outside the central location. A central digital computer complex and the communication facilities which connect remotely located terminals to the central complex constitute a teleprocessing system. The most significant analytic distinction between a teleprocessing system and a conventional batched-input system lies in the random manner in which requests for service (jobs) arrive at the various entry points to the system. In the conventional batch mode, arrival patterns are deterministic in the sense that they are monitored by machine operators and computing center scheduling personnel, while inputs to a teleprocessing system are seldom controllable. Furthermore, a very significant fraction of capital expenditures in a teleprocessing system is devoted to communications facilities outside the central complex; in the batch system almost all

capital expenditures are devoted to software and hardware facilities at the central computer complex. In the design of such teleprocessing systems, various queueing problems arise as consequences of both the random or unscheduled requests for service and the importance of effectively utilizing the expensive communications equipment linking the remote terminals to the central computational facility. This research is accordingly centered around the development of techniques for the analysis and synthesis of the communications network portion of the teleprocessing system. In the communication network portion of the teleprocessing system, the costs of communication link transmission capacity and data concentrators are frequently of the same order of magnitude as the costs of the central processing facility. Hence there is significant motivation to make efficient utilization of these communications resources in order to maximize the cost effectiveness of system operation. In general, the important macroscopic design variables with which the designer of the network must be concerned are:

* Topological structure
* Link capacity values
* Response time or performance of the network
* System reliability
* Control procedures
* Types of data concentrators

* System cost. The development of the topological structure of the network is probably the single most important portion of the design procedure; it is next to impossible to achieve any degree of cost-effective performance in a network where structure is poorly chosen. A completely point-to-point network (Figure 3-1) is defined to be a structural configuration in which each remote terminal is connected to the computer over a channel which is not shared with any other remote terminals. A multipoint network, (Figure 3-2), on the other hand, is a configuration in which one or more information channels in the network are shared by two or more remote terminals. In general, the completely point-to-point network can seldom be justified from an economic standpoint since there is no sharing of communication lines, and the cost of lines in this structure tends to be high in comparison to other structures. Thus a strong motivation exists for the development of techniques which enable the communication channels of a network to be shared, thus reducing the total cost of lines in the system. However, as multipoint configurations are formed, additional costs for concentrating equipment are incurred and must be considered. Furthermore, the average response time increases as the congestion builds up due to the formation of more multipointed branches. 65

Figure 3-1 Point-to-Point Configuration
Figure 3-2 Multipoint Configuration
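The line-cost gap between these two structures can be illustrated with a toy computation. The terminal coordinates, the flat per-mile tariff, and the nearest-neighbor chain used for the multipoint layout below are all hypothetical, chosen only to show the direction of the comparison:

```python
from math import hypot

# Hypothetical terminal coordinates (miles) relative to the central
# computer at the origin; a flat tariff of $1 per mile of line is assumed.
terminals = [(10, 2), (11, 3), (12, 1), (30, 20)]
COST_PER_MILE = 1.0

def point_to_point_cost(terminals):
    """Each terminal gets its own private line to the center (Figure 3-1)."""
    return COST_PER_MILE * sum(hypot(x, y) for x, y in terminals)

def multipoint_cost(terminals):
    """One shared line visits the terminals in turn (Figure 3-2).
    This greedy nearest-neighbor chain is only one possible multipoint
    layout, not an optimal one."""
    unvisited = list(terminals)
    here, total = (0.0, 0.0), 0.0
    while unvisited:
        nxt = min(unvisited, key=lambda t: hypot(t[0] - here[0], t[1] - here[1]))
        total += hypot(nxt[0] - here[0], nxt[1] - here[1])
        unvisited.remove(nxt)
        here = nxt
    return COST_PER_MILE * total

print(point_to_point_cost(terminals))  # four private lines
print(multipoint_cost(terminals))      # one shared multidrop line
```

With three of the four terminals clustered, the shared line is markedly cheaper, which is the economic argument made above; the concentrator cost and the added queueing delay are what the shared layout pays in return.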

In designing the network, one must be able to assess these tradeoffs with queueing and other types of analytic models which characterize the interrelationships in mathematical terms. As the first step in the desired direction, we consider some relatively simple models for data concentrators in networks having fixed structures. These models form the basis for the more complex design problems in which topological structure and link capacity values are treated as variables in the design procedure.

3.1.2 Models for Data Concentrators

In this section we present models which are useful in assessing the performance of multiplexor and message switching center (MSC) types of concentrators. The multiplexor model is valid for either frequency or time division schemes; the MSC model holds for any store-and-forward device which can buffer message blocks at intermediate nodes of the communication network.

3.1.2.1 Message Switching Center Model

Consider the single-concentrator configuration of Figure 3-3. The queueing model for this network can be depicted as shown in Figure 3-4, where it is assumed that messages arrive at node i according to a Poisson process having mean rate γ_i. The mean length of all messages is 1/μ, and we let {a_1, ..., a_n} denote the capacities of the input links to the concentrator. The capacity of the high speed shared link is denoted by a_H. For most

Figure 3-3 Single Stage Message Switching Center

Figure 3-4 Queueing Model for Single Stage Message Switching Center

reasonable state-of-the-art MSC's it can be shown that the buffer capacity is large enough so that the average message delay in the network can be written as follows, assuming a first-in first-out discipline at the queue in the MSC:

    T_MSC = (1/λ_{n+1}) Σ_{i=1}^{n+1} λ_i / (μa_i − λ_i)

where λ_i = γ_i for i = 1, ..., n, a_{n+1} = a_H, and λ_{n+1} is the sum of the mean arrival rates of messages from all remote terminals in the configuration.

3.1.2.2 Multiplexor Model (MX)

In the network of Figure 3-3, if an MX concentrator is used instead, the equivalent structural model can be depicted as in Figure 3-5. Passages refer to the fractions of the total link transmission capacity which are assigned to the individual remote terminals in the sharing group. It is possible to express the average delay for a single multiplexor network as follows:

    T_MX = (1/λ_{n+1}) Σ_{j=1}^{n} [ γ_j / (μa_j − γ_j) + γ_j / (μC_j − γ_j) ]

where C_j is the capacity of the j-th passage on the shared link and it is assumed that each passage capacity is less than the capacity of the

Figure 3-5 Equivalent Structural Model for Multiplexor Concentrator

corresponding input link. The problem of determining the values of the C_j capacity assignments which minimize T_MX for an arbitrary centralized network has been investigated. An optimization procedure has been developed for minimizing the average message delay in any centralized network where the link capacity values are known and where one or more MX concentrators are used.

These models were used to conduct some comparisons between MX and MSC concentration schemes. It is well known that MSC's are more costly than MX's. However, there is no basis for determining which type of concentrator should be used in a given situation unless one uses models like those just presented to quantify the differences in performance between multiplexors and MSC's. Figures 3-6 and 3-7 depict the results of two studies which were made to compare the performances of an MX and an MSC at the concentrator node in the network of Figure 3-3 when n = 9. As the next phase in adding reality (and hence complexity) to the design model, the problem of efficiently allocating the link capacity resources in a network has been considered.

Figure 3-6 Comparison of MSC and MX Concentrators (input link capacities ranging from 150 to 800 b/s; γ_i = 1 mess/sec; 1/μ = 100 bits/mess; ρ_i = 0.277, the average utilization of a low speed input link; plotted against ρ_H, the utilization of the shared link)

Figure 3-7 Comparison of MSC and MX Concentrators (a_i = 600 b/s, i = 1, ..., 10; 1/μ = 100 bits/mess; γ_i = 1 mess/sec; ρ_i = 0.1667, the average utilization of an incoming low speed link; plotted against ρ_H, the utilization of the shared link)

3.1.3 Efficient Allocation of Link Capacity

We now consider the problem of how to assign capacity values for the links of a structure so as to obtain the minimum line cost and satisfy an average response time constraint. The problem is combinatorial in nature, and hence the size of the optimization space can get very large for networks having more than 15 or 20 remote terminal nodes. Computational shortcuts have been developed which enable one to optimize otherwise intractable networks. Also, an approximate solution which is computationally very easy to determine, even for networks of moderate size, has been obtained. These approximation procedures consistently produced solutions to the capacity assignment problem which were within reasonable limits of the true optimum and did so in an efficient way.

3.1.4 Network Design when Structure is a Variable

All of the optimization models developed for the analysis of data concentrators and the assignment of link capacity remedied shortcomings of existing network design procedures such as the minimum spanning tree algorithm [11] and one due to Esau and Williams [7] of IBM. These procedures do not treat link capacity as a variable and also do not consider the average response time property which was introduced in Section 3.1.2. In this research, we have directed our efforts toward the solution of the network synthesis problem: minimize total telecommunication network system cost (lines and concentrators) subject to the constraint that the average response time of the network must not exceed T_max, a given maximum acceptable value. The important control variables are network topology, link capacity values, and the types of data concentrators.

A procedure has been developed for solving this problem which, although it generally produces suboptimal solutions, represents a significant improvement over existing network design algorithms. The improvements accrue as a consequence of the greater level of reality and generality which

are embedded in the design model. Some of these major improvements are now summarized:

* It is capable of working with nonlinear, discrete-valued cost-capacity functions.
* It treats the type of data concentrator as a design variable.
* The performance (average response time) of any potential configuration is readily assessable.
* Since the problem statement involves a given constraint on average response time, it is possible to predict cost-performance relationships for different levels of response time before the network is actually constructed.

We now summarize the fundamental approach which has been taken in solving the above formulation of the design problem. A multi-dimensional design parameter space is constructed in which topological structure, link capacity values, and types of data concentrator are varied. Topological structure is varied by systematically decreasing the number of central links one at a time, reducing link costs at each step until no further reductions can be made. Then link capacity values are assigned for the structures of these topological sequences in a manner which creates efficient utilization of the system resources. The type of data concentrator is varied by using MSC's in those situations where they produce a large enough performance improvement to justify their larger cost.
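The response time assessments that drive this procedure rest on the delay models of Section 3.1.2. A minimal numerical sketch follows, assuming each link behaves as an independent M/M/1 queue (the form underlying T_MSC and T_MX) and using the Figure 3-7 parameters; the shared-link capacity a_H and the equal split of the shared link into n passages are illustrative choices, not values from the report:

```python
# Average message delay for the single-stage MSC and MX concentrator
# models of Section 3.1.2, assuming independent M/M/1 behavior on each
# link. Terminal parameters follow Figure 3-7; a_H is an illustrative
# choice of shared-link capacity.

n = 10                     # number of remote terminals
gamma = [1.0] * n          # mean arrival rate at each terminal (mess/sec)
a = [600.0] * n            # input link capacities (bits/sec)
mu = 1.0 / 100.0           # 1/mu = 100 bits/mess, so mu*a is in mess/sec
a_H = 2000.0               # shared-link capacity (bits/sec), illustrative

big_lambda = sum(gamma)    # total traffic offered to the shared link

def t_msc(a_H):
    """Mean delay with a message switching center: the shared link is a
    single queue carrying the combined traffic."""
    terms = [g / (mu * ai - g) for g, ai in zip(gamma, a)]
    terms.append(big_lambda / (mu * a_H - big_lambda))
    return sum(terms) / big_lambda

def t_mx(a_H):
    """Mean delay with a multiplexor: the shared link is split into n
    fixed passages of capacity a_H/n, one per terminal (an assumed
    equal-split assignment)."""
    c = a_H / n
    terms = [g / (mu * ai - g) + g / (mu * c - g) for g, ai in zip(gamma, a)]
    return sum(terms) / big_lambda

print(t_msc(a_H))   # MSC delay at shared-link utilization 0.5
print(t_mx(a_H))    # MX delay at the same utilization
```

At equal shared-link utilization the MSC gives the lower delay, because the whole shared link serves every queued message rather than holding idle capacity in fixed passages; this is the qualitative behavior compared in Figures 3-6 and 3-7.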

No configurations are ever formed by the topological variations which have average response times in excess of a certain given value. A number of examples which illustrate the usage of the proposed design procedures and the results of their application to realistic problems are discussed at length in the final report on the communication network synthesis effort [6].

3.2 DESIGN OF OPTIMUM DISPLAY SYSTEMS

3.2.1 Introduction

The subject of the study reported here is the systems design of highly interactive graphical display terminals for time-shared computer systems. A more detailed report of the work is available in a reference [8]. The overall goal of the study is to develop insight into how the choice of subsystems for a display system can affect the system's performance, and to develop methods of finding the combination of subsystems which will be optimum for any well-defined display application, where optimum is defined as minimizing a display system's response time subject to a cost constraint.

Viewed in a slightly different way, display system design can be thought of as presenting a problem in resource allocation. The resource is a fixed number of dollars, which are allocated to the purchase of display subsystems in a manner which minimizes the total system's response time. Response time is important in any highly interactive remote access computer system, and is even more important when considering the

graphics terminals which often form part of such systems, because fully capitalizing on the potential interaction rates achievable with a graphics terminal demands good response time. Dollars are significant because display system hardware is still quite expensive, and improper allocation of the dollars can produce a display system whose response time is orders of magnitude worse than what can be achieved with the optimum allocation.

Figure 3-8 shows the type of system of interest to us. The four subsystems of particular interest are the data link, the remote computer-display control, the display terminal's core storage, and the display terminal's bulk storage. In specific cases the data link may simply be a cable between adjoining rooms. In most cases, the remote computer-display control and its core storage will be part of a display system, with the core serving both as a refresh buffer for the display and as program and data storage for the remote computer. This computer (usually small and inexpensive) is used to take the burden of much of the display processing from the main computer. The exact nature of the relationships among the display control, remote computer, and core storage is discussed in a reference [12]. The main computer is included in the system to provide inexpensive bulk storage and, above all, because of the extensive computations demanded by most display applications, such as network analysis [4, 5, 14, 17], drafting and numerical control [13], integrated circuit layout [9, 15], and many others.

When designing a display system, there are large numbers of

[Figure 3-8]

hardware-hardware and hardware-software tradeoffs or alternatives which can be exploited. By hardware-hardware tradeoffs is meant the possibility of increasing the capability of one display subsystem in exchange for decreased capability of another display subsystem, while keeping response time constant. The purpose of juggling hardware in this manner is of course to minimize cost. As a specific example, if we wish to maintain a specified response time at the display console and wish to decrease the data link speed, it is necessary to increase the "power" of the remote computer-display control, the amount of core storage, or the amount of remote bulk storage, in order to compensate for the extra data transmission time. The converse also applies: increasing remote computer power cuts computation time, increasing core storage decreases bulk storage accesses (either at the terminal or at the main computer), and increasing remote bulk storage cuts down on data link usage.

In some cases, however, it may not be possible to completely compensate for a lower transmission rate. This will depend both on the decrease in transmission rate and on the relative usage of the four system components. Specifically, a small decrease in transmission rate for an infrequently used data link is far easier to accommodate than a large decrease in a heavily used link. Similar statements can be made with respect to each of the other system resources: a decrease in any one can be compensated for by increases in one or more of the other resources, within certain limits.
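A toy model makes the compensation argument concrete. The additive delay form and every constant below are hypothetical, chosen only to illustrate the direction of the tradeoff:

```python
# Hypothetical additive response-time model for one console interaction:
# transmit a display file over the link, process it on the remote
# computer, and fetch part of it from bulk storage. Every constant is
# an illustrative assumption, not a measured value from the report.

def response_time(link_bps, cpu_ops_per_sec, core_bits):
    message_bits = 10_000          # display file per interaction (assumed)
    compute_ops = 20_000           # processing per interaction (assumed)
    # More core means fewer bulk-storage fetches (a crude assumed model).
    bulk_fetches = max(0, 10 - core_bits // 8_000)
    bulk_access_time = 0.05        # seconds per fetch (assumed)
    return (message_bits / link_bps
            + compute_ops / cpu_ops_per_sec
            + bulk_fetches * bulk_access_time)

base = response_time(link_bps=40_000, cpu_ops_per_sec=200_000, core_bits=40_000)
# Halve the link speed, then compensate with more core storage.
slower_link = response_time(20_000, 200_000, 40_000)
compensated = response_time(20_000, 200_000, 80_000)

print(base, slower_link, compensated)
```

With this assumed usage mix, doubling the core store offsets halving the link speed; had the link carried a larger share of the delay, no amount of added core could have restored the original response time, which is the limit on compensation noted above.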

With hardware-software tradeoffs, we are referring only to the remote computer-display control. Certain display-oriented functions can be implemented either by software in the remote computer or by hardware in the display control. Several possibilities exist here. When a position-indicating device such as a RAND tablet [3] is used at the display console, it is often necessary to correlate a position with an entity currently being displayed on the CRT. This can be done with software, or with display control hardware which continually compares the current CRT beam position with the indicating device's position [10]. The first method can consume much remote computer time, but costs nothing; the second method takes neither computer time nor display control time, but does take money.

If, on the other hand, a light pen-type entity-indicating device is employed, its position will frequently need to be known: this is the familiar light pen tracking problem. Once again, the work can be done either with special purpose hardware built into the display control, or with a program running on the remote computer. A current hardware implementation takes about 10% of the display control's time, decreasing by a like amount the quantity of flicker-free material which can be displayed [16]. Software implementations of various pen tracking algorithms do not affect the display, but do require remote computer time to execute.

One of the most demanding display functions is the rotation of a three dimensional object, which requires a matrix multiplication operation

of six scalar multiplications and four scalar additions for each point and line of the display. Implemented in software, this can be very slow, and can limit the smoothness and rate of dynamic rotation. A first step toward improvement is adding hardware multiplication to the remote computer. A second step is implementation, in the display control, of the actual matrix multiplication. There are two current manifestations of this second possibility: one uses binary rate multipliers followed by digital addition [16]; the second uses analog multipliers and analog addition [1].

Hardware facilities for display subroutining allow one display list to be used many times in the course of drawing a picture, and therefore avoid needless duplication of display instructions in core. This is a direct parallel to subroutining for computer programs. Similar display control hardware-remote computer software tradeoffs exist with respect to dashed lines, blinking lines, transfer of control, recursive subroutining, displaying lines, and displaying alphanumerics.

With this multitude of tradeoffs between the various display system components, an important question arises: for a given display system application and a given dollar cost, what combination of display subsystems will produce the best possible service for display users? The best hardware will produce the fastest, or minimum, average response time experienced by the display system's users.
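The six-multiplication, four-addition count quoted above corresponds to transforming each 3-D endpoint by two rows of a rotation matrix, keeping only the two screen coordinates. A minimal sketch (the rotation axis, angle, and point are arbitrary illustrations):

```python
from math import cos, sin

def project_rotated(point, theta):
    """Rotate a 3-D point about the z-axis by theta and keep the screen
    coordinates (x', y'): two rows of the rotation matrix, i.e. six
    scalar multiplications and four scalar additions per point."""
    x, y, z = point
    # Rotation about z; the third column happens to be zero here, but
    # the general two-row transform still costs 6 multiplies and 4 adds.
    r11, r12, r13 = cos(theta), -sin(theta), 0.0
    r21, r22, r23 = sin(theta),  cos(theta), 0.0
    xp = r11 * x + r12 * y + r13 * z   # 3 multiplications, 2 additions
    yp = r21 * x + r22 * y + r23 * z   # 3 multiplications, 2 additions
    return xp, yp

print(project_rotated((1.0, 0.0, 5.0), 3.141592653589793 / 2))
```

Performed in software for every point and line on each refresh cycle, this arithmetic is what limits the rate of dynamic rotation; the rate-multiplier and analog schemes cited above move exactly these ten operations into the display control.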

What is needed for use by display system designers is a rigorous, objective method for evaluating the effects upon system cost and response time of the various tradeoffs discussed qualitatively in the previous section. By now the qualitative tradeoffs, which are in fact rather obvious, are well understood. The tradeoffs need to be quantified for the sake of intelligent systems design, because the consequences of poor systems design are the overloading of some subsystems, underutilization of other subsystems, and decreased productivity for the system's users. The work reported here has been conducted with this goal always foremost.

3.2.2 Display System Model

In order that display systems be studied in a rigorous manner, particularly to find optimum display systems, a mathematical model or abstraction of how a display system operates is needed. To be useful, the model must reflect the varying capabilities of the display subsystems: the remote computer-display control, the data link, and the remote terminal's core and bulk storage. The model must also be sensitive to the varying computational, storage, and data transmission requirements of the many different applications which might be implemented with a display system. Furthermore, any explicit or implicit assumptions embedded in the model must be tenable. Finally, the model, when appropriately analyzed, must yield some measure of system performance; specifically, the system's response time will be the desired performance measure.

A model which satisfies these requirements has been developed. It possesses the properties that as the capabilities of the display subsystems increase, the system's response time decreases and its cost increases. These two complementary properties will greatly facilitate the finding of optimum display systems. The model also includes a mechanism for dividing display processing between the main and terminal computers.

3.2.3 Analysis Techniques

The manner in which the display system model can be analyzed to find response time is of significance. When the display terminal serves only one display console, no queueing occurs in the system, and an analytic expression is available to calculate response time. However, when queueing occurs, either simulation or Markovian (queueing) analysis must be used. A problem arises here. Markov analysis requires that the model possess certain properties; there is unfortunately no assurance that the display model satisfies the requirements. Simulation, on the other hand, places virtually no restrictions on the model. The problem is that in terms of computer time, simulation costs from 5 to 15 times more than Markov analysis.

Because of this problem, a study was made to compare the results of simulation and Markov analysis as applied to the model, even when the model did not meet the requirements for Markov analysis. The results showed less than a seven percent difference between the two analysis methods for the specific parameter set used with the model. While this result

definitely cannot be generalized to other models or to greatly different model parameters, it does justify the use of Markov analysis for the work reported here. This means that the Recursive Queue Analyzer (RQA) programs developed by the Systems Engineering Laboratory [18] can be used to analyze the model.

3.2.4 Optimization

An optimization procedure has been devised to find the display system (set of four subsystems) which minimizes response time subject to a cost constraint. The optimization accepts as inputs a parameterized description of the display system's application, and descriptions of up to sixteen choices for each of the four display subsystems. The most important feature of the optimization is that it minimizes the number of times RQA is used: even though RQA is less expensive to use than simulation, it is still not cheap. Typically, in examining about a thousand possible display systems, RQA will be used but two or three times during the optimization.

3.2.5 Evaluation of Computing Power

A critical necessity in the optimization is the creation of a suitable data base of hardware subsystems from which an optimal display system can be chosen. This can rather easily be done for all subsystems except the remote computer-display control subsystem, for which not just computing power but also display capability must be determined. Display capability

must be known because not every display control can display the same amount of information. Only those able to display more than some minimum amount of information can even be considered for inclusion in a display system; this minimum amount of information is application dependent. Having eliminated unacceptable display controls, the remaining remote computer-display controls must be rated according to their computational abilities.

A convenient way to measure the display capability of various display controls is with standard test patterns typical of various applications. The method used by Adams Associates in The Computer Display Review [2] is typical of this approach. The second part of the problem, measuring the computing power of remote computer-display controls, is a bit more difficult. A list of display-oriented macro-level operations was drawn up, such that any and all display work could be performed by programs made up of sequences of these macros. Different pieces of hardware are evaluated by finding the execution time of each macro. These times will vary as the hardware's capabilities change, and will decrease or increase as more or less of the hardware-software tradeoffs are manifested in hardware. By taking a weighted sum of the execution times (where the weights are application dependent), a particular remote computer-display control is evaluated for a particular application.
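The weighted-sum rating can be sketched as follows; the macro names, execution times, and application weights are all invented for illustration, not taken from the report's actual macro list:

```python
# Rate remote computer-display controls by a weighted sum of
# macro-operation execution times. All names and numbers below are
# hypothetical illustrations.

# Execution time of each display macro, in microseconds, for two
# candidate machines. Machine B is imagined to have hardware vector
# drawing, so its draw_line macro is much faster.
machine_a = {"draw_line": 40.0, "draw_char": 20.0, "rotate_point": 90.0}
machine_b = {"draw_line": 10.0, "draw_char": 18.0, "rotate_point": 85.0}

# Relative frequency of each macro in a (hypothetical) drafting
# application; the weights sum to 1.
drafting_weights = {"draw_line": 0.6, "draw_char": 0.3, "rotate_point": 0.1}

def rating(machine, weights):
    """Weighted mean macro time: a lower rating means more effective
    computing power for this particular application."""
    return sum(weights[m] * t for m, t in machine.items())

print(rating(machine_a, drafting_weights))
print(rating(machine_b, drafting_weights))
```

Because the weights are application dependent, the same two machines could rank in the opposite order for a rotation-heavy application, which is exactly why the rating is computed per application rather than once for each machine.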

3.2.6 Results

Four display system applications were selected for close study: text editing, general two-dimensional drawing, general three-dimensional drawing, and general network analysis. These four applications were chosen for their differences; that is, they each use the facilities of a display system in different ways. For each application, all the parameters needed by the model and optimization were estimated (not measured). Optimizations were performed for each of the four applications for systems with one, two, and three display consoles, and for different levels of capital investment. This resulted in a large number of display system designs.

The per-console cost and average response times of these systems are graphed in Figures 3-9 to 3-12, where R refers to the number of display consoles in a display system. There are no display systems below and to the left of the curves; there are many display systems above and to the right of the curves, but they are nonoptimum. They cost more and give poorer response times than systems on the curves. If a display system is built, it should be chosen from one of the curves, so that it is optimum. Choosing which one of several systems to build can be based on a cost criterion, a response time criterion, or on some combination of cost and response time which gives a cost-effectiveness measure, such as interactions per second per dollar.

It is useful to notice the changing slopes of the graphs. For long response times, a small additional investment yields very good returns in

[Figure 3-9: minimum response times vs. monthly cost in dollars]

Figure 3-10 Minimum Response Times, Two-Dimensional Drawing

[Figure 3-11]

[Figure 3-12]

terms of decreased response times; the returns diminish as response time continues to decrease. By studying this behavior and by correlating it with the hardware used in each of the display systems, a set of general design guidelines has been developed. They read as follows:

"A satisfactory inexpensive display system uses a voice grade data link, no bulk storage, little or no core storage beyond the minimum needed, and the least expensive remote computer-display control. For little additional expenditure, the addition of a significant amount of bulk storage provides better response time. Inexpensive increases in the remote computer-display control's capabilities are also helpful. Further response time decreases are achieved with broad band data link speeds and more bulk storage. Additional response time improvements are obtained (at high cost) first by improving the remote computer-display control and then by using more core storage." [8]

To justify the necessity for these guidelines, Figure 3-13 shows just how severe the penalties of using a nonoptimal design can be. The figure plots response time versus cost for the network analysis application with three users. Both optimum and worst case response times for various values of cost are plotted. The worst case response time is the maximum response time of those display systems which cost the same as an optimum system. The graph's interpretation is that, for instance, if $2340 per month is to be spent on a display system, the response time can range from 0.155 seconds (Point A) to 3.92 seconds (Point B). These times differ by a factor of 25. Also, the graph can be interpreted to show that if a response time of 0.1 seconds is needed, the display system's monthly cost can vary from about $2600 (Point C) to about $3400 (Point D). This represents an unnecessary expenditure of up to $800!
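The two penalty figures just quoted follow directly from the plotted points:

```python
# Check the two worst-case penalties read off Figure 3-13.
optimum_time, worst_time = 0.155, 3.92        # seconds, at $2340/month
cost_at_best, cost_at_worst = 2600, 3400      # dollars/month, at 0.1 s

time_penalty_ratio = worst_time / optimum_time
wasted_dollars = cost_at_worst - cost_at_best

print(round(time_penalty_ratio))   # factor between worst and best response time
print(wasted_dollars)              # unnecessary monthly expenditure
```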

[Figure 3-13: optimum and worst case response times vs. per terminal monthly cost, with Points A-D marked]

In conclusion, the differences between optimum and worst case display systems are significant with respect to both cost and response time. Therefore, display system designers must have at their disposal means of making intelligent design decisions, because the consequences of making bad decisions are too serious.

3.3 DESIGN OF OPTIMUM BULK MEMORY SYSTEMS

3.3.1 Introduction

The purpose of this work is to provide usable techniques for designing large scale computer storage systems. The approach taken has been to develop techniques for the optimal design of such systems. Our concern in this research is assembling existing components rather than developing new ones. In the normal process of developing optimal design techniques it is common to encounter obstacles which require that simplifications or constraints be made in the mathematical models being used. The final result is a precise, well formulated technique which will design optimal memory systems under a set of restricting conditions and assumptions.

There are several uses for such an optimal technique. First, one may find many situations where the memory design problem to be solved does not conflict severely with the restrictions imposed by the technique, so that an exact optimal solution may be found. Second, when the problem to be solved does not conform to the restrictions of the technique, often much can be learned by solving related problems which do conform to the

restrictions. Third, the rigorous optimal technique provides a base for future work as well as the possibility of generating general theorems relating to memory structures.

Having stated the general goals and direction of this work, we now turn our attention to the specific goals of the optimal memory technique which has been developed to date. We will also discuss the specific restrictions which limit the present techniques, as well as what direction has been and will be taken in removing or reducing some of these restrictions.

3.3.2 Design Method and Restrictions

Progress has been made along two branches. The first is the development of improved mathematical techniques. The second is the implementation of those techniques on the computer. As would be expected, the development of a mathematical technique precedes its implementation by a considerable period of time.

At the beginning of this past year, the mathematical limitations of the method which had been developed required that the size of the system, i.e., the total number of storage bits, be stated. The method also required a description of how storage will be used. This took the form of the expected number of times a given unit of storage (a unit of storage may be a bit, word, page, etc.) is accessed, divided by the number of times that the entire system is accessed. The method also required a statement of the cost and specifications of all devices which might be used to construct the storage system. From this information a set of configurations could be

generated which represented the optimal storage systems having average access times which vary over a predetermined range.

During the past year all the capabilities of the earlier methods were realized in software. Figure 3-14 shows part of the results obtained from a single run of the current software. In this case the total system capacity is fixed at four million bits, and the devices used to build the system were fictitious. This computer-generated graph shows the optimal storage systems generated, with the system cost in dollars plotted versus the average access time of the system. Each asterisk in this figure represents a complete storage system composed of from a few to many devices with varying characteristics. These systems were selected as optimal by taking into account the various factors discussed above for a specified case.

There are several interesting facts to be discussed concerning this plot. Because of the discrete nature of the components used, the resulting optimal systems are discrete. The large gaps between some optimal systems are caused by large gaps in device performance; the average access time of a typical core memory is about 10,000 times less than that of a typical drum. There are 34 systems plotted, and there are only 34 optimal systems in the range considered. This implies that between systems A and B of Figure 3-14 no system exists with a cost less than that of system A. Notice that it is simple to build systems above and to the right of the plot. That is, a system with a capacity of four million bits, a 0.1 sec. average access time, and a cost of ten million dollars is easy to design.

[Figure 3-14: system cost in dollars vs. average access time of the optimal storage systems, with selected systems labeled A through F]

Conversely, it is impossible to design systems below and to the left of the plot of optimal systems shown. The general shape of this curve is also important and can be utilized in the design considerations of the overall computing system. Notice that between systems C and D there is very little increase in cost and a correspondingly large increase in performance. Between systems E and F there is a relatively small performance gain corresponding to a large cost increase.

During the past year the storage system optimization procedure has progressed, becoming extremely general. Currently, in principle, there is no limit to the detail which may be introduced in the description of how the system is to be used. Correspondingly, there is no limit to the detail with which prospective components may be described. There is also a wide range of possible relationships between components selected and system performance which may be used. In order to realize this new flexibility in the software implementation of this procedure, great care must be exercised to isolate the basic optimization procedure, which does not change, from the software which deals with the details of a specific problem. This partitioning of software (into subroutines) will have two effects:

1. It maintains the new flexibility of the technique in the software implementation.
2. It results in a more logical and manageable software organization, which will be easier to maintain.

During the past several months numerous changes have been made to the internal structure of the programs in order to effect this partitioning of the software. These changes have not altered the operation of the implemented algorithms in any way, but they have increased the flexibility and maintainability of the package. Other changes being made in the software are of a different nature and involve the basic process of optimization. Among these are several small modifications which provide increased performance and improved error checking. One large modification is under way which will result in a major change in performance. Currently the dynamic programming process must take rather coarse increments in capacity during its operation. This results in an error which is almost impossible to determine. The major change now being effected will allow the increment size to be reduced by a factor of about 10^6. This major change will be the result of approximately four relatively small changes and one major addition. The small changes may be made without changing the basic operation of the system and serve as preparation for the major addition. The modifications discussed above have been in progress and will remain in progress for some time to come. It is expected that they all will be completed by the end of 1969. Both the old and the new approaches produce the same result. That is, the least expensive storage system having an average access

time equal to or less than some fixed time, or the storage system which gives the smallest average access time for a cost equal to or less than some fixed cost. In either case, the storage system will be completely specified. The most important change is increased flexibility and increased accuracy. One of the primary restrictions remaining concerns the problem of describing how storage will be used. This concerns the difficulties involved in predicting the usage of a future system. There is a straightforward problem of not being able to closely predict the future. A second and far more complex problem, however, arises from the fact that the way in which a storage system is used is often a function of its design. Therefore we may face the problem of designing a storage system for an existing computing system whose storage usage characteristics are not changing with time, and yet the introduction of a new storage system will cause those characteristics to change. One may be able to predict the changes in usage which will occur due to the introduction of a new storage system, but this will not be simple. For many situations we may be able to reason that any changes in usage which take place due to the introduction of a new storage system, which is optimal for present usage, will result in further improvement of the overall system economy. An interesting situation arises when a computing system is performing some well-defined service for which there are several different implementation methods. Very often the specific method used is dictated

by the storage system design. It would be possible under these conditions to compare the methods when each is using an optimal storage structure. In this way we may select an optimal method and an optimal storage system. An example of this type of problem is found in large-scale information retrieval problems. A second major restriction is embedded in the definition of average access time which is being used. In the older technique it was assumed that, regardless of the number of storage devices, there was no overlap in their usage; that is, each storage request is handled sequentially. The new techniques allow much greater flexibility in this area. Another limitation which the older method encountered is accuracy versus computer time. This problem has been all but eliminated in theory and only awaits implementation. Accompanying the effort to improve the basic theory and its associated software has been an effort to bring this technique to bear on real design problems. As a result a data base of components has been structured and will be added to in the coming months. Also, considerable effort has been expended in making the results produced by the computer as useful as possible to the user. This is being done by presenting the results in graphical form. During the coming year a complete report on the most advanced technique will be written.
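Under the older technique's non-overlap assumption, the system's average access time reduces to a request-weighted mean of the device access times. The sketch below illustrates this definition; the device mix and timings are invented, not taken from the report's component data base.

```python
# Sketch of the non-overlapped average-access-time definition: every
# request is served sequentially, so the system average is simply the
# request-weighted mean of the device access times.  Fractions and
# times below are invented for illustration.

def avg_access_time(devices):
    """devices: list of (fraction_of_requests, access_time_sec)."""
    assert abs(sum(f for f, _ in devices) - 1.0) < 1e-9
    return sum(f * t for f, t in devices)

hierarchy = [(0.90, 1e-6),   # core-like device: 90% of requests, 1 usec
             (0.09, 1e-2),   # drum-like device: 9%, 10 msec
             (0.01, 1e-1)]   # disk-like device: 1%, 100 msec
print(avg_access_time(hierarchy))
```

With any overlap between devices the true figure would be smaller, which is why the newer techniques allow more flexibility here.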

REFERENCES, CHAPTER 3
1. Adage, Inc., System Reference Manual: Adage Graphics Terminal, Boston, 1968.
2. Adams Associates, The Computer Display Review, Bedford, Massachusetts, 1968.
3. Davis, M. R. and Ellis, T. O., "The RAND Tablet: A Man-Machine Graphical Communication Device," Proc. FJCC, Spartan Books, Washington, D.C., 1964, pp. 325-331.
4. Dawson, D. F. et al., "Computer-Aided Design of Electronic Circuits: A User's Viewpoint," Proc. IEEE, 55, 11 (November 1967), pp. 1946-1954.
5. Dertouzos, Michael L., "An Introduction to On-Line Circuit Design," Proc. IEEE, 55, 11 (November 1967), pp. 1961-1971.
6. Doll, Dixon R., "The Efficient Utilization of Resources in Centralized Computer-Communication Network Design," SEL Technical Report No. 36, Systems Engineering Laboratory, The University of Michigan, April 1969.
7. Esau, L. R. and Williams, K. C., "On Teleprocessing System Design, Part II: A Method for Approximating the Optimal Network," IBM Systems Journal, 5, 3 (1966).
8. Foley, James D., "Optimal Design of Computer Driven Display Systems," SEL Technical Report No. 34, Systems Engineering Laboratory, The University of Michigan, March 1969.

9. Koford, J. S., et al., "Using a Graphic Data-Processing System to Design Artwork for Manufacturing Hybrid Integrated Circuits," Proc. FJCC, Spartan Books, Washington, D.C., 1966, pp. 229-246.
10. Konkle, K. H., "An Analog Comparator as a Pseudo-Light Pen for Computer Displays," IEEE Trans. on Electr. Comp., 17, 1 (January 1968), pp. 54-55.
11. Kruskal, J. B., Jr., "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem," Proc. Am. Math. Soc., 7 (1956), pp. 48-50.
12. Lewin, Morton H., "An Introduction to Computer Graphic Terminals," Proc. IEEE, 55, 9 (September 1967), pp. 1544-1552.
13. Prince, David M., "Man-Computer Graphics for Computer-Aided Design," Proc. IEEE, 54, 12 (December 1966), pp. 1698-1708.
14. So, Hing C., "OCLA: An On-Line Circuit Analysis System," Proc. IEEE, 55, 11 (November 1967), pp. 1954-1961.
15. Spitalny, Arnold and Goldberg, M. J., "On-Line Graphics Applied to Layout Design of Integrated Circuits," Proc. IEEE, 55, 11 (November 1967), pp. 1982-1988.
16. Stotz, Robert, "Man-Machine Console Facilities for Computer-Aided Design," Proc. SJCC, Spartan Books, Washington, D.C., 1963.

17. Temes, Gabor C. and Calahan, D. A., "Computer-Aided Network Optimization: The State-of-the-Art," Proc. IEEE, 55, 11 (November 1967), pp. 1832-1863.
18. Wallace, Victor L. and Rosenberg, R. S., RQA-1, The Recursive Queue Analyzer, SEL Technical Report No. 2, The University of Michigan, Ann Arbor, Michigan, 1966.

4. MEASUREMENTS
It is always desirable to understand how the various components of a computer system are utilized. Such knowledge can be helpful when used to analyze and modify existing systems, or to synthesize new systems. Existing systems often need modification to decrease response time or to improve the utilization of system resources. New systems need to be designed to give high levels of performance from the start of their operation. The following sections describe two different types of data collected from the Michigan Terminal System (MTS), a time-sharing system running on an IBM 360/67.
4.1 GENERAL MTS DATA
This section describes the general data collected from MTS [4] using a data acquisition system [3] built into the MTS operating system. The information has been taken from a more detailed report on Virtual Storage Computer Systems [2].
4.1.1 General Description
The data acquired for this study was taken during the period October 15, 1967, to March 31, 1968, in normal operating periods of the MTS system. Data collection periods ranged from 15 minutes to 7 hours in duration, depending on the volume of data being generated and the nature of the jobs being observed. The periods were selected insofar as

possible to mirror the typical prevailing demand on the system: unusual circumstances of light load (such as just after system startup or malfunction) and of heavy load (such as the hours preceding a student problem due date) were avoided. Essentially three types of jobs are run in the MTS system:
1) Normal remote-terminal, interactive use of MTS with Model 33/35 Teletypewriters and IBM 1050 and 2741 Communications Terminals. (During the data acquisition period the number of such tasks which could be supported concurrently by the system grew from about 4 to 40.)
2) Non-interactive use of MTS via batch mode, using an IBM 2540 card reader/punch and a 1403 line printer as input/output devices, while providing full use of the command language and other system features.
3) Peripheral support programs for an IBM 7090 batch-processing system, which produce input tapes from punched cards and print and punch output tapes.
During normal daily operation of the MTS system, from 9 a.m. to midnight, the then-current maximum number of communications lines for remote terminals was enabled, one or two batch streams were processed, and up to two additional line printers and reader/punches were used for peripheral support jobs. Generally, fewer than half of the remote terminals were active at one time, one batch stream was rather consistently busy,

and an average of two to three of the SPOOLing¹ operations were in progress. Data are not included here for the SPOOLing jobs, since they exhibit a very regular behavior: each I/O wait for a tape or unit record device requires 50-200 msec., and is separated from the next such event by 2 to 3 msec. of processing. These peripheral support programs create a maximum of about 15% of the CPU load, and normally only 5-10%. Data given here for the use of the CPU and I/O devices are separated for batch and conversational tasks. The I/O delays are also shown separately for different kinds of devices. A great deal of information is implicit in the data being collected which is not shown here, and additional kinds of data can be collected to answer specific questions. At least three distinct points of view could be taken to govern the organization and collection of computing system data: workload, system, or user. For the purposes of this investigation we have regarded their relative importance in the given order. For example, in displaying CPU data, we are more interested in the CPU intervals requested by a task than in the service actually supplied by the system. Again, when considering I/O delays, we are concerned about the length of time the system must wait before it can resume processing a task rather than the time an individual must wait to receive his answer. Thus in our case some output delays appear to take almost no time, because the lines are buffered and the computation can proceed while the line is still being typed.
¹Simultaneous Peripheral Operations On-Line
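The SPOOLing figures quoted above imply the CPU fraction directly: with 2-3 msec. of processing between I/O waits of 50-200 msec., each job's CPU fraction is roughly processing time divided by the full cycle. A quick check of this arithmetic, using the endpoints of the quoted ranges:

```python
# Quick check of the SPOOLing CPU-load figures quoted above: each job
# alternates 2-3 msec of processing with 50-200 msec of I/O wait, so
# its CPU fraction is roughly processing / (processing + wait).

def cpu_fraction(proc_msec, wait_msec):
    return proc_msec / (proc_msec + wait_msec)

low = cpu_fraction(2.0, 200.0)    # short bursts, long waits
high = cpu_fraction(3.0, 50.0)    # long bursts, short waits
print(low, high)                  # per-job CPU fraction
print(3 * low, 3 * high)          # two to three concurrent SPOOLing jobs
```

Three concurrent jobs give roughly 3% to 17% of the CPU, which is consistent with the "normally 5-10%, maximum about 15%" figures in the text.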

4.1.2 Specific Characteristics
Some facts about the MTS data chosen for display in this section are given in Figure 4-1. The number of jobs is the approximate number of University of Michigan Multi-Programming System (UMMPS) jobs using MTS that were observed with the data collection facility. The number of tasks is the approximate number of individuals who used these jobs during the collection interval. The total number of unit record devices, communication lines, disks, etc., that were referenced by these jobs is also given. The identification numbers of these different sets of data will be used to label graphs in the succeeding figures. Several general observations can be made about the environments in which the data was taken. MTS # 4 was obtained after only a few weeks of experience with paging in MTS. At that time the system had a communications line capacity of about 10: at most 10 conversational users could run in addition to the batch and SPOOLing operations. Furthermore, use averaged considerably less than the capacity. The data sets # 2 and # 3 include only data for the batch streams active at the time. Because they use unit record devices instead of remote terminals, batch jobs are processed much more quickly and create a heavier system load than conversational jobs. More I/O operations generally occur in batch, since people take advantage of its lower cost and speed for input and output functions. The data # 1 was taken with 10-20 conversational users, which approximated an average load at that stage of MTS development. With less than

General Characteristics of the MTS Data

ID       | MTS # 0 | MTS # 1 | MTS # 2* | MTS # 3* | MTS # 4
Date     | 3-20-68 | 3-18-68 | 3-18-68  | 1-15-68  | 11-29-67
Time     | 15:27   | 15:36   | 11:41    | 14:51    | 10:23
Duration | 20 min. | 76 min. | 150 min. | 79 min.  | 266 min.
# Items  | 542246  | 1355274 | 123503   | 466223   | 705890
# Jobs   | 40      | 40      | 2        | 1        | 9
# Tasks  | 50      | 150     | 75       | 16       | 84
Devices  | 57      | 65      | 29       | 20       | 29

* Denotes batch job data.

Figure 4-1. MTS Data

about 16-20 users, the paging drum frequently had periods of inactivity: most command chains for I/O operations were half-empty, and there was often no channel program in progress at all. In order to observe tasks under a heavier system load, data set # 0 was taken while a special background job (called PAGE-IT) was running to increase drum activity. The PAGE-IT job acquires a large number of virtual storage pages and references them cyclically in rapid succession. When run with a moderate load of normal tasks, PAGE-IT keeps drum channel programs running more or less continuously and forces other tasks' pages out of main storage at a faster than normal rate. Figure 4-2 shows the distribution of actual CPU intervals obtained by tasks during the data collection periods. Any interruption in service to process another task ends the CPU intervals appearing in this distribution. One to four percent of these intervals lie in the neighborhood of about 300 microseconds, or about the minimum time required by UMMPS to service an interrupt which requires little or no processing. The smoothest curve, and the one with the smallest mean, is that of data set # 0. The reason for this is that the system was the most occupied with ordinary tasks at that time, and PAGE-IT was running to force additional page-wait interruptions for the normal tasks. Since # 0 was the latest of the given data to be taken, it also reflects some improvements made to the system which make it operate more efficiently under periods of heavy load.

[Figure 4-2. Cumulative distributions of actual CPU intervals obtained by tasks, data sets # 0 - # 4.]
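The cumulative-fraction curves plotted in Figure 4-2 and the following figures are empirical distribution functions tabulated from the observed interval lengths. A minimal sketch of that tabulation follows; the sample intervals are invented, not the MTS measurements.

```python
# Sketch: tabulating a cumulative distribution ("fraction of entries"
# versus interval length) of the kind plotted in Figures 4-2 and 4-3.
# The sample intervals below are invented for illustration.

def empirical_cdf(samples, points):
    """Fraction of observed samples <= each plotting point."""
    n = len(samples)
    return [sum(1 for x in samples if x <= p) / n for p in points]

intervals_msec = [0.3, 0.3, 1.2, 4.0, 4.1, 6.5, 12.0, 30.0]
print(empirical_cdf(intervals_msec, [1.0, 5.0, 35.0]))
```

Curves with a smaller mean rise toward 1.0 sooner, which is how the effect of the heavier paging load on data set # 0 shows up in the figures.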

Actual CPU intervals depend on the frequency of interrupts, hence it is not surprising that the data set (# 2) with the longest mean value was collected over a noon hour. Except for that case, the mean length of a CPU interval decreases steadily with time: average system load grew significantly over the period of study. Data set # 1, whose curve shows a large number of intervals of length about 4 msec., was taken during the testing of a new I/O routine when an error occurred, and over 28,000 I/O operations were executed in rapid succession, generating an equal number of short CPU intervals. In Figure 4-3 we have distributions of CPU intervals requested by individual tasks: here the interruptions in service to process other tasks are removed from the data, so that the given times represent CPU intervals terminated only by I/O and paging operations for the task using the CPU. The curves for # 3 and # 4 also accumulate time across paging delays for the given task; they show the distributions of CPU time requested between I/O operations. We note here that both of the latter curves exhibit a gap in observed values near the origin: there are a few nearly immediate I/O operations, but then almost none of duration 800 to over 2000 microseconds. The difference between the curves for # 3 and # 4 is probably due to the fact that the former data is for batch jobs, which generally do more I/O operations. Turning to the curves for # 0 and # 2 in Figure 4-3, which represent CPU intervals terminated by either page or I/O wait for the given task, we see again that a heavy paging load (# 0) significantly reduces the mean length

[Figure 4-3. Cumulative distributions of CPU intervals requested by individual tasks.]

of a CPU interval. There is too much variation in the origin and makeup of batch jobs to draw hard and fast conclusions from differences between the curves for # 1 and # 2, which represent conversational and batch jobs run on the same day. Batch runs include a number of distinctly different tasks: small student problems limited to batch mode, use of faster devices for listing and card reading jobs, and long computational tasks. The difference between the two curves is probably more influenced by the fact that the batch data was taken earlier in the day, when a lighter overall load contributed an average of fewer page-wait interruptions. The density functions for requested CPU intervals exhibit a number of local maxima in the range 0-6 msec., which are present in almost every case and more noticeable with less paging load. A careful analysis could probably associate these peaks with one or more frequently used system functions. Figure 4-4 displays the distributions of page-wait delays experienced with the MTS paging drum processor. The mean values of these distributions strictly increase with increasing system load, which is the reverse of the order in which the data sets are numbered. Although the given data all represent the time required to obtain a requested page, each curve is a composite of several different kinds of distributions. Once an unavailable page is referenced, one of five actions may be taken:
1) It is a new page for which space can be allocated immediately in core (1-5 msec.).
2) It is a new page for which core space can be allocated only after

[Figure 4-4. Cumulative distributions of page-wait delays for the MTS paging drum processor.]

another page has been pushed out (a written page may be posted at any time).
3) An existing page must be read from the drum (at least 36 msec.).
4) An existing page must be read from the drum but must wait for a write to provide core space.
5) The page may still exist in core even though a write operation is currently in progress, which may be cancelled to make it available immediately.
If a drum operation is required (cases 2 to 4 above) then a further distinction is possible: whether or not the drum is currently under the control of a channel program to which the new request(s) can be chained. If so, a distribution of actual completion time can be found [2]. If the drum is not currently transmitting, an additional average delay of half a physical revolution is experienced, since a new channel program is always started at the drum index point. In the latter case, however, it is unlikely that more than a single revolution will be necessary to reach and transmit the desired page. One final fact is of importance in understanding the distributions of Figure 4-4: MTS page transfers were posted (at the time) exactly once at the end of each logical revolution of the drum. Thus every actual read or write operation is known to be completed only after some multiple of the logical revolution time (35 msec.), plus any delay in synchronizing with the construction of channel programs.
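The posting rule just described quantizes observed page-wait delays: whatever the physical completion time, the transfer is only posted at the end of a logical revolution. The sketch below illustrates that quantization under the stated 35 msec. logical revolution time; the physical transfer times are invented, and synchronization delay is ignored.

```python
import math

# Sketch of the page-wait quantization described above: a transfer that
# physically completes at time t is only posted at the end of the next
# logical drum revolution (35 msec here), so observed delays cluster at
# multiples of the revolution time.  Transfer times are invented.

LOGICAL_REV_MSEC = 35.0

def posted_delay(actual_msec):
    """Round the physical completion time up to the next posting instant."""
    revs = math.ceil(actual_msec / LOGICAL_REV_MSEC)
    return revs * LOGICAL_REV_MSEC

for t in (10.0, 36.0, 80.0):
    print(t, "->", posted_delay(t))
```

This is consistent with the "at least 36 msec." figure for a drum read above: even a fast transfer is not seen as complete until a revolution boundary.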

All the data in Figure 4-4 is dominated by the characteristics of start-stop rather than continuous drum operation except # 0, where the drum was forced to run more or less continuously. In that case a great many read operations took three logical revolutions or more. Apparently most available core pages were "reclaimed" or used for newly created pages, so that read requests for drum-resident pages often had to wait for drum writes as well as queueing and read delays. In any case the performance of the drum under the conditions of # 0 leaves a great deal to be desired. The overall distribution of I/O wait times given in Figure 4-5 is poorly shown due to a gap in the plotted points in the range from 0.2 to 2.5 seconds. Most delays fall into three ranges according to the type of device:
a) Terminal and other I/O to buffered devices, which appears to take almost no time at all.
b) Disk and unit record I/O, most of which lies in the range from 35 to 70 msec.
c) Terminal output with the buffer full, and unbuffered input, which usually takes over half a second.
The curve for MTS # 1 is missing from Figure 4-5 because of the large number of identical I/O waits occurring in the test mentioned earlier. Batch and conversational jobs have distinct distributions: in the former case the longest normal I/O operation is a maximum disk seek, which requires about half a second. The ready distributions of Figure 4-6 show how the MTS system

[Figure 4-5. Overall cumulative distribution of I/O wait times.]

[Figure 4-6. Cumulative distributions of ready times (system response to requests for service).]

responds to requests for service. This data is rather consistent. The fact that # 0 has the shortest mean is again due primarily to the fact that the heavy paging load forces many more transitions between tasks entering page-wait. The remaining figures in this section give disk and terminal characteristics. The disk observed during this period was the IBM 2314 Disk Storage Unit, which provides a bank of eight separate packs on a single control unit and channel. The disk is used in MTS for line file storage and for utility files for compilations and assemblies. In Figure 4-7 we see the actual lengths of disk I/O operations. The principal components are:
a) Control unit wait (transmission of data or commands from another disk pack on the unit).
b) Seek time to move the read/write heads to the proper cylinder (which is often not required).
c) Rotational delay to reach the front of the appropriate record on the disk track.
d) Transfer time for actual reading or writing of the record.
The distribution with the longest mean is that for the data taken under the heaviest load. Under these conditions we expect control unit wait to be a significant factor, since at the time up to six of the packs could be in competition for the use of the single control unit. Another very large factor is the frequency with which seeks are necessary: a single task will often require several records from the same cylinder in short succession, hence

[Figure 4-7. Cumulative distributions of the actual lengths of disk I/O operations.]

few seeks are necessary unless the load is such that a task can obtain only one record at a time before another task causes a seek to a new cylinder. The fact that the earliest data has the second longest mean value is due in part to the fact that less efficient file routines were in use at the time, which required more long search operations that tie up the control unit. Another cause is the fact that only four disk packs were available. Similar remarks can be made for Figure 4-8, which shows the distributions of times that tasks had to wait in queue for the use of a specific disk pack. The times in Figure 4-8 are somewhat inflated because the end of a wait for a pack is non-interrupting, and hence it is not discovered to be over until the task reaches the head of the CPU queue. Finally, we display in Figure 4-9 a few distributions of terminal I/O times. Two distributions each are given for specific Teletype lines from the three sets of conversational data. Although this data exhibits considerably more variation than the synchronous intervals, we can see the effects of the following characteristics of terminal I/O operation: over half the times are for output lines, which clearly predominate. Many of these lines can be placed immediately in the one-line output buffer, so that most output times are less than the time required to actually print a line at the remote device. Shorter times for the input lines of the MTS # 0 data suggest that as system response time moves away from practically instantaneous, an individual can make use of the time to formulate his next command, which he then gives more quickly.
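The four components listed for Figure 4-7 add sequentially for a single disk operation, which is why contention for the control unit and extra seeks lengthen the observed distribution so directly. A sketch of that decomposition follows; the component timings are invented, chosen only to be roughly plausible for a 2314-class device.

```python
# Sketch: a single disk I/O time as the sum of the four components
# listed above for Figure 4-7.  The timings are invented illustrations;
# seeks are often unnecessary and then contribute zero.

def disk_io_msec(control_wait, seek, rotation, transfer):
    return control_wait + seek + rotation + transfer

light_load = disk_io_msec(0.0, 0.0, 12.5, 5.0)    # no seek, idle control unit
heavy_load = disk_io_msec(20.0, 60.0, 12.5, 5.0)  # contention plus a long seek
print(light_load, heavy_load)
```

The spread between the two cases mirrors the text's observation that the longest-mean distribution belongs to the data taken under the heaviest load.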

[Figure 4-8. Cumulative distributions of times spent waiting in queue for a specific disk pack.]

[Figure 4-9. Cumulative distributions of terminal I/O times for selected Teletype lines.]

4.1.3 GPSS/360 Simulation Model
A simulation model of MTS was written to study the effects of hardware changes on system performance. Input to the simulation is the data discussed in the preceding section.
4.1.3.1 GPSS/360
The General Purpose Simulation System/360 is a discrete, digital simulation program developed by the IBM Corporation [1] which runs in the standard IBM Operating System/360. Simulation model statements, after simple processing by an assembly program, are interpreted during execution by the GPSS/360 program. GPSS/360 is the outgrowth of a series of general purpose system simulators for the IBM 7000-series computing machines, the most recent version of which is titled GPSS-III. GPSS/360 relaxes many GPSS-III restrictions, introduces new entities and block types, allows much more control over storage allocation, and provides a limited graphic output feature. The GPSS language is particularly suited for modeling such commercial applications as job shop scheduling and manufacturing network flow. However, it is also useful for computer system simulation at the fairly gross level of detail which is used here. Its primary disadvantages are that it is comparatively difficult to learn and relatively slow in execution. Its generality and flexibility were considered to outweigh those disadvantages for this particular application.

4.1.3.2 Organization of the Model
The simulation model described here consists of about five essentially different parts:
1) hardware descriptions
2) operating system algorithms
3) workload characteristics
4) variable system parameters
5) statistics gathering.
The first of these is the simplest to implement in a simulation language, and requires a negligible fraction of the set of model statements. Characterizing the data in the system (in this case requests for computing services) and specifying the parameters by which the system is "tuned" to improve its performance require about the same amount of effort, and together account for only a fifth or so of the total description. By far the most design and analysis work, and the most model statements, are required to describe the algorithms used by the operating system for handling devices and service requests, and to decide what information about the model operation is relevant and measurable, and how it should be collected and presented. Each of these two aspects of this particular model required over a third of the total simulation effort. The following sections will discuss each of the basic aspects of the model in terms of its content, overall structure, and the GPSS statements essential to this implementation.

4.1.3.3 Model Data: the Workload Description
An approximate description of each MTS interaction is given by the parameters of a single transaction (XACT). The size of the executing program, the time intervals required for execution and synchronous I/O waits, and the frequency of requirements for additional virtual memory pages during execution are all described by XACT parameters. The master XACT representing each MTS interaction is passed through the job scheduling part of the model. Transactions are also used elsewhere to stand for a single memory page as it passes through the drum I/O channel and for an entry on the queue for use of the CPU, and to implement the data transfer during the drum rotations. Virtual memory page XACTs required by a task are members of a GPSS assembly set, and a specified number of them must be ASSEMBLEd before a task is ready for execution. These XACTs are SPLIT from others as needed, and TERMINATEd when they are no longer required to trigger additional events by their flow through the model.
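The SPLIT/ASSEMBLE pattern described above can be sketched outside GPSS as a simple counter: page transactions split off from a task are counted back in, and the task becomes ready only when the required number have arrived. The class name and counts below are invented for illustration.

```python
# Sketch of the GPSS assembly-set behavior described above: a task's
# page XACTs are SPLIT off, and the task is ready for execution only
# once the required number have been ASSEMBLEd.  Names and counts
# are invented placeholders, not GPSS itself.

class AssemblySet:
    def __init__(self, needed):
        self.needed = needed    # pages that must arrive before execution
        self.arrived = 0

    def page_arrived(self):
        """Record one page transaction; return True when the task is ready."""
        self.arrived += 1
        return self.arrived >= self.needed

task = AssemblySet(needed=3)
print([task.page_arrived() for _ in range(3)])   # [False, False, True]
```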

4.1.3.4 Hardware Descriptions
The only parts of this model which directly represent physical devices occur in the choice of the time unit as the drum sector rotational delay, and in the model parameter which specifies the capacity of core storage. Hardware/software combinations, such as the operation of the drum as an extension of main memory and the supervisor algorithm which drives it, are also modeled. As each page transfer request arrives at the drum processor, it is LINKed to one of the GPSS "user chains," which are used to represent drum sector queues. Once every time unit a special "clock" XACT passes through an UNLINK block which attempts to remove a queue entry from the sector currently at the read/write heads. Any XACT so UNLINKed is next ADVANCEd for one time unit to represent the transfer time, and then either sent to be ASSEMBLEd for CPU service (if a page-in request) or simply TERMINATEd to remove it from the model (if a page-out request).
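The sector-queue mechanism just described can be sketched without GPSS: one FIFO queue per drum sector, and each time unit the sector passing under the heads may serve at most one queued transfer. The queue contents below are invented request identifiers.

```python
from collections import deque

# Sketch of the drum-sector queues modeled with GPSS user chains: one
# FIFO queue per sector; each time unit the rotating heads pass one
# sector and serve at most one queued transfer from it.  Request ids
# and queue sizes are invented.

def run_drum(sector_queues, ticks):
    """Return the order in which queued requests complete."""
    n = len(sector_queues)
    done = []
    for t in range(ticks):
        q = sector_queues[t % n]    # sector now under the read/write heads
        if q:
            done.append(q.popleft())
    return done

queues = [deque(["a1", "a2"]), deque([]), deque(["c1"])]
print(run_drum(queues, 6))   # rotation order: a1, c1, a2
```

Note how a second request on the same sector ("a2") must wait a full revolution, which is the behavior the per-sector user chains capture.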

is interrupted. 4.1.3.6 System Parameters Decision-making algorithms for operating systems are normally designed to use a number of parameters-numerical values which can be changed as frequently as desired to improve performance as more is learned about how the system behaves, as the hardware configuration changes, and as the workload develops different patterns of demand for system resources. Examples of such parameters existing in this model are a) the maximum amount of real memory available to a single task at one time b) the maximum length of time a task may continuously use the CPU c) the maximum number of real memory page requests allowed to enqueue in the system when core is full d) the number of real memory pages required to be free before a new task is allowed to compete for the CPU. These values are stored in GPSS SAVEVALUE locations. They are initialized for each run, and can be changed during the simulated operation if necessary. 4.1. 3. 7 Simulation Results This section gives the data obtained from GPSS/360 simulations made with the model described in the previous sections. A number of simulation 129

runs were made using a pair of JOBTAPEs of task transactions from the MTS data. These tapes were obtained by abstracting MTS data sets #0 and #1 during the analysis of that data for the presentation in Section 4.1.2. Each simulation run begins at the front of one of these two tapes and uses as many transactions as necessary to simulate system operation for a specified amount of elapsed time. For the initial runs, both JOBTAPEs were run with a pair of "extremal" choices of the model parameters:

a) a Basic configuration of hardware and programming techniques, using:
   80 core pages
   a 9-sector drum
   16 time unit time-slice
   read-before-write drum queue discipline
   drum writes to the shortest queue
   FIFO CPU queue discipline
   posting of pages once per revolution

b) an Advanced configuration, which differs from the Basic one in the following choices of parameter values:
   144 core pages
   large core storage instead of drum
   24 time unit time-slice
   priority in drum queues to large tasks

   immediate posting of pages.

The remaining runs were made using parameter choices differing from the Basic model in only one or two values, generally with values from the Advanced model substituted for the Basic ones. Each run begins with an initialization period, after which two or three sets of data are taken over intervals of 30 simulated seconds. Because the initialization intervals were taken to be rather long, their data (which is not included here) agrees quite closely with the values observed for the regular intervals. The Basic and Advanced models were each run for three intervals, and the remaining models for two. In several cases, however, runs were terminated after a somewhat shorter last interval because of a problem with controlling the input rate of task data from the JOBTAPEs. The data displayed in the figures of this section is therefore normalized by the length of the run segment. Figures 4-10 and 4-11 show the results of running the Basic model with the MTS #1 and MTS #0 JOBTAPEs, respectively. The same data for the Advanced model is given in Figures 4-12 and 4-13. Figure 4-14 lists the values obtained by using the Basic model except for the larger core size taken from the Advanced model. Similarly, Figure 4-15 gives the data for the Basic configuration altered only by immediate page posting. Subsequent figures show other variations. Only one run (described in Figure 4-18) was made with values deliberately chosen outside the techniques used in the Basic and Advanced models. In this case several poorer techniques

BASIC SIMULATION MODEL
MTS Data #1

PERFORMANCE                              I         II        III
  CPU Utilization                      .456       .436       .591
  Average queue contents               .601       .508       .767
  Maximum queue contents                  9          7          6
  Percent zero entries                 53.6       55.3       40.7
  Paging mechanism utilization         .499       .685       .672
  Average queue contents               12.5       22.1       19.4
  Maximum queue contents                 69         69         65
  Mean completion time                 25.1       32.0       28.9
  Tasks Completed/Second               10.5       12.8       16.4
  Mean No. of Active Tasks              8.8        7.1        9.2

WORKLOAD                                 I         II        III
  Task No. Range                    250-563    564-949   950-1319
  No. of CPU Intervals/Second          52.4       43.2       63.8
  Mean CPU Service Time                2.24       2.60       2.38
  Percent Initial Pages                86.4       89.7       82.3
  No. of Pages Written/Second          64.2       88.3       86.2
  Mean No. of I/O Operations           5.49       3.38       5.20
  Average I/O Time                     42.3       41.3       41.6

Figure 4-10  Basic Simulation Data
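The per-second rates in Figure 4-10 and the figures that follow come from normalizing each interval's raw event counts by the length of the run segment, as described above. A minimal sketch of that step (the raw counts below are illustrative, chosen to reproduce the interval-I rates of Figure 4-10 for a full 30-second segment):

```python
def normalize(counts, segment_length_s):
    """Turn raw per-interval event counts into per-second rates so
    that run segments of unequal length remain comparable."""
    return {name: n / segment_length_s for name, n in counts.items()}

# Illustrative raw counts for one 30-simulated-second interval
raw = {"tasks_completed": 315, "pages_written": 1926}
rates = normalize(raw, 30.0)
# rates == {"tasks_completed": 10.5, "pages_written": 64.2}
```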

BASIC SIMULATION MODEL
MTS Data #0

PERFORMANCE                              I         II        III
  CPU Utilization                      .253       .311       .189
  Average queue contents               .304       .261       .136
  Maximum queue contents                  8          6          5
  Percent zero entries                 37.6       51.0       55.3
  Paging mechanism utilization         .439       .401       .263
  Average queue contents               11.0       5.56       3.48
  Maximum queue contents                 71         47         48
  Mean completion time                 24.6       13.8       13.2
  Tasks Completed/Second               11.0        7.8        4.9
  Mean No. of Active Tasks              7.5        8.7        6.0

WORKLOAD                                 I         II        III
  Task No. Range                    252-581    582-815    816-901
  No. of CPU Intervals/Second          47.7       52.8       36.5
  Mean CPU Service Time                1.37       1.28       1.34
  Percent Initial Pages                51.3       34.2       34.9
  No. of Pages Written/Second          56.5       51.7       33.4
  Mean No. of I/O Operations           3.73       5.55       3.24
  Average I/O Time                    102.4       68.4       82.5

Figure 4-11  Basic Simulation Data

ADVANCED SIMULATION MODEL
MTS Data #1

PERFORMANCE                              I         II        III
  CPU Utilization                      .717       .621       .856
  Average queue contents               2.19       2.14       5.08
  Maximum queue contents                 16         20         16
  Percent zero entries                 28.9       34.9       15.2
  Paging mechanism utilization         .804       .779       .721
  Average queue contents               32.9       36.8       21.1
  Maximum queue contents               >100       >100         95
  Mean completion time                 40.2       47.2       29.3
  Tasks Completed/Second               13.7       13.8       16.2
  Mean No. of Active Tasks             14.5       12.9       18.3

WORKLOAD                                 I         II        III
  Task No. Range                    251-660   661-1076  1077-1483
  No. of CPU Intervals/Second          81.5       64.8      100.6
  Mean CPU Service Time                2.26       2.47       2.19
  Percent Initial Pages                78.2       76.1       74.6
  No. of Pages Written/Second         104.5       99.2       92.3
  Mean No. of I/O Operations           6.69       3.98       7.60
  Average I/O Time                     38.1       37.5       32.0

Figure 4-12  Advanced Simulation Data

ADVANCED SIMULATION MODEL
MTS Data #0

PERFORMANCE                              I         II        III
  CPU Utilization                      .642       .630       .555
  Average queue contents               1.55       1.79       1.74
  Maximum queue contents                 16         17         17
  Percent zero entries                 36.3       37.7       40.2
  Paging mechanism utilization         .900       .881       .957
  Average queue contents               25.4       29.4       45.2
  Maximum queue contents                 96        100       >100
  Mean completion time                 28.0       33.2       47.2
  Tasks Completed/Second               21.1       16.9       20.5
  Mean No. of Active Tasks             18.5       15.7       17.9

WORKLOAD                                 I         II        III
  Task No. Range                    251-883   884-1391  1392-2007
  No. of CPU Intervals/Second         114.7      100.5       88.7
  Mean CPU Service Time                1.44       1.61       1.61
  Percent Initial Pages                46.6       48.4       60.3
  No. of Pages Written/Second         116.2      112.5      123.1
  Mean No. of I/O Operations           7.32       3.69       2.48
  Average I/O Time                     59.2       37.0       32.9

Figure 4-13  Advanced Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using 144 core pages)
MTS Data #1

PERFORMANCE                              I         II
  CPU Utilization                      .601       .707
  Average queue contents               1.78       4.08
  Maximum queue contents                 15         17
  Percent zero entries                 38.0       21.8
  Paging mechanism utilization         .854       .856
  Average queue contents               49.3       56.2
  Maximum queue contents               >100       >100
  Mean completion time                 56.8       65.5
  Tasks Completed/Second               17.6       20.0
  Mean No. of Active Tasks             12.8       13.3

WORKLOAD                                 I         II
  Task No. Range                    250-776   777-1378
  No. of CPU Intervals/Second          68.1       72.2
  Mean CPU Service Time                2.27       2.52
  Percent Initial Pages                91.0       89.2
  No. of Pages Written/Second         110.3      110.2
  Mean No. of I/O Operations           6.78       4.79
  Average I/O Time                     42.9       30.5

Figure 4-14  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using immediate page posting)
MTS Data #0

PERFORMANCE                              I         II
  CPU Utilization                      .484       .310
  Average queue contents               .250       .099
  Maximum queue contents                  6          5
  Percent zero entries                 73.9       82.8
  Paging mechanism utilization         .705       .387
  Average queue contents               16.6       5.96
  Maximum queue contents                 69         55
  Mean completion time                 23.5       15.4
  Tasks Completed/Second               14.0       6.11
  Mean No. of Active Tasks              7.7        7.6

WORKLOAD                                 I         II
  Task No. Range                    201-620    621-805
  No. of CPU Intervals/Second          98.3       60.1
  Mean CPU Service Time                1.44       1.33
  Percent Initial Pages                44.3       28.9
  No. of Pages Written/Second          90.7       50.0
  Mean No. of I/O Operations           4.41       5.05
  Average I/O Time                     51.9       69.3

Figure 4-15  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using LCS for paging mechanism)
MTS Data #0

PERFORMANCE                              I         II
  CPU Utilization                      .362       .202
  Average queue contents               .562       .332
  Maximum queue contents                 10         10
  Percent zero entries                 36.2       39.8
  Paging mechanism utilization         .598       .369
  Average queue contents               10.8       7.15
  Maximum queue contents                 59         62
  Mean completion time                 18.0       19.3
  Tasks Completed/Second               16.9        9.8
  Mean No. of Active Tasks             12.6        7.3

WORKLOAD                                 I         II
  Task No. Range                    251-758   759-1052
  No. of CPU Intervals/Second          72.7       37.6
  Mean CPU Service Time                1.28       1.39
  Percent Initial Pages                53.5       61.6
  No. of Pages Written/Second          77.0       47.5
  Mean No. of I/O Operations           7.42       3.24
  Average I/O Time                     94.5       85.2

Figure 4-16  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using preemptive CPU queue discipline)
MTS Data #1

PERFORMANCE                              I         II
  CPU Utilization                      .425       .407
  Average queue contents               .339       .347
  Maximum queue contents                  5          8
  Percent zero entries                 53.0       53.3
  Paging mechanism utilization         .513       .709
  Average queue contents               13.5       22.9
  Maximum queue contents                 68         74
  Mean completion time                 26.4       32.1
  Tasks Completed/Second               10.8       13.0
  Mean No. of Active Tasks              8.7        7.0

WORKLOAD                                 I         II
  Task No. Range                    250-573    574-963
  No. of CPU Intervals/Second          54.4       42.8
  Mean CPU Service Time                2.01       2.45
  Percent Initial Pages                85.4       89.3
  No. of Pages Written/Second          66.2       91.4
  Mean No. of I/O Operations           5.28       3.30
  Average I/O Time                     39.8       41.9

Figure 4-17  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using random drum writes and FIFO queues)
MTS Data #1

PERFORMANCE                              I         II
  CPU Utilization                      .410       .291
  Average queue contents               .455       .154
  Maximum queue contents                  9          5
  Percent zero entries                 59.7       65.9
  Paging mechanism utilization         .493       .515
  Average queue contents               13.2       15.2
  Maximum queue contents                 62         61
  Mean completion time                 26.7       29.4
  Tasks Completed/Second                9.7        8.9
  Mean No. of Active Tasks              9.1        7.3

WORKLOAD                                 I         II
  Task No. Range                    250-541    542-808
  No. of CPU Intervals/Second          52.3       37.8
  Mean CPU Service Time                2.02       1.98
  Percent Initial Pages                82.9       80.7
  No. of Pages Written/Second          63.3       66.5
  Mean No. of I/O Operations           4.88       3.24
  Average I/O Time                     39.3       51.4

Figure 4-18  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using queueing options and longer time slice)
MTS Data #0

PERFORMANCE                              I         II
  CPU Utilization                      .324       .238
  Average queue contents               .533       .180
  Maximum queue contents                 10          5
  Percent zero entries                 33.9       42.2
  Paging mechanism utilization         .478       .296
  Average queue contents               9.56       3.72
  Maximum queue contents                 67         39
  Mean completion time                 20.0       12.6
  Tasks Completed/Second               11.2        5.9
  Mean No. of Active Tasks              7.7        7.8

WORKLOAD                                 I         II
  Task No. Range                    251-587    588-765
  No. of CPU Intervals/Second          55.1       47.9
  Mean CPU Service Time                1.51       1.28
  Percent Initial Pages                47.6       31.3
  No. of Pages Written/Second          61.7       38.1
  Mean No. of I/O Operations           3.68       4.98
  Average I/O Time                     80.1       79.8

Figure 4-19  Modified Simulation Data

MODIFIED BASIC SIMULATION MODEL
(Using 4-sector drum and 20% unchanged pages)
MTS Data #0

PERFORMANCE                              I         II
  CPU Utilization                      .426       .350
  Average queue contents               .475       .323
  Maximum queue contents                  6          7
  Percent zero entries                 30.6       36.9
  Paging mechanism utilization         .574       .458
  Average queue contents               11.2       6.08
  Maximum queue contents                 70         57
  Mean completion time                 19.5       13.3
  Tasks Completed/Second               13.9        9.7
  Mean No. of Active Tasks              9.2        8.2

WORKLOAD                                 I         II
  Task No. Range                    253-668    669-959
  No. of CPU Intervals/Second          81.4       68.2
  Mean CPU Service Time                1.35       1.32
  Percent Initial Pages                63.4       62.2
  No. of Pages Written/Second          64.6       52.7
  Mean No. of I/O Operations           5.32       5.05
  Average I/O Time                     45.4       67.4

Figure 4-20  Modified Simulation Data

were used to gauge the sensitivity of paging drum processing to such changes. Figure 4-20, which was run with a 4-sector drum in the (otherwise) Basic configuration, also provides for a fraction of unchanged pages: 20% of the write requests were not executed in this case, on the assumption that the pages were unchanged since their last trip to the drum.

4.2 MTS DISPLAY CONSOLE DATA

In this section data taken on different display applications is presented. The purpose of this data is to compare the computational requirements of various display (and non-display) applications. The data has been collected for jobs running on the University of Michigan's 360/67 under the Michigan Terminal System. Embedded within MTS's executive system is an efficient data collection facility. Basically, a data item is recorded (on tape) whenever an event pertaining to a specified job occurs. Examples of such events are 1) the start of a CPU processing interval, 2) the end of a CPU processing interval, 3) the start and end of a page read-in or read-out, 4) the acquisition or release of virtual memory pages, and 5) I/O to terminals, printers, or tapes. A program has been written to analyze the data for any job and produce a series of probability distributions and summary data for the following quantities.

1) User think time. This begins when the computer system is ready to accept new input from the user, and ends when the input is completed with an end-of-line indication.

2) CPU time used during the think period.

3) Computer system response time. This begins at the completion of an input line, and ends when all output has been finished and the computer system is again ready to accept input.

4) Processing interval lengths during response periods. During a processing interval a job has exclusive use of the CPU, except for supervisor functions.

5) Number of processing intervals during a response period.

6) Number of characters in input lines.

7) Number of characters in output lines.

When the analysis program was originally conceived and implemented, some of its results were intended to be used as part of the application specification required by the display system model of Section 3.2. A serious deficiency in the data collection facility became evident, however, and frustrated this aim. The problem is that input-output information is gathered only at the MTS level. The display service routines for the IBM 2250 display console do not use the MTS I/O routines from the time a graphics application program has been loaded and started from the console until it has been terminated. To the data collection facility, therefore, running a graphics program appears as one long response time, despite

the many user interactions generated with the light pen and function buttons. As a result, it was impossible to gather many useful statistics for display applications. All that can be gleaned from the display application statistics is CPU utilization during response periods, and CPU utilization averaged over think and response periods. While this information will be shown to be useful, it is not what was anticipated. It is given in Table 4-1. The first application referred to in the table, Michigan's Own Mathematical System (MOMS), is used to manipulate and plot mathematical functions. All interaction is via the light pen. The second application, text editing, uses the light pen and keyboard to modify text displayed on the console. The 2250 display console could also be used in MTS as a teletype, with a screen in place of printer and paper. Table 4-2 shows pertinent data from this mode of operation. The first three applications consist of general program preparation, correction, and debugging. The last application, running the system program *TASKS, gives a listing of the jobs active in MTS. The data collection facility was also used to monitor all MTS users for about one hour. The statistics gathered from a random sampling of the monitored teletype terminals are given in Table 4-3. Finally, a small amount of data was collected from a remote display terminal using MTS. The system consists of a DEC 339 with

[Table 4-1: CPU utilization data for the 2250 graphics applications (MOMS and text editing); the tabulated entries are illegible in this copy.]

[Table 4-2: data for the 2250 display console used as a teletype (program preparation, correction, debugging, and *TASKS); the tabulated entries are illegible in this copy.]

Table 4-3  Data Gathered for Random Teletype Users

[Per-user CPU utilization and timing statistics for the sampled teletype users; the column headings and many entries are illegible in this copy.]

16,384 words of core storage, connected to the main computer via a 2000 bits per second data link. The application is queueing network analysis: the user draws a queueing network on the display, and can then see various probability distributions pertaining to the network. All graphics work is done by the display terminal; the main computer merely solves the network for the required results. The data is in Table 4-4. Only a very simple queueing network was analyzed. It can be expected that for more complicated and realistic models, the average CPU utilization of 7.3% would increase. Note that in this case "Think Time" does not refer to the user, but to the remote display terminal. Thus an average of 2.5 seconds after receiving a reply from the main computer, the remote computer sends a new request for service to the main computer. Table 4-5 compares and summarizes some of the more important statistics from the four modes of using MTS reported on in Tables 4-1 through 4-4. Several points should be noted. First, using a display in lieu of a teletype reduces response time by a factor of 4, with think time remaining nearly constant. This is a consequence of the fast rate of output attainable with the display console. The input rate (and consequently the think time) is unaffected because a keyboard is used in both cases. Second, the faster output rate results in higher CPU use by the terminal, because more computation is done in less time. This means that a display terminal user gets more done in unit time than does a teletype user.

Table 4-4  Data Gathered for Remote Display Terminal

Average Use of CPU by Display Terminal                          7.30%
Average Use of CPU by Display Terminal During Response Time    13.30%
Elapsed Time, Seconds                                          800.00
CPU Time, Seconds                                               58.50
Average Display Terminal "Think Time," Seconds                   2.50
Average Response Time, Seconds                                   2.96
Average Processing Interval Length, Microseconds                 6007
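The entries of Table 4-4 can be cross-checked against one another; a quick sketch of the arithmetic (all numbers copied from the table):

```python
elapsed_s, cpu_s = 800.00, 58.50      # elapsed and CPU time, Table 4-4
think_s, response_s = 2.50, 2.96      # per service cycle, Table 4-4

avg_cpu_use = cpu_s / elapsed_s                      # 0.0731 -> the 7.30% entry
cycles = elapsed_s / (think_s + response_s)          # about 147 think/response cycles
cpu_use_in_response = cpu_s / (cycles * response_s)  # about 0.135, near the 13.30% entry
```

The small residual between the computed response-period utilization and the tabulated 13.30% is consistent with the rounding of the table's two-decimal entries.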

Table 4-5  Comparison of Statistics

Application Class                   Average Use   Average User   Average
                                    of CPU        Think Time     Response Time

Teletype                               1.05%         22.8           24.40
2250 Display Used as Teletype          2.36%         23.0            6.40
2250 Display Used for Graphics         3.03%           *               *
Remote Display Used for Graphics       7.3%            *             2.96

* Not Applicable
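The factor-of-4 response-time improvement noted in the text can be read off Table 4-5 directly, and the same rows show the higher rate of useful computation per interaction. A sketch of that comparison (values copied from the table; the "CPU seconds per interaction" product is our reading of the table, not a tabulated quantity):

```python
# Rows of Table 4-5: (average CPU use, think time s, response time s)
teletype = (0.0105, 22.8, 24.40)
display_as_teletype = (0.0236, 23.0, 6.40)

speedup = teletype[2] / display_as_teletype[2]   # about 3.8, "a factor of 4"

cycle_teletype = teletype[1] + teletype[2]                # 47.2 s per interaction
cycle_display = display_as_teletype[1] + display_as_teletype[2]  # about 29.4 s

# CPU seconds obtained per interaction: higher for the display user
work_teletype = teletype[0] * cycle_teletype              # about 0.50 s of CPU
work_display = display_as_teletype[0] * cycle_display     # about 0.69 s of CPU
```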

                                       MEAN         SIGMA      SAMPLES
Think time                          2557643.0     1640002.0       145
Total CPU time used during
  think periods                       10393.4        3563.5       145
Response time                       2962184.0    10045178.0       144
CPU times during response periods     33670.6     1468383.0       144
CPU intervals during
  response periods                     6007.1        9900.4      9437
Number of CPU intervals during
  response periods                      65.55         258.0       144
Input message lengths                     5.8           1.8       145
Output message lengths                   91.3          29.7       144

Times are in microseconds; message lengths are in characters.

Figure 4-21  Summary Data for Remote Display Terminal
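The think-time and response-time bookkeeping behind Figure 4-21 and the distributions that follow can be sketched as a single pass over a time-stamped event stream. The event names here are illustrative, not the actual MTS record formats:

```python
def think_and_response_times(events):
    """events: ordered (time, kind) pairs, kind being "ready" (system
    ready to accept input) or "eol" (input completed with an end-of-line
    indication).  Think time runs from ready to eol; response time runs
    from eol to the next ready, when all output has been finished."""
    thinks, responses = [], []
    last_ready = last_eol = None
    for t, kind in events:
        if kind == "ready":
            if last_eol is not None:
                responses.append(t - last_eol)
            last_ready = t
        elif kind == "eol":
            if last_ready is not None:
                thinks.append(t - last_ready)
            last_eol = t
    return thinks, responses

# Two complete think/response cycles (times in seconds)
log = [(0.0, "ready"), (2.5, "eol"), (6.0, "ready"),
       (8.5, "eol"), (11.5, "ready")]
thinks, responses = think_and_response_times(log)
# thinks == [2.5, 2.5], responses == [3.5, 3.0]
```

The per-job distributions and the means and sigmas of Figure 4-21 are then ordinary sample statistics over the `thinks` and `responses` lists.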

[Figure 4-22  Think Time Distribution: cumulative distribution of think times, unit = 1/2 second.]

[Figure 4-23  Response Time Distribution: cumulative distribution of response times, unit = 100 milliseconds.]

[Figure 4-24  Response Time Distribution: cumulative distribution of response times, unit = 1 second.]

[Figure 4-25  Distribution of Total CPU Time per Response Period: cumulative distribution, unit = 10 milliseconds.]

[Figure 4-26  Distribution of Total CPU Time per Response Period: cumulative distribution, unit = 200 milliseconds.]

[Figure 4-27  Distribution of CPU Processing Interval Times During Response Periods: cumulative distribution, unit = 1 millisecond.]

[Figure 4-28  Distribution of Output Line Lengths: cumulative distribution of the number of characters in output lines.]

[Figure 4-29 Distribution of Number of CPU Intervals per Response Period: cumulative probability plot; horizontal axis labeled number of characters in input line]

teletype terminal user, and the display user should therefore have a shorter connect time. Third, the 2250 display used for graphics work takes more CPU processing than when it is used as a teletype. Last, the remote display terminal uses more CPU processing than any other application class. However, this final point, being based on just one data set, must at best be regarded as tentative. Also, the 360/67 hardware configuration in use when the remote display data was taken differed from the hardware in use when the other data was taken. All this data, then, strongly confirms the idea that displays used in place of teletypes use the CPU more and increase response time, and that graphics-oriented work requires more CPU time than does teletype-oriented work. Figures 4-21 through 4-29 show some of the probability distributions and the summary data found from the remote display terminal statistics. They are meant to be indicative only of the type of information available: because the data represents only one application, no conclusions should be drawn from it.

REFERENCES, CHAPTER 4

1. International Business Machines Corporation, General Purpose Simulation System/360 User's Manual, IBM Publication No. H20-0326, White Plains, New York.

2. Pinkerton, Tad B., "Program Behavior and Control in Virtual Storage Computer Systems," Concomp Technical Report 4, The University of Michigan, April 1968.

3. Pinkerton, Tad B., "The MTS Data Collection Facility," Concomp Memorandum 18, The University of Michigan, June 1968.

4. University of Michigan Computing Center, Michigan Terminal System, (2nd Ed.), Ann Arbor, 1967.

APPENDIX A

The Degree of Multiprogramming in Page-On-Demand Systems

by V. Wallace and D. Mason

published in Communications of the ACM, Volume 12, Number 6, June 1969

1. Introduction

The relationship among the number of programs permitted simultaneous assignment of core, drum traffic rates, and central processing unit (CPU) utilization is central to a page-on-demand, multiprogrammed, time-shared computer system [1]. A clear understanding of this relationship is vital to systems planners (software or hardware) if adequate performance is to be obtained. In this paper, a simple stochastic model will be described which offers a base for that understanding. This paper also serves to further demonstrate the feasibility of applying numerical analysis of queueing models to the development of general insight into key computer organization questions. Considerable investment has been made in building systems to operate under the page-on-demand strategy, and difficulty has been encountered in obtaining satisfactory performance. Several excellent studies, both analytical [2, 3] and simulational [4, 7, 10], have been applied in attempts to explain and predict their performance. One of the chief underlying conclusions of each of these studies is that the number of tasks simultaneously in core memory must be severely limited if efficient operation is to be achieved. However, because of the detail attempted in these studies, a clear, general picture of the fundamental trade-offs has not emerged. The model to be presented offers such a picture by ruthless simplification, particularly in those places where reliable data is sparse and the use of "guesstimates" must still prevail. The simplicity can then be exploited by

exploring a much broader range of environmental parameters than would otherwise be possible. The insights resulting can better educate our intuition when reviewing more refined results and more refined environmental statistics as they become available.

2. A Model

To that end we shall consider a system (Figure 1) which possesses a single CPU, a single secondary store for page swapping (which will be called a drum, for concreteness), and a single data channel serving that secondary store. Overhead (execution of system software) and non-drum input-output (I/O) operations will be ignored. Furthermore, it will be assumed that the supervisor is so designed that the number of users simultaneously having core assigned (which will be called the "degree of multiprogramming") will have a maximum value N, and further requests for execution when N users' programs are already in this active condition will simply be queued until one of the N active programs terminates or is terminated by the supervisor. We wish to determine, among other things, the rate R at which jobs having a known statistical character can be completed when requests for execution continuously exceed the rate of service. On the other hand, our only concern is with the behavior of user programs during the interval when they actually have core memory assigned to them. Thus, a "job" in our context will be that set of operations which starts when a page of core is first put at the disposal of a user, and ends the next time that the user's program is made ineligible to hold core memory. It is during that interval, which will

[Figure 1 The Simplified System: teletype channels into a concentrator or channel; CPU; paged core memory; drum channel; drum]

[Figure 2: queueing structure of the model (diagram not legible in source)]

be called a "service interval", that execution of the program will take place in smaller intervals interspersed with the page transfers demanded from the drum. During any service interval, the control of the CPU will be shifted among the N jobs in core as they receive their demanded pages. Finally, we make the following assumptions about the page demands:

1) Upon initiation of a "job", there is a burst of page demands during which a negligible amount of execution is achieved for that particular job.

2) The cumulative drum-channel service time for this burst of demands is an independent, exponentially distributed random variable having mean Tb.

3) Once the burst is over, a job executes for an independent, exponentially distributed interval of time before a page demand occurs.

4) The drum-channel service time for the latter request is an independent, exponentially distributed interval with mean Tp.

5) The execution-page demand cycle is repeated a random number of times, with the number of pages demanded after the initial burst (but before termination of the job) being a geometrically distributed random variable having mean Np.

6) Jobs which become eligible for either execution or page transfer join an appropriate queue whenever another job is already undergoing execution or page transfer. Jobs in the

midst of their initial burst of page demands will be interrupted by the non-burst demands. When the burst demands cannot proceed, the jobs waiting for such service also form a queue, separate from the execution and non-burst page demand queues. (See Figure 2.) It is readily shown that the above assumptions assure that the total execution time taken by each job is also an independent, exponentially distributed random variable having a mean which is Np times the mean execution interval described in assumption (3) above. The mean total execution time will be designated Te, so that the mean execution interval will be Te/Np. The assumption that the supervisor will enforce an upper limit (N) on the number of programs allowed to have core assigned results from the conviction, supported by the results below and the cited prior art, that such a policy must be enforced (under heavy use of the system) if the page demands are not to swamp the drum channel, with serious consequences to CPU utilization and to the rate of job completions. Indeed, it is a central idea of this paper that for a given set of environmental statistics (Te, Tb, Np, Tp), there will be a finite "best" value for the variable N. The assumed "burst" behavior reflects, qualitatively, behavior described by Fine, Jackson, and McIsaac [5] and by Coffman and Varian [8] in experimental studies. Although their data were taken from a

simulated paged system, and although they measured paging characteristics for programs not specifically written for a paged system, their results do give an idea of the nature of the page-on-demand environment. Besides showing that the rate of page demands is remarkably greater when the number of pages assigned to a job is small (as would be the case for recently started jobs), they also argue forcefully that the area of primary concern should be the high level of page requests when any program has significantly less core than its total storage requirements. This "burst" assumption also corresponds to the assumption of the existence of a "working set" of user pages, used by Denning [9] to model paging behavior. The "burst" represents acquisition of the initial working set. Obviously, that set will consist of an average of Tb/Tp pages, and total memory will need to be at least N Tb/Tp pages if the burst is not to be prolonged by considerable reshuffling of pages. The assumption of exponential distributions for drum-channel service time gains some credence from the realization that several factors (access, latency, transfer, and posting delays) make up its total, and their durations are in general independent of one another. Latency will favor shorter intervals because the write operations will use the first sector available. Organization of queue disciplines can also affect distributions. Nevertheless, need for simplicity is still the major justification.
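The "readily shown" claim above, that a geometric number of independent exponential execution intervals sums to an exponential total with mean Te, can be checked numerically. The sketch below is our own illustration, not part of the original analysis; the parameter values are arbitrary and not taken from the report:

```python
import numpy as np

# Arbitrary illustrative values, in units of the page-transfer time Tp:
Te = 30.0   # desired mean total execution time per job
Np = 10     # mean number of execution intervals (geometric)

rng = np.random.default_rng(0)
n_jobs = 50_000

# Per assumptions (3) and (5): each job runs a geometrically
# distributed number of execution intervals (mean Np), each interval
# exponential with mean Te/Np.
intervals_per_job = rng.geometric(p=1.0 / Np, size=n_jobs)
totals = np.array([rng.exponential(Te / Np, size=k).sum()
                   for k in intervals_per_job])

# A geometric sum of exponentials is again exponential, so the totals
# should show mean ~ Te and standard deviation ~ Te (for an
# exponential variable the two coincide).
print(totals.mean(), totals.std())
```

Both printed figures come out close to Te, consistent with the distributional claim used in the analysis.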

Unfortunately, the samples of the statistical measurements by Fine, et al., are too small, and the results insufficiently differentiated, to give an accurate notion of the distribution function for the execution intervals conditional on the degree of multiprogramming. Smith [2] argues for the hyperexponential approximation. However, the principal reason for the exponential assumptions used here is to avoid complicating the model, especially in view of the sketchiness of present data. It seems reasonable, also, to assume that continuing jobs would be given priority in the use of the drum channel over jobs undergoing their initial burst of paging. Prudence suggests that once you have committed a major portion of core to a user, you should do everything possible to expedite completion before committing yourself to new arrivals. While this model does not explicitly show the process of task or job creation, "thinking" delays at terminals, priority assignments in queues, page removal strategies, time-slicing, and many other facets of time-sharing operation, it is nevertheless a fairly realistic representation for the purpose at hand. Each of these factors has an effect on the parameters Te, Tb, Np, Tp, through their influence on the distributions of the job properties: execution intervals, number of pages demanded, etc. While there may be considerable dependence among the statistical properties of the jobs of a particular user at a terminal, these jobs will begin their service intervals at widely spaced times,

mixed in order with the jobs of many other users, making the assumed independence a reasonable approximation.

3. The Results of Analysis

The model described was analyzed for 448 separate combinations of the parameters Te, Tb, Np, and N covering a wide range of values. The analysis was accomplished using the Recursive Queue Analyzer (RQA) [6], a program for rapid numerical solution of the equations of equilibrium probabilities for Markov chains. The results were calculated to three significant figures of accuracy, using a total of about 20 minutes of RQA execution time (on an IBM 7090). The results of the analysis are plotted in Figures 3 and 4, which show the CPU utilization η as a function of the parameters. The CPU utilization as N → ∞ was hand calculated according to the formula

    η = 1,                    whenever Tb + Np Tp ≤ Te
    η = Te / (Tb + Np Tp),    otherwise.

All results are shown with time normalized to units of the mean page transfer time Tp. Thus Te = 30 means that, on the average, the mean total execution time of a job is 30 times the drum-channel time for a page transfer. If it were assumed that the drum channel could service page requests at an average rate of one every 5 milliseconds (ms), then Te = 30 would correspond to an average total execution time of

[Figure 3 Analysis Results: CPU utilization η plotted for several values of Te and Tb, with curves for N = 1, 2, 4, 8, and ∞]

[Figure 4 Analysis Results: CPU utilization η plotted for further combinations of Te and Tb, with curves for N = 1, 2, 4, 8, and ∞]

150 ms. The point of balance between execution time per job and drum-channel time per job is, in every case, the value of Np at which the N = ∞ curve breaks away from η = 1.00. It is seen that this value of Np is critical, for it represents a value at which CPU utilization begins to fall markedly, regardless of the degree of multiprogramming. The CPU utilization is proportional to the expected rate R of job completions (per unit time), according to the formula R = η/Te, while the expected service interval can be found from the formula W = N/R. Consequently, the results shown readily yield the values of these alternate measures of performance. The figures show how, other parameters remaining constant, CPU utilization improves as more programs are permitted to share the core memory. This, of course, is the desired effect of multiprogramming, resulting from the improved probability that some one of the jobs is available for execution at any given point in time. It is to be noted, however, that the improvement becomes increasingly small as N is increased. Indeed, in every case the ratio η/N decreases with increasing N, and we thus deduce that the expected service interval W will always increase with increased N. Nevertheless, the improved rate of service completions R will result in a more than

compensating decrease in the time the job must wait to get into the active state, have core assigned, and begin its service interval. Thus, overall delay, as seen by users at terminals, will decrease in spite of the increase of the service interval. Since the waits before entering the service interval will represent the major portion of the delays, they can be expected to be roughly proportional to 1/η. The figures also show that over a very wide range of the parameters Te, Tb, Np, and Tp, there is relatively little to be gained by choosing an N greater than about 8, since those curves already approach the N = ∞ curves quite closely. Indeed, when the rate of page requests is such that drum-channel demands (as described by Tb + Np Tp) exceed the CPU demands (Te) of a job, all performance measures inevitably get worse, and multiprogramming offers less and less advantage.

4. The Optimum Degree of Multiprogramming

The figures offer a picture of the advantages of multiprogramming which is misleadingly optimistic. Because systems generally operate under a core memory limitation (for economic reasons, if no other), it can be expected that an increase in N will cause each job to have, on the average, proportionately less core available to it. This, in turn, will cause the amount of page shuffling to increase, tending to lessen performance, and countering some of the advantage of the increased N. We do not, as yet, have a very good quantitative idea of how these page demands might increase with decreasing core. However, suppose for

example that the number of page requests per job, Np, was known to relate to the degree of multiprogramming in such a manner that

    Np = mN + b.

Then, for m > 0 there will be some N which gives best CPU efficiency. The dotted curve in Figure 5 shows the effect of increasing N when Te = 30, Tb = 20, m = 2, and b = 0. For this example, if one must choose between N = 1, 2, 4, or 8, the best choice would be 4, since that gives best CPU utilization. Figures 6 and 7 show the optima for a number of choices of Te, Tb, m, and b. What is shown in each case are the regions in an (m, b)-space for which the optimum selection would be either 1, 2, 4, or 8 when these are the only allowed values. The solution of Figure 5 is shown as a heavy black point on the horizontal axis of Figure 7 (Te = 30, Tb = 20). Notice that small values of N are favored for conditions where the expansion of demands on the drum is great, and where execution times are small. The size of the burst page demand seems not to be too important to the choice of optimum N, although it is very important to the performance which that optimum gives. It is remarkable that the selection is so relatively insensitive to b. It is also clear that the value of m for the jobs to be served by the system is an extremely important parameter in determining a suitable choice of N. Hopefully, future published data from real or simulated POD systems can give us a better idea of what values to expect.
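The trade-off behind Figure 5 can be sketched numerically. The following is our own reconstruction of the model as a small Markov chain, solved directly rather than with RQA; our particular reading of assumption (5) (termination with probability 1/(Np + 1) after each execution interval) and all function names are ours, so the numbers are illustrative rather than a reproduction of the report's figures:

```python
import numpy as np

def utilization_limit(Te, Tb, Np, Tp):
    """The hand-calculable N -> infinity limit from Section 3."""
    return min(1.0, Te / (Tb + Np * Tp))

def cpu_utilization(Te, Tb, Np, Tp, N):
    """Steady-state CPU utilization of the saturated model for finite N.

    Our reading of the assumptions: a job first bursts (drum service,
    exponential mean Tb), then alternates execution intervals with page
    transfers (exponential mean Tp); after each execution interval it
    demands a page at rate Np/Te or terminates at rate 1/Te, so demands
    are geometric with mean Np and total execution is exponential with
    mean Te.  A terminated job is replaced at once by a fresh bursting
    job (saturation).  State (b, e, p): jobs bursting / executable /
    awaiting a page transfer; the drum gives pages priority over bursts.
    """
    states = [(b, e, N - b - e) for b in range(N + 1)
              for e in range(N + 1 - b)]
    idx = {s: i for i, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for b, e, p in states:
        i = idx[(b, e, p)]
        if e > 0:                                 # CPU busy
            Q[i, idx[(b + 1, e - 1, p)]] += 1.0 / Te   # job terminates
            Q[i, idx[(b, e - 1, p + 1)]] += Np / Te    # page demand
        if p > 0:                                 # drum serves pages first
            Q[i, idx[(b, e + 1, p - 1)]] += 1.0 / Tp
        elif b > 0:                               # otherwise a burst
            Q[i, idx[(b - 1, e + 1, p)]] += 1.0 / Tb
    np.fill_diagonal(Q, -Q.sum(axis=1))
    # Solve pi Q = 0 with sum(pi) = 1, then eta = P(CPU busy).
    A = np.vstack([Q.T, np.ones(len(states))])
    rhs = np.zeros(len(states) + 1)
    rhs[-1] = 1.0
    pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return float(sum(pi[idx[s]] for s in states if s[1] > 0))

# The Figure-5 example: Te = 30, Tb = 20, Np = m*N + b with m = 2, b = 0.
for N in (1, 2, 4, 8):
    Np = 2 * N
    print(N, Np, round(cpu_utilization(30.0, 20.0, Np, 1.0, N), 3),
          round(utilization_limit(30.0, 20.0, Np, 1.0), 3))
```

For N = 1 the chain reduces to a simple cycle, giving η = Te/(Te + Tb + Np Tp), which is a handy check on the solver; R = η/Te and W = N/R then follow as in Section 3. The resulting table can be compared qualitatively with the dotted curve of Figure 5.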

[Figure 5 The Effect of Multiprogramming when Np = 2N: CPU utilization plotted against page demands per job]

[Figure 6 Optimum Degree of Multiprogramming: regions of the (m, b)-plane in which N = 1, 2, 4, or 8 is optimal; horizontal axis: rate of demand increase (m)]

[Figure 7 Optimum Degree of Multiprogramming: regions of the (m, b)-plane in which N = 1, 2, 4, or 8 is optimal, for further values of Te and Tb; horizontal axis: rate of demand increase (m)]

5. Conclusions

The observations above have been based on a grossly simplified model, and have aided understanding of a complex problem through exploration over a wide spectrum of parameter values. This is the type of analysis most benefited by queueing models, and it is well served by numerical solution techniques. The analysis has provided a more quantitative aid to understanding the role of multiprogramming and of job characteristics in determining CPU utilization in POD systems. While most of the observations are intuitively unsurprising, or perhaps even obvious, the model offers some clearer insight into the questions "how much better?" and "how much worse?". It also dramatizes the fact that the optimal choice of degree of multiprogramming is primarily determined by a few key statistical properties of the job population to be served. While scheduling strategies and equipment selection can strongly affect these parameters, it is still vital that an idea of their values for each choice of strategy and equipment be available for a rational choice of N to be made.

REFERENCES, APPENDIX A

1. Arden, B. W., Galler, B. A., O'Brien, T. C., and Westervelt, F. H., "Program and Addressing Structure in a Time-Sharing Environment," J. ACM, 13, 1 (January 1966), pp. 1-16.

2. Smith, J. L., "Multiprogramming Under a Page on Demand Strategy," Comm. ACM, 10, 10 (October 1967), pp. 636-646.

3. Gaver, D. P., Jr., "Probability Models for Multiprogramming Computer Systems," J. ACM, 14, 3 (July 1967), pp. 423-438.

4. Nielsen, N. R., "The Simulation of Time-Sharing Systems," Comm. ACM, 10, 7 (July 1967), pp. 379-412.

5. Fine, G. H., Jackson, C. W., and McIsaac, P. V., "Dynamic Program Behavior Under Paging," Proc. ACM National Meeting, 1966, pp. 223-228.

6. Wallace, V. L., and Rosenberg, R. S., RQA-1, The Recursive Queue Analyzer, Technical Report No. 2, Systems Engineering Laboratory, University of Michigan, Ann Arbor, Michigan, February 1966.

7. Oppenheimer, G., and Weizer, N., "Resource Management for a Medium Scale Time-Sharing Operating System," Comm. ACM, 11, 5 (May 1968), pp. 313-322.

8. Coffman, E. G., and Varian, L. C., "Further Experimental Data on the Behavior of Programs in a Paging Environment," Comm. ACM, 11, 7 (July 1968), pp. 471-474.

9. Denning, P. J., "The Working Set Model for Program Behavior," Comm. ACM, 11, 5 (May 1968), pp. 323-333.

10. Pinkerton, T. B., Program Behavior and Control in Virtual Storage Computer Systems, Technical Report 4, Concomp Project, University of Michigan, Ann Arbor, Michigan, April 1968.

UNCLASSIFIED

DOCUMENT CONTROL DATA - R & D

1. Originating activity: University of Michigan, Systems Engineering Laboratory, Ann Arbor, MI 48104. 2a. Report security classification: Unclassified.
3. Report title: A Study of Information Flow in Multiple-Computer and Multiple-Console Data Processing Systems.
4. Descriptive notes: Final Report (Jan 68 to Mar 69).
5. Authors: K. B. Irani, A. W. Naylor, J. D. Foley, J. W. Boyse, D. M. Coleman, D. R. Doll, V. L. Wallace, A. M. Woolf.
6. Report date: August 1969. 7a. Total no. of pages: 200.
8a. Contract no.: AF 30(602)-3953. 8b. Project no.: 5581. 9a. Originator's report number: Annual Report No. 3. 9b. Other report number: RADC-TR-69-189.
10. Distribution statement: This document is subject to special export controls and each transmittal to foreign governments or foreign nationals may be made only with prior approval of RADC (EMIRA), GAFB, NY 13440.
12. Sponsoring military activity: Rome Air Development Center (EMIRA), Griffiss Air Force Base, New York 13440. RADC Project Engineer: Lawrence W. Odell, EMIRA.
13. Abstract: This report documents the achievements from January 1968 to March 1969 of continuing research into the application of Queueing Theory and Markov Decision Processes to the design and investigation of multiple-user systems. A summary of the theoretical investigation conducted, the major conclusions reached, and some typical applications are included.

DD Form 1473. UNCLASSIFIED

14. Key words: Data Processing Systems; Digital Computers; Programming; Displays; Models (Simulations); Mathematics.

AFLC-Griffiss AFB NY 17 Sep 69-113