Technical Report No. 170
06868-1-T

THE OPTIMAL CONTROL OF QUEUES, WITH APPLICATIONS TO COMPUTER SYSTEMS

by

Dennis W. Fife

Approved by: Thomas W. Butler, Jr.
for COOLEY ELECTRONICS LABORATORY

Department of Electrical Engineering
The University of Michigan
Ann Arbor

Contract No. AF 30(602)-3555
United States Air Force
Rome Air Development Center
Rome, New York

and

Grant No. GK-176
National Science Foundation
Washington, D. C.

October 1965

This report was also a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The University of Michigan, 1965.

ACKNOWLEDGMENTS

This research effort is the culmination of over seven years of graduate study and research at The University of Michigan. During this time, I have benefited from many associations, especially with the members of my doctoral committee. For this, and their helpful suggestions on this report, I am most appreciative. Professor Chuang deserves special recognition and gratitude, for it was his early work which led me to investigate this subject matter, and he has been a constant source of advice and encouragement throughout the course of this work. To my other research associates I also express my thanks. The rough draft of this dissertation was typed by a number of the girls from the Cooley Electronics Laboratory, whom I thank, but special mention should be accorded to Mrs. Joan Carstens and Miss Carol Wolanski. Ronald Kilgren did most of the drafting, and Ralph Olsen was very helpful in plotting some of the graphs. The final draft typing and printing has been handled by George Prosser and staff of the Office of Research Administration. I am grateful to them for a fine job done under pressure of time. I am also indebted to the United States Air Force and the National Science Foundation for financial support under contracts AF 30(602)-2661 and AF 30(602)-3553, and NSF grants GP-1381 and GK-176. Finally, the unfailing encouragement of my wife, Joyce, has been of immeasurable value to me in all of my graduate study, including this research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
ABSTRACT

CHAPTER

I. INTRODUCTION
  1.1 Engineering Study of Queues
  1.2 Organization of Multiple-Access Computers
    1.2.1 Purpose
    1.2.2 Equipment and Use
    1.2.3 Executive Control
    1.2.4 Scheduling
  1.3 Illustrative Analysis of a Simple Problem
    1.3.1 Problem Description
    1.3.2 Analysis

II. OPTIMIZATION IN THE CONTROL OF QUEUES
  2.1 Results Obtained From Analysis
    2.1.1 Single Queue Systems
    2.1.2 Multiple Queue Systems
    2.1.3 Experimental Analysis and Simulation
  2.2 A Markovian Decision Process
    2.2.1 System Behavior
    2.2.2 Control Policies and Returns
    2.2.3 Optimization
    2.2.4 System Behavior as a Stochastic Process
  2.3 Background on Markov Decision Processes
  2.4 Objectives and Results
    2.4.1 Theory
    2.4.2 Application

III. THE DECISION PROCESS WITH DISCOUNTING
  3.1 The Set of Admissible Policies
    3.1.1 The Sets E_i
    3.1.2 The Set H
    3.1.3 The Set P
  3.2 Basic Relations and the Optimization Criteria
    3.2.1 The Random Variables
    3.2.2 The Set S
    3.2.3 Formulation of Return
    3.2.4 Optimization Criteria
  3.3 Existence of an Optimal Policy
  3.4 Dynamic Programming Formulation
    3.4.1 A Functional Equation
  3.5 Optimality of a Stationary, Deterministic Policy

IV. THE INFINITE-TIME DECISION PROCESS
  4.1 Existence of an Optimal Stationary Deterministic Policy
    4.1.1 System Behavior with a P' Policy
    4.1.2 Principal Existence Theorem
  4.2 Illustrations of the Principal Theorem
    4.2.1 First Example
    4.2.2 Second Example
    4.2.3 Third Example
  4.3 The Case of Ergodic Policies
  4.4 The Howard Algorithm
  4.5 Computational Considerations
    4.5.1 Errors
    4.5.2 The Actions for the Transient States

V. A DECISION MODEL FOR A MULTIPLE-ACCESS COMPUTER
  5.1 The Physical System
    5.1.1 Hardware Structure
    5.1.2 Software Structure
    5.1.3 General Control Process
  5.2 Formulation of the Model
    5.2.1 Simplifying Assumptions
    5.2.2 Definition of States and Actions
    5.2.3 Derivation of Formulae
  5.3 Optimization Criteria
    5.3.1 Minimization of Response Time
    5.3.2 Minimization of Weighted Response Time
    5.3.3 Assignment of One-Transition Returns
  5.4 Method of Solution
    5.4.1 Application of the Principal Theorem
    5.4.2 Ergodic Policies
    5.4.3 Interpretations of the Gain g
    5.4.4 Restrictions on Solutions
    5.4.5 Admissible Policies

VI. OPTIMUM CONTROL OF QUEUES IN THE MULTIPLE-ACCESS COMPUTER
  6.1 Minimization of Response Time
    6.1.1 Negative Exponential Distribution
    6.1.2 Non-Exponential Distributions
    6.1.3 Equivalent Criterion
    6.1.4 Comparison of Time-Sharing Policies
  6.2 Minimum Weighted Response Time
  6.3 Sensitivity of Solutions
  6.4 Application and Extension of Results

VII. CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH

APPENDIX A: SUPPLEMENTARY THEOREMS
APPENDIX B: ASYMPTOTIC PROPERTIES OF A MARKOV-RENEWAL PROCESS
APPENDIX C: FORMULAE FOR THE COMPUTER MODEL
APPENDIX D: THE SEMI-MARKOV SYSTEM INVESTIGATOR (SEMSIN)
  D.1 General Description
  D.2 Subroutines
  D.3 Performance

REFERENCES

LIST OF TABLES

5.1 Description of Possible Transitions for Each Allowed Action
5.2 System Parameters
5.3 Equal Weight Response Times for Weight Curve 1
6.1 Comparison of Theoretical Gain of Continuous-Time Model and Computed Gain of Discrete-Time Model: First-Come, First-Served Policy, Distribution 1
6.2 Minimization of Response Time With Distribution 2
6.3 Minimization of Response Time With Distribution 2
6.4 Minimization of Response Time With Distribution 3
6.5 Minimization of Response Time With Distribution 2
6.6 Values of ω, Mean Rate of Job Completion for the Optimal System
6.7 Minimization of Weighted Response Time: Weighting Curve 1, Distribution 2
6.8 Minimization of Weighted Response Time: Distribution 2
6.9 Minimization of Weighted Response Time: Distribution 1
6.10 Minimization of Weighted Response Time: Weighting Curve 1, Distribution 2
6.11 Comparison of the System Gain g
6.12 Number of States for the Computer Model
C.1 Elements of the Vector G
C.2 Correspondence of State Indices and Vector Elements
C.3 Elements of the Vector P of Transition Probabilities
C.4 Elements of the Vector T of Mean Transition Times
C.5 Elements of the Vector A, Mean Time-in-System of a New Arrival During a Transition
D.1 Observed SEMSIN Performance on the Computer Model

LIST OF FIGURES

1.1 Equipment units and data paths in a multiple-access computer.
1.2 Executive control functions.
1.3 Representation of round-robin service.
1.4 Representation of a simple queueing problem.
1.5 A sample of system behavior showing service completion times.
2.1 Graph of expected time-in-system for round-robin service.
2.2 Priority service on a single facility.
2.3 Example of changes of state during the decision process.
2.4 Example of tabular representation of a stationary deterministic control policy.
2.5 Behavior of a Semi-Markov process and its associated Markov chain.
3.1 Illustration of the random variables Y_n, Z_n, W_n in a 5-state process.
4.1 State transitions in the first example.
4.2 The computation of the Howard algorithm.
5.1 Organization of the system to be studied.
5.2 Executive control functions in the system being modeled.
5.3 Multi-level queueing resulting from restricted execution time allocation.
5.4 Three-level queue with round-robin service on last level.
5.5 Observed distribution of execution time at The University of Michigan Computing Center.
5.6 Observed probability distribution of execution time for uncontrolled users of The University of Michigan Computing Center.
5.7 Comparison of system transitions and transitions of the imbedded decision process.
5.8 Weighting functions for time-in-system.
5.9 Three distributions on execution time.
6.1 Increase of remaining execution time with execution time received.
6.2 Conditional expectation of response time for distribution 2, T_u = 20 sec, T_e = 4 sec, θ = 0, s = 0.15 sec.
6.3 Conditional expectation of response time for distribution 2, T_u = 20 sec, T_e = 1 sec, θ = 1, S = 1 sec, s = 0.15 sec.
C.1 Structure of the vector P of transition probabilities p_ij^k.
C.2 Structure of the vector T of mean transition times ν_i^k.
D.1 Organization of SEMSIN functions.
D.2 Simplified diagram of MAIN sequence.

LIST OF SYMBOLS

Symbol    Meaning

α    A factor for discounting the one-step return.
β_A    Partial one-transition return attributable to new arrivals in the transition interval.
β_i    Partial one-transition return attributable to the swapping and set-up time for a job.
δ_i(k)    The test quantity of the Howard algorithm, for state i and action k.
γ_0^k    Total weight per unit of time-in-system for jobs in system at the beginning of a transition, excluding the job to be executed.
Δ^T π    A transformation of the policy π which advances the time sequence of decisions by T time steps.
η(k|i)    A probability governing random selection of action k in state i.
η_n(k|i)    The action selection probability specified for time n by a policy.
θ    Probability that a new arrival is a program requiring a set-up time.
λ    Reciprocal of the mean value of t_a.
μ_1, μ_2, μ_3    Reciprocals of the mean values of the phases in a three-phase hyperexponential distribution.
ν_i^k    Expectation of the duration of a transition from state i with action k.
ξ_i    True difference in the test quantities in state i for some pair of policies.
ρ_0^k    Partial one-transition return attributable to execution time of a job.
π    An admissible control policy for the decision process.
π_d    A uniformly optimal stationary policy for the discounting optimization problem.
π_i    Stationary probability of state i of a Markov chain.
π_ν    The νth element of a sequence of policies from the set P.
c_1, c_2, c_3    Probabilities in the definition of a three-phase hyperexponential distribution.
τ    The integral duration of a transition.
ψ_0(j,ν)    Probability that the decision process begins in state j at time ν.
ψ_ij(n)    Probability that the system enters state j at time n, conditional on initial state i for the process.
ψ_n(j)    Briefer notation for ψ_n(j,0).
ψ_n(j,ν)    Joint probability that Z_n = j and W_n = ν.
ψ_n(j,ν|s_0,π)    The distribution ψ_n(j,ν) with its dependence on s_0 and π emphasized.
ω    Probability of a job completion in an arbitrary time step.
b_i^k    The expectation of total return over one transition from state i with action k.
d_i(e,e')    The distance between elements e, e' of E_i.
d_P(π,π')    The distance between elements π, π' of the set P.
d_S(s,s')    The metric on the set S.
e_2    Accomplished execution time of jobs in Q2.
e_3    Minimum accomplished execution time for a job in Q3.
e_i    A vector of probabilities η(k|i) for state i.
f_i(τ)    The unconditional probability distribution on the duration of a transition from state i with action k.
f_ij    Probability of reaching state j from state i.
f_ij^k(τ)    Probability of duration τ for a transition between states i and j with action k.
f_n    Probability that a job is completed in the nth time step of its execution.
f(h_{n-1})    Operator notation for the transformation which yields s_n as a function of s_{n-1} and a decision h_{n-1}.
g    The gain of the decision process with a stationary deterministic policy.
g*(i,π)    The greatest lower bound on the subsequential limits of A_T(i,π) for T = 1,2,3,...
g*(s_0,π)    The greatest lower bound on the subsequential limits of V_T(s_0,π)/T for T = 1,2,...
h    A set of vectors e_i, i = 1,2,...,N.
i    Integer index for a state of the decision process.
k    Integer index for a control action.
k_i    The index for the action to be taken in state i for a stationary deterministic policy.
ℓ    State variable giving the status of the user memory area.
μ_jj    The mean recurrence time of state j.
n_1, n_2, n_3    State variables giving the number of jobs in Q1, Q2, Q3 respectively.
p_ij^k    The probability of passing from state i to state j in one transition with action k.
q    The maximum amount of execution time allowed on each pass for a job in Q3.
q_ij(τ|k)    Joint probability that the system passes from state i to j in one transition of τ units duration with action k.
q_ij(τ|h_n)    Joint probability that a transition is from state i to state j with duration τ, when the decision h_n governs the transition.
r_ij(m|τ)    The return for the mth step of a transition having duration τ, between states i and j, with action k.
s    The ECP time involved in swapping two jobs between main memory and drum memory.
s_0    The element of S corresponding to the initial state distribution (ψ_0(j,ν)).
s_n    The element of the set S corresponding to the distribution (ψ_n(j,ν)).
t_a    Arrival interval for a single source or console.
t_e    Required execution time for a job in the computer model.
u(m|i,k)    The expectation of the one-step return received from a transition from state i with action k, m steps after it begins.
u(m|i,h_n)    The expectation of the one-step return received from a transition from state i, m steps after it begins, where the decision h_n governs the action taken at its beginning.
v_i    A constant for state i, known as a "relative value," involved in the asymptotic behavior of V_T(i) with an ergodic policy.
w_1, w_2, w_3    Expectation of the weight per unit of time-in-system for jobs in Q1, Q2, Q3 respectively.
x_i    Computed value equal to v_i + g within limits of computational errors.
A_i    The set of integers 1,2,...,K_i, which is the set of control actions for state i.
A_T(i,π)    The average per-unit-time expectation of return over T steps, for policy π and initial state i.
CPU    Abbreviation for Central Processing Unit.
D_α(i,π)    The expectation of total return for a decision process with discount factor α, policy π, initial state i.
D_α(j,ν,π)    The expectation of total discounted return from the decision process with policy π, conditional on the process beginning at time ν in state j.
D*(j,ν)    The expectation of total discounted return from an optimum decision process, conditional on the process beginning at time ν in state j.
D_α(s_0,π)    The expectation of total discounted return from the decision process with policy π and initial distribution s_0.
D*(s_0)    The expectation of total discounted return from an optimum decision process with initial distribution s_0.
ECP    Abbreviation for Executive Control Program.
E_i    The set of all vectors e_i.
F(T)    The probability distribution function on execution time, Pr(t_e ≤ T).
G_ij(T)    The probability of reaching state j from state i for the first time in T steps or less.
H    The set of all N-tuples h.
K_i    The number of possible control actions in state i.
L    Expectation of the number of jobs in system at an arbitrary time.
M_ij(T)    The expectation of U_j(T), conditional on the initial state i.
N    Total number of states in the decision process.
U_j(T)    The number of times state j is entered in the first T steps, excluding the initial state.
P    The set of all admissible policies π.
P'    The set of all stationary deterministic policies.
Q1, Q2, Q3    Notation for queue levels 1, 2, 3 respectively.
R_n(i,π)    The expectation of return at the nth step of the decision process with policy π, conditional on an initial state i.
R_n(s_0,π)    The expectation of the return for the nth step of the decision process with policy π and initial distribution s_0.
S    The set of all joint probability distributions (ψ(j,ν); j = 1,2,...,N; ν = 0,1,2,...).
S    Set-up time for relocation of a job program.
T_e    Mean value of execution time t_e.
T    Expectation of the time interval between an arrival from a user and the end of the transition in progress.
T_u    Mean time for a user to generate a new request.
V_T(i,π)    The expectation of total return for T steps of the decision process with policy π, conditional on initial state i.
V_T(s_0,π)    The expectation of total return over T steps of the decision process with policy π and initial distribution s_0.
W(t_e)    A function giving a value for weighting the time-in-system of a job requiring execution time t_e.
W_i    The expectation of time a job spends waiting in Q_i (i = 1,2,3), including swapping and set-up time.
W_n    The random time interval measured forward from time n to the beginning of a transition.
X_i    The number of jobs present in system at the beginning of a transition from state i.
Y_n    The state of the system at time n.
Z_n    The state entered at the first transition beginning at time n or later.

ABSTRACT

An important element in the operation of a system with queues is the procedure for controlling the use of system resources in processing the random demands for service. Many systems, especially those employing general purpose computers, admit a wide class of economically practical procedures. It is usually infeasible to use classical techniques of analysis to study every admissible control policy. The system engineer instead requires an optimization technique which will systematically isolate an optimal policy within the admissible class. This research pursues the theoretical development and application of Markov sequential control processes for this purpose. Major effort is devoted to a discrete-time Semi-Markov decision process, and the optimization of performance return from this process over infinite time. Proceeding from an appropriate definition of optimization, it is established that a particularly simple optimal control policy exists. This policy is stationary and deterministic, and optimal for any initial condition on the process. This result simplifies both the computation of the optimal control and its implementation in physical systems. A timely example for the application of the technique is provided by the scheduling control problem of multiple-access or time-shared computers. A Semi-Markov model is developed for a computer serving four remote consoles with three queueing levels for user jobs. Swap time and the set-up time for initial program loading are included, and rather general execution time distributions are treated. Two optimization criteria aimed at minimizing response time to commands are proposed. The computational method devised by Howard and Jewell is used to determine optimal policies. The execution intervals assigned to jobs are also studied. Results for the computer model show that optimal policies are predominantly priority policies, although a much wider class is admissible. Also, a need is indicated for more rapid preemption of low priority jobs than is used in existing time-shared systems. The mean response time for given execution time is compared for the optimal system and the control procedures now being used. The practicality of the optimization results establishes that this new technique can profitably be used in the design of stochastic service systems.

CHAPTER I

INTRODUCTION

1.1 ENGINEERING STUDY OF QUEUES

Since the early days of the telephone industry, electrical engineers have had a basic interest in the design and performance of systems with queues, or what have recently been called "stochastic service systems" [1]. A physical system may be called a service system if it exists to perform a function or a set of functions in response to discrete demands presented to it in the course of time from an external source. The functions performed are generally called "services" and the demands presented are called "arrivals." Everyone is familiar, of course, with the service system represented by a theatre ticket booth. A telephone switching center is also a service system, in which the arrivals are subscriber's calls and the service performed is to establish and hold a communication path to the called party. A less apparent example is a nuclear radiation counter, a service system in which the arrivals are nuclear decay particles, and the service performed is to detect and record the occurrence of each particle. During the last decade, another class of service systems has evolved which will have a widening role in our technology and which are of specific interest in this study. These are the so-called on-line, remote-access, or multiple-access computer systems.

The arrivals to these service systems are commands or digital data communicated directly from a remote device; the service performed is the execution of a computational process. A service system generally consists of a number of different types of service facilities. For economic or physical reasons, the facilities in almost all systems are limited, so that only a certain number of arrivals can be accommodated at one time. To avoid turning away demands which occur when the system is fully occupied, many systems have a means for holding arrivals until facilities become available. The waiting arrivals then constitute a queue. Each arrival to a service system will require the use of various facilities for various periods of time before its service is completed. In a majority of the physical service systems one can imagine, moreover, there is a significant degree of uncertainty in regard to when arrivals will actually occur, and the time durations and facilities which will be required to service each. For example, neither the times at which subscribers will initiate telephone calls nor the duration of any particular call can be known beforehand to the telephone engineer. Indeed, in many service systems, including the telephone system, it is both convenient and proper to consider the occurrence of an arrival and the end of a service interval as randomly occurring events. It is for this reason that the term "stochastic service system" is applied.

The performance of a stochastic service system is degraded by the queueing delays which arrivals encounter in being serviced, and by the occurrence of a busy condition during which arrivals are turned away. The random nature of events during system operation demands that performance be described statistically, i.e., in terms of probabilities and average values. A prediction of performance which includes the stochastic effects is generally difficult, but often of such importance as to warrant extensive mathematical analysis. As a result, a large body of knowledge on stochastic processes and their application as models of physical service systems has developed in the last half century. This literature comprises the so-called "theory of queues." References [1] through [10] are examples of the more important works. The theory of queues has a long and honorable history of providing solutions to practical problems in electrical engineering, as well as other fields. The theory had its origin in telephony in the early 1900's, and its early development was largely due to A. K. Erlang [2]. An early problem treated by Erlang was estimating the number of trunks required between toll centers in order to provide a satisfactory grade of service. The grade of service is the probability that a caller encounters a busy signal because all trunks are in use. Erlang's result for the grade of service, now known as the Erlang loss probability, is still in use [34]. Fry [5], Riordan [1], and Syski [9] give comprehensive

coverage of the queueing problems in telephony. More practical aspects can be studied through papers such as Wilkinson's [34] in the Bell System Technical Journal. The increasing volume of long distance data transmission and message switching in recent years has given new applications for queueing analysis beyond the problems provided by telephone communication. For example, Otterman [35] has studied the grade of service when store-and-forward data transmission has shared access to trunks along with voice communication. Haenschke [36] has used queueing models to investigate several switching center designs for data transmission. The control of errors in digital data and resulting queueing problems have been of concern [37]. The study of routing procedures and priority schemes for ordering the transmission of messages over a communication network has recently produced some results of more general interest [38]. The use of computer control for high-volume message switching has brought queueing analysis into the design of computers; see [39] for example. Aside from this computer application, the trend in modern computer design generally is toward asynchronous, independently operating subsystems, and queueing studies are now seen as valuable for general purpose computer design [40-44]. Surprisingly, the theory of queues also has application to devices for recording nuclear radiation; see [45] and [10], p. 205ff. This is because radiation detection elements are limited physically by a dead

time occurring after each particle detected. This dead time corresponds to a "service time" for each recorded particle. The problem for analysis is to determine the maximum average counting rate possible with random particle arrivals. Beyond electrical engineering applications, the mathematical study of queues has made important contributions to air traffic control, highway traffic control, organization of hospital facilities, assembly line manufacturing, and many other fields. Saaty [8] and Morse [7] give some indication of the additional breadth of applications. To a large degree, the applications of the theory of queues have centered on determining the extent of the service facilities required to provide a specified quality of performance. But an equally important aspect of system design, especially when there are queues, is the overall operational control procedure for the service system. The operational control procedure consists of those actions which govern and coordinate the utilization of the system's service facilities by the arrivals. Basically it involves two kinds of decisions which occur sequentially in time: 1) selection of arrivals for service, and 2) allocation of time and service facilities to the selected arrivals. These decisions may provide the only flexibility available in controlling the performance of the system.

A common control procedure used in physical systems is "first-come, first-served." With this procedure, arrivals are selected in the time order of their occurrence and each thereupon receives whatever time and facilities are required for its completion. A great part of the theory of queues is limited to this rule. But it is widely recognized that other procedures can improve a system whenever there is large variation in the amount of time and facilities required to service each arrival. This is especially true when convenience to a human user is important. An illustration of this fact occurs in the modern supermarket, which provides an express checkout counter for customers with small orders. An improved control strategy will generally be more complex than the simple "first-come, first-served" rule. The control actions may depend on a number of factors, including time, the number of arrivals in queue, and distinctions between arrivals, for example. The use of a more complex strategy will usually imply an additional investment in facilities to implement the control logic. Economics may therefore eliminate all but simple procedures like the first-come, first-served rule. However, when a principal system component is a general purpose electronic digital computer, the queue control function is conveniently implemented as a stored program. There is every reason then to investigate the performance advantage which may be realized by incorporating a sophisticated, dynamic decision procedure.

This study concerns the theory and application of a mathematical technique which has practical utility in finding optimum decision processes for controlling queues. It is particularly motivated by the queueing problems in large remote-access computer systems. Such service systems are ideal for the application of the more sophisticated processes which the optimization technique allows. The optimization of control of service systems is a logical direction in which to extend the theory of queues, yet fairly little has been accomplished in this area. This study appears to have merit as a general contribution in this respect. The results on optimization models of multiple-access computers should be of special value for computer system engineering. The technique advanced in this study should have additional applications as wide and varied as the theory of queues has had.

1.2 ORGANIZATION OF MULTIPLE-ACCESS COMPUTERS

The scope of operational control in a stochastic service system is well illustrated in a multiple-access computer system. A general discussion of these systems will give a better appreciation of the problems involved, and an introduction to the principal application of interest in this report.

1.2.1 Purpose

A multiple-access computer is a special case of a class which may be called "on-line" computers. This general class is characterized by

direct communication between the computer and many remotely located devices. These devices may serve to couple the computer to a physical process, as do temperature sensors and electronic counters; they may couple the computer to another computational process, as when the remote device is another computer; or, they may couple the computer to a human being, as when the remote device is a teletypewriter or a cathode-ray tube display. The existence of direct communication is essential. Conventional operation of digital computers today, the so-called "batch processing" mode, involves an intermediate conversion of program input from punched cards to magnetic tape. This includes much manual activity from a human machine operator, and results in considerable delay between receipt of the punched cards from the user and the start of execution of the user's program. The batch processing operation therefore does not permit the computer to be used in controlling a rapidly varying physical process, nor does it allow users of the computer to quickly detect and correct errors in their programs. The term "multiple-access computer" has in the past two years come to mean an on-line computer aimed at providing computation facilities to human users on a "public utility" basis. From a narrow view, this simply means improving the convenience of using the computer. This is done by providing each user with a personal communication device, chiefly a teletypewriter at the present time, and making it possible for the user to activate a computer program and quickly receive the results, directly at his console. The speed of a modern computer is so

great that a user will usually spend much more time in the conception of each request than the computer spends in performing the requested computation. The public utility concept is therefore more economical if a large number of users are connected to the machine. Then in the time intervals when one user is deciding what to request next, the computer can be serving the requests of other users. Thus, as time proceeds, the users are actually sharing the computer in turn, and this mode of operating a computer is frequently called "time-sharing." From a broader view, however, multiple-access computing aims to increase the usefulness of a computer, beyond mere convenience. What is sought is what Licklider [11] has called "man-computer symbiosis." Man-computer symbiosis is achieved when the man can exploit the tremendous speed and logical capability of the computer as fully and easily as he would use a human assistant. The premise is that in circumstances permitting free and rapid interchange of information between man and machine, the essence and solution of a problem is more quickly exposed. See [12,13] for examples of this approach to computer usage. Beyond direct communication, successful man-computer symbiosis requires a large repertoire of programming languages and computing utility programs. It is by means of this "library" that the man can "converse" with the machine, and have it respond with "intelligence" analogous to that of the human assistant. The size of such a library, and the need for rapid access to any part of it, implies that the multiple-access computer should have a very large capacity, fast-access

memory unit. Magnetic drum and disc memories have been found suitable. Although military and business organizations have been interested in special purpose on-line computers for some time [14,15], and despite rather early speculation on general purpose systems [16], much credit for achieving a useful multiple-access system belongs to F. J. Corbato and associates [17,18,19] at the Massachusetts Institute of Technology. As a principal effort of Project MAC¹ at MIT, Corbato's original concept has been expanded in the last two years to more closely approach the scale appropriate to the public utility concept. Presently, there are about 100 remote units on the MIT campus, and as many as 25 can be simultaneously in use [19]. There are and have been a number of other multiple-access computing projects in progress throughout the country [20-27], and much interest in such systems exists among organizations now ordering new computing equipment.

1.2.2 Equipment and Use

Let us turn now to more description of the equipment and use of a multiple-access computer. The discussion will occasionally refer to particular characteristics of currently existing systems, but the principal intent is to exhibit the common characteristics of them.

¹An acronym meaning both multiple-access computing and "machine-aided cognition," a term equivalent to man-computer symbiosis.

Figure 1.1 depicts the equipment and principal data flow in the typical multiple-access computer existing today. The equipment used is in most respects commonplace even in batch processing systems, undoubtedly because multiple-access computing is such a new development.² The central processor is typical of modern general purpose computers. When executing a particular program, the central processor obtains instructions and data for the program from the main memory. The input-output processor, sometimes called a "data channel," is a special purpose digital computer for controlling data transmission with a peripheral device, in this case the secondary memory. The I/O processor can independently transmit data or computer programs between the main memory and the secondary memory. The secondary memory is a large capacity magnetic disc or magnetic drum unit, capable of storing at least a million computer words. This unit holds almost all of the system library. Some programs however are permanently held in the main memory. The latter constitute part of the operational control for the system. The communication processor is a special purpose digital computer which provides the interface between common carrier telephone and telegraph lines and the data processing complex. The IBM 7741 [30] is an example of such a unit. Together with suitable line termination devices, the communication processor performs such functions as conversion

²This is bound to change as new computers are developed specifically for multiple-access computing. A suggestion of things to come is contained in Dennis' paper [28], and a paper by Galler, Arden, Westervelt and O'Brien [29].

[Figure: block diagram of remote user stations connected by common carrier lines to a central data processing complex containing the interval timing clock, central processor, communication processor, main memory, input-output processor, and secondary memory, with data flow paths and interrupt signal paths indicated.] Fig. 1.1. Equipment units and data paths in a multiple-access computer.

of signal levels, conversion of teletype code to computer code, etc. As each teletype character is received from remote users, the communication processor stores the character and the user line number as a single word in the main memory. The central processor is thereby relieved of many time-consuming but necessary communication activities. The interval timing clock is a digital device which will count out a prescribed interval and signal the CPU¹ at its completion. The signal is an "interrupt" signal which forces the CPU to branch from the instruction sequence currently being executed to another designated instruction location. This action is intimately associated with the overall operational control, as we will presently describe. Other interrupt signals originate from the communications processor and input-output processor to notify the CPU of the completion of operations assigned by it, or equipment faults detected by these units. During operation of the multiple-access computer, users at their remote stations are free to independently type in commands and input data to the computer. The commands call for the computer to execute a designated program, and input data may be required from the user during the subsequent execution. References [18,20,23,24,25,26] contain extensive illustrations of user commands and machine responses in a session at the console of a multiple-access computer.

¹Central processor unit.

1.2.3 Executive Control

The central processor of a multiple-access computer can execute only one program at a time. However there is no constraint on the users which prevents two or more from attempting to have a program executed at the same time. Moreover there is nothing to prevent a user from initiating the execution of an incorrect program which will never terminate. One cannot rely on the user to recognize such a fault in time to prevent serious interference with other users of the computer. Hence to ensure orderly use of the available facilities there must exist an operational control process which supervises user program activities and resolves conflicting demands. Part of this control function is implemented in the "hardware" of the system. However, the principal operational control functions of the system reside in "software," i.e., a computer program, commonly referred to as the Executive Control Program (ECP) or Supervisor. The decision to implement the major control in a stored program stems of course from the desire to be able to readily alter the procedure after observing the system in operation. The ECP goes into execution in the central processor whenever an event occurs which requires a control action to be taken. Such events are indicated by either an "interrupt" signal or the execution of a transfer instruction. The interrupt forces the CPU to leave the user program being executed and transfer to a designated entry point in the ECP. The ECP may also be entered by a "voluntary" transfer from the

user program, as at the completion of the user program. During its execution, the ECP initiates control actions which will depend upon the system conditions existing at the time. In addition to the knowledge conveyed by the type of interrupt, the ECP uses information stored in tables held in main memory to determine all pertinent system conditions at the time of the interrupt. One important table in particular is the Job List. This lists each user who has a program in process and the status of the user's program. The status might indicate for example that the program was in execution before the interrupt. Alternatively, it may have been waiting to be executed, or waiting for input data from the user. Additional information about the program may also be included, such as the total execution time it has received, and the total size of the program in computer words. Such information is useful in "scheduling" of the central processor. This is the decision process that determines which user program will be placed in execution at any time, and how long it will be allowed to run before being superseded by another user program. This aspect of executive control is particularly of interest in this report.
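The Job List just described is essentially a table of per-job records. As a concrete illustration only (the report does not specify a layout, and the field and type names below are hypothetical), one such record might be sketched as follows:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    """Job conditions named in the text: executing, waiting to be
    executed, or waiting for input data from the user."""
    EXECUTING = 1
    WAITING_FOR_CPU = 2
    WAITING_FOR_INPUT = 3

@dataclass
class JobListEntry:
    user_line: int            # remote console (line number) owning the job
    status: Status            # current status of the user's program
    cpu_time_received: float  # total execution time received so far, seconds
    size_words: int           # program size, useful to size-based schedulers

# A scheduler can scan such records to choose the next program to execute.
job_list = [
    JobListEntry(user_line=3, status=Status.WAITING_FOR_CPU,
                 cpu_time_received=0.25, size_words=9000),
]
```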

There are also so-called "buffer" areas in main memory which hold information the ECP may use or alter during its execution. The Input Buffer, accessed by both the ECP and the communication processor, receives input characters from the users. The ECP inspects each of these and interprets them relative to the user's current status. For example, if the user program is waiting for input, the received characters are simply passed on by the ECP to the user program. Otherwise, a single character might signify a particular command, to which the ECP must respond by entering the user in the Job List. The Output Buffer stores characters being passed by the ECP from the user program to the communication processor. Figure 1.2 gives a somewhat simplified picture of the overall flow of the Executive Control Program, and the functions performed in response to various interrupts. We have essentially covered already the ECP response when an input character is received from the communications processor. The main effect of the ECP response to a user error, user job completion, a clock interrupt, or a user program request for input is to remove the user program from main memory and schedule another user program for execution. When removed, the program is saved in the secondary memory unit of the computer. From there it can be retrieved later for further execution, or for debugging analysis by the user from his console. The latter would involve other commands which use the saved program as input data. The process of removing one program from main memory and substituting another from the secondary memory is referred to as "swapping". Note that a user program may be removed from execution whenever it requires input or output with the user through his typewriter, or with the secondary memory through the input-output processor. In the case of a request for input from the user, this is done because the user may require an appreciable time to type in this data. Meanwhile, the CPU can serve other users.

[Figure: flow diagram of ECP responses by type of interrupt or exit: command input (enter new job in Job List), user data input (move character from buffer to current program), user error or job completion (update Job List, remove program, schedule next user execution), clock interrupt, user input request, user output request (move character to Output Buffer), I/O request (enter request in I/O queue), and I/O completion (initiate next I/O operation).] Fig. 1.2. Executive control functions.

For a program request to send output to the user, swapping occurs only if the user program fills up the Output Buffer, which can hold only a certain number of characters. Swapping here allows the communication processor to clear the Output Buffer. In the case of I/O with the secondary memory, swapping is done because such I/O can be carried out concurrently with program execution in most systems, due to the capability of the I/O processor. The removal of a user program at the occurrence of the clock interrupt is associated with the scheduling process for the central processor. The importance of this aspect here warrants a special discussion of existing techniques and their limitations.

1.2.4 Scheduling

The Job List of the Executive Control Program constitutes a queue of user programs which are waiting to be executed. The number of jobs in the list will fluctuate randomly in time as jobs receive execution and are completed, and as new commands are typed in by the users. Perhaps more often than not, a new user job entering the Job List will find another user's program already waiting. Before the arriving job is executed it will generally have to wait in queue for some interval of time. This delay increases the "response time" which the user observes. Response time is the interval from the time when the user completes typing a command to the time when the machine completes typing back the results for that command.

The average response time that a user observes should not be uncomfortably long if multiple-access computing is to accomplish its avowed purpose. For each command however, the response time must be at least as long as the execution time required by the command. Now, effective use of man-machine interaction in multiple-access computing tends to encourage short execution time. The great speed of a modern computer also contributes, so that job execution time will frequently be only a fraction of a second. Nevertheless, because of either logical errors in programming or the complexity of some computing problems, execution time could be many seconds in duration or even unending. This fact makes it clear that first-come, first-served scheduling of job execution is unsatisfactory for multiple-access computing. One must instead impose a maximum time allocation for execution of each job. Its value moreover should be consistent with desirable response time values, not more than a few seconds. Yet, a program is not necessarily faulty if it is not completed within the allocated time interval. These observations have led to widespread use [20,23,24,26] of a scheduling process known as "round-robin" service. In round-robin, new jobs are entered in order by arrival at the end of the queue. Jobs are selected for execution one at a time from the top of the queue, and executed for some "quantum" of time. The same amount of execution is given to each job, unless of course it terminates itself during the quantum. If uncompleted at the end of the quantum, the job is placed at the end of the queue, behind any which arrived. The processor

continues in this way, cycling through the queue. In each cycle the first job to receive its quantum is also the earliest arrival present. Figure 1.3 depicts the process. In most systems today only one executable user program occupies main memory at any time. To switch execution to another user program, the input-output processor must swap two programs between the main and secondary memories. The swap time is not productive per se, since no job receives execution during this time. A problem in round-robin service is to determine the optimum quantum to be allocated, in order to balance the loss of production in swapping against desirable response time values. One can see that if the quantum is very short, say 50 msec, the machine will spend less than half the time in productive processing. On the other hand, if the quantum is very long, say 5 seconds, then an arriving user program might wait many seconds in queue before receiving its first pass of execution. No complete solution has been obtained for this problem. What has been done is a rough setting of the quantum at 200-500 msec. This makes swapping relatively less inefficient than for shorter values, and allows a short response time provided the queue does not get large. Two other approaches to the problem have led to modifications of round-robin service.

[Figure: jobs arrive at the end of the queue; the job at the top of the queue is selected and executed for a maximum time interval q; completed jobs depart, and an uncompleted job is returned to the end of the queue while the others move up.] Fig. 1.3. Representation of round-robin service.
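The round-robin discipline of Fig. 1.3 is simple enough to state in a few lines of code. The following Python sketch is illustrative only: it assumes all jobs are present at time zero (random arrivals are ignored) and charges a fixed swap overhead on every pass, reflecting the quantum trade-off discussed above; the function and parameter names are ours, not part of any actual system.

```python
from collections import deque

def round_robin(required, quantum, swap_time):
    """Serve jobs in cyclic order, at most `quantum` seconds per pass.
    `required` maps job id -> execution time needed; returns completion
    times.  Each pass costs `swap_time` of unproductive overhead."""
    queue = deque(required.items())
    clock = 0.0
    completion = {}
    while queue:
        job, remaining = queue.popleft()
        clock += swap_time                # swap in: no job executes now
        run = min(quantum, remaining)
        clock += run                      # productive execution
        if remaining > run:
            queue.append((job, remaining - run))  # back to end of queue
        else:
            completion[job] = clock       # job terminates within its quantum
    return completion

# Short quantum: the 0.3 s job finishes quickly, but overhead is high.
print(round_robin({"A": 0.3, "B": 2.0, "C": 0.7}, quantum=0.2, swap_time=0.05))
```

Rerunning the example with quantum=5.0 completes each job in a single pass with minimal overhead, but a job arriving behind B would then wait for B's full execution time, which is exactly the trade-off described above.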

The System Development Corporation technique [25] takes a more direct view of the response time obtained by the user. A response time cycle of about 2 seconds is established, and the quantum allocated to each job is determined by equally dividing this interval among the jobs in queue when it is executed. Thus jobs receive a longer quantum when there are few of them in queue, and a very short job is completed in close to 2 seconds. The process becomes inefficient for long queues however, since the swapping time takes a larger portion of the response cycle. To counteract this, a minimum quantum duration is established and not every program is processed in each cycle. This scheduling technique might be called "dynamic round-robin," since the allocated time changes with the number of user programs in queue.
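The dynamic round-robin rule lends itself to an equally brief sketch. The numbers below are illustrative stand-ins: the report gives the cycle as about 2 seconds but does not state the minimum quantum.

```python
def dynamic_quantum(jobs_in_queue, cycle=2.0, min_quantum=0.2):
    """Quantum per job under dynamic round-robin: the response cycle is
    divided equally among the queued jobs, subject to a minimum quantum
    (when the minimum binds, not every job is processed each cycle)."""
    if jobs_in_queue <= 0:
        return cycle
    return max(cycle / jobs_in_queue, min_quantum)

for n in (1, 4, 20):
    print(n, dynamic_quantum(n))   # 2.0, then 0.5, then clamped at 0.2
```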

The most sophisticated technique in use is that developed by Corbato, et al. [17]. The MIT system, more than any other, is faced with wide variability in the service requirements of users. For example, user programs can range up to 32,000 words in size. Swap times can therefore approach several seconds. Execution times can be as small as several milliseconds, while for programs with complex decision trees, such as Samuel's checker player [32], the execution time can be on the order of minutes. The technique devised by Corbato may be called a multi-level priority scheduling process. There are a fixed number, say M, of queueing levels for jobs in process. That is to say, the set of jobs in queue may be separated into M disjoint subsets, some of which may be empty. The levels are ranked by priority, with level 1 having highest priority. This means that whenever there is a job in level 1, it will be executed in preference to a job at any other level. A job at level 2 will be executed in preference to a job of level 3 or lower, etc. There is a maximum time allocation for execution associated with each level, and if a job is not completed in this time it is moved to the next lower priority level. The lowest level queue is processed by a round-robin procedure. Within each level, jobs are processed in order of arrival. The size of a program in words determines its priority level at arrival, according to a general formula. Let W denote the size of a program, and W_0 a specified number of words. A job enters level ℓ on arrival if

ℓ = [log_2(W/W_0)] + 1,    ℓ = 0, 1, 2, ..., M-1

Here [ ] denotes the integral part of the enclosed quantity. Presently W_0 is taken as 4000 words [33]. The maximum time allocated to each job processed from level ℓ is 2^ℓ q seconds, where presently q is 4 seconds [33]. This procedure has two notable advantages. First, the variable time allocation allows a continual balance to be achieved between swapping time and productive processing. Second, the priority scheme allows small, short jobs to receive short response with less interference from large or long jobs. Its principal disadvantage is that large, short jobs obtain comparatively poor response.
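A worked sketch of the level-assignment rule may help. This is our illustration of the formula as reconstructed above (the original page is partly garbled at this point), using the quoted values W_0 = 4000 words and q = 4 seconds:

```python
import math

W0 = 4000      # words, the value quoted above
q = 4.0        # seconds, the value quoted above

def entry_level(W, M):
    """Level assigned on arrival to a program of W words:
    l = [log2(W/W0)] + 1, clamped to the M available levels."""
    raw = math.floor(math.log2(W / W0)) + 1
    return min(max(raw, 0), M - 1)

def allocation(level):
    """Maximum execution time per pass at a given level: 2**level * q."""
    return (2 ** level) * q

for W in (1000, 4000, 9000, 32000):
    lev = entry_level(W, M=4)
    print(f"W = {W:5d} words: level {lev}, allocation {allocation(lev)} sec")
```

The doubling allocation means a large program, once admitted to execution, runs long enough to amortize its several-second swap time, while small programs keep the rapid service of the high-priority levels.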

The round-robin scheduling procedure has considerable appeal because it seems to directly fit the idea of time-sharing. One can see that in a round-robin each user gets equal treatment from the central processor, an equal share of its time as it were. But the objective of scheduling in multiple-access computing is not merely to ensure each user an equal share of central processor time. Rather it is to provide each user with as rapid a response as possible, consistent with the execution time the user has requested by his command. The multiple queue scheme, such as advanced by Corbato, appears to have greater flexibility for achieving a comprehensive optimization of response time performance. The techniques in use today have resulted from good engineering judgment and limited experimentation with the actual systems. The success of the multiple-access computing concept speaks well for the ingenuity of the pioneers of the field. But with such large and complex systems there is great value in supplementing the designer's intuition with whatever theoretical and experimental evidence can be made available. Experiments are so expensive and difficult to control, however, that a purely experimental approach is not likely to establish the general principles that the designer may seek. The study of a theoretical model can lead to such results, as engineers are well aware. This report has this objective in mind. In Chapters V and VI we will consider a theoretical model of scheduling in a time-shared computer. This model will include a wider class of scheduling decision processes than have been used in practice. By an optimization procedure which is established in Chapters III and IV, we are able to determine some general principles governing the optimum control of queues in such computer systems.

1.3 ILLUSTRATIVE ANALYSIS OF A SIMPLE PROBLEM

The nature of the theory of queues is best illustrated by outlining the solution of a simple queueing problem. This will also permit introducing some fundamental concepts to be used later. Moreover it will provide background for contrasting the analysis approach with the optimization approach which is the principal technique of this study.

1.3.1 Problem Description

Consider a service system in which there are two distinct sources for arrivals. Each source originates only one arrival at a time, and no more until service of its preceding arrival is completed. The arrivals from both sources compete for use of a single service facility on a first-come, first-served basis. A queue will exist if one source generates an arrival when the server is already occupied. Figure 1.4 depicts the situation. Classically this is viewed as a "repairman" problem, where the sources are machines which fail and the service facility is a repairman. See Takacs [10] for a more general treatment of this type of problem. Let us assume for this illustration that the service time for each arrival is always of one unit duration. We are also concerned with the time interval t_a which elapses between the end of service of an arrival from one source and the time when it generates its next arrival. Let us assume that this interval is a random variable. Moreover, assume that the two sources have the same statistical properties for t_a, and that

[Figure: two arrival sources feeding a single queue and server; completed units depart.] Fig. 1.4. Representation of a simple queueing problem.

separate observations of t_a are mutually independent, and drawn from a common distribution. This distribution is assumed to be described by the "negative exponential" probability distribution function

Pr(t_a ≤ T) = 1 - e^{-λT},    0 ≤ T < ∞    (1.1)

Here 1/λ is the mean value of t_a. The negative exponential distribution plays a very prominent role in queueing theory. This stems in part from the "memoryless" property which it alone possesses of all possible probability distribution functions. This property is evident from the conditional distribution function

Pr(t_a ≤ τ + T | t_a > τ) = [e^{-λτ} - e^{-λ(τ+T)}] / e^{-λτ} = 1 - e^{-λT} = Pr(t_a ≤ T),    τ, T ≥ 0    (1.2)

This result states that at any time τ after the interval t_a commences, the remaining time in the interval has the same distribution as the entire interval. Thus τ is irrelevant, and the distribution is said to be memoryless with respect to τ. As this system operates, there is a possibility at any time of there being 0, 1, or 2 arrivals present for service. If there are 2, one arrival must be waiting in queue. If there are 0 or 1, then at least one source is in process of generating a new arrival. It is convenient to say the source is "idle" in this case.
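The memoryless property (1.2) is easy to confirm numerically. The following sketch uses illustrative values λ = 0.5, τ = 2, T = 1.5 (any nonnegative choices would do) and estimates both sides of Eq. (1.2) by Monte Carlo:

```python
import math
import random

lam, tau, T = 0.5, 2.0, 1.5          # illustrative parameter values
rng = random.Random(7)
samples = [rng.expovariate(lam) for _ in range(200_000)]

# Left side of (1.2): Pr(ta <= tau + T | ta > tau)
survivors = [s for s in samples if s > tau]
p_cond = sum(s <= tau + T for s in survivors) / len(survivors)

# Right side of (1.2): Pr(ta <= T)
p_uncond = sum(s <= T for s in samples) / len(samples)

print(p_cond, p_uncond, 1 - math.exp(-lam * T))   # all near 0.528
```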

The classical problem of queueing analysis is to determine the probabilities P₀, P₁, and P₂ which govern the number of arrivals present at any time. Generally these probabilities will change with time due to the influence of the initial condition of the system. In a great many problems of interest, however, the probabilities approach constant values as time proceeds. This is called the "steady state" or "statistical equilibrium." The values P₀, P₁, and P₂ are then called the "steady state" or "equilibrium" probabilities.

1.3.2 Analysis

As a means of finding P₀, P₁, and P₂, let us first consider quite specific instants of time during the system operation, namely the successive instants when a service is completed. Figure 1.5 depicts a sample of the time behavior of the system, and the points mentioned. Immediately after a service completion, the number of arrivals present can only be either 0 or 1. Consider any two successive instants. Given the number present at the first point,¹ we can readily determine the probability of 0 or 1 present at the second. For example, if 1 is present at the first, then the second point occurs one time unit later. The probability that one arrival is present then is simply the probability that an arrival occurs from the idle source during the unit interval.

¹Although the phrase "at the first point" or "at the second point" is used, we mean of course immediately after the time point. As shown in Fig. 1.5 there is a step change in the number present at the time points in question.

Fig. 1.5. A sample of system behavior showing service completion times. (The number present is plotted against time, with the instants of successive service completions marked.)

The memoryless property of the negative exponential distribution on ta now enters to assure us that we need not know how long the source has been idle. The probability that ta ends in any unit of time is always 1 - e^{-λ}. Thus we have derived a "transition" probability

    p_{11} = 1 - e^{-λ}   (1.3)

where the subscripts indicate the given number of arrivals at the first instant, and the observed number at the second, respectively. On the other hand, no arrivals will be present at the second time point if the idle source does not generate a service demand. The probability of this event gives

    p_{10} = e^{-λ}   (1.4)

Now when no arrivals are present at the first time point considered, the second completion occurs one time unit after the next arrival for service. At the second point there will be one arrival present provided a second arrival occurs during the unit interval where the next arrival is in service. Therefore

    p_{01} = 1 - e^{-λ}   (1.5)

and further

    p_{00} = e^{-λ}   (1.6)

The number of arrivals present can be defined as the "state" of the system at the service completions. With respect to this definition of state, the system has the Markov property at these time points. This means that the known state at any such time is sufficient to determine the probability of the state at any future one of these time points.

The states of the system at successive service completions then constitute a stochastic process known as a Markov chain.1 Because of the properties of the system we have described, one finds a set of asymptotic values, denoted π_1 and π_0, which are the equilibrium probabilities of states 1 and 0 at a service completion. These satisfy the system of equations

    π_0 = π_0 p_{00} + π_1 p_{10}
    π_1 = π_0 p_{01} + π_1 p_{11}   (1.7)

Thus we find that

    π_0 = e^{-λ}
    π_1 = 1 - e^{-λ}   (1.8)

With this result the probabilities P_0, P_1, and P_2 can be found. Intuitively,2 they represent the average fraction of time that the system spends in states 0, 1, 2, respectively, throughout its history of operation. But they also are the average fraction of time the system spends in these states during an interval between successive completions, after the system has attained statistical equilibrium. At the first completion defining such an interval, the system is in state 0 or 1 with probabilities π_0 and π_1 respectively. By considering the random events during such an interval, conditional on states 0 or 1 at its beginning,

1Also called an "imbedded" Markov chain for systems such as treated here, because only specific points in time are included.

2The truth of this can be rigorously established by considering in detail the ergodic properties of the system.

one can evaluate the expected time spent in states 0, 1, and 2 before the end of the interval. Then

    P_i = (Expectation of time spent in state i between successive completions) / (Expectation of total time between successive completions)   (1.9)

for i = 0, 1, 2. For example, we obtain

    P_0 = (π_0/2λ) / (π_0/2λ + π_0 + π_1)   (1.10)

where 1/2λ is the expectation of time spent in state 0, conditional on state 0 at the first service completion considered. But π_0 + π_1 = 1, and so

    P_0 = π_0 / (π_0 + 2λ) = e^{-λ} / (2λ + e^{-λ})   (1.11)

A similar consideration leads to

    P_1 = 2(1 - e^{-λ}) / (2λ + e^{-λ})   (1.12)

and

    P_2 = (2λ - 2(1 - e^{-λ})) / (2λ + e^{-λ})   (1.13)

Knowledge of P_0, P_1, P_2, π_0, and π_1 enables one to determine a number of other statistical properties of system behavior. For example, the expected number of arrivals present for service at any point in time is (P_1 + 2P_2). The important point to bear in mind is that analysis consists of evaluating the behavior of a completely specified system. The

general objective of analysis is to determine the equilibrium probabilities on the possible states of the system of interest.
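The formulae of this section are easy to evaluate. The sketch below (Python, with an arbitrarily chosen rate λ) computes the imbedded-chain probabilities of Eq. (1.8), verifies that they satisfy Eq. (1.7), and evaluates P_0, P_1, P_2 from Eqs. (1.11) through (1.13), confirming that they sum to one:

    import math

    lam = 0.5                          # arbitrary arrival rate of each source

    # Imbedded-chain transition probabilities, Eqs. (1.3)-(1.6)
    p00 = math.exp(-lam); p01 = 1 - math.exp(-lam)
    p10 = math.exp(-lam); p11 = 1 - math.exp(-lam)

    # Equilibrium probabilities of the imbedded chain, Eq. (1.8)
    pi0, pi1 = math.exp(-lam), 1 - math.exp(-lam)
    assert abs(pi0 - (pi0 * p00 + pi1 * p10)) < 1e-12   # first line of Eq. (1.7)

    # Time-average state probabilities, Eqs. (1.11)-(1.13)
    denom = 2 * lam + math.exp(-lam)
    P0 = math.exp(-lam) / denom
    P1 = 2 * (1 - math.exp(-lam)) / denom
    P2 = (2 * lam - 2 * (1 - math.exp(-lam))) / denom
    assert abs(P0 + P1 + P2 - 1) < 1e-12

    print("expected number present:", P1 + 2 * P2)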

CHAPTER II

OPTIMIZATION IN THE CONTROL OF QUEUES

This chapter has two objectives. First, a brief review will be given of some results from the classical theory of queues. These are aimed at optimizing the control of queues and illustrate the limited results obtained from closed form analysis. Second, the general optimization model to be studied in this research is introduced. Previous research in this area is then reviewed, and the new theoretical results obtained here are summarized.

2.1 RESULTS OBTAINED FROM ANALYSIS

2.1.1 Single Queue Systems

One of the simplest systems in structure is that where arrivals form a single queue in the service system. Generally arrivals are presumed to be a Poisson process and we limit discussion to this case. Then the interval between successive arrivals has the negative exponential distribution. It is important to distinguish two cases. On the one hand, arrivals may issue from an unlimited source. With an unlimited source the mean rate of arrivals is independent of the number present for service. On the other hand, arrivals may arise from a finite source as in the simple example given in Section 1.3. Here the mean rate of arrivals diminishes as the number waiting for service increases.

Different aspects of system behavior may be of interest in determining performance, but a primary concern is often the expectation of the number of arrivals in queue at an arbitrary time. The system designer might like to institute a queue control procedure which reduces this expectation below that of the first-come, first-served procedure. However, there is a rather widely accepted principle defining a class of procedures which do not exhibit such improvement. Paraphrasing Morse ([7], p. 116): all single server systems having the following properties have equal expectation of the queue length. The properties are:

1. There is a common distribution applying to the service time of all arrivals.

2. All arrivals remain in queue until served.

3. Every arrival remains in service until completed.

4. The service system is not allowed to be idle when arrivals are waiting.

This class of controls includes in particular the first-come, first-served process; random selection for service; and last-come, first-served selection. The author is not aware of any rigorous mathematical proof of this claim in the generality with which it is stated. However, for the three before-mentioned queue disciplines the result has been proven by separate analyses of each case, assuming arrivals from an unlimited source. In regard to random service the analysis has been carried out

only for exponential service times and for constant service time at a single server (see Saaty, p. 243). For first-come, first-served and last-come, first-served, the analyses have been done for service times having an arbitrary distribution with finite expectation. In all cases it has also been shown that the expected time-in-system of a random arrival is unchanged.

In fact, the principle may hold under other conditions than those given. This is suggested by an analysis given recently by Kleinrock [38, 47] for the round-robin strategy. Kleinrock uses a discrete-time formulation in which a time step has duration q, the length of the quantum of service time allocated to each arrival in queue. Bernoulli arrivals are defined for the process, where at each time step there is probability λq of an arrival, and at most one arrival may occur. The service time is assumed to have a geometric distribution, the discrete-time analog of the negative exponential, and in each time step there is probability σ that service continues to the next step. Kleinrock proves that the expected time-in-system is identical for the round-robin strategy and for the first-come, first-served procedure. Moreover he derives an expression for the conditional expectation of the time-in-system, given an arrival requires nq units of service time for completion. This result is

    T_n = nq/(1 - p) + ···   (2.1)

where

    a = ···,   p = λq/(1 - σ)

Figure 2.1 is a graph of T_n for one set of values of σ, λ, and q. Note in particular that the conditional expectation for nq = q/(1 - σ) is the same as the expected time-in-system for the first-come, first-served case. This value of nq is the average service time, and this result is generally true. For smaller values of n, however, T_n is less, and for larger values of n the converse is true. Thus a round-robin strategy favors arrivals having smaller service time than the average, at the expense of increasing the time-in-system of arrivals requiring more service time than the average.

Coffman and Krishnamoorthi [48] have given a more general analysis of the round-robin strategy for the case of a finite arrival source. Their model treats the geometric distribution on service time, but allows arrivals to occur as a continuous time process. Moreover the model includes a constant time for swapping jobs, included in every quantum. Coffman and Krishnamoorthi approach the system in basically the same manner as used for the system of Section 1.3. By considering the number of arrivals present at the end of every quantum of service they establish an imbedded Markov chain. Formulae are derived for the probabilities π_i that i arrivals are waiting at the end of a quantum. The mean and variance of the number waiting is also given. However there is no derivation equivalent to Kleinrock's for T_n, the expected time-in-system for a job requiring n quanta to complete. This is more difficult to obtain for the limited source model.

Fig. 2.1. Graph of expected time-in-system for round-robin service. (Round-robin and first-come, first-served curves of T_n against n, the required number of quanta of service; parameter values λ = 0.01, q = 1.00, σ = 0.95, p = 0.20.)
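The behavior shown in Fig. 2.1 can also be reproduced by direct simulation of Kleinrock's discrete-time model. A rough sketch follows (Python; the parameter values are those of Fig. 2.1, while the run length and the sampled values of n are arbitrary); it estimates the mean time-in-system conditional on the required number of quanta, under both disciplines:

    import random
    random.seed(1)

    lam, q, sigma = 0.01, 1.0, 0.95     # the parameter values of Fig. 2.1
    STEPS = 2_000_000                   # arbitrary run length

    def simulate(discipline):
        queue, done = [], {}            # jobs are [arrival_time, remaining, total]
        for t in range(STEPS):
            if random.random() < lam * q:            # Bernoulli arrival
                n = 1
                while random.random() < sigma:       # geometric number of quanta
                    n += 1
                queue.append([t, n, n])
            if queue:
                job = queue.pop(0)
                job[1] -= 1                          # one quantum of service
                if job[1] == 0:
                    done.setdefault(job[2], []).append(t + 1 - job[0])
                elif discipline == "RR":
                    queue.append(job)                # rejoin at the end of the queue
                else:
                    queue.insert(0, job)             # FCFS: retain the server
        return done

    for disc in ("RR", "FCFS"):
        done = simulate(disc)
        for n in (5, 20, 60):
            obs = done.get(n, [])
            if obs:
                print(disc, "n =", n, "mean time-in-system:", sum(obs) / len(obs))

With these parameters the two estimates should roughly agree near n = 20 quanta, with round-robin below first-come, first-served for smaller n and above it for larger n.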

Analyses like the foregoing suggest that the system designer must turn to another class of control procedures to reduce the expected time-in-system. This leads to consideration of procedures where distinctions between arrivals are allowed on the basis of their service time. Such procedures give rise to multiple queue systems with priorities.

2.1.2 Multiple Queue Systems

One of the noteworthy general results of the theory of queues was obtained by Cobham [49] and Holley [50] in treating a priority scheduling procedure. As depicted in Fig. 2.2 they assumed Poisson arrivals from an infinite source to each of R queues, serviced by a single facility. Service is non-preemptive in that any service, once begun, proceeds uninterrupted until it is completed. Priority service means that whenever the facility is vacated, the next arrival served is from queue p only if queues 1, 2, ..., p-1 are empty and p is not. The earliest arrival is always taken from each queue. The priority 1 queue is always served unless empty. Arrivals of each priority class have a common service time distribution which may be arbitrary.

Cobham's principal result was a general formula for the mean time spent in queue by a job of any priority class. This enables a system designer to investigate the reduction of waiting time by priority assignment. Cobham also gave a similar result for the case of multiple parallel servers, and a negative exponential distribution on service time common to all arrivals.

Fig. 2.2. Priority service on a single facility. (Poisson arrivals at mean rates λ_1, ..., λ_R feed R priority queues served by a single server, with departures upon completion.)
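Cobham's formula for the mean queueing time of a priority-p arrival, in its now-standard single-server form, is W_p = W_0 / ((1 - σ_{p-1})(1 - σ_p)), where W_0 is the mean residual service work and σ_p is the combined utilization of classes 1 through p. A minimal evaluation sketch (Python; the rates and service moments are illustrative values only, not taken from any system in this report):

    # Mean queueing time for non-preemptive priority classes (Cobham's result).
    # lam[i], ES[i], ES2[i] are the arrival rate, mean service time, and second
    # moment of service time for class i+1; all values here are made up.
    lam = [0.1, 0.2, 0.3]
    ES  = [1.0, 1.0, 1.0]
    ES2 = [2.0, 2.0, 2.0]       # e.g., exponential service: E[S^2] = 2 E[S]^2

    W0 = 0.5 * sum(l * s2 for l, s2 in zip(lam, ES2))    # mean residual work
    rho_cum = 0.0
    for p in range(len(lam)):
        rho_prev = rho_cum
        rho_cum += lam[p] * ES[p]
        Wp = W0 / ((1.0 - rho_prev) * (1.0 - rho_cum))   # Cobham's formula
        print("priority class", p + 1, "mean wait in queue:", Wp)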

Cobham's result has been extended to a system with an infinite number of priority levels by Phipps (Saaty [8], p. 236). This makes it possible to study priority scheduling based on the service time of arrivals. Assuming the service time of every arrival is known exactly, it follows that the expected time-in-system is minimized when the arrival of shortest service time has highest priority (see [3,38,51,52]). In other words, "shortest-job-first" describes the rule for optimizing the expected time-in-system when each arrival is served to completion and service times are known exactly. This holds moreover for any distribution of service times over the population of arrivals.

Rather than minimize the expected time-in-system it may be desirable to treat a weighted expectation. Here assume that the population of arrivals can be separated into classes based on the required service time, denoted by a, and a quantity p. The latter is a "loss rate" by which the time-in-system is to be weighted to yield a loss in servicing each arrival. The expected total loss of a random arrival is then minimized by giving highest priority to the arrival in queue having the largest value of p/a (see Refs. 3, 38, 51, 52). Again it is assumed in this result that p and a are known exactly, and that a service once begun is not interrupted until completed.
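The weighted rule just stated is easy to illustrate in the simplest setting: for a fixed set of arrivals served back-to-back, ordering by decreasing p/a minimizes the total of p-weighted times-in-system over all possible orderings. A brute-force check (Python; the values of a and p are arbitrary):

    from itertools import permutations

    # (a, p) pairs: required service time and loss rate; arbitrary values.
    jobs = [(3.0, 1.0), (1.0, 2.0), (2.0, 0.5), (4.0, 3.0)]

    def total_loss(order):
        t, loss = 0.0, 0.0
        for a, p in order:
            t += a              # completion time when jobs are served back-to-back
            loss += p * t       # time-in-system weighted by the loss rate
        return loss

    best = min(permutations(jobs), key=total_loss)
    rule = tuple(sorted(jobs, key=lambda j: j[1] / j[0], reverse=True))
    assert total_loss(best) == total_loss(rule)     # largest p/a first is optimal
    print(rule, total_loss(rule))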

Another type of priority mechanism has been treated by Jackson [53] and Kleinrock [38]. Jackson, who terms his a "dynamic priority," derives upper bounds on the expected waiting times and thus compares his control process with first-come, first-served and static priority schemes as treated by Cobham. Only non-preemptive service is treated. Kleinrock treats a delay-dependent priority system which generalizes one aspect of Jackson's approach.

Other consideration has been given to preemptive priority scheduling under static priority assignments. Saaty [8], p. 237 ff, gives a review of such studies. Generally they are limited to requiring a negative exponential service time distribution common to all arrivals, and negligible time involved in the act of preemption.

The priority methods which have been studied have been shown to give the system designer some freedom in manipulating the expectation of waiting times. But only in special cases, as we have described, has a priority control process been proven optimal.

2.1.3 Experimental Analysis and Simulation

An immediate and correct conclusion from the preceding discussion is that the closed form analytical results of queueing theory will not be sufficient to cope with the analysis and optimization of many practical systems. Where these results are insufficient they nevertheless form valuable background from which the system designer can proceed to carry out experimental analysis and simulation of proposed system operation. By experimental analysis we mean the process of formulating a mathematical model which can be rapidly solved numerically by a digital computer. The most convenient and powerful class of models for experimental analysis is that of Markov stochastic processes. For such models, very

rapid numerical procedures have been devised (see [54], for example), and used to explore many aspects of behavior in a queueing model [55]. Simulation, by contrast, consists of developing a descriptive functional model of the physical system in a computer program. Execution of this program allows one to observe and record the actual operation of the simulation model and thereby deduce properties of the operation of the physical system. Simulation is a useful tool because a much wider range of physical systems can be treated, but it is more expensive to use than experimental analysis.

Experimental analysis and simulation are fruitful for investigating complex control procedures, provided the permissible control processes can be restricted to a few alternatives by virtue of economic or practical reasons. Often there is a very large number of acceptable control processes, and it is not feasible to analyze and compare each of these separately. The system designer needs to supplement the available tools of analysis and simulation with an optimization technique. This technique should allow formulating a set of alternative control procedures which reasonably represents the possibilities in a physical system. The technique should also include an efficient algorithm which will systematically synthesize a control process attaining optimum performance. These objectives are largely realized by the techniques of Markov decision theory. The remainder of this chapter will describe a particular Markov decision process to be studied in the following chapters.

2.2 A MARKOVIAN DECISION PROCESS

This section defines a sequential decision process by which one can treat optimal control of a class of stochastic systems. Following the definition of the process, the remainder of the chapter will show how this decision process relates to other work and will review the contributions made in this report to the theory and application of such processes.

The widespread application which Markov stochastic processes receive in the analysis of queues has already been pointed out. It seems quite likely that Markov decision processes will be equally versatile in the optimization of service systems. Since one cannot anticipate all problems to which the formulation may be applied, an attempt is made to keep some generality in the abstract definition. The particular model we define is more precisely described as a "discrete Semi-Markov decision process", as we will explain.

2.2.1 System Behavior

The decision process involves a system which exists at any time in one of a finite number of states. We identify each of the states by an integer index i and let N denote the total number of states. Thus i may take on values 1, 2, ..., N. As time proceeds the system will experience random changes or transitions in state. A change of state may only occur at an integer value of time. For this reason time essentially proceeds in steps of one unit duration, and we say that we have a discrete-time process.

The system remains in a given state for a random number of time steps. At the end of such an interval it moves immediately to a new state, determined by a random mechanism. We allow however that the new state may be the same as the old, i.e., the system may make a transition into the same state. As used here, a "transition" includes the time interval wherein the system remains in a known state and ends with the subsequent random movement to the next state. Figure 2.3 depicts this behavior. In describing this behavior we will often say that a transition "begins in state i and ends in state j", or is "from state i to state j". For example, the first transition depicted in Fig. 2.3 begins in state 3 and ends in state 4.

It need not be assumed, as shown in Fig. 2.3, that the origin of time is the beginning of a transition. The system state at the beginning of the first transition, as well as its starting time relative to the time origin, may be specified by a joint probability distribution.

A sequential decision process is associated with the system behavior because a control action must be taken at the beginning of each transition. The control action is taken instantaneously upon entering the new state. The state is assumed to be observable, and the observation is used in determining the control action to be taken. For each observed state, there is a finite set of alternative control actions. For state i, this set is denoted by A_i and the total number of different actions by the integer K_i. An integer index k is used to identify each action, so that A_i consists of the set of integers k = 1, 2, ..., K_i.

Fig. 2.3. Example of changes of state during the decision process. (State plotted against time, with the initial points of transitions marked; note the transition into the same state.)

The probabilistic mechanism governing each transition in state is assumed to be Markovian with respect to the state observed at the beginning of the transition and the control action taken. This means that the only historical knowledge required to describe a transition is the current state i and the action k taken upon entering i. The probabilistic behavior is also assumed to be stationary, so that given i and k, one need not know the time at which a transition begins. The probability that a transition ends with the system entering state j is dependent only on i, j, and k, and is denoted by p^k_{ij} (here k is a superscript). Further, the probability that the transition interval is T units long (T = 1, 2, 3, ...) depends only upon i, j, k, and T, and is written f^k_{ij}(T). Every transition must be at least one time step in duration. The joint probability that a transition has duration T and terminates in state j, given that it began in state i with action k, is written1

    q_{ij}(T|k) = p^k_{ij} f^k_{ij}(T)   (2.2)

The following conditions must be true for the permissible values of i, j, k, and T:

    p^k_{ij} ≥ 0   (2.3)

    Σ_{j=1}^{N} p^k_{ij} = 1   (2.4)

1The use of k as a superscript in defining some quantities stems from the usage established by Howard [63,74] and Jewell [73]. Otherwise we follow the convention indicated in Eq. (2.2).

    f^k_{ij}(T) ≥ 0   (2.5)

    Σ_{T=1}^{∞} f^k_{ij}(T) = 1   (2.6)

The expectation of the duration of a transition from state i when action k has been taken is denoted ν^k_i and satisfies

    ν^k_i = Σ_{j=1}^{N} Σ_{T=1}^{∞} T p^k_{ij} f^k_{ij}(T)   (2.7)

It will be assumed that f^k_{ij}(T) is such that ν^k_i is finite for all permissible actions k and all states i. This assumption is denoted by the statement

    ν^k_i < ∞ for all i, k.   (2.8)

2.2.2 Control Policies and Returns

A control policy or strategy is the means of specifying which control action is to be taken at the beginning of each transition. Given an observed state i, one can in general select a control action by a random choice among the actions permitted for that state, the set A_i. Moreover the probability governing the random selection of a given action k may depend explicitly on time.

The control policy is especially simple however if it is stationary and deterministic. In this case a unique action is specified for each state, and it is to be taken at any time a transition begins in that state. Such a policy is simply described by a table which lists the

control action associated with each of the possible states of the system. See Fig. 2.4. One objective of this report is to determine sufficient conditions for which one need only consider such policies in attempting to optimize system performance.

In the formulation we are now defining, the performance of the system during a transition interval is measured by a "reward" or "return" received as time proceeds. The return received in each unit time step during a transition can be arbitrary, except for the following limitation: the dependence of the return on the past history of the system operation may be described knowing only the state and the action taken at the beginning of the transition. In other words, the return has the Markov property. The return may additionally depend upon the end state j of the transition and the duration T of the transition interval. Thus the return received for the mth unit time step during a transition may be denoted r^k_{ij}(m|T). We assume that r^k_{ij}(m|T) = 0 when m > T for all i, j, k, T, so that a transition will contribute a return only while it is in progress.

Later in this work we are interested in the expectation of the return contributed by a transition at the mth unit time step after it begins. For a transition from state i with action k, this expectation is denoted u(m|i,k) and

    u(m|i,k) = Σ_{j=1}^{N} Σ_{T=m}^{∞} p^k_{ij} r^k_{ij}(m|T) f^k_{ij}(T)   (m = 1, 2, 3, ...)   (2.9)

    Index i of Observed State    Index k of Action to be Taken
    1                            2
    2                            1
    3                            2
    4                            3
    5                            1
    6                            2
    7                            3
    8                            1

Fig. 2.4. Example of tabular representation of a stationary deterministic control policy.

This definition accounts for the possibility that the transition ends before m steps have elapsed, in which case no return would be received then from the transition. The expectation of the total return from a transition, called hereafter the one-transition return, is denoted as b^k_i. This is the sum of the expected returns at every unit step after the transition begins, so

    b^k_i = Σ_{m=1}^{∞} u(m|i,k)   (2.10)

We will assume for convenience in obtaining the results of Chapters III and IV that the series in Eq. (2.10) is absolutely convergent, i.e.,

    Σ_{m=1}^{∞} |u(m|i,k)| < ∞ for all i and k.   (2.11)

An example will be given in Section 4.2, Chapter IV, which demonstrates that this is not a necessary condition for these results.

The complete specification of the decision process requires giving the values of N, K_i, p^k_{ij}, f^k_{ij}(T), and r^k_{ij}(m|T) for all i, j, k, T, m. With this information one can proceed to consider optimization of the process. It will be seen later that N, K_i, p^k_{ij}, ν^k_i, and b^k_i suffice for the optimization problem of principal interest in service systems.
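In computational terms, the specification just listed amounts to a few arrays. The sketch below (Python; a made-up two-state, two-action process, with the duration distributions truncated at an arbitrary horizon and, purely for brevity, taken independent of the end state j, which the formulation does not require) evaluates ν^k_i and b^k_i from Eqs. (2.7), (2.9), and (2.10). With the flat unit return used here, b^k_i reduces to ν^k_i, which serves as a check:

    # A made-up 2-state, 2-action discrete semi-Markov decision process.
    N, K, Tmax = 2, 2, 4            # states, actions per state, truncation horizon

    # p[i][k][j]: end-state probabilities (each row sums to one, Eq. (2.4)).
    p = [[[0.7, 0.3], [0.2, 0.8]],
         [[0.5, 0.5], [0.9, 0.1]]]

    # f[i][k][T-1]: duration distribution; taken independent of the end state j
    # here only to shorten the example.
    f = [[[0.5, 0.5, 0.0, 0.0], [0.2, 0.3, 0.3, 0.2]],
         [[1.0, 0.0, 0.0, 0.0], [0.1, 0.4, 0.4, 0.1]]]

    def r(i, k, j, m, T):
        return 1.0 if m <= T else 0.0    # a flat unit return while in progress

    for i in range(N):
        for k in range(K):
            # Eq. (2.7): expected transition duration
            nu = sum(T * p[i][k][j] * f[i][k][T - 1]
                     for j in range(N) for T in range(1, Tmax + 1))
            # Eqs. (2.9) and (2.10): one-transition return
            b = sum(p[i][k][j] * r(i, k, j, m, T) * f[i][k][T - 1]
                    for j in range(N) for T in range(1, Tmax + 1)
                    for m in range(1, T + 1))
            print("state", i + 1, "action", k + 1, "nu =", nu, "b =", b)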

2.2.3 Optimization

The performance of the system over a period of time is measured by the sum of the separate returns for the unit time steps in the interval. The expectation of the total return will depend upon the initial state and time at which the process begins, and the control policy which is used.

There are several optimization problems which are pertinent to the background and application of the decision process. It is convenient now to characterize the optimal policy for each case. However the significance of these definitions can be questioned at this point on mathematical grounds. The required mathematical properties will be established in Chapters III and IV.

Let the symbol π denote any one of the set of admissible control policies. A precise specification of this set will be given in Chapter III. Assume for the present that the decision process begins at time 0, with a transition beginning in state i (i = 1, 2, ..., N). Define R_n(i,π) as the expectation of the return received at the nth time step of the process with this starting condition when policy π is used. Then letting V_T(i,π) denote the expectation of the total return of the process at time T, one has

    V_T(i,π) = Σ_{n=1}^{T} R_n(i,π)   (2.12)

A worthwhile optimization problem arises with the above definitions.

Finite-Time Optimization: For given i and T, find the policy π which yields a maximum value for V_T(i,π).

If one considers an infinite interval of time (T → ∞), corresponding to indefinite operation of the system, V_T(i,π) will not in general be

finite for any admissible policy. One cannot define optimization in the same way as the finite time case, and still have a meaningful problem. It is natural to inquire in this case whether the average expectation per unit time

    A_T(i,π) = V_T(i,π)/T   (2.13)

remains finite as T → ∞. Generally this is true for every policy.1

It is very tempting here to suggest defining optimization as producing a maximum value for the limit of A_T(i,π) as T → ∞. For an arbitrarily chosen policy however a limit may not exist. Rather A_T(i,π) may oscillate within finite bounds for increasing values of T.2 Nevertheless there will be points within the oscillation interval such that within any distance of these points there are infinitely many points of the sequence {A_T(i,π), T = 1, 2, ...}. Consider the inferior limit3 or lim inf of the sequence. Only a finite number of the values A_T(i,π) are strictly less than the lim inf. It is therefore consistent to define the following problem.

Infinite-Time Optimization: Find the policy π which yields a maximum value of

    g(i,π) = lim inf_{T→∞} V_T(i,π)/T   (2.14)

1See Chapter III, Section 3.2.4.

2The function sin(πT/2) (T = 1, 2, 3, ...) gives an example of such behavior. Its bounds are +1, -1 of course.

3See [56], p. 41-42.

Finally there are circumstances, notably in applications to economics and management science, where one wishes to consider infinite-time optimization but with "discounting" of the return received in the distant future. Discounting reduces the value of the return received at the nth time step by a factor α^n, where 0 < α < 1. The expectation of the total return with discounting, conditional on the initial state i, is

    D_α(i,π) = Σ_{n=1}^{∞} α^n R_n(i,π)   (2.15)

We then define the following problem.

Discounting Optimization: For given i and α, find the policy π which yields a maximum value of D_α(i,π), 0 < α < 1.

The return function with discounting, D_α(i,π), can also be viewed as the generating function1 of the sequence of unit time returns R_n(i,π). This interpretation is more appropriate to its use in this study. Moreover, by substituting z = 1/α, the return function with discounting is seen to be the z-transform2 of the time sequence of unit step returns. Hence, although discounting has a physical interpretation, it is also a well known mathematical approach to discrete-time problems. We will continue to use the term "discounting" because of its prevalence in the background of the Markov decision processes.

1See Feller [4], p. 248.

2See Kaplan [57], p. 75.
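For the special case of constant unit transition times taken up in Section 2.3, the discounted return of a fixed stationary deterministic policy satisfies a simple linear system: writing P for the policy's transition matrix and b for its vector of expected one-step returns, v = b + αPv. A minimal evaluation sketch (Python with numpy; P, b, and α are made up, and the convention of when the first discount factor applies is immaterial to the point):

    import numpy as np

    # A fixed stationary policy with unit transition times: P is its transition
    # matrix and b its vector of expected one-step returns (made-up values).
    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])
    b = np.array([1.0, 5.0])
    alpha = 0.95

    # The discounted return solves v = b + alpha P v.
    v = np.linalg.solve(np.eye(2) - alpha * P, b)

    # Cross-check by summing the discounted series to a long horizon.
    v_series, M = np.zeros(2), np.eye(2)
    for _ in range(2000):
        v_series += M @ b
        M = alpha * (M @ P)
    print(v, v_series)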

2.2.4 System Behavior as a Stochastic Process

This report will deal exclusively with the decision process we have just defined. There are other types of Markov decision processes to be found in the literature, which is reviewed in the next section.

When the control policy to be used has been specified, the state of the system as a function of time can be viewed as a stochastic process. When also the control policy is stationary and deterministic, the stochastic process is most easily described. Letting k_i denote the action specified for state i by the policy, the pertinent data are the transition probabilities p^{k_i}_{ij} and the transition time probabilities f^{k_i}_{ij}(T).

Depending upon the random variables of interest, there are several stochastic processes associated with the system. Consider the random variable X_m,

    X_m = the observed state of the system at the beginning of the mth transition of the process, m = 1, 2, ....

The mth transition may begin at any unit time step ≥ m-1. The sequence of random variables {X_m} is a Markov chain with transition probabilities p^{k_i}_{ij} (see Pyke [58], Lemma 3.1, p. 1233).

Next consider the random variable Y_n,

    Y_n = the observed state of the system at unit time n (n = 0, 1, 2, ...).

The sequence of random variables {Y_n} is a Semi-Markov process ([58], Definition 3.5, p. 1234). It is fundamentally determined by the transition distribution q_{ij}(T|k_i); see Eq. (2.2). A Semi-Markov process is therefore a Markov chain with the added complexity which allows transitions to be of random duration. The

Markov chain {X_m} is called the underlying Markov chain of the Semi-Markov process {Y_n}. The contrast between Y_n and X_m is illustrated in Fig. 2.5.

An additional stochastic process is obtained by considering the random variables

    U_j(T) = the number of times state j has been entered up to and including time T (T = 1, 2, 3, ...)

and the vector random variable

    U(T) = (U_1(T), U_2(T), ..., U_N(T)).

U(T) is a Markov-Renewal process ([58], p. 1234). Pyke has shown that the Semi-Markov process and the Markov-Renewal process are equivalent ways of looking at the same stochastic phenomenon. It will be fruitful at one point or another to view the decision process via each of these stochastic processes.

Note that the problem treated in Section 1.3 is a Semi-Markov process, but a continuous-time rather than a discrete-time process.

2.3 BACKGROUND ON MARKOV DECISION PROCESSES

A special case of the Semi-Markov decision process arises when all transition intervals are of constant one unit duration. Most of the past efforts have concentrated on this case.

The earliest reported treatment of such a decision process over infinite time is due to R. Bellman [59]. Considering only stationary

Fig. 2.5. Behavior of a Semi-Markov process and its associated Markov chain. ((a) An 8-state discrete Semi-Markov process, state against elapsed time n; (b) the associated Markov chain, state against elapsed transitions m.)

deterministic policies, Bellman was able to show that the expected total return has a linear asymptotic behavior for large T, i.e., for T → ∞,

    V_T(i,π) → gT + v_i   (i = 1, 2, ..., N)

Here both v_i and g depend upon the policy π, but g is independent of i. By this result

    lim_{T→∞} A_T(i,π) = g

for all i. Bellman's proof requires however that every stationary policy have a positive transition probability matrix, i.e., p^k_{ij} > 0 for all i, j, and k. Later Chuang [62] extended Bellman's proof to the less restrictive case where some finite power of the transition probability matrix is positive. This means that the Markov chain describing system behavior under any stationary policy is irreducible and aperiodic (see [46], p. 256). Results of this kind are important in ascertaining whether a proposed decision process can be treated with the theory.

Bellman's and Chuang's results ensure that continued iteration by the dynamic programming algorithm [60,61] will converge to yield the optimal values of g and v_i. However a great many iterations may be necessary and it may be difficult to determine when convergence has occurred. This realization motivated R. Howard in his well known work [63]. Like Bellman, Howard considered only stationary, deterministic policies. Using the z-transform technique he established the linear asymptotic behavior for the above decision process under the most general conditions. This behavior is explicitly utilized in the computational technique now

known as the Howard algorithm for infinite time optimization. The Howard algorithm is much superior to the dynamic programming method, which Howard calls "value iteration". By contrast, Howard's method is a "policy iteration". This will be clarified in Chapter IV where Howard's method is introduced. Howard also considered a continuous-time decision process, and demonstrated the application of these processes to baseball strategies, automobile replacement, and taxicab operation.

Blackwell [64,65] and Derman [66,67] are apparently the first to have considered infinite time optimization without an ad hoc restriction to stationary, deterministic policies. Blackwell in [65], restricting consideration to deterministic policies, proves that discounted time optimization can be achieved with a stationary policy regardless of the properties of the transition probability matrices of admissible policies. This is shown to be true for all values of the discount factor 0 < α < 1. Derman [66] proves without restriction that there exists an optimal policy which is stationary and deterministic for infinite-time optimization.

In addition to the dynamic programming algorithm and the Howard algorithm, the Simplex method of linear programming has also been suggested for solving the Markov decision process with constant transition times [66,68,69]. In connection with Manne's paper, Wagner argues that an optimal policy is deterministic [70] by virtue of the properties of a linear programming solution. Also de Ghellinck [71] has suggested an algorithm based on manipulations of the graph depicting the possible transitions in state. This does not seem to be a feasible approach for

large systems however. Finally, White [72] has proposed an iterative algorithm which uses the recursive calculation of dynamic programming, but employing the linear asymptotic behavior to accomplish a substantial improvement.

The discrete Semi-Markov decision process which was defined in Section 2.2 is a special case of the process introduced recently by W. S. Jewell [73] and Howard [74]. In the process treated by Jewell, the transition time intervals need not have integer values, but may vary continuously. Jewell and Howard, again considering only stationary deterministic policies, extend Howard's algorithm to cover this decision process where transition times are of random duration. De Cani [75] has also treated the extension of Howard's algorithm for infinite-time optimization of this process.

Although much of the past development of Markov decision processes has been motivated by operations research and management science applications, significant contributions have also come from control systems engineering. Eaton and Zadeh [76] have developed the theoretical formulation for decision processes which have constant transition times and terminate by entry to an absorbing state. Florentin [77], Astrom [78], and Feldbaum [79] have also studied such processes, the latter two treating the case where the state of the system cannot be accurately determined for decision making. De Leve [80] has shown a formulation for a decision process with a continuous state variable, and an algorithm analogous to Howard's.
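The contrast between value iteration and policy iteration can be made concrete in the simplest setting, a discounted process with unit transition times; the semi-Markov form of the algorithm is developed in Chapter IV. The sketch below (Python with numpy; all data made up) alternates Howard's two steps, value determination and policy improvement, until the policy repeats:

    import numpy as np

    # Made-up data: P[k][i][j] and b[k][i] for two actions and three states.
    P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],
                  [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.2, 0.2, 0.6]]])
    b = np.array([[1.0, 0.5, 0.0],
                  [0.8, 0.9, 0.4]])
    alpha, n = 0.9, 3

    policy = np.zeros(n, dtype=int)      # initial policy: first action everywhere
    while True:
        # Value determination: solve v = b_pi + alpha P_pi v for this policy.
        P_pi = P[policy, np.arange(n)]
        b_pi = b[policy, np.arange(n)]
        v = np.linalg.solve(np.eye(n) - alpha * P_pi, b_pi)
        # Policy improvement: take the best action against v in every state.
        new = (b + alpha * (P @ v)).argmax(axis=0)
        if (new == policy).all():
            break
        policy = new
    print("optimal actions:", policy + 1, "values:", v)

Each pass solves one small linear system rather than iterating values toward convergence, which is the essential economy of the policy iteration method.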

The reported cases where Markov decision processes have been applied to practical problems are not extensive. Howard [63], Jewell [73], and de Cani [75] give numerical results for problems of baseball strategy, automobile replacement, and machine repair. Except for Howard's, these results are merely illustrative of the computations required. Howard treats the largest system, an automobile replacement model having 40 states. Derman [81,82] discusses the mathematical formulation for machine replacement problems, as does Manne [68] for an inventory control problem. A serious attempt to apply the method to business strategy over finite time was made by de Vries [83] using actual statistics on the machine tool industry.

The earliest suggestion that Markov decision processes would be of value in the design of service systems seems to have been made by Rutenberg [84]. Rutenberg limits the suggestion however to optimizing the economic return from systems in which the number of servers is varied, e.g., the checkout counters of a supermarket. Chuang [62] observes the general merit of the approach and, in particular, its potential value for computer scheduling.

2.4 OBJECTIVES AND RESULTS

The remainder of this report is devoted to development of the Markov decision process defined in Section 2.2 as a useful optimization technique. This development must deal with both the theory of the process and the considerations involved in applying it to a practical situation.

More specific objectives are outlined below.

2.4.1 Theory

In Chapters III and IV the decision process will be considered with a view to infinite-time optimization, with and without discounting. The process will be formulated in a very general way and consideration will begin at the most basic level. Using an approach contained in a rather abstract paper by Karlin [85], the existence of an optimal policy with discounting will be demonstrated. Having this result, the dynamic programming formulation may be derived. This is used to prove that a stationary policy exists which is optimal for any initial condition. It is then shown that an optimal policy is deterministic.

In Chapter IV, the method suggested by Derman [66] will be used to prove the principal theoretical result, establishing the existence of a stationary deterministic optimal policy, without discounting. In this proof, the results obtained in Chapter III are vital. Our consideration of discounted return is largely as an avenue to obtain the principal theorem of Chapter IV. The work of Howard [74] and Jewell [73] on a computational algorithm will then be reviewed to complete the theory. Some original results on the effect of round-off errors in Howard's algorithm are also presented.

Most of the theoretical results in Chapters III and IV have not been given before for this particular decision process. The need for these results can be pointed out solely with regard to the principal

theorem. One could of course assume, as other authors have, that only stationary, deterministic policies need to be considered. This assumption has appeal for two reasons. First, as a practical matter one might not consider implementing non-stationary or non-deterministic policies because of their additional complexity. Second, intuition may lead one to correctly surmise that the decision process must have an equilibrium behavior, in which a stationary policy is optimal. On the other hand, so long as the possibility exists that a non-stationary or non-deterministic policy might lead to improved performance, designers may be led to an erroneous justification for implementing such controls. Further, an ad hoc restriction to stationary deterministic policies in conducting a study is an unnecessary qualification on the results, based on our conclusion. Finally and most importantly, applying any mathematical theory while supposing facts not rigorously evident may lead to unperceived errors. The theory we present is necessary understanding for avoiding such pitfalls. Moreover the theory suggests that the principal result is more widely applicable than one might suppose on the basis of intuitive ideas. Several examples will be given to illustrate the complexity of systems covered by the theorems.

2.4.2 Application

An equally important contribution of this dissertation is to show that a Markov decision process can lead to worthwhile practical conclusions regarding queue control procedures. In Chapter V we shall

formulate a Markov decision model for multi-level queue control in a multiple-access computer. This model is a discrete Semi-Markov decision process, as we have defined it. It incorporates all queue control policies which are presently in use in actual computer systems, except the "dynamic round-robin" used by Schwartz, et al. [25]. But a large number of other stationary policies are also treated, demonstrating the capability of the formulation. The principal limitation is that only a small number of remote consoles can be feasibly studied. We shall deal with a 4-console model, which although not unreasonable at the present time (see [20,28]), will not be representative of future multiple-access systems. Still, the general conclusions we obtain seem equally valid for larger systems, based upon the intuitive justification which develops.

In order to carry out the optimization of the queue control for this computer model, a computer program has been developed. This program implements the Howard algorithm, for both discrete-time and continuous-time Semi-Markov decision processes. Although little effort was expended to achieve an exceedingly efficient program design, the solution times for optimization of the queue control model are quite competitive with the solution times for analysis [55] of a comparably large model. Often they are less. Thus the computation involved in treating Markov decision processes is reasonably modest, and shows promise for future application of the technique. The specific results obtained for the scheduling optimization are presented in Chapter VI.

CHAPTER III

THE DECISION PROCESS WITH DISCOUNTING

The purpose of this chapter is to establish certain properties of the decision process defined in Chapter II, when the future return of the process is discounted in value. These will lead to the principal theorem on infinite-time optimization to be proven in Chapter IV.

The first substantial result of this chapter is to establish the existence of an optimal policy in the manner suggested by Karlin [85]. This is accomplished by showing that the discounted total return of the process is a continuous function on the set of admissible policies. To this end, Section 3.1 describes a precise characterization of the set of all policies as a metric space and as a topological product space. Section 3.2 is devoted to the formulation of the expected total discounted return as a function of an arbitrary policy. This formulation depends upon viewing the Semi-Markov process as a two-dimensional Markov process, under the assumed policy. The existence theorem is established in Section 3.3.

The last part of this chapter deals with the nature of an optimal policy for discounting optimization. An important result, obtained in Section 3.5, is the existence of an optimal policy which is stationary, deterministic, and uniformly optimal for any initial condition of the decision process. This theorem stems from a functional equation of the type used in dynamic programming, derived in Section 3.4.

3.1 THE SET OF ADMISSIBLE POLICIES

The proof of existence of an optimal policy hinges upon a precise mathematical description of all possible control policies for the decision process. The important properties of this set are developed in the following.

In Chapter II it was stated that a control action could generally be determined at any time by a random selection among the actions permitted for the state being observed. Such random selection is governed by assigning a probability to each of the possible actions. Suppose state i is observed at the beginning of a transition and define

    η(k|i) = probability of selecting action k when state i is observed   (3.1)

Because some control action must be selected and there are K_i different actions,

    Σ_{k=1}^{K_i} η(k|i) = 1,   i = 1, 2, ..., N

The random selection can then be described by a vector e_i of probabilities,

    e_i = (η(1|i), η(2|i), η(3|i), ..., η(K_i|i))   (3.2)

The state observed at the beginning of a transition could be any one of the N possible states of the system. Because the state which will occur is not known in advance, specification of the action to be taken at any particular time requires a set of N probability vectors, one for each

state. An N-tuple

    h = (e_1, e_2, ..., e_N)   (3.3)

can be conveniently viewed then as a single decision in a control policy. We shall continue to use the term "decision" whenever the control action is not uniquely specified, but depends upon a random selection.

The transitions in state occurring during the system operation may begin at any integer time value. A control policy must therefore specify a decision for every value of n. For any n the decision given could depend upon n, so h_n will denote the decision specified for time n. The action selection probabilities will similarly be denoted by η_n(k|i). A control policy can then be viewed as a sequence

    π = (h_0, h_1, h_2, ..., h_n, ...)   (3.4)

Now we proceed to further characterization of the set E_i of all possible vectors e_i, the set H of all possible decisions h, and the set P of all possible policies π.

3.1.1 The Sets E_i

For each i the symbol E_i will denote the set of all possible vectors e_i. Every vector e_i is a probability vector of dimension K_i. Thus E_i is a set of points in Euclidean space of K_i dimensions. With η(1|i), η(2|i), ..., η(K_i|i) the 1st, 2nd, ..., K_i-th co-ordinate variables, E_i is the set of all points such that

    1 ≥ η(k|i) ≥ 0,   k = 1, 2, ..., K_i   (3.5)

and

    Σ_{k=1}^{K_i} η(k|i) = 1,   i = 1, 2, ..., N.   (3.6)

The properties of E_i of special interest arise when we consider E_i as a metric space by associating a distance between any pair of points. A convenient and familiar metric function to assign on E_i is the Euclidean distance (Rudin [56], p. 27). With respect to this metric the sets E_i are uniformly bounded, and closed ([56], p. 28). By a well known theorem ([56], p. 35) E_i is therefore a compact set for every i.

The metric associated with E_i defines a class of subsets of E_i called open sets (Simmons [86], p. 59). These open sets constitute a topology on E_i, known as the usual topology on a metric space ([86], p. 92). Thus E_i can be considered a compact topological space for every i.

The concept of a continuous function defined on a compact space leads to some useful properties, and extends the familiar case where the function is real-valued and defined on an interval of the real line. A closed, bounded interval on the real line is an example of a compact set. We know that a continuous function defined on a closed, bounded interval is bounded, and attains its greatest lower bound (g.l.b.) and its least upper bound (l.u.b.) at some points in the interval. This theorem extended to a metric space is ([56], p. 77):

Theorem 3.1: Suppose f is a continuous real-valued function on a compact metric space X, and

    M = l.u.b._{x∈X} f(x),   m = g.l.b._{x∈X} f(x)

Then there exist points x, y ∈ X such that f(x) = M, f(y) = m.

This theorem will be important later on.

3.1.2 The Set H

The symbol H will denote the set of all decisions which may be specified for any time during the decision process. It is the set of all ordered N-tuples

    h = (e_1, e_2, ..., e_N)   (3.3)

where e_i ∈ E_i (i = 1, 2, ..., N). By this definition H is the cartesian product of the sets E_1, E_2, ..., E_N, written

    H = E_1 × E_2 × ... × E_N   (3.7)

([86], p. 23). The E_i are called the component spaces of the product space H. It is important to establish that H is also a compact space. For this purpose one can call upon the Tychonoff theorem ([87], p. 143):

Theorem 3.2 (Tychonoff): The cartesian product of a collection of compact topological spaces is compact relative to the product topology.

The Tychonoff theorem does not require that the collection of component spaces be finite. It establishes that H is a compact topological space with the product topology ([86], p. 115 ff. or [87], p. 90).

3.1.3 The Set P

The policy space P is the set of all sequences

    π = (h_0, h_1, h_2, ..., h_n, ...)   (3.4)

where h_n ∈ H for all n. Any such sequence is a control policy for the process, and the space P includes all conceivable policies. From the above definition one can represent P as an infinite cartesian product

    P = H × H × H × ...   (3.8)

Inasmuch as H is a compact topological space, the Tychonoff theorem can again be used to establish that P is a compact topological space with the product topology. However it is convenient in dealing with the expected total return as a function on P to treat P as a compact metric space. This can be done providing the metric is chosen so that the usual topology on P as a metric space is the same as the product topology on P.

Two theorems in Kelley [87] ensure that a suitable metric on P exists. Let d_i(e,ē) be the Euclidean metric for points e and ē in the set E_i, and define a new metric

    m_i(e,ē) = min(1, d_i(e,ē))

Theorem 13 of Kelley ([87], p. 121) establishes that the metric space1 (E_i, m_i) has unit diameter and the same topology as the space (E_i, d_i). Then Theorem 14 of Kelley ([87], p. 122) proves that for component spaces of unit diameter, the metric

    d_H(h,h̄) = Σ_{i=1}^{N} m_i(e_i,ē_i)

1This notation indicates that the metric space is comprised of the set E_i with the metric m_i.

on the product space H generates the same open sets as the product topology on H. This argument can now be repeated to establish a metric m_H such that (H, m_H) has unit diameter, and is homeomorphic1 to (H, d_H). A similar formula to the above yields a metric d_P on P such that the usual topology is the product topology,

    d_P(π,π̄) = Σ_{n=0}^{∞} (1/2^{n+1}) m_H(h_n,h̄_n)

Thus P is a compact metric space.

3.2 BASIC RELATIONS AND THE OPTIMIZATION CRITERIA

The goal of this section is a formulation of the expectation of the total return of the decision process as a function of an arbitrary policy π from the set P. As indicated earlier, we let η_n(k|i) be the generic symbol for the action selection probabilities of the decision h_n. Although a policy must give a decision for every unit time n, a given decision h_n will have no effect on the return during actual system operation unless a transition happens to begin at time n. This is a random event, and in order to formulate the expectation of return we must consider the probability of such random events. The first part of this section deals with a pair of random variables whose joint probability distribution provides this information.

1That is, the topologies of the two spaces are equivalent; see [86], p. 93.

After the formulation of the return has been achieved we will reconsider the optimization criteria introduced in Chapter II, giving a more precise statement of each, and developing some properties which make them meaningful optimization objectives.

3.2.1 The Random Variables

Our basic approach, as indicated in Chapter II, treats the total return as the sum of one-step returns received in each unit time step of system operation. We need then to express the expectation of the return received in an arbitrary unit time step after the policy π has been placed in operation. This one-step return for time n is contributed by a transition which may begin at any time m ≤ n. Thus we must determine the probability of the random event that the system enters state i at time m, for all i and m. Beyond that, however, we must ascertain how this probability is affected by the sequence of decisions h_0, h_1, ..., h_{m-1} of the policy π.

The way to begin is to relate the random event to a set of random variables which describe the time behavior of the system. Although several choices are perhaps available, we have found a suitably elegant formulation to result from using the random variables Z_m and W_m, defined as

    Z_m = the state entered at the first transition beginning at time m or later,

    W_m = the time until the beginning of a transition, measured forward from time m,

for m = 0, 1, 2, 3, .... The variable Z_m takes values in the set i = 1, 2, 3,

..., N, while W_m may have value 0, 1, 2, ... for any m. Figure 3.1 illustrates the values of the random variables for a sample period of operation of a five state system, and shows their relation to the Semi-Markov process Y_n defined in Chapter II. Note that in actual system operation the values of Z_m and W_m may not be observable at time m, for they depend upon a transition which may begin later than m. The exception occurs if a transition begins at m. Then W_m = 0 and Z_m = i if state i is entered. The joint probability distribution of Z_m, W_m therefore provides the information needed to formulate the expected return.

The variables Z_0, W_0 demand special attention. In Chapter II we limited consideration to the case where the decision process begins in a known state i at time 0. Z_m and W_m allow us to introduce some additional generality in the form of a priori uncertainty regarding the beginning of the process. Z_0 now accounts for uncertainty in the initial state of the process. W_0 accounts for uncertainty regarding the time of the first decision of the process, relative to the time origin from which the policy π is specified. Z_0 and W_0 are specified by a joint probability distribution

    ψ_0(j,v) = Pr(Z_0 = j, W_0 = v)   (j = 1, 2, ..., N; v = 0, 1, 2, ...)

which is to be given as additional data for the decision process. We assume of course that

    0 ≤ ψ_0(j,v) ≤ 1   (3.9a)

and

Fig. 3.1. Illustration of the random variables Y_n, Z_n, W_n in a 5-state process. (The three variables are plotted against time n for the same sample of operation.)

    Σ_{j=1}^{N} Σ_{v=0}^{∞} ψ_0(j,v) = 1   (3.9b)

Moreover, we assume that there is no return from the process between time 0 and the beginning time W_0.

The vector random process {(Z_m, W_m), m = 1, 2, ...} is a Markov process. For suppose that at time m it is given that Z_m = i and W_m = u. If u ≥ 1, then it is clear that Z_{m+1} = i and W_{m+1} = u - 1. But if u = 0, then

    Pr(Z_{m+1} = j, W_{m+1} = v | Z_m = i, W_m = 0) = Σ_{k=1}^{K_i} η_m(k|i) q_{ij}(v+1|k)

Despite the fact that Z_m and W_m are not always observable, the above is an important point. This will become clear as we discuss how the joint probability distribution of these random variables is affected by the policy π.

3.2.2 The Set S

Define the following sets:

    I_N = the finite set of integers 1, 2, ..., N,

    I = the set of all integers 0, 1, 2, ....

The set S consists of all probability distributions defined over I_N × I. To explain further, let s denote an element of S. For every pair of values (j,v) ∈ I_N × I, the element s assigns a value ψ(j,v) such that

    0 ≤ ψ(j,v) ≤ 1

and

    Σ_{v=0}^{∞} Σ_{j=1}^{N} ψ(j,v) = 1   (3.10)

The joint probability distribution on the random variables Z_m and W_m can therefore be taken as an element of S, for m = 0, 1, 2, .... In particular we let s_0 denote the distribution for Z_0, W_0, so that

    s_0 = {ψ_0(j,v), j = 1, 2, ..., N; v = 0, 1, 2, ...}

Similarly, s_m will denote the distribution for Z_m, W_m.

The set S can be metrized by associating a distance between elements s, s̄ given by

    d_S(s,s̄) = Σ_{v=0}^{∞} Σ_{j=1}^{N} |ψ(j,v) - ψ̄(j,v)|   (3.11)

So S is a metric space. The probabilities comprising the distribution s_m will be denoted by1

    ψ_m(j,v) = Pr(Z_m = j, W_m = v)

The element s_m of S depends upon the given distribution s_0 and the policy π, but we will leave this as an implicit fact temporarily, for we can show that s_m is determined when s_{m-1} and h_{m-1} are given. This is a consequence of {(Z_m, W_m)} being a Markov process. First we define the transition distribution for time n resulting from the decision h_n of the policy π,

1A more conventional notation perhaps is s_m(j,v) rather than ψ_m(j,v).

    q_{ij}(v|h_n) = Σ_{k=1}^{K_i} η_n(k|i) q_{ij}(v|k)   (3.12)

Also, to shorten notation somewhat, we use

    ψ_m(j) = ψ_m(j,0)   (3.13)

Then we have the equation

    ψ_m(j,v) = ψ_{m-1}(j,v+1) + Σ_{i=1}^{N} ψ_{m-1}(i) q_{ij}(v+1|h_{m-1})   (3.14)

for j = 1, 2, ..., N, and v = 0, 1, 2, .... This equation gives the probability of the event that at time m there are v time steps until a new transition begins, and that the state entered is j. This event comes about if a transition from any state i is just beginning at time m-1, which will last v+1 time steps and move the system to state j. The probability of this is given by the summation on the right side of (3.14). But one may also have a transition already in progress at time m-1, and this produces the event in question if it lasts for v+1 steps beyond m-1 and moves the system to j. This probability is the first term on the right of (3.14).

Based on (3.14) we may formally write in operator notation,

    s_m = f_1(h_{m-1}) s_{m-1}   (3.15)

for m = 1, 2, ..., where f_1 operates on s_{m-1} to effect the transformation of (3.14). The transformation (3.15) is exactly the type defined by Karlin [85] in treating the general structure of sequential decision

models.¹ Although $s_m$ is not observable from the process, (3.15) reveals that it can be calculated given the policy $\pi$ and the initial distribution $s_0$.

3.2.3 Formulation of Return

The initial probability distribution $s_0$ defines the beginning of the decision process relative to a time origin from which the policy $\pi$ is specified. The policy is actually placed in operation at time $W_0$. We are concerned with the expectation of the total return received from the decision process at the end of T unit time steps from this origin. This depends upon $s_0$ and $\pi$, and is denoted $V_T(s_0,\pi)$. The total return is the sum of the returns for each of the unit steps comprising the interval T, and so we are led to consider the expectation of a one-step return. The expectation of the one-step return received at time $n$ will be denoted $R_n(s_0,\pi)$, and we have

$$V_T(s_0,\pi) = \sum_{n=1}^{T} R_n(s_0,\pi) \tag{3.16}$$

¹Equation (3.15) represents a deterministic transformation, and in Karlin's framework $s_m$ corresponds to the "state" of the system. From the viewpoint of modern control theory, it is often useful to consider a probability distribution as a "generalized state", in contrast to the observable state which is random. See R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton, N.J., 1961, pp. 139-140, for further discussion. Also see H. J. Kushner, "On the Dynamical Equations of Conditional Probability Density Functions, with Applications to Optimal Stochastic Control Theory", J. Math. Analysis and Applications, 8, 332-334, 1964, for an example of this approach.

In order to formulate $R_n(s_0,\pi)$ we need an expression for the expectation of the one-step return received from a transition at the $\tau$th time step after it begins. (Note that the transition may end before $\tau$ steps, but in that event no return is contributed to the expectation.) If the transition begins at time $m$ from state $i$, this expectation is $u(\tau|i,h_m)$, given by

$$u(\tau|i,h_m) = \sum_{k=1}^{K_i} \eta_m(k|i)\, u(\tau|i,k) \tag{3.17}$$

with $u(\tau|i,k)$ defined in Eq. (2.9). Then we have

$$R_n(s_0,\pi) = \sum_{m=0}^{n-1} \sum_{i=1}^{N} \psi_m(i)\, u(n-m|i,h_m) \tag{3.18}$$

for $n = 1,2,3,\ldots$ From (3.17) and (3.18) it immediately follows that

$$|R_n(s_0,\pi)| \le \sum_{m=0}^{n} \sum_{i=1}^{N} \sum_{k=1}^{K_i} |u(m|i,k)|$$

and then

$$|R_n(s_0,\pi)| \le \sum_{i=1}^{N} \sum_{k=1}^{K_i} \sum_{m=1}^{\infty} |u(m|i,k)|$$

The assumption stated in Eq. (2.11) now establishes that $R_n(s_0,\pi)$ is uniformly bounded for all $n$, $s_0$, and $\pi \in P$.
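The recursion (3.14)-(3.15) and the return formula (3.18) translate directly into a computation. The following is a minimal sketch, not part of the report's own program, written under simplifying assumptions: a fixed decision $h$ throughout, truncation of the waiting-time variable $W$ at a finite horizon V, and an invented array layout in which q[i, j, v] holds $q_{ij}(v{+}1|h)$ and u[i, m] holds $u(m{+}1|i,h)$.

```python
import numpy as np

def step_distribution(psi, q):
    """One application of (3.14).  psi[j, v] holds Pr(Z_m = j, W_m = v),
    with W truncated at V steps; q[i, j, v] holds q_ij(v+1 | h)."""
    N, V = psi.shape
    new = np.zeros_like(psi)
    new[:, :V-1] = psi[:, 1:]            # transitions in progress: W falls by one
    for j in range(N):
        for v in range(V - 1):
            new[j, v] += psi[:, 0] @ q[:, j, v]   # transitions beginning now
    return new

def expected_returns(psi0, q, u, T):
    """R_n(s0, pi) of (3.18) for n = 1..T, where u[i, m] = u(m+1 | i, h)
    (u must have at least T columns)."""
    psi, R = psi0.copy(), np.zeros(T)
    for m in range(T):                   # a transition may begin at time m ...
        for n in range(m + 1, T + 1):    # ... contributing return at times n > m
            R[n - 1] += psi[:, 0] @ u[:, n - m - 1]
        psi = step_distribution(psi, q)
    return R                             # V_T(s0, pi) of (3.16) is R.sum()
```

With the returns $R_n$ so obtained, $V_T(s_0,\pi)$ of (3.16) is simply their sum, and the discounted return of the next section is the sum weighted by $\alpha^n$.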

3.2.4 Optimization Criteria

The three optimization problems introduced in Chapter II are now precisely stated:

a. Finite-Time Optimization: Find a policy $\pi \in P$ which maximizes $V_T(s_0,\pi)$,

$$V_T(s_0,\pi) = \sum_{n=1}^{T} R_n(s_0,\pi)$$

for a given $s_0 \in S$.

b. Discounting Optimization: Find a policy $\pi \in P$ which maximizes

$$D_\alpha(s_0,\pi) = \sum_{n=1}^{\infty} \alpha^n R_n(s_0,\pi) \tag{3.19}$$

for a given $\alpha$, $0 < \alpha < 1$, and $s_0 \in S$.

c. Infinite-Time Optimization: Find a policy $\pi \in P$ which maximizes

$$g(s_0,\pi) = \liminf_{T\to\infty} \frac{V_T(s_0,\pi)}{T} \tag{3.20}$$

for given $s_0 \in S$.

As a first step to establishing the validity of these objectives, it should be proven that the objective functions¹ $V_T(s_0,\pi)$, $D_\alpha(s_0,\pi)$, $g(s_0,\pi)$ are well defined. First it is easily seen that $V_T(s_0,\pi)$ is bounded for

¹Here we are borrowing a term from linear programming, where the quantity to be maximized is usually called the "objective function."

finite T regardless of $s_0$ and $\pi$, inasmuch as $R_n(s_0,\pi)$ is uniformly bounded.

Theorem 3.3: For every $s_0 \in S$ and $\pi \in P$,

$$g(s_0,\pi) = \liminf_{T\to\infty} \frac{V_T(s_0,\pi)}{T}$$

is finite.

Proof: Because $R_n(s_0,\pi)$ is uniformly bounded, the average expectation of return $V_T(s_0,\pi)/T$ is uniformly bounded. Therefore $g(s_0,\pi)$ is finite.

Theorem 3.4: The discounted return

$$D_\alpha(s_0,\pi) = \sum_{n=1}^{\infty} \alpha^n R_n(s_0,\pi)$$

exists for $0 < \alpha < 1$, any $s_0 \in S$, and all $\pi \in P$.

Proof: The infinite series above must be proven convergent under the stated conditions. Let R be the least upper bound on $|R_n(s_0,\pi)|$ for given $s_0$ and $\pi$. Then for a large integer $\beta$ and $\gamma > \beta$,

$$\left|\sum_{n=\beta}^{\gamma} \alpha^n R_n(s_0,\pi)\right| \le \sum_{n=\beta}^{\gamma} \alpha^n |R_n(s_0,\pi)| \le R \sum_{n=\beta}^{\infty} \alpha^n$$

The last term can be made as small as desired for $0 < \alpha < 1$. By the Cauchy criterion ([56], p. 52), the series defining $D_\alpha(s_0,\pi)$ is therefore convergent.

The next requirement is to establish that an optimal control policy indeed exists for each of these problems.

3.3 EXISTENCE OF AN OPTIMAL POLICY

The question of the existence of an optimal policy essentially concerns the behavior of the objective function on the defined set P of admissible policies. An optimal policy is one which maximizes the value of the objective function. But does the objective function have a "maximum" value on the set P? Conceivably the function may behave analogously to the function $f(x) = x$ defined on the semi-open interval of the real line $0 \le x < 1$. Here for any value $x$, there exists an $x' > x$ such that $f(x') > f(x)$. Hence $f(x) = x$ has no maximum value on the semi-open interval $0 \le x < 1$. Of course, $f(x)$ does have a maximum value of unity on the closed interval $0 \le x \le 1$.

In order to demonstrate the existence of an optimal policy for the first two optimization problems in Section 3.2.4, we will show that the objective functions $V_T(s_0,\pi)$ and $D_\alpha(s_0,\pi)$ are continuous functions on the compact metric space P. Then the existence of an optimal policy follows from Theorem 3.1. The existence of an optimal policy for infinite-time optimization is treated in Chapter IV.

Theorem 3.5: For any $s_0 \in S$, $s_m$ is a continuous function of $\pi$ on the set P, for $m = 1,2,\ldots$

Proof: In the framework of metric spaces, a function f, defined on a space P and taking values in a set S, is continuous at a point $\pi_0$ if for any $\epsilon > 0$ there exists $\delta > 0$ such that $d_P(\pi,\pi_0) < \delta$ implies $d_S(f(\pi), f(\pi_0)) < \epsilon$. The function is continuous on the set P if it is continuous at every

point in the set. Alternatively, the function f is continuous at $\pi_0$ if for every sequence of points $\{\pi_\nu,\; \nu = 1,2,\ldots\}$, where $\pi_\nu \in P$ and $\lim_{\nu\to\infty} \pi_\nu = \pi_0$, we have $\lim_{\nu\to\infty} f(\pi_\nu) = f(\pi_0)$ (Simmons [86], p. 76).

Suppose we have such a sequence of policies $\{\pi_\nu\}$, and let us write $s_m(\pi_\nu)$ to denote the dependence of $s_m$ on the policy $\pi_\nu$ for given $s_0 \in S$. Because the metric topology of S is the topology of pointwise convergence, $\lim_{\nu\to\infty} s_m(\pi_\nu) = s_m(\pi_0)$ if and only if

$$\lim_{\nu\to\infty} \psi_m(j,v|s_0,\pi_\nu) = \psi_m(j,v|s_0,\pi_0) \tag{3.21}$$

for $j = 1,2,\ldots,N$; $m = 1,2,\ldots$; $v = 0,1,2,\ldots$¹ Moreover $\lim_{\nu\to\infty} \pi_\nu = \pi_0$ if and only if

$$\lim_{\nu\to\infty} [\eta_n(k|i)]_\nu = [\eta_n(k|i)]_0 \qquad (n = 0,1,2,\ldots) \tag{3.22}$$

Here $[\eta_n(k|i)]_\nu$ denotes the action selection probability of the decision specified for time $n$ in the policy $\pi_\nu$. Assuming (3.22) we proceed to show that (3.21) holds by induction on $m$. For $m = 1$, we have from (3.14)

$$\psi_1(j,v|s_0,\pi_\nu) = \psi_0(j,v+1) + \sum_{i=1}^{N} \psi_0(i)\, q_{ij}(v+1|h_0^\nu)$$

¹The dependence of $\psi_m(j,v)$ on $s_0$ and $\pi$ is now made explicit by the notation $\psi_m(j,v|s_0,\pi)$.

where $h_0^\nu$ is the decision specified for time 0 by $\pi_\nu$. The above holds also for $v = 0$. From (3.12) it is clear using (3.22) that

$$\lim_{\nu\to\infty} q_{ij}(\tau|h_n^\nu) = q_{ij}(\tau|h_n^0)$$

for any $n$. Thus

$$\lim_{\nu\to\infty} \psi_1(j,v|s_0,\pi_\nu) = \psi_0(j,v+1) + \sum_{i=1}^{N} \psi_0(i)\, q_{ij}(v+1|h_0^0) = \psi_1(j,v|s_0,\pi_0)$$

Assuming (3.21) holds for all values up to some arbitrary value $m$ (the induction hypothesis), it is easily shown true for $m+1$ from (3.14). We conclude that it holds for all $m$, and $s_m$ is continuous at $\pi_0$. Because $\pi_0$ is an arbitrary element of P, the result is true for all $\pi \in P$ and the theorem is proved.

Theorem 3.6: For any $s_0 \in S$, and $n = 1,2,\ldots$, $R_n(s_0,\pi)$ is a continuous function of $\pi$ on the set P.

Proof: Consider a sequence of policies $\{\pi_\nu\}$ converging to an element $\pi_0 \in P$. From (3.18),¹

$$R_n(s_0,\pi_\nu) = \sum_{m=0}^{n-1} \sum_{i=1}^{N} \psi_m(i|s_0,\pi_\nu)\, u(n-m|i,h_m^\nu) \tag{3.23}$$

But from (3.17) and (3.22) it is seen that

¹The quantity $\psi_m(i|s_0,\pi_\nu)$ appearing on the right of (3.23) should be interpreted for $m = 0$ as $\psi_0(i)$; see (3.13).

$$\lim_{\nu\to\infty} u(n-m|i,h_m^\nu) = u(n-m|i,h_m^0)$$

Using this result and Theorem 3.5, it is seen that the summation in (3.23) is a sum of products of continuous functions of $\pi$ on the set P. Hence

$$\lim_{\nu\to\infty} R_n(s_0,\pi_\nu) = R_n(s_0,\pi_0)$$

for any $s_0 \in S$ and $n = 1,2,\ldots$ Since $\pi_0$ is any element of P, $R_n(s_0,\pi)$ is continuous on P for any $s_0$.

Theorem 3.7: For any $s_0 \in S$ and $\alpha$ such that $0 < \alpha < 1$, there exists at least one policy $\pi_T \in P$ which maximizes $V_T(s_0,\pi)$ and at least one policy $\pi_\alpha \in P$ which maximizes $D_\alpha(s_0,\pi)$. An optimal policy therefore exists for the finite-time optimization problem, and for the discounting optimization problem.

Proof: Because of Theorem 3.6,

$$V_T(s_0,\pi) = \sum_{n=1}^{T} R_n(s_0,\pi)$$

is a sum of continuous functions of $\pi$, and is therefore a continuous function of $\pi$ on P for any $s_0 \in S$. But P has been shown in Section 3.1.3 to be a compact metric space. Appealing to Theorem 3.1, there must exist some policy $\pi_T$ in P for which

$$V_T(s_0,\pi_T) = \underset{\pi\in P}{\text{l.u.b.}}\; V_T(s_0,\pi)$$

This establishes one part of the theorem. Now as we have seen there exists a uniform bound R for any $s_0$ such that $|R_n(s_0,\pi)| \le R$ for all $\pi \in P$, $n = 1,2,\ldots$ But the infinite series

$$\sum_{n=1}^{\infty} \alpha^n R$$

is summable for $0 < \alpha < 1$, being a geometric series. It follows therefore that

$$\sum_{n=1}^{\infty} \alpha^n R_n(s_0,\pi)$$

is a uniformly convergent series of continuous functions and is therefore itself a continuous function on the set P (Rudin, p. 136). Again, by Theorem 3.1 and the compactness of P, there exists some policy $\pi_\alpha \in P$ such that

$$D_\alpha(s_0,\pi_\alpha) = \underset{\pi\in P}{\text{l.u.b.}}\; D_\alpha(s_0,\pi)$$

for given $\alpha$ and $s_0 \in S$.

3.4 DYNAMIC PROGRAMMING FORMULATION

Let $\pi_\alpha^*$ denote an optimal policy for the discounting optimization problem, and $D_\alpha^*(s_0)$ the optimal return for given $\alpha$ and $s_0 \in S$, i.e.,

$$D_\alpha^*(s_0) = D_\alpha(s_0,\pi_\alpha^*) \tag{3.24}$$

As a consequence of the previous section of this report, $\pi_\alpha^*$ maximizes $D_\alpha(s_0,\pi)$ and we can write symbolically

$$D_\alpha^*(s_0) = \max_{\pi\in P} D_\alpha(s_0,\pi) \tag{3.25}$$

We now proceed to expand (3.25) to arrive at a functional equation in terms of the first decision $h_0$ of any policy $\pi \in P$. This result will then be used in the last part of this chapter to establish that an optimal policy exists which is stationary and deterministic.

3.4.1 A Functional Equation

In the derivation it is necessary to consider a transformation of a policy $\pi$, which replaces the decision $h_n$ at time $n$ by the decision $h_{n+\ell}$ previously assigned for time $n+\ell$, for $n = 0,1,2,\ldots$ and some fixed positive integer $\ell$. This left time shift of $\ell$ units will result in a policy denoted symbolically by $A^\ell\pi$. Thus if $\pi = (h_0,h_1,h_2,\ldots,h_n,\ldots)$ and $\pi' = (h_0',h_1',h_2',\ldots,h_n',\ldots)$ are two policies in P, then

$$\pi' = A^\ell \pi$$

implies

$$h_n' = h_{n+\ell} \qquad (n = 0,1,2,\ldots) \tag{3.26}$$

The following will also involve the notion of a decision process beginning with the initial distribution $s_1 \in S$. Previously we have used $s_0 \in S$ to denote a joint probability distribution $\psi_0(j,v)$ on the indexed state $Z_0$ and the starting time $W_0$ at the beginning of the decision process. The point $s_1 \in S$ corresponds to a joint distribution determined by $s_0$ and decision $h_0$ of the policy,

$$s_1 = f_1(h_0)\, s_0$$

as expressed in (3.15). However $s_1$ can represent an initial distribution equally well as $s_0$. We shall use $s_1 = \{\psi_1(j,v)\}$ to denote the initial distribution in this case, i.e.,

$$\psi_1(j,v) = \psi_1(j,v|s_0,\pi)$$

Theorem 3.8: For any initial state $s_0 \in S$ and any admissible policy $\pi \in P$, the expectation of return $R_n(s_0,\pi)$ for time $n$ ($n \ge 2$) satisfies

$$R_n(s_0,\pi) = \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + R_{n-1}(s_1,A\pi) \tag{3.27}$$

where $s_1 = f_1(h_0)\, s_0$ as represented in Eq. (3.15).

Proof: Equation (3.27) separates $R_n(s_0,\pi)$ into two parts. The first term on the right side is the part due to the event that the first transition begins at time 0. The second term is simply the rest, but its interpretation is important. What we claim is that if we neglect the part due to a transition beginning at time 0, the expected one-step return at time

$n$ is the same as if we shifted the time origin one unit ahead, used $s_1$ as the initial distribution, and found the expected one-step return at time $n-1$ on the new time scale. From (3.18) we have

$$R_n(s_0,\pi) = \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \sum_{i=1}^{N} \sum_{m=1}^{n-1} \psi_m(i|s_0,\pi)\, u(n-m|i,h_m) \tag{3.28}$$

Comparing this to (3.27) it is clear that the second summation must be shown to be $R_{n-1}(s_1,A\pi)$. We begin by proving that

$$\psi_m(j,v|s_0,\pi) = \psi_{m-1}(j,v|s_1,A\pi) \tag{3.29}$$

for $m = 1,2,\ldots$ This holds by definition for $m = 1$, since

$$\psi_0(j,v|s_1,A\pi) = \psi_1(j,v) = \psi_1(j,v|s_0,\pi)$$

For other values of $m$, the proof is conveniently done using the properties of the operator $f_1(h_{m-1})$ introduced in (3.15). Let us write

$$f_1(h_{m-1}) = T_m(\pi), \qquad m = 2,3,\ldots$$

and note that $T_m(\pi) = T_{m-1}(A\pi)$. By repeated application of the operator, the dependence of $s_m$ on $s_0$ and $\pi$ is explicit in

$$s_m(s_0,\pi) = T_m(\pi)\, T_{m-1}(\pi) \cdots T_1(\pi)\, s_0$$

But then

$$s_m(s_0,\pi) = T_{m-1}(A\pi)\, T_{m-2}(A\pi) \cdots T_1(A\pi)\, T_1(\pi)\, s_0 = T_{m-1}(A\pi)\, T_{m-2}(A\pi) \cdots T_1(A\pi)\, s_1 = s_{m-1}(s_1,A\pi)$$

which proves (3.29). In (3.28) we can then write

$$R_n(s_0,\pi) = \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \sum_{i=1}^{N} \sum_{m=1}^{n-1} \psi_{m-1}(i|s_1,A\pi)\, u(n-m|i,h_m)$$

$$= \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \sum_{i=1}^{N} \sum_{m=0}^{n-2} \psi_m(i|s_1,A\pi)\, u(n-1-m|i,h_{m+1})$$

and thus, comparing this with (3.18), we see that (3.27) is valid.

The functional equation governing $D_\alpha^*(s_0)$ is established by the next theorem.

Theorem 3.9: For any $s_0 \in S$ and $\alpha$ such that $0 < \alpha < 1$, the optimal return $D_\alpha^*(s_0)$ uniquely satisfies

$$D_\alpha^*(s_0) = \max_{h_0\in H} \left\{ \sum_{n=1}^{\infty} \alpha^n \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \alpha\, D_\alpha^*(s_1) \right\} \tag{3.30}$$

where $s_1 = f_1(h_0)\, s_0$.

Proof: The relationship in (3.25) can also be expressed as

$$D_\alpha^*(s_0) = \max_{(h_0,h_1,\ldots)} \sum_{n=1}^{\infty} \alpha^n R_n(s_0,\pi)$$

Using Theorem 3.8 this becomes

$$D_\alpha^*(s_0) = \max_{(h_0,h_1,\ldots)} \left\{ \sum_{n=1}^{\infty} \alpha^n \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \sum_{n=2}^{\infty} \alpha^n R_{n-1}(s_1,A\pi) \right\}$$

The second summation is $\alpha\, D_\alpha(s_1,A\pi)$, so

$$D_\alpha^*(s_0) = \max_{(h_0,h_1,\ldots)} \left\{ \sum_{n=1}^{\infty} \alpha^n \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \alpha\, D_\alpha(s_1,A\pi) \right\}$$

But given $s_1$, $D_\alpha(s_1,A\pi)$ depends only upon the sequence of decisions $(h_1,h_2,\ldots)$. Hence

$$D_\alpha^*(s_0) = \max_{h_0\in H} \left\{ \sum_{n=1}^{\infty} \alpha^n \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \max_{(h_1,h_2,\ldots)} \alpha\, D_\alpha(s_1,A\pi) \right\} \tag{3.31}$$

or

$$D_\alpha^*(s_0) = \max_{h_0\in H} \left\{ \sum_{n=1}^{\infty} \alpha^n \sum_{i=1}^{N} \psi_0(i)\, u(n|i,h_0) + \alpha\, D_\alpha^*(s_1) \right\}$$

We have therefore shown that the optimal return satisfies (3.30). The uniqueness proof is straightforward following Karlin [85] and we omit it.

Equation (3.30) is a functional equation of the same general form as those usually encountered in the dynamic programming formulation of optimization problems [59,60]. It is not a very useful equation for computational purposes, because of the nature of the sets S and H. Nevertheless it is an extremely useful vehicle for establishing some general principles concerning optimal policies, which allow considerable simplification in describing an optimal policy. These results are the subject of the remainder of this chapter.

3.5 OPTIMALITY OF A STATIONARY, DETERMINISTIC POLICY

Proceeding from the functional Eq. (3.30) we can now establish three important properties of the discounting optimization problem. These are:

a. There exists a policy which is optimal for all $s_0 \in S$. Hence it is not necessary to know the distribution $s_0$ describing the beginning of the decision process.

b. There exists at least one stationary policy having the property described in a.

c. There exists a stationary deterministic optimal policy having property a.

If $\pi = (h_0,h_1,h_2,\ldots,h_n,\ldots)$ is stationary then $h_0 = h_n$, $n = 1,2,3,\ldots$ If $\pi$ is also deterministic then the action selection probabilities are

either zero or unity, i.e., for the decision $h_0$ one has $\eta(k|i) = 1$ for $k = k_i \in A_i$ and $\eta(k|i) = 0$ otherwise.

Because of the above properties, an optimal policy for discounting optimization can be simply described by a table which lists the action index $k_i$ for every state $i$. This was pointed out in Chapter II. Chapter IV will show that this property holds also for infinite-time optimization.

In order to establish the validity of the above three properties it is convenient to introduce the expectation of discounted return, conditional upon the policy $\pi$ becoming operational with an observed state $j$ at time $v$ ($j = 1,2,\ldots,N$; $v = 0,1,2,\ldots$). (Recall from Section 3.2.3 that the time variation of the policy $\pi$ is specified from a time origin which is not necessarily the beginning of the actual sequence of decisions used.) This will be denoted by $D_\alpha(j,v,\pi)$ for any policy $\pi \in P$:

$D_\alpha(j,v,\pi)$ = expectation of discounted total return using policy $\pi$, conditional on the policy becoming operational with state $j$ at time $v$.

$D_\alpha(j,v,\pi)$ is the value of $D_\alpha(s_0,\pi)$ when $s_0$ assigns unit probability to the pair of values $j,v$. $D_\alpha(j,v,\pi)$ is defined for all $j$ and $v$, and the unconditional expectation $D_\alpha(s_0,\pi)$ for arbitrary $s_0$ can be written as

$$D_\alpha(s_0,\pi) = \sum_{j=1}^{N} \sum_{v=0}^{\infty} \psi_0(j,v)\, D_\alpha(j,v,\pi) \tag{3.32}$$

Because of (3.32) it is sufficient to show that the three properties hold for every pair of values $j$, $v$. For then

$$\max_{\pi\in P} D_\alpha(s_0,\pi) = \sum_{j=1}^{N} \sum_{v=0}^{\infty} \psi_0(j,v) \max_{\pi\in P} D_\alpha(j,v,\pi)$$

because there is one policy, denoted $\pi_d$, which simultaneously realizes the maximum of $D_\alpha(j,v,\pi)$ for all $j$, $v$.

First we establish

Lemma: For all $j$ and $v$,¹

$$D_\alpha^*(j,v) = \alpha^v D_\alpha^*(j) \tag{3.33}$$

and moreover, if

$\pi(j) = \pi(j,0)$ = an optimal policy for the case when the policy becomes operational with initial state $j$ at time 0, and

$\pi(j,v)$ = an optimal policy with the initial state $j$ at time $v$,

it is sufficient to take $A^v\pi(j,v) = \pi(j)$.

Proof: We first point out that²

$$D_\alpha^*(j,v) = \max_{\pi\in P} \sum_{n=1}^{\infty} \alpha^n R_n(j,v,\pi)$$

However,

¹For notational simplicity, we use $D_\alpha^*(j)$ instead of $D_\alpha^*(j,0)$ when $v = 0$.

²$R_n(j,v,\pi)$ denotes the expectation of one-step return at time $n$ under the same initial conditions as apply to $D_\alpha(j,v,\pi)$.

$$R_n(j,v,\pi) = 0 \qquad \text{for } n = 1,2,\ldots,v$$

inasmuch as no return is received until after the process begins. Hence

$$D_\alpha^*(j,v) = \max_{\pi\in P} \sum_{n=v+1}^{\infty} \alpha^n R_n(j,v,\pi)$$

By repeated application of Theorem 3.8 however, we can establish

$$R_{n+v}(j,v,\pi) = R_n(j,A^v\pi)$$

and therefore

$$D_\alpha^*(j,v) = \max_{\pi\in P} \alpha^v \sum_{n=1}^{\infty} \alpha^n R_n(j,A^v\pi)$$

However, as a result of the existence theorem (Theorem 3.7) and the fact that the decisions $h_0,h_1,\ldots,h_{v-1}$ of any policy do not enter into maximizing $\sum_{n=1}^{\infty} \alpha^n R_n(j,A^v\pi)$, we can write

$$D_\alpha^*(j,v) = \alpha^v \max_{\pi\in P} \sum_{n=1}^{\infty} \alpha^n R_n(j,A^v\pi)$$

But

$$\max_{\pi\in P} \sum_{n=1}^{\infty} \alpha^n R_n(j,A^v\pi) = \max_{\pi\in P} \sum_{n=1}^{\infty} \alpha^n R_n(j,\pi) = D_\alpha^*(j)$$

and so

$$D_\alpha^*(j,v) = \alpha^v D_\alpha^*(j)$$

Also this shows that the maximizing policy $\pi(j,v)$ for initial state $j$ at time $v$ may be taken as a $v$ unit time delay of an optimal policy for initial state $j$ at time 0. Any sequence of decisions $h_0,h_1,\ldots,h_{v-1}$ may make up the initial part of the policy $\pi(j,v)$ since they do not affect the return.

It is sufficient therefore to consider only the case of a process beginning at time 0 in some state $j$. If we establish the desired properties a., b., c. here, then (3.32) and (3.33) allow the immediate extension of the result to the general case.

Theorem 3.10: There exists a stationary policy $\pi_d \in P$ for the discounting optimization problem such that $D_\alpha(i,\pi_d) \ge D_\alpha(i,\pi)$ for all $i$ and all $\pi \in P$. Thus $\pi_d$ is a uniformly optimal policy for all initial states.

Proof: Let $\pi(i)$ be an optimal policy for initial state $i$ at time 0, so that

$$D_\alpha(i,\pi(i)) \ge D_\alpha(i,\pi) \qquad \text{for all } \pi\in P$$

Further let $e_i$ be the vector of action selection probabilities for the initial decision $h_0$ of policy $\pi(i)$, for $i = 1,2,\ldots,N$. From the functional equation, or Eq. (3.31), it follows that¹

$$D_\alpha(i,\pi(i)) = \sum_{n=1}^{\infty} \alpha^n u(n|i) + \max_{(h_1,h_2,\ldots)} \alpha\, D_\alpha(s_1,A\pi)$$

¹For simplicity we use $u(n|i)$ in place of $u(n|i,h_0)$, this being solely determined by the vector $e_i$ of $\pi(i)$.

where $s_1$ is determined by the transformation of Eq. (3.15) from the vector $e_i$ and the initial state $i$. For simplicity of notation we write

$$d_i = \sum_{n=1}^{\infty} \alpha^n u(n|i), \qquad i = 1,2,\ldots,N.$$

Now we can say that¹

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{\ell=1}^{N} \sum_{m=1}^{\infty} q_{i\ell}(m)\, \alpha^m\, D_\alpha(\ell,\pi(\ell))$$

We next introduce the notation

$$\sum_{m=1}^{\infty} q_{i\ell}(m)\, \alpha^m = P_{i\ell}$$

Using this we have

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{\ell=1}^{N} P_{i\ell}\, D_\alpha(\ell,\pi(\ell))$$

Now we can apply this inequality again to the term $D_\alpha(\ell,\pi(\ell))$, obtaining

¹Again, we use $q_{i\ell}(m)$ in place of $q_{i\ell}(m|h_0)$ for simplicity.

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{\ell=1}^{N} P_{i\ell}\, d_\ell + \sum_{\ell=1}^{N} \sum_{k=1}^{N} P_{i\ell}\, P_{\ell k}\, D_\alpha(k,\pi(k))$$

Writing

$$P_{ik}^{(1)} = P_{ik} \qquad \text{and} \qquad P_{ik}^{(n)} = \sum_{\ell=1}^{N} P_{i\ell}\, P_{\ell k}^{(n-1)} \quad \text{for } n \ge 2$$

we have

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{\ell=1}^{N} P_{i\ell}^{(1)}\, d_\ell + \sum_{k=1}^{N} P_{ik}^{(2)}\, D_\alpha(k,\pi(k))$$

Continuing to iterate this expression, we obtain for large M

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{n=1}^{M-1} \sum_{\ell=1}^{N} P_{i\ell}^{(n)}\, d_\ell + \sum_{k=1}^{N} P_{ik}^{(M)}\, D_\alpha(k,\pi(k))$$

We note at this point that $0 \le P_{ik} < 1$ and moreover

$$\sum_{k=1}^{N} P_{ik} < 1$$

Hence

$$\sum_{k=1}^{N} P_{ik}^{(M)} \le \sum_{k=1}^{N} P_{ik}^{(M-1)} < 1 \qquad \text{for all } M$$

which can be established by induction. Now

$$\sum_{k=1}^{N} P_{ik}^{(M)}\, D_\alpha(k,\pi(k)) \le \sum_{k=1}^{N} P_{ik}^{(M)} \max_k D_\alpha(k,\pi(k))$$

but for $M \to \infty$, $\sum_{k=1}^{N} P_{ik}^{(M)} \to 0$ monotonically. Then in the limit $M \to \infty$,

$$D_\alpha(i,\pi(i)) \le d_i + \sum_{n=1}^{\infty} \sum_{\ell=1}^{N} P_{i\ell}^{(n)}\, d_\ell$$

The right side of this inequality we claim to be the return of a stationary policy $\pi_d = (h,h,\ldots)$, where $h = (e_1,e_2,\ldots,e_N)$ and the $e_i$ are the action selection probability vectors specified for time 0 in the optimal policies $\pi(i)$. Because the inequality holds for all $i$, we have shown the existence of a uniformly optimal stationary policy.

The proof is now completed by showing that

$$D_\alpha(i,\pi_d) = d_i + \sum_{n=1}^{\infty} \sum_{\ell=1}^{N} P_{i\ell}^{(n)}\, d_\ell$$

for all $i$. We do this by deriving $D_\alpha(i,\pi_d)$ in the manner previously used and showing it yields the same expression. From earlier definitions,

$$D_\alpha(i,\pi_d) = \sum_{n=1}^{\infty} \alpha^n R_n(i,\pi_d)$$

and

$$R_n(i,\pi_d) = u(n|i) + \sum_{m=1}^{n-1} \sum_{\ell=1}^{N} \psi_{i\ell}(m)\, u(n-m|\ell)$$

where

$\psi_{i\ell}(m)$ = probability of entering state $\ell$ at time $m$, given initial state $i$ at time 0.

But by proceeding from (3.14) one can establish

$$\psi_{i\ell}(m) = q_{i\ell}(m) + \sum_{n=1}^{m-1} \sum_{j=1}^{N} \psi_{ij}(n)\, q_{j\ell}(m-n), \qquad m \ge 1$$

and so

$$D_\alpha(i,\pi_d) = \sum_{n=1}^{\infty} \alpha^n u(n|i) + \sum_{n=1}^{\infty} \alpha^n \sum_{m=1}^{n-1} \sum_{\ell=1}^{N} \psi_{i\ell}(m)\, u(n-m|\ell) = d_i + \sum_{\ell=1}^{N} \sum_{m=1}^{\infty} \alpha^m \psi_{i\ell}(m)\, d_\ell$$

Now one can show that

$$\sum_{m=1}^{\infty} \alpha^m \psi_{i\ell}(m) = \sum_{n=1}^{\infty} P_{i\ell}^{(n)}$$

and thus

$$D_\alpha(i,\pi(i)) = D_\alpha(i,\pi_d) \qquad \text{for all } i.$$

Theorem 3.11: There exists a uniformly optimal stationary policy which is deterministic.

Proof: Confining attention now to the stationary policy $\pi_d$, there is a single decision $h$ to be determined. The notation in the functional equation (3.30) can therefore be simplified. Because $h_n = h$ for all $n$, the return for the $m$th step after a transition begins from state $i$ can be denoted $u(m|i)$ rather than $u(m|i,h_n)$. Similarly, the transition

distribution $q_{ij}(\tau|h_n)$ can now be simply denoted as $q_{ij}(\tau)$. The equations for these quantities are

$$u(m|i) = \sum_{k=1}^{K_i} \eta(k|i)\, u(m|i,k)$$

and

$$q_{ij}(\tau) = \sum_{k=1}^{K_i} \eta(k|i)\, q_{ij}(\tau|k)$$

where $\eta(k|i)$ are the action selection probabilities for $\pi_d$, and the other quantities are defined in (2.2) and (2.9). Note that for each $i$ these depend only upon $e_i = (\eta(1|i),\ldots,\eta(K_i|i))$.

Equation (3.30) can now be written

$$D_\alpha^*(s_0) = \sum_{i=1}^{N} \sum_{m=0}^{\infty} \psi_0(i,m)\, \alpha^m \max_{e_i} \left[ \sum_{k=1}^{K_i} \eta(k|i) \left( \sum_{n=1}^{\infty} \alpha^n u(n|i,k) + \sum_{j=1}^{N} \sum_{v=1}^{\infty} \alpha^v q_{ij}(v|k)\, D_\alpha^*(j) \right) \right]$$

The bracketed quantity has a maximum value for some $k_i \in A_i$. The $\eta(k|i)$ are constrained to be probabilities. It is seen then from the properties of such a constrained maximization that it is sufficient to choose $e_i = (0,0,\ldots,0,1,0,\ldots,0)$, where for every $i$ the unit value occurs in the $k_i$ position, that is, $\eta(k|i) = 1$ for $k = k_i$, $\eta(k|i) = 0$ otherwise.
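The constrained-maximization step can be made explicit with a one-line worked bound. Writing $c_k$ for the parenthesized quantity associated with action $k$ in state $i$, linearity in the probabilities gives, for any probability vector $e_i$,

$$\sum_{k=1}^{K_i} \eta(k|i)\, c_k \;\le\; \Big(\sum_{k=1}^{K_i} \eta(k|i)\Big) \max_k c_k \;=\; \max_k c_k,$$

with equality when $\eta(k_i|i) = 1$ for a maximizing index $k_i$. Hence nothing is lost by restricting attention to the unit vectors, which is precisely the deterministic choice claimed.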

The two preceding theorems allow one to describe an optimal policy in much simpler terms than one might originally anticipate. This is reason enough to seek only an optimal policy fitting this description, i.e., a stationary deterministic policy which is optimal for every initial state. Granting this, any further consideration of discounted optimization may consider the process as beginning at time 0 in an initial state $i$, without loss of generality. Henceforth, the subset of P consisting of all stationary deterministic policies will be called P', and any member will be called a P' policy. Our principal interest is to show in the next chapter that P' contains an optimal policy for infinite-time optimization. The proof will lean heavily on the results of Theorems 3.7, 3.10, and 3.11.

CHAPTER IV

THE INFINITE-TIME DECISION PROCESS

As we have already mentioned, the notion of discounting has a definite physical interpretation in applications to economics and management science. But discounting does not seem to be indicated in applying the decision process to the control of queues. In high volume traffic situations in modern computer and communication systems, the transitions in state and the control decisions occur very rapidly in time. Over a period of operation as short as a few hours, it is reasonable to view the system as operating indefinitely, with respect to the average time interval separating successive decisions. With respect to the useful lifetime of the system, however, a few hours is small enough to obviate any discounting of performance throughout this interval. The infinite-time optimization problem discussed in Chapters II and III therefore seems to be the most important decision problem in the control of queues.

This chapter will extend the principal result of Chapter III to this case. With this result we shall be able to introduce the Howard algorithm for determining an optimal stationary deterministic policy. This chapter will close with some consideration of the computational aspects of the Howard algorithm, preparatory to using it for optimization of computer time-sharing in Chapters V and VI.

4.1 EXISTENCE OF AN OPTIMAL STATIONARY DETERMINISTIC POLICY

We are able to establish that infinite-time optimization can be achieved with a stationary deterministic policy by considering discounting optimization when the discount factor $\alpha \to 1$ from the left, written $\alpha \to 1^-$. From Chapter III it is known that optimization with discounting can be obtained using a P' policy for $0 < \alpha < 1$. The following section develops an important theorem relating the discounted return $D_\alpha(s_0,\pi)$ as $\alpha \to 1^-$ and the asymptotic behavior of $V_T(s_0,\pi)/T$ as $T \to \infty$ for a policy $\pi \in P'$. This is subsequently used in the existence theorem, Theorem 4.2, which is the principal theoretical result of this report.

4.1.1 System Behavior with a P' Policy

Under any stationary deterministic policy, i.e., any P' policy, the decision process can be treated as a Markov-Renewal stochastic process [58,88]. As Pyke [88] points out, this process is fundamentally determined by the transition distribution

$$q_{ij}(\tau) = q_{ij}(\tau|k_i), \qquad i,j = 1,2,\ldots,N;\; \tau = 1,2,\ldots$$

where $k_i$ is the action specified for state $i$ by the assumed P' policy. Theorem 4.1, to be established shortly, relies on certain results obtained from the Markov-Renewal approach.

The Markov-Renewal process concerns the random variable $U_j(T)$ and its conditional expectation $M_{ij}(T)$. The precise definitions are

$U_j(T)$ = the number of times the system enters state $j$ in the time interval $[1,T]$, $T = 1,2,3,\ldots$

and

$M_{ij}(T)$ = the expectation of $U_j(T)$, conditional on the initial state $i$ at time 0, or

$$M_{ij}(T) = E(U_j(T) \mid Z_0 = i,\; W_0 = 0)$$

If we also define the probability $\psi_{ij}(n)$,

$\psi_{ij}(n)$ = probability that the system enters state $j$ at time $n$, conditional on the initial state $i$ at time 0, for $n = 1,2,3,\ldots$

then

$$M_{ij}(T) = \sum_{n=1}^{T} \psi_{ij}(n) \tag{4.1}$$

Note that the initial state $i$ is not counted in $M_{ii}(T)$. The renewal function $M_{ij}(T)$ is related to the first passage probability distribution $G_{ij}(T)$, where

$G_{ij}(T)$ = probability that the system has entered state $j$ at least once in the interval $[1,T]$, conditional on the initial state $i$ at time 0.

Note that

$$G_{ij}(T) = \Pr(U_j(T) \ge 1 \mid Z_0 = i,\; W_0 = 0)$$

The relation between $G_{ij}(T)$ and $M_{ij}(T)$ is derived by Pyke [88] and is expressed in a convolution form following Eq. 5.13 of [88]. Written out for the discrete-time model, this result is

$$M_{ij}(T) = G_{ij}(T) + \sum_{n=1}^{T-1} G_{ij}(T-n)\, \psi_{jj}(n) \tag{4.2}$$

The important properties of $M_{ij}(T)$ which we require concern its asymptotic behavior as $T \to \infty$. In this regard, we define

$f_{ij}$ = probability of passage from state $i$ to state $j$, or

$$f_{ij} = \lim_{T\to\infty} G_{ij}(T) \tag{4.3}$$

Any state $j$ for which $f_{jj} = 1$ is called a recurrent state ([58], p. 1240). For a recurrent state the mean recurrence time $\mu_{jj}$ is defined by

$$\mu_{jj} = \sum_{n=1}^{\infty} n\,[G_{jj}(n) - G_{jj}(n-1)] = \sum_{n=1}^{\infty} n\, g_{jj}(n) \tag{4.4}$$

Note that $g_{jj}(n)$ is the probability that the first recurrence of state $j$ occurs at time $n$. Pyke has shown ([58], p. 1241) that $\mu_{jj}$ is finite for a finite state Markov-Renewal process if and only if $j$ is a recurrent state and $\nu_i^{k_i} < \infty$ for all states $i$ which communicate with state $j$. Because we have assumed $\nu_i^k < \infty$ for all $i$ and $k$, and the decision process has a finite number N of states, $\mu_{jj}$ is finite for any recurrent state $j$.

There are two important theorems on the asymptotic behavior of $M_{ij}(T)$ which can be established from the theory of simple renewal processes [90,91]. These are developed in Appendix B. We have chosen to give them the names by which their counterparts in renewal theory are known. For Theorem 4.1 we need only the following.

Elementary Renewal Theorem: If $j$ is a recurrent state of a Markov-Renewal process, then

$$\lim_{T\to\infty} \frac{M_{ij}(T)}{T} = \frac{f_{ij}}{\mu_{jj}}$$

for any state $i$. If $j$ is non-recurrent, i.e., $f_{jj} < 1$, the limit is zero for any state $i$. A proof is given in Appendix B.

Using this we now establish

Theorem 4.1: Let R be the subset of the states $i = 1,2,3,\ldots,N$ such that $f_{ii} = 1$ under a given policy $\pi \in P'$. For any $s_0 \in S$,

$$\lim_{\alpha\to 1^-} (1-\alpha)\, D_\alpha(s_0,\pi) = \lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T} = \sum_{i=1}^{N} \sum_{j\in R} \varphi_i\, f_{ij}\, \frac{b_j^{k_j}}{\mu_{jj}} \tag{4.5}$$

where

$$\varphi_i = \sum_{v=0}^{\infty} \psi_0(i,v) \tag{4.6}$$

is the probability of the initial state $i$, and $b_j^{k_j}$ is the one-transition return for state $j$,

$$b_j^{k_j} = \sum_{m=1}^{\infty} u(m|j,k_j)$$

Proof: We will establish that $V_T(s_0,\pi)/T$ has the limiting value claimed, and use an Abelian theorem from Appendix A to complete the proof of the theorem.

To begin, one can show by specializing (3.18) that

$$\frac{V_T(s_0,\pi)}{T} = \frac{1}{T} \sum_{n=1}^{T} R_n(s_0,\pi) = \frac{1}{T} \sum_{n=1}^{T} \sum_{i=1}^{N} \sum_{v=0}^{n-1} \psi_0(i,v)\, u(n-v|i,k_i) + \frac{1}{T} \sum_{n=2}^{T} \sum_{i=1}^{N} \sum_{v=0}^{n-2} \psi_0(i,v) \sum_{j=1}^{N} \sum_{m=1}^{n-v-1} \psi_{ij}(m)\, u(n-v-m|j,k_j)$$

Because the summations are finite, the order can be interchanged, and

$$\frac{V_T(s_0,\pi)}{T} = \frac{1}{T} \sum_{i=1}^{N} \sum_{v=0}^{T-1} \psi_0(i,v) \sum_{n=1}^{T-v} u(n|i,k_i) + \frac{1}{T} \sum_{i=1}^{N} \sum_{v=0}^{T-2} \psi_0(i,v) \sum_{j=1}^{N} \sum_{n=2}^{T-v} \sum_{m=1}^{n-1} \psi_{ij}(m)\, u(n-m|j,k_j)$$

From lemma 1 of Appendix A,

$$\lim_{T\to\infty} \sum_{v=0}^{T-1} \psi_0(i,v) \sum_{n=1}^{T-v} u(n|i,k_i) = \varphi_i\, b_i^{k_i}$$

which has finite value, and therefore

$$\lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T} = \lim_{T\to\infty} \frac{1}{T} \sum_{i=1}^{N} \sum_{v=0}^{T-2} \psi_0(i,v) \sum_{j=1}^{N} \sum_{n=2}^{T-v} \sum_{m=1}^{n-1} \psi_{ij}(m)\, u(n-m|j,k_j)$$

Again by lemma 1 of Appendix A,

$$\lim_{T\to\infty} \frac{1}{T} \sum_{v=0}^{T-2} \psi_0(i,v) \sum_{j=1}^{N} \sum_{n=2}^{T-v} \sum_{m=1}^{n-1} \psi_{ij}(m)\, u(n-m|j,k_j) =$$

$$\sum_{v=0}^{\infty} \psi_0(i,v) \sum_{j=1}^{N} \lim_{T\to\infty} \frac{1}{T} \sum_{n=1}^{T} \sum_{m=1}^{n} \psi_{ij}(m)\, u(n+1-m|j,k_j)$$

whenever the latter limit exists. Now from lemma 2 of Appendix A and the Elementary Renewal Theorem above,

$$\lim_{T\to\infty} \frac{1}{T} \sum_{n=1}^{T} \sum_{m=1}^{n} \psi_{ij}(m)\, u(n+1-m|j,k_j) = \lim_{T\to\infty} \frac{1}{T} \sum_{m=1}^{T} \psi_{ij}(m) \sum_{n=1}^{T+1-m} u(n|j,k_j) = \frac{f_{ij}\, b_j^{k_j}}{\mu_{jj}}$$

if state $j$ is recurrent, and is zero otherwise. Therefore

$$\lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T} = \sum_{i=1}^{N} \sum_{v=0}^{\infty} \psi_0(i,v) \sum_{j\in R} \frac{f_{ij}\, b_j^{k_j}}{\mu_{jj}} = \sum_{i=1}^{N} \sum_{j\in R} \varphi_i\, \frac{f_{ij}\, b_j^{k_j}}{\mu_{jj}}$$

Abelian Theorem II of Appendix A now establishes that

$$\lim_{\alpha\to 1^-} (1-\alpha)\, D_\alpha(s_0,\pi) = \lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T}$$

4.1.2 Principal Existence Theorem

The principal theorem of this report is:

Theorem 4.2: There exists a stationary, deterministic policy $\pi^* \in P'$ which maximizes

$$g(s_0,\pi) = \liminf_{T\to\infty} \frac{V_T(s_0,\pi)}{T}$$

for every $s_0 \in S$.

Proof: First we establish that there exists a sequence $\{\alpha_i\}$ such that $0 < \alpha_i < 1$ and $\{\alpha_i\} \to 1$, and a policy $\pi_d \in P'$ for which

$$D_{\alpha_i}(s_0,\pi) \le D_{\alpha_i}(s_0,\pi_d)$$

for every $s_0 \in S$, all $\pi \in P$, and every $i$. Let $\{\beta_i\}$, $0 < \beta_i < 1$ for all $i$, be any sequence converging to the value 1. Because $\{\beta_i\}$ is a convergent sequence it is also a Cauchy sequence (see Rudin [56], p. 46). Therefore every subsequence of $\{\beta_i\}$ converges to the same limit, 1. In Chapter III it has been shown that for every element $\beta_i$ there exists an element, say $\pi_i$, contained in P' and maximizing $D_{\beta_i}(s_0,\pi)$ for every $s_0 \in S$. But P' is a finite set of policies. In fact the number of policies in P' is equal to the product of the values $K_i$ ($i = 1,2,\ldots,N$), $\prod_{i=1}^{N} K_i$. Hence there exists at least one element of P' which occurs infinitely often in the sequence $\{\pi_i\}$. Letting $\pi_d$ be such an element, $\{\alpha_i\}$ can be taken as the subsequence of $\{\beta_i\}$ such that $\pi_i = \pi_d$. Then for the sequence $\{\alpha_i\}$ and the associated policy $\pi_d$,

$$(1-\alpha_i)\, D_{\alpha_i}(s_0,\pi) \le (1-\alpha_i)\, D_{\alpha_i}(s_0,\pi_d)$$

This holds for any policy, i.e., any $\pi \in P$, and therefore

$$\limsup_{i\to\infty}\, (1-\alpha_i)\, D_{\alpha_i}(s_0,\pi) \le \lim_{i\to\infty}\, (1-\alpha_i)\, D_{\alpha_i}(s_0,\pi_d)$$

From Theorem 3.4 and Theorem III of Appendix A it follows that

$$\liminf_{T\to\infty} \frac{V_T(s_0,\pi)}{T} \le \limsup_{i\to\infty}\, (1-\alpha_i)\, D_{\alpha_i}(s_0,\pi)$$

for every $s_0 \in S$ and any $\pi \in P$. Theorem 4.1 however has established that

$$\lim_{i\to\infty}\, (1-\alpha_i)\, D_{\alpha_i}(s_0,\pi_d) = \lim_{T\to\infty} \frac{V_T(s_0,\pi_d)}{T}$$

Hence

$$\lim_{T\to\infty} \frac{V_T(s_0,\pi_d)}{T} \ge \liminf_{T\to\infty} \frac{V_T(s_0,\pi)}{T}$$

for any $\pi \in P$ and every $s_0 \in S$. The theorem is proven by letting $\pi^* = \pi_d$.

This theorem, with Theorem 4.1, points out that for any policy $\pi \in P'$ the following limit exists,

$$g(s_0,\pi) = \lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T}, \qquad \text{for } \pi \in P'. \tag{4.7}$$

The value g is referred to as the "gain" of the policy $\pi$. This term was introduced by Howard [63]. The results of Chapter III, together with the above theorem, correspond to what Derman [66] has done for the decision process with constant transition durations.

4.2 ILLUSTRATIONS OF THE PRINCIPAL THEOREM

The principal theorem has considerable merit because of its generality. No unusual restrictions are imposed on the transition probabilities $p_{ij}^k$, and the return functions $r_{ij}^k(m|\tau)$ can be arbitrary as long as $b_i^k$, the expectation of total return over one transition, is defined by an absolutely convergent series; see Eq. (2.10). Since there are no complex conditions in the theorem, the conclusion can be easily applied to models of physical processes, often by inspection.

The theorem is not trivial to prove, as evidenced by the deliberations up to this point, even allowing that we have included certain steps which are obvious to the sophisticated reader. Moreover, the result is not a trivial property. To illustrate this point, we would like to consider some simple examples. The first shows that an obviously nonergodic system falls under the theorem. The second demonstrates that systems do exist which do not possess the property of the theorem. This example has non-stationary return functions, and we show that a non-stationary policy is better than any stationary policy for the system. Finally we show in a third example that the assumption of absolute convergence associated with $b_i^k$ is not necessary, but only sufficient, for our results.

4.2.1 First Example

Consider a decision process in which there are two states for the system, $i = 1,2$. Let there be only one action allowed in the first state, and two alternative actions in the second state. Thus there are two P' policies corresponding to the choice of $k = 1$ or $k = 2$ in state 2. We shall refer to the former as policy 1, and the latter as policy 2. For both policies, let the transition probabilities be

$$p_{11} = 0, \quad p_{12} = 1, \quad p_{21} = 1, \quad p_{22} = 0$$

The two policies will differ however in regard to the transition intervals and the return values. Assume that

$$f_{12}(1) = 1, \quad f_{12}(\tau) = 0 \text{ for } \tau > 1; \qquad f_{21}^1(1) = 1, \quad f_{21}^1(\tau) = 0 \text{ for } \tau > 1;$$

$$f_{21}^2(1) = 0, \quad f_{21}^2(2) = 1, \quad f_{21}^2(\tau) = 0 \text{ for } \tau > 2;$$

and also assume that

$$r_{12}(1|1) = 0, \quad r_{21}^1(1|1) = 1, \quad r_{21}^2(1|2) = 0, \quad r_{21}^2(2|2) = 2. \tag{4.8}$$

These are the only data values which matter, because of the assumed transition probabilities and transition time distributions. Figure 4.1 depicts the behavior of the system under the two P' policies, assuming the decision process begins in state 1. It is clear that the two systems are periodic and thus the underlying Markov chains do not have stationary solutions. The systems are not ergodic in the sense of a stochastic process.

Observe that the two processes can be compared in terms of 6 unit intervals, i.e., the relative phase of the two processes is the same at times 6, 12, 18, etc., as at time 0. The expected total return at time $n$, assuming the process begins in state 1 at time 0, is $V_n^1(1)$ for policy 1 and $V_n^2(1)$ for policy 2. The return at time 1 is zero for both policies since $r_{12}(1|1) = 0$. But $V_2^1(1) = 1$, while $V_2^2(1) = 0$ because $r_{21}^2(1|2) = 0$. Using the notation

$$[x] = \text{integer part of } x, \quad x \ge 0$$

[Figure] Fig. 4.1. State transitions in the first example: (a) behavior under policy 1; (b) behavior under policy 2. (Each panel plots the state occupied against time, 0 through 10.)

we can write in general

$$V_n^1(1) = \left[\frac{n}{2}\right] \qquad \text{and} \qquad V_n^2(1) = 2\left[\frac{n}{3}\right]$$

Then

$$A_n^1(1) = \frac{V_n^1(1)}{n} = \frac{1}{n}\left[\frac{n}{2}\right]$$

and because $[n/2] \le n/2$, it is clear that $0 \le A_n^1(1) \le 1/2$. Moreover $A_n^1(1)$ approaches 1/2 as a limit for $n \to \infty$. This is true because

$$\left[\frac{n}{2}\right] = \frac{n}{2} - \frac{\delta_n}{2}$$

Although $\delta_n$ changes value with $n$, $0 \le \delta_n \le 1$ for all $n$, and therefore

$$\left|\frac{1}{2} - \frac{1}{n}\left[\frac{n}{2}\right]\right| = \frac{\delta_n}{2n} \le \frac{1}{2n}$$

which is as small as desired for $n$ sufficiently large. This limit value, 1/2, is the gain g of policy 1.

Similarly it can be seen that $0 \le A_n^2(1) \le 2/3$, and $A_n^2(1) \to 2/3$ as $n \to \infty$. Because policy 2 has a higher gain than policy 1, we conclude that policy 2 is optimal. From our theorem, no other admissible policy has higher gain than the value 2/3. It is also clear that policy 2 is optimal if the processes begin in state 2. Looking at Fig. 4.1, this situation comes about by shifting the time origin to the right one unit. Because $V_1^1(1) = V_1^2(1) = 0$, the gain of either policy has the same value as before.
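The two gains are easy to confirm numerically. The following small check (an illustration only; the function and its argument layout are invented for the purpose) accumulates the deterministic returns of each policy step by step and prints the time averages $V_n(1)/n$.

```python
def total_return(n, durations, step_returns):
    """Deterministic total return V_n(1): transitions alternate
    1 -> 2 -> 1 -> ..., each lasting durations[j] steps and paying
    step_returns[j][m] at the (m+1)th step after it begins."""
    t, total, j = 0, 0.0, 0
    while t < n:
        for m in range(durations[j % 2]):
            t += 1
            if t > n:
                return total
            total += step_returns[j % 2][m]
        j += 1
    return total

# Policy 1: both transitions last one step; step returns 0 and 1.
# Policy 2: the 2 -> 1 transition lasts two steps; returns (0) and (0, 2).
for name, d, r in [("policy 1", [1, 1], [[0], [1]]),
                   ("policy 2", [1, 2], [[0], [0, 2]])]:
    print(name, total_return(6000, d, r) / 6000)   # 0.5 and 0.6666...
```

Both ratios agree with the limits 1/2 and 2/3 just derived.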

4.2.2 Second Example

We now show that a decision process which does not conform with the assumptions we have made has a nonstationary policy with greater gain than any stationary policy. Consider basically the same process as the first example, but replace (4.8) with another set of returns, including one which changes with time. For values of time $n$ such that $6(m-1) < n \le 6m$, let

$$r_{12}(1|1) = \tfrac{2}{3}\left(1 + (-1)^m\right), \qquad m = 1,2,\ldots$$

Let the other return functions have the same values as in (4.8). Thus over the first 6 intervals the transitions from state 1 contribute nothing to the total return, as in the first example. Over the next 6 units, from time 6 to time 12, each transition from state 1 earns a return of 4/3.

In the first example policy 2 earned 4 return units in each 6 unit time interval, whereas policy 1 earned 3 units of return. Here policy 1 will earn $3 + 3\cdot\tfrac{4}{3} = 7$ return units in any 6 unit interval in which state 1 contributes return. On the other hand, policy 2 earns only $4 + 2\cdot\tfrac{4}{3} = 6\tfrac{2}{3}$ return units in such intervals. Thus

$$A_n^1(1) \to \frac{3+7}{12} = 5/6 \quad \text{as } n \to \infty$$

$$A_n^2(1) \to \frac{4 + 6\tfrac{2}{3}}{12} = 8/9 \quad \text{as } n \to \infty$$

and policy 2 is the preferred stationary policy. However the nonstationary policy which uses action 1 in state 2 for those time values $n$ such that $6(m-1) < n \le 6m$, $m = 2,4,6,8,\ldots$, and action 2 in state

2 for other values of time, will have a greater gain than either policy 1 or 2. Its gain is

$$g = \frac{7 + 4}{12} = 11/12$$

4.2.3 Third Example

The requirement that the series

$$\sum_{m=1}^{\infty} |u(m|i,k)|$$

be convergent for all $i$ and $k$, where

$$b_i^k = \sum_{m=1}^{\infty} u(m|i,k)$$

has simplified the proof of the results in Chapters III and IV. It has played an important role, for example, in Theorems 3.3 and 3.4, Theorem 3.7, and Theorem 4.1. Yet these theorems may be true for certain decision processes which fail to meet this requirement. We now give an example of a particular system for which Theorems 3.3 and 3.4 hold.

It is sufficient to consider the behavior of a two state system¹ under only one stationary policy. Let the transition matrix P of the underlying Markov chain be

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

Let

$$f_{12}(1) = 1, \quad f_{12}(\tau) = 0 \text{ for } \tau > 1$$

¹Because there is only one action in each state we drop the $k$ index from $u(m|i,k)$, $r_{ij}^k(m|\tau)$, etc.

and

$$f_{21}(\tau) = (1-\mu)\,\mu^{\tau-1} \qquad (\tau = 1,2,\ldots)$$

where $0 < \mu < 1$. Thus the time duration of the transition from state 2 to state 1 is determined by a "coin-flipping" random mechanism, with $\mu$ the probability that the transition does not end on any one unit time step. We additionally specify

$$r_{12}(m|\tau) = 0 \quad \text{for all } m,\tau$$

and

$$r_{21}(m|\tau) = \frac{(-1)^{m-1}}{m\,\mu^{m-1}} \qquad (m = 1,2,3,\ldots,\tau)$$

Thus $r_{21}(m|\tau)$ oscillates with increasing magnitude as $m$ increases. Now one sees that $u(m|1) = 0$ for all $m$, and

$$u(m|2) = \sum_{\tau=m}^{\infty} (1-\mu)\,\mu^{\tau-1}\, \frac{(-1)^{m-1}}{m\,\mu^{m-1}} = \frac{(-1)^{m-1}}{m}$$

for $m = 1,2,3,\ldots$ Then the alternating series

$$b_2 = \sum_{m=1}^{\infty} u(m|2) = \sum_{m=1}^{\infty} \frac{(-1)^{m-1}}{m}$$

converges ([56], p. 62), whereas the series of absolute values

$$\sum_{m=1}^{\infty} |u(m|2)| = \sum_{m=1}^{\infty} \frac{1}{m}$$

diverges, for it is the well-known harmonic series. Nevertheless the discounted return $D_\alpha(s_0,\pi)$ exists for every $s_0$, since

$$|D_\alpha(s_0,\pi)| \le \sum_{n=1}^{\infty} \alpha^n |R_n(s_0,\pi)| \le \sum_{n=1}^{\infty} \alpha^n$$

and the last series converges for $0 < \alpha < 1$. Moreover

$$\lim_{T\to\infty} \frac{V_T(s_0,\pi)}{T} = \lim_{T\to\infty} \frac{1}{T} \sum_{n=1}^{T-1} \psi_{12}(n) \sum_{m=1}^{T-n} u(m|2)$$

and this limit exists.

4.3 THE CASE OF ERGODIC POLICIES

An objective of this dissertation is to demonstrate that the Semi-Markov decision process is useful for investigating systems with a large number of alternative control policies. The procedure just used in the illustrations to determine an optimal stationary policy is essentially one of direct enumeration. This is an impractical procedure for the large models of interest in applications. An optimization technique which is effective in large problems has been devised by R. Howard [74] and W. S. Jewell [73] for the general class of Semi-Markov decision processes. This includes the discrete-time process which we are considering.

The Howard algorithm for Semi-Markov decision processes has been developed for the case where all policies in the set P' are "ergodic", originally called the completely ergodic case by Howard [74]. A similar algorithm for nonergodic policies can undoubtedly be developed, as Howard has done [74] for the special case of constant transition intervals. This will not be necessary for our demonstration.

The ergodic property is determined by the underlying Markov chain of the decision process and the associated transition time distributions. A policy in P' will be called ergodic if it produces a single closed, communicating class of states which is aperiodic and recurrent. These terms indicate concepts which are very much related to the same concepts in the theory of Markov chains.¹ A set of states is a closed, communicating class if every state can be reached from every other state, and it is impossible to reach a state outside the class from any state within it. A closed communicating class with a finite number of states can be shown to be recurrent, i.e., every state in the class is recurrent. Since we are treating a decision process with a finite number of states, every closed communicating class of states is recurrent.

A recurrent state $i$ is aperiodic (for the discrete-time case) if the greatest common divisor of the values $n$ for which $\psi_{ii}(n) > 0$ is unity. Now one can see that

$$\psi_{ii}(n) = q_{ii}(n) + \sum_{j=1}^{N} \sum_{m=1}^{n-1} \psi_{ij}(m)\, q_{ji}(n-m)$$

for the system may return to state $i$ at time $n$ in one transition of duration $n$; or, it may pass at time $m < n$ to any other state and subsequently return to state $i$ at $n$. It is evident from this equation that state $i$ is aperiodic if $q_{ii}(n)$ is aperiodic, i.e., is not a lattice distribution.

¹See Parzen [46], Gantmacher [95] for a complete discussion of the classification of states of a Markov chain.

This means for discrete time that the greatest common divisor of the values $n$ for which $q_{ii}(n) > 0$ is unity (and that $q_{ii}(n) > 0$ for at least one value of $n = 1,2,3,\ldots$). A sufficient condition that a closed communicating class be aperiodic is that there exists at least one state $i$ in the class for which $q_{ii}(n)$ is aperiodic. For all states in the class are aperiodic if any one is, as can be seen from the corresponding argument for Markov chains, Feller [4], p. 354.

For an ergodic policy the asymptotic behavior of the expected total return $V_T(i)$ as $T \to \infty$ has an especially convenient form. This behavior is the key to Howard's algorithm. The purpose of the next theorem is to establish this behavior. In stating the theorem, we suppress the notation which indicates the dependence of the return upon the policy and the actions for each state. The symbol R denotes the recurrent class of states for the assumed ergodic policy.

Theorem 4.3: An ergodic policy is such that $V_T(i) \to gT + v_i$ as $T \to \infty$, where $v_i$ is a constant and the gain g is independent of $i$,

$$g = \sum_{j\in R} \frac{b_j}{\mu_{jj}} \tag{4.9}$$

Proof: It is sufficient to prove that

$$\lim_{T\to\infty} [V_{T+1}(i) - V_T(i)] = g$$

for any state $i$. By specializing Eq. (3.18) one can show that for any

policy in P', the one-step return at time T+1 conditional on the initial state $i$ at time 0 is

$$R_{T+1}(i) = V_{T+1}(i) - V_T(i) = u(T+1|i) + \sum_{j=1}^{N} \sum_{m=1}^{T} u(m|j)\, \psi_{ij}(T+1-m)$$

Then using (4.1) one sees that

$$\psi_{ij}(T+1-m) = M_{ij}(T+1-m) - M_{ij}(T-m)$$

Next using (4.2),

$$V_{T+1}(i) - V_T(i) = u(T+1|i) + \sum_{j=1}^{N} \sum_{m=1}^{T} u(m|j)\,[G_{ij}(T+1-m) - G_{ij}(T-m)]$$

$$+ \sum_{j=1}^{N} \sum_{m=1}^{T} u(m|j) \sum_{t=1}^{T-m} [G_{ij}(t) - G_{ij}(t-1)]\, \psi_{jj}(T+1-m-t) \tag{4.10}$$

Taking the limit of (4.10) as $T \to \infty$, $u(T+1|i) \to 0$ because it is a term of a convergent series. Because $\lim_{T\to\infty} G_{ij}(T) = f_{ij} \le 1$, it is seen that

$$\lim_{T\to\infty} [G_{ij}(T+1-m) - G_{ij}(T-m)] = 0$$

Then because the series in $u(m|j)$ is absolutely convergent for all $j$, lemma 1 of Appendix A allows us to establish that

$$\sum_{m=1}^{T} u(m|j)\,[G_{ij}(T+1-m) - G_{ij}(T-m)] \to 0 \quad \text{as } T \to \infty$$

The first two terms of (4.10) therefore vanish in the limit. Again using lemma 1 of Appendix A we can now say that

$$\lim_{T\to\infty} [V_{T+1}(i) - V_T(i)] = \sum_{j=1}^{N} b_j \lim_{T\to\infty} \sum_{t=1}^{T} [G_{ij}(t) - G_{ij}(t-1)]\, \psi_{jj}(T+1-t)$$

whenever the limit on the right side exists. To show the existence of the limit we must consider whether or not $j$ is a recurrent state. If $j$ is not recurrent, from Feller [100] we have the result (B.3) that

$$\sum_{t=1}^{\infty} \psi_{jj}(t) < \infty$$

So from lemma 1 of Appendix A,

$$\lim_{T\to\infty} \sum_{t=1}^{T} [G_{ij}(t) - G_{ij}(t-1)]\, \psi_{jj}(T+1-t) = 0$$

Next, if $j$ is a recurrent state, the Key Renewal Theorem in Appendix B establishes that the limit has value $f_{ij}/\mu_{jj}$. Therefore

$$\lim_{T\to\infty} [V_{T+1}(i) - V_T(i)] = \sum_{j\in R} \frac{b_j\, f_{ij}}{\mu_{jj}}$$

But $f_{ij} = 1$ for $j \in R$, since the system eventually passes into the recurrent class from any state. So we have shown that

$$\lim_{T\to\infty} [V_{T+1}(i) - V_T(i)] = g = \sum_{j\in R} \frac{b_j}{\mu_{jj}}$$
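As an illustrative check (although the policies of the first example are periodic rather than ergodic, the time-average gain there coincides with the right side of (4.9)), consider policy 2 of Section 4.2.1. Both states form a cycle of length $1 + 2 = 3$, so $\mu_{11} = \mu_{22} = 3$, while the one-transition returns are $b_1 = 0$ and $b_2 = 0 + 2 = 2$. Then

$$\sum_{j\in R} \frac{b_j}{\mu_{jj}} = \frac{0}{3} + \frac{2}{3} = \frac{2}{3}$$

which is the gain found directly for that policy. For policy 1 the cycle has length 2 with $b_1 = 0$, $b_2 = 1$, giving $0/2 + 1/2 = 1/2$, again in agreement.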

Note therefore that the gain of an ergodic policy does not depend upon transient states, those not contained in R. Although both Howard [74] and Jewell [73] recognize and use the truth of Theorem 4.3, neither has provided the proof in his paper.

The values $v_i$ have been termed "relative values" by Howard. They can only be determined to within an additive constant, as a consequence of the next theorem.

Theorem 4.4: For an ergodic policy the gain g and the relative values $v_i$ ($i = 1,2,\ldots,N$) are given by the system of equations

$$\nu_i\, g + v_i = b_i + \sum_{j=1}^{N} p_{ij}\, v_j \tag{4.11}$$

Proof: Defining

$$C_i(T) = \sum_{m=1}^{T} u(m|i)$$

and again suppressing the index $k$ denoting the action, the finite time return $V_T(i)$ is given by

$$V_T(i) = C_i(T) + \sum_{j=1}^{N} \sum_{m=1}^{T-1} p_{ij}\, f_{ij}(m)\, V_{T-m}(j) \tag{4.12}$$

Then subtracting gT from both sides,

$$V_T(i) - gT = C_i(T) + \sum_{j=1}^{N} \sum_{m=1}^{T-1} p_{ij}\, f_{ij}(m)\,[V_{T-m}(j) - g(T-m)]$$

$$- \sum_{j=1}^{N} \sum_{m=1}^{T-1} g\, p_{ij}\, f_{ij}(m)\, m \;-\; gT \sum_{j=1}^{N} p_{ij} \sum_{m=T}^{\infty} f_{ij}(m) \tag{4.13}$$

Consider the limit of both sides as $T \to \infty$. From Theorem 4.3 we know that

$$V_T(i) - gT \to v_i$$

and clearly

$$C_i(T) \to b_i, \qquad \sum_{j=1}^{N} \sum_{m=1}^{T-1} p_{ij}\, f_{ij}(m)\, m \to \nu_i$$

Because

$$0 \le T \sum_{j=1}^{N} p_{ij} \sum_{m=T}^{\infty} f_{ij}(m) \le \sum_{j=1}^{N} p_{ij} \sum_{m=T}^{\infty} m\, f_{ij}(m)$$

and the right side of this inequality approaches zero, the rightmost term of (4.13) approaches zero as $T \to \infty$. From Theorem 4.3 and the properties of $f_{ij}(m)$ it follows that

$$\sum_{m=1}^{T-1} f_{ij}(m)\,[V_{T-m}(j) - g(T-m)] \to v_j \quad \text{as } T \to \infty.$$

Thus from (4.13) we find

$$\nu_i\, g = b_i + \sum_{j=1}^{N} p_{ij}\, v_j - v_i$$

as claimed.

One can see that (4.11) remains valid if any constant is added to both sides of every one of the N equations. In particular if we add $(-v_N)$ the equations remain valid. Henceforth then we can consider $v_N = 0$ if this is convenient. With this convention, (4.11) represents N equations in N unknowns, $g, v_1, v_2,\ldots,v_{N-1}$. Thus it can be solved

to give the gain g of any P' policy in the completely ergodic case. An optimal P' policy is one having a maximum value of g.

4.4 THE HOWARD ALGORITHM

The Howard algorithm is a systematic and effective computing procedure for finding a P' policy having maximum gain g. The algorithm consists of two fundamental operations. The "Value-Determination Operation" consists of solving (4.11) for a given P' policy to find its gain g and relative values $v_i$. The "Policy-Improvement Operation" allows one to use the relative values $v_i$ of a policy in a systematic way to determine a new policy of increased gain. Execution of the algorithm consists of cycling through these operations until the policy on two successive passes through the Policy-Improvement Operation is the same. The computation can begin in either of the two phases. Figure 4.2 diagrams the calculation of the Howard algorithm.

In the Policy-Improvement Operation one evaluates a set of test quantities

$$\delta_i(k) = \frac{1}{\nu_i^k}\left[ b_i^k + \sum_{j=1}^{N} p_{ij}^k\, v_j - v_i \right] \tag{4.14}$$

for each $i$ and $k = 1,2,\ldots,K_i$. The values $v_i$ are those determined for the previous policy considered, or can be taken as zero if the computation is being started. The new policy consists of the set of action indices $k_i$ ($i = 1,2,\ldots,N$) such that

$$\delta_i(k_i) = \max_k \delta_i(k) \tag{4.15}$$

[Figure] Fig. 4.2. The computation of the Howard algorithm. The flowchart shows the cycle: Start → choose an initial policy (or set the relative values $v_i = 0$) → Policy-Improvement Operation: for $i = 1,2,\ldots,N$ and $k = 1,2,\ldots,K_i$ evaluate the test quantities $\delta_i(k)$ of (4.14) and find the policy with actions $k_i$ such that $\delta_i(k_i) = \max_k \delta_i(k)$ → if the new policy is the same as the previous policy, End; otherwise → Value-Determination Operation: using the $b_i$, $\nu_i$, and $p_{ij}$ values for the policy, solve (4.11) for $v_1,\ldots,v_{N-1}$ and g with $v_N = 0$ → return to the Policy-Improvement Operation.
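The cycle of Fig. 4.2 is compact enough to sketch in full. The fragment below is a minimal illustration, not the program of Appendix D; it assumes, for simplicity, the same number of actions in every state, with the data supplied as hypothetical numpy arrays b[i, k], nu[i, k], and p[i, k, j].

```python
import numpy as np

def value_determination(policy, b, nu, p):
    """Value-Determination Operation: solve Eq. (4.11),
    nu_i*g + v_i = b_i + sum_j p_ij*v_j, with v_N = 0.
    Unknowns are v_1 .. v_{N-1} and the gain g."""
    N = len(policy)
    A, rhs = np.zeros((N, N)), np.zeros(N)
    for i, k in enumerate(policy):
        A[i, :N-1] = -p[i, k, :N-1]   # -sum_j p_ij v_j  (v_N = 0 drops out)
        if i < N - 1:
            A[i, i] += 1.0            # + v_i
        A[i, N-1] = nu[i, k]          # + nu_i * g  (g is the last unknown)
        rhs[i] = b[i, k]
    x = np.linalg.solve(A, rhs)
    return x[:N-1], x[N-1]            # relative values, gain

def policy_improvement(v, b, nu, p):
    """Policy-Improvement Operation: maximize the test quantity (4.14)
    in every state; ties are broken by taking the first maximizer."""
    v_full = np.append(v, 0.0)
    new_policy = []
    for i in range(p.shape[0]):
        tests = [(b[i, k] + p[i, k] @ v_full - v_full[i]) / nu[i, k]
                 for k in range(p.shape[1])]
        new_policy.append(int(np.argmax(tests)))
    return new_policy

def howard(b, nu, p):
    """Cycle of Fig. 4.2, stopping when the policy repeats.  In practice
    one would also stop whenever the computed gain fails to increase
    (the safeguard discussed in Section 4.5.1)."""
    policy = [0] * p.shape[0]
    while True:
        v, g = value_determination(policy, b, nu, p)
        new_policy = policy_improvement(v, b, nu, p)
        if new_policy == policy:
            return policy, g, v
        policy = new_policy
```

The linear solve mirrors the convention $v_N = 0$ adopted after Theorem 4.4, so the last unknown in the system is the gain itself.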

Should there be several values of k satisfying (4.15), any one can be used. If this occurs at the termination of the algorithm, then the optimal policy is not unique. All optimal P' policies are discovered by listing for each $i$ all the k values maximizing (4.14) at the conclusion of the calculation.

Howard has given in [74] the theorem which guarantees that the algorithm will always converge on a policy having maximum gain g. Convergence is assured regardless of whether one starts in the Value-Determination or Policy-Improvement Operation, and regardless of the initial values used or the initial policy chosen. In this proof Howard assumes that the relative values $v_i$ and the gain g are determined with perfect accuracy in the Value-Determination Operation.

Semi-Markov decision models of physical systems can be expected to have a large number of states. This will require use of a digital computer to carry out the computation of the Howard algorithm. Appendix D describes briefly the program which has been developed for this research. Because of the finite nature of computer arithmetic, it is inevitable that errors will arise in the course of computation. In order to have confidence in the Howard algorithm, some thought must be given to the effect of such errors.

4.5 COMPUTATIONAL CONSIDERATIONS

The effect of computational errors on the Howard algorithm will be considered first. Satisfactorily disposing of this question, we then

turn attention to the actions which are found by the algorithm for the transient states. Theorem 4.3 assures one that such actions do not contribute to determining the gain g of the decision process. The Howard algorithm however determines the actions for transient states so as to maximize their relative values. Since the relative values do not enter in the optimization criterion, these actions are irrelevant for an optimal policy.

4.5.1 Errors

The Value-Determination Operation, as implemented in the program described in Appendix D, produces a set of numbers, say $x_1,x_2,\ldots,x_N$, where $x_i \approx v_i + g$ ($i = 1,2,\ldots,N-1$) and $x_N \approx g$. The computation is only approximately correct because of computational errors which inevitably arise. We consider that

$$x_i + \epsilon_i = v_i + g \quad (i = 1,2,\ldots,N-1), \qquad x_N + \epsilon_N = g \tag{4.16}$$

so that the values $\epsilon_i$ ($i = 1,2,\ldots,N$) are the errors of the computation. The values $x_i$ are rational numbers with 8 significant digits (for the program of Appendix D). The $\epsilon_i$ are unrestricted in value, but are presumably not too large, say $|\epsilon_i| \le 10^{-4}\,|v_i + g|$. In this case the error in $x_i$ would occur beyond the fourth significant digit. Such an assumption will not play a major role in the following, however.

The principal concern is whether the Howard algorithm will always converge to an optimal policy even if errors are present. The answer, strictly speaking, is no. Two difficulties appear to arise. First, the algorithm may converge to a non-optimal or sub-optimal policy. We can show in this case however that the gain of the policy will differ from the optimum within the same order of magnitude as the computational errors. Second, the algorithm may fail to converge at all. This would be evidenced by a continuing iteration among policies whose gains are very close in value. In the following paragraphs we explain why these problems can come about, and show how to cope with them in practice.

The argument proceeds via the same consideration used by Howard to prove the theorem on convergence without errors. Suppose that the Value-Determination Operation (VDO) has been carried out for some policy, called policy A. Consider what the Policy-Improvement Operation (PIO) indicates for some other policy, say policy B, based on the results of the VDO for A. Excluding errors, the difference in the test quantities of state $i$ for the two policies is¹

$$\gamma_i = \delta_i^B - \delta_i^A = \frac{1}{\nu_i^B}\left[ b_i^B + \sum_{j=1}^{N} p_{ij}^B\, v_j - v_i \right] - \frac{1}{\nu_i^A}\left[ b_i^A + \sum_{j=1}^{N} p_{ij}^A\, v_j - v_i \right] \tag{4.17}$$

¹Superscripts A and B refer to the values for policies A and B.

It can be seen, as Howard has indicated, that the difference in the gains of the two policies is

$$g^B - g^A = \frac{\sum_{i\in R_B} \rho_i^B\, \nu_i^B\, \gamma_i}{\sum_{i\in R_B} \rho_i^B\, \nu_i^B} \tag{4.18}$$

where $R_B$ is the recurrent class of states for policy B and $\rho_i^B$ is the equilibrium probability of state $i$ in the underlying Markov chain of B. Thus $g^B > g^A$ if $\gamma_i \ge 0$ for all $i$, with $\gamma_i > 0$ for at least one state $i \in R_B$. When there are negligible errors in the VDO, the PIO will declare policy B to be an "improved" policy if these conditions on $\gamma_i$ hold. Thus the PIO can discover any policy B for which $g^B > g^A$, assuming negligible errors.

Now consider the effect of errors. In our implementation of the Howard algorithm, the test quantities actually computed are

$$\hat\delta_i^A = \frac{1}{\nu_i^A}\left[ b_i^A - x_i + \sum_{j=1}^{N} p_{ij}^A\, x_j \right]$$

and

$$\hat\delta_i^B = \frac{1}{\nu_i^B}\left[ b_i^B - x_i + \sum_{j=1}^{N} p_{ij}^B\, x_j \right]$$

where $x_i$, $i = 1,2,\ldots,N$, are the values computed in the VDO for A. The values $b_i^A$, $b_i^B$, $\nu_i^A$, $\nu_i^B$, $p_{ij}^A$, $p_{ij}^B$ can be assumed to have negligible error since they are given data. Moreover any round-off error in computing $\hat\delta_i^A$, $\hat\delta_i^B$ using the $x_i$ values can be treated as negligible compared to errors in the VDO. Then using (4.16) in the two previous equations, one has

$$\hat\delta_i^A = \frac{1}{\nu_i^A}\left[ b_i^A - v_i + \epsilon_i + \sum_{j=1}^{N} p_{ij}^A\, v_j - \sum_{j=1}^{N} p_{ij}^A\, \epsilon_j \right]$$

and

$$\hat\delta_i^B = \frac{1}{\nu_i^B}\left[ b_i^B - v_i + \epsilon_i + \sum_{j=1}^{N} p_{ij}^B\, v_j - \sum_{j=1}^{N} p_{ij}^B\, \epsilon_j \right]$$

Now we have from (4.17) and (4.18) that

$$g^B - g^A = \frac{1}{\sum_{i\in R_B} \rho_i^B \nu_i^B} \sum_{i\in R_B} \rho_i^B \nu_i^B \left\{ \hat\delta_i^B - \hat\delta_i^A - \frac{1}{\nu_i^B}\Big[\epsilon_i - \sum_{j=1}^{N} p_{ij}^B\, \epsilon_j\Big] + \frac{1}{\nu_i^A}\Big[\epsilon_i - \sum_{j=1}^{N} p_{ij}^A\, \epsilon_j\Big] \right\} \tag{4.19}$$

Assuming that B is any stationary policy, the result we would like at this point is an upper bound on the difference $g^B - g^A$ in terms of $\hat\delta_i^B$, $\hat\delta_i^A$, and the $\epsilon_i$. Since $\hat\delta_i^B$ and $\hat\delta_i^A$ can be observed during the computation, such a result would provide a means of evaluating whether some policy B is better than A. To accomplish this, we first use a limit theorem for Semi-Markov processes ([92] or [94]) relating $\mu_{ii}$ to the equilibrium probability $\rho_i^B$ of state $i$ in the underlying Markov chain. This is

$$\frac{1}{\mu_{ii}} = \frac{\rho_i^B}{\sum_{j\in R_B} \rho_j^B\, \nu_j^B}$$

and from this it follows that

the weights $w_i^B = \rho_i^B \nu_i^B \big/ \sum_{j\in R_B} \rho_j^B \nu_j^B$ appearing in (4.19) are positive and sum to unity over $R_B$. With this and the fact that $\nu_i^A, \nu_i^B \ge 1$ we can state that

$$g^B - g^A \le \sum_{i\in R_B} w_i^B\,(\hat\delta_i^B - \hat\delta_i^A) + \frac{4\,\max_i |\epsilon_i|}{\min_{i,k} \nu_i^k}$$

The set $R_B$ may not be easily determined by inspection of an arbitrary policy and we would like to circumvent its use. Let R(A,B) be the subset of system states for which the actions of policies A and B are different. Then $\hat\delta_i^B - \hat\delta_i^A = 0$ for all $i$ not contained in R(A,B), so

$$\sum_{i\in R_B} w_i^B\,(\hat\delta_i^B - \hat\delta_i^A) = \sum_{i\in R_B\cap R(A,B)} w_i^B\,(\hat\delta_i^B - \hat\delta_i^A) \le \theta \max_{i\in R(A,B)} (\hat\delta_i^B - \hat\delta_i^A)$$

where

$$\theta = \sum_{i\in R_B\cap R(A,B)} w_i^B$$

is positive and less than unity. So we finally have

$$g^B - g^A \le \theta \max_{i\in R(A,B)} (\hat\delta_i^B - \hat\delta_i^A) + \frac{4\,\max_i |\epsilon_i|}{\min_{i,k} \nu_i^k} \tag{4.20}$$

Now if the algorithm has converged to policy A, we have $\hat\delta_i^B \le \hat\delta_i^A$ for all $i$ and any policy B different from A. As seen from (4.19) however, the errors $\epsilon_i$ may conceivably be such that in fact $g^B > g^A$ for some policy B. Equation (4.20) ensures that $g^B$ cannot exceed $g^A$ by more than the value of the rightmost term. Moreover if $\max_{i\in R(A,B)} (\hat\delta_i^B - \hat\delta_i^A)$ is a

sufficiently large negative number for all policies other than A, we are safe in concluding that policy A is indeed optimal. In practice the values of $\hat\delta_i^B - \hat\delta_i^A$ can be used to resolve this question.

The possibility that errors will lead to an unending iteration can be avoided by terminating the computation whenever the computed gain $x_N$ does not increase on every iteration. This also prevents the additional iterations brought about by transient states (see the next section). A drawback of this modified procedure occurs in the possible event that the algorithm passes through two policies of equal gain before reaching the optimal policy. Termination of the computation would then be premature. This event can be ascertained in practice by inspecting the new policy obtained on the last iteration. Should the new policy have the same actions in the recurrent states as the old policy, the new one is optimal. If not, then a premature termination has occurred and the computation should be begun again with some other initial policy. If a policy with larger gain exists, it will be discovered after one or more such attempts.

In conclusion, it is important to keep computational errors reasonably small in order for the Howard algorithm to be successful in practice. Any detrimental effect of small errors is easily surmounted by observing the sequence of policies and test quantities obtained in the computation.

4.5.2 The Actions for the Transient States

Suppose that there is no difference in the test quantities for policies A and B for all states in the recurrent class $R_B$. From (4.18), the gains of the two policies must therefore be equal. There may however be a positive difference in the test quantities for some state $i$ not in the recurrent class, i.e., a transient state. The Howard algorithm would still suggest policy B as better than policy A. But for what reason? The answer given by Howard in [74] is that the relative value $v_i^B$ is greater than $v_i^A$. Thus when further increase in the gain is impossible, the Howard algorithm determines the actions for the transient states so as to maximize the relative values.

To see this it is sufficient to consider the recurrent class of states as consisting of a single absorbing state, say state N. Then $p_{Nj} = 0$ for $j \ne N$ and $g = b_N/\nu_N$, assuming as usual that $v_N = 0$. For $i \ne N$, consider (4.11) in the form

$$v_i = \left(\frac{b_i}{\nu_i} - g\right)\nu_i + \sum_{j=1}^{N-1} p_{ij}\, v_j \qquad (i = 1,2,\ldots,N-1)$$

or

$$v = e + P^* v$$

where $v$ is a column vector of the $v_i$, $e$ is a column vector of terms $(b_i/\nu_i - g)\nu_i$, and $P^*$ is an $(N-1)\times(N-1)$ matrix obtained from the transition probability matrix P by deleting the Nth row and column.

From this equation

    v = (I - P^*)^{-1} e

The inverse (I - P*)^{-1} is guaranteed to exist and

    (I - P^*)^{-1} = I + P^* + (P^*)^2 + \cdots    (4.21)

see [96, pp. 22 and 46]. In fact (4.21) shows that the terms in the inverse matrix (I - P*)^{-1} are the expected number of times the system enters state j (a transient state) in an infinite number of transitions, having started in state i. Thus denoting the latter quantity by m_ij,

    v_i = \sum_{j=1}^{N-1} m_{ij} (b_j/\nu_j - g)\,\nu_j    (4.22)

Now (4.22) is equivalent to the set of Equations (4.11). So upon realizing that (4.18), arising from the equation

    (v_i^B - v_i^A) + (g^B - g^A)\,\nu_i^B = (\delta_i^B - \delta_i^A) + \sum_{j=1}^{N-1} p_{ij}^B (v_j^B - v_j^A)

is a solution to (4.11), we have from (4.22)

    v_i^B - v_i^A = \sum_{j=1}^{N-1} m_{ij}^B \left[ (\delta_j^B - \delta_j^A) - (g^B - g^A)\,\nu_j^B \right]

But if g^B - g^A = 0 and δ_i^B - δ_i^A ≥ 0 with δ_i^B - δ_i^A > 0 for at least one i, this reveals that v_i^B - v_i^A > 0, as Howard has pointed out.
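The visit-count interpretation of (4.21) and the computation (4.22) are easily checked numerically. The sketch below uses an assumed three-state toy chain (the numbers are ours, purely illustrative), with the third state as the single absorbing state:

    import numpy as np

    # Illustrative chain: the third state is the single absorbing state.
    P = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.4, 0.5],
                  [0.0, 0.0, 1.0]])
    b  = np.array([-1.0, -2.0, -0.5])   # one-transition returns b_i
    nu = np.array([ 1.0,  2.0,  1.0])   # expected transition times nu_i

    Pstar = P[:2, :2]                        # delete the absorbing row/column
    M = np.linalg.inv(np.eye(2) - Pstar)     # m_ij = (I + P* + P*^2 + ...)_ij
    g = b[2] / nu[2]                         # gain of the absorbing state
    v = M @ ((b[:2] / nu[:2] - g) * nu[:2])  # relative values, Eq. (4.22)
    print(M)    # expected number of visits to transient state j from i
    print(v)    # v_1, v_2 (v_N = 0 by convention)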

With the confidence and understanding derived from these theoretical results, we can now proceed to use the Semi-Markov decision process formulation and the solutions obtained via the Howard algorithm for the study of control policies for queues.

CHAPTER V

A DECISION MODEL FOR A MULTIPLE-ACCESS COMPUTER

The introductory discussion of multiple-access computer systems given in Chapter I has shown that proper control of queues is important in obtaining the most effective performance from such on-line systems. At the same time, the classical results of queueing theory provide very meager aid to understanding the control of queues in existing systems, since their complexity forbids a closed form analysis in the traditional approach. Those analyses which have been produced have depended on very special assumptions which limit their applicability. Considerable progress could be made through the numerical solution of complex queueing models, or through simulations, but no really appropriate studies of this kind have been publicized. The large number of alternative models and queue control procedures for computer time-sharing, which limit the generality of any single analysis, may be part of the explanation for this.

It is therefore a timely and worthwhile effort to apply the discrete Semi-Markov decision theory which has been developed to the queueing control problem of multiple-access computers. This chapter and the following one will serve to demonstrate that a rather general model can be developed for existing systems, and that a number of useful conclusions are obtained.

The present chapter will concentrate on the formulation and justification of the model. The restrictions imposed on the physical systems represented by the model are explained first. The next chapter

will present the conclusions obtained from solutions of the decision model via the Howard algorithm.

5.1 THE PHYSICAL SYSTEM

The process of developing a mathematical model which accurately represents a physical system is generally quite difficult. Ideally, the development is well supported by experimental data on subsystem behavior and interactions. Usually, however, the need for analysis and quantitative system studies antedates the final design specification and prototype construction, whereupon such information could be obtained. The system engineer must conduct preliminary studies relying on various hypotheses to construct appropriate models. Bearing this in mind, we can proceed to describe the important physical characteristics of a hypothetical time-shared computer system whose behavior is quite reasonably represented by a discrete Semi-Markov decision process.

The model for this system captures to a significant extent the important and fundamental properties of time-shared computer operation today. No simulation or analysis to our knowledge has dealt with a more realistic or more general model of these systems. The decision model, as we shall see, includes the important queue control policies currently in use in actual systems as well as a great number of more complex procedures. The simplifying assumptions we will make are motivated in most instances by a desire to promote a clear understanding of the relation between the decision model and the physical system, not by a limitation of the technique.

5.1.1 Hardware Structure

The computer system consists of a single arithmetic processor, a communications interface with the telephone lines to remote users, a magnetic drum serving as the secondary memory, and a magnetic core main memory unit.

The drum memory is a large storage capacity unit, and we assume its revolution time to be 50 ms. This value does not enter into modeling the system, but does enter in the physical interpretation of results for the model. Because of its fixed revolution time, the drum provides a convenient source of clocking information for the system. We will assume that the system derives interval timing information from a drum interrupt signal occurring on each revolution of the drum. Such signals could be obtained by detecting a field of bits recorded on a special clock track. This assumption will lead to the discrete-time character of the decision model for the system.

The data organization of the drum memory is an important consideration in arriving at an appropriate model. This is so because some interval of non-productive time is required to find any program on the drum, bring it into the main memory, and prepare it to begin execution. The data organization influences the duration and variability of this interval.

The drum memory is organized into a number of recording fields. Each field consists of a band of adjacent tracks running around the circumference of the drum. All fields have their origin at the same circumferential point. Each field consists of the same number of fixed

length records. A data or program file on drum memory is comprised of some number of records. It will be assumed that the successive records of any file are stored sequentially in one field, and if the file requires more than one field, the remaining records are stored sequentially in one or more adjacent fields. The process of retrieving a file will consist of reading the records of the first field occupied by the file, until the initial record of the file is located, then transmitting the file record-by-record and serially-by-field to the main memory. Files which occupy one field or less can therefore be retrieved in one drum revolution.

The origin of the drum fields is displaced from the clock field by one-half revolution. This gives the processor sufficient time to initiate a drum read/write operation after receiving a clock interrupt, if desired. See Fig. 5.1.

The main memory is taken to be of moderate size, allowing say 8000 words for use by user programs, the remainder of the memory devoted to Executive Control Program usage. This rather limited user memory space forbids multi-programming, in which more than one user program could occupy main memory at one time. Consequently, only one user program occupies main memory at any time and it can be permitted to access any part of the user memory area. The entire area (8000 words) must therefore be unloaded when the user is removed from main memory. An important class of data files on the drum consists of these "core images" for user programs which have been unloaded from main memory.

Fig. 5.1. Organization of the system to be studied. (Remote consoles connect through the communication interface to the central processor and main memory; the magnetic drum memory, with its recording fields and special clock track, supplies the clock interrupt signal and is reached through the read/write control.)

Unloading of a user could be done in one drum revolution, assuming that the drum is capable of storing a user core image in one recording field around the drum periphery. This is certainly possible with present day drum memories. If the user memory area is larger than the drum field capacity, several revolutions may be required to unload a user program. We shall assume at all times that the user memory area is an integer multiple of the drum field capacity.

The portion of main memory devoted to the Executive Control Program is to be large enough that the ECP and its associated tables can permanently reside there. No control function should require retrieving a data file from the secondary memory before it can be carried out. The ECP area shall include the subroutines needed for scheduling and control of queues, for communicating with the drum memory, and for properly loading the programs needed to carry out a user command. The command programs themselves are all to be retained in the drum memory.

As shown in Fig. 5.1, we shall only treat a system with four remote consoles actively in use at one time. This is done to simplify the task of specifying the data for the decision model.

5.1.2 Software Structure

A good understanding of the behavior of a multiple-access computer system requires a familiarity with the structure of programs or "software," in addition to that of hardware. Two principal classes of programs are involved, the Executive Control Program and the user programs. In regard

to user programs, interest centers on the programming conventions to which each user must conform in order that his programs will operate on the system.

One programming convention which we shall introduce in the system being modeled is that no user program can directly issue commands to the ECP which call for the execution of another program. This means that any programs or subroutines required must be loaded as part of a user program, or else brought into action by command from the user console. This is not an especially desirable convention, for it limits the flexibility of usage of the system. On the other hand, it makes the ECP somewhat simpler in design.

Another convention imposed allows user communication with the secondary memory only through a command program initiated from the user console. Moreover, the largest admissible data transfer is equal to a drum field, which can be accomplished in one drum revolution.

The above conventions ensure that all commands entering the system come from a user console. The latter convention also ensures that the minimum time interval in which a program would be allowed to run is sufficient to accomplish any data transfer, without interruption and loss of data.

An additional convention we will require is that each console may have only one program or command in process at any time. This is generally the case in existing systems.

The Executive Control Program must perform essentially the same functions discussed in Chapter I. Because the drum memory is used via a command program, the diagram of Fig. 1.2 can be simplified, eliminating the I/O Queue from the ECP. (The I/O Queue becomes a part of the Job List, which is the central queue.) Because the drum interrupt only counts off the passage of a 50 ms interval, the ECP must include a timekeeping function not emphasized in Fig. 1.2. In order to measure any prescribed interval, we presume the ECP uses a one-word register, called the clock register, which the ECP decrements by one upon receipt of each drum interrupt. If the clock register is initially set to any number, it subsequently will reach zero value when that number times 50 ms has elapsed. Figure 5.2 indicates this function, and gives more detail than Fig. 1.2 on ECP functions for the system we shall model; a brief sketch of the countdown convention follows the figure.

A user program or command program is removed from further ECP consideration for several reasons: an error, such as arithmetic overflow; successful termination of execution; or, an excessive amount of output to the console. The latter points out another assumed convention. Presumably the output buffer assigned to the user is adequate for normal communication of several typewritten lines in a short interval of time. If the user program generates output equal to the buffer capacity in less time than the communication processor can transmit it to the console, the ECP will remove the program. The user must reactivate the program by command from the console to obtain further output. In a

Fig. 5.2. Executive control functions in the system being modeled. (The flowchart traces each type of interrupt or exit: input character, drum interrupt, data request, error, successful completion, and full output buffer, through character interpretation, queue modification, the clock register decrement, and the scheduling and loading of the next user program, with typewriter input-output handled through the user buffers.)
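The countdown convention just described can be stated in a few lines. The sketch below is ours and purely illustrative of the bookkeeping; the names and structure are not from any actual ECP:

    class ClockRegister:
        # One-word interval timer decremented on each 50 ms drum interrupt.
        def __init__(self):
            self.count = 0
            self.on_expire = None

        def set_interval(self, steps, on_expire):
            self.count = steps           # measured interval = steps x 50 ms
            self.on_expire = on_expire

        def drum_interrupt(self):        # entered once per drum revolution
            if self.count > 0:
                self.count -= 1
                if self.count == 0 and self.on_expire is not None:
                    self.on_expire()     # the allocated interval has elapsed

    # Example: time out a three-step (150 ms) execution interval.
    clock = ClockRegister()
    clock.set_interval(3, lambda: print("end of execution interval"))
    for _ in range(3):
        clock.drum_interrupt()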

physical system, this convention would serve a useful purpose in discouraging users from excessive typing, and as a safety measure against program errors which result in excessive output. For purposes of modeling, it simply eliminates the possibility that jobs will exist in queue which require further execution but cannot be given it, even if desired.

The decision model we shall formulate shortly serves to investigate the queue control actions involved in scheduling the execution of jobs. This ECP function is emphasized in Fig. 5.2. Each scheduling decision or queue control action requires the selection of a single job for execution, and the allocation of a maximum interval of time in which the job may execute. We now describe the general framework in which this selection and allocation is to be made in the physical system.

5.1.3 General Control Process

The physical constraints resulting from the assumed structure of hardware and software, and additional practical constraints which will be discussed now, establish the general form of the queue control process for the system. Because there is a single arithmetic processor, the computer system appears as a single server service system. There is a finite source of arrivals, because only four consoles are allowed to be in use at one time and each may have but one job in process. Since only one job may be held in memory, changing from execution of one user program to another involves a swapping time s for saving one core image and loading the other.

An additional time may be needed if the second program is a newly received command. This will be called the "set-up time" S. It is the computing time required to establish proper relocation and linking of the subroutines comprising the program.

At any time that jobs are present for service,1 the ECP has a very basic choice of either executing one of the jobs, or simply remaining idle, waiting for a new command to arrive. Conceivably the latter course could be useful if it was imperative to give immediate attention to newly received commands, and there was no way to interrupt a job in execution. The time-keeping function of the ECP, which controls the maximum time allocated to a job on any execution pass, always provides a way to break into any job. We assume that proper allocation of this time interval will in practice be sufficient to eliminate any need for the action of leaving the machine idle.

Every queue control action therefore consists of selecting a job to be executed and allocating a maximum interval of time in which the job may run before possibly being swapped for another. The choice involved in each action should conceivably be based on the totality of information obtainable about the jobs present. Such information could consist of the number of jobs, the list of subroutines comprising each program, the total program size of each in words, the execution time each has obtained, the console originating each command, a user estimate

1That is, waiting in queue or being executed.

on how much execution time is needed, and many other data.

The applicable and useful data is sharply diminished by practical considerations and the physical constraints already imposed. As a general principle, the execution time required to complete a command is not accurately known beforehand. So it appears reasonable for simplicity to consider the execution time of a job as a random quantity, with a probability distribution obtained by observation of the entire population of user jobs submitted. Because the system of interest here has a somewhat limited user area in main memory, there will not be a wide variation in total user program size. Thus any correlation between program size and execution time will not be as distinctive and useful as in a larger memory system. We shall assume then that the simplest and most relevant item of information on a single job which bears upon the queue control actions is the execution time which the job has received at any point in time. Neither the name nor class of a program nor the originating user enters the scheduling process. This situation is sometimes referred to as the "context-free" case.

The most general scheme possible for scheduling is therefore as follows. Each job in queue has associated with it the value of execution time which it has received. Whenever a job is placed in execution it is allocated a maximum interval for execution. There is no restriction on the interval which the ECP may allocate, other than it being an integral number of 50 ms steps. At the expiration of that interval, the ECP examines the queue again. The ECP then selects another job to be executed

or elects to continue executing the same job. Upon swapping, if any, the ECP determines another value for the execution interval and turns control of the computer over to the user job for that interval.

We shall simplify and modify this framework in several ways. These restrictions are consistent with design simplicity for the ECP and present practice, but also are convenient for modeling the system. We shall limit the permitted values for the allocated execution interval to three. The first value, denoted e2, is to be assigned to every new job receiving its first pass of execution. The second value, denoted as a difference (e3 - e2), is allocated to any job entering a second execution pass, having previously received an interval of duration e2. The third value, denoted by q, is assigned to every job which has previously received execution for a total time e3 or greater.

These time allocations result in establishing a multi-level queueing structure. Each level corresponds to a particular value of accomplished execution time. The successive values of accomplished execution are 0 (new arrivals), e2, e3, e3 + q, e3 + 2q, ..., etc. As a job receives time on the processor it passes from one level to the next lower one, or is completed and leaves the system, as shown in Fig. 5.3.

We shall further assume that when there are two or more jobs at the same level, for either levels 1 or 2, the earliest arrival is to be chosen for execution. For example, if there is a job at level 2 and another job is just completing its first pass of execution, the former

Fig. 5.3. Multi-level queueing resulting from restricted execution time allocation. (New arrivals enter level 1 for a first pass of at most e2; jobs requiring total time > e2 drop to level 2 for a second pass of at most e3 - e2; jobs requiring total time > e3 drop to level 3 for a pass of at most q; jobs requiring total time > e3 + q continue to level 4, and so on. Completed jobs depart from every level.)

would be preferred to the latter for next execution, since both jobs are then at the same level. This will involve swapping these jobs, which could otherwise be avoided by continuing execution of the latter. A loss of productive time thus occurs, but it will be a small loss when swapping is done rapidly, as it will be with a drum memory. Maintaining order by time of arrival among equivalent jobs seems to be a good service policy from an individual user's view, and is followed in most systems. Thus the small additional loss in time to achieve this seems worthwhile.

Finally, for additional simplicity we shall consolidate the levels corresponding to e3, e3 + q, e3 + 2q, ... into a single third level queue which is handled in a round-robin fashion. Whenever a job from this queue completes an execution interval of duration q, it is moved to the end of the queue. Jobs reaching the third level by execution from level 2 are placed at the end of the third level queue, which is consistent with a head-of-the-line selection on each level.

The three queue levels which now constitute the framework for the scheduling process will be denoted by Q1, Q2, and Q3. Figure 5.4 illustrates the general control process. Note that an execution pass on only one job can be in progress at any time, and therefore each scheduling decision corresponds to the choice of the queue level which will be served. The Semi-Markov decision process which we will now formulate will allow us to investigate how this choice should optimally depend on the number of jobs present in each queue.

Fig. 5.4. Three level queue with round-robin service on last level. (New arrivals enter Queue 1; head-of-line selection applies at each level. Queue 1 jobs receive a first pass of at most e2; jobs requiring total time > e2 enter Queue 2 for a second pass of at most e3 - e2; jobs requiring total time > e3 enter Queue 3 for passes of at most q, returning to the end of the line when not completed. Completed jobs depart from every level.)
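The allocation rule of Figs. 5.3 and 5.4 can be restated compactly in code. The sketch below is our paraphrase of the rule (fixed allocations e2, e3 - e2, and q, head-of-line selection within a level, and return to the end of Q3 after an incomplete pass); the numeric values are placeholders, not recommendations:

    from collections import deque

    e2, e3, q = 2, 6, 4    # allocations in 50 ms steps (placeholder values)
    Q1, Q2, Q3 = deque(), deque(), deque()

    def quantum(acquired):
        # Maximum execution interval for a job that has already acquired
        # `acquired` steps of execution time.
        if acquired == 0:          # Q1 job: first pass
            return e2
        if acquired == e2:         # Q2 job: second pass
            return e3 - e2
        return q                   # Q3 job: round-robin pass

    def dispatch(level):
        # Head-of-line selection within the chosen queue level.
        job = {1: Q1, 2: Q2, 3: Q3}[level].popleft()
        job["allocated"] = quantum(job["acquired"])
        return job

    def finish_pass(job, completed):
        # Completed jobs depart; others drop a level, with Q3 jobs
        # returning to the end of the Q3 line (round-robin service).
        if completed:
            return
        job["acquired"] += job["allocated"]
        (Q2 if job["acquired"] == e2 else Q3).append(job)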

We point out that the freedom to assign the choice arbitrarily allows us to include priority policies, as used in existing systems such as Project MAC [18].

5.2 FORMULATION OF THE MODEL

5.2.1 Simplifying Assumptions

A formulation of the queue control operation as a discrete Semi-Markov decision process is obtained on the basis of the following assumptions.

1. Each command arriving at the computer requires an execution time te which is an independent random variable with probability distribution function F(T),

    F(T) = \Pr(t_e \le T),    T \ge 0.

2. The distribution function F(T) has a negative exponential tail for T > e3. That is, for some constants α and μ,

    F(T) = 1 - \alpha e^{-\mu T},    T > e_3    (5.1)

with μ > 0 and 0 < α < 1.

3. The time interval between completion of any user's command and receipt of the next command from the user is called the arrival time. It is assumed to be an independent random variable with a negative exponential distribution. Denoting this interval by ta, we have

    \Pr(t_a \le T) = 1 - e^{-\lambda T},    T \ge 0    (5.2)

and Tu = 1/λ is the expectation of ta.

4. The swapping time s and the set-up time S are of constant value, and an integral number of 50 ms steps.

5. The event that an arriving command requires a set-up time is an independent random event with probability Θ, 0 ≤ Θ ≤ 1.

6. The ECP actions occurring in the interval between successive scheduling decisions are accomplished in negligibly short time.

Assumptions 1, 3, and 5 are necessary in order to achieve a formulation of the physical system within the theory we have developed. The independence of the random variables, and (5.2), are the key parts of these assumptions. We can offer no concrete justification for these. They are examples of the assumptions which the system engineer is forced to make, the validity of which can only be verified by examining statistics of actual system operation. The need for statistical observations of time-shared computer operation is coming to be realized by the management of existing systems, and some observations have been reported [19, 48, 97]. No reported data bears on assumptions 1, 3, and 5 however. As long as we are seeking to establish broad notions of system behavior, these appear to be reasonable approximations. The use of (5.2) can easily be relaxed by use of a different model, but this brings up a new theoretical area which is beyond our present effort. More will be said on this point in Chapter VI.

Quite reasonable justification can be given for assumption 2 on the basis of some recent observations [98, 99] on the IBM 7090 batch-processing system at The University of Michigan Computing Center. Figure 5.5 shows the probability distribution function on execution time for two classes of users of this system.

Fig. 5.5. Observed distribution of execution time at The University of Michigan Computing Center. (Probability distribution functions plotted against normalized time T/Te for Controlled and Uncontrolled successful programs, together with a fitted hyper-exponential distribution; the parameters of the fit are tabulated on the figure.)

The classes are designated "Controlled," which consists of all student programming for scheduled courses, and "Uncontrolled," which includes faculty, sponsored research, and thesis users. Only programs which executed without recognized errors or resort to memory dumps are included, hence the term "Successful" is applied to these programs. The distribution functions are plotted on a time scale T which has been normalized to the mean execution time, Te. For Controlled users the observed mean was 0.12 minutes, while for Uncontrolled users the observed value was 1.21 minutes. It is clear that for T/Te > 2.5, the tails of the two distributions are much alike.

Figure 5.5 also shows a rather good fit to the observed data by means of a distribution known as a three-phase hyper-exponential ([7], p. 39ff), described by

    F(T) = 1 - \alpha_1 e^{-\mu_1 T} - \alpha_2 e^{-\mu_2 T} - \alpha_3 e^{-\mu_3 T}    (5.3)

with α1 + α2 + α3 = 1, and 0 < αi < 1, i = 1, 2, 3. One can see that for μ1 > μ2 > μ3 and T sufficiently large, (5.3) gives

    F(T) \approx 1 - \alpha_3 e^{-\mu_3 T}

which is the behavior required by (5.1). Figure 5.6 shows the tail of the observed data for Uncontrolled users on a semi-logarithmic plot, verifying that the observed execution time distribution does indeed have an exponential tail.

Fig. 5.6. Observed probability distribution of execution time for uncontrolled users of The University of Michigan Computing Center. (Semi-logarithmic plot of the tail; a straight line asymptote demonstrates the exponential tail.)
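For numerical work it is convenient to have (5.3) in executable form. In the sketch below (ours), the mixing probabilities and rates are placeholders, not the fitted values of Fig. 5.5; sampling picks a phase with probability α_m and then draws an exponential variate of rate μ_m:

    import math, random

    alpha = [0.5, 0.3, 0.2]     # placeholder alpha_1..alpha_3, summing to 1
    mu    = [10.0, 1.0, 0.1]    # placeholder rates, mu_1 > mu_2 > mu_3

    def F(T):
        # Three-phase hyper-exponential distribution function, Eq. (5.3).
        return 1.0 - sum(a * math.exp(-m * T) for a, m in zip(alpha, mu))

    def sample_te():
        # Draw an execution time: choose a phase, then an exponential.
        r, acc = random.random(), 0.0
        for a, m in zip(alpha, mu):
            acc += a
            if r <= acc:
                return random.expovariate(m)
        return random.expovariate(mu[-1])

    # For large T, 1 - F(T) behaves as alpha_3 exp(-mu_3 T), which is
    # exactly the exponential tail required by assumption 2, Eq. (5.1).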

Of course, time-shared computer programming is different than batch processing, due largely to the convenience of direct communication. Two facts suggest that the result of Figs. 5.5 and 5.6 can be carried over. First, Fig. 5.5 indicates that two classes of batch processing users, engaged in fundamentally different activities, have similar execution time distributions despite a wide disparity in the mean values. Second, the time-sharing system we are modeling retains the fundamental structure of a main program and associated subroutines, loaded together at execution, which characterizes a batch processor. We shall therefore use execution time distributions of the class described by (5.3) in investigating the multiple-access computer. The mean execution time however will be taken to be on the order of a few seconds, which one expects for users taking advantage of the man-machine interaction possible with the system.

Assumptions 4 and 6 could be avoided by a more elaborate model than we will investigate. Instead of 4 we could assign probability distributions to the swap time s and the set-up time S. These would have to be justified on some grounds which account for the data organization of the secondary memory and the operation of the loader subroutine of the ECP. The size of the user memory area, the integral relation between this and the drum field capacity, and the fact that the ECP operation is in effect synchronized to the drum through the drum interrupt, seem sufficient to warrant 4 as a good approximation.

Assumption 6 is easily justified since from Fig. 5.2 the primary ECP actions which occur are timekeeping and console input interpretation. For the latter, the ECP simply examines each character received and makes an entry in a queue when a recognizable command is completely received. Considering the great speed of a modern processor, the comparatively slow speed at which the fastest typist can strike keys, and that a number of characters are usually required to make up a complete command, the time needed is negligible. For example, at a speed of 60 words per minute a typist takes one second to type 5 characters. Allowing as much as 100 memory cycles (2 μs each) to process each character, the ECP input interpretation time is 0.1 per cent of the time to type them. The timekeeping time in the same interval should be even less.

5.2.2 Definition of States and Actions

The state of the system for modeling as a decision process is described by four state variables. Three of these describe the jobs which may be waiting in queue for execution,

    n1 = the number of jobs in Q1
    n2 = the number of jobs in Q2
    n3 = the number of jobs in Q3

The programs corresponding to jobs in Q1, Q2, or Q3 are located on the drum memory. In addition, there may at any time be a job in the main memory which is being executed, or on the other hand, the memory may be empty due to the completion of a job. It is necessary to record this

information in the state in order to properly account for swapping time. The fourth state variable ℓ has values as follows:

    ℓ = 0 if the user memory area is unoccupied,
    ℓ = 1 if user memory contains a Q1 job,
    ℓ = 2 if user memory contains a Q2 job,
    ℓ = 3 if user memory contains a Q3 job.

When we use the expressions "a Q1 job," "a Q2 job," "a Q3 job," we mean a job which has obtained 0, e2, or e3 or more units of execution time, respectively. If ℓ = 1, 2, or 3, the job occupying the user memory area presumably does not appear in Q1, Q2, or Q3, and hence is not counted in n1, n2, or n3. (In the physical system, the list of job entries comprising Q1, Q2, and Q3 would include the job, but with a suitable mark indicating it was in execution.)

The execution of a job begins at the completion of the swapping and set-up time, if any, needed to bring the program into the user memory area. Execution continues until the drum interrupt which signals the end of the maximum time allocation, or until an interrupt which terminates the job (see Fig. 5.2). The next scheduling action occurs at the drum interrupt marking the end of execution, or the first one thereafter if execution is terminated during a clock cycle. The scheduling action commences with an examination of the queue and a decision as to the queue to be serviced next. Then any necessary swapping and set-up is carried out. The one-half drum revolution between the drum interrupt which initiates the scheduling action and the origin of the drum fields should

be sufficient to accomplish all table searching and computing needed in the scheduling decision. The important point is that the decision is based on the state of the system at the end of an execution interval.

Now the state variable ℓ can only have the value 1 at the beginning of an execution interval. This value is therefore excluded from the possible states observed at scheduling control points. Table 5.1 lists the states of the system for formulation of the decision process. The description of the state by the values of the state variables is given in addition to an arbitrarily assigned integer index i. The value of N, the number of states, is 75. The action indices k = 1, 2, 3 correspond to the decisions to execute a Q1 job, a Q2 job, or a Q3 job respectively, except for i = 1 where the system is empty, and the only possible action is to wait until a command arrives. The action k = 1 in state 1 corresponds to this course. For each action, Table 5.1 also lists the indices for the possible end states of a transition from each state.

A transition in state, as observed in scheduling, takes place over the interval between successive drum interrupts which start the scheduling control action. The system state throughout the interval is taken to be the state which the system had at the first interrupt. The state is considered as changing instantaneously at the second interrupt to a value determined by the random events which have transpired during the interval. Among these random events are arrivals of new commands from the user consoles. Each arrival increases the value of n1 by one. Another random event is the completion of the job being executed.

TABLE 5.1

DESCRIPTION OF POSSIBLE TRANSITIONS FOR EACH ALLOWED ACTION

                         Possible End State Indices for a Transition from State i
  i   n1 n2 n3  ℓ    Action k=1            Action k=2            Action k=3

  1    0  0  0  0    1,2,7,19,41
  2    1  0  0  0    1,2,5,7,10,19,22,44
  3    0  1  0  0                          1,2,6,7,11,19,23,45
  4    0  0  1  0                                                1,2,6,7,11,19,23,45
  5    0  0  0  2                          1,2,6,7,11,19,23,45
  6    0  0  0  3                                                1,2,6,7,11,19,23,45
  7    2  0  0  0    2,7,10,19,22,44
  8    1  1  0  0    3,8,14,20,26,48       2,7,11,19,23,45
  9    1  0  1  0    4,9,17,21,29,51                             2,7,11,19,23,45
 10    1  0  0  2    3,8,14,20,26,48       2,7,11,19,23,45
 11    1  0  0  3    4,9,17,21,29,51                             2,7,11,19,23,45
 12    0  2  0  0                          3,8,15,20,27,49
 13    0  1  1  0                          4,9,18,21,30,52       3,8,15,20,27,49
 14    0  1  0  2                          3,8,15,20,27,49
 15    0  1  0  3                          4,9,18,21,30,52       3,8,15,20,27,49
 16    0  0  2  0                                                4,9,18,21,30,52
 17    0  0  1  2                          4,9,18,21,30,52       3,8,15,20,27,49
 18    0  0  1  3                                                4,9,18,21,30,52
 19    3  0  0  0    7,19,22,44
 20    2  1  0  0    8,20,26,48            7,19,23,45
 21    2  0  1  0    9,21,29,51                                  7,19,23,45
 22    2  0  0  2    8,20,26,48            7,19,23,45
 23    2  0  0  3    9,21,29,51                                  7,19,23,45
 24    1  2  0  0    12,24,33,55           8,20,27,49
 25    1  1  1  0    13,25,36,58           9,21,30,52            8,20,27,49
 26    1  1  0  2    12,24,33,55           8,20,27,49
 27    1  1  0  3    13,25,36,58           9,21,30,52            8,20,27,49
 28    1  0  2  0    16,28,39,61                                 9,21,30,52
 29    1  0  1  2    13,25,36,58           9,21,30,52            8,20,27,49
 30    1  0  1  3    16,28,39,61                                 9,21,30,52
 31    0  3  0  0                          12,24,34,56
 32    0  2  1  0                          13,25,37,59           12,24,34,56
 33    0  2  0  2                          12,24,34,56
 34    0  2  0  3                          13,25,37,59           12,24,34,56
 35    0  1  2  0                          16,28,40,62           13,25,37,59
 36    0  1  1  2                          13,25,37,59           12,24,34,56
 37    0  1  1  3                          16,28,40,62           13,25,37,59
 38    0  0  3  0                                                16,28,40,62
 39    0  0  2  2                          16,28,40,62           13,25,37,59
 40    0  0  2  3                                                16,28,40,62
 41    4  0  0  0    19,44
 42    3  1  0  0    20,48                 19,45
 43    3  0  1  0    21,51                                       19,45
 44    3  0  0  2    20,48                 19,45
 45    3  0  0  3    21,51                                       19,45
 46    2  2  0  0    24,55                 20,49
 47    2  1  1  0    25,58                 21,52                 20,49
 48    2  1  0  2    24,55                 20,49
 49    2  1  0  3    25,58                 21,52                 20,49
 50    2  0  2  0    28,61                                       21,52
 51    2  0  1  2    25,58                 21,52                 20,49
 52    2  0  1  3    28,61                                       21,52
 53    1  3  0  0    31,65                 24,56
 54    1  2  1  0    32,68                 25,59                 24,56
 55    1  2  0  2    31,65                 24,56
 56    1  2  0  3    32,68                 25,59                 24,56
 57    1  1  2  0    35,71                 28,62                 25,59
 58    1  1  1  2    32,68                 25,59                 24,56
 59    1  1  1  3    35,71                 28,62                 25,59
 60    1  0  3  0    38,74                                       28,62
 61    1  0  2  2    35,71                 28,62                 25,59
 62    1  0  2  3    38,74                                       28,62
 63    0  4  0  0                          31,66
 64    0  3  1  0                          32,69                 31,66
 65    0  3  0  2                          31,66
 66    0  3  0  3                          32,69                 31,66
 67    0  2  2  0                          35,72                 32,69
 68    0  2  1  2                          32,69                 31,66
 69    0  2  1  3                          35,72                 32,69
 70    0  1  3  0                          38,75                 35,72
 71    0  1  2  2                          35,72                 32,69
 72    0  1  2  3                          38,75                 35,72
 73    0  0  4  0                                                38,75
 74    0  0  3  2                          38,75                 35,72
 75    0  0  3  3                                                38,75
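The state space of Table 5.1 can be generated mechanically, which provides a check on the count N = 75. A short enumeration sketch (ours) lists the vectors (n1, n2, n3, ℓ) with ℓ in {0, 2, 3} and at most four jobs in system, the in-memory job counting as one whenever ℓ ≠ 0, ordered as in the table:

    from itertools import product

    states = [(n1, n2, n3, l)
              for n1, n2, n3 in product(range(5), repeat=3)
              for l in (0, 2, 3)
              if n1 + n2 + n3 + (l != 0) <= 4]
    # Order as in Table 5.1: by total jobs in system, then n1, n2, n3, l.
    states.sort(key=lambda s: (sum(s[:3]) + (s[3] != 0),
                               -s[0], -s[1], -s[2], s[3]))
    print(len(states))    # 75, the value of N quoted in the text
    print(states[0])      # (0, 0, 0, 0), state i = 1
    print(states[74])     # (0, 0, 3, 3), state i = 75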

This event changes the value of ℓ to zero. For every state except state 1 there will be a job in execution during the transition interval. The third random event is the non-completion of the job being executed. This will also result in a change in value of the state variable ℓ.

In order to see how the state transitions come about, let us consider some of those given in Table 5.1. For example, consider state i = 2. The only action possible in this state is to execute the Q1 job present. Under this action the transition interval will be at least s units long, for this is the minimum time required to load the job program into the user memory area. The interval will usually be longer due to the possibility of set-up time, and the execution time of the job. Suppose the job does not terminate itself during the assigned maximum execution interval e2. Excluding set-up time, the transition interval will have duration e2 + s, and the value of the state variable ℓ will be 2 at the end of the interval. Because there may be arrivals from any of the three other users, the value of n1 may be 0 (no arrivals), 1, 2, or 3 at the end of the transition interval. Writing the state variable description as a vector (n1, n2, n3, ℓ), the possible end states are (0, 0, 0, 2), (1, 0, 0, 2), (2, 0, 0, 2), and (3, 0, 0, 2) if the Q1 job is not terminated during its execution. This accounts for the indices 5, 10, 22, and 44 listed in Table 5.1. If we suppose that the job does terminate during its first execution pass, ℓ will have zero

value at the end of the transition interval. Accounting for the possible number of arrivals, the possible end states are (0, 0, 0, 0), (1, 0, 0, 0), (2, 0, 0, 0), or (3, 0, 0, 0), whose indices are given in Table 5.1. The state (4, 0, 0, 0) is not possible, for this requires that a new command arrive for the user whose job just terminated. Although there may be a non-zero time interval between the termination time and the succeeding drum interrupt where the scheduling decision is made, it will be no greater than 50 ms and we assume that the arrival time does not commence until the end of this interval.

Now consider the state i = 1. Here the computer remains waiting until a command arrives. The scheduling action does not begin until the next drum interrupt after the arrival. Between the first arrival and the drum interrupt the other three consoles could each conceivably complete typing a command, and thus up to four arrivals could occur in the transition interval. The states (1, 0, 0, 0), (2, 0, 0, 0), (3, 0, 0, 0), and (4, 0, 0, 0) are possible end states. The state (0, 0, 0, 0) is also listed as an end state for itself in Table 5.1. This is the case if we assume the ECP examines the queue at every drum interrupt when the system is empty. We may however assign zero probability to this transition in our solutions, and simply consider the next decision to occur after there has been at least one arrival. This excludes many trivial decisions, those having only one possible result, from the decision process.

It should also be noted that we have in effect defined an imbedded decision process. For with a suitable definition of how the state variable ℓ changes during swapping and set-up, we can consider (n1, n2, n3, ℓ) as the state of the system at any time of system operation. Then an arrival produces an instantaneous change of state at the time of arrival. For the imbedded decision process this change is not noted until the drum interrupt at which the scheduling action occurs. Figure 5.7 shows a sample sequence of transitions for the system and the imbedded decision process, clarifying this relationship. In this figure, swapping takes two time steps.

5.2.3 Derivation of Formulae

In order to have complete freedom to explore whatever effects prove to be interesting in optimization, we must avoid a rigid assignment of values to the system parameters. Table 5.2 is a complete list of these parameters assuming the execution time to be modeled by the class satisfying (5.3). If we are to vary these parameters at will, a set of general formulae must be derived for the transition probabilities p_ij^k, the expected transition times ν_i^k, and the expected one-transition returns b_i^k. As seen from the Howard algorithm, only this data is required. This is fortunate, for one can sometimes determine ν_i^k and b_i^k by inspection, while it is tedious to ascertain f_ij^k(T) and r_ij^k(m|T) for every i, j, k, m, and T.

Fig. 5.7. A sample sequence of transitions for the system state and for the imbedded decision process, showing the scheduling decisions, the ends of swapping, and the ends of the allocated execution intervals. (Swapping takes two time steps in this illustration.)

TABLE 5.2

SYSTEM PARAMETERS

Symbol            Definition

λ                 Reciprocal of mean arrival time
e2                Accomplished execution time of Q2 jobs
e3                Minimum accomplished execution time of Q3 jobs
q                 Maximum execution interval for Q3 jobs
s                 Swap time
S                 Set-up time
Θ                 Probability that a Q1 job requires set-up
α1, α2, α3        Probabilities in hyper-exponential execution time distribution
μ1, μ2, μ3        Reciprocal mean values in hyper-exponential execution time distribution
Tu                Mean user reaction time, 1/λ
Te                Mean execution time, α1/μ1 + α2/μ2 + α3/μ3

There is little to be gained by proceeding here to derive each of the many formulae required. Appendix C lists the results of the derivations. We will illustrate here the derivation of p_ij^k and ν_i^k. The next section, where we introduce the optimization criteria of interest for multiple-access computers, will deal with the derivation of b_i^k.

Assume an arbitrary execution time distribution F(T) having a negative exponential tail as required in (5.1), and suppose the time scale T is normalized to the drum revolution time, 50 ms. Let

    f_n = F(n) - F(n-1),    n = 1, 2, 3, ...    (5.4)

This is the probability that a job terminates in the nth step of its execution. Note that F(0) = 0. Assume that the parameters in Table 5.2 are also normalized to the drum revolution time, where appropriate.

We have already indicated that we set p_11 = 0. Consider p_12. For every transition from state 1 there must occur at least one arrival. Any transition from state 1 ends with the unit step in which the first arrival occurs. Then p_12 is the probability of only one arrival in a unit step, conditional on there being at least one arrival. The unconditional probability of only one arrival from four users is 4e^{-3λ}(1 - e^{-λ}) and the probability of at least one arrival is 1 - e^{-4λ}. Therefore

    p_{12} = 4 e^{-3\lambda} (1 - e^{-\lambda}) / (1 - e^{-4\lambda})    (5.5)

Consider now the transitions from state 2. The probability that the transition interval has duration (n + s) and there is no set-up is (1 - Θ) f_n, for n ≤ e2. The probability that the duration is (n + s + S) and there is a set-up required is Θ f_n. The transition from state 2 to

state 1 requires no arrivals during the transition interval, and termination of the job. Thus

    p_{21} = (1 - \Theta) \sum_{n=1}^{e_2} e^{-3\lambda(n+s)} f_n + \Theta \sum_{n=1}^{e_2} e^{-3\lambda(n+s+S)} f_n

or

    p_{21} = e^{-3\lambda s} (1 - \Theta + \Theta e^{-3\lambda S}) \sum_{n=1}^{e_2} e^{-3\lambda n} f_n    (5.6)

The transition from state 2 to state 5 requires that no arrivals occur, and the job remains uncompleted at the end of the allocated execution time. The transition interval in that event has duration (e2 + s) with probability (1 - Θ)(1 - F(e2)). The duration is (e2 + s + S) with probability Θ(1 - F(e2)). Then

    p_{25} = (1 - \Theta) e^{-3\lambda(e_2+s)} (1 - F(e_2)) + \Theta e^{-3\lambda(e_2+s+S)} (1 - F(e_2))    (5.7)

Now consider transitions from state i = 75. Because the system is fully occupied with a job from each console, no arrivals will occur. The only action possible is to serve a Q3 job. Because we have assumed round-robin service, the job occupying the main memory must be swapped for the one at the head of Q3. The exponential tail on F(T) for T > e3 provides us with the memoryless property of the negative exponential distribution. So we need not know how long the next job has been in Q3. The probability

that the transition interval has duration (n + s) is f_{n+e3}/(1 - F(e3)) for n ≤ q. Note that this is the conditional distribution on the remaining execution time of the job. A transition from state 75 to state 38 occurs if the next job is terminated during the q units of execution which it will be allowed. Thus

    p_{75,38} = \sum_{n=1}^{q} f_{n+e_3} / (1 - F(e_3))    (5.8)

The formulae for the mean transition times are comparatively easy to derive. Consider state i = 1. Here we have a constant probability (1 - e^{-4λ}) that the transition interval ends with any unit step. The expected number of time steps in the transition is therefore

    \nu_1 = 1 / (1 - e^{-4\lambda})    (5.9)

Next consider state 2. The transition interval includes swapping and set-up time with an expected value (s + ΘS). The execution interval has duration e2 with probability (1 - F(e2)), the probability that the job does not end during the assigned period. If it does end at or before e2 the expected interval of execution is determined from f_n. Then

    \nu_2 = s + \Theta S + \sum_{n=1}^{e_2} n f_n + e_2 (1 - F(e_2))    (5.10)

For state 75 the action k = 3 involves no set-up time and therefore

    \nu_{75} = s + \sum_{n=1}^{q} n f_{n+e_3} / (1 - F(e_3)) + q (1 - F(e_3 + q)) / (1 - F(e_3))    (5.11)
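Formulae (5.4) through (5.11) translate directly into code. The sketch below (ours) uses placeholder parameter values and a plain exponential F purely for illustration; the remaining entries of Appendix C follow the same pattern:

    import math

    lam, e2, s, S, theta = 0.05, 4, 1, 2, 0.5   # placeholders, 50 ms units

    def F(T):
        # Any execution-time distribution function with F(0) = 0 will do;
        # a plain exponential is used here only for illustration.
        return 1.0 - math.exp(-0.2 * T) if T > 0 else 0.0

    f = {n: F(n) - F(n - 1) for n in range(1, e2 + 1)}              # (5.4)

    p12 = (4 * math.exp(-3 * lam) * (1 - math.exp(-lam))
           / (1 - math.exp(-4 * lam)))                              # (5.5)

    tail = sum(math.exp(-3 * lam * n) * f[n] for n in range(1, e2 + 1))
    p21 = (math.exp(-3 * lam * s)
           * (1 - theta + theta * math.exp(-3 * lam * S)) * tail)   # (5.6)

    p25 = ((1 - theta) * math.exp(-3 * lam * (e2 + s))
           + theta * math.exp(-3 * lam * (e2 + s + S))) * (1 - F(e2))  # (5.7)

    nu1 = 1 / (1 - math.exp(-4 * lam))                              # (5.9)
    nu2 = (s + theta * S + sum(n * f[n] for n in range(1, e2 + 1))
           + e2 * (1 - F(e2)))                                      # (5.10)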

With these illustrations as introduction, the reader can turn to Appendix C for the complete derivation of formulae.

5.3 OPTIMIZATION CRITERIA

The decision model which has been formulated now requires a quantitative statement of the appropriate optimization criteria for computer time-sharing. Although the model is well defined and rigorous within the limits of the assumptions used, the choice of an optimization criterion brings in more of what may be called intuition and personal judgment. In large measure this is because the performance of a multiple-access computer must be judged in terms of user reactions to the delays which they encounter in using the system. A large amount of experimentation and experience with time-shared machines will undoubtedly be needed before there is a precise and widely accepted standard of performance from a human standpoint. At this early stage of development we are content to explore two criteria which seem to cover the first-order user reaction to delay. These will illustrate the flexibility of the formulation and provide depth to the study of the multiple-access computer.

5.3.1 Minimization of Response Time

The response time to a user command is the time interval between the reception of the complete command by the computer and the completion of typed output from the program to the user console. The user will normally wait for all output from a previous command before proceeding to develop a new request. Hence the response time may limit the rate

at which the user inputs commands and receives results.

The response time must always be as long as the execution time required for a command. Any added time results from conflicting user demands and can perhaps be reduced by proper queue control. The more closely response time approaches the execution time, the less chance there is for aggravating and noticeable delays to impede the user. This suggests the definition of a return for any command which is equal to the negative of the response time for the command. Maximizing the return is then equivalent to minimizing the response time. But the computer system is intended to operate over a long period of time and so the system return should include the response time for all commands processed. A straightforward approach is to measure the system return by the negative sum of response times for all commands processed in an interval of time. Moreover, because we have a stochastic system, the optimization will be with respect to the expectation of the system return.

In the physical system we are modeling, response time is very close to the interval between the arrival and the termination of a job, i.e., the time-in-system of the job. Any difference would stem from the brief time to type out the characters generated just before the job termination, which we will neglect. These considerations therefore lead to the definition

    V_T(i) = -[expectation of the total time-in-system of commands arriving in [0,T], conditional on the initial state i]    (5.12)

We will consider the infinite time optimization problem with this return function, and we will refer to this as "minimization of response time." The manner in which the one-transition returns b_i^k are to be determined to accomplish this will be treated after we introduce another optimization criterion.

5.3.2 Minimization of Weighted Response Time

The user can never receive a response time less than the execution time he has requested by his command. Thus users who request extensive computation must be content with longer response times. It seems reasonable then that such users will not be troubled as much by a given amount of delay as the user who has requested a trivial computation. An example of a trivial computation is the acceptance of a typed input line from a user constructing a computer program on-line. Here the user could be ready immediately to type the next line, and any significant delay would be an aggravation.

The previous criterion gives equal weight to the time-in-system for all jobs, regardless of their required execution time. Thus it does not recognize that users with longer execution time can tolerate proportionately more delay than users with trivial computations. One way to include this effect in the optimization criterion is to weight the time-in-system of any job by an amount which decreases as the required execution time increases.

Figure 5.8 illustrates a family of weighting curves which can serve for an exploratory study. We will not attempt to justify any of these as the most desirable or reasonable of all possible curves. They have several attributes, however, which one would expect for an appropriate curve. The curves are flat for small execution time, recognizing that users with trivial computation have equal need for rapid response. The weight falls smoothly as execution time increases, reflecting a greater user tolerance for delay. For large values of execution time, the minimum possible response time will be large enough to exceed any user's tolerance for delay. Thus the weight curve should become flat again to reflect an equal intolerance for delay.

The weight functions plotted in Fig. 5.8 have a convenient functional form W(te) in terms of the execution time te:

    W(t_e) = \beta + (1 - \beta) \sum_{m=0}^{k-1} \frac{(k \rho t_e)^m}{m!} \, e^{-k \rho t_e}    (5.13)

The parameters β, ρ, k are given on the figure. With the weight value taken as unity for trivial computation times, the effect of weighting is to scale the response time of any job to an equivalent value for a trivial job. Table 5.3 gives the set of response time values equivalent to one second response for a trivial job according to curve 1 of Fig. 5.8.

It should be clear that minimization of response time is achieved if we merely set W(te) = 1 for all te. The assignment of the one-transition returns b_i^k is determined by the same general process, which we will now develop.

Fig. 5.8. Weighting functions for time-in-system. (Weight plotted against execution time in seconds. Parameters of the weighting functions: curve 1: β = .05, ρ = 1, k = 2; curve 2: β = .05, ρ = .8, k = 6; curve 3: β = .05, ρ = .6667, k = 10; curve 4: β = .05, ρ = .5, k = 20.)

TABLE 5.3

EQUAL WEIGHT RESPONSE TIMES FOR WEIGHT CURVE 1

Execution Time     Response Time

Trivial            1 sec.
0.5 secs.          1.34 secs.
1.0 secs.          2.30 secs.
2.0 secs.          7.30 secs.
3.0 secs.          15.0 secs.
20 secs.           20 secs.
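Equation (5.13) and Table 5.3 can be checked against one another: two response times are of equal weight when weight times response agree, so for curve 1 each tabulated pair should give W(te) x response close to one second. A short verification sketch (ours):

    import math

    def W(te, beta=0.05, rho=1.0, k=2):    # Eq. (5.13), curve 1 of Fig. 5.8
        x = k * rho * te
        return beta + (1 - beta) * math.exp(-x) * sum(
            x ** m / math.factorial(m) for m in range(k))

    for te, resp in [(0.5, 1.34), (1.0, 2.30), (2.0, 7.30),
                     (3.0, 15.0), (20.0, 20.0)]:
        print(te, resp, round(W(te) * resp, 2))   # products all close to 1.0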

5.3.3 Assignment of One-Transition Returns

We assume that we are given a function W(te) according to which the time-in-system of a job requiring execution time te is to be weighted. The system return function is defined to be

    V_T(i) = -[expectation of the sum of the weighted time-in-system of commands arriving in [0,T], conditional on initial state i]

The Howard algorithm requires only the values of the one-transition returns b_i^k. It is convenient to decompose b_i^k into a sum of partial returns occurring over a transition interval,

    b_i^k = \alpha_i^k + \epsilon_i^k + \delta_i^k - \gamma_i^k \nu_i^k    (5.14)

where

    α_i^k = the partial return attributable to new arrivals during the transition interval,
    ε_i^k = the partial return attributable to the execution time of the job executed during the transition interval,
    δ_i^k = the partial return attributable to the swapping and set-up time of the job executed during the transition interval,
    γ_i^k = the total weight per unit of time-in-system for jobs in system at the beginning of the transition interval, excluding the job placed into execution.

The evaluation of these terms will be easier than specification of the r_ij^k(m|T) values. The term γ_i^k ν_i^k in (5.14) gives the expectation of weighted time-in-system over one transition for the jobs which will certainly remain in system throughout the transition. The job which

is executed during the interval may terminate, and its associated return, consisting of ε_i^k + δ_i^k, is evaluated separately. The term α_i^k will have to account for the possible time and number of arrivals during the transition interval.

The execution time of a job is not observable until the instant of its termination. While the job is in the system prior to termination, each unit of time it spends must be weighted according to the expectation of W(te) with respect to a conditional probability distribution on te. The condition involved is a given value of accomplished execution time. If the job is in Q1 the appropriate weight per unit of time is

    w_1 = \int_0^\infty W(t_e) \, dF(t_e)    (5.15)

Similarly the weights per unit of time for jobs in Q2 and Q3 are respectively

    w_2 = \frac{\int_{e_2}^\infty W(t_e) \, dF(t_e)}{1 - F(e_2)}    (5.16)

and

    w_3 = \frac{\int_{e_3}^\infty W(t_e) \, dF(t_e)}{1 - F(e_3)}    (5.17)

provided we require that e3 be large enough that W(te) is constant for te > e3. This will be a restriction on the admissible values for e3.

The weighting values w1, w2, and w3 allow us to determine γ_i^k. For any state i in Table 5.1 described by the vector (n1, n2, n3, ℓ), except i = 1,

    \gamma_i^k = n_1 w_1 + n_2 w_2 + n_3 w_3 + w_\ell - w_k    (5.18)

for permitted values of k only, and with the understanding that w_0 = 0. For i = 1, γ_1 = 0.

The weight to be applied to the swapping and set-up time is also based upon the accomplished execution time of a job when it is again placed into execution. Because the set-up time may only occur when a job is first put into execution, and occurs independently of the required execution time,

    \delta_i^1 = -w_1 (s + \Theta S)    (5.19)

for all i for which k = 1 is a permitted action, except i = 1. We have δ_1^1 = 0. For k ≠ 1 and i having ℓ = k and n_k = 0, δ_i^k = 0; otherwise

    \delta_i^k = -w_k s,    k = 2, 3    (5.20)

using only the values of k permitted in a given state i = 2, ..., N.

The return associated with the execution time of a job may be accumulated in either of two ways during system operation. On the one hand, one can accumulate the return on each execution pass at the time the execution pass is completed. On the other hand, one can accumulate the return associated with all the required execution time at the time the job is first executed. This accumulates all the return at once, ahead of the time actually received. The two ways are equally acceptable as long as we only consider infinite-time optimization. The latter is more convenient, resulting in

    \epsilon_i^1 = -\int_0^\infty t_e W(t_e) \, dF(t_e)    (5.21)

for all i in which k = 1 is a permitted action, except i = 1. Again, ε_1^1 = 0, and moreover, ε_i^2 = ε_i^3 = 0 for all i in which k = 2 and k = 3 are permitted actions.

Evaluation of the term α_i^k requires some fairly extensive considerations. First let us define

    X_i = number of jobs present in the system at the beginning of a transition from state i    (5.22)

This does not depend upon the action k taken at the beginning of the transition for this model. The number of possible arrivals during a transition from state i is (4 - X_i), since each console may have only one job in system. Now α_i^k is composed of the sum of separate returns for each of the possible arrivals. But because each user generates a job independently of other users or the state of the system, the return for each of the possible arrivals is the same. Therefore we can express α_i^k as

    \alpha_i^k = -w_1 (4 - X_i) \tau_i^k    (5.23)

defining τ_i^k as

    τ_i^k = expectation of the time interval between occurrence of an arrival from one user and the end of the transition interval; the time interval is to be taken as zero if the arrival does not occur during the transition interval.
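The bookkeeping of (5.15) through (5.18) is mechanical once W and F are fixed. The sketch below (ours) evaluates the integrals by crude quadrature with placeholder choices of F, e2, and e3; the flatness restriction on W beyond e3 is not enforced here:

    import math

    e2, e3 = 0.5, 3.0      # level boundaries in seconds (placeholders)

    def W(te, beta=0.05, rho=1.0, k=2):          # Eq. (5.13), curve 1
        x = k * rho * te
        return beta + (1 - beta) * math.exp(-x) * sum(
            x ** m / math.factorial(m) for m in range(k))

    def F(t):                                    # illustrative exponential
        return 1.0 - math.exp(-0.5 * t)

    dt = 0.001
    def mean_W_above(a):
        # E[W(te) | te > a] by quadrature, cf. Eqs. (5.15)-(5.17).
        num = sum(W(a + i * dt) * (F(a + (i + 1) * dt) - F(a + i * dt))
                  for i in range(40000))
        return num / (1.0 - F(a))

    w1, w2, w3 = mean_W_above(0.0), mean_W_above(e2), mean_W_above(e3)

    def gamma(n1, n2, n3, l, k):
        # Eq. (5.18): weight per unit time of the jobs certain to remain
        # in system, excluding the class-k job placed into execution.
        w = {0: 0.0, 1: w1, 2: w2, 3: w3}
        return n1 * w1 + n2 * w2 + n3 * w3 + w[l] - w[k]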

We will only illustrate the derivation of τ_i^k, referring the reader to Appendix C for complete details. Consider the state i = 1. The transition interval ends at the first unit time step in which at least one arrival occurs. For τ_1 we need the expectation of the interval between the occurrence of an arrival and the end of a unit time step, conditional on there being at least one arrival from four users in the unit step. This is

    \tau_1 = \frac{1 - (1 - e^{-\lambda})/\lambda}{1 - e^{-4\lambda}}    (5.24)

Now consider any other state and let

    f_i^k(T) = \sum_{j=1}^{N} p_{ij}^k f_{ij}^k(T),    T = 1, 2, ...    (5.25)

be the unconditional probability distribution on the duration of the transition. Then

    \tau_i^k = \sum_{T=1}^{\infty} f_i^k(T) \int_0^T (T - x) \, \lambda e^{-\lambda x} \, dx

because the arrival is independent of the transition duration. It is easily shown from this that

    \tau_i^k = \nu_i^k - \frac{1}{\lambda} \sum_{T=1}^{\infty} (1 - e^{-\lambda T}) f_i^k(T)    (5.26)

The summation in (5.26) is simply the probability that the arrival occurs in the transition interval.

185 The summation in (5.26) is simply the probability that the arrival occurs in the transition intervals It follows directly that for state i = 2 T21 = V2L - - (1-9) e s - @.e S) e2 e2, ^s -Xn -%(s+S)V -%n -(1 - ) e e f - e s) e f n=l n=l -X(s+ep) -\(s+e2+S)L -(l-G)(l-F(e2))e -.0 (l-F(e2))e Similar expressions are described in Appendix C for other states and actions 5 4 METHOD OF SOLUTION Finally we establish in this section some relations between the model and the theory of Chapters III and IV which provides the method of solution 5o4.1 Application of the Principal Theorem The only condition which must be investigated to ensure that the principal theorem, Theorem 4.2, is applicable is absolute convergence of the series 00 7 u(mji,k) m=l for every i and:k. In the discussion following Eq. (5.14), it is clear that the one-transition return b, results from the time jobs spend in system over a transition 1

186

interval. According to our definition of return for either criterion, a negative return results whenever there is at least one job in the system. Otherwise the return is zero. Although an explicit expression has not been derived for u(m|i,k), it is clearly zero or of negative value for all m, i, k. Because b_i^k exists, the series must be absolutely convergent. For the infinite-time optimization problem it is therefore sufficient to treat only stationary deterministic (P') policies in seeking an optimal queue control policy.

5.4.2 Ergodic Policies

For the computer model we have developed there is but one recurrent aperiodic class of states under any P' policy. This is justified by study of Table 5.1, showing the possible transitions. It is possible with any action to pass in one transition from any state i, except i = 1, to another state j such that X_j < X_i (see Eq. (5.22)). Hence it is possible to eventually reach state 1 under any P' policy. Moreover state 1 communicates with state 2, and state 2 communicates with itself. Further, state 2 is aperiodic, for state 2 can pass into itself in one transition with possible durations s, s+1, ..., s+S+e_2, and these values have no common divisor greater than unity. Because state 2 is reachable from every other state, every other state is either contained in an aperiodic recurrent class including state 2, or is a transient state. By Theorem 4.3, Section 4.3, V_T(i) has a linear asymptotic behavior as T → ∞, and

187

the gain g is the expectation of return in any unit step after a sufficiently long time of operation has elapsed.

5.4.3 Interpretations of the Gain g

The last statement above means that the system has a "steady state" or "equilibrium" behavior, as the term is used in queueing theory. We wish now to further explore the implications of this fact, to assist in interpreting the results in Chapter VI. The considerations to follow chiefly concern the physical interpretation of the value of g, as various one-transition returns are imposed on the model, including others besides those we have already described for the optimization criteria.

Consider first the minimum response time criterion, Section 5.3.1. Let {Y(t), t ≥ 0} be the continuous-time random process where Y(t) is the number of jobs present for service at time t. The system receives a return −Y(t) Δt for the time increment (t, t + Δt), for Δt arbitrarily small. The gain g is the expectation of the integral of −Y(t) over the unit step T, for any value of T sufficiently large,

$$g = E\left\{-\int_T^{T+1} Y(t)\, dt\right\}, \qquad T \to \infty$$

A more convenient interpretation of g can be obtained however. Suppose we were to choose a single time point t sufficiently large to ensure steady state behavior. The chosen point falls somewhere within a unit step, and for that step the expected return is g. Define L as the expectation of Y(t),

188

$$L = E\bigl(Y(t)\bigr), \qquad t \to \infty$$

The ergodic behavior which leads to Theorem 4.3 ensures that

g = −L.    (5.27)

A rigorous, albeit lengthy, proof of this can be given, but we will consider this as intuitively clear.

Next suppose that one unit of return is accumulated upon termination of any job, i.e., when a job leaves the system. Then b_i^k is equal to the probability of termination of the job executed during a transition from state i with action k. The gain g in this case is the probability of a job termination in any unit time step in the steady state. We shall use the symbol D to denote this probability. For future reference, we should point out that optimization with this definition of return maximizes the expectation of the number of jobs completed in a unit step, i.e., the rate of job completions.

For either of the optimization criteria we have proposed, the return of the decision process is the sum of the returns for all jobs processed by the computer system. In the formulation of the decision process the return for each job is viewed as being accumulated continuously with the passing of time spent in the system. In regard to the gain g of the process however, we can as well consider that all the return for a job is accumulated instantaneously upon its termination. Let us take this view, and let B denote the expectation of the total return for a single job terminated in any unit time step in the steady state. The

189

gain g, since it is the expectation of the return in any unit step, is in this case

g = D B + (1 − D)(0) = D B    (5.28)

This is evident inasmuch as D is the probability of a job termination, and no return is received from a job until it terminates.

Equation (5.28) has important consequences. For the minimum response time criterion, it means that

L = D T_R    (5.29)

where

T_R = expectation of the time-in-system (response time) for a job.¹

A better physical picture of the performance of an optimal policy can be had from T_R than from L, since it is response time that directly affects user reactions. However an even more informative way of describing response time performance under a given policy can be found using (5.28). The notion that users requesting extensive computation can tolerate longer response time naturally suggests that one consider the expectation of response time conditional upon given execution time for a job. The question arises as to how this can be determined from the model. Suppose we could determine

W_i = expectation of time spent waiting in Q_i by a job which reaches Q_i level, including swapping and set-up time where applicable (i = 1, 2, 3).

¹This result is representative of the ergodic properties typically presented in classical queueing studies; see Saaty [8], p. 116, and Morse [7], p. 75.

190

Let T_R(t_e) denote the conditional expectation of response time. Then for 0 < t_e ≤ e_2:

T_R(t_e) = W_1 + t_e    (5.30)

for e_2 < t_e ≤ e_3:

T_R(t_e) = W_1 + W_2 + t_e    (5.31)

We are unable to determine the conditional expectation of waiting time in Q_3 for any t_e in the range t_e > e_3. However W_3 is the waiting time on the less restrictive condition, t_e > e_3. From this we can determine the expectation of response time, conditional only on t_e > e_3, and this is

T_R(t_e > e_3) = W_1 + W_2 + W_3 + E(t_e | t_e > e_3)    (5.32)

With the required exponential tail on F(τ), the execution time distribution, Eq. (5.1) gives

E(t_e | t_e > e_3) = e_3 + 1/μ_3

where μ_3 is the parameter describing the tail.

The values W_i (i = 1, 2, 3) are easily determined for a given policy by making three separate passes through the Value Determination Operation of the Howard algorithm. The required return on each pass associates unit value to every unit of time a job spends in Q_i, i = 1, 2, 3 respectively. Swapping and set-up time are also included when applicable. Let g_i be the gain which is found for the decision process on these three runs (i = 1, 2, 3). From (5.28) it follows that

g_1 = D W_1

191

The portion of all jobs which reach Q_2 level is

P_2 = 1 − F(e_2) = Pr(t_e > e_2)

Therefore

g_2 = D P_2 W_2

and similarly

g_3 = D P_3 W_3

with

P_3 = 1 − F(e_3) = Pr(t_e > e_3)

Thus upon computing g_1, g_2, and g_3, the values W_1, W_2, and W_3 are easily found. The conditional expectation of response time, as determined using (5.30), (5.31), and (5.32), will play an important part in interpreting the results in Chapter VI.

5.4.4 Restrictions on Solutions

The solutions we will obtain will involve only the hyper-exponential distribution (5.3) for the execution time. The basic model will accommodate a wider class, however, subject only to the limitation (5.1). For the hyper-exponential distribution it is necessary to choose e_3 reasonably large in order to ensure that (5.1) accurately represents the behavior for t_e > e_3. An adequately large value is given by

$$e_3 \ge \frac{1}{\mu_1 - \mu_3}\,\log_e \frac{1000\,\alpha_1}{\alpha_3} \qquad (5.33)$$

There is a basic conflict in having to choose e_3 large, because many

192

time-shared systems use rather small time allocations (≈ 0.5 sec) which we are unable to study due to this restriction. In order to permit smaller values of e_3, we have chosen to study the three particular distributions shown in Fig. 5.9. One of these, number 1, is the exponential distribution, for which any value of e_3 is permissible as long as it is an integral number of 50 ms steps. The others are two-phase hyperexponential, obtained from (5.3) by simply letting μ_3 = μ_2. Then (5.33) is replaced by

$$e_3 \ge \frac{1}{\mu_1 - \mu_2}\,\log_e \frac{1000\,\alpha_1}{\alpha_2} \qquad (5.34)$$

The parameters shown on Fig. 5.9 are for a mean value Te = 1 sec. We will have occasion to consider other values of Te, obtained by changing μ_1 and μ_2.

We have already noted that e_3 must be chosen large enough that the weight function W(t_e) is constant for t_e > e_3. With the weight curves of Fig. 5.8 this may be no less restrictive than the result of (5.34).

5.4.5 Admissible Policies

Because there are 42 states allowing two alternative actions, and 12 states allowing three alternative actions, the set P' consists of 2^42 · 3^12 different policies. This number is well over a billion. Among them there are six priority policies of special interest. One of these is the first-come, first-served policy, which arises by assigning first priority to Queue 3, priority 2 to Queue 2, and priority 3 to Queue 1. In terms of the tabular presentation described in Fig. 2.4,

193

[Figure: the three cumulative distributions on execution time plotted against normalized time τ/Te, 0 to 7.0. Distribution parameters:

  No.    α1      α2      μ1      μ2
   1     .5      .5      1.0     1.0
   2     .85     .15     7.2     0.17
   3     .714    .286    5.028   0.333  ]

Fig. 5.9. Three distributions on execution time.

194

this policy is realized by choosing k = 3 in every state having n_3 > 0 or ℓ = 3, choosing k = 2 in any other state for which n_2 > 0 or ℓ = 2, and k = 1 in the remaining states. The other priority policies are obtained by permuting the priority ordering among the three queues.
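The size of P' quoted above is easy to verify, and the six priority policies can be generated mechanically. The sketch below is illustrative only; the state counts (42 two-action states, 12 three-action states) are taken from the text.

```python
from itertools import permutations

# 42 states permit two alternative actions and 12 states permit three, so
# the set P' contains 2^42 * 3^12 stationary deterministic policies.
n_policies = 2**42 * 3**12
print(n_policies)          # about 2.3 x 10^18 -- well over a billion

# The six priority policies are the orderings of the three queues; the
# ranking (3, 2, 1) serves Queue 3 first, which is first-come, first-served.
for ranking in permutations((1, 2, 3)):
    print(ranking)
```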

CHAPTER VI

OPTIMUM CONTROL OF QUEUES IN THE MULTIPLE-ACCESS COMPUTER

The formulation of the decision model for the multiple-access computer has required a substantial effort. This stems from the generality and realism which have been incorporated in the model. In compensation, a variety of results can be obtained from optimization solutions computed using the Howard algorithm. A collection of such numerical results is given in this chapter. These solutions do not exhaust the potentiality of the model for the study of the queue control function in multiple-access computers. They are sufficient, however, to provide better understanding of the proper design of queue control policies. Moreover they illustrate the versatility of the Semi-Markov decision process for the general study of such problems.

We will first describe the various conclusions reached relative to the criteria introduced in Chapter V. Then we will comment on the application of the results to physical systems, and possible extensions of the model.

6.1 MINIMIZATION OF RESPONSE TIME

6.1.1 Negative Exponential Distribution

A natural starting point for the study of the computer system is to assume a negative exponential distribution on execution time, as shown in curve (1), Fig. 5.9. Under a first-come, first-served scheduling

195

196

policy,¹ the gain g of the system can be approximately determined from the solution of a classical continuous-time queueing model, namely the machine repair model [10]. This closed form analysis serves to check the accuracy of the formulae for the decision model. The comparison of the classical and the computed results also points out the degree to which the discrete-time model approximates the continuous-time model.

In order to carry out the comparison, an interpretation is needed for the gain g in terms of the performance measures obtainable from the classical analysis. As discussed in Chapter V, section 5.4, g = −L, with

L = expected number of jobs in the system at any time point.

The quantity L is easily obtained from formulae given in [10] for the continuous-time model. Table 6.1 lists a number of situations in which the numerical computation for the discrete-time model has been compared to the closed form continuous-time solution.²

The discrete-time model involves additional idle time for the server (i.e., the processor) above that of the continuous-time model. This is because a new service is not begun in the discrete-time system until the end of a unit step, even though a service completion may occur at any time during the unit step. Hence it is expected that L will usually have a larger value for the discrete-time system for a given set of system parameter values. The close cor-

¹See section 5.4.5.
²See Table 5.2, section 5.2.3, for parameter definitions.

197

TABLE 6.1

COMPARISON OF THEORETICAL GAIN OF CONTINUOUS-TIME MODEL AND COMPUTED GAIN OF DISCRETE-TIME MODEL: FIRST-COME, FIRST-SERVED POLICY; DISTRIBUTION 1ᵃ

  Parameter Values                              Values of −g = L
  Tu      Te      s       θ      S         Continuous-time   Discrete-time
 (sec)   (sec)   (sec)          (sec)
   5       1     0.15     0      –             1.16008          1.13093
   5       1     0.50     0      –             1.40244          1.41543
  30      10     0.15    0.6     0             1.66925          1.67088
  30       1     0.15     0      –             0.16255          0.16585
  30       4     0.15    0.7     1             0.76472          0.76763
  30       5     0.15    0.3    10             1.31688          1.31908
  20      0.5    0.05    1.0   100             3.80020          3.79997

ᵃSee Figure 5.9.

198

respondence of values gives confidence that the correct formulae have been implemented in the computer algorithm, and that the discrete-time model is a very good approximation to the continuous-time model for the first-come, first-served policy.

The next question is, what is the optimal policy for minimizing the response time? A fairly good heuristic argument can be advanced that the optimal policy is first-come, first-served for any set of system parameter values with a negative exponential distribution on execution time. The argument begins with the intuitive notion that in order to minimize the expected response time the system should maximize the average rate of job completions. This means that the control policy should always select for execution the job having the smallest expectation of remaining service time, i.e., the sum of remaining execution time, swapping, and set-up time. But a characteristic of the negative exponential distribution, as seen in section 1.3.1, is that the expectation of the remaining execution time is the same regardless of how much execution a job has already had. Then, because swapping and set-up time are added to determine the service time of a new arrival, the remaining service time of a job being executed always has smaller expectation than that of any arriving job. So once started, a job should be run to termination, and the first-come, first-served policy is optimal.

A substantial proof of this, stemming from a closed form analysis, can be given for the case of a single server priority queueing system

199

with an infinite source of Poisson arrivals; see [3,52]. But this proof cannot be extended to the case at hand. The decision model however provides the means to rigorously establish this for a given set of parameter values. By investigating a large number of cases, a firm basis can be established for claiming it as generally true. This has in fact been done for the cases listed in Table 6.1. In all of these the computed solution of the decision model revealed that the first-come, first-served policy was an optimal policy for scheduling. Hence we venture the

Conclusion: For any values of system parameters with a negative exponential distribution on execution time, a first-come, first-served policy is optimal for minimizing the expected response time to any command.

6.1.2 Non-Exponential Distributions

The optimal policy for an exponential distribution does not depend on the assigned values of e_2, e_3, and q. Such is definitely not the case for any other distribution. There is in fact a two-fold optimization problem involved in determining the optimum values of these parameters and the optimum policy associated with this assignment. We can only accomplish the first part of this dual problem by a trial and error search among the possible values for e_2, e_3, and q. Thus a number of optimization solutions will be needed.

a. To begin, let us ignore the influence of set-up time by setting θ = 0. The intuition used for an exponential distribution still seems valid, suggesting that one should always be executing the job having

200

smallest expectation of remaining service time. With distributions such as (2) and (3) of Fig. 5.9, the expected remaining execution time increases as a job is executed. Figure 6.1 illustrates this increase. Hence it appears that the optimal policy will give preference to execution of a Q1 job, i.e., a new arrival, before execution of a Q2 or a Q3 job. This statement assumes that swapping time is reasonably small compared to mean execution time, as it will be in the system being considered here for, say, Te > 0.5 seconds. This intuition is in fact borne out by solutions which have been obtained for distributions (2) and (3) of Fig. 5.9.

Tables 6.2 to 6.4 list the pertinent data obtained from some of the optimization solutions. In all cases the optimal policy has a priority structure. Whenever a scheduling decision occurs, the selection of a queue from which the next job is taken is according to the priority ranking. Priority 1 indicates first preference, and we use for example the notation "1-3-2 PR" to describe a policy in which levels Q1, Q2, Q3 are ranked 1, 3, and 2 respectively. As expected, the Q1 level is preferred in the optimal policies obtained.

To whatever extent it is possible, it is desirable to extract from the data a general concept governing the optimal time allocation and the optimal policy. Inasmuch as each solution is valid only for its particular parameter values, this cannot be accomplished for every conceivable set of parameter values. But a number of solutions, as given in Tables 6.2 to 6.4, should be sufficient to achieve this for parameter values

201

[Figure: curves of expected remaining execution time as a function of execution time received (0.1 to 10.0 sec, logarithmic scale), for the distributions of Fig. 5.9.]

Fig. 6.1. Increase of remaining execution time with execution time received.
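The increase pictured in Fig. 6.1 follows directly from the hyperexponential form. For F(t) = 1 − Σ αᵢ e^{−μᵢ t}, the expected remaining execution time after t seconds of execution is Σ αᵢ e^{−μᵢ t}/μᵢ divided by Σ αᵢ e^{−μᵢ t}. A short sketch using the parameters of Fig. 5.9 illustrates the behavior; the function name is of course ours.

```python
import math

# (alpha_1, alpha_2, mu_1, mu_2) for the three distributions of Fig. 5.9
DISTRIBUTIONS = {
    1: (0.5,   0.5,   1.0,   1.0),    # negative exponential, Te = 1 sec
    2: (0.85,  0.15,  7.2,   0.17),
    3: (0.714, 0.286, 5.028, 0.333),
}

def mean_remaining(dist, t):
    """E[T - t | T > t] for a two-phase hyperexponential execution time."""
    a1, a2, m1, m2 = DISTRIBUTIONS[dist]
    tail = a1 * math.exp(-m1 * t) + a2 * math.exp(-m2 * t)
    return (a1 * math.exp(-m1 * t) / m1 + a2 * math.exp(-m2 * t) / m2) / tail

for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    # Distribution 1 stays at 1.0 (memoryless); 2 and 3 increase with t.
    print(t, [round(mean_remaining(d, t), 3) for d in (1, 2, 3)])
```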

202

TABLE 6.2

MINIMIZATION OF RESPONSE TIME WITH DISTRIBUTION 2

Fixed Parameter Values
  Tu (sec)   Te (sec)   s (sec)   S (sec)
    30          4        0.15        0

Variable Parameters
  e2 (sec)   e3 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
   0.6         6.0       0.05         .81307             1-3-2 PR
   2.0         6.0       0.05         .68447             1-3-2 PR
   3.5         6.0       0.05         .66912             1-3-2 PR
   4.0         6.0       0.05         .66713             1-3-2 PR
   4.25        6.0       0.05         .66902             1-3-2 PR
   3.5         6.0       0.15         .66994             1-3-2 PR
   4.0         6.0       0.15         .66969             1-3-2 PR
   4.25        6.0       0.15         .66985             1-3-2 PR
   2.0         6.0       0.50         .68782             1-3-2 PR
   4.0         6.0       0.50         .67254             1-3-2 PR
   4.25        7.5       0.05         .67254             1-3-2 PR

203

TABLE 6.3

MINIMIZATION OF RESPONSE TIME WITH DISTRIBUTION 2

Fixed Parameter Values
  Tu (sec)   Te (sec)   s (sec)   S (sec)
    20          1        0.15        0

Variable Parameters
  e2 (sec)   e3 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
   0.75        1.50       .05         .26065             1-3-2 PR
   1.0         1.50       .05         .2602              1-3-2 PR
   1.25        1.50       .05         .26065             1-3-2 PR
   1.0         1.50       .15         .26098             1-3-2 PR
   1.0         2.50       .05         .26243             1-3-2 PR

204

TABLE 6.4

MINIMIZATION OF RESPONSE TIME WITH DISTRIBUTION 3

Fixed Parameter Values
  Tu (sec)   Te (sec)   s (sec)   S (sec)
    30          4        0.15        0

Variable Parameters
  e2 (sec)   e3 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
   3.75        6.0        .05         .66952             1-3-2 PR
   4.0         6.0        .05         .66928             1-3-2 PR
   4.25        6.0        .05         .66930             1-3-2 PR
   4.0         6.0        .15         .66992             1-3-2 PR
   4.25        7.5        .05         .67551             1-3-2 PR

205

which are physically appropriate for contemporary time-shared computers. In Tables 6.2 to 6.4, two appropriate values of the mean user reaction time, Tu, and the mean execution time, Te, are represented.¹ The two distributions, as seen from Figs. 5.9 and 5.5, seem appropriate also.

For all cases given in Tables 6.2 to 6.4, the optimal policy is 1-3-2 priority. In order to consider the time allocation, let us observe the values of the gain g given. The minimum value of e3 permitted in the model for either distribution is 1.5 Te. With this value of e3, it appears that g is maximum when e2 = Te and q is as small as permitted, namely one time step or one clock interval. The gain is apparently more negative for larger values of q. Moreover if e3 is increased, the gain also decreases. This suggests that the optimum time allocation is

e2 = Te
e3 = minimum permitted value for the model
q = minimum value permitted physically, i.e., one clock interval.

There is ample intuitive justification for this result. We see that with a 1-3-2 priority policy there is never more than one job at the Q3 level. Hence the above choice of q does not involve swapping between Q3 jobs. Rather it allows the system to rapidly preempt a Q3 job to begin execution of a new arrival at the Q1 level. This is consistent with always serving the job of smallest expected remaining service time. A Q3

¹The reasonableness of these values is verified by the first extensive data to come out of MIT's Project MAC, contained in the Ph.D. thesis by Allan Scherr, "An Analysis of Time-Shared Computer Systems", June, 1965.

206

job therefore continues to receive execution as long as there are no Q1 jobs present. This is expected based on Fig. 6.1. There it is seen that for either distribution the expected remaining execution time is essentially constant after a job has received execution time totaling Te. This means that jobs which have received Te should thereafter be processed one at a time to completion, assuming no jobs present which have received execution time less than Te. Indeed, the 1-3-2 priority policy and optimal time allocation accomplish this precisely.

We may further conjecture that it would be beneficial to preempt execution of a Q2 job to begin execution of a new arrival. This would lead to a queueing structure with effectively only two priority levels. A lower level job would be preempted by an arrival, and the new arrival executed for a maximum time Te before being placed in the lower queue. This corresponds to the time allocation

e2 = Te
e3 = Te + one clock interval
q = one clock interval.

Such an assignment is not permitted however if our model using the two-phase hyperexponential distribution is to retain the Markov property; see Eq. (5.34). Hence we are not able to study the system with e3 < 1.5 Te for this distribution. There is some reason to believe that a further improvement in gain could be obtained were this not the case.

b. The assumption θ = 0 is valid for a physical system in which command programs are not relocatable in memory, or where the central

207

processor hardware can carry out dynamic relocation [28,29]. The former is not often the case, and systems capable of the latter include embellishments not present in our model. So we now consider the effect of the swap time s and the set-up time S, aiming to discover how these values would alter the result found above.

A given value of set-up time will have its most prominent effect when θ = 1, so let us assume this value. Then every command requires set-up before commencing execution. The set-up therefore increases the expectation of remaining service time for new arrivals. We would anticipate that the optimal policy remains the same (1-3-2 priority) for small S, and changes as S approaches the expected remaining execution time for jobs drawn from the exponential tail of F(τ). Table 6.5 gives a set of solutions which verify this supposition.

A value of S in the neighborhood of 1 second or less seems appropriate for the computer system we are modeling. There is little point then in considering Te greater than 1 second, for the optimum system will not be different than the case for θ = 0. Table 6.5 shows that even with S = 1 second, Te = 1 second, the optimum system remains basically as before.

Table 6.5 however shows that the optimal policy undergoes a change for S greater than 1 second. For S = 5 seconds, a first-come, first-served policy (denoted FCFS) is optimal, and hence the time allocation has no effect on the system gain. (The last entry in the table illus-

208

TABLE 6.5

MINIMIZATION OF RESPONSE TIME: DISTRIBUTION 2

Fixed Parameters
  e3 (sec)   Tu (sec)   Te (sec)   s (sec)    θ
    1.5         20          1        0.15    1.0

Variable Parameters
  S (sec)   e2 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
    1         0.50       .05          .48932            1-3-2 PR
    1         1.0        .05          .48526            1-3-2 PR
    1         1.45       .05          .48628            1-3-2 PR
    1         0.50       .50          .49113            1-3-2 PR
    3         0.85       .05          .96092            1-3-2 PR except k = 3 in state (1,2,0,3)
    3         1.25       .05          .96043            1-3-2 PR except k = 3 in state (1,2,0,3)
    3         1.45       .05          .96052            1-3-2 PR except k = 3 in state (1,2,0,3)
    3         1.45       .50          .96054            1-3-2 PR except k = 3 in state (1,2,0,3)
    3         1.45      1.50          .96149            1-3-2 PR
    5         0.85       .05         1.40186            FCFS
    5         1.45       .05         1.40186            FCFS
    5         1.0        .25         1.40234            FCFS

209

trates the extent of round-off errors in the SEMSIN1 computation. The value of gain should theoretically be identical for every time allocation.)

Table 6.5 also illustrates a phenomenon requiring explanation. The optimal policy found for S = 3 seconds has the optimal control action depending upon the number of jobs in the queues, rather than on simply the presence of jobs. Thus k = 3 is the optimal action for state (1,2,0,3) but not, say, for state (2,1,0,3). A principal advantage of a Semi-Markov decision model is that it allows one to discover when an optimal queueing system has this structure. However in this case we can offer no intuitive explanation for what the optimization procedure has claimed to be an "optimal" policy. The possibility arises that computational errors have in fact led to a sub-optimal policy. Investigating further with Eq. (4.20) in mind, and denoting the "optimal" policy by A, we have found that

$$\max_{B \in P'}\ \max_{i \in R(A,B)}\ \left(b_i^B - b_i^A\right) \le .002$$

at termination of the iteration, for any set of parameters given in Table 6.5 with S = 3 seconds. Moreover, the computation for the "optimal" policy gives

max_i x_i^A = 3.5 × 10³

and so if we assume that the relative error of the computation is less than 10⁻⁵,¹

¹See Appendix D.

210

we can use Eq. (4.20) to estimate the gain g of any other policy B. With the minimum mean transition time equal to 4 time steps, (4.20) yields

g^B − g^A ≤ .002 + 3(.035)/4 = .028

It is quite conceivable then that the "optimal" policy is in fact sub-optimal, and this is the explanation we propose. We presume that the optimal policy is 1-3-2 priority, as before.

The swapping time s will have an identical effect upon the optimal policy as the set-up time S. For small s, the optimal policy is 1-3-2 priority. As s approaches the expected remaining execution time on the tail of F(τ), it brings about a significant extension of the expected service time for a Q1 job in comparison to the expected remaining time for a job being executed. Hence the optimal policy should eventually change to FCFS, as occurs for large S. Such large values of s are not appropriate however for a drum secondary memory.

We are led finally to this conclusion:

Conclusion: For execution time distributions F(τ) shown as (2) and (3) of Fig. 5.9, and values of swap time s and set-up time S which are no longer than the mean execution time Te, an optimal scheduling policy for minimizing the mean response time to any command is described by:

a. A 1-3-2 priority ranking for selecting jobs from queues 1, 2, and 3 respectively.

b. A maximum execution time allocation of Te, for a job from Q1; 0.5 Te, the minimum permitted value in the model, for a job from Q2;

211

one clock interval, for a job from Q3.

6.1.3 Equivalent Criterion

The intuitive explanations put forth in the preceding paragraphs to justify the result of the computational procedure have relied on the idea that minimization of mean response time is equivalent to maximization of the mean rate of job completions. No rigorous proof of this was given, and indeed it may be extremely difficult to do so. The computation itself however can provide additional confidence in the validity of this notion. For the model allows us to define b_i^k so as to collect a unit of return at completion of every job. Maximizing g for this definition of return maximizes the mean rate of completions. By using the optimization algorithm one can show that the optimal policy and the optimum time allocation are the same as previously obtained. This has been done, and Table 6.6 gives the value D, which is the mean rate of job completions, or equivalently, the probability of a job completion in any unit step.

TABLE 6.6

VALUES OF D, MEAN RATE OF JOB COMPLETIONS FOR THE OPTIMAL SYSTEM

  Tu      Te     Dist.    s       θ      S          D         Optimal Policy
 (sec)   (sec)          (sec)          (sec)
  20       4      2      0.15     0      0       .007411       1-3-2 PR
  20       1      2      0.15     0      0       .009339       1-3-2 PR
  20       1      2      0.15     1      1       .00877        1-3-2 PR
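The quantities D, L, B, and T_R of section 5.4.3 can be cross-checked numerically. A rough illustration, assuming the first Table 6.6 entry (Tu = 20 sec, Te = 4 sec) together with the corresponding gain L ≈ 1.034 from Table 6.11; these come from separate computer runs in the report, so the check is only approximate.

```python
# Relations (5.28)-(5.29): g = D*B and L = D*T_R, with D the probability of
# a job completion per 50 ms unit step.
D = 0.007411          # completions per unit step (Table 6.6, first row)
L = 1.0341            # mean number in system (Table 6.11, Case 1 optimum, MRT)
step = 0.050          # unit step length in seconds

T_R = (L / D) * step  # mean response time implied by L = D * T_R
print(round(T_R, 2), "sec")   # roughly 7 sec for Te = 4 sec, Tu = 20 sec
```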

212

6.1.4 Comparison of Time-Sharing Policies

The discussion of section 5.4, Chapter V, has indicated how we may proceed to compare the optimal scheduling control policy we have found to the time-sharing policies now being used in actual systems. The comparison is in terms of the conditional expectation of response time to a command, given the required execution time. This can be accomplished for all execution time values t_e in the range 0 < t_e ≤ e3. We can also compare the expectation of response time, on the condition that t_e > e3.

There are two basic policies now in use to which we wish to compare the optimal system. First is the round-robin procedure. This corresponds in our model to a 2-1-3 priority policy with e2 having any value less than e3, and with q = e3. (In contrast to the round-robin procedure discussed in Chapter I, the above places new arrivals at the head of the round-robin queue.) Because of the restrictions of the model we can only study "quantum" values of at least 1.5 Te.

The second basic policy is a 1-2-3 priority policy, corresponding basically to the multi-level priority procedure as used at Project MAC [18] and Lincoln Lab [21]. The time allocation espoused by the developers of these systems involves increasing the maximum execution time allocation as the priority ranking of a queue decreases. This is distinctly different from what we have found. A time allocation of

e2 = 0.5 Te
e3 = 1.5 Te
q = 2 Te

213

will provide execution intervals increasing by a factor of 2 with each decreasing level, as used at MAC. This will be called priority policy A. But we will also investigate the allocation

e2 = 0.5 Te
e3 = 1.5 Te
q = e2

referred to as priority policy B. This time assignment allows a Q1 job to supersede a Q3 job only after the latter has received as much execution time as the former may take. This notion has also been advanced by time-sharing pioneers as preventing the preemption by high priority jobs from interfering too much with the advance of lower priority jobs.

Figures 6.2 and 6.3 give the conditional response time for these three scheduling control systems and the optimal system we have found. A noteworthy observation is that the optimal system gives faster response to the "trivial computation", i.e., t_e → 0, than any of the others. By the same token, the expected response time for very long computations (t_e > e3) is larger for the optimal system except in comparison to priority policy B. In every case, a user requesting a long computation can look forward to a significantly long response time.

6.2 MINIMUM WEIGHTED RESPONSE TIME

The intuitive reasoning which justifies the optimal policy computed for the preceding criterion should hold even more strongly when weighting is applied to response time, as shown in Fig. 5.8. Because jobs taking

214

[Figure: conditional expectation of response time (sec) versus required execution time (0 to 6 sec) for four control policies. The expectation of response time for jobs over 6 sec duration is annotated on the curves (values 48.7, 49.0, 50.5, and 52.8 sec); the optimal system gives the fastest response at small execution times.]

Control policy data:
  Type        e2    e3    q
  A    1-2-3 PR    2     6     8
  B    1-2-3 PR    2     6     2
  R-R  2-1-3 PR    –     6     6
  Opt  1-3-2 PR    4     6    .05

Fig. 6.2. Conditional expectation of response time for distribution 2, Tu = 20 sec, Te = 4 sec, θ = 0, s = 0.15 sec.

215

Control policy data:
  Type        e2     e3     q
  A    1-2-3 PR   0.5    1.5    2.0
  B    1-2-3 PR   0.5    1.5    0.5
  R-R  2-1-3 PR    –     1.5    1.5
  Opt  1-3-2 PR   1.0    1.5    .05

[Figure: conditional expectation of response time (sec) versus required execution time (0 to 1.5 sec) for the four policies. The expectation of response time for jobs over 1.5 sec duration is annotated on the curves (values 12.1, 12.2, and 12.9 sec appear among the annotations).]

Fig. 6.3. Conditional expectation of response time for distribution 2, Tu = 20 sec, Te = 1 sec, θ = 1, S = 1 sec, s = 0.15 sec.

216

longer to execute receive less weight, it is even more imperative to execute shorter jobs whenever possible. One expects that this may lead to a smaller value of e2 than found in the previous case. The value of e3 should again be the smallest value permitted by the model. With this criterion however the minimum permitted value is determined by both the weighting curve and the hyperexponential execution time distribution. By contrast, only the latter was pertinent with the minimum response time criterion. The smallest value of e3 to be studied is given by

e3 = max (4 seconds, 1.5 Te)

The value 4 seconds is established by weighting curve (1) of Fig. 5.8.

Tables 6.7 to 6.10 present data pertinent to minimizing weighted response time. Among the three distributions under study, the negative exponential produced a distinctly different result under the previous criterion. This is not so here, for similar results hold for distributions (1) and (2), and can be expected to hold also for (3). The optimal policy is 1-3-2 priority in almost all cases, as it was under the previous criterion. Table 6.7 indicates that for Te = 1 sec the optimal value of e2 is slightly larger than Te, whereas for Te = 4 sec the converse is true. This is partially, if not completely, explained by the fact that in the first instance e3 is considerably larger compared to Te than in the second instance. The optimum value of q is one clock interval, as before, in all but one of the interesting cases.
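Under the weighted criterion the execution-time return (5.21) can be evaluated numerically once a weight curve is fixed. Since Fig. 5.8 is not reproduced here, the declining weight function below is purely hypothetical; the sketch only shows the mechanics of computing ε = −∫ t_e W(t_e) dF(t_e) for a two-phase hyperexponential F.

```python
import math

def eps_return(alpha, mu, W, upper=60.0, n=60000):
    """Numerically evaluate -integral te*W(te) dF(te), Eq. (5.21),
    for F(t) = 1 - sum_i alpha[i]*exp(-mu[i]*t)."""
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        # density of the hyperexponential execution time at t
        dens = sum(a * m * math.exp(-m * t) for a, m in zip(alpha, mu))
        total += t * W(t) * dens * h
    return -total

# Hypothetical weight curve: full weight for short jobs, decaying toward a
# constant for long jobs (the actual curves are those of Fig. 5.8).
W = lambda t: 0.2 + 0.8 * math.exp(-t)

print(eps_return((0.85, 0.15), (7.2, 0.17), W))   # distribution 2, Te = 1 sec
```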

217

TABLE 6.7

MINIMIZATION OF WEIGHTED RESPONSE TIME: WEIGHTING CURVE 1, DISTRIBUTION 2

Fixed Parameters
  Tu (sec)   s (sec)   θ
    30        0.15     0

Variable Parameters
  Te (sec)   e2 (sec)   e3 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
    1           1          4         .05          .04656            1-3-2 PR
    1          1.5         4         .05          .04585            1-3-2 PR
    1          2.5         4         .05          .04616            1-3-2 PR
    1          1.5         4         .15          .04609            1-3-2 PR
    1          1.5         4         .50          .04683            1-2-3 PR
    4           2          6         .05          .08062            1-3-2 PR
    4           3          6         .05          .07822            1-3-2 PR
    4           4          6         .05          .07885            1-3-2 PR
    4           3          6         .15          .07909            1-3-2 PR
    4           3          6         .50          .08208            1-3-2 PR

218

TABLE 6.8

MINIMIZATION OF WEIGHTED RESPONSE TIME: DISTRIBUTION 2

Fixed Parameters
  Tu (sec)   Te (sec)   s (sec)   e3 (sec)   θ
    20          4        0.15        6       0

Variable Parameters
  e2 (sec)   q (sec)   Optimal Value of −g   Optimal Policy

Weight Curve 2
    2          .05          .13690            1-3-2 PR
    3          .05          .13314            1-3-2 PR
    4          .05          .13439            1-3-2 PR
    3          .15          .13502            1-3-2 PR

Weight Curve 4
   2.5         .05          .15751            1-3-2 PR
    3          .05          .15572            1-3-2 PR
    4          .05          .15710            1-3-2 PR
    3          .15          .1578             1-3-2 PR
    3          .50          .16518            1-3-2 PR

219

TABLE 6.9

MINIMIZATION OF WEIGHTED RESPONSE TIME: DISTRIBUTION 1

Fixed Parameters
  Te (sec)   s (sec)   θ
    4         0.15     0

Variable Parameters
  Tu (sec)   e2 (sec)   e3 (sec)   q (sec)   Optimal Value of −g   Optimal Policy

Weight Curve 1
   20          2           6         .05          .09837            1-3-2 PR
   20          3           6         .05          .09639            1-3-2 PR
   20          4           6         .05          .09899            1-3-2 PR
   20          3           6         .15          .09657            1-3-2 PR
   20          3           6         .50          .09701            1-2-3 PR except k = 3 in (0,1,0,3) and (0,2,0,3)
   20         1.5          4         .05          .09613            1-3-2 PR
   20         2.5          4         .05          .09024            1-3-2 PR
   20         3.75         4         .05          .09453            1-3-2 PR
   20         2.5          4         .15          .09053            1-3-2 PR
   20         2.5          4         .50          .09155            1-3-2 PR
   30          2           6         .05          .06322            1-3-2 PR
   30          3           6         .05          .06226            1-3-2 PR
   30          4           6         .05          .06364            1-3-2 PR
   30          3           6         .15          .06236            1-3-2 PR
   30          3           6         .50          .06268            1-3-2 PR

Weight Curve 4
   20         1.5          6         .05          .15919            1-3-2 PR
   20         3.0          6         .05          .14913            1-3-2 PR
   20         3.5          6         .05          .15132            1-3-2 PR
   20          3           6         .15          .14876            1-2-3 PR
   20          3           6         .50          .14944            1-2-3 PR

220

TABLE 6.10

MINIMIZATION OF WEIGHTED RESPONSE TIME: WEIGHTING CURVE 1, DISTRIBUTION 2

Fixed Parameters
  Tu (sec)   Te (sec)   e3 (sec)   s (sec)    θ
    30          1          4        0.15     1.0

Variable Parameters
  S (sec)   e2 (sec)   q (sec)   Optimal Value of −g   Optimal Policy
    1         0.85       .05          .15924            1-2-3 PR
    1         1.0        .05          .15913            1-2-3 PR
    1         1.5        .05          .15909            1-3-2 PR
    1         2.0        .05          .15949            1-3-2 PR
    1         1.5        .15          .15919            1-2-3 PR
    1         1.0        .50          .15990            1-2-3 PR
    3         0.85       .05          .38763            1-2-3 PR except k = 3 in (0,0,3,2)
    3         1.0        .05          .38768            1-2-3 PR except k = 3 in (0,0,3,2) and (0,0,2,2)
    3         1.5        .05          .38856            1-2-3 PR except k = 3 in (0,0,3,2), (0,0,2,2), (0,0,1,2) and (0,1,1,0)
    3         1.0        .50          .39147            1-2-3 PR

221

Tables 6.9 and 6.10 exhibit again the kind of behavior recorded in Table 6.5. That is, for certain parameter values the optimal policy is in process of changing from one priority assignment to another, and the algorithm converges on a suboptimal policy which represents a mix of the control actions from each. This is especially clear in Table 6.10 for S = 3 sec. As e2 increases, the exceptional states become more numerous. One expects that for larger e2, the algorithm will produce 1-3-2 priority as optimal. Interestingly, it appears that for S = 3 sec the optimal policy is 1-2-3 PR rather than FCFS as found under the previous criterion. It is expected however that as S becomes much larger the increased service time for Q1 jobs will offset the greater weighting they receive, and FCFS will again become optimal.

Once again it is desirable to extract a general conclusion regarding the optimal system from Tables 6.7 to 6.10. This is not as straightforward as before, considering for example the exceptional case observed in Table 6.9 for weighting curve 4. Taking account of the relative lack of sensitivity of the system gain to departures from optimality, we feel justified in stating this

Conclusion: For the execution time distributions in Fig. 5.9 and the weighting curves in Fig. 5.8, and for swap time and set-up time values smaller than Te, an excellent suboptimal policy for minimizing weighted response time is given by

a. A 1-3-2 priority ranking for selecting jobs from Q1, Q2, Q3 respectively,

222

b. A maximum execution time allocation for Q1 jobs approximately equal to Te; for Q2 jobs, the smallest value permitted by the model here; for Q3 jobs, one clock interval.

As before, we anticipate that further improvement in performance relative to this criterion is possible for smaller values of e3, which cannot be studied through this Markov model.

Table 6.11 gives a comparison of the system gain for the optimal scheduling system and the three other time-sharing policies described in section 6.1.4.

6.3 SENSITIVITY OF SOLUTIONS

A very encouraging outcome of the optimization solutions is evidence that the optimal system is relatively insensitive to changes in the parameter values. One must admit that parameter values are not often known precisely, and may also change slowly with time during the normal evolution of an operational system. Our investigation gives a good basis for confidence in the practical implementation of an optimal system. At any moment the system might not be truly optimal due to small perturbations of parameter values, but an excellent suboptimal policy could apparently be maintained with rather infrequent adjustment. The data presented show that fairly broad changes in Tu, Te, s, θ, and S are tolerable. Moreover there is ample evidence that the optimal scheduling systems for distributions (2) and (3) are identical under

223

TABLE 6.11

COMPARISON OF SYSTEM GAIN g

  Parameter     Case 1 Value    Case 2 Value
  Tu            20 sec          20 sec
  Te            4 sec           1 sec
  s             0.15 sec        0.15 sec
  θ             0               1
  S             0               1 sec
  F(τ)          (2)             (2)

  Policy         e2 (sec)  e3 (sec)  q (sec)  Parameter Values  Criterionᵃ  Value of g
  Round-robin       –         6        6         Case 1           MRT       −1.1044
  Priority A        2         6        8         Case 1           MRT       −1.1109
  Priority B        2         6        2         Case 1           MRT       −1.0780
  Optimum           4         6       .05        Case 1           MRT       −1.0341
  Round-robin       –         6        6         Case 1           MWRT      −.21703
  Priority A        2         6        8         Case 1           MWRT      −.22561
  Priority B        2         6        2         Case 1           MWRT      −.14790
  Optimum           4         6       .05        Case 1           MWRT      −.09639
  Round-robin       –        1.5      1.5        Case 2           MRT       −.49525
  Priority A       0.5       1.5      2.0        Case 2           MRT       −.49565
  Priority B       0.5       1.5      0.5        Case 2           MRT       −.49967
  Optimum           1        1.5      .05        Case 2           MRT       −.48526
  Round-robin       –        4.0      4.0        Case 2           MWRT      −.27086
  Priority Aᵇ     1.25       4.0      5.0        Case 2           MWRT      −.26051
  Priority Bᵇ     1.25       4.0     1.25        Case 2           MWRT      −.24588
  Optimum          1.5       4.0      .05        Case 2           MWRT      −.24070

ᵃMRT signifies the minimum response time criterion; MWRT denotes minimum weighted response time as per curve 1 of Fig. 5.8.
ᵇBecause e3 is constrained here to be larger than Te, we take e2 larger than indicated in section 6.1.4 for priority policies A and B.

224

either of the criteria we have proposed. These are significantly different distributions, but more closely represent what one can expect in time-sharing systems than does the exponential distribution. For this reason we discount the importance of the result obtained for the exponential distribution in minimization of response time. Thus we conclude that the optimal system is fairly insensitive to the form of F(τ), within the limits of what one expects for actual systems.

Most significantly, the optimal scheduling control systems are nearly identical for both criteria, including the three weighting curves given in Fig. 5.8. To be sure, the restriction which the Markov model imposes on the choice of e3 accounts for this to some degree. Nevertheless, the similarity of results is good evidence that the system designer need not be overly concerned with achieving a precise specification of the relative weight associated with response time as a function of required execution time. Inasmuch as largely qualitative considerations would be used to obtain a specification, this is a valuable result in our approach.

6.4 APPLICATION AND EXTENSION OF RESULTS

In order to come finally to a scheduling control system suitable for actual practice, we should eliminate the restrictions of the model to whatever extent the data justifies. Taking advantage of the lack of sensitivity of the solutions, and using a rather conservative conjecture on the improvement possible through reduction of e3, we can suggest a near optimal system suitable for either criterion. This is a two priority

225

queue system as previously described in section 6.1.2. New commands are entered in the first priority queue, and a new command preempts a lower priority job at the end of one clock interval. A first priority job is executed for maximum time Te and then placed at the end of the lower priority queue. In the absence of higher priority jobs, the lower priority queue is served on a round-robin basis for a "quantum" of execution time which is larger than Te, say 2 Te. The last feature is not suggested by the optimization results from the model. Although the model was designed to include round-robin service on the last queue, the 1-3-2 priority policies found actually circumvent this. Yet this seems necessary on practical grounds, to ensure that extensive execution time is not devoted to commands which are in error.

The above system is very similar to the scheduling procedures currently in use in time-shared computers. The striking difference is the emphasis upon rapid preemption of lower priority jobs. Existing systems permit preemption only after a significant delay, and this degrades the response time obtained for trivial computations. Indeed, the predominance of priority policies in our results suggests that existing multi-level priority systems have the basic structure needed for an optimal system under either criterion we have studied. The critical element requiring further analysis is the optimum time allocation.
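The proposed two priority level discipline can be stated compactly. The sketch below is our paraphrase of the rules just described, not code from the report; the job fields and function name are assumptions made for illustration.

```python
from collections import deque

# A job is a dict with fields 'priority', 'executed' (total execution
# received) and 'quantum' (execution received in the current round-robin
# pass of the lower queue).

def next_job(high, low, running, clock, Te):
    """Called once per clock interval; returns the job to run next."""
    if running is not None:
        running['executed'] += clock
        running['quantum'] += clock
    # A new command preempts a second priority job after one clock interval.
    if high and (running is None or running['priority'] == 2):
        if running is not None:
            running['quantum'] = 0
            low.appendleft(running)       # the preempted job resumes first
        running = high.popleft()
    if running is not None:
        if running['priority'] == 1 and running['executed'] >= Te:
            # A first priority job receives at most Te, then drops down.
            running['priority'] = 2
            running['quantum'] = 0
            low.append(running)
            running = None
        elif running['priority'] == 2 and running['quantum'] >= 2 * Te:
            # Round-robin on the lower queue with quantum 2*Te.
            running['quantum'] = 0
            low.append(running)
            running = None
    if running is None and low:
        running = low.popleft()
    return running

# Example: a freshly arrived command enters the high queue and runs at once.
high = deque([{'priority': 1, 'executed': 0.0, 'quantum': 0.0}])
low = deque()
job = next_job(high, low, None, clock=0.05, Te=1.0)
```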

226

As already mentioned, there is good reason to believe the optimal system could be improved by studying smaller values of e3 than allowed by the model including hyperexponential distributions. If interest remains only on two or three queue systems, this cannot be accommodated by the Markov modeling approach. By passing to systems with more than three levels, however, one can obtain whatever advantage there is to be gained. The limiting consideration is the number of states in the extended model, and the attendant difficulty in deriving formulae for transition probabilities, etc. Table 6.12 illustrates how the number of states increases with the number of queue levels and the number of consoles. Of course, inasmuch as our results have simplified the systems which need be considered, one could also use a simulation model to study smaller e3 values. A largely different Semi-Markov decision model will be needed to handle a large number of consoles (50 or more), as systems will have in a few years.

TABLE 6.12

NUMBER OF STATES FOR THE COMPUTER MODEL

  Number of Consoles   Number of Queues   Number of States
          4                   3                  75
          4                   4                 175
          6                   3                 462

227

Another shortcoming of the model which requires further study concerns the distribution of the user reaction time, i.e., the arrival time t_a. We have assumed a negative exponential with mean Tu. However the distribution observed in practice is likely to be more accurately modeled by a hyperexponential or Erlang distribution. In order to make these practicable we can use the method of exponential phases ([7], p. 19). Using say a two-phase Erlang, the state of the system could be described by the variables

a = number of consoles in second phase of arrival time,
n1, n2, n3 = number of jobs in Q1, Q2, Q3, respectively,
ℓ = a variable indicating the status of main memory.

Now, however, the value of a is not observable for decision making. Thus we enter a new theoretical area, decision processes without complete observability of the system state. Extensive theoretical work will be needed to determine how such processes may be optimized.
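The method of exponential phases replaces the single exponential reaction time by a chain of exponential stages; a two-phase Erlang with mean Tu is two successive exponential stages, each of mean Tu/2, and a console is "in the second phase" once the first stage has elapsed. A brief sketch of sampling such reaction times (names are ours):

```python
import random

def erlang2_reaction_time(Tu):
    """Sample a two-phase Erlang reaction time with mean Tu: the sum of
    two independent exponential phases, each with mean Tu/2."""
    return random.expovariate(2.0 / Tu) + random.expovariate(2.0 / Tu)

# The augmented state would carry a = number of consoles currently in the
# second phase -- a quantity the scheduler cannot observe directly, which
# is what makes the resulting decision process only partially observable.
print(erlang2_reaction_time(20.0))
```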

CHAPTER VII

CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH

The efforts in this research have ranged over the spectrum of activities involved in the application of Semi-Markov sequential control processes to systems with queues. These include theoretical developments, computational methods, and modeling and optimization of physical systems. The results in each of these areas are promising and recommend this technique for use by system engineers concerned with stochastic service systems.

The theoretical work has dealt with a discrete-time Semi-Markov decision process, and the optimization of the return from this process operating over infinite time. Beginning from an appropriate definition of optimization, we have established under quite general conditions that there exists a particularly simple optimal control policy. Such a policy is described as stationary and deterministic, in that a unique control action is associated with observation of each of the system states, for all time. Moreover this policy is optimal for any initial condition on the beginning of the process. This original result has important implications of a practical nature, for it simplifies both the computation of the optimal policy and its implementation in a physical system.

The computational method exploited in this study is the algorithm devised by Howard and Jewell, limited to ergodic policies. Our experience

228

229

with this procedure has brought to light the effects of computational errors upon it, previously not described by its originators. Without resorting to special computing techniques to increase precision, the algorithm has been quite successfully applied to a 75 state model for a multiple-access or time-shared computer. Occasionally, evidence has been found of improper convergence, but the most unfavorable result of this has been to mask the structure of the true optimal policy. More importantly, computational errors seem to practically eliminate the discovery of equivalent optimal policies.

Although the potential applicability of Markovian decision processes to stochastic service systems has been recognized, our effort in this direction is, we believe, the first serious attempt to exploit such models. The results are largely encouraging. Using the discrete-time Semi-Markov control process, we have formulated a model for the scheduling control function in time-shared computers. It is the most realistic stochastic model, other than simulation models, which has been proposed for these systems. Two alternative optimization criteria have been proposed for the model, both concerned with the response time given a user's command by the time-shared computer. The effectiveness of the Howard algorithm has allowed us to economically determine optimal scheduling systems over an appropriate range of system parameter values.

Two important conclusions have been reached relevant to current time-sharing practice. Although the Semi-Markov decision model includes a

230

wide class of scheduling policies, it has been found that priority policies are predominant in the computed optimal policies. Hence existing multiple priority scheduling algorithms have the basic structure needed for an optimal system. Further study is needed regarding the maximum execution time allocation for each priority level. In this respect our results also show a need to have almost immediate preemption of a low priority job upon the arrival of a high priority job. In existing systems, preemption can occur only after a significant delay.

The computer model described is limited to three queue scheduling systems. Jobs are placed in the last queue after receiving total execution time significantly larger than Te, the mean execution time of the population of jobs. Subject to these limitations, our results lead us to propose a two priority level scheduling system. A new arrival would be given first priority and would preempt any second priority job at the end of one clock interval. First priority jobs would receive maximum execution time equal to Te before being relegated to the second priority queue. Round-robin service would be given to the lower queue, with a quantum duration much larger than Te.

While seeking a comparison of the computed optimal scheduling system with existing time-sharing procedures, we have discovered another important advantage of the decision process formulation. This involves the freedom of using alternative return functions for the model with a specified control policy. The Howard algorithm provides an analysis of per-

231

formance relative to the given returns, and thus one can study various aspects of performance which are difficult to determine in any other approach. With this capability we have compared the expectation of response time for alternative policies, conditional upon the execution time of a command. This presents a much more informative picture of the performance observed by the user.

This research has shown that very practical results concerning stochastic service systems can be obtained through Markovian sequential decision processes. Future research in both the theoretical and applied areas is likely to be profitable. Attention should be given to decision processes where additional constraints are imposed on the set of admissible policies. As an example we suggest decision processes where the system state is not completely observable, so that the same control action must be used in a subset of system states. Another valuable effort would deal with systematizing the formulation of transition probabilities and such data for complex models. This formulation has been somewhat burdensome in the model undertaken here. Finally, we would emphasize that future research ought to be well motivated by realistic models for practical systems. Future developments in time-shared computer organization are bound to be a gold mine in this respect for the system engineer.

APPENDIX A

SUPPLEMENTARY THEOREMS

Several theorems used in Chapter IV are presented here. The first of these is the well-known Abel's theorem, a proof of which is given by Rudin ([56], p. 160). The second theorem stems from a corollary 1a of Widder ([89], p. 182) for the Laplace-Stieltjes transform.

Abelian Theorem I: Let $f(\alpha) = \sum_{n=0}^{\infty} a_n \alpha^n$ be convergent for $0 \le \alpha < 1$. If

$$\sum_{n=0}^{T} a_n \to A \quad \text{as } T \to \infty,$$

then $f(\alpha) \to A$ as $\alpha \to 1^-$.

Abelian Theorem II: Let $f(\alpha) = \sum_{n=0}^{\infty} a_n \alpha^n$ be convergent for $0 \le \alpha < 1$. If

$$\frac{1}{T} \sum_{n=0}^{T} a_n \to A \quad \text{as } T \to \infty,$$

then $(1-\alpha) f(\alpha) \to A$ as $\alpha \to 1^-$.

Theorem III: Let $a_n$, $n = 0, 1, 2, \ldots$, be a uniformly bounded sequence, so that

232

233

$$f(\alpha) = \sum_{n=0}^{\infty} a_n \alpha^n$$

is absolutely and uniformly convergent for $0 \le \alpha < 1$. Then

$$\liminf_{\alpha \to 1^-} (1-\alpha) f(\alpha) \ \ge\ \liminf_{T \to \infty} \frac{1}{T} \sum_{n=0}^{T} a_n$$

Proof: Let a be the uniform bound on the $a_n$, so that $|a_n| \le a$. The stated convergence properties of $f(\alpha)$ follow immediately from theorems in Rudin ([56], p. 52, p. 62, p. 134). Moreover both sides of the above inequality are finite.

Let $S_T = \sum_{n=0}^{T} a_n$, and observe that $a_T = S_T - S_{T-1}$ for $T = 1, 2, \ldots$. Then

$$f(\alpha) = a_0 + \sum_{n=1}^{\infty} (S_n - S_{n-1}) \alpha^n = (1-\alpha) \sum_{n=0}^{\infty} S_n \alpha^n$$

from the absolute convergence of $f(\alpha)$. So

$$(1-\alpha) f(\alpha) = (1-\alpha)^2 \sum_{n=0}^{\infty} S_n \alpha^n$$

Consider a fixed value of $T \ge 1$. One can write

$$(1-\alpha) f(\alpha) = (1-\alpha)^2 \sum_{n=0}^{T-1} S_n \alpha^n + (1-\alpha)^2 \sum_{n=T}^{\infty} S_n \alpha^n = (1-\alpha)^2 \sum_{n=0}^{T-1} S_n \alpha^n + (1-\alpha)^2 \sum_{n=T}^{\infty} n\, \frac{S_n}{n}\, \alpha^n$$

234

The term $S_n/n$ can be positive or negative for any $n \ge T$. In any case however

$$\frac{S_n}{n} \ \ge\ \operatorname*{g.l.b.}_{n \ge T} \frac{S_n}{n}$$

where g.l.b. denotes the greatest lower bound. The existence of this bound follows from the boundedness of the $a_n$. Therefore

$$(1-\alpha) f(\alpha) \ \ge\ (1-\alpha)^2 \sum_{n=0}^{T-1} S_n \alpha^n + (1-\alpha)^2 \operatorname*{g.l.b.}_{n \ge T} \frac{S_n}{n} \sum_{n=T}^{\infty} n \alpha^n$$

and evaluating the last summation

$$(1-\alpha) f(\alpha) \ \ge\ (1-\alpha)^2 \sum_{n=0}^{T-1} S_n \alpha^n + \operatorname*{g.l.b.}_{n \ge T} \frac{S_n}{n} \left[ T \alpha^T (1-\alpha) + \alpha^{T+1} \right] \qquad (A.1)$$

Now for $\alpha \to 1^-$ we have

$$\liminf_{\alpha \to 1^-} (1-\alpha) f(\alpha) \ \ge\ \operatorname*{g.l.b.}_{n \ge T} \frac{S_n}{n}$$

for any T. Upon letting $T \to \infty$ we have

$$\liminf_{\alpha \to 1^-} (1-\alpha) f(\alpha) \ \ge\ \liminf_{T \to \infty} \frac{1}{T} \sum_{n=0}^{T} a_n$$

Lemma 1: Let $(a_n)$, $(b_n)$ be two sequences, $n = 0, 1, 2, 3, \ldots$. Suppose that

a. $\sum_{n=0}^{\infty} |a_n|$ is convergent,
b. $\lim_{n \to \infty} b_n = b$.

235

Then

$$\lim_{T \to \infty} \sum_{n=0}^{T} a_n b_{T-n} = b \sum_{n=0}^{\infty} a_n$$

Proof: See Smith [91], p. 14.

Lemma 2: Let $(a_n)$, $(b_n)$ be two sequences such that

a. $\sum_{n=0}^{\infty} |a_n|$ is convergent, with $\sum_{n=0}^{\infty} a_n = a$,
b. $\lim_{n \to \infty} \frac{b_n}{n} = b$.

Then

$$\lim_{T \to \infty} \frac{1}{T} \sum_{n=0}^{T} a_n b_{T-n} = ab$$

Proof: One can write

$$\frac{1}{T} \sum_{n=0}^{T} a_n b_{T-n} - ab \ =\ \frac{1}{T} \sum_{n=0}^{T} a_n \left[ b_{T-n} - (T-n) b \right] + b \left[ \sum_{n=0}^{T} \frac{T-n}{T}\, a_n - a \right]$$

For a given $\epsilon > 0$, let N be such that $|b_n - nb| \le \frac{\epsilon}{A}\, n$ for $n \ge N$, where $A = \sum_{n=0}^{\infty} |a_n|$, and let M denote the least upper bound on $|b_n - nb|$ for n < N. Then we have

236

$$\left| \frac{1}{T} \sum_{n=0}^{T} a_n b_{T-n} - ab \right| \ \le\ \frac{\epsilon}{A} \sum_{n=0}^{T-N} |a_n|\, \frac{T-n}{T} \ +\ \frac{M}{T} \sum_{n=T-N+1}^{T} |a_n| \ +\ |b| \left| \sum_{n=0}^{T} \frac{T-n}{T}\, a_n - a \right|$$

The first term does not exceed $\epsilon$, and the last two terms tend to zero as $T \to \infty$. Now T can be chosen so large that

$$\left| \frac{1}{T} \sum_{n=0}^{T} a_n b_{T-n} - ab \right| \le 2\epsilon$$

and the lemma is proved.
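Abelian Theorem II is easy to illustrate numerically: for a sequence whose partial averages converge, (1−α)f(α) approaches the same limit as α → 1⁻. A small check with a 0-1 sequence of Cesàro mean 1/2 (the sequence is our own example):

```python
def abel_mean(alpha, N=200000):
    # a_n alternates in blocks of ten between 1 and 0; its Cesaro mean is 1/2
    a = lambda n: 1.0 if (n // 10) % 2 == 0 else 0.0
    f = sum(a(n) * alpha**n for n in range(N))
    return (1 - alpha) * f

for alpha in (0.9, 0.99, 0.999):
    print(alpha, round(abel_mean(alpha), 4))   # tends to 0.5 as alpha -> 1-
```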

APPENDIX B

ASYMPTOTIC PROPERTIES OF A MARKOV-RENEWAL PROCESS

In a Markov-Renewal process the principal interest centers on the random variable $U_j(T)$ and its expectation $M_{ij}(T)$. These have been defined as

U_j(T) = the number of times the system enters state j in the interval [1,T], where T = 1, 2, 3, ...

M_ij(T) = the expectation of U_j(T) given the process begins in state i at time 0,

or

$$M_{ij}(T) = E\left[ U_j(T) \mid Z_0 = i,\ W_0 = 0 \right]$$

Two important theorems concerning the asymptotic behavior of $M_{ij}(T)$ as $T \to \infty$ are given here. These are established from the theory of simple renewal processes [90,91,92]. A Markov-Renewal process contains an imbedded simple renewal process for every recurrent state. Additional references are [93,94,100].

Elementary Renewal Theorem: Let j be a recurrent state of a Markov-Renewal process, so that $f_{jj} = 1$. Let $\mu_{jj}$ denote the mean recurrence time of state j. For any state i

$$\lim_{T \to \infty} \frac{M_{ij}(T)}{T} = \frac{f_{ij}}{\mu_{jj}}$$

If j is non-recurrent, i.e., $f_{jj} < 1$, the limit above is zero.

237

238

Proof: The proof of this theorem follows from Eq. (4.2),

$$M_{ij}(T) = G_{ij}(T) + \sum_{n=1}^{T-1} G_{ij}(T-n)\, \psi_{jj}(n) \qquad (4.2)$$

and some results of Feller [100] on simple renewal processes. Feller has shown that for a state j for which $f_{jj} = 1$,

$$\lim_{n \to \infty} \psi_{jj}(n) = \frac{1}{\mu_{jj}} \qquad (B.1)$$

provided that $G_{jj}(T)$ is not a lattice distribution, i.e., is not periodic. If $G_{jj}(T)$ is periodic then there exists an integer $\lambda > 1$ such that $g_{jj}(n) > 0$ only for values of n which are integer multiples of $\lambda$. If $G_{jj}(T)$ is periodic

$$\lim_{n \to \infty} \psi_{jj}(n\lambda) = \frac{\lambda}{\mu_{jj}} \qquad (B.2)$$

If j is not recurrent, Feller establishes that

$$\sum_{n=1}^{\infty} \psi_{jj}(n) < \infty \qquad (B.3)$$

From Eq. (4.2) and the fact that $G_{ij}(T)$ is bounded,

$$\lim_{T \to \infty} \frac{M_{ij}(T)}{T} = \lim_{T \to \infty} \frac{1}{T} \sum_{n=1}^{T-1} G_{ij}(T-n)\, \psi_{jj}(n)$$

It follows then from Eqs. (B.1), (B.2) and Lemma 2 of Appendix A that if j is recurrent

239

$$\lim_{T \to \infty} \frac{M_{ij}(T)}{T} = \frac{f_{ij}}{\mu_{jj}}$$

On the other hand, if j is not recurrent it follows from Eq. (B.3) that the limit is zero.

A recurrent state of a Markov-Renewal process is called aperiodic if the greatest common divisor of the values n for which $\psi_{jj}(n) > 0$ is unity. The following theorem is also established from Feller's result, in the manner suggested in the discussion of Smith's paper [90]. An alternative proof is given in [91].

Key Renewal Theorem: Let j be an aperiodic recurrent state of a Markov-Renewal process. Let $a_n$ be a non-negative sequence with

$$\sum_{n=1}^{\infty} a_n = a < \infty$$

Then

$$\lim_{T \to \infty} \sum_{n=1}^{T-1} a_{T-n}\, \psi_{jj}(n) = \frac{a}{\mu_{jj}}$$

Proof: This follows immediately from (B.1) and Lemma 1 of Appendix A.
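The renewal quantities ψ_jj(n) can be computed for a given discrete recurrence-time distribution by the usual convolution recursion, and the limit (B.1) then shows up numerically. A sketch, in which the recurrence-time probabilities g are a made-up example:

```python
# psi(n) = g(n) + sum_{m=1}^{n-1} g(m) * psi(n-m): the probability of a
# renewal at step n for recurrence-time probabilities g(n).
g = {1: 0.2, 2: 0.5, 3: 0.3}                 # aperiodic; mean mu = 2.1
mu = sum(n * p for n, p in g.items())

psi = {}
for n in range(1, 60):
    psi[n] = g.get(n, 0.0) + sum(g[m] * psi[n - m] for m in g if m < n)

print(psi[59], 1 / mu)   # psi(n) approaches 1/mu = 0.47619..., as in (B.1)
```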

APPENDIX C

FORMULAE FOR THE COMPUTER MODEL

The data which are required for the decision process are the values of $p_{ij}^k$, $v_i^k$, and $b_i^k$ for every i, j, and k. An examination of Table 5.1, which lists all admissible transitions, shows that there are 475 transition probabilities to be specified. In addition, there are 141 values each for the mean transition time $v_i^k$ and the one-transition return $b_i^k$. It will be tedious, both for the reader and the author, to explain the derivation of the formula which yields each of these values as a function of the system parameters of Table 5.2. We shall rely on the illustrations given in Chapter V to convey the method. Here we will merely list the results of the derivations.

There are a number of terms common to many of the formulae. In order to condense the presentation we list these terms in Table C.1 as individual components G(m) of a vector G.

It is convenient to also treat the values $p_{ij}^k$ and $v_i^k$ as elements of vectors P and T respectively. The vector P will be constructed by successively inserting the values of $p_{ij}^k$ for the admissible values of j and k, in ascending order by i from i = 1 to i = 75. The first value of P is $p_{11}^1$. Behind $p_{11}^1$, we place the values $p_{1j}^1$ in ascending order by j. Because there are no other actions than k = 1 for state i = 1, the next element of P is $p_{21}^1$. Behind this are placed $p_{22}^1$, $p_{23}^1$, ..., $p_{2,44}^1$. Con-

240

TABLE C.1

ELEMENTS OF THE VECTOR G

    m     G(m)

    1     e^{-\lambda s}
    2     e^{-2\lambda s}
    3     e^{-3\lambda s}
    4     e^{-\lambda S}
    5     e^{-2\lambda S}
    6     e^{-3\lambda S}
    7     1 - \theta
    8     1 - \theta + \theta e^{-\lambda S}
    9     1 - \theta + \theta e^{-2\lambda S}
    10    1 - \theta + \theta e^{-3\lambda S}
    11    \sum_{n=1}^{e_2} e^{-\lambda n} f_n
    12    \sum_{n=1}^{e_2} e^{-2\lambda n} f_n
    13    \sum_{n=1}^{e_2} e^{-3\lambda n} f_n
    14    \sum_{n=1}^{e_2} f_n = F(e_2)
    15    \sum_{n=e_2+1}^{\infty} f_n = 1 - F(e_2)
    16    e^{-\lambda e_2}
    17    e^{-2\lambda e_2}
    18    e^{-3\lambda e_2}
    19    e^{-\lambda(e_2 + s)}
    20    e^{-2\lambda(e_2 + s)}
    21    e^{-3\lambda(e_2 + s)}
    22    \sum_{n=1}^{e_3-e_2} e^{-\lambda n} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    23    \sum_{n=1}^{e_3-e_2} e^{-2\lambda n} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    24    \sum_{n=1}^{e_3-e_2} e^{-3\lambda n} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    25    \sum_{n=1}^{e_3-e_2} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    26    \sum_{n=e_3-e_2+1}^{\infty} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    27    e^{-\lambda(e_3 - e_2)}
    28    e^{-2\lambda(e_3 - e_2)}
    29    e^{-3\lambda(e_3 - e_2)}
    30    e^{-\lambda(e_3 - e_2 + s)}
    31    e^{-2\lambda(e_3 - e_2 + s)}
    32    e^{-3\lambda(e_3 - e_2 + s)}
    33    \sum_{n=1}^{q} e^{-\lambda n} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    34    \sum_{n=1}^{q} e^{-2\lambda n} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    35    \sum_{n=1}^{q} e^{-3\lambda n} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    36    \sum_{n=1}^{q} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    37    \sum_{n=q+1}^{\infty} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    38    e^{-\lambda q}
    39    e^{-2\lambda q}
    40    e^{-3\lambda q}
    41    e^{-\lambda(s + q)}
    42    e^{-2\lambda(s + q)}
    43    e^{-3\lambda(s + q)}
    44    1/(1 - e^{-4\lambda})
    45    s + \theta S + \sum_{n=1}^{e_2} n f_n + e_2 \sum_{n=e_2+1}^{\infty} f_n
    46    s + \sum_{n=1}^{e_3-e_2} n f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2} + (e_3 - e_2) \sum_{n=e_3-e_2+1}^{\infty} f_{n+e_2} / \sum_{n=1}^{\infty} f_{n+e_2}
    47    s + \sum_{n=1}^{q} n f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3} + q \sum_{n=q+1}^{\infty} f_{n+e_3} / \sum_{n=1}^{\infty} f_{n+e_3}
    48    G(46) - s
    49    G(47) - s
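The F(T)-dependent entries of Table C.1 are finite or truncated sums and are straightforward to evaluate numerically; in SEMSIN (Appendix D) this is done by the routine XTIME. The sketch below is illustrative only: the parameter values and the geometric form assumed for f_n are hypothetical.

    import math

    lam, s, q, e2 = 0.05, 2, 4, 6          # lambda, swap time, quantum, level bound
    r = 0.8
    f = lambda n: (1 - r) * r ** (n - 1)   # f_n: assumed execution-time pmf

    G1  = math.exp(-lam * s)                                        # G(1)
    G11 = sum(math.exp(-lam * n) * f(n) for n in range(1, e2 + 1))  # G(11)
    G14 = sum(f(n) for n in range(1, e2 + 1))                       # G(14) = F(e2)
    G15 = 1 - G14                                                   # G(15)
    G38 = math.exp(-lam * q)                                        # G(38)
    print(G1, G11, G14, G15, G38)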

It is convenient also to treat the values p^k_{ij} and \nu^k_i as elements of vectors P and T, respectively. The vector P is constructed by successively inserting the values p^k_{ij} for the admissible values of j and k, in ascending order by i from i = 1 to i = 75. The first element of P is p^1_{1,1}; behind it we place the remaining values p^1_{1,j} in ascending order by j. Because there is no action other than k = 1 for state i = 1, the next element of P is p^1_{2,1}. Behind this are placed p^1_{2,2}, p^1_{2,5}, ..., p^1_{2,44}. Continuing in this manner, we eventually insert p^1_{8,48} behind p^1_{8,26}. Now state i = 8 also has k = 2 as a permitted action, so p^2_{8,2} goes behind p^1_{8,48}, and this is followed by p^2_{8,7}. For a given i, all the elements p^k_{ij} are inserted in P before proceeding to place elements for the state i + 1. Figure C.1 illustrates the structure of P.

The vector T is structured in essentially the same manner as P; here, however, there is only one element for given i and k. The first element of T is \nu^1_1. Succeeding elements of the vector are obtained by going to the next higher value of i and inserting \nu^k_i for each of the admissible values of k. See Fig. C.2.

The formulae for the computer model will be presented in terms of the elements of P, T, and G. To make it convenient to determine the appropriate entry for given values of i, j, and k, Table C.2 lists the first of the entries used for each state i. Thus, if we are interested in the formulae for state i = 20, we see from Table C.2 that the first P entry pertaining to state 20 is P(164). Referring to Table 5.1 and recalling the structure of P, we find that P(164) is p^2_{20,8}. Counting the entries in the row i = 20 of Table 5.1, we see that p^2_{20,45} is given by P(171).

Tables C.3 and C.4 give the formulae for all entries of the P and T vectors. The formulation of the decision process for the computer model is completed by the formulae for the one-transition returns b^k_i. A nearly complete discussion of these formulae has been given in Chapter V.

Vector P

    P(1)  = p^1_{1,1}
    P(2)  = p^1_{1,2}
      .
      .
    P(5)  = p^1_{1,41}
    P(6)  = p^1_{2,1}
      .
      .
    P(13) = p^1_{2,44}
      .
      .
    P(57) = p^1_{8,48}
    P(58) = p^2_{8,2}
      .
      .
    P(63) = p^2_{8,45}
    P(64) = p^1_{9,41}

Fig. C.1. Structure of the vector P of transition probabilities p^k_{ij}.

Vector T

    T(1) = \nu^1_1
    T(2) = \nu^1_2
    T(3) = \nu^1_3
    T(4) = \nu^1_4
      .
      .

Fig. C.2. Structure of the vector T of mean transition times \nu^k_i.
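The packing rule just described is easy to mechanize. The following sketch (Python, not the report's MAD code) builds P in the prescribed order and records each state's first entry, which is exactly the content of Table C.2 below. The admissible-transition lists are hypothetical stand-ins for Table 5.1: only the first and last members of each list come from the text, and states 3 through 7 are omitted, so first_entry[8] differs here from the true value 52 of Table C.2.

    admissible = {
        1: {1: [1, 2, 7, 23, 41]},               # state 1: one action, 5 entries
        2: {1: [1, 2, 5, 9, 17, 25, 33, 44]},    # state 2: one action, 8 entries
        8: {1: [8, 12, 20, 26, 44, 48],          # state 8 permits two actions
            2: [2, 7, 14, 21, 32, 45]},
    }
    P, first_entry = [], {}
    for i in sorted(admissible):                 # ascending order by state i
        first_entry[i] = len(P) + 1              # 1-based index, as in Table C.2
        for k in sorted(admissible[i]):          # all actions of i before i + 1
            for j in admissible[i][k]:
                P.append((i, k, j))              # the value of p_ij^k goes here
    print(first_entry)                           # {1: 1, 2: 6, 8: 14}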

TABLE C.2

CORRESPONDENCE OF STATE INDICES AND VECTOR ELEMENTS

    State     First P     First T     State     First P     First T
    index i   element     element     index i   element     element
              for i       for i                 for i       for i

     1          1           1          39        316          65
     2          6           2          40        324          67
     3         14           3          41        328          68
     4         22           4          42        330          69
     5         30           5          43        334          71
     6         38           6          44        338          73
     7         46           7          45        342          75
     8         52           8          46        346          77
     9         64          10          47        350          79
    10         76          12          48        356          82
    11         88          14          49        360          84
    12        100          16          50        366          87
    13        106          17          51        370          89
    14        118          19          52        376          92
    15        124          20          53        380          94
    16        136          22          54        384          96
    17        142          23          55        390          99
    18        154          25          56        394         101
    19        160          26          57        400         104
    20        164          27          58        406         107
    21        172          29          59        412         110
    22        180          31          60        418         113
    23        188          33          61        422         115
    24        196          35          62        428         118
    25        204          37          63        432         120
    26        216          40          64        434         121
    27        224          42          65        438         123
    28        236          45          66        440         124
    29        244          47          67        444         126
    30        256          50          68        448         128
    31        264          52          69        452         130
    32        268          53          70        456         132
    33        276          55          71        460         134
    34        280          56          72        464         136
    35        288          58          73        468         138
    36        296          60          74        470         139
    37        304          62          75        474         141
    38        312          64
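Given the first-entry table, locating a particular p^k_{ij} amounts to counting across the state's row of admissible transitions, as in the i = 20 example above. In this sketch the row contents are hypothetical except for the first and last pairs, which are the ones quoted in the text:

    first_P = {20: 164}                              # from Table C.2
    row_20 = [(2, 8), (2, 9), (2, 14), (2, 21),      # hypothetical middles;
              (2, 26), (2, 33), (2, 40), (2, 45)]    # (2, 8), (2, 45) from text

    def p_index(i, k, j, first, row):
        # offset of (k, j) within row i of the admissibility table
        return first[i] + row.index((k, j))

    print(p_index(20, 2, 45, first_P, row_20))       # 171, as in the text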

TABLE C.3

ELEMENTS OF THE VECTOR P OF TRANSITION PROBABILITIES

    m     P(m)

    1     0
    2     4e^{-3\lambda}(1 - e^{-\lambda})/(1 - e^{-4\lambda})
    3     6e^{-2\lambda}(1 - e^{-\lambda})^2/(1 - e^{-4\lambda})
    4     4e^{-\lambda}(1 - e^{-\lambda})^3/(1 - e^{-4\lambda})
    5     (1 - e^{-\lambda})^4/(1 - e^{-4\lambda})
    6     G(3) G(10) G(13)
    7     3(G(2) G(9) G(12) - P(6))
    8     G(10) G(15) G(21)
    9     3G(1) G(8) G(11) - 2P(7) - 3P(6)
    10    3(G(9) G(20) - G(10) G(21)) G(15)
    11    G(14) - P(6) - P(7) - P(9)
    12    3G(8) G(15) G(19) - 2P(10) - 3P(8)
    13    G(15) - P(8) - P(10) - P(12)
    14    G(3) G(24)
    15    3(G(2) G(23) - P(14))
    16    G(26) G(32)
    17    3G(1) G(22) - 3P(14) - 2P(15)
    18    3(G(31) - G(32)) G(26)
    19    G(25) - P(14) - P(15) - P(17)
    20    3G(26) G(30) - 3P(16) - 2P(18)
    21    G(26) - P(16) - P(18) - P(20)
    22    G(3) G(35)
    23    3(G(2) G(34) - P(22))
    24    G(37) G(43)
    25    3G(1) G(33) - 3P(22) - 2P(23)
    26    3(G(42) - G(43)) G(37)
    27    G(36) - P(22) - P(23) - P(25)
    28    3G(37) G(41) - 3P(24) - 2P(26)
    29    G(37) - P(24) - P(26) - P(28)
    30    G(24)
    31    3(G(23) - G(24))
    32    G(26) G(29)
    33    3G(22) - 3P(30) - 2P(31)
    34    3(G(28) - G(29)) G(26)
    35    G(25) - P(30) - P(31) - P(33)
    36    3G(26) G(27) - 3P(32) - 2P(34)
    37    G(26) - P(32) - P(34) - P(36)
    38    G(35)
    39    3(G(34) - G(35))
    40    G(37) G(40)
    41    3G(33) - 3P(38) - 2P(39)
    42    3(G(39) - G(40)) G(37)
    43    G(36) - P(38) - P(39) - P(41)
    44    3G(37) G(38) - 3P(40) - 2P(42)
    45    G(37) - P(40) - P(42) - P(44)
    46    G(2) G(9) G(12)
    47    2(G(1) G(8) G(11) - P(46))
    48    G(9) G(15) G(20)
    49    G(14) - P(46) - P(47)
    50    2(G(8) G(19) - G(9) G(20)) G(15)
    51    G(15) - P(48) - P(50)
    58    G(2) G(23)
    59    2(G(1) G(22) - P(58))
    60    G(26) G(31)
    61    G(25) - P(58) - P(59)
    62    2(G(26) G(30) - P(60))
    63    G(26) - P(60) - P(62)
    70    G(2) G(34)
    71    2(G(1) G(33) - P(70))
    72    G(37) G(42)
    73    G(36) - P(70) - P(71)
    74    2(G(41) - G(42)) G(37)
    75    G(37) - P(72) - P(74)
    82    G(23)
    83    2(G(22) - G(23))
    84    G(26) G(28)
    85    G(25) - P(82) - P(83)
    86    2(G(26) G(27) - P(84))
    87    G(26) - P(84) - P(86)
    94    G(34)
    95    2(G(33) - G(34))
    96    G(37) G(39)
    97    G(36) - P(94) - P(95)
    98    2(G(38) - G(39)) G(37)
    99    G(37) - P(96) - P(98)
    160   G(1) G(8) G(11)
    161   G(14) - P(160)
    162   G(8) G(15) G(19)
    163   G(15) - P(162)
    168   G(1) G(22)
    169   G(25) - P(168)
    170   G(26) G(30)
    171   G(26) - P(170)
    176   G(1) G(33)
    177   G(36) - P(176)
    178   G(37) G(41)
    179   G(37) - P(178)
    184   G(22)
    185   G(25) - P(184)
    186   G(26) G(27)
    187   G(26) - P(186)
    192   G(33)
    193   G(36) - P(192)
    194   G(37) G(38)
    195   G(37) - P(194)
    328   G(14)
    329   1 - P(328)
    332   G(25)
    333   1 - P(332)
    336   G(36)
    337   1 - P(336)

    52-57, 64-69, 76-81, 88-93                          P(46) to P(51), resp.
    100-105, 106-111, 118-123, 124-129                  P(58) to P(63), resp.
    112-117, 136-141, 148-153, 154-159                  P(70) to P(75), resp.
    142-147                                             P(82) to P(87), resp.
    130-135                                             P(94) to P(99), resp.
    164-167, 172-175, 180-183, 188-191, 196-199,
    204-207, 216-219, 224-227, 236-239, 244-247,
    256-259                                             P(160) to P(163), resp.
    200-203, 208-211, 220-223, 228-231, 264-267,
    268-271, 276-279, 280-283, 288-291, 296-299,
    304-307                                             P(168) to P(171), resp.
    212-215, 240-243, 252-255, 260-263, 272-275,
    292-295, 300-303, 308-311, 312-315, 320-323,
    324-327                                             P(176) to P(179), resp.
    248-251, 316-319                                    P(184) to P(187), resp.
    232-235, 284-287                                    P(192) to P(195), resp.
    330-331, 334-335, 338-339, 342-343, 346-347,
    350-351, 356-357, 360-361, 366-367, 370-371,
    376-377, 380-381, 384-385, 390-391, 394-395,
    400-401, 406-407, 412-413, 418-419, 422-423,
    428-429                                             P(328), P(329), resp.
    340-341, 348-349, 352-353, 358-359, 362-363,
    372-373, 382-383, 386-387, 392-393, 396-397,
    402-403, 408-409, 414-415, 424-425, 432-433,
    434-435, 438-439, 440-441, 444-445, 448-449,
    452-453, 456-457, 460-461, 464-465, 470-471         P(332), P(333), resp.
    344-345, 354-355, 364-365, 368-369, 374-375,
    378-379, 388-389, 398-399, 404-405, 410-411,
    416-417, 420-421, 426-427, 430-431, 436-437,
    442-443, 446-447, 450-451, 454-455, 458-459,
    462-463, 466-467, 468-469, 472-473, 474-475         P(336), P(337), resp.

TABLE C.4

ELEMENTS OF THE VECTOR T OF MEAN TRANSITION TIMES

    m                                                            T(m)

    1                                                            G(44)
    2, 7, 8, 10, 12, 14, 26, 27, 29, 31, 33, 35, 37, 40, 42,     G(45)
    45, 47, 50, 68, 69, 71, 73, 75, 77, 79, 82, 84, 87, 89,
    92, 94, 96, 99, 101, 104, 107, 110, 113, 115, 118
    3, 9, 16, 17, 19, 20, 28, 36, 38, 41, 43, 52, 53, 55, 56,    G(46)
    58, 60, 62, 70, 78, 80, 83, 85, 95, 97, 100, 102, 105,
    108, 111, 120, 121, 123, 124, 126, 128, 130, 132, 134, 136
    4, 11, 18, 22, 24, 25, 30, 39, 46, 49, 51, 54, 59, 61, 63,   G(47)
    64, 66, 67, 72, 81, 88, 91, 93, 98, 106, 109, 112, 114,
    117, 119, 122, 127, 129, 131, 133, 135, 137, 138, 140, 141
    5, 13, 23, 32, 48, 65, 74, 90, 116, 139                      G(48)
    6, 15, 21, 34, 44, 57, 76, 86, 103, 125                      G(49)
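In a program, the grouped rows of Table C.4 expand into the full 141-element vector by simple iteration, as in this sketch (the membership lists are abbreviated here, and the G values would come from Table C.1):

    # Expand grouped assignments m -> G(index) into the full T vector.
    groups = {44: [1],
              45: [2, 7, 8, 10, 12],                       # abbreviated
              46: [3, 9, 16, 17, 19],                      # abbreviated
              48: [5, 13, 23, 32, 48, 65, 74, 90, 116, 139],
              49: [6, 15, 21, 34, 44, 57, 76, 86, 103, 125]}
    G = {m: 0.0 for m in range(1, 50)}                     # placeholder values
    T = {}
    for g_index, members in groups.items():
        for m in members:
            T[m] = G[g_index]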

As indicated there, it is convenient to decompose b^k_i into a sum of several terms. Except for \omega^k_i, the values of these terms for any i and k are given in Chapter V, Section 5.3.1. The term \omega^k_i is the part of the one-transition return which arises from new arrivals, and accounts for the time-in-system between the arrival time and the end of the transition interval. A general expression for \omega^k_i is

    \omega^k_i = -w_1 (4 - X_i) \tau^k_i

where X_i is the total number of jobs in system at the beginning of the transition, and \tau^k_i is the expected time-in-system for a new arrival during the transition interval. In Table C.5 we have listed the values of \tau^k_i as elements of a vector A. The vector A has the same structure as the vector T; hence the correspondence of the state index and the vector elements indicated in Table C.2 for T applies as well to A.

TABLE C.5

ELEMENTS OF THE VECTOR A, MEAN TIME-IN-SYSTEM OF A NEW ARRIVAL DURING A TRANSITION

    m                                          A(m)

    1                                          4(\lambda + e^{-\lambda} - 1)/(1 - e^{-4\lambda})
    2, 7, 8, 10, 12, 14, 26, 27, 29, 31,       {1 - G(8)(G(11) + G(15) G(16)) G(1)}/\lambda
    33, 35, 37, 40, 42, 45, 47, 50
    3, 9, 16, 17, 19, 20, 28, 36, 38, 41,      {1 - (G(22) + G(26) G(27)) G(1)}/\lambda
    43, 52, 53, 55, 56, 58, 60, 62
    5, 13, 23, 32, 48, 65                      {1 - (G(22) + G(26) G(27))}/\lambda
    4, 11, 18, 22, 24, 25, 30, 39, 46, 49,     {1 - (G(33) + G(37) G(38)) G(1)}/\lambda
    51, 54, 59, 61, 63, 64, 66, 67
    6, 15, 21, 34, 44, 57                      {1 - (G(33) + G(37) G(38))}/\lambda
    All other values of m from 1 to 141        0
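As a numerical illustration of the expression for \omega^k_i above, the following sketch uses hypothetical values throughout: the weight w_1, the job counts X_i, and the A entries are stand-ins, and for the early states the vector index m and the state index i coincide (one action per state).

    w1 = 2.0                                   # assumed weight on time-in-system
    X = {1: 0, 2: 1, 3: 1}                     # jobs in system at transition start
    A = {1: 0.130, 2: 0.095, 3: 0.088}         # hypothetical tau values (Table C.5)
    omega = {m: -w1 * (4 - X[m]) * A[m] for m in A}
    print(omega)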

APPENDIX D

THE SEMI-MARKOV SYSTEM INVESTIGATOR (SEMSIN)

D.1 GENERAL DESCRIPTION

This appendix gives a general description of the computer program which has been developed in this research. The program has two major functions:

a. To compute the input data for the Howard algorithm from the formulae of the computer model (see Appendix C and Section 5.2.3, Chapter V).

b. To carry out the computation of the Howard algorithm (see Section 4.4, Chapter IV).

Because the program is applicable to both discrete-time and continuous-time Semi-Markov decision processes, and because it has utility both for optimization and analysis, we feel it desirable to give it a name suggesting this general applicability. Hence we call the program a Semi-Markov System Investigator, and use the acronym SEMSIN for briefer reference. The program has been written in the MAD language [31] for the IBM 7090 computer system at The University of Michigan.

The various functions to be performed are organized into separate routines, with a MAIN routine being central. Figure D.1 shows the structure of this organization.

[Fig. D.1 is a block diagram: the MAIN routine is central, and it calls the model-evaluation routines MODEL, XTIME, BSET, and NUSET and the algorithm routines VALUE, BETTER, and RESULT.]

Fig. D.1. Organization of SEMSIN functions.
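In outline, the calling structure of Fig. D.1 is as follows. This is a Python stub layout for orientation only; the actual program is written in MAD, and the routine bodies are described in Section D.2.

    def MODEL():  pass           # transition probabilities; initial policy
    def XTIME():  pass           # F(T)-dependent elements of G; partial returns
    def NUSET():  pass           # mean transition times
    def BSET():   pass           # one-transition returns
    def VALUE():  pass           # Value Determination Operation
    def BETTER(): return False   # Policy Improvement; False = policy unchanged
    def RESULT(): pass           # printed tabular form of the policy

    def MAIN(optimize=True):
        MODEL(); XTIME(); NUSET(); BSET()
        VALUE()                       # a single pass suffices for analysis
        while optimize and BETTER():  # iterate until the policy stops changing
            VALUE()
        RESULT()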

We will describe the specific function of each routine shortly; for the moment we merely mention that MODEL, XTIME, BSET, and NUSET perform the major function a. above, while MAIN, VALUE, BETTER, and RESULT carry out the Howard algorithm. In particular, VALUE carries out the Value Determination Operation, and BETTER carries out the Policy Improvement Operation.

Aside from the computation which must obviously be included in the program, we have found the following features to be of considerable value:

1. An accurate accounting, for each solution, of the total computation time expended and of that expended in VALUE, BETTER, and MODEL.

2. The capability to specify either a single pass through VALUE, i.e., analysis, or the iterative computation required for optimization.

3. The capability to read (as input) an initial policy to start the computation, to use a standard (first-come, first-served) initial policy, or to continue using a previously specified policy.

4. A print-out of the gain and the control policy determined on each iteration of the algorithm.

5. Termination of the algorithm when the policy has been unchanged on two successive iterations, or when the gain has not increased. (See Sections 4.5.1 and 4.5.2, Chapter IV.)

6. Functional separation of the Howard algorithm and the evaluation of the formulae for the model. The former is implemented exclusively in MAIN, BETTER, VALUE, and RESULT.

All other subroutines are concerned only with the particular model.

7. Separation of the execution time distribution F(T) from the most substantial parts of formula evaluation, the routines MODEL, BSET, and NUSET. F(T) enters directly only in XTIME, thus facilitating modifications of F(T).

8. The capability to print the values of the test quantities for each state and each action, if desired.

9. A print-out on each iteration of the maximum difference, over all states, between the test quantity for the current policy and that of any other policy.

After describing each routine briefly, we will discuss the performance of the program in solving the optimization problem for the computer model.

D.2 SUBROUTINES

D.2.1 MODEL: The subroutine MODEL computes the transition probabilities p^k_{ij}. It also checks their accuracy by testing, for each i and k, whether

    \sum_{j=1}^{N} p^k_{ij} = 1

and

    0 \le p^k_{ij} \le 1    for all j.
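In a modern language the same consistency test might be written as follows (a sketch; the tolerance and the data layout are arbitrary choices):

    import numpy as np

    def check_rows(p, tol=1e-9):
        # every admissible (i, k) row must be a probability distribution
        for (i, k), row in p.items():
            row = np.asarray(row, dtype=float)
            assert np.all((row >= -tol) & (row <= 1 + tol)), (i, k)
            assert abs(row.sum() - 1.0) <= tol * len(row), (i, k)

    check_rows({(1, 1): [0.0, 0.2, 0.5, 0.3]})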

The calculation involved is the evaluation of the formulae given in Table C.3, Appendix C. The numerical values determined are stored in a vector as discussed in Appendix C. Before evaluating the formulae, MODEL must first evaluate the subsidiary quantities contained in the array G (see Appendix C); part of this evaluation is done by XTIME. MODEL also sets up the initial policy for the computation.

D.2.2 XTIME: This subroutine computes the elements of the vector G which depend upon the execution time distribution F(T). It also computes the partial return described in Appendix C and Section 5.3.1 of Chapter V; this partial return depends only upon F(T).

D.2.3 NUSET: This routine computes the values of \nu^k_i for every i and k, and stores them in a vector T as described in Appendix C.

D.2.4 BSET: This routine computes the values of b^k_i and stores them in a vector B having the same structure as T.

D.2.5 MAIN: The main part of SEMSIN, called MAIN, sequences the entire computation and tests for its completion. Figure D.2 is a simplified block diagram of the sequence.

D.2.6 VALUE: The routine VALUE carries out the Value Determination Operation of the Howard algorithm, solving the system of equations

    x_i + x_N \nu_i = b_i + \sum_{j=1}^{N} p_{ij} x_j ,        i = 1, 2, ..., N-1

    x_N + x_N \nu_N = b_N + \sum_{j=1}^{N} p_{Nj} x_j

where x_i = v_i + g (within the limits of computational error), and p_{ij}, \nu_i, and b_i are taken under the action specified in each state by the current policy. By assumption, v_N = 0, so that x_N = g.

[Fig. D.2 is a flow diagram: Start; Read System Parameters; Model; Value; if not optimizing, Result and End; otherwise Better; if a new policy results, return to Value, else Result and End.]

Fig. D.2. Simplified diagram of MAIN sequence.
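The computation sequenced by Fig. D.2 can be summarized in a short sketch. The code below is Python rather than the MAD source, and the Policy Improvement test quantity shown (maximize b^k_i + \sum_j p^k_{ij} v_j - g \nu^k_i over k) is one standard form of the criterion; the model data p, nu, b are supplied as nested lists indexed by state and action.

    import numpy as np

    def howard(p, nu, b):
        n = len(p)
        policy = [0] * n                        # arbitrary initial policy
        while True:
            # Value Determination: v_i + g*nu_i = b_i + sum_j p_ij v_j, v_N = 0.
            # Unknowns (v_1, ..., v_{N-1}, g); the last column carries g.
            A = np.zeros((n, n))
            rhs = np.zeros(n)
            for i, k in enumerate(policy):
                A[i, :] = -np.asarray(p[i][k], dtype=float)
                A[i, i] += 1.0
                A[i, -1] = nu[i][k]             # overwrites -p_iN; v_N = 0 anyway
                rhs[i] = b[i][k]
            sol = np.linalg.solve(A, rhs)
            v = np.append(sol[:-1], 0.0)        # relative values, with v_N = 0
            g = sol[-1]                         # gain
            # Policy Improvement over all actions of every state
            # (ties are broken here by lowest k)
            new_policy = [max(range(len(p[i])),
                              key=lambda k: b[i][k] + np.dot(p[i][k], v)
                                            - g * nu[i][k])
                          for i in range(n)]
            if new_policy == policy:            # unchanged policy: stop
                return policy, g, v
            policy = new_policy

For the computer model, p, nu, and b are exactly the quantities assembled by MODEL, NUSET, and BSET from the vectors of Appendix C.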

The solution of this system is performed by a library subroutine (SLE.) available at the UM Computer Center. The method is a sophisticated form of Gaussian elimination.

D.2.7 BETTER: This routine performs the Policy Improvement Operation of the Howard algorithm.

D.2.8 RESULT: This routine serves a utility function, converting the internal representation of the policy into a printed tabular representation which is more readily interpreted.

D.3 PERFORMANCE

The performance of SEMSIN in obtaining solutions of the computer model is sufficiently good to recommend the Howard algorithm as a practical and economical procedure for large models. The important aspects of its performance, as observed in this research, are summarized in Table D.1. The size of the program, including storage reserved for the data arrays but excluding auxiliary routines obtained from the UM library, is 19,000 computer words.

TABLE D.1

OBSERVED SEMSIN PERFORMANCE ON THE COMPUTER MODEL

    Item                              Observation

    Number of iterations for          From 2 to 5, depending upon the
    optimization                      initial policy.

    Time for formula evaluation       0.7 sec., approx.

    Time per iteration                17 sec.

    Time in VDO per iteration         15.8 sec.

    Effect of computational errors    Tolerable. In almost all runs reported
                                      in Chap. VI it was clear that an
                                      optimal policy was found. No
                                      non-terminating iteration was revealed.

LIST OF REFERENCES

1. J. Riordan, Stochastic Service Systems, John Wiley and Sons, Inc., New York, 1962.

2. E. Brockmeyer, H. L. Halstrom, and A. Jensen, "The Life and Works of A. K. Erlang," Trans. of the Danish Academy of Technical Sciences, No. 2, 1948.

3. D. R. Cox and W. L. Smith, Queues, Methuen and Co., Ltd., London, and John Wiley and Sons, Inc., New York, 1961.

4. W. Feller, An Introduction to Probability Theory and Its Applications, John Wiley and Sons, Inc., New York, 1957.

5. T. C. Fry, Probability and Its Engineering Uses, D. Van Nostrand Co., Inc., Princeton, N. J., 1928.

6. D. G. Kendall, "Some Problems in the Theory of Queues," J. Royal Statistical Soc., B, 13, 151-173 and 184-185, 1951.

7. P. M. Morse, Queues, Inventories and Maintenance, John Wiley and Sons, Inc., New York, 1958.

8. T. L. Saaty, Elements of Queueing Theory, McGraw-Hill, New York, 1961.

9. R. Syski, Introduction to Congestion Theory in Telephone Systems, Oliver and Boyd, London, 1960.

10. L. Takacs, Introduction to the Theory of Queues, Oxford Univ. Press, New York, 1962.

11. J. Licklider, "Man-Computer Symbiosis," IRE Trans. on Human Factors, 1, 4-11, 1960.

12. J. Licklider and W. E. Clark, "On-Line Man-Computer Communication," Proc. Spring Joint Computer Conf., National Press, Palo Alto, Calif., 1962, pp. 113-128.

13. G. J. Culler and R. W. Huff, "Solution of Nonlinear Integral Equations Using On-Line Computer Control," ibid., pp. 129-138.

14. R. R. Everett, C. A. Zraket, and H. D. Bennington, "SAGE - A Data Processing System for Air Defense," Proc. Eastern Joint Computer Conf., Washington, D. C., 1957, pp. 148-155.

15. R. A. McAvoy, "A Central Computer Installation as a Part of an Air Line Reservations System," Proc. Fourth Annual Computer Applications Symposium, Armour Research Foundation, Chicago, 1957.

16. R. W. Hamming, "Frontiers in Computer Technology," Proc. Fifth Annual Computer Applications Symposium, Armour Research Foundation, Chicago, 1958, pp. 7-17.

17. F. J. Corbato, M. Merwin-Daggett, and R. C. Daley, "An Experimental Time-Sharing System," Proc. Spring Joint Computer Conf., National Press, Palo Alto, Calif., 1962, pp. 335-344.

18. F. J. Corbato, et al., The Compatible Time-Sharing System: A Programmer's Guide, The M.I.T. Press, Cambridge, Mass., 1963.

19. R. M. Fano, "The MAC system: the computer utility approach," IEEE Spectrum, 2, 1, 56-64, 1965.

20. S. Boilen, et al., "A Time-Sharing Debugging System for a Small Computer," Proc. Spring Joint Computer Conf., Spartan Books, Inc., Baltimore, Md., 1963, pp. 51-57.

21. J. B. Dennis, "A Multiuser Computation Facility for Education and Research," Comm. of the Association for Computing Machinery, 7, 9, 521-529, 1964.

22. T. M. Dunn and J. H. Morrissey, "Remote Computing - An Experimental System" (Part I), Proc. Spring Joint Computer Conf., Spartan Books, Inc., Baltimore, Md., 1964, pp. 413-423.

23. J. M. Keller, E. C. Strum, and G. H. Yang, "Remote Computing - An Experimental System" (Part II), ibid., pp. 425-443.

24. H. A. Kinslow, "The Time-Sharing Monitor System," Proc. Fall Joint Computer Conf., Spartan Books, Inc., Baltimore, Md., 1964, pp. 443-454.

25. J. Schwartz, E. Coffman, and C. Weissman, "A General Purpose Time-Sharing System," Proc. Spring Joint Computer Conf., Spartan Books, Inc., Baltimore, Md., 1964, pp. 397-411.

26. J. C. Shaw, "JOSS: A Designer's View of an Experimental On-Line Computing System," Proc. Fall Joint Computer Conf., Spartan Books, Inc., Baltimore, Md., 1964, pp. 455-464.

27. Computation Center Staff, "The Dartmouth Time-Sharing System: A Brief Description," Dartmouth College, Hanover, N. H., October 1964.

28. J. B. Dennis, "Segmentation and the Design of Multiprogrammed Computer Systems," IEEE Convention Record, Session 66, 1965.

29. B. Arden, B. Galler, T. O'Brien, and F. Westervelt, "Program and Addressing Structure in a Time-Sharing Environment," Computing Center, The University of Michigan, Ann Arbor, Michigan, April 1965 (submitted for publication).

30. J. E. Dresher and C. A. Zito, "The IBM 7741 - A Communications-Oriented Computer," IEEE Convention Record, Pt. 5, 216-224, 1964.

31. B. Arden, B. Galler, and R. Graham, "MAD: Michigan Algorithmic Decoder," Computing Center, The University of Michigan, Ann Arbor, Michigan, 1964.

32. A. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM J. Research and Development, 211-229, July, 1959.

33. F. J. Corbato, Lectures at the Engineering Summer Conferences on "Automatic Programming" and "Digital Computers in Real Time," The University of Michigan, June, 1964.

34. R. I. Wilkinson, "Theories for Toll Traffic Engineering in the U.S.A.," Bell System Tech. J., 35, 421-514, 1956.

35. J. Otterman, "Grade of Service of Direct Traffic Mixed with Store-and-Forward Traffic," Bell System Tech. J., 41, 1415-1438, July, 1962.

36. D. G. Haenschke, "Analysis of Delay in Mathematical Switching Models for Data Systems," Bell System Tech. J., 42, 709-736, May, 1963.

37. A. B. Fontaine, "Queueing Characteristics of a Telephone Data Transmission System with Feedback," IEEE Trans. Communications and Electronics, No. 68, 449-455, Sept., 1963.

38. L. Kleinrock, Communication Nets: Stochastic Message Flow and Delay, Lincoln Laboratory Publications, McGraw-Hill Book Co., 1964.

39. A. Weingarten, "Storage Requirements for a Message Switching Computer," IEEE Trans. Communication Systems, CS-12, 191-195, 1964.

40. P. E. Boudreau and M. Kac, "Analysis of a Basic Queueing Problem Arising in Computer Systems," IBM J. Research and Development, 132-140, April, 1961.

41. P. E. Boudreau, J. S. Griffing, and M. Kac, "A Discrete Queueing Problem with Variable Service Times," IBM J. Research and Development, 6, 407-418, October, 1962.

42. R. L. Boyell, "Analysis of Time-Sharing of Digital Computers," J. Soc. Industrial and Applied Mathematics, 8, 102-124, March, 1960.

43. I. Flores, "Derivation of a Waiting Time Factor for a Multiple-Bank Memory," Journal of the ACM, 11, 3, 265-282, July, 1964.

44. B. B. Tasini and S. Winograd, "Multiple Input-Output Links in Computer Systems," IBM J. Research and Development, 6, 306-328, July, 1962.

45. S. A. Korff, Electron and Nuclear Counters, D. Van Nostrand Co., Inc., New York, 1955.

46. E. Parzen, Stochastic Processes, Holden-Day, Inc., San Francisco, 1962.

47. L. Kleinrock, "Analysis of a Time-Shared Processor," Naval Research Logistics Quarterly, 11, 1, 59-73, March 1964.

48. E. G. Coffman, Jr., and B. Krishnamoorthi, Preliminary Analyses of Time-Shared Computer Operation, Professional Paper SP-1719, System Development Corporation, Santa Monica, Calif., 19 August 1964.

49. A. Cobham, "Priority Assignment in Waiting Line Problems," Operations Research, 2, 1, 70-76, 1954.

50. J. Holley, "Waiting Line Subject to Priorities," Operations Research, 2, 3, 341-343, 1954.

51. R. W. Conway and W. L. Maxwell, "Network Dispatching by the Shortest-Operation Discipline," Operations Research, 10, 1, 51-73, 1962.

52. D. W. Fife, "Scheduling with Random Arrivals and Linear Loss Functions," Management Science, 11, 3, 429-437, 1965.

53. J. R. Jackson, "Some Problems in Queueing with Dynamic Priorities," Naval Research Logistics Quarterly, 7, 3, 235-249, 1960.

54. V. L. Wallace and R. S. Rosenberg, Recursive Queue Analyzer, Cooley Electronics Laboratory, The University of Michigan, Ann Arbor, Michigan, to be published.

55. D. W. Fife and R. S. Rosenberg, "Queueing in a Memory-Shared Computer," Proc. Nineteenth National Conf. of the ACM, Philadelphia, August, 1964.

56. W. Rudin, Principles of Mathematical Analysis, McGraw-Hill Book Co., New York, 1953.

57. W. Kaplan, Operational Methods for Linear Systems, Addison-Wesley Publishing Co., Inc., Reading, Mass., 1962.

58. R. Pyke, "Markov Renewal Processes: Definitions and Preliminary Properties," Annals of Math. Statistics, 32, 1231-1242, 1961.

59. R. Bellman, "A Markovian Decision Process," J. Mathematics and Mechanics, 6, 679-684, Sept. 1957.

60. R. Bellman, Dynamic Programming, Princeton Univ. Press, Princeton, N. J., 1957.

61. R. Bellman and S. Dreyfus, Applied Dynamic Programming, Princeton Univ. Press, Princeton, N. J., 1962.

62. K. Chuang, Optimal Markovian Queueing Systems, Cooley Electronics Laboratory Technical Report No. 153, The University of Michigan, Ann Arbor, June, 1963.

63. R. Howard, Dynamic Programming and Markov Processes, The M.I.T. Press, Cambridge, Mass., 1960.

64. D. Blackwell, "On the Functional Equation of Dynamic Programming," J. Math. Analysis and Applications, 2, 273-276, 1961.

65. D. Blackwell, "Discrete Dynamic Programming," Annals of Math. Statistics, 33, 719-726, 1962.

66. C. Derman, "On Sequential Decisions and Markov Chains," Management Science, 9, 16-24, 1962.

67. C. Derman, "Stable Sequential Control Rules and Markov Chains," J. Math. Analysis and Applications, 6, 257-265, 1963.

68. A. Manne, "Linear Programming and Sequential Decisions," Management Science, 6, 259-267, April, 1960.

69. P. Wolfe and G. Dantzig, "Linear Programming in a Markov Chain," Operations Research, 10, 702-710, 1962.

70. M. Wagner, "On the Optimality of Pure Strategies," Management Science, 6, 268-269, 1960.

71. G. de Ghellinck, "Application de la Theorie des Graphes, Matrices de Markov et Programmes Dynamiques," Cahiers Centre d'Etudes de Recherche Operationnelle, 3, 5-35, 1961.

72. D. J. White, "Dynamic Programming, Markov Chains, and the Method of Successive Approximations," J. Math. Analysis and Applications, 6, 373-376, 1963.

73. W. S. Jewell, "Markov-Renewal Programming," Operations Research, 11, 938-971, 1963 (in two parts).

74. R. Howard, Semi-Markovian Control Systems, Tech. Report No. 3, Contract Nonr-1841(87), Operations Research Center, Mass. Institute of Technology, Cambridge, December 1963.

75. J. De Cani, "A Dynamic Programming Algorithm for Embedded Markov Chains When the Planning Horizon is at Infinity," Management Science, 716-733, 1964.

76. J. Eaton and L. Zadeh, "Optimal Pursuit Strategies in Discrete-State Probabilistic Systems," A.S.M.E. Transactions, Journal of Basic Engineering, 23-29, March 1962.

77. J. J. Florentin, "Optimal Control of Continuous Time, Markov Stochastic Processes," Electronics and Control, First Series, Vol. X, June 1961.

78. K. J. Astrom, "Optimal Control of Markov Processes with Incomplete State Information," J. Math. Analysis and Applications, 10, 174-205, 1965.

79. A. A. Feldbaum, "On the Optimal Control of Markov Objects," Automatika i Telemekhanika, 23, 8, 993-1007, August 1962 (translation published February 1963).

80. G. de Leve, "Stochastische oo-staps beslissings-problemen," Statistica Neerlandica, 16, 433-448, 1962.

81. C. Derman, "On Optimal Replacement Rules When Changes of State are Markovian," Chapt. 9, Mathematical Optimization Techniques, Report R-596-PR, The Rand Corp., Santa Monica, Calif., April 1963.

82. C. Derman, "Optimal Replacement and Maintenance Under Markovian Deterioration with Probability Bounds on Failure," Management Science, 9, 478-481, April 1963.

83. M. G. De Vries, A Dynamic Model for Product Strategy Selection, Industrial Development Research Program, The University of Michigan, Ann Arbor, August 1963.

84. Y. H. Rutenberg, Sequential Decision Models, Contract Nonr-1141(08), Operations Research Group, Case Institute of Technology, Cleveland, April 1961.

85. S. Karlin, "The Structure of Dynamic Programming Models," Naval Research Logistics Quarterly, 2, 285-294, 1955.

86. G. F. Simmons, Introduction to Topology and Modern Analysis, McGraw-Hill, New York, 1963.

87. J. Kelley, General Topology, D. Van Nostrand Co., Inc., Princeton, N. J., 1955.

88. R. Pyke, "Markov-Renewal Processes with Finitely Many States," Annals of Math. Statistics, 32, 1243-1259, 1961.

89. D. Widder, The Laplace Transform, Princeton Univ. Press, Princeton, N. J., 1941.

90. W. L. Smith, "Renewal Theory and Its Ramifications," J. Royal Statistical Soc., B, 20, 243-302 (with discussion), 1958.

91. W. L. Smith, "Asymptotic Renewal Theorems," Proc. Royal Soc. Edinburgh, 64A, 9-48, 1954.

92. A. J. Fabens, "The Solution of Queueing and Inventory Models by Semi-Markov Processes," J. Royal Statistical Soc., B, 23, 113-127, 1961.

93. S. Karlin and A. J. Fabens, "Generalized Renewal Functions and Stationary Inventory Models," J. Math. Analysis and Applications, 5, 461-487, 1962.

94. R. Barlow, "Applications of Semi-Markov Processes to Counter Problems," Studies in Applied Probability and Management Science, Arrow, Karlin, and Scarf (Editors), Stanford University Press, 1962.

95. F. R. Gantmacher, Applications of the Theory of Matrices, Interscience Publishers, Inc., New York, 1959.

96. J. Kemeny and J. Snell, Finite Markov Chains, D. Van Nostrand Co., Inc., Princeton, N. J., 1960.

97. R. M. Brown, "An Experimental Study of an On-Line Man-Computer System," IEEE Trans. Electronic Computers, EC-14, 82-85, 1965.

98. R. Rosin, "Determining a Computing Center Environment," Comm. of the ACM, 8, 463-468, 1965.

99. E. Walter and V. Wallace, Further Analysis of a Computing Center Environment, Cooley Electronics Laboratory Technical Report, The University of Michigan, Ann Arbor, Michigan (to be published).

100. W. Feller, "Fluctuation Theory of Recurrent Events," Trans. Am. Math. Soc., 67, 98-119, 1949.

